2, July 2003
ABSTRACT This paper presents the procedures and findings of a systematic review of
research on the impact of testing on students’ motivation for learning. The review was
undertaken to provide evidence in relation to claims that, on the one hand, testing raises
standards and, on the other, that testing, particularly in high stakes contexts, has a negative
impact on motivation for learning that militates against preparation for lifelong learning.
Motivation is considered as a complex concept, closely aligned with ‘the will to learn’, and
encompassing self-esteem, self-efficacy, effort, self-regulation, locus of control and goal
orientation. The paper describes the systematic methodology of the review and sets out the
evidence base for the findings, which serve to substantiate the concern about the impact of
summative assessment on motivation for learning. Implications for policy and practice are
drawn from the findings.
Introduction
In this paper we report a review of research carried out to identify evidence of any
impact of testing and other forms of summative assessment on students’ motivation
for learning. Our findings are framed by the reasons for the review, its funding,
timing, methods and focus, and the meaning of key terms; discussion of these
matters therefore forms an important part of this paper. The review was conducted during 2000
and 2001 following the procedures for systematic review of research in education
being developed at that time by the government-funded Evidence for Policy and
Practice Information and Co-ordinating Centre (EPPI-Centre). These procedures
differ in several respects from those of narrative reviews. We therefore begin by
setting out the background to the review, our view of the meaning of key terms and
an account of the review methodology. The main section gives the findings of the
review. We conclude with some implications for policy and practice that emerged
from discussing the findings with policy makers and practitioners.
Background
Two sets of circumstances coincided to bring about the particular focus
of this review: one relating to the topic and the other to the review methodology.
ISSN 0969-594X print; ISSN 1465-329X online/03/020169-39 2003 Taylor & Francis Ltd
DOI: 10.1080/0969594032000121270
170 W. Harlen & R. Deakin Crick
These circumstances help to explain the choice of what was included and what was
not covered by the review.
the use of extrinsic motivation is problematic and that intrinsic motivation and
self-regulated learning are important to continued learning both within and outside
school. Crooks also drew attention to research that indicated problems associated
with extrinsic motivation in tending to lead to ‘shallow’ rather than ‘deep’ learning.
Ames’ (1992) review was concerned to look at achievement goals and to identify
the situations and instructional strategies that lead to motivation towards desired
goals. She contrasted learning goals with performance goals. In searching for
conditions which affect students’ motivation for learning she cited research which
indicates that social comparisons have a strong role in this respect. Students who are
compared unfavourably and publicly with their peers have low self-esteem in relation
to learning, avoid risks and use less effective and more superficial learning strategies.
Not only do their own perceptions of themselves as learners suffer but this perception
becomes shared by their peers. She cited Grolnick and Ryan’s (1987) findings
that when assessment is perceived as ‘an attempt to control rather than inform,
meta-cognitive processes are short-circuited’ (p. 265).
A review by McDonald (2001) was specifically focused on test anxiety and its
impact on students’ performance. His concern was to look at evidence relating to
students at school, since he notes that conflicting conclusions about the impact of
test anxiety on performance may have resulted from many studies having been
carried out in experimental situations with those who have left compulsory education.
He found studies difficult to synthesise on account of the different instruments
used to assess test anxiety. Where there was a distinction between general
fears and test anxiety (fear of negative assessment) it was found that whilst the
former decrease with age, the latter increases with age. Females were found to score
more highly on test anxiety than males. In relation to performance, there was
considerable evidence from a range of countries and across academic subjects, of a
negative relationship between test anxiety and test performance. Although there
were also studies which reported no relationship, McDonald concluded that overall
the influence is negative and large enough to make the difference between passing
and failing a test for at least one fifth of the students.
Two reviews, by Madaus and Clarke (1999) and McNeil and Valenzuela (1998),
were presented at a conference on High Stakes Testing K–12 held at Harvard
University in December 1998. They had a specific focus on research relating to
issues of high stakes testing in the USA. Madaus and Clarke focused on the impact
of high stakes testing on minority students, drawing mainly on research conducted
at Boston College’s Centre for the Study of Testing, Evaluation and Educational
Policy. They used the research to identify not only the existence of impact but also
how high stakes testing comes to influence what is taught and learned. They point
out that such influence is deliberate in a context of ‘measurement-driven instruction’
and show that teachers use past examination papers to define the curriculum, paying
attention not just to the content but also the form of the test. They discuss the
impact on student motivation and on student dropout rate. They conclude that:
McNeil and Valenzuela (1998) reviewed evidence of the impact of high stakes
testing in general and of the Texas Assessment of Academic Skills (TAAS) in
particular. Like Madaus and Clarke, their focus was on the impact on minority and
economically disadvantaged students. They present an analysis of studies from
which they conclude that
behind the rhetoric of rising test scores are a growing set of classroom
practices in which test-prep activities are usurping a substantive curriculum.
These practices are more widespread in those schools where administrator
pay is tied to test scores and where test scores have been historically
low. (McNeil & Valenzuela, 1998, p. 2)
In such schools, mostly attended by African-American and Latino students, the
pressure has meant that ‘a regular education has been supplanted by activities whose
sole purpose is to raise test scores on this particular test’ (McNeil & Valenzuela,
1998, p. 2). McNeil and Valenzuela highlight the distortion of educational expenditure,
away from high quality curriculum resources towards test-preparation materials
which have little educational benefit beyond the test.
learner’s emotional state, beliefs, interests, goals and habits of thinking. The second
refers to the learner’s creativity, higher order thinking and natural curiosity that
contribute to intrinsic motivation to learn. Intrinsic motivation for learning is
stimulated by tasks of optimal novelty and difficulty, relevant to personal interests
and providing for personal choice and control. The third principle has to do with the
effect of motivation on extended learner effort and guided practice: without motivation
to learn, learners are unlikely to exert this effort unless coerced.
These three broad principles indicate the range of factors that have to be taken
into account when considering motivation for learning. They have to do with the
learner’s sense of self, expressed through values and attitudes; with the learner’s
engagement with learning, including their sense of control and efficacy; and with the
learner’s willingness to exert effort to achieve a learning goal.
effort in learning, and is that which engages their motivation to process, perform and
develop as a learner over time.
Common to many theories which have been built around the concept of motivation
is reference to goal orientation. People who commit themselves to a goal will
direct their attention towards actions that help them to attain that goal and away
from other actions. Research indicates that students with learning goals (also known
as task involved or mastery goals) show more evidence of superior learning strategies,
have a higher sense of competence as learners, show greater interest in school
work and have more positive attitudes to school than do students with performance
(achievement or ego-involving) goals (Ames, 1990a,b; Dweck, 1992).
There are many reasons why a goal may or may not be embraced. In their review
of research evidence Kellaghan et al. (1996) suggest that these include: firstly the
need for an individual to comprehend the goal; secondly that the goal needs to be
reachable yet challenging; thirdly that individuals should believe that their efforts to
reach the goal will be successful and fourthly that attainment of the goal should lead
to actual benefit for the individual.
mined intrinsic motivation across a wide range of activities, populations and types of
reward. However, Hidi (2000) challenged these conclusions, pointing out that they
were drawn from studies only relating to activities that were interesting, excluding
uninteresting tasks. From their review of research on the role of interests and goals
on achievement, Hidi and Harackiewicz (2000) concluded that the dichotomy
between intrinsic and extrinsic motivation is unhelpful and that it is time to seek
‘optimal combinations’. This may be particularly necessary for students lacking
interest and intrinsic motivation for academic studies.
Procedures
The Review Questions
The first step in the systematic procedures employed in this review was to identify
a review question at an appropriate level of specificity. The specification of the
review question requires a balance between being too general and too specific. This
balance is particularly critical in education, where contexts, processes and outcomes
are complex. To focus a question too narrowly has several disadvantages, despite the
obvious potential for identifying relevant studies more precisely. Reducing the
question to a specified outcome of a single controllable factor risks, firstly, not
finding any studies exactly addressing this question and, secondly, if there are such
studies, being unable to relate their findings to the real situation of classroom
practice. On the other hand, to have too broad a question means that it is difficult
to extract specific evidence from the background of ‘noise’ in a range of studies
which are of relevance to the general debates in the area of the review. In the present
review it was found essential to keep the focus on student outcomes relevant to
motivation that could be ascribed to the effect of summative assessment. Other
student outcomes, such as achievement, were not considered unless motivation was
also reported and other impacts of summative assessment, such as on the curriculum
and classroom practice, were only considered in relation to their mediation of the
impact of assessment on student motivation. Thus the overall review question was
expressed as:
What is the evidence of the impact of summative assessment and testing on students’
motivation for learning?
In order to achieve the aim of the review it was necessary to
address the further questions:
• How does any impact vary with the characteristics of the students and the
conditions of the assessment or testing?
• In those studies where impact on students has been reported, what is the evidence
of impact on teachers and teaching?
• What actions in what circumstances would increase the positive and decrease the
negative impact on students of summative testing and assessment programmes? In
particular, what is the evidence that any impact is increased by ‘raising’ the stakes?
• What are the implications for assessment policy and practice of these findings?
Literature Search
The review question served as a framework in the search for studies. All the relevant
electronic databases, journals held in accessible libraries and those on-line (which
were very limited at the time of this review) were searched; citations in earlier
reviews and in obtained papers were followed up; and personal contacts were used to
obtain further references. This step, like all others in the review, was fully documented,
recording, for example, the dates of journals that were hand-searched and the
procedures for searching databases, so that the extent of the search was made
explicit and the review can be updated later by reference to studies not included to
date. The number of studies relevant to the review question found in this way was
183. Details of these, including abstracts, were entered into a database. A list of
these studies can be found in the full report of the review
<http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_one.htm>.
terms of a set of key-words, relating, for example, to their source, study type, age
range and type of outcome reported. To check reliability in applying key-words, 30
studies were key-worded by two people. Agreement was considerable and differences
helped in defining terms. Key-wording was also useful in drawing attention to
studies that did not meet the criteria but had slipped through at earlier stages. For
instance, if a study could not be categorised in terms of an assessment form and a
motivation outcome it was re-coded as excluded. Sixty-one studies were not empirical
studies but were reviews, or were otherwise of sufficient relevance, and were placed in a separate
database labelled for use in background discussion and possible guidance in relation
to recommendations.
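
The double key-wording check is reported only as showing 'considerable' agreement; the review does not state which measure of agreement, if any, was calculated. As an illustration of how such a check might be quantified, the following minimal sketch computes simple percent agreement and Cohen's kappa for two coders. The function names, category labels and example codings are hypothetical and are not taken from the review.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of studies given the same key-word by both coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of studies")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportions, summed over categories.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical key-words assigned to four studies by two coders.
coder_1 = ["RCT", "RCT", "descriptive", "descriptive"]
coder_2 = ["RCT", "descriptive", "descriptive", "descriptive"]

print(f"percent agreement: {percent_agreement(coder_1, coder_2):.2f}")  # 0.75
print(f"Cohen's kappa:     {cohens_kappa(coder_1, coder_2):.2f}")       # 0.50
```

Kappa is lower than raw agreement here because half of the matches would be expected by chance given how often each coder used each key-word; disagreements of this kind are the ones that, as noted above, help in defining terms.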
[Table column headings; no data rows recovered]
Study
Weight of evidence (L = low; M = medium; H = high): Methodological quality; Relevance of study type; Relevance of study topic; Overall
Type of intervention
Age group
Country
‘What I feel and think about myself as a learner’
‘The energy I have for the task’
‘How I perceive my capacity to undertake the task’

TABLE II. Design types, types of intervention, types of outcome, countries of origin and overall evidence weighting (number of studies)

Design types
  Outcome:
    RCT*                        3
    Case control                3
    Post-tests                  3
    Other design                4
  Process                       3
  Descriptive                   3

Type of intervention (> 1 per study)
  National Curriculum tests     4
  State tests                   3
  11+ (Northern Ireland)        2
  Classroom assessment          5
  16+ exams                     2
  Experimental (feedback)       1
  Experimental (other)          2

Type of outcome
  Effort                        9
  Self-efficacy                 4
  Self-esteem                   7
  Interest                      3
  Attitude to tests             5
  Test anxiety                  3
  Learning disposition          3
  Self-regulation               2
  Self as a learner             4

Country of origin
  Canada                        1
  Israel                        1
  Morocco                       1
  UK                            8
  USA                           8

Overall evidence weighting
  High                         12
  Medium                        6
  Low                           1
to which it contributed. Table II summarises information about the design types and
types of outcome reported.
Synthesis of Findings
Lengthy consideration was given to the various ways in which the findings of
different studies could be brought together to form conclusions. In this review of the
impact of testing on motivation for learning the research question sets up summative
assessment and testing (the naturalistic or experimental intervention) as the independent
variable, and motivation for learning as the dependent variable. However,
there is no single dependent variable which can be measured as an outcome, since,
as discussed earlier, motivation for learning is a complex human attribute that is
thought to be evidenced by a range of variables, each of which has affective,
conative and cognitive dimensions. Nor are summative procedures the only factor
affecting this complex overarching concept. A simplified view of the relationship is
attempted in Figure 1.
None of the studies dealt with all the variables included in the concept of
motivation for learning but they could be grouped according to the particular
outcomes that were investigated in each. These outcomes fell into three distinct and
overarching variables that were found to be integral to motivation for learning.
Expressed from a learner’s perspective these are:
‘What I feel and think about myself as a learner.’
(Related to self-esteem, self-concept, sense of self as a learner, attitude to
assessment, test anxiety, learning disposition)
‘The energy I have for the task.’
(Related to effort, interest in and attitude to subject, self-regulation)
‘How I perceive my capacity to undertake the task.’
(Related to locus of control, goal orientation, self-efficacy)
Thus the task of synthesising the studies to answer the main review question was
tackled by focusing on the impact of tests on students’ motivation for learning,
examined through these three overarching themes, which are deemed to be integral
to it.
Consultation
The final phase of the methodology was to present the findings in progress to a peer
group drawn together by the ALRSG. This conference included 45 experts, representing
teacher practitioners (4), Local Authority or independent advisors (7),
Government or government agency representatives (11), teacher educators (8) and
academics with research interests in assessment (6) and policy (9). A draft copy of
the review was sent to all participants before the conference, and the methodology
and findings were presented in detail during the conference. There were no
significant problems or concerns expressed relating to the methodology, nor to the
theoretical framework utilised to analyse the findings. In the second part of the
Testing and Motivation for Learning 183
FIG. 1. Some of the variables relating to motivation and factors affecting them.
Self-esteem
Two studies concerned the Northern Ireland end of primary school selection
examination (known as the 11+ tests). Johnston and McClune (2000) investigated
the impact on teachers, students and students’ learning processes in science lessons
through interviews, questionnaires and classroom observations. Leonard and Davey
(2001) reported the students’ perspectives on the process of preparing for, taking and
coming to terms with the results of the 11+ tests.
Johnston and McClune (2000) used several instruments to measure students’
learning dispositions, self-esteem, locus of control and attitude to science and
related these to the transfer grades obtained by the students in the 11+ examination.
The measures were the Learning Combination Inventory (Johnston, 1996), the B/G
Steem scale for primary pupils (Maines & Robinson, 1996) and the Locus of
Control Scale for Students (Nowicki, 1973). From the Learning Combination
Inventory, they found four main learning dispositions:
• ‘precise processing’ (preference for gathering, processing and utilising lots of data,
which gives rise to asking and answering many questions and a preference for
demonstrating learning through writing answers and factual reports);
• ‘sequential processing’ (preference for clear and explicit directions in approaching
learning tasks);
• ‘technical processing’ (preference for hands on experience and problem solving
tasks; willingness to take risks and to be creative);
• ‘confluent processing’ (typical of creative and imaginative thinkers, who think in
terms of connections and links between ideas and phenomena and like to see
the ‘bigger picture’).
Classroom observation showed that teachers were teaching in ways that gave
priority to sequential processing and linked success and ability in science to
precise/sequential processing. The statistical analysis showed a positive correlation between
precise/sequential learning dispositions and self-esteem. The more positive a student’s
disposition towards precise/sequential or technical processing, the higher their
self-esteem and the more internal their locus of control. Conversely, the more
confluent the pupils’ learning orientation, the more external their locus of control
and the lower their self-esteem. Interviews with teachers indicated that they felt the
need to teach through highly structured activities and transmission of information
on account of the nature of the selection tests. However, the learning dispositions of
students showed a preference for technical processing, that is, through first hand
exploration and problem-solving. Thus teachers may be valuing precise/sequential
processing approaches to learning more than other approaches and in so doing may
summative in purpose. Students realised that whilst effort was encouraged, it was
achievement that counted. Indeed in the early 1990s, the researchers suggested that
pupils did interpret class assessment interactions with their teacher as helping them
in ‘knowing what to do and avoiding doing it wrongly’. But in later years the
students were much less positive about assessment interactions that revealed their
weaknesses. They reported anxiety, tension and uncertainty in relation to teachers’
assessment. Pollard et al. (2000) suggested that the anxiety that students felt was
arguably a consequence of being exposed to greater risk as performance became
more important in the teacher’s eyes. They concluded that assessment had a severely
reduced role in helping learning and became concerned only with achievement as
measured by testing, and there was evidence that students were all too aware of this.
Leonard and Davey (2001) reported that students’ reactions to the Northern
Ireland 11+ tests, with their explicit high stakes for the students’ futures, were
particularly strong. They reported that the majority of students approached the tests
with fear and anxiety. The students’ drawings gave evidence of the negative feelings
for the whole process: only four out of 193 drawings collected could be interpreted
as positive towards the tests. Those confident of passing were likely to be more
positive to testing but, as in the Pollard et al. (2000) study, the initial excitement and
novelty of taking practice tests soon wore off. Leonard and Davey (2001) found that
students across all grade levels tended to be highly critical of the 11+ and wanted it
to be abolished. Given that selection was inevitable, they favoured instead continuous
assessment by the teacher.
Reay and Wiliam (1999) noted that all the students in the class they observed,
except the most able boy, expressed anxiety about failure, with girls more anxious
than boys. As in the Northern Ireland study, students also disliked the tests,
particularly their narrow focus, and did not feel that they could do their best under
test conditions.
The association of test anxiety with other characteristics was the subject of
Benmansour’s (1999) study of high school mathematics students in Morocco. Using
questionnaire data, Benmansour found four factors in the measurement of goal
orientation and related these to test anxiety, self-efficacy and learning strategies. He
found that students with a strong orientation to getting good grades had high levels of
test anxiety and made greater use of passive rather than active learning strategies.
Stronger intrinsic motivation (a desire to learn mathematics out of
interest) was negatively related to test anxiety and associated with greater use of active
learning strategies. He also found greater levels of test anxiety in girls than boys.
Although cause and effect cannot be unravelled by this study, it does suggest that
test anxiety is related to the use of passive learning strategies and extrinsic motivation.
comments only. The study of Pollard et al. (2000) confirms that interest and effort
are related and students will put in effort and practice in tasks that interest them.
Thus Butler’s conclusions about feedback can be related to the effort that students
will put into tasks. She concluded that promoting task involvement by giving task
related, non-ego-involving, feedback may promote the interest and performance of
most students.
Roderick and Engel (2001) reported the impact of a quite different approach to
encouraging effort, by using the threat of consequences of failing tests. This study
was the only one of the 19 that involved large proportions of minority students.
It was concerned with the effect of the introduction in 1999 by the Chicago public
schools (CPS) of a requirement for students in the third, sixth and eighth grades
to achieve a minimum cut-off score in reading and mathematics on the Iowa Tests
of Basic Skills (ITBS) in order to qualify for the next grade, instead of automatic,
social promotion from grade to grade. Roderick and Engel investigated the
impact of this policy on 6th and 8th grade students. Their sample consisted of
students at risk of being retained; thus they were already seen as having failed at
school. All were Afro-American or Latino and many had language or other
difficulties and/or home background problems. Baseline data collection included a
student interview (semi-structured), collection of student records, and teacher
assessments. The teacher assessments asked teachers to report on a variety of areas
of student performance using a Likert scale. Following the baseline interview,
students were interviewed a second time immediately after taking the ITBS and
once during the summer. Retained students were interviewed twice during their
retained year.
Roderick and Engel (2001), drawing on questions from the baseline interviews to
code work effort, put students into four groups: those who were working harder in
school as a result of the intervention (53% of the students); those working harder
but outside of school, supported by other adults (9%); those who were ‘worrying but
not working’ (34%); and those who were the most highly skilled in the sample and
had already met targets in at least one subject (4%). Across the groups there were
differences in age, gender and race. Eighth graders worked harder than 6th graders,
males less than females and Latinos were more likely to be worrying and not working
than Afro-Americans. Striking differences according to school support were noted.
A school giving high support was markedly more successful in terms of student effort
than a similar school which gave little support. High support meant creating an
environment of social and educational support, working hard to increase students’
sense of self-efficacy, focusing on task-centred goals, making goals explicit, using
assessment to help pupils succeed and having a strong sense of responsibility for
their students. Low teacher support meant teachers not seeing the target grades as
attainable, not translating the need to work harder into meaningful activities, not
displaying recognition of change and motivation on the part of students, and not making
personal connections with students in relation to learning goals.
Effort was found to be related to outcome. Almost all students making an effort
passed the test at the required level, whilst only a third of students not making an
effort did so. The authors conclude that although the majority of students responded
to the policy, the use of testing as a negative incentive means that some students will
fail, and these will be the most vulnerable. However, an important finding is that
schools can, by giving the kind of help described for the supporting school, raise
students’ achievement. The authors claimed that tests on their own, without this
kind of support, do not raise achievement.
Self-regulated Learning
In a study carried out in Canada, Perry (1998) observed the effect on young
children’s effort and control over learning in classrooms that differed in features
related to self-regulated learning (SRL). Students in three classes that were judged
as being high in encouraging SRL were compared with two classes of low SRL. The
high SRL teachers offered complex activities, offered students choices, enabled them
to control the amount of challenge, to collaborate with peers and to evaluate their
work. The low SRL teachers were more controlling and offered few choices, and students’
assessments of their own work were limited to mechanical features (spelling, punctuation,
etc.). Data were collected by questionnaire and interview from the grade 2 and
3 children and classrooms were observed. Both questionnaire and interview data
pointed to the children in the high SRL classrooms having interest in their work and
being motivated by this (intrinsic motivation). ‘They indicated a task focus when
choosing topics or collaborators for their writing and focused on what they had
learned about a topic and how their writing had improved when they evaluated their
writing products. In contrast the students in the low SRL classrooms were more
focused on their teacher’s evaluations of their writing and how many they got right
on a particular assignment. Both the high and low achievers in these classes were
concerned with getting “a good mark”’ (p. 723).
Perry’s (1998) findings compare interestingly with those of Pollard et al. (2000)
that children tend to judge their own work in terms of whether it is neat, correct and
completed, following the criteria that they perceive their teachers to be using. What
Perry adds to this picture is that these criteria can be changed by deliberate action
on the part of the teacher. Benmansour (1999) also notes that emphasising assessment
prompts students to embrace extrinsic goals and concludes that ‘In order to
counterbalance the emphasis placed on grades, teachers need to cultivate in students
more intrinsic interest and self-efficacy, which are potentially conducive to the use
of effective strategies and better performance’ (p. 13).
Self-efficacy
Brookhart and DeVoge’s (1999) study of the relationship between perceptions of
task, self-efficacy, effort and achievement, emphasised the role of feedback from
Locus of Control
Johnston and McClune’s (2000) study of the selection test for secondary schools in
Northern Ireland, outlined above under Self-esteem, investigated learning disposition (preferences
for different approaches to learning), self-esteem and perceived locus of
control. The authors concluded that there was a close link between performance in
the transfer tests, students’ learning disposition, student self-esteem and pupil locus
of control. There was also a significant gender difference in learning dispositions.
Students who favoured the more structured ‘precise/sequential processing’ approach
to learning had a higher self-esteem than those who favoured a more
exploratory and creative way of learning. This was possibly because precise/sequential
processing aligned with the teaching approach adopted by the science teachers.
Those with other preferences were unable to use their preferred learning style and
their self-esteem as learners suffered. The researchers’ classroom observations
showed that teaching and learning were strongly focused on transmission of factual
knowledge, with much less emphasis on experiential learning and conceptual understanding,
in preparation for the selection tests, and teachers felt that they had to teach
in this way on account of the nature of the tests. Thus the existence of the tests was
creating a classroom climate that had a considerable effect on self-esteem and locus
of control.
Goal Orientation
Schunk (1996), in two linked experimental studies, explored self-regulatory processes
among children who were learning mathematics. In both studies, two groups
of students were randomly assigned to work under either a learning goal or a
performance goal ethos. For the learning goal groups, the teacher introduced the
task, on manipulating fractions, by saying, ‘While you are working it helps to keep
in mind what you’re trying to do’, and went on: ‘You’ll be trying to learn how to
solve fraction problems where the denominators are the same and you have to add
the numerators’. For the performance goal groups the teacher gave the same first
part of the instruction but did not go on to mention the explicit learning. For all the
groups, the teacher asked the students to repeat the instructions to ensure they made
sense to them. Thus the author claimed that, although there appeared to be a very
small difference between the treatment of the groups, the particular instructions
were registered by the students. In the first study half of each group worked with
self-evaluation and half without. In the second study all students in each goal
condition evaluated their performance. Self-efficacy, motivation and achievement
were measured. Students were randomly assigned to the experimental conditions,
which were implemented in 45-minute instruction sessions over seven days.
Relevant findings for this review are those relating to goal orientation and
self-evaluation. In Study 1 the effect of goal orientation was apparent only when
self-evaluation was absent. Children under self-evaluation conditions and under
learning-goal ethos with no self-evaluation solved significantly more problems than
did those with performance goals and no self-evaluation. Self-evaluation scores for
performance goals and for learning goals were not significantly different. It appeared
from Study 1 that self-evaluation swamped any effect of goal-orientation, so in
Study 2 all students engaged in self-evaluation. With self-evaluation held constant,
the results showed significant effects of goal orientation for self-efficacy and for skill.
The scores of the group working towards learning-goals were significantly higher
than those of the performance-goals group on both measures.
Benmansour’s (1999) study, outlined on page 188, explored Moroccan students’
perceived motivational orientations, self-efficacy, test anxiety and strategies used in
mathematics. High school students studying for the Baccalaureate completed a
self-report questionnaire (in Arabic, which is the language of instruction) designed
to measure motivational goal orientation, self-efficacy and test anxiety. The study
used factor analysis and tests of difference in scores to investigate relations between
these characteristics and their variation with sex.
The findings indicated that self-efficacy was related to higher intrinsic goal
orientations, lower test anxiety and use of a wider repertoire of strategies including
active ones. In terms of frequency of use of active and passive learning strategies,
passive ones were far more frequently used by all students, but intrinsically moti-
vated students were more likely to use active ones as well as passive ones. Although
the generalisability of this study is limited, it points to the conclusion that an
emphasis on assessment is related to greater extrinsic goal orientation in students, to
a lower level of self-efficacy and to a limited use of effective learning strategies.
Study  Overall weight of evidence  Age of students  Level of achievement of students  Gender of students  Conditions of testing
Benmansour (1999) H × ×
Brookhart & DeVoge (1999) H ×
Butler (1988) H × ×
Duckworth et al. (1986) H ×
Evans & Engelberg (1988) H × × ×
Ferguson & Francis (1979) M ×
Gordon & Reese (1997) M × ×
Johnston & McClune (2000) H ×
Leonard & Davey (2001) H × ×
Little (1994) M ×
Paris et al. (1991) M × ×
Perry (1998) M ×
Pollard et al. (2000) H × × ×
Reay & Wiliam (1999) H × × ×
Roderick & Engel (2001) M × × ×
Key: H = high weight of evidence; M = medium weight of evidence; L = low weight of evidence
Age of Students
Two studies indicated that reactions to grades, attribution and goal orientation vary
with students’ age. Evans and Engelberg’s (1988) study of teachers’ classroom
marking or grading showed that older students (that is, age 11 and above) were
likely to have a better understanding of simple grades than younger ones. They were
less likely to report teachers’ grades as being fair but attached more importance to
them than did younger children. Pollard et al. (2000) also found that older students
were likely to attribute relative success to effort and ability, whilst younger ones
attributed it to external factors or practice. Older students were more likely to focus
on performance outcomes rather than learning processes.
The findings of Paris et al. (1991) suggest that lower achieving older students were
more likely to minimise effort and respond to test items randomly or by guessing
than younger ones. Thus tests have progressively less validity for these children.
However, under threat of serious consequences for not reaching a required level,
eighth graders were more likely to work harder than sixth graders (Roderick &
Engel, 2001). There is no evidence of age differences in test-taking strategies
(checking, monitoring time, etc.). Indeed it was reported that instead of increasing
motivation and ‘test wiseness’ with increasing age, older students feel more resent-
ment, anxiety, cynicism and mistrust of standardised achievement tests (Paris et al.,
1991).
Level of Achievement
Studies of summative classroom assessment show that high achieving students are
generally less affected by grading than low achievers (Paris et al., 1991; Pollard et al.,
2000). They have a better understanding of grades and their interest is less
influenced by whether they receive grades or comments or both (Butler, 1988). Not
surprisingly, high achievers think grades are fair, whilst low achievers think they are
influenced by outside factors (Evans & Engelberg, 1988).
Results of tests which are ‘high stakes’ for individual students, such as the 11+,
have been found to have a particularly strong and devastating impact on those who
receive low grades (Leonard & Davey, 2001). All students were aware of repeated
practice tests and the narrowing of the curriculum and only those confident of
success enjoyed the tests (Reay & Wiliam, 1999). In taking tests, high achievers are
more persistent, use appropriate test-taking strategies and have more positive self-
perceptions than low achievers. In other words, they become better at taking tests
and so the gap between high and low achievers is wider on this account than might
be the case in terms of actual understanding and skills. Moreover low achievers
become overwhelmed by assessments and demotivated by constant evidence of their
low achievement thus further increasing the gap. A greater emphasis on summative
assessment thus brings about increased differentiation (Paris et al., 1991; Pollard et
al., 2000).
Evidence on the differential impact of testing on low achieving students emerged
in two studies of state-mandated tests in the USA. Gordon and Reese’s (1997)
exploration of the reactions of teachers in the State of Texas to the TAAS found a
strong perception that tests lowered the self-esteem of students ‘at risk’. Similarly,
Paris et al. (1991) found from information collected about the Michigan State
mandated tests, that high achievers had more positive self-perceptions than low
achievers.
Several studies show evidence that low achievers are doubly disadvantaged by
summative assessment. Being labelled as failures has an impact, not just on current
feelings about their ability to learn, but lowers further their already low self-esteem
thus reducing the chance of future effort and success. But there is evidence that
when low achievers have a high level of support (from school or home), which shows
them how to improve, some do escape from this vicious circle (Roderick & Engel,
2001).
Gender
Differences in learning dispositions of boys and girls were found to have particular
importance in classrooms that favour certain approaches to learning. Johnston and
McClune (2000) found that boys are more likely than girls to prefer hands-on
experiences and problem-solving and girls were more likely to prefer ‘sequential’
processing, that is, to have clear directions to follow. Thus girls are more likely to
have a higher self-esteem in classrooms where the dominant teaching strategy,
moulded by the pressure of tests, favours sequential processing.
At the same time girls were reported as expressing more test anxiety than boys
(Benmansour, 1999; Evans & Engelberg, 1988; Reay & Wiliam, 1999). Girls also
make more internal attributions of success or failure than boys, with consequences
for their self-esteem. No gender differences were found in relation to understanding
grades (Evans & Engelberg, 1988).
Ferguson and Francis (1979) studied modes of examination and motivation of
students taking the GCE ‘O’ level examination in English. At the time of their study
candidates could be entered either for an examination or for continuous course
assessment by teachers. Although there were some differences in attitude towards
the subject resulting from mode of examination, these were not significant. The
significant differences in attitude resulted from gender and to a lesser extent place
of study (school or college).
Conditions of Assessment
The conditions that tend to increase or decrease the negative impact of summative
assessment relate to the degree of self-efficacy of students, the extent to which their
effort is intrinsically or extrinsically motivated, the encouragement of self-regulation
and self-evaluation and the pressure imposed by adults outside the school (Gordon
& Reese, 1997; Perry, 1998; Pollard et al., 2000; Reay & Wiliam, 1999; Roderick &
Engel, 2001).
The importance of self-efficacy in supporting student effort and achievement is a
thread in several studies. Feedback has a central role in this since self-efficacy is
judged from performance in previous tasks of the same kind (Brookhart & DeVoge,
1999; Butler, 1988; Duckworth et al., 1986). If students have experienced success
in earlier performance they are more likely to feel able to succeed in a new task.
Feedback that focuses on the task is associated with greater interest and effort,
whereas feedback that is ego-involving rather than task-involving is associated with
an orientation to performance goals (Brookhart & DeVoge, 1999; Butler, 1988).
Goal-orientation, effort and interest are all interconnected. Benmansour (1999)
reported that students who are performance orientated have less interest in the task
per se and that students who are task-involved and motivated by interest in the work
are less likely to experience high test anxiety than those motivated by achieving a
high grade.
Duckworth et al. (1986) reported that feelings of self-efficacy are influenced by
students’ perceptions of teachers’ communication about test expectations. They also
found that teachers’ own class testing practices can help to increase self-efficacy if
teachers explain the purpose and expectations of their tests and provide feedback.
Further, a school’s ‘assessment culture’ influences students’ feelings of self-efficacy
and effort. Collegiality—meaning constructive discussion of testing and the develop-
ment of desirable assessment practice in the school—has a positive effect, whilst a
focus on performance outcomes has a negative effect. Brookhart and DeVoge (1999)
also found that the way in which teachers present and treat classroom assessment
events affects the way students approach them.
Perry (1998) found that students who have some control over their work by being
given choice and who are encouraged to evaluate their own work value the
significant content features of their work rather than whether it is correct or not. In
other classrooms students evaluated their work by reference to surface features, such
as whether it was neat, well presented and ‘right’, as was also found by Pollard et al.
(2000). Thus classrooms that allow more self-regulation promote change in the
criteria students use in self-evaluation. In conditions where self-evaluation operates,
task- or learning-goals promote self-efficacy and achievement (Perry, 1998). Stu-
dents would like their point of view to be taken into account in the tests they
undertake (Leonard & Davey, 2001; Little, 1994).
There is a strong basis of evidence that community pressure is brought to bear on
schools for high scores (Gordon & Reese, 1997; Reay & Wiliam, 1999) when test
scores are a source of pride to parents. Similarly, parents bring pressure on their
children when the result has consequences for attendance at high social status
schools (Leonard & Davey, 2001). For many students this increased their
anxiety even though they recognised their parents as being supportive (Leonard &
Davey, 2001; Reay & Wiliam, 1999).
Johnston and McClune (2000) found that the existence of external tests has a
constricting effect on the curriculum and on teaching methods. Reay and Wiliam
(1999) reported that emphasis in teaching was based on the content of the tests
(invariably focused on reading and mathematics and occasionally on other aspects of
language and some aspects of science) and much less attention was given to subjects
not tested. Areas particularly neglected are those related to creativity and personal
and social development (Gordon & Reese, 1997; Leonard & Davey, 2001).
When they are accountable for test scores but not for effective teaching, teachers
are reported as expending a great deal of time and effort in preparing students for
the tests (Pollard et al., 2000). They administer practice tests, which take up time
from learning as well as serving to confirm for the low achievers their self-perception
as poor learners. Many teachers also go further and actively coach students in
passing tests rather than spending time helping them to understand what is being
tested (Gordon & Reese, 1997; Leonard & Davey, 2001). Direct teaching on how
to pass the tests can be very effective, so much so that Gordon and Reese (1997)
concluded that students can pass tests ‘even though the students have never learned
the concepts on which they are being tested’ (p. 364). As teachers become more
adept at this process, they can even teach students to answer correctly test items
intended to measure students’ ability to apply, analyse or synthesise, even though the
students have not developed application, analysis or synthesis skills. Not only are the
scope and depth of learning seriously undermined, but this also affects the validity
of the tests, for they no longer indicate that the students have the knowledge and
skill needed to answer the questions correctly.
Even when not teaching directly to the tests, teachers reported changing their
approach. They adjusted their teaching in ways they perceived as necessary because
of the tests, spending most time in direct instruction and less in providing oppor-
tunity for students to learn through enquiry and problem-solving (Johnston &
McClune, 2000).
The extent to which these features of the classroom teaching were the results of
the tests, rather than of some other condition, was illuminated by evidence from
studies which followed the introduction of national testing and by the overwhelming
opinion of teachers in systems where testing has become an established part of their
professional experience. Pollard et al.’s (2000) study, covering the introduction of
the national tests in England, reveals an impact on teachers’ own classroom
assessment practice, lending support to the claim that summative assessment drives
out formative assessment. After the introduction of tests students regarded assess-
ment interactions with their teachers as wholly summative, whereas prior to the tests
the same students had regarded these as helping them to learn. Even though
teachers intended their assessment interactions to be formative, the subtle change in
their discourse indicated a summative, performance-related approach that was
evidently communicated to the students. Such changes could, of course, have been
a natural consequence of dealing with students as they get older. Although research
evidence does support the interpretation that older students take teachers’ assess-
ment more seriously and tend to embrace performance goals more than younger
children, the change over time is not entirely explained in this way.
Other studies point to a real change in teachers’ behaviour (Johnston & McClune,
2000) and also show how readily students pick up from their teacher the signs of
what is valued and will gain approval. Thus, as teachers become more performance-
centred, students pick up the criteria being used and judge their own work accord-
ingly (Pollard et al., 2000). There is evidence that teachers can influence children’s
self-assessment to focus on learning processes (e.g. Perry, 1998), but students are
unlikely to use such criteria whilst their teachers’ assessment and teaching methods
implicitly, and in some cases explicitly, reflect performance goals.
Roderick and Engel (2001) concluded that fewer students would give up on
themselves as learners if more schools worked to raise these students’ sense of
self-efficacy, by focusing on task- and learning-centred goals and using assessment
to help them succeed. This underlines the importance of formative assessment but
at the same time argues for action that prevents the low self-esteem from developing
in the first place.
• Increase in test anxiety (Benmansour, 1999; Leonard & Davey, 2001; Pollard et
al., 2000).
• Students feeling anxiety as a consequence of their sense of being exposed to
greater risk as their teacher raised the stakes (Pollard et al., 2000).
• Increase in the pressure on students to do well resulting from the aspirations of
parents and teachers (Davies & Brember, 1998; Leonard & Davey, 2001).
• Teaching being focused on the content of the tests and teaching methods confined
to transmission modes which favour sequential learning styles (Johnston & Mc-
Clune, 2000).
• The use of repeated practice tests which impresses on students the importance of
the tests, and leads to students adopting test-taking strategies designed to avoid
effort and responsibility and which are detrimental to higher order thinking (Paris
et al., 1991; Reay & Wiliam, 1999).
These effects are similar in high and low achieving schools (Johnston & McClune,
2000; Pollard et al., 2000) and apply equally to high and low achieving students
(Gordon & Reese, 1997).
• Ensuring that the demands of the tests are consistent with the expectations of
teachers and the capabilities of the students (Duckworth et al. 1986).
• Involving students in decisions about testing (Leonard & Davey, 2001; Little,
1994).
• Developing students’ self-assessment skills and use of learning rather than per-
formance criteria (Pollard et al., 2000; Schunk, 1996).
b. Share and emphasise with students learning goals, not performance goals, and
provide feedback to students in relation to these goals.
c. Develop and implement a school-wide policy that includes assessment both for
learning (formative) and of learning (summative) and ensure that the purpose of
all assessment is clear to all involved, including parents and students.
d. Develop students’ understanding of the goals of their learning, the criteria by
which it is assessed and their ability to assess their own work.
e. Implement strategies for encouraging self-regulation in learning and positive
interpersonal relationships. Ways of doing this have been developed through
research, for example, by McCombs (1999).
f. Avoid comparisons between students based on test results.
g. Present assessment realistically, as a process which is inherently imprecise and
reflexive, with results that have to be regarded as tentative and indicative rather
than definitive.
to the extent to which their learning strategies help them to create links between new
and existing knowledge and to the extent to which they feel in control of their
learning. The recognition of these valued outcomes could be conveyed, for
example, by requiring that criteria used in school evaluation, including self-evalu-
ation, make explicit reference to a full range of subjects and include spiritual, moral,
social and cultural as well as cognitive aims and an appropriate variety of teaching
methods and learning outcomes. The current human and financial resources de-
voted to test development could be used to create assessment systems that enable all
valued outcomes of education, including creativity and learning to learn, to be
assessed.
It was noted that alternatives to tests to give summative information about
individual students, avoiding the negative impact on students, could be found in
programmes of testing students when their teachers judge them to be ready to show
their achievement at a certain level. For tracking national standards, more valid and
useful information, from a wider range of test forms and items, can be gained by
sampling students rather than testing whole cohorts.
It was emphasised that assessment policy makers should be aware of the real cost
of current practice, including teaching time taken up for testing and practice testing
and adding to teachers’ workloads, in addition to the cost of the tests and their
development.
Finally the policy of setting targets based only on test results was identified as a
key factor in raising the stakes to the point where testing begins to act in
opposition to the intentions of reform. Interestingly the chief inspector for schools
in England has reported ‘a very real concern that the innovation and reform that we
need to see in our schools may be inhibited by an over-concentration on targets’
(Bell, 2003).
Conclusion
One of the main outcomes of the research review is to draw attention to the
small number of studies that were found to offer dependable evidence to address
the question posed in this review. The finding that only 19 studies dealing with the
impact of summative assessment on motivation for learning emerged from
the search carried out indicates that this is an under-researched area. A large corpus
of research on cognitive outcomes of educational practice and indeed of assessment,
evaluation and testing, exists. The number of research studies concerned with
affective and conative outcomes of assessment is very small by comparison. We have
argued that there are important reasons for serious attention to motivation
for learning as an outcome of education. We have also discussed the complexity
of the concept of motivation for learning and indicated that it can be discouraged
unwittingly by assessment and testing practices. It is not the role of this paper
to suggest how to promote motivation, but the review has hopefully pointed out
some of the actions and conditions that impact both positively and negatively on
it.
MADAUS, G. & CLARKE, M. (1999) The adverse impact of high stakes testing on minority
students: evidence from 100 years of test data, High Stakes K–12 Testing Conference,
Harvard University, 4 December, 1998. Paper revised May 1999.
MAINES, B. & ROBINSON, G. (1996) B/G Steem: a self esteem scale with locus of control items (Bristol,
Lucky Duck Publishing).
MCCOMBS, B. L. (1999) Learner-Centred Classroom Practices. Available from the author, University
of Denver Research Institute, Denver, Colorado.
MCCOMBS, B. L. & WHISLER, J. (1997) The Learner Centred Classroom and School (San Francisco,
CA, Jossey-Bass).
MCDONALD, A. (2001) The prevalence and effects of test anxiety in school children, Educational
Psychology, 21, pp. 89–101.
MCNEIL, L. & VALENZUELA, A. (1998) The harmful effects of the TAAS system of testing in
Texas: beneath the accountability rhetoric, High Stakes K–12 Testing Conference, Harvard
University, 4 December, 1998.
NOWICKI, S. & STRICKLAND, B. (1973) A locus of control scale for children, Journal of Consulting
and Clinical Psychology, 40, pp. 148–155.
OECD (2001) Knowledge and Skills for Life. First results from PISA 2000 (Paris, OECD).
OSBORN, M., MCNESS, E., BROADFOOT, P., POLLARD, A. & TRIGGS, P. (2000) What Teachers Do:
changing policy and practice in primary education (London, Continuum).
PROFESSIONAL ASSOCIATION OF TEACHERS (2000) Press release 06/01/00. See www.pat.org.uk
RESNICK, L. B. & NOLAN, K. L. (1995) Standards for education, in: D. RAVITCH (Ed.) Debating the
Future of American Education: do we need national standards and assessment? (Washington DC,
Brookings Institution).
SCHOEN, H. L., FEY, J. T., HIRSCH, C. R. & COXFORD, A. E. (1999) Issues and options in math
wars, Phi Delta Kappan, February, pp. 444–453.
SCHUNK, D. (1991) Self-efficacy and academic motivation, Educational Psychologist, 26, pp. 207–
231.
STIGGINS, R. (2001) Student-Involved Classroom Assessment (3rd edn) (Upper Saddle River, NJ,
Merrill Prentice Hall).
WATKINS, D. (2000) Learning and teaching: a cross-cultural perspective, School Leadership and
Management, 20 (2), pp. 161–173.
8. FERGUSON, C. & FRANCIS, J. (1979) Motivation and mode: an attempt to measure the
attitudes of ‘O’ level GCE candidates to English language, Educational Studies, 5 (3),
pp. 231–239.
9. GORDON, S. & REESE, M. (1997) High stakes testing: worth the price? Journal of School
Leadership, 7, pp. 345–368.
10. HUGHES, B., SULLIVAN, H. & BEAIRD, J. (1986) Continuing motivation of boys and girls
under differing evaluation conditions and achievement levels, American Educational Re-
search Journal, 23, pp. 660–667.
11. JOHNSTON, J. & MCCLUNE, W. (2000) Selection project sel 5.1: pupil motivation and
attitudes—self-esteem, locus of control, learning disposition and the impact of selection on
teaching and learning, in: The Effects of the Selective System of Secondary Education in
Northern Ireland: Research Papers Volume II (Bangor, Co. Down, Department of Edu-
cation).
12. LEONARD, M. & DAVEY, C. (2001) Thoughts on the 11 Plus (Belfast, Save the Children
Fund).
13. LITTLE, A. (1994) Types of assessment and interest in learning: variation in the south of
England in the 1980s, Assessment in Education, 1, pp. 201–222.
14. PARIS, S., LAWTON, T., TURNER, J. & ROTH, J. (1991) A developmental perspective on
standardised achievement testing, Educational Researcher, 20, pp. 12–20.
15. PERRY, N. (1998) Young children’s self-regulated learning and contexts that support it,
Journal of Educational Psychology, 90, pp. 715–729.
16. POLLARD, A., TRIGGS, P., BROADFOOT, P., MCNESS, E. & OSBORN, M. (2000) What Pupils
Say: changing policy and practice in primary education (London, Continuum).
17. REAY, D. & WILIAM, D. (1999) ‘I’ll be a nothing’: structure, agency and the construction of
identity through assessment, British Educational Research Journal, 25, pp. 343–354.
18. RODERICK, M. & ENGEL, M. (2001) The grasshopper and the ant: motivational responses of
low achieving pupils to high stakes testing, Educational Evaluation and Policy Analysis, 23,
pp. 197–228.
19. SCHUNK, D. (1996) Goal and self-evaluative influences during children’s cognitive skill
learning, American Educational Research Journal, 33, pp. 359–382.