Basic Concepts in Assessment
What Is Assessment?
Broadly conceived, classroom assessment involves two major types of
activities: collecting information about how much knowledge and skill
students have learned (measurement) and making judgments about the
adequacy or acceptability of each student's level of learning (evaluation).
Both the measurement and evaluation aspects of classroom assessment
can be accomplished in a number of ways. To determine how much
learning has occurred, teachers can, for example, have students take
exams, respond to oral questions, do homework exercises, write papers,
solve problems, and make oral presentations. Teachers can then evaluate
the scores from those activities by comparing them either to one another
or to an absolute standard (such as an A equals 90 percent correct).
Throughout much of this chapter we will explain and illustrate the various
ways in which you can measure and evaluate student learning.
Measurement
Measurement is the assignment of numbers to certain attributes of objects,
events, or people according to a rule-governed system. For our purposes,
we will limit the discussion to attributes of people. For example, we can
measure someone's level of typing proficiency by counting the number of
words the person accurately types per minute or someone's level of
mathematical reasoning by counting the number of problems correctly
solved. In a classroom or other group situation, the rules that are used to
assign the numbers will ordinarily create a ranking that reflects how much
of the attribute different people possess (Linn & Gronlund, 1995).
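To make the idea of a rule-governed system concrete, here is a minimal
Python sketch. The rule (accurately typed words per minute) comes from the
example above, but the function name and the sample data are hypothetical:

    # Measurement: assign a number to an attribute according to a fixed rule.
    def typing_proficiency(accurate_words, minutes):
        # The rule: accurately typed words divided by elapsed minutes.
        return accurate_words / minutes

    # Applying the same rule to everyone yields a ranking of the group.
    trials = {"Ana": (300, 5), "Ben": (275, 5), "Cai": (340, 5)}
    scores = {name: typing_proficiency(w, m) for name, (w, m) in trials.items()}
    for name, wpm in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {wpm:.0f} words per minute")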
Evaluation
Evaluation involves using a rule-governed system to make judgments
about the value or worth of a set of measures (Linn & Gronlund, 1995).
What does it mean, for example, to say that a student answered eighty out
of one hundred earth science questions correctly? Depending on the rules
that are used, it could mean that the student has learned that body of
knowledge exceedingly well and is ready to progress to the next unit of
instruction or, conversely, that the student has significant knowledge gaps
and requires additional instruction.
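Because evaluation hinges on the rule in force, the same measurement can
support opposite judgments. A minimal Python sketch (both cutoff values
below are assumptions chosen for illustration, not values the chapter
prescribes):

    # Evaluation: judge the worth of a measure (80 of 100 correct)
    # under two different hypothetical mastery rules.
    score = 80 / 100
    for mastery_cutoff in (0.75, 0.90):
        verdict = ("ready to progress" if score >= mastery_cutoff
                   else "needs additional instruction")
        print(f"Cutoff {mastery_cutoff:.0%}: student is {verdict}.")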
Why Should We Assess Students' Learning?
This question has several answers. We will use this section to address four
of the most common reasons for assessment: to provide summaries of
learning, to provide information on learning progress, to diagnose specific
strengths and weaknesses in an individual's learning, and to motivate
further learning.
Summative Evaluation
The first, and probably most obvious, reason for assessment is to provide
to all interested parties a clear, meaningful, and useful summary or
accounting of how well a student has met the teacher's objectives. When
testing is done for the purpose of assigning a letter or numerical grade, it is
often called summative evaluation since its primary purpose is to sum up
how well a student has performed over time and at a variety of tasks.
Formative Evaluation
A second reason for assessing students is to monitor their progress. The
main thing that teachers want to know from time to time is whether
students are keeping up with the pace of instruction and are understanding
all of the material that has been covered so far. For students whose pace of
learning is either slower or faster than average or whose understanding of
certain ideas is faulty, you can introduce supplementary instruction (a
workbook or a computer-based tutorial program), remedial instruction
(which may also be computer based), or in-class ability grouping (recall
that we discussed the benefits of this arrangement in Chapter 6). Because
the purpose of such assessment is to facilitate or form learning and not to
assign a grade, it is usually called formative evaluation.
Diagnosis
A third reason follows from the second. If you discover a student who is
having difficulty keeping up with the rest of the class, you will probably
want to know why in order to determine the most appropriate course of
action. This purpose may lead you to construct an assessment (or to look
for one that has already been made up) that will provide you with specific
diagnostic information.
Effects on Learning
A fourth reason for assessment of student performance is that it has
potentially positive effects on various aspects of learning and instruction.
As Terence Crooks points out, classroom assessment guides students'
"judgment of what is important to learn, affects their motivation and self-
perceptions of competence, structures their approaches to and timing of
personal study (e.g., spaced practice), consolidates learning, and affects
the development of enduring learning strategies and skills. It appears to be
one of the most potent forces influencing education" (1988, p. 467).
Ways to Measure Student Learning
Just as measurement can play several roles in the classroom, teachers
have several ways to measure what students have learned. Which type of
measure you choose will depend, of course, on the objectives you have
stated. For the purposes of this discussion, objectives can be classified in
terms of two broad categories: knowing about something (for example, that
knots are used to secure objects, that dance is a form of social expression,
that microscopes are used to study things too small to be seen by the
naked eye) and knowing how to do something (for example, tie a square
knot, dance the waltz, operate a microscope). Measures that attempt to
assess the range and accuracy of someone's knowledge are usually called
written tests. And measures that attempt to assess how well somebody can
do something are often referred to as performance tests. Again, keep in
mind that both types have a legitimate place in a teacher's assessment
arsenal. Which type is used, and to what extent, will depend on the purpose
or purposes you have for assessing students. In the next two sections, we
will briefly examine the nature of both types.
Written Tests
Teachers spend a substantial part of each day assessing student learning,
and much of this assessment activity involves giving and scoring some
type of written test. Most written tests are composed of one or more of the
following item types: selected response (multiple choice, true-false, and
matching, for example), short answer, and essay. They are designed to
measure how much people know about a particular subject. In all
likelihood, you have taken hundreds of these types of tests in your school
career thus far. In the next couple of pages, we will briefly describe the
main features, advantages, and disadvantages of each test.
Selected-Response Tests
Characteristics
Selected-response tests are so named because the student reads a
relatively brief opening statement (called a stem) and selects one of the
provided alternatives as the correct answer. Selected-response tests are
typically made up of multiple-choice, true-false, or matching items. Quite
often all three item types are used in a single test. Selected-response tests
are sometimes called "objective" tests because they have a simple and set
scoring system. If alternative (b) of a multiple-choice item is keyed as the
correct response and the student chose alternative (d), the student is
marked wrong, regardless of how much the teacher wanted the student to
be right. But that doesn't mean selected-response items are totally free of
subjective influences. After all, whoever created the test had to make
subjective judgments about which areas to emphasize, how to word items,
and which items to include in the final version. Finally, selected-response
tests are typically used when the primary goal is to assess what might be
called foundational knowledge. This is the basic factual information and
cognitive skills that students need in order to do such high-level tasks as
solve problems and create products (Stiggins, 1994).
Advantages
A major advantage of selected-response tests is efficiency -- a teacher can
ask many questions in a short period of time. Another advantage is ease
and reliability of scoring. With the aid of a scoring template (such as a
multiple-choice answer sheet that has holes punched out where the correct
answer is located), many tests can be quickly and uniformly scored.
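In code, the scoring template amounts to a line-by-line comparison with the
key; the key and the answer sheet below are invented for illustration:

    # Objective scoring: a response earns credit only if it matches the
    # keyed alternative, regardless of anything else about the answer.
    key       = ["b", "d", "a", "c", "b"]
    responses = ["b", "d", "c", "c", "b"]
    correct = sum(1 for k, r in zip(key, responses) if k == r)
    print(f"Score: {correct}/{len(key)} ({correct / len(key):.0%})")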
Disadvantages
Because items that reflect the lowest level of Bloom's Taxonomy (verbatim
knowledge) are the easiest to write, most teacher-made tests are composed
almost entirely of knowledge-level items (a point we made initially in
Chapter 7). As a result, students focus on verbatim memorization rather
than on meaningful learning. Another disadvantage is that, while we get
some indication of what students know, such tests tell us nothing about
what students can do with that knowledge.
Short-Answer Tests
Characteristics
Instead of selecting from one or more alternatives, the student is asked to
supply a brief answer consisting of a name, word, phrase, or symbol. Like
selected-response tests, short-answer tests can be scored quickly,
accurately, and consistently, thereby giving them an aura of objectivity.
They are primarily used for measuring foundational knowledge.
Advantages
Short-answer items are relatively easy to write, so a test, or part of one, can
be constructed fairly quickly. They allow for either broad or in-depth
assessment of foundational knowledge since students can respond to
many items within a short space of time. Since students have to supply an
answer, they have to recall, rather than recognize, information.
Disadvantages
This item type has the same basic disadvantages as the selected-response
items. Because these items ask only for short verbatim answers, students
are likely to limit their processing to that level, and these items provide no
information about how well students can use what they have learned. In
addition, unexpected but plausible answers may be difficult to score.
Essay Tests
Characteristics
The student is given a somewhat general directive to discuss one or more
related ideas according to certain criteria. One example of an essay
question is "Compare operant conditioning theory and information-
processing theory in terms of basic assumptions, typical research findings,
and classroom applications."
Advantages
Essay tests reveal how well students can recall, organize, and clearly
communicate previously learned information. When well written, essay
tests call on such higher-level abilities as analysis, synthesis, and
evaluation. Because of these demands, students are more likely to try to
meaningfully learn the material over which they are tested.
Disadvantages
Consistency of grading is likely to be a problem. Two students may have
essentially similar responses, yet receive different letter or numerical
grades. These test items are also very time consuming to grade. And
because it takes time for students to formulate and write responses, only a
few questions at most can be given.
Performance Tests
In recent years many teachers and measurement experts have
argued that the typical written test should be used far less often because it
reveals little or nothing of the depth of students' knowledge and how
students use their knowledge to work through questions, problems, and
tasks. The solution that these experts have proposed is to use one or more
of what are called performance tests.
Performance tests attempt to assess how well students use
foundational knowledge to perform complex tasks under more or less
realistic conditions. At the low end of the realism spectrum, students may
be asked to construct a map, interpret a graph, or write an essay under
highly standardized conditions. That is, everyone completes the same task
in the same amount of time and under the same conditions. At the high end
of the spectrum, students may be asked to conduct a science experiment,
produce a painting, or write an essay under conditions that are similar to
those of real life. For example, students may be told to produce a compare-
and-contrast essay on a particular topic by a certain date, but the
resources students choose to use, the number of revisions they make, and
when they work on the essay are left unspecified. As we noted in Chapter
5, when performance testing is conducted under such realistic conditions,
it is also called authentic assessment (Meyer, 1992). Another term that is
often used to encompass both performance testing and authentic
assessment, and to distinguish them from traditional written tests,
is alternative assessment. In this section we will first define the four
different types of performance tests and then look at their most important
characteristics.
Types of Performance Tests
Currently, there are four ways in which the performance capabilities
of students are typically assessed: direct writing assessments, portfolios,
exhibitions, and demonstrations.
Direct Writing Assessments
These tests ask students to write about a specific topic ("Describe
the person whom you admire the most, and explain why you admire that
person.") under a standard set of conditions. Each essay is then scored by
two or more people according to a set of defined criteria.
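The scoring step might look like the following sketch, in which the two
raters' ratings on each criterion are averaged; the criteria, the rating
scale, and the averaging rule are all assumptions for illustration:

    # Two raters score one essay on a set of defined criteria (1-6 scale).
    rater_1 = {"organization": 5, "support": 4, "conventions": 5}
    rater_2 = {"organization": 4, "support": 4, "conventions": 6}
    for criterion in rater_1:
        average = (rater_1[criterion] + rater_2[criterion]) / 2
        print(f"{criterion}: {average:.1f}")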
Portfolios
A portfolio may contain one or more pieces of a student's work,
some of which demonstrate different stages of completion. For example, a
student's writing portfolio may contain business letters; pieces of fiction;
poetry; and an outline, rough draft, and final draft of a research paper.
Through the inclusion of various stages of a research paper, both the
process and the end product can be assessed. Portfolios can also be
constructed for math and science as well as for projects that combine two
or more subject areas. Often the student is involved in the selection of
what is included in his portfolio. The portfolio is sometimes used as a
showcase to illustrate exemplary pieces, but it also works well as a
collection of pieces that represent a student's typical performances. In its
best and truest sense, the portfolio functions not just as a housing for
these performances but also as a means of self-expression, self-reflection,
and self-analysis for an individual student (Templeton, 1995).
Exhibitions
Exhibitions involve just what the label suggests -- a showing of such
products as paintings, drawings, photographs, sculptures, videotapes, and
models. As with direct writing assessments and portfolios, the products a
student chooses to exhibit are evaluated according to a predetermined set
of criteria.
Demonstrations
In this type of performance testing, students are required to show
how well they can use previously learned knowledge or skills to solve a
somewhat unique problem (such as conducting a scientific inquiry to
answer a question or diagnosing the cause of a malfunctioning engine and
describing the best procedure for fixing it) or perform a task (such as
reciting a poem, performing a dance, or playing a piece of music).
Ways to Evaluate Student Learning
Once you have collected all the measures you intend to collect -- for
example, test scores, quiz scores, homework assignments, special
projects, and laboratory experiments -- you will have to give the numbers
some sort of value (the essence of evaluation). As you probably know, this
is most often done by using an A to F grading scale. Typically, a grade of A
indicates superior performance; a B, above-average performance; a C,
average performance; a D, below-average performance; and an F, failure.
There are two general ways to approach this task. One approach involves
comparisons among students. Such forms of evaluation are called norm-
referenced since students are identified as average (or normal), above
average, or below average. An alternative approach is called criterion-
referenced because performance is interpreted in terms of defined criteria.
Although both approaches can be used, we favor criterion-referenced
grading for reasons we will mention shortly.
NORM-REFERENCED GRADING
A norm-referenced grading system assumes that classroom
achievement will naturally vary among a group of heterogeneous students
because of differences in such characteristics as prior knowledge, learning
skills, motivation, and aptitude. Under ideal circumstances (hundreds of
scores from a diverse group of students), this variation produces a bell-
shaped, or "normal," distribution of scores that ranges from low to high,
has few tied scores, and has only a very few low scores and only a very few
high scores. For this reason, norm-referenced grading procedures are also
referred to as "grading on the curve."
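One way to operationalize grading on the curve is to assign grades by rank
within the class distribution. In this Python sketch the percentile
boundaries (top 10 percent A, next 20 percent B, and so on) and the scores
are assumptions for illustration, not a recommended scheme:

    # Norm-referenced grading: a grade reflects standing relative to
    # classmates, not attainment of an absolute standard.
    scores = [55, 62, 68, 71, 74, 76, 79, 83, 88, 95]   # invented scores
    bands = [(0.10, "A"), (0.30, "B"), (0.70, "C"), (0.90, "D"), (1.00, "F")]
    ranked = sorted(scores, reverse=True)
    for i, s in enumerate(ranked):
        standing = (i + 1) / len(ranked)   # fraction of class at or above this rank
        grade = next(g for cut, g in bands if standing <= cut)
        print(f"score {s}: {grade}")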
CRITERION-REFERENCED GRADING
A criterion-referenced grading system permits students to benefit
from mistakes and to improve their level of understanding and
performance. Furthermore, it establishes an individual (and sometimes
cooperative) reward structure, which fosters motivation to learn to a
greater extent than other systems.
Under a criterion-referenced system, grades are determined through
comparison of the extent to which each student has attained a defined
standard (or criterion) of achievement or performance. Whether the rest of
the students in the class are successful or unsuccessful in meeting that
criterion is irrelevant. Thus, any distribution of grades is possible. Every
student may get an A or an F, or no student may receive these grades. For
reasons we will discuss shortly, very low or failing grades tend to occur
less frequently under a criterion-referenced system.
A common version of criterion-referenced grading assigns letter
grades on the basis of the percentage of test items answered correctly. For
example, you may decide to award an A to anyone who correctly answers
at least 85 percent of a set of test questions, a B to anyone who correctly
answers 75 to 84 percent, and so on down to the lowest grade. To use this
type of grading system fairly, which means specifying realistic criterion
levels, you would need to have some prior knowledge of the levels at which
students typically perform. You would thus be using normative information
to establish absolute or fixed standards of performance. However, although
norm-referenced and criterion-referenced grading systems both spring
from a normative database (that is, from comparisons among students),
only the former system uses those comparisons to directly determine
grades.
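The percentage-cutoff scheme just described translates directly into a
lookup. In this sketch the 85 and 75 percent cutoffs come from the example
above; the C and D cutoffs are assumed for illustration:

    # Criterion-referenced grading: each student is judged against fixed
    # cutoffs, so any distribution of grades is possible.
    cutoffs = [(85, "A"), (75, "B"), (65, "C"), (55, "D"), (0, "F")]

    def grade(percent_correct):
        return next(g for cut, g in cutoffs if percent_correct >= cut)

    for pct in (92, 80, 61, 48):   # invented scores
        print(f"{pct}% correct -> {grade(pct)}")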
Criterion-referenced grading systems (and criterion-referenced tests)
have become increasingly popular in recent years primarily because of
three factors. First, educators and parents complained that norm-
referenced tests and grading systems provided too little specific
information about student strengths and weaknesses. Second, educators
have come to believe that clearly stated, specific objectives constitute
performance standards, or criteria, that are best assessed with criterion-
referenced measures. Third, and perhaps most important, contemporary
theories of school learning claim that most, if not all, students can master
most school objectives under the right circumstances. If this assertion is
even close to being true, then norm-referenced testing and grading
procedures, which depend on variability in performance, will lose much of
their appeal.
Suggestions for Teaching in Your Classroom: Effective Assessment
Techniques
1. As early as possible in a report period, decide when and how often
to give tests and other assignments that will count toward a grade, and
announce tests and assignments well in advance.
2. Prepare a content outline and/or a table of specifications of the
objectives to be covered on each exam, or otherwise take care to obtain a
systematic sample of the knowledge and skill acquired by your students.
3. Consider the purpose of each test or measurement exercise in
light of the developmental characteristics of the students in your classes
and the nature of the curriculum for your grade level.
4. Decide whether a written test or a performance test is most
appropriate.
5. Make up and use a detailed answer key.
a. Evaluate each answer by comparing it to the key.
b. Be willing and prepared to defend the evaluations you make.
6. During and after the grading process, analyze questions and
answers in order to improve future exams.
Resources for Further Investigation
Suggestions for Constructing Written and Performance Tests
For specific suggestions on ways to write different types of items for
paper-and-pencil tests of knowledge and on methods for constructing and
using rating scales and checklists to measure products, performances, and
procedures, consult one or more of the following books: Measurement and
Evaluation in Teaching (7th ed., 1995), by Robert Linn and Norman
Gronlund; How to Make Achievement Tests and Assessments (5th ed.,
1993), by Norman Gronlund; Classroom Assessment: What Teachers Need
to Know (1995), by W. James Popham; Student-Centered Classroom
Assessment (1994), by Richard Stiggins; Classroom Assessment (2d ed.,
1994), by Peter Airasian; and Practical Aspects of Authentic
Assessment (1994), by Bonnie Campbell Hill and Cynthia Ruptic.
The Learning Research and Development Center (LRDC) at the
University of Pittsburgh publishes a large number of briefs, articles, and
reviews related to assessment and learning, particularly emphasizing
cognitive-based approaches. An online resource of the LRDC can be found
at http://www.lrdc.pitt.edu/publications.html. The most extensive on-line
database of assessment information is the ERIC/AE Test Locater, which is
found at www.cua.edu/www/eric_ae/testcol.html. It includes numerous
topics, reviews of tests, suggestions and digests relating to alternative
assessment, and broader standards and policy-making information as it
relates to evaluation and assessment of students.
Writing Higher-Level Questions
As Benjamin Bloom and others point out, teachers have a
disappointing tendency to write test items that reflect the lowest level of
the taxonomy: knowledge. To avoid this failing, carefully read Part 2
of Taxonomy of Educational Objectives: The Classification of Educational
Goals, Handbook I: Cognitive Domain (1956), edited by Benjamin Bloom,
Max Engelhart, Edward Furst, Walker Hill, and David Krathwohl. Each level
of the taxonomy is clearly explained and followed by several pages of
illustrative test items.