MEKELLE UNIVERSITY
MINISTRY OF EDUCATION
December, 2013
ADDIS ABABA
Assessment and Evaluation of Learning
Credit hours: 3
Module Introduction
This module is designed to equip you with the basic knowledge and skills needed to assess and evaluate student learning.
Throughout this module there are different in-text questions which may help you to pause your reading for a moment and reflect on what you are studying. In addition, there are many activities (at least one in each section) that you should attempt before proceeding from one section to the next. Therefore, you need to seriously try to reflect on and answer each question and activity if you are to develop a deep and meaningful understanding of the concepts under discussion and be a successful learner. You will also complete two assignments, graded out of 30%, and submit them to your course tutor.
Finally, I wish you a good and successful learning journey. You may start studying your
module right now.
Module Contents
1. Assessment: concept, purpose, and principles
2. Assessment strategies, methods, and tools
3. Item analysis
4. Interpretation of scores
5. Ethical standards of assessment
Module Icons
In this module the following icons (symbols) are used to facilitate your learning process.
1. This tells you there is a question to answer or think about in the text.
2. This tells you there is an introduction to the module, unit and section.
3. This tells you there is a checklist of the main points.
Unit 1
1.1 Introduction
1.2 Concepts
Dear learner, before you start studying educational assessment and evaluation, you need a clear understanding of certain related concepts. Having a clear understanding of the basic concepts is fundamental to learning the subsequent topics of the course. You might have come across the concepts test, measurement, assessment, and evaluation.
You might have found it difficult to come up with a clear distinction in meaning among these concepts. This is because they are concepts that may be involved in a single process. There is also some confusion and inconsistency in the usage of these concepts in the literature. Now let us see the meaning of these concepts as used in this module.
Test: Perhaps test is the concept you are most familiar with. You have been taking tests ever since you started schooling to determine your academic performance. Tests are also used in workplaces to select individuals for job vacancies. Thus, in an educational context, a test refers to the presentation of a standard set of questions to be answered by students. It is one instrument used to determine students' ability or performance in completing certain tasks or demonstrating mastery of a skill or knowledge of content. Please note that there are many other ways of collecting information about students' educational performance besides tests, such as observations, assignments, project work, portfolios, etc.
Measurement: In our day-to-day life there are different things that we measure. We measure our height and express it in meters and centimeters. We measure some of our daily consumables, like sugar in kilograms and liquids in liters. We measure temperature and express it in degrees Celsius. How do we measure these things? Well, we definitely need appropriate instruments, such as a meter stick, a scale, or a thermometer.
Evaluation: This concept refers to the process of judging the quality of student learning
on the basis of established performance standards and assigning a value to represent
the worthiness or quality of that learning or performance. It is concerned with
determining how well students have learned. When we evaluate, we are saying that
something is good, appropriate, valid, positive, and so forth. Evaluation is based on
assessment that provides evidence of student achievement at strategic times
throughout the grade/course, often at the end of a period of learning. Value is inherent
in the idea of evaluation.
Thus, evaluation may or may not be based on measurement (or tests), but when it is, it goes beyond the simple quantitative description of students' behavior. Evaluation involves value judgment. The quantitative values that we obtain through measurement will not have any meaning until they are evaluated against some standards or criteria. Educators are constantly evaluating students, and this is usually done in comparison with some standard. For example, if the objective of the lesson is for students to solve quadratic equations and if, having given them a test related to this objective, all learners are able to solve at least 80% of the problems, then the teacher may conclude that his or her teaching of the topic was quite successful.
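The criterion-based judgment in the example above can be sketched in a few lines of code. This is an illustrative sketch only: the student names and scores are hypothetical, and the 80% mastery criterion is taken from the example in the text.

```python
# Hypothetical results: the fraction of quadratic-equation problems
# each learner solved correctly (this is the measurement step).
results = {"Abel": 0.90, "Hana": 0.85, "Sara": 0.80, "Dawit": 0.95}

MASTERY = 0.80  # criterion from the lesson objective: solve at least 80%

# Evaluation step: judge the measurements against the criterion.
all_mastered = all(score >= MASTERY for score in results.values())
print("Teaching successful" if all_mastered else "Topic needs reteaching")
```

Note how the two steps are separate: the dictionary holds bare measurements, which acquire meaning only when compared against the mastery criterion.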
So, we can describe evaluation as the comparison of what is measured against some defined criteria to determine whether the objective has been achieved and whether the result is appropriate, good, reasonable, valid, and so forth. Evaluation accurately summarizes and communicates to parents, other teachers, employers, institutions of further education, and students themselves what students know and can do with respect to the overall curriculum expectations.
Now, let's summarize the differences and relationships among the four concepts.
Assessment is a much more comprehensive and inclusive concept than testing and measurement. It includes the full range of procedures (observations, rating of performances, paper-and-pencil tests, etc.) used to gain information about students' learning. It may include both quantitative descriptions (measurement) and qualitative descriptions (non-measurement) of students' behavior.
Evaluation, on the other hand, consists of making judgments about the level of students' achievement for purposes of grading and accountability and for making decisions about promotion and graduation. To make an evaluation, we need information, and it is obtained by measuring with a reliable instrument.
One of the first things to consider when planning for assessment is its purpose. Why is assessment important? Who will use the results? How will they use them? As a prospective teacher, you also need to have a clear idea of what purposes assessment serves. So let's discuss the following question:
Activity 4: Think-Pair-Share
Thus, based on the reasons for assessment described above, it can be summarized
that assessment in education focuses on:
With regard to the learner, assessment is aimed at providing information that will help
us make decisions concerning remediation, enrichment, selection, exceptionality,
progress and certification. Assessment for improved student learning requires a range
of assessment practices to be used with the following overarching purposes:
Assessment for learning: This type of assessment occurs while teaching and learning are in progress, rather than at the end. In assessment for learning, teachers use assessment evidence to monitor students' learning progress and inform their teaching. This form of assessment is designed to provide diagnostic information to teachers about students' prior knowledge and formative information about the effects of their instruction on student learning. It also provides students with important information about their learning and the effectiveness of the learning strategies they are using. It is the most important form of assessment with regard to student learning.
Assessment as learning: This form of assessment makes assessment part of, not separate from, the instructional process. Assessment as learning involves students in their own continuous self-assessment and is designed to help students become more self-directed learners. Self-assessment involves helping students set their own learning goals, monitor progress toward achieving these goals, and make adjustments in learning strategies as required. Students can be involved in assessment in a variety of ways, including helping establish criteria for success and developing rubrics to measure progress.
Assessment as learning also takes the form of peer assessment, with peer interaction and feedback. Although strategies for self- and peer assessment are less well developed than the other two forms of assessment, they are nonetheless very important for two reasons. First, these assessments help achieve what some see as the ultimate goal of education: developing independent learners. Second, students are often more receptive to feedback from peers than from their teachers.
Based on what you have learned from previous courses, reflect on the following questions.
1) Define educational objectives and learning outcomes.
2) How can we classify educational objectives? Describe Bloom's Taxonomy of Educational Objectives.
3) Discuss the importance of educational objectives to the instructional process.
As you might remember from your Secondary School Curriculum and Instruction course, the first step in planning any good teaching is to clearly define the educational objectives or outcomes. They are desirable changes in behavior, or outcome statements that capture specifically what knowledge, skills, and attitudes learners should be able to exhibit following the instructional process. Defining learning objectives is also essential to the assessment of students' learning. Effective assessment practice requires relating the assessment procedures as directly as possible to the learning objectives.
Educational objectives, commonly known as learning outcomes, play a key role in both the instructional process and the assessment process. They serve as guides for both teaching and learning, communicate the intent of instruction to others, and provide guidelines for assessing students' learning.
Educational objectives or learning outcomes are stated in terms of what the students are expected to be able to do at the end of the instruction. For instance, after teaching
them how to solve quadratic equations, we might expect students to have the skill of solving any quadratic equation. A learning outcome stated in this way clearly indicates the kind of performance students are expected to exhibit as a result of the instruction. It also makes clear the intent of our instruction and sets the stage for assessing students' learning. Well-stated learning outcomes contain three elements, namely: conditions of performance, observable behavior or action, and measurable criteria or standards. These elements help us make clear the types of student performance we are willing to accept as evidence that the instruction has been successful.
Bloom and his associates developed a taxonomy of educational objectives, which provides a practical framework within which educational objectives can be organized and measured. In this taxonomy, Bloom et al. (1956) divided educational objectives into three domains: the cognitive domain, the affective domain, and the psychomotor domain. Each domain is further categorized into hierarchical levels; that is, achievement of a higher level of skill assumes the achievement of the previous levels. This implies that a higher level of skill can be achieved only if a certain amount of the ability called for by the previous level has been achieved.
Cognitive domain: This involves those objectives that deal with the development of
intellectual abilities and skills. These have to do with the mental abilities of the brain. For
instance, you cannot apply what you do not know or comprehend.
Psychomotor domain: The psychomotor domain has to do with motor skills or abilities. It deals with activities that involve the use of the hands or the whole body. Can you think of such abilities or skills? Consider the skills involved in running, walking, swimming, jumping, eating, playing, throwing, etc.
Naturalization: having high-level performance become natural, without needing to think much about it.
Different educators and school systems have developed somewhat different sets of assessment principles. Miller, Linn and Gronlund (2009) have identified the following general principles of assessment.
Perhaps the assessment principles developed by the New South Wales Department of Education and Training (2008) in Australia are more inclusive than the principles listed by other educators. Let us look at these principles and compare them with those developed by Miller, Linn and Gronlund as described above.
should not be advantaged or disadvantaged by such differences that are not relevant to the knowledge, skills and understandings that the assessment is intended to address. Students have the right to know what is assessed, how it is assessed, and the worth of the assessment. Assessment will be fair or equitable only if it is free from bias or favoritism.
6. Assessment should be accurate. Assessment needs to provide evidence that accurately reflects an individual student's knowledge, skills and understandings. That is, assessments need to be reliable or dependable in that they consistently measure a student's knowledge, skills and understandings. Assessment also needs to be objective, so that if a second person assesses a student's work, he or she will come to the same conclusion as the first person. Assessment will be fair to all students if it is based on reliable, accurate and defensible measures.
7. Assessment should provide useful information. The focus of assessment is to establish
where students are in their learning. This information can be used for both summative
purposes, such as the awarding of a grade, and formative purposes to feed directly into
the teaching and learning cycle.
8. Assessment should be integrated into the teaching and learning cycle. Assessment
needs to be an ongoing, integral part of the teaching and learning cycle. It must allow
teachers and students themselves to monitor learning. From the teacher perspective, it
provides the evidence to guide the next steps in teaching and learning. From the
student perspective, it provides the opportunity to reflect on and review progress, and
can provide the motivation and direction for further learning.
9. Assessment should draw on a wide range of evidence. Assessment needs to draw on a
wide range of evidence. A complete picture of student achievement in an area of
learning depends on evidence that is sampled from the full range of knowledge, skills
and understandings that make up the area of learning. An assessment program that
consistently addresses only some outcomes will provide incomplete feedback to the
teacher and student, and can potentially distort teaching and learning.
Angelo and Cross (1993) have listed seven basic assumptions of classroom
assessment which are described as follows:
2. To improve their effectiveness, teachers need first to make their goals and objectives explicit and then to get specific, comprehensible feedback on the extent to which they are achieving those goals and objectives. Effective assessment begins with clear goals. Before teachers can assess how well their students are learning, they must identify and clarify what they are trying to teach. After teachers have identified the specific teaching goals they wish to assess, they can better determine what kind of feedback to collect.
4. The type of assessment most likely to improve teaching and learning is that conducted by teachers to answer questions they themselves have formulated in response to issues or problems in their own teaching. To best understand their students' learning, teachers need specific and timely information about the particular individuals in their classes. As a result of different students' needs, there is often a gap between assessment and student learning. One goal of classroom assessment is to reduce this gap.
During teaching, you will be assessing students' learning continuously. You will be interpreting what the students say and do in order to make judgments about their achievements. The ability to analyze students' learning is vital if you are to make appropriate teaching points that help the students develop their knowledge and/or competence. You will be using your subject knowledge to help you identify what to look for and where to take the student next. You will need to listen, observe and question in ways which will enable you to give appropriate feedback or further instruction.
o when they use it to become aware of the knowledge, skills, and beliefs that their students bring to a learning task, and
o when they use this knowledge as a starting point for new instruction, and monitor students' changing perceptions as instruction proceeds.
When learning is the goal, teachers and students collaborate and use ongoing assessment and pertinent feedback to move learning forward. When classroom assessment is frequent and varied, teachers can learn a great deal about their students. They can gain an understanding of students' existing beliefs and knowledge, and can identify incomplete understandings, false beliefs, and immature interpretations of concepts that may influence or distort learning. Teachers can observe and probe students' thinking over time, and can identify links between prior knowledge and new learning.
Learning is also enhanced when students are encouraged to think about their own learning, to review their experiences of learning, and to apply what they have learned to their future learning. Assessment provides the feedback loop for this process. When students (and teachers) become comfortable with a continuous cycle of feedback and adjustment, students begin to internalize the process of standing outside their own learning and considering it against a range of criteria, not just the teacher's judgment about quality or accuracy. When students engage in this ongoing metacognitive experience, they are able to monitor their learning along the way, make corrections, and develop a habit of mind for continually reviewing and challenging what they know.
According to current cognitive research, people are motivated to learn by success and competence. When students feel ownership and have choice in their learning, they are more likely to invest time and energy in it. Assessment can be a motivator, not through reward and punishment, but by stimulating students' intrinsic interest. Assessment can enhance student motivation by:
reinforcing the idea that students have control over, and responsibility for, their own learning.
When students learn, they make meaning for themselves, and they approach learning tasks in different ways. They bring with them their own understanding, skills, beliefs, hopes, desires, and intentions. It is important to consider each individual student's learning, rather than talk about the learning of the class. Assessment practices lead to differentiated learning when teachers use them to gather evidence to support every student's learning, every day in every class. The learning needs of some students may require individualized learning plans.
There is strong evidence that involving students in the assessment process can have
very definite educational benefits. Now stop reading for a moment and reflect on the
following questions.
2) In what ways can students benefit if they are involved in the assessment
process? What do you think are the practical challenges in involving students in
assessment?
One way in which we can involve our students in the assessment process is to establish
the standards or assessment criteria together with them. This will help students
understand what is to be assessed. Working with students to develop assessment tools
is a powerful way to help students build an understanding of what a good product or
performance looks like. It helps students develop a clear picture of where they are
going, where they are now and how they can close the gap. This does not mean that
each student creates his or her own assessment criteria. You, as a teacher, have a
strong role to play in guiding students to identify the criteria and features of
understandings you want your students to develop.
Another important aspect is to involve students in trying to apply the assessment criteria themselves. The evidence is that by trying to apply criteria, or by marking against a model answer, students gain much greater insight into what is actually required, and their own work subsequently improves in the light of this.
An additional benefit is that it may allow students to receive feedback on more learning activities than the teacher would otherwise have time to provide.
There are different ways in which students can be involved in this type of assessment: self-assessment and peer assessment. Self-assessment involves students judging their own work. It begins with students understanding the learning intentions or objectives for the particular lesson and the success criteria for the specific task or activity. It develops into students' awareness of their own strengths and weaknesses in a particular subject (and as learners in general) and the ability to identify their own next steps or targets. Self-assessment allows students to think more carefully about what they do and do not know, and what they additionally need to know to accomplish certain tasks.
Assessment requires a great deal of a teacher's professional time, both inside and outside the classroom. Therefore, a teacher should have some basic competencies in classroom assessment so as to be able to effectively assess his/her students' learning.
Assessment during instruction provides information about the overall progress of the whole class as well as specific information about individual students. These assessment activities provide the basis for monitoring progress during learning.
Following the teaching of a specific unit, semester, academic year, or the like, decisions must be made about the achievement of short- and long-term instructional goals. This
In the American education system a list of seven standards for teacher competence in
educational assessment of students has been developed. These standards for teacher
competence in student assessment have been developed with the view that student
assessment is an essential part of teaching and that effective teaching cannot exist
without appropriate student assessment. The seven standards articulating teacher
competence in the educational assessment of students are described below.
5. Teachers should be skilled in developing valid student grading procedures that use
pupil assessments. Grading students is an important part of professional practice for
teachers.
In the Ethiopian context, the MoE has also developed such assessment related
competences which professional teachers are expected to possess. These key
competencies are:
1) Assess student learning
2) Provide feedback to students on their learning
3) Interpret student data
4) Make consistent and comparable judgments
5) Report on student achievement
In small groups:
2) Discuss and report on the importance and use of having standards of teacher
competence in assessment for a particular school and the whole education
system in general.
Unit Summary
Test, measurement, assessment and evaluation are concepts that are frequently
used in the area of educational assessment and evaluation, often with varying
meanings and some confusion. However, although they overlap, they vary in
scope and have different meanings.
Assessment serves many important purposes including: informing and guiding
teaching and learning; helping students set learning goals; assigning report card
grades; motivating students.
References
Angelo, T.A. & Cross, K.P. (1993). Classroom Assessment Techniques: A Handbook for College Teachers. 2nd ed. San Francisco: Jossey-Bass Publishers.
Braun, H., Kanjee, A., Bettinger, E., & Kremer, M. (2006). Improving Education through Assessment, Innovation, and Evaluation. American Academy of Arts and Sciences.
Ellis, V. (Ed.). (2007). Learning and Teaching in Secondary Schools. 3rd ed. Learning Matters Ltd.
Mehrens, W.A. & Lehman, I.J. Measurement and Evaluation in Education. 4th ed. New York: Harcourt Brace College Publishers.
Miller, D.M., Linn, R.L., & Gronlund, N.E. (2009). Measurement and Assessment in Teaching. 10th ed. Upper Saddle River: Pearson Education, Inc.
UNIT TWO
ASSESSMENT STRATEGIES, METHODS, AND TOOLS
2.1 Introduction
In the previous unit you were introduced to the major concepts of educational assessment and evaluation. You also learned about the purposes and principles of assessment. In this unit you will learn about the nature, strengths and weaknesses of the various assessment strategies, methods and tools that can be used in the context of secondary education. You will also learn about planning, constructing and administering classroom tests.
Learning Outcomes
At the end of this unit you should be able to:
assessment procedures are likely to be used in the classroom. The most commonly referred to and used categories in this regard are formative assessment and summative assessment. Can you differentiate these concepts? Please try to describe them before proceeding to the following section.
Formative assessment is also known by the name "assessment for learning." The basic idea of this concept is that the primary purpose of assessment should be to enhance students' learning.
There is still another name associated with the concept of formative assessment: continuous assessment. Continuous assessment (as opposed to terminal assessment) is based on the premise that if assessment is to help students improve their learning, and if a teacher is to determine the progress of students
In order to assess your students' understanding, there are various strategies that you can use. Can you mention some of the strategies you can use to assess your students for formative purposes? Please try to mention as many strategies as you can. The following are some of the assessment strategies you can employ in your classroom. For example, you can:
o ask students to summarize the main ideas they've taken away from your presentation, discussion, or assigned reading.
Tests and homework can also be used formatively if teachers analyze where students
are in their learning and provide specific, focused feedback regarding performance and
ways to improve it.
A particular assessment task can be both formative and summative. For example,
students could complete unit 1 of their Module and complete an assessment task for
which they earned a mark that counted towards their final grade. In this sense, the task
is summative. They could also receive extensive feedback on their work. Such feedback
would guide learners to achieve higher levels of performance in subsequent tasks. In
this sense, the task is formative because it helps students form different approaches
and strategies to improve their performance in the future.
Assessment can also be either formal or informal. Let us try to understand their
differences in the paragraphs that follow.
Formal assessment: Formal assessments are those in which students are aware that the task they are doing is for assessment purposes. They are frequently used in summative assessment. A formal assessment usually implies a written document, such as a test, quiz, or paper, and is given a numerical score or grade based on student performance. We will deal more with formal assessment strategies, particularly tests, in a later section.
Informal assessment: "Informal" is used here to indicate techniques that can easily be incorporated into classroom routines and learning activities. Informal assessment techniques can be used at any time without interfering with instructional time. Their results are indicative of the student's performance on the skill or subject of interest. Thus they are more frequently used in formative assessment. Can you think of the informal assessment strategies that you can use in your classes? What informal assessment strategies did your teachers use when you were a student?
An informal assessment usually occurs in a more casual manner and may include observation, inventories, checklists, rating scales, rubrics, performance and portfolio assessments, participation, peer and self-evaluation, and discussion. Formal tests
assume a single set of expectations for all students and come with prescribed criteria for scoring and interpretation. Informal assessment, on the other hand, requires a clear understanding of the levels of ability the students bring with them. Only then may assessment activities be selected that students can reasonably attempt. Informal assessment seeks to identify the strengths and needs of individual students without regard to grade or age norms.
Methods for informal assessment can be divided into two main types: unstructured (e.g., student work samples, journals) and structured (e.g., checklists, observations). The unstructured methods are frequently somewhat more difficult to score and evaluate, but they can provide a great deal of valuable information about the skills of the students. Structured methods can be reliable and valid techniques when time is spent creating the "scoring" procedures. Another important aspect of informal assessments is that they actively involve the students in the evaluation process; they are not just paper-and-pencil tests.
How the results of tests and other assessment procedures are interpreted also provides a method of classifying these instruments. There are two ways of interpreting student performance: criterion-referenced and norm-referenced.
Norm-referenced Assessment: This type of assessment determines student
performance based on a position within a cohort of students (the norm group). It is
most appropriate when one wishes to make comparisons across large numbers of
students or important decisions regarding student placement and advancement. For
example, students' results in grade 8 national exams in our country are determined
based on their relative standing in comparison to all other students who have taken
the exam. Thus, when we say that a student has scored at the 80th percentile, it does
not mean that the student has scored 80% on the exam. Rather, it means that the
student's score is higher than the scores of 80% of the students who took the exam,
while the remaining 20% of students have scored above that particular student.
Assigning ranks to students is another example of norm-referenced interpretation of
student performance. The focus of attention in this type of assessment is on how well
the student has done on a test in comparison with other students.
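The percentile idea above can be made concrete with a short computation. This is only a sketch of how a percentile rank is obtained, not how any examination agency actually computes it, and the cohort scores below are invented for illustration.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in the cohort that fall below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)

# Hypothetical cohort of 20 exam scores (illustrative only).
cohort = [35, 41, 44, 48, 50, 52, 55, 57, 60, 61,
          63, 64, 66, 68, 70, 72, 75, 78, 82, 90]

# A raw score of 75 here outperforms 16 of the 20 students: the 80th percentile.
print(percentile_rank(75, cohort))  # → 80.0
```

Note that the raw score (75 out of 100) and the percentile rank (80) are different numbers; the percentile says only where the student stands relative to the cohort.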
Divergent assessments are those for which a range of answers or solutions might be
considered correct. For example, a Civics teacher might ask his/her students to
compare presidential and parliamentary forms of government as preferable forms of
government for a country. A student might favor a presidential form of government by
providing sound arguments and valid examples. Another student also might come up
with still convincing ideas favoring parliamentary form of government. In both cases the
answers are different but convincingly correct. So in divergent assessments there might
not be one single correct answer. Divergent assessment tools include essay tests and
solutions to open-ended problems.
Convergent assessments are those which have only one correct response that the
student is trying to reach. They are generally easier to mark, tend to be quicker to
deliver, and give more specific and directed feedback to individuals. They can also provide
wide curriculum coverage. Objective test items are the best example and demonstrate
the value of this approach in assessing knowledge.
When selecting assessment strategies in our subject areas, there are a number of
things that we have to consider. First and foremost, it is important that we choose an
assessment technique appropriate for the particular behavior being assessed. We have
to use a strategy that can give students an opportunity to demonstrate the kind of
behavior that the learning outcome demands. Assessment strategies should also be
related to the course material and relevant to students' lives. Therefore, we should also
choose assessment strategies that relate to students' future work.
There are many different ways to categorize learning goals for students. One way in
which the different learning outcomes that we want our students to develop can be
categorized is presented as follows:
Knowledge and understanding: What facts do students know outright? What
information can they retrieve? What do they understand?
Reasoning proficiency: Can students analyze, categorize, and sort into
component parts? Can they generalize and synthesize what they have
learned? Can they evaluate and justify the worth of a process or decision?
Skills: We have certain skills that we want students to master such as reading
fluently, working productively in a group, making an oral presentation,
speaking a foreign language, or designing an experiment.
Ability to create products: Another kind of learning target is student-created
products - tangible evidence that the student has mastered knowledge,
reasoning, and specific production skills. Examples include a research paper,
a piece of furniture, or artwork.
Dispositions: We also frequently care about student attitudes and habits of
mind, including attitudes toward school, persistence, responsibility, flexibility,
and desire to learn.
From among the various assessment strategies that can be used by classroom
teachers, some are described below for your consideration as student teachers.
What other educational activities can you imagine in your subject area
where students can present their work?
Interviews: You should be familiar with the interviews journalists conduct with different
personalities. An interview can also be used for assessment purposes in educational
settings. In such applications interview is a face-to-face conversation in which teacher
and student use inquiry to share their knowledge and understanding of a topic or
problem. This form of assessment can be used by the teacher to:
explore the student's thinking;
assess the student's level of understanding of a concept or procedure; and
gather information, obtain clarification, determine positions, and probe for
motivations.
Observation: Observation is a process of systematically viewing and recording
students while they work, for the purpose of making instructional decisions. Observation
can take place at any time and in any setting. It provides information on students'
strengths and weaknesses, learning styles, interests, and attitudes. Observations may
be informal or highly structured, and incidental or scheduled over different periods of
time in different learning contexts.
Performance tasks: During a performance task, students create, produce, perform, or
present works on "real world" issues. The performance task may be used to assess a
skill or proficiency, and provides useful information on the process as well as the
product. Please mention some examples of performance tasks that students can do in
your subject area.
Questions and answers: Perhaps, this is a widely used strategy by teachers with the
intention of involving their students in the learning and teaching process. In this
46
Assessment and Evaluation of Learning
strategy, the teacher poses a question and the student answers verbally, rather than in
writing. This strategy helps the teacher to determine whether students understand what
is being, or has been, presented; it also helps students to extend their thinking,
generate ideas, or solve problems. Strategies for effective question and answer
assessment include:
Apply a wait time or 'no hands-up rule' to provide students with time to think after
a question before they are called upon randomly to respond.
Ask a variety of questions, including open-ended questions and those that
require more than a right or wrong answer.
During what time of the lesson do you think question and answer strategy
will be more useful? Why?
Checklists, Rating Scales and Rubrics: These are tools that state specific criteria and
allow teachers and students to gather information and to make judgments about what
students know and can do in relation to the outcomes. They offer systematic ways of
collecting data about specific behaviors, knowledge and skills.
Rating Scales allow teachers to indicate the degree or frequency of the behaviors,
skills and strategies displayed by the learner. Rating scales state the criteria and
provide three or four response selections to describe the quality or frequency of student
work.
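As a rough sketch of what a rating scale looks like as a working tool, the fragment below represents a hypothetical four-point scale for an oral presentation. The criteria, descriptors, and function names are invented for illustration, not taken from any official instrument.

```python
# A minimal sketch of a four-point rating scale (hypothetical descriptors).
SCALE = {1: "rarely", 2: "sometimes", 3: "usually", 4: "consistently"}

# Hypothetical criteria for an oral presentation.
criteria = ["speaks clearly", "supports claims with evidence", "engages the audience"]

def score_observation(ratings):
    """ratings maps each criterion to a point on the 1-4 scale."""
    assert set(ratings) == set(criteria), "every criterion must be rated"
    total = sum(ratings.values())
    return total, total / len(criteria)

total, average = score_observation(
    {"speaks clearly": 3, "supports claims with evidence": 2, "engages the audience": 4})
print(total, average)  # → 9 3.0
```

The point of stating the scale and criteria in advance, as the text emphasizes, is that every student is judged against the same explicit descriptors rather than an impression formed on the spot.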
One- Minute paper: During the last few minutes of the class period, you may ask
students to answer on a half-sheet of paper: "What is the most important point you
learned today?" and, "What point remains least clear to you?" The purpose is to obtain
data about students' comprehension of a particular class session. Then you can review
responses and note any useful comments. During the next class periods you can
emphasize the issues illuminated by your students' comments.
Muddiest Point: This is similar to One-Minute Paper but only asks students to
describe what they didn't understand and what they think might help. It is an important
technique that will help you to determine which key points of the lesson were missed by
the students. Here also you have to review before next class meeting and use to clarify,
correct, or elaborate.
Student- generated test questions: You may allow students to write test questions
and model answers for specified topics, in a format consistent with course exams. This
will give students the opportunity to evaluate the course topics, reflect on what they
understand, and what good test items are. You may evaluate the questions and use the
good ones as prompts for discussion.
Tests: This is the type of assessment that you are mostly familiar with. A test requires
students to respond to prompts in order to demonstrate their knowledge (orally or in
writing) or their skills (e.g., through performance). We will learn much more about tests
later in this section.
Activity: Let's say you need to assess student achievement on each of the following
learning targets. Which assessment strategy would you choose? Please
jot down your answers with their justifications and file it in your portfolio
for later reference.
1. Ability to write clearly and coherently
2. Group discussion proficiency
3. Reading comprehension
4. Proficiency using specified mathematical procedures
5. Proficiency conducting investigations in science
Activity: What problems do you think teachers will face when assessing
students in large classes? In your school years, what problems
did you observe?
The existing educational literature has identified various assessment issues associated
with large classes. They include:
a) Surface Learning Approach: Traditionally, teachers rely on time-efficient and
exam-based assessment methods for assessing large classes, such as multiple-choice
and short-answer examinations. These assessments often only
assess learning at the lower levels of intellectual complexity. Furthermore, students
tend to adopt a surface rote learning approach when preparing for these kinds of
assessment methods. Higher-level learning outcomes such as critical thinking and
analysis are often not fully assessed.
d) Plagiarism: To minimize plagiarism, assessment tasks must be well thought out and
well-designed.
e) Lack of interaction and engagement: Students are often not motivated to engage
in a large-sized lecture. When teachers raise questions in large classes, not many
students are willing to respond. Students are less likely to interact with teachers
because they feel less motivated and tend to hide themselves in a large group. In
fact, interacting with students in class is important for teachers because they can
receive immediate feedback from students regarding their quality of teaching.
Although these issues can be problems in assessment for any class size, they are
worse in large classes because of the additional limitation and strain on resources. They
are problems that are applicable whether the function of the assessment is to facilitate
learning via feedback, or to classify students via grading.
There are a number of ways to make the assessment of large numbers of students
more effective whilst still supporting effective student learning. These include:
1. Front ending: The basic idea of this strategy is that by putting in an increased effort
at the beginning in setting up the students for the work they are going to do, the work
submitted can be improved. Therefore the time needed to mark it is reduced (as well
as time being saved through fewer requests for tutorial guidance).
2. Making use of in-class assignments: In-class assignments are usually quick and
therefore relatively easy to mark and provide feedback on, and help you to identify
gaps in understanding. Students could be asked to complete a task within the
timeframe of a scheduled lecture, field exercise or practical class. This might be a
very quick task, for example, completing a graph, doing some calculations,
answering some quick questions, making brief notes on a piece of text etc. In some
cases it might be possible to merge the in-class assignment with peer assessment.
3. Self- and peer-assessment: Students can perform a variety of assessment tasks in
ways which both save the tutor's time and bring educational benefits, especially
the development of their own judgment skills. These include self assessment and
peer assessment strategies.
Self-assessment reduces the marking load because it ensures a higher quality of work
is submitted, thereby minimizing the amount of time expended on marking and
feedback. The emphasis on student self-assessment represents a fundamental shift in
the teacher-student relationship, placing the primary responsibility for learning with the
student. However, there are problems involved in self-assessment for grading purposes
pertaining to their validity and reliability. If self-assessment is utilized for the purposes of
grading, it is imperative to employ peer or staff cross-marking to ensure the validity of
the results. Self-assessment should also be confined to certain limited objectives such
as ascertaining whether all of the required components of an answer are present, or the
articulation of very transparent assessment criteria and standards, possibly
accompanied by examples of work of varying standards. In this regard, self-assessment
can decrease the marking load of teachers and provide students with a positive learning
experience by compelling them to examine their work from the perspective of a marker
as well as a participant.
4. Peer marking: Students can also mark one another's work against model answers
or marking sheets you provide. This brings several benefits:
• students get to see how their peers have tackled a particular piece of
work,
• they can see how you would assess the work (e.g. from the model
answers/answer sheets you've provided); and
• they are put in the position of being an assessor, thereby giving them an
opportunity to internalize the assessment criteria.
5. Changing the assessment method, or at least shortening it: Being faced with large
numbers of students will present challenges but may also provide opportunities to
either modify existing assessments or to explore new methods of assessment. You might,
for example, be able to reduce the length of the assessment task you are currently
using without detracting from your module's learning outcomes. Alternatively a large
class may provide a new opportunity to make use of peer and self-assessment.
Assignment: Visit any one of the schools in your vicinity and interview at least three
teachers in your subject area using questions you have prepared
for the purpose. The questions should be related to 1) the
problems they have faced in assessing students in large classes;
and 2) the strategies they have used to tackle the problems.
Based on the information you have collected prepare a report of
1-2 pages. You have to file the report as part of your portfolio.
Activity: List all of the forms of assessment that you have experienced during
your school years. Are there other approaches to assessment with which
you are familiar even if you haven't personally experienced them as a
student?
A wide variety of tools are available for assessing student performance and there are
approaches that are suitable for any educational objective you want to test. Examples
include objective exams, short answer and essay exams, portfolios, projects, practical
exams, presentations, and combinations of these. Appropriate tools or combinations of
tools must be selected and used if the assessment process is to successfully provide
information relevant to stated educational outcomes.
Does this assessment tool enable students with different learning styles or
abilities to show you what they have learned and what they can do?
Does the content examined by the assessment align with the content from the
course?
Will the data accurately represent what the student can do in an authentic or real
life situation?
Does the assessment provide data that is specific enough for the desired
outcomes?
Will the information derived from the assessment help to improve teaching and
learning?
2. Clearly Focused and Appropriate Purpose. WHY are we assessing? How will the
assessment information be used? By whom? To make what decisions? Will the
cultural and linguistic traits of the user interfere with the intended purpose of the
assessment?
5. Fairness and Freedom from Biases that Distort the Picture of Learning. HOW
ACCURATE are the assessments? Do they really assess what we think they're
assessing? Is there anything about the way a target is assessed that masks the true
learning of a student or group of students? Do we know the strengths that students
bring to learning and use those strengths in our assessments?
Planning a classroom test that is both practical and effective in providing evidence of
mastery of the instructional objectives and content covered requires careful thought.
Hence, the following serves as a guide in planning a classroom test.
i. Determine the purpose of the test;
The instructional objectives of the course are critically considered while developing the
test items. This is because the instructional objectives are the intended behavioural
changes or intended learning outcomes of instructional programs which students are
expected to possess at the end of the instructional process. The objectives are also
given relative weight in respect to the level of importance and emphasis given to them.
Educational objectives and the content of a course are the focus on which test
development is based. I hope you remember from our brief discussion in Chapter One
about how educational objectives are classified.
Table of Specification
A table of specification is a two-way table that matches the objectives and content you
have taught with the level at which you expect your students to perform. It contains an
estimate of the percentage of the test to be associated with each topic at each level at
which it is to be measured. In effect, we establish how much emphasis to give to each
objective or content. A table of specification guides the selection of test items which in
effect ensures that the test measures a representative sample of instructionally relevant
tasks.
2. Outlining the contents of instruction, i.e. the area in which each type of
performance is to be shown, and
3. Preparing the two way chart that relates the learning outcomes to the
instructional content.
Now, let us try to understand how a test blueprint is developed using the following table
of specification developed for a Geography test as an example.
Instructional Objectives

Contents       Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total  Percent
Air pressure       2            2             1           1         -          -          6     24%
Wind               1            1             1           1         -          -          4     16%
Temperature        2            2             1           1         -          1          7     28%
Rainfall           1            2             1           -         1          -          5     20%
Clouds             1            1             -           1         -          -          3     12%
Total              7            8             4           4         1          1         25
Percent           28%          32%          16%         16%        4%         4%               100%
As can be observed from the table, the rows show the content areas from which the test
is to be sampled; and the columns indicate the level of thinking students are required to
demonstrate in each of the content areas. Thus, the test items are distributed among
each of the five content areas with their corresponding representation among the six
levels of the cognitive domain. The percentage row and column also show the degree
of representation of both the contents and the levels of the cognitive domain in this
particular test. Thus, objectives you consider more important should get more
representation in the test items. Similarly, content areas on which you have spent more
instructional time should be allotted more test items.
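The arithmetic behind a table of specification can be checked programmatically. The sketch below re-creates the Geography blueprint above as data and recomputes the row and column totals and percentages; the variable names are my own, not part of any standard.

```python
# The Geography table of specification as data: one row of item counts per
# content area, one column per cognitive level.
levels = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]
blueprint = {
    "Air pressure": [2, 2, 1, 1, 0, 0],
    "Wind":         [1, 1, 1, 1, 0, 0],
    "Temperature":  [2, 2, 1, 1, 0, 1],
    "Rainfall":     [1, 2, 1, 0, 1, 0],
    "Clouds":       [1, 1, 0, 1, 0, 0],
}

total_items = sum(sum(row) for row in blueprint.values())
print(total_items)  # → 25

# Percent of the test devoted to each content area (the Percent column).
for topic, row in blueprint.items():
    print(topic, f"{100 * sum(row) / total_items:.0f}%")

# Items per cognitive level (the Total row).
col_totals = [sum(row[i] for row in blueprint.values()) for i in range(len(levels))]
print(col_totals)  # → [7, 8, 4, 4, 1, 1]
```

A check like this catches the most common blueprint mistake: marginal totals and percentages that no longer agree with the cell counts after the table has been revised.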
There are also other ways of developing a test blueprint. One of these involves
showing the distribution of test items among the content areas and the type of test items
to be developed from each content area. For example, the table of specification that we
have seen earlier can be prepared in the following way.
Item Types

Contents       True/False  Matching  Short Answer  Multiple Choice  Total  Percent
Air pressure        1          1           1              3            6     24%
Wind                1          1           1              1            4     16%
Temperature         1          2           1              3            7     28%
Rainfall            1          1           1              2            5     20%
Clouds              1          -           1              1            3     12%
Total               5          5           5             10           25
Percent            20%        20%         20%            40%                100%
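When instructional emphasis rather than a finished blueprint is the starting point, item counts can be derived proportionally. The sketch below uses hypothetical time weights (periods of instruction per topic) and simple largest-remainder rounding; it is an illustration of the proportionality principle, not a prescribed procedure.

```python
# Allocate a fixed number of test items across content areas in proportion to
# instructional emphasis, using largest-remainder rounding.
def allocate_items(weights, n_items):
    raw = {k: w * n_items / sum(weights.values()) for k, w in weights.items()}
    counts = {k: int(v) for k, v in raw.items()}  # truncate to whole items
    leftover = n_items - sum(counts.values())
    # Hand the remaining items to the largest fractional remainders.
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    return counts

# Hypothetical periods of instruction spent on each topic.
weights = {"Air pressure": 6, "Wind": 4, "Temperature": 7, "Rainfall": 5, "Clouds": 3}
print(allocate_items(weights, 25))
# → {'Air pressure': 6, 'Wind': 4, 'Temperature': 7, 'Rainfall': 5, 'Clouds': 3}
```

With these weights the allocation reproduces the 6/4/7/5/3 split used in the tables above, which is exactly the point: content areas given more instructional time receive proportionally more items.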
Assessment is a critical component of instruction and with careful use can assist in
achieving curricular goals. Considering the powerful effects of assessments it is very
important that testing tools should be carefully chosen and formulated to provide
constructive feedback to the students and teachers about students' competence and
deficiencies. However, writing high quality test questions is not an easy task. Following
are some general principles that we should consider when constructing written test
items.
1) Make the instructions for each type of question simple and brief.
2) Use simple and clear language in the questions. If the language is difficult,
students who understand the material but who do not have strong language skills
may find it difficult to demonstrate their knowledge. If the language is ambiguous,
even a student with strong language skills may answer incorrectly if his or her
interpretation of the question differs from the instructor's intended meaning.
3) Write items that require specific understanding or ability developed in that
subject.
4) Do not suggest the answer to one question in the body of another question. This
makes the test less useful, as the test-wise student will have an advantage.
Your colleagues should be able to provide the correct response to the questions. If the
correct answer can be given only by the question writer, and other teachers of the same
level cannot achieve a passing mark, this indicates that either the information being
asked for will never be used by the learner, or the question is framed in an ambiguous
manner or is too difficult.
ii. Constructing Objective Test Items
There are various types of objective test items. These can be classified into those that
require the student to supply the answer (supply type items) and those that require the
student to select the answer from a given set of alternatives (selection type items).
Supply type items include completion items and short answer questions. Selection type
test items include True/False, multiple choice and matching.
Each type of test has its unique characteristics, uses, advantages, limitations, and rules
for construction.
The main advantage of true/false items is that they do not require much time from the
student to answer. This allows a teacher to cover a wide range of content by using a
large number of such items. In addition, true/false test items can be scored quickly,
reliably, and objectively by anybody using an answer key.
The major disadvantage of true/false items is that when they are used exclusively, they
tend to promote memorization of factual information: names, dates, definitions, and so
on. Some argue that another weakness of true/false items is that they encourage
students to guess. This is because any student who takes such a test has a 50 percent
chance of getting the right answer on each item by guessing alone. In addition, true/false items:
Can often lead a teacher to write ambiguous statements due to the difficulty of
writing statements which are clearly true or false
Do not discriminate between students of varying ability as well as other item types do
Can often include more irrelevant clues than do other item types
Can often lead a teacher to favor testing of trivial knowledge
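The 50 percent figure can be extended from a single item to a whole test: the number of items a pure guesser answers correctly follows a binomial distribution, so the chance of reaching any given mark by guessing alone can be computed directly. A short sketch:

```python
from math import comb

# Chance that pure guessing earns at least `pass_mark` correct answers on a
# True/False test of n_items: each item is an independent 50/50 guess, so the
# count of correct guesses is Binomial(n_items, 0.5).
def p_pass_by_guessing(n_items, pass_mark):
    return sum(comb(n_items, k) for k in range(pass_mark, n_items + 1)) / 2 ** n_items

print(round(p_pass_by_guessing(10, 5), 3))  # → 0.623  (at least 50% correct)
print(round(p_pass_by_guessing(10, 8), 3))  # → 0.055  (at least 80% correct)
```

So on a ten-item True/False test a guesser is more likely than not to get half the items right, though a high mark by guessing alone remains unlikely; this is why a large number of items (and a pass mark well above 50%) is needed before True/False scores become trustworthy.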
The following suggestions might perhaps help teachers to construct good quality
true/false test items.
Avoid negative statements, and never use double negatives. In Right-Wrong or
True-False items, negatively phrased statements make it needlessly difficult for
students to decide whether that statement is accurate or inaccurate.
Restrict single-item statements to single concepts. If you double up two
concepts in a single item statement, how does a student respond if one concept is
accurate and the other isn't?
Use an approximately equal number of items, reflecting the two categories
tested. If you typically overbook on false items in your True-False tests, students
who are totally at sea about an item will be apt to opt for a false answer and will
probably be correct.
Make statements representing both categories equal in length. Again, to avoid
giving away the correct answers, don't make all your false statements brief while (in
an effort to include necessary qualifiers) making all your true statements long.
Students catch on quickly to this kind of test-making tendency.
b) Matching Items
A matching item consists of two lists of words or phrases. The test-taker must match
components in one list (the premises, typically presented on the left) with components in
the other list (the responses, typically presented on the right), according to a particular
kind of association indicated in the item's directions.
Like True-False items, matching items can cover a good deal of content in an efficient
fashion. They are a good choice if you're interested in finding out if your students have
memorized factual information. Matching items sometimes can work well if you want
your students to cross-reference and integrate their knowledge regarding the listed
premises and responses.
The major advantage of matching items is their compact form, which makes it possible
to measure a large amount of related factual material in a relatively short time. Another
advantage is their ease of construction.
The main limitation of matching test items is that they are restricted to the measurement
of factual information based on rote learning. Another limitation is the difficulty of finding
homogeneous material that is significant from the perspective of the learning outcomes.
As a result test constructors tend to include in their matching items material which is
less significant.
The following suggestions are important guidelines for the construction of good
matching items.
Use fairly brief lists, placing the shorter entries on the right. If the premises and
responses in a matching item are too long, students tend to lose track of what they
originally set out to look for. The words and phrases that make up the premises
should be short, and those that make up the responses should be shorter still.
Employ homogeneous lists. Both the list of premises and responses must be
composed of similar sorts of things. If not, an alert student will be able to come up
with the correct associations simply by elimination, because some entries in the
premises or responses will clearly stand out from the others.
Include more responses than premises. If you use the exact same number of
responses as premises in a matching item, then a student who knows half or more
of the correct associations is in a position to guess the rest of the associations with
a very good chance of success.
List responses in a logical order. This rule is designed to make sure you don't
accidentally give away hints about which responses connect with which premises.
Choose a logical ordering scheme for your responses (say, alphabetical or
chronological) and stick with it.
Describe the basis for matching and the number of times a response can be
used. To satisfy this rule, you need to make sure your test's directions clarify the
nature of the associations you want students to use when they identify matches.
Regarding the students' use of responses, a phrase such as the following is often
employed: Each response in the list at the right may be used once, more than
once, or not at all.
Try to place all premises and responses for any matching item on a single
page. This rule's intent is to free your students from lots of potentially confusing
flipping back and forth in order to accurately link responses to premises.
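The advantage of including more responses than premises can be illustrated with a small simulation. The scenario below is hypothetical: an eight-premise matching item where the student already knows four associations and guesses the rest at random. It estimates the chance of guessing all the remaining matches correctly.

```python
import random

# Monte Carlo estimate of the probability that random guessing assigns all
# remaining premises their correct responses. Without loss of generality the
# correct leftover responses are labelled 0, 1, 2, ... in order.
def p_all_guesses_right(n_premises, n_responses, n_known, trials=100_000):
    rng = random.Random(42)                 # fixed seed for reproducibility
    unknown = n_premises - n_known          # premises still to match
    pool = n_responses - n_known            # responses left after the known ones
    hits = 0
    for _ in range(trials):
        guess = rng.sample(range(pool), unknown)  # one random assignment
        if guess == list(range(unknown)):
            hits += 1
    return hits / trials

# Equal lists: 4 unknown premises among 4 leftover responses.
print(p_all_guesses_right(8, 8, 4))   # ≈ 1/24 ≈ 0.042
# Two extra responses: 4 unknown premises among 6 leftover responses.
print(p_all_guesses_right(8, 10, 4))  # ≈ 1/360 ≈ 0.003
```

With equal-length lists, a student who knows half the associations guesses the rest correctly about 1 time in 24; with just two extra responses that drops to about 1 in 360, which is why the rule recommends more responses than premises.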
Short-answer items and completion items are essentially the same: both can be
answered with a word, phrase, number, or formula. They differ only in the way the
problem is presented. The short-answer type uses a direct question, whereas the
completion item consists of an incomplete statement that the student must complete. This can be
demonstrated by the following examples:
Short answer item: In which year did the Ethiopians defeat the Italian invaders at Adwa?
Completion item: The Ethiopian forces defeated the Italian invaders at Adwa in the year _____.
The short-answer test item is one of the easiest to construct, partly because of the
relatively simple learning outcomes it usually measures. Except for the problem-solving
outcomes measured in Mathematics and Science, it is used almost exclusively to
measure the recall of memorized information.
A more important advantage of the short-answer item is that the students must supply
the answer. This reduces the possibility that students will obtain the correct answer by
guessing. They must either recall the information requested or make the necessary
computations to solve the problem presented to them. Partial knowledge, which might
enable them to choose the correct answer on a selection item, is insufficient for
answering a short answer test item correctly.
There are two limitations cited in the use of short-answer test items. One is that they are
unsuitable for assessing complex learning outcomes. The other is the difficulty of
scoring. This is especially true where the item is not phrased clearly enough to require a
single definitely correct answer, and where students' spelling errors complicate marking.
The following suggestions will help short-answer test items function as intended.
Word the item so that the required answer is both brief and specific.
Example (poorly stated): An animal that eats the flesh of other animals is _____.
Example (better item): An animal that eats the flesh of other animals is classified as _____.
Do not take statements directly from textbooks to use as a basis for short-answer
items. When taken out of context, such statements are frequently too general
and ambiguous to serve as good short-answer items.
A direct question is generally more desirable than an incomplete statement.
If the answer is to be expressed in numerical units, indicate the type of answer
wanted. For computational problems, it is usually preferable to indicate the units
in which the answer is to be expressed.
d) Multiple-Choice Items
This is the most popular type of selected-response item. It can effectively measure
many of the simple learning outcomes measured by the short-answer item, the true-
false item, and the matching item types. In addition, it can measure a variety of complex
cognitive learning outcomes.
A multiple-choice item consists of a stem that poses the problem and a list of
suggested solutions. A student is first given either a question or a partially complete
statement. This part of the item is referred to as the item's stem. Then three or more
potential answer options are presented. These are usually called alternatives, choices,
or options. The correct response is called the key answer; the remaining alternatives
are called distractors.
Example: Which of the following European countries has suffered most from the
consequences of the Second World War?
A. Germany B. Britain C. France D. Russia/USSR/
The key weakness of multiple-choice items is that when students review a set of
alternatives for an item, they may be able to recognize a correct answer that they would
never have been able to generate on their own. In that sense, multiple-choice items can
present an exaggerated picture of a student's understanding or competence, which
might lead teachers to invalid inferences.
Another serious weakness, one shared by all selected-response items, is that multiple-
choice items can never measure a student's ability to creatively synthesize content of
any sort. Finally, in an effort to come up with the necessary number of plausible
alternatives, novice item-writers sometimes toss in some alternatives that are obviously
incorrect.
The general applicability and the superior qualities of multiple-choice test items are
realized well when care is taken in their construction. This involves formulating a clearly
stated problem, identifying plausible alternatives, and avoiding irrelevant clues to the
answer. The following are more specific suggestions for the construction of good
multiple choice items.
The question or problem in the stem must be self-contained. The stem
should contain as much of the item's content as possible, thereby rendering the
alternatives much shorter than would otherwise be the case.
Avoid negatively stated stems. Just as with the True/False items, negatively
stated stems can create genuine confusion in students.
Each alternative must be grammatically consistent with the item's stem.
Grammatical inconsistency between the stem and some of the answer-options
supplies students with an unintended clue to the correct answer.
Make all alternatives plausible, but be sure that one of them is indisputably
the correct or best answer. As I indicated when describing the weaknesses of
multiple-choice items, teachers sometimes toss in one or more implausible
alternatives, thereby diminishing the item substantially. Although avoiding that
problem is important, it's even more important to make certain that you really do
have one valid correct answer in any item's list of alternatives, rather than two
similar answers, either of which could be arguably correct.
Randomly use all answer positions in approximately equal numbers. If you
use four-option items, make sure that roughly one-fourth of the correct answers
turn out to be A, one fourth B, and so on.
Never use "all of the above" as an answer choice, but "none of the above" can
be used to make items more demanding. Students often become confused
when confronted with items that have more than one correct answer. Usually,
what happens is they'll see one correct alternative and instantly opt for it without
recognizing that there are other correct options later in the list. In addition,
students will definitely opt for the "all of the above" option if they realize that two
of the alternatives are correct, without considering the remaining options. However,
we can increase the difficulty level of a test item by presenting three or four answer
options, none of which is correct, followed by a correct "none of the above"
option.
Verbal associations between the stem and the correct answer should be
avoided. Frequently a word in the correct answer will provide an irrelevant clue
because it looks or sounds like a word in the stem of the item.
The relative length of the alternatives should not provide a clue to the
answer. Since the correct alternative usually needs to be qualified, it tends to be
longer than the distractors unless a special effort is made to control the length of
the alternatives.
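The guideline on spreading correct answers across positions can be checked mechanically. The sketch below is illustrative only (the helper names and the item format are invented here, not part of the module): it shuffles each item's options while tracking where the key lands, so a test writer can verify that keys are distributed roughly evenly among positions A, B, C, and D.

```python
import random

def shuffle_options(stem, options, key_index, rng=random):
    """Shuffle one item's options; return (stem, shuffled options, new key index).

    options[key_index] is the correct answer before shuffling.
    """
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return stem, shuffled, order.index(key_index)

def key_position_counts(items, rng=random):
    """Tally which position (0 = A, 1 = B, ...) the key lands in across items."""
    counts = {}
    for stem, options, key_index in items:
        _, _, new_key = shuffle_options(stem, options, key_index, rng)
        counts[new_key] = counts.get(new_key, 0) + 1
    return counts
```

A writer can then inspect the tally and re-shuffle any items if one position turns out to be badly over-represented.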
Activity: Examine the following faulty multiple-choice items and identify their problems.
1) The term "side effect" of a drug refers to:
A. additional benefits from the drug.
B. the chain effect of drug action.
C. the influence of drugs on crime.
D. any action of a drug in the body other than the one the doctor wanted
the drug to have.
2) When linking two clauses, one main and one subordinate, one should use a:
A. coordinate conjunction such as "and" or "so".
B. subordinate conjunction such as "because" or "although".
C. preposition such as "to" or "from".
D. semicolon.
3) Entomology is:
A. the study of birds.
B. the study of fish.
C. the study of insects.
4) The promiscuous use of sprays, oils, and antiseptics in the nose during acute colds
is a pernicious practice because it may have a deleterious effect on
A. the sinuses.
B. red blood cells.
C. white blood cells.
5) An electric transformer can be used:
A. for storing electricity
B. to increase the voltage of alternating current
C. It converts electric energy in mechanical energy
D. alternating current is changed to direct current
In the previous paragraphs you have been learning about how objective tests should be
constructed. You have learned that well-constructed objective tests can measure a
variety of learning outcomes, from simple to complex. Despite the wide applicability of
such type of tests, there remain significant learning outcomes for which no satisfactory
objective measurements have been developed. These include such outcomes as the
ability to recall, organize, and integrate ideas and the ability to express oneself in
writing. Such outcomes require less structuring of responses than objective test items,
and it is in the measurement of these outcomes that written essays are of great value.
The distinctive feature of essay questions is that students are free to construct, relate,
and present ideas in their own words. Learning outcomes concerned with the ability to
conceptualize, construct, organize, relate, and evaluate ideas require the freedom of
response and the originality provided by essay questions.
Essay questions can be classified into two types: restricted-response essay questions
and extended-response essay questions. Now let us briefly see these types of questions.
Restricted-response essay questions: These types of questions usually limit both the
content and the response. The content is usually restricted by the scope of the topic to
be discussed. Limitations on the form of response are generally indicated in the
question. This can be demonstrated in the following example:
In what ways are essay questions more preferable than objective test
items? Answer in a brief paragraph.
In addition to their already described capacity to measure higher-order thinking skills,
essay questions have some more advantages, which include the following:
Extended-response essays focus on the integration and application of thinking
and problem solving skills.
Essay assessments enable the direct evaluation of writing skills.
Essay questions, as compared to objective tests, are easy to construct.
Essay questions have a positive effect on students' learning.
On the other hand, essay questions also have some limitations which you need to be
aware of. Perhaps the most commonly cited problem of these test questions is their
unreliability of scoring. Thus, the same paper may be scored differently by different
teachers, and even the same teacher may give different scores for the same paper at
different times. Another limitation is the amount of time required for scoring the
responses. Still another problem with essay tests is the limited sampling of content they
provide.
There are some guidelines for improving the reliability and validity of essay scores. The
following are suggestions for the construction of good essay questions:
Restrict the use of essay questions to those learning outcomes that cannot
be measured satisfactorily by objective items. As we have seen earlier,
objective measures have the advantage of efficiency and reliability. When
objective items are inadequate for measuring learning outcomes, however, the
use of essay questions becomes necessary despite their limitations.
Structure items so that the student's task is explicitly bounded. Phrase
your essay items so that students will have no doubt about the response you're
seeking. Don't hesitate to add details to eliminate ambiguity.
For each question, specify the point value, an acceptable response length,
and a recommended time allocation. What this rule tries to do is give
students the information they need to respond appropriately to an essay item.
The less guessing your students are obliged to do about how they're
supposed to respond, the less likely it is that you'll get lots of off-the-wall essays
that don't give you the evidence you need.
Employ more questions requiring shorter answers rather than fewer
questions requiring longer answers. This rule is intended to foster better
content sampling in a test's essay items. With only one or two items on a test,
chances are awfully good that your items may miss your students' areas of
content mastery or non-mastery.
Don't employ optional questions. When students are allowed to choose their
essay items from several options, you really end up with different tests,
unsuitable for comparison.
Test a question's quality by creating a trial response to the item. A great
way to determine if your essay items are really going to get at the responses you
want is to actually try writing a response to the item, much as a student might do.
Performance tasks can also be used to determine whether desired learning outcomes
are achieved up to the expected standards. For example, oral performance is required
to assess a student's spoken communication skills in a certain language. Similarly, the
use of mathematics to solve meaningful problems and to communicate solutions to
others may also be best assessed by the use of performance tasks in realistic settings.
Arranging the sections of a test in this order produces a sequence that roughly
approximates the complexity of the outcomes measured, ranging from the simple to the
complex. It is then merely a matter of grouping the items within each item type. For
this purpose, items that measure similar outcomes should be placed together and then
arranged in order of ascending difficulty. For example, the items under the multiple-
choice section might be arranged in the following order: knowledge of terms, knowledge
of specific facts, knowledge of principles, and application of principles. Keeping together
items that measure similar learning outcomes is especially helpful in determining the
type of learning outcomes causing students the greatest difficulty.
If, for any reason, it is not feasible to group the items by the learning outcomes
measured, then it is still desirable to arrange them in order of increasing difficulty.
Beginning with the easiest items and proceeding gradually to the most difficult has a
motivating effect on students. Also, encountering difficult items early in the test often
causes students to spend a disproportionate amount of time on such items. If the test is
long, they may be forced to omit later questions that they could easily have answered.
With the items classified by item type, the sections of the test and the items within each
section can be arranged in order of increasing difficulty.
To summarize, the most effective method for organizing items in the typical classroom
test is to:
1. Form sections by item type
2. Group the items within each section by the learning outcomes measured, and
3. Arrange both the sections and the items within sections in an ascending order
of difficulty.
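The three-step ordering just summarized can be expressed as a single sort on a compound key. The sketch below is an illustration, not part of the module; the item format and the simple-to-complex orderings of types and outcomes are assumptions made for the example.

```python
# Assumed simple-to-complex orderings (illustrative, not prescribed by the module)
TYPE_ORDER = {"true-false": 0, "matching": 1, "short-answer": 2,
              "multiple-choice": 3, "essay": 4}
OUTCOME_ORDER = {"terms": 0, "facts": 1, "principles": 2, "application": 3}

def arrange_items(items):
    """Sort items into the recommended test sequence, easiest first.

    Each item is a dict with 'type', 'outcome', and 'difficulty'
    (higher difficulty = harder). Sorting on the tuple applies the three
    rules at once: section by type, group by outcome, order by difficulty.
    """
    return sorted(items, key=lambda it: (TYPE_ORDER[it["type"]],
                                         OUTCOME_ORDER[it["outcome"]],
                                         it["difficulty"]))
```

Sorting on a tuple applies the rules in priority order, so items of the same type and outcome still end up arranged from easiest to hardest.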
Project Work: In groups of four, take one exam paper from the school where you are
placed for your practicum experience which includes at least three
types of test items. Then evaluate the items and the test in
general based on the guidelines of test construction you have
learned in the unit. You have to prepare and submit a report
of your evaluation to your instructor. The test paper you have evaluated should
also be attached with your report.
There are a number of conditions that may create test anxiety in students and therefore
should be taken care of during test administration. These include:
Threatening students with tests if they do not behave
Warning students to do their best because the test is important
Telling students they must work fast in order to finish on time
Threatening dire consequences if they fail
Do not talk unnecessarily before the test. Test takers' time should not be
wasted at the beginning of the test with unnecessary remarks, instructions, or
threats that may develop test anxiety.
It is necessary to remind the test takers of the need to avoid malpractices
before they start and make it clear that cheating will be penalized.
Stick to the instructions regarding the conduct of the test and avoid giving hints
to test takers who ask about particular items. But make corrections or
clarifications to the test takers whenever necessary.
Keep interruptions during the test to a minimum.
Hence, in test administration, effort should be made to see that the test takers are given
a fair and unaided chance to demonstrate what they have learnt with respect to:
Instructions: A test should contain a set of instructions, which are usually of two
types: one is the instruction to the test administrator, while the other is to the
test taker. The instruction to the test administrator should explain how the test is to
be administered, the arrangements to be made for proper administration of the test,
and the handling of the scripts and other materials. The instructions to the
administrator should be clear for effective compliance. For the test takers, the
instruction should direct them on the amount of work to be done or the tasks to be
accomplished. The instruction should explain how the test should be performed.
Examples may be used for illustration and to clarify the instruction on what should
be done by the test takers. The language used for the instruction should be
appropriate to the level of the test takers. If necessary, the administrators should
explain the test takers' instructions for proper understanding, especially when the
ability to understand and follow instructions is not part of the test.
Duration of the Test: The time for accomplishing the test is technically important in
test administration and should be clearly stated for both the test administrators and
test takers. Ample time should be provided for candidates to demonstrate what they
know and what they can do. The duration of the test should reflect the age and
attention span of the test takers and the purpose of the test.
Venue and Sitting Arrangement: The test environment should be learner friendly
with adequate physical conditions such as work space, good and comfortable
writing desks, proper lighting, good ventilation, moderate temperature,
conveniences within reasonable distance and serenity necessary for maximum
concentration. It is important to provide enough and comfortable seats with
adequate sitting arrangement for the test takers' comfort and to reduce
collaboration between them. Adequate lighting, good ventilation and moderate
temperature reduce test anxiety and loss of concentration which invariably affects
performance in the test. Noise is another undesirable factor that has to be
adequately controlled both within and outside the test's immediate environment since
it affects concentration and test scores.
Other necessary conditions: Other necessary conditions include that the
questions and question paper should be reader-friendly, with bold characters, neat,
clear, and appealing, and not such that they intimidate test takers into mistakes. All
relevant materials for carrying out the demands of the test should be provided in
reasonable number and quality, and on time.
All these are necessary to enhance the test administration and to keep the
assessment process orderly and fair.
On the other hand, for credibility, effort should be made to moderate the test
questions before administration based on laid-down standards. It is also important to
ensure that valid questions are constructed based on procedures for test construction
which you already have learned in the earlier sections of this unit.
Secure custody should be provided for the questions from the point of drafting to the
final version of the test. This extends to the safe custody of live scripts after the
assessment, their transmission to the graders, and the secure keeping of the grades
arising from the assessment against loss, mutilation, and alteration. The test
administrators and the graders should be of proven moral integrity and should hold
appropriate academic and professional qualifications. The test scripts are to be graded
and marks awarded strictly by using itemized marking schemes. All these are
necessary because an assessment situation in which credibility is seriously called into
question cannot really claim to be valid.
Essay questions make it difficult for teachers to score answers so that achievement is
measured as reliably as with other types of tests. Therefore, some attention will be
given here to the considerations you should make when scoring essay questions.
As you are already aware, the construction and scoring of essay questions are
interrelated processes that require attention if a valid and reliable measure of
achievement is to be obtained. In the essay test the examiner is an active part of the
measurement instrument. Therefore, variability within and between examiners affects
the resulting score of the examinee. This variability is a source of error which affects
the reliability of the essay test if not adequately controlled. Hence, for the essay test
result to serve a useful purpose as a valid measurement instrument, a conscious effort
must be made to score the test objectively: by using appropriate methods to minimize
the effect of personal biases on the resulting scores, and by applying standards to
ensure that only relevant factors indicated in the course objectives and called for
during the test construction are considered during the scoring. There are two common
methods of scoring essay questions.
i. The point or analytic method: In this method each answer is compared with an
already-prepared ideal marking scheme (scoring key) and marks are assigned
according to the adequacy of the answer. When used carefully, the analytic method
provides a means for maintaining uniformity in scoring between scorers and
between scripts, thus improving the reliability of the scoring.
ii. The global/holistic rating method: In this method the examiner first sorts the
responses into three or more categories of varying quality based on a general or
global impression gained on reading the responses. The standard of quality helps to
establish a relative scale, which forms the basis for ranking responses from those of
the poorest quality to those of the highest quality. When the scorer is completely
satisfied that each response is in its proper category, it is marked accordingly.
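As a rough illustration of the analytic method, the sketch below compares a response against a prepared marking scheme in which each expected point carries a mark value. The keyword matching is only a crude stand-in for the judgement a human scorer applies, and the function name and scheme format are invented for this example.

```python
def analytic_score(response, marking_scheme):
    """Credit each expected point whose keyword appears in the response.

    marking_scheme maps an expected point (represented here by a keyword,
    for simplicity) to the marks it is worth. Real analytic scoring relies
    on the scorer's judgement, not literal keyword matching.
    """
    text = response.lower()
    return sum(marks for point, marks in marking_scheme.items()
               if point.lower() in text)
```

The value of the method comes from the marking scheme itself: because every scorer credits the same points with the same marks, scoring stays uniform between scorers and between scripts.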
As we have seen earlier the most serious limitation with essay questions is related to
scoring. Therefore, the following guidelines would be helpful in making the scoring of
essay items easier and more reliable.
1. Ensure that you are emotionally and mentally settled before scoring
2. Score all responses to one item before moving to the next item
3. Write out a model answer in advance to guide yourself in grading the students'
answers
4. Shuffle the exam papers after scoring each question before moving to the next
5. Score papers without knowing the names of the test takers, to avoid bias
The task of reporting students' progress cannot be separated from the procedures
used in assessing students' learning. If educational objectives have been clearly defined
in performance terms and relevant tests and other assessment procedures have been
properly used, grading and reporting become a matter of summarizing the results and
presenting them in an understandable form. School grades and progress reports serve
various functions in the school. They provide information that is helpful to students,
parents, and school personnel.
Different reporting methods have been used in schools, including letter grades, numeric
grades, the pass-fail system, checklists of objectives, portfolios of selected examples of
student work, and parent-teacher conferences. In our country's education system, we
use numeric grades to report students' performance at the secondary school level. This
doesn't, however, mean that we should be limited to this reporting method. Depending
on the purposes we want our report to serve, we can use a combination of two or more
reporting methods. For example, we may give marks to summarize students' overall
performance while also holding conferences with parents to report a qualitative
description of students' progress in their learning. Thus, as prospective teachers, we are
required to have the skills to interpret results to students' parents or other lay
audiences. Such conferences supplement the more formal written report of students'
progress.
Unit Summary
In this unit you were introduced to different types of assessment approaches, namely
formal vs. informal, criterion referenced vs. norm referenced, formative vs. summative
assessments. You also learned about various assessment strategies. These include
classroom presentations, exhibitions/demonstrations, conferences, interviews,
observations, performance tasks, portfolios, question and answer, student self-
assessment, checklists, rating scales and rubrics, the one-minute paper, the muddiest
point, and student-generated questions and tests.
You also learned about the challenges in the assessment of large classes and their
consequences, and some of the strategies that we can use to minimize those
challenges. These strategies include front-ending, making use of in-class assignments,
self- and peer-assessment, group assessment, and changing the assessment method,
or at least shortening it.
Much of this unit was devoted to the construction of the most widely used assessment
technique, that is, tests. This was preceded by a discussion of the planning of tests. In
this regard, test items were classified into two broad categories: objective test items
and essay items. Objective test items were further divided into supply-type items and
selection-type items. Supply-type items include short-answer and completion items,
whereas selection-type items include true/false items, matching items, and multiple-
choice items. Essay items were also classified into restricted-response and extended-
response essay items. Here you learned about the strengths and limitations of these
different test item types. You were also introduced to the major guidelines you should
follow in constructing them.
This unit also covered the arrangement of test item types. Finally, you learned about
the techniques and procedures we should follow during test administration.
Self-Check Exercises
1. State the differences between formative and summative
assessment, criterion referenced and norm referenced
assessment, and formal and informal assessment.
2. What conditions do we consider in selecting assessment strategies in our
subject?
3. List the major assessment strategies that you can use in your subject and
classify them as formal and informal strategies.
4. What are the major problems associated with assessing students in large
classes? What strategies can we use to minimize these problems?
5. What is a table of specification and what major purposes does it serve?
6. What is the difference between objective tests and essay tests?
7. What are the advantages of objective tests as compared to essay tests?
UNIT 3
In unit two you learned about various assessment strategies that can be
used in the context of secondary education. You were also introduced
to the planning, construction, and administration of classroom tests. In
this unit you are going to be familiarized with the idea of test score interpretation and the
major statistical techniques that can be used to interpret test scores. Particularly, you
will learn about the methods of interpreting test scores, measures of central tendency,
measures of dispersion or variability, measures of relative position, and measures of
relationship or association.
Learning Outcomes
Imagine that you receive a grade of 60 for a midterm exam in one of your university
classes. What does the score mean, and how should we interpret it?
A raw test score standing on its own rarely has meaning. For instance, a score of 60%
in one test cannot be said to be better than a score of 50% obtained by the same test
taker in another test of the same subject. The test scores on their own lack a true zero
point and equal units. Moreover, they are not based on the same standard of
measurement, and as such, meaning cannot be read into the scores as a basis on
which academic and psychological decisions may be taken.
Kinds of scores
Data differ in terms of what properties of the real number series (order, distance, or
origin) we can attribute to the scores. The most common kinds of scores include
nominal, ordinal, interval, and ratio scales.
A nominal scale involves the assignment of different numerals to categories that are
qualitatively different. For example, we may assign the numeral 1 for males and 2 for
females. These symbols do not have any of the three characteristics (order, distance, or
origin) we attribute to the real number series: the 1 does not indicate more of
something than the 2.
An ordinal scale has the order property of a real number series and gives an indication
of rank order. For example, ranking students based on their performance on a certain
athletic event would involve an ordinal scale. We know who is best, second best, third
best, and so on. But the ranks do not tell us anything about the differences between the
scores.
With interval data we can interpret the distances between scores. If, on a test with
interval data, Almaz has a score of 60, Abebe a score of 50, and Beshadu a score of
30, we could say that the distance between Abebe's and Beshadu's scores (50 to 30)
is twice the distance between Almaz's and Abebe's scores (60 to 50).
If one measures with a ratio scale, the ratio of the scores has meaning. Thus, a person
whose height is 2 meters is twice as tall as a person whose height is 1 meter. We can
make this statement because a measurement of 0 actually indicates no height. That is,
Criterion-referenced interpretation is the interpretation of a test raw score based on the
conversion of the raw score into a description of the specific tasks that the learner can
perform. That is, a score is given meaning by comparing it with a standard of
performance that is set before the test is given. It permits the description of a learner's
test performance without referring to the performance of others. Thus, we might
describe a pupil's performance in terms of the speed with which a task is performed,
the precision with which a task is performed, or the percentage of items correct on some
clearly defined set of learning tasks. The percentage-correct score is widely used in
criterion-referenced test interpretation.
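The percentage-correct score is simply the number of items answered correctly, expressed as a percentage of the clearly defined set of items. A minimal sketch (the function name is chosen for this illustration):

```python
def percentage_correct(num_correct, num_items):
    """Percentage-correct score on a clearly defined set of learning tasks."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return 100.0 * num_correct / num_items
```

For example, a learner who answers 18 of 20 items on a defined task set has a percentage-correct score of 90, interpreted against the preset standard rather than against other learners.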
Criterion referenced interpretation of test results is most meaningful when the test has
been specifically designed for this purpose. This typically involves designing a test that
measures a set of clearly stated learning tasks, with enough items for each task to
permit a dependable description of the learner's performance.
There are three basic measures of central tendency: the mean, the median, and the
mode. Choosing one over another depends on two different things:
1. The scale of measurement used, so that a summary makes sense given the
nature of the scores.
The Mean
The mean, or arithmetic average, is the most widely used measure of central tendency.
It is the average of a set of scores computed simply by adding together all scores and
dividing by the number of scores. The mean takes into account the value of each score,
and so one extremely high or low score could have a considerable effect on it. It is
helpful to know the mean because then you can see which numbers are above and
below the mean.
Here is an example of test scores for a Maths class: 82, 93, 86, 97, 82. To find the
mean, first add up all of the numbers (82+93+86+97+82 = 440). Now, since there are
5 test scores, divide the sum by 5 (440 ÷ 5 = 88). Thus, the mean is 88. The formula
used to compute the mean is as follows:
X̄ = ∑X / N
Where: X̄ = the mean
∑ = the sum of
X = any score
N = the number of scores
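The worked example can be verified with a few lines of code (Python is used here purely for illustration):

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

maths_scores = [82, 93, 86, 97, 82]  # the example from the text
total = sum(maths_scores)            # 440
average = mean(maths_scores)         # 440 / 5 = 88.0
```

Because every score enters the sum, one extremely high or low value shifts the result, which is exactly the sensitivity to extreme scores described above.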
The Median
In some circumstances, the mean may not be the best indicator of student
performance. If there are one or a few students who score considerably lower (or
higher) than the other students, their scores tend to pull the mean in their direction. In
this case the median is usually considered a better indicator of student performance.
There are also some types of scores that are reported for standardized tests for which
the mean is not appropriate (percentile scores), so the median is used.
The median is a counting average. It is the number that divides a distribution of scores
exactly in half. It is determined by arranging the scores in order of size and counting up
to (or down to) the midpoint of the set of scores. The median will usually be around where
most scores fall. When the number of scores is odd, the median is the middle score. If
the number of scores is even, the median will be halfway between the two middle most
scores. In this case the median is not an actual score earned by one of the students.
In example 1, our line would be between 44 and 45, so the median would be halfway
between them at 44.5; in this case the median is not an actual score earned by one of
the students. In example 2, the distance between the two middle scores (43 and 46) is
more than one, so we again find the point halfway between them for our median of
44.5. If the number of students is odd, the median is the one score that is the
middle score in the frequency distribution, having equal numbers of scores above and
below it. Thus, the median is 44 in example 3, and 45 in example 4. It does not matter if
more than one student earns that score, as in example 4.
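The counting procedure for the median translates directly into code. A sketch consistent with the halfway-point rule described above:

```python
def median(scores):
    """Middle score (odd N) or the point halfway between the two middle
    scores (even N), after arranging the scores in order of size."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2
```

For an even number of scores the result can fall between two earned scores (for instance, 44.5 between 44 and 45), which is why the median need not be a score any student actually earned.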
The Mode
This is the score (or scores) that occurs most frequently and is determined by inspection.
It is the least reliable type of statistical average and is frequently used merely as a
preliminary estimate of central tendency. A set of scores may sometimes have two or
more modes; such distributions are called bimodal or multimodal, respectively.
If the data are categorical (measured on the nominal scale), then only the mode can be
calculated. The mode can also be calculated with ordinal and higher data, but it often is
not appropriate. If other measures can be calculated, the mode would never be the first
choice. For example, the test scores 7, 7, 7, 20, 23, 23, 24, 25, 26 have a mode of 7,
but obviously this doesn't make much sense. Remember, measures of central tendency
look for the one number that best describes all of the numbers.
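A small sketch for finding the mode, returning every tied value so that bimodal and multimodal sets are reported in full:

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s), sorted; more than one value
    comes back when the distribution is bimodal or multimodal."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)
```

Applied to the example above, the scores 7, 7, 7, 20, 23, 23, 24, 25, 26 yield a mode of 7, illustrating how poorly the mode can represent the whole set.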
To the extent that differences are observed among these three measures, the
distribution is asymmetrical or skewed. Skewed distributions include positively skewed
and negatively skewed distributions. In a positively skewed distribution (see figure 1
above), most of the scores concentrate at the low end of the distribution. This might
occur, for example, if the test was extremely difficult for the students. In a negatively
skewed distribution (also shown in figure 1 above), the majority of scores are toward the
high end of the distribution. This could occur if we gave a test that was easy for most of
the students.
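Comparing the mean and the median gives a quick numerical check on skewness: a few high scores pull the mean above the median (positive skew), and a few low scores pull it below (negative skew). The score sets below are invented for illustration:

```python
def mean(scores):
    return sum(scores) / len(scores)

def median(scores):
    s = sorted(scores)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

# A very hard test: most scores low, a few high -> positive skew, mean > median
hard_test = [20, 25, 25, 30, 30, 35, 80, 90]

# A very easy test: most scores high, a few low -> negative skew, mean < median
easy_test = [20, 30, 85, 85, 90, 90, 95, 95]
```

For the hard test the mean (41.9) sits well above the median (30); for the easy test the mean (73.8) sits well below the median (87.5), matching the two skew patterns described above.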
Points to note
With perfectly bell-shaped distributions, the mean, median, and mode all have the same value.
A set of scores can be more adequately described if we know how much they spread
out above and below the measure of central tendency. For example, we might have two
groups of students with a mean score of 70, but in one group the span of scores is from
60 to 80 and in the other group the span is from 50 to 100. These represent quite
different spreads of performance. We can identify such differences by numbers that
indicate how much scores spread out in a group. These are called measures of
variability or dispersion. The three most commonly used measures of variability are the
range, the quartile deviation, and the standard deviation.
The Range
It is the simplest and crudest measure of variability, calculated by subtracting the lowest
score from the highest score. For example, if the scores of 10 students in a certain test
are 5, 7, 8, 10, 12, 13, 14, 15, 17, 19, then the range will be 19 - 5 = 14. The range
provides a quick estimate of variability but is undependable because it is based on the
position of the two extreme scores. The addition or subtraction of a single score can
change the range significantly.
The inter-quartile range (IQR) is another range measure, but one that looks at the data
in terms of quarters, or percentiles. The IQR is the distance between the 25th and 75th
percentiles, that is, between the first and third quartiles. The range of the data is divided
into four equal parts (25% each), and the IQR is the range of the middle 50% of the
data. Because it uses only the middle 50%, it is not affected by outliers or extreme
values. The IQR is therefore often used with skewed data, as it is insensitive to extreme
scores.
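The range and IQR for the ten scores above can be sketched in Python. Note that several conventions exist for locating quartiles; the sketch uses the default ("exclusive") method of Python's statistics.quantiles, so the exact IQR value depends on that choice:

```python
# Range: highest score minus lowest score.
# IQR: third quartile minus first quartile (middle 50% of the data).
from statistics import quantiles

scores = [5, 7, 8, 10, 12, 13, 14, 15, 17, 19]

score_range = max(scores) - min(scores)
q1, q2, q3 = quantiles(scores, n=4)   # the three quartile cut points
iqr = q3 - q1

print(score_range)  # 14, as in the text
print(iqr)          # 7.75 under this quartile convention
```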
Let us say that two classes took a quiz. There were 10 students in each class, and each
class had an average score of 81.5. Since the averages are the same, can we assume
that the students in both classes have the same performance on the exam?
The answer is No. The average (mean) does not tell us anything about the distribution
or variation in the grades. So, we need to come up with some way of measuring not
just the average, but also the spread of the distribution of our data.
The most useful measure of variability, or spread of scores, is the standard deviation. It
is essentially an average of the degree to which a set of scores deviates from the mean.
If the Standard Deviation is large, it means the numbers are spread out from their mean.
If the Standard Deviation is small, it means the numbers are close to their mean.
Because it takes into account the amount that each score deviates from the mean, it is
a more stable measure of variability than either the range or quartile deviation.
The procedure for calculating a standard deviation involves the following steps:
1. Find the mean of the scores.
2. Subtract the mean from each score to obtain the deviations.
3. Square each deviation and sum the squares.
4. Divide the sum of squares by the number of scores (N).
5. Take the square root of the result.
Thus the formula for the standard deviation (SD) is: SD = √( Σ(X - M)² / N ), where X is
each score, M is the mean, and N is the number of scores.
Now let us take the previous scenario of two groups of students who took a Math quiz
with a mean score of 81.5 and calculate and compare their standard deviations. The
individual scores of group A are: 72, 76, 80, 80, 81, 83, 84, 85, 85, and 89. The
individual scores of group B are: 57, 63, 65, 71, 83, 93, 94, 95, 96, and 98. Let us start
with group A. So,
the first step to finding the standard deviation is to find all the distances from the mean.
This will be followed by squaring each distance, which gives us the following results.
I am sure you have come up with 15.1 as the standard deviation for the distribution of
scores of group B. Now, let's compare the two groups of students again.
Group A Group B
What is your interpretation of the test scores of the two groups based on their standard
deviations?
Activity: The Math test scores of five students are: 92,88,80,68 and 52.
Find the variance and standard deviation.
The standard deviation is used with the mean. It is the most reliable measure of
variability, and is especially useful in testing. In addition to describing the spread of
scores in a group, it serves as a basis for computing standard scores, the standard error
of measurement, and other statistics used in analyzing and interpreting test scores.
There are different ways to measure the relative position of scores. Suppose that you
have scored 55 on a test. What do you say about this score?
On the surface it might look bad but what if that was the highest in the class or if that
score was better than 80% of the class? This is what we mean by relative position.
Percentiles
A percentile is a score that indicates the rank of the student compared to others (same
age or same grade), using a hypothetical group of 100 students. It tells you what
percentage of people you did better than. A percentile of 25 (the 25th percentile), for
example, indicates that the student's test performance equals or exceeds that of 25 out
of 100 students on the same measure. A percentile of 87 indicates that the student
equals or surpasses 87 out of 100 (or 87% of) students. A percentile must always refer
to a student's percentile rank relative to a particular norm group. If you scored at the
80th percentile, what does that mean?
1. Arrange the scores in order from lowest to highest.
2. Count how many scores fall below your value. If, for example, your score is 85 and
there are multiple 85s, then count how many fall under the first 85.
For example, in the students' scores 76, 77, 80, 83, 85, 85, 85, 90, 96, 97 there are
4 scores below 85, so the percentile rank of 85 is (4/10) × 100 = 40.
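The counting procedure can be sketched as a small Python function. The percentile_rank helper is an illustrative name, and it uses the simple "count the scores below" convention; other definitions also credit half of the tied scores:

```python
# Percentile rank: the percentage of scores in the group that fall below
# a given score (simple "count below" convention).
def percentile_rank(scores, value):
    below = sum(1 for s in scores if s < value)
    return below / len(scores) * 100

scores = [76, 77, 80, 83, 85, 85, 85, 90, 96, 97]
print(percentile_rank(scores, 85))  # 40.0 -- four of the ten scores are below 85
```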
Quartiles
Quartile is another term used with percentile measures. The total of 100% is broken
into four equal parts, at 25%, 50%, 75%, and 100%.
Standard Scores
Another method of indicating a pupil's relative position in a group is by showing how far
the raw score is above or below average. This is the approach used with standard
scores. Basically, standard scores express test performance in terms of standard-
deviation units from the mean. Standard scores are scores that are based on the mean
and standard deviation.
Z Score: For data distributions that are approximately symmetric, a measure of relative
position that is often used is the z-score. z-score gives us an estimate as to how many
standard deviations a particular score lies from the mean.
We define the z score as z = (X - M) / S, where X is the raw score, M is the mean, and
S is the standard deviation.
If the z-score > 0 (positive), then your data value is above the mean.
If the z-score < 0 (negative), then your data value is below the mean.
Example. Almaz scored a 25 on her math test. Suppose the mean for this exam is 21,
with a standard deviation of 4. Dawit scored 60 on an English test which had a mean of
50 with a standard deviation of 5. Who did relatively better?
Since standardized tests typically have score distributions which are approximately
symmetric, we will find the respective z-scores for Almaz and Dawit.
Almaz's z-score: (25 - 21) / 4 = 1
Dawit's z-score: (60 - 50) / 5 = 2
Since Dawit had the higher z-score, we say Dawit did relatively better.
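The comparison of Almaz and Dawit can be sketched in Python (z_score is an illustrative helper name):

```python
# z-score: how many standard deviations a raw score lies from its test's mean.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

almaz = z_score(25, 21, 4)   # math test: mean 21, SD 4
dawit = z_score(60, 50, 5)   # English test: mean 50, SD 5

print(almaz)  # 1.0
print(dawit)  # 2.0 -- Dawit scored further above his test's mean
```

Converting both scores to a common scale is what makes performance on two different tests comparable.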
T Scores: This refers to any set of normally distributed standard scores that has a
mean score of 50 and a standard deviation of 10. The T score is obtained by
multiplying the Z-score by 10 and adding the product to 50. That is, T Score = 50 +
10(z). A score of 60 is one standard deviation above the mean, while a score of 30 is
two standard deviations below the mean.
Example
A test has a mean score of 40 and a standard deviation of 4. What are the T scores of
two test takers who obtained raw scores of 30 and 45 respectively in the test?
Solution
The first step in finding the T-scores is to obtain the z-scores for the test takers. The z-
scores would then be converted to the T scores. In the example above, the z
scores are:
For the test taker with a raw score of 30 (X = 30, M = 40, SD = 4):
z = (X - M) / SD = (30 - 40) / 4 = -2.5
The T-score is then obtained by converting the z-score (-2.5) to a T-score. Thus:
T-score = 50 + 10(z)
= 50 + 10(-2.5)
= 50 - 25
= 25
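The conversion above can be sketched as a small Python function (t_score is an illustrative helper name):

```python
# T-score: a standard score with mean 50 and SD 10, obtained from the z-score.
def t_score(raw, mean, sd):
    z = (raw - mean) / sd
    return 50 + 10 * z

print(t_score(30, 40, 4))  # 25.0, matching the worked example above
```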
Activity: Following the same procedures, find the T-score for the second
student, whose raw score is 45.
Project work: In groups of five, take the roster of one cooperating teacher of the
school where you are placed for your practicum experience and do the
following tasks:
a) Calculate the average marks of the students of the section by
taking five subjects
b) Based on the calculated averages, find the mode, the median, the range, the
inter-quartile range, and the standard deviation
c) Find the average scores that lie in the 25th, 50th, and 75th percentiles
d) Take the scores of two subjects and calculate the coefficient of correlation
You have to prepare a report of your work and submit it for correction.
Validity is the most important quality you have to consider when constructing or
selecting a test. It refers to the extent to which a test serves its purpose(s), that is,
measures what it is intended to measure and to the extent desired. It is concerned with
whether the information being gathered is relevant to the decision that needs to be
made. It is all about the extent to which assessment information can be trusted
(truthfulness). Thus validity is always concerned with the specific use of the results and
the soundness of our proposed interpretations. Hence, to the extent that a test score is
determined by factors or abilities other than those the test was designed or used to
measure, its validity is impaired.
Reliability: Test reliability refers to the accuracy, consistency, and stability of the scores
students would receive on alternate forms of the same test. The more the pair of scores
observed for the same testee vary from each other, the less reliable the measure is.
The more consistent our test results are from one measurement to another, the less
error there will be and, consequently, the greater the reliability. Reliability is a precursor
to test validity: if test scores cannot be assigned reliably, it is impossible to conclude
that the scores accurately measure the domain of interest.
There are some factors that affect the reliability of a test, which include the following:
Test length: the longer a test is, the more reliable it is (in that wide coverage of
the content is ensured), but it should not be too long.
Sample heterogeneity: the more heterogeneous the group of testees (the wider
the spread of scores), the higher the reliability.
Irregularities: poor lighting conditions, testees' failure to follow directions, and
similar irregularities in test administration reduce reliability.
In order to be valid, a test must be reliable; but reliability does not guarantee validity.
Objectivity- The fairness of a test to the testee. A biased test does not portray
objectivity and hence is not reliable. A test that is objective has high validity and
reliability.
Discrimination- A good test must be able to distinguish between poor and good
learners; it should reveal even slight differences in learner attainment and achievement
that make it possible to distinguish between them.
What are the likely criteria in order to satisfy these conditions?
Comprehensiveness- A test whose items cover much of the content of the course is
said to be comprehensive and hence capable of fulfilling its purpose.
Practicality and scoring- Assigning a quantitative value to a test result should not be
difficult.
Usability- A good test should be usable, unambiguous, and clearly stated, with one
meaning only.
Unit Summary
In this unit you learned that test interpretation is a process of assigning meaning and
usefulness to the scores obtained from classroom test and you were introduced to how
to interpret test scores. This includes criterion-referenced and norm-referenced
interpretation. Criterion-referenced interpretation is the interpretation of a test raw score
based on the conversion of the raw score into a description of the specific tasks that the
learner can perform. Norm-referenced interpretation is the interpretation of a raw score
based on the conversion of the raw score into some type of derived score that indicates
the learner's relative position in a clearly defined reference group.
This unit also introduced you to different statistical techniques that are useful in
interpreting test scores. These are classified into measures of central tendency,
measures of dispersion, measures of relative position and measures of association or
relationship. The measures of central tendency help us to come up with the one single
score that best describes a distribution of scores. The most commonly used measures
of central tendency are the mean, the mode and the median. The measures of
dispersion tell us how much the scores spread out above and below the measure of
central tendency as well as how much they are spread out from one another. These
measures include the range, the inter-quartile range and the standard deviation. The
measures of relative position are techniques that will show us the relative standing of
individual scores within a certain set of scores. Measures that are used here include
percentile ranks, quartiles, and standardized scores such as the z scores and t scores.
The measures of relationship or association help us to know the degree to which sets of
scores are related. The most commonly used measure of relationship is the product-
moment correlation coefficient. Finally, you have also learned about the major
characteristics that make a good test.
UNIT 4
ITEM ANALYSIS
4.1 INTRODUCTION
Learning Outcomes
Once a teacher has corrected and marked his/her students' test papers,
what do you think he/she should do with them? Should he/she throw them
away? Keep them? Or what?
There are several good reasons for analyzing questions and tests that students have
completed and that have already been graded. Some of the reasons that have been
cited include the following:
1. Identify content that has not been adequately covered and should be re-taught,
2. Provide feedback to students,
3. Determine if any items need to be revised in the event they are to be used again
or become part of an item file or bank,
4. Identify items that may not have functioned as they were intended,
5. Direct the teacher's attention to individual student weaknesses.
The results of an item analysis provide information about the difficulty of the items and
the ability of the items to discriminate between better and poorer students. If an item is
too easy, too difficult, failing to show a difference between skilled and unskilled
examinees, or even scored incorrectly, an item analysis will reveal it. The two most
common statistics reported in an item analysis are the item difficulty and the item
discrimination. An additional analysis that is often reported is the distractor analysis.
Once the item analysis information is available, an item review is often conducted. In
the following sections you are going to learn the statistical techniques used to analyse
responses to test items.
How difficult do you think a test should be? How do we determine the
difficulty level of test items? Why is it important to know the difficulty
level of test items? Please think over these questions and share your
ideas with a friend.
Item difficulty index is one of the most useful, and most frequently reported, item
analysis statistics. It is a measure of the proportion of examinees who answered the
item correctly; for this reason it is frequently called the p-value. If scores from all
students in a group are included the difficulty index is simply the total percent correct.
When there is a sufficient number of scores available (i.e., 100 or more) difficulty
indexes are calculated using scores from the top and bottom 27 percent of the group.
1. Rank the papers in order from the highest to the lowest score
2. Select one-third of the papers with the highest total score and another one-third of
the papers with lowest total scores
3. For each test item, tabulate the number of students in the upper & lower groups who
selected each option
4. Compute the difficulty of each item (the percentage of students who answered the
item correctly):
P = R / T
where R is the number of students who answered the item correctly and T is the total
number of students who attempted the item.
The difficulty indexes can range between 0.0 and 1.0 and are usually expressed as a
percentage. A higher value indicates that a greater proportion of examinees responded
to the item correctly, and it was thus an easier item. The average difficulty of a test is
the average of the individual item difficulties. For maximum discrimination among
students, an average difficulty of .60 is ideal. For example: If 243 students answered
item no. 1 correctly and 9 students answered incorrectly, the difficulty level of the item
would be 243/252 or .96.
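The worked example can be sketched in one line of Python (difficulty is an illustrative helper name):

```python
# Item difficulty (p-value): proportion of examinees answering the item correctly.
def difficulty(num_correct, num_total):
    return num_correct / num_total

p = difficulty(243, 252)   # 243 correct out of 252 examinees
print(round(p, 2))         # 0.96 -- a very easy item
```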
In the example below, five true-false questions were part of a larger test administered to
a class of 20 students. For each question, the number of students answering correctly
was determined, and then converted to the percentage of students answering correctly.
Activity: Calculate the item difficulty level for the following four options
multiple choice test item. (The sign (*) shows the correct answer).
Response Options
Groups A B C D* Total
High Scorers 0 1 1 8 10
Low Scorers 1 1 5 3 10
Total 1 2 6 11 20
D = (RU - RL) / (T/2)
where RU is the number of students in the upper (high-scoring) group who answered
the item correctly, RL is the number in the lower group who did so, and T is the total
number of students in the two groups combined.
In the example below, there are 8 students in the high-scoring group and 8 in the low-
scoring group (with 12 students between the two groups who are not represented). For
question 1, all 8 in the high-scoring group answered correctly, while only 4 in the low-
scoring group did so. Thus, success in the HSG minus success in the LSG is
(8 - 4) = +4. The last step is to divide the +4 by half of the total number of students in
both groups (half of 16 is 8). Thus:

Question   Correct (HSG)   Correct (LSG)   Discrimination index
1          8               4               (8 - 4) / 8 = .5
2          7               2
3          5               6
Activity 2: Calculate the item discrimination index for the questions 2 & 3
on the table above.
The item discrimination index can vary from -1.00 to +1.00. A negative discrimination
index (between -1.00 and zero) results when more students in the low group answered
correctly than students in the high group. A discrimination index of zero means equal
numbers of high and low students answered correctly, so the item did not discriminate
between groups. A positive index occurs when more students in the high group answer
correctly than the low group. If the students in the class are fairly homogeneous in
ability and achievement, their test performance is also likely to be similar, resulting in
little discrimination between high and low groups.
Questions that have an item difficulty index (NOT item discrimination) of 1.00 or 0.00
need not be included when calculating item discrimination indices. An item difficulty of
1.00 indicates that everyone answered correctly, while 0.00 means no one answered
correctly. We already know that neither type of item discriminates between students.
When computing the discrimination index, the scores are divided into three groups with
the top 27% of the scores in the upper group and the bottom 27% in the lower group.
The number of correct responses for an item by the lower group is subtracted from the
number of correct responses for the item in the upper group. The difference is divided
by the number of students in either group. The process is repeated for each item.
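The computation described above can be sketched as a small Python function (discrimination is an illustrative helper name; it divides by the size of one group, which equals half of the two groups combined):

```python
# Item discrimination index: difference in correct responses between the
# upper and lower scoring groups, divided by the size of one group.
def discrimination(correct_upper, correct_lower, group_size):
    return (correct_upper - correct_lower) / group_size

# Question 1 from the earlier example: 8 of 8 high scorers and
# 4 of 8 low scorers answered correctly.
print(discrimination(8, 4, 8))  # 0.5
```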
For a small group of students, an index of discrimination for an item that exceeds .20 is
considered satisfactory. For larger groups, the index should be higher because more
difference between groups would be expected. The guidelines for an acceptable level of
discrimination depend upon item difficulty. For very easy or very difficult items, low
discrimination levels would be expected; most students, regardless of ability, would get
the item correct or incorrect as the case may be. For items with a difficulty level of about
70 percent, the discrimination should be at least .30.
Just as the key, or correct response option, must be definitively correct, the distracters
must be clearly incorrect (or clearly not the "best" option). In addition to being clearly
incorrect, the distractors must also be plausible. That is, the distractors should seem
likely or reasonable to an examinee who is not sufficiently knowledgeable in the content
area.
If a distractor appears so unlikely that almost no examinee will select it, it is not
contributing to the performance of the item. In fact, the presence of one or more
implausible distractors in a multiple-choice item can make the item artificially far easier
than it ought to be. Let us try to explain this using the following table as an example that
shows the responses of eight students to five multiple-choice questions.
A B C D
TEST ITEM NO 1 5** 1 1 1
TEST ITEM NO 2 0 2 6** 0
Over 50% of the students answered question number 1 correctly, and each of the
distractors was selected. The distractors have functioned as they should. The teacher
may be less than satisfied with only 5 of 8 students answering correctly, but a class
would generally have more than eight students and could well have a higher percentage
of correct answers while still having effective distractors.
It is not desirable to have one of the distractors chosen more often than the correct
answer, as occurred with question 4. This result indicates a potential problem with the
question. Distractor D may be too similar to the correct answer and/or there may be
something in either the stem or the alternatives that is misleading.
If students do not know the correct answer and are purely guessing, their answers
would be expected to be distributed among the distractors as well as the correct
answer, much like question 3. If one or more distractors are not chosen, as occurs in
questions 2, 4, and 5, the unselected distractors probably are not plausible. If the
teacher wants to make the test more difficult, those distractors should be replaced in
subsequent tests.
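The kind of inspection described above can be automated with a small sketch. The review_item helper and its flag messages are illustrative, not a standard routine; it applies the two rules from the text, flagging distractors nobody chose (probably implausible) and distractors chosen more often than the key (possibly misleading):

```python
# Distractor analysis: inspect the response counts for one item.
def review_item(counts, key):
    flags = []
    for option, n in counts.items():
        if option == key:
            continue
        if n == 0:
            flags.append((option, "never chosen -- probably implausible"))
        elif n > counts[key]:
            flags.append((option, "chosen more often than the key -- review the item"))
    return flags

# Test item no. 2 from the table above (the key is C).
item2 = {"A": 0, "B": 2, "C": 6, "D": 0}
print(review_item(item2, "C"))  # flags A and D as never chosen
```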
Project Work
In the school where you are placed for your Practicum activities, take
corrected exam papers of 1 section from the cooperating teacher and
by taking 10 multiple choice questions:
i. calculate the difficulty level of each item
ii. calculate the discrimination power of each item
iii. analyze the plausibility of the distractors
Present your work in the form of a report.
Building a file of effective test items and assessment tasks involves recording the items
or tasks, adding information from analyses of students' responses, and filing the records
by both the content area and the objective that the item or task measures. Thus, items
and tasks are recorded as they are constructed; information from analysis of students'
responses is added after the items and tasks have been used; and then the effective
items and tasks are deposited in the file. In a few years, it becomes possible to start
using some of the items and tasks from the file and to supplement these with new items
and tasks. As the file grows, it becomes possible to select the majority of the items and
tasks for any given test or assessment from the file without repeating them frequently.
Such a file is especially valuable in areas of complex achievement, when the
construction of test items and assessment tasks is difficult and time consuming. When
enough high-quality items and tasks have been assembled, the burden of preparing
tests and assessments is considerably lightened. Computer item banking makes tasks
even easier.
Summary
In this unit you learned how to judge the quality of a classroom test by carrying out item
analysis, which is the process of examining each item to ascertain whether it is
functioning properly in measuring what the entire test is measuring. You also
learned about the process of item analysis and how to compute item difficulty, item
discriminating power and evaluating the effectiveness of distracters. You have learned
that item difficulty indicates the percentage of testees who get the item right; Item
discriminating power is an index which indicates how well an item is able to distinguish
between the high achievers and low achievers given what the test is measuring; and the
distraction power of a distracter is its ability to differentiate between those who do not
know and those who know what the item is measuring.
Finally you learned that after conducting item analysis, items may still be usable, after
modest changes are made to improve their performance on future exams. Thus, good
test items should be kept in test item banks and in this unit you were given highlights on
how to build a Test Item File/Item Bank.
Self-check Exercises

Group     n    A*   B    C    D
Upper     28   14   5    6    3
Lower     28   7    15   1    5

6. What information should be included with the test items we put in our item
bank?
UNIT 5
Ethical Standards of assessment
5.1 Introduction
In the previous units you have learned about different assessment-
related concepts, different strategies and techniques of assessing
students' learning, as well as methods of maintaining the quality of tests.
In this unit you will be introduced to ethics as a mechanism of maintaining quality in
our assessment practice. You will be familiarized with some basic standards that
professional teachers are expected to follow in order to be ethical in their assessment
practices. You will also be familiarized with some general considerations in addressing
diversity in the classroom so as to make assessment procedures accessible and free of
bias.
Learning Outcomes
The following are some ethical standards that teachers may consider in their
assessment practices.
2. Teachers should develop tests that meet the intended purpose and that are
appropriate for the intended test takers. This requires teachers to:
Define the purpose for testing, the content and skills to be tested, and the
intended test takers.
Develop tests whose content, skills tested, and content coverage are
appropriate for the intended purpose of testing.
Develop tests that have clear, accurate, and complete information.
Develop tests with appropriately modified forms or administration
procedures for test takers with disabilities who need special
accommodations.
In addition, the following are principles of grading that can guide the development of a
grading system.
1. The system of grading should be clear and understandable (to parents, other
stakeholders, and most especially students).
In the previous section you learned that fairness is the fundamental principle that
has to be followed in teachers' assessment practices. It has been said that all students
have to be provided with an equal opportunity to demonstrate the skills and knowledge
being assessed. Fairness is fundamentally a socio-cultural, rather than a technical,
issue. Thus, in this section we are going to see how culture and ethnicity may influence
teachers' assessment practices and what precautions we have to take in order to avoid
bias and be accommodating to students from all cultural groups.
Do you believe that culture and ethnicity have any role in teachers'
assessment practices? In your university experience, have you observed
situations where instructors were biased in the assignment of grades to
students based on culture and ethnicity? If so, do you think that was fair?
Students represent a variety of cultural and linguistic backgrounds. If the cultural and
linguistic backgrounds are ignored, students may become alienated or disengaged from
the learning and assessment process. Teachers need to be aware of how such
backgrounds may influence student performance and the potential impact on learning.
Teachers should be ready to provide accommodations where needed.
Assessment practices that attend to issues of linguistic diversity include those that:
acknowledge students' differing linguistic abilities and use that knowledge to
adjust or scaffold assessment practices if necessary;
use assessment practices in which the language demands do not unfairly
prevent the students from understanding what is expected of them;
use assessment practices that allow students to accurately demonstrate their
understanding by responding in ways that accommodate their linguistic abilities,
if the response method is not relevant to the concept being assessed (e.g., allow
a student to respond orally rather than in writing).
Teachers must make every effort to address and minimize the effect of bias in
classroom assessment practices. Bias occurs when irrelevant or arbitrary factors
systematically influence the interpretations and results made, affecting the performance
of an individual student or a subgroup of students. For example, bias may occur when
variables such as cultural and language differences and socioeconomic status are not
fairly accounted for when interpreting results from an assessment.
Assessment should be culturally and linguistically appropriate, fair, and bias-free. It may
not be possible to totally eliminate all forms of bias from classroom assessments.
However, teachers and others who assess students' learning should recognize that
bias is an ever-present concern in student assessment and should be vigilant about
and resistant to the sources of bias, including having plans for identifying and
addressing bias. For an assessment task to be fair, its content, context, and
performance expectations should:
reflect knowledge, values, and experiences that are equally familiar and
appropriate to all students;
tap knowledge and skills that all students have had adequate time to acquire;
be as free as possible of cultural and ethnic stereotypes.
Inclusive education is based on the idea that all students, including those with
disabilities, should be provided with the best possible education to develop themselves.
This calls for the provision of all possible accommodations to address the educational
needs of students with disabilities. Accommodations should not only refer to the
teaching and learning process; they should also cover assessment mechanisms and
procedures.
Activity: In groups, discuss how assessment can be made accessible to
students with different types of disabilities. Each group may discuss one
type of disability and share its ideas with the other groups.
There are different strategies that can be considered to make assessment practices
accessible to students with disabilities depending on the type of disability. In general
terms, however, the following strategies could be considered in summative
assessments:
If the questions involve objects and ideas that are more familiar or less offensive to
members of one gender, then the test may be easier for individuals of that gender.
Standards for achievement on such a test may be unfair to individuals of the gender that
is less familiar with or more offended by the objects and ideas discussed, because it
may be more difficult for such individuals to demonstrate their abilities or their
knowledge of the material.
Unit Summary
In this unit you have learned that ethics is a very important issue we have to attend to in
our assessment practices, and the most important ethical consideration is fairness. If
we are to draw reasonably good conclusions about what our students have learned, it is
imperative that we make our assessments, and our uses of the results, as fair as
possible for as many students as possible. A fair assessment is one in which students
are given equitable opportunities to demonstrate their abilities and knowledge.
Teachers must make every effort to address and minimize the effect of bias in
classroom assessment practices. Biases in assessment can occur because of
differences in culture or ethnicity, disability, or gender. To ensure suitability and
fairness for all students, teachers need to check each assessment strategy for its
appropriateness and for cultural, disability, and gender bias.
Equitable assessment means that students are assessed using methods and
procedures most appropriate to them. Classroom assessment practices should be
sensitive and flexible enough to accommodate all types of diversity in the classroom in
order to obtain accurate information about students' learning.
GLOSSARY
Assessment as Learning: The process of developing and supporting student
metacognition.
Assessment criteria: the properties, dimensions, or characteristics by which a
student's achievement is judged or appraised
Assessment for Learning: The ongoing process of gathering and interpreting
evidence about student learning for the purpose of determining where students are in
their learning, where they need to go, and how best to get there
Assessment of Learning: The process of collecting and interpreting evidence for the
purpose of summarizing learning at a given point in time, to make judgements about the
quality of student learning on the basis of established criteria, and to assign a value to
represent that quality
Assessment: the process of gathering information, both formally and informally, about
students' understandings and skills
Authentic assessment: demonstration or application of a skill or ability within a real-life
context
Convergent assessments: assessment activities that have only one correct response
that the student is trying to reach
Correlation coefficient: a statistic that indicates the degree of relationship between
any two sets of scores obtained from the same group of individuals
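As an illustration of this definition, the following sketch (our own example, not part of the module) computes Pearson's correlation coefficient for two sets of scores obtained from the same group of students; the function name and the sample scores are invented for the illustration:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each score set
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Example: scores of the same five students on two tests
test1 = [55, 60, 70, 80, 90]
test2 = [58, 62, 68, 85, 88]
print(round(pearson_r(test1, test2), 3))  # → 0.976
```

A coefficient near +1 indicates that students ranked similarly on both tests; a value near 0 indicates little relationship between the two sets of scores.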
Criterion referenced: criterion-referenced tests measure student performance against
a set of standards with determined levels (advanced, proficient, basic)
Criterion-referenced assessment: is the process of evaluating (and grading) the
learning of students against a set of pre-specified qualities or criteria, without reference
to the achievement of others in the cohort or group.
Diagnostic assessment: information collected before learning that is used to assess
prior knowledge and identify misconceptions
Discrimination: the capacity of a test to distinguish between high- and low-achieving
learners
Divergent assessments: assessment activities for which a range of answers
or solutions might be considered correct
Evaluation: the process of making judgments about the level of students' achievement
for accountability, promotion, and certification
Fairness: addresses the issue of possible bias or discrimination of an assessment
toward any individual or group (race, gender, ethnicity)
Formal assessment: assessment in which students are aware that the task they are
doing is for assessment purposes
Formative assessment: information collected during learning that is used to make
instructional decisions
Informal Assessment: refers to assessment techniques that can easily be
incorporated into classroom routines and learning activities
Item analysis: the process of examining each test item to ascertain whether it is
functioning properly in measuring what the entire test is measuring
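To make the item analysis definition concrete, the following sketch computes two common item statistics: the difficulty index (proportion answering correctly) and a simple discrimination index comparing upper- and lower-scoring groups. The function names, the 27% group convention, and the sample data are our own illustrative assumptions, not prescriptions from the module:

```python
def item_difficulty(responses):
    """Difficulty index: proportion of examinees answering the item
    correctly. responses is a list of 0/1 scores on a single item."""
    return sum(responses) / len(responses)

def item_discrimination(item_scores, total_scores, fraction=0.27):
    """Simple discrimination index: difficulty in the upper group minus
    difficulty in the lower group, where examinees are ranked by total
    test score and the top and bottom fractions (27% is a common
    convention) are compared on the item."""
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:n]]   # lowest total scores
    upper = [item_scores[i] for i in order[-n:]]  # highest total scores
    return item_difficulty(upper) - item_difficulty(lower)

# Example: one item's 0/1 scores and the examinees' total test scores
item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
totals = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
print(item_discrimination(item, totals))  # → 1.0 (item discriminates well)
```

An index near +1 means the item separates strong from weak examinees well; an index near 0 (or negative) flags an item that is not functioning properly.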
Learning outcomes: are what students should know and be able to do, and/or value at
the completion of a unit of study.
Measurement: The process of obtaining a numerical description of the degree to which
an individual possesses a particular characteristic.
Norm referenced: norm-referenced tests compare student performance to a national
population of students who served as the norming group
Norm-referenced assessment: determines student achievement (grades) based on a
position within a cohort of students (the norm group)
Objectivity: the fairness of a test to the testee; the degree to which scoring is
unaffected by the scorer's personal judgment. Objectivity supports high validity and
reliability
Percentile rank: is a single number that indicates the percentage of the norm group
that scored below a given raw score.
Percentile: a statistical device that shows how a student compares with students in the
norming group who had the same or a lower score
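The percentile rank defined above can be computed directly from a norm group's raw scores. The sketch below (our own illustration; the function name and sample scores are invented) follows the glossary's definition, counting only scores strictly below the given raw score; some texts additionally credit half of any tied scores:

```python
def percentile_rank(norm_scores, raw_score):
    """Percentage of the norm group scoring below the given raw score,
    following the strict 'scored below' definition."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)

# Example: a norm group of ten raw scores
norm = [40, 45, 50, 52, 55, 60, 62, 70, 75, 80]
print(percentile_rank(norm, 62))  # → 60.0 (six of ten scores fall below 62)
```

A student with a raw score of 62 therefore has a percentile rank of 60, meaning 60% of the norm group scored below that student.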
Performance assessment: assessment in which students demonstrate specific
behaviors and abilities by performing tasks