Assessment and Evaluation of Learning

ASSESSMENT AND EVALUATION OF LEARNING

MEKELLE UNIVERSITY

PREPARED BY: Yohannes Gebretsadik

EDITOR: Dr Beyene Baraki

TECHNICAL ADVISER: Dr Mulu Nega, PRIN International Consultancy & Research Services PLC

MINISTRY OF EDUCATION

December, 2013
ADDIS ABABA


ASSESSMENT AND EVALUATION OF LEARNING

Course Code: PGDT 412

Credit hours: 3


TABLE OF CONTENTS

Unit 1: Assessment: Concept, Purpose, and Principles
1.1 Introduction
1.2 Concepts
1.3 Importance and Purposes of Assessment
1.4 The Role of Educational Objectives in Assessment
1.5 Principles of Assessment
1.6 Assessment and Some Basic Assumptions
1.7 Assessment, Learning, and the Involvement of Students
1.8 Assessment and Teacher Professional Competence in Ethiopia

Unit 2: Assessment Strategies, Methods, and Tools
2.1 Introduction
2.2 Types of Assessment
2.3 Assessment Strategies
2.4 Assessment in Large Classes
2.5 Selecting and Developing Assessment Methods and Tools
2.6 Arrangement of Test Items
2.7 Administration of Tests

Unit 3: Describing and Interpreting Test Scores
3.1 Introduction
3.2 Describing and Interpreting Test Results
3.2.1 Methods of Interpreting Test Scores
3.2.2 Measures of Central Tendency
3.2.3 Measures of Variability/Dispersion
3.2.4 Measures of Relative Position
3.2.5 Measures of Relationship
3.3 Characteristics of a Good Test

Unit 4: Item Analysis
4.1 Introduction
4.2 Analyzing Test Items
4.2.1 Item Difficulty Level Index
4.2.2 Item Discrimination Index
4.2.3 Effectiveness of Distractors in Multiple-Choice Items
4.2.4 Item Banking

Unit 5: Ethical Standards of Assessment
5.1 Introduction
5.2 Ethical and Professional Standards of Assessment and Its Use
5.3 Ethnicity and Culture in Tests and Assessments
5.4 Disability and Assessment Practices
5.5 Gender Issues in Assessment

Module Introduction
This module is designed to equip you with the basic knowledge and practical skills required to assess students’ learning. It incorporates descriptions of important concepts that help to clarify the assessment process; an elaboration of the major principles and procedures of the assessment process; the different assessment tools and strategies; the mechanisms used to maintain the quality of assessment tools and procedures; and the ethical standards of assessment. There are two prerequisite courses that you need to take before studying this module: Secondary School Curriculum & Instruction, and Psychological Foundations of Learning & Development.

Throughout this module there are in-text questions that invite you to pause your reading for a moment and reflect on what you are studying. In addition, there are many activities (at least one in each section) that you should attempt before proceeding from one section to the next. You therefore need to reflect on and answer each question and activity seriously if you are to gain a deep and meaningful understanding of the concepts under discussion and be a successful learner. You will also complete two assignments, which you will submit to your course tutor and which will count for 30% of your grade.

Finally, I wish you a good and successful learning journey. You may start studying your
module right now.

Module Learning Outcomes

This module is aimed at ensuring the following learning outcomes. On successful completion of this module you will be able to:
• describe concepts related to the assessment of student learning
• develop and implement classroom assessment strategies
• develop techniques for assessing students’ performance based on sound principles and educational objectives
• apply proper procedures when administering assessment tools
• interpret assessment results to understand their implications and thereby make appropriate decisions
• use feedback from classroom assessment to improve students’ learning
• analyze items to improve the fitness for purpose of classroom assessment tools
• conduct self-assessment of your teaching in view of student learning and standards of teacher professionalism
• adhere to professional ethical standards of assessment when assessing student learning, handling records, using or communicating assessment results, and making decisions

Module Contents
1. Assessment: Concept, purpose, and principles
2. Assessment strategies, methods, and tools
3. Describing and interpreting test scores
4. Item analysis
5. Ethical standards of assessment

Module Icons
In this module the following icons (symbols) are used to facilitate your learning process.

• This tells you there is an introduction to the module, unit and section.
• This tells you there is a question to answer or think about in the text.
• This tells you there is an activity to do.
• This tells you to note and remember an important point.
• This tells you there is a self-test for you to do.
• This tells you there is a checklist of the main points.
• This tells you there is a written assignment.

Unit 1

Assessment: Concept, Purpose, and Principles

1.1 Introduction

Welcome to the first unit of the “Assessment and Evaluation of Learning” module.

This is an introductory unit intended to familiarize you with some basic concepts that you will encounter while studying this course. Specifically, the concepts test, measurement, assessment and evaluation will be elaborated. Following this, the purposes of educational assessment are described. Next, there is a brief explanation of the role of educational objectives in assessment. This unit also presents the important principles that have to be adhered to when assessing students’ learning. Finally, it highlights the importance of involving students in the assessment process and then outlines the most important competencies that professional teachers are expected to possess in order to assess their students effectively.

Unit Learning Outcomes

Upon successful completion of this unit, you will be able to:

• Define test, measurement, assessment and evaluation.
• Distinguish among these related concepts in assessment and evaluation.
• Examine the purposes of assessment and evaluation of learning.
• Identify the principles of assessment and evaluation of learning.
• Apply the principles of assessment and evaluation of learning in the local context.


1.2 Concepts

Dear learner, before you start studying educational assessment and evaluation, you need to have a clear understanding of certain related concepts. A clear understanding of the basic concepts is fundamental to learning the subsequent topics of the course well. You might have come across the concepts test, measurement, assessment, and evaluation.

Reflection: Think about the concepts test, measurement, assessment, and evaluation for a moment. What do you know about these concepts? What is the difference among them?

You might have found it difficult to come up with a clear distinction in meaning among these concepts. This is because they may all be involved in a single process. The literature also shows some confusion and inconsistency in how these concepts are used. Now let us see the meaning of these concepts as used in this module.

Test: Perhaps test is the concept you are most familiar with. You have been taking tests ever since you started schooling, to determine your academic performance. Tests are also used in workplaces to select individuals for a job vacancy. In an educational context, then, a test refers to the presentation of a standard set of questions to be answered by students. It is one instrument used to determine students’ ability to complete certain tasks or to demonstrate mastery of a skill or knowledge of content. Please note that there are many ways of collecting information about students’ educational performance other than tests, such as observations, assignments, project work, portfolios, etc.

Measurement: In our day-to-day life there are different things that we measure. We measure our height and express it in meters and centimeters. We measure some of our daily consumption, like sugar in kilograms and liquids in liters. We measure temperature and express it in degrees Celsius (centigrade). How do we measure these things? We need appropriate and standard instruments such as a ruler, a meter stick, a weighing scale, or a thermometer in order to obtain reliable measurements.

In education, measurement is the process by which the attributes of a person are measured and described in numbers. It is a quantitative description of the behavior or performance of students. As educators we frequently measure human attributes such as attitudes, academic achievement, aptitudes, interests, personality and so forth. Measurement permits a more objective description of traits and facilitates comparisons. Hence, we need measurement instruments before we can conclude that one student is better than another in a certain subject. How do we measure performance in mathematics? We use a mathematics test, which is an instrument containing questions and problems to be solved by students. The number of correct responses obtained indicates the performance of individual students in mathematics. Thus, the purpose of educational measurement is to represent, using numbers, how much of ‘something’ a person possesses. Note that we are only collecting information. We are not evaluating! Evaluation is therefore quite different from measurement. Measurement is also not the same as testing. While a test is an instrument to collect information about students’ behaviors, measurement is the assignment of numerical values to the results of a test or other assessment techniques. Measurement can refer both to the score obtained and to the process itself.
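To make the idea concrete, the following minimal Python sketch treats measurement as described above: assigning a number to test results by a fixed rule, here one point per correct response. The answer key, student labels and responses are hypothetical and serve only as an illustration; no judgment about quality is made at this stage.

```python
# Hypothetical answer key for a five-item mathematics test (illustration only).
ANSWER_KEY = ["B", "D", "A", "C", "B"]


def raw_score(responses: list[str], key: list[str]) -> int:
    """Measurement: award one point for each response that matches the key."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


# Hypothetical responses from two students.
students = {
    "Student A": ["B", "D", "A", "A", "B"],
    "Student B": ["C", "D", "A", "C", "D"],
}

scores = {name: raw_score(answers, ANSWER_KEY) for name, answers in students.items()}
print(scores)  # {'Student A': 4, 'Student B': 3}
```

The numbers only say how many items each student answered correctly. Whether 4 out of 5 is "good" is not decided by the rule itself; that judgment belongs to evaluation.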

Assessment: In the educational literature the concepts ‘assessment’ and ‘evaluation’ have been used with some confusion. Some educators have used them interchangeably to mean the same thing. Others have used them as two different concepts. Even when they are used differently, there is considerable overlap in the interpretations of the two concepts.

Cizek (in Phye, 1997) provides a comprehensive definition of assessment that incorporates its key elements:

the planned process of gathering and synthesizing information relevant to the purposes of (a) discovering and documenting students' strengths and weaknesses, (b) planning and enhancing instruction, or (c) evaluating progress and making decisions about students.

Activity 1: Individual Activity

How do teachers collect information about their students’ academic progress as well as about their own teaching? Please list the tools as exhaustively as possible.

Generally, educational assessment is viewed as the process of collecting information with the purpose of making decisions about students’ learning progress. We may collect information using various instruments, including teacher-made tests, observations of students (including their written work and answers to questions in class), checklists, questionnaires and interviews. Rowntree (1974) views assessment as a human encounter in which one person interacts with another, directly or indirectly, with the purpose of obtaining and interpreting information about the knowledge, understanding, abilities and attitudes possessed by that person. The key ideas in this definition of assessment are collecting data and making decisions. Assessment provides valuable information that allows teachers to adapt instructional procedures to the learning needs of their students. To make decisions, however, one has to evaluate, that is, make a judgment about a given situation.

Evaluation: This concept refers to the process of judging the quality of student learning
on the basis of established performance standards and assigning a value to represent
the worthiness or quality of that learning or performance. It is concerned with
determining how well students have learned. When we evaluate, we are saying that
something is good, appropriate, valid, positive, and so forth. Evaluation is based on
assessment that provides evidence of student achievement at strategic times
throughout the grade/course, often at the end of a period of learning. Value is inherent
in the idea of evaluation.


Activity 2: Pair Activity

What types of decisions might teachers make based on the information they collect about the teaching and learning process in general and students’ learning in particular? Please discuss this question with your colleague.

Evaluation includes both quantitative and qualitative descriptions of student behavior plus a value judgment concerning the desirability of that behavior. The following simple expression shows the relationship between measurement and evaluation.

Evaluation = Quantitative description of students’ behavior (measurement) + qualitative description of students’ behavior (non-measurement) + value judgment

Thus, evaluation may or may not be based on measurement (or tests), but when it is, it goes beyond the simple quantitative description of students’ behavior. Evaluation involves value judgment. The quantitative values that we obtain through measurement have no meaning until they are evaluated against some standards or criteria. Educators constantly evaluate students, usually in comparison with some standard. For example, if the objective of the lesson is for students to solve quadratic equations and if, having been given a test related to this objective, all learners are able to solve at least 80% of the problems, then the teacher may conclude that his or her teaching of the topic was quite successful.
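The quadratic-equation example can be sketched in Python as follows. The 10-problem test and the learners’ results are hypothetical; only the 80% standard comes from the text. The percentage is the measurement; comparing it with the standard and drawing a conclusion is the evaluation.

```python
# Hypothetical data: problems solved (out of 10) by each learner after the lesson.
TOTAL_PROBLEMS = 10
CRITERION_PERCENT = 80  # the standard used in the example above

solved = {"Learner 1": 9, "Learner 2": 8, "Learner 3": 10, "Learner 4": 7}


def percent_correct(n_solved: int, total: int) -> float:
    """Measurement: express performance as a percentage of the problems."""
    return 100 * n_solved / total


def meets_criterion(n_solved: int, total: int, criterion: int) -> bool:
    """Evaluation: judge the measured value against the standard.

    Integer arithmetic avoids floating-point rounding exactly at the boundary.
    """
    return 100 * n_solved >= criterion * total


results = {
    name: meets_criterion(n, TOTAL_PROBLEMS, CRITERION_PERCENT)
    for name, n in solved.items()
}
all_met = all(results.values())

for name, n in solved.items():
    verdict = "meets" if results[name] else "below"
    print(f"{name}: {percent_correct(n, TOTAL_PROBLEMS):.0f}% ({verdict} standard)")
print("Teaching judged successful" if all_met else "Not all learners met the standard")
```

With these hypothetical numbers, Learner 4 solved only 70% of the problems, so the condition "all learners solve at least 80%" is not met and the teacher could not yet draw the conclusion described in the example.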

So, we can describe evaluation as the comparison of what is measured against some defined criteria to determine whether it has been achieved, whether it is appropriate, whether it is good, whether it is reasonable, whether it is valid and so forth. Good evaluation accurately summarizes and communicates to parents, other teachers, employers, institutions of further education, and students themselves what students know and can do with respect to the overall curriculum expectations.

Now, let’s summarize the differences and relationships among the four concepts.

A test is a particular type of assessment instrument that typically consists of sets of questions administered during a fixed period of time under reasonably comparable conditions for all students.

Measurement is the assigning of numbers to the results of a test or other forms of assessment according to a specific rule.

Assessment is a much more comprehensive and inclusive concept than testing and measurement. It includes the full range of procedures (observations, rating of performances, paper-and-pencil tests, etc.) used to gain information about students’ learning. It may also include quantitative descriptions (measurement) and qualitative descriptions (non-measurement) of students’ behaviors.

Evaluation, on the other hand, consists of making judgments about the level of students’ achievement for purposes of grading and accountability and for making decisions about promotion and graduation. To make an evaluation, we need information, which is often obtained by measuring with a reliable instrument.

Activity 3: Individual Activity

1) Have you ever heard of or experienced the process of assessment? What does the word “assessment” mean to you?
2) Define the terms test, measurement, and evaluation in your own words.
3) Can you differentiate assessment from the concepts test, measurement and evaluation?

1.3 Importance and Purposes of Assessment

One of the first things to consider when planning for assessment is its purpose. Why is assessment important? Who will use the results? How will they use them? As prospective teachers, you also need a clear idea of the purposes assessment serves. So let’s consider the following questions:

Activity 4: Think-Pair-Share

In the previous section we saw that assessment is the process of collecting information and making decisions. Why do we need assessment in education? What do you think is the purpose of assessment? Why do teachers assess their students? Reflect on these questions, write your ideas on a piece of paper and share them with your colleagues before you continue reading this section.

Assessment is important because it drives and directs students’ learning, provides feedback to students on their performance, provides feedback on instruction, and ensures that standards of learning progression are met. It involves both students and teachers in the continuous monitoring of students' learning. It provides staff with feedback about their effectiveness as teachers, and it gives students a measure of their progress as learners. Through close observation of students in the process of learning and the collection of frequent feedback on students' learning, teachers can learn much about how students learn and, more specifically, how students respond to particular teaching approaches. Classroom assessment helps individual teachers obtain useful feedback on what, how much, and how well their students are learning. Teachers can then use this information to refocus their teaching and help students make their learning more efficient and more effective.

Thus, based on the reasons for assessment described above, assessment in education can be summarized as focusing on:

• helping LEARNING, and
• improving TEACHING.

With regard to the learner, assessment aims to provide information that helps us make decisions concerning remediation, enrichment, selection, exceptionality, progress and certification. Assessment for improved student learning requires a range of assessment practices to be used with the following overarching purposes:

Assessment of Learning: This kind of assessment is usually summative in nature and is done at the end of a learning task. It is designed to provide evidence for teachers to make decisions/judgments about students’ achievement against set goals and standards. It also provides evidence of students’ achievement to parents, administrators, educators and students themselves.

Assessment for Learning: This type of assessment occurs while teaching and learning are in progress, rather than at the end. In assessment for learning, teachers use assessment evidence to monitor students’ learning progress and inform their teaching. This form of assessment is designed to provide diagnostic information to teachers about students’ prior knowledge and formative information about the effects of their instruction on student learning. It also provides students with important information about their learning and the effectiveness of the learning strategies they are using. It is the most important form of assessment with regard to student learning.

Assessment as Learning: This form of assessment makes assessment part of, not separate from, the instructional process. Assessment as learning involves students in their own continuous self-assessment and is designed to help students become more self-directed learners. Self-assessment involves helping students set their own learning goals, monitor progress toward achieving these goals, and adjust their learning strategies as required. Students can be involved in assessment in a variety of ways, including helping to establish criteria for success and developing assessment rubrics.

Assessment as learning also takes the form of peer assessment, with peer interaction and feedback. Although strategies for self- and peer assessment are less well developed than the other two forms of assessment, they are nonetheless very important for two reasons. First, these assessments help achieve what some see as the ultimate goal of education: developing independent learners. Second, students are often more receptive to feedback from peers than to feedback from their teachers.


With regard to teaching, assessment provides information about the attainment of objectives and the effectiveness of teaching methods and learning materials.

Overall, assessment serves the following main purposes.

1) Assessment is used to inform and guide teaching and learning: A good classroom assessment plan gathers evidence of student learning that informs teachers' instructional decisions. It provides teachers with information about what students know and can do. To plan effective instruction, teachers also need to know what students misunderstand and where their misconceptions lie. In addition to helping teachers formulate the next teaching steps, a good classroom assessment plan provides a road map for students. Students should, at all times, have access to the assessment information so that they can use it to inform and guide their learning.

2) Assessment is used to help students set learning goals: Students need frequent opportunities to reflect on where they are in their learning, where they need to go and what needs to be done to achieve their learning goals. When students are actively involved in assessing their own next learning steps and creating goals to accomplish them, they make major advances in directing their learning and in understanding themselves as learners.

3) Assessment is used to assign report card grades: Grade reports provide parents, schools, and other stakeholders, including the government, post-secondary institutions and employers, with summary information about student learning.

4) Assessment is used to motivate students: Research has shown that students become confident and motivated when they experience progress and achievement, rather than the failure and defeat associated with being compared to more successful peers.

Activity 5: Group Activity

In small groups, discuss the extent to which each of the purposes of assessment has been served by the different assessment activities you went through while you were at your respective universities, and report the results of your discussions.

1.4 The Role of Educational Objectives in Assessment

Activity 6: Individual Activity

Based on what you have learned from previous courses, reflect on the following questions.
1) Define educational objectives and learning outcomes.
2) How can we classify educational objectives? Describe “Bloom’s Taxonomy of Educational Objectives”.
3) Discuss the importance of educational objectives to the instructional process.

As you might remember from your “Secondary School Curriculum and Instruction” course, the first step in planning any good teaching is to clearly define the educational objectives or outcomes. These are desirable changes in behavior, or outcome statements that capture specifically what knowledge, skills and attitudes learners should be able to exhibit following the instructional process. Defining learning objectives is also essential to the assessment of students’ learning. Effective assessment practice requires relating the assessment procedures as directly as possible to the learning objectives.

Educational objectives, commonly known as learning outcomes, play a key role in both the instructional process and the assessment process. They serve as guides for both teaching and learning, communicate the intent of instruction to others, and provide guidelines for assessing students’ learning.

Educational objectives or learning outcomes are stated in terms of what the students are expected to be able to do at the end of the instruction. For instance, after teaching students how to solve quadratic equations, we might expect them to have the skill of solving any quadratic equation. A learning outcome stated in this way clearly indicates the kind of performance students are expected to exhibit as a result of the instruction. It also makes clear the intent of our instruction and sets the stage for assessing students’ learning. Well-stated learning outcomes contain three elements, namely conditions of performance, observable behavior or action, and measurable criteria or standards. For example: “Given ten quadratic equations (condition), the student will solve them (observable behavior) with at least 80% accuracy (criterion).” These elements help us make clear the types of student performance we are willing to accept as evidence that the instruction has been successful.

Classification of Educational Objectives

Bloom and his associates developed a taxonomy of educational objectives, which provides a practical framework within which educational objectives can be organized and measured. In this taxonomy, Bloom et al. (1956) divided educational objectives into three domains: the cognitive domain, the affective domain and the psychomotor domain. Each domain is further categorized into hierarchical levels. That is, achievement of a higher level of skill assumes the achievement of the previous levels: a higher level of skill can be achieved only if a certain amount of the ability called for by the previous level has already been achieved.

Cognitive domain: This involves those objectives that deal with the development of intellectual abilities and skills, that is, with mental abilities. Because its levels are hierarchical, you cannot, for instance, apply what you do not know or comprehend.

Levels of the Cognitive Domain

Level           Description
Knowledge       Recognition or recall of previously learned information
Comprehension   Internalization of knowledge
Application     Use of abstractions in a concrete situation
Analysis        Breaking down learned material into its parts, ideas and devices
                for clearer understanding
Synthesis       Combining components to form a new whole
Evaluation      Making a quantitative or qualitative judgment about a piece of
                communication, a procedure, a method, a proposal, a plan, etc.
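The cumulative assumption behind the taxonomy (a higher level presupposes the levels beneath it) can be expressed as a short Python sketch. The mastery record is hypothetical, and treating mastery of a level as a simple yes/no is a simplifying assumption made only for illustration.

```python
from typing import Optional

# Cognitive levels in the order listed in the table above, lowest first.
COGNITIVE_LEVELS = [
    "Knowledge", "Comprehension", "Application",
    "Analysis", "Synthesis", "Evaluation",
]


def highest_level_reached(mastered: set[str],
                          levels: list[str] = COGNITIVE_LEVELS) -> Optional[str]:
    """Return the highest level whose lower levels are all mastered.

    Walks up the hierarchy and stops at the first level not mastered,
    because under the hierarchical assumption nothing above it can count.
    Returns None if even the lowest level is not mastered.
    """
    reached = None
    for level in levels:
        if level not in mastered:
            break
        reached = level
    return reached


# Hypothetical record: "Analysis" is marked as mastered, but "Application" is not.
record = {"Knowledge", "Comprehension", "Analysis"}
print(highest_level_reached(record))  # Comprehension
```

Under this assumption the apparent mastery of analysis does not count, because the application level beneath it has not been achieved.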

Affective Domain: The affective domain has to do with feelings and emotions. It is concerned with interests, attitudes, appreciation, emotional biases and values. Can you now see that your feelings, your emotions, your appreciation and the value you place on this course together form your affective disposition toward the course? It shows your personal-social adjustment in this course and indeed in the programme.

Levels of the Affective Domain

Level              Description
Receiving          Freely attending to stimuli
Responding         Voluntarily reacting to stimuli
Valuing            Forming an attitude toward a stimulus
Organization       Bringing together different values and building a consistent
                   value system by resolving any possible conflicts between them
Characterization   Behaving consistently with an internally developed, stable
                   value system

Psychomotor domain: The psychomotor domain has to do with motor skills or abilities. It deals with activities that involve the use of the hands or the whole body. Can you think of such abilities or skills? Consider the skills involved in running, walking, swimming, jumping, eating, playing, throwing, etc.

Levels of the Psychomotor Domain

Level            Description
Imitation        Observing and patterning behavior after someone else
Manipulation     Being able to perform certain actions by following written or
                 oral instructions and practicing
Precision        Refining and becoming more exact; few errors are apparent
Articulation     Coordinating a series of actions, achieving harmony and
                 internal consistency
Naturalization   Performing at a high level that has become natural, without
                 needing to think much about it

Activity 7: Group Activity

In small groups, discuss the following questions.

1. To what extent were the objectives of the courses you took directly related to the types of assessment your instructors used to measure your learning progress?
2. How frequently did your instructors assess your progress to check whether the objectives were achieved?
3. Have you ever thought about the objectives of the course(s) you were taking, during the learning process or when studying in preparation for exams?

1.5 Principles of Assessment

Assessment principles are statements highlighting what are considered critical elements or characteristics of good practice in assessing student learning progress. These principles are expressed in terms of the elements of a fair (reliable and valid) assessment system. Thus, each principle introduces an issue that must be addressed when evaluating a student assessment system. Assessment principles guide the collection of meaningful information that will help inform instructional decisions, promote student engagement, and improve student learning.

Different educators and school systems have developed somewhat different sets of assessment principles. Miller, Linn and Gronlund (2009) have identified the following general principles of assessment.

1. Clearly specifying what is to be assessed has priority in the assessment process.
2. An assessment procedure should be selected because of its relevance to the characteristics or performance to be measured.
3. Comprehensive assessment requires a variety of procedures.
4. Proper use of assessment procedures requires an awareness of their limitations.
5. Assessment is a means to an end, not an end in itself.

Perhaps the assessment principles developed by the New South Wales Department of Education and Training (2008) in Australia are more inclusive than those listed by other educators. Let us look at these principles and compare them with those developed by Miller, Linn and Gronlund, described above.

1. The primary purpose of assessment should be to improve student learning and performance. Good assessment is based on the kind of learning we most value for our students and sets out to measure what matters most.
2. Assessment should be based on an understanding of how students learn. Assessment is most effective when it recognizes that learning is a complex process that is multi-dimensional, integrated and revealed in student performance over time.

3. Assessment should be relevant. Assessment needs to provide information about students’ knowledge, skills and understandings of the learning outcomes specified in the syllabus.
4. Assessment should be appropriate. Assessment needs to provide information about the particular kind of learning in which we are interested. This means that we need to use a variety of assessment methods, because not all methods are capable of providing information about all kinds of learning. For example, some kinds of learning are best assessed by observing students, some by having students complete projects or make products, and others by having students complete pen-and-paper tasks. Conclusions about student achievement in an area of learning are valid only when the assessment method we use is appropriate and measures what it is supposed to measure.
5. Assessment should be fair. Assessment needs to provide opportunities for every
student to demonstrate what they know, understand and can do. Assessment must be
based on a belief that all learners are on a path of development and that every learner is
capable of making progress. Students bring a diversity of cultural knowledge,
experience, language proficiency and background, and ability to the classroom. They
should not be advantaged or disadvantaged by such differences that are not relevant to
the knowledge, skills and understandings that the assessment is intended to address.
Students have the right to know what is assessed, how it is assessed and the worth of
the assessment. Assessment will be fair or equitable only if it is free from bias or
favoritism.
6. Assessment should be accurate. Assessment needs to provide evidence that
accurately reflects an individual student’s knowledge, skills and understandings.
That is, assessments need to be reliable or dependable in that they consistently
measure a student’s knowledge, skills and understandings. Assessment also
needs to be objective so that if a second person assesses a student’s work,
they will come to the same conclusion as the first person. Assessment will be fair
to all students if it is based on reliable, accurate and defensible measures.
7. Assessment should provide useful information. The focus of assessment is to establish
where students are in their learning. This information can be used for both summative
purposes, such as the awarding of a grade, and formative purposes to feed directly into
the teaching and learning cycle.
8. Assessment should be integrated into the teaching and learning cycle. Assessment
needs to be an ongoing, integral part of the teaching and learning cycle. It must allow
teachers and students themselves to monitor learning. From the teacher perspective, it
provides the evidence to guide the next steps in teaching and learning. From the
student perspective, it provides the opportunity to reflect on and review progress, and
can provide the motivation and direction for further learning.
9. Assessment should draw on a wide range of evidence. A complete picture of student
achievement in an area of learning depends on evidence that is sampled from the full
range of knowledge, skills and understandings that make up the area of learning. An
assessment program that consistently addresses only some outcomes will provide
incomplete feedback to the teacher and student, and can potentially distort teaching
and learning.

10. Assessment methods should be valid and reliable. Assessment instruments and
processes should measure what they are intended to measure, and should do so
consistently.
11. Assessment should give attention to outcomes and processes. Assessment
instruments should reflect both the outcomes of learning and the kind of effort that
led to these outcomes.
12. Assessment works best when it is ongoing rather than episodic. Student learning is
best fostered when assessment involves a linked series of activities undertaken over
time.
13. Assessment should incorporate feedback mechanisms. Assessment for improved
performance involves feedback and reflection. All assessment methods should allow
students to receive feedback on their learning and performance. Assessment should
also provide students and staff with opportunities to reflect on both their practice and
their learning overall.
14. Assessment should be manageable. Assessment needs to be efficient, manageable and
convenient. It needs to be incorporated easily into usual classroom activities, and it
needs to provide information that justifies the time spent.

Activity 8: Group Activity


In small groups, discuss the following questions.
1) Are there differences between the two sets of principles described above? What can
you say about their differences?
2) What is the importance of each of these principles to the teaching
and learning process?
3) List possible factors that could hinder the application of these principles, and
identify solutions for each constraint to make your assessment
comprehensive and effective.
4) Based on your experiences, compare and contrast the extent to which each of
these principles was followed at the secondary and university education
levels.

1.6. Assessment and Some Basic Assumptions


Reflection: When planning to assess students, what assumptions should one keep in
mind? What should be kept in mind when preparing tools for assessing students?

Angelo and Cross (1993) have listed seven basic assumptions of classroom
assessment which are described as follows:

1. The quality of student learning is directly, although not exclusively, related
to the quality of teaching. Therefore, one of the most promising ways to
improve learning is to improve teaching. If assessment is to improve the
quality of students’ learning, both teachers and students must become personally
invested and actively involved in the process.

Reflection: What should be the roles of students and teachers in classroom
assessment so that it helps students’ learning?

2. To improve their effectiveness, teachers need first to make their goals and
objectives explicit and then to get specific, comprehensible feedback on the
extent to which they are achieving those goals and objectives. Effective
assessment begins with clear goals. Before teachers can assess how well their
students are learning, they must identify and clarify what they are trying to teach.
After teachers have identified specific teaching goals they wish to assess, they
can better determine what kind of feedback to collect.

3. To improve their learning, students need to receive appropriate and focused
feedback early and often; they also need to learn how to assess their own
learning.

Reflection: How do you think feedback and self-assessment will help to improve
students’ learning?

4. The type of assessment most likely to improve teaching and learning is that
conducted by teachers to answer questions they themselves have
formulated in response to issues or problems in their own teaching. To best
understand their students’ learning, teachers need specific and timely
information about the particular individuals in their classes. As a result of the
different students’ needs, there is often a gap between assessment and student
learning. One goal of classroom assessment is to reduce this gap.

Reflection: How does classroom assessment help to reduce this gap between
assessment and student learning?

5. Systematic inquiry and intellectual challenge are powerful sources of
motivation, growth, and renewal for teachers, and classroom assessment
can provide such challenge. Classroom assessment is an effort to encourage
and assist those teachers who wish to become more knowledgeable, involved,
and successful.

6. Classroom assessment does not require specialized training; it can be
carried out by dedicated teachers from all disciplines. To succeed in
classroom assessment, teachers need only a detailed knowledge of the discipline,
dedication to teaching, and the motivation to improve.

7. By collaborating with colleagues and actively involving students in
classroom assessment efforts, teachers (and students) enhance learning
and personal satisfaction. By working together, all parties achieve results of
greater value than those they can achieve by working separately.

Reflection: Can you explain how teachers’ collaboration with colleagues can be more
effective in enhancing learning and personal satisfaction than working alone?

1.7. Assessment, Learning, and the Involvement of Students


During teaching, you will be assessing students’ learning continuously. You will be
interpreting what the students say and do in order to make judgments about their
achievements. The ability to analyze the students’ learning is vital if you are to make
appropriate teaching points which help the students develop their knowledge and/or
competence. You will be using your subject knowledge to help you identify what to look
for and where to take the student next. You will need to listen, observe and question in
ways which will enable you to give appropriate feedback or further instruction.

There is considerable evidence that assessment is a powerful process for enhancing
learning. Black and Wiliam (1998) synthesized over 250 studies linking assessment and
learning and found that the intentional use of assessment in the classroom to promote
learning resulted in improved student achievement. Classroom assessment promotes
learning when teachers use it in the following ways:

• to become aware of the knowledge, skills, and beliefs that their students bring to a
learning task; and

• to use this knowledge as a starting point for new instruction, and to monitor
students’ changing perceptions as instruction proceeds.

Activity 9: Group Activity

In small groups, discuss the following issue.

As prospective teachers, how do you think you will use the information you collect
through different methods of assessment to improve the teaching and learning process?

When learning is the goal, teachers and students collaborate and use ongoing
assessment and pertinent feedback to move learning forward. When classroom
assessment is frequent and varied, teachers can learn a great deal about their students.
They can gain an understanding of students’ existing beliefs and knowledge, and can
identify incomplete understandings, false beliefs, and immature interpretations of
concepts that may influence or distort learning. Teachers can observe and probe
students’ thinking over time, and can identify links between prior knowledge and new
learning.

Learning is also enhanced when students are encouraged to think about their own
learning, to review their experiences of learning and to apply what they have learned to
their future learning. Assessment provides the feedback loop for this process. When
students (and teachers) become comfortable with a continuous cycle of feedback and
adjustment, students begin to internalize the process of standing outside their own
learning and considering it against a range of criteria, not just the teacher’s judgment
about quality or accuracy. When students engage in this ongoing metacognitive
experience, they are able to monitor their learning along the way, make corrections, and
develop a habit of mind for continually reviewing and challenging what they know.

Assessment also enhances students’ learning by increasing their motivation.
Motivation is essential for students’ engagement in their learning. The higher the
motivation, the more time and energy a student is willing to devote to any given task.
Even when a student finds the content interesting and the activity enjoyable, learning
requires sustained concentration and effort.

Reflection: How do you think assessment will help to increase students’ motivation?

According to current cognitive research, people are motivated to learn by success and
competence. When students feel ownership and have choice in their learning, they are
more likely to invest time and energy in it. Assessment can be a motivator, not through
reward and punishment, but by stimulating students’ intrinsic interest. Assessment can
enhance student motivation by:

• emphasizing progress and achievement rather than failure

• providing feedback to move learning forward

• reinforcing the idea that students have control over, and responsibility for, their
own learning

• building students’ confidence so that they are able and willing to take risks

• being relevant, and appealing to students’ imaginations

• providing the scaffolding that students need to genuinely succeed

Assessment is also an important instrument for implementing differentiated learning.
Classes consist of students with different needs, backgrounds, and skills. Each
student’s learning is unique. The contexts of classrooms, schools, and communities
vary. In addition, societal pressure for more complex learning for all students
requires teachers to find ways to create a wide range of learning options and
paths, so that all students have the opportunity to learn as much as they can, as deeply
as they can, and as efficiently as they can.

When students learn, they make meaning for themselves, and they approach learning
tasks in different ways. They bring with them their own understanding, skills, beliefs,
hopes, desires, and intentions. It is important to consider each individual student’s
learning, rather than talk about the learning of “the class.” Assessment practices lead
to differentiated learning when teachers use them to gather evidence to support every
student’s learning, every day in every class. The learning needs of some students may
require individualized learning plans.

There is strong evidence that involving students in the assessment process can have
clear educational benefits. Now stop reading for a moment and reflect on the
following questions.

Activity 10: Think-Pair-Share

1) As prospective teachers, how do you think you can involve your students in the
assessment process?

2) In what ways can students benefit if they are involved in the assessment
process? What do you think are the practical challenges in involving students in
assessment?

One way in which we can involve our students in the assessment process is to establish
the standards or assessment criteria together with them. This will help students
understand what is to be assessed. Working with students to develop assessment tools
is a powerful way to help students build an understanding of what a good product or
performance looks like. It helps students develop a clear picture of where they are
going, where they are now and how they can close the gap. This does not mean that
each student creates his or her own assessment criteria. You, as a teacher, have a
strong role to play in guiding students to identify the criteria and features of
understandings you want your students to develop.

Another important aspect is to involve students in trying to apply the assessment criteria
for themselves. The evidence is that through trying to apply criteria, or mark using a
model answer, students gain much greater insight into what is actually being required
and subsequently their own work improves in the light of this.

An additional benefit is that students can receive feedback on more learning
activities than would otherwise be possible, given the limited time a teacher has.

There are two main ways in which students can be involved in this type of assessment:
self-assessment and peer assessment. Self-assessment involves students judging
their own work. It begins with students understanding the learning intentions or
objectives for the particular lesson and the success criteria for the specific task or
activity. It develops into students’ awareness of their own strengths and weaknesses in
a particular subject (and as a learner in general) and the ability to identify their own
‘next steps’ or targets. Self-assessment allows students to think more carefully about
what they do and do not know, and what they additionally need to know to accomplish
certain tasks.

Peer assessment, by contrast, involves students making judgments about other
students’ work. Students learn to make better sense of assessment criteria if they
have to give feedback and/or marks against them. Giving and receiving feedback is an
important aspect of student learning, and both are valuable skills for professional
contexts and for future learning.

1.8 Assessment and Teacher Professional Competence in Ethiopia

Assessment takes up a large share of a teacher’s professional time, both inside and
outside the classroom. Therefore, a teacher should have basic competencies in
classroom assessment in order to assess his/her students’ learning effectively.

Activity 11: Think-Pair-Share

As prospective teachers, what competencies do you think you should have in the area
of assessment? Write down your ideas and compare them with those of another
colleague. From your previous experience as a primary or secondary school student,
what were the major professional deficiencies of teachers in assessing students?

A teacher's professional role and responsibilities for student assessment can be
conceptualized as falling along a time continuum. Assessment activities occur prior to
instruction, during instruction, and after instruction. Assessment prior to instruction
provides a teacher with information about individual differences among students as well
as an understanding of the background or prior knowledge of the class as a whole.
These assessment activities provide the basis for planning instruction.

Assessment during instruction provides information about the overall progress of the
whole class as well as specific information about individual students. These assessment
activities provide the basis for monitoring progress during learning.

Following the teaching of a specific unit, semester, academic year, or the like, decisions
must be made about the achievement of short- and long-term instructional goals. This
refers to assessment after instruction. In addition to these activities, communication
skills are needed to interpret and report performance standards or levels of
achievement to students and parents.

In the American education system a list of seven standards for teacher competence in
educational assessment of students has been developed. These standards for teacher
competence in student assessment have been developed with the view that student
assessment is an essential part of teaching and that effective teaching cannot exist
without appropriate student assessment. The seven standards articulating teacher
competence in the educational assessment of students are described below.

1. Teachers should be skilled in choosing assessment options appropriate for
instructional decisions. They need to be well-acquainted with the kinds of information
provided by a broad range of assessment alternatives and their strengths and
weaknesses. In particular, they should be familiar with criteria for evaluating and
selecting assessment methods in light of instructional plans.

2. Teachers should be skilled in developing assessment methods appropriate for
instructional decisions. Assessment tools may be accurate and fair (valid) or
inaccurate and unfair (invalid). Teachers must be able to determine the quality of the
assessment tools they develop.

3. Teachers should be skilled in administering, scoring, and interpreting the results
of assessment methods. It is not enough that teachers are able to select and develop
good assessment methods; they must also be able to apply them properly.

4. Teachers should be skilled in using assessment results when making decisions
about individual students, planning teaching, developing curriculum, and school
improvement.

5. Teachers should be skilled in developing valid student grading procedures that use
pupil assessments. Grading students is an important part of professional practice for
teachers.

6. Teachers should be skilled in communicating assessment results to students,
parents, other lay audiences, and other educators. Furthermore, teachers will
sometimes be in a position that will require them to defend their own assessment
procedures and their interpretations of them. At other times, teachers may need to help
the public to interpret assessment results appropriately.

7. Teachers should be skilled in recognizing unethical, illegal, and otherwise
inappropriate assessment methods and uses of assessment information. Teachers
must be well-versed in their own ethical and legal responsibilities in assessment. In
addition, they should also attempt to have the inappropriate assessment practices of
others discontinued whenever they are encountered.

In the Ethiopian context, the MoE has also developed assessment-related
competencies that professional teachers are expected to possess. These key
competencies are:
1) Assess student learning
2) Provide feedback to students on their learning
3) Interpret student data
4) Make consistent and comparable judgments
5) Report on student achievement

Activity 12: Group Activity

In small groups:

1) Compare Ethiopia’s standards of teacher competence in assessment with the
American standards and report the similarities and differences.

2) Discuss and report on the importance and use of having standards of teacher
competence in assessment for a particular school and for the education
system in general.

Unit Summary

• Test, measurement, assessment and evaluation are concepts that are frequently
used in the area of educational assessment and evaluation, often with varying
meanings and some confusion. Although they overlap, they differ in scope and
meaning.
• Assessment serves many important purposes, including informing and guiding
teaching and learning, helping students set learning goals, assigning report card
grades, and motivating students.
• Assessment for learning refers to the ongoing process of gathering and
interpreting evidence about student learning for the purpose of determining
where students are in their learning, where they need to go, and how best to get
there.
• Assessment as learning is the process of developing and supporting student
metacognition. Students are actively engaged in the assessment process; that is,
they monitor their own learning.
• Assessment of learning refers to the process of collecting and interpreting
evidence for the purpose of summarizing learning at a given point in time, to
make judgements about the quality of student learning on the basis of
established criteria, and to assign a value to represent that quality.
• Assessment should be designed in such a way that it will elicit information about
students’ progression towards the educational objectives.
• There are some important principles guiding the assessment of students’ learning
that professional teachers should be aware of.
• Any assessment process is based on certain basic assumptions.
• Assessment is an integral part of the teaching and learning process and is an
important tool for enhancing learning.
• In order to maximize the benefits students can get from assessment, they
should be involved in the assessment process.
• There are certain assessment competencies that teachers need to possess in order
to carry out their professional responsibilities effectively.

Self Check Exercises

In order to check your understanding of what you have learned in this unit, answer the
following questions and compare your answers with what is discussed in the material.
If you cannot answer any of these questions adequately, go back and study the
relevant part of the material again.

1. Define the concepts test, measurement, assessment and evaluation.
2. Give examples for each of the concepts mentioned above.
3. Describe the main purposes of assessment.
4. Discuss the importance of assessment for learning and teaching.
5. Discuss the role of educational objectives in assessment of learning. Give
examples of the classifications of educational objectives.
6. What are the strategies that can be used to involve students in the
assessment of their learning?
7. What are the major assessment competencies that Ethiopian professional
teachers are expected to possess?

References

Angelo, T. A. & Cross, K. P. (1993). Classroom Assessment Techniques: A Handbook
for College Teachers. 2nd ed. San Francisco: Jossey-Bass Publishers.

Braun, H., Kanjee, A., Bettinger, E., & Kremer, M. (2006). Improving Education
through Assessment, Innovation, and Evaluation. American Academy of Arts and
Sciences.

Educational Testing Service. Linking Classroom Assessment with Student Learning.

Ellis, V. (Ed.). (2007). Learning and Teaching in Secondary Schools. 3rd ed. Learning
Matters Ltd.

McDonald, E. S. & Hershman, D. M. (2010). Classrooms that Spark! Recharge and
Revive Your Teaching. 2nd ed. San Francisco: Jossey-Bass Publishers.

Mehrens, W. A. & Lehmann, I. J. Measurement and Evaluation in Education. 4th ed.
New York: Harcourt Brace College Publishers.

Miller, M. D., Linn, R. L. & Gronlund, N. E. (2009). Measurement and Assessment in
Teaching. 10th ed. Upper Saddle River: Pearson Education, Inc.

NSW Department of Education & Training (2008). Principles of Assessment and
Reporting in NSW Public Schools.

Phye, G. D. (Ed.). (1997). Handbook of Classroom Assessment: Learning,
Achievement, and Adjustment. San Diego: Academic Press.

Spiller, D. (2009). Assessment: Feedback to Promote Student Learning. Teaching
Development | Wāhanga Whakapakari Ako.

Western and Northern Canadian Protocol for Collaboration in Education. (2006).
Rethinking Classroom Assessment with Purpose in Mind: Assessment for
Learning, Assessment as Learning, and Assessment of Learning.

UNIT TWO
ASSESSMENT STRATEGIES, METHODS, AND TOOLS
2.1 Introduction
In the previous unit you were introduced to the major concepts of
educational assessment and evaluation. You also learned about the
purposes and principles of assessment. In this unit you will learn about the
nature, strengths and weaknesses of the various assessment strategies, methods and
tools that can be used in the context of secondary education. You will also learn about
the planning, construction and administration of classroom tests.

Learning Outcomes
At the end of this unit you should be able to:

• Identify relevant assessment strategies, methods and tools.
• Compare assessment and evaluation methods and tools to select appropriate ones.
• Develop formative and summative assessment tools in line with the principles of
assessment.
• Complete tasks on linking formative with summative assessment.
• Evaluate assessment tools by identifying relevant criteria.

2.2. Types and approaches to assessment


There are different approaches to conducting assessment in the classroom. Here we
are going to look at five pairs of assessment types: formal vs. informal,
criterion-referenced vs. norm-referenced, formative vs. summative,
divergent vs. convergent, and process vs. product assessment.

2.2.1 Formative vs. Summative Assessments

Assessment procedures can be classified according to their functional role during
classroom instruction. One such classification system follows the sequence in which
assessment procedures are likely to be used in the classroom. The most commonly
used categories in this regard are formative assessment and summative
assessment. Can you differentiate these concepts? Please try to describe them before
you continue with the following section.

Formative Assessment: Formative assessments are used to shape and guide
classroom instruction. They can include both informal and formal assessments (which
will be discussed later in this section) and help us to gain a clearer picture of where our
students are and what they still need help with. They can be given before, during, and
even after instruction, as long as the goal is to improve instruction.

Formative assessments are ongoing assessments, reviews, and observations in a
classroom. They are designed to assist the learning process by providing feedback to
both students and teachers. Students receive feedback that they can use to adjust and
improve their performance or other aspects of their engagement in the unit, such as
study techniques. For example, a language teacher may give comments on short
stories composed by his/her students, but not overall marks. Teachers receive feedback
on the quality of learners’ understanding and, consequently, can modify their teaching
approaches to provide enrichment or remedial activities to guide learners more
effectively. For example, if a teacher observes that some students do not grasp a
concept, he/she can design a review activity to reinforce the concept or use a different
instructional strategy to re-teach it. Teachers can conduct formative assessment at any
point in a unit of study.

Formative assessment is also known as ‘assessment for learning’. The central idea
behind this term is that the primary purpose of assessment should be to enhance
students’ learning.

There is still another name associated with the concept of formative
assessment: ‘continuous assessment’. Continuous assessment (as opposed to
terminal assessment) is based on the premise that if assessment is to help students
improve their learning, and if a teacher is to determine the progress of students
towards the achievement of the learning goals, it has to be conducted on a continuous
basis. Thus, continuous assessment is a teaching approach as well as a process of
deciding to what extent the educational objectives are actually being realized during
instruction. The main advantages of continuous assessment are that both students and
teachers obtain feedback from the process, which can then be used to improve
learning and teaching, and that the final result is based on evidence gathered
over the span of the learning period. In schools, continuous assessment of learning is
usually carried out by teachers on the basis of impressions gained as they observe their
students at work, or by various kinds of tests given periodically. Therefore, each decision
is based on various types of information gathered by teachers through different
assessment methods at different times.

In order to assess your students' understanding, there are various strategies that you
can use. Can you mention some of the strategies that you can use to assess your
students for formative purposes? Please try to mention as many strategies as you can.

The following are some of the assessment strategies you can employ in your
classroom. You can:

o make your students write their understanding of vocabulary or concepts
before and after instruction.

o ask students to summarize the main ideas they've taken away from your
presentation, discussion, or assigned reading.

o make students complete a few problems or questions at the end of
instruction and check their answers.

o interview students individually or in groups about their thinking as they
solve problems.

o assign brief, in-class writing assignments (e.g., "Why is this person or
event representative of this time period in history?").

Tests and homework can also be used formatively if teachers analyze where students
are in their learning and provide specific, focused feedback regarding performance and
ways to improve it.

Summative Assessment: Summative assessment typically comes at the end of a
course (or unit) of instruction. It evaluates the quality of students’ learning and assigns
a mark to the students’ work based on how effectively learners have addressed the
performance standards and criteria. Assessment tasks conducted during a semester
may be regarded as summative if their only function is to contribute to the students’
final grades.

The techniques used in summative assessment are determined by the instructional
goals. Typically, however, they include teacher-made achievement tests, ratings of
various types of performance, and assessment of products (reports, drawings, etc.).

The difference between formative and summative assessment can be summarized as
follows:

| | Formative Assessment | Summative Assessment |
|---|---|---|
| Timing | Conducted throughout the teaching-learning process | Conducted at the end of a teaching-learning phase (e.g., end of semester or year) |
| Method | Paper-and-pencil tests, observations, quizzes, exercises, practical sessions administered to the group and individually | Paper-and-pencil tests, oral tests administered to the group |
| Aim | To assess progress and recommend remedial action for non-achievement of objectives; remediation, enrichment, or re-teaching of the topic | Grading to determine whether the program was successful; to certify students and improve the curriculum |
| Example | Quizzes, essays, diagnostic tests, lab reports, and anecdotal records | Final exams, national examinations, qualifying tests |

A particular assessment task can be both formative and summative. For example,
students could complete unit 1 of their Module and complete an assessment task for
which they earned a mark that counted towards their final grade. In this sense, the task
is summative. They could also receive extensive feedback on their work. Such feedback
would guide learners to achieve higher levels of performance in subsequent tasks. In
this sense, the task is formative – because it helps students form different approaches
and strategies to improve their performance in the future.

2.2.2 Formal vs. Informal Assessment

Assessment can also be either formal or informal. Let us try to understand their
differences in the paragraphs that follow.

Formal Assessment: Formal assessments are those in which students are aware that the
task they are doing is for assessment purposes. They are frequently used in summative
assessment. This usually implies a written document, such as a test, quiz, or paper. A
formal assessment is given a numerical score or grade based on student performance.
We will deal more with formal assessment strategies, particularly tests, in a later
section.

Informal Assessment: "Informal" is used here to indicate techniques that can easily
be incorporated into classroom routines and learning activities. Informal assessment
techniques can be used at any time without interfering with instructional time. Their
results are indicative of the student's performance on the skill or subject of interest,
so they are more frequently used in formative assessment. Can you think of
informal assessment strategies that you can use in your classes? What informal
assessment strategies did your teachers use when you were a student?

An informal assessment usually occurs in a more casual manner and may include
observation, inventories, checklists, rating scales, rubrics, performance and portfolio
assessments, participation, peer and self-evaluation, and discussion. Formal tests
assume a single set of expectations for all students and come with prescribed criteria
for scoring and interpretation. Informal assessment, on the other hand, requires a clear
understanding of the levels of ability the students bring with them. Only then can the
teacher select assessment activities that students can reasonably attempt. Informal
assessment seeks to identify the strengths and needs of individual students without
regard to grade or age norms.

Methods for informal assessment can be divided into two main types: unstructured (e.g.,
student work samples, journals) and structured (e.g., checklists, observations). The
unstructured methods frequently are somewhat more difficult to score and evaluate, but
they can provide a great deal of valuable information about the skills of the students.
Structured methods can be reliable and valid techniques when time is spent creating the
"scoring" procedures. Another important aspect of informal assessments is that they
actively involve the students in the evaluation process; they are not just
paper-and-pencil tests.

2.2.3 Criterion-referenced vs. Norm-referenced Assessments

How the results of tests and other assessment procedures are interpreted also provides
a method of classifying these instruments. There are two ways of interpreting student
performance – criterion-referenced and norm-referenced.

Criterion-referenced Assessment: This type of assessment allows us to quantify the
extent to which students have achieved the goals of a unit of study or a course. It is
carried out against previously specified criteria and performance standards. Where a
grade is assigned, it is assigned on the basis of the standard the student has achieved
on each of the criteria. This type of assessment is most appropriate for quickly
assessing what concepts and skills students have learned from a segment of
instruction. Criterion-referenced classrooms are mastery-oriented, informing all students
of the expected standard and teaching them to succeed on related outcome measures.
Criterion-referenced assessments help to reduce competition and may improve
cooperation. In criterion-referenced evaluation, a student’s achievement is compared
against a specified criterion or standard, or against prescribed curriculum goals. If a
teacher’s main interest is to pinpoint how well students have mastered a particular skill,
the most appropriate form of assessment may be criterion-referenced assessment.
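The following Python sketch makes the idea of judging each student against a fixed standard concrete by comparing one student's results with a cut-off. The criteria, the point values, and the 70% mastery standard are hypothetical assumptions chosen for illustration. In practice the standard comes from the curriculum or from the teacher's own performance criteria. Note that the result for one student does not depend on how any other student performed.

```python
# Hypothetical mastery standard: 70% of the points on a criterion.
# This value is an illustrative assumption, not one set by this module.
MASTERY_CUTOFF = 0.70


def criterion_report(scores, max_points, cutoff=MASTERY_CUTOFF):
    """Compare each criterion score with a fixed standard.

    scores: dict mapping criterion name -> points earned
    max_points: dict mapping criterion name -> points possible
    Returns a dict mapping criterion name -> "mastered" or "not yet mastered".
    """
    report = {}
    for criterion, earned in scores.items():
        proportion = earned / max_points[criterion]
        report[criterion] = "mastered" if proportion >= cutoff else "not yet mastered"
    return report


# One student's results on three hypothetical criteria.
student = {"reading a graph": 9, "computing a mean": 5, "interpreting a table": 8}
possible = {"reading a graph": 10, "computing a mean": 10, "interpreting a table": 10}
print(criterion_report(student, possible))
```

Each criterion gets its own verdict, which tells the teacher which skill needs re-teaching. A single total would hide that information.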

Norm-referenced Assessment: This type of assessment has as its end point the
determination of student performance based on a position within a cohort of students,
the norm group. This type of assessment is most appropriate when one wishes to make
comparisons across large numbers of students or important decisions regarding student
placement and advancement. For example, students’ results in grade 8 national exams
in our country are determined based on their relative standing in comparison to all other
students who have taken the exam. Thus, when we say that a student has scored at the
80th percentile, it does not mean that the student has scored 80% on the exam.
Rather, it means that the student’s score is higher than the scores of about 80% of the
students in the norm group, while the remaining 20% or so scored the same or higher.
Ranking students is another example of norm-referenced
interpretation of students’ performance. The focus of attention in this type of
assessment is on how well the student has done on a test in comparison with other
students.
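A small calculation shows the difference between a percentage score and a percentile rank. The Python sketch below assumes a hypothetical norm group of ten scores out of 50. It uses one common convention: the percentage of scores in the norm group that fall below the student's score. Some textbooks also count half of any tied scores, which gives slightly different values.

```python
def percentile_rank(score, norm_group):
    """Percentage of scores in the norm group that fall below `score`.

    Uses the simple "strictly below" convention; other conventions
    (e.g. counting half of tied scores) give slightly different values.
    """
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


# Hypothetical norm group of ten raw scores, each out of 50.
norm_group = [22, 25, 28, 30, 31, 33, 35, 38, 41, 45]

raw = 41
print(f"raw score: {100 * raw / 50:.0f}%")                      # 82%
print(f"percentile rank: {percentile_rank(raw, norm_group)}")   # 80.0
```

Here a raw score of 41 out of 50 (82%) corresponds to a percentile rank of 80, because 8 of the 10 scores in this hypothetical group are lower. The same raw score could have a very different percentile rank in a different norm group.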

To summarize, criterion-referenced assessment emphasizes the description of a
student’s performance, whereas norm-referenced assessment emphasizes
discrimination among individual students in terms of their relative level of learning.

2.2.4 Divergent versus Convergent Assessment

Divergent assessments are those for which a range of answers or solutions might be
considered correct. For example, a Civics teacher might ask his/her students to
argue which of the presidential and parliamentary forms of government is preferable
for a country. One student might favor a presidential form of government by
providing sound arguments and valid examples. Another student might come up
with equally convincing arguments favoring a parliamentary form of government. In
both cases the answers are different, yet both are convincingly correct. So in divergent
assessments there may not be one single correct answer. Divergent assessment tools
include essay tests and solutions to workout problems.
Convergent assessments are those that have only one correct response, which the
student is trying to reach. They are generally easier to mark. They tend to be quicker to
deliver and give more specific and directed feedback to individuals. They can also provide
wide curriculum coverage. Objective test items are the best example and demonstrate
the value of this approach in assessing knowledge.
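Each convergent item has a single correct response, so marking reduces to comparing answers with a key. This is why such items are quick to mark and can give specific feedback. The Python sketch below assumes a hypothetical five-item multiple-choice quiz; the key and the student's responses are invented for illustration.

```python
# Hypothetical answer key for a five-item multiple-choice quiz.
ANSWER_KEY = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}


def score_objective_test(responses, key=ANSWER_KEY):
    """Score a convergent (single-correct-answer) test against a key.

    responses: dict mapping item number -> the letter the student chose.
    Unanswered items count as wrong. Returns (number correct, list of
    item numbers answered incorrectly), which doubles as item-level feedback.
    """
    wrong = [item for item, correct in key.items()
             if responses.get(item, "").strip().upper() != correct]
    return len(key) - len(wrong), wrong


score, missed = score_objective_test({1: "B", 2: "a", 3: "A", 4: "C"})
print(score, missed)  # 3 [2, 5]
```

The list of missed items shows the "specific and directed feedback" mentioned above. It says exactly which questions to revisit, although it says nothing about how the student reasoned.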

2.2.5 Process versus Product Assessment

Process assessment focuses on the steps or procedures underlying a particular ability
or task, e.g., the cognitive steps in performing a mathematical operation or the procedure
involved in analyzing a blood sample. Because it provides more detailed information,
process assessment is most useful when a student is learning a new skill and for
providing formative feedback to assist in improving performance. For example, a
Biology teacher teaching students how to identify a microorganism using a
microscope might give them a practical task. Here his/her focus is not only on
whether students are able to identify the microorganism; he/she should also check
whether students have followed the proper procedures to reach the conclusion.

Product assessment focuses on evaluating the result or outcome of a process. Using
the above examples, we would focus on the answer to the math computation or the
accuracy of the blood test results. Product assessment is most appropriate for
documenting proficiency or competency in a given skill. A multiple-choice test that a
Mathematics teacher gives to his/her students, for example, is a product assessment:
the teacher has no way of checking whether students followed the proper procedures
to get the correct answer.

2.3. Assessment Strategies


Assessment strategy refers to those assessment tasks (methods/approaches/activities)
in which students are engaged to ensure that all the learning objectives of a subject, a
unit or a lesson have been adequately addressed. Assessment strategies range from
informal, almost unconscious, observation to formal examinations. Although subject
areas may differ somewhat in the assessment strategies they use, there is a
variety of methods that can be used in most subjects.

When selecting assessment strategies in our subject areas, there are a number of
things that we have to consider. First and foremost, it is important that we choose an
assessment technique appropriate for the particular behavior being assessed. We have
to use a strategy that can give students an opportunity to demonstrate the kind of
behavior that the learning outcome demands. Assessment strategies should also be
related to the course material and relevant to students’ lives. Therefore, we have to
provide assessment strategies that relate to students’ future work.

There are many different ways to categorize learning goals for students. One way in
which the different learning outcomes that we want our students to develop can be
categorized is presented as follows:
• Knowledge and understanding: What facts do students know outright? What
information can they retrieve? What do they understand?
• Reasoning proficiency: Can students analyze, categorize, and sort into
component parts? Can they generalize and synthesize what they have
learned? Can they evaluate and justify the worth of a process or decision?
• Skills: We have certain skills that we want students to master, such as reading
fluently, working productively in a group, making an oral presentation,
speaking a foreign language, or designing an experiment.
• Ability to create products: Another kind of learning target is student-created
products, tangible evidence that the student has mastered knowledge,
reasoning, and specific production skills. Examples include a research paper,
a piece of furniture, or artwork.
• Dispositions: We also frequently care about student attitudes and habits of
mind, including attitudes toward school, persistence, responsibility, flexibility,
and desire to learn.

Activity: In groups, discuss and identify the assessment strategies that you consider
best for assessing each of these categories of learning goals, and compare your work
with that of other groups.

From among the various assessment strategies that can be used by classroom
teachers, some are described below for your consideration as student teachers.

Classroom presentations: A classroom presentation is an assessment strategy that
requires students to verbalize their knowledge, select and present samples of finished
work, and organize their thoughts about a topic in order to present a summary of their
learning. It may provide the basis for assessment upon completion of a student’s
project or essay. For example, students may present a report after an
educational visit. What other educational activities can you imagine in your
subject area where students can present their work?

Conferences: A conference is a formal or informal meeting between the teacher and a
student for the purpose of exchanging information or sharing ideas. A
conference might be held to explore the student’s thinking and suggest
next steps; assess the student’s level of understanding of a particular
concept or procedure; and review, clarify, and extend what the student has already
completed. What advantages do you think a conference has as a method of assessment?

Exhibitions/Demonstrations: An exhibition/demonstration is a performance in a public
setting, during which a student explains and applies a process, procedure, etc., in
concrete ways to show individual achievement of specific skills and knowledge. What
types of objectives do you think this assessment strategy could serve to measure?

Interviews: You should be familiar with the interviews journalists conduct with different
personalities. An interview can also be used for assessment purposes in educational
settings. In such applications, an interview is a face-to-face conversation in which the
teacher and student use inquiry to share their knowledge and understanding of a topic or
problem. This form of assessment can be used by the teacher to:
• explore the student’s thinking;
• assess the student’s level of understanding of a concept or procedure; and
• gather information, obtain clarification, determine positions, and probe for
motivations.
Observation: Observation is a process of systematically viewing and recording
students while they work, for the purpose of making instructional decisions. Observation
can take place at any time and in any setting. It provides information on students'
strengths and weaknesses, learning styles, interests, and attitudes. Observations may
be informal or highly structured, and incidental or scheduled over different periods of
time in different learning contexts.
Performance tasks: During a performance task, students create, produce, perform, or
present works on "real world" issues. The performance task may be used to assess a
skill or proficiency, and provides useful information on the process as well as the
product. Please mention some examples of performance tasks that students can do in
your subject area.

Portfolios: A portfolio is a collection of samples of a student’s work over time. It offers
a visual demonstration of a student’s achievement, capabilities, strengths,
weaknesses, knowledge, and specific skills, over time and in a variety of contexts. For a
portfolio to serve as an effective assessment instrument, it has to be focused, selective,
reflective, and collaborative. Portfolios can be prepared for different
subjects at any educational level. What types of materials can be included in
students’ portfolios in your subject?

Attention: At this point in your study, you will be required to start
filing samples of your work (those that are indicated) as part of
your portfolio to serve as evidence of your performance in this
course.

Questions and answers: This is probably one of the most widely used strategies among
teachers who want to involve their students in the teaching-learning process. In this
strategy, the teacher poses a question and the student answers verbally, rather than in
writing. This strategy helps the teacher to determine whether students understand what
is being, or has been, presented; it also helps students to extend their thinking,
generate ideas, or solve problems. Strategies for effective question-and-answer
assessment include:
• Apply a wait time or 'no hands-up rule' to give students time to think after
a question before they are called upon randomly to respond.
• Ask a variety of questions, including open-ended questions and those that
require more than a right or wrong answer.
At what point in the lesson do you think the question-and-answer strategy
will be most useful? Why?

Students’ self-assessments: Self-assessment is a process by which the student
gathers information about, and reflects on, his or her own learning. It is the student’s
own assessment of personal progress in terms of knowledge, skills, processes, or
attitudes. Self-assessment leads students to a greater awareness and understanding of
themselves as learners.

Checklists, Rating Scales and Rubrics: These are tools that state specific criteria and
allow teachers and students to gather information and to make judgments about what
students know and can do in relation to the outcomes. They offer systematic ways of
collecting data about specific behaviors, knowledge and skills.

Checklists usually offer a yes/no format in relation to student demonstration of specific
criteria. They may be used to record observations of an individual, a group or a whole
class.

Rating Scales allow teachers to indicate the degree or frequency of the behaviors,
skills and strategies displayed by the learner. Rating scales state the criteria and
provide three or four response selections to describe the quality or frequency of student
work.

Rubrics use a set of specific criteria to evaluate a student's performance. They consist of
a fixed measurement scale and a detailed description of the characteristics of each level
of performance. These descriptions focus on the quality of the product or performance
and not the quantity. Rubrics may be used to assess individuals or groups and, as with
rating scales, results may be compared over time.
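A rubric can be thought of as a table of criteria, each with a description for every level of performance. The Python sketch below shows one way a teacher might record rubric judgements and turn them into a score plus descriptive feedback. The three criteria, the four-level scale, and the descriptor wording are hypothetical examples. A checklist would be the special case with only two levels (yes/no), and a rating scale would keep the levels but drop the detailed descriptors.

```python
# A hypothetical four-level rubric for a short lab report; the criteria
# and descriptors are illustrative, not taken from this module.
RUBRIC = {
    "hypothesis": {
        4: "Clear, testable hypothesis linked to the question",
        3: "Testable hypothesis; link to the question only partly stated",
        2: "Hypothesis present but vague or not testable",
        1: "No recognizable hypothesis",
    },
    "procedure": {
        4: "Steps complete, ordered, and repeatable",
        3: "Steps mostly complete; minor gaps",
        2: "Important steps missing",
        1: "Procedure absent or unusable",
    },
    "conclusion": {
        4: "Conclusion follows from the data and addresses the hypothesis",
        3: "Conclusion mostly supported by the data",
        2: "Conclusion weakly related to the data",
        1: "No conclusion",
    },
}


def apply_rubric(ratings, rubric=RUBRIC):
    """Turn the level chosen for each criterion into a score and feedback.

    ratings: dict mapping criterion -> level (1-4) chosen by the marker.
    Returns (total, maximum possible, list of feedback lines).
    """
    total = 0
    feedback = []
    for criterion, levels in rubric.items():
        level = ratings[criterion]
        if level not in levels:
            raise ValueError(f"{criterion}: level {level} is not on the scale")
        total += level
        # The descriptor, not just the number, is what the student sees.
        feedback.append(f"{criterion} (level {level}): {levels[level]}")
    maximum = sum(max(levels) for levels in rubric.values())
    return total, maximum, feedback


total, maximum, notes = apply_rubric({"hypothesis": 3, "procedure": 4, "conclusion": 2})
print(f"{total}/{maximum}")  # 9/12
for line in notes:
    print(line)
```

Returning the descriptors alongside the total keeps the focus on the quality of the work, as the paragraph above recommends.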

The purpose of checklists, rating scales and rubrics is to:
• provide tools for systematic recording of observations
• provide tools for self-assessment
• provide samples of criteria for students prior to collecting and evaluating data on
their work
• record the development of specific skills, strategies, attitudes and behaviours
necessary for demonstrating learning
• clarify students' instructional needs by presenting a record of current
accomplishments.
In what specific instances can these assessment strategies (rating scales,
checklists and rubrics) be used in your area of study? Think of specific examples
and share your ideas with your colleagues.

One-Minute Paper: During the last few minutes of the class period, you may ask
students to answer on a half-sheet of paper: "What is the most important point you
learned today?" and, "What point remains least clear to you?" The purpose is to obtain
data about students' comprehension of a particular class session. Then you can review
responses and note any useful comments. During the next class periods you can
emphasize the issues illuminated by your students' comments.

Muddiest Point: This is similar to the ‘One-Minute Paper’ but only asks students to
describe what they didn't understand and what they think might help. It is an important
technique that will help you to determine which key points of the lesson were missed by
the students. Here, too, you should review the responses before the next class meeting
and use them to clarify, correct, or elaborate.

Student-generated test questions: You may allow students to write test questions
and model answers for specified topics, in a format consistent with course exams. This
will give students the opportunity to evaluate the course topics and to reflect on what they
understand and on what makes a good test item. You may evaluate the questions and use
the good ones as prompts for discussion.

Tests: This is the type of assessment that you are most familiar with. A test requires
students to respond to prompts in order to demonstrate their knowledge (orally or in
writing) or their skills (e.g., through performance). We will learn much more about tests
later in this section.

Activity: Let’s say you need to assess student achievement on each of the following
learning targets. Which assessment strategy would you choose? Please
jot down your answers with their justifications and file them in your portfolio
for later reference.
1. Ability to write clearly and coherently
2. Group discussion proficiency
3. Reading comprehension
4. Proficiency using specified mathematical procedures
5. Proficiency conducting investigations in science

2.4. Assessment in large classes


It is quite obvious that the number of students in a class limits the teaching methods
available to teachers. Similarly, assessment methods are restricted by class size. Due to
time and resource constraints, teachers often use less time-demanding assessment
methods, which, however, may not always optimize student learning.

Activity: What problems do you think teachers will face when assessing
students in large classes? In your school years, what problems
have you observed in assessment as a result of large class size? Please
discuss these questions in groups.

The existing educational literature has identified various assessment issues associated
with large classes. They include:
a) Surface Learning Approach: Traditionally, teachers rely on time-efficient,
exam-based assessment methods for assessing large classes, such as multiple-choice
and short-answer examinations. These assessments often only
assess learning at the lower levels of intellectual complexity. Furthermore, students
tend to adopt a surface, rote-learning approach when preparing for these kinds of
assessment. Higher-level learning such as critical thinking and analysis
is often not fully assessed.

b) Feedback is often inadequate: Feedback plays an important role in the learning
process of students. In particular, if students receive feedback at an early stage
of their learning process, this will help them identify their own problems and improve
their learning. However, with a large class, teachers may not have time to give
detailed and constructive feedback to every student. Most teachers usually can only
afford to give general feedback to their students on written assignments and tests.

c) Inconsistency in marking: A large class usually consists of a diverse and complex
group of students. Differing perceptions of assessment, cultural
and educational backgrounds, prior knowledge, and levels of interest in the subject all
pose challenges to the fairness of marking and grading. Teachers have to take all
these into account in order to ensure consistency and fairness in marking and
grading.

d) Difficulty in monitoring cheating and plagiarism: Plagiarism is another
challenge in assessing large classes. Some students deliberately cheat in large
classes because they think that they are less likely to be identified within a large
group. In addition, as teachers usually have a heavy workload and a tight marking
schedule, they do not have enough time to thoroughly check the work submitted by
their students. To minimize plagiarism, assessment tasks must be well thought out and
well designed.

e) Lack of interaction and engagement: Students are often not motivated to engage
in a large-sized lecture. When teachers raise questions in large classes, not many
students are willing to respond. Students are less likely to interact with teachers
because they feel less motivated and tend to hide themselves in a large group. In
fact, interacting with students in class is important for teachers because they can
receive immediate feedback from students regarding their quality of teaching.

Although these issues can be problems in assessment for any class size, they are
worse in large classes because of the additional limitation and strain on resources. They
are problems that are applicable whether the function of the assessment is to facilitate
learning via feedback, or to classify students via grading.

There are a number of ways to make the assessment of large numbers of students
more effective whilst still supporting effective student learning. These include:
1. Front ending: The basic idea of this strategy is that by putting in more effort
at the beginning to prepare students for the work they are going to do, the quality of
the work submitted can be improved. As a result, the time needed to mark it is reduced
(and time is also saved through fewer requests for tutorial guidance).
2. Making use of in-class assignments: In-class assignments are usually quick and
therefore relatively easy to mark and provide feedback on, and help you to identify
gaps in understanding. Students could be asked to complete a task within the
timeframe of a scheduled lecture, field exercise or practical class. This might be a
very quick task, for example, completing a graph, doing some calculations,
answering some quick questions, making brief notes on a piece of text etc. In some
cases it might be possible to merge the in-class assignment with peer assessment.
3. Self- and peer-assessment: Students can perform a variety of assessment tasks in
ways that both save the tutor’s time and bring educational benefits, especially
the development of their own judgment skills. These include self-assessment and
peer-assessment strategies.
Self-assessment can reduce the marking load because it tends to raise the quality of the
work submitted, thereby minimizing the amount of time spent on marking and
feedback. The emphasis on student self-assessment represents a fundamental shift in
the teacher-student relationship, placing the primary responsibility for learning with the
student. However, self-assessment used for grading purposes raises problems of
validity and reliability. If self-assessment is used for the purposes of
grading, it is imperative to employ peer or staff cross-marking to ensure the validity of
the results. Self-assessment should also be confined to certain limited objectives, such
as checking whether all of the required components of an answer are present, and
should be supported by very transparent assessment criteria and standards, possibly
accompanied by examples of work of varying standards. In this regard, self-assessment
can decrease the marking load of teachers and provide students with a positive learning
experience by compelling them to examine their work from the perspective of a marker
as well as a participant.

Similar to self-assessment, peer-assessment provides useful learning experiences for
students while also reducing the marking load of staff. Peer-assessment can be an
effective way of ensuring that each student gets individual feedback that staff may be
too busy to provide in a timely manner, given the class numbers involved.
This could involve providing students with answer sheets or model answers to a piece
of coursework that you had set previously and then requiring students to undertake the
marking of those assignments in class. For example, you can ask students to exchange
work with one another, or collect all of the named work and randomly assign student
markers to it.
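Randomly assigning markers is easy to get slightly wrong, because a plain random shuffle can hand a student his or her own work. The Python sketch below uses one simple approach; it is an assumption of this example, not a procedure prescribed by this module. It shuffles the class into a random circle and has each student mark the work of the next person. The student names are placeholders.

```python
import random


def assign_peer_markers(students, seed=None):
    """Randomly assign each student one peer's work to mark.

    Students are shuffled into a random circle and each marks the work
    of the next person in the circle, so nobody marks their own work.
    Returns a dict mapping marker -> author of the work they mark.
    """
    if len(students) < 2:
        raise ValueError("peer marking needs at least two students")
    order = list(students)
    random.Random(seed).shuffle(order)  # seed only makes the example repeatable
    return {marker: order[(i + 1) % len(order)] for i, marker in enumerate(order)}


class_list = ["Student A", "Student B", "Student C", "Student D", "Student E"]
assignments = assign_peer_markers(class_list, seed=7)
for marker, author in assignments.items():
    print(f"{marker} marks the work of {author}")
```

The circle arrangement guarantees that each piece of work is marked exactly once and never by its author. It does not produce every possible assignment with equal probability, because it always forms a single circle. That trade-off is likely acceptable for classroom peer marking.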

However, as with any form of peer-assessment, it needs to be carefully designed.
Students need to know what to do, and there needs to be a transparent system by which
students can appeal their marks (especially if it is used in a summative rather than a
formative context). The benefits of this approach are that:

• students can get to see how their peers have tackled a particular piece of
work,

• they can see how you would assess the work (e.g., from the model
answers/answer sheets you've provided); and

• they are put in the position of being an assessor, thereby giving them an
opportunity to internalize the assessment criteria.

4. Group Assessments: The most obvious advantage of group-based assessment is
that it significantly reduces the marking load if the group submits only one piece of
assessable work. The major problem, of course, is that group members may not
contribute equally, so how are they to be rewarded fairly? There is probably no easy
solution to this, but there is a range of possible strategies that may go at least
some way to addressing the problem.

5. Changing the assessment method, or at least shortening it: Being faced with large
numbers of students will present challenges but may also provide opportunities to
either modify existing assessments or to explore new methods of assessment. You might,
for example, be able to reduce the length of the assessment task you are currently
using without detracting from your module's learning outcomes. Alternatively a large
class may provide a new opportunity to make use of peer and self-assessment.

Assignment: Visit any one of the schools in your vicinity and interview at least three
teachers in your subject area using questions you have prepared
for the purpose. The questions should be related to 1) the
problems they have faced in assessing students in large classes;
and 2) the strategies they have used to tackle the problems.
Based on the information you have collected prepare a report of
1-2 pages. You have to file the report as part of your portfolio.

2.5. Selecting and developing assessment methods and tools


One of the most difficult tasks for most teachers is assessing the performance of their
students, that is, determining to what extent each individual student has attained the level
of mastery defined by the course outcomes. Students’ learning behaviors are largely
shaped by the examinations you administer. If your goal is to establish an active
learning environment in which students are expected to learn the facts and learn to
apply them, but your exams assess only the students' ability to recite memorized facts,
you will have made it unlikely that the students will engage in meaningful learning.
The assessment tools that you employ in your subject area therefore play an enormously
important role in determining how and what students will learn. For this reason, the
process of assessing student performance in your subject area must begin with a close
look at your educational outcomes. What you will discover is that some outcomes are
relatively easy to assess, while others are much harder to test.

Activity: List all of the forms of assessment that you have experienced during
your school years. Are there other approaches to assessment with which
you are familiar even if you haven't personally experienced them as a
student?

A wide variety of tools is available for assessing student performance, and there are
approaches suitable for any educational objective you want to test. Examples
include objective exams, short-answer and essay exams, portfolios, projects, practical
exams, presentations, and combinations of these. Appropriate tools, or combinations of
tools, must be selected and used if the assessment process is to provide
information relevant to the stated educational outcomes.

2.5.1. Selecting appropriate assessment methods and tools


When selecting and constructing assessment tools, it is very important that we consider
the following questions.

• Does the assessment adequately evaluate academic performance relevant to the
desired outcome?

• Does this assessment tool enable students with different learning styles or
abilities to show you what they have learned and what they can do?

• Does the content examined by the assessment align with the content from the
course?


• Does this assessment method adequately address the knowledge, skills,
abilities, behavior, and values associated with the intended outcome?

• Will the assessment provide information at a level appropriate to the outcome?

• Will the data accurately represent what the student can do in an authentic or real
life situation?

• Does the assessment provide data that is specific enough for the desired
outcomes?

• Are the intended uses for the assessment clear?

• Will the information derived from the assessment help to improve teaching and
learning?

Characteristics of good assessment tools

Good assessment at any level is characterized by the following features.

1. Clear and Appropriate Learning Targets. WHAT do we want to assess? What
knowledge, skills, reasoning ability, products, and habits of mind are essential for
student success? Do students understand what is expected of them? Is the
knowledge or skill that students are expected to demonstrate for the assessment
influenced by cultural and linguistic issues?

2. Clearly Focused and Appropriate Purpose. WHY are we assessing? How will the
assessment information be used? By whom? To make what decisions? Will the
cultural and linguistic traits of the user interfere with the intended purpose of the
assessment?

3. Appropriate Match among Targets, Purposes, and Method of Assessment.
HOW will we assess the targeted learning? Which methods of assessment are most
appropriate for each kind of target? Is the method used to address each target
designed with consideration of the cultural and linguistic traits of the students?


4. Sufficient Sampling of Student Work to Make Sound Inferences about
Learning. HOW MUCH will we collect? Do we have enough varied samples of
student work to make good judgments about current proficiency related to the target
learning? Have we chosen sufficient and varied assessment examples that allow
the students to take advantage of their cultural and linguistic strengths?

5. Fairness and Freedom from Biases that Distort the Picture of Learning. HOW
ACCURATE are the assessments? Do they really assess what we think they’re
assessing? Is there anything about the way a target is assessed that masks the true
learning of a student or group of students? Do we know the strengths that students
bring to learning and use those strengths in our assessments?

2.5.2. Planning Tests


Tests are among the most important and commonly used assessment instruments in
education. The development of valid, reliable and usable questions requires proper
planning. The plan entails designing a framework that can guide the test developers in
the item development process. This is necessary because the classroom test is a key
factor in the evaluation of learning outcomes. The validity, reliability and usability of
such tests depend on the care with which they are planned and prepared. Planning
helps to ensure that the test covers the pre-specified instructional objectives and the
subject matter (content) under consideration. Hence, planning a classroom test involves
identifying the instructional objectives stated earlier and the subject matter (content)
covered during the teaching/learning process. This leads to the preparation of a table of
specification (the test blueprint) for the test, bearing in mind the type of test that
would be relevant for the purpose of testing.

Planning a classroom test that is both practical and effective in providing evidence of
mastery of the instructional objectives and content covered involves several
considerations. The following steps serve as a guide in planning a classroom test.
i. Determine the purpose of the test;
ii. Describe the instructional objectives and content to be measured;
iii. Determine the relative emphasis to be given to each learning outcome;
iv. Select the most appropriate item formats (essay or objective);
v. Develop the test blueprint to guide the test construction;
vi. Prepare test items that are relevant to the learning outcomes specified in the test
plan;
vii. Decide on the pattern of scoring and the interpretation of results;
viii. Decide on the length and duration of the test; and
ix. Assemble the items into a test, prepare directions and administer the test.

The instructional objectives of the course are critically considered while developing the
test items. This is because the instructional objectives are the intended behavioural
changes, or intended learning outcomes, of instructional programs which students are
expected to possess at the end of the instructional process. The objectives are also
given relative weight according to the level of importance and emphasis given to them.
Educational objectives and the content of a course are the focus on which test
development is based. I hope you remember our brief discussion in Unit 1 of how
educational objectives are classified.

Table of Specification

A table of specification is a two-way table that matches the objectives and content you
have taught with the level at which you expect your students to perform. It contains an
estimate of the percentage of the test to be allocated to each topic at each level at
which it is to be measured. In effect, it establishes how much emphasis to give to each
objective or content area. A table of specification guides the selection of test items,
which in effect ensures that the test measures a representative sample of instructionally
relevant tasks.
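Turning "how much emphasis" into a whole number of items per topic takes a little arithmetic, because exact proportional shares are rarely whole numbers. The helper below is a minimal sketch, assuming emphasis is expressed as a weight per content area (for example, class periods spent on it); the function name allocate_items and the topic weights are hypothetical. It uses largest-remainder rounding so that the counts always add up to the planned test length.

```python
from math import floor


def allocate_items(weights: dict[str, float], total_items: int) -> dict[str, int]:
    """Split total_items across areas in proportion to their weights.

    Largest-remainder rounding guarantees the counts sum to total_items.
    """
    weight_sum = sum(weights.values())
    # Exact (fractional) number of items each area deserves.
    exact = {area: total_items * w / weight_sum for area, w in weights.items()}
    # Start from the whole-number part of each share.
    counts = {area: floor(share) for area, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the areas with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


# Hypothetical example: periods spent on three topics, 20-item test.
hours = {"Topic A": 8, "Topic B": 5, "Topic C": 3}
print(allocate_items(hours, 20))  # {'Topic A': 10, 'Topic B': 6, 'Topic C': 4}
```

The exact shares here are 10, 6.25 and 3.75; the single item left over after rounding down goes to Topic C, whose fractional part is largest.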

Developing a table of specification involves:


1. Preparing a list of learning outcomes, i.e. the type of performance students are
expected to demonstrate


2. Outlining the contents of instruction, i.e. the area in which each type of
performance is to be shown, and
3. Preparing the two way chart that relates the learning outcomes to the
instructional content.
Now, let us try to understand how a test blueprint is developed, using the following table
of specification for a Geography test as an example.

               Instructional Objectives
Contents       Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total  Percent
Air pressure   2          2              1            1         -          -           6      24%
Wind           1          1              1            1         -          -           4      16%
Temperature    2          2              1            1         -          1           7      28%
Rainfall       1          2              1            -         1          -           5      20%
Clouds         1          1              -            1         -          -           3      12%
Total          7          8              4            4         1          1           25
Percent        28%        32%            16%          16%       4%         4%          100%

As can be observed from the table, the rows show the content areas from which the test
is to be sampled, and the columns indicate the level of thinking students are required to
demonstrate in each of the content areas. Thus, the test items are distributed among
the five content areas, with their corresponding representation among the six
levels of the cognitive domain. The percentage row and column also show the degree
of representation of both the contents and the levels of the cognitive domain in this
particular test. Objectives that you consider more important should get more
representation in the test items. Similarly, content areas on which you have spent more
instructional time should be allotted more test items.
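Because every cell of a table of specification is simply a count of items, its totals and percentages can be checked mechanically before any item is written. The Python sketch below re-enters the Geography example (each dash entered as 0) and recomputes the margins. The names BLUEPRINT and summarize are illustrative only, not part of any standard tool.

```python
# Cell counts re-entered from the Geography table of specification;
# a dash ("-") in the table is entered as 0.
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]
BLUEPRINT = {
    "Air pressure": [2, 2, 1, 1, 0, 0],
    "Wind":         [1, 1, 1, 1, 0, 0],
    "Temperature":  [2, 2, 1, 1, 0, 1],
    "Rainfall":     [1, 2, 1, 0, 1, 0],
    "Clouds":       [1, 1, 0, 1, 0, 0],
}


def summarize(blueprint):
    """Return row totals, column totals, grand total and rounded percentages."""
    row_totals = {topic: sum(cells) for topic, cells in blueprint.items()}
    col_totals = [sum(column) for column in zip(*blueprint.values())]
    grand = sum(row_totals.values())
    row_pct = {topic: round(100 * n / grand) for topic, n in row_totals.items()}
    col_pct = [round(100 * n / grand) for n in col_totals]
    return row_totals, col_totals, grand, row_pct, col_pct


rows, cols, grand, row_pct, col_pct = summarize(BLUEPRINT)
print("Total items:", grand)                 # Total items: 25
for level, n, pct in zip(LEVELS, cols, col_pct):
    print(f"{level}: {n} items ({pct}%)")    # e.g. Knowledge: 7 items (28%)
print(row_pct)
```

The recomputed totals match the printed table (for example, 7 Temperature items, or 28 percent of the test), which is a quick way to catch copying mistakes in a hand-made blueprint.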

Which of the objectives in the example above were given more
emphasis? Which of them obtained the least emphasis? Which content areas
obtained the highest representation? Which one obtained the least
representation? What are the implications of these differences?


There are also other ways of developing a test blueprint. One of these shows the
distribution of test items among the content areas and the type of test items
to be developed from each content area. For example, the table of specification that we
have seen earlier can be prepared in the following way.

               Item Types
Contents       True/False  Matching  Short Answer  Multiple Choice  Total  Percent
Air pressure   1           1         1             3                6      24%
Wind           1           1         1             1                4      16%
Temperature    1           2         1             3                7      28%
Rainfall       1           1         1             2                5      20%
Clouds         1           -         1             1                3      12%
Total          5           5         5             10               25
Percent        20%         20%       20%           40%              100%

2.5.3 Constructing Classroom Tests


There is a wide variety of styles and formats for writing test items. One commonly used
distinction is between paper-and-pencil tests, performance assessments, and oral tests.
Paper-and-pencil tests are the traditional assessment techniques in which students are
required to respond to a set of questions in writing. Performance assessment tasks
(also called authentic assessments) require the student to use equipment, generate
hypotheses, make observations, construct something or perform for an audience. For
most performance assessment tasks, there is no single best or right response, so expert
judgment is required to score the performances. An oral test refers to verbal
communication between examiner and testee: the candidate is required to respond
orally instead of in writing. In this section you will learn about test construction
techniques. Before we deal with this issue, however, the general principles that should
be followed in test construction will be presented.

i. General Principles of Test Construction


Assessment is a critical component of instruction and, used carefully, can help in
achieving curricular goals. Considering the powerful effects of assessments, it is very
important that testing tools be carefully chosen and formulated to provide
constructive feedback to students and teachers about students' competence and
deficiencies. However, writing high-quality test questions is not an easy task. The
following are some general principles that we should consider when constructing written
test items.

1) Make the instructions for each type of question simple and brief.
2) Use simple and clear language in the questions. If the language is difficult,
students who understand the material but who do not have strong language skills
may find it difficult to demonstrate their knowledge. If the language is ambiguous,
even a student with strong language skills may answer incorrectly if his or her
interpretation of the question differs from the instructor's intended meaning.
3) Write items that require specific understanding or ability developed in that
subject.
4) Do not suggest the answer to one question in the body of another question. This
makes the test less useful, as the test-wise student will have an advantage.
5) Make sure your colleagues can provide the correct responses to the questions. If the
correct answer can be given only by the question writer, and other teachers of the same
level cannot achieve the passing mark, this indicates that the information being asked
for will never be used by the learner, that the question is framed in an ambiguous
manner, or that it is too difficult.
ii. Constructing Objective Test Items
There are various types of objective test items. These can be classified into those that
require the student to supply the answer (supply type items) and those that require the
student to select the answer from a given set of alternatives (selection type items).
Supply type items include completion items and short answer questions. Selection type
test items include True/False, multiple choice and matching.


Each type of test has its unique characteristics, uses, advantages, limitations, and rules
for construction.

Activity: 1) As students you have taken tests with different formats: multiple-choice
items, true/false items, short-answer items, etc. Which of these types of test
items did you feel more comfortable with? What are your reasons? Write
down your answers and compare them with those of your friends.
2) In groups, discuss the advantages and limitations of the different types of test
items. Present the results of your discussion to the whole class.

a) True/False Test Items


I am quite sure that you are familiar with true/false test items, so it may not be
necessary to describe what they are. Instead, I will focus on the characteristics of this
type of test item and present you with some guidelines that can help in constructing
better true/false items.

The main advantage of true/false items is that students need little time to answer
them. This allows a teacher to cover a wide range of content by using a large
number of such items. In addition, true/false test items can be scored quickly, reliably,
and objectively by anybody using an answer key.

The major disadvantage of true/false items is that, when they are used exclusively, they
tend to promote memorization of factual information: names, dates, definitions, and so
on. Some argue that another weakness of true/false items is that they encourage
guessing, because a student who guesses blindly still has a 50 percent chance of
getting any single item right. In addition, true/false items:
• Can often lead a teacher to write ambiguous statements, due to the difficulty of
writing statements which are clearly true or false
• Do not discriminate between students of varying ability as well as other test items do
• Can often include more irrelevant clues than other item types do
• Can often lead a teacher to favor testing of trivial knowledge
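The 50 percent figure applies to a single item; across a whole test, the chance that blind guessing alone produces a high score shrinks quickly. The sketch below assumes each item is guessed independently (a binomial model) and compares true/false items with four-option multiple-choice items. The helper name p_at_least and the choice of "7 out of 10" as a threshold are illustrative assumptions.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Chance of getting at least k of n items right by blind guessing.

    Assumes every item is guessed independently and each guess is right
    with probability p (a binomial model).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 10 items, with "at least 7 correct" taken as the threshold (an assumption).
print(round(p_at_least(7, 10, 0.5), 3))   # true/false, p = 1/2: 0.172
print(round(p_at_least(7, 10, 0.25), 3))  # four-option multiple choice, p = 1/4: 0.004
```

Roughly one blind guesser in six would reach 7 out of 10 on true/false items, against well under one in a hundred on four-option items, which is why a single short true/false quiz says little about an individual student.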


The following suggestions may help teachers to construct good-quality
true/false test items.
• Avoid negative statements, and never use double negatives. In Right-Wrong or
True-False items, negatively phrased statements make it needlessly difficult for
students to decide whether a statement is accurate or inaccurate.
• Restrict single-item statements to single concepts. If you double up two
concepts in a single item statement, how does a student respond if one concept is
accurate and the other isn’t?
• Use an approximately equal number of items reflecting the two categories
tested. If you typically overload your True-False tests with false items, students
who are totally at sea about an item will be apt to opt for a false answer and will
probably be correct.
• Make statements representing both categories equal in length. Again, to avoid
giving away the correct answers, don’t make all your false statements brief and (in
an effort to include necessary qualifiers) all your true statements long.
Students catch on quickly to this kind of test-making tendency.

b) Matching Items
A matching item consists of two lists of words or phrases. The test-taker must match
components in one list (the premises, typically presented on the left) with components in
the other list (the responses, typically presented on the right), according to a particular
kind of association indicated in the item’s directions.

Like True-False items, matching items can cover a good deal of content in an efficient
fashion. They are a good choice if you’re interested in finding out if your students have
memorized factual information. Matching items sometimes can work well if you want
your students to cross-reference and integrate their knowledge regarding the listed
premises and responses.

The major advantage of matching items is their compact form, which makes it possible to
measure a large amount of related factual material in a relatively short time. Another
advantage is their ease of construction.

The main limitation of matching test items is that they are restricted to the measurement
of factual information based on rote learning. Another limitation is the difficulty of finding
homogeneous material that is significant from the perspective of the learning outcomes.
As a result, test constructors tend to pad their matching items with less significant
material.

The following suggestions are important guidelines for the construction of good
matching items.
• Use fairly brief lists, placing the shorter entries on the right. If the premises and
responses in a matching item are too long, students tend to lose track of what they
originally set out to look for. The words and phrases that make up the premises
should be short, and those that make up the responses should be shorter still.
• Employ homogeneous lists. Both the list of premises and the list of responses must be
composed of similar sorts of things. If not, an alert student will be able to come up
with the correct associations simply by “elimination”, because some entries in the
premises or responses will stand out clearly from the others.
• Include more responses than premises. If you use exactly the same number of
responses as premises in a matching item, then a student who knows half or more
of the correct associations is in a position to guess the rest of the associations with
very good chances.
• List responses in a logical order. This rule is designed to make sure you don’t
accidentally give away hints about which responses connect with which premises.
Choose a logical ordering scheme for your responses (say, alphabetical or
chronological) and stick with it.
• Describe the basis for matching and the number of times a response can be
used. To satisfy this rule, you need to make sure your test’s directions clarify the
nature of the associations you want students to use when they identify matches.
Regarding the student’s use of responses, a phrase such as the following is often
employed: “Each response in the list at the right may be used once, more than
once, or not at all.”


• Try to place all premises and responses for any matching item on a single
page. This rule’s intent is to free your students from lots of potentially confusing
flipping back and forth in order to accurately link responses to premises.

• Arrange the list of responses in logical order: place words in alphabetical
order and numbers in sequence. This will make it easier for the students to scan
the responses in searching for the correct answers. It will also prevent them from
detecting possible clues from the arrangement of the responses.
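The advice to include more responses than premises can be quantified. The sketch below assumes each response may be used at most once and that the student has already placed the matches they know; it computes the chance of completing the rest purely by random guessing. The function name p_guess_rest and the numbers in the example are hypothetical.

```python
from math import factorial


def p_guess_rest(unknown: int, extra: int) -> float:
    """Chance of matching all `unknown` remaining premises by random guessing.

    Assumes each response is used at most once, the known matches are already
    placed correctly, and `extra` surplus responses remain as distractors.
    """
    remaining = unknown + extra
    # Ordered ways to assign `unknown` premises to distinct remaining responses.
    arrangements = factorial(remaining) // factorial(extra)
    return 1 / arrangements


print(p_guess_rest(2, 0))  # 0.5   equal lists: the last two are a coin toss
print(p_guess_rest(2, 3))  # 0.05  three surplus responses
```

With equal lists, a student who is unsure of only two premises has an even chance of getting both; three surplus responses cut that to one chance in twenty. When the directions allow a response to be used more than once, this model does not apply.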

c) Short Answer/Completion Test Items

Short-answer items and completion items are essentially the same: both can be
answered by a word, phrase, number or formula. They differ in the way the problem is
presented. The short-answer type uses a direct question, whereas the completion
item consists of an incomplete statement that the student must complete. This can be
demonstrated by the following examples:

Short answer item: In which year did the Ethiopians defeat the Italian invaders at Adwa?

Completion item: The Ethiopian forces defeated the Italian invaders at Adwa in the year _____.

The short-answer item is one of the easiest to construct, partly because of the
relatively simple learning outcomes it usually measures. Except for the problem-solving
outcomes measured in Mathematics and Science, it is used almost exclusively to
measure the recall of memorized information.

A more important advantage of the short-answer item is that the students must supply
the answer. This reduces the possibility that students will obtain the correct answer by
guessing. They must either recall the information requested or make the necessary
computations to solve the problem presented to them. Partial knowledge, which might
enable them to choose the correct answer on a selection item, is insufficient for
answering a short answer test item correctly.

There are two limitations cited in the use of short-answer test items. One is that they are
unsuitable for assessing complex learning outcomes. The other is the difficulty of
scoring. This is especially true when the item is not phrased clearly enough to require a
single, definitely correct answer, and when the scorer must decide how to treat the
student’s spelling errors.

The following suggestions will help short-answer test items function as
intended.
• Word the item so that the required answer is both brief and specific.
Example:
Poorly stated: An animal that eats the flesh of other animals is _____.
Better: An animal that eats the flesh of other animals is classified as _____.
• Do not take statements directly from textbooks to use as a basis for short-answer
items. When taken out of context, such statements are frequently too general
and ambiguous to serve as good short-answer items.
• A direct question is generally more desirable than an incomplete statement.
• If the answer is to be expressed in numerical units, indicate the type of answer
wanted. For computational problems, it is usually preferable to indicate the units
in which the answer is to be expressed.
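Scoring short-answer items stays more consistent when the teacher writes down, in advance, every answer that will be accepted. The sketch below assumes such a list exists; the function names and the answer key for the carnivore example are hypothetical. It only ignores differences in capitalization and spacing, so misspellings still call for the teacher's judgment.

```python
def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def score_short_answer(response: str, accepted: set[str]) -> int:
    """Return 1 if the response matches any accepted answer, otherwise 0."""
    return int(normalize(response) in {normalize(a) for a in accepted})


# Hypothetical key for: "An animal that eats the flesh of other animals
# is classified as _____."
KEY = {"carnivore", "a carnivore", "carnivorous"}
print(score_short_answer("  Carnivore ", KEY))   # 1
print(score_short_answer("herbivore", KEY))      # 0
```

Writing the key before marking also exposes poorly worded items early: if the list of acceptable answers keeps growing, the item probably does not call for a single, specific answer.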

d) Multiple-Choice Items
This is the most popular type of selected-response item. It can effectively measure
many of the simple learning outcomes measured by the short-answer item, the true-
false item, and the matching item types. In addition, it can measure a variety of complex
cognitive learning outcomes.

A multiple-choice item consists of a stem that poses the problem and a list of
suggested solutions. A student is first given either a question or a partially complete
statement; this part of the item is referred to as the item’s stem. Then three or more
potential answer-options are presented. These are usually called alternatives, choices
or options. The correct response is called the key answer; the remaining alternatives
are called distractors.

There are two important variants in a multiple-choice item:
(1) whether the stem consists of a direct question or an incomplete statement, and
(2) whether the student’s choice of alternatives is supposed to be a correct answer or a
best answer. The following two examples will demonstrate their differences:

A direct-question (best-answer) multiple-choice item

Which of the following European countries suffered most from the consequences of
the Second World War?
A. Germany   B. Britain   C. France   D. Russia (USSR)

An incomplete-statement (correct-answer) multiple-choice item

The Second World War started in the year ________________.
A. 1936   B. 1939   C. 1941   D. 1945

A key advantage of the multiple-choice item is its widespread applicability to the
assessment of cognitive skills and knowledge, as well as to the measurement of
students’ affect. Another advantage of multiple-choice items is that it’s possible to
make them quite varied in their levels of difficulty. Cleverly constructed
multiple-choice items can present very high-level cognitive challenges to students. And,
of course, as with all selected-response items, multiple-choice items are fairly easy to
score.

The key weakness of multiple-choice items is that when students review a set of
alternatives for an item, they may be able to recognize a correct answer that they would
never have been able to generate on their own. In that sense, multiple-choice items can
present an exaggerated picture of a student’s understanding or competence, which
might lead teachers to invalid inferences.

Another serious weakness, one shared by all selected-response items, is that multiple-
choice items can never measure a student’s ability to creatively synthesize content of
any sort. Finally, in an effort to come up with the necessary number of plausible
alternatives, novice item-writers sometimes toss in some alternatives that are obviously
incorrect.


The general applicability and the superior qualities of multiple-choice test items are
realized well when care is taken in their construction. This involves formulating a clearly
stated problem, identifying plausible alternatives, and avoiding irrelevant clues to the
answer. The following are more specific suggestions for the construction of good
multiple choice items.
• The question or problem in the stem must be self-contained. The stem
should contain as much of the item’s content as possible, thereby rendering the
alternatives much shorter than would otherwise be the case.
• Avoid negatively stated stems. Just as with true/false items, negatively
stated stems can create genuine confusion in students.
• Each alternative must be grammatically consistent with the item’s stem.
Grammatical inconsistency between the stem and some of the answer-options
supplies students with an unintended clue to the correct answer.
• Make all alternatives plausible, but be sure that one of them is indisputably
the correct or best answer. As I indicated when describing the weaknesses of
multiple-choice items, teachers sometimes toss in one or more implausible
alternatives, thereby weakening the item substantially. Although avoiding that
problem is important, it’s even more important to make certain that you really do
have one valid correct answer in any item’s list of alternatives, rather than two
similar answers, either of which could be arguably correct.
• Randomly use all answer positions in approximately equal numbers. If you
use four-option items, make sure that roughly one-fourth of the correct answers
turn out to be A, one-fourth B, and so on.
• Never use “all of the above” as an answer choice, but use “none of the
above” to make items more demanding. Students often become confused
when confronted with items that have more than one correct answer. Usually,
what happens is they’ll see one correct alternative and instantly opt for it without
recognizing that there are other correct options later in the list. In addition,
students will readily opt for the “all of the above” option if they realize that two
of the alternatives are correct, without considering the third option. However, we
can increase the difficulty level of a test item by presenting three or four answer
options, none of which is correct, followed by a correct “none-of-the-above”
option.
• Verbal associations between the stem and the correct answer should be
avoided. Frequently a word in the correct answer will provide an irrelevant clue
because it looks or sounds like a word in the stem of the item.
• The relative length of the alternatives should not provide a clue to the
answer. Since the correct alternative usually needs to be qualified, it tends to be
longer than the distractors unless a special effort is made to control the length of
the alternatives.

In addition, it is important to consider the following suggestions:
• Design each item to measure an important learning outcome.
• Eliminate unnecessary wordiness.
• Make all options homogeneous.
• Make all options grammatically consistent with the stem of the item.
• Order options logically.
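Using all answer positions in roughly equal numbers is easiest if the key positions are decided before the items are assembled. A minimal Python sketch, assuming four options labelled A to D; balanced_key is a hypothetical helper. The teacher would then arrange each item's alternatives so that its correct answer falls in the assigned position; where the options have a natural order (for example, numbers in sequence), that logical order takes precedence and the assigned position is simply ignored for that item.

```python
import random
from collections import Counter
from typing import Optional


def balanced_key(n_items: int, options: str = "ABCD",
                 seed: Optional[int] = None) -> list[str]:
    """Return a shuffled answer key that uses each position about equally often."""
    # Cycle through the option letters until there is one letter per item.
    key = [options[i % len(options)] for i in range(n_items)]
    # Shuffle so that the sequence of correct positions shows no pattern.
    random.Random(seed).shuffle(key)
    return key


key = balanced_key(20, seed=1)
print(key)
print(Counter(key))  # each of A, B, C and D appears exactly 5 times
```

When the number of items is not a multiple of four, some positions are used once more than others (22 items give counts of 6, 6, 5 and 5), which still satisfies "approximately equal numbers".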

Activity: Examine the following faulty multiple-choice items and identify their
problems.
1) The term "side effect" of a drug refers to:
A. additional benefits from the drug.
B. the chain effect of drug action.
C. the influence of drugs on crime.
D. any action of a drug in the body other than the one the doctor wanted
the drug to have
2) When linking two clauses, one main and one subordinate, one should use a:
A. coordinate conjunction such as and or so
B. subordinate conjunction such as because or although.
C. preposition such as to or from.
D. semicolon.
3) Entomology is:
A. the study of birds.
B. the study of fish.
C. the study of insects.

4) The promiscuous use of sprays, oils, and antiseptics in the nose during acute colds
is a pernicious practice because it may have a deleterious effect on
A. the sinuses.
B. red blood cells.
C. white blood cells.
5) An electric transformer can be used:
A. for storing electricity
B. to increase the voltage of alternating current
C. It converts electric energy in mechanical energy
D. alternating current is changed to direct current

iii. Constructing Essay or Subjective test items

In the previous paragraphs you have learned how objective tests should be
constructed. You have learned that well-constructed objective tests can measure a
variety of learning outcomes, from simple to complex. Despite this wide applicability,
there remain significant learning outcomes for which no satisfactory
objective measurements have been developed. These include such outcomes as the
ability to recall, organize, and integrate ideas and the ability to express oneself in
writing. Such outcomes require less structuring of responses than objective test items
allow, and it is in the measurement of these outcomes that written essays are of great
value.

The distinctive feature of essay questions is that students are free to construct, relate,
and present ideas in their own words. Learning outcomes concerned with the ability to
conceptualize, construct, organize, relate, and evaluate ideas require the freedom of
response and the originality provided by essay questions.

Essay questions can be classified into two types: restricted-response essay questions
and extended-response essay questions. Let us now briefly look at each type.

Restricted-response essay questions: These types of questions usually limit both the
content and the response. The content is usually restricted by the scope of the topic to
be discussed. Limitations on the form of response are generally indicated in the
question. This can be demonstrated in the following example:


In what ways are essay questions more preferable than objective test
items? Answer in a brief paragraph.


Extended-response essay questions: These questions allow students:
• to select any factual information that they think is relevant,
• to organize the answer in accordance with their best judgment, and
• to integrate and evaluate ideas as they deem appropriate.
This freedom enables them to demonstrate their ability to analyze problems, organize
their ideas, describe them in their own words, and/or develop a coherent argument.

In addition to their capacity, already described, for measuring higher-order thinking
skills, essay questions have some further advantages, which include the following:
• Extended-response essays focus on the integration and application of thinking
and problem-solving skills.
• Essay assessments enable the direct evaluation of writing skills.
• Essay questions are easier to construct than objective tests.
• Essay questions have a positive effect on students’ learning.

On the other hand, essay questions also have some limitations which you need to be
aware of. Perhaps the most commonly cited problem with these questions is the
unreliability of their scoring: the same paper may be scored differently by different
teachers, and even the same teacher may give different scores to the same paper at
different times. Another limitation is the amount of time required for scoring the
responses. Still another problem with essay tests is the limited sampling of content they
provide.
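One practical way to see this scoring unreliability in your own class is to have two teachers score the same set of papers independently and compare the results. The sketch below assumes both sets of scores use the same scale; rater_agreement and the scores are hypothetical. The summaries it reports (share of papers scored within a tolerance, mean and largest absolute difference) are simple descriptive checks, not a formal reliability coefficient.

```python
def rater_agreement(scores_a: list[float], scores_b: list[float],
                    tolerance: float = 0) -> dict:
    """Compare two raters' scores for the same essays, paper by paper."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same essays")
    diffs = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    agree = sum(d <= tolerance for d in diffs)  # papers within the tolerance
    return {
        "within_tolerance": agree / len(diffs),
        "mean_abs_difference": sum(diffs) / len(diffs),
        "largest_difference": max(diffs),
    }


# Hypothetical scores (out of 10) given by two teachers to the same five essays.
teacher_1 = [7, 5, 8, 6, 9]
teacher_2 = [6, 5, 6, 6, 8]
print(rater_agreement(teacher_1, teacher_2, tolerance=1))
# {'within_tolerance': 0.8, 'mean_abs_difference': 0.8, 'largest_difference': 2}
```

A large "largest_difference" points to the specific papers worth discussing between the two markers before final scores are recorded.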

There are some guidelines for improving the reliability and validity of essay scores. The
following are suggestions for the construction of good essay questions:
• Restrict the use of essay questions to those learning outcomes that cannot
be measured satisfactorily by objective items. As we have seen earlier,
objective measures have the advantage of efficiency and reliability. When
objective items are inadequate for measuring learning outcomes, however, the
use of essay questions becomes necessary despite their limitations.
• Structure items so that the student’s task is explicitly bounded. Phrase
your essay items so that students will have no doubt about the response you’re
seeking. Don’t hesitate to add details to eliminate ambiguity.
• For each question, specify the point value, an acceptable response length,
and a recommended time allocation. This rule aims to give students the
information they need to respond appropriately to an essay item.
The less guessing your students are obliged to do about how they’re
supposed to respond, the less likely it is that you’ll get lots of off-the-wall essays
that don’t give you the evidence you need.
• Employ more questions requiring shorter answers rather than fewer
questions requiring longer answers. This rule is intended to foster better
content sampling in a test’s essay items. With only one or two items on a test,
chances are good that your items will miss your students’ areas of
content mastery or non-mastery.
• Don’t employ optional questions. When students are allowed to choose their
essay items from several options, you really end up with different tests,
unsuitable for comparison.
• Test a question’s quality by creating a trial response to the item. A great
way to determine if your essay items are really going to get at the responses you
want is to actually try writing a response to the item, much as a student might do.

2.5.4 Constructing Performance Assessments


In the previous paragraphs you have been learning how paper-and-pencil tests
should be constructed. You have learned that well-constructed tests can measure a
variety of learning outcomes, from simple to complex. Despite this wide applicability of
paper-and-pencil tests, many important learning outcomes emphasize the actual
performance of tasks in realistic settings and cannot be assessed with paper-and-pencil
tests. Performance-based assessments are needed to check whether such
desired learning outcomes are achieved up to the expected standards. For example,
oral performance is required to assess a student's spoken communication skills in a
certain language. Similarly, the use of mathematics to solve meaningful problems and to
communicate solutions to others may also be best assessed by the use of performance
tasks in realistic settings.

Performance assessment is assessment based on observation and judgment; we look
at a performance or product and make a judgment as to its quality. Examples include
the following:

• Complex performances such as playing a musical instrument, carrying out the
steps in a scientific experiment, speaking a foreign language, reading aloud with
fluency, or working productively in a group. In these cases it is the doing, the
process, that is important.
• Creating complex products such as a term paper, a lab report, or a work of art.
In these cases what counts is not so much the process of creation (although that may
be evaluated, too), but the level of quality of the product itself.

As with extended written-response assessments, performance assessments have two
parts: a performance task or exercise and a scoring guide. Again, the scoring guide can
award points for specific features of a performance or product that are present, or it can
take the form of a rubric, in which levels of quality are described. For example, to
assess the ability to carry out a simple process, such as long division, points might be
awarded for each step done in the correct order. For more complex processes or
products, you might have a rubric for judging quality that has several dimensions, such
as ideas, organization, voice, word choice, sentence fluency and conventions in writing,
or content, organization, presentation, and use of language in an oral presentation.
Again, scores could be reported as the number or percentage of points earned, or as a
rubric score.
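To make reporting a rubric score as points or as a percentage concrete, here is a minimal Python sketch. It assumes a hypothetical writing rubric that uses the six writing dimensions named above, each rated on an assumed 1 to 4 quality scale; the scale, the sample ratings and the function name `rubric_report` are illustrative only, not a prescribed scoring scheme.

```python
# A minimal sketch: report a rubric score as points earned and as a percentage.
# The 1-4 quality scale is an assumption made for illustration.
MAX_LEVEL = 4  # hypothetical highest quality level on every dimension


def rubric_report(ratings: dict[str, int]) -> tuple[int, int, float]:
    """Return (points earned, points possible, percent) for one performance."""
    for dimension, level in ratings.items():
        if not 1 <= level <= MAX_LEVEL:
            raise ValueError(f"{dimension}: level {level} is outside 1-{MAX_LEVEL}")
    earned = sum(ratings.values())
    possible = MAX_LEVEL * len(ratings)
    return earned, possible, 100 * earned / possible


# Hypothetical ratings given to one student's essay
essay = {
    "ideas": 3,
    "organization": 4,
    "voice": 2,
    "word choice": 3,
    "sentence fluency": 3,
    "conventions": 3,
}
earned, possible, percent = rubric_report(essay)
print(f"{earned}/{possible} points ({percent:.1f}%)")
```

For the sample ratings this reports 18 of 24 points, or 75%. A rubric whose dimensions carry different weights would multiply each level by its weight before summing.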


A performance is any activity or product a student demonstrates or creates.
Performances can usually be assessed with the aid of a rating rubric or scoring form.
Performance assessments typically focus on the demonstration of skills. Examples include:
• Constructed-response written examinations (essays, sentence completion,
products)
• Oral examinations or presentations
• Projects (individual or team)
• Written case studies
• Portfolios
• Work products
• Student peer or self-evaluations
Performance assessments can be administered to individual students or to groups of
students.

2.6. Arrangement of test items


There are various methods of grouping items in an achievement test, depending on its
purpose. For most purposes the items can be arranged by a systematic consideration
of:
• The type of items used
• The learning outcomes measured
• The difficulty of the items, and
• The subject matter measured
First, the items should be arranged in sections by item type. That is, all true-false items
should be grouped together, then matching items, then all short-answer or completion
items, and then all multiple-choice items. Extended-response essay questions and
performance tasks usually take so much time that they are often administered alone. If
they are combined with some of the other types of items and tasks, the extended-response
tasks should come last.

Arranging the sections of a test in this order produces a sequence that roughly
approximates the complexity of the outcomes measured, ranging from the simple to the
complex. It is then merely a matter of grouping the items within each item type. For
this purpose, items that measure similar outcomes should be placed together and then
arranged in order of ascending difficulty. For example, the items in the multiple-choice
section might be arranged in the following order: knowledge of terms, knowledge
of specific facts, knowledge of principles, and application of principles. Keeping together
items that measure similar learning outcomes is especially helpful in determining the
type of learning outcomes causing students the greatest difficulty.

If, for any reason, it is not feasible to group the items by the learning outcomes
measured, then it is still desirable to arrange them in order of increasing difficulty.
Beginning with the easiest items and proceeding gradually to the most difficult has a
motivating effect on students. Also, encountering difficult items early in the test often
causes students to spend a disproportionate amount of time on such items. If the test is
long, they may be forced to omit later questions that they could easily have answered.
With the items classified by item type, the sections of the test and the items within each
section can be arranged in order of increasing difficulty.

To summarize, the most effective method for organizing items in the typical classroom
test is to:
1. Form sections by item type
2. Group the items within each section by the learning outcomes measured, and
3. Arrange both the sections and the items within sections in an ascending order
of difficulty.
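The three-step procedure summarized above can be sketched as a sort with a three-part key: section (item type), then learning-outcome group, then difficulty. In this hypothetical Python sketch, each item's difficulty is a teacher's estimate on an assumed 1 (easiest) to 5 (hardest) scale. The outcome groups are ordered from simple to complex, following the multiple-choice example in the text. The labels and the `Item` class are illustrative.

```python
from dataclasses import dataclass

# Section order from the text: simpler item types first, essay items last.
TYPE_ORDER = ["true-false", "matching", "short answer", "multiple choice", "essay"]
# Assumed order of outcome groups within a section, from simple to complex.
OUTCOME_ORDER = ["terms", "specific facts", "principles", "application"]


@dataclass
class Item:
    number: int      # item number in the teacher's draft
    item_type: str
    outcome: str
    difficulty: int  # hypothetical teacher estimate: 1 = easiest, 5 = hardest


def arrange(items: list[Item]) -> list[Item]:
    """Order items by section (type), then outcome group, then ascending difficulty."""
    return sorted(
        items,
        key=lambda it: (
            TYPE_ORDER.index(it.item_type),
            OUTCOME_ORDER.index(it.outcome),
            it.difficulty,
        ),
    )


draft = [
    Item(1, "multiple choice", "application", 4),
    Item(2, "true-false", "specific facts", 2),
    Item(3, "essay", "application", 5),
    Item(4, "multiple choice", "terms", 1),
    Item(5, "true-false", "terms", 3),
    Item(6, "multiple choice", "terms", 2),
]
for item in arrange(draft):
    print(item.number, item.item_type, item.outcome, item.difficulty)
```

With the sample draft the printed order is items 5, 2, 4, 6, 1, 3: the true-false section first, the multiple-choice section next, and the essay last. Sorting by outcome before difficulty keeps items that measure the same outcome together (step 2) and orders them by difficulty within each group (step 3).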

Project Work: In groups of four, take one exam paper that contains at least three
types of test items from the school where you are placed for your practicum
experience. Then evaluate the items, and the test in general, against the
guidelines for test construction you have learned in this unit. Prepare and
submit a report of your evaluation to your instructor, with the test paper you
have evaluated attached to your report.

2.7. Administering and Scoring Tests and Reporting Results


Test administration refers to the procedure of actually presenting the learning task that
the examinees are required to perform in order to ascertain the degree of learning that
has taken place during the teaching-learning process. This procedure is as important as
the process of preparing the test, because the validity and reliability of test
scores can be greatly reduced when a test is poorly administered. When administering a
test, all examinees must be given a fair chance to demonstrate their achievement of the
learning outcomes being measured. This requires providing a physical and
psychological environment that is conducive to their making their best efforts, and
controlling factors, such as malpractice and unnecessary threats from test
administrators, that may interfere with valid measurement. Test administration is also
concerned with selecting convenient and accurate procedures for scoring the results.

There are a number of conditions that may create test anxiety in students and that
should therefore be avoided during test administration. These include:
• Threatening students with tests if they do not behave
• Warning students to do their best “because the test is important”
• Telling students they must work fast in order to finish on time
• Threatening dire consequences if they fail

2.7.1. Ensuring Quality in Test Administration


Quality and good control are necessary components of test administration. The
following guidelines and steps are aimed at ensuring quality in test administration.
• Collect the question papers from the custodian in time to start the
test at the stipulated time.
• Ensure compliance with the stipulated seating arrangements to
prevent collusion between or among the test takers.
• Ensure orderly and proper distribution of question papers to the test takers.
• Do not talk unnecessarily before the test. Test takers' time should not be
wasted at the beginning of the test with unnecessary remarks, instructions or
threats that may create test anxiety.
• Remind the test takers of the need to avoid malpractice before they start,
and make it clear that cheating will be penalized.
• Stick to the instructions regarding the conduct of the test and avoid giving hints
to test takers who ask about particular items. But make corrections or
clarifications to the test takers whenever necessary.
• Keep interruptions during the test to a minimum.

2.7.2. Credibility and Civility in Test Administration


Credibility and civility are characteristics of assessment that have day-to-day
relevance for developing educational communities. Credibility concerns the value that
the eventual recipients and users of assessment results place on those results, with
respect to the grades obtained, the certificates issued or the issuing institution. Civility,
on the other hand, asks whether the persons being assessed are in conditions that allow
them to give their best, without hindrances and burdens, in the attributes being assessed,
and whether the exercise is seen as integral or external to the learning process.

Hence, in test administration, effort should be made to ensure that the test takers are
given a fair and unaided chance to demonstrate what they have learnt with respect to:
Instructions: A test should contain a set of instructions, which are usually of two
types: instructions to the test administrator and instructions to the test taker.
The instructions to the test administrator should explain how the test is to be
administered, the arrangements to be made for its proper administration, and the
handling of the scripts and other materials. These instructions should be clear
enough for effective compliance. For the test takers, the instructions should state
the amount of work to be done or the tasks to be accomplished, and explain how
the test should be performed. Examples may be used to illustrate and clarify what
should be done by the test takers. The language used in the instructions should be
appropriate to the level of the test takers. Administrators should explain the
instructions to the test takers to ensure proper understanding, especially when the
ability to understand and follow instructions is not part of what the test measures.

Duration of the Test: The time allowed for completing the test is technically important in
test administration and should be clearly stated for both the test administrators and
the test takers. Ample time should be provided for candidates to demonstrate what they
know and what they can do. The duration of the test should reflect the age and
attention span of the test takers and the purpose of the test.

Venue and Seating Arrangement: The test environment should be learner-friendly,
with adequate physical conditions such as work space, good and comfortable
writing desks, proper lighting, good ventilation, moderate temperature,
conveniences within reasonable distance, and the quiet necessary for maximum
concentration. It is important to provide enough comfortable seats with an
adequate seating arrangement for the test takers' comfort and to reduce
collaboration between them. Adequate lighting, good ventilation and moderate
temperature reduce test anxiety and loss of concentration, both of which affect
performance in the test. Noise is another undesirable factor that has to be
adequately controlled both within and outside the immediate test environment, since
it affects concentration and test scores.

Other necessary conditions: The questions and the question paper should be
reader-friendly, printed in bold characters, neat, decent, clear and appealing, and
not such that they intimidate test takers into making mistakes. All relevant
materials for carrying out the demands of the test should be provided in reasonable
number and quality, and on time.

All these are necessary to enhance test administration and to make the assessment
civil in practice.

To ensure credibility, on the other hand, effort should be made to moderate the test
questions against laid-down standards before administration. It is also important to
ensure that valid questions are constructed following the procedures for test construction
that you have already learned in the earlier sections of this unit.

Secure custody should be provided for the questions from the point of drafting through
to the final version of the test. After the assessment, the live scripts should be kept
safe and transmitted securely to the graders, and the grades arising from the assessment
should be kept secure against loss, mutilation and alteration. The test administrators
and the graders should be of proven moral integrity and should hold appropriate academic
and professional qualifications. The test scripts should be graded and marks awarded
strictly according to itemized marking schemes. All these are necessary because an
assessment whose credibility is seriously called into question cannot really claim to be
valid.

2.7.3 Scoring tests


In the evaluation of classroom learning outcomes, marking schemes are prepared
alongside the construction of the test items in order to score the test objectively. The
marking scheme describes how marks are to be distributed among the questions and
between the various parts of each question. This distribution depends on the
objectives stated for the learning outcomes during teaching and the weight assigned to
the questions during test preparation and item construction. The marking
scheme takes into consideration the facts required to answer the questions and the
extent to which the language used meets the requirements of the subject.

Scoring essay answers so that achievement is measured reliably is more difficult for
teachers than scoring other types of test items. Therefore, some attention will be given
here to the considerations you should make when scoring essay questions.

As you are already aware, the construction and scoring of essay questions are
interrelated processes that require attention if a valid and reliable measure of
achievement is to be obtained. In an essay test the examiner is an active part of the
measurement instrument. Therefore, variability within and between examiners affects
the resulting scores of examinees. This variability is a source of error that reduces the
reliability of an essay test if it is not adequately controlled. Hence, for essay test results
to serve a useful purpose as valid measurements, a conscious effort must be made to
score the test objectively: by using appropriate methods to minimize the effect of
personal biases on the resulting scores, and by applying standards to ensure that only
relevant factors indicated in the course objectives and called for during test
construction are considered during scoring. There are two common methods of
scoring essay questions.

i. The point or analytic method: In this method each answer is compared with
an already prepared ideal marking scheme (scoring key), and marks are assigned
according to the adequacy of the answer. When used carefully, the analytic method
provides a means of maintaining uniformity in scoring between scorers and
between scripts, thus improving the reliability of the scoring.
ii. The global/holistic or rating method: In this method the examiner first sorts the
responses into three or more categories of varying quality based on his or her general
or global impression on reading them. The standard of quality helps to
establish a relative scale, which forms the basis for ranking responses from those
of the poorest quality to those of the highest quality.
When the scorer is completely satisfied that each response is in its proper category,
it is marked accordingly.
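The analytic method can be pictured as adding up the marks attached to those key ideas in the scoring key that a script adequately covers. In the Python sketch below, the scoring key, its marks and the function name `analytic_score` are hypothetical. For simplicity each key idea is treated as all-or-nothing, with no partial credit. The judgement of whether an idea is adequately present still rests with the scorer, who records it as a set.

```python
# Hypothetical scoring key for one essay question: each key idea and its marks.
SCORING_KEY = {
    "defines formative assessment": 2,
    "defines summative assessment": 2,
    "gives a classroom example of each": 4,
    "explains how results are used": 2,
}


def analytic_score(ideas_credited: set[str], key: dict[str, int] = SCORING_KEY) -> int:
    """Sum the marks for the key ideas the scorer judged adequately present."""
    unknown = ideas_credited - set(key)
    if unknown:
        raise ValueError(f"not in the scoring key: {sorted(unknown)}")
    return sum(key[idea] for idea in ideas_credited)


# The scorer reads a script and records which key ideas it covers adequately.
script_a = {"defines formative assessment", "gives a classroom example of each"}
print(analytic_score(script_a), "out of", sum(SCORING_KEY.values()))
```

Here the script earns 6 of 10 marks. Because every script is checked against the same key, two scorers who agree on which ideas are present will reach the same total. This is where the method's gain in reliability comes from.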

As we have seen earlier, the most serious limitation of essay questions is related to
scoring. Therefore, the following guidelines can help make the scoring of
essay items easier and more reliable.
1. Make sure you are emotionally and mentally settled before scoring.
2. Score all responses to one item before moving to the next item.
3. Write out a model answer in advance to guide you in grading the students'
answers.
4. Shuffle the exam papers after scoring each question, before moving to the next.
5. Keep the names of test takers unknown while scoring, to avoid bias.
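Guidelines 2 to 5 describe an order of work that can be sketched in Python. Scripts are identified only by codes, all scripts are scored on one question before the next, and the order of scripts is reshuffled for each question. The model answers, the keyword-matching `judge` function and the sample scripts are all hypothetical stand-ins. Real essay scoring is a human judgement against the model answer; the keyword check only imitates that judgement so the sketch can run.

```python
import random

# Hypothetical model answers written in advance (guideline 3), reduced to the
# key points a scorer looks for.
MODEL_ANSWERS = {
    "Q1": ["validity", "reliability"],
    "Q2": ["formative", "summative", "feedback"],
}


def judge(question: str, answer: str) -> int:
    """Stand-in for the scorer's judgement: one mark per key point mentioned."""
    text = answer.lower()
    return sum(point in text for point in MODEL_ANSWERS[question])


def score_item_by_item(scripts: dict[str, dict[str, str]], seed=None) -> dict[str, dict[str, int]]:
    """Score every script on one question before the next, without seeing names."""
    rng = random.Random(seed)
    # Guideline 5: the scorer works with codes; the code-to-name table is set aside.
    code_to_name = {f"S{i:03d}": name for i, name in enumerate(scripts, start=1)}
    marks = {code: {} for code in code_to_name}
    order = list(code_to_name)
    for question in MODEL_ANSWERS:      # guideline 2: one item at a time
        rng.shuffle(order)              # guideline 4: a fresh order for each item
        for code in order:
            answer = scripts[code_to_name[code]][question]
            marks[code][question] = judge(question, answer)
    # Names are re-attached only after every item has been scored.
    return {code_to_name[code]: per_item for code, per_item in marks.items()}


scripts = {
    "Almaz": {"Q1": "Validity and reliability both matter.", "Q2": "Formative feedback."},
    "Abebe": {"Q1": "Reliability is consistency.", "Q2": "Summative tests grade."},
}
print(score_item_by_item(scripts, seed=1))
```

The scorer never sees a name until the final step. Reshuffling for each question means no script is always read first or last, which helps keep a script's position in the pile from influencing its marks.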

2.7.4 Reporting Assessment Results



The task of reporting students' progress cannot be separated from the procedures
used in assessing students' learning. If educational objectives have been clearly defined
in performance terms and relevant tests and other assessment procedures have been
properly used, grading and reporting become a matter of summarizing the results and
presenting them in an understandable form. School grades and progress reports serve
various functions in the school. They provide information that is helpful to students,
parents and school personnel.

Different reporting methods have been used in schools, including letter grades, numeric
grades, the pass-fail system, checklists of objectives, portfolios of selected examples of
student work, and parent-teacher conferences. In our country's education system,
numeric grades are used to report students' performance at the secondary school
level. This doesn't, however, mean that we should be limited to this reporting method.
Depending on the purposes we want our report to serve, we can use a combination of
two or more reporting methods. For example, we may give marks to summarize
students' overall performance. At the same time we may hold conferences with parents
to report a qualitative description of students' progress in their learning. Thus, as
prospective teachers, we need the skills to interpret results for students' parents and
other lay audiences. Such conferences supplement the more formal written
report of students' progress.

Unit Summary
In this unit you were introduced to different types of assessment approaches, namely
formal vs. informal, criterion-referenced vs. norm-referenced, and formative vs. summative
assessment. You also learned about various assessment strategies. These include
classroom presentations, exhibitions/demonstrations, conferences, interviews,
observations, performance tasks, portfolios, question and answer, students'
self-assessment, checklists, rating scales and rubrics, the one-minute paper, the muddiest
point, student-generated questions and tests.

You also learned about the challenges in the assessment of large classes, their
consequences, and some of the strategies that we can use to minimize those
challenges. These strategies include front-ending, making use of in-class assignments,
self- and peer assessment, group assessment, and changing the assessment method, or at
least shortening it.

Much of this unit was devoted to the construction of the most widely used assessment
technique, namely tests. This was preceded by a discussion of the planning of tests. In this
regard, tests were classified into two broad categories: objective tests and performance
assessment tasks (essay tests). Objective tests were further divided into supply-type
items and selection-type items. Supply-type items include short-answer and completion
items, whereas selection-type items include true/false items, matching items and
multiple-choice items. Essay items were also classified into restricted-response and
extended-response essay items. Here you learned about the strengths and limitations of
these different test item types. You were also introduced to the major guidelines you
should follow in constructing them.

This unit also covered the arrangement of test items. Finally, you learned about the
techniques and procedures we should follow during test administration.

Self-Check Exercises
1. State the differences between formative and summative
assessment, criterion-referenced and norm-referenced
assessment, and formal and informal assessment.
2. What conditions do we consider in selecting assessment strategies for our
subject?
3. List the major assessment strategies that you can use in your subject and
classify them as formal or informal strategies.
4. What are the major problems associated with assessing students in large
classes? What strategies can we use to minimize these problems?
5. What is a table of specification and what major purposes does it serve?
6. What is the difference between objective tests and essay tests?
7. What are the advantages of objective tests as compared to essay tests?
8. What are the advantages of essay tests as compared to objective tests?
9. What are the major procedures we need to follow during test administration?

References

Angelo, T.A. & Cross, K.P (1993). Classroom Assessment Techniques; A Handbook
for College Teachers. 2nd Ed. San Francisco: Jossey-Bass Publishers.

Braun, H., Kanjee, A., Bettinger, E., and Kremer. M. (2006). Improving Education
Through Assessment, Innovation, and Evaluation. American Academy of Arts and
Sciences

Educational Testing Services. Linking Classroom Assessment with Student


Learning.

Ellis, V. (Ed). (2007). Learning and Teaching in Secondary Schools. 3 rd ed. Learning
Matters Ltd

McDonald E. S. & Hershman D. M. (2010). Classrooms that Spark! Recharge and


Revive Your Teaching. 2nd Ed. San Francisco: Jossey-Bass Publishers.

Mehrens, W.A. & Lehman, I.J Measurement and Evaluation in Education. 4 th Ed.
New York: Harcourt Brace College Publishers.

Miller, D.M, Linn, RL. & Grunland, NE. (2009). Measurement and Assessment in
Teaching. 10th ed. Upper Saddle River:Pearson Education, Inc.

Phye, G. D. (ed). (1997). Handbook of Classroom Assessment Learning,


Achievement, and Adjustment. San Diago: Academic Press.

Spiller, D. (2009). Assessment: Feedback to promote student learning. Teaching


Development| Wāhanga Whakapakari Ako

82
Assessment and Evaluation of Learning

Western and Northern Canadian Protocol for Collaboration in Education. (2006).


Rethinking Classroom Assessment with Purpose in Mind: Assessment for
Learning, Assessment as Learning, and Assessment of Learning

83
Assessment and Evaluation of Learning

UNIT 3

Describing and Interpreting Test Scores


3.1 Introduction

In Unit 2 you learned about various assessment strategies that can be
used in the context of secondary education. You were also introduced
to the planning, construction and administration of classroom tests. In
this unit you will become familiar with the idea of test score interpretation and the
major statistical techniques that can be used to interpret test scores. In particular, you
will learn about methods of interpreting test scores, measures of central tendency,
measures of dispersion or variability, measures of relative position, and measures of
relationship or association.

Learning Outcomes

Upon completion of this unit, you should be able to:

• Describe test scores.
• List techniques of interpreting scores.
• Apply measures of central tendency, variability, relative position and
relationship in interpreting scores.
• Select appropriate technique(s) for interpreting test scores.
• Develop reports on the implications of scores.
• Propose appropriate decisions based on score interpretation.

3.2 Describing and interpreting test results

Imagine that you receive a grade of 60 for a midterm exam in one of your university
classes. What does the score mean, and how should we interpret it?

Test interpretation is the process of assigning meaning and usefulness to the scores
obtained from a classroom test. This is necessary because the raw score obtained from a
test, standing on its own, rarely has meaning. For instance, a score of 60% on one
Assessment and Evaluation of Learning test cannot be said to be better than a score of
50% obtained by the same test taker on another test in the same subject. Test
scores on their own lack a true zero point and equal units. Moreover, they are not based
on the same standard of measurement, and as such no meaning can be read into the
raw scores themselves on which academic and psychological decisions could be based.

Kinds of scores

Data differ in terms of what properties of the real number series (order, distance, or
origin) we can attribute to the scores. The most common kinds of scores include
nominal, ordinal, interval, and ratio scales.

A nominal scale involves the assignment of different numerals to categories that are
qualitatively different. For example, we may assign the numeral 1 for males and 2 for
females. These symbols do not have any of the three characteristics (order, distance, or
origin) we attribute to the real number series. The 2 does not indicate more of
something than the 1.

An ordinal scale has the order property of the real number series and gives an indication
of rank order. For example, ranking students by their performance in a certain
athletic event would involve an ordinal scale. We know who is best, second best, third
best, etc. But the ranks do not tell us anything about the differences between the
scores.

With interval data we can interpret the distances between scores. If, on a test with
interval data, Almaz has a score of 60, Abebe a score of 50, and Beshadu a score of
30, we could say that the distance between Abebe's and Beshadu's scores (50 to 30)
is twice the distance between Almaz's and Abebe's scores (60 to 50).

If one measures with a ratio scale, the ratio of the scores has meaning. Thus, a person
whose height is 2 meters is twice as tall as a person whose height is 1 meter. We can
make this statement because a measurement of 0 actually indicates no height. That is,
there is a meaningful zero point. However, if a student scored 0 on a spelling test, we
would not interpret the score to mean that the student had no spelling ability.

3.2.1. Methods of Interpreting test scores

If a student responds correctly to 65 items on an objective test in which each correct
item counts one point, the raw score will be 65. Thus a raw score is simply the number
of points received on a test when the test has been scored according to the directions.
We are all familiar with raw scores from our many years of taking classroom tests.
Although a raw score is a numerical summary of a student's test performance, it is not
meaningful without further information. In general, we can give meaning to a raw
score either by converting it into a description of the specific tasks the student can
perform (criterion-referenced interpretation) or by converting it into some type of derived
score that indicates the student's relative position in a clearly defined reference group
(norm-referenced interpretation). In some cases both types of interpretation may be
appropriate and useful.

Criterion-referenced interpretation

Criterion-referenced interpretation is the interpretation of a raw test score based on its
conversion into a description of the specific tasks that the learner can
perform. That is, a score is given meaning by comparing it with a standard of
performance that is set before the test is given. It permits the description of a learner's
test performance without reference to the performance of others. Thus, we might
describe a pupil's performance in terms of the speed with which a task is performed,
the precision with which a task is performed, or the percentage of items correct on some
clearly defined set of learning tasks. The percentage-correct score is widely used in
criterion-referenced test interpretation.

Criterion-referenced interpretation of test results is most meaningful when the test has
been specifically designed for this purpose. This typically involves designing a test that
measures a set of clearly stated learning tasks, with enough items for each task
to make it possible to describe test performance in terms of students'
mastery or non-mastery of the learning tasks.
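A percentage-correct, mastery or non-mastery interpretation can be sketched as follows. The Python code assumes a hypothetical eight-item test in which each item is tagged with one of two learning tasks. It also assumes a mastery standard of 75% correct, set before the test is given. The tasks, the cutoff and the function name `task_report` are illustrative only.

```python
# Hypothetical test: each item number is tagged with the learning task it measures.
ITEM_TASK = {
    1: "add fractions", 2: "add fractions", 3: "add fractions", 4: "add fractions",
    5: "subtract fractions", 6: "subtract fractions",
    7: "subtract fractions", 8: "subtract fractions",
}
MASTERY_CUTOFF = 75.0  # assumed standard, set before the test is given


def task_report(correct_items: set[int]) -> dict[str, tuple[float, bool]]:
    """Percentage correct and mastery decision for each learning task."""
    totals: dict[str, int] = {}
    right: dict[str, int] = {}
    for item, task in ITEM_TASK.items():
        totals[task] = totals.get(task, 0) + 1
        right[task] = right.get(task, 0) + (item in correct_items)
    report = {}
    for task, n_items in totals.items():
        percent = 100 * right[task] / n_items
        report[task] = (percent, percent >= MASTERY_CUTOFF)
    return report


# One student answered items 1, 2, 3, 5 and 8 correctly (a raw score of 5).
for task, (percent, mastered) in task_report({1, 2, 3, 5, 8}).items():
    print(f"{task}: {percent:.0f}% correct -> {'mastery' if mastered else 'non-mastery'}")
```

For this student the report shows 75% correct and mastery on adding fractions, but 50% and non-mastery on subtracting fractions. No other student's score enters the calculation.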

Norm-referenced test interpretation

Norm-referenced interpretation is the interpretation of a raw score based on its
conversion into some type of derived score that indicates the learner's
relative position in a clearly defined reference group. This type of interpretation tells us
how an individual compares with other persons who have taken the same test.

Norm-referenced interpretation is usually applied to classroom tests by
ranking the test takers' raw scores from highest to lowest. Each score is then interpreted by
noting the position of the individual's score relative to those of the other test takers in the
class. An interpretation such as "third from the top" or "about average in the
class" provides a meaningful report for the teacher and the test
takers on which to base decisions. In this type of test score interpretation, what is
important is a sufficient spread of test scores to provide a reliable ranking. The
percentage score or the relative ease or difficulty of the test is not necessarily
important in interpreting test scores in terms of relative performance.
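Ranking raw scores from highest to lowest and reading off each test taker's position can be sketched in Python as below. The scores are hypothetical. Giving tied scores the same position and then skipping the next position (as in 1, 2, 3, 3, 5) is one common convention, assumed here; it is not the only one.

```python
def class_positions(scores: dict[str, int]) -> list[tuple[int, str, int]]:
    """Rank raw scores from highest to lowest; tied scores share a position."""
    # sorted() is stable, so tied students keep their original order.
    ordered = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    positions: list[tuple[int, str, int]] = []
    for index, (name, score) in enumerate(ordered):
        if index > 0 and score == ordered[index - 1][1]:
            position = positions[-1][0]  # tie: same position as the previous score
        else:
            position = index + 1         # one more than the number ranked so far
        positions.append((position, name, score))
    return positions


# Hypothetical class scores
scores = {"Almaz": 60, "Abebe": 50, "Beshadu": 30, "Hana": 50, "Dawit": 72}
for position, name, score in class_positions(scores):
    print(position, name, score)
```

Abebe and Hana share third place and Beshadu is fifth. The positions say nothing about how many items any of them answered correctly.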

3.2.2. Measures of Central Tendency


It is often important to summarize characteristics of a distribution of test scores. One
characteristic of particular interest is a measure of central tendency. The goal of
measures of central tendency is to find the single score that best
describes a distribution of scores. They let us know whether the distribution of scores
tends to be composed of high scores or low scores.

There are three basic measures of central tendency (the mean, the mode and the
median), and choosing one over another depends on two things:
1. The scale of measurement used, so that the summary makes sense given the
nature of the scores.
2. The shape of the frequency distribution, so that the measure accurately
summarizes the distribution.

The Mean

The mean, or arithmetic average, is the most widely used measure of central tendency.
It is computed by adding together all the scores and dividing the sum by the number of
scores. Because the mean takes into account the value of every score, a single extremely
high or low score can have a considerable effect on it. Knowing the mean lets you see
which scores are above it and which are below it.

Here is an example of test scores for a mathematics class: 82, 93, 86, 97, 82. To find the
mean, first add up all of the scores (82 + 93 + 86 + 97 + 82 = 440). Since there are 5
scores, next divide the sum by 5 (440 ÷ 5 = 88). Thus, the mean is 88. The formula used
to compute the mean is:

$$\bar{X} = \frac{\sum X}{N}$$

Where, $\bar{X}$ = Mean
∑ = the sum of
X = any score
N = Number of scores
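As a quick check of the arithmetic, here is a minimal Python sketch of the formula applied to the five mathematics scores. The function name `mean` is just an illustrative choice.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores (∑X) divided by N."""
    return sum(scores) / len(scores)

math_scores = [82, 93, 86, 97, 82]
print(sum(math_scores))   # 440
print(mean(math_scores))  # 88.0
```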

The Median

In some circumstances, the mean may not be the best indicator of student
performance. If one or a few students score considerably lower (or higher) than the
other students, their scores tend to pull the mean in their direction. In this case the
median is usually considered a better indicator of student performance. There are also
some types of scores reported for standardized tests, such as percentile scores, for which
the mean is not appropriate, so the median is used.

The median is a counting average: it is the point that divides a distribution of scores
exactly in half. It is found by arranging the scores in order of size and counting up (or
down) to the midpoint of the set of scores. The median usually lies near where most of
the scores fall. When the number of scores is odd, the median is the middle score. When
the number of scores is even, the median lies halfway between the two middlemost
scores; in this case the median may not be a score actually earned by any student.

Example 1   Example 2   Example 3   Example 4
Scores      Scores      Scores      Scores
50          50          49          50
48          49          48          49
48          48          48          47
47          46          47          47
45          46          45          45
44          43          44          45
43          43          43          45
42          42          42          44
42          41          42          42
41          41          41          41
                        38          41

In example 1, the dividing line falls between 45 and 44, so the median is halfway
between them, at 44.5. In this case the median is not a score actually earned by any of
the students. In example 2, the two middle scores (46 and 43) are more than one point
apart, but we again take the point halfway between them, giving a median of 44.5. When
the number of scores is odd, the median is the single middle score in the distribution,
with equal numbers of scores above and below it. Thus, the median is 44 in example 3
and 45 in example 4. It does not matter if more than one student earns that score, as in
example 4.
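The counting rule for the median can be sketched in Python as follows, using the four example columns above. The helper name `median` is illustrative; Python's standard `statistics.median` applies the same odd/even rule.

```python
def median(scores):
    """Median: sort the scores, then take the middle score (odd N) or the
    point halfway between the two middlemost scores (even N)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

example_1 = [50, 48, 48, 47, 45, 44, 43, 42, 42, 41]
example_2 = [50, 49, 48, 46, 46, 43, 43, 42, 41, 41]
example_3 = [49, 48, 48, 47, 45, 44, 43, 42, 42, 41, 38]
example_4 = [50, 49, 47, 47, 45, 45, 45, 44, 42, 41, 41]

for ex in (example_1, example_2, example_3, example_4):
    print(median(ex))  # 44.5, 44.5, 44, 45
```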
The Mode
The mode is the score (or scores) that occurs most frequently, and it is determined by
inspection. It is the least reliable type of statistical average and is frequently used merely
as a preliminary estimate of central tendency. A set of scores may have two or more
modes: a set with two modes is called bimodal, and one with more than two is called
multimodal.

If the data are categorical (measured on the nominal scale), only the mode can be
calculated. The mode can also be calculated for ordinal and higher-level data, but it is
often not appropriate; if other measures can be calculated, the mode would not be the
first choice. For example, the test scores 7, 7, 7, 20, 23, 23, 24, 25, 26 have a mode of 7,
which clearly does not represent the set well. Remember, a measure of central tendency
looks for the one number that best describes all of the numbers.
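A minimal Python sketch of finding the mode by counting, assuming Python 3.8 or later for `statistics.multimode` (which returns every most-frequent value, so it also exposes bimodal sets). The bimodal list and the subject names are hypothetical examples.

```python
from statistics import multimode  # Python 3.8+

scores = [7, 7, 7, 20, 23, 23, 24, 25, 26]
print(multimode(scores))            # [7]: the mode, though it represents the set poorly

print(multimode([3, 3, 5, 5, 8]))   # [3, 5]: two modes, so the set is bimodal

# Nominal (categorical) data: only the mode makes sense here
subjects = ["Math", "English", "Math", "Biology"]
print(multimode(subjects))          # ['Math']
```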

Shape of Distributions: Skewness


There is one important situation in which all three measures of central tendency are
identical. This occurs when a distribution is symmetrical with a single peak, that is, when
the right half of the distribution is the mirror image of the left half. In this case the mean
falls exactly at the middle of the distribution (the median position), and the value at this
central point is the most frequently observed score, the mode. The converse does not
always hold: identical values of the mean, the median and the mode suggest, but do not
guarantee, that a distribution is symmetrical.

Figure 1: Shape of distribution of scores

To the extent that these three measures differ, the distribution is asymmetrical, or
"skewed". Skewed distributions may be positively or negatively skewed. In a positively
skewed distribution (see Figure 1), most of the scores are concentrated at the low end of
the distribution, with a tail toward the high end. This might occur, for example, if the test
was extremely difficult for the students. In a negatively skewed distribution (see Figure 1),
the majority of scores are toward the high end of the distribution, with a tail toward the
low end. This could occur if we gave a test that was easy for most of the students.

Points to note
• With perfectly bell-shaped distributions, the mean, median, and mode are identical.
• With positively skewed data, the mode is lowest, followed by the median and mean.
• With negatively skewed data, the mean is lowest, followed by the median and mode.
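The ordering rules above can be checked on small data sets. The scores below are invented for illustration: `hard_test` has most scores at the low end with a tail to the right (positive skew), and `easy_test` is its mirror image (negative skew).

```python
from statistics import mean, median, mode

# Hypothetical scores on a very difficult test: positively skewed
hard_test = [1, 1, 1, 2, 2, 3, 4, 5, 9]
print(mode(hard_test), median(hard_test), round(mean(hard_test), 2))
# 1 2 3.11  -> mode < median < mean

# Mirror image of the same scores: negatively skewed (an easy test)
easy_test = [10 - x for x in hard_test]
print(mode(easy_test), median(easy_test), round(mean(easy_test), 2))
# 9 8 6.89  -> mean < median < mode
```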

3.2.3 Measures of Variability/Dispersion


The measures of central tendency focus on what is typical, average or in the middle of a
distribution. The information they provide is not sufficient to convey all we need to know
about a distribution: knowing the mean, the median or the mode (or all of these) does not
allow us to distinguish between distributions that differ in spread. We need additional
information about the distributions.

A set of scores can be more adequately described if we know how much they spread
out above and below the measure of central tendency. For example, we might have two
groups of students with a mean score of 70, but in one group the span of scores is from
60 to 80 and in the other group the span is from 50 to 100. These represent quite
different spreads of performance. We can identify such differences by numbers that
indicate how much scores spread out in a group. These are called measures of
variability or dispersion. The three most commonly used measures of variability are the
range, the quartile deviation, and the standard deviation.

The Range

The range is the simplest and crudest measure of variability. It is calculated by
subtracting the lowest score from the highest score. For example, if the scores of 10
students on a certain test are 5, 7, 8, 10, 12, 13, 14, 15, 17, 19, then the range is
19 - 5 = 14. The range provides a quick estimate of variability but is undependable
because it is based on the positions of only the two extreme scores. The addition or
subtraction of a single score can change the range significantly.
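A minimal Python sketch of the range, using the ten scores from the example. The extra score of 45 is hypothetical and is added only to show how strongly one extreme score can change the range.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

scores = [5, 7, 8, 10, 12, 13, 14, 15, 17, 19]
print(score_range(scores))          # 14
print(score_range(scores + [45]))   # 40: one extreme score changes the range a lot
```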

Inter-quartile range

The inter-quartile range (IQR) is another range measure, but this time it looks at the
data in terms of quarters (percentiles). The IQR is the distance between the 25th and
75th percentiles, that is, between the first and third quartiles. The ordered data are
divided into four equal parts, each containing 25% of the scores, and the IQR is the range
of the middle 50% of the data. Because it uses only the middle 50%, it is not affected by
outliers or extreme values. For this reason the IQR is often used with skewed data.
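A sketch of the IQR in Python, assuming Python 3.8 or later for `statistics.quantiles`. Note that there are several conventions for locating quartiles; this sketch uses the library's default ("exclusive") method, so a hand calculation using a different textbook method may give slightly different quartiles. The extra score of 45 is hypothetical.

```python
from statistics import quantiles  # Python 3.8+

def iqr(scores):
    """Inter-quartile range Q3 - Q1, using statistics.quantiles' default
    ('exclusive') quartile method; other textbook methods may differ slightly."""
    q1, _q2, q3 = quantiles(scores, n=4)
    return q3 - q1

scores = [5, 7, 8, 10, 12, 13, 14, 15, 17, 19]
print(iqr(scores))           # 7.75
print(iqr(scores + [45]))    # 9.0: the extreme score barely moves the IQR
```

Compare this with the range of the same data, which jumps from 14 to 40 when the score of 45 is added.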

The Standard Deviation

Suppose that two classes took a quiz. There were 10 students in each class, and each
class had an average score of 81.5. Since the averages are the same, can we assume
that the students in both classes performed the same on the quiz?

The answer is no. The average (mean) tells us nothing about the distribution or variation
of the scores. So we need some way of measuring not just the average, but also the
spread of the distribution of our data.

The most useful measure of variability, or spread of scores, is the standard deviation. It
is essentially an average of the degree to which a set of scores deviates from the mean.
If the standard deviation is large, the scores are spread out from their mean; if it is small,
the scores are close to their mean. Because it takes into account the amount by which
each score deviates from the mean, it is a more stable measure of variability than either
the range or the quartile deviation.

The procedure for calculating a standard deviation involves the following steps:

1. Compute the mean.
2. Subtract the mean from each individual's score to obtain a deviation score.
3. Square each of these deviation scores.
4. Find the sum of the squared deviations, $\sum (X - \bar{X})^2$.
5. Divide the sum obtained in step 4 by N, the number of students, to get the
variance.
6. Find the square root of the result of step 5. This number is the standard deviation
(SD) of the scores.

Thus the formula for the standard deviation (SD) is:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
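The six steps translate almost line for line into Python. This is a minimal sketch; the function name `population_sd` is an illustrative choice, and the two lists are the Group A and Group B scores used in the worked example of this section.

```python
import math

def population_sd(scores):
    """Standard deviation following the six steps (dividing by N)."""
    n = len(scores)
    m = sum(scores) / n                            # step 1: the mean
    squared_devs = [(x - m) ** 2 for x in scores]  # steps 2-3: squared deviations
    variance = sum(squared_devs) / n               # steps 4-5: the variance
    return math.sqrt(variance)                     # step 6: the square root

group_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
group_b = [57, 63, 65, 71, 83, 93, 94, 95, 96, 98]
print(round(population_sd(group_a), 2))  # 4.63
print(round(population_sd(group_b), 2))  # 15.1
```

Python's `statistics.pstdev` returns the same value. By contrast, `statistics.stdev` divides by N - 1 rather than N, so it gives a slightly larger result for small groups.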

Now let us return to the scenario of the two groups of students who took a mathematics
quiz with a mean score of 81.5, and calculate and compare their standard deviations. The
individual scores of group A are 72, 76, 80, 80, 81, 83, 84, 85, 85 and 89. The individual
scores of group B are 57, 63, 65, 71, 83, 93, 94, 95, 96 and 98. Let us start with group A.
The first step in finding the standard deviation is to find each score's distance from the
mean. We then square each distance, which gives the following results.

Scores of Group A   Distance from the mean   Distance squared
72                  -9.5                     90.25
76                  -5.5                     30.25
80                  -1.5                     2.25
80                  -1.5                     2.25
81                  -0.5                     0.25
83                   1.5                     2.25
84                   2.5                     6.25
85                   3.5                     12.25
85                   3.5                     12.25
89                   7.5                     56.25
Then we add up all of the squared distances, which gives 214.5. Dividing this by the
total number of scores in the group gives 214.5 / 10 = 21.45. This is the variance of the
data set. The variance is the average squared deviation from the mean of a set of data,
and it is used to find the standard deviation. Finally, we calculate the square root of the
variance, which gives 4.63, the standard deviation.
Activity: Using the same procedure, calculate the standard deviation for the
scores of Group B.

You should have obtained approximately 15.1 as the standard deviation of the scores of
group B. Now, let us compare the two groups of students again.

                      Group A   Group B
Average on the quiz   81.5      81.5
Standard deviation    4.63      15.1

What is your interpretation of the test scores of the two groups based on their standard
deviations?

Activity: The mathematics test scores of five students are 92, 88, 80, 68 and 52.
Find the variance and standard deviation.

The standard deviation, like other measures of variability, represents a distance. If we
move a distance equal to one SD above and below the mean, we will find that in most
distributions of scores somewhere between 60% and 75% of the scores fall in that region.
In a normal distribution, about 68% of the scores lie between the mean minus one SD and
the mean plus one SD.

Which measure of dispersion to use


The quartile deviation (half the inter-quartile range) is used with the median and is
satisfactory for analyzing a small number of scores. Because the quartiles are obtained by
counting, and thus are not affected by the value of every score, the quartile deviation is
especially useful when one or more scores deviate markedly from the others in the set.

The standard deviation is used with the mean. It is the most reliable measure of
variability, and is especially useful in testing. In addition to describing the spread of
scores in a group, it serves as a basis for computing standard scores, the standard error
of measurement, and other statistics used in analyzing and interpreting test scores.

3.2.4. Measures of Relative Position

There are different ways to measure the relative position of scores. Suppose that you
scored 55 on a test. What would you say about this score?

On the surface it might look bad, but what if it was the highest score in the class, or if it
was better than 80% of the class? This is what we mean by relative position.

Percentiles

A percentile indicates the rank of a student compared with others (of the same age or
grade), using a hypothetical group of 100 students. It tells you what percentage of people
you did better than. A percentile of 25 (the 25th percentile), for example, indicates that
the student's test performance equals or exceeds that of 25 out of 100 students on the
same measure. A percentile of 87 indicates that the student equals or surpasses 87 out
of 100 (or 87% of) students. A percentile rank must always be interpreted relative to a
particular norm group. If you scored at the 80th percentile, what does that mean?

Converting a Data Value to a Percentile

1. Arrange the data in ascending order.
2. Count how many values are below your value. If, for example, your score is 85 and
there are several 85s, count how many values are below the first 85.

For example, in the students' scores 76, 77, 80, 83, 85, 85, 85, 90, 96, 97, there are
4 values below 85.

$$\text{Percentile} = \frac{(\text{number of values below your value}) + 0.5}{\text{total number of values}} \times 100\%$$

So, in our example:

$$\text{Percentile} = \frac{4 + 0.5}{10} \times 100\% = 45\%$$

that is, a score of 85 is at the 45th percentile.
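The two-step conversion can be sketched in Python as follows; `percentile_rank` is a hypothetical helper name, and the data are the ten scores from the example. Counting only values strictly below the given value matches the rule of counting the values under the first 85.

```python
def percentile_rank(value, data):
    """Percentile rank = (B + 0.5) / n * 100, where B is the number of
    data values strictly below `value` and n is the number of values."""
    below = sum(1 for x in data if x < value)
    return (below + 0.5) * 100 / len(data)

scores = [76, 77, 80, 83, 85, 85, 85, 90, 96, 97]
print(percentile_rank(85, scores))  # 45.0
```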

Quartiles
Quartiles are another way of expressing percentile positions. The total of 100% is broken
into four equal parts: 25%, 50%, 75% and 100%.
• The lower quartile (Q1) is the 25th percentile (0.25).
• The middle quartile (Q2), the median, is the 50th percentile (0.50).
• The upper quartile (Q3) is the 75th percentile (0.75).

Standard Scores

Another method of indicating a pupil's relative position in a group is to show how far the
raw score is above or below the average. This is the approach used with standard
scores. Standard scores are based on the mean and standard deviation: they express
test performance in terms of standard deviation units from the mean.

Types of standard scores

Z Score: For data distributions that are approximately symmetric, a commonly used
measure of relative position is the z-score. A z-score tells us how many standard
deviations a particular score lies from the mean. It is defined as:

$$z = \frac{X - \bar{X}}{s}$$

Where, X = the data value in question
$\bar{X}$ = the sample mean
s = the sample standard deviation

For instance, if a person scored 70 on a test with a mean of 50 and a standard deviation
of 10, then they scored 2 standard deviations above the mean. So, a z-score of 2 means
the original score was 2 standard deviations above the mean.

If the z-score is 0, your data value equals the mean.

If the z-score > 0 (positive), your data value is above the mean.

If the z-score < 0 (negative), your data value is below the mean.

Example. Almaz scored 25 on her mathematics test, for which the mean was 21 and the
standard deviation 4. Dawit scored 60 on an English test, which had a mean of 50 and a
standard deviation of 5. Who did relatively better?

Since standardized tests typically have approximately symmetric score distributions, we
find the respective z-scores for Almaz and Dawit:

$$z_{\text{Almaz}} = \frac{25 - 21}{4} = 1 \qquad z_{\text{Dawit}} = \frac{60 - 50}{5} = 2$$

Since Dawit had the higher z-score, we say Dawit did relatively better.
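The comparison can be sketched in Python as below. `z_score` is a hypothetical helper name, and the numbers are those of the Almaz and Dawit example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

almaz = z_score(25, mean=21, sd=4)   # mathematics test
dawit = z_score(60, mean=50, sd=5)   # English test
print("Almaz:", almaz)  # Almaz: 1.0
print("Dawit:", dawit)  # Dawit: 2.0
```

Because both scores are now expressed in the same standard deviation units, they can be compared directly even though the two tests had different means and spreads.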

T Scores: A T-score is a standard score on a scale with a mean of 50 and a standard
deviation of 10. It is obtained by multiplying the z-score by 10 and adding the product
to 50, that is, T = 50 + 10z. A T-score of 60 is one standard deviation above the mean,
while a T-score of 30 is two standard deviations below the mean.

Example

A test has a mean score of 40 and a standard deviation of 4. What are the T-scores of
two test takers who obtained raw scores of 30 and 45, respectively, on the test?

Solution

The first step in finding the T-scores is to obtain the z-scores for the test takers; the
z-scores are then converted to T-scores. For the test taker with a raw score of 30:

$$z = \frac{X - M}{SD}$$

where M is the mean, and here X = 30, M = 40, SD = 4. Thus:

$$z = \frac{30 - 40}{4} = \frac{-10}{4} = -2.5$$

The T-score is then obtained by converting the z-score (-2.5):

$$T = 50 + 10z = 50 + 10(-2.5) = 50 - 25 = 25$$
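The two-step conversion (raw score to z-score, then z-score to T-score) can be sketched in Python as follows. `t_score` is a hypothetical helper name; the call uses the first test taker from the worked example.

```python
def t_score(x, mean, sd):
    """Convert a raw score to a T-score: T = 50 + 10z, with z = (x - mean) / sd."""
    z = (x - mean) / sd
    return 50 + 10 * z

print(t_score(30, mean=40, sd=4))  # 25.0
```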

Activity: Following the same procedure, find the T-score for the second test
taker, whose raw score is 45.

3.2.5 Measures of Relationship


If we have two sets of scores from the same group of people, it is often desirable to
know the degree to which the scores are related. For example, we may be interested in
the relationship between students' scores in English and their overall scores in other
subjects. The degree of relationship is expressed as a coefficient of correlation, whose
value ranges from -1.00 to +1.00. A perfect positive correlation is indicated by a
coefficient of +1.00 and a perfect negative correlation by a coefficient of -1.00. A
correlation of .00 indicates no relationship between the two sets of scores. The larger the
coefficient in absolute value (whether positive or negative), the higher the degree of
relationship expressed.

There are several different measures of relationship expressed as correlation
coefficients. One of these is the product-moment correlation coefficient, which is by far
the most commonly used and most useful correlation coefficient. It is indicated by the
symbol r.

The formula for obtaining the coefficient of correlation is:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N S_X S_Y}$$

Where, X = score of a person on one variable
Y = score of the same person on the other variable
$\bar{X}$ = mean of the X distribution
$\bar{Y}$ = mean of the Y distribution
$S_X$ = standard deviation of the X scores
$S_Y$ = standard deviation of the Y scores
N = number of pairs of scores
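A minimal Python sketch of the product-moment formula, with the standard deviations computed by dividing by N, as in the standard deviation section. The English and other-subject scores are invented for illustration. The sketch assumes that neither variable has zero spread, since otherwise the denominator would be zero.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation: sum of (X - mean_X)(Y - mean_Y)
    divided by N * S_X * S_Y, with each S computed by dividing by N."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (n * sx * sy)

# Hypothetical scores of five students (illustrative only)
english = [45, 60, 55, 70, 80]
others = [50, 62, 58, 75, 78]
print(round(pearson_r(english, others), 2))
```

Python 3.10 and later also provide `statistics.correlation`, which returns the same Pearson coefficient.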

Project work: In groups of five, take the roster of one cooperating teacher at the
school where you are placed for your practicum experience and do the
following tasks:

a) Calculate the average mark of each student in the section, taking five
subjects
b) Based on the calculated averages, find the mode, the median, the range, the
inter-quartile range, and the standard deviation
c) Find the average scores that lie at the 25th, 50th, and 75th percentiles
d) Take the scores of two subjects and calculate the coefficient of correlation
Prepare a report of your work and submit it for correction.

3.3 Characteristics of a good test


A test, as an instrument, must possess certain qualities before it can be considered
eligible and usable as a test. A good test should therefore possess the following
characteristics:

Validity is the most important quality to consider when constructing or selecting a test.
It refers to the degree to which a test serves its purpose(s), that is, measures what it is
intended to measure, to the extent desired. It is concerned with whether the information
being gathered is relevant to the decision that needs to be made, and so with the extent
to which assessment information can be trusted (truthfulness). Validity is therefore always
concerned with the specific use of the results and the soundness of our proposed
interpretations. To the extent that a test score is determined by factors or abilities other
than those the test was designed or used to measure, its validity is impaired.

The following factors can influence the validity of a test:

• Unclear directions
• Inappropriate level of difficulty
• Poorly constructed items (e.g. items containing clues to the answer)
• Test items inappropriate for the objectives being measured
• Improper arrangement of items
• Identifiable pattern of answers
• Cheating in exams, or emotional disturbance of examinees

Reliability: Test reliability refers to the accuracy, consistency and stability of the scores
students would receive on alternate forms of the same test. The more the pair of scores
observed for the same testee differ from each other, the less reliable the measure is.
The more consistent our test results are from one measurement to another, the less
error there will be and, consequently, the greater the reliability. Reliability is a precondition
for test validity: if test scores cannot be assigned reliably, it is impossible to conclude
that the scores accurately measure the domain of interest.

Several factors affect the reliability of a test, including the following:
• Test length: the longer a test is, the more reliable it is (because wider coverage of
content is ensured), but it should NOT be TOO LONG.
• Sample heterogeneity: the more heterogeneous the group of test takers (that is, the
wider the spread of their scores), the higher the reliability.
• Irregularities: lighting conditions, testees' failure to follow directions.
In order to be valid, a test must be reliable; but reliability does not guarantee validity.

Objectivity: A good test is fair to the testee. A biased test lacks objectivity and hence is
not reliable. Objectivity supports both the validity and the reliability of a test.

Discrimination: A good test must be able to distinguish between poor and good
learners; it should reveal even slight differences in learners' attainment and achievement,
making it possible to tell poor and good learners apart. What criteria are likely to be
needed to satisfy these conditions?

Comprehensiveness: A test whose items cover much of the content (subject matter) of
the course is said to be comprehensive and hence capable of fulfilling its purpose.

Ease of administration: A good test should not pose difficulties in administration.

Practicality and scoring: Assigning a quantitative value to a test result should not be
difficult. Why, what and how?

Usability: A good test should be usable, unambiguous, and clearly stated with only one
meaning.

Unit Summary

In this unit you learned that test interpretation is the process of assigning meaning and
usefulness to the scores obtained from classroom tests, and you were introduced to two
ways of interpreting test scores: criterion-referenced and norm-referenced interpretation.
Criterion-referenced interpretation converts a raw score into a description of the specific
tasks that the learner can perform. Norm-referenced interpretation converts a raw score
into some type of derived score that indicates the learner's relative position in a clearly
defined reference group.

This unit also introduced you to different statistical techniques that are useful in
interpreting test scores. These are classified into measures of central tendency,
measures of dispersion, measures of relative position, and measures of association or
relationship. The measures of central tendency help us identify the single score that best
describes a distribution of scores. The most commonly used measures of central
tendency are the mean, the mode and the median. The measures of dispersion tell us
how much the scores spread out above and below the measure of central tendency, as
well as how much they are spread out from one another. These measures include the
range, the inter-quartile range and the standard deviation. The measures of relative
position show the relative standing of individual scores within a set of scores. Measures
used here include percentile ranks, quartiles, and standard scores such as z-scores and
T-scores. The measures of relationship or association tell us the degree to which sets of
scores are related. The most commonly used measure of relationship is the product-moment
correlation coefficient. Finally, you also learned about the major characteristics
of a good test.

Self Check Exercises

1. What is test interpretation, and why is it necessary to interpret classroom tests?
2. Highlight the major difference between criterion-referenced interpretation and
norm-referenced interpretation of test scores.
3. Suppose a student has taken two quizzes in a statistics course. On the first quiz the
mean score was 32, the standard deviation was 8, and the student received a 44.
The student obtained a 28 on the second quiz, for which the mean was 23 and the
standard deviation was 3. If test scores are approximately normal, on which quiz did
the student perform better relative to the rest of the class?
4. You have given an exam to your students. Scores on this exam are normally
distributed with mean = 40 and standard deviation = 6.
a) What score would a student need to be in the top 15%?
b) What score represents the 45th percentile?
c) If 200 students took the exam, how many would you expect to score below 30?


UNIT 4

ITEM ANALYSIS

4.1 INTRODUCTION

In the previous unit, you learned about different statistical techniques for interpreting
test scores. In this unit, you are going to learn about techniques for analyzing responses
to test items so as to determine their validity and reliability. You will also learn about the
advantages and techniques of test item banking.

Learning Outcomes

Upon completion of this unit, you should be able to:

• Analyze test items using statistical methods
• Define item difficulty and discrimination indices
• Compute item difficulty and discrimination indices
• Analyze the effectiveness of distracters for multiple-choice items
• Improve item quality through response analysis
• Select items for different purposes
• Establish a test item bank for future use

4.2. Analyzing test items

Once a teacher has corrected and marked his/her students' test papers, what do you
think he/she should do with them? Should he/she throw them away? Keep them? Or
what?

Item analysis is an important phase in the construction of tests. It is the process of
examining or analyzing testees' responses to each item on a test, with the basic intent of
judging the quality of each item. Item analysis helps to determine the adequacy of the
items within a test as well as the adequacy of the test itself. There are several reasons for
analyzing questions and tests that students have completed and that have already been
graded. Some of the reasons that have been cited include the following:

1. Identify content that has not been adequately covered and should be re-taught,
2. Provide feedback to students,
3. Determine if any items need to be revised in the event they are to be used again
or become part of an item file or bank,
4. Identify items that may not have functioned as they were intended,
5. Direct the teacher's attention to individual student weaknesses.

The results of an item analysis provide information about the difficulty of the items and
the ability of the items to discriminate between better and poorer students. If an item is
too easy, too difficult, fails to show a difference between skilled and unskilled examinees,
or has even been scored incorrectly, an item analysis will reveal it. The two most common
statistics reported in an item analysis are the item difficulty and the item discrimination.
An additional analysis that is often reported is the distractor analysis. Once the item
analysis information is available, an item review is often conducted. In the following
sections you are going to learn the statistical techniques used to analyze responses to
test items.

4.2.1. Item difficulty level index

How difficult do you think a test should be? How do we determine the
difficulty level of test items? Why is it important to know the difficulty
level of test items? Please think over these questions and share your
ideas with a friend.

Item difficulty index is one of the most useful, and most frequently reported, item
analysis statistics. It is a measure of the proportion of examinees who answered the
item correctly; for this reason it is frequently called the p-value. If scores from all
students in a group are included, the difficulty index is simply the total percent correct.
When a sufficient number of scores is available (i.e., 100 or more), difficulty indexes are
calculated using scores from the top and bottom 27 percent of the group.

Item analysis procedures

1. Rank the papers in order from the highest to the lowest score.
2. Select the one-third of the papers with the highest total scores and the one-third of
the papers with the lowest total scores.
3. For each test item, tabulate the number of students in the upper and lower groups
who selected each option.
4. Compute the difficulty of each item (the percentage of students who answered the
item correctly).
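Steps 1 to 4 can be sketched in Python as below. This is only an illustration under stated assumptions: the data layout (a list of dictionaries with hypothetical `total` and `answers` keys), the function names, and the six sample papers are all invented. The `fraction` parameter defaults to one-third, as in step 2, and can be set to 0.27 for the top-and-bottom 27 percent rule mentioned earlier.

```python
from collections import Counter

def upper_lower_groups(papers, fraction=1/3):
    """Steps 1-2: rank papers by total score (highest first) and return the
    top and bottom `fraction` of them. At least one paper per group."""
    ranked = sorted(papers, key=lambda p: p["total"], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]

def tabulate_options(group, item):
    """Step 3: count how many students in `group` chose each option on `item`."""
    return Counter(p["answers"][item] for p in group)

def difficulty(upper, lower, item, key):
    """Step 4: proportion of students in both groups who chose the keyed answer."""
    correct = tabulate_options(upper, item)[key] + tabulate_options(lower, item)[key]
    return correct / (len(upper) + len(lower))

# Hypothetical papers: total score and the option chosen on item 1 (key = "D")
papers = [
    {"total": 48, "answers": {1: "D"}},
    {"total": 45, "answers": {1: "D"}},
    {"total": 40, "answers": {1: "C"}},
    {"total": 33, "answers": {1: "D"}},
    {"total": 30, "answers": {1: "C"}},
    {"total": 21, "answers": {1: "A"}},
]
upper, lower = upper_lower_groups(papers)
print(tabulate_options(upper, 1), tabulate_options(lower, 1))
print(difficulty(upper, lower, item=1, key="D"))  # 0.5
```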

Item difficulty index can be calculated using the following formula:

$$P = \frac{HSG + LSG}{N}$$

Where, HSG = number of students in the high scoring group who answered the item correctly
LSG = number of students in the low scoring group who answered the item correctly
N = the total number of students in the HSG and LSG

The difficulty index can range between 0.0 and 1.0 and is often expressed as a
percentage. A higher value indicates that a greater proportion of examinees responded
to the item correctly, and thus that the item was easier. The average difficulty of a test is
the average of the individual item difficulties. For maximum discrimination among
students, an average difficulty of about .60 is ideal. For example, if 243 students
answered item no. 1 correctly and 9 students answered it incorrectly, the difficulty level of
the item would be 243/252, or .96.
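A minimal Python sketch of both ways of computing P described above. The helper names are illustrative, and the counts 9 and 3 (out of 20) passed to `item_difficulty` are hypothetical. The 243/9 example is the one given in the text.

```python
def item_difficulty(correct_high, correct_low, n_total):
    """P = (HSG + LSG) / N: proportion of the examinees in both groups
    who answered the item correctly."""
    return (correct_high + correct_low) / n_total

def difficulty_from_totals(n_correct, n_incorrect):
    """When all students are included, P is simply the proportion correct."""
    return n_correct / (n_correct + n_incorrect)

print(round(difficulty_from_totals(243, 9), 2))  # 0.96
print(item_difficulty(9, 3, 20))                 # 0.6 (hypothetical counts)
```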

In the example below, five true-false questions were part of a larger test administered to
a class of 20 students. For each question, the number of students answering correctly
was determined and then converted to the percentage of students answering correctly.

Question   Correct responses   Item difficulty
1          15                  75% (15/20)
2          17                  85% (17/20)
3          6                   30% (6/20)
4          13                  65% (13/20)
5          20                  100% (20/20)

Activity: Calculate the item difficulty level for the following four-option
multiple-choice test item. (The asterisk (*) marks the correct answer.)

                Response options
Groups          A    B    C    D*   Total
High scorers    0    1    1    8    10
Low scorers     1    1    5    3    10
Total           1    2    6    11   20

Item difficulty interpretation

P-value             Percent range   Interpretation
≥ 0.75              75-100          Easy
≤ 0.25              0-25            Difficult
between .25 & .75   26-74           Average
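The interpretation table can be sketched in Python as a simple classification. The cut-offs are the ones in the table; the function name is an illustrative choice, and the data are the five true-false questions from the earlier example.

```python
def interpret_difficulty(p):
    """Label an item using the cut-offs of the interpretation table."""
    if p >= 0.75:
        return "Easy"
    if p <= 0.25:
        return "Difficult"
    return "Average"

# Correct responses on the five true-false questions (20 students)
correct = {1: 15, 2: 17, 3: 6, 4: 13, 5: 20}
for question, n_correct in correct.items():
    p = n_correct / 20
    print(question, p, interpret_difficulty(p))
```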

For criterion-referenced tests (CRTs), with their emphasis on mastery testing, many
items on an exam form will have p-values of .9 or above. Norm-referenced tests
(NRTs), on the other hand, are designed to be harder overall and to spread out the
examinees' scores. Thus, many of the items on an NRT will have difficulty indexes
between .4 and .6.
4.2.2. Item discrimination index

To what extent do you think a test item should discriminate between
higher achievers and lower achievers? Should it be highly
discriminating, averagely discriminating, or less discriminating? What
are your reasons?

The index of discrimination is a numerical indicator that enables us to determine whether the question discriminates appropriately between lower scoring and higher scoring students. When students who earn high scores are compared with those who earn low scores, we would expect to find more students in the high scoring group answering a question correctly than students from the low scoring group. In the case of very difficult items which no one in either group answered correctly or fairly easy questions which even the students in the low group answered correctly, the numbers of correct answers might be equal for the two groups. What we would not expect to find is a case in which the low scoring students answered correctly more frequently than students in the high group.

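Item discrimination index can be calculated using the following formula:

$$D = \frac{\text{Success in the HSG} - \text{Success in the LSG}}{\tfrac{1}{2} \times \text{total number of students in both groups}}$$

Where, HSG = High Scoring Group
LSG = Low Scoring Group
and "success" in a group is the number of students in that group who answered the item correctly.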

In the example below, there are 8 students in the high scoring group and 8 in the low scoring group (the 12 students whose scores fall between the two groups are not represented). For question 1, all 8 in the high scoring group answered correctly, while only 4 in the low scoring group did so. Thus, Success in the HSG − Success in the LSG = 8 − 4 = +4. The last step is to divide the +4 by half of the total number of students in both groups (half of 16, which is 8). Thus, 4 ÷ 8 gives +.5, which is the D-value.

Question   Success in the HSG   Success in the LSG   Difference   D value
1          8                    4                    8 − 4 = 4    .5
2          7                    2
3          5                    6

Activity 2: Calculate the item discrimination index for questions 2 and 3 in the table above.
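The formula can be sketched in Python as follows. This is a minimal illustration that assumes, as in the example, that the high and low scoring groups contain the same number of students, so half of the combined total equals the size of one group. The function name `discrimination_index` is hypothetical. Only question 1 is computed here, leaving questions 2 and 3 for Activity 2.

```python
def discrimination_index(success_high: int, success_low: int, group_size: int) -> float:
    """D = (success in the HSG - success in the LSG) / (half of both groups combined).

    Assumes equal-sized groups of `group_size` students each.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for n in (success_high, success_low):
        if not 0 <= n <= group_size:
            raise ValueError("success counts must lie between 0 and group_size")
    half_of_both_groups = (2 * group_size) / 2
    return (success_high - success_low) / half_of_both_groups


# Question 1 from the table: 8 of 8 correct in the HSG, 4 of 8 in the LSG.
print(discrimination_index(8, 4, group_size=8))  # 0.5
```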

The item discrimination index can vary from -1.00 to +1.00. A negative discrimination
index (between -1.00 and zero) results when more students in the low group answered
correctly than students in the high group. A discrimination index of zero means equal
numbers of high and low students answered correctly, so the item did not discriminate
between groups. A positive index occurs when more students in the high group answer
correctly than the low group. If the students in the class are fairly homogeneous in
ability and achievement, their test performance is also likely to be similar, resulting in
little discrimination between high and low groups.

Questions that have an item difficulty index (NOT item discrimination) of 1.00 or 0.00 need not be included when calculating item discrimination indices. An item difficulty of 1.00 indicates that everyone answered correctly, while 0.00 means no one answered correctly. We already know that neither type of item discriminates between students.

When computing the discrimination index, the scores are divided into three groups with
the top 27% of the scores in the upper group and the bottom 27% in the lower group.
The number of correct responses for an item by the lower group is subtracted from the
number of correct responses for the item in the upper group. The difference is divided
by the number of students in either group. The process is repeated for each item.
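The 27% procedure can be sketched in Python. This is only an illustration under stated assumptions: each student is represented by a total test score and a list of 0/1 item scores, the group size is 27% of the class rounded to the nearest whole number, and students with tied totals at the cut-off keep their input order. The function name `discrimination_27` and the scores below are hypothetical.

```python
from typing import List, Sequence, Tuple

# One student: (total test score, item scores where 1 = correct and 0 = incorrect).
Student = Tuple[float, Sequence[int]]


def discrimination_27(students: List[Student], fraction: float = 0.27) -> List[float]:
    """Discrimination index for every item, using upper and lower 27% groups."""
    n = len(students)
    group_size = round(fraction * n)  # e.g. 27% of 10 students -> 3
    if group_size < 1 or 2 * group_size > n:
        raise ValueError("too few students to form separate upper and lower groups")
    # Rank from highest to lowest total; sorted() is stable, so ties keep input order.
    ranked = sorted(students, key=lambda s: s[0], reverse=True)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    num_items = len(students[0][1])
    indices = []
    for item in range(num_items):
        upper_correct = sum(s[1][item] for s in upper)
        lower_correct = sum(s[1][item] for s in lower)
        # The difference is divided by the number of students in either group.
        indices.append((upper_correct - lower_correct) / group_size)
    return indices


# Hypothetical data: 10 students, 3 items (for illustration only).
students = [
    (25, [1, 0, 0]), (38, [1, 1, 0]), (12, [0, 0, 1]), (33, [1, 0, 1]),
    (22, [0, 1, 0]), (35, [1, 1, 1]), (18, [0, 0, 1]), (30, [1, 1, 0]),
    (15, [1, 0, 0]), (27, [0, 1, 1]),
]
for i, d in enumerate(discrimination_27(students), start=1):
    print(f"Item {i}: D = {d:+.2f}")
```

With 10 students, 27% rounds to 3, so the totals 38, 35 and 33 form the upper group and 18, 15 and 12 the lower group. Item 3 is answered correctly by two students in each group, so its D is zero.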

The value is interpreted in terms of both:

• direction (positive or negative) and
• strength (non-discriminating to strongly-discriminating).

These values can range from -1.00 to +1.00.

Item discrimination interpretation

D-Value          Direction    Strength
> +.40           positive     strong
+.20 to +.40     positive     moderate
-.20 to +.20     none         ---
< -.20           negative     moderate to strong
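The interpretation table can be sketched as a Python lookup. The table's ranges share their end points (+.20 and +.40 each appear in two rows), so the sketch has to pick a side: it treats exactly +.20 as non-discriminating, in line with the guideline below that calls values exceeding .20 satisfactory, and exactly +.40 as moderate. These boundary choices and the function name `interpret_discrimination` are assumptions, not part of the original table.

```python
from typing import Tuple


def interpret_discrimination(d: float) -> Tuple[str, str]:
    """Return (direction, strength) for a D-value, following the interpretation table.

    Assumption: +.20 exactly counts as non-discriminating; +.40 exactly as moderate.
    """
    if not -1.0 <= d <= 1.0:
        raise ValueError("D must lie between -1.00 and +1.00")
    if d > 0.40:
        return ("positive", "strong")
    if d > 0.20:
        return ("positive", "moderate")
    if d >= -0.20:
        return ("none", "---")
    return ("negative", "moderate to strong")


for d in (0.5, 0.25, 0.0, -0.5):
    direction, strength = interpret_discrimination(d)
    print(f"D = {d:+.2f}: {direction}, {strength}")
```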

For a small group of students, an index of discrimination for an item that exceeds .20 is considered satisfactory. For larger groups, the index should be higher because more difference between groups would be expected. The guidelines for an acceptable level of discrimination depend upon item difficulty. For very easy or very difficult items, low discrimination levels would be expected; most students, regardless of ability, would get the item correct or incorrect as the case may be. For items with a difficulty level of about 70 percent, the discrimination should be at least .30.

When an item is discriminating negatively, overall the most knowledgeable examinees are getting the item wrong and the least knowledgeable examinees are getting the item right. A negative discrimination index may indicate that the item is measuring something other than what the rest of the test is measuring. More often, it is a sign that the item has been mis-keyed.

4.2.3. Effectiveness of Distractors in multiple choice items

One important element in the quality of a multiple choice item is the quality of the item’s distractors. However, neither the item difficulty nor the item discrimination index considers the performance of the incorrect response options, or distractors. A distractor analysis evaluates the effectiveness of the distractors in each item by comparing the number of students in the upper and lower groups who selected each incorrect alternative (a good distractor will attract more students from the lower group than the upper group).
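The upper/lower comparison described above can be sketched in Python. As an illustration, it reuses the counts from the item difficulty Activity earlier in this section (High Scorers: A 0, B 1, C 1, D* 8; Low Scorers: A 1, B 1, C 5, D* 3). The function name `distractor_report` and the wording of its verdicts are our own; the only rule taken from the text is that a good distractor attracts more students from the lower group than from the upper group.

```python
from typing import Dict, List


def distractor_report(upper: Dict[str, int], lower: Dict[str, int], key: str) -> List[str]:
    """Compare how many upper- and lower-group students chose each incorrect option."""
    report = []
    for option in sorted(set(upper) | set(lower)):
        if option == key:
            continue  # only the distractors are analysed here
        high, low = upper.get(option, 0), lower.get(option, 0)
        if high + low == 0:
            verdict = "chosen by nobody: probably implausible"
        elif low > high:
            verdict = "attracts more low scorers: working as intended"
        elif low == high:
            verdict = "does not discriminate between the groups"
        else:
            verdict = "attracts more high scorers: review this option"
        report.append(f"{option}: upper={high}, lower={low} -> {verdict}")
    return report


# Counts from the item difficulty Activity (D is the key).
upper_group = {"A": 0, "B": 1, "C": 1, "D": 8}
lower_group = {"A": 1, "B": 1, "C": 5, "D": 3}
for line in distractor_report(upper_group, lower_group, key="D"):
    print(line)
```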

Just as the key, or correct response option, must be definitively correct, the distracters
must be clearly incorrect (or clearly not the "best" option). In addition to being clearly
incorrect, the distractors must also be plausible. That is, the distractors should seem
likely or reasonable to an examinee who is not sufficiently knowledgeable in the content
area.

If a distractor appears so unlikely that almost no examinee will select it, it is not contributing to the performance of the item. In fact, the presence of one or more implausible distractors in a multiple choice item can make the item artificially far easier than it ought to be. Let us try to explain this using the following table, which shows the responses of eight students to five multiple-choice questions.

                  A      B      C      D
TEST ITEM NO 1    5**    1      1      1
TEST ITEM NO 2    0      2      6**    0
TEST ITEM NO 3    2**    2      2      2
TEST ITEM NO 4    0      3**    0      5
TEST ITEM NO 5    2      1      0      5**
** Denotes Correct Answer

Over 50% of the students answered question number 1 correctly, and each of the
distractors was selected. The distractors have functioned as they should. The teacher
may be less than satisfied with only 5 of 8 students answering correctly, but a class
would generally have more than eight students and could well have a higher percentage
of correct answers while still having effective distractors.

It is not desirable to have one of the distractors chosen more often than the correct
answer, as occurred with question 4. This result indicates a potential problem with the
question. Distractor D may be too similar to the correct answer and/or there may be
something in either the stem or the alternatives that is misleading.

If students do not know the correct answer and are purely guessing, their answers
would be expected to be distributed among the distractors as well as the correct
answer, much like question 3. If one or more distractors are not chosen, as occurs in
questions 2, 4, and 5, the unselected distractors probably are not plausible. If the
teacher wants to make the test more difficult, those distractors should be replaced in
subsequent tests.

In a simple approach to distractor analysis, the proportion of examinees who selected each of the response options is examined. The proportion of examinees who select each of the distractors can be very informative. For example, it can reveal an item mis-key. Whenever the proportion of examinees who selected a distractor is greater than the proportion of examinees who selected the key, the item should be examined to determine if it has been mis-keyed or double-keyed. A distractor analysis can also reveal an implausible distractor. In criterion referenced tests, where the item p-values are typically high, the proportions of examinees selecting all the distractors are, as a result, low. Nevertheless, if examinees consistently fail to select a given distractor, this may be evidence that the distractor is implausible or simply too easy.
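The simple proportion-based check can be sketched in Python using the eight-student table above. This is an illustrative sketch: the function name `analyse_item` is hypothetical, and it applies only the two warning signs named in the text, namely a distractor chosen by a larger proportion than the key (possible mis-key) and a distractor that nobody chose (possibly implausible).

```python
from typing import Dict, List, Sequence, Tuple

OPTIONS = ("A", "B", "C", "D")


def analyse_item(counts: Sequence[int], key: str) -> Tuple[Dict[str, float], List[str]]:
    """Return the proportion choosing each option and any warning flags for one item."""
    if len(counts) != len(OPTIONS) or key not in OPTIONS:
        raise ValueError("expected one count per option A-D and a key among them")
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses recorded for this item")
    proportions = {opt: n / total for opt, n in zip(OPTIONS, counts)}
    flags = []
    for opt, n in zip(OPTIONS, counts):
        if opt == key:
            continue
        if proportions[opt] > proportions[key]:
            flags.append(f"{opt} chosen more often than the key: check for mis-keying")
        if n == 0:
            flags.append(f"{opt} never chosen: possibly implausible")
    return proportions, flags


# Responses of eight students to five items (counts for A, B, C, D) and each item's key.
items = {
    1: ([5, 1, 1, 1], "A"),
    2: ([0, 2, 6, 0], "C"),
    3: ([2, 2, 2, 2], "A"),
    4: ([0, 3, 0, 5], "B"),
    5: ([2, 1, 0, 5], "D"),
}

for number, (counts, key) in items.items():
    proportions, flags = analyse_item(counts, key)
    summary = ", ".join(f"{opt}={p:.2f}" for opt, p in proportions.items())
    print(f"Question {number} (key {key}): {summary}")
    for flag in flags:
        print("    " + flag)
```

Run on the table, this flags the unchosen distractors in questions 2, 4 and 5, and flags distractor D in question 4 because it was chosen more often than the key, matching the discussion of that table.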
Project Work
In the school where you are placed for your Practicum activities, obtain the corrected exam papers of one section from your cooperating teacher and, taking 10 multiple choice questions:
i. calculate the difficulty level of each item
ii. calculate the discrimination power of each item
iii. analyze the plausibility of the distractors
Present your work in the form of a report.

4.2.4. Building a Test Item File (Item Bank)

Building a file of effective test items and assessment tasks involves recording the items or tasks, adding information from analyses of students’ responses, and filing the records by both the content area and the objective that the item or task measures. Thus, items and tasks are recorded as they are constructed; information from the analysis of students’ responses is added after the items and tasks have been used; and then the effective items and tasks are deposited in the file. In a few years, it is possible to start using some of the items and tasks from the file and supplement these with new items and tasks. As the file grows, it becomes possible to select the majority of the items and tasks from the file for any given test or assessment without repeating them frequently. Such a file is especially valuable in areas of complex achievement, where the construction of test items and assessment tasks is difficult and time consuming. When enough high-quality items and tasks have been assembled, the burden of preparing tests and assessments is considerably lightened. Computer item banking makes the task even easier.
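The record-keeping steps described above (record the item, add analysis results, file by content area and objective) can be sketched as a small Python data structure. This is a hypothetical sketch, not a description of any particular item-banking software: the class names, field names and sample items are our own, and the rule that an item counts as "effective" when its discrimination index exceeds .20 is an assumption borrowed from the small-group guideline in section 4.2.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ItemRecord:
    """One entry in a test item file (field names are illustrative only)."""
    item_id: str
    content_area: str
    objective: str
    text: str
    # Added after the item has been used and its responses analysed.
    difficulty: Optional[float] = None
    discrimination: Optional[float] = None
    notes: List[str] = field(default_factory=list)


class ItemBank:
    """A minimal in-memory item file, filed by content area and objective."""

    def __init__(self) -> None:
        self._items: List[ItemRecord] = []

    def add(self, record: ItemRecord) -> None:
        self._items.append(record)

    def select(self, content_area: str, objective: Optional[str] = None,
               min_discrimination: float = 0.20) -> List[ItemRecord]:
        """Analysed items for a content area (optionally one objective) whose D exceeds the minimum."""
        return [
            r for r in self._items
            if r.content_area == content_area
            and (objective is None or r.objective == objective)
            and r.discrimination is not None
            and r.discrimination > min_discrimination
        ]


bank = ItemBank()
bank.add(ItemRecord("Q1", "Item analysis", "Compute item difficulty",
                    "What does an item difficulty of .75 indicate?",
                    difficulty=0.75, discrimination=0.50))
bank.add(ItemRecord("Q2", "Item analysis", "Interpret discrimination",
                    "What might a negative D-value indicate?",
                    difficulty=0.95, discrimination=0.05))
bank.add(ItemRecord("Q3", "Item analysis", "Compute item difficulty",
                    "Define the item discrimination index."))  # not yet analysed
print([r.item_id for r in bank.select("Item analysis")])  # ['Q1']
```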

Summary

In this unit you learned how to judge the quality of a classroom test by carrying out item analysis, which is the process of “testing the item” to ascertain specifically whether the item is functioning properly in measuring what the entire test is measuring. You also learned about the process of item analysis, how to compute item difficulty and item discriminating power, and how to evaluate the effectiveness of distractors. You have learned that item difficulty indicates the percentage of testees who get the item right; item discriminating power is an index which indicates how well an item is able to distinguish between the high achievers and low achievers given what the test is measuring; and the distraction power of a distractor is its ability to differentiate between those who do not know and those who know what the item is measuring.

Finally, you learned that, after item analysis, some items may still be usable once modest changes are made to improve their performance on future exams. Thus, good test items should be kept in test item banks, and this unit gave you highlights on how to build a Test Item File/Item Bank.

Self-check Exercises

1. What is the purpose of test item analysis?
2. What is the item difficulty index?
3. What is the power of item discrimination?
4. What is the basic intent of distractor analysis?
5. From the data presented below (where alternative A is the correct answer), compute the difficulty level and the discrimination power and comment on the effectiveness of the distractors.

                   Alternatives
Group (n)       A*     B      C      D
Upper (28)      14     5      6      3
Lower (28)      7      15     1      5

6. What information should be included with the test items we put in our item bank?

4.3 References

Abay Tekle (1982). Evaluation in Education (part one). Bahir Dar Teachers
College (Unpublished teaching material)

Bigge, J.L. and Colleen Shea Stump (1999). Curriculum, Assessment, and
Instruction. Boston: Wadsworth Publishing Company.

EQUIP (2008). Reader on Student Assessment. Addis Ababa.

UNIT 5
Ethical Standards of assessment
5.1 Introduction

In the previous units you have learned about different assessment-related concepts, different strategies and techniques for assessing students’ learning, and methods of maintaining the quality of tests. In this unit you will be introduced to ethics as a mechanism for maintaining quality in our assessment practice. You will become familiar with some basic ethical standards expected of professional teachers in their assessment practices. You will also become familiar with some general considerations in addressing diversity in the classroom so as to make assessment procedures accessible and free of bias.

Learning Outcomes

Upon completion of this unit, you should be able to:

• List ethical and professional standards of assessment
• Propose contextualized ethical and professional standards for using assessment
• Recognize the consequences of unethical use of assessments
• Adhere to the ethical standards of tests and test use

5.2. Ethical and Professional Standards of Assessment and its Use


Ethical standards guide teachers in fulfilling their professional obligation to provide and
use tests that are fair to all test takers regardless of age, gender, disability, ethnicity,
religion, linguistic background, or other personal characteristics.

Fairness is a primary consideration in all aspects of testing. It:

• helps to ensure that all test takers are given a comparable opportunity to demonstrate what they know and how they can perform in the area being tested.
• implies that every test taker has the opportunity to prepare for the test and is informed about the general nature and content of the test.
• also extends to the accurate reporting of individual and group test results.

The following are some ethical standards that teachers may consider in their assessment practices.

1. Teachers should be skilled in choosing assessment methods that enable them to make appropriate instructional decisions. Skills in choosing appropriate, useful, administratively convenient, technically adequate, and fair assessment methods are prerequisite to good use of information to support instructional decisions. Teachers need to be well-acquainted with the kinds of information provided by a broad range of assessment alternatives and their strengths and weaknesses. In particular, they should be familiar with criteria for evaluating and selecting assessment methods in light of instructional plans.

2. Teachers should develop tests that meet the intended purpose and that are appropriate for the intended test takers. This requires teachers to:
• Define the purpose for testing, the content and skills to be tested, and the intended test takers.
• Develop tests whose content, skills tested, and content coverage are appropriate for the intended purpose of testing.
• Develop tests that have clear, accurate, and complete information.
• Develop tests with appropriately modified forms or administration procedures for test takers with disabilities who need special accommodations.

3. Teachers should be skilled in administering, scoring, and interpreting the results from diverse assessment methods. It is not enough that teachers are able to select and develop good assessment methods; they must also be able to apply them properly. This requires teachers to:
• Follow established procedures for administering tests in a standardized manner.
• Provide and document appropriate procedures for test takers with disabilities who need special accommodations or those with diverse linguistic backgrounds.
• Protect the security of test materials, including eliminating opportunities for test takers to obtain scores by fraudulent means.
• Develop and implement procedures for ensuring the confidentiality of scores.

4. Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement. Assessment results are used to make educational decisions at several levels: in the classroom about students, in the community about a school and a school district, and in society, generally, about the purposes and outcomes of the educational enterprise. Teachers play a vital role when participating in decision-making at each of these levels and must be able to use assessment results effectively.

5. Teachers should be skilled in developing valid pupil grading procedures which use pupil assessments. Grading students is an important part of professional practice for teachers. Grading is defined as indicating both a student's level of performance and a teacher's valuing of that performance. The principles for using assessments to obtain valid grades are known and teachers should employ them.

6. Teachers should be skilled in communicating assessment results to students, parents, other lay audiences, and other educators. Teachers must routinely report assessment results to students and to parents or guardians. In addition, they are frequently asked to report or to discuss assessment results with other educators and with diverse lay audiences. If the results are not communicated effectively, they may be misused or not used. To communicate effectively with others on matters of student assessment, teachers must be able to use assessment terminology appropriately and must be able to articulate the meaning, limitations, and implications of assessment results. Furthermore, teachers will sometimes be in a position that will require them to defend their own assessment procedures and their interpretations of them. At other times, teachers may need to help the public to interpret assessment results appropriately.
7. Teachers should be skilled in recognizing unethical, illegal, and otherwise
inappropriate assessment methods and uses of assessment information.
Fairness, the rights of all concerned, and professional ethical behavior must
undergird all student assessment activities, from the initial planning for and
gathering of information to the interpretation, use, and communication of the
results. Teachers must be well-versed in their own ethical and legal
responsibilities in assessment. In addition, they should also attempt to have the
inappropriate assessment practices of others discontinued whenever they are
encountered. Teachers should also participate with the wider educational
community in defining the limits of appropriate professional behavior in
assessment.

In addition, the following are principles of grading that can guide the development of a grading system.
1. The system of grading should be clear and understandable (to parents, other stakeholders, and most especially students).

2. The system of grading should be communicated to all stakeholders (e.g., students, parents, administrators).

3. Grading should be fair for all students regardless of gender, socioeconomic status or any other personal characteristics.

4. Grading should support, enhance, and inform the instructional process.

Project work: In groups of five, prepare interview questions or a questionnaire to assess the extent to which assessment ethics is respected in the school where you are placed for your Practicum experience.

Using the instrument you have prepared, collect data from the concerned members of the school community (teachers, students), analyze the data and reach valid conclusions. Prepare a report presenting your conclusions as well as the procedures you went through to reach them.

5.3. Ethnicity and Culture in tests and assessments

In the previous section you learned that fairness is the fundamental principle that has to be followed in teachers’ assessment practices: all students have to be provided with an equal opportunity to demonstrate the skills and knowledge being assessed. Fairness is fundamentally a socio-cultural, rather than a technical, issue. Thus, in this section we are going to see how culture and ethnicity may influence teachers’ assessment practices and what precautions we have to take in order to avoid bias and accommodate students from all cultural groups.

Do you believe that culture and ethnicity have any role in teachers’ assessment practices? In your university experience, have you observed situations where instructors were biased in assigning grades to students based on culture and ethnicity? If so, do you think that was fair?

Students represent a variety of cultural and linguistic backgrounds. If the cultural and
linguistic backgrounds are ignored, students may become alienated or disengaged from
the learning and assessment process. Teachers need to be aware of how such
backgrounds may influence student performance and the potential impact on learning.
Teachers should be ready to provide accommodations where needed.

Classroom assessment practices should be sensitive to the cultural and linguistic diversity of students in order to obtain accurate information about their learning. Assessment practices that attend to issues of cultural diversity include those that
• acknowledge students’ cultural backgrounds.
• are sensitive to those aspects of an assessment that may hamper students’ ability to demonstrate their knowledge and understanding.
• use that knowledge to adjust or scaffold assessment practices if necessary.

Assessment practices that attend to issues of linguistic diversity include those that
• acknowledge students’ differing linguistic abilities.
• use that knowledge to adjust or scaffold assessment practices if necessary.
• use assessment practices in which the language demands do not unfairly prevent the students from understanding what is expected of them.
• use assessment practices that allow students to accurately demonstrate their understanding by responding in ways that accommodate their linguistic abilities, if the response method is not relevant to the concept being assessed (e.g., allow a student to respond orally rather than in writing).

Teachers must make every effort to address and minimize the effect of bias in classroom assessment practices. Bias occurs when irrelevant or arbitrary factors systematically influence the results of an assessment, or the interpretations made from them, for an individual student or a subgroup of students. For example, bias may occur when variables such as cultural and language differences and socioeconomic status are not fairly accounted for when interpreting results from an assessment.

Assessment should be culturally and linguistically appropriate, fair and bias-free. It may not be possible to totally eliminate all forms of bias from classroom assessments. However, teachers and others who assess students’ learning should recognize that bias is an ever-present concern in student assessment, remain vigilant against its sources, and have plans for identifying and addressing it. For an assessment task to be fair, its content, context, and performance expectations should:

• reflect knowledge, values, and experiences that are equally familiar and appropriate to all students;
• tap knowledge and skills that all students have had adequate time to acquire;
• be as free as possible of cultural and ethnic stereotypes.

5.4. Disability and Assessment Practices


It is quite obvious that our education system was exclusionary in fully accommodating
the educational needs of disabled students. This has been true not only in our country
but in the rest of the world as well, although the magnitude might differ from country to
country. It was in response to this situation that UNESCO has been promoting the
principle of inclusive education to guide the educational policies and practice of all
governments. Different world conventions were held and documents signed towards the
implementation of inclusive education. Our country, Ethiopia, has been a signatory of
these documents and therefore has accepted inclusive education as a basic principle to
guide its policy and practice in relation to the education of disabled students

Activity: In groups of five, find and discuss the following documents and briefly report the ideas each document addresses in relation to inclusive education:
1. The Dakar Framework for Action (2000)
2. The Salamanca Statement and Framework for Action on Special Needs Education (1994)
3. UN Convention on the Rights of Persons with Disabilities (2006)
Each group should work on one document; the documents can be found on the internet.

Inclusive education is based on the idea that all students, including those with disabilities, should be provided with the best possible education to develop themselves. This implies providing all possible accommodations to address the educational needs of disabled students. Accommodations should not be limited to the teaching and learning process; they should also extend to assessment mechanisms and procedures.

Activity: In small groups, discuss what types of accommodations can be made to make assessment practices accessible to students with different types of disabilities. Each group may discuss one type of disability and share its ideas with the other groups.

There are different strategies that can be considered to make assessment practices accessible to students with disabilities, depending on the type of disability. In general terms, however, the following strategies could be considered in summative assessments:

• Modifying assessments: This should enable disabled students to have full access to the assessment without giving them any unfair advantage.
• Others’ support: Disabled students may need the support of others in certain assessment activities which they cannot do independently. For instance, they may require readers and scribes in written exams; they may also need others’ assistance in practical activities, such as using equipment, locating materials, drawing and measuring.
• Time allowances: Disabled students should be given additional time to complete their assessments; the individual instructor decides how much, based on the purpose and nature of the assessment.
• Rest breaks: Some students may need rest breaks during the examination. This may be to relieve pain or to attend to personal needs.
• Flexible schedules: In some cases disabled students may require flexibility in the scheduling of examinations. For example, some students may find it difficult to manage a number of examinations in quick succession and need to have examinations scheduled over a period of days.
• Alternative methods of assessment: In certain situations where formal methods of assessment may not be appropriate for disabled students, the instructor should assess them using non-formal methods such as classwork, portfolios, oral presentations, etc.
• Assistive technology: Specific equipment may need to be available to the student in an examination. Such arrangements often include the use of personal computers, voice-activated software and screen readers.

5.5 Gender issues in assessment

Do you feel that gender has any influence on teachers’ assessment practices? Is there any gender-related stereotype in relation to assessment results? Share your reflections with your friends.

Teachers’ assessment practices can also be affected by gender stereotypes. The issues of gender bias and fairness in assessment are concerned with differences in opportunities for boys and girls. A test is biased if boys and girls with the same ability levels tend to obtain different scores.

Test questions should be checked for:

• material or references that may be offensive to members of one gender,
• references to objects and ideas that are likely to be more familiar to men or to women,
• unequal representation of men and women as actors in test items or representation of members of each gender only in stereotyped roles.

If the questions involve objects and ideas that are more familiar or less offensive to
members of one gender, then the test may be easier for individuals of that gender.
Standards for achievement on such a test may be unfair to individuals of the gender that
is less familiar with or more offended by the objects and ideas discussed, because it
may be more difficult for such individuals to demonstrate their abilities or their
knowledge of the material.

Unit Summary

In this unit you have learned that ethics is a very important consideration in our assessment practices, and that the most important ethical consideration is fairness. If we are to draw reasonably good conclusions about what our students have learned, it is imperative that we make our assessments, and our uses of the results, as fair as possible for as many students as possible. A fair assessment is one in which students are given equitable opportunities to demonstrate their abilities and knowledge.

Teachers must make every effort to address and minimize the effect of bias in classroom assessment practices. Bias in assessment can arise from differences in culture or ethnicity, disability, and gender. To ensure suitability and fairness for all students, teachers need to check each assessment strategy for its appropriateness and for cultural, disability-related, and gender biases.

Equitable assessment means that students are assessed using the methods and procedures most appropriate to them. Classroom assessment practices should be sensitive and diverse enough to accommodate all types of diversity in the classroom in order to obtain accurate information about students’ learning.

Self-check Exercises

1. What is the meaning of fairness in assessment?
2. What are the basic ethical standards that teachers may consider in their assessment practices?
3. How do culture and ethnicity influence teachers’ assessment practices?
4. What strategies can teachers follow to address the special needs of disabled students during tests?


GLOSSARY
Assessment as Learning: The process of developing and supporting student
metacognition.
Assessment criteria: is a property, dimension or characteristic by which a student’s
achievement is judged or appraised
Assessment for Learning: The ongoing process of gathering and interpreting
evidence about student learning for the purpose of determining where students are in
their learning, where they need to go, and how best to get there
Assessment of Learning: The process of collecting and interpreting evidence for the
purpose of summarizing learning at a given point in time, to make judgements about the
quality of student learning on the basis of established criteria, and to assign a value to
represent that quality
Assessment: the process of gathering information, both formally and informally, about
students’ understandings and skills
Authentic assessment: demonstration or application of a skill or ability within a real-life
context
Convergent assessment: assessment activities that have only one correct response,
which the student is trying to reach.
Correlation coefficient: a statistic that indicates the degree of relationship between
any two sets of scores obtained from the same group of individuals
Criterion referenced: criterion-referenced tests measure student performance against
a set of standards with determined levels (advanced, proficient, basic)
Criterion-referenced assessment: the process of evaluating (and grading) the
learning of students against a set of pre-specified qualities or criteria, without reference
to the achievement of others in the cohort or group.
Diagnostic assessment: information collected before learning that is used to assess
prior knowledge and identify misconceptions
Discrimination: the ability of a test or test item to distinguish between good
(high-achieving) and poor (low-achieving) learners


Divergent assessment: assessment activities for which a range of answers
or solutions might be considered correct.
Evaluation: the process of making judgments about the level of students’ achievement
for accountability, promotion, and certification
Fairness: addresses the issue of possible bias or discrimination of an assessment
toward any individual or group (race, gender, ethnicity)
Formal Assessment: assessment in which students are aware that the task they are doing is
for assessment purposes
Formative assessment: information collected during learning that is used to make
instructional decisions
Informal Assessment: refers to assessment techniques that can easily be
incorporated into classroom routines and learning activities
Item analysis: the process of “testing the item” to ascertain whether the
item functions properly in measuring what the entire test measures
Learning outcomes: what students should know, be able to do, and/or value at
the completion of a unit of study.
Measurement: The process of obtaining a numerical description of the degree to which
an individual possesses a particular characteristic.
Norm referenced: norm-referenced tests compare student performance to a national
population of students who served as the “norming” group
Norm-referenced assessment: determines student achievement (grades) based on a
position within a cohort of students – the norm group.
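Objectivity: the degree to which a test is scored free from the personal judgement or bias of
the scorer, so that equally competent scorers obtain the same results; objectivity contributes to
the reliability of a test and to its fairness to the testee
Percentile rank: a single number that indicates the percentage of the norm group
that scored below a given raw score.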
Percentile: a statistical device that shows how a student compares with students in the
“norming” group who had the same or a lower score
Performance assessment: assessment in which students demonstrate that they can perform
specific behaviors and abilities


Portfolio: a collection of student work with reflections


Process assessment: a type of assessment that focuses on the steps or procedures
underlying a particular ability or task
Product assessment: a type of assessment that focuses on evaluating the result or
outcome of a process.
Reliability: the consistency of test scores, that is, the degree to which a test yields similar
results across occasions, forms, or scorers
Rubrics: a scoring strategy that defines criteria and describes levels of quality (basic,
developing, proficient, exemplary)
Self-Assessment: a process in which students collect information about their own
learning, analyze what it reveals about their progress toward the intended learning goals
and plan the next steps in their learning.
Standard score: a method of indicating a testee’s relative position in a group by
showing how far the raw score is above or below the average
Standardized tests: summative assessments administered and scored under uniform (standard)
conditions, designed to provide information on the performance of schools and districts
Summative assessment: information collected after instruction that is used to
summarize student performance and determine grades
Test administration: refers to the procedure of actually presenting the learning task
that the examinees are required to perform in order to ascertain the degree of learning
that has taken place during the teaching-learning process
Test: an instrument or systematic procedure for measuring a sample of behaviours by
posing a set of questions in a uniform manner.
T-score: refers to any set of normally distributed standard scores that has a mean score
of 50 and a standard deviation of 10
Validity: the degree to which an assessment measures what it claims to measure
Z-score: the simplest standard score, which expresses test performance directly as the
number of standard deviation units a raw score lies above or below the mean
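The z-score, T-score and percentile-rank entries above amount to simple arithmetic:
z = (X - M) / SD, T = 50 + 10z, and percentile rank = 100 × (number of scores below X) / N.

The Python sketch below illustrates these formulas on a small set of hypothetical scores. It rests on two assumptions:
- the standard deviation is the population SD of the group (dividing by N);
- the percentile rank counts only scores strictly below X, as the glossary defines it. Some textbooks also add half of the tied scores.

The function names are illustrative and are not taken from the module.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Return each raw score as the number of SD units above (+) or below (-) the mean."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD of the group (divides by N)
    if sd == 0:
        raise ValueError("all scores are equal, so z-scores are undefined")
    return [(x - m) / sd for x in raw_scores]


def t_scores(raw_scores):
    """Rescale z-scores to a mean of 50 and a standard deviation of 10: T = 50 + 10z."""
    return [50 + 10 * z for z in z_scores(raw_scores)]


def percentile_rank(raw_score, norm_group):
    """Percentage of the norm group scoring strictly below raw_score."""
    below = sum(1 for s in norm_group if s < raw_score)
    return 100 * below / len(norm_group)


# Hypothetical scores of eight testees (mean 5, population SD 2)
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(scores))            # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(t_scores(scores))            # [35.0, 45.0, 45.0, 45.0, 50.0, 50.0, 60.0, 70.0]
print(percentile_rank(7, scores))  # 75.0
```

Read this way, the testee who scored 9 stands two standard deviations above the group mean (z = 2.0, T = 70). A raw score of 7 has a percentile rank of 75 because six of the eight scores fall below it.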
