EPSC 311 UPDATED NOTES

EPSC 311 is a course on measurement and evaluation in education.

DEPARTMENT OF EDUCATION

COURSE OUTLINE

PROGRAMME: BACHELOR OF EDUCATION (ARTS AND SCIENCE)
CREDIT FACTOR: 3.0    CONTACT HOURS: 45
Unit Code & Title: EPSC 311: MEASUREMENT AND EVALUATION IN EDUCATION
Prerequisite: NONE
Lecturer Name: GERALD MUTUA
Contact Email: [email protected]
Telephone: 0720-023-964

Course purpose
The course aims at enabling the learner to study different types of tests which can be used to monitor learners' progress and therefore make decisions wisely.

Expected Learning Outcomes
At the end of the course the students should be able to:
(i) Explain the relationship between measurement, evaluation, testing and assessment.
(ii) Discuss the role of instructional objectives in assessment and evaluation.
(iii) Describe test specifications in constructing a test.
(iv) Determine the quality of individual test items using item analysis.
(v) Determine test validity and reliability.
(vi) Describe the common uses of tests.
(vii) Describe procedures for reporting test results to students and parents.

Course Content
Relationship between measurement, evaluation, testing and examination. Principles of evaluation. Historical development of measuring instruments. Scales of measurement. Characteristics of a good test. Types of tests and test formats. Test construction, planning and administration. Quantitative and qualitative methods applied in selection of test items. Test validation. Interpreting and reporting test results.
Course Schedule

Week 1
Topic: Relationship between measurement, evaluation, testing and examinations; principles of evaluation.
Specific learning outcomes: explain the relationship between measurement and assessment; differentiate between assessment and evaluation; compare testing with examination; analyse various principles of evaluation.
Learning experiences (physical/virtual): class discussion and brainstorming in groups on the relationship between measurement, evaluation, testing and examinations.

Week 2
Topic: Instructional objectives.
Specific learning outcomes: explain the link between instructional objectives and assessment and evaluation; discuss the relevance of instructional objectives.
Learning experiences: group discussion in class; internet search and reporting on instructional objectives.

Week 3
Topic: Historical development of measuring instruments.
Specific learning outcomes: describe the stages of development of measuring tools; appreciate the contributions of early inventors of measuring tools.
Learning experiences: group work discussions and presentations on the history of measurement and evaluation.

Week 4
Topic: Scales of measurement.
Specific learning outcomes: categorize scales of measurement; discuss differences between scales of measurement; apply scales of measurement in educational measurement and evaluation.
Learning experiences: in-class discussions; take-away assignments and discussions.

Week 5
CAT 1.

Week 6
Topic: Types of tests and test formats.
Specific learning outcomes: analyse features of a good test; discuss factors that contribute to a poor test.
Learning experiences: group work discussion assignment and presentations.

Week 7
Topic: Characteristics of a good test.
Specific learning outcomes: explain the meaning of test formats; describe the features of different test formats, such as structured tests and unstructured tests.
Learning experiences: in-class discussions and debates about different test formats.

Week 8
Topic: Test construction and planning.
Specific learning outcomes: discuss factors to consider when planning and constructing a test; appreciate the relevance of a table of specification during test construction.
Learning experiences: in-class discussions about test planning and construction; project activity on the application of the table of specification.

Week 9
Topic: Test administration.
Specific learning outcomes: analyse important considerations during test administration; describe contemporary strategies for curbing examination leakages.
Learning experiences: group discussions and assignments.

Week 10
CAT 2.

Week 11
Topic: Quantitative and qualitative methods applied in selection of test items.
Specific learning outcomes: describe quantitative and qualitative methods applied in the selection of test items.
Learning experiences: in-class discussions.

Week 12
Topic: Test validation; interpreting and reporting test results.
Specific learning outcomes: explain the importance of test validation; discuss ways of interpreting and reporting test results.
Learning experiences: assignments and presentation; in-class discussion.

Weeks 13-14
End of semester examination.
Mode of delivery
This course shall be delivered through Discussions and presentations, Lectures, and Internet
research

Instructional materials and equipment


This course will make use of videos, Whiteboards, Textbooks, Mobile phones, and
Laptops/computers.



Course Assessment

Type                                                Weighting (%)
Continuous Assessment Tests (CATs)                  15
Other Assessments (Assignments, Projects, etc.)     15
End of Semester Examination                         70
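
Taken together, these weights define a simple weighted average for the final mark. Below is a minimal illustrative sketch in Python (the function name, variable names and sample scores are our own, not part of the course documents):

    def final_mark(cat_score, other_score, exam_score):
        # Each component score is assumed to be out of 100.
        return 0.15 * cat_score + 0.15 * other_score + 0.70 * exam_score

    # Example: CATs 60, other assessments 70, end-of-semester exam 55
    print(final_mark(60, 70, 55))  # 0.15*60 + 0.15*70 + 0.70*55 = 58.0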

Core Reading Materials

Ayiro, L. P. (2016). Measuring learning outcomes in Kenya: Context and perspectives. Annual Review of Comparative and International Education 2015.

Violet, K. N., Marcella, M., & Josphine, M. K. The evaluation dilemma in Kenya education system.

Recommended Reference Materials

A. R. (1986). Teacher education and teacher-perceived needs in educational measurement and evaluation. Journal of Educational Measurement, 23(4), 347-354.

Alade, O. M., & Omoruyi, I. V. (2014). Table of specification and its relevance in educational development assessment. European Journal of Educational and Development Psychology, 2(1), 1-17. European Centre for Research.

Ayot, H. O., & Patel, M. M. (1992). Instructional methods (general methods). Nairobi: Educational Research and Publication.

Bali, S. K., Ingule, F. O., & Rono, P. K. (1989). Psychology of education (part two): Tests and measurements. Nairobi: Nairobi University Press.

Chase, C. I. (1999). Contemporary assessment for educators. New York: Longman.

Coffman, W. E. (1971). Essay examination. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 271-302). Washington, DC: American Council on Education.

Gronlund, N. E., & Linn, R. L. (1995). Measurement and evaluation in teaching. Englewood Cliffs, NJ: Merrill.

McDaniel, E. (1994). Understanding educational measurement. Madison, WI: Brown & Benchmark.

Santrock, J. W. (2001). Educational psychology (1st ed.). Boston: McGraw-Hill.

Welch, C. J. (2006). Item and prompt development in performance testing. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 303-327). Mahwah, NJ: Erlbaum.

Website: http://www.sfsu.edu/~testing/MCTEST/testconstruction.html

Professional Ethics
1. Punctuality is fundamental and students are required to be in class before the designated
time for the lecture.

2. Active participation in class discussions is encouraged.

3. Let us refrain from signing the attendance register on behalf of colleagues who are not
present.

4. Plagiarism is a serious academic offense and is highly discouraged. Plagiarized work shall
NOT be accepted. Notwithstanding the above, collaboration in course work is certainly
encouraged as this promotes team spirit and group synergy, provided originality is preserved.

5. Assignments should be handed in on or before the date they are due. Students can hand in
their assignments through their class representative. Assignments handed in late shall NOT
be accepted.

Prepared By: …………………………………………… Signature: ………………..


GERALD MUTUA
Date: …………………………….

Approved By: …………………………………………… Signature: ………………..


DR JOSEPH KIRUGUA (CoD)
Date: …………………………….

Stamp
TOPIC 1: RELATIONSHIP BETWEEN TEST, MEASUREMENT AND
EVALUATION

1. A test is commonly defined as a tool or instrument of measurement used to obtain data about a specific trait or characteristic. It is a device or technique used to measure the performance, skill level, or knowledge of a learner on a specific subject matter. The tool may be written, oral, mechanical, or another variation. Examples of such tests are achievement tests, personality tests, and cardiorespiratory fitness tests.
2. Measurement is the collection of information on which a decision is made. Every time a test is used, it yields a score; obtaining that score is measurement. To measure anything, we need some standard scale, questionnaire or instrument, known as a test. Measurement is not a new concept to you: you measured your height and weight throughout your growing years, and you have read how fast athletes have run and how high some have jumped. All of these are examples of measurement.

Measurement is the process of collecting information. A measurement takes place when a "test" is given and a "score" is obtained. The measurement is the specific score assigned each time a test is applied: it is the process of collecting data on the properties or attributes of interest during the administration of a test, or from some other reliable source.

3. Evaluation is the use of measurement in making decisions. It is the process of interpreting, analyzing, and assessing the data obtained from the test. Evaluation is a process of delineating, obtaining and providing useful information for judging decision alternatives. The statistical treatment and analysis of data according to the purpose of the test is called evaluation. Evaluation assigns worth or value to the measurement score. It involves two steps: first, comparison of the collected data (measurement) with some standard or norm; second, decision making, or a judgment of "how good" on the basis of the findings. Evaluation is a process in education that uses data gathered from products and processes by means of measurement techniques. It thus becomes a process for judging how effective the educational experience has been for the students.

EXAMPLES:

(i) A teacher measures Wambua's height to be 128 cm. She evaluates his height when she says that he is short.
(ii) A teacher measures Atieno's achievement in Geography to be 62%. He evaluates her achievement in Geography when he says that Atieno's performance is satisfactory.
(iii) Mutuma measures the size of his classroom and finds that it is 4.5 m x 3.5 m x 2.5 m. He evaluates the classroom dimensions when he reports that the classroom is too small to be used for 50 students.

4. Relationship between Test, Measurement and Evaluation.

From the meanings and definitions above, it is obvious that the terms test, measurement and evaluation are interrelated. Tests are specific instruments for measurement: administration of a test is a process of measurement, and without a test, measurement is not possible. Measurement, in turn, is a technique necessary for evaluation. It represents the status of certain attributes or properties and is a terminal process: measurement describes a situation, while evaluation judges its worth or value.

Measurement is a technique of evaluation, and tests are tools of measurement. The terms test, measurement and evaluation are clearly distinct but related. Teachers obtain measures from tests in order to make fair evaluations about specific traits or characteristics of their students. An evaluation often involves one or more tests, and in turn a test is involved in one or more measurements.

Some corresponding examples of test, measurement and evaluation indicating their inter-relationship:

(i) Test: question paper of a written examination. Measurement: scores given by the examiner to the answer books written by students. Evaluation: Fail/Pass awarded in the individual paper.
(ii) Test: survey questionnaire. Measurement: scoring by the researcher. Evaluation: assigning individual values (bad, good, very good, excellent) to each answer.
(iii) Test: weighing machine, thermometer, etc. Measurement: weight in kg or temperature in degrees. Evaluation: assigning a status (above or below average, etc.) to each measurement.
ASSESSMENT

One of the primary measurement tools in education is the assessment. Teachers gather
information by giving tests, conducting interviews and monitoring behavior. The assessment
should be carefully prepared and administered to ensure its reliability and validity. In other
words, an assessment must provide consistent results and it must measure what it claims to
measure.
Measurement determines the degree to which an individual possesses a defined characteristic. It involves first defining the characteristic to be measured, and then selecting the instrument with which it will be measured. Test scores range from objective to subjective.
A test is objective when two or more people score the same test and assign similar scores. Tests
that are most objective are those that have a defined scoring system and are administered by
trained testers.
A subjective test lacks a standardized scoring system, which introduces a source of
measurement error. We use objective measurements whenever possible because they are more
reliable than subjective measurements. (Barrow and Rosemary, 1979)
Evaluation is a dynamic decision-making process focusing on changes that have been made.
This process involves:
(i)Collecting suitable data (measurement)
(ii)Judging the value of these data according to some standard; and
(iii) Making decisions based on the data.
The function of evaluation is to facilitate rational decisions.
For the teacher, this can be to facilitate student learning; for the exercise specialist, this could
mean helping someone establish scientifically sound weight reduction goals.



There are certain characteristics essential to a measurement; without them, little faith can be
put in the measurement and little use made of it.
The first important quality of measurement is reliability. A reliable test or instrument measures
whatever it measures consistently. That is, if an individual whose ability has not changed is
measured twice with a perfectly reliable measuring device, the two scores will be identical.
The second important characteristic is validity. A test or measuring instrument is valid if it
measures what it is supposed to measure.
The third important characteristic of a measurement is objectivity. Objectivity is sometimes
called rater reliability because it is defined in terms of the agreement of competent judges about
the value of a measurement.
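
To make reliability and objectivity concrete, the short Python sketch below estimates test-retest reliability as the correlation between two administrations of the same test, and objectivity (rater reliability) as the correlation between two judges' marks. This is a minimal illustration under our own assumptions: the scores are invented, and correlation is only one of several indices used in practice.

    # Requires Python 3.10+ for statistics.correlation (Pearson's r).
    from statistics import correlation

    test_scores   = [62, 75, 58, 80, 69]  # first administration
    retest_scores = [60, 78, 55, 82, 70]  # same learners, re-tested

    rater_a = [14, 12, 18, 9, 15]   # essay marks from judge A
    rater_b = [13, 12, 17, 10, 15]  # essay marks from judge B

    print("test-retest reliability:", correlation(test_scores, retest_scores))
    print("objectivity (rater agreement):", correlation(rater_a, rater_b))

A perfectly reliable instrument would give identical scores on both administrations, so the correlation would be exactly 1; in practice, values close to 1 indicate high reliability.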
Why are Measurement, Assessment and Evaluation important in Education?

According to educator and author Graham Nuthall (2017), in his book The Hidden Lives of Learners, "In most of the classrooms we have studied, each student already knows about 40-50% of what the teacher is teaching." The goal of data-driven instruction is to avoid teaching students what they already know and to teach what they do not know in a way the students will best respond to.

For the same reason, educators and administrators understand that assessing students and
evaluating the results must be ongoing and frequent. Scheduled assessments are important to the
process, but teachers must also be prepared to re-assess students, even if informally, when they
sense students are either bored with the daily lesson or frustrated by material they are not
prepared for. Using the measurements of these intermittent formative assessments, teachers can
fine-tune instruction to meet the needs of their students on a daily and weekly basis.
Why is data-driven instruction so effective?

Accurately measuring student progress with reliable assessments and then evaluating the
information to make instruction more efficient, effective and interesting is what data-driven
instruction is all about. Educators who are willing to make thoughtful and intentional changes in
instruction based on more than the next chapter in the textbook find higher student engagement
and more highly motivated students.

In fact, when students are included in the evaluation process, they are more likely to be self-motivated. Students who see the results of their work only on the quarterly or semester report card or the high-stakes testing report are often discouraged or deflated, knowing that the score is a permanent record of their past achievement.

When students are informed about the results of more frequent formative assessments and can
see how they have improved or where they need to improve, they more easily see the value of
investing time and energy in their daily lessons and projects.



TOPIC 2: GENERAL PRINCIPLES OF ASSESSMENT
The following are some questions teachers may ask in the process of assessment in learning and
teaching
 How do I group students?
 Are these materials working?
 Should I use more examples to teach a concept?
 What kind of help do students need in improving their study skills?
 Which students need to be referred for special services?
 How do students learn best?
 How are the students progressing in their learning?
What is Assessment?
Assessment is the full range of procedures used to gain information about student learning and to form value judgments concerning learning progress.
Test: a type of assessment; a sample of questions given under standard instructions/settings.
Measurement: the assignment of numbers to the results of an assessment using specific rules/tests.
Not all assessment has a measure associated with it!

General Principles of Assessment


1. Clearly stating what is to be assessed is of highest priority. Mathematics reasoning is very
different from mathematics computation
2. An assessment procedure should be selected because of its relevance to the characteristics or
performance to be measured. The learning objective/test purpose defines the type of assessment
you use.
-e.g., “Describe” requires that we use essay or oral test;
“Select” suggests multiple choice;
“Organize” suggests a supply-type assessment
3. Comprehensive assessment requires a variety of procedures. Different levels/types of learning require different types of assessment. For instance, one way of categorizing learning is Bloom's taxonomy: knowledge, understanding, application, analysis, synthesis, evaluation. These different types of learning require different types of assessment.
4. Proper use of assessment procedures requires an awareness of their limitations.
Each assessment type has its limitations (e.g. multiple choice, essay, true-false, performance, matching, self-reports). Some limitations come as a result of sampling error; chance factors also create limitations, as does incorrect interpretation (an IQ score is not the same thing as intelligence). Can you think of examples of chance factors?
5. Assessment is a means to an end, not an end in itself. Assessment information helps us make
decisions about students, instruction, and curriculum. Assessment information never should be
used just to give a grade (though even that is a purpose). Everyone should understand what the
purpose of a specific type of assessment is

Assessment and the Instructional Process


 Identify Instructional Goals
 Pre-assess the Learners’ needs
 Provide Relevant Instruction
 Assess the Intended Learning Outcomes
 Use the Results and Give feedback to students, teacher on learning progress
 Evaluate goals, Evaluate curriculum scope and sequence,
 Identify learning strengths, weaknesses
Purposes of Tests

Teachers teach content then test students. This cycle of teaching and testing is familiar to anyone
who has been a student. Tests seek to see what students have learned. However, there can be
other more complicated reasons as to why schools use tests.

At the school level, educators create tests to measure their students' understanding of specific
content or the effective application of critical thinking skills. Such tests are used to evaluate
student learning, skill level growth and academic achievements at the end of an instructional
period, such as the end of a term, unit, course, semester, program or school year.

These tests are designed as summative assessments.

Summative Tests



 They are used to determine whether students have learned what they were
expected to learn or to a certain level or degree to which students have learned the
material.
 They may be used to measure learning progress and achievement and to evaluate
the effectiveness of educational programs. Tests may also measure student
progress toward stated improvement goals or to determine student placement in
programs.
 They are recorded as scores or grades for a student’s academic record for a report
card or for admission to higher education/class.

At the national level, standardized tests are an additional form of summative assessments, E.G
KCPE, KCSE and other KNEC exams.

Advantages and Disadvantages of Standardized Testing

Advantages

1. It is an objective measure of students' performance.

2. Because of this objectivity, the test results are used by many stakeholders, e.g. governments, in placement to colleges (KCSE), to secondary schools (KCPE), etc.
3. Standardized testing is a way to hold public schools accountable to the taxpayers/government who fund the schools.
4. It is a means to improve the curriculum in the future, since it evaluates the effectiveness of the curriculum.

Disadvantages

1. Tests demand time that could be used for instruction and innovation.

2. Schools are under pressure to "teach to the test," a practice that could limit the curricula.

3. Students with special needs or varied learning environments may be at a disadvantage when they take standardized tests (compare day schools and national schools in Kenya).

4. Summative standardized tests create a lot of anxiety because of their associated value for future careers and for secondary school, college or university admission.

5. The scores attained may not reflect the true picture of the learner's ability, since other extraneous factors may influence the results.

Reasons For Testing/Assessment

1. Testing assesses what students have learned

The obvious point of classroom testing is to assess what students have learned after the
completion of a lesson or unit. When the classroom tests are tied to well-written lesson
objectives, a teacher can analyze the results to see whether the majority of students did well or
need more work. This information may help the teacher create small groups or to use
differentiated instructional strategies.

Educators can also use tests as teaching tools, especially if a student did not understand the questions or directions. Teachers may also use tests when they are discussing student progress at team meetings, e.g. PTA meetings.

2. Testing identifies student strengths and weaknesses / Diagnosis

Another use of tests at the school level is to determine student strengths and weaknesses. One
effective example of this is when teachers use pretests at the beginning of units to find out what
students already know and figure out where to focus the lesson (Entry behavior).

3. Testing measures effectiveness of the teaching and learning process and curriculum through summative evaluation.
The national examinations can also be compared with the best international examination standards and practices.

4. Testing determines recipients of awards, certification and recognition

Tests can be used as a way to determine who will receive awards and recognition, as well as entry into different categories of high schools, colleges and universities. They are used by KUCCPS for placement.

5. Testing can be used for research purposes


6. For guidance and counseling services
7. For the purpose of selecting students for employment
8. For the purposes of promoting students
9. For reporting students’ progress to their parents



TOPIC 3: INSTRUCTIONAL OBJECTIVES
Introduction
Instructional objectives are statements of what is to be achieved at the end of the instructional
process. They are, therefore, the subject of assessment and evaluation. This chapter discusses the
importance of instructional objectives, and their formulation.
Importance of instructional/learning objectives
i. Instructional/learning objectives communicate to the learners, instructors and other interested people what the learners should be able to do at the end of the lesson.
ii. Instructional/learning objectives help learners organize their study and avoid getting lost (if learners are informed of them).
iii. Instructional/learning objectives help the teacher plan learning activities with focus.
iv. Instructional/learning objectives enable the teacher to select the most appropriate teaching approaches.
v. Well written instructional/learning objectives help to save time in developing the lesson.
vi. Instructional/learning objectives form the basis for the development of the instruction by limiting the content.
vii. Instructional/learning objectives enable the teacher to identify appropriate teaching and learning resources.
viii. Instructional/learning objectives form the basis for lesson evaluation.
Steps in Writing Instructional Objectives
The following steps are used in writing instructional objectives/outcomes:
1. Decide on the content area. This defines the limits of what should go into instruction.
2. Use action verbs to identify specific behaviors. The verb should be (a) an observable behavior that produces measurable results and (b) at the highest skill level that the learner will be required to perform.
3. Specify the content area after the verb, for example: calculate averages or compute variances. It is important to specify the content area for clarity. Unspecified content areas would, for example, be:
(a) Calculate statistical information
(b) Compute values needed in economics
These are wide areas and cannot be measured well.
4. Specify applicable conditions. Identify any tools used, information to be supplied or other constraints. For example: given a calculator, calculate the average of a list of numbers; OR without using a calculator, calculate the average of a list of numbers.
5. Specify application criteria. Identify any desired levels of speed, accuracy, quality, quantity, etc. For example: given a calculator, calculate averages from a list of numbers correctly, all the time; OR given a spreadsheet package, compute variances from a list of numbers rounded to the second decimal place.
6. Review each learning outcome to be sure it is complete, clear and concise.

Domains of Learning
Psychologist Benjamin Bloom and his colleagues identified three domains of learning outcomes for educational activities. This classification framework came to be known as the taxonomy of educational objectives, or Bloom's taxonomy.
It identifies three domains of learning and is hierarchical: each of the three domains is divided into levels, starting from the simplest to the most complex, and learning at higher levels is dependent on knowledge and skills acquired at lower levels. The taxonomy, therefore, provides a sequential model for dealing with curriculum content.

The cognitive domain

The cognitive domain involves knowledge and the development of mental skills. There are six levels, listed here from the simplest to the most complex:

Knowledge
Knowledge can be defined as the recollection or recall of appropriate, previously learnt information. It includes the recall of terminology and specific facts. Verbs to use in the statement of instructional objectives at this level include: define, describe, enumerate, identify, label, list, match, name, read, select, reproduce and state.

Comprehension
Comprehension entails the understanding of information so as to be able to translate, perceive and interpret instructions and problems. Learners should be able to classify, cite, convert, describe, discuss, explain, give examples, paraphrase, restate in their own words, summarize, understand, distinguish and rewrite.

Application
Application is using previously learnt information in new situations to solve problems. Those experiencing this level of learning should show the capacity to apply, change, compute, modify, predict, prepare, relate, solve, show, use and produce.

Analysis
Analysis refers to the ability to break down informational materials into their component parts, examine them and understand the organizational structure. It may involve identifying motives or causes, making inferences or finding evidence to support generalizations. At this level, learners should be able to break down, correlate, discriminate, differentiate, distinguish, focus, illustrate, infer, limit, outline, point out, prioritize, recognize, separate, subdivide, select and compare.

Synthesis
Synthesis refers to the building of structures or patterns from various kinds of elements. Learners at this level can put parts together to form a whole that has a new meaning or structure. Key words in use for this level of learning include categorize, combine, compile, compose, create and design.

Evaluation
Evaluation refers to the process of making judgments about information, its value and quality. Learning at this level includes appraising, comparing and contrasting, defending, judging, interpreting, justifying, discriminating and evaluating.

The affective domain

This domain is concerned with emotions and attitudes. It has five levels:

Receiving
Involves awareness and paying attention. Key words are: name, choose, point at, select, use, locate, follow, describe.

Responding
Involves active participation in the learning process; the learner reacts to what they have received. Measuring verbs include: answer, assist, aid, discuss, greet, help, label, perform, practice, present, read, record, select and write.

Valuing
This describes the value or worth that a learner attaches to a particular object, phenomenon or behaviour. Key terms include: complete, join, demonstrate, differentiate, explain, form, initiate, write, justify, propose, report, share, study and work.

Organization
This involves arranging values according to priority by comparing, relating and synthesizing them. Key verbs are: alter, arrange, combine, explain, organize, synthesize, relate.

Internalization of values
The learner has formed a value system that influences his or her behaviour. Key words are: influence, perform, practice, qualify, question, verify, display, discriminate, act.

The Psychomotor Domain

The psychomotor domain involves physical movement, coordination and the use of motor skills. This domain describes the ability to physically manipulate things such as tools. Bloom and his colleagues did not create levels in this domain; other educators have created seven levels, from the simplest to the most complex. They are:
Perception
This is the ability to use the senses to guide motor activity. Individuals experiencing learning at
this level should be able to choose, describe, detect, differentiate, distinguish, identify, isolate,
select.
Set
This indicates the readiness to act. At this level, individuals should be able to display, explain,
move, proceed, restate and volunteer
Guided response
This refers to the early stages of learning a complex skill that include imitation, and trial and
error. Achievement at this level is attained by practicing. Learning is demonstrated by copying,
tracing, following, reacting and reproducing.
Mechanism
This refers to the intermediate stage of learning a complex skill. Learnt responses should become
habitual, confident and proficient. Learners can assemble, construct, dismantle, fasten, fix, grind,
heat, measure, mend, sketch, organize and calibrate.
Complex overt response
This is the skillful performance of physical acts that involve complex movement patterns.
Proficiency is indicated by quick, accurate and highly coordinated performance. Key terms that
show learning at this level include assemble, build, calibrate, construct, dismantle, display, fix,
fasten, grind, heat, manipulate, measure, mix, organize and sketch.



Adaptation
Individuals experiencing learning at this level have well-developed skills and are able to make
modifications to fit special requirements. Learners at this level adapt, alter, change, rearrange
and reorganize.
Origination
This involves creativity. The learner can create new movement patterns to suit different
situations. Individuals at this level should demonstrate the ability to arrange, build, combine,
compose, construct, design, initiate and make.
Review Questions
1. What are instructional objectives?
2. Why are instructional objectives important in educational measurement and evaluation?
3. Outline the steps in writing instructional objectives
4. Identify and discuss the 6 levels of the cognitive domain of learning as put forth by Bloom.



TOPIC 4: HISTORICAL DEVELOPMENT OF TESTING AND EVALUATION

INTRODUCTION

In the last unit, you read through important definitions in measurement and evaluation, and you saw the types of evaluation and the purposes of evaluation. In this unit we shall move another step, to look at the historical development of testing and evaluation. This may help you to appreciate the course more, and also to appreciate the early players in the field.

OBJECTIVES

After working through this unit, you shall be able to

 Trace the historical development of testing and evaluation.


 Mention some of the early players in testing and evaluation
 Mention some of the testing organizations in Kenya.

Timeline of Early Milestones in the History of Testing

 2200 B.C.: Chinese emperor examined his officials every third year to determine
their fitness for office.
 1862 A.D.: Wilhelm Wundt uses a calibrated pendulum to measure the “speed of
thought”.
 1869: Scientific study of individual differences begins with the publication of
Francis Galton’s Classification of Men According to Their Natural Gifts.
 1879: Wundt establishes the first psychological laboratory in Leipzig, Germany.
 1884: Galton administers the first test battery to thousands of citizens at the
International Health Exhibit.
 1888: J.M. Cattell opens a testing laboratory at the University of Pennsylvania.
 1890: Cattell uses the term "mental test" in announcing the agenda for his
Galtonian test battery.
 1901: Clark Wissler discovers that Cattellian “brass instruments” tests have no
correlation with college grades.



 1904: Charles Spearman describes his two-factor theory of mental abilities. First
major textbook on education measurement, E. L. Thorndike’s Introduction to the
Theory of Mental and Social Measurement, is published.
 1905: Binet and Simon invented the first modern intelligence scale. Carl Jung
uses word-association test for analysis of mental complexes.
 1914: Stern introduces the intelligence quotient (IQ): the mental age divided by
chronological age.
 1916: Lewis Terman revises the Binet-Simon scales and publishes the Stanford-Binet. Revisions appear in 1937, 1960, and 1986.
 1917: Army Alpha and Army Beta, the first group intelligence tests, are
constructed and administered to U.S. Army recruits. Robert Woodworth develops
the Personal Data Sheet, the first personality test.
 1920: Rorschach Inkblot test is published.
 1921: Psychological Corporation – the first major test publisher – is founded by
Cattell, Thorndike, and Woodworth.
 1927: First edition of Strong Vocational Interest Blank for Men is published.
 1938: First Mental Measurements Yearbook is published.
 1939: Wechsler-Bellevue Intelligence Scale is published. Revisions are published
in 1955, 1981, and 1997
 1942: Minnesota Multiphasic Personality Inventory is published.
 1949: Wechsler Intelligence Scale for Children is published. Revisions are
published in 1974 and 1991.



TOPIC 5: SCALES OF MEASUREMENTS

In statistics, there are four data measurement scales: nominal, ordinal, interval and ratio. These
are simply ways to sub-categorize different types of data
Nominal Scale

Nominal scales are used for labeling variables, without any quantitative value. “Nominal”
scales could simply be called “labels.” Here are some examples, below. Notice that all of these
scales are mutually exclusive (no overlap) and none of them have any numerical significance. A
good way to remember all of this is that “nominal” sounds a lot like “name” and nominal scales
are kind of like “names” or labels.

Examples of Nominal Scales

Note: a sub-type of nominal scale with only two categories (e.g. male/female) is called
“dichotomous.”

Other sub-types of nominal data are “nominal with order” (like “cold, warm, hot, very hot”) and
nominal without order (like “male/female”).
Ordinal Scale

With ordinal scales, the order of the values is what's important and significant, but the differences between the values are not really known. Take a look at the example below. In each case, we know that 4 is better than 3 or 2, but we don't know, and cannot quantify, how much better it is. For example, is the difference between "OK" and "Unhappy" the same as the difference between "Very Happy" and "Happy?" We can't say.

Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness,
discomfort, etc.



“Ordinal” is easy to remember because it sounds like “order” and that’s the key to remember
with “ordinal scales”–it is the order that matters, but that’s all you really get from these.

The best way to determine central tendency on a set of ordinal data is to use the mode or median

Example of Ordinal Scales


Interval Scale

Interval scales are numeric scales in which we know both the order and the exact differences
between the values. The classic example of an interval scale is Celsius temperature because the
difference between each value is the same. For example, the difference between 60 and 50
degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees.

Interval scales are nice because the realm of statistical analysis on these data sets opens up. For
example, central tendency can be measured by mode, median, or mean; standard deviation can
also be calculated.

Like the others, you can remember the key points of an “interval scale” pretty easily. “Interval”
itself means “space in between,” which is the important thing to remember–interval scales not
only tell us about order, but also about the value between each item.

Here’s the problem with interval scales: they don’t have a “true zero.” For example, there is no
such thing as “no temperature,” at least not with Celsius. In the case of interval scales, zero
doesn’t mean the absence of value, but is actually another number used on the scale, like 0
degrees Celsius. Negative numbers also have meaning. Without a true zero, it is impossible to
compute ratios. With interval data, we can add and subtract, but cannot multiply or divide.

Consider this: 10 degrees C + 10 degrees C = 20 degrees C. No problem there. However, 20 degrees C is not twice as hot as 10 degrees C, because there is no such thing as "no temperature" when it comes to the Celsius scale. Bottom line: interval scales are great, but we cannot calculate ratios, which brings us to our last measurement scale…
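
A quick numeric check makes the point. The short Python sketch below (our own illustration, not from the notes) converts Celsius readings to Kelvin, an absolute scale with a true zero, and shows that doubling the Celsius reading does not double the underlying temperature:

    def to_kelvin(celsius):
        # Kelvin has a true zero (absolute zero), so ratios are meaningful.
        return celsius + 273.15

    print(20 / 10)                        # 2.0 -- naively "twice as hot"
    print(to_kelvin(20) / to_kelvin(10))  # ~1.035 -- the actual temperature ratio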
Ratio Scale

Ratio scales tell us about the order, they tell us the exact value between units, and they also have
an absolute zero–which allows for a wide range of both descriptive and inferential statistics to
be applied. Good examples of ratio variables include height, weight, and duration.

Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These
variables can be meaningfully added, subtracted, multiplied, divided (ratios). Central
tendency can be measured by mode, median, or mean; measures of dispersion, such as standard
deviation and coefficient of variation can also be calculated from ratio scales.
Summary
In summary, nominal variables are used to “name,” or label a series of values. Ordinal scales
provide good information about the order of choices, such as in a customer satisfaction survey.
Interval scales give us the order of values + the ability to quantify the difference between each
one. Finally, Ratio scales give us the ultimate–order, interval values, plus the ability to
calculate ratios since a “true zero” can be defined.
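
The practical consequence for educational data is which statistics are legitimate on each scale. The following Python sketch (our own illustration; the function and lookup table are invented for this example) selects the central-tendency measures permitted by the rules summarized above:

    from statistics import mode, median, mean

    # Central-tendency measures permitted on each scale, per the summary above.
    ALLOWED = {
        "nominal":  [mode],
        "ordinal":  [mode, median],
        "interval": [mode, median, mean],
        "ratio":    [mode, median, mean],
    }

    def central_tendency(values, scale):
        # Return {measure name: value} for the measures valid on this scale.
        return {m.__name__: m(values) for m in ALLOWED[scale]}

    print(central_tendency([1, 2, 2, 3, 4], "ordinal"))    # mode and median only
    print(central_tendency([62, 75, 58, 80], "interval"))  # mode, median and mean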

Summary of data types and scale measures



TOPIC 6: TYPES OF TESTING AND TYPES OF TESTS

A. Types of Testing

There are four types of testing in schools today — diagnostic, formative, benchmark, and
summative. What purpose does each serve? How should parents use them and interpret the
feedback from them?

1. Diagnostic Testing

This testing is used to “diagnose” what a student knows and does not know. Diagnostic testing
typically happens at the start of a new phase of education, like when students will start learning a
new unit. The test covers topics students will be taught in the upcoming lessons.

Teachers use diagnostic testing information to guide what and how they teach. For example, they
will plan to spend more time on the skills that students struggled with most on the diagnostic
test. If students did particularly well on a given section, on the other hand, they may cover that
content more quickly in class. Students are not expected to have mastered all the information in a
diagnostic test.

Diagnostic testing can be a helpful tool for parents. The feedback children receive on these tests
helps the parent to know what kind of content they will be focusing on in class and to anticipate
which skills or areas they may have trouble with.

2. Formative Testing

This type of testing is used to gauge student learning during the lesson. It is used throughout a
lecture and designed to give students the opportunity to demonstrate that they have understood
the material. This informal, low-stakes testing happens in an ongoing manner, and student
performance on formative testing tends to get better as a lesson progresses.

Schools normally do not send home reports on formative testing, but it is an important part of
teaching and learning. If you help your children with their homework, you are likely using a
version of formative testing as you work together.



For example, when observing a Standard 3 pupil measure objects using inches and centimetres, you are able to see when the pupil chooses the wrong unit or does not start the measurement at the zero point on the tape measure. That is a form of formative testing. You find it helpful as a teacher because it lets you correct any mistakes before they become habits for the pupil.

3. Benchmark Testing

This testing is used to check whether students have mastered a unit of content. Benchmark testing is given while or after a classroom focuses on a section of material, and covers either part or all of the content that has been taught up to that time. The assessments are designed to let teachers know whether students have understood the material that's been covered.

Unlike diagnostic testing, students are expected to have mastered material on benchmark tests,
since they cover what the children have been focusing on in the classroom. Teachers and Parents
will often receive feedback about how their children have grasped each skill assessed on a
benchmark test. This feedback is very important since it gives insight into exactly which
concepts the student did not master.

4. Summative Testing

This testing is used as a checkpoint at the end of the year or course to assess how much content
students learned overall. This type of testing is similar to benchmark testing, but instead of only
covering one unit, it cumulatively covers everything students have been spending time on
throughout the year.

These tests are given using the same process to all students in a classroom, school, or state, so
that everyone has an equal opportunity to demonstrate what they know and what they can do.
Students are expected to demonstrate their ability to perform at a level prescribed as the
proficiency standard for the test.

Since summative tests cover the full range of concepts for a given grade level, they are not able
to assess any one concept deeply. So, the feedback is not nearly as rich or constructive as
feedback from a diagnostic or formative test. Instead, these tests serve as a final check that
students learned what was expected of them in a given unit.



As a parent or teacher, you may consider summative testing a confirmation of what is already known about a student's performance. You should not expect to be surprised by the results, given the regular feedback provided in the form of diagnostic, formative, and benchmark testing throughout the year.

Combining Test Results

We need a balance of the four different types of testing in order to get a holistic view of our
children’s academic performance. Each type of test differs according to its purpose, timing, skill
coverage, and expectations of students.

Though each type offers important feedback, the real value is in putting all that data together:

 Using a diagnostic test, you can gauge what a student already knows and what
she will need to learn in the upcoming unit.
 Formative tests help teachers and parents monitor the progress a student is
making on a daily basis.
 A benchmark test can be used as an early indicator of whether students have met
the lesson’s goals, allowing parents and teachers to reteach concepts that the
student may be struggling with.

Ideally, when heading into the summative testing, teachers and parents should already know the
extent to which a student has learned the material. The summative testing provides that final
confirmation.
B. TYPES OF TESTS
There are two main categories of tests: SUBJECTIVE AND OBJECTIVE TESTS

Objective test: this is a test consisting of factual questions requiring extremely short answers
that can be quickly and unambiguously scored by anyone with an answer key. They are tests that
call for short answer which may consist of one word, a phrase or a sentence.

Subjective test: this is a type of test whose scoring requires the evaluator's judgment or opinion. Such tests are more challenging and expensive to prepare, administer and evaluate correctly, though they can be more valid.



Types of Objective Test Items
They include the following:
1) True-False Test Items

Description: In this format, statements are presented as either true or false, and test-takers must
indicate the correctness of each statement. They are easy to prepare, can be marked objectively
and cover a wide range of topics

Purpose: Useful for assessing basic knowledge and the ability to differentiate between true and
false information.

Advantages
i. can test a large body of material
ii. they are easy to score

Disadvantages
i. Difficult to construct questions that are definitely or unequivocally true or false.
ii. They are prone to guessing
2) Matching Items
Description: Test-takers are presented with two columns, one containing items and the other
containing corresponding answers or options. They must match items from one column with the
correct counterparts in the other.
Purpose: Useful for assessing knowledge of relationships, associations, or definitions.

Advantages
a. Measures primarily associations and relationships as well as sequence of events.
b. Can be used to measure questions beginning with who, when, where and what
c. Relatively easy to construct
d. They are easy to score

Disadvantages
a. Difficult to construct effective questions that measure higher order thinking

b. Contain a number of plausible distracters



3) Multiple Choice Test Items
Description: These tests present a question or statement with multiple answer choices, of which only one is correct. Test-takers must select the correct answer: a statement is followed by four or five alternative responses, from which only the best or correct one must be selected. The statement or question is termed the 'stem', the alternatives or choices are termed 'options', the 'key' is the correct alternative, and the other options are called 'distracters'.
Purpose: Often used to assess factual knowledge and the ability to make choices among options
Advantages
a. Measures a variety of levels of learning.
b. They are easy to score.
c. Can be analyzed to yield a variety of statistics.
d. When well-constructed, has proven to be an effective assessment tool.
Disadvantages
a. Difficult to construct effective questions that measure higher order of thinking
b. contain a number of plausible distracters.

4) Completion Items or Short Answer Test Items

In this, learners are required to supply the words or figures which have been left out. They may
be presented in the form of questions or phrases in which a learner is required to respond with a
word or several statements.

Advantages
• Relatively easy to construct.
• Can cover a wide range of content.
• Reduces guessing.
Disadvantages
-Primarily used for lower levels of thinking.
-Prone to ambiguity.
-Must be constructed carefully so as not to provide too many clues to the correct answer.
-Scoring is dependent on the judgment of the evaluator.



5. Standardized Tests

Description: These are carefully designed and norm-referenced assessments administered under
standardized conditions to ensure fairness and comparability. Examples include SAT, ACT,
GRE, and IQ tests.

Purpose: Commonly used for educational admissions, employment selection, and large-scale
assessment purposes.

6. Fill-in-the-Blank Tests:

 Description: Test-takers are presented with sentences or questions with one or more blanks, and
they must provide the missing words or phrases.
 Purpose: Typically used to assess factual knowledge and understanding of concepts.

7. Short-Response Tests

 Description: Similar to essay tests, but with shorter responses. Test-takers are asked to provide
concise answers or explanations to questions or prompts.
 Purpose: Suitable for assessing knowledge, comprehension, and the ability to provide brief
explanations.

8. Performance Tests

 Description: These tests require test-takers to perform specific tasks or demonstrate skills.
Examples include driving tests, laboratory experiments, or hands-on assessments.
 Purpose: Effective for evaluating practical skills and abilities in real-world scenarios.

9. Oral Tests

 Description: In oral tests, test-takers respond to questions or prompts verbally, often in an


interview or presentation format.
 Purpose: Commonly used to assess speaking and communication skills, language proficiency,
and the ability to express ideas verbally.



10. Portfolio Assessments

 Description: This format involves collecting and reviewing a selection of a person's work or
artifacts over time. It can include essays, projects, artwork, or other evidence of learning and
achievement.
 Purpose: Often used to assess a person's overall progress, development, and skills in a holistic
manner.
11. Formative and Summative Assessments
12. Diagnostic Tests

Types of Subjective Test Items

In subjective tests there are two types of items:
Restricted-response items, and
Extended-response items.

Restricted-response items: on restricted-response items, examinees provide brief answers, usually no more than a few words or sentences, to fairly structured questions.

Extended-response items: these items require lengthy responses that count heavily in scoring. They focus on major concepts of the content unit and demand higher-level thinking. Examinees must organize multiple ideas and provide supporting information for major points in crafting responses.
Advantages of restricted-response items
a. Measures specific learning outcome.
b. Restricted response items provide for more ease of assessment
c. Restricted response item is more structured
d. Any outcomes measured by an objective interpretive exercise can be measured by a restricted
subjective item.

Limitations of restricted-response items

a. Restricting the scope of the topic to be discussed and indicating the nature of the desired response limits the student's opportunity to demonstrate this behavior.

Advantages of extended-response items

I. They measure knowledge at higher cognitive levels of educational objectives, such as analysis, synthesis and evaluation.
II. They expose individual differences in terms of attitudes, values and creative thinking.

Limitations
i. They are insufficient for measuring knowledge of factual materials, because they call for extensive detail in a selected content area at a time.
ii. Scoring is difficult and unreliable.
Examples of Subjective/Essay Test Items
Extended response item
Imagine that you and a friend had a visit in a national park. Write a story about an adventure that
you and your friend had in the national park.

Restricted response item


Why is the barometer one of the most useful instruments for forecasting weather? Explain in a
brief summary.
Examples of Objective Test Items
Completion item example:
The capital city of Nigeria is___________

Matching item example:

Match the original rocks in column A with the metamorphic rocks in column B.

A: Igneous rock, Coal, Limestone
B: Marble, Gneiss, Slate, Kenya

True-false item
Adolescents face an identity crisis period (true/false)



Multiple choice item
Which of the following towns is the capital of TANZANIA?
A. DODOMA    C. TANGA
B. DAR ES SALAAM    D. MUSOMA



TOPIC 7: TEST CONSTRUCTION, TEST PLANNING AND TEST ADMINISTRATION

Before discussing test construction, it is important to describe the qualities of a good test and the factors affecting a test.
Qualities of a Good Test/exam
Validity - A good test should measure what it is supposed to measure i.e. it should measure
specific objective(s) of the test set. A test that is set in a language that is not understandable is
invalid.
Reliability - A good test should yield the same results on a re-test on the same group of learners
under similar conditions.
Practicality /Usability- A test is said to be practical or usable if it can be readily used by the
teacher in everyday classroom conditions.
Cost - A test which costs too much to produce, or whose marking scheme is too demanding to prepare, is rendered useless.
Factors to consider when constructing a Test
Specification of objectives - The kind of vocabulary used should elicit the kind of responses required from the candidates.
Content -The examiner should ensure that questions set cover all topics taught/covered in class.
Emphasized content areas - Some content areas/topics should be given more emphasis than others, depending on the time spent covering them and the total number of questions usually set from such topics.
Ability level of students - Questions set should be able to differentiate between bright, average
and weak pupils.
Specification for types of domains to be measured- Questions set should include cognitive,
affective and psychomotor domains.
Specification of the cognitive domain to be measured - This includes the levels of Bloom's taxonomy, as indicated in Topic 3.
TABLE OF SPECIFICATION AND THE IMPORTANCE OF TABLE OF
SPECIFICATION
A Table of Specification (TOS), also known as a Test Blueprint, is a document or matrix that
outlines the content, skills, and cognitive levels that a test or assessment will measure. It
provides a clear and organized plan for test development, ensuring that the assessment aligns with the learning objectives and curriculum. Here's a description of a Table of Specification and its importance:

Components of a Table of Specification:


1. Content Areas: The table typically lists the content areas or topics that the test will cover. This
section defines what the test is assessing.
2. Cognitive Levels: The table specifies the cognitive levels or thinking skills required to answer
the questions correctly. Common cognitive levels include knowledge, comprehension,
application, analysis, synthesis, and evaluation (following Bloom's Taxonomy).
3. Percentage or Weighting: Each content area and cognitive level is assigned a percentage or
weighting that represents its relative importance on the test. This indicates the proportion of
questions that will address each area and level.
4. Item Types: Some TOS documents may include information about the types of items or
questions that will be used, such as multiple-choice, essay, or short answer.
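
To make the weighting concrete, here is a minimal sketch in Python of how a TOS can be laid
out as a matrix and turned into item counts per cell. The topic names, weights and the 40-item
test length are hypothetical, chosen only to illustrate the arithmetic.

# A minimal, hypothetical Table of Specification: content areas x cognitive levels.
# All topic names, weights and the 40-item test length are illustrative assumptions.
content_weights = {"Weather": 0.40, "Rocks": 0.35, "Population": 0.25}
level_weights = {"Knowledge": 0.20, "Comprehension": 0.30, "Application": 0.50}
total_items = 40

for topic, t_w in content_weights.items():
    for level, l_w in level_weights.items():
        n_items = round(total_items * t_w * l_w)  # items allocated to this cell
        print(f"{topic:<12} {level:<14} {n_items} items")

Rounded cell counts may not sum exactly to the planned total, so a final manual adjustment of
one or two cells is usually needed.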

Importance of a Table of Specification:


1. Alignment with Objectives: A TOS helps ensure that the test is aligned with the specific
learning objectives, instructional goals, and content covered in a course or curriculum. It
promotes clarity in assessment design.
2. Fairness: By providing a clear breakdown of the content and cognitive levels, a TOS helps
ensure that the test assesses a representative sample of what was taught. This promotes fairness
to all students.
3. Coverage and Balance: It helps test developers maintain a balanced distribution of content and
cognitive levels. This prevents overemphasis on certain topics or skills at the expense of others.
4. Efficiency: A TOS assists in efficient test construction by guiding item writers in creating
questions that match the intended content and cognitive focus. This saves time and resources in
test development.
5. Validity and Reliability: Test validity, which assesses whether a test measures what it is
supposed to measure, is enhanced when the TOS is used to guide item creation. Similarly, a
well-constructed TOS can contribute to the reliability of the assessment.



6. Communication: A TOS serves as a communication tool between test developers, instructors,
and other stakeholders. It ensures that all parties have a clear understanding of the test's design
and intent.
7. Feedback and Improvement: After the test has been administered, the TOS can be used to
analyze the results in relation to the planned content and cognitive distribution. This can inform
decisions about the test's quality and potential areas for improvement.
8. Transparency: A TOS promotes transparency in assessment practices, helping students
understand what will be assessed and how it will be assessed. This can reduce test anxiety and
improve student performance.
9. Accountability: In educational settings, a TOS helps ensure that assessments are aligned with
curriculum standards and educational goals, supporting accountability in the education system.
10. Balance across Domains: A TOS helps a teacher avoid concentrating on a particular domain of
objectives at the expense of others.


In summary, a Table of Specification is a valuable tool in assessment design and development. It
enhances the quality, fairness, and effectiveness of assessments by ensuring alignment with
learning objectives, content coverage, and cognitive levels. It also serves as a blueprint for
constructing valid and reliable tests that accurately measure what they are intended to measure.

What a Blueprint Is and Its Applications


A blueprint, in the context of education and assessment, is a systematic and detailed plan or
document that outlines the structure and content of an assessment, such as an exam or test. It
serves as a roadmap for designing, constructing, and administering the assessment, ensuring that
it aligns with specific learning objectives and instructional goals. Blueprints are commonly used
in educational assessment, certification exams, and standardized testing. Here's a description of
what a blueprint is and its applications:

Components of a Blueprint:
1. Content Areas: A blueprint specifies the content areas or topics that will be covered in the
assessment. It provides a clear outline of what knowledge or skills are being assessed.
2. Cognitive Levels: Blueprints define the cognitive levels or thinking skills that the assessment
will target. These levels are often based on educational taxonomies like Bloom's Taxonomy and
may include categories such as knowledge, comprehension, application, analysis, synthesis, and
evaluation.



3. Weighting or Percentage: Each content area and cognitive level is assigned a percentage or
weighting that indicates its relative importance on the assessment. This weighting reflects the
proportion of questions or items that will address each area and level.
4. Item Types: Blueprints may include information about the types of items or questions that will
be used in the assessment, such as multiple-choice, essay, true-false, or short-answer questions.

Applications of a Blueprint:
1. Assessment Design: Blueprints serve as a foundational document for designing assessments that
are aligned with specific learning objectives and curricular goals. They guide test developers in
creating questions or tasks that accurately measure the intended content and cognitive focus.
2. Content Coverage: Blueprints ensure that the assessment covers a balanced representation of
the content areas and cognitive levels outlined in the curriculum. This prevents overemphasis on
certain topics and provides a comprehensive evaluation of student knowledge and skills.
3. Fairness: By specifying the content areas and cognitive levels in advance, blueprints help ensure
that the assessment is fair to all test-takers. No single topic or skill should be disproportionately
emphasized, which can reduce bias and promote equity.
4. Efficiency: Blueprints contribute to the efficient development of assessments by providing a
clear roadmap for item writers and test constructors. This helps streamline the item-writing
process and ensures that the assessment aligns with the intended focus.
5. Alignment: Blueprints facilitate alignment between the assessment and the curriculum or
learning standards. They ensure that the test measures what is taught and intended to be learned.
6. Validity and Reliability: Following a blueprint can enhance the validity of the assessment by
aligning it with the intended objectives. Additionally, a well-constructed blueprint can contribute
to the reliability of the assessment by ensuring consistency in content coverage and cognitive
focus.
7. Transparency: Blueprints promote transparency in assessment practices by clearly
communicating the assessment's design and intent to stakeholders, including students, teachers,
and administrators.
8. Feedback and Improvement: After the assessment is administered, blueprints can be used to
analyze the results in relation to the planned content and cognitive distribution. This information
can inform decisions about the assessment's quality and areas for improvement.



9. Accountability: In educational settings, blueprints help ensure that assessments are aligned with
curriculum standards and educational goals, supporting accountability in the education system.

In summary, a blueprint is a structured plan that guides the design and development of
assessments. Its applications include aligning assessments with learning objectives, promoting
fairness, enhancing efficiency, and ensuring that assessments accurately measure the intended
content and cognitive levels. Blueprints play a crucial role in educational and standardized
testing, helping to create assessments that are valid, reliable, and aligned with educational goals.

A. Ways of Constructing a Test


Test construction is the process of building a test. For a test to be deemed good, reliability and
validity must be determined.

Tests should be constructed and administered in such a way that the scores (marks) reflect the
ability they are supposed to measure.
The type of test to be constructed depends on the nature of the ability it’s meant to measure and
purpose of the test.
Certain types of educational tests can only be constructed by teams of suitably qualified and
equipped researchers.
The process of test construction is long and painstaking for it involves creating large batteries of
test questions in the particular area to be examined followed by extensive trials in order to assess
their effectiveness.

Construction of Objective Tests


These are tests that call for short answers, which may consist of one word, a phrase or a sentence.
In these tests, all possibility of human error or prejudice by the marker is removed by constructing
items that demand answers that are either right or wrong, and for each of which there is only one
possible answer.
1. Guidelines for Constructing True or False Test Items
- Do not provide clues by using specific determiners such as 'all', 'never', 'absolutely' or 'none',
because they signal that the statement is false. Words such as 'may', 'perhaps', 'sometimes' and
'could' signal that the statement is true. If such words are to be used, they must be balanced and
used in both true and false statements.



- Statements must be irrevocably true or false, so they must be unambiguous (clear).
- Use of negative statements should be avoided.
- Limit each true or false statement to a single concept. True or false test items may require the
learner to underline a word or clause in a statement, correct a false statement or trace a path in a
maze.

2. Construction of Matching Test Items


These items involve connecting contents of one list to contents in another list. The learners are
presented with two columns of items, for instance column A and column B, and are asked to
match each item that appears in column A with an appropriate item from column B. When an
equal number of premises (the items in the left-hand column) and responses is provided, this is
called balanced or perfect matching; when an unequal number of premises and responses is
provided, this is called unbalanced or imperfect matching.
To control guess work, it is better to have more responses and fewer premises.
When writing the items in the columns, it is important to:
- Keep the expressions homogeneous.
- Make the items relatively short.
- Use a heading for each column that accurately describes its content.
- Specify the basis for matching.

3. Completion or Short Answer Test Items


In this, learners are required to supply the words or figures which have been left out.
They may be presented in the form of questions or phrases in which a learner is required to
respond with a word or several statements.
Questions must be specific and unambiguous. For instance: JOMO KENYATTA WAS BORN
IN_____________
This is ambiguous since it’s not clear whether it is his date of birth or the country or place where
he was born that is required.
Besides this, statements that leave out too many key words may not carry the intended meaning. If
the answer is numerical or a quantity, the unit required must be indicated. The answer required
should be related to the main point of the statement.
In constructing completion items, the blank should come last to ensure that the learners read the
whole question before supplying the answer. Unintentional help should not be given in the
question, for example, JUDAS ISCARIOT, WHO BETRAYED JESUS WAS BORN
AN_____________
In the above question ‘an’ provides unintentional help to the learners as it means that the answer
must begin with a vowel.

4. Construction of Multiple Choice Test Items

In a multiple choice question, a statement of fact is made. It is followed by four or five
alternative responses from which only the best or correct one must be selected.
The following are the guidelines that a teacher should use when constructing multiple choice
items:
- Draw a table of specification showing topics or subtopics and the skills to be tested. The tables
of specification come from the subject syllabus. The test items should be based on the three
domains of learning (cognitive, affective, psychomotor)
- The area emphasized during teaching should have more items.
- Questions should be based on Bloom's taxonomy. Of the six levels of cognitive objectives,
multiple choice questions should mainly reflect comprehension, application and analysis, with
minor doses of knowledge, synthesis and evaluation: knowledge is too basic, while synthesis is
too complex. Allocation of marks for these skills can be as follows (a worked allocation for a
hypothetical 50-mark paper is sketched after this list):
Knowledge- 12%
Comprehension-16%
Application -32%
Analysis-20%
Synthesis-12%
Evaluation -8%
Total= 100%
- The stem of the question should state the problem clearly. It should not contain unnecessary
information
- Options should be carefully selected and must include the best answer or key.
- Each question should be relevant and not far- fetched.
- All options should be almost equal in length.
- The distractors should be relevant and not far-fetched.
- Placement of the key should be unpredictable and should not follow a pattern.



- No item or option should provide clues to, or be the answer to, another question in the same test.
- The reading difficulty and vocabulary level of items should correspond to the level of the
learners.
- All items should be independent.
- Avoid tricky questions
- Ensure instructions to learners are clear.
- Edit the paper carefully.
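
As flagged above, the following short Python sketch converts the suggested Bloom's-taxonomy
percentages into a mark allocation. The 50-mark paper total is a hypothetical figure used only
for illustration.

# Distributing a hypothetical 50-mark paper across Bloom's levels
# using the percentage weights suggested above.
bloom_weights = {"Knowledge": 0.12, "Comprehension": 0.16, "Application": 0.32,
                 "Analysis": 0.20, "Synthesis": 0.12, "Evaluation": 0.08}
total_marks = 50

for level, weight in bloom_weights.items():
    print(f"{level}: {total_marks * weight:.0f} marks")  # e.g. Application -> 16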
 Construction of Subjective Tests
In this kind of test, the objective is to measure qualities such as the student's ability to
perform certain practical or intellectual skills. These might include describing something
accurately in oral or written form, using materials imaginatively, working creatively,
handling information logically, building convincing arguments or exposing flaws in the
arguments of others.
The common types of subjective test items that we have are:
Restricted-response items
Extended-response items
Construction of the above types of test items follows a detailed process, which includes the
following stages:
 Developing the prompt
 Creating the scoring rubric
 Scoring response

Developing the prompt

The prompt for a subjective item poses a question, presents a problem, or prescribes a task. It
sets forth a set of circumstances to provide a common context for framing the response.
Action verbs direct the examinee to focus on the desired behavior, for instance: solve, interpret,
compare and contrast, discuss or explain. Appropriate directions indicate the expected length and
format of the response, allowable resources or equipment, time limits and the features of the
response that count in scoring.

Creating the scoring rubric


These are analytic or holistic in nature.
- For an analytic rubric, the item writer/constructor lists the desired features of the response,
with a number of points awarded for each specific feature.
- A holistic rubric provides a scale for assigning points to the response based on an overall
impression. A range of possible points is specified, and verbal descriptors are developed to
characterize a response located at each possible point on the scale.
Illustrative responses that correspond to each scale point are often developed or selected from
actual examinee responses.

Scoring response

During subjective scoring, at least four types of rater error may occur: the rater becomes more
lenient or severe over time, or scores erratically due to fatigue or distractions; has knowledge or
beliefs about an examinee that influence perception of the response; is influenced by the
examinee's good or poor performance on previous items; or is influenced by the strength or
weakness of a preceding examinee's response.
Under extended-response items, we can take the essay test item as an example and look at how it
is constructed:
- Essay items require learners to write or type the answer in a number of paragraphs. The
learners use their own words and organize the information or material as they see it fit.
- In writing essay tests, clear and unambiguous language should be used. Words such as 'how',
'why', 'contrast', 'describe' and 'discuss' are useful. The questions should clearly define the scope
of the answer required.
- The time provided for the learner to respond to the questions should be sufficient for the
amount of writing required for a satisfactory response. The validity of questions can be enhanced
by ensuring that the questions correspond closely to the goals or objective being tested.
- An indication of the length of the answer required should be given.

B. Test Planning:

Factors to consider when planning for a test


 Target Audience: Consider the characteristics of the individuals who will be taking the test,
such as age, educational level, and cultural background.



 Test Administration Environment: Determine the physical location, equipment, and resources
required for administering the test. Ensure that the environment is conducive to fair testing
conditions.
 Scoring and Interpretation: Decide how the test will be scored and how scores will be
interpreted. Establish clear scoring criteria and guidelines for evaluating responses.
 Time Allocation: Determine the time limits for the test, ensuring that they are reasonable and
appropriate for the content and skills being assessed.
 Security Measures: Implement security measures to protect the test's integrity, including
preventing cheating and unauthorized access to test materials.
 Accessibility: Consider the needs of individuals with disabilities and provide accommodations
or alternative formats as necessary to ensure fairness.

C. Test Administration:

The following factors should be observed during test administration:

 Training: Ensure that test administrators are adequately trained in test administration
procedures, including maintaining test security and following standardized instructions.
 Preparation: Set up the test environment before test-takers arrive. Check that all necessary
materials are available, including test booklets, answer sheets, and any required equipment.
 Instructions: Clearly and concisely provide instructions to test-takers regarding the format of
the test, time limits, and any special procedures.
 Monitoring: Supervise the test administration to prevent misconduct, such as cheating or
unauthorized assistance.
 Accommodations: Provide accommodations, as needed, for individuals with disabilities or
special needs, ensuring equal access to the assessment.
 Time Management: Keep track of the time during the test to signal when specific sections or
tasks need to be completed.
 Handling Issues: Address any issues or disruptions during the test administration promptly and
professionally.
 Collection of Test Materials: Collect all test materials, including completed test booklets and
answer sheets, and ensure they are securely stored.

TOPIC 8: QUANTITATIVE AND QUALITATIVE METHODS APPLIED IN
SELECTION OF TEST ITEMS
Quantitative and qualitative methods are both valuable approaches in the selection of test items
for assessments, whether you're creating an educational test, psychological assessment, or any
other type of evaluation. These methods help ensure that test items are relevant, effective, and
aligned with the intended objectives. Here's a description of how each method can be applied in
the selection of test items:

Quantitative Methods

Quantitative methods provide a systematic and data-driven approach to the selection of test
items, ensuring that assessments are fair, valid, and reliable. These methods help assessment
developers create tests that accurately measure the intended constructs and provide meaningful
information for decision-making in various fields, including education and psychology.

Quantitative methods involve the following:


1. Item Difficulty Analysis

 Description: This quantitative method involves calculating the percentage of test-takers
who answer each item correctly. Items that are too easy (answered correctly by almost
everyone) or too difficult (answered correctly by almost no one) may need revision or
removal.

 Application: Analyzing item difficulty can help ensure that the test includes items with
an appropriate level of challenge.
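
A minimal computational sketch of this analysis in Python; the 0/1 response matrix below is
hypothetical (rows are students, columns are items):

# Difficulty index (p-value) = proportion of test-takers answering an item correctly.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

n_students = len(responses)
for item in range(len(responses[0])):
    p = sum(row[item] for row in responses) / n_students
    print(f"Item {item + 1}: p = {p:.2f}")  # p near 1.0 = easy, near 0.0 = hard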

2. Item Discrimination Analysis:

 Description: Item discrimination assesses how well individual items differentiate
between high- and low-performing test-takers. It is calculated by comparing the
performance of the top and bottom groups of test-takers on each item.

 Application: Items with high discrimination values are typically retained, as they
effectively separate individuals with different levels of the construct being measured.
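
A minimal sketch of the discrimination index in Python; the (total score, item score) pairs are
hypothetical, and the common 27% rule of thumb is assumed for forming the upper and lower
groups:

# Discrimination index D = p(upper group) - p(lower group) for one item.
# Each tuple is a hypothetical (total_score, item_correct) pair for one student.
students = [(18, 1), (17, 1), (15, 1), (12, 0), (9, 0), (7, 1), (5, 0), (4, 0)]
students.sort(key=lambda s: s[0], reverse=True)  # rank by total score

k = max(1, round(0.27 * len(students)))          # group size (27% rule of thumb)
p_upper = sum(correct for _, correct in students[:k]) / k
p_lower = sum(correct for _, correct in students[-k:]) / k
print(f"D = {p_upper - p_lower:.2f}")            # D of about 0.3+ is usually retained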

3. Item-Total Correlation:



 Description: This method assesses the relationship between the performance on a
specific item and the overall test score. It helps identify items that are well-aligned with
the overall test.

 Application: Items with low item-total correlations may need revision or removal as they
may not be contributing effectively to the measurement of the intended construct.
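
A minimal sketch of the point-biserial item-total correlation in Python, using hypothetical data;
note that in practice the item is often removed from the total score first (the "corrected"
item-total correlation):

# Point-biserial correlation between one 0/1 item and the total test score.
from statistics import mean, pstdev

item_scores  = [1, 1, 0, 1, 0, 0, 1, 0]          # hypothetical 0/1 item responses
total_scores = [19, 17, 14, 16, 9, 7, 15, 8]     # hypothetical total scores

m_correct = mean(t for t, i in zip(total_scores, item_scores) if i == 1)
m_wrong   = mean(t for t, i in zip(total_scores, item_scores) if i == 0)
p = mean(item_scores)                            # item difficulty
r_pb = (m_correct - m_wrong) / pstdev(total_scores) * (p * (1 - p)) ** 0.5
print(f"r_pb = {r_pb:.2f}")                      # low values flag misfitting items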

4. Item Analysis Statistics:

 Description: Various statistical techniques, such as point-biserial correlation, can be
applied to assess the quality of test items quantitatively. These statistics provide insights
into how well items perform in the context of the test.

 Application: Item analysis statistics help identify problematic items, allowing for
data-driven decisions in item selection and revision.

5. Item Response Theory (IRT): IRT is a sophisticated quantitative approach used to model
the relationship between an individual's ability and their responses to test items. IRT models
provide valuable information about item difficulty and discrimination and are commonly
used in the development of standardized tests (a minimal computational sketch appears at the
end of this list).
6. Factor Analysis: Factor analysis is employed to identify underlying factors or dimensions in
a set of test items. This technique helps ensure that test items are measuring the intended
constructs and can reveal if there is redundancy or overlap among items.
7. Differential Item Functioning (DIF) Analysis: DIF analysis examines whether test items
function differently for different subgroups of test takers (e.g., males vs. females, different
ethnic groups). Detecting DIF is important to ensure that test items are unbiased and do not
favor one group over another.
8. Item Banking: Item banking involves storing test items in a database and categorizing them
based on their characteristics (e.g., difficulty level, content area). Quantitative methods are
used to manage and organize item banks effectively, making it easier to select items for
specific tests.
9. Computerized Adaptive Testing (CAT): CAT uses quantitative algorithms to select test
items based on a test taker's responses to previous items. This adaptive approach tailors the
test to an individual's ability level, allowing for more precise and efficient assessment.
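
As flagged under point 5, here is a minimal sketch of a two-parameter logistic (2PL) item
response function in Python; the discrimination (a) and difficulty (b) values are hypothetical:

# 2PL item response function: P(correct) = 1 / (1 + exp(-a * (theta - b))),
# where theta is ability, a is discrimination and b is difficulty.
import math

def p_correct(theta, a=1.2, b=0.5):   # a and b are hypothetical item parameters
    return 1 / (1 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):       # ability on a standard (z-like) scale
    print(f"theta = {theta:+d}: P(correct) = {p_correct(theta):.2f}")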



Qualitative Methods:
1. Expert Review:

 Description: Expert reviewers, often subject-matter experts or experienced educators,
examine each test item for content validity, clarity, and appropriateness.

 Application: Expert reviews can identify items that may need refinement or revision
based on their expertise in the subject area.

2. Cognitive Interviews:

 Description: Cognitive interviews involve administering test items to a small group of
individuals while asking them to "think aloud" as they answer. Researchers observe how
participants interpret and respond to the items.

 Application: Cognitive interviews can uncover issues related to item wording,
comprehension, and response processes, helping refine items for clarity and fairness.

3. Focus Groups and Pilot Testing:

 Description: Focus groups and pilot testing involve gathering feedback from a sample of
test-takers on the overall test, including individual items. This qualitative input can reveal
any difficulties or concerns that arise during the testing process.

 Application: Feedback from focus groups and pilot testing can inform revisions to test
items and administration procedures.

4. Item Review Committees:

 Description: Committees comprised of multiple experts and stakeholders can
collectively review and discuss test items. This collaborative approach ensures diverse
perspectives are considered.

 Application: Item review committees help identify and address issues related to bias,
fairness, and alignment with testing goals.

5. Content Analysis:



 Description: Content analysis involves a systematic examination of test items to ensure
they cover the intended content domain adequately. It ensures that the items are
representative of the construct being measured.

 Application: Content analysis helps verify that test items align with the content and
objectives of the assessment.

NOTE

Quantitative and qualitative methods can complement each other in the item selection
process. Quantitative analyses provide numerical insights into item performance, while
qualitative methods offer deeper insights into the cognitive processes and perceptions of test-
takers and experts. The combination of both approaches helps ensure the quality and
effectiveness of test items in assessing the desired constructs accurately.



TOPIC 9: TEST VALIDITY AND RELIABILITY OF TESTS

Test validation is a crucial process in the development and use of assessments, whether they are
educational exams, psychological tests, or any other type of evaluation. Validation refers to the
process of gathering evidence to support the appropriateness, meaningfulness, and effectiveness
of a test in measuring the intended construct or trait. Two key aspects of test validation are
validity and reliability.

1. Test Validity

Validity refers to the extent to which a test measures what it is intended to measure. It is the most
critical aspect of test quality because it determines whether the test results are accurate and
meaningful in assessing the specific trait or construct of interest.

There are several types of validity evidence that can be collected during the validation
process:

i) Content Validity: Content validity assesses whether the items on a test adequately represent
the domain or content that the test is supposed to measure. This is typically assessed through
expert judgment and content analysis to ensure the test items cover the relevant material
comprehensively.

Factors Affecting Content Validity:


 Item Sampling: The representativeness of the test items is crucial.
If certain content areas are overrepresented or underrepresented,
content validity can be compromised.
 Relevance: Test items should be relevant to the construct being
measured. Irrelevant or extraneous items can reduce content
validity.
ii) Criterion-Related Validity: Criterion-related validity examines whether a test's scores
correlate with external criteria that are known to be related to the construct being measured.
There are two subtypes of criterion-related validity, concurrent and predictive, described in (iii)
and (iv) below.



Factors Affecting Criterion-Related Validity:
Criterion Choice: The choice of an appropriate criterion is crucial. If the chosen criterion is not
a good representation of the construct, criterion-related validity may be compromised.
Reliability of the Criterion Measure: The reliability of the chosen criterion measure can impact
the strength of the correlation and, consequently, criterion-related validity.

iii) Concurrent Validity: This involves comparing test scores to external criteria at the same
time.
iv) Predictive Validity: This assesses the ability of test scores to predict future outcomes.
v) Construct Validity: Construct validity examines whether the test accurately measures an
underlying theoretical construct or trait. It involves a more abstract and theoretical evaluation
of the test's properties and often relies on the use of multiple methods and converging
evidence.
Factors Affecting Construct Validity:
Theoretical Framework: The test should be grounded in a sound theoretical framework that
defines and characterizes the construct of interest.
Convergent and Discriminant Validity: Construct validity can be influenced by the extent to
which the test correlates with other tests measuring similar or different constructs, respectively.
vi) Face Validity: Face validity refers to the superficial appearance of a test and whether it
appears to measure what it claims to measure. It is not a strong form of validity and may not
necessarily indicate a valid test.
Factors Affecting Face Validity:

Item Wording: The wording and presentation of test items can influence how the test is
perceived. Ambiguous or poorly worded items can reduce face validity.

Establishing validity involves conducting research and gathering evidence to support the test's
claims. The evidence can come from various sources, including expert judgments, statistical
analyses, and empirical studies. Collecting validity evidence is an ongoing process, and it helps
ensure that the test remains relevant and accurate for its intended purpose.

Formative Validity: When applied to outcomes assessment, it is used to assess how well a
measure is able to provide information to help improve the program under study. Example:
when designing a rubric for history, one could assess students' knowledge across the discipline.
If the measure can provide information that students are lacking knowledge in a certain area, for
instance the Civil Rights Movement, then that assessment tool is providing meaningful
information that can be used to improve the course or program requirements.

Ways to improve validity


 Make sure your goals and objectives are clearly defined and operationalized.
Expectations of students should be written down.
 Match your assessment measure to your goals and objectives. Additionally, have the test
reviewed by faculty at other schools to obtain feedback from an outside party who is less
invested in the instrument.
 Get students involved; have the students look over the assessment for troublesome
wording, or other difficulties
 If possible, compare your measure with other measures, or data that may be available.

Importance of Test Validation

Test validation is of paramount importance in the field of assessment and testing for several
reasons:

1. Ensures Accuracy and Fairness: Validation ensures that a test accurately measures what it is
intended to measure. It helps confirm that the test's scores are a valid representation of the
construct or trait being assessed. This accuracy is essential to make fair and informed decisions
based on test results.
2. Reduces Bias and Discrimination: Proper validation helps identify and minimize biases in test
items or procedures that could unfairly disadvantage certain groups of test-takers based on
factors like gender, ethnicity, or socioeconomic status. It promotes fairness and equity in
assessment.
3. Enhances Test Utility: Valid tests provide meaningful and useful information. They are more
likely to meet their intended purposes, whether in education, clinical practice, research, or other
domains. Valid assessments are valuable tools for decision-making.
4. Supports Informed Decision-Making: Valid test results are essential for making informed
decisions about individuals, such as educational placement, job selection, or clinical diagnosis.
Without validation, decisions may be arbitrary or unreliable.



5. Improves Accountability: In education and other fields, test results are often used to evaluate
the effectiveness of programs, curricula, or interventions. Valid assessments are crucial for
holding educational institutions and other entities accountable for their performance.
6. Safeguards Against Misuse: Validation helps prevent the misuse of tests. Ensuring that a test
measures what it claims to measure and has established reliability prevents tests from being used
for purposes beyond their intended scope or validity.
7. Strengthens Confidence: Valid tests inspire confidence among test-takers, educators, clinicians,
and the general public. People are more likely to trust and accept the results of assessments that
have been rigorously validated.
8. Reduces Legal and Ethical Risks: Using valid tests reduces the risk of legal challenges or
ethical concerns related to assessment practices. Courts and regulatory bodies often require
evidence of test validity when assessing the legality and fairness of test use.
9. Supports Research and Scholarship: Researchers rely on valid assessments to investigate and
explore various phenomena. Valid measures ensure the reliability and accuracy of data collected
for research purposes.
10. Facilitates Continuous Improvement: Validation is an ongoing process. Regularly reviewing
and updating assessments based on new evidence and changing circumstances ensures that tests
remain relevant and effective.
11. Promotes Quality Education and Healthcare: In educational and healthcare settings, valid
assessments are fundamental to providing quality services. They help educators and healthcare
professionals tailor interventions and support to individual needs.
12. Validates Investments: When organizations invest resources in developing, administering, and
using assessments, they want to be confident that these investments will yield accurate and
meaningful results. Validation provides assurance in this regard.

In summary, test validation is essential because it underpins the reliability, fairness, and
usefulness of assessments. It ensures that test results are accurate and appropriate for their
intended purposes, safeguarding individuals' rights and promoting informed decision-making in
various fields.



2. Reliability of Test

Reliability refers to the consistency and stability of test scores when the same test is administered
multiple times or by different raters. In other words, a reliable test should produce similar results
when given to the same individuals under consistent conditions.

There are several types of reliability:

 Test-Retest Reliability: This assesses the consistency of scores when the same test is
administered to the same group of individuals on two different occasions. It measures the
stability of scores over time.
 Internal Consistency Reliability: Internal consistency examines the degree to which the items
within a test are measuring the same underlying construct. Common measures of internal
consistency include Cronbach's alpha and split-half reliability (a computational sketch of
Cronbach's alpha follows this list).
 Inter-Rater Reliability: Inter-rater reliability is relevant when multiple raters or observers are
involved in scoring assessments. It assesses the degree of agreement among raters.
 Parallel Forms Reliability: Parallel forms reliability assesses the consistency of scores when
two equivalent forms of a test are administered to the same group of individuals. It measures the
consistency of scores across different versions of a test.
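
As flagged above, a minimal sketch of Cronbach's alpha in Python, using the standard formula
alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the 0/1 response matrix
is hypothetical:

# Cronbach's alpha for a k-item test (rows = students, columns = items).
from statistics import pvariance

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

k = len(responses[0])
item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
total_variance = pvariance([sum(row) for row in responses])
alpha = k / (k - 1) * (1 - sum(item_variances) / total_variance)
print(f"alpha = {alpha:.2f}")   # values of about 0.7+ are often taken as acceptable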

Reliability is crucial because if a test is not reliable, it cannot be valid. In other words, a test that
produces inconsistent results cannot accurately measure the trait or construct it is intended to
assess. Therefore, ensuring the reliability of a test is a fundamental step in the validation process.

In summary, test validation involves establishing the validity and reliability of a test to ensure
that it accurately measures the intended construct or trait. Validity assesses whether the test
measures what it claims to measure, while reliability assesses the consistency and stability of test
scores.



TOPIC 10: TEST INTERPRETATION

Test interpretation is the process of analyzing and making sense of the results obtained from an
assessment or test. It involves translating raw scores or data into meaningful and useful
information. Effective test interpretation is essential for deriving insights, making informed
decisions, and drawing valid conclusions based on test outcomes. Here's a description of key
aspects of test interpretation:

1. Raw Scores and Scaled Scores:


 Raw Scores: Raw scores represent the actual number of correct or incorrect responses on
a test. They provide a basic measure of performance.
 Scaled Scores: Scaled scores are derived from raw scores and are often used to
standardize test results. They place an individual's performance on a common scale,
making it easier to compare scores across different versions of the test or with a larger
population. Scaled scores are typically reported as standard scores with a mean of 100
and a standard deviation of 15 (for many standardized tests).
2. Percentiles:
 Percentiles rank an individual's performance relative to the performance of others in a
norming group. For example, if a test-taker's score is at the 75th percentile, it means they
scored better than 75% of the people in the norming group. Percentiles provide a clear
sense of where an individual's performance falls in comparison to peers (a small
computational sketch appears at the end of this topic).
3. Interpretive Guidelines and Norms:
 Test publishers often provide interpretive guidelines and norms to help users understand
what different scores mean. These guidelines might include descriptors of performance
levels (e.g., "above average," "average," "below average") based on score ranges.
4. Subscores and Domain Scores:
 Many tests provide subscores or domain scores that break down performance in specific
areas or content domains. Test-takers and stakeholders can gain insights into strengths
and weaknesses in these specific areas.
5. Patterns of Performance:
 Examining patterns of performance across different sections or domains of a test can
provide valuable information. For example, if a student performs significantly better in
mathematics than in reading, it may suggest specific areas for improvement or
educational interventions.
6. Comparative Analysis:
 Comparative analysis involves comparing an individual's test scores to a relevant
benchmark or reference group. This can include comparing scores to national averages,
school averages, or previous test scores to track progress over time.
7. Diagnostic Information:
 Some tests provide diagnostic information that goes beyond scores and percentiles. This
information may include detailed item-level analysis to identify specific areas of strength
or areas where improvement is needed.
8. Qualitative Data:
 In addition to quantitative data, qualitative information, such as written responses or
observations, may be part of test interpretation. Qualitative data can provide context and
deeper insights into a person's performance.
9. Clinical Interpretation (in Psychological Assessments):
 In psychological assessments, clinical interpretation involves synthesizing test results
with clinical judgment and contextual information. Psychologists use their expertise to
consider factors such as the individual's history, presenting problems, and cultural
background to arrive at diagnostic conclusions and treatment recommendations.
10. Communication of Results:
 Test results and interpretations should be communicated effectively to test-takers,
parents, educators, clinicians, or other stakeholders. The language and format used should
be clear and actionable, facilitating informed decision-making.
11. Ethical Considerations:
 Test interpretation should adhere to ethical guidelines, ensuring that results are used
responsibly and do not perpetuate biases or stigmatize individuals.
12. Feedback and Follow-up:
 Test interpretation often includes recommendations for follow-up actions based on the
results. This might involve educational interventions, clinical referrals, or further
assessments.
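
As flagged under point 2, here is a minimal sketch in Python of a percentile rank and a scaled
score with mean 100 and standard deviation 15; the norming-group scores and the raw score are
hypothetical:

# Percentile rank of a raw score within a norming group, plus a scaled score.
from statistics import mean, pstdev

norm_group = [55, 60, 62, 65, 68, 70, 72, 75, 80, 88]   # hypothetical norms
raw_score = 72

percentile = 100 * sum(s < raw_score for s in norm_group) / len(norm_group)
z = (raw_score - mean(norm_group)) / pstdev(norm_group)
scaled = 100 + 15 * z                                   # mean 100, SD 15 scale
print(f"percentile rank = {percentile:.0f}, scaled score = {scaled:.0f}")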



TOPIC 11: REPORTING OF TEST RESULTS

Reporting test results is a crucial step in the assessment process, whether in education,
psychology, healthcare, or other fields. How test results are reported can significantly impact the
understanding and utility of the assessment.

Here are several ways of reporting test results, along with reasons for using each approach:

1. Numerical Scores:
 Reason: Numerical scores provide a quantifiable and objective representation of an
individual's performance on a test. They are easy to compare and analyze statistically.
 Use Cases: Numerical scores are commonly used in standardized assessments, such as
educational exams or psychological tests. They allow for straightforward comparisons
across individuals or groups.
2. Percentiles:
 Reason: Percentiles compare an individual's performance to a reference group. They
offer a clear understanding of where an individual's score falls in relation to others.
 Use Cases: Percentiles are valuable in educational and clinical assessments, as they help
educators, clinicians, and individuals interpret how their performance compares to a
larger population.
3. Standard Scores (Z-Scores or T-Scores):
 Reason: Standard scores transform raw scores into a standardized scale with known
properties (e.g., mean of 100, standard deviation of 15). This allows for easier
comparison and interpretation.
 Use Cases: Standard scores are often used in educational and psychological assessments,
providing a common metric for various tests (a short conversion sketch appears at the end
of this list).
4. Grade Equivalents:
 Reason: Grade equivalents express a test score in terms of the grade level at which it is
typical. This can be useful for educators and parents in understanding a student's
performance relative to grade-level expectations.
 Use Cases: Grade equivalents are frequently used in educational assessments,
particularly for young students.
5. Qualitative Descriptors:



 Reason: Qualitative descriptors use words or phrases to describe an individual's
performance. They can provide a narrative context for test results.
 Use Cases: Qualitative descriptors are beneficial when communicating with individuals
who may not be familiar with numerical or statistical terminology. They make results
more accessible and understandable.
6. Graphical Representations (Charts or Graphs):
 Reason: Graphical representations visually display test results, making patterns and
trends more apparent. They can enhance comprehension.
 Use Cases: Graphs are useful for illustrating changes in performance over time,
identifying strengths and weaknesses, and conveying complex data concisely.
7. Narrative Reports:
 Reason: Narrative reports provide a detailed, written explanation of test results,
including interpretation, analysis, and recommendations. They offer a comprehensive
understanding of the assessment.
 Use Cases: Narrative reports are common in clinical psychology, healthcare, and
educational assessments when a thorough analysis and interpretation of results are
needed.
8. Profiles or Subscores:
 Reason: Profiles or subscores break down test results into specific areas or domains,
allowing for a more nuanced understanding of strengths and weaknesses.
 Use Cases: Profiles are valuable in educational assessments to guide instructional
planning and in clinical assessments to inform treatment plans.
9. Interpretive Guidelines:
 Reason: Interpretive guidelines provide guidance on how to understand and interpret test
results, including what specific scores or patterns mean.
 Use Cases: Interpretive guidelines are often provided alongside test results to assist users
in drawing meaningful conclusions.
10. Feedback Meetings:
 Reason: Some assessments are best communicated through face-to-face meetings where
a trained professional can discuss results, address questions, and provide personalized
recommendations.



 Use Cases: Feedback meetings are common in clinical and educational assessments to
ensure that individuals and stakeholders fully understand the implications of the results.
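
As flagged under point 3, here is a short sketch in Python of the standard-score conversions;
the raw score and the norming mean and standard deviation are hypothetical:

# z-score and T-score conversions: z = (raw - mean) / SD, T = 50 + 10 * z.
raw, norm_mean, norm_sd = 64, 50.0, 8.0   # hypothetical norming statistics

z = (raw - norm_mean) / norm_sd
t = 50 + 10 * z
print(f"z = {z:.2f}, T = {t:.0f}")        # z = 1.75, T = 68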
The choice of reporting method depends on the nature of the assessment, the audience, and the
specific goals of the assessment process. Effective reporting enhances the utility of test results
and supports informed decision-making and interventions.
