
Measurement and Evaluation in Education (PDE 105)

UNIT SIX: TEST SCORING AND INTERPRETATION OF TEST SCORES

INTRODUCTION
This module has set out to show what a test is, why testing is important, how tests are
constructed and what precautions are taken to ensure their validity. The module rounds
off by explaining how tests are scored and interpreted. To get the most from this unit,
keep the other units by your side and cross-check the aspects relevant to this unit that
were discussed in the previous units.

OBJECTIVES
By the end of this unit, you should be able to:
a. score and interpret tests in general and continuous assessment tests in particular;
b. analyse test items;
c. compute some measures of central tendency and variability; and
d. compute the Z-score and the percentile.

SCORING OF TESTS
This section introduces to you the pattern of scoring of tests, be they continuous assessment
tests or other forms of tests. The following guidelines are suggested for scoring of tests:
i. You must remember that multiple-choice tests are difficult to design and difficult to
administer, especially in a large class, but easy to score. In some cases, they are
scored by machines. Multiple-choice tests are easy to score because each item usually
has one correct answer, which must be accepted across the board.
ii. Essay or subjective types of tests are relatively easy to set and administer, especially
in a large class. They are, however, difficult to mark or assess, because essay questions
require candidates to write many sentences and paragraphs, all of which the examiner
must read.
iii. Whether objective or subjective, all tests must have marking schemes. A marking
scheme is the guide for marking any test. It consists of the points, demands and issues
that must be raised before the candidate can be said to have responded satisfactorily
to the test. Marking schemes should be drawn up before testing, not after the test has
been taken. All marking schemes should carry the mark allocation. They should also
indicate the scoring points and how the scores are totalled up to give the total score
for the question or the test.
iv. Scoring or marking on impression is dangerous. Some students are very good at
impressing examiners with flowery language without real academic substance. If you
mark on impression, you may be carried away by the language rather than the relevant
facts. Again, mood may change impression; your impression can be altered by joy,
sadness, tiredness, the time of day and so on. That is why you must always insist on
a comprehensive marking scheme.
v. Scoring can be done question by question or whole script by whole script. The best
way is to mark one question across the board for all students, although this may be
tedious, especially in a large class.
vi. Scores can be interpreted as grades: A, B, C, D, E and F. They may be interpreted
in terms of percentages: 10%, 20%, 50%, etc. Scores may also be presented
comparatively, in terms of 1st position, 2nd position, 3rd position and so on to the
last. Finally, scores can be coded into what are called bands. In a band system, certain
criteria are used to determine who falls into the Excellent, Very Good and other
categories. Examples of band systems are those used by the International English
Language Testing System (IELTS) and the Test of English as a Foreign Language
(TOEFL).

OBJECTIVE SCORING: Correction Formula


As stated earlier, objective tests are very easy to score, and their other advantages
should be well known by now. However, the chances of guessing the correct answer are
high. To discourage guessing, some objective tests instruct candidates that they may be
penalised for guessing. In such a situation, a correction formula is applied after scoring.
This is given as:
Corrected score (S) = R - W/(N - 1)
where R = number of questions marked right, W = number of questions marked wrong, and
N = number of options per item.
If, in an objective test of 50 questions where guessing is penalised, a candidate attempts
all the questions and gets 40 of them correct, then (assuming 5 options per item) the
corrected score is:
S = 40 - 10/(5 - 1) = 40 - 2.5 = 37.5 ≈ 38 out of 50.

Example: find the corrected scores of two candidates, A and B, who both scored 35 in an
objective test of 50 items with 5 options each, if A attempted 38 questions while B
attempted all 50.
SA = 35 - 3/4 ≈ 34 and SB = 35 - 15/4 ≈ 31.
Note that under a rights-only scoring rule, each of the students would get 35 out of 50.
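
The arithmetic above can be checked with a short script. Below is a minimal sketch in Python, assuming the rights-minus-a-fraction-of-wrongs rule given above and that unattempted items count as neither right nor wrong; the function name is only illustrative.

    def corrected_score(num_right, num_attempted, num_options):
        # Correction for guessing: S = R - W / (N - 1), where W counts only
        # the attempted items that were marked wrong.
        num_wrong = num_attempted - num_right
        return num_right - num_wrong / (num_options - 1)

    # Worked examples from the text (5 options per item).
    print(corrected_score(40, 50, 5))   # 37.5, reported as about 38 out of 50
    print(corrected_score(35, 38, 5))   # 34.25 -> candidate A, about 34
    print(corrected_score(35, 50, 5))   # 31.25 -> candidate B, about 31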

USING TEST RESULTS


As mentioned earlier, conducting tests is not an end in itself. Before test results can be
put to use, the teacher needs to know how well designed the test is in terms of difficulty
level and discriminating power. He should then be able to compare a child's performance
with those of his peers in the class, and occasionally he may wish to compare the child's
performance in one subject area with another.

To do this, he carries out the following activities at various times:


i. Item analysis.
ii. Drawing of frequency distribution tables.
iii. Finding measures of central tendency (mean, mode, median).
iv. Finding measures of variability and derived scores.
v. Assigning grades.
As you are aware of these procedures, some will be discussed in passing while greater
attention will be paid to the others for emphasis.

Item Analysis
Item analysis helps to decide whether a test is good or poor in two ways:
i. It gives information about the difficulty level of each question.
ii. It indicates how well each question discriminates between the bright and the dull
students.
In essence, item analysis is used for reviewing and refining a test.

Difficulty Level
By difficulty level we mean the proportion of candidates that got a particular item right
in a given test. For example, if in a class of 45 students, 30 got a question correct,
then the difficulty level is 30/45 = 0.67 or 67%. The proportion ranges from 0 to 1,
or 0% to 100%.
An item with an index of 0 is too difficult (everybody missed it), while one with an index
of 1 is too easy (everybody got it right). Items with an index of about 0.5 are usually
the most suitable for inclusion in a test.
Though items with indices of 0 and 1 may not really contribute to an achievement test,
they are useful to the teacher in determining how well the students are doing in that
particular area of the content being tested. Hence, such items could be included.
However, the mean difficulty level of the whole test should be about 0.5 or 50%.
Usually, the formula for item difficulty is

p = (n x 100) / N

where
p = the item difficulty,
n = the number of students who got the item correct, and
N = the number of students who took the test.

However, in the classroom setting, it is better to use the number of students in the
upper 1/3 of the class that got the item right (U) and the number in the lower 1/3 that
got it right (L).


Hence the difficulty level is given by

P = (U + L) / N

where N is the number of students actually involved in the item analysis (the upper 1/3
plus the lower 1/3 of the testees).

Consider a class of two arms with a population of 60 each. If 36 candidates of the upper
1/3 of the population and 20 of the lower 1/3 got question number 2 correctly, what is
the index of difficulty (difficulty level) of the question?

N = 40 + 40 = 80 (upper 1/3 + lower 1/3)
U = 36
L = 20
P = (36 + 20) / 80 = 56 / 80 = 0.7, i.e. P = 70%

If P = 0 (0%) or 1 (100%), the item is said to be either too difficult or too easy
respectively. As much as possible, teachers should avoid administering test items with
difficulty levels of 0 or 1.

Item Discrimination
The discrimination index shows how well a test item discriminates between the bright and
the dull students. A test with many poor questions will give a false impression of the
learning situation. Usually, a discrimination index of 0.4 and above is acceptable. Items
which discriminate negatively are bad; this may be because of a wrong key, vagueness or
extreme difficulty. The formula for the discrimination index is:

D = (U - L) / (½N) = (U - L) / (0.5N)

where
U = the number of students in the upper group that got the item right,
L = the number of students in the lower group that got the item right, and
N = the total number of students involved in the item analysis.
In summary, to carry out item analysis:


i) arrange the scored papers in order of merit, from highest to lowest;
ii) select the upper 33%;
iii) select the lower 33% (note that the number of students in the lower and upper
groups must be equal);
iv) item by item, count the number of students that got each item correct in each
group;
v) estimate
(a) item difficulty P = (U + L) / N
(b) item discrimination index D = (U - L) / (½N)
ACTIVITY
In the table below, determine P and D for items 2, 3 and 4. Item 1 has been calculated as
an example. The total population of testees is 60.

Number that got the item right

Item    Upper 1/3 group    Lower 1/3 group    Difficulty          Discrimination
        (U, group of 20)   (L, group of 20)   P = (U + L)/N       D = (U - L)/(½N)
1       15                 10                 25/40 = 62.5%       5/20 = 0.25
2       18                 15                 ?                   ?
3       5                  12                 ?                   ?
4       12                 12                 ?                   ?
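
A short script can be used to check the worked example and to complete the activity. Here is a minimal sketch in Python, assuming upper and lower groups of 20 students each (so N = 40), as in the table; the function names are only illustrative.

    def item_difficulty(upper_right, lower_right, group_size):
        # P = (U + L) / N, where N is the combined size of both groups.
        return (upper_right + lower_right) / (2 * group_size)

    def item_discrimination(upper_right, lower_right, group_size):
        # D = (U - L) / (N / 2); with equal groups this is (U - L) / group_size.
        return (upper_right - lower_right) / group_size

    # Item 1 from the table: U = 15, L = 10, groups of 20 students each.
    print(item_difficulty(15, 10, 20))      # 0.625, i.e. 62.5%
    print(item_discrimination(15, 10, 20))  # 0.25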

DISTRIBUTION AND MEASURES OF CENTRAL TENDENCY


We shall not dwell so much on the drawing of frequency distribution tables and calculating
measures of central tendency i.e. mode, median and mean here. This will be taken care of
elsewhere. However, we will mention the following about them:

Mode
The mode is the most frequent or popular score in the distribution. It usually becomes
evident while drawing the frequency table. It is not as frequently used in the classroom
as the median and the mean, because it can fall anywhere along the distribution of scores
(top, middle or bottom) and a distribution may have more than one mode.


Median
This is the middle score after all the scores have been arranged in order of magnitude,
i.e. 50% of the scores lie on either side of it. The median is very useful where there
are deviant or extreme scores in a distribution; however, it does not take the relative
size of all the scores into consideration, and it cannot readily be used for further
statistical computations.
The Mean
This is the average of all the scores and it is obtained by adding the scores together and
dividing the sum by the number of scores.
M or X̄ = Sum of all scores / Number of scores
Though the mean is influenced by deviant scores, it is very important in that it takes
into account the relative size of each score in the distribution, and it is also useful
for other statistical calculations.
ACTIVITY
The mean score is the same as the average score, i.e. the sum of all the scores divided
by the number of scores. It is the most commonly used statistic in the classroom.
If, in a class of 9, the scores are 29, 85, 78, 73, 40, 35, 20, 10 and 5, find the mean.
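
As a quick check on the activity, here is a minimal sketch in Python using the nine scores listed above.

    scores = [29, 85, 78, 73, 40, 35, 20, 10, 5]

    # Mean = sum of all scores divided by the number of scores.
    mean = sum(scores) / len(scores)
    print(mean)   # 375 / 9, approximately 41.67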
MEASURES OF VARIABILITY
Measures of variability indicate the spread of the scores. The usual measures of
variability are the range, the quartile deviation and the standard deviation. Their
computation is illustrated below.
Range
The range is the difference between the highest and the lowest scores in a distribution.
It is the simplest measure of variability, but because it depends entirely on the two
extreme scores it may give a misleading picture of the variability in the distribution.
Example: 7, 2, 5, 4, 6, 3, 1, 2, 4, 7, 9, 8, 10. Lowest score = 1, highest score = 10.
Range = 10 - 1 = 9.
Quartile Deviation
Quartiles are the points on a distribution which divide it into quarters; thus we have
the 1st, 2nd and 3rd quartiles (Q1, Q2 and Q3). The inter-quartile range is the
difference between Q3 and Q1, i.e. Q3 - Q1. It is more often used than the range because
it cuts off the extreme scores. The quartile deviation, also known as the
semi-interquartile range, is half of the inter-quartile range, i.e. half the difference
between the upper quartile (Q3) and the lower quartile (Q1) of the set of scores:
QD = (Q3 - Q1) / 2

where Q3 = P75, the point in the distribution below which 75% of the scores lie, and
Q1 = P25, the point below which 25% of the scores lie. Where there are many deviant
scores, the quartile deviation is the best measure of variability.
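
A minimal sketch of the quartile deviation in Python follows, using NumPy's percentile function to estimate Q1 and Q3. Note that different interpolation rules give slightly different quartiles for small data sets, so the exact values depend on the method used; the scores are simply those from the range example above.

    import numpy as np

    scores = [7, 2, 5, 4, 6, 3, 1, 2, 4, 7, 9, 8, 10]   # scores from the range example

    q1 = np.percentile(scores, 25)   # lower quartile, P25
    q3 = np.percentile(scores, 75)   # upper quartile, P75

    interquartile_range = q3 - q1
    quartile_deviation = (q3 - q1) / 2   # semi-interquartile range

    print(q1, q3, interquartile_range, quartile_deviation)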

Standard Deviation
This is the square root of the mean of the squared deviations. The mean of the squared
deviations is called the variance (S²). The deviation is the difference between each
score and the mean.

SD = √(Σx² / N)

where
x = X - M, the deviation of each score from the mean, and
N = the number of scores.

The SD is the most reliable of all the measures of variability and lends itself to use
in other statistical calculations.
The deviation is the difference between each score (X) and the mean (M). To calculate
the standard deviation:
(i) find the mean (M);
(ii) find the deviation (X - M) of each score and square it;
(iii) sum the squared deviations and divide by the number of scores (N);
(iv) take the positive square root of the result.

Calculation of Standard Deviation (take M = 54)

Student    Marks obtained (X)    Deviation (X - M)    Squared deviation (X - M)² = x²
A          68                    14                   196
B          58                    4                    16
C          47                    -7                   49
D          45                    -9                   81
E          54                    0                    0
F          50                    -4                   16
G          62                    8                    64
H          59                    5                    25
I          48                    -6                   36
J          52                    -2                   4
                                 Sum (Σx²)            487

N = 10
SD = √(Σx²/N) = √(Σ(X - M)²/N) = √(487/10) = √48.7 ≈ 6.98
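
The calculation can be reproduced with a short script. Here is a minimal sketch in Python, following the unit's convention of rounding the mean to 54 before taking deviations; using the exact mean of 54.3 would give a slightly different standard deviation.

    import math

    marks = [68, 58, 47, 45, 54, 50, 62, 59, 48, 52]

    mean = round(sum(marks) / len(marks))              # 54.3 rounded to 54, as in the table
    squared_deviations = [(x - mean) ** 2 for x in marks]

    variance = sum(squared_deviations) / len(marks)    # 487 / 10 = 48.7
    sd = math.sqrt(variance)                           # approximately 6.98

    print(sum(squared_deviations), variance, round(sd, 2))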
Activity
Find the mean and standard deviation of the following marks:
20, 45, 39, 40, 42, 48, 30, 46 and 41.

DERIVED SCORES
In practice, we report on our students after examinations by adding together their scores
in the various subjects and then calculating the average or percentage, as the case may
be. This does not give a fair and reliable assessment. Instead of using raw scores, it is
better to use derived scores. A derived score expresses each raw score in terms of the
other raw scores on the test. The ones commonly used in the classroom are the Z-score,
the T-score and the percentile. The computation of each of these is demonstrated below.

STANDARD SCORE OR Z-SCORE


The standard score is the deviation of the raw score from the mean divided by the
standard deviation, i.e.

Z = (X - X̄) / SD

where
Z = the Z-score,
X = any raw score,
X̄ = the mean, and
SD = the standard deviation.
Raw scores above the mean usually have positive Z-scores while those below the mean have
negative Z-scores. Z-scores can be used to compare a child’s performance with his peers in a
test or his performance in one subject with another.

T-Score
This is another derived score, often used in conjunction with the Z-score. It is defined
by the equation
T = 50 + 10Z
where Z is the standard score. It is used in the same way as the Z-score, except that
negative values are eliminated in T-scores.


Student    Score in English    Score in Maths    Total    Rank
A          68                  20                88       (8)
B          58                  45                103      (1)
C          47                  39                86       (9)
D          45                  40                85       (10)
E          54                  42                96       (3)
F          50                  48                98       (2)
G          62                  30                92       (7)
H          59                  36                95       (4)
I          48                  46                94       (5)
J          52                  41                93       (6)

Consider the scores obtained in English and Mathematics in the table above. We cannot
easily tell which of the subjects was more tasking, or in which the examiner was more
generous. Hence, for justice and fair play, it is advisable to convert the scores in the
two subjects into a common scale (standard scores) before they are ranked. The Z-score
and the T-score are often used for this.
The Z-score is given by
Z = (Raw score - Mean) / Standard deviation = (X - M) / SD

Also, the T-score is given as
T = 50 + 10Z
The T-score helps to eliminate the negative or fractional values arising from Z-scores.
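
Here is a minimal sketch in Python of the Z- and T-score conversion for the English marks in the table above. The exact mean and population standard deviation are computed from the data themselves, so the figures will differ slightly from those obtained with the unit's rounded mean of 54; the Mathematics column would be treated in the same way, and the helper names are only illustrative.

    import math

    def z_score(x, mean, sd):
        return (x - mean) / sd

    def t_score(z):
        return 50 + 10 * z

    english = {'A': 68, 'B': 58, 'C': 47, 'D': 45, 'E': 54,
               'F': 50, 'G': 62, 'H': 59, 'I': 48, 'J': 52}

    marks = list(english.values())
    mean = sum(marks) / len(marks)                                      # 54.3
    sd = math.sqrt(sum((x - mean) ** 2 for x in marks) / len(marks))    # population SD

    for student in ['A', 'B', 'C', 'D']:
        z = z_score(english[student], mean, sd)
        print(student, round(z, 2), round(t_score(z), 1))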

Activity
Calculate the Z- and T-scores for students A,B,C and D in the table above.

Percentile
This expresses a given score in terms of the percentage of scores below it. For example,
in a class of 30, Ibrahim scored 60 and there are 24 pupils scoring below him. The
percentage of scores below 60 is therefore:
(24/30) x 100 = 80%
Ibrahim therefore has a percentile rank of 80, written P80. This means that Ibrahim
surpassed 80% of his colleagues, while only about 20% did better than him. The formula
for the percentile rank is given by:


PR = (100/N) x (b + F/2)
where
PR = the percentile rank of a given score,
b = the number of scores below the score,
F = the frequency of the score, and
N = the total number of scores in the test.
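
A minimal sketch in Python of the percentile-rank formula above, applied to Ibrahim's example on the assumption (not stated in the text) that he is the only pupil who scored exactly 60; the surrounding class scores are purely illustrative. The result differs slightly from the simple 80% above because the formula credits half the frequency of the score itself.

    def percentile_rank(scores, value):
        # PR = (100 / N) * (b + F / 2), where b is the number of scores below
        # the given value and F is how many times the value itself occurs.
        n = len(scores)
        below = sum(1 for s in scores if s < value)
        frequency = sum(1 for s in scores if s == value)
        return (100 / n) * (below + frequency / 2)

    # Illustrative class of 30: 24 pupils below 60, Ibrahim alone on 60, 5 above.
    scores = [40] * 24 + [60] + [70] * 5
    print(percentile_rank(scores, 60))   # (100/30) * (24 + 0.5), about 81.7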

COURSE CREDIT SYSTEM AND GRADE POINTS


Perhaps the most precious and valuable records after evaluation are a student's marked
scripts and transcripts. At the end of every examination, e.g. a semester examination,
the marked scripts are submitted through the head of department or faculty to the
Examination Officer. Occasionally, the Examination Officer rounds off marks carrying
decimals, either up or down depending on whether the decimal part is greater or less
than 0.5. The marks so received are thereafter translated and interpreted using the
Grade Point (GP), Weighted Grade Point (WGP), Grade Point Average (GPA) or Cumulative
Grade Point Average (CGPA).

CREDIT UNITS
In the course credit system, courses are weighted according to their credit units.
Credit units often range from 1 to 4 and are calculated according to the number of
contact hours, as follows:
1 credit unit = 15 hours of teaching
2 credit units = 15 x 2 = 30 hours
3 credit units = 15 x 3 = 45 hours
4 credit units = 15 x 4 = 60 hours
The number of hours spent on practicals is also usually taken into consideration in
calculating credit loads.

GRADE POINT (GP)


This is a point system which has replaced the A to F Grading System as shown in the
summary table below.

WEIGHTED GRADE POINT (WGP)


This is the product of the Grade Point and the number of Credit Units carried by the course
i.e. WGP = GP x No of Credit Units.


GRADE POINT AVERAGE (GPA)


This is obtained by multiplying the Grade Point attained in each course by the number of
Credit Units assigned to that course, and then summing these up and dividing by the total
number of credit units taken for that semester (total registered for).
GPA = Total points scored / Total credit units registered
    = Total WGP / Total credit units registered

CUMULATIVE GRADE POINT AVERAGE (CGPA)


This is the up-to-date mean of the grade points earned by the student. It shows the
student's overall performance at any point in the programme.

CGPA = Total points scored so far / Total credit units taken or registered so far

SUMMARY OF SCORING AND GRADING SYSTEM


Columns: (I) Credit Units, (II) Percentage Scores, (III) Letter Grade, (IV) Grade Points
(GP), (V) Grade Point Average (GPA), (VI) Cumulative Grade Point Average (CGPA),
(VII) Level of Pass in Subject.

(I) Credit units vary according to the contact hours assigned to each course per week
per term and according to the work load carried by the student.
(V) The GPA is derived by multiplying the credit units (column I) by the GP for each
course and dividing by the total credit units.

Percentage Scores   Letter Grade   Grade Points (GP)   CGPA           Level of Pass in Subject
70 - 100            A              5                   4.50 - 5.00    Distinction
60 - 69             B              4                   3.50 - 4.49    Credit
50 - 59             C              3                   2.40 - 3.49    Merit
45 - 49             D              2                   1.50 - 2.39    Pass
40 - 44             E              1                   1.00 - 1.49    Lower Pass
0 - 39              F              0                   0.00 - 0.99    Fail

(The scores and their letter grading may vary from programme to programme or Institution to
Institution)

For example, a score of 65 marks has a GP of 4 and, if the mark was scored in a 3-unit
course, a Weighted Grade Point of 4 x 3 = 12. If there are five such courses with credit
units 4, 3, 2, 2 and 1 respectively, the Grade Point Average is the sum of the five
Weighted Grade Points divided by the total number of credit units, i.e. (4 + 3 + 2 +
2 + 1) = 12.
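
The GP, WGP and GPA calculations can be sketched in a few lines of Python, assuming the grade boundaries in the summary table above (which, as noted, may vary from institution to institution); the five-course semester at the end is purely hypothetical.

    def grade_point(score):
        # Map a percentage score to a Grade Point using the summary table above.
        if score >= 70: return 5   # A
        if score >= 60: return 4   # B
        if score >= 50: return 3   # C
        if score >= 45: return 2   # D
        if score >= 40: return 1   # E
        return 0                   # F

    def gpa(courses):
        # courses: list of (credit_units, percentage_score) pairs.
        # GPA = total WGP / total credit units registered.
        total_wgp = sum(units * grade_point(score) for units, score in courses)
        total_units = sum(units for units, _ in courses)
        return total_wgp / total_units

    # A 65% score in a 3-unit course gives WGP = 4 x 3 = 12.
    print(grade_point(65) * 3)                                   # 12
    # Hypothetical semester of five courses (units, score):
    print(gpa([(3, 65), (4, 72), (2, 55), (2, 48), (1, 40)]))    # 43 / 12, about 3.58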


ACTIVITY
Below is a sample of an examination transcript for a student
a. Determine for each course the
(i) GP and
(ii) WGP.
b. Find the GPA.

Course Code    Credit Units    Score    Grade    G.P.    W.G.P.
EDU 121        2               40       E
EDU 122        3               70       A
EDU 123        1               50       C
GSE 105        4               70       A
GSE 106        1               60       B
GSE 107        3               42       E
PES 131        1               39       F
PES 132        4               10       F
PES 133        2               45       D
TOTAL

NOTE:
GPA = Total WGP / Total credit units taken

SUMMARY
• In this unit, we have discussed the basic principles guiding the scoring of tests and
the interpretation of test scores.
• The use of the frequency distribution, mean, mode and median in interpreting test
scores was also explained.
• The methods by which test results can be interpreted so as to be meaningful for
classroom practice were also illustrated.


ASSIGNMENT
1. State the various types of tests and explain what each measures.
2. Pick a topic of your choice and prepare a blue-print table for 25 objective items.
3. Explain why:
(a) we use percentiles to describe a student's performance; and
(b) we use Z-scores to describe a score's position in a distribution.
4. Give four factors each that can affect the reliability and validity of a test.
5. Use the criteria and basic principles for constructing continuous assessment tests
discussed in this unit to develop a 1 hour continuous assessment test in your subject
area. By citing specific examples from the test you have constructed, show how you
have used the testing concepts learnt to construct the test. You should bring out from
your test at least ten testing concepts used in the construction of the test.
