EPS NOTES
IN COLLABORATION WITH
SCHOOL OF EDUCATION
DEPARTMENT OF PSY/SNE
WRITTEN BY:
ROSEMARY MULE
MAY, 2021
LECTURE ONE
1.1 Introduction
This lesson focuses on the meaning of statistics and its uses in the context of education.
The relationship of statistics to measurement and evaluation is brought out by analyzing key terms,
namely data, variable, descriptive and inferential statistics, and scales (levels) of measurement,
which largely dictate the choice of statistical method.
Statistics
Statistics is concerned with scientific methods of collecting, organizing, summarizing,
presenting and analyzing data in order to draw valid conclusions and make reasonable decisions on
the basis of the analysis.
Such data could be student marks, student enrolment in a school, the sales of a business,
passes and failures in an examination, etc.
Rationale for doing statistics
Statistics is needed for research, for extending knowledge and solving problems, and for
interpreting masses of data, e.g. student marks, the number of passes and failures in examinations,
the sales of a business, or the profits made by different companies in the same industry.
Statistic
This is a derived numerical value that describes some property of a set of data, e.g. an average or a rate.
Statistical Methods
These are ways of processing data to extract their full significance, e.g. calculating the
mean, mode and median.
Variable
This refers to any property or characteristic that different individuals can possess in
different quantities. A variable that assumes only one value is called a constant (i.e. a
variable with only one value in its domain). A variable that can theoretically assume any value
between any two given values is called a continuous variable, while a variable that can take only
particular values (e.g. whole numbers), and so cannot take every value between two given points,
is called a discrete variable. Generally, measurement gives rise to continuous data, while
counting (enumeration) gives rise to discrete data.
Ordinal Scale
This scale distinguishes individuals or objects and also gives their relative positions
with respect to some property or attribute, but it does not indicate the distance between positions.
It is characterized by ordered categories. The concepts of greater than and less than, as well as
counting, are applicable. For example, in a grading system data may be coded 3 for good, 2 for fair
and 1 for poor, or staff may be ranked head teacher > deputy head teacher > senior teacher.
Interval Scale
This scale provides equal intervals from an arbitrary origin, so the distance (difference) between
any two numbers on the scale is of a known value. It orders individuals, objects or events
according to the amount of an attribute or property they possess and also establishes equal intervals
between the units of measure. For example, given two scores of 45 and 40 (one mark per item), 45 is
better than 40, and the student who scored 40 missed five more items than the one who scored 45.
On an interval scale counting is possible and > or < can be used (i.e. the scale has order), and it
can be stated meaningfully by how much two values differ.
Ratio Scale
The ratio scale is the highest level of measurement; it provides a true zero point as well as equal
intervals, so meaningful ratios can be formed between any two values on the scale.
A metric rod used to measure length in centimetres gives a ratio scale, because the origin of the
scale is an absolute zero corresponding to no length at all; a length of 40 cm is therefore twice a
length of 20 cm. Ratio scales are found primarily with physical variables.
1.5 Further Activity
a) Search the World Wide Web and read an introduction to educational statistics, tests and
measurement.
b) Explain why social scientists (e.g. educators, psychologists) need at least a
rudimentary knowledge of statistics.
c) Distinguish between descriptive statistics and inferential statistics.
d) State the 4 major levels of measurement (scales of measurement) and discuss their
characteristics.
e) Define the following terms:
o Variable
o Continuous variable
o Discrete variable
o Measurement
1.7 Summary
Glass, G.V. & Stanley, J.C. (1970). Statistical Methods in Education and Psychology.
New Jersey: Prentice-Hall.
Smith, G.M. (1970). A Simplified Guide to Statistics for Psychology and Education.
New York: Holt, Rinehart and Winston.
MACHAKOS UNIVERSITY
LECTURE TWO
Introduction
This topic focuses on the use of tables and graphs to describe distributions of data (e.g. marks)
for groups of students (or other subjects or individuals): tabulating data, and presenting
distributions of data using graphs.
Objectives
Graphs and tables are used to describe distributions of marks (data) for groups of students.
The quantitative data of marks scored by students are raw data. Before raw data can be understood
and interpreted, they must be organized and summarized. Commonly used procedures for organizing
and summarizing raw data include frequency distributions (for ungrouped and grouped data),
histograms, frequency polygons and ogives (cumulative frequency curves).
Prepare a frequency distribution table for ungrouped data using the raw scores given as follows:
11 9 5 16 16 16 4 9 5 7
4 10 4 4 15 15 5 5 11 18
8 16 12 11 17 3 3 5 3 7
2 11 6 4 18 1 9 2 2 15
5 10 9 8 7 7 2 5 13 1
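The tally can be checked with a short Python sketch. It simply re-types the 50 raw scores above and counts them with the standard library's `collections.Counter`; the column layout printed is one possible format for the table, not a prescribed one.

```python
from collections import Counter

# The 50 raw scores from the exercise above, row by row.
scores = [
    11, 9, 5, 16, 16, 16, 4, 9, 5, 7,
    4, 10, 4, 4, 15, 15, 5, 5, 11, 18,
    8, 16, 12, 11, 17, 3, 3, 5, 3, 7,
    2, 11, 6, 4, 18, 1, 9, 2, 2, 15,
    5, 10, 9, 8, 7, 7, 2, 5, 13, 1,
]

# Count how many times each score occurs.
freq = Counter(scores)

# Print the table from the highest score to the lowest, with a simple tally column.
print(f"{'Score':>5}  {'Tally':<10}  Frequency")
for score in sorted(freq, reverse=True):
    print(f"{score:>5}  {'|' * freq[score]:<10}  {freq[score]}")
print("N =", sum(freq.values()))
```

Score 5 is the most frequent value (7 times), and 14 does not occur, so it produces no row; add a zero row by hand if your table lists every value in the range.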
Grouped frequency distributions
When the data cover a wide range, they may be condensed by setting up intervals, each containing
a range of possible values. When data are grouped into intervals called “class intervals”, the
resulting frequency distribution is known as a grouped frequency distribution.
Examples:
1. The raw data below are the scores of students in a statistics continuous assessment test (CAT):
38, 68, 39, 55, 60, 61, 56, 49, 51, 35, 58, 48, 58, 47
65, 50, 52, 39, 53, 43, 42, 51, 62, 47, 55, 58, 54, 52
46, 65, 45, 55, 46, 42, 52, 34, 59, 53, 48, 48, 60, 50
Prepare a grouped frequency distribution table (inclusive) using a class interval of 5 units.
The class intervals must include the highest and the lowest scores, and each class should start with
a multiple of the class size. For example, with a class interval of 5 and the raw data above, the
lowest class interval will be 30-34 and the highest class interval will be 65-69.
The stated class intervals are 30-34, 35-39, …; their stated lower limits are 30, 35, … and their
stated upper limits are 34, 39, …. The real (exact) limits lie half a unit beyond the stated limits:
the lowest class interval, 30-34, has a real lower limit of 29.5 and a real upper limit of 34.5, the
next class, 35-39, has real limits of 34.5 and 39.5, and so on.
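The grouping rules above (classes of width 5, each starting at a multiple of 5, real limits half a unit beyond the stated limits) can be sketched in Python as follows. This is a minimal sketch that assumes whole-number scores; the function name `grouped_frequency` is illustrative only.

```python
# The 42 statistics CAT scores listed above.
scores = [
    38, 68, 39, 55, 60, 61, 56, 49, 51, 35, 58, 48, 58, 47,
    65, 50, 52, 39, 53, 43, 42, 51, 62, 47, 55, 58, 54, 52,
    46, 65, 45, 55, 46, 42, 52, 34, 59, 53, 48, 48, 60, 50,
]

def grouped_frequency(data, width):
    """Hypothetical helper: return (lower, upper, real_lower, real_upper,
    midpoint, frequency) rows, highest class first. Assumes whole-number data."""
    lowest = min(data) // width * width    # first class starts at a multiple of width
    highest = max(data) // width * width   # last class must contain the top score
    rows = []
    for lower in range(highest, lowest - 1, -width):
        upper = lower + width - 1                      # inclusive stated limits
        f = sum(lower <= x <= upper for x in data)     # scores falling in this class
        rows.append((lower, upper, lower - 0.5, upper + 0.5, (lower + upper) / 2, f))
    return rows

table = grouped_frequency(scores, 5)
print("Class    Real limits   Midpoint  f")
for lo, hi, rlo, rhi, mid, f in table:
    print(f"{lo}-{hi}    {rlo}-{rhi}    {mid:>5}  {f}")
print("N =", sum(row[-1] for row in table))
```

The same function with `width=4` can be applied to the 50 scores of Example 2 below.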
2. Prepare a grouped frequency distribution table (inclusive) for the raw data in the table below,
using a class interval of four units.
11 9 5 16 16 16 4 9 5 7
4 10 4 4 15 15 5 5 11 18
8 16 12 11 17 3 3 5 3 7
2 11 6 4 18 1 9 2 2 15
5 10 9 8 7 7 2 5 13 1
An ordinary frequency distribution table does not give a very clear picture of how the scores are
distributed, and it is therefore supplemented with a graphical representation of the same data.
Frequency distributions are presented graphically using histograms, frequency polygons and
cumulative frequency curves (ogives).
Histogram
A histogram is a series of continuous (joined) columns or bars, each having one class interval as
its base and the number of cases (frequency) in that class as its height. The histogram is drawn
on two perpendicular axes: the horizontal axis (x-axis) and the vertical axis (y-axis). To construct
the histogram, the real lower and upper class limits of each class interval are marked on the
horizontal axis.
Example
Using the statistics CAT data, draw a histogram.
[Figure: histogram of the statistics CAT scores; horizontal axis: scores, vertical axis: frequency]
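Since the chart itself is not reproduced here, a rough text version can be printed from the grouped frequencies of the CAT scores (class size 5, real limits). This is only a stand-in for a histogram drawn on graph paper: the bars are turned on their side and are not joined.

```python
# Real class limits and frequencies of the statistics CAT scores (class size 5).
classes = [
    ("29.5-34.5", 1), ("34.5-39.5", 4), ("39.5-44.5", 3), ("44.5-49.5", 9),
    ("49.5-54.5", 10), ("54.5-59.5", 8), ("59.5-64.5", 4), ("64.5-69.5", 3),
]

# Each class becomes a horizontal bar whose length equals its frequency.
lines = [f"{label}  {'#' * f} {f}" for label, f in classes]
print("\n".join(lines))
```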
Frequency Polygon
This is a line graph in which the horizontal axis shows the mid-points of the class intervals,
while the vertical axis shows the frequencies. Each frequency is plotted against the corresponding
mid-point of its class interval. Allow one extra class interval below the first class interval and
one after the last class interval, because the polygon must be closed (i.e. its ends must touch the
x-axis).
[Figure: frequency polygon for the statistics CAT; horizontal axis: class intervals from 24.5-29.5 to 69.5-74.5, including one empty class at each end; vertical axis: frequency]
Class-marks used for frequency polygons fall at the middle of the class intervals. Note also that
the frequency polygon has to be closed. This is done by adding an extra class-mark at each end,
i.e. the class-marks of the next class intervals below the lowest class and above the highest class,
as illustrated. The frequency of each of these extra class intervals is 0.
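The points of the closed frequency polygon for the CAT data can be listed with a small sketch. It assumes the class size of 5 and the class-marks 32, 37, …, 67 of the grouped table; the two extra points with frequency 0 close the polygon.

```python
# Frequencies of the statistics CAT classes 30-34, 35-39, ..., 65-69.
freqs = [1, 4, 3, 9, 10, 8, 4, 3]
width = 5
first_mid = 32  # class-mark of 30-34

# Class-marks of the actual classes.
mids = [first_mid + k * width for k in range(len(freqs))]

# Close the polygon with an empty class below the first and above the last.
points = [(mids[0] - width, 0)] + list(zip(mids, freqs)) + [(mids[-1] + width, 0)]

for x, f in points:
    print(x, f)
```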
Cumulative frequency curve (Ogive)
Cumulative frequency refers to the number of scores attained up to and including those in a given
class interval. A cumulative frequency curve is a line graph constructed by plotting the cumulative
frequencies against the real upper class limits of the class intervals and joining the points with a
smooth free-hand curve.
[Figure: cumulative frequency curve (ogive) for the statistics continuous assessment data; vertical axis: cumulative frequency]
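The points plotted on the ogive are the running totals of the class frequencies, paired with the real upper limits. A minimal sketch for the CAT data, using `itertools.accumulate` for the running total:

```python
from itertools import accumulate

# CAT frequencies for classes 30-34, ..., 65-69 and their real upper limits.
freqs = [1, 4, 3, 9, 10, 8, 4, 3]
upper_limits = [34.5 + 5 * k for k in range(len(freqs))]

# Running total: number of scores at or below each real upper limit.
cum_freqs = list(accumulate(freqs))

for u, cf in zip(upper_limits, cum_freqs):
    print(f"{u:5.1f}  {cf}")
```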
1. Normal Curve
The normal curve is a bell-shaped, symmetrical curve with the peak of the distribution at the
centre and tails that continually approach, but never touch, the horizontal axis (i.e. they are
asymptotic). It is a mathematical concept that real data never match exactly, but it plays an
important role in statistical inference.
2. Skewed Distributions
A distribution is said to be skewed if it lacks symmetry, i.e. it is asymmetrical. In skewed
distributions, the scores ‘trail off’ in one direction. Skewed distributions are described as
either positively skewed or negatively skewed. A distribution is negatively skewed if scores are
relatively frequent towards the right-hand (upper) end of the scale, so that the tail trails off to
the left. A distribution is positively skewed if scores are relatively frequent towards the
left-hand (lower) end of the scale, so that the tail trails off to the right. See the illustrations below:
[Figures: normal curve distribution, positively skewed distribution and negatively skewed distribution; horizontal axis: scores, vertical axis: frequency]
Kurtosis
Kurtosis refers to the flatness or peakedness of a distribution in relation to the normal curve. If a
distribution is more peaked than the normal curve, it is described as leptokurtic. If a distribution
is less peaked (flatter) than the normal curve, it is said to be platykurtic. In relation to leptokurtic
and platykurtic distributions, the normal distribution is described as mesokurtic.
Forms of Kurtosis
[Figure: leptokurtic, mesokurtic and platykurtic curves; horizontal axis: scores, vertical axis: frequency]
Summary
1. Before raw data can be understood and interpreted, it is usually necessary to organize and
summarize them in some meaningful way.
2. Procedures used to organize and summarize data include frequency distributions,
histograms, frequency polygons and ogives.
3. A frequency distribution is a tabulation of scores (or other attributes) of a group of
individuals to show the number of times each score occurs.
4. When dealing with a large number of scores, we use a grouped frequency distribution, in
which scores are grouped to form intervals of scores called ‘class intervals’.
5. In a cumulative frequency distribution, we indicate the number of scores that are less than
(or greater than) a given value.
6. A graph is a very effective method of representing data.
7. Three common methods of representing a distribution graphically are the histogram, the
frequency polygon and the smooth cumulative frequency curve (ogive).
8. The histogram is a series of columns or bars, each having one class interval as its base and
the number of cases (frequency) in that class as its height.
9. Frequency polygons are similar to histograms, but instead of columns, the points plotted at
the midpoint and frequency of each class interval are joined by straight lines. The lines
are extended down to the horizontal axis (x-axis) at one class above and one class below,
creating a many-sided figure (polygon).
10. An ogive is a cumulative frequency curve and is constructed by plotting the cumulative
frequencies against the actual (real) upper limits of the class intervals.
11. Various forms of frequency distributions exist; these include the normal distribution,
negatively skewed distributions and positively skewed distributions.
Further reading
Self-Test Assessment
The following were the scores obtained by a Form II class in a mathematics test:
49 63 59 44 49 51 62 37 30 49 45 52 50 42 54 32 57
41 42 56 44 46 63 44 40 50 46 53 48 37 46 53 68 36
40 56 37 66 43 40 43 51 59 42 52 46 57
(a) Make an ungrouped frequency distribution table for this data. The table should
show both tally marks and frequencies. The total frequency (N) = 50.
(b) Make a grouped frequency distribution that should have both tally marks and
frequencies for each class interval. Use class size (i) = 5 and start with 30-34 as
the lowest class interval. Indicate the class-mark and actual limits for each class
interval. Indicate also the above as well as below cumulative frequencies.
(c) Plot (on graph paper) a histogram and frequency polygon for this data. Note
that the frequency polygon should be closed by extending the lines to X-axis as
emphasized in the text.
(d) Comment on the distribution of the scores (i.e. is their distribution close to
normal or are they skewed positively or negatively?).
1. Using the same data, repeat 1 (b) and (c), but now using class size i = 4 and starting with
30-33 as the lowest class.
2. Select 30 of these scores randomly (one of the best ways of selecting them randomly is to
write each score on a small piece of paper, put all 50 folded pieces of paper in a box
and pick 30 after mixing them thoroughly). Using these 30 scores, repeat no. 1 (a), (b) and
(c).
How does the distribution of the scores compare with the original distribution i.e. the
distribution of the 50 scores?
MACHAKOS UNIVERSITY
LECTURE 3
MEASURES OF CENTRAL TENDENCY
Introduction
There is a need for concise ways of presenting (summarizing) information (data) other than
graphs or tables. Single numbers (indexes), such as the mean, mode and median, show
the general level of performance. These indexes are referred to as measures of central tendency
(commonly called averages).
The lecture covers:
i. Meaning of measures of central tendency
ii. Mode, median and arithmetic mean and their calculations
iii. Interpretation of forms of frequency distributions using mode, median and arithmetic
mean
iv. Summary and exercises for self test
Objectives
By the end of this lecture topic the learner should be able to:
1. Describe the three measures of central tendency (mode, median, mean)
2. Compute:
i) Mode for ungrouped and grouped data.
ii) Median for ungrouped and grouped data
iii) Mean for ungrouped and grouped data.
3. Describe unimodal, bimodal and multimodal distributions
4. Describe positively skewed, negatively skewed and normal distributions using the three measures
of central tendency.
5. Discuss the properties of the mean (e.g. what happens to the mean when a constant is added to
all the scores in the distribution).
$$\text{Median} = \frac{8 + 9}{2} = 8.5$$
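$$\text{Median} = L + \frac{i}{f_w}\left(\frac{N}{2} - cf_b\right)$$
where L is the real (exact) lower limit of the median class (the class containing the N/2-th score),
i is the class size, f_w is the frequency within the median class, N is the total frequency and cf_b is
the cumulative frequency below the median class.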
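The median formula can be sketched in Python. The example assumes the statistics CAT scores from Lecture Two grouped in classes of 5 (frequencies 1, 4, 3, 9, 10, 8, 4, 3 from 30-34 upwards); `grouped_median` is a hypothetical helper name, and its variables mirror L, i, f_w and cf_b in the formula.

```python
# Statistics CAT classes (stated lower limits) and frequencies, lowest class first.
lower_limits = [30, 35, 40, 45, 50, 55, 60, 65]
freqs = [1, 4, 3, 9, 10, 8, 4, 3]
i = 5  # class size

def grouped_median(lower_limits, freqs, i):
    """Hypothetical helper: Median = L + (i / f_w) * (N/2 - cf_b)."""
    half = sum(freqs) / 2
    cf_b = 0                        # cumulative frequency below the current class
    for lower, f_w in zip(lower_limits, freqs):
        if cf_b + f_w >= half:      # the N/2-th score lies in this class
            L = lower - 0.5         # real lower limit of the median class
            return L + i / f_w * (half - cf_b)
        cf_b += f_w
    raise ValueError("empty distribution")

print(grouped_median(lower_limits, freqs, i))
```

Here N/2 = 21 falls in the class 50-54, so L = 49.5, cf_b = 17 and f_w = 10, giving 49.5 + (5/10)(21 - 17) = 51.5.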
Note
The median is a position average: it is found by placing the scores in rank order and locating the
middle point. It therefore divides the distribution into two equal halves, with one half of the
scores below it and the other half above it.
Arithmetic Mean
This is the sum of all the scores in a distribution divided by the total number of scores.
It is the best known and most reliable measure of central tendency. The mean is denoted by
$\bar{X}$, pronounced "X bar":
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$
where $\bar{X}$ is the mean,
$X_i$ is the raw score for each individual, i.e. the ith person's score,
N is the total number of scores, and
∑ is the summation sign, indicating that we sum from the first score to the last
score, i.e. all the X-scores in the distribution are added.
Example
N=9
Each class-mark or mid-point ($x_i$) is multiplied by its corresponding frequency ($f_i$). The products are
then summed and divided by the total frequency to give the mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$$
Class interval   f_i   x_i   f_i x_i
65-69            3     67    201
60-64            4     62    248
55-59            8     57    456
50-54            10    52    520
45-49            9     47    423
40-44            3     42    126
35-39            4     37    148
30-34            1     32    32
Total            42          2154
$$\bar{X} = \frac{2154}{42} = 51.3$$
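The arithmetic in the table can be reproduced with a few lines of Python (the frequencies here match the statistics CAT scores grouped in classes of 5):

```python
# Class-marks (x_i) and frequencies (f_i) from the table above, highest class first.
class_marks = [67, 62, 57, 52, 47, 42, 37, 32]
freqs = [3, 4, 8, 10, 9, 3, 4, 1]

sum_f = sum(freqs)                                        # N
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))   # sum of f_i * x_i
mean = sum_fx / sum_f

print(sum_f, sum_fx, round(mean, 1))
```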
Properties of the mean
1. One important property of the mean is that it is the point in a distribution of scores about
which the summed deviations of the scores equal zero. What do we mean by deviation? A deviation
is the difference between a score and the mean, $X_i - \bar{X}$, and it can be either positive or
negative. In any distribution, the sum of the deviations about the mean is always equal to zero:
$$\sum_{i=1}^{N} (X_i - \mu) = 0$$
where μ is the population mean of the X-scores and the population has N subjects, and
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
where $\bar{X}$ is the sample mean and the sample is of size n.
For illustration, consider the following scores: 3, 3, 4, 5, 6, 6, 8, 9 and 10 (these can be treated
as a population or as a sample without any change in the results). The mean is 6, and the deviation
scores are 3-6, 3-6, 4-6, 5-6, 6-6, 6-6, 8-6, 9-6 and 10-6, in general $X_i - \bar{X}$. These
deviations are respectively -3, -3, -2, -1, 0, 0, 2, 3 and 4 (note that their sum is zero). Thus, the
mean may be considered the exact balance point of a distribution.
2. If we add a constant, say C, to every score in the distribution, the mean of the resulting scores,
$\bar{X}_{X+C}$, equals the original mean $\bar{X}_X$ plus the constant C. If we subtract a constant
instead, the resulting scores have a mean equal to the original mean minus the constant. Note
that subtracting a constant C is the same as adding -C, so the first formula also covers the second:
$$\bar{X}_{X+C} = \bar{X}_X + C$$
$$\bar{X}_{X-C} = \bar{X}_X - C$$
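Both properties can be checked numerically on the scores used in the illustration above. This sketch uses `statistics.mean` from the Python standard library; the constant C = 5 is an arbitrary choice.

```python
from statistics import mean

scores = [3, 3, 4, 5, 6, 6, 8, 9, 10]
m = mean(scores)

# Property 1: deviations about the mean sum to zero.
deviations = [x - m for x in scores]
print(deviations, sum(deviations))

# Property 2: adding (or subtracting) a constant C shifts the mean by C.
C = 5  # arbitrary constant chosen for the illustration
print(mean([x + C for x in scores]), m + C)
print(mean([x - C for x in scores]), m - C)
```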
[Figure: normal curve with the mean, median and mode at the centre]
Characteristics of the Normal Curve
A bell-shaped symmetrical curve with the peak of the distribution in the centre
The two slopes are mirror images of each other: 50% of the area lies to the left of the centre and 50% to the right
The mean of the distribution lies in the centre
The mean, mode and median are equal
In a positively skewed distribution, the mean is greater than the median and the median is greater
than the mode. For test marks, this may mean that most students obtained low marks while only a
few students got high marks, a situation normally found when a test is too hard.
The positions of these measures of central tendency on a positively skewed curve are shown below:
[Figure: positively skewed curve; from left to right: mode, median, mean]
In a negatively skewed distribution, the mode is greater than the median and the median is greater than
the mean. This illustrates a situation where many students have obtained high marks while very few
students have got low marks. This may occur if the test was too easy for most students.
[Figure: negatively skewed curve; from left to right: mean, median, mode]
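The ordering rule can be tried on the 50 ungrouped scores from the frequency-distribution exercise in Lecture Two. This is a sketch using Python's `statistics` module; comparing the three averages is only a rule of thumb, not a formal test of skewness.

```python
from statistics import mean, median, mode

# The 50 ungrouped scores from the Lecture Two exercise.
scores = [
    11, 9, 5, 16, 16, 16, 4, 9, 5, 7,
    4, 10, 4, 4, 15, 15, 5, 5, 11, 18,
    8, 16, 12, 11, 17, 3, 3, 5, 3, 7,
    2, 11, 6, 4, 18, 1, 9, 2, 2, 15,
    5, 10, 9, 8, 7, 7, 2, 5, 13, 1,
]

mn, md, mo = mean(scores), median(scores), mode(scores)
print(f"mean = {mn}, median = {md}, mode = {mo}")

# Rule of thumb from the notes: compare the three measures.
if mn > md > mo:
    print("mean > median > mode: positively skewed")
elif mn < md < mo:
    print("mean < median < mode: negatively skewed")
else:
    print("no clear skew from this rule")
```

For these scores the mean (8.22) exceeds the median (7), which exceeds the mode (5), suggesting a positively skewed distribution.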
Further Reading
Ingule, F. & Gatumu, H. (1996). Essentials of Educational Statistics.
Glass, G.V. & Stanley, J.C. (1970). Statistical Methods in Education and Psychology.
Smith, G.M. (1970). A Simplified Guide to Statistics for Psychology and Education.
41 42 56 44 46 63 44 40 50 46 53 48 37 46 53 68 36
40 56 37 66 43 40 43 51 59 42 52 46 57
a) Compute the mean, median and mode for the ungrouped data (the ungrouped frequency
distribution was prepared in the earlier exercise).
b) Compute the mean, median, modal interval and mode for the grouped data, also prepared earlier.
c) In terms of the magnitudes of the mean, median and mode, comment on the distribution of these
scores.
2. Compute the mean, median, modal interval and mode for the grouped data, but now using a class
interval of size 3 and starting with 30-32 as the lowest class interval.
MACHAKOS UNIVERSITY
Copyright © Machakos University, 2021
All Rights Reserved
LECTURE 5
MEASURES OF VARIABILITY
Introduction
The measures considered in this lecture unit are the range, quartile deviation, mean deviation, variance
and standard deviation. The range is the simplest measure of variability (or dispersion), while the
standard deviation is the most reliable measure of variability.
Objectives
By the end of the lecture unit, the trainee should be able to:
i. Compute range, quartile deviation, mean deviation, variance and standard deviation for
grouped and ungrouped data using computational and definitional formulae.
ii. Explain the properties of standard deviation (s.d.) (e.g. when a constant is added to all
the scores of the distribution)
iii. Interpret computed values of range, mean deviation, quartile deviation and standard
deviation.
NB: The range and the quartile deviation (Q.D.) do not take each individual score into consideration;
hence they are not stable measures of variability.
Mean Deviation (M.D.)
Mean deviation is a stable measure of variability because it considers the spread of each
individual score from the most reliable measure of central tendency, i.e. the mean.
$$\text{M.D.} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$
Example
Calculate the M.D. of the following scores:
3, 3, 4, 5, 6, 6, 8, 9, 10
$$\text{Mean} = \frac{3+3+4+5+6+6+8+9+10}{9} = \frac{54}{9} = 6$$
$$\text{Mean deviation} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$
$$\text{M.D.} = \frac{3+3+2+1+0+0+2+3+4}{9} = \frac{18}{9} = 2$$
A large value of the mean deviation indicates a greater spread in the values of the distribution, while
a small value indicates that the values are close together.
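The worked example can be reproduced in Python; `abs` takes each deviation without its sign, which is what the vertical bars in the formula indicate.

```python
scores = [3, 3, 4, 5, 6, 6, 8, 9, 10]
n = len(scores)

mean = sum(scores) / n                        # 54 / 9
abs_devs = [abs(x - mean) for x in scores]    # distances from the mean, sign ignored
md = sum(abs_devs) / n                        # 18 / 9

print(mean, md)
```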
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}} \quad \text{i.e.} \quad S = \sqrt{\text{variance}}$$
Example
Use the scores 3, 4, 5, 5, 6, 8, 9 and 10 to compute the variance and the standard deviation.
Here n = 8 and the mean is $\bar{X} = 50/8 = 6.25$.
Xi    Xi - mean    (Xi - mean)^2
3     -3.25        10.5625
4     -2.25        5.0625
5     -1.25        1.5625
5     -1.25        1.5625
6     -0.25        0.0625
8     1.75         3.0625
9     2.75         7.5625
10    3.75         14.0625
Total              43.5000
$$\text{Variance, } S_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} = \frac{43.5}{8} = 5.4375$$
$$\text{Standard deviation, } S_x = \sqrt{5.4375} = 2.33$$
Standard deviation indicates how the scores (or values of any other variable) are spread. The bigger
the standard deviation, the bigger the spread of the scores; the smaller the standard deviation, the
smaller the spread of the scores.
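A sketch of the same computation with Python's `statistics.pvariance` and `statistics.pstdev`, which divide by n as the formulas above do (`statistics.variance` and `statistics.stdev` divide by n - 1 instead). The last lines also check the summary points below about adding a constant to, or multiplying by a constant, every score; the constants 10 and 3 are arbitrary.

```python
from statistics import pvariance, pstdev

scores = [3, 4, 5, 5, 6, 8, 9, 10]

# Population formulas: divide by n, as in the worked example.
var = pvariance(scores)
sd = pstdev(scores)
print(var, round(sd, 2))

# Adding a constant leaves the variance unchanged ...
shifted = [x + 10 for x in scores]
# ... while multiplying by a constant c multiplies the variance by c squared.
c = 3
scaled = [c * x for x in scores]
print(pvariance(shifted), pvariance(scaled), c ** 2 * var)
```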
Summary
The following statements summarize the major points of this lecture topic:
1. The range, variance and standard deviation are measures of variability.
They give an indication of the spread of scores in a distribution.
2. The range is defined as the difference between the highest and the lowest scores in a
distribution.
3. Variance is obtained by dividing the sum of squared deviations by the total number of
observations in the distribution.
4. The standard deviation is the square root of variance.
5. The bigger the standard deviation, the bigger the spread of scores and the more
heterogeneous the group is on which the scores are based.
6. Adding a constant to every score in the distribution has no effect on variance or standard
deviation.
7. When every score in a distribution is multiplied by a constant, the new variance is the original
variance times the constant squared.
Self Test
3.
Class interval   Frequency
65-69            3
60-64            4
55-59            8
50-54            10
45-49            12
40-44            6
35-39            5
30-34            2
                 N = 50
Using the above data, and changing the scale of the class-marks by an appropriate manipulation (i.e.
using the assumed mean method):
(a) Compute:
(i) The mean
(ii) The variance and standard deviation.
(b) Repeat using the above data, but omitting the top class interval and the bottom class
interval, (N = 45).
(c) Double the frequency of the data and determine how the median and mean,
variance and standard deviation are affected.
(d) Double only the last frequency of the data and determine how the median, mean,
variance and standard deviation are affected.
4. The following scores were obtained by 30 Form I students in a Kiswahili test:
46 31 18 39 40 38
37 19 15 26 14 37
24 41 18 19 21 25
31 10 20 21 32 46
20 30 32 27 31 37
a) Use the above scores to prepare a grouped frequency distribution
using a class interval size 5 and starting with 10-14 as your lowest
class interval.
b) Basing on your grouping in a) above, prepare a complete frequency
distribution table for grouped data having the following columns:
i) Class interval
ii) Tally marks
iii) Frequency
iv) Real (or exact) class limits
v) Classmark (midpoints)
vi) Cumulative frequencies below (less than)
vii) Cumulative frequencies above (more than)
c) For the grouped data, calculate the following:
i) Mode ii) Median iii) Mean
d) Determine the range for the grouped data.
e) Calculate mean deviation for the grouped data.
f) Compute the variance and standard deviation for the grouped data.
g) i) Comment on the performance in this Kiswahili test
using above information.
ii) Describe fully the shape of the distribution, basing your
answer on part (c).
Further Reading
WRITTEN BY:
ROSEMARY MULE
JULY, 2021
LECTURE 8: INTERPRETATION OF SCORES
Introduction
In norm-referenced tests (NRT), where we consider the relative position of a score, transforming
scores into percentile ranks, standard scores (z values), standardized scores or normalized scores
is justifiable. Such transformation is not justifiable in criterion-referenced tests (CRT). A score
in a CRT is meaningful in its own right, conveying information about how well a candidate has
mastered what is being tested.
Objectives
Interpretation of Scores
A test mark or score is the sum of the marks for the correctly answered questions. Such a score is
called the raw score of an individual on a particular test. Raw scores may not be comparable across
tests: given a raw score of 70 on Test I and a raw score of 50 on Test II, it is not easy to assess
relative performance on the two tests if the distributions for the two tests have quite different shapes.
A raw score is also hard to interpret on its own, because it gives no information about how an
examinee performed relative to others. Raw scores are therefore transformed into interpretable scores
using percentiles, standard and standardized scores, and normalized scores.
Percentile Ranks
The percentile rank (or percentile score) is defined as the percentage of scores that fall at or below
a given score. For example, if the percentile rank for a score is 85, it means that 85 per cent of the
scores in the total distribution fall at or below that score. The formula used for computing the
percentile rank is:
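$$\text{Percentile rank} = \frac{\left(cf_b + \frac{f_w}{2}\right) \times 100}{N}$$
where
cf_b = cumulative frequency (below) for the interval immediately below the interval containing the score
f_w = frequency within the interval containing the score (the number of examinees obtaining that score)
N = the total number of subjects in the distribution, i.e. the total number of examinees.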
Consider the score distribution in the table below. For each score value, the frequency (number)
of examinees obtaining the score appears in the second column. The third column contains the
cumulative frequency for each score value, i.e. the number of examinees who have a score less than
or equal to that score value. Let us use this observed test-score distribution to estimate the percentile
rank for the score value 5 (that is, the percentile rank for an individual who gets 5).
Score   Frequency   Cumulative frequency   Percentile rank
11      1           10                     95
10      1           9                      85
8       1           8                      75
7       2           7                      60
6       1           5                      45
5       2           4                      30
4       1           2                      15
3       1           1                      5
$$\text{Percentile rank for score value 5} = \frac{\left(2 + \frac{2}{2}\right) \times 100}{10} = 30$$
The percentile ranks for other scores in the distribution are provided in the fourth column. Under the
definition, the percentile rank of a score is always less than 100 and greater than zero (i.e. 0 < percentile
rank < 100).
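The percentile-rank formula applied to every score in the table can be sketched as follows. The list of ten scores is re-expanded from the frequency column of the table (an assumption about the underlying data, which the notes give only as frequencies).

```python
from collections import Counter

# Scores re-expanded from the frequency column of the table (assumed data, N = 10).
scores = [11, 10, 8, 7, 7, 6, 5, 5, 4, 3]
freq = Counter(scores)
N = len(scores)

def percentile_rank(score):
    """Percentile rank = (cf_b + f_w / 2) * 100 / N."""
    cf_b = sum(f for s, f in freq.items() if s < score)  # examinees scoring below
    f_w = freq[score]                                      # examinees with this score
    return (cf_b + f_w / 2) * 100 / N

for score in sorted(freq, reverse=True):
    print(score, percentile_rank(score))
```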
Standard Scores
A standard score indicates the relative position of a score in a distribution in terms of the
number of standard deviations from the mean. To get a standard score corresponding to any
raw score, the mean of the raw scores is subtracted from the raw score, and the result is
divided by the standard deviation of the distribution (of the raw scores).
Standard scores are also called z scores or z values. Therefore:
$$z = \frac{X - \mu}{\sigma} \quad \text{or} \quad z = \frac{X - \bar{X}}{s}$$
where μ or $\bar{X}$ is the mean of the distribution, while σ or s is the standard deviation of the
distribution under consideration, as observed earlier.
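Student   Raw score (X)   X - mean   z
A         3               -7         -1.11
B         6               -4         -.63
C         7               -3         -.47
D         9               -1         -.16
E         15              5          .79
F         20              10         1.58
(Here the mean is 10; the z values correspond to s ≈ 6.32, the standard deviation computed with n - 1 = 5 in the denominator.)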
Converting scores to standard scores using the formula above automatically puts the transformed scores
(standard scores) into a new scale with a mean of zero (0) and a standard deviation (s.d.) of one (1). Each
transformed score indicates how many standard deviations the raw score lies from the mean. For instance, a
standard score 1.5 (z = 1.5) indicates that the corresponding raw score lies 1.5 standard deviations above
the mean. If the standard score is equal to –1.5 (i.e. z = -1.5), the corresponding raw score lies 1.5 standard
deviations below the mean.
Converting raw scores to standard scores (z scores) has no effect on the shape of the distribution. If the
original distribution of the raw scores is skewed to the right, the distribution of corresponding standard
scores will also be skewed to the right. If the original raw-score distribution is normal, the distribution of
the corresponding standard scores will also be normal.
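The z values in the table above can be reproduced with Python's `statistics` module. Note the assumption: the table's values match the sample standard deviation (`statistics.stdev`, divisor n - 1), about 6.32, rather than the divisor-n formula used in the lecture on variability.

```python
from statistics import mean, stdev

raw = {"A": 3, "B": 6, "C": 7, "D": 9, "E": 15, "F": 20}
values = list(raw.values())

m = mean(values)     # 10
s = stdev(values)    # sample s.d. (n - 1 divisor), assumed to match the table

z = {name: (x - m) / s for name, x in raw.items()}
for name, value in z.items():
    print(name, round(value, 2))
```

As the text says, the transformed scores have a mean of 0 and a standard deviation of 1.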
Note that percentile ranks are ordinal measures while standard scores are interval measures.
Standard scores may not be easily interpreted by people who do not know what the mean and
standard deviation are. Standard scores may be positive or negative.
Standardized Scores
Standardized scores are linear transformations of raw scores, but unlike standard scores, they are
always expressed as whole numbers and are non-negative. Any set of standard scores can be transformed
to an arbitrary mean, $\mu_s$, and standard deviation, $\sigma_s$, by applying the formula:
$$Y = \mu_s + \sigma_s z$$
Where z is the standard score and Y is the standardized score. An example of standardized scores is the
T score (commonly referred to as ‘linear T scores’). The T score has a mean of 50 and standard deviation
of 10. The formula for a T score is:
T = 50 + 10z
To obtain a T score, the z score is multiplied by 10 and 50 is added to the product. Starting from a
raw score, the equivalent T score is obtained by calculating the value of z for the raw score,
multiplying the z value by 10 and then adding 50. These operations are summarized in the formula:
$$T = 10\,\frac{(X_i - \bar{X})}{s} + 50$$
or
$$T = 10\,\frac{(X_i - \mu)}{\sigma} + 50$$
T scores are always whole numbers, and if the value obtained is not a whole number it has to be rounded
to the nearest whole number. T scores are also non-negative and usually greater than zero.
It can be noted from the formula above that standardized scores are linear transformations of standard
scores, since the means and standard deviations of the standardized scores (50 and 10 respectively for
linear T scores) are the constants applied to obtain them. Since standard scores are a linear
transformation of raw scores, standardized scores are also a linear transformation of raw scores.
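Continuing with the six students A-F from the standard-score table, a minimal sketch of the linear T transformation, rounding to whole numbers as described above (with the same assumption as for that table: s is the sample standard deviation).

```python
from statistics import mean, stdev

raw = [3, 6, 7, 9, 15, 20]        # students A-F from the standard-score table
m, s = mean(raw), stdev(raw)      # 10 and about 6.32 (n - 1 divisor, assumed as in the table)

def t_score(x):
    z = (x - m) / s               # standard score
    return round(50 + 10 * z)     # linear T score, rounded to a whole number

t = [t_score(x) for x in raw]
print(t)
```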
Normalization
A raw-score distribution, or a distribution obtained from a linear transformation, rarely has an exact
statistical meaning. Raw scores, or their linear transformations, are therefore changed to obtain a
normal distribution of scores by performing normalization. All normalized scores have a normal
distribution. As a result, every score on a normalized distribution has a precise statistical meaning:
the percentage of individuals above and below each score is known exactly, on a scale with a known
mean and standard deviation.
Normalization involves forcing the distribution of transformed scores to be as close as possible to a normal
distribution by smoothing out, stretching, or condensing irregularities and departures from normality in
the raw-score distribution.
Normalized T scores (as distinct from the linear T scores described above) are an example of normalized scores. Normalized scores are usually reported as whole numbers.
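The notes do not give a step-by-step normalization procedure. One common approach, assumed here, is to convert each percentile rank into the z value that has that percentage of the normal curve below it, and then express that z as a T score (a normalized T score). A sketch using `statistics.NormalDist` (Python 3.8+) and the percentile ranks from the worked percentile-rank table:

```python
from statistics import NormalDist

# Percentile ranks from the worked percentile-rank table (score: PR).
percentile_ranks = {11: 95, 10: 85, 8: 75, 7: 60, 6: 45, 5: 30, 4: 15, 3: 5}

std_normal = NormalDist()  # mean 0, standard deviation 1

def normalized_t(pr):
    """Assumed approach: z with pr% of the normal curve below it, then T = 50 + 10z."""
    z = std_normal.inv_cdf(pr / 100)
    return round(50 + 10 * z)

normalized = {score: normalized_t(pr) for score, pr in percentile_ranks.items()}
print(normalized)
```

Unlike linear T scores, these values are spaced according to the normal curve rather than according to the distances between raw scores.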
SUMMARY
The principal ideas, conclusions, implications presented in this unit are summarized in the
following statements:
1. Raw scores can be interpreted in a meaningful manner after conversion into
transformed scores in norm-referenced tests.
2. Common forms of expressing transformed scores are percentiles, standard and
standardized scores, and normalized scores.
3. The percentile rank of a score is defined as the percentage of scores falling at or below
that score. The primary advantages of percentiles are that they are straightforward to
calculate and easy to interpret.
4. To get the standard score corresponding to any raw score, the mean of the raw scores is
subtracted from the raw score and the result is divided by the standard deviation of the
distribution. A disadvantage of standard scores is that they are often negative and
expressed as decimals.
5. Standardized scores are linear transformations of standard scores; they are expressed
as whole numbers and are non-negative. Linear T scores are examples of standardized
scores, with a mean of 50 and a standard deviation of 10.
6. The transformation to normalized scores involves forcing the distribution of
transformed scores to be as close as possible to a normal distribution by smoothing
out irregularities and departures from normality in the raw-score distribution.
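The "at or below" definition of percentile rank in point 3 translates directly into a counting rule. The sketch below is a minimal illustration with a hypothetical function name and made-up scores; note that some texts use a midpoint convention instead, which gives slightly different values when several examinees share a score.

```python
def percentile_rank(scores, x):
    """Percentage of scores falling at or below x."""
    if not scores:
        raise ValueError("scores must not be empty")
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)

# Hypothetical class of ten scores
scores = [40, 45, 50, 50, 55, 60, 65, 70, 75, 80]
print(percentile_rank(scores, 50))  # 40.0
```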
REFERENCES
Brown, F.G. (1970) Principles of educational and psychological testing. 2nd ed. New York: Holt, Rinehart & Winston.
Mehrens, W.A. & Lehmann, I.J. (1978) Measurement and Evaluation in Education and Psychology. New York: Holt, Rinehart and Winston.
Self Test
The following are data for 3 students on three tests. Along with these test scores, the mean ($\bar{X}_i$)
and standard deviation ($S_i$) of the scores on each test are given:
(i) By converting X13 in Biology and X33 in Physics into z scores, find out whether student
3 did better in Physics or in Biology. What assumptions are you making for the
comparison to be justified?
(ii) What is the mean z score for student 1 on all three tests?
(iii) What is the mean linear T score for student 2 on all three tests?
1. Discuss why raw scores have to be converted to standard scores and normalized scores.
2. Distinguish between standardized scores and normalized scores. When are standardized and
normalized scores equal?
3. (i) (a) Define the term percentile rank.
(b) Define the term percentile point.
(ii) Given the following distribution:
Score (Xi)       1   2   3   4   5   6   7   8   9
Frequency (fi)   1   3   5   9  12  22  23  16   9
4. A raw-score distribution on a given test is normal with mean 48 and variance 4. Using this
information, complete the table below with the relevant z scores, percentile ranks and T scores.
Raw score   z score   Percentile rank   T score
---------------------------------------------------------------------
52
50
48
44
MACHAKOS UNIVERSITY
CENTER OF OPEN, DISTANCE AND e-LEARNING
IN COLLABORATION WITH
SCHOOL OF EDUCATION
DEPARTMENT OF PSY/SNE
WRITTEN BY:
ROSEMARY MULE
Copyright © Machakos University, 2021
All Rights Reserved
JUNE, 2021
LECTURE 6
MEASURES OF RELATIONSHIP
Introduction
The relationship or association between two variables is an important concept in research. It can
help in prediction: if the relationship between two variables is known and is strong enough, the
value of one variable can be predicted when only the other is given.
Objectives
By the end of the lecture, the learner should be able to:
1. Explain two methods of studying a relationship, one with stringent requirements
(assumptions) and the other with less stringent requirements.
2. Compute, given two sets of data for a group, the two indexes (measures) of relationship, i.e. the
Pearson product moment correlation coefficient and the Spearman rank order correlation
coefficient.
3. Interpret the computed value of the relationship.
4. Draw a scatter diagram (also called a scatter-plot or scattergram) and describe in simple terms
the relationship it portrays.
5. Give the properties of the indices, e.g. what happens to the relationship index when the scores
are linearly transformed.
Measures of Relationship
Measures of relationship show the relationship between two variables and the strength of that
relationship. To show the relationship between two variables, the following methods are used:
(i) Scatter diagram or scatter graph
(ii) Correlation coefficient
Scatter Diagram
A scatter diagram is a graph of data points that shows the relationship between two variables.
Example
The tables below show Maths scores paired with Physics scores (Case I) and with Science scores
(Case II) for 6 Form II sections. Plot a scatter diagram for each case.
Case I
Maths scores (x)     42   54   66   78   100   120
Physics scores (y)   81   45   55   42    97    77
[Figure: Scatter diagram showing the relationship between Maths scores (x-axis) and Science scores (y-axis) for Case I]
Case II
Maths scores (x)     42   54   66   78   100   120
Science scores (y)   81   88   93   99   109   125
[Figure: Scatter diagram showing the relationship between Maths scores (x-axis) and Science scores (y-axis) for Case II]
Case I
The scatter diagram shows no systematic relationship between the two variables.
Case II
The points in the scatter diagram follow a clear pattern, which suggests a strong positive
relationship, i.e. as Maths scores increase, there is a corresponding increase in Science scores.
Note: A scatter diagram does not provide a precise measure of relationship.
The following methods provide a precise measure of relationship: covariance, the Pearson product
moment correlation coefficient and the Spearman rank order correlation coefficient.
Covariance
Covariance is a measure of relationship that shows the degree to which two variables vary together,
using a simple averaging procedure: the products of the paired deviations from the two means are
summed and divided by n − 1.
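$$\operatorname{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$
For example, with n = 3 pairs of scores, $\bar{y} = 50$ and $\sum (x - \bar{x})(y - \bar{y}) = 20$:
$$\operatorname{Cov}(x, y) = \frac{20}{3 - 1} = \frac{20}{2} = 10$$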
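The covariance formula above can be sketched in Python as a direct "sum of products of deviations, divided by n − 1" computation. The function name and the paired scores are hypothetical, chosen only to illustrate the arithmetic.

```python
def covariance(x, y):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum the products of paired deviations, then average with the n - 1 divisor
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical paired scores for four students
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]
print(covariance(x, y))  # 11/3, about 3.667
```

The size of the covariance depends on the units of x and y, which is one reason a standardized index such as the correlation coefficient is usually preferred for interpretation.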
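Pearson Product Moment Correlation Coefficient
$$r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
For example, with n = 5, $\sum x = 150$, $\sum y = 145$, $\sum xy = 5710$, $\sum x^2 = 6022$ and $\sum y^2 = 5475$:
$$r_{xy} = \frac{5(5710) - (150)(145)}{\sqrt{\left[5(6022) - (150)^2\right]\left[5(5475) - (145)^2\right]}} = \frac{28550 - 21750}{\sqrt{(30110 - 22500)(27375 - 21025)}} = \frac{6800}{\sqrt{7610}\,\sqrt{6350}} = \frac{6800}{87.235 \times 79.69} = \frac{6800}{6951.76} \approx 0.978$$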
Interpretation
There is a very strong positive relationship between the two variables x and y.
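The computational formula for r_xy works from the column totals n, Σx, Σy, Σxy, Σx² and Σy². The sketch below (hypothetical function names) checks the worked example from its totals and also applies the same formula to the raw Case II scores from the scatter-diagram example.

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Computational (raw-score) formula for r_xy from the usual column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson(x, y):
    """r_xy from raw paired scores: form the column totals, then apply the formula."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of scores")
    return pearson_from_sums(
        len(x),
        sum(x),
        sum(y),
        sum(a * b for a, b in zip(x, y)),
        sum(a * a for a in x),
        sum(b * b for b in y),
    )

# Column totals from the worked example: n = 5, sums 150, 145, 5710, 6022, 5475
r = pearson_from_sums(5, 150, 145, 5710, 6022, 5475)
print(round(r, 3))  # 0.978

# Raw scores for Case II of the scatter-diagram example
maths = [42, 54, 66, 78, 100, 120]
science = [81, 88, 93, 99, 109, 125]
print(round(pearson(maths, science), 3))
```

The hand calculation rounds the square roots before dividing; the code keeps full precision, but both give 0.978 to three decimal places.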
Spearman Rank Order Correlation Coefficient
It is denoted by $r_1$ because it is an approximation of $r_{xy}$:
$$r_1 = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where
D = difference in ranks for each pair of scores
N = number of pairs of scores
$r_1$ is based on the ranks of the scores, not on the scores themselves.
For example, with N = 5 pairs and $\sum D^2 = 4$:
$$r_1 = 1 - \frac{6 \times 4}{5 \times 24} = 1 - \frac{24}{120} = 1 - \frac{1}{5} = 1 - 0.2 = 0.8$$
Rho (ρ) is interpreted in the same way as $r_{xy}$. Its value can never be less than –1 nor greater
than +1. It equals +1 only if each person has exactly the same rank on both X and Y. It equals –1 if there
are no ties and the order is completely reversed for the two variables, so that the person ranked first on
one variable is ranked last on the other, and so forth.
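The rank-difference formula can be sketched in Python for data without ties. The helper names and the five pairs of scores are hypothetical; ranking starts from the smallest score (rank 1), and the helper refuses tied data because the simple formula assumes none.

```python
def ranks_no_ties(scores):
    """Rank 1 = smallest score; raises an error if any scores are tied."""
    if len(set(scores)) != len(scores):
        raise ValueError("tied scores: use average ranks instead")
    order = sorted(scores)
    return [order.index(s) + 1 for s in scores]

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(D^2) / (N(N^2 - 1)), for data without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_no_ties(x), ranks_no_ties(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores of five pupils on two tests
x = [10, 20, 30, 40, 50]   # ranks 1, 2, 3, 4, 5
y = [15, 12, 20, 30, 25]   # ranks 2, 1, 3, 5, 4
print(spearman_rho(x, y))  # 0.8
```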
Note
1. Although the Spearman correlation coefficient formula is simpler and does not look much like
the computational formula used for the Pearson correlation coefficient, it is algebraically
equivalent to the Pearson coefficient when it is used with ranked data instead of interval data.
2. Ties are easily handled by assigning each tied score the mean of the ranks that the tied scores occupy.
3. If a very large number of ties occur, however, it would probably be wise to reconsider the use
of the Spearman (rho) coefficient; other non-parametric methods such as Kendall's tau or chi-square
may be more appropriate.
4. Ranking can start from either the smallest or the largest score, as long as the same convention
is used throughout, for both variables.
5. If there are no ties in the data, the Spearman coefficient is merely what one obtains by replacing the
observations by their ranks and then computing the Pearson product moment correlation
coefficient of the ranks.
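Notes 2 and 5 together suggest a way to handle ties: give tied scores the mean of the ranks they occupy, then compute the Pearson coefficient on those ranks. The sketch below follows that approach; the function names are hypothetical, and the data are made up.

```python
from math import sqrt

def average_ranks(scores):
    """Rank 1 = smallest; tied scores share the mean of the ranks they occupy."""
    order = sorted(scores)
    ranks = []
    for s in scores:
        first = order.index(s) + 1      # first rank position held by this value
        count = order.count(s)          # number of scores tied at this value
        ranks.append(first + (count - 1) / 2)  # mean of ranks first .. first + count - 1
    return ranks

def pearson(x, y):
    """Pearson r from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_with_ties(x, y):
    """Spearman's rho computed as Pearson's r on (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

With no ties, spearman_with_ties gives the same value as the rank-difference formula (for example 0.8 for the five pupils used earlier); with ties, the two can differ slightly.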
Summary
1. When two measures are related, the term correlation is used to describe this fact.
2. Correlation can be considered in two senses: correlation that merely describes the presence or absence
of a relationship, and correlation that shows the degree (magnitude) of the relationship.
3. A study of correlation to determine the presence or absence of a relationship can be done through logical
examination of the data and examination of scatter diagrams. Methods that provide indices of
the magnitude of relationship include covariance, the Pearson product-moment correlation
coefficient and the Spearman rank-order correlation coefficient.
4. The measure of correlation assumes only values between –1 and +1.
5. If the larger values (scores) of X tend to be paired with larger values (scores) of Y, and hence the
smaller values (scores) of X and Y tend to be paired together, then the measure of correlation
should be positive and close to +1. If the tendency is strong, then we would speak of a positive
correlation between X and Y.
6. If the large values of X tend to be paired with the smaller values of Y, and vice versa, then the
measure of correlation should be negative and close to –1. If the tendency is strong, then we
say that X and Y are negatively correlated.
7. If the values of X seem to be randomly paired with the values of Y, the measure of correlation
should be fairly close to zero. We then say that X and Y are uncorrelated, or have no (zero) correlation.
Zero correlation indicates no linear relationship; it does not by itself show that X and Y are independent.
8. Adding a constant to every score, or multiplying every score by a positive constant, in either
distribution has no effect on the size of the correlation coefficient. (Multiplying one distribution by a
negative constant reverses the sign of the coefficient but leaves its size unchanged.)
9. In order to use $r_{xy}$, the relationship between the two variables should be linear, the two
distributions should be similar, the scatter of Y scores should be about the same at all values of X
(homoscedasticity), and the data should be based on an interval scale of measurement.
10. When the measure is based on ordinal data, the Spearman rank order correlation coefficient, ρ
(rho), should be used. The Spearman rank order correlation coefficient can be interpreted in the
same way as $r_{xy}$.
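Point 8 can be checked numerically. The sketch below uses made-up scores and a hypothetical pearson helper to compare r before and after adding constants, multiplying by positive constants, and multiplying X by a negative constant.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores
x = [2, 4, 5, 7, 9]
y = [1, 3, 4, 4, 8]

r = pearson(x, y)
r_added = pearson([a + 10 for a in x], [b + 3 for b in y])   # add constants
r_scaled = pearson([3 * a for a in x], [b / 2 for b in y])    # multiply by positive constants
r_negated = pearson([-2 * a for a in x], y)                   # multiply X by a negative constant

print(round(r, 6), round(r_added, 6), round(r_scaled, 6), round(r_negated, 6))
```

The first three values agree, while the last has the same size with the opposite sign.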
Further Reading
Glass, G.V. & Stanley, J.C. (1970) Statistical Methods in Education and Psychology.
Smith, G.M. (1970) A Simplified Guide to Statistics for Psychology and Education.
Self Test
1. The following scores were obtained when a group of 11 students was tested on two tests, test
A and test B:
Student   Test A   Test B
1         2        2
2         2        3
3         4        4
4         5        4
5         3        5
6         6        5
7         4        6
8         5        6
9         6        7
10        8        8
11        7        9
(a) Plot a scatter diagram for the above data (use graph paper).
(b) Compute the Pearson product moment correlation coefficient, rxy between tests A
and B for this group of 11 examinees.
(c) Interpret your computed value of rxy.
(d) State the assumption underlying this correlation analysis.
(e) Compute the Spearman correlation coefficient, ρ (rho), for the above data.
(f) What are the major differences between these two measures of relationship (i.e.
between Pearson and Spearman correlation coefficients)?
2. Suppose the following were the scores of a small class in two tests, test A and test B. Test A is taken
as variable X, while test B is taken as variable Y.
Student   Test A (X)   Test B (Y)
John      5            4
Mary      6            6
Peter     5            5
Ali       3            2
Juma      2            3
James     3            4
(b) By means of a scattergram, describe the kind of relationship between X and Y in the
above data.
JULY, 2021
LECTURE 11:
QUALITY OF A TEST
Introduction
The two important qualities of a test (measurement or instrument) are reliability and validity.
We consider reliability first, although validity is the more important of the two.
Objectives
By the end of the lecture, the learner should be able to:
1. Define reliability
2. Describe the 3 methods commonly used for estimating reliability.
3. Explain factors that may influence (increase or decrease) reliability coefficient of a test (or
instrument).
4. Define validity
5. Differentiate between reliability and validity
6. State the 3 kinds of validity.
7. Distinguish among the 3 kinds of validity, namely content validity, criterion-related validity
and construct validity.
Quality of a test
There are certain qualities that every measurement device (test or questionnaire) should possess.
The measurement (or test) should be:
Reliable
Valid and
Scored accurately and objectively
Reliability of a Test
Reliability refers to the consistency (precision) of the measurements (scores) provided by a test. It is the
degree of consistency between two measures of the same kind (test). A test must therefore measure
consistently if it is to be reliable, i.e. an individual should obtain approximately the same
mark on another administration of the same test. The degree of consistency of a test is referred to
as the reliability coefficient of the test and is calculated using the Pearson product moment correlation
coefficient (r).
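For example, if the correlation between scores on the two halves of a test (split-half method) is 0.8,
the Spearman-Brown prophecy formula gives the reliability of the full-length test as:
$$r_{xx} = \frac{2 \times 0.8}{1 + 0.8} = \frac{1.6}{1.8} \approx 0.889$$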
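The split-half correction above is a special case of the Spearman-Brown prophecy formula. The sketch below uses the general form for a test k times as long, r_k = kr / (1 + (k − 1)r), which goes slightly beyond the text; with the default k = 2 it reproduces the calculation above. The function name is hypothetical.

```python
def spearman_brown(r, k=2.0):
    """Spearman-Brown prophecy formula: predicted reliability of a test k times as long.

    With k = 2 (doubling a half-test) this reduces to the split-half correction
    r_xx = 2r / (1 + r).
    """
    return k * r / (1 + (k - 1) * r)

print(round(spearman_brown(0.8), 3))  # 0.889
```

Because the corrected value is always at least as large as a positive half-test correlation, the formula also illustrates why lengthening a test with comparable items tends to raise its reliability.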
2. Content Validity
Content validity refers to how adequately a test represents a specific field of study or content
domain. It is the extent to which the set of items is representative of the total population of possible
items, e.g. of the topics from which the test items should be sampled.
3. Criterion Related Validity
Criterion-related validity refers to the empirical relationship between the scores on a test and an
external criterion variable or measure. It is used when the scores on a test can be related to a
criterion measure. The criterion refers to some behavior that the scores on a test are used to
predict, e.g. KCSE grades are a predictor variable, while job effectiveness or performance in
university are criterion variables.
There are two types of criterion-related validity:
Concurrent validity
Predictive validity
The two differ only with respect to time. In concurrent validity, the criterion data are gathered at
the same time as the test scores, while in predictive validity, the criterion data are gathered at a
later date, after the test scores, to predict future behavior. Predictive validity is used, for example,
if KCSE results are used to predict performance in first-year university examinations: KCSE then
constitutes the test and the university examinations provide the criterion. In concurrent validity,
the purpose is to determine whether a test can be substituted for another test.
4. Construct Validity
A test's construct validity is the degree to which the test measures the theoretical construct or
trait that it was designed to measure. A construct refers to a factor or trait, such as a domain
of knowledge or an ability, e.g. verbal ability or mathematical ability; any skill or ability can be
regarded as a construct. These skills or abilities cannot be measured directly: to measure
them we need to define them and then test them.
Summary
Reliability has to do with consistency. Unless a test measures consistently, it cannot be reliable.
A test is reliable if its observed scores are highly correlated with its true scores.
We explored three commonly used methods for estimating reliability coefficient:
1. Equivalent form:
Reliability coefficient is the correlation between observed scores on two equivalent tests
(also called parallel or alternate).
2. Test-retest:
This involves testing the same examinees twice with the same test at different times or on different
days, then correlating the scores from the two administrations of the test.
3. Internal consistency (split-half):
The scores on two halves of the test are correlated, and the result is corrected to full test length
using the Spearman-Brown prophecy formula.
Validity of a test refers to the ability of the test to measure what it purports to measure. The kinds
of validity are:
Face validity
Content validity
Criterion related validity
Construct validity
Further reading:
Allen M.J. & Yen W.M. (1976) Introduction to Measurement Theory.
Brown, F.G. (1970) Principles of educational and psychological testing. 2nd ed. New York: Holt, Rinehart & Winston.
Mehrens, W.A. & Lehmann, I.J. (1978) Measurement and Evaluation in Education and Psychology. New York: Holt, Rinehart and Winston.
Self test
1. a) Name three test properties that influence test reliability
b) Discuss how these properties can be manipulated to increase test reliability
3. Distinguish between:
Statistical analysis of test items is known as item analysis; it deals with the difficulty index and the
discrimination index of each item.
Objectives
1. Define:
a) Item analysis
b) Difficulty index of an item
c) Discrimination index of an item.
2. Determine difficulty index and discrimination index of a test item
3. State the desirable limits for
a) Difficulty index
b) Discrimination index
Item Analysis
A test is only as good as the items it contains. Thus, when constructing a test, we must be
concerned with the quality of the items. When evaluating the quality of the items, various
criteria are used:
1. An item should measure the knowledge or skill it is designed to measure. This is the
validity or soundness of an item.
2. We should also be concerned about the quality of expression. Items (or questions)
must be clearly written, grammatical and at the appropriate reading level. You should
therefore check that all serious learners understand the questions, without the wording
giving unfair hints.
3. The statistical characteristics of the item, which are the topic of discussion here.
The other two criteria have already been discussed.
Statistical analysis of test items is what we refer to as item analysis. Item analysis helps an
examiner to:
i. Judge the quality of each item, so that good and poor items can be identified.
ii. Identify the knowledge and skills examinees have and have not mastered. If an item in
a classroom test is answered incorrectly by a majority of the students, this tells the
teacher that something is wrong. Unfortunately, without further investigation, it does
not tell her/him what went wrong. The item may have been misleading or poorly
constructed, the material may have been so difficult that students were not able to
learn it, or the instruction may have been incomplete. Only further analysis would tell
which is the most likely or the true explanation.
Specifically, most item analyses are concerned with three aspects of an item:
a) Difficulty of the item, which is simply the proportion of examinees who answer the
item correctly; it is referred to as the difficulty index.
b) Discrimination power of the item, also called the discrimination index, which is concerned
with whether the item differentiates between people with varying degrees of
knowledge or ability.
c) Content validity, as well as the effect of distracters.
Note: If the difficulty index is low, the validity may still be acceptable, but if the discrimination index
of an item is negative, the item may not be measuring what it is supposed to measure (it is not
valid). It may be measuring something else, but certainly not the intended ability.
Item analysis is important for building an item (or question) bank. It is impractical for teachers to
write new items (questions) every time they prepare a test. Over time, they should build a
test file of the better items to be reused. Item analysis helps here because it identifies good and
bad items; bad items (questions) should be discarded or improved.
Note that item analysis is best suited to multiple-choice items.
Example;
Consider a group of 40 examinees, divided into an upper group and a lower group of 20 each, and
suppose they responded to an item as follows:
               A    B*   C    D    E    Omit
Upper group    0    20   0    0    0    0
Lower group    3    8    4    2    3    0
Asterisk indicates the correct answer (or key). B is the key. Others are distracters (the ones which are
incorrect).
For each item, compute the percentage or proportion of examinees who get the item correct. This is
called the item difficulty index. The item difficulty index can be expressed as a decimal, fraction or
percentage, so it ranges from 0 to 1 (as a decimal or fraction) or from 0% to 100% (as a percentage).
The item difficulty index is denoted by p.
An item with a difficulty index of 0.3 is more difficult than an item with a difficulty index of 0.8. Why?
The index is quite useful in item analysis. If p for an item is very close to 0 or 1, the item should generally
be altered or discarded, because it gives little information about differences among examinees' trait
levels or abilities.
An acceptable range for p is 0.3 to 0.7, because items in this range maximize the information the test
provides about differences among examinees. Note: you can have easy items early in the test for motivational reasons.
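For the example item above, the difficulty index is the number answering correctly in both groups divided by all 40 examinees. The sketch below assumes, as in the example, that the upper and lower groups together contain every examinee; the function names are hypothetical, and the 0.3 to 0.7 check mirrors the range suggested in the text.

```python
def difficulty_index(correct_upper, correct_lower, total_examinees):
    """p = proportion of all examinees who answered the item correctly.

    Assumes, as in the example, that the upper and lower groups together
    contain every examinee.
    """
    return (correct_upper + correct_lower) / total_examinees

def in_acceptable_range(p, low=0.3, high=0.7):
    """True if p lies in the 0.3-0.7 range suggested in the text."""
    return low <= p <= high

p = difficulty_index(20, 8, 40)
print(p, in_acceptable_range(p))  # 0.7 True
```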
Examiner should not forget the purpose of the test. A test used to select graduate students for a university
that admits about 10% of the applicants should contain extremely difficult items. A test used to select
children for a remedial education program should contain very easy items. By now you should have
realized that even objective (multiple-choice) tests can be used at any level of education (even for Ph.D.
programs).
The item discrimination index is obtained by subtracting the number of students in the lower group who
answered the item correctly from the number in the upper group who answered it correctly, and dividing the
difference by the number of students in either group, that is, half the total number of students when the
group is divided into upper and lower halves. In our example:
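$$\text{Discrimination index} = \frac{R_U - R_L}{\frac{1}{2}T} = \frac{20 - 8}{\frac{1}{2} \times 40} = \frac{12}{20} = 0.6$$
where $R_U$ and $R_L$ are the numbers of students answering the item correctly in the upper and lower groups, and T is the total number of students.
This value is usually expressed as a decimal and can range from –1.00 to +1.00.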
If the value is positive, the item has positive discrimination: a larger proportion of the more
knowledgeable students than of the less knowledgeable students got the item right. If the value is zero,
the item has zero discrimination. This occurs when equal numbers of students in the upper and lower groups answer the item correctly.
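The discrimination index formula can be sketched in Python as follows, assuming (as in the example) that the upper and lower groups each contain half of the T examinees. The function name is hypothetical; the first call reproduces the example, and the other two show the zero and negative cases discussed above.

```python
def discrimination_index(correct_upper, correct_lower, total_examinees):
    """(R_U - R_L) / (T / 2): upper and lower groups are each half of the T examinees."""
    return (correct_upper - correct_lower) / (total_examinees / 2)

print(discrimination_index(20, 8, 40))   # 0.6
print(discrimination_index(10, 10, 40))  # 0.0 (same number correct in both groups)
print(discrimination_index(5, 15, 40))   # -0.5 (lower group did better)
```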
In general, a discrimination index of 0.40 or higher is regarded as satisfactory. However, one should not automatically
conclude that because an item has a low discrimination index, it is a poor item and should be discarded.
Items with low or negative discrimination indices should be identified for more careful examination.
Those with low, but positive, discrimination indices should be kept (especially for mastery tests). As long
as an item discriminates in a positive fashion, it is making some contribution to valid measurement of the
students’ competencies. And as long as we need some easy items to instill proper motivation in the
examinees, such items are valuable.
Further reading
Allen M.J. & Yen W.M. (1976) Introduction to Measurement Theory.
Brown, F.G. (1970) Principles of educational and psychological testing. 2nd ed. New York: Holt, Rinehart & Winston.
Mehrens, W.A. & Lehmann, I.J. (1978) Measurement and Evaluation in Education and Psychology. New York: Holt, Rinehart and Winston.
Exercise:
i) What properties are desirable for item difficulty indices and item-discrimination
indices?
ii) Generally, why is an item difficulty index of 0.01 or 0.99 undesirable?
iii) Why is a negative or zero item discrimination index undesirable?
2. Use the information below on an analysis of four test items to answer the questions which
follow:
Item 1
               A*   B    C    D    Omit   Total
Upper group    16   2    1    1    0      20
Lower group    6    6    5    3    0      20
Item 2
               A    B    C*   D    Omit   Total
Upper group    2    4    10   4    0      20
Lower group    4    5    6    5    0      20
Item 3
               A    B*   C    D    Omit   Total
Upper group    2    16   2    0    0      20
Lower group    7    8    5    0    0      20
Item 4
               A    B    C    D*   Omit   Total
Upper group    2    1    1    16   0      20
Lower group    1    1    0    18   0      20
Introduction
A test is a device for measuring psychological variables. As we have seen, a measuring device
must be highly reliable as well as highly valid.
Objectives
We shall talk about tests before looking at validity. Psychological variables or characteristics are
best measured by psychological tests.
Definition of ‘test’:
Measurement answers the question "How much?" That is, measurement provides a description of
a person's (examinee's) performance; it does not provide judgment, that is, it says nothing about
the worth or value of the performance. If we put a value, worth or judgment on it, then we are
evaluating: we are going beyond description and attempting to answer the question "How
good?" This is evaluation. A mark or score like 40 out of 50 is measurement. If we say it is a B+,
then this is evaluation, since a judgment has been made on the value of the mark or score in
terms of how good it is. In short, objective description is measurement, while subjective
judgment of quality is evaluation.
Uses of tests:
Further reading:
Mehrens, W.A. & Lehmann, I.J. (1978) Measurement and Evaluation in Education and
Psychology. New York: Holt, Rinehart and Winston.
Exercise:
1. Discuss five uses of tests
2. Explain why evaluation is important in a program.
3. Distinguish among test, measurement and examination.
Classification of tests
Introduction
There are several ways in which we can classify tests. The purpose of classification is to group
together tests that, by and large, have similar properties; otherwise each test is unique in its own
right.
Objectives
Classification of tests:
Essay type:
Essay questions are subdivided into three major types:
1. Extended (or discussion) response
2. Restricted response
3. Oral.
Extended response:
This is also referred to as the discussion type. Here the question is very open-ended (unstructured),
and no restriction is given. Most university questions in many departments are of this type. Examples:
a. Discuss what is a system
b. Discuss what is a scientific method (approach)
c. Discuss the Information Processing Model of memory.
Restricted response:
Here the student (examinee) is more limited in the form and scope of his answer because he is told
specifically the context that his answer is to take.
Examples:
d. Give the three advantages and three disadvantages of Essay tests
e. Give the three advantages and three disadvantages of multiple-choice items
f. Distinguish between Classical conditioning and Operant conditioning
g. Distinguish among aptitude tests, achievement tests and ability tests
h. Distinguish among Memory, Learning and Insight.
We can refer to these as short answer essay tests.
Oral examination:
Also called a viva, viva voce or defence. It is usually held after a dissertation or thesis has been written
for an advanced degree, a master's or doctorate degree. Essentially, it is to find out how well the
candidate has linked theory and practice in solving his/her problem, and, very importantly, to confirm
that he/she is the one who wrote that thesis or dissertation.
Objective type:
Objective type item can be subdivided into four major types:
1. Short-answer
i. Single word, symbol, formula
ii. Multiple words or phrase
2. True-false (right-wrong, Yes-No)- dichotomous case
3. Multiple choice
4. Matching.
Variations of the multiple-choice format:
There are four frequent forms:
i. One correct answer
ii. Best answer
iii. Analogy type
iv. Reverse type
Others are substitution, incomplete (blank to fill) etc.
Test takers (examinees) attempt to make the highest possible score. The goal is to measure the
upper limits of examinees' abilities. Classroom tests are in this category and are examples of
achievement tests. Others in this category of maximal performance tests are aptitude tests and
ability tests. Note that these are not mutually exclusive: a particular test may serve more than one
of these purposes.
Aptitude tests:
An aptitude test indicates your potential, that is, what you are capable of doing as a result of your formal or
informal experiences. Aptitude tests thus indicate the probability that certain other behaviours
will be acquired or learned. We consider a test to be an aptitude test if:
It measures the results of general and incidental learning experiences.
Its frame of reference is toward the future.
This is in contrast to an achievement test, which measures learning from relatively specific
experiences and focuses on past learning. Thus aptitude tests predict what can be learned in
the future: they measure the ability to acquire certain behaviours or skills given appropriate
opportunity.
Ability tests:
Ability tests indicate the power to perform a task; they measure present status. In this category we
have performance (or practical) tests such as driving, tuning an engine, playing the piano and
swimming.
The distinction between the two lies in how test scores are interpreted. In norm-referenced tests, an
individual's scores are interpreted by comparing them to those of other people in some comparison
(peer or norm) group. In criterion-referenced tests, the concern is mastery of the content
regardless of the performance of the other examinees. In norm-referenced tests, scores usually
need to be transformed for comparability or meaningful interpretation, but in criterion-referenced
tests, scores directly indicate proficiency, mastery or competency.
Thus a score in a criterion-referenced test is a meaningful number representing the level of mastery,
while in a norm-referenced test a raw score may not carry much meaning unless it is converted to
a standard scale to show its position relative to other scores. In other words,
transformations such as conversion to percentile ranks, standardization and normalization are
justifiable for norm-referenced tests. A score in a criterion-referenced test is already meaningful
and should not be subjected to such transformation.
The difference between criterion-referenced tests and norm-referenced tests is just theoretical. The
number 1 student (or candidate) can be seen as the criterion or standard, having attained the
perfect score or mastered the content perfectly. If all had mastered the content equally, they would all be number 1.
But we know this is not the case in practice. In other words, norm-referenced and criterion-referenced
tests differ more in theory than in practice.
From another perspective, we should realize that when we rank candidates or find their positions,
we do so according to their performance (mastery). In both types of test we are talking about
mastery, though from different angles. As long as we are concerned with mastery, both
types of test (criterion-referenced and norm-referenced) have so much in common
that they are hardly different: they end up doing the same thing. Practically they are
the same, but theoretically or philosophically they are different. We do not talk of transforming scores in
criterion-referenced tests, only in norm-referenced tests, because in criterion-referenced tests a score is
already meaningful.
Further reading:
Brown, F.G. (1970) Principles of educational and psychological testing. 2nd ed.
New York: Holt, Rinehart & Winston.
Mehrens, W.A. & Lehmann, I.J. (1978) Measurement and Evaluation in Education and
Psychology. New York: Holt, Rinehart and Winston.
Exercises
1. Describe
i. Typical performance tests
ii. Maximal performance tests.
2. Distinguish among achievement tests, ability tests and aptitude tests
3. Distinguish between norm referenced test and criterion referenced test.