CHAPTER 7:
CHARACTERISTICS OF
A GOOD TEST
Learning outcomes
At the end of the chapter, you must have:
1. enumerated the different ways of establishing validity and
reliability of different assessment tools
2. identified the different factors affecting the validity and
reliability of the test
3. computed and interpreted the validity and reliability
coefficient
LESSON 1. VALIDITY
A good test must first of all be valid. Validity refers to
the extent to which a test measures what it purports to
measure. This is related to the purpose of the test. If the
purpose of the test is to determine competency in
adding two-digit numbers, then the test items will be
about addition of these two-digit numbers. Thus, if the
objective matches the test items prepared, the test is said
to be valid.
Different ways of establishing
validity
FACE VALIDITY
is done by examining the physical
appearance of the instrument.
CONTENT VALIDITY
is done through a careful and critical
examination of the objectives of assessment so that it
reflects the curricular objectives.
For instance, the teacher wishes to validate a test in
English. She requests experts in English to check whether
the test items measure the knowledge, skills, and values they
are supposed to measure, as stated in the course
content/syllabus.
CRITERION - RELATED VALIDITY
is established statistically such that a set of
scores revealed by the measuring instrument is
correlated with the scores obtained in another
external predictor or measure. It has two types:
concurrent and predictive validity.
Two types of criterion-related
validity
1. Concurrent validity - describes the present status of the
individual by correlating the sets of scores obtained from
two measures given concurrently.
For instance, the teacher wants to validate the Mathematics
achievement test he has constructed. He administers the
Mathematics test to a group of Mathematics students. The
result of the test is correlated with an acceptable
Mathematics test which has been previously proven as valid.
If the correlation is "high" the Mathematics test that he
constructed is valid.
2. Predictive validity - describes the future performance of an
individual by correlating the sets of scores obtained from two
measures given at a longer time interval.
For instance, the teacher wishes to estimate how well a
student may do in graduate courses on the basis of how
well he has done on a test taken in his
undergraduate courses. The criterion measure against which
the test scores are validated becomes available only after
a long interval.
CONSTRUCT-RELATED VALIDITY
This is the extent to which the test measures
theoretical and unobservable qualities such
as understanding, math achievement, performance
anxiety and the like, on the basis of evidence
gathered over a period of time. It is established through
intensive study of the test or measurement
instrument using convergent/divergent validation and
factor analysis.
1. Convergent Validity is a type of construct validation
wherein a test has high correlation with another test that
measures the same construct.
2. Divergent Validity is a type of construct validation
wherein a test has low correlation with a test that measures
a different construct. In this case, a high validity occurs only
when there is a low correlation coefficient between the tests
that measure different traits. A correlation coefficient in this
instance is also called validity coefficient.
3. Factor Analysis is another method of assessing the
construct validity of a test using complex statistical
procedures.
FACTORS AFFECTING VALIDITY
1. Poorly constructed items
2. Unclear directions
3. Ambiguous test items
4. Too difficult vocabulary
5. Complicated syntax
6. Inadequate time limit
7. Inappropriate level of difficulty
8. Unintended clues
9. Improper arrangement of test items
LESSON 2.
RELIABILITY
Another characteristic of a good test is
reliability. Reliability refers to the consistency of
test scores. Test scores may vary under
different conditions. The reliability of
test scores is usually reported by a reliability
coefficient. A reliability coefficient is also a
correlation coefficient.
Different ways of establishing
reliability of a test
TEST-RETEST METHOD
In this method, the same test is administered twice
to the same group of students with any time interval
between tests. The two sets of test scores are correlated
using the Pearson Product-Moment Correlation Coefficient (r) or
the Spearman rho formula (ρ), and this correlation provides a
measure of stability. It indicates how stable or
consistent the test results are over a period of time. The
formulae are:
Pearson r:
$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
where x is the first set of scores, y is the
second set of scores, and n is the number of cases.
Spearman rho:
$$ \rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$
where ρ stands for Spearman rho; ΣD², the sum of the squared
differences between ranks; and n, the number of cases.
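The Pearson r computation used in the test-retest method can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `pearson_r` and the five pairs of scores are hypothetical, and the function uses the raw-score form r = [nΣxy - (Σx)(Σy)] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r computed from the raw-score sums of two score lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of cases")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical scores of five students on two administrations of one test.
first_test = [10, 12, 15, 18, 20]
second_test = [11, 13, 14, 19, 21]
print(f"test-retest r = {pearson_r(first_test, second_test):.2f}")
```

A value close to 1 would suggest that students kept roughly the same relative standing on both administrations.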
EQUIVALENT FORM
It is also known as the PARALLEL or ALTERNATE forms method. In this
method, two different but equivalent forms of the test are
administered to the same group of students with a close time
interval. The two forms of the test must be constructed so that the
content, type of test item, difficulty, and instructions for administration
are similar but not identical. For instance, a Form A item might be "How
many meters are there in 8 kilometers?" and the Form B item "How many
kilometers are there in 8,000 meters?" The two sets of test scores
are correlated using the Pearson Product-Moment Correlation Coefficient (r),
and this correlation provides a measure of equivalence of the tests.
TEST-RETEST WITH EQUIVALENT FORMS
METHOD
It is done by giving equivalent forms of the test
with an increased time interval between forms. The
two sets of test scores are correlated using the
Pearson Product-Moment Correlation Coefficient (r), and this
correlation provides measures of stability and
equivalence of the tests.
SPLIT-HALF METHOD
In this method, the test is administered once and the
equivalent halves of the test are scored. The common procedure
is to divide the test into odd-numbered and even-numbered
items. The two halves of the test must be similar but not
identical in content, number of items and difficulty. This
provides two scores for each student. The scores obtained in
the two halves are correlated using Pearson r. The result is the
reliability coefficient for a half test. Since this reliability holds
only for a half test, the reliability coefficient for the whole test is
estimated by using the Spearman-Brown formula.
Spearman-Brown formula
$$ r_{wt} = \frac{2\,r_{ht}}{1 + r_{ht}} $$
where:
r_wt = reliability of the whole test
r_ht = reliability of the half test
This coefficient (r_wt) provides a measure of internal
consistency. It indicates the degree to which consistent results are
obtained from the two halves of the test.
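The split-half procedure (score the odd-numbered and even-numbered items separately, correlate the two half scores, then step the result up with Spearman-Brown, whole = 2 × half / (1 + half)) can be sketched as below. The 6-student, 6-item response matrix is hypothetical, and the function names are illustrative assumptions rather than part of the lesson.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from raw-score sums."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den


def spearman_brown(r_half):
    """Estimate whole-test reliability from half-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """responses: one row per student, one 0/1 entry per item."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)


# Hypothetical right/wrong (1/0) responses: 6 students x 6 items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
r_half, r_whole = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, whole-test estimate = {r_whole:.2f}")
```

For a positive half-test correlation, the stepped-up estimate is always at least as large as the half-test value, which reflects the fact that the whole test is twice as long as either half.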
KUDER-RICHARDSON METHOD
In this method, the test is administered once and
the total test is scored. Then the proportion (or
percentage) of the students passing and not passing
each item is obtained. It has two types: KR-20 and KR-21.
• Kuder-Richardson 20 (KR-20) is applicable only in situations
where students’ responses are scored dichotomously, and therefore
is most useful with traditional test items that are scored as right or
wrong, true or false, or yes or no. It uses the formula:
$$ KR_{20} = \frac{k}{k-1}\left[1 - \frac{\sum pq}{s^2}\right] $$
where
k = number of items
p = proportion of the students who got the item correct
(difficulty index)
q = 1 - p
s² = variance of the scores, computed as
$$ s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} $$
where n = number of scores (cases)
Σx² = summation of the squares of the scores (x)
Σx = summation of the scores (x)
• Kuder-Richardson 21 (KR-21) is not limited to test
items that are scored dichotomously. It uses the
formula:
$$ KR_{21} = \frac{k}{k-1}\left[1 - \frac{\bar{x}(k - \bar{x})}{k\,s^2}\right] $$
where:
k = number of items
x̄ = mean value
s² = variance of the total score
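As a rough sketch of how KR-20 can be computed directly from right/wrong responses, the snippet below takes p and q from each item column and the variance from the students' total scores, then applies KR-20 = [k/(k-1)][1 - Σpq/s²]. The response matrix and the function names are hypothetical, and the variance uses the n(n-1) form, [nΣx² - (Σx)²] / [n(n-1)].

```python
def score_variance(scores):
    """Variance using [n*sum(x^2) - (sum x)^2] / [n(n-1)]."""
    n = len(scores)
    return (n * sum(s * s for s in scores) - sum(scores) ** 2) / (n * (n - 1))


def kr20(responses):
    """responses: one row per student, one 0/1 entry per item."""
    n_students = len(responses)
    k = len(responses[0])                      # number of items
    totals = [sum(row) for row in responses]   # each student's total score
    s2 = score_variance(totals)
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n_students  # difficulty index
        sum_pq += p * (1 - p)                                  # p times q
    return (k / (k - 1)) * (1 - sum_pq / s2)


# Hypothetical right/wrong (1/0) responses: 6 students x 6 items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
print(f"KR-20 = {kr20(responses):.2f}")
```

Keeping the item-level matrix, rather than only total scores, is what makes KR-20 possible; when only the mean and variance of the totals are available, KR-21 is the usual fallback.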
FACTORS AFFECTING RELIABILITY OF A
TEST
1. Length of the test
2. Item difficulty
3. Objective scoring
4. Heterogeneity of the student group
5. Limited time
RELIABILITY COEFFICIENT
Reliability coefficient is a measure of the amount of
error associated with the test scores.
Description of Reliability Coefficient
a) The range of the reliability coefficient is from 0 to 1.0
b) The acceptable range value is 0.60 or higher.
c) The higher the value of the reliability coefficient, the
more reliable the overall test scores are.
Interpretation of Reliability Coefficient
a) The group variability will affect the size of the reliability coefficient.
Higher coefficient results from heterogeneous groups than from the
homogeneous groups. As group variability increases, reliability goes
up.
b) Scoring reliability limits test score reliability. If tests are scored
unreliably, error is introduced. This will limit the reliability of the test
scores.
c) Test length affects test score reliability. As the length increases, the
test's reliability tends to go up.
d) Item difficulty affects test score reliability. As test items become very
easy or very difficult, the test's reliability goes down.
Level of Reliability Coefficient

Reliability coefficient   Interpretation
0.91-1.00                 Excellent reliability. Very ideal for a classroom test.
0.81-0.90                 Very high reliability. Very good for a classroom test.
0.71-0.80                 High reliability. Good for a classroom test. There are probably a few items that need to be improved.
0.61-0.70                 Moderate reliability. The test needs to be supplemented by other measures (more tests) to determine grades.
0.51-0.60                 Low reliability. Suggests a need for revision of the test, unless it is quite short (ten or fewer items). Needs to be supplemented by other measures (more tests) to determine grades.
0.50 and below            Questionable reliability. The test should not contribute heavily to the course grade, and it needs revision.
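The levels in the table can be turned into a small lookup, sketched below. The function name `interpret_reliability` is hypothetical; the cut-offs are the ones listed in the table, applied to the coefficient after rounding to two decimal places.

```python
def interpret_reliability(coefficient):
    """Return the reliability level for a coefficient, per the table."""
    levels = [
        (0.91, "Excellent reliability"),
        (0.81, "Very high reliability"),
        (0.71, "High reliability"),
        (0.61, "Moderate reliability"),
        (0.51, "Low reliability"),
    ]
    rounded = round(coefficient, 2)  # the table is stated to two decimals
    for lower_bound, label in levels:
        if rounded >= lower_bound:
            return label
    return "Questionable reliability"


for value in (0.91, 0.76, 0.50):
    print(value, "->", interpret_reliability(value))
```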
Example 1. Prof. Santos gave a test to his 10 students in an Elementary Statistics class
twice, with a one-day interval. The test given after one day is exactly the same test given the
first time. The scores below were gathered in the first test (X) and second test (Y). Using the
test-retest method, is the test reliable? Show the complete solution using the Pearson r formula.
Student   First test (X)   Second test (Y)
1         36               38
2         26               34
3         38               38
4         15               27
5         17               25
6         28               26
7         32               35
8         35               36
9         12               19
10        35               38
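Solution: Find Σx, Σy, Σxy, Σx², and Σy².
Student   First test (x)   Second test (y)   xy     x²     y²
1         36               38                1368   1296   1444
2         26               34                884    676    1156
3         38               38                1444   1444   1444
4         15               27                405    225    729
5         17               25                425    289    625
6         28               26                728    784    676
7         32               35                1120   1024   1225
8         35               36                1260   1225   1296
9         12               19                228    144    361
10        35               38                1330   1225   1444
Total     274              316               9192   8332   10400
n = 10
$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{(10)(9192) - (274)(316)}{\sqrt{\left[(10)(8332) - (274)^2\right]\left[(10)(10400) - (316)^2\right]}} = \frac{5336}{\sqrt{(8244)(4144)}} = 0.91 $$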
Analysis: The reliability coefficient using Pearson r is 0.91, which means that the test has excellent reliability. The scores of
the 10 students on the two administrations, given one day apart, are consistent. Hence, the test is very ideal for a classroom test.
Note: Compute the reliability coefficient of the same data using Spearman rho. Is the test reliable?
Solution: Rank the scores in the first test (Rx), then rank the scores in the second test (Ry). Get the
difference between each pair of ranks (Rx - Ry) to get D. Then multiply D by itself to get D².
Student   First test (x)   Second test (y)   Rank of x (Rx)   Rank of y (Ry)   Difference between ranks (D)   Square of the difference (D²)
1         36               38                2                2                0                              0
2         26               34                7                6                1                              1
3         38               38                1                2                -1                             1
4         15               27                9                7                2                              4
5         17               25                8                9                -1                             1
6         28               26                6                8                -2                             4
7         32               35                5                5                0                              0
8         35               36                3.5              4                -0.5                           0.25
9         12               19                10               10               0                              0
10        35               38                3.5              2                1.5                            2.25
n = 10                                                                                                        ΣD² = 13.5
$$ \rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(13.5)}{10(10^2 - 1)} = 1 - \frac{81}{990} = 0.92 $$
Analysis: The reliability coefficient using Spearman rho is 0.92, which means that
the test has excellent reliability. The scores of the 10 students on the two administrations,
given one day apart, are consistent. Hence, the test is very ideal for a classroom test.
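The ranking step is where most hand-computation slips occur, especially with tied scores. The sketch below ranks the Example 1 scores (as listed in the solution table) from highest to lowest, gives tied scores the average of the ranks they occupy, and then applies ρ = 1 - 6ΣD² / [n(n² - 1)]. The helper name `average_ranks` is a hypothetical choice for this illustration.

```python
def average_ranks(scores):
    """Rank from highest (rank 1) to lowest; ties share the mean rank."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(scores):
        first = ordered.index(value) + 1   # best rank occupied by this score
        count = ordered.count(value)       # how many students share it
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[s] for s in scores]


# Example 1 scores, as listed in the solution table.
x = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
y = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]

rx, ry = average_ranks(x), average_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(x)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("Rx:", rx)
print("Ry:", ry)
print(f"sum of D^2 = {sum_d2}, rho = {rho:.2f}")
```

The three students who scored 38 on the second test occupy ranks 1, 2 and 3, so each receives rank 2; the two students who scored 35 on the first test share rank 3.5.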
Example 2. Prof. Geronimo gave a test to her 10 students in a Biology class twice, with a one-
week interval. The test given after one week is a parallel form of the test given
the first time. The scores below were gathered in the first test (x) and the second
or parallel test (y). Using the equivalent or parallel forms method, is the test reliable? Show the
complete solution using the Pearson r formula.
Student   First test (x)   Parallel test (y)
1         12               20
2         20               22
3         19               23
4         17               20
5         25               25
6         22               20
7         15               19
8         16               18
9         23               25
10        21               24
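Solution: Find Σx, Σy, Σxy, Σx², and Σy².
Student   First test (x)   Second test (y)   xy     x²     y²
1         12               20                240    144    400
2         20               22                440    400    484
3         19               23                437    361    529
4         17               20                340    289    400
5         25               25                625    625    625
6         22               20                440    484    400
7         15               19                285    225    361
8         16               18                288    256    324
9         23               25                575    529    625
10        21               24                504    441    576
Total     190              216               4174   3754   4724
n = 10
$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{(10)(4174) - (190)(216)}{\sqrt{\left[(10)(3754) - (190)^2\right]\left[(10)(4724) - (216)^2\right]}} = \frac{700}{\sqrt{(1440)(584)}} = 0.76 $$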
Analysis: The reliability coefficient using Pearson r is 0.76, which means that the test has high
reliability. The scores of the 10 students on the two forms, given one week apart, are consistent.
Hence, the test is good for a classroom test, but there are probably a few items that need to be
improved.
Note: Compute the reliability coefficient of the same data using Spearman rho. Is the test reliable?
Solution: Rank the scores in the first test (Rx), then rank the scores in the second test (Ry). Get the
difference between each pair of ranks (Rx - Ry) to get D. Then multiply D by itself to get D².
Student   First test (x)   Second test (y)   Rank of x (Rx)   Rank of y (Ry)   Difference between ranks (D)   Square of the difference (D²)
1         12               20                10               7                3                              9
2         20               22                5                5                0                              0
3         19               23                6                4                2                              4
4         17               20                7                7                0                              0
5         25               25                1                1.5              -0.5                           0.25
6         22               20                3                7                -4                             16
7         15               19                9                9                0                              0
8         16               18                8                10               -2                             4
9         23               25                2                1.5              0.5                            0.25
10        21               24                4                3                1                              1
n = 10                                                                                                        ΣD² = 34.5
$$ \rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(34.5)}{10(10^2 - 1)} = 1 - \frac{207}{990} = 0.79 $$
Analysis: The reliability coefficient using Spearman rho is 0.79, which means that
the test has high reliability. The scores of the 10 students on the two forms, given one
week apart, are consistent. Hence, the test is good for a classroom test, but there
are probably a few items that need to be improved.
Example 3. Prof. Quinto gave a test to her 10 students in a Filipino class. The test was administered
only once. The scores of the students on the odd (O) and even (E) items were gathered and are shown below. Using
the split-half method, is the test reliable? Show the complete solution using Pearson r and
the Spearman-Brown formula.
Student Odd (x) Even(y)
1 15 20
2 19 17
3 20 24
4 25 21
5 20 23
6 18 22
7 19 25
8 26 24
9 20 18
10 18 17
Step 1. Use the Pearson r to get the reliability of half
of the test:
Solution: Find Σx, Σy, Σxy, Σx², and Σy².
Student   Odd (x)   Even (y)   xy     x²     y²
1         15        20         300    225    400
2         19        17         323    361    289
3         20        24         480    400    576
4         25        21         525    625    441
5         20        23         460    400    529
6         18        22         396    324    484
7         19        25         475    361    625
8         26        24         624    676    576
9         20        18         360    400    324
10        18        17         306    324    289
Total     200       211        4249   4096   4533
n = 10
$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{(10)(4249) - (200)(211)}{\sqrt{\left[(10)(4096) - (200)^2\right]\left[(10)(4533) - (211)^2\right]}} = \frac{290}{\sqrt{(960)(809)}} = 0.33 $$
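Step 2. Use the Spearman-Brown formula to get the reliability of the
whole test, where r_ht = 0.33 is the half-test reliability from Step 1
and r_wt is the reliability of the whole test.
$$ r_{wt} = \frac{2\,r_{ht}}{1 + r_{ht}} = \frac{2(0.33)}{1 + 0.33} = \frac{0.66}{1.33} = 0.50 $$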
Analysis: The reliability coefficient using the Spearman-Brown formula is 0.50, which
means that the test has questionable reliability. Hence, the test items should be revised.
Example 4. Prof. Madela administered a 40-item test in English to his Grade VI pupils in
Mayondon Elementary School. Below are the scores of 15 pupils. Find the reliability using the
Kuder-Richardson 21 formula.
student Score (x)
1 16
2 25
3 35
4 39
5 25
6 18
7 19
8 22
9 33
10 36
11 20
12 17
13 26
14 35
15 39
Solve for the variance and the mean of the scores using the table below.
Student   Score (x)   x²
1         16          256
2         25          625
3         35          1225
4         39          1521
5         25          625
6         18          324
7         19          361
8         22          484
9         33          1089
10        36          1296
11        20          400
12        17          289
13        26          676
14        35          1225
15        39          1521
Total     405         11917
n = 15
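Variance:
$$ s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} = \frac{(15)(11917) - (405)^2}{(15)(14)} = \frac{178755 - 164025}{210} = \frac{14730}{210} = 70.14 $$
Mean:
$$ \bar{x} = \frac{\sum x}{n} = \frac{405}{15} = 27 $$
Solve for the reliability coefficient using the Kuder-Richardson 21 formula, with k = 40 items.
$$ KR_{21} = \frac{k}{k-1}\left[1 - \frac{\bar{x}(k - \bar{x})}{k\,s^2}\right] = \frac{40}{39}\left[1 - \frac{27(40 - 27)}{(40)(70.14)}\right] = 0.90 $$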
Analysis: The reliability coefficient using KR-21 formula is 0.90 which means that the test has
a very high reliability. Meaning, the test is very good for a classroom test.
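The Example 4 arithmetic can be checked with a few lines of Python. This sketch uses the 15 scores from the example and the usual KR-21 form, KR-21 = [k/(k-1)][1 - x̄(k - x̄)/(k s²)], with the variance taken as [nΣx² - (Σx)²] / [n(n-1)]; the variable names are illustrative only.

```python
# Scores of the 15 pupils on the 40-item test (Example 4).
scores = [16, 25, 35, 39, 25, 18, 19, 22, 33, 36, 20, 17, 26, 35, 39]
k = 40                      # number of items
n = len(scores)             # number of pupils

sum_x = sum(scores)
sum_x2 = sum(s * s for s in scores)

mean = sum_x / n
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

print(f"sum x = {sum_x}, sum x^2 = {sum_x2}")
print(f"mean = {mean:.0f}, variance = {variance:.2f}, KR-21 = {kr21:.2f}")
```

Because KR-21 needs only the number of items, the mean and the variance, it can be computed even when item-by-item responses were not recorded.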
Example 5: Ms. Gonzaga administered a 20-item true or false test to her 40 Grade VIII students in Los
Banos National High School. Below is the number of students who answered each item correctly. Find the reliability using the
Kuder-Richardson 20 formula.
Item number   Number of students who answered correctly (x)
1 25
2 36
3 28
4 23
5 25
6 33
7 38
8 15
9 23
10 25
11 36
12 35
13 19
14 39
15 28
16 33
17 19
18 37
19 36
20 25
Solution: First, solve for the difficulty index of each
item (p) by dividing the number of students who got the correct
answer by the total number of students. Then solve for q by
subtracting p from 1 (q = 1 - p). Find the product of p and q (pq), then get
the summation of pq. Finally, solve for the variance and substitute all the
values in the KR-20 formula.
$$ KR_{20} = \frac{k}{k-1}\left[1 - \frac{\sum pq}{s^2}\right] $$
Item number   Score (x)   p       q       pq         x²
1             25          0.625   0.375   0.234375   625
2             36          0.9     0.1     0.09       1296
3             28          0.7     0.3     0.21       784
4             23          0.575   0.425   0.244375   529
5             25          0.625   0.375   0.234375   625
6             33          0.825   0.175   0.144375   1089
7             38          0.95    0.05    0.0475     1444
8             15          0.375   0.625   0.234375   225
9             23          0.575   0.425   0.244375   529
10            25          0.625   0.375   0.234375   625
11            36          0.9     0.1     0.09       1296
12            35          0.875   0.125   0.109375   1225
13            19          0.475   0.525   0.249375   361
14            39          0.975   0.025   0.024375   1521
15            28          0.7     0.3     0.21       784
16            33          0.825   0.175   0.144375   1089
17            19          0.475   0.525   0.249375   361
18            37          0.925   0.075   0.069375   1369
19            36          0.9     0.1     0.09       1296
20            25          0.625   0.375   0.234375   625
Total         578                         3.38875    17698
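p of item 1 = 25/40 = 0.625
q of item 1 = 1 - 0.625 = 0.375
pq = (0.625)(0.375) = 0.234375
Note: Continue the same procedure up to the last item.
$$ s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} = \frac{(20)(17698) - (578)^2}{(20)(19)} = \frac{353960 - 334084}{380} = \frac{19876}{380} = 52.31 $$
Solve using the KR-20 formula, with k = 20 items:
$$ KR_{20} = \frac{k}{k-1}\left[1 - \frac{\sum pq}{s^2}\right] = \frac{20}{19}\left[1 - \frac{3.38875}{52.31}\right] = 0.98 $$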
Analysis: The reliability coefficient using KR-20 formula is 0.98 which means that the test
has an excellent reliability. Meaning, the test is very ideal for a classroom test.
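The table arithmetic in Example 5 can be retraced with the short sketch below. It follows the example's own steps: p is the number of correct answers for an item divided by 40 students, and the variance is taken from the x column exactly as in the worked solution. The variable names are illustrative, and the sketch assumes the x value of 25 for item 5, which is what its p of 0.625 and x² of 625 imply.

```python
# Number of students (out of 40) who answered each of the 20 items correctly.
correct_counts = [25, 36, 28, 23, 25, 33, 38, 15, 23, 25,
                  36, 35, 19, 39, 28, 33, 19, 37, 36, 25]
n_students = 40
k = len(correct_counts)     # number of items

# Sum of p*q over all items, where p is the difficulty index and q = 1 - p.
sum_pq = sum((c / n_students) * (1 - c / n_students) for c in correct_counts)

# Variance of the x column, as computed in the worked solution.
n = len(correct_counts)
sum_x = sum(correct_counts)
sum_x2 = sum(c * c for c in correct_counts)
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

kr20 = (k / (k - 1)) * (1 - sum_pq / variance)

print(f"sum pq = {sum_pq:.5f}, variance = {variance:.2f}, KR-20 = {kr20:.2f}")
```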