Module 6: Administering, Analysing and Improving Tests
These components constitute what we are calling test assembly. Let’s consider
each a little more closely.
Test Assembly Checklist:

1. Are items of similar format grouped together?
2. Are items arranged from easy to hard levels of difficulty?
3. Are items properly spaced?
4. Are items and options on the same page?
5. Are diagrams, maps, and supporting material above the designated items and on the same page with the items?
6. Are answers random?
7. Are blanks for name and date included?
8. Have the directions been checked for clarity?
9. Has the test been proofread for errors?
10. Do items avoid racial and gender bias?
Part 2. After evaluating, reconstruct the items that need revision so that they satisfy the requirements of test assembly.
Item #: __________________________________________________
Revised Item #: ___________________________________________
Difficulty Index
An item's difficulty index is obtained by calculating the p value (p), which is the proportion of students answering the item correctly. The difficulty of an item, or item difficulty, is defined as the number of students who are able to answer the item correctly divided by the total number of students.

Item 1: In a class of 45, 30 responded correctly in Item 1.

p = 30/45 = 0.67

Item 2: In the same class, only 10 responded correctly in Item 2.

p = 10/45 = 0.22

Item 2 has a p value of 0.22. Out of 45, only 10 or 22% got the item right while 35 or 78% missed it.
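For readers who want to compute this directly, here is a minimal Python sketch of the p value calculation. The function name is ours for illustration; the figures simply restate the two worked examples above.

def difficulty_index(num_correct: int, num_students: int) -> float:
    """Item difficulty p: proportion of students answering the item correctly."""
    return num_correct / num_students

# The two worked examples above: a class of 45 students
print(round(difficulty_index(30, 45), 2))  # Item 1: 0.67
print(round(difficulty_index(10, 45), 2))  # Item 2: 0.22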
Discrimination Index
The power of an item to discriminate between informed and uninformed groups
or between more knowledgeable and less knowledgeable learners is shown using
the item discrimination index (D). This is an item statistic that can reveal useful
information for improving an item. Basically, an item-discrimination index shows the
relationship between the student’s performance in an item (i.e. right or wrong) and
his/her total performance in the test represented by the total score.
For classroom tests, the discrimination index shows if a difference exists
between the performance of those who scored high and those who scored low in an
item. As a general rule, the higher the discrimination index (D), the more marked
the magnitude of the difference is, and thus, the more discriminating the item is.
The nature of the difference, however, can take different directions:
a. Positively discriminating item - the proportion of the high-scoring group is greater than that of the low-scoring group.
b. Negatively discriminating item - the proportion of the high-scoring group is less than that of the low-scoring group.
c. Not discriminating - the proportion of the high-scoring group is equal to that of the low-scoring group.
As you can see, obtaining the proportion of a group that answers the item correctly is actually getting the p value of the item for that group. To get D, then, is to get the difference between the p value of the upper half and the p value of the lower half. So the formula for the discrimination index (D) can also be given as (Popham, 2011):

D = p(upper group) - p(lower group)
To obtain the proportions of the upper and lower groups responding to the item
correctly, the teacher follows these steps:
1. Score the test papers using a key to correction to obtain the total score of each
student. The maximum score is the total number of objective items.
2. Order the test papers from highest to lowest score.
3. Split the test papers into halves: high group and low group.
For a class of 50 or fewer students, do a 50 - 50 split. Take the upper half as
the HIGH GROUP and the lower half as the LOW GROUP.
For a big group of 100 or so, take the upper 25 - 27% and the lower 25 - 27%.
Maintain equal numbers of test papers for Upper and Lower groups.
4. Obtain the p value for the upper group and p value for the lower group.
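Taken together, the four steps can be expressed in a short Python sketch. This is an illustration only; the class list below is hypothetical, and the function name is ours.

def discrimination_index(scores, item_correct):
    # Step 1 is assumed done: 'scores' holds each student's total score and
    # 'item_correct' records whether that student answered the item correctly.
    # Step 2: order the test papers from highest to lowest score.
    papers = sorted(zip(scores, item_correct), key=lambda p: p[0], reverse=True)
    # Step 3: 50-50 split for a class of 50 or fewer (equal group sizes).
    half = len(papers) // 2
    upper, lower = papers[:half], papers[-half:]
    # Step 4: p value for each group, then D as their difference.
    p_upper = sum(flag for _, flag in upper) / half
    p_lower = sum(flag for _, flag in lower) / half
    return p_upper - p_lower

# Hypothetical class of 10: total scores and right/wrong on the item analysed
scores = [48, 45, 44, 40, 39, 35, 30, 28, 25, 20]
item_correct = [True, True, True, True, False, True, False, False, False, False]
print(round(discrimination_index(scores, item_correct), 2))  # 0.8 - 0.2 = 0.6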
Distracter Analysis
Another empirical procedure to discover areas for item-improvement utilizes an
analysis of the distribution of responses across the distracters. Especially when the
difficulty index and the discrimination index of an item suggest that it is a
candidate for revision, distracter analysis becomes a useful follow-up. It can detect
differences in how the more able students respond to the distracters in a
multiple-choice item compared to how the less able ones do. It can also provide an
index of the plausibility of the alternatives, that is, whether they are functioning as
good distracters. Distracters not chosen at all, especially by the uninformed
students, need to be revised to increase their attractiveness.
Item   p     D      Group   A    B    C    D
1      …     …      Upper   …    …    …    …
                    Lower   2    0    12   6
2      .45   -.50   Upper   2    *4   10   4
                    Lower   5    14   1    0
Analysis:
What kinds of items do you see based on their D?
What does their respective D indicate? Cite the data supporting this.
Which of the two items is more discriminating? Why?
Which items need to be revised?
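The item statistics can be read straight off a distracter table. Here is a minimal sketch using Item 2's counts from the table above, assuming groups of 20 (the row totals) and key B:

# Response counts per alternative for Item 2; 20 students per group (assumed)
upper = {"A": 2, "B": 4, "C": 10, "D": 4}
lower = {"A": 5, "B": 14, "C": 1, "D": 0}
key, group_size = "B", 20

p = (upper[key] + lower[key]) / (2 * group_size)   # difficulty index
d = (upper[key] - lower[key]) / group_size         # discrimination index
print(f"p = {p:.2f}, D = {d:.2f}")                 # p = 0.45, D = -0.50

# Plausibility check: a distracter nobody chooses needs revision
for option in upper:
    if option != key and upper[option] + lower[option] == 0:
        print(f"Distracter {option} attracted no one; revise it.")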
Sensitivity to Instruction Index (Si)
Notice that the calculation for Si carries the same concept as the discrimination
index, except that the difference in proportion is obtained between a post-test and a
pre-test given to the same group. Similar to the interpretation of D, the higher the Si
value, the more sensitive the item is in showing change as a result of instruction.
This item statistic gives additional information regarding the efficiency and validity
of the item.
There could, however, be reasons why the Si of a test item does not register a
meaningful difference between the post-test and the pre-test. Especially for a
knowledge-level item, it is possible that the content was not taken up at all during
instruction, so the students did not have the chance to learn it, or that they already
knew it prior to instruction. The teacher should take note of these items when
reviewing the content coverage for the period.
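In code, the Si computation differs from D only in which two proportions are compared. A brief sketch with hypothetical counts:

def sensitivity_index(pre_correct: int, post_correct: int, num_students: int) -> float:
    """Si = p(post-test) - p(pre-test) for the same group of students."""
    return post_correct / num_students - pre_correct / num_students

# Hypothetical item: 8 of 40 correct before instruction, 32 of 40 after
print(round(sensitivity_index(8, 32, 40), 2))  # 0.80 - 0.20 = 0.6, sensitive to instruction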
Activity 6.2
Task Description: This activity will test your ability to apply empirical procedures for
item-improvement. Solve and answer the following.
1. A final test in Science was administered to a Grade IV class of 50. The teacher
wants to further improve the items for next year's use. Calculate a quality index
using the given data and indicate the possible revision needed by some items.
Item   Number getting item correct   Index          Revision needed to be done
1      34                            ____________   ____________
3      10                            ____________   ____________
4      46                            ____________   ____________
5      8                             ____________   ____________
2. Below are additional data collected for the same items. Calculate another quality
index and indicate what needs to be improved, with the obtained index as a basis.
Item   Upper Group   Lower Group   Index          Revision needed to be done
1      25            9             ____________   ____________
2      9             9             ____________   ____________
3      2             8             ____________   ____________
4      38            8             ____________   ____________
5      1             7             ____________   ____________
3. A distracter analysis table is shown for a test item administered to a class of 60.
Obtain the necessary item statistics using the given data.
Item   Difficulty   Discrimination   Group    Alternatives
       Index (p)    Index (D)        (N=30)   A    B    *C   D    Omit
1      ?            ?                Upper    2    18   5    0    ?
                                     Lower    0    10   20   0    ?
Peer review
There are schools that encourage peer or collegial review of assessment
instruments among their teachers. Time is provided for this activity, and it has
almost always yielded good results for improving tests and performance-based
assessment tasks. During these teacher dyad or triad sessions, those teaching
the same subject area can openly review together the classroom tests and tasks
they have devised against some consensual criteria. The suggestions given by
test experts can actually be used collegially as a basis for a review checklist:
a. Do the items follow the specific and general guidelines in writing items
especially on:
Being aligned to instructional objectives?
Making the problem clear and unambiguous?
Providing plausible options?
Avoiding unintentional clues?
Having only one correct answer?
b. Are the items free from inaccurate content?
c. Are the items free from obsolete content?
d. Are the test instructions clearly written for students to follow?
e. Is the level of difficulty of the test appropriate to the level of the learners?
f. Is the test fair to all kinds of students?
Student Review
Engagement of students in reviewing items has become a laudable practice
for improving classroom tests. The judgement is based on the students’
experience in taking the test, their impressions and reactions during the testing
event. The process can be efficiently carried out through the use of a review
questionnaire. Popham (2011) illustrates a sample questionnaire shown in Table
3. It is better to conduct the review activity a day after taking the test so the
students still remember the experience when they see a blank copy of the test.
Comparing the Percentage Answering Each Item Correctly on Both Pre- and
Post-test
If your test is sensitive to your objectives (and assuming you teach to your
objectives), the majority of learners should receive a low score on the test prior to
your instruction and a high score afterward. This method can be used to determine
whether this is happening. Subtract the percentage of students passing each item
before your instruction from the percentage of students passing each item after
your instruction. The more positive the difference, the more you know the item is
tapping the content you are teaching. This method is similar to the first step as
described in the preceding section. For example, consider the following percentages
for five test items:
Step 1: Find the number of items each student failed on the pretest, prior to
instruction, but passed on the post-test, after instruction.
The asterisks indicate just the items counted in Step 1 for Bobby. This count is
then repeated for each student.
Step 2: Add the counts in Step 1 for all students and divide by the number of
students.
Step 3: Divide the result from Step 2 by the number of items on the test.
Step 4: Multiply the result from Step 3 by 100.
Let’s see how this would work for a 25-item test given to five students before
and after instruction.
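The original worked table is not reproduced here, but the four steps can be followed in a sketch with hypothetical records for a 25-item test and five students, where each matrix entry marks whether the student passed (1) or failed (0) an item:

import random

random.seed(1)  # hypothetical data: 5 students x 25 items, pass/fail
num_students, num_items = 5, 25
pretest = [[random.random() < 0.3 for _ in range(num_items)] for _ in range(num_students)]
posttest = [[random.random() < 0.8 for _ in range(num_items)] for _ in range(num_students)]

# Step 1: per student, count items failed on the pre-test but passed on the post-test
counts = [sum(not pre and post for pre, post in zip(p_row, q_row))
          for p_row, q_row in zip(pretest, posttest)]

# Step 2: average the counts over all students
average = sum(counts) / num_students

# Steps 3 and 4: divide by the number of items, then express as a percentage
percentage = average / num_items * 100
print(f"{percentage:.0f}% of the test reflects pre-to-post gains")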
[1] De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana
Publishing Co., Inc.
[2] Kubiszyn, T., & Borich, G. (2013). Educational testing and measurement:
Classroom application and practice. John Wiley & Sons, Inc.
[3] Navarro, R.L., Santos, R.G., & Corpuz, B.B. (2019). Assessment of learning 1.
Lorimar Publishing, Inc.
[4] Popham, W.J. (2017). Classroom assessment: What teachers need to know.
Pearson Education, Inc.