Item
Analysis
“Why do we need to
analyze the items of
the test?”
Assessment is a description or
explanation of something,
usually based on careful
consideration or investigation.
“Purpose of
assessing the item”
To determine
its difficulty.
Item Analysis
is the process of examining
students’ responses to
individual items in the test. It
consists of different
procedures for assessing the
quality of the test items.
There are two types of item
analysis:
- Quantitative Item Analysis
- Qualitative Item Analysis
Purpose
• improves items for reuse in later tests
• eliminates ambiguous or misleading items
in a single test administration
• increases instructors' skill in test
construction
• identifies specific areas of course content
which need greater emphasis or clarity
Before Item Analysis
• Editorial Review
– 1st: a few hours after the first draft was written
– 2nd: involve one or more other teachers, esp.
those w/ content knowledge of the field
How to Analyze
the Item?
1. Facility index – the percentage of examinees who
correctly answer an item. The higher the index, the
easier the item.
Guide for interpreting the facility index:
81-100% correct - very easy
61- 80% correct - easy
41- 60% correct - Average facility
21- 40% correct - Difficult
0- 20% correct - Very difficult
Difficulty Index
refers to the proportion of
the number of students in
the upper and lower groups
who answered an item
correctly.
Difficulty Index
• Occasionally everyone knows the answer
An unusually high level of success may be due to:
a) previous teacher
b) knowledge from home; child’s background
c) excellent teaching
d) a poorly constructed, easily guessed item
Difficulty Index
• Low scores
– Is it the student’s fault for “not trying”?
a) Motivation level
b) Ability of teacher to get a point across
c) Construction of the test item
Difficulty Index Formula
DF = n / N, where
DF = difficulty index
n = number of students selecting the item correctly
in the upper group and lower group
N = total number of students who
answered the test
Difficulty Index
p = total who answered correctly
total taking the test
*p is the difficulty index
Difficulty Index
p = 22 get the correct answer
25 students take the test
p= ?
*p is the difficulty index
Difficulty Index
p = 22 get the correct answer
25 students take the test
p = 0.88
*p is the difficulty index
Difficulty Index
p = 0.88
88% of the students got it right
high difficulty index
*p is the difficulty index
Difficulty Index
p = 0.88
a) item was too easy
b) students were well taught
*p is the difficulty index
Level of Difficulty
Index Range Difficulty Level
0.00-0.20 Very Difficult
0.21-0.40 Difficult
0.41-0.60 Average/Moderately
Difficult
0.61-0.80 Easy
0.81-1.00 Very Easy
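The difficulty index p = n/N and the level table above can be sketched in Python. This is an illustration, not part of the original material; the function names are ours.

```python
# Difficulty index p = n/N and the Level of Difficulty table from the slides.

def difficulty_index(n_correct: int, n_total: int) -> float:
    """Proportion of examinees who answered the item correctly."""
    return n_correct / n_total

def difficulty_level(p: float) -> str:
    """Map p onto the index ranges in the Level of Difficulty table."""
    if p <= 0.20:
        return "Very Difficult"
    elif p <= 0.40:
        return "Difficult"
    elif p <= 0.60:
        return "Average/Moderately Difficult"
    elif p <= 0.80:
        return "Easy"
    return "Very Easy"

p = difficulty_index(22, 25)     # the worked example: 22 of 25 correct
print(p, difficulty_level(p))    # 0.88 Very Easy
```

Running it on the slide's other example (7 of 25 correct) gives p = 0.28, a difficult item, matching the interpretation later in the deck.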
2. Discrimination index – this is a number
which indicates the ability of an
item to differentiate between high
and low scorers on the test
Guide for interpreting the discrimination index:
Index
≤ 0.0 - negatively or not at all
discriminating
.01 - .15 - very low discrimination
.16 - .30 - moderate discrimination
.31 - .45 - good discrimination
> .45 - high discrimination
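The interpretation guide can be encoded directly; this sketch is ours, not from the source.

```python
# Map a discrimination index onto the guide's verbal labels.
def discrimination_label(d: float) -> str:
    if d <= 0.0:
        return "negatively or not at all discriminating"
    elif d <= 0.15:
        return "very low discrimination"
    elif d <= 0.30:
        return "moderate discrimination"
    elif d <= 0.45:
        return "good discrimination"
    return "high discrimination"

print(discrimination_label(0.40))  # good discrimination
```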
Difficulty Index
Sample Problem:
In a Math test administered by Mr.
Reyes, seven students answered
word problem #1 correctly. A total
of twenty-five students took the
test.
What is the difficulty index for word
problem #1?
Difficulty Index
p = 0.28
Difficulty Index
p = 0.28
low difficulty index
(a difficulty index below p = 0.30 means the item is difficult)
Difficulty Index
p = 0.28
a) students didn’t understand
the concept being tested
b) item could be badly
constructed
Sample of Difficulty Index
Item No. 1
Option  Freq.  %     Valid %  Cum. %
*A      341    93.2  93.2     93.2
B       1      .3    .3       93.5
C       4      1.1   1.1      94.6
D       20     5.4   5.4      100
TOTAL   366    100   100
Option Analyses
Item No. 1
Option  Freq.  %
*A      341    93.2
B       1      .3
C       4      1.1
D       20     5.4
TOTAL   366    100
The correct key (A) was chosen by 93.2% of the examinees. This item is
too easy; perhaps option A is very obviously the correct answer. Option B is
not performing as an option; it was not plausible enough. C is also a poor
option. (To be considered plausible, an option must be chosen by at least
3% of the examinees.)
Item No. 2
Option  Freq.  %
A       2      .5
B       340    92.3
*C      16     4.8
D       8      2.4
TOTAL   366    100
Item 2 suggests a possibly wrong key, or it could imply misconceptions,
because B is just too attractive an answer. The key should be reviewed;
the item too, because it might only have been misunderstood.
Discrimination Index
The power of the item to
discriminate the students
between those who scored
high and those who scored
low in the overall test.
Types of Discrimination Index
1. Positive Discrimination
2. Negative Discrimination
3. Zero Discrimination
Positive Discrimination
happens when more students
in the upper group got the
item correctly than those
students in the lower
group.
Negative Discrimination
occurs when more students
in the lower group got the
item correctly than the
students in the upper group
Zero Discrimination
occurs when equal numbers
of students in the upper
and lower groups
answer the item
correctly
Levels of Discrimination
Ebel (1986) and Hetzel (1997)
Index Range Discrimination Level
0.19 and below Poor item, should be
eliminated or needed to
be revised
0.20- 0.29 Marginal item, needs some
revision
0.30-0.39 Reasonably good item but
possibly subject to improvement
0.40 and above Very good item
Discrimination Index Formula
DI = (CUG - CLG) / D, where
DI = discrimination index value
CUG = number of students selecting the
correct answer in the upper group
CLG = number of students selecting the
correct answer in the lower group
D = number of students in one group (upper or lower)
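A minimal sketch of the formula, assuming (as in the 30-item example later in the deck) that D is the number of students in one group:

```python
def discrimination_index(cug: int, clg: int, group_size: int) -> float:
    """DI = (CUG - CLG) / D, with D taken as the size of one group."""
    return (cug - clg) / group_size

# Hypothetical item: 9 of 10 upper-group and 5 of 10 lower-group students correct.
print(discrimination_index(9, 5, 10))  # 0.4 -> good discrimination
```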
Item Analysis on a 30-Item Test
N = 780
H = 27% of 780 = 210 (high group)
B = 27% of 780 = 210 (bottom group)
Formula (h and b are the numbers answering the
item correctly in the high and bottom groups):
FI = (h + b) / (H + B) × 100%
DI = (h - b) / H
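The 27% grouping and both formulas might be sketched as follows; the per-item counts below are hypothetical, only N = 780 comes from the slide.

```python
N = 780
H = B = int(0.27 * N)   # 210 students in each of the high and bottom groups

def facility_index(h: int, b: int) -> float:
    """FI = (h + b) / (H + B) * 100%; h, b = correct counts per group."""
    return (h + b) / (H + B) * 100

def discrimination_index(h: int, b: int) -> float:
    """DI = (h - b) / H."""
    return (h - b) / H

# Hypothetical item: 180 correct in the high group, 90 in the bottom group.
print(round(facility_index(180, 90), 1))        # 64.3 -> easy
print(round(discrimination_index(180, 90), 2))  # 0.43 -> good discrimination
```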
Distribution of Numbered Items Accdg. to
Facility and Discrimination Indices

Facility   |                        DISCRIMINATION INDEX
Index (%)  | Neg.-.00  | .01-.15       | .16-.30 | .31-.45 | .46-.60   | > .61        | Total
81 - 100%  |           |               |         |         |           |              |
61 - 80%   |           |               | 4       | 2, 10   | 6, 13, 24 | 11, 14       | 8
41 - 60%   | 16        | 5             | 23, 17  | 29      | 1         | 3, 7, 20, 22 | 10
21 - 40%   | 25        | 26            |         | 15, 27  |           |              | 4
0 - 20%    | 9, 12, 17 | 8, 18, 21, 30 | 19      |         |           |              | 8
Total      | 5         | 6             | 4       | 5       | 4         | 6            | 30
The frequencies inside the green box are
those of acceptable items.
Analysis of Response Options
Another way to evaluate the
performance of the entire test
item is through the analysis of
the response options. It is very
important to examine the
performance of each option in a
multiple-choice item.
Distracter Analysis
Distracter- term used for
the incorrect options in the
multiple-choice type of
test while the correct
answer represents the key
Anatomy of a multiple-choice item
How many inches are in a foot?   <- item stem
A. 12    <- keyed or correct option
B. 20
C. 24
D. 100
A-D are the options (also called alternatives or choices);
the incorrect ones (B, C, D) are the distracters or foils.
Distracter Analysis
Answer Choice   Number who selected choice (out of 50)
Choice A        2
Choice B        26
Choice C        7
Choice D        15
For choice D: p = 15/50 or 0.30
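A sketch of the tally above; the 3% plausibility rule is the one quoted earlier in the deck, while the code structure is ours.

```python
counts = {"A": 2, "B": 26, "C": 7, "D": 15}   # responses of 50 students
n = sum(counts.values())

for option, freq in counts.items():
    share = freq / n
    # An option is considered plausible if at least 3% of examinees chose it.
    print(f"{option}: {freq:2d}  {share:.0%}  plausible={share >= 0.03}")
# Choice D: p = 15/50 = 0.30
```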
Item Analysis
2. Item Discrimination Level/Index
> extent to which success on an item corresponds
to success on the whole test
> denoted D
Assumptions
• We assume that the total score on the test is
valid.
• We also assume that each item shows the
difference between students who know
more from students who know less.
• The discrimination index tells us the extent
that this is true.
Item Discrimination
Methods:
a) Hogan (2007)
b) point-biserial correlation (rpb)
c) Gronlund & Linn (1990)
Hogan (2007)
#1 Get a total score per student.
#2 Divide students into high and low group.
#3 Get the percentages of students who
answered correctly in both groups.
#4 Subtract high % - low %.
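The four steps above might look like this in code. This is a sketch; `students` pairs each hypothetical total score with whether the item was answered correctly.

```python
# Hogan's (2007) steps: rank by total score, split into high/low halves,
# take percentage correct in each group, subtract high % - low %.

def hogan_discrimination(scores):
    # 1. Total score per student is given; sort students by it.
    ranked = sorted(scores, key=lambda s: s[0], reverse=True)
    # 2. Divide into high and low groups.
    half = len(ranked) // 2
    high, low = ranked[:half], ranked[-half:]
    # 3. Percentage who answered the item correctly in each group.
    pct = lambda g: 100 * sum(1 for _, ok in g if ok) / len(g)
    # 4. Subtract high % - low %.
    return pct(high) - pct(low)

students = [(45, True), (40, True), (38, True), (35, True),
            (30, False), (25, True), (20, False), (15, False)]
print(hogan_discrimination(students))  # 100 - 25 = 75 percentage points
```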
Hogan’s Method
Item 5 (in %):
Group   A    B*   C    D
High    0    90   10   0
Low     10   50   30   10
Total   5    70   20   5
N=40; Median=35; The test contained 50 items.
* Indicates correct option.
Sample Problem
Item 5 (in %):
Group   A*   B    C    D
High    40   60   0    0
Low     60   30   0    10
Total   50   45   0    5
N=40; Median=35; The test contained 50 items.
Interpretation
• Positive discrimination occurs if more
students in the upper group than the students
in the lower group get the item right.
• This indicates that the item is discriminating
in the same direction as the total test score.
Point-Biserial Correlation
• Used to correlate item scores with the scores
of the whole test
• A special case of the Pearson Product
Moment Correlation, where one variable is
binary (right vs. wrong), and the other is
continuous (total raw test score)
Point-Biserial Correlation
Step 1
• Score the item as either right (1) or wrong (0).
Step 2
• Use point-biserial correlation.
Step 3
• Interpret.
Point-Biserial Correlation
r_pb = (Mp - Mq) / SD × √(p × q), where
Mp = mean raw score of all students who got the item right
Mq = mean raw score of all students who got the item wrong
SD = standard deviation of the raw scores
p = proportion of students who got the right answer
q = 1 - p (proportion who got it wrong)
Point-Biserial Correlation
• A negative point-biserial correlation means
that the students who did well on the test
missed that item, while those students who
did poorly on the test got the item right.
• This item should be rewritten.
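A sketch of the computation using the standard point-biserial formula r_pb = (Mp - Mq)/SD × √(pq); the scores are hypothetical.

```python
import statistics

def point_biserial(item, totals):
    """r_pb = (Mp - Mq) / SD * sqrt(p * q).
    item: 1/0 per student (right/wrong); totals: raw total test scores."""
    right = [t for i, t in zip(item, totals) if i == 1]
    wrong = [t for i, t in zip(item, totals) if i == 0]
    p = len(right) / len(item)
    q = 1 - p
    sd = statistics.pstdev(totals)   # population SD of the raw scores
    return (statistics.mean(right) - statistics.mean(wrong)) / sd * (p * q) ** 0.5

# Hypothetical data: the three highest scorers got the item right,
# so the correlation comes out strongly positive.
r = point_biserial([1, 1, 1, 0, 0], [50, 45, 40, 30, 20])
print(round(r, 2))  # 0.91
```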
Software Support
• Calculates rpb online:
http://faculty.vassar.edu/lowry/pbcorr.html
Date accessed: 1 August 2011
• Software index for free reliability software:
http://www.rasch.org/software.htm
Gronlund & Linn (1990)
For norm-referenced tests:
#1 Rank all from highest to lowest scores.
#2 Select highest and lowest ten scores.
#3 Tabulate the number of students who
selected each alternative.
#4 Compute for p-value.
#5 Compute the discriminating power.
#6 Evaluate effectiveness of distracters.
Gronlund & Linn (1990)
• Item discriminating power
– degree to w/c a test item discriminates
between pupils with high and low scores
D = (RU – RL) / (½T)
RU is the number of pupils in the upper group who got the item right
RL is the number of pupils in the lower group who got the item right
T is the total number of pupils included in the analysis
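A direct transcription of the formula; the counts in the example are hypothetical.

```python
def discriminating_power(ru: int, rl: int, t: int) -> float:
    """D = (RU - RL) / (1/2 * T)."""
    return (ru - rl) / (t / 2)

# Hypothetical: 9 of the upper 10 and 5 of the lower 10 got the item right.
print(discriminating_power(9, 5, 20))  # 0.4
```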
Interpreting Values
• D=0.60 indicates average discriminating power.
• D=1.00 has maximum positive discriminating
power where all pupils in the upper group get
the item right while all pupils in the lower
group get the item wrong.
• D=0.00 has no discriminating power where an
equal number of pupils in the upper and lower
groups get the item right.
Sample Problem
Item 5:
Group        A    B*   C    D
Upper (10)   0    10   0    0
Lower (10)   2    4    1    3
Find the p and D.
* Indicates correct option.
Answers
p = (10 + 4) / 20 = 0.70
D = (10 - 4) / 10 = 1.00 - 0.40 = 0.60
Analysis of Criterion-Referenced
Mastery Items
Crucial Questions:
To what extent did the test items
measure the effects of instruction?
Item Response Chart
(B = pretest, A = posttest)

         Item 1   Item 2   Item 3   Item 4   Item 5
         B   A    B   A    B   A    B   A    B   A
Jim      -   +    +   +    -   -    +   -    -   +
Dora     -   +    +   +    -   -    +   -    +   +
Lois     -   +    +   +    -   -    +   -    -   +
Diego    -   +    +   +    -   -    +   -    -   +

+ means correct
- means incorrect
Sensitivity to Instructional Effects
(S)
S = (RA – RB) / T
RA is the number of pupils who got the item right after
instruction
RB is the number of pupils who got the item right before
instruction
T is the total number of pupils who tried the item both times
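A sketch computing S from pre/post pairs like those in the item response chart; the data mirrors item 1, where all four students were wrong before instruction and right after.

```python
# (before, after) correctness per student for one item; True = correct.
responses = [(False, True), (False, True), (False, True), (False, True)]

def sensitivity(responses):
    ra = sum(1 for _, after in responses if after)    # right after instruction
    rb = sum(1 for before, _ in responses if before)  # right before instruction
    return (ra - rb) / len(responses)

print(sensitivity(responses))  # 1.0 -> the ideal value
```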
Sensitivity to Instructional Effects
(S)
S = 1
(e.g., item 1 in the chart: RA = 4, RB = 0, T = 4, so S = (4 - 0)/4 = 1.00)
The ideal item value is 1.00. Effective items fall between
0.00 and 1.00.
The higher the positive value, the more sensitive the item
is to instructional effects.
Items with zero and negative values do not reflect the
intended effects of instruction.
Factors why students failed to get the
correct answer in the given question:
• It was not taught properly in class.
• It is ambiguous.
• The correct answer is not given in
the options.
• It has more than one correct
answer.
Factors why students failed to get the
correct answer in the given question:
• It contains grammatical clues that mislead
the students.
• The student is not aware of the content.
• The students were confused by the logic of
the question because it has double
negatives.
• The student failed to study the lesson.
Miskeyed Item
The test item is a potential
miskey if there are more
students from the upper
group who choose the
incorrect options than the
key.
Guessing item
Students from the upper group show an equal spread of
choices among the given alternatives. Students
from the upper group may guess their
answers for the following reasons:
• The content is not discussed in
class or in the text.
• The item is very difficult.
• The question is trivial.
• The item is ambiguous.
Ambiguous item
This happens when students
from the upper group choose
an incorrect option and the
keyed answer in roughly
equal numbers.
Qualitative Item Analysis
is a process in which the
teacher or expert carefully
proofreads the test before
it is administered.
(Zurawski R.M.)
Thank You
and God Bless