
CHAPTER 6

Administering, Analyzing, and Improving Tests

Intended Learning Outcomes:


1. Identify appropriate test assembly, administration, and scoring practices.
2. Execute guidelines for assembling the test.
3. Apply quantitative and qualitative procedures for analyzing a test.
4. Recall debriefing guidelines for test and instructional improvement.
5. Name important components of achievement testing.

ASSEMBLING THE TEST


At this point let’s assume you have
1. Written measurable instructional objectives.
2. Prepared a test blueprint, specifying the number of items for each content
and process area.
3. Written test items that match your instructional objectives.

Once you have completed these activities you are ready to


1. Package the test.
2. Reproduce the test.

These components constitute what we are calling test assembly. Let’s consider
each a little more closely.

 Packaging the Test


There are several packaging guidelines worth remembering, including
grouping together items of similar format, arranging test items from easy to
hard, properly spacing items, keeping items and options on the same page,
placing illustrations near the descriptive material, checking for randomness in
the answer key, deciding how students will record their answers, providing
space for the test-taker’s name and the date, checking test directions for clarity,
and proofreading the test before you reproduce and distribute it.

1. Group Together All Items of Similar Format


If you have all true–false items grouped together, all completion items
together, and so on, the students will not have to “switch gears” to adjust to
new formats. This will enable them to cover more items in a given time than if
item formats were mixed throughout the test. Also, by grouping items of a
given format together, only one set of directions per format section is necessary,
which is another time-saver.

2. Arrange Test Items from Easy to Hard


Arranging test items according to level of difficulty should enable more
students to answer the first few items correctly, thereby building confidence
and, it is hoped, reducing test anxiety.

3. Space the Items for Easy Reading
If possible, try to provide enough blank space between items so that each
item is distinctly separate from others. When items are crowded together, a
student may inadvertently perceive a word, phrase, or line from a preceding or
following item as part of the item in question. Naturally, this interferes with a
student’s capacity to demonstrate his or her true ability.

4. Keep Items and Options on the Same Page


There are few things more aggravating to a test-taker than to have to turn
the page to read the options for multiple choice or matching items or to finish
reading a true–false or completion item. To avoid this awkwardness, do not
begin an item at the bottom of the page unless you have space to complete the
item. Not only will this eliminate having to carry items over to the next page, it
will also minimize the likelihood that the last line or two of the item will be cut
off when you reproduce the test.

5. Position Illustrations Near Descriptions


Place diagrams, maps, or other supporting material immediately above the
item or items to which they refer. In other words, if items 9, 10, and 11 refer to a
map of South America, locate the map above items 9, 10, and 11—not between
9 and 10 or between 10 and 11 and not below them. Also, if possible, keep any
such stimuli and related questions on the same page to save the test-taker time.

6. Check Your Answer Key


Be sure the correct answers follow a fairly random pattern. Avoid true–false
patterns such as TFTF, or TTFF, and multiple-choice patterns such as D C B A D
C B A. At the same time, check to see that your correct answers are distributed
about equally between true and false and among multiple-choice options.
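
A quick way to check both the balance and the randomness of a key is to tally it with a short script. The sketch below is only a rough aid: the answer key shown is hypothetical and the pattern check is a simple heuristic, not a substitute for your own inspection.

    from collections import Counter

    def check_answer_key(key):
        """Tally keyed options and flag simple repeating patterns (e.g., ABCDABCD)."""
        counts = Counter(key)
        print("Option counts:", dict(counts))
        expected = len(key) / len(counts)          # even share per option
        for option, n in counts.items():
            if n > 1.5 * expected or n < 0.5 * expected:
                print(f"Option {option} is keyed {n} times; consider rebalancing.")
        # Crude check for a short repeating cycle such as ABCDABCD...
        for cycle in (2, 3, 4):
            if len(key) >= 2 * cycle and all(key[i] == key[i % cycle] for i in range(len(key))):
                print(f"Key repeats every {cycle} items; rearrange for randomness.")

    # Hypothetical 12-item multiple-choice key
    check_answer_key(list("ABCDABCDABCD"))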

7. Determine How Students Record Answers


Decide whether you want to have students record their answers on the test
paper or on a separate answer sheet.
In the lower elementary grades, it is generally a good idea to have students
record answers on the test papers themselves. In the upper elementary and
secondary grades, separate answer sheets can be used to facilitate scoring
accuracy and to cut down on scoring time. Also, in the upper grades, learning to
complete separate answer sheets will make students familiar with the process
they will use when taking standardized tests.

8. Provide Space for Name and Date


Be sure to include a blank on your test booklet and/or answer sheet for the
student’s name and the date. This may seem an unnecessary suggestion, but it
is not always evident to a nervous test-taker that a name should be included on
the test. Students are much more likely to remember to put their names on
tests if space is provided.

9. Check Test Directions
Check your directions for each item format to be sure they are clear.
Directions should specify:
1. The numbers of the items to which they apply.
2. How to record answers.
3. The basis on which to select answers.
4. Criteria for scoring.

10. Proofread the Test


Proofread for typographical and grammatical errors before reproducing the
test and make any necessary corrections. Having to announce corrections to the
class just before the test or during the test will waste time and is likely to inhibit
the test-takers’ concentration.
Before reproducing the test, it’s a good idea to check off these steps. The
checklist in Figure 1 can be used for this purpose.

Figure 1. Test Assembly Checklist (Kubiszyn & Borich, 2013, p. 222)

 Reproducing the Test


Most test reproduction in the schools is done on photocopying machines. As
you well know, the quality of such copies can vary tremendously. Regardless of how
valid and reliable your test might be, poor copies will make it less so. If someone else
will do the reproducing, be sure to specify that the copies are for an important test
and not simply an enrichment exercise. Ask the clerk or aide to randomly inspect
copies for legibility while running the copies and to be alert for blank or partially
copied pages while collating, ordering, and stapling multipage tests.

Activity 6.1
Part 1. Evaluate the 15 multiple-choice items you made in Activity 5C using the
checklist statements below for test assembly. For each item, write YES or NO against each statement; write NA if a statement does not apply.

Checklist statements (answer for each of Items 1-15):
1. Are items of similar format grouped together?
2. Are items arranged from easy to hard levels of difficulty?
3. Are items properly spaced?
4. Are items and options on the same page?
5. Are diagrams, maps, and supporting material above the designated items and on the same page with the items?
6. Are answers random?
7. Will an answer sheet be used?
8. Are blanks for name and date included?
9. Have the directions been checked for clarity?
10. Has the test been proofread for errors?
11. Do items avoid racial and gender bias?

Part 2. After evaluating, reconstruct the items that need revision so that they satisfy the requirements for assembling the test.

Item #: __________________________________________________
Revised Item #: ___________________________________________

ADMINISTERING THE TEST


The test is ready. All that remains is to get the students ready and hand out the
tests. Here is a series of suggestions to help your students psychologically prepare
for the test.
1. Maintain a positive attitude
2. Maximize achievement motivation
3. Equalize advantages
4. Avoid surprises
5. Clarify the rules
6. Rotate distribution
7. Remind students to check their copies
8. Monitor students
9. Minimize distractions

10. Give time warnings
11. Collect tests uniformly

SCORING THE TEST


Some general suggestions to save scoring time and improve scoring accuracy
and consistency:
1. Prepare an answer key
2. Check the answer key
3. Score blindly
4. Check machine-scored answer sheets
5. Check scoring
6. Record scores

ANALYZING THE TEST


Just as you can expect to make scoring errors, you can expect to make errors in
test construction. No test you construct will be perfect—it will include inappropriate,
invalid, or otherwise deficient items. In the remainder of this chapter we will
introduce you to a technique called item analysis. Item analysis can be used to
identify items that are deficient in some way, thus paving the way to improve or
eliminate them, with the result being a better overall test. We will make a
distinction between two kinds of item analysis, quantitative and qualitative.
Quantitative item analysis is likely to be something new. But as you will see,
qualitative item analysis is something with which you are already familiar. Finally,
we will discuss how item analysis differs for norm- and criterion-referenced tests,
and we provide you with several modified norm-referenced analysis methods to use
with criterion-referenced tests.
Quantitative item analysis is a technique that will enable us to assess the quality
or utility of an item. It does so by identifying distractor or response options that are
not doing what they are supposed to be doing. How useful is this procedure for a
completion or an essay item? Frankly, it is not very useful for these types of items,
but qualitative item analysis is. On the other hand, quantitative item analysis is
ideally suited for examining the usefulness of multiple-choice formats. The
quantitative item analysis procedures that we will describe are most appropriate for
items on a norm-referenced test. As you will see, we are interested in spreading out
students, or discriminating among them, with such a test. When dealing with a
criterion-referenced test, qualitative and modified quantitative item analysis procedures are most appropriate.

Quantitative Item Analysis


A numerical method for analyzing test items that employs students' responses to the alternatives or options.

Empirically-based Improvement Procedures


Item-improvement using empirically-based methods is aimed at improving
the quality of an item using students’ responses to the test. Test developers
refer to this technical process as item analysis as it utilizes data obtained
separately for each item. An item is considered good when its quality indices, i.e., the difficulty index and discrimination index, meet certain characteristics. For a
norm-referenced test, these two indices are related since the level of difficulty
of an item contributes to its discriminability. An item is good if it can
discriminate between those who perform well in the test and those who do not.
However, an extremely easy item, that which can be answered correctly by
more than 85% of the group, or an extremely difficult item, that which can only
be answered correctly by 15%, is not expected to perform well as a
“discriminator”. The group will appear to be quite homogeneous with items of
this kind. They are weak items since they do not contribute to “score-based
inference”.

Difficulty Index
An item's difficulty index is obtained by calculating the p value (p), which is the proportion of students answering the item correctly. The difficulty of an item, or item difficulty, is defined as the number of students who are able to answer the item correctly divided by the total number of students:

p = R / T

where p is the difficulty index
R = total number of students answering the item right
T = total number of students answering the item

Here are two illustrative samples:


Item 1: There were 45 students in the class who responded to Item 1, and 30 answered it correctly.

p = 30/45 = 0.67

Item 1 has a p value of 0.67. Sixty-seven percent (67%) got the item right while 33% missed it.

Item 2: In the same class, only 10 responded correctly to Item 2.

p = 10/45 = 0.22

Item 2 has a p value of 0.22. Out of 45, only 10 or 22% got the item right while 35 or 78% missed it.

For a norm-referenced test: Between the two items, Item 2 appears to be a much more difficult item, since less than a fourth of the class was able to respond correctly.
For a criterion-referenced test: The class shows much better performance in Item 1 than in Item 2. It is still a long way for many to master Item 2.

Range of Difficulty Index    Interpretation      Action

0 - 0.25                     Difficult           Revise or Discard
0.26 - 0.75                  Right Difficulty    Retain
0.76 and above               Easy                Revise or Discard
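
To make the calculation concrete, here is a minimal Python sketch of the p = R/T computation and the interpretation ranges above. The function names and the sample responses are illustrative assumptions, not part of the module.

    def difficulty_index(responses):
        """p = R / T: proportion of students answering the item correctly (1 = right, 0 = wrong)."""
        return sum(responses) / len(responses)

    def interpret_difficulty(p):
        """Map p to the interpretation and action ranges in the table above."""
        if p <= 0.25:
            return "Difficult - revise or discard"
        elif p <= 0.75:
            return "Right difficulty - retain"
        return "Easy - revise or discard"

    # Item 1 from the example: 30 of 45 students answered correctly.
    item1 = [1] * 30 + [0] * 15
    p1 = difficulty_index(item1)
    print(round(p1, 2), interpret_difficulty(p1))   # 0.67 Right difficulty - retain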

Discrimination Index
The power of an item to discriminate between informed and uninformed groups
or between more knowledgeable and less knowledgeable learners is shown using
the item discrimination index (D). This is an item statistic that can reveal useful information for improving an item. Basically, an item-discrimination index shows the relationship between a student's performance on an item (i.e., right or wrong) and his/her total performance on the test, represented by the total score.
For classroom tests, the discrimination index shows if a difference exists
between the performance of those who scored high and those who scored low in an
item. As a general rule, the higher the discrimination index (D), the more marked
the magnitude of the difference is, and thus, the more discriminating the item is.
The nature of the difference, however, can take different directions:
a. Positively discriminating item - the proportion of the high-scoring group answering correctly is greater than that of the low-scoring group.
b. Negatively discriminating item - the proportion of the high-scoring group answering correctly is less than that of the low-scoring group.
c. Non-discriminating item - the proportion of the high-scoring group answering correctly is equal to that of the low-scoring group.

Calculation of the discrimination index therefore requires obtaining the difference between the proportion of the high-scoring group answering the item correctly and the proportion of the low-scoring group answering the item correctly, using this simple formula:

D = RU/nU - RL/nL

where RU and RL are the numbers of students in the upper and lower groups answering the item correctly, and nU and nL are the numbers of students in each group.

Another calculation can bring about the same result (Kubiszyn and Borich, 2010). With groups of equal size n:

D = (RU - RL) / n

As you can see, dividing each count by the group size is actually getting the p value of the item for that group. So to get D is to get the difference between the p value involving the upper half and the p value involving the lower half. The formula for the discrimination index (D) can therefore also be given as (Popham, 2011):

D = Pupper - Plower

To obtain the proportions of the upper and lower groups responding to the item
correctly, the teacher follows these steps:
1. Score the test papers using the key to correction to obtain each student's total score. The maximum score is the total number of objective items.
2. Order the test papers from highest to lowest score.
3. Split the test papers into halves: high group and low group.
 For a class of 50 or fewer students, do a 50-50 split. Take the upper half as the HIGH GROUP and the lower half as the LOW GROUP.
 For a big group of 100 or so, take the upper 25 - 27% and the lower 25 - 27%.
 Maintain equal numbers of test papers for Upper and Lower groups.
4. Obtain the p value for the upper group and p value for the lower group.

5. Get the discrimination index by getting the difference between the p-values.
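
As an illustration of these steps, here is a minimal Python sketch, assuming a small class handled with a 50-50 split; the scores and item responses are hypothetical, and ties at the median are broken simply by sort order.

    def discrimination_index(total_scores, item_correct):
        """D = p(upper) - p(lower) using a 50-50 split of papers ranked by total score.

        total_scores : each student's total test score
        item_correct : parallel list of 1 (right) / 0 (wrong) on the item in question
        """
        ranked = sorted(zip(total_scores, item_correct), key=lambda pair: pair[0], reverse=True)
        half = len(ranked) // 2
        upper, lower = ranked[:half], ranked[-half:]        # equal-sized groups
        p_upper = sum(correct for _, correct in upper) / half
        p_lower = sum(correct for _, correct in lower) / half
        return p_upper - p_lower

    # Hypothetical class of 10: high scorers tend to get the item right, so D is positive.
    scores = [48, 45, 44, 41, 40, 36, 33, 30, 27, 20]
    item   = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
    print(round(discrimination_index(scores, item), 2))   # 0.8 - 0.2 = 0.6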
For the purpose of evaluating the discriminating power of items, Popham (2011) offers the guidelines proposed by Ebel & Frisbie (1991), shown in Table 1. These can guide teachers in selecting satisfactory items and deciding what to do to improve the test.
Table 1. Guidelines for Evaluating the Discriminating Efficiency of Items

Discrimination Index    Item Evaluation
.40 and above           Very good items
.30 - .39               Reasonably good items, but possibly subject to improvement
.20 - .29               Marginal items, usually needing improvement
.19 and below           Poor items, to be rejected or improved by revision

Items with negative discrimination indices, even when their magnitude is high, are subject right away to revision if not deletion. With multiple-choice items, a negative D is forensic evidence of errors in item writing. It suggests the possibility of:
 Wrong key - more knowledgeable students selected a distracter that is actually the correct answer but is not the keyed option
 An unclear problem in the stem leading to more than one correct answer
 Ambiguous distracters leading the more informed students to be divided in choosing among the attractive options
 An implausible keyed option which the more informed students will not choose
As you can see, awareness of item-writing guidelines can provide cues on how to improve items bearing negative or non-significant discrimination indices.

Distracter Analysis
Another empirical procedure to discover areas for item improvement utilizes an analysis of the distribution of responses across the distracters. Especially when the difficulty index and discrimination index of an item suggest that it is a candidate for revision, distracter analysis becomes a useful follow-up. It can detect differences in how the more able students respond to the distracters of a multiple-choice item compared with how the less able ones do. It can also provide an index of the plausibility of the alternatives, that is, whether they are functioning as good distracters. Distracters not chosen at all, especially by the uninformed students, need to be revised to increase their attractiveness.

To illustrate this process, consider the frequency distribution of the responses of the upper group and the lower group across the alternatives for two items. Separate counts are made of the upper- and lower-group students who chose A, B, C, and D. The data are organized in a distracter analysis table.

Table 2. Distracter Analysis Table

Item     Difficulty   Discrimination   Group    A    B    C    D   Omit
(N=40)   Index (p)    Index (D)
1        .38          -.35             Upper    2   10   *5    3
                                       Lower    2    0   12    6
2        .45          -.50             Upper    2   *4   10    4
                                       Lower    5   14    1    0

(* marks the keyed option)

Analysis:
 What kinds of items do you see based on their D?
 What does their respective D indicate? Cite the data supporting this.
 Which of the two items is more discriminating? Why?
 Which items need to be revised?
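
A distracter analysis table such as Table 2 can also be tallied with a short script. The sketch below is only an illustration: the response letters mirror Item 1 of Table 2, the upper/lower split is assumed to have been done beforehand, and the function name is made up.

    from collections import Counter

    def distracter_table(upper_responses, lower_responses, options="ABCD", key="C"):
        """Count how often each group chose each alternative; the keyed option is starred."""
        labels = ["*" + o if o == key else o for o in options]
        print("Group  " + "  ".join(f"{label:>3}" for label in labels))
        for name, responses in (("Upper", upper_responses), ("Lower", lower_responses)):
            counts = Counter(responses)
            print(f"{name:6} " + "  ".join(f"{counts.get(o, 0):>3}" for o in options))

    # Response letters mirroring Item 1 of Table 2 (20 students per group).
    upper = list("A" * 2 + "B" * 10 + "C" * 5 + "D" * 3)
    lower = list("A" * 2 + "C" * 12 + "D" * 6)
    distracter_table(upper, lower, key="C")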

Sensitivity to Instruction Index
The techniques discussed earlier make use of responses obtained from a single administration of a test. The indices obtained for difficulty, discrimination, and option plausibility are seen as helpful statistics for the improvement of norm-referenced or summative tests given after a period of instruction.
Another empirical approach for reviewing test items is to infer how sensitive an item has been to instruction. This is referred to as the sensitivity to instruction index (Si), and it signifies the change in students' performance as a result of instruction. The information is useful for criterion-referenced tests, which aim at determining whether mastery learning has been attained after a designated or prescribed instructional period. The basic question being addressed is a directional one, i.e., is student performance better after instruction is given? In the context of item performance, Si will indicate whether the p value obtained for the item in the post-test is greater than the p value in the pre-test. Consider an item where, in a class of 40, 80% answered it correctly in the post-test while only 10% did in the pre-test.
Its p value for the post-test is .80 while for the pre-test it is .10; thus Si = .70, following this calculation:
Sensitivity to instruction (Si) = Ppost - Ppre = .80 - .10 = .70

Notice that the calculation for Si carries the same concept as the discrimination index, except that the difference in proportion is obtained between a post-test and a pre-test given to the same group. Similar to the interpretation of D, the higher the Si value, the more sensitive the item is in showing change as a result of instruction. This item statistic gives additional information regarding the efficiency and validity of the item.
There could, however, be reasons why the Si of a test item does not register a meaningful difference between post-test and pre-test. Especially for a knowledge-level item, it is possible that it was not taken up at all during instruction, so the students did not have the chance to learn it, or that they already knew the content prior to instruction. The teacher should take note of these items when reviewing content coverage for the period.
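
As a small illustration, the Si calculation can be written out as follows; the counts echo the worked example above (a class of 40 with 10% correct before instruction and 80% after), and the function names are illustrative.

    def p_value(num_correct, num_students):
        """p = R / T."""
        return num_correct / num_students

    def sensitivity_to_instruction(p_pre, p_post):
        """Si = Ppost - Ppre: change in the item's p value after instruction."""
        return p_post - p_pre

    # Worked example from the text: class of 40, 10% correct before instruction, 80% after.
    si = sensitivity_to_instruction(p_value(4, 40), p_value(32, 40))
    print(round(si, 2))   # 0.7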

Activity 6.2

Task Description: This activity will test your ability to apply empirical procedures for
item-improvement. Solve and answer the following.

1. A final test in Science was administered to a Grade IV class of 50. The teacher wants to further improve the items for next year's use. Calculate a quality index from the given data and indicate the possible revision needed by some items.
Item    Number getting item correct    Index            Revision needed to be done

1       34                             ____________
2       18                             ____________
3       10                             ____________
4       46                             ____________
5       8                              ____________

2. Below are additional data collected for the same items. Calculate another quality
index and indicate what needs to be improved with the obtained index as a basis.
Item    Upper Group    Lower Group    Index            Revision needed to be done

1       25             9              ____________
2       9              9              ____________
3       2              8              ____________
4       38             8              ____________
5       1              7              ____________

3. A distracter analysis table is given for a test item given to a class of 60. Obtain
the necessary item statistics using the given data.
Item     Difficulty   Discrimination   Group    A    B   *C    D   Omit
(N=30)   Index (p)    Index (D)
1        ?            ?                Upper    2   18    5    0
                                       Lower    0   10   20    0

Write your evaluation on the following aspects of the item:


a. Difficulty of the Item - _______________________________________________

b. Discriminating power of the Item - ____________________________________

c. Plausibility of Options - ______________________________________________

d. Ambiguity of the answer - ____________________________________________

Qualitative Item Analysis


A non-numerical method for analyzing test items not employing student
responses, but considering test objectives, content validity, and technical item
quality.

Judgmentally-based improvement procedures
This approach basically makes use of human judgment in reviewing the items. The judges are the teachers themselves, who know exactly what the test is for, the instructional outcomes to be assessed, and the items' level of difficulty appropriate to their class; the teachers' peers or colleagues, who are familiar with the curriculum standards for the target grade level, the subject matter content, and the ability of the learners; and the students themselves, who can perceive difficulties based on their past experiences.
 Teachers’ Own Review (Self-review)
It is always advisable for teachers to take a second look at the assessment tools they have devised for a specific purpose. To presume perfection right away after construction may lead to a failure to detect shortcomings of the test or assessment task. There are five (5) suggestions given by Popham (2011, p. 253) for teachers to follow in exercising judgment:

1. Adherence to item-specific guidelines and general item-writing commandments. There are specific guidelines for writing the various forms of objective and non-objective constructed-response types and the selected-response types for measuring lower-level and higher-level thinking skills. Those guidelines should be used by teachers to check how well the items have been planned and written, particularly their alignment with the intended instructional outcomes.

2. Contribution to score-based inference. The teacher examines if the expected scores generated by the test can contribute to making valid inferences about the learners. Can the scores reveal the amount of learning achieved or show what has been mastered? Can the scores indicate the student's capability to move on to the next instructional level? Or do the scores obtained make no difference at all in describing or differentiating various abilities?

3. Accuracy of content. This review should especially be considered when a test developed some time ago is used again. Changes that may occur due to new discoveries or developments can redefine the content of a summative test. If this happens, the items or the key to correction may have to be revisited.

4. Absence of content gaps. This review criterion is especially useful in strengthening the score-based inference capability of the test. If the current tool misses out on important content now prescribed by a new curriculum standard, the score will likely not give an accurate description of what is expected to be assessed. The teacher should always ensure that the assessment tool matches what is currently required to be learned. This is a way to check on the content validity of the test.

5. Fairness. The discussions on item-writing guidelines always warn against unintentionally helping uninformed students obtain higher scores. This can result from inadvertent grammatical clues, unattractive distracters, ambiguous problems, and messy test instructions. Sometimes, unfairness can happen because of undue advantage received by a particular group, like those seated in front of the classroom or those coming from a particular socio-economic level. Getting rid of faulty and biased items and writing clear instructions definitely adds to the fairness of the test.

 Peer review
There are schools that encourage peer or collegial review of assessment instruments among their teachers. Time is provided for this activity, and it has almost always yielded good results for improving tests and performance-based assessment tasks. During these teacher dyad or triad sessions, those teaching the same subject area can openly review together the classroom tests and tasks they have devised against some consensual criteria. The suggestions given by test experts can actually be used collegially as the basis for a review checklist:
a. Do the items follow the specific and general guidelines in writing items
especially on:
 Being aligned to instructional objectives?
 Making the problem clear and unambiguous?
 Providing plausible options?
 Avoiding unintentional clues?
 Having only one correct answer?
b. Are the items free from inaccurate content?
c. Are the items free from obsolete content?
d. Are the test instructions clearly written for students to follow?
e. Is the level of difficulty of the test appropriate to level of learners?
f. Is the test fair to all kinds of students?

 Student Review
Engagement of students in reviewing items has become a laudable practice
for improving classroom tests. The judgement is based on the students’
experience in taking the test, their impressions and reactions during the testing
event. The process can be efficiently carried out through the use of a review
questionnaire. Popham (2011) illustrates a sample questionnaire shown in Table
3. It is better to conduct the review activity a day after taking the test so the
students still remember the experience when they see a blank copy of the test.

Table 3. Item-Improvement Questionnaire for Students.


1. If any of the items seemed confusing, which ones were they?
2. Did any items have more than one correct answer? If so, which ones?
3. Did any items have no correct answers? If so, which ones?
4. Were there words in any items that confused you? If so, which ones?
5. Were the directions for the test, or for particular subsections, unclear? If so, which ones?

Activity 6.3 Classifying Item-Improvement Approach. Below are descriptions of procedures done. Write J if a judgmental approach is used and E if an empirically-based one is used.
__________ 1. The Math Coordinator of Grade VII classes examined the periodical
tests prepared by the Math teachers to see if their items are aligned to the target
outcomes for the first quarter.
__________ 2. The alternatives of the multiple-choice items of the Social Studies
test were reviewed to discover if they only have one correct answer.
__________ 3. To determine if the items are efficiently discriminating between the more able students and the less able ones, a Biology teacher obtained the discrimination index (D) of the items.
__________ 4. A Technology Education teacher was interested to see if the
criterion-referenced test he has devised shows a difference in the items’ post-test
and pre-test p-values.
__________ 5. An English teacher conducted a session with his students to find out
if there are other responses acceptable in their literature test. He encouraged them
to rationalize their answers.

ITEM ANALYSIS MODIFICATIONS FOR THE CRITERION-REFERENCED TEST

The statistical test analysis method discussed earlier, called quantitative item
analysis, applies most directly to the norm-referenced test. The classroom teacher
will typically use criterion-referenced tests rather than norm referenced tests. Well,
then, we can just use these same procedures for our teacher-made
criterion-referenced tests. Right? Wrong!
As we will discover in later chapters, variability of scores is crucial to the
appropriateness and success of norm-referenced quantitative item analysis
procedures. In short, these procedures depend on the variability or spread of scores
(i.e., low to high) if they are to do their jobs correctly. In a typical teacher-made
criterion-referenced test, however, variability of scores would be expected to be
small, assuming instruction is effective and the test and its objectives match. Thus,
the application of quantitative item analysis procedures to criterion-referenced
measures may not be appropriate, since by definition most students will answer
these items correctly (i.e., there will be minimal variability or spread of scores). In
this section we will describe several ways in which these procedures can be modified
when a criterion-referenced, mastery approach to test item evaluation is employed.
As you will see, these modifications are straightforward and easier to use than the
quantitative procedures described earlier.

 Using Pre- and Post-tests as Upper and Lower Groups

The following approaches require that you administer the test as a pretest prior
to your instruction and as a post-test after your instruction. Ideally, in such a
situation the majority of students should answer most of your test items incorrectly
on the pretest and correctly on the post-test. By studying the difference between
the difficulty (p) levels for each item at the time of the pre- and post-tests, we can
tell if this is happening. At pretest, the p level should be low (e.g., 0.30 or lower), and
at post-test, it should be high (e.g., 0.70 or higher). In addition, we can consider the
pretest results for an item as the lower group (L) and post-test results for the item as
the upper group (U), and then we can perform the quantitative item analysis
procedures previously described to determine the discrimination direction for the
key and for the distractors.
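
Reusing the earlier discrimination idea, a minimal sketch of this approach might look as follows, with the pre-test treated as the lower group and the post-test as the upper group; the counts and the function name are hypothetical.

    def pre_post_discrimination(pre_correct, post_correct, n_students):
        """Treat the post-test group as 'upper' and the pre-test group as 'lower':
        D = p(post) - p(pre) for the keyed answer (or for any single option)."""
        return post_correct / n_students - pre_correct / n_students

    # Hypothetical item in a class of 30: 8 correct before instruction, 26 after.
    print(round(pre_post_discrimination(8, 26, 30), 2))   # 0.6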

If a criterion-referenced test item manifests these features, it has passed our
“test” and probably is a good item with little or no need for modification. Contrast
this conclusion, however, with the following item from the same test.

Thus, the item in Example 2 failed all the tests. Rather than modify the item, it
is probably more efficient to replace it with another.

 Comparing the Percentage Answering Each Item Correctly on Both Pre- and Post-test
If your test is sensitive to your objectives (and assuming you teach to your
objectives), the majority of learners should receive a low score on the test prior to
your instruction and a high score afterward. This method can be used to determine
whether this is happening. Subtract the percentage of students passing each item
before your instruction from the percentage of students passing each item after
your instruction. The more positive the difference, the more you know the item is
tapping the content you are teaching. This method is similar to the first step as
described in the preceding section. For example, consider the following percentages
for five test items:

Notice that item 3 registers no change in the percentage of students passing


from before to after instruction. In fact, a high percentage of students got the item
correct without any instruction! This item may be eliminated from the test, since
little or no instruction pertaining to it was provided and most students already knew
the content it represents.
Now, look at item 5. Notice that the percentage is negative. That is, 14% of the
class actually changed from getting the item correct before instruction to getting it
wrong after. Here, either the instruction was not related to the item or it actually
confused some students who knew the correct answer beforehand. A revision of the
item, the objective pertaining to this item, or the related instruction is in order.
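
In code, this comparison is just an item-by-item subtraction of the two percentages. The sketch below uses hypothetical percentages (chosen so that items 3 and 5 behave as described above) and flags items whose difference is zero or negative.

    def percent_change(pre_pct, post_pct):
        """Percentage passing after instruction minus percentage passing before, per item."""
        return [post - pre for pre, post in zip(pre_pct, post_pct)]

    # Hypothetical percentages passing each of five items before and after instruction.
    pre = [20, 35, 80, 10, 60]
    post = [85, 90, 80, 75, 46]
    for number, diff in enumerate(percent_change(pre, post), start=1):
        flag = "  <- review this item" if diff <= 0 else ""
        print(f"Item {number}: {diff:+d}%{flag}")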

 Determining the Percentage of Items Answered in the Expected Direction
for the Entire Test
Another, slightly different approach is to determine whether the entire test
reflects the change from fewer to more students answering items correctly from
pre- to post-test. This index uses the number of items each learner failed on the test
prior to instruction but passed on the test after instruction. Here is how it is
computed:

Step 1: Find the number of items each student failed on the pretest, prior to
instruction, but passed on the post-test, after instruction.

The asterisks indicate just the items counted in Step 1 for Bobby. This count is
then repeated for each student.
Step 2: Add the counts in Step 1 for all students and divide by the number of
students.
Step 3: Divide the result from Step 2 by the number of items on the test.
Step 4: Multiply the result from Step 3 by 100.
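
Here is a minimal Python sketch of the four steps, assuming each student's pre-test and post-test item responses are stored as parallel lists of 1s and 0s; the data below are hypothetical.

    def expected_direction_index(pre_matrix, post_matrix):
        """Percentage of items answered in the expected direction for the entire test.

        pre_matrix / post_matrix: one list of 0/1 item responses per student,
        with the same students and the same item order in both.
        """
        n_students = len(pre_matrix)
        n_items = len(pre_matrix[0])
        # Step 1: items each student failed on the pre-test but passed on the post-test.
        counts = [
            sum(1 for pre, post in zip(pre_row, post_row) if pre == 0 and post == 1)
            for pre_row, post_row in zip(pre_matrix, post_matrix)
        ]
        # Step 2: average the counts over students.
        average = sum(counts) / n_students
        # Steps 3 and 4: divide by the number of items and express as a percentage.
        return average / n_items * 100

    # Hypothetical 4-item test taken by three students before and after instruction.
    pre = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
    post = [[1, 1, 1, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
    print(round(expected_direction_index(pre, post), 1))   # about 66.7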

Let’s see how this would work for a 25-item test given to five students before
and after instruction.

DEBRIEFING GUIDELINES
Before handing back answer sheets or grades, you should do the following.
1. Discuss problem items.
2. Listen to student reactions.
3. Avoid on-the-spot decisions.
4. Be equitable with changes.
5. Ask students to double-check.
6. Ask students to identify problems.

THE PROCESS OF EVALUATING CLASSROOM ACHIEVEMENT


Figure 2 summarizes all of the important components of achievement testing
that we have discussed thus far. If you’ve studied and worked at these chapters, you
are ahead in the test construction game. What that means for you is better tests
that cause fewer students and parents to complain and tests that are more valid and
reliable measurements of achievement.

Figure 2. The process of measuring achievement in the classroom (Kubiszyn & Borich, 2013, p. 240)

References

[1] De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana
Publishing Co., Inc.
[2] Kubiszyn, T., & Borich, G. (2013). Educational testing and measurement:
Classroom application and practice. John Wiley & Sons, Inc.
[3] Navarro, R.L., Santos, R.G., & Corpuz, B.B. (2019). Assessment of learning 1.
Lorimar Publishing, Inc.
[4] Popham, W.J. (2017). Classroom assessment: What teachers need to know.
Pearson Education, Inc.
