8 - Selecting and Constructing Test Items and Tasks - Notes
Learning Outcome
Overview
This section addresses the second question, “How do I test?” That is, how do I
operationalize the assessment of the learning outcomes intended for a period of study? It
introduces a menu of test types appropriate for gauging the learning outcomes proposed by
the curriculum standards, and explains how to select and construct them.
[Figure: Test Types: Supply Type, Selection Type, Performance Type]
Knowledge, which appears in cognitive taxonomies (Bloom, 1956; Anderson & Krathwohl, 2004)
as the simplest and lowest level, is categorized further according to the thinking process
involved in learning.
McMillan (2007) refers to the latter two as simple understanding: comprehension of
“concepts, ideas, and generalizations,” known as declarative knowledge, and application
of learned skills and procedures in new situations, referred to as procedural knowledge.
The examples below will differentiate declarative and procedural knowledge as simple
understanding involving comprehension and application.
Declarative knowledge
  Is able to state the Law of Supply and Demand.
  Comprehension: Is able to explain the Law of Supply and Demand.
  Application: Is able to explain the rising prices of vegetables during summer time.
Procedural knowledge
  Is able to compute the area of a rectangle.
  Comprehension: Is able to compare the size of two given lots in terms of area.
  Application: Is able to determine the number of 1 × 1 tiles needed to cover a 50 ft × 100 ft hall.
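The procedural application example above (tiling the hall) can be worked through as a short calculation. This is only a minimal sketch: it assumes the "1 × 1" tiles measure 1 ft × 1 ft, a unit the item does not state, and the function name `tiles_needed` is hypothetical.

```python
import math


def tiles_needed(hall_length_ft, hall_width_ft, tile_side_ft=1.0):
    """Count the square tiles needed to cover a rectangular floor.

    Assumption: tiles are square with side tile_side_ft (1 ft by default),
    and a partial tile along an edge is counted as a whole tile.
    """
    tiles_along_length = math.ceil(hall_length_ft / tile_side_ft)
    tiles_along_width = math.ceil(hall_width_ft / tile_side_ft)
    return tiles_along_length * tiles_along_width


# The hall in the sample outcome: 50 ft x 100 ft covered with 1 ft x 1 ft tiles.
print(tiles_needed(50, 100))  # 50 * 100 = 5000
```

With whole-foot dimensions the count equals the floor area in square feet, which is why the item tests application of the area procedure rather than simple recall of the formula.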
Table 3 shows how to align the cognitive levels to learning outcomes with sample behaviors
for each level.
Knowledge and simple understanding involve the first three cognitive levels, i.e.
remembering, comprehending and applying.
Deep understanding requires the three higher cognitive levels, i.e. analyzing,
evaluating and creating.
Knowledge-Understanding Continuum
Knowledge Simple Understanding Deep Understanding
Level 6: Creating (sample behaviors: plan, generate, produce, design, construct, compose)
Table 4 illustrates the relationship between learning outcomes and test types.
Knowledge-Understanding Continuum
Knowledge Simple Understanding Deep Understanding
Lower-Order: According to the article you just read, what factors contribute to climate change?
(This is a short answer supply type for a simple comprehension question since it is based on a
specific reading material. Without citing the source for the response, it could also be a simple
recall question.)
Higher-Order: Write an article on how the government and the community can work together to mitigate
the factors causing environmental damage.
(This is likewise a supply type, but since it requires higher-order thinking at the “creating” level,
it calls for an extended-essay item. Deeper understanding is necessary to demonstrate this
outcome as it involves synthesis of previous information and observations derived from multiple
sources.)
Lower-Order: According to the article you just read, what contributes powerfully to climate change?
A. Volcanic eruption
B. Population explosion
C. Forest denudation
D. Carbon emission
(The correct option is based on a specific material which calls for simple comprehension.)
Higher-Order: Which of the following factors that affect climate change can be controlled by man?
A. Strong earthquake
B. Volcanic eruption
C. Melting of glaciers
D. Carbon dioxide emission
(Selection of the correct option requires analysis of the alternatives given. First, it calls for
identifying the factors affecting climate change, and then analyzing whether man can
control them or not. Being able to perform only one of these skills is not adequate to select
the correct option.)
Miller, Linn & Gronlund (2009) present categories of thought questions for deep
understanding and sample item stems in Table 5.
Explaining
  Why did the candle go out shortly after it was covered by the jar?
  Explain what the President meant when he said, “The buck stops with me.”
Persuading
  Write a letter to the principal to get approval for a class field trip to the state capital.
  Why should the student newspaper be allowed to decide what should be printed without prior approval from teachers?
Angelo & Cross (1993) have designed an extensive set of classroom assessment tasks (CATs) for
the college level that are performance-based in nature. Some examples given in Table 6
were taken from their inventory.
1. Completion Type
An item structure consists of a stimulus which defines the question or problem, and a
response which defines what is to be provided or constructed by the learner.
Stimulus: Incomplete statement with a blank(s)
Response: Single word or two, numeral, symbol, or phrase
Illustrative Items Expected Response
45
c. A book trader sells books for 30% more than what he pays for them. For a book sold
for ₱150, his profit is __________ pesos.
Gap-Filling is another term for this variant as the student fills several gaps in a discourse
depending on the target outcome.
Directions: Give a word that has the same meaning as the word inside the parenthesis.
More than a few people may confuse fine dining with _______________ (costly) dining in
restaurants. Well-trained _______________ (cooks) at the top of their profession can make
their good _______________ (name) in these places. Who the cooks are brings
_______________ (honor) to these restaurants.
b. The blank should be placed at the end or towards the end of the incomplete statement.
Poor: During the _______________ period, Dr. Jose Rizal wrote the novel, Noli Me
Tangere.
Stimulus: An interrogative statement (direct question)
Response: Short phrases or statement
Illustrative Items Expected Response
Writing short-answer items similarly follows the guidelines for writing completion items. Here are
those given by McMillan (2007, pp. 170-171); they are largely self-explanatory.
3. Do not use questions verbatim from textbooks and other instructional materials.
Example:
Poor: How much does the food caterer charge? _______________
Improved: How much does the food caterer charge per head? ___________
Poor: As viewed by creatures from the earth, when does the blood moon appear
in the evening?
Improved: When does a blood moon appear?
The two supply types, completion and short answer items, share common points:
Essay Type
Stimulus: a. Incomplete statement with a blank(s)
Response: Single word or two, numeral, symbol or phrase
B. Extended-Response Type. The question or directive does not suggest any form of
restriction in the construction of the response. The students are free to organize and
expound on their ideas.
Sample Item: 1. Explain how the prevailing socio-economic issues affect the lives of
the people in our country today.
Description of Expected Response: Student is free to focus on any socio-economic issue
and choose which aspect of the people’s lives he wants to describe.
Suggestions for constructing essay questions are given by Miller, Linn & Gronlund (2009, p.
243):
1. Restrict the use of essay questions to those learning outcomes that cannot be measured
satisfactorily by objective items.
2. Construct questions that will call forth the skills specified in the learning standards.
Example:
Poor: Why is copper a good material?
Improved: Explain the property of copper that makes it good for making cooking
pans.
The use of a scoring guide called a rubric can significantly reduce subjectivity and
help in “objectifying” the scoring of a non-objective type of item.
If the relevant criteria are singled out and rated separately to show the learner’s profile
across these different dimensions or attributes, analytic scoring is applied.
Organization
Clarity of Message
Creativity
Total
Overall Rating
For judging a specific writing genre like an argument, the rubric shown in Table 11 can be
adapted for analytical scoring.
Facts and opinions clearly distinguished
Credibility of source
Relevance of materials used
Use of logic
Overall Rating
When the attributes are considered together to arrive at an overall judgment or impression,
holistic scoring is in use.
Task: Design a plan for an experiment showing the effect of amount of water on
plant growth.
Scoring Criterion: Completeness of Plan
Rubric:
Label Description
Outstanding All parts of the plan especially the procedure are
concisely and very satisfactorily described.
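The difference between the two scoring approaches described above can be sketched in code. This is a hedged illustration only: the criteria names come from the analytic rubric in these notes, but the 1 to 4 rating scale, the constant `MAX_RATING`, and the function names are assumptions made for the example, not part of any cited rubric.

```python
# Criteria taken from the analytic rubric in the notes.
ANALYTIC_CRITERIA = ("Organization", "Clarity of Message", "Creativity")
MAX_RATING = 4  # hypothetical scale; the notes do not state one


def _check_rating(rating):
    """Reject ratings outside the assumed 1..MAX_RATING scale."""
    if not 1 <= rating <= MAX_RATING:
        raise ValueError(f"Rating {rating} is outside 1..{MAX_RATING}")
    return rating


def analytic_score(ratings):
    """Analytic scoring: rate each criterion separately, then total them.

    Returns the learner's profile across criteria and the summed total.
    """
    profile = {c: _check_rating(ratings[c]) for c in ANALYTIC_CRITERIA}
    return profile, sum(profile.values())


def holistic_score(overall_rating):
    """Holistic scoring: one overall judgment, with no per-criterion profile."""
    return _check_rating(overall_rating)


profile, total = analytic_score(
    {"Organization": 3, "Clarity of Message": 4, "Creativity": 2}
)
print(profile, total)     # per-criterion profile and a total of 9
print(holistic_score(3))  # a single overall rating
```

The analytic result keeps a profile that shows where the learner is strong or weak, while the holistic result is a single number or label, which is quicker to assign but less diagnostic.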
Miller, Linn & Gronlund (2009, p. 254) also give suggestions to improve the
reliability of scoring responses to essay questions:
There are three sub-types of the selected-response format depending on the number of
given options:
Variety: True – False
Stimulus: Statement or proposition
Sample/Suggested Response Format: The planets of the Solar System revolve around the sun. T F

Variety: Yes – No
Stimulus: Direct question
Sample/Suggested Response Format: Do the planets of the Solar System revolve around the sun? Y N

Variety: Right – Wrong
Stimulus: A computation, an equation, or statement
Sample/Suggested Response Format: Factors of 18 are 2, 3, 6, 9. R W

Variety: Correction
Stimulus: Proposition which will be corrected if incorrect
Sample/Suggested Response Format: The biggest planet in the Solar System is the Earth. R W
Correction: Jupiter

Variety: Multiple True – False
Stimulus: A multiple-choice stem is given with statements to be judged as true or false
Sample/Suggested Response Format: Corona virus is easily transmitted because:
1. It is air-borne. T F
2. It is transmitted through body liquids. T F
3. Both children and adults can be affected. T F
Suggestions are given for constructing good binary-choice items (McMillan, 2007; Musial
et al., 2009) in order to reduce guessing:
Good:
T F Standards are used to interpret criterion-
referenced tests.
Another selected-response item format is the multiple-choice. The wide use of this
format in classroom testing is mainly due to its versatility in assessing various levels of
understanding, from knowledge and simple understanding to deep understanding.
Stimulus Response
STEM –
Among the Asian countries, one which has a government with three branches is
_______________.
a. Japan c. Philippines
b. China d. Thailand
Test experts agree on a set of guidelines to achieve this purpose (McMillan, 2007; Miller, Linn
& Gronlund, 2009; Popham, 2011).
Stem
2. Stem should be meaningful by itself and should fully contain the problem.
Poor: The constitution is _______________.
Good: What does the constitution of an organization provide?
(Direct-question format)
The constitution of an organization provides _______________. (Incomplete-
statement format)
3. The stem should use a question with only one correct or clearly best answer.
Poor:
Which product of Thailand makes it economically stable?
A. Rice
B. Dried fruits
C. Dairy products
D. Ready-to-wear
Good:
Which agricultural product of Thailand is most productive for export?
A. Rice
B. Fish
C. Fruits
D. Vegetables
Poor:
What is matter?
A. Everything that surrounds us.
B. All things bright and beautiful.
C. Things we see and hear.
D. Anything that occupies space and has mass.
Quite interesting are the guidelines given by Miller, Linn & Gronlund (2009, p. 212) in making
distracters plausible. See Table 14.
5. Use incorrect answers that are likely to result from student misunderstanding or
carelessness (e.g. forgets to convert feet to yards).
CAUTION: Distracters should distract the uninformed, but they should not result in trick
questions that mislead knowledgeable students (do not insert ‘not’ in a correct answer
to make a distracter).
3. Matching Items
A matching item consists of two parallel lists of words or phrases that the students are
tasked to pair. The first list, which is to be matched, is referred to as the premises, while
the other list, from which the match is chosen based on some kind of association, is the options.
The first column describes events associated with Philippine presidents while the second
column gives their names. In the space provided, write the letter of the president that
matches the description.
Column A Column B
Column A contains theoretical postulations of how the universe came about. Match each
one with the name of the theory given in Column B. Indicate the appropriate letter to the
left of the number in Column A.
Column A Column B
1. Keep the list of premises and the list of options homogeneous or belonging to a category.
2. Keep the premises always in the first column and the options in the second column.
3. Keep the lists in the two columns unequal in number.
4. Test directions always describe the basis for matching.
5. Keep the number of premises not more than eight (8) as shown in the two sample items.
6. Ambiguous lists should be avoided.
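Two of the guidelines above (unequal lists and at most eight premises) are mechanical enough to check automatically. The following is a minimal sketch under that assumption; the function name `check_matching_item` and the placeholder premises and options are hypothetical, and the remaining guidelines (homogeneity, clear directions, unambiguous lists) still need human judgment.

```python
def check_matching_item(premises, options, max_premises=8):
    """Return a list of guideline violations; an empty list means none found.

    Only two guidelines are checked:
    - the two lists should be unequal in number (guideline 3)
    - the premises should number no more than max_premises (guideline 5)
    """
    problems = []
    if len(premises) == len(options):
        problems.append("Premises and options are equal in number.")
    if len(premises) > max_premises:
        problems.append(f"More than {max_premises} premises.")
    return problems


# Placeholder content only; a real item would use actual premises and options.
premises = [f"premise {i}" for i in range(1, 6)]   # 5 premises
options = [f"option {c}" for c in "ABCDEFG"]       # 7 options
print(check_matching_item(premises, options))      # no violations found
```

Keeping the lists unequal means the last pair cannot be answered by elimination, which is the reason behind that guideline.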
It can be seen that matching type as a test format is used quite appropriately in assessing
knowledge outcomes particularly for recall of terminologies, classifications, and
remembering facts, concepts, principles, formulae, and associations.
Source: De Guzman, E. & Adamos, J. (2015). Assessment of Learning 1. Adriana Publishing Company:
Quezon City