

Test Specifications
NATHANIEL OWEN

Framing the Issue

The development of criterion-referenced testing (CRT) led to the practice of relating test-taker performance directly to predefined knowledge, skills, or abilities (KSA). The focus on clearly defined KSA led to the widespread use of test specifications, which are now an essential tool for the development of valid and reliable tests. A test specification (“spec”) is the iterative, generative blueprint of test creation (Davidson & Lynch, 2002). Specs are the procedural guidelines used to create multiple versions of a test for different groups of test takers. Specs are iterative in that they are not “fixed” once written and may be subject to revision as the test evolves. The metaphor of blueprints as an explanation for test specs is widely used and serves as a useful conceptual reference point (Alderson, Clapham, & Wall, 1995; Davidson & Lynch, 2002; Fulcher & Davidson, 2007; Davidson, 2012). Each item type will have its own spec, and the complete test will contain a variety of these item types representing the test developers’ conception of the target domain. Individual specs may vary in their granularity (level of detail) depending on the needs of test developers or item writers.
A basic test specification should provide information about the content of the
test, including the number of items, the variety of item types, the format of the test,
how the test is delivered to test takers (paper-based or computer), and information
about any input material (texts) that will be presented to test takers as part of the
test. These are termed content-based specifications (Raymond & Neustel, 2006, pp.
201–3). A test specification at a fine level of granularity should show how individual
KSA within the target domain have been operationalized at the item level. Specs at
this level should also reference the number of items required to make claims about
these individual KSA (Davidson, 2012). Specs that detail the KSA that test takers
require to complete specific items are termed process-based test specifications
(Raymond & Neustel, 2006, p. 202). Alternatively, specs may contain information
related to both content and process. Specs for a complete test are termed a table of
specifications. The specification is used as evidence of a complete chain of inferential
reasoning between observed variables (test-taker responses) and claims made
about test takers in relation to the target domain. This is known as the validity argument (Messick, 1989; Kane, 1992), although it is important to stress that the
specs as a record of test content are only one form of evidence and do not constitute
sufficient evidence for a complete validation argument.

Making the Case

A detailed set of specs represents the test developers’ conception of the domain of
interest. The domain is the area of language about which we wish to make predictions on the basis of test scores (e.g., reading for academic purposes) (Fulcher &
Davidson, 2007, p. 371). This is an important part of test reliability, as specs enable
developers to alter the content of the test without altering the validity argument
(how the test represents the domain of interest). Spec-driven testing provides the
rationale for comparing results across different versions of the same test. In order
to compare across versions of a test, the two (or more) instruments must share
statistical properties. In general, sections of a test reflect different elements of the
target domain.
The level of detail required in the spec should be determined by the test developers by considering who the spec is being written for (its stakeholders). Stakeholders may include teachers, students, administrators, test developers, item writers, policy makers, and even members of the public who may need to be convinced of the face validity of a test. At the institutional level, some of these roles may overlap, as tests are frequently produced by groups of teachers who have been asked to take on item-writing responsibilities in addition to their teaching and lesson preparation workload. The specs will therefore have a strong content focus to enable the reproduction of many forms of the same test. Detailed specs can be time-consuming to produce, potentially requiring significant planning before a test needs to be implemented.
No universal framework exists for the construction of a test specification; as a result, no two specs will look exactly alike, and their production is not an exact science. Nonetheless, guidelines exist that assist test developers in the creation of a spec. The most widely cited guidelines regarding the content of a spec are those of Popham (1978), further developed by Davidson and Lynch (2002). These guidelines contain five key components, defined in Table 1.
Tests may initially be produced without reference to test specifications. One
solution to this issue is reverse engineering. Reverse engineering (RE) may be defined
as “the creation of a test specification from representative sample items” (Davidson
& Lynch, 2002, p. 41). RE involves the analysis of individual items to attempt to
identify those KSA that are deemed to be important to the target domain, as well as
how they have been realized at the item level. This “RE spec” can then be used as a
guideline to produce similar items or to produce a representative bank of sample
items for future use. RE can constitute a form of evidence in a validation argument, establishing a direct link between test items and the overall aims of the test, or it may be used to critique existing specs or guidelines (Davidson & Lynch, 2002, pp. 42–3).


Table 1 Test specification content.

1. General description (GD): A detailed description of what is to be tested, conveying the purpose and motivation of the test.
2. Prompt attribute (PA): What is given to the test taker; what they will see on their screen or test paper. It is this “stimulus” that triggers the response to be measured (also termed the prompt stimulus).
3. Response attribute (RA): What the test taker will do; this may be a selected response (e.g., choosing one multiple-choice option) or a constructed response (e.g., an extended piece of writing).
4. Sample item (SI): A manifest example that “brings to life” (Davidson & Lynch, 2002, p. 26) the three previous components.
5. Specification supplement (SS): Any additional information not contained in the previous sections, such as details of the types of text to be selected, or other information that would make the other sections too bloated.

Adapted from Davidson and Lynch (2002, p. 14).
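Where specs are stored electronically (e.g., in an item bank), the five components map naturally onto a structured record. The following Python sketch is purely illustrative: the class and field names are hypothetical and assume nothing beyond the components in Table 1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TestSpec:
    """One spec per item type, mirroring the five components in Table 1."""
    general_description: str            # GD: purpose and motivation of the test
    prompt_attribute: str               # PA: what the test taker is given (the stimulus)
    response_attribute: str             # RA: what the test taker will do with the prompt
    sample_items: List[str] = field(default_factory=list)   # SI: manifest examples
    specification_supplement: str = ""  # SS: anything that would bloat the sections above

# A complete test is described by one spec per item type;
# together these form a "table of specifications".
table_of_specifications: List[TestSpec] = []
```

Because specs are iterative rather than fixed, versioning such records as they are revised matches the way Davidson and Lynch describe spec evolution.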

Pedagogical Implications

Owen (2011) attempted to codify an approach to RE: since test development occurs in principled stages, test deconstruction should also follow rigorous procedures. Figure 1 is an example of a spec written for a multiple-choice reading question from an IELTS test. It was written, following Davidson and Lynch’s guidelines, from a collection of IELTS test materials and without access to the original spec.
The GD describes the IELTS reading component generally, stating the purpose of the test and the individual sub-skills identified as part of the domain of interest. It also provides a brief overview of the item types found in the test. In this example, the PA is highly detailed. It provides details of the texts that will be presented to the test takers (input material). Multiple-choice items were examined for consistency in design patterns, and ten design principles were identified to which all of the IELTS multiple-choice items adhered. The design principles relate to characteristics of the stem, key, distractors, and grammatical features, and to how each of these interacts with the relevant portions of the text. The RA is shorter than the PA, reflecting the minimal observed behavior expected in response to this item. The work product in this case involves selecting one option from four and writing the selected option on the relevant portion of the answer sheet. The SI in this case is a new multiple-choice item written according to these new guidelines; the key is marked by an asterisk. The SS provides additional information that developers should consider including: the KSA and textual references that the item writers identified as essential to completing specific items. Specific terms are included to show how the key coheres with this part of the text.


General Description (GD)

The IELTS academic reading paper is a method of assessing the academic reading ability of test takers. It is a criterion-referenced test that states the degree to which a test taker demonstrates attainment of specified criteria. As a test of academic reading, the linguistic skills tested are as follows: skimming; scanning; understanding of vocabulary and synonyms; grammatical and lexical cohesion; spelling; logical inference; understanding of the principal message; differentiating between main ideas and supporting ones; and understanding stages in a progressive argument. These skills are tested through a variety of question types, including: multiple choice, identifying information, identifying writer’s views/claims, matching information, matching headings, matching features, matching sentence endings, sentence completion, summary completion, note completion, table completion, flow-chart completion, diagram label completion, and short-answer questions.

Prompt Attribute (PA) – Multiple-choice items

Skills tested: detailed/overall understanding of main points (synonym & paraphrasing skills; logical inference).

• Instructions should be consistent and unambiguous.
• The stem may use words directly from the text or may paraphrase.
• Each stem should be followed by four choices: the key and three distractors (A-D). The location of the key should be randomized.
• The key should be of approximately the same length as the distractors.
• The stem should not contain clues to the key (either by including words from the key or by writing distractors that are not grammatically consistent with the stem).
• The stem should be an incomplete sentence, although it may contain more than one clause. The key and distractors must all be plausible and grammatically accurate.
• The key and distractors may be clauses or noun phrases, as is consistent with the stem, although they should all follow the same grammatical pattern.
• Multiple-choice questions may relate to any text in the test and may refer to any paragraph within that text. The question may specify a particular paragraph in which the answer can be located, may refer to a particular author or author’s research mentioned in the text, or may ask for a suitable title.
• Both the stem and the key should be paraphrases of information contained within the text. The key can be negative. The answer should not be obtainable by simply matching words; meaning must be congruent.
• Distractors may use language directly from the text; however, these lexical items must be located in the same paragraph as the information required to correctly select the key.
• Distractors may have the opposite meaning to the key, but should avoid using similar language (such as simply adding a negative prefix).

Response Attribute (RA)

The test taker will select one option from the four available. Test takers will record their answer by filling in the appropriate section of the answer sheet. Linking the stem with any of the distractors produces plausible and grammatical (albeit inaccurate) sentences in relation to the text. The test taker will identify references to the distractors throughout the text, although they do not use language directly from the text. Only the key provides the coherent meaning between the item and the intended textual reference.


Sample Item

Choose the correct letter, A, B, C or D.

14. Pat Mulroy criticises the ‘Law of the River’ because

A. it came into effect after 20 years of record water flows.
B. it unfairly distributes water between rival states.*
C. it allocates more water than is available in the river.
D. it creates rivalry between urban and rural users.

Specification Supplement for item 14 (text not included)

Textual reference [“Pat Mulroy criticises”] leads the reader to paragraph ten (direct quotation); textual cohesion [“get cut” = “distribute”; “idiocy” = “unfairly”]; textual coherence [“this idiocy” = “The law’s seniority rules…lost a drop”]. Option D is not possible [“creates rivalry” does not cohere with “theoretically”].

Skills relevant to the target domain: Understanding of main point; logical inference; synonym recognition
(lexical knowledge); textual cohesion and coherence.

Figure 1 A reverse-engineered IELTS reading specification for a multiple-choice item.

Additional information explaining why the distractors are incorrect is also included. Including this hypothetical item completion process provides teachers with the means to critique and test items in class, to determine whether test takers are completing the item according to the intended processes. The use of a glossary for technical terms and the emphasis on textual references to explain item completion reflect the developers’ desire to minimize the influence of schematic (background) knowledge on the outcome of such items. Such language can be pre-taught in classrooms as part of a syllabus.
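To show how design principles of this kind can be made operational, the sketch below checks a drafted multiple-choice item against three of the PA principles listed in Figure 1. It is a minimal illustration of the idea, not part of the IELTS spec or Owen’s study, and all class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MCItem:
    """A drafted multiple-choice item (hypothetical structure, not the IELTS format)."""
    stem: str
    options: List[str]  # the key plus three distractors, in presentation order
    key_index: int      # position of the key within options

STOPWORDS = {"the", "a", "an", "of", "to", "in", "it", "is", "and", "or", "because"}

def check_item(item: MCItem) -> List[str]:
    """Return human-readable warnings for violations of three PA principles."""
    warnings = []
    # Principle: each stem is followed by four choices (the key and three distractors).
    if len(item.options) != 4:
        warnings.append(f"Expected 4 options, found {len(item.options)}.")
    key = item.options[item.key_index]
    distractors = [o for i, o in enumerate(item.options) if i != item.key_index]
    # Principle: the key should be approximately the same length as the distractors.
    mean_len = sum(len(d) for d in distractors) / len(distractors)
    if abs(len(key) - mean_len) > 0.5 * mean_len:
        warnings.append("Key length differs noticeably from the distractors.")
    # Principle: the stem should not contain clues to the key (shared content words).
    stem_words = {w.lower().strip(".,'") for w in item.stem.split()}
    key_words = {w.lower().strip(".,'") for w in key.split()}
    shared = (stem_words & key_words) - STOPWORDS
    if shared:
        warnings.append(f"Stem and key share content words: {sorted(shared)}.")
    return warnings

# Checking the sample item from Figure 1 produces no warnings.
item = MCItem(
    stem="Pat Mulroy criticises the 'Law of the River' because",
    options=[
        "it came into effect after 20 years of record water flows.",
        "it unfairly distributes water between rival states.",
        "it allocates more water than is available in the river.",
        "it creates rivalry between urban and rural users.",
    ],
    key_index=1,
)
print(check_item(item))  # -> []
```

A check like this can only capture the mechanical principles; principles that require judgment, such as whether meaning is congruent with the text, remain the item writer’s task.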

Conclusion

Working with a consistent series of blueprints for a test helps to ensure valid and
reliable tests. The test development process is most successful when it involves a
range of stakeholders, including teachers. The spec format does not have to follow
the one presented here, but it should contain sufficient elements for other stakeholders to produce equivalent items or critique the test design or content.
SEE ALSO: The Assessment Development Process; General Principles; Reliability; Validity


References

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.
Cambridge, England: Cambridge University Press.
Davidson, F. (2012). Test specifications and criterion-referenced assessment. In G. Fulcher &
F. Davidson (Eds.), The Routledge handbook of language testing. London, England: Routledge.
Davidson, F., & Lynch, B. (2002). Testcraft: A teacher’s guide to writing and using language test
specifications. New Haven, CT: Yale University Press.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment. London, England:
Routledge.
Kane, M. (1992). An argument-based approach to validation. Psychological Bulletin, 112,
527–35.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103).
New York, NY: American Council on Education/Macmillan.
Owen, N. (2011). Reverse engineering: A model and case study (Unpublished MA dissertation).
University of Leicester.
Popham, W. J. (1978). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice Hall.
Raymond, M. R., & Neustel, S. (2006). Determining the content of credentialing examinations. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development. Mahwah, NJ: Lawrence Erlbaum.

Suggested Readings

Fulcher, G. (2015). Re-examining language testing: A philosophical and social inquiry. London,
England: Routledge. This work tackles the fundamental issues underlying the field of
language testing. It goes beyond the technical considerations of designing and
implementing a language test to consider the ethical and philosophical questions around
testing and the claims that we wish to make about test takers. The work is a fantastic primer for those new to the field and brings together key works in philosophy (chapter 1),
science (chapter 2), and linguistics (chapter 3) that have contributed to making the field
of language testing what it is today.


ABSTRACT

Test specifications are iterative, generative blueprints of test design. They are written at the item level and allow test developers or item writers to produce new versions of a test for different test-taking populations. The specs serve as guidelines so that new versions can be compared to previous versions. Specs may detail the properties of individual items or the knowledge, skills, and abilities that are encoded in specific items. A series of specs that makes up a complete picture of a test is known as a table of specifications. Various models have been proposed for writing specs. This entry uses the model proposed by Davidson and Lynch (2002), with a worked example of an IELTS reading test specification for a multiple-choice item, to elucidate what a spec looks like and how it can be written.

KEYWORDS

assessment, criterion-referenced assessment, reliability, validity argument


