Chapter 4 Tests and Testing

Chapter 4 discusses the definitions and distinctions between psychological traits and states, emphasizing the importance of clear definitions in test development. It covers the principles of norm-referenced and criterion-referenced testing, highlighting the need for reliability and validity in psychological assessments. The chapter also addresses potential sources of error in measurement and the implications for interpreting test scores.
Psychological Assessment

Chapter 4 Tests and Testing

• A trait has been defined as “any distinguishable, relatively enduring way in which one individual varies from another” (Guilford, 1959, p. 6).
• States also distinguish one person from another but are relatively less enduring (Chaplin et al., 1988).
• Psychological traits include those that relate to intelligence, specific intellectual abilities, cognitive style, adjustment, interests, attitudes, sexual orientation and preferences, psychopathology, personality in general, and specific personality traits.
• A psychological trait exists only as a construct: an informed, scientific concept developed or constructed to describe or explain behavior. We can’t see, hear, or touch constructs, but we can infer their existence from overt behavior.
• In this context, overt behavior refers to an observable action or the product of an observable action, including test- or assessment-related responses.
• The definitions of trait and state we are using also refer to a way in which one individual varies from another. Attributions of a trait or state term are relative.
• Once it’s acknowledged that psychological traits and states do exist, the specific traits and states to be measured and quantified need to be carefully defined.
• Test developers and researchers, much like people in general, have many ways of looking at and defining the same phenomenon.
• In different contexts, a term such as aggressive carries a different meaning. If a personality test yields a score purporting to provide information about how aggressive a testtaker is, a first step in understanding the meaning of that score is understanding how aggressive was defined by the test developer.
• Having defined the trait, state, or other construct to be measured, a test developer considers the types of item content that would provide insight into it.
• From a universe of behaviors presumed to be indicative of the targeted trait, a test developer has a world of possible items that can be written to gauge the strength of that trait in testtakers.
• Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results.
• For many varieties of psychological tests, some number representing the score on the test is derived from the examinee’s responses.
• Inherent in cumulative scoring is the assumption that the more the testtaker responds in a particular direction, as keyed by the test manual as correct or consistent with a particular trait, the higher that testtaker is presumed to be on the targeted ability or trait.
• The objective of such tests typically has little to do with predicting future grid-blackening or key-pressing behavior. Rather, the objective of the test is to provide some indication of other aspects of the examinee’s behavior.
• For example, patterns of answers to true–false questions on one widely used test of personality are used in decision making regarding mental disorders.
• The obtained sample of behavior is typically used to make predictions about future behavior, such as the work performance of a job applicant. It is beyond the capability of any known testing or assessment procedure to reconstruct someone’s state of mind.
• Still, behavior samples may shed light, under certain circumstances, on someone’s state of mind in the past.
• Additionally, other tools of assessment, such as case history data or the defendant’s diary during the period in question, might be of great value in such an evaluation.
• Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.
• All of this may sound quite commonsensical, and it probably is. Yet this deceptively simple assumption, that test users know the tests they use and are aware of the tests’ limitations, is emphasized repeatedly in the codes of ethics of associations of assessment professionals.
• In what is referred to as classical test theory (CTT; also referred to as true score theory), the assumption is made that each testtaker has a true score on a test that would be obtained but for the action of measurement error.
• Today all major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual. However, despite the best efforts of many professionals, fairness-related questions and problems do occasionally arise.
• One source of fairness-related problems is the test user who attempts to use a particular test with people whose background and experience are different from the background and experience of people for whom the test was intended.
• Some potential problems related to test fairness are more political than psychometric.
• In a world without tests, teachers and school administrators could arbitrarily place children in different types of special classes simply because that is where they believed the children belonged.
• In a world without tests, there would be a great need for instruments to diagnose educational difficulties in reading and math and point the way to remediation.
• In a world without tests, there would be no instruments to diagnose neuropsychological impairments.
• In a world without tests, there would be no practical way for the military to screen thousands of recruits about many key variables.
• Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests, especially good tests.
• Logically, the criteria for a good test would include clear instructions for administration, scoring, and interpretation.
• It would also seem to be a plus if a test offered economy in the time and money it took to administer, score, and interpret it. Most of all, a good test would seem to be one that measures what it purports to measure.
• Beyond simple logic, there are technical criteria that assessment professionals use to evaluate the quality of tests and other measurement procedures. Test users often speak of the psychometric soundness of tests, two key aspects of which are reliability and validity.
• The criterion of reliability involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements.
• In theory, the perfectly reliable measuring tool consistently measures in the same way.
• We want to be reasonably certain that the measuring tool or test that we are using is consistent. That is, we want to know that it yields the same numerical measurement every time it measures the same thing under the same conditions.
• A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure.
• A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty. A good test is a useful test, one that yields actionable results that will ultimately benefit individual testtakers or society at large.
• If the purpose of a test is to compare the performance of the testtaker with the performance of other testtakers, then a “good test” contains adequate norms. Also referred to as normative data, norms provide a standard with which the results of measurement can be compared.
• Norm-referenced testing and assessment is a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to scores of a group of testtakers.
• In this approach, the meaning of an individual test score is understood relative to other scores on the same test.
• A common goal of norm-referenced tests is to yield information on a testtaker’s standing or ranking relative to some comparison group of testtakers.
• Norm in the singular is used in the scholarly literature to refer to behavior that is usual, average, normal, standard, expected, or typical.
• Reference to a particular variety of norm may be specified by means of modifiers such as age, as in the term age norm.
• Norms is the plural form of norm, as in the term gender norms.
• Norms are the test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores.
• As used in this definition, the “particular group of testtakers” may be defined broadly.
• A normative sample is that group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers.
• The verb to norm, as well as related terms such as norming, refers to the process of deriving norms. Norming may be modified to describe a particular type of norm derivation.
• For example, race norming is the controversial practice of norming on the basis of race or ethnic background. Race norming was once engaged in by some government agencies and private organizations, and the practice resulted in the establishment of different cutoff scores for hiring by cultural group.
• The process of administering a test to a representative sample of testtakers to establish norms is referred to as standardization or test standardization.
• Sampling: In the process of developing a test, a test developer has targeted some defined group as the population for which the test is designed. This population is the complete universe or set of individuals with at least one common, observable characteristic.
• The test developer can obtain a distribution of test responses by administering the test to a sample of the population: a portion of the universe of people deemed to be representative of the whole population.
• The size of the sample could be as small as one person, though samples that approach the size of the population reduce the possible sources of error due to insufficient sample size. The process of selecting the portion of the universe deemed to be representative of the whole population is referred to as sampling.
• If we arbitrarily select some sample because we believe it to be representative of the population, then we have selected what is referred to as a purposive sample.
• An incidental sample or convenience sample is one that is convenient or available for use.
• One note on terminology is in order before moving on. When the people in the normative sample are the same people on whom the test was standardized, the phrases normative sample and standardization sample are often used interchangeably.
• Increasingly, however, new norms for standardized tests for specific groups of testtakers are developed some time after the original standardization.
• A percentile is an expression of the percentage of people whose score on a test or measure falls below a particular raw score.
• Percentage correct refers to the distribution of raw scores; more specifically, to the number of items that were answered correctly multiplied by 100 and divided by the total number of items.
• Also known as age-equivalent scores, age norms indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered.
• Designed to indicate the average test performance of testtakers in each school grade, grade norms are developed by administering the test to representative samples of children over a range of consecutive grade levels (such as first through sixth grades).
• Both grade norms and age norms are referred to more generally as developmental norms, a term applied broadly to norms developed based on any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.
• In the fields of psychology and education, national norms are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.
• An equivalency table for scores on two different tests, or national anchor norms, could provide the tool for comparing them. Just as an anchor provides some stability to a vessel, so national anchor norms provide some stability to test scores by anchoring them to other test scores.
• Using the equipercentile method, the equivalency of scores on different tests is calculated with reference to corresponding percentile scores.
• A normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample. What results from such segmentation are more narrowly defined subgroup norms.
• Local norms provide normative information with respect to the local population’s performance on some test.
• Another type of aid in providing a context for interpretation is termed a fixed reference group scoring system.
• Here, the distribution of scores obtained on the test from one group of testtakers, referred to as the fixed reference group, is used as the basis for the calculation of test scores for future administrations of the test.
• We may define a criterion as a standard on which a judgment or decision may be based.
• Criterion-referenced testing and assessment may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard.
• Because the focus in the criterion-referenced approach is on how scores relate to a particular content area or domain, the approach has also been referred to as domain- or content-referenced testing and assessment.
• In the context of assessment, error need not refer to a deviation, an oversight, or something that otherwise violates expectations.
• To the contrary, error traditionally refers to something that is more than expected; it is a component of the measurement process.
• More specifically, error refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.
• Test scores are always subject to questions about the degree to which the measurement process includes error.
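
The definitions of percentage correct, percentile, and criterion-referenced interpretation given in this chapter can be contrasted with a small worked example. The sketch below is illustrative only. The function names, the ten-person normative sample, the 40-item test length, and the cutoff of 32 are hypothetical values chosen to make the arithmetic easy to follow.

```python
def percentage_correct(items_correct, total_items):
    """Items answered correctly, multiplied by 100 and divided by total items."""
    return items_correct * 100 / total_items


def percentile(raw_score, normative_scores):
    """Percentage of people in the normative sample who scored below raw_score."""
    below = sum(1 for s in normative_scores if s < raw_score)
    return below * 100 / len(normative_scores)


def meets_criterion(raw_score, cutoff):
    """Criterion-referenced interpretation: compare the score with a set standard."""
    return raw_score >= cutoff


# Hypothetical raw scores of a ten-person normative sample on a 40-item test.
normative_sample = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]
raw = 30

print(percentage_correct(raw, 40))        # 75.0 (about the test content)
print(percentile(raw, normative_sample))  # 70.0 (about standing among testtakers)
print(meets_criterion(raw, cutoff=32))    # False (about a set standard)
```

The same raw score of 30 reads three ways: 75 percent of items correct, higher than 70 percent of the normative sample, and below the hypothetical cutoff. The first and third interpretations ignore other testtakers, while the norm-referenced one depends entirely on who is in the normative sample.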
• For example, an intelligence test score could be subject to debate concerning the degree to which the obtained score truly reflects the examinee’s intelligence and the degree to which it was due to factors other than intelligence.
• Because error is a variable that must be taken into account in any assessment, we often speak of error variance, that is, the component of a test score attributable to sources other than the trait or ability measured.
• Measurement professionals tend to view error as simply an element in the process of measurement, one for which any theory of measurement must surely account.
• A rating is a numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a rating scale.
• A rating error is a judgment resulting from the intentional or unintentional misuse of a rating scale.
• A leniency or generosity error is an error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading.
• A severity error, on the other hand, is an error in rating that arises from the tendency on the part of the rater to be extremely strict in scoring, marking, and/or grading.
• A central tendency error is a judgment resulting from a general and systematic reluctance to give ratings at either the positive or the negative extreme. Consequently, all of this rater’s ratings would tend to cluster in the middle of the rating continuum.
• The halo effect is a tendency to give a particular ratee a higher rating than he or she objectively deserves because of the rater’s failure to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior.
• Self-handicapping is a tendency in which test takers, faced with the expectation that they may not perform well, reduce their level of effort. Not trying hard offers a ready explanation for poor performance: to protect their self-worth, they give themselves an alternative explanation for a disappointing result.
• The amount of linguistic demand can put non-English speakers at a disadvantage. Even for tests that do not require verbal responses, it is important to consider the extent to which test instructions assume that the test taker understands English.
• Expectancy effects are often called Rosenthal effects. The literature on stereotype threat concentrates on how cues in the testing environment can affect the test taker.
• Beliefs held by people administering and scoring tests might also get translated into inaccurate test scores.
• Because reinforcement affects behavior, testers should always administer tests under controlled conditions. Several studies have shown that reward can significantly affect test performance.
• Test anxiety may be a serious source of error: factors such as motivation and anxiety can greatly affect test scores. Students can suffer from this debilitating condition.
• It may seem obvious that illness affects test scores. When you have a cold or the flu, you might not perform as well as when you are feeling well. Many variations in health status affect performance in behavior and in thinking.
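
The rating errors described in this chapter (leniency, severity, and central tendency) leave different footprints in a rater's scores, which a small example can make concrete. The sketch below is a rough illustration, not an established diagnostic procedure. The 1-to-7 scale, the `describe_rater` function, and the three sets of ratings are all hypothetical.

```python
import statistics


def describe_rater(ratings, scale_min=1, scale_max=7):
    """Summarize where a rater's ratings sit on the rating continuum."""
    midpoint = (scale_min + scale_max) / 2
    return {
        # Positive values lean toward the top of the scale, negative toward the bottom.
        "mean_minus_midpoint": statistics.mean(ratings) - midpoint,
        # A small spread means the ratings cluster together.
        "spread": statistics.pstdev(ratings),
        # Whether either end of the scale was ever used.
        "used_extremes": scale_min in ratings or scale_max in ratings,
    }


# Hypothetical ratings given by three raters to the same six ratees.
lenient_rater = [6, 7, 6, 7, 7, 6]
severe_rater = [1, 2, 1, 2, 2, 1]
central_rater = [4, 4, 3, 4, 5, 4]

for name, ratings in [("lenient", lenient_rater),
                      ("severe", severe_rater),
                      ("central", central_rater)]:
    print(name, describe_rater(ratings))
```

A pattern like these only raises the question of a rating error. Consistently high ratings, for instance, may simply mean the ratees performed well, so such summaries are best read alongside other raters' judgments of the same people.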
