Robert J. Drummond, Carl J. Sheperis, Karyn D. Jones - Assessment Procedures For Counselors and Helping Professionals
Assessment Procedures
for Counselors and
Helping Professionals
Eighth Edition
Robert J. Drummond
Late of University of North Florida
Carl J. Sheperis
Lamar University
Credits and acknowledgments for material borrowed from other sources and reproduced, with permission, in
this textbook appear on the appropriate page within the text.
Every effort has been made to provide accurate and current Internet information in this book. However, the
Internet and information posted on it are constantly changing, so it is inevitable that some of the Internet
addresses listed in this textbook will change.
Copyright © 2016, 2010, 2006 by Pearson Education, Inc. or its affiliates. All Rights Reserved. Manufactured in
the United States of America. This publication is protected by Copyright, and permission should be obtained
from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any
form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding
permissions, request forms, and the appropriate contacts within the Pearson Education Global Rights &
Permissions department, please visit www.pearsoned.com/permissions.
PEARSON and ALWAYS LEARNING are exclusive trademarks in the U.S. and/or other countries owned by
Pearson Education, Inc. or its affiliates.
—CJS
Preface
In Assessment Procedures for Counselors and Helping Professionals, our goal is to help
current and future school counselors, marriage and family therapists, mental health
counselors, career counselors, and other helping professionals become better consum-
ers of the various methods and procedures used in the process of assessment. Assess-
ment occurs in many settings, such as schools, mental health clinics, career counseling
centers, substance abuse treatment centers, private practice, psychiatric hospitals, and
vocational rehabilitation centers. Assessment is an integral part of the counseling pro-
cess in which the counselor and client work together to gain a better understanding of
the client’s problems. We believe that effectiveness and accuracy in assessment are
essential to effective counseling. Throughout the text, we stress that assessment is
more than simply giving tests. Assessment involves collecting and integrating infor-
mation about an individual from multiple methods and multiple sources. Throughout this
textbook, our aim is to provide students with an overview of the many approaches
to assessment so they can become competent and ethical practitioners in our multicul-
tural society.
This textbook has three goals. The first goal is to supply foundational information
about assessment, which includes an overview of the various methods and sources of
assessment information. In addition, students must learn some basic principles of
measurement in order to understand the applications and issues in assessment. Thus,
we also provide foundational information about statistical concepts, test scores, and
the psychometric aspects of assessment (e.g., validity and reliability). The second goal
of this textbook is to present an overview of the general areas in which assessment is
commonly utilized, such as in assessing intellectual ability, achievement, aptitude,
career interests and skills, and personality. The third goal is to provide students with
information about specific assessment applications and issues, such as clinical assess-
ment, communicating assessment results, assessment with diverse populations, and
ethical and legal issues.
To meet these goals, the textbook is divided into three parts that provide a balance of
theory and practice information as well as coverage of the assessment instruments and
strategies commonly used in school counseling, clinical mental health counseling, and
vocational or career counseling settings. These sections include Principles and Founda-
tions of Assessment, Overview of Assessment Areas, and Applications and Issues.
Assessment Areas
Part Two of the textbook, Overview of Assessment Areas, builds on the Principles and
Foundations section by exploring specific assessment areas. Chapter 8 supplies infor-
mation about assessing intellectual ability, including the major theories of intelli-
gence, the major tests of intelligence (e.g., the Wechsler scales, the Stanford–Binet, the
Kaufman tests), and special issues in intelligence testing. Chapter 9 covers assessment
of achievement, including achievement test batteries, individual achievement tests,
diagnostic achievement tests, subject-area tests, and other types of achievement tests.
Chapter 10 presents information about aptitude assessment. Extensive changes in
U.S. social and economic conditions may result in more counselors working with cli-
ents on career-related issues; thus, Chapter 11 provides important information about
career and employment assessment. The last chapter in this section, Chapter 12,
focuses on personality assessment and the many types of personality instruments and
techniques.
Acknowledgments
I would like to thank my publisher, Kevin Davis, for believing in me and giving me the
chance to revise such an esteemed book. This book influenced me during my graduate
training, and now I have the privilege to revise it for the eighth edition. Thanks to the late
Robert Drummond for his many contributions to the assessment world and for authoring
such a foundational textbook. I would also like to thank Melinda Rankin for her excellent
copyediting skills. She has an amazing eye for detail and a gentle way of helping me to see
my own writing errors. Finally, I would like to thank the following colleagues, whose
reviews improved this edition: Donald Deering, Oakland University and University of
Phoenix; Josué R. Gonzalez, Clinical Psychologist—San Antonio, Texas; Dawn C. Lorenz,
Penn State University; Diane Kelly-Riley, Washington State University; and Anthony
Tasso, Fairleigh Dickinson University.
CJS
About the Authors
Robert J. Drummond
Dr. Robert Drummond passed away on March 14, 2005. He was a retired professor and
counselor educator at the University of North Florida for 20 years. He was foremost in the
field of assessment, and he specialized in educational and psychological testing, career
development, models for evaluation, educational research, and personality theory and
measurement. Dr. Drummond wrote the first edition of this text in 1988. Now in its eighth
edition, the book remains a popular assessment textbook in counseling.
Carl J. Sheperis
Dr. Carl J. Sheperis serves as Chair of the Department of Counseling and Special Popula-
tions at Lamar University. He is a past president of the Association for Assessment and
Research in Counseling, associate editor for quantitative research for the Journal of Coun-
seling and Development, and a director for the National Board for Certified Counselors. He
has worked with the American Counseling Association as the chair of the Research &
Knowledge Committee and has served as the editor of the Journal of Counseling Research
and Practice.
In addition to this textbook, Dr. Sheperis is an author of Research in Counseling: Quan-
titative, Qualitative, and Mixed Methods; Clinical Mental Health Counseling: Fundamentals of
Applied Practice; DSM Disorders in Children and Adolescents; and The Peace Train. He is also
published in various textbooks, academic journals, and reference volumes. A frequent
speaker and presenter at professional conferences and workshops as well, Carl Sheperis
has appeared at such recent events as the American Counseling Association World Con-
ference, The International Autism Conference, the Association for Counselor Education
and Supervision Conference, the National Assessment Conference, and the National Head
Start Conference.
Contents
Chapter 5 Reliability/Precision
Reliability
Measurement Error
Sources of Measurement Error
Methods of Estimating Reliability/Precision
Test-Retest
Alternate Form
Internal Consistency Reliability
Interrater Reliability
Selecting a Reliability Coefficient
Evaluating Reliability Coefficients
Standard Error of Measurement
Chapter 6 Validity
The Nature of Validity
Revisiting Construct
Threats to Validity
Validity and Reliability/Precision
Sources of Validity Evidence
Test Content Validity Evidence
Evidence Based on Internal Structure
Relations to Other Variables Evidence
Evidence of Homogeneity
Convergent and Discriminant Evidence
Evidence Based on Consequences of Testing
Evidence Based on Response Processes
Summary • Questions for Discussion • Suggested Activities • References
Personality Inventories
Approaches to Personality Inventory Development
Categories of Personality Inventories
Structured Personality Inventories
Projective Instruments and Techniques
Personality Inventories with a Positive Focus
Response Styles
Summary • Questions for Discussion • Suggested Activities • References
Coaching
Test-Wiseness
Test Anxiety
Summary • Questions for Discussion • Suggested Activities • References
1 Introduction to Assessment
Imagine being asked by a child welfare agency to conduct an assessment that would deter-
mine a child’s potential for transitioning from foster care status to adoption within a family.
As part of the assessment, you might visit the home of the potential parents to determine
the appropriateness of the environment and to have a realistic sense of the family function-
ing. You would also have to evaluate the social and emotional development of the child and
the readiness for adoption. For example, it would be necessary to consider the child’s ability
to bond with a new family, any developmental issues that may be present, and any poten-
tial barriers that might impact the success of the adoption process. In order to gather enough
information to make this type of determination, you might interview the parents, observe
the child playing and interacting, and conduct an evaluation using standardized assessment
instruments (e.g., the Bayley Scales of Infant and Toddler Development). Consider how
important this assessment process would be to the children and the parents. The overall
assessment process would be quite involved, and the results would have incredibly high
stakes. The final assessment report would include information about any developmental
concerns, an evaluation of the family environment, an interpretation of standardized scores,
and a final recommendation based on the data. Based on the assessment results, the child
welfare agency would make a decision about finalizing the adoption.
It is a privilege to play such a role in people’s lives, and the privilege should be hon-
ored with careful attention to best practices and a wealth of knowledge about the assess-
ment process. Although the results of assessment do not always lead to happy outcomes,
this example provides some insight into where your journey through this book will lead.
Assessment has long been regarded as a fundamental component of all helping profes-
sions and the cornerstone of the counseling process. Simply put, assessment is the process
of gathering information about a client and determining the meaning of that information.
It is through assessment that counselors can uncover the nature of a client’s problems or
issues; the magnitude of these problems and how they are impacting the client’s life; how
the client’s family, relationships, or past experiences are affecting the current problem; the
client’s strengths and readiness for counseling; and whether counseling can be beneficial
to the client. Assessment is also critical for establishing the goals and objectives of coun-
seling and for determining the most effective interventions. Assessment occurs in all coun-
seling settings, including schools, mental health clinics, career counseling centers, substance
abuse treatment centers, private practice, psychiatric hospitals, and vocational rehabilita-
tion centers. In practice, counselors are always assessing. Assessment is an ongoing, fluid,
and dynamic process that continues throughout the course of the helping relationship.
Although students in the helping professions often initially question the need for assess-
ment training, competency in assessment is integral to successful counseling practice
(Whiston, 2012).
The purpose of this textbook is to help current and future school counselors, mental
health counselors, career counselors, marriage and family therapists, and other helping
professionals recognize the integral relationship between assessment and counseling, understand
the process of assessment, develop an awareness of the applications of assessment, and
understand the legal and ethical issues specific to assessment. We believe that competency
in assessment is essential to positive outcomes in counseling. In order to be competent in
assessment, you will need to seek supervised practice opportunities in addition to learning
the content in this textbook. Each chapter in this book will help you build upon your ability
to integrate assessment into your practice as a professional counselor.
Throughout the textbook, we use the term assessment rather than testing. It is impor-
tant to understand that testing is just one component of the assessment process and that
the scope of assessment activities is far beyond the exclusive use of standardized tests.
Although we will present information about important and widely used educational and
psychological tests throughout the text, we stress that assessment is more than simply giv-
ing tests. Assessment involves collecting and integrating information about an individual
from multiple methods (e.g., interviews, observations, tests) and multiple sources (e.g., the
client, family members, teachers, physicians). Corroborating data from multiple assess-
ment methods and sources helps create a more comprehensive and accurate understand-
ing of the client and his or her presenting concerns.
assessment information.
• Explain the importance of integrating multiple methods and multiple sources of
assessment information.
• List and describe the steps in the assessment process.
• Describe the competencies required by counselors for the effective use of assessment
instruments.
• Describe the historical context of assessment.
What Is Assessment?
Before we can talk about the assessment process, it is important to understand our defini-
tion of assessment. The term assessment refers to any systematic procedure for collecting
information that is used to make inferences or decisions about the characteristics of a person
(American Educational Research Association [AERA], American Psychological Association [APA], &
National Council on Measurement in Education [NCME], 2014).
Assessment encompasses a broad array of data collection methods from multiple sources
to yield relevant, accurate, and reliable information about an individual. In counseling and
other helping professions, assessment is considered a process, because it is the continual
practice of gathering information. Some hold to a traditional (yet erroneous) belief that
assessment is limited to the first meeting with an individual; in reality, assessment is an
ongoing process that may begin even before the first face-to-face contact with the indi-
vidual and that continues throughout the course of the helping relationship.
Many disciplines employ assessment, including psychology, counseling, education,
social work, health, the military, and business and industry. Educators and
other school personnel use assessment to identify learning, behavioral, or emotional
problems in students and to determine appropriate interventions and educational plans.
Psychologists and other mental health professionals use assessment to help diagnose
mental disorders, plan treatment, and monitor and evaluate treatment progress.
Career counselors engage in assessment to evaluate individuals’ vocational interests
and aptitudes. Because numerous types of professionals engage in assessment, we will
refer to those individuals as counselors, test users, assessors, examiners, or simply professionals
throughout the textbook. Similarly, we will refer to individuals who participate in the
assessment process as clients, test takers, assessees, or examinees.
Assessment is often equated with testing, and the two terms are often confused or
erroneously used interchangeably. Even today, many published textbooks hardly distin-
guish between assessment and testing. As Cohen, Swerdlik, and Sturman (2012) noted,
testing has been a catch-all phrase for the entire testing process rather than just the admin-
istration of a test. However, assessment goes beyond merely giving tests. It is a compre-
hensive process that involves the integration of information from multiple data collection
methods (e.g., interviews, tests, observations). Therefore, tests are now considered to be
one aspect of the overall assessment process (AERA et al., 2014). The fact that
assessment can proceed effectively without testing helps
to distinguish between these two activities (Weiner, 2013).
The methods for collecting assessment information can be grouped into three broad
categories: interviews, tests, and observations. Each category comprises a wide array of
formal and informal instruments and strategies, such as unstructured interviews, rating
scales, standardized tests, projective drawings, checklists, questionnaires, and so on.
Assessment also involves obtaining information from various sources, which may include
the client, family members, spouses or partners, teachers, physicians, mental health profes-
sionals, and other professionals. The assessment process varies from assessment to assess-
ment, depending upon the purpose for assessment, the setting in which the assessment
takes place, the needs of the client, and the availability and utility of the methods and
sources of information (Weiner, 2013). We emphasize the importance of using multiple
methods in most assessments, because the results of a single assessment instrument should
never be the sole determinant of important decisions about clients.
Progress and Outcome Evaluation. Once interventions have been implemented, counselors
may use various assessment instruments and strategies to monitor a client’s progress
and evaluate outcome. By periodically monitoring a client’s progress, counselors can
FIGURE 1.1 Multiple methods and multiple sources of the assessment process. (Diagram: Assessment, branching into Methods and Sources.)
It may seem like an obvious point, but meeting face-to-face (or via camera) with a
client is critical for gaining a complete picture from the assessment process. The interview
is a face-to-face meeting of the assessment professional and the client. Interviewing may
include such diverse techniques as unstructured interactions, semistructured interactions,
and highly formal structured interactions. Its primary purpose is to gather background
information relevant to the reason for assessment. The interview can be considered the
single most important method of gathering information about the client’s presenting prob-
lem and background information. Without interview data, information from tests and
observations is without context and meaningless. In many settings, the interview is the
primary (and sometimes only) assessment method used to collect data.
Tests are instruments designed to measure specific attributes of an individual, such
as knowledge or skill level, intellectual functioning, aptitude, interests or preferences, val-
ues, personality traits, psychological symptoms, level of functioning, and so on. Counse-
lors may use data collected from formal and informal tests, checklists, questionnaires, or
inventories for several purposes, such as screening for emotional, behavioral, or learning
problems; classifying or diagnosing certain attributes, problems, or disorders; selecting or
placing individuals into training, educational or vocational programs, or employment
opportunities; assisting in planning educational or psychological interventions; or evaluat-
ing the effectiveness of specific interventions or educational programs. Test results are
particularly useful in assessment, because they may reveal vital diagnostic information
that would not have been uncovered through other assessment methods.
Observation is an assessment method that involves watching and recording the behav-
ior of an individual in a particular environment. It is a way of seeing what a person actu-
ally does, rather than relying on others’ perceptions of behavior. Observation is useful for
collecting information about an individual’s emotional responses, social interactions,
motor skills, and job performance and for identifying specific patterns of behavior. Obser-
vation can be formal, involving the use of standardized rating scales and highly structured
procedures, or informal, with the counselor taking raw notes regarding a client’s verbal
and nonverbal behavior during the assessment.
In addition to multiple methods, counselors use multiple sources of information. The
client is usually the primary source of information during the assessment process. Other
sources of information (called collateral sources) include personal sources, such as parents,
spouses or partners, and others close to the individual being evaluated, and professional
sources, such as teachers, physicians, mental health professionals, and other professionals.
Information from collateral sources is valuable, because it is typically more objective and
reliable than information obtained directly from examinees. Another source of assessment
information comes from client records, such as school grades or attendance, previous psy-
chological or educational assessment reports, mental health treatment plans or summaries,
court documents, records from social services agencies, and so on.
There is no set standard as to the number of methods or sources that should be used
in assessment. The methods and sources chosen for the assessment process typically depend
upon the nature of the referral questions, the reason for assessment, and available assess-
ment resources. The client interview is considered the cornerstone of assessment and is
employed in almost all cases. However, utilizing additional methods and sources of infor-
mation leads to a more complete and accurate picture of the individual being evaluated. For
example, say that a mental health counselor working in an outpatient counseling center
conducts unstructured interviews with clients to determine the reason they are seeking
counseling and to collect relevant background information. The counselor also asks clients
to complete a self-report checklist of psychological symptoms. From the checklist, the coun-
selor discovers that a particular client has many symptoms of depression, which the client
did not disclose during the interview. In this example, the use of the checklist provided
essential information that was not uncovered by the interview alone. The client profile also
might be more clearly detailed with the administration of some standardized tests; however,
the counselor might not have access to these tests in his or her work setting.
or written report that contains the assessment results and recommendations. In between
the beginning and end points of the assessment process are additional actions
directed at collecting relevant client information. Although the process of assessment
might appear overwhelming now, it can be broken down into the following four manageable
steps (Harwood, Beutler, & Groth-Marnat, 2011):
1. Identify the Problem The first step in the assessment process is identifying the pre-
senting problem—that is, the reason that the individual is being assessed. Because
assessment is so clearly linked to counseling, the reason for assessment and the rea-
son for counseling are often one and the same. Reasons for assessment and/or coun-
seling can stem from a variety of problems or concerns, such as academic or vocational
performance, cognitive abilities, behavioral problems, or emotional and social func-
tioning (Lichtenberger et al., 2005). In order to proceed to the next step in the assess-
ment process, the counselor must have a clear idea about what the problem is and the
reasons for which the client is being seen.
Clients may be self-referred for assessment, or they may be referred by another
source, such as a family member, teacher, judge, physician, or human resources man-
ager. Referral sources can help clarify the nature and severity of the client’s problem
through the specific questions they want answered about the client. Thus, referral
questions are often directly linked to the problem being addressed in assessment. The
following are examples of referral questions that help define the client’s problem:
• Does this student have a learning disability? If so, does he or she qualify for special
education or related services?
• Is this child ready to begin kindergarten?
• Does this child’s problematic behavior indicate a diagnosis of Attention Deficit/
Hyperactivity Disorder (ADHD)?
• Is this individual suicidal?
• Does this adult have Posttraumatic Stress Disorder (PTSD)?
• Does this parent have a mental disorder that might interfere with parenting?
• What are this individual’s vocational interests?
• How well can this employee be expected to perform if promoted to a management
position?
2. Select and Implement Assessment Methods After counselors determine the nature
of the problem that needs to be appraised in the assessment process, the next step
involves selecting and implementing methods for collecting data (e.g., interviews,
tests, observation) and determining the sources of assessment information. Counse-
lors choose from among numerous formal and informal assessment instruments
and strategies based on the reason for referral, the context in which the assessment
takes place, and the adequacy of the instruments and procedures they will use.
Interviews are used in almost every assessment to obtain background information
about an individual, including family history, work and education background,
social history, and other relevant cultural and environmental factors. Counselors
may administer tests to evaluate a person’s cognitive functioning, knowledge, skills,
abilities, or personality traits. Observation may be used to record or monitor a cli-
ent’s behavior in a particular setting. Collateral information also may be obtained
from family members, spouses or partners, and others close to the individual being
evaluated. Although there are no set guidelines for which or how many assessment
instruments or strategies to use, in general, the more methods used to collect data,
the more accurate and objective the information obtained.
3. Evaluate the Assessment Information A key task for counselors is evaluating assess-
ment information, which involves scoring, interpreting, and integrating information
obtained from all assessment methods and sources to answer the referral question.
To be competent in evaluating assessment information, counselors need knowledge
and skills in basic statistical concepts, psychometric principles, and the procedures
for interpreting assessment results. Evaluating assessment information is a difficult
step, because the counselor is often confronted with a dizzying array of information
gathered during the assessment process. To organize this data, counselors can use
the following steps (Kamphaus & Frick, 2010; Sattler & Hoge, 2006):
a. Document any significant findings that clearly identify problem areas.
b. Identify convergent findings across methods and sources.
c. Identify and explain discrepancies in information across methods and sources.
d. Arrive at a tentative formulation or hypothesis of the individual’s problem.
e. Determine the information to include in the assessment report.
4. Report Assessment Results and Make Recommendations The final step in the assess-
ment process is reporting results and making recommendations. This involves
(a) describing the individual being assessed and his or her situation, (b) reporting gen-
eral hypotheses about the individual, (c) supporting those hypotheses with assessment
information, and (d) proposing recommendations related to the original reason for
referral (Kaufman & Lichtenberger, 2002; Ownby, 1997; Sattler, 2008). The general
hypotheses are the counselor’s descriptive or clinical impressions of the individual that
are based on multiple methods and sources of assessment data. When reporting these
hypotheses, make sure to provide enough assessment data to support your conclusion.
Making recommendations involves identifying specific ways to resolve the pre-
senting problem or referral question by addressing the assessment’s key findings
about the individual (Lichtenberger et al., 2005). Counselors recommend strategies
and interventions that are designed to facilitate change and improve outcomes based
on the individual and his or her assessment results (Kaufman & Lichtenberger, 2002).
Because individuals are referred for assessment for a variety of reasons, recommen-
dations vary depending on the referral questions. In addition, the setting in which
the assessment takes place (such as a school, hospital, mental health clinic, college, or
vocational training center) will influence the type and number of recommendations
(Kaufman & Lichtenberger, 2002). For example, in school settings, most referrals for
assessment involve students’ problems that affect their academic performance. In
this situation, recommendations typically focus on behavioral interventions, instruc-
tional strategies, or other appropriate educational services (Lichtenberger et al.,
2005). Assessments at mental health centers are requested generally for diagnosing
mental disorders, treatment planning, and monitoring treatment progress; thus, rec-
ommendations may include a variety of clinical interventions and techniques.
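The sub-steps listed under step 3 (identifying convergent findings and discrepancies across methods and sources) can be illustrated with a small sketch. This is only one possible way to organize assessment data, not a procedure taken from the sources cited above. The Finding record, its field names, and the compare_findings function are hypothetical, and the sample data are invented for illustration, loosely following the earlier example of a symptom checklist revealing depression that the interview did not.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One piece of assessment information (hypothetical structure)."""
    method: str    # "interview", "test", or "observation"
    source: str    # e.g. "client", "parent", "teacher"
    area: str      # illustrative label for the concern being examined
    present: bool  # whether this method/source indicated the concern


def compare_findings(findings):
    """Group findings by area and split them into convergent and discrepant.

    An area is convergent when two or more findings agree (all present or
    all absent) and discrepant when they disagree. Areas with a single
    finding are left out, since one data point can neither converge nor
    conflict.
    """
    by_area = defaultdict(list)
    for finding in findings:
        by_area[finding.area].append(finding)

    convergent, discrepant = {}, {}
    for area, items in by_area.items():
        if len(items) < 2:
            continue
        verdicts = {item.present for item in items}
        if len(verdicts) == 1:
            convergent[area] = items
        else:
            discrepant[area] = items
    return convergent, discrepant


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = [
        Finding("interview", "client", "depressive symptoms", False),
        Finding("test", "client", "depressive symptoms", True),
        Finding("interview", "client", "school avoidance", True),
        Finding("interview", "parent", "school avoidance", True),
        Finding("observation", "teacher", "inattention", True),
    ]
    convergent, discrepant = compare_findings(sample)
    print("Convergent areas:", sorted(convergent))
    print("Discrepant areas:", sorted(discrepant))
```

A discrepancy flagged this way is a prompt for the counselor to explain the difference (sub-step c), not a conclusion in itself. The tentative formulation and the report still depend on professional judgment.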
competent. Because some people have underestimated the complexity of assessment, prac-
ticed beyond the scope of their training, or attributed too much meaning to a single test
score, the public has developed a degree of skepticism in relation to assessment. As a
result, several governing bodies and professional associations related to assessment have
set explicit guidelines for the selection, use, administration, and interpretation of assess-
ment instruments. Those guidelines can be translated into the following competencies:
1. Understand the basic statistical concepts and define, compute, and interpret meas-
ures of central tendency, variability, and relationship.
2. Understand basic measurement concepts, such as scales of measurement, types of
reliability, types of validity, and norm groups.
3. Compute and apply measurement formulas, such as the standard error of measurement
and the Spearman–Brown prophecy formula.
4. Read, evaluate, and understand instrument manuals and reports.
5. Follow exactly as specified the procedures for administering, scoring, and interpret-
ing an assessment instrument.
6. List and describe major assessment instruments in their fields.
7. Identify and locate sources of information about assessment instruments.
8. Discuss as well as demonstrate the use of different systems of presenting data in
tabular and graphic forms.
9. Compare and contrast different types of scores and discuss their strengths and
weaknesses.
10. Explain the relative nature of norm-referenced interpretation in interpreting indi-
vidual scores.
11. Help teach clients to use tests as exploratory tools and in decision making.
12. Present results from assessment instruments both verbally (using feedback sessions)
and in written form.
13. Pace a feedback session to enhance the client’s knowledge of the test results.
14. Use strategies to prepare clients for testing to maximize the accuracy of the test
results.
15. Explain assessment results to clients thoughtfully and accurately, but in language
they understand.
16. Use effective communication skills when presenting assessment results to individu-
als, groups, parents, students, teachers, and professionals.
17. Shape the client’s reaction to and encourage appropriate use of assessment
information.
18. Be alert to the verbal and nonverbal cues expressed by the client throughout the
assessment process.
19. Use appropriate strategies with clients who perceive assessment results as negative.
20. Be familiar with the interpretation forms and computerized report forms so as to
guide the client to the information and explanation.
21. Be familiar with the legal, professional, and ethical guidelines related to assessment.
22. Be aware of the client’s rights and the professional’s responsibilities as a test admin-
istrator and counselor.
23. Have knowledge of the current issues and trends in assessment.
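The two formulas named in competency 3 can be illustrated with a short sketch. It assumes the standard textbook forms, SEM = SD × √(1 − r) and Spearman–Brown predicted reliability = kr / (1 + (k − 1)r), where r is the reliability coefficient and k is the factor by which test length changes. The function names and the numeric values are hypothetical and are not taken from any particular instrument.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Return SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def spearman_brown(reliability, length_factor):
    """Predict reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical values chosen only for illustration.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
doubled = spearman_brown(reliability=0.70, length_factor=2.0)

print(round(sem, 2))      # 4.5
print(round(doubled, 3))  # 0.824
```

In this illustration, a score on a scale with a standard deviation of 15 and a reliability of .91 carries an SEM of about 4.5 points, and doubling the length of a test with a reliability of .70 is predicted to raise reliability to about .82.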
The Association for Assessment in Counseling (now the Association for Assessment
and Research in Counseling) published Responsibilities of Users of Standardized Tests
(Association for Assessment in Counseling, 2003), which describes the qualifications that
professionals must have in order to provide valuable, ethical, and effective assessment ser-
vices to the public. Qualifications to use standardized tests depend on at least four factors:
1. Purposes of Testing A clear purpose for using an assessment instrument should be
established. Because the purpose of an instrument directly affects how the results are
used, qualifications beyond general competencies may be needed to interpret the
results.
2. Characteristics of Tests Counselors should understand the strengths and limita-
tions of each instrument used.
3. Settings and Conditions of Test Use Counselors should evaluate the levels of knowl-
edge and skill required for using a particular assessment instrument prior to imple-
menting the instrument.
4. Roles of Test Selectors, Administrators, Scorers, and Interpreters The education,
training, and experience of test users determine which instruments they are qualified
to administer and interpret.
Historical Perspectives
Part of being competent in assessment involves having a working knowledge of the his-
tory of assessment. Assessment is not a new concept. Even though the test movement in
the United States began only at the turn of the 20th century (see Table 1.1), tests actually
have been used for thousands of years. Around 2200 B.C., the Chinese used essay examinations
to help select civil service employees. The philosophies of Socrates and Plato emphasized
the importance of assessing an individual’s competencies and aptitudes in vocational
selection. Throughout the centuries, philosophers and educators have devised certain
scales or items to provide teachers and parents with useful information to help their chil-
dren. Anthony Fitzherbert (1470–1538) identified some items to screen individuals with
retardation from those without—for example, being able to count to 20 pence, being able
to tell one’s age, and being able to identify one’s father or mother.
Juan Huarte (1530–1589) was probably the first author to suggest formal intelligence
testing. His book title was translated as The Trial of Wits: Discovering the Great Differences of
Wits among Men and What Sorts of Learning Suit Best with Each Genius. Jean Esquirol
(1772–1840), a French physician, proposed that there are several levels of intellectual defi-
ciencies and that language is a valid psychological criterion for differentiating among lev-
els. Eduardo Seguin (1812–1880) also worked with individuals with intellectual disabilities
and believed that these people should be trained in sensory discrimination and in the
development of motor control.
The Victorian era marked the beginning of modern science and witnessed the influ-
ence of Darwinian biology on the studies of individuals. In 1879 in Leipzig, Wilhelm
Wundt (1832–1920) founded the first psychological laboratory. His work was largely con-
cerned with sensitivity to visual, auditory, and other sensory stimuli and simple reaction
time. He followed scientific procedures and rigorously controlled observations. He influ-
enced the measurement movement by using methodology that required precision, accu-
racy, order, and reproducibility of data and findings. The interest in the exceptional
individual broadened to include personality and behavior. Sigmund Freud, Jean Martin
Charcot, and Philippe Pinel were interested in individuals with personal and social
judgment problems. Early interest in measuring intelligence also dates back to the late 19th
century, when Sir Francis Galton (1822–1911), cousin to Charles Darwin, applied Darwin’s
evolutionary theory to attempt to demonstrate a hereditary basis for intelligence. In 1905,
French psychologist Alfred Binet (1857–1911) constructed the first intelligence test (the
Binet–Simon scale) that measured children’s cognitive ability to learn school-type tasks or
educational attainments, focusing on language, memory, judgment, comprehension, and
reasoning (Binet & Simon, 1916). Binet claimed that his scale provided a crude means of
differentiating between those children who could function in the regular classroom and
those who could not.
The assessment of children rapidly expanded to the assessment of adults when
the United States entered World War I in 1917 (Anastasi & Urbina, 1997). During this time,
the armed services developed a group intelligence test called the Army Alpha to use in the
selection and classification of military personnel. The original purpose of the army test was
to identify those recruits whose lower intelligence would create problems for the military
organization. A similar test was created for use with illiterate or non-English-speaking
recruits, called the Army Beta. At the end of World War I, there was also an interest in
screening recruits for psychosis and other emotional disabilities. The army once again
developed a new test, called the Woodworth Personal Data Sheet, which was a forerunner
of modern personality tests.
The successful use of tests by the armed services led to widespread adoption of tests
in education and industry. Other factors also contributed to the acceptance of tests. Growth
in population, free public education, compulsory school attendance laws, and the increase
in students going on to institutions of higher education all were factors that changed the
philosophy and practice of testing.
In addition, the egalitarian, political, and philosophical movements that championed
integration, women’s rights, rights of individuals with disabilities, and cultural group
heritage influenced how people viewed tests. Tests were criticized for cultural bias, gender
bias, unfairness to minority groups, and unfairness to groups with disabilities. These criti-
cisms led to improved review procedures for the selection of test items and the selection of
norming samples.
In recent years, however, the prevailing educational policy in the United States has
changed from an open, humanistic education to back-to-basics and accountability-based
approaches. The current use of high-stakes tests in the U.S. educational system can impact
a student’s educational paths or choices, such as whether a student is promoted or retained
at a grade level, graduated, or admitted into a desired program.
Despite the changes in the social and political climate, the use of tests in the United
States increased dramatically in the 20th and 21st centuries and continues to grow. It is
estimated that Americans take anywhere from 143 million to nearly 400 million standard-
ized tests yearly for education alone, 50 million to nearly 200 million job tests for business
and industry, and several million more for government and military jobs (Sacks, 1999).
The test industry has been greatly affected by technology. Computer-based tests repre-
sent a great advancement from the time when test usage was time-consuming and labori-
ous in terms of administering, scoring, and interpreting tests, as well as writing up test
results (Cohen et al., 2012). Today, most test publishers offer computer software for
administering, scoring, and/or interpreting tests. Technological advances make the use
of tests in the assessment process more convenient and affordable, further increasing the
growth in test usage.
Computer-Based Assessment
The use of computers has been viewed as a way to enhance and advance the field of assess-
ment. In all areas of assessment (e.g., personality, intellectual ability, achievement, career
and employment assessment), computer-based assessment instruments and strategies are
available. With dynamic visuals, sound, user interactivity, and near-real-time score
reporting, computer-based assessment vastly expands assessment possibilities beyond the
limitations of traditional paper-and-pencil instruments (Scalise & Gifford, 2006). Although
initially computers were used only in the processing of test data, computer-based assess-
ment now encompasses a broad range of operations and procedures, such as the following:
• Computer Administration of Assessment Instruments Administering tests, ques-
tionnaires, and interviews via the computer is one of the most common computer
assessment applications. This response format has many advantages over traditional
paper-and-pencil methods, such as increased delivery, potential time savings, and
the ability for items to be adapted or tailored based on the test taker’s response to a
previous item.
• Automated Test Scoring Computer-based assessment provides automated scoring
of responses, thereby giving test takers almost immediate feedback and their overall
score. Computer scoring reduces the possibility that respondents would make errors
while filling out handwritten answer sheets and eliminates the errors that clinicians
and technicians would make while hand-scoring items.
• Computer-Generated Reports and Narratives Computer-based assessment instru-
ments often provide computer-generated reports or narratives. These reports are
automated interpretations that are generated based on user input and resulting test
scores (Butcher, 2013). The reports may contain very complex and detailed state-
ments or summary statements.
• Computer-Adaptive Tests Computer-adaptive tests are specifically tailored to an
individual’s ability level. The computer quickly determines the examinee’s ability
level and then tailors the questions to that level. The first question is usually selected
close to the passing level. If the test taker answers the question correctly, then a more
difficult item is presented next; if the answer is incorrect, an easier item follows.
Using a computer-adaptive test, test takers have a
more personalized assessment experience in a controlled environment. Computer-
adaptive tests also provide sensitivity to the needs of users with disabilities, helping
ensure equality and fairness in testing.
• Computer Simulations Computer simulation is the technique of representing real-
world experiences through a computer program. Interactive software programs
allow individuals to explore new situations, make decisions, acquire knowledge
based on their input, and apply this knowledge to control the ever-changing simula-
tion state. Simulations have been in use for many years to assess performance in dif-
ferent environments. In the military, simulation has long been used for assessing the
readiness of individuals to perform military operations, and devices used for com-
puter simulations range from plastic mock-ups to laptop computers to full-motion
aircraft simulators. In education, simulations can be used to investigate problem-
solving skills, allowing students to explore a range of options in a particular problem
scenario. Scenario-based testing is also used for some counseling-related exams, such
as the Clinical Mental Health Counseling Exam (CMHCE).
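The item-selection logic described above for computer-adaptive tests can be sketched in a few lines. The sketch below is a deliberately simplified "staircase" rule (one level harder after a correct answer, one level easier after an incorrect one); operational adaptive tests typically rely on more sophisticated statistical models, so this should be read as an illustration only. The function name, the difficulty scale of 1 to 10, and the simulated examinee are all hypothetical.

```python
def run_adaptive_test(item_difficulties, answers_correctly, start_level, max_items):
    """Administer max_items items using a simple up/down (staircase) rule.

    item_difficulties: iterable of available difficulty levels.
    answers_correctly: function taking a difficulty level and returning True/False.
    start_level: difficulty at which testing begins (e.g., near the passing level).
    """
    levels = sorted(set(item_difficulties))
    # Begin at the available level closest to the requested starting level.
    idx = min(range(len(levels)), key=lambda i: abs(levels[i] - start_level))
    history = []
    for _ in range(max_items):
        level = levels[idx]
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            idx = min(idx + 1, len(levels) - 1)  # present a harder item next
        else:
            idx = max(idx - 1, 0)                # present an easier item next
    return history


# Hypothetical examinee who answers correctly any item at difficulty 6 or below.
history = run_adaptive_test(range(1, 11), lambda d: d <= 6, start_level=5, max_items=6)
print(history)
# [(5, True), (6, True), (7, False), (6, True), (7, False), (6, True)]
```

The sequence of items settles around the point where the simulated examinee shifts from correct to incorrect answers, which is the sense in which the test "tailors the questions" to the examinee's ability level.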
Computer technology also helps facilitate and improve all phases of measurement
practices. The increased availability of powerful computers and computer software in
recent decades has greatly enhanced the ease of evaluating the reliability (consistency) and
validity (accuracy) of test results. Such statistical operations are routinely carried out with
computer software programs such as SPSS and SAS.
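As a small illustration of the kind of reliability analysis that such software automates, the sketch below computes coefficient alpha (Cronbach's alpha), one commonly used index of internal consistency. The choice of this index, the function name, and the response data are assumptions made for illustration; the data are invented and do not come from any real instrument.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Compute coefficient alpha.

    scores: list of rows, one per respondent; each row lists that
    respondent's score on every item.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across respondents.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of respondents' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented responses: four respondents, three items scored 1 to 5.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # 0.944
```

With a realistic sample, the same calculation would be run on hundreds of respondents; the arithmetic does not change, only the amount of data.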
Despite the many advantages of using computer-based assessment, there are several
limitations as well. For example, computer-based test interpretations or narrative reports
should not be viewed as stand-alone clinical evaluations (Butcher, 2013). Computer-based
interpretations are unable to take into account the uniqueness of the test taker and incor-
porate such elements as a client’s personal history, life events, or current stressors. There-
fore, computerized reports are considered broad, general descriptions that should not be
used without the evaluation of a skilled counselor. Whether or not a counselor chooses to
use a computer-based test interpretation, it is the counselor who is ultimately accountable
for the accuracy of interpretations.
Internet-Based Assessment
The Internet is also changing the current landscape and the future of assessment by
providing a profusion of assessment instruments with 24/7 access, ease of use, immediate
scoring, and a more limited need for test administrators, leading to convenience, cost effec-
tiveness, and efficient testing. It is difficult to estimate the number of assessment-related
websites currently available on the Internet, other than to say that the number is large and
increasing (Buchanan, 2002). Internet-based assessment websites vary in content, quality,
and function. Some seek to adhere to high standards of professionalism of assessment,
whereas others appear unprofessional and unconcerned with ethical and security issues.
The motivation for development of many of these sites is easy to understand. Commercial
assessment sites can make more money, because the Internet offers easy access to large
numbers of participants. Researchers benefit from Internet-based assessment because they
have access to large numbers of participants; the costs associated with traditional assess-
ment methods, such as publishing and distributing paper surveys, mailing materials to
study participants, and data collection and entry, are eliminated; and the costs to develop,
publish, and maintain web-based surveys are significantly lower.
The expansion of assessment instruments on the Internet has brought about a num-
ber of issues. Concerns about the reliability and validity of the data collected through the
Internet remain, although previous research indicates no significant difference between
traditional and Internet-based testing. Another concern is that although many people have
access to the Internet, not everyone does, which can be a confounding variable in a research
study in terms of population sample. Questions regarding test security remain, and it is
difficult to positively identify a person taking an online assessment if the test is not taken
at a special site. Another issue involves providing feedback or results to participants—
specifically, the inability to have human contact with a clinician or researcher while the
participant is receiving and processing test results.
Summary
Many disciplines (e.g., psychology, counseling, education, social work, health, military,
business or industry) employ the activity of assessment for such purposes as screening,
identification and diagnosis, intervention planning, and progress evaluation. The assessment
process encompasses multiple data collection methods from multiple sources to yield
relevant, accurate, and reliable information about an individual. A key task of assessment
professionals is to analyze and integrate information obtained from all assessment methods
and sources to answer the referral question. Professionals must then make recommendations
that are relevant to the assessment results and the referral question.
Having knowledge of the history of assessment can help with understanding current
assessment issues and practices. The testing movement is about 100 years old. Some tests
were constructed in the 19th century, but the majority of test development occurred in the
20th century. Many of the innovations and changes of the test movement resulted from
major national crises and social and political movements. The use of computer-based and
Internet-based assessment continues to become more prevalent in the assessment field.
Moving Forward
This textbook is divided into three parts (Part I, Principles and Foundations of Assessment;
Part II, Overview of Assessment Areas; and Part III, Applications and Issues) that provide
a balance of theory and practice information as well as coverage of the assessment
instruments and strategies commonly used in the various areas of counseling (e.g., school
counseling, clinical mental health counseling, vocational or career counseling settings).
Although each section has a different focus, it is important to remember that all of the
components of the textbook are interrelated and that each section builds upon your ability
to be competent in the area of assessment. Chapter 1 introduced you to the principles and
foundations of assessment. You will continue to explore this area in Chapters 2 through 7.
Chapters 8 through 12 provide an overview of assessment areas, and the remainder of the
book focuses on applications and issues. As you read each chapter, contemplate questions
for discussion, and complete related activities, you should consider how each chapter fits
within the framework we have provided and work to integrate the information into your
epistemology of counseling practice.
Suggested Activities
1. Interview individuals who are working in the helping professions to find out what
assessment instruments or strategies they regularly use.
2. Review media sources (e.g., websites, newspapers), and identify three events that have
occurred over the last 5 years that have impacted assessment.
3. Discuss the assessment issue you think is most important for counselors to address.
References
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Authors.
American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders (5th ed., text revision). Washington, DC: Author.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Association for Assessment in Counseling. (2003). Responsibilities of users of standardized tests (RUST). Alexandria, VA: Author.
Binet, A., & Simon, T. (1916). The development of intelligence in children (E. Kit, Trans.). Baltimore, MD: Williams & Wilkins.
Buchanan, T. (2002). Online assessment: Desirable or dangerous? Professional Psychology: Research and Practice, 33, 148–154.
Butcher, J. N. (2013). Computerized psychological assessment. In J. R. Graham & J. A. Naglieri (Eds.), Handbook of psychology: Assessment psychology (2nd ed., pp. 165–191). Hoboken, NJ: John Wiley & Sons.
Cohen, R. J., Swerdlik, M. E., & Sturman, E. D. (2012). Psychological testing and assessment: An introduction to tests and measurement (8th ed.). Boston, MA: McGraw-Hill.
Erford, B. T. (Ed.). (2006). The counselor’s guide to clinical, personality, and behavioral assessment. Boston, MA: Houghton Mifflin/Lahaska Press.
Erford, B. T. (2012). Assessment for counselors. Boston, MA: Houghton Mifflin Company.
Hardwood, T. M., Beutler, L. E., & Groth-Marnat, G. (2011). Integrative assessment of adult personality (2nd ed.). New York, NY: Guilford Press.
Kamphaus, R. W., & Frick, P. J. (2010). Clinical assessment of child and adolescent personality and behavior (3rd ed.). New York, NY: Springer.
Kaufman, A. S., & Lichtenberger, E. O. (2002). Assessing adolescent and adult intelligence (2nd ed.). Boston, MA: Allyn & Bacon.
Lichtenberger, E. O., Mather, N., Kaufman, N. L., & Kaufman, A. S. (2005). Essentials of assessment report writing. Hoboken, NJ: John Wiley and Sons.
Ownby, R. L. (1997). Psychological reports: A guide to report writing in professional psychology (3rd ed.). New York, NY: Wiley.
Sacks, P. (1999). Standardized minds: The high price of America’s testing culture and what we can do to change it. New York, NY: Da Capo Press.
Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). San Diego, CA: Jerome M. Sattler Publisher Inc.
Sattler, J. M., & Hoge, R. D. (2006). Assessment of children: Behavioral, social, and clinical foundations (5th ed.). San Diego, CA: Jerome M. Sattler Publisher Inc.
Scalise, K., & Gifford, B. (2006). Computer-based assessment in e-learning: A framework for constructing “intermediate constraint” questions and tasks for technology platforms. Journal of Technology, Learning, and Assessment. Retrieved from http://www.jtla.org
Selborn, M., Marion, B. E., & Bagby, R. M. (2013). Psychological assessment in adult mental health settings. In J. R. Graham, J. A. Naglieri, & I. B. Weiner (Eds.), Handbook of psychology: Assessment psychology (2nd ed., Vol. 10, pp. 241–260). Hoboken, NJ: John Wiley & Sons.
Sturman, E. D., Cohen, R. J., & Swerdlik, M. E. (2013). Psychological testing and assessment: An introduction to tests and measurement (8th ed.). Boston, MA: McGraw-Hill.
Urbina, S. (2014). Essentials of psychological testing (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Weiner, I. B. (2013). The assessment process. In J. R. Graham & J. A. Naglieri (Eds.), Handbook of psychology: Assessment psychology (2nd ed., Vol. 10, pp. 3–25). Hoboken, NJ: John Wiley & Sons.
Whiston, S. C. (2012). Principles and applications of assessment in counseling (4th ed.). Belmont, CA: Brooks/Cole.
CHAPTER 2
Methods and Sources of Assessment Information
The assessment process involves collecting relevant, accurate, and reliable information
about an individual that can be used to make inferences or decisions about that person. To
collect data that is thorough enough to produce an in-depth understanding of a client,
counselors typically use multiple methods for collecting assessment information. These
methods can be grouped into three broad categories: interviews, tests, and observations.
Each category contains a wide array of formal and informal instruments and strategies,
such as unstructured interviews, rating scales, standardized tests, projective drawings,
checklists, questionnaires, and so on. Assessment also involves obtaining information
from various sources, which may include the client, family members, spouses or partners,
teachers, physicians, and other professionals. This chapter provides an overview of meth-
ods used in the assessment process and the sources of assessment information.
• Explain the difference between formal and informal assessment instruments and
strategies.
• Explain the importance of using multiple methods and multiple sources of information
in assessment.
• Describe the initial interview and explain its purpose in the assessment process.
• Explain the differences among structured, semistructured, and unstructured
interviews.
• Describe the categories and characteristics of tests used in the assessment process.
• Define observation and describe the various observation strategies and approaches used
Chapter 2 • Methods and Sources of Assessment Information 21
TABLE 2.1 Examples of Formal and Informal Assessment Instruments and Strategies
Method Formal Informal
as either formal or informal (see Table 2.1). Using a combination of formal and informal
assessment tools is recommended to provide an in-depth evaluation of clients, but the
right mix of formal and informal assessment methods will vary from assessment to assess-
ment. Chapter 7 will provide information on the process of selecting appropriate assess-
ment instruments and strategies.
provides a rating of suicide potential based on categories such as sex, age, depression, and
previous attempts at suicide. If the client has a moderate or high score on the SAD PER-
SONS Scale, then the counselor might decide to use the BDI-II to corroborate the informa-
tion and determine a more formal assessment of depression severity. The questions from
the SAD PERSONS Scale could be woven into the interview process seamlessly.
Structured Interviews Structured interviews are the most rigorous and the least flexible
interview format. As a formal assessment procedure, structured interviews consist of spe-
cific questions formulated ahead of time. They are commercially available standardized
instruments that have specific instructions and guidelines for administering, scoring, and
interpreting results. Using a structured interview, counselors are required to ask each cli-
ent exactly the same questions in the same manner and not deviate from the text. Although
all counselors can use structured interviews, they are especially helpful to those counselors
who are beginning to learn the process of interviewing. The advantages of structured
interviews are that (a) they ensure that specific information will be collected from all inter-
viewees; (b) they do not require as much training, because interviewers simply read from
a list of questions in a prescribed order; and (c) because of the standardization, they sub-
stantially improve the reliability of the assessment process (Erford, 2006). Because of the
consistency of the information obtained through structured interviews, they are invaluable
tools in research settings. Because counselors are not allowed to deviate from the text, the
use of structured interviews is often criticized for potentially damaging rapport with a cli-
ent and preventing the therapeutic alliance between counselor and client from being estab-
lished (Craig, 2005). It is important to note that structured interviews can be quite
time-consuming. As such, counselors in clinical settings may see structured interviews as
impractical because of time constraints.
interview is different in terms of which questions are asked and how questions are
worded, it is difficult to evaluate the reliability or validity of the information obtained
during the interview.
Interview Guidelines
Successful interviews rely on the interviewer’s ability to communicate and understand the
communications of interviewees (Sattler & Hoge, 2006). Professionals should consider the
following general guidelines before and during an interview (Groth-Marnat, 2009;
Morrison, 2008; Young, 2012):
1. Be concerned about the physical setting or environment for the interview. Interviews
will be better if the environment is quiet and comfortable. If the room is noisy or has
poor lighting, then it may detract from the quality of the information gained. Seating
should be arranged so that the interviewer and the interviewee are appropriately
spaced, with no physical barriers (such as desks) between seats.
2. Explain the purpose of the interview and how the session will proceed. Explain how
the interview information will be used.
3. Describe the confidential nature of the interview and the limits of confidentiality. In
addition, explain that the client has the right not to discuss any information he or she
does not wish to disclose.
4. When conducting a standardized semistructured or structured interview, abide by
the published administration procedures.
Tests
Although assessment is an accepted practice in all helping professions, testing can be a
controversial process and creates a level of suspicion in the public eye. The use of tests in
the United States increased dramatically during the 20th century and continues to grow
into the new millennium. It is estimated that Americans take anywhere from 143 million to
nearly 400 million standardized tests yearly for education alone, 50 million to nearly 200
million job tests for business and industry, and several million more for government and
military jobs (Sacks, 2001). Many test results can have a large impact on an individual’s life
path. Such tests are often referred to as high-stakes tests. When tests affect a person’s
ability to obtain a job, graduate from public school, gain admission to college, or reach
other major life events, the scrutiny of testing becomes even more significant. We discuss
these issues in more depth in Chapters 15 and 17.
Counselors who use testing as part of the assessment process need to understand the
elements of a test and be competent in administering tests. A
test may be defined simply as a measurement process. In the helping professions, educa-
tional and psychological tests are used to provide a measure of various individual attrib-
utes, such as cognitive functioning, knowledge, skills, abilities, or personality traits. Test
data is integrated into the overall assessment in a way that helps counselors better under-
stand clients and make decisions in their best interests. Tests are utilized in assessment for
a variety of purposes, including screening for emotional, behavioral, or learning problems;
classifying an individual into a certain descriptive category (e.g., introvert); selecting or
placing individuals into certain training, educational, or vocational programs; assisting in
the diagnosis of a mental disorder; assisting in intervention or treatment planning; evalu-
ating the effectiveness of a particular intervention or course of action (i.e., progress and
outcome evaluation); and hypothesis testing in research studies.
Literally thousands of tests are available in education and psychology, and it is nearly
impossible for counselors to be familiar with every test. Tests may differ on a number of
features, such as content, format, administration procedures, scoring and interpretation
procedures, and cost (Cohen, Swerdlik, & Sturman, 2012). The content (subject matter) of a
test varies depending on the purpose or focus of the particular test. Some tests are compre-
hensive, with content covering a broad range of subject areas. For example, the California
Achievement Test (CAT/6) measures several areas of achievement, including reading,
language, math, study skills, science, and social studies. In contrast, some tests have a
narrower focus and cover only a single subject area, such as the SAT Subject Test in
Biology.
The format of a test pertains to the type, structure, and number of items on the test. Test
items can be classified as either selected-response or constructed-response items. Selected-
response items (also called forced-choice items) require respondents to indicate which of two or
more statements is correct. Multiple-choice, true/false, and matching items are all examples
of selected-response items. Rating scales are also considered a type of selected-response
28 Chapter 2 • Methods and Sources of Assessment Information
format in which items are answered using a scale of successive intervals (rating scales are
discussed in more detail in the Observation section later in this chapter). In contrast to
selected-response items, constructed-response items require test takers to supply their own
responses (rather than selecting a given response). These include fill-in-the-blank items, sen-
tence completion, essay questions, verbal responses, performance tasks, portfolios, draw-
ings, and so on. Selected-response items are typically preferred over constructed-response
items, because they cover a broader range of content and can be answered and scored more
quickly. However, selected-response items constrain test takers to a single appropriate
answer and are subject to guessing, whereas constructed-response items allow individuals
to demonstrate more in-depth understanding and more freedom and creativity in their
responses. Figure 2.1 provides examples of some selected-response and constructed-response
formats. Tests vary widely in the number of items and the length of test-taking time: a test
can consist of 10 or 15 items and take 10 minutes to complete, it can encompass hundreds of
items and take several hours to complete, or it can consist of anything in between.
Although it is important to understand all elements of testing, it is an ethical require-
ment that counselors only use tests for which they received training and that counselors
Selected response

  Multiple choice   At a grocery store, a customer hands the cashier a $20 bill to pay for a
                    bottle of soda that costs $1.36. How much change should the cashier
                    give back to the customer?
                      A. $17.64
                      B. $18.36
                      C. $18.64
                      D. $18.74
                      E. $19.36

  Rating scale      I value work environments that are flexible and do not require a
                    specific time schedule.
                      Strongly Disagree / Disagree / Neither Agree nor Disagree / Agree / Strongly Agree

Constructed response
Categories of Tests
Because there are thousands of tests, it is useful to have a way to classify tests into catego-
ries. However, because tests differ from each other in a variety of ways, there is no uni-
formly accepted system of classification (Domino & Domino, 2006). Instead, tests can be
categorized based on a variety of aspects, such as the area of assessment, whether or not
the test is standardized, how scores are interpreted, how the test is administered, and item
type. As with most practices in assessment, professionals use different terminology to cat-
egorize tests, depending on their particular training, experience, and the settings in which
they work. We will review some of the common approaches to classifying tests.
Area of Assessment Tests can be classified according to the area of assessment, such as the
following:
• Intellectual Ability Tests Assess variables related to intelligence and cognitive abili-
ties, such as verbal ability, numeric ability, reasoning, memory, and processing
speed.
Individual and Group Tests Tests can be categorized based on how they are adminis-
tered. For example, individual tests are designed for administration to only a single exam-
inee at a time. Group tests are administered to multiple individuals simultaneously.
Individual tests are typically used for diagnostic decision making and generally require
examiners to meet and establish rapport with examinees. They allow examiners to observe
verbal and nonverbal behaviors during the test administration, enabling examiners to
gain more insight about the source of the examinee’s problems. Usually, administering
individual tests requires competency. This means that a counselor should have special
training, expertise, familiarity with materials, and practice with timing procedures. A
competent test user, according to the International Test Commission (intestcom.org/
Guidelines/Test+Use.php), must have full understanding of the tests to use them appro-
priately; users must also respect all involved in the testing process by acting professionally
and appropriately.
Group tests are typically more efficient than individual tests. They are usually less
expensive than individually administered tests, they minimize the time needed for admin-
istration and scoring, and they require less examiner skill and training. Group tests usually
contain items that can be scored objectively, usually by a computer, which reduces or elim-
inates the scoring errors commonly found in individual tests.
Verbal and Nonverbal Tests Tests can be classified as either verbal or nonverbal. Verbal
tests rely heavily on language usage, particularly oral or written responses. These tests
may involve grammar, vocabulary, sentence completion, analogies, and following verbal
instructions. Because verbal tests require examinees to understand the meaning of words
and the structure and logic of language, they heavily favor native speakers of the
language in which the test was developed. In contrast to verbal tests, nonverbal
tests reduce or completely eliminate the need for examinees to use language when
taking the test. Nonverbal tests provide opportunities for examinees to comprehend direc-
tions with little or no language, have limited linguistic content, and allow examinees to
respond to items nonverbally. For example, a nonverbal test may require a test taker to
respond to pictorial materials rather than verbal items. An example of a nonverbal test is
the Peabody Picture Vocabulary Test, Fourth Edition (PPVT; Dunn & Dunn, 2007). The
PPVT is a norm-referenced test that is individually administered. The normative sample
for the test included national representation for cultural diversity and special education.
Thus, the PPVT can be used to address some of the issues related to English as a second
language and special education issues specific to speech production.
Objective and Subjective Tests A common way to distinguish tests is based on the types
of items on the test. An objective test (i.e., structured test) contains selected-response items
(e.g., multiple choice, true/false), each of which contains a single correct or best answer. It
is considered objective, because scoring consists of matching the test taker’s item responses
to previously determined correct answers; there are no subjective or judgmental decisions
involved in the scoring process. In contrast, subjective tests consist of constructed-response
items (e.g., essay questions, performance tasks, portfolios) that require the examiner to
make judgmental decisions to score the test.
Other Terminology Strictly speaking, the term test should be used only for those proce-
dures in which test takers’ responses are evaluated based on their correctness or quality
(Urbina, 2014). Such instruments are usually maximum-performance tests that measure a
person’s knowledge, skills, or abilities. Tests that do not evaluate individuals on the basis of
correct and incorrect item responses (i.e., typical-performance tests) may be referred to by
several different names, such as inventories, questionnaires, surveys, checklists, schedules, or pro-
jective techniques. These instruments typically elicit information about an individual’s moti-
vations, preferences, attitudes, interests, opinions, and emotional makeup (Urbina, 2014).
The term scale is commonly used in connection with tests. Scales can refer to any of the
following (Urbina, 2014): (a) a whole test made up of several parts (e.g., the Stanford–Binet
Intelligence Scale); (b) a whole test focusing on a single characteristic (e.g., the Internal–
External Locus of Control Scale); (c) a subtest, which is a set of items within a test that meas-
ures specific characteristics (e.g., the depression scale of the MMPI-2); (d) a group of subtests
that share some common characteristic (e.g., the verbal scales of the Wechsler intelligence
tests); or (e) a numerical system used to rate or categorize some measured dimension (e.g.,
a rating scale).
Battery is another term that we often see in assessment. A battery is a group of tests
or subtests administered to one person at one time (Urbina, 2014). For example, in assess-
ing achievement, test batteries may be administered that consist of separate tests that
measure such areas as reading, mathematics, and language.
Participants in the Testing Process There are many stakeholders involved in the test
industry, and therefore the Standards for Educational and Psychological Testing were
developed to provide guidelines for developing and evaluating tests, standards for con-
ducting assessment, and methods of validating the interpretation of assessment data
(American Educational Research Association [AERA], American Psychological Association
[APA], & National Council on Measurement in Education [NCME], 2014). As such, it
is important to clarify the various parties and their roles in the testing industry. For exam-
ple, test developers are usually, but not always, academicians or investigators who are
mainly interested in research. They are interested in developing a test that accurately
measures the intended construct and will conduct research studies to support their claims.
Test developers provide documentation (in the test manuals) for test users to make sound
judgments about the nature and quality of the test (AERA et al., 2014). Test publishers
are the organizations or corporations that publish, market, and sell tests. They also
sometimes provide scoring services. Test users are the
individuals or agencies that select tests for some purpose. They may also be involved in
administering and scoring tests and using test results to make decisions. Test users are
most interested in the appropriateness of the tests for their purposes, whereas test publish-
ers are naturally more inclined toward profit margins (Urbina, 2014). Test takers are the
individuals who take the test by choice, direction, or necessity. Table 2.4 summarizes the
various parties and their roles.
Test developers: The people or organizations that construct tests. They should provide
information and supporting evidence that test users need to select appropriate tests.

Test publishers: The organizations or corporations that publish, market, and sell tests.

Test users: The people or agencies that select tests that meet their purposes and are
appropriate for the intended test takers. Test users may also be involved in
administering, scoring, and interpreting tests or making decisions based on test results.

Test takers: The individuals who take the tests.

Test reviewers: The individuals who conduct a scholarly review to critically evaluate a
test based on its psychometric and practical qualities.
Computer-Based Tests Computers were first introduced to the field of psychology in the
1950s. Since that time, and particularly in the last few decades, the use of computers in
assessment has grown exponentially. Advancements in computer technology and the con-
tinued integration of this technology into educational, psychological, and counseling prac-
tice are changing the way professionals conduct assessment. Computer-based testing refers
to using computers for test administration, scoring, and interpretation and for generating
narratives and written reports (Butcher, 2003). Computer-based testing has dramatically
changed the practice of assessment, which was once an overwhelmingly complex process
involving face-to-face administration, extensive preparation time, hand scoring using
overlay templates and hand calculations, manual interpretation, and manual report writing
(Cohen et al., 2012). Today, with the help of computers, test takers can respond to test
items on a computer monitor, and the computer program scores the test, analyzes the
results, and even provides some form of interpretive report or narrative (Cohen & Swerdlik, 2012).
Needless to say, computer-based testing saves test users a great deal of time and has
made the process of testing much more convenient. Computer-based tests offer other
advantages over traditional paper-and-pencil tests through immediate scoring and report-
ing of test results, test administration efficiency, flexible test administration schedules,
greater test security, and reduced costs. Computer-based testing also allows for the use of
innovative item types that are not feasible in the paper-and-pencil format, such as audio-
and video-based test items (Parshall, Spray, Kalohn, & Davey, 2002).
During the past 25 years, more than 400 studies have investigated whether results
from computer-based tests could be used interchangeably with paper-and-pencil test
results. Increasingly, tests are being adapted for computerized administration and yield
scores comparable to those from paper-and-pencil administration of the same tests
(Boo & Vispoel, 2012). However, some tests have not been adapted effectively for
computer-based administration. Counselors should review the psychometric properties of the
test in relation to the mode of administration prior to selecting a computerized version.
Although some limits to computer-based testing still exist, the process continues to expand,
and more tests are becoming available via computer administration. One test that is of
interest to many counselors is the National Counselor Exam (NCE). This 200-item, multi-
ple-choice test is available in both paper-and-pencil and computer-based administrations.
The NCE is the exam required to become a National Certified Counselor and also the exam
required for licensure in many states.
One type of computer-based test commonly used in achievement testing is the
computer-adaptive test. A computer-adaptive test is a test that tailors (or adapts) test questions to the
ability of each test taker. Each time a test taker answers a question, the computer adjusts to
the individual’s responses when determining what question to present next. For example,
a computer-adaptive test will start with a question that is moderately difficult. If the ques-
tion is answered correctly, then the next question will be more difficult. If it is answered
incorrectly, then the next question will be easier. This process continues until all questions
are answered, at which point the computer will determine the test taker’s ability level.
Because the computer scores each item before selecting the next one, only one question is
presented at a time, and the test takers may not skip, return to, or change responses to
previous questions. One example of a computer-adaptive test is the Graduate Management
Admission Test (GMAT), which is used as part of the admission process for a graduate
degree in business. The GMAT computer-adaptive test begins with the assumption that
the test taker has an average score and thus begins with an item of medium difficulty.
Based on performance on the first question, the difficulty level and points of the exam are
adjusted by the computer.
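The item-selection rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `run_adaptive_test`, the five difficulty levels, and the one-step up/down rule are hypothetical simplifications. Operational computer-adaptive tests such as the GMAT rely on far more sophisticated statistical models, and the sketch does not represent how any published test is scored.

```python
def run_adaptive_test(answers_correct, num_levels=5):
    """Simulate a simplified adaptive item-selection rule.

    answers_correct: list of booleans, one per question presented,
        True if the test taker answered that question correctly.
    num_levels: number of difficulty levels (1 = easiest); an assumption
        made for illustration only.
    Returns the difficulty level of each question that was presented.
    """
    level = (num_levels + 1) // 2  # start with a moderately difficult item
    presented = []
    for correct in answers_correct:
        presented.append(level)
        if correct:
            level = min(num_levels, level + 1)  # next item is harder
        else:
            level = max(1, level - 1)           # next item is easier
    return presented


# Hypothetical test taker: two correct answers, one miss, one correct.
levels = run_adaptive_test([True, True, False, True])
print(levels)  # [3, 4, 5, 4]
```

Because each item's difficulty depends on the previous response, the sketch also shows why test takers cannot skip or return to earlier questions: the sequence of items is built one response at a time.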
Observation
So far, we have covered two methods of assessment: interviews and tests. The third method
is observation, which is widely used in psychological and educational assessment. Observa-
tion is monitoring the actions of others or oneself in a particular context and making a record
of what is observed (Aiken & Groth-Marnat, 2006). It is a way of seeing what a person actu-
ally does in situations, rather than simply making inferences about behavior based on infor-
mation from interviews or test results. Behavioral observation can provide professionals
with information about an individual’s functioning, such as emotional responses, social
interactions, motor skills (i.e., body movements), and job performance, to name just a few
(Murphy & Davidshofer, 2005). It is particularly useful for identifying patterns of behav-
ior—that is, identifying the immediate behavior, its antecedents (what happened just before
the behavior), and its consequences (what happened afterward). The process of identifying
behavior patterns is often employed in P–12 schools through an approach called functional
behavior assessment (FBA), which seeks to identify the problem behavior of a student and
determine the function or purpose of the behavior. Counselors can use the results of an FBA
to develop interventions or teach acceptable alternatives to the behavior.
Understanding the function of behavior is not limited to the P–12 system. Counselors
working with children and adolescents in any area should have an understanding of
behavioral observation and FBA. In many cases, counselors can help parents, schools, res-
idential facilities, or inpatient settings to develop behavioral interventions for children and
adolescents. Understanding the elements of observation and behavioral function is
critical for this task. For example, imagine a child who is disrupting an activity because he does
not want to participate. The function (purpose) of his behavior in this case is probably to
escape the activity. If you used time-out as a consequence, then the child would be
removed from the activity and get exactly what he wanted. This would reinforce his
behavior rather than change it. One of the more appropriate
actions in this scenario would be to ignore the behavior or to find opportunities to
reinforce the appropriate actions of others.
Counselors may use formal assessment instruments (e.g., standardized rating scales
and computer-based observation software) or informal strategies (e.g., raw notes) to con-
duct observations. Observations can be a one-shot affair or consist of several samplings
over a longer time span. Observers may center on specific behaviors that are objective and
measurable or on general, overall behavior or adjustment. Depending on the context and
the age of the client, observations can be made and recorded by professionals, significant
others, any other person acquainted with the client, or the client himself or herself. Exam-
ples of observation include the following:
• A school counselor observes a child interacting with his classmates on the school
playground to evaluate his social skills.
• A family therapist views a videotape of parents and children playing together to
assess parenting skills.
• An adult who wants to change his eating patterns records his thoughts and feelings
prior to feeling an urge to overeat.
Event Recording Event recording (also called frequency recording) is the simplest of the
observation data collection methods. It requires an observer to observe, count, and record
the number of times a behavior has occurred. Event recording is best suited to recording
occurrences of low-rate behaviors, which are behaviors that have a definite beginning and
ending and do not often occur (e.g., a student leaving his or her seat; Shapiro, 1987). A tally
sheet listing the behaviors to be observed and counted is useful: When the observer sees
the behavior of interest, he or she simply makes a tick mark on the sheet (see Figure 2.2).
After the observation, the tick marks can be totaled.
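For counselors who keep observation logs electronically, the tallying step described above might be sketched as follows. The behavior labels and the `tally_events` helper are hypothetical; the sketch simply counts one entry per tick mark.

```python
from collections import Counter


def tally_events(observed):
    """Count how many times each behavior was recorded.

    observed: list of behavior labels, one entry per tick mark
        the observer would have made on a paper tally sheet.
    """
    return Counter(observed)


# Hypothetical observation log for one class period.
log = ["leaving seat", "leaving seat", "calling out", "leaving seat"]
totals = tally_events(log)
print(totals["leaving seat"])  # 3
print(totals["calling out"])   # 1
```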
Date(s): 2/1 to 2/5
Name: _________________________________
Observer: ______________________________
Description of Behavior: Leaving seat during science class

Duration Recording Duration recording is used when it is more important to know how
long a behavior lasts than how often it occurs. In duration recording, the length of
time of a behavior from beginning to end is tracked. It is most applicable for
recording sustained behaviors that have a clear beginning and a clear ending; it is not
recommended for behaviors that occur at a very high rate. Crying, temper tantrums, and
thumb sucking are examples of behaviors that can be documented using duration
recording (see Figure 2.3; Shapiro, 1987). Duration recording usually requires a watch or clock so
that a precise measurement of the behavior can be recorded.
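As a rough illustration of duration recording, the following sketch totals the length of several recorded episodes. It assumes the observer noted a start and end clock time for each episode in `HH:MM:SS` format and that every episode falls within a single day; the function name and the sample times are hypothetical.

```python
from datetime import datetime


def total_duration_seconds(episodes):
    """Sum the lengths of recorded behavior episodes.

    episodes: list of (start, end) clock times as "HH:MM:SS" strings,
        assumed to fall on the same day with end later than start.
    Returns the total duration in seconds.
    """
    fmt = "%H:%M:%S"
    total = 0.0
    for start, end in episodes:
        delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
        total += delta.total_seconds()
    return total


# Hypothetical record of two crying episodes during one observation.
episodes = [("10:02:00", "10:05:30"), ("10:20:15", "10:21:00")]
print(total_duration_seconds(episodes))  # 255.0
```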
Time Sampling In both event and duration recording techniques, all occurrences of the
behaviors are documented during the observational period. However, some behaviors
occur too frequently to obtain accurate counts or have no clear beginning or ending, which
prevents effective event and duration recording (Kamphaus, Barry, & Frick, 2005). In these
cases, time sampling is a more appropriate data collection method. Time sampling
(sometimes referred to as interval recording, interval sampling, or interval time sampling)
divides observation periods into specific time intervals; then, behaviors are simply coded
as being either present or absent during each time interval. Interval length varies,
depending on the frequency of the behavior, the amount of time allowed for the observation,
and the skill of the observer in monitoring and recording child behavior (Nock & Kurtz,
2005). As an example, the Behavior Assessment System for Children (BASC-2; Reynolds &
Kamphaus, 2001) is a published, comprehensive system that assesses children’s behavior. It
includes a time-sampling method for observing students in which observers code behaviors
(such as responses to the teacher, peer interactions, or working on school subjects)
repeatedly during 3-second intervals spaced 30 seconds apart for a total of 15 minutes. The
percentage of intervals during which a given behavior occurred can be calculated to provide
information about the frequency of adaptive and maladaptive behaviors. An example of the
BASC-2 Student Observation Scale is shown in Figure 2.4.

FIGURE 2.4 Sample items from the BASC-2 student observation scale.
Rows (problem behaviors): Inappropriate Movement; Inattention; Inappropriate Vocalizations;
Repetitive Motor Movements; Aggression.
Columns: 15 interval marks labeled 1’ and 15 labeled 30”, followed by a Total column.
Note: Quotation marks (“) refer to seconds. Apostrophes (‘) refer to minutes.
Source: Behavior Assessment System for Children, Second Edition (BASC-2). Copyright © 2004 by NCS Pearson, Inc.
Reproduced with permission. All rights reserved.
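The percentage-of-intervals calculation mentioned in the time-sampling discussion can be sketched as follows. The function name and the sample data are hypothetical, and the sketch is not the scoring procedure of the BASC-2 or any other published instrument; it only shows the arithmetic of dividing the number of intervals in which a behavior was coded as present by the total number of intervals observed.

```python
def percent_intervals_present(intervals):
    """Percentage of observation intervals in which a behavior occurred.

    intervals: list of booleans, one per time interval, True if the
        behavior was coded as present during that interval.
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    return 100.0 * sum(intervals) / len(intervals)


# Hypothetical 15-minute observation coded every 30 seconds (30 intervals),
# with inattention coded as present in 6 of them.
coded = [True] * 6 + [False] * 24
print(percent_intervals_present(coded))  # 20.0
```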
Rating Scales In observation, rating scales are used to describe and evaluate an indi-
vidual’s specific behaviors. Rating scales often appear as preprinted sheets on which the
observer rates each behavior to indicate either its quality or how often the behavior
occurred. Rating scales are an efficient means of collecting information about a variety of
behaviors, from general functioning to such specific behaviors as social skills, aggressive
behavior, anxiety, and hyperactivity, to name a few. They can be used repeatedly and
across settings and can be completed by various sources (e.g., the client, teachers, par-
ents, counselors).
Rating scales usually list the important dimensions of the behaviors being rated and
quantify those dimensions in some way to convey meaning (Rutherford, Quinn, & Mathur,
2007). A few commonly used rating scale formats in observation include Likert scales,
graphic scales, and semantic differential scales. The Likert scale (named after the psycholo-
gist who developed it) consists of a series of written statements to which respondents
indicate how much they agree or disagree (i.e., strongly disagree, disagree, neither agree nor
disagree, agree, strongly agree). The graphic rating scale is similar to the Likert scale, except
that it presents respondents with a graphic 5- or 7-point continuum ranging from never to
always or from strongly disagree to strongly agree. Verbal anchors are typically placed at
various points along the scale. Semantic differential rating scales consist of bipolar adjectives
separated by a 7-point scale on which respondents select one point to indicate their
response. These rating scales are illustrated in Figure 2.5.

Graphic scale   The child’s aggressive behavior occurs following a request to perform a
                difficult task.
                1   2   3   4   5

anxious _______________________________________________ calm
1   2   3   4   5   6   7

depressed ________________________________________________ cheerful
1   2   3   4   5   6   7
An important issue in using rating scales is the identity of the informant (i.e., source
of information) who completes the scale. Typically, informants are teachers, parents, or oth-
ers who know the individual being assessed or the individual himself or herself. Some
informants are in a better position to rate certain behaviors; for example, parents are more
likely to be knowledgeable about a child’s sleep patterns, sibling interactions, eating
behaviors, and so forth (Rutherford et al., 2007). Some published rating scales may be
designed for only one informant (e.g., parents only, children only), and some have sepa-
rate forms for multiple informants (e.g., teachers, parents, child). For example, the Achen-
bach System of Empirically Based Assessment (ASEBA) includes a series of rating scales
designed to assess children’s behavior problems and social competencies by using sepa-
rate rating forms for parents, teachers, and children (see Figure 2.6 for a sample from the
Teacher’s Report Form [TRF] for Ages 6–18; Achenbach & Rescorla, 2001). The best prac-
tice in using rating scales is to employ multiple informants across situations and settings
(Rutherford et al., 2007). Doing so provides a more complete view of an individual’s behavior.
The most significant criticism of rating scales involves rater (informant) bias, which
can affect the validity of the instrument. Table 2.5 provides a list of common rating errors.
VIII. Compared to typical pupils of the same age:
1. Much less   2. Somewhat less   3. Slightly less   4. About average   5. Slightly more   6. Somewhat more   7. Much more
FIGURE 2.6 Sample of the Teacher’s Report Form for Ages 6–18.
Source: Reprinted with permission from Manual for the ASEBA School-Age Forms and Profiles, by T. M. Achenbach
& L. A. Rescorla, 2001, Burlington, VT: University of Vermont, Research Center for Children, Youth, & Families.
account of what an individual says and does during a particular period of time, or it may
be a single record of a significant incident (e.g., critical incident report). As a method of
informal observation, anecdotal records may consist of notes written on index cards or in
a log, which are more fully elaborated upon and summarized after the observation. As a
formal observation method, anecdotal records can be very detailed accounts written on
formal record forms. They typically include the name of the individual observed; the name
of observer; the date, time, and setting of the observation; the anecdote (i.e., the description
Self-Monitoring
Self-monitoring (also called self-assessment) is one of the most commonly used assessment
approaches in research and practice (Hersen, 2006). Self-monitoring is a process by which
an individual tracks and records their own specific behaviors and related thoughts and feelings (Cohen et al.,
2012). It can be successfully employed to monitor and record the frequency, antecedents,
and consequences of problematic behaviors (e.g., social anxieties, phobias, social skills
problems, habits). For example, individuals with anger-management problems may use
self-monitoring to record the date, time, and location of an anger episode and the events,
thoughts, and feelings that preceded the episode.
Self-monitoring can be used for both assessment and intervention. Simply gaining
awareness about one’s patterns of behavior can affect the frequency with which behaviors
occur. However, the usefulness of self-monitoring depends a great deal on the individual’s
compliance in recording his or her behavior. Self-monitoring requires individuals to be
aware that they are engaging in the behavior and to have a timely and efficient means of
logging behavior occurrences. Behaviors can be recorded using frequency marks, check-
lists, or other means. To help with recording behaviors in a timely fashion, various devices
are available, such as golf wrist score counters and digital timers or alarms.
Self-monitoring can also include the use of autobiographies, diaries, journals, letters,
stories, and poems. These approaches help provide insight into an individual’s behavior,
attitudes, and personality dimensions.
Collateral Sources
Typically, the primary source of information is the individual who is being evaluated—but
not always. Sometimes, a wellspring of information can come from people who know the
individual best. Any third party who provides information is considered a collateral source
(Table 2.6 provides a list of potential collateral sources). Family members, spouses or part-
ners, and others close to the individual being evaluated are useful personal sources of
information. Information can also be obtained from professional sources involved with the
individual, such as teachers, physicians, and mental health professionals. Collateral infor-
mation is generally obtained from interviews with the third party, either face-to-face or by
telephone (Heilbrun, Warren, & Picarello, 2003). Although family and friends often want
to give opinions, information should focus primarily on behavior descriptions (i.e., what
the informant has personally observed or witnessed). Rating scales completed by collateral
sources are often an effective means for obtaining behavioral descriptions of the client.
Another valuable source of collateral information can be found in records (Craig, 2009),
which can include school grades or attendance, previous psychological or educational
assessments, mental health treatment plans or summaries, court documents, and letters
from medical providers, to name just a few. In general, information from collateral sources,
especially from more neutral professional sources, can be considered more objective and
reliable than information obtained directly from examinees (Austin, 2002).
The extent to which a counselor uses collateral information depends on the purpose
of the evaluation, the complexity of the client’s presentation, and the intervention goals. It
is particularly important in cases involving individuals with impaired insight into their
problems, such as those with substance-abuse problems or cognitive disabilities (American
Psychiatric Association, 2006). For example, in a psychiatric crisis center, collateral infor-
mation may be crucial to understanding a client’s acute mental health problems. When
assessing children, counselors regularly obtain information from parents about the child’s
behavior at home, or they may contact the child’s teacher to obtain information about the
child’s functioning at school.
Collateral information is considered a necessary component in all forensic evaluations,
which are evaluations that address a given legal issue (e.g., child custody evaluations;
Cavagnaro, Shuster, & Colwell, 2013; Ertelt & Matson, 2013). Information from third par-
ties may offset the potential bias of the examinee’s self-report. For example, it can be
expected that the parents in a child custody evaluation may intentionally or unintention-
ally distort, exaggerate, or minimize the information they present during an interview so
that the examiner will view them favorably. In this case, collateral information from
professional sources can help evaluators scrutinize the credibility of data obtained from
parents (Patel & Choate, 2014).
An important issue in obtaining information from collateral sources is the confiden-
tiality of the individual being evaluated. Permission must be obtained through consent
forms signed by the client (or the client’s parent) before professionals can request informa-
tion or contact third parties.
Summary
In this chapter, we provided an overview of the three types of assessment. We will continue to
integrate information about interviews, tests, and observations throughout the remainder of
the textbook. It is important to note that counselors and other helping professionals need
supervised training on each type of assessment in order to achieve competency. In addition to
covering the types of assessment, we also introduced you to types of data and sources of
information used to obtain a complete and accurate picture of the individual being evaluated.
Regardless of the type of assessment, counselors may choose from a wide array of formal or
informal instruments and strategies. Although the client is usually the primary source of
information, collateral information is often gathered from relatives, friends, teachers,
health professionals, and other relevant parties. Information may also come from documents,
such as medical records, school records, and written reports of earlier assessments.
Counselors are required to integrate information from multiple data collection methods and
multiple sources to form impressions and make recommendations about the individual being
assessed.
Suggested Activities
1. Write a brief essay on the subject of assessment. Discuss your thoughts on assessment in
general and your beliefs about its purpose in the helping professions.
2. Interview a school counselor, mental health counselor, or marriage and family therapist.
Inquire about which assessment instruments and strategies he or she uses on a regular basis.
3. In triads, develop a semistructured interview. Use the general domains listed in Table 2.3,
and brainstorm and construct as many questions as possible for each domain.
4. Using the semistructured interview developed in the previous activity, role-play an
interview in which one student is the counselor, one is the client, and one is an observer.
The counselor asks the client the questions that were developed for the semistructured
interview. After the interview, all three may then discuss the following: How did the
counselor feel asking questions? How much did the client respond to questions? Were some
questions harder to ask or answer than other questions? What counselor and client behaviors
did the observer notice throughout the interview?
5. In a small group, develop a behavior rating scale that evaluates counseling skills. To do
this, you must first determine the specific counseling skills to be evaluated and then decide
on the type of scale (numerical scales, graphic scales, or semantic differential scales) used
to measure the skills.
6. Search the Internet for behavior observation charts. Select three different charts to
compare and contrast. In small groups, discuss the three charts and the types of observations
for which they might be designed. Determine a behavior that the group would want to observe,
and discuss which form might be the best choice. What modifications would the group have to
make in order to conduct the observation?
References
Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms and Profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth, & Families.
Aiken, L. A., & Groth-Marnat, G. (2006). Psychological testing and assessment (12th ed.). Boston, MA: Pearson.
American Counseling Association. (2014). ACA Code of Ethics. Alexandria, VA: Author.
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Authors.
Hardwood, T. M., Beutler, L. E., & Groth-Marnat, G. (2011). Integrative assessment of adult personality. New York, NY: Guilford Press.
American Psychiatric Association. (2006). Practice guidelines for the psychiatric evaluation of adults (2nd ed.). Arlington, VA: Author.
Austin, W. G. (2002). Guidelines for using collateral sources of information in child custody evaluations. Family Court Review, 40(2), 177–184.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation.
Boo, J., & Vispoel, W. (2012). Computer versus paper-and-pencil assessment of educational development: A comparison of psychometric features and examinee preferences. Psychological Reports, 111(2), 443–460. doi:10.2466/10.03.11.PR0.111.5.443-460
Butcher, J. N. (2003). Computerized psychological assessment. In J. R. Graham, J. A. Naglieri, & I. B. Weiner (Eds.), Handbook of psychology: Assessment psychology (Vol. 10, pp. 141–163). New York, NY: John Wiley & Sons.
Cavagnaro, A. T., Shuster, S., & Colwell, K. (2013). Classification discrepancies in two intelligence tests: Forensic implications for persons with developmental disabilities. Journal of Forensic Psychology Practice, 13(1), 49–67. doi:10.1080/15228932.2013.750968
Cohen, R. J., Swerdlik, M. E., & Struman, E. (2012). Psychological testing and assessment: An introduction to tests and measurement (8th ed.). Boston, MA: McGraw-Hill.
Craig, R. J. (2005). The clinical process of interviewing. In R. J. Craig (Ed.), Clinical and diagnostic interviewing (2nd ed., pp. 21–41). Lanham, MD: Jason Aronson.
Craig, R. J. (2009). The clinical interview. In J. N. Butcher (Ed.), Oxford handbook of personality assessment (pp. 201–225). New York, NY: Oxford University Press. doi:10.1093/oxfordhb/9780195366877.013.0012
Domino, G., & Domino, M. L. (2006). Psychological testing: An introduction. New York, NY: Cambridge University Press.
Dunn, L. M., & Dunn, D. M. (2007). The Peabody Picture Vocabulary Test (4th ed.). Bloomington, MN: NCS Pearson Inc.
Erford, B. T. (2006). Counselor’s guide to clinical, personality, and behavioral assessment. Boston, MA: Lahaska.
Erford, B. T. (2013). Assessment for counselors (2nd ed.). Boston, MA: Houghton Mifflin Company.
Ertelt, T., & Matson, K. (2013). Accurately presenting your forensic examination in a cohesive forensic report. PsycCRITIQUES, 58(43). doi:10.1037/a0034697
Groth-Marnat, G. (2009). Handbook of psychological assessment. Hoboken, NJ: John Wiley & Sons.
Harwood, R. M., Beutler, L. E., & Groth-Marnat, G. (Eds.). Integrative Assessment of Adult Personality (3rd ed.). New York, NY: Guilford Press.
Heilbrun, K. (2001). Principles of forensic mental health assessment (Perspectives in law and psychology, Vol. 12). New York, NY: Kluwer.
Heilbrun, K., Warren, J., & Picarello, K. (2003). Third-party information in forensic assessment. In A. M. Goldstein & I. B. Weiner (Eds.), Handbook of psychology: Forensic psychology (Vol. 2, pp. 69–86). Hoboken, NJ: John Wiley & Sons.
Hersen, M. (2006). Clinician’s handbook of adult behavioral assessment. Burlington, MA: Elsevier.
Hersen, M., & Turner, S. M. (2012). Adult psychopathology and diagnosis (6th ed.). Hoboken, NJ: Wiley & Sons.
International Test Commission. (2013, October 8). ITC guidelines on test use. Retrieved from http://www.intestcom.org/upload/sitefiles/41.pdf
Jones, K. (2010). The unstructured clinical interview. Journal of Counseling & Development, 88(2), 220–226. doi:10.1002/j.1556-6678.2010.tb00013.x
Kamphaus, R. W., Barry, C. T., & Frick, P. J. (2005). Clinical assessment of child and adolescent personality and behavior (3rd ed.). New York, NY: Springer.
Miller, C. (2009). Interviewing strategies. In M. Hersen & D. L. Segal (Eds.), Diagnostic interviewing (4th ed., pp. 47–66). New York, NY: Kluwer Academic/Plenum Publishers.
Morrison, J. (2008). The first interview (3rd ed.). New York, NY: Guilford.
Murphy, K. R., & Davidshofer, C. O. (2005). Psychological testing: Principles and applications (6th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.
Nock, M. K., & Kurtz, S. M. S. (2005). Direct behavioral observation in school settings: Bringing science to practice. Cognitive and Behavioral Practice, 12, 359–370.
Opie, C. (2004). Doing educational research: A guide to first-time researchers. London: Sage.
Parshall, C. G., Spray, J. A., Kalohn, J. C., & Davey, T. (2002). Practical considerations in computer-based testing. New York, NY: Springer.
Patel, S., & Choate, L. (2014). Conducting child custody evaluations: Best practices for mental health counselors who are court-appointed as child custody evaluators. Journal of Mental Health Counseling, 36(1), 18–30.
Reynolds, C. R., & Kamphaus, R. W. (2001). Behavior assessment system for children: Manual. Circle Pines, MN: American Guidance Service.
Rutherford, R. B., Quinn, M. M., & Mathur, S. R. (2007). Handbook of research in emotional and behavioral disorders. New York, NY: Guilford.
Sacks, P. (2001). Standardized minds: The high price of America’s testing culture and what we can do to change it. New York, NY: Da Capo Press.
Salkind, N. J. (2012). Tests and measurement for people who (think they) hate tests and measurement (2nd ed.). Thousand Oaks, CA: Sage.
Sattler, J. M., & Hoge, R. D. (2006). Assessment of children: Behavioral, social, and clinical foundations (5th ed.). San Diego, CA: Jerome M. Sattler Publisher Inc.
Shapiro, E. S. (1987). Academic problems. In M. Hersen & V. Van Hasselt (Eds.), Behavior therapy with children and adolescents: A clinical approach (pp. 362–384). New York, NY: John Wiley & Sons.
Sommers-Flanagan, J., & Sommers-Flanagan, R. (2008). Clinical interviewing (4th ed.). Hoboken, NJ: John Wiley & Sons.
Urbina, S. (2014). Essentials of psychological testing (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olson, J. (2008). Comparability of computer-based and paper-and-pencil testing in K–2 reading assessments: A meta-analysis of testing mode effects. Educational & Psychological Measurement, 68, 5–24.
Weiner, I. B. (2013). The assessment process. In J. R. Graham, J. A. Naglieri, & I. B. Weiner (Eds.), Handbook of psychology: Assessment psychology (2nd ed., Vol. 10, pp. 3–25). Hoboken, NJ: John Wiley & Sons.
Young, M. E. (2012). Learning the art of helping: Building blocks and techniques (5th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.
CHAPTER
3 Statistical Concepts
• Describe measures of central tendency, including the mean, median, and mode.
scores, how frequently certain scores can be obtained, or how spread out scores were
among a group of test takers. Because counselors frequently are involved in the assess-
ment process and in making decisions based on assessment results, they should be famil-
iar and comfortable with some basic statistical concepts. A statistic is a numerical
representation of information. Whenever we quantify or apply numbers to data in order
to organize, summarize, or analyze information, we are using statistical methods. There
are two general classes of statistical methods: descriptive and inferential. Descriptive statistics
play an important role in interpreting instrument scores. They are used to describe
and summarize large amounts of data in a clear and understandable way. In assessment,
the simplest descriptive statistic used is frequency, or the count of the occurrences of a
particular test score. Other descriptive statistics used in assessment include measures of
central tendency (e.g., mean, median, mode), measures of variability (e.g., variance,
standard deviation), and measures of relationship (i.e., correlation). Inferential statistics
are used when we want to make inferences about a large group of people, called a population,
by examining the characteristics of randomly selected subgroups from the population,
called samples. Whereas descriptive statistics are used to describe the basic features
of data, inferential statistics are used to reach conclusions that extend beyond the immedi-
ate data alone.
One of the most basic concepts frequently encountered when discussing statistics is
associated with the term variable. A variable is simply anything that can take on more than
one value. Variables can be visible (e.g., gender, hair color) or invisible (e.g., personality,
intelligence, aptitude). Variables can be defined in terms of numerical values (e.g., the
number of children in a family, the average income of individuals in a country) or by dif-
ferent categories (e.g., gender, relationship status; Urbina, 2014). In formal terms, variables
may be classified as quantitative, qualitative, discrete, continuous, observable, or latent
(see Table 3.1). In assessment, we are interested in measuring variables related to educa-
tional or psychological constructs, such as achievement, intellectual ability, interests, and
personality traits.
TABLE 3.1 Types of Variables
Type          Description
Quantitative  Have a numerical value (e.g., age, income, numerical test scores)
Qualitative   Nonnumeric in nature (e.g., gender, relationship status, political party affiliation)
Continuous    Quantitative variables that can take on any value and be subdivided infinitely; a measure of “how much” (e.g., height, weight, time spent completing a task, degree of anxiety)
Discrete      Quantitative variables that consist of a basic unit of measurement that cannot be subdivided; a measure of “how many” (e.g., the number of people in a household)
Observable    Can be observed and directly measured
Latent        Cannot be directly observed or measured, but are inferred from other variables that are observed and measured; also called hidden variables, model parameters, hypothetical variables, or hypothetical constructs
48 Chapter 3 • Statistical Concepts
Scales of Measurement
Any discussion of statistics must begin with an understanding of measurement. Measurement
is the process of assigning numbers or symbols to represent objects, traits, or behaviors
(i.e., variables) according to a set of logical rules. For example, to complete a customer
satisfaction survey, customers must use a specific set of rules to assign numbers that indi-
cate their level of satisfaction. They might rate service on a 10-point scale, where 1 means
very dissatisfied and 10 means very satisfied.
We use descriptive statistics to describe or summarize measurement data; however,
the way that data can be summarized depends on the scale of measurement. Scales of measurement
refers to ways in which variables are defined and categorized. There are four
scales of measurement: nominal, ordinal, interval, and ratio. The type of scale is determined
by the presence or absence of three qualities: (a) magnitude, meaning the inherent order of
numbers from smaller to larger; (b) equal intervals, which means that there are equal dis-
tances between adjacent points on the scale; and (c) the presence of an absolute or true zero,
which refers to the zero point representing the absence of the property being measured
(e.g., no behavior, none correct; see Table 3.2).
Nominal Scale
The simplest scale of measurement is the nominal scale, which is used to describe qualita-
tive variables that can be categorized based on one or more distinguishing characteristics.
Because the nominal scale represents only names, it has none of the three qualities (i.e.,
magnitude, equal intervals, absolute zero). Nominal scale variables are considered discrete
and can be placed into one (and only one) mutually exclusive and exhaustive category. The
following are some examples:
Relationship status:
_____ Single
_____ Married
_____ Widowed
_____ Divorced
_____ Separated
Gender:
_____ Male
_____ Female
Ordinal Scale
Similar to nominal scales, ordinal scales involve classification of discrete variables. How-
ever, in addition to classification, the scale includes the property of magnitude. This means
that variables are rank-ordered from the greatest to the least or the best to the worst. For
example, on an introversion–extroversion continuum, an extremely introverted individual
might be assigned the lowest rank of 1, whereas an overly extroverted individual might
receive the highest rank. On other scales, students could be placed in rank order on the
basis of such attributes as height (arranged from tallest to shortest), examination results
(arranged from best to worst), or grade point average (arranged from highest to lowest).
Here is another example:
Please rate your degree of competence in the listed behaviors using the following
key: 1 = low, 2 = below average, 3 = average, 4 = above average, 5 = high.
1. Oral communication skills 1 2 3 4 5
2. Written communication skills 1 2 3 4 5
3. Listening skills 1 2 3 4 5
Problems can arise with the ordinal scale, because the size of the intervals can be
unequal. For example, an individual might be extremely strong in certain areas, with min-
ute differences distinguishing those competencies, and extremely weak in others. Prob-
lems can also occur when comparing rankings across groups. Individuals with rankings of
1 in each group might vary tremendously on the variable being ranked. In addition, the
numbers used for rankings do not reflect anything quantitative about the variable being
ranked.
Interval Scale
The interval scale represents a higher level of measurement than the ordinal scale. Not
only does the interval scale consist of ordered categories, but the categories form a series
of equal intervals across the whole range of the scale. Thus, interval scales encompass
qualities of both magnitude and equal intervals. An important characteristic of interval
scales is that there is no absolute zero point; that is, there can be no absence of the variable
being measured. For instance, the Fahrenheit scale is an interval scale, because each degree
is equal but with no absolute zero point. This means that although we can add and subtract
degrees (100 °F is 10 degrees warmer than 90 °F), we cannot multiply values or create
ratios (100 °F is not twice as warm as 50 °F).
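The point about ratios can be checked with a short sketch. The function name `fahrenheit_to_celsius` and the comparison with the Celsius scale are illustrative assumptions, not part of the text. The sketch shows that the "ratio" of two temperatures changes when the same temperatures are expressed on another interval scale, whereas a comparison of differences does not.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius (hypothetical helper)."""
    return (f - 32) * 5 / 9


# Ratios are not stable on an interval scale: the same two temperatures
# give different "ratios" depending on where the arbitrary zero sits.
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)
print(ratio_f, round(ratio_c, 2))

# Differences are stable: a 10-degree gap (100 vs. 90) is the same size as
# another 10-degree gap (60 vs. 50) on either scale.
gap_high_c = fahrenheit_to_celsius(100) - fahrenheit_to_celsius(90)
gap_low_c = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)
print(round(gap_high_c, 2), round(gap_low_c, 2))
```

Because the two ratios disagree, a statement such as "twice as warm" has no fixed meaning on an interval scale.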
Ratio Scale
In addition to all the qualities of nominal, ordinal, and interval scales, the ratio scale consists of
a true or absolute zero point. This means that it contains all three qualities: magnitude, equal
intervals, and absolute zero. Thus, this scale permits all types of meaningful mathematical
calculations. Age, height, weight, and scores on a 100-point test are examples of ratio scales.
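As a compact summary of the four scales, the following sketch records which of the three qualities each scale has. The dictionary `SCALES` and the function `classify_scale` are hypothetical names introduced only for illustration.

```python
# Each scale is described by the three qualities named in the text:
# magnitude, equal intervals, and an absolute (true) zero.
SCALES = {
    "nominal": (False, False, False),
    "ordinal": (True, False, False),
    "interval": (True, True, False),
    "ratio": (True, True, True),
}


def classify_scale(magnitude, equal_intervals, absolute_zero):
    """Return the scale whose qualities match, or None if no scale matches."""
    for name, qualities in SCALES.items():
        if qualities == (magnitude, equal_intervals, absolute_zero):
            return name
    return None


# The Fahrenheit scale has magnitude and equal intervals but no true zero.
print(classify_scale(True, True, False))
```

Combinations that do not correspond to any of the four scales (for example, equal intervals without magnitude) return None in this sketch.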
Describing Scores
As stated earlier in the chapter, descriptive statistics are used to describe the basic features
of the data in a clear and understandable way. In this section, we will discuss the basic
descriptive statistics that are used to describe and organize test scores to make them more
manageable and understandable. Let’s begin with a picture of what test scores look like.
Let’s suppose we have administered three tests to 30 youthful offenders who have
been assigned to our cottage in a correctional facility. The scores of the Nowicki–Strickland
Locus of Control Scale, Slosson Intelligence Test (SIT-R3), and the Jesness Inventory are
listed in Table 3.3.
Let’s look first at the scores on the Nowicki–Strickland Locus of Control Scale, given in
the LOC column, which measures children’s locus of control (internal or external) as defined
by Rotter (1966). Individuals with a strong internal locus of control tend to attribute outcomes
of events to their own control, whereas individuals with a strong external locus of control tend
to attribute outcomes of events to external circumstances. On the Nowicki–Strickland scale,
low scores indicate internality and high scores indicate externality. With the way that the
scores are listed on the table, it is hard to get a picture of the characteristics of this group of
youths. To visualize the data better, we could arrange the scores from high to low:
15 9, 9, 9 6, 6 3, 3, 3, 3
14 8, 8, 8 5, 5, 5, 5 2, 2, 2
13 7, 7 4, 4, 4, 4, 4 1
We can determine at a glance the highest score and the lowest score, and we can find the
approximate middle. However, when a large number of test scores has a greater range (the
spread between the high and low scores), it is not as easy to organize and record the scores
and see an overall picture. To organize and describe large numbers of scores, researchers
and test developers use several different types of descriptive statistics:
• Frequency distributions
• Measures of central tendency
• Measures of variability
• Measures of relationship
Frequency Distributions
A distribution is simply a set of scores. These scores can represent any type of variable and
any scale of measurement. One of the most common procedures for organizing and
TABLE 3.3 Scores on the Nowicki–Strickland Locus of Control Scale, Slosson Intelligence
Test, and Jesness Inventory*
Youth LOC SIT-R3 AC IS CA SO CO EN RA CM IN SC CS RE FR UN
1 1 110 12 27 24 17 23 18 22 20 13 12 28 37 18 26
2 3 120 18 27 28 23 27 16 20 24 15 18 29 33 19 34
3 5 99 15 30 20 21 28 23 19 21 12 16 31 34 18 27
4 3 95 12 22 23 16 22 18 19 21 19 15 26 33 19 33
5 8 93 12 28 18 17 25 18 22 19 12 15 30 36 18 26
6 3 112 15 24 24 20 20 17 22 22 20 12 24 35 19 29
7 9 108 13 26 16 15 23 15 19 17 19 16 18 34 15 31
8 8 82 9 21 19 16 22 16 14 16 18 11 22 30 11 24
9 2 115 11 30 21 17 23 18 19 24 20 18 25 41 21 26
10 14 70 19 20 19 22 21 18 21 14 14 15 30 40 17 30
11 9 99 20 28 22 25 27 13 25 21 13 16 23 37 21 35
12 5 83 8 24 23 15 18 22 18 18 10 9 24 34 15 17
13 5 88 17 29 19 22 26 20 23 18 16 18 24 29 16 31
14 15 75 12 21 15 15 20 15 17 18 17 11 19 34 18 26
15 6 102 15 25 21 21 29 16 19 16 16 14 28 32 16 32
16 7 76 13 28 16 17 19 13 14 19 17 16 22 32 16 27
17 3 85 16 28 15 21 25 17 16 19 17 17 32 41 19 30
18 2 112 14 29 21 19 24 20 18 21 20 18 25 40 20 32
19 13 79 13 17 20 14 27 17 12 18 15 12 25 18 16 25
20 2 117 16 24 22 22 23 17 15 15 15 15 25 32 16 28
21 4 113 15 27 16 20 23 18 22 17 15 15 29 33 22 27
22 9 91 9 27 17 11 18 14 21 16 15 12 26 32 14 18
23 6 107 9 20 18 15 29 16 18 13 18 15 22 28 14 29
24 5 105 18 28 25 23 24 16 20 20 17 15 25 35 19 26
25 4 109 15 27 22 22 22 19 15 20 17 15 26 39 15 23
26 4 107 18 25 25 22 30 17 17 18 18 19 29 38 18 33
27 7 83 17 27 23 22 21 16 23 21 18 15 27 33 17 28
28 4 111 16 16 16 20 25 8 16 13 11 18 21 34 20 32
29 4 109 9 20 13 13 20 18 14 14 16 12 23 28 13 19
30 8 101 17 27 19 21 26 13 21 10 16 16 28 33 20 27
*LOC = Nowicki–Strickland Locus of Control Scale. SIT-R3 = Slosson Intelligence Test. The Jesness scales are
Anger Control (AC), Insight (IS), Calmness (CA), Sociability (SO), Conformity (CO), Enthusiasm (EN), Rapport
(RA), Communication (CM), Independence (IN), Social Concern (SC), Consideration (CS), Responsibility (RE),
Friendliness (FR), and Unobtrusiveness (UN).
Score  Tally  Frequency
15     /      1
14     /      1
13     /      1
12            0
11            0
10            0
9      ///    3
8      ///    3
7      //     2
6      //     2
5      ////   4
4      /////  5
3      ////   4
2      ///    3
1      /      1
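The tally above can be reproduced with a few lines of code. This is a minimal sketch, assuming the scores are taken from the LOC column of Table 3.3 in youth order; the variable names are illustrative.

```python
from collections import Counter

# LOC column of Table 3.3, youths 1 through 30.
loc_scores = [1, 3, 5, 3, 8, 3, 9, 8, 2, 14, 9, 5, 5, 15, 6,
              7, 3, 2, 13, 2, 4, 9, 6, 5, 4, 4, 7, 4, 4, 8]

# Count how many times each score occurs.
frequencies = Counter(loc_scores)

# List every possible score from highest to lowest, including scores
# that nobody obtained (a Counter returns 0 for missing keys).
for score in range(max(loc_scores), min(loc_scores) - 1, -1):
    count = frequencies[score]
    print(f"{score:>2}  {'/' * count:<5}  {count}")
```

Listing the scores that nobody obtained (10, 11, and 12) makes the gap between the bulk of the group and the three highest scores easy to see.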
Sometimes there is a greater range of scores because there are more items on the test
and greater variability of performance among the test takers. In such a situation, it is easier
to make a frequency distribution by using a group or series of scores rather than using each
individual score. That grouping is called a class interval. For example, looking at the Slosson
Intelligence Test (SIT-R3) scores in Table 3.3, we notice that scores range from 70 to 120. To
list each individual score on a frequency distribution table would be much too cumbersome
with 51 different score values. Instead, let’s group the scores into a more manageable size—
say, 10 (the choice of grouping is arbitrary). To construct a table, we need to determine the
size of each of the 10 class intervals—that is, the number of score values to be included in
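each class interval. To find the class interval size, use the following formula:
\[ \text{class interval size} = \frac{\text{highest score} - \text{lowest score}}{\text{desired number of class intervals}} \]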
Putting this all together, the steps for determining the class intervals on the SIT-R3 include
the following:
Class Interval  Tally   Frequency
120–124         /       1
115–119         //      2
110–114         /////   5
105–109         //////  6
100–104         //      2
95–99           ///     3
90–94           //      2
85–89           //      2
80–84           ///     3
75–79           ///     3
70–74           /       1
As we can see from Table 3.5, the frequency distribution table lists each class interval arranged
from lowest to highest value followed by a tally of the scores that fall into each interval.
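The grouping of SIT-R3 scores can be sketched in code. The sketch assumes that the class interval size is the range divided by the desired number of intervals, (120 - 70) / 10 = 5, which agrees with the class interval of 5 used for these scores; the variable names are illustrative. Starting the first interval at the lowest score, the top score of 120 falls into an eleventh interval, 120–124.

```python
# SIT-R3 column of Table 3.3, youths 1 through 30.
sit_scores = [110, 120, 99, 95, 93, 112, 108, 82, 115, 70, 99, 83, 88, 75, 102,
              76, 85, 112, 79, 117, 113, 91, 107, 105, 109, 107, 83, 111, 109, 101]

desired_intervals = 10
lowest = min(sit_scores)
highest = max(sit_scores)

# Assumed rule: class interval size = range / desired number of intervals.
interval_size = (highest - lowest) // desired_intervals

# Map each score to the lower limit of its class interval and count it.
counts = {}
for score in sit_scores:
    lower = lowest + ((score - lowest) // interval_size) * interval_size
    counts[lower] = counts.get(lower, 0) + 1

# Print the intervals from highest to lowest with tallies and frequencies.
for lower in range(max(counts), lowest - 1, -interval_size):
    count = counts.get(lower, 0)
    print(f"{lower}–{lower + interval_size - 1}  {'/' * count:<6}  {count}")
```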
[Figures: histogram and frequency polygon of Slosson Intelligence Test (SIT-R3) scores. Horizontal axis: Scores, in class intervals from 70–74 to 120–124. Vertical axis: Frequencies.]
using class intervals from Table 3.4. The graph’s horizontal axis (also referred to as the
abscissa or the X axis) represents the class intervals. The vertical axis (also referred to as the
ordinate or the Y axis) represents the frequencies of scores appearing at each class interval.
The intersection of the X and Y axes represents the zero point for each.
A frequency polygon is a variation of a histogram in which the bars are replaced by
lines connecting the midpoint of each class interval. In the frequency distribution of scores
on the Slosson Intelligence Test, a class interval of 5 was used; therefore, the midpoint of
the class interval 70–74 is 72, the midpoint for 75–79 is 77, the midpoint for 80–84 is 82, and
so on. If the class interval consists of an odd number of score points, then the midpoint is
a whole number; if it contains an even number of points, then the midpoint is expressed as
a decimal number, such as 1.5. A frequency polygon of the Slosson Intelligence Test scores
is presented in Figure 3.2.
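The midpoint of a class interval is the average of its lower and upper limits. The following sketch, in which the function name `midpoint` is an illustrative assumption, computes the midpoints for the first few SIT-R3 intervals and for an interval with an even number of score points.

```python
def midpoint(lower, upper):
    """Return the midpoint of a class interval with inclusive limits."""
    return (lower + upper) / 2


# Intervals with an odd number of score points (5) have whole-number midpoints.
for lower, upper in [(70, 74), (75, 79), (80, 84)]:
    print(f"{lower}–{upper}: {midpoint(lower, upper)}")

# An interval with an even number of score points (here 2) has a decimal midpoint.
print(midpoint(1, 2))
```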
If there were many more individuals and a greater number of intervals, then the frequency
polygon would reflect a smoother curve. A smoothed frequency polygon (or a frequency curve)
gives a better idea of the shape of the distribution and the frequency of the
scores. If we made the intervals progressively smaller and increased the number of indi-
viduals tested on the Nowicki–Strickland Locus of Control Scale, then the smoothed fre-
quency polygon might look like that shown in Figure 3.3.
When we create smoothed frequency polygons, we will see that the distributions are
sometimes symmetrical (bell-shaped or normal curve) or asymmetrical (skewed; see Figure 3.4).
In a symmetrical distribution, each side of the curve is a mirror image of the other. This
shows that most of the scores are situated around the mean, with the rest tailing off sym-
metrically away from the mean. A frequency curve that is not symmetrical is called a
skewed curve. In everyday terms, skewed refers to something that is out of proportion or
distorted on one side. In frequency distributions, skewness indicates a lack of symmetry of
the distribution. A distribution with an asymmetrical “tail” extending out to the right is
referred to as positively skewed; this means that the majority of scores fell near the lower end
of the scale. A distribution with an asymmetrical tail extending out to the left is negatively
skewed. For example, if the majority of students scored poorly on a class midterm, then the
result would be a positively skewed curve with most of the scores concentrated on the left
end of the distribution. If most of the students performed well on the test, then the distri-
bution would be negatively skewed. A skewed curve is not a normal distribution; normal
distributions will be discussed later in the chapter.
Kurtosis is a statistic that reflects the peakedness or flatness of a distribution relative
to a normal distribution. A distribution could be mesokurtic, leptokurtic, or platykurtic
(see Figure 3.5). Kurtosis with a zero value signifies a mesokurtic distribution, which is similar
in height to a normal distribution. Positive values of kurtosis indicate a leptokurtic distribution
that is more peaked than the normal curve, meaning that scores cluster in the
middle of the distribution. Negative values indicate a platykurtic distribution, which is flat-
ter than the normal curve. This means that test scores spread out rather evenly from the
lowest to highest points.
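Skewness and kurtosis can be expressed as numbers. The text does not give formulas for them, so the sketch below assumes the common moment-based coefficients: the third central moment divided by the 1.5 power of the second for skewness, and the fourth central moment divided by the square of the second, minus 3, for kurtosis. The function names are illustrative. Under these assumptions, a positive skewness value corresponds to a tail extending to the right, and a kurtosis value of zero corresponds to a mesokurtic distribution.

```python
def central_moment(scores, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def skewness(scores):
    """Moment-based skewness (assumed formula, not taken from the text)."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3, so a normal curve scores zero."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


# LOC column of Table 3.3: most scores are low, with a few high scores.
loc_scores = [1, 3, 5, 3, 8, 3, 9, 8, 2, 14, 9, 5, 5, 15, 6,
              7, 3, 2, 13, 2, 4, 9, 6, 5, 4, 4, 7, 4, 4, 8]
print(skewness(loc_scores) > 0)       # tail extends to the right

# Evenly spread scores form a flat, platykurtic distribution.
flat_scores = [1, 2, 3, 4, 5]
print(excess_kurtosis(flat_scores) < 0)
```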
Mean The mean, the most frequently used measure of central tendency, is the arithmetic
average score in a distribution. It is used with interval or ratio scales and whenever addi-
tional statistical analysis is needed. To calculate the mean, we would total the test scores
and divide the sum by the number of individuals who took the test. The formula for the
mean is as follows:
\[ \bar{X} = \frac{\sum X}{N} \]
where
$\bar{X}$ = the mean
$\Sigma$ = sigma, which means summation
$X$ = a test score
$N$ = the number of cases
If we add together the scores of the 30 individuals taking the Nowicki–Strickland, our total
is 178. To compute the mean, we divide 178 by 30; therefore, the mean is 5.93.
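The same calculation can be written as a short sketch, assuming the scores are the LOC column of Table 3.3; the variable names are illustrative.

```python
# LOC column of Table 3.3, youths 1 through 30.
loc_scores = [1, 3, 5, 3, 8, 3, 9, 8, 2, 14, 9, 5, 5, 15, 6,
              7, 3, 2, 13, 2, 4, 9, 6, 5, 4, 4, 7, 4, 4, 8]

total = sum(loc_scores)   # sum of all test scores
n = len(loc_scores)       # number of cases
mean = total / n          # the mean is the sum divided by the number of cases

print(total, n, round(mean, 2))  # prints: 178 30 5.93
```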
Median The median is the middle score, or the score that divides a distribution in half; 50%
of the scores will fall below the median, and 50% will fall above. When a set of scores con-
sists of an odd number, the median is the middle score. For example, if the scores are 4, 6,
and 8, then the median is 6. With an even number of scores—such as 4, 6, 8, and 10—the
sum of the two middle scores is divided by 2; the median in this case is 7. As a measure of
central tendency, the median can be used with ordinal, interval, or ratio scales or with
highly skewed distributions.
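The rule for odd and even numbers of scores can be sketched as follows. The function name `median` is an illustrative assumption; Python's standard `statistics.median` function applies the same rule.

```python
def median(scores):
    """Return the middle score, averaging the two middle scores if needed."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the median is the middle score.
        return ordered[mid]
    # Even number of scores: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 6, 8]))      # prints: 6
print(median([4, 6, 8, 10]))  # prints: 7.0

# LOC column of Table 3.3, youths 1 through 30.
loc_scores = [1, 3, 5, 3, 8, 3, 9, 8, 2, 14, 9, 5, 5, 15, 6,
              7, 3, 2, 13, 2, 4, 9, 6, 5, 4, 4, 7, 4, 4, 8]
print(median(loc_scores))     # prints: 5.0
```

For the Nowicki–Strickland scores, the median of 5 falls below the mean of 5.93, because the few high scores pull the mean upward more than they affect the median.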
Mode The mode is the score or numerical value that appears most frequently in a set of
scores. Referring back to the test scores in Table 3.3, the mode of the scores on the
Nowicki–Strickland Locus of Control Scale is 4. For the Anger Control scale on the
Jesness Inventory, try to determine the mode yourself: Is the mode 9, 12, or 15? Some-
times, there is a bimodal or multimodal distribution; in other words, two or more scores
may appear most frequently in the distribution. For example, the distribution of scores
on the Jesness Friendliness scale is multimodal: The same number of individuals had
scores of 16, 18, and 19. The mode is commonly used with variables that are measured at
the nominal level, but it can also provide a quick and easy measure for ordinal, interval,
and ratio variables.
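The modes mentioned above can be found with a short sketch. It assumes Python 3.8 or later, where the standard `statistics.multimode` function returns every score tied for the highest frequency, and it takes the scores from the LOC and FR columns of Table 3.3.

```python
from statistics import multimode

# LOC and FR (Friendliness) columns of Table 3.3, youths 1 through 30.
loc_scores = [1, 3, 5, 3, 8, 3, 9, 8, 2, 14, 9, 5, 5, 15, 6,
              7, 3, 2, 13, 2, 4, 9, 6, 5, 4, 4, 7, 4, 4, 8]
fr_scores = [18, 19, 18, 19, 18, 19, 15, 11, 21, 17, 21, 15, 16, 18, 16,
             16, 19, 20, 16, 16, 22, 14, 14, 19, 15, 18, 17, 20, 13, 20]

# multimode returns a list, so a single mode appears as a one-item list.
loc_modes = sorted(multimode(loc_scores))
fr_modes = sorted(multimode(fr_scores))

print(loc_modes)  # prints: [4]
print(fr_modes)   # prints: [16, 18, 19]
```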
FIGURE 3.6 Measures of central tendency with skewed and normal curves.
Measures of Variability
Whereas measures of central tendency attempt to describe a distribution based on typical
or average performance of scores, variability refers to the degree that scores are spread out
and differ from one another. A distribution of test scores can have the same mean, but the
scores may be scattered or dispersed either widely or narrowly around the mean. Measures
of variability are statistics that describe the dispersion of scores—that is, the typical dis-
tances of scores from the mean. In general, when describing a distribution, we consider
both the central tendency and the variability of the distribution. Figure 3.7 graphically dis-
plays these two important characteristics. Three commonly used measures of variability
are the range, variance, and standard deviation. Of these, the standard deviation is per-
haps the most informative and the most widely used.
Range A quick measure of the spread of scores can be obtained by computing the range. The
range is computed by subtracting the lowest score from the highest score. In the following
formula, r represents range, h is the highest score in the dataset, and l is the lowest score:
r = h - l
For example, in the distribution of Nowicki–Strickland scores (Table 3.3), the high score
is 15 and the low score is 1, making the range 14 (15 - 1 = 14). On the Slosson Intelligence
Test score distribution, the high score is 120 and the low score is 70, making the range
50 (120 - 70 = 50).
The range is easy to compute. However, an extreme score in the distribution can
make the range a misleading statistic. For example, a test might produce these scores:
1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 40
The range here would be 39, even though all the scores except for the one score of 40 are
grouped closely together.
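The effect of a single extreme score on the range can be checked with a short Python sketch. It uses the thirteen scores listed above; the function name `score_range` is an illustrative choice and does not come from the text.

```python
def score_range(scores):
    """Range: the highest score minus the lowest score."""
    return max(scores) - min(scores)

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 40]

# Range with the extreme score of 40 included.
range_with_extreme = score_range(scores)

# Range after setting aside the single extreme score.
range_without_extreme = score_range([s for s in scores if s != 40])

print(range_with_extreme)     # 39
print(range_without_extreme)  # 5
```

Setting aside one score shrinks the range from 39 to 5, which is why the range alone can give a misleading picture of how tightly the scores cluster.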
Variance The variance is the average amount of variability in a group of scores. It is com-
puted as the average squared deviations of values from the mean. Variance is calculated
using the following formula:
$$ S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} $$
where
$S^2$ = the variance
$X$ = a test score
$\bar{X}$ = the mean of all the test scores
$\sum$ = summation
$N$ = the number of cases
Variance is rarely used as a standalone statistic. Rather, it is used as a step in the cal-
culation of other statistics (e.g., analysis of variance) or, with a slight manipulation,
transformed into the standard deviation.
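As a rough illustration, the following Python sketch computes the variance with the N - 1 denominator and then takes the square root to obtain the standard deviation. The five scores are hypothetical and were chosen only to keep the arithmetic simple; the function name `sample_variance` is likewise an assumption made for this example.

```python
import math
import statistics

def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)

# Hypothetical test scores, for illustration only.
scores = [2, 4, 4, 6, 9]

variance = sample_variance(scores)   # squared deviations sum to 28; 28 / 4 = 7.0
std_dev = math.sqrt(variance)        # standard deviation = square root of variance

# Cross-check against Python's standard library, which also divides by N - 1.
assert math.isclose(variance, statistics.variance(scores))

print(variance)            # 7.0
print(round(std_dev, 2))   # 2.65
```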
[Figure: Distribution A and Distribution B, each marked with its mean and 1 SD]
Standard Deviation The standard deviation is the most frequently used measure of vari-
ability. In practical terms, it is the average distance of test scores from the mean. It is an
important statistic for interpreting the relative position of an individual within a distribu-
tion of test scores. The wider the spread of scores from the mean, the larger the standard
deviation. The smaller the standard deviation, the more scores will tend to cluster around
the mean. To illustrate this, let’s look back at the scores on the Nowicki–Strickland scale
and the Slosson Intelligence Test. Because there is a small range of scores on the Nowicki–
Strickland, just 14 points rather than the 50 points of the Slosson, the standard deviation
on the Nowicki–Strickland will be smaller than the standard deviation on the Slosson.
Figure 3.8 presents a hypothetical illustration.
As a statistic, standard deviation (s) is calculated by taking the square root of the
variance, via the following formula:
$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} $$
Let’s use this formula for some hands-on experience calculating standard deviation.
Begin with the following students’ test scores:
Using these five test scores, work through the following step-by-step example of how
to compute standard deviation:
$$ s = \sqrt{144.5} = 12.02 $$
Thus, the standard deviation is 12.02, indicating that test scores deviate from the mean by
about 12.02 points on average. What does this mean? Because we know that the mean of test scores
is 80, the standard deviation tells us that the majority of test scores, those within 1
standard deviation below or above the mean, fall between 67.98 (80 - 12.02) and 92.02 (80 + 12.02).
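The arithmetic of this last step can be reproduced in a few lines of Python. This sketch starts from the value 144.5 under the square root and the mean of 80 given in the example; the variable names are illustrative.

```python
import math

variance = 144.5   # value under the square root in the worked example
mean = 80          # mean of the test scores

std_dev = round(math.sqrt(variance), 2)

# Scores within one standard deviation below and above the mean.
lower = round(mean - std_dev, 2)
upper = round(mean + std_dev, 2)

print(std_dev)        # 12.02
print(lower, upper)   # 67.98 92.02
```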
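[Figure: normal curve with the horizontal axis marked from −3 s to +3 s around the mean. Areas shown: 34% on each side of the mean out to 1 s, 13.5% between 1 s and 2 s on each side, and 2% between 2 s and 3 s on each side; 68% of scores fall within ±1 s, 95% within ±2 s, and 99.5% within ±3 s.]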
Measures of Relationship
Measures of central tendency and measures of variability are not the only descriptive sta-
tistics that can be used to get a picture of what test scores look like. Measures of relationship
can show the degree of relationship (or correlation) between two variables or scores. For
example, height and weight are related: Taller people tend to weigh more than shorter
people. However, the relationship between height and weight is not perfect: Some shorter
people can weigh more than some taller people and vice versa. Correlation can tell us just
how strongly pairs of variables relate.
Correlation is an important aspect of testing in the helping professions. For example,
we might want to compare the scores from an intellectual ability test and the scores from
an achievement test to see if any relationship exists between the two different results. If the
strength of the relationship is low, we would then begin to interpret the relationship and
attempt to understand the factors that are impacting the differences. Imagine that the intel-
ligence test scores are above average and the achievement scores are below average. This
might be an indication that a student is underperforming. However, we would need a
great deal of additional information before we could arrive at that determination. Another
possible reason could be poor instruction. We wouldn’t know the cause of the discrepancy
without a thorough investigation.
Another example of the use of correlation that you are likely familiar with is the rela-
tionship between admission tests and the potential to be successful in graduate school. The
correlation between GRE scores and first-year grades in graduate school is .3 without cor-
rection for range restriction, which means that the basic correlation is quite low. However,
the interpretation of this correlation is much more complex. Ahead, we provide a more
in-depth discussion of correlation coefficients and how they relate to assessment.
Correlation Coefficients To measure the correlation, we use a statistic known as the
correlation coefficient. There are two fundamental characteristics of correlation coefficients:
direction and strength. In terms of direction, correlation coefficients can be either positive or
negative. A positive correlation indicates that the two variables move in the same direction;
that is, when scores on one variable go up, scores on the other variable go up as well. In
contrast, a negative correlation means that the two variables move in the opposite, or inverse,
direction—when scores on one variable go up, scores on the other variable go down.
In regard to strength, correlation coefficients describe the magnitude of the relation-
ship between two variables. Correlation coefficients range in strength between -1.00 and
+1.00. Thus, a correlation coefficient of +1.00 is considered a perfect positive correlation,
meaning that higher values on one variable are directly related to higher values on the
second variable. In contrast, a coefficient of -1.00 is a perfect negative correlation, indicating
that higher values on one variable are associated with lower values on the second variable.
The closer the coefficient is to either of the two extremes, the stronger the degree of rela-
tionship between the two variables. A coefficient of 0 would indicate the complete absence
of a relationship. Most relationships will fall somewhere between a perfect association and
no association. It is important to remember that correlation is not the same as cause. We
cannot determine from an estimate of correlation that one variable causes the other; we can
simply recognize that an association between the two variables exists.
Different methods of statistical analyses are used to calculate correlation coefficients,
depending on the kinds of variables being studied. For this textbook, we will discuss the
Pearson Product Moment Correlation, Spearman’s rho, and the phi coefficient.
Pearson Product Moment Correlation. The Pearson Product Moment Correlation, repre-
sented by the small letter r, is used for measuring a linear relationship between two con-
tinuous variables—such as weight and intelligence in children. Continuous variables can
take on any value over a range of values. The example of weight and intelligence in children
is perfect to illustrate that correlation does not equal causation. Although a relationship
might exist between the two, there is no evidence to suggest that weight causes intelligence
or vice versa. Why might a relationship exist? There are a number of reasons that could be
explored, but the most plausible would be that another variable exists that is related to
both and that is impacting the relationship of weight and intelligence. Let’s imagine that
parental intelligence is the variable. The explanation might be that parents with high intel-
ligence produce children with high intelligence. These parents also have better under-
standing of nutrition, and their children have healthy weight levels. Conversely, parents
with less intelligence might produce children with less intelligence. These parents may
also have less understanding of nutrition and subsequently have children who are over-
weight. Although these explanations might seem plausible, they are spurious. While you
might be able to make a case for the connection between weight and intelligence, there are
many other important variables that may impact either or both variables. For example,
people with less intelligence may have lower income potential. As a result, these individu-
als have less money for food. Fast food and high fat foods are often cheaper than healthy
food choices. As a result, the families end up with higher overall weight levels. There are
numerous explanations for a relationship between weight and intelligence, none of which
indicate a causative factor.
To demonstrate Pearson’s r, let’s say we want to compare how a group of students
performed on a reading ability test (Test A) and a writing ability test (Test B). Let’s look at
the following student scores on Test A and Test B:
$$ r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}} $$
where
X = the individual’s score on the X variable (in this case, X is assigned to Test A)
Y = the individual’s score on the Y variable (in this case, Y is assigned to Test B)
Σ = summation
N = the number of paired scores
To calculate Pearson’s r, we take the following steps (Table 3.6 provides the data for
steps 1 to 3):
Student   X (Test A)   Y (Test B)   X²    Y²    XY
1         1            3            1     9     3
2         2            2            4     4     4
3         3            4            9     16    12
4         4            6            16    36    24
5         5            5            25    25    25
Sums:     15           20           55    90    68
Step 4 We plug the numbers into the formula and complete the calculation. Because we
have five sets of scores, N = 5.
$$ r = \frac{5(68) - (15)(20)}{\sqrt{[5(55) - (15)^2][5(90) - (20)^2]}} = \frac{340 - 300}{\sqrt{[275 - 225][450 - 400]}} = \frac{40}{50} = .80 $$
The correlation of .80 indicates a positive degree of association between the two tests.
This means that a relationship exists between reading and writing levels for the students
who took these tests. Now that we know the correlation, we can have a general idea of the
student’s ability in one area if we have data from the other. For example, if a student only
took the reading test and scored high, then we could estimate that they would also per-
form well on the writing test.
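The hand calculation can be checked with a short Python sketch that applies the same raw-score formula to the five pairs of scores from Table 3.6. The function and variable names are illustrative choices, not part of the text.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from the raw-score formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

test_a = [1, 2, 3, 4, 5]   # X: reading ability scores (Test A)
test_b = [3, 2, 4, 6, 5]   # Y: writing ability scores (Test B)

r = pearson_r(test_a, test_b)
print(round(r, 2))   # 0.8
```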
Spearman’s Rho. Spearman’s rho is a variant of Pearson’s r that is used for finding the
association between two ordinal (rank-ordered) variables. Denoted by the Greek letter ρ
(rho), it is a nonparametric measure of correlation; that is, it does not require linear rela-
tionships between variables or variables that are measured on interval scales. For example,
Spearman’s rho is frequently used for correlating responses from Likert scales (i.e., rating
scales running from never to always or strongly disagree to strongly agree). For example,
imagine that you provided a couple with a marital satisfaction instrument that contained
a Likert scale and asked both members of the couple to complete it. You would want to
know the relationship between both of their instruments. In other words, you would want
to know the level of agreement about their marital satisfaction. Because the instrument
measured responses in a Likert format, a Spearman’s rho would be the appropriate corre-
lation. Fortunately, many scoring programs for assessment instruments calculate level of
agreement for you.
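For readers curious about what such scoring programs do, one common way to obtain Spearman's rho is to convert each set of responses to ranks (tied values share the average of their ranks) and then compute Pearson's r on the ranks. The Python sketch below follows that approach. The two partners' item ratings are hypothetical, and all function names are assumptions made for this example.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r from deviations around each mean."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical Likert ratings (1 to 5) from two partners on five items.
partner_a = [5, 4, 2, 3, 1]
partner_b = [4, 5, 1, 3, 2]

rho = spearman_rho(partner_a, partner_b)
print(round(rho, 2))   # 0.8
```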
Phi Coefficient. The phi coefficient (φ) is used to assess the degree of association between
two nominal variables. For example, let’s say that a researcher is interested in studying the
relationship between gender and counseling group program completion. Gender is a nom-
inal scale variable with two categories: male and female. If group completion is also meas-
ured on a nominal scale using yes or no, then a phi coefficient would be the appropriate
statistic to assess the relationship between the two variables.
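As a minimal sketch of how phi is obtained, the Python example below applies the standard formula for a 2 x 2 table of counts: the difference of the cross-products divided by the square root of the product of the row and column totals. The counts are hypothetical and the function name is an assumption made for this example.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table of counts laid out as [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts, for illustration only:
#            completed   did not complete
# male          30             20
# female        40             10
phi = phi_coefficient(30, 20, 40, 10)
print(round(phi, 2))   # -0.22
```

Because the categories of a nominal variable have no inherent order, the sign of phi depends only on how the rows and columns happen to be arranged; the magnitude is what conveys the strength of the association.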
[Figure: scatterplot of Test B (vertical axis) against Test A (horizontal axis), with points (1, 3), (2, 2), (3, 4), (4, 6), and (5, 5)]
Regression Correlation is used to assess the size and direction of a relationship between
variables. Regression is a statistical method related to correlation but is primarily used for
prediction. Regression is the analysis of relationships among variables for the purpose of
understanding how one variable may predict another. In the helping professions, we often
use regression to determine whether changes in test scores are related to changes in
performance. For example, do individuals who score better on the Graduate Record Examinations
(GRE) perform better in graduate school? Can certain personality tests predict job
performance? Can IQ measured during high school predict success in later life? Regression
is a common statistical analysis used in counseling research, and it is important for you to
understand the basic principles of regression so that you can apply them to practice.
Simple linear regression is a form of regression analysis used to predict the value of one
variable (the dependent or outcome variable) based on the value of another variable (the
independent or predictor variable). It involves fitting a straight line through a set of points on a
scatterplot so that the deviations of the points from the line are as small as possible (see
Figure 3.12). This line, called the regression line or the line of best fit, best represents the
association between the two variables and illustrates the linear relationship between them. The
deviation of the points from the line is called error. Once the equation is determined, you can
predict the dependent variable (on the Y axis) based on the value of the independent variable (on
the X axis). The equation for regression is the same as the algebraic equation of a straight line:
Y = a + bX, where a is the Y intercept and b is the slope of the line.
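To make the equation concrete, the Python sketch below fits a line to the Test A and Test B scores from the correlation example, treating the reading score (Test A) as the predictor. It assumes the usual least-squares formulas for the slope and intercept, which the text does not spell out; the function name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

test_a = [1, 2, 3, 4, 5]   # X: independent (predictor) variable
test_b = [3, 2, 4, 6, 5]   # Y: dependent (outcome) variable

a, b = fit_line(test_a, test_b)
print(round(a, 2), round(b, 2))   # 1.6 0.8

# Predicted Test B score for a student who scores 6 on Test A.
predicted = a + b * 6
print(round(predicted, 2))   # 6.4
```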
Multiple regression is the same as simple linear regression, except that it attempts to
predict the dependent variable from two or more independent variables. For example, let’s say
that a researcher wants to predict outcomes in counseling from three variables: age, gen-
der, and scores from the MMPI-2. The researcher would use multiple regression analysis
to find the linear combination of the three variables that best predicts counseling success.
The formula for multiple regression is an extension of the simple regression formula:
$$ Y = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n $$
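Once the intercept and the weights have been estimated, using the equation is simple arithmetic. The Python sketch below only evaluates the equation for one person; it does not estimate the weights. The intercept, weights, and predictor values are all hypothetical, and the function name `predict` is an assumption made for this example.

```python
def predict(intercept, slopes, predictors):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... + bn*Xn for one case."""
    if len(slopes) != len(predictors):
        raise ValueError("need exactly one slope per predictor")
    return intercept + sum(b * x for b, x in zip(slopes, predictors))

# Hypothetical values, for illustration only (not estimates from real data).
a = 10.0
slopes = [0.5, 2.0, -0.25]   # b1, b2, b3
predictors = [30, 1, 60]     # X1, X2, X3 for one client

y_hat = predict(a, slopes, predictors)
print(y_hat)   # 12.0
```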
Factor Analysis The objective of multiple regression is to analyze the relationship among
variables to predict a single dependent variable by a set of independent variables. Factor
analysis also analyzes the relationship among variables, but not for purposes of predicting
a dependent variable. The principal objective of factor analysis is to simplify the descrip-
tion of data by reducing the number of necessary variables. Thus, it is considered a data-
reduction technique used to analyze the relationships among large numbers of variables to
explain the variables by their common underlying dimensions (i.e., factors). The dimensions
that are generated from the process are much fewer in number and are thought to be rep-
resentative of the original variables.
Factor analysis has been used to construct and refine personality tests by condensing
numerous personality traits into fewer dimensions of personality that can then be meas-
ured and evaluated. For example, Cattell, Eber, and Tatsuoka (1988) developed the Sixteen
Personality Factor Questionnaire (16PF), which consists of 16 primary factors of personal-
ity that were derived from 18,000 personality trait names using factor analysis. The factor
analytic statistical process that resulted in 16 distinct personality factors from 18,000 traits
is quite complex, but it is important to have a basic understanding of the process, because
many assessment instruments are developed using this technique.
In order to illustrate the concept of factor analysis, think about coin-sorting machines
that are available in grocery stores and other public venues. You can take a jar of coins to
the machine, dump the coins in, and receive a cash payout in paper money. The machine
is designed to weigh the coins and to count the iterations of each coin that weighs the same
amount. For example, the machine determines that a quarter is a different weight from a
dime. Once the machine weighs the coins, it provides a value based on each weight and
produces a final amount. A very similar process is used for factor analysis. As you might
guess, when examining 18,000 personality traits, there are many that are of the same
weight statistically. In other words, the traits measured very similar (if not the same)
concepts. Traits like friendly and gregarious might have overlapped extensively. As a result,
the factor analysis process categorizes these as having the same weight. After all 18,000
traits were examined, the result was only 16 distinct personality factors.
Of course, factor analysis is more complex than the illustration provided. The preceding
example can help you, as a consumer of statistics, to understand how assessment
instruments are developed. However, the example does not equip you to judge
whether the statistic was used correctly or whether the interpretation was appropriate.
Statistical techniques like factor analysis require training beyond the scope of this textbook
to use competently. If you are interested, there are books that discuss factor analysis in
greater detail (see Aron, Aron, & Coups, 2012; Cudeck & MacCallum, 2007; Lomax, 2012).
Summary
To be competent in assessment, counselors need to understand the statistical concepts
pertinent to understanding, interpreting, and communicating assessment results. Statistics
provide test developers and users with a way to summarize, describe, and interpret assessment
data. Statistics also help counselors to evaluate the psychometric properties (e.g., reliability,
validity, standardization) of tests, which is important for evaluating and selecting tests. Any
discussion of statistics begins with an understanding of the four scales of measurement:
nominal, ordinal, interval, and ratio. The type of scale of measurement and the size of the
sample being tested influence what statistics are computed. Descriptive statistics are widely
used in the assessment field to describe and summarize a set of data. There are several
different types of descriptive statistics, such as frequency distributions, measures of central
tendency, measures of variability, and measures of relationship. Inferential statistics are
used to draw inferences about the population being studied.
Suggested Activities
1. Study a test manual, and identify the different types of statistics reported. Make an
annotated set of cards with definitions and illustrations of the statistics identified.
2. Compute the mean, mode, and median of the Anger Control and Calmness scores that are
included in Table 3.3.
3. Compute the standard deviations and range of the Anger Control and Calmness scores that
are included in Table 3.3.
4. Compute a Pearson correlation between the Anger Control and Calmness scores that are
included in Table 3.3.
5. Construct a frequency distribution of the Sociability or Conformity scale reported in
Table 3.3. Then, make a histogram and frequency polygon of the scores.
References
Aron, A., Aron, E. N., & Coups, E. (2012). Statistics for psychology (6th ed.). Upper Saddle
River, NJ: Merrill/Prentice Hall.
Cattell, R. B., Eber, H. W., & Tatsuoka, M. M. (1988). Handbook for the Sixteen Personality
Factor Questionnaire (16PF). Champaign, IL: Institute for Personality and Ability Testing.
Cudeck, R., & MacCallum, R. C. (Eds.). (2007). Factor analysis at 100: Historical developments
and future directions. Mahwah, NJ: Lawrence Erlbaum Associates.
Lomax, R. G. (2012). Statistical concepts: A second course (4th ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
National Institute of Mental Health. (2014). Major depression among adults. Retrieved from
http://www.nimh.nih.gov/statistics
Rotter, J. B. (1966). Generalized expectancies for internal versus external control of
reinforcement. Psychological Monographs, 80, 609.
Urbina, S. (2014). Essentials of psychological testing (2nd ed.). Hoboken, NJ: John Wiley & Sons.
CHAPTER 4
Understanding Assessment Scores
Because assessment is an integral part of the counseling process, counselors must be able
to interpret the results of tests, rating scales, structured interviews, and various other
instruments and then be able to relay the information from that interpretation in an under-
standable way. For example, a client might complete a depression inventory and obtain a
score of 88. The counselor would have to understand that score in relationship to the norm
group and then be able to explain the score in relation to depression. As another example,
a counselor might need to explain the scores of a career interest inventory to a client and
explain how the client can use the scores to make career decisions. In yet another example,
a school counselor might interpret and explain the scores of an achievement test to stu-
dents or parents. Scores can be expressed in different forms. Sometimes, the meaning of
the various types of scores is confusing to clients, because they have little exposure to the
terminology used in measurement. Therefore, it is crucial that counselors have knowledge
about the different methods of scoring and ways of communicating those scores in a man-
ner that clients can understand.
• Describe the types of standard scores and their relationship to the normal curve.
• Describe grade and age equivalent scores and explain their limitations.
ASSESSMENT SCORES
Because scores reflect the performance of the individual taking an assessment instrument,
having a clear understanding of scores and their meaning is extremely important. Imagine
that you receive a grade of 60 for a midterm exam in one of your university classes. What
does the score mean, and how should you interpret it? By itself, the number has no meaning
Chapter 4 • Understanding Assessment Scores 71
at all and cannot be interpreted: You cannot determine if it is a perfect score, a failing score,
or anything in between. This would be considered a raw score, which simply represents
the number of correctly answered questions, with responses coded in some specific manner, such as
correct/incorrect, true/false, and so on. By itself, a raw score does not convey any mean-
ing. A score has meaning only when there is some standard with which to compare it; thus,
part of the process of understanding scores is transforming a raw score into some mean-
ingful form. These transformed scores are crucial in helping us interpret test scores. In most
cases, scores can be divided into two general classes: norm-referenced scores and criterion-
referenced scores. This classification provides us with a frame of reference to interpret the
meaning of a given score. With norm-referenced scores, an individual’s test score is com-
pared to the scores of other people who took the same test (i.e., a norm group). With crite-
rion-referenced scores, the individual’s score is measured against a specified level of
performance (i.e., a criterion). We will discuss these concepts further, beginning with crite-
rion-referenced score interpretation.
CRITERION-REFERENCED SCORES
Many tests measure how well individuals master specific skills or meet instructional objec-
tives, and in these situations, test results are interpreted using criterion-referenced scores.
Criterion-referenced score interpretation emphasizes the use of some criterion or standard of
performance (e.g., instructional objectives or competencies) to interpret an examinee’s test
results. For example, most tests and quizzes written by school teachers are criterion-refer-
enced tests. The objective is simply to determine whether or not the student has learned
the class material; thus, knowledge of class material becomes the standard of performance
by which a student’s test results are evaluated. Because the emphasis is on the test taker’s
achievement level or mastery, criterion-referenced test scores are interpreted in absolute
terms, such as percentages, scale scores, and performance categories. Typically, percentages
are used to reflect the number of correct responses (e.g., the student correctly
answered 80% of the items on a test). Scale scores are usually three-digit scores that have
been converted from raw scores. The meaning of a scale score varies depending on the test;
however, lower scores typically indicate the ability to do easier work, and higher scores
typically indicate the ability to do more difficult work. Performance categories (also called
achievement levels or proficiency levels) describe an individual’s performance by sorting or
classifying scores into categories based on the quality of performance, such as failing,
basic, proficient, and advanced, or Level 1, Level 2, and so on. Performance categories are
actually specified ranges of a particular score, such as scale scores. For example, the Florida
state achievement test—which measures student performance in reading, math, science,
and writing—has five clearly defined proficiency levels that represent a scale score rang-
ing from 100 to 500. On the reading test, Level 1 consists of scale scores that range from 100
to 258, Level 2 scores range from 259 to 283, Level 3 from 284 to 331, Level 4 from 332 to
393, and Level 5 from 394 to 500.
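The way a performance category is read off a scale score can be sketched in Python. The score ranges below are the reading-test ranges quoted above; the function name and the use of the standard-library `bisect` module are illustrative choices, not part of any official scoring procedure.

```python
import bisect

# Lowest scale score of Levels 2 through 5 on the reading test, as quoted in the text.
LEVEL_FLOORS = [259, 284, 332, 394]

def reading_level(scale_score):
    """Return the proficiency level (1 to 5) for a reading scale score."""
    if not 100 <= scale_score <= 500:
        raise ValueError("scale scores range from 100 to 500")
    # Count how many level floors the score has reached, then add 1 for Level 1.
    return bisect.bisect_right(LEVEL_FLOORS, scale_score) + 1

print(reading_level(258))   # 1
print(reading_level(259))   # 2
print(reading_level(300))   # 3
print(reading_level(394))   # 5
```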
Frequently, criterion-referenced scores involve the use of a cutoff score (i.e., the pass-
ing score), which represents the minimum number of items the examinee must answer
correctly in order to demonstrate mastery. An example of the use of cutoff scores is the
achievement testing of students as a result of the No Child Left Behind legislation. In this
case, the federal government requires students to achieve a minimum proficiency level on
statewide exams (e.g., 75% of items correct in reading) to prove that they have mastered
certain basic skills. Criterion-referenced instruments are also commonly used for profes-
sional licensure or certification tests in which a cutoff score qualifies an individual for the
job or for appropriate licensure or certification. In these situations, the cutoff score may be
set by using the Angoff method (Angoff, 1971), which is a procedure that requires a panel of
experts to judge each test item based on the probability or likelihood that a candidate with
minimal competency of the material would answer the question correctly. Analysis of the
probability input from the experts is used to generate a cutoff score for the test.
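One common way to carry out this analysis is to average the experts' probability judgments for each item and then add the item averages; the total is the number of items a minimally competent candidate would be expected to answer correctly. The Python sketch below assumes that simple version of the procedure. The ratings are hypothetical, and the function name is an assumption made for this example.

```python
def angoff_cutoff(ratings_by_judge):
    """Sum, over items, of the mean probability assigned by the judges."""
    n_judges = len(ratings_by_judge)
    item_ratings = zip(*ratings_by_judge)   # regroup the ratings item by item
    return sum(sum(item) / n_judges for item in item_ratings)

# Hypothetical ratings: three judges, four items. Each value is a judge's estimate
# of the probability that a minimally competent candidate answers the item correctly.
ratings = [
    [0.9, 0.6, 0.7, 0.5],   # judge 1
    [0.8, 0.7, 0.6, 0.4],   # judge 2
    [1.0, 0.5, 0.8, 0.6],   # judge 3
]

cutoff = angoff_cutoff(ratings)
print(round(cutoff, 2))   # 2.7
```

With these illustrative ratings, the cutoff would be about 2.7 of the 4 items; in practice the panel decides how a fractional value is rounded.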
An important consideration with criterion-referenced scores involves the content
domain (i.e., the criterion). Because a score is thought to reflect an individual’s knowledge
of a specific content domain, that domain must be clearly defined. In education, states and
school districts have set standards or competencies for students to meet—for example,
mastery of essential or basic skills. Criterion-referenced tests are commonly used to evalu-
ate mastery of basic skills in classrooms, and the criteria reflect the educational content that
is presented through various learning activities. An example of a well-defined content
domain for second-grade math students is, “Students should understand the relative size
of whole numbers between 0 and 1,000.”
NORM-REFERENCED SCORES
With norm-referenced interpretation, we do not look at a test taker’s mastery of a particular
content area; rather, we compare an individual’s test scores to the test scores of a group of
people (Table 4.1 summarizes the differences between norm-referenced and criterion-
referenced scores). Norm-referenced score interpretation is used when one wishes to make
comparisons across large numbers of individuals on a particular domain being measured.
For example, it might be reported that a high school student scored higher on an achieve-
ment test than 95% of a national sample of high school students who took the same test.
Thus, with norm-referenced tests, individual performance is determined by relative rank-
ing among the group of individuals who took the test. The group scores to which each
individual is compared are referred to as norms, which provide the basis for interpreting
test scores. As you recall from the previous section, there are only a few different types of
criterion-referenced scores (e.g., percentages, scale scores, performance categories).
Comparatively speaking, the various types of norm-referenced scores substantially outnumber
the types of scores used with criterion-referenced interpretation. The numerous and diverse
types of norm-referenced scores will be presented later in the chapter.
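The idea of relative ranking can be illustrated with a small Python sketch that reports the percentage of a norm group scoring below a given score. The norm group here is hypothetical (20 scores from 41 to 60), and the function name is an assumption made for this example.

```python
def percent_scoring_below(score, norm_scores):
    """Percentage of the norm group whose scores fall below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

# Hypothetical norm group: 20 test takers with scores 41, 42, ..., 60.
norm_group = list(range(41, 61))

print(percent_scoring_below(60, norm_group))   # 95.0
print(percent_scoring_below(51, norm_group))   # 50.0
```

The same raw score of 60 would carry a different meaning against a different norm group, which is why the composition of that group matters so much.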
Probably the most important issue in norm-referenced score interpretation involves
the relevance of the group of individuals against whom the test taker’s score is weighed.
This group, called the norm group, provides the standard against which a particular test taker's performance is compared.
Norm-referenced | How well the test taker does in comparison with other individuals (a norm group). | The norm group must be clearly defined.
Criterion-referenced | What the test taker knows or does not know with respect to a specified content domain. | The content domain (i.e., the criterion) must be clearly defined.
Norm Group
The norm group (also called the normative group, normative sample, norming sample, or stand-
ardization sample) refers to the large group of individuals who took the test and on whom
the test was standardized. It is the norm group’s performance to which the examinee’s
performance is compared and interpreted, and for interpretations to be meaningful, the
norm group must be relevant. For example, on measures of intelligence, test results from
10-year-old children must be compared to results from other 10-year-old children, not to
6-year-old children or 13-year-old children. To determine if the norm group of a particular
instrument is relevant, counselors evaluate whether the norm group is representative, is
current, and has an adequate sample size.
Representativeness refers to the extent to which the characteristics of the norm group
correspond to the individual being tested. Norm groups typically include individuals who
represent the age and demographic characteristics of those for whom the test is intended.
For example, if a test is used to assess the math skills of U.S. middle school students in
grades 6 to 8, then the test’s norm group should represent the national population of sixth-,
seventh-, and eighth-grade students in all pertinent areas. To ensure that a norm group is
representative, many test developers select groups from the general population using
stratified sampling procedures. Stratified sampling involves choosing norm groups based
on certain important variables; in the United States, many psychological and educational
tests use national norms that are stratified in accordance with U.S. Census Bureau popula-
tion statistics, which include such variables as gender, age, education, ethnicity, socioeco-
nomic background, region of residence, and community size. For example, the Kaufman
Brief Intelligence Test (KBIT-2) was standardized using a stratified normative sample composed
of 2,120 individuals (ages 4 to 25) that closely resembled the U.S. population based
on the 2001 U.S. Census Bureau Survey (Kaufman & Kaufman, 2004). The sample matched
such demographic variables as education level, parental education, race, ethnicity, and
geographic region. Furthermore, in 2001, the Census Bureau reported that 15% of 11-year-
old children in the United States were Hispanic; thus, 15% of the 11-year-old children
within KBIT-2’s norm group were Hispanic.
Another issue in determining the relevance of norm groups involves whether the
sample is current. Over the last few decades, the demographic composition of the U.S.
population has changed dramatically; for example, minority populations have grown in number.
These newer demographics may impact test scores that are interpreted using older,
outdated normative samples. For example, the U.S. population is now over 17% Hispanic
(United States Census Bureau, 2014). Thus, the normative sample of the KBIT-2 is out-of-
date. This does not mean that the instrument is no longer useful, but it does mean that the
results should be interpreted with caution and that further research is needed to ensure
that the scores are accurate reflections when testing Hispanic individuals.
Other norm group variables that fluctuate over time include language, attitudes, and
values; even intelligence has been shown to change over time, as evidenced by the steady
increase in intelligence test scores over the past century (read about the Flynn Effect in
Chapter 8). Overall, it is generally preferable to use tests with the most up-to-date norm
74 Chapter 4 • Understanding Assessment Scores
groups, because these test results will be the most representative of the performance of the
test takers. Although no specific guidelines are available to determine whether a norma-
tive group is too old, it is reasonable to expect that instruments will be revised at least
every 10 years and that new norms will accompany the revisions (Thorndike, 2011).
The size of the norm group can vary tremendously depending on the type of test and
population that the norm group represents. For example, broadband achievement tests
developed for use with K–12 students throughout the United States will have norm
groups that number in the thousands. The Basic Achievement Skills Inventory (BASI;
Bardos, 2004) was standardized on over 2,400 students (for Form A), the Wide Range
Achievement Test (WRAT4; Wilkinson & Robertson, 2006) had a norm group of over
3,000 individuals, and the Stanford Achievement Test (Stanford 10; Pearson Inc., 2004)
had a national normative sample of 275,000 students. Instruments with a more narrow
focus that require samples from specialized populations (e.g., individuals from certain
occupational groups, having specific psychiatric disorders, with learning disabilities)
typically have smaller normative samples. For example, the normative sample of the Beck
Depression Inventory (BDI-II; Beck, Steer, & Brown, 1996) consisted of 500 individuals
who were diagnosed with various psychiatric disorders, and the Posttraumatic Stress
Diagnostic Scale (PDS; Foa, 1995) had a normative sample of 248 individuals who were
receiving treatment for Posttraumatic Stress Disorder. The size of a norm group holds
importance for a couple of reasons: First, the larger the normative sample, the more likely
it is to represent the population; second, a large norm group increases the stability and
accuracy of statistical analysis.
Because the relevance of the norm group is essential when using norm-referenced
tests, test publishers are responsible for providing detailed information about the norm
group in the test manual. Counselors are also obligated to evaluate the norm group and
determine whether it is suitable for comparison with their client’s scores. Table 4.2 lists
questions needed to evaluate a norm group.
PERCENTILE RANKS Percentile ranks are the most common form of score that expresses the
examinee’s relative position on a norm-referenced test. A percentile rank represents the
percentage of a distribution of scores that falls below a particular test score. For example,
if a test taker has a percentile rank of 80, then he or she has performed better
than 80% of the individuals who took the test. Percentile ranks range from the 1st percentile
(lowest) to the 99th percentile (highest), and the 50th percentile normally signifies the
average ranking or average performance. The formula for calculating the percentile rank
for any raw score is as follows:
$$PR = \frac{B}{N} \times 100$$
where
PR = percentile rank
B = the number of cases with lower values
N = the total number of cases
To calculate a percentile rank, suppose that a math teacher had 10 raw scores on
an exam ranging in value from 10 to 20 (see Table 4.3). From the table, you can see that
Denzel achieved a raw score of 17 on the test. To calculate the percentile rank, count the
number of cases that fall below this score of 17; in this case, seven scores have lower values.
Then, calculate the rest of the formula:
$$PR = \frac{7}{10} \times 100 = 70$$
Denzel’s score is at the 70th percentile, meaning that he scored better than 70%
of the students who took the math test.

Table 4.3
      Student     Raw Score
 1    Michelle    20
 2    Carlos      18
 3    Denzel      17
 4    Halle       16
 5    Dante       16
 6    Corrine     15
 7    Maria       15
 8    Grant       14
 9    Monica      13
10    Joe         10

Percentile ranks should not be confused with percentages. Although the two terms
are similar, they are different concepts conveying different meanings. Table 4.4 describes
the differences between these terms.

Percentages
• A form of raw score that reflects the number of correct responses obtained out of the total
possible number of correct responses on a test.
• Symbol for percentage score: %.
• Example: Shari scored 70% on her test. This means that she answered 70% of the test items correctly.
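The percentile rank formula (PR = B / N × 100) can also be sketched in a few lines of Python. This is only an illustration: the function name `percentile_rank` and the dictionary `table_4_3` are hypothetical names chosen here, and the scores are the ten raw scores listed in Table 4.3.

```python
def percentile_rank(score, scores):
    """Percentage of cases in `scores` with values lower than `score`."""
    below = sum(1 for s in scores if s < score)  # B: number of cases with lower values
    return below * 100 / len(scores)             # N: total number of cases


# Raw scores from Table 4.3
table_4_3 = {
    "Michelle": 20, "Carlos": 18, "Denzel": 17, "Halle": 16, "Dante": 16,
    "Corrine": 15, "Maria": 15, "Grant": 14, "Monica": 13, "Joe": 10,
}

all_scores = list(table_4_3.values())
print(percentile_rank(table_4_3["Denzel"], all_scores))  # 70.0
```

Because B counts only cases with strictly lower values, tied scores (such as Halle and Dante) receive the same percentile rank.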
Quartiles are another term used in percentile measurement. Quartiles are similar to
percentile ranks, except that they divide the data set into four equal parts instead of 100
equal parts. The first quartile (also called the lower quartile) is the same as the 25th percen-
tile, the second quartile (median quartile) is the 50th percentile, and the third quartile
(upper quartile) is the 75th percentile. The distance between the first quartile and the third
quartile is called the interquartile range and contains 50% of all values in the distribution.
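As a rough illustration, quartiles and the interquartile range can be computed with Python's standard `statistics` module. The data are the raw scores from Table 4.3; note that software packages use slightly different interpolation rules, so the exact cut points for a small data set may differ from values obtained by hand or with another tool.

```python
import statistics

# Raw scores from Table 4.3
scores = [10, 13, 14, 15, 15, 16, 16, 17, 18, 20]

# With n=4, quantiles() returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# The interquartile range is the distance between the first and third quartiles.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```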
Many standardized tests report a percentile band rather than a single percentile rank.
A percentile band is the range of percentile values in which a test taker’s “true” percentile
rank would lie. Usually, the band extends one standard error of measurement above and
below the obtained percentile rank. For example, if a test taker had a percentile rank of 35
and the standard error of measurement was 5, then the percentile band would range from
the 30th percentile to the 40th percentile.
Although percentile ranks are the most common way to compare the relative ranking
of an individual’s test score, percentiles are not an equal-interval scale of measurement.
This means that percentile ranks are not equally spaced across all parts of a distribution:
They are compressed near the middle of the distribution (where the majority of scores fall)
and spread out near the tails (where there are fewer scores; see Figure 4.1). This implies
that small differences in percentile ranks near the middle of the distribution may be less
important than differences at the extremes. In addition, without an equal-interval scale,
they do not accurately reflect the differences between scores; for example, the difference
between the 20th and 30th percentiles is not the same as the difference between the 50th and
60th percentiles. Furthermore, because most meaningful mathematical calculations require
measurement on equal-interval scales, percentiles cannot be added or averaged.
[Figure 4.1: normal curve showing percentile ranks (1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99) plotted against standard deviation units from −3 s to +3 s]
STANDARD SCORES Standard scores are a means of presenting the relative position of an indi-
vidual’s test score, assuming a normal distribution. Standard scores are linear transformations
of raw scores. This means that raw scores are converted into a particular standard score using
certain mathematical equations; as such, standard scores retain a direct relationship with the
raw score. There are several types of standard scores: z scores, T scores, deviation IQs, stan-
ines, sten scores, and other standard score scales. Each type of standard score has a set mean
and standard deviation. By using standard deviation units, standard scores reflect the dis-
tance that an individual’s test score falls above or below the mean of the distribution of scores.
Thus, the standard deviation of the test becomes its yardstick. These scores are called “stand-
ard” because no matter what the distribution size or the scale of measurement, standard
scores will always be the same, with a fixed mean and fixed standard deviation (Figure 4.2
shows the various standard scores and their relationship to the normal curve). Remember that
standard scores assume a normal distribution of scores; however, many variables measured
in assessment are not distributed normally. For example, measures of certain psychological
symptoms (e.g., depression, anxiety) may deviate substantially from the normal distribution.
In these situations, test developers may elect to use normalized standard scores. Normalized
standard scores are not derived from the linear transformations of raw scores; rather, they are
nonlinearly derived scores that are computed by (a) finding the percentile rank of each raw
score, (b) transforming the percentiles into z scores using a conversion table, and (c) trans-
forming the z score into any other standard score desired. In most situations, normalized
standard scores will be interpreted similarly to other standard scores obtained by linear trans-
formations (e.g., z scores, T scores, deviation IQs); they will have the same mean and standard
deviation as their counterparts. However, they should be identified as normalized standard
scores to alert the test user to the fact that they come from a distribution that was not normal.
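The three steps for normalized standard scores, (a) through (c), can be sketched as follows. This is a minimal illustration, not a test publisher's procedure: the function name `normalized_t_scores` is hypothetical, the inverse normal function from Python's `statistics.NormalDist` stands in for the printed conversion table, and the mid-rank percentile (half of the tied cases counted as below) is an assumption made here so that no score maps to exactly 0% or 100%.

```python
from statistics import NormalDist


def normalized_t_scores(raw_scores):
    """Map each distinct raw score to a normalized T score (mean 50, SD 10)."""
    n = len(raw_scores)
    result = {}
    for x in sorted(set(raw_scores)):
        below = sum(1 for s in raw_scores if s < x)
        ties = sum(1 for s in raw_scores if s == x)
        # (a) percentile rank as a proportion (assumed mid-rank convention)
        proportion = (below + 0.5 * ties) / n
        # (b) percentile -> z score (replaces the conversion table lookup)
        z = NormalDist().inv_cdf(proportion)
        # (c) z score -> any other standard score; here, a T score
        result[x] = round(10 * z + 50)
    return result


# Hypothetical, positively skewed symptom scores
symptom_scores = [0, 0, 0, 1, 1, 2, 3, 5, 9, 14]
print(normalized_t_scores(symptom_scores))
```

Because the conversion goes through percentile ranks, the resulting scores depend only on each person's rank in the group, which is why the transformation is nonlinear.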
[Figure 4.2: the normal curve (34% of scores between the mean and 1 s on either side; 13.5% between 1 s and 2 s on either side) and the corresponding standard scores]
Standard deviations   −3 s   −2 s   −1 s   Mean   +1 s   +2 s   +3 s
z score               −3     −2     −1     0      +1     +2     +3
T score               20     30     40     50     60     70     80
Deviation IQ          55     70     85     100    115    130    145
SAT/GRE               200    300    400    500    600    700    800
Stanine               1 2 3 4 5 6 7 8 9
Sten score            1 2 3 4 5 6 7 8 9 10
z Scores. z scores are the simplest form of standard scores. A z score conveys the value
of a score in terms of how many standard deviations it is above or below the mean of the
distribution. The mean for a distribution of z scores is 0, and the standard deviation is 1.0.
The range of z scores is approximately from -3.0 to +3.0; thus, scores that fall below the
mean will have negative z scores and those that fall above the mean will have positive
z scores. The z score is computed using the following formula:
$$z = \frac{X - \bar{X}}{s}$$
where
$z$ = the z score
$X$ = the raw score of any individual taking the test
$\bar{X}$ = the mean of the raw scores
$s$ = the standard deviation of the raw scores
For example, if we know that the mean of a test’s raw scores was 80 and the standard
deviation was 10 and that an individual scored 100 on the test, then we have the data to
compute the z score:
$$z = \frac{100 - 80}{10} = \frac{20}{10} = +2.00$$
z scores tell us instantly how large or small an individual score is relative to other scores
in the distribution. Thus, the z score above tells us that the person scored 2 standard devia-
tions above the mean on the exam. Table 4.5 provides more examples of transforming raw
scores into z scores. Because standard scores are linear units, they can be transformed in a
number of ways without having the properties of the original raw score distribution changed.
For example, z scores can be converted into T scores, as described in the next section.
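For readers who want to check such calculations, the z score formula can be written as a short Python function. The name `z_score` is a hypothetical choice for this sketch; the values are those of the worked example (mean of 80, standard deviation of 10).

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that `raw` lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


print(z_score(100, 80, 10))  # 2.0
print(z_score(70, 80, 10))   # -1.0
```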
T Scores. A T score is another commonly used standard score that has a fixed mean of
50 and a fixed standard deviation of 10. Using a predetermined mean and standard devia-
tion eliminates the need for decimals and negative values; thus, T scores will always be
positive whole numbers. The formula is as follows:
T = 10z + 50
The fixed mean of 50 becomes a constant that is added to each score; the fixed standard
deviation of 10 becomes a constant multiplied by each z score. Thus, if we know the z score,
then we can calculate the T score. For example, if a test’s raw score was converted into a
z score, which came to +2.00, then the T score would be computed as follows:
T = 10(2.00) + 50
= 20 + 50
= 70
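The same conversion can be sketched in Python. The function name `t_score` is hypothetical, and rounding to a whole number is an assumption made here to reflect the statement that T scores are reported as positive whole numbers.

```python
def t_score(z):
    """Convert a z score to a T score (mean 50, SD 10), rounded to a whole number."""
    return round(10 * z + 50)


print(t_score(2.00))   # 70
print(t_score(-1.00))  # 40
```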
T scores are very popular in assessment and are often used to report personality test
results (see Table 4.6 for examples of assessment instruments and resulting scores). An
example is the MMPI-2, which uses uniform T scores. The average score on an MMPI-2
scale is 50. Scores on the MMPI-2 are not considered clinically significant until they are 1.5
standard deviations above the mean (i.e., 65 and above). When an individual obtains a
T score of 65, this means that the score is equal to or higher than 91% of the population.
When using this lens, the score is clearly different from the rest of the norm group. A
T score of 70 is only five points higher than 65 but means that the score is actually equal to
or higher than 98% of the population. Although a difference of 5 points might appear
minimal upon initial inspection, it actually distinguishes the individual from the rest of the
normative group in a significant manner.

Step 1.
$$z = \frac{X - \bar{X}}{S} = \frac{95 - 80}{10} = \frac{15}{10} = +1.5$$
where
$z$ = the z score
$X$ = the raw score of any individual taking the test
$\bar{X}$ = the mean of the raw scores
$S$ = the standard deviation of the raw scores
Step 2. Convert the z score to any standard score as follows:
$$X' = S'(z) + \bar{X}' = 100(1.5) + 500 = 650$$
Deviation IQs. The IQ, or intelligence quotient, was originally conceptualized as a
ratio of mental age to chronological age. However, various problems with calculating IQ
in this way were found. For example, it was assumed that mental age was directly corre-
lated with chronological age over time (i.e., mental age increased as chronological age
increased), but in fact mental age increases very slowly after an individual reaches the
chronological age of 16 and then eventually begins to decline. Thus, most intelligence tests
today no longer use the mental age/chronological age formula to compute intelligence;
they use deviation IQ scores instead. Deviation IQ scores have a mean of 100 and a standard
deviation of 15. However, the standard deviations may vary depending on the test; for
example, the Cognitive Abilities Test (CogAT) uses a standard deviation of 16. In general,
most of the major intelligence tests use 15 as the fixed standard deviation.
CEEB Scores (SAT/GRE). CEEB scores are standard scores that were developed by the
College Entrance Examination Board (now called the College Board)
and used with tests including the SAT and the GRE. CEEB scores range from 200 to 800
and have a mean of 500 and a standard deviation of 100.
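Because T scores, deviation IQs, and CEEB scores are all linear transformations of the z score, one small function can produce any of them. The sketch below is illustrative only: the names `SCALES`, `z_to_standard`, and `raw_to_standard` are hypothetical, and the means and standard deviations are the ones given in this chapter.

```python
# (mean, standard deviation) of each standard score scale, as given in the text
SCALES = {
    "T score": (50, 10),
    "deviation IQ": (100, 15),
    "CEEB": (500, 100),
}


def z_to_standard(z, scale):
    """New score = (SD of the new scale) * z + (mean of the new scale)."""
    new_mean, new_sd = SCALES[scale]
    return new_sd * z + new_mean


def raw_to_standard(raw, raw_mean, raw_sd, scale):
    """Two steps: raw score -> z score -> chosen standard score."""
    z = (raw - raw_mean) / raw_sd
    return z_to_standard(z, scale)


# A raw score of 95 on a test with a mean of 80 and a standard deviation of 10
print(raw_to_standard(95, 80, 10, "CEEB"))          # 650.0
print(raw_to_standard(95, 80, 10, "deviation IQ"))  # 122.5
```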
Stanines. Stanines (short for STAndard NINE) are a type of standard score that converts
raw scores into values ranging from 1 to 9, making it possible to translate various kinds of
information into one-digit scores. Stanines have a mean of 5 and a standard deviation of 2,
and they have a constant relationship to percentiles in that they represent a specific range
of percentile scores in the normal curve; that is, a given percentile always falls within the
same stanine. Because they categorize test performance into only nine broad units, stan-
ines provide less detail about an examinee’s performance than other derived scores. In
general, stanines of 1 to 3 are considered below average, stanines of 4 to 6 represent aver-
age performance, and stanines of 7 to 9 are above average. A stanine-to-percentile-rank
conversion table is presented in Table 4.6. Stanines are widely used in education.
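The link between percentile ranks and stanines can be approximated in code. The sketch below assumes normally distributed scores and treats each stanine as a band one-half standard deviation wide centered on the scale values 1 through 9 (consistent with a mean of 5 and a standard deviation of 2). The function name `stanine_from_percentile` is hypothetical, and a publisher's conversion table takes precedence wherever it differs, especially for percentile ranks that fall near a band boundary.

```python
import math
from statistics import NormalDist


def stanine_from_percentile(pr):
    """Approximate stanine for a percentile rank between 1 and 99."""
    z = NormalDist().inv_cdf(pr / 100)        # percentile rank -> z score
    nearest = math.floor(2 * z + 5 + 0.5)     # rescale to mean 5, SD 2; round to nearest unit
    return min(9, max(1, nearest))            # stanines are limited to 1 through 9


for pr in (1, 35, 50, 73, 99):
    print(pr, stanine_from_percentile(pr))
```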
Sten Scores. Sten scores (short for Standard TEN) are normalized standard scores simi-
lar to stanines, but stens range from 1 to 10, have a mean of 5.5, and have a standard
deviation of 2. Sten scores of 4 to 7 are considered average, stens of 1 to 3 fall in the low
range, and stens of 8 to 10 fall in the high range. A sten-to-percentile-rank conversion table
is presented in Table 4.7.
Normal Curve Equivalent. The normal curve equivalent (NCE) is a normalized stand-
ard score that ranges from 1 to 99, with a mean of 50 and a standard deviation of 21.06.
Used primarily in education, an NCE score of 50 represents the national average of any
grade level at the time of year the test was normed. NCE scores can be used to compute
group statistics, compare the performance of students who take different levels of the same
test, compare student performance across subject matter, and evaluate gains over time. To
interpret NCEs, it is necessary to relate them to other standard scores, such as percentile
ranks or stanines. NCEs may be thought of as roughly equivalent to stanines to one deci-
mal place; for example, an NCE of 73 may be interpreted as a stanine of 7.3. To evaluate a
student’s gains over time, NCE scores should be interpreted as follows: a zero gain indi-
cates that a student made one year of academic progress after one year of instruction, a
positive gain indicates the student made more than one year’s progress, and a negative
gain indicates less than one year’s progress. The main advantage of NCEs is that they are
derived through the use of comparable procedures by the publishers of the various tests
used in federal (i.e., U.S. Department of Education) research projects.
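As an illustration of how NCEs relate to percentile ranks, the sketch below converts a percentile rank to a z score under the normal curve and then rescales it to a mean of 50 and a standard deviation of 21.06. The function names `nce_from_percentile` and `nce_gain` are hypothetical, and the approach assumes normally distributed scores; published norms tables remain the authoritative source for any particular test.

```python
from statistics import NormalDist


def nce_from_percentile(pr):
    """Approximate normal curve equivalent for a percentile rank between 1 and 99."""
    z = NormalDist().inv_cdf(pr / 100)  # percentile rank -> z score
    return 50 + 21.06 * z               # rescale to mean 50, SD 21.06


def nce_gain(nce_earlier, nce_later):
    """Zero: one year's progress; positive: more than one year; negative: less."""
    return nce_later - nce_earlier


print(round(nce_from_percentile(59), 1))  # 54.8
print(round(nce_from_percentile(39), 1))  # 44.1
```

Unlike percentile ranks, NCEs are on an equal-interval scale, which is why they can be averaged and subtracted to describe group performance and gains.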
GRADE AND AGE EQUIVALENTS In addition to percentiles and standard scores, other norm-
referenced scores are used for interpreting test performance. We will discuss two scales
that implicitly compare the test taker’s raw score to the average raw score of people at
various developmental levels: age equivalents and grade equivalents. Although we pre-
sent these forms of test scores in this text, both types have many limitations that make
them subject to misinterpretation. Thus, we do not recommend their use as primary scores.
If you are required to use age or grade equivalents, never base interpretations on them
alone: Always include standard scores and percentile ranks when interpreting test scores.
[…] well-suited as standard scores for comparing different scores on the same test or for
comparing scores on different tests.
Counselors should consider the following points about grade equivalent scores:
1. Grade equivalent scores are not an estimate of the grade at which the student should
be placed.
2. The grade equivalent score of one student cannot be compared to the grade equiva-
lent score of another student.
3. Grade equivalent scores cannot be compared with the scores on other tests.
4. Grade equivalent scores on different subtests of the same test cannot be compared. For
example, grade equivalent scores of 4.0 on the reading and math subtests of the same
achievement test do not indicate that the student has equal proficiency in the two areas.
Age Equivalents. Age equivalent scores represent an examinee’s test performance in
terms of the age at which the “average” individual’s performance matches that of the
examinee. For example, an age equivalent score of 9-3 indicates that a person’s perfor-
mance is similar to the average performance of children 9 years and 3 months old. Although
age equivalents can be used to describe an examinee’s performance in comparison to the
typical performance of test takers of various ages, they have the same limitations as grade
equivalents. In addition, although age equivalents have meaning when the behaviors
being measured have a direct, linear relationship with age, the rate of growth for most
behaviors varies from year to year. Thus, an even progression is not always expressed;
there may be rapid growth during some periods and a plateau or no growth during others.
QUALITATIVE ASSESSMENT
So far, we have presented information about standard scores, focusing primarily on quantita-
tive test scores. However, counselors often collect assessment data that does not lend itself to
quantitative evaluation. Qualitative assessment methods involve interpretive criteria and
descriptions of data. Although our focus in this chapter is on assessment scores, it is impor-
tant to lend attention to alternate types of data produced during the assessment process.
There are many types of qualitative assessment data that can be collected during the assess-
ment process. According to Losardo and Syverson (2011), there are at least nine categories of
qualitative assessment, including naturalistic, ecological, focused, performance, portfolio,
dynamic, authentic, Functional Behavior Analysis (FBA), and informed clinical opinion.
In order to assess the performance of students in a counselor education program, it is
important to examine both quantitative and qualitative data. From a quantitative perspective,
test scores and grades can provide a great deal of insight into learning outcome attainment.
However, examination of student portfolios that include key projects, resumes, field experience
forms, and other materials helps to provide a richer understanding of student performance.
Qualitative data has some limitations on its own, but it can be very useful in combination with
quantitative data. We will discuss qualitative assessment processes in more detail in Chapter 14.
Although quantitative and qualitative assessment can be separate processes, test
developers often provide qualitative descriptors in addition to the assignment of numbers to
test scores. Qualitative descriptors help professionals communicate test results verbally or
in written reports by providing easy-to-understand classifications based on particular
score ranges (Reynolds, Livingston, & Willson, 2008). Qualitative descriptors are widely
used in intelligence and achievement tests. For example, the Wechsler intelligence tests, a
IQ               Classification
130 and above    Very Superior
120–129          Superior
110–119          High Average
90–109           Average
80–89            Low Average
70–79            Borderline
69 and below     Extremely Low
A similar approach is used with many of the typical-performance tests (i.e., personal-
ity inventories). Many of these instruments use descriptors such as nonclinical (or normal),
borderline clinical range, and clinical range. In general, scores in the clinical range fall well
above the mean scores, thus indicating the presence of severe psychological symptoms.
Other classification systems divide qualitative descriptors into severe, moderate, and mild
categories. An example of this is the BDI-II (Beck, Steer, & Brown, 1996):
Exercise 4.1
Interpreting Test Scores: Marisol
Marisol, age 11, is in the fifth grade. The following table contains her scores on the
BASI, an achievement test battery that measures math, reading, and language
skills for children and adults. It provides scores on three composites and six
subtests. The composites are Reading Total, Written Language Total, and Math
Total. The subtests are Vocabulary, Reading Comprehension, Spelling, Language
Mechanics, Math Computation, and Math Application.
Note: SS = Standard Score (mean = 100, SD = 15); %ile = Percentile Rank; GE = Grade Equivalent;
AE = Age Equivalent
1. Interpret the standard scores on the Reading Total, Written Language Total, and Math Total.
2. Interpret the percentile ranks on the Reading Total, Written Language Total, and Math Total.
3. Interpret the stanine scores on the Reading Total, Written Language Total, and Math Total.
4. Interpret the grade equivalent scores on the Vocabulary and Math Computation subtests.
5. Interpret the age equivalent score on the Reading Comprehension test and the Math
Application subtests.
6. Explain the meaning of the confidence intervals for Reading Total, Written Language
Total, and Math Total.
deviation of 15), percentile ranks, growth scale values, grade equivalents, and age equiva-
lents. Results may also include confidence intervals, which indicate the range of standard
scores that likely includes the test taker’s true score (see Chapter 5).
Profiles also help in interpreting results by providing a visual representation of scores.
Guidelines for interpreting charts and profiles include the following:
1. Small differences should not be overinterpreted.
2. The normal curve can be used as a frame of reference to interpret standard scores.
3. The standard error of measurement should be used in interpretation.
4. Patterns in the shape of a profile are important (are the scores all high or low?).
[Figure: sample profile chart with scale labels Professional, Business, Communication, Clerical, Consumer, Service, Outdoor, Science, Arts, Skilled, Technology, and Economics]
depressed. As you can see, it is just not logical to interpret the score in this way. Criterion-
referenced interpretation lends itself better to educational achievement tests or other tests
designed to measure one’s skills or abilities.
Another reason for choosing either norm-referenced or criterion-referenced interpre-
tation involves the comprehensiveness or breadth of the construct being measured. Some
constructs that assess a broad range of variables, such as intelligence or personality, lend
themselves best to norm-referenced interpretations (Reynolds et al., 2008). In contrast,
criterion-referenced interpretation is often applied to more narrowly defined constructs,
such as educational tests that assess mastery of a single subject area, like math or spelling.
Even the construct of achievement, which is commonly interpreted with criterion-refer-
enced scores, is often better suited for norm-referenced interpretations when achievement
tests cover more extensive knowledge and skill domains.
A comparison of the various types of scores is presented in Table 4.9.
Summary
This chapter presented information about understanding test scores. For counselors to
understand and interpret assessment results, raw scores need to be transformed using some
frame of reference to give meaning to test results. In general, scores can be divided into
two broad classifications: criterion-referenced scores and norm-referenced scores. Interpreting
test scores can involve relating the score to a performance level (criterion referenced) or
comparing it to the scores of a norm group (norm referenced). Standard scores provide a basis to
interpret scores by describing how many standard deviations an individual’s score is from the
mean. There are numerous types of standard scores, including z scores, T scores, deviation
IQs, stanines, sten scores, and other standard score scales.
Suggested Activities
1. Write a position paper on one of the following topics: bias in test scoring, use of grade
equivalents, norm-referenced versus criterion-referenced interpretation, or selecting
appropriate norms.
2. Explain the differences among raw scores, standard scores, and normalized standard scores.
3. Select any five students from the distribution of test scores in Table 4.3, and convert
their raw scores into percentiles.
4. Jeffrey, a college senior, earned a raw score of 52 on a chemical engineering midterm exam.
Assuming that the mean of scores was 40 and the standard deviation was 12, convert Jeffrey’s
raw score into a T score.
5. A parent has brought you the following Stanford Achievement Test results for her child,
who is in fourth grade:

Subtests          Scaled Score   Percentile   NCE    Stanine   Grade Equivalent
Reading           639            59           54.8   5         5.4
Mathematics       633            57           57.5   5         5.5
Language          610            39           44.1   4         3.5
Spelling          647            73           62.9   6         6.4
Science           643            69           60.4   6         6.3
Social Science    607            40           44.7   4         3.5
Listening         608            35           41.9   4         3.4
Thinking Skills   623            56           53.2   5         5.1

The parent doesn’t understand the types of scores and the results and wants you to help her
interpret them. What would you tell her?
References
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508–600). Washington, DC: American Council on Education.
Bardos, A. N. (2004). Basic Achievement Skills Inventory (BASI)–comprehensive. Minneapolis, MN: Pearson Assessments.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). BDI-II manual. San Antonio, TX: Psychological Corporation.
Foa, E. B. (1995). The Posttraumatic Diagnostic Scale manual. Minneapolis, MN: Pearson Assessments.
Kaufman, A. S., & Kaufman, N. L. (2004). KBIT-2: Kaufman Brief Intelligence Test (2nd ed.). Circle Pines, MN: AGS.
Losardo, A., & Syverson, A. N. (2011). Alternative approaches to assessing young children. Baltimore, MD: Brookes Publishing.
Pearson Inc. (2004). Stanford Achievement Test series, tenth edition technical data report. San Antonio, TX: Author.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2008). Measurement and assessment in education (2nd ed.). Boston, MA: Pearson.
Thorndike, R. M. (2011). Measurement and evaluation in psychology and education (8th ed.). Upper Saddle River, NJ: Pearson Education.
United States Census Bureau. (2014). USA quick facts. Retrieved from http://quickfacts.census.gov/qfd/states/00000.html
Wilkinson, G. S., & Robertson, G. J. (2006). Wide Range Achievement Test 4 professional manual. Lutz, FL: Psychological Assessment Resources.
CHAPTER 5
Reliability/Precision
92 Chapter 5 • Reliability/Precision
• Describe the factors that should be considered when evaluating the magnitude of
reliability coefficients.
• Define standard error of measurement and explain its relationship to reliability.
• Explain how confidence intervals are calculated and what they mean in psychological
Reliability
Reliability/precision is one of the most important characteristics of assessment results.
Because many important decisions about individuals are based, wholly or in part, on
instrument scores, we need to make sure that the scores are reliable. In the context
of measurement, reliability/precision refers to the degree to which test scores are dependa-
ble, consistent, and stable across items of a test, across different forms of the test, or across
repeat administrations of the test. For example, if we administered a test to an individual
on two separate occasions, how consistent would the two scores be? Say that a counselor
administered the Wide Range Achievement Test (WRAT4) to a client on Monday morning
and discovered that the scores indicated average achievement. If the counselor adminis-
tered the WRAT4 to the same client on Monday afternoon, would the client score the same
way the second time as he did the first time? If the client felt more fatigued in the afternoon
than in the morning, might that affect his score? What if the client remembered some
of the test questions from the first testing? Would that affect how he performed on the test
the second time? Instead of administering the test twice, what if the counselor used two
separate forms of the test—that is, form A and form B? Would the client perform better on
one test form than the other? What if the counselor asked another counselor to help score
the WRAT4? Would the client receive the same score no matter which counselor scored the
instrument?
As you can see from these examples, numerous factors can affect reliability/precision
and whether a test produces consistent scores. To correctly interpret the results of a test,
we need to have evidence that the scores from the test would be stable and consistent if the
test were repeated on the same individual. In other words, we need to know the degree to
which test scores are reliable.
When referring to reliability/precision in relation to assessment, consider the follow-
ing points:
1. Reliability/precision refers to the results obtained with an assessment instrument,
not the instrument itself (Miller, Linn, & Gronlund, 2012).
2. An estimate of reliability/precision always refers to a specific type of reliability/precision.
The reliability/precision of instrument scores may be based on certain intervals of
time, the items on the instrument, different raters, and so on. Instrument developers
provide estimates of the specific types of reliability/precision, depending on the use
of test results.
3. Scores on assessment instruments are rarely totally consistent or error free. In fact, all
instruments are subject to some degree of error and fluctuation (Urbina, 2014). Therefore,
counselors must evaluate the evidence to determine whether an instrument’s results
have an adequate degree of reliability/precision.
Measurement Error
A critical issue in assessment is the amount of measurement error in any instrument. We
know that the same individual tested on different occasions would score differently, even on
the same test. This difference is equated with measurement error. In testing, measurement error
may be defined as any fluctuation in scores that results from factors related to the measure-
ment process that are irrelevant to what is being measured (Urbina, 2014). These fluctuations
could result from factors unique to the test taker, flaws in testing procedures, or simply from
random chance. For example, the test taker may guess better during one test administration
than during another, know more of the content on one test form than another, or be less
fatigued or less anxious during one time than another. Alternately, the items on a test may not
adequately represent the content domain being tested, or test takers may get different test
results depending on the person grading the test. These are all factors that can cause measure-
ment errors—that is, changes in an individual’s scores from one occasion to the next. The
greater the amount of measurement error on test scores, the lower the reliability/precision.
Measurement error may also be understood as the difference between an individu-
al’s observed (or obtained) score and his or her true score. The true score is considered the
true, 100% accurate reflection of one’s ability, skills, or knowledge (i.e., the score that
would be obtained if there were no errors; Boyd, Lankford, Loeb, & Wyckoff, 2013). The
observed score is the actual score a test taker received on a test. If a person could be tested
repeatedly (without carryover or practice effects), the average of the obtained scores would
approximate the true score. The observed score is made up of two components: the true
score and measurement error. This can be represented in a very simple equation:
$$\text{Observed Score} = \text{True Score} + \text{Measurement Error}$$
This formula demonstrates that measurement error may raise or lower the observed score.
As stated earlier, error represents any other factor that randomly affects the measurement
process; therefore, the less error, the more reliable the observed score.
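The relationship among observed score, true score, and error described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the true score of 100 and the error standard deviation of 5 are arbitrary values, and `simulate_observed_scores` is a hypothetical helper written for this example, not part of any assessment software.

```python
import random
import statistics

def simulate_observed_scores(true_score, error_sd, n_testings, seed=0):
    """Return observed scores, each equal to the true score plus random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_testings)]

# Assumed values, chosen for illustration only.
TRUE_SCORE = 100
ERROR_SD = 5

observed = simulate_observed_scores(TRUE_SCORE, ERROR_SD, n_testings=10_000)
mean_observed = statistics.mean(observed)

# Any single observed score may fall above or below the true score,
# but the average over many testings lands close to it.
print("first five observed scores:", [round(x, 1) for x in observed[:5]])
print(f"mean of observed scores: {mean_observed:.2f}")
```

In practice a person cannot be retested thousands of times without carryover or practice effects, which is why the true score remains a theoretical quantity.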
The concept of true scores is completely theoretical; we can only hypothesize what a
true score might be. However, we can estimate the score range, or at least the boundaries,
within which a true score may fall. We do this by calculating the standard error of measure-
ment (SEM), which is a simple measure of how much an individual’s test score varies from
the true score. SEM will be discussed later in the chapter.
Time-Sampling Error Time-sampling error is associated with the fluctuation in test scores
obtained from repeated testing of the same individual. With time-sampling error, it is
assumed that whatever construct is being measured may vacillate over time. In other words,
how an individual performs on an intelligence test today will most likely have a degree of
difference from how he or she performs next time on the same test. How large the differ-
ence is between the intelligence test scores is important. Because of time-sampling error,
we might expect that the scores will differ (either positively or negatively) by only a couple
of points. If an individual scored 100 (50th percentile) one week and then 115 (84th percen-
tile) the next week, we would be quite suspicious of the score differential. Intelligence is
not a construct that changes dramatically.
Constructs differ in their susceptibility to time-sampling error; for example, most
constructs related to personality traits (e.g., introvert/extrovert) and abilities (e.g., verbal
comprehension, spatial reasoning) are usually less prone to fluctuation over time (Urbina,
2014). In contrast, some constructs associated with emotional states (e.g., depression, anxi-
ety) and achievement are more susceptible to the influence of transient conditions and can
vary widely over time. As a rule, we assume that time-sampling error enters into the scores
on tests that are given repeatedly; however, one should expect less time-sampling error in
tests that measure relatively stable traits (Urbina, 2014).
Several specific issues associated with time-sampling error are related to the length
of the time interval between test administrations. If the time interval is too short (e.g., a day
or two), then there is an increased risk of a carryover effect, which occurs when the first test-
ing session influences the scores on the second session (Kaplan & Saccuzzo, 2013). For
example, test takers may remember their answers from the first test administration, thus
inflating their scores on the second test administration. Similar to this is the practice effect.
Because some skills improve with practice, when an instrument is given a second time, test
takers’ scores may increase because they have sharpened their skills by having taken the
test the first time. If the length of time between tests is too long, then we may be faced with
test results confounded with learning, maturation (i.e., changes in the test takers themselves
that occur over time), or other intervening experiences (e.g., treatment).
interrelationships among test items and components (e.g., scales, subtests). Methods for
estimating content-sampling error will be discussed later in this chapter.
Interrater Differences Instruments involving observation require the use of raters (i.e.,
observers, scorers, judges) to record and score the behaviors of others. When instrument
results rely heavily on the subjective judgment of the individual scoring the test, it is
important to consider differences in scorers as a potential source of error. For example, say
that two raters are recording the number of times a third-grade student is out of seat in
class. Without a definition of out of seat, it is difficult to accurately observe the behavior. For
example, one rater might classify out of seat as having no contact with the seat and thus
record every occurrence when the student leaves full contact with his seat; another observer
might classify out of seat as any time the student’s bottom is out of contact with the desk.
The difference in definitions could lead to differences in agreement about the out-of-seat
behavior. This lack of agreement results in discrepancies between the student’s true score
and his observed score as recorded by the raters. It is assumed that different raters will not
always assign the same scores or ratings to a given test performance, even if the scoring
directions specified in the test manual are explicit and the raters are conscientious in apply-
ing those directions (Urbina, 2014). To assess interrater differences, we need to estimate the
reliability/precision of the raters, which is usually referred to as interrater reliability (also
called interscorer reliability). This will be discussed later in the chapter.
Other Sources of Error Content-sampling error and time-sampling error account for
the major proportion of measurement error in all types of instruments, and interrater dif-
ferences are associated with error in instruments involving behavior observations. In
addition to these, other sources may contribute to the random error in test scores, includ-
ing the following:
• Quality of Test Items Quality of test items refers to how well test items are con-
structed. If items are clear and focused, then they will provide reliable information
about the construct being measured. If items are vague and ambiguous, then they are
open to many interpretations, which may confuse test takers and result in unreliable
test scores.
• Test Length In general, as the number of items on any test increases, the test repre-
sents the content domain being measured more accurately. Thus, the greater the
number of items, the greater the reliability/precision.
• Test-Taker Variables Certain test-taker variables can affect reliability/precision by
influencing the individual’s performance on a test. Motivation, fatigue, illness, phys-
ical discomfort, and mood can all be sources of error variance.
• Test Administration Sources of measurement error that occur during test adminis-
tration may interfere with an individual’s test results, such as the examiner not fol-
lowing specified administration instructions, room temperature, lighting, noise, and
critical incidents occurring during test administration.
instruments are available for review (e.g., Mental Measurements Database, test publishers,
Association for Assessment and Research in Counseling [AARC] test reviews) when you
are selecting an instrument, it is important to understand the manner in which reliability/
precision is calculated so that you can evaluate the reported reliability coefficients. In many
cases, test developers and researchers estimate reliability/precision using more than one
method. The format and nature of the test will dictate the methods of reliability/precision
that are applicable.
As stated earlier, reliability/precision reflects the consistency, dependability, and
reproducibility of scores upon additional test administrations—and those additional test
administrations can take place at different times, involve the use of different forms of the
same test, or some combination of the former and latter. The methods most often used to
estimate reliability/precision involve correlating two sets of test scores to obtain a reliabil-
ity coefficient, which is the same as any correlation coefficient, except that it specifically
represents the reliability/precision of test scores. Conceptually, reliability/precision can
be interpreted as the amount of the variability of the observed scores that is explained by
the variability of the true scores; thus, the reliability/precision coefficient is the ratio of the
true score variance to the observed score variance:
$$r = \frac{s_T^2}{s_X^2}$$
where
$r$ = the reliability coefficient
$s_T^2$ = the variance of the true scores
$s_X^2$ = the variance of the observed scores
In this formula, the variance of true scores and observed scores is used because we are look-
ing at the score variability of a group of examinees. Thus, reliability coefficients always
relate to a group of test scores, not to an individual score. The closer the reliability coefficient
is to 1.00, the more the observed score approximates the true score, and the better the
reliability/precision. In other words, the higher the reliability coefficient, the more that
variation in test scores is due to actual differences among the test takers in whatever char-
acteristic or trait is being measured. The closer reliability coefficients are to 0, the more that
test scores represent random error, not actual test-taker performance.
The reliability coefficient can also be thought of as a percentage. If we subtract the
reliability coefficient from 1.00, we have the percentage of the observed variation that is
attributable to random chance or error:
Error = 1 - r
For example, suppose that college students must take an admissions test to be eligible to
enroll in a particular academic program, and the reliability coefficient of the test is .35. This
means that 35% of the variation in scores is explained by real differences, and 65% of the
differences must be attributed to random chance or error, because 1 - .35 = .65. How
likely would you be to accept an admission decision based on this test score? If a test of this
nature were the only criterion for admission, then you would want to have a high degree of
certainty that differences in scores of applicants are due to different ability rather than error.
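The variance-ratio definition of the reliability coefficient, and the relationship Error = 1 - r, can be illustrated with simulated data. The following is a minimal sketch under stated assumptions: true scores are never known in practice, so the group below (true-score standard deviation of 15, error standard deviation of 5) is invented purely for the example, and `reliability_from_components` is a hypothetical helper name.

```python
import random
import statistics

def reliability_from_components(true_scores, errors):
    """Ratio of true-score variance to observed-score variance."""
    observed = [t + e for t, e in zip(true_scores, errors)]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Assumed group of 5,000 examinees: true-score SD of 15, error SD of 5.
true_scores = [rng.gauss(100, 15) for _ in range(5000)]
errors = [rng.gauss(0, 5) for _ in range(5000)]

r = reliability_from_components(true_scores, errors)
error_share = 1 - r  # proportion of observed variance attributable to error

# With these assumed SDs the theoretical ratio is 225 / 250 = .90.
print(f"reliability is about {r:.2f}; error share is about {error_share:.2f}")
```

Because the coefficient is computed from the variances of a whole set of scores, it describes the group of scores rather than any single examinee's score.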
| Method | Procedure | Coefficient | Sources of Error |
|---|---|---|---|
| Test-Retest | Same test given twice with a time interval between testings | Stability | Time sampling |
| Alternate Forms: Simultaneous Administration | Equivalent tests given at the same time | Equivalence | Content sampling |
| Alternate Forms: Delayed Administration | Equivalent tests given with a time interval between testings | Stability and equivalence | Time sampling and content sampling |
| Internal Consistency: Split-Half | One test is divided into two comparable halves, and both halves are given during one testing session | Equivalence and internal consistency | Content sampling |
| Internal Consistency: Kuder–Richardson Formulas (KR) and Coefficient Alpha | One test given at one time (items compared to other items or to the whole test) | Internal consistency | Content sampling |
| Interrater | One test given, and two individuals independently score the test | Interrater agreement | Interrater differences |
Test-Retest
The test-retest method is one of the oldest and perhaps one of the most commonly used
methods of estimating the reliability/precision of test scores. Reliability/precision is con-
cerned with the consistency of test scores, and the test-retest method directly assesses the
consistency of test takers’ scores from one test administration to the next (Murphy &
Davidshofer, 2005). An estimate of test-retest reliability is relatively easy to calculate: sim-
ply administer the same instrument on two separate occasions and then correlate the first
set of scores with the second. This correlation is called a coefficient of stability, because it
reflects the stability of test scores over time.
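The test-retest procedure just described (administer twice, then correlate) can be sketched in a few lines of Python. This is an illustration only: the scores for six test takers are hypothetical, and `pearson_r` is a helper written for this example that computes an ordinary Pearson correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same six test takers on two occasions.
first_testing = [98, 105, 110, 87, 120, 101]
second_testing = [100, 103, 112, 90, 118, 99]

coefficient_of_stability = pearson_r(first_testing, second_testing)
print(f"coefficient of stability: {coefficient_of_stability:.2f}")
```

The same correlation approach applies whenever two sets of scores from the same people are compared; what changes is the source of the second set of scores (a later occasion, an alternate form, or a second rater).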
Because test-retest reliability estimates error related to time sampling, the time inter-
val between the two test administrations must be specified, because it will affect the stabil-
ity of the scores. Although there is no fixed time interval that can be recommended for all
tests, we know that if the interval is very short, then such factors as carryover effect and
practice effect could influence scores on the second test administration. On the other hand,
if the interval is very long, then learning, maturation, or intervening experiences may
affect the scores on the second occasion (Resch et al., 2013).
The test-retest method is most useful when measuring traits, abilities, or character-
istics that are stable and do not change over time. For example, one would not expect
adult intelligence test scores to change between two separate test administrations,
because intellectual ability typically remains stable through adulthood (assuming that
no unusual circumstances, such as brain injury, affect it). As such, we would expect test-
retest reliability for intelligence test scores to be high; specifically, there would be a
strong correlation between the results of the first test administration and the results of
the second test administration. Instruments that measure variables that are transient and
constantly changing are not appropriate for test-retest evaluation. For example, a mood
state (e.g., an individual’s level of mood at an exact moment in time) can fluctuate from
day to day, hour to hour, and even moment to moment (Furr & Bacharach, 2013). In this
case, the scores on a particular mood-state inventory would not be stable during any
significant time interval. Thus, test-retest reliability on the inventory scores would be
low. This could be mistakenly interpreted as the inventory’s scores not being reliable
when, in reality, the reliability coefficient is reflecting the individual’s change in mood
across the two testing occasions.
Thus, when reviewing test-retest reliability in an instrument manual, it is important
to consider two things: (1) the length of the time interval between test administrations and
(2) the type of construct being tested. Examples of test-retest reliability as reported in test
manuals include the following:
• On the Multidimensional Self-Esteem Inventory (MSEI), the test-retest coefficient on
the Global Self-Esteem scale was .87 over a 1-month interval.
• A 3-month retest of a subset of 208 college students on the five domains of the NEO
Personality Inventory-Revised (NEO PI-R) resulted in coefficients of .79, .79, .80, .75,
and .83 for neuroticism, extraversion, openness, agreeableness, and conscientiousness,
respectively.
• The test-retest coefficients on the Vocational Preference Inventory (VPI) for a group
of 100 females on the Realistic scale were .79 over 2 weeks, .57 over 2 months, .86 over
1 year, and .58 over 4 years.
Alternate Form
Alternate form reliability, also called parallel forms reliability, helps us determine whether
two equivalent forms of the same test are really equivalent. The two forms use different
items, but the items come from the same content domain. In addition, the forms have
the same number of items, use the same type of format, have almost equal difficulty,
and include the same directions for administering, scoring, and interpreting the test.
There are two procedures for establishing alternate forms reliability: simultaneous
administration and delayed administration (Reynolds, Livingston, & Willson, 2009).
Simultaneous administration involves giving the two forms of the test simultaneously—
that is, to the same group of people on the same day. This procedure eliminates the
problems of memory and practice, which affect test-retest reliability. Delayed administration
involves giving the two forms on two different occasions. Alternate form reliability
based on simultaneous administration is sensitive to sources of error related to content
sampling. The resulting correlation, called a coefficient of equivalence, tells us how closely
the two forms measure the same construct. Alternate form reliability based on delayed
administration reflects both content-sampling and time-sampling errors; it provides
Split-Half Reliability In estimating split-half reliability, a test is divided into two compara-
ble halves, and both halves are given during one testing session. The results on one half of
the test are then correlated with the results on the other half of the test, resulting in a coef-
ficient of equivalence that demonstrates similarity between the two halves. Once again,
this process is often used when test developers and researchers are attempting to evaluate
$$\text{Reliability of Full Test} = \frac{2r}{1 + r}$$
In this formula, r represents the correlation between the two halves of the test. If the cor-
relation is .60, we can substitute the coefficient in the formula and calculate a reliability/
precision estimate for the full test:
$$\text{Reliability of Full Test} = \frac{2(.60)}{1 + (.60)} = \frac{1.20}{1.60} = .75$$
The reliability coefficient of .75 estimates the reliability/precision of the full test when the
two halves are correlated at .60. When you see a reliability coefficient reported as a
Spearman–Brown coefficient, you automatically know that reliability/precision has been
assessed using the split-half method.
An advantage of the split-half method of estimating reliability/precision is that it
involves administering a test on a single occasion (Beckstead, 2013). However, because
only one testing is involved, this approach reflects errors due only to content sampling; it
is not sensitive to time-sampling errors.
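The split-half correction worked through above can be expressed as a short function. This is a minimal sketch; `spearman_brown_full_test` is a hypothetical name chosen for this example.

```python
def spearman_brown_full_test(half_correlation):
    """Estimate full-test reliability from the correlation between two halves."""
    return (2 * half_correlation) / (1 + half_correlation)

# Correlation of .60 between the two halves, as in the worked example.
full_test_reliability = spearman_brown_full_test(0.60)
print(f"reliability of full test: {full_test_reliability:.2f}")  # 0.75
```

The corrected estimate is higher than the correlation between the halves because each half contains only half as many items as the full test.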
reliability on all combinations of items resulting from different splits of the test. The KR
20 formula is as follows:
$$r = \frac{N}{N - 1}\left(1 - \frac{\sum pq}{s^2}\right)$$
where
$r$ = reliability
$N$ = number of items on the test
$\sum$ = summation
$p$ = percentage of examinees that get each item correct
$q$ = percentage of examinees that get each item incorrect
$s^2$ = the variance of the test
To apply this formula, say that you wish to see how well the items on a five-item test are related
to each other. You could begin by calculating the percentage of examinees that got each item
correct and the percentage that got each item incorrect. This could be displayed as follows:
| Item | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| p | .5 | .4 | .8 | .6 | .7 |
| q | .5 | .6 | .2 | .4 | .3 |
| pq | .25 | .24 | .16 | .24 | .21 |
This shows that 50% of the examinees answered item 1 correctly, and 50% answered it
incorrectly; 40% of examinees answered item 2 correctly, and 60% answered it incorrectly;
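and so on. If we calculate the variance of the test ($s^2 = 2.00$), then we can plug the remaining
numbers into the KR 20 formula, using $\sum pq = .25 + .24 + .16 + .24 + .21 = 1.10$:
$$r = \frac{5}{5 - 1}\left(1 - \frac{1.10}{2.00}\right) = 1.25(1 - .55) = 1.25(.45) = .56$$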
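The five-item example can also be checked with a few lines of Python. This is a sketch that assumes only the item p values and the test variance given in the example; `kr20` is a hypothetical helper name.

```python
def kr20(p_values, test_variance):
    """KR 20 from each item's proportion correct (p) and the total-test variance."""
    n_items = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)  # q is 1 - p for each item
    return (n_items / (n_items - 1)) * (1 - sum_pq / test_variance)

p_values = [0.5, 0.4, 0.8, 0.6, 0.7]  # from the five-item example
reliability = kr20(p_values, test_variance=2.00)
print(f"KR 20 = {reliability:.2f}")  # 0.56
```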
The KR 21 formula is very similar to the KR 20. The KR 21 is used to calculate the
reliability/precision of dichotomous items that are homogeneous or about the same level
of difficulty. The only difference between the KR 20 and KR 21 equations is that instead of
calculating the p’s and q’s for every item and summing the results, we simply use the mean
test score ($\bar{X}$). The KR 21 formula is as follows:
$$r = \frac{N}{N - 1}\left[1 - \frac{\bar{X}\left(1 - \frac{\bar{X}}{N}\right)}{s^2}\right]$$
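Suppose we gave an 80-item test, and the mean score ($\bar{X}$) was 50, and the variance ($s^2$) was
100. The KR 21 reliability is computed as follows:
$$
\begin{aligned}
r &= \frac{80}{80 - 1}\left[1 - \frac{50\left(1 - \frac{50}{80}\right)}{100}\right] \\
&= \frac{80}{79}\left[1 - \frac{50(.375)}{100}\right] \\
&= 1.013\left[1 - \frac{18.75}{100}\right] \\
&= 1.013[1 - .1875] \\
&= 1.013[0.8125] \\
&= .82
\end{aligned}
$$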
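The same KR 21 calculation can be sketched in Python. The function name `kr21` is hypothetical; the inputs are the values from the 80-item example.

```python
def kr21(n_items, mean_score, test_variance):
    """KR 21 from the number of items, the mean test score, and the test variance."""
    error_term = mean_score * (1 - mean_score / n_items) / test_variance
    return (n_items / (n_items - 1)) * (1 - error_term)

reliability = kr21(n_items=80, mean_score=50, test_variance=100)
print(f"KR 21 = {reliability:.2f}")  # 0.82
```

Only three summary statistics are needed here, which is the practical appeal of KR 21 over KR 20 when item-level data are unavailable.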
$$\alpha = \frac{N}{N - 1}\left(\frac{s^2 - \sum s_i^2}{s^2}\right)$$
The coefficient alpha formula is similar to the KR 20 formula, except $\sum pq$ has been replaced
by $\sum s_i^2$. This new term, $s_i^2$, stands for the variance of the individual items ($i$). This means that
instead of summing the products of the proportions of correct and incorrect answers, we sum the
individual item variances. This subtle change in the formula allows us to describe the variance in test
items that have multiple possible responses, not just items with a right/wrong format.
For dichotomous items, the formula for Cronbach’s Alpha reduces to KR 20 and produces the
same result, but it is not limited to dichotomous items. As a result, Cronbach’s Alpha has been
increasingly used in place of KR 20 as a reliability estimate (Suter, 2012). Although different issues
must be considered when selecting a reliability/precision calculation, coefficient alpha has
become the primary measure. As you begin the process of evaluating instruments for use
in counseling practice, it will be important to have a thorough understanding of Cron-
bach’s Alpha.
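To make the link between coefficient alpha and KR 20 concrete, the sketch below computes both from the same set of right/wrong responses. It rests on assumptions: the response matrix is hypothetical, both helper functions are written only for this example, and population variances are used throughout so that each dichotomous item's variance equals pq.

```python
import statistics

def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores has one row per examinee, one column per item."""
    n_items = len(item_scores[0])
    columns = list(zip(*item_scores))
    sum_item_variances = sum(statistics.pvariance(col) for col in columns)
    total_variance = statistics.pvariance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (
        (total_variance - sum_item_variances) / total_variance
    )

def kr20_from_scores(item_scores):
    """KR 20 computed from 1/0 (right/wrong) item responses."""
    n_items = len(item_scores[0])
    n_people = len(item_scores)
    p_values = [sum(col) / n_people for col in zip(*item_scores)]
    sum_pq = sum(p * (1 - p) for p in p_values)
    total_variance = statistics.pvariance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)

# Hypothetical right/wrong (1/0) responses: six examinees, four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

alpha = cronbach_alpha(responses)
kr = kr20_from_scores(responses)
print(f"alpha = {alpha:.3f}, KR 20 = {kr:.3f}")  # the two values agree
```

Unlike `kr20_from_scores`, `cronbach_alpha` would also accept items scored on a rating scale (for example, 1 to 5), which is why alpha is the more general estimate.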
Interrater Reliability
As described earlier, tests that require direct observation of behavior rely on use of raters
(i.e., observers, scorers, judges) to record and score the behaviors. A potential source of error
with observations is a lack of agreement (i.e., lack of reliability/precision) among individuals
who score the test. Interrater reliability is the extent to which two or more raters agree. Say that
two raters use a 4-point scale (1 means least effective and 4 means most effective) to rate an
individual’s communication skills. Interrater reliability assesses how consistently the raters
implement the rating scale. For example, if one rater scores a 1 for the individual’s response
and another scores a 4 for the same response, then this would mean that the two raters do not
agree on the score and that there is low interrater reliability.
There are various ways to estimate interrater reliability, but the basic method for
assessing the level of agreement among observers is by correlating the scores obtained
independently by two or more raters. Very high and positive correlations, on the order of
.90 or higher, suggest that the proportion of error that is accounted for by interrater differ-
ences is 10% or less, because 1 - (.90) = .10 (Urbina, 2014). It is important to remember
that interrater reliability does not reflect content-sampling or time-sampling error; rather,
it reflects interrater agreement and is sensitive only to differences due to the lack of agree-
ment between the raters scoring the test.
coefficients to be as large as possible (between +.00 and +1.00). We know that a coefficient
of 1.00 means that an observed score is equal to the true score, meaning there is no error
variance. Thus, the closer a reliability coefficient approaches 1.00, the more reliable it is. For
example, a reliability coefficient (r) of .95 would indicate that 95% of the variation in test
scores is explained by real differences, and only 5% (i.e., 1 - .95) of the variation is attributed
to error; the test scores would be considered very reliable. This is far better than
an r of .20, which means that 80% of the score variations are due to error.
Unfortunately, there is no minimum threshold for a reliability coefficient to be con-
sidered appropriate for all tests. What constitutes an acceptable reliability coefficient can
depend on several factors, such as the construct being measured, the way the test scores
will be used, and the method used for estimating reliability (Reynolds, Livingston, &
Willson, 2009). Because this textbook is concerned with evaluating assessment methods,
we will suggest what level of reliability/precision is sufficient; however, readers must
remember that there is no absolute standard on values that constitute adequate reliability
coefficients. We generally consider that for most assessment instruments reliability coefficients
of .70 or higher are acceptable, reliability/precision estimates between .60 and .70 are marginally
reliable, and those below .60 are unreliable (Sattler & Hoge, 2006; Strauss, Sherman, &
Spreen, 2006). If a test is being used to make important decisions that significantly impact
an individual (i.e., a high-stakes test), then reliability coefficients should be at least .90
(Reynolds, Livingston, & Willson, 2009; Wasserman & Bracken, 2013). For interrater relia-
bility, coefficients should also be no less than .90 (Smith, Vannest, & Davis, 2011). Table 5.2
provides some general guidelines for evaluating the magnitude of reliability coefficients.
this group of observed scores the error distribution, which is assumed to be normal, with a
mean of zero and a standard deviation equal to SEM. Furthermore, we consider the mean
of the error distribution to be the true score. If test scores are very reliable, then the observed
scores are close to the true score (the mean of the error distribution); if test scores are not
reliable, then they fall away from the true score.
It is important to understand the differences between SEM and standard deviation.
Unlike standard deviation, which is the measure of the spread of scores obtained by a
group of test takers on a single test, SEM is the measure of the spread of scores obtained by
a single individual if the individual was tested multiple times. SEM is a function of both the
reliability/precision and standard deviation of a test and is calculated using the following
formula:
$$SEM = s\sqrt{1 - r}$$
where
$SEM$ = the standard error of measurement
$s$ = the standard deviation
$r$ = the reliability coefficient of the test
Say that we want to calculate SEM for the Verbal Comprehension Index (VCI) of the
Wechsler Intelligence Scale for Children (WISC-IV). The reliability coefficient for the VCI
is .94, the mean is 100, and the standard deviation is 15 (which is the same mean and standard
deviation for all index scores on the WISC-IV). SEM is computed as follows:
$$SEM = 15\sqrt{1 - .94} = 15\sqrt{.06} = 3.67$$
Now, let’s calculate SEM for the Processing Speed Index (PSI) of the WISC-IV, which
has a reliability coefficient of .88 and a standard deviation of 15:
$$SEM = 15\sqrt{1 - .88} = 15\sqrt{.12} = 5.20$$
Note that in these examples both tests have the same standard deviation (s = 15); thus,
SEM becomes a function solely of the reliability coefficient. Because the PSI has a lower
reliability coefficient (r = .88) than the VCI (r = .94), the PSI SEM is higher than the SEM
for the VCI.
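The two SEM calculations can be reproduced with a short Python sketch. The function name `standard_error_of_measurement` is hypothetical; the standard deviation and reliability coefficients are those given for the VCI and PSI.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = standard deviation times the square root of (1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

sem_vci = standard_error_of_measurement(sd=15, reliability=0.94)
sem_psi = standard_error_of_measurement(sd=15, reliability=0.88)

print(f"VCI SEM = {sem_vci:.2f}")  # 3.67
print(f"PSI SEM = {sem_psi:.2f}")  # 5.20
```

With the standard deviation held constant, the index with the lower reliability coefficient produces the larger SEM.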
Because SEM depends on both the standard deviation and the reliability coefficient
of a test, and because different tests have different standard deviations, the SEM cannot be
used by itself as an index of reliability/precision (Urbina, 2014). You must be able to eval-
uate SEM in context. For example, tests that have greater score range and larger standard
deviations, such as the SAT (standard deviation = 100), will have much larger SEM numbers
than tests with small standard deviations, such as the WISC-IV subtest scaled scores (standard deviation = 3;
Urbina, 2014). Although the standard deviation and SEM may be different, the reliability
coefficients can still be similar. Although we do not use SEM alone to interpret reliability/
precision, we use SEM in creating confidence intervals around specific observed scores,
which can guide score interpretations.
Confidence Intervals
Once we have computed the SEM for a given test score, we can build a confidence interval
around an individual’s observed score so that we can estimate (with a certain level of confi-
dence) his or her true score. Although we will never know what an individual’s true score
is, forming intervals around the individual’s observed score allows us to estimate the
probability that the true score falls within a certain interval. Thus, confidence intervals tell us
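the upper limit and lower limit within which a person’s true score is likely to fall. In assessment,
we are interested in confidence intervals at the 68% (within ±1 standard deviation from
the mean), 95% (within approximately ±2 standard deviations), and 99% (within approximately
±2.6 standard deviations) probability levels. The z scores associated with the 68%, 95%, and 99%
probability levels are 1.00, 1.96, and 2.58, respectively. We choose the appropriate level of probability
depending on how confident we want to be in our estimate of the test taker’s true score.
We calculate the ranges of the confidence intervals (CI) at the 68%, 95%, and 99% levels
as follows, where $X$ is the individual’s observed score:
$$
\begin{aligned}
CI_{68\%} &= X \pm 1.00(SEM) \\
CI_{95\%} &= X \pm 1.96(SEM) \\
CI_{99\%} &= X \pm 2.58(SEM)
\end{aligned}
$$
To illustrate this, say that Leticia received a score of 110 on the VCI of the WISC-IV. We
previously calculated the SEM on the VCI as 3.67. Using this information, we would first
compute the ranges of the confidence intervals as follows:
$$
\begin{aligned}
CI_{68\%} &= X \pm 1.00(3.67) = X \pm 3.67 \\
CI_{95\%} &= X \pm 1.96(3.67) = X \pm 7.19 \\
CI_{99\%} &= X \pm 2.58(3.67) = X \pm 9.47
\end{aligned}
$$
Using these score ranges, we could calculate the confidence intervals for Leticia’s VCI
score as follows:
$$
\begin{aligned}
CI_{68\%} &= 110 \pm 3.67 = 106.33 \text{ to } 113.67 \text{ (rounded, 106 to 114)} \\
CI_{95\%} &= 110 \pm 7.19 = 102.81 \text{ to } 117.19 \text{ (rounded, 103 to 117)} \\
CI_{99\%} &= 110 \pm 9.47 = 100.53 \text{ to } 119.47 \text{ (rounded, 101 to 119)}
\end{aligned}
$$
These confidence intervals reflect the range in which an individual’s true test score
is likely to lie at various probability levels. Thus, based on the information provided previously, we
can conclude that at the 99% confidence interval we are 99% confident that Leticia’s
true score falls in the range of 101 to 119.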
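The interval calculations for Leticia's score can be sketched in Python using the three z values given in the text (1.00, 1.96, and 2.58) and the SEM of 3.67. The function name `confidence_interval` is a hypothetical helper for this example.

```python
def confidence_interval(observed_score, sem, z):
    """Return (lower, upper) limits: observed score plus or minus z times SEM."""
    margin = z * sem
    return observed_score - margin, observed_score + margin

OBSERVED_VCI = 110  # Leticia's observed VCI score
SEM_VCI = 3.67      # SEM for the VCI, as stated in the text

for z in (1.00, 1.96, 2.58):
    lower, upper = confidence_interval(OBSERVED_VCI, SEM_VCI, z)
    print(f"z = {z:.2f}: {lower:.2f} to {upper:.2f}")

# The widest interval (z = 2.58) rounds to the 101-to-119 range cited in the text.
widest = confidence_interval(OBSERVED_VCI, SEM_VCI, 2.58)
print(round(widest[0]), "to", round(widest[1]))
```

A larger z value buys more confidence at the cost of a wider, less precise interval.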
How much does test length impact reliability/precision? If we already know the reliability/precision
of an existing test, we can use the Spearman–Brown formula to predict what the new
reliability/precision would be if we increased the number of items:
$$r_{new} = \frac{n \times r_{xx}}{1 + (n - 1)r_{xx}}$$
where
$r_{new}$ = the new reliability/precision estimate after lengthening (or shortening) the test
$r_{xx}$ = the reliability/precision estimate of the original test
$n$ = the multiplier by which test length is to be increased or decreased (the new test length divided by the old test length)
For example, let’s assume that the $r_{xx}$ for a 10-item test is 0.30. We decide to increase the
test to 50 items. To determine the reliability coefficient for the longer test, we first calculate the $n$:
$$n = \frac{\text{new test length}}{\text{old test length}} = \frac{50}{10} = 5$$
Next, we apply this to the Spearman–Brown formula:
$$r_{new} = \frac{5(.30)}{1 + (5 - 1).30} = \frac{1.5}{1 + (4).30} = \frac{1.5}{1 + 1.20} = \frac{1.5}{2.20} = .68$$
We can see that by increasing the number of items from 10 to 50, we increased the total test
score reliability/precision from .30 to .68. To further illustrate this, the following list of tests begins
with a 10-item test that has a reliability/precision of .30. You can see how reliability/precision
increases as the number of test items increases:

| Number of Items | Reliability/precision |
|---|---|
| 10 | .30 |
| 20 | .46 |
| 50 | .68 |
| 100 | .81 |
| 200 | .90 |
| 500 | .96 |
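The values in the list above can be reproduced with a short Python sketch of the Spearman–Brown formula. The function name `spearman_brown` is hypothetical; the starting point (10 items, reliability of .30) comes from the example.

```python
def spearman_brown(r_xx, n):
    """Predicted reliability when test length is multiplied by n."""
    return (n * r_xx) / (1 + (n - 1) * r_xx)

ORIGINAL_ITEMS = 10
ORIGINAL_RELIABILITY = 0.30

for new_length in (10, 20, 50, 100, 200, 500):
    n = new_length / ORIGINAL_ITEMS  # multiplier: new length / old length
    predicted = spearman_brown(ORIGINAL_RELIABILITY, n)
    print(f"{new_length:>4} items: {predicted:.2f}")
```

Note the diminishing returns: going from 10 to 50 items more than doubles the estimate, whereas going from 200 to 500 items adds only a few hundredths.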
Increasing Reliability/Precision
Although measurement error exists in all tests, test developers can take steps to reduce
error and increase the reliability/precision of test scores. Probably the most obvious
approach is simply to increase the number of items on a test (Reynolds, Livingston, &
Willson, 2009). This is based on the assumption that a larger number of test items can more
accurately represent the content domain being measured, thereby reducing content-sam-
pling error. Sidebar 5.1 illustrates how the reliability/precision of a 10-item test with a
reliability coefficient of .30 can be increased by adding items. Other ways of improving
reliability/precision include writing understandable, unambiguous test items; using
selected-response items (e.g., multiple choice) rather than constructed-response items
(e.g., essay); making sure that items are not too difficult or too easy; having clearly stated
administration and scoring procedures; and requiring training before individuals can
administer, grade, or interpret a test.
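The effect of test length described above can be checked with a short script. The following is a minimal sketch of the Spearman–Brown calculation from Sidebar 5.1; the function name `spearman_brown` is our own label for illustration, and the starting values (10 items, reliability of .30) are the ones used in the sidebar.

```python
def spearman_brown(r_xx: float, n: float) -> float:
    """Predicted reliability after changing test length by a factor of n."""
    return (n * r_xx) / (1 + (n - 1) * r_xx)


original_items = 10
r_xx = 0.30  # reliability/precision of the original 10-item test

for new_items in (10, 20, 50, 100, 200, 500):
    n = new_items / original_items  # new test length / old test length
    print(f"{new_items:>4} items: {spearman_brown(r_xx, n):.2f}")
```

Each additional block of items adds less than the one before it: going from 10 to 50 items raises the estimate far more than going from 200 to 500 items does.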
108 Chapter 5 • Reliability/Precision
Summary
In this chapter, we presented information about the reliability/precision of test scores, which
is the degree to which test scores are dependable, consistent, and stable upon additional test
administrations. An important concept in reliability/precision is measurement error. In testing,
we try to determine how much measurement error may impact an individual’s true score.
Measurement error results from factors that are irrelevant to the construct being measured, such
as time-sampling error, content-sampling error, interrater differences, test-taker
characteristics, flaws in testing procedures, and even random chance.
We estimate the reliability/precision of test scores by calculating a reliability coefficient.
The type of reliability coefficient that we use depends on several factors, such as test content,
testing procedures, testing conditions, and the use of raters. Although no minimum level for
reliability coefficients exists, we know that the higher the coefficient, the more reliable the
test scores. Calculating the standard error of measurement can be helpful in interpreting
reliability/precision by using the statistic to create confidence intervals, which tell us the
upper and lower limits within which a person’s true score is likely to fall.
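To make the link between a reliability coefficient and a confidence interval concrete, the following is a minimal sketch. It assumes the conventional formula SEM = SD × √(1 − r), and the scale (standard deviation of 10, reliability of .91, observed score of 50) is hypothetical and chosen only to keep the arithmetic simple.

```python
import math


def standard_error_of_measurement(sd: float, r_xx: float) -> float:
    """Conventional formula: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - r_xx)


def confidence_band(score: float, sem: float, k: int):
    """Return (lower, upper) limits at +/- k SEM around an observed score."""
    return score - k * sem, score + k * sem


# Hypothetical scale: standard deviation of 10 and reliability of .91.
sem = standard_error_of_measurement(sd=10, r_xx=0.91)
observed = 50

for k in (1, 2):
    low, high = confidence_band(observed, sem, k)
    print(f"+/-{k} SEM: {low:.1f} to {high:.1f}")
```

Under the usual assumption of normally distributed error, the ±1 SEM band corresponds to roughly 68% confidence and the ±2 SEM band to roughly 95% confidence.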
Suggested Activities
1. Review several standardized tests and identify the types of reliability/precision presented
in the test manuals.
2. Write a paper or prepare a class report on one of the following topics: factors that affect
test score reliability/precision, increasing reliability/precision, or reliability/precision of
teacher-made tests.
3. Read the critical reviews of some of the tests that you have taken and see how the reviewers
rate the reliability/precision of the tests’ scores.
4. A teacher wrote a test consisting of five dichotomous items and wants to know if the items
adequately cover the content domain being tested. He decides to use the KR 20 formula to
determine the internal consistency of the test. Using the following information, calculate the
internal consistency using the KR 20 formula. Are the test scores reliable in terms of internal
consistency?

Item    1     2     3     4     5
p      .3    .4    .6    .5    .4
q      .7    .6    .4    .5    .3
pq    .21   .24   .24   .25   .12

5. Construct an achievement test based on the concepts introduced in this chapter and administer
it to a number of students in your class. Then, compute the reliability/precision of the test’s
scores using split-half and KR 20 or KR 21.
6. Sara scored a 98 on the Working Memory Index (WMI) of the WISC-IV. We know that the
reliability coefficient for WMI scores is .92, the mean is 100, and the standard deviation is 15.
Calculate the standard error of measurement, and then create confidence intervals for Sara at
±1 SEM and ±2 SEM.
References
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Authors.
Ayeni, E. (2012). Development, standardization, and validation of Social Anxiety Scale. IFE Psychologia, 20(1), 263–274.
Beckstead, J. W. (2013). On measurements and their quality. Paper 1: Reliability–History, issues and procedures. International Journal of Nursing Studies, 50(7), 968–973. doi:10.1016/j.ijnurstu.2013.04.005
Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2013). Measuring test measurement error: A general approach. Journal of Educational and Behavioral Statistics, 38(6), 629–663. doi:10.3102/1076998613508584
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Furr, R. M., & Bacharach, V. R. (2013). Psychometrics: An introduction (2nd ed.). Thousand Oaks, CA: Sage Publications.
Kaplan, R. M., & Saccuzzo, D. P. (2013). Psychological testing: Principles, applications, and issues (8th ed.). Belmont, CA: Wadsworth.
Kuder, G. F., & Richardson, M. W. (1937). The theory of estimation of test reliability. Psychometrika, 2, 151–160.
McManus, I. C. (2012). The misinterpretation of the standard error of measurement in medical education: A primer on the problems, pitfalls, and peculiarities of the three different standard errors of measurement. Medical Teacher, 34(7), 569–576. doi:10.3109/0142159X.2012.670318
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2012). Measurement and assessment in teaching (11th ed.). Upper Saddle River, NJ: Pearson.
Murphy, K. R., & Davidshofer, C. O. (2005). Psychological testing: Principles and applications (6th ed.). Upper Saddle River, NJ: Prentice Hall.
Resch, J., Driscoll, A., McCaffrey, N., Brown, C., Ferrara, M. S., Macciocchi, S., & Walpert, K. (2013). ImPact test-retest reliability: Reliably unreliable? Journal of Athletic Training, 48(4), 506–511. doi:10.4085/1062-6050-48.3.09
Reynolds, C. R., Livingston, R. B., & Willson, V. (2009). Measurement and assessment in education (2nd ed.). Boston, MA: Pearson.
Sattler, J. M., & Hoge, R. D. (2006). Assessment of children: Behavioral, social, and clinical foundations (5th ed.). San Diego, CA: Jerome M. Sattler Publisher Inc.
Smith, S. L., Vannest, K. J., & Davis, J. L. (2011). Seven reliability indices for high-stakes decision making: Description, selection, and simple calculation. Psychology in the Schools, 48(10), 1064–1075. doi:10.1002/pits.20610
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York, NY: Oxford University Press.
Suter, W. N. (2012). Introduction to educational research: A critical thinking approach. Thousand Oaks, CA: Sage Publications.
Urbina, S. (2014). Essentials of psychological testing (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Wasserman, J. D., & Bracken, B. A. (2013). Fundamental psychometric considerations in assessment. In J. R. Graham & J. A. Naglieri (Eds.), Handbook of psychology (Vol. 10: Assessment psychology, 2nd ed., pp. 50–81). Hoboken, NJ: John Wiley & Sons.
CHAPTER 6
Validity
In everyday terms, when we refer to something as valid, we are saying that it is sound,
meaningful, or accurate. Similarly, when applied to measurement, validity refers to
whether the claims and decisions made on the basis of assessment results are sound,
meaningful, and useful for the intended purpose of the results. Traditionally, the defini-
tion of validity centered on simply whether a test or questionnaire (i.e., assessment instru-
ment) “measured what it was supposed to measure.” Today, the concept of validity refers
to evidence that supports the use of an assessment instrument for specific purposes. An
argument for the validity of an instrument can be made using five sources of information:
test content, response process, internal structure, relations to other variables, and testing
consequences (American Educational Research Association (AERA), American Psycho-
logical Association (APA), & National Council on Measurement in Education (NCME),
2014). This perspective on building a validity argument has become particularly relevant in
recent years because of the accountability movement affecting both educational and
psychological assessment. Assessment results are often used to make important and
sometimes life-changing decisions about an individual, which further underscores the
importance of evaluating the appropriateness of the interpretations made from assessment
results (i.e., validity).
• Describe the major sources of validity evidence and provide examples of the
process.
Chapter 6 • Validity 111
Historically, validity was subdivided into three distinct types: content validity, criterion
validity, and construct validity (known as the tripartite view of validity). The most recent
edition of the Standards (American Educational Research Association (AERA) et al., 2014)
makes no reference to distinct types of validity; rather, it emphasizes the unitary concept by
referring to five sources of validity evidence. Despite this current unitary view of validity,
the tripartite language remains entrenched in testing; you will see references to it (i.e., con-
tent validity, criterion-related validity, construct validity) in many of the test manuals and
test reviews today. Furthermore, some current educational researchers are questioning the
unitary concept (centering on construct validity) and are reemphasizing the importance of
content validity (Sireci & Faulkner-Bond, 2014). Thus, many textbooks continue to include
the traditional terminology when describing types and sources of validity evidence.
4. Validity is specific to some particular group, use, or purpose of the test score interpreta-
tions; no test is valid for all purposes or all groups. For example, the validity evidence
might support the interpretation of results from a reading test for current reading compre-
hension levels of test takers but have little evidence for predicting future levels. Alter-
nately, a math test might have strong evidence for using the results to interpret arithmetic
skills for fifth-grade students, but have little support for the interpretation of skills for
third graders. Thus, when evaluating or describing validity evidence, it is important to
consider how test scores are intended to be interpreted and used and the appropriateness
of the population for which the test is employed (American Educational Research Associa-
tion (AERA) et al., 2014).
Revisiting Construct
Because we use the term construct so frequently in this chapter, we will revisit the meaning
of the term before discussing validity evidence. Constructs (also called latent variables) are
scientifically developed concepts, ideas, or hypotheses that are used to describe or explain
behavior (Cohen & Swerdlik, 2012). As such, constructs cannot be measured directly or
observed. Instead, a construct is defined or inferred by a group of interrelated variables or
dimensions that can be directly measured or observed (known as manifest, or observed,
variables). For example, although we cannot measure or observe the construct aggression, it
can be inferred from the interrelationships among the measurable variables that make up
aggression, such as physical violence, verbal attacks, poor social skills, and so on.
Constructs can be very comprehensive and complex (e.g., personality) or narrowly
defined (e.g., depressed mood). In psychology, the term construct may be applied to traits,
characteristics, or behaviors, such as personality, mood, anxiety, aptitude, adjustment,
attention, and self-esteem, to name just a few. In education, examples of constructs include
intelligence, achievement, emotionally disturbed, and learning disabled. Examples of
employment- or career-related constructs include job performance, motivation, clerical
aptitude, interests, or work values. These examples represent only a few of the potential
constructs. In terms of validity, construct is an important concept, because if we have evi-
dence that the interpretations of assessment results are valid based on the purpose of the
test, then the results are considered to accurately reflect the construct being measured.
Threats to Validity
Numerous factors can make test results invalid for their intended use. Two main threats
include construct underrepresentation and construct-irrelevant variance (American
Educational Research Association (AERA) et al., 2014). These terms simply refer to a test
measuring less (construct underrepresentation) or more (construct irrelevant) of the con-
struct than it is supposed to be measuring (Reynolds, Livingston, & Willson, 2008). Con-
struct underrepresentation implies that a test is too narrow and fails to include important
dimensions or aspects of the identified construct. An example of this is a test that is sup-
posed to measure writing skills. If the test only contains spelling questions and leaves out
other aspects of writing (such as grammar and punctuation), then it is an inaccurate repre-
sentation of writing skills.
In contrast, construct-irrelevant variance means that the instrument is too broad and con-
tains too many variables, many of which are irrelevant to the construct. For example, if a test
on a particular subject area requires undue reading comprehension (e.g., extensive and com-
plex instructions on a math test), then test scores might be invalidly low, reflecting the test
taker’s difficulty with reading (Messick, 1995; Reynolds et al., 2008). Alternately, if the items
on a test contain clues that permit the test taker to respond correctly, then the resulting score
may be invalidly high. An example of this is a reading comprehension test that includes a
passage that is well known to some test takers; their performance may be a result of their
familiarity with the passage, not with their reading comprehension ability. Thus, construct-
irrelevant variance is viewed as a contaminant to accurate test score interpretation.
In addition to construct underrepresentation and construct-irrelevant variance, other
factors can impact the validity of the interpretation of test results, including the following
(Miller et al., 2012):
• Factors within the Test: Ambiguous, inappropriate, or poorly constructed test items;
too few test items; improper arrangement of items; identifiable patterns of answers
• Administration and Scoring Procedures: Failure to follow standard directions and
time limits, giving test takers unfair aid, scoring errors
• Test-Taker Characteristics: Test takers with emotional disturbances, test anxiety,
lack of motivation
• Inappropriate Test Group: Giving the test to individuals who differ from the validation
group
The content validation process begins at the outset of instrument construction and
follows a rational approach to ensure that the content matches the test specifications. The
first step in the process is clearly delineating the construct or the content domain to be
measured. The definition of the construct determines the subject matter and the items to be
included on the instrument.
Once the construct or content domain has been defined, the second step is to develop
a table of specifications. A table of specifications, or a test blueprint, is a two-dimensional chart
that guides instrument development by listing the content areas that will be covered on a
test, as well as the number (proportion) of tasks or items to be allocated to each content area.
The content areas reflect essential knowledge, behaviors, or skills that represent the con-
struct of interest. Test developers decide on relevant content areas from a variety of sources,
such as the research literature, professional standards, or even other tests that measure the
same construct. The content areas of instruments measuring achievement or academic abil-
ities often come from educational or accreditation standards, school curricula, course syl-
labi, textbooks, and other relevant materials. For example, in constructing the KeyMath 3, a
norm-referenced achievement test measuring essential math skills in K–12 students, test
developers created a table of specifications with content areas that reflected essential math
content, national mathematical curriculum, and national math standards. For personality
and clinical inventories, content areas may be derived from the theoretical and empirical
knowledge about personality traits or various mental health problems. For example, the
content on the Beck Depression Inventory (BDI-II) was designed to be consistent with the
diagnostic criteria for depression in the Diagnostic and Statistical Manual of Mental Disorders,
Fifth Edition (DSM-5; American Psychiatric Association (APA), 2013). In employment assess-
ment, content areas may reflect the elements, activities, tasks, and duties related to a par-
ticular job; for example, Figure 6.2 displays a table of specifications for an instrument
designed to predict sales performance. Instrument manuals should always provide clear
statements about the source(s) of content areas that are represented on the test.
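A table of specifications can also be represented as a simple data structure. The sketch below is a simplified, one-column version that converts content-area proportions into item counts. The content areas, weights, and test length are hypothetical and are not taken from Figure 6.2, and the helper name `allocate_items` is our own.

```python
# Hypothetical content areas and proportions for a sales-related instrument.
blueprint = {
    "Product knowledge": 0.30,
    "Customer communication": 0.40,
    "Negotiation": 0.20,
    "Record keeping": 0.10,
}
total_items = 40


def allocate_items(weights, total):
    """Turn content-area proportions into whole-number item counts."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("content-area proportions must sum to 1")
    counts = {area: round(w * total) for area, w in weights.items()}
    if sum(counts.values()) != total:
        raise ValueError("rounded counts do not add up; adjust the weights")
    return counts


for area, count in allocate_items(blueprint, total_items).items():
    print(f"{area:<24} {count:>3} items")
```

Writing the blueprint down before any items are drafted makes it easy to confirm later that the finished instrument still matches the intended proportions.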
After developing the table of specifications with the identified content areas, test
developers write the actual test items. Thus, the next step in the content validation process
involves recruiting multiple outside consultants (i.e., subject matter experts [SMEs]); these
consultants review the test items to determine if they do in fact represent the content
domain. The SMEs can consist of both content experts in the field and lay experts. For
example, to evaluate the content validity of an instrument assessing depression, two
groups of experts should participate: (1) people who have published in the field or who
have worked with depressed individuals and (2) people who are depressed (Rubio, 2005).
There are a number of methods that can be used to report the data provided by the SMEs.
Although a test developer can report SME ratings as evidence, it is more useful to repre-
sent the degree of agreement among SMEs through a variety of statistics (e.g., Aiken’s
Validity Index). In addition, SMEs can perform an analysis of alignment between the test
and content standards of external bodies (e.g., state department of education). The SME
analysis of alignment can be conducted by using a variety of methods (e.g., Webb Method
and Achieve Method) and can be presented in a matrix format.
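As an illustration of summarizing SME agreement, the following is a minimal sketch of Aiken's V as it is commonly written: the sum of each rating's distance from the lowest category, divided by n(c − 1), where n is the number of raters and c is the number of rating categories. We assume this is the index the text has in mind; the ratings and item names are hypothetical.

```python
def aikens_v(ratings, lowest, highest):
    """Aiken's V for one item: sum(r - lowest) / (n * (c - 1))."""
    n = len(ratings)              # number of SMEs rating the item
    c = highest - lowest + 1      # number of rating categories
    s = sum(r - lowest for r in ratings)
    return s / (n * (c - 1))


# Hypothetical relevance ratings from five SMEs
# (1 = not relevant, 5 = highly relevant).
sme_ratings = {
    "item_1": [4, 5, 4, 5, 3],
    "item_2": [2, 3, 2, 1, 2],
}

for item, ratings in sme_ratings.items():
    print(item, round(aikens_v(ratings, lowest=1, highest=5), 2))
```

Values near 1 indicate that the SMEs consistently rated the item as relevant to the content domain; values near 0 indicate that they consistently rated it as not relevant.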
applicants to complete an aptitude test as part of the application process. Because the com-
pany believes that the scores on the aptitude test (the predictor variable) will predict job
success (the criterion), the company will only hire individuals who score equal to or greater
than a designated cutoff score. In this situation, two types of errors might occur: a false
positive error, which means predicting that a positive outcome will occur and it does not; or
a false negative error, which means predicting that a negative outcome will occur and it does
not. Figure 6.3 presents a graph of the two types of error in our example. From the graph,
four quadrants are evident. A person who is predicted to succeed in a job and subsequently
does succeed would fall into the true positive quadrant. Someone who is predicted to suc-
ceed but fails would be rated as a false positive. The person who is predicted to fail and does
so would be tallied as a true negative. Anyone who is predicted to fail but actually succeeds
would be a false negative. Prediction errors can be analyzed, and adjustments to the predic-
tor variable can be made if needed. In our example, if too many false positive errors were
occurring and too many individuals were being hired who did not have job success, then
the cutoff score on the aptitude test could be raised. This would mean that almost everyone
hired would be an above-average performer, and the number of false positives for job suc-
cess would be reduced.
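The four quadrants and the effect of moving the cutoff score can be sketched in a few lines. The applicant data below are hypothetical and assume a validation study in which job success is known for everyone tested, including those who scored below the cutoff; the function names are our own.

```python
def classify(score, succeeded, cutoff):
    """Place one applicant in a quadrant for a given cutoff score."""
    predicted_success = score >= cutoff
    if predicted_success and succeeded:
        return "true positive"
    if predicted_success and not succeeded:
        return "false positive"
    if not predicted_success and succeeded:
        return "false negative"
    return "true negative"


def tally(applicants, cutoff):
    """Count how many applicants fall into each quadrant."""
    counts = {"true positive": 0, "false positive": 0,
              "false negative": 0, "true negative": 0}
    for score, succeeded in applicants:
        counts[classify(score, succeeded, cutoff)] += 1
    return counts


# Hypothetical (aptitude score, later job success) pairs.
applicants = [(45, False), (52, False), (55, True), (58, False),
              (61, True), (64, False), (67, True), (72, True),
              (48, True), (80, True)]

for cutoff in (50, 60):
    print(cutoff, tally(applicants, cutoff))
```

In this made-up sample, raising the cutoff from 50 to 60 lowers the number of false positives but raises the number of false negatives, which is the trade-off to weigh when adjusting a cutoff score.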
Evidence of relationships between test scores and other variables can be gathered
from several sources. It is important to remember that the accumulation of evidence across
the different types of validity lends credence to an argument for validity. In order to fur-
ther demonstrate the process of gathering evidence of relations to other variables, we pro-
vide a discussion of the following sources:
• Evidence of homogeneity
• Convergent and discriminant evidence
• Group differentiation studies
• Age differentiation studies
• Experimental results
• Factor analysis
Evidence of Homogeneity
Test homogeneity refers to how uniform the items and components (i.e., subtests, scales) of a
test are in measuring a single concept (Cohen & Swerdlik, 2012). In reference to the 2014
Standards for Educational and Psychological Testing (American Educational Research Associa-
tion (AERA) et al., 2014), test homogeneity is one means of providing evidence of the internal
structure of a test. One way test developers can provide evidence of homogeneity is by show-
ing high internal consistency coefficients. As you will recall from the previous chapter on
reliability/precision, an estimate of internal consistency reliability, such as the coefficient
alpha, evaluates the extent to which items on a test measure the same construct by examining
the intercorrelations among test items. High internal consistency reliability means that test
items are homogeneous, which increases confidence that items assess a single construct.
In addition to intercorrelations of items, evidence of homogeneity can also be obtained
by correlating the scores on each test item with the scores on the total test (i.e., the item-to-
total correlation). For example, if you want to obtain evidence of homogeneity on a test of
student motivation, then you could give the test to a group of students and then correlate
the scores on each item with the scores on the total test. If all the items correlated highly
with the total test score, then there is evidence that the items measured the construct of
student motivation. The same procedure can be used for the scales or subtests on a test.
High correlations between scales or subtests on the same test provide evidence that these
components measure the construct that the test is intended to measure.
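The item-to-total procedure described above can be sketched as follows. The responses are hypothetical (six students, four items on a 1-to-5 scale), and the Pearson correlation is computed by hand so that the example does not depend on any particular library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical responses: one row per student, one column per item.
responses = [
    [4, 5, 4, 2],
    [2, 3, 2, 4],
    [5, 5, 4, 1],
    [3, 3, 3, 3],
    [1, 2, 2, 5],
    [4, 4, 5, 2],
]

totals = [sum(row) for row in responses]  # total test score per student

item_total = []
for i in range(len(responses[0])):
    item_scores = [row[i] for row in responses]
    item_total.append(pearson_r(item_scores, totals))

for i, r in enumerate(item_total, start=1):
    print(f"Item {i}: r = {r:.2f}")
```

In this made-up data set, the fourth item correlates negatively with the total score, which would flag it as a candidate for review because it does not appear to measure the same construct as the other items.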
obtained by correlating instrument results with the results of other instruments that assess
the same construct. Unlike relations to other variables evidence, which makes score com-
parisons with criterion measures that are considered a better representation of the construct
of interest, convergent validity evidence compares instrument scores to scores from other
instruments assumed to be equivalent representations of the construct. High positive correla-
tions between the two tests help demonstrate evidence of overall construct validity. Test
developers show convergent evidence typically by correlating their test’s scores with the
scores of comparable, well-established tests (i.e., a gold standard). For example, researchers
have suggested that the Stanford–Binet Intelligence Scale (SB-5) global composite score is
often higher than scores produced on other measures of intelligence (Minton & Pratt,
2006). Garred and Gilmore (2009) investigated the relationship between the SB-5 and the
Wechsler Preschool and Primary Scales of Intelligence, Third Edition (WPPSI-III). After
testing a sample of 32 four-year-olds with the SB-5 and WPPSI-III, the researchers calcu-
lated a correlation of .79. Although this is a statistically significant result, the relationship
clearly has some error. What does this mean in practice? The decision to use one test over
another may produce different results. In their research, Garred and Gilmore found that
some students’ IQ scores differed by as much as 16 points. If you consider the implications
of IQ scores for school placement, then a difference of 16 points is incredibly large and
quite problematic. This example provokes thought about relations to other variables, but it
also has meaning for the discussion of consequences of testing. Keep this example in mind
as you review the information later in this chapter.
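One way to see how much "error" remains in a correlation of .79 is to square it, which gives the proportion of variance that the two sets of scores share (the coefficient of determination). This is a supplementary illustration and not a calculation reported by Garred and Gilmore:

\[ r^2 = (.79)^2 \approx .62 \]

In other words, roughly 62% of the variance in scores on one test is shared with the other, and about 38% is not, which is consistent with individual scores that can differ noticeably between the two instruments.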
Another common procedure for establishing convergent evidence occurs when tests
are revised (Urbina, 2014). In this case, test developers report high correlations between
the new test and previous editions as evidence that both are measuring the same construct.
For example, the BDI-II test manual cites a correlation (r) of .94 between the BDI-I (Beck,
Ward, & Mendelson, 1961) and the BDI-II (Beck, Steer, & Brown, 1996) as evidence that
both editions measure the construct of depression. Evidence of convergent validity can
also be obtained from correlations across the scores of subtests, or subscales, of different
scales. For example, BDI-II has correlations with the Symptom Checklist-90-R Depression
subscale (r = .89; Beck et al., 1996) and with the Millon Clinical Multiaxial Inventory Major
Depression scale (r = .71; Millon, Millon, & Davis, 1994).
Another similar method used to substantiate validity involves discriminant evidence.
In contrast to convergent evidence, discriminant evidence is based on consistently low cor-
relations between the test and other tests that are supposed to measure different constructs.
For example, we would assume that SAT scores would be unrelated to scores on a social
skills measure or that scores from a measure of physical aggression would have low cor-
relation with the scores on a measure assessing withdrawal.
The Multitrait-Multimethod Matrix. Campbell and Fiske (1959) proposed a design they
called the multitrait-multimethod matrix (MTMM), which provides convergent and discri-
minant evidence of validity of a test. This approach involves measuring two or more distinct
traits (i.e., constructs) with two or more distinct measurement methods. For example, we might
measure the constructs of anxiety and depression using two methods: a self-report inven-
tory and a clinical interview. Data is then collected and assembled into a matrix (also called
a table of correlations) that displays correlations between
• the same trait assessed by the same method (i.e., reliability coefficients),
• the same trait assessed by different methods (i.e., convergent evidence),
distinct methods that measure each trait, it is sometimes not feasible in educational and
psychological assessment to meet the stringent requirements needed to use the process.
For example, it may be difficult to find three distinct methods to measure one particular
trait. However, variations of the MTMM are increasingly being used in validation studies
that involve using only tests or subtests (instead of distinct methods) that measure both
similar and dissimilar constructs.
Age Differentiation Studies. Related to group differentiation studies are age differentiation
studies (also called developmental studies). Age differentiation studies focus on demonstrating
validity by showing the degree to which test scores increase or decrease with age.
We will look at intelligence as an example. According to intelligence theorists Horn and
Cattell (1966, 1967), intelligence encompasses two general types: fluid intelligence and
crystallized intelligence. Fluid intelligence is the ability to solve problems and adapt to new
situations. Crystallized intelligence refers to acquired knowledge and ability obtained
through education and personal experience. According to Horn and Cattell, fluid intelli-
gence improves through childhood and adolescence and then begins to decline as early as
late adolescence and into adulthood. In contrast, crystallized intelligence continues to
increase through middle age and is maintained during old age. In the development of the
Kaufman Adolescent and Adult Intelligence Test (KAIT; Kaufman & Kaufman, 1993),
which is a measure of general intelligence composed of separate scales for fluid and crys-
tallized intelligence, the authors conducted age differentiation studies on several groups of
individuals between 11 years and 85+ years. They found that crystallized intelligence
scores increased until the mid-20s and then plateaued until age 54. After age 54, scores
dropped steadily but not dramatically. By ages 75 to 85+, crystallized intelligence scores
are similar to the scores for preadolescents and adolescents. Kaufman and Kaufman also
found that fluid intelligence scores increase until around the early 20s. Fluid intelligence
scores drop a bit but then remain the same until after age 54, when scores drop dramati-
cally. By ages 75 to 85+, fluid intelligence scores are well below the mean for age 11. The
results of the KAIT’s age differentiation studies somewhat follow the pattern of age-related
changes in intelligence predicted by Horn and Cattell.
Summary
In this chapter, we introduced the concept of validity. Validity refers to whether the
inferences and decisions made on the basis of test results are sound or appropriate. Threats to
validity include construct underrepresentation (i.e., the test fails to include important
dimensions or aspects of the identified construct) and construct-irrelevant variance (i.e., the
test is too broad and contains too many variables, many of which are irrelevant to the
construct), in addition to factors within the test, administration and scoring procedures,
test-taker characteristics, and inappropriate test groups.
The process of test validation involves evaluating various sources of validity evidence.
Evidence may focus on the content of the test (i.e., content evidence), the relationship between
test results and external variables (i.e., relations to other variables), the internal structure
of the test, test consequences, or the response process. Methods to evaluate construct validity
evidence include evaluating evidence of the internal structure, convergent and discriminant
validity evidence, group differentiation studies, age differentiation studies, experimental
results, and factor analysis. Multiple methods are used to provide overall evidence of validity
and to support the use of a test for different applications. It is the responsibility of test
users to carefully read the validity information in the test manual and to evaluate the
suitability of the test for their specific purposes.
Suggested Activities
1. We have provided a list of words and phrases (i.e., constructs) that describe various aspects
of personality. Select one of the constructs and address the questions that follow.
List of Possible Constructs:
• Empathy
• Genuineness
• Selfishness
• Patience
• Intolerance
a. Develop a clear definition of the construct.
b. Brainstorm and identify several content areas that reflect the traits or behaviors of your
construct.
c. Construct a table of specifications based on the content areas.
2. Review several standardized tests and identify the types of validity evidence presented in
the test manuals.
3. Review the manuals of two instruments that measure intelligence. Describe how the instruments
define the construct of intelligence. In what ways are their definitions similar or different?
What types of content evidence do the manuals provide? Based on this, which one do you think is
a better measure of intelligence?
4. Study the validity sections of a test manual for an achievement test and personality test.
How does information differ for these two types of tests? Explain.
5. Read critical reviews of some of the tests that you have taken and see what the reviewers say
about the validity of the tests.
6. Read the test manual or a critical review of the Kaufman Brief Intelligence Test, Second
Edition (KBIT-2). Identify two other cognitive ability tests that were used in the KBIT-2
validation process that provided convergent evidence.
7. Study the following descriptions of two tests and answer the questions that follow the
descriptions:

Test A: 40 items
Description: Measure of self-esteem
Scales: Total Score, General Self-Esteem, Social Self-Esteem, Personal Self-Esteem
Reliability: Test-retest r = .81; coefficient alphas for the Total Score, General Self-Esteem,
Social Self-Esteem, Personal Self-Esteem scales are .75, .78, .57, and .72, respectively.
Validity:
Content: developed construct definitions for self-esteem, developed table of specifications,
wrote items covering all content areas, used experts to evaluate items.
Convergent: correlated with Coopersmith’s Self-Esteem Inventory r = .41.
Discriminant: correlated with Beck Depression Inventory r = .05.
Factor analysis revealed that the three subscales (General Self-Esteem, Social Self-Esteem,
Personal Self-Esteem) are dimensions of self-esteem.
Homogeneity: correlations between the scales indicate that the General scale correlated with the
Social scale at .67, the Personal scale at .79, and the Total scale at .89.

Test B: 117 items
Scales: Global self-esteem, competence, lovability, likability, self-control, personal power,
moral self-approval, body appearance, body functioning, identity integration, and defensive
self-enhancement.
Reliability: Test-retest for each scale ranges from .65 to .71. Coefficient alphas range on each
scale from .71 to .77.
Validity:
Content: based on a three-level hierarchical model of self-esteem.
Convergent: correlated with the Self-Concept and Motivation Inventory r = .25 and with the
Eysenck Personality Inventory r = .45.
Discriminant: correlated with Hamilton Depression Inventory r = .19.

a. Given the technical information provided, which of the preceding instruments would you select?
b. What additional information would you want to have to make your decision?
References
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Authors.
American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.
Anastasi, A. (1988). Psychological testing (6th ed.). New York, NY: Macmillan.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation.
Beck, A. T., Ward, C., & Mendelson, M. (1961). Beck Depression Inventory (BDI). Archives of General Psychiatry, 4, 561–571.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Cohen, R. J., & Swerdlik, M. E. (2012). Psychological testing and assessment: An introduction to tests and measurement (8th ed.). Boston, MA: McGraw-Hill.
Cronbach, L. J. (1990). Essentials of psychological testing. New York, NY: Harper & Row.
Garred, M., & Gilmore, L. (2009). To WPPSI or to Binet, that is the question: A comparison of the WPPSI-III and SB-5 with typically developing preschoolers. Australian Journal of Guidance and Counselling, 19(2), 104–115. doi:10.1375/ajgc.19.2.104
Gorin, J. S. (2007). Test construction and diagnostic testing. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment in education: Theory and practice (pp. 173–201). Cambridge: Cambridge University Press.
Horn, J. L., & Cattell, R. B. (1966). Age differences in primary mental ability factors. Journal of Gerontology, 21, 210–220.
Horn, J. L., & Cattell, R. B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
Kane, M. T. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Praeger.
Kaufman, A. S., & Kaufman, N. L. (1993). Manual for Kaufman Adolescent and Adult Intelligence Test (KAIT). Circle Pines, MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (2014). Kaufman Assessment Battery for Children, second edition manual (KABC-II). Circle Pines, MN: American Guidance Service.
Kline, P. (2000). The new psychometrics: Science, psychology, and measurement. London: Routledge.
Lane, S. (2014). Validity evidence based on testing consequences. Psicothema, 26(1), 127–135.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: American Council on Education/Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2012). Measurement and assessment in teaching (11th ed.). Boston, MA: Allyn Bacon/Merrill Education.
Miller, S. A. (2012). Developmental research methods (4th ed.). Thousand Oaks, CA: Sage Publications.
Millon, T., Millon, C., & Davis, R. (1994). MCMI-III manual: Millon Clinical Multiaxial Inventory–III. Minneapolis, MN: National Computer Systems.
Minton, B. A., & Pratt, S. (2006). Gifted and highly gifted students: How do they score on the SB-5? Roeper Review, 28, 232–236.
Oren, C., Kennet-Cohen, T., Turvall, E., & Allalouf, A. (2014). Demonstrating the validity of three general scores of PET in predicting higher education achievement in Israel. Psicothema, 26(1), 117–126.
Padilla, J., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26(1), 136–144.
Piers, E. V., & Herzberg, D. S. (2002). Piers–Harris Children’s Self-Concept Scale manual (2nd ed.). Los Angeles, CA: Western Psychological Services.
Psychological Corporation. (1997). WAIS-III/WMS-III technical manual. San Antonio, TX: Author.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2008). Measurement and assessment in education (2nd ed.). Boston, MA: Pearson.
Rios, J., & Wells, C. (2014). Validity evidence based on internal structure. Psicothema, 26(1), 108–116.
Rubio, D. M. (2005). Content validity. Encyclopedia of Social Measurement, 1, 495–498.
Shultz, K. S., Whitney, D. A., & Jickar, M. J. (2014). Measurement theory in action: Case studies and exercises (2nd ed.). Thousand Oaks, CA: Sage Publications.
Sireci, S. G. (1998). The construct of content validity. Social Indicators Research, 45, 83–117.
Sireci, S., & Faulkner-Bond, M. (2014). Validity evidence based on test content. Psicothema, 26(1), 100–107.
Sireci, S., & Padilla, J. (2014). Validating assessments: Introduction to the special section. Psicothema, 26(1), 97–99.
Urbina, S. (2014). Essentials of psychological testing (Vol. 4, 2nd ed.). Hoboken, NJ: John Wiley & Sons.
CHAPTER 7
Selecting, Administering, Scoring, and Interpreting Assessment Results
In the previous chapters, we described the methods and sources of assessment data and
discussed important information about statistical and measurement concepts. It will be
important for you to keep the previous chapters in mind as you move into the next area of
assessment training. Remember that everything we have covered thus far relates to the
practice of assessment and that you should be working to integrate the information into a
comprehensive framework of counseling assessment. In order to illustrate the primary
concepts in this chapter, we will reiterate some information from previous chapters. We
believe that repetition will help reinforce your learning of difficult concepts. Before we
proceed to the next section of the textbook, which focuses on specific areas of assessment
(e.g., intelligence, achievement, aptitude, career, personality), we need to address the pro-
cess of selecting, administering, and scoring assessment instruments and interpreting
assessment results.
instruments.
• Describe the process of evaluating assessment instruments or strategies.
                                              Needed?
                                              Yes          No
Cumulative folder of student
Birth date __________ __________
Family information __________ __________
Record of attendance __________ __________
Permanent report card __________ __________
Academic records from other schools __________ __________
Attendance records __________ __________
State achievement test results __________ __________
Other achievement test results __________ __________
College entrance exam results __________ __________
Health data and records __________ __________
Disciplinary actions __________ __________
Functional behavior assessment __________ __________
Behavioral intervention plans __________ __________
Individualized Education Program __________ __________
Class records
Grades on tests and assignments __________ __________
Progress in reading __________ __________
Reading level __________ __________
Competencies mastered __________ __________
Samples of work __________ __________
Anecdotal observations __________ __________
Notes from parent conferences __________ __________
Counselors should also consider assessment methods that are best suited to the client (e.g.,
paper-and-pencil test, computer assessment) and to the setting (Whiston, 2009). In addition,
counselors need to choose assessment methods that they are qualified to administer
and interpret.
Reference Sources Probably the best source of information about commercial tests is the
Mental Measurements Yearbook (MMY) series. Now in its 19th edition, the yearbook was
founded in 1938 by Oscar K. Buros to provide evaluative information needed for informed
test selection. Published at the Buros Institute of Mental Measurement at the University of
Nebraska–Lincoln, the yearbooks provide descriptive information for over 2,800 tests,
including the publisher, prices, population for which the test is appropriate, and psychometric
information (i.e., reliability, validity, norming data). As a distinctive feature, the
MMY also includes critical reviews by test experts as well as a list of reviewers’ references.
The MMY may be found in hardcopy at academic libraries, electronically through the
EBSCO and OVID/SilverPlatter databases, and at the Buros website (buros.org; for a fee,
test reviews are available online exactly as they appear in the MMY series).
Tests in Print (TIP) is another Buros Institute publication. Now in its eighth edition, it
is a comprehensive listing of all commercially available tests written in the English lan-
guage. It includes the same basic information about a test that is included in the MMY, but
Chapter 7 • Selecting, Administering, Scoring, and Interpreting Assessment Results 135
it does not contain reviews or psychometric information. The TIP guides readers to the
MMY for more detailed information about tests. The TIP can be found in hardcopy in aca-
demic libraries and electronically through the EBSCO database.
Other widely used reference sources include Test Critiques and Tests, both published
by Pro-Ed, Inc. Tests provides a comprehensive listing of all tests available in the English
language. It provides descriptions of tests, but does not contain critical reviews or psycho-
metric information; this information can be found for selected instruments in Test Critiques.
Test Critiques, designed to be a companion to Tests, contains a three-part description of
each test: Introduction, Practical Application/Uses, and Technical Aspects. It also pro-
vides psychometric information and a critical review of each test.
Publishers and Website Catalogs All of the major test publishers have websites that
include online catalogs of their products. These online catalogs can be a good source of
descriptive information about the most recent editions of tests. The information may
include the cost of materials and scoring, types of scoring services, and ancillary materials
available through the publisher. It is important to remember that publishers’ websites are
marketing tools aimed at selling products; they do not provide all the information needed
to fully evaluate a test. Counselors can also order a specimen set of the test from the pub-
lisher, which includes the manual, test booklet, answer sheet, and scoring key. Table 7.2
provides a list of some of the major U.S. test publishers and their website addresses. Many
publishers also provide hardcopy versions of their catalogs.
Manuals Test manuals, available from test publishers, provide administrative and technical information about a test. Indeed, because most of the information one would want about a particular psychological or educational test can be found in test manuals, they are the primary source to consult in the test selection process. Test manuals should be complete, accurate, current, and clear. The manual should provide information about test specifications (i.e., purpose of the test, definition of constructs measured, description of the population for which the test is intended, information about interpretation) and should meet the general standards for the preparation and publication of test documentation, such as manuals and user guides (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014). Figure 7.2 shows a checklist summary of those standards.
                                                                        Yes          No
1. Test manual available at time of publication of test __________ __________
2. Test manual is complete, accurate, and clearly written __________ __________
3. Rationale and uses of test discussed __________ __________
4. User cautioned about possible misuses __________ __________
5. Norming population described __________ __________
6. Reliability evidence provided __________ __________
7. Validity evidence provided __________ __________
8. Special qualifications of users stated __________ __________
9. Bibliography of research and studies on test presented __________ __________
10. Test manual updated and revised when new edition of test was published __________ __________
11. Test administration conditions and modes explained __________ __________
12. Interpretive aids provided for test takers __________ __________
13. Test interpretation easy to understand __________ __________
14. Evidence supporting the accuracy of computer-generated interpretations __________ __________
15. Automated test interpretation service available __________ __________
16. Rationale presented and conceptualized if cutoff scores are given __________ __________
17. Technical information is presented about appropriateness of the instrument for diverse groups, e.g., age, grade level, language, cultural background, gender __________ __________
18. Method of recommended linguistic modification described in detail __________ __________
Internet Resources The Internet provides a number of ways to search for test informa-
tion. A test locator or test collection allows users to search for information about instru-
ments from a variety of sources. Access to test locators is available through several
sponsored websites, including the Buros Institute (buros.unl.edu/buros/jsp/search.jsp),
the Educational Testing Service (ETS; ets.org/testcoll), and the ERIC Clearinghouse on
validity). For most published formal assessment instruments, the test manual is the primary
source of evaluative information. However, prior to selecting an assessment instrument
for close evaluation, you may elect to read published test reviews as a primary
evaluation. This section presents several questions to guide professionals in evaluating
and selecting a formal assessment instrument or strategy. As you read, you should recall
information from previous chapters and attempt to apply that information to the infor-
mation here. By going through these questions and applying the information you have
already learned, you will improve your ability to successfully evaluate an assessment
instrument.
What is the purpose of the instrument? Who is the intended population? The first
question to ask when investigating an instrument is whether the purpose of the
instrument is appropriate for the counselor’s needs. If an instrument does not meas-
ure the behavior or construct of interest, then there is no need to further evaluate the
instrument. Another factor is the extent to which an instrument is appropriate for the
individual(s) being assessed. The manual should clearly state the instrument’s rec-
ommended uses as well as describe the population for which it was intended.
What is the makeup of the norm group? This question addresses whether the sample
of individuals (i.e., norm group) used during instrument development represents the
population of potential examinees. The norm group must reflect the population from
which it is drawn. This is particularly important with norm-referenced instruments,
because it is the norm group’s performance to which an examinee’s performance is
compared and by which it is interpreted. Counselors should evaluate the suitability
of the norm group in terms of representativeness, the year that the sample was gath-
ered, and the size of the norm group (see Chapter 4).
Are the results of the instrument reliable? Reliability addresses the degree to which
scores are consistent, dependable, and stable. Any fluctuation in scores that results
from factors irrelevant to what is being measured is called measurement error. There
are several methods of estimating the reliability applicable to the various sources of
measurement error, including test-retest, alternate forms, internal consistency, and
interrater reliability (see Chapter 5).
Do the instrument’s results have evidence of validity? Validity refers to whether the
claims and decisions that are made on the basis of assessment results are sound or
appropriate. Evidence of validity can be obtained by systematically examining the
content of the instrument, by considering how the instrument’s scores relate to other
similar instruments, or by considering the association between scores and other vari-
ables related to the construct being measured. The manual or published research
studies are sources of validity information (see Chapter 6).
Does the instrument’s manual provide clear and detailed instructions about admin-
istration procedures? All administration specifications should be fully described in
the manual, including instructions and time limits. Depending on the type of instru-
ment, other administration issues may be presented, including the use of reference
materials and calculators, lighting, equipment, seating, monitoring, room require-
ments, testing sequence, and time of day.
Does the manual provide sufficient information about scoring, interpreting, and
reporting results? The instrument’s manual should present information about the
materials and resources available to aid in scoring the instrument, such as scoring soft-
ware, mail-in scoring services, or scoring keys and templates. Information should also
be provided about the methods used to interpret and report results. Many instrument
developers provide computer-generated profiles and narrative reports based on test
results. Counselors need to determine whether profiles are clear and easy to under-
stand and whether narrative reports provide accurate and comprehensive information.
Is the instrument biased? An instrument is considered biased if differences in results
are attributable to demographic variables (e.g., gender, race, ethnicity, culture, age,
language, geographic region) rather than to the construct being measured. Instru-
ment developers are expected to exhibit sensitivity to the demographic characteris-
tics of examinees and to document appropriate steps taken to minimize bias (see
Chapter 6).
What level of competency is needed to use the instrument? In the past few decades,
increasing concerns about the possibility of test misuse have led professional organiza-
tions to disseminate information about test user competencies and qualifications. To
use certain assessment instruments, counselors need education, training, and
Time Required to Administer the Instrument: The time it takes to give a test may
be a factor. In a school setting, can the test be administered during the regular class
period, or does the examiner need more time? In an outpatient mental health coun-
seling center, can an instrument be completed by the client within the traditional
50-minute time period for individual counseling? How many individually administered
tests can an examiner reasonably schedule during a day? Longer tests generally yield
more reliable results, but counselors must weigh how much reliability a particular
purpose requires against the additional testing time.
Ease of Administration: Administration of tests is discussed in more detail later in
the chapter, but it should be noted here that there are different procedures for test
administration. Some require extensive training to administer and score; others
do not. Some tests are more difficult to administer because they have a number
of individually timed subtests and elaborate instructions for the test taker. The
test users should read through the test manual and evaluate the difficulty of test
administration.
Ease of Scoring: How an instrument is scored is an important issue, because it
is possible for scoring to take more time than administering a test. Tests may be
hand scored, computer scored, or sent to the publisher for scoring. In most cases,
hand-scored instruments are more time-consuming; if the instrument’s publishers
provide scoring templates or answer keys, then hand-scoring may be somewhat
quicker. Some instruments also require considerable training and experience on
the part of the examiner.
Ease of Interpretation: Results are not useful unless they are interpretable, and
both test developers and examiners are expected to provide explanation. Many
instruments provide computer-based interpretations based on test results. Instru-
ments may also provide detailed sections in the manual, or separate manuals, that
focus specifically on interpretation. The better tests have sample or illustrative
case studies. Test users should also check to see whether an instrument provides
computer-generated narratives, profile or summary sheets, or other materials to
guide the test takers in understanding the results.
Format: Just as in the evaluation of other printed material, test users should con-
sider factors such as size of print, attractiveness of format, and clarity of illustra-
tions. Some tests are attractively designed and utilize a variety of colors and print
sizes. Some, however, have print that is too small or dark paper that is hard to
read. Some tests may use double columns to cut down on necessary eye move-
ments; others spread the items across the whole page. The test user should think
of the test taker when evaluating such characteristics. An attractive format may
provide more valid results.
Readability: The readability of the test is an important factor. In general, unless
the intent is to test the reading level or verbal facility of the test taker, the reading
level should be kept simple so that the desired construct is measured rather than a
reading comprehension factor. Even a test presented in audio format should have
an appropriate reading level and vocabulary to ensure comprehension.
Cost of the Instrument: Cost is an important feature when considering a particular
instrument, because most schools and agencies have limited budgets. Purchasing a
commercially available instrument can become quite expensive and may require buy-
ing the manual, test booklets, answer sheets, scoring templates, computer software, or
ancillary materials. Some test publishers now lease tests, especially achievement test
batteries, rather than requiring the user to purchase them. Also, some test booklets are
reusable, requiring the examiner to purchase only answer sheets for another testing.
Computer software might increase the initial outlay for an instrument, but may
become more cost effective per test administration if the instrument is used frequently.
There are also scoring services to which most of the major tests and test batteries can
be sent for scoring and interpretation—at an additional cost, of course.
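The trade-off raised under Time Required to Administer the Instrument, that longer tests tend to be more reliable, is commonly quantified with the Spearman-Brown prophecy formula. The sketch below is illustrative only. It assumes that any added items are parallel to the existing ones, and the starting reliability of .70 is an invented figure, not a value from any test manual. The function names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after multiplying test length by `length_factor`.

    Assumes the added (or removed) items are parallel to the existing items.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be between 0 and 1 (exclusive)")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability: float, target: float) -> float:
    """Return how many times longer the test must be to reach `target`."""
    if not (0.0 < reliability < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliability and target must be between 0 and 1 (exclusive)")
    return (target * (1 - reliability)) / (reliability * (1 - target))


if __name__ == "__main__":
    current = 0.70  # invented starting reliability, for illustration only
    for factor in (0.5, 1, 2, 3):
        predicted = spearman_brown(current, factor)
        print(f"{factor} x length -> predicted reliability {predicted:.2f}")
    needed = length_factor_needed(current, 0.90)
    print(f"Length factor needed to reach .90: {needed:.2f}")
```

Under these assumptions, doubling a test whose reliability is .70 predicts a reliability of about .82, while reaching .90 would require nearly four times as many items. Gains of that kind have to be weighed against the available testing time.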
Using an Instrument Evaluation Form One way to make the process of evaluating and
selecting assessment instruments easier is to use an instrument evaluation form (see
Figure 7.3). Such a form provides a convenient way to document important aspects of an
instrument and increases the likelihood that important facts will not be overlooked. When
utilizing an instrument evaluation form, counselors are encouraged to obtain information
from both the instrument manual and a reference source, preferably one that provides
reviews (such as the MMY). An important aspect of using an instrument evaluation form
is identifying the instrument’s strengths and weaknesses. This aids in the final process of
integrating information and making a selection decision. It is imperative that counselors
have the requisite skills to evaluate a test before using it in practice. Exercises 7.1 and 7.2
provide an opportunity to practice the test review process and to test your knowledge.
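For counselors who keep their reviews electronically, an evaluation form of this kind can be represented as a simple record. The following is a minimal sketch, not a reproduction of Figure 7.3: the class name, field names, and sample entries are assumptions loosely based on the evaluation questions discussed in this chapter.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class InstrumentEvaluation:
    """Hypothetical record mirroring the evaluation questions in this chapter."""
    title: str
    purpose: str = ""
    norm_group: str = ""
    reliability: str = ""
    validity: str = ""
    administration: str = ""
    scoring_and_interpretation: str = ""
    bias: str = ""
    user_competency: str = ""
    practical_features: str = ""
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)

    def unanswered(self) -> List[str]:
        """List the sections still empty, so that nothing is overlooked."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    # Invented entries for a fictitious instrument.
    form = InstrumentEvaluation(
        title="Hypothetical Mood Inventory",
        purpose="Screen adult clients for depressive symptoms",
        reliability="Manual reports internal consistency and test-retest data",
    )
    form.strengths.append("Brief administration time")
    print("Sections still to complete:", form.unanswered())
```

Listing the unanswered sections serves the same purpose as the paper form: it shows at a glance which aspects of the instrument still need to be checked against the manual or a published review.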
Exercise 7.1
Conducting a Test Review: Beck Depression Inventory-II
You are a mental health counselor currently working in an outpatient counseling clinic with adult clients. You are considering whether or not to adopt the Beck Depression Inventory-II (BDI-II; Beck, Steer, & Brown, 1996) in your practice. Many of your clients seem to suffer from depression, and you think that using an instrument that specifically assesses depressive symptoms would help you in making an accurate diagnosis. You currently work 40 hours each week at the clinic and see about 20 clients each week for hour-long individual counseling sessions. The rest of your time is devoted to treatment planning, writing progress notes, staff meetings, and supervision.
Information about the BDI-II is located in Chapter 13. After reviewing the information about the BDI-II, answer exercise questions 1–7.
Norm Group
Conduct a search of the Mental Measurements Yearbook (MMY) for reviews of the Beck Depression Inventory II. As you read the reviews, make note of the following pieces of information related to the instrument.
1. A clinical sample was used in the development of the BDI-II. In the reviews, note the number of participants, geographic locations of the sample, demographic data (e.g., number of men vs. women, average age, racial/ethnic distributions), and any other relevant data about the participants.
2. A nonclinical sample was also used in the development of the BDI-II. In the reviews, note the number of participants, geographic locations of the sample, demographic data (e.g., number of men vs. women, average age, racial/ethnic distributions), and any other relevant data about the participants.
Reliability
3. Internal consistency was one of the methods used to establish reliability of the BDI-II. Make note in the reviews of the data related to internal consistency.
4. Test-retest reliability: Test-retest reliability was the other method of establishing reliability for the BDI-II. Make note in the reviews of the data related to test-retest reliability.
Validity
5. Content validity: Examine the test reviews to determine the criteria that the BDI-II item content was validated against.
6. Convergent validity: Correlations were examined between the BDI-II and other instruments. Consider the various other tests that were used to validate the BDI-II.
7. Discriminant validity: Several tests were performed to determine expected differences between the BDI-II and certain scales on other instruments. Search the MMY reviews for information related to discriminant validity.
After reviewing all of the information in items 1–7, answer the exercise questions that follow.
Exercise Questions:
1. Describe and evaluate the norm group. Do you think it is representative? Do you think the norm group is current? Do you believe the size of the norm group was large enough? Are the samples related to the population you intend to use the test with? Explain.
2. Describe and evaluate each method used to estimate reliability. Does the reliability evidence support a decision to use the instrument? Explain.
3. Describe and evaluate each type of validity evidence.
4. Does the validity evidence support a decision to use the instrument? Explain.
5. Describe the practical aspects of the instrument, focusing on issues related to time required for administration, ease of administration, and ease of scoring.
6. Summarize the strengths and weaknesses of the inventory. Based on your review of the BDI-II, would you adopt this instrument? Explain your answer.
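Items 3 and 4 of Exercise 7.1 ask for internal consistency and test-retest data. As a reminder of what such coefficients summarize, the sketch below computes coefficient alpha and a test-retest correlation using only the Python standard library. The response data are invented for illustration and have no connection to the BDI-II or to any published instrument. The function names are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Coefficient alpha; `responses` holds one list of item scores per person."""
    k = len(responses[0])            # number of items
    columns = list(zip(*responses))  # scores regrouped by item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


def pearson_r(x, y):
    """Pearson correlation, e.g., between first and second administrations."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x ** 0.5 * ss_y ** 0.5)


if __name__ == "__main__":
    # Invented data: five respondents answering four items scored 0 to 3.
    responses = [
        [0, 1, 0, 1],
        [2, 2, 1, 2],
        [3, 3, 2, 3],
        [1, 0, 1, 1],
        [2, 3, 2, 2],
    ]
    print(f"Coefficient alpha: {cronbach_alpha(responses):.2f}")

    # Invented total scores for the same five people at two time points.
    time_1 = [2, 7, 11, 3, 9]
    time_2 = [3, 6, 10, 4, 9]
    print(f"Test-retest r: {pearson_r(time_1, time_2):.2f}")
```

Reviews in the MMY report coefficients like these for the actual norm samples. The values themselves still have to be judged against the intended use of the instrument.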
Exercise 7.2
Conducting a Test Review: Coopersmith Self-Esteem Inventory
You are a middle school counselor starting a new group aimed at enhancing self-concept in students. You work at a large urban school with a culturally and linguistically diverse population of students. You have heard about the Coopersmith Self-Esteem Inventory (SEI), but you’re not sure if it’s the appropriate instrument to use.
After reviewing the information ahead about the Coopersmith SEI, answer the questions that follow.
Description:
The Coopersmith SEI (Coopersmith, 1981) measures evaluative attitudes toward the self in social, academic, family, and personal areas of experience. Coopersmith defined self-esteem as a judgment of worthiness that is expressed by the attitudes an individual holds toward his or her self. Coopersmith believes that self-esteem is significantly associated with effective functioning, such as school performance.
Each questionnaire presents respondents with generally favorable or generally unfavorable statements about the self, which they indicate as Like Me or Unlike Me. The School Form is a 50-item inventory designed for 8- to 15-year-old children. It provides a Total Self Score as well as scores on four subscales: General Self (Gen), Social Self/Peers (Soc), Home/Parents (H), and School/Academic (Sch). The School Form is accompanied by an eight-item Lie Scale to assess defensiveness. The School Short Form is comprised of 25 items from the School Form. The Adult Form is an adaptation of the School Short Form for individuals over 15 years of age.
Administration time rarely exceeds 10 minutes. The instrument can be hand scored in a few minutes via scoring keys. For interpretation, high scores correspond to high self-esteem. A high Lie Scale score suggests defensiveness (indicates that the test taker attempted to respond positively to all items).
Technical Information
Norm group: The SEI was administered to 643 public school children in grades 3 through 8. The sample consisted primarily of students from the lower and middle upper socioeconomic ranges. The test manual stated that “a considerable number of Spanish surnamed and Black children were included in the sample.” The manual strongly recommends that users develop local norm groups.
Reliability
Test-retest: The test-retest reliability coefficient after a 5-week interval (with a sample of 30 fifth graders) was .88. Test-retest reliability after a three-year interval (with a sample of 56 public school children) was .70.
Internal consistency: Studies reported KR20 coefficients ranging from .87 to .92 on scores for school children in grades 4 to 8.
Alternate forms: A study comparing the SEI to a Canadian version of the test (using a sample of 198 children in third through sixth grades) found correlation coefficients ranging from .71 to .80.
Validity
Content validity evidence: Most of the items on the SEI School Form were adapted from scale items used by Rogers and Dymond (1954) in their classic study of nondirective
Chapter 7 • Selecting, Administering, Scoring, and Interpreting Assessment Results 145
Pretesting Procedures
__________ Review manuals and materials.
__________ Read and practice the directions for administering the test.
__________ Schedule room and facilities.
__________ Orient client(s) about the purpose of testing.
__________ Send out notice to remind client(s) of testing time.
__________ Get informed consent if needed.
__________ Identify any special materials or tools needed.
__________ Have testing materials on hand.
__________ Highlight directions in the manual.
__________ Decide on order of administration of tests.
__________ Decide on procedures for collection of test materials.
__________ Prepare answers for specific questions clients might ask.
Before Administration
The first major responsibility of the counselor prior to administering the instrument is to
know the instrument thoroughly. Counselors should review the manual, forms, answer
sheets, and other materials. They should also be familiar with the content of the instru-
ment, the type of items, and the directions for administering the test. One of the best ways
to become familiar with an assessment instrument is to follow the procedures and actually
take the test.
For many instruments, there are management tasks that must be accomplished prior
to the day of administration. For example, when administering a group achievement test,
it is necessary to secure the appropriate number of tests as well as answer sheets and other
test materials. Other pretesting procedures may include scheduling the date for
administering the test; scheduling rooms or facilities; ensuring accurate numbers of booklets,
answer sheets, pencils, and any other needed materials; arranging materials for distribution;
and so on.
Examiners must also obtain and document informed consent from examinees orally or in
writing prior to the assessment (American Counseling Association (ACA), 2014). Clients
must be informed about the nature and purpose of the evaluation and of information con-
cerning confidentiality limits and how the security of test results will be maintained (Urbina,
2014). If the client is a minor, then it is necessary to get parental consent before the assess-
ment process begins. Furthermore, the Code of Ethics of the National Board for Certified
Counselors (NBCC, 2013) states that prior to administering assessment instruments or tech-
niques counselors must provide specific information to examinees so that the results may be
put in the proper perspective with other relevant factors. Explaining why an instrument is
being administered, who will have access to the results, and why it is in the test taker’s best
interest to participate in the assessment process can help ensure cooperation (Graham, 2011).
Examiners can provide orientation sessions that can cover the following topics:
1. Purpose of the assessment instrument
2. Criteria used for selecting the assessment instrument
3. Conditions under which the instrument is to be taken
4. Range of skills or domains to be measured
5. Administrative procedures and concerns (e.g., group or individual administration,
time involved, cost)
6. Types of questions on the instrument and an overview
7. Type of scoring, method, and schedule for reporting results to the test taker
Many standardized tests provide sample items and an overview of the test. The examiner
should be sure that all who are going to take a given test have had specific practice with
sample problems or have worked on test-taking skills prior to the test. This requirement is
especially appropriate for aptitude, achievement, and ability testing.
When administering an instrument to a large number of clients or students, the coun-
selor will need help from other individuals, such as teachers, administrators, or counse-
lors. These assistants must be trained on the instrument; this includes a general overview
of the instrument and preferably some practice giving and taking it. Hands-on experience
helps in identifying some of the types of problems that might arise—for example, what to
do with clients or students who finish early. All administrators need to know the guide-
lines for answering questions about the test and the importance of following standardized
procedures in administering the test. In order to make determinations about the test and
its appropriateness for certain clients, counselors should have a comprehensive under-
standing of validity and reliability. Exercise 7.3 provides an opportunity to practice an
evaluation of validity and reliability for a pair of instruments. Review the information and
then complete the exercise questions.
During Administration
When it’s time to administer an assessment instrument, the counselor may begin with a
final check to see that all is in order. For group tests, examiners need to check on materials,
lighting, ventilation, seating arrangements, clear work space for examinees, sharpened
pencils, a “Do Not Disturb” sign, and so on. Individual tests should be administered in a
Exercise 7.3
Examining Instrument Validity and Reliability
Concurrent validity evidence: SEI scores correlated with the SRA Achievement Series and the Lorge Thorndike Intelligence Test at .33 and .30, respectively.
Predictive validity evidence: Reading Gifted Evaluation Scale (a measure of reading achievement) scores correlated with the SEI General Self subscale and the Lie Scale scores at .35 and .39, respectively.
Convergent validity evidence: Correlation between SEI scores and the California Psychological Inventory Self-Acceptance Scale was .45.
quiet, comfortable place. One of the most important tasks on standardized instruments is
to deliver verbatim instructions given in the test manual and to follow the stated sequence
and timing. Any deviation may change the nature of the tasks on the instrument and may
negate any comparison of results with those of the norming group. There are numerous
ways that assessment instruments can be administered to a client, each with distinct
advantages and disadvantages. Regardless of which mode or format is used, counselors
are responsible for following specified administration procedures.
The counselor also needs to establish rapport with the examinees. For some, the
assessment process may be a new and frightening experience; they may feel fear, frustra-
tion, hostility, or anxiety. When individually administering an instrument, the counselor
can assess these emotional and motivational factors and positively support the test taker.
In group administration, it is harder to establish rapport, but the examiner can be warm,
friendly, and enthusiastic. The goal is for the results to give a valid picture of the attrib-
utes measured, so the examiner should encourage the examinees to do their best on each
task. Examiners should also recognize the need to be genuine and to understand and
recognize personal biases in order to be both positive and objective. One way this is done
is by listening carefully and observing nonverbal cues. Impartial treatment of all those
being assessed is essential.
Throughout the course of administration, the counselor must be alert to the unique
problems of special populations. Young children and individuals with disabilities may
need shorter test periods and perhaps smaller numbers in the testing group. The exam-
iner may have to administer tests individually or make special provisions for visual,
auditory, or perceptual-motor impairments, being sure to record any deviation from
standardized administrative procedures. Many assessment instruments are designed to
test people with disabilities or give suggested procedures to accommodate various dis-
abilities. The test administrator must also carefully observe what is going on during the
process of administering the instrument. The examiner should record any test behavior
or other critical incidents that may increase or reduce an individual’s opportunity to
perform to capacity. Some instruments have observation forms or rating scales on which
an examiner can record an individual’s test behavior (see Figure 7.5). Problems in test
administration are often the result of inadequate preparation for test taking. The aware-
ness or orientation phase is an important element in helping to alleviate response-set
problems, anxiety, and tension. However, certain problems present themselves only
during the testing situation. Some of these, with possible solutions, are detailed in
Table 7.6.
Attention: Low 1 2 3 4 5 High
Response time: Slow 1 2 3 4 5 Quick
Activity level: Passive 1 2 3 4 5 Active
Security: Ill at ease 1 2 3 4 5 Calm and collected
Anxiety: Low 1 2 3 4 5 High
Relationship to examiner: Poor 1 2 3 4 5 Good
Need for reinforcement: Low 1 2 3 4 5 High
Task orientation: Gives up 1 2 3 4 5 Sticks to task
Reaction to failure: Poor 1 2 3 4 5 Good
Interest in test: Low 1 2 3 4 5 High
After Administration
Once the administration is completed, the counselor may have several more tasks to attend
to. For example, when administering a group achievement test, the counselor may need to
collect materials according to a predetermined order, counting the test booklets and answer
sheets and arranging them all face up. In addition, everything should be put back in the
testing kit in the proper way so that it is ready for future use. The counselor should also
take time immediately to record any incident that might invalidate scores.
scoring process and instructions and conversion charts for transforming raw scores into
standard scores. Hand scoring is neither efficient nor cost-effective given the time it takes
to score the instrument, the need for qualified scorers, and the propensity for errors. Com-
puter scoring involves inputting test results or scanning an answer sheet into a software
program, which then automatically generates test results. Compared to hand scoring,
computer scoring is typically easier to perform, takes less time, requires scorers with
less training, and produces fewer scoring errors. However, we recommend that you
understand the process of scoring and the information related to standard scores in order to
develop competency in the interpretation of computer-scored assessment instruments.
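The conversion of raw scores into standard scores mentioned above can be illustrated with a short sketch. It assumes a simple linear transformation based on a norm group's mean and standard deviation; the function names and the norm values (mean 70, SD 8) are hypothetical and are not taken from any instrument's manual. Published instruments often supply their own conversion tables, which should take precedence over a formula like this one.

```python
def z_score(raw, norm_mean, norm_sd):
    """Express a raw score in standard deviation units from the norm mean."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw - norm_mean) / norm_sd


def t_score(raw, norm_mean, norm_sd):
    """Rescale the z score to a T score (mean 50, SD 10)."""
    return 50 + 10 * z_score(raw, norm_mean, norm_sd)


def deviation_score(raw, norm_mean, norm_sd):
    """Rescale the z score to a mean of 100 and an SD of 15."""
    return 100 + 15 * z_score(raw, norm_mean, norm_sd)


# Hypothetical norm group: mean raw score 70, standard deviation 8.
print(z_score(78, 70, 8))          # 1.0
print(t_score(78, 70, 8))          # 60.0
print(deviation_score(78, 70, 8))  # 115.0
```

Working one conversion by hand and comparing it with the computer-generated score is a quick way to confirm that the correct norm table was applied.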
Some hand-scored instruments demand that the scorer judge the degree of correct-
ness of the response or compare the responses to standards provided. For example, essay
questions on many college placement examinations are scored using a holistic scoring
procedure. The raters have model answers that have been given certain weights, and they
compare the examinee’s essays to these. Such raters are asked to assess answers on a
4-point scale and to make an overall rating, or holistic judgment, rather than assign a cer-
tain number of points to each possible component of the answer.
Because rubrics rely heavily on the subjective judgment of the individual scoring the
assessment, it is important to consider differences in scorers as a potential source of scor-
ing errors. Interrater reliability should be estimated to determine how consistently scorers
implement the scoring rubric. One way to improve the reliability of a scoring rubric is to
have at least two individuals score the instrument; if there are discrepancies, then the
results are checked by a third reviewer. Workshops for scoring and frequent consistency
checks often improve reliability of the scoring, and renewal sessions for readers allow
them to review responses from earlier tests. Advanced Placement tests and many state
assessment tests use procedures similar to these to maintain scoring consistency.
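One way to check how consistently two scorers apply a rubric is sketched below. The text does not name a specific statistic, so this example uses simple percent agreement and Cohen's kappa (a chance-corrected agreement index) as one common choice. It also lists the responses on which the scorers disagree so that they can be sent to a third reviewer. The ratings on the 4-point scale and all function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of responses that both raters scored identically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:  # both raters used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def needs_third_review(rater_a, rater_b):
    """Indexes of responses on which the two raters disagree."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]


# Hypothetical holistic ratings (1 to 4) of eight essays by two scorers.
scorer_1 = [4, 3, 2, 4, 1, 3, 2, 4]
scorer_2 = [4, 3, 2, 3, 1, 3, 1, 4]

print(percent_agreement(scorer_1, scorer_2))       # 0.75
print(round(cohens_kappa(scorer_1, scorer_2), 2))  # 0.67
print(needs_third_review(scorer_1, scorer_2))      # [3, 6]
```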
Scoring Errors
A key issue in scoring assessment instruments is scoring errors, which can significantly affect
how test results are interpreted. Scoring errors occur frequently, regardless of who is scoring
the instrument or the scorer’s level of experience with testing. Allard and Faust (2000) found
that 20 (13.3%) out of 150 administrations of the Beck Depression Inventory, 56 (28.7%) of 300
administrations of the State-Trait Anxiety Inventory, and 22 (14.7%) of 150 administrations
of the MMPI-2 had scoring errors of some kind. Even test takers make scoring errors: Simons,
Goddard, and Patton (2002) found that scoring errors by test takers ranged from approxi-
mately 20% to 66%. Typical scoring errors include assignment of incorrect score values to
individual responses, incorrectly recording responses, incorrectly converting raw scores to
derived scores, and making calculation errors. Instrument developers have certain responsi-
bilities in making clear to the examiner how the instrument is to be scored. In addition, the
examiner has specific responsibilities to ensure that the test is scored correctly.
Computer scoring helps reduce scoring errors, but errors still occur if data is inputted
incorrectly. If optical scanners are used to input data, then answer sheets must be carefully
examined for incomplete erasures or other such problems prior to the scanning. When
hand scoring instruments, errors can be reduced if scorers understand the absolute impor-
tance of accuracy and if procedures are instituted to regularly monitor the required calcu-
lations and score transformations (Urbina, 2014).
2. Ensure the accuracy of the assessment results by conducting reasonable quality con-
trol procedures before, during, and after scoring.
3. When using a new test, perform trial or practice runs to establish competency prior
to administering to a client.
4. Create test-specific standards.
5. Minimize the effect on scoring of factors irrelevant to the purposes of the assessment.
6. Inform users promptly of any deviation in the planned scoring and reporting service,
or schedule and negotiate a solution with users.
7. Provide corrected score results to the examinee or the client as quickly as practicable
should errors be found that may affect the inferences made on the basis of the scores.
8. Protect the confidentiality of information that identifies individuals as prescribed by
state and federal laws.
9. Release summary results of the assessment only to those persons entitled to such
information by state or federal law or to those who are designated by the party con-
tracting for the scoring services.
10. Establish, where feasible, a fair and reasonable process for appeal and rescoring the
assessment.
Testing Practices in Education (2004) states that test users should avoid using a single test
score as the sole determinant of decisions about test takers. Test scores should be inter-
preted in conjunction with other information about individuals.
Computer-based assessment instruments often provide computer-generated reports
or narratives. These reports are canned interpretations that the computer generates when
certain test scores are obtained (Butcher, 2012). The reports may contain very complex and
detailed statements or paragraphs that are to be printed out. It’s important that counselors
not view computer-generated reports or narratives as standalone interpretations. Comput-
ers are unable to take into account the uniqueness of the test taker and incorporate impor-
tant contextual elements, such as a client’s personal history, life events, or current stressors.
Therefore, computer-generated reports or narratives are considered broad, general
descriptions that should not be used without the evaluation of a skilled counselor. If a
counselor chooses to use computer-generated reports, it is the counselor who is ultimately
accountable for the accuracy of interpretations.
Interpreting assessment instruments requires knowledge of and experience with the
instrument, the scores, and the decisions to be made. The Responsibilities of Users of Stand-
ardized Tests document (Association for Assessment in Counseling, 2003) identifies several
factors that can impact the validity of score interpretations. These include the following:
• Psychometric Factors Factors such as the reliability, norms, standard error of meas-
urement, and validity of the instrument can impact an individual’s scores and the
interpretation of test results.
• Test-Taker Factors The test taker’s group membership (e.g., gender, age, ethnicity,
race, socioeconomic status, relationship status) is a critical factor in the interpretation
of test results. Test users should evaluate how the test taker’s group membership can
affect his or her test results.
• Contextual Factors When interpreting results, test users should consider the rela-
tionship of the test to the instructional program, opportunity to learn, quality of the
educational program, work and home environment, and other factors. For example,
if the test does not align to curriculum standards and how those standards are taught
in the classroom, then the test results may not provide useful information.
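The standard error of measurement (SEM) listed among the psychometric factors above is commonly estimated in classical test theory as SEM = SD × sqrt(1 − reliability). The sketch below applies that formula and builds a band of plus or minus one SEM around an observed score. The standard deviation of 10, the reliability of .88, the observed score of 55, and the function names are illustrative assumptions only.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical test theory estimate: SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, n_sem=1):
    """Range of observed score plus or minus n_sem standard errors."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - n_sem * sem, observed + n_sem * sem


# Illustrative values: T-score metric (SD 10) and a reliability of .88.
sem = standard_error_of_measurement(10, 0.88)
low, high = score_band(55, 10, 0.88)

print(round(sem, 2))                  # 3.46
print(round(low, 1), round(high, 1))  # 51.5 58.5
```

Reporting the band rather than the single score reminds the reader that a lower reliability coefficient widens the range within which the score should be interpreted.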
Summary
This chapter presented information relevant to selecting, administering, scoring, and interpreting assessment results. Counselors need to be aware of the steps involved in selecting appropriate assessment instruments, which include identifying the type of information needed, identifying available information, determining the methods for obtaining information, searching assessment resources, and evaluating and selecting an assessment instrument or strategy.
How an instrument is administered can affect the accuracy and validity of the results. Counselors must have important competencies, such as knowledge of and training in the instrument being administered. The counselor must be aware of the many tasks that occur before, during, and after the administration process.
Studies have shown that examiners make numerous errors in scoring assessment instruments. Consequently, a number of guidelines and standards address scoring procedures. The scoring of assessments should be conducted properly and efficiently so that the results are reported accurately and in a timely manner.
To accurately interpret instrument results, counselors need to be well informed about the various types of scores, such as percentiles, standard scores, and age and grade equivalents. When interpreting scores, counselors should also consider any major differences between the norm group and the test takers.
Suggested Activities
1. Devise a checklist to use to evaluate assessment instruments.
2. Critically review an assessment instrument in your field. Use the guidelines for evaluating and selecting instruments presented in this chapter to organize your review.
3. Find a test review in the MMY and summarize the review. What strengths and weaknesses do the reviewers emphasize?
4. Read the manual for administration for a widely used test, such as the SAT or the Graduate Record Examination. Interview an individual who is responsible for administering the test. What kinds of problems has the person encountered in administering the test? How adequate were the directions in the manual for handling those problem situations?
5. Discuss the following four scenarios:
Scenario 1
You are scheduled to administer a diagnostic achievement test to a 6-year-old child. You usually have no problem establishing rapport, but when you get into the testing room, the child says to you, “I don’t like you. I am not going to take any tests for you!” The child puts her hands over her ears.
Scenario 2
You are administering a state achievement test to a large group of students, and you see evidence of a student cheating.
Scenario 3
You are administering a symptom inventory to a client in an outpatient counseling center. After reading the instructions, you leave the client alone to complete the inventory. After the client has completed the instrument and left the counseling center, you look over his results and notice that he has left a large number of items unanswered.
Scenario 4
You are administering a test to a group and have read the instructions to the individuals and gotten them started on the test. Five minutes into the test, one of the test takers raises his hand, asks a question loudly, and disrupts others who are taking the test.
6. Read the case study and answer the questions that follow:
A company administered a basic skills test to aid in selecting new employees. The assistant to the division supervisor administered the test. Because the assistant administering the test was often interrupted by telephone calls and minor crises, he paid little attention to the applicants taking the test. He estimated the time limit allowed for taking the test.
a. What steps could be taken to ensure that the test is administered properly and fairly to all applicants?
b. What other procedures would you implement?
References
Allard, G., & Faust, D. (2000). Errors in scoring objective personality tests. Assessment, 7(2), 119–131.
American Counseling Association (ACA). (2014). ACA Code of Ethics. Alexandria, VA: Author.
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Authors.
American Psychological Association (APA). (2010). Ethical principles of psychologists and code of conduct (Rev. ed.). Washington, DC: Author.
Association for Assessment and Research in Counseling. (2003). Responsibilities of users of standardized tests (RUST). Alexandria, VA: Author.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation.
Butcher, J. N. (2012). Computerized psychological assessment. In I. B. Weiner, D. K. Freedheim, J. A. Schinka, & W. F. Velicer (Eds.), Handbook of psychology (pp. 165–191). Hoboken, NJ: John Wiley & Sons.
Coopersmith, S. (1981). Coopersmith Self-Esteem Inventories: Manual. Menlo Park, CA: Mind Garden, Inc.
Graham, J. R. (2011). MMPI-2: Assessing personality and psychopathology (5th ed.). New York, NY: Oxford University Press.
International Test Commission (ITC). (2000). International guidelines for test use. Retrieved from http://www.intestcom.org/itc_projects.htm
National Board for Certified Counselors (NBCC). (2013). Code of ethics. Greensboro, NC: Author.
Simons, R., Goddard, R., & Patton, W. (2002). Hand-scoring error rates in psychological testing. Assessment, 9, 292–300.
Urbina, S. (2014). Essentials of psychological testing. Hoboken, NJ: John Wiley & Sons.
Whiston, S. C. (2009). Assessment in counseling (3rd ed.). Belmont, CA: Brooks/Cole.
CHAPTER 8
Assessment of Intelligence and General Ability
The study of intelligence dates back more than a century; over the decades, it has been
characterized by scholarly debates, research breakthroughs, and paradigm shifts about
what constitutes the nature of intelligence. It also brought about the birth of a commercial
industry that generates hundreds of millions of dollars of annual revenue (Wasserman,
2003). Assessing intelligence typically encompasses measuring one’s ability to understand
complex ideas, adapt effectively to the environment, think abstractly, learn from experi-
ence, learn quickly, and engage in various forms of reasoning.
• Articulate Terman’s study and its relevance in the study of human intelligence.
• Describe individual intelligence tests and group intelligence tests and explain the
• Identify and describe the major tests of intelligence, such as the Wechsler scales and the
Defining Intelligence
What is intelligence? Despite the history behind the study of intelligence, there is still a
lack of agreement on the definition of intelligence. In the book Intelligence 101, Plucker and
Esping (2014) dedicated five pages of text to 19 different definitions proffered by psychol-
ogists over the years. The reason for so many differing definitions is that intelligence is a
complex construct that involves both genetic and social learning components. As long as
theorists and researchers have been studying intelligence, there have been various defini-
tions of the concept and how to measure it. Theorists and researchers have argued over the
traits and dimensions that comprise the construct of intelligence, and literally thousands of
books, research articles, and popular essays on intelligence have been published over the
last 100 years. And yet, there remains no clearly articulated definition of the construct of
intelligence. As such, we will not tender our own definition, but will discuss the concept
from a broad perspective. There is even divergence among professionals about the term
itself: Some professionals prefer using the term general ability rather than intelligence due
to the negative connotations associated with intelligence testing (Whiston, 2012).
Part of the reason scholars have disagreed on how to define intelligence involves the
ongoing debate of whether intelligence is a single, monolithic ability. In other words, is
intelligence a general, unitary concept that governs performance on all tasks and abilities?
This notion, referred to as the general intelligence factor (i.e., g factor), was supported by
some early psychometric research, but it failed to withstand scrutiny in educational
settings (Mayer, 2000). Although g as a total representation of intelligence is no longer in
vogue, it is often considered part of the broader construct of intelligence along with
smaller component abilities (e.g., memory, knowledge, processing speed; Ackerman &
Beier, 2012).
Abilities are enduring characteristics of individuals, sometimes called ability traits,
because they are usually stable across time. Abilities can be seen in many domains of indi-
vidual differences, including cognitive, physical (motor skills), visual, auditory, mechani-
cal, and job related, to name a few. When related to intelligence, one refers to cognitive
abilities, which generally constitute one’s ability to understand complex ideas, solve novel
problems, think abstractly, and engage in various forms of reasoning. In this paradigm,
intelligence is often referred to as general intellectual ability. Therefore, the next logical ques-
tion is this: What specific cognitive abilities make up general intellectual ability? Unfortu-
nately, there is little agreement among researchers as to what those abilities are and which
ones are more important, although some researchers have suggested a common taxonomy
of cognitive abilities (Flanagan & Harrison, 2012; Plucker & Esping, 2014).
New theories that emerged in the 1980s and 1990s took a completely different view
of intelligence. Some theorists rejected the existence of one or several intellectual abilities
altogether and instead favored the notion of multiple separate intelligences. Other theo-
rists, instead of conceptualizing intelligence in terms of abilities, focused on information
processing, which encompasses the cognitive processes required for people to perform
intellectual tasks, such as taking in information (attention, perception) and holding that
information (memory, recall) for further information processing (reasoning, problem solv-
ing, decision making, communication).
As you can see, the construct of intelligence and its meaning have been extensively
discussed, but a multitude of definitions remain. To explore the commonalities among
researchers’ definitions of intelligence, Sternberg and Berg (1986) published a well-known
study that compared definitions from experts in the field of intelligence who met in two
symposia held in 1921 and 1986. The authors reported that common features of defini-
tions included such attributes as adaptation to the environment, basic mental processes,
and higher-order thinking (e.g., reasoning, problem solving, decision making). In addi-
tion, they found that in 1986 more emphasis was placed on metacognition (i.e., awareness
and control of one’s cognitive processes), information processing, and cultural context.
Table 8.1 provides several well-known definitions of intelligence from theorists and
researchers over the last century.
Despite the numerous definitions, the nature of intelligence continues to be elusive.
However, the idea that some individuals are brighter or smarter than others is accepted in
Theories of Intelligence
Just as there are many definitions of intelligence, there are many theories of intelligence. The
first and most widely accepted approach is based on psychometrics, which reveals the struc-
ture or dimensions of intelligence through statistical procedures (such as factor analysis) that
analyze the interrelationships of scores on mental ability tests. This approach was used to
develop and support the concept of one overall general (g) factor of intelligence. However,
this view was by no means universally accepted among intelligence theorists. Some theorists
endorse the notion of a g factor along with specific abilities that comprise general intelligence;
some view intellectual ability as a hierarchy of general and specific abilities; and some theo-
rists diverge completely from individual intellectual abilities and focus on information pro-
cessing (i.e., the mental operations or processes associated with performing a cognitive task),
cognitive development, or multiple intelligences. Although we are unable to present all of the
many theorists’ views of intelligence, we will present an overview of a few of the more
prominent theories (Table 8.2 provides a comparison of the theories of intelligence).
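The psychometric approach described above starts from the interrelationships among scores on mental ability tests. As a minimal sketch of that starting point, the code below computes the Pearson correlation for each pair of tests. It does not perform factor analysis itself; it only builds the correlation matrix that such procedures analyze. The three test names, the scores for five examinees, and the function name are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of five examinees on three mental ability tests.
scores = {
    "vocabulary": [10, 12, 14, 16, 18],
    "arithmetic": [11, 11, 15, 15, 19],
    "reading": [9, 13, 13, 17, 17],
}

names = list(scores)
corr = {(a, b): pearson(scores[a], scores[b]) for a in names for b in names}

# Print the correlation matrix row by row.
for a in names:
    print(a.ljust(12), [round(corr[(a, b)], 2) for b in names])
```

In this made-up data set every pair of tests correlates positively, which is the kind of pattern that led early researchers to propose a general factor.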
[Figure: a general factor (g) at the center, surrounded by specific factors (s) for vocabulary, arithmetic reasoning, reading comprehension, and computation.]
As far as we can determine at present, the tests that have been supposed to be
saturated with the general common factor divide their variance among primary
factors that are not present in all tests. We cannot report any general common
factor in the battery of fifty-six tests that have been analyzed in the present
study. (Thurstone, 1938, p. ix)
[Figure: hierarchy diagram with two broad abilities. Labels under verbal-educational: vocabulary, sentence completion, arithmetic computation, arithmetic reasoning. Labels under practical-mechanical-spatial: mechanical reasoning, spatial relations, visualize 2-D space, visualize 3-D space, size discrimination, hand-eye coordination.]
general factor of intelligence. The second level included two broad abilities: verbal-
educational ability and practical-mechanical-spatial ability. Each broad ability was further
subdivided into specific abilities shown on the third level: verbal-educational comprises
verbal fluency and numerical ability, and practical-mechanical-spatial includes mechani-
cal ability, psychomotor ability, and spatial relations. The fourth level consists of even
more special and specific factors particular to the abilities in each of the domains above it.
Because Vernon’s model accounted for a g factor as a higher-order factor as well as other
specific factors, it was viewed as a way to reconcile Spearman’s two-factor theory (which
emphasized the g factor) and Thurstone’s multiple-factor theory (which did not have a g
element; Plucker & Esping, 2014).
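The first three levels of Vernon's hierarchy, as described in the paragraph above, can be written out as a nested structure. This is only an illustrative sketch: the variable and function names are hypothetical, and the fourth level is omitted because the text does not enumerate its factors.

```python
# Levels 1 to 3 of the hierarchy as named in the text above.
vernon_hierarchy = {
    "general factor (g)": {
        "verbal-educational": ["verbal fluency", "numerical ability"],
        "practical-mechanical-spatial": [
            "mechanical ability",
            "psychomotor ability",
            "spatial relations",
        ],
    }
}


def list_levels(tree, level=1):
    """Yield (level, name) pairs, walking from the top of the hierarchy down."""
    if isinstance(tree, dict):
        for name, children in tree.items():
            yield level, name
            yield from list_levels(children, level + 1)
    else:
        for name in tree:
            yield level, name


# Print the hierarchy with one indent step per level.
for level, name in list_levels(vernon_hierarchy):
    print("  " * (level - 1) + name)
```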
[Figure 8.3: Dimensions of Guilford’s theory of intelligence (operations, contents, and products).]
associated with intellectual ability. The model includes 180 unique intellectual factors
organized around three dimensions: operations, contents, and products (see Figure 8.3).
Operations are rules of logic or mental procedures that solve problems. Contents refers to
a particular kind of information. Products are items of information from the same content
category. According to his model, intellectual functioning involves the application of
operations to contents, which results in products. There are six types of operations: cogni-
tion, memory retention, memory recording, divergent thinking, convergent thinking, and
evaluation. Each of these operations could be applied to one of five types of contents:
visual, auditory, symbolic, semantic, or behavioral. The application of operations to these
contents results in one of six products: units, classes, relations, systems, transformations,
or implications.
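The figure of 180 factors follows from crossing the three dimensions: 6 operations × 5 contents × 6 products. As a minimal sketch, the Python snippet below enumerates every combination using only the category names listed above; the variable names are illustrative and are not part of Guilford’s model.

```python
from itertools import product

# Category names taken from the description of Guilford's three dimensions.
operations = ["cognition", "memory retention", "memory recording",
              "divergent thinking", "convergent thinking", "evaluation"]
contents = ["visual", "auditory", "symbolic", "semantic", "behavioral"]
products = ["units", "classes", "relations", "systems",
            "transformations", "implications"]

# Each factor is one (operation, content, product) combination.
factors = list(product(operations, contents, products))

print(len(factors))   # 180
print(factors[0])     # ('cognition', 'visual', 'units')
```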
[Figure: Piaget’s stages of cognitive development (sensorimotor, preoperational, concrete
operational, and formal operational) plotted against age in years (0 to 13).]
design of curriculum materials and educational programs, and a number of scales have
been published to assess an individual’s stage of intellectual development.
Luria’s Model
A. R. Luria (1970) was a Russian neuropsychologist who is best known for his seminal work
on structures of the brain and the behavioral and cognitive deficits associated with various
brain lesions (i.e., areas of brain tissue that appear abnormal). Luria’s research has had con-
siderable influence on intelligence testing; for example, the Kaufman Assessment Battery
for Children (KABC-II) focuses on mental processes based on Luria’s theory. Luria’s work
involved mapping the brain’s systems and functions responsible for human cognitive pro-
cesses, especially the high-level processes associated with the intake and integration of
information and with problem-solving abilities (Kaufman & Kaufman, 2004a; Luria, 1970).
He identified three main blocks in the brain that represented the brain’s functional systems:
Block 1 is responsible for arousal, concentration, and attention. Block 2 involves the use of
one’s senses to analyze, code, and store information. Block 3 applies executive functions for
formulating plans and programming behavior; it represents the output or response center
of the brain. Although Luria distinguished the three blocks of brain functions and their
separate cognitive processes, his main emphasis was on the integration of these blocks in
order to support all cognitive activity (Kaufman & Kaufman, 2004b).
The three-stratum taxonomy suggests that at the highest level cognitive abilities con-
verge to form a general common factor (g; Jensen, 1998); the inclusion of the g is distinct
from the Cattell–Horn Gf-Gc model. As a psychometric approach, Carroll’s theory was
based on a factor analytic study involving over 480 datasets of cognitive ability variables
from psychological tests, school grades, and competence ratings.
[Figure 8.5 content. Stratum II broad abilities, with the narrow abilities shown beneath each:
Glr (Long-Term Storage & Retrieval): Learning Abilities, Naming Facility, Word Fluency
Gsm (Short-Term Memory): Working Memory
Gv (Visual Processing): Spatial Relations, Serial Perceptual Integration, Length Estimation
Gf (Fluid Reasoning): General Sequential Reasoning
Gc (Crystallized Ability): General Information, Geography Achievement, Oral Production and Fluency
Gq (Quantitative Knowledge): Mathematical Knowledge
In the original figure, gray-shaded abilities are those measured by the KABC-II scales.]
FIGURE 8.5 Cattell–Horn–Carroll (CHC) model applied to the Kaufman Assessment Battery for Children
(KABC-II).
Source: Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children—Second Edition
(K-ABC-II). Circle Pines, MN: American Guidance Service. Copyright © 2004 by NCS Pearson, Inc. Reproduced
with permission. All rights reserved. “KABC” is a trademark, in the U.S. and/or other countries, of Pearson
Education, Inc. or its affiliate(s).
existence of a g factor (Floyd, Bergeron, Hamilton, & Parra, 2010). The Kaufman Assessment
Battery for Children, Second Edition (KABC-II), an intelligence test for children ages
3 to 18, is founded on the CHC model (as well as Luria’s processing theory). Figure 8.5
depicts the CHC model as it applies to the KABC-II.
170 Chapter 8 • Assessment of Intelligence and General Ability
Terman’s Study
Lewis Terman (1877–1956) is well known for his research on intelligence in children. In 1922, he
began a longitudinal study that sought to answer the following question: What kind of adults do
children with high IQs become (Terman & Oden, 1959)? He followed the lives of 1,528 “gifted”
(i.e., having IQ scores of 135 or higher) children, who have been tracked for over 80 years (50
years after Terman’s death). The study is the longest-running survey ever carried out with
regard to intelligence and ability. Contrary to the stereotypes, Terman found that the gifted
children were taller, healthier, physically better developed, and superior in leadership and social
adaptability. They also excelled in school and often moved ahead in the curriculum. As adults,
those in Terman’s sample had the following characteristics:
• More likely to earn graduate degrees and doctorates
• Earned incomes well above population averages
• More satisfied with their life and life’s work
• Low criminality
• Achieved greater marital success
• Physically healthier than the average adult
• Lower incidence of psychological problems (e.g., substance abuse, suicide)
Intelligence Tests
Intelligence tests measure a broad spectrum of cognitive abilities, such as reasoning,
comprehension, judgment, memory, and spatial ability. They help us gain an under-
standing of an examinee’s cognitive strengths and weaknesses and are considered excel-
lent predictors of academic achievement and academic success (Sattler, 2008) as well as
success in training and job performance (Muchinsky, 2008). Most intelligence tests
include items that require verbal and nonverbal reasoning and a variety of other cogni-
tive abilities. The terminology used on tests to delineate abilities is numerous and varied
and may be based on a particular theory of intelligence or factor analytic evidence. Table
8.3 presents just a few of the common types of cognitive abilities assessed in intelligence
testing. As with other types of assessment instruments, measures of intellectual ability
can be used for a number of purposes. They may be used for screening; for identification
of intellectual disabilities, learning disabilities, or giftedness; to place individuals into
specialized academic or vocational programs; or as a cognitive adjunct to a comprehen-
sive clinical evaluation, in which the main focus is on personality or neuropsychological
assessment (Kaufman, 2000, p. 453).
Intelligence tests typically yield an overall measure of global intelligence as well as
scores on subtests or groups of subtests that represent different cognitive abilities. His-
torically, the most famous index of intelligence is the IQ, or intelligence quotient. This
term came about when Lewis Terman (1916) adopted German psychologist William
Stern’s concept of a mental quotient, renaming it intelligence quotient. (Stern defined men-
tal quotient as the ratio of mental age [MA] to chronological age [CA] multiplied by 100:
IQ = (MA / CA) × 100.) Thus, if a child’s MA of 5 equals his or her CA of 5, then the IQ is
100 (5 / 5 × 100), or average intelligence. Although the formula worked fairly well for
children who have a somewhat steady pace of intellectual growth, it did not work for
adolescents and adults, because intellectual development does not steadily increase
throughout the life span; thus, CA would continue growing while MA stayed the same,
resulting in unreliably low IQ scores. Today, the term intelligence quotient and its original
formula are no longer used. Instead, overall test scores are usually converted to a
standard score with a mean of 100 and a standard deviation of 15 or 16, depending on the
test. Tests that still refer to IQ scores (such as the Wechsler scales and the Stanford–Binet
Intelligence Scales) are alluding to standard scores, not to actual quotients. Tests may
also provide descriptive classifications in addition to standard scores. It’s important to
remember that hypotheses or recommendations generated solely on IQ scores are not
valid without the inclusion of contextual information gathered from multiple sources,
such as interviews with the client; behavior observations; collateral information from
parents, teachers, and therapists; or other appropriate sources (e.g., information from
records or previous test data).
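The contrast between the original ratio IQ and the standard-score approach can be sketched in a few lines of Python. The function names, the example ages, and the norm-group mean and standard deviation below are hypothetical and chosen only for illustration; actual tests convert raw scores to standard scores using published norm tables rather than a single formula.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float,
                 sd: float = 15.0) -> float:
    """Standard score with a mean of 100, based on distance from the norm-group mean."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + sd * z


# A 5-year-old with a mental age of 5 has a ratio IQ of 100.
print(ratio_iq(5, 5))      # 100.0

# Hypothetical adult whose mental age stays at 16 while chronological age grows:
print(ratio_iq(16, 16))    # 100.0
print(ratio_iq(16, 32))    # 50.0

# Hypothetical norm group: mean raw score 50, standard deviation 10.
print(deviation_iq(60, norm_mean=50, norm_sd=10))  # 115.0
```

The falling ratio in the second example is the reason the quotient breaks down for adolescents and adults, whereas a standard score depends only on where the examinee stands relative to same-age peers.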
Intelligence tests can also be categorized based on whether they are administered to
individuals or groups. Individual intelligence tests are administered to a single individual by
a highly qualified examiner. Examiners must be specially trained to administer and score
the varied item types on individual intelligence tests; for example, many tests have items
requiring oral responses from the examinee, have set time limits for certain tasks, and
require examiners to observe qualitative aspects of a test taker’s performance. Individual
tests are considered clinical instruments and are used when intensive psychological evalu-
ation of an individual client is needed—for example, for the identification and classifica-
tion of intellectual disabilities (i.e., mental retardation), learning disabilities, or other
cognitive disorders (Flanagan & Harrison, 2012). In contrast to individual tests, group intel-
ligence tests can be administered to large groups of people either simultaneously or within
a limited time frame by a minimally trained examiner. Less training is required for admin-
istration, because most group tests only require the examiner to read simple instructions
and accurately keep time. The typical group test employs multiple-choice, true/false, or
other selected-response items to ensure uniformity and objectivity of scoring (Flanagan &
Harrison, 2012). Group tests are useful for screening to determine the need for an in-depth
evaluation, estimating intelligence scores for academic or vocational purposes, and esti-
mating intelligence scores for research studies. Because group tests offer the advantage of
being able to assess many individuals quickly and inexpensively, they are principally
applied in education, business, government, and military settings for screening and place-
ment purposes.
Intelligence tests can also be classified as verbal or nonverbal, or they may contain
items of both types. Verbal intelligence tests presume that examinees have a certain stand-
ard of language ability; they are able to understand language and use it to reason or
respond (Plucker & Esping, 2014). Nonverbal measures of intelligence reduce language in
directions, items, or responses. They may use oral directions, allow examinees to respond
to test items nonverbally, or use puzzles or other manipulatives. Nonverbal tests are
appropriate for special populations, such as individuals who are non-English speaking,
who are deaf or hearing impaired, or who have other types of language or physical disa-
bilities or reading problems.
Intelligence tests
Measure: a broad spectrum of cognitive abilities
Time orientation: present and future
Uses:
• To determine need for more in-depth evaluation
• To identify intellectual disabilities, learning disabilities, or giftedness
• For placement/selection into specialized academic or vocational programs
• As part of a comprehensive clinical evaluation

Achievement tests
Measure: knowledge and skills learned as a result of instruction
Time orientation: present
Uses:
• To identify academic strengths and weaknesses
• For placement/selection into specialized academic or vocational programs
• To track achievement over time
• To evaluate instructional objectives and programs
• To identify learning disabilities
• As part of a comprehensive clinical evaluation

Aptitude tests
Measure: talents or performance abilities prior to instruction
Time orientation: future
Uses:
• To predict future performance
• For placement/selection into specialized academic or vocational programs
• For job placement decisions
• As part of a comprehensive clinical evaluation

The Wechsler Scales
Despite the number of intelligence tests available, the Wechsler scales continue to dominate
the field; their popularity is unrivaled in the history of intellectual assessment. In 1939,
David Wechsler, a psychologist at Bellevue Hospital in New York, first published an individual
intelligence test (i.e., the Wechsler–Bellevue Intelligence Scale). He subsequently developed a
series of three intelligence scales designed for different age groups. The Wechsler Adult
Intelligence Scale (WAIS-IV) is designed to assess the cognitive abilities of individuals aged
16 to 89 years. The Wechsler Intelligence Scale for Children (WISC-V) is for children of ages
6 years through 17 years. As is the case when new assessment instruments are released, there
is an overlap of time between the use of the previous version and the adoption of the new
version. In preparing to become professional counselors, you should become familiar with the
information related to the WISC-IV because it is still in use and also explore the information
related to the WISC-V because it will be widely adopted in the coming years.
The Wechsler Preschool and Primary Scale of Intelligence (WPPSI-IV) is designed for very
young children of ages 2 years through 6 years. The Wechsler scales continue to be the
most widely used measure of intelligence the world over. A study by Camara, Nathan, and
Puente (2000) found that among intelligence tests the WAIS-III and the WISC-III were the
most frequently used by psychologists. In this section, we will discuss the WAIS-IV, the
WISC-IV, and the WPPSI-IV.
TABLE 8.5 Indexes and Subtests for the Wechsler Intelligence Scales (WAIS-IV, WISC-IV, WPPSI-III)
Each of the three Wechsler scales yields a Full-Scale IQ (FSIQ), index composite
scores, and subtest scaled scores (Table 8.5 lists the indexes and subtests for the Wechsler
scales). The FSIQ is considered the most representative estimate of global intellectual func-
tioning and is derived from a combination of the subtest scores. The index composite scores
measure more narrowly defined cognitive domains and are composed of the sum of vari-
ous subtest scores. Subtest scores are measures of specific abilities. The FSIQ and index
composite scores have a mean of 100 and a standard deviation of 15. The subtest scaled
scores have a mean of 10 and a standard deviation of 3.
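Because both score metrics are defined by a mean and a standard deviation, a score can be translated into an approximate percentile rank under the normal curve. The sketch below uses Python’s standard library; the function name is hypothetical, and the result is only a normal-curve approximation, since examiners take percentile ranks from the norm tables in the test manual.

```python
from statistics import NormalDist


def percentile_rank(score: float, mean: float, sd: float) -> int:
    """Approximate percentile rank of a standard score under a normal curve."""
    return round(NormalDist(mu=mean, sigma=sd).cdf(score) * 100)


# Composite metric (FSIQ and index scores): mean 100, standard deviation 15.
print(percentile_rank(100, mean=100, sd=15))  # 50
print(percentile_rank(115, mean=100, sd=15))  # 84

# Subtest scaled-score metric: mean 10, standard deviation 3.
print(percentile_rank(13, mean=10, sd=3))     # 84
```

A composite score of 115 and a subtest scaled score of 13 both sit one standard deviation above their respective means, so they map to the same percentile rank.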
Earlier versions of the Wechsler scales did not use index scores, but utilized a dual IQ
structure that included two IQ scores: verbal IQ (VIQ) and performance IQ (PIQ). The VIQ
score summarized an individual’s performance on subtests designed to measure vocabu-
lary, general knowledge, verbal comprehension, and working memory. The PIQ score
measured visual-motor skills, alertness to details, nonverbal reasoning, processing speed,
and planning ability. The use of the dual IQ structure appears to be changing, as the latest
edition of the WISC completely eliminated the VIQ and PIQ scores in favor of the four
index scores (see Table 8.6 for a description of the WISC-IV indexes and subtests). The
WPPSI-IV now has five index scores: Verbal Comprehension Index (VCI), Visual Spatial
Index (VSI), Working Memory Index (WMI), Fluid Reasoning Index (FRI), and Processing
Speed Index (PSI).
Interpreting scores on the Wechsler scales is very complicated and requires considerable
education and training. Interpretation strategies, described fully in the test manuals
(The Psychological Corporation, 1997, 2002, 2003), are beyond the scope of this textbook;
however, we will review a few basic interpretive steps. The first step is reporting and
describing the examinee’s FSIQ score (i.e., the examinee’s overall level of intellectual
ability). Examiners compare the examinee’s FSIQ to those of the general population (as
identified by the performance of the norm group, from which all scores are derived). The
population of the United States has an average intelligence score of 100. Table 8.7 identifies
the descriptive classifications for FSIQ and index composite score ranges on the Wechsler
scales. Using this table, we see that a person who has an FSIQ score of 100 is functioning
within the average range of intellectual ability. Examiners should report the FSIQ score
along with the appropriate percentile rank and descriptive classification in relation to the
level of performance (e.g., average, high average).

TABLE 8.6 Description of Full-Scale IQ, Indexes, and Subtests on the WISC-IV

Working Memory Index (WMI): This index measures one’s ability to temporarily retain information
in memory, perform some operation or manipulation with it, and produce a result. Involves
attention, concentration, and mental control.
  Digit Span: This subtest is composed of two parts. The Digit Span Forward task requires the
  individual to repeat numbers in the same order as read aloud by the examiner. Digit Span
  Backward requires the individual to repeat the numbers in the reverse order of that presented
  by the examiner. This subtest measures auditory short-term memory, sequencing skills,
  attention, and concentration.
  Letter-Number Sequencing: Measures attention, short-term memory, concentration, numerical
  ability, and processing speed. The individual is asked to read a sequence of numbers and
  letters and recall the numbers in ascending order and the letters in alphabetical order.
  Arithmetic*: Measures mental manipulation, concentration, attention, short- and long-term
  memory, numerical reasoning ability, and mental alertness. It requires the individual to
  mentally solve a series of orally presented arithmetic problems within a specified time limit.

Processing Speed Index (PSI): This index measures one’s ability to quickly and correctly scan,
sequence, or discriminate simple visual information. Involves concentration and rapid eye-hand
coordination.
  Coding: This subtest measures the individual’s short-term memory, learning ability, visual
  perception, visual-motor coordination, visual scanning ability, cognitive flexibility,
  attention, and motivation. The individual copies symbols that are paired with simple
  geometric shapes or numbers.
  Symbol Search: Measures processing speed, short-term visual memory, visual-motor coordination,
  cognitive flexibility, visual discrimination, and concentration. The individual is presented
  with a series of paired groups of symbols, with each pair containing a target group and a
  search group. The individual scans the search group and indicates whether the target
  symbol(s) matches any of the symbols in the search group within a specified time limit.
  Cancellation*: Measures processing speed, visual selective attention, vigilance, and visual
  neglect. The individual scans both a random and a structured arrangement of pictures and
  marks target pictures within a specified time limit.

*Supplemental subtest

TABLE 8.7 Descriptive Classifications for FSIQ and Index Composite Scores

Composite Score   Classification   Percentile Range   Percent Included in Normal Curve
130 and above     Very Superior    98–99              2.2
120–129           Superior         91–97              6.7
110–119           High Average     75–90              16.1
90–109            Average          25–74              50.0
80–89             Low Average      9–24               16.1
70–79             Borderline       3–8                6.7
69 and below      Extremely Low    1–2                2.2
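The lookup described in Table 8.7 can be expressed as a short Python sketch. The score ranges and labels come from the table; the function and variable names are hypothetical, and the sketch is meant only to make the classification step concrete, not to replace the test manual.

```python
# (lower bound of range, descriptive classification), highest range first.
CLASSIFICATIONS = [
    (130, "Very Superior"),
    (120, "Superior"),
    (110, "High Average"),
    (90, "Average"),
    (80, "Low Average"),
    (70, "Borderline"),
]


def classify_composite(score: int) -> str:
    """Return the Table 8.7 classification for an FSIQ or index composite score."""
    for lower_bound, label in CLASSIFICATIONS:
        if score >= lower_bound:
            return label
    return "Extremely Low"  # 69 and below


print(classify_composite(100))  # Average
print(classify_composite(125))  # Superior
print(classify_composite(69))   # Extremely Low
```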
Another step in interpretation involves interpreting the index scores. Examiners
may begin by reporting each index score, including the qualitative description and the
percentile rank. Examiners can also analyze an examinee’s strengths and weaknesses on
the index scores by identifying significant statistical differences between the index scores
(based on the normal curve). For example, an examinee may have a good verbal ability
(high VCI) but a weakness in working memory (a relatively lower score on WMI). In
order to determine if differences between index scores are significant, examiners refer to
tables located in the test manual that identify critical index score values by age group. For
example, according to the WISC-IV manual, for a 13-year-old child, a VCI score that is 15
points higher than the WMI score represents a statistically significant difference (at the
.05 level). Examiners can also compare differences between an examinee’s index scores with
base rate data, which show how often differences of that size are observed within the general
population. Interestingly, it is possible for index scores to be statistically different from
each other and yet, because the difference occurs so often within the population (base rate),
for the difference between the two scores to be meaningless. Therefore, it is very important
to consider both significant differences and base rates between an individual’s index scores
when making clinical and educational planning decisions.
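The two separate checks (statistical significance and base rate) can be illustrated with a small Python sketch. Every number below is a hypothetical placeholder: the critical value, the base rate percentage, and the 15% cutoff for calling a difference uncommon are assumptions made for illustration, and real values must be taken from the tables in the test manual.

```python
def evaluate_index_difference(score_a: int, score_b: int,
                              critical_value: float,
                              base_rate_pct: float,
                              uncommon_cutoff_pct: float = 15.0) -> dict:
    """Check an index score difference against a critical value and a base rate."""
    difference = abs(score_a - score_b)
    return {
        "difference": difference,
        # Significant if the gap meets or exceeds the manual's critical value.
        "statistically_significant": difference >= critical_value,
        # Uncommon if few people in the population show a gap this large.
        "uncommon_in_population": base_rate_pct <= uncommon_cutoff_pct,
    }


# Hypothetical examinee and hypothetical manual values.
result = evaluate_index_difference(score_a=110, score_b=95,
                                   critical_value=12, base_rate_pct=25.0)
print(result)
# {'difference': 15, 'statistically_significant': True, 'uncommon_in_population': False}
```

In this made-up case the difference is statistically significant but occurs frequently in the population, which is exactly the situation in which an examiner would be cautious about treating it as meaningful.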
Interpretation can also involve describing and analyzing subtest scores within an
individual’s profile. Using a three-category descriptive classification, subtest scores may
be described as follows (understanding that subtests have a mean of 10 and a standard
deviation of 3; Sattler & Dumont, 2004):
• Subtest scaled scores of 1 to 7 indicate a relative weakness (one to three standard devi-
ations below the mean).
• Subtest scores of 8 to 12 indicate average ability (within one standard deviation from
the mean).
• Subtest scores of 13 to 19 indicate a relative strength (one to three standard deviations
above the mean).
Significant differences between the highest and lowest subtest scores are called subtest
scatter, which can be useful in specifying particular strengths and weaknesses of an
individual’s performance. Examiners can also analyze subtest scores using a base-
rate approach to compare an examinee’s subtest scores with the scores of the general
population.
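The three-category description of subtest scores, together with a rough look at scatter, can be sketched as follows. The cutoffs come from the classification above; the function names and the example profile are hypothetical, and the simple highest-minus-lowest range is only a stand-in, because judging whether scatter is significant requires the tables in the test manual.

```python
def classify_subtest(scaled_score: int) -> str:
    """Describe a subtest scaled score (mean 10, standard deviation 3)."""
    if not 1 <= scaled_score <= 19:
        raise ValueError("Scaled scores range from 1 to 19.")
    if scaled_score <= 7:
        return "relative weakness"
    if scaled_score <= 12:
        return "average"
    return "relative strength"


def scatter_range(profile: dict) -> int:
    """Difference between the highest and lowest subtest scaled scores."""
    return max(profile.values()) - min(profile.values())


# Hypothetical profile used only to illustrate the calculation.
profile = {"Subtest A": 6, "Subtest B": 10, "Subtest C": 14}

for name, score in profile.items():
    print(name, score, classify_subtest(score))

print(scatter_range(profile))  # 8
```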
Because of the complexity of interpretation, it is very important for the skilled exam-
iner to not only be trained in the administration of the Wechsler scales but also receive
advanced graduate training in the interpretation of examinees’ performance. Furthermore,
it’s essential to remember that test scores are only one method of assessment. Interpreta-
tions for test results are always made in conjunction with an individual’s background
information, direct behavioral observation, and other assessment results. Table 8.8 shows
sample WISC-IV test results.
Source: Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV). Copyright © 2003 by NCS Pearson, Inc.
Reproduced with permission. All rights reserved. “Wechsler Intelligence Scale for Children” and “WISC” are
trademarks, in the United States and/or other countries, of Pearson Education, Inc. or its affiliate(s).
Exercise 8.1
Understanding Intelligence Assessment: Jackie
Jackie is a 10-year-old Caucasian female who was referred for an evaluation because of academic
problems. She is currently in the fifth grade. According to her mother, Jackie has attended the
same school since initial enrollment in school, including pre-kindergarten. She has an excellent
attendance record, is extremely well-behaved in school, and has an exemplary conduct record in
the past. Regarding her academic performance, Jackie is currently experiencing many academic
difficulties and has had many academic difficulties in the past. Most recent standardized
achievement test results show that she scored below average in reading, math, and language.
Jackie arrived on time for the assessment and was accompanied by her mother. Jackie appeared
alert and highly motivated and appeared to put her best effort into the testing process.

1. Identify the descriptive classifications of the FSIQ, index scores, and subtest scaled scores.
2. Interpret Jackie’s index composite scores, and describe what each index measures. Use the
following interpretation of the FSIQ as the format for your answers:

The Full Scale IQ (FSIQ) is an estimate of global intellectual functioning. Jackie’s FSIQ is 97,
which is within the average range of intellectual functioning. She scored at the 42nd
percentile, which means that her score is equal to or exceeds approximately 42% of the
national comparison group.

• Verbal Comprehension Index
• Perceptual Reasoning Index
• Working Memory Index
• Processing Speed Index

3. Based on her subtest scores, how would you describe any relative subtest strengths and
weaknesses Jackie may have?

FSIQ and Index Scores               Composite Score   Percentile   Descriptive Classification
Verbal Comprehension (VCI)          104               61
Perceptual Reasoning Index (PRI)    102               55
Working Memory Index (WMI)          86                18
Processing Speed Index (PSI)        91                27
Full-Scale IQ (FSIQ)                97                42

Subtest Scores                      Scaled Score      Percentile   Descriptive Classification
Verbal Comprehension Subtests
  Similarities                      11                63
  Vocabulary                        11                63
  Comprehension                     11                63
  Information                       10                50
  Word Reasoning                    10                50
Perceptual Reasoning Subtests
  Block Design                      9                 37
  Picture Concepts                  10                50
  Matrix Reasoning                  13                84
  Picture Completion                10                50
Working Memory Subtests
  Digit Span                        8                 25
  Letter–Number Sequencing          7                 16
  Arithmetic                        10                50
Processing Speed Subtests
  Coding                            8                 25
  Symbol Search                     9                 37
  Cancellation                      8                 25
Stanford–Binet Intelligence Scale, Fifth Edition
One of the oldest and most widely
used intelligence tests is the Stanford–Binet Intelligence Scale, now in its fifth edition. The
Stanford–Binet, which originated from the Binet–Simon scale in 1905, was first published
in 1916 by Lewis Terman at Stanford University (Becker, 2003). The latest edition of the
Stanford–Binet was published in 2003 and is appropriate for use with individuals between
the ages of 2 and 85-plus years. It normally takes 45 to 75 minutes to administer the test.
The fifth edition of the Stanford–Binet was constructed on a five-factor hierarchical cogni-
tive model that includes the following scales:
• Full-Scale IQ (FSIQ): Measures the ability to reason with both words and visual material,
the ability to store and later retrieve and apply important knowledge, broad span of
memory for both words and visual details, spatial-visualization ability, and the ability
to solve novel problems with numbers and number concepts.
• Nonverbal IQ (NVIQ): Measures reasoning skills in solving picture-oriented, abstract
problems; remembering facts and figures presented in visual displays; solving
numerical problems shown in picture form; assembling visual puzzles; and remembering
information presented in visual space.
• Verbal IQ (VIQ): Measures general verbal reasoning ability, that is, solving problems
presented in printed or spoken words, sentences, or stories.
The FSIQ, VIQ, and NVIQ have composite scores with a mean of 100 and a standard devi-
ation of 15, and the VIQ and NVIQ each include five subtests, each having a mean of 10
and a standard deviation of 3. The subtests are organized into five cognitive factors in both
the verbal and nonverbal domains:
1. Fluid Reasoning: The ability to solve novel problems, whether presented verbally or
nonverbally
2. Knowledge: The accumulated fund of general information acquired at home, school,
work, and in daily life
3. Quantitative Reasoning: The ability to solve problems with numbers or numerical
concepts
4. Visual Spatial Processing: The ability to see patterns and manipulate visual images,
geographic shapes, or 3-D objects
5. Working Memory: The ability to store information in short-term memory and then
sort or transform that information
Interpreting the Stanford–Binet involves identifying and evaluating any significant
discrepancies between the NVIQ and VIQ and among the subtest scores. Extensive tables
for evaluating score differences are available in the Stanford–Binet Technical Manual. The
FSIQ is interpreted using the descriptive categories shown in Figure 8.6 (Roid, 2003).
[Figure 8.6: Stanford–Binet (SB-5) descriptive categories, ranging from Impaired or Delayed
through Average and Superior to Gifted or Advanced.]
Extensive reliability and validity studies were conducted as part of the standardi-
zation of the fifth edition of the Stanford–Binet. The test norms were based on 4,800
individuals and were stratified according to a 2001 U.S. Census Bureau report on age,
gender, race/ethnicity, geographic region, and socioeconomic level. Internal consistency
reliability ranged from .95 to .98 for IQ scores, from .90 to .92 for the five Factor Index
scores, and from .84 to .89 for the 10 subtests. Correlations between the Stanford–Binet
and other intelligence tests yielded .90, .84, .82, and .83 for the Stanford–Binet fourth edi-
tion, WISC-III, WAIS-III, and WPPSI-R, respectively. Test-retest and interrater reliability
studies were also conducted and showed the stability and consistency of scoring of the
Stanford–Binet.
The Kaufman Instruments
Another important name in the area of intelligence testing is
Kaufman. In the 1970s and early 1980s, husband-and-wife team Alan and Nadeen Kaufman
created a series of intelligence tests (based on their extensive clinical and research experience
and their great knowledge in the assessment field), which included the Kaufman Assess-
ment Battery for Children (KABC-II; Kaufman & Kaufman, 2004a), the Kaufman Adolescent
and Adult Intelligence Test (KAIT; Kaufman & Kaufman, 1993), the Kaufman Brief Intelli-
gence Test (KBIT-2; Kaufman & Kaufman, 2004b), and many other instruments. The first
edition of the KABC was the first intelligence test to be founded on two theoretical models:
Luria’s processing model and the CHC model. We will present further information about
the KABC-II and the KAIT.
The Kaufman Assessment Battery for Children, Second Edition (KABC-II; Kaufman &
Kaufman, 2004a) was designed to measure a range of cognitive abilities in children aged
3 to 18. The instrument is grounded in two modern theoretical models: Luria’s neuropsy-
chological theory of processing and the CHC model of broad and narrow abilities.
Because of its dual theoretical foundation, the KABC-II may be interpreted based on
either interpretive approach (i.e., the Luria model or the CHC approach), depending on
which approach the examiner chooses to take. The KABC-II yields a separate global
score for each of the two theoretical models: the Mental Processing Index (MPI), which
measures mental processing ability from the Luria perspective, and the Fluid-Crystallized
Index (FCI), which measures general cognitive ability from the CHC model. The primary
difference between these two global scores is that the MPI (Luria’s model) excludes meas-
ures of acquired knowledge, whereas the FCI (CHC theory) includes measures of acquired
knowledge (Kaufman, Lichtenberger, Fletcher-Janzen, & Kaufman, 2005). Only one of
these two global scores is computed for any examinee. The KABC-II also yields scores on
five broad abilities scales and 18 core and supplementary subtests; however, an exam-
iner would not use all 18 with one child. The selection of subtests is based on the age
range of the child and which theoretical approach is used for interpretation. Table 8.9
provides a description of the KABC-II scales and core subtests for the 7 to 18 age range.
The instrument provides standard scores, percentiles, and age equivalent scores. For the
broad ability scales, standard scores have a mean of 100 and a standard deviation of 15.
Descriptive categories are also provided based on various standard score ranges: upper
extreme (standard score of 131 or greater), above average (116 to 130), average (85 to
115), below average (70 to 84), and lower extreme (69 or less). The KABC-II was
standardized on a sample of 3,025 children reflective of the demographics of the 2001 U.S.
Census report. The test manual reports strong evidence of internal consistency and test-
retest reliabilities as well as evidence of construct validity. Global scores of the KABC-II
correlated strongly with the WISC-III, WISC-IV, and WPPSI-III FSIQs. Raiford and Coal-
son (2014) found that the correlation remained strong between the KABC-II and the
WPPSI-IV.
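To make the score scale concrete, the descriptive categories listed above can be expressed as a simple lookup. The following Python sketch is illustrative only: the function names are hypothetical, the lowest band is assumed to cover every score below 70 so that the bands do not overlap, and the percentile estimate assumes normally distributed scores (mean of 100, standard deviation of 15) rather than the norm tables in the test manual, which an examiner would use in practice.

```python
from statistics import NormalDist

# Lower bounds of the descriptive bands for standard scores (mean 100, SD 15).
# Assumption: any score below 70 falls in the "lower extreme" band.
BANDS = [
    (131, "upper extreme"),
    (116, "above average"),
    (85, "average"),
    (70, "below average"),
]


def describe_standard_score(score: int) -> str:
    """Return the descriptive category for a standard score."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "lower extreme"


def percentile_rank(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Approximate percentile rank, assuming a normal distribution of scores."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)


if __name__ == "__main__":
    # Print the category and approximate percentile for a few sample scores.
    for s in (135, 116, 100, 84, 65):
        print(s, describe_standard_score(s), round(percentile_rank(s)))
```

Under these assumptions, a score at the mean of 100 falls in the average band at the 50th percentile, and a score one standard deviation above the mean (115) falls at roughly the 84th percentile.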
The Kaufman Adolescent and Adult Intelligence Test (KAIT; Kaufman & Kaufman, 1993)
is designed for individuals of ages 11 to 85 years. The core battery consists of three intelli-
gence scales (i.e., Fluid, Crystallized, and Composite Intelligence) and six subtests—three
that assess fluid intelligence and three that assess crystallized intelligence. The Fluid scale
measures the ability to solve novel problems by using reasoning, which is considered a
function of biological and neurological factors. The Crystallized scale measures ability to
solve problems and make decisions based on acquired knowledge, verbal conceptualiza-
tion, formal and informal education, life experience, and acculturation. The Composite
score provides a measure of overall intellectual functioning. The instrument was standard-
ized on a national sample of 2,000 individuals. The manual reports validity and reliability
information, including information on correlational studies with the WISC-IV, the
WAIS‑III, and the KABC-II, as well as internal consistency reliability coefficients in the
.90s. Additional studies have shown similar relationships with the WAIS-IV
(Lichtenberger & Kaufman, 2012).
Additional Individual Intelligence Tests The Differential Ability Scales, Second Edition
(DAS-II; Elliott, 2007) is a comprehensive, individually administered instrument for assessing
cognitive abilities of children aged 2 years through 17 years. The DAS was initially devel-
oped to provide professionals assessing children with learning disabilities and develop-
mental disabilities with information at a finer level of detail than an IQ score; thus, its main
focus is on specific abilities rather than on general “intelligence.” In addition to learning
disabilities, the DAS-II is also appropriate for children with a variety of special
classifications, including ADHD, language impairment, limited English proficiency, mild to
moderate intellectual impairment, and gifted/talented children. The DAS-II consists of
20 core and diagnostic subtests that are grouped into three batteries: Early Years Battery
(Lower Level) for ages 2 years, 6 months to 3 years, 5 months; Early Years Battery (Upper
Level) for ages 3 years, 6 months to 6 years, 11 months; and School Age Battery for ages
7 years to 17 years, 11 months. These batteries provide the General Conceptual Ability score
(GCA), which is a composite score that focuses on reasoning and conceptual abilities.
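The age ranges given above for the three DAS-II batteries can be summarized as a small lookup. This Python sketch is illustrative only: the function name is hypothetical, and it encodes just the nominal age ranges stated in the text. In practice, battery selection follows the test manual and the examiner's judgment.

```python
from typing import Optional


def das2_battery(years: int, months: int = 0) -> Optional[str]:
    """Return the DAS-II battery for an age given in years and months.

    Returns None when the age falls outside the ranges listed in the text.
    """
    if not 0 <= months <= 11:
        raise ValueError("months must be between 0 and 11")
    age_in_months = years * 12 + months
    if 30 <= age_in_months <= 41:    # 2 years, 6 months to 3 years, 5 months
        return "Early Years Battery (Lower Level)"
    if 42 <= age_in_months <= 83:    # 3 years, 6 months to 6 years, 11 months
        return "Early Years Battery (Upper Level)"
    if 84 <= age_in_months <= 215:   # 7 years to 17 years, 11 months
        return "School Age Battery"
    return None


if __name__ == "__main__":
    print(das2_battery(5, 2))
    print(das2_battery(9))
```

Converting ages to months keeps the boundary checks simple; for example, a child aged 3 years, 6 months (42 months) falls in the Upper Level of the Early Years Battery.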
The Slosson Intelligence Test-Revised for Children and Adults (SIT-R3) is a 187-item oral
screening instrument that was designed to provide a quick estimate of general verbal cog-
nitive abilities of individuals aged 4 to 65. It measures the following cognitive areas: vocab-
ulary, general information, similarities and differences, comprehension, quantitative, and
auditory memory. As a screening instrument, it can be used to provide tentative diagnoses
or to determine if further in-depth evaluation is needed.
The Das–Naglieri Cognitive Assessment System (CAS; Naglieri & Das, 1997) was built
on the PASS theory of cognitive processes, which emphasizes cognitive processes that are
related to performance rather than to a general intelligence model. The CAS assesses cog-
nitive processes in children aged 5 through 17 and yields scores for the Planning, Atten-
tion, Simultaneous, and Successive subscales as well as a full-scale score. The test may be
used for identifying ADHD, learning disabilities, intellectual disabilities, and giftedness as
well as for evaluating children with traumatic brain injury.
Cognitive Abilities Test The Cognitive Abilities Test (CogAT; Lohman & Hagen, 2001)
measures K–12 students’ learned reasoning and problem-solving skills. It was designed to
help educators make important student placement decisions, such as selecting students for
gifted and talented programs. The CogAT can be administered in whole or in part, and
30 to 60 minutes of time is allowed per session, depending on the child’s grade level. It
provides scores for three batteries as well as an overall composite score:
1. Verbal Battery Measures a child’s ability to remember and transform sequences of
English words, to understand them, and to make inferences and judgments about
them. It includes subtests on Verbal Classification, Sentence Completion, and Verbal
Analogies.
Otis–Lennon School Ability Test, Eighth Edition The Otis–Lennon School Ability Test,
Eighth Edition (OLSAT-8; Otis & Lennon, 2003) is a widely used group intelligence test sold
only to accredited schools and school districts, often used for screening children for gifted
and talented programs. First published in 1918, the OLSAT-8 yields an overall school ability
index and consists of five subtests designed to measure abstract thinking and reasoning
abilities of children. The subtests are Verbal Comprehension, Verbal Reasoning, Pictorial
Reasoning, Figural Reasoning, and Quantitative Reasoning. The test has seven levels
extending from grade K through grade 12.
Raven’s Progressive Matrices The Raven’s Progressive Matrices (RPM; Raven, Raven, &
Court, 2003) are multiple-choice tests that measure fluid intelligence, which is the ability to
make sense out of complex data, to perceive new patterns and relationships, and to forge
constructs (largely nonverbal). The tests consist of a series of visual matrices (i.e., a 2 × 2
or 3 × 3 grid), each of which has a piece missing. The examinee studies the pattern on the
grid and selects the missing piece from a group of options. The RPM is available in three
formats: the Standard Progressive Matrices (SPM), which is used with the general popula-
tion; the Colored Progressive Matrices (CPM), which is designed for younger children, the
elderly, and people with moderate or severe learning difficulties; and the Advanced Pro-
gressive Matrices (APM), appropriate for adults and adolescents of above-average intelli-
gence. The tests are called progressive because each test consists of “sets” of items, and each
item and each set are progressively more difficult.
Other commonly used group intelligence tests include the California Test of Mental
Maturity, the Kuhlmann–Anderson Intelligence Tests, and the Henmon–Nelson Tests of
Mental Ability.
Specialized Tests
So far, we’ve provided information about a number of well-known individual and group
intelligence tests. For the most part, these tests can be used for multiple purposes for
assessing diverse populations. However, some tests are more specialized, meaning they
were developed specifically for use with certain populations, such as preschool children,
individuals with learning disabilities or other disabilities, children with gifted abilities, or
individuals from diverse cultural or linguistic backgrounds. Examples of intellectual abil-
ity tests designed for special populations include the Test of Nonverbal Intelligence
(TONI-4), the Leiter International Performance Scale, Third Edition (Leiter-3), and the
Universal Nonverbal Intelligence Test (UNIT). A more complete (though not exhaustive) list is available
in Table 8.10.
Test of Nonverbal Intelligence, Third Edition A well-known specialized test is the Test
of Nonverbal Intelligence, Third Edition (TONI-3). This 45-item instrument is a language-free
measure of intelligence, aptitude, abstract reasoning, and problem solving in individuals
aged 6 through 89 years. It is completely nonverbal, requiring no reading, writing, speak-
ing, or listening on the part of the test taker. Examiners administer the test by pantomim-
ing instructions, and the test takers indicate their response choices by pointing, nodding,
or using a symbolic gesture. The test is well suited for use with individuals who have
severe spoken language disorders (e.g., aphasia), individuals who are deaf or hearing
impaired, non-English speakers or English-language learners, and individuals with cogni-
tive, language, or motor impairments due to intellectual disabilities, deafness, develop-
mental disabilities, autism, cerebral palsy, stroke, disease, head injury, or other
neurological impairment. The TONI-3 requires approximately 15 to 20 minutes to administer
and score. The test provides raw scores, deviation IQ scores, percentile ranks, and age
equivalent scores. It is sound psychometrically, with strong validity and reliability evi-
dence, and was normed on a demographically representative and stratified sample of
3,451 people.
Leiter International Performance Scale, Third Edition The Leiter International Per-
formance Scale, Third Edition (Leiter-3) has four subtests to calculate nonverbal IQ, two
subtests to calculate nonverbal memory, two subtests to calculate processing speed, and
one nonverbal neuropsychological screener. The Leiter-3 was normed on children whose
abilities ranged from cognitive impairment to intellectual superiority. The normative
sample also included children with hearing impairments, deafness, cognitive impairment,
developmental delays, autism, and speech and language deficits, as well as children
learning English as a second language (ESL). The authors have
included a complete profile of cognitive strengths and weaknesses. Scores are presented
for each subtest and skill area. The Leiter was found to correlate with the WISC-IV (r = .88),
the WJ-III (r = .77–.92), and the SB-5 (r = .85). The Leiter-3 has been used to test a wide
variety of individuals, including those with hearing and language disabilities, mental dis-
abilities, cultural disadvantages, or non-English-speaking backgrounds. The Leiter-3
measures dimensions of perceptual and conceptual abilities. In the third edition of the
Leiter, the number of subtests was reduced from 20 to 10 (i.e., Figure-Ground, Form Com-
pletion, Classification and Analogies, Sequential Order, Visual Patterns, Attention Sus-
tained, Forward Memory, Attention Divided, Reverse Memory, and Nonverbal Stroop).
The Leiter-3 is a valuable tool in the assessment of the cognitive abilities of children with
hearing impairments, children with severe expressive or receptive language disabilities,
and adults with severe or profound intellectual disability. The Leiter-3 is also considered
an appropriate instrument for those who have motor and manipulative difficulties in addi-
tion to impaired communication skills.
background information about the examinee and establish the basis for a good working
relationship.
Observations of examinees can provide counselors with information relevant to intel-
ligence assessment. For example, an examiner can observe a test taker’s appearance, emo-
tional responses, social interactions, communication skills, and thought processes, which
can give the examiner insight into the test taker’s functioning. The following is an example
of an observation from a report of an evaluation that included the use of the WISC-IV:
James, a 12-year-old white male, was assessed using the WISC-IV. All testing
was conducted using standard procedures. Due to good lighting, a comfortable
room temperature, and appropriately sized furniture, conditions for all testing
sessions were considered to be adequate. The assessment process was per-
formed in a private room with minimal distractions. James did not wear glasses
or hearing aids during testing.
Rapport was established with James through informal conversation and
maintained adequately for all testing. James provided open and clear responses
to all questions while maintaining adequate eye contact. When asked about his
friends, James was eager to discuss his baseball team and his recent placement
on the all-star team. Overall, James appeared to be a cooperative, happy, and
developmentally appropriate child.
With regard to testing, James indicated verbally that he understood the
reasons for the assessment process and that he understood the directions. His
behavior during testing (e.g., immediately beginning tasks when directed)
also supported his understanding. Overall, the examiner believed that the
results of all tests were a valid estimate of current functioning abilities in all
areas assessed.
Summary
Early interest in measuring intelligence dates back to the late 19th century. Different
scientists had differing ideas about the nature of intelligence and how to measure it, but
all of these ideas contributed to our current understanding of the concept of intelligence.
Numerous theories of intelligence have been presented over the decades, with varied
perspectives about (a) the existence of a g factor, (b) broad and specific abilities that
constitute intelligence, (c) intelligence as a hierarchy of abilities, (d) multiple
intelligences, and (e) information processing.
Intelligence tests measure a broad spectrum of cognitive abilities, such as reasoning,
comprehension, judgment, memory, and spatial ability. Tests can be used for screening,
identification, and placement purposes, as well as a cognitive adjunct to in-depth
psychological evaluation. The most prominent individual intelligence tests are the Wechsler
scales and the Stanford–Binet, although numerous tests are available for multiple purposes
for assessing diverse populations.
There are several controversial aspects of intelligence assessment. Scholars still discuss
the existence of a g factor, debate the issue of heredity and intelligence, are concerned
with test bias, and question the stability of intelligence over time. Despite these issues,
the variety of published intelligence tests has proliferated in the last 100 years.
Suggested Activities
1. Critique an intelligence test discussed in this chapter.
2. Interview a psychologist or counselor who uses intelligence tests. Find out which tests
he or she uses, and why. Report your results to the class.
3. Write a review of the literature or a position paper on one of the following topics: bias
in intelligence testing, history of intelligence testing, or genetic studies of intelligence.
4. Take a group IQ test, score it, and write a report of the results.
5. Read The Bell Curve: Intelligence and Class Structure in American Life by Richard
Herrnstein and Charles Murray (Free Press, 1994) and be prepared to discuss the book in
class. The book is available for download at
https://lesacreduprintemps19.files.wordpress.com/2012/11/the-bell-curve.pdf.
6. Review the WISC-IV scores presented in Table 8.8. How would you describe the test taker’s
overall level of intelligence? How would you describe any relative strengths the test taker
might have? How would you describe any relative weaknesses?
References
Ackerman, P. L., & Beier, M. E. (2012). The problem is in the definition: G and intelligence in I-O psychology. Industrial and Organizational Psychology: Perspectives on Science and Practice, 5(2), 149–153.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Becker, K. A. (2003). History of the Stanford–Binet Intelligence Scales: Content and psychometrics. Itasca, IL: Riverside Publishing.
Braden, J. P., & Athanasiou, M. S. (2005). A comparative review of nonverbal measures of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (2nd ed., pp. 557–578). New York, NY: Guilford.
Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage: Implications in professional psychology. Professional Psychology: Research and Practice, 31, 141–154.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytical studies. New York, NY: Cambridge University Press.
Carroll, J. B. (2012). The three-stratum theory of cognitive abilities. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (2nd ed., pp. 69–76). New York, NY: Guilford.
Cattell, R. B. (1941). Some theoretical issues in adult intelligence testing. Psychological Bulletin, 38, 592.
Cattell, R. B. (1950). Personality. New York, NY: McGraw-Hill.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. Boston, MA: Houghton-Mifflin.
Chamorro-Premuzic, T. (2011). Personality and individual differences. Malden, MA: Blackwell.
Das, J. P., Naglieri, J. A., & Kirby, R. J. (1994). Assessment of cognitive processes. New York, NY: Allyn & Bacon.
Dickens, W. T., & Flynn, J. R. (2001). Heritability estimates versus large environmental effects: The IQ paradox resolved. Psychological Review, 108, 346–369.
Elliott, C. (2007). Differential ability scales: Administration and scoring manual (2nd ed.). San Antonio, TX: PsychCorp.
Flanagan, D. P., & Harrison, P. L. (2012). Contemporary intellectual assessment: Theories, tests, and issues (3rd ed.). New York, NY: Guilford Press.
Flanagan, D. P., McGrew, K. S., & Ortiz, S. (2000). The Wechsler Intelligence Scales and Gf-Gc Theory: A contemporary approach to interpretation. Needham Heights, MA: Allyn & Bacon.
Floyd, R. G., Bergeron, R., Hamilton, G., & Parra, G. R. (2010). How do executive functions fit with the Cattell–Horn–Carroll model? Some evidence from a joint factor analysis of the Delis–Kaplan executive function system and the Woodcock–Johnson III Tests of Cognitive Abilities. Psychology in the Schools, 47(7), 721–738.
Flynn, J. R. (1998). IQ gains over time: Toward finding the causes. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 25–66). Washington, DC: American Psychological Association.
Flynn, J. R. (1999). Searching for justice: The discovery of IQ gains over time. American Psychologist, 54, 5–20.
Flynn, J. R. (2000). IQ gains, WISC subtests, and fluid g: G theory and the relevance of Spearman’s hypothesis to race. In Novartis Foundation Symposium 233 (Ed.), The nature of intelligence (pp. 202–227). Chichester, UK: Wiley.
Flynn, J. R. (2009). What is intelligence? Beyond the Flynn Effect. Cambridge, UK: Cambridge University Press.
Gardner, H. (1993). Multiple intelligences: The theory in practice. New York, NY: Basic Books.
Gardner, H. (2006). Changing minds: The art and science of changing our own and other people’s minds. Boston, MA: Harvard Business School Publishing.
Gardner, H. (2011). The theory of multiple intelligences. In M. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 122–130). New York, NY: Worth Publishers.
Guilford, J. P. (1967). The nature of human intelligence. New York, NY: McGraw-Hill.
Horn, J. L., & Cattell, R. B. (1966a). Refinement and test of the theory of fluid and crystallized general intelligences. Journal of Educational Psychology, 57, 253–270.
Horn, J. L., & Cattell, R. B. (1966b). Age differences in primary mental ability factors. Journal of Gerontology, 21, 210–220.
Horn, J. L., & Cattell, R. B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53–91). New York, NY: Guilford.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Kaufman, A. S. (2000). Tests of intelligence. In R. J. Sternberg (Ed.), Handbook of intelligence (pp. 445–476). New York, NY: Cambridge University Press.
Kaufman, A. S., & Kaufman, N. L. (1993). Manual for Kaufman Adolescent and Adult Intelligence Test (KAIT). Circle Pines, MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (2004a). Kaufman Assessment Battery for Children, second edition manual (KABC-II). Circle Pines, MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (2004b). Kaufman Brief Intelligence Test, second edition manual (KBIT-2). Circle Pines, MN: American Guidance Service.
Kaufman, A. S., Lichtenberger, E. O., Fletcher-Janzen, E., & Kaufman, N. L. (2005). Essentials of KABC-II assessment. Hoboken, NJ: John Wiley & Sons.
Lichtenberger, E. O., & Kaufman, A. S. (2012). Essentials of WAIS-IV assessment. Hoboken, NJ: Wiley.
Lohman, D. F., & Hagen, E. (2001). Cognitive Abilities Test (Form 6). Itasca, IL: Riverside Publishing.
Luria, A. R. (1966). Higher cortical functions in man. New York, NY: Basic Books.
Luria, A. R. (1970). The functional organization of the brain. Scientific American, 222, 66–78.
Luria, A. R. (1980). Higher cortical functions in man. New York, NY: Basic Books.
Mayer, R. E. (2000). Intelligence and education. In R. J. Sternberg (Ed.), Handbook of intelligence (pp. 519–533). Cambridge, UK: Cambridge University Press.
McGrew, K. S. (2005). The Cattell–Horn–Carroll theory of cognitive abilities: Past, present, and future. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (2nd ed., pp. 136–181). New York, NY: Guilford.
McGrew, K. S., & Woodcock, R. W. (2001). Technical manual: Woodcock–Johnson III. Itasca, IL: Riverside Publishing.
Muchinsky, P. M. (2008). Psychology applied to work: An introduction to industrial and organizational psychology (9th ed.). Belmont, CA: Wadsworth.
Naglieri, J. A. (2005). The Cognitive Assessment System. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (2nd ed., pp. 441–460). New York, NY: Guilford.
Naglieri, J. A., & Das, J. P. (1997). The PASS cognitive processing theory. In R. F. Dillon (Ed.), Handbook on testing (pp. 138–163). London, UK: Greenwood Press.
Neisser, U., Boodoo, G., Bouchard, T. J., Jr., Boykin, A. W., Brody, N., Ceci, S. J. . . . Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51(2), 77–101.
Nettelbeck, T., & Wilson, C. (2005). Intelligence and IQ: What teachers should know. Educational Psychology, 25, 609–630.
Otis, A., & Lennon, R. (2003). Otis–Lennon School Ability Test (OLSAT-8) (8th ed.). Marrickville, Australia: Harcourt Assessment.
Pearson Assessment. (2004). WISC-IV integrated perceptual domain case study. San Antonio, TX: Author.
Piaget, J. (1970). The science of education and the psychology of the child. New York, NY: Orion.
Plomin, R., DeFries, J. C., Knopik, V. S., & Neiderhiser, J. M. (2012). Behavioral genetics (6th ed.). New York, NY: Worth.
Plucker, J. A. (Ed.). (2003). Human intelligence: Historical influences, current controversies, teaching resources. Retrieved from http://www.intelltheory.com
Plucker, J. A., & Esping, A. (2014). Intelligence 101. New York, NY: Springer Publishing Co.
Raiford, S. E., & Coalson, D. L. (2014). Essentials of WPPSI-IV assessment. Hoboken, NJ: Wiley.
Raven, J., Raven, J. C., & Court, J. H. (2003). Manual for the Raven’s Progressive Matrices and Vocabulary Scales. Section 1: General overview. Oxford, UK: Oxford Psychologists Press.
Roid, G. H. (2003). Stanford–Binet Intelligence Scales, interpretive manual: Expanded guide to the interpretation of SB5 test results (5th ed.). Itasca, IL: Riverside Publishing.
Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). San Diego, CA: Jerome M. Sattler Publisher, Inc.
Sattler, J. M., & Dumont, R. (2004). Assessment of children: WISC-IV and WPPSI-III supplement. San Diego, CA: Jerome M. Sattler Publisher, Inc.
Schrank, F. A., Flanagan, D. P., Woodcock, R. W., & Mascolo, J. T. (2010). Essentials of the WJ III cognitive abilities assessment. Hoboken, NJ: Wiley.
Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence. New York, NY: Cambridge University Press.
Sternberg, R. J. (1988). The triarchic mind: A new theory of human intelligence. New York, NY: Viking.
Sternberg, R. J. (1997). Successful intelligence. New York, NY: Plume.
Sternberg, R. J. (1999). The theory of successful intelligence. Review of General Psychology, 3, 292–316.
Sternberg, R. J. (2003). Wisdom, intelligence, and creativity synthesized. Cambridge, UK: Cambridge University Press.
Sternberg, R. J., & Berg, C. A. (1986). Definitions of intelligence: A quantitative comparison of the 1921 and 1986 symposia. In R. J. Sternberg & D. K. Detterman (Eds.), What is intelligence? Contemporary viewpoints on its nature and definition (pp. 155–162). Norwood, NJ: Ablex.
Terman, L. M. (1916). The measurement of intelligence. Boston, MA: Houghton Mifflin.
Terman, L. M., & Oden, M. H. (1959). Genetic studies of genius: The gifted group at mid-life: Thirty-five years of follow-up of the superior child (Vol. 5). Stanford, CA: Stanford University Press.
The Psychological Corporation. (1997). WAIS-III–WMS-III technical manual. San Antonio, TX: Author.
The Psychological Corporation. (2002). WPPSI-III technical and interpretative manual. San Antonio, TX: Author.
The Psychological Corporation. (2003). WISC-IV administration and scoring manual. San Antonio, TX: Author.
Thurstone, L. L. (1938). Primary mental abilities. Chicago, IL: University of Chicago Press.
Toga, A. W., & Thompson, P. M. (2005). Genetics of brain structure and intelligence. Annual Review of Neuroscience, 28, 1–23.
Vernon, P. E. (1950). The structure of human abilities. London, UK: Methuen.
Wasserman, J. D. (2003). Assessment of intellectual functioning. In J. R. Graham, J. A. Naglieri, & I. B. Weiner (Eds.), Handbook of psychology: Assessment psychology (Vol. 10, pp. 417–442). Hoboken, NJ: John Wiley & Sons.
Wasserman, J. D., & Tulsky, D. S. (2012). A history of intelligence assessment. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (2nd ed., pp. 3–22). New York, NY: Guilford.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale: Technical manual (3rd ed.). San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2002). Manual for the Wechsler Preschool and Primary Scale of Intelligence (3rd ed.). San Antonio, TX: Psychological Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children (4th ed.). San Antonio, TX: Harcourt Assessment.
Whiston, S. C. (2012). Principles and applications of assessment in counseling (2nd ed.). Belmont, CA: Brooks/Cole.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2007). Woodcock–Johnson III. Itasca, IL: Riverside Publishing.
CHAPTER 9
Assessment of Achievement
Assessing Achievement
Achievement can be defined as an individual’s knowledge or skills in a particular content
area in which he or she has received instruction (American Educational Research
Association (AERA), American Psychological Association (APA), & National Council on
Measurement in Education (NCME), 2014). A diverse array of tests and procedures can be
used to assess achievement, including achievement test batteries, individual and diagnos-
tic tests, subject-area tests, curriculum-based assessment and measurement, statewide
assessment programs, observations, and more. Instruments that assess achievement are
composed of items that require test takers to demonstrate some level of knowledge or
skill. The results of achievement testing inform individuals and relevant parties (e.g.,
parents, teachers, school administrators, and other helping professionals) about an individual’s
academic strengths and weaknesses or any learning difficulties. In addition, achievement
tests may be used to monitor individual achievement and to evaluate the effectiveness of
educational and social programs. In schools, they provide teachers and administrators
with information for planning or modifying the curriculum to better serve a particular
student or group of students.
Scores on achievement tests are often related to scores on other ability measures, such
as intelligence tests and aptitude tests. As stated in Chapter 8, all three (achievement, intel-
ligence, and aptitude tests) may have some overlap in content, but they are distinct in
terms of their focus. Achievement tests focus more on the present—that is, what an indi-
vidual knows or can do right now. However, achievement tests are also considered an
excellent predictor of academic performance and are frequently used to predict future
performance in educational programs (e.g., college) or vocational programs.
It is highly unlikely that an individual can graduate high school without participat-
ing in some form of achievement testing. In the schools, many informal teacher-made tests
constructed to reflect the learning objectives specific to a particular teacher, course, or unit
of instruction are considered informal achievement tests. In contrast, standardized achieve-
ment tests are commercially developed tests designed to reflect learning outcomes and
content common to the majority of U.S. schools. Each year, schools throughout the world
administer standardized achievement tests to large groups of students to determine stu-
dent performance levels and overall school performance. Some standardized tests (par-
ticularly diagnostic tests and individual achievement tests) are designed specifically to
identify students’ strengths and weaknesses or the presence of a learning problem.
Teacher-made tests differ from standardized tests in terms of the quality of test items, the
reliability of test scores, administration and scoring procedures, and interpretation of test
scores (Miller, Linn, & Gronlund, 2012). Because of their differences, teacher-made tests
and standardized tests are complementary rather than opposing methods of evaluating
achievement, and both types can be employed to assess student achievement (Aiken &
Groth-Marnat, 2006). In fact, Phelps (2012) examined studies from 1910 to 2010 and discov-
ered that testing with feedback had a significant impact on overall achievement.
In this chapter, we will continue our discussion of achievement testing, focusing pri-
marily on standardized achievement tests. Although achievement testing is performed in
various settings (e.g., schools, business, industry, military), much of our discussion will
center on standardized achievement testing in the schools.
Stanford Achievement Test, 10th Edition The Stanford Achievement Test Series
(Stanford 10) is one of the leading test batteries utilized by school districts in the United
States. The original Stanford Achievement Test was first published over 80 years ago. The
Stanford 10 test series, sold only to school districts, is a norm-referenced, standardized
battery of tests designed to measure school achievement from kindergarten through
grade 12. These 13 test levels are divided into three groups: the Stanford Early School
Achievement Test (SESAT) for students in kindergarten and the first half of first grade, the
Stanford Achievement Test for students in the second half of first grade through ninth grade,
and the Stanford Test of Academic Skills (TASK) for grades 9 through 12 and for beginning
college students. The Stanford 10 consists of several composites and subtests assessing
major academic content areas. The content areas vary somewhat depending on the level
(K–12) and the group (SESAT, Stanford Achievement Test, Stanford Test of Academic
Skills). The following are some content areas assessed in the series:
• Total Reading Measures reading skills, such as word sounds and spellings, deter-
mining word meanings and synonyms, and reading comprehension. This section
includes subtests in Reading Vocabulary, Reading Comprehension, and Word
Study Skills.
• Total Mathematics Measures problem-solving skills involving number sense, arithmetic
operations, patterns and algebra, data and probability, geometry, and measurement
concepts. Two subtests comprise Total Mathematics: Mathematics Problem
Solving and Mathematics Procedures.
• Science Measures students’ understanding of life science, Earth science, physical sci-
ence, and the nature of science.
• Language Measures students’ application of language principles in writing, includ-
ing capitalization, punctuation, word usage, sentence structure, organization, and
composing and editing.
• Spelling Measures students’ ability to recognize the correct spelling of words.
• Social Science Measures achievement in history, geography, political science, and
economics.
• Listening Measures recognition of spoken words and the ability to construct mean-
ing from dictated material.
The Stanford 10 can be administered as either the full-length battery or the abbrevi-
ated battery. It offers a variety of item formats, including multiple-choice, short-answer,
and extended-response questions. Besides requiring a written answer of five or six sen-
tences, the extended response may also require the student to graph, illustrate, or show
work. An interesting feature of the Stanford 10 is the arrangement of test items. Rather
than the typical “easy to hard” arrangement found in most achievement tests, the Stanford
10 mixes easy items with difficult items. The rationale is that in the traditional arrangement,
200 Chapter 9 • Assessment of Achievement
Source: From Stanford Achievement Test: Tenth Edition. Copyright © by Pearson Education, Inc. and/or its
affiliates. Reproduced with permission. All rights reserved.
students tend to get frustrated as they encounter consistently harder items and “give up,”
but with the “easy-hard-easy” format, difficult questions are surrounded by easy ques-
tions to encourage students to complete the test. The test is also purported to be the only
achievement test battery to provide realistic, full-color illustrations, which is thought to
improve student motivation.
The Stanford 10 provides several types of scores, including scaled scores, percentile
ranks, stanines, grade equivalents, and NCEs (see Figure 9.1). Stanines of 1, 2, and 3 are
considered below average; 4, 5, and 6 are average; and 7, 8, and 9 are above average. Because
the Otis–Lennon School Ability Test (OLSAT 8) is often administered in conjunction
with the Stanford 10, the Stanford 10 score summary provides an Achievement/Ability
Comparison (AAC) Range, which describes a student’s scores on the Stanford 10 in rela-
tion to the scores of other students of similar ability as measured by the OLSAT 8. Com-
pared to other students, a student’s AAC score is considered high if it falls in the top 23%,
middle if in the middle 54%, and low if in the bottom 23%. Scores categorized as middle
indicate that the student’s ability (i.e., intelligence) and achievement are about the same;
the student is working up to his or her level of ability. High range scores indicate that the
student’s achievement is actually above what would be expected. Low scores indicate that
the student’s achievement is not as high as may be expected.
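The stanine bands and AAC ranges described above amount to simple cutoff rules. The following Python sketch is illustrative only and does not reproduce the publisher's scoring procedure. The function names are hypothetical, the stanine boundaries are the conventional percentile cutoffs (4, 11, 23, 40, 60, 77, 89, 96), and the treatment of a score that falls exactly on a boundary is an assumption. The AAC function further assumes that the student's percentile standing among students of similar ability is already known.

```python
from bisect import bisect_right

# Conventional lowest percentile rank belonging to stanines 2 through 9.
STANINE_LOWER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_stanine(percentile_rank):
    """Map a percentile rank (1-99) to a stanine (1-9)."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    return bisect_right(STANINE_LOWER_BOUNDS, percentile_rank) + 1


def stanine_band(stanine):
    """Describe a stanine using the three bands given in the text."""
    if not 1 <= stanine <= 9:
        raise ValueError("stanine must be between 1 and 9")
    if stanine <= 3:
        return "below average"
    if stanine <= 6:
        return "average"
    return "above average"


def aac_range(percentile_in_ability_group):
    """Classify standing among students of similar ability.

    Assumption: the bottom 23% is taken as ranks below 23 and the
    top 23% as ranks of 77 and above; the rest is the middle 54%.
    """
    if not 1 <= percentile_in_ability_group <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    if percentile_in_ability_group < 23:
        return "low"
    if percentile_in_ability_group >= 77:
        return "high"
    return "middle"


if __name__ == "__main__":
    for pr in (10, 50, 90):
        stanine = percentile_to_stanine(pr)
        print(pr, stanine, stanine_band(stanine), aac_range(pr))
```

Under these cutoffs, stanines 1 to 3 cover roughly the bottom 23% of scores, stanines 4 to 6 the middle 54%, and stanines 7 to 9 the top 23%, the same split used for the AAC ranges.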
The psychometric properties of the Stanford 10 are generally strong, with internal
consistency reliability coefficients ranging from the mid-.80s to .90s and alternate forms
coefficients in the .80s. Evidence of content validity was based on a well-defined test blue-
print and a test-development process that involved extensive review of several national and
state educational standards. For example, the mathematics subtests were developed with
careful consideration of the National Council of Teachers of Mathematics (NCTM) publica-
tion Principles and Standards for School Mathematics (2000), which emphasizes the necessity of
problem solving as the focus of school mathematics. Even with the strong content validity,
test users are advised to review the test to determine whether its content matches the
curriculum and goals of the particular school. Evidence of convergent validity is provided
through numerous correlations between the various subtests and totals of the Stanford 10
with the Stanford 9 subtests. Construct validity evidence is shown through strong correla-
tions between the Stanford 10 and the OLSAT. In order to increase your understanding of
the Stanford 10 and the principles of achievement testing, complete Exercise 9.1.
Exercise 9.1
Achievement Testing: Marcus
Marcus, age 9 years 8 months, is in the fourth grade and is having some academic
difficulties in school. He was administered the Stanford 10 (see summary report below).
After reviewing Marcus’s summary report, answer the exercise questions.
Wide Range Achievement Test, Fourth Edition A widely used individual achievement
test is the Wide Range Achievement Test, Fourth Edition (WRAT4). The WRAT4 is a test of basic
academic skills for individuals aged 5 years to 74 years, 11 months. It takes approximately
30 minutes to complete and is often administered as a screening test to determine if a more
comprehensive achievement test is needed. As with most individual achievement tests, the
results of the WRAT4 by themselves are not intended to provide formal identification of
learning or cognitive disorders. The WRAT4 includes four subtests and one composite:
1. Word Reading Measures letter and word decoding through letter identification and
word recognition.
2. Sentence Comprehension Measures an individual’s ability to gain meaning from
words and to comprehend ideas and information contained in sentences through the
use of a modified cloze technique.
3. Spelling Measures an individual’s ability to encode sounds into written form through
the use of a dictated spelling format containing both letters and words.
4. Math Computation Measures an individual’s ability to perform basic mathematics
computations through counting, identifying numbers, solving simple oral problems,
and calculating written mathematics problems.
5. Reading Composite A composite raw score calculated from the Word Reading and
Sentence Comprehension subtests.
The WRAT4 has two alternate forms (Blue and Green) that can be used interchange-
ably as pretest and posttest measures or can be combined for a more comprehensive
evaluation. Standard scores, percentile ranks, stanines, NCEs, and grade equivalents are
provided. The WRAT4 was standardized on a representative national sample of over
3,000 individuals selected through stratified sampling based on age, gender,
ethnicity, geographic region, and parental education. In order to help further your under-
standing of the WRAT4, complete Exercise 9.2. Review the case study of Maria, her
grades, and her WRAT4 scores. After considering all of the information, respond to the
exercise questions.
Mathematics Tests
• Calculation Perform mathematical computations, from simple addition to complex
equations.
• Math Fluency Calculated speed of performing simple calculations for 3 minutes.
• Applied Problems Oral, math “word problems,” solved with paper and pencil.
The WJ III ACH provides a variety of scores, including age and grade equivalents, raw
score or number correct, percentile ranks, and standard scores (mean = 100; SD = 15). It
also provides verbal labels for standard score ranges: standard scores of 151 and above are
exceptionally superior, 131 to 150 are very superior, 121 to 130 are superior, 111 to 120 are high
average, 90 to 110 are average, 80 to 89 are low average, 70 to 79 are low, 50 to 69
are very low, and 49 and below are exceptionally low.
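The verbal labels above can be read as a lookup on the standard score. The sketch below is a hypothetical illustration, not the test's scoring software. It assumes the "very low" band runs from 50 to 69, and it estimates percentile ranks from a normal curve with mean 100 and SD 15. Published norm tables may differ slightly from this approximation.

```python
from statistics import NormalDist

# (lowest standard score in the band, verbal label), highest band first.
LABEL_BANDS = [
    (151, "exceptionally superior"),
    (131, "very superior"),
    (121, "superior"),
    (111, "high average"),
    (90, "average"),
    (80, "low average"),
    (70, "low"),
    (50, "very low"),  # assumption: this band covers 50 to 69
]

# Standard-score metric stated in the text: mean = 100, SD = 15.
STANDARD_SCORE_SCALE = NormalDist(mu=100, sigma=15)


def verbal_label(standard_score):
    """Return the verbal label for a standard score."""
    for lowest_score, label in LABEL_BANDS:
        if standard_score >= lowest_score:
            return label
    return "exceptionally low"  # 49 and below


def approximate_percentile(standard_score):
    """Estimate a percentile rank from the normal curve (not a norm table)."""
    return round(STANDARD_SCORE_SCALE.cdf(standard_score) * 100)


if __name__ == "__main__":
    for score in (65, 85, 100, 112, 135):
        print(score, verbal_label(score), approximate_percentile(score))
```

For instance, a standard score of 112 falls in the high average band and sits at about the 79th percentile under the normal-curve approximation.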
Exercise 9.2
Wide Range Achievement Test: Maria
Maria is a 10th-grade student who was referred to the school counselor because of
her unacceptable behavior in the classroom and her poor peer relationships.
School personnel are concerned about Maria’s inability to function well in a school
setting. Although she exhibits problematic behavior, it is not severe enough for
suspension. Maria has an excellent vocabulary and a good sense of humor. However, she
does not know how to respond to teasing or correction and seems to enjoy telling stories
about how she gets back at people who offend her. In the classroom, Maria is frequently
off task, asks inappropriate questions, comes unprepared, wanders about the classroom,
and seeks to make herself the center of attention. Outside of the classroom, she keeps to
herself, but she stands out in the crowd. Her mannerisms and dress set her apart. She
appears to be preoccupied and also seems to fear her parents, not wanting them to know
what happens in school.
To obtain more information about Maria, the school counselor looked at her current
grades as well as her scores on the WRAT4.
English II C
Geometry C
Biology D
World History C
Physical Education B
Spanish I C
Additional Individual Achievement Tests The Basic Achievement Skills Individual Screener
(BASIS) is designed for students from grades 1 through 9. It includes three subtests—
reading, spelling, and mathematics—and an optional writing exercise. Both criterion ref-
erenced and norm referenced, the test provides beginning-of-the-grade and
end-of-the-grade norms as well as age and adult norms. It takes about 1 hour to adminis-
ter and is designed to help examiners formulate individual educational plans for excep-
tional and special populations.
The Peabody Individual Achievement Test-Revised/Normative Update (PIAT-R/NU) was
designed for individuals aged 5 to adulthood. The PIAT-R/NU takes about 60 minutes to
administer and assesses achievement in six content areas: General Information, Reading
Recognition, Reading Comprehension, Mathematics, Spelling, and Written Expression.
The test yields age and grade equivalents, percentile ranks, NCEs, and stanines.
The Wechsler Individual Achievement Test, Second Edition (WIAT-II) was developed for use in
conjunction with the Wechsler Intelligence Scale for Children-IV (WISC-IV). The WIAT-II
has four composites, each composed of specific subtests: (1) Reading (Word Reading,
Reading Comprehension, and Pseudoword Decoding subtests), (2) Mathematics
(Numerical Operations and Math Reasoning subtests), (3) Written Language (Spelling and
Written Expression subtests), and (4) Oral Language (Listening Comprehension and Oral
Expression subtests). The test can be given to individuals aged 4 through 85. It yields
standard scores, percentile ranks, stanines, NCEs, and age and grade equivalents. In order
to further your understanding of achievement assessment, complete Exercise 9.3. This
exercise provides an opportunity to examine the relationship among intelligence, achieve-
ment, and academic functioning. After reading the case material and considering it in the
context of what you have learned thus far, complete the exercise questions.
Diagnostic Tests
The primary purpose for using diagnostic tests is to identify an individual’s academic
strengths or weaknesses and then to design an educational program or intervention to
meet this person’s needs. As compared to achievement batteries and individual achieve-
ment tests, diagnostic tests usually have a narrower focus (on one skill or knowledge area),
contain a larger number and variety of test items, and usually take longer to administer.
The majority of diagnostic tests are in reading, but tests in math and language are widely
used as well.
Diagnostic tests are also used for identifying and diagnosing learning disabilities.
Learning disability is defined by the Individuals with Disabilities Education Improvement
Act of 2004 (IDEA) as “a disorder in one or more of the basic psychological processes
involved in understanding or in using language, spoken or written, which disorder may
manifest itself in the imperfect ability to listen, think, speak, read, write, spell, or do math-
ematical calculations” (US Department of Education, 2004, para. c.10). In education, diag-
nosing learning disabilities in students who show signs of difficulties involves a process of
monitoring their progress through a series of increasingly intensive, individualized
instructional or behavioral interventions. In addition, assessment instruments (such as
diagnostic achievement tests) are administered as part of the evaluation process. IDEA
mandates that achievement tests must measure eight areas of achievement (e.g., oral lan-
guage, basic language, total reading, reading comprehension and fluency, written expres-
sion, mathematics, math fluency, total achievement).
Exercise 9.3
Achievement and Intelligence Assessment: Danny
Danny, age 9, is a third grader at Highland Elementary School. He has been referred
to the school counselor due to concerns expressed by his teacher about academic
and behavioral issues.
Background Information
Danny’s parents are divorced, and he currently lives with his grandparents. Danny’s
father graduated from high school and works as a heating and air-conditioning
technician; his mother earned her GED after dropping out of high school and works as a
waitress. Danny has lived with his grandparents since he was age 4. Last year, the
Department of Social Services investigated his living situation with his grandparents
due to allegations of neglect. His grandparents lack interest in how he is doing at
school. Over the last couple of years, his grandparents have moved three times; as a
result, Danny has changed schools three times.
Danny currently belongs to no organizations or clubs, and he does not have any close
friends. He frequently starts arguments in class and on the playground. He is very
outspoken. In terms of academics, Danny is not meeting basic educational standards. He
fails to do his homework most of the time. He was retained in third grade.
Danny has interest in making things; this is evident in his art class. He has special
skills in making things with his hands. He enjoys reading, but he reportedly watches TV
for several hours every day. Danny’s teacher commented on his family environment and
said that his frequent moves have deprived Danny of companions. The teacher also
commented on Danny’s constant battle with fellow classmates in class and that he doesn’t
seem to have any friends. The teachers believe that Danny acts out to gain attention.
The school counselor decides to review Danny’s academic records. She reviews his report
cards from first grade to the present and his test records.
Test Records
Wechsler Intelligence Scale for Children (Administered in Third Grade—This Year)
Index Scores                        Standard Score    Percentile Rank    Descriptive Category
Verbal Comprehension (VCI)          112               79
Perceptual Reasoning Index (PRI)    108               70
Working Memory Index (WMI)          108               70
Processing Speed Index (PSI)        110               75
Table 9.3 provides a list of just a few of the many available diagnostic tests. Note that
some individual achievement tests (e.g., the Kaufman Test of Educational Achievement,
the Peabody Individual Achievement Test-Revised, the Woodcock–Johnson III Tests of
Achievement) can also be used for diagnostic purposes. We will provide further informa-
tion about two well-known diagnostic tests: the KeyMath-3 Diagnostic Assessment and
the Peabody Picture Vocabulary Test.
children and adults for many years. It was first published in 1959, and its most recent edition
was published in 2007. The PPVT-4 is an individually administered, norm-referenced
test designed for people aged 2 years 6 months through 90 years and older. It assesses
the ability to comprehend spoken words in Standard American English and is often used
for screening individuals from diverse linguistic backgrounds or those who have lan-
guage or communication disorders. The instrument can also be used with individuals
who have autism, cerebral palsy, or other physical disabilities. The PPVT has 228 items
that cover a broad range of receptive vocabulary levels, from preschool through adult.
Items consist of words representing 20 content areas (e.g., actions, vegetables, tools) and
parts of speech (e.g., nouns, verbs, adjectives). The test is available in two parallel forms:
Form A and Form B.
Each of the 228 items consists of four full-color pictures arranged on one page. A nonthreatening
approach is used to assess vocabulary knowledge: while showing the examinee a set
of four pictures, the examiner says, “I will say something; then I want you to put your
finger on the picture of what I have said. Let’s try one.” Then, the examiner uses a prompt
that precedes the stimulus word: “Put your finger on the star.” The test taker indicates
which picture best represents the word spoken by the examiner. Test administration takes
approximately 10 to 15 minutes, and scoring is rapid and objective, usually occurring dur-
ing test administration.
The PPVT-4 provides standard scores (with a mean of 100 and a standard deviation
of 15), confidence intervals, growth scale values (GSV), percentiles, NCEs, stanines, and
[Figure: normal curve marked from -5 SD to +4 SD with percentile, NCE, and stanine scales; score descriptions are Extremely Low Score, Moderately Low Score, Low Average Score, High Average Score, Moderately High Score, and Extremely High Score; a score summary follows.]
FIGURE 9.3 Graphical profile of the Peabody Picture Vocabulary Test (PPVT-4).
Source: Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4). Copyright © 2007 by NCS Pearson, Inc.
Reproduced with permission. All rights reserved. “PPVT” is a trademark, in the U.S. and/or other countries, of
Pearson Education, Inc. or its affiliate(s).
Subject-Area Tests
The term subject-area tests generally refers to standardized achievement tests that measure
content-area knowledge in specific academic or vocational subject areas. They usually
have more items than an achievement battery’s subtests, providing a better sample of per-
formance and more reliable scores on specific subject areas (Miller et al., 2012). Like other
types of achievement tests, subject-area tests measure an individual’s knowledge and skills
in a particular area; however, they are also used to predict how well an individual will
perform in the future (i.e., aptitude).
Subject-area tests are part of major national testing programs used by colleges and
universities to determine advanced placement or credit or as part of the requirements for
admission. For example, some colleges and universities require students to take a particu-
lar subject-area test offered by the SAT for admission into competitive programs. The SAT
offers subject tests in such areas as English, history, mathematics, science, and language.
The College-Level Examination Program (CLEP) from the College Board also offers sub-
ject-area tests in 34 areas, including composition, literature, foreign languages, history,
science, math, and business, to name just a few. Students can take a CLEP subject-area test
and, by obtaining a passing score, earn college credit.
Subject-area tests can also be used as part of certification requirements. For example,
teachers often have to earn a qualifying score on a subject-area exam to become certified to
teach; the particular subject-area test corresponds to the academic subject area in which
they are seeking certification.
from norm-referenced tests in that they focus on mastery of a given objective or skill.
Norm-referenced tests usually include only one or two items to measure a given objective,
whereas criterion-referenced tests include many items on a specific objective. The criterion-
referenced test is scored to an absolute standard, usually the percentage of correct answers.
Students may be required to meet a certain score on a criterion-referenced achievement
test, say 70%, as evidence of mastery. In systems requiring mastery of objectives, the test
taker is not allowed to go on to the next unit until he or she passes the one being studied.
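Scoring against an absolute standard can be illustrated with a short Python sketch. It is a hypothetical example, not the procedure of any published test: the function names and unit data are invented for illustration, and the 70% cutoff is simply the example figure used in the text.

```python
def percent_correct(num_correct, num_items):
    """Return the percentage of items answered correctly."""
    if num_items <= 0 or not 0 <= num_correct <= num_items:
        raise ValueError("need 0 <= num_correct <= num_items and num_items > 0")
    return 100 * num_correct / num_items


def has_mastered(num_correct, num_items, cutoff_percent=70):
    """Compare the score with an absolute standard, not with other test takers."""
    return percent_correct(num_correct, num_items) >= cutoff_percent


def next_unit_to_study(unit_results, cutoff_percent=70):
    """Return the first unit not yet mastered, or None if all are mastered.

    unit_results is an ordered list of (unit_name, num_correct, num_items).
    The test taker does not move on until the current unit is passed.
    """
    for unit_name, num_correct, num_items in unit_results:
        if not has_mastered(num_correct, num_items, cutoff_percent):
            return unit_name
    return None


if __name__ == "__main__":
    results = [("Unit 1", 9, 10), ("Unit 2", 6, 10), ("Unit 3", 10, 10)]
    print(next_unit_to_study(results))
```

Note that the decision depends only on the individual's own percentage correct. No comparison with a norm group enters the calculation, which is the key contrast with norm-referenced scoring.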
Criterion-referenced tests can be used for other purposes:
1. To evaluate the curriculum
2. To identify topics that should be remediated or enriched
3. To provide information for the counselor to use in educational and vocational plan-
ning with students
4. To help students select courses
5. To document student mastery of objectives
6. To provide systematic evidence of student attainment of objectives across levels and
fields over time
7. To help the counselor mark the progress of the individual over time
An example of a criterion-referenced test is the College Basic Academic Subjects Exami-
nation (College BASE), which assesses the degree to which college students have mastered
particular skills and competencies consistent with the completion of general education
coursework. The instrument measures achievement in four subject areas: English, mathe-
matics, science, and social studies. As of 2001, the exam was used by approximately 150
institutions nationwide. The users included both community colleges and 4-year institu-
tions whose Carnegie designations ranged from baccalaureate college to research univer-
sity. Some universities use the College BASE for admission into a particular program. For
example, Missouri requires all institutions to use the College BASE as a screening tool for
admission into educator preparation programs.
Another example of a criterion-referenced achievement test is the Criterion Test of
Basic Skills (CTOBS-2), which was designed to measure reading and arithmetic skills in
students from grades 6 through 11. The Reading subtest assesses basic word-attack
skills in the following areas: letter recognition, letter sounds, blending, sequencing,
decoding of common spelling patterns and multisyllable words, and sight-word recog-
nition. The Arithmetic subtest assesses skills in these areas: counting, number concepts
and numerical recognition, addition, subtraction, multiplication, division, measure-
ment concepts, fractions, decimals, percents, geometric concepts, prealgebra, and
rounding and estimation.
Criterion-referenced tests are often referred to as minimum-level skills tests, particu-
larly in education, where students must earn a minimum score on a test to be promoted to
the next grade level, graduate, or be admitted into a desired educational program. For
example, satisfactory performance on the State of Texas Assessments of Academic Readi-
ness (STAAR) is required to receive a high school diploma in Texas. Although the STAAR
tests are minimum-level tests, they are still high-stakes tests. Failure on a single test in the
STAAR battery could prevent graduation. Tests like the STAAR are controversial because
of the impact on school funding and the identified disparities in achievement scores by
race. For example, the results of the 2013 STAAR testing showed a large disparity when
comparing White and Asian students with Black and Hispanic students (Texas Education
Agency, 2014). Over 25% of Black and Hispanic students in Texas did not attain passing
scores on the STAAR in 2013.
Because of issues like the ones with STAAR, states have been retiring assessment
instruments and implementing new ones. Achievement testing is mandated by legisla-
tures in many states, because federal funding for schools is tied to the No Child Left Behind
Act (NCLB) of 2002, and each state has its own process for achievement testing. Because
the approaches vary from state to state, the issues are varied. We discuss state achievement
tests in detail in the next section.
Performance Assessment
Performance assessment (also called authentic assessment or alternative assessment) is a form of
assessment in which students are asked to perform real-world tasks that demonstrate
meaningful application of essential knowledge and skills. The tasks are either replicas of
or analogous to the kinds of problems faced in the real world. Tasks are chosen that would
allow students to demonstrate mastery of a given educational standard.
Mueller (2006) distinguished performance assessment from traditional assessment in
several ways. The most obvious is the type of items on the test. Traditional assessments
typically consist of multiple-choice, true/false, matching, and other similar items, whereas
performance assessments can include essays, projects, portfolios, performance tasks, and
open-ended exercises. Mueller also stated that traditional approaches to assessment are
typically curriculum driven. The curriculum (which delivers a predetermined body of
knowledge and skills) is developed first, then assessments are administered to determine
if students have acquired the intended knowledge and skills. In contrast, with perfor-
mance assessment, the assessment drives the curriculum. That is, teachers first determine the
tasks that students will perform to demonstrate their mastery, and then a curriculum is
developed that will enable students to perform those tasks well, which would include the
acquisition of essential knowledge and skills.
Teachers often use a mix of traditional and performance assessments to serve differ-
ent purposes (Mueller, 2006). For example, traditional assessment may be used to assess
whether a student has acquired specific knowledge and skills, and performance assess-
ment may be used to measure the student’s ability to apply the knowledge or skills in a
real-world situation. In other cases, both traditional assessment and performance assess-
ment are used to gather a more comprehensive picture of performance. For example,
achievement in STEM areas can be difficult to measure accurately. In order to have a com-
plete picture, it may be necessary to measure both macro- and microunderstanding of
concepts (Kim, VanTassel-Baska, Bracken, Feng, & Stambaugh, 2014). In order to do this in
a comprehensive manner, both traditional standardized assessments and performance
assessments may prove useful.
Portfolio Assessment
Portfolio assessment is a type of performance assessment widely used in educational set-
tings as a means of examining and measuring students’ progress by reviewing a collection
of work in one or more curriculum areas (McDonald, 2012). Portfolios can vary in style and
format. In some cases, portfolios can be viewed as a kind of scrapbook or photo album that
documents the progress and activities of a student through his or her educational pro-
gram. In other cases, portfolios are more comprehensive pictures of students’ academic
activities. At Lamar University, all graduate students in counseling complete portfolios as
part of their graduate training. In their portfolios, students are required to include key
artifacts (major course projects) from each course that tie to primary learning outcomes.
Students can also add field experience documents, resumes, disclosure statements, and
other materials. These portfolios are reviewed and graded by a team of at least three pro-
fessors as the capstone project for the degree program.
As can be seen from our examples, the contents of portfolios can include a variety of
items (sometimes called artifacts or evidence), such as writing or other work samples,
drawings, photos, video or audio recordings, computer disks, or copies of other tests or
assessment instruments. Portfolio artifacts are not selected haphazardly; rather, they are
chosen through a dynamic process of planning, reflecting, collecting, and evaluating that
occurs throughout an individual’s entire instructional program (McDonald, 2012). Arti-
facts can come from the student, teachers, parents, education staff, and other relevant
community members. Furthermore, self-reflections from students are also a common
component of portfolios.
• Rigor of the Curriculum The extent to which students take challenging, higher-level
academic courses affects achievement test scores.
• Teacher Knowledge and Skills Students perform better on achievement tests when
taught by teachers with appropriate training and education for the subject they are
teaching.
• Teacher Experience and Attendance Research indicates that students learn more
from teachers with experience than they do from novice teachers.
• Class Size Smaller class sizes (20 students or fewer) are associated with greater academic
achievement in students, especially for elementary school students and for
students from low-income or diverse backgrounds.
• Technology Access to technology-assisted instruction increases academic achievement.
• School Safety Students perform better academically when they feel safe at school.
• Other Relevant Student Data Attendance data, disciplinary records, grade reten-
tion, and homework completion
• Contextual Data Factors such as the student’s cultural background, linguistic back-
ground, gender, out-of-school experiences, health and nutrition, self-concept, and
socioeconomic level
methods, factors impacting achievement, and other issues. As a counselor, you might con-
sult with the teacher and help to uncover the issues impacting achievement.
Although there may be a number of causes, the first step in an investigation is to exam-
ine the purpose of the test. The scores in this example might be low in some areas because the
test was given early in the year, before the students had been exposed to many of the con-
cepts and objectives taught in the sixth grade. Some school districts intentionally give this
test at the beginning of the year to obtain an overall estimate of the entry level of the students.
If you didn’t understand the purpose of the test in this example and tried to interpret the
scores, you could develop some poor perceptions of student performance. In order to further
your understanding of score interpretation, complete Exercise 9.4. Consider the case of
James, the background information, and data. Relate the information to the material you
have read in this chapter as well as others and then respond to the exercise questions.
Exercise 9.4
Interpreting Achievement: James
The case study of James illustrates the process involved in interpreting achievement
data. James is in fifth grade and is having some academic difficulties in school. He is
10.3 years old, weighs 80 pounds, and is 4 feet 9 inches tall. A graph of his scores on
the Stanford Achievement Test is shown in Figure 9.4. The derived scores plotted on the
graph are percentile bands and represent how James performed in relation to students in
the national sample. The median is the 50th percentile and represents typical or average
performance. Some tests use different systems to classify superior or poor performance.
In this case, the Stanford 10 uses a classification of below average, average, and above
average. Counselors need to be familiar not only with what scores mean, but also with
what is measured by each subtest and which scores are combined to give total scores. The
Total Reading score, comprised of the Reading Vocabulary and Reading Comprehension
subtests, helps identify students who might have deficiencies in reading comprehension
as well as overall scholastic achievement. The Total Mathematics score includes
Mathematics Problem Solving and Mathematics Procedures subtests. The Listening score is
based on the combination of the Vocabulary and Listening Comprehension scores, which
provide data on the learner’s ability to understand and remember information presented
orally.
The scores on the Stanford 10 graph show that James scored high in relation to other
fifth graders in Total Mathematics; clearly, his strongest area of achievement is
mathematics. His score on the Listening test is below average as compared to his peers.
He is also below average in Total Reading, Language, Spelling, and Science. His score in
Social Science is average.
Additional Information
One of the first questions to consider is how James’s current test performance
compares with other achievement data,
Total Reading 28
Total Mathematics 76
Language 22
Spelling 13
Science 6
Social Science 38
Listening 3
Complete Battery 28
1 10 30 50 70 90 99
Below Average Average Above Average
such as his performance in class, other achievement test results, and report cards. In
terms of his class performance, James’s teacher reports that he is an average student in
mathematics but has problems in reading comprehension and following directions. He does
not seem to listen at times and cannot remember what he is supposed to do. His problems
in reading comprehension also create problems in understanding his science and social
studies materials.
James’s results on the Stanford 10 can also be compared with the results of other
achievement tests. For example, James recently completed a local minimum-skills-level
test (a criterion-referenced test) that was based specifically on the educational
objectives taught in James’s class (in contrast to the Stanford 10, which is based on a
wide sampling of objectives and textbooks from schools across the nation). On the
minimum-skills-level test, James failed the following areas (i.e., he had fewer than 75%
of the items correct): vocabulary, listening comprehension, synonyms, antonyms,
sequencing, facts and opinions, recalling details, main ideas, and sentence completion.
This information combined with his score on the Stanford 10 Total Reading subtest (James
scored at the 28th percentile) indicates that the problem of reading comprehension is
very real. However, his reading problem did not seem to affect his performance in Total
Mathematics, in which he scored at the 76th percentile. Test results are inconsistent in
this dimension.
Ability Data
We can also compare James’s Stanford 10 scores to his ability. In the schools, ability
typically refers to intellectual ability. If a student has been previously referred for
assessment, then school counselors may have access to the results of such instruments as
the Slosson Intelligence Test, the Otis–Lennon School Ability Test, the Stanford–Binet
Intelligence Scale, or the WISC‑IV. Ability testing would help in developing an
individually prescribed educational program for James, but no intelligence data were
available in his cumulative folder.
Other Relevant Data
Other relevant student data may include James’s attendance record, disciplinary records,
grade retention, and homework completion. James has progressed normally in school; he was
never retained at a grade level and has been at the same school since the beginning of
the third grade. He did attend two schools in other states prior to that time. James has
not missed many days of school.
Contextual Data
James is the oldest child of three siblings; he has an 8-year-old brother and a
3-year-old sister. He lives with his mother and his stepfather, who works as a carpenter
for a construction company. His mother works part-time as a checkout clerk in a
supermarket. His father is in the Navy, and his parents were divorced when James was in
kindergarten. James’s mother is concerned about his progress and is open to any
suggestions the teacher or counselor may have to improve his academic performance. James
relates fairly well with his classmates, but gets easily frustrated when he cannot
understand his lessons or recite correctly for the teacher. He likes to perform duties
around the classroom and has become strongly attached to his teacher.
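The way James's percentile scores are sorted into below average, average, and above average
can be sketched in a few lines of Python. This is an illustration only: the cutoffs of 30
and 70 are assumptions read from the axis of Figure 9.4 and are not a published Stanford 10
scoring rule, and the names classify_percentile and summarize_profile are hypothetical.

```python
# Percentile scores for James as reported in Figure 9.4.
JAMES_PERCENTILES = {
    "Total Reading": 28,
    "Total Mathematics": 76,
    "Language": 22,
    "Spelling": 13,
    "Science": 6,
    "Social Science": 38,
    "Listening": 3,
    "Complete Battery": 28,
}


def classify_percentile(percentile, low=30, high=70):
    """Label a percentile score using ASSUMED cutoffs (30 and 70)."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile scores run from 1 to 99")
    if percentile < low:
        return "below average"
    if percentile > high:
        return "above average"
    return "average"


def summarize_profile(profile):
    """Return a label for every subtest in a score profile."""
    return {name: classify_percentile(score) for name, score in profile.items()}


if __name__ == "__main__":
    for name, label in summarize_profile(JAMES_PERCENTILES).items():
        print(f"{name}: {JAMES_PERCENTILES[name]} ({label})")
```

Under these assumed cutoffs the labels agree with the narrative: only Total Mathematics
falls above average, Social Science is average, and the remaining scores are below
average. A label of this kind is a starting point for interpretation and does not replace
the classroom, ability, and contextual data discussed in the exercise.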
Summary
Achievement tests are used to assess an individual’s knowledge or skills in a particular
content area. Standardized achievement tests can be divided into four broad categories:
achievement test batteries, diagnostic achievement tests, individual achievement tests,
and subject-area tests. Each type of achievement test has distinct characteristics and is
used for various purposes, such as monitoring achievement over time, making placement or
selection decisions, evaluating educational programs and curriculum, diagnosing learning
disabilities, and providing endorsement for certification and licensure.
Standardized achievement tests play a large role in the U.S. education system. Students
are administered tests that can directly affect their educational opportunities or
choices, such as being promoted or retained at a grade level, graduating high school, or
being admitted into a desired educational program.
Used properly and interpreted wisely, standardized achievement tests can provide valuable
information about a student’s achievement level and can be useful for making decisions or
inferences based on test scores. Misused, they can be a source of inestimable harm, not
only to students, but to the teachers, schools, and the community as well. As with any
type of test, decisions (such as promotion of students) should never be made on the basis
of test scores alone. Other relevant information should be taken into account to enhance
the overall validity of such decisions.
Suggested Activities
1. Select one of the major achievement tests, and write a critique of it.
2. Interview counselors who use achievement tests in their work. Find out what tests they
use and why and how they use the results. Report your findings to the class.
3. Administer an individual achievement test or diagnostic test and write a report of the
results.
4. Go to the library and find a research article that uses a standardized test, and then
answer the following questions:
a. What is the name of the test?
b. For what purpose(s) was the test used in the research study?
c. Was the test reliable and valid? How do you know?
References
Aiken, L. A., & Groth-Marnat, G. (2006). Psychological testing and assessment (12th ed.).
Boston, MA: Pearson.
American Educational Research Association (AERA), American Psychological Association
(APA), & National Council on Measurement in Education (NCME). (2014). Standards for
educational and psychological testing. Washington, DC: Authors.
Barton, P. E. (2003). Parsing the achievement gap: Baselines for tracking progress.
Princeton, NJ: ETS Policy Information Center.
Dunn, L. M., & Dunn, D. M. (2007). The Peabody Picture Vocabulary Test (4th ed.).
Bloomington, MN: NCS Pearson, Inc.
Johnson, R. S., Mims, J. S., & Doyle-Nichols, A. (2009). Developing portfolios in
education: A guide to reflection, inquiry, and assessment. Thousand Oaks, CA: Sage.
Kim, K. H., VanTassel-Baska, J., Bracken, B. A., Feng, A., & Stambaugh, T. (2014).
Assessing science reasoning and conceptual understanding in the primary grades using
standardized and performance-based assessments. Journal of Advanced Academics, 25(1),
47–66.
McDonald, B. (2012). Portfolio assessment: Direct from the classroom. Assessment &
Evaluation in Higher Education, 37(3), 335–347.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2012). Measurement and assessment in
teaching (11th ed.). Upper Saddle River, NJ: Pearson.
Mueller, J. (2006). Authentic assessment toolbox. Retrieved from
http://jonathan.mueller.faculty.noctrl.edu/toolbox
National Council of Teachers of Mathematics (NCTM). (2000). Principles and standards for
school mathematics. Reston, VA: Author.
Phelps, R. P. (2012). The effect of testing on student achievement, 1910–2010.
International Journal of Testing, 12(1), 21–43.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2008). Measurement and assessment in
education. Boston, MA: Pearson.
Schrank, F. A., Mather, N., & McGrew, K. S. (2014). Woodcock–Johnson IV. Itasca, IL:
Riverside Publishing.
Sewell, M., Marczak, M., & Horn, M. (2001). The use of portfolio assessment in evaluation.
Retrieved from http://ag.arizona.edu/fcs/cyfernet/cyfar/Portfo~3.htm
Texas Education Agency. (2014). STAAR Statewide Summary Reports 2013–2014. Retrieved from
http://www.tea.state.tx.us/index2.aspx?id=25769809035
US Department of Education. (2004). Building the Legacy: IDEA. Section 300.8 child with a
disability. Retrieved from http://idea.ed.gov/explore/view/p/,root,regs,300,A,300%252E8
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001, 2007). Woodcock Johnson III Tests of
Achievement. Rolling Meadows, IL: Riverside.
Ysseldyke, J. E., & Algozzine, B. (2006). Effective assessment for students with special
needs: A practical guide for every teacher. Thousand Oaks, CA: Corwin Press.
CHAPTER 10
Assessment of Aptitude
Aptitude Tests
Aptitude can be defined as an innate or acquired ability to be good at something. Aptitude
should not be confused with ability, although the two terms are often interchanged. Abili-
ties typically refer to one’s “present-day” capabilities or the power to perform a specified
task, either physical or cognitive; whereas aptitudes are about one’s “potential” capabilities.
One of the best predictors of an individual’s potential to do well is whether they have a
high aptitude for a certain type of activity. Some people have strong mechanical abilities
and thus would do well working as engineers, technicians, or auto mechanics. Other indi-
viduals might have high artistic abilities and be best suited to working as interior design-
ers or in the performing arts. Thus, understanding an individual’s aptitudes can be very
helpful in advising him or her about appropriate training programs or career paths.
Aptitude tests (also referred to as prognostic tests) measure a person’s performance on
selected tasks to predict how that person will perform sometime in the future or in a dif-
ferent situation. The aptitude could be related to school performance, job performance, or
some other task or situation. Aptitude tests measure one’s (a) acquired knowledge (from
instruction) and (b) innate ability in order to determine one’s potential.
Aptitude tests were originally designed to augment intelligence tests. It was found
that the results of intelligence tests did not have strong correlations with future success;
instead, certain abilities were identified as the largest contributors to success (Sparkman,
Maulding, & Roberts, 2012). This led to the construction of aptitude tests that targeted
more concrete or practical abilities. These tests were developed particularly for use in
career counseling and in the selection and placement of industrial and military personnel.
Some tests were designed to assess multiple aptitudes at one time, whereas others focus on
a single area. There are a vast number of aptitudes that tests may assess, either singly or in
multiples; the most commonly tested are mechanical, clerical, musical, and artistic apti-
tude. Because these measures attempt to determine how someone will do in the future on
the basis of current test results, there must be research that demonstrates the relationship
between identified abilities and future success.
The content of aptitude, intelligence, and achievement tests overlaps. For example,
vocabulary items are often part of all three types, as are numerical computations and rea-
soning items. One of the primary differences among these tests is how the tests are used.
Aptitude tests and intelligence tests are often used for predictive purposes, whereas
achievement tests measure what has been learned and are most often used for descriptive
purposes and assessment of growth and change. There is often confusion about the differ-
ence between aptitude and achievement tests.
Because of their future orientation, counselors use aptitude tests to help people plan
their educational and vocational futures on the basis of what appear to be their abilities
and interests. Although aptitude tests are used most often by career, rehabilitation, and school
counselors, they are applicable to all areas of professional counseling. These measures are
frequently used in the selection of entry-level workers and for admissions into educational
and vocational programs. The aptitude tests we will discuss in this chapter are standard-
ized, norm-referenced tests that are either group or individually administered. Like
achievement tests, some aptitude tests are batteries that cover a number of aptitudes,
whereas others focus on a specialized area. We will present information about aptitude tests
using the following categories:
• Multiple-aptitude test batteries
• Specialized aptitude tests
• Admissions tests
• Readiness tests
The most well-known multiple-aptitude batteries are the Armed Services Vocational Aptitude
Battery (ASVAB); the Differential Aptitude Test, Fifth Edition (DAT); the General Apti-
tude Test Battery (GATB); and the Career Ability Placement Survey (CAPS). Table 10.1
shows a comparison of the subtests on the ASVAB, DAT, GATB, and CAPS.
Armed Services Vocational Aptitude Battery. Offered at more than 13,000 schools and taken
by more than 900,000 students per year, the ASVAB is the most widely used multiple-aptitude
test battery in the world (Powers, 2014). Originally developed in 1968 by
the U.S. Department of Defense, it is a norm-referenced test that measures aptitudes for
general academic areas and career areas that encompass most civilian and military work.
The U.S. military uses the ASVAB as an entrance examination as well as to determine
specific job assignments and enlistment bonuses. There are three versions of the ASVAB:
the CAT-ASVAB (computer adaptive test), the MET-site ASVAB (mobile examination test
site), and the Student ASVAB (also referred to as the ASVAB Career Exploration Program
[ASVAB CEP]).
Although the ASVAB was originally developed for use by the military, the ASVAB Career
Exploration Program (ASVAB CEP) is used primarily with the civilian population (United
States Military Entrance Processing Command [USMEPC], Testing Division, 2005). It is
designed specifically to assist high school and postsecondary students with career
planning, whether students plan to enter the workforce, the military, or college or
vocational programs. The ASVAB CEP provides both web-based and printed materials to help
students explore possible career choices. Program materials include (1) the original
aptitude battery (the ASVAB), (2) an interest inventory (Find Your Interests [FYI]), and
(3) other career planning tools. The main aptitude battery, the ASVAB, consists of eight
tests, contains a total of 200 items, and requires 3 hours to complete. The eight tests are
General Science, Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, Mathematics
Knowledge, Electronics Information, Auto and Shop Information, and Mechanical
Comprehension. The ASVAB yields standard scores (with a mean of 50 and a standard deviation
of 10) and percentiles for the eight tests as well as for three Career Exploration Scores:
Verbal Skills, Math Skills, and Science and Technical Skills (see the ASVAB summary results
sheet in Figure 10.1).
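A standard score with a mean of 50 and a standard deviation of 10 is a linear rescaling of
a z score. The sketch below shows that arithmetic and, purely as an illustrative
assumption, the percentile that a perfectly normal distribution would imply. Percentiles
printed on an ASVAB results sheet come from norm tables for specific comparison groups, so
they will generally differ from the normal-curve value. The function names are
hypothetical.

```python
from statistics import NormalDist

SCALE_MEAN = 50.0  # mean of the standard-score scale described in the text
SCALE_SD = 10.0    # standard deviation of that scale


def z_from_standard_score(score):
    """Convert a standard score (mean 50, SD 10) to a z score."""
    return (score - SCALE_MEAN) / SCALE_SD


def standard_score_from_z(z):
    """Convert a z score back to the mean-50, SD-10 scale."""
    return SCALE_MEAN + SCALE_SD * z


def normal_curve_percentile(score):
    """Percentile implied by a normal curve (an assumption, not a norm table)."""
    return 100 * NormalDist(mu=SCALE_MEAN, sigma=SCALE_SD).cdf(score)


if __name__ == "__main__":
    for score in (40, 50, 55, 60):
        z = z_from_standard_score(score)
        print(f"score {score}: z = {z:+.1f}, "
              f"normal-curve percentile ~ {normal_curve_percentile(score):.0f}")
```

For example, a standard score of 55 lies half a standard deviation above the mean, which a
normal curve places near the 69th percentile. A norm-group percentile for the same score
can be noticeably different, which is one reason to read percentiles from the results
sheet instead of deriving them from the standard score.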
The Military Entrance Score (also known as the Armed Forces Qualification Test [AFQT]) is
presented on the summary results sheet as a single percentile rank, which is the score that
determines a test taker’s eligibility for enlistment into the military. Each branch of the
military has minimum AFQT score requirements to qualify for enlistment: Air Force
recruits must score at least 36 points on the AFQT, the Army requires a minimum score of
31, Marine Corps recruits must score at least 32, Navy recruits must score at least 35, and
the Coast Guard requires a minimum score of 36. Although there are minimum scores
listed, the majority of armed services enlistees score 50 or above (Powers, 2014). Because of
ongoing military efforts in various countries, the Army has issued waivers for those
individuals who score as low as 26 on the ASVAB (Powers, 2014).
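The branch minimums quoted above amount to a simple lookup. The following sketch encodes
them as reported in the text (Powers, 2014); it is an illustration, not an eligibility
tool, because minimums change over time and waivers and other enlistment requirements are
not modeled. The function name branches_meeting_minimum is hypothetical.

```python
# Minimum AFQT percentile scores by branch, as quoted in the text (Powers, 2014).
AFQT_MINIMUMS = {
    "Air Force": 36,
    "Army": 31,
    "Marine Corps": 32,
    "Navy": 35,
    "Coast Guard": 36,
}


def branches_meeting_minimum(afqt_percentile):
    """Return, in alphabetical order, branches whose quoted minimum is met."""
    if not 1 <= afqt_percentile <= 99:
        raise ValueError("AFQT scores are percentile ranks from 1 to 99")
    return sorted(
        branch
        for branch, minimum in AFQT_MINIMUMS.items()
        if afqt_percentile >= minimum
    )


if __name__ == "__main__":
    for score in (30, 33, 36, 50):
        print(score, branches_meeting_minimum(score))
```

A counselor could use a table like this to show a student that a score of 33, for
instance, meets the quoted minimums for the Army and the Marine Corps only.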
The FYI is a 90-item interest inventory developed for the ASVAB program. Based on
John Holland’s widely accepted theory of career choice, it assesses students’ occupational
interests in terms of six interest types: realistic, investigative, artistic, social, enterprising,
and conventional (i.e., RIASEC). Once test takers receive their ASVAB and FYI scores, they
can look up potential occupations in the OCCU-Find, an online resource from which stu-
dents can view descriptions of careers and learn how their skills compare with the skills
profiles of the occupations they are exploring. OCCU-Find contains information on almost
500 occupations sorted by Holland’s RIASEC codes.
A unique feature of the ASVAB is that counselors are able to compare an individual’s
results on the ASVAB tests and the Career Exploration Scores with the results of other
groups of test takers. To do this, the ASVAB summary results sheet provides separate per-
centile scores for three different groups: same grade/same sex, same grade/opposite sex,
and same grade/combined sex. For example, according to the summary results sheet
shown in Figure 10.1, the test taker earned a standard score of 55 on Verbal Skills. Results
indicate that this score corresponds to the 62nd percentile of performance for 11th-grade
females, the 64th percentile for 11th-grade males, and the 63rd percentile for all 11th-grade
students (males and females).
Information about the ASVAB’s psychometric properties is reported in the counselor
manual (United States Military Entrance Processing Command [USMEPC], Testing
Division, 2005). The ASVAB CEP has nationally representative norms of approximately
4,700 students in the 10th, 11th, and 12th grades and about 6,000 postsecondary students.
The norming sample was poststratification weighted by grade, gender, and the three
broad race/ethnic groupings of Black, Hispanic, and Other. (Note that poststratification
weights are used when there is an oversample of individuals with certain characteristics,
such as age, education, race, etc. With the ASVAB CEP, the unweighted norming sample
was oversampled by gender and race/ethnic groups.) The ASVAB appears to have evi-
dence of criterion-related validity in predicting success in educational and training pro-
grams and in various civilian and military occupations. It also has strong convergent
validity evidence with the ACT, the Differential Aptitude Tests, and the General Aptitude
Test Battery. Internal consistency reliability coefficients of the ASVAB composites and
subtests range from .88 to .91 and from .69 to .88, respectively. In order to further your
understanding of the ASVAB, complete Exercise 10.1, and respond to the exercise ques-
tions. Consider all of the information provided within the context of your learning experi-
ence related to assessment.
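The idea behind poststratification weighting can be sketched as follows: each group's
weight is its share of the population divided by its share of the sample, so oversampled
groups count for less and undersampled groups count for more. The group labels, counts,
and population shares below are invented for illustration and are not ASVAB norming data;
the function names are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    if set(sample_counts) != set(population_shares):
        raise ValueError("sample and population must list the same groups")
    if any(count <= 0 for count in sample_counts.values()):
        raise ValueError("every group needs at least one sampled person")
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_shares(sample_counts, weights):
    """Share of the weighted sample that each group represents."""
    weighted = {g: sample_counts[g] * weights[g] for g in sample_counts}
    total = sum(weighted.values())
    return {g: value / total for g, value in weighted.items()}


if __name__ == "__main__":
    # Hypothetical sample in which Groups B and C were oversampled.
    counts = {"Group A": 500, "Group B": 300, "Group C": 200}
    shares = {"Group A": 0.70, "Group B": 0.20, "Group C": 0.10}
    weights = poststratification_weights(counts, shares)
    print(weights)
    print(weighted_shares(counts, weights))
```

After weighting, each group's share of the sample matches its assumed share of the
population, which is what allows norms based on an oversampled group to be described as
nationally representative.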
Differential Aptitude Tests for Personnel and Career Assessment. The Differential
Aptitude Tests for Personnel and Career Assessment (DAT for PCA) are a series of tests
Exercise 10.1
ASVAB: Halle
You are a school counselor working at a local high school. Halle, an 11th-grade student,
recently received her ASVAB summary results sheet. She is confused by her scores,
particularly the three separate sets of percentile scores. She asks you to help her
understand her test scores.
ASVAB Tests                  Same Grade/   Same Grade/    Same Grade/    Standard
                             Same Sex      Opposite Sex   Combined Sex   Score
                             Percentile    Percentile     Percentile
General Science              47            40             43             47
Arithmetic Reasoning         25            21             23             41
Word Knowledge               90            89             89             60
Paragraph Comprehension      73            76             75             54
Mathematics Knowledge        23            24             23             43
Electronics Information      52            31             41             43
Auto and Shop Information    50            16             32             40
Mechanical Comprehension     42            27             37             45
Source: ASVAB Career Exploration Program: Counselor Manual (United States Military Entrance
Processing Command [USMEPC], Testing Division, 2005).
Both the Language Usage and Spelling subtests provide a good estimate of the ability
to distinguish correct from incorrect English usage. The Verbal Reasoning and Numerical
Ability subtest scores are combined to create a composite index of scholastic aptitude.
Overall, the DAT for PCA is useful in exploring an individual’s academic and voca-
tional possibilities. The test may also be used for the early identification of students with
superior intellectual promise. Test limitations relate to the lack of independence of some of
the scales and the separate gender norms. Both computerized-scoring services and hand-
scoring stencils are available for the test. Figure 10.2 presents an example of a profile from
the DAT. A Career Interest Inventory had been available with the DAT for PCA but was
discontinued as of September 2014.
General Aptitude Test Battery (GATB). The General Aptitude Test Battery (GATB) is one
of the oldest general aptitude tests still used today. Developed by the U.S. Employment
Service (USES; Dvorak, 1947), the GATB was originally designed for use by employment
counselors in state employment services offices to match job applicants with potential
employers in the private and public sectors. Although it is still used in employment offices,
the instrument has become popular in both government and private organizations as a
method to screen large numbers of job applicants, as well as in high schools for career
counseling. The GATB is a paper-and-pencil test that can be administered to students in
grades 9 through 12 and to adults; it takes approximately 2.5 hours to complete. The battery
consists of 12 separately timed subtests, which make up nine aptitude scores:
1. Verbal Aptitude: Vocabulary
2. Numerical Aptitude: Computation, Arithmetic Reasoning
3. Spatial Aptitude: Three-Dimensional Space
4. Form Perception: Tool Matching, Form Matching
5. Clerical Perception: Name Comparison
6. Motor Coordination: Mark Making
7. Finger Dexterity: Assemble, Disassemble
8. Manual Dexterity: Place, Turn
9. General Learning Ability: Vocabulary, Arithmetic Reasoning, Three-Dimensional Space
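The relationship between the 12 subtests and the nine aptitude scores can be written down
as a small lookup table. The sketch below only restates the list above and checks its
arithmetic; it says nothing about how the GATB actually weights or scales subtests, and
the function names are hypothetical.

```python
# Aptitude -> contributing subtests, copied from the list in the text.
GATB_APTITUDES = {
    "Verbal Aptitude": ["Vocabulary"],
    "Numerical Aptitude": ["Computation", "Arithmetic Reasoning"],
    "Spatial Aptitude": ["Three-Dimensional Space"],
    "Form Perception": ["Tool Matching", "Form Matching"],
    "Clerical Perception": ["Name Comparison"],
    "Motor Coordination": ["Mark Making"],
    "Finger Dexterity": ["Assemble", "Disassemble"],
    "Manual Dexterity": ["Place", "Turn"],
    "General Learning Ability": [
        "Vocabulary",
        "Arithmetic Reasoning",
        "Three-Dimensional Space",
    ],
}


def distinct_subtests(aptitudes=GATB_APTITUDES):
    """List each subtest once, in order of first appearance."""
    seen = []
    for subtests in aptitudes.values():
        for subtest in subtests:
            if subtest not in seen:
                seen.append(subtest)
    return seen


def aptitudes_using(subtest, aptitudes=GATB_APTITUDES):
    """Return the aptitude scores to which a given subtest contributes."""
    return [name for name, subtests in aptitudes.items() if subtest in subtests]


if __name__ == "__main__":
    print(len(GATB_APTITUDES), "aptitudes from", len(distinct_subtests()), "subtests")
    print("Vocabulary contributes to:", aptitudes_using("Vocabulary"))
```

The table makes one point easy to see: General Learning Ability adds no new subtests. It
reuses Vocabulary, Arithmetic Reasoning, and Three-Dimensional Space, which is why nine
aptitude scores can be drawn from only 12 subtests.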
Career Ability Placement Survey. The CAPS is a multiple-aptitude test battery designed
to measure vocationally relevant abilities. It is part of the COPSystem Career Measure-
ment Package, which is a comprehensive system that assesses an individual’s interests,
abilities, and values related to career choice (see Chapter 11). The CAPS is the ability
measure that can be used for matching students’ ability with occupational requirements,
educational or training program selection, curriculum evaluation, career assessment, or
employee evaluation. The battery takes 50 minutes to complete and can be used with mid-
dle through high school students, college students, and adults. The CAPS measures eight
ability dimensions:
1. Mechanical Reasoning: Measures understanding of mechanical principles and
devices and the laws of physics.
2. Spatial Relations: Measures the ability to think and visualize in three dimensions.
3. Verbal Reasoning: Measures the ability to reason with words and to understand and
use concepts expressed in words.
4. Numerical Ability: Measures the ability to reason with and use numbers and work
with quantitative materials and ideas.
5. Language Usage: Measures recognition and use of proper grammar, punctuation,
and capitalization.
6. Word Knowledge: Measures understanding of the meaning and precise use of words.
7. Perceptual Speed and Accuracy: Measures how well a person can perceive small
details rapidly and accurately within a mass of letters, numbers, and symbols.
8. Manual Speed and Dexterity: Measures rapid and accurate hand movements.
There are self-scoring and machine-scoring forms for the test, as well as a computer-
ized scoring and reporting software program. All types of supportive materials are avail-
able for the counselor and client: a self-interpretation profile sheet and guide, occupational
cluster charts, the COPSystem Career Briefs Kit, and the COPSystem Career Cluster
Booklet Kits, in addition to an examiner’s manual, tapes for administration, and visuals
for orientation.
Clerical Ability. Clerical ability refers to the skills needed for office and clerical duties.
Individuals with high clerical aptitude can process information quickly, are detail ori-
ented, and may be successful as bank tellers, administrative assistants, data processors, or
cashiers. Abilities related to clerical aptitude include the following:
• Calculation: Ability to perform a series of simple arithmetic calculations.
• Checking: Ability to quickly and accurately check verbal and numerical information
(e.g., names, addresses, code numbers, telephone numbers) against a target.
• Coding: Ability to copy letter- and number-like symbols according to a specified
pattern.
• Filing: Ability to identify the correct location for a set of named files.
• Keyboarding: Ability to quickly and accurately type on a computer keyboard.
• Numerical Ability: Ability to use numbers efficiently in clerical and administrative
contexts.
• Verbal Reasoning: Includes spelling, grammar, understanding analogies, and
following instructions.
There are several instruments commercially available to assess clerical ability. The
descriptions of these tests reveal that they are similar to some of the subtests of the group
intelligence tests and the multiple-aptitude test batteries. Although the tests have been
around for decades, they remain valid measures for predicting performance (Whetzel
et al., 2011). The following are examples of several well-known clerical ability tests.
General Clerical Test-Revised. The General Clerical Test-Revised (GCT-R) was developed
to assess one’s ability to succeed in a position requiring clerical skills. It can be either group
or individually administered and takes approximately 1 hour. The instrument consists of
three subtests: Clerical, Numerical, and Verbal. The Clerical subtest assesses speed and
accuracy in performing perceptual tasks (checking and alphabetizing) that involve atten-
tion to detail. The Numerical subtest assesses math computation, numerical error location,
and math reasoning. The Verbal subtest evaluates spelling, reading comprehension,
vocabulary, and grammar.
Minnesota Clerical Test. The Minnesota Clerical Test (MCT) was developed to assist
employers in selecting individuals to fill detail-oriented jobs, especially positions that require
attention to number and letter details (such as bank tellers, receptionists, cashiers, and
administrative assistants). The instrument consists of two parts: number comparison and
name comparison. There are 200 pairs of numbers and 200 pairs of names, and in a specified
amount of time the examinee must correctly select the pairs with elements that are identical.
Clerical Abilities Battery. The Clerical Abilities Battery (CAB) was developed in coopera-
tion with General Motors Corporation and assesses skills most commonly needed for
administrative tasks. The battery consists of seven tests: Filing, Copying Information,
Comparing Information, Using Tables, Proofreading, Basic Math Skills, and Numerical
Reasoning. The tests can be administered in any combination to reflect the skills needed
for a particular position or task.
Mechanical Ability. Mechanical ability is the capacity to learn about mechanical objects. It
is reflected in familiarity with everyday physical objects, tools, devices, and home repairs,
as well as spatial reasoning (i.e., the ability to visualize a three-dimensional object from a
two-dimensional pattern). Individuals with strong mechanical aptitude may be successful
working as engineers, mechanics, millwrights, machine operators, tool setters, and trades-
persons, to name a few. The following are examples of mechanical ability tests.
Bennett Mechanical Comprehension Test. A well-known mechanical aptitude test is the
Bennett Mechanical Comprehension Test (BMCT; Bennett, 1980). This test has been used
for over 60 years to select individuals for a wide variety of civilian and military occupa-
tions as well as training programs requiring mechanical aptitude. It focuses on spatial
perception and tool knowledge and is well suited for assessing individuals for jobs requir-
ing the operation and repair of mechanical devices. It has 68 multiple-choice items focus-
ing on the application of physical and mechanical principles in practical situations. Each
item portrays a drawing illustrating an object that reflects the concepts of force, energy,
density, velocity, and so on. The test is one of the most widely used mechanical aptitude
tests, and a considerable amount of research has supported the technical quality of this
instrument. However, the test is not without its concerns. For example, some items portray
mechanical concepts involving objects that females may be less familiar with than males.
Another concern involves the norm group. Although the test was renormed in 2005 and
included groups based on the industries and occupations in which the test is most fre-
quently used (i.e., auto mechanic, engineer, installation/maintenance/repair, industrial/
technical, skilled tradesperson, transportation/equipment operator), separate norming
data for males, females, and minorities were not reported.
Figure: Specialized Ability Tests (Clerical Ability, Mechanical Ability, Psychomotor
Ability, Artistic Ability, Musical Ability, Other Abilities)
Mechanical Aptitude Test. The Mechanical Aptitude Test (MAT 3-C) is a quick evaluation
of a person’s ability to learn production and maintenance job activities. The test was
designed specifically to measure an individual’s potential to be successful in apprentice-
ship or trainee programs for maintenance mechanics, industrial machinery mechanics,
millwrights, machine operators, and tool setters. The test contains 36 multiple-choice items
and takes approximately 20 minutes to complete.
Wiesen Test of Mechanical Aptitude. The Wiesen Test of Mechanical Aptitude (WTMA) is
designed for selecting entry-level personnel for jobs involving the operation, maintenance,
and repair of mechanical equipment of various types. The WTMA measures basic ability
rather than formal schooling or job experience. It is a 60-item paper-and-pencil test that
takes approximately 30 minutes to complete. Each test item uses a simple drawing of
everyday objects to illustrate a mechanical principle or fact (e.g., basic machines, move-
ment, gravity or center of gravity, basic electricity or electronics, transfer of heat, basic
physical properties). Test items are brief and involve the function and/or use, size, weight,
shape, and appearance of common physical objects, tools, and devices. The WTMA is
appropriate for individuals aged 18 and older.
Source: Bennett Hand Tool Dexterity Test. Copyright © 1969 by NCS Pearson, Inc. Reproduced with permission.
All rights reserved.
right hand, then with the left hand, and finally with both hands. Five separate scores can
be obtained with the Purdue Pegboard Test: (1) right hand, (2) left hand, (3) both hands, (4)
right plus left plus both hands (R + L + B), and (5) assembly. The test takes about 10
minutes to administer.
The Purdue Pegboard Test is also used to assess and identify issues related to func-
tional impairment in fine and gross motor dexterity. For example, the Purdue Pegboard
Test has been shown to be a valid tool for identifying functional impairment due to carpal
tunnel syndrome (Amirjani, Ashworth, Olson, Morhart, & Chan, 2011). As another exam-
ple, the Pegboard has been used to evaluate treatment effects on individuals diagnosed
with Parkinson’s disease (Eggers, Fink, & Nowak, 2010).
Artistic Ability The concept of artistic ability often suggests an ability to draw and paint
or appreciate great art. However, because what is considered great art varies from person
to person and from culture to culture, the criteria for measuring artistic ability are difficult
to determine (Aiken & Groth-Marnat, 2006). A number of tests have been published that
measure artistic ability; however, most are old and are no longer commercially available.
Some artistic tests measure art appreciation (i.e., judgment, perception), whereas others
assess artistic performance or knowledge of art. One of the most popular tests of art
appreciation is the Meier–Seashore Art Judgment Test (Meier, 1940), which measures one’s
ability to discern between better and worse artistic works. The test consists of pairs of
pictures that differ in one feature. One of the pictures is “real” (i.e., corresponds to an
original work of art), and the other represents a simple variation of the original. Test tak-
ers are asked to identify the better (original) picture. A related artistic ability test is the
Meier Art Test of Aesthetic Perception. This test presents four versions of the same work—
each differing in terms of proportion, unity, form, or design—and test takers are asked to
rank each set in order of merit. The Graves Design Judgment Test measures artistic ability
by presenting test takers with 90 sets of two- or three-dimensional designs that vary in
unity, balance, symmetry, or other aspects of aesthetic order. Test takers are then asked to
select the best in each set.
Musical Ability Most musical ability tests assess the skills that musicians should possess,
such as the capacity to discriminate pitch, loudness, tempo, timbre, and rhythm. Tests are
available that purport to measure musical abilities, but all suffer from poor validity and
reliability. Probably the oldest and most famous musical ability test is the Seashore Measures
of Musical Talents. The test takes about 1 hour of testing time and can be administered to
students from grade 4 through grade 12 and adults. The test presents on tape six subtests
measuring dimensions of auditory discrimination: pitch, loudness, time, timbre, rhythm,
and tonal memory. Other tests that purport to measure musical aptitude include the Kwal-
wasser Music Talent Test, which is used with fourth-grade to college-age students. The
50-item test presents test takers with three-tone patterns that are repeated with variations in
pitch, tone, rhythm, or loudness. Another instrument is the Musical Aptitude Profile (MAP),
which consists of seven subtests grouped into three components: tonal imagery (melody and
harmony), rhythm imagery (tempo and meter), and musical sensitivity (phrasing, balance, and style). The test takes
about 3.5 hours to administer. Unfortunately, research has been limited on the value of
these tests, and many tests of musical aptitude have been criticized for a lack of ecological
validity. In other words, the tests do not approximate a realistic experience by which musi-
cal aptitude could be accurately measured (Karma, 2007). Music teachers tend to rely more
on their own clinical judgment and look for individual performance of music as it is written.
Other Abilities The categories of abilities that we discussed in this section (e.g., clerical,
mechanical, artistic, psychomotor, musical aptitude) represent only a few of the many
specialized aptitude tests. Tests are available to assess numerous other specific areas, such
as food services, industrial skills, legal skills, medical office, retail sales, transcription, and
many, many more. Furthermore, businesses can have aptitude tests custom made to iden-
tify qualified applicants for specialized jobs.
Admissions Tests
Most academic programs use aptitude test scores as part of their admissions requirements.
Admissions tests (sometimes referred to as entrance exams or academic ability tests) are
designed to predict performance in a particular educational program. Used in combina-
tion with other relevant information (e.g., high school grade point average, teacher recom-
mendations), they are a general indicator of future academic success. Thus, students who
score high on admissions tests would most likely perform better in college than those who
do not score high.
Most admissions tests assess some combination of verbal, quantitative, writing, and
analytical reasoning skills or discipline-specific knowledge. The tests aim to measure the
most relevant skills and knowledge for mastering a particular discipline. A key psycho-
metric aspect of admissions tests is predictive validity, which is typically evaluated by
correlating test scores with a measure of academic performance, such as first-year grade
point average (GPA), graduate GPA, degree attainment, qualifying or comprehensive
examination scores, research productivity, research citation counts, licensing examination
performance, or faculty evaluations of students. Although the general verbal and quantita-
tive scales are effective predictors of academic success, the strongest predictor of success is
high school GPA (Fu, 2012).
One concern about admissions tests is bias against certain groups, including
racial, ethnic, and gender groups. Overall and across tests, research has found that admissions
tests predict performance about equally well across racial and ethnic groups but tend to underpredict
the performance of women in college settings (Kuncel & Hezlett, 2007). Fu (2012) conducted research
on a sample of approximately 33,000 undergraduate and graduate students from the
United States and abroad. Overall, Fu confirmed that traditional admission tests were
good predictors of success for all students. In fact, Fu found that scores on the SAT had a
high correlation with first-year GPA for international undergraduate students.
We will present information about some prominent admissions tests, including the
SAT, ACT, and others.
SAT Nearly every college in America accepts the SAT (formerly called the Scholastic
Aptitude Test and the Scholastic Assessment Test) as a part of its admissions process.
Millions of college- and university-bound high school students take the SAT, which is
published by the College Board, each year. The SAT consists of two tests: the Reasoning Test and
Subject Tests. The SAT Reasoning Test assesses the critical thinking skills needed for col-
lege and has three sections: mathematics, critical reading, and writing (see Table 10.2).
The test takes 3 hours and 45 minutes to complete, including an unscored 25-minute
experimental section (used for development of future test questions). It includes several
different item types, including multiple-choice questions, student-produced responses
(grid-ins), and a student-produced essay. The SAT Subject Tests are optional tests designed
to measure knowledge in specific subject areas, such as biology, chemistry, literature,
various languages, and history. Many colleges use the subject tests for admission and
course placement or to advise students about course selection. Some colleges specify the
subject tests required for admission, whereas others allow applicants to choose which
tests to take. Scores on the SAT Reasoning Test and SAT Subject Tests range from 200 to
800, with a mean of 500 and a standard deviation of 100. Percentiles are also given to pro-
vide comparisons among test takers’ scores.
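The score scale described above (mean of 500, standard deviation of 100) can be related to standard scores with a short sketch. The function names are hypothetical, and the percentile calculation assumes a normal distribution of scores. Published SAT percentiles are derived from actual test-taker data, so the values produced here are approximations only.

```python
from statistics import NormalDist

SAT_MEAN = 500  # mean of the reported score scale, as stated in the text
SAT_SD = 100    # standard deviation of the reported score scale


def z_score(scaled: float) -> float:
    """Express a scaled score in standard deviation units from the mean."""
    return (scaled - SAT_MEAN) / SAT_SD


def approx_percentile(scaled: float) -> float:
    """Approximate percent of scores at or below `scaled`, assuming normality."""
    return 100 * NormalDist(SAT_MEAN, SAT_SD).cdf(scaled)


for score in (400, 500, 600, 700):
    print(score, z_score(score), round(approx_percentile(score)))
```

Under this assumption, a score of 600 lies one standard deviation above the mean, which corresponds to roughly the 84th percentile.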
Research studies have looked at the SAT’s ability to predict college success (i.e., pre-
dictive validity) by examining the relationship between test scores and first-year college
GPA (FYGPA). A study sanctioned by the College Board (Kobrin, Patterson, Shaw,
Mattern, & Barbuti, 2008) found correlation coefficients of .26, .29, and .33 between FYGPA
and the scores on the three SAT Reasoning Test sections—Mathematics, Critical Reading,
and Writing, respectively. They also found a correlation of .46 between FYGPA and com-
bined SAT score and high school GPA (HSGPA), indicating that colleges should use both
HSGPA and SAT scores to make the best predictions of student success. Fu (2012) con-
firmed these findings and demonstrated the utility of the SAT in predicting success for
international students. However, because there is a degree of error in the correlation
between FYGPA and SAT scores, it is important to consider a combination of information
sources when making admission decisions.
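One common way to see how much error remains in these predictions is to square each correlation coefficient, which gives the proportion of variance in FYGPA associated with the predictor. The sketch below applies that standard calculation to the coefficients reported above. The dictionary labels and function name are illustrative, and the squared values are computed here rather than taken from the cited study.

```python
# Correlations with first-year GPA as reported in the text
# (Kobrin et al., 2008); the labels are chosen for this example.
correlations = {
    "Mathematics": 0.26,
    "Critical Reading": 0.29,
    "Writing": 0.33,
    "Combined SAT and HSGPA": 0.46,
}


def variance_explained(r: float) -> float:
    """Coefficient of determination: the squared correlation."""
    return r ** 2


for label, r in correlations.items():
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.1%}")
```

Even the combined predictor is associated with only about one fifth of the variance in first-year grades, which is why other sources of information remain important.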
The ACT Information System The ACT program is a comprehensive system for collecting
and reporting information about students planning to enter college or university. It con-
sists of four major components: the Tests of Educational Development, the Student Profile
section, the Course/Grade Information section, and the UNIACT Interest Inventory. A
description of each component follows.
• The Tests of Educational Development are multiple-choice tests designed to assess stu-
dents’ general educational development and their ability to complete college-level
work. Like the SAT, the ACT Tests of Educational Development can be taken by
students in the 11th or 12th grades. The tests cover four skill areas: English, mathe-
matics, reading, and science. The optional Writing Test is a 30-minute essay test that
measures students’ writing skills in English. The tests emphasize reasoning, analysis,
problem solving, and the integration of learning from various sources, as well as the
application of these proficiencies to the kinds of tasks that college students are
expected to perform.
• The Course/Grade Information section provides 30 self-reported high school grades in
the areas of English, mathematics, natural sciences, social studies, language, and the
arts. The courses include those that usually form a college’s core preparatory curricu-
lum and are frequently required for admission to college.
• The Student Profile section contains information reported by students when they reg-
ister for the ACT. This information includes the following categories: admissions or
The PLAN Program The PLAN assessment is part of the ACT program. It is considered a
“pre-ACT” test, similar to the PSAT, and is a predictor of success on the ACT. Designed to
be taken by 10th graders, the PLAN consists of four academic achievement tests in English,
mathematics, reading, and science. Other components of the PLAN program include (a)
students’ needs assessment, (b) high school course and grade information, (c) the UNIACT
Interest Inventory, and (d) the Educational Opportunity Services (EOS), which links students
with college and scholarship information based on their PLAN results.
Miller Analogies Test The Miller Analogies Test (MAT) is another admissions exam taken
by individuals applying to graduate schools. The MAT is a 50-minute, group-administered
test consisting of 100 multiple-choice analogy items. The content for the analogies comes
from literature, social sciences, chemistry, physics, mathematics, and general information.
In each MAT analogy item, one term is missing and has been replaced with four answer
options, only one of which correctly completes the analogy. The terms in most of the MAT
analogy items are words, but in some cases they may be numbers, symbols, or word parts.
For example:
PLANE : AIR :: CAR : (a. submarine, b. fish, c. land, d. pilot)
The first step in solving a MAT analogy is to decide which two of the three given
terms form a complete pair. In the example, this could either be “PLANE is related to AIR”
(the first term is related to the second term) or “PLANE is related to CAR” (the first term
is related to the third term). On the MAT, it will never be “PLANE is related to (a. submarine,
b. fish, c. land, d. pilot)”; the first term is never related to the fourth term.
Other Admissions Tests Other widely used admission tests in higher education include
the Medical College Admissions Test (MCAT), Law School Admissions Test (LSAT), and the
Graduate Management Admissions Test (GMAT). The most important consideration in the
use of these tests is whether the validity of the test for a particular program has been estab-
lished, especially through studies at a local college or university.
Readiness Tests
The term readiness test is applied to aptitude tests used to predict success when a child
enters school. School readiness has historically been defined as readiness to learn specific
material and be successful in a typical school context (Bracken & Nagle, 2007). Schools
frequently use readiness tests to judge whether children are “ready” for kindergarten or
the first grade. These tests are similar to admissions tests, because they measure current
knowledge and skills in order to forecast future academic performance; the distinction is
that the term readiness test is used specifically in reference to tests for young children.
In most states, eligibility to enroll in kindergarten begins at age 5. Although children
may meet this specific age requirement, they vary widely in how well prepared they
are for kindergarten. For example, some children have considerable prekindergarten edu-
cation experiences and are able to recognize letters, numbers, and shapes, whereas other
children may have fewer abilities related to kindergarten success. Readiness tests are used
to determine the extent to which children have mastered the underlying skills deemed
necessary for school learning. Most readiness tests assess facets of the following five
domains (Boan, Aydlett, & Multunas, 2007, p. 52):
• Physical Focuses on impairments in sensory functions, motor skills, illnesses or
medical conditions, growth, and overall well-being
• Social and Emotional Involves age-appropriate social skills, psychological well-
being, self-perceptions, and interpersonal interactions
• Learning Includes attention, curiosity, and enthusiasm for learning
• Language Involves verbal communication skills, nonverbal communication skills,
and early literacy skills
• Cognition and General Knowledge Emphasizes problem-solving skills, abstract rea-
soning, early mathematical skills, and overall fund of knowledge
The majority of schools in the United States administer readiness tests (Csapó,
Molnár, & Nagy, 2014). Children who are deemed “not ready” often wait another year
before starting school. Readiness tests can be used to develop curricula and to establish
individualized instruction. Most readiness tests are designed to assess areas related to
school tasks, such as general knowledge, language, and health and physical functioning.
Research finds the predictive validity of these tests to be extremely limited; thus, their use
in making decisions about kindergarten enrollment is highly questionable. Often, a child’s
readiness is influenced more by parental factors than by academic preparation. For example,
socioeconomic adversity, parental demoralization, and support for learning have been
shown to impact readiness (Okado, Bierman, & Welsh, 2014). Thus, there is a potential for
cultural bias when assessing school readiness. We caution counselors and other helping
professionals in the use of such assessments without a range of other substantial informa-
tion to support the findings. Examples of school readiness tests include the Kaufman
Survey of Early Academic and Language Skills (K-SEALS), the Kindergarten Readiness
Test (KRT), the School Readiness Test (SRT), the Metropolitan Readiness Tests, and the
Bracken School Readiness Assessment (BSRA-3).
There have been several concerns cited about using readiness tests for making impor-
tant decisions about children’s lives. Although most researchers, educators, and policy-
makers agree on the dimensions essential to school readiness (e.g., physical development;
emotional and social development; learning, language, and cognition; general knowledge),
there is some debate as to whether these dimensions are accurate or complete (Meisels,
1999). Also, assessing preschool-age children is challenging because of their rapid and
uneven development, which can be greatly impacted by environmental factors (Saluja,
Scott-Little, & Clifford, 2000). Furthermore, typical standardized paper-and-pencil tests
suitable for older children are not appropriate for children entering school (Shepard,
Kagan, & Wurtz, 1998).
Summary
Aptitude tests measure an individual’s performance on selected tasks to predict how that
person will perform sometime in the future or in a different situation. The aptitude could
involve school performance, job performance, or some other task or situation. Aptitude tests
measure both acquired knowledge and innate ability in order to determine one’s potential.
Multiple-aptitude batteries are used in educational and business contexts, as well as with
career guidance systems. The most widely used batteries are the Armed Services Vocational
Aptitude Battery, the General Aptitude Test Battery, and the Differential Aptitude Test battery.
Specialized aptitude tests are designed to measure the ability to acquire proficiency in a
specific area of activity, such as art or music. Mechanical and clerical aptitude tests have been
used by vocational education and business and industry personnel to counsel, evaluate,
classify, and place test takers. Both multifactor and special aptitude tests are designed to
help the test takers gain a better understanding of their own special abilities.
Suggested Activities
1. Write a critique of a widely used multiple-aptitude test battery or specialized aptitude test.
2. Interview counselors who use aptitude tests (employment counselors, career counselors, and
school counselors) and find out what tests they use and why. Report your findings to the class.
3. Take a multiple-aptitude test battery or specialized aptitude test, and write a report
detailing the results and your reaction to the test.
4. Make an annotated bibliography of aptitude tests you would use in your field.
5. Albert is a 23-year-old male who dropped out of school in the 10th grade. He is enrolled in
a high school equivalency program at a local community college sponsored by the Private
Industry Council. He has had numerous jobs in the food services industry but has not been able
to hold on to them. He has a wife and three children and now realizes he needs further training
and education to support his family. He participated in vocational assessment and was
administered the DAT, with the following results:
Scale                            Percentile
Verbal Reasoning                 40%
Numerical Reasoning              55%
Abstract Reasoning               20%
Mechanical Reasoning             55%
Space Relations                  30%
Spelling                         3%
Language Usage                   5%
Perceptual Speed and Accuracy    45%
a. How would you characterize Albert’s aptitudes?
b. What additional information about Albert would you like to have?
c. If you were a counselor, what educational and vocational directions would you encourage
Albert to explore?
References
Ackerman, D., & Barnett, W. S. (2005). Prepared for kindergarten: What does “readiness” mean? New Brunswick, NJ: National Institute for Early Education Research, Rutgers University.
Aiken, L. A., & Groth-Marnat, G. (2006). Psychological testing and assessment (12th ed.). Boston, MA: Pearson.
Amirjani, N., Ashworth, N. L., Olson, J. L., Morhart, M., & Chan, K. M. (2011). Validity and reliability of the Purdue Pegboard Test in carpal tunnel syndrome. Muscle & Nerve, 43(2), 171–177. doi: 10.1002/mus.21856
Bennett, G. K. (1980). Test of mechanical comprehension. New York, NY: The Psychological Corporation.
Boan, C., Aydlett, L. A., & Multunas, N. (2007). Early childhood screening and readiness assessment. In B. A. Bracken & R. J. Nagle (Eds.), Psychoeducational assessment of preschool children (4th ed., pp. 49–67). Mahwah, NJ: Lawrence Erlbaum Associates.
Bracken, B. A., & Nagle, R. J. (Eds.). (2007). Psychoeducational assessment of preschool children (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Cermak, S. A., Gubbay, S. S., & Larkin, D. (2002). What is developmental coordination disorder? In S. A. Cermak & D. Larkin (Eds.), Developmental coordination disorder (pp. 1–22). Albany, NY: Delmar.
Csapó, B., Molnár, G., & Nagy, J. (2014). Computer-based assessment of school readiness and early reasoning. Journal of Educational Psychology, 106(3), 639–650.
Dvorak, B. J. (1947). New United States employment service general aptitude test battery. Journal of Applied Psychology, 31, 372–376.
Eggers, C., Fink, G., & Nowak, D. (2010). Theta burst stimulation over the primary motor cortex does not induce cortical plasticity in Parkinson’s disease. Journal of Neurology, 257(10), 1669–1674. doi: 10.1007/s00415-010-5597-1
Fletcher-Janzen, C. R., & Reynolds, E. (2014). Encyclopedia of special education: A reference for the education of children, adolescents, and adults with disabilities and other exceptional individuals (Vol. 1). Hoboken, NJ: John Wiley & Sons.
Fu, Y. (2012). The effectiveness of traditional admissions criteria in predicting college and graduate success for American and international students. Dissertation Abstracts International Section A, 73, 1383.
Karma, K. (2007). Musical aptitude definition and measure validation: Ecological validity can endanger the construct validity of musical aptitude tests. Psychomusicology: A Journal of Research in Music Cognition, 19(2), 79–90.
Kobrin, J. L., Patterson, B. F., Shaw, E. J., Mattern, K. D., & Barbuti, S. M. (2008). Validity of the SAT for predicting first-year college grade point average (College Board Research Report No. 2008-5). New York, NY: College Board Publications.
Kuncel, N. R., & Hezlett, S. A. (2007). Standardized tests predict graduate students’ success. Science, 315, 1080–1081.
Meier, N. C. (1940). Meier Art Tests: Part I. Art judgment. Iowa City, IA: Bureau of Educational Research and Service, University of Iowa.
Meisels, S. (1999). Assessing readiness. In R. C. Pianta & M. Cox (Eds.), The transition to kindergarten: Research, policy, training, and practice (pp. 39–66). Baltimore, MD: Paul Brookes.
Okado, Y., Bierman, K. L., & Welsh, J. A. (2014). Promoting school readiness in the context of socioeconomic adversity: Associations with parental demoralization and support for learning. Child & Youth Care Forum, 43(3), 353–371.
Powers, R. A. (2014). Types of ASVAB tests. Retrieved from http://usmilitary.about.com/od/joiningthemilitary/a/asvabtype.htm
Saluja, G., Scott-Little, C., & Clifford, R. (2000). Readiness for school: A survey of state policies and definitions. ECRP Early Childhood Research & Practice. Retrieved from http://ecrp.uiuc.edu/v2n2/saluja.html
Saluja, G., Scott-Little, C., & Clifford, R. M. (2000). Readiness for school: A survey of state policies and definitions. Early Childhood Research & Practice, 2(2). Retrieved from http://ecrp.uiuc.edu/vsns/saluja.html
Shepard, L. A., Kagan, S. L., & Wurtz, E. (Eds.). (1998). Principles and recommendations for early childhood assessments. Washington, DC: National Education Goals Panel.
Sparkman, L. A., Maulding, W. S., & Roberts, J. G. (2012). Noncognitive predictors of student success in college. College Student Journal, 46(3), 642–652.
Spironello, C., Hay, J., Missiuna, C., Faught, B. E., & Cairney, J. (2010). Concurrent and construct validation of the short form of the Bruininks–Oseretsky Test of Motor Proficiency and the Movement-ABC when administered under field conditions: Implications for screening. Child: Care, Health, and Development, 36(4), 499–507. doi: 10.1111/j.1365-2214.2009.01066.x
United States Military Entrance Processing Command (USMEPC), Testing Division. (2005). ASVAB career exploration program: Counselor manual. North Chicago, IL: Author.
Whetzel, D. L., McCloy, R. A., Hooper, A., Russell, T. L., Waters, S. D., Campbell, W. J., & Ramos, R. A. (2011). Meta-analysis of clerical performance predictors: Still stable after all these years. International Journal of Selection and Assessment, 19(1), 41–50. doi: 10.1111/j.1468-2389.2010.00533.x
CHAPTER
inventories, including the Self-Directed Search (SDS), the Strong Interest Inventory
(SII), and the Campbell Interest and Skill Survey (CISS).
• Describe work values and explain how they influence the process of career decision
making.
• Describe the use of personality inventories in career assessment.
• Discuss combined assessment programs, such as the COPSystem, the Kuder Career
Career Assessment
Career assessment is the foundation of career planning. It is used to help individuals under-
stand themselves better and to find career options to their liking. People may need help
with career decisions at many points in their lives, such as when entering the world of
work after high school or college graduation, during midlife career change, after losing a
244 Chapter 11 • Career and Employment Assessment
job, or after moving to a new town. Career assessment is a process that helps individuals
make informed decisions about future career prospects. It can help individuals to identify
and clarify their interests, values, and personality and to begin exploring career options.
Career assessment involves gathering information using several assessment instruments
and strategies. We will describe these methods, focusing on interest inventories, values
inventories, personality inventories, abilities assessment, career development instruments,
combined assessment programs, and interviews.
Interest Inventories
Interest inventories are among the most popular tools selected by career counselors.
Interests are likes or preferences, or, stated another way, things that people enjoy
(Harrington & Long, 2013). In psychological theory, some theorists hold to the doctrine
that learning cannot take place without a feeling of interest. Sometimes, interest is synony-
mous with motivation. Career development theorists have pointed out the importance of
interest. Strong (1927) postulated that interest is in the domain of motivation and that there
are clusters of attitudes, interests, and personality factors related to choice of and satisfac-
tion with an occupation.
Most people want a career that is tied to their interests, and evaluating interest is a
common component of career assessment. Various techniques that measure interest are
available, such as inventories, checklists, structured and unstructured interviews, and
questionnaires. Interest inventories have been used in the helping professions for over half
a century. E. K. Strong published his first version of the Strong Vocational Interest Blank in 1927.
A current version of the test, the Strong Interest Inventory, is widely used in the field today. It
provides a profile of how an individual’s interests compare with those of successful people
in certain occupational fields. Another interest inventory, the Kuder Preference Record—
Vocational, was first published in 1932. Kuder’s test initially measured interest in 10 gen-
eral areas, such as outdoor, literary, musical, artistic, and scientific. Today, Kuder, Inc.
provides a wide range of Internet-based tools and resources that help individuals identify
their interests, explore their options, and plan for career success.
In career assessment, interest inventories are used specifically for evaluating how
closely an individual’s interests match those of various occupations or with education or
training requirements that correspond to occupations. Some interest inventories are part of
a comprehensive career assessment package that includes measures of ability, work val-
ues, and personality. We will provide information about a few of the more prominent
interest inventories.
Self-Directed Search, Fifth Edition The SDS (Holland, 1994) is one of the most widely
used career interest inventories. It was first developed by Dr. John Holland in 1971 and
has been revised several times. The SDS guides examinees through an evaluation of
their abilities and interests. This inventory is a tool for high school and college students
or adults returning to the workforce to find occupations that best fit their interests and
abilities. It can be used by anyone between the ages of 15 and 70 in need of career guid-
ance. The SDS is easy and uncomplicated to administer and score and is available to
take online. The website for the fifth edition of the SDS has been redesigned to allow
greater access to the general public (see self-directed-search.com). The SDS is based on
Holland’s theory (i.e., the RIASEC model) that people can be classified into six different
groups:
Realistic (R) These people are usually practical, physical, hands-on, tool-oriented
individuals. They like realistic careers, such as auto mechanic, aircraft controller,
surveyor, electrician, and farmer.
Investigative (I) These people are analytical, intellectual, scientific, and explorative.
They prefer investigative careers, such as biologist, chemist, physicist, geologist,
anthropologist, laboratory assistant, and medical technician.
Artistic (A) These people are typically creative, original, and independent. They like
artistic careers, such as composer, musician, stage director, dancer, interior decora-
tor, actor, and writer.
Social (S) These people are cooperative, supportive, helping, and healing/nurturing.
They like social careers, such as teacher, counselor, psychologist, speech therapist,
religious worker, and nurse.
Enterprising (E) These people like competitive environments, have strong leader-
ship qualities, and like to influence people. They prefer enterprising careers, such
as buyer, sports promoter, business executive, salesperson, supervisor, and
manager.
Conventional (C) These people are detail oriented and have strong organizational
skills. They prefer conventional careers, such as bookkeeper, financial analyst,
banker, tax expert, secretary, and radio dispatcher.
On the inventory, individuals evaluate and record their interests and abilities; specifi-
cally, they are asked to identify their daydreams, activities, likes or dislikes, competencies,
occupations of interest, and self-appraisals in mechanical, scientific, teaching, sales, and
clerical fields. After the individual completes the inventory, the SDS provides his or her
three-letter summary code. This code is based on the RIASEC categories and reflects a combination
of the test taker’s interests. For example, say that an examinee’s three-letter summary code is
IEA (i.e., Investigative, Enterprising, and Artistic). The first letter of the code shows the type
the individual most closely resembles, the second letter shows the type he or she next most
closely resembles, and the third letter shows the third closest type he or she resembles.
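The ordering logic behind a three-letter summary code can be sketched as follows. This is a minimal illustration, not the SDS scoring procedure. It assumes that a single total score is already available for each RIASEC type, and the function name, the example totals, and the tie-breaking rule are all hypothetical.

```python
RIASEC_ORDER = "RIASEC"  # Realistic, Investigative, Artistic, Social, Enterprising, Conventional


def summary_code(scores: dict[str, int], length: int = 3) -> str:
    """Return the letters of the highest-scoring RIASEC types, highest first."""
    missing = set(RIASEC_ORDER) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # sorted() is stable, so tied types keep their RIASEC order. This is an
    # assumption made for this sketch, not a rule taken from the SDS manual.
    ranked = sorted(RIASEC_ORDER, key=lambda letter: -scores[letter])
    return "".join(ranked[:length])


# Hypothetical totals for one examinee.
example_scores = {"R": 12, "I": 41, "A": 30, "S": 18, "E": 35, "C": 9}
print(summary_code(example_scores))  # IEA
```

With these totals, Investigative is the highest type, followed by Enterprising and Artistic, which reproduces the IEA example in the text.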
The SDS has several forms for use with specific populations:
• Form R, Career Development Helps individuals not yet in the workforce gain insight
into the world of work and match occupations with their interests and skills
• Form E, Vocational Guidance Assists individuals with limited reading skills as they
explore vocational options
• Form CP, Career Path Management Focuses on the needs of individuals who have
or aspire to have high levels of responsibility
SDS Form R also has an Internet version (Holland et al., 2001) that provides a print-
able Interpretive Report. The report identifies the examinee’s three-letter summary code
and lists the occupations, fields of study, and leisure activities that correspond to the sum-
mary code. To describe potential occupations, the report provides information in four
columns (see Table 11.1). Column one is the SDS code. Column two (Occupation) lists the
occupations that are determined from the test taker’s three-letter summary code. The
TABLE 11.1 Sample List of Occupations from the Self-Directed Search Interpretive Report
SDS Code   Occupation                    O*NET Code   ED
ISC        Computer network specialist   15-1152.00   3
           Dialysis Technician           29-2099.00   3
           Linguist                      19-3099.00   5
           Market Research Analyst       11-2011.01   4
           Microbiologist                19-1022.00   5
           Physician, Occupational       29-1062.00   5
Source: Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204
North Florida Avenue, Lutz, Florida 33549, from the Self-Directed Search Software Portfolio (SDS-SP) by
Robert C. Reardon, Ph.D. and PAR Staff, Copyright 1985, 1987, 1989, 1994, 1996, 1997, 2000, 2001, 2005, 2008,
2013. Further reproduction is prohibited without permission from PAR, Inc.
occupations are taken from the Occupational Information Network (O*NET), which has a
taxonomy of nearly 1,000 occupations. Column three is the O*NET code that provides the
corresponding number for each occupation. Column four (ED) shows the level of education
required for each occupation. The numbers under ED have the following meanings:
1. means that elementary school training or no special training is required;
2. means that high school or GED is usually needed;
3. means that community college or technical education is usually needed;
4. means that college is required; and
5. means that an advanced degree is required.
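As a rough illustration of how the ED column might be used, the sketch below stores the five ED levels as a lookup table and filters the occupations from Table 11.1 by a maximum education level. The rows are copied from the table. The names ED_LEVELS, TABLE_11_1, and occupations_up_to are hypothetical and are not part of the SDS software.

```python
# ED levels as described in the text (wording condensed).
ED_LEVELS = {
    1: "elementary school training or no special training",
    2: "high school or GED",
    3: "community college or technical education",
    4: "college",
    5: "advanced degree",
}

# Rows from Table 11.1 (summary code ISC): (occupation, O*NET code, ED level).
TABLE_11_1 = [
    ("Computer network specialist", "15-1152.00", 3),
    ("Dialysis Technician", "29-2099.00", 3),
    ("Linguist", "19-3099.00", 5),
    ("Market Research Analyst", "11-2011.01", 4),
    ("Microbiologist", "19-1022.00", 5),
    ("Physician, Occupational", "29-1062.00", 5),
]


def occupations_up_to(rows, max_ed):
    """Return the rows whose ED level is at most max_ed."""
    if max_ed not in ED_LEVELS:
        raise ValueError(f"ED level must be one of {sorted(ED_LEVELS)}")
    return [row for row in rows if row[2] <= max_ed]


# List the occupations that require no more than a college degree (ED 4).
for occupation, onet_code, ed in occupations_up_to(TABLE_11_1, 4):
    print(f"{occupation} ({onet_code}): {ED_LEVELS[ed]}")
```

A filter of this kind only narrows the list by education level; it says nothing about how well an occupation fits the rest of a client's profile.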
Strong Interest Inventory The SII (or the Strong) is one of the most respected measures
of vocational interests in the United States. The inventory has been widely used in educa-
tional settings, public institutions, and private organizations for over 80 years. The first
version of the SII instrument (then called the Strong Vocational Interest Blank) was pub-
lished by E. K. Strong in 1927, and its most recent revision was in 2012 (Donnay, Thompson,
Morris, & Schaubhut, 2012). The SII instrument was designed for evaluating career-related
interests with adults, college students, and high school students aged 14 and up (Harmon,
Hansen, Borgen, & Hammer, 1994). It assesses a client’s interests among a broad range of
occupations, work and leisure activities, and educational subjects and yields results for
several themes and scales:
• Six General Occupational Themes Interest patterns based on Holland’s RIASEC cat-
egories (i.e., realistic, investigative, artistic, social, enterprising, and conventional)
• 30 Basic Interest Scales Specific interest areas within the six General Occupational
Themes
• 244 Occupational Scales The individual’s interests related to satisfied workers
within various occupations
• Five Personal Style Scales The individual’s preferences related to work style, learn-
ing environment, leadership style, risk taking, and team orientation
Results are provided on the SII® Profile and Interpretive Report, which is several pages long
and consists of a comprehensive presentation of an individual’s scores. A Profile Summary
is also provided that gives a graphic snapshot of the test taker’s results for immediate easy
reference. This includes the individual’s (a) three highest General Occupational Themes,
(b) top five interest areas, (c) areas of least interest, (d) top 10 Strong occupations, (e) occu-
pations of dissimilar interest, and (f) personal style scales preferences. Figure 11.1 shows
an example of an SII® Profile Summary.
Campbell Interest and Skill Survey The CISS (Campbell, Hyne, & Nilson, 1992) is another
instrument that measures interests as well as self-estimates of skills. Its major purpose is to
help individuals understand how their interests and skills map into the occupational
world, thus helping them make better career choices. It focuses on careers that require
postsecondary education and is most appropriate for use with individuals who are college
bound or college educated. The CISS contains 200 interest items, and test takers are asked
to rate their level of interest on each item using a 6-point scale ranging from Strongly Like
to Strongly Dislike. The CISS also contains 120 skill items, which ask respondents to rate
their skill level on a 6-point scale from Expert: Widely recognized as excellent in this area to
None: Having no skills in this area. The CISS yields scores on three sets of scales: Orientation
Scales, Basic Interest and Skill Scales, and Occupational Scales.
I. Orientation Scales These scales cover seven broad themes of occupational interests
and skills. Each Orientation Scale corresponds to Holland’s RIASEC themes (the
comparable Holland dimensions are noted in parentheses):
1. Influencing (Enterprising)—influencing others through leadership, politics, public
speaking, sales, and marketing
For each CISS scale (Orientation Scales, Basic Interest and Skill Scales, and Occupa-
tional Scales), two scores are calculated, one based on the test taker’s interests and the
other based on skills. The interest score shows how much the respondent likes the specified
activities; the skill score shows how confident the respondent feels about performing these
activities. The CISS uses T scores (with a mean of 50 and a standard deviation of 10), and
the combination of interest and skill score results in four patterns:
• Pursue When interest and skill scores are both high (≥ 55), the respondent has
reported both attraction to these activities and confidence in his or her ability to per-
form them well. This is an area for the respondent to Pursue.
• Develop When the interest score is high (≥ 55) and the skill score is lower (< 55),
this is a possible area for the respondent to Develop. The respondent may enjoy these
activities but feels uncertain about his or her ability to perform them. Further educa-
tion, training, or experience with these skills might lead to better performance and
greater confidence, or these skills may simply be enjoyed as hobbies.
• Explore When the skill score is high (≥ 55) and the interest score is lower (< 55),
this is a possible area for the respondent to Explore. The respondent is confident of
his or her ability to perform these activities but does not enjoy them. With some
exploration, the respondent may find a way to use these skills in other areas that
hold more interest.
• Avoid When interest and skill scores are both low (≤ 45), this is an area for the
respondent to Avoid in career planning. The respondent has reported not enjoying
these activities and not feeling confident in his or her ability to perform them.
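The four patterns can be expressed as a small decision rule. The sketch below reads the cutoffs as "high" at a T score of 55 or above, "lower" as below 55, and "low" as 45 or below. The function names t_score and ciss_pattern are hypothetical, the norm values in the usage lines are invented, and score pairs that match none of the four descriptions are returned as None because the text does not name a pattern for them.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T score (mean 50, standard deviation 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


def ciss_pattern(interest_t, skill_t, high=55, low=45):
    """Classify a pair of interest and skill T scores into one of four patterns."""
    if interest_t >= high and skill_t >= high:
        return "Pursue"   # high interest, high skill
    if interest_t >= high and skill_t < high:
        return "Develop"  # high interest, lower skill
    if skill_t >= high and interest_t < high:
        return "Explore"  # high skill, lower interest
    if interest_t <= low and skill_t <= low:
        return "Avoid"    # low interest, low skill
    return None           # mid-range pair; no pattern is named in the text


# Invented raw scores and norms, used only to show the arithmetic.
interest = t_score(raw=30, norm_mean=20, norm_sd=5)  # 50 + 10 * 2 = 70.0
skill = t_score(raw=18, norm_mean=20, norm_sd=5)     # 50 + 10 * (-0.4) = 46.0
print(interest, skill, ciss_pattern(interest, skill))
```

In this invented case the respondent's interest is well above average while self-rated skill is slightly below average, so the area would be flagged as one to Develop.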
that reflect a range of positions in today’s workforce—including skilled trades and techni-
cal and service professions—requiring 2 years or less of postsecondary training. The Career
Assessment Inventory-Enhanced Version compares an individual’s occupational interests
to those of individuals in 111 specific careers that reflect a broad range of technical and
professional positions in today’s workforce.
The Harrington–O’Shea Career Decision-Making System—Revised (CDM-R) is appropri-
ate for individuals ages 12 and older. The CDM-R provides scale scores in six interest areas
(e.g., crafts, arts, scientific, social). It assesses abilities, interests, and work values in one
instrument. The CDM Level 1 is for middle school students. The CDM Level 2 is for high
school and college students and adults.
The Interest Determination, Exploration, and Assessment System (IDEAS) provides scores
on such scales as mechanical, electronics, nature/outdoor, science, numbers, writing, arts/
crafts, social service, child care, medical service, business, sales, office practice, and food
service. The instrument is designed for use in grades 6 through 12.
The Jackson Vocational Interest Survey (JVIS) is used with high school students, college
students, and adults and has 34 basic interest scales, such as creative arts, performing arts,
mathematics, skilled trades, dominant leadership, business, sales, law, human relations,
management, and professional advising. It also has 10 occupational themes, such as asser-
tive, enterprising, helping, and logical.
Counselors need to be aware of a number of problem areas when using interest
inventories. First, interest does not equate to aptitude. Although someone may be inter-
ested in pursuing a career as an attorney, they may not have the potential for academic
success in that area. Also, because interest inventories are completed based on an indi-
vidual’s own perception of him- or herself (self-report), there may be some issues with
reliability of the results. Counselors should check the validity of test results by asking
examinees about their interests and activities, inquiring about their preferences, and find-
ing out why areas are liked and disliked. For example, individuals may have had no expe-
rience or opportunity in certain interest areas and may not have been able to identify some
potential career options. Also, some of the reported likes and dislikes by a client may be
influenced by parent, spouse, or family attitudes and not be true indicators of the indi-
vidual’s preferences. As you work with clients on career assessment, it is important that
you stress that interests should not be determined by how clients spend their time. Time
may be spent doing things under pressures from parents, spouse, family, or peers. Or indi-
viduals simply may not have time to do the things they most enjoy. Other problems with
interest inventories include the following:
1. Even though there is much evidence of the stability of interests, especially from the
late teens on, some clients change their interests dramatically in their adult years.
2. Instruments given before grade 10 or 11 may not be accurate measures of interests.
Students may not have had the necessary background of experiences, real or vicarious.
3. Job success is usually correlated more strongly with abilities than with interests.
4. Many interest inventories are susceptible to faking, either intentional or unintentional.
5. Response set may affect the validity of an individual’s profile. Clients may choose
options they consider more socially desirable or acquiescence may prevail.
6. High scores are not the only valuable scores on an interest inventory. Low scores,
showing what people dislike or want to avoid, are often more predictive than high
scores.
7. Societal expectations and traditions may prove more important than interest in deter-
mining vocational selection. Gender bias was a major concern in interest measure-
ment during the past several decades and needs to be considered when selecting
instruments and interpreting profiles.
8. Socioeconomic class may affect the pattern of scores on an interest inventory.
9. The inventories may be geared to the professions rather than to skilled and semi-
skilled areas. Many inventories were criticized because they were geared exclusively
to college-bound students.
10. A profile may be flat and hard to interpret. In such a situation, a counselor should use
other instruments and techniques to determine interests.
11. Tests use different types of psychometric scoring procedures. Some interest invento-
ries use forced-choice items, in which individuals are asked to choose from a set of
options. Examinees may like or dislike all of the choices but still must choose. Scoring
procedures will have an impact on interpretation of results.
Rokeach Values Survey The RVS is a rank-order instrument in which the test taker is
presented with 18 instrumental values and 18 terminal values and asked to rank them.
Terminal values are the end state we hope to achieve in life, such as freedom, a world of
peace, family, security, pleasure, health, excitement, or a comfortable economic life. Instru-
mental values are the means by which we achieve terminal values, such as being polite,
ambitious, caring, self-controlled, obedient, or helpful.
*Super’s Work Values Inventory-Revised. Copyright © Kuder, Inc. All rights reserved.
Salience Inventory The SI measures the value and preference that an individual places
on his or her career roles in comparison to other life roles. The SI measures the importance
of five major life roles: student, worker, homemaker, leisurite, and citizen. The inventory
consists of 170 items, resulting in 15 scales that examine the relative importance of the five
major life roles in participation, commitment, and value expectations. The inventory can
be administered to adults and high school students.
Minnesota Importance Questionnaire The MIQ measures 20 vocational needs and six
underlying values related to those needs. The values assessed are achievement, altruism,
autonomy, comfort, safety, and status. The instrument is for individuals ages 15 and up
and takes about 20 to 35 minutes to complete.
Personality Inventories
When we think of personality, we think of the thoughts, behaviors, and emotions that are
enduring and distinguish one person from another. The career development theories of
John Holland (1996) and Donald Super (1990) identify personality as a key factor in career
choice. They believe that individuals’ personality characteristics and traits are related to
their career choice and to their ability to perform successfully at a given occupation.
Assessing personality in career assessment usually involves instruments that meas-
ure “normal” personality characteristics rather than maladaptive or “pathological” per-
sonality traits. Characteristics such as sociability, motivation, attitude, responsibility,
independence, and adjustment are the types of personality traits that are important for
career exploration and choice. For example, the Myers–Briggs Type Indicator (MBTI) can be
used in career assessment as a tool to help individuals understand themselves and others,
and how they approach problems in different ways. The test can help organizations with
career planning and development, improving teamwork, conflict resolution, and improv-
ing individual communication with supervisors, peers, and employers. There are 16 per-
sonality types possible on the MBTI, which are more fully described in Chapter 12.
Understanding a client’s personality type can assist in career planning by helping the cli-
ent select a career field that is a good fit for the client’s personality. Furthermore, it can
increase the client’s awareness of his or her learning style so that he or she can benefit more
from career-related education or vocational training programs. Consider the following
example:
Skills:
• Active Listening: Giving full attention to what other people are saying, taking time to understand
the points being made, asking questions as appropriate, and not interrupting at inappropriate times.
• Social Perceptiveness: Being aware of and understanding others’ reactions.
• Reading Comprehension: Understanding written sentences and paragraphs in work related documents.
• Service Orientation: Actively looking for ways to help people.
• Speaking: Talking to others to convey information effectively.
• Critical Thinking: Using logic and reasoning to identify the strengths and weaknesses of
alternative solutions, conclusions or approaches to problems.
• Time Management: Managing one’s own time and the time of others.
• Writing: Communicating effectively in writing as appropriate for the needs of the audience.
• Active Learning: Understanding the implications of new information for both current and future
problem-solving and decision-making.
• Coordination: Adjusting actions in relation to others’ actions.
Abilities:
• Oral Expression: The ability to communicate information and ideas in speaking so others will
understand.
• Oral Comprehension: The ability to listen to and understand information and ideas presented
through spoken words and sentences.
• Problem Sensitivity: The ability to tell when something is wrong or is likely to go wrong. It does
not involve solving the problem, only recognizing there is a problem.
• Speech Clarity: The ability to speak clearly so others can understand you.
• Inductive Reasoning: The ability to combine pieces of information to form general rules or
conclusions (includes finding a relationship among seemingly unrelated events).
• Written Expression: The ability to communicate information and ideas in writing so others will
understand.
• Deductive Reasoning: The ability to apply general rules to specific problems to produce answers
that make sense.
• Speech Recognition: The ability to identify and understand the speech of another person.
• Written Comprehension: The ability to read and understand information and ideas presented in
writing.
• Near Vision: The ability to see details at close range (within a few feet of the observer).
FIGURE 11.3 O*NET Description of skills and abilities required for educational, vocational, and school
counselors.
Source: U.S. Department of Labor National Center for O*NET Development (2007). O*NET Online Summary
Report for: 21-1012.00—Educational, Vocational, and School Counselors. Retrieved August 1, 2007 from
http://online.onetcenter.org/link/summary/21-1012.00
Kuder Skills Assessment The KSA is part of the comprehensive online Kuder Career Plan-
ning System, which includes two partner assessments, the Kuder Career Search with Person
Match and the revised Super’s Work Values Inventory. The instruments measure various
skills that are aligned with specific occupational clusters, such as finance, manufacturing,
health science, information technology, and education and training. There are two levels,
one for middle and high school students (the KSA) and one designed for college students
and adults (the KSA-CA). The KSA has 90 items that require students to rate their current
skill level for various tasks using a 4-point scale: 1 (I don’t think I could ever learn to do
this task), 2 (I could probably learn to do this task, but it would be hard), 3 (I could easily
learn to do this task), and 4 (I can already do this task). The KSA-CA is a 160-item instru-
ment that requires adult test takers to rate their degree of certainty in performing a wide
variety of activities using a 5-point scale from 1 (Cannot do at all) to 5 (Completely certain
can do).
Ability Explorer The Ability Explorer (Harrington, Harrington, & Wall, 2012) measures
strengths in the 14 abilities important in today’s workplace. This 140-question assess-
ment enables individuals to learn their strongest abilities, plus related courses, activities,
and careers for developing and using these abilities. Designed for individuals from high
school to adult, the instrument is written at an eighth-grade reading level and takes
about 30 minutes to complete. Test takers are asked to read each statement and then
indicate how good they are or would be at doing an activity. The Ability Explorer meas-
ures 14 abilities: artistic, clerical, interpersonal, language, leadership, manual, musical/
dramatic, numerical/mathematical, organizational, scientific, persuasive, spatial, social,
and technical/mechanical.
Career Maturity Inventory, Revised The CMI was developed for students from grades 6
through 12 to measure attitudes and competencies related to variables in the career-choice
process. The inventory has two parts: an attitude inventory and a competence inventory.
The attitude inventory measures attitudes toward the decisiveness, involvement, inde-
pendence, orientation, and compromise dimensions of career maturity. The competence
inventory has five subscales: Self-Appraisal, Occupational Information, Goal Selection,
Planning, and Problem Solving.
Career Development Inventory The CDI (Super, Zelkowitz, & Thompson, 1981) was
developed for K–12 and college and university students. The inventory contains five scales:
Career Planning, Career Exploration, Decision Making, World-of-Work Information, and
Knowledge of Preferred Occupational Group. The scales are combined to give scores in
Career Development Attitudes, Career Development Knowledge and Skills, and Career
Orientation.
Career Factors Inventory The CFI is designed to help people determine whether they
are ready to engage in the career decision-making process. The CFI can be used for indi-
viduals age 13 and above; its 21-item, self-scoreable booklet takes 10 minutes to complete.
It explores an individual’s perceived issues of lack of self-awareness, lack of career infor-
mation, career anxiety, and generalized indecisiveness.
Kuder Career Planning System The Kuder Career Planning System is an Internet-based
system that was developed based on the work of Dr. Frederick Kuder, a pioneer in the
career development industry. The Kuder system provides several customizable career
development tools to help individuals identify their interests, explore their career options,
and plan for career success. It can be used by middle school, high school, and college stu-
dents, as well as adults. The Kuder system includes an interest inventory, a skills assess-
ment, and a values inventory, all available online:
• Kuder Galaxy is designed for students in pre-K through grade five. The online system
allows for an interactive process that involves games, activities, and video to help
identify areas for career exploration.
• Kuder Skills Assessment is designed for grades 6 through 12 and provides an inter-
active online system to help identify college and career paths. The assessment can
be completed in under 20 minutes and helps students to develop multi-year career
plans.
• Kuder Journey is an online interactive system designed to aid in career planning for
college students and adults. This research-based system provides the user with an
analysis of interests, skills, and work values.
Interviews
As is the case in all areas of counseling, the initial interview is an important aspect of career
assessment. During the interview, counselors gather general background information
common to any unstructured interview (see Chapter 2). In addition, they may ask the cli-
ent questions relevant to career assessment, such as the client’s work experience, education
and training, interests, and leisure activities (Barclay & Wolff, 2012). Counselors may also
ask clients to describe their typical day, including (a) how much they like routine or spon-
taneity and (b) how much they rely on others or are independent. An important aspect of
career planning is the client’s readiness to make career decisions as well as his or her
knowledge about the world of work and personal abilities and skills. Barclay and Wolff
suggested that counselors may use a structured career interview (e.g., Career Construction
Interview) to analyze life themes, patterns, self-concept, and salient interests. Further,
Barclay and Wolff determined that raters of the Career Construction Interview could iden-
tify RIASEC themes from the content of the interview. Although further research is needed
to validate these results, the idea that interviews can result in the same coding as standard-
ized assessments is promising.
McMahon and Watson (2012) argued that a narrative approach to career assessment
can provide a rich qualitative analysis of career interests and potential career paths. Gibson
and Mitchell (2006, p. 112) provided sample questions that may be asked of clients who
have expressed an interest in changing careers:
1. Would you give me a brief review of your employment experiences?
2. What is your educational background?
3. Do you have any unique experience or interest that might be related to the choice of
career, such as hobbies or special interests?
4. Why have you decided, at this time, to change careers?
5. Tell me about your ideas about a new or different career.
Employment Assessment
In the context of employment, businesses and organizations use various assessment strate-
gies to make recommendations or decisions about employees or job applicants. For exam-
ple, various personnel assessment procedures can be used to select individuals for
entry-level positions, make differential job assignments, select individuals for advanced
and specialized positions, decide who will be promoted within an organization, decide
who is eligible for training, and provide diagnostic and career development information
for individuals. Effective personnel assessment involves a systematic approach to gather-
ing information about applicants’ job qualifications. Not all assessment instruments are
appropriate for every job and organizational setting; agencies must determine the most
appropriate assessment strategy for their particular needs. The effective use of assessment
tools will reduce the degree of error in making hiring decisions (U.S. Office of Personnel
Management, 2009). In this section, we will discuss common procedures specific to employ-
ment assessment, including selection interviews, biographical information, and tests. We
will also present information about job analysis and assessment centers.
Selection Interviews
The selection interview is one of the most widely used methods of assessing job applicants.
The purpose of the interview is to gain information about candidates’ qualifications and
experiences relevant to the available job. Although there is limited evidence that inter-
views are valid procedures for personnel selection, they continue to be widely used (Hebl,
Madera, & Martinez, 2014). According to Hebl, Madera, and Martinez, there are numerous
issues with interview processes, and they can be highly impacted by cultural or racial bias.
Although interviews are widely used as a primary selection process, little research attention
has been paid to multicultural issues, bias, and prejudice in the interview (Manroop,
Boekhorst, & Harrison, 2013).
Although interviews can be impacted greatly by the lens of the interviewer, they
remain a viable approach to assessment of suitability for personnel selection. In lieu of
unstructured approaches, Hebl, Madera, and Martinez (2014) suggested that interviewers
take a more systematic, structured, job-related interview approach. These structured
approaches tend to have higher validity. However, the interviewer should be trained in
personnel selection and understand the relationship of traits to job performance.
As an example of structured interviewing, the situational interview, with specific
questions based on job-related critical incidents, has also proven to be a valid assessment
tool (Seijts & Kyei-Poku, 2010). Although the use of structured approaches tends to miti-
gate some of the bias issues in interviewing, it is important to identify some of the ongoing
concerns. Studies of the interview show that interviewers tend to make decisions about
candidates early in the interview. Negative information tends to be weighed more heavily
than positive information, and visual cues tend to be more important than verbal cues
(Levashina, Hartwell, Morgeson, & Campion, 2014). In addition, ratings are affected by
how many others are being rated at the same time; applicants who share similarities with the
interviewer (the same race, gender, and so on) tend to be more favorably rated. Often,
applicants are rated the same in evaluations, either superior (leniency error), average (cen-
tral tendency error), or poor (stringency error). Sometimes, one or two either good or bad
characteristics of an applicant tend to influence the ratings of all the other characteristics
(halo error). Sometimes, the interviewer allows the quality of the applicants who preceded
the present applicant to influence the ratings of the present applicant (contrast effect).
Interviewers are overconfident at times in their ability to evaluate applicants; as a result,
they make hasty decisions. Interviewers need training and instruction in the following
skills (Gatewood, Feild, & Barrick, 2010):
1. Setting the stage for open communication
2. Consistency in the questioning/interview process
3. Keeping the interview on target
4. Using appropriate speech
5. Mastering listening skills
6. Using notes to keep track of information
7. Using good interviewing skills to maintain the flow of communication and a
collaborative process
8. Managing nonverbal aspects of the interview process
The selection interview can be valuable if the interviewers are trained in the process,
panels are used whenever possible, job-related questions are used, and multiple questions
are designed for the requirements of the job.
Biographical Information
Almost all employers require prospective employees to complete an application blank, a
biographical information blank, or a life history. Employers believe that these forms pro-
vide information on what the candidate has done in the past and can be used to predict
future behavior. Biographical data has received a great deal of attention in the world of
career development. Overall, there are some assumptions that individuals use when
Tests
Many employers use tests as part of the assessment process to develop work-related infor-
mation and recommendations or decisions about people who work for them or are seeking
employment with them. To be effective, instruments must be valid for the intended purpose
and should be the least discriminatory tool for the decisions that need to be made.
Tests can help ensure fair treatment of workers in an employment context.
Tests Used in the Private Sector Employees in the private sector work primarily in man-
ufacturing, retail trades, and service occupations. Tests used in these settings measure
such attributes as general aptitude, cognitive ability, clerical achievement, spatial and
mechanical abilities, perceptual accuracy, and motor ability. Other factors, such as atti-
tudes, motivation, organizational commitment, and work environment, can also be
assessed to diagnose strengths and weaknesses in an organization and to monitor changes
in the workforce. Job satisfaction instruments are used to measure dimensions such as
attitude toward supervision, the company, coworkers, salary, working conditions, promo-
tion, security, subordinates, autonomy, and esteem needs. Personality inventories are also
used in some employment contexts. In general, the most widely used personality invento-
ries in these settings include the Myers–Briggs Type Indicator, the California Psychologi-
cal Inventory, the Edwards Personal Preference Inventory, and the Guilford–Zimmerman
Temperament Scale.
Examples of standardized instruments used in the private sector include the following:
• The Career Attitudes and Strategies Inventory (CASI) helps to identify career prob-
lems requiring further discussion and exploration. It includes a career checkup; a
self-assessment of the client’s career or situation; and a survey of the beliefs, events,
and forces that affect the client’s career.
• The Comprehensive Ability Battery (CAB) contains 20 pencil-and-paper subtests, each of
which measures a single primary-ability factor related to performance in an industrial
setting. The subtests are Verbal Ability, Numerical Ability, Spatial Ability, Perceptual
Completion, Clerical Speed and Accuracy, Reasoning, Hidden Shapes, Rote Memory,
Mechanical Ability, Meaningful Memory, Memory Span, Spelling, Auditory Ability,
Esthetics Judgment, Organizing Ideas, Production of Ideas, Verbal Fluency, Originality,
Tracking, and Drawing.
Chapter 11 • Career and Employment Assessment 261
Other government agencies develop and administer their own tests. The U.S. Depart-
ment of State, for example, uses its own tests to select Foreign Service officers. The U.S.
Employment Service (USES) has developed instruments for use in the local and state
employment offices. The General Aptitude Test Battery (GATB) is widely used; it compares
examinee scores with those of workers in more than 600 jobs. The USES has also developed
interest inventories and tests of proficiency in dictation, typing, and spelling.
State and local government agencies often require tests for selection purposes. Tests
are frequently required of police officers and firefighters, as well as clerical workers. Local
governmental agencies tend to use tests less than state agencies. However, both state and
local agencies use oral and performance examinations more than federal agencies do
(Friedman & Williams, 1982). Skilled workers tend to be given tests more than unskilled
workers, and tests are not used as a sole criterion. Education, experience, character, and
residence requirements are other important factors considered in hiring.
States often mandate occupational and professional licensing for which some type of
examination is required. This is true even in education in many states, with teachers and
principals required to pass licensing examinations. States often establish a licensing board
for a specific occupation. Occupations requiring licensing examinations include the fol-
lowing: architects, acupuncturists, audiologists, chiropractors, dentists, dental hygienists,
engineers, funeral directors, landscape architects, land surveyors, occupational therapists,
psychologists, speech pathologists, hearing aid counselors, optometrists, registered nurses,
practical nurses, pharmacists, physical therapists, physicians, physician’s assistants, podi-
atrists, and social workers.
Tests Used in the Military The military makes extensive use of tests for selection and
classification purposes. For example, the ASVAB is a comprehensive aptitude test that
helps identify the proper technical training or education needed for various types of
occupational areas in the armed forces. In addition, specific tests are used to select can-
didates for admission to the service academies, reserve officer training programs, officer
candidate schools, and specialized programs (such as flight training). The Cadet Evalu-
ation Battery (CEB), the Air Force Officer Qualifying Test (AFOQT), the Alternate Flight
Aptitude Selection Test (AFAST), and the Defense Language Aptitude Battery (DLAB)
are examples.
Job Analysis
One of the key elements in effective personnel assessment is job analysis. Job analysis is a
purposeful, systematic process for documenting the particular duties and requirements of
a job and the relative importance of these duties. It is used to analyze information about
the duties and tasks of a given job; environment and working conditions; tools and
equipment; relationships among supervisors and employees; and the knowledge, skills, and
abilities required for the job. Job analysis is conducted to (a) help determine the
training requirements of a given job, (b) make decisions about compensation, (c) aid in selecting
job applicants, and (d) review job performance.
The interview is an excellent method for obtaining information for job analysis
and can be conducted with employees, supervisors, and other individuals knowledge-
able about the job. An example of a job analysis interview for employees is shown in
Table 11.2.
Job analysis can also take place in focus groups or multiperson interviews, which are
conducted by one person with several interviewees. A multiperson job analysis should
follow these steps:
1. Read and review existing materials and data on the job to be analyzed.
2. Have supervisors and experienced workers in the field meet together in a group and
discuss the job requirements, producing a list of tasks and roles that are performed.
3. Display the tasks and job characteristics identified so that the group can react to what
you have written.
4. List outputs, knowledge, skills, and abilities (including use of machines, tools, equip-
ment, and work aids) needed to get the job done. Get agreement from the group on
the tasks performed.
5. Have the workers determine the percentage of time spent on each task or skill.
6. Combine common tasks together.
7. Have the workers explain how they know or recognize excellent, satisfactory, or poor
performance of the tasks and the job.
8. Construct an instrument to assess job performance, and have supervisors and work-
ers react to the tasks and performance standards identified.
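Steps 5 and 6 above can be illustrated with a short sketch. The task names, time estimates, and grouping below are hypothetical and are not taken from any published job analysis; the sketch also assumes that every worker gives an estimate for every task. It averages the workers' percentage-of-time estimates for each task and then combines common tasks into broader categories.

```python
from collections import defaultdict

# Hypothetical worker estimates: percentage of time spent on each task (step 5).
worker_estimates = [
    {"answer phones": 30, "greet visitors": 10, "file records": 40, "enter data": 20},
    {"answer phones": 20, "greet visitors": 20, "file records": 30, "enter data": 30},
]

# Hypothetical grouping agreed on by the group when combining common tasks (step 6).
task_groups = {
    "answer phones": "customer contact",
    "greet visitors": "customer contact",
    "file records": "record keeping",
    "enter data": "record keeping",
}


def mean_time_per_task(estimates):
    """Average the percentage of time reported for each task across workers."""
    totals = defaultdict(float)
    for estimate in estimates:
        for task, percent in estimate.items():
            totals[task] += percent
    return {task: total / len(estimates) for task, total in totals.items()}


def combine_tasks(task_means, groups):
    """Sum the mean percentages of tasks that belong to the same category."""
    combined = defaultdict(float)
    for task, percent in task_means.items():
        combined[groups[task]] += percent
    return dict(combined)


task_means = mean_time_per_task(worker_estimates)
combined_means = combine_tasks(task_means, task_groups)
print(combined_means)
```

In practice the group, not the analyst, decides which tasks belong together; the code only does the arithmetic once that agreement has been reached.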
Several standardized instruments are available for use with job analysis. They usually
contain three major sections: background information on the respondents, job tasks,
and other miscellaneous information. Examples include the Position Analysis
Questionnaire (PAQ) and WorkKeys.
THE POSITION ANALYSIS QUESTIONNAIRE The PAQ is a structured job analysis question-
naire that measures the human attributes needed to perform a job. It consists of 195 items
divided into six major categories:
1. Information Input Where and how does the worker get the information that is used
in performing the job?
2. Mental Processes What reasoning, decision-making, planning, and information-
processing activities are involved in performing the job?
3. Work Output What physical activities are required to perform the job, and what
tools or devices are used?
4. Relationships with Other Persons What relationships with other people are
required in performing the job?
5. Job Context In what physical and social contexts is the work performed?
6. Other Job Characteristics What activities, conditions, or characteristics other than
those described previously are relevant to the job?
Using rating scales, items may be evaluated on the (a) extent of use in the given job, (b)
amount of time, (c) importance to the job, (d) possibility of occurrence, or (e) applicability
to the job in question.
WORKKEYS WorkKeys is a system developed by ACT that gives students and workers
reliable, relevant information about their workplace skill levels. It includes eight tests, six
of which correspond to basic academic skills and two of which assess other, nonacademic
workplace skills, such as applied technology and teamwork. WorkKeys also includes a
job-analysis system that helps employers identify the skills employees need to be success-
ful on the job. Three different job-analysis options are available:
1. Job Profiling Used with focus groups to identify the tasks most critical to a job and
the skill levels required to enter the job and perform effectively
2. SkillMap Given to job administrators or job experts, who identify and rate the tasks
and skills requirements needed for a particular job
3. WorkKeys Estimator Provides quick estimates of the skill levels needed for a
given job
Assessment Centers
In employment assessment, the term assessment center refers to a technique or
approach to assessment, not to a place where assessment occurs. Assessment
centers are group-oriented, standardized series of activities that provide a basis for judg-
ments or predictions of human behaviors considered relevant to work performance in a
particular organizational setting (Muchinsky, 2008). These activities can occur in some
physical location in an organization or at some location away from the workplace.
Assessment centers are used in human resource management to (a) decide whom to select or
promote, (b) diagnose strengths and weaknesses in work-related skills, and (c) develop
job-relevant skills (Thornton & Rupp, 2006). Because assessment centers are expensive, they
have been used mainly by large organizations—but consulting firms may also provide
assessment center services to smaller companies.
Muchinsky (2008, p. 120) lists four characteristics of the assessment center approach:
1. People who attend the center (assessees) are usually management-level personnel
that the company wants to evaluate for possible selection, promotion, or training.
2. Assessees are evaluated in groups of 10 to 20, although subgroups may be formed for
specific exercises. The group format provides opportunity for peer evaluation.
3. Several raters (the assessors) do the evaluation. They work in teams and collectively
or individually make recommendations about selection and promotion.
4. The assessment program can take from one to several days to complete, and multiple
assessment methods are employed. Many are group exercises, but inventories, per-
sonal history forms, and interviews are also used.
The assessment center approach often uses situational judgment tests and work sam-
ple tests. A situational judgment test is usually a paper-and-pencil test that presents the exam-
inee with a written scenario and a list of possible solutions to the problem. The individual
has to make judgments about how to deal with the situation. Work sample tests measure job
skills by having the examinee perform a simulated work task. An example of this is the In-
Basket Test (Frederiksen, Saunders, & Wand, 1957), a test that presents job applicants with
materials in an “in-basket” related to tasks, problems, or situations that routinely occur in
the position for which they have applied. The in-basket typically consists of memos,
appointments, email messages, phone messages, correspondence, reports, forms, and other
materials that reflect real-world, job-related tasks. The candidate’s task is to review the in-
basket items and then take action, recording any notes, comments, and responses on special
forms. A number of versions of this approach have been developed for different groups,
such as school administrators and military officers. Candidates demonstrate how they
would handle each problem and are evaluated not only on their solutions but also on other
work-related skills, such as time management, decision making, and organizational skills.
Thornton and Byham (1982) identified nine dimensions frequently measured in
assessment centers: communication skills, planning, organizational strategies, delegating
of responsibilities, decisiveness, initiative, stress tolerance, adaptability, and tenacity. The
administrator of an assessment center should keep in mind the following guidelines:
1. Assessment should be based on clearly defined dimensions of the position or behav-
ior in question.
2. Multiple assessment techniques should be used.
3. A variety of job-sampling techniques should be used.
4. Familiarity with the job and the organization is needed; experience in the job or role
is desirable.
5. Thorough training in assessment center procedures is necessary for all observers and
raters.
6. All pertinent behavior should be observed, recorded, and communicated to other
raters and observers.
7. Group discussion and decision-making procedures are used to integrate observa-
tions, rate dimensions, and make predictions.
8. Clients should be assessed against clearly understood external norms, not against
each other.
9. Observers should guard against first impressions and other errors commonly made
in rating and observation.
Summary
There are many approaches to career and employment assessment. Career and employment
tests have had a long history, beginning with Strong’s interest inventory in the 1920s.
These tests have proven to be good measures of vocational success, but many factors need
to be considered. The maturity, socioeconomic class, and educational level of the client
all play important roles in assessing interests and career satisfaction. Environment and
heredity issues are also involved. A client may not have had the opportunities and
experiences necessary to make valid judgments. Family, societal, and peer expectations
may influence an individual’s pattern of behavior, and temperament may influence an
individual’s preferences. Career counselors need to have a comprehensive knowledge of the
world of work as well as extensive knowledge of assessment techniques and counseling
procedures.
Employment tests are more widely used in federal and state agencies than in business and
industry. In addition, many skilled and professional workers must take certification or
licensing examinations, such as those for certification as a school counselor or a mental
health counselor, which then become the prerequisite for hiring. Tests in business and
industry are primarily used for selection purposes. The decline in test use has led to a
movement away from objective selection procedures. Analysis of research has indicated
that research-based testing does not discriminate against minority groups and women and
saves the organization money. Nevertheless, currently there is more reliance on clinical
judgment and consideration of the applicant’s experience and education. Career counselors
need to be guided by the nine specific standards for the use of tests in employment
contexts.
Suggested Activities
1. Evaluate an instrument listed or discussed in this chapter and write a critical review
of it. Be sure to read the evaluations of the test in the Mental Measurements Yearbook.
2. Administer several career assessment instruments and write an interpretation. Tape or
role-play the results for the client.
3. Write a position paper on one of the following topics: the role of assessment in career
counseling, gender and cultural bias in interest inventories, Internet-based career
assessment, or the history of career assessment.
4. Design some nontest techniques to assess individual interests and values. Try out your
assessment techniques on a sample of individuals and write a report of your findings.
5. Study the following hypothetical case and answer the questions that follow it.
Case of Erica
Erica is a 35-year-old mother of one and has been working in the same job since graduating
from college. Erica was the first person in her family to go to college. Erica’s family
emigrated from Mexico and has worked in trade jobs for several generations. Erica’s
family members take pride in working with their hands and have had some opposition to her
pursuit of education. Today, Erica believes that she has advanced as far as possible in
her current position and has gotten tired of working at the same job. She has experienced
some job burnout and has decided to see what other types of careers she might be able to
pursue. After discussing her career concerns with a career counselor, Erica agreed that an
interest inventory might be a good starting point for discussion. The counselor
administered the COPS and the CAPS. Erica had the following scores:
COPS Career Clusters            Raw Scores    Percentiles
 1. Science, Professional           26            91
 2. Science, Skilled                11            50
 3. Technology, Professional         9            45
 4. Technology, Skilled              4            30
 5. Consumer Economics              11            50
 6. Outdoor                         18            70
 7. Business, Professional          26            85
 8. Business, Skilled               14            55
 9. Clerical                         1             2
10. Communication                   15            45
11. Arts, Professional              17            35
12. Arts, Skilled                   21            48
13. Service, Professional           30            87
14. Service, Skilled                17            50
References
Arulmani, G. (2014). Assessment of interest and aptitude: A methodologically integrated approach. In G. Arulmani, A. J. Bakshi, F. T. L. Leong, & A. G. Watts (Eds.), Handbook of career development: International perspectives (pp. 609–629). New York, NY: Springer Science & Business Media.
Barclay, S. R., & Wolff, L. A. (2012). Exploring the career construction interview for vocational personality assessment. Journal of Vocational Behavior, 81(3), 370–377. doi: http://dx.doi.org/10.1016/j.jvb.2012.09.004
Breaugh, J. A. (2014). Predicting voluntary turnover from job applicant biodata and other applicant information. International Journal of Selection and Assessment, 22(3), 321–332. doi: 10.1111/ijsa.12080
Brown, S. D., & Lent, R. W. (2013). Career development and counseling. Hoboken, NJ: John Wiley and Sons.
Campbell, D. P., Hyne, S. A., & Nilsen, D. L. (1992). Manual for the Campbell Interest and Skills Survey. Minneapolis, MN: National Computer Systems.
Chernyshenko, O. S., Stark, S., & Drasgow, F. (2011). Individual differences: Their measurement and validity. In S. Zedeck (Ed.), APA handbook of industrial and organizational psychology (Vol. 2: Selecting and developing members for the organization; pp. 117–151). Washington, DC: American Psychological Association.
Donnay, D. A. C., Thompson, R. C., Morris, M. L., & Schaubhut, N. A. (2012). Technical brief for the newly revised Strong Interest Inventory assessment: Content, reliability, and validity. Retrieved from https://www.cpp.com/Pdfs/StrongTechnicalBrief.pdf
Dozier, V. C., Sampson, J. P., Lenz, J. G., Peterson, G. W., & Reardon, R. C. (2014). The impact of the Self-Directed Search Form R Internet Version on counselor-free career exploration. Journal of Career Assessment. doi: 10.1177/1069072714535020
Frederiksen, N., Saunders, D. R., & Wand, B. (1957). The in-basket test. Psychological Monographs, 71(9), 1–28.
Friedman, T., & Williams, E. B. (1982). Current uses of tests for employment. In A. K. Wigdor & W. R. Garner (Eds.), Ability testing: Uses, consequences, and controversies (Pt. 2). Washington, DC: National Academy Press.
Gatewood, R. D., Feild, H. S., & Barrick, M. J. (2010). Human resource selection. Chicago, IL: Dryden.
Gibson, R. L., & Mitchell, M. H. (2006). Introduction to career counseling for the 21st century. Columbus, OH: Pearson.
Harmon, L. W., Hansen, J. I. C., Borgen, F. H., & Hammer, A. L. (1994). Strong Interest Inventory applications and technical guide. Palo Alto, CA: CPP.
Harrington, T. F., Harrington, J. C., & Wall, J. E. (2012). Ability Explorer. Indianapolis, IN: JIST Works.
Harrington, T., & Long, J. (2013). The history of interest inventories and career assessments in career counseling. Career Development Quarterly, 61(1), 83–92. doi: 10.1002/j.2161-0045.2013.00039.x
Hebl, M. R., Madera, J. M., & Martinez, L. R. (2014). Personnel selection. In F. T. L. Leong, L. Comas-Díaz, G. C. Nagayama Hall, V. C. McLoyd, & J. E. Trimble (Eds.), APA handbook of multicultural psychology (Vol. 2: Applications and training; pp. 253–264). Washington, DC: American Psychological Association.
Holland, J. L. (1994). Self-Directed Search. Odessa, FL: Psychological Assessment Resources.
Holland, J. L. (1996). Exploring careers with a typology. American Psychologist, 51, 397–406.
Holland, J. L., Reardon, R. C., Latshaw, R. J., Rarick, S. R., Schneider, S., Shortridge, M. A., & St. James, S. A. (2001). Self-Directed Search Form R Internet Version 2.0. [Online]. Retrieved from http://www.self-directed-search.com
Levashina, J., Hartwell, C. J., Morgeson, F. P., & Campion, M. A. (2014). The structured employment interview: Narrative and quantitative review of the research literature. Personnel Psychology, 67(1), 241–293.
Manroop, L., Boekhorst, J. A., & Harrison, J. A. (2013). The influence of cross-cultural differences on job interview selection decisions. The International Journal of Human Resource Management, 24(18), 3512–3533.
McMahon, M., & Watson, M. (2012). Telling stories of career assessment. Journal of Career Assessment, 20(4), 440–451.
Muchinsky, P. M. (2008). Psychology applied to work: An introduction to industrial and organizational psychology (9th ed.). Kansas City, KS: Hypergraphic Press.
Osborn, D. S., Dikel, M. F., & Sampson, J. P. (2011). The Internet: A tool for career planning. Broken Arrow, OK: National Career Development Association.
Rokeach, M. (1973). The nature of human values. New York, NY: Free Press.
Seijts, G. H., & Kyei-Poku, I. (2010). The role of situational interviews in fostering positive reactions to selection decisions. Applied Psychology: An International Review, 59(3), 431–453.
Sinha, N., & Srivastava, K. B. L. (2014). Examining the relationship between personality and work values across career stages. Psychological Studies, 59(1), 44–51.
Strong, E. R. (1927). Vocational Interest Test. Educational Record, 8, 107–121.
Super, D. E. (1970). Manual, Work Values Inventory. Chicago, IL: Riverside Publishing.
Super, D. E. (1990). A life-span, life-space approach to career development. In D. Brown, L. Brooks, & Associates (Eds.), Career choice and development (2nd ed., pp. 196–261). San Francisco, CA: Jossey-Bass.
Super, D. E., Zelkowitz, R. S., & Thompson, A. S. (1981). Career Development Inventory: Adult Form I. New York, NY: Teachers College, Columbia University.
Thornton, G. C., & Byham, W. C. (1982). Assessment centers and managerial performance. New York, NY: Academic Press.
Thornton, G. C., & Rupp, D. E. (2006). Assessment centers in human resource management. Mahwah, NJ: Erlbaum.
U.S. Department of Labor National Center for O*NET Development. (2007). O*NET Online Summary Report for: 21-1012.00 - Educational, Vocational, and School Counselors. Retrieved from http://online.onetcenter.org/link/summary/21-1012.00
U.S. Office of Personnel Management. (2009). Assessment decision guide. Retrieved from https://apps.opm.gov/ADT/Content.aspx?page=TOC
CHAPTER 12
Personality Assessment
Personality includes all the special qualities people have that make them different from
each other (e.g., charm, energy, disposition, attitude, temperament, cleverness) and the
feelings and behaviors they exhibit. Personality assessments assist counselors in under-
standing the behavior of a particular individual, with the aim of making some decision
about a future course of action or making a prediction about the person’s unique future
behavior. They give us a picture of the individual’s enduring personality traits and
transient personality states. Personality inventories are used in employment settings to
aid in personnel selection, in forensic settings to aid in identification of problematic
behavioral patterns, and in clinical settings both to identify personal problems and
diagnose psychopathologies and to evaluate growth or change in behavior after counseling. In
recent years, personality assessment has become a popular approach for identifying the
potential for relationship success or failure (Solomon & Jackson, 2014). Numerous instru-
ments and techniques are used in personality assessment (such as structured personality
inventories and projective techniques) to evaluate pathological aspects of personality.
There are also instruments that focus primarily on assessing positive personality traits.
Defining Personality
The term personality has many meanings. Over the years, scientists and researchers have
attempted to find a comprehensive definition of personality—their efforts resulting in
dozens of diverse definitions appearing in the literature. For example, Gordon Allport
272 Chapter 12 • Personality Assessment
States In reference to personality, the word state refers to the transitory exhibition of some
trait (Kalimeri, 2013). Whereas trait refers to an enduring personality characteristic, state
usually refers to a temporary behavioral tendency. For example, a person may be described
as being in an anxious state after experiencing a traumatic event, but may not be
considered an anxious person (trait) in general. Although few personality inventories
distinguish traits from states, Spielberger (1970, 1983) designed the State-Trait Anxiety
Inventory to identify the anxiety an individual feels at a particular moment (state)
versus the way he or she feels generally (trait).
Personality Inventories
Personality inventories are commonly used to identify and measure the structure and fea-
tures of one’s personality or one’s characteristic way of thinking, feeling, and behaving
(Weiner & Greene, 2011). Some personality inventories measure specific traits (such as
introversion) or states (such as anxiety), whereas others evaluate broad dimensions of
personality that encompass a wide range of characteristics and attributes. In addition to
measurement and evaluation, personality inventories can be used for other purposes, such
as increasing self-knowledge and self-understanding (Urbina, 2014). Most personality
inventories are self-report, meaning that the test taker supplies information about him- or
herself; other instruments can be designed to elicit information from individuals other
than the person being evaluated (e.g., parent, spouse, teacher). The following are examples
of how professionals from various disciplines may use personality inventories:
• A career counselor administers a personality inventory in order to help a person
choose a career.
• A clinical mental health counselor administers a personality inventory to narrow
down a varied set of symptoms and determine the diagnosis of a particular psycho-
logical disorder.
• A school counselor looks at a high school student’s personality inventory results to
determine if any difficulties are related to the student’s academic problems.
• An employment counselor uses a personality inventory to identify a client’s person-
ality attributes (e.g., conscientiousness, emotional stability, agreeableness) that con-
tribute to effective work performance.
• A neuropsychologist administers a personality inventory to determine the extent to
which an individual’s brain injury has affected cognition and behavior.
Rational Approach One of the oldest methods of personality test construction is the
rational approach, which involves the use of reason and deductive logic to construct test
items. In other words, items are developed based on the degree to which they are logically
assumed to be related to a given construct. An example is the Woodworth Personal Data Sheet
(Woodworth, 1920), considered to be the first structured (i.e., objective) personality inven-
tory. The 116-item self-report inventory was assembled in response to needs for psychiatric
screening during the U.S. entry into World War I. Test items consisted of statements that
Woodworth believed were indicators of psychological maladjustment. Items were simply
listed on paper (e.g., Are you happy most of the time?), and the respondent answered yes or no.
Instrument development using the rational approach means that the measure is entirely
dependent upon the test author’s assumptions about the construct (i.e., personality) being
measured, and these assumptions may or may not have been well-founded.
example, instruments that use projective techniques are based on the psychodynamic theory of
personality, which emphasizes the importance of the unconscious (i.e., hidden emotions,
internal conflicts). Thus, projective instruments are thought to reveal examinees’ uncon-
scious by allowing them to respond to some type of unstructured stimuli (e.g., an inkblot, a
picture, an incomplete sentence), resulting in the examinee projecting or expressing uncon-
scious fears, conflicts, or needs.
Similarly, the MBTI (Myers et al., 1998) was constructed based on Carl Jung’s (1921)
theory of psychological type. Jung identified two dimensions that combined to create
personality types: attitudes (extraversion and introversion) and functions (sensation,
intuition, thinking, and feeling). Based on Jung’s original concepts, the MBTI sorts an
individual’s preferences into four separate categories: (a) extraversion or introversion, (b)
sensing or intuition, (c) thinking or feeling, and (d) judging or perceiving. Combinations
of these four preferences result in 16 separate personality types that are used to describe
the examinee.
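The count of 16 types follows from simple arithmetic: four preference pairs with two options each give 2 × 2 × 2 × 2 = 16 combinations. The short sketch below enumerates them using the conventional single-letter codes (E/I, S/N, T/F, J/P). It illustrates only the combinatorics and is not part of the MBTI scoring procedure.

```python
from itertools import product

# The four preference pairs, written with their conventional letter codes.
PREFERENCE_PAIRS = [
    ("E", "I"),  # extraversion or introversion
    ("S", "N"),  # sensing or intuition
    ("T", "F"),  # thinking or feeling
    ("J", "P"),  # judging or perceiving
]

# One letter from each pair, in order, yields a four-letter type code.
type_codes = ["".join(letters) for letters in product(*PREFERENCE_PAIRS)]

print(len(type_codes))
print(type_codes)
```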
Criterion Group Approach The criterion group approach is an empirical method of per-
sonality test construction that involves selecting items that can discriminate between rel-
evant criterion groups and control groups. The approach begins with a population sample
with known personality characteristics, such as a group of individuals diagnosed with
schizophrenia (i.e., the criterion group). An instrument based upon this population sample
is administered to both the schizophrenic group and a control group (usually a “normal”
population) to identify the items that distinguish the two groups. The items that distin-
guish the schizophrenic group from the control group are then placed in a separate scale.
The scale is then cross-validated to see how well it distinguishes the schizophrenic group
from other groups. Using the criterion group method, the emphasis is on the discriminat-
ing power of the instrument; what is important is the fact that the test discriminates, not
the reason it does so. This method of instrument construction is exemplified by the
creation of the Minnesota Multiphasic Personality Inventory (MMPI-2) and MMPI-2-RF,
which were developed based on the ability of test items to detect symptoms of
psychopathology in adults.
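The item-selection logic of the criterion group approach can be sketched in a few lines. Everything in the example is hypothetical: the true/false responses are invented, and the selection rule (keep an item when the endorsement rates of the two groups differ by at least 0.40) is an assumption chosen for illustration, not the rule used to build any published inventory. Cross-validation on new samples, which the approach requires, is not shown.

```python
# Hypothetical true/false responses; each inner list is one respondent's
# answers to the same four items.
criterion_group = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
    [True, True, False, False],
]
control_group = [
    [False, True, False, False],
    [True, True, False, False],
    [False, False, True, False],
    [False, True, False, True],
]


def endorsement_rate(group, item):
    """Proportion of the group answering True to the given item index."""
    return sum(respondent[item] for respondent in group) / len(group)


def select_items(criterion, control, min_difference=0.40):
    """Keep items whose endorsement rates differ enough between the groups."""
    n_items = len(criterion[0])
    return [
        item
        for item in range(n_items)
        if abs(endorsement_rate(criterion, item) - endorsement_rate(control, item))
        >= min_difference
    ]


def scale_score(responses, scale_items):
    """Count how many of the selected items a respondent endorsed."""
    return sum(responses[item] for item in scale_items)


scale_items = select_items(criterion_group, control_group)
print(scale_items)
print(scale_score([True, False, False, True], scale_items))
```

Note that the rule never asks why an item separates the groups; an item is kept purely because it does, which is the defining feature of the method.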
Factor Analysis Factor analysis is another empirical approach that uses statistical proce-
dures to (a) analyze interrelationships among a large number of variables (e.g., personality
traits) and (b) explain these variables in terms of their common underlying dimensions
(factors). For constructing personality inventories, factor analysis helps in condensing
numerous personality traits into fewer dimensions of personality that can then be meas-
ured and evaluated. For example, Cattell, Eber, and Tatsuoka (1970) developed the Sixteen
Personality Factor, Fifth Edition (16PF) by reducing 18,000 personality descriptors (origi-
nally identified by Allport) into 181 personality clusters, from which 12 factors were gener-
ated using factor analysis. These factors laid the foundation for the first 16PF published in
1949. More than half a century and several revisions later, the 16PF continues to be a widely
used instrument for measuring normal adult personality. Another example is the NEO
Personality Inventory (NEO PI-R), a 240-item inventory that is based on five dimensions of
personality (known as the Five Factor Model): neuroticism, extraversion, openness,
agreeableness, and conscientiousness. These broad factor-analytically derived categories have
been shown to represent the underlying structure of Allport’s 18,000 personality adjectives
(Costa & McCrae, 2013) and have received much support from theorists and researchers
(Detrick & Chibnall, 2013; Furnham, Guenole, Levine, & Chamorro-Premuzic, 2013;
Gorostiaga, Balluerka, Alonso-Arbiol, & Haranburu, 2011; Källmen, Wennberg, & Bergman,
2011; Van den Broeck, Rossi, Dierckx, & De Clercq, 2012; Vassend & Skrondal, 2011).
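To give a feel for how many traits can be condensed into fewer dimensions, the sketch below works with a small, invented correlation matrix for four trait ratings. It computes the eigenvalues of the matrix with the Jacobi rotation method and counts how many exceed 1, a common rule of thumb (often called the Kaiser criterion) for deciding how many dimensions to retain. This is a simplified principal-components-style first step under stated assumptions, not a full factor analysis and not the procedure used to develop the 16PF or the NEO PI-R.

```python
import math

# Hypothetical correlations among four trait ratings: the first two traits
# correlate strongly with each other, as do the last two.
TRAITS = ["talkative", "outgoing", "anxious", "tense"]
CORRELATIONS = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


eigenvalues = symmetric_eigenvalues(CORRELATIONS)
retained = sum(1 for value in eigenvalues if value > 1.0)
print([round(value, 3) for value in eigenvalues])
print(retained)
```

With these invented correlations, two eigenvalues exceed 1, which matches the two clusters built into the matrix (a sociability-like pair and an anxiety-like pair). Real analyses involve many more variables, rotation of the factors, and judgment about interpretability.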
The Minnesota Multiphasic Personality Inventory The MMPI-2 (Butcher et al., 2001),
one of the most widely used personality inventories, is a comprehensive, structured instru-
ment that assesses major symptoms of adult psychopathology. It is often used by clinicians
to assist with the diagnosis of mental disorders and the selection of appropriate treatment
methods. The original MMPI was developed in the late 1930s and early 1940s by psycholo-
gist Starke R. Hathaway and psychiatrist and neurologist J. C. McKinley at the University
of Minnesota. Because of concerns about the adequacy of the original standardization
sample, the MMPI-2, released in 1989, implemented norms that were much more
representative of the U.S. population. The MMPI-2 can be administered to individuals aged 18
years and older who can read at a minimum sixth-grade level. It has 567 true/false ques-
tions; takes 60 to 90 minutes to complete; can be administered by hand, audiocassette, and
computer; and is available in English, Spanish, Hmong, and French (Canada). Use of the
MMPI-2 is restricted to qualified professionals who are licensed or credentialed and have
adequate training in assessment, personality theory, psychopathology, and diagnosis.
The MMPI-2 contains 10 clinical scales that assess dimensions of personality and psy-
chopathology. The clinical scales were developed using an empirical approach: items were
selected for each scale on the basis of their ability to discriminate between clinical and
normal groups. The MMPI-2 also has nine validity scales used to detect response styles, that is,
a test taker’s pattern of responding to items. A test taker may respond to items in a variety
of ways that can compromise the validity of the test (Butcher et al., 2001). For example, an
examinee may leave a large number of items unanswered, distort his or her self-descriptions
by underreporting or overreporting symptoms, choose answers randomly, or pick all true
or all false answers. The MMPI-2 validity scales were designed to help detect sources of
test invalidity and to evaluate the impact of such distortions on the test results. Table 12.1
provides a description of the clinical and validity scales.
Since the original MMPI was published, a number of additional scales and subscales
(e.g., Content Scales, Supplementary Scales) have been developed that measure numerous
pathological or nonpathological aspects of personality. In addition, the MMPI-2-RF was
published in 2008 as a shorter form of the MMPI. Although the MMPI-2-RF is a more
recent form, the MMPI-2 continues to be the more widely used version because of the sub-
stantial research base to support it.
Scores for the MMPI-2 are reported as T scores. There are several steps involved in
interpreting scores. First, clinicians examine the validity scale scores to determine whether
the profile contains valid, useful, and relevant information about the client’s personality
and clinical problems. If the test taker scores at or above particular T score values on valid-
ity scales, then the results of the clinical scales may be invalid. For example, a T score of 80
or higher on the Lie (L) validity scale may indicate that the individual answered questions
dishonestly, thus invalidating the test. Only after evaluating the validity scales and determining
that the results are valid does the clinician proceed to examine the clinical scale
scores. In general, T scores above 65 on the clinical scales are elevated and considered
within the “clinical” range, which refers to the range of scores obtained by the sample of
individuals experiencing the particular symptom measured by the scale. Inferences should
not be based solely on a high score on one clinical scale; rather, inferences should be made
on the basis of all elevated scale scores as well as the individual’s background and present-
ing problem. Figure 12.1 provides a sample profile of the MMPI-2.
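The order of interpretation described above (validity scales first, clinical scales second) can be sketched in a few lines of Python. This is only an illustration of the logic, not a scoring program: the function name and the sample T scores are invented, and only the two cutoffs mentioned in this section are used (a T score of 80 or higher on the L scale, and T scores above 65 on the clinical scales). A real interpretation draws on all of the validity scales, each with its own guidelines, and on the client's background.

```python
from typing import Dict, List, Tuple

# Cutoffs quoted in the text; every other validity scale has its own
# guidelines, which this sketch does not attempt to model.
L_SCALE_CUTOFF = 80
CLINICAL_CUTOFF = 65


def review_profile(validity_t: Dict[str, int],
                   clinical_t: Dict[str, int]) -> Tuple[bool, List[str]]:
    """Return (profile_looks_valid, elevated_clinical_scales).

    Hypothetical helper: the validity check comes first, and the clinical
    scales are examined only if the profile passes it.
    """
    if validity_t.get("L", 0) >= L_SCALE_CUTOFF:
        return False, []
    elevated = sorted(scale for scale, t in clinical_t.items()
                      if t > CLINICAL_CUTOFF)
    return True, elevated


# Invented T scores, for illustration only.
valid, elevated = review_profile({"L": 55}, {"D": 72, "Hs": 60, "Pt": 66})
print(valid, elevated)
```

Even when several scales are flagged as elevated, the list is only a starting point for hypotheses; it is not a diagnosis.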
The MMPI-2-RF is a shorter and newer version of the MMPI. The RF contains 338
true/false items, and the results can be used for a wide range of purposes—much like the
longer form of the MMPI-2. The MMPI-2-RF has 51 empirically validated scores and a num-
ber of validity indicators. Many of the same scales from the long form are included in the
MMPI-2-RF. The shorter form is based on the restructured clinical scales, which have been
reported in the long form as well. The restructured clinical scales include the following:
RCd-(dem)—Demoralization
RC1-(som)—Somatic Complaints
RC2-(lpe)—Low Positive Emotions
RC3-(cyn)—Cynicism
RC4-(asb)—Antisocial Behavior
RC6-(per)—Ideas of Persecution
RC7-(dne)—Dysfunctional Negative Emotions
RC8-(abx)—Aberrant Experiences
RC9-(hpm)—Hypomanic Activation
Like the long form of the MMPI-2, interpretation of the scales requires specific training.
The MMPI-2-RF is restricted to use by professionals who have a doctoral degree in a help-
ing profession, licensure in an area of professional practice, or certification by a profes-
sional organization. Although licensed counselors are qualified users, we caution you to
review state licensure laws to ensure state-specific privileges.
Exercise 12.1
Assessment Using the MMPI-2
Nicholas, age 35, was referred for evaluation by his psychiatrist. Nicholas was
diagnosed with bipolar disorder (also known as manic-depressive illness), which is a
brain disorder that causes dramatic mood swings, from overly “high” and/or irritable
to sad and hopeless and then back again, often with periods of normal mood in between.
The psychiatrist was treating Nicholas with medications commonly prescribed in the
treatment of bipolar disorder. Nicholas informed his psychiatrist that he is feeling
much better and wants to stop taking the medications and terminate treatment. The
psychiatrist wants to know if Nicholas is doing as well as he claims he is doing.
Review Nicholas’ MMPI-2 profile report and answer the questions that follow.
MMPI-2 Profile
Validity Scales Description
Cannot Say The Cannot Say score was 4, indicating that Nicholas
omitted four items on the inventory. This does not
invalidate test results.
Variable Response Inconsistency—VRIN The T score on the VRIN scale was 46, which
indicates that Nicholas’ responses were not random
or inconsistent.
True Response Inconsistency—TRIN
Infrequency—F
Back Infrequency—F(B)
Infrequency-Psychopathology—F(P)
Lie—L
Correction—K
Superlative Presentation—S
the following categories: Modifying Indices, Clinical Personality Patterns, Severe Person-
ality Pathology, Clinical Syndromes, and Severe Syndromes (see Table 12.2). The scales,
along with the items that comprise the scales, are closely aligned to both Theodore Millon’s
(1969) theory of personality and the classification system of the DSM-5 (American Psychi-
atric Association (APA), 2013). The MCMI-III is often considered an alternative to the
MMPI-2, because both instruments cover a broad range of adult psychopathology.
However, the MMPI-2 focuses primarily on clinical symptoms, whereas the MCMI was
specifically designed to assist in diagnosing personality disorders. An important advan-
tage of the MCMI-III is that it is considerably shorter than the MMPI-2 (175 items vs. 567
items) and yet provides a wide range of information.
The California Psychological Inventory First published in 1957, the CPI is a self-
administered personality inventory composed of 434 true/false statements that assess per-
sonality dimensions in “normal” people between the ages of 12 and 70 (Gough, 2000).
Except for the MMPI-2, the CPI is the most thoroughly researched and frequently used
personality inventory. Test items focus on typical behavior patterns, feelings and opinions,
and attitudes relating to social, ethical, and family matters (e.g., “I enjoy social gatherings
just to be with people”). The results are plotted on 20 scales (representing personality
traits) that are clustered into four general domains (see Table 12.3). The inventory was
normed on a standardization sample of 6,000 men and 7,000 women of varying age and
socioeconomic status and included high school and college students, teachers, business
executives, prison executives, psychiatric patients, and prison inmates.
TABLE 12.3 General Domains and Scales of the California Psychological Inventory
General Domains and Scales
I. Observable, interpersonal style, and orientation
   1. Dominance (leadership ability)
   2. Capacity for status (ambitious vs. unsure)
   3. Sociability (outgoing vs. shy)
   4. Social presence (self-assured vs. reserved)
   5. Self-acceptance (positive self vs. self-doubting)
   6. Independence (self-sufficient vs. seeks support)
   7. Empathy (empathetic vs. unempathetic)
II. Normative orientation and values
   8. Responsibility (responsible vs. undisciplined)
   9. Socialization (conforms vs. rebellious)
   10. Self-control (overcontrol vs. undercontrol)
   11. Good impression (pleases others vs. complains about others)
   12. Communality (fits in vs. sees self differently)
   13. Well-being (optimistic vs. pessimistic)
   14. Tolerance (fair minded vs. fault finding)
III. Cognitive and intellectual functioning
   15. Achievement via conformance (efficient and well organized vs. distracted)
   16. Achievement via independence (clear thinking vs. uninterested)
   17. Intellectual efficiency (keeps on task vs. hard time getting started)
IV. Measures of role and personal style
   18. Psychological mindedness (insightful and perceptive vs. apathetic and unmotivated)
   19. Flexibility (likes change vs. not changeable)
   20. Femininity/masculinity (sensitive vs. unsentimental)
Developed by Katharine Cook Briggs and Isabel Briggs Myers, the MBTI was designed for measuring an
individual’s preference according to the typological theories of Carl Jung (Myers & Myers,
1995). The instrument sorts an individual’s preferences into four separate categories, each
composed of two opposite poles:
• Extraversion (E) or Introversion (I) Where an individual prefers to focus his or her
energy. People who prefer extraversion direct their energy to the outside world of
people, things, and situations. Those who prefer introversion focus their energy on the
inner world of ideas, emotion, information, or beliefs.
• Sensing (S) or Intuition (N) How an individual prefers to process or acquire information.
The sensing type prefers to deal with facts, be objective, or use the five senses to
notice what is real. The intuitive type prefers to look beyond the five senses to acquire
information, to generate new possibilities and ways of doing things, or to anticipate
what isn’t obvious.
• Thinking (T) or Feeling (F) How an individual prefers to make decisions. Thinking
indicates a preference to make decisions based on logic and objective analysis of the
evidence. Feeling types prefer to make decisions based on their values and subjective
evaluation.
• Judging (J) or Perceiving (P) How an individual prefers to organize his or her life.
The term judging does not mean “judgmental.” It simply indicates a preference for
living in a planned, stable, and organized way. Perceiving indicates a preference for
more spontaneity and flexibility, responding to things as they arise.
These four categories are combined to form 16 possible personality types (or combinations
of preferences) that best describe the examinee. Each type tends to be characterized by its
own interests, values, and unique gifts. Types are often referred to by an abbreviation of
four letters, indicating the four type preferences—for example:
ESTJ: Extraversion, Sensing, Thinking, Judging
INFP: Introversion, Intuition, Feeling, Perceiving
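Because each of the four categories has two poles, the number of possible types is 2 × 2 × 2 × 2 = 16. The short Python sketch below simply enumerates those combinations. It is an illustration of the counting only and has nothing to do with how the MBTI is actually scored; the variable names are our own, and the letter N for Intuition follows the type abbreviations shown above (for example, INFP).

```python
from itertools import product

# One (first pole, second pole) pair per category, in the order used in
# four-letter type codes.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

# Choose one letter from each pair to build every possible type code.
ALL_TYPES = ["".join(letters) for letters in product(*DICHOTOMIES)]

print(len(ALL_TYPES))
print(ALL_TYPES)
```

Both of the examples given above, ESTJ and INFP, appear in the resulting list.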
The results of the MBTI provide a detailed description of an individual’s type. For exam-
ple, an individual reported as an ENFP (Extraversion, Intuition, Feeling, and Perceiving)
would be described as the following (Myers & Myers, 2004, p. 2):
• Curious, creative, and imaginative
• Energetic, enthusiastic, and spontaneous
• Keenly perceptive of people and of the outside world
• Appreciative of affirmation from others; readily expresses appreciation and gives
support to others
• Likely to value harmony and goodwill
• Likely to make decisions based on personal values and empathy with others
• Usually seen by others as personable, perceptive, persuasive, and versatile
The Sixteen Personality Factor Questionnaire, Fifth Edition The 16PF (Cattell et al.,
1970) is a comprehensive measure of adult personality traits. It has been widely used for a
variety of applications, including treatment planning, couples’ counseling, vocational
guidance, and hiring and promotion recommendations. The questionnaire consists of 187
items and takes 35 to 50 minutes to complete. There are 16 personality factors, which are
also grouped into five global factors. The 16 personality factors and five global factors are
derived from client responses. The 16 personality factors are Warmth, Reasoning, Emo-
tional Stability, Dominance, Liveliness, Rule-Consciousness, Social Boldness, Sensitivity,
Vigilance, Abstractedness, Privateness, Apprehension, Openness to Change, Self-Reliance,
Perfectionism, and Tension. The five global factors are Extraversion, Anxiety, Tough-
Mindedness, Independence, and Self-Control. Raw scores on all personality and global
factors are converted to sten scores: scores from 1 to 3 are classified as low, scores from 4 to
7 are average, and scores from 8 to 10 are classified as high. High and low scores designate
opposite poles of the same personality factor. For example, an individual’s score on the
Warmth factor may fall from reserved (low) to warm (high). Figure 12.2 shows a score
profile for the 16PF.
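The sten ranges described above translate directly into a simple classification rule. The following Python sketch is a minimal illustration, assuming whole-number sten scores from 1 to 10; the function name is hypothetical and is not part of any 16PF scoring software.

```python
def classify_sten(sten: int) -> str:
    """Classify a sten score using the ranges given in the text:
    1 to 3 is low, 4 to 7 is average, and 8 to 10 is high."""
    if not 1 <= sten <= 10:
        raise ValueError("sten scores range from 1 to 10")
    if sten <= 3:
        return "low"
    if sten <= 7:
        return "average"
    return "high"


# Invented scores, for illustration only.
for score in (2, 5, 9):
    print(score, classify_sten(score))
```

On the Warmth factor, for instance, a score classified as low would fall toward the reserved pole and a score classified as high toward the warm pole.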
FIGURE 12.2 Score profile for the 16PF (panels: Personality Factors; Global Factors).
Source: 16PF Fifth Edition. Copyright © 2002 by Institute for Personality & Ability Testing, Inc. (employer for
hire). Reproduced with permission of the publisher NCS Pearson, Inc. All rights reserved. “16PF” is a
trademark, in the U.S. and/or other countries, of Pearson Education, Inc. or its affiliate(s).
the person interprets instrument stimuli in a way that reveals or projects elements of his
or her personal characteristics (e.g., concerns, needs, conflicts, desires, feelings). Although
projective techniques are reviewed in the majority of counselor-preparation assessment
courses, the time spent on review does not typically prepare counselors to use these
instruments in practice (Neukrug, Peterson, Bonner, & Lomas, 2013). Counselors inter-
ested in projective techniques require additional coursework or training to be competent
in this area. Of course, before using these techniques or any assessment techniques, you
should consult your state laws to ensure that you have appropriate privileges under your
certification or licensure.
Although counselor licensure laws vary with regard to the use of projective assess-
ment, it is still important to understand the principles related to this method. Various
methods of interpretation are used with projective instruments to make inferences about
the individual’s underlying personality processes and social-emotional functioning (Smith,
2011). Projective instruments are strongly associated with the psychodynamic theory of
personality, which emphasizes the importance of the unconscious (i.e., hidden emotions,
internal conflicts). Thus, by responding to some type of unstructured stimuli (e.g., inkblot,
picture, incomplete sentence), individuals are thought to reveal their unconscious fears,
conflicts, or needs. Some well-known projective instruments and techniques are the
Rorschach Inkblot Test, the Thematic Apperception Test (TAT), verbal projective tech-
niques, and projective drawings.
The Rorschach Inkblot Test The Rorschach Inkblot Test, developed by Hermann Rorschach
(1921), is a test that measures an individual’s view or perception of his or her world or environment.
It consists of a series of irregular but symmetrical inkblots (see Figure 12.3); subjects
are asked to look at the inkblots and verbally describe what they see. By analyzing what
someone sees on the inkblot, examiners can make various hypotheses about the individual’s
emotions, cognition, coping skills, perception of others and relationships, and self-perception.
The test consists of 10 bilaterally symmetrical inkblots printed on separate cards. Five cards
are black and white; two are black, white, and red; and three are multicolored.
The test has two phases. In the first phase, free association, the examiner presents the
inkblots to the test taker one at a time and instructs him or her to tell what is on each card
by asking questions such as “What might this be?” The examiner records all relevant infor-
mation, including the test taker’s verbatim responses, nonverbal gestures, length of time
before the first response to each card, and so on. The second phase, referred to as the
inquiry phase, involves the examiner attempting to determine what features on the inkblot
played a role in the test taker’s perception of the image. Questions such as “What makes it
look like (whatever)?” and “Where on the inkblot did you see (whatever the examinee
saw)?” are asked to gain information useful for interpreting the responses.
Examiners must be thoroughly trained in the use of the Rorschach. Clinicians can
make numerous errors during administration, scoring, and interpretation. However, with
supervised training the number of errors can be significantly reduced (Callahan, 2014).
The original test came with no test manual and no administration, scoring, or interpreta-
tion instructions. Over the years, a number of manuals and handbooks by various authors
have become available; the most widely used is a “comprehensive system” devised by
Exner and Exner (2013).
Thematic Apperception Test The TAT (Murray, 1943) originally was based on Henry
Murray’s (1938) theory of personality. The TAT includes a series of black-and-white pic-
ture cards that contain a variety of characters, situations, and objects (see Figure 12.4 for an
example of typical administration of a personality test). Examinees are asked to look at
and make up stories about each picture. After each TAT card story, an inquiry process
begins, and the examiner asks a series of questions to better understand the examinee’s
story, such as what led up to the event shown, what is happening at the moment, what the
characters are thinking and feeling, and what the outcome of the story was. Questions may
involve further clarification of the characters’ thoughts and feelings and how the story was
generated. There are numerous approaches to interpreting the TAT. Some interpretations
involve examiners identifying a hero or main character within the story, the belief being
that the examinee will identify with the character and project his or her unconscious
motives, feelings, and needs onto the character (Aronow, Weiss, & Reznikoff, 2013).
Furthermore, common themes among the stories should be identified, and how the client
relates the story should be noted.
TABLE 12.5 Sample Sentence Stems from the Rotter Incomplete Sentences Blank
1. I wish … 6. Sometimes …
2. I secretly … 7. The happiest time …
3. I feel … 8. The only trouble …
4. I regret … 9. What annoys …
5. I can’t … 10. Other kids …
Source: Rotter Incomplete Sentences Blank, Second Edition. Copyright © 1950, renewed 1977 by NCS Pearson, Inc.
Reproduced with permission. All rights reserved. “Rotter Incomplete Sentences Blank, Second Edition” and
“RISB” are trademarks, in the U.S. and/or other countries, of Pearson Education, Inc. or its affiliate(s).
responses may provide insight into his or her self-image, developmental characteristics,
interpersonal reactions, needs, and perceived threats. Although published sentence-
completion instruments yield an overall score, most examiners simply read over the com-
pleted sentences and form impressions about what examinees’ responses might signify
about their personality characteristics. The Rotter Incomplete Sentences Blank, Second
Edition (RISB) is probably the best-known sentence-completion instrument. Table 12.5
provides 10 sample sentence stems from the RISB.
Similar to sentence completion is story completion. There are different versions of the
story completion technique (Koppitz, 1982; Thomas, 1937), but in general, it involves
developing stories about a hypothetical child of the same age and gender as the child being
assessed. Koppitz (1982) gives an example of a story: “A boy is at the table with his parents.
Father suddenly gets angry. Why?” (p. 278). The stories are designed to investigate a
child’s dreams, fantasies, attitudes, and defense mechanisms, and they can be analyzed for
major themes.
Projective Drawings Projective drawings are perhaps the oldest category of projective
assessment procedures and are often used with children and adolescents (Weiner &
Greene, 2011). As with other projective approaches, drawings are thought to contain non-
verbal clues and symbolic messages about a child’s self-concept, motivations, concerns,
attitudes, and desires (Cummings, 1986). One of the most common drawing techniques is
the Draw-a-Person test (DAP; Machover, 1949; see Figure 12.5). In this technique, a child is
simply given a paper and a pencil and asked to draw a picture of a whole person. The
drawing must be created in the presence of the counselor. After the child completes his or
her drawing, an inquiry process ensues in which counselors ask questions about the draw-
ing, such as “Tell me a story about this person,” “What is the person doing?” or “How is
the person feeling?” Answers to the questions as well as the drawings themselves are used
to form various hypotheses and interpretations about the child’s personality and function-
ing. Although many methods are available to score human figure drawings, Koppitz’s
(1968) scoring system is one of the most frequently cited. Koppitz’s scoring system focuses
on 30 emotional indicators in children’s drawings. Emotional indicators are specific details in
drawings (e.g., tiny or large figures, transparent [X-ray] bodies, crossed eyes, showing
teeth, hands cut off, no eyes) that distinguish between normally adjusted children and
emotionally disturbed children. The presence of three or more emotional indicators
may indicate underlying problems or maladjustment in the child. For example, poor
achievement might be reflected in poorly integrated body parts, a monster or grotesque
figure, omission of body, omission of arms, or omission of mouth (Koppitz, 1982, p. 285).
The drawing of a hostile, aggressive child might show a big figure, transparency, crossed
eyes, teeth, long arms, big hands, or genitals. Depressed, withdrawn children characteristi-
cally include tiny figures and short arms but no eyes. Some general guidelines for inter-
preting drawings are included in Table 12.6.
Another drawing technique is the House-Tree-Person Technique (HTP; Buck, 1948).
In this technique, children are asked to draw a house, a tree, and a person on separate
sheets of paper (see Figure 12.6). The picture of the house is supposed to arouse children’s
feelings about their home life and family relationships. Drawing a tree is expected to elicit
feelings about inner strengths or weaknesses. The picture of the person is believed to reveal
children’s view of themselves (i.e., self-concept).
TABLE 12.6 General Guidelines for Interpreting Human Figure Drawings (continued)
Dimension: Interpretation
Detail
   Absent: Psychosomatic, hypertensive conditions or depressive and withdrawing
   Excessive: Obsessive-compulsive tendencies, rigidity and/or anxiety, highly emotional
   Bizarre: Indicative of psychosis
   Transparency/X-ray drawings: Schizophrenic or manic conditions, poor judgment, sexual disturbance indicated by sexual organs
   Outer clothing: Voyeuristic or exhibitionist tendencies
Distortions and omissions
   Distortions: Confused, psychotic or schizophrenic condition
   Omissions: Conflict, denial
Perspective
   From below: Rejection, unhappiness or inferiority, withdrawal tendencies
   From above: Sense of superiority, compensation for underlying feelings of inadequacy
   Distant: Inaccessibility, desire to withdraw
   Close: Interpersonal warmth, psychological accessibility
Shading
   Shaded area: Anxiety
   Complete absence: Character disorder
Color
   Red: Problem or danger, anger or violent reaction, need for warmth and affection
   Orange: Extroversion, emotional responsiveness to outer world, life-or-death struggle, ambivalence
   Yellow: Cheerfulness, intellectualizing tendencies, lack of inhibition, expansiveness, anxiety
   Green: Regulation of affective tendencies, healthy ego, peacefulness, security
   Blue: Quiet, calm, well-controlled, cold, distant, fading away or withdrawing
   Purple: Inner emotional and affective stimulation, internalization of affect, bold exterior, need to control or possess
   Brown: Sensuousness, security, fixations, rigidity, in touch with nature
   Gray: Noninvolvement, repressions, denial, emotional neutralization
   White: Passivity, emptiness, depersonalization, loss of contact with reality
Family drawings are a projective drawing technique that provides a nonthreatening
way to assess a child’s perception of his or her family. For example, the Kinetic Family
Drawing (KFD) asks the child to “Draw a picture of everyone in your family, including
you, doing something” (Burns & Kaufman, 1970, p. 5). The instructions emphasize the
family engaging in some activity—hence the term kinetic. After the image is completed, an
inquiry phase begins in which the child is asked to (a) explain who each figure is (e.g.,
name, age, relationship to the child); (b) describe all the figures, what they are doing in the
picture, how they are feeling, and what they are thinking about; and (c) tell a story, includ-
ing what happened immediately before the actions in the drawing and what happens next.
Family drawings can be analyzed and interpreted in terms of actions, styles, and symbols
(Burns & Kaufman, 1970, 1972), in addition to figures’ positions, distance in relation to one
another, and barriers between figures (Knoff & Prout, 1985).
The Quality of Life Inventory The QOLI (Frisch, 1994) is a brief but comprehensive
measure of a person’s overall satisfaction with life. People’s life satisfaction is based on
how well their needs, goals, and wishes are being met in important areas of life. The inven-
tory consists of 32 items that ask test takers to describe how important certain aspects of
their lives (such as work or health) are and how satisfied they are with them. The QOLI
yields an Overall Quality of Life score (reported as a T score and percentile) that is classi-
fied as very low, low, average, or high. The inventory also yields Weighted Satisfaction Rat-
ings for 16 areas of life (such as health, self-esteem, love, work, etc.). The satisfaction ratings
are summarized on a profile report with a ratings range from –6 (extreme dissatisfaction)
to 6 (extreme satisfaction). Negative weighted satisfaction ratings denote an area of life in
which the individual may benefit from treatment; ratings of –6 and –4 are considered of
greatest concern and urgency.
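The arithmetic behind a weighted satisfaction rating can be illustrated with a short Python sketch. Note the assumption: the text above gives only the final range of –6 to 6, so the sketch assumes that each rating is the product of an importance rating from 0 to 2 and a satisfaction rating from –3 to 3. The function names and the sample ratings are invented for illustration, and the test manual should be consulted for the actual scoring procedure.

```python
def weighted_satisfaction(importance: int, satisfaction: int) -> int:
    """Multiply importance by satisfaction.

    ASSUMPTION (not stated in the text): importance runs from 0 to 2 and
    satisfaction from -3 to 3, which yields products from -6 to 6.
    """
    if importance not in (0, 1, 2):
        raise ValueError("importance is assumed to be 0, 1, or 2")
    if not -3 <= satisfaction <= 3:
        raise ValueError("satisfaction is assumed to run from -3 to 3")
    return importance * satisfaction


def flag_area(rating: int) -> str:
    """Apply the interpretive rule described in the text."""
    if rating in (-6, -4):
        return "greatest concern and urgency"
    if rating < 0:
        return "may benefit from treatment"
    return "not flagged"


# Invented ratings for two areas of life, for illustration only.
for area, importance, satisfaction in [("work", 2, -3), ("health", 1, 2)]:
    rating = weighted_satisfaction(importance, satisfaction)
    print(area, rating, flag_area(rating))
```

Under this assumption, an area rated as unimportant always receives a weighted rating of 0, however dissatisfied the person is with it.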
Source: The table above represents typical types of questions posed in self-esteem assessment instruments. The
items are not from published instruments.
Response Styles
The concept of the response style (also called response bias) in personality assessment refers
to the ways in which a test taker responds to an item in some characteristic manner that
distorts the test results. Response style is an important aspect of personality assessment.
Because most personality assessment instruments are self-report, there is a possibility that
individuals can either distort their responses to impress or distort their responses to imply
that they are functioning poorly, depending on the purpose of the assessment (Ray et al.,
2013). For example, an individual who is being evaluated to determine his or her fitness as a
parent may have a tendency to provide socially desirable answers to items. In this situation,
the individual may be “faking good” because of defensiveness about his or her behavior
and the desire to reunite with his or her children. Sometimes, the choice is unconsciously
expressed. Other individuals purposely answer questions in a way that results in a pathological
or undesirable profile. In this case, they may have some motivation for appearing
more symptomatic than they really are. For example, imagine that you are a counselor evaluating
an individual for the court system to determine if they are fit to stand trial for murder.
If the individual can claim that their mental health issues impaired their ability to
determine right from wrong, then they may be able to avoid prosecution. In this case,
“faking bad” becomes a real possibility.
Some test takers may be more apt to respond yes or true on items than no or false, and
others may answer questions with random responses. The problem with response bias is
that the test taker is not providing honest, accurate answers to questions, which can ren-
der the results suspect or even invalid. For example, imagine you are conducting an
assessment on an adolescent who is mandated to counseling services. That individual
might be less than willing to fully participate in the assessment process. Instead of giving
full effort in completing an assessment instrument, that person might simply answer yes
to all of the items. Of course, there are more response styles that can emerge than just
answering yes to all items. Common response styles are as follows (Cohen, Swerdlik, &
Sturman, 2012):
• Social Desirability Choosing the response that best presents oneself in a favorable light
• Acquiescent Tendency of the test taker to accept or agree with statements regardless
of the item content (i.e., answering all items as true or yes)
• Nonacquiescence Disagreeing with whatever item is presented (i.e., answering all
items as false or no)
• Deviance Making unusual or uncommon responses
• Extreme Choosing extreme, rather than middle, ratings on a rating scale
• Gambling/Cautiousness Guessing, or not guessing, when in doubt about the answer
All self-report inventories are subject to response biases; therefore, instrument developers
use various means to try to control response bias. For example, the MMPI-2 and MMPI-
2-RF have validity scales that are used to identify test-taker response styles that might
affect or invalidate the test results. The Edwards Personal Preference Schedule (EPPS)
attempts to control the social desirability response style by using forced-choice items
matched on the basis of social desirability. In other words, each item pairs two phrases
that are roughly equal in social desirability, so the test taker cannot simply pick the more desirable one.
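The two simplest response styles in the list above, acquiescence and nonacquiescence, can be illustrated with a short Python sketch that checks whether every item on a true/false inventory received the same answer. This is a deliberately crude screen written for illustration; the function name is hypothetical, and published validity scales rely on far more refined methods than this check.

```python
from typing import Sequence


def screen_response_style(answers: Sequence[bool]) -> str:
    """Flag all-true or all-false answer patterns on a true/false inventory."""
    if not answers:
        raise ValueError("no answers to screen")
    if all(answers):
        return "acquiescent"      # every item answered true/yes
    if not any(answers):
        return "nonacquiescent"   # every item answered false/no
    return "mixed"


# Invented answer patterns, for illustration only.
print(screen_response_style([True] * 20))
print(screen_response_style([True, False, True, True]))
```

A "mixed" result says nothing about whether the answers are honest or accurate; it only rules out the two uniform patterns.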
Summary
Personality inventories are a method of identifying and measuring the structure and features
of one’s personality, or one’s characteristic way of thinking, feeling, and behaving (Cohen et al.,
2012). Personality inventories assist professionals in understanding an individual’s behavior
with the aim of making some decision about a future course of action or to make a prediction
about the person’s unique future behavior. Counselors can choose from a wide array of
personality instruments and techniques, which can be grouped into three categories: structured
personality inventories, projective techniques, and instruments that assess positive aspects of
personality.
Structured personality inventories are standardized, self-report instruments that ask
respondents to describe their personality using a limited set of response options. Inventories
such as the MMPI-2 and MMPI-2-RF, the MCMI-III, and the CPI are examples of structured
personality inventories. Projective instruments and techniques require examinees to
answer questions related to ambiguous stimuli using open-ended responses. Examples include
the Rorschach Inkblot Test, the TAT, verbal projective techniques, and projective drawings.
Some personality inventories, such as self-concept scales, are designed specifically to
measure positive personality traits.
Suggested Activities
1. Operationally define a personality construct and devise a method for measuring the construct.
Develop some preliminary items and give them to your classmates. What was the reaction of the
test takers to the form of the test?
2. Identify a personality test of interest to you. Critically analyze the test. Read the reviews of
the test in the Mental Measurements Yearbook and Test Critiques.
3. Interview workers in the helping professions who use tests and find out which personality
tests they use and why. Report your findings to the class.
4. Review the research on one of the current issues in personality assessment and write a critical
analysis of your findings.
5. Make an annotated bibliography of the personality tests that are appropriate for your current
or future field.
6. Enneagram activity: In this Enneagram exercise, the following table shows the nine Enneagram
types and four groups of adjectives that describe each type.
Enneagram Type | Group 1 | Rank | Group 2 | Rank | Group 3 | Rank | Group 4 | Total Score
a. Rank each adjective within each group in order from 1 (Least Like Me) to 9 (Most Like Me).
No ties are allowed (each rank must be used only once within each group).
b. Reading across each of the nine rows, add the total points for the four groups. Look at your
high scores and low scores. Then turn back to Table 12.4 showing Enneagram types, and look at
the description of the nine types. Row 1 gives a score for type 1 Reformer, row 2 for type 2
Helper, and so on. Identify the Enneagram type that you scored highest in.
c. Discuss your results in small groups. Do you agree that your Enneagram type matches your
personality type? Explain.
7. Review the NEO PI-R scores in the table below and answer the following questions:
a. How would you describe the dimensions of this individual’s personality?
b. How does the absence of validity and lie scales on the NEO PI-R influence your ability to
interpret the results of this instrument?
NEO PI-R Domain | Score Range
Neuroticism | Low
Extraversion | Very High
Openness to Experience | High
Conscientiousness | High
Agreeableness | High
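The scoring steps in the Enneagram activity (Activity 6: rank the adjectives within each group, then add across each row) can be sketched in code for readers who want to check their arithmetic. This is only an illustrative sketch: the function names and the example rankings are hypothetical and are not part of any published Enneagram instrument.

```python
from typing import List

NUM_TYPES = 9   # nine Enneagram types (table rows)
NUM_GROUPS = 4  # four adjective groups (table columns)


def total_scores(ranks_by_group: List[List[int]]) -> List[int]:
    """Sum each type's ranks across the four groups.

    ranks_by_group[g][t] is the rank (1-9) given to the adjective for
    type t + 1 in group g + 1.
    """
    if len(ranks_by_group) != NUM_GROUPS:
        raise ValueError("expected ranks for exactly four groups")
    for ranks in ranks_by_group:
        # No ties: each rank from 1 to 9 is used exactly once per group.
        if sorted(ranks) != list(range(1, NUM_TYPES + 1)):
            raise ValueError("each group must use ranks 1-9 exactly once")
    return [sum(group[t] for group in ranks_by_group) for t in range(NUM_TYPES)]


def highest_types(totals: List[int]) -> List[int]:
    """Return the type number(s) with the highest total (ties are possible)."""
    top = max(totals)
    return [t + 1 for t, score in enumerate(totals) if score == top]


# Hypothetical rankings from one respondent, for illustration only.
example_ranks = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [9, 8, 7, 6, 5, 4, 3, 2, 1],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [2, 1, 3, 4, 5, 6, 7, 9, 8],
]
totals = total_scores(example_ranks)
print(totals)
print(highest_types(totals))
```

Ranks within a group cannot tie, but the row totals can, so the sketch returns every type that shares the highest total rather than a single type.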
References
Allport, G. W. (1937). Personality. New York, NY: Holt.
Allport, G. W. (1951). Personality: A psychological interpretation. New York, NY: Henry Holt.
American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.
Aronow, E., Weiss, K. A., & Reznikoff, M. (2013). A practical guide to the Thematic Apperception Test: The TAT in clinical practice. London, UK: Routledge.
Beutler, L. E., Groth-Marnat, G., & Harwood, T. M. (2012). Integrative assessment of adult personality (3rd ed.). New York, NY: Guilford.
Buck, J. N. (1948). The H-T-P test. Journal of Clinical Psychology, 4, 151–159.
Burns, R. C., & Kaufman, S. H. (1970). Kinetic family drawings (K-F-D). New York, NY: Brunner-Mazel.
Burns, R. C., & Kaufman, S. H. (1972). Actions, styles, and symbols in kinetic family drawings (K-F-D). New York, NY: Brunner-Mazel.
Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., Dahlstrom, W. G., & Kaemmer, B. (2001). Minnesota Multiphasic Personality Inventory-2 (MMPI-2) manual for administration and scoring, second revision. Minneapolis, MN: University of Minnesota Press.
Callahan, J. L. (2014). Evidence-based technical skills training in prepracticum psychological assessment. Training and Education in Professional Psychology, 9(1), 21–27.
Cattell, R. B., Eber, H. W., & Tatsuoka, M. M. (1970). Handbook for the Sixteen Personality Factor Questionnaire (16PF). Champaign, IL: Institute for Personality and Ability Testing.
Cohen, R., & Swerdlik, M. (2010). Test development: Psychological testing and assessment. New York, NY: McGraw-Hill Higher Education.
Cohen, R. J., Swerdlik, M. E., & Sturman, E. D. (2012). Psychological testing and assessment: An introduction to tests and measurement (8th ed.). Boston, MA: McGraw-Hill.
Coopersmith, S. (1981). Antecedents of self-esteem. Palo Alto, CA: Consulting Psychologists Press. (Original work published 1967)
Costa, P. T., Jr., & McCrae, R. (2013). The Five-Factor Model of personality and its relevance. Personality and Personality Disorders: The Science of Mental Health, 6(4), 17.
Cummings, J. A. (1986). Projective drawings. In H. M. Knoff (Ed.), The assessment of child and adolescent personality (pp. 199–244). New York, NY: Guilford.
Detrick, P., & Chibnall, J. T. (2013). Revised NEO Personality Inventory normative data for police officer selection. Psychological Services, 10(4), 372–377.
Duvoe, K. R., & Styck, K. M. (2014). Book review: The Oxford Handbook of Child Psychological Assessment. Journal of Psychoeducational Assessment, 32(5), 477–480.
Exner, J. E., Jr. (Ed.). (2013). Issues and methods in Rorschach research. London, UK: Routledge.
Fitts, W. H., & Warren, W. L. (1996). Tennessee Self-Concept Scale: Manual (2nd ed.). Los Angeles, CA: Western Psychological Services.
Frisch, M. B. (1994). Manual and treatment guide for the Quality of Life Inventory. Minneapolis, MN: Pearson.
Furnham, A., Guenole, N., Levine, S. Z., & Chamorro-Premuzic, T. (2013). The NEO Personality Inventory–Revised: Factor structure and gender invariance from exploratory structural equation modeling analyses in a high-stakes setting. Assessment, 20(1), 14–23.
Gorostiaga, A., Balluerka, N., Alonso-Arbiol, I., & Haranburu, M. (2011). Validation of the Basque Revised NEO Personality Inventory (NEO PI-R). European Journal of Psychological Assessment, 27(3), 193–205.
Gough, H. G. (2000). The California Psychological Inventory. In C. E. Watkins & V. L. Campbell (Eds.), Testing and assessment in counseling practice (2nd ed., pp. 45–71). Mahwah, NJ: Erlbaum.
Groth-Marnat, G. (2009). Handbook of psychological assessment (5th ed.). Hoboken, NJ: John Wiley & Sons.
Hodges, S. (2011). The sentence stem technique: An innovative interaction between counselor and client. Journal of Creativity in Mental Health, 6(3), 234–243.
Jung, C. G. (1971). Psychological types (H. G. Baynes, Trans.). Princeton, NJ: Princeton University Press. (Original work published 1921)
Kalimeri, K. (2013). Towards a dynamic view of personality: Multimodal classification of personality states in everyday situations. Paper presented at the Proceedings of the 15th ACM International conference on multimodal interaction.
Källmen, H., Wennberg, P., & Bergman, H. (2011). Psychometric properties and norm data of the Swedish version of the NEO-PI-R. Nordic Journal of Psychiatry, 65(5), 311–314.
Kamphaus, R. W., & Frick, P. J. (2010). Clinical assessment of child and adolescent personality and behavior (3rd ed.). New York, NY: Springer.
Knoff, H. M., & Prout, H. T. (1985). The Kinetic Drawing System: A review and integration of the kinetic family and school drawing techniques. Psychology in the Schools, 22(1), 50–59.
Koppitz, E. M. (1968). Psychological evaluation of children’s human figure drawings. New York, NY: Grune & Stratton.
Koppitz, E. M. (1982). Personality assessment in the schools. In C. R. Reynolds & T. B. Gutkin (Eds.), Handbook of school psychology (pp. 273–295). New York, NY: Wiley.
Kroeger, O., & Thuesen, J. M. (2013). Type talk: The 16 personality types that determine how we live, love, and work. New York, NY: Random House LLC.
Krueger, R. F., Derringer, J., Markon, K. E., Watson, D., & Skodol, A. E. (2012). Initial construction of a maladaptive personality trait model and inventory for DSM-5. Psychological Medicine, 42(9), 1879–1890.
Machover, K. (1949). Personality projection in the drawing of the human figure. Springfield, IL: Thomas.
McCrae, R. R., & Costa, P. T. (2003). Personality in adulthood: A Five-Factor Theory perspective (2nd ed.). New York, NY: Guilford.
Meyer, G. J., & Kurtz, J. E. (2006). Advancing personality assessment terminology: Time to retire “objective” and “projective” as personality test descriptors. Journal of Personality Assessment, 87, 223–225.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2012). Measurement and assessment in teaching (11th ed.). Upper Saddle River, NJ: Pearson.
Millon, T. (1969). Modern psychopathology: A biosocial approach to maladaptive learning and functioning. Philadelphia, PA: Saunders.
Millon, T., Millon, C., Davis, R. D., & Grossman, S. (2009). Millon clinical multiaxial inventory-III (MCMI-III): Manual. Upper Saddle River, NJ: Pearson/PsychCorp.
Murray, H. A. (1943). Thematic Apperception Test manual. Cambridge, MA: Harvard University Press.
Murray, H. A. (2007). Explorations in personality. New York, NY: Oxford University Press.
Myers, I. B., McCaulley, M. H., Quenk, N. L., & Hammer, A. L. (1998). MBTI manual: A guide to the development and use of the Myers–Briggs Type Indicator (3rd ed.). Palo Alto, CA: Consulting Psychologists Press.
Myers, I. B., & Myers, P. B. (1995). Gifts differing: Understanding personality type. Palo Alto, CA: Davies-Black.
Myers, P. B., & Myers, K. D. (2004). Myers–Briggs Type Indicator Profile. Retrieved from cpp.com/images/reports/smp261001.pdf
Neukrug, E., Peterson, C. H., Bonner, M., & Lomas, G. I. (2013). A national survey of assessment instruments taught by counselor educators. Counselor Education & Supervision, 52(3), 207–221. doi: 10.1002/j.1556-6978.2013.00038.x
Newgent, R. A., Parr, P. E., Newman, I., & Higgins, K. K. (2004). The Riso–Hudson Enneagram Type Indicator: Estimates of reliability and validity. Measurement and Evaluation in Counseling and Development, 36(4), 226–237.
O’Brien, E. J., & Epstein, S. (2012). MSEI: The Multidimensional Self-Esteem Inventory. Odessa, FL: Consulting Psychologists Press.
Ortner, T. M., & Schmitt, M. (2014). Advances and continuing challenges in objective personality testing. European Journal of Psychological Assessment, 30(3), 163–168.
Panek, P. E., Jenkins, S. R., Hayslip, B., Jr., & Moske, A. K. (2013). Verbal expressive personality testing with older adults: 25+ years later. Journal of Personality Assessment, 95(4), 366–376.
Piers, E. V., & Herzberg, D. S. (2002). Piers–Harris Children’s Self-Concept Scale, second edition manual. Los Angeles, CA: Western Psychological Services.
Ray, J. V., Hall, J., Rivera-Hudson, N., Poythress, N. G., Lilienfeld, S. O., & Morano, M. (2013). The relation between self-reported psychopathic traits and distorted response styles: A meta-analytic review. Personality Disorders: Theory, Research, and Treatment, 4(1), 1.
Riso, D. R., & Hudson, R. (1999). The wisdom of the Enneagram: The complete guide to psychological and spiritual growth for the nine personality types. New York, NY: Bantam.
Rorschach, H. (1992). Psychodiagnostik. Bern, Switzerland: Bircher.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.
Smith, S. R. (2011). Projective Assessment Techniques. Encyclopedia of Child Behavior and Development, 1163–1164.
Solomon, B. C., & Jackson, J. J. (2014). Why do personality traits predict divorce? Multiple pathways through satisfaction. Journal of Personality and Social Psychology, 106(6), 978–996. doi: 10.1037/a0036190.supp (Suppl.)
Spielberger, C. D. (1970). Manual for the State-Trait Anxiety Inventory (Form Y). Palo Alto, CA: Consulting Psychologists Press, Inc.
Spielberger, C. D. (1983). Manual for the State-Trait Anxiety Inventory (STAI). Palo Alto, CA: Consulting Psychologists Press.
Spielberger, C. D., & Butcher, J. N. (2013). Advances in personality assessment (Vol. 7). London, UK: Routledge.
Thomas, M. (1937). Méthode des histoires à completer pour le dépiste des complexes et des conflits affectifs enfantins. Archives Psychologie, 26, 209–284.
Thomas, K. M., Yalch, M. M., Krueger, R. F., Wright, A. G. C., Markon, K. E., & Hopwood, C. J. (2013). The convergent structure of DSM-5 personality trait facets and Five-Factor Model trait domains. Assessment, 20(3), 308–311.
Urbina, S. (2014). Essentials of psychological testing (Vol. 4, 2nd ed.). Hoboken, NJ: John Wiley & Sons.
Van den Broeck, J., Rossi, G., Dierckx, E., & De Clercq, B. (2012). Age-neutrality of the NEO-PI-R: Potential differential item functioning in older versus younger adults. Journal of Psychopathology and Behavioral Assessment, 34(3), 361–369.
Vassend, O., & Skrondal, A. (2011). The NEO personality inventory revised (NEO-PI-R): Exploring the measurement structure and variants of the Five-Factor Model. Personality and Individual Differences, 50(8), 1300–1304.
Weiner, I. B., & Greene, R. L. (2011). Handbook of personality assessment. Malden, MA: John Wiley & Sons.
Woodworth, R. S. (1920). Personal data sheet. Chicago, IL: Stoelting.
CHAPTER 13
Clinical Assessment
The term clinical assessment generally refers to applying assessment procedures to (a) diag-
nose a mental disorder, (b) develop a plan of intervention, (c) monitor progress in coun-
seling, and (d) evaluate counseling outcome. Traditionally employed in mental health
settings, clinical assessment involves the counselor collecting relevant information about a
client, organizing and integrating the data, and using his or her clinical judgment in form-
ing an opinion about a client’s diagnosis. In addition, counselors conduct clinical assess-
ment to develop a plan of intervention, monitor a client’s progress in counseling, and
evaluate counseling outcomes. Although clinical assessment is typical for mental health
settings, it is imperative that all counselors understand the process and are able to interpret
the information from a clinical assessment. The results of clinical assessment inform coun-
seling in all settings. As with all assessment, clinical assessment encompasses multiple
data collection instruments and strategies as well as multiple sources of information.
• Describe the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5)
assessment.
• Describe the mental status exam.
• List the risk factors and warning signs of suicide and explain how suicide risk is assessed.
Mental disorders are common in the United States and internationally. An estimated 26.4%
of Americans aged 18 and older have a diagnosable mental disorder in a given year
(Demyttenaere et al., 2013), and about 20% of children
and adolescents are estimated to have mental disorders. Four of the 10 leading causes of
disability are mental disorders: major depression, bipolar disorder, schizophrenia, and
obsessive-compulsive disorder. The severity of mental disorders is often increased in
developing countries because of a lack of appropriate services (Demyttenaere et al., 2013).
Determining a Diagnosis
Although counselors have different privileges with regard to diagnosis from state to
state, it is still essential to understand the process of determining a diagnosis. To deter-
mine a client’s diagnosis, counselors may use various formal and informal instruments
and strategies in the process of clinical assessment. Usually, the counselor conducts an
interview first, and the background information gathered from the interview helps the
counselor decide which additional assessment instruments or strategies should be
employed. For example, a client who complains of being depressed during the interview
may be asked to complete an inventory that assesses depressive symptoms. The informa-
tion gathered from both the interview and the inventory can aid in deciding whether the
client should be diagnosed with depression.
In addition to diagnosis, the clinical assessment can help guide decisions related to
treatment. From the assessment, counselors can determine the client’s diagnosis, and,
based on this information, they can select a treatment plan that is effective for that par-
ticular diagnosis. For example, a client who is depressed may be treated with cognitive-
behavioral therapy, which is a treatment approach shown to be effective in treating
depression. Cognitive-behavioral therapy involves identifying and correcting the client’s
inaccurate thoughts associated with depressed feelings, helping the client increase social
competence, and enhancing the client’s problem-solving skills.
Researchers and clinicians may also use clinical assessment for monitoring treatment
progress and evaluating treatment outcomes. They may analyze which treatment approach
was most effective for a client with a particular disorder, or they may determine which
population would most benefit from a certain treatment modality. They can monitor the
client’s progress throughout treatment to note improvements or worsening of symptoms
and to evaluate the final outcome of treatment.
In this chapter, we will discuss the process of clinical assessment. We will begin by
describing the diagnosis of mental disorders and then will present information about the
use of interviews, formal instruments, and observations in clinical assessment.
DSM-5
A key function of clinical assessment is the diagnosis of mental disorders. Clinicians diagnose
disorders using criteria from the DSM-5 (American Psychiatric Association [APA], 2013). The DSM-5
is the official classification system of mental disorders used by counselors, psychologists,
social workers, psychiatrists, and other mental health professionals in
the United States. It is used across settings (inpatient, outpatient, partial hospital, private
practice, and primary care) and with community populations. Substantial training is needed
in the use of the DSM-5, which is beyond the scope of this textbook. Those of you who are
preparing to become clinical mental health counselors will receive coursework in abnormal
human behavior, psychopathology, and diagnosis. For those of you in other counselor
preparation specializations, it will be necessary to complete elective courses or continuing
education to become competent in the diagnostic process. For all counselors, we recom-
mend clinical supervision from an Approved Clinical Supervisor (ACS) or state board–
qualified supervising counselor when developing skills in clinical assessment and diagnosis.
Although we are unable to provide an in-depth exploration of clinical assessment, we will
provide you with an introduction to diagnosis and a brief overview of the DSM-5 disorders.
To begin to understand the DSM-5, counselors need to understand the meaning of
the term mental disorder. In general, a mental disorder is a complex construct composed of
various symptoms that impact mental health and a broad array of biological, social, occu-
pational, and relational functioning. Mental disorders are classified using the DSM-5 based
on the clinician’s assessment of certain criteria. The DSM-5 contains over 300 separate
disorders divided into 22 categories. For an example of how the various types of personal-
ity disorder fall under a category, see Table 13.1.
Each mental disorder has a list of diagnostic criteria (i.e., symptoms, emotions, cogni-
tions, behaviors) that need to be met in order for a diagnosis to be made. For example, code
296.21 for Major Depressive Disorder can be marked by a variety of symptoms, such as
almost daily insomnia and fatigue, unintended changes in weight, depressed mood, and
lack of interest or pleasure in daily activities. This diagnosis may also be specified with
applicable features, such as anxious distress, catatonia, seasonal patterns, or mood-congruent
psychotic features (APA, 2013). Each disorder also includes a description of its features,
including specific age, cultural, and
gender-related features; prevalence, incidence, and risk; course (e.g., typical lifetime
patterns); complications; predisposing factors; familial pattern; and differential diagnosis
(e.g., how to differentiate this disorder from other similar disorders). The DSM-5 also
provides code numbers for each disorder, which is helpful for medical record keeping,
statistical analysis, and reporting to third parties.
In contrast to the multiaxial diagnostic system used in the Diagnostic and Statistical
Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR), the DSM-5 moved
to a nonaxial documentation of diagnosis that combines the information from Axes I, II,
and III of the previous edition (Axis I: Clinical Disorders; Axis II: Personality Disorders
and Mental Retardation; Axis III: General Medical Conditions). Counselors diagnose a
client’s current condition and rate the severity of the disorder using a new dimensional
approach. Using this new dimensional assessment process, counselors provide the diag-
nostic information needed to ensure that clients receive the appropriate treatment for
their current symptoms and disorders. Please see Table 13.2 for an overview of the DSM-5
classification system.
In order to illustrate the new dimensional assessment process, consider the case of
Julio, a 3-year-old male whose parents are concerned about his progress on developmental
milestones. Julio’s mother reports that he was late on sitting up, crawling, walking, and
other areas. Julio has difficulty with verbal communication but seems to understand what
is being said to him. Julio’s mother did not report any pregnancy or delivery complica-
tions. Julio’s mother indicated that Julio has had some difficulty with maintaining weight
and that he is a picky eater. Julio tends to only eat items that are of a certain texture. Also,
Julio is highly sensitive to changes in routine. For example, he displays a tantrum if they
take a different route to preschool than usual. When Julio plays with his toys, he does so in
an atypical manner. For example, instead of rolling a car, he turns it over and only spins
the wheels. Julio becomes very upset if his play is disrupted.
Julio’s preschool teacher reports that he does not engage with other children and that
he does not attempt verbal communication while in preschool. His teacher reports that
Julio often seems to stare off in the distance and does not seem to attend to what is occur-
ing in the classroom. She noted that he does make some verbalizations but that they seem
to be just repeating what she has said.
Based on the information provided, Julio’s DSM-5 diagnosis would be 299.00 Autism Spectrum
Disorder, Level 2, with accompanying language impairment. According to the DSM-5, individuals at Level 2
require substantial support for their disorder. In this case, Julio has marked deficits in ver-
bal communication, clear social impairments, reduced responses to social overtures from
others, inflexibility of behavior, and other symptoms. He is able to participate in preschool
but is not succeeding there. He requires significant levels of intervention and support to
manage his current behavior and to make progress on developmental milestones.
The new edition of the Diagnostic and Statistical Manual has launched with a degree of
controversy that will not abate in a short time. The changes from the DSM-IV-TR to the
DSM-5 were significant and have received numerous challenges. The critics of the DSM-5
have been adamant that the new edition lacks the science to drive the numerous changes
(Whitbourne, 2013). Several groups, including the Centers for Medicare and Medicaid Services,
have been so opposed to the changes that they are now requiring the use of the International
Classification of Diseases (ICD) in lieu of the DSM-5 (APA, 2013). As a professional counselor,
you will likely either use the DSM-5 to make a diagnosis or need to
understand the diagnosis that was given by another helping professional. Therefore, we
will review some of the commonly cited advantages and disadvantages. One of the criti-
cisms of the DSM-5 has been the inclusion of a mild category in the severity index. Many
believe that such a classification will result in the pathologizing of common phenomena.
For example, the diagnosis of mild neurological impairment may be difficult to differenti-
ate from average cognitive changes in older adulthood (APA, 2013). In addition to the
criticisms of the new edition, the overall DSM has been criticized because (a) it emphasizes
pathological symptoms, (b) managed care organizations tend to use categories to deny
authorization of client treatment for normal conflicts or for longer-term care, and (c) it has
a strong medical orientation that runs counter to the wellness philosophy that counselors
espouse (Remley & Herlihy, 2013). Further explanation of the DSM-5 is beyond the scope
of this textbook. However, other sources of information are available that provide infor-
mation about the DSM-5 and diagnosis (see Seligman & Reichenberg, 2014).
Of course, the changes in the DSM-5 have some positive attributes. As has been the
case with all iterations of the manual, the DSM-5 has some distinct advantages for clinical
practice: (a) it offers a universal system of diagnosis that permits dialogue among mental
health professionals; (b) it includes attention to cultural, age, and gender features; and (c)
the severity index requires that practitioners consider various physical, psychosocial, and
depression, then you may want to use only the Mood Disorder module to ask questions
specifically about the symptoms of depression and to confirm the existing diagnosis. The
SCID-I has two versions: the Clinician Version (SCID-CV) and the Research Version (SCID-
I-RV). The Clinician Version is a streamlined, more user-friendly version of the Research
Version that can be used in clinical settings. Administration time can range from 15 min-
utes to around 90 minutes, depending on how many modules are administered. The Biom-
etrics Research Department at Columbia University is now in the process of revising the
SCID to match the DSM-5. The new versions will be the SCID-5-CV and the SCID-5-RV.
Because the diagnostic criteria for personality disorders did not change from the DSM-IV-TR
to the DSM-5, the SCID-II can still be used, although it is also under revision. The
release of the new SCID versions is expected in 2015.
Examples of other semistructured interviews used for diagnosing include the follow-
ing (note that one of the titles includes the word structured, but it is in fact a semistructured
interview):
• Diagnostic Interview for Children and Adolescents-IV (DICA-IV)
• Schedule for Affective Disorders and Schizophrenia (SADS)
• Semistructured Clinical Interview for Children and Adolescents (SCICA)
• Structured Clinical Interview for DSM-IV Axis-II Disorders (SCID-II)
Unstructured interviews are the most frequently used type of interview in clinical
assessment. They are considered unstructured because there is no standardization of ques-
tioning or recording of responses. It is the counselor who makes up the questions and
determines how to document the responses (Jones, 2010). Although they are flexible,
unstructured interviews are not without some direction or format. Counselors typically
assess several general domains, including the presenting problem, family background,
social and academic history, medical history, and substance use history. As you might
guess, an unstructured interview requires a knowledgeable and skilled clinician to man-
age the various facets of the interview process and to chain together the pieces of informa-
tion gathered within the context of a diagnostic decision tree. Table 2.2 in Chapter 2
provides a description of these general domains.
As part of clinical assessment, topics relevant to diagnosing mental disorders are
addressed in the unstructured interview. For example, one of the domains involves gathering
information about the client’s “presenting problem,” that is, the client’s primary problems
or concerns. For clinical assessment, counselors may ask specifically about psychological
symptoms (such as depression or anxiety) and about occupational/school functioning and social
functioning to help determine a DSM-5 diagnosis. In addition, counselors will ask about the
history of the client’s presenting problem in three main areas (APA, 2013; Seligman &
Reichenberg, 2014):
1. Onset/Course: When did the problems begin? Was there a time when the client felt
worse or better? Was there any particular pattern?
2. Severity: Do the problems interfere with the client’s life (e.g., work, relationships,
and leisure pursuits) and/or lead to suffering or distress?
3. Stressor: Does the client believe that some external event brought on the problems?
Are any stressful life events associated with the problem?
Unstructured interviews in clinical assessment may also include questions about devel-
opmental history, focusing specifically on risk factors associated with the development of
current or potential mental health problems from childhood through adulthood; questions
about child abuse, domestic violence, and mental illness within the client’s family; and ques-
tions focusing on substance use, including past and current alcohol or drug use (including
prescription drugs), levels of consumption, and any consequences resulting from substance
use (e.g., legal problems, job loss, financial difficulties).
• Judgment and Insight: Does the client have the capacity for social judgment? Does
the client have insight into the nature of his or her illness?
• Reliability: How accurately was the client able to report his or her situation?
A well-known screening test of mental status is the Mini-Mental State Examination
(MMSE; Folstein, Folstein, & McHugh, 1975). It is “mini” because it focuses only on
the cognitive aspects of mental functions and excludes questions about mood, perceptual
disturbances, and thought process or content. Thus, the exam is commonly used in the
evaluation of Alzheimer’s disease or other forms of dementia. The MMSE consists of 11
questions that measure five areas of cognitive function: orientation, registration, attention
and calculation, recall, and language (see Table 13.3). It is important to gather information
from each area, because neurocognitive issues may impact one aspect of brain functioning
but not another. For example, clients may know their names and where they are but may
not know the year. We stress that a screening tool such as the MMSE is not suitable alone
for making a diagnosis but can be used as a brief general survey of a broad range of cogni-
tive function and can screen for the need for more thorough neuropsychological evalua-
tion. The maximum score of the MMSE is 30. A score of 23 or lower indicates significant
cognitive impairment. The MMSE takes only 5 to 10 minutes to administer and is therefore
practical to use repeatedly and routinely.
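The screening rule described above (a maximum score of 30, with 23 or lower indicating significant cognitive impairment) can be sketched as a small function. This is a minimal illustration only: the function name and the wording of the returned messages are hypothetical, and, as stressed above, a screening result of this kind is not a diagnosis.

```python
MMSE_MAX_SCORE = 30  # maximum total score stated in the text
MMSE_CUTOFF = 23     # scores at or below this value flag significant impairment


def screen_mmse(total_score: int) -> str:
    """Classify an MMSE total score using the cutoff given in the text."""
    if not 0 <= total_score <= MMSE_MAX_SCORE:
        raise ValueError("MMSE total must be between 0 and 30")
    if total_score <= MMSE_CUTOFF:
        # Screening result only; a fuller neuropsychological evaluation is needed.
        return "significant cognitive impairment indicated; evaluate further"
    return "above screening cutoff"


print(screen_mmse(23))
print(screen_mmse(24))
```

Keeping the cutoff as a named constant makes it easy to see that a score of exactly 23 falls on the impaired side of the threshold.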
TABLE 13.3 Sample Items from the Mini-Mental State Examination
Item | Instructions | Purpose
Orientation | “What is the date?” | To assess whether the client is oriented to the present.
Registration | “Listen carefully. I am going to say three words. You say them back after I stop. Ready? Here they are … APPLE [pause], PENNY [pause], TABLE [pause]. Now repeat those words back to me.” | To assess the client’s immediate recall and memory.
Naming | “What is this?” (Point to a pencil or pen.) | To assess the client’s ability to name an object.
Reading | “Please read this and do what it says.” (Show examinee a piece of paper with the words CLOSE YOUR EYES written on it.) | To assess the client’s ability to read and follow a written command.
Source: Reproduced by special permission of the publisher, Psychological Assessment Resources, Inc., 16204
North Florida Avenue, Lutz, Florida 33549, from the Mini Mental State Examination, by Marshal Folstein
and Susan Folstein, Copyright 1975, 1998, 2001, by Mini Mental LLC, Inc. Published 2001 by Psychological
Assessment Resources, Inc. Further reproduction is prohibited without permission of PAR, Inc.
Beck Depression Inventory. The BDI-II (Beck et al., 1996) is one of the most widely used
instruments for measuring the severity of depression in persons 13 years of age and older. The
21-item self-report inventory takes approximately 5 to 10 minutes for clients to complete. The
content of items reflects the diagnostic criteria for depression in the DSM-5 (APA, 2013)
and includes sadness, pessimism, past failure, loss of pleasure,
guilty feelings, punishment feelings, self-dislike, self-criticalness, suicidal thoughts or wishes,
crying, agitation, loss of interest, indecisiveness, worthlessness, loss of energy, changes in sleep-
ing pattern, irritability, changes in appetite, concentration difficulty, tiredness or fatigue, and
loss of interest in sex. However, because there were some significant changes in the DSM-5 with
regard to the diagnosis of depression, counselors are cautioned to use best practice and to con-
duct a multifaceted assessment process. For each item on the BDI-II, respondents are asked to
choose from among a group of statements (rated on a 4-point scale from 0 to 3) those that best
describe their feelings during the past 2 weeks. The simple rating-scale format of the inventory
allows individuals to easily comprehend the questions and respond appropriately. To score the
inventory, counselors simply sum the item scores. Total raw scores can range from 0 to 63;
scores from 0 to 13 represent minimal depression, from 14 to 19 indicate mild depression, from
20 to 28 represent moderate depression, and from 29 to 63 indicate severe depression. The BDI‑II
is a fast, efficient way to assess depression in either a clinical or nonclinical setting. The publisher
has recently lowered the classification of the BDI-II to a B-Level qualification, meaning that test
users must have a master’s degree in counseling or a related field or be licensed or certified to
practice in their state. In terms of technical quality, the BDI-II has demonstrated strong evidence
of internal consistency and test-retest reliability and evidence of convergent validity.
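The scoring rule described above (sum the 21 item ratings, then read the total against the four severity ranges) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `score_bdi2` and the sample responses are hypothetical, and a computed label does not replace a multifaceted clinical assessment.

```python
from typing import Sequence, Tuple

# Severity ranges as given in the text: (highest total in the range, label).
BDI2_BANDS = [(13, "minimal"), (19, "mild"), (28, "moderate"), (63, "severe")]


def score_bdi2(item_scores: Sequence[int]) -> Tuple[int, str]:
    """Sum 21 item ratings (each 0-3) and return (total, severity label)."""
    if len(item_scores) != 21:
        raise ValueError("expected 21 item ratings")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    total = sum(item_scores)
    for upper, label in BDI2_BANDS:
        if total <= upper:
            return total, label
    raise AssertionError("unreachable: total cannot exceed 63")


# Hypothetical response set: ten items rated 2 and eleven items rated 0.
print(score_bdi2([2] * 10 + [0] * 11))  # (20, 'moderate')
```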
Symptom Checklist 90-Revised The SCL-90-R is a 90-item self-report symptom inventory
designed to measure psychological symptom patterns in individuals who are ages 13 years
and older (Derogatis, 1994). It can be paper-and-pencil, audiocassette, or computer admin-
istered and requires about 12 to 15 minutes to complete. Each item represents a particular
psychological symptom (e.g., feeling fearful, feeling low in energy, difficulty making deci-
sions), which is rated on a 5-point scale of distress (0–4), ranging from not at all (0) to
extremely (4). The SCL-90-R yields raw scores and T scores for three global indices and nine
primary symptom dimensions (Derogatis, 1994). The three global indices provide measures of
the level of an individual’s distress and include the following:
1. Global Severity Index (GSI) This index is the mean of the raw scores on all 90
items of the questionnaire. It provides a single numeric indicator of the respondent’s
psychological status. In general, T scores at or above 63 suggest the presence of a
clinically significant level of psychological difficulties.
2. Positive Symptom Distress Index (PSDI) This index is a measure of symptom inten-
sity or severity. It is the average value of all items scored above zero (between 1 and
4). For example, if a client’s raw score on the PSDI is 2.5, the average of all items
scored above zero falls between 2 (Moderately) and 3 (Quite a Bit). Higher scores
on the PSDI indicate increased symptom severity.
3. Positive Symptom Total (PST) This is the number of items scored above zero.
Whereas PSDI is a measure of symptom severity, the PST index represents the num-
ber (or breadth) of symptoms. For example, if a client’s raw score on the PST was 60,
then this would mean that he or she answered 60 out of the 90 items with a rating
from 1 to 4. A low PST would indicate relatively few symptoms, whereas a high PST
would indicate a wide array of symptoms.
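The three global indices follow directly from the 90 item ratings, and a short Python sketch can make the definitions concrete. It computes raw scores only, exactly as defined above; converting raw scores to T scores requires the publisher’s norm tables, which are not reproduced here. The function name and the sample ratings are hypothetical.

```python
from typing import Dict, Sequence


def scl90r_global_indices(ratings: Sequence[int]) -> Dict[str, float]:
    """Return raw GSI, PSDI, and PST for 90 item ratings on the 0-4 scale."""
    if len(ratings) != 90:
        raise ValueError("expected 90 item ratings")
    if any(r not in (0, 1, 2, 3, 4) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    positive = [r for r in ratings if r > 0]       # items scored above zero
    pst = len(positive)                            # breadth of symptoms
    gsi = sum(ratings) / 90                        # mean of all 90 items
    psdi = sum(positive) / pst if pst else 0.0     # mean of endorsed items only
    return {"GSI": gsi, "PSDI": psdi, "PST": pst}


# Hypothetical client: 30 items rated 0, 30 rated 2, and 30 rated 3.
print(scl90r_global_indices([0] * 30 + [2] * 30 + [3] * 30))
```

With these hypothetical ratings the PST is 60 and the PSDI is 2.5, matching the examples in the text, while the GSI is lower (about 1.67) because the 30 unendorsed items are included in its average.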
The nine primary symptom dimensions represent various psychological symptom pat-
terns. T scores of 63 or more on two or more of the symptom dimensions suggest clinically
significant levels of psychological distress. The primary symptoms are as follows:
1. Somatization (SOM) This dimension reflects distress from perceptions of bodily
dysfunction. Complaints may focus on cardiovascular, gastrointestinal, respiratory,
gross musculature, or other bodily areas (note responses to actual items). Pain and
anxiety are both likely to be present as well.
2. Obsessive-Compulsive (O-C) This dimension focuses on obsessions and compul-
sions that cause distress. It focuses on impulses, thoughts, and actions that are unre-
mitting, irresistible, and unwanted.
3. Interpersonal Sensitivity (I-S) This dimension reflects feelings of inadequacy and
inferiority. Self-deprecation, self-doubt, and discomfort in interpersonal situations
are evident among individuals who score high on this index. They have negative
expectations regarding interpersonal behavior with others and are self-conscious.
4. Depression (DEP) Elevated scores on the DEP index reflect a range of depressive
symptoms, such as withdrawal, lack of motivation, loss of energy, hopelessness, and
suicidal thoughts.
5. Anxiety (ANX) This index focuses on general signs of anxiety, such as nervousness,
tension, and trembling, as well as panic attacks and feelings of terror, apprehension,
and dread.
6. Hostility (HOS) Elevated scores on the HOS dimension indicate thoughts, feelings,
and behaviors that are characteristic of anger, such as aggression, irritability, rage,
and resentment.
7. Phobic Anxiety (PHOB) This dimension focuses on the presence of persistent and
irrational fear related to a specific person, place, object, or situation. Items in this
dimension more closely reflect agoraphobia or panic attacks rather than merely
phobias.
8. Paranoid Ideation (PAR) This dimension represents the key elements of paranoid
thought, such as suspiciousness, hostility, grandiosity, fear of loss of autonomy, and
delusions.
9. Psychoticism (PSY) Items on the PSY dimension are indicative of a withdrawn and
isolated lifestyle as well as first-rank symptoms of schizophrenia, such as hallucina-
tions and thought control. Scores reflect a range of psychoticism from minor levels of
interpersonal alienation to severe psychotic symptoms.
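The two T-score guidelines mentioned for the SCL-90-R (a GSI T score at or above 63, or T scores of 63 or more on two or more symptom dimensions) can be expressed as a simple screening check. The sketch below is an illustration under those stated guidelines only; the function name and the T scores are hypothetical, and the result is a prompt for closer review, not a diagnosis.

```python
from typing import Dict, List, Union

T_CUTOFF = 63  # guideline value stated in the text


def flag_profile(gsi_t: int, dimension_t: Dict[str, int]) -> Dict[str, Union[bool, List[str]]]:
    """Report which T scores reach the cutoff described in the text."""
    elevated = sorted(name for name, t in dimension_t.items() if t >= T_CUTOFF)
    return {
        "gsi_elevated": gsi_t >= T_CUTOFF,
        "elevated_dimensions": elevated,
        "two_or_more_dimensions": len(elevated) >= 2,
    }


# Hypothetical T scores for four of the nine symptom dimensions.
print(flag_profile(58, {"SOM": 50, "DEP": 60, "ANX": 64, "PHOB": 63}))
```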
To interpret SCL-90-R scores, counselors examine (1) the global indices, (2) the
symptom dimensions, and (3) specific items. The SCL-90-R provides a clinical profile
showing both raw scores and T scores for the global indices and the symptom dimen-
sions. It also provides a list of items that were endorsed as quite a bit or extremely dis-
tressed. A sample clinical profile is presented in Figure 13.1. A unique feature of the
SCL-90-R is that an individual’s scores can be compared with and plotted based on four
norm groups: psychiatric outpatients, nonpatients, psychiatric inpatients, and nonpa-
tient adolescents. The SCL-90-R can be used as a one-time assessment of a client’s clini-
cal status, or it can be used repeatedly to document treatment outcome. In order to
further your understanding of the SCL-90-R and how it can be used in the diagnostic
process, complete Exercise 13.1. After reviewing the information, respond to the exer-
cise questions.
Exercise 13.1
Clinical Assessment: Jada
Jada, age 25, worked part-time as a librarian. She was driving to work one day when she
began to feel intensely anxious. Her heart started pounding, and she began sweating,
had shortness of breath, felt dizzy and shaky, and felt like she was about to die. She pulled
off to the side of the road and recovered after about 10 minutes. Jada drove on to work and
was able to function that day but experienced similar incidents over the next couple
of weeks. For the next few months, she was persistently worried about having more
attacks and became increasingly anxious about having another attack while she was
driving alone with no one to help her. She began avoiding driving, and she even began
to fear leaving her house alone.
Jada sought counseling at an outpatient counseling center and was administered the
SCL-90-R during her initial assessment. Review Jada’s SCL-90-R Clinical Profile and
answer the questions that follow.
[SCL-90-R Clinical Profile graph. Vertical axis: T score, 30 to 75.]
Scale   T Score   Raw Score
SOM     55        1.24
O-C     48        1.17
I-S     46        0.97
DEP     57        2.71
ANX     63        2.91
HOS     52        1.17
PHOB    66        2.45
PAR     51        1.17
PSY     53        1.04
GSI     65        2.45
PSDI    56        2.58
PST     54        60
Eating Disorder Inventory The Eating Disorder Inventory (EDI-3) is a widely used self-
report measure of psychological traits shown to be clinically relevant in individuals with
eating disorders. The 91-item inventory is organized into 12 primary scales, consisting of
three eating-disorder-specific scales and nine general psychological scales that are highly
relevant to, but not specific to, eating disorders. It also yields six composites: one that is
eating-disorder specific (i.e., Eating Disorder Risk) and five that are general integrative
psychological constructs (i.e., Ineffectiveness, Interpersonal Problems, Affective Problems,
Overcontrol, General Psychological Maladjustment).
[Sample profile report. PTSD Diagnosis: Yes / No. Symptom Severity Score: 0–50. Number of Symptoms Endorsed: 0–15. Symptom Severity Rating: Mild to Severe. Level of Impairment in Functioning: Mild to Severe.]
from additional validation research utilizing larger and perhaps demographically more
representative samples. Figure 13.2 shows a sample profile report.
Substance Abuse Subtle Screening Inventory The Substance Abuse Subtle Screening
Inventory (SASSI-3) is a brief, self-report, easily administered psychological screening
measure that is available in separate versions for adults and adolescents. The instrument
helps identify individuals who have a high probability of having a substance-dependence
disorder. It is particularly useful in identifying persons who are experiencing difficulties
with substance abuse but are unwilling or unable to acknowledge them. The SASSI-3 is
now available for online assessment through sassionline.com. The SASSI-3 comprises
10 scales that measure the following dimensions:
1. Face Valid Alcohol (FVA)
2. Symptoms (SYM)
3. Subtle Attributes (SAT)
4. Supplemental Addiction Measure (SAM)
5. Correctional (COR)
6. Face Valid Other Drugs (FVOD)
7. Obvious Attributes (OAT)
8. Defensiveness (DEF)
9. Family vs. Control Measure (FAM)
10. Random Answering
2006). Approximately 71% of all counselors will work with a client who attempts suicide
(Granello, 2010).
These statistics are dire, and although no one can predict with absolute certainty
who will commit suicide, mental health professionals strive to find accurate ways of assess-
ing suicide risk in order to keep people safe. As counselors-in-training, it is critical that you
understand the nuances of suicide risk assessment and the best practices for working with
suicidal clients. Suicide risk is difficult to assess for several reasons. First, there is no “one
size fits all,” meaning determination of suicide risk involves evaluating a combination of
risk factors and warning signs. Risk factors are ongoing client characteristics that increase
suicide risk, and warning signs are client behaviors that warn of imminent suicide risk
(Hawton, Casañas i Comabella, Haw, & Saunders, 2013). Marital and job status, feelings of
hopelessness or isolation, and family history are all examples of risk factors. Warning signs
include settling personal affairs, giving away possessions, and drastic changes in mood
and behavior (Granello & Granello, 2006).
A comprehensive suicide risk assessment typically involves several types of assess-
ment methods and sources of information. The most common assessment method involves
unstructured interviews in which mental health professionals assess suicide risk in several
key areas, such as intent, a suicide plan with accessible means, a history of suicidal thoughts,
and family history of suicidal thoughts and mental disorders (Granello & Granello, 2006).
It’s important that questions about suicide risk be asked in clear and frank language
(e.g., “Have you thought about suicide?” or “Are you considering killing yourself?”).
Sometimes, clients will use euphemisms (e.g., “I just want to be with the angels” or “I won’t
be around for them to pick on anymore”) that hint at suicide plans (Granello & Granello,
2006, p. 189). In these situations, it is vital that professionals ask the client in a direct man-
ner about plans to commit suicide.
Many mental health professionals use suicide risk–assessment instruments in con-
junction with the unstructured interviews, and there are hundreds of instruments availa-
ble that assess suicide risk that are either commercially published or available in the
research literature. Although these instruments were designed to directly measure suicide
ideation and behavior, none has been universally supported as effective in determining
risk. A few of the more commonly used suicide risk–assessment instruments
are listed in Table 13.5.
In addition to instruments assessing suicide ideation and behavior, there are instru-
ments that measure variables that have been closely associated with suicide, such as
hopelessness and reasons for living. For example, the Beck Hopelessness Scale (BHS; Beck
& Steer, 1988) is a self-report instrument that is designed to measure the extent of positive
and negative beliefs about the future. Another example is the Brief Reasons for Living
Inventory (BRFL; Ivanoff, Jang, Smyth, & Linehan, 1994), which assesses the reasons for
not killing oneself if the thought were to occur.
Many checklists and inventories that assess general psychological symptoms also
have “critical items” assessing suicide risk. For example, the BDI-II has a suicide item that
consists of four ratings: 0 (“I don’t have any thoughts of killing myself”), 1 (“I have thoughts
of killing myself, but I would not carry them out”), 2 (“I would like to kill myself”), and 3
(“I would kill myself if I had the chance”). Similarly, the Outcome Questionnaire 45 (OQ45;
Lambert, Lunnen, Umphress, Hansen, & Burlingame, 1994) has a suicide potential screen-
ing item (“I have thoughts of ending my life”) that respondents can rate on a 5-point scale
from never to almost always. Although suicide risk cannot be adequately evaluated with one
item alone, these types of items can be useful screening tools for indicating the need for
more thorough assessment of suicide risk.
Although formal assessment instruments can be useful tools, they do not replace
interviews and interactions with a client and are not a substitute for clinical judgment.
Comprehensive suicide risk assessment involves examining information obtained from
interviews, information from assessment instruments, and/or collateral information from
individuals closest to the client.
Child Behavior Checklist In clinical assessment, the CBCL/6-18 (Achenbach & Res-
corla, 2001) is considered the gold standard for evaluating behavioral and emotional
problems in children aged 6 to 18. The CBCL/6-18 is a rating scale completed by par-
ents, guardians, and/or close relatives that assesses children’s competencies and behav-
ioral/emotional problems. It is part of the Achenbach System of Empirically Based
Assessment (ASEBA), which includes an integrated set of rating forms that assess adap-
tive and maladaptive functioning in children and adults. For example, the CBCL/6-18
(completed by the parent/guardian) is designed to be administered in conjunction with
the Youth Self-Report (YSR; completed by the child) and the Teacher’s Report Form
(TRF; completed by the child’s teacher). All three instruments measure eight cross-
informant syndrome constructs: Anxious/Depressed, Withdrawn/Depressed, Somatic
Complaints, Social Problems, Thought Problems, Attention Problems, Rule-Breaking
Behavior, and Aggressive Behavior.
On the CBCL/6-18, parents are asked to rate 118 items that describe their child’s spe-
cific behavioral and emotional problems. Parents rate their child for how true each item is
now or within the past 6 months using the following scale: 0 = not true; 1 = somewhat or
sometimes true; 2 = very true or often true. Examples of items are “Acts too young for age,”
“Cries a lot,” and “Gets in many fights.” In addition, there are two open-ended items for
parents to report additional problems. The CBCL/6-18 requires a fifth-grade reading abil-
ity, and most parents can complete the CBCL/6-18 in about 15 to 20 minutes. Results can
be hand scored or computer scored.
The CBCL/6-18 provides several scales that measure parents’ perceptions of their
child’s behavior and emotional problems: three competence scales and a Total Compe-
tence scale; eight syndrome scales; Externalizing, Internalizing, and Total Problems scales;
and six DSM-oriented scales. The competence scales reflect the degree and quality of the
child’s participation in various activities (e.g., sports, hobbies, games), social relationships,
and academic performance. The syndrome scales include empirically derived syndromes
labeled as Anxious/Depressed, Withdrawn/Depressed, Somatic Complaints, Social Prob-
lems, Thought Problems, Attention Problems, Rule-Breaking Behavior, and Aggressive
Behavior, as well as a list of Other Problems that consists of items that are not strongly
associated with any of the syndrome scales. The first three syndrome scales (Anxious/
Depressed, Withdrawn/Depressed, and Somatic Complaints) make up the Internalizing
scale, and the last two syndromes (Rule-Breaking Behavior and Aggressive Behavior)
make up the Externalizing scale. The Total Problems scale is the sum of the scores on all
120 items on the instrument. The DSM-oriented scales consist of items that experienced
psychiatrists and psychologists rated as being very consistent with categories of diagnoses
defined by the DSM-IV (American Psychiatric Association (APA), 1994). Table 13.6
provides a summary of each of the CBCL/6-18 scales.
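The way the Internalizing and Externalizing scales are built from the syndrome scales can be shown with a brief Python sketch. The syndrome raw totals below are illustrative values, and the function name is hypothetical. Total Problems is deliberately not derived here, because it sums all 120 items, including Other Problems items that belong to no syndrome scale.

```python
from typing import Dict

# Syndrome scales that make up each broadband scale, as described in the text.
INTERNALIZING = ("Anxious/Depressed", "Withdrawn/Depressed", "Somatic Complaints")
EXTERNALIZING = ("Rule-Breaking Behavior", "Aggressive Behavior")


def broadband_scores(syndrome_totals: Dict[str, int]) -> Dict[str, int]:
    """Sum syndrome raw totals into Internalizing and Externalizing raw scores."""
    return {
        "Internalizing": sum(syndrome_totals[s] for s in INTERNALIZING),
        "Externalizing": sum(syndrome_totals[s] for s in EXTERNALIZING),
    }


# Illustrative raw totals for the eight syndrome scales.
totals = {
    "Anxious/Depressed": 2,
    "Withdrawn/Depressed": 10,
    "Somatic Complaints": 1,
    "Social Problems": 5,
    "Thought Problems": 5,
    "Attention Problems": 7,
    "Rule-Breaking Behavior": 1,
    "Aggressive Behavior": 11,
}
print(broadband_scores(totals))  # {'Internalizing': 13, 'Externalizing': 12}
```

Social Problems, Thought Problems, and Attention Problems contribute to neither broadband scale, which is why they are left out of both sums.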
The CBCL/6-18 provides separate scoring profiles for boys and girls of ages 6 to 11 and
12 to 18. Scoring profiles include raw scores, T scores, and percentile ranks for each scale
(Figure 13.3 shows an example of a scoring profile for the syndrome scales). T scores are fur-
ther categorized as being in the clinical range, borderline clinical range, or the normal range.
Syndrome Scales
Anxious/Depressed: Cries, fears school, phobias, perfectionist, feels unloved, feels worthless,
    nervous, fearful, feels guilty, self-conscious, suicidal thoughts, worries.
Withdrawn/Depressed: Enjoys little in life, prefers to be alone, won’t talk, secretive, shy, lacks
    energy, unhappy and sad, withdrawn.
Somatic Complaints: Nightmares, dizzy, fatigued, aches or pains, headaches, nausea, eye
    problems, skin problems, stomach aches, vomiting.
Social Problems: Dependent on adults, feels lonely, doesn’t get along with other kids,
    jealous, believes others are “out to get” him or her, accident-prone, teased, not liked
    by other kids, clumsy, prefers to be with younger children, speech problems.
Thought Problems: Ruminates, harm to self, hears things, body twitches, skin picking,
    compulsions, sees things that aren’t there, sleeps less, hoards things, strange
    behavior, strange ideas, sleepwalks, sleep problems.
Attention Problems: Acts too young for his or her age, fails to finish things, problems
    concentrating, trouble sitting still, daydreaming, impulsive, poor schoolwork,
    inattentive, stares.
Rule-Breaking Behavior: Drinks alcohol, no guilt after doing something wrong, breaks rules,
    socializing with “bad” kids, lies or cheats, prefers older kids, has run away
    from home, sets fires, often thinks about sex, uses tobacco, truant, uses
    drugs, vandalizes.
Aggressive Behavior: Argues, mean to others, demands attention, destroys own belongings,
    destroys others’ belongings, disobeys parents, disobeys at school, fights,
    attacks, screams, stubborn, moody, sulks, suspicious, teases others,
    temper problems, threatens to hurt others, louder than other kids.

Internalizing, Externalizing, and Total Problems Scales
Total Problems: The sum of the 0-1-2 scores on all 120 items.
Internalizing: The sum of scores for the Anxious/Depressed, Withdrawn/Depressed,
    and Somatic Complaints syndromes.
Externalizing: The sum of scores for the Rule-Breaking Behavior and Aggressive
    Behavior syndromes.

DSM-Oriented Scales
Affective Problems: Enjoys little in life, cries, harms self, feels worthless, feels guilty,
    fatigued, apathetic, suicidal thoughts, low energy, sad. This scale
    corresponds to the DSM-IV criteria for Major Depressive Disorder and
    Dysthymia.
Anxiety Problems: Dependent on adults, phobias, fears school, nervous, fearful, worries.
    Corresponds with the DSM-IV criteria for Generalized Anxiety Disorder,
    Social Anxiety Disorder, and Specific Phobia.
Somatic Problems: Aches or pains, headaches, nausea, eye problems, skin problems,
    stomach aches, vomiting. Corresponds with the DSM-IV criteria for
    Somatization and Somatoform Disorders.
Attention Deficit/Hyperactivity Problems: Fails to finish things, problems concentrating,
    trouble sitting still, fidgets, difficulty following directions, disturbs others,
    impulsive, talks out, disruptive, inattentive, talks too much, louder than other kids.
    Corresponds with the criteria for ADHD, Hyperactive-Impulsive, and
    Inattentive types.
Oppositional Defiant Problems: Argues, defiant, disobeys at school, stubborn, temper problems.
    Corresponds with the criteria for Oppositional Defiant Disorder.
Conduct Problems: Mean, destroys others’ belongings, no guilt after doing something
    wrong, breaks rules, fights, socializing with “bad” kids, lies or cheats,
    attacks, irresponsible, steals, swears, threatens, truant.
The value of T scores for each range varies depending on the scale. For example, on the
competence scales, higher T scores are associated with normal functioning. On the syndrome,
Internalizing, Externalizing, Total Problems, and DSM-oriented scales, lower T scores
are associated with normal functioning. For a more thorough understanding of children’s
assessment using the CBCL, complete Exercise 13.2 and respond to the exercise questions.
[Figure 13.3: scoring profile for the CBCL/6-18 syndrome scales. Vertical axis: T score (50 to 100), marked with normal and clinical ranges; scales are grouped under Internalizing and Externalizing.]
Scale                    Total Score   T Score   Percentile
Anxious/Depressed        2             51        54
Withdrawn/Depressed      10            80-C      >97
Somatic Complaints       1             53        62
Social Problems          5             59        81
Thought Problems         5             66-B      95
Attention Problems       7             62        89
Rule-Breaking Behavior   1             52        58
Aggressive Behavior      11            63        90
Source: Reprinted with permission from Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School
Age Forms & Profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth and Families.
Exercise 13.2
Child Clinical Assessment: CBCL
Name: Sabrina Robinson
Gender: Female
Age: 12
Birth Date: 04/10/1995
Clinician: Darcy Young
Agency: Community Counseling Clinic
Date of Assessment: 05/28/2007
Informant: Joanne Robinson
Relationship: Biological Mother

Sabrina, age 12, was brought to an outpatient counseling center by her biological
mother, Joanne Robinson. Ms. Robinson reported concern about Sabrina’s problems
in school and her mood at home. Sabrina is in the sixth grade and is performing
poorly academically; her most recent progress report showed D’s in social studies,
math, and science and a C in language arts. Ms. Robinson stated that Sabrina has
difficulty concentrating on school and frequently does not finish her homework.
She is often tired, taking a nap every day after school. Sabrina is well-behaved at
home, has few if any behavior problems in school, and has never been reprimanded
or held after school for detention. Ms. Robinson reported that Sabrina has no
friends at school but has two close friends who attend their church. Sabrina sees her
friends less than once a week. At home, Ms. Robinson reported that Sabrina has a
few chores (e.g., making her bed, putting away dishes). She reported that Sabrina
enjoys playing games on her computer and reading.
Ms. Robinson completed the CBCL/6-18. The profile scores of Sabrina’s
CBCL/6-18 follow.
Competence scales (three scales, in profile order): T Score 34, 30, 29; Percentile 6, 2, 2
Total Competence: T Score 27; Percentile 1
Anxious/Depressed: Perfectionist; Feels worthless; Self-conscious
Withdrawn/Depressed: Prefers being alone; Won’t talk; Secretive; Lacks energy; Sad; Withdrawn
Somatic Complaints: Feels dizzy; Tired; Aches or pains; Headaches; Nausea; Stomach aches
Social Problems: Lonely; Accident-prone; Gets teased; Clumsy; Prefers younger kids
Thought Problems: Hoarding; Sleep problems
Attention Problems: Acts young for age; Fails to finish things; Can’t concentrate; Confused;
    Daydreams; Poor school work; Inattentive; Stares blankly
Note: The items that were endorsed as somewhat or sometimes true or very true or often true are listed
under each scale.
CBCL/6-18 Internalizing, Externalizing, Total Problems, Other Problems for Girls 12–18
[Profile graph. Vertical axis: T score (50 to 100), marked with normal and clinical ranges.]
Scale            Total Score   T Score   Percentile
Internalizing    19            68        97
Externalizing    0             34        6
Total Problems   51            66        95
Other Problems: Overeats, Overweight, Thumb Sucking
2. Are the CBCL/6-18 results consistent with Sabrina’s background information? Explain.
3. Overall, what do the CBCL/6-18 results tell you about Sabrina?
4. What inferences can you make about Sabrina based on the background information
and her scores on the CBCL/6-18?
5. What recommendations would you have for Sabrina and her mother?
Neuropsychological Assessment
Neuropsychology is the study of brain-behavior relationships and the impact of brain
injury or disease on the cognitive, sensorimotor, emotional, and general adaptive capaci-
ties of the individual. Neuropsychological assessment involves assessing a wide variety of
cognitive and intellectual abilities, such as attention and concentration, learning and
memory, sensory-perceptual abilities, speech and language abilities, visuospatial skills
(ability to perceive spatial relationships among objects), overall intelligence, and execu-
tive functions. In addition, psychomotor speed, strength, and coordination all would be
addressed in some fashion. Three well-known neuropsychological instruments are
the Halstead–Reitan Neuropsychological Test Battery, the Luria–Nebraska Neuropsycho-
logical Battery (LNNB), and the Bender Visual-Motor Gestalt Test, Second Edition
(Bender-Gestalt II).
older and can be administered in 1.5 to 2.5 hours. The LNNB assesses a wide range of
cognitive functions on 11 clinical scales, two optional scales, three summary scales, and 11
factor scales. The following is a list and description of the clinical scales:
1. Motor Functions Measures a wide variety of motor skills
2. Rhythm Measures nonverbal auditory perception, such as pitch discrimination and
rhythmic patterns
3. Tactile Functions Measures tactual discrimination and recognition
4. Vision Functions Measures visual-perceptual and visual-spatial skills
5. Receptive Speech Measures perception of sounds from simple to complex
6. Expressive Speech Measures ability to repeat sounds, words, and word groups and
produce narrative speech
7. Writing Measures ability to analyze words into letters and to write under varying
conditions
8. Reading Measures ability to make letter-to-sound transformations and read simple
material
9. Arithmetic Measures knowledge of numbers, number concepts, and ability to per-
form simple calculations
10. Memory Measures short-term memory and paired-associate learning
11. Intellectual Processes Measures sequencing, problem-solving, and abstraction skills
Bender Visual-Motor Gestalt Test, Second Edition The Bender-Gestalt II is widely used
for screening neuropsychological impairment in both children and adults. Specifically, the
Bender-Gestalt evaluates visual-motor ability and visual perception skills. Visual-motor
ability is the ability of the eyes and hands to work together in smooth, efficient patterns.
Visual perception refers to one’s ability to receive, perceive, and make sense of visual stimu-
lation. To assess these skills, the Bender-Gestalt uses nine stimulus cards, each displaying
an abstract design (see Figure 13.4). The cards are presented to the examinee one at a time;
then, the examinee is asked to reproduce the design on a blank sheet of paper.
Although several systems are available for scoring the Bender-Gestalt, the scoring
process, in general, involves comparing an individual’s drawings to the designs displayed
on the cards. For example, the Global Scoring System (GSS) rates each drawing on a scale
1. Confused Order Sign of mental confusion, lack of planning ability, and poor organi-
zational skills
2. Wavy Line Poor motor coordination and/or emotional instability
3. Dashes Substituted for Circles Impulsivity or lack of interest
4. Increasing Size of Figures Low frustration tolerance and explosiveness
5. Large Size Impulsivity and acting-out behavior
6. Small Size Anxiety, withdrawal, constriction, and timidity
7. Fine Line Timidity, shyness, and withdrawal
8. Careless Overwork or Heavily Reinforced Lines Impulsivity, aggressiveness, and
acting-out behavior
9. Second Attempt at Drawing Designs Impulsivity and anxiety
10. Expansion (Using Two or More Sheets to Complete the Nine Drawings) Impulsivity
and acting-out behavior
11. Box around the Design Attempt to control impulsivity, weak inner control, and need
for outer limits and structure
12. Spontaneous Elaboration and/or Additions to Designs Unusual preoccupation with
own thoughts, fears, and anxieties; serious emotional problems
Other tests used for neuropsychological assessment include the following:
• The Kaufman Short Neuropsychological Assessment Procedure (K-SNAP) was
developed by A. S. Kaufman and N. L. Kaufman to measure the cognitive functioning
of individuals ages 11 to 85. The test measures attention and orientation, simple
memory, and perceptual skills.
• The Quick Neurological Screening Test, Second Edition (QNST-II) assesses 15 areas
of neurological integration as they relate to learning. The scales are Control of Large
and Small Muscles, Motor Planning, Motor Development, Sense of Rate and
Rhythm, Spatial Organization, Visual and Auditory Skills, Perceptual Skills, and
Balance Orientation.
• The Ross Information Processing Assessment, Second Edition (RIPA-2) is designed
to assess any cognitive and linguistic deficits and determine severity levels. The
scales on the RIPA-2 are Immediate Memory, Recent Memory, Spatial Orientation,
Orientation to the Environment, Recall of General Information, Organization, and
Auditory Processing and Retention.
Summary
The term clinical assessment generally refers to applying assessment procedures to
(a) diagnose a mental disorder, (b) develop a plan of treatment, (c) monitor treatment
progress, and (d) evaluate treatment outcome. Traditionally employed in mental health
settings, clinical assessment involves the counselor determining the information that is
needed to answer the referral question, organizing and integrating the data, and using
his or her clinical judgment in forming an opinion about a client’s diagnosis.
Interviews are widely used in clinical assessment to help determine an individual’s
diagnosis. Interview techniques include unstructured, semistructured, and structured
interviews, as well as the MSE. In addition, many formal instruments are used in clinical
assessment. Most of these instruments have a pathological focus and are closely aligned to
symptoms of disorders listed in the DSM-5. Instruments may assess a broad array of
symptoms or a specific set of related symptoms. Formal and informal observations are
also used in clinical assessment to help make an accurate diagnosis and in developing a
treatment plan.
Suggested Activities
1. Interview psychologists and counselors working in the mental health field, and find
out what tests and assessment procedures they use and why. Report your findings to
the class.
2. Critically review a test used in clinical assessment. Read the reviews of the test in
different journals and yearbooks.
3. Design an interview schedule, and try it out on several individuals. Write an analysis
of the results.
4. Write a review of the literature on a topic such as behavioral assessment, intake
interviews, use of tests in counseling practice, or neuropsychological testing.
5. Review the following case study and answer the questions at the end:
Karen calls the suicide hotline number Friday night at 9 p.m. and speaks to an on-call
crisis counselor. Karen says that she is depressed and wants to kill herself. She has been
drinking vodka and has obtained several Valium tablets from a neighbor. She says she is
lonely and has wasted her life. She has the tablets in her house and is thinking about
taking them. After further discussion, the counselor finds out that Karen will be turning
40 next week. She lives alone, has no friends, and has no family members who live
nearby. She used to be quite religious but stopped attending church last year. Karen has
been previously involved with mental health services, having been treated at an inpatient
psychiatric hospital several times over the years for depression and suicidal ideation.
a. Identify the factors and warning signs associated with suicide risk.
b. How would you describe Karen’s overall risk for suicide?
c. To further assess suicide risk, what questions would you like to ask Karen?
References
Achenbach, T. M., & Rescorla, L. A. (2001). Manual for Cohen, R. J., Swerdlik, M. E., & Sturman, E. D. (2012).
the ASEBA school-age forms & profiles. Burlington, Psychological testing and assessment: An introduction
VT: University of Vermont, Research Center for to tests and measurement (8th ed.). Boston, MA:
Children, Youth, & Families. McGraw-Hill.
American Psychiatric Association (APA). (2013). Demyttenaere, K., Bruffaerts, R., Posada-Villa, J.,
Diagnostic and statistical manual of mental disorders Gasquet, I., Kovess, V., Lepine, J., . . . Morosini, P.
(5th ed.). Washington, DC: Author. (2013). Prevalence, severity, and unmet need for
Beck, A. T., & Steer, R. A. (1993). Manual for the Beck treatment of mental disorders in the World Health
Hopelessness Scale. San Antonio, TX: Psychological Organization World Mental Health Surveys. Social
Corporation. Psychiatry and Psychiatric Epidemiology, 48(1),
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual 137–149.
for the Beck Depression Inventory-II. San Antonio, Derogatis, L. R. (1994). SCL-90-R: Symptom Checklist
TX: Psychological Corporation. 90-Revised: Administration, scoring, and procedures
Carlat, D. J. (2011). The psychiatric interview: A practi- manual (3rd ed.). Minneapolis, MN: Pearson.
cal guide (23rd ed.). Philadelphia, PA: Lippincott, First, M. B., Spitzer, R. L., Gibbon, M., & Williams,
Williams & Wilkins. J. B. W. (2012). Structured Clinical Interview for
Chapter 13 • Clinical Assessment 331
DSM-IV Axis I disorders (SCID-I), Clinician Versions. Lambert, M. J., Lunnen, K., Umphress, V., Hansen,
Washington, DC: American Psychiatric Press. N., & Burlingame, G. M. (1994). Administration and
Foa, E. B. (1995). The Posttraumatic Diagnostic Scale scoring manual for the Outcome Questionnaire
manual. Minneapolis, MN: Pearson Assessments. (OQ‑45.1). Salt Lake City, UT: IHC Center for
Folstein, M., Folstein, S. E., & McHugh, P. R. (1975). Behavioral Healthcare Efficacy.
“Mini-Mental State”: A practical method for grad- Louisiana Mental Health Counselor Licensing Act.
ing the cognitive state of patients for the clinician. (1987). Acts 1987, No. 892 §1, eff. July 20, 1987.
Journal of Psychiatric Research, 12(3), 189–198. Mitchell, M. B., & Atri, A. (2014). Dementia Screen-
Gilliland, B. E., & James, R. R. (1997). Theories and ing and Mental Status Examination in Clinical
techniques in counseling and psychotherapy (4th ed.). Practice. Dementia: Comprehensive Principles and
New York, NY: Allyn & Bacon. Practices, 461.
Granello, D. H. (2010). A suicide crisis intervention O’Donohue, W. T., Cummings, N. A., & Cummings,
model with 25 practical strategies for implementa- J. L. (2006). Clinical strategies for becoming a master
tion. Journal of Mental Health Counseling, 32(3), psychotherapist. Burlington, MA: Academic Press.
218–235. Remley, T. P., & Herlihy, B. (2013). Ethical, legal, and
Granello, D. H., & Granello, P. F. (2006). Suicide: An professional issues in counseling (4th ed.). Upper
essential guide for helping professionals and educators. Saddle River, NJ: Prentice Hall.
Boston, MA: Pearson. Sadock, B. J., & Sadock, V. A. (2007). Kaplan and
Hawton, K., Casañas i Comabella, C., Haw, C., & Sadock’s synopsis of psychiatry: Behavioral sciences/
Saunders, K. (2013). Risk factors for suicide in clinical psychiatry (10th ed.). Philadelphia, PA:
individuals with depression: A systematic review. Lippincott Williams & Wilkins
Journal of affective disorders, 147(1), 17–28. Seligman, L., & Reichenberg, L. W. (2014). Selecting
Ivanoff, A., Jang, S. J., Smyth, N. F., & Linehan, M. M. effective treatments: A comprehensive, systematic
(1994). Fewer reasons for staying alive when you guide to treating mental disorders (4th ed.). San Fran-
are thinking of killing yourself: The Brief Reasons cisco, CA: Jossey-Bass.
for Living Inventory. Journal of Psychopathology Sommers-Flanagan, J., & Sommers-Flanagan, R.
and Behavioral Assessment, 16, 1–13. (2013). Clinical interviewing (4th ed.). Hoboken, NJ:
Jones, K. D. (2010). The unstructured clinical inter- John Wiley & Sons.
view. Journal of Counseling & Development, 88(2), Whitbourne, S. K. (2013). What the DSM-5 changes
220–226. mean for you. Psychology Today. Retrieved from
Koppitz, E. M. (1975). The Bender Gestalt Test for Young http://www.psychologytoday.com/blog
Children (Vol. 2: Research and applications, 1963– /fulfillment-any-age/201305/what-the-dsm-5-
1973). New York, NY: Grune & Stratton. changes-mean-you
CHAPTER 14
Assessment in Education
Counselors who work with children, adolescents, college students, parents, or families
require a thorough understanding of assessment in educational systems. Regardless of
work setting, the results of educational assessment inform the counseling process. Edu-
cational assessment, a primary function for school counselors and school-based mental
health counselors, encompasses many of the types of assessment reviewed throughout
this textbook (e.g., achievement, aptitude, career assessment, observation). The purposes
of assessment in educational systems vary and can include tasks such as identifying
students with special needs, determining whether students have mastered the require-
ments for graduation, determining appropriate accommodations for a college student
with a learning disability, advocating for a child with unmet educational needs, coach-
ing a parent on their rights under the Individuals with Disabilities Education Act (IDEA),
and evaluating the effectiveness of a comprehensive school counseling program. Assess-
ment in education is a broad practice that includes the use of a wide variety of instru-
ments (e.g., achievement tests, ability tests, aptitude tests, career assessment instruments)
and assessment procedures (e.g., functional behavior assessment, observation, interpre-
tation, report writing).
School counselors, school-based mental health counselors, and college counselors
play an important role in assessment programs and are often involved in collecting and
using assessment data; regularly monitoring student progress; and communicating the
purposes, design, and results of assessment instruments to various parties. To practice
effectively, counselors need specific knowledge and skills about the assessment instru-
ments and strategies used in the schools. Because the scope of educational assessment is
broad, it is difficult to capture the roles and activities of all counseling professionals who
interact with educational systems in one chapter. Thus, we will focus primarily on the
roles and functions of school counselors in this chapter, but we stress that the material
herein is relevant to all counselors and helping professionals.
• List and explain the competencies in assessment and evaluation for school counselors.
Grade Level
Test/Inventory K 1 2 3 4 5 6 7 8 9 10 11 12
Readiness test X
NAEP X X X
Career development/maturity X
Career interest/values X
ASVAB X
PSAT/PLAN X
SAT X X
ACT X X
National Merit X
Achievement Tests Every state requires schools to administer achievement tests that are
aligned with their educational standards. As we addressed in the chapter on assessing
achievement (Chapter 9), states may develop their own standardized achievement test or
use a commercially developed achievement test battery, such as the Stanford Achievement
Test (Stanford 10). On these examinations, students must earn a minimum test score to
prove mastery in order to be promoted to the next grade or graduate high school. Schools
may also administer the National Assessment of Educational Progress (NAEP), which
tracks student achievement in 4th, 8th, and 12th grades. Many elementary schools admin-
ister diagnostic reading tests to screen and monitor student progress in learning reading
skills. Other forms of achievement assessment include curriculum-based assessment,
curriculum-based measurement, and performance-based assessment.
Intellectual Ability Tests Tests that measure intellectual ability are used in the schools for
a variety of purposes. They may be used to help identify students with intellectual disabil-
ities or learning disabilities or to place students into specialized academic or vocational
programs. Most schools administer ability tests such as the Cognitive Abilities Test
(CogAT) or the Otis–Lennon School Ability Test (OLSAT 8) to screen students for place-
ment in gifted and talented programs.
Readiness Tests Schools frequently use the scores from readiness tests to determine
whether children are “ready” for kindergarten or are “ready” for promotion to first grade.
Readiness tests assess whether children have the underlying skills deemed necessary for
school learning. As we indicated previously, school readiness tests should be used with
caution and careful examination of normative groups. There is a strong potential for cul-
tural bias with these types of instruments.
Aptitude Tests Four of the most widely used aptitude tests are the Armed Services Voca-
tional Aptitude Battery (ASVAB), the General Aptitude Test Battery (GATB), the Differen-
tial Aptitude Test (DAT), and the Career Aptitude Placement Survey (CAPS). Some school
districts administer an aptitude test in the 10th grade. Civilian counselors working with
the ASVAB will visit high schools and administer the test to all high school students who
want to take the test for educational and vocational guidance purposes. Students meet
with armed services personnel only if they request an interview.
Admissions Tests Most high school counselors are in charge of coordinating programs to
administer the PSAT, PLAN, SAT, ACT, or other college admissions tests to students in the
10th, 11th, and 12th grades. Coordinating the administration of admissions tests can
include providing orientation about admissions tests to students and parents, having
information about prep classes, and ensuring that students meet the necessary deadlines
for completing test registrations.
Career Assessment Instruments Various career assessment instruments are used in the
schools, especially at the middle school and high school levels. General interest inventories
are often given at both levels. In addition, school counselors often administer career devel-
opment inventories that help the school district evaluate the educational and vocational
needs of the students. Popular career assessment systems include the COPSystem, the
Kuder Career Planning System, and the College and Career Readiness System.
in all three educational levels, high school counselors had more responsibility for selecting,
administering, and interpreting assessment instruments. Although no studies have been
conducted to validate that these assessment tasks remain the same today, they are clearly
aligned with the accountability requirements of the ASCA National Model (American
School Counselor Association [ASCA], 2012) and the Standards for Educational and
Psychological Testing (American Educational Research Association [AERA], American
Psychological Association [APA], & National Council on Measurement in Education
[NCME], 2014). These tasks are considered common for school counselors, but there is a
degree of concern about the competency of school counselors to perform the tasks (Maras,
Coleman, Gysbers, Herman, & Stanley, 2013). We stress that competency requires practice
under supervision and extended study through continuing education and thorough review
of assessment materials.
Needs Assessments
Assessment of students’ counseling needs is a crucial component of implementing an
effective school counseling program. Needs assessment is a formal process of gathering
information from a range of sources (e.g., students, parents, teachers, administrators)
about their perceptions of the needs of the student population. Needs assessments deter-
mine the students’ needs or desired outcomes and, in doing so, identify the priorities of the
school counseling program within the philosophical framework of the school and com-
munity. According to Sink (2009), school counselors conduct needs assessments as part of
the accountability dimension of the ASCA National Model. Sink indicated that school
counselors must take accountability leadership and evaluate their comprehensive school
counseling programs in order to identify missing or underutilized elements, track student
learning progress, support service improvement efforts, evaluate concerns, and guide
school program changes.
Most needs assessment instruments are informally constructed to assess students’ needs in
three broad areas: academic, career, and personal or social. Figure 14.2 shows an example
of a career development needs assessment instrument. As another example, you might
consider conducting a needs assessment among parents and teachers to determine the
types of services you should provide in your comprehensive school counseling program.
Items on the needs assessment might ask participants to rate various service needs (e.g.,
bully prevention, multiculturalism, social skills, conflict resolution) on a Likert scale.
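As a rough illustration of how such ratings might be summarized, the sketch below averages the Likert ratings for each service area and ranks the areas from highest to lowest perceived need. The 1-to-5 scale, the response data, and the function name `rank_needs` are hypothetical and are not drawn from any published instrument.

```python
from statistics import mean

# Hypothetical ratings on an assumed 1 (low need) to 5 (high need) Likert scale,
# one rating per respondent for each service area.
responses = {
    "bully prevention": [5, 4, 5, 3],
    "multiculturalism": [3, 3, 4, 2],
    "social skills": [4, 4, 4, 4],
    "conflict resolution": [2, 3, 3, 2],
}

def rank_needs(ratings_by_area):
    """Return (area, mean rating) pairs sorted from highest to lowest mean."""
    averages = {area: mean(ratings) for area, ratings in ratings_by_area.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

for area, average in rank_needs(responses):
    print(f"{area}: {average:.2f}")
```

A simple mean is only one possible summary; a counselor might also want to look at the spread of ratings or compare parent and teacher responses separately.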
Read each phrase and then decide the importance of that activity to you. Circle the appropriate
number to the right of each phrase using the following scale.
6. Mathematical Calculation Ability to learn basic math facts and perform basic math
operations, such as addition, subtraction, multiplication, and division
7. Mathematical Reasoning Ability to apply mathematical techniques, concepts, or
processes to solve problems
Students with an SLD may have difficulty listening, thinking, speaking, reading, writing,
spelling, or doing math problems. They usually have average to above-average intelligence,
but may have difficulties demonstrating academic knowledge and understanding. They
will show intraindividual differences in their academic skills and abilities; in other words,
they often do well in some school subjects, but usually have extreme difficulty with certain
skills, such as decoding (reading) words, calculating math facts, or putting their thoughts
and ideas into writing. SLDs are believed to be caused by some kind of neurological condi-
tion that affects information processing, which means that although students with SLDs
almost always hear and see normally, they have trouble understanding what they see or
hear. SLDs can be caused by such conditions as perceptual (visual or auditory) disabilities,
brain injury, minimal brain dysfunction, dyslexia, and developmental aphasia (language
impairment). SLDs do not encompass learning problems that are primarily the result of
visual, hearing, intellectual, or motor disabilities; emotional disturbance; environmental,
cultural, or economic factors; or limited English proficiency. Although the true prevalence
rate of SLDs is not known, it is estimated that approximately 5% of students have an
identified SLD and that as many as an additional 15% of students remain unidentified
(Cortiello, 2014).
An important practice for schools is to identify students with SLDs to ensure that the
students receive appropriate services to meet their educational needs. Historically, the
most common method schools have used to identify students with SLDs is the ability-
achievement discrepancy model (also called the IQ-achievement discrepancy model), which was
a component of the Individuals with Disabilities Education Act (IDEA) of
1997. This model identifies students as learning disabled when there is a severe discrepancy
between their scores on an ability (i.e., intelligence) test and their scores on an achievement
test. A discrepancy of 1 to 1.5 standard deviations between the scores generally qualifies as
severe. Thus, if a student’s achievement test scores are well below his or her ability test
scores in at least one area (such as reading), then the student can be classified as having a
specific learning disability. There have been many concerns cited about the use of the
ability-achievement discrepancy approach (National Association of State Directors of Spe-
cial Education’s IDEA Partnership, 2007). First, critics have described the ability-achieve-
ment discrepancy approach as a wait-to-fail approach, because intervention is withheld
until a discrepancy can be demonstrated, which often does not occur until the student has
experienced several years of academic failure. Second, the information gathered from the
ability and achievement assessments does not indicate each student’s specific learning
needs. Third, the discrepancy model can create inequitable treatment for students; that is,
disproportionate numbers of students from culturally and linguistically diverse back-
grounds have been identified as having SLDs via the ability-achievement discrepancy
approach. Contemporary cognitive assessment involves more than just standardized test-
ing. Today, evidence-based procedures that include standardized assessment are required
for the identification of SLDs (Decker, Hale, & Flanagan, 2013).
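To make the discrepancy arithmetic concrete, the sketch below assumes that both tests report standard scores with a mean of 100 and a standard deviation of 15 (a common convention, but an assumption here), and it uses a threshold of 1.5 standard deviations, the upper end of the range mentioned above. The function names and the example scores are hypothetical; actual eligibility criteria are set by states and districts and involve more than this single comparison.

```python
SD = 15.0  # assumed standard deviation of the standard-score scale

def discrepancy_in_sd(ability_score, achievement_score, sd=SD):
    """Ability minus achievement, expressed in standard deviation units."""
    return (ability_score - achievement_score) / sd

def is_severe(ability_score, achievement_score, threshold_sd=1.5, sd=SD):
    """True when achievement falls at least threshold_sd SDs below ability."""
    return discrepancy_in_sd(ability_score, achievement_score, sd) >= threshold_sd

# Hypothetical student: ability score of 105, reading achievement score of 80.
print(discrepancy_in_sd(105, 80))  # 25 points / 15, about 1.67 SD
print(is_severe(105, 80))          # meets the assumed 1.5 SD threshold
print(is_severe(105, 95))          # 10 points is well under 1 SD
```

On this assumed scale, a discrepancy of 1 to 1.5 standard deviations corresponds to a gap of 15 to 22.5 standard-score points.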
Source: From Responsiveness to intervention and learning disabilities: A report prepared by the National Joint
Committee on Learning Disabilities. Copyright 2005 by the National Joint Committee on Learning Disabilities
(NJCLD). Reproduced by permission.
students who are not meeting grade-level expectations so that instructional or behavioral
interventions can be implemented. Progress monitoring (using curriculum-based assess-
ment) occurs at Tier 2 to identify students who continue to need assistance and to evaluate
the effectiveness of instructional and behavioral interventions. In Tier 3, diagnostic assess-
ment is conducted as part of a comprehensive evaluation for students who fail to make
sufficient progress in Tier 2 in order to determine their eligibility for special education.
Assessing Giftedness
When considering the concept of giftedness, it is important to note that numerous
definitions exist. Although states may differ in how they define giftedness, there is a federally
legislated definition that guides the conceptualization of giftedness in educational set-
tings. The United States federal government views the gifted and talented classification as
encompassing those students who demonstrate potential for high achievement in a range
of areas (e.g., academics, performance, arts, creativity) and who need support services to
realize their potential (No Child Left Behind Act, 2004). These children demonstrate high
performance capability in intellectual, creative, or artistic areas, as well as strong leader-
ship ability. The National Association for Gifted Children (2008) reported that nearly three
million children (i.e., 5% of the U.S. student population) are considered gifted and tal-
ented. However, this may not be a reliable estimate. Researchers conducting the National
Surveys of Gifted Programs (Callahan, Moon, & Oh, 2014) reported that there was a high
degree of variance in the percentage of identified gifted children from district to district
and that some districts didn’t identify gifted students at all.
Although we can’t identify the exact percentage of gifted children, we do have
research-based approaches for assessment and intervention for that population. Gifted
children require services or activities not typically provided in a regular classroom setting;
thus, most schools have gifted and talented programs that provide unique educational oppor-
tunities for children who have been identified as gifted and talented. These programs can
include modifying or adapting regular curriculum and instruction; accelerated learning
opportunities, such as independent study and advanced placement; opportunities for
subject and grade skipping; and providing differentiated learning experiences. To qualify
for participation in a school’s gifted and talented program, students must meet certain
minimum standards.
The process of assessing students for giftedness usually entails two phases: screening
and identification. To screen for giftedness, most districts conduct annual school-wide
screening using standardized ability or achievement tests. Students may also be referred
or nominated for giftedness screening by their teacher, school counselor, or parent. Stu-
dents must meet a set cutoff score on the screening test to continue to the identification
phase. In the identification stage, further assessment is conducted, focusing on four areas:
1. Cognitive Ability Ability to perform at an exceptionally high level in general intel-
lectual ability, which may be reflected in such cognitive areas as reasoning, memory,
nonverbal ability, and the analysis, synthesis, and evaluation of information
2. Academic Ability Ability to perform at an exceptionally high level in one general
academic area or a few specific academic areas significantly beyond others of one’s
age, experience, or environment
3. Creative Thinking Ability Ability to perform at an exceptionally high level in crea-
tive thinking, as evidenced by creative or divergent reasoning, advanced insight and
imagination, and solving problems in novel ways
4. Visual or Performing Arts Ability Ability to perform at an exceptionally high level
in the visual arts, dance, music, or drama
To assess these four areas, a variety of assessment instruments and strategies may be used
(see Table 14.2). Cognitive ability and academic ability are typically evaluated using
standardized ability tests and standardized achievement tests. To be placed into a gifted
and talented program, students must meet or exceed minimum cutoff scores as defined
by state laws. For example, in some states, students must either score at least two stand-
ard deviations above the mean minus the standard error of measurement or at or above
the 95th percentile on an approved cognitive ability test and an achievement test. For
assessing academic achievement, states may require scores only for specific subtests or
scales of the test, such as reading comprehension, written expression, and mathematics.
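The first cutoff rule described above can be worked through as follows. This is a minimal sketch that assumes a standard-score scale with a mean of 100 and a standard deviation of 15, a standard error of measurement (SEM) of 3 points, and normally distributed scores for the percentile comparison. These values and the function name `gifted_cutoff` are illustrative assumptions; actual values come from the test manual and from state regulations.

```python
from statistics import NormalDist

def gifted_cutoff(mean=100.0, sd=15.0, sem=3.0, n_sd=2.0):
    """Cutoff score: n_sd standard deviations above the mean, minus the SEM."""
    return mean + n_sd * sd - sem

cutoff = gifted_cutoff()  # 100 + 2 * 15 - 3 = 127.0
print(cutoff)
print(128 >= cutoff)      # a hypothetical score of 128 meets this cutoff
print(125 >= cutoff)      # a hypothetical score of 125 does not

# For comparison, the score at the 95th percentile if scores were exactly
# normal on the assumed scale (about 124.7). Published norm tables, not this
# formula, determine percentile ranks in practice.
p95_score = NormalDist(mu=100.0, sigma=15.0).inv_cdf(0.95)
print(round(p95_score, 1))
```

Subtracting the SEM lowers the cutoff slightly, which gives students the benefit of the measurement error that is present in any single test score.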
Creative thinking ability can be assessed by using intelligence tests, general gifted-
ness screening instruments, or instruments designed specifically to appraise creativity. An
the rarity and unusualness of ideas; Elaboration, the amount of detail in the responses;
Abstractness of Titles, the degree to which a picture title is expressed beyond simple labe-
ling; and Resistance to Premature Closure, the degree of psychological openness. Torrance
also developed Thinking Creatively in Action and Movement, a nonverbal movement assess-
ment of creativity; Thinking Creatively with Sounds and Words, which measures an individu-
al’s ability to create images for words and sounds; and Thinking Creatively with Pictures,
which uses picture-based exercises to measure creative thinking.
Visual or performing arts ability is usually demonstrated through a display of work,
a performance, or an exhibition. For example, students may submit portfolios containing
artistic samples; perform a musical recital; provide evidence of placing first, second, or
third in a music or art contest; or perform a series of drama or dance sessions. Trained
examiners evaluate the products or performance using an approved rubric or checklist.
Instruments are also available that were developed specifically for assessing charac-
teristics of giftedness. For example, the Scales for Rating the Behavioral Characteristics of
Superior Students, Revised (SRBCSS; Renzulli et al., 2010) is a widely used rating scale that
can be completed by school personnel who are familiar with the student’s performance. It
comprises 14 scales associated with the characteristics of gifted students: Learning,
Motivation, Creativity, Leadership, Art, Music, Drama, Planning, Communication (Preci-
sion), Communication (Expressiveness), Mathematics, Reading, Science, and Technology.
Score reports include raw scores, grade-level means, and percentile ranks that are calcu-
lated based on the particular group of students taking the test; there are no published
national norms for the scale. Each of the 14 scales represents relatively different sets of
behaviors; therefore, the scores obtained from the separate scales should not be summed to
yield a total score (Renzulli et al., 2010). Test users may select from among the 14 scales
only those scales appropriate to both the purpose of the test and the requirements or needs
of the school; for example, schools may use only the Artistic, Creativity, and Musical scales
as part of their giftedness assessment program. Each scale consists of several items in
which raters evaluate student behaviors using a scale from 1 (never) to 6 (always). The
SRBCSS authors did not establish cutoff scores for identification of gifted children, based
on the assumption that there would be variations in student populations. Instead, they
recommended that test users calculate local cutoff scores. An online version of the SRBCSS
is available, and summary reports include raw scores and percentile ranks. Other instru-
ments designed for identifying gifted and talented students include the Gifted Rating
Scales—School Form (GRS-S), Gifted and Talented Evaluation Scale (GATES), and the Screening
Assessment for Gifted Elementary and Middle School Students (SAGES-2).
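Because the SRBCSS has no published national norms, percentile ranks and cutoff scores are computed within the local group. The sketch below shows one common way to do this. The percentile-rank definition used here (the percentage of the group scoring below a given score, plus half the percentage tied with it), the raw scores, and the 90th-percentile local cutoff are all assumptions chosen for illustration, not procedures specified by the SRBCSS authors.

```python
def percentile_rank(score, group_scores):
    """Percent of the local group scoring below `score`, plus half of those tied."""
    below = sum(1 for s in group_scores if s < score)
    tied = sum(1 for s in group_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(group_scores)

# Hypothetical raw scores on one scale for a local group of 10 students.
local_scores = [22, 25, 27, 30, 31, 33, 35, 38, 40, 44]

ranks = {score: percentile_rank(score, local_scores) for score in local_scores}

local_cutoff = 90.0  # assumed local cutoff, expressed as a percentile rank
flagged = [score for score, rank in ranks.items() if rank >= local_cutoff]

print(ranks[44])  # 9 below plus half of 1 tied, out of 10 -> 95.0
print(flagged)    # raw scores at or above the assumed local cutoff
```

Because each scale represents a different set of behaviors, a calculation like this would be repeated separately for each scale a school chooses to use rather than applied to a summed total.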
Standardized assessment instruments are useful for screening and identifying giftedness
in students and for designing programs and services based on students’ needs. Despite
their potential usefulness, tests also have limitations. The technical inadequacy of tests can
result in bias against certain populations of gifted students, especially those from racial,
cultural, and ethnic minority populations, those from low socioeconomic environments,
and those with disabilities or for whom English is a second language. Thus, although
standardized tests are an important part of the assessment process, careful attention must
be given to the selection of appropriate assessment instruments when assessing underserved
gifted students. Furthermore, a score from a single instrument should never be the sole
basis for placement decisions for gifted and talented programs. In order to further your
understanding of assessment for giftedness, complete Exercise 14.1 and then respond to
the exercise questions.
Exercise 14.1
Assessment of Giftedness
Gayle began taking piano lessons at age 4. By the time she was 10 years old, she was
a proficient pianist, had competed in several statewide youth piano competitions,
and had recently won the prestigious national Chopin Youth Piano Competition.
Gayle has already decided that she wants to be a professional musician.
In school, Gayle performs well academically. Her fifth grade teacher believes
that Gayle’s musical ability as well as her high level of academic performance
demonstrates gifted characteristics, so the teacher decides to refer Gayle to be
screened for the gifted and talented program. Gayle passed the screening test and
was assessed for identification of giftedness. In Gayle’s school, to qualify for the
gifted and talented program students have to meet specific criteria for superior
cognitive ability, specific academic ability, creative thinking ability, and visual or
performing arts ability, as indicated in the following table:
response opportunities (Holland, 1997). Counselors look for ways that different environ-
mental arrangements and conditions force accommodations in behavior and study the
individual/environment interactions. Person-environment fit has wide application and is
especially important when working with preschool children and students with disabilities
who are placed in regular classrooms.
Several assessment instruments have been developed to measure aspects of the classroom
and school environment at all levels. Instruments measure the perceptions of various
groups, such as parents, teachers, students, and administrators. Some of the major instru-
ments are as follows:
• The Classroom Environment Scale, Third Edition (CES; Trickett & Moos, 1995) assesses
perceptions of the learning environment of middle and high school classrooms. The
instrument evaluates the effects of course content, teaching methods, teacher personal-
ity, class composition, and characteristics of the overall classroom environment. The
student form contains 90 true/false items and takes 20 to 30 minutes to complete. Students
rate their perceptions of the classroom climate using a 5-point scale from 1
(almost never) to 5 (very often). There are nine subscales grouped into three major
dimensions: relationship, personal growth/goal orientation, and system maintenance
and change. The subscales are involvement, affiliation, teacher support, task orienta-
tion, competition, order and organization, rule clarity, teacher control, and innovation.
• The Effective School Battery (ESB) assesses school climate and provides a portrait of the
attitudes and other characteristics of a school’s students and teachers. It measures and
reports on school safety, staff morale, administrative leadership, fairness and clarity of
school rules, respect for students, classroom orderliness, academic climate, school
rewards, student educational expectations, attachment to school, and other aspects of
school climate as reflected in teachers’ and students’ perceptions, behavior, and attitudes.
• The School Environment Preference Survey (SEPS) measures work-role socialization as
it occurs in the traditional school. SEPS has four scales: self-subordination, traditional-
ism, rule conformity, and uncriticalness. The test is helpful in planning instructional
strategies for students or as an aid for placement in alternative learning environments.
• The Responsive Environmental Assessment for Classroom Teaching (REACT) evaluates
student perceptions of the classroom environment. The instrument yields a
single factor score (i.e., classroom teaching environment) and six subscale scores: pos-
itive reinforcement, instructional presentation, goal setting, differentiated instruction,
formative feedback, and instructional enjoyment (Nelson, Demers, & Christ, 2014).
school report card) according to their students’ test scores, with the implication that a
school’s ranking reflects the effectiveness or quality of teaching.
Proponents of NCLB believe that testing assists in identifying and providing resources
for children who need help, holds schools accountable for student progress, and provides
parents with choices when their school ranking falls to unacceptable levels. In 2006,
the U.S. Department of Education reported that since NCLB’s inception, student
achievement had risen across America. However, many critics believe the act is failing;
they identify the following concerns (Adrianzen, 2010; Frey, Mandlawitz, & Alvarez, 2012;
Kieffer, Lesaux, & Snow, 2008; Ott, 2008; VanCise, 2014):
• School curriculum has narrowed to basic reading, writing, and arithmetic, excluding
subjects (e.g., art, music, social sciences, physical education) that are not tested.
• Schools with children from diverse backgrounds or who have diverse learning skills
are being penalized.
• High-stakes testing has created an atmosphere of greed, fear, and stress in schools,
none of which contribute to learning.
• Extremely high stakes encourage schools to cheat and to urge low-achieving students
to drop out.
• School counselors spend more time coordinating achievement test administration,
reducing their ability to provide services to students, teachers, and administrators.
• There is no consistency among states as to which standardized achievement tests
they use, which means that there is no way to compare the performance of students
from state to state.
The American Educational Research Association (AERA) is the leading organization
that studies educational issues. The organization has acknowledged that although policy-
makers instituted high-stakes tests with the good intention of improving education, they need
to carefully evaluate the tests’ potential to cause serious harm. For example, policymakers and
the public may be misled by spurious test score increases unrelated to any educational
improvement; students may be placed at increased risk of educational failure and dropping
out; teachers may be blamed or punished for inequitable resources over which they have no
control; and curriculum and instruction may be severely distorted if high test scores per se,
rather than learning, become the overriding goal of classroom instruction (American Educa-
tional Research Association (AERA), 2000). The organization recommends validating test
scores and individual uses, providing resources and opportunities for students to learn the
materials, and explicitly stating probable negative consequences as well as the rules used to
determine which individuals are tested. The association’s full policy on high-stakes testing is
available online at
aera.net/AboutAERA/AERARulesPolicies/AERAPolicyStatements/PositionStatementonHighStakesTesting/tabid/11083/Default.aspx.
Although NCLB has been due for reauthorization, debate has continued
in Congress over changes to the law. Partisan politics have been a continuing issue in
changing the law to address some of the concerns identified by AERA.
preparation and performance strategies that students can use to increase their test
scores. These strategies can include coaching, increasing test wiseness, and reducing
test anxiety.
Coaching
Coaching is a method used by administrators, teachers, and counselors to help test tak-
ers improve their test performance. Although there is no universally accepted defini-
tion for coaching, the term is popularly used to mean training test takers to answer
specific types of questions and provide the information required by a specific test
(Hardison & Sackett, 2008). Coaching programs usually focus on test familiarization,
drill and practice on sample test items, or subject matter review. Coaching programs are
provided through classes offered in public schools; private classes; private tutors; and
test books, software programs, or videos. Probably the best example of a coaching pro-
gram is an SAT prep class. Thousands of high school students take SAT prep courses
each year, hoping to increase their scores on the college entrance exam. The results from
a recent nationwide study showed that students who took private SAT prep classes
scored an average of 60 points higher on their SAT tests compared to students who
didn’t take the classes (Grabmeier, 2006).
Of significance are concerns about the social, philosophical, and ethical aspects of
coaching. Research studies have found that students from less-advantaged families—those
with lower family incomes and with parents who have less education and lower-level
jobs—are less likely to use any form of test-preparation program (Grabmeier, 2006).
Test-Wiseness
Test-wiseness refers to an individual’s ability to utilize the characteristics and formats of a
test to receive a high score. Test-wiseness is independent of the student’s knowledge of the
subject matter that the test is designed to measure. Skills involved in test-wiseness include
strategies for time use, error avoidance, guessing, and use of deductive reasoning.
Strategies to increase test-wiseness include becoming familiar with the test before
test day. It is always best to know as much as possible about what to expect before arriving
at the test center. Once students know what to expect on the test, they should practice tak-
ing the test. In general, test takers feel more knowledgeable and have less anxiety when
they receive instructions about how to take a test. This reduces errors caused by unfamili-
arity with test procedures and leads to scores that better reflect an examinee’s knowledge
and abilities.
Special strategies apply to each type of item format. For example, on multiple-choice
tests, test takers should examine carefully all of the options or responses before attempting
to choose the correct answer. If the student stops when she sees a correct answer—say,
option A—then she could miss reading options B, C, D, and E, which might also be correct.
Some options on multiple-choice items may be similar and vary only slightly; usually,
these options can be eliminated. The examinee has a better chance of getting a higher score
if options known to be incorrect can be eliminated and the choice is made from among the
remaining alternatives. Sometimes, an option resembles the stem—that is, it uses the same
names, words, or phrases. Usually, such options should be selected. Correct answers are
often longer and perhaps stated more precisely or specifically than the other alternatives.
Chapter 14 • Assessment in Education 351
Simmonds, Luchow, Kaminsky, and Cottone (1989) designed the SPLASH test-taking
strategy for multiple-choice tests. SPLASH is an acronym that stands for the following:
1. Skim the Test. Skim the entire test to get a general idea of the number of items, the
types of questions, and areas of proficiency and deficiency.
2. Plan Your Strategy. This includes knowing time constraints of the test and where to
begin.
3. Leave Out Difficult Questions. Students should leave difficult questions for the end.
4. Attack Questions You Know. Students should first answer all questions they are sure of.
5. Systematically Guess. After completing all questions they know, students should
make their best guess on the questions they don’t know.
6. House Cleaning. A few minutes should be left prior to the end of the exam to fill in all
answers, double-check forms, and clean up erasures.
Test Anxiety
With the increased use of high-stakes tests throughout K–12 education, test anxiety has
become a common issue confronted by school counselors. Test anxiety is the general feeling of
uneasiness, tension, or foreboding that some individuals experience in testing situations. Test
anxiety can cause a host of problems in students, such as upset stomach, headache, loss of
concentration, fear, irritability, anger, and even depression. Students with test anxiety typi-
cally worry about not doing well on a test. This mind-set inhibits their ability to absorb, retain,
and recall information. Students with low test anxiety, on the other hand, do not worry and
are able to focus on their test performance. There are some controversies about test anxiety.
Some researchers debate the idea that anxiety interferes with test performance. In other words,
there is some belief that those with lesser ability would have a higher incidence of anxiety and
that deficits in test performance are based on skill rather than anxiety. Sommer and Arendasy
(2014) concluded that test anxiety is a situation-specific trait based on existing deficits.
There are few published inventories with strong psychometric properties that assess
test anxiety. Only one specific instrument, the Test Anxiety Profile, is reviewed in the Men-
tal Measurements Yearbook, and the reviews suggest extreme caution in using this instru-
ment. Other instruments, such as the Test Anxiety Inventory (TAI) and the Test Anxiety
Scale for Children (TASC), are available from various publishers, but we also suggest cau-
tion in using these inventories. The TAI consists of 20 items in which respondents are
asked to report how frequently they experience specific symptoms of anxiety before, dur-
ing, and after examinations using a 4-point scale from 1 (almost never) to 4 (almost always).
The TASC is a 30-item instrument that assesses test anxiety, remote school concerns, poor
self-evaluation, and somatic signs of anxiety.
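The arithmetic implied by a 20-item, 4-point format such as the TAI's can be sketched briefly. The sketch below assumes a simple unweighted sum of item ratings and ignores any reverse-keyed items; the function name and the validation rules are hypothetical and are not the publisher's scoring procedure, which should always be taken from the test manual.

```python
# Hypothetical scoring helper for a 20-item inventory rated on a 4-point scale.
# Assumption: the total is an unweighted sum; reverse-keyed items are ignored.
N_ITEMS = 20                      # number of items described in the text
MIN_RATING, MAX_RATING = 1, 4     # 1 = almost never, 4 = almost always


def total_score(responses):
    """Return the sum of the item ratings after checking count and range."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for rating in responses:
        if not (isinstance(rating, int) and MIN_RATING <= rating <= MAX_RATING):
            raise ValueError(f"rating out of range: {rating!r}")
    return sum(responses)


# Under these assumptions the possible totals run from 20 to 80.
lowest = total_score([MIN_RATING] * N_ITEMS)
highest = total_score([MAX_RATING] * N_ITEMS)
print(lowest, highest)  # prints: 20 80
```

A total near the top of that range would indicate frequent anxiety symptoms, but interpretation requires the instrument's norms and the cautions noted above.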
Students should be motivated to do their best on tests but should not be made anx-
ious. Sometimes, however, pressure comes not from the counselor or teacher but from the
parents. Currently in our educational system, tests have assumed too much importance in
some states. They are used as the sole criterion to judge whether a student should be pro-
moted to the next grade or allowed to move from one level to another. Counselors should
consider these strategies in test administration:
1. Make sure students understand test instructions; check with them, asking whether
they understand. In a group test, circulate around the room to see if students are fol-
lowing directions and recording their answers properly.
Relaxation exercises are often used in reducing test anxiety. The object of relaxation
exercises is to help students practice mind calming. The following is an example of what a
counselor might say before a test:
Sit down and get very comfortable. Close your eyes and take a deep breath.
Exhale and let your body and mind relax completely. Breathe in again, and as
you breathe out, feel even more relaxed. Forget about everything except what I
am saying. Listen carefully. Continue to breathe deeply and slowly. You should
begin to feel more and more relaxed.
You are sitting in a lounge chair on the beach. It is not too warm or too
cold. The temperature is just right. Everything is very peaceful and pleasant.
You see the waves coming onto the beach. They are a beautiful blue, and the
sun is a brilliant yellow. You feel nice and warm and relaxed all over. Take a
deep breath in the nice clear air. You lose track of time. The sky becomes a
deeper blue.
Now that you are relaxed, think positively of yourself. Say, “I can remem-
ber all I need to know on the test.” Say it several times. Say, “I will know the
right answers.” Say, “I am alert; my mind is powerful.”
Summary
Assessment instruments and strategies are used extensively throughout children’s educational
experience and play a major role in their life decisions, ranging from grade promotion and
graduation to college admission and entry into certain vocational or educational training
programs. School counselors play an active role in school assessment programs and may
administer, score, and interpret various assessment instruments; communicate assessment
results to students, parents, teachers, and administrators; or coordinate an entire assessment
program. Counselors need knowledge about the process of designing school assessment programs,
the various types of assessment instruments and strategies used, and the assessment activities
they will most often engage in. They should also be aware of other assessment-related issues in
education, such as assessing giftedness, assessing learning disabilities, test preparation and
performance, environmental assessment, and high-stakes testing.
Suggested Activities
1. Interview a school counselor to find out what tests are part of that school’s assessment
program. Report your findings to the class or in a written paper.
2. Write a position paper on one of the issues related to school assessment programs.
3. Prepare an annotated bibliography of sources to help an individual prepare to take one of
the major college admissions tests.
4. Administer a test designed to measure test anxiety, and discuss the results with the class.
5. Read the following brief cases and answer the questions at the end of each.
Case of Kent
Kent needs to take the SAT to complete his file so that he can be considered for admission to
the college for which he is applying. He says, “I don’t have time to go to the test prep
sessions at school. It’s just an aptitude test, and either I have it or I don’t.”
a. Do you agree with Kent?
b. What approach would you use to help him?
Case of Letitia
Letitia, a high school senior, is taking an honors class in chemistry. She has a 3.0 grade
point average on a 4-point system. Whenever she has a test in class, she becomes nauseous and
faints. She does not report the same feeling when she takes tests in other courses.
a. What do you think is Letitia’s problem?
b. What approach would you use to help her?
References
Adrianzen, C. A. (2010). A critical examination of equity issues associated with the implementation of the Title III components of the No Child Left Behind Act on bilingual students. (70), ProQuest Information & Learning, US. Retrieved from http://libproxy.lamar.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=psyh&AN=2010-99011-429&site=ehost-live Available from EBSCOhost psyh database.
American Educational Research Association (AERA). (2000). AERA position statement on high-stakes testing in pre-k–12 education. Retrieved from http://www.aera.net/?id=378
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Authors.
American School Counselor Association (ASCA). (2012). The ASCA national model: A framework for school counseling programs (3rd ed.). Alexandria, VA: Author.
American School Counselor Association (ASCA) & Association for Assessment in Counseling (AACE). (2000). Competencies in assessment and evaluation for school counselors. Alexandria, VA: Author.
Caldwell, J. S. (2008). Reading assessment: A primer for teachers and coaches (2nd ed.). New York, NY: Guilford.
Callahan, C. M., Moon, T. R., & Oh, S. (2014). National surveys of gifted programs: Executive summary. Charlottesville, VA: University of Virginia.
Cortiella, C., & Horowitz, S. H. (2014). The state of learning disabilities: Facts, trends, and emerging issues (3rd ed.). New York, NY: National Center for Learning Disabilities. Retrieved from http://www.ncld.org/types-learning-disabilities/what-is-ld/state-of-learning-disabilities
Dahir, C. A., & Stone, C. (2011). The transformed school counselor. Boston, MA: Cengage Learning.
Decker, S. L., Hale, J. B., & Flanagan, D. P. (2013). Professional practice issues in the assessment of cognitive functioning for educational applications. Psychology in the Schools, 50(3), 300–313.
Ekstrom, R. B., Elmore, P. B., Schafer, W. D., Trotter, T. V., & Webster, B. (2004). A survey of assessment and evaluation activities of school counselors. Professional School Counseling, 8, 24–30.
Frey, A. J., Mandlawitz, M., & Alvarez, M. E. (2012). Leaving NCLB behind. Children & Schools, 34(2), 67–69.
Grabmeier, J. (2006, August 7). SAT test prep tools give advantage to students from wealthier families. Retrieved from http://researchnews.osu.edu/archive/satprep.htm
Hardison, C. M., & Sackett, P. R. (2008). Use of writing samples on standardized tests: Susceptibility to rule-based coaching and the resulting effects on score improvement. Applied Measurement in Education, 21(3), 227–252.
Holland, J. L. (1997). Making vocational choices: A theory of vocational personalities and work environments (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Kieffer, M. J., Lesaux, N. K., & Snow, C. E. (2008). Promises and pitfalls: Implications of NCLB for identifying, assessing, and educating English language learners. In G. L. Sunderman (Ed.), Holding NCLB accountable: Achieving accountability, equity, & school reform (pp. 57–74). Thousand Oaks, CA: Corwin Press.
Maras, M. A., Coleman, S. L., Gysbers, N. C., Herman, K. C., & Stanley, B. (2013). Measuring evaluation competency among school counselors. Counseling Outcome Research and Evaluation, 4(2), 99–111.
McCarthy, C., Van Horn Kerne, V., Calfa, N. A., Lambert, R. G., & Guzmán, M. (2010). An exploration of school counselors’ demands and resources: Relationship to stress, biographic, and caseload characteristics. Professional School Counseling, 13(3), 146–158.
National Association for Gifted Children. (2008). Definitions of giftedness. Retrieved from http://www.nagc.org/index.aspx?id=574
National Association of State Directors of Special Education’s IDEA Partnership. (2007). Dialogue guides: What is the IQ-achievement discrepancy model? Alexandria, VA: Author.
National Joint Committee on Learning Disabilities (NJCLD). (2005). Responsiveness to intervention and learning disabilities. Retrieved from http://www.ldonline.org/about/partners/njcld#reports
Nelson, P. M., Demers, J. A., & Christ, T. J. (2014). The Responsive Environmental Assessment for Classroom Teaching (REACT): The dimensionality of student perceptions of the instructional environment. School Psychology Quarterly, 29(2), 182–197.
Renzulli, J. S., Smith, L. H., White, A. J., Callahan, C. M., Hartman, R. K., Westberg, K. L., . . . Sytsma, R. E. (2010). Scales for Rating the Behavioral Characteristics of Superior Students: Technical and administration manual (Rev. ed.). Mansfield Center, CT: Creative Learning Press, Inc.
Simmonds, E. P. M., Luchow, J. P., Kaminsky, S., & Cottone, V. (1989). Applying cognitive learning strategies in the classroom: A collaborative training institute. Learning Disabilities Focus, 4, 96–105.
No Child Left Behind Act. (2004). P.L. 107–110 (Title IX, Part A, Definition 22, 2002); 20 USC 7801(22).
Ott, E. M. (2007). Schools Left Behind: Statistical Issues with NCLB (No Child Left Behind). ProQuest.
Sink, C. A. (2009). School counselors as accountability leaders: Another call for action. Professional School Counseling, 13(2), 68–74.
Sommer, M., & Arendasy, M. E. (2014). Comparing different explanations of the effect of test anxiety on respondents’ test scores. Intelligence, 42, 115–127.
Strong, M., Gargani, J., & Hacifazlioğlu, Ö. (2011). Do we know a successful teacher when we see one? Experiments in the identification of effective teachers. Journal of Teacher Education, doi: 10.1177/0022487110390221
Torrance, E. P. (1974). Torrance Tests of Creative Thinking, figural form A. Bensenville, IL: Scholastic Testing Service.
Trickett, E., & Moos, R. (1995). Classroom Environment Scale manual (3rd ed.). Redwood City, CA: Mind Garden.
U.S. Department of Education. (1994). National excellence: A case for developing America’s youth. Washington, DC: U.S. Government Printing Office.
VanCise, S. A. (2014). A descriptive study of the impact of the highly qualified teacher requirement of NCLB on the attrition of special education personnel. Dissertation Abstracts International Section A, 74.
Young, A., & Kaffenberger, C. (2011). The beliefs and practices of school counselors who use data to implement comprehensive school counseling programs. Professional School Counseling, 15(2), 67–76.
CHAPTER 15
Assessment Issues with Diverse Populations
Fairness in testing is an essential component of the Standards for Educational and Psychologi-
cal Testing. According to the AERA et al. standards (American Educational Research Asso-
ciation (AERA), American Psychological Association (APA), & National Council on
Measurement in Education (NCME), 2014), the concept of fairness to all test takers is foun-
dational to the assessment process and deserves specific attention and evaluation. Fairness
in testing is a complex construct that includes issues related to measurement bias, acces-
sibility, universal design, responsiveness to individual characteristics and testing contexts,
treatment during the testing process, access to constructs being measured, and validity of
individual test score interpretation for the intended test uses (AERA et al., 2014). Counse-
lors who conduct assessment must do so with consideration of the characteristics of diverse
subgroups of people (e.g., race, gender, ability, language).
Cultural competence is critical for all professional counselors. Thus, counseling
professionals who use assessment strive to ensure fair and equitable treatment of diverse
populations. Diverse populations can be defined as “persons who differ by race, ethnic-
ity, culture, language, age, gender, sexual orientation, religion, and ability” (Association
for Assessment in Counseling (AAC), 2003). In this chapter, we will divide our discus-
sion of assessing diverse populations into two sections: (1) multicultural assessment,
which will refer to the competencies required for assessing individuals from various
cultural groups that are distinguished by race, ethnicity, age, language, gender, sexual
orientation, religious or spiritual orientation, and other cultural dimensions, and (2)
assessment of individuals with disabilities, which will center on the competencies and
standards necessary for assessing individuals with significant limitations in physical or
cognitive functioning.
Chapter 15 • Assessment Issues with Diverse Populations 357
• Define disability, and explain standards for assessing individuals with disabling
conditions.
• Describe assessment issues for individuals with visual impairments, hearing
Multicultural Assessment
Cultural diversity among the U.S. population continues to grow. In 2014, the U.S. Census
Bureau reported that as many as one-third of the total 316 million U.S. residents now
claim “minority” heritage. Undoubtedly, counselors and other helping professionals will
work with individuals with differing cultural backgrounds, customs, traditions, and val-
ues. As such, counselors need to be prepared to work with individuals from diverse back-
grounds and must recognize and appreciate the differences that exist among people and
clients. Multicultural assessment refers to competencies and standards necessary for assess-
ing individuals who differ on various aspects of cultural identity, such as race, ethnicity,
age, language, gender, sexual orientation, religious or spiritual orientation, and other cul-
tural dimensions. Each cultural dimension has unique issues and concerns. Thus, to be
effective, counselors must possess a depth of knowledge concerning the culture of clients
as well as an awareness of available resources for acquiring information about persons of
diverse cultures (Huey, Tilley, Jones, & Smith, 2014).
Standardized tests are frequently used in the assessment process, and many profes-
sionals rely on the scores of standardized tests to make important decisions or inferences
about a client. Nonetheless, the use of standardized tests has been criticized for being
biased and unfair for use with diverse populations. Multicultural assessment focuses on
the extent to which assessment instruments and procedures are appropriate, fair, and use-
ful for accurately describing abilities and other traits in individuals from a given culture.
We will focus our discussion on several issues pertinent to using standardized tests in
multicultural assessment, including test bias, psychometric considerations, test-taker fac-
tors, examiner bias, etic and emic perspectives, and acculturation.
Measurement Bias
According to the Standards for Educational and Psychological Testing (AERA et al., 2014), one
of the principal threats to fairness is measurement bias. There are two primary compo-
nents of measurement bias: accessibility and universal design. These two components of
measurement bias are evolutions of concepts from earlier versions of test standards. The
concept of accessibility relates to the extent to which all individuals are given an equal
chance to demonstrate their capabilities or level of functioning during the assessment pro-
cess. For example, imagine that a client who speaks English as a second language is taking
the MMPI-2-RF as part of your clinical assessment process. It would be difficult to use the
results of the assessment instrument to accurately formulate a diagnosis if the client was
not proficient enough in English to fully understand the questions.
Universal design is purposeful test design aimed at maximizing accessibility for the
population of intended test takers. The process of universal design involves a test devel-
oper giving consideration to the various processes that may impede access to a test and
attempting to minimize any challenges that may be related to those processes. More sim-
ply stated, test developers must put themselves in the shoes of test takers and imagine the
various facets of a test that might block performance unnecessarily. For example, imagine
that you are taking a scenario-based exam aimed at evaluating your ability to accurately
diagnose an individual client. In the exam, you are given a case study and a time limit of
20 minutes to arrive at a diagnosis. What relevance does 20 minutes have to the ability to
diagnose? Is a 20-minute time limit part of competency in diagnosis, or does the time limit
create construct-irrelevant variance (i.e., an incorrect inflation or deflation of test scores
due to measurement error)?
When some aspect of an assessment instrument does not allow for accessibility or is
not designed for universal access, the instrument is viewed as being biased. Measurement
bias can be defined as “construct underrepresentation or construct-irrelevant components
of tests that differentially affect the performance of different groups of test takers and con-
sequently the reliability/precision and validity of interpretations and uses of their test
scores” (AERA et al., 2014, p. 216). In other words, a test is labeled as biased when indi-
viduals with the same ability perform differently on the test because of their affiliation
with a particular group. It’s important to note that test bias is not the same as group differences;
when groups differ on test results in a way that is consistent with the research, that difference is
evidence of the validity of test results (e.g., individuals with depression scoring higher on the
Beck Depression Inventory). Fairness in measurement is a new way of conceptualizing
measurement bias. According to AERA et al., fairness in measurement quality can be
examined from three different perspectives: fairness as the lack or absence of measurement
bias, fairness as access to the construct being measured, and fairness as validity of the indi-
vidual test score interpretation.
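The idea that bias appears when test takers of the same ability perform differently because of group membership can be illustrated with a rough screening sketch. The code below groups responses to a single item by an ability band (for example, a total-score range) and compares the proportion answering correctly in each group within the same band. The data layout, function names, group labels, and the 0.20 threshold are all hypothetical choices made for illustration; formal analyses of item bias rely on established statistical procedures rather than a simple gap like this one.

```python
from collections import defaultdict


def item_pass_rates(records):
    """Return {ability_band: {group: proportion_correct}} for one item.

    Each record is (ability_band, group, correct). The ability band is the
    matching variable, so comparisons are made among test takers of similar
    ability (hypothetical data layout).
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, n]
    for band, group, correct in records:
        cell = counts[band][group]
        cell[0] += int(correct)
        cell[1] += 1
    return {
        band: {g: c / n for g, (c, n) in groups.items()}
        for band, groups in counts.items()
    }


def flag_bands(rates, threshold=0.20):
    """List ability bands where matched groups differ by more than threshold."""
    flagged = []
    for band, by_group in rates.items():
        if len(by_group) >= 2:
            gap = max(by_group.values()) - min(by_group.values())
            if gap > threshold:
                flagged.append(band)
    return sorted(flagged)


# Made-up responses to one item; groups "A" and "B" are placeholders.
records = [
    ("low", "A", 1), ("low", "A", 0), ("low", "B", 1), ("low", "B", 0),
    ("high", "A", 1), ("high", "A", 1), ("high", "B", 1), ("high", "B", 0),
]
rates = item_pass_rates(records)
print(flag_bands(rates))  # prints: ['high']
```

In this made-up data the two groups perform identically in the low band, but in the high band group A answers correctly twice as often as group B, so that band is flagged for closer review. A flag of this kind is only a prompt for further study of the item, not proof of bias.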
sled, others from Tennessee or Appalachia may consider the term as a type of knit cap. Still
others from Arizona may have never been exposed to the term. Performance on this item
may simply be a result of geographic differences rather than intelligence.
If a test is full of items that result in different groups of equal standing having overall
differences in test scores, then there may be bias related to differential test functioning.
This type of bias may also result from a lack of clarity of test instructions or from faulty
scoring procedures. For example, for non-English-speaking individuals, the failure to per-
form on a task may simply be the result of not understanding test directions.
Fairness in Access to the Construct as Measured. When examining a test for fairness,
it is important to determine if the population for which the test is intended has equal
opportunity to access the construct being measured. Measurement bias can occur when
the knowledge, skills, and abilities needed to access the test differ from those being meas-
ured in the test. For example, imagine that a state mandates the use of the paper-and-
pencil version of the MMPI-2 for determination of fitness for the death penalty in a capital
murder case. Having reviewed the MMPI-2 in Chapter 12, you might see this as a reason-
able measure considering the amount of research available, the scales that measure malin-
gering, and the level of training involved in accurate interpretation of scores. However, the
reading level of the MMPI-2 is fifth grade, and many individuals on death row are consid-
ered illiterate. Providing an audio version of the instrument might result in a more valid
picture of personality than the paper-and-pencil version.
additional time. It is important to note that there are many other accommodations that
could be made to increase fairness in the interpretation of test scores in this scenario. The
key is for testing coordinators to consider these factors and to take appropriate steps to
increase fairness.
Baruth and Manning (2011) identified several barriers to effective multicultural coun-
seling that may be applied to the issue of examiner bias:
1. Differences in class and cultural values between counselor and client.
2. Language differences between counselor and client.
3. The counselor believes stereotypes about culturally different people.
4. The counselor fails to understand his or her own culture.
5. Lack of counselor understanding about the client’s reluctance and resistance in
counseling.
6. Lack of understanding of the client’s worldviews.
7. Labeling cultural groups as mentally ill when their behaviors vary from so-called
norms.
8. Expecting all clients to conform to the counselor’s cultural standards and expectations.
In addition, the Standards for Educational and Psychological Testing (AERA et al., 2014)
explain that in educational, clinical, and counseling situations, test users should not
attempt to evaluate test takers whose special characteristics—such as age, disability, or
linguistic, generational, or cultural backgrounds—are outside the range of their academic
training or supervised experience.
Screening Test. The MMPI is also on this list; it has been translated into 150 languages and
has reported applications in 50 countries. The other tests listed have also been translated
into other languages, mainly Spanish.
Emic methods include behavior observations, case studies, studies of life events, pic-
ture story techniques, inkblot techniques, word association, sentence-completion items,
and drawings. Most of these methods are classified as projective. These tests can provide a
personality description of the individual that reflects the data and mirrors the culture and
ethnic group (Dana, 2005). The analysis requires the examiner to have more knowledge of
the culture but aids the understanding of the individual in a cultural context. Thematic
Apperception Test versions include the Tell Me a Story Test, designed for Spanish-speaking
populations, and the Thompson Modification of the Thematic Apperception Test, a 10-card ver-
sion of the TAT for African Americans. Sentence-completion methods have also been used,
and items can be designed to assess the social norms, roles, and values of clients from dif-
ferent cultures. Here are examples of some of the items:
The thing I like most about America is ____________________________________.
Anglos ________________________________________________________________.
If I could be from another culture or ethnic group, __________________________.
Acculturation
The degree of acculturation to the dominant society and the extent to which the original
culture has been retained provide valuable information in interpreting assessment results.
Assessing acculturation can help assessment professionals understand the unique chal-
lenges and transformations experienced by racial and ethnic minorities exposed to a new
culture (Zhang & Tsai, 2014). An examiner can get an idea of whether the client is identify-
ing with a culture of origin rather than with the dominant Anglo-American culture by
asking questions such as those shown in Figure 15.2. In addition, several instruments are
available that evaluate cultural and racial identity, including the following:
• Acculturation Rating Scale for Mexican Americans (ARSMA-II)
• African American Acculturation Scale (AAAS)
• The East Asian Acculturation Measure (EAAM)
• Bicultural Involvement Questionnaire (BIQ)
• Cross Racial Identity Scale (CRIS)
• Multigroup Ethnic Identity Measure (MEIM)
• Racial Identity Attitude Scale (RIAS)
Strategies
Because each client is unique, the examiner needs to be alert to important behavioral signals
during the assessment process. Certain behaviors may affect the reliability and validity of
the assessment procedure. Appropriate examiner responses are summarized in Table 15.1.
spoke a language other than English (Ryan, 2013). Because the diversity of language is now
becoming a clear trend in the United States, it is imperative that U.S.-based assessment
practices include a more global perspective and attend to language-based issues in the
assessment process.
Tests written in English may become tests of language proficiency for these individu-
als, rather than measures of other constructs. Of course, it is sometimes important and
necessary to have tests that measure English proficiency—especially for educational
assessment and placement. Tests not meant to measure proficiency in English are some-
times translated into the appropriate native language. However, there may be problems in
translation, and the content and words might not be appropriate or meaningful to the
group being tested.
The Standards for Educational and Psychological Testing (AERA et al., 2014) includes
several standards relating to the testing of linguistically diverse populations. A number of
standards are intended primarily for the test authors and publishers, but should be taken
into consideration by the test user. If the test is modified for individuals with limited Eng-
lish proficiency, then the changes should be presented in the manual. The reliability and
validity of the test for the intended linguistic group are also important. If two versions of
dual-language tests exist, then evidence of the comparability of the forms should be
included. The Standards also caution users not to require a greater level of English profi-
ciency for the test than the job or profession requires—a significant concern in developing
and selecting employment, certification, and licensing examinations. Another standard
cautions test users to not judge English-language proficiency on the basis of test informa-
tion alone. Many language skills are not adequately measured by multiple-choice exami-
nations. The examiner needs to use observation techniques and perhaps informal checklists
to assess proficiency more completely.
Chapter 15 • Assessment Issues with Diverse Populations 363
Improvement Act of 2004 (IDEA) lists 13 specific disability categories under which chil-
dren may qualify for special education and related services (see Table 15.2).
Although not listed in the IDEA disability categories, the term developmental disabilities
is widely used to describe a diverse group of severe, lifelong disabilities attributable to
mental and/or physical impairments, which manifest prior to age 22. People with devel-
opmental disabilities have problems with major life activities, such as language, mobility,
learning, self-care, and independent living. Autism, cerebral palsy, hearing loss, intellec-
tual disabilities, and vision impairment are examples of developmental disabilities.
Assessment of individuals with disabilities is conducted for a variety of reasons: to
diagnose or determine the existence of disability, to determine intervention plans, for place-
ment or selection decisions, and/or for monitoring performance in an educational setting
(AERA et al., 2014). With regard to ability/disability assessment, instruments are often
designed to measure skills that should be mastered at certain stages across the life span:
• Communication Skills Verbal and nonverbal, receptive and expressive, listening
and comprehension
• Cognitive Skills Reasoning, thinking, memory; basic achievement in reading, writ-
ing, and mathematics; problem solving
• Physical Development General growth; motor and sensory; balance, locomotion,
walking
• Emotional Development Temperament, adjustment, emotional expression, self-
concept, attitudes
• Social Development Peer and family relationships, friendships, interpersonal rela-
tionships
• Self-Care Skills Basic self-care needs, such as drinking, eating, toileting, dressing
• Independent Living Skills Functioning independently in the home and community,
including clothing care, cooking, transportation, shopping, money management
• Work Habits and Adjustment Skills Working independently, maintaining proper
work habits, working with employers and employees, seeking and keeping jobs
• Adjustment Problems Aggression, hyperactivity, acting out, withdrawal, delin-
quency, stress, depression
A major issue in assessing individuals with disabilities concerns the use of accom-
modations or modifications that minimize the impact of the individual’s attributes that are
not relevant to the construct that is being measured. Most assessment instruments are
designed for use with the general population and may not be appropriate for use with
individuals with specific disabilities. For example, a person who is blind and reads only
Braille cannot complete a traditional written exam. Thus, modification of tests and test
administration procedures is necessary to make an accurate assessment. Examples of test-
modification strategies include the following (AERA et al., 2014):
• Modifying the presentation form
• Modifying the response format
• Modifying timing
• Modifying test setting
• Using only portions of a test
• Using substitute or alternate tests
The Standards for Educational and Psychological Testing (AERA et al., 2014) established sev-
eral standards for instruments designed to assess people with disabling conditions. Test
developers are required to have psychometric expertise and experience in working with
individuals with disabilities, and test publishers should caution examiners about the use
and interpretation of the test with clients with special needs until it has been fully vali-
dated. Test developers should conduct pilot tests on people who have similar disabilities
to check on the appropriateness and feasibility of test modifications and should include a
careful statement of steps taken to modify the test so that users will be able to identify any
changes that may alter its validity. They should use empirical procedures to establish time
limits for modified forms and explore the effects of fatigue. They also should provide
validity and reliability information on the modified forms and on the unmodified forms.
Clearly, it is important for test users to study the test manual and evaluate the techni-
cal information to determine whether the modifications are valid and reliable for the group
in question. In addition, practitioners working with clients who have special needs should
know what alternate tests and methods are available and appropriate for these persons.
Those who are interpreting the test results need to know which set of norms should be
used. Regular norms are used when the examiner needs to compare the test performance
of the individual with disabilities with that of the general population. Special norms are
appropriate when the examiner is looking at how the individual performs in relation to
peers with the same disabling condition.
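The difference between regular and special norms can be illustrated with a short sketch. The two norm samples below are invented for illustration only and do not come from any test manual, and percentile_rank is a hypothetical helper that uses the "equal to or lower" definition of a percentile rank.

```python
def percentile_rank(score, norm_scores):
    """Percent of scores in the norm group that are equal to or lower than `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

# Hypothetical raw scores; not taken from any real instrument.
general_norms = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]  # "regular" norms
special_norms = [8, 10, 12, 14, 15, 17, 18, 20, 22, 24]   # peers with the same condition

client_score = 18
regular_pr = percentile_rank(client_score, general_norms)
special_pr = percentile_rank(client_score, special_norms)

print(f"Regular norms: percentile rank {regular_pr:.0f}")
print(f"Special norms: percentile rank {special_pr:.0f}")
```

In this made-up case the same raw score falls low in the general sample but well above the middle of the special sample, which is why the examiner must decide which comparison answers the referral question.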
• Setting related
• Response related
• Aids related
Counselors who conduct assessment on individuals who experience some form of
visual impairment must become familiar with the various accommodations that are related
to each category. For example, many assessment instruments have audio-recorded materials
for administration to those with visual impairment. Providing an audio presentation
would be an appropriate accommodation. As another example, consider a student who
is completing an online program in counselor education and must take exams as part of
the training program. It may be an appropriate accommodation to provide the individual
with a screen reader.
Determining appropriate accommodations in the assessment process is a critical ele-
ment of fairness in testing. However, a more appropriate path is to select a test that is
normed on the population for which it is intended. There are specialized tests available for
assessing children and adults with visual impairments. The Hill Performance Test of
Selected Positional Concepts measures the spatial concepts of children aged 6 to 10 with
visual impairments. The Reynell–Zinkin Scales provide a profile of six areas for children
up to 5 years old: social adaptation, sensorimotor, exploration of environment, response to
sound/verbal comprehension, expressive language, and nonverbal communication.
from birth to age 90, who have intellectual disabilities, developmental delays, autism spec-
trum disorders, and other impairments. The instrument contains five domains, each with
two or three subdomains (see Table 15.3). The Vineland-II is available in four formats: a
Survey Interview Form, a Parent/Caregiver Rating Form, an Expanded Interview Form,
and a Teacher Rating Form. The instrument yields standard scores (M = 100, SD = 15),
V-scale scores (M = 15, SD = 3), percentile ranks, age equivalents, stanines, adaptive lev-
els (i.e., low, moderately low, adequate, moderately high, and high), and maladaptive
levels (i.e., average, elevated, clinically significant).
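Because standard scores and V-scale scores are reported on different metrics, it can help to express both in standard deviation units. The following is a minimal sketch that assumes only the means and standard deviations stated above. The example scores and the helper name z_score are hypothetical, and actual score derivations come from the norm tables in the test manual, not from this arithmetic.

```python
def z_score(score, mean, sd):
    """Distance of a score from the scale mean, in standard deviation units."""
    return (score - mean) / sd

# Scale parameters as stated in the text.
STANDARD_SCALE = (100, 15)  # standard scores: M = 100, SD = 15
V_SCALE = (15, 3)           # V-scale scores:  M = 15,  SD = 3

# Hypothetical scores for illustration.
domain_standard_score = 85
subdomain_v_score = 12

print(z_score(domain_standard_score, *STANDARD_SCALE))
print(z_score(subdomain_v_score, *V_SCALE))
```

Both hypothetical scores sit one standard deviation below their respective means, even though the raw numbers (85 and 12) look very different.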
Source: Vineland Adaptive Behavior Scales, Second Edition (Vineland-II). Copyright © 2008 by NCS
Pearson, Inc. Reproduced with permission. All rights reserved.
assessment of children with special needs can be a difficult process. All counselors con-
ducting assessment should have supervision of their assessment practice until they gain
competence, but such supervision is especially important when working with children. In
order to be effective, counselors should also consider attending additional training for
each particular disorder.
Autism Spectrum Disorders Autism spectrum disorders (ASDs) include those disorders
characterized by impairment in several areas of development, including social interaction
skills, communication skills, or the presence of stereotyped behavior, interests, and activities.
The revision to the diagnosis of Autism Spectrum Disorder in the DSM-5 now encompasses
several specific disorders from the DSM-IV-TR (i.e., Rett’s Disorder, Childhood
Disintegrative Disorder, Asperger’s Disorder, Pervasive Developmental Disorder Not
Otherwise Specified). Autism is a complex spectrum of symptoms that is usually evident
in the first years of life. One of the issues that makes diagnosis complex is that individuals
may range from gifted intelligence to some degree of cognitive impairment. Regardless of
intellectual functioning, children with ASDs display difficulties in the following areas:
1. Social Interaction Appear aloof, want to spend more time alone than with others,
show little interest in making friends, and lack emotional reciprocity
2. Communication Show a significant delay in or absence of speech
3. Repetitive Behaviors Become intensely preoccupied with one toy or activity, or
engage in rituals or routines
4. Senses Demonstrate extreme overreaction or underreaction to sensory (e.g., sight,
hearing, touch, smell, taste) information
Assessing children with an ASD should include, at minimum, a parent or caregiver inter-
view, a medical evaluation, direct behavioral observation, a cognitive assessment, and an
assessment of adaptive functioning (Sheperis, Mohr, & Ammons, 2014). No assessment
instrument or procedure can be used in isolation to make a diagnosis of autism, but the
combined use of evidence-based assessment methods is recommended. Formal assess-
ment instruments are often best for determining risk of autism rather than for making a
specific diagnosis. Examples of these instruments include the following:
• Autism Behavior Checklist (ABC)
• Autism Screening Instrument for Educational Planning, Third Edition (ASIEP-3)
• Autism Diagnostic Interview, Revised (ADI-R)
• Behavior Observation Scale for Autism (BOS)
• Childhood Autism Rating Scale, Second Edition (CARS-2)
• Gilliam Autism Rating Scale, Third Edition (GARS-3)
Behavior checklists and rating scales play a prominent role in assessment of children
with ADHD. They provide important information about a child’s behavior in different set-
tings and how it is judged by significant others (Hersen, 2006). Some checklists or rating
scales assess broad dimensions of child behavior but can be used to assess ADHD, includ-
ing the following:
• Child Behavior Checklist (CBCL/6-18)
• Behavioral Assessment System for Children (BASC-2)
• Conners Rating Scales, Third Edition (Conners 3)
Checklists and rating scales that focus more narrowly on ADHD characteristics or symp-
toms include the following:
• ADHD Symptom Rating Scale (ADHD-SRS)
• Attention Deficit Disorder Evaluation Scale (ADDES-2)
• Attention Deficit Hyperactivity Rating Scale (AD/HD-RS-IV)
• Barkley Home Situation Questionnaire (HSQ-R)
• Strengths and Weaknesses of ADHD Symptoms and Normal Behavior (SWAN)
• Swanson, Nolan, and Pelham Rating Scale (SNAP-IV)
• Development and Well Being Assessment (DAWBA)
• Diagnostic Interview Schedule for Children (DISC-IV)
Developmental Delays Developmental delay is a term that means a child is developing more slowly
than normal in one or more areas and is at risk of academic failure (Glascoe, Marks, & Squires,
2012). These delays may occur in one or more of the major areas of development, including
physical abilities (gross or fine motor skills), cognitive development, language, or personal or
social skills development. In recent years, there has been an effort to identify children with
developmental delays prior to kindergarten. The Early Intervention Program for Infants and
Toddlers with Disabilities (Part C of IDEA) is a federal grant program that assists states in
providing early intervention services for infants and toddlers (ages birth through 2 years)
with disabilities and their families. In order to be eligible for services, a comprehensive, mul-
tidisciplinary evaluation and assessment of the child and family must be conducted. The
evaluation of the child involves assessing the following areas of child development:
• Physical (reaching, rolling, crawling, walking)
• Cognitive (thinking, learning, solving problems)
• Communication (talking, listening, understanding)
• Social/emotional (playing, feeling secure and happy)
• Adaptive (toileting, eating, dressing, personal hygiene)
Various instruments are available to help assess these developmental areas. An anno-
tated list of these instruments is provided as follows:
• The Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III) are
used to assess the cognitive, language, and psychomotor development of children from
1 to 42 months of age and to aid in the diagnosis of normal or delayed development.
• The Battelle Developmental Inventory, Second Edition (BDI-2) was designed to
assess five domains of development in infants and children through age 7: adaptive
behavior, personal/social skills, communication, gross and fine motor ability, and
cognitive skills.
• The Cattell Infant Intelligence Scale measures the intellectual development of chil-
dren from 3 to 30 months by assessing verbalizations and motor control.
• The Dallas Preschool Screening Test (DPST) is used to assess the learning disabilities
of children between the ages of 3 and 6; it measures auditory, language, motor, vis-
ual, psychological, and articulation development.
• The Denver Developmental Screening Test, Second Edition (DDST-II) is used to evalu-
ate a child’s personal, social, fine motor, gross motor, language, and adaptive abilities.
The test is appropriate for use as a screening tool with children from birth to age 6.
• The Kent Infant Development Scale, Second Edition (KID Scale) uses a 252-item
inventory to assess infants from birth to 15 months and children up to 6 years with
developmental ages of less than 15 months. The test measures cognitive, language,
motor, self-help, and social skills.
Summary
Fairness in measurement is a primary consideration for all assessment. Counselors and other
helping professionals must recognize and appreciate the differences that exist among people
who differ in terms of race, ethnicity, age, language, gender, sexual orientation, religious
or spiritual orientation, ability, and other cultural dimensions.
Multicultural assessment encompasses assessment procedures that are appropriate, fair, and
useful for accurately describing abilities and other traits in individuals from diverse
populations. Professionals are aware of the important issues related to the use of
standardized tests, such as test bias, psychometric properties, administration and scoring
procedures, test use and interpretation, and test-taker and examiner bias. Furthermore, they
have knowledge of the ethical principles, etic and emic perspectives, and the issue of
acculturation.
Assessment of individuals with disabilities is conducted to diagnose or determine the
Suggested Activities
1. Interview professionals who assess diverse populations and find out what tests they use and
why. Report your findings to the class.
2. Review a video of a test administration of an individual with a disability. Did the examiner
modify test procedures? If so, how?
3. Interview individuals with disabilities and find out their experiences and reactions to
testing. Report your findings to the class.
4. Review the legislation and judicial decisions related to assessing individuals from different
cultural groups and individuals with disabilities. Write a summary of what you find.
5. Locate a test that has been adapted and standardized for use with a specific cultural group or
diverse population. Compare the adaptation with the original version. How was the test
modified? How do the two sets of norms compare?
References
American Association on Intellectual and Developmental Disabilities (AAIDD). (2008). Frequently asked questions on intellectual disability and the AAIDD definition. Washington, DC: Author.
American Counseling Association (ACA). (2014). ACA Code of Ethics. Alexandria, VA: Author.
American Counseling Association and Association for Assessment in Counseling. (2003). Responsibilities of users of standardized tests. Alexandria, VA: Author.
American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: Authors.
American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.
Americans With Disabilities Act of 1990, Pub. L. No. 101-336, 104 Stat. 328 (1990).
Association for Assessment in Counseling (AAC). (2003). Standards for multicultural assessment (2nd ed.). Retrieved from http://aarc-counseling.org/assets/cms/uploads/files/multicultural.pdf
Baruth, L. G., & Manning, M. L. (2011). Multicultural counseling psychotherapy (5th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.
Brault, M. W. (2012). Americans with disabilities: 2010 Household economic studies. U.S. Department of Commerce: Economics and Statistics Administration, Washington, DC.
Dana, R. H. (2005). Multicultural assessment: Principles, applications, and examples. Mahwah, NJ: Lawrence Erlbaum Associates.
Glascoe, F. P., Marks, K. P., & Squires, J. (2012). Improving the definition of developmental delay. Journal of Developmental and Behavioral Pediatrics, 33(1), 87–87.
Hersen, M. (2006). Clinician’s handbook of behavioral assessment. Burlington, MA: Elsevier/Academic Press.
Huey, S. J., Jr., Tilley, J. L., Jones, E. O., & Smith, C. A. (2014). The contribution of cultural competence to evidence-based care for ethnically diverse populations. Annual Review of Clinical Psychology, 10, 305–338.
Joint Committee on Testing Practices. (2004). Code of fair testing practices in education. Washington, DC: Author.
Kelley, L., Bailey, E. K., & Brice, W. D. (2014). Teaching methods: Etic or emic. Developments in Business Simulation and Experiential Learning, 28, 123–126.
Luckasson, R., Borthwick-Duffy, S., Buntinx, W. H. E., Coulter, D. L., Craig, E. M., Reeve, A., … Tassé, M. J. (2010). Mental retardation: Definition, classification, and systems of supports (11th ed.). Washington, DC: American Association on Mental Retardation.
Roth, K. (2013). Understanding and managing the growing SLP shortage. San Francisco, CA: PresenceLearning, Inc. Retrieved from http://www.slideshare.net/presencelearning/understanding-and-managing-the-growing-slp-shortage
Ryan, C. (2013). Language use in the United States: 2011. US Department of Commerce: Economics and Statistics Administration. Washington, DC.
Sattler, J. M., & Hoge, R. D. (2006). Assessment of children: Behavioral, social, and clinical foundations (5th ed.). San Diego, CA: Jerome M. Sattler Publisher Inc.
Sheperis, C. J., Mohr, J. D., & Ammons, R. (2014). Best practices: Autism spectrum disorder. Center for Counseling Practice, Policy, and Research. Retrieved from http://www.counseling.org/knowledge-center/center-for-counseling-practice-policy-and-research/practice-briefs
Simeonsson, R. J. (1986). Psychological and developmental assessment of special children. Boston, MA: Allyn & Bacon.
Sparrow, S. S., Cicchetti, D. V., & Balla, D. A. (2005). Vineland Adaptive Behavior Scales, second edition: Survey forms manual. Minneapolis, MN: Pearson Assessment.
Steer, M., Gale, G., & Gentle, F. (2007). A taxonomy of assessment accommodations for students with vision impairments in Australian schools. British Journal of Visual Impairment, 25(2), 169–177.
Suzuki, L. A., & Ponterotto, J. G. (2008). Handbook of multicultural assessment: Clinical, psychological, and educational applications (3rd ed.). San Francisco, CA: Jossey-Bass.
Tanner, D. C. (2007). Medical-legal and forensic aspects of communication disorders, voice prints, & speaker profiling. Tucson, AZ: Lawyers & Judges Publishing Company.
U.S. Census Bureau (2014). State & County QuickFacts, 2014. Retrieved from http://quickfacts.census.gov/qfd/states/00000.html
Ysseldyke, J. E., & Algozzine, R. (2006). Teaching students with communication disorders: A practical guide for every teacher. Thousand Oaks, CA: Sage.
Zhang, Y., & Tsai, J. (2014). The assessment of acculturation, enculturation, and culture in Asian-American samples: Guide to psychological assessment with Asians. New York, NY: Springer.
CHAPTER 16
Communicating Assessment Results
Once the assessment process is complete and results are analyzed and interpreted,
counselors have the responsibility to communicate the results to those individuals enti-
tled to the information. Communicating results in an appropriate manner is as essential
as following appropriate procedures in conducting assessment (see Figure 16.1). We
place this chapter toward the end of the book to allow for developmental learning about
the assessment process. However, the process of communicating results is an integral
component in assessment and should not be considered something that happens after
assessment is completed.
When communicating the results of the assessment process, you may include
more than just the person being assessed. Communication of results may involve the
client, parents (if the examinee is a minor), and other professionals involved with the
examinee (e.g., teachers, school administrators, mental health professionals). In cases in
which clients are mandated for assessment by the court system, you may have to com-
municate results to the court or court officials. It is important to clearly state that results
are communicated to others only with the explicit permission of the client or guardian.
In some cases, you will have to determine who the appropriate person is to receive this
information. For example, in court-mandated cases, the client may actually be the court
system, and the results are only reported to the court. As you can see, the process of
communicating results is more complex than it may appear. This chapter will present
information about communicating assessment results using feedback sessions and
written reports.
• List and discuss the guidelines for reporting assessment results to other professionals
376 Chapter 16 • Communicating Assessment Results
[Figure 16.1 (flowchart): Referral → Select Assessment Procedures → Conduct Assessment → Score Assessment → Interpret Scores → Communicate Results → Integrate Results into Counseling Process]
Feedback Sessions
In most instances, the assessment process ends with an oral or written report, whereby
counselors communicate assessment results to clients (or other relevant parties) in a com-
prehensible and useful manner. Assessment results are of direct interest to clients. Depend-
ing on the reason for assessment, they may be concerned about what the results mean in
terms of their academic standing or goals, career decisions, abilities, or emotional prob-
lems. In some cases, the results of assessment have high stakes associated with them. For
example, assessment is required to determine competency to stand trial in forensic settings
and for competency to be sentenced to death in capital punishment cases. Being able to
communicate fair and valid interpretations of test results can be as serious as life and
death. Even in cases in which the stakes are not as high as life or death, the results can have
lasting implications. Imagine being a counselor in a school setting and having to explain to
parents that their child’s assessment results do not qualify the child for gifted education. In a
real example, a parent had two older children who were involved in the gifted education
program at a local school. The parent asked that their youngest child also be tested for
inclusion in the program. After completing a multifaceted and multidimensional assess-
ment, it was determined that the child did not qualify. It was important to communicate to
the parent the procedures followed and the meaning of the test scores. In this case, the
student had an IQ of 110 and had similar achievement scores. His grades were also aver-
age, and his teachers viewed him as comparable to other children at the same grade level.
The parent was certain that the child was gifted and asked for an additional assessment.
As part of communicating the results of this assessment process, the standard error of
measurement was explained to the parent in a manner she could understand. With regard
to IQ tests, there is a band of error (e.g., ±3) that predicts what a test score would be if
a person was tested over and over. In this case, we could predict that if the child was tested
again, his scores might improve or they might go down. Either way, the change would
only be slight, because intelligence is something that doesn’t change very much across
testing occasions. If the child was tested again, he would likely score between 107 and 113
and still would not qualify for the gifted education program. Because the counselor took the
time to explain the results in an understandable manner, the parent was able to accept the
results and realize that the youngest child did not qualify for the additional program.
Effective communication also saved the district the time and money involved in additional testing.
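The band of error in this example can be computed directly. The sketch below is illustrative only: it uses the common formula SEM = SD × sqrt(1 − reliability), and the SD of 15 and reliability of .96 are assumed values chosen so that the SEM comes out near the 3 points used in the example. The function names are hypothetical; in practice the SEM is taken from the test manual.

```python
import math

def sem_from_reliability(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sem, n_sems=1.0):
    """Return (low, high) for the band observed +/- n_sems * SEM."""
    margin = n_sems * sem
    return observed - margin, observed + margin

# Assumed values for illustration (not from a specific IQ test manual).
sem = sem_from_reliability(sd=15, reliability=0.96)
low, high = score_band(110, round(sem))

print(f"SEM is about {sem:.1f} points; expected range {low:.0f} to {high:.0f}")
```

A band of one SEM on either side of the observed score corresponds to roughly a 68% confidence band; a wider band (for example, n_sems=2) gives more confidence at the cost of a less precise statement.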
Hopefully, you can see that counselors and other helping professionals use feedback
sessions with clients to orally communicate their assessment results. During feedback ses-
sions, counselors can review and clarify assessment results, acquire additional information
relevant to the assessment process, educate the client (and family members, if relevant)
about what their results mean, and provide recommendations regarding interventions or
educational services. Furthermore, feedback sessions can be a vehicle for establishing good
rapport with clients and for helping them understand why treatment is being recommended.
The feedback session is usually scheduled soon after the assessment, and spouses and
primary caregivers may be invited to attend. If the examinee is a child, then teachers, school
administrators, and other professionals may be included with the family’s permission.
Counselors should begin feedback sessions by describing the purpose of the assessment.
They should be prepared to explain difficult concepts and the limits of a test. Counselors
may also wish to explain the various sources of test bias that may affect the interpretation
of the scores. The client also needs to understand that test data represent just one source of
information and are not sufficient for fully assessing an individual’s abilities, skills, or traits.
After explaining the purpose of the test, counselors should report test results in a
manner that clients (and family members) can understand. Most people, professionals and
laypersons alike, find percentile ranks the easiest type of score to understand
(Lichtenberger, Mather, Kaufman, & Kaufman, 2004). When explaining percentile ranks to
clients, it helps to point out that being "number 1" is not the best outcome. If you are in the
1st percentile, then out of 100 people your score is equal to or higher than only one
score. In other words, your score is probably the lowest of the 100. Hopefully, as a
professional counselor, you won’t have to tell someone they are number 1, but you can see
the utility of explaining assessment results in terms of percentile rank. Standard scores can
easily be converted to percentile ranks using a conversion table. Some professionals prefer
using qualitative descriptions of scores rather than the scores themselves. When appropri-
ate, the standard error of measurement can help to emphasize that the test results are not
absolute; results provide approximations of where true scores might fall. The following are
examples of how to orally communicate various types of scores with a hypothetical client
named Ellis, who is 10 years old and in the fifth grade, as well as with Ellis’s parent (Miller,
Linn, & Gronlund, 2011):
• Percentile Ranks With regard to math, Ellis scored in the 95th percentile. This
means that she performed very well compared to others in her grade. In fact, her
score was equal to or better than 95% of children her age who have taken the test.
• Deviation IQ Ellis is also above average in her ability to understand verbal con-
cepts. She performs much higher than most of her peers.
• T Scores Ellis did demonstrate some areas for concern on the Child Behavior Check-
list/6-18 Withdrawn/Depressed syndrome scale. The results indicate that she may
be having some problems with adjustment. According to her scores on the CBCL,
she may be experiencing some aspects of depression. Overall, the scores suggest
that she is somewhat withdrawn and lacks energy. She also seems to be experienc-
ing a sense of sadness.
378 Chapter 16 • Communicating Assessment Results
• Stanines Ellis received a stanine score of 9 on her testing. Basically, a 9 indicates that
Ellis is performing at a higher level than her peers. Her score is in the above-average
range (i.e., stanine scores from 7 to 9).
• Grade Equivalent Scores Ellis’s performance on the math computations test was
also above average. Her grade equivalent score of 6.3 might seem to suggest that she
should move ahead to the sixth grade, but such an assumption is not correct. The score
only compares her with her same-age peers, and she has had no exposure to more
advanced material. Performing at a higher level does not mean that she should be
promoted to a new grade level. She is simply doing better than most of her peers.
• Age Equivalent Scores Age equivalent scores are also important to discuss. Ellis
scored an 11-2. This simply means that Ellis’s performance is comparable to the typical
performance of children who are 11 years and 2 months old.
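The score conversions described above can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for a test publisher's norm tables. It assumes that scores are normally distributed and uses the conventional scale parameters (mean 100 and standard deviation 15 for standard scores and deviation IQs; mean 50 and standard deviation 10 for T scores). The function names and the standard error of measurement value in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def percentile_rank(score, mean=100.0, sd=15.0):
    """Approximate percentile rank of a score, assuming a normal distribution."""
    z = (score - mean) / sd
    return round(NormalDist().cdf(z) * 100)


def stanine(score, mean=100.0, sd=15.0):
    """Approximate stanine (1 to 9; mean 5, SD 2) for a score."""
    z = (score - mean) / sd
    # Each stanine is half a standard deviation wide and stanine 5 is centered on the mean.
    return max(1, min(9, math.floor(2 * z + 5.5)))


def score_band(score, sem, z=1.0):
    """Range of plus or minus z standard errors of measurement around an obtained score."""
    return (score - z * sem, score + z * sem)
```

For example, percentile_rank(90) gives 25, and percentile_rank(70, mean=50, sd=10) gives 98 for a T score of 70. With a hypothetical standard error of measurement of 3 points, score_band(97, sem=3.0) returns (94.0, 100.0), which is one way to show a client that an obtained score is an approximation of where the true score might fall.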
Because clients are often nervous or anxious about receiving their assessment results,
counselors should focus on the clients’ thoughts and feelings about their scores and inter-
pretations. Thus, the feedback session itself can become a therapeutic experience for the
client. The client’s reaction to results also may provide professionals with new insight
about the individual. Furthermore, by participating in the discussion and interpretation of
assessment results, the client may become more accepting of the findings, gain more self-
awareness, and be more willing to use the information in decision making. Counselors
should consider the guidelines shown in Figure 16.2 for interpreting results in feedback
sessions (Reynolds, Livingston, Willson, & Willson, 2010).
Problem Areas
Counselors may face some issues when disseminating information during feedback ses-
sions. We present some of the more common problems here, including acceptance of
results, client readiness, negative results, flat profiles, and client motivation and attitude.
Acceptance The goal of the feedback session is often to get clients to accept assessment
results and incorporate that information into their decision making. Negative results
frequently prompt test takers to resist accepting valid and genuine information about
themselves. Counselors can enhance acceptance of test results by doing the following:
1. Involve the clients in decision making and general selection of tests prior to the testing.
2. Establish rapport with the clients so that they trust the counselor and are relaxed in
the sessions.
3. Spend sufficient time interpreting the results to test takers, but do not overwhelm
them with too much data.
4. Translate the results into language that the clients can understand.
5. Show the validity of the information for the decision(s) to be made.
Readiness The critical factor in the acceptance of test results is the examinee’s readiness
to hear them. How one presents test results has a substantial effect on an individual’s ability
to accept the results. If the information is damaging to an examinee’s self-concept, then the
counselor might have to work on getting that individual to extend his or her acceptance.
The following techniques can help increase an examinee’s readiness level:
1. Have several sessions prior to the feedback session to build rapport.
2. Allow the client to bring up the topic of test results; don’t immediately begin the
feedback session with test interpretation.
Negative Results Often, the results are not what the test taker wanted, desired, or
expected. The individual may have failed to pass a certification or licensing examination
or failed to achieve the minimum score for admission to college. On a clinical test, a client
may become defensive after being informed that he or she scored high on scales assessing
substance-abuse problems, or an individual may score high on the validity scales on a
personality test and be told that his or her test was invalid. The test administrator must
know how to appropriately communicate negative test results to individuals. The follow-
ing are several recommendations:
1. Explain the rationale for cutoff scores and the validity of the established procedures.
2. Avoid using negative terms or labels, such as “abnormal,” “deviant,” or “pathologi-
cal.”
3. Gain an understanding of the test taker’s perceptions and feelings.
4. Accept the test taker’s right to argue with test implications, without necessarily
agreeing with the test taker.
5. Identify other information about the individual that supports or does not support the
test data.
6. Discuss the implications of the data and the importance of that information for deci-
sion making.
Flat Profiles Many times, an individual’s pattern of scores has no highs or lows, but is
just a flat profile. On aptitude and achievement tests, this would indicate a similar level of
performance in all areas. On interest or career inventories, flat profiles may indicate that
the client is undecided about future goals. A response set might also be a factor, with the
individual rating everything high, average, or low. In the case of flat profiles on interest
and career guidance tests, counselors can ask clients to read the descriptions of the six
Holland types and rank the three types most characteristic of themselves (Gati & Amir,
2010) and discuss the clients’ expectations, relevant past experiences, previous work activ-
ities, and misconceptions and stereotypes. In the case of flat profiles on aptitude and
achievement tests, counselors can look at the interests, values, and career goals and assure
the clients that their profiles are not abnormal. The counselor should discuss what an indi-
vidual considers his or her goals to be and what he or she can do acceptably. A good pro-
cedure is to investigate the individual’s performance on previous tests to determine
whether this pattern is typical.
Motivation and Attitude Test results are more significant to clients who are motivated
to take a test, come in and discuss the results, and have a positive attitude toward the value
of the data. Some clients have a negative attitude toward testing prior to the test and main-
tain that attitude afterward. Some clients become negative after they see that the test results
are not what they expected.
Counselors should recognize that tests can aid clients in developing more realistic
expectations about themselves and can be valuable in decision making. However, some
clients put too much weight on the results and become overly dependent on test data to
solve their problems. Other clients use test results as a way of escaping from their feelings
and problems. Counselors interpreting test results need to be aware not only of a client’s
motivation to take or not take a test but also of his or her attitude toward the test. Other
important information is the immediate goal for the interpretation of the test results and
the client’s desire to be involved in decision making about the type of test to be taken and
the dissemination of test results.
2. Be sure to point out that ability or aptitude test scores may change over time because
the abilities measured are learned.
3. Present the results of ability tests in a way that helps parents to understand that
scores do not equal school performance. Many factors impact performance, and the
results of an ability test only indicate the potential for a certain level of performance.
During a feedback session, parents may express concerns about a variety of aspects
of their child’s assessment. A common question is what to do if they disagree with the
results of the testing—for example, if assessment results indicate that the child is not ready
for kindergarten, yet the parents think the child is, or if the parents think their child should
be placed in the gifted program, but assessment results show that the child doesn’t qualify
for placement. Parents should be advised of avenues they might take, such as pursuing an
independent evaluation on their own, while recognizing that the school district may not
reimburse them for the evaluation and that it may be difficult to have the child tested with
the same tests.
Parents may want to know whether test results really matter. Professionals can
explain the objectives of assessment programs and the issue of accountability to help par-
ents understand why assessment results are important. They should also emphasize to
parents that many placement and selection decisions are based on the outcome of the
assessment process; thus, parents should encourage children to take the assessment pro-
cess seriously and do their best.
Parents might ask whether the tests are culturally biased. Examiners can discuss with
parents the procedures that many standardized tests use to reduce test bias, such as using
panels of experts to review test items or comparing test results among various groups to
see if score differences occur because of affiliation with a particular group. It is important
to explain that even with these attempts, some bias might still be operating in the testing
situation and may distort a person’s true score.
Parents may be concerned about whether scores are fixed or changeable. They need
to be reminded that a test measures a sample of behavior at a particular time, using items
from a given domain. Test scores do change; they can go up and down, and their child’s
scores may vary from year to year. Sometimes, poor performance happens because a stu-
dent may not be skilled in the areas measured by the test. Most children perform better in
some areas than in others. Parents need to understand the dynamics at work and refrain
from hasty value judgments.
Parents may ask whether children should be informed about their own test results. In
most cases, students should be provided feedback on their performance in words and
terms they can understand. The information can help them understand their own strengths
and weaknesses and make more realistic educational and vocational choices.
Assessment Reports
In addition to oral feedback, counselors may communicate findings in a written assessment
report, often called a psychological report or a psychoeducational report. These reports summa-
rize the data collected from interviews, tests, and observations that directly addressed the
reason for assessment. Professionals from various disciplines write assessment reports,
including school psychologists, clinical psychologists, neuropsychologists, counselors,
and speech and language therapists. Although their professional roles may differ, all
engage in the process of report writing. The central purposes of assessment reports are
fourfold: (a) to describe the client, (b) to record and interpret the client’s performance on
assessment instruments, (c) to communicate assessment results to the referral source, and
(d) to make recommendations or decisions regarding educational programs, therapy, or
other appropriate intervention (Cohen, Swerdlik, & Sturman, 2012; Engelhard & Wind,
2013; Urbina, 2014). Reports also create a record of the assessment for future use and may
serve as a legal document (Gunn & Taylor, 2014).
The content of written reports addresses the specific concerns (e.g., psychological,
behavioral, academic, linguistic) of the referral source (Goldfinger & Pomerantz, 2013).
For example, a teacher may request information about whether a child’s learning progress
differs from that of peers or why a child is having difficulty in a particular subject area,
such as reading (Schwean et al., 2006). Parents might request an assessment to determine
if their child meets the school’s criteria for inclusion in the gifted and talented program.
A judge may need to know the best placement for a child in a child custody dispute. A
probation officer might want to know an adolescent’s psychological functioning to deter-
mine whether any mental health needs should be addressed. Each referral source has
different information needs. Assessment reports are tailored to address the referral ques-
tions and the intended audiences.
Written reports should integrate information from multiple assessment methods,
including interviews, observations, tests, and records. The extent to which reports inte-
grate assessment information varies greatly. Reports that are low on integration generally
focus more on test scores, neglect inconsistent results between various methods, ignore the
overall context of the client, and do not discuss the meaning of the scores for the client. An
integrated report carefully blends assessment findings so that they have specific meaning
for the client (Goldfinger & Pomerantz, 2013).
The length of written reports can vary substantially. Although the average report is
between five and seven single-spaced pages (Donders, 2001), there is no universal agreement
on a recommended length (Groth-Marnat & Horvath, 2006). For forensic evaluations, Ackerman
(2006) distinguishes between reports that are brief (1–3 pages), standard (2–10 pages),
and comprehensive (10–50 pages). Length is based on the purpose of the report, the con-
text (i.e., educational, mental health, medical, or vocational settings), the expectations of
the referral source, the importance and impact of the assessment, and the complexity and
quantity of information to be reported.
readers to interpret, and ambiguous sentences are often misinterpreted, because they
imply different meanings to different individuals. The sentence, “John lacks mechanical
aptitude” leaves the reader to interpret “lacks” and “mechanical aptitude.” It would be
better to say, “John is unable to use the screwdriver or put washers on the bolts.” Similarly,
in the sentence “Mary is an extrovert,” who is to define an extrovert? It would be better to
say, “Mary likes to be with people and be the center of attention. She talks and laughs
loudly and makes sure she introduces herself to everyone in the room.”
When writing assessment reports, counselors should consider the following recom-
mendations (Carr & McNulty, 2014; Goldfinger & Pomerantz, 2013; Lichtenberger et al.,
2004; Wiener & Costaris, 2012):
• Avoid jargon and abbreviations.
• Refer to yourself in third person (e.g., “the examiner found” instead of “I found”).
• Use simple words and concise sentences.
• Avoid using needless words and phrases.
• Avoid redundancies.
• Begin paragraphs with a strong declarative sentence followed by information that
supports the declarative statement.
• Write background information and observations in the past tense.
• Write assessment results in the present tense.
• Generally omit information that is irrelevant to the purpose of the assessment.
• Capitalize test titles.
• Pay attention to punctuation, capitalization, spelling, and grammar.
psychological evaluations) will contain a list of the tests that were used during the
assessment process. However, when tests are not used in the assessment process, they
are not listed or discussed in the written report. We will describe each of the sections of
an assessment report. As an example, we have provided an abbreviated assessment
report later in the chapter.
Title and Identifying Information Most reports begin with a title that is followed by
important identifying information. The title is typically centered across the top of the first
page (e.g., Mental Health Evaluation, Psychoeducational Assessment, Psychological Eval-
uation). Identifying information is generally recorded underneath the title as follows:
• Examinee’s name
• Date of birth
• Chronological age
• School and grade (if assessing a student)
• Date(s) of assessment
• Report date (date written)
• Examiner’s name
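Chronological age is derived from the date of birth and the date of assessment. As a rough sketch, completed years and months can be computed as shown below. The function name and the dates are hypothetical, and individual test manuals may prescribe their own age-calculation rules, which take precedence over this simple calendar approach.

```python
from datetime import date


def chronological_age(birth, assessed):
    """Return age on the assessment date as (completed years, completed months)."""
    if assessed < birth:
        raise ValueError("assessment date precedes date of birth")
    months = (assessed.year - birth.year) * 12 + (assessed.month - birth.month)
    # The current month counts only if the day of birth has been reached.
    if assessed.day < birth.day:
        months -= 1
    return divmod(months, 12)


# Hypothetical examinee: born March 15, 2005; assessed May 10, 2015.
years, months = chronological_age(date(2005, 3, 15), date(2015, 5, 10))
age_label = f"{years}-{months}"  # same years-months notation as an age equivalent score
```

Here the examinee is 10 years and 1 month old, which would be recorded as 10-1.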
Reason for Referral The first substantive section of the report contains the reasons that
the individual is being referred for assessment. This section is crucial, because the reason
for referral determines the focus of an evaluation and provides the rationale for the assess-
ment; all other sections of the report should be written with the referral question in mind
(Lichtenberger et al., 2004). Writers must also assume that the referral source is not the
only individual who will read the report; other individuals, such as other mental health
professionals, attorneys, case managers, parents/spouses/children of the examinees, and
even examinees themselves, may read the report.
Background Information This section provides a context for understanding the client
(Goldfinger & Pomerantz, 2013). In this section, counselors give a brief description of the
individual’s personal history, which may include a chronological summary of the client’s
social and developmental history, medical history, education, family constellation, and
employment information (if relevant). This information is typically obtained through
interviews (with the client and relevant collateral sources) and records (e.g., school records,
previous assessment results, intervention plans, court documents, health records).
Although this section provides the context in which assessment results are interpreted, if
the background information is not relevant to the referral problem and it is very personal,
then counselors should consider carefully whether or not to include it in the report
(Kamphaus & Frick, 2005). The report should not include hearsay, unverified opinions,
generalized statements, or potentially harmful or damaging information.
Behavioral Observations This section covers observations that occurred during the
assessment process and/or observations that occurred in other settings (e.g., the class-
room, waiting room, playground, at home; Lichtenberger et al., 2004). Behaviors can be
observed directly by the counselor, or observations can be provided indirectly by reports
from teachers, parents, and others who have contact with the client. Examples of observa-
tions made during the assessment include physical appearance, ease of establishing and
maintaining rapport with the client, language style (e.g., speed, pitch, volume, and rhythm
of speech), attention span, distractibility, activity level, anxiety level, mood, attitude
toward the assessment process, attitude toward the examiner, and unusual mannerisms or
habits (Lichtenberger et al., 2004). Typically, behaviors are presented in the report if they
had a positive or negative impact on the individual’s performance during the assessment
or on the individual’s academic, social, or emotional functioning.
accuracy of these interpretations. Another concern is that many of the lengthy narratives
often appear valid (even if they are not valid) to untrained consumers and even to some
professionals (Goldfinger & Pomerantz, 2013). Unqualified counselors may use these
reports to compensate for a lack of training and experience. In addition, counselors may
come to depend more on the CBTIs than on their own clinical judgment. There can be
numerous problems with the interpretations made using CBTIs (Lichtenberger, 2006).
Counselors should view CBTIs as a valuable adjunct to, rather than a substitute for,
clinical judgment. In this light, counselors can incorporate the information from CBTIs
into the results and interpretation section of the written report in the context of their own
clinical judgment.
Summary This section of the report integrates key information from each part of the report
as well as the counselor’s hypotheses or clinical impressions about the client. This section
is particularly important; some may read only this section out of the entire written report.
Counselors briefly restate the reason for referral, pertinent background information,
behavioral observations, and test results and interpretations. The hypotheses or clinical
impressions are a key element of this section because they help describe and clarify the
nature of and reasons for the client’s particular problems or concerns. All hypotheses and
clinical impressions are based on and supported by multiple pieces of assessment informa-
tion. A clear summary of the assessment is then provided that answers the referral ques-
tions using connections among the examinee’s background information, behavior
observations, and test results. Summaries are concise and should rarely exceed one page.
New information is never included in the summary.
Assessment Results and Interpretation

Wechsler Intelligence Scale for Children (WISC-V)

Composite                     Standard Score    Percentile Rank
Verbal Comprehension Index    96                39
Visual Spatial                95                37
Fluid Reasoning               103               58
Working Memory                95                37
Processing Speed Index        97                42
Full Scale IQ                 97                42

The Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V) was administered to assess
Sean’s general intellectual abilities. Sean’s Full Scale IQ (97) and the Verbal Comprehension
(96), Visual Spatial (95), Fluid Reasoning (103), Working Memory (95), and Processing Speed (97)
index scores are all within the average range. There are no significant differences between index
scores, indicating no significant areas of cognitive strengths or weaknesses.

Expressive Vocabulary Test (EVT-2)

                Standard Score    Percentile Rank
EVT-2 Score     98                45

The Expressive Vocabulary Test, Second Edition (EVT-2) was administered to assess expressive
language and word retrieval. Items consist of tasks in which the examiner presents a picture
to the test taker, makes a statement or asks a question, and the test taker responds with a
one-word answer. Sean obtained a standard score of 98, which is in the average range. This
indicates that he is able to use a range of words typical of children his age. During the testing,
he displayed some difficulty with word retrieval and struggled to recall the exact word he was
looking for.

Peabody Picture Vocabulary Test (PPVT-4)

                Standard Score    Percentile Rank
PPVT-4 Score    90                25

The Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4) was administered to measure Sean’s
receptive (hearing) vocabulary. The test consists of items that require test takers to select one
picture (from a group of four pictures) that best represents the meaning of a word. Sean obtained
a PPVT-4 standard score of 90, which is in the low average range for his age. This indicates that
Sean has knowledge of basic vocabulary.

Summary

Sean was evaluated for a possible learning disability. In terms of cognitive abilities, Sean’s
scores are in the average range; there appear to be no cognitive issues interfering with his
ability to learn. Sean performed in the low average to average range on tests assessing expressive
and receptive language. He exhibits some weakness in rapid word retrieval, which is consistent with
teacher reports and observations made during the assessment. However, he does not meet the
criteria for a learning disability.

Recommendations

Sean’s difficulty with recalling words may interfere with his ability to manage classroom
assignments. He would benefit from regular classroom supports, such as allowing ample time when
Sean is required to respond to questions, allowing Sean to speak without feeling rushed or
being interrupted, and offering Sean prompts when he is struggling to recall a word.
Reports written for medical or forensic settings often require evaluators to
record a formal diagnosis from the Diagnostic and Statistical Manual of Mental Disorders,
Fifth Edition (DSM-5; American Psychiatric Association [APA], 2013). The diagnosis can be
written in sentence format.
outcomes for the individual being assessed. Recommendations vary based on the rea-
son for referral as well as the setting where the assessment takes place. For example,
recommendations for students in K–12 schools typically focus on behavioral interven-
tions, instructional strategies, or other appropriate educational services (Lichtenberger
et al., 2004). Recommendations for clients assessed in mental health clinics usually
center on treatment recommendations based on the client’s diagnosis. Note that some
assessment reports also contain an appendix that includes any additional information
or readings that the counselor wishes to share to help implement the recommenda-
tions—for example, information about a particular learning or emotional disorder, the
pharmaceutical treatment of a particular disorder, or a specific technique to use for
spelling instruction.
1. Find out exactly what information the recipient needs, what he or she plans to do
with it, and what qualifications he or she has.
2. Make sure ethical and legal procedures are followed, such as securing a client’s writ-
ten permission to release information.
3. Check to see whether procedures have been established for test information. Nor-
mally, a policy is already in force.
4. Aim the report as directly as possible to the particular question asked. This practice
saves time and provides clear communication of needed information.
Dear Parents,
Within the next few weeks, your child will be taking the Stanford Achievement Test Series, Tenth
Edition (Stanford 10). The Stanford 10 is a norm-referenced standardized test. This means that your
child’s scores will be compared to the scores of thousands of other children in the same grade who
also took the test. The Stanford 10 includes many of the subjects that your child is taught in school.
Please note that a student does not pass or fail the Stanford 10. Test results help show what a
student knows about a subject and how the teacher can best help the student to learn.
Included in this letter are several sample test items. We recommend that you share these sample
questions with your child so he or she will know what to expect. The teacher will also be reviewing
sample questions before the Stanford 10 is given.
You can help your child prepare for the test-taking experience by following some simple suggestions:
1. Each child should have a good night’s sleep prior to a testing day. We suggest 10 hours of sleep.
2. Each child should have a good, well-balanced breakfast before coming to school on a
testing day.
3. Conflicts and arguments should be avoided. A child’s emotional state has great influence
on performance.
4. Make sure your child understands that the Stanford 10 scores simply provide information.
Scores will not be used to reward or punish students.
If you have any questions or suggestions, please do not hesitate to contact the school office, in
person or by telephone. We have appreciated your assistance and support throughout the year.
Sincerely,
School Counselor
Summary
Communicating assessment results is the final step of the assessment process. Once the
assessment process is complete and results are analyzed and interpreted, professionals have the
responsibility of communicating the results to those individuals entitled to the information,
such as the client, parents (if the examinee is a minor), and other professionals involved with
the examinee (e.g., teachers, school administrators, mental health professionals).
Feedback sessions are meetings that counselors schedule with the client to disclose
assessment results. The sessions serve to review and clarify assessment results, to acquire
additional information relevant to the assessment process, to educate the client (and family)
about what the results mean, and to provide recommendations. Written assessment reports are
also used to communicate assessment findings. Reports integrate information obtained from
interviews with the client, test results, observations, and any information obtained from
collateral sources. The central objectives of assessment reports are to describe the individual
being assessed, answer the referral questions, organize and interpret the data, and recommend
interventions related to the original reason for referral.
Suggested Activities
1. Interview school counselors, mental health counselors, career counselors, or psychologists who
use tests frequently. Find out how they communicate assessment results to their clients and their
approaches to problems such as client acceptance of results, examinee readiness, negative results,
flat profiles, and examinee motivation and attitude. Report your findings to class.
2. Conduct simulated feedback sessions in which a student portrays a “counselor” communicating
test results to (a) a “parent” (also played by a student) and then to (b) an adult “client.”
Videotape the sessions.
3. Devise a workshop for a group of individuals in your field who need help in communicating test
results.
4. Role-play a situation in which you have to consult a helping professional on testing problems or
results. For example, a teacher might want to know how she can help her students improve
their test scores because she is afraid that she will lose her job if the class does not do well on
achievement tests, or a doctor might want to know whether a child shows any signs of learning
disabilities and how that information could be relayed to the child’s parents.
5. Read the following case study and answer the questions at the end:
Mary’s mother was very concerned when she received her second grader’s scores on the Stanford
Achievement Test. Mary scored at the 99th percentile on Total Mathematics but only at the
81st on Total Reading. Mary scored at the 95th percentile on Spelling but at the 45th on Language
and 54th on Science. She had been tested for the gifted program and was accepted by virtue of a
133 score on the Stanford–Binet Intelligence Scale. Mary’s mother can’t make any sense out of the
test scores and wants the teachers and counselors to help her daughter improve her scores.
a. How would you approach a feedback session with Mary’s mother?
b. What would you tell her about the test results?
6. Read the following case study and answer the questions at the end:
Albert, a 23-year-old male, would like help in understanding his results on the Self-Directed
Search (SDS) and the Myers–Briggs Type Indicator (MBTI). You find out that Albert dropped out
of school in the 10th grade and is enrolled in a high school equivalency program. He’s held
numerous jobs in the food service industry but has been unable to keep them. He has a wife and
three children and realizes that he needs further training and education to support his family. He
is an ISFJ on the MBTI and an ARS on the SDS.
a. How would you communicate Albert’s assessment results to him?
b. What would you tell him about the test results?
References
Ackerman, M. J. (2006). Forensic report writing. Journal of Clinical Psychology, 62, 59–72.
American Counseling Association. (2014). ACA Code of Ethics. Alexandria, VA: Author.
American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental
disorders (5th ed.). Washington, DC: Author.
Carr, A., & McNulty, M. (2014). Intake interviews, testing, and report writing. In A. Carr &
M. McNulty (Eds.), The handbook of adult clinical psychology: An evidence based practice
approach (pp. 253–288). New York, NY: Routledge.
Cohen, R. J., Swerdlik, M. E., & Sturman, E. D. (2012). Psychological testing and assessment:
An introduction to tests and measurement (8th ed.). Boston, MA: McGraw-Hill.
Donders, J. (2001). A survey of report writing by neuropsychologists, II: Test data, report
format, and document length. The Clinical Neuropsychologist, 15, 150–161.
Engelhard, G., Jr., & Wind, S. A. (2013). Educational testing and schooling: Unanticipated
consequences of purposive social action. Measurement: Interdisciplinary Research and
Perspectives, 11(1–2), 30–35.
Gati, I., & Amir, T. (2010). Applying a systemic procedure to locate career decision-making
difficulties. The Career Development Quarterly, 58(4), 301–320.
Goldfinger, K., & Pomerantz, A. M. (2013). Psychological assessment and report writing.
Thousand Oaks, CA: Sage.
Groth-Marnat, G., & Horvath, L. W. (2006). The psychological report: A review of current
controversies. Journal of Clinical Psychology, 62, 73–81.
Gunn, J., & Taylor, P. (2014). Forensic psychiatry: Clinical, legal, and ethical issues.
Boca Raton, FL: CRC Press.
Joint Committee on Testing Practices. (2004). Code of fair testing practices in education.
Washington, DC: Author.
Kamphaus, R. W., & Frick, P. J. (2005). Clinical assessment of child and adolescent
personality and behavior (2nd ed.). New York, NY: Springer.
Lichtenberger, E. O. (2006). Computer utilization and clinical judgment in psychological
assessment reports. Journal of Clinical Psychology, 62(1), 19–32.
Lichtenberger, E. O., Mather, N., Kaufman, N. L., & Kaufman, A. S. (2004). Essentials of
assessment report writing. Hoboken, NJ: John Wiley and Sons.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2011). Measurement and assessment in teaching
(11th ed.). Upper Saddle River, NJ: Pearson.
Reynolds, C. R., Livingston, R. B., Willson, V. L., & Willson, V. (2010). Measurement and
assessment in education. Upper Saddle River, NJ: Pearson.
Sattler, J. M., & Hoge, R. D. (2006). Assessment of children: Behavioral, social, and
clinical foundations (5th ed.). San Diego, CA: Jerome M. Sattler Publisher Inc.
Schwean, V. L., Oakland, T., Weiss, L. G., Saklofske, D. H., Holdnack, J. A., & Prifitera, A.
(2006). Report writing: A child-centered approach. In L. G. Weiss, D. H. Saklofske,
A. Prifitera, & J. A. Holdnack (Eds.), WISC-IV advanced clinical interpretation
(pp. 371–420). Boston, MA: Academic Press.
Urbina, S. (2014). Essentials of psychological testing (Vol. 4, 2nd ed.). Hoboken, NJ: John
Wiley & Sons.
Wiener, J., & Costaris, L. (2012). Teaching psychological report writing: Content and
process. Canadian Journal of School Psychology, 27(2), 119–135.
doi: 10.1177/0829573511418484
CHAPTER 17
Ethical and Legal Issues in Assessment
Professional standards and codes of ethics express the values on which counselors build
their practice and provide a framework for responsible test use. There are probably as
many codes of ethics as there are professional societies, but to become effective helping
professionals, individuals must be committed to the ethical standards of their profession
and follow them in their practice. In addition to ethical codes, a number of laws at both
the state and national level affect assessment and testing practices. Counselors need to be
familiar with the statutes, regulations, and court decisions that have implications for
assessment. Although this is the final chapter in the textbook, it is a topic that is integral
to each and every chapter. We considered placing this chapter toward the beginning of
the book, but we believe that you must develop an understanding of the basics of assess-
ment before you can understand how to ethically and legally apply your understanding.
Some of you may read this chapter earlier in a course on measurement and assessment,
whereas others will read it at the end of a graduate course. Regardless of when you read
the chapter, we strongly suggest integrating the material into your understanding of the
assessment process.
assessment.
• Explain the importance of professional training and competence in assessment and
• List and describe the judicial decisions affecting employment and educational
assessment.
396 Chapter 17 • Ethical and Legal Issues in Assessment
by an individual or group that provide the basis for right conduct. Most governing
bodies associated with psychological and educational assessment have established
codes of ethics for their members to follow. These codes provide guidelines for profes-
sionals but do not provide answers to all ethical dilemmas. Thus, it is up to individual
professionals in the field to reflect on their behavior and assess whether what they’re
doing is in the best interest of their clients. Several professional organizations have
ethical standards that relate specifically to assessment. Our discussion will center on
the following:
• American Counseling Association—Code of Ethics
• American Educational Research Association (AERA), the American Psychological
Association (APA), and the National Council on Measurement in Education
(NCME)—Standards for Educational and Psychological Testing
• American Psychological Association (APA)—Ethical Principles of Psychologists and
Code of Conduct
• Association for Assessment in Counseling (AAC)—Responsibilities of Users of Stand-
ardized Tests, Third Edition (RUST)
• Joint Committee on Testing Practices—Code of Fair Testing Practices in Education
• National Council on Measurement in Education—Code of Professional Responsibilities
in Educational Measurement
TABLE 17.1 ACA Code of Ethics Section E.1 to E.4: Evaluation, Assessment, and Interpretation
Counselors use assessment as one component of the counseling process, taking into account the clients’
personal and cultural context. Counselors promote the well-being of individual clients or groups of clients by
developing and using appropriate educational, mental health, psychological, and career assessments.
E.1. General
E.1.a. Assessment
The primary purpose of educational, mental health, psychological, and career assessment is to gather infor-
mation regarding the client for a variety of purposes, including, but not limited to, client decision making,
treatment planning, and forensic proceedings. Assessment may include both qualitative and quantitative
methodologies.
E.1.b. Client Welfare
Counselors do not misuse assessment results and interpretations, and they take reasonable steps to prevent
others from misusing the information provided. They respect the client’s right to know the results, the inter-
pretations made, and the basis for counselors’ conclusions and recommendations.
Source: Reprinted from the ACA Code of Ethics. Copyright © 2014 The American Counseling Association. Reprinted
with permission. No further reproduction authorized without written permission from the American Counseling
Association.
familiar with the guidelines. The current edition of the Standards is organized into
three parts:
1. Foundations This section contains standards for validity, reliability/precision and
errors of measurement, and fairness in testing. The subsections discuss standards
related to test development and revision; scaling, norming, and score comparability;
test administration, scoring, and reporting; and supporting documentation for tests.
According to the Standards, “validity is … the most fundamental consideration in
developing tests and evaluating tests” (AERA et al., 2014, p. 11). As such, the Standards
addresses the different types of validity evidence needed to support test use. In addi-
tion, standards on reliability and errors of measurement address the issue of consist-
ency of test scores. Although the Standards supports standardized procedures, it
recognizes the need to consider fairness and testing. According to the Standards,
special situations may arise in which modifications of the procedures may be advis-
able or legally mandated; for example, a person with a vision impairment may need
a larger print version of a standardized test. Standards for the development and revi-
sion of formal, published instruments, an often overlooked area of importance,
describe criteria important for scale construction.
The first section of the Standards focuses heavily on fairness and bias, the rights
and responsibilities of test takers, testing individuals of diverse linguistic
backgrounds, and testing individuals with disabilities. This section emphasizes the
importance of fairness in all aspects of testing and assessment. According to the Standards,
fairness is a foundational construct that impacts validity. Fairness should be consid-
ered in all aspects of test use. Special attention to issues related to individuals of
diverse linguistic backgrounds or with disabilities may be needed when developing,
administering, scoring, interpreting, and making decisions based on test scores.
2. Operations The second section of the Standards relates to test design and development;
scores, scales, norms, score linking, and cut scores; test administration, scoring,
reporting, and interpretation; supporting documentation for tests; the rights and
responsibilities of test takers; and the rights and responsibilities of test users. This
section of the Standards addresses the more technical and psychometric aspects of
assessment. Professional counselors should carefully review Part II and develop a
strong understanding of how these standards apply in practice.
3. Testing Applications The third section of the Standards provides an overview of psy-
chological testing and assessment; workplace testing and credentialing; educational
testing and assessment; and uses of tests for program evaluation, policy studies, and
accountability. This section provides specific guidance on the general responsibilities
of test users in each area. One of the new elements of the Standards is accountability.
This final section provides specific guidance for test users on evaluation of programs
and policy initiatives, test-based accountability systems, and issues in program and
policy evaluation and accountability.
The first standard (Standard 9.01) states that psychologists should base recommenda-
tions on information and techniques sufficient to substantiate their findings. Standard 9.02
addresses the importance of using valid and reliable assessment techniques as evidenced by
research. The third standard states that psychologists must obtain informed consent when
using assessment techniques; this includes explaining the nature and purpose of the assess-
ment, fees, involvement of third parties, and limits of confidentiality. The fourth standard
asserts that psychologists must not release a client’s test results unless the client gives per-
mission; in the absence of client permission, psychologists provide test data only as required
by law or court order. Standard 9.05 refers to ethical procedures involved in test
construction. The sixth standard addresses test interpretation and psychologists’ need to
explain results in language that can be understood by the individual being assessed. The
seventh standard emphasizes psychologists’ responsibility not to promote the use of
psychological assessment techniques by unqualified examiners. Standard 9.08 emphasizes the
importance of not using obsolete tests or outdated test results; that is, psychologists must
refrain from basing their assessment, intervention decisions, or recommendations on out-
dated test results and measures that are not useful for the current purpose. Standards 9.09
and 9.10 refer to scoring and interpreting tests and explaining assessment results. Individu-
als offering assessment or scoring services to other professionals have the obligation to
make sure their procedures are appropriate, valid, and reliable. In explaining assessment
results, psychologists must ensure that explanations are given by appropriate individuals
or services. The 11th and final standard holds the psychologist responsible for making rea-
sonable efforts to maintain the integrity and security of tests and other assessment tech-
niques consistent with the law, contractual obligations, and the code of ethics.
includes members from the ACA, the AERA, the APA, the American Speech-Language-
Hearing Association (ASHA), the National Association of School Psychologists (NASP),
the National Association of Test Directors (NATD), and the National Council on Measure-
ment in Education (NCME). The code emphasizes fairness as the primary consideration in
all aspects of testing. Professionals are obligated to provide and use tests that are fair to all
test takers, regardless of age, gender, disability, race, ethnicity, national origin, religion,
sexual orientation, linguistic background, or other personal characteristics. In this regard,
the Code provides guidance for test developers and test users in four areas:
1. Developing and selecting appropriate tests
2. Administering and scoring tests
3. Reporting and interpreting test results
4. Informing test takers
and interpret tests. Different tests require different levels of competency. Some tests that
require a high level of skill include the Wechsler scales, the Thematic Apperception Test, and
the Rorschach. The professional standards of several professional associations have set explicit
guidelines for the competencies of test users; the following is a summary of these guidelines:
Test-User Qualifications
According to the International Test Commission (2013), competence is a key facet in
qualifying individuals for test use. The concept of test-user qualifications refers to the nec-
essary knowledge, skills, abilities, training, and credentials for using tests (Oakland,
2012). The issue of test-user qualifications remains controversial. Although some profes-
sional groups advocate restricting the use of psychological tests to psychologists only,
others believe firmly that one’s qualifications to use tests are directly related to
competence, not to a specific professional field, and that competence can be achieved through
various means, such as education, training, and experience in the use of tests. Internationally,
competence-based assessment practice is considered the standard over specific licenses
or degrees. According to the International Test Commission (ITC), responsible test users
work within the limits of scientific principle and substantiated experience; set and main-
tain high personal standards of competence; know the limits of their own competence
and operate within those limits; and keep up with relevant changes and advances relat-
ing to the tests they use and to test development, including changes in legislation and
policy, which may impact on tests and test use (2013). The ACA (2003) also developed a
set of standards for test user qualification, the Standards for the Qualifications of Test Users,
which states that professional counselors are qualified to use tests and assessments in
counseling practice to the degree that they possess the appropriate knowledge and skills
(see Table 17.2).
The purchase of tests is generally restricted to persons who meet certain minimum
qualifications. Most test publishers rely on a three-level system for classifying test-user
qualifications that was first developed by APA in 1950. The APA later dropped the classi-
fication system in 1974, but many publishers continue to use this or a similar system. The
classification includes the following levels:
A Level Test users are not required to have advanced training in the test administra-
tion and interpretation to purchase A-level tests. They may have a bachelor’s degree
in psychology, human services, education, or related disciplines; training or certifica-
tion relevant to assessment; or practical experience in the use of tests. Examples of
A-level tests include some aptitude and career exploration tests.
B Level To use B-level tests, practitioners typically have a graduate degree in psy-
chology, counseling, education, or related disciplines; have completed specialized
training or coursework in testing; or have licensure or certification documenting
training and experience in testing. In addition, being a member of a professional
organization, such as ASHA or the American Occupational Therapy Association
(AOTA), may make one eligible to purchase B-level products. Examples of B-level
tests include general intelligence tests and interest inventories.
C Level C-level tests require users to have B-level qualifications plus a doctorate
degree in psychology or a related discipline (that provides appropriate training in
the administration and interpretation of tests), licensure or certification, or to be
under the direct supervision of a qualified professional in psychology or a related
field. Examples of C-level tests include intelligence tests, personality tests, and
projective measures (e.g., the Wechsler Intelligence Scale for Children [WISC-IV],
the Minnesota Multiphasic Personality Inventory [MMPI-2], the Rorschach Inkblot
Test).
TABLE 17.2 ACA Standards for the Qualifications of Test Users (Continued)
6. Knowledge of the impact of diversity on testing accuracy, including age, gender, ethnicity, race,
disability, and linguistic differences
Professional counselors using tests should be committed to fairness in every aspect of testing. Information
gained and decisions made about the client or student are valid only to the degree that the test accurately
and fairly assesses the client’s or student’s characteristics. Test selection and interpretation are done with
an awareness of the degree to which items may be culturally biased or the norming sample not reflective
or inclusive of the client’s or student’s diversity. Test users understand that age and physical disability dif-
ferences may impact the client’s ability to perceive and respond to test items. Test scores are interpreted
in light of the cultural, ethnic, disability, or linguistic factors that may impact an individual’s score. These
include visual, auditory, and mobility disabilities that may require appropriate accommodation in test
administration and scoring. Test users understand that certain types of norms and test score interpretation
may be inappropriate, depending on the nature and purpose of the testing.
7. Knowledge and skill in the professionally responsible use of assessment and evaluation practice
Professional counselors who use tests act in accordance with the ACA’s Code of Ethics and Standards of
Practice (1997), Responsibilities of Users of Standardized Tests (AAC, 2003), Code of Fair Testing Practices
in Education (JCTP, 2002), Rights and Responsibilities of Test Takers: Guidelines and Expectations (JCTP,
2000), and Standards for Educational and Psychological Testing (AERA et al., 1999). In addition, profes-
sional school counselors act in accordance with the American School Counselor Association’s (ASCA’s)
Ethical Standards for School Counselors (ASCA, 1992). Test users should understand the legal and ethical
principles and practices regarding test security, using copyrighted materials, and unsupervised use of as-
sessment instruments that are not intended for self-administration. When using and supervising the use of
tests, qualified test users demonstrate an acute understanding of the paramount importance of the well-
being of clients and the confidentiality of test scores. Test users seek ongoing educational and training
opportunities to maintain competence and acquire new skills in assessment and evaluation.
Source: Reprinted from Standards for the Qualifications of Test Users, Copyright © 2003 The American Counseling
Association. Reprinted with permission. No further reproduction authorized without written permission from the
American Counseling Association.
Professionals need to stay informed of the major laws that affect assessment. We begin
with statutes and regulations that have implications for assessment and then present
several judicial decisions that arose from litigation.
Americans with Disabilities Act of 1990 The Americans with Disabilities Act (ADA) was
passed by Congress and signed into law on July 26, 1990. The law was passed for the
purpose of reducing discrimination and making everyday life more accessible to the
over 43 million Americans who have some form of disability. The ADA defines a disabil-
ity as a physical or mental impairment that substantially limits a major life activity
(U.S.C. Title 42 Chapter 126 Section 12102 [2]). The ADA states that an employment
agency or labor organization shall not discriminate against individuals with a disability.
This applies to job application procedures, hiring, advancement and discharge of
employees, worker’s compensation, job training, and other terms, conditions, and privi-
leges of employment.
The law has certain provisions related to employment assessment. Under the reason-
able accommodations section, it states the following:
In other words, employers cannot select and administer an employment test if a par-
ticular disability adversely affects an individual’s performance on that test. This means that
to comply with the ADA, individuals with disabilities must be assessed using “reasonable
accommodations”—that is, appropriate changes and adjustments in test-administration
procedures. It is important to remember that when any modification is made to a stand-
ardized test, results should be interpreted cautiously, recognizing that modification can
jeopardize validity. Examples of modifications include the following:
• Extending testing time
• Providing written materials in large print, Braille, or audiotape
• Providing readers or sign language interpreters
• Holding test administration in accessible locations
• Using assistive devices
Civil Rights Act of 1991 Title VII of the Civil Rights Act of 1964 (amended in 1971, 1978,
and 1991) outlaws discrimination in employment based on race, color, religion, gender,
pregnancy, or national origin. The original legislation created the Equal Employment
Opportunity Commission (EEOC), which was charged with developing guidelines to regulate
equal employment. In the 1970s, the EEOC developed strict guidelines involving the use of
employment tests. The guidelines stated that the use of any formal assessment instrument
for employment decisions that may adversely affect hiring, promotion, or other employment
opportunity for classes protected by Title VII constitutes discrimination unless the test
can be shown to be “a reasonable measure of job performance” (i.e., validity).
An example of problems with employment tests and potential discrimination is the
landmark case of Griggs v. Duke Power Company (1971). The case involved African American
employees of a private power company who filed suit against the company, claiming
that the criteria for promotion, such as requiring high school diplomas and passing scores
on standardized tests (i.e., the Wonderlic Personnel Test and the Bennett Mechanical
Comprehension Test), were discriminatory. The U.S. Supreme Court decided that Duke Power
violated Title VII of the Civil Rights Act because the standardized testing requirement
prevented a disproportionate number of African American employees from being hired
by, and advancing to higher-paying departments within, the company. Validity of the
tests was a key issue in this case, because neither of the tests was shown to be significantly
related to successful job performance; as a result, the case spurred stronger focus on the
validity of employment tests.
Family Educational Rights and Privacy Act of 1974 The Family Educational Rights and
Privacy Act (FERPA) of 1974 is a federal law that protects the privacy of student records.
FERPA gives parents certain rights with respect to their children’s education records, such
as the right to examine their children’s academic records and stipulate the terms under
which others may have access to them. If there is assessment information in the records,
then parents have a right to see these results as well.
learning disabilities (SLDs). School districts are no longer required to use the traditional
ability-achievement discrepancy model, which was a component of IDEA of 1997. IDEA now
mandates that schools utilize several scientifically based assessments and instructional
and behavioral interventions to determine whether students have SLDs, thereby qualify-
ing them for special education services. As a result, many states and school districts have
adopted an alternative model called Responsiveness to Intervention, or RTI. RTI is a compre-
hensive, multistep approach to providing services and interventions to students at increas-
ing levels of intensity based on progress monitoring and data analysis. RTI is fully described
in Chapter 14.
Health Insurance Portability and Accountability Act of 1996 The Health Insurance
Portability and Accountability Act (HIPAA) was enacted by the U.S. Congress in 1996. The
act has three main purposes: to guarantee insurance portability, to increase protection
against fraud in the insurance industry, and to institute new regulations regarding the
security and privacy of health information.
The HIPAA Privacy Rule (45 CFR Parts 160 and 164; see §164.501) establishes a minimum
level of privacy protection for health care information. For professionals involved
with assessment, the new privacy regulations are most relevant. The privacy regulations
establish that personal health information (which includes assessment information) must
be kept confidential. The regulations are designed to safeguard the privacy and confiden-
tiality of an individual’s health information, particularly in this age of electronic transmis-
sion of information (U.S. Department of Health & Human Services, 2014). The regulations
define the rights of individuals, the administrative obligations of covered entities (i.e.,
health-care providers, health plans, health-care clearinghouses), and the permitted uses
and disclosures of protected health information. In general, the privacy rule requires prac-
titioners to act as follows:
• Provide clients with information about their right to privacy and how any informa-
tion collected may be used.
• Adopt and follow procedures for protecting privacy.
• Conduct training of all staff to ensure that they understand and can accurately apply
the privacy standards.
• Appoint a designated individual to monitor adherence to the adopted privacy stand-
ards and HIPAA.
• Ensure that patient records are maintained in compliance with HIPAA standards.
No Child Left Behind Act of 2001 No Child Left Behind (NCLB) contained the most
sweeping changes to the Elementary and Secondary Education Act (ESEA) since it was
enacted in 1965. It changed the federal government’s role in K–12 education by requiring
America’s schools to describe their success in terms of what each student accomplishes.
The NCLB contains four basic education reform principles: stronger accountability,
increased flexibility and local control, expanded options for parents, and an emphasis on
teaching methods that have been proven to work. NCLB significantly raises expectations
for states, local school systems, and individual schools, in that all students are expected to
meet or exceed state standards in reading and mathematics within 12 years. NCLB requires
all states to establish state academic standards and a state testing system that meet federal
requirements.
As a result of NCLB, states have created policies to reward schools that score well on
high-stakes tests. Merit-based awards, clear accountability and public visibility, and finan-
cial incentives for educators are purported benefits of high-stakes tests. However, many
criticize the use of these tests in education (further discussion in Chapter 14).
Carl D. Perkins Vocational and Technical Education Act of 2006 The Carl D. Perkins
Vocational and Technical Education Act provides federal funding and guidance for career
and technical education, with a focus on student achievement and preparing students for
careers and postsecondary education. The 2006 reauthorization called for increased focus
on the academic achievement of career and technical education students, strengthened the
connections between secondary and postsecondary education, and improved state and
local accountability. The accountability aspect of this new law means that, for the first
time, career and technical education programs will be held accountable for continuous
improvement in performance, measured by the academic proficiency of their students.
Success will be determined through valid and reliable tests, including No Child Left
Behind assessments in reading, math, and science.
Judicial Decisions
Judicial decisions are laws created by opinions from the court, often in litigation cases. Most
of the judicial decisions affecting assessment involve employment and educational tests.
• Sharif v. New York State Education Department (1989) concerned the use of Scholastic
Aptitude Test (SAT) scores as the sole basis for awarding state merit scholarships.
The plaintiffs claimed that the state was discriminating against girls who were com-
peting for the award. The court ruled that New York could not use the SAT scores
alone as a basis for awarding scholarships and needed to have other criteria, such as
grades or statewide achievement test data.
Judicial Decisions Involving Employment Tests Employment is another area that has
received legal attention for assessment practices. The following is a list of noteworthy judi-
cial decisions related to employment assessment:
• Griggs v. Duke Power Company (1971) decided that Duke Power Company violated
Title VII of the 1964 Civil Rights Act by requiring a high school diploma and two
written tests of applicants for jobs as laborers. The U.S. Supreme Court decided that
Duke Power violated Title VII of the Civil Rights Act because the standardized test-
ing requirement prevented a disproportionate number of African American employ-
ees from being hired by, and advancing to higher-paying departments within, the
company. Validity of the tests was a key issue in this case, because neither of the tests
was shown to be significantly related to successful job performance.
• Washington v. Davis (1976) was a case that dealt with the issue of bias in selection
tests. In this case, two African Americans filed suit against the District of Columbia’s
police department for using a selection test (a test of verbal skills) that dispropor-
tionately screened out African American applicants. The U.S. Supreme Court ruled
against the workers and stated that an official action (in this case, the use of a selec-
tion test) could not be determined unconstitutional solely because it results in a
racially disproportionate impact. The Court instead relied on whether or not the
police department intended to discriminate against African American job appli-
cants. The Court found no intent to discriminate; therefore, it ruled in favor of the
police department.
• Regents of the University of California v. Bakke (1978) was a landmark decision by the U.S. Supreme Court on
affirmative action. It ruled that the use of racial “quotas” for university admissions
was unconstitutional, but approved affirmative action programs that give equal
access to minorities.
• Golden Rule Insurance Company v. Richard L. Mathias (1984) was a landmark lawsuit
that charged that the insurance agent licensing exam made by the Educational Test-
ing Service (ETS) was racially biased and not job related. The case was settled out of
court when ETS agreed to construct future tests by (a) including items for which the
percentage of correct responses for all test takers is at least 40% and (b) considering
any item biased if the difference in the percentage of correct answers between White
and African American test takers exceeded 15%.
• Contreras v. City of Los Angeles (1981) held that the employer’s burden is satisfied by
showing it used professionally acceptable methods and that the test was predictive
or significantly correlated with important elements of work behavior that comprised
or were relevant to the job.
• Berkman v. City of New York (1987) struck down an arbitrary conversion process to
enhance scores for female firefighter applicants, even though the underlying tests did
not predict job performance.
• Watson v. Fort Worth Bank and Trust (1988) ruled that adverse impact analysis can be
applied to subjective as well as objective selection criteria. The plaintiff must
identify a specific criterion as producing an
adverse impact and show reliable and probative statistical evidence to support an
inference of discrimination. The employer needs only to offer a legitimate business
reason for the criterion. Employers are not required, even when defending standard-
ized tests, to introduce formal validation studies showing that particular criteria pre-
dict actual on-the-job performance.
• Wards Cove Packing Company v. Atonio (1989) reversed the impact of Griggs v. Duke
Power (1971). A much more conservative Supreme Court agreed that, yes, employ-
ment tests must be substantially job related, but that employers could require tests
that went beyond what was essential for the job as part of “business necessity.”
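The two item-screening criteria from the Golden Rule settlement described above can be written as a simple check. The sketch below is illustrative only: the record layout, function name, and sample percentages are hypothetical, and it assumes the 15% threshold means an absolute difference in percentage points, which the summary of the settlement does not specify.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Percentage of correct answers on one test item (hypothetical layout)."""
    item_id: str
    pct_correct_all: float    # all test takers, on a 0-100 scale
    pct_correct_white: float  # White test takers, on a 0-100 scale
    pct_correct_black: float  # African American test takers, on a 0-100 scale


def screen_item(item, min_pct=40.0, max_gap=15.0):
    """Apply the two settlement criteria to a single item.

    (a) The item meets the difficulty floor if at least `min_pct` percent
        of all test takers answered it correctly.
    (b) The item is flagged as biased if the gap between the two groups
        exceeds `max_gap` percentage points (absolute gap is an assumption).
    """
    gap = abs(item.pct_correct_white - item.pct_correct_black)
    return {
        "item_id": item.item_id,
        "meets_difficulty_floor": item.pct_correct_all >= min_pct,
        "flagged_as_biased": gap > max_gap,
        "gap": gap,
    }


# Made-up item statistics, for illustration only.
items = [
    ItemStats("Q1", 72.0, 80.0, 70.0),
    ItemStats("Q2", 55.0, 68.0, 50.0),
    ItemStats("Q3", 35.0, 38.0, 30.0),
]
results = [screen_item(item) for item in items]
for row in results:
    print(row)
```

With these made-up numbers, Q1 passes both criteria, Q2 is flagged because its group gap is above 15 points, and Q3 falls below the 40% floor.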
Summary
Professional standards and codes of ethics ensure ethical and professional behavior by
counselors and other helping professionals who engage in assessment. Laws and court
decisions have had an impact on assessment practice. The overall effect has been fairer
tests and testing practices for diverse groups. The role of tests is constantly being
redefined by the courts. Professionals need to be guided by the code of ethics of their
organization, be familiar with laws and court interpretations relating to testing, and
be careful, critical consumers of assessment practices and procedures.
Codes of ethics and standards of practice are essential elements of fair assessment.
There are similarities among the codes. Some of the common principles are competence,
integrity, treating clients with respect and dignity, accepting responsibility, and
concern for the welfare of others. Those involved in testing need to have competency in
administering the test selected, provide an explanation for using the test, and discuss
the test results in a language test takers can understand. They must get informed
consent prior to administering the test. Counselors need to be familiar with techniques
and procedures that are appropriate for use with clients from other ethnic and cultural
groups. Practitioners also need to be aware of and respect the beliefs, attitudes,
knowledge, and skills of all individuals they work with. This requires counselors to
understand their own assumptions, biases, and values.
Suggested Activities
1. Conduct a content analysis of two or three of the codes of ethics as they relate to
testing. How are they alike? How are they different?
2. Stage a mock trial on one of the major issues in testing, such as due process,
appropriateness of certain tests for a particular function, misuse of tests,
appropriateness of a certain test for a minority group, and the like.
3. Discuss some of the cases presented in the section on responsible test use.
4. Research to see if any cases in your local court system have involved assessment
issues.
5. Read the following case study and answer the questions at the end:
A private liberal arts college was working toward getting regional accreditation. The
school had usually admitted a large percentage of students that were lower performers in
their high school class, including any student who had a 2.0 grade point average (GPA)
on a 4.0 scale. The college did not require the SAT or ACT. The admissions committee was
under pressure to increase the academic respectability of the college by changing the
standards to require a minimum score of 900 on the SAT Reasoning test. The committee was
told that enrollment would increase if the school used an established assessment test
and that fewer students would drop out.
a. What are the assessment issues in the situation?
b. What factors or assessment practices are involved?
c. If you were a consultant invited by the college to help the admissions committee in
the selection and retention of students, what would you advise?
6. Read the following case study and answer the questions at the end:
To remain competitive, Memorial Hospital has decided it needs to cut its budget by
downsizing semiskilled workers, such as orderlies, custodians, cafeteria personnel,
stockroom clerks, file room clerks, and so on. The hospital would like to help these
workers qualify for higher-level jobs so that they can remain with the organization.
From the attrition rate, the hospital administrators know they will need workers with
advanced technical skills and will have to recruit from the outside to fill these
positions if they have no one qualified internally. The personnel department has decided
to give all the targeted workers who will lose their positions the Wide Range
Achievement Test (WRAT-III) and the Wonderlic Personnel Test-Revised (WPT-R) and select
those with the highest scores to be retrained. Of the workers, 80% are women and
minority group members.
a. What are the ethical and legal issues related to this case?
b. If you were a consultant hired by the hospital to help identify workers to be
retrained and conduct outplacement counseling for those who are to be let go, what would
you advise Memorial to do?
References
American Counseling Association. (2014). ACA Code of Ethics. Alexandria, VA: Author.
American Counseling Association. (2014). Standards for qualifications of test users.
Alexandria, VA: Author.
American Educational Research Association (AERA), American Psychological Association
(APA), & National Council on Measurement in Education (NCME). (2014). Standards for
educational and psychological testing. Washington, DC: Authors.
American Psychological Association (APA). (2002). Ethical principles of psychologists
and code of conduct (Rev. ed.). Washington, DC: Author.
Americans With Disabilities Act of 1990, Pub. L. No. 101-336, 104 Stat. 328 (1990).
Association for Assessment and Research in Counseling (AARC). (2003). Responsibilities
of users of standardized tests (RUST). Alexandria, VA: Author.
Code of Fair Testing Practices in Education. (2004). Washington, DC: Joint Committee on
Testing Practices.
International Test Commission (ITC). (2013). International Guidelines for Test Use.
Retrieved from http://www.intestcom.org/itc_projects.htm
National Council on Measurement in Education (NCME). (1995). Code of professional
responsibilities in educational measurement. Washington, DC: Author.
Oakland, T. (2012). Principles, standards, and guidelines that impact test development
and use and sources of information. In Leach, M. M., Stevens, M. J., Ferrero, A.,
Korkut, Y., & Lindsay, G. (Eds.), Oxford international handbook of psychological ethics
(pp. 201–215). Oxford, UK: Oxford University Press.
U.S. Department of Health and Human Services. (2014). HIPAA privacy rule and sharing
information related to mental health. Retrieved from
http://www.hhs.gov/ocr/privacy/hipaa/understanding/special/mhguidance.html
Appendix I
Standards for Multicultural Assessment, Fourth Revision, 2012
The Standards
Advocacy
Culturally competent professional counselors recognize the importance of social justice advocacy;
they integrate understanding of age, gender, ability, race, ethnic group, national origin, religion,
sexual orientation, linguistic background, and other personal characteristics in order to provide
appropriate assessment and diagnostic techniques.
Professional counselors should:
• Recognize in themselves and others, subtle biases and the way these biases influence
and impact the assessment process for marginalized populations.
• Seek opportunities for learning by immersion into marginalized populations in order
to gain understanding of clients’ worldview and the impact on the assessment process.
• Support use of assessments with psychometric properties appropriate for individuals
and vulnerable groups and create awareness about assessment of culturally
diverse clients.
• Provide culturally competent and effective practices in all areas of counseling and
assessment in individual, family, school, and community settings.
• Work collaboratively with community leaders to understand and address the needs
of diverse clients providing opportunities to access services if needed.
• Address systemic barriers and consider how these barriers impact the interpretation
and use of assessment results.
• Be knowledgeable of potential bias in assessment instruments and use procedures
that comply with ethical guidelines when assessing marginalized populations.
• Are responsible for the appropriate applications, scoring, interpretations, and use of
assessment instruments relevant to the needs of clients, whether they score and
interpret such assessments themselves or use technology or other services.
• Take reasonable measures to ensure the proper use of psychological and assessment
techniques by persons under their supervision, that results are kept confidential and
that results are not misused by others.
• Understand how to select and utilize appropriate modified forms of tests for test
takers with disabilities who need special accommodations.
• Select assessments that help identify client needs, strengths and resources for client
empowerment and self-advocacy.
• Select instruments with which they are trained and are competent to use and adhere to
the ethical standards for the administration, scoring, interpretation, or reporting
procedures and ensure that persons under their supervision are aware of these standards.
• Recognize the impact of cultural identity on test administration and interpretation,
and place test results in proper perspective with other relevant factors.
• Recognize how the effects of stigma, oppression, and discrimination impact the
interpretation and application of assessment results for culturally diverse clients.
• Recognize and collaborate with others to eliminate biases, prejudices, and
discriminatory contexts in conducting evaluations, interpretations and providing interventions.
• Explain the nature and purpose of assessment and specific use of results in an
understandable, developmental level of the client or the client’s legally authorized
representative providing information about the impact of culture on assessment results
and interpretation.
• Consider other factors present in the client’s situation (e.g., disability or cultural
factors or systematic or internalized oppression) before making any recommendations,
when relevant.
• Do not use data or results from assessments that are obsolete or outdated and make
every effort to prevent the misuse of obsolete measures and assessment data by others.
• Release assessment data in which clients are identified only with the consent of
clients or their legal representatives, or court order and only released to professionals
recognized as qualified to interpret the data.
References
American Art Therapy Association. (2011). Art therapy multicultural/diversity
competencies. Retrieved from
http://www.arttherapy.org/upload/multiculturalcompetencies2011.pdf
American Counseling Association. (2005). Code of ethics and standards of practice of the
American Counseling Association. Retrieved from
http://www.counseling.org/resources/codeofethics/TP/home/ct2.aspx
Association for Lesbian, Gay, Bi-Sexual, Transgender Issues in Counseling. (2010).
Competencies for counseling LGBTQ clients and competencies for counseling transgender
clients. Retrieved from http://www.algbtic.org/resources/competencies.
Association for Multicultural Counseling and Development. (1996). AMCD multicultural
counseling competencies. Retrieved from http://www.multiculturalcounseling.org.
American Psychological Association. (2004). Code of Fair Testing Practices in Education.
Retrieved from http://www.apa.org/science/programs/testing/fair-testing.pdf.
Council on Accreditation of Counseling and Related Educational Programs. (2009). 2009
standards. Retrieved from
http://www.cacrep.org/doc/2009%20Standards%20with%20cover.pdf.
Counselors for Social Justice Position Statement on Academic Achievement Gap and Equity
on Educational Services. (2008). Retrieved from
http://counselorsforsocialjustice.com/CSJ_Position_-_Academic_Achievement_Gap.pdf.
Council on Rehabilitation Education. (2012). Accreditation manual for masters’ level
rehabilitation counselor education programs. Retrieved from
http://www.core-rehab.org/Files/Doc/PDF/COREStandardsPrograms.pdf.
Commission on Rehabilitation Counselor Certification. (2009). Code of professional
ethics for rehabilitation counselors. Retrieved from
http://www.crccertification.com/filebin/pdf/CRCCodeOfEthics.pdf.
Lewis, Arnold, House, and Toporek. (2003). ACA advocacy competencies. Retrieved from
http://www.counseling.org/resources/competencies/advocacy_competencies.pdf.
National Association of Alcoholism and Drug Abuse Counselors. (2011). Code of ethics.
Retrieved from http://www.naadac.org/membership/code-of-ethics.
National Association of School Psychologists. (2010). National Association of School
Psychologists’ model for comprehensive and integrated school psychological services.
Retrieved from http://www.nasponline.org/standards/2010standards/2_PracticeModel.pdf.
National Board for Certified Counselors. (2005). Code of ethics. Retrieved from
http://nbcc.org/Assets/Ethics/nbcc-codeofethics.pdf.
National Career Development Association. (2010). Career counselor assessment and
evaluation competencies. Retrieved from
http://associationdatabase.com/aws/NCDA/asset_manager/get_file/18143/aace-ncda_assmt_eval_competencies.
Association for Assessment and Counseling. (2003). Responsibilities of users of
standardized tests. Retrieved from http://www.theaaceonline.com/rust.pdf.
Texas Professional Educational Diagnosticians Board of Registry. (2010). Best practice
guidelines. Retrieved from http://regped.com/pdf/TPEDBestPract.pdf.
2011–2012 Executive Council:
• President: Danica G. Hays, Old Dominion University
• President-Elect: Carl Sheperis, Lamar University
• Past President: Joshua Watson, Mississippi State University-Meridian,
jwatson@meridian.msstate.edu
• Secretary: Casey Barrio-Minton, University of North Texas
• Treasurer: Stephanie Crockett, Oakland University
• Governing Council Representative: Joshua Watson, Mississippi State University-Meridian
• MAL Publications: Dale Pietzrak, University of South Dakota
• MAL Awards: Susan Carmichael
• MAL Membership: Amy McLeod, Argosy University-Atlanta
• Graduate Student Representative: Jayne Smith, Old Dominion University
Multicultural Assessment Standards Revision Task Force: Dr. Linda Foster, Chair
(Foster_lh@mercer.edu, Mercer University); Members: Dr. Gabriel Lomas, Danica Hays
(DHays@odu.edu), Michael Becerra, and Anita Neuer Colburn
Appendix II
Responsibilities of Users of Standardized Tests (RUST; 3rd Ed.)
Technical Knowledge
Responsible use of tests requires technical knowledge obtained through training, educa-
tion, and continuing professional development. Test users should be conversant and com-
petent in aspects of testing, including:
• Validity of Test Results: Validity is the accumulation of evidence to support a
specific interpretation of the test results. Since validity is a characteristic of test
results, a test may have validities of varying degree for different purposes. The
concept of instructional validity relates to how well the test is aligned to state
standards and classroom instructional objectives.
• Reliability: Reliability refers to the consistency of test scores. Various methods are
used to calculate and estimate reliability depending on the purpose for which the test
is used.
• Errors of Measurement: Various ways may be used to calculate the error associated
with a test score. Knowing this and knowing the estimate of the size of the error
allows the test user to provide a more accurate interpretation of the scores and to
support better-informed decisions.
• Scores and Norms: Basic differences between the purposes of norm-referenced and
criterion-referenced scores impact score interpretations.
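The point about errors of measurement can be made concrete with a short calculation. The sketch below assumes the classical test theory formula for the standard error of measurement (the standard deviation multiplied by the square root of one minus the reliability coefficient), which is one common way to estimate the error and is not prescribed by the RUST statement itself. The function names and sample values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed_score, sem, z=1.96):
    """Band of +/- z SEMs around an observed score (z=1.96 for roughly 95%)."""
    margin = z * sem
    return (observed_score - margin, observed_score + margin)


# Hypothetical test: standard deviation of 15, reliability coefficient of .91.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
low, high = confidence_band(observed_score=100.0, sem=sem)
print(f"SEM = {sem:.2f}; 95% band = {low:.1f} to {high:.1f}")
```

Under these assumed values the SEM is about 4.5 points, so an observed score of 100 is better reported as a range of roughly 91 to 109 than as a single exact number.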
Test Selection
Responsible use of tests requires that the specific purpose for testing be identified. In addi-
tion, the test that is selected should align with that purpose, while considering the charac-
teristics of the test and the test taker. Tests should not be administered without a specific
purpose or need for information. Typical purposes for testing include:
• Description: Obtaining objective information on the status of certain characteristics,
such as achievement, ability, personality types, and so on, is often an important use of
testing.
• Accountability: When judging the progress of an individual or the effectiveness of
an educational institution, strong alignment between what is taught and what is
tested needs to be present.
• Prediction: Technical information should be reviewed to determine how accurately
the test will predict areas such as appropriate course placement; selection for special
programs, interventions, and institutions; and other outcomes of interest.
• Program Evaluation: The role that testing plays in program evaluation and how the
test information may be used to supplement other information gathered about the
program is an important consideration in test use.
Proper test use involves determining if the characteristics of the test are appropriate for the
intended audience and are of sufficient technical quality for the purpose at hand. Some
areas to consider include:
• The Test Taker: Technical information should be reviewed to determine if the test
characteristics are appropriate for the test taker (e.g., age, grade level, language,
cultural background).
• Accuracy of Scoring Procedures: Only tests that use accurate scoring procedures
should be used.
• Norming and Standardization Procedures: Norming and standardization procedures
should be reviewed to determine if the norm group is appropriate for the
intended test takers. Specified test administration procedures must be followed.
• Modifications: For individuals with disabilities, alternative measures may need to be
found and used and/or accommodations in test-taking procedures may need to be
employed. Interpretations need to be made in light of the modifications in the test or
testing procedures.
• Fairness: Care should be taken to select tests that are fair to all test takers. When
test results are influenced by characteristics or situations unrelated to what is being
measured (e.g., gender, age, ethnic background, existence of cheating, unequal
availability of test preparation programs), the use of the resulting information is
invalid and potentially harmful. In achievement testing, fairness also relates to
whether or not the student has had an opportunity to learn what is tested.
Test Administration
Test administration includes carefully following standard procedures so that the test is
used in the manner specified by the test developers. The test administrator should ensure
that test takers work within conditions that maximize opportunity for optimum perfor-
mance. As appropriate, test takers, parents, and organizations should be involved in the
various aspects of the testing process, including the following:
Before administration, it is important that relevant persons
• are informed about the standard testing procedures, including information about the
purposes of the test, the kinds of tasks involved, the method of administration, and
the scoring and reporting;
• have sufficient practice experiences prior to the test to include practice, as needed, on
how to operate equipment for computer-administered tests and practice in respond-
ing to tasks;
• have been sufficiently trained in their responsibilities and the administration proce-
dures for the test;
• have a chance to review test materials and administration sites and procedures prior
to the time for testing to ensure standardized conditions and appropriate responses
to any irregularities that occur;
• arrange for appropriate modifications of testing materials and procedures in order to
accommodate test takers with special needs; and
• have a clear understanding of their rights and responsibilities.
Test Scoring
Accurate measurement necessitates adequate procedures for scoring the responses of test
takers. Scoring procedures should be audited as necessary to ensure consistency and accu-
racy of application:
• Carefully implement and/or monitor standard scoring procedures.
• When test scoring involves human judgment, use rubrics that clearly specify the cri-
teria for scoring. Scoring consistency should be constantly monitored.
• Provide a method for checking the accuracy of scores when accuracy is challenged by
test takers.
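One way to monitor scoring consistency when human judgment is involved is to have two scorers rate the same set of responses and compare their ratings. The sketch below is a minimal illustration, not a procedure specified by the RUST statement: it computes simple percent agreement and Cohen's kappa, a widely used chance-corrected agreement index. The function names and rubric scores are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of responses given the same score by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from each rater's marginal use of every score.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one score")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical rubric scores (1-4) from two scorers on eight essays.
scorer_1 = [1, 2, 3, 3, 2, 1, 4, 4]
scorer_2 = [1, 2, 3, 2, 2, 1, 4, 3]
print(f"agreement = {percent_agreement(scorer_1, scorer_2):.2f}")
print(f"kappa = {cohens_kappa(scorer_1, scorer_2):.2f}")
```

Percent agreement is easy to explain but overstates consistency when a few scores dominate, which is why a chance-corrected index is often reported alongside it.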
Closing
Proper test use resides with the test user—the counselor and educator. Qualified test users
understand the measurement characteristics necessary to select good standardized tests, admin-
ister the tests according to specified procedures, assure accurate scoring, accurately interpret test
scores for individuals and groups, and ensure productive applications of the results. This docu-
ment provides guidelines for using tests responsibly with students and clients.
RUST Committee
Janet Wall, Chair
James Augustin
Charles Eberly
Brad Erford
David Lundberg
Timothy Vansickle
Subject Index
Gifted Rating Scales–School Form (GRS-S), 343 House-Tree-Person Technique (HTP), 291, 294
Global indices, 313, 314 Hunger, impact on achievement, 216
Global Scoring System (GSS), 327–328
Global Self-Esteem Quotient (GSEQ), 296 I
Global Severity Index (GSI), 313 Identification of problems, 4, 8
Golden Rule Insurance Company v. Richard L. Mathias Implementation, of assessment methods, 8–9
(1984), 409 In-Basket Test, 265
Gold standard, 122 Independent variables, 66
Government employment assessment, 261–262 Index composite scores, 174, 176, 177
Grade equivalent scores, 83–84, 88 Indirect observations, 35
Grade point average (GPA), 118, 236, 237 Individual achievement tests, 202–05
Graduate Management Admissions Test (GMAT), Individual intelligence tests, 171–185
33–34, 239 Individualized Education Plans (IEPs), 340
Graduate Record Examinations (GREs), 62, 82, 238 Individually administered tests, 29, 30, 145, 148–150,
Graphic rating scales, 38, 39 171
Graphs, 53 Individuals with Disabilities Education
Graves Design Judgment Test, 235 Improvement Act of 2004 (IDEA), 205, 332,
Griggs v. Duke Power Company (1971), 406, 409 339, 363–364, 406–07
Group-administered tests, 29, 30, 146, 148, 150, Inferential statistics, 47
171–172 Informal assessment instruments, 5, 22–23
Group differentiation studies, 124 Informal observations, 7, 23, 34–36, 319
Group feedback sessions, 378 Informants, 39
Group intelligence tests, 171–172, 185–186 Information processing, 159, 161, 168
Guilford’s structure-of-intellect model, 161, 164–165 Information processing model (Luria), 166, 182
Guilford–Zimmerman Temperament Scale, 260 Informed consent, 148
Initial interview. See Interviews
H Innate ability, 224
Hall Occupational Orientation Inventory, 252 In-school factors, impact on achievement, 216
Halo effect, 40 Instructional dimensions, 346
Halstead–Reitan Neuropsychological Test Battery, Instrumental values, 251
326, 369 Instrument evaluation forms, 141–142
Hand-scoring, 141, 151–152 Intellectual ability tests, 29, 335
Harrington–O’Shea Career Decision-Making Intellectual disabilities, 367–368
System–Revised (CDM-R), 249 Intelligence, 158–170. See also Intelligence tests
Health Insurance Portability and Accountability Act age differentiation studies of, 124–125
of 1996 (HIPAA), 407 Carroll’s three-stratum model of human abilities,
Hearing impairment, 366–367 161, 167–168
Henmon–Nelson Tests of Mental Ability, 186 Cattell–Horn–Carroll hierarchical three-stratum
Heredity, intelligence and, 189–190 model of, 161, 168–169, 182
Hierarchical model of intelligence (Vernon), 161, Cattell–Horn Gf-Gc theory of, 161, 164
163–164 crystallized, 124, 164
Hierarchical three-stratum model of defined, 158–160
intelligence(Cattell–Horn–Carroll), 161, fluid, 124–125, 164
168–169, 182 Gardner’s theory of multiple intelligences, 161,
High-stakes testing, 14, 27, 212, 348, 349 167
Hill Performance Test of Selected Positional Guilford’s structure-of-intellect model, 161,
Concepts, 366 164–165
Hispanic clients, 73 heredity and, 189–190
Histograms, 53–54 historical perspectives on, 14
Home learning activities, impact on achievement, Luria’s model of information processing, 166, 182
216 PASS theory of cognitive processing, 161, 168, 185
Homogeneity, of test items, 99, 121 Piaget’s theory of cognitive development, 161,
Honesty tests, 267 165–166
Horizontal axis, 54 Spearman’s two-factor theory of, 161–162, 164
436 Subject Index
Psychiatric disorders. See Mental disorders Relations to other variables evidence, 118–121
Psychoeducational reports. See Assessment reports Relaxation exercises, 352
Psychological Abstracts (American Psychological Relevance
Association), 137 of criterion measures, 118
Psychological reports. See Assessment reports of norm groups, 73, 74
Psychometrics, 22, 155, 160. See also Reliability/ Reliability coefficients, 96, 103–04
precision; Validity Reliability/precision, 91–108
Psychometric soundness, 22 alternate forms reliability, 97, 98–99, 103
Psychomotor abilities, 232–234 confidence intervals and, 106–07
Psychopathology, 277 of criterion measures, 118–119
Psychosocial problems, 307–08 defined, 22, 91, 92
PsycINFO database, 137 increasing, 107
PsycLIT CD-ROM, 137 internal consistency reliability, 97, 99–103, 117, 121
Public, communicating results to, 390–392 interrater reliability, 95, 97, 102–03, 104, 153
Publishers, of tests, 32, 74, 133, 135 interviews and, 25–26
Purdue Pegboard Test, 233–234 measurement error and, 93–95, 139
methods of estimating, 95–107
Q overview, 91–93
Qualitative assessment, 36, 84 reliability coefficients, 96, 103–04
Qualitative descriptors, 84–85 standard error of measurement and, 93, 104–06
Qualitative variables, 47, 48 standardized tests and, 30
Quality of Life Inventory (QOLI), 295 test-retest reliability, 97–98, 103
Quality of test items, impact on reliability, 95 validity and, 113, 117
Quantitative knowledge, 164 Reporting, of assessment results, 9, 15
Quantitative reasoning, 181 Representativeness, of norm groups, 73, 139
Quantitative variables, 47 Research literature, 134, 136, 137
Quartiles, 76 Resources, for selecting assessment methods, 133–138
Questionnaires, 31 Response acquiescence errors, 40
Quick Neurological Screening Test (QNST-II), 328 Response deviance errors, 40
Response processes, 126
R Response styles, 277, 297–298
Range, 57–58 Responsibilities of Users of Standardized Tests (RUST),
Rapport, in assessment, 25, 30, 149–150 10–11, 155, 399, 417–421
Rater bias, 39 Responsive Environmental Assessment for
Rating scales, 27–28, 38–40, 276, 319 Classroom Teaching (REACT), 347
Rational approach, to personality inventory Responsiveness to Intervention (RTI), 339–340, 407
development, 274 Results. See Communicating results
Ratio scales, 48, 50 Reynell–Zinkin Scales, 366
Raven’s Progressive Matrices (RPM), 186 Rhythmic intelligence, 167
Raw scores, 71, 77, 79, 81, 88 RIASEC model, 226, 245
Readability, of tests, 141 Risk factors, 318
Readiness, for test results, 379–380 Riso–Hudson Enneagram Type Indicator (RHETI),
Readiness tests, 239–240, 334, 335 277, 285, 287
Reasoning ability, 163 Rokeach Values Survey (RVS), 251
Recommendations, from assessment results, 9 Rorschach Inkblot Test, 287–288, 294
Records Ross Information Processing Assessment (RIPA-2),
of observations, 36–41 328
as source of information, 7, 42 Rotter Incomplete Sentence Blank (RISB), 290
Reference sources, 133–135 RUST. See Responsibilities of Users of Standardized Tests
Referral questions, 8
Regression analysis, 65–67 S
Regression lines, 66, 67 SAD PERSONS Scale, 23–24
Regulations and statutes, 404, 405–08 Salience Inventory (SI), 252
Relationship measures, 61–68 Samples, 47
Subject Index 441
Satisfaction with Life Scale (SWLS), 295
SAT test, 27, 82, 106, 118, 210, 236–237, 350
Scales, of tests, 31–32
Scale scores, 71
Scales for Rating the Behavioral Characteristics of Superior Students (SRBCSS), 343
Scales of measurement, 48–50
Scatterplots, 65, 66
Scenario-based testing, 16
Schedules, 31
Scholastic Abilities Test for Adults (SATA), 211
School ability index, 186
School and College Ability Tests, 131
School assessment programs, 333–335. See also Educational assessment
School counselors
  assessment activities of, 336–346
  competencies for, 347–348
  role of, 333
School Environment Preference Survey (SEPS), 347
School Readiness Test (SRT), 240
School settings
  availability of information in, 131, 132
Scores and scoring, 70–89
  automated, 15
  computer-based, 15, 152
  criterion-referenced, 71–72, 87–88, 154, 197
  ease of, 141
  errors in, 153
  hand-scoring, 141, 151–152
  norm-referenced (See Norm-referenced scores)
  overview, 70–71
  performance assessment, 152–153
  procedures for, 29
  qualitative descriptors of, 84–85
  standards for, 153–154
  tables and profiles for, 85–87
Scoring errors, 153
Scoring rubrics, 152–153
Screening Assessment for Gifted Elementary and Middle School Students (SAGES-2), 343
Screening process, 4
Seashore Measures of Musical Talents, 235
Secondary traits, 273
Selected-response items, 27–28, 31, 276
Selecting assessment methods, 130–142
  determine methods for obtaining information, 131–133
  evaluate instruments and strategies, 137–142
  identify type and availability of information needed, 131, 132
  overview, 8–9, 130–131
  resources for, 133–138
Selection interviews, 258–259
Self-administered tests, 29, 145
Self-Directed Search (SDS), 244–246
Self-esteem inventories, 295, 296–297
Self-monitoring, 41
Semantic differential rating scales, 38, 39
Semistructured interviews, 24–26, 308–09
Sensing types, 283
Sentence-completion tasks, 28, 289–290
Severity errors, 40
Sharif v. New York State Educational Department (1989), 409
Short-term acquisition and retrieval, 164
Significant statistical differences, 177
Simple linear regression, 66
Simulations, 15–16
Simultaneous administration, 97, 98
Situational interviews, 259
Situational judgment tests, 265
Sixteen Personality Factor Questionnaire (16PF), 67, 253, 275, 277, 285, 286
Size, of norm groups, 74, 139
Skewed curves, 54–55, 57
Skewness, 54
Skills, 253
Skills assessment, 253–255
Slosson Intelligence Test-Revised for Children and Adults (SIT-R3), 50–54, 57–59, 185
Smoothed frequency polygons, 54, 55
Social Anxiety Scale, 100
Social consequences of testing, 125
Social desirability errors, 40
Social skills, impact on achievement, 216
Socrates, 11
Sources of assessment, 5–7, 20–21, 41–43
Spatial ability, 163
Spatial intelligence, 167
Spearman-Brown prophecy formula, 100
Spearman’s rho, 64
Spearman’s two-factor theory of intelligence, 161–162, 164
Specialized aptitude tests, 230–235
  artistic abilities, 234–235
  clerical abilities, 230–231
  mechanical abilities, 231–232
  musical abilities, 235
  overview, 230, 232
  psychomotor abilities, 232–234
Specialized intelligence tests, 186–188
Specific learning disabilities (SLDs), 337–339, 406–07
Specimen sets, 133, 135
Speech disorders, 369
SPLASH test-taking strategy, 351
Split-half reliability, 97, 99–100, 103