PSYCH ASSESSMENT REVIEWER (PRELIM)


PSYCHOLOGICAL ASSESSMENT

OVERVIEW:

- Roots of contemporary psychological testing and assessment can be found in early twentieth-century France.
- In 1905, Alfred Binet and a colleague published a test designed to help place Paris schoolchildren in appropriate classes.
- Within a decade, an English-language version of Binet's test was prepared for use in schools in the United States.
- When the United States declared war on Germany and entered World War I in 1917, the military needed a way to screen large numbers of recruits quickly for intellectual and emotional problems.
- In World War II, the military would depend even more on psychological tests to screen recruits for service.
- Following the war, more and more tests purporting to measure an ever-widening array of psychological variables were developed and used.

Psychological Testing and Assessment: Definitions

Testing - the term used to refer to everything involved in the administration of a test.

Assessment - the data generated are subjected to thoughtful integration and evaluation, typically by highly trained assessment center staff.

Psychological testing - the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.

Psychological assessment - the gathering and integration of psychology-related data for the purpose of making a psychological evaluation, accomplished through the use of tools such as tests, interviews, case studies, behavioral observation, and specially designed apparatuses and measurement procedures.

TESTING AND ASSESSMENT COMPARED

Objective
- Testing: to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.
- Assessment: to answer a referral question, solve a problem, or arrive at a decision through the use of tools of evaluation.

Process
- Testing: may be individual or group in nature.
- Assessment: typically individualized; more typically focuses on how an individual processes rather than simply the results of that processing.

Role of evaluator
- Testing: the tester is not key to the process; one tester may be substituted for another without appreciably affecting the evaluation.
- Assessment: the assessor is key to the process of selecting tests and/or other tools of evaluation, as well as in drawing conclusions from the entire evaluation.

Skill of evaluator
- Testing: typically requires technician-like skills in terms of administering and scoring a test as well as in interpreting a test result.
- Assessment: typically requires an educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data.

Outcome
- Testing: yields a test score or series of test scores.
- Assessment: entails a logical problem-solving approach that brings to bear many sources of data designed to shed light on a referral question.
The Tools of Psychological Assessment

Test - may be defined simply as a measuring device or procedure.

Psychological test - refers to a device or procedure designed to measure variables related to psychology.

Format - pertains to the form, plan, structure, arrangement, and layout of test items as well as to related considerations such as time limits. Format also refers to the form in which a test is administered: computerized, pencil-and-paper, or some other form (online or off).

Scoring - the process of assigning evaluative codes or statements to performance on tests, tasks, interviews, or other behavior samples.
- Cut score - a reference point, usually numerical, derived by judgment and used to divide a set of data into two or more classifications.

Psychometrics - may be defined as the science of psychological measurement.

The interview
- a method of gathering information through direct communication involving a reciprocal exchange.
- the interview as a tool of psychological assessment typically involves more than talk; the interviewer takes note of both verbal and nonverbal behavior.
- Psychological autopsy - may be defined as a reconstruction of a deceased individual's psychological profile on the basis of archival records, artifacts, and interviews.

The portfolio
- work products, whether retained on paper, canvas, film, video, audio, or some other medium, constitute what is called a portfolio.
- as samples of one's ability and accomplishment, a portfolio may be used as a tool for evaluation.

Role-play test
- a tool of assessment wherein assessees are directed to act as if they were in a particular situation.
- involves acting an improvised or partially improvised part in a simulated situation.

Behavioral observation
- observing the behavior of humans in a natural setting, that is, the setting in which the behavior would typically be expected to occur.
- this variety of behavioral observation is referred to as naturalistic observation.

Case history data
- refers to records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee.
- in short, the history of a person.

Computer as a tool
- computers can serve as test administrators and as highly efficient test scorers.
- CAPA (computer assisted psychological assessment) refers to the assistance computers provide to the test user, not the test taker.

Who are the parties in the assessment enterprise?

Test developer - test developers and publishers create tests or other methods of assessment.
Test user - clinicians, counselors, school psychologists, human resources personnel, consumer psychologists, experimental psychologists, social psychologists, and others who use tests.
Test taker - anyone who is the subject of an assessment or an evaluation; a test taker may also be called an assessee.

In what types of settings are assessments conducted?

Educational settings
- tests are administered early in school life to help identify children who may have special needs.
- besides school ability tests, another type of test commonly given in schools is an achievement test, which evaluates accomplishment or the degree of learning that has taken place.
- Diagnosis - may be defined as a description or conclusion reached on the basis of evidence and opinion.
- Diagnostic test - refers to a tool of assessment used to help narrow down and identify areas of deficit to be targeted for intervention.

Clinical settings
- these tools are used to help screen for or diagnose behavior problems.
- the tests employed in clinical settings may be intelligence tests, personality tests, neuropsychological tests, or other specialized instruments, depending on the presenting or suspected problem area.

Counseling settings
- assessment in a counseling context may occur in environments as diverse as schools, prisons, and government or privately owned institutions.
- regardless of the particular tools used, the ultimate objective of many such assessments is the improvement of the assessee in terms of adjustment, productivity, or some related variable.

Geriatric settings
- older individuals may at some point require psychological assessment to evaluate cognitive, psychological, adaptive, or other functioning.
- at issue in many such assessments is the extent to which assessees are enjoying as good a quality of life as possible.

Business and military settings
- in business, as in the military, tests are used in many ways, perhaps most notably in decision making about the careers of personnel.

Governmental and organizational credentialing
- one of the many applications of measurement is in governmental licensing, certification, or general credentialing of professionals.

Assessment of people with disabilities
- people with disabilities are assessed for exactly the same reasons that people with no disabilities are assessed: to obtain employment, to earn a professional credential, to be screened for psychopathology, and so forth.
- Accommodation - may be defined as the adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.
- Alternate assessment - an evaluative or diagnostic procedure or process that varies from the usual, customary, or standardized way a measurement is derived, either by virtue of some special accommodation made to the assessee or by means of alternative methods designed to measure the same variable(s).

Test catalogues
- one of the most readily accessible sources of information is a catalogue distributed by the publisher of the test.
- the catalogue's objective is to sell the test.

Test manual
- contains detailed information concerning the development of a particular test, plus technical information.
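The cut score defined above is simply a threshold that divides scores into classifications. A minimal sketch (the cutoff of 65 and the pass/fail labels are made-up values, not from any published test):

```python
# A cut score is a reference point, usually numerical, used to divide
# a set of data into two or more classifications.
# The cutoff (65) and the labels are hypothetical.

def classify(scores, cut_score=65):
    """Return a pass/fail label for each score based on the cut score."""
    return ["pass" if s >= cut_score else "fail" for s in scores]

print(classify([72, 64, 65, 40]))  # ['pass', 'fail', 'pass', 'fail']
```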
STATISTICS REFRESHER

 Population - the entire group of individuals of interest.
 Sample - a group selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.

SCALES OF MEASUREMENT

 Measurement - the act of assigning numbers or symbols to characteristics of things (people, events, whatever) according to rules.
 Scale - a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.

MEASURING VARIABLES

A structured set of categories allows researchers to classify each individual or observation into a specific category based on the variable being studied.

 Variables - the characteristics of the individuals within the population that need to be measured.
 Qualitative variables - allow for classification of individuals based on some attribute or characteristic.
 Quantitative variables - provide numerical measures of individuals.
 Discrete - a quantitative variable that has either a finite number of possible values or a countable number of possible values. "Countable" means the values result from counting, such as 0, 1, 2, 3, and so on.
 Continuous - a quantitative variable that has an infinite number of possible values and can be measured to any desired level of accuracy (e.g., temperature).

Three important properties of quantitative scales:

 Magnitude - the property of "moreness"; a higher score reflects more of something.
  Ex.: if you're measuring height, a score of 180 cm is more (taller) than a score of 170 cm.
 Equal intervals - the difference between any two adjacent numbers refers to the same amount of difference on the attribute.
  Ex.: the difference between 20°C and 21°C is the same as the difference between 30°C and 31°C.
 Absolute zero - the scale has a zero point that refers to having none of the attribute.
  Ex.: in weight measurement, 0 kg means no weight at all.

Nominal: gender; hair color (CLASSIFICATION ONLY, NO OTHER PROPERTY)
 there must be distinct classes, but these classes have no quantitative properties. No comparison can be made in terms of one category being higher than another.

Ordinal: class officers; top 10 honors (MAGNITUDE, BUT NO EQUAL INTERVALS, NO ABSOLUTE ZERO)
 there are distinct classes, but these classes have a natural ordering or ranking. The differences can be ordered on the basis of magnitude.

Interval: temperature; IQ score (MAGNITUDE, EQUAL INTERVALS, NO ABSOLUTE ZERO)
 it is possible to compare differences in magnitude, but the zero point does not have a natural meaning.

Ratio: height; weight; strength (MAGNITUDE, EQUAL INTERVALS, ABSOLUTE ZERO)
 captures the properties of the other types of scales but also contains a true zero, which represents the absence of the quality being measured.

DESCRIBING DATA

 Distribution - a set of test scores arrayed for recording or study.
 Raw score - a straightforward, unmodified accounting of performance that is usually numerical.
 Frequency distribution - all scores are listed alongside the number of times each score occurred.
 Simple frequency distribution - indicates that individual scores have been used and the data have not been grouped.
 Grouped frequency distribution - test-score intervals, also called class intervals, replace the actual test scores. A decision about the size of a class interval in a grouped frequency distribution is made on the basis of convenience.
 Frequency distributions of test scores can also be illustrated graphically.
 Graph - a diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data.

THREE KINDS OF GRAPH

 Histogram - a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
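The simple versus grouped frequency distinction can be sketched in a few lines of Python (the scores and the class-interval width of 10 are arbitrary examples):

```python
from collections import Counter

scores = [78, 85, 85, 92, 67, 85, 92, 78, 55, 92]

# Simple frequency distribution: each individual score listed
# alongside the number of times it occurred.
simple = Counter(scores)

# Grouped frequency distribution: class intervals (width chosen for
# convenience, here 10 points) replace the actual test scores.
width = 10
grouped = Counter((s // width) * width for s in scores)

print(sorted(simple.items()))   # [(55, 1), (67, 1), (78, 2), (85, 3), (92, 3)]
print(sorted(grouped.items()))  # [(50, 1), (60, 1), (70, 2), (80, 3), (90, 3)]
```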
 Bar graph - numbers indicative of frequency appear on the Y-axis, and reference to some categorization (e.g., yes/no/maybe, male/female) appears on the X-axis. Here the rectangular bars typically are not contiguous.
 Frequency polygon - expressed by a continuous line connecting the points where test scores or class intervals (as indicated on the X-axis) meet frequencies (as indicated on the Y-axis).

MEASURES OF VARIABILITY

 statistics that describe the amount of variation in a distribution.
 Variability - an indication of how scores in a distribution are scattered or dispersed.
 some measures of variability include the range, the interquartile range, the semi-interquartile range, the average deviation, the standard deviation, and the variance.

RANGE - the range of a distribution is equal to the difference between the highest and the lowest scores. The range provides a quick but gross description of the spread of scores.

INTERQUARTILE AND SEMI-INTERQUARTILE RANGE - a distribution of test scores can be divided into four parts such that 25% of the test scores occur in each quarter.
 Quartile - refers to a specific point.
 Quarter - refers to an interval.
 Interquartile range - a measure of variability equal to the difference between Q3 and Q1. Like the median, it is an ordinal statistic.
 Semi-interquartile range - equal to the interquartile range divided by 2.
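The range, interquartile range, and semi-interquartile range can be computed directly. Note that textbooks and software differ slightly in how quartiles are interpolated; the "inclusive" method below is one common convention, and the score list is an arbitrary example:

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Quartiles Q1, Q2, Q3 divide the distribution into four quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1        # interquartile range
semi_iqr = iqr / 2   # semi-interquartile range

print(score_range, q1, q3, iqr, semi_iqr)  # 8 3.0 7.0 4.0 2.0
```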
MEASURES OF CENTRAL TENDENCY

 a statistic that indicates the average or midmost score between the extreme scores in a distribution.
 Mean - the average; the sum of scores divided by n, where n equals the number of observations or test scores.
 Median - the middle score in an ordered distribution.
 Mode - the most frequently occurring score.
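Python's standard library computes all three measures of central tendency; the score list is an arbitrary example:

```python
import statistics

scores = [85, 90, 90, 95, 100]

mean = statistics.mean(scores)      # sum of scores divided by n
median = statistics.median(scores)  # middle score of the ordered list
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)  # 92 90 90
```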
Standard deviation
 a measure of variability equal to the square root of the average squared deviations about the mean.
 more succinctly, it is equal to the square root of the variance.
 the standard deviation is a very useful measure of variation because each individual score's distance from the mean of the distribution is factored into its computation.

Skewness - the nature and extent to which symmetry is absent.
 A distribution has a positive skew when relatively few of the scores fall at the high end of the distribution.
  Positively skewed examination results may indicate that the test was too difficult. More items that were easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores.
 A distribution has a negative skew when relatively few of the scores fall at the low end of the distribution.
  Negatively skewed examination results may indicate that the test was too easy. In this case, more items of a higher level of difficulty would make it possible to better discriminate between scores at the upper end of the distribution.

Kurtosis - refers to the steepness of a distribution in its center.
 Distributions are generally described as:
  platykurtic (relatively flat),
  leptokurtic (relatively peaked), or
  mesokurtic (somewhere in the middle; normally distributed).
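The standard deviation definition above (square root of the average squared deviation about the mean) translates directly into code. The skewness line is an extra illustration using the standardized third moment, a common formula that these notes do not spell out; the score list is a made-up example:

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

# Variance: average squared deviation about the mean (population form).
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# Skewness (standardized third moment): positive when relatively few
# scores fall at the high end, negative when few fall at the low end.
skew = sum((x - mean) ** 3 for x in scores) / n / sd ** 3

print(sd, round(skew, 3))  # 2.0 0.656
```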
The normal curve
 is a bell-shaped, smooth, mathematically defined curve that is highest at its center. A normal curve has two tails.
 Development of the concept of a normal curve began in the middle of the eighteenth century with the work of Abraham DeMoivre and, later, the Marquis de Laplace.
 Through the early nineteenth century, scientists referred to it as the "Laplace-Gaussian curve."
 Karl Pearson is credited with being the first to refer to the curve as the normal curve, perhaps in an effort to be diplomatic to all of the people who helped develop it.

Area under the normal curve
 50% of the scores occur above the mean and 50% of the scores occur below the mean.
 Approximately 34% of all scores occur between the mean and 1 standard deviation above the mean.
 Approximately 34% of all scores occur between the mean and 1 standard deviation below the mean.
 Approximately 68% of all scores occur within 1 standard deviation of the mean.
 Approximately 95% of all scores occur within 2 standard deviations of the mean.

STANDARD SCORES
 a raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation. Raw scores may be converted to standard scores because standard scores are more easily interpretable than raw scores.

Z SCORE (MEAN: 0 / SD: 1)
 (zero plus or minus one scale) results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
 Example: converting a raw score of 65 to a z score, using a distribution with a mean of 50 and an SD of 15:
  z = (X - M) / SD = (65 - 50) / 15 = 1
 Working backward, a z score of 1 corresponds to the raw score X = z(SD) + M = 1(15) + 50 = 65.
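The percentages listed above follow from the mathematics of the normal curve: the proportion of scores within k standard deviations of the mean is erf(k/√2). The snippet below uses that identity to recover the 68% and 95% figures:

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

print(round(proportion_within(1), 4))  # 0.6827
print(round(proportion_within(2), 4))  # 0.9545
```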
T SCORE (MEAN: 50 / SD: 10)
 (fifty plus or minus ten scale); that is, a scale with a mean set at 50 and a standard deviation set at 10.
 Devised by W. A. McCall (1922, 1939) and named a T score in honor of his professor E. L. Thorndike.
 Example: converting a raw score of 65 to a T score, where the raw-score distribution has a mean of 50 and an SD of 15:
  1st formula: T = 10(X - M)/SD + 50 = 10(65 - 50)/15 + 50 = 150/15 + 50 = 10 + 50 = 60
  2nd formula: T = 10(z) + 50 = 10(1) + 50 = 60
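The z-score and T-score formulas can be combined in a short helper; the raw score of 65 with mean 50 and SD 15 reproduces the worked example above:

```python
def z_score(x, mean, sd):
    """Number of SD units the raw score lies above/below the mean."""
    return (x - mean) / sd

def t_score(x, mean, sd):
    """Rescale to a mean of 50 and an SD of 10: T = 10z + 50."""
    return 10 * z_score(x, mean, sd) + 50

z = z_score(65, mean=50, sd=15)
t = t_score(65, mean=50, sd=15)
print(z, t)  # 1.0 60.0
```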
STANINE (MEAN: 5 / SD: 2)
 a term that was a contraction of the words standard and nine.
 Stanines are different from other standard scores in that they take on whole values from 1 to 9, which represent a range of performance that is half of a standard deviation in width.

Transformation
 A standard score obtained by a linear transformation is one that retains a direct numerical relationship to the original raw score.
 A nonlinear transformation may be required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made. In a nonlinear transformation, the resulting standard score does not necessarily have a direct numerical relationship to the original raw score. As the result of a nonlinear transformation, the original distribution is said to have been normalized.
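Since stanines take whole values 1 to 9 in bands half a standard deviation wide centered on the mean, one common mapping from a z score is the rounding rule below. The exact treatment of boundary values varies between sources, so treat this as a sketch rather than a standard:

```python
def stanine(z):
    """Map a z score to a stanine: bands 0.5 SD wide, clamped to 1-9.
    Boundary handling here is one convention among several."""
    return max(1, min(9, round(2 * z + 5)))

print(stanine(0))     # 5  (the mean falls in the middle stanine)
print(stanine(1.3))   # 8
print(stanine(-3.0))  # 1
```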
SAT/GRE/CEEB (MEAN: 500 / SD: 100)
 SAT - Scholastic Aptitude Test
 GRE - Graduate Record Examination
 CEEB - College Entrance Examination Board

Deviation IQ (MEAN: 100 / SD: 15)
Historical, Cultural, and Legal/Ethical Considerations in Psychological Assessment

19TH CENTURY
- Tests and testing programs first came into being in China.
- Testing was instituted as a means of selecting which of many applicants would obtain government jobs (civil service).
- Job applicants were tested on proficiency in endeavors such as music, archery, knowledge, and skill.

GRECO-ROMAN WRITINGS (Middle Ages)
- World of evilness.
- Deficiency in some bodily fluid was a factor believed to influence personality.
- Hippocrates and Galen.

RENAISSANCE
- Christian von Wolff anticipated psychology as a science and psychological measurement as a specialty within that science.

CHARLES DARWIN AND INDIVIDUAL DIFFERENCES
- Tests designed to measure individual differences in ability and personality among people.
- "Origin of Species": chance variation in species would be selected or rejected by nature according to adaptivity and survival value; "survival of the fittest."

FRANCIS GALTON
- Explored and quantified individual differences between people.
- Classified people "according to their natural gifts."
- Displayed the first anthropometric laboratory.

KARL PEARSON
- Developed the product-moment correlation technique. His work can be traced directly from Galton.

WILHELM MAX WUNDT
- First experimental psychology laboratory, at the University of Leipzig.
- Focused more on how people were similar, not different from each other.

JAMES MCKEEN CATTELL
- Studied individual differences in reaction time.
- Coined the term mental test.

CHARLES SPEARMAN
- Originated the concept of test reliability and built the mathematical framework for the statistical technique of factor analysis.

VICTOR HENRI
- Frenchman who collaborated with Binet on papers suggesting how mental tests could be used to measure higher mental processes.

EMIL KRAEPELIN
- Early experimenter with the word association technique as a formal test.

LIGHTNER WITMER
- "Little-known founder of clinical psychology."
- Founded the first psychological clinic in the U.S.

PSYCHE CATTELL
- Daughter of James McKeen Cattell.
- Developed the Cattell Infant Intelligence Scale (CIIS) and wrote Measurement of Intelligence in Infants and Young Children.

RAYMOND CATTELL
- Believed in the lexical approach to defining personality, which examines human languages for descriptors of personality dimensions.

20TH CENTURY
- Birth of the first formal tests of intelligence.
- Testing shifted to be of more understandable relevance/meaning.

A. THE MEASUREMENT OF INTELLIGENCE
- Binet created the first intelligence test to identify mentally retarded schoolchildren in Paris (individually administered).
- The Binet-Simon Test has been revised over and over again.
- Group intelligence tests emerged with the need to screen the intellect of WWI recruits.
- David Wechsler designed a test to measure adult intelligence: the Wechsler-Bellevue Intelligence Scale, later the Wechsler Adult Intelligence Scale.

B. THE MEASUREMENT OF PERSONALITY
- The field of psychology was becoming test oriented; clinical psychology was synonymous with mental testing.
- ROBERT WOODWORTH developed a measure of adjustment and emotional stability that could be administered quickly and efficiently to groups of recruits.
 To disguise the true purpose of the test, the questionnaire was labeled as a Personal Data Sheet.
 He later called it the Woodworth Psychoneurotic Inventory, the first widely used self-report test of personality.
- Self-report test:
 Advantages: respondents are best qualified to report on themselves.
 Disadvantages: poor insight into self; one might honestly believe something about oneself that isn't true; unwillingness to report seemingly negative qualities.
- Projective test: the individual is assumed to project onto some ambiguous stimulus (inkblot, photo, etc.) his or her own unique needs, fears, hopes, and motivations. Ex.: the Rorschach inkblot test.

C. THE ACADEMIC AND APPLIED TRADITIONS

Evolving Interest in Culture-Related Issues
- Goddard tested immigrants and found most to be feebleminded.
- These results were invalid; they overestimated mental deficiency, even in native English speakers.
- This led to the nature-nurture debate about what intelligence tests actually measure, and to the need to "isolate" the cultural variable.
- Culture-specific tests: tests designed for use with people from one culture but not from another. Minorities still scored abnormally low (e.g., loaf of bread vs. tortillas).
- Today tests undergo many steps to ensure they are suitable for a given nation, taking test takers' reactions into account.

Some Issues Regarding Culture and Assessment

Verbal communication
- Examiner and examinee must speak the same language.
- Also requires understanding of culture.

Nonverbal communication and behavior
- Differs between cultures. Ex.: the meaning of not making eye contact.
- Body movement could even have a physical cause.

Legal and Ethical Considerations

The concerns of the public
- Beginning in World War I, fear that tests were only testing the ability to take tests.
- Legislation.
- Litigation.

The concerns of the profession
- Test-user qualifications.
- Testing people with disabilities.
- Computerized test administration, scoring, and interpretation.
TEST AND TESTING

Assumption 1: Psychological Traits and States Exist
- Trait: any distinguishable, relatively enduring way in which one individual varies from another.
- States: also distinguish one person from another but are relatively less enduring.
- Psychological trait: covers a wide range of possible characteristics, e.g., intelligence, specific intellectual abilities, cognitive style, psychopathology.
- Overt behavior: an observable action or the product of an observable action, including test- or assessment-related responses.

Assumption 2: Psychological Traits and States Can Be Quantified and Measured
- Cumulative scoring: a test score is presumed to represent the strength of the targeted ability, trait, or state.

Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
- The objective of the test is to provide some indication of some aspects of the examinee's behavior.
- Tasks in certain tests replicate real-life behaviors that the test user wants to understand. The observed behaviors from these tests can be used to predict future actions or to analyze past behaviors. Tools like diaries or case history data can be valuable in these evaluations.

Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
- Competent test users understand a lot about the tests they use.
- They understand and appreciate the limitations of the tests they use.

Assumption 5: Various Sources of Error Are Part of the Assessment Process
- Everyday error: mistakes and miscalculations.
- Assessment error: a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.
- Error variance: the component of a test score attributable to sources other than the trait or ability measured.
- Classical test theory (CTT)/true score theory: the assumption is made that each test taker has a true score on a test that would be obtained but for the action of measurement error.

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
- Court challenges to various tests and testing programs have sensitized test developers and users to the societal demand for fair tests used in a fair manner.
- Fairness-related problems and questions remain.

Assumption 7: Testing and Assessment Benefit Society
- Many critical decisions are based on testing and assessment procedures.

WHAT'S A "GOOD TEST"?

- Criteria: clear instructions for administration, scoring, and interpretation.
- Reliability: a "good test"/measuring tool is reliable, i.e., consistent.
- Validity: it measures what it purports to measure, i.e., the content of the test.

NORMS

- Norm-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker's score and comparing it to the scores of a group of testtakers.
- Norms (psychometric context): the test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores.
- Normative sample: the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers.
- Norming: refers to the process of deriving norms; may describe a particular type of norm derivation.
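Classical test theory's claim that each observed score is a true score plus measurement error, together with the reliability coefficient as the ratio of true variance to total variance, can be illustrated with a toy simulation (the true scores and error values below are invented purely for the demonstration):

```python
import statistics

# Hypothetical true scores and measurement errors for five testtakers.
true_scores = [80, 90, 100, 110, 120]
errors = [1, -2, 2, -2, 1]

# CTT: observed score X = true score T + error E.
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability coefficient: ratio of true-score variance to total
# (observed-score) variance.
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(observed, round(reliability, 3))  # [81, 88, 102, 108, 121] 0.986
```

With no error at all, observed variance would equal true variance and the ratio would be 1; the more error variance, the lower the reliability.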
- Race norming: the controversial practice of norming on the basis of race or ethnic background.

AGE NORMS
- Age-equivalent scores/age norms: indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered.
- Age norm tables also exist for physical characteristics.
- Tracking: comparisons are usually made with people of the same age; children at the same age level tend to follow the same growth patterns over time.

GRADE NORMS
- Grade norms: designed to indicate the average test performance of testtakers in a given school grade.

Developmental norms
- (e.g., grade norms and age norms) a term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.

National norms
- derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.

National anchor norms
- when many different tests purport to measure the same human characteristics or abilities, national anchor norms provide a basis for comparing scores on one test with scores on another.

Subgroup norms
- the result of segmentation of the normative sample; more narrowly defined.

Local norms
- provide normative information with respect to the local population's performance on some test.

Fixed reference group scoring system
- the distribution of scores obtained on the test from one group of testtakers (the fixed reference group) is used as the basis for the calculation of test scores for future administrations of the test.

Percentile
- an expression of the percentage of people whose score on a test or measure falls below a particular raw score.
- Percentage correct: refers to the distribution of raw scores (the number of items answered correctly) multiplied by 100 and divided by the total number of items; not the same as percentile.

NORM-REFERENCED VERSUS CRITERION-REFERENCED EVALUATION

- One way to derive meaning from a test score is to evaluate the test score in relation to other scores on the same test.
- Criterion-referenced: derive meaning from a test score by evaluating it on the basis of whether or not some criterion has been met.
- Criterion-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard. Also called domain- or content-referenced testing and assessment.
- Culture and inference: culture is a factor in test administration, scoring, and interpretation.

STANDARD ERROR OF MEASUREMENT
- an estimate of the extent to which an observed score deviates from a true score.

STANDARD ERROR OF ESTIMATE
- in regression, an estimate of the degree of error involved in predicting the value of one variable from another.

STANDARD ERROR OF THE MEAN
- a measure of sampling error.

STANDARD ERROR OF THE DIFFERENCE
- an estimate of how large a difference between two scores should be before the difference is considered statistically significant.
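The percentile versus percentage-correct distinction is easy to see in code (the score list and item counts are made-up examples):

```python
def percentile_rank(score, scores):
    """Percentage of people whose score falls below the given raw score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

def percentage_correct(n_correct, n_items):
    """Raw number correct expressed as a percentage of total items."""
    return 100 * n_correct / n_items

group = [40, 55, 60, 72, 72, 85, 90, 95, 98, 99]
print(percentile_rank(72, group))   # 30.0 -> beats 3 of 10 scores
print(percentage_correct(72, 120))  # 60.0 -> 72 of 120 items correct
```

The same raw score of 72 yields two very different numbers, which is why the notes stress that percentile is not the same as percentage correct.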

RELIABILITY Other sources of error:


RELIABILITY - refers to consistency in measurement.  Surveys And Polls are two tools of assessment
And whereas in everyday conversation reliability always commonly used by researchers who study public
connotes something positive, in the psychometric sense opinion.
it really only refers to something that is consistent—not
 sampling error: the extent to which the
necessarily consistently good or bad, but simply
population of voters in the study actually was
consistent.
representative of voters in the election. The
RELIABILITY COEFFICIENT- is an index of researchers may not have gotten it right with
reliability, a proportion that indicates the ratio between respect to demographics, political party
the true score variance on a test and the total variance. affiliation, or other factors related to the
population of voters.
TRUE VARIANCE- Variance from true differences
 methodological error: did not include enough
ERROR VARIANCE- variance from irrelevant, random people in their sample to draw the conclusions
sources that they did
MEASUREMENT ERROR- refers to, collectively, all
of the factors associated with the process of measuring
Reliability Estimates
some variable, other than the variable being measured. It Test-Retest Reliability - is an estimate of reliability
can be categorized as being either systematic or random. obtained by correlating pairs of scores from the same
people on two different administrations of the same test.
 RANDOM ERROR- is a source of error in
measuring a targeted variable caused by coefficient of stability - The longer the time that passes,
unpredictable fluctuations and inconsistencies of the greater the likelihood that the reliability coefficient
other variables in the measurement process. will be lower. When the interval between testing is
Sometimes referred to as “noise,” or weather, or greater than six months.
health.
Inter-rater reliability- ensures different observers
 SYSTEMATIC ERROR- refers to a source of measure the same thing in consistent manner.
error in measuring a variable that is typically
constant or proportionate to what is presumed to Parallel-Forms and Alternate-Forms Reliability
be the true value of the variable being measured. Estimates

Sources of error variance PARALLEL FORMS OF A TEST - exist when, for


each form of the test, the means and the variances
 TEST CONSTRUCTION- One source of of observed test scores are equal.
variance during test construction is item
sampling or content sampling, terms that refer to - parallel forms reliability - refers to an
variation among items within a test as well as to estimate of the extent to which item
variation among items between tests. sampling and other errors have affected test
 TEST ADMINISTRATION- Sources of error scores on versions of the same test when, for
variance that occur during test administration each form of the test, the means and
may influence the test taker's attention or variances of observed test scores are equal.
motivation.
 TEST SCORING AND INTERPRETATION -
In many tests, the advent of computer scoring
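In practice, both parallel- and alternate-forms reliability are estimated the same way: administer the two forms to the same group and correlate the two sets of total scores. A minimal sketch with invented scores (the numbers are made up for illustration):

```python
# Invented total scores for 8 testtakers on two forms of the same test
from math import sqrt

form_a = [12, 18, 15, 9, 20, 14, 11, 17]
form_b = [13, 17, 16, 10, 19, 13, 12, 18]

def pearson(xs, ys):
    # Pearson correlation coefficient between two score lists
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

r_forms = pearson(form_a, form_b)  # alternate-forms reliability estimate
print(round(r_forms, 2))
```

A coefficient near 1 suggests the two forms rank and spread testtakers in nearly the same way; item sampling differences between the forms would push it lower.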
ALTERNATE FORMS are simply different versions of a test that have been constructed so as to be parallel. Alternate forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty.

- Alternate-forms reliability refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.

SPLIT-HALF RELIABILITY is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once (i.e., one test is split into two halves).

 Step 1. Divide the test into equivalent halves.
 Step 2. Calculate a Pearson r between scores on the two halves of the test.
 Step 3. Adjust the half-test reliability using the Spearman–Brown formula (discussed shortly).

Ways to divide the test into halves:
 randomly assign items to one or the other half of the test.
 assign odd-numbered items to one half of the test and even-numbered items to the other half. Also called odd-even reliability.
 divide the test by content so that each half contains items equivalent with respect to content and difficulty.

THE SPEARMAN–BROWN FORMULA allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.

Other Methods of Estimating Internal Consistency

 INTER-ITEM CONSISTENCY refers to the degree of correlation among all the items on a scale. An index of inter-item consistency, in turn, is useful in assessing the homogeneity of the test.
 Heterogeneity describes the degree to which a test measures different factors. A heterogeneous (or nonhomogeneous) test is composed of items that measure more than one trait.

The Kuder–Richardson formulas

 Dissatisfaction with existing split-half methods of estimating reliability compelled G. Frederic Kuder and M. W. Richardson to develop their own measures for estimating reliability.

Kuder–Richardson formula 20, or KR-20

- so named because it was the 20th formula developed in a series.
- if test items are highly homogeneous, KR-20 and split-half reliability estimates will be similar.
- KR-20 is the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong (such as multiple-choice items).
- if test items are more heterogeneous, KR-20 will yield lower reliability estimates than the split-half method.

Coefficient alpha - may be thought of as the mean of all possible split-half correlations, corrected by the Spearman–Brown formula.

 KR-20 is appropriately used only on tests with dichotomous items;
 coefficient alpha is appropriate for use on tests containing nondichotomous items.

Average proportional distance (APD) - a relatively new measure for evaluating the internal consistency of a test. The APD is a measure that focuses on the degree of difference that exists between item scores.
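The split-half steps, the Spearman–Brown correction, KR-20, and coefficient alpha can all be sketched on one toy score matrix. All numbers below are invented: 6 testtakers, 4 right/wrong (0/1) items:

```python
from math import sqrt

# Rows = testtakers, columns = dichotomous items (1 = right, 0 = wrong)
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
n = len(scores)        # testtakers
k = len(scores[0])     # items

def pvar(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

# Steps 1-2: odd-even split, then correlate the two half-scores
odd  = [sum(row[i] for i in range(0, k, 2)) for row in scores]
even = [sum(row[i] for i in range(1, k, 2)) for row in scores]
r_half = pearson(odd, even)

# Step 3: Spearman-Brown correction back to full-test length
r_sb = 2 * r_half / (1 + r_half)

# KR-20: (k/(k-1)) * (1 - sum(p*q) / total-score variance), 0/1 items only
totals = [sum(row) for row in scores]
p = [sum(row[i] for row in scores) / n for i in range(k)]
kr20 = (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / pvar(totals))

# Coefficient alpha: same formula with item variances, so it also works
# for nondichotomous items; for 0/1 items it equals KR-20
alpha = (k / (k - 1)) * (1 - sum(pvar([row[i] for row in scores])
                                 for i in range(k)) / pvar(totals))

print(round(r_half, 3), round(r_sb, 3), round(kr20, 3), round(alpha, 3))
```

Note that the Spearman–Brown estimate exceeds the raw half-test correlation (a longer test is more reliable, other things being equal), and that alpha reproduces KR-20 exactly because every item here is dichotomous.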
Inter-scorer reliability - is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure. Often used when coding nonverbal behavior.

Using and Interpreting a Coefficient of Reliability

1. TEST-RETEST RELIABILITY
- Also known as time-sampling reliability
- Correlating pairs of scores from the same group on two different administrations
- Measures something that is relatively stable over time

2. ALTERNATE OR PARALLEL FORMS
- compares two equivalent forms of a test that measure the same attribute.

3. INTERNAL CONSISTENCY
- refers to the extent to which different items or questions within a test or assessment measure the same underlying construct or concept consistently.

THE NATURE OF THE TEST

1. HOMOGENEITY vs HETEROGENEITY OF TEST ITEMS
- when a test is homogeneous in its items, it is consistent and uniform in measuring a single factor, such as an ability or trait.

2. DYNAMIC vs STATIC CHARACTERISTICS
- a dynamic characteristic is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences; a static characteristic is relatively stable over time.

3. RESTRICTION OR INFLATION OF RANGE
- if the range of scores is restricted, reliability tends to be lower.
- if the range of scores is inflated, reliability tends to be higher.

4. SPEED TEST vs POWER TEST
- a speed test is homogeneous: its items are easy, but the time limit is short.
- a power test has fewer but more complex items.

5. CRITERION-REFERENCED TEST
- provides an indication of where a test taker stands with respect to some variable or criterion.

VALIDITY

A judgment or estimate of how well a test measures what it purports to measure in a particular context.

VALIDATION - the process of gathering and evaluating evidence about validity.

TYPES OF VALIDITY

FACE VALIDITY - relates more to what a test appears to measure to the person being tested than to what the test actually measures.

CONTENT VALIDITY - ensuring the test covers all aspects of the concept being measured.

TEST BLUEPRINT - is a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, etc.

CRITERION-RELATED VALIDITY - refers to the effectiveness of a test in predicting an individual's performance or behavior based on a specific criterion or outcome.

CHARACTERISTICS:
 Relevant
 Valid
 Uncontaminated

TYPES OF CRITERION-RELATED VALIDITY

CONCURRENT VALIDITY - measures how well a test correlates with an existing test that has already been validated. It checks if the new test can deliver the same results as the existing one.
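A single invented dataset can illustrate two ideas at once: the concurrent-validity coefficient is simply the correlation between the new test and the established one, and re-computing it on only the top scorers shows the restriction-of-range effect noted earlier, which attenuates correlations of any kind. All scores below are made up:

```python
# Toy illustration: scores on a hypothetical new test and on an
# established, already-validated test taken at the same time.
from math import sqrt

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

new_test    = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
established = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r_full = pearson(new_test, established)   # concurrent validity estimate

# Restriction of range: keep only the top half of new-test scorers
pairs = [(x, y) for x, y in zip(new_test, established) if x >= 6]
r_restricted = pearson([p[0] for p in pairs], [p[1] for p in pairs])

print(round(r_full, 3), round(r_restricted, 3))
```

The correlation computed on the restricted group comes out lower than the full-range coefficient even though the underlying relationship is unchanged, which is why validity (and reliability) coefficients from range-restricted samples understate the true association.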
PREDICTIVE VALIDITY - Refers to the extent to
which a test can predict future outcomes or behaviors. It
assesses whether a test score is able to forecast some
criterion or performance that will be measured later.
 BASE RATE - the prevalence or frequency of a
particular trait, behavior, characteristic, or
attribute in a given population.
 HIT RATE - the proportion of people that a test
accurately identifies as having a specific trait,
behavior, characteristic, or attribute.
 MISS RATE - the proportion of people a test
fails to correctly identify as possessing a
particular trait, characteristic, or attribute.
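The three rates above can be read off a hypothetical 2x2 table of test decisions versus actual status. All counts below are invented, and hit and miss rates are taken here as proportions of the people who actually have the trait, which is one common reading of the definitions:

```python
# Hypothetical screening results for 200 people.
# Rows: truly has the trait or not (by some gold standard);
# columns: the test flags the person or not.
true_pos, false_neg = 30, 10    # trait present: flagged / missed
false_pos, true_neg = 20, 140   # trait absent: wrongly flagged / correctly passed
total = true_pos + false_neg + false_pos + true_neg

base_rate = (true_pos + false_neg) / total      # prevalence of the trait
hit_rate  = true_pos / (true_pos + false_neg)   # correctly identified as having it
miss_rate = false_neg / (true_pos + false_neg)  # failed to be identified

print(base_rate, hit_rate, miss_rate)  # 0.2 0.75 0.25
```

Hit rate and miss rate are complements (they sum to 1), and both must always be judged against the base rate: when a trait is rare, even a test with a high hit rate will flag many people who do not have it.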