RELIABILITY AND VALIDITY

RELIABILITY

Reliability

 When a test is reliable, it provides dependable, consistent results and, for this reason, the term consistency is often given as a synonym for reliability (e.g., Anastasi, 1988).

Consistency = Reliability
 In the psychometric sense it really only refers to something that
is consistent—not necessarily consistently good or bad, but
simply consistent
Variance

 A statistic useful in describing sources of test score variability is the variance (σ²), the standard deviation squared. This statistic is useful because it can be broken into components.
 Variance from true differences is true variance, and variance from irrelevant, random sources is error variance. The total variance is the sum of the true variance and the error variance.
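 In symbols, this is the classical test theory decomposition of observed-score variance (the formula restates the bullet above; the notation is added here for reference):

\sigma^2_{total} = \sigma^2_{true} + \sigma^2_{error}

 For example, if the true variance of a set of scores were 80 and the error variance 20, the total variance would be 100.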
Reliability

 The term reliability refers to the proportion of the total variance attributed to true variance.
 The greater the proportion of the total variance attributed to true
variance, the more reliable the test.
 Because true differences are assumed to be stable, they are presumed
to yield consistent scores on repeated administrations of the same test
as well as on equivalent forms of tests.
 Because error variance may increase or decrease a test score by varying
amounts, consistency of the test score—and thus the reliability—can be
affected.
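 As a formula (a standard classical test theory expression, added for illustration), reliability is the ratio of true variance to total variance:

r_{xx} = \sigma^2_{true} / \sigma^2_{total}

 Continuing the hypothetical example above: with a true variance of 80 and a total variance of 100, reliability would be 80/100 = .80, meaning 80% of the score variability reflects true differences among examinees.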
Methods for Estimating
Reliability
 The selection of a method for estimating reliability depends on the
nature of the test.
 Each method not only entails different procedures but is also affected
by different sources of error. For many tests, more than one method
should be used
 4 types of reliability:
1. Test Re-test Reliability
2. Alternate (Equivalent, Parallel) Forms Reliability
3. Internal Consistency Reliability
4. Inter-Rater (Inter-Scorer, Inter-Observer) Reliability
Test-Retest Reliability

 The test-retest method for estimating reliability involves administering the same test to the same group of examinees on two different occasions and then correlating the two sets of scores.
 When using this method, the reliability coefficient indicates the degree
of stability (consistency) of examinees' scores over time and is also
known as the coefficient of stability.
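 A minimal sketch of how a coefficient of stability could be computed; the scores and variable names below are hypothetical, not taken from any dataset mentioned in these slides.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for the same six examinees on two occasions
scores_time1 = np.array([23, 31, 27, 35, 29, 40])
scores_time2 = np.array([25, 30, 28, 36, 27, 41])

# The test-retest reliability estimate is the Pearson correlation
# between the two administrations (the coefficient of stability)
r_stability, _ = pearsonr(scores_time1, scores_time2)
print(f"Test-retest reliability: {r_stability:.2f}")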
Cont..
 The primary sources of measurement error for test-retest reliability
are any random factors related to the time that passes between the
two administrations of the test.
 These time sampling factors include random fluctuations in
examinees over time (e.g., changes in anxiety or motivation) and
random variations in the testing situation.
 Memory and practice also contribute to error when they have random
carryover effects; i.e., when they affect many or all examinees but not
in the same way.
 Test-retest reliability is appropriate for determining the reliability of
tests designed to measure attributes that are relatively stable over
time and that are not affected by repeated measurement.
 It would be appropriate for a test of aptitude, which is a stable characteristic, but not for a test of mood, since mood fluctuates over time, or for a test of creativity, which might be affected by previous exposure to test items.
Alternate (Equivalent, Parallel)
Forms Reliability
 To assess a test's alternate forms reliability, two equivalent forms of
the test are administered to the same group of examinees and the
two sets of scores are correlated.
 Alternate forms reliability indicates the consistency of responding to
different item samples (the two test forms).
 The alternate forms reliability coefficient is also called the coefficient of equivalence when the two forms are administered at about the same time.
Cont..

 The primary source of measurement error for alternate forms reliability is content sampling, or error introduced by an interaction between different examinees' knowledge and the different content assessed by the items included in the two forms (e.g., Form A and Form B).
 The items in Form A might be a better match of one examinee's
knowledge than items in Form B, while the opposite is true for another
examinee.
 In this situation, the two scores obtained by each examinee will differ,
which will lower the alternate forms reliability coefficient.
Cont..

 If the same strategies required to solve problems on Form A are used to solve problems on Form B, even if the problems on the two forms are not identical, there are likely to be practice effects.
 When these effects differ for different examinees (i.e., are random),
practice will serve as a source of error.
 Although alternate forms reliability is considered by some experts to be
the most rigorous (and best) method for estimating reliability, it is not
often assessed due to the difficulty in developing forms that are truly
equivalent.
 Alternate forms reliability is not appropriate when the attribute
measured by the test is likely to be affected by repeated measurement.
Internal Consistency Reliability

 Reliability can also be estimated by measuring the internal consistency of a test.
 Split-half reliability and coefficient alpha are two methods for
evaluating internal consistency.
 Both involve administering the test once to a single group of
examinees, and both yield a reliability coefficient that is also known
as the coefficient of internal consistency.
Split-half Reliability

 To determine a test's split-half reliability, the test is split into equal halves so that each examinee has two scores (one for each half of the test).
 Scores on the two halves are then correlated.
 Tests can be split in several ways, but probably the most common
way is to divide the test on the basis of odd- versus even-numbered
items
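 A minimal sketch of an odd/even split-half estimate, assuming a small hypothetical matrix of item scores (rows = examinees, columns = items); the data are illustrative only.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical item scores: 6 examinees x 10 items (1 = correct, 0 = incorrect)
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
])

# Odd-numbered items are columns 0, 2, 4, ... (items 1, 3, 5, ...)
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# Split-half reliability is the correlation between the two half-test scores
r_half, _ = pearsonr(odd_half, even_half)
print(f"Uncorrected split-half reliability: {r_half:.2f}")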
Cont..

 A problem with the split-half method is that it produces a reliability coefficient that is based on test scores derived from only one-half of the full length of the test.
 If a test contains 30 items, each score is based on 15 items. Because
reliability tends to decrease as the length of a test decreases, the
split-half reliability coefficient usually underestimates a test's true
reliability.
 For this reason, the split-half reliability coefficient is ordinarily
corrected using the Spearman-Brown formula, which provides an
estimate of what the reliability coefficient would have been had it
been based on the full length of the test.
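 The Spearman-Brown correction for a half-length split is a standard formula, stated here for reference:

r_{SB} = \frac{2\,r_{half}}{1 + r_{half}}

 For example, if the correlation between the two halves is .70, the estimated reliability of the full-length test is 2(.70) / (1 + .70) ≈ .82.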
Coefficient alpha

 Cronbach's coefficient alpha also involves administering the test once to a single group of examinees. However, rather than splitting the test in half, a special formula is used to determine the average degree of inter-item consistency.
 One way to interpret coefficient alpha is as the average reliability that
would be obtained from all possible splits of the test.
 When test items are scored dichotomously (right or wrong), a
variation of coefficient alpha known as the Kuder-Richardson Formula
20 (KR-20) can be used.
 In contrast to KR-20, which is appropriately used only on tests with
dichotomous items, coefficient alpha is appropriate for use on tests
containing non-dichotomous items.
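 A minimal sketch of the coefficient alpha computation, written directly from the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the response matrix is hypothetical.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an examinees-by-items score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert-type responses: 5 examinees x 4 items
responses = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
])
# With dichotomous (0/1) items, the same computation yields KR-20
print(f"Coefficient alpha: {cronbach_alpha(responses):.2f}")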
Cont..

 Content sampling is a source of error for both split-half reliability and coefficient alpha.
 For split-half reliability, content sampling refers to the error resulting
from differences between the content of the two halves of the test (i.e.,
the items included in one half may better fit the knowledge of some
examinees than items in the other half);
 For coefficient alpha, content (item) sampling refers to differences
between individual test items rather than between test halves.
Cont..

 Coefficient alpha also has the heterogeneity of the content domain as a source of error.
 The greater the heterogeneity of the content domain, the lower the
inter-item correlations and the lower the magnitude of coefficient alpha.
 Coefficient alpha could be expected to be smaller for a 200-item test
that contains items assessing knowledge of test construction, statistics,
ethics, epidemiology, environmental health, social and behavioral
sciences, rehabilitation counseling, etc. than for a 200-item test that
contains questions on test construction only.
Cont..

 The methods for assessing internal consistency reliability are useful when
 a test is designed to measure a single characteristic
 the characteristic measured by the test fluctuates over time
 scores are likely to be affected by repeated exposure to the test
Cont..

 Coefficient alpha typically ranges in value from 0 to 1. The reason for this is that, conceptually, coefficient alpha (much like other coefficients of reliability) is calculated to help answer questions about how similar sets of data are.
 Here, similarity is gauged, in essence, on a scale from 0 (absolutely no
similarity) to 1 (perfectly identical).
 A myth about alpha is that “bigger is always better.” As Streiner (2003)
pointed out, a value of alpha above .90 may be “too high” and indicate
redundancy in the items
Inter-Rater (Inter-Scorer, Inter-
Observer) Reliability

 Inter-rater reliability is of concern whenever test scores depend on a rater's judgment.
 A test constructor would want to make sure that an essay test, a behavioral observation scale, or a projective personality test has adequate inter-rater reliability.
 This type of reliability is assessed either by calculating a correlation
coefficient (e.g., a kappa coefficient or coefficient of concordance) or
by determining the percent agreement between two or more raters.
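 A minimal sketch of two common agreement indices for two raters (percent agreement and Cohen's kappa), computed from scratch on hypothetical behavior codes; the categories and data are illustrative.

import numpy as np

# Hypothetical codes assigned by two raters to the same 10 observations
rater_a = np.array(["on-task", "off-task", "on-task", "on-task", "off-task",
                    "on-task", "on-task", "off-task", "on-task", "on-task"])
rater_b = np.array(["on-task", "off-task", "on-task", "off-task", "off-task",
                    "on-task", "on-task", "on-task", "on-task", "on-task"])

# Percent agreement: proportion of observations coded identically
p_observed = np.mean(rater_a == rater_b)

# Chance agreement: summed products of each rater's category proportions
categories = np.unique(np.concatenate([rater_a, rater_b]))
p_chance = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)

# Cohen's kappa corrects observed agreement for agreement expected by chance
kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"Percent agreement: {p_observed:.0%}, kappa: {kappa:.2f}")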
Cont..

 Sources of error for inter-rater reliability include factors related to the raters (such as lack of motivation and rater biases) and characteristics of the measuring device.
 An inter-rater reliability coefficient is likely to be low, for instance, when rating categories
 are not exhaustive (i.e., don't include all possible responses or behaviors)
 are not mutually exclusive
 The inter-rater reliability of a behavioral rating scale can also be affected by consensual observer drift, which occurs when two (or more) observers working together influence each other's ratings so that they both assign ratings in a similarly idiosyncratic way.
 Consensual observer drift tends to artificially inflate inter-rater reliability.
VALIDITY
Validity

 Validity, as applied to a test, is a judgment or estimate of how well a test measures what it purports to measure in a particular context.
 Characterizations of the validity of tests and test scores are frequently
phrased in terms such as “acceptable” or “weak.”
 These terms reflect a judgment about how adequately the test measures
what it purports to measure.
 Inherent in a judgment of an instrument’s validity is a judgment of how
useful the instrument is for a particular purpose with a particular
population of people.
Cont..

 When a test is described as "valid," what is really meant is that the test has been shown to be valid for a particular use with a particular population of testtakers at a particular time.
 No test or measurement technique is “universally valid” for all time, for
all uses, with all types of testtaker populations.
 Rather, tests may be shown to be valid within what we would
characterize as reasonable boundaries of a contemplated usage. If those
boundaries are exceeded, the validity of the test may be called into
question.
 Further, to the extent that the validity of a test may diminish as the
culture or the times change, the validity of a test may have to be re-
established with the same as well as other testtaker populations.
Types of validity

 Content validity
 Construct validity
 Criterion validity
Face Validity

 Face validity relates more to what a test appears to measure to the person being tested than to what the test actually measures.
 Face validity is a judgment concerning how relevant the test items
appear to be.
 Stated another way, if a test definitely appears to measure what it
purports to measure “on the face of it,” then it could be said to be high
in face validity.
Cont..

 A paper-and-pencil personality test labeled The Introversion/Extraversion Test, with items that ask respondents whether they have acted in an introverted or an extraverted way in particular situations, may be perceived by respondents as a highly face-valid test.
 On the other hand, a personality test in which respondents are asked to
report what they see in inkblots may be perceived as a test with low
face validity. Many respondents would be left wondering how what they
said they saw in the inkblots really had anything at all to do with
personality.
Cont..

 In contrast to judgments about the reliability of a test and judgments about the content, construct, or criterion-related validity of a test, judgments about face validity are frequently thought of from the perspective of the testtaker, not the test user.
 A test’s lack of face validity could contribute to a lack of confidence in
the perceived effectiveness of the test—with a consequential decrease
in the testtaker’s cooperation or motivation to do his or her best
Content Validity

 Content validity evaluates how well an instrument (like a test) covers all relevant parts of the construct it aims to measure.
 Content validity is the degree to which a test or assessment instrument
evaluates all aspects of the topic, construct, or behavior that it is
designed to measure.
 Do the items fully cover the subject?
 High content validity indicates that the test fully covers the topic for the
target audience. Lower results suggest that the test does not contain
relevant facets of the subject matter.
Cont..

 Ideally, test developers have a clear vision of the construct being measured, and the clarity of this vision can be reflected in the content validity of the test (Haynes et al., 1995).
 To ensure content validity, test developers strive to include key
components of the construct targeted for measurement, and exclude
content irrelevant to the construct targeted for measurement.
 Although content validation is sometimes used to establish the validity
of personality, aptitude, and attitude tests, it is most associated with
achievement-type tests that measure knowledge of one or more content
domains and with tests designed to assess a well-defined behavior
domain.
Cont..

 Content validity has two aspects:
 Content relevance
 Content representation
Cont..

 Content validity is usually "built into" a test as it is constructed through a systematic, logical, and qualitative process that involves clearly identifying the content or behavior domain to be sampled and then writing or selecting items that represent that domain.
 Once a test has been developed, the establishment of content
validity relies primarily on the judgment of subject matter experts.
 If experts agree that test items are an adequate and
representative sample of the target domain, then the test is said
to have content validity.
Don’t confuse Content validity with Face
validity.

 Content validity refers to the systematic evaluation of a test by experts who determine whether or not test items adequately sample the relevant domain, while face validity refers simply to whether or not a test "looks like" it measures what it is intended to measure.

 Although face validity is not an actual type of validity, it is a desirable feature for many tests. If a test lacks face validity, examinees may not be motivated to respond to items in an honest or accurate manner. A high degree of face validity does not, however, indicate that a test has content validity.
Construct Validity

 Construct validity is about how well a test measures the concept it was designed to evaluate.
 Construct validity is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.
Construct

 A construct is an informed, scientific idea developed or hypothesized to describe or explain behavior.
 Intelligence is a construct that may be invoked to describe why a student
performs well in school.
 Other examples of constructs are job satisfaction, personality, intolerance, aptitude, depression, motivation, self-esteem, emotional adjustment, creativity, etc.
 Constructs are unobservable, presupposed (underlying) traits that a test
developer may invoke to describe test behavior or criterion
performance.
Cont..

 Traditionally, construct validity has been viewed as the unifying concept for
all validity evidence (American Educational Research Association et al.,
1999).
 All types of validity evidence, including evidence from the content- and
criterion-related varieties of validity, come under the umbrella of construct
validity.
 The researcher investigating a test’s construct validity must formulate
hypotheses about the expected behavior of high scorers and low scorers
on the test.
 These hypotheses give rise to a tentative theory about the nature of the
construct the test was designed to measure. If the test is a valid measure
of the construct, then high scorers and low scorers will behave as predicted
by the theory.
Evidence of Construct Validity

 A number of procedures may be used to provide different kinds of evidence that a test has construct validity.
 The various techniques of construct validation may provide evidence, for example,
that
 the test is homogeneous, measuring a single construct
 test scores increase or decrease as a function of age, the passage of time, or an
experimental manipulation as theoretically predicted
 test scores obtained after some event or the mere passage of time (or, posttest
scores) differ from pretest scores as theoretically predicted
 test scores obtained by people from distinct groups vary as predicted by the theory
 test scores correlate with scores on other tests in accordance with what would be
predicted from a theory that covers the manifestation of the construct in question
Types Of Construct Validity

 Convergent validity
 Divergent (discriminant) validity
Convergent validity:
 Convergent validity shows whether a test that is designed to assess a
particular construct correlates with other tests that assess the same
construct.
 We can analyze convergent validity by comparing the results of a test
with those of others that are designed to measure the same construct. If
there is a strong positive correlation between the results, then the test
can be said to have high convergent validity.
Cont..

 Discriminant validity shows whether a test that is designed to measure a particular construct does not correlate with tests that measure different constructs. This is based on the idea that we wouldn't expect to see the same results from two tests that are meant to measure different things (e.g. a math test vs a spelling test).
 We can analyze discriminant validity by comparing the results of an
assessment that measures one thing with those of a test that measures
something else altogether. If there is no correlation between the scores,
the test can be said to have high discriminant validity; a strong
correlation would indicate low discriminant validity.
 To establish high construct validity, a test must be shown to have both high convergent validity AND high discriminant validity (see the sketch below).
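 A minimal sketch of how convergent and discriminant evidence might be examined, assuming hypothetical scores on a new anxiety scale, an established anxiety scale (same construct), and a vocabulary test (different construct); all names and data are illustrative.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for the same eight examinees on three measures
new_anxiety = np.array([12, 18, 25, 9, 30, 22, 15, 27])
established_anxiety = np.array([14, 20, 24, 10, 28, 21, 13, 29])  # same construct
vocabulary = np.array([41, 35, 38, 44, 33, 40, 37, 36])           # different construct

# Convergent evidence: strong correlation with a measure of the same construct
r_convergent, _ = pearsonr(new_anxiety, established_anxiety)

# Discriminant evidence: weak correlation with a measure of a different construct
r_discriminant, _ = pearsonr(new_anxiety, vocabulary)

print(f"Convergent r: {r_convergent:.2f}, discriminant r: {r_discriminant:.2f}")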
Criterion validity

 Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure.
 Criterion validity (or criterion-related validity) measures how well
one measure predicts an outcome for another measure. A test
has this type of validity if it is useful for predicting performance
or behavior in another situation (past, present, or future).
Cont..

For example:
 A job applicant takes a performance test during the interview process. If
this test accurately predicts how well the employee will perform on the
job, the test is said to have criterion validity.
 A graduate student takes the GRE. The GRE has been shown to be an effective tool (i.e., it has criterion validity) for predicting how well a student will perform in graduate studies.
 The first measure (in the above examples, the job performance test and
the GRE) is sometimes called the predictor variable or the estimator. The
second measure is called the criterion variable
Types of Criterion Validity

 Two types of validity evidence are included under criterion-related validity.
 Concurrent validity is an index of the degree to which a test score
is related to some criterion measure obtained at the same time
(concurrently).
 Predictive validity is an index of the degree to which a test score predicts some criterion measure in the future.
Concurrent Validity

 If test scores are obtained at about the same time as the criterion
measures are obtained, measures of the relationship between the test
scores and the criterion provide evidence of concurrent validity.
 Statements of concurrent validity indicate the extent to which test
scores may be used to estimate an individual’s present standing on a
criterion.
 Concurrent validity measures how well a new test compares to a well-established test.
 If we create a new test for depression levels, we can compare its
performance to previous depression tests that have high validity.
Predictive Validity

 Test scores may be obtained at one time and the criterion measures
obtained at a future time, usually after some intervening event has
taken place.
 The intervening event may take varied forms, such as training,
experience, therapy, medication, or simply the passage of time.
 Measures of the relationship between the test scores and a criterion
measure obtained at a future time provide an indication of the predictive
validity of the test; that is, how accurately scores on the test predict
some criterion measure.
Cont..

 The term predictive validity refers to the extent to which it's valid to use the score on some scale or test to predict the value of some other variable in the future.
 For example, we might want to know how well some college entrance exam is able to predict the first semester grade point average (GPA) of students.
 If there is a high correlation between scores on the entrance exam and the first semester GPA, it's likely that there is predictive validity between these two variables.
 In other words, the score that a student receives on this particular college entrance exam is predictive of the GPA they're likely to receive during their first semester in college (see the sketch below).
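 A minimal sketch of a predictive validity check, assuming hypothetical entrance exam scores collected at admission and first-semester GPAs collected later; the correlation between predictor and criterion serves as the validity coefficient. Data and names are illustrative.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical predictor (entrance exam) and later criterion (first-semester GPA)
exam_scores = np.array([1180, 1320, 1050, 1400, 1250, 990, 1150, 1350])
first_sem_gpa = np.array([3.1, 3.6, 2.7, 3.8, 3.3, 2.5, 3.0, 3.5])

# Predictive validity: correlation between the predictor and a criterion
# measure obtained at a future time
r_predictive, _ = pearsonr(exam_scores, first_sem_gpa)
print(f"Predictive validity coefficient: {r_predictive:.2f}")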
Summary

 Construct validity: concerns the extent to which your test or measure accurately assesses what it's supposed to.
 Content validity: Is the test fully representative of what it aims to measure?
 Face validity: Does the content of the test appear to be suitable to its aims?
 Criterion validity: Do the results accurately measure the concrete outcome they are designed to measure?
 All three types of validity evidence (content, construct, and criterion-related) contribute to a unified picture of a test's validity.
 A test user may not need to know about all three. Depending on the use to which a test is being put, one type of validity evidence may be more relevant than another.
THANK YOU!
