
Reliability and Validity

Today’s Objectives
◼ Understand the difference between reliability and
validity
◼ Understand how to develop valid indicators of a
concept
Reliability and Validity

Reliability
• How accurate or consistent is the measure?
• Would two people understand a question in the same way?
• Would the same person give the same answers under similar circumstances?

Validity
• Does the concept measure what it is intended to measure?
• Does the measure actually reflect the concept?
• Do the findings reflect the opinions, attitudes, and behaviors of the target population?
[Figure: three target diagrams illustrating "reliable but not valid," "valid but not reliable," and "valid and reliable."]

Levels of Reliability
Example: a person's weight, from lowest to highest reliability:
• LOW: Estimate on the part of the subject
• Estimate on the part of the observer
• Old bathroom scale
• HIGH: Industrial scale

Reliability
• Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects. In short, it is the repeatability of your measurement. A measure is considered reliable if a person's scores on the same test given twice are similar. It is important to remember that reliability is not measured; it is estimated.
• Here is a simple example to illustrate this. Suppose you have a bathroom scale that is broken; the scale represents the methodology. One person weighs you with it and obtains a result, then passes the scale to a second person, who follows the same procedure with the same scale and weighs you again. Using the same broken scale, the two people arrive at similar measures, so the results are reliable. Although the results are reliable, they may not be valid: by using the faulty scale, the results are not a true indicator of your real weight.
Reliability
• Accuracy, precision, or consistency of measurement
• Degree to which measures are free from error and therefore yield consistent results
• Reliable measures mean the same data would have been collected under similar circumstances
Methods used to determine reliability
• Test-retest method
– Administer the same measures to the same
respondents at two separate points in time
• Split-half method
– Correlate one-half of a scale with the other
half
• Calculate reliability coefficient
– Statistical test that measures the internal
consistency of a set of items
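The test-retest and split-half methods above both reduce to correlating two sets of scores. A minimal sketch in Python (all scores are made-up illustrative data, not from any real study):

```python
import numpy as np

# Hypothetical total scores for 6 respondents on a 4-item scale,
# administered at two separate points in time (illustrative data only).
time1 = np.array([12, 15, 9, 14, 11, 13])
time2 = np.array([13, 14, 9, 15, 10, 13])

# Test-retest reliability: correlate the two administrations.
test_retest_r = np.corrcoef(time1, time2)[0, 1]

# Split-half reliability: correlate one half of the scale with the other.
items = np.array([
    [3, 4, 2, 3],   # each row = one respondent's 4 item scores
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [4, 3, 4, 3],
    [3, 2, 3, 3],
    [4, 3, 3, 3],
])
half1 = items[:, ::2].sum(axis=1)   # odd-numbered items
half2 = items[:, 1::2].sum(axis=1)  # even-numbered items
split_half_r = np.corrcoef(half1, half2)[0, 1]

print(round(test_retest_r, 2), round(split_half_r, 2))
```

Both coefficients run from -1 to 1; values closer to 1 indicate a more reliable measure.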
How to improve Reliability?
• Quality of items: concise statements, homogeneous wording (some sort of uniformity)
• Adequate sampling of content domain;
comprehensiveness of items
• Longer assessment – less distorted by
chance factors
• Developing a scoring plan (esp. for
subjective items – rubrics)
• Ensure VALIDITY
Food Quality
• What items would you include to get
adequate sampling of content domain?
Program Satisfaction
• I like the after-school program
• I like the after-school teachers
• I would sign up again for the after-school
program
Validity
• The ability of a scale to measure what it is intended to measure
• The extent to which a measure reflects the real meaning of the concept under consideration
• The extent to which a measure reflects the opinions and behaviors of the population under investigation
• Cannot be valid unless also reliable
Validity
• Validity refers to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure. While reliability is concerned with the consistency of the actual measuring instrument or procedure, validity is concerned with the study's success at measuring what the researchers set out to measure.
Validity
• Depends on the Purpose of the measure
– E.g. a ruler may be a valid measuring device for length, but
isn’t very valid for measuring volume
• Measuring what ‘it’ is supposed to
• Must be inferred from evidence; cannot be directly
measured
What would be valid measures of…

• Intelligence?
• Religiosity?
• Knowledge of RPTS 336 material?
• Tourism motivations?
• Commitment to a leisure activity?
• Satisfaction with a leisure service?
• Environmental ethic?
Types of validity
• Face (content) validity—professional
agreement that variables cover range of
meanings included within the concept
– Items should be evaluated for their presumed
relevance
– Items should cover a range of ideas rather than
a single topic area
– Items should be evaluated in terms of the
abilities of the individuals under investigation
Types of validity
• Construct validity—the degree to which a
measure relates to other variables, as
expected, within a given system of theoretical
relationships
• Satisfaction and Program Quality
• Predictive validity—extent to which a
measure predicts some future event
• Self-esteem and GPA
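Predictive validity is typically assessed by correlating the measure with the later criterion it is supposed to predict. A minimal sketch in Python, using made-up self-esteem and GPA values for illustration:

```python
import numpy as np

# Hypothetical data: self-esteem scores at the start of a semester
# and GPAs recorded at the end (illustrative values only).
self_esteem = np.array([22, 35, 28, 40, 18, 31, 25, 37])
gpa = np.array([2.4, 3.3, 2.9, 3.8, 2.1, 3.0, 2.6, 3.5])

# Predictive validity coefficient: correlation between the measure
# and the future criterion.
validity_r = np.corrcoef(self_esteem, gpa)[0, 1]
print(round(validity_r, 2))
```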
Factors that can lower Validity
• Unclear directions
• Difficult reading vocabulary and sentence
structure
• Ambiguity in statements
• Inadequate time limits
• Inappropriate level of difficulty
• Poorly constructed test items
• Test items inappropriate for the outcomes
being measured
Continued….
• Tests that are too short
• Improper arrangement of items (complex to
easy?)
• Identifiable patterns of answers
• Teaching
• Administration and scoring
• Students
• Nature of criterion
External Validity
• Answers the question of generalizability
• To what populations or settings can this
effect be generalized?
• Two aspects
• Population validity
• Ecological Validity
Population Validity
• Is the actual sample representative of the theoretical population?
• To determine, need to identify:
– Theoretical population
– Accessible population
– Sampling design and selected sample
– Actual sample
Cronbach’s alpha
• Cronbach’s alpha, α (or coefficient alpha),
developed by Lee Cronbach in 1951, measures
reliability, or internal consistency. “Reliability” is
another name for consistency.
• Cronbach’s alpha tests whether multiple-question Likert scale surveys are reliable. These questions measure latent variables: hidden or unobservable traits such as a person’s conscientiousness, neuroticism, or openness, which are very difficult to measure in real life. Cronbach’s alpha will tell you how closely related a set of test items are as a group.
Cronbach’s alpha

α = (N · c̄) / (v̄ + (N − 1) · c̄)

Where:
N = the number of items.
c̄ = the average covariance between item pairs.
v̄ = the average variance.
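The formula can be computed directly from a matrix of item scores. A minimal sketch in Python, with made-up Likert responses (the function name is for illustration only):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha from an (n_respondents, n_items) score matrix,
    using alpha = N * c_bar / (v_bar + (N - 1) * c_bar)."""
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]
    cov = np.cov(items, rowvar=False)     # item covariance matrix
    v_bar = np.mean(np.diag(cov))         # average item variance
    # average covariance between item pairs (off-diagonal entries)
    c_bar = (cov.sum() - np.trace(cov)) / (n_items * (n_items - 1))
    return n_items * c_bar / (v_bar + (n_items - 1) * c_bar)

# Hypothetical 5-point Likert responses (rows = respondents, cols = items).
scores = [
    [4, 5, 4, 4],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # → 0.93
```

With these strongly interrelated items, alpha comes out around 0.93, which the rule of thumb below would call excellent.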
Rule of Thumb for Results
A rule of thumb for interpreting alpha for dichotomous questions (i.e.
questions with two possible answers) or Likert scale questions is:

α ≥ 0.9: Excellent
0.8 ≤ α < 0.9: Good
0.7 ≤ α < 0.8: Acceptable
0.6 ≤ α < 0.7: Questionable
0.5 ≤ α < 0.6: Poor
α < 0.5: Unacceptable

In general, a score of more than 0.7 is usually okay. However, some
authors suggest higher values of 0.90 to 0.95.
Cronbach’s alpha
• Avoiding Issues with Cronbach’s Alpha
• Use the rules of thumb listed above with caution. A high level for alpha may mean
that the items in the test are highly correlated. However, α is also sensitive to the
number of items in a test. A larger number of items can result in a larger α, and a
smaller number of items in a smaller α. If alpha is high, this may mean redundant
questions (i.e. they’re asking the same thing).
• A low value for alpha may mean that there aren’t enough questions on the test.
Adding more relevant items to the test can increase alpha. Poor interrelatedness
between test questions can also cause low values, so can measuring more than
one latent variable.
• Confusion often surrounds the causes for high and low alpha scores. This can result
in incorrectly discarded tests or tests wrongly labeled as untrustworthy.
Psychometrics professor Mohsen Tavakol and medical education professor Reg
Dennick suggest that improving your knowledge about internal consistency
and unidimensionality will lead to the correct use of Cronbach’s alpha:
• Unidimensionality in Cronbach’s alpha assumes the questions are only
measuring one latent variable or dimension. If you measure more than one
dimension (either knowingly or unknowingly), the test result may be meaningless.
You could break the test into parts, measuring a different latent variable or
dimension with each part. If you aren’t sure whether your test is unidimensional,
run factor analysis to identify the dimensions in your test.
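A quick way to probe unidimensionality before trusting alpha is to inspect the eigenvalues of the inter-item correlation matrix, a rough stand-in for a full factor analysis. The sketch below uses simulated data in which the first three items tap one latent variable and the last three another:

```python
import numpy as np

# Simulated item scores: the first three items load on one latent
# variable, the last three on another, so the test is NOT unidimensional.
rng = np.random.default_rng(0)
factor_a = rng.normal(size=200)
factor_b = rng.normal(size=200)

def noise():
    return rng.normal(scale=0.5, size=200)

items = np.column_stack([
    factor_a + noise(), factor_a + noise(), factor_a + noise(),
    factor_b + noise(), factor_b + noise(), factor_b + noise(),
])

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser's criterion: count eigenvalues greater than 1. A single
# dominant eigenvalue suggests one dimension; here two eigenvalues
# exceed 1, flagging two latent dimensions.
n_dimensions = int((eigenvalues > 1).sum())
print(n_dimensions)
```

If a test like this turns out to be multidimensional, computing a separate alpha for each cluster of items is more informative than one overall alpha.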