Understanding Validity and Reliability

The document discusses the concepts of reliability and validity in measurement, defining reliability as the consistency of a measure and validity as the accuracy of what is being measured. It outlines different types of reliability (test-retest, internal consistency, inter-rater) and validity (face, content, criterion, discriminant), providing examples for each. The importance of ensuring that measurements are both reliable and valid is emphasized for accurate research outcomes.

Topics covered

  • Research Accuracy
  • Measurement Consistency
  • Subjective Assessment
  • Criterion Validity
  • Validity Testing
  • Survey Methodology
  • Intelligence Measurement
  • Research Validity
  • Data Reliability
  • Survey Design
Validity and Reliability
Reliability
▪ Reliability refers to the consistency of a measure.
▪ Reliability refers to how consistently a method measures something. If the same result can be consistently achieved by using the same methods under the same circumstances, the measurement is considered reliable.
Reliability
▪ You measure the temperature of a liquid sample several times under identical conditions. The thermometer displays the same temperature every time, so the results are reliable.
Validity
▪ Validity is the extent to which the scores from a measure represent the variable they are intended to measure.
▪ Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, it produces results that correspond to real properties, characteristics, and variations in the physical or social world.
Validity
If the thermometer shows different
temperatures each time, even though you
have carefully controlled conditions to ensure
the sample’s temperature stays the same, the
thermometer is probably malfunctioning, and
therefore its measurements are not valid.
Kinds of Reliability
▪ Test-Retest
▪ Internal Consistency
▪ Inter-Rater
Test-Retest
▪ Test-retest reliability: if researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. (Assessed with Pearson's r.)
▪ Do you get the same results when you repeat the measurement?
Test-Retest
▪ A group of participants complete a
questionnaire designed to measure
personality traits. If they repeat the
questionnaire days, weeks or months apart
and give the same answers, this indicates
high test-retest reliability.
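The test-retest idea above can be sketched numerically: correlate the scores from the two administrations with Pearson's r. The scores below are hypothetical illustration data, not from any real study.

```python
# Test-retest reliability sketch: Pearson's r between two
# administrations of the same questionnaire.
# The scores below are hypothetical illustration data.

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 9, 20, 17, 11]   # scores at the first administration
time2 = [13, 14, 10, 19, 18, 12]  # same respondents, weeks later
r = pearson_r(time1, time2)
print(round(r, 3))  # 0.975: close to 1, so high test-retest reliability
```

A value near 1 indicates that respondents kept roughly the same relative standing across the two sittings.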
Internal Consistency
▪ Internal consistency is the consistency of respondents' responses across the items on a multiple-item measure. (Assessed with Cronbach's alpha.)
▪ The consistency of the measurement itself
▪ Do you get the same results from different
parts of a test that are designed to measure
the same thing?
Internal Consistency
▪ You design a questionnaire to measure self-
esteem. If you randomly split the results into
two halves, there should be a strong
correlation between the two sets of
results. If the two results are very different,
this indicates low internal consistency.
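The usual summary statistic for internal consistency is Cronbach's alpha, mentioned earlier. A minimal sketch with hypothetical 1-5 ratings (rows are respondents, columns are items):

```python
# Internal consistency sketch: Cronbach's alpha for a multi-item scale.
# The response matrix is hypothetical illustration data.

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(data):
    k = len(data[0])                                      # number of items
    item_vars = sum(variance(col) for col in zip(*data))  # per-item variances
    total_var = variance([sum(row) for row in data])      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.96: well above the commonly used ~0.7 threshold
```

Alpha near 1 means the items rise and fall together across respondents, which is what the split-half check in the example is probing.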
Inter-Rater
▪ Inter-rater reliability is the extent to which different observers are consistent in their judgments. (Assessed with Cronbach's alpha for quantitative ratings or Cohen's Kappa for categorical ratings.)
▪ The consistency of a measure across raters or observers.
▪ Do you get the same results when different people conduct the same measurement?
Inter-Rater
▪ Based on an assessment criteria checklist,
five examiners submit substantially different
results for the same student project. This
indicates that the assessment checklist has
low inter-rater reliability (for example, because
the criteria are too subjective).
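For categorical judgments like the examiners' verdicts above, agreement beyond chance can be quantified with Cohen's Kappa. A minimal sketch for two raters with hypothetical pass/fail labels:

```python
# Inter-rater reliability sketch: Cohen's kappa for two raters
# assigning categorical labels to the same set of student projects.
# The ratings below are hypothetical illustration data.
from collections import Counter

def cohens_kappa(r1, r2):
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance, from each rater's label frequencies.
    p_exp = sum(c1[lab] * c2[lab] for lab in c1 | c2) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

rater1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
kappa = cohens_kappa(rater1, rater2)
print(round(kappa, 2))  # 0.47: only moderate agreement beyond chance
```

Kappa of 1 would mean perfect agreement; values near 0 mean the raters agree no more often than chance, i.e. low inter-rater reliability.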
Kinds of Validity
▪Face Validity
▪Content Validity
▪Criterion Validity
▪Discriminant Validity
Face Validity
▪ Face validity is the extent to which a measurement method appears "on its face" to measure the construct of interest. A test is considered to have high face validity if raters broadly agree that it appears to measure the intended construct.
▪ It’s similar to content validity, but face validity is
a more informal and subjective assessment.
▪ Does the content of the test appear to be
suitable to its aims?
Face Validity
▪ You create a survey to measure the regularity
of people’s dietary habits. You review the
survey items, which ask questions about
every meal of the day and snacks eaten in
between for every day of the week. On its
surface, the survey seems like a good
representation of what you want to test, so
you consider it to have high face validity.
Content Validity
▪ Content Validity is the extent to which a measure
"covers" the construct of the interest. A test lacks
content validity if it doesn't cover all aspects of a
construct that would be measured or if it covers
topics that are unrelated to the construct in any way.
▪ The extent to which the measurement covers all
aspects of the concept being measured.
▪ Is the test fully representative of what it aims to
measure?
Content Validity
▪ A test that aims to measure a class of
students’ level of Spanish contains reading,
writing and speaking components, but no
listening component. Experts agree that
listening comprehension is an essential
aspect of language ability, so the test lacks
content validity for measuring the overall level
of ability in Spanish.
Criterion Validity
▪ Criterion Validity is the extent to which
respondents scores on measure are correlated
with other variables.
▪ The extent to which the result of a measure
corresponds to other valid measures of the
same concept.
▪ Do the results accurately measure the concrete
outcome they are designed to measure?
Criterion Validity
▪ A survey is conducted to measure the political
opinions of voters in a region. If the results
accurately predict the later outcome of an
election in that region, this indicates that the
survey has high criterion validity.
Discriminant Validity
▪ Discriminant validity is the extent to which scores on a measure are not
correlated with measures of variables that are
conceptually distinct. Discriminant validity shows
whether a test that is designed to measure a
particular construct does not correlate with tests
that measure different constructs. This is based on
the idea that we wouldn't expect to see the same
results from two tests that are meant to measure
different things (e.g., a math test vs a spelling test).
Discriminant Validity
▪ You are researching extroversion as a personality trait among marketing students. To establish discriminant validity, you must also measure an unrelated construct, such as intelligence.
▪ You have developed a questionnaire to measure extroversion, but you also ask your respondents to fill in a second questionnaire measuring intelligence in order to test the discriminant validity of your questionnaire.
Discriminant Validity
▪ Since the two constructs are unrelated, there should be no significant relationship between the scores on the two tests.
▪ If there is a correlation, then you may be measuring the same construct in both tests. This is an indication of poor discriminant validity.
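This check can be sketched as a correlation that should come out near zero. The scores below are hypothetical illustration data for the extroversion and intelligence questionnaires described above:

```python
# Discriminant validity sketch: the extroversion questionnaire should
# NOT correlate with the unrelated intelligence questionnaire.
# All scores are hypothetical illustration data.

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

extroversion = [12, 15, 18, 21, 24]      # questionnaire 1 scores
intelligence = [112, 97, 117, 102, 107]  # questionnaire 2 scores
r = pearson_r(extroversion, intelligence)
print(round(r, 2))  # -0.1: near zero, consistent with discriminant validity
```

A correlation near zero supports discriminant validity; a strong correlation would suggest the two questionnaires are partly measuring the same construct.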
