
Reliability and Validity

Prof Tavonga Njaya, PhD

1.0 Introduction

In this note, I discuss the two related concepts of reliability and validity and how they are used in research. Validity and reliability are key indicators of the quality of a measuring instrument: they capture the measurement properties of a survey, questionnaire, observation schedule, test or any other type of measure in quantitative research. Many qualitative researchers avoid the terms validity and reliability and instead use terms such as credibility, trustworthiness, truth value, applicability, consistency and confirmability when referring to criteria for evaluating the scientific merit of qualitative research.

2.0 Reliability
Reliability, according to Joppe (2000), is the extent to which results are consistent over time.
Reliability is chiefly concerned with making sure the method of data gathering leads to consistent
results. The following are the types of reliability:

Test-retest reliability is a measure of reliability obtained by administering the same test twice
over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be
correlated in order to evaluate the test for stability over time.

Example: A test designed to assess student learning in psychology could be given to a group of students twice, with the second administration perhaps coming a week after the first. The obtained correlation coefficient (between 0 and 1.0) indicates how stable the scores are over time, with values near 1.0 indicating high stability.
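As a concrete sketch, test-retest reliability can be estimated by correlating the two sets of scores. The snippet below (plain Python; the score lists are invented for illustration) computes the Pearson correlation between two administrations:

```python
# Test-retest reliability: Pearson correlation between two administrations
# of the same test. The score lists below are hypothetical.

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

time1 = [72, 85, 60, 91, 78]  # scores at the first administration
time2 = [70, 88, 63, 90, 75]  # same students, one week later

r = pearson_r(time1, time2)
print(f"test-retest reliability: {r:.3f}")  # a value near 1 suggests stable scores
```

In practice one would use a library routine (e.g. `scipy.stats.pearsonr`), which also reports a significance level.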

Parallel forms reliability is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. The scores from the two versions can then be correlated in order to evaluate the consistency of results across alternate versions.

Inter-rater/observer reliability is a measure of reliability used to assess the degree to which different judges or raters agree in their assessment decisions/answers/estimates. Inter-rater reliability is useful because human observers will not necessarily interpret answers the same way; raters may disagree as to how well certain responses or material demonstrate knowledge of the construct or skill being assessed.
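One common statistic for inter-rater reliability with two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The note does not prescribe a particular statistic, so the sketch below, with invented ratings, is only illustrative:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(ratings_a)
    # observed proportion of agreement
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # chance agreement expected from each rater's marginal frequencies
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Two raters judging the same eight responses (hypothetical data)
rater1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "fail"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail"]
print(f"kappa = {cohens_kappa(rater1, rater2):.2f}")  # prints "kappa = 0.50"
```

Here the raters agree on 6 of 8 responses (75%), but because half that agreement would be expected by chance, kappa is only 0.50.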

Internal consistency reliability is a measure of reliability used to evaluate the degree to which
different test items that probe the same construct produce similar results. There are two forms of
internal consistency reliability:

Average inter-item correlation is a subtype of internal consistency reliability. It is obtained by taking all of the items on a test that probe the same construct (e.g., reading comprehension), determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients. This final step yields the average inter-item correlation.
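The procedure above can be sketched directly. In the snippet below (hypothetical Likert-style responses), each pair of item columns is correlated and the coefficients are averaged:

```python
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Rows are respondents; columns are three items intended to probe the
# same construct (hypothetical 1-5 ratings).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
items = list(zip(*responses))  # one tuple of scores per item
pair_rs = [pearson_r(a, b) for a, b in combinations(items, 2)]
average_inter_item = sum(pair_rs) / len(pair_rs)
print(f"average inter-item correlation: {average_inter_item:.3f}")
```

With three items there are three pairs of correlations to average; with k items there are k(k-1)/2.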

Split-half reliability is another subtype of internal consistency reliability. The process of obtaining split-half reliability begins by “splitting in half” all items of a test that are intended to probe the same area of knowledge (e.g., World War II) in order to form two “sets” of items. The entire test is administered to a group of individuals, the total score for each “set” is computed, and finally the split-half reliability is obtained by determining the correlation between the two total “set” scores.
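A minimal sketch of the split-half procedure, using an odd/even split of invented item scores. The final Spearman-Brown step is a common refinement (not mentioned above) that estimates full-test reliability from the half-test correlation, assuming the two halves are equivalent:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Item-level scores (1 = correct, 0 = incorrect) for six items on the
# same topic, one row per test-taker (hypothetical data).
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
odd_totals = [sum(row[0::2]) for row in answers]   # items 1, 3, 5
even_totals = [sum(row[1::2]) for row in answers]  # items 2, 4, 6
r_half = pearson_r(odd_totals, even_totals)
# Spearman-Brown correction: half-test correlation understates the
# reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)
print(f"split-half r = {r_half:.3f}, corrected = {r_full:.3f}")
```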

The coefficient of internal consistency provides an estimate of the reliability of measurement and is based on the assumption that items measuring the same construct should correlate. Internal consistency is commonly measured with Cronbach’s alpha, developed by Lee Cronbach in 1951 to provide a measure of the internal consistency of a test or scale; it is expressed as a number between 0 and 1. Internal consistency describes the extent to which all the items in the test measure the same concept or construct, and hence it is connected to the inter-relatedness of the items within the test. If the items in a test are correlated with each other, the value of alpha increases. However, a high coefficient alpha does not always mean a high degree of internal consistency, since alpha also rises simply as the number of items grows.
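Cronbach’s alpha can be computed from the item variances and the variance of the total scores: alpha = (k / (k - 1)) × (1 - sum of item variances / variance of totals), where k is the number of items. A sketch with invented data:

```python
def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

# Rows are respondents; columns are items on the same scale (made-up data).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
k = len(responses[0])                   # number of items
items = list(zip(*responses))           # one column of scores per item
item_variances = [sample_variance(col) for col in items]
total_scores = [sum(row) for row in responses]

alpha = (k / (k - 1)) * (1 - sum(item_variances) / sample_variance(total_scores))
print(f"Cronbach's alpha: {alpha:.3f}")
```

Because the three items here move together across respondents, the total-score variance far exceeds the sum of the item variances and alpha comes out high.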

3.0 Validity
Validity refers to the degree to which a research study measures what it intends to measure. Leedy
(1980) describes validity as how sound or effective the measuring instruments are in gathering
relevant information. Saunders, Lewis and Thornhill (2009) define validity as “the extent to which
data collection methods accurately measure what they are intended to measure.” Validity requires
that an instrument is reliable, but an instrument can be reliable without being valid. For example,
a scale/balance that is incorrectly calibrated may yield exactly the same, albeit inaccurate, weight
values. The balance will be reliable by giving consistent results/readings but its measurements will
not be valid.

There are two main types of validity, internal and external. Internal validity refers to the validity
of the measurement and test itself (measurement technique), whereas external validity refers to the
ability to generalise the findings to the target population.

Types of internal validity

There are five main types of validity used when assessing internal validity: face, concurrent, predictive, content and construct validity.

Face validity: This refers to whether a technique looks as if it should measure the variable it intends to measure. For example, a task in which a participant must click a button as soon as a stimulus appears, with the delay recorded, appears on its face to measure reaction time.

Concurrent validity: This compares the results from a new measurement technique to those of a more established technique that claims to measure the same variable, to see if they are related. Two measurements will often behave in the same way without necessarily measuring the same variable, so this kind of validity must be examined thoroughly. Examples include IQ, Emotional Quotient and most grading systems used in schools.

Predictive validity: This is when the results obtained from measuring a construct can be used to predict behaviour accurately. There are obvious limitations, since behaviour can never be predicted perfectly, but this form of validity helps predict basic trends to a certain degree. An example of predictive validity is a written drivers’ licence test, which can be validated by hypothesising that the test will accurately predict driving competency amongst a group of drivers. If the test fails to predict this competency accurately, then it is not valid.

Content Validity
This is a subjective measure, but unlike face validity we ask whether the content of a measure covers the full domain of the content. A researcher who wanted to measure introversion would first have to decide what constitutes a relevant domain of content for that trait. This is considered a subjective form of measurement because the researcher relies on people’s perceptions for measuring constructs that would otherwise be difficult to measure. The researcher can use experts in the field of research. For example, in the case of a depression survey, the researcher could ask experts in depression to consider the questions against the known symptoms of depression, such as depressed mood, sleeping problems and weight change. The study can be made more objective through the use of rigorous statistical tests. For example, a content validity study could inform researchers how well the items used in a survey represent their content domain, how clear they are, and the extent to which they maintain the theoretical factor structure assessed by factor analysis.

Construct validity: This is whether the measurements of a variable in a study behave in exactly
the same way as the variable itself. This involves examining past research regarding different
aspects of the same variable.

A research study will often demonstrate one or more of these types of validity, but rarely all of them, so caution should be taken. For example, using measurements of weight to measure the variable height has concurrent validity, since weight generally increases as height increases; however, it lacks construct validity, because weight fluctuates with food deprivation whereas height does not.

Criterion-Related Validity
This is also referred to as instrumental validity. It is appropriate for researchers to establish criterion validity: the accuracy of a measure demonstrated by comparing it with another measure that has already been shown to be valid. In the depression example above, the gold standard is a clinical diagnosis of depression. The researcher could see how his or her questionnaire results relate to actual clinical diagnoses of depression among the workers surveyed. For this to work, the researcher must know that the criterion has been measured well, and must be aware that appropriate criteria do not always exist. What is being done is checking the performance of the operationalisation against criteria; the criteria used as a standard of judgement determine the particular approach taken.
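As an illustrative sketch of criterion-related validity in the depression example, one could correlate questionnaire totals with the clinical diagnosis coded as 0/1 (with a binary criterion, Pearson’s r is the point-biserial correlation). All data below are invented:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical questionnaire totals and the criterion: a clinician's
# diagnosis (1 = diagnosed with depression, 0 = not diagnosed).
questionnaire = [22, 31, 9, 27, 12, 35, 8, 30]
diagnosis = [1, 1, 0, 1, 0, 1, 0, 1]

# With a 0/1 criterion, Pearson's r is the point-biserial correlation.
r = pearson_r(questionnaire, diagnosis)
print(f"criterion validity coefficient: {r:.3f}")
```

A strong positive coefficient would suggest the questionnaire tracks the clinical gold standard; a weak one would cast doubt on the instrument even if its reliability were high.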

Formative Validity, when applied to outcomes assessment, is used to assess how well a measure provides information that helps improve the program under study.

Sampling Validity (similar to content validity) ensures that the measure covers the broad range
of areas within the concept under study. Not everything can be covered, so items need to be
sampled from all of the domains. This may need to be completed using a panel of “experts” to
ensure that the content area is adequately sampled. Additionally, a panel can help limit “expert”
bias (i.e. a test reflecting what an individual personally feels are the most important or relevant
areas). Example: When designing an assessment of learning in the theatre department, it would
not be sufficient to only cover issues related to acting. Other areas of theatre such as lighting,
sound, functions of stage managers should all be included. The assessment should reflect the
content area in its entirety.

What are the threats to Internal Validity?

Factors that can affect internal validity can come in many forms and it is important that these are controlled for as much as possible during research to reduce their impact on validity. Here are some factors which affect internal validity:
 Subject variability
 Size of subject population
 Time given for the data collection or experimental treatment
 History
 Attrition
 Maturation
 Instrument/task sensitivity

External Validity
External validity is one of the most difficult of the validity types to achieve and is at the foundation of every good experimental design. Many scientific disciplines, especially the social sciences, face a long battle to prove that their findings represent the wider population in real-world situations.

The main criterion of external validity is generalisation: whether results obtained from a small sample group, often in laboratory surroundings, can be extended to make predictions about the entire population. The reality is that if a research project has poor external validity, the results will not be taken seriously, so any research design must justify its sampling and selection methods.
Here are seven important factors that affect external validity:
 Population characteristics (subjects)
 Interaction of subject selection and research
 Descriptive explicitness of the independent variable
 The effect of the research environment
 Researcher or experimenter effects
 Data collection methodology
 The effect of time

4.0 Conclusion
It is important that validity and reliability not be viewed as independent qualities. A measurement cannot be valid unless it is reliable; it must be both valid and reliable if it is to be depended upon as an accurate representation of a concept or attribute (Wan, 2002). A research study design that meets standards for validity and reliability produces results that are both accurate (validity) and consistent (reliability). Knowledge of validity and reliability aids researchers in designing and judging their own work; it also makes one a better consumer of research, able to evaluate the research literature and to choose among alternative research designs and interventions (Gliner & Morgan, 2000).
