RELIABILITY & VALIDITY
Taking a closer look!
Facilitator: Shellon Samuels-White
Can a test be valid and not reliable?
A test can be reliable, but not valid. However, for a
test to be considered valid, it must be reliable.
Reliability provides the consistency needed to
obtain validity, and enables us to interpret
assessment results with greater confidence.
Factors such as test length and test duration can affect reliability.
A test with more items will generally have higher reliability, while insufficient testing time lowers it.
For example, if the time allowed is too short, test takers may become anxious and careless, reducing the
reliability of their scores.
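The effect of test length on reliability can be quantified with the Spearman-Brown prophecy formula, which predicts the reliability of a test after it is lengthened or shortened by a given factor. A minimal sketch; the reliability value and length factor below are purely illustrative:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimate the reliability of a test lengthened (or shortened) by length_factor.

    reliability: reliability of the current test (e.g. 0.70)
    length_factor: new length / current length (e.g. 2.0 = twice as many items)
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Illustrative numbers: doubling a test whose current reliability is 0.70
print(round(spearman_brown(0.70, 2.0), 2))  # ≈ 0.82 — a longer test, higher estimated reliability
```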
RELIABILITY
Would you get the same results if you used
the assessment at different times?
If an assessment is being rated, would
different markers rate the same way?
To what extent will the results be free from
errors?
Reliability is the degree to which an assessment tool produces stable and consistent
results.
Testing for Reliability
TEST-RETEST METHOD
Reliability is obtained by administering the same test to the same
group after some time interval. The scores from Time 1 and Time 2
are then correlated to evaluate the test's stability over time. Use
this method when you are measuring something that you expect to stay
constant in your sample.
EXAMPLE:
A test designed to assess student reading levels could be given to a
group of students twice, with the second administration coming a
week after the first.
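The correlation step described above can be illustrated in a few lines of Python; the reading scores below are invented for illustration, and Pearson's r is one common choice of statistic:

```python
from scipy.stats import pearsonr

# Hypothetical reading scores for the same ten students, one week apart
time_1 = [62, 75, 81, 58, 90, 70, 66, 85, 77, 69]
time_2 = [64, 73, 83, 55, 92, 71, 68, 84, 75, 70]

r, p_value = pearsonr(time_1, time_2)
print(f"Test-retest reliability (Pearson r): {r:.2f}")  # values near 1 suggest stable scores
```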
PARALLEL FORMS METHOD (Equivalent/Alternate forms)
Reliability is obtained by administering different versions of an
assessment tool to the same group of individuals. Both versions must
contain items that probe the same skill, knowledge base, etc. The scores
from the two versions can then be correlated in order to evaluate the
consistency of results across alternate versions.
EXAMPLE
A teacher creates two versions of a fraction test (the parallel forms) and
administers them one after the other. The two sets of scores are then
compared; because they are almost identical, the test shows high
parallel-forms reliability.
INTERNAL CONSISTENCY METHOD:
Split Half Method
Internal consistency assesses the correlation between multiple items in a test that are intended to measure the same
construct.
Why it’s important
When you devise a set of questions or ratings that will be combined into an overall score, you have to make sure that all
of the items really do reflect the same thing. If responses to different items contradict one another, the test might be
unreliable.
SPLIT-HALF METHOD
This process begins by “splitting in half” all items of a test that are intended to probe the same area of knowledge. You
randomly split a set of measures into two sets. After testing the entire set on the respondents, you calculate the
correlation between the two sets of responses.
EXAMPLE
A group of respondents are presented with a set of statements designed to measure optimistic and pessimistic mindsets.
They must rate their agreement with each statement on a scale from 1 to 5. If the test is internally consistent, an
optimistic respondent should generally give high ratings to optimism indicators and low ratings to pessimism indicators.
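A rough sketch of the split-half calculation under the scenario above: the items are split at random into two halves, each respondent's half-scores are correlated, and the Spearman-Brown correction estimates the reliability of the full-length test. The ratings and the random split below are invented for illustration (pessimism items are assumed to be already reverse-scored):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-5 ratings: 8 respondents x 10 mindset items (high = more optimistic)
scores = np.array([
    [5, 4, 5, 4, 5, 4, 5, 5, 4, 5],
    [2, 1, 2, 2, 1, 2, 1, 2, 2, 1],
    [4, 4, 3, 4, 4, 3, 4, 4, 3, 4],
    [3, 3, 3, 2, 3, 3, 2, 3, 3, 3],
    [5, 5, 4, 5, 5, 4, 5, 5, 5, 4],
    [1, 2, 1, 1, 2, 1, 1, 1, 2, 1],
    [4, 3, 4, 4, 3, 4, 4, 3, 4, 4],
    [2, 2, 3, 2, 2, 3, 2, 2, 2, 3],
])

# Randomly split the 10 items into two halves and total each half per respondent
items = rng.permutation(scores.shape[1])
half_a = scores[:, items[:5]].sum(axis=1)
half_b = scores[:, items[5:]].sum(axis=1)

r_half = np.corrcoef(half_a, half_b)[0, 1]   # correlation between the two halves
r_full = (2 * r_half) / (1 + r_half)         # Spearman-Brown correction to full test length
print(f"Split-half r: {r_half:.2f}  Corrected reliability: {r_full:.2f}")
```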
◦ How are parallel forms and split-half reliability similar/different?
Parallel forms: Version A is administered at Time A and Version B at Time B (two versions of the test).
Split half: the first half and the second half of a single test are both scored from one administration at Time A.
INTER-RATER RELIABILITY METHOD
This is a measure of reliability used to assess the degree to which
different judges or raters agree in their assessment decisions. Inter-
rater reliability is useful because human observers will not
necessarily interpret answers the same way; raters may disagree as
to how well certain responses or material demonstrate knowledge
of the construct or skill being assessed.
EXAMPLE
Inter-rater reliability might be employed when different judges are
evaluating the degree to which art portfolios meet certain
standards. The correlation between the judges' different sets of
ratings is then calculated. If all the markers give similar ratings,
the portfolio task has high inter-rater reliability.
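The example above describes correlating the judges' results; when ratings are categorical (e.g. below / meets / exceeds a standard), Cohen's kappa is another widely used agreement statistic. A rough sketch with two hypothetical raters and invented judgments, using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical standards-based judgments for 10 art portfolios by two raters
rater_1 = ["meets", "exceeds", "meets", "below", "meets",
           "exceeds", "below", "meets", "meets", "exceeds"]
rater_2 = ["meets", "exceeds", "meets", "below", "exceeds",
           "exceeds", "below", "meets", "meets", "meets"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level agreement
```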
IMPROVING INTER-RATER RELIABILITY
Clearly define your variables and the methods that will
be used to measure them.
Develop detailed, objective criteria for how the
variables will be rated, counted or categorized.
If multiple researchers are involved, ensure that they all
have exactly the same information and training.
Factors that lower the reliability of test scores
Test scores are based on too few items.
Remedy: Use longer tests or accumulate scores from several
short tests.
Testing conditions are inadequate.
Remedy: Arrange an opportune time for administration and
eliminate interruptions, noise and other disrupting factors.
Scoring is subjective.
Remedy: Prepare scoring keys and follow them carefully when
scoring essay answers.
The scale is said to be reliable but not a
valid measure of your weight. Can you explain
why?
VALIDITY
Validity is inferred from available evidence (not
measured).
Validity depends on many different types of evidence. If
we infer from our assessment results that students have
good ‘reasoning ability’, we would like some evidence to
support the claim that the results actually reflect that
construct.
Validity refers to the inferences drawn, not the instrument.
Types of Validity
FACE VALIDITY
Face validity considers how suitable the content of a test seems to
be on the surface. It is an informal and subjective assessment.
EXAMPLE
You create a survey to measure the regularity of people’s dietary
habits. You review the survey items, which ask questions about
every meal of the day and snacks eaten every day of the week. On
its surface, the survey seems like a good representation of what
you want to test, so you consider it to have high face validity.
CONSTRUCT VALIDITY
This is used to ensure that the measure actually measures what it is intended to
measure (i.e. the construct), and not other variables. A construct refers to a concept
or characteristic that can’t be directly observed, but can be measured by observing
other indicators that are associated with it. E.g. intelligence, obesity, job satisfaction,
or depression.
Example
If you develop a questionnaire to diagnose depression, you need to know: does the
questionnaire really measure the construct of depression? Or is it actually measuring
the respondent’s mood, self-esteem, or some other construct?
To achieve construct validity, the questionnaire must include only relevant questions
that measure known indicators of depression.
CONTENT VALIDITY
Content validity refers to the actual content within a test. A test that is valid in content should
adequately examine all aspects that define the objective. The determination of content-validity “should
include several teachers (and content experts when possible) in evaluating how well the test represents
the content taught.”
Content validity is not “tested for”. Rather, it is “assured” by the informed item selections made by
experts in the domain.
Example
A mathematics teacher develops an end-of-semester algebra test for her class. The test should cover every
form of algebra that was taught in the class. If some types of algebra are left out, then the results may not
be an accurate indication of students’ understanding of the subject. Similarly, if she includes questions
that are not related to algebra, the results are no longer a valid measure of algebra knowledge.
CRITERION VALIDITY
Criterion validity evaluates how closely the results of your test correspond to the results
of a different test. The criterion is an external measurement of the same thing. It is
usually an established or widely-used test that is already considered valid.
Example
A university professor creates a new test to measure applicants’ English writing ability.
To assess how well the test really does measure students’ writing ability, she finds an
existing test that is considered a valid measurement of English writing ability, and
compares the results when the same group of students takes both tests. If the outcomes
are very similar, the new test has high criterion validity.
What are some ways to improve validity?
Make sure your goals and objectives are clearly defined.
Expectations of students should be written down.
Match your assessment measure to your goals and objectives. Have
the test reviewed by faculty at other schools to obtain feedback
from an outside party who is less invested in the instrument.
Get students involved; have the students look over the assessment
for troublesome wording, or other difficulties.
If possible, compare your measure with other measures, or data that
may be available.
SOURCES
Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall Regents, pp. 231-249.
Dimitrov, D. M., & Rumrill, P. D., Jr. (2003). Pretest-posttest designs and measurement of change. Work: A Journal of
Prevention, Assessment and Rehabilitation, 20(2), 159-165.
Gronlund, N., & Waugh, C. (2009). Assessment of student achievement (9th ed.). New York: Pearson.
Middleton, F. (2020, June 26). Types of reliability and how to measure them.
https://www.scribbr.com/methodology/types-of-reliability/
Middleton, F. (2020, June 26). The four types of validity. https://www.scribbr.com/methodology/types-of-validity/