
ESTABLISH TEST

VALIDITY AND
RELIABILITY
QUALITIES OF A GOOD TEST:

1.VALIDITY
2.RELIABILITY
WHAT IS TEST VALIDITY?
A measure is valid when it
measures what it is supposed
to measure.
• If the quarterly exam is valid, then its contents should directly measure the
objectives of the curriculum.

• If a scale that measures personality is composed of five factors, then the
items within each of the five factors should be highly correlated.

• If an entrance exam is valid, it should predict students’ grades after the
first semester.
TYPES OF VALIDITY
1. CONTENT VALIDITY
2. FACE VALIDITY
3. PREDICTIVE VALIDITY
4. CONSTRUCT VALIDITY
5. CONCURRENT VALIDITY
6. CONVERGENT VALIDITY
7. DIVERGENT VALIDITY
CONTENT VALIDITY
When the items represent the domain being
measured.

Procedure:
The items are compared with the objectives of
the program. The items need to measure
directly the objectives (for achievement) or
definition (for scales). A reviewer conducts the
checking.
FACE VALIDITY
When the test is presented well, free from
errors, and administered well.

Procedure:
The test items and layout are reviewed and
tried out on a small group of respondents. A
manual for administration can be made as a
guide for the test administrator.
PREDICTIVE
VALIDITY
A measure should predict a
future criterion. An example is
an entrance exam predicting the
grades of the students after the
first semester.
CONSTRUCT VALIDITY
The components or factors of the test should
contain items that are strongly correlated.

• A good test must have strong construct


validity, meaning that it accurately measures
the underlying theoretical construct it is
intended to measure. This ensures that the
test is capturing the concept it claims to
assess.
CONCURRENT VALIDITY
When two or more measures are present for
each examinee that measure the same
characteristics.

Procedure:
To determine concurrent validity, you have to
measure the correlation of results from an
existing test and a new test and demonstrate
that the two give similar results.
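
The correlation step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the score lists are hypothetical, and scipy is assumed to be available.

```python
# Minimal sketch of checking concurrent validity: correlate scores from an
# existing, established test with scores from a new test taken by the same
# examinees. The score lists below are hypothetical.
from scipy.stats import pearsonr

existing_test = [85, 78, 92, 70, 88, 75, 90, 82, 68, 80]  # established measure
new_test      = [82, 75, 95, 68, 85, 73, 91, 80, 70, 78]  # new measure

r, p_value = pearsonr(existing_test, new_test)
print(f"r = {r:.2f}, p = {p_value:.3f}")
# A strong, significant positive correlation suggests the two tests measure
# the same characteristic, supporting concurrent validity.
```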
CONVERGENT VALIDITY
When the components or factors of a test are
hypothesized to have a positive correlation.
• For a test to be considered of high quality, it
should demonstrate convergent validity by
showing strong correlations with other
measures that assess the same or related
constructs. This helps establish the test's
credibility and accuracy.
DIVERGENT VALIDITY
When the components or factors of a test
are hypothesized to have a negative
correlation.
• Divergent validity is crucial for a good test, as it confirms that
the test is distinct from measures of
unrelated constructs. A high-quality test
should show low correlations with measures
of constructs that are theoretically
unrelated.
Group activity: Analyze the given cases of each type of validity. Then answer the
questions that follow each case.

1. Content validity

A coordinator in science is checking the science test paper for grade 4. She asked the
grade 4 science teacher to submit the table of specifications containing the objectives of
the lesson and corresponding items. The coordinator checked whether each item is
aligned with the objectives.

• How are the objectives used when creating test items?


• How is content validity determined when given the objectives and the items in a test?
• What should be presented in a test table of specifications when determining content
validity?
• Who checks the content validity of items?
2. Face validity

The assistant principal browsed the test paper made by the math
teacher. She checked if the contents of the items are about
mathematics. She examined if instructions are clear. She browsed
through the items if the grammar is correct and if the vocabulary is
within the students’ level of understanding.

What can be done in order to ensure that the assessment appears


to be effective?
What practices are done in conducting face validity?
Why is face validity the weakest form of validity?
3. Predictive validity

The school admissions office developed an examination. The officials wanted to
determine if the results of the entrance examination are accurate in identifying good
students. They took the grades of the students accepted for the first quarter. They
correlated the entrance exam results and the first-quarter grades. They found significant
and positive correlations between the entrance examination scores and grades. The
entrance examination results predicted the grades of students after the first quarter. Thus,
the test has predictive validity.

• Why are two measures needed in predictive validity?


• What is the assumed connection between these two measures?
• How can we determine if a measure has predictive validity?
• What statistical analysis is done to determine predictive validity?
• How are the test results of predictive validity interpreted?
4. Concurrent validity

A school guidance counselor administered a math achievement test to


grade 6 students. She also has a copy of the students’ grades in math.
She wanted to verify if the math grades of the students are measuring
the same competencies as the math achievement test. The school
counselor correlated the math achievement scores and math grades to
determine if they are measuring the same competencies.

• What needs to be available when conducting concurrent validity?


• At least how many tests are needed for conducting concurrent
validity?
• What statistical analysis can be used to establish concurrent
validity?
• How are the results of a correlation coefficient interpreted for
concurrent validity?
5. Construct validity

A science test was made by a grade 10 teacher composed of four domains:


matter, living things, force and motion, and earth and space. There are 10
items under each domain. The teacher wanted to determine if the ten items
made under each domain really belonged to that domain. The teacher
consulted an expert in test measurement. They conducted a procedure called
factor analysis. Factor analysis is a statistical procedure done to determine if
the items written load under the domain to which they belong.

What type of test requires construct validity?
What should the test have in order to verify its constructs?
What are constructs and factors in a test?
How are these factors verified if they are appropriate for the test?
What results come out in construct validity?
How are the results of construct validity interpreted?
Construct validity continues…

The construct validity of a measure is reported in journal articles. The following
are guide questions used when searching for the construct validity of a measure
from reports:
• What was the purpose of the construct validity?
• What type of test was used?
• What are the dimensions or factors that were studied using construct validity?
• What procedures were used to establish the construct validity?
• What statistics were used for the construct validity?
• What were the results of the test’s construct validity?
6. Convergent validity

A math teacher developed a test to be administered at the end of the school
year, which measures number sense, patterns and algebra, measurement,
geometry, and statistics. The math teacher assumes that students’ competency
in number sense improves their capacity to learn patterns and algebra and the
other concepts. After administering the test, the scores were separated for each
area, and the five domains were intercorrelated using Pearson r. The positive
correlation between number sense and patterns and algebra indicates that when
number sense scores increase, patterns and algebra scores also increase. This
suggests that students’ learning of number sense scaffolds their patterns and
algebra competencies.

• What should the test have in order to conduct convergent validity?
• What is done with the domains in the test for convergent validity?
• What analysis is used to determine convergent validity?
• How are the results of convergent validity interpreted?
7. Divergent validity

An English teacher taught a metacognitive awareness strategy for comprehending a
paragraph to grade 11 students. She wanted to determine if the performance
of her students in reading comprehension would reflect well in the reading
comprehension test. She administered the same reading comprehension test
to another class which was not taught the metacognitive awareness strategy.
She compared the results using the t-test for independent samples and found
that the class that was taught the metacognitive awareness strategy
performed significantly better than the other group. The test has divergent
validity.

• What conditions are needed to conduct divergent validity?
• What assumption is being proved in divergent validity?
• What statistical analysis can be used to establish divergent validity?
• How are the results of divergent validity interpreted?
WHAT IS TEST
RELIABILITY?
Test reliability is the consistency of
responses to a measure under the
following conditions:
1. When retested on the same
person;
2. When retested on the same
measure;
3. Similarity of responses across
items that measure the same
characteristic.
There are different factors that affect the reliability of a measure.
The reliability of the measure can be high or low, depending on the
following factors:

1. The number of items in a test - The more items a test has, the
higher the likelihood of reliability. The probability of obtaining
consistent scores is high because of the large pool of items.
2. Individual differences of participants - Every participant
possesses characteristics that affect their performance in a test,
such as fatigue, concentration, innate ability, perseverance, and
motivation. These individual factors change over time and affect the
consistency of the answers in a test.
3. External environment - The external environment may include
room temperature, noise level, depth of instruction, exposure to
materials, and quality of instruction, which could affect changes in
the responses of examinees in a test.
What are the different ways to
establish test reliability?
There are different ways in
determining the reliability of a test.
The specific kind of reliability will
depend on the
(1) variable you are measuring,
(2) type of test, and
(3) versions of the test.
Methods of Testing Reliability

1. Test-retest
2. Parallel Forms
3. Internal Consistency
4. Inter-rater Reliability
1. Test-retest
The test is administered at one time to a group of examinees
and then administered again at another time to the
“same group” of examinees.

This type of analysis is of value when we measure “traits”


or characteristics that do not change over time. Tests that
measure some constantly changing characteristic are not
appropriate for test-retest evaluation.
• The time interval is not more than 6
months between the first and second
administration of tests that measure stable
characteristics.
Coefficient of stability- the estimate of test-retest
reliability. Obtained using Pearson-r.

Correlation between administrations 1 and 2 should


be .70 or higher.
CORRELATION COEFFICIENT CORRELATION STRENGTH
.70 to 1.00 VERY STRONG
.50 to .70 STRONG
.30 to .50 MODERATE
0 to .30 WEAK
0 NONE
Pearson’s r describes the linear relationship
between two quantitative variables.
Pearson’s r in terms of statistical analysis:

Formula:
r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}

n – number of respondents
X – scores on the 1st test
Y – scores on the 2nd test
XY – products of the 1st and 2nd test scores

Respondent (N)   X    Y    X²     Y²     XY
1                50   51   2500   2601   2550
2                43   42   1849   1764   1806
3                48   48   2304   2304   2304
4                45   44   2025   1936   1980
5                40   41   1600   1681   1640
6                47   47   2209   2209   2209
7                52   51   2704   2601   2652
8                39   38   1521   1444   1482
9                44   43   1936   1849   1892
10               43   41   1849   1681   1763
TOTAL            ΣX = 451   ΣY = 446   ΣX² = 20497   ΣY² = 20070   ΣXY = 20278
r = [10(20278) − (451)(446)] / √{[10(20497) − 451²][10(20070) − 446²]} ≈ 0.98

INTERPRETATION:
.70 to 1.00   VERY STRONG
.50 to .70    STRONG
.30 to .50    MODERATE
0 to .30      WEAK
0             NONE

With r ≈ 0.98, the correlation between the two administrations is very strong, so the test-retest reliability is high.
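
The same computation can be checked with a short Python sketch (standard library only); the X and Y lists are the ten test-retest scores from the table above.

```python
# Sketch of the raw-score Pearson r computation for the test-retest data above
# (X = first administration, Y = second administration).
from math import sqrt

X = [50, 43, 48, 45, 40, 47, 52, 39, 44, 43]
Y = [51, 42, 48, 44, 41, 47, 51, 38, 43, 41]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)                              # 451, 446
sum_x2, sum_y2 = sum(x*x for x in X), sum(y*y for y in Y)  # 20497, 20070
sum_xy = sum(x*y for x, y in zip(X, Y))                    # 20278

r = (n*sum_xy - sum_x*sum_y) / sqrt((n*sum_x2 - sum_x**2) * (n*sum_y2 - sum_y**2))
print(round(r, 2))  # about 0.98 -> very strong, so test-retest reliability is high
```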
Exercise:
Complete the table and find the reliability of the two tests.

X    Y    X²   Y²   XY
10   20
9    15
6    12
10   18
12   19
4    8
5    7
7    10
16   17
8    13

FIND:
ΣX:
ΣY:
ΣX²:
ΣY²:
ΣXY:

Interpretation:
0.70 - 1.00   very strong
0.50 - 0.70   strong
0.30 - 0.50   moderate
0 - 0.30      weak
0             none
2. Parallel Forms
• Compares two equivalent forms of a test that
measure the same attribute.

• Coefficient of Equivalence- the correlation


between the scores obtained on the two forms
represents the reliability coefficient of the test.

• Two ways to administer: Immediate or Delayed


Parallel Forms method
• Same no. of items
• Same format
• Same coverage
• Same difficulty
• Same instructions, time limits,
illustrative examples
Statistical measure:
Pearson’s r
3. Internal Consistency
• Used when tests are administered once.
• This model of reliability measures the internal
consistency of the test which is the degree to which
each test item measures the same construct.
• It is the intercorrelations among the items.
• If all items on a test measure the same construct,
then it has a good internal consistency.
A.Split-Half
• In split-half reliability, a test is given and divided
into halves that are scored separately.

• The results of one half of the test are then


compared with the results of the other.

• The two halves of the test can be created in a


variety of ways: odd-even, random.
• Each examinee will have two scores
coming from the same test. The scores on
each set should be close or consistent.

• Split-half is applicable when the test has a


large number of items.
Statistical measures:
Pearson’s r
Spearman-Brown Formula
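
A minimal Python sketch of the split-half procedure follows; the item scores are hypothetical, and scipy is assumed to be available for the correlation.

```python
# Split-half sketch: correlate odd-numbered and even-numbered half-test scores,
# then step the correlation up to full-test length with Spearman-Brown.
from scipy.stats import pearsonr

# rows = examinees, columns = item scores (1 = correct, 0 = wrong); made-up data
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 0],
]

odd_half  = [sum(row[0::2]) for row in items]   # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in items]   # items 2, 4, 6, 8

r_half, _ = pearsonr(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)            # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")
```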
B. Cronbach’s Alpha
• Used in tests with no right or wrong
answers.

• Average of all split-halves.

• Disadvantage: affected by the number


of items.
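
A minimal sketch of the Cronbach's alpha computation, using hypothetical Likert-type ratings and assuming numpy is available:

```python
# Cronbach's alpha sketch:
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
import numpy as np

# rows = respondents, columns = items on a Likert-type scale (hypothetical)
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])

k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)      # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(round(alpha, 2))
```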
C. KR20

• Kuder-Richardson 20

• The formula for calculating the


reliability of a test in which the items
are dichotomous, scored 0 or 1
(usually for right or wrong).
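
A minimal sketch of the KR-20 computation on hypothetical right/wrong (1/0) responses, assuming numpy is available:

```python
# KR-20 sketch for dichotomous items:
# KR20 = k/(k-1) * (1 - sum(p*q) / variance of total scores)
import numpy as np

# rows = examinees, columns = items scored 1 (right) or 0 (wrong); made-up data
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
])

k = responses.shape[1]
p = responses.mean(axis=0)                 # proportion answering each item right
q = 1 - p
total_variance = responses.sum(axis=1).var(ddof=1)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_variance)
print(round(kr20, 2))
```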
4. Inter-rater Reliability
• The degree of agreement or consistency
between two or more scorers (or judges or
raters) with regard to a particular measure.

• Sufficient training is needed.

• Inter-scorer reliability, judge reliability, observer


reliability, and inter-rater reliability.
Kendall's W (also known as Kendall's coefficient of concordance)

• A method used for assessing the agreement among raters.

• Values range from 0 (no agreement) to 1 (perfect agreement).
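
A minimal sketch of Kendall's W for hypothetical ratings from three raters, assuming numpy and scipy are available; ties are ignored for simplicity:

```python
# Kendall's W sketch: W = 12S / (m^2 (n^3 - n)), where S is the sum of squared
# deviations of the rank sums. Ratings below are hypothetical.
import numpy as np
from scipy.stats import rankdata

# rows = raters, columns = examinees (e.g., essay scores from 3 raters)
ratings = np.array([
    [90, 85, 70, 60, 75],
    [88, 80, 72, 65, 70],
    [92, 84, 68, 62, 74],
])

m, n = ratings.shape                               # m raters, n subjects
ranks = np.apply_along_axis(rankdata, 1, ratings)  # rank each rater's scores
rank_sums = ranks.sum(axis=0)
S = ((rank_sums - rank_sums.mean()) ** 2).sum()
W = 12 * S / (m**2 * (n**3 - n))                   # no tie correction applied
print(round(W, 2))                                 # 0 = no agreement, 1 = perfect
```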


METHOD: No. of Forms / No. of Sessions / Type of Reliability Measure / Statistical Measure

Test-Retest: 1 form / 2 sessions / Measure of Stability / Pearson r
Parallel Forms-Immediate: 2 forms / 1 session / Measure of Equivalence / Pearson r
Parallel Forms-Delayed: 2 forms / 2 sessions / Measure of Equivalence / Pearson r
Split-Half: 1 form / 1 session / Measure of Internal Consistency / Pearson r and Spearman-Brown Formula
Cronbach’s Alpha: 1 form / 1 session / Measure of Internal Consistency / Cronbach’s Alpha
Kuder-Richardson: 1 form / 1 session / Measure of Internal Consistency / Kuder-Richardson Formula 20
Inter-rater: 1 form / 1 session / Measure of consistency and agreement between two or more raters / Kendall's W
Improving
the Test
Items
A. Item Analysis
B. Difficulty Index
C. Discrimination Index
D. Table of Non-plausibility of
Distracters
Item Analysis

Item analysis is a process which examines


student responses to individual test items
(questions) to assess the quality of those items
and of the test as a whole.
Item Difficulty
• Refers to the percentage of learners who
answered an item correctly.

• The larger the percentage of getting an


item right, the easier the item. The higher
the difficulty index, the easier the item is
understood (Wood, 1960)
Interpretation:
DIFFICULTY INDEX REMARK

0.91 and above Very Easy

0.76 to 0.90 Easy

0.26 to 0.75 Moderate or average

0.11 to 0.25 Difficult

Below 0.10 Very Difficult


Discrimination Index
Refers to the power of the item to
discriminate between students who
scored high and those who scored low
on the overall test.
Interpretation:
DISCRIMINATION INDEX REMARK

0.40 and above Very good item


0.30 to 0.39 Good Item
0.20 to 0.29 Reasonably good item
0.10 to 0.19 Marginal item
Below 0.10 Poor item
Types of Discrimination Index

POSITIVE DISCRIMINATION
-happens when more students in the upper group got the item
correctly than those students in the lower group.

NEGATIVE DISCRIMINATION
-happens when more students in the lower group got the item correctly
than those students in the upper group.

ZERO DISCRIMINATION
-happens when the number of students in the upper and lower groups
who answered the item correctly is equal.
Table of Non-plausibility of Distracters

is a tool used in educational assessment to


evaluate the effectiveness of multiple-choice test
items.

• The table lists the distractors (incorrect answer


choices) for each test item along with a rating of
their plausibility.
Item distractor
A good item distractor should be:
1. A plausible (reasonable) option, but not the correct one.
2. Chosen mostly by low-performing students (i.e., it shows
negative discrimination).

• An effective distractor draws students away from the
correct answer.
• An effective distractor will attract more low-performing
students than high-performing students, as illustrated in the
tally sketch below.
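
A hypothetical tally sketch in Python (not from the original slides) shows how option choices can be compared across the upper and lower groups when judging distractor plausibility:

```python
# Distractor tally for one multiple-choice item (correct answer assumed "B"):
# count choices separately for the upper and lower groups. An effective
# distractor draws more lower-group than upper-group students; a distractor
# chosen by no one is implausible and should be revised. Choices are made up.
from collections import Counter

upper_choices = ["B", "B", "B", "A", "B", "B", "B", "C", "B", "B"]
lower_choices = ["A", "B", "C", "A", "D", "B", "A", "C", "D", "A"]

upper_tally = Counter(upper_choices)
lower_tally = Counter(lower_choices)
for option in ["A", "B", "C", "D"]:
    print(option, "upper:", upper_tally.get(option, 0),
          "lower:", lower_tally.get(option, 0))
```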
STEPS IN ITEM ANALYSIS:

1. Score the test.
2. Arrange the test papers from highest to lowest score.
3. Separate the top 27% and the bottom 27% of the papers.
Ex. The following scores are arranged from highest to lowest.
No. of students: 50   No. of test items: 50
UPPER GROUP: Upper 27% = 13.5, or 14 students   LOWER GROUP: Lower 27% = 13.5, or 14 students

1. 50     11. 43    21. 35    31. 29    41. 20
2. 50     12. 42    22. 34    32. 27    42. 15
3. 49     13. 41    23. 34    33. 27    43. 15
4. 48     14. 40    24. 32    34. 27    44. 15
5. 48     15. 38    25. 32    35. 25    45. 14
6. 46     16. 38    26. 30    36. 25    46. 12
7. 45     17. 37    27. 30    37. 23    47. 12
8. 44     18. 35    28. 29    38. 22    48. 11
9. 43     19. 35    29. 29    39. 20    49. 10
10. 43    20. 35    30. 29    40. 20    50. 6
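
A short Python sketch of steps 2 and 3, using the 50 scores listed above:

```python
# Sort the scores from highest to lowest, then take the top 27% and the
# bottom 27% (27% of 50 = 13.5, rounded up to 14 students per group).
import math

scores = [50, 50, 49, 48, 48, 46, 45, 44, 43, 43, 43, 42, 41, 40, 38, 38, 37,
          35, 35, 35, 35, 34, 34, 32, 32, 30, 30, 29, 29, 29, 29, 27, 27, 27,
          25, 25, 23, 22, 20, 20, 20, 15, 15, 15, 14, 12, 12, 11, 10, 6]

scores.sort(reverse=True)                   # step 2: highest to lowest
group_size = math.ceil(0.27 * len(scores))  # 13.5 -> 14
upper_group = scores[:group_size]
lower_group = scores[-group_size:]
print(len(upper_group), len(lower_group))   # 14 14
```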


4. Prepare a tally sheet.

Item Number   Upper Group=14   Lower Group=14
1             14               14
2             13               11
3             14               11
4             14               7
5. Convert the frequencies to proportions.
Formula: P = no. of correct answers / n
Example: P = 14/14, or P = 1

Item Number   Upper Group=14 (F, P)   Lower Group=14 (F, P)
1             14, 1                   14, 1
2             13, 0.93                11, 0.79
3             14, 1                   11, 0.79
4             14, 1                   7, 0.5
6. Compute the Difficulty Index
Formula:
Item Difficulty = (pH + pL) / 2
where pH and pL are the proportions of the upper and lower groups who answered the item correctly.

DIFFICULTY INDEX     REMARK
0.91 and above       Very Easy
0.76 to 0.90         Easy
0.26 to 0.75         Moderate or average
0.11 to 0.25         Difficult
Below 0.10           Very Difficult
Item Difficulty
• Refers to the percentage of learners who
answered an item correctly.

• The larger the percentage of getting an


item right, the easier the item. The higher
the difficulty index, the easier the item is
understood (Wood, 1960)
Item Difficulty = (pH + pL) / 2

Item Number   Upper Group=14 (F, P)   Lower Group=14 (F, P)   Difficulty Index
1             14, 1                   14, 1                   1
2             13, 0.93                11, 0.79                0.86
3             14, 1                   11, 0.79                0.90
DIFFICULTY INDEX REMARK
0.91 above Very Easy
0.76 to 0.90 Easy
0.26 to 0.75 Moderate or average
0.11 to 0.25 Difficult
0.10 below Very Difficult
Item No.   Upper Group=14 (F, P)   Lower Group=14 (F, P)   Difficulty Index   REMARKS
1          14, 1                   14, 1                   1                  VERY EASY
2          13, 0.93                11, 0.79                0.86               EASY
3          14, 1                   11, 0.79                0.90               EASY
7. Compute the Discrimination Index

Formula:
Item Discrimination = pH – pL
DISCRIMINATION REMARK
INDEX
0.40 and above Very good item
0.30 to 0.39 Good Item
0.20 to 0.29 Reasonably good item
0.10 to 0.19 Marginal item
Below 0.10 Poor item
Discrimination Index
Refers to the power of the item to
discriminate between students who
scored high and those who scored low
on the overall test.
Formula:
Item Discrimination = pH – pL
Item Number   Upper Group=14 (F, P)   Lower Group=14 (F, P)   Discrimination Index
1             14, 1                   14, 1                   0
2             13, 0.93                11, 0.79                0.14
3             14, 1                   11, 0.79                0.21
4             14, 1                   7, 0.5                  0.5
DISCRIMINATION INDEX REMARK

0.41 and above High


0.20 to 0.40 Moderate
Below 0.19 Low

Item No.   Upper Group=14 (F, P)   Lower Group=14 (F, P)   Discrimination Index   REMARKS
1          14, 1                   14, 1                   0                      LOW
2          13, 0.93                11, 0.79                0.14                   LOW
3          14, 1                   11, 0.79                0.21                   MODERATE
4          14, 1                   7, 0.5                  0.5                    HIGH
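
The difficulty and discrimination computations for the four tally-sheet items can be sketched in Python as follows (the counts are the ones shown above; rounding may differ slightly from the slide values):

```python
# Steps 5-7 in code: convert the upper/lower-group counts to proportions, then
# compute the difficulty index (pH + pL) / 2 and the discrimination index pH - pL.
group_size = 14
upper_correct = [14, 13, 14, 14]   # items 1-4, upper group
lower_correct = [14, 11, 11, 7]    # items 1-4, lower group

for item, (u, l) in enumerate(zip(upper_correct, lower_correct), start=1):
    p_high = u / group_size
    p_low = l / group_size
    difficulty = (p_high + p_low) / 2
    discrimination = p_high - p_low
    print(f"Item {item}: difficulty = {difficulty:.2f}, "
          f"discrimination = {discrimination:.2f}")
```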
Negative
Discrimination

This happens when more students in


the lower group than the upper group
select the right answer to an item.
• Very easy or very difficult items do not
discriminate well.

• A poorly written item will have little


ability to discriminate.
8. Decide whether to retain, revise, or reject
the item.

• When an item meets the general


acceptability guidelines for difficulty and
discrimination, you should keep the item.
DIFFICULTY INDEX / DISCRIMINATION INDEX / DECISION

RETAIN
• Difficulty: Moderate or average (0.26 to 0.75)
• Discrimination: High (0.41 and above) or Moderate (0.20 to 0.40)
Items with a difficulty index within 0.26 to 0.75 and a discrimination index of 0.20 and above are to be retained.

REVISE
• Difficulty: Moderate or average (0.26 to 0.75) with Discrimination: Low (below 0.19), OR
• Difficulty: Very easy (0.91 and above), Easy (0.76 to 0.90), Difficult (0.11 to 0.25), or Very difficult (below 0.10) with Discrimination: High (0.41 and above) or Moderate (0.20 to 0.40)
Items with a difficulty index within 0.26 to 0.75 but a discrimination index of 0.19 and below, OR with a discrimination index of 0.20 and above but a difficulty index not within 0.26 to 0.75, should be revised.

REJECT
• Difficulty: Very easy (0.91 and above), Easy (0.76 to 0.90), Difficult (0.11 to 0.25), or Very difficult (below 0.10)
• Discrimination: Low (below 0.19)
Items with a difficulty index not within 0.26 to 0.75 and a discrimination index of 0.19 and below should be rejected.
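
A small Python sketch of this decision rule, assuming the 0.26 to 0.75 difficulty band and the 0.20 discrimination cutoff from the table:

```python
# Retain when both indices are acceptable, revise when exactly one is,
# reject when neither is.
def decide(difficulty: float, discrimination: float) -> str:
    difficulty_ok = 0.26 <= difficulty <= 0.75
    discrimination_ok = discrimination >= 0.20
    if difficulty_ok and discrimination_ok:
        return "RETAIN"
    if difficulty_ok or discrimination_ok:
        return "REVISE"
    return "REJECT"

print(decide(0.86, 0.14))  # REJECT (easy and low discrimination)
print(decide(0.90, 0.21))  # REVISE (easy but acceptable discrimination)
print(decide(0.50, 0.45))  # RETAIN
```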
Item No.   Upper Group=14 (F, P)   Lower Group=14 (F, P)   Difficulty Index   REMARKS     Discrimination Index   REMARKS    DECISION
1          14, 1                   14, 1                   1                  VERY EASY   0                      LOW        REJECT
2          13, 0.93                11, 0.79                0.86               EASY        0.14                   LOW        REJECT
3          14, 1                   11, 0.79                0.90               EASY        0.21                   MODERATE   REVISE


REMINDERS:

• Difficulty levels should never be negative – if you get


a negative result, recalculate

• Item discriminations can be negative – that is, more


students in the low-scoring group can get the item
correct than those in the top-scoring group.

• The general advice is to reject, revise, or rewrite entirely
any item with a negative discrimination index.
