LESSON 6
Establishing Test Validity and Reliability
Reported by: RACHEL ANN F. SABANDO
Key Explanations
1: Consistent response expected with
same participants
2: Consistency with same/equivalent
test at different times
3: Consistency across items measuring
same characteristic
What is Reliability?
Reliability is the consistency of responses to a measure under three conditions:
1. When retested on the same person
2. When retested on the same or an equivalent measure
3. Similarity of responses across items measuring the same characteristic
Factors Affecting Reliability
1. Number of Items in a Test
The more items a test has, the higher the likelihood of reliability. The probability of obtaining consistent scores is high because of the large pool of items.
2. Individual Differences of Participants
Every participant possesses characteristics that affect their performance in a test, such as fatigue, concentration, innate ability, perseverance, and motivation. These individual factors change over time and affect the consistency of the answers in a test.
3. External Environment
The external environment may include room temperature, noise level, depth of instruction, exposure to materials, and quality of instruction, which could affect changes in the responses of examinees in a test.
What are the different ways to
establish test reliability?
*Key Determinants of Reliability Method Selection
1. Variable Measured (e.g., stable traits like IQ vs. transient states like mood)
2. Test Type (e.g., multiple-choice vs. performance-based)
3. Number of Test Versions Available
Method in Testing Reliability
1. Test-retest
How is this reliability done?
- You have a test, and you need to administer it at
one time to a group of examinees. Administer it again at
another time to the *same group* of examinees.
- There is a time interval of not more than 6 months between
the first and second administration of tests that measure stable
characteristics, such as standardized aptitude tests. The post-
test can be given with a minimum time interval of 30 minutes.
- The responses in the test should more or less be the same
across the two points in time.
- Applicability: Test-retest is applicable for tests that measure
stable variables, such as aptitude and psychomotor measures
(e.g., typing test, tasks in physical education).
What statistic is used?
- Correlate the test scores from the first and the second administration. A significant and positive correlation indicates that the test has temporal stability over time. Use the **Pearson Product Moment Correlation (Pearson r)** because test data are usually on an interval scale.
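The test-retest correlation can be sketched in Python. The scores below are hypothetical; the function implements the standard Pearson r formula.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt((n * sum(v * v for v in x) - sx ** 2) *
                    (n * sum(v * v for v in y) - sy ** 2))
    return num / den

# Hypothetical data: the same 5 examinees tested at two points in time
first = [10, 12, 15, 9, 14]
second = [11, 13, 15, 8, 14]
r = pearson_r(first, second)  # a high positive r suggests temporal stability
```

A significance check against a critical value (discussed later in this lesson) would then tell whether the obtained r is trustworthy.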
2. Parallel forms
How is this reliability done?
- There are two versions of a test. The items need to
exactly measure the same skill. Each test version is called a
*form*.
- Administer one form at one time and the other form at
another time to the *same* group of participants.
- The responses on the two forms should be more or less the
same.
- Parallel forms are applicable if there are two versions of the
test (e.g., entrance examinations, licensure examinations).
What statistic is used?
-Correlate the test results for the first form and the second
form using **Pearson r**. A significant and positive
correlation coefficient indicates consistency between forms.
3. Split-Half
How is this reliability done?
- Administer a test to a group of examinees. Split the
items into halves (usually odd-even technique).
- Correlate the sum of points in odd-numbered items with
the sum of points in even-numbered items. Each examinee
will have two scores from the same test.
- Used when the test has a large number of items.
What statistic is used?
1. Correlate the two sets of scores using **Pearson
r**.
2. Apply the **Spearman-Brown Coefficient** to adjust
for test length.
- A significant positive correlation indicates internal
consistency.
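The two steps above can be sketched as follows, using hypothetical dichotomous item scores and the standard Spearman-Brown correction.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt((n * sum(v * v for v in x) - sx ** 2) *
                    (n * sum(v * v for v in y) - sy ** 2))
    return num / den

# Hypothetical item scores: 5 examinees x 6 right/wrong items
items = [[1, 1, 1, 0, 1, 1],
         [1, 0, 1, 1, 0, 1],
         [0, 1, 0, 0, 1, 0],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 1, 0, 0, 1]]
odd = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even = [sum(row[1::2]) for row in items]  # items 2, 4, 6
r_half = pearson_r(odd, even)
# Spearman-Brown correction: adjusts the half-test r to full test length
r_full = 2 * r_half / (1 + r_half)
```

The correction is needed because each half contains only half the items, and shorter tests are less reliable; r_full estimates the reliability of the whole test.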
4. Test of Internal Consistency Using Kuder-Richardson and Cronbach's Alpha
How is this reliability done?
- Determine if scores for each item are consistently
answered by examinees.
- Works for tests with many items or Likert-scale
inventories (e.g., "strongly agree" to "strongly disagree").
What statistic is used?
- **Cronbach’s alpha** or **Kuder-Richardson
(KR-20/21)**.
- A value ≥ 0.60 indicates internal consistency.
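A minimal sketch of Cronbach's alpha, using hypothetical Likert-scale responses (alpha = k/(k-1) × (1 − Σ item variances / total-score variance)):

```python
def cronbach_alpha(scores):
    """scores: one list of item responses per examinee."""
    k = len(scores[0])                      # number of items
    def pvar(xs):                           # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [pvar([row[i] for row in scores]) for i in range(k)]
    total_var = pvar([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical Likert responses: 5 examinees x 4 items (1-5 scale)
responses = [[5, 4, 5, 4],
             [4, 4, 4, 3],
             [3, 3, 2, 3],
             [5, 5, 4, 5],
             [2, 3, 2, 2]]
alpha = cronbach_alpha(responses)  # >= 0.60 is taken as internally consistent
```

KR-20 is the special case of this formula for items scored right/wrong (0/1).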
5. Inter-rater Reliability
How is this reliability done?
- Measures consistency among multiple raters
using the same rubric.
- Used when assessments require multiple raters
(e.g., performance evaluations).
What statistic is used?
- **Kendall's coefficient of concordance (Kendall's W)**.
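Kendall's W can be sketched for the no-ties case, where each rater ranks the same set of examinees (the ratings below are hypothetical):

```python
def kendalls_w(rankings):
    """rankings: one rank list per rater over the same n examinees
    (ranks 1..n, no ties). W runs from 0 (no agreement) to 1 (perfect)."""
    m = len(rankings)                       # number of raters
    n = len(rankings[0])                    # number of examinees
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean = sum(rank_sums) / n
    s = sum((rs - mean) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical example: 3 raters rank 4 performances with the same rubric
ratings = [[1, 2, 3, 4],
           [1, 3, 2, 4],
           [2, 1, 3, 4]]
w = kendalls_w(ratings)
```

When all raters produce identical rankings, W equals 1, indicating perfect agreement.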
Linear Regression
Linear regression shows the relationship between two sets of scores from the same test administered at different times.
Visual Representation
Scatterplot showing Monday (X) vs. Tuesday (Y) test scores
- Each point = one student's paired scores
- Straight line = regression line
**Interpretation**:
- Tight cluster along line → High
reliability
- Scattered points → Low reliability
Computation of Pearson r Correlation
The index of the linear relationship is called a correlation coefficient. When the points in a scatterplot tend to fall along the regression line, the correlation is said to be strong. When the direction of the scatterplot is directly proportional, the correlation coefficient will have a positive value. If the relationship is inverse, the correlation coefficient will have a negative value. The statistical analysis used to determine the correlation coefficient is called the Pearson r. How the Pearson r is obtained is illustrated below.
Formula
r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}
where:
ΣX - add all the X scores (Monday scores)
ΣY - add all the Y scores (Tuesday scores)
X² - square the value of the X scores (Monday scores)
Y² - square the value of the Y scores (Tuesday scores)
XY - multiply the X and Y scores
ΣX² - add the squared values of X
ΣY² - add all the squared values of Y
ΣXY - add all the products of X and Y
* The value of a correlation coefficient does not exceed 1.00 or -1.00. A value of 1.00 or -1.00 indicates a perfect correlation. In tests of reliability, though, we aim for a high positive correlation, which means that there is consistency in the way the students answered the test.
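The sums defined above plug directly into the Pearson r formula. A worked sketch with hypothetical Monday/Tuesday scores:

```python
import math

# Hypothetical Monday (X) and Tuesday (Y) scores for 5 students
x = [8, 6, 9, 5, 7]
y = [9, 6, 8, 5, 7]
n = len(x)
sum_x = sum(x)                              # ΣX
sum_y = sum(y)                              # ΣY
sum_x2 = sum(v * v for v in x)              # ΣX²
sum_y2 = sum(v * v for v in y)              # ΣY²
sum_xy = sum(a * b for a, b in zip(x, y))   # ΣXY
# Pearson r from the raw-score sums
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
```

For these scores r works out to 0.90, a high positive correlation, so the two administrations are consistent.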
3. Difference between a positive and a negative correlation
*When the value of the correlation coefficient is positive, it means
that the higher the scores in X, the higher the scores in Y. This is
called a positive correlation. In the case of the two spelling scores, a
positive correlation is obtained.
*When the value of the correlation coefficient is
negative, it means that the higher the scores in X, the
lower the scores in Y and vice versa. This is called a
negative correlation. When the same test is administered
to the same group of participants, usually a positive
correlation indicates reliability or consistency of the
scores.
4. Determining the Strength of a Correlation
Strength Guidelines
| r Value   | Interpretation |
| 0.80-1.00 | Very strong    |
| 0.60-0.79 | Strong         |
| 0.40-0.59 | Moderate       |
5. Determining the Significance of the Correlation
The correlation obtained between two variables could be due to chance. To rule this out, the correlation is tested for significance. When a correlation is significant, it means that the relationship between the two variables is unlikely to have occurred by chance.
In order to determine whether a correlation coefficient is significant, it is compared with an expected value called the critical value. When the computed value is greater than the critical value, there is more than a 95% chance that the two variables are truly correlated, and the correlation is significant.
Method:
- Compare the computed correlation coefficient to the
*critical value* in the statistical table.
- If the computed value is higher than the critical value,
the correlation is significant.
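An equivalent form of this check converts r into a t statistic and compares it with the critical t value. The r below is hypothetical; 2.306 is the standard two-tailed critical t at α = .05 with df = 8.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero
    (degrees of freedom = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.82, 10          # hypothetical computed r for 10 examinees
t = t_statistic(r, n)
T_CRIT = 2.306           # two-tailed critical t at alpha = .05, df = 8
significant = abs(t) > T_CRIT
```

Here t exceeds the critical value, so the correlation would be declared significant at the .05 level.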
Example of Cronbach’s Alpha for Internal
Consistency:
Five (5) students answered a checklist regarding
cleanliness (scale: 1–5).
5 - always   4 - often   3 - sometimes   2 - rarely   1 - never