SPL-3 Unit 2

UNIT 2: RELIABILITY OF TESTS

2.1: Reliability: Meaning, True score estimation

Reliability is the extent to which a test is repeatable and gives consistent scores.
Tests that are relatively free from measurement errors are said to be reliable.

Measurement errors are random. A person's test score might not reflect his or her true score because the person was sick, anxious, in a noisy room, in a hurry, etc.

The reliability of a test refers to its ability to yield consistent results from one set of measures to another; it is the extent to which the obtained test scores are free from internal defects.

Reliability also refers to the extent to which a test yields consistent results upon testing and re-testing. Reliability includes such terms as consistency, stability, replicability and repeatability.

2.2: Types: Test-retest, Split-half, Parallel-form and Scorer reliability

There are four procedures in common use for computing the reliability coefficient of a test.

These are:
1. Test-Retest (Repetition)
2. Alternate or Parallel Forms
3. Split-Half Technique
4. Rational Equivalence.

1. Test-Retest Method (Repetition):

Test-retest reliability is also known as "stability": the extent to which individuals tend to obtain a similar score upon retaking the same test.

To estimate reliability by means of the test-retest method, the same test is administered twice to the same group of pupils, with a given time interval between the two administrations of the test.

The resulting test scores are correlated, and this correlation coefficient provides a measure of stability; that is, it indicates how stable the test results are over a period of time.

Thus it means administering the same test with a particular time gap, say 21 to 28 days, and then calculating the correlation between the first and second administrations; if we get a high, significant correlation, we can say the test is reliable.

The estimate of reliability varies according to the length of the time interval allowed between the two administrations. The product-moment method of correlation is the usual method for estimating the reliability of two sets of scores. Thus, a high correlation between the two sets of scores indicates that the test is reliable: it shows that the scores obtained in the first administration resemble the scores obtained in the second administration of the same test.
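As a concrete illustration, here is a minimal Python sketch of this computation, using entirely hypothetical scores for ten pupils tested about three weeks apart (scipy is assumed to be available):

from scipy.stats import pearsonr

# Hypothetical scores for the same ten pupils, tested 21-28 days apart.
first_administration = [12, 15, 19, 22, 25, 28, 30, 33, 36, 40]
second_administration = [14, 14, 20, 21, 26, 27, 31, 34, 35, 41]

# The test-retest (stability) coefficient is the Pearson product-moment correlation.
r, p_value = pearsonr(first_administration, second_administration)
print(f"Test-retest coefficient: r = {r:.2f}, p = {p_value:.4f}")

A value of r close to 1 would indicate that the scores are stable over the interval.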

In this method the time interval plays an important role. If it is too small, say a day or two, the consistency of the results will be influenced by the carry-over effect, i.e., the pupils will remember some of their answers from the first administration to the second.

If the time interval is long, say a year, the results will be influenced not only by the inequality of testing procedures and conditions, but also by actual changes in the pupils over that period of time.

Test-retest reliability example

You devise a questionnaire to measure the IQ of a group of participants (a property that is unlikely to change significantly over time). You administer the test two months apart to the same group of people, but the results are significantly different, so the test-retest reliability of the IQ questionnaire is low.

Advantages:
• The self-correlation (test-retest) method is the most generally used way of estimating the reliability coefficient.
• It can be used conveniently in many different situations.
• A test of an adequate length can be used after an interval of many days
between successive testing.

Disadvantages:
• If the test is repeated immediately, many subjects will recall their first answers
and spend their time on new material, thus tending to increase their scores.
• Besides immediate memory effects, practice and the confidence induced by familiarity with the material will also affect scores when the test is taken a second time, so the index of reliability so obtained is less accurate.
• If the interval between tests is rather long (more than six months), growth and maturation will affect the scores and tend to lower the reliability index.
• If the test is repeated immediately or after a little time gap, there may be the
possibility of carry-over effect/transfer effect/memory/practice effect.
• Repeating the same test on the same group a second time makes the students disinterested, so they do not take part wholeheartedly.
• Sometimes uniformity of testing conditions is not maintained, which also affects the test scores.
• Pupils may discuss a few questions after the first administration, which can increase the scores on the second administration and so distort the reliability estimate.

2. Alternate or Parallel Forms Method:

Parallel-form reliability is also known as alternate-form reliability, equivalent-form reliability or comparable-form reliability. In this method two parallel or equivalent forms of a test are used. Estimating reliability by means of the equivalent-form method involves the use of two different but equivalent forms of the test. By parallel forms we mean that the forms are equivalent so far as the content, objectives, format, difficulty level and discriminating value of items, length of the test, etc. are concerned.

It refers to the degree to which two different forms of the same test yield similar results. Once we have a test, we develop a parallel form with similar constructs/items (but not the same items), allow at least a fifteen-minute gap between the two tests, and then compute the correlation to find the reliability.

Parallel tests have equal mean scores, variances and inter-correlations among items. That is, two parallel forms must be homogeneous or similar in all respects, but not a duplication of test items. The two equivalent forms should be as similar as possible in content, degree, mental processes tested, difficulty level and other aspects.

The reliability coefficient may be looked upon as the coefficient of correlation between the scores on two equivalent forms of a test. One form of the test is administered to the students, and immediately on finishing, the other form is given to the same group.

The scores thus obtained are correlated, which gives the estimate of reliability. The reliability so found is called the coefficient of equivalence.
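A minimal sketch of the coefficient of equivalence in Python, assuming hypothetical scores for the same ten pupils on two parallel forms, might look like this:

import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores of the same pupils on Form A and Form B.
form_a = np.array([45, 50, 55, 60, 62, 68, 70, 75, 80, 85])
form_b = np.array([47, 49, 56, 58, 63, 66, 72, 74, 79, 86])

# Parallel forms should have roughly equal means and variances.
print("Means:", form_a.mean(), form_b.mean())
print("Variances:", form_a.var(ddof=1), form_b.var(ddof=1))

# The correlation between the two forms is the coefficient of equivalence.
r, _ = pearsonr(form_a, form_b)
print(f"Coefficient of equivalence: r = {r:.2f}")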

Parallel forms reliability example

A set of questions is formulated to measure financial risk aversion in a group of respondents. The questions are randomly divided into two sets, and the respondents are randomly divided into two groups. Both groups take both tests: group A takes test A first, and group B takes test B first. The results of the two tests are compared, and the results are almost identical, indicating high parallel forms reliability.

Advantages:
This procedure has certain advantages over the test-retest method:

• Here the same test is not repeated.

• Memory, practice, carryover effects and recall factors are minimised and they
do not affect the scores.

• Useful for the reliability of achievement tests.

• This method is one of the appropriate methods of determining the reliability of educational and psychological tests.

Limitations:
• It is difficult to have two parallel forms of a test. In certain situations (e.g. with the Rorschach) it is almost impossible.

• When the tests are not exactly equal in terms of content, difficulty and length, the comparison between the two sets of scores obtained from them may lead to erroneous decisions.
• Practice and carryover factors cannot be completely controlled.

• Moreover, administering two forms simultaneously creates boredom. That is why people prefer methods in which only one administration of the test is required.

• The testing conditions while administering the second form may not be the same. Besides, the testees may not be in a similar physical, mental or emotional state at both times of administration.

• Test scores on the second form of the test are generally somewhat higher.

Although difficult, carefully and cautiously constructed parallel forms would give us a reasonably satisfactory measure of reliability. For well-made standardised tests, the parallel-form method is usually the most satisfactory way of determining reliability.

3. Split-Half Method or Sub-divided Test Method:

The split-half method is an improvement over the earlier two methods, and it involves both the characteristics of stability and equivalence. The two methods of estimating reliability discussed above sometimes seem difficult: it may not be possible to use the same test twice or to obtain an equivalent form of the test. Hence, to overcome these difficulties, to reduce memory effects and to economise the test, it is desirable to estimate reliability through a single administration of the test.

In this method it is possible to divide a test into two halves and examine the relationship between the scores on the two halves. The test is administered as a whole, but test scores are computed separately for each half.

A common approach to splitting the test is to assign the odd-numbered items to one half and the even-numbered items to the other. The correlation between the two halves gives an estimate of the reliability of each half-test, but not of the reliability of the full test.

In this method the test is administered once to the sample, and it is the most appropriate method for homogeneous tests. This method provides the internal consistency of the test scores.

All the items of the test are generally arranged in increasing order of difficulty and the test is administered once to the sample. After administration the test is divided into two comparable or similar or equal parts or halves.

The scores are arranged in two sets, obtained from the odd-numbered items and the even-numbered items separately. For example, a test of 100 items is administered. Each individual's score based on the 50 odd-numbered items 1, 3, 5, … 99 and the score based on the even-numbered items 2, 4, 6, … 100 are arranged separately. Part 'A' consists of the odd-numbered items and part 'B' of the even-numbered items.
After obtaining the two scores on the odd- and even-numbered test items, the coefficient of correlation is calculated. It is really a correlation between two equivalent halves of scores obtained in one sitting. To estimate the reliability of the full test from this half-test correlation, the Spearman-Brown prophecy formula is used:

r(full test) = 2 × r(half test) / (1 + r(half test))

While using this formula, it should be kept in mind that the variances of the odd and even halves should be equal. If this is not the case, Flanagan's and Rulon's formulae can be employed; these formulae are simpler and do not involve computing the coefficient of correlation between the two halves.
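The odd-even split and the Spearman-Brown correction can be sketched in Python as follows; the 0/1 response matrix here is simulated, purely for illustration:

import numpy as np

# Simulate a hypothetical 0/1 response matrix: 50 pupils x 10 dichotomous items,
# driven by a latent ability so that the items correlate with one another.
rng = np.random.default_rng(0)
ability = rng.normal(size=50)
responses = (ability[:, None] + rng.normal(scale=0.8, size=(50, 10)) > 0).astype(int)

# Odd-numbered items (1, 3, 5, ...) are columns 0, 2, 4, ...;
# even-numbered items (2, 4, 6, ...) are columns 1, 3, 5, ...
odd_half = responses[:, 0::2].sum(axis=1)
even_half = responses[:, 1::2].sum(axis=1)

# Correlate the two halves, then apply the Spearman-Brown correction
# for the full-length test: r_full = 2r / (1 + r).
r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")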

Advantages:
• Here we are not repeating the test or using the parallel form of it and thus the
testee (subject) is not tested twice. As such, the carry over effect or practice
effect is not there.
• In this method, the fluctuations of the individual's ability because of environmental or physical conditions are minimised.
• Because of single administration of test, day-to-day functions and problems
do not interfere.
• Difficulty of constructing parallel forms of test is eliminated.

Limitations:
• A test can be divided into two equal halves in a number of ways and the
coefficient of correlation in each case may be different.
• This method cannot be used for estimating reliability of speed tests.
• As the test is administered once, the chance errors may affect the scores on
the two halves in the same way and thus tending to make the reliability
coefficient too high.
• This method is also not appropriate for heterogeneous tests.

In spite of all these limitations, the split-half method is considered the best of all the methods of measuring test reliability, as the data for determining reliability are obtained on a single occasion, reducing the time, labour and difficulty involved in a second or repeated administration.

Type of reliability and what it measures the consistency of:
• Test-retest: the same test over time.
• Inter-rater: the same test conducted by different people.
• Parallel forms: different versions of a test which are designed to be equivalent.
• Internal consistency: the individual items of a test.

4. Scorer Reliability:

Scorer reliability refers to the consistency with which different people who score the same test agree. It indicates how consistent test scores are if the test is scored by two or more people. For a test with a definite answer key, scorer reliability is of negligible concern. When the subject responds in his own words, handwriting and organization of subject matter, however, scorer reliability matters.

Note that it can also be called inter-observer reliability when referring to observational research. Here researchers observe the same behavior independently (to avoid bias) and compare their data. If the data are similar, the measure is reliable.
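A minimal sketch of scorer reliability in Python, assuming hypothetical marks awarded by two raters to the same ten essay answers:

from scipy.stats import pearsonr

# Hypothetical marks out of 10 given by two independent raters.
rater_1 = [7, 5, 8, 6, 9, 4, 7, 8, 5, 6]
rater_2 = [8, 5, 7, 6, 9, 5, 7, 8, 4, 6]

# Correlation between the raters, plus the simpler exact-agreement rate.
r, _ = pearsonr(rater_1, rater_2)
exact_agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
print(f"Scorer reliability: r = {r:.2f}, exact agreement = {exact_agreement:.0%}")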
Internal Consistency Reliability (Method of Rational Equivalence):

This method is also known as "Kuder-Richardson reliability" or "inter-item consistency". It is based on a single administration and on the consistency of responses to all the items.

In this method it is assumed that all items have the same or equal difficulty value, that the correlations between the items are equal, that all the items measure essentially the same ability, and that the test is homogeneous in nature.

This type of reliability refers to the degree to which each item on a test is measuring
the same thing as each other. This inter-item consistency is known as internal
consistency. (Like split-half method this method also provides a measure of internal
consistency).

A test is internally consistent or homogeneous if an individual's responses to, or performance on, one item are related to his/her responses to all of the other items in the test.

(The most common way of finding inter-item consistency is through the formula developed by Kuder and Richardson (1937). This method enables one to compute the inter-correlations of the items of the test and the correlation of each item with all the items of the test. L. J. Cronbach called it the coefficient of internal consistency.)
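For dichotomous (0/1) items, the Kuder-Richardson formula 20 is KR-20 = (k / (k - 1)) × (1 - Σpq / σ²), where k is the number of items, p and q are the proportions passing and failing each item, and σ² is the variance of total scores. A sketch in Python, using a simulated response matrix purely for illustration:

import numpy as np

# Simulated 0/1 responses: 100 examinees x 12 dichotomous items,
# driven by a latent ability so that the items hang together.
rng = np.random.default_rng(1)
ability = rng.normal(size=100)
responses = (ability[:, None] + rng.normal(scale=0.8, size=(100, 12)) > 0).astype(int)

k = responses.shape[1]                         # number of items
p = responses.mean(axis=0)                     # proportion passing each item
q = 1 - p                                      # proportion failing each item
total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 internal consistency = {kr20:.2f}")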

Internal consistency example

A group of respondents are presented with a set of statements designed to measure optimistic and pessimistic mindsets. They must rate their agreement with each statement on a scale from 1 to 5. If the test is internally consistent, an optimistic respondent should generally give high ratings to optimism indicators and low ratings to pessimism indicators. The correlation is calculated between all the responses to the "optimistic" statements, but the correlation is very weak. This suggests that the test has low internal consistency.

Advantages:
• This coefficient provides some indications of how internally consistent or
homogeneous the items of the tests are.
• Split-half method simply measures the equivalence but rational equivalence
method measures both equivalence and homogeneity.
• Economical method as the test is administered once.
• It requires neither the administration of two equivalent forms of the test nor splitting the test into two equal halves.

Limitations:
• The coefficient obtained by this method is generally somewhat lower than the coefficients obtained by other methods.
• If the items of the test are not highly homogeneous, this method will yield a lower reliability coefficient.
• The Kuder-Richardson and split-half methods are not appropriate for speed tests.
Matching situations to types of reliability:
• Measuring a property that you expect to stay the same over time: test-retest.
• Multiple researchers making observations or ratings about the same topic: inter-rater.
• Using two different tests to measure the same thing: parallel forms.
• Using a multi-item test where all the items are intended to measure the same variable: internal consistency.

2.3: Standard error of measurement

The standard error of measurement is one of the core concepts in psychometrics. One of the primary assumptions of any assessment is that it is accurately and consistently measuring whatever it is we want to measure. We therefore need to demonstrate that it is doing so. There are a number of ways of quantifying this, and one of the most common is the SEM.

SEM is an index of the reliability of an assessment instrument, representing the variation of an individual's scores across multiple administrations of the same test. The larger the standard error of measurement, the greater the score variation across administrations.

The standard error of measurement provides an indication of how confident one may
be that an individual’s obtained score on any given measurement test represents his
or her true score.

A standard error of measurement, often denoted SEM, is a measure of how much measured test scores are spread around a "true" score for an individual when repeated measures are taken.

The standard error of measurement (SEm) is the standard deviation of the error of measurement in a test or experiment. It is closely associated with the error variance, which indicates the amount of variability in a test administered to a group that is caused by measurement error. The standard error of measurement is used to determine the effect of measurement error on individual results in a test and is a common tool in psychological research and standardized academic testing.

The standard error of measurement is a function of both the standard deviation of observed scores and the reliability of the test. When the test is perfectly reliable, the standard error of measurement equals 0. When the test is completely unreliable, the standard error of measurement is at its maximum, equal to the standard deviation of the observed scores.

The standard error of measurement serves in a complementary role to the reliability coefficient. Reliability can be understood as the degree to which a test is consistent, repeatable, and dependable. The reliability coefficient ranges from 0 to 1: when a test is perfectly reliable, all observed score variance is caused by true score variance, whereas when a test is completely unreliable, all observed score variance is a result of error. Although the reliability coefficient provides important information about the amount of error in a test measured in a group or population, it does not inform on the error present in an individual test score.

The Pearson product-moment coefficient of reliability is commonly used for the calculation of the standard error of measurement.

The standard error of measurement is a statistic that indicates the variability of the
errors of measurement by estimating the average number of points by which
observed scores are away from true scores. To understand standard error of
measurement, an introduction of basic concepts in reliability theory is necessary.
Therefore, this entry first examines true scores and error of measurement and
variance components. Next, this entry discusses the standard error of measurement
and its uses. Last, this entry comments on methods for reducing measurement error.

A standard error of measurement, often denoted SEm, estimates the variation around a "true" score for an individual when repeated measures are taken.

It is calculated as:

SEm = s√(1 - R)

where:

• s: the standard deviation of measurements
• R: the reliability coefficient of the test

Note that a reliability coefficient ranges from 0 to 1 and is calculated by administering a test to many individuals twice and calculating the correlation between their test scores.

The higher the reliability coefficient, the more often a test produces consistent
scores.
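A minimal worked example of the formula above in Python, with hypothetical values for s and R:

import math

s = 15.0   # hypothetical standard deviation of observed scores
R = 0.91   # hypothetical reliability coefficient of the test

sem = s * math.sqrt(1 - R)
print(f"SEm = {sem:.2f} score points")   # 15 x sqrt(0.09) = 4.50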

The standard error of measurement (SEm) estimates how repeated measures of a person on the same instrument tend to be distributed around his or her "true" score. The true score is always an unknown because no measure can be constructed that provides a perfect reflection of the true score. SEm is directly related to the reliability of a test; that is, the larger the SEm, the lower the reliability of the test and the less precision there is in the measures taken and scores obtained. Since all measurement contains some error, it is highly unlikely that any test will yield the same scores for a given person each time they are retested.
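In practice, the SEm is often used to put a confidence band around an observed score; the conventional 95% band is observed ± 1.96 × SEm. A sketch with hypothetical numbers:

# Hypothetical observed score and SEm (e.g. from the calculation above).
observed_score = 110
sem = 4.5

low = observed_score - 1.96 * sem
high = observed_score + 1.96 * sem
print(f"95% band for the true score: {low:.1f} to {high:.1f}")   # 101.2 to 118.8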

Reliability & Standard Error of Measurement

There exists a simple relationship between the reliability coefficient of a test and the
standard error of measurement:
• The higher the reliability coefficient, the lower the standard error of
measurement.
• The lower the reliability coefficient, the higher the standard error of
measurement.

The term standard error of measurement indicates the spread of measurement errors when estimating an examinee's true score from the observed score. The standard error of measurement is most frequently useful in test reliability. An observed score is an examinee's obtained score, or raw score, on a particular test. A true score would be determined if this particular test were given to a group of examinees 1,000 times, under identical conditions; the average of those observed scores would yield the best estimate of the examinees' true abilities. The standard deviation of those repeated scores around each examinee's average, across persons and administrations, gives the standard error of measurement. Observed score and true score can be used together to determine the amount of error:

Score observed = Score true + Score error.

2.4: Reliability- Influencing factors and improvement techniques

Reliability is the degree to which an assessment tool produces stable and consistent results. Some intrinsic and some extrinsic factors have been identified that affect the reliability of test scores.

(A) Intrinsic Factors: The principal intrinsic factors (i.e. those factors which lie
within the test itself) which affect the reliability are:

(i) Length of the Test: One of the major factors that affect reliability is the length of
the test. A longer test provides a more adequate sample of behavior being measured
and is less disturbed by chance factors like guessing.

Reliability has a definite relation with the length of the test. The more items the test contains, the greater will be its reliability, and vice-versa. Logically, the larger the sample of items we take of a given area of knowledge, skill and the like, the more reliable the test will be.

However, it is difficult to fix the maximum length of the test that ensures an appropriate value of reliability; the length of the test should not give rise to fatigue effects in the testees. Within that limit, it is advisable to use longer tests rather than shorter tests, since shorter tests are less reliable.

The number of times (n) a test should be lengthened to get a desirable level of reliability is given by the Spearman-Brown formula:

n = r(desired) × (1 - r(given)) / [r(given) × (1 - r(desired))]

Example: When a test has a reliability of 0.8, the number of times the test has to be lengthened to get a reliability of 0.95 is estimated in the following way:

n = 0.95 × (1 - 0.8) / [0.8 × (1 - 0.95)] = 0.19 / 0.04 = 4.75

Hence the test is to be lengthened 4.75 times. However, while lengthening the test one should see that the items added to increase its length must satisfy conditions such as an equal range of difficulty, the desired discriminating power and comparability with the other test items.
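The same calculation in Python, using the hypothetical figures from the example above:

# Spearman-Brown lengthening: how many times must the test be lengthened
# to move from the given reliability to the desired reliability?
r_given = 0.80
r_desired = 0.95

n = (r_desired * (1 - r_given)) / (r_given * (1 - r_desired))
print(f"Lengthen the test {n:.2f} times")   # 0.19 / 0.04 = 4.75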

(ii) Homogeneity of Items: Homogeneity of items has two aspects: item reliability
and the homogeneity of traits measured from one item to another. If the items
measure different functions and the inter-correlations of items are ‘zero’ or near to it,
then the reliability is ‘zero’ or very low and vice-versa.

(iii) Difficulty Value of Items: The difficulty level and clarity of expression of a test item also affect the reliability of test scores. If the test items are too easy or too difficult for the group members, the test will tend to produce scores of low reliability, because both kinds of test have a restricted spread of scores.

(iv) Test instructions: Clear and concise instructions increase reliability. Complicated and ambiguous directions give rise to difficulties in understanding the questions and the nature of the response expected from the testee, ultimately leading to low reliability.

(v) Reliability of the scorer: The reliability of the scorer also influences the reliability of the test. If he is moody or erratic, the scores will vary from one situation to another. Mistakes by the scorer give rise to mistakes in the score and thus lead to low reliability.

(vi) Objectivity: The biases, opinions or judgments of the person who checks the test should be eliminated. Socio-political beliefs should be set aside when checking the test.

(B) Extrinsic Factors: The important extrinsic factors (i.e. the factors which remain
outside the test itself) influencing the reliability are:

(i) Group variability: When the group of pupils being tested is homogeneous in
ability, the reliability of the test scores is likely to be lowered and vice-versa.

(ii) Guessing and chance errors: Guessing in a test gives rise to increased error variance and as such reduces reliability. For example, with two-alternative response options there is a 50% chance of answering an item correctly by guessing.

(iii) Environmental conditions: As far as practicable, the testing environment should be uniform. Arrangements should be such that light, sound and other comforts are equal for all testees; otherwise the reliability of the test scores will be affected.

(iv) Momentary fluctuations: Momentary fluctuations may raise or lower the reliability of the test scores. A broken pencil, momentary distraction by the sudden sound of a train running outside, anxiety regarding non-completion of homework, or making a mistake in an answer and having no way to change it are factors which may affect the reliability of test scores.

(v) Heterogeneity of the students' group: Reliability is higher when test scores are spread out over a range of abilities, i.e. when the test-takers represent a variety of intellectual levels and skills.
(vi) Limited time: A test taken under a fixed, limited time is more reliable than one conducted with a generous time allowance, partly because limited time reduces the chance that a student might cheat.

Reliability Improving Techniques

There are also several things that psychologists can do to improve reliability. Where observer scores do not significantly correlate, reliability can be improved by:

• Training observers in the observation techniques being used and making sure everyone agrees with them.
• Ensuring behavior categories have been operationalized, that is, objectively defined.
• Increasing the sample size.
• Controlling the testing conditions.
• Running a reliability analysis on the measurement tool.
