Final Notes Assessment of Learning 1 and 2
What is a measurement?
● A process of quantifying the degree to which someone possesses a given trait, i.e.,
a quality, characteristic, or feature
● Assigning of numbers to performance, product, skill or behavior of a student, based
on a pre-determined procedure or set of criteria
● Assigning of numbers to the results of a test or other type of assessment
What is assessment?
Assessment as a product
Assessment as a process
What is evaluation?
● A process of making judgments about the quality of a performance, product, skill or
behavior of a student. It includes using some basis to judge worth or value. It
involves judgment about the desirability of changes in the students.
1. Summative Assessment
✔ This is done after instruction
✔ It is used to certify what students know and can do and their level of
proficiency or competence
✔ Its results reveal whether or not instruction has successfully achieved the
curriculum outcomes
✔ The information from assessment of learning is usually expressed as marks
or letter grades
✔ The results are communicated to the students, parents and other
stakeholders for decision making
✔ It is also a powerful factor that could pave the way for educational
reforms
Assessment AS Learning
✔ This is done for teachers to understand and perform well their role of
assessing FOR and OF learning
✔ It requires teachers to undergo training on how to assess learning and be
equipped with the following competencies needed in performing their work as
assessors
TRADITIONAL
Description: The objective paper-and-pen test, which usually assesses low-level thinking skills
Examples:
● Standardized test
● Teacher-made tests
Advantages:
● Scoring is objective
● Administration is easy because students can take the test at the same time
Disadvantages:
● Preparation of the instrument is time consuming
● Prone to cheating
PERFORMANCE
Description: A mode of assessment that requires actual demonstration of skills or creation of products of learning
Examples:
● Practical test
● Oral and aural test
● Projects
Advantages:
● Preparation of the instrument is relatively easy
● Measures behaviors that cannot be deceived
Disadvantages:
● Scoring tends to be subjective without rubrics
● Administration is time consuming
PORTFOLIO
Description: A process of gathering multiple indicators of student progress to support course goals in a dynamic, ongoing and collaborative process
Examples:
● Working portfolios
● Show portfolios
● Documentary portfolios
Advantages:
● Measures student’s growth and development
● Intelligence fair
Disadvantages:
● Development is time consuming
● Rating tends to be subjective without rubrics
Traditional            Authentic
Contrived              Real-life
Teacher-structured     Student-structured
1. Authenticity- the task is similar to what the students might encounter in the real world
as opposed to encountering only in the school
2. Feasibility- the task is realistically implementable in relation to its cost, space, time
and equipment requirements
3. Generalizability- the likelihood that the student’s performance on the task will
generalize to comparable tasks
4. Fairness- the task is fair to all students regardless of their social status or gender
5. Teachability- the task allows one to master the skill that one should be proficient in
6. Multi Foci- the task measures multiple instructional outcomes
7. Scorability- the task can be reliably and accurately evaluated
PORTFOLIO ASSESSMENT
Portfolio assessment is also an alternative to the pen-and-paper objective test. It is a purposeful, ongoing,
dynamic, and collaborative process of gathering multiple indicators of the learner’s growth
and development. It is also performance-based.
Rubric is a measuring instrument used in rating performance-based tasks. It is the key to
correction for assessment tasks designed to measure the attainment of learning
competencies that require demonstration of skills or creation of products of learning. It
offers a set of guidelines or descriptions for scoring different levels of performance or
qualities of products of learning. It can be used in scoring both the process and the products of
learning.
1. Checklist
● Presents the observed characteristics of a desirable performance or
product
● The rater checks the trait/s that has/ have been observed in one’s
performance or product
2. Rating Scale
● Measures the extent or degree to which a trait has been satisfied by one’s
work or performance
● Offers overall description of the different levels of quality of a work or
performance
● Uses 3 or more levels to describe the work or performance, although the
most common rating scales have 4 or 5 performance levels
3. Rubric
● Shows the observed traits of a work/performance
1. HOLISTIC RUBRICS
Advantages
Disadvantages:
● It does not clearly describe the degree to which a criterion is satisfied or not by the
performance or the product
● It does not permit differential weighting of the qualities of a product/
performance
2. ANALYTIC RUBRIC
Advantages:
Disadvantages:
inventories
Assessment Methods
Projective Tests
✔ A psychological test that uses images in order to evoke responses from a subject
and reveal hidden aspects of the subject’s mental life
✔ These were developed in an attempt to eliminate some of the major problems
inherent in the use of self-report measures, such as the tendency of some
respondents to give “socially desirable” responses
1. Word Association Test- an individual is given a clue or hint and asked to respond to
the first thing that comes to mind.
2. Completion Test- the respondents are asked to complete an incomplete
sentence or story. The completion will reflect their attitude and state of mind.
3. Construction Techniques- these are more or less like the completion test. The respondent is given
a picture and asked to write a story about it. The initial structure is limited
and not detailed like the completion test.
4. Expression Techniques- the respondents are asked to express the feeling or
attitude of other people.
1. The group to be tested is small and the test is not to be used again;
2. You wish to encourage and reward the development of student’s skill in
writing;
3. You are more interested in exploring the student’s attitudes than in
measuring his/her academic achievement;
4. You are more confident of your ability as a critical and fair reader than as an
imaginative writer of good objective test items
It consists of:
Ex:
a. Evaporation of alcohol
b. Freezing of water
c. Burning of oil
d. Melting of wax
Advantage of using Multiple Choice Items
The STEM
1. When possible, state the stem as a direct question rather than as an incomplete
statement
a. Poor- alloys are ordinarily produced by…
b. Better- how are alloys ordinarily produced?
2. Present a definite, explicit singular question or problem in the stem
a. Poor- Psychology …
b. Better- the science of mind and behavior is called…
3. Eliminate excessive verbiage or irrelevant information from the stem.
a. Poor- While ironing her formal polo shirt, June burned her hand accidentally
on the hot iron. This was due to a heat transfer because…
b. Better- which of the following ways of heat transfer explains why June’s hand
was burned after she touched the hot iron?
4. Include in the stem any word/s that might otherwise be repeated in each alternative
a. Poor- In national elections in the US, the president is officially
i. Chosen by the people
ii. Chosen by the electoral college
iii. Chosen by the Members of Congress
iv. Chosen by the House of Representatives
b. Better- In national elections in the US, the President is officially chosen by
i. The electoral college
ii. Members of the congress
iii. House of Representatives
5. Use negatively stated questions sparingly. When used, underline and/or capitalize the
negative word.
a. Poor- Which of the following is not cited as an accomplishment of the Arroyo
administration?
b. Better- Which of the following is NOT cited as an accomplishment of the Arroyo
administration?
6. Make all alternatives plausible and attractive to the less knowledgeable or skillful
student.
a. What process is most nearly opposite of photosynthesis?
i. Poor-
1. Digestion
2. Relaxation
3. Respiration
4. Exertion
ii. Better
1. Digestion
2. Assimilation
3. Respiration
4. Catabolism
7. Make the alternatives grammatically parallel with each other and consistent with the
stem
a. Poor- What would advance the application of atomic discoveries to medicine?
a. Standardized techniques for treatment of patients
b. Train the average doctor to apply the radioactive
treatment
c. Remove restriction on the use of radioactive substances
d. Establishing hospital staffed by highly trained
radioactive therapy specialists.
b. Better- What would advance the application of atomic discoveries to
medicine?
a. Development of standardized techniques for treatment
of patients
b. Removal of restriction on the use of radioactive
substances
c. Addition of trained radioactive therapy specialists to
hospital staffs
d. Training the average doctor in application of radioactive
treatments
8. Make the alternatives mutually exclusive
a. Poor- the daily minimum required amount of milk that a 10-year old child
should drink is
a. 1-2 glasses
b. 2-3 glasses
c. 3-4 glasses
d. At least 4 glasses
b. Better- What is the daily minimum required amount of milk a 10-year old
should drink?
a. 1 glass
b. 2 glasses
c. 3 glasses
d. 4 glasses
9. When possible, present the alternatives in some logical order
a. At 7am, two trucks leave a diner and travel north. One truck averages 42
miles per hour and the other averages 38 miles per hour. At what time will they be
24 miles apart?
i. Undesirable
a. 6pm
b. 9am
c. 1am
d. 1pm
e. 6am
ii. Desirable
a. 1am
b. 6am
c. 9am
d. 1pm
e. 6pm
10. Be sure there is only one correct or best response to the item
a. Poor- The two most desired characteristics in a classroom test are validity
and
a. Precision
b. Reliability
c. Objectivity
d. Consistency
b. Better- The two most desired characteristics in a classroom test are validity and
a. Precision
b. Reliability
c. Objectivity
d. Standardization
11. Make the alternatives approximately equal in length
a. Poor- the most general cause of low individual incomes in the US is
i. Lack of valuable productive services to sell
ii. Unwillingness to work
iii. Automation
iv. Inflation
b. Better- What is the most general cause of low individual incomes in the US?
i. A lack of valuable production services to sell
ii. The population’s overall unwillingness to work
iii. The nation’s increased reliance on automation
iv. An increasing national level of inflation
12. Avoid irrelevant clues, such as grammatical structure, well-known verbal
associations or connections between stem and answer
a. Poor- (grammatical clue) A chain of islands is called an
i. Archipelago
ii. Peninsula
iii. Continent
iv. Isthmus
b. Poor- (verbal association) the reliability of test can be estimated by a
coefficient of
i. Measurement
ii. Correlation
iii. Testing
iv. Error
c. Poor- (connections between stem and answer) The height to which a water
dam is built depends on
i. The length of the reservoir behind the dam
ii. The volume of the water behind the dam
iii. The height of water behind the dam
iv. The strength of the reinforcing wall
13. Use at least four alternatives for each item to lower the probability of getting the item
correct by guessing
14. Randomly distribute the correct responses among the alternative positions
throughout the test, with approximately the same proportion of alternatives a, b,
c, d and e as the correct response
15. Use alternative NONE OF THE ABOVE and ALL OF THE ABOVE sparingly. When
used, such alternatives should occasionally be used as the correct response.
True-false test items are typically used to measure the ability to identify whether or
not statements are correct. The basic format is simply a declarative statement that the
student must judge as true or false. Modifications of the basic form have the student
respond “yes” or “no”, “agree” or “disagree”.
Three Forms:
Examples:
It provides:
1. Base true-false items upon statements that are absolutely true or false, without
qualifications or exceptions
a. Poor- nearsightedness is hereditary in origin
b. Better- Geneticists and eye specialists believe that the predisposition to
nearsightedness is hereditary
2. Express the item statement as simply and as clearly as possible
3. Express a single idea in each test item
4. Include enough background information and qualifications so that the ability to
respond correctly to the item does not depend on some special, uncommon
knowledge
5. Avoid lifting statements directly from the text, lecture or other materials so that
memory alone will not permit a correct answer
6. Avoid using negatively stated item statements
7. Avoid the use of unfamiliar vocabulary
8. Avoid the use of specific determiners, which could permit a test-wise but unprepared
examinee to respond correctly. Specific determiners refer to sweeping terms like
always, all, none, never, impossible, inevitable. Statements including such terms are
likely to be false. On the other hand, statements using qualifying determiners such
as usually, sometimes, often, are likely to be true. When statements require specific
determiners, make sure they appear in both true and false items.
9. False items tend to discriminate more highly than true items. Therefore, use more
false items than true items (but not more than 15% additional false items)
In general, matching items consist of a column of stimuli presented on the left side
of the exam page and a column of responses placed on the right side of the page. Students
are required to match the response associated with a given stimulus.
1. Require short period of reading and response time, allowing the teacher to cover
more content
2. Provide objective measurement of student achievement or ability
3. Provide highly reliable test scores
4. Provide scoring efficiency and accuracy
1. Have difficulty measuring learning objectives beyond simple recall of information
2. Are difficult to construct due to the problem of selecting a common set of stimuli and
responses
1. Include directions which clearly state the basis for matching the stimuli with the
responses. Explain whether or not the response can be used more than once and
indicate where to write the answer
2. Use only homogeneous material in matching items
3. Arrange the list of responses in some systematic order if possible- chronological,
alphabetical.
4. Avoid grammatical or other clues to the correct response
NOTE:
Completion items:
A classroom essay test consists of a small number of questions to which the student
is expected to demonstrate his/her ability to:
Identify research methods used to study the (S-R) and (S-O-R) theories of
personality (10 pts.)
Essay items:
1. Are easier and less time consuming to construct than most item types;
2. Provide a means for testing student’s ability to compose an answer and present it in
a logical manner; and
3. Can efficiently measure higher order cognitive objectives- analysis, synthesis,
evaluation
Essay items:
1. Prepare essay items that elicit the type of behavior you want to measure
2. Phrase each item so that the student’s task is clearly indicated
3. Indicate for each item a point or weight and an estimated time limit for answering
4. Ask questions that will elicit responses on which experts could agree that one
answer is better than another
5. Avoid giving a student a choice among optional items as this greatly reduces the
reliability of the test
6. It is generally recommended for classroom examinations to administer several short-
answer items rather than only one or two extended response items
PRINCIPLE 3: BALANCED
PRINCIPLE 4: VALIDITY
1. Unclear directions- directions that do not clearly indicate to the students how to
respond to the task and how to record responses tend to reduce validity
2. Reading vocabulary and sentence structure too difficult- vocabulary and
sentence structure that are too complicated for the student result in the
assessment of reading comprehension, thus altering the meaning of the assessment
result
3. Ambiguity- ambiguous statements in assessments tasks contribute to
misinterpretations and confusion. Ambiguity sometimes confuses the better
students more than it does the poor students
4. Inadequate time limits- time limits that do not provide students with enough time
to consider the tasks and provide thoughtful responses can reduce the validity of
interpretations of results
5. Overemphasis of easy-to-assess aspects of the domain at the expense of important,
but hard-to-assess aspects. It is easy to develop test questions that assess
factual recall and generally harder to develop ones that tap conceptual
understanding or higher-order thinking processes such as the evaluation of
competing positions or arguments. Hence, it is important to guard against
underrepresentation of tasks targeting the important, but more difficult to assess, aspects
of achievement
6. Test items inappropriate for the outcomes being measured- attempting to
measure understanding, thinking skills, and other complex types of achievements
with test forms that are appropriate for only measuring factual knowledge will
invalidate the results
7. Poorly constructed test items- test items that unintentionally provide clues to the
answer tend to measure the student’s alertness in detecting clues as well as
mastery of skills or knowledge the test is intended to measure
8. Test too short- if a test is too short to provide a representative sample of the
performance we are interested in, its validity will suffer accordingly
9. Improper arrangement of items- test items are typically arranged in order of
difficulty, with the easiest items first. Placing difficult items first in the test may
cause students to spend too much time on these and prevent them from reaching
items they could easily answer. Improper arrangement may also influence
validity by having a detrimental effect on student motivation.
10. Identifiable pattern of answer- placing correct answers in some systematic
pattern enables students to guess the answers to some items more easily and
this lowers validity.
PRINCIPLE 5: RELIABILITY
Reliability- it refers to the consistency of scores obtained by the same person when
retested using the same instrument or its parallel form, or when compared with other students who
took the same test
1. Test length. In general, a longer test is more reliable than a shorter one because
a longer test samples the instructional objectives more adequately
2. Spread of scores. The type of students taking the test can influence reliability. A
group of students with heterogeneous ability will produce a larger spread of test scores than
a group with homogeneous ability.
3. Item difficulty. In general, tests composed of items of moderate or average difficulty
(.30 to .70) will have more influence on reliability than those composed primarily of
easy or very difficult items.
4. Item discrimination. In general, tests composed of more discriminating items will have
greater reliability than those composed of less discriminating items
5. Time limits. Adding a time factor may improve the reliability of lower-level cognitive test
items. Since all students do not function at the same pace, a time factor adds
another criterion to the test that causes discrimination, thus improving reliability.
Teachers should not, however, arbitrarily impose a time limit. For higher level
cognitive test items, the imposition of time may defeat the intended purpose of the
items.
Item Analysis
● Refers to the process of examining the student’s response to each item in the
test
● There are two characteristics of an item: desirable and undesirable
characteristics
● Desirable items- retained for subsequent use
1. Difficulty of an item
2. Discriminating power of an item
3. Measures of attractiveness
Difficulty index = number of students who answered item X correctly / total number of
students who answered item X
0.21-0.40 Difficult
0.61-0.80 Easy
Problem 1: there are 50 students who answered item X, 30 of whom answered the item
correctly. What is the difficulty index?
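Problem 1 can be checked with a short Python sketch. The function name difficulty_index is only an illustrative assumption; it applies the difficulty index formula given above and nothing more.

```python
def difficulty_index(num_correct: int, num_answered: int) -> float:
    """Proportion of the students who answered the item that got it right."""
    if num_answered <= 0:
        raise ValueError("num_answered must be a positive number")
    return num_correct / num_answered


# Problem 1: 50 students answered item X, 30 of them correctly.
p = difficulty_index(30, 50)
print(p)  # 0.6
```

The result, 0.60, means that 60% of the students who answered item X got it right.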
QUESTION A B C D
1 0 3 24* 3
2 12* 13 3 2
Discrimination index
Formula: Difficulty index of upper group (Pu) - Difficulty index of lower group (Pl)
● Positive discrimination- if the proportion of students who got an item right in the
upper group is GREATER THAN THE LOWER GROUP
● Negative discrimination- if the proportion of students who got an item right in the
lower group is GREATER THAN THE UPPER GROUP.
● Zero discrimination- if the proportion of students who got the item right in the upper
performing group and low performing group are EQUAL.
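The formula and the three cases above can be sketched in Python. The function names and the group counts in the example are hypothetical and are used only to illustrate the computation.

```python
def discrimination_index(upper_correct: int, upper_n: int,
                         lower_correct: int, lower_n: int) -> float:
    """Pu - Pl: difficulty index of the upper group minus that of the lower group."""
    pu = upper_correct / upper_n
    pl = lower_correct / lower_n
    return pu - pl


def kind_of_discrimination(d: float) -> str:
    """Label the index as positive, negative, or zero discrimination."""
    if d > 0:
        return "positive"
    if d < 0:
        return "negative"
    return "zero"


# Hypothetical item: 12 of 15 upper-group students and 6 of 15
# lower-group students answered correctly.
d = discrimination_index(12, 15, 6, 15)
print(round(d, 2), kind_of_discrimination(d))
```

With these made-up counts, Pu = 0.80 and Pl = 0.40, so the item discriminates positively.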
✔ It tends to lie within the center if it is arranged from lowest to highest and vice versa.
MEAN
● Measures stability
● When it is desired to give each score equal weight in determining the central
tendency
● When it is desired to find the measure of central tendency which has the highest
reliability
● When it is desired to compute the standard deviation and the coefficient of
correlation later on
formula:
mean = ΣX / N (the sum of all scores divided by the number of scores)
MEDIAN
● refers to the centermost score when the scores in the distribution are arranged
according to magnitude (from highest to lowest score or from lowest to highest
score)
● used when the middlemost score is desired
● when there are extreme scores, such as a few very high scores or a few low scores,
which could affect the mean disproportionately
MODE
● Refers to the score/s that occurs most frequently in the score distribution
Types of mode
25 25
24 24
24 24
20 20
20 18
20 18
16 17
12 10
10 9
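As a rough check, the three measures of central tendency can be computed with Python's standard statistics module. The sketch assumes that the two columns above are two separate score distributions; the variable names first_set and second_set are illustrative only.

```python
from statistics import mean, median, multimode

# Assumption: each column of the listing above is one score distribution.
first_set = [25, 24, 24, 20, 20, 20, 16, 12, 10]
second_set = [25, 24, 24, 20, 18, 18, 17, 10, 9]

for label, scores in (("first set", first_set), ("second set", second_set)):
    print(label,
          "mean =", round(mean(scores), 2),
          "median =", median(scores),
          "mode(s) =", multimode(scores))
```

Under that assumption, the first set has a single mode (20) while the second set has two modes (24 and 18).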
Measures of Variability
● It is a single value that is used to describe the spread of scores in a distribution, that
is above or below the measures of central tendency
1. Range
2. Quartile deviation
3. Standard deviation
RANGE
It is the difference between the highest score and the lowest score in the data set
Formula: R=HS- LS
Problem:
Below are the scores of 10 students in Mathematics and Science. Find the range for each
subject and determine which subject has the greater variability.
MATHEMATICS SCIENCE
35 35
33 40
45 25
55 47
62 55
34 35
54 45
36 57
47 39
40 52
MATHEMATICS
HS= 62
LS=33
R=HS-LS
R=62-33
R=29
SCIENCE
HS= 57
LS=25
R=HS-LS
R=57-25
R=32
● The scores in science are more scattered than the scores in mathematics
Interpretation of Result:
● Large range- dispersed, scattered, spread apart, far from each other,
heterogeneous, scores are more varied
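The range computation above can be reproduced with a few lines of Python. The helper name score_range is an illustrative assumption; the scores are the ones listed in the problem.

```python
math_scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]


def score_range(scores):
    """R = HS - LS: highest score minus lowest score."""
    return max(scores) - min(scores)


print("Mathematics R =", score_range(math_scores))  # 29
print("Science R =", score_range(science_scores))   # 32
```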
QUARTILE DEVIATION
● It is half of the difference between the THIRD QUARTILE (Q3) AND THE FIRST
QUARTILE (Q1)
● It is based on the range of the middle 50% of the scores, instead of the range of the
entire distribution
STANDARD DEVIATION
● It is an average of the degree to which each score in the distribution deviates
from the mean value
● It is a more stable measure of variation because it involves all the scores in a
distribution, compared to range and quartile deviation
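A minimal sketch of the quartile deviation and the standard deviation, using the Mathematics and Science scores from the range problem. Two assumptions are made here that the notes do not state: the quartiles come from the default method of statistics.quantiles (textbooks differ on how quartiles are located), and the population form of the standard deviation (pstdev) is used.

```python
from statistics import pstdev, quantiles

math_scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]


def quartile_deviation(scores):
    """Half of the difference between Q3 and Q1."""
    q1, _q2, q3 = quantiles(scores, n=4)  # default quartile method (assumption)
    return (q3 - q1) / 2


for label, scores in (("Mathematics", math_scores), ("Science", science_scores)):
    print(label,
          "QD =", round(quartile_deviation(scores), 2),
          "SD =", round(pstdev(scores), 2))
```

On these scores the population standard deviation is about 9.68 for Mathematics and about 9.58 for Science, so the two subjects are closer in spread than their ranges alone suggest. This is because the range uses only the two extreme scores, while the standard deviation uses every score.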
MEASURES OF SKEWNESS
● Describes the degree of departure of the scores from symmetry
3 classifications
Positively skewed
● Is a distribution where the thin end tail of the graph goes to the right part of
the curve
● This happens when most of the scores of the students are below the mean
● It tells you only about the poor performance of test takers but not the reasons why
students did poorly in the said examination
● Possible reasons for poor performance: 1. Ineffective teaching method and instruction;
2. Student’s unpreparedness; 3. Test items very difficult; 4. Not enough time
to answer test items
Negatively Skewed
● Is a distribution where the thin end tail of the graph goes to the left part of the
curve
● This happens when most of the scores of the students are above the mean
● It tells you only about the good performance of test takers but not the reasons why
students did well in the said examination
● Possible reasons for high scores: 1. Students are smart; 2. Enough time to
finish examination; 3. Test items very easy; 4. Effective instruction; 5.
Students have prepared for the examination
Normal distribution
● Is a special kind of symmetric distribution
● Can be determined using the values of the mean and standard deviation
● The end tails of the curve can be extended indefinitely on both sides
● The shape of the curve will depend on the value of the mean and standard deviation
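One rough way to tell the three classifications apart is to compare the mean with the median. The sketch below uses Pearson's median skewness coefficient, 3(mean - median) / standard deviation, which is a technique brought in for illustration and is not given in these notes. The cut-off value of 0.1 and both score lists are hypothetical.

```python
from statistics import mean, median, pstdev


def median_skewness(scores):
    """Pearson's median skewness: 3 * (mean - median) / standard deviation."""
    sd = pstdev(scores)
    if sd == 0:
        return 0.0  # all scores are equal, so there is no skew
    return 3 * (mean(scores) - median(scores)) / sd


def classify(scores, cutoff=0.1):
    """Label a distribution; the cutoff is an arbitrary illustrative choice."""
    sk = median_skewness(scores)
    if sk > cutoff:
        return "positively skewed"
    if sk < -cutoff:
        return "negatively skewed"
    return "approximately symmetric"


# Hypothetical score sets: most scores low with one very high score,
# and most scores high with one very low score.
hard_test = [10, 12, 12, 13, 14, 15, 40]
easy_test = [5, 30, 31, 32, 33, 33, 34]

print(classify(hard_test))
print(classify(easy_test))
```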
1. Percentile Rank
● This is the percentage of the examinees in the norm group who
scored below the score of interest
● It is used to clarify the interpretation of scores on standardized tests
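The idea of the percentage of examinees who scored below a score of interest can be sketched as follows. The function name percentile_rank is an illustrative assumption, and the norm group simply reuses the ten Mathematics scores from the range problem.

```python
def percentile_rank(score, norm_group):
    """Percentage of examinees in the norm group who scored below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


# Norm group: the ten Mathematics scores from the range problem.
norm_group = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
print(percentile_rank(47, norm_group))  # 60.0
```

Six of the ten scores are below 47, so a score of 47 is higher than 60% of the scores in this group.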
2. Z- score
● The number of standard deviation units a score is above or below the mean
of a given distribution
● A positive z-score measures the number of standard deviations a score is
above the mean
● A negative z-score gives the number of standard deviations a score is below
the mean
● Formula: Z = (X - Mean) / Standard Deviation
3. T-score
● It tells the location of a score in a normal distribution having a mean of 50 and
a standard deviation of 10
● Formula: T= 10z + 50
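The z-score and T-score formulas can be tried out with a short Python sketch. The raw score, mean, and standard deviation in the example are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def t_score(z: float) -> float:
    """Rescale a z-score to a distribution with mean 50 and standard deviation 10."""
    return 10 * z + 50


# Hypothetical example: raw score 85, class mean 75, standard deviation 5.
z = z_score(85, 75, 5)
print(z, t_score(z))  # 2.0 70.0
```

A score two standard deviations above the mean therefore has z = 2.0 and T = 70.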
Stanines 1 2 3 4 5 6 7 8 9
Z-score = 1.9
● Tells the location of a raw score in a specific segment in a normal distribution which
is divided into 9 segments, numbered from a low of 1 through a high of 9
● Scores falling within the boundaries of these segments are assigned one of these 9
numbers (standard nine)
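A z-score can be turned into a stanine with a common approximation: stanine = 2z + 5, rounded to the nearest whole number and kept within 1 to 9. This rule is an assumption brought in for illustration and is not stated in these notes; the function name is hypothetical.

```python
import math


def stanine_from_z(z: float) -> int:
    """Approximate stanine: round(2z + 5), limited to the range 1 to 9."""
    rounded = math.floor(2 * z + 5 + 0.5)  # round half up
    return max(1, min(9, rounded))


print(stanine_from_z(1.9))  # 9
```

With this rule a z-score of 1.9 falls in the highest segment (stanine 9), and a z-score of 0 falls in the middle segment (stanine 5).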
MEASURES OF RELATIONSHIP
±1 Perfect correlation
0 No correlation
Pearson r
● It is used when the relationship between the two variables is a linear one
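A minimal sketch of how Pearson r can be computed from paired scores, written out from the usual deviation-score formula rather than taken from these notes. The function name and the small data sets are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r: sum of cross-products of deviations over the root of the
    product of the two sums of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores with a perfectly linear relationship.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 2))
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 2))
```

The first pair of lists rises together in a straight line (perfect positive correlation) and the second falls in a straight line (perfect negative correlation), matching the ±1 entries above.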
Spearman rank order Correlation or Spearman Rho