Final Notes Assessment of Learning 1 and 2

The document provides a comprehensive overview of assessment in education, defining key terms such as tests, measurements, assessments, and evaluations. It outlines different types of assessments, including formative, summative, and diagnostic assessments, as well as various assessment methods and principles for high-quality classroom assessment. Additionally, it discusses the use of rubrics and portfolios in evaluating student performance and emphasizes the importance of clear learning targets and appropriate assessment methods.

Uploaded by

K Ann M Gulayda

CAREPOINT REVIEW CENTER

ASSESSMENT OF LEARNING 1 AND 2

INTRODUCTION TO ASSESSMENT OF LEARNING


To better understand assessment, let us define important items related to it.
What is a test?

● An instrument designed to measure any characteristic, quality, ability, knowledge or skill
● It comprises items in the area it is designed to measure

What is a measurement?

● A process of quantifying the degree to which someone possesses a given trait, i.e., a quality, characteristic, or feature
● Assigning of numbers to performance, product, skill or behavior of a student, based
on a pre-determined procedure or set of criteria
● Assigning of numbers to the results of a test or other type of assessment

● Awarding points for a particular aspect of an essay or performance

What is assessment?

Assessment can be defined both as product and a process.

Assessment as a product

● It refers to the instrument that is designed to elicit a predetermined behavior, unique performance, or a product from a student

Assessment as a process

● Collection, interpretation and use of qualitative and quantitative information to assist teachers in their educational decision-making

Hence, assessment is a pre-requisite to evaluation. It provides the information which enables evaluation to take place.

What is evaluation?
● A process of making judgments about the quality of a performance, product, skill or
behavior of a student. It includes using some basis to judge worth or value. It
involves judgment about the desirability or changes in the students.

ASSESSMENT FOR LEARNING
✔ Includes 3 types of assessment done before and during instruction:
1. Placement
2. Formative
3. Diagnostic

ASSESSMENT OF LEARNING
✔ This is done after instruction.
1. Summative

ASSESSMENT AS LEARNING
✔ This is done for teachers to understand their role of assessing FOR and OF learning.
1. Teachers should undergo training.

ASSESSMENT FOR LEARNING

1. Placement- done prior to instruction
✔ Its purpose is to assess the needs of the learners to have a basis for planning relevant instruction.
✔ The results of this assessment place students in specific learning groups to facilitate teaching and learning.
2. Formative
✔ Done during instruction

✔ It is the assessment where teachers continuously monitor the student’s level of attainment of the learning objectives
✔ The results of this assessment are communicated clearly and promptly to
the students.
3. Diagnostic
✔ Done during instruction to know the strengths and weaknesses of students

✔ This is used to determine student’s recurring or persistent difficulties

✔ It searches for the underlying causes of student’s learning problems that do not respond to first aid treatment
✔ It helps formulate a plan for detailed REMEDIAL INSTRUCTION
Assessment of Learning

1. Summative Assessment
✔ This is done after instruction

✔ It is used to certify what students know and can do and the level of their proficiency or competence
✔ Its results reveal whether or not instruction has successfully achieved the curriculum outcomes
✔ The information from assessment of learning is usually expressed as marks or letter grades
✔ The results are communicated to the students, parents and other stakeholders for decision making
✔ It is also a powerful factor that could pave the way for educational reforms

Assessment AS Learning

✔ This is done for teachers to understand and perform well their role of assessing FOR and OF learning
✔ It requires teachers to undergo training on how to assess learning and to be equipped with the competencies needed in performing their work as assessors

PLACEMENT
● Done before instruction
● Determines the mastery of prerequisite skills
● Not graded

SUMMATIVE
● Done after instruction
● Certifies mastery of the intended learning outcomes
● Graded
● Ex: quarter exams, unit or chapter tests, final exams

FORMATIVE
● Reinforces successful learning
● Provides continuous feedback to both students and teachers concerning learning successes and failures
● Not graded
● Ex: short quizzes, recitation

DIAGNOSTIC
● Determines recurring or persistent difficulties
● Searches for the underlying causes of these problems that do not respond to first aid treatment
● Helps formulate a plan for a detailed remedial instruction

PLACEMENT
● Determines the extent of what pupils have achieved or mastered in the objectives of the intended instruction
● Determines the student’s strengths and weaknesses
● Places the students in specific learning groups to facilitate teaching and learning
● Serves as a pretest for the next unit
● Serves as a basis in planning for relevant instruction

DIAGNOSTIC
● Administered during instruction
● Designed to formulate a plan for remedial instruction
● Modifies the teaching and learning process
● Not graded

TRADITIONAL
● Description: The objective paper-and-pen test which usually assesses low-level thinking skills
● Examples: standardized tests; teacher-made tests
● Advantages: scoring is objective; administration is easy because students can take the test at the same time
● Disadvantages: preparation of the instrument is time consuming; prone to cheating

PERFORMANCE
● Description: A mode of assessment that requires actual demonstration of skills or creation of products of learning
● Examples: practical tests; oral and aural tests; projects
● Advantages: preparation of the instrument is relatively easy; measures behaviors that cannot be deceived
● Disadvantages: scoring tends to be subjective without rubrics; administration is time consuming

PORTFOLIO
● Description: A process of gathering multiple indicators of student progress to support course goals in a dynamic, ongoing and collaborative process
● Examples: working portfolios; show portfolios; documentary portfolios
● Advantages: measures student’s growth and development; intelligence-fair
● Disadvantages: development is time consuming; rating tends to be subjective without rubrics

Traditional and Authentic Assessment compared

Traditional | Authentic

Selecting a response | Performing a task

Contrived | Real-life

Recall/recognition | Construction/application

Teacher-structured | Student-structured

Indirect evidence | Direct evidence


Seven Criteria in selecting a good performance Assessment Task

1. Authenticity- the task is similar to what the students might encounter in the real world
as opposed to encountering only in school
2. Feasibility- the task is realistically implementable in relation to its cost, space, time
and equipment requirements
3. Generalizability- the likelihood that the student’s performance on the task will
generalize to comparable tasks
4. Fairness- the task is fair to all students regardless of their social status or gender
5. Teachability- the task allows one to master the skill that one should be proficient in
6. Multi Foci- the task measures multiple instructional outcomes
7. Scorability- the task can be reliably and accurately evaluated

PORTFOLIO ASSESSMENT

Is also an alternative tool to the pen-and-paper objective test. It is a purposeful, ongoing,
dynamic, and collaborative process of gathering multiple indicators of the learner’s growth
and development. It is also performance-based.

TYPES OF ASSESSMENT PORTFOLIO

1. Documentation or Working Portfolio


● To highlight development and improvement over time

● Showcases the process by including the full progression of project development
● Often involves a range of artifacts, from brainstormed lists to rough drafts to finished products
2. Process Portfolio
● To document all stages of the learning process

● It also includes samples of student work throughout the entire educational progression
● It expands on the information in a documentation portfolio by integrating
reflections and higher order cognitive activities.
3. Product or showcase Portfolio
● To highlight student’s best work by showcasing the quality and range of
student’s accomplishments.

Rubric is a measuring instrument used in rating performance-based tasks. It is the key to
corrections for assessment tasks designed to measure the attainment of learning
competencies that require demonstration of skills or creation of products of learning. It
offers a set of guidelines or descriptions for scoring different levels of performance or
qualities of products of learning. It can be used in scoring both the process and the products
of learning.

Similarity of Rubric with Other Scoring Instruments

Rubric is a modified checklist and rating scale

1. Checklist
● Presents the observed characteristics of a desirable performance or
product
● The rater checks the trait/s that has/ have been observed in one’s
performance or product
2. Rating Scale
● Measures the extent or degree to which a trait has been satisfied by one’s
work or performance
● Offers overall description of the different levels of quality of a work or
performance
● Uses 3 or more levels to describe the work or performance, although the
most common rating scales have 4 or 5 performance levels
3. Rubric
● Shows the observed traits of a work/performance

● Shows degree of quality of work/performance


TYPES OF RUBRICS

1. HOLISTIC RUBRICS

It describes the overall quality of a performance or a product. In this rubric, there is only one rating given to the entire work or performance.

Advantages

● It allows fast assessment

● It provides one score to describe the overall performance or quality of work

● It can indicate the general strengths and weaknesses of the work or performance

Disadvantages:

● It does not clearly describe the degree to which each criterion is satisfied or not by the
performance or product
● It does not permit differential weighting of the qualities of a product/
performance

2. ANALYTIC RUBRIC

It describes the quality of a performance or product in terms of the identified dimensions/criteria, which are rated independently to give a better picture of the quality of the work or performance.

Advantages:

● It clearly describes the degree to which each criterion is satisfied or not by the performance or product
● It permits differential weighting of the qualities of a product /performance

● It helps raters pinpoint specific areas of strengths and weaknesses

Disadvantages:

● It is more time consuming to use

● It is more difficult to construct
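The differential weighting that an analytic rubric permits can be made concrete in a short sketch. The criteria names, the weights, and the 1–4 rating scale below are illustrative assumptions, not taken from these notes:

```python
# Minimal sketch of analytic-rubric scoring with differential weighting.
# Criteria, weights, and the 1-4 rating scale are illustrative assumptions.

def weighted_rubric_score(ratings, weights):
    """Combine per-criterion ratings into one weighted average score."""
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same criteria")
    total_weight = sum(weights.values())
    return sum(ratings[c] * weights[c] for c in ratings) / total_weight

# An essay rated on three criteria, with content weighted most heavily.
ratings = {"content": 4, "organization": 3, "mechanics": 2}
weights = {"content": 3, "organization": 2, "mechanics": 1}
score = weighted_rubric_score(ratings, weights)  # (12 + 6 + 2) / 6 = 3.33...
```

A holistic rubric, by contrast, would collapse the same performance into a single undifferentiated rating, hiding which criterion pulled the score down.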


PRINCIPLES OF HIGH QUALITY CLASSROOM ASSESSMENT

PRINCIPLE 1: CLEAR AND APPROPRIATE LEARNING TARGETS

● Learning targets should be clearly stated, specific, and centered on what is truly important
✔ Knowledge- student’s mastery of substantive subject matter

✔ Reasoning- student’s ability to use knowledge to reason and solve problems

✔ Skills- ability to demonstrate achievement-related skills

✔ Products- ability to create achievement-related products

✔ Affective/Disposition- student’s attainment of affective states such as attitudes, values, interest and self-efficacy

PRINCIPLE 2: APPROPRIATE METHODS


OBJECTIVE SUPPLY: short answer; completion test

OBJECTIVE SELECTION: multiple choice; matching type; true/false

ESSAY: restricted response; extended response

PERFORMANCE-BASED: presentations; papers; projects; athletics; demonstrations; exhibitions; portfolios

ORAL QUESTION: oral examination; conferences; interviews

OBSERVATION: informal; formal

SELF-REPORT: attitude survey; sociometric devices; questionnaires; inventories
Assessment Methods

Types of test according to FORMAT

1. Selective type- provides choices for the answer


a. Multiple choice- consists of a stem, which describes the problem, and 3 or
more alternatives, which give the suggested solutions. The incorrect
alternatives are called distractors
b. True-False or Alternative Response- consists of declarative statement that
one has to mark true or false, right or wrong, correct or incorrect, yes or no,
fact or opinion and the like
c. Matching type- consists of two parallel columns: Column A, the column of
premises from which a match is sought; Column B, the column of responses
from which the selection is made
2. Supply Test
a. Short answer- uses a direct question that can be answered by a word,
phrase, number or symbol
b. Completion test- consists of an incomplete statement
3. Essay Test
a. Restricted response- limits the content of the response by restricting the
scope of the topic
b. Extended Response- allows the students to select any factual information that
they think is pertinent to organize their answers in accordance with their best
judgment

Projective Tests

✔ A psychological test that uses images in order to evoke responses from a subject
and reveal hidden aspects of the subject’s mental life
✔ These were developed in an attempt to eliminate some of the major problems
inherent in the use of self-report measures, such as the tendency of some
respondents to give “socially desirable” responses

Important Projective Techniques

1. Word Association Test- an individual is given a clue or hint and asked to respond to
the first thing that comes to mind.
2. Completion test- the respondents are asked to complete an incomplete
sentence or story. The completion will reflect their attitude and state of mind.
3. Construction Techniques- more or less like the completion test. Respondents
can be given a picture and asked to write a story about it. The initial structure
is limited and not detailed like the completion test.
4. Expression Techniques- people are asked to express the feelings or
attitudes of other people.

Guidelines for Constructing Test Items

When to use Essay Tests

Essays are appropriate when:

1. The group to be tested is small and the test is not to be used again;
2. You wish to encourage and reward the development of students’ skill in
writing;
3. You are more interested in exploring the students’ attitudes than in
measuring their academic achievement; and
4. You are more confident of your ability as a critical and fair reader than of
your ability as an imaginative writer of good objective test items.

When to use Objective Test Items

Objective test items are especially appropriate when:

1. The group to be tested is large and the test may be reused;


2. Highly reliable test scores must be obtained as efficiently as possible;
3. Impartiality of evaluation, absolute fairness, and freedom from possible
test-scoring influences (fatigue, lack of anonymity) are essential;
4. You are more confident of your ability to express objective test items
clearly than your ability to judge essay test answers correctly;
5. There is more pressure for speedy reporting of scores than for speedy
test preparation.

Multiple Choice Items

It consists of:

1. Stem- which identifies the question or problem


2. Response alternatives or options
3. Correct answer

Ex:

Which of the following is a chemical change? STEM

a. Evaporation of alcohol
b. Freezing of water
c. Burning of oil
d. Melting of wax
Advantage of using Multiple Choice Items

Multiple choice items can provide:

1. Versatility in measuring all levels of cognitive ability


2. Highly reliable test scores
3. Scoring efficiency and accuracy
4. Objective measurement of student achievement or ability
5. A wide sampling of content or objectives
6. A reduced guessing factor when compared to true-false items
7. Different response alternatives which can provide diagnostic feedback

Limitations of Multiple Choice Items

1. Difficult and time consuming to construct


2. Lead a teacher to favor simple recall of facts
3. Place a high degree of dependence on student’s reading ability and teacher’s writing
ability

Suggestions for Writing Multiple Choice Items

The STEM

1. When possible, state the stem as a direct question rather than as an incomplete
statement
a. Poor- alloys are ordinarily produced by…
b. Better- how are alloys ordinarily produced?
2. Present a definite, explicit singular question or problem in the stem
a. Poor- Psychology …
b. Better- the science of mind and behavior is called…
3. Eliminate excessive verbiage or irrelevant information from the stem.
a. Poor- While ironing her formal polo shirt, June burned her hand accidentally
on the hot iron. This was due to a heat transfer because…
b. Better- which of the following ways of heat transfer explains why June’s hand
was burned after she touched the hot iron?
4. Include in the stem any word/s that might otherwise be repeated in each alternative
a. Poor- In national elections in the US, the president is officially
i. Chosen by the people
ii. Chosen by the electoral college
iii. Chosen by the Members of Congress
iv. Chosen by the House of Representatives
b. Better- In national elections in the US, the President is officially chosen by
i. The electoral college
ii. Members of the congress
iii. House of Representatives
5. Use negative stated questions sparingly. When used, underline and/or capitalize the
negative word.
a. Poor- which of the following is not cited as an accomplishment of Arroyo
administration?
b. Better- Which of the following is NOT cited as an accomplishment of Arroyo
administration?
6. Make all alternatives plausible and attractive to the less knowledgeable or skillful
student.
a. What process is most nearly opposite of photosynthesis?
i. Poor-
1. Digestion
2. Relaxation
3. Respiration
4. Exertion
ii. Better
1. Digestion
2. Assimilation
3. Respiration
4. Catabolism
7. Make the alternatives grammatically parallel with each other and consistent with the
stem
a. Poor- What would advance the application of atomic discoveries to medicine?
a. Standardized techniques for treatment of patients
b. Train the average doctor to apply the radioactive
treatment
c. Remove restriction on the use of radioactive substances
d. Establishing hospital staffed by highly trained
radioactive therapy specialists.
b. Better- What would advance the application of atomic discoveries to
medicine?
a. Development of standardized techniques for treatment
of patients
b. Removal of restriction on the use of radioactive
substances
c. Addition of trained radioactive therapy specialists to
hospital staffs
d. Training the average doctor in application of radioactive
treatments
8. Make the alternatives mutually exclusive
a. Poor- the daily minimum required amount of milk that a 10-year old child
should drink is
a. 1-2 glasses
b. 2-3 glasses
c. 3-4 glasses
d. At least 4 glasses
b. Better- What is the daily minimum required amount of milk a 10-year old
should drink?
a. 1 glass
b. 2 glasses
c. 3 glasses
d. 4 glasses
9. When possible, present in some logical order
a. At 7am, two trucks leave a diner and travel north. One truck averages 42
miles per hour and the other averages 38 miles per hour. At what time will
they be 24 miles apart?
i. Undesirable
a. 6pm
b. 9am
c. 1am
d. 1pm
e. 6am
ii. Desirable
a. 1am
b. 6am
c. 9am
d. 1pm
e. 6pm
10. Be sure there is only one correct or best response to the item
a. Poor- The two most desired characteristics in a classroom test are validity
and
a. Precision
b. Reliability
c. Objectivity
d. Consistency
b. Better- The two most desired characteristics in a classroom test are validity and
a. Precision
b. Reliability
c. Objectivity
d. Standardization
11. Make alternatives approximately equal in length
a. Poor- the most general cause of low individual incomes in the US is
i. Lack of valuable productive services to sell
ii. Unwillingness to work
iii. Automation
iv. Inflation
b. Better- What is the most general cause of low individual incomes in the US?
i. A lack of valuable production services to sell
ii. The population’s overall unwillingness to work
iii. The nation’s increased reliance on automation
iv. An increasing national level of inflation
12. Avoid irrelevant clues, such as grammatical structure, well-known verbal
associations or connections between stem and answer
a. Poor- (grammatical clue) A chain of islands is called an
i. Archipelago
ii. Peninsula
iii. Continent
iv. Isthmus
b. Poor- (verbal association) The reliability of a test can be estimated by a
coefficient of
i. Measurement
ii. Correlation
iii. Testing
iv. Error
c. Poor- (connection between stem and answer) The height to which a water
dam is built depends on
i. The length of the reservoir behind the dam
ii. The volume of the water behind the dam
iii. The height of the water behind the dam
iv. The strength of the wall’s reinforcement
13. Use at least four alternatives for each item to lower the probability of getting the item
correct by guessing
14. Randomly distribute the correct responses among the alternative positions
throughout the test, having approximately the same proportion of the alternatives a,
b, c, d and e as the correct response
15. Use the alternatives NONE OF THE ABOVE and ALL OF THE ABOVE sparingly. When
used, such alternatives should occasionally be the correct response.
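The rationale behind suggestion 13 is simple probability: with k equally plausible alternatives, a blind guess succeeds with chance 1/k, so adding alternatives lowers the expected number of items answered correctly by guessing alone. A quick sketch (the 40-item test size is an arbitrary example):

```python
# Expected number of correct answers from blind guessing on an n-item
# multiple-choice test where every item has k equally plausible alternatives.

def expected_correct_by_guessing(n_items, n_alternatives):
    return n_items / n_alternatives

# On a 40-item test, going from 3 to 5 alternatives per item cuts the
# expected number of lucky hits from about 13.3 down to 8.
for k in (3, 4, 5):
    print(f"{k} alternatives: {expected_correct_by_guessing(40, k):.1f} expected by chance")
```

Note that distractors only lower the guessing probability if they are plausible, which is why suggestion 6 (make all alternatives attractive to the less knowledgeable student) matters as much as the number of options.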

True-False Test Items

True-false test items are typically used to measure the ability to identify whether or
not statements are correct. The basic format is simply a declarative statement that the
student must judge as true or false. A common modification of the basic form is one in
which the student must respond “yes” or “no”, or “agree” or “disagree”.

Three Forms:

1. Simple- consists of only two choices


2. Complex- consists of more than two choices
3. Compound- two choices plus a conditional completion response

Examples:

Simple- The acquisition of morality is a developmental process. (True / False)

Complex- The acquisition of morality is a developmental process. (True / False / Opinion)

Compound- The acquisition of morality is a developmental process. (True / False)
If the statement is false, what makes it false?

Advantages of True-False items

It provides:

1. The widest sampling of content or objectives per unit of testing time


2. Scoring efficiency and accuracy
3. Versatility in measuring all levels of cognitive ability
4. Highly reliable test scores; and
5. An objective measurement of student achievement or ability

Limitations of True-false Items

1. Incorporate an extremely high guessing factor
2. Can often lead the teacher to write ambiguous statements, due to the
difficulty of writing statements which are unequivocally true or false
3. Do not discriminate between students of varying ability as well as other item types
4. Can often include more irrelevant clues than do other item types
5. Can often lead a teacher to favor testing of trivial knowledge

Suggestions for Writing True-False Items

1. Base true-false items upon statements that are absolutely true or false, without
qualifications or exceptions
a. Poor- nearsightedness is hereditary in origin
b. Better- Geneticists and eye specialists believe that the predisposition to
nearsightedness is hereditary
2. Express the item statement as simply and as clearly as possible
3. Express a single idea in each test item
4. Include enough background information and qualifications so that the ability to
respond correctly to the item does not depend on some special, uncommon
knowledge
5. Avoid lifting statements directly from the text, lecture or other materials so that
memory alone will not permit a correct answer
6. Avoid using negatively stated item statements
7. Avoid the use of unfamiliar vocabulary
8. Avoid the use of specific determiners which should permit a test wise but unprepared
examinee to respond correctly. Specific determiners refer to sweeping terms like
always, all, none, never, impossible, inevitable. Statements including such terms are
likely to be false. On the other hand, statements using qualifying determiners such
as usually, sometimes and often are likely to be true. When statements require specific
determiners, make sure they appear in both true and false items.
9. False items tend to discriminate more highly than true items. Therefore, use more
false items than true items (but not more than 15% additional false items)

Matching Test Items

In general, matching items consist of a column of stimuli presented on the left side
of the exam page and a column of responses placed on the right side of the page. Students
are required to match the response associated with a given stimulus.

Advantages of Using Matching Test Items

1. Require a short period of reading and response time, allowing the teacher to cover
more content
2. Provide objective measurement of student achievement or ability
3. Provide highly reliable test scores
4. Provide scoring efficiency and accuracy

Disadvantages of Using Matching Test Items

1. Have difficulty measuring learning objectives beyond simple recall of information
2. Are difficult to construct due to the problem of selecting a common set of stimuli and
responses

Suggestions for Writing Matching Test items

1. Include directions which clearly state the basis for matching the stimuli with the
responses. Explain whether or not the response can be used more than once and
indicate where to write the answer
2. Use only homogenous material in matching items
3. Arrange the list of responses in some systematic order if possible- chronological,
alphabetical.
4. Avoid grammatical or other clues to the correct response
NOTE:

1. Keep matching items brief, limiting the list of stimuli to under 10


2. Include more responses than stimuli to help prevent answering through the
process of elimination
3. When possible, reduce the amount of reading time by including only short
phrases or single words in the response list.

Completion Test Items

The completion item requires the student to answer a question or finish an incomplete statement by filling in a blank with the correct word or phrase.

Advantages of using Completion Items

Completions items can:

1. Provide a wide sampling of content


2. Efficiently measure lower levels of cognitive ability
3. Minimize guessing as compared to multiple choice or true-false items; and
4. Usually provide an objective measure of student achievement or ability

Limitations of Using completion Items

Completion items:

1. Are difficult to construct so that the desired response is clearly indicated;


2. Have difficulty in measuring learning objectives requiring more than simple recall
of information;
3. Can often include more irrelevant clues than do other item types;
4. Are more time consuming to score when compared to multiple choice or true-
false items; and
5. Are more difficult to score since more than one answer may have to be
considered correct if the item was not properly prepared.

Suggestions for Writing Completion Test Items

1. Omit only significant words from the statement


2. Do not omit so many words from the statement that the intended meaning is lost.
3. Avoid grammatical or other clues to the correct response
4. Be sure there is only one correct response
5. Make the blanks of equal length
6. When possible, delete words at the end of the statement after the student has been
presented a clearly defined problem
7. Avoid lifting statements directly from the text, lecture or other sources
8. Limit the required response to a single word or phrase
Essay Test items

A classroom essay test consists of a small number of questions to which the student
is expected to demonstrate his/her ability to:

a. Recall factual knowledge


b. Organize this knowledge; and
c. Present the knowledge in a logical, integrated answer to the
question

Classification of Essay test:

1. Extended- response essay item


2. Limited response or short-answer essay item

Example of Extended-Response Essay Item:

Explain the difference between S-R (stimulus-response) and S-O-R (stimulus-organism-response) theories of personality. Include in your answer the following:

a. Brief description of both theories


b. Supporters of both theories
c. Research methods used to study each of the two theories (20
points)

Example of Short- Answer Essay Item:

Identify research methods used to study the (S-R) and (S-O-R) theories of
personality (10 pts.)

Advantages of Using Essay items

Essay items:

1. Are easier and less time consuming to construct than most item types;
2. Provide a means for testing students’ ability to compose an answer and present it in
a logical manner; and
3. Can efficiently measure higher order cognitive objectives- analysis, synthesis,
evaluation

Limitations of Using Essay Items

Essay items:

1. Cannot measure a large amount of content or objectives;


2. Generally provide low test score reliability;
3. Require an extensive amount of instructor’s time to read and grade; and
4. Generally do not provide an objective measure of student achievement or ability

Suggestions for Writing Essay Test Items

1. Prepare essay items that elicit the type of behavior you want to measure
2. Phrase each item so that the student’s task is clearly indicated
3. Indicate for each item a point or weight and an estimated time limit for answering
4. Ask questions that will elicit responses on which experts could agree that one
answer is better than another
5. Avoid giving a student a choice among optional items as this greatly reduces the
reliability of the test
6. It is generally recommended for classroom examinations to administer several short-
answer items rather than only one or two extended response items

Guidelines for Grading Essay items

1. When writing each essay item, simultaneously develop a scoring rubric


2. To maintain a consistent scoring system and ensure the same criteria are applied to
all assessments, score one essay item across all tests prior to scoring the next essay item
3. To reduce the influence of the halo effect, bias and other subconscious factors, all
essay questions should be graded blind to the identity of the student
4. Due to the subjective nature of grading essays, the score on one essay may be
influenced by the quality of previous essays. To prevent this type of bias, reshuffle
the order of assessments after reading through each item.

PRINCIPLE 3: BALANCED

A balanced assessment sets targets in all domains of learning (cognitive, affective, and psychomotor) or domains of intelligence (verbal-linguistic, logical-mathematical, bodily-kinesthetic, visual-spatial, musical-rhythmic, interpersonal-social, intrapersonal-introspection, physical world-natural, existential-spiritual).

A balanced assessment makes use of both traditional and alternative assessment

PRINCIPLE 4: VALIDITY

Validity is the degree to which the assessment instrument measures what it intends to measure.

✔ It also refers to the usefulness of the instrument for a given purpose

✔ It is the most important criterion of a good assessment instrument

Ways in Establishing Validity

1. Face Validity- is done by examining the physical appearance of the instrument


2. Content Validity- is done through a careful and critical examination of the
objectives of assessment so that it reflects the curricular objectives
3. Criterion- related validity- is established statistically such that a set of scores
revealed by the measuring instrument is correlated with the scores obtained in
another external predictor or measure

It has two purposes:

a. Concurrent Validity- describes the present status of the individual by


correlating the sets of scores obtained from two measures given
concurrently
b. Predictive Validity- describes the future performance of an individual by
correlating the sets of scores obtained from two measures given at a
longer time interval.
4. Construct Validity- validity established by analyzing the activities and processes
that correspond to a particular concept; it is established statistically by comparing
psychological traits or factors that theoretically influence scores in a test.
a. Convergent validity helps to establish construct validity when you use two
different measurement procedures and research methods in your study to
collect data about a construct.
b. Divergent validity helps to establish construct validity by demonstrating
that the construct you are interested in is different from other constructs
that might be present in your study.

Factors Influencing the Validity of an Assessment Instrument

1. Unclear directions- directions that do not clearly indicate to the students how to
respond to the task and how to record responses tend to reduce validity
2. Reading vocabulary and sentence structure too difficult- vocabulary and
sentence structure that are too complicated for the student turn the test into an
assessment of reading comprehension, thus altering the meaning of assessment
results
3. Ambiguity- ambiguous statements in assessments tasks contribute to
misinterpretations and confusion. Ambiguity sometimes confuses the better
students more than it does the poor students
4. Inadequate time limits- time limits that do not provide students with enough time
to consider the tasks and provide thoughtful responses can reduce the validity of
interpretations of results
5. Overemphasis of easy-to-assess aspects of the domain at the expense of important,
but hard-to-assess aspects. It is easy to develop test questions that assess
factual recall and generally harder to develop ones that tap conceptual
understanding or higher-order thinking processes such as the evaluation of
competing positions or arguments. Hence, it is important to guard against
underrepresentation of tasks targeting the important, but more difficult to assess,
aspects of achievement
6. Test items inappropriate for the outcomes being measured- attempting to
measure understanding, thinking skills, and other complex types of achievements
with test forms that are appropriate for only measuring factual knowledge will
invalidate the results
7. Poorly constructed test items- test items that unintentionally provide clues to the
answer tend to measure the student’s alertness in detecting clues as well as
mastery of skills or knowledge the test is intended to measure
8. Test too short- if a test is too short to provide a representative sample of the
performance we are interested in, its validity will suffer accordingly
9. Improper arrangement of items- test items are typically arranged in order of
difficulty, with the easiest items first. Placing difficult items first in the test may
cause students to spend too much time on these and prevent them from reaching
items they could easily answer. Improper arrangement may also influence
validity by having a detrimental effect on student motivation.
10. Identifiable pattern of answer- placing correct answers in some systematic
pattern enables students to guess the answers to some items more easily and
this lowers validity.

PRINCIPLE 5: RELIABILITY

Reliability- refers to the consistency of scores obtained by the same person when
retested using the same instrument or its parallel form, or when compared with other students who
took the same test

METHOD / TYPE OF RELIABILITY MEASURE / PROCEDURE / STATISTICAL MEASURE

1. Test-Retest
Type: Measure of stability
Procedure: Give a test twice to the same group, with any time interval between tests from several minutes to several years
Statistical measure: Pearson r

2. Equivalent forms
Type: Measure of equivalence
Procedure: Give parallel forms of tests with a close time interval between forms
Statistical measure: Pearson r

3. Test-retest with equivalent forms
Type: Measure of stability and equivalence
Procedure: Give parallel forms of tests with an increased time interval between forms
Statistical measure: Pearson r

4. Split half
Type: Measure of internal consistency
Procedure: Give a test once, then score equivalent halves of the test
Statistical measure: Pearson r and Spearman-Brown Formula

5. Kuder-Richardson
Type: Measure of internal consistency
Procedure: Give the test once, then correlate the proportion/percentage of the students passing and not passing a given item
Statistical measure: Kuder-Richardson Formula 20 and 21

Improving Test Reliability

Several test characteristics affect reliability. They include the following:

1. Test length. In general, a longer test is more reliable than a shorter one because
longer tests sample the instructional objectives more adequately
2. Spread of scores. The type of students taking the test can influence reliability. A
group of students with heterogeneous ability will produce a larger spread of test scores than
a group with homogeneous ability.
3. Item difficulty. In general, tests composed of items of moderate or average difficulty
(.30 to .70) will have more influence on reliability than those composed primarily of
easy or very difficult items.
4. Item discrimination. In general, test composed of more discriminating items will have
greater reliability than those composed of less discriminating items
5. Time limits. Adding a time factor may improve reliability for lower-level cognitive test
items. Since all students do not function at the same pace, a time factor adds
another criterion to the test that causes discrimination, thus improving reliability.
Teachers should not, however, arbitrarily impose a time limit. For higher level
cognitive test items, the imposition of time may defeat the intended purpose of the
items.

Item Analysis

● Refers to the process of examining the students' responses to each item in the
test
● There are two characteristics of an item: desirable and undesirable
characteristics
● Desirable items- retained for subsequent use

● Undesirable items- revised or rejected

Three criteria in Determining Desirability and Undesirability of an Item

1. Difficulty of an item
2. Discriminating power of an item
3. Measures of attractiveness

Item Difficulty Level: definition

● The proportion of test takers who answer a particular test item correctly

Difficulty index = the number of students who answer item X correctly / total no. of students
who answer item X

Level of Difficulty of an item

INDEX RANGE DIFFICULTY LEVEL

0.0-0.20 Very difficult

0.21-0.40 Difficult

0.41-0.60 Moderately difficult

0.61-0.80 Easy

0.81-1.00 Very easy

Problem 1: there are 50 students who answered item X, 30 of whom answered the item
correctly. What is the difficulty index?

ANSWER: Difficulty index = 30/50 = 0.60, or moderately difficult

Problem 2: get the difficulty index of each item

QUESTION A B C D

1 0 3 24* 3
2 12* 13 3 2
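Problem 2 can be worked through in code. Reading the starred option in each row as the answer key (an assumption; C for item 1, A for item 2), the difficulty index is the count of correct responses over the total responses.

```python
# Difficulty index = correct responses / total responses, applied to Problem 2.
# Assumption: the starred option in the table marks the correct answer.
def difficulty_index(correct, total):
    return correct / total

item1 = difficulty_index(24, 0 + 3 + 24 + 3)   # key C: 24 of 30 correct
item2 = difficulty_index(12, 12 + 13 + 3 + 2)  # key A: 12 of 30 correct

print(item1, item2)  # 0.8 0.4
```

Against the difficulty-level table above, item 1 (0.80) falls in the "easy" band and item 2 (0.40) in the "difficult" band.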

Discrimination index

● It is the difference between the proportion of high performing students who


got the item right and the proportion of low performing students who got an
item right
● Discrimination index- is the degree to which the item discriminates between
high performing and low performing group (upper and lower 27%)

Formula: Discrimination index = Difficulty index of Upper Group (PU) - Difficulty index of Lower Group (PL)

DISCRIMINATION INDEX        ITEM EVALUATION

0.40 and up                 Very good item

0.30 - 0.39                 Reasonably good item but possibly subject for improvement

0.20 - 0.29                 Marginal item, usually needing and being subject to improvement

Below 0.19                  Poor item, to be rejected or improved by revision

● Positive discrimination- if the proportion of students who got an item right in the
upper group is GREATER THAN THE LOWER GROUP
● Negative discrimination- if the proportion of students who got an item right in the
lower group is GREATER THAN THE UPPER GROUP.
● Zero discrimination- if the proportion of students who got the item right in the upper
performing group and low performing group are EQUAL.
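The formula above can be sketched as a small function. The group sizes and counts below are hypothetical, purely for illustration.

```python
# Discrimination index = PU - PL: the difficulty index of the upper 27% group
# minus the difficulty index of the lower 27% group.
def discrimination_index(correct_upper, n_upper, correct_lower, n_lower):
    p_upper = correct_upper / n_upper   # proportion correct in upper group
    p_lower = correct_lower / n_lower   # proportion correct in lower group
    return p_upper - p_lower

d = discrimination_index(18, 20, 8, 20)   # 0.90 - 0.40
print(round(d, 2))  # 0.5
```

A result of 0.5 is positive discrimination and, by the table above, a very good item; a negative result would mean the lower group outperformed the upper group on that item.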

Items are retained if:

● Difficulty index: within .26 to .75

● Discrimination index: .20 and above

Items need to be revised if:

● Difficulty index: within .26 to .75

● Discrimination index: below .20

Items need to be Discarded/ rejected if:

● Difficulty index: not within 0.26 to .75

● Discrimination index: .19 and below
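The retain/revise/reject rules can be sketched as a decision function. This is one reasonable reading of the criteria above (retain when both indices are acceptable, revise when only one is, reject when neither is), stated here as an assumption.

```python
# Retain / revise / reject decision sketch based on the criteria above.
def evaluate_item(difficulty, discrimination):
    difficulty_ok = 0.26 <= difficulty <= 0.75
    discrimination_ok = discrimination >= 0.20
    if difficulty_ok and discrimination_ok:
        return "retain"
    if difficulty_ok or discrimination_ok:
        return "revise"
    return "reject"

print(evaluate_item(0.60, 0.45))  # retain
print(evaluate_item(0.50, 0.10))  # revise (difficulty fine, discrimination low)
print(evaluate_item(0.90, 0.05))  # reject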

MINOR PRINCIPLES OF ASSESSMENT

✔ Administrability- the test should be easy to administer such that directions


should clearly indicate how student should respond to the test / task items
and how much time should he/she spend for each item or for the whole test.
✔ Scoreability- the test should be easy to score such that directions for scoring are
clear and points for each correct answer are specified
✔ Interpretability- test scores can easily be interpreted and described in terms of
the specific tasks that a student can perform or his relative position in a clearly
defined group
✔ Economy- the test should be administered in the cheapest way in terms of time and
effort spent, and answer sheets must be provided so the test can be given from
time to time.

MEASURES OF CENTRAL TENDENCY AND VARIABILITY

Measures of Central Tendency

✔ A single value that is used to identify the center of the data

✔ It is thought of as the typical value in the set of scores

✔ It tends to lie within the center when the scores are arranged from lowest to highest and vice versa.

MEAN

● Refers to the arithmetic average

● Used when the data are interval or in ratio level of measurement


● Used when the frequency distribution is regular, symmetrical or normal

● Easily affected by extreme scores

● Very easy to compute

● Measures stability

● Used to compute other measures such as standard deviation, coefficient of variation,
skewness, and z-scores

When to use the mean

● When it is desired to give each score equal weight in determining the central
tendency
● When it is desired to find the measure of central tendency which has the highest
reliability
● When it is desired to compute the standard deviation and the coefficient of
correlation later on

Steps in solving the mean value using raw scores:

1. get the sum of all the scores in the distribution


2. identify the number of scores
3. substitute to the given formula and solve the mean value

formula:

mean = ΣX / N

where ΣX is the sum of all scores and N is the number of scores
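Following the steps above, the mean is the sum of all scores divided by their count. The sample distribution below is the same one used in the median exercise later in these notes.

```python
# Mean = sum of all scores divided by the number of scores.
scores = [19, 17, 16, 15, 10, 5, 2]

mean_value = sum(scores) / len(scores)   # 84 / 7
print(mean_value)  # 12.0
```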

MEDIAN

● refers to the centermost scores when the scores in the distribution are arranged
according to magnitude (from highest to lowest score or from lowest to highest
score)
● used when the middlemost score is desired

● used when the data are in ordinal level of measurement


● used when the frequency distribution is irregular or skewed

● used when there are extreme scores

● not affected by extreme scores because it is a positional measure

● may not be an actual observation in the data set

WHEN TO USE THE MEDIAN

● when a quick and easily computed measure of central tendency is desired

● when there are extreme scores, such as a few very high scores or a few low scores,
which could affect the mean disproportionately

Median of Ungrouped Data

1. Arrange the scores from lowest to highest or highest to lowest


2. Determine the middle score in the distribution if n is an odd number, or get the
average of the two middle scores if n is an even number
3. Solve for the median (ungrouped data): 19, 17, 16, 15, 10, 5, 2
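The steps above can be verified in code; `statistics.median` sorts the data and applies exactly this odd/even rule.

```python
# Median of ungrouped data: sort, then take the middle score (odd n)
# or the average of the two middle scores (even n).
from statistics import median

odd_scores = [19, 17, 16, 15, 10, 5, 2]   # n = 7, sorted middle score
print(median(odd_scores))                  # 15

even_scores = [19, 17, 16, 15, 10, 5]      # n = 6, average of 15 and 16
print(median(even_scores))                 # 15.5
```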

MODE

● Refers to the score/s that occurs most frequently in the score distribution

● Used when the data are nominal level of measurement

● Used when quick answer is need

● Used when the score distribution is normal

● Can be used for quantitative, as well as qualitative data

● May not be unique

● Not affected by extreme values

● May not exist at times

Types of mode

1. Unimodal- is score distribution that consists of one mode


2. Bimodal- score distribution that consists of two modes
3. Trimodal- score distribution that consists of three modes. A score distribution that
consists of more than two modes is also considered multimodal.

When to use the MODE

1. When it is desired to find the score that occurs most often


2. When it is desired to find the measure of central tendency that has greatest
concentration

Find the mode and identify its classification

Scores of Section A Scores of Section B

25 25

24 24

24 24

20 20

20 18

20 18

16 17

12 10

10 9
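The exercise above can be checked with `statistics.multimode`, which returns every score that ties for the highest frequency and so also reveals whether a distribution is unimodal or bimodal.

```python
# Find the mode(s) of each section's scores.
from statistics import multimode

section_a = [25, 24, 24, 20, 20, 20, 16, 12, 10]
section_b = [25, 24, 24, 20, 18, 18, 17, 10, 9]

print(multimode(section_a))  # [20] -> unimodal (20 occurs three times)
print(multimode(section_b))  # [24, 18] -> bimodal (each occurs twice)
```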

Measures of Variability

● It is a single value that is used to describe the spread of scores in a distribution, that
is above or below the measures of central tendency

Three measure of variability:

1. Range
2. Quartile deviation
3. Standard deviation

RANGE

It is the difference between the highest score and the lowest score in the data set
Formula: R=HS- LS

Problem:

Below are the scores of 10 students in Mathematics and Science. Find the range of each and
determine which subject has greater variability

MATHEMATICS SCIENCE

35 35

33 40

45 25

55 47

62 55

34 35

54 45

36 57

47 39

40 52

MATHEMATICS

HS= 62

LS=33

R=HS-LS

R=62-33

R=29

SCIENCE

HS= 57

LS=25

R=HS-LS
R=57-25

R=32

● The scores in Science have greater variability

● The scores in science are more scattered than the scores in mathematics

Interpretation of Result:

● Small range- closer, clustered, homogeneous, scores are less varied

● Large range- dispersed, scattered, spread apart, far from each other,
heterogeneous, scores are more varied
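The worked solution above can be reproduced in a few lines, using the Mathematics and Science scores from the table.

```python
# Range = highest score - lowest score, for the two subjects above.
math_scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]

math_range = max(math_scores) - min(math_scores)          # 62 - 33
science_range = max(science_scores) - min(science_scores)  # 57 - 25

print(math_range, science_range)  # 29 32 -> Science scores are more varied
```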

QUARTILE DEVIATION

● it is half of the difference between the THIRD QUARTILE (Q3) AND THE FIRST
QUARTILE (Q1)
● it is based on the range of the middle 50% of the scores, instead of the range of the entire
distribution

formula: QD = (Q3 - Q1) / 2
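The formula can be sketched with `statistics.quantiles`. Note that quartile conventions differ between textbooks; the `method="inclusive"` convention (linear interpolation) is used here as an assumption, applied to the Mathematics scores from the range example above.

```python
# Quartile deviation QD = (Q3 - Q1) / 2, using inclusive (interpolated) quartiles.
from statistics import quantiles

scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]   # Mathematics scores above
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
qd = (q3 - q1) / 2

print(q1, q3, qd)  # 35.25 52.25 8.5
```

Other conventions (e.g. the "exclusive" method) will give slightly different quartiles on the same data.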

STANDARD DEVIATION (SD)

● It is the most important and useful measures of variation

● It is the square root of the variance

● It is an average of the degree to which each score in the distribution deviates
from the mean value
● It is a more stable measure of variation because it involves all the scores in a
distribution, compared to the range and quartile deviation
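As a sketch, the standard deviation can be computed with `statistics.pstdev` (which treats the data as a whole population), here on the same sample distribution used in the median exercise.

```python
# Standard deviation = square root of the variance
# (the mean squared deviation from the mean).
from statistics import mean, pstdev

scores = [19, 17, 16, 15, 10, 5, 2]
m = mean(scores)       # 12
sd = pstdev(scores)    # sqrt(252 / 7) = sqrt(36)

print(m, sd)  # 12 6.0
```

Use `statistics.stdev` instead when the scores are a sample drawn from a larger population.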

MEASURES OF SKEWNESS

● Describe the degree of departure of the scores from symmetry

3 classifications

1. Positively skewed distribution


2. Negatively skewed distribution
3. Normally distributed – Sk=0

Formula: Sk = 3(mean - median) / s, where s is the standard deviation
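The formula can be applied directly to the sample distribution used earlier (mean 12, median 15, standard deviation 6), giving a negative coefficient.

```python
# Pearson's skewness coefficient: Sk = 3 * (mean - median) / s.
from statistics import mean, median, pstdev

scores = [19, 17, 16, 15, 10, 5, 2]
sk = 3 * (mean(scores) - median(scores)) / pstdev(scores)

print(sk)  # -1.5 -> negatively skewed (most scores above the mean)
```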

Positively skewed

● Skewed to the right

● Is a distribution where the thin end tail of the graph goes to the right part of
the curve
● This happens when most of the scores of the students are below the mean

● It tells you only about the poor performance of the takers, but not the reasons why
students did poorly in the said examination
● Reasons of poor performance: 1. Ineffective teaching method and instruction;
2. Student’s unpreparedness; 3. Test items very difficult; 4. Not enough time
to answer test items

Negatively Skewed

● Skewed to the left

● Is a distribution where the thin end tail of the graph goes to the left part of the
curve
● This happens when most of the scores of the students are above the mean

● It tells you only about the good performance of the takers, but not the reasons why
students did well in the said examination
● Possible reasons of high scores: 1. Students are smart; 2. Enough time to
finish examination; 3. Test items very easy, 4. Effective instruction; 5.
Students have prepared for the examination

Normal distribution
● Is a special kind of symmetric distribution

● Can be determined using the values of the mean and standard deviation

Properties of Normal Distribution

● The curve has a single peak, meaning the distribution is unimodal

● It is a bell shaped curve

● It is symmetrical to the mean

● The end tails to the curve can be extended indefinitely in both sides

● The shape of the curve will depend on the value of the mean and standard deviation

MEASURE OF RELATIVE POSITION

● Indicates where a score stands in relation to all other scores in the distribution

● They make it possible to compare the performance of an individual on two or more


different tests
1. Percentile Rank
● It is the percentage of scores in the frequency distribution which are lower
than a given score
● That is, the percentage of the examinees in the norm group who
scored below the score of interest
● It is used to clarify the interpretation of scores on standardized tests

The Descriptive Scale of Percentile Ranks

PERCENTILE RANKS DESCRIPTIVE TERMS

95 or above Very high; superior

85-95 High; excellent

75-85 Above average; good

25-75 Average; satisfactory or fair


15-25 Below average; slightly weak

5-15 Low; weak

5 or below Very low; very weak

2. Z- score
● The number of standard deviation units a score is above or below the mean
of a given distribution
● A positive z-score= measures the number of standard deviation a score is
above the mean
● A negative Z-score= gives the number of standard deviation a score is below
the mean
● Formula: Z = (X - Mean) / Standard Deviation
3. T-score
● It tells the location of a score in a normal distribution having a mean of 50 and
a standard deviation of 10
● Formula: T= 10z + 50

● It has a mean of 50 and a standard deviation of 10


4. Stanines
● Also known as standard nine

● Are single digit scores ranging from, 1-9

● The distribution of raw scores is divided into nine parts

Stanine              1    2    3    4    5    6    7    8    9

Percent in stanine   4%   7%   12%  17%  20%  17%  12%  7%   4%

Formula: stanine = 1.96z + 5

● Has a mean of 5 and a standard deviation of 1.96


Problem: Bryan had a Z-score of 1.9. What is Bryan's stanine score?

Z-score = 1.9

● Tells the location of a raw score in a specific segment in a normal distribution which
is divided into 9 segments, numbered for low 1 through high of 9
● Scores falling within the boundaries of these segments are assigned one of these 9
numbers (standard nine)
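Bryan's problem above can be worked out with the formulas in this section. Rounding the stanine to the nearest whole number and capping it to the 1-9 range is a common convention, assumed here since the notes do not state it.

```python
# Convert Bryan's z-score to a T-score and a stanine, per the formulas above.
z = 1.9

t = 10 * z + 50                               # T-score: mean 50, SD 10
stanine = min(9, max(1, round(1.96 * z + 5)))  # 1.96(1.9) + 5 = 8.724 -> 9

print(t, stanine)  # 69.0 9
```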

MEASURES OF RELATIONSHIP

● These describe the degree of relationship or correlation between the two


variables
● The relationship between the results of two administration of tests would
determine the reliability of the instrument
● The greater the degree of relationship, the more reliable the test

±1 Perfect correlation

± .75 to .99 Very high correlation

± .50 to .74 High correlation

±.25 to .49 Moderately low correlation

± .10 to .24 Very low correlation

±.01 to .09 Negligible correlation

0 No correlation

Pearson r

● It is the most appropriate measure of correlation when sets of data are of


interval or ratio type
● It is the most stable measure of correlation

● It is used when the relationship between the two variables is a linear one

Spearman Rank Order Correlation or Spearman Rho

● it is the most appropriate measure of correlation when the variables are expressed
as ranks instead of scores or when data represent an ordinal scale
● Spearman rho is interpreted in the same way as Pearson r.
