Assessment in Learning I Reviewer
Test/Testing (Tool)
• an instrument designed to measure any quality, ability, skill, or knowledge
• the method of measuring the achievement or performance of a learner
• refers to the administration, the scoring, and the interpretation of an
instrument, or the procedure designed to elicit information about performance in
a sample of a particular area of behaviour
Modes of Assessment
A. Traditional Assessment (Standardized Testing)
B. Performance Assessment (Skill and Applied Knowledge)
C. Portfolio Assessment (Multiple Indicators of Progress)
a. Placement Assessment
- measures performance at the beginning of instruction
- determines the most beneficial mode of instruction
b. Diagnostic Assessment
- determines level of competence
- given at the start to measure the mastery of the prerequisite learning
c. Formative Assessment
- given to monitor learning progress
- aims to provide feedback on both learners and instructors
d. Summative Assessment
- given at the end of a unit to determine whether the objectives were achieved
Types/Purposes of Assessment
Assessment for learning
- occurs when inferences about student progress are used to inform one's
teaching (formative)
Assessment of learning
- occurs when evidence of learning is used to make judgements on student
achievement against goals and standards (summative)
Assessment as learning
- occurs when students reflect on and monitor their progress to inform their
future learning goals (self-assessment)
Revised Bloom's Taxonomy
Creating
Generating new ideas, products, or ways of viewing things
Designing, constructing, planning, producing, inventing
- Designing
- Constructing
- Planning
- Producing
- Inventing
- Devising
- Making
Evaluating
Justifying a decision or course of action
Checking, hypothesizing, critiquing, experimenting, judging
*The learner makes decisions based on in-depth reflection, criticism and
assessment (judging the value of ideas, materials and methods by developing and
applying standards and criteria).
- Checking
- Hypothesizing
- Critiquing
- Experimenting
- Judging
- Testing
- Detecting
- Monitoring
Analyzing
Breaking information into parts to explore understandings and relationships
Comparing, organizing, deconstructing, interrogating, finding
*The learner breaks learned information into its parts to best understand that
information (breaking information down into its component elements).
- Comparing
- Organizing
- Deconstructing
- Attributing
- Outlining
- Finding
- Structuring
- Integrating
Applying
Using information in another familiar situation
Implementing, carrying out, using, executing
*The learner makes use of information in a context different from the one in
which it was learned (using strategies, concepts, principles and theories in new
situations).
- Implementing
- Carrying out
- Using
- Executing
Understanding
Explaining ideas or concepts
Interpreting, summarizing, paraphrasing, classifying, explaining
- Interpreting
- Exemplifying
- Summarizing
- Inferring
- Paraphrasing
- Classifying
- Comparing
- Explaining
Remembering
Recalling information
Recognizing, listing, describing, retrieving, naming, finding
- Recognizing
- Listing
- Describing
- Identifying
- Retrieving
- Naming
- Locating
- Finding
• Lower level questions are those at the remembering, understanding and lower
level application levels of the taxonomy.
• Usually questions at the lower levels are appropriate for:
• Evaluating students’ preparation and comprehension
• Diagnosing students’ strengths and weaknesses
• Reviewing and/or summarizing content
• Higher level questions are those requiring complex application, analysis,
evaluation or creation skills.
• Questions at higher levels of the taxonomy are usually most appropriate for:
• Encouraging students to think more deeply and critically
• Problem solving
• Encouraging discussions
• Stimulating students to seek information on their own
Achievement Tests
— Test Batteries and Single-Subject Tests.
— Personality Assessment.
— Specific Subject Tests (English, social studies or chemistry)
Teacher-Made Tests
Three Reasons for Teacher-Made Tests:
1. They are consistent w/ classroom goals and objectives.
2. They present same questions to all students under nearly identical conditions.
3. They generate a product that can be evaluated and stored for later use—for
example, parent conferences.
2. Multiple-Choice Items:
− Can Cover Many Objectives.
− Measures Different Cognitive Behaviors—factual to the analysis of complex
data.
− Extremely Versatile and Easy to Score.
− Must be Written in a Straightforward, Clear and Concise way.
− Can be Modified after being Administered.
− Relatively Insensitive to Guessing—BUT more sensitive to Guessing than
Supply Items.
3. Matching:
− Designed to Measure Students’ Ability to Recall a Large Amount of Factual
Information—verbal, factual, associative knowledge.
− Two Lists of Items are Presented and Students Select an Item from One
List that Closely Relates to an Item from the Second List.
− Intended for Lower-Level Learning.
4. Completions:
−Require Students to Write Responses in their Own Handwriting, Supplying a
Recalled Word/Phrase.
−Difficult to Write.
−Excellent for Subjects that Require the Recall of Unambiguous Facts or the
Performance of Certain Calculations.
• Guidelines for Creating Completions:
− Give Clear Directions.
− Be Definite Enough so that Only One Correct Answer is Possible.
− Do Not Utilize Direct Statements from Textbooks (it might Encourage
Memorization)
− Ensure that all Blanks are of Equal Length and Correspond to the Lengths
of the Desired Responses.
− Items should be Completed w/a Single Word/Brief Phrase.
5. Essay:
− Permits Students to Formulate Answers to Questions in their Own Words.
− Measure what Students Know because They Utilize their Own Storehouse of
Knowledge to answer a Question.
− Determines Students’ Ability to: Analyze, Synthesize, Evaluate,
and Solve Problems.
Quizzes:
− Evaluates Student Progress.
− Check Homework.
− Measure whether Content from Immediate/Preceding Lessons was
Understood.
− Short in Length—three to five questions.
− Limited to Material Taught in Immediate/Preceding Lessons.
− Encourage Students to Keep Up w/their Work.
− Provide Feedback for Teachers Related to their Effectiveness.
− Serve as Warning Signal of Teaching/Learning Problems.
Grading Systems
• Teachers Collect Relevant Data and then Must Interpret it and Assign Grades.
• There is No Way to Assign Grades that is Fair to All Students.
• There are Two Grading Systems:
1. Absolute Grading Standards.
2. Relative Grading Standards.
The Three Ways to Assign Grades after Examining Students’ Work (see the
sketch below) are:
1. Point Grading System—the Importance of Each Assignment,
Quiz/Test is Reflected in the Points Allocated.
2. Weighted Grading System—every Assignment is Given a Letter Grade and All
Grades are Then Weighted to Arrive at a Final Grade.
3. Percentage Grading System—relies on the Calculation of the Percentage of
Correct Responses.
− Widely Used because of its Simplicity and Familiarity to Most Caregivers.
− Weakness w/this System is that All Exercises Carry the Same Weight.
• Contract System:
− Teacher Promises to Award a Specific Grade for Specified Performance.
− Students Know What they Must Accomplish to Receive a Certain Grade.
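To make the three systems concrete, here is a minimal sketch of how one
student's final grade could be computed under each. The task names, points,
weights, and scores are hypothetical, invented only for illustration; actual
allocations and grade scales vary by school.

# Hypothetical scores for one student; names, points, and weights are invented.
scores = [
    {"name": "Quiz 1",  "earned": 8,  "possible": 10,  "weight": 0.10},
    {"name": "Quiz 2",  "earned": 7,  "possible": 10,  "weight": 0.10},
    {"name": "Project", "earned": 45, "possible": 50,  "weight": 0.30},
    {"name": "Exam",    "earned": 82, "possible": 100, "weight": 0.50},
]

# 1. Point system: each task's importance is reflected in the points allocated,
#    so the grade is total points earned over total points possible.
point_grade = 100 * sum(s["earned"] for s in scores) / sum(s["possible"] for s in scores)

# 2. Weighted system: each task is graded separately, then the grades are
#    combined using the assigned weights (the weights sum to 1.0).
weighted_grade = sum(100 * s["earned"] / s["possible"] * s["weight"] for s in scores)

# 3. Percentage system: the per-task percentages are averaged, so every task
#    carries the same weight (the weakness noted above).
percentage_grade = sum(100 * s["earned"] / s["possible"] for s in scores) / len(scores)

print(f"Point: {point_grade:.1f}  Weighted: {weighted_grade:.1f}  Percentage: {percentage_grade:.1f}")

Note that the same raw marks yield three different final grades (83.5, 83.0,
and 80.5 here), which is why the choice of grading system matters.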
Still, standardized tests and teacher-made tests differ. They differ in the
quality of test items, the reliability of test measures, the procedures for
administering and scoring, and the interpretation of scores. No doubt,
standardized tests are better in quality, more reliable, and more valid.
But a classroom teacher cannot always depend on standardized tests. These may
not suit local needs, may not be readily available, may be costly, and may
have different objectives. In order to fulfill the immediate requirements, the
teacher has to prepare his own tests, which are usually objective type in nature.
The teacher-made test is one of the most valuable instruments in the hands of
the teacher for this purpose. It is designed to address the problems and
requirements of the class for which it is prepared.
The following steps may be followed for the preparation of a teacher-made test:
1. Planning:
Planning of a teacher-made test includes:
a. Determining the purpose and objectives of the test, 'as what to measure and
why to measure'.
b. Deciding the length of the test and portion of the syllabus to be covered.
c. Specifying the objectives in behavioral terms. If needed, a table of
specifications can be prepared, with weightage given to the objectives to be
measured (see the sketch after this list).
d. Deciding the number and forms of items (questions) according to the blueprint.
e. Having a clear knowledge and understanding of the principles of constructing
essay type, short answer type and objective type questions.
f. Deciding the date of testing well in advance, in order to give teachers time
for test preparation and administration.
g. Seeking the co-operation and suggestion of co-teachers, experienced teachers
of other schools and test experts.
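As a minimal sketch of steps (c) and (d), the snippet below allocates test
items across objectives in proportion to their weightage; the objective names,
weights, and test length are hypothetical, chosen only for illustration.

# Hypothetical table of specifications: objectives and their weightage.
weightage = {
    "Remembering facts": 0.20,
    "Understanding concepts": 0.30,
    "Applying procedures": 0.30,
    "Analyzing problems": 0.20,
}
total_items = 40  # planned length of the test

# Allocate the number of items per objective in proportion to its weightage.
# Rounding may leave the total slightly off; adjust by hand if needed.
allocation = {objective: round(weight * total_items) for objective, weight in weightage.items()}

for objective, n_items in allocation.items():
    print(f"{objective}: {n_items} items")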
2. Preparation of the Test:
Planning is the philosophical aspect and preparation is the practical aspect of
test construction. All the practical aspects have to be taken into consideration
while constructing the test. It is an art, a technique; one either has it or
must acquire it. It requires much thinking, rethinking and reading before
constructing test items.
Different types of objective test items, viz. multiple choice, short-answer type
and matching type, can be constructed. After construction, test items should be
given to others for review and for seeking their opinions.
Suggestions may be sought from others on the language, modalities of the items,
statements given, correct answers supplied, and other possible errors
anticipated. The suggestions and views thus sought will help the test
constructor modify and verify his items afresh, making them more acceptable and
usable.
When creating a test or selecting a test, you need to think about these three
characteristics: reliability, validity, and usability.
RELIABILITY: A reliable test is one that will give the same results over and
over again. It is consistent, dependable, and stable, so you can count on the
results. For example, if you give the same test to the same group of students
three times in a row in a short period of time, the results should not
fluctuate widely. If you use a different form of the test, the results should
also remain consistent; if they don't, the test is not reliable. Likewise, if
you have two test items measuring one objective, students who get one right
should also get the other right, and students who get one wrong should get the
other wrong too. You want a test to be reliable so that you can count on it to
test for the same things no matter who takes it and when. To improve
reliability, you can:
− increase the number of test items;
− give the test to a mixed student group;
− include items of moderate difficulty rather than mainly easy or hard questions;
− double-check that all test items are clear and understandable; and
− use test items that can be scored objectively rather than subjectively.
Types of Reliability
1. Test-retest reliability is a measure of reliability obtained by administering
the same test twice over a period of time to a group of individuals. The scores
from Time 1 and Time 2 can then be correlated in order to evaluate the test for
stability over time.
Example: A test designed to assess student learning in psychology could be given
to a group of students twice, with the second administration perhaps coming a
week after the first. The obtained correlation coefficient would indicate the
stability of the scores.
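As a minimal sketch of that correlation step, the Pearson coefficient between
the two administrations can be computed as follows; the student scores are
hypothetical, invented only for illustration.

from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of five students on the first and second administrations.
time1 = [78, 85, 62, 90, 71]
time2 = [80, 83, 65, 92, 69]

# A coefficient near 1.0 indicates the scores are stable over time.
print(f"Test-retest reliability: r = {pearson_r(time1, time2):.2f}")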
VALIDITY: When a test is valid, it measures what it's designed to measure. For
example, if you are trying to test if your students have achieved the following
objective "Given a plow, students will be able to plow on the contour to help
prevent soil erosion" but test by using a test item that asks why it's important
to plow on the contour, your test will not provide a valid measure of this
objective.
To test for that objective, you need to actually see the students plow. Or if
your objective is to have students list three causes of reef destruction, but
the test question has students list three causes of ocean pollution, the test item
doesn't match the objective.
If the test question were to ask students to list three causes of reef
destruction, the question would be valid.
One way to make sure your test is valid is to double check each test item and
make sure each is measuring your pre-determined objectives. You can also ask
your colleagues to rate your questions against your objectives to make sure
they match.
Types of Validity
1. Face Validity ascertains that the measure appears to be assessing the intended
construct under study. The stakeholders can easily assess face validity.
Although this is not a very "scientific" type of validity, it may be an essential
component in enlisting motivation of stakeholders. If the stakeholders do not
believe the measure is an accurate assessment of the ability, they may become
disengaged with the task.
2. Construct Validity is used to ensure that the measure actually measures what
it is intended to measure (i.e. the construct), and not other variables. Using a
panel of "experts" familiar with the construct is a way in which this type of
validity can be assessed. The experts can examine the items and decide what each
specific item is intended to measure. Students can be involved in this process
to obtain their feedback.
Example: When designing a rubric for history, one could assess students'
knowledge across the discipline. If the measure can provide information that
students are lacking knowledge in a certain area, for instance the Civil Rights
Movement, then that assessment tool is providing meaningful information that
can be used to improve the course or program requirements.
Additionally, a panel can help limit "expert" bias (i.e. a test reflecting
what an individual personally feels are the most important or relevant areas).
USABILITY: You should also select tests based on how easy they are to use. In
addition to reliability and validity, you need to think about how much time you
have to create, administer, and grade a test, and about how you will interpret
and use the scores. You also need to check that the test questions and
directions are written clearly, that the test itself is short enough not to
overwhelm the students, that the questions don't include stereotypes or
personal biases, and that they are interesting and make the students think.