Evaluation
Objectives
After completion of this chapter, the readers will be able to:
Describe the concept of test, measurement and evaluation.
State purpose of evaluation.
Identify the steps of evaluation.
Distinguish types of evaluation.
Explain the characteristics of evaluation tool.
State the general principles of evaluation.
List different types of evaluation in nursing education.
Explain the planning of classroom test.
Explain the different classroom tests with their advantages, disadvantages
and general principles.
Explain various methods of clinical evaluation with principles of using
them.
10.1 Concept of Evaluation
Introduction
Evaluation is a part of life. Even in small things like which dress to wear for
work, what gift to buy or when to cross the road, an evaluation has to be
made. In education, evaluation is important because only through
evaluation can a teacher judge the development of students, the changes
taking place in their behaviour, the progress they are making in the class
and also the effectiveness of her/his own teaching in the class. Thus,
evaluation has been an integral part of any teaching and learning
situation. It exerts significant influence on the system of education. Teaching
for successful learning may not be useful without high quality evaluation.
The quality of any educational system is, thus, directly connected with the
quality of evaluation.
Evaluation is a systematic process of collecting evidence about
students' achievement of cognitive, affective and psychomotor learning, on
the basis of which judgments are formed and decisions are made.
Evaluation is concerned with the application of its findings and implies
some judgment on the effectiveness, social utility or desirability of a
product, process or programme in terms of carefully defined and agreed-
upon objectives or values.
Definition
Tyler (1950) defined evaluation as "a systematic process of determining
the extent to which educational objectives are achieved by learners." This
definition indicates that evaluation is a systematic process, done on the
basis of predetermined objective(s), to judge the progress, growth and
development of learners.
"Evaluation is the systematic process of collecting, analyzing and
interpreting information to determine the extent to which learners are
achieving instructional objectives."
- Gronlund and Linn
It is the process which determines the degree of attainment of learning
objectives by the learner
"It is the continuous process based upon criteria developed, concerned
with measurement of the performance of the learner, the effectiveness of
teachers and the quality of the program."
- Guilbert
Evaluation in Teaching and Learning
Evaluation is an integral part of any teaching and learning programme.
Whenever a question is asked and answered, evaluation takes place. Thus,
teaching and evaluation come together. In fact, it is not possible to
have teaching and learning without evaluation.
Both teaching and evaluation are based on the instructional objectives
which provide direction to them. Instructional objectives are desirable
behaviours which are to be achieved by the learners. Instruction is
given to achieve the instructional objectives; evaluation is done to
assess whether the objectives are achieved and to what extent. The
interrelationship of objectives, the instructional process or learning
experience, and evaluation in a programme of teaching can be expressed
more clearly through the following diagram:
[Figure: objectives, learning experiences and evaluation as an interdependent triangle]
The above diagram illustrates that the three components of teaching and
learning constitute an integrated network in which each component
depends on the other. Thus, through evaluation, the teacher not only
assesses the learner's achievement of the learning objectives but also
judges the effectiveness of the learning experiences, methodologies
means and the materials used for achieving those objectives.
Evaluation is an important aspect of any educational system. It is a
systematic process carried out in the classroom or school for providing
information for taking important decisions. A teacher should be well
acquainted with the concept of evaluation and the procedures used for evaluation
in order to make her/his teaching more purposeful and effective. The
teacher should know what are the objectives that are to be tested, what
techniques and tools are to be used for testing them most appropriately
and how to use evaluation for taking decisions.
Test, Measurement and Evaluation
The terms test, measurement and evaluation are interrelated and often
occur together. A test is a set of questions; measurement is the
assigning of numbers to the test results following particular rules (e.g.
counting correct answers); and evaluation involves value judgment.
The data collected during the evaluation process are only as good as the
measurements upon which the evaluations are based.
A test is a set of questions or a systematic procedure for measuring a
specific behaviour. Testing is only a technique to collect evidence
regarding learners' behaviour. It provides an answer to how well the learner
performed. Measurement is the process of obtaining a numerical
description of the degree to which an individual possesses a particular
characteristic. It is the process of quantifying the degree to which someone
or something possesses a given trait, i.e. a quality, characteristic or
feature. Measurement permits more precise, more objective description of
traits and facilitates comparisons. It answers the question of how much the
learner achieved. However, valid measurement in education is not easy.
Some elusive qualities or characteristics such as empathy, appreciation,
motivation or interest are difficult to measure. Compared to measurement
of physical characteristics such as height and weight, such measurement is
indirect and requires some scale. It is difficult to develop standards or
valid scales for measuring traits like intelligence, attitude, etc.
The term measurement is not synonymous with the administration of
paper-and-pencil tests. Data may be collected via processes such as
observation or the analysis and rating of a product. In some cases, the
required data may already be available.
Evaluation is a more comprehensive and inclusive term which includes
testing and measurement and also qualitative description of learner
behaviour. The term measurement is limited to quantitative description of
achievement; it does not include qualitative description or judgment of the
desirable worth or value of the result. Evaluation also includes value
judgment regarding the worth or desirability of the behaviour assessed.
Therefore, Gronlund (1981) has indicated this relationship in the following
equations:
Evaluation = Quantitative description of learners (measurement) + Value judgment
Evaluation = Qualitative description of learners (non-measurement) + Value judgment
Thus, evaluation may not be based on measurement alone but it goes
beyond the simple quantitative score. For example, if a student gets 60
marks in an English language test, that alone does not tell us whether his/her
achievement is satisfactory or not. It is only when we compare this mark
of 60 with the marks obtained by other students in the class or with
certain criteria laid down in advance, or with the learner's own marks in
previous tests, we are able to judge or evaluate whether his/her
achievement in English is satisfactory or not. Thus, a student's
achievement may be viewed at three different levels:
Self-referenced: how the student is progressing with reference to
himself/herself.
Criterion-referenced: how the student is progressing with reference to the
criteria set by the teacher.
Norm-referenced: how the student is progressing with reference to his/her
peer group.
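The three reference levels above lend themselves to a short illustration. The following Python sketch (all names, marks and criteria here are hypothetical, not taken from the text) interprets one student's mark in each of the three ways:

```python
# Illustrative sketch only: interpreting a single mark at the
# self-referenced, criterion-referenced and norm-referenced levels.
# All scores and the pass criterion are invented for the example.
def interpret(score, previous_scores, pass_criterion, class_scores):
    # Self-referenced: progress relative to the student's own average.
    self_ref = score - (sum(previous_scores) / len(previous_scores))
    # Criterion-referenced: does the mark meet the preset criterion?
    criterion_met = score >= pass_criterion
    # Norm-referenced: fraction of the class scoring below this student.
    norm_ref = sum(1 for s in class_scores if s < score) / len(class_scores)
    return {
        "self": f"{self_ref:+.1f} marks vs own average",
        "criterion": "met" if criterion_met else "not met",
        "norm": f"above {norm_ref:.0%} of the class",
    }

result = interpret(score=60, previous_scores=[50, 55],
                   pass_criterion=40, class_scores=[45, 60, 70, 55, 80])
print(result)
```

The same raw mark of 60 thus yields three different, equally legitimate judgments depending on the reference point chosen.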
Evaluation consists of finding out the extent to which objectives have been
attained. It requires the collection and analysis of data and the interpretation
of that data with respect to one or more criteria. The more objective the
criteria, the better the evaluation. Some degree of subjectivity is unavoidable
since people are involved in making the final judgment. Evaluation is not
merely concerned with determining whether something is 'good' or 'bad'.
Evaluation is used to:
determine the current status of the object of evaluation
compare the status with a set of standards or criteria
select an alternative in order to make decision
Purposes of Evaluation
To determine the level of knowledge and understanding of the
learner at various times.
To determine the level of the learners' performance at various
stages.
To diagnose each student's strengths and weaknesses and suggest
the remedial measures needed.
To encourage student's learning by measuring their achievement
and informing them of their success.
To become aware of the specific difficulties of individual students or
of an entire class as a basis for further teaching.
▸ To provide additional motivation through examinations that give
opportunities to practice critical thinking, apply principles and
make judgments.
To estimate the effectiveness of the teaching-learning techniques and
instructional media used by the teacher.
To gather information needed for administrative purposes such as
selecting students for placements, writing recommendations and
meeting graduation requirements.
To assess the achievement of the educational activities and improve
it accordingly.
To determine the achievement of educational goals.
Steps in Evaluation
The process of evaluation involves determining the types of data to be
collected, determining the individuals or groups from which data will be
obtained, collecting the data, analyzing the data, interpreting the data
and making decisions. The main steps are:
Develop the criteria of the educational objective
Develop measuring instruments
Use measuring instruments
Interpret measurement data
Formulate judgments and take appropriate action
Types of Evaluation
According to Use of Result
1. Placement Evaluation: Placement evaluation determines the
learner's performance (knowledge and skills) at the beginning of the
session, i.e. entry performance. It is used to determine whether the
learner has the knowledge and skills needed to begin the planned
instruction, or readiness for the new instruction. Pre-tests, readiness
tests, entrance examinations, aptitude tests, etc. are examples of
placement evaluation.
2. Formative Evaluation: This type of evaluation monitors
learning progress during instruction and provides continuous
feedback to the learners concerning successes and failures. Feedback
to learners acts as reinforcement of successful learning and
correction of learning errors. It also provides feedback to the
teacher for modifying his/her instruction and for prescribing
group or individual work. Therefore, it is directed towards improving
learning and instruction. Internal assessments and unit tests are
examples of formative evaluation.
3. Diagnostic Evaluation: This type of evaluation diagnoses
learning difficulties. The main purpose of diagnostic evaluation is to
find out the underlying cause of weaknesses in a student's learning.
It is a highly specialized procedure, concerned with the persistent or
recurrent learning difficulties that are left unresolved by the
standard corrective prescriptions of formative evaluation. If a learner
continues to experience failure in learning despite the use of prescribed
alternative methods of instruction, more detailed diagnostic evaluation
is indicated. It involves the use of individualized diagnostic tests as well
as various observational techniques. Some problems may require
psychological or medical remedial specialists. It is rarely used.
4. Summative Evaluation: Summative evaluation evaluates the
learners' achievement at the end of instruction. It is designed to
determine the extent to which the instructional objectives have
been achieved and is used primarily for assigning course grades or
for certifying the learners' mastery of the intended learning
outcomes. It may include evaluation of various types of
performance and products, such as research. This type of evaluation
also provides information for judging the appropriateness of the
course objectives and the effectiveness of instruction. A final
examination is an example of summative evaluation.
Types of Evaluation in Terms of Interpretation of Test Result
There are two basic ways of interpreting learners' performance.
1. Norm Referenced Test (NRT): The test describes performance
in terms of the relative position held in some known group. It helps
to find the position or rank a student holds compared to the group's
achievement.
2. Criterion Referenced Test (CRT): The test describes the specific
performance a learner can perform without reference to the
performance of others. It is designed to find out whether the
students meet predetermined criteria or objectives, without
comparison with the achievement of others in the group.
These two types of tests are viewed as the ends of a continuum rather
than a clear-cut dichotomy. The criterion referenced test, at one end,
emphasizes description of performance, and the norm referenced
test, at the other end, emphasizes discrimination among the learners.
Table 10.1: Differences between NRT and CRT
▸ NRT: assessment of rank or position. CRT: assessment of achievement
of previously set criteria.
▸ NRT: the standard is relative to the performance of some known group.
CRT: the standard is absolute.
▸ NRT: covers large domains of learning tasks, with just a few items
measuring each specific task. CRT: typically focuses on delimited domains
of learning tasks, with a large number of items measuring each specific task.
▸ NRT: favors items of average difficulty and omits easy items. CRT:
includes items matched to the domain, without altering item difficulty or
omitting easy items.
▸ NRT: primarily used for survey testing. CRT: primarily used for
mastery testing.
▸ NRT: interpretation requires a clearly defined group. CRT:
interpretation requires a clearly defined achievement domain.
Characteristics of an Evaluation Tool
Like any other measuring tool, an educational evaluation tool should
possess certain characteristics which ensure that it elicits
meaningful information. These desirable characteristics are as
follows.
1. Validity
Validity is the extent to which the test instrument measures what
it is actually intended to measure. It is always concerned with the
specific use of the results and the soundness of the proposed
interpretation. It answers the question of the extent to which
interpretations of the scores are appropriate, meaningful and useful
for the intended application of the results. Valid tools are relevant
and reliable. Relevant means the tool should test important and useful
learning outcomes. As an examination can only sample a few areas
and cannot test all desirable learning outcomes, it is very important
that only relevant and important learning outcomes are tested in an
examination.
It is vital for a test to be valid in order for the results to be
accurately applied and interpreted. Validity is the foremost
characteristic of an evaluation tool. It is a matter of degree and is
considered in terms of high validity, moderate validity and low
validity. For example, if we want to test a student's ability to conduct
a normal delivery, the valid test is to ask the student to conduct a
normal delivery, either on a simulator or in a real situation. Asking a
question in a written test about the technique of conducting a normal
delivery will not be a valid test of the skill of conducting a normal
delivery. No test is valid for all purposes; it is always specific to some
particular use or interpretation. The three types of validity are as
follows.
a. Content Validity
Content validity is concerned with how well the items on the test
represent the entire range of content the test should cover. It
means the sample of test items represents the learning areas to be
measured. Content validation is the process of determining the extent
to which a set of test items provides a relevant and representative
sample of the learning areas. To ensure content validity, the test
should be prepared considering the table of specifications. Content
validity is especially important in classroom tests.
Figure 10.3: Relationship between Validity and Reliability
b. Criterion Related Validity
Criterion related validity is defined as the process of determining the
extent to which test performance is related to some other valued
measure of performance. It predicts or estimates current performance
on some valued measure (criterion) other than the test itself. Based
on the time period of estimation, it is of two types: concurrent and
predictive validity.
▸ Concurrent validity is the extent to which the results of a test
correlate with those of other tests administered at the same time.
This indicates the extent to which the test scores accurately
estimate an individual's current state with regard to the criterion.
For example, a test that measures levels of depression would be
said to have concurrent validity if it measured the current
levels of depression experienced by the test taker.
▸ Predictive validity compares test scores with another measure of
performance obtained at a later date. It predicts the relationship
between the two measures over an extended period of time. Examples
of tests with predictive validity are career or aptitude tests, which are
helpful in determining who is likely to succeed or fail in certain
subjects or occupations.
c. Construct Validity
Construct validity is the extent to which the results of a test are
related to the data gained from the measurement of individual
behavior with regard to the construct in question. It demonstrates
an association between the test scores and the prediction of a
theoretical trait. Intelligence tests are one example of measurement
instruments that should have construct validity.
Factors Influencing Validity
Numerous factors may influence the test result.
i. Factors related to test itself
The test items should measure the subject matter, content and behavior
that the teacher is interested in testing. However, the following factors
can prevent the test items from functioning as intended and lower the
validity of the interpretation of the test scores.
Unclear direction: Directions that do not clearly indicate how to
respond to the items will tend to reduce validity.
Difficult vocabulary and sentence structure: Difficult and
complicated vocabulary and sentence structures may result in
different perceptions and interpretations of the test and may provide
invalid results.
Inappropriate level of difficulty of the test items: Failure to match the
difficulty specified by the learning outcome will lower validity.
Similarly, test items that are too easy or too difficult will not provide
reliable discrimination among students, which lowers validity.
▸ Poorly constructed test items: Test items that unintentionally
provide clues to the answer will tend to measure the student's
alertness in detecting clues in addition to the aspects of performance
they are intended to measure.
▸ Ambiguity: Ambiguous statements in test items contribute to
misinterpretation and confusion.
Inappropriate test items: Attempting to measure understanding,
thinking skills or writing skills with objective-type tests, or measuring
the performance or attitude domains with paper-pencil tests, will
provide invalid results.
Inadequate time limits: Time limits that do not give students
enough time to consider and provide thoughtful responses can
reduce the validity of the interpretation of the test results, since
students focus on speed of response rather than quality of response.
Improper arrangement of items: Test items should be arranged in order
of difficulty, with the easiest items first. Placing difficult items early in
the test may cause pupils to spend too much time on these and prevent
them from completing later items.
Test too short: A test is only a sample of the many questions that might
be asked. If a test is too short, it does not provide a representative
sample of the content to be covered, which lowers validity.
▸ Identifiable pattern of answers: Placing correct answers in some
systematic pattern, e.g. T, F, T, F in true-false questions or a
repeating sequence of letters in multiple-choice questions, will
enable students to guess the answers more easily and will lower validity.
ii. Factors Related to Test Administration and Scoring
The administration and scoring of a test may also have a detrimental
effect on the validity of the test results. Factors such as inadequate
time to complete the test, cheating during the examination and
unreliable scoring of essay answers tend to lower validity. Adverse
physical and psychological conditions at the time of testing may also
have a negative effect.
iii. Factors Related to Student's Response
Students' responses may be influenced by emotional disturbances like
stress, fear and anxiety. These and other factors restrict or alter
students' actual responses and so distort test results.
2. Reliability
Reliability is the consistency of measurement, or how consistent
test scores or other evaluation results are from one measurement to
another. A test is considered reliable if we get the same result
repeatedly. Reliability refers to the consistency of results obtained with
an evaluation instrument, so it is the reliability of the test scores rather
than of the test itself. Reliability is a necessary but not sufficient
condition for validity, so high reliability does not ensure high validity.
Reliability is primarily statistical. It is assessed by administering a test
more than once to the same group, or by administering equivalent
tests, and determining the consistency of the results. The methods of
estimating reliability are the test-retest method, the split-half method
and the equivalent-form method.
Test-retest Method
To estimate reliability by means of the test-retest method, the same
test is administered to the same group of students at two different
points in time. The test scores are correlated; if the scores are similar
on both occasions, the test is said to have high reliability. This is
also called stability, i.e. how stable the test results are. There should
be an optimum time interval between test and retest. If the interval is
too short, e.g. 1-2 days, students may remember some of the answers;
if too long, e.g. a year, their responses may change over time.
Equivalent Form Method (Parallel or Alternate Form)
The two forms of the test are administered to the same group of
students and the resulting scores are correlated. The two forms of the
test are built from the same set of specifications but are constructed
independently. This method indicates the degree to which both forms
of the test are measuring the same aspects of behavior. The two forms
may be administered at the same time or separated by a time interval;
the latter indicates both stability and equivalence.
Split Half Method
Split-half reliability can be estimated from a single administration of a
single form of a test. The test is administered to a group of students in
the usual manner and then divided into two halves for scoring
purposes; equivalent halves are usually obtained by scoring the odd-
and even-numbered items separately. The scores on the two halves are
correlated, which measures internal consistency. The degree to which
consistent results are obtained from the two halves indicates the
equivalence of the halves and the adequacy of the sampling.
Factors Influencing Reliability
Various factors affect the reliability of test scores. Consideration of
these factors will help to interpret reliability wisely and to construct
more reliable classroom tests.
Length of test: The longer the test, the higher its reliability will be. A
longer test provides a more adequate sample of the behavior to be
measured, and the scores are less distorted by chance factors such
as guessing. So it is important to make the test of optimum,
adequate length. If short tests are necessary because of time limits
or other factors, frequent testing should be used.
Spread of scores: Chance errors of measurement are smaller when the
spread of scores is high. Spreading marks over several questions, e.g.
2+4+4, is better than keeping the whole 10 marks in one question.
Difficulty of the test: In an easy test, the majority of the scores are at
the top end of the scale; in a difficult test, the scores are at the
bottom end. In both situations, differences among individuals are
small and tend to be unreliable. Ideal difficulty permits the scores to
spread over the full range of the scale, from zero to perfect, with an
average score of 50%.
Objectivity: The objectivity of a test refers to the degree to which
equally competent evaluators obtain the same scores.
Inconsistencies in scoring have an adverse effect on the reliability of
the scores obtained.
▸ Administration: It is essential that each student has the same time,
equipment, instructions, assistance and examination environment.
Test directions should be strictly enforced.
▸ Standards: The standards of performance that are established
for one class should be consistent with those used in other classes.
A change in grading policies that is not based upon uniform
standards will affect the reliability of test results.
▸ Instruction: The reliability of test results will be affected if the
instruction presented to a class tends to overemphasize the
teaching points included in the examination. This is often known as
"teaching to the test" and is undesirable. When the instructor gives
students obvious clues to the test requirements, it not only affects
the reliability of the test but also insults the intelligence of the class.
3. Objectivity
Objectivity refers to the degree to which equally competent evaluators
obtain the same result, because there is no room for the subjective
judgment of the scorer. It is the extent to which several independent
examiners agree on what constitutes an acceptable level of
performance. Test objectivity means that an individual's score is
similar regardless of who is doing the scoring. A test is objective when
the instructor's opinion, bias or individual judgment is not a
significant factor in scoring. Objectivity is a relative term. Multiple
choice questions are completely objective, while other evaluations like
essay questions, clinical and practical examinations and oral
examinations have varying degrees of objectivity.
If the possible difference between people scoring the same test is
high, that test is low in objectivity. Even in essay-type questions,
objectivity can be increased by careful phrasing of the questions
and by using a standard set of rules for scoring, which increases both
reliability and validity.
4. Usability/Practicability of the Evaluation Tool
It is concerned with the practicability of the test instrument, meaning
the overall simplicity of the instrument for the test constructor,
administrator and students. The practicability of an evaluation tool is
related to the following:
▸ Easy to administer.
▸ Requires optimum time for administration and can be managed
within the available class time.
▸ Requires reasonable and minimal resources to use.
▸ Easy to score.
▸ Clear score reports and relevant norms for applying the results.
▸ Relatively inexpensive.
▸ Can be administered and interpreted by teachers without special
training.
5. Discrimination
The test should be constructed in such a manner that it will
distinguish high achievers from low achievers, or detect and measure
small differences in achievement or attainment. The degree of
difficulty of the questions should be such that high-achieving and
low-achieving students can be distinguished. Often, as with
reliability, it is necessary to increase the length of the test to get
clear-cut discrimination. A discriminating test produces a wide range
of scores when administered to students who have significantly
different achievements. It includes items at all levels of difficulty:
some items will be answered correctly only by the best students,
while others will be relatively easy and will be answered correctly by
most students. If all students answer an item correctly, it lacks
discrimination.
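The idea of discrimination is often quantified with an item analysis. The following sketch (data invented; the upper-lower discrimination index D is a standard technique, though the text does not prescribe it) computes item difficulty as the proportion answering correctly, and D as the difference in success rates between the upper- and lower-scoring halves of the class:

```python
# Illustrative sketch (hypothetical data): item difficulty (p) and a
# simple upper-lower discrimination index (D) for one test item.
def item_statistics(item_correct, total_scores):
    """item_correct: 1/0 per student for this item;
    total_scores: each student's total test score."""
    n = len(item_correct)
    p = sum(item_correct) / n  # difficulty: proportion answering correctly

    # Rank students by total score, then split into upper and lower halves.
    ranked = sorted(zip(total_scores, item_correct), reverse=True)
    half = n // 2
    upper = [c for _, c in ranked[:half]]
    lower = [c for _, c in ranked[-half:]]

    # D = proportion correct in upper group minus lower group.
    d = sum(upper) / half - sum(lower) / half
    return p, d

item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]               # one item, ten students
totals = [90, 85, 80, 75, 70, 60, 55, 50, 45, 40]   # their total scores
p, d = item_statistics(item, totals)
print(f"difficulty p = {p:.2f}, discrimination D = {d:.2f}")
```

Here p = 0.50 (average difficulty) and D is clearly positive, meaning high achievers answered the item correctly more often than low achievers; an item answered correctly by everyone would give D = 0 and lack discrimination.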
General Principles of Evaluation
Evaluation is a process to determine the extent to which the
instructional objectives have been achieved by the students. As
mentioned earlier, there are different factors which can threaten the
quality of evaluation. The principles of evaluation are concerned with
minimizing those threats and achieving valid and reliable
measurement, so that the purposes of evaluation can be fulfilled.
1. Determine the expected outcomes (objectives) to be evaluated.
2. Select evaluation techniques appropriate for the measurement of
the expected outcomes.
3. Combine a variety of evaluation techniques to evaluate learner's
achievement in different areas, for more consistent and
comprehensive evaluation.
4. Distinguish the strengths and limitations of various evaluation
techniques to be able to use a particular evaluation device
meaningfully and effectively.
5. Ensure that critical intended learning behaviours are included as
the representative sample behaviors of the content areas.
6. Eliminate irrelevant barriers to the answer.
When constructing items for a classroom test, care must be taken to
eliminate any extraneous factors that might prevent learners from
responding. Learners should not answer test items incorrectly merely
because the sentence structure was too complex, the vocabulary too
difficult, the directions unclear and so forth. Therefore:
avoid ambiguous statements.
avoid use of difficult words and complex sentence structure.
give clear instruction/direction.
7. Ensure an appropriate structure of the test to facilitate the students'
response to the test items:
Maintain proper item difficulty.
Arrange test items with appropriate level of difficulty.
Use appropriate length of test.
Avoid unintended clues to the answer by avoiding grammatical
inconsistencies, verbal associations, and patterns in the length and
location of correct responses.
8. Select and use evaluation techniques which are practical.
9. Ensure proper use of evaluation.
10.2 Types of Educational Measurement and Evaluation
As health professional education has both theoretical and
clinical aspects of learning, evaluation should also be done from both
perspectives.
1. Classroom Test
It includes paper-pencil tests to evaluate the achievement of learners
in classroom learning. It includes both subjective and objective types
of test.
Subjective Type
Extended response or essay type
Restricted response or short response type
Objective Type
▸ Supply type
Short answer type
Completion type
▸ Selection type
Alternate response type
Matching type
Multiple-choice question
2. Clinical Evaluation
Learning achievement in clinical learning, such as skills, attitudes and
behavioural change in personal and social aspects, should be
evaluated by the following methods.
a. Observational Technique
Rating scale
Checklist
Anecdotal record
b. Written report
Case studies
Written report
c. Practical examination
Objective structured practical examination (OSPE)
Objective structured clinical examination (OSCE)
d. Viva voce (oral examination)
10.2.1 Classroom Evaluation
Classroom tests play a central role in the evaluation of students'
learning. They provide relevant measurement of many important
learning outcomes such as knowledge, understanding, analysis and
thinking skills. The validity of the information provided by evaluation
depends on the proper planning and preparation of the test.
Planning the Classroom Test
For the preparation of a valid, reliable and useful classroom test, the
following planning steps should be followed:
1. Determine the purpose of testing.
2. Develop the test specifications.
3. Select appropriate item types.
4. Prepare relevant test items.
5. Assemble the test.
6. Administer the test.
7. Appraise the test.
8. Use the results for improved learning and instruction.
1. Determine the Purpose of Testing
Classroom tests can be used for a variety of instructional purposes.
Tests may be done to assess the prerequisite knowledge and skills
needed for the instruction, the extent of learners' achievement of
the learning outcomes, and the effectiveness of planned instruction,
and to assign grades or certificates to the learners. While planning the
test, the evaluator should consider the purpose of the test.
2. Develop the Test Specifications
To ensure that a classroom test will measure a representative
sample of the relevant content, test specifications should be used.
A test specification is a two-way chart which includes a list of
instructional objectives and an outline of the course content.
The list of objectives describes the type of performance expected of
the learner. It should be limited to those learning outcomes that can
be measured by a classroom test, excluding those of the
psychomotor and affective domains. After preparing the objectives,
select the content to be covered by the test. The content outline
indicates the areas in which performance is to be measured.
Content should be detailed enough to ensure adequate
(representative) sampling during test construction and proper
interpretation of the results.
Based on the objectives and contents, a two-way table is built that
relates the instructional objectives to the content. It clearly
determines the relative emphasis to be given to each objective and
content area and specifies the weightage (mark allotment) for each
area.
The table is prepared using the following steps:
▸ List the general instructional objectives across the top of the table.
▸ List the major content areas down the left side of the table.
▸ Determine what proportion of the test items should be devoted to
each objective and each content area.
3. Selecting Appropriate Test Item Types
Classroom tests are of two types: highly structured objective items
and subjective essay questions. Each type should be used where it is
most appropriate to measure the learning outcome, considering the
strengths and limitations of each item type. A basic principle in
selecting the type of test item is that the test should provide the
most direct measurement of the learning outcome. It is better to use
more than one type in a single test.
4. Preparing/Constructing Relevant Test Items
In preparing a set of items for a valid measurement test, there are
some general rules that apply to all item types, although the specific
rules for writing each item type differ.
Design the test items considering the intended learning outcomes
(instructional objectives).
Use the table of specifications as a basis for constructing test items,
to make them more relevant. It also provides a representative
sample of the various areas of course content, as all the course
content cannot be included in a test.
Write each item clearly by carefully formulating the question, using
simple, direct language, using correct grammar and punctuation,
and avoiding unnecessary wording.
Write each test item at an appropriate level, with simple vocabulary
and sentence structure.
Write each test item in such a way that it does not provide clues
to other items in the test. Unless care is taken during item writing,
one item may provide information that is useful in answering
another item.
Write each test item at a proper level of difficulty according to the
purpose of the test. Do not increase difficulty by adding
unimportant or irrelevant material. Evaluation should encourage
learners to concentrate their efforts on important learning content
and useful work or professional activities.
Consider an appropriate length of test to obtain a representative
sample. Although there are no hard and fast rules for determining
test length, the test should be long enough to provide adequate
sampling of each objective and content area.
Decide on the model answers to questions. Use answer keys or
checklists for scoring.
Write each test item in such a way that the answer is one that
experts agree is the best answer.
Write more test items than needed, which will permit the weaker
items to be discarded during later review.
Write test items well in advance of the testing date, which provides
time for reviewing the items to correct defects.
5. Assembling and Reviewing the Classroom Test
After selecting and constructing the most appropriate items
relevant to the learning outcomes, they must be assembled into a
test and directions prepared for administering it. Beyond the
validity built in during construction of the test items, systematic
procedures of assembling, administration and scoring provide
greater assurance that the items will function as intended.
Assembling the classroom test includes reviewing, editing, revising,
arranging and rearranging the test items as needed.
a. Reviewing and Editing the Test
Even though construction is done carefully, there may be unwanted
errors like unnoticed clues, ambiguity and other defects. So the test
should be reviewed, viewing the items from the learners' viewpoint
as well as from that of the test maker. Consider the following while
reviewing.
Use an item format appropriate for the learning outcome being
measured. For example, if the learning outcome is to know
definitions, facts or principles, the short answer type is appropriate;
if it is only to identify, the selection type is appropriate.
Carefully review the test to avoid ambiguity, inappropriate
vocabulary and difficult sentence structure that may have gone
unnoticed during construction. The difficulty of the vocabulary and
the complexity of the sentence structure must be assessed
according to the learners' maturity level.
Analyze the content of each item to determine the functional
elements for the correct response, without extra or unrelated
elements.
Ensure appropriate item difficulty, considering the nature of the test
and the educational background of the learners. For motivational
purposes, some items at the beginning of the test should be easy,
but not so easy that everyone answers them correctly. Similarly,
none of the test items should be so difficult that everyone misses it.
Items should be planned so that the test discriminates between high
and low achievers.
Make sure that the test items are free from technical errors and
irrelevant clues such as grammatical inconsistencies, verbal
association, length of statements etc. Most of these clues can be
removed during item review.
b. General Guidelines for the Organization and Formatting of Tests
i. Organization/Arrangement of the Test Items: The test items can
be arranged by systematic consideration of the types of items used,
the difficulty of the items and the subject matter measured.
If there is more than one type of item, items of the same type
should be kept together. Grouping items by type simplifies the
directions given to the students. When two or more item types are
included in the test, they are arranged in a simple-to-complex
sequence, such as true-false items, matching items, multiple choice
questions, short answer items and essay questions.
If the items are arranged according to their level of difficulty, they
are organized from easiest to most difficult, such as knowledge of
terms, knowledge of principles and application of principles. Such an
arrangement benefits average and below-average students,
because it allows them to use their time more efficiently.
Another logical way to arrange items is by subject matter topic. All
items related to the same objective or outcome should be grouped
together.
ii. Preparing General Directions for the Test: The pattern of detail for
each of these directions depends mainly on the learners' age and
level, the complexity of the test, and their experience with the
testing procedure used.
If the purpose and importance of the test are not self-evident, they
should be discussed with the students in order to orient them and
motivate them to do their best.
In addition to directions for selecting (or supplying) answers,
students should be instructed how to record the answers.
If directions are complicated, or involve procedures unfamiliar to the
students, an example or sample item should be presented to the
students.
If test taking time is not ample, students should be given some
guidelines for using their time effectively.
If there is no penalty for guessing, students should be encouraged
to answer every item, even if they are not sure of the correct
answer. If there is a penalty for guessing, students should be
instructed to respond only to those items for which they are
reasonably sure of the correct answer.
iii. Format Considerations to Reproduce the Test: After preparation
of the test material for reproduction, the items should be arranged
and spaced so that they can be read, answered and scored with the
least amount of difficulty. Therefore, consider the following:
▸ Single items should not be split across two pages.
▸ Alternatives of an MCQ should be in a vertical column beneath the
stem rather than across the page.
▸ Crowding too many test items onto one page should be avoided to
minimize confusion and time consumed in administration and
scoring.
▸ Items should be numbered consecutively, and directions should be
prominently positioned and separated from the actual items.
▸ Answer spaces should be of equal size and large enough to
accommodate the longest response, especially on short-answer
tests. Answer spaces should be placed in a vertical column in the
right or left hand margin, or a separate answer sheet may be
used.
▸ If an item is accompanied by any type of illustration, such as a
diagram, it should be accurate and placed adjacent to the item,
directly above it if possible or parallel to it.
After reproduction of the test paper, the entire test should be
proofread before administration.
6. Administering and Scoring the Classroom Test
Administration and scoring of the test are concerned with:
▸ Providing optimum conditions for obtaining learners' responses.
▸ Selecting convenient and accurate procedures for scoring the
results.
a. Administering the Test
Administering the test means creating an environment conducive to
the learners' best efforts and controlling factors that might interfere
with valid measurement. A conducive environment includes physical
conditions such as comfortable and adequate work space, seating
arrangement and a quiet, calm environment, as well as
psychological conditions like avoidance of excessive anxiety,
distraction and confusion. The time of testing can also influence the
results.
Guidelines for Administering a Classroom Test
i. Do not talk unnecessarily before the test: Learners are mentally
set for the test and will ignore anything not pertaining to it, for fear
it will hinder their recall of information needed to answer the
questions; so it is not useful to talk about assignments, the next
topic and so forth just before the test.
ii. Keep interruptions to a minimum during the test: If a learner asks
about an ambiguous item, it is beneficial to explain the item to the
entire group at the same time. Such interruptions are necessary but
should be kept to a minimum. All other distractions outside and
inside the classroom should be eliminated as much as possible.
iii. Avoid giving hints to learners who ask about individual items: If
an item is ambiguous, it should be clarified for the entire group as
indicated earlier. Otherwise abstain from helping students to answer
it, as giving unfair aid to some learners decreases the validity of the
results and lowers class morale.
iv. Discourage cheating: Discourage cheating by special seating
arrangements and careful supervision. Receiving unauthorized help
from another student during a test has the same deleterious effect
on validity and class morale as receiving special hints from the
teacher. For valid results, learners' scores must be based on their
own unaided efforts.
b. Scoring the Test
In scoring an objective test, each correct answer is counted as one
point to arrive at the final score. Scoring of answers to subjective
questions is done based on coverage of the key areas mentioned in
the model answer. For all types of classroom test, it is necessary to
score a student's performance without knowing whose test it is.
This avoids personal bias in the scoring process.
7. Appraising Classroom Tests
Before a classroom test is administered, it should be evaluated
according to the points discussed earlier. After the test, scoring is
done and the results are discussed with the learners. In addition,
the institution and its curriculum should decide what percentage of
achievement is required to pass the examination; this is especially
needed with criterion-referenced assessment. After all this, it is
useful to appraise the effectiveness of the test items and to keep a
record for future use.
Construction of Classroom Tests
A. Constructing Classroom Test: Objective Type Questions
Objective tests are tests in which students are not required to
compose long responses but rather to select from among a number
of given alternatives or to supply a word, symbol, number, etc.
They are more valid and reliable than essay items, and scorer
reliability is much higher. Objective tests can be used to measure
objectives at all levels, and are not confined to knowledge-level
items, as argued by many.
1. Supply Type Items
Supply type items can be answered by supplying a word, phrase,
number or symbol. They can be of two types: short answer type and
completion type. These items are suitable for measuring a wide
variety of relatively simple learning outcomes, such as interpreting
diagrams, charts, graphs and pictorial data. In comparison to
selection type test items, the pupil must actually provide the
answer, so the chance of guesswork is less.
Short answer type: It is formulated in direct question form with a
blank space at the end of the question.
Example:
How many systems are there in the human body? ______
What is the smallest bone in the human body? ______
Completion type: The question is phrased as an incomplete
sentence and the examinee has to complete it by filling the blank
space with a word, phrase, number or symbol.
Example:
The number of systems in the human body is ______.
The smallest bone of the human body is ______.
Strengths
It is easy to construct as it measures relatively simple learning
outcomes.
As the student has to supply the answer, it reduces the chance of
guesswork.
Limitations
It can measure only recall of facts or simple learning outcomes.
Unless the question is very carefully phrased, it can create confusion
about the expected answer.
Guidelines for Constructing Short Answer Questions
Although it is considered easy to construct short answer items, the
following suggestions will help to avoid possible errors.
Word the item in such a way that the required answer is both brief
and specific.
Do not take the statements directly from textbooks to formulate
short answer items
A direct question is generally more desirable than an incomplete
statement. The direct question form is more natural to the students,
and it also minimizes ambiguity more than incomplete statements
do.
Blanks for answers should be equal in length and in a column to the
right of the question. If blanks are kept equal in length, the length of
the blank space does not supply a clue to the answer.
When completion items are used, do not include too many blanks. If
there are too many blanks, the meaning will be lost and the chance
of guesswork will increase.
If the answer is to be expressed in numerical units, indicate the type
of unit in which the answer is wanted.
2. Alternative Response or True-False Type Test Item
Alternative response type items consist of a declarative statement
to which the learner has to respond by marking true/false,
right/wrong, yes/no, correct/incorrect, agree/disagree, etc. In each
item, there are only two possible answers. The name true-false item
is used because the true-false option is the most common.
Uses
It is useful to measure the ability of students to identify definitions
of terms, statements of facts, principles, etc. It helps to develop
critical thinking on a topic. It is also used to assess the ability to
recognize cause and effect relationships.
Strengths
▸ This type of test is relatively easy to construct in comparison to
multiple choice questions.
▸ It takes less time for scoring.
▸ It can obtain a relatively wider sampling of the content area.
▸ Students can respond to a larger number of questions in limited
time.
▸ It can measure higher cognitive capability if constructed with
great care.
Limitations
▸ Simple true-false items can measure only low cognitive
achievement (rote memory).
▸ There is a fifty-fifty chance of selecting the correct answer by
chance, as there are only two alternatives. Overall, there is a more
than fifty percent chance of guesswork, as the statement may
include some unintended clues.
▸ The validity and reliability of the test item are low because of the
higher chance of guesswork; students may guess different answers
in different tests.
▸ Response validity is questionable because of response set. A
learner may tend to follow a certain pattern in responding to test
items; for example, some learners consistently mark true those
items they do not know, while others consistently mark false. If the
test favours the response set, the obtained test score is irrelevant to
the purpose of the test.
Because of these limitations, this item type is used when other
items are inappropriate for measuring the achievement. This
includes situations in which only two alternatives are possible and
when distinguishing fact from opinion, cause from effect, or relevant
from non-relevant information, etc.
Guidelines to Write True-False Questions
The main consideration in constructing true-false items is
formulating statements free from ambiguity and irrelevant clues.
General guidelines for construction are as follows:
▸ Avoid using broad general statements to be judged true or false.
Broad generalizations are usually false or must be qualified by
specific words, and the qualifying words provide clues to the
answer. Words often found in false statements are only, never, all,
every, none and no. Those often found in true statements are
usually, generally, sometimes, customarily, often, may, could and
frequently.
Example:
Heart disease is generally common among the middle age group. T F
Penicillin is an effective drug for the treatment of all pneumonia. T F
▸ Avoid using trivial statements that have little learning
significance. Such items cause students to direct their attention
toward memorizing finer points instead of general knowledge and
understanding.
Example:
Poor: Liver is an organ of the body. T F
Better: Liver is the largest gland of the body. T F
▸ The language of true-false test items should be clear, concise and
understandable. Avoid words such as more, few, large and good,
because these are relative and may confuse the students.
▸ Avoid using negative statements as much as possible, because
students overlook negative words such as no, not and except. If a
negative word must be used, it should be underlined, bolded or put
in italics so that students do not miss it.
Example:
Poor: Resistance to smallpox obtained through the use of vaccine is
not called active immunity. T F
Better: Resistance to smallpox obtained through the use of vaccine
is called passive immunity. T F
The use of double negative words especially may contribute to
ambiguity of the statement.
Poor: None of the steps in the experiment was unnecessary. T F
Better: All of the steps in the experiment were necessary. T F
▸ Keep the wording of false statements similar to that of true
statements, and make both true and false statements of
approximately equal length. Usually true statements are relatively
longer, because such statements must be phrased to make them
absolutely correct, and this can provide a possible clue to the
correct answer. To overcome it, false statements can be lengthened
through the use of qualifying phrases.
▸ Try to make true and false statements approximately equal in
number to avoid guesswork. Approximately equal means keeping
the ratio of true to false statements between about 40 and 60
percent. This will prevent response sets from unduly raising or
lowering learners' scores.
▸ Avoid using two ideas in one statement unless cause and effect
relationships are being measured.
▸ Avoid using long, wordy and complex sentences in true-false
items.
Poor: It is possible to determine whether a solution is acid by the red
color formed on litmus paper when it is inserted into the solution. T F
Better: Litmus paper turns red in an acid solution. T F
Poor: Insulin, which is secreted by beta cells of the pancreas, is
essential for glucose metabolism. T F
Better: Insulin is essential for glucose metabolism. T F
▸ Ensure unambiguous statements.
Poor: Drying is frequently used to preserve food. T F
Better: Fruits can be preserved by drying. T F
Variations in Questions
a. Cause and effect relationship items are designed to test students'
higher cognitive abilities.
Example: Harmful organisms will be killed if they are boiled. T F
b. Grouping of items can cover a large content area.
Example: Which of the following diseases are caused by viruses?
a. Chickenpox T F
b. Measles T F
c. Diphtheria T F
d. Typhoid T F
e. Malaria T F
f. Influenza T F
Give clear directions about where and how to respond to the
questions, for example:
Given below are some statements. If you agree with the statement,
circle the letter 'T' and if you disagree with the statement, circle the
'F'.
3. Matching Type Test Item
The matching type item consists of two parallel columns, with each
word, number or symbol in one column to be matched with a word,
sentence or phrase in the other column. The items in the left-hand
column for which a match is sought are called premises; the
sentences, phrases, numbers or words that match the premises in
the other column are called responses.
Matching type items are used to measure factual information based
on simple association. They are appropriate if a sufficient number of
homogeneous premises and responses (associations) are to be
assessed. They are also sometimes used with pictorial material,
relating pictures and words. Some important relationships for
matching type items are terms and definitions, rules and examples,
parts and functions, scientists and inventions, equipment and uses,
principles and illustrations, etc.
Strengths
▸ It is in compact form, so it is possible to cover a larger sample of
factual matter in a relatively short period.
▸ It is relatively easy and less time consuming to construct.
▸ Students can respond to a larger number of questions in a short
time.
▸ It requires less time for scoring.
Limitations
▸ It cannot measure higher cognitive achievement.
▸ Responses may lack plausibility, which provides clues to the
correct answer.
▸ Sometimes it may be difficult to find significant homogeneous
content for the learning outcome.
Guidelines to Construct Matching Type Items
The following suggestions are useful for constructing effective
matching type items:
▸ Keep premises on the left side and responses on the right side.
▸ Use only homogeneous material in one set.
▸ Keep the premise items brief and place the shorter responses on
the right side. A brief list enables learners to read the longer
premises first and then find the responses rapidly. Approximately
4-7 items are kept in one set of a matching test; it should not be
more than ten items.
▸ Include more responses than premises. When an equal number of
responses and premises are used and each response is used only
once, the probability of guessing the remaining correct responses
increases with each selection, and the last response can be selected
entirely by the process of elimination.
▸ Arrange the list of responses in logical order (alphabetical,
numbers in sequence) so that it is easier to search for the answer.
This will also prevent learners from detecting possible clues in the
arrangement of the responses.
▸ Provide specific directions. It should be clear to the examinee
whether each response may be used once or more than once, and
what they have to do, such as draw lines between matched items or
transfer letters.
▸ Keep the set of items for one exercise on the same page. This
prevents the disturbance created by switching the pages of the test
back and forth and maintains speed and efficiency.
Example: Matching Type Question
Direction: Match the items in the left-hand column with the
responses in the right-hand column by writing the letter of the
correct answer in front of the item number. Each option can be used
only once.
1. Number of permanent teeth      a. crown
2. Vitamin C                      b. jaw
3. Vitamin D                      c. hardest substance
4. Molars                         d. citrus fruit
5. Enamel                         e. 32
6. Pulps                          f. citrus fruit
7. Visible part of tooth          g. sunshine
                                  h. crown
                                  i. 28
4. Multiple Choice Questions (MCQ)
Multiple choice questions are the most widely used objective test
items. Like other types of objective questions they can measure
many simple learning outcomes, and they can also measure a
variety of more complex outcomes in the knowledge, understanding
and application areas. They are also free of many of the limitations
of other forms of objective items, so they are used extensively in
achievement tests.
An MCQ item consists of a problem and a list of suggested
alternatives. The problem may be presented as a direct question or
an incomplete statement and is called the stem. The direct question
form is easier to write, more natural and presents the problem more
clearly, whereas the incomplete statement is more concise;
however, the question form is preferable. The list of suggested
solutions is called alternatives (choices, options) and may include
words, numbers, symbols and phrases. The learner is requested to
read the stem and alternatives and select the one best/correct
answer. The correct option is called the answer and the remaining
ones are called distracters.
Example:
What is the name of the membrane covering the lungs?
a. Alveoli
b. Peritoneum
c. Pleura
d. Serosa
Or: The name of the membrane covering the lungs is ______.
a. Alveoli
b. Peritoneum
c. Pleura
d. Serosa
Alternatives in MCQ
Five alternatives give a 20% chance of guessing, while four
alternatives give a 25% chance; so more alternatives reduce the
chance of guesswork. The most common format is the four-response
item. Plausible distracters reduce the chance of guesswork and
enhance reliability and validity, but time and effort are needed to
make good and effective alternatives.
Uses
The MCQ is the most versatile type of test item. It can measure
simple to complex learning outcomes in most subject matter
content, so many standardized tests use multiple choice items
exclusively.
It can measure knowledge outcomes at different levels, such as
meanings of terms, specific facts and principles. It can also measure
knowledge of methods and procedures used for problem solving,
common practices and specific procedures.
It is especially adaptable for measuring various aspects of
understanding and application which other objective questions
cannot measure. It can identify application of facts and principles,
sequences of steps, correct methods of carrying out procedures,
and interpretation of various relationships among facts, especially
cause and effect relationships.
Strengths
▸ It has greater objectivity, validity and reliability, as the chance of
guesswork is reduced with more alternatives.
▸ It can measure learning outcomes at various levels of knowledge
in varied subject areas; if constructed carefully, it can measure
higher-level cognition.
▸ It can cover a large content area.
▸ Students can respond quickly and accurately.
▸ It is easy to administer and score, and takes less time to score.
▸ It provides feedback on achievement for both students and
teachers.
Limitations
▸ It takes more time and effort to construct valid MCQs.
▸ It is difficult to construct realistic distracters, and unrealistic
distracters reduce validity and reliability.
▸ It is difficult to measure higher mental processes like organization
of thought, problem-solving skill, writing skill, etc.
▸ It is costly, especially if the number of students is small.
Guidelines/Directions to Make MCQs
1. Make sure that the stem is meaningful by itself and presents a
definite problem. Whether the stem is an incomplete statement or a
direct question, its meaning should be clear enough to select among
the alternatives following it.
2. Use precise, clear, simple and unambiguous language to
formulate the items.
3. Use only one correct answer, or the best answer if it is a
single-response type.
4. The item stem should eliminate all unrelated details, so that the
reading time required is reduced and the problem in the stem is
easily understood.
Poor: One of the growth parameters of infants is the eruption of
teeth, which are also called deciduous teeth. When does deciduous
teeth eruption occur in an infant?
a. 4-5 months
b. 6-8 months
c. 8-10 months
d. 10-12 months
Better: When does deciduous teeth eruption occur in an infant?
a. 4-5 months
b. 6-8 months
c. 8-10 months
d. 10-12 months
5. Base each item on a single clearly defined problem and include
the main idea in the question. The students must know what the
problem is before reading the response options.
Example: A patient has high fever, headache and bradycardia; what
can be the probable diagnosis?
a. Meningitis
b. Malaria
c. Pneumonia
d. Enteric fever
Usually fever causes tachycardia, but in a rare condition among the
responses fever causes bradycardia.
6. Avoid negative statements as much as possible. If one must be
used, underline it or make it italic.
Poor:
Colostrum is not a source of
a. IgM
b. IgG
c. IgE
d. IgA
Better: Colostrum is a rich source of
a. IgM
b. IgG
c. IgE
d. IgA
7. Use plausible or logical distracters. Write options that are
homogeneous in content, avoiding unrelated or senseless
distracters.
Poor: What is the amount of potassium in oral rehydration solution?
a. 0.5 grams
b. 1.5 grams
c. 5 grams
d. 10 grams
Better: What is the amount of potassium in oral rehydration
solution?
a. 1.5 grams
b. 2.0 grams
c. 2.4 grams
d. 2.6 grams
8. Avoid the use of clues to the correct answer elsewhere in the
test.
Which one of the following is used to prevent polio?
a. Gamma globulin
b. Penicillin
c. Salk vaccine
d. Sulpha drug
The name of the vaccine makes it easier to guess the prevention of
the disease.
9. Avoid making the correct answer longer or shorter than the other
options; the relative length of the alternatives should not provide a
clue to the answer.
10. Include in the stem any word that is repeated in each response.
Poor: In enteric fever of 5 days duration:
a. There will be splenomegaly
b. There will be tachycardia
C. There will be positive blood culture
d. There will be positive stool culture
Better: In enteric fever of 5 days duration, there will be:
a. splenomegaly
b. tachycardia
c. positive blood culture
d. positive stool culture.
11. Make responses grammatically consistent with the stem and
parallel with one another. Otherwise, learners are more likely to find
the correct answer simply by picking the grammatically correct
option.
Poor: Nursing measure to prevent transmission of infection to a child
with leukemia is:
a. keeping the child in reverse isolation.
b. avoid performing invasive procedure.
c. transfuse blood.
d. maintenance of oral hygiene.
Better: Nursing measure to prevent transmission of infection to a
child with leukemia is:
a. keep the child in reverse isolation
b. avoid invasive procedures
c. transfuse blood
d. maintain oral hygiene
12. Give clear instructions for the learners, such as: Questions 1-10
are multiple choice questions. Please read each question carefully
and encircle the best answer.
13. The correct answer should appear in each of the alternative
positions an approximately equal number of times and in random
order.
14. Avoid "all of the above" and "none of the above" as much as
possible.
15. Avoid the use of MCQs when other item types are more
appropriate, such as when distracters are limited or when assessing
problem solving and creativity.
s.
t. B. Constructing Class Room Test: Subjective Type of Question
u.
v. 1. Essay Type Question
w.
Despite the wide applicability of objective type questions, there are some significant instructional outcomes for which no satisfactory objective measurement has been developed. These include outcomes such as the ability to recall, organize and integrate ideas, and the ability to express oneself in writing. For the measurement of these outcomes, the essay question is most useful.
The distinctive feature of the essay question is the freedom of response. The learners are free to select, relate and present ideas in their own words. Essay questions should be used especially to measure those learning outcomes that cannot be measured by objective test items, namely the abilities to select, organize, integrate, relate and evaluate ideas. Although scoring is a difficult and time consuming task, the essay question has educational significance.
Multiple-choice questions, matching exercises and true-false items require students to choose an answer from a list of possibilities, whereas essay questions require students to compose their own answer. Essay questions differ from these other items because they require more systematic and in-depth thinking. An essay question is "a test item which requires a response composed by the examinee, usually in the form of one or more sentences, of a nature that no single response or pattern of responses can be listed as correct, and the accuracy and quality of which can be judged subjectively only by one skilled or informed in the subject." - M. Stalnaker
Based on Stalnaker's definition, an essay question should meet the following criteria:
▸ Requires examinees to compose rather than select their response.
▸ Elicits student responses that must consist of more than one sentence.
▸ Allows different or original responses or patterns of responses. It provides students the opportunity to think and to give a variety of answers, so answers may vary in length, structure, etc.
▸ Requires subjective judgment by a competent specialist to judge the correctness, completeness, accuracy and quality of responses.
There are two major purposes for using essay questions. One purpose is to assess students' understanding of subject-matter content. The other is to assess students' writing abilities.
As the degree of freedom of response tends to fall along a continuum between two extremes, essay questions can be conveniently classified into two types:
i. The restricted response type
ii. The extended response type
i. The Restricted Response Type (Short Response Type)
The restricted response question is the more structured type of essay question. It is useful for measuring learning outcomes requiring the interpretation and application of data in a specific area.
The restricted response question usually limits both the content and the response. The question is formulated based on a specific problem. This type of question can measure a variety of complex learning outcomes similar to those measured by objective type items. The main difference is that objective type items require learners to select the answer, whereas the restricted response question requires learners to supply the answer. In some situations, objective types are preferred because of ease and reliability of scoring; in other situations the restricted response question is better because it gives a more valid result for the learning outcome.
The short response type question makes it possible to measure specific learning outcomes, but the restriction makes it less valuable for measuring learning outcomes related to integration, organization and originality.
ii. The Extended Response Type
The extended response type question allows learners to select any factual information that they think relevant, to organize the answer in accordance with their judgment, and to integrate and evaluate ideas as they think appropriate. This freedom enables learners to demonstrate their abilities to select, organize, integrate and evaluate ideas. On the other hand, this freedom makes the extended response question inefficient for measuring more specific learning outcomes and introduces scoring difficulties (low reliability) which severely restrict its use.
Strengths
▸ Assess higher-order or critical thinking skills: Essay questions provide an effective way of assessing complex learning outcomes that cannot be effectively assessed by other commonly used paper-and-pencil assessment procedures. In fact, some of the most complicated thinking processes can only be assessed through essay questions when a paper-and-pencil test is necessary. The question must be phrased so as to produce the desired behavior.
▸ Evaluate student thinking and reasoning: This type of question emphasizes effective measurement of skills such as reasoning, integration, application and problem solving, which other types of questions cannot do effectively.
▸ Provide authentic experience: Constructed responses are closer to real life than selected responses. Problem solving and decision-making are vital life competencies. In most cases these skills require the ability to construct a solution or decision rather than select one from a limited set of possibilities.
▸ It is easy to construct essay type questions, which has led to their widespread use by classroom teachers.
▸ As the learners have to present their answer in their own handwriting, it improves writing skill (written expression) and handwriting.
Limitations
▸ The most important limitation is the unreliability of the scoring. Answers to an essay question are likely to be scored differently by different teachers (inter-scorer reliability), and even the same teacher may score the answer differently at different times (intra-scorer reliability), especially if responses are evaluated without adequate attention to the learning outcomes to be achieved. So, the score obtained by an essay question may be less valid than that of an objective type question.
▸ Essay questions assess a limited sample of the subject matter, thereby reducing content validity. Due to the time it takes for students to respond, the number of essay questions that can be included in a test is limited. A test of 80 multiple-choice questions will most likely cover a wider range of content than a test of 8 essay questions.
▸ Essay questions are difficult and time consuming to score. Scoring takes considerable time if it is done thoroughly and with helpful comments. If the classes are large and several questions are used, thorough scoring becomes practically impossible.
▸ As essay questions allow students some latitude in formulating their responses, they may write everything they know rather than what is exactly asked.
▸ Sometimes, scoring by the examiner may be influenced by the quality of handwriting and the volume of writing.
Guidelines for Constructing Essay Questions
If essay questions are to be used as valid and reliable measurements, attention should be focused on two areas: construction of the question and scoring of the answer. However, these two procedures are interrelated. Here are some suggestions for formulating effective essay type questions.
1. Avoid using essay questions for intended learning outcomes that are better assessed with other kinds of assessment.
Some types of learning outcomes can be more efficiently and more reliably assessed with objective type questions than with essay questions. In addition, some complex learning outcomes can be more directly assessed with performance assessment than with essay questions. Since essay questions sample a limited range of subject-matter content, are more time consuming to score, and involve greater subjectivity in scoring, their use should be reserved for learning outcomes that cannot be better assessed by some other means.
2. Clearly define the intended learning outcome to be assessed.
Knowing the intended learning outcome is crucial for formulating the question, as it clarifies the performance that students should be able to demonstrate as a result of what they have learned.
3. Formulate the question in such a way that the expected behavior of the learner is clearly indicated.
Each essay type question should be carefully and thoughtfully constructed to elicit the particular aspects of behavior indicated by the objectives. Phrase each question so that the learner's task is clearly indicated. This is done by specifying the performance expected of the learner and giving explicit direction with a directive verb concerning the type of response desired. For example, the verb 'write' may not be clear about the length of answer expected, compared with 'list', 'state' or 'explain'.
4. Specify the relative weightage and the approximate time limit in clear directions.
Specifying the weightage helps students allocate their time when answering several essay questions.
5. Use several relatively short essay questions rather than one long one.
Only a very limited number of essay questions can be included on a test, which creates a challenge in designing valid essay questions. Shorter essay questions are better suited to assessing the depth of student learning within a particular content area, whereas longer essay questions are better suited to assessing the breadth of student learning within a subject. Using several short questions makes it possible to assess both the breadth and the depth of student learning within the same subject.
6. Avoid the use of optional questions.
Students should not be permitted to choose one essay question to answer from two or more optional questions. The use of optional questions should be avoided for the following reasons:
▸ Students may waste time deciding on an option.
▸ Some questions are likely to be harder, which could make the comparative assessment of students' abilities unfair.
▸ The use of optional questions makes it difficult to evaluate whether all students are equally knowledgeable about the topics covered in the test.
7. Improve the essay question through preview and review.
The following steps can help to improve the essay item before and after administering it to the students.
Preview (before administering the question to the students)
a. Predict student responses: Try to respond to the question from the perspective of a typical student. Evaluate whether students have the content knowledge and the skills necessary to respond adequately, and reformulate unclear questions before handing out the exam.
b. Write a model answer: Before using a question, write model answer(s) or at least an outline of major points that should be included in an answer.
Writing the model answer allows reflection on the clarity of the essay
question. Once the model answer has been written compare its alignment
with the essay question and the intended learning outcome and make
changes as needed to assure that the intended learning outcome, the
essay question, and the model answer are aligned with each other.
c. Ask a knowledgeable colleague to critically review the essay
question, the model answer, and the intended learning outcome for
alignment: Before using the essay question on a test, ask a person
knowledgeable in the subject (colleague) to critically review the
essay question, the model answer, and the intended learning
outcome to determine how well they are associated with each other.
Based on the intended learning outcome, revise the question as
needed. By having someone else look at the test, the likelihood of
creating effective test items is increased, simply because two minds
are usually better than one.
Review student responses to the essay question.
After students complete the essay questions, carefully review the
range of answers given and the manner in which students seem to
have interpreted the question. Make revisions based on the findings.
Writing good essay questions is a process that requires time and
practice.
Guidelines for Scoring Essay Questions
1. Prepare an outline of the expected answer (model answer) in advance. It will serve as a basis for grading student responses and increase stability in scoring.
2. Use the scoring method that is most appropriate.
There are two common ways of scoring essay questions: the point method and the rating method. In the point method, the answer is compared with the model answer and points are assigned according to the adequacy of the answer. With the rating (grading) method, each paper is placed in one of a number of piles according to degree of quality, and the credit points are determined accordingly. The rating (grading) method is preferable for the extended response question for greater objectivity.
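The point method described above can be sketched as a simple rubric comparison. The rubric entries and point values below are hypothetical, chosen only to illustrate the mechanics; an actual model answer would come from the teacher's outline.

```python
# Hypothetical sketch of the point method: each element of the model
# answer carries a fixed number of points, and the scorer records
# which elements the student's answer adequately covers.

# Model answer for one essay question, as (key point, points) pairs.
MODEL_ANSWER = [
    ("defines formative evaluation", 2),
    ("defines summative evaluation", 2),
    ("gives a clinical example of each", 4),
    ("contrasts their purposes", 2),
]

def point_score(covered):
    """Sum points for the key points the scorer judged adequately covered."""
    total = sum(points for _, points in MODEL_ANSWER)
    earned = sum(points for key, points in MODEL_ANSWER if key in covered)
    return earned, total

earned, total = point_score({"defines formative evaluation",
                             "contrasts their purposes"})
print(f"{earned}/{total}")  # 4/10
```

Because every paper is marked against the same outline, the point method narrows the room for inter-scorer disagreement on restricted response questions.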
3. Evaluate all answers to one question before going on to the next one. This maintains more uniform scoring and speeds up the scoring process. It is especially useful when the rating method of scoring is used, as each question can be rated in a single pass.
4. Evaluate the answers without looking at the learners' names. The general impression formed about each learner during teaching may be a source of bias in evaluating essay questions. The halo effect (positive or negative) is a serious constraint on reliable scoring, so the identity of the learner should be concealed until the answers are scored. It is better to identify papers with numbers than with names. Teachers should make a conscious effort to eliminate any such bias in judgment.
5. If especially important decisions are to be based on the results, obtain two or more independent ratings. If essay questions are used to select learners, such as for an award, scholarship or special training, two or more competent evaluators should score the papers independently and their scores should be compared.
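Comparing independent scores can be made concrete: average the two raters when they roughly agree, and flag any paper whose scores diverge too far for re-marking. The scores, paper numbers and disagreement threshold below are invented for illustration only.

```python
# Hypothetical sketch: combine two independent essay scores per paper,
# flagging papers where the raters disagree by more than a set threshold.

def combine_scores(rater_a, rater_b, max_gap=10):
    """Return {paper: (final_score, needs_review)} for each paper number."""
    results = {}
    for paper in rater_a:
        a, b = rater_a[paper], rater_b[paper]
        needs_review = abs(a - b) > max_gap  # large gap => re-mark the paper
        results[paper] = ((a + b) / 2, needs_review)
    return results

# Papers identified by number rather than name, to reduce bias.
scores = combine_scores({101: 72, 102: 55}, {101: 70, 102: 80})
# paper 101: raters agree, averaged; paper 102: 25-point gap, flagged
```

The flagged papers would then go to a third competent evaluator, in line with the guideline above.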
10.2.2 Clinical Evaluation
Nursing is a professional education, meaning graduates engage in professional practice requiring knowledge and skills in different professional activities. Clinical performance of nursing students is a complex phenomenon comprising aptitude, cognition, skills and attitudes. In practice, students deal with real patient situations and unique cases that do not fit the textbook description. What nursing students learn in clinical practice is quite different from the classroom, and clinical teaching and evaluation are likewise different from classroom teaching and evaluation. Clinical evaluation plays as significant a role in nursing education as clinical teaching. It is essential to ascertain the attainment of the objectives of the clinical experience/practice. Providing fair and reasonable clinical evaluation is one of the most important and challenging roles of the nursing teacher.
When clinical performance is evaluated, students' skills are judged against an established standard of patient care. Acceptable performance involves knowledge, attitudes and skills that students gradually develop in a variety of clinical settings. Good clinical evaluation is multidimensional, using diverse evaluation methods completed over time to track students' growth and progress, ensure adequate sampling of behavior and eliminate individual bias.
Clinical evaluation is a process by which judgments are made about learners' competencies in practice. Providing care to patients, families and communities involves performance of varied skills, so clinical evaluation involves observing performance and arriving at judgments about the student's competence. Judgments are formed from the data collected about the student's performance. The teacher may collect different data during clinical practice and give feedback that will help students prepare for an observed summative examination in the same area. In addition to teachers, tutors and clinical instructors are also involved in the evaluation process. Students' skills can also be measured through self-assessment and peer assessment.
Purposes
Planned sequential evaluation of clinical performance using a variety of evaluation methods is useful for the following purposes:
▸ to assess achievement of knowledge, skills and attitudes in the clinical area by the learner over different time periods.
▸ to provide feedback so that positive behaviors are reinforced and suggestions are given to the learners for further improvement.
▸ to assist students in refining their clinical skills through timely guidance and counseling.
▸ to diagnose deficiencies and difficulties in clinical learning so that appropriate remedial measures can be implemented.
▸ to evaluate competency in the various aspects of nursing skills necessary in the clinical setting.
Principles of Clinical Evaluation
▸ Sufficient learning time should be provided before performance is evaluated. Students need to engage in activities that promote learning and to practice skills before they are evaluated for a grade. They must practice skills and receive feedback (formative) before being graded (summative).
▸ The evaluation system should maintain a climate of mutual trust and respect between student and faculty; the teacher is especially responsible for setting and maintaining this climate.
▸ Clinical teaching and learning should focus on essential knowledge, skills and attitudes. As the nursing program has limited time for clinical teaching and learning, time is used to maximum advantage by focusing effort on the most common practice problems that students and graduates are likely to face.
Clinical Evaluation Methods
Learning outcomes in skill areas, attitude and behavioral changes, and personal-social development are especially difficult to evaluate with the usual paper and pencil test. A variety of approaches should be incorporated in clinical evaluation to measure achievement in different aspects. The course objectives determine which method and instrument to use to measure clinical learning and performance. Commonly used methods include:
▸ observation of students' performance,
▸ written assignments such as care plans and case studies,
▸ oral examination, and
▸ practical examination.
In clinical evaluation, the teacher observes performance and collects other types of data, then compares this information to a set of standards to arrive at a judgment. From this assessment a quantitative symbol or grade may be applied to reflect the evaluation data and the judgment made about performance. It may also be used to give feedback to the learner for further improvement.
Observational Techniques of Evaluation
Direct observation is a valid way to evaluate behavior changes of students related to different intellectual, psychomotor and human relations skills and attitudes in clinical areas. It is direct visualization of the behavior performance in different areas, and it can be used for both formative and summative assessment. Direct observation can be done in the clinical setting while students are performing clinical activities (formative evaluation), and oral questioning can be used during direct observation. In many teaching-learning situations it is difficult to observe each individual student, particularly when time is limited and the ratio of students to teachers is high. In this situation, clinical staff members or more senior students can help observe students and give feedback on their performance.
Although direct observation is the most valid method for assessing skills, the reliability of the observation may be low or inconsistent due to observer bias. Therefore, an evaluator needs tools to standardize the observation results or to reduce variations in scoring among different observers. Checklists, rating scales and anecdotal records are some tools used for observation. It is essential that the students have access to and be familiar with the instruments that will be used to assess their skills.
Strengths
▸ It provides information about the actual behavior performance of the students in natural situations.
▸ It can be used with all levels of students in varied situations.
▸ It can be used by teachers with minimum training and experience.
▸ It is adaptable for both individuals and groups.
▸ It can be used as formative evaluation to provide feedback for improvement and as summative evaluation to provide grades to students.
Limitations
▸ There is a high chance of subjective bias of the observer if standard tools are not used, which reduces the reliability of the assessment.
▸ It can capture only the overt behavior expressed by the student in a particular situation.
▸ The observer may not get an adequate sample of behavior to observe.
▸ It requires adequate time and focused effort by the observer for valid observation, and it is time consuming to observe large numbers of students.
▸ Students may be conscious of being observed, and their usual behavior may be altered during observation.
Guidelines for Observation
▸ Identify the expected behavior outcomes of the learner for observation.
▸ Plan beforehand when and how to observe the learners' behavior.
▸ Inform students about the behavior performances expected of them before observation. Orientation about the objectives and the evaluation tool, such as a rating scale, should be provided beforehand.
▸ Observe the whole situation to determine the level of the performance.
▸ Observe an adequate sample of behaviour before judging the performance, as far as possible.
▸ Use standard tools such as checklists and rating scales to get consistent results from the observation.
▸ The observer should focus objectively on the behavior criteria mentioned in the observation tool, such as the rating scale or checklist, without personal judgment.
▸ Provide feedback about strengths and areas for improvement to the learners before observation for evaluation.
▸ Observations from independent observation by different teachers should be combined to interpret the performance.
▸ Record the observation simultaneously or immediately after completing the observation.
Tools Used for Observation
Checklist
A checklist is a tool or instrument used with observation in which the observation determines whether particular behavior characteristics are present or absent. It does not judge degree of quality or frequency. It is one of the common evaluation tools in nursing and other health professions' clinical evaluation where mastery of various tasks is required. Performance of those tasks requires a series of steps, and a checklist is useful for observing whether each step is carried out or not. It is used in evaluating procedures, products and aspects of personal-social behavior.
Steps for development of a checklist
▸ Identify the various behavior criteria/steps/activities required for the performance. Task analysis can be useful to identify those behavior criteria; manuals, procedure books etc. can be used for this purpose. (See task analysis.)
▸ List those criteria or actions in clear, specific and simple language so that students and teacher understand them with the same interpretation. They should be written in observable terms.
▸ Avoid negative statements as far as possible.
▸ Arrange the desired behaviour criteria in the sequential order in which they are expected to occur.
▸ Review the list of criteria carefully to determine the critical behaviours that need to be evaluated.
▸ Draw a table. Keep the behavior criteria on the left-hand side of the table, and yes/no, present/absent or ✓/✗ in the right-hand column.
▸ Develop the format of the checklist with a caption including the name of the university, institute/campus, program name and level of students. Also include the subject and clinical area on the left-hand side above the table and the student's information on the right-hand side. Mention which procedure or product the checklist is for, as well as clear directions for using it.
▸ One column can be kept for remarks, so that special events or situations during observation can be noted down.
▸ Keep space to write strong areas and areas to be improved at the end of the table. Also keep space for the date and signature of the evaluator.
▸ A separate checklist is used for the observation of each individual student unless planned for a group.
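The finished tool can be thought of as data: an ordered list of criteria, each marked only present or absent. The steps below are a hypothetical wound-dressing fragment for illustration, not taken from any procedure manual.

```python
# Hypothetical sketch of a procedure checklist: each step is marked
# yes/no only -- no degree or quality judgment is recorded.

STEPS = [  # criteria listed in the sequential order expected to occur
    "performs hand hygiene",
    "explains procedure to the patient",
    "maintains sterile field",
    "disposes of soiled material safely",
]

def score_checklist(observed):
    """Mark each step yes/no and summarize how many steps were carried out."""
    marks = {step: ("yes" if step in observed else "no") for step in STEPS}
    done = sum(1 for v in marks.values() if v == "yes")
    return marks, f"{done}/{len(STEPS)} steps performed"

marks, summary = score_checklist({"performs hand hygiene",
                                  "maintains sterile field"})
print(summary)  # 2/4 steps performed
```

Representing the checklist this way makes the contrast with a rating scale concrete: to record degree or quality, each step would need a scale of values rather than a yes/no mark.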
Rating Scale
A rating scale is similar in appearance (structure) to the checklist, as it also consists of a set of characteristics to be judged. The difference between them is the type of judgment needed: on a rating scale, the observer can indicate the degree or quality to which a characteristic is present, or the frequency with which a behavior occurs.
As it is an evaluation instrument, it should be constructed based on the learning objectives to be achieved, and its use should be confined to those areas in which there are adequate opportunities to make the necessary observations.
Types
There are three types of rating scales.
1. Numerical Rating Scale
It is one of the simplest types of rating scale, in which the rater circles a number to indicate the degree to which a characteristic is present. Each series of numbers is given a verbal description which remains constant for the different characteristics. The numerical rating scale is useful when the characteristics to be rated can be classified into a limited number of categories.
Example
Direction: Indicate the degree to which the student participates in group discussion by putting a tick mark on the appropriate number.
Key for marking: 1: unsatisfactory, 2: below average, 3: average, 4: above average, 5: outstanding
Example: Numerical Rating Scale
Direction: Indicate to what degree the student performed the wound dressing procedure by marking her performance under the appropriate column.
Columns: 1. Unsatisfactory | 2. Satisfactory | 3. Outstanding
2. Graphic Rating Scale
In this scale, each behavior criterion is followed by a horizontal line with a set of categories. The rating is made by placing a check at the appropriate position on the continuum.
Example
Direction: Indicate the degree to which this learner contributes to class discussion by placing a check anywhere along the horizontal line of each item.
To what extent does the learner participate in discussion?
Never | Seldom | Occasionally | Frequently | Always
3. Descriptive Rating Scale
A descriptive rating scale uses descriptive phrases to identify the positions on a graphic scale.
Example
Direction: Make your rating on each of the following characteristics anywhere along the continuum for each item.
To what extent does the learner participate in discussion?
Never participates; quiet, passive member | Participates as much as other group members | Participates more than other group members
The descriptive graphic rating scale is more satisfactory and has greater accuracy and objectivity during the rating process.
Use of Checklists and Rating Scales
Rating scales can be used to evaluate a wide variety of learning outcomes and aspects of development, mainly in three areas: procedure, product and personal-social development.
a. Procedure Evaluation: In some areas, achievement is expressed specifically through the learner's performance, such as performing a procedure, using equipment, working effectively in a group, or communicating with others. Such activities do not result in a product, and paper-pencil tests are inadequate; therefore these areas of learning must be observed and evaluated using a checklist or rating scale. If rating scales are prepared in terms of specific procedures, they also serve as an excellent teaching device.
b. Product Evaluation: When the learner's performance results in some type of product, evaluation of the product is more desirable than of the procedure; however, evaluation of both is also desirable. Product evaluation is useful in evaluating things such as handwriting, drawings, term papers, field reports, objects made, research reports etc. Product evaluation is done with the same characteristics as procedure evaluation, for the qualities desired in the product. A product may be evaluated according to its overall quality or its separate features. A product scale is used in judging the quality of any product.
c. Evaluation of Personal-Social Development: One of the most common uses of rating scales in education is rating various aspects of personal-social development. Evaluation of product and procedure is done during or immediately after the directed observation, whereas ratings of personal-social development are obtained at periodic intervals and represent a kind of summing up of the teacher's general impressions at the end of a period of planned observation.
Common Errors in Rating
A rating scale is a useful tool for valid and reliable observation if it is used carefully. Certain types of errors can occur in rating which can reduce its significance, so special efforts are needed to minimize them. These include:
i. Personal bias
ii. Halo effect
iii. Logical error
Personal bias errors are indicated by a general tendency to rate all individuals at approximately the same position on the scale. Some raters tend to use the high end of the scale only, which is referred to as the generosity error or leniency effect. Others favor the lower end of the scale, which is known as the severity error or horn effect. A third type of evaluator avoids both extremes of the scale and tends to rate everyone as average; this is called the central tendency error. Among the three, the generosity error is the most common in practice, followed by the central tendency error.
The halo effect is an error that occurs when the rater's general impression of the person influences the rating of individual characteristics. If the rater has a favorable attitude towards the person being rated, there will be a tendency to give high ratings on all traits; but if the rater's attitude is unfavourable, the rater tends to rate low.
A logical error results when two characteristics are rated as more alike or less alike than they actually are because of the rater's beliefs concerning their relationship. In rating clinical performance, teachers tend to rate higher those learners who are active and look intelligent in the classroom situation.
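Generosity, severity and central tendency errors all leave traces in a rater's score distribution, so they can be screened for numerically. The cut-off values below are arbitrary illustrations on a 1-5 scale, not established standards.

```python
# Hypothetical screen for personal-bias errors on a 1-5 rating scale:
# a mean near the top of the scale suggests generosity (leniency) error,
# a mean near the bottom suggests severity (horn) error, and a very
# small spread around the midpoint suggests central tendency error.
from statistics import mean, pstdev

def bias_check(ratings, high=4.2, low=1.8, min_spread=0.5):
    m, s = mean(ratings), pstdev(ratings)
    if m >= high:
        return "possible generosity (leniency) error"
    if m <= low:
        return "possible severity (horn) error"
    if s < min_spread:
        return "possible central tendency error"
    return "no obvious personal-bias pattern"

print(bias_check([5, 4, 5, 4, 5]))  # possible generosity (leniency) error
print(bias_check([3, 3, 3, 3, 3]))  # possible central tendency error
```

Such a screen only raises a question about a rater's pattern; whether the ratings are actually biased still requires comparison with other observers of the same students.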
Principles of Effective Rating
The improvement of rating requires careful attention to the selection of the characteristics to be rated, the design of the rating form and the conditions under which the ratings are obtained. The following principles should be considered for effective rating.
1. The traits/criteria to be evaluated should be educationally significant. When constructing or selecting a rating scale, ensure that the characteristics are relevant to the intended learning outcomes.
2. Ensure characteristics are directly observable. There are two considerations involved in direct observation: first, there should be adequate opportunity to observe the characteristics; second, the characteristics should be directly visible to an observer, such as steps performed in a procedure or communication techniques.
3. Behavior characteristics on the rating scale should be clearly defined. The characteristic statements should be specific and simple, and the scale points should be adequately differentiated. For better clarity, a descriptive scale can be used, or separate instructions can be given for numerical and graphic rating scales.
4. The rater should be permitted to mark between the points. Scales of between 3 and 7 points are usually used; the exact number of points needs to be determined when the scale is designed, and rating in between the points on the continuum should also be permitted.
5. Rater must be instructed to omit rating in unqualified areas. Sometime
rating scale include such characteristics that the teacher have little or no
opportunity to observe. Rating on such traits introduces error so it is
better not to rate and note the reason for not making a rating.
6. Ratings from different observers should be combined whenever possible. The pooled rating of several teachers will be a more reliable description of personal behavior than that obtained from any one teacher.
» Observer must specify who is being rated for what purpose.
» Observer must specify the total weighting and relationship with
students' achievement.
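Principle 6 above amounts to simple averaging. The sketch below is an illustrative example only (the traits, scores, and function name are hypothetical, not from the text); it pools the ratings that several observers assign to the same student on each trait, so that no single rater's bias dominates the result.

```python
# Pool ratings from several observers (hypothetical data).
# Averaging across raters reduces the influence of any one
# teacher's personal bias, as recommended in principle 6.

def pooled_ratings(ratings_by_observer):
    """ratings_by_observer: list of dicts mapping trait -> numeric score."""
    pooled = {}
    for ratings in ratings_by_observer:
        for trait, score in ratings.items():
            pooled.setdefault(trait, []).append(score)
    # Mean score per trait across all observers who rated it.
    return {trait: sum(scores) / len(scores) for trait, scores in pooled.items()}

observers = [
    {"communication": 4, "aseptic technique": 3},
    {"communication": 5, "aseptic technique": 4},
    {"communication": 4, "aseptic technique": 4},
]
print(pooled_ratings(observers))
# communication averages (4+5+4)/3, aseptic technique (3+4+4)/3
```

The same idea extends to weighted pooling if some observers (e.g., a clinical supervisor) are trusted more than others.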
Anecdotal Records
Anecdotal records are descriptions of meaningful incidents and events that the teacher has observed in a learner's performance. A record includes a factual description of an event: how it occurred, what happened, when it occurred, and under what circumstances the behavior occurred. It also includes interpretation and recommendations, kept separate from the description. The incident should be written down shortly after it happens. A good anecdotal record keeps the objective description of an incident separate from any interpretation of the behavior's meaning.
An anecdotal record is a brief description of an observed behavior that appears significant for evaluation purposes. The teacher carefully describes the event s/he observed, writes his/her comments, takes the signatures of the student and herself/himself, and keeps the record in a file. It is also useful to keep additional space for recommendations concerning ways to improve the learner's learning.
Anecdotal records are frequently used to evaluate various aspects of personal and social behavior. They are used to assess unusual and exceptional incidents that contribute to a better understanding of each learner's unique pattern of behavior.
Strengths
▸ Anecdotal records depict actual behavior in natural situations, provide a check on other evaluation methods, and enable us to determine the extent of change in the learner's typical pattern of behavior.
▸ They can be used with students for whom paper-pencil tests and self-reports are impossible.
▸ Keeping such records makes us more diligent in observation and increases awareness of such behaviors.
Limitations
» It requires considerable time and effort to maintain an adequate system of records (a time-consuming task).
» It is difficult to maintain objectivity in observing and reporting a learner's behavior. Without objectivity, the records have little value in evaluation.
» There may be problems in obtaining an adequate sample of behavior. To gain a reliable picture of a typical pattern of behavior, learners should be observed over a period of time and in a variety of situations.
Guidelines for using Anecdotal Records
The following suggestions support the effective use of anecdotal records.
1. Determine in advance what to observe: Objectives should be reviewed to decide which behaviors require direct observation. Only a few types of significant behavior can be observed, so the areas to be observed should be decided in advance; however, we must be flexible enough to notice and report any unusual behavior in events that are significant.
2. Observe and record enough of the situation to make the behavior meaningful: The record should contain the situational conditions that seem necessary for understanding the learner's behavior (the particular setting and the other persons involved when the behavior took place).
3. Make a record of the incident as soon after the observation as possible: In most cases, it is not feasible to write a description of an incident when it happens. However, the longer the delay in recording an observation, the greater the chance of forgetting important details. Therefore, try to make a few brief notes at an opportune time following the behavioral incident and complete the record later.
4. Limit each anecdote to a brief description of a single incident: Just
enough detail should be included to make the description meaningful and
accurate. Limit the description to a single incident so that it will simplify
the task of writing, using and interpreting the records.
5. Keep the factual description of the incident and its interpretation separate: The description of the incident should be as accurate and objective as possible; that is, state exactly what happened in clear and nonjudgmental words. It is not always necessary to interpret incidents; if interpretations are made, they should be kept separate and labeled as interpretation.
6. Collect a number of anecdotes on a learner before drawing inferences concerning typical behavior: A single behavioral incident is rarely very meaningful in understanding a learner's behavior. We should make judgments concerning learning and development only after observing a sufficient sample of behavior to provide a reliable picture of how the learner typically behaves in different situations.
7. Record both positive and negative behavioral incidents: For evaluation, it is important to record both positive and negative learning behaviors. Therefore, conscious efforts should be made to observe and record both positive behavior (even when subtle) and the more obvious negative behavior.
8. Obtain practice in writing anecdotal records: Initially, most teachers have considerable difficulty in selecting significant incidents, observing them accurately, and describing them objectively. Some training and practice in the use of anecdotal records may therefore be useful. Seniors and supervisors can help teachers and students maintain the quality of the records.
Written Communication Report
Nursing Care Studies
Nursing care studies are a common clinical learning activity and formative evaluation strategy in nursing education. They are an individualized and independent approach to learning in the clinical area. An individual student is assigned to write a detailed nursing care study of a particular patient who is under her care. According to Schweer, a nursing care study is a problem-solving activity whereby the student undertakes a comprehensive assessment of a particular patient's problems and presents a critical analysis of those problems as nursing diagnoses. On the basis of those nursing diagnoses, planning and implementation of nursing interventions are done, followed by evaluation of goal achievement.
Students prepare a written report of the case study following the provided guideline, with critical analysis in the different components and creativity in organization. The report includes a comprehensive assessment of the patient and a description of the nursing actions taken to meet the patient's needs, with rationale. It also includes the integration of theoretical knowledge from different areas to solve the problems.
Problem Oriented Records
Problem oriented record is a systematic record keeping focused around
patient's health problems. This kind of report is compatible with nursing
process and useful to evaluate student's skills in collecting data,
identifying problems from data, developing appropriate plans of patient's
care and recording patient's progress related to specific problem. It also
measures student's communication skills (information gathering)
developing plans, noting progress, and writing report. It is used as
formative evaluation in clinical setting.
The problem oriented record consists of the following components:
» Data base: all appropriate information about the patient indicating his/her condition.
» Problem list: identification and listing, on a priority basis, of the conditions or circumstances identified from the data base which have health implications for the patient.
» Initial plan of solution: a plan of diagnostic and therapeutic intervention for each problem.
» Progress note: it includes
▸ Narrative notes: descriptive comments related to each problem.
▸ Flow sheets: graphic records of repetitive and serial data such as vital signs.
▸ Discharge summary: organized around follow-up of each problem.
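The components listed above form a natural record structure. The sketch below is purely illustrative (the class and field names are my own, not from the text); it shows how a problem oriented record ties each identified problem to its initial plan and its progress notes.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a problem oriented record (POR).
# Field names are hypothetical; they mirror the components in the
# text: data base, problem list, initial plan, and progress notes.

@dataclass
class Problem:
    description: str               # condition identified from the data base
    initial_plan: str              # diagnostic/therapeutic intervention
    progress_notes: list = field(default_factory=list)  # narrative notes

@dataclass
class ProblemOrientedRecord:
    data_base: dict                                 # all patient information
    problems: list = field(default_factory=list)    # prioritized problem list
    discharge_summary: str = ""                     # follow-up of each problem

record = ProblemOrientedRecord(
    data_base={"name": "Patient X", "vitals": "stable"},
    problems=[Problem("post-operative pain",
                      "assess pain q4h, give analgesia as prescribed")],
)
record.problems[0].progress_notes.append("Pain reduced from 7/10 to 3/10.")
```

Structuring the record this way also mirrors how a teacher would evaluate it: each skill named in the text (data collection, problem identification, planning, progress noting) maps to one field.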
Nurse's Notes
Writing cogent nursing and progress notes in the patient's chart is an important clinical skill. Evaluating the notes written by students provides faculty with an opportunity to assess the students' ability to process and record relevant data.
Nursing care based on the nursing process requires considerable intra-disciplinary and interdisciplinary communication so that the integrity and continuity of patient care can be maintained. Nurse's notes can serve as a major means of communication between the various professionals. Therefore, it is important to evaluate students' written communication behavior systematically.
Other means of written reports for practical evaluation are the nursing care plan, log book, term paper, research proposal and report, and so forth.
Oral Examination/Viva Voce
It is an integral part of clinical evaluation, used to assess those attributes that cannot be measured by other means. In an oral examination, the evaluator and student sit face to face; as questions are asked, the student responds immediately. The evaluator may ask various types of questions, though probing questions are generally used. It is useful for evaluating communication skills and attitude in addition to cognitive abilities.
Oral examination has some drawbacks. As responses are not recorded, objectivity is reduced, and there is a chance of subjective judgment by the evaluator. It is a time-consuming process. However, it is useful for assessing qualities such as alertness, confidence, decisiveness, logical reasoning, clarity of concepts, and the student's abilities of expression. If different questions are asked of different students, it is difficult to compare students' achievement.
Principles for Oral Examination
▸ Identify the learning outcomes to be evaluated.
▸ Divide the oral examination into several parts and assign different parts to different examiners.
▸ Plan and write down the set of questions to be asked.
▸ Decide on the model answers beforehand, on a priority basis.
▸ Pair a new examiner with an experienced one if new examiners are to be involved.
▸ Use questions testing higher cognitive abilities such as interpretation, problem solving, and application.
▸ Use simple and clear language when asking questions.
▸ Create a congenial and relaxed atmosphere to facilitate the expression of the students with minimal nervousness.
▸ Ask similar questions of all students so that their achievements can be compared; this also increases reliability.
▸ Start with easy questions, then gradually move to difficult ones.
▸ Give enough time to answer each question; however, it is necessary to limit the time.
▸ Listen to the answer carefully and with full attention, and avoid other activities while listening.
▸ Avoid cross-questioning and passing judgment on the answer.
▸ Ask alternate question(s) if the student cannot answer the question.
▸ Use a checklist for scoring to maintain consistency in scoring across students.
▸ Analyze the questions asked in the oral examination to get feedback for future improvement.
Objective Structured Practical Examination (OSPE), Objective Structured
Clinical Examination (OSCE)
It is well known that the conventional practical examination has several problems, especially in terms of its outcomes. Although grading/marking should depend only on a student's competence, variability in both the experiments selected and the examiners significantly affects grading in the conventional examination. Further, the subjectivity involved weakens the correlation between the marks awarded by different examiners to the performance of the same candidate. In conventional clinical examinations, different students are
▸ tested in different areas,
▸ asked different questions, and
▸ scored by different criteria.
To overcome these deficiencies, different measures have been developed. These attempts were largely related to the adoption of appropriate measures for bringing practical examinations towards objectivity so that they may become valid and reliable. Among the various methods adopted, the Objective Structured Clinical Examination (OSCE) was used earlier in medical institutions and later extended to the Objective Structured Practical Examination (OSPE); it was described in 1975, and in greater detail in 1979, by Harden and his team from Dundee. Experts now recommend the OSPE and OSCE for both educational and assessment purposes, even in other faculties.
OSCE/OSPE
OSCE is a kind of multi-station examination for clinical subjects. It was first described in 1975 by Harden and Gleeson. It is an assessment tool used in evaluating students' clinical competence, where examiners carefully plan the areas to be examined. In this examination, the candidates rotate around different stations, and at each station the examinee has to perform clinical competencies. These usually involve clinical skills such as taking a patient's history, physical examination, wound dressing, administration of drugs, giving health teaching, checking vital signs, preparing articles for a procedure, making conclusions on the basis of findings, and so forth. Clinical competencies are tested using checklists.
The OSCE has been used to evaluate critical competency areas in students of the health sciences, such as the ability to obtain and interpret data, teach, communicate, and perform procedures. In many colleges of medicine and nursing, the OSCE is the standard mode of practical examination, where clinical skills and counseling sessions satisfactorily complement the cognitive knowledge tested in essay writing and objective examinations.
Meaning of OSCE
The acronym OSCE stands for Objective Structured Clinical Examination.
» Objective: having actual existence or reality; uninfluenced by emotions or personal prejudices. The OSCE is said to be objective because examiners use an agreed checklist for evaluating the students.
» Structured: every student is presented with the same problem and performs the same task in the same time frame. In addition, the marking scheme for each station is structured.
» Clinical: the tasks are representative of those encountered in real clinical situations.
» Examination: the skills are assessed in the form of a formal test of knowledge and ability or competency.
OSPE has similar characteristics to the OSCE which include a specified set
of tasks to assess students' competency in a structured pattern
objectively under direct observation. It is frequently used to assess the
laboratory skills of students before exposure to clinical settings. The
students rotate through a series of stations and undertake a variety of
tasks to test practical skills reliably and validly. They may have to carry
out an activity or practical procedure in terms of identification of
equipment, use of an instrument, procedure of experiment, handling of
instruments, making observations/results, interpretation of results, and
drawing conclusions. It can be used also for clinical experiments such as
history taking, physical examination, simple procedures, interpretation of
lab results etc.
Methodology
Jation. On a The students normally rotate a number of stations and they
spend a specified time at each signal such as the ring of a bell, the
students move on to the next station preferably in a clockwise direction.
The time allowed is the same for all the stations and the stations must be
designed keeping this in mind. About 4-5 minutes is an appropriate time
for each station. A further 30 seconds should be allowed for students to
move from one station to another to complete any final comments. The
number of OSCE examination about 5 minutes per station. Using 12
stations of 5 minutes each, 12 candidates can stations varies from 12-15
or even 20 stations each of which as earlier stated requires reduced
according to the higher number of students to be evaluated. But, in order
to conveniently complete the examination in one hour. However, number
of stations may be maintain the validity of exam, the time for each station
should not be less than 4 minutes. Further, in a particular exam of a same
period of time. The students are rotated through all stations and have to
move to the next station at the bell. single course, all stations should be
completed in the
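The timing arithmetic in the methodology above can be checked with a short calculation. This is a hedged sketch, not part of the original text: it assumes every station takes the same time (as the text requires) plus a 30-second changeover between stations.

```python
# Rough OSCE rotation arithmetic (illustrative only).
# Assumes equal time at every station plus a fixed changeover,
# per the methodology described in the text.

def rotation_minutes(stations, minutes_per_station=5.0, changeover_minutes=0.5):
    """Total time for one candidate to pass through every station."""
    return stations * (minutes_per_station + changeover_minutes)

# With 12 stations of 5 minutes each, 12 candidates (one starting at
# each station) finish a full rotation in about an hour, matching the
# text; the 30-second changeovers add 6 minutes on top of the 60.
print(rotation_minutes(12))
```

Because one candidate occupies each station at a time, the number of candidates processed per rotation equals the number of stations, which is why more stations let larger groups be examined in the same sitting.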
Each station is designed to test clinical competency. At some procedure
stations, students are given tasks to perform on models or simulators. In
the procedure stations, there are examiners with agreed checklists to
score the student's performance. At the other stations, referred to as "question stations", students respond to objective questions, interpret data, or record their findings from the previous procedure station. They are thus exposed to an equal number of "procedure stations" and "question stations".
For reliability and objectivity, all the students perform the examination,
gather, wait and start from a large hall and exit from an opposite door at
the end of the assessment. It is thus expected that candidates yet to be
examined should not have contact with those already examined. Students are briefed about the system beforehand in the examination hall and instructed to keep their phones away until the end of the exam. The score for each item on the checklist is decided by the examiners depending on the importance of the item.
This type of evaluation (OSCE/OSPE) has been widely used in various medical schools and found appropriate for giving final feedback to students and teachers during course assessment. Many institutions prefer this type of assessment because of the wide coverage of skills during the assessment. The examination is applicable in any subject where practical skills need to be assessed. If well organized, it tests the student's competency in communication skills, decision-making skills, psychomotor skills, and knowledge simultaneously in one setting. It has the following features:
▸ The assessment covers a broad range of clinical skills, much wider than a conventional examination.
▸ Both process and outcome are examined, giving importance to individual competencies.
▸ The stations are short: the task at each station does not exceed 4-5 minutes.
▸ There are numerous stations, and each station presents a different problem or stimulus. The number of stations to be included depends upon the nature of the course.
▸ There are 'response to the question' stations and 'practical' stations.
▸ The stations are very highly focused.
▸ After a specified time, students move from one station to another until each student covers all stations.
▸ Pre-structured checklists/marking schemes are used.
▸ Scoring is objective, since standards of competence are pretested and agreed checklists are used for scoring.
▸ The teacher/observer evaluates the student silently.
▸ There is scope for immediate feedback.
▸ Validity of the examination is increased by the elimination of patient variability (through simulations/models) and examiner variability.
Station types
1. Procedure (skill) stations: students are given tasks to perform, such as white and red blood cell (RBC) counting, the fragility test for RBCs, and hematocrit measurement. At these stations, examiners directly grade students' performance using a checklist.
2. Response stations: students are given 2-3 sets of questions to answer in writing (the questions depend on theoretical aspects of topics such as blood, muscle, nerve, and the respiratory and cardiovascular systems). These stations are designed to test the skills of identification and/or interpretation of results, similar to spotting.
3. Manned stations (with external examiners present): students are asked 3-5 questions; also called viva stations.
4. Rest stations: no patients, no questions, no examiners. Students can rest before moving to the next station.
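As the text notes, the score for each checklist item is weighted by its importance. The sketch below is a hypothetical illustration (the items, weights, and function name are not from the text); it totals one student's performance at a procedure station against an agreed checklist.

```python
# Weighted checklist scoring at a procedure station (illustrative).
# Items and weights are hypothetical; in practice the examination
# committee agrees on them before the exam.

checklist = [
    # (item, weight reflecting its importance)
    ("washes hands before the procedure", 2),
    ("positions the patient correctly", 1),
    ("applies the cuff at heart level", 2),
    ("records the reading accurately", 3),
]

def score_station(performed):
    """performed: set of checklist items the student completed."""
    achieved = sum(w for item, w in checklist if item in performed)
    total = sum(w for _, w in checklist)
    return achieved, total

achieved, total = score_station({
    "washes hands before the procedure",
    "records the reading accurately",
})
print(f"{achieved}/{total}")  # prints "5/8"
```

Because every student is scored against the same pre-agreed items and weights, two examiners observing the same performance should arrive at the same mark, which is the source of the OSCE's objectivity.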
Examples of Procedure and Response Stations
The number of questions and marks for each of them may vary in both
types of stations reflecting the importance of a question or the type of
experiment given.
1. Procedure Station
Objective: To examine the blood pressure of a patient.
2. Response Station:
Questions
i. Write the name of the instrument you have used at the first station.
ii. What was the posture of your subject during the measurement of BP?
iii. What was the systolic and diastolic pressure of the subject in your
experiment?
iv. What was the unit of measurement?
v. What is your conclusion about the result you have obtained, the subject
is normal or abnormal?
Practical Station: In practical stations, students are asked to examine the parts of an instrument, interact with the patient, interpret findings such as those from X-ray/lab reports, record vital signs, or do laboratory tests like urinalysis and fixing slides. The student's performance at each practical station is observed and scored by the examiner with the help of a checklist.
Strengths
▸ OSCEs/OSPEs are potentially more reliable methods of assessment. The OSPE is useful for any subject; its main benefit is that both the examination process and the examinee are evaluated, giving importance to individual competencies. The OSPE can also examine both clinical and experimental skills better than a conventional examination.
▸ Each student has to perform the same tasks, which increases acceptability and the feeling of fairness in evaluation.
▸ Careful specification of content increases validity. Coverage of a wider sample of clinical competence increases validity and reliability.
▸ Various psychomotor skills and related knowledge can be assessed at one point within a limited time.
▸ Documentation of all the responses of the examinee ensures fairness in the evaluation.
▸ There is objectivity in the OSPE, as the standards for checking the competencies are set earlier and agreed checklists are used for marking and evaluation. Similarly, there is no room for subjective questions; only objective questions are asked at response stations.
▸ This examination removes the variability of experiments and examiners for a group of students or a class studying the same subject and thus enhances the validity of the exam. High validity is also due to measurement of the areas that students are supposed to perform in their normal work.
▸ Students take more interest due to the variety and keep themselves alert during the whole process of examination, which is not found in the conventional one. If such an examination is regularly used for formative assessment, it can also enhance teacher-student interaction.
▸ Model answers and checklists are already prepared, so a subject expert may not be essential.
▸ A large number of students can be tested within a relatively short time. Hence, the OSPE/OSCE process is also useful for formative assessment.
Limitations
▸ If proper planning, briefing of the students (before the examination), and preparation of stations in an appropriate ratio (matching the number of students/groups) are not done, the whole OSPE/OSCE process may fail. Adequate time and effort are needed in planning and organizing it.
▸ It requires coordinated teamwork of examiners to plan procedure and response stations with suitable checklists and response questions, agreed upon by the examination committee.
▸ It requires an adequate number of teachers, as an observer is necessary to evaluate students keenly, one by one, for the whole group/class at a single procedure station.
▸ It requires space, equipment, and other logistic resources to set up the different stations.
▸ Students' performances are assessed at different stations, so performance in its totality cannot be measured. Some educators feel that breaking clinical skills into individual competencies is artificial and not meaningful.
▸ Effective training of the examiners is essential to plan and conduct the examinations.
Table 10.5: Limitations and Ways Out for OSCE/OSPE

Limitation: It is not a holistic approach to the patient.
Way out: Use it alongside other tools, e.g., portfolios and work-based assessment instruments.

Limitation: It assesses only a limited sample of competencies.
Way out: Use a blueprint to sample across the outcome domains and the core tasks.

Limitation: It is resource intensive.
Way out: With organization, the resources required can be contained; the cost-benefit ratio is favorable.

Limitation: The role of the examiner is prescribed.
Way out: Within the set framework, the examiner can also use his/her judgment.

Limitation: Only minimum competence is tested.
Way out: The scoring system can also reflect excellence; more advanced stations can be included.

Limitation: Some learning outcomes are difficult to assess.
Way out: Performance in an OSCE can be triangulated with other assessments.

Limitation: Students' behaviours are influenced by the context.
Way out: Design it to be as close to real practice as possible.

Limitation: It is stressful.
Way out: Students should be briefed and prepared.
[Figure: Miller's Pyramid, from novice to expert. The levels and their matching assessments are: KNOWS (fact recall; MCQ assessments), KNOWS HOW (interpretation and application; case presentations, essays), SHOWS (demonstration; simulations, OSCEs), and DOES (performance in practice; direct observation, workplace-based assessments), spanning knowledge, skills, and attitudes.]
Figure 10.5: Miller's Pyramid and Core Competency Assessment
Miller's Pyramid of Assessment provides a framework for assessing clinical
competence in health science education and can assist clinical teachers in
matching learning outcomes (clinical competencies) with expectations of
what the learner should be able to do at any stage.
10.3 Feedback
Feedback is an individual comment about the performance or behaviour of somebody. It is a perception made by another person about how somebody performs, not about who somebody is. Feedback is a way to improve personal behaviour and performance, and it is effective only if suggestions for change are given. It is not criticism, especially not negative criticism. Two parties are involved in feedback: the receiver and the sender (giver). Rules should be followed for giving and receiving feedback.
Types of Feedback
1. Affirmative (positive): acknowledgements (e.g., "thanks for your input") and positive comments such as "good job," "you really worked hard," "well done," "excellent chart," or "your message is clear," or mentioning good points, e.g., "the letters are easily readable," "your presentation is well structured."
2. Developmental: suggestions and recommendations given for improvement, for example, "increase the volume of your voice," "ask one question at a time," "increase the size of the letters," "use the available space," "this information could be given in a handout," or "brainstorming could be an appropriate method."
Giving Feedback
First ask the presenter if his/her goals were achieved and how s/he feels about the session. Ask each member of the group to comment on how far they perceived the criteria to have been met. While giving feedback, try to:
▸ Be clear about what you want to say in advance.
▸ Start with the positives or strengths observed, for encouragement.
▸ Be descriptive rather than evaluative.
▸ Describe specific behaviors and reactions, particularly those that the student should continue and those that should be changed.
▸ Refer to performance that can be changed.
▸ Allow freedom to change or not to change.
▸ Emphasize positive feedback during a session. Constructive feedback should be given in the form of "what you are expected to do next time" rather than "what you did wrong this time."
▸ Take responsibility for your own feedback by using phrases such as "in my opinion."
▸ Use description, not judgment.
▸ Choose things the person can concentrate on. If people are overwhelmed with too many suggestions, they are likely to become frustrated. When giving feedback, call attention to the areas that need the most improvement.
▸ Avoid conclusions about motives or feelings. For example, rather than saying "You don't seem very enthusiastic about the lesson," you can say "Varying your rate and volume of speaking would give you a more lively style."
▸ Begin and end with the strengths of the presentation. If you start off with negative criticism, the person receiving the feedback might not even hear the positive part, which comes later.
▸ Be timely: give feedback soon after the event. Feedback given during practice should be limited to the critical information necessary to avoid a risk.
▸ Convey positive feedback by facial expression and tone of voice rather than words, when appropriate. This type of feedback can be highly effective.
▸ Give students an opportunity to respond to the feedback, listening actively to the response. At the least, students should restate the specific behaviors they will perform the next time they practice the skill.
Receiving Feedback
When you are receiving feedback, try to:
▸ listen to the feedback rather than reacting or arguing immediately; ask for clarification, not justification,
▸ be open to what you are hearing,
▸ take notes, if possible,
▸ ask for specific feedback if it is vague or unfocused,
▸ make sure that you have understood the feedback,
▸ ask for the feedback that you want, and
▸ decide what you will do as a result of the feedback.
How much Feedback?

Amount and its effect:
▸ None: no correction, no change, no improvement.
▸ Too much: breakdown, fear.
▸ Too little: development may go in the wrong direction.
▸ Only negative: loss of confidence, loss of motivation, frustration.
▸ Only positive: no option for variety.

Feedback should be specific, in the right amount, and both positive and negative.