
COURSE MANUAL

MODULE BAE2633 LANGUAGE TESTING AND ASSESSMENT

INSTRUCTIONS

 The lecturer should discuss more than what is summarized here, especially in terms of
methods of delivery.

 All the topics in the syllabus should be covered during the contact hours.

 This document is only a synopsis. The content in this document is not sufficient for
students to write answers at the examination.

 Students need to read more on the given topics. (40% lectures and 60% self-learning.)

 The references are the sources used in compiling this document; students are not
required to consult them in their own learning.

DISCLAIMER

This is a free distribution, produced at the request of KASP Learning Campus of Sri Lanka. This
compilation is authorized by the university to be used by undergraduate students of the Bachelor of
Education – TESL for academic purposes, not for any commercial activity. Reproduction and
distribution to third parties without written permission of the owner is prohibited. This compilation
contains a set of notes to help the lecturer and the students. The University will take reasonable care
to ensure that it does not knowingly infringe anyone's copyright. If it is suspected that information
here infringes someone's copyright, the local office of the University in Sri Lanka should be informed
so that appropriate action can be taken.

BACHELOR OF EDUCATION_TESL
MODULE BAE2633_LANGUAGE TESTING AND ASSESSMENT
Lecture 1 /page 1
MODULE BAE 2633: LANGUAGE TESTING AND ASSESSMENT

Topic 1: Introduction to Testing and Assessment in Language Teaching (2 contact hrs)
• Definition of test and assessments
• Importance of language testing
• Purposes of assessments

Topic 2: Types of Tests (2 contact hrs)
• Criterion-referenced test vs. norm-referenced test
• Objective vs. subjective tests
• Paper-and-pencil language test vs. performance-based test
• Teacher-made test vs. standardized test

Topic 3: Classroom Assessment (2 contact hrs)
• The teacher's role in assessment
• Types of classroom assessment:
  - Assessment for Learning (AFL)
  - Assessment of Learning (AOL)
  - Assessment as Learning (AAL)

Topic 4: Characteristics of Tests (2 contact hrs)
• Principles of assessment:
  - Validity
  - Reliability
  - Practicability

CONTINUOUS ASSESSMENT – MID

Topic 5: Types of Classroom Testing Items (6 contact hrs)
• Completion and short-answer items
• Essay items
• Multiple-choice questions
• True-false items

Topic 6: Stages of Test Design and Administration (3 contact hrs)
• Planning
• Producing
• Administering

Topic 7: Teaching Students Test-Taking Skills (3 contact hrs)
• General test-taking strategies
• Test-taking strategies for specific test formats:
  - Strategies for short-answer tests
  - Strategies for essay tests
  - Strategies for multiple-choice tests
  - Strategies for true-false tests

Topic 8: Language Skills Testing (3 contact hrs)
• Testing reading and writing skills
• Testing listening and speaking skills

TOTAL: 23 contact hrs
HOW TO ASSESS YOUR STUDENT

B.ED IN TESL - ASSESSMENT MODES (APPLICABLE FROM JULY 2019)


NOTE : Please refer to the respective module you are assigned to and ensure the students are assessed accordingly.
(Excl: Mid / End Exams).
BAE 2633 – Language Testing and Assessment

Continuous assessment:
• Written (Mid): 10%
• Presentation: 20%
• Micro Teaching: No Micro Teaching
• Assignment: 30%
• Others: -
Summative (End Exams):
• Written (End): 40%

LECTURE 1 (Time allocation – 2 Hours)

Introduction to Language Testing and Assessment

The Definition of Language Testing

In plain words, testing, or administering a test, is a method of measuring a person's ability or
knowledge in a given domain. It is a set of techniques, procedures, or items that constitute an
instrument of some sort that requires performance or activity on the part of the test-taker (and
sometimes on the part of the tester as well). A test is an instrument or procedure designed to
elicit performance from learners with the purpose of measuring their attainment of
specified criteria. The method may be intuitive and informal, or structured and explicit
(Brown, 2001).

Language testing is the administration of tests in order to assess and measure a person's language
competence and performance, i.e., to test language ability. It is an evaluation of an individual's
language proficiency.

Evaluation is a systematic determination of the merit, value and significance of a subject, using
criteria governed by a set of standards. It can help an organization, program, design, project or
other intervention or initiative to judge the feasibility of a goal, concept or proposal, or of any
alternative; to assist in decision-making; or to determine the degree of achievement or value of
a completed action with regard to its goals, objectives and results. In addition to providing
insight into prior or existing initiatives, the primary purpose of evaluation is to enable
reflection and help identify future changes.

The relationship between language testing and language teaching

Tests have become a way of life in the educational world, and tests are often used for pedagogical
purposes, either as a means of motivating students to study or as a means of reviewing material
taught (Bachman, 1990). In every learning experience there comes a time to pause and take
stock, to put our focal processes to their best use, and to demonstrate accumulated skills or
knowledge. For optimal learning to take place, a good teacher never ceases to assess students,
whether those assessments are incidental or intended. Thus language tests can be valuable
sources of information about the effectiveness of learning and teaching. Language teachers regularly
use tests to help diagnose students' strengths and weaknesses, to assess student progress, and to
assist in evaluating student achievement. As sources of feedback on learning and teaching,
language tests can thus provide useful input into the process of language teaching (Bachman,
1990).

The purposes of administering Tests

Tests may be constructed primarily as devices to reinforce learning and to motivate the student,
or primarily as a means of assessing the students' performance in the language (Heaton, 1975).
There are four reasons for having a test:
(1) To indicate future ability
(2) To discover what is already known
(3) To discover what has been learned
(4) To discover what is still to be learned (Bell, 1981)

What is to be Tested
In language testing, what is tested includes the four major skills:
(1) Listening (auditory) comprehension, in which single utterances, dialogues, talks and
lectures are given to the testee,
(2) Speaking ability, usually in the form of an interview, a picture description, and reading
aloud,
(3) Reading comprehension, in which questions are set to test the student's understanding of a
written text,
(4) Writing ability, usually in the form of essays, letters, and reports (Heaton, 1975)

In addition, items may be designed to test the following components of language skills:
(1) Phonology (concerned with pronunciation, stress and intonation)
(2) Vocabulary (concerned with word meanings and word arrangements)
(3) Grammar
(Heaton, 1975)

Simply put, a test refers to a tool, technique or method that is intended to measure students'
knowledge or their ability to complete a particular task. In this sense, testing can be considered
a form of assessment. Tests should meet some basic requirements, such as validity and
reliability.
 Validity refers to the extent to which a test measures what it is supposed to measure.
 Reliability refers to the consistency of test scores when administered on different
occasions.
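These two requirements are not symmetrical in how they are checked. Reliability can be estimated empirically from scores; as one common approach, the test-retest method correlates the scores the same students obtain on two administrations. The sketch below is illustrative only: the scores are hypothetical, and Pearson correlation is assumed as the consistency measure.

```python
# Test-retest reliability: correlate the scores the same students obtain
# on two administrations of the same test (hypothetical data).
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

first_sitting = [42, 35, 48, 30, 45, 38]    # occasion 1
second_sitting = [40, 36, 47, 28, 46, 35]   # same students, occasion 2

r = pearson_r(first_sitting, second_sitting)
print(round(r, 2))  # a value near 1 suggests consistent (reliable) scoring
```

Validity, by contrast, cannot be computed from scores alone; it is a judgement about whether the test measures what it claims to measure.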

There are different types of Tests:

 Placement tests: they help educators place a student into a particular level or
section of a language curriculum or school.
 Diagnostic tests: they help teachers and learners identify strengths and weaknesses.
 Proficiency tests: they measure a learner's level of language.
 Achievement tests: they are intended to measure the skills and knowledge learned after
some kind of instruction.

Definition of Assessment:

The verb assess comes from the French 'assesser', but the origin is the Medieval Latin
'assessare', meaning "fix a tax upon". Another derivation of the Latin term is 'assidere' or
'adsidere', meaning "to sit beside" (a judge). The reference is to the assistant of the judge,
whose job was to fix the amount of a fine or tax by estimating the value of a property.
Assessment is thus the process of collecting information about students from diverse sources so
that educators can form an idea of what they know and can do with this knowledge. While
evaluation is concerned with making judgments about instruction, a curriculum, or an
educational system, assessment is concerned with the students' performance. In other words, one
assesses an individual but evaluates a program, a curriculum, an educational system, etc.

The verb 'assess' often collocates with:


 skills,
 abilities,
 performance,
 aptitude,
 competence.

According to Le Grange and Reddy (1998, p. 3):

"Assessment occurs when judgments are made about a learner's performance, and entails
gathering and organizing information about learners in order to make decisions and judgments
about their learning."

Assessment is thus the process of collecting information about learners using different methods
or tools (e.g. tests, quizzes, portfolios, etc).

Educators assess their students for a variety of purposes:


 To evaluate learners' educational needs,
 To diagnose students' academic readiness,
 To measure their progress in a course,
 To measure skill acquisition.

There are different types of Assessment:

 Formative Assessment:
It is process-oriented and is also referred to as 'Assessment for Learning'. It is an ongoing
process to monitor learning, the aim of which is to provide feedback to improve teachers'
instruction methods and students' learning.

 Summative Assessment:
It is product-oriented and is often referred to as 'Assessment of Learning'. It is used to
measure student learning progress and achievement at the end of a specific instructional
period.

 Alternative Assessment:
It is also referred to as authentic or performance assessment. It is an alternative to traditional
assessment that relies only on standardized tests and exams. It requires students to do tasks
such as presentations, case studies, portfolios, simulations, reports, etc. Instead of
measuring what students know, alternative assessment focuses on what students can do with
this knowledge.

Purpose of Testing and Assessment

One of the first things to consider when planning for assessment is its purpose.
Who will use the results?
For what will they use them?

1. Teaching and Learning


2. System Improvement

1. Teaching and Learning


 The primary purpose of assessment is to improve students' learning and teachers'
teaching as both respond to the information it provides;
 Assessment for learning is an ongoing process that arises out of the interaction between
teaching and learning;
 What makes assessment for learning effective is how well the information is used.

 Inform and guide Teaching and Learning:


A good classroom assessment plan gathers evidence of student learning that informs
teachers' instructional decisions. It provides teachers with information about what
students know and can do. To plan effective instruction, teachers also need to know what
the student misunderstands and where the misconceptions lie. In addition to helping
teachers formulate the next teaching steps, a good classroom assessment plan provides a
road map for students. Students should, at all times, have access to the assessment so they
can use it to inform and guide their learning.

 Help students set learning goals:


Students need frequent opportunities to reflect on where their learning is at and what
needs to be done to achieve their learning goals. When students are actively involved in
assessing their own next learning steps and creating goals to accomplish them, they make

major advances in directing their learning and what they understand about themselves as
learners.

 Assign report card grades:


Grades provide parents, employers, other schools, governments, post-secondary
institutions and others with summary information about student learning.

 Motivate students
Research (Davies 2004; Stiggins et al. 2004) has shown that students will be motivated
and confident learners when they experience progress and achievement, rather than the
failure and defeat associated with being compared to more successful peers.

The key is to understand the relationship between assessment and student motivation. In
the past, we built assessment systems to help us dole out rewards and punishment. And
while that can work sometimes, it causes a lot of students to see themselves as failures. If
that goes on long enough, they lose confidence and stop trying. When students are
involved in the assessment process, though, they can come to see themselves as
competent learners (Sparks, 1999).

2. System Improvement

 Assessment can do more than simply diagnose and identify students' learning needs; it
can be used to assist improvements across the education system in a cycle of continuous
improvement.

 Students and teachers can use the information gained from assessment to determine their
next teaching and learning steps;
 Parents and families can be kept informed of upcoming plans for teaching and learning and the
progress being made, so they can play an active role in their children's learning;
 School leaders can use the information for school-wide planning, to support their teachers
and determine professional development needs;

 Communities and Boards of Trustees can use assessment information to assist their
governance role and their decisions about staffing and resourcing.
 The Education Review Office can use assessment information to inform their advice for
school improvement.
 The Ministry of Education can use assessment information to undertake policy review
and development at a national level, so that government funding and policy intervention
is targeted appropriately to support improved student outcomes.

Importance of Language Testing

In the teaching/learning process, language testing plays an important role. It helps language
teachers to place students at their appropriate levels, to diagnose the strengths and weaknesses
of the students, and to assess their performance during and at the end of the course. More
importantly, language testing can help in planning and managing language programmes. This is
a key issue, because planning depends on the success or failure of any language programme.

Language learning assessment serves one of two functions: either to measure the proficiency of
learners without reference to a language course, or to measure the extent to which they have
achieved the objectives of a particular learning program. Within the latter function, it is common
to differentiate between formative and summative evaluation. Formative evaluation takes place
during the course of learning to provide learners with feedback on their progress and alert the
teacher to any aspects of the course that may need adjustment; it is sometimes referred to as
'assessment for learning'. Summative evaluation takes place at the end of the course and seeks
to measure overall learning achievement; it is sometimes referred to as 'assessment of learning'.

LECTURE 2

TYPES OF TESTS

The general learning objective:


 To understand some kinds of tests.

The specific learning objectives:


 To be able to discuss how tests can be classified.
 To be able to describe each of them.
 To be able to describe some advantages and disadvantages of using them.

A. The Kinds of Tests

There are many kinds of tests, each with a specific purpose and a particular criterion to be measured.

The following describes the five test types that are in common use in language curricula (Brown,
2001).

(1) Proficiency Tests


Proficiency tests aim at tapping global competence in a language. A proficiency test is not
intended to be limited to any one course, curriculum or single skill in the language. Proficiency
tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary,
reading comprehension, aural comprehension, and sometimes a sample of writing.

(2) Diagnostic Tests


A diagnostic test is designed to diagnose a particular aspect of a language. A diagnostic test on
pronunciation might have the purpose of determining which phonological features of English are
difficult for a learner and should therefore become a part of a curriculum.

(3) Placement Tests


A placement test aims at placing a student into an appropriate level or section of a language
curriculum or school. A placement test typically includes a sample of material to be covered in
the curriculum.

(4) Achievement Tests


An achievement test is related directly to classroom lessons, units, or even a total curriculum.
Achievement tests are limited to particular material covered in a curriculum within a particular
time frame, and are offered after a course has covered the objectives in question. Their purpose is
to determine acquisition of course objectives at the end of a period of instruction.

(5) Aptitude Tests
An aptitude test is a test that predicts a person's future success. A language aptitude test is
designed to measure a person's capacity or general ability to learn a foreign language and to be
successful in that undertaking.

B. Subjective and Objective Testing

Subjective and objective testing are terms that refer to the scoring of tests. All test items, no
matter how they are devised, require candidates to exercise subjective judgement: the testee
must think what to say and then express his or her ideas as well as possible. In objective
testing, however, a testee will score the same mark no matter which examiner marks the test
(Tuckman, 1975). These kinds of tests are discussed further in the next chapter.

C. Teacher-Made Tests and Standardized Tests

A teacher-made test is a test designed by the teacher to meet his or her own course objectives.
The teacher prepares the test items to fit the instructional objectives and uses the items in a
classroom setting. Such a test is usually built without careful and thorough analysis of how the
test should be constructed. Teacher-made tests constitute a major portion of the school testing
program (Tuckman, 1975).

Standardized Tests or Tools

A standardized test is a test developed in such a way that it reaches a specified level of quality
or standardization. The test is standardized with respect to its form and construction,
administration procedure and test norms; that is, it is standardized in terms of its development,
administration, scoring and interpretation. The test is standardized to ensure objectivity,
reliability, validity and the other characteristics of a good test. A standardized test comes with a
manual, which instructs and guides users regarding its administration, scoring and
interpretation. The following are some important definitions that clarify the concept of a
standardized test as a tool of measurement and evaluation.

A standardized test is one for which comparative norms have been derived, reliability and
validity have been established, and directions for administration and scoring have been
prescribed (Ary et al., 1990). A standardized test is a test designed to provide a systematic
sample of individual performance, administered according to prescribed directions, scored in
conformance with definite rules, and interpreted in reference to certain normative information
(Tuckman, 1975). It
possesses three properties:
(1) Items that have been tried out, analyzed, and revised,
(2) Widespread and standard use and reuse,
(3) The availability and use of norms for interpretation (Tuckman, 1975).

According to C. V. Good, 'a standardized test is that for which content has been selected and
checked empirically, for which norms have been established, for which uniform methods of
administration and scoring have been developed and which may be scored with a relatively high
degree of objectivity.'

According to L. J. Cronbach, 'a standardized test is one in which the procedure, apparatus and
scoring have been fixed so that precisely the same test can be given at different times and
places.'

The most important benefit of a standardized test is that it minimizes or reduces four types of
error: personal error, variable error, constant error and interpretive error.

Characteristics of Standardized Tests


Following are the important characteristics of a standardized test:

• It has norms, which cover everything about the test, from its preparation to its scoring and
interpretation. The norms of the test describe every aspect of the test in detail so that any
user can use it properly.
• It has norms developed for transferring raw score to a standard score.
• Instruction for administration of the test is pre-determined and fixed.
• Duration of the test is fixed.
• The test is standardized on a suitable sample size selected from a well-defined population.
• It has high reliability and validity.
• The test has high objectivity.

D. Written Tests and Oral Tests

A written test is a test in which the test items and their answers are written, while an oral test is
a test in which the test items and their answers are given orally.

E. Individual Tests and Group Tests

An individual test is a test that can be administered to only one person at a time, whereas a group
test is a test that may be administered to a number of individuals at the same time by one
examiner (Tuckman, 1975). One example of a group test is a standardized test.

F. Criterion-Referenced Achievement Tests and Norm-Referenced Achievement Tests

A criterion-referenced achievement test is a test designed to measure the degree of proficiency
attained on a given set of objectives. One example is a teacher-made test. Such a test helps the
teacher to monitor student progress, diagnose strengths and weaknesses, and prescribe
instruction.

A norm-referenced achievement test is a test designed to provide a systematic sample of
individual performance, administered according to prescribed directions and interpreted in
reference to normative information.

G. Criterion-Referenced Vs. Norm-Referenced Tests

Norm-referenced Tests

To understand the norm-referenced type of evaluation, we first have to learn about the term 'norm'.
The term 'norm' has two meanings. One is the established or approved set of behavior or conduct to
be followed or displayed by all members of a family, society or any organization; these are the
established customs of the society which most people follow without question. The other meaning,
which is the relevant one here, is the average performance of the group.

Example: A group of students is tested for awareness of environmental pollution through a written
test. The test consists of 50 objective-type questions of one mark each, with no negative marking, so
the full mark of the test is obviously 50. After the test is conducted, it is marked by the examiner.
There are 150 students in the group. The marks of all students are added, and the total is divided by
150 to find the average performance of the group. Suppose it is found to be 30: 30 marks is then the
average obtained by the whole group, in which some students achieve 49 out of 50 and others achieve
much less, say 12 out of 50. The 30 marks, i.e., the average of the group, is said to be the norm of
this group.

Now, the evaluation of all 150 students is done with this 30 (the norm) as the point of reference.
All students who score above 30 are considered above average, all those who score below 30 are
considered below average, and all those who score exactly 30 are considered average. There is no
pass or fail in this type of evaluation, as there are no set marks for passing the test. This type of
evaluation is called norm-referenced evaluation, and the test is considered a norm-referenced test.
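The norm-referenced procedure just described can be sketched in a few lines of Python. The marks below are illustrative only, chosen so that the norm comes out at 30 as in the example:

```python
def norm_referenced(marks):
    """Classify each mark against the group average (the norm)."""
    norm = sum(marks) / len(marks)
    def label(mark):
        if mark > norm:
            return "above average"
        if mark < norm:
            return "below average"
        return "average"
    return norm, [label(m) for m in marks]

marks = [49, 30, 12, 35, 24]          # a small illustrative group, out of 50
norm, labels = norm_referenced(marks)
print(norm)    # 30.0, the norm of this group
print(labels)  # each student judged relative to the norm; no pass/fail line
```

Note that the labels depend entirely on the group: the same mark could be "above average" in one group and "below average" in another.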

Criterion-referenced Tests

The type of evaluation in which the performances of the testees are evaluated with reference to some
predetermined criteria is called criterion-referenced evaluation. No weight is given to the norm, or
average performance of the group, in this evaluation. All decisions, such as pass or fail, distinction,
excellent, etc., are taken with reference to criteria set out in advance.

In the preceding example, if criteria are set before the test, with reference to which the performance
of each student will be evaluated, it becomes criterion-referenced evaluation. Suppose the following
criteria are finalized for this test:
Pass marks: 40%    Distinction: 80%

In the test discussed above, all those students who get 20 or more marks (40%) are declared pass.
All those who score less than 20 are declared fail. All those who get 40 or more (80%) are awarded
distinction. If a prize is given to those who score at least 90%, then only the students who get 45
or more will receive the prize. As all decisions are taken on the basis of preset criteria, this
evaluation is called criterion-referenced evaluation. Let us now look at how a criterion-referenced
evaluation can be constructed.
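A minimal sketch of these criterion-referenced decisions follows, with the thresholds taken from the example above; the cut-offs are fixed in advance and the group average plays no role:

```python
FULL_MARK = 50
PASS_PCT, DISTINCTION_PCT, PRIZE_PCT = 40, 80, 90   # criteria set in advance

def criterion_referenced(mark):
    """Grade a mark against fixed criteria, ignoring the group average."""
    pct = 100 * mark / FULL_MARK
    if pct < PASS_PCT:
        return "fail"
    if pct >= PRIZE_PCT:
        return "distinction (prize)"
    if pct >= DISTINCTION_PCT:
        return "distinction"
    return "pass"

for mark in (12, 20, 40, 45):
    print(mark, criterion_referenced(mark))
# 12 -> fail, 20 -> pass, 40 -> distinction, 45 -> distinction (prize)
```

Unlike the norm-referenced sketch, each student's result here is independent of how the rest of the group performed.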

H. Paper And Pencil Tests Vs. Performance Based Test

Paper-and-pencil test refers to traditional formats of student evaluation, such as written tests, and
also to standardized tests that ask students to use pencils on a scannable answer sheet to fill in
bubbles. Standardized tests are now commonly administered on computers, but students usually
need to submit written answers on paper for classroom evaluation. Paper-and-pencil test often
refers to objectively scored tests in the classroom, which are intended to measure memorized
knowledge and lower levels of understanding, compared with performance-based evaluation,
which is intended to measure deeper understanding through skills and ability.

I. Class Progress Test (Formative/ Continuous Assessment)

Progress tests are longitudinal, feedback-oriented educational evaluation tools for assessing the
development and sustainability of cognitive knowledge during a learning process. A progress test is
a written knowledge exam (usually involving multiple-choice questions) that is typically
administered to all students in a program at the same time and at regular intervals (usually two to
four times a year) throughout the entire academic program. Test scores reveal the differences
between students' knowledge levels: the higher the score, the further a student has progressed in
the curriculum. These scores therefore provide a longitudinal, repeated, curriculum-independent
evaluation of the knowledge objectives of the program as a whole.

J. Selection Test
Selection tests aim to measure the abilities and skills in a worker that job analysis has determined
to be essential for successful job performance. Such a test is an instrument designed to measure
the psychological factors that have been selected.

K. Discrete-point Tests and Integrative Tests

Discrete-point testing refers to testing one element at a time, item by item; for example, a series of
items might each test a specific grammatical structure (Hughes, 2003, Testing for Language
Teachers). The discrete-point test answers to the underlying assumption that language can be
broken down into its component parts and those parts tested in turn. These components, together
with subcategories within them, are the four skills (listening, speaking, reading, writing) and the
various linguistic components (phonology, graphology, spelling, grammar, morphology, syntax and
vocabulary). Tests are therefore designed to evaluate only one of these elements at a time (RESLA,
1996, Testing English as a Foreign Language: An Overview and Some Methodological
Considerations).

BACHELOR OF EDUCATION_TESL
MODULE BAE2633_LANGUAGE TESTING AND ASSESSMEN) Lecture 2 / Page 5
Tests for discrete items (or discrete points) are tests that check one language element at a time.
For example, the learner's understanding of the correct form of the verb 'sing' is tested by the
following multiple-choice item:
When I was a child I .......... in a choir.
a. sing   b. singed   c. song   d. sung   e. sang

In terms of marking, discrete-point tests have the advantage of often being practical to administer
and mark, and objective. They show, however, only the ability of learners to recognize or produce
individual items, not how they would use the language in real communication. In other words, they
are inevitably indirect tests: they provide proof of the ability of learners to recognize or produce
certain specific language elements, but do not show how they could actually use them.

Integrative tests, on the other hand, may be direct or indirect. The term integrative indicates that
more than one skill and/or knowledge item is being tested at a time. Dictation is an integrative test
because it involves the ability to listen, the ability to write, the recognition of specific language
items, and grammar (for example, distinguishing whether /əv/ should be written as having or of).
However, dictation is still an indirect test.

Many integrative tests, however, are direct tests as well: they ask learners to demonstrate their
ability to perform a specific communicative "real-life" task by actually doing it. They therefore
demonstrate the ability of learners to use language in real communication.

L. Direct and Indirect Tests

In assessment, most test items fall into two categories: direct and indirect test items. Direct test
items ask the student to complete a genuine action of some sort, while indirect test items measure
a student's understanding of a topic. Examples of both kinds of test items are provided below.

Direct Test Items


Direct test items use authentic evaluation approaches:
- For speaking: interviews and lectures
- For writing: essay questions
- For reading: using authentic reading material and having the student respond orally and/or in writing to questions
- For listening: following oral instructions to complete a task

The primary objective of direct test items is to be as close to real life as possible. Direct test
items are often integrative, meaning that the student has to apply several abilities at once.
Presentations, for example, involve not just speaking but also writing the speech, reading or
memorizing it, and developing it with critical-thinking skills.

Indirect Test Items
Without authentic application, indirect test items assess knowledge. Some common examples of
indirect test items are provided below.

 Multiple choice questions


 Cloze items
 Paraphrasing
 Sentence re-ordering

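One indirect format in the list above, the cloze item, can be sketched mechanically (a hypothetical generator, not a tool the text names): a fixed-ratio cloze deletes every nth word of a passage and keeps the deleted words as the answer key.

```python
def make_cloze(text: str, nth: int = 5):
    """Delete every nth word, returning the gapped text and the answer key."""
    words = text.split()
    answers = {}
    for i in range(nth - 1, len(words), nth):
        answers[i] = words[i]   # remember the deleted word by position
        words[i] = "____"
    return " ".join(words), answers

passage = "When I was a child I sang in a school choir every single week"
gapped, key = make_cloze(passage, nth=5)
print(gapped)  # every fifth word is replaced by a blank
```

Because the test-taker must use surrounding grammar and meaning to restore each word, the item taps several knowledge sources at once, yet it still measures language use only indirectly.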
LECTURE 3

Types of Classroom Assessments


Learning Outcome

At the end of this lesson, you will be able to:


1. Explain the role of the assessor in classroom testing.
2. Clarify the difference between Assessment for Learning and Assessment as Learning.
3. Discuss the functions of the main types of classroom assessment.

 Classroom assessment is a systematic process of gathering information about what a


student knows, is able to do, and is learning to do. Assessment information provides the
foundation for decision-making and planning for instruction and learning. Assessment is
an integral part of instruction that enhances and empowers student learning.

 Using a variety of assessment techniques, teachers gather information about what


students know and are able to do, and provide positive, supportive feedback to students.
They also use this information to diagnose individual needs and to improve their
instructional programs, which in turn helps students learn more effectively.

1. Teacher’s Role in Classroom Assessment


 In classroom-based language assessment, the teachers are the ones responsible for
facilitating student learning and obtaining information about their progress and
achievement.

 From planning what and how to assess, through implementing assessment procedures and
monitoring students’ performances to recording students’ attainment and progress, the
teacher is constantly making decisions on how to keep track of students’ progress and
attainment; either through a formal assessment procedure or through informal daily
monitoring and observation.

In the classroom, teachers are the primary assessors of students. In direct tests (e.g.,
speaking tests), raters are influenced by personal and contextual factors. How can this be
avoided?

The ways in which classroom-based language assessment is used, and the processes involved,
are thus heavily influenced by the practices of the teachers.

 Teachers design assessment tools with two broad purposes:


- to collect information that will inform classroom instruction, and
- to monitor students’ progress towards achieving year-end outcomes.

Teachers also assist students in developing self-monitoring and self-assessment skills and
strategies. To do this effectively, teachers must ensure that students are involved in setting
learning goals, developing action plans, and using assessment processes to monitor their
achievement of goals. Teachers also create opportunities for students to celebrate their progress
and successes.
 Teachers learn about students’ learning and progress by regularly and systematically
observing students in action, and by interacting with them during instruction. Because
students’ knowledge and many of their skills, strategies, and attitudes are internal
processes, teachers gather data and make judgments based on observing and assessing
students’ interactions, performances, and products or work samples.

 Teachers demonstrate that assessment is an essential part of learning. They model


effective assessment strategies. Teachers also collaborate with parents and with
colleagues regarding student assessment.

2. Types of Classroom Assessment

Assessment is integral to the teaching–learning process, facilitating student learning and
improving instruction, and can take a variety of forms. Classroom assessment is generally
divided into three types:
 Assessment for Learning
 Assessment of Learning
 Assessment as Learning.

2.1. Assessment for Learning (Formative Assessment)

What is Formative Assessment?


Formative assessment is, at its heart, about what students will learn and how their learning
progresses. It is also a tool for teachers and students to communicate their progress and needs
during a lesson. Think of it as a qualitative process, focused less on grades than on the needs,
understanding, and progress of each student. The key to formative assessment is using what it
reveals to help teachers build a more complete picture of their classroom and of their students'
level of comprehension.

Formative Assessment as a Communication Tool


Formative assessment is about feedback and communication. It provides teachers with the feedback
that is essential for addressing and supporting students' needs, while also helping students
identify their own progress.

This is where formative assessment really shines: it closes the communication loop between the
student and the teacher, giving both the feedback they need to gauge and respond to how well the
material in class is being understood.

It’s also where peer feedback becomes an essential tool, because this kind of feedback supports
and encourages metacognition and mastery. When peer feedback happens in the draft stages of
assignments students know there is still room for improvement and can find the feedback less
threatening and more of a tool for success. Using tools like Peergrade gives teachers an overview
of student progress and lets students evaluate their own work.

Bringing Formative Assessment into Your Classroom


Although it is an informal tool, formative assessment requires a bit of thought to include in the
classroom, one reason being that students need to know how they will be measured and evaluated.

The goal of formative assessment is to monitor student learning to provide ongoing feedback
that can be used by instructors to improve their teaching and by students to improve their
learning. More specifically, formative assessments:
 help students identify their strengths and weaknesses and target areas that need work
 help faculty recognize where students are struggling and address problems immediately

Formative assessments are generally low stakes, which means that they have low or no point
value. Examples of formative assessments include asking students to:
 draw a concept map in class to represent their understanding of a topic
 submit one or two sentences identifying the main point of a lecture
 turn in a research proposal for early feedback

The purpose of formative assessment is to monitor student learning and provide ongoing feedback to
staff and students. It is assessment for learning. If designed appropriately, it helps students
identify their strengths and weaknesses, and can enable them to improve their self-regulatory
skills so that they manage their education less haphazardly. It also tells the faculty which areas
students are struggling with, so that sufficient support can be put in place.

Formative assessment can be tutor-led, peer assessment, or self-assessment. Formative assessments
have low stakes and usually carry no grade, which in some instances may discourage students from
doing the task or engaging with it fully.
Assessment for learning is ongoing assessment that allows teachers to monitor students on a day-
to-day basis and modify their teaching based on what the students need to be successful. This
assessment provides students with the timely, specific feedback that they need to make
adjustments to their learning.
After teaching a lesson, we need to determine whether the lesson was accessible to all students
while still challenging to the more capable; what the students learned and still need to know; how
we can improve the lesson to make it more effective; and, if necessary, what other lesson we
might offer as a better alternative. This continual evaluation of instructional choices is at the
heart of improving our teaching practice.
(Burns 2005, p. 26).
Formative Assessment is:
- A diagnostic tool which provides feedback, allowing a teacher to modify their lesson plans to meet their students' needs.
- A quick and simple way to check in and identify areas where students are struggling, allowing extra support and care to be given to those who need it during the lesson.
- A side-step from grade-oriented thinking, focusing more on process and mastery, and making lessons more about self-discovery, curiosity, and learning.
- A way of giving students ownership of their own learning.

The benefits of Formative Assessment:


 Helps provide structure and clarifies expectations of performance and learning.
 Strengthens student agency and metacognition.
 Students are involved in their learning process, and also in the teaching process through
peer feedback and support.
 Encourages dialogue and opens up communication.
 It helps students reach their learning goals!

2.2. Assessment of Learning (Summative Assessment)

The goal of summative assessment is to evaluate student learning at the end of an instructional
unit by comparing it against some standard or benchmark.

Summative assessments are often high stakes, which means that they have a high point value.
Examples of summative assessments include:
 A Midterm Exam
 A Final Project
 A Paper
 A Senior Recital

Information from summative assessments can be used formatively when students or faculty use it
to guide their efforts and activities in subsequent courses.
Assessment of learning is the snapshot in time that lets the teacher, students and their parents
know how well each student has completed the learning tasks and activities. It provides
information about student achievement. While it provides useful reporting information, it often
has little effect on learning.
Summative assessments often have high stakes and are frequently treated by students as taking
priority over formative assessments.
An over-reliance on summative assessment at the conclusion of an element of study gives students a
grade, but provides very little feedback to help them develop and improve before they reach the
end of the module or programme. Achieving a balance between formative and summative assessment is
therefore important, although it is one that students do not always fully grasp or take seriously.
Formative assessments provide a highly effective, risk-free environment in which students can
learn and experiment. They also provide a useful lead-in to summative assessments, so long as
feedback is given.

Summative Assessment
Because summative assessments are usually higher-stakes than formative assessments, it is
especially important to ensure that the assessment aligns with the goals and expected outcomes
of the instruction.

 Use a Rubric or Table of Specifications - Instructors can use a rubric to lay out
expected performance criteria for a range of grades. Rubrics will describe what an ideal
assignment looks like, and “summarize” expected performance at the beginning of term,
providing students with a trajectory and sense of completion.

Design Clear, Effective Questions - If designing essay questions, instructors can ensure
that questions meet criteria while allowing students freedom to express their knowledge

creatively and in ways that honor how they digested, constructed, or mastered meaning.
Instructors can read about ways to design effective multiple-choice questions.

 Assess Comprehensiveness - Effective summative assessments provide an opportunity


for students to consider the totality of a course's content, making broad connections,
demonstrating synthesized skills, and exploring the deeper concepts that underpin a
course's ideas and content.

Make Parameters Clear - When approaching a final assessment, instructors can ensure
that parameters are well defined (length of assessment, depth of response, time and date,
grading standards); knowledge assessed relates clearly to content covered in course; and
students with disabilities are provided required space and support.

 Consider Blind Grading - Instructors may wish to know whose work they grade, in
order to provide feedback that speaks to a student’s term-long trajectory. If instructors
wish to provide truly unbiased summative assessment, they can also consider a variety
of blind grading techniques.

2.3. Assessment as Learning

Assessment as Learning develops and supports students' metacognitive skills. This form of
assessment is crucial in helping students become lifelong learners. As students engage in peer
and self-assessment, they learn to make sense of information, relate it to prior knowledge and use
it for new learning. Students develop a sense of ownership and efficacy when they use teacher,
peer and self-assessment feedback to make adjustments, improvements and changes to what they
understand.
Informal vs. Formal Assessment
First, let us define the term assessment. Assessment is the process of observing a sample of a
student's behavior and drawing inferences about the student's knowledge and abilities. Many
near-synonyms exist for assessment, such as test and exam.

When using assessments, teachers are looking at students' behavior. We cannot see inside a
student's head to determine what is going on, so we must sample their behavior over time in order
to infer their knowledge and development. The inferences drawn are, moreover, only that:
inferences.

Educators must use a variety of assessment types in order to draw the most accurate inference
about students' overall progress. They should keep in mind that assessments are tools which are
only useful insofar as they are aligned with the circumstances in which they are used.
For example, a written assessment to determine how well a student can keep a beat in a music
class makes no sense and would therefore be an inappropriate tool.

There are two overarching types of assessment in educational settings: informal and formal
assessments. Both types are useful when used in appropriate situations. Informal
Assessments are those assessments that result from teachers' spontaneous day-to-day
observations of how students behave and perform in class. When teachers conduct informal
assessments, they don't necessarily have a specific agenda in mind, but are more likely to learn
different things about students as they proceed through the school day naturally. These types of
assessments offer important insight into a student's misconceptions and abilities (or inabilities)
that might not be represented accurately through other formal assessments. For example, a
teacher might discover that a student has a misconception about other cultures and languages
when she asks, 'What language do people in North Carolina speak?' Or, the teacher may wonder
if Alex needs to make an appointment to have his hearing checked if he constantly says 'What?'
or 'I didn't hear you.'

Formal Assessments on the other hand, are preplanned, systematic attempts by the teacher to
ascertain what students have learned. The majority of assessments in educational settings are
formal. Typically, formal assessments are used in combination with goals and objectives set
forth at the beginning of a lesson or the school year. Formal assessments are also different from
informal assessments in that students can prepare ahead of time for them.

LECTURE 4
Principles and Qualities of Language Tests

The general learning objective:


 To understand the basic notions of the qualities or criteria of a good test

The specific learning objectives:


 To be able to mention the qualities of a good test
 To be able to describe each of the qualities
 To be able to analyze a test based on the qualities

1. The Essential Qualities of a Good Test

The following are the characteristics of a good test:

- Objectivity: An evaluation tool should be objective in nature. The test should yield almost the same score for an individual, irrespective of the examiner who scores it. It should be free from all kinds of bias at all levels of testing.

- Reliability: Reliability is the consistency of the scores obtained by an individual at different times on the same test or on a parallel test, scored by the same or different examiners at the same or different times. Any difference in scores should be insignificant.

- Validity: An evaluation tool should measure what it is supposed to measure. If a tool is developed to measure a particular aspect of personality, it should measure that aspect only and nothing else. If the test does so, it is valid.

- Practicability: An evaluation tool should be practical and useful in terms of time, energy, and resources. It should also be easy to administer.

The aspects which affect the characteristics of a good test are as follows:
 Validity of the Test
 Reliability of the Test
 Objectivity of the Test
 Usability of the Test
 Comprehensive and Preciseness of the Test
 Administration of the Test
 Test from Economic Viewpoint
 Availability of the Test
 Appearance of the Test
 Standardization of the Test
 Norms of the Test

Some of the important characteristics of a good test are analyzed below:

Validity:

The most important quality of test interpretation or use is validity, or the extent to which the
inferences or decisions we make on the basis of test scores are meaningful, appropriate, and
useful. In order for a test score to be a meaningful indicator of a particular individual's
ability, we must be sure it measures that ability and very little else (Bachman, 1990).

Validity is concerned with relevance: does the test actually measure what we want it to measure,
and does it do so well enough for us to have faith in the results? (Bell, 1981)

There are four kinds of validity;

1. Content Validity:
If the tasks which the candidates are required to perform in the test are a true reflection of the
skills which are actually required in real life, the test can be said to have content validity.

If you are trying to assess a person's ability to speak a second language in a conversational
setting, a test that asks the learner to answer paper-and-pencil multiple-choice questions
requiring grammatical judgements does not achieve content validity. Instead, a test that
requires the learner actually to speak within some sort of authentic context can be said to have
content validity.

2. Construct Validity:
If the test is able to satisfy some previously stated theoretical requirements, it can be said to
possess construct validity (Bell, 1981). One way to look at construct validity is to ask the
question “Does this test actually tap into the theoretical construct as it has been defined?”.
“Proficiency” is a construct. “Communicative competence” is a construct. Virtually every issue
in language learning and teaching involves theoretical constructs (Brown, 2001).

3. Empirical Validity:
If the test results correlate positively and strongly with some trustworthy external criterion, the
test can be said to have empirical validity.

Time is normally taken as the key variable here (whether the correlation is carried out
simultaneously or with some subsequent criterion), and this provides a subdivision of empirical
validity into two varieties:
 Concurrent (or status) Validity
A typical example might be where a group of students is given the test and is
immediately rated by an experienced teacher.

 Predictive Validity
We might give the students the test and, after a period of time has passed, have them
rated again in some way.

4. Face validity:
If the test is accepted as appearing to be appropriate by those who administer it and those who
take it, it can be said to have face validity (Bell, 1981).

Validity of a test refers to its truthfulness: the extent to which a test measures what it intends
to measure. Standardization of a test requires this important characteristic. If the objectives of
a test are fulfilled, we can say that the test is a valid one. The validity of a test is determined
by measuring the extent to which it matches a given criterion. For example, suppose we want to know
whether an 'achievement test in mathematics' is valid. If it really measures the achievement of
students in mathematics, the test is said to be valid; otherwise it is not. 'Validity' thus refers
to the very purpose of a test, and hence it is the most important characteristic of a good test.
A test may have other merits, but if it lacks validity, it is valueless.

Freeman states, 'an index of validity shows the degree to which a test measures what it is supposed
to measure when compared with the accepted criteria'. Lee J. Cronbach held the view that validity
'is the extent to which a test measures what it purports to measure'.

Reliability:

Reliability refers to the consistency of scores obtained by the same individuals when they are
re-examined with the same test on different occasions, with different sets of equivalent items, or
under other variable examining conditions. Reliability paves the way for the consistency that
makes validity possible, and identifies the degree to which various kinds of generalizations are
justifiable. In short, it concerns the consistency of measurement: how stable test scores or other
assessment results are from one measurement to another.

Reliability also refers to the extent to which a measuring device yields consistent results upon
testing and retesting: if a measuring device measures consistently, it is reliable. The reliability
of a test is the degree to which the result obtained is free from measurement or chance errors. For
instance, suppose we administer an achievement test in mathematics to students of class IX, and
Paresh scores 52 marks. After a few days, we administer the same test. If Paresh scores 52 marks
again, we consider the test to be reliable, because we feel that it accurately measures Paresh's
ability in mathematics. H.E. Garrett stated, 'the reliability of a test or any measuring instrument
depends upon the consistency with which it gauges the ability of those to whom it is applied'. The
reliability of a test can also be defined as 'the correlation between two or more sets of scores on
equivalent tests from the same group of individuals'.

Reliability is a quality of test scores, and a perfectly reliable score would be one which is free
from errors of measurement. There are many factors other than the ability being measured that can
affect performance on tests and that constitute sources of measurement error. Individuals'
performance may be affected by differences in testing conditions, fatigue, and anxiety, and they
may thus obtain scores that are inconsistent from one occasion to the next. If, for example, a
student receives a low score on a test one day and a high score on the same test a few days later,
the test does not yield consistent results, and the scores cannot be considered reliable
indicators of the individual's ability (Bachman, 1990).

Reliability thus has to do with the consistency of measures across different times, test forms,
raters, and other characteristics of the measurement context.
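The definition of reliability as 'the correlation between two or more sets of scores' can be made concrete with a small sketch. The scores below are invented for illustration; the Pearson correlation formula itself is standard.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical marks of the same ten students on two sittings of one test.
first_sitting  = [52, 47, 60, 35, 70, 55, 48, 62, 40, 58]
second_sitting = [50, 49, 58, 38, 68, 57, 46, 60, 43, 55]

# A coefficient close to 1.0 suggests the test ranks students consistently;
# a low value points to measurement error (conditions, fatigue, anxiety).
r = pearson(first_sitting, second_sitting)
print(round(r, 3))
```

Here the two sittings agree closely, so the coefficient comes out near 1.0; scores scattered by chance error would pull it towards zero.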

The Practicality:

Practicality concerns the usability of the test. Two parameters appear to be involved:
- Economy: the cost in time, money, and personnel of administering a particular test.
- Ease: the degree of difficulty experienced in administering and scoring the test and in interpreting the test results.

Finally, the ideal test would be one which was reliable, in that it provided dependable
measurements; valid, in that it not only measured what it was supposed to measure but also
supported what we already believed about the nature of language and of learning and agreed with
trustworthy outside criteria; and looked as though it did all these things. In addition, it would
be cheap and easy to use (Bell, 1981).

Objectivity:

Objectivity is an important characteristic of a good test. Without objectivity, the reliability and
validity of a test is a matter of question. It is a pre-requisite for both validity and reliability.
Objectivity of a test indicates two things— item objectivity and scoring objectivity.

'Item objectivity' means that an item must call for a definite single answer. In an objective-type
question, a definite answer is expected from the test-takers. While framing the questions, some
pitfalls to be avoided are ambiguous questions, lack of proper direction, double-barrelled
questions, questions with double negatives, etc., as these flaws undermine the objectivity of a
test. Let us take an example. Suppose we ask students to write about Gandhi. This question lacks
objectivity, because the answers, and their evaluation, will differ from one individual to
another. If instead we ask the students 'What was Gandhi's father's name?', this obviously has
only one answer, and even the bias of the evaluator will not affect the scoring. So, all the
items of a test should be objective.

Objectivity of scoring means that whoever scores the test paper, the candidate should obtain the
same score; the subjectivity, personal judgment, or bias of the scorer should not affect the
scores. Essay-type questions are subjective, and their scores are affected by a number of factors
such as the mood of the examiner, his language, and his biases. Essay-type questions can achieve
objectivity if a scoring key and proper directions for scoring are provided.
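A minimal sketch of how a scoring key makes marking objective (the items, accepted answers, and marks below are hypothetical): because the acceptable answer and mark for each item are fixed in advance, every scorer reaches the same total.

```python
# Hypothetical scoring key: one accepted answer and a fixed mark per item.
SCORING_KEY = {
    "q1": ("karamchand", 1),  # 'What was Gandhi's father's name?'
    "q2": ("1869", 1),        # 'In which year was Gandhi born?'
}

def score(paper: dict) -> int:
    """Total marks awarded strictly by the key, independent of the scorer."""
    total = 0
    for item, (accepted, marks) in SCORING_KEY.items():
        if paper.get(item, "").strip().lower() == accepted:
            total += marks
    return total

print(score({"q1": "Karamchand", "q2": "1870"}))  # 1
```

Note that an open question such as "Write about Gandhi" could not be scored this way, which is precisely why it lacks objectivity.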

Usability:

Usability of a test refers to the practicability of a test. It refers to the degree to which the test can
be successfully used by the teachers/evaluators. Usability of a test depends on certain aspects
which are expressed in the following manner:

(a) Comprehensibility: The test items should be free from ambiguity and the direction to the
test items and other directions to the test must be clear and understandable. The directions
for scoring and the interpretation of scores must be within the comprehension of the user.

(b) Ease of administration: If the directions for administration are complicated, or if
administration demands much time and labour, users may be reluctant to use such tests. The
directions for administration must be clear and concise. The test should be constructed
according to the time available; lengthy tests involving more time may not be preferred for
use.
(c) Availability: If a test is not available at the time of need, it lacks usability. Many
standardized tests have high validity and reliability, but their availability is limited. It
is therefore desirable that, to be usable, tests are readily and easily available.

(d) Cost of the test: The test must be reasonably priced, so that schools and teachers can
afford to purchase and use it. If it is costly, not every school will be able to obtain it.
A good test should therefore be of reasonable price.

(e) Ease of interpretation: A test is considered good if the scores obtained can be easily
interpreted. For this, the test manual should provide age norms, grade norms, percentile
norms, and standard-score norms such as standard scores, T-scores, and Z-scores.
'Interpretability' of a test thus refers to how readily the raw scores can be converted and
understood.

(f) Ease of scoring: To be usable, a test must ensure ease of scoring; the scoring procedure
should be simple. All the directions for scoring and the scoring key should be available, to
make the scoring objective. Neither the examiner's bias nor the examinee's handwriting should
affect the scoring of a test.
Backwash

The backwash effect (also referred to as the washback effect) is the impact of a test on how
students are taught (e.g. the teaching mirrors the test because teachers want their students to
pass). Washback may be positive or negative. Positive washback occurs when there is harmony
between the teaching and students' performance in the examination or class test. When there is no
such alignment between what is taught and what is tested (for example, when teaching narrows down
to the test content), negative washback occurs. Both types of washback influence the teaching as
well as the learning process.

Discrimination

All evaluation is based on comparison: either between one student and another, or between students
as they are now and as they were before. The ability to discriminate between the performances of
different learners, or of the same learner at different points in time, is an important feature of
a good test. The extent to which discrimination is needed varies with the purpose of the test.
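One common way of quantifying this for an individual item is the upper-lower discrimination index, D = (U - L) / n, where U and L are the numbers of correct answers in the top-scoring and bottom-scoring groups of size n. This is a standard item-analysis statistic, offered here as an assumption since the text does not prescribe a formula.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """D = (U - L) / n. Values near +1 mean the item separates stronger
    from weaker learners well; values near 0 (or negative) mean it does not."""
    return (upper_correct - lower_correct) / group_size

# 9 of the top 10 scorers answered the item correctly, but only 3 of the bottom 10.
print(discrimination_index(9, 3, 10))  # 0.6
```

An item that everyone (or no one) answers correctly yields D = 0 and contributes nothing to separating learners, however valid its content.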

QUESTIONS AND ASSIGNMENTS

Answer the following questions briefly.

1. Mention the three qualities of a good test.


2. Explain what the test validity means.
3. Mention the four kinds of the test validity and explain each of them.
4. Explain what the test reliability means.
5. Explain what the practicality of a test means.
6. What are the two parameters involved in the test practicality?
7. Does a test having a good validity always have a good reliability? Why/Why not? Illustrate
your answer.
8. Does a test which has its perfect validity and reliability always have a good practicality?
9. A dictation and a cloze test were administered as a placement test for an English course. What
do you think of this choice of tests?
MID EXAM
After completing the above four topics, students will take the Mid Exam.
LECTURE 5
Types of Classroom Test Items (Subjective Test Items)
The general learning objective:

• To understand objective and subjective testing

The specific learning objectives:

• To be able to explain the meaning of objective and subjective testing
• To be able to mention some kinds of objective and subjective testing
• To be able to describe some advantages and disadvantages of using them.
1. Objective Testing
Objective tests usually take the form of short-answer items. Short-answer items include the
following (Tuckman, 1975):
Type 1. Objective Type Tests:

Objective type test items are highly structured. They require the pupil to supply a word or two,
or to select the correct answer from a number of alternatives. The answer to each item is fixed.
Objective type items are the most efficient for measuring different instructional objectives.
They are also called 'new type tests', designed to overcome some of the major limitations of
traditional essay type tests.
Objective Type Tests have proved their usefulness in the following ways:
• They are more comprehensive: a large number of items can cover a wide range of the
syllabus.
• They possess objectivity of scoring. The answer to each item is fixed and predetermined,
so different persons scoring the answer script arrive at the same result.
• They are easy to score. Scoring is done with the help of a scoring key or a scoring
stencil, so even a clerk can do the job.
• They are easy to administer.
• They can be standardized.
• They are time saving.
• They can measure a wide range of instructional objectives.
• They are highly reliable.
• They are very economical.
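The point about objectivity of scoring (a fixed, predetermined key that any scorer applies identically) can be sketched in a few lines of Python. This is an illustration only; the item numbers, answers and function name are invented and do not come from the manual.

```python
# Illustrative sketch of objective scoring with a fixed key: because each
# answer is predetermined, any scorer (or a script) arrives at the same result.
# The key and the pupil's responses below are fabricated examples.

KEY = {1: "B", 2: "D", 3: "A", 4: "C"}

def score(responses, key=KEY):
    """Count responses that match the predetermined key exactly."""
    return sum(1 for item, answer in key.items() if responses.get(item) == answer)

pupil = {1: "B", 2: "D", 3: "C", 4: "C"}
print(score(pupil))  # the pupil matches the key on 3 of the 4 items
```

Because the comparison with the key is mechanical, the same script scored twice, or by two different people following the key, yields the same mark; this is exactly what the manual means by objectivity of scoring.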
Objective Type Tests can be classified into two broad categories according to the nature of
responses required by them:
(A) Supply/Recall Type
(B) Selection/Recognition Type

(A) Supply/Recall Type:


Supply type items are those in which the answer is not given in the question. The student
supplies the answer in the form of a word, phrase, number or symbol. These items are also
called 'free response' items.

According to the method of presenting the problem, these items can be divided into two
types:
1. Short Answer Type
2. Completion Type

Example:
1. Short answer type:
In which year was the first battle of Panipat fought? (1526 A.D.)

2. Completion type:
The first battle of Panipat was fought in the year ________. (1526 A.D.)
In the first case the pupil has to recall a response from past experience in answer to a direct
question. These questions are useful in mathematics and physical science. In the second case
the pupil is asked to supply a word or words missing from a sentence. Thus, in the completion
type, a series of statements is given in which certain important words or phrases have been
omitted, and blanks are supplied for the pupils to fill in.

Principles of Constructing Recall Type Items:

If recall type items are constructed according to the following principles, they will be more
effective and will function as intended.

1. The statement of the item should be so worded that the answer will be brief and
specific:
The statement of the problem should be such that it conveys directly and specifically what
answer is intended from the student.

2. The statement of the item should not be taken directly from the text book:
When direct statements from text books are used to prepare a recall type item, the item often
becomes general and ambiguous.

3. While presenting a problem, preference should be given to a direct question over an
incomplete statement:

4. When the answer is a numerical unit, the type of answer wanted should be indicated:
When learning outcomes such as knowing the proper unit or the proper amount are expected, it
must be clearly stated in which unit the pupils should express their answer. Especially in
arithmetical computations, the unit in which the answer is to be expressed must be indicated.

5. The length of the blanks for answers should be equal in size and in a column to the
right of the question:
If the lengths of the blanks vary according to the length of the answer then it will provide clues
to the pupils to guess the answer. Therefore the blanks of equal size should be given to the right
hand margin of the test paper.

6. One completion type item should include only one blank:

Too many blanks affect the meaning of the statement and make it ambiguous, so completion
type items should not include many blanks.

Uses of Recall Type Items:

Several learning outcomes can be measured by the recall type items.
Some common uses of Recall Type Items are the following:
• To measure knowledge of terminology.
• To measure knowledge of specific facts.
• To measure knowledge of principles.
• To measure knowledge of methods and procedures.
• To measure the ability to interpret simple data.
• To measure the ability to solve numerical problems.

Advantages of Recall Type Items:

• They are easy to construct.
• Students are familiar with recall type items from day-to-day classroom situations.
• Recall type items have high discriminating value.
• In well-prepared recall type items, guessing is minimized.

Limitations of Recall Type Items:

• These items are not suitable for measuring complex learning outcomes.
• Unless care is exercised in constructing recall items, the scoring is apt to be
subjective.
• It is difficult to measure complete understanding with simple recall and completion
type items.
• The student may know the material being tested but have difficulty recalling the exact
word needed to fill in the blank.
• Misspelt words sometimes make it difficult for the teacher to judge whether the pupil
answered the item correctly.
• Simple recall items tend to over-emphasize verbal facility and the memorization of facts.

(B) Selection/Recognition Type:

In recognition type items the answer is supplied to the examinee along with some distracters.
The examinee has to choose the correct answer from among them, so these tests are known as
'selection type'. As the answer is fixed and given, some call them 'fixed response type' items.

The Recognition Type Test items are further classified into following types:
1. True-False/Alternate Response Type
2. Matching Type
3. Multiple Choice Types
4. Classification or Rearrangement Type.

1. True-False Items:

True-false items, otherwise known as alternate response items, consist of a declaratory statement
or a situation which the pupil is asked to mark true or false, right or wrong, correct or incorrect,
yes or no, agree or disagree, etc. Only two possible choices are given to pupils. These items
measure the ability of the pupil to identify correct statements of facts, definitions of terms,
statements of principles and the like.

Principles of Constructing True-False Items:

While formulating the statements of true-false items, the following principles should be
followed so that the items are free from ambiguity and unintentional clues.

(i) Determiners that are likely to be associated with a true or false statement must be
avoided:
Qualifiers like usually, generally, often and sometimes give a clue that the statement may be
true. Words like always, never, all, none and only, which generally appear in false statements,
give students a clue in responding.

(ii) Those statements having little learning significance should be avoided:


The statements having little significance sometimes compel the students to remember
minute facts at the expense of more important knowledge and understanding.

(iii) The statements should be simple in structure:

While preparing statements for true-false items, long, complex sentences should be avoided,
because sentence complexity acts as an extraneous factor that interferes with measuring
knowledge or understanding.

(iv) Negative statements, especially double negative statements should not be used:
Double negative statements make the item very much ambiguous. Sometimes it is found
that the students overlook the negative statements.

(v) The item should be based on a single idea:

One item should include only one idea. We can obtain an efficient and accurate
measurement of students' achievement by testing each idea separately.

(vi) False statements should be more in number than true statements:

Pupils are more inclined to accept than to challenge; therefore, by giving more false statements
we can increase the discriminating power of the test and reduce guessing.

(vii) The length of the true statements and false statements should be equal in size:
Uses of True-False Items:
True-false items are useful for measuring varied instructional objectives. Some of their
common uses are given below.
i. To measure the ability to identify the correctness of statements, facts, definitions of
terms, etc., and to distinguish facts from opinion.
ii. To measure knowledge concerning the beliefs held by an individual or the values
supported by an organization or institution.
iii. To measure the understanding of cause and effect relationships.
iv. To measure the students' ability for logical analysis.

Advantages of True-False Items:

i. True-false items provide a simple and direct means of measuring essential outcomes.
ii. All the important learning outcomes can be tested with true-false items as well as with
other objective type items.
iii. The probability of an examinee achieving a high score on a true-false test by guessing
blindly is extremely low.
iv. Few statements are taken directly from text books.
v. They have strong discriminating power.
vi. They are easy to construct.
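The claim in (iii), that blind guessing rarely yields a high score on a reasonably long true-false test, follows from the binomial distribution. The sketch below is an illustration with assumed figures (a 50-item test and a 70% pass mark are inventions for the example, not figures from the manual).

```python
# Illustrative sketch: probability of scoring at least 35/50 (70%) on a
# true-false test by blind guessing, where each guess is correct with p = 0.5.
# P(X >= k) for X ~ Binomial(n, p), summed term by term.
from math import comb

n, p, pass_mark = 50, 0.5, 35  # assumed test length and pass mark
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(pass_mark, n + 1))
print(f"P(score >= {pass_mark}/{n} by guessing) = {prob:.6f}")
```

The probability comes out well under one percent, which is why blind guessing is not a practical route to a high score on a test of this length, even though each individual item gives a 50% chance.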

Limitations of True-False Items:

i. As there are only two alternatives, these items encourage guessing.
ii. Many of the learning outcomes measured by true-false items can be measured more
efficiently by other item types.
iii. A true-false test is likely to be low in reliability when the number of items is small.
iv. The validity of these items is questionable, as students may guess uncertain items
consistently 'true' or 'false'.
v. They do not possess any diagnostic value.

2. Matching Items:

Matching items occur in two columns, along with a direction on the basis of which the two
columns are to be matched. They consist of "two parallel columns with each word, number or
symbol in one column being matched to a word, sentence or phrase in the other
column." The first column, for which matching is made, is called the 'premises', and the second
column, from which the selections are made, is called the 'responses'. The basis on which the
matching is to be made is described in the 'directions'. The students may be asked to match
states with their respective capitals, historical events with dates, kings with their achievements,
etc.
Example:
Direction:
Match the dates in column 'B' with the respective events in column 'A' by writing the
number of the item in 'B' in the space provided. Each date in column 'B' may be used once,
more than once or not at all.

Principles of Constructing Matching Items:

Matching exercises are very useful when they are properly arranged. While preparing a
matching item, care should be taken to prevent irrelevant clues and ambiguity of direction. The
following principles help in preparing effective matching exercises.

(i) Homogeneous premises and responses should be given in one matching exercise:
For a matching exercise to function properly, the premises and responses of any matching
cluster should be homogeneous. Therefore one matching exercise may include kings and their
achievements, inventors and their inventions, explorers and their discoveries, countries and
their chief products, etc.

Direction:
On the line to the left of each achievement listed in column 'A', write the name of the king in
column 'B' who is noted for that achievement. The names of the kings in column 'B' may be
used once, more than once, or not at all.
(ii) The lists of premises and responses should be short:
In order to maintain the homogeneity of items, the lists must be short. Experts are of the opinion
that 4 to 5 premises should be matched with 6 to 7 responses. Certainly there should not be more
than ten in either column.

(iii) The longer phrases should be used as premises and the shorter as responses:
This enables examinees to take the test efficiently: they read the longer premise first and then
scan the shorter responses rapidly.

(iv) The premises and responses should be unequal in number:


The number of responses should be more or fewer than the premises. The students should be
instructed in the direction that response may be used once, more than once or not at all. This is
the best method to reduce guessing in matching exercises.

(v) Responses should be arranged in logical order:


In responses the numbers should be arranged sequentially from low to high. The words should be
arranged in alphabetical order.

(vi) Directions should clearly explain the intended basis for matching:
To avoid ambiguity and confusion, a clear direction about the basis for matching must be given.
This also reduces testing time, because the examinees need not read all the premises and
responses to work out the basis for matching.

(vii) One matching exercise must be given on one page of the test paper:

Uses of Matching Items:

Matching exercises are useful for measuring the following learning outcomes:

i. The relationship between two things, such as dates and events, persons and their
achievements, terms and definitions, authors and books, instruments and uses, etc.
ii. The ability to relate pictures with words.
iii. The ability to identify positions on maps, charts or diagrams.

Advantages of Matching Items:

i. Matching items are easy to construct.
ii. A large amount of related factual material can be measured within a short period.
iii. The guessing factor is minimal in a carefully and properly constructed matching item.
iv. These items are as reliable and valid as other objective type items.
Limitations of Matching Items:
i. They are limited to testing factual information only.
ii. They are not efficient for measuring the pupils' complete understanding and
interpretation ability.
iii. It is not always possible to find a good number of homogeneous items.
iv. They are inferior to multiple-choice items in measuring the application and judgment
aspects of students' learning.

3. Multiple-Choice Type Items:

Multiple choice items are the most widely used objective type test items. They can measure
almost all the important learning outcomes under knowledge, understanding and application.
They can also measure the abilities that can be tested by means of any other item type (short
answer, true-false, matching or essay).

In a multiple choice item a problem is presented to the student with some possible solutions.
The statement of the problem may be presented as a question or as an incomplete sentence.
The suggested solutions are presented as words, numbers, symbols or phrases. The statement
in a multiple choice item is known as the 'stem' of the item. The suggested solutions are called
alternatives, choices or options. The correct alternative is called the answer, and the other
alternatives are known as distracters, decoys or foils. In the test the examinees are directed to
read the stem and to select the correct answer.

Principles for Constructing Multiple Choice Type Items:

As discussed, multiple choice items have wide applicability in educational measurement.
Therefore care must be taken in constructing them to enhance their applicability and quality.

Construction of Multiple Choice Type Items includes two major functions:


(a) Construction of the stem
(b) Selecting ideal alternatives.

The following principles will help the test maker in this direction:

(i) Formulate the stem that clearly represents the definite problem:
In a multiple choice item a question or an incomplete statement should include the complete
problem in it. It must indicate what the student has to select from the alternatives.

In the first example it is not clear what answer the test maker expects from the testee, and the
alternatives lack homogeneity. In the second example it is clearly indicated that the test maker
wants to know which is the temple city of India, and the alternatives are homogeneous in
nature.
(ii) The item stem should be free from irrelevant material:
The item should include the complete problem and should be free from irrelevant material. The
best item is one that is short, easily read and clearly indicates the complete problem.

Sometimes, however, when our main thrust is to measure problem-solving ability, we
deliberately include irrelevant material, because it helps us to know whether the student is able
to identify the material relevant to solving the given problem.

(iii)Avoid the use of questions or exact problems used during instruction:


The exact questions discussed during the class-room instruction should be avoided. The
questions should be novel and unique.

(iv) Negative statements should only be used when required:

Items that are negatively stated sometimes confuse the examinee. It is easy to construct a
negatively stated item by picking a text book statement and turning it into a negative statement.
But sometimes it is desirable to phrase the stem so as to ask not for the correct answer but
for the incorrect one.

(v) There should not be any grammatical link between the stem and the alternatives:
A grammatical link between the stem and the alternatives sometimes gives a clue to the
answer.

(vi) All the distracters should be homogeneous in nature:

All the distracters should be framed so that each seems to be the correct answer; for students
who have not achieved the desired learning outcome, the distracters should be the more
attractive options. The distracting power of a distracter can be judged from the number of
examinees who select it. If a distracter is not chosen by anybody, it should be eliminated or
revised.
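Judging a distracter's power from the number of examinees who select it is a simple tally. The sketch below illustrates such a distractor count in Python; the responses, the answer key and the four-option format are fabricated for the example.

```python
# Illustrative distractor analysis: count how many examinees chose each
# alternative. A distracter chosen by nobody should be revised or eliminated.
# The response data below are fabricated (assume the keyed answer is "C").
from collections import Counter

responses = ["A", "C", "C", "B", "C", "A", "C", "B", "C", "A"]
counts = Counter(responses)
for option in "ABCD":
    flag = " <- never chosen; revise or eliminate" if counts[option] == 0 else ""
    print(f"{option}: {counts[option]}{flag}")
```

In this invented data set option D attracts no examinees at all, so by the principle above it is doing no work as a distracter and should be replaced with a more plausible alternative.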

(vii) Avoid verbal association between the stem and the correct answer:
Verbal association between the stem and the correct answer sometimes provides an irrelevant
clue. The item can be made more effective by making the distracters, rather than the correct
answer, verbally associated with the stem.

(viii) All the alternatives should be equal in length:

There is a tendency to express the correct answer at greater length than the other alternatives.
This provides a clue to choose that alternative as the correct answer.

(ix) The correct answer should be clear, concise, correct and free of clues.

(x) The responses 'all of the above' and 'none of the above' should only be used when
appropriate:
Sometimes the responses 'all of the above' or 'none of the above' are added as the final
alternative.
The response 'all of the above' forces the student to consider all the alternatives and increases
the difficulty of the item; it is appropriate as the correct answer only when all the preceding
alternatives are entirely correct. The response 'none of the above' is used either as the correct
answer or as a distracter; it is sometimes added when plausible distracters are difficult to
construct.

(xi) When other item types are more effective, multiple choice items should not be used:
When other item types are more effective for measuring particular learning outcomes, multiple
choice items should not be favoured unnecessarily. There are also some learning outcomes that
cannot be measured by multiple choice items; in such cases other item types, such as short
answer, true-false or matching, may be used.

(xii) Number of alternatives in each item need not to be same:

Uses of Multiple Choice Items:

Multiple choice items have wide applicability in measuring students' achievement. Except for
some special learning outcomes, such as the ability to organize and present ideas, all other
learning outcomes can be measured by multiple choice items. They are adaptable to all types of
instructional objectives, viz. knowledge, understanding and application.

Following are some of the common uses of Multiple Choice Type Items:
i. To measure knowledge of terminology:
Multiple choice type items can be used to measure the knowledge of terminology. A student is
asked to select either accurate definition of the term or the accurate term for a given definition.

ii. To measure knowledge of specific facts:

Knowledge of specific facts, such as dates, names and places, can be measured with multiple
choice items.

iii. To measure knowledge of principles:


Multiple choice type tests are very much useful to measure the knowledge of principles.

iv. To measure knowledge of methods and procedure:


Methods and procedures related to laboratory experiment, teaching learning process,
communication process; procedures regarding the function of a government, bank or
organization can be best measured by the multiple choice type items.
v. To measure the ability to apply knowledge of facts and principles in solving
problems:
In order to know the students' level of understanding, they can be asked to identify the correct
application of a fact or principle. Multiple choice items can be used to measure this ability to
apply.

vi. To measure the ability to interpret cause and effect relationships:

One way to measure the level of understanding is to ask students to identify cause and effect
relationships. The examinee is presented with a specific cause and effect relationship in the
stem and some possible reasons in the alternatives, and has to find out the correct reason.

Advantages of Multiple Choice Items:

i. Multiple choice items are very flexible, so they can be used to measure a variety of
learning objectives in the knowledge, understanding and application areas.
ii. They are free from ambiguity and vagueness if carefully constructed.
iii. The chance of guessing is lower than in true-false items.
iv. They do not require homogeneous items, as matching exercises do.
v. They are more reliable than true-false items, as the number of alternatives is greater.
vi. It is easy to construct quality tests with multiple choice items.
vii. They possess objectivity in scoring.

Limitations of Multiple Choice Items:

i. They are limited to learning outcomes at the verbal level. As paper-and-pencil tests
they measure only what pupils know and understand about a problem situation, not
how the pupil would perform in that situation.
ii. They are not effective for measuring learning outcomes requiring the ability to recall,
organize or present ideas.
iii. They are not completely free from guessing.
iv. The guessing factor is greater in multiple choice items than in supply type items.

Type 2. Essay Type Tests:

Essay type tests are very popular in classroom testing, and we find especially intensive use of
them in higher education. History shows that essay type tests were in use in China as early as
2300 B.C., and at the beginning of the 20th century this was the only form of written test.

Objective type tests are effective in measuring a variety of learning outcomes. Still, there are
some complex learning outcomes which cannot be measured by objective type test items.
Learning outcomes concerning the ability to recall, organize and integrate ideas, the ability to
express oneself in writing, and the ability to supply ideas cannot be measured with objective
type tests. Measuring these outcomes requires essay type items.
Essay type tests are those in which the examinee is asked to discuss, enumerate, compare, state,
evaluate, analyse, summarize or criticize, writing at a specified length on a given topic using
the processes listed above. In essay type tests pupils are free to select, relate and present ideas
in their own words. The distinctive feature of the essay type test is therefore freedom of
response.

Essay Type Tests can be divided into two categories according to the freedom provided to
the pupils:
(A) Restricted Response Type Tests.
(B) Extended Response Type Tests.

(A) Restricted Response Type Tests:

This classification is made on the basis of the degree of freedom provided to the pupil in
answering the test. In restricted response type questions both the content and the response are
limited. Content is restricted by directing the student to discuss specific aspects of the topic;
the response is restricted by the form of the question.
Example:
Explain five causes of the failure of Basic Education.
Differentiate between objective type tests and essay type tests within 100 words.

Uses:
Restricted response type questions are used to measure the ability to explain cause and effect
relationships, to describe applications of principles, to present relevant arguments, to formulate
hypotheses, to formulate valid conclusions, to state necessary assumptions, to describe the
limitations of data, and to explain methods and procedures.

(B) Extended Response Type Tests:

Extended response type items are those which "allow pupils to select any factual
information that they think is pertinent, to organize the answer in accordance with their
best judgement, and to integrate and evaluate ideas as they deem appropriate" (Gronlund
and Linn). Some complex behaviours which cannot be measured by objective means can be
measured by extended response questions.

Example:
“Today India needs an idealistic system of education.” Do you agree with this view? Justify
your statement.
Compare the status of women's education in the Vedic period with that of the Buddhist period.

Test experts are of the opinion that, due to high unreliability in scoring, the use of these tests
should be kept to a minimum. They should be used to measure only those complex learning
outcomes which cannot be measured by any other testing device.

Uses:
Extended response type tests are used to measure the ability to produce, organize and express
ideas, to integrate learning from different areas, to create original forms and designs, and to
evaluate the worth of something.

Advantages of Essay Type Questions:

i. Essay type tests measure some complex learning outcomes that cannot be measured by
objective type tests.
ii. Essay type tests, especially extended response questions, emphasize the integration and
application of thinking and problem-solving skills, which cannot be measured
effectively by objective type tests.
iii. In an essay type test pupils present the answer in their own words and handwriting; it is
therefore an appropriate device for measuring writing skill.
iv. Essay type tests are easy to construct, so classroom teachers use them widely and
frequently.

Limitations of Essay Type Questions:

Essay type tests have limitations so serious that they would have been discarded entirely if they
did not measure some specific learning outcomes that cannot be measured by objective type
items.
i. Unreliability of scoring restricts the use of essay type tests. Different teachers scoring
an essay question may arrive at different results; even the same teacher scoring it at two
different times may arrive at two different scores.
ii. Scoring essay type tests requires expert personnel.
iii. Scoring essay questions requires more time.
iv. Essay type tests are limited in sampling. They can cover only a limited range of course
content and instructional objectives.
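The unreliability described in limitation (i) can be examined empirically by having two teachers score the same set of scripts and correlating their marks: a high correlation indicates consistent scoring, a low one exposes the problem. The Python sketch below illustrates this with a hand-rolled Pearson correlation; the two sets of marks are fabricated for the example.

```python
# Illustrative sketch: checking scorer (inter-rater) consistency for an essay
# question by correlating two teachers' marks for the same eight scripts.
# The marks below are invented; a coefficient near 1.0 indicates consistency.

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rater1 = [14, 11, 16, 9, 13, 18, 7, 12]  # teacher A's marks out of 20
rater2 = [12, 13, 15, 8, 10, 17, 9, 11]  # teacher B's marks for the same scripts
print(round(pearson(rater1, rater2), 3))
```

The scoring principles that follow (marking question by question, concealing the examinee's identity, preparing an outline of major points) are precisely the measures that push this coefficient upward.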

Principles of Constructing Essay Questions:

Since essay type tests measure complex achievements that objective type tests cannot, care is
required in their construction.

The construction of essay type tests involves two major functions:

(a) Constructing essay questions that measure the desired learning outcomes;
(b) Developing a reliable method of scoring.
The following principles help in preparing a good essay type test, with good questions and a
better method of scoring.

Principles of Constructing Essay Questions:

(i) The questions should be so stated that the defined instructional objectives are
clearly measured:
Like objective type tests the essay type questions should be based on specific learning objectives.
The statement of the question should be phrased in such a manner that it calls forth the particular
behaviour expected from the students. Restricted response type items are more effective to elicit
particular learning outcomes than extended response type items.

(ii) The pupil's task should be stated as completely and specifically as possible:
The question should be carefully stated so that the pupil can understand what the test maker
intends to measure. If the idea is not conveyed clearly, an explanation should be added.

(iii) Choices among optional questions should not be given to the pupils unless very
necessary:
It is common practice to provide more questions than pupils are expected to answer and to
allow them to choose a given number, for example any five out of ten. In this case, if different
pupils answer different questions, there is no common basis for comparing their scores. Since
students tend to answer the five questions they know best, the range of test scores will be
narrow. It also affects the validity of the test results: students may prepare only a portion of the
content in advance, which yields a distorted measure of achievement.

(iv) An approximate time limit for each question should be indicated:

While constructing an essay question, the test maker should decide the approximate time
required to answer it. The time limit for each question should then be indicated in the test, so
that students are not pressed for time at the end. If the test contains different sections, such as
an objective section and an essay section, a separate time indication must be given for each
section.

(v) Essay-type tests should be limited to measuring only those learning outcomes
which cannot be measured by objective-type items:
Because objective-type tests are more reliable, valid and objective than essay-type tests, essay
questions should not be used when the learning objectives can be measured by objective items.
But when objective items are inadequate for measuring the intended learning outcomes, essay-type
questions must be used in spite of their limitations.

Principles of Scoring Essay Questions:

The effectiveness of essay questions as a measuring instrument depends to a great extent on the
scoring procedure. By adopting an objective scoring procedure, we can improve the reliability of an
essay question. The following principles of scoring help to develop a comparatively more
objective procedure.

(i) Scoring should be done question by question rather than student by student:
The reliability of essay test results is affected by variation in scoring standards. For example,
an average answer script marked immediately after a failing script may receive a higher score than
it deserves. This can be avoided by evaluating answers question by question instead of student by
student.

(ii) The examinee's identity should be concealed from the scorer:


This avoids the 'halo effect', or bias, which may affect the scoring. The answers to
different questions should be written on separate sheets with a code number, then
arranged question-wise and scored.

(iii) An outline of the major points to be included in the answer should be prepared:
The test maker should prepare a list of the major points to be included in the answer to each
question, together with the marks to be awarded to each point. This scoring key provides a basis
for evaluation and, ultimately, stable scoring.
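The scoring-key principle above can be sketched in code. The question content, value points and mark allocations below are hypothetical illustrations, not drawn from the module.

```python
# A minimal sketch of an analytic scoring key for one essay question.
# The value points and their marks are hypothetical examples.
scoring_key = {
    "defines validity": 2,
    "explains content validity with an example": 3,
    "distinguishes validity from reliability": 3,
    "overall organisation and clarity": 2,
}

def score_answer(points_credited):
    """Sum the marks for the value points the examiner credits."""
    return sum(scoring_key[p] for p in points_credited)

marks = score_answer(["defines validity",
                      "distinguishes validity from reliability"])
print(marks)  # 5
```

Because every examiner credits the same value points with the same marks, two examiners scoring the same script should arrive at the same total, which is exactly the stability this principle aims at.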

(iv) Decisions should be made in advance on how to deal with factors that might affect the
scores:
Factors such as handwriting, spelling, sentence structure, neatness and style of presentation can
affect test scores. Care must be taken to decide beforehand how these factors will be treated in scoring.

In conclusion:

Relative Merits and Demerits of Different Test Items


Different types of test item include multiple-choice questions, short-answer questions, essay-type
questions, true-false questions, and so on. Let us look at their merits and demerits.

a) Multiple-Choice Questions

Merits
The merits of multiple-choice questions are as follows:
 Quick and easy to score, by hand or electronically
 Can be written so that they test a wide range of higher-order thinking skills
 Can cover lots of content areas on a single exam and still be answered in a class period

Demerits
The demerits of multiple-choice questions are as follows:
 Often test literacy skills: "if the student reads the question carefully, the answer is easy to
recognize even if the student knows little about the subject" (p. 194)
 Provide unprepared students the opportunity to guess; with lucky guesses, they get
credit for things they don't know
 Expose students to misinformation that can influence subsequent thinking about the content
 Take time and skill to construct (especially good questions)

b) True-false questions

Merits
The merits of true-false questions are as follows:
 Quick and easy to score

Demerits
The demerits of true-false questions are as follows:
 Considered to be one of the most undependable forms of evaluation
 Often written so that most of the statement is true save one small, often trivial bit of
information that then makes the whole statement untrue
 Encourage guesswork, and reward correct guesses

c) Short-answer questions

Merits
The merits of short-answer questions are as follows:
 Quick and easy to grade
 Quick and easy to write

Demerits
The demerits of short-answer questions are as follows:
 Encourage students to memorize terms and details, so that their understanding of the content
remains superficial

d) Essay questions

Merits
The merits of essay questions are as follows:
 Offer students an opportunity to demonstrate knowledge, skills, and abilities in a variety of
ways
 Can be used to develop student writing skills, particularly the ability to formulate arguments
supported with reasoning and evidence

Demerits
The demerits of essay questions are as follows:
 Require extensive time to grade
 Encourage use of subjective criteria when assessing answers

 If used in class, necessitate quick composition without time for planning or revision, which can
result in poor-quality writing
LECTURE 6

Stages of Test Construction


Main Stages of test design.
A. Planning
B. Producing
C. Administering
D. Analyzing

A. Planning Stage
Many new teachers develop classroom tests without spending enough time planning what they
want to do. With experience, they soon learn that planning is critical to developing high-quality
tests. Teachers need to spell out:
 the purpose / the objective of the test,
 the material that will be covered,
 the types of item that will be used,
 the difficulty level of the items, and
 the time that is available for the test.

Construction of criterion referenced test


During the preparation of a criterion-referenced test, the test constructor is required to take the
following steps:

1: Identifying the Purpose of the Test


First of all, the objectives of the test are finalized. The test developer must know, and be well
aware of, the purpose for which the test is being prepared. In addition to the purpose, he should
also know the following aspects of the test:
 Content areas of the test from where the items will be developed
 Level of students or examinees for whom test is being prepared
 Difficulty level of the test items
 Type of test: objective, subjective or mixed
 Criteria for qualifying the test
After having understood these points, the test developer starts the work of constructing the
criterion-referenced test. He moves on to the second step.

2: Planning the Test

This step requires the following work to be done by the test constructor:

i. Content analysis: The test developer analyses the content of the test. This involves the
selection of content, i.e., the testing areas and their peripherals. He also decides the key areas of
the content from which more questions are to be developed.
ii. Types of items: Decisions regarding the types of items are taken at this stage. Subjective
items may be essay type, short-answer type or very-short-answer type; objective items may be
multiple-choice, fill-in-the-blanks, true-or-false, sentence-completion, one-word-answer, etc. If
the test is of mixed type, questions are developed accordingly. What is planned at this stage is
the proportion of objective and subjective items in terms of marks.

iii. Number of items: The total number of questions of each type to be included in the test is
decided.

Weightage: It is very important to decide the weightage of each type of item and each
content area. It depends upon the level of the students being tested. As we move from lower to
higher levels, the percentage of knowledge-domain items decreases and that of higher-order
thinking abilities, such as understanding, application and skill, increases. The test developer also
decides the weightage of each of the content areas included in the test, considering its relevance.
 Weightage to objective: This indicates what objectives are to be tested and what
weightage has to be given to each objective

 Weightage to content: This indicates the various aspects of the content to be tested
and the weightage to be given to these different aspects.

iv. Duration of the test: The duration of the test is decided in consideration of the total number
of questions, the level of the examinees and the difficulty level of the test items.

v. Mechanical aspects: It includes the quality of paper, ink, diagrams, type setting, font size and
printing of the test papers.

vi. Development of key for objective scoring: To bring objectivity to the process of evaluation, it
is essential to achieve agreement among the examiners with regard to the meaning of the test
items and their scoring. For this purpose, a 'key' is prepared for each paper and given to all
examiners, who are expected to score the test following the key.

vii. Instructions for the test: The test developer also prepares instructions for administration,
together with the scoring and evaluation procedure, in a 'test manual'. The manual sets out the whole
procedure of testing and acts as a guide to the individuals involved at all stages. It is strictly
followed to bring objectivity to the test.

3: Preparing blueprint of the test


A blueprint is a specification chart which shows the details of the test items to be prepared. It lists
all the content areas and the number and type of questions from those areas, and it reflects the
objectives to be tested. The blueprint describes the weightage given to the various content areas,
objectives and types of items, and all other details of the test. It serves as a guideline, or frame of
reference, for the person constructing the test.
The blueprint is a three-dimensional chart giving the placement of the objectives, content and form of
questions.

Note: O – Objective Type, SA – Short Answer Type, E – Essay Type. In such a chart, the number
outside the brackets indicates the marks and the number inside indicates the number of questions.
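A blueprint of this kind can be sketched as a simple data structure. The content areas, objectives, item types and mark allocations below are hypothetical examples, chosen only to illustrate the three dimensions and the weightage calculation.

```python
from collections import defaultdict

# A hypothetical blueprint: (content area, objective, item type) -> (marks, no. of questions).
# O = objective type, SA = short answer, E = essay. All figures are illustrative.
blueprint = {
    ("Grammar",    "Knowledge",     "O"):  (5, 5),   # 5 marks from 5 one-mark items
    ("Grammar",    "Understanding", "SA"): (6, 3),
    ("Vocabulary", "Knowledge",     "O"):  (5, 5),
    ("Reading",    "Application",   "SA"): (6, 3),
    ("Writing",    "Application",   "E"):  (8, 1),
}

total_marks = sum(marks for marks, _ in blueprint.values())
print(total_marks)  # 30

# Weightage (%) given to each objective, derived from the mark allocations:
weight = defaultdict(int)
for (_, objective, _), (marks, _) in blueprint.items():
    weight[objective] += marks
for objective, marks in sorted(weight.items()):
    print(objective, round(100 * marks / total_marks))
```

Reading the weightage off the blueprint in this way lets the test constructor check, before writing a single item, that the balance across objectives and content areas matches what was planned.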
B. Preparing Test Items
1. Writing of Test Items
According to the blueprint, all questions are constructed, covering all the content areas, all the
objectives or abilities and all types of items. Questions may be of objective or subjective type, as
specified in the blueprint.
 preparing items,
 ordering items,
 formatting the test
 preparing instruction

Selecting the items for the test


 The paper setter writes items according to the blueprint.
 The difficulty level has to be considered while writing the items.
 The setter should also check whether all the questions included can be answered within the
time allotted.
 It is advisable to arrange the questions in order of difficulty.
 For short-answer and essay-type questions, a marking scheme is prepared.
 In preparing the marking scheme, the examiner lists the value points to be credited and
fixes the marks to be given to each value point.

Items have already been constructed as per the guidelines of the blueprint, but some items may turn
out to be unsuitable or not up to the mark. For this reason, some extra questions are generally
prepared so that if any question is rejected at any stage it can be replaced immediately. The right
items for the test are then selected through a process known as 'try-out', which involves the
following steps:
i. Sampling of subjects: Depending on the size of the population for which the test is being
prepared, a workable sample, say around 150 subjects, is selected at random. This is the sample
on which the prepared items are tested for functionality, workability and effectiveness.
ii. Pre-try-out: Also known as the preliminary try-out. The prepared items are administered to a
sub-sample of around ten subjects drawn from the sample. The answer sheets are checked, evaluated
and discussed with the candidates to uncover any problems they faced during the test, such as
language difficulty or ambiguous wording. Items with such problems are rewritten or rephrased to
remove the difficulty or ambiguity. At the end of the pre-try-out, the initial draft of the test is
prepared.
iii. Proper try-out: At this stage, around fifty candidates are selected from the sample and the
initial draft of the test is administered to them. Answer sheets are scored and item analysis is
done. The difficulty value and discrimination power of each item are calculated. Items that fall
within the acceptable range of difficulty value and discrimination power are selected for the test;
the others are rejected.
iv. Final try-out: The final try-out is done on a comparatively large sample, of 100 subjects or
more depending on the size of the population. After administration and scoring, the reliability and
validity of the test are measured. If the test proves reliable and valid, it is approved.
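The selection step in the proper try-out can be sketched as follows. The difficulty and discrimination figures, and the acceptance ranges used (difficulty between 0.30 and 0.70, discrimination of at least 0.30), are common conventions chosen here for illustration, not values prescribed by the module.

```python
# Sketch of item selection at the proper try-out stage. Each item carries its
# computed difficulty value (p) and discrimination power (D); the figures are invented.
items = {
    "Q1": {"p": 0.55, "D": 0.45},
    "Q2": {"p": 0.92, "D": 0.10},   # too easy, poor discrimination
    "Q3": {"p": 0.40, "D": 0.35},
    "Q4": {"p": 0.15, "D": 0.20},   # too difficult
}

def acceptable(stats, p_range=(0.30, 0.70), d_min=0.30):
    """Keep items of moderate difficulty that separate strong and weak students."""
    lo, hi = p_range
    return lo <= stats["p"] <= hi and stats["D"] >= d_min

selected = [q for q, s in items.items() if acceptable(s)]
print(selected)  # ['Q1', 'Q3']
```

Items outside the range are rejected and replaced from the pool of extra questions prepared earlier.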
2. Preparation of Scoring Key and Marking Scheme

3. Proofreading / vetting or moderation


The last step before printing the test is to proofread it carefully. Take the test yourself as the
students would. Doing so sometimes reveals errors that you would easily miss if you were simply
reading through the test. Making announcements about typos while the students are taking the test
wastes time, distracts them and interferes with their concentration.

C. Administration and Reporting of Assessment


Evaluating the Test and Preparing the Final Draft of the Paper
To establish quality, a test manual is prepared which documents the test's norms, scoring key,
reliability and validity. Instructions for the examinees, as well as for test administration, are
determined. Item analysis is performed to establish the workability of the items. The required
changes are made and the final draft of the paper is ready for printing.

Setting Up an Appropriate Testing Environment


A test can only be reliable and valid if it is administered in an appropriate environment. Students
need a comfortable, quiet place, free from distractions, to take a test. The room should be well-
ventilated and be at a comfortable temperature. There should be few distracting noises or
activities going on in the classroom. The room should not be crowded. Students need to feel that
they have enough room to work without being on top of one another.

Minimizing Frustration
Teachers need to minimize student frustration. Frustration often leads to anxiety which tends to
have a negative effect on student performance.
Testing needs to follow a reasonable schedule. Students need to be given plenty of advance
warning about an upcoming test so that they have sufficient time to prepare. Many teachers
provide review sessions the day before the test, which allow students to judge their level of
preparation and give them the chance to devote additional time to study if needed.

Although teachers need to stress the importance of tests, it is probably best not to over-
emphasize a test’s importance. That only serves to heighten student anxiety.

Minimizing Interruptions
It is also important to minimize interruptions during the test. If the test is well prepared and the
teacher gives instructions before the students begin the test, there should be no reason for the
teacher to have to give additional instructions during the test.

D. Item Analysis

 Item analysis is a process which examines student responses to individual test items
(questions) in order to assess the quality of those items and of the test as a whole.

 Specifically, what one looks for is the difficulty and discriminating power of each item, as well
as the effectiveness of each alternative. How hard is the item for the group tested, and how
well does it distinguish between the more knowledgeable and the less knowledgeable
students?

 If all students mark the item correctly, it has not distinguished between those who know more
and those who know less about the concept. If all students mark an item incorrectly, then the
item is not discriminating for the group. This information may be important to the teacher for
quality-control or diagnostic purpose.
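As a sketch of how these two indices are commonly computed, the upper-group/lower-group method splits examinees into top and bottom scorers on the whole test and compares their success on each item. The response data below are invented for illustration.

```python
# Sketch of item analysis on 1/0 (correct/incorrect) responses to one item,
# using the common upper-group/lower-group method.
def item_analysis(upper, lower):
    """upper/lower: lists of 1/0 responses from the top- and bottom-scoring
    groups. Returns (difficulty, discrimination)."""
    n = len(upper)                                   # assume equal group sizes
    difficulty = (sum(upper) + sum(lower)) / (2 * n)  # proportion answering correctly
    discrimination = (sum(upper) - sum(lower)) / n    # how well the item separates groups
    return difficulty, discrimination

# 10 students in each group answered this item:
upper = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]   # 8 correct in the upper group
lower = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # 3 correct in the lower group

p, d = item_analysis(upper, lower)
print(p, d)  # 0.55 0.5
```

A discrimination of 0 means the item does not separate the groups at all, which is exactly the situation described above when all students answer an item correctly (or all incorrectly).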

Construction of Achievement Tests and Standardization


A questionnaire is a tool for research comprising a list of questions whose answers provide
information about a target group, individual or event. Although questionnaires are often designed
for statistical analysis of the responses, this is not always the case. The method was the invention
of Sir Francis Galton. A questionnaire is used when factual information is desired; when opinions
rather than facts are desired, an opinion or attitude scale is used. Of course, the two purposes
can be combined into one form, which is usually referred to as a 'questionnaire'.

A questionnaire may be regarded as a form of interview on paper. The procedure for constructing
a questionnaire follows a pattern similar to that of an interview schedule. However, because the
questionnaire is impersonal, it is all the more important to take care over its construction.

A questionnaire is a list of questions arranged in a specific way, or randomly, generally printed or
typed and having spaces for recording answers to the questions. It is a form which is prepared and
distributed for the purpose of securing responses. Thus a questionnaire relies heavily on the validity
of the verbal reports.

According to Goode and Hatt, 'in general, the word questionnaire refers to a device for securing
answers to questions by using a form which the respondent fills himself'. Barr, Davis and Johnson
define it thus: 'a questionnaire is a systematic compilation of questions that are submitted to a
sampling of population from which information is desired'. Lundberg says, 'fundamentally,
questionnaire is a set of stimuli to which literate people are exposed in order to observe their
verbal behavior under these stimuli'.

Types of Questionnaire:

Commonly used questionnaires are:

(i) Closed form:

Questionnaires that call for short, check-mark responses are known as the closed-form or restricted
type. They have highly structured answers: marking yes or no, writing a short response, or checking
an item from a list of suggested responses. For certain types of information the closed-form
questionnaire is entirely satisfactory. It is easy to fill out, takes little time, keeps the respondent
on the subject, is relatively objective and is fairly easy to tabulate and analyse.

For example, How did you obtain your Bachelors’ degree? (Put a tick mark against your answer)
(a) As a regular student
(b) As a private student
(c) By distance mode

These questionnaires are very suitable for research purposes: easy to fill out, less time-consuming
for the respondents, relatively objective and convenient for tabulation and analysis. However,
constructing such a questionnaire requires a great deal of labour and thought, and it is generally
lengthy because all possible alternative answers are given under each question.

(ii) Open form:

The open form, or unrestricted questionnaire, requires the respondent to answer the question in their
own words. The responses have greater depth as the respondents have to give reasons for their
choices. The drawback of this type of questionnaire is that not many people take the time to fill
these out as they are more time consuming and require more effort, and it is also more difficult to
analyse the information obtained.

Example: Why did you choose to obtain your graduation degree through correspondence?
No alternative or plausible answers are provided. The open form questionnaire is good for depth
studies and gives freedom to the respondents to answer the questions without any restriction.
Limitations of open questionnaire are as follows:
• They are difficult to fill out.
• The respondents may never be aware of all the possible answers.
• They take longer to fill.
• Their returns are often few.
• The information is too unwieldy and unstructured and hence difficult to analyse, tabulate and
interpret.

Some investigators combine the approaches, so that questionnaires carry both closed- and open-form
items. In the closed-ended questions, the last alternative is left open for respondents to supply
their own response. For example: 'Why did you prefer to join the B.Ed. programme?
(i) Interest in teaching (ii) Parents' wish (iii) For securing a government job (iv) Other friends opted
for this (v) Any other.'

LECTURE 7

Test Taking Strategies


Teaching Students Test Taking Skills

A. General Test Taking Strategies

B. Test-Taking Strategies For Specific Test Formats


i. Strategies for Short-Answer Tests
ii. Strategies for Essay Tests
iii. Strategies for Multiple-Choice Tests
iv. Strategies for True–false Tests

A. General Test-Taking Strategies


1. Before You Start Writing:

GLANCE OVER THE WHOLE EXAM. This does two things for you:

It gives you a “set” on the exam: what it covers, where the emphasis lies,
what the main ideas seem to be. Many exams are composed of a series of
short questions all related to one particular aspect of the subject, and then a
longer question developing some ideas from another area.

It may relax you because if you read carefully all the way through it, you are
bound to find something you feel competent to answer.

OBSERVE THE POINT VALUE OF THE QUESTIONS and then work out a rough time
allowance. If the total point value for the test is 100, then a 50-point question is worth about half of
your time, regardless of how many questions there are. (Hint: a quick rule of thumb for a one-hour
test is to divide the point values in half.)
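The rule of thumb above amounts to allocating minutes in proportion to point value. The point values below are hypothetical examples.

```python
# Sketch of the rough time-allowance rule: give each question a share of the
# available minutes proportional to its point value.
def time_allowance(points, total_points=100, minutes=60):
    return minutes * points / total_points

for pts in (5, 20, 50):
    print(pts, "points ->", time_allowance(pts), "minutes")

# For a one-hour, 100-point test this matches the hint exactly: halve the
# point value (a 50-point question gets about 30 minutes).
```

The same proportional rule works for any test length: for a two-hour paper, simply pass `minutes=120`.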

UNDERLINE ALL SIGNIFICANT WORDS IN THE DIRECTIONS.


Many students have penalized themselves because they did not see the word “or” (example:
“Answer 1, 2, OR 3”). You do not get extra credit for answering three questions if the directions
said to answer ONE. In fact, you will probably lose points for not fully developing your answer.

ASK the instructor if you do not clearly understand the directions.

2. When You Begin To Work:

THERE IS NOTHING SACRED ABOUT THE ORDER IN WHICH THE QUESTIONS ARE
ASKED. Tackle the questions in the order that appeals to you most. Doing well on a question that you
feel relatively sure of will be reassuring and will free your mind of tension. The act of writing often
unlocks temporarily blocked mental processes; when you finish that question, you will probably find
the others less formidable. On the other hand, you may be the type of person who wants to get the hard
one off your mind first and save the easy ones "for dessert".

BE SURE TO IDENTIFY THE QUESTIONS CLEARLY so that the instructor knows which one you
are answering.

KEEP THE POINT VALUE AND TIME ALLOWANCE IN MIND.


This may save you from a very common and panic-producing mistake, such as taking twenty
minutes of an hour test to answer a five-point question, and then finding you have five minutes
left in which to answer a twenty-point question! Remember, it is NOT possible to score more than
five points on a five-point question! Work systematically, forcing yourself if necessary to do it.
If you tend to rush at things, slow down. If you tend to dawdle, pace yourself.

Write as neatly as possible while keeping to your time allotment.


You are graded on accuracy, not neatness, but a neat paper may
convince the instructor that the answer is organized and accurate. A
messy paper has the opposite effect!

3. When You Are Finished

CHECK OVER YOUR ENTIRE PAPER. There are six reasons for this:

 To see if you have left out any questions you meant to tackle later.
 To see if you have followed directions.
 To catch careless errors. Note: Don’t take time to recopy answers unless you’re sure they are
really illegible. It’s easier to be reasonably neat the first time.
 Make sure you have numbered your answers correctly.
 Make sure your name is on every page.
 Make sure your answers are complete and that you have not left too much to the imagination of
the professor.

Test Taking Hints:
Hints for Taking Objective Tests
1. Use time wisely.
2. Read all directions and questions carefully.
3. Attempt every question, but do the easy ones first.
4. Actively reason through the questions.
5. Choose the answer which the test maker intended.
6. Anticipate the answer, then look for it.

True and False Questions


1. Look for absolute qualifiers such as: always, all, never. If one is present, the statement will
probably be false.
2. Look for relative qualifiers such as: often, frequently, or seldom. These statements will probably be true.
3. If any part of the question is false, the whole question is false.
4. If you don’t know the answer, guess; you have a 50-50 chance of being correct.

Matching Questions
1. Make sure you understand the directions for matching the items on the lists. For instance, can
you use an item more than once?
2. Answer long matching lists in a systematic way, such as checking off those items already used.
3. Do the matches you know first.
4. Eliminate items on the answer list that are out of place or incongruous.
5. If you don’t know the correct matches, guess.

Short Answer Questions


1. Write no more than necessary.
2. With sentence completion or fill-in questions, make sure your answers are grammatically correct.
3. Make sure your response makes sense.

Multiple Choice Questions


1. Anticipate the answer, then look for it.
2. Consider all the alternatives.
3. Relate each option back to the question stem.
4. Balance options against each other.
5. Use logical reasoning.
6. Use information obtained from other questions and options.
7. If the correct answer is not immediately obvious, eliminate alternatives that are obviously absurd,
silly or incorrect.
8. Compare each alternative with the item of the question and with other alternatives.
9. Whenever two options are identical, then both must be incorrect.
10. If any two options are opposites, then at least one may be eliminated.
11. Look for options that do not match the item grammatically. These will be incorrect.

Tips for Taking Open-Book Tests:
1. Find out why you are being given an open-book test.
2. Prepare for an open-book test as carefully as for other tests.
3. Prepare organizational summaries of the course using textbooks and lecture notes.
4. Use quotations from the book only when they relate to the question and supply supporting evidence.
5. Do not use extensive quotations--the professor knows the book. He/she wants to know what
you know.

Hints for Taking Essay Tests:

1. Find out what the professor wants to see as evidence.


2. Learn the professor’s point of view.
3. Determine the criteria that will be used to judge your answers.
4. Read the entire test through before starting.
5. Budget your time according to the point value of each question.
6. Use work sheets to jot down ideas, organize your answers, and remember details
(dates, formulas).
7. Use the question, turned around, as your introductory statement.
8. Note whether you are to define, list, or compare in order to give the professor what
he/she is looking for.
9. Organize your answer as in any well-developed paragraph by expressing your main idea and
then using supporting facts and details to prove your statement.
10. Use facts to support your arguments.
11. Use the technical language of the subject.
12. Use examples, charts, and other illustrations to make your answers more exact.
13. Unless there is a penalty for guessing, answer all questions even if you are not sure.
14. Use partial answers and outlines if you are not sure or are running out of time.
15. Proofread your answers for clarity, grammar, spelling, punctuation and legibility.

Important Words in Essay Questions:


ANALYZE: Present a complete statement of the elements of the idea. Adapt and stick to a single
plan of analysis. Give any conclusions which result.

COMPARE: Look for qualities or characteristics that resemble each other. Emphasize similarities
among them, but in some cases, also mention differences.

CONTRAST: Stress the differences between things, qualities, events, or problems.

CRITICIZE: Express your judgment about the merit or truth of the factors or views mentioned.
Give the results of your analysis of these factors, discussing their limitations and good points.

DEFINE: Give concise, clear, and authoritative meanings. Don’t give details, but make sure to give the
limits of the definition. Show how the things you are defining differ from things that are similar.

DESCRIBE: Recount, characterize, sketch, or relate in sequence or narrative form.

DIAGRAM: Give a drawing, chart, or graphic answer. Usually you should label a diagram. In
some cases, add a brief explanation or description.

DISCUSS: Examine, analyze carefully, and give reasons pro and con. Be complete: give details in
an organized manner.

ENUMERATE: Write in list or outline form, giving points concisely one by one. (In some cases, write
in paragraph form.)

EVALUATE: Carefully appraise the problem, citing both advantages and limitations. Emphasize
judgment based on the appraisal of authorities and/or your own personal evaluation (depending on
the demands of the questions).

EXPLAIN: Clarify, interpret, and spell out the material you present. Give reasons for differences of
opinion or of results and try to analyze causes.

IDENTIFY: Write a brief note on who or what is to be identified. State distinguishing actions or
qualities. Include enough information to separate individuals from others of its group.

ILLUSTRATE: Use a figure, picture, diagram, or concrete example to explain or clarify a principle or
problem.

INTERPRET: Translate, give examples of, solve, or comment on a subject, usually giving your
judgment of it.

JUSTIFY: Prove or give reasons for decisions or conclusions, taking pains to be convincing.

LIST: As in “enumerate”, write an itemized series of concise statements.

OUTLINE: Organize a description under main points and subordinate points, omitting minor details and
stressing the arrangement or classification of things.

PROVE: Establish that something is true by citing factual evidence or giving clear, logical
reasons.

RELATE: Show how things are related to, or connected with, each other; or how one causes another,
correlates with another, or is like another.

REVIEW: Examine a subject critically, analyzing and commenting on the important statements to be
made about it.

BACHELOR OF EDUCATION_TESL Lecture 7/Page 5


STATE: Present in brief, clear sequence, usually omitting details or examples.

SUMMARIZE: Give the main points or facts in condensed form like the summary of a chapter,
omitting details and illustrations.

TRACE: In narrative form, describe progress, development, or historical events from some points of
origin.

Examination Terms:
1. IDENTIFICATION TERMS

cite, indicate, define, list, enumerate, mention, give, name, identify, state

2. DESCRIPTION TERMS

describe, illustrate, discuss, sketch, review, develop, summarize, outline, diagram, trace

3. RELATION TERMS

analyze, differentiate, compare, distinguish, contrast, relate

4. DEMONSTRATION TERMS

demonstrate, prove, explain why, show, justify, support

5. EVALUATION TERMS

assess, evaluate, comment, interpret, criticize, propose

6. EXACT TERMS

all, necessarily, always, never, must, no/none, without exception

7. INDEFINITE TERMS

hardly ever, rarely, seldom/infrequently, some/sometimes, often/frequently, usually, almost always

These Are Words To Help You Express Yourself Better When Answering Essay
Questions:

COMPARE: used when two ideas being compared are alike: similarly, similar to, both, like, as, likewise, as well, compared to, in the same way, also, either... or, neither... nor.

CONTRAST: used to express opposite ideas: but, yet, or, in spite of, still, however, although, regardless, even though, nevertheless, conversely, on the other hand, even so, on the contrary, in contrast, notwithstanding, despite, though, instead of, rather than, opposed to.

CAUSE: used to state that the action or event occurred because of the reason that follows the
signal word: because, for, since, whereas, as.

EFFECT: used to show that the action caused the event that follows the signal word: so, so that, as a result of, in order to, therefore, consequently, thus, hence.

CONDITION: used to show the specific condition under which the idea is true: If, when, providing,
unless, whenever, only if, after, assuming that.

SEQUENCE OR TRACE: used to organize ideas in a particular order: first, second, next,
last, in the first place, finally, then, later, before, subsequently, presently, once...then, eventually,
following this.

EXAMPLE OR EMPHASIS: used to identify an idea that is being illustrated or emphasized: for example, for instance, specifically, to illustrate, to demonstrate, such as, as in the case of, like, as, in particular, in other words, that is, to repeat, primarily, especially, again.

DEFINITION: used to note a specific meaning given for a term: meaning is, is defined as.

CONTINUATION OF IDEA: used to add another idea or more information about the same
thought: and, nor, also, besides, further, furthermore, in addition, too, moreover, again, and then,
eventually, another.

SUMMARY OR CONCLUSION: used to restate the main idea: to sum up, in brief, in short, on the whole, as I have said, in conclusion, thus, therefore, for these reasons, consequently, hence, finally, to repeat, to reiterate, in summary.

LECTURE 8

Language Skills Testing

A. Testing Writing
1. Writing tasks should be set that are properly representative of the range of tasks we would
expect students to be able to perform.
2. The tasks should elicit writing that is truly representative of the students’ writing ability.
3. The samples of writing can be appropriately scored.

To elicit examples of students' writing ability, many different writing tasks can be used. It is appropriate to specify the length of text that students should produce. For instance:
 Writing a letter.
 Writing a description of something from a diagram or picture.
 Writing a summary of text.
 Writing on a topic to a specified length in words or paragraphs.
 Completing a partially written text.
 Writing a paragraph using a given topic sentence.
 Completing a paragraph.
 Writing a criticism or a response to a piece of writing.
 Writing a story, based on an outline provided.

B. Testing Reading

The most common type of published reading test is the reading comprehension assessment. The most popular format asks a child to read a text passage appropriate to his or her level and then answer specific, detailed questions about its content (these are often called IRIs, or informal reading inventories). There are, however, some variations on reading comprehension assessments. For example, the child could be asked to answer inferential questions about information implied by the text instead of explicit questions about facts directly presented in it, or the child's understanding could be tested through his or her ability to retell the story in his or her own words or to summarize the main idea or the moral of the story. The "cloze" task is another common reading comprehension evaluation: words are omitted from the passage, and the child is asked to fill in the blanks with appropriate words. The reading comprehension of young children can also be evaluated by asking them to read and follow simple instructions, such as "Stand up" or "Go look out the window."

Reading comprehension, another very common focus of reading evaluation, should not be confused with reading accuracy. In a reading accuracy evaluation, a child is asked to read a passage of text aloud without making any errors, and the child's mistakes are analyzed for clues about the child's decoding techniques (not comprehension strategies). Very often the two evaluations are combined into one: the child reads a passage out loud while the teacher notes the child's mistakes (sometimes referred to as a "running record"), and then the child is asked some questions about the passage to check understanding.

C. Testing Listening

Testing listening involves a variety of skills: at the lowest level, discriminating between sounds and between patterns of intonation and stress, and beyond that, understanding short and long listening texts. The first two are part of listening, but on their own they are, of course, not adequate.

Testing Phoneme Discrimination


Sounds in a language other than one's native language are sometimes difficult to discriminate, particularly if the native language does not distinguish between them. There are several ways to test phoneme discrimination, that is, the ability to tell different sounds apart. One way is to have testees view a picture, listen to four words, and decide which word names the object in the picture. The words chosen as alternatives should be close in sound to the right word. However, finding sufficiently common words with similar sounds is often difficult, and if unfamiliar words are used, they will not make good alternatives. Alternatively, testees can be shown four pictures and asked to choose the one that corresponds to the word they hear. Another option is to give testees three words and ask them to indicate which two are the same. Finally, testees can listen to a spoken sentence and identify which of four similar words was used in it.

Discriminating Stress and Intonation


The ability to recognize stress can be tested by having testees listen to a sentence that they also have in front of them; they are instructed to indicate the word that carries the sentence's primary stress. While it is helpful to recognize stress patterns in English, the issue with this type of test is that it lacks context. Testees show that they can recognize the difference between "JOHN is going today" and "John is going TODAY," but they do not need to show that they understand that the meaning of the two sentences is different, or what the difference is.
The ability to understand the meaning carried by intonation can be tested by having testees listen to a statement and choose from three interpretations of it. For example, testees could hear the statement "Vera is a wonderful musician" and be asked to decide whether the speaker is making a simple statement, being sarcastic, or asking a question. However, since the context is neutral, it is sometimes hard to avoid ambiguity. In real communication, listeners use their background knowledge and context, as well as intonation, to help them interpret the communicative meaning of an utterance.

Understanding Sentences and Dialogues
A teacher may also test students' comprehension of individual sentences and dialogues. In its simplest form, this type of item consists of a single spoken sentence that testees listen to, and four written statements from which they choose the one closest in meaning to the original. For instance:

Spoken:
I had hoped to visit you while I was in New York.
Written:
A. I was in New York but did not visit you.
B. I will be in New York and hope to visit you.
C. I visited you in New York and hope to again.
D. I am in New York and would like to visit you.

Another type of item is one in which the testees listen to an utterance and select the most appropriate response from four responses. In this situation, the testees are not directly asked what the utterance means; rather, they demonstrate that they know what it means by showing that they recognize an appropriate response. This tests both the testees' listening ability and their knowledge of suitable second pair parts of adjacency pairs. An example of this type of item is as follows.

Spoken: Would you mind if I visited you next time I came to New York?

Written:
A. Yes, of course. I'd love to visit New York.
B. No, I don't really think that much of New York.
C. Yes, I would. You can come any time.
D. No, not at all. I'd really love to have you.

(At a slightly higher level, both the first statement and the responses can be spoken, but in that case, it might be better to have only three responses, since it would be difficult to keep all four responses in mind.)
The testees need to understand here that "Would you mind if I..." is a form used to ask permission, and that a positive response begins with "No" (that is, "I don't mind"). Since two different kinds of knowledge are required for this type of item, there is a certain amount of controversy about it. Some theorists argue that it is not a good type of item precisely because these two kinds of knowledge are required: testees may fully comprehend the utterance but not know how to respond to it. The situation is also not realistic, since the utterances are presented in isolation and out of context. However, if these limitations are kept in mind, this sort of item can be helpful: it is a more communicative type of task than many listening tasks, so it can have beneficial backwash effects, and it is relatively simple to administer.
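Response-selection items like the one above lend themselves to simple objective scoring. A minimal sketch of how such an item might be stored and marked follows; the dictionary layout and the score() helper are illustrative assumptions, not part of any published testing package:

```python
# Hypothetical representation of one response-selection listening item.
item = {
    "spoken": "Would you mind if I visited you next time I came to New York?",
    "options": {
        "A": "Yes, of course. I'd love to visit New York.",
        "B": "No, I don't really think that much of New York.",
        "C": "Yes, I would. You can come any time.",
        "D": "No, not at all. I'd really love to have you.",
    },
    "key": "D",  # the appropriate second pair part
}

def score(items, responses):
    """Count how many items were answered with the keyed response."""
    return sum(1 for it, r in zip(items, responses) if r == it["key"])

print(score([item], ["D"]))  # -> 1
```

Because the key is fixed in advance, marking is fully objective, which is one reason this format is "relatively simple to administer."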

D. Testing Speaking
Grammar, pronunciation, fluency, content, organization, and vocabulary are the aspects of
speaking that are considered part of its evaluation.

Testers need to choose the appropriate testing methods and techniques depending on the situation
and the purpose of the test.

The evaluation criteria used in one such study were as follows:
Evaluation Items:
Presentations:
Content:
Language:
Eye contact:
Interviews:
Comprehensibility:
Pronunciation:
Fluency:
Ability to explain an idea:
Discussing and debating:
Able to be part of the conversation to help it flow naturally:
Uses fillers/ additional questions to include others in conversation:
Transfers skills used in dialogues to group discussions:

The findings of the study reveal that, among the three test types, the discussion test was the most difficult, followed by the interview test and the presentation test.

Testing Speaking Using Visual Material


Speaking can be tested using visuals such as photographs, diagrams, and maps, without the testee needing to understand any spoken or written material. By carefully selecting the material, testers can control the vocabulary and grammatical structures required. Visual materials of varying difficulty are available to suit all learners' levels. A common stimulus is a series of pictures telling a story, which the student must narrate; this requires the student to put together a coherent narrative. A variation is to distribute the pictures of the story, in random order, among a group of students. Without showing the pictures to each other, the students discuss and decide on the sequence, then lay the pictures down in the order they have agreed on; they then have the chance, if they feel it is necessary, to reorder them. This technique is already in use in school-based oral assessment in Malaysian primary schools.

The Taped Oral Proficiency Test


In this technique, the students' performances are recorded on tape and evaluated by the examiner later. The technique has both benefits and drawbacks. One disadvantage of the taped test, according to Cartier (1980), is that it is less personal: the examinee is talking to a machine, not to a person. Another drawback is its low validity. In addition, the taped test is inflexible; it is virtually impossible to adjust if something goes wrong during the recording. On the other hand, this type of test has some benefits. It can be given to a group of students in a language lab; since each student receives identical stimuli, it is more standardized and more objective; and scoring can be performed at the most convenient or economical time and location.

Testing Grammatical Competence


Cook (2008) defines grammatical competence as the knowledge of a language stored in a person's mind. The term was first used in the 1960s by Chomsky and refers to the implicit knowledge of a language's structural regularities and the ability to recognize and produce its distinctive grammatical structures.

Testing Vocabulary
One way to evaluate vocabulary is to ask a person to define a word. If a teacher assigns students a list of vocabulary words to learn, the easiest way to evaluate whether they have mastered the new words is to give them a closed-book test in which they must provide the definitions.

Testing overall ability in the language:


a. Cloze Test
A cloze test is based on a text with gaps placed at regular intervals, after every seventh, eighth, or ninth word. The examinee has to complete each gap with an appropriate word; often more than one option is possible. The first three or more lines of the text are left without gaps (Scrivener 261).
Example of a Cloze Test:

I was so ………(1) because it was my first time to visit the place. There are many interesting places to visit. First, I ………(2) Tangkupan Perahu. The place is just wonderful. After that, I went to Dago Street. I ………(3) some t-shirts there. Then, I went to Cibaduyut. I bought many things like shoes, dolls, and some souvenirs. I also did not forget to buy "peuyeum". Bandung is ………(4) for its "peuyeum". Finally, I went to a café nearby to have lunch. I ………(5) three days in Bandung and that was really fun. Anyway, I will write to you again next time. Write to me as soon as you can. Bye. Sincerely, Hana

Answers: 1. happy 2. visited 3. bought 4. famous 5. spent

The advantage of cloze tests is that they are quite easy to create: the teacher just needs to find a suitable text and delete words from it. Nevertheless, Hughes does not consider cloze tests very reliable, because we do not know which ability (speaking, writing, reading, etc.) of the examinee they measure. Moreover, deleting at a regular interval of every ninth word does not work very well, because some deleted words are very difficult to determine (Hughes 62-67). A variant of the cloze test deletes only parts of words, leaving their initial letters in place. This variant is more advantageous for the examinee, as the texts are shorter and less difficult; on the other hand, the gaps are so close to one another that the learner can get lost in the text (Hughes 71).
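The mechanical construction just described, an intact opening stretch followed by regular nth-word deletion, can be sketched in a few lines. This is only a sketch of the general procedure, not Scrivener's or Hughes's method verbatim; the gap format and parameter names are assumptions:

```python
def make_cloze(text, n=7, intact=15):
    """Turn a text into a cloze test: after the first `intact` words,
    replace every nth word with a numbered gap.  Returns the gapped
    text and the answer key."""
    words = text.split()
    gapped, key = [], []
    for i, w in enumerate(words):
        # leave the opening stretch whole so readers get context
        if i >= intact and (i - intact) % n == n - 1:
            key.append(w)
            gapped.append(f"........({len(key)})")
        else:
            gapped.append(w)
    return " ".join(gapped), key

# Demonstration on a dummy 30-word passage:
passage = " ".join(f"word{i}" for i in range(30))
gapped, key = make_cloze(passage, n=7, intact=10)
print(key)  # -> ['word16', 'word23']
```

Because the deletion is purely positional, the teacher's only real work is choosing a suitable text, which is exactly why cloze tests are so easy to produce, and why some deleted words turn out to be hard to recover.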

b. C Test
The C-test, an integrative written test of general language proficiency, belongs to the family of reduced redundancy tests. It consists of five to six short, authentic, complete texts. The first and the last sentences of each text are left intact. From the second sentence on, the 'rule of two' is applied: beginning at the second word of the second sentence, the second half of every second word is deleted. Numbers, proper names, and one-letter words are left intact, but otherwise the deletion is entirely mechanical. The process continues until the required number of blanks has been produced. The texts are arranged in order of difficulty, with the simplest first. First proposed in 1981 by Klein-Braley as an alternative to the cloze test, the C-test has, since its introduction into the field of language testing, been the subject of a substantial body of empirical research, some of it devoted to validating the C-test as a measure of general language ability through qualitative and quantitative methodologies.
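One simple reading of the 'rule of two' can be sketched as follows. Treating capitalised words as a crude stand-in for proper names, and letting exempt words still count toward the every-second-word alternation, are simplifying assumptions of this sketch rather than part of Klein-Braley's specification:

```python
import re

def mutilate(word):
    """Delete the second half of a word, keeping the first half
    (rounded up) and marking deleted letters with underscores."""
    keep = (len(word) + 1) // 2
    return word[:keep] + "_" * (len(word) - keep)

def make_c_test(text):
    """Simplified 'rule of two': first and last sentences stay intact;
    from the second word of the second sentence on, every second word
    loses its second half.  One-letter words, numbers, and capitalised
    words are passed over (but still counted)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces, key = [], []
    count = 0  # running word count from word two of sentence two
    for s_i, sent in enumerate(sentences):
        if s_i == 0 or s_i == len(sentences) - 1:
            pieces.append(sent)  # framing sentences are left intact
            continue
        rebuilt = []
        for w_i, word in enumerate(sent.split()):
            if s_i == 1 and w_i == 0:
                rebuilt.append(word)  # mutilation starts at word two
                continue
            count += 1
            core = word.rstrip(".,;!?")
            exempt = len(core) <= 1 or core.isdigit() or core[:1].isupper()
            if count % 2 == 1 and not exempt:
                key.append(core)
                rebuilt.append(mutilate(core) + word[len(core):])
            else:
                rebuilt.append(word)
        pieces.append(" ".join(rebuilt))
    return " ".join(pieces), key
```

For example, `make_c_test("One intact sentence here. the cat sat on the mat today. Final sentence left whole.")` mutilates "cat", "on", and "mat" in the middle sentence while leaving the framing sentences untouched.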

Dictation:
The examiner dictates a text and the students write it down. Here we mainly examine spelling and pronunciation, as well as listening. For the teacher, dictation is an easy way of testing because the preparation is minimal. However, we should consider a dictation correct as long as the word order is correct, and misspelled words should be accepted when they are phonologically accurate (Hughes 71-72). Another disadvantage is the difficulty of evaluation: teachers generally decide for themselves which mistakes count as serious and which as minor, so it is advisable to set the evaluation scale before beginning to correct. There is also the issue of objectivity, because every teacher will look at the dictations from his or her own perspective. To avoid this, we can use an alternative called a paused dictation: the students receive a text with missing words and fill them in while the teacher dictates (Berka, Váňová 36-37).

Testing Communicative Competence:


Testing learners' communicative skills means gathering information about the testees' ability to perform tasks in specific contexts in the target language. The more realistic the test tasks are, the better they will reflect the testees' performance. Communicative competence is made up of four areas: linguistic, sociolinguistic, discourse, and strategic competence. Linguistic competence is knowing how to use a language's grammar, syntax, and vocabulary. A competent communicator will, for example, take turns when talking instead of interrupting, will know when it is appropriate to ask questions in order to further the conversation, and will read nonverbal cues and feedback from the receiver to know when the conversation is over.

Testing Literature:
The objective of literature testing is to institutionalize certain concepts and facts that are part of literary learning and to focus attention on the relatively more significant literary skills (Tenbrink, 1998). In her paper "Expanded Dimensions to Literature in ESL/EFL: An Integrated Approach," Stern (1987) emphasized the great impact of literature on language learning.

Rationale for Testing Literature:

Why should literature be tested? To develop literary competence, to bring literary works of art into students' intellectual and emotional baggage, and to develop decision-making and meaning-making skills.

Literature Test Formats:
The literature test format addresses specific language abilities. It may be written or oral. Oral literature tests challenge the students' speaking and listening abilities; written tests demand reading and writing skills.

Literature testing has been influenced by reading theories, literary theories and criticism, and styles of teaching. Questions fall into two main categories: questions that do not require contact with the text, and questions that require contact with the text. Correspondingly, literature tests are divided into two categories: literary information tests and literary interpretation tests.

According to the Taxonomy of Cognitive Questions, the questions are classified as follows,
depending on complexity:
1. Literal Understanding
2. Restructuring
3. Inferential Understanding
4. Assessment
5. Appreciation

Evaluating English Language Tests conducted in Sri Lanka at National Level:


1. G.C.E. O/L English Language Paper
2. G.C.E. A/L General English Paper

Evaluating International English Language Tests:

1. IELTS Exam
IELTS is an English language exam that international candidates considering studying or working in a country where English is the primary language of communication are required to take. IELTS is jointly owned and administered by IDP Education Australia, the British Council, and Cambridge English Language Assessment.

2. TOEFL Exam
The TOEFL test aims to assess your ability to communicate in English in academic, university, and classroom-based environments in particular. It is accepted by more than 8,500 institutions in 130 countries, including the UK, the USA, and Australia, as well as by all of the world's top 100 universities.
