The Fundamentals of Assessing EFL Writing
Touria Drid
Université Kasdi Merbah Ouargla
Abstract: Given the significance of writing ability in academic and professional contexts, assessing
this skill has become one of the widely discussed and researched issues in language education. In
general terms, it is observed that assessing writing proficiency, which constitutes an integral part of
language instruction, is at times said to be performed haphazardly owing to instructors' insufficient
theoretical grounding, inadequate training in this activity, or contextual constraints. The case of
assessing writing in English as a foreign language (EFL) is no exception. Hence, EFL writing
assessment scholars highlight the need to promote assessment literacy among
its practitioners. The present paper discusses the fundamentals of EFL writing assessment. Its goal is
to provide a resource that writing instructors can draw on to enhance their knowledge, skills and
practice of this pedagogical task. The paper is also meant to assist in equipping trainees with the
requisite knowledge base on the field of assessment.
Keywords: assessment, foreign language, test, writing skill.
Introduction
At the heart of educational discussions lies the issue of writing instruction, a
major aspect of literacy. Good writing performance comes to the fore in academia because
it is an indication of students' ability to communicate, critically grasp and display the
knowledge transmitted to them in various disciplines (Adler-Kassner & O'Neill, 2010). Even
beyond academic circles, writing is cardinal in almost every profession. As the role of writing
becomes prominent, the need for an adequate way to assess writing proficiency in language
learning contexts appears to be compelling in order to make the right inferences about writing
ability and to take subsequent pedagogical and curricular decisions. Against this background,
assessment of English as a foreign language (EFL) writing has emerged as one of the most debated
themes of language teaching and has prompted substantial research on its various theoretical
and practical aspects.
Pertaining to the knotty realm of evaluation, assessing EFL writing is fully grounded
in a multifaceted theory and requires consistency to yield its desired outcomes. However,
observation of current routines of EFL writing instructors in various educational contexts
indicates that their assessment practices are detached from explicit rationales. In spite of the
availability of extensive and technically elaborate literature on writing assessment issues, it
appears that EFL writing teachers might be better enlightened about the field if more practical
resources were offered. Therefore, the need for succinct and handy material becomes manifest.
In this paper we undertake the task of providing a resource roadmap of the theory of EFL
writing assessment, which may serve in improving current practices in EFL teaching contexts.
The paper first defines the fundamental concepts in assessment, its types and its approaches
within the framework of writing instruction. At the heart of the discussion, the standards of
assessment are elucidated and the variety of writing assessment methods is explored,
including both traditional and inventive techniques. The ultimate objective is to minimize the
distance between theory and practice and to address in-service EFL writing instructors’ need
for a plain and functional resource and even contribute to a more efficient training in writing
assessment for pre-service teachers.
A further term which is often confused with assessment is testing on account of their
shared focus on gathering data to estimate learning. Some authors use them interchangeably
(Bachman & Palmer, 1996), while others make some distinctions. Brown (2004)
differentiates them on the basis of their formality, scope and performer. Tests are planned
methods of evaluation prearranged by teachers. By contrast, assessment is more or less an
ongoing process which can be incidental or intended and which can be undertaken by the
teacher, the peers or the learners themselves. It appears then that tests form an important
subset of assessment. Weigle and Malone (2016) hold a slightly dissimilar view. For them,
the two terms are positioned on a stakes continuum. Assessment is confined to classrooms, has
low stakes and is developed by teachers, while testing has higher stakes, is developed by
professionals, is standardized and is applied in larger contexts such as schools, districts, states
and so on. In sum, assessment in language learning is seen as evaluation which focuses on
measuring learners’ performance and which makes use of tests, among other methods.
Although the aforementioned distinctions seem clear-cut, some researchers tend to
use the terms interchangeably. It is important then to consider what sense is meant
in any discussion of assessment issues.
B. Methods of Assessment. Scholars distinguish two broad perspectives on the manner in which
assessment is conducted: formal assessment and informal assessment. Obviously, outlining the
essentials of formal assessment is an oversimplification of this massively rich area of research. In some
discussions, formal assessment is seen as equivalent to testing (Harris & McCann, 1994).
However, some researchers consider them dissimilar. According to Brown (2004), for
example, formal assessment includes “exercises or procedures specifically designed to tap
into a storehouse of skills and knowledge. They are systematic, planned sampling techniques
constructed to give teacher and student an appraisal of student achievement” (p.6). For him,
all tests are formal assessments, but two features distinguish tests from other forms of formal
assessment: tests are time-bound and are based on small samples of behaviour. On these
grounds, formal assessment research seems at times to be restricted to discussing testing
issues alone.
Informal assessment, on the other hand, is performed by teachers in ordinary
conditions and normal classroom environments. In this connection, Harris and McCann
(1994) explain that every intuitive evaluation of students’ performance of various abilities is
part of this kind of assessment. Teachers can straightforwardly see whether students encounter
difficulties, what attitudes they have towards learning, how much effort they exercise and how
much they are involved in class work. For Brown (2004), informal assessment involves all
incidental, spontaneous remarks or impromptu feedback to the student without necessarily
compiling results or taking them as a basis for subsequent decisions about competence. Even
seemingly insignificant comments on papers, feedback on draft work or suggestions on specific
strategies are included. It is important to note that informal assessment can consider both
linguistic (in case language is being assessed) and non-linguistic factors. Harris and McCann
(1994) see that non-linguistic factors, though difficult to evaluate, constitute an important
section of what goes on in the classroom. It is then the task of teachers to devise consistent
methods for the appraisal of such factors as learner attitude, cooperativeness, independence,
creativity and presentation. The intricacy of assessment methods is summarized in Figure 3.
Figure 3. Methods of assessment
The first, norm-referenced assessment, entails the comparison of individuals' performance to each
other. Based on the principle of fairness, the point of the test in this case is to rank the learners for
the purpose of distributing limited resources or positions (e.g. grants, job vacancies). The second
paradigm, criterion-referenced assessment, compares individuals' performance to given standard(s)
in order to determine whether learners have met pre-established criteria or instructional objectives (Fulcher, 2010).
Another axis on which testing goals can be placed is the formative-summative
continuum, that is, as a way to probe into strengths and weaknesses during a course or as a
means to sum up learner attainment at the end of a course. Cohen (2001) puts this within a
larger taxonomy, which encapsulates three broad categories of test functions: administrative,
instructional or research functions. Within these categories, the following specific test types
are very common:
- Placement tests: Aim at supplying information that will help allocate students to proper
classes.
- Diagnostic tests: Used to spot students’ strengths and weaknesses.
- Achievement tests: Permit learners to exhibit the growth they have made in their course.
- Performance tests: Give information about students’ ability to perform particular tasks,
usually associated with known academic or workplace requirements.
- Proficiency tests: Used to assess a student’s broad level of competence, usually to offer
certification for employment, university study, and so on.
These extensive categories represent the chief rationales for assessment work, but
teachers might utilize tests for other supplementary purposes, such as enhancing learners’
motivation, providing practice opportunities for national or international exams, collecting
information to inform subsequent teaching, or appraising the success of their methods, tasks, or
materials (Hyland, 2003).
vital for assessors to gain sufficient literacy on the workings of systematic assessment. For the
performance of writing assessment to be efficient, the fundamentals of general assessment
ought to be attended to, obviously with careful attention to the nature and intricacies of the
writing skill itself. The task of instructors is to balance a number of principles, which are in
fact applicable to all other subjects, for optimum outcomes. These include validity, reliability,
practicality, authenticity and accountability.
better to design a test that is primarily valid, and then search for ways of making it reliable,
rather than creating a reliable test and attempting to make it valid (Johnson, 2001).
In the context of assessing writing, reliability is critically important, notably for high-stakes
tests, those on which placement or passing decisions are based. However, threats to reliability are
numerous and often not observed by practitioners. In general terms, Cohen (2001)
distinguishes three factors which might influence reliability of assessment: test factors
(related to the test itself and rating), situational factors (related to the conditions of test
administration) and individual factors (related to the state of the test takers). Figure 4
expounds on these factors. Indeed, achieving invariant perception of writing ability is a
stumbling block in writing classes. Harris and McCann (1994) mention the case of
“impressionistic marking”. This involves marking according to indiscriminate scales which
do not explicitly state the evaluation criteria. In that case, examiners might produce
inconsistent judgments even if the students’ writing is evaluated by the same teacher on
different occasions. Such scales as “Excellent writer”, “marginal writer” or “poor writer”
remain highly subjective unless accompanied by unambiguous criteria with detailed descriptors of
writing ability. The criteria can cover such features as comprehensibility, grammatical
accuracy, spelling, and text organization. According to Nation (2009), another way in which
reliability is at risk is restricting assessment to one piece of writing, often evaluated by one
examiner. For more consistent results, there should be more than one writing task with at least
a second scorer, although this might seem impractical.
On the whole, Weigle (2002) argues that reliability must be integrated into the
assessment process by establishing standardized procedures with well-defined modus
operandi for test construction, supervision and scoring, which would lessen bias in case
practical problems arise. Some researchers opt for indirect assessment as a tool to diminish
disparity in test results, on account of its power to reveal test takers' knowledge of writing
sub-skills. However, others dismiss such tests because they are based on correctness at the
expense of communicativeness (Hyland, 2003). On the whole, to enhance reliability in EFL
writing assessment, East (2008) argues that examiners have to devise cautiously worded and
satisfactorily thorough rubrics. Training raters in scoring and determining the extent to which
raters concur about rated scores are also crucial.
Figure 4. Factors influencing the reliability of assessment: test factors, situational factors and individual factors.
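To make the notion of rater concurrence concrete, the short sketch below computes exact and adjacent agreement between two raters who have scored the same set of scripts on a five-band holistic scale. It is a hypothetical illustration with invented scores, not a procedure prescribed by East (2008); operational studies would typically complement such figures with correlation or kappa coefficients.

```python
# Hypothetical illustration: gauging how far two raters concur on holistic scores.
# The score lists are invented; a real study would also report kappa or correlation.

rater_a = [4, 3, 5, 2, 4, 3, 4, 5, 3, 2]   # rater A's scores (bands 1-5) on ten scripts
rater_b = [4, 3, 4, 2, 5, 3, 4, 4, 3, 3]   # rater B's scores on the same scripts

pairs = list(zip(rater_a, rater_b))
exact = sum(a == b for a, b in pairs) / len(pairs)              # identical bands
adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)  # within one band

print(f"Exact agreement:    {exact:.0%}")
print(f"Adjacent agreement: {adjacent:.0%}")
```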
D. Authenticity. Authenticity differs somewhat from the previous principles in the sense that
it is specific to the assessment of language. This principle is a way to evaluate tests in terms
of their representation of the target language use (TLU). Bachman and Palmer (1996) define it
as the extent of correspondence between the traits of test tasks and those of the target language use.
When tests are authentic, their results may be generalized further than performance in the test
to true language use in non-test domains. For example, an authentic task that measures writing
ability is one that requires learners to write meaningfully, as is the case in real situations. As a
result of authenticity, they add, test takers would have positive perceptions of the relevance of
the assessment tool and would therefore react positively to the tasks. Brown (2004) sees that
by ensuring the authenticity of language tests, one would present natural, non-contrived
language which matches real world language. If writing tests are authentic, the type of writing
students will produce would simulate the kind they encounter in the real world, at least in
terms of genres and communicative purposes (e.g. writing a letter of complaint, producing a
tourist brochure, etc.). Weigle (2002) observes that in FL learning contexts, where the target
language is scarcely used outside the classroom, it might be thorny to hit upon a writing task
that presents an authentic writing situation. Thus, test developers in such contexts sometimes
allot authenticity less weight than other principles.
A. Indirect Assessment. In general, indirect (or objective) assessment uses tasks which are
not reflective of real target language use situations but are used to make inferences about the
ability underlying performance on the test (Richards & Schmidt, 2002). Indirect assessment
is a traditional method of assessing writing which was popular in the 1950s and 1960s.
Attempting to measure the sub-skills involved in writing, this type of assessment usually
employs multiple choice questions, error spotting or other selected response measures
(Weigle, 2012). Indirect assessment reflects the accepted ideas about composition at the time,
when focus was placed on such features as grammar, usage and punctuation. Although it is
recognized to be consistent and easy to administer and score, writing specialists have noted
important limitations of objective assessment: It seems to decontextualize knowledge and
meaning making, as it does not require real writing. Narrowing the conception of competence,
depriving students of revision opportunities as well as excluding rhetorical and contextual
considerations in writing are the most noticed drawbacks of this form of writing assessment
(Neff-Lippman, 2012). It should be stressed, however, that indirect assessment is highly
reliable and practical although it is deficient in terms of validity and authenticity.
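The reliability and practicality noted above stem largely from the fact that selected-response items can be scored mechanically against an answer key, with no rater judgment involved. The fragment below is a minimal, hypothetical sketch of such scoring; the item labels and keys are invented for illustration.

```python
# Minimal sketch of mechanical scoring for selected-response (indirect) items.
# Item labels and keys are hypothetical.

answer_key = {"Q1": "B", "Q2": "D", "Q3": "A", "Q4": "C"}       # e.g. error-spotting items
student_answers = {"Q1": "B", "Q2": "C", "Q3": "A", "Q4": "C"}

score = sum(student_answers.get(item) == key for item, key in answer_key.items())
print(f"Score: {score}/{len(answer_key)}")  # identical result for any scorer, hence reliable
```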
B. Direct Assessment. Direct assessment, as its name suggests, requires that learners' writing
ability be directly evaluated. In broad terms, a direct test refers to “a test that measures ability
directly by requiring test takers to perform tasks designed to approximate an authentic target
language use situation as closely as possible” (Richards & Schmidt, 2002). Indirect tests
appraise the key abilities which are thought to be indicators of the target behaviour, but they
do not model the behaviour itself, while direct tests seek to reproduce the real eventual
behaviour in the test itself (Johnson, 2001). In assessing writing directly, the test tasks involve
production of a sample of writing. Through these tests, students show writing competence
rather than spot the right answers without production. Reflecting changes in composition
theory, this form of assessment supplanted the indirect paradigm and has become widely used
since the 1970s. In fact, it is still used in standardized examinations nowadays and is
highlighted as a typical form of large-scale assessment. Weigle (2002) asserts that direct tests
are the most widespread and the best researched methods in all contexts of language learning.
The form of direct writing assessment is well-defined. Essentially, such measuring
devices are administered in a limited time frame (hence the term “timed impromptu writing
test”), and the topic is not supplied to writers before the examination. Hamp-Lyons (1991)
specifies five additional key features:
(1) Writers produce one piece of continuous writing (at least 100 words),
(2) Writers receive a set of instructions (or prompt) but with flexibility given for dissimilar
responses,
(3) Produced samples are read by at least one but normally two or more qualified raters,
(4) Judgment is tied to a common standard (model essays or rating scales),
(5) Judgment is expressed in numbers (a minimal scoring sketch follows this list).
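Features (3)–(5) imply that a script's final mark is a numerical judgment combining at least two trained ratings against a common standard. The sketch below is one hypothetical way of operationalizing this, not a procedure specified by Hamp-Lyons (1991): close scores are averaged, and a larger discrepancy triggers a third reading.

```python
# Hypothetical resolution of two raters' scores on a common band scale.
# The adjudication rule (third reading for gaps wider than one band) is illustrative only.

def resolve_score(rater1, rater2, third_rater=None, max_gap=1):
    """Return a final numeric judgment from two ratings of the same script."""
    if abs(rater1 - rater2) <= max_gap:
        return (rater1 + rater2) / 2                 # scores agree closely: average them
    if third_rater is None:
        raise ValueError("Discrepant scores require a third reading.")
    third = third_rater()                            # independent third judgment
    nearest = min((rater1, rater2), key=lambda s: abs(s - third))
    return (third + nearest) / 2                     # average with the closer original score

print(resolve_score(4, 5))                           # -> 4.5
print(resolve_score(2, 5, third_rater=lambda: 4))    # -> 4.5 (third rating settles the gap)
```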
According to Weigle (2012), both the proper construction of tasks and the appropriate
implementation of scoring are important in the use of direct tests, especially to ensure
reliability and eliminate variation in the results of assessment. As for the construction of tasks,
here are three factors that do influence performance: subject matter (personal Vs non-personal
topics, general Vs specialized topics), discourse mode (genre, rhetorical task, cognitive
demands) and stimulus material (textual, visual). It is necessary that EFL writing instructors
balance these factors in order to make their assessment more systematic and reflective of
genuine competence.
Regarding scoring procedures in direct assessment, three approaches can be
utilized: holistic scoring, analytical scoring and primary trait scoring, all of which use a
rating scale (or a scoring rubric). Holistic scoring is developed in such a way that it assesses
writing performance, and it complies with the validity and reliability principles. It starts from
the belief that evaluating writing skill does not involve measuring an array of sub-skills, but
rather measuring a whole piece of discourse (Williams, 2003). In holistic scoring, raters give
a single score (or point) for the whole script based on the trained rater's overall impression (e.g. 1, 2, 3 or
4). For each point, a general overall description of performance is given (descriptors specify
clear criteria but integrate them into a single band description). The use of such scales requires
training raters so that consistent scoring can be achieved, and it is preferred when assessing a
large number of tests (Weigle, 2002). Analytical scoring divides writing ability into
fundamental elements or criteria (e.g. content, word choice, mechanics, organization,
grammar) and assesses them independently. Focus is put on traits which are held to be
common to all writing. The criteria of assessment are separated and the descriptors for each
are supplied independently. Discrete scores are attributed to separate aspects of performance,
permitting learners to pinpoint their strengths and weaknesses in precise areas (Brown, 2004).
This type of scale is more appropriate for formative assessment. Primary trait scoring focuses on
selected aspects of writing, usually a specific range of discourse (e.g. persuasion or
explanation) (Weigle, 2002). The writer's performance on the particular task at hand is
assessed in terms of how far it achieves a given rhetorical goal.
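The contrast between holistic and analytic scoring can also be stated computationally. The sketch below sets out a hypothetical analytic rubric, not a scale drawn from Weigle (2002) or Brown (2004): each criterion is scored separately on the same band scale, and a weighted total is reported alongside the per-criterion profile that lets learners see their strengths and weaknesses.

```python
# Hypothetical analytic rubric: separate criteria, separate scores, one weighted total.
# Criterion names and weights are illustrative, not taken from a published scale.

RUBRIC_WEIGHTS = {
    "content": 0.30,
    "organization": 0.25,
    "vocabulary": 0.20,
    "grammar": 0.15,
    "mechanics": 0.10,
}

def analytic_total(criterion_scores):
    """Weighted total on the same 1-5 band scale as the individual criteria."""
    return sum(RUBRIC_WEIGHTS[c] * s for c, s in criterion_scores.items())

script = {"content": 4, "organization": 3, "vocabulary": 4, "grammar": 2, "mechanics": 3}
for criterion, band in script.items():
    print(f"{criterion:<12}: {band}")                   # diagnostic profile per criterion
print(f"{'total':<12}: {analytic_total(script):.2f}")   # weighted overall score (here 3.35)
```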
While impromptu timed tests have brought the assessment of writing much rigour,
especially in large-scale testing situations, doubts are often raised regarding how faithfully
this method reflects learners' real competence. Weigle (2002) argues that direct testing
judges a single piece of writing produced under non-ordinary conditions. This seems to
present only a partial picture of students’ abilities. Further, Neff-Lippman (2012) sees that
direct testing discards process and contextual issues and represents a restricted conception of
competence. But because direct tests are still widely used, she suggests a number of qualities
(e.g. clarity, engagement and audience specification, etc) to be incorporated in their
construction for more efficiency.
to open variation in the modes of accumulating and appraising learner written products
(Weigle, 2002).
4. Conferences and Interviews. Conversational in nature and rooted in the process approach
to writing, conferences involve discussion of learners' written work, portfolios or journals
with teachers and peers in order to fine-tune ideas, talk about difficulties, highlight strengths
and weaknesses or receive feedback (Thornbury, 2006; Richards & Schmidt, 2002). It is
claimed that conferences are a formative assessment tool whose chief function is to offer
positive washback. An interview is a carefully constructed type of conference in which
teachers question students about a specific assignment using focused probes. The use of
both conferences and interviews in assessing writing informally calls for caution in order to
conform to the principles of assessment. Both are shown to be of low practicality, while their
reliability rests on a clear specification of objectives and procedures (Brown, 2004).
Conclusion
We have attempted in this paper to provide a holistic picture of the essentials of EFL
writing assessment, whose understanding might enhance current practices. This is, in a way, an
attempt to reaffirm the urgency of promoting sufficient writing assessment literacy
among EFL writing instruction practitioners and of developing their pedagogical repertoire with
inventive, active methodologies. Against this background, some recommendations are
offered to mend the malfunctioning parts of the assessment apparatus and to eliminate the
widely observed unproductive, static assessment routines in EFL writing classes. The
following pointers are provided:
- EFL writing teachers ought to be fully acquainted with the technical distinctions and the wide variety of assessment purposes.
- For EFL writing assessment to yield its desired outcomes, the principles of general assessment have to be observed.
- While institutional restrictions may not always permit teachers to use the assessment tasks they would favour, practicality issues should be cautiously treated in such a way that validity is ensured through assessing writing performance.
- The pursuit of reliability should not seek consistency of measurement at the expense of preparing learners for a more authentic use of the target language.
- Writing assessors have to receive sufficient training in test construction methods and scoring procedures in both pre-service and in-service contexts in order to ensure fair and effective assessment. This can be achieved through collaborative practice and appropriate benchmarking of texts to achieve consistency.
- In order for writing assessment to have a positive influence on teaching and to promote learner progress, alternative tools have to be integrated into EFL writing classes.
In the end, it should be stated that the enterprise of assessing EFL writing follows an
intricate network of principles and approaches derived from the vast field of general
assessment. These are tailored to fit the nature of the writing skill and the context of language
teaching simultaneously. In fact, an appropriate practice of assessing EFL writing must be
grounded in a thorough knowledge of assessment fundamentals. Not equipped with adequate
assessment literacy, EFL writing instructors may fall into the trap of rendering this activity a
mere psychometric, statistical process, which discards important aspects of language learning
and which provides no direct feedback to teaching. Written language is in the first place a
medium of communication, and if assessing writing does not help in preparing EFL writers
for wider communication, the role of writing programmes in developing literacy would be
negligible.
References
Adler-Kassner, L., & O'Neill, P. (2010). Reframing writing assessment to improve teaching
and learning. Logan, UT: Utah State University Press.
Bachman, L.F. & Palmer, A.S. (1996). Language testing in practice: Designing and
developing useful language tests. Oxford: Oxford University Press.
Bridgeman, B., & Carlson, S. (1983). Survey of academic writing tasks required of graduate
and undergraduate foreign students (TOEFL Research Report No. 15). Princeton, NJ:
Educational Testing Service.
Broughton, G., Brumfit, C., Flavell, R., Hill, P. & Pincas, A. (1980). Teaching English as a
foreign language (2nd ed.). London and New York: Routledge.
Brown, H. D. (2001). Teaching by principles: An interactive approach to language pedagogy.
New York: Longman.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. White Plains,
NY: Pearson Education.
Clapham, C. (2000). Assessment and testing. In M. Byram (Ed.), Routledge encyclopedia of
language teaching and learning (pp. 48-53). London: Routledge.
Cohen, A.D. (2001). Second language assessment. In M. Celce-Murcia (ed.), Teaching
English as a second or foreign language (3rd ed., pp. 515-534). Boston: Heinle.
Crusan, D. (2010). Assessment in the second language writing classroom. Ann Arbor, MI:
The University of Michigan.
East, M. (2008). Dictionary use in foreign language writing exams: impact and implications.
Amsterdam: John Benjamins Publishing Company.
Elliot, N., Plata, M., & Zelhart, P. (1990). A program development handbook for the holistic
assessment of writing. Baltimore: University Press of America.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Fulcher, G. (2010). Practical language testing. London: Hodder Education.
Gould, J., & Roffey-Barentsen, J. (2014). Achieving your diploma in education and training.
London: Sage Publications Ltd.
Hamp-Lyons, L. (1991). Scoring procedures. In L. Hamp-Lyons (Ed.), Assessing second
language writing in academic contexts (pp. 241-276). Norwood, NJ: Ablex.
Harris, M. & McCann, P. (1994). Assessment. Oxford: Heinemann Publishers.
Hathaway, J. (2014). Writing strategies for fiction. United States: Shell Educational
Publishing.
Hyland, K. (2003). Second language writing. New York: Cambridge University Press.
Hyland, K. (2013). ESP and writing. In B. Paltridge & S. Starfield (Eds.), The handbook of
English for specific purposes (pp. 95-114). Oxford: Wiley-Blackwell.