
Preprint · August 2023 · DOI: 10.13140/RG.2.2.23851.62243


Evaluating a Learning Design for EFL Writing Using ChatGPT

David James Woo a,*

a Precious Blood Secondary School, Hong Kong, China
* Corresponding author

- Postal address: Precious Blood Secondary School, 338 San Ha Street, Chai Wan, Hong Kong, China
- Email address: [email protected]
- Phone: +852 2570 4172

Funding Acknowledgements

This research received no specific grant from any funding agency in the public, commercial,

or not-for-profit sectors.

Declaration of Conflicting Interests

The author reports there are no competing interests to declare.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author,

David James Woo, upon reasonable request.

Biographical Note

David James Woo is a secondary school teacher. His research interests are in artificial

intelligence, natural language processing, digital literacy, and educational technology

innovations. ORCID: https://orcid.org/0000-0003-4417-3686



Evaluating a Learning Design for EFL Writing Using ChatGPT

Abstract

This study explores the application of ChatGPT in enhancing English as a Foreign Language

(EFL) writing skills in a Hong Kong secondary school setting. The innovation focused on

implementing 'machine-in-the-loop' writing instruction, with students using ChatGPT for

writing tasks and integrating AI-generated text with their own words. Evaluation showed that

while students could utilize ChatGPT's capabilities, heavy reliance on AI output could mask

their writing abilities. The study emphasizes the need for students to exercise more agency in

editing AI output and suggests pedagogical strategies. It provides valuable insights for

educators seeking to integrate ChatGPT in language education.

Keywords: learning design; EFL writing; ChatGPT; machine-in-the-loop; Hong Kong

secondary education

1 Introduction

Generative artificial intelligence (AI) language models such as OpenAI’s GPT-2,

GPT-3 and GPT-4 have captivated educational researchers’ and practitioners’ interest. This is

because they appear to be “stronger” AI (Hockly, 2023) that can generate large chunks of

coherent text indistinguishable from human writing (Brown et al., 2020). Moreover, ChatGPT

has popularized interaction with language models through a chatbot, that is, a conversational

user interface that enables human users to engage in meaningful verbal or text-based

exchanges with a computer program (Kim et al., 2022). Importantly, these language models

enable people to write with a machine-in-the-loop (Clark et al., 2018), that is, to write with

the support of generative AI that is designed to assist people, such as a chatbot, while people

exercise full agency over how to act on generative AI output, if at all. Since AI curricula (Chiu et al., 2022; Education Bureau, 2023) show a gap in instruction for writing with a machine-in-the-loop, such writing appears to be an educational technology innovation that has not yet been widely adopted (Rogers, 1962) but may enhance students’ writing abilities.

We are motivated to contribute theoretical and practical knowledge to advance writing with a machine-in-the-loop in schools, although we acknowledge that competent machine-in-the-loop writing can vary, not least with the type of student and the type of writing. In this paper, we report a case on the design, implementation and evaluation of this innovation.

2 The Teaching Context — A Need for Innovation

Our innovation coincides with the release of the POE app. At the time of the study, the app granted free access to ChatGPT and five other chatbots (i.e. Sage; GPT-4; Claude+; Claude-instant; and Google-PaLM) that rely on state-of-the-art (SOTA), commercial language models hundreds of billions of parameters in size (see Figure 1). The language models in SOTA chatbots have capabilities such as understanding abstract task descriptions and human concepts in natural language (Reynolds & McDonell, 2021) and chain-of-thought reasoning, that is, breaking down a problem into steps before delivering a verdict (Kojima et al., 2022). To take advantage of these novel capabilities and get desired output, students must learn prompt engineering, that is, how to craft appropriate instructions for generative AI to generate desired output (Reynolds & McDonell, 2021).

(Figure 1 here)
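To make prompt engineering concrete, the sketch below contrasts a bare prompt with an engineered one that adds an abstract task description and a zero-shot chain-of-thought cue (Kojima et al., 2022). It is an illustration under stated assumptions, not the study’s procedure: the workshop accessed chatbots through the POE app rather than an API, and the OpenAI Python client, model name and prompt wording here are ours.

```python
# Illustrative only: the study used the POE app, not an API. This sketch uses
# the OpenAI Python client to contrast a bare prompt with an engineered prompt
# that adds a task description and a zero-shot chain-of-thought cue
# ("Let's think step by step", Kojima et al., 2022).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

task = "Write a letter to the editor about banning e-scooters in Hong Kong."

# Bare prompt: the model receives the task and nothing else.
bare = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": task}],
)

# Engineered prompt: an abstract task description in natural language plus a
# step-by-step cue, so the model plans before it writes.
engineered = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You help an EFL secondary student draft a letter to the "
                    "editor of about 400 words in simple English."},
        {"role": "user",
         "content": task + " Let's think step by step: outline three "
                    "arguments and a counter-argument before drafting."},
    ],
)

print(bare.choices[0].message.content)
print(engineered.choices[0].message.content)
```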

Hong Kong’s educational conditions also influence our innovation. First, Hong Kong secondary schools deliver an English as a foreign language (EFL) curriculum, and we aim to release a curriculum module about writing with a machine-in-the-loop in this subject area. Second, at the Hong Kong school where the author is an EFL teacher, the principal has tasked the author with developing a generative AI ethical use policy for the school. The principal has also tasked the author with instructing colleagues in the EFL department so that they might acquire the technological pedagogical knowledge and skills (Puentedura, 2015) to integrate ChatGPT effectively into their writing classrooms.

Since we have considerable opportunity to leverage SOTA chatbots and to influence policy

and practice in our local education system, the stakes for the design, implementation and

evaluation of the innovation are high. As such, we are piloting our innovation, which we

report in the subsequent sections.

3 Description of the innovation

We report our pilot as a learning design, that is, a framework for describing learning

environments and learning activities (Conole & Wills, 2013). Because we had not found

learning designs for EFL students’ machine-in-the-loop writing with ChatGPT, we

approached our learning design’s development by design-based research (DBR) (Wang &

Hannafin, 2005), that is, a flexible and systematic methodology that can improve educational

practice iteratively through design, development, implementation and analysis.

We adopted an outcome-based learning design so that for any implementation, we

first designed its purpose and intended learning outcomes (ILOs), that is, what students

should achieve by the end of the implementation. Then we designed the learning activities,

that is, basic units of interaction with or among learners. Table 1 summarizes our initial

learning design for EFL students’ machine-in-the-loop writing with ChatGPT. The design

comprises its (1) title, (2) purpose, (3) ILOs, (4) learning activities, and (5) materials and

resources.

(Table 1 here)

4 Development and Implementation of the Learning Design

In the following sections, we report features of our pilot learning design that we

sought to evaluate; we then report the implementation and evaluation.



4.1 Writing Prompt

Students could write either a feature article or a letter to the editor. Figure 2 shows the prompts we selected, taken from the writing paper of the 2023 Hong Kong Diploma of Secondary Education (HKDSE) English language examination, the university entrance examination that almost all Hong Kong mainstream school students take in their final year of secondary school. Prior to the implementation, only two students had reported writing a feature article in English, while eight students had written a letter to the editor. Although many students may not have been taught to read and write these text types yet, we wondered if students could use ChatGPT and other chatbots’ capabilities to overcome their limitations and, if so, how students would integrate chatbot output to complete the task.

(Figure 2 here)

4.2 Task Rules

As the actual HKDSE writing prompts instruct students to write around 400 words,

we limited a student’s written work to no more than 500 words on Google Docs, using their

own words and words generated by POE chatbots. Students could prompt any POE chatbot in any way possible, as many times as necessary, and use any chatbot output. In this way, we

thought chatbots could adapt to students’ abilities and provide differentiated instruction

(Kohnke, 2022). Since we proposed students could use words from more than one chatbot,

we instructed students to differentiate their own words from AI words by highlighting words

from each chatbot in a specific color.
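As an aside, the word-count rule and the color-coding convention lend themselves to a simple tally. The sketch below is a minimal illustration rather than the study’s actual procedure, and it assumes (hypothetically) that each highlighted span has already been exported from a student’s document as a (source, text) pair.

```python
# A minimal sketch of checking the task rules, assuming (hypothetically) that
# each highlighted span from a student's Google Doc has been exported as a
# (source, text) pair, where source is "human" or a chatbot's name.
from collections import Counter

spans = [
    ("human", "In my opinion, this issue matters because"),
    ("ChatGPT", "technology has transformed how students learn and write."),
    ("Claude+", "Schools should therefore respond with clear guidelines."),
]

words_by_source = Counter()
for source, text in spans:
    words_by_source[source] += len(text.split())

total_words = sum(words_by_source.values())
assert total_words <= 500, "Task rule: written work may not exceed 500 words."

for source, n in words_by_source.most_common():
    print(f"{source}: {n} words ({n / total_words:.0%})")
```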

4.3 Marking Scheme

Since we were exploring how students would use SOTA chatbots and how these

chatbots would contribute to the quality of students’ written work, we used the actual

HKDSE writing paper marking scheme,¹ which comprises dimensions of content, language and organization. The highest possible score for each dimension is seven and the total possible score is 21.

¹ Please see the HKDSE English Language Paper 2 writing marking scheme.

The author and teachers from the school double-scored students’ written work. To do

this, we anonymized the texts so that a scorer would not know who wrote the text and which

words were human words. Then two scorers independently scored each text for dimensions of

content, language and organization according to the marking scheme. By identifying texts

with higher human-rated scores and analyzing these texts’ integration of human words and

chatbot output, we might then have evidence to inform effective practice for AI word use.
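For illustration, a double-scored mark can be combined as in the sketch below. This is only our reading of the procedure: two scorers award 0 to 7 per dimension, and the text’s final mark is the mean of the two scorers’ totals; the variable names and the averaging rule are assumptions, not confirmed details of the study.

```python
# A sketch of combining two independent scorers' marks for one anonymized
# text. Each dimension is scored out of 7, giving a total out of 21; taking
# the mean of the two scorers is one plausible reading of the double-scoring
# procedure, not a confirmed detail.
DIMENSIONS = ("content", "language", "organization")

scorer_a = {"content": 4, "language": 5, "organization": 4}
scorer_b = {"content": 3, "language": 5, "organization": 4}

per_dimension = {d: (scorer_a[d] + scorer_b[d]) / 2 for d in DIMENSIONS}
final_score = sum(per_dimension.values())  # out of 21

print(per_dimension)  # {'content': 3.5, 'language': 5.0, 'organization': 4.0}
print(final_score)    # 12.5
```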

4.4 Instructional Materials

Instruction focused on introducing chatbots, prompt engineering, the writing prompt

and task rules. We introduced chatbots using an inductive approach, first, showing a chatbot

screenshot and asking students, “What are you looking at?” Second, we asked students to

interact with a chatbot, before asking students what this type of generative AI is and how to

interact with it. We introduced the features of chatbots, including turn-taking and memory. To illustrate the garbage-in-garbage-out principle for interacting with chatbots, we showed a chatbot screenshot to students and asked, “What is a problem with this conversation?” Next, we

explicitly introduced prompt engineering by defining prompts, prompt engineering,

classmates’ actual prompt content and theoretical prompt content. Finally, we introduced the

writing prompt, task rules, materials (including the POE app and Google Docs) and assessment.

4.5 Implementation

The learning design was implemented at a one-hour, 45-minute workshop in the

STEM lab of the author’s school on July 5, 2023, and repeated on July 6. Six students

voluntarily participated in the July 5 workshop and 16 on July 6. The students came from

form levels 1, 2, 3 and 4.



The instructional materials were delivered in English by the author. At the same time, the author’s colleague provided simultaneous spoken translation in Cantonese. At the workshop, students were given 45 minutes to complete a writing

task using ChatGPT.

4.6 Evaluation

At the end of the workshop, students completed an eight-item, post-workshop

cognitive load questionnaire developed from measures by Paas (1992) and Sweller, van

Merrienboer, and Paas (1998). Table 2 summarizes the descriptive statistics of the questionnaire. On the six-point Likert rating scheme, 1 refers to strongly disagree and 6 to strongly agree. The results show that students generally agreed that the workshop content was difficult, that it required a lot of mental effort, and that they did not have sufficient time to complete the task.

(Table 2 here)
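The descriptive statistics in Table 2 are of the standard kind, and a sketch like the following reproduces them; the ratings listed are invented placeholders, since the raw responses are not reported here, and whether the study used the sample or population standard deviation is not stated.

```python
# A sketch of the descriptive statistics reported in Table 2. The ratings are
# invented placeholders on the six-point Likert scale (1 = strongly disagree,
# 6 = strongly agree); we assume the sample standard deviation.
import statistics

ratings = [6, 3, 5, 4, 2, 6, 4, 5, 3, 6]  # one item's responses

print("Average:", round(statistics.mean(ratings), 2))
print("SD:", round(statistics.stdev(ratings), 2))
print("Mode:", statistics.mode(ratings))
```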

Fifteen students submitted valid texts written according to the task rules, and five students wrote invalid texts. Figure 3 shows a feature article written with a student’s own words and a chatbot’s words. Figure 4 shows a letter to the editor written with a student’s own words and several chatbots’ words.

(Figure 3 here)

(Figure 4 here)

We analyzed valid texts for language features, including students’ use of chatbot words. The average length of a composition was 340 words, and a composition on average contained 80% chatbot words. In fact, only one student wrote at least 50% of her composition with her own words. The modes for the number of instances of human words, the total number of human words, the number of instances of chatbot words and the number of chatbots used were two, four, one and one, respectively, showing little integration of human words and little editing of chatbot output.
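These language-feature figures can be derived mechanically once each composition is reduced to per-source word tallies; the sketch below shows one way, with invented tallies rather than the study’s data.

```python
# A sketch of the corpus-level analysis, assuming each valid text has been
# reduced to per-source word tallies (as in the earlier sketch). The tallies
# below are invented for illustration, not the study's data.
import statistics

compositions = [
    {"human": 4, "ChatGPT": 330},
    {"human": 30, "Claude+": 310},
    {"human": 180, "ChatGPT": 160},
]

lengths = [sum(c.values()) for c in compositions]
chatbot_shares = [
    sum(n for source, n in c.items() if source != "human") / sum(c.values())
    for c in compositions
]
chatbots_used = [sum(1 for source in c if source != "human") for c in compositions]

print("Average length:", round(statistics.mean(lengths)))
print("Average chatbot share:", f"{statistics.mean(chatbot_shares):.0%}")
print("Mode of chatbots used:", statistics.mode(chatbots_used))
```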

The average human-rated score was 11.8, with scores ranging from 5 to 14.5. Content showed the lowest average score at 3.6 and language the highest at 4.5. For content, students’ texts lacked details and creativity; for organization, they lacked clear intra-paragraph structure, such as topic sentences and coherent referencing; for language, however, we found students’ sentences were verbose but accurate.

5 Reflection and Future Direction

The learning design for writing with a machine-in-the-loop has shown utility in our

EFL classroom. By leveraging SOTA chatbots’ capabilities, students at lower secondary

grade levels could answer a challenging HKDSE writing prompt. It appears students were

developing the knowledge and skills to prompt SOTA chatbots. In addition, most students

completed the writing task validly.

How students leveraged chatbots highlights weaknesses in the learning design. First, students relied heavily on large, unedited chunks of chatbot output to complete the task, and typically output from just one chatbot. Thus, ChatGPT is not serving as a supplemental language learning tool that adapts to students’ abilities (Kohnke, 2022). Instead, ChatGPT is replacing students’ language and masking their writing abilities. Second, although students scored adequately on average, they may not attain higher human-rated scores without exercising more agency to edit chatbot output for content and organization. How students leveraged chatbots and their output appears related to the difficulty of the task (Charters, 2003), which created a high cognitive load in many students.

We recommend the following pedagogical strategies that may reduce task difficulty

and facilitate student agency to edit chatbot output. First, the writing prompt should be

approachable from students’ existing abilities so that students can answer the prompt

independently or with some chatbot assistance (Vygotsky, 1978). For example, writing

prompts can feature topics, text types and word counts with which students are already

familiar. Second, teach task completion and prompt engineering in a multi-step, scaffolded

way. For instance, teach task completion according to a writing approach (e.g. genre-based;

process-based; product-based) and teach how to prompt SOTA chatbots according to each

stage in that writing approach. For process writing, students could learn about the roles that ChatGPT can play in pre-writing, drafting, editing and revising. In addition, for low-literacy EFL students, emphasize prompts for chatbots to produce output in students’ native language (Kohnke et al., 2023) and to produce English output in a simpler style.

Implementation of these recommendations will require more time, but by applying these

methods, students could move beyond using ChatGPT to replace their own effort to answer a

writing prompt. Finally, consider an assessment method that penalizes wholesale copying

from ChatGPT but rewards editing of chatbot output and the inclusion of students’ own

words in an answer. For instance, the HKDSE writing marking scheme does not account for copying text from other sources. However, the HKDSE marking scheme for integrated skills tasks does account for wholesale copying, penalizing it in the dimensions of language and appropriacy. Adopting a similar marking scheme may motivate students to critically evaluate chatbot output and to edit their writing more.
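One hypothetical way to realize such a marking scheme is to weight a dimension’s mark by the share of human or edited words, as sketched below. The weighting function is entirely our invention, not part of any HKDSE scheme.

```python
# A hypothetical scoring adjustment, not part of any HKDSE scheme: scale a
# dimension's raw mark by the share of human or edited words, so wholesale
# copying is penalized and editing is rewarded. The 50% floor is arbitrary.
def adjusted_mark(raw_mark: float, human_share: float) -> float:
    """raw_mark: 0-7 mark for one dimension; human_share: fraction in [0, 1]."""
    weight = 0.5 + 0.5 * min(max(human_share, 0.0), 1.0)
    return round(raw_mark * weight, 1)

print(adjusted_mark(6.0, 0.05))  # heavy copying: mark shrinks toward half
print(adjusted_mark(6.0, 0.60))  # substantial editing keeps most of the mark
```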



6 References

Brown TB, Mann B, Ryder N, et al. (2020) Language Models are Few-Shot Learners.

arXiv:2005.14165, 22 July. arXiv. DOI: 10.48550/arXiv.2005.14165.

Charters E (2003) The Use of Think-aloud Methods in Qualitative Research: An Introduction to Think-aloud Methods. Brock Education: A Journal of Educational Research and Practice 12: 68–82.

Chiu TKF, Meng H, Chai C-S, et al. (2022) Creation and Evaluation of a Pretertiary

Artificial Intelligence (AI) Curriculum. IEEE Transactions on Education 65(1): 30–

39. DOI: 10.1109/TE.2021.3085878.

Clark E, Ross AS, Tan C, et al. (2018) Creative Writing with a Machine in the Loop:

Case Studies on Slogans and Stories. In: 23rd International Conference on

Intelligent User Interfaces, New York, NY, USA, 5 March 2018, pp. 329–340.

IUI ’18. Association for Computing Machinery. DOI: 10.1145/3172944.3172983.

Conole G and Wills S (2013) Representing learning designs – making design explicit

and shareable. Educational Media International 50(1): 24–38. DOI:

10.1080/09523987.2013.777184.

Education Bureau (2023) Module on Artificial Intelligence for Junior Secondary Level.

Education Bureau.

Hockly N (2023) Artificial Intelligence in English Language Teaching: The Good, the

Bad and the Ugly. RELC Journal. DOI: 10.1177/00336882231168504.

Kim H, Yang H, Shin D, et al. (2022) Design principles and architecture of a second

language learning chatbot. University of Hawaii National Foreign Language

Resource Center. Available at: http://hdl.handle.net/10125/73463 (accessed 21

August 2023).

Kohnke L (2022) A Pedagogical Chatbot: A Supplemental Language Learning Tool.

RELC Journal. DOI: 10.1177/00336882211067054.

Kohnke L, Moorhouse BL and Zou D (2023) ChatGPT for Language Teaching and

Learning. RELC Journal. DOI: 10.1177/00336882231162868.

Kojima T, Gu SS, Reid M, et al. (2022) Large Language Models are Zero-Shot

Reasoners. arXiv:2205.11916, 24 May. arXiv. DOI: 10.48550/arXiv.2205.11916.

Paas F (1992) Training Strategies for Attaining Transfer of Problem-Solving Skill in

Statistics: A Cognitive-Load Approach. Journal of Educational Psychology 84:

429–434. DOI: 10.1037/0022-0663.84.4.429.

Puentedura R (2015) SAMR: A Brief Introduction. In: Ruben R. Puentedura’s Blog.

Available at: http://hippasus.com/blog/archives/227 (accessed 25 July 2023).

Reynolds L and McDonell K (2021) Prompt Programming for Large Language Models:

Beyond the Few-Shot Paradigm. arXiv:2102.07350. arXiv. DOI:

10.48550/arXiv.2102.07350.

Rogers EM (1962) Diffusion of Innovations. New York: Free Press.



Sweller J, van Merrienboer JJG and Paas FGWC (1998) Cognitive Architecture and

Instructional Design. Educational Psychology Review 10(3): 251–296. DOI:

10.1023/A:1022193728205.

Vygotsky LS (1978) Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.

Wang F and Hannafin MJ (2005) Design-based research and technology-enhanced learning environments. Educational Technology Research and Development 53(4): 5–23.

Figure 1

Six Chatbots on the POE App



Figure 2

Writing Prompts

Figure 3

A Feature Article

Note. Words from the Google-PaLM chatbot are in grey.



Figure 4

A Letter to the Editor

Note. Words from the Sage chatbot are colored purple; ChatGPT in green; GPT-4 in blue; Claude+ in red; and Google-PaLM in grey.

Table 1

Initial learning design

Title How to use ChatGPT effectively to complete a writing task


Time 1 hour 45 minutes
Purpose To provide hands-on experience with prompt engineering
Intended learning 1. I understand what a chatbot and a prompt are.
outcomes 2. I can access and use an app’s chatbots.
3. I can engineer prompts so I get what I want.
4. I can integrate my words and chatbot output to write my best
for a writing task.
Learning activities 1. Introduction to Workshop, AI and Language Models (10)
(minutes) 3. Interacting with Chatbots (10)
4. Defining Prompt Engineering (10)
5. Task introduction (10)
6a. Opening POE on an iPad (5)
6b. Thinking-aloud Protocol introduction (5)
7. Completing a Writing Task with ChatGPT (45)
8. Reviewing Concepts and Reflecting (15)
Materials 1. Generative AI tools on POE app on iPads
2. Google Docs
3. Shared Google Drive folder:
a. Contest website
b. Marking scheme
c. Pre- and post-workshop questionnaires
d. Workshop slidedeck

Table 2

A summary of student responses to a post-workshop cognitive load questionnaire

Questionnaire item (1 = strongly disagree, 6 = strongly agree)

1. The learning content in this workshop was difficult for me. (M = 4.14, SD = 1.58, Mode = 6)
2. I had to put a lot of effort into answering the questions in this workshop. (M = 4.18, SD = 1.44, Mode = 3)
3. It was troublesome for me to answer the questions in this workshop. (M = 4.23, SD = 1.38, Mode = 3)
4. I felt frustrated answering the questions in this workshop. (M = 4.05, SD = 1.46, Mode = 3)
5. I did not have enough time to answer the questions in this workshop. (M = 4.18, SD = 1.59, Mode = 6)
6. During the workshop, the way of instruction or learning content presentation caused me a lot of mental effort. (M = 4.18, SD = 1.47, Mode = 6)
7. I need to put lots of effort into completing the learning tasks or achieving the learning objectives in this workshop. (M = 4.36, SD = 1.40, Mode = 6)
8. The instructional way in the workshop was difficult to follow and understand. (M = 4.00, SD = 1.66, Mode = 6)
