Student Perceptions of ChatGPT Use in A College Essay

This article has been accepted for publication in IEEE Transactions on Learning Technologies. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/TLT.2024.3355015. This work is licensed under a Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/).

I. INTRODUCTION

Generative artificial intelligence (AI) tools are penetrating society at an incredible rate and have produced significant disruption, particularly with the release of ChatGPT [1]. Writing as a uniquely human activity appears to be under threat with the onset of tools that can generate movie scripts, news articles, and journal manuscripts [2]. In education, the launch of ChatGPT has resulted in mixed feelings among educators [3], popular media [4], and researchers [5], with some declaring the college essay now dead [6] or banning the use of ChatGPT [7]. As reported by Sullivan and her colleagues [5], roughly 33% of 1,000 students surveyed used ChatGPT for their writing and, of those students, 75% acknowledged it as cheating [5]. Others are more optimistic that the use of AI will change education and essay writing for the better [3]. ChatGPT can provide students with immediate and personalized feedback, flexible learning, and accessibility [8]. These technologies also have the potential to make completing mundane tasks, including grading, more efficient [9].

In this study, students completed a college essay assignment that mandated use of ChatGPT. We obtained perceptions of ChatGPT before the assignment. Then, after the use of ChatGPT to complete an essay, we obtained perceptions of ChatGPT again. Previous research has indicated that new technological capabilities do not always change human perceptions and performance in expected ways [15]. Additionally, student perceptions play a vital role in shaping their motivation, engagement, and academic achievement [14, 16, 17]. As such, this study aims to provide quantitative and qualitative analyses of student perspectives of ChatGPT to fill this gap in the literature with a more informed and nuanced understanding of the use of AI in education. This study also seeks more practical contributions, including user-centered recommendations for effectively integrating ChatGPT in both digital and physical classrooms, enhancing its use in writing assignments, improving AI technologies, and informing teachers about LLM-supported grading.

Chad Tossell, Ali Momen, Katrina Cooley, and Ewart de Visser are with the Department of Behavioral Sciences & Leadership, USAF Academy, US (email: [email protected], [email protected], [email protected], [email protected]). Nathan Tenhundfeld is with the Department of Psychology at the University of Alabama – Huntsville, US (email: [email protected]).

A. Related Work

The emergence of AI tools, including ChatGPT, has brought about new possibilities and challenges in education. These technologies have changed teaching and learning, including how teachers interact with students, develop course material, tutor, and assess [see 18 for a review].
ChatGPT was effectively leveraged to provide interactive dialogues with students to help them learn a new language [19]. In STEM education, ChatGPT has shown promise in physics [20], math [21], design [14], and engineering [22]. Based on these results, scholars have argued that more capable tools, including ChatGPT, can become integral to more effective writing, akin to calculators supporting advanced mathematical computations [23]. Beyond support in traditional educational settings, there have been several applications of AI-based tools to enhance personalized learning, better support students with disabilities and inclusivity, and help make teaching and grading more efficient [10].

Although there is excitement based on this research and the innovative leaps in the development of LLMs, these AI tools have generated new concerns and exacerbated previous challenges [5]. These include copyright issues, bias, trustworthiness, excessive reliance on the technology, and the difficulty of effectively incorporating AI-based tools into teaching practices [24]. Additionally, AI's limited knowledge base and inconsistent factual precision have been recognized as significant drawbacks [25]. One concern is the potential for perpetuating bias and reinforcing existing inequalities: language models like ChatGPT learn from vast amounts of data, including potentially biased or discriminatory sources, which can result in biased or discriminatory outputs. Lastly, the mere availability of the tool can lead to distrust. For example, a teacher attempted to fail an entire class of students based on an incorrect suspicion of widespread ChatGPT use [7]. Given the debate about its ability to accurately perform human tasks, the morality of using the tool, and the distrust of its use, it is especially important to investigate trust in ChatGPT [26, 27].

AI, with its ability to analyze vast amounts of data and perform complex tasks, has also made inroads into grading systems. Automated grading algorithms have been developed to assess student assignments, saving time for educators and providing prompt feedback to learners. Automated writing evaluation (AWE) technologies, for example, can help teachers save time in assessing writing, encourage more writing practice, and complement writing instruction. Student perspectives on AWE are diverse [28, 29]. In one study, students rated AWE favorably in terms of ease, enjoyment, usefulness, and fairness. They reported revising more and increasing their confidence after using the system. However, students tended to focus on low-level writing feedback and sometimes felt overwhelmed by the amount of feedback provided. When directly asked about their preferences for human versus automated feedback, students tended to prefer comments from teachers or peers rather than computers [30, 31, 32].

B. The Importance of Student Perceptions in Education

Student perceptions play a vital role in shaping their motivation, engagement, and academic achievement [14, 16]. Positive perceptions of the learning experience can enhance students' engagement and motivation, leading to improved academic outcomes. Conversely, negative perceptions can result in disengagement, reduced motivation, and hindered academic success. In one of the only studies to date that investigated student perceptions of ChatGPT, Shoufan [14] assessed how students perceived ChatGPT and its impact on their learning. In the first stage, students were asked to evaluate ChatGPT after using it to complete a learning activity, and their responses were analyzed through coding and theme building. In the second stage (three weeks later), a questionnaire was administered, revealing that students found the tool helpful for their studies and work. However, students also acknowledged that ChatGPT's answers were not always accurate and recognized the need for background knowledge to effectively work with the tool. Despite its limitations, most students remained optimistic about future improvements in ChatGPT's performance. Outside of this study, most reports have predominantly examined the LLM system's capabilities from the perspective of engineers, educators, and researchers rather than students' perspectives in a natural setting.

C. The Current Study

This study explores student perceptions before and after use of ChatGPT as part of an essay writing assignment within a Human Factors Engineering in Design course at the United States Air Force Academy (USAFA). In contrast to previous studies, our analyses focus on student perceptions of the technology for an actual essay assignment requiring use of ChatGPT. We assess student responses to address three research questions (RQs):

RQ1: What are students' perceptions of an assignment requiring the use of ChatGPT?
RQ2: What are student perceptions of ChatGPT to support their learning, and are they comfortable taking responsibility for the content it produces?
RQ3: Do students trust ChatGPT, and how does this impact their intent to rely on it for future assignments and for grading?

II. METHOD

To investigate our RQs, we used a mixed-methods approach combining the strengths of quantitative and qualitative methods in education [33, 34]. The quantitative phase primarily used self-report ratings on Likert items in a pre-post test design. For the qualitative phase, two open-ended questions were asked and then analyzed to further understand the quantitative findings.

A. Participants

Participation in this study was voluntary and did not impact final grades in the course. Twenty-four of the 47 cadets (eight women) enrolled in the course
completed both the pre- and post-surveys. All participants who completed the pre-survey also completed the post-survey. Participants were in their senior year at USAFA with a mean age of 22.25 (SD = 1.23). The participants in this study, mirroring undergraduates in comparable engineering programs, had successfully completed a diverse range of foundational courses, including (but not limited to) calculus, mechanics, electrical circuits, thermodynamics, and their specialized electives. Unlike other undergraduate students, all USAFA cadets are required to take additional engineering courses in astronautical, aeronautical, mechanical, and electrical engineering regardless of their engineering focus. At the time of this study, cadets reported very limited experience with ChatGPT (Table I). None had used ChatGPT for a course assignment.

TABLE I
STUDENTS' EXPERIENCE WITH CHATGPT BEFORE THE ASSIGNMENT

Survey Question                                            | Yes (%) | No (%)
Have you heard of ChatGPT?                                 |   92    |    8
Have you opened ChatGPT?                                   |   30    |   70
Have you used ChatGPT?                                     |   30    |   70
Have you used ChatGPT in any way for a course assignment?  |    0    |  100

B. Setting

USAFA is an undergraduate educational institution where students are cadets in the military and undergo rigorous academic, physical, and leadership training to become officers in the United States Air Force or United States Space Force. Cadets are known to value honor, as demonstrated by their adherence to the Honor Code: "We will not lie, steal, or cheat, nor tolerate among us anyone who does." This code reflects the academy's commitment to fostering an environment of integrity, honesty, and ethical conduct among its cadets. Confidentiality and anonymity were maintained throughout the research process.

The course "Human Factors Engineering in Design" is the final course in the Human Systems Engineering major at USAFA and is required for all senior cadets majoring in this ABET-accredited degree program. It is an advanced course covering topics such as robotics, extended reality, ethics, theories, and methods in design. It adopts a graduate seminar format, emphasizing active participation and interaction rather than traditional lectures.

Grades in the course were determined by several components. Students were required to participate in critical discussions online and to participate actively in class; the submission of two papers (one analyzing a journal article and another addressing current HF/HCI challenges, i.e., the final course paper) also contributed to the final grade. For this final paper, students were required to use ChatGPT and were offered the opportunity to fill out questionnaires about their perceptions of and experiences with ChatGPT as part of this assignment.

C. Assignment

The final paper assignment involved writing an essay on a topic covered in class to extend the online and in-class discussions. Students were expected to present an intriguing point related to the topic, one that may not have been apparent without the prior discussions and readings. The essay needed to reference ideas from class and various assigned readings, use additional sources, and extend the discussion by incorporating these additional sources. Students were required to submit three components near the end of the semester: (1) an initial draft listing the uncorrected portions generated by ChatGPT and any human-generated content, (2) a second draft with corrections made by the student, highlighting and addressing any errors made by the AI, and (3) the final polished paper. The final paper adhered to a five-page limit, following the conference paper template of the Human Factors and Ergonomics Society. In the rubric provided to students, the paper needed to demonstrate novelty, structure, a strong case, technical understanding, and clear English presentation. The full assignment description and the rubric were presented to students halfway through the semester, and they were given roughly eight weeks to complete their essays.

D. Procedure

Nearly halfway through the course in the Spring Semester of 2023, students were provided the ChatGPT-supported assignment description (summarized above) on a physical handout. They were given roughly five minutes to read the assignment. Immediately after reading the assignment description, we administered the pre-survey with items described in the next section. Students then completed the assignment over the last half of the semester (roughly two months), and every cadet submitted it online ahead of the due date. After submission, they had the option to complete the post-survey and open-ended questions. Their essays were graded and used for their formal grade in the course. However, their anonymous participation in completing the surveys associated with this study was not factored into the essay or final grades.

E. Measurement of Perceived Assignment Difficulty and Quality

To address RQ1, we assessed student expectations of the assignment concerning difficulty, quality, and anticipated grade with single-item measures (see Appendix). Consistent with prior research, the validity and reliability of single-item measures have been demonstrated in self-assessments of learning [e.g., 35, 36, 37].
Participants rated the "difficulty of the assignment" on a Likert scale from 1 (Very Easy) to 7 (Very Difficult). Participants indicated their expectations of the "quality of this paper" on a Likert scale from 1 (Lower) to 7 (Higher). Participants were prompted to "Estimate what grade percentage (1 - 100%) you believe you will obtain for this assignment" and provided their response in a text box.

F. Measurement of Perceived Learning and Responsibility

To address RQ2, we used questions about the learning value of the assignment and the students' comfort in taking ownership of their work. Historical studies, such as those by Mabe & West [38] and Pace [29], have highlighted the effectiveness of self-reports as a genuine measure of student learning, especially under conditions of anonymity. Due to the absence of more direct student learning metrics in the course, we utilized "perceived learning value" and "relative learning value" as suitable alternatives based on research in self-efficacy [84]. Participants assessed the educational value of the assignment on a Likert scale from 1 (Not very valuable) to 7 (Very valuable). Participants compared the assignment's learning value to other papers, rating it on a Likert scale from 1 (Not very valuable) to 7 (Very valuable). Participants expressed their "comfort level in being responsible for all content created by ChatGPT" on a scale from 1 (Very uncomfortable) to 7 (Very comfortable).

G. Measurement of Perceived Trustworthiness

To address RQ3, we assessed the perceived trustworthiness of ChatGPT and participants' trust in ChatGPT. We utilized the updated Multi-Dimensional Measure of Trust, Version 2 (MDMT) questionnaire, containing the subscales Reliable, Capable, Ethical, Transparent, and Benevolent [40]. This survey was selected because of its reliability (our items: α > .90), its strong theoretical justification for distinguishing performance (reliable, capable) from moral (ethical, transparent, benevolent) trustworthiness [41], and its validation [40, 42]. The MDMT was executed as prescribed with 4 items per dimension, totaling 20 items. One additional item, "trustworthy," was added to test how well a single item reflects the entire scale, in line with efforts to make more efficient trust scales [43, 44]. Participants rated these 21 items about ChatGPT on a scale from 0 (Not at all) to 7 (Very).
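For readers who want to reproduce this kind of scale scoring, the sketch below shows one way to compute a subscale score and its internal consistency in Python. It is a minimal illustration assuming the pandas and pingouin libraries; the item names and ratings are invented stand-ins, not the study's materials or data.

```python
# Minimal sketch of MDMT-style subscale scoring; illustrative data only.
import pandas as pd
import pingouin as pg

# Four hypothetical items from one subscale (e.g., Reliable), rated 0-7.
items = pd.DataFrame({
    "reliable_1": [6, 5, 7, 4, 6, 5],
    "reliable_2": [5, 5, 6, 4, 7, 5],
    "reliable_3": [6, 4, 7, 5, 6, 6],
    "reliable_4": [5, 5, 6, 4, 6, 5],
})

# Internal consistency (the paper reports alpha > .90 for its items).
alpha, ci = pg.cronbach_alpha(data=items)
print(f"Cronbach's alpha = {alpha:.2f}, 95% CI = {ci}")

# Subscale score = mean of the four items per participant.
print(items.mean(axis=1))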
H. Measurement of Trust in and Reliance on ChatGPT

To measure trust, participants expressed their likelihood of relying on the agent in future situations using a Likert scale from 0 (Strongly Disagree) to 7 (Strongly Agree) [45], with items focused on whether participants would monitor ChatGPT's outcomes as well as rely on it in future scenarios (α > .74). This measure has been used previously to assess trust directly and serves as a good way to distinguish the trustworthiness of an automated agent from trusting that agent [43, 42, 45]. We adjusted the first four items to remove references to "surveillance" and "route," which were used in the original survey to refer to an automated navigation system.

We also assessed to what degree students trusted a grade assigned by ChatGPT and their preference to be graded by ChatGPT or the instructor. These measures were adapted for this essay assignment in education from surveys used in studies of military surveillance and training [46] and autonomous driving [47]. As shown in the Appendix, trust in grading was measured by rated agreement or disagreement with three statements concerning trust in the fairness of grading by the instructor, ChatGPT, and a combination of both. Ratings were provided on a Likert scale from 1 (Strongly Disagree) to 7 (Strongly Agree). Participants also selected their grading preference among the instructor alone, ChatGPT alone, or a combination of the instructor with ChatGPT.

Lastly, we measured propensity to trust, a construct used to capture individual differences in attitudes of trust toward machines [48, 49, 50, 51, 52]. This measure uses six items to characterize individual trait-based trust [53].

I. Open-Response Questions

To provide context to the quantitative measures, we asked participants to respond to two open-ended questions after submitting their assignment: 1) Please provide overall comments on your experience using ChatGPT on this assignment, and 2) How do you think ChatGPT should be integrated with education in the future?

J. Data Analysis

Data were analyzed with various parametric (t-tests) and non-parametric (chi-squared) tests. Additionally, Bayes factors (BF10) were used to provide evidentiary weights for the null/alternative hypothesis. Bayes factors provide a useful alternative to parametric statistics for smaller sample sizes because they are relatively immune to sample size for two reasons. First, Bayes factors depend only on the observed data, not on sampling characteristics. Second, Bayes factors are more coherent because inferential statements, based on comparing the null hypothesis and the alternative hypothesis, are mutually consistent as required by probability theory [54; for a more in-depth review, see 55].
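To make the pre-post analyses concrete, here is a minimal sketch of a paired t-test that also reports the Bayes factor, assuming the pingouin library; the Likert ratings below are hypothetical, not the study's data.

```python
# Paired pre/post comparison with a Bayes factor, as in the analyses here.
# Hypothetical 1-7 Likert ratings; not the study's data. Requires pingouin.
import numpy as np
import pingouin as pg

pre  = np.array([5, 4, 5, 6, 4, 5, 5, 4, 6, 5, 4, 5])
post = np.array([5, 5, 6, 6, 5, 5, 6, 5, 6, 5, 5, 6])

# pingouin reports T, dof, p-val, Cohen's d, and BF10 in one table.
res = pg.ttest(post, pre, paired=True)
print(res[["T", "dof", "p-val", "cohen-d", "BF10"]])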
To interpret Bayes factors, any value above one is the relative likelihood that the alternative hypothesis is true. For example, a BF10 of 20 means that, given the data, the alternative hypothesis is 20 times more likely than the null. Conversely, values under one are the evidentiary weight in favor of the null hypothesis; a BF10 of 0.05 means that, given the data, the null hypothesis is 20 times more likely than the alternative. Traditionally, interpretations consider a BF10 of less than 3 to be anecdotal evidence, 3 - 10 to be moderate, 10 - 30 to be strong, 30 - 100 to be very strong, and greater than 100 to be extreme evidence for the alternative hypothesis [56]. The inverses of those values give the interpretation cutoffs for the evidentiary weight of the null hypothesis. Additionally, because Bayes factors do not rely upon arbitrary decision cutoffs, like the p-value thresholds used in null-hypothesis significance testing (NHST), corrections are not needed for multiple tests, as one is interpreting the cumulative evidentiary weight rather than making a binary determination about the existence of an effect [57]. Where appropriate, Cohen's d was used for effect sizes, distinguishing small (d = .2), medium (d = .5), and large (d = .8) effects [58].
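These conventional cutoffs [56] translate directly into a small helper function; the sketch below simply restates the labels above in code.

```python
def interpret_bf10(bf10: float) -> str:
    """Map a Bayes factor BF10 to the evidence labels used above [56]."""
    # Values below 1 favor the null; invert so the same cutoffs apply.
    favors, bf = ("H1", bf10) if bf10 >= 1 else ("H0", 1 / bf10)
    if bf < 3:
        strength = "anecdotal"
    elif bf < 10:
        strength = "moderate"
    elif bf < 30:
        strength = "strong"
    elif bf < 100:
        strength = "very strong"
    else:
        strength = "extreme"
    return f"{strength} evidence for {favors}"

print(interpret_bf10(20))    # strong evidence for H1
print(interpret_bf10(0.05))  # strong evidence for H0 (BF01 = 20)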
The perceived trustworthiness scale (MDMT) gives respondents the option to select "Does Not Fit" for each item. We therefore evaluated whether there was a significant difference in the frequency with which participants selected "Does Not Fit" as a function of dimension and between pre and post. We ran a chi-square test on the five dimensions, excluding the "trustworthy" item. We chose to exclude this item because it was added by us for this study and was not part of the original five dimensions.
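A sketch of this kind of frequency comparison with SciPy is shown below; the "Does Not Fit" counts are invented placeholders, since the observed frequencies are not reported here.

```python
# Chi-square test of "Does Not Fit" counts by MDMT dimension, pre vs. post.
# Counts below are hypothetical placeholders, not the study's frequencies.
from scipy.stats import chi2_contingency

dimensions = ["Reliable", "Capable", "Ethical", "Transparent", "Benevolent"]
counts = [
    [2, 1, 6, 5, 4],  # pre: "Does Not Fit" selections per dimension
    [1, 1, 4, 3, 2],  # post
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")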
Open-ended responses were submitted to a thematic analysis using ChatGPT (version 4.0, May 24, 2023) to code responses. While the use of ChatGPT for thematic analyses is novel, the LLMs that underlie ChatGPT have demonstrated exceptional summarization abilities [59, 60] while avoiding experimenter biases that human evaluators may have in this domain of qualitative work [61]. As such, we leveraged these summarization capabilities, and the authors independently reviewed and verified the veracity of the summary. To verify, the authors read through the open-ended responses and compared them against the themes generated by ChatGPT. The authors looked for any hallucinations or mischaracterizations of what was said, but none were found.
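The authors ran this analysis through the ChatGPT web interface rather than programmatically. For readers who prefer a scriptable equivalent, the sketch below sends the same instruction (quoted in the Results) through the OpenAI Python client; the model name and placeholder responses are assumptions, not the study's procedure.

```python
# API-based equivalent of the authors' web-interface thematic analysis.
# Requires the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

responses = ["<open-ended response 1>", "<open-ended response 2>"]  # placeholders

prompt = (
    "Conduct a thematic analysis on the responses below. For context, "
    "these responses were given to a question that asked about "
    "participants' general experiences using ChatGPT on this assignment.\n\n"
    + "\n\n".join(responses)
)

completion = client.chat.completions.create(
    model="gpt-4",  # stand-in for the ChatGPT version the authors used
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)  # themes, to be human-verified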
III. RESULTS

A. Course Grades

The instructor assessed all 47 student essays with the tailored rubric developed for this assignment. The resulting scores (M = 85.3%, SD = 8.9%, range = 51% to 97.5%) were comparable to scores from previous semesters for similar pre-ChatGPT essay assignments in this course as graded by the same instructor.

B. Perceived Difficulty of Assignment and Quality of Essay Produced

Fig. 1. Students' perceptions of ChatGPT in the assignment and context of learning (difficulty, essay quality, learning value, relative learning value, comfort taking responsibility) prior to completing the assignment (blue) and after (red), relative to a midline (dotted line). Error bars represent +/- 1 SEM.

Figure 1 shows comparisons of pre- and post-survey responses for items rated on a Likert scale relative to a midline. Considering RQ1, participants found the assignment difficult both before and after completing the essay. Prior to completing the assignment, participants' anticipated difficulty of the essay assignment requiring ChatGPT use was significantly lower (M = 4.88, SD = 0.85) than their reported difficulty after completing the assignment (M = 5.25, SD = 0.61), t(23) = 2.10, p = 0.05, BF10 = 1.37, d = 0.43. This was accompanied by a significant decrease from their anticipated grade before the assignment (M = 88.28, SD = 3.68) to their anticipated grade after the assignment (M = 86.33, SD = 5.85), t(22) = 2.42, p = 0.02, BF10 = 2.34, d = 0.50. There was also a significant decrease in the quality participants expected of this paper, relative to their other assignments, from before the assignment (M = 5.48, SD = 0.99) to after the assignment (M = 4.75, SD = 1.42), t(22) = 2.51, p = 0.02, BF10 = 2.77, d = 0.52.

C. Perceived Learning and Responsibility

Considering RQ2, participants' expectation of the learning value of ChatGPT was somewhat high before they completed the essays (M = 5.43, SD = 1.04), and this did not change significantly after they completed their essays (M = 5.57, SD = 1.17), t(22) = 0.59, p = 0.55, BF10 = 0.38, d = 0.23. Compared to other assignments, cadets thought this assignment was more valuable both before completing the assignment (M = 5.46, SD = 1.13) and after (M = 5.57, SD = 1.13). Cadets were not very comfortable taking responsibility for the assignment either before (M = 3.71, SD = 1.58) or after completing their essays (M = 3.82, SD = 1.66). Differences between pre- and post-survey scores were not significant for learning value, relative learning value, and comfort scores.

D. Perceived Trustworthiness of ChatGPT

Fig. 2. Average scores on each dimension of the multidimensional trust scale (Reliable, Competent, Ethical, Transparent, Benevolent: performance and moral trustworthiness), the single "Trustworthy" item, and intent to rely, pre versus post use. Error bars represent +/- 1 SEM.

E. Trust Propensity, Trust in and Reliance on ChatGPT

Further considering RQ3, there was no significant difference in propensity to trust before the interaction (M = 3.46, SD = 0.84) and after completing the assignment (M = 3.73, SD = 0.58), t(21) = 1.53, p = 0.14, BF10 = 0.61, d = 0.33. Furthermore, cadets' intent to rely on ChatGPT prior to completing their essays was somewhat low (M = 4.07, SD = 0.91) and was not significantly different from their intent to rely on ChatGPT after completing their essays (M = 4.23, SD = 1.28), t(22) = 0.753, p = 0.46, BF10 = 0.282, d = 0.16.

F. Trust in Grading

Additionally, participants indicated a significant difference in trust in the grading process between the instructor only, ChatGPT only, and a combination of the instructor and ChatGPT, F(2, 46) = 26.68, p < 0.01, BF10 > 1000, η² = 0.537 (Figure 3).

Fig. 3. Grading preference (frequency) and average trust to grade fairly for the instructor, ChatGPT, and both together.
Post-hoc analyses indicated that participants trusted the instructor (M = 6.29, SD = 1.08) significantly more than ChatGPT alone (M = 4.29, SD = 1.52), p < .01, BF10,U > 1000, d = 1.48, and more than the instructor and ChatGPT together (M = 5.50, SD = 1.22), p < .01, BF10,U = 7.339, d = 0.586; trust also differed between ChatGPT alone and the instructor and ChatGPT together, p < .01, BF10,U > 1000, d = 0.90. Despite this clear difference in trust, 15 of the 24 participants would have preferred the instructor and ChatGPT to grade together (the other 9 preferred the instructor alone, and no one preferred ChatGPT alone). As shown in Figure 3, this represented a significant difference between observed frequencies, χ²(2, N = 24) = 14.20, p < 0.01. There was no significant difference in anticipated grade for the assignment as a function of whom the students preferred to grade the assignment, t(22) = 0.79, p = 0.44, BF10 = 0.48, d = 0.33.
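The preference test above is a goodness-of-fit chi-square and can be reproduced directly from the reported frequencies; a minimal SciPy sketch follows.

```python
# Goodness-of-fit test on grading preferences (instructor & ChatGPT together,
# instructor alone, ChatGPT alone), using the frequencies reported above.
from scipy.stats import chisquare

observed = [15, 9, 0]  # preference counts, N = 24
stat, p = chisquare(observed)  # expected counts default to uniform (8, 8, 8)
# Prints ~14.25, matching the reported 14.20 up to rounding.
print(f"chi2(2, N = 24) = {stat:.2f}, p = {p:.4f}")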
G. Relationships between Trustworthiness, Trust Propensity, and Trust

Across both observations, we regressed intent to rely on average trust propensity and perceived trustworthiness. The model was a significant predictor of intent to rely, F(2, 20) = 34.357, p < .001, R² = 0.77, with both trust propensity (β = 0.342, p = 0.018) and perceived trustworthiness (β = 0.631, p < .001) as significant predictors.
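A minimal sketch of this regression using statsmodels is shown below; the arrays are illustrative stand-ins for the averaged scale scores, and reproducing the paper's standardized betas would require z-scoring the variables first.

```python
# Intent to rely regressed on trust propensity and perceived trustworthiness,
# mirroring the model above. Illustrative data, not the study's scores.
import numpy as np
import statsmodels.api as sm

trust_propensity = np.array([3.2, 3.8, 4.1, 2.9, 3.5, 4.4, 3.0, 3.9])
trustworthiness  = np.array([4.5, 5.2, 5.8, 3.9, 4.8, 6.0, 4.1, 5.5])
intent_to_rely   = np.array([3.9, 4.6, 5.3, 3.2, 4.4, 5.9, 3.5, 5.0])

X = sm.add_constant(np.column_stack([trust_propensity, trustworthiness]))
model = sm.OLS(intent_to_rely, X).fit()
print(model.summary())  # coefficients, F-statistic, and R^2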
H. Qualitative Analysis for Open Response Questions

We submitted cadet responses to the two open-ended questions to a thematic analysis using ChatGPT. ChatGPT was given the following instructions: "Conduct a thematic analysis on the responses below. For context, these responses were given to a question that asked about participants' general experiences using ChatGPT on this assignment." Table II was written by ChatGPT, with minor edits made by the human authors.

TABLE II
THEMES AND FREQUENCY COUNTS BY QUESTION

Question 1: "Please provide any overall comments on your experience using ChatGPT on this assignment."

- Change in Perspectives and Positive Feedback (count: 7). The assignment led to a shift in perspective regarding AI, understanding its functions, and its potential impact on the future. Participants appreciated the assignment, found it beneficial, and recognized the potential of AI tools like ChatGPT.
- Poor Performance & Usability (count: 6). ChatGPT performed poorly on the assignment and was often very repetitive. It was difficult to use and did not do a good job. Working with ChatGPT posed challenges, including figuring out how to ask proper questions, wrestling with AI responses, and difficulty with citations.
- Learning Experience (count: 4). Participants found the assignment to be a valuable learning experience. They learned about the subject and ChatGPT's capabilities, how to use it effectively, and its limitations.
- Need for Manual Writing (count: 3). Some participants noticed that ChatGPT's first drafts and responses were not up to the desired standard, requiring extensive additional work, analysis, and editing.
- Skepticism and Limited Trust (count: 3). Some participants remained skeptical about ChatGPT's capabilities and were reluctant to fully rely on its responses.

Question 2: "How do you think ChatGPT should be integrated with education in the future?"

- Assistance and Learning Tool, Not Replacement (count: 9). Participants suggest using ChatGPT as an aid or learning tool to support studying, gather information, come up with ideas, and enhance the learning process. Participants liken ChatGPT's role in education to graphing calculators, dictionaries, or internet resources, being used as a tool or reference.
- Collaborative & Ethical Use (count: 9). ChatGPT should be integrated into education in a collaborative manner, where students and AI work together. It can be used as a teammate, tutor, or resource to support learning; it is not simply a cheating tool.
- Appropriate Reliance (count: 5). ChatGPT should not be relied upon entirely. It should be used as a reference, tool, or method to complement human thought, but not to generate whole papers. There are limitations, and assignments like this help identify the areas where ChatGPT falls short.
- Error Checking & Auto-Grading (count: 3). A few students identified the potential for errors in code and auto-grading, though they were explicit in not trusting ChatGPT to grade without human oversight.
- Redefining Education (count: 3). The technology could redefine education, and its full impact is uncertain, though it should be implemented with caution.
As shown in Table II, cadets' responses to the assignment demonstrated a range of perspectives. On the one hand, the assignment was met with enthusiasm, as evidenced by one cadet who described it as the "coolest assignment" they had encountered, expressing their belief in the transformative potential of AI tools like ChatGPT for the future:

"Coolest assignment I've done to date. I think tools like ChatGPT will change our future and assignments like these are paramount to understanding the direction we want to take them."

On the other hand, it was apparent that the cadets' views on the assignment's learning value were more varied, and some expressed the shift in the skills being assessed:

"I thought it was more of an assessment of our editing skills then our opinions on the topic."

While the cadets acknowledged the exposure to cutting-edge technology, concerns emerged regarding the assignment's applicability to assessments and its potential for grading. One cadet noted that while they found the experience valuable, they were not comfortable with ChatGPT being involved in the grading process:

"I thought it was a great exposure to the technology that we are going to be seeing so much more of in the future! I recommend keeping this assignment, but I don't recommend having ChatGPT grade it."

This apprehension seemed to indicate a lack of trust in the AI's ability to provide reliable and accurate assessments. Students' open-ended responses also uncovered cadets' overall trust in and comfort level when working with ChatGPT. Several participants expressed feelings of discomfort, likely stemming from the assignment's departure from their norms. Wrestling with ChatGPT's responses and seeking ways to integrate them effectively highlighted the cadets' commitment to taking responsibility for their work:

"I think it was an interesting assignment, although I felt a little uncomfortable doing it just because it was outside of my norm. I found myself wrestling with ChatGPT on some parts but I feel like there were times it provided some decent feedback that I could improve on. In my mind it is still nothing more than a tool and I find it difficult to rely wholly upon it."

IV. DISCUSSION

In this study, we explored student perceptions of ChatGPT in the context of essay writing within an engineering course, focusing on three RQs tied to: 1) ChatGPT use in an assignment, 2) its capabilities to support learning, and 3) student trust and comfort in relying on ChatGPT for future assignments and grading. Our results showed that ChatGPT did not make the writing assignment easy but changed it in ways that yielded perceived learning benefits for students. The thematic analysis revealed a shift in student perception, evolving from viewing ChatGPT as a potential "cheating tool" to recognizing it as a collaborative resource requiring human oversight, technical aptitude, subject-area proficiency, and calibrated trust. After use, students rated ChatGPT as a valuable tool for learning and as more ethical and benevolent relative to their perceptions before use. Students' low comfort in taking responsibility for the assignment could be attributed to ethical concerns, given the high percentage of students who believe using ChatGPT is akin to cheating [12]. Additionally, the lack of full confidence in the accuracy and reliability of ChatGPT's output likely contributed to students' discomfort after the assignment. These findings highlight the complex dynamics associated with integrating AI tools like ChatGPT into higher education.

Students did not want to be evaluated on this assignment by ChatGPT alone, instead preferring to be graded by ChatGPT and the instructor together or by the instructor alone. Overall, our results reveal that technologies like ChatGPT do not eliminate the need for student and instructor engagement, but rather complement it, requiring judicious trust and a blend of human skill and AI capabilities. In educational contexts, this integration of AI tools with student participation seemed to foster an effective learning experience according to students, yet it also revealed areas where ChatGPT and its integration could be improved.

A. Implications for Student Learning

STEM and non-STEM educators should be encouraged to integrate AI technologies in deliberate ways to promote student learning, with some caution. An assignment requiring the use of ChatGPT to produce better papers was widely accepted by students as valuable. Some students were enthusiastic, describing it as the "coolest assignment" they had encountered and emphasizing the transformative potential of AI tools like ChatGPT for the future. The assignment was also viewed by students as difficult, and more difficult after they used ChatGPT to complete it. While some of this was due to usability and related concerns with ChatGPT, the assignment also required "more" from our students, both in terms of the number of drafts to turn in (i.e., three versions of the paper) and the overall quality expected. ChatGPT helped them reach higher levels, as students' self-reported assessment of their paper quality was high. Despite this result, grades for this assignment were comparable to grades for similar essay assignments that did not mandate use of ChatGPT. It is possible that the beneficial or detrimental effects of ChatGPT are more subtle, such as the finding that the tool is particularly beneficial for weaker performers [87].

While students recognized the learning value and improved essay quality facilitated by ChatGPT's feedback, they also grappled with calibration issues between their initial expectations and the actual outcomes. This could be a double-edged sword for educators. On one hand, engaged learning is a
cornerstone of successful knowledge and skill acquisition [62, 63, 64, 65], and reviewing, critiquing, and editing papers is effective for engaged learning [66]. The more critical students are in their reviews of others' papers, the better they do on their own work, and this leads to better learning of writing skills and more knowledge of the subject material itself [66]. One reason is that their reading is not passive but active and critical, identifying aspects of writing to keep as well as elements to avoid (e.g., poor writing structure, mistakes). Like assignments requiring students to critique their peers, the essay assignment used in this study required participants to critique ChatGPT output. In this way, active reviewing, editing, and writing are integrated so that each reinforces the others, which is invaluable to student learning [67]. Based on our results, students recognized the learning benefits of integrating AI tools like ChatGPT in educational settings.

On the other hand, some of the student engagement with ChatGPT was frustrating and did not seem beneficial for learning. Recall the theme ChatGPT produced in its thematic analysis of students' open-ended responses:

"ChatGPT performed poorly on the assignment and was often very repetitive. It was difficult to use and did not do a good job."

Participants mentioned challenges in creating prompts, managing word counts, and dealing with repetitive language. These findings indicate the need for clear practical guidelines and instructions, which instructors should provide, to help students optimize their interactions with AI tools and streamline the assignment process. Guidance and training on creating suitable prompts, for example, can help students optimize their use of AI tools and reduce potential frustrations.

B. Design Recommendations for LLMs in Education

The findings also point to the further development and customization of LLMs for educational contexts. AI developers should continue refining and improving language models like ChatGPT to enhance their effectiveness and reliability. Common issues like repetitive language generation and inadequate essay production should be addressed to ensure a more seamless and valuable user experience. Furthermore, metadata could be provided about the tool's confidence in whether it is producing an accurate output for a specific prompt, for example by providing a hallucination score. Such confidence indicators have been shown to increase trust calibration [68, 69] and help to increase the transparency of the tool [70]. ChatGPT was not designed specifically for student learning. As LLM capabilities are enhanced and more customized AI tools are developed for education, they will likely better support tailored student engagement.

C. Improving the Writing Process with ChatGPT

Instructors should further experiment with integrating ChatGPT intentionally [71] versus open-ended use, by considering the iterative and dynamic nature of the writing process as well as the specific functions that need to be performed to accomplish a writing assignment (see Figure 4). We deliberately designed the task to trade off text generation by ChatGPT, editing by the human, and then combining the product. By design, combined human-AI teaming was necessary for successful completion of the task [72], as is mostly the case with human use of automated tools [73] and human-autonomy teaming [74]. This approach is often advocated when automation is not perfect and interdependence is required to create good team performance [68]. Guiding students to collaborate with the AI, instead of preventing its use or allowing supervised use without specific constraints, could be beneficial to discover the best way to integrate ChatGPT into course assignments.

It is not yet clear where ChatGPT can be most effective in the writing process when working collaboratively with people, as opposed to strictly working by itself, but our data suggest some possible directions. Students indicated it could be useful at early stages for idea generation, providing topic information, producing text from rough ideas, and reviewing the text. Other potential uses could be in specific roles at the process level (proposing, evaluating, transcribing) or in providing high-level writing schemas and monitoring them at the control level (Figure 4 [75]). Students can further be encouraged to use AI-generated content as a starting point and then iteratively refine and enhance their essays. Others have suggested using ChatGPT to prepare outlines, revise content, proofread the paper, or reflect on the writing [71]. This approach can foster a deeper understanding of the writing process and enable students to develop their skills through active engagement with AI technology. Traditional schools and platforms like Khan Academy and Udacity are increasingly exploring the integration of AI tools like ChatGPT to enhance personalized learning experiences and to supplement their existing course materials.

Fig. 4. Adapted from Hayes' (2012) model of writing [75]. ChatGPT as a 'Collaborator' helps 'propose' by generating ideas, 'translate' by converting concepts to text, 'evaluate' by providing feedback, and 'transcribe' by drafting content. This new process likely impacts traditional human writing schemas.
As discussed above, even simple integration of AI tools to encourage exchanges with ChatGPT likely yielded valuable repetition to practice the skill of writing in an iterative way. One study examined the performance of over 4,000 students using Project Essay Grading (PEG) automated grading for writing and revising essays with feedback [76]. Students who revised their papers achieved small score increases with each draft, though the rate of growth diminished over time, reaching a plateau around the 11th or 12th revision. LLMs offer the potential to provide more precise and effective feedback, perhaps requiring fewer revisions; however, more research is needed to determine the differences between other ways of learning to write versus a more cyclic ChatGPT-supported approach. The taxonomy of levels and degrees of automation could be helpful as an initial guide [77, 78, 79] to distinguish between the use of AI for initial ideas and inspiration (low writing automation), the use of AI for early drafts, feedback, and edits (medium writing automation), and purely AI-generated work with little human input (high writing automation).

D. Implications for Grading

Most students preferred the instructor to use ChatGPT to support grading; fewer students preferred the teacher alone or the LLM alone. Before ChatGPT, AWE technologies were developed to help teachers save time in assessing writing, encourage more writing practice, and complement writing instruction in the classroom. Similar to AWEs, ChatGPT can assess essays in seconds and enable teachers to assign more writing tasks without an overwhelming increase in workload. Still, students expressed the importance of human oversight: one of the strongest effects in this study was that no student wanted to be graded by ChatGPT alone. This result appears to be consistent with a preference to receive writing feedback from teachers or peers rather than computers [30, 31, 32]. The strong reluctance to rely on ChatGPT alone for grading suggests the need for further investigation into the factors influencing students' trust and confidence in AI-generated outputs, particularly when they bear personal responsibility for the final work. This also raises questions about the role of AI in the evaluation of student work. It seemed that, when the stakes were high (e.g., assigning official grades), human instructor involvement was perceived as critical. Despite this preference, and the high trust in the instructor alone, most students preferred to be graded by the instructor and ChatGPT together. It is possible that students thought some involvement of ChatGPT in grading, perhaps as an assistive tool, would be beneficial rather than the instructor grading alone. This result speaks to a potential required shift in work for both teachers and students in academic settings.

E. Implications for Trust in Artificial Intelligence

Our study is among the first to rigorously explore trust in AI, using ChatGPT, within the education domain, extending existing trust models, originally developed for human-automation interaction, to a novel setting [50, 80, 43]. Our regression model showed strong support for existing trust models and previous research, validating our theoretical and measurement framework and investigations of trust with ChatGPT specifically [48, 43, 81, 86]. Moreover, our study extends our understanding of trust in AI by highlighting the pivotal role of moral trust, particularly in the realms of ethical and benevolent trustworthiness, in shaping overall trust in and reliance on AI systems like ChatGPT. The increase in these dimensions after using ChatGPT (with medium to large effect sizes) was novel. It is possible that students perceived ChatGPT to be more ethically trustworthy because they received content violation messages while using the program, indicating that ChatGPT tries to adhere to OpenAI's content policy. Such behavior may have increased perceptions of the moral competence of ChatGPT, which can increase trustworthiness [42]. Ethicality may have further increased because students perceived ChatGPT as providing information without personal beliefs or opinions. Benevolence trustworthiness may have increased because students observed ChatGPT's behavior to be very helpful and responsive to feedback from the user. While ChatGPT was considered trustworthy overall, intent to rely on ChatGPT was lower, which is consistent with findings from previous work [42, 85]. This result suggests that students understood ChatGPT's capabilities but also realized that they could not fully depend on it alone to complete the assignment; a conclusion consistent with decades of human factors automation research demonstrating that automation does not replace the human but changes the way we work with technology [15, 77].

F. Limitations

There were a few limitations to this study. Most notably, the sample size was relatively small, taking advantage of a window in which ChatGPT had not yet been widely used by students and certainly had not been incorporated into curricula. Low sample sizes are not uncommon in early studies of technology integration, including for ChatGPT [8], smartphone use [82], and robots [83], presenting a trade-off between the impact of studying novel technology use and the generalizability of results. However, we do believe our sample was representative of senior undergraduate engineering students as well as novice scientific writers. The use of Bayes factors uniquely accommodates small samples, and the moderate-to-large effect sizes add confidence to our results. However, our sample size may have precluded us from identifying smaller effects. Future studies should assess a larger number of students. Additionally, long-term investigations that track students' experiences and perceptions of AI-powered tools could provide valuable insights into the evolving dynamics between students and AI technology and identify areas for continuous improvement.
Second, the assignment was highly structured and specified how students had to work with ChatGPT in three iterative steps to show the contributions of the AI versus the student. However, this may have constrained the use of ChatGPT in other, more creative ways or in ways that suited the student better. Another potential approach would have been to have students use the tool any way they liked but acknowledge in the final product where ChatGPT had assisted, as we have done in this paper for summarization of qualitative results and in another paper for figure generation [42]. Students could either summarize ChatGPT's assistance in the acknowledgment section or provide brief annotations in the paper in the places where ChatGPT specifically assisted. There are many ways to incorporate ChatGPT in an assignment, and our approach represented one of them. As such, our results could reflect our design of the assignment and not ChatGPT or other LLM capabilities more broadly. We hope this report encourages novel uses of LLM technologies in course didactics and future studies of their effectiveness.

Finally, the participants in this study were senior-level undergraduate human systems engineering students. Their education in human factors processes, including knowledge elicitation through survey-based user feedback methods (the approach used in this study), could have influenced their responses.

G. Contributions

These limitations notwithstanding, our study provides empirically grounded insights that are important for understanding how students perceive and interact with AI in educational settings. Our study contributes one of the first mixed-methods studies exploring ChatGPT's application in college classes, informing research and design vis-à-vis intelligent technology use in naturalistic contexts. To this end, we used a unique approach to formally integrate ChatGPT as part of a writing curriculum in the semester following the worldwide release of ChatGPT. Our study provided a common AI interaction for engineering students, before LLMs became ubiquitous, allowing a focused analysis of how such exposure specifically influences their perceptions of and attitudes toward AI in essay writing tasks. This included writing a full technical paper, not just a few paragraphs, and had real consequences for students, who received an actual grade for their assignment. ChatGPT was evaluated as a writing assistant as well as a grading assistant, with implications for theory, design, and practice.

Beyond pre- and post-measures of student perceptions of ChatGPT use for learning, grading, and their comfort in taking responsibility for the essay produced with its help, this report provides the first comprehensive trust assessment for ChatGPT, which has strong validity due to the real vulnerability and consequences associated with this assignment, a requirement for accurate trust assessment [43, 48, 80]. Theoretically, this study extends established trust models to a new domain, education, to understand how students build trust in AI technologies like ChatGPT in a learning environment. Practically, this study demonstrates the feasibility of integrating AI-based technologies into the classroom and provides usability insights and recommendations based on user (i.e., student) feedback. By integrating approaches from both human factors engineering and educational research, our study provides not only practical insights for the design of AI-enhanced educational tools but also a multidisciplinary examination of the complex dynamics of trust in AI systems.

V. CONCLUSION

Our research indicates that while AI tools like ChatGPT have promising applications in higher education, they also pose challenges. These tools should be viewed as helpful assistants to enhance writing, learning, and grading, not as replacements for student effort or teaching oversight in grading. Effective and ethical use of AI in education requires acknowledging its limitations, fostering AI literacy, and developing proper assessment methods. Institutions must also train students and educators to use AI responsibly and creatively. Future studies should focus on improving understanding of AI, guiding its use, and addressing issues with AI-generated content.

APPENDIX

Table III describes all the measures that were included in the surveys created for this study.

TABLE III
CUSTOM ITEM SURVEY MEASURES

Assignment
1. How would you rate the difficulty of this assignment on a scale of 1 (Very easy) to 7 (Very difficult)?
2. How would you rate the difficulty of this assignment relative to writing the paper by yourself (with no help from ChatGPT) on a scale of 1 (Very easy) to 7 (Very difficult)?
3. Estimate what grade percentage (1 - 100%) you believe you will obtain for this assignment.

Learning
1. Rate how valuable of a learning experience you think this will be / was on a scale of 1 (Not very valuable) to 7 (Very valuable).
2. Rate how valuable of a learning experience you think this will be / was relative to an essay you have already written on a scale of 1 (Not very valuable) to 7 (Very valuable).
3. Relative to an essay you have already written, rate what you think the quality of this paper will be / was on a scale of 1 (Lower) to 7 (Higher).
4. What is / was your comfort level being responsible for all content created by ChatGPT on a scale of 1 (Very uncomfortable) to 7 (Very comfortable)?
5. Generally, how did you feel about this assignment (open response)?
6. How well has this technology (e.g., ChatGPT) been integrated into college education (open response)?
7. How should this technology (e.g., ChatGPT) be integrated into college education (open response)?

Grading
1. I trust that the instructor will grade fairly on his own, from 0 (Strongly Disagree) to 7 (Strongly Agree).
2. I trust that ChatGPT will grade fairly on its own, from 0 (Strongly Disagree) to 7 (Strongly Agree).
3. I trust that the instructor and ChatGPT will grade fairly together, from 0 (Strongly Disagree) to 7 (Strongly Agree).
4. Of the three options (the instructor on his own, ChatGPT on its own, or the instructor and ChatGPT together), my preference is for _____ to grade my paper, from 0 (Strongly Disagree) to 7 (Strongly Agree).
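For readers who wish to analyze responses to these items, the sketch below shows one way such 1-7 Likert ratings could be encoded and a pre- versus post-assignment change summarized. It is illustrative only and not the authors' analysis code: the response values are hypothetical, and the paired-difference summary (mean change with Cohen's d for paired data) is a frequentist stand-in for the Bayesian analyses the study itself reports [54-57].

# Minimal illustrative sketch (hypothetical data, not the study's):
# encode pre/post 1-7 Likert responses to one Table III item and
# summarize the paired pre-to-post change.
from statistics import mean
from math import sqrt

# Hypothetical ratings from the same ten students before and after the
# essay assignment (e.g., Learning item 1).
pre = [4, 5, 3, 6, 4, 5, 4, 3, 5, 4]
post = [5, 6, 4, 6, 5, 6, 5, 4, 6, 5]

def paired_summary(pre, post):
    """Return mean pre-to-post change and Cohen's d for paired data."""
    diffs = [b - a for a, b in zip(pre, post)]
    d_mean = mean(diffs)
    # Sample standard deviation of the paired differences (n - 1).
    sd = sqrt(sum((d - d_mean) ** 2 for d in diffs) / (len(diffs) - 1))
    return d_mean, (d_mean / sd if sd else float("nan"))

change, effect = paired_summary(pre, post)
print(f"Pre M = {mean(pre):.2f}, Post M = {mean(post):.2f}, "
      f"mean change = {change:.2f}, paired Cohen's d = {effect:.2f}")

Conventions for interpreting effect sizes of this kind are given by Cohen [58].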
ACKNOWLEDGMENTS

This material is based upon work supported by the Air Force Office of Scientific Research under award number 21USCOR004 and DARPA under award number FA8650-23-C-7318. The views expressed in this paper are those of the authors and do not reflect those of the U.S. Air Force, DARPA, the Department of Defense, or the U.S. Government. Please contact the authors for access to the assignment and/or rubric used in this study.

REFERENCES

[1] OpenAI releases ChatGPT: An advanced AI language model for text-based interactions. [Online]. Available: https://openai.com/chatgpt
[2] N. Manohar and S. S. Prasad, "Use of ChatGPT in academic publishing: A rare case of seronegative systemic lupus erythematosus in a patient with HIV infection," Cureus, vol. 15, no. 2, Feb. 2023, doi: 10.7759/cureus.34616.
[3] D. Baidoo-Anu and L. Owusu Ansah, "Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning," SSRN, Jan. 2023, doi: 10.2139/ssrn.4337484. [Online]. Available: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4337484
[4] W. Heaven, "ChatGPT is going to change education, not destroy it," MIT Technology Review: Artificial Intelligence, Apr. 2023. [Online]. Available: https://www.technologyreview.com/2023/04/06/1071059/chatgpt-change-not-destroy-education-openai/
[5] M. Sullivan, A. Kelly, and P. McLaughlan, "ChatGPT in higher education: Considerations for academic integrity and student learning," Journal of Applied Learning & Teaching, vol. 6, no. 1, Mar. 2023, doi: 10.37074/jalt.2023.6.1.17. [Online]. Available: https://www.researchgate.net/publication/369378950_ChatGPT_in_higher_education_Considerations_for_academic_integrity_and_student_learning
[6] S. Marche, "The College Essay Is Dead," The Atlantic, Dec. 2022. [Online]. Available: https://www.theatlantic.com/technology/archive/2022/12/chatgpt-ai-writing-college-student-essays/672371/
[7] L. Lonas, "Professor Attempts to Fail Students After Falsely Accusing Them of Using ChatGPT to Cheat," The Hill, May 2023. [Online]. Available: https://thehill.com/homenews/education/4010647-professor-attempts-to-fail-students-after-falsely-accusing-them-of-using-chatgpt-to-cheat/
[8] M. Firat, "How Chat GPT Can Transform Autodidactic Experiences and Open Education?" Jan. 2023, doi: 10.31219/osf.io/9ge8m. [Online]. Available: https://www.researchgate.net/publication/367613715_How_Chat_GPT_Can_Transform_Autodidactic_Experiences_and_Open_Education
[9] M. Firat, "What ChatGPT means for universities: Perceptions of scholars and students," Journal of Applied Learning & Teaching, vol. 6, no. 1, Apr. 2023, doi: 10.37074/jalt.2023.6.1.22. [Online]. Available: https://www.researchgate.net/publication/370107010_What_ChatGPT_means_for_universities_Perceptions_of_scholars_and_students
[10] M. Rahman and Y. Watanobe, "ChatGPT for Education and Research: Opportunities, Threats, and Strategies," Applied Sciences, vol. 13, no. 9, Art. no. 5783, May 2023, doi: 10.3390/app13095783. [Online]. Available: https://www.mdpi.com/2076-3417/13/9/5783
[11] S. Stacey, "Cheating on your college essay with ChatGPT won't get you good grades, say professors – but AI could make education fairer," MSN, Dec. 2022. [Online]. Available: https://www.msn.com/en-us/news/technology/cheating-on-your-college-essay-with-chatgpt-won-t-get-you-good-grades-say-professors-but-ai-could-make-education-fairer/ar-AA15pCJ8?li=BBnbcA1
[12] M. Nietzel, "More Than Half of College Students Believe Using ChatGPT to Complete Assignments Is Cheating," Forbes, Mar. 2023. [Online]. Available: https://www.forbes.com/sites/michaeltnietzel/2023/03/20/more-than-half-of-college-students-believe-using-chatgpt-to-complete-assignments-is-cheating/?sh=5d4d763c18f9
[13] R. Ventayen, "ChatGPT by OpenAI: Student's Viewpoint on Cheating using Artificial Intelligence-Based Application," SSRN, Feb. 2023, doi: 10.2139/ssrn.4361548. [Online]. Available: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4361548
[14] A. Shoufan, "Exploring Students' Perceptions of ChatGPT: Thematic Analysis and Follow-Up Survey," IEEE Access, vol. 11, Apr. 2023, doi: 10.1109/ACCESS.2023.3268224.
[15] R. Parasuraman and V. Riley, "Humans and Automation: Use, Misuse, Disuse, Abuse," Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 39, no. 2, Jun. 1997, doi: 10.1518/001872097778543886. [Online]. Available: http://web.mit.edu/16.459/www/parasuraman.pdf
[16] K. Muenks, E. Canning, J. LaCosse, and D. Green, "Does my professor think my ability can change? Students' perceptions of their STEM professors' mindset beliefs predict their psychological vulnerability, engagement, and performance in class," Journal of Experimental Psychology: General, vol. 149, no. 11, May 2020, doi: 10.1037/xge0000763.
[17] J. Van Brummelen, V. Tabunshchyk, and T. Heng, "'Alexa, Can I Program You?': Student Perceptions of Conversational AI Before and After Programming Alexa," in Proc. Interaction Design and Children (IDC), Jun. 2021, pp. 305-313, doi: 10.48550/arXiv.2102.01367.
[18] C. Zhang, C. Zhang, C. Li, Y. Qiao, S. Zheng, S. K. Dam, and C. S. Hong, "One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era," preprint, Mar. 2023, doi: 10.13140/RG.2.2.24789.70883. [Online]. Available: https://www.researchgate.net/publication/369618942_One_Small_Step_for_Generative_AI_One_Giant_Leap_for_AGI_A_Complete_Survey_on_ChatGPT_in_AIGC_Era
[19] O. Topsakal and E. Topsakal, "Framework for a Foreign Language Teaching Software for Children Utilizing AR, Voicebots and ChatGPT (Large Language Models)," The Journal of Cognitive Systems, vol. 7, no. 2, pp. 33-38, 2022, doi: 10.52876/jcs.1227392.
[20] W. Yeadon, O. O. Inyang, A. Mizouri, A. Peach, and C. P. Testrow, "The death of the short-form physics essay in the coming AI revolution," Physics Education, vol. 58, no. 3, Dec. 2022, doi: 10.48550/arXiv.2212.11661.
[21] S. Frieder, L. Pinchetti, R. R. Griffiths, T. Salvatori, T. Lukasiewicz, P. C. Petersen, ... and J. Berner, "Mathematical capabilities of ChatGPT," arXiv preprint, Jan. 2023, doi: 10.48550/arXiv.2301.13867.
[22] J. Qadir, "Engineering education in the era of ChatGPT: Promise and pitfalls of generative AI for education," in 2023 IEEE Global Engineering Education Conference (EDUCON), May 2023, pp. 1-9.
[23] B. Warner, "ChatGPT is and is not like calculators," Inside Higher Education. [Online]. Available: https://www.insidehighered.com/blogs/just-visiting/chatgpt-both-and-not-calculator (accessed Jul. 26, 2023).
[24] Y. Dwivedi, N. Kshetri, L. Hughes, E. Slade, A. Jeyaraj, A. Kar, A. Baabdullah, et al., "Opinion Paper: 'So what if ChatGPT wrote it?' Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy," International Journal of Information Management, vol. 71, Aug. 2023, doi: 10.1016/j.ijinfomgt.2023.102642. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0268401223000233
[25] J. Rudolph, S. Tan, and S. Tan, "ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?" Journal of Applied Learning and Teaching, vol. 6, no. 1, Jan. 2023, doi: 10.37074/jalt.2023.6.1.9. [Online]. Available: https://journals.sfu.ca/jalt/index.php/jalt/article/view/689
[26] E. Glikson and A. W. Woolley, "Human trust in artificial intelligence: Review of empirical research," Academy of Management Annals, vol. 14, no. 2, pp. 627-660, Aug. 2020, doi: 10.5465/annals.2018.0057.
[27] A. D. Kaplan, T. T. Kessler, J. C. Brill, and P. A. Hancock, "Trust in artificial intelligence: Meta-analytic findings," Human Factors, vol. 65, no. 2, pp. 337-359, May 2021, doi: 10.1177/00187208211013988.
[28] D. Grimes and M. Warschauer, "Utility in a Fallible Tool: A Multi-Site Case Study of Automated Writing Evaluation," The Journal of Technology, Learning and Assessment, vol. 8, no. 6, Mar. 2010. [Online]. Available: https://ejournals.bc.edu/index.php/jtla/article/view/1625
[29] M. Warschauer and D. Grimes, "Automated Writing Assessment in the Classroom," Pedagogies: An International Journal, vol. 3, no. 1, Jan. 2008, doi: 10.1080/15544800701771580. [Online]. Available: https://education.uci.edu/uploads/7/2/7/6/72769947/awe-pedagogies.pdf
[30] P. Haber-Curran and D. Tillapaugh, "Leadership Learning through Student-Centered and Inquiry-Focused Approaches to Teaching Adaptive Leadership," Journal of Leadership Education, vol. 12, no. 1, pp. 92-116, 2013, doi: 10.12806/V12/I1/R6.
[31] Y. Lai, "Which Do Students Prefer to Evaluate Their Essays: Peers or Computer Program," British Journal of Educational Technology, vol. 41, Apr. 2010, doi: 10.1111/j.1467-8535.2009.00959.x. [Online]. Available: https://bera-journals.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8535.2009.00959.x
[32] A. Lipnevich and J. Smith, "Effects of differential feedback on students' examination performance," Journal of Experimental Psychology: Applied, vol. 15, no. 4, 2009, doi: 10.1037/a0017841.
[33] B. M. Olds, B. M. Moskal, and R. L. Miller, "Assessment in engineering education: Evolution, approaches and future collaborations," Journal of Engineering Education, vol. 94, pp. 13-25, Jan. 2005, doi: 10.1002/j.2168-9830.2005.tb00826.x.
[34] C. Teddlie and A. Tashakkori, "Major issues and controversies in the use of mixed methods in the social and behavioural sciences," in Handbook of Mixed Methods in Social and Behavioural Research, Sage, 2003, pp. 3-50.
[35] K. Gogol, M. Brunner, T. Goetz, R. Martin, S. Ugen, U. Keller, A. Fischbach, and F. Preckel, "'My Questionnaire is Too Long!' The assessments of motivational-affective constructs with three-item and single-item measures," Contemporary Educational Psychology, vol. 39, no. 3, pp. 188-205, Jul. 2014, doi: 10.1016/j.cedpsych.2014.04.002.
[36] P. Lukowicz, A. Choynowska, A. M. Swiatkowska, and P. Bereznowski, "Validity of Single Item Self-Report Measure of Learning Engagement," in Badania i Rozwój Młodych Naukowców w Polsce – Nauki humanistyczne i społeczne, Poznan, Poland, 2006, pp. 41-49.
[37] A. Bartoszko, S. Czerwinski, and A. Bereznowska, "Single-item self-report measure of learning engagement: What does it measure?" presented at the Interdisciplinary Scientific International Conference for PhD Students and Assistants, Jun. 2019. [Online]. Available: https://www.researchgate.net/publication/334974047_Single-item_self-report_measure_of_learning_engagement_What_does_it_measure
[38] B. Barker and D. Brooks, "An evaluation of short-term distributed online learning events," International Journal on E-Learning, vol. 4, no. 2, Jun. 2005.
[39] R. A. Wisher and C. K. Curnow, "Techniques for evaluating distance learning events," U.S. Army Research Institute, Alexandria, VA, Rep. IR019028, 1998.
[40] D. Ullman and B. F. Malle, "Measuring gains and losses in human-robot trust: Evidence for differentiable components of trust," presented at the 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Mar. 2019, pp. 618-619.
[41] B. F. Malle and D. Ullman, "A multidimensional conception and measure of human-robot trust," in Trust in Human-Robot Interaction, Academic Press, 2021, pp. 3-25.
[42] A. Momen, E. de Visser, K. Wolsten, K. Cooley, J. Walliser, and C. C. Tossell, "Trusting the moral judgments of a robot: Perceived moral competence and humanlikeness of a GPT-3 enabled AI," presented at the 56th Hawaii International Conference on System Sciences, Jan. 2023, pp. 501-510.
[43] S. C. Kohn, E. J. de Visser, E. Wiese, Y. C. Lee, and T. H. Shaw, "Measurement of trust in automation: A narrative review and reference guide," Frontiers in Psychology, vol. 12, Oct. 2021, doi: 10.3389/fpsyg.2021.604977.
[44] S. S. Monfort, J. J. Graybeal, A. E. Harwood, P. E. McKnight, and T. H. Shaw, "A single-item assessment for remaining mental resources: Development and validation of the Gas Tank Questionnaire (GTQ)," Theoretical Issues in Ergonomics Science, vol. 19, no. 5, Oct. 2017, doi: 10.1080/1463922X.2017.1397228.
[45] J. B. Lyons and S. Y. Guznov, "Individual differences in human–machine trust: A multi-study look at the perfect automation schema," Theoretical Issues in Ergonomics Science, vol. 20, no. 4, pp. 440-458, Nov. 2018, doi: 10.1080/1463922X.2018.1491071.
[46] M. T. Dzindolet, S. A. Peterson, R. A. Pomranky, L. G. Pierce, and H. P. Beck, "The role of trust in automation reliance," International Journal of Human-Computer Studies, vol. 58, no. 6, pp. 697-718, Jun. 2003, doi: 10.1016/S1071-5819(03)00038-7.
[47] N. L. Tenhundfeld, E. J. de Visser, A. J. Ries, V. S. Finomore, and C. C. Tossell, "Trust and distrust of automated parking in a Tesla Model X," Human Factors, vol. 62, no. 2, pp. 194-210, Aug. 2019, doi: 10.1177/0018720819865412.
[48] R. C. Mayer, J. H. Davis, and F. D. Schoorman, "An Integrative Model of Organizational Trust," The Academy of Management Review, vol. 20, no. 3, pp. 709-734, Jul. 1995, doi: 10.2307/258792.
[49] S. M. Merritt and D. R. Ilgen, "Not All Trust Is Created Equal: Dispositional and History-Based Trust in Human-Automation Interactions," Human Factors, vol. 50, no. 2, pp. 194-210, Apr. 2008, doi: 10.1518/001872008X288574.
[50] K. A. Hoff and M. Bashir, "Trust in Automation: Integrating Empirical Evidence on Factors That Influence Trust," Human Factors, vol. 57, no. 3, Sep. 2014, doi: 10.1177/0018720814547570.
[51] I. L. Singh, R. Molloy, and R. Parasuraman, "Automation-induced 'complacency': Development of the Complacency-Potential Rating Scale," The International Journal of Aviation Psychology, vol. 3, no. 2, pp. 111-122, 1993, doi: 10.1207/s15327108ijap0302_2.
[52] S. M. Merritt, A. Ako-Brew, W. J. Bryant, A. Staley, M. McKenna, A. Leone, and L. Shirase, "Automation-Induced Complacency Potential: Development and Validation of a New Scale," Frontiers in Psychology, vol. 10, no. 225, Feb. 2019, doi: 10.3389/fpsyg.2019.00225.
[53] S. M. Merritt, H. Heimbaugh, J. LaChapell, and D. Lee, "I Trust It, but I Don't Know Why: Effects of Implicit Attitudes Toward Automation on Trust in an Automated System," Human Factors, vol. 55, no. 3, Nov. 2012, doi: 10.1177/0018720812465081.
[54] D. Schmid and N. A. Stanton, "Exploring Bayesian analyses of a small-sample-size factorial design in human systems integration: The effects of pilot incapacitation," Human-Intelligent Systems Integration, vol. 1, pp. 71-88, Oct. 2020, doi: 10.1007/s42454-020-00012-0.
[55] E. J. Wagenmakers, M. Marsman, T. Jamil, A. Ly, J. Verhagen, J. Love, R. Selker, Q. F. Gronau, M. Smira, S. Epskamp, D. Matzke, J. N. Rouder, and R. D. Morey, "Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications," Psychonomic Bulletin & Review, vol. 25, pp. 35-57, Feb. 2018, doi: 10.3758/s13423-017-1343-3.
[56] S. Andraszewicz, B. Scheibehenne, J. Rieskamp, R. Grasman, J. Verhagen, and E. J. Wagenmakers, "An Introduction to Bayesian Hypothesis Testing for Management Research," Journal of Management, vol. 41, no. 2, Dec. 2014, doi: 10.1177/0149206314560412.
[57] R. Kelter, "Bayesian alternatives to null hypothesis significance testing in biomedical research: A non-technical introduction to Bayesian inference with JASP," BMC Medical Research Methodology, vol. 20, no. 1, Art. no. 142, Jun. 2020, doi: 10.1186/s12874-020-00980-6.
[58] J. Cohen, "A power primer," Psychological Bulletin, vol. 112, no. 1, pp. 155-159, Jul. 1992, doi: 10.1037/0033-2909.112.1.155.
[59] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, H. Nori, H. Palangi, M. T. Ribeiro, and Y. Zhang, "Sparks of Artificial General Intelligence: Early experiments with GPT-4," Microsoft Research, Apr. 2023, doi: 10.48550/arXiv.2303.12712.
[60] J. Yang, H. Jin, R. Tang, X. Han, Q. Feng, H. Jiang, B. Yin, and X. Hu, "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond," arXiv preprint, Apr. 2023, doi: 10.48550/arXiv.2304.13712.
[61] P. Mackieson, A. Shlonsky, and M. Connolly, "Increasing rigor and reducing bias in qualitative research: A document analysis of parliamentary debates using applied thematic analysis," Qualitative Social Work, vol. 18, no. 6, Jul. 2018, doi: 10.1177/1473325018786996.
[62] T. V. McGovern, "Process/pedagogy," in The Teaching of Psychology: Essays in Honor of Wilbert J. McKeachie and Charles L. Brewer, Mahwah, NJ: Lawrence Erlbaum Associates, 2002, pp. 81-90.
[63] W. J. McKeachie, "Teaching, learning, and thinking about teaching and learning," in Higher Education: Handbook of Theory and Research, Vol. XIV, New York: Agathon, 1999, pp. 1-38.
[64] W. J. McKeachie, "Ebbs, flows, and progress in the teaching of psychology," in The Teaching of Psychology: Essays in Honor of Wilbert J. McKeachie and Charles L. Brewer, Mahwah, NJ: Lawrence Erlbaum Associates, 2002, pp. 487-498.
[65] M. Miserandino, "Those who can do: Implementing active learning," in Lessons Learned: Practical Advice for the Teaching of Psychology, Washington, D.C.: American Psychological Society, 1999, pp. 109-114.
[66] M. M. Yalch, E. M. Vitale, and J. K. Ford, "Benefits of Peer Review on Students' Writing," Psychology Learning & Teaching, vol. 18, no. 3, Apr. 2019, doi: 10.1177/1475725719835070.
[67] P. Elbow, Everyone Can Write: Essays Toward a Hopeful Theory of Writing and Teaching Writing, New York: Oxford University Press, 2000.
[68] E. de Visser and R. Parasuraman, "Adaptive aiding of human-robot teaming: Effects of imperfect automation on performance, trust, and workload," Journal of Cognitive Engineering and Decision Making, vol. 5, no. 2, pp. 209-231, Jun. 2011, doi: 10.1177/1555343411410160.
[69] M. McGuirl and N. Sarter, "Supporting trust calibration and the effective use of decision aids by presenting dynamic system confidence information," Human Factors, vol. 48, no. 4, pp. 656-665, 2006, doi: 10.1518/001872006779166334.
[70] J. Chen, S. Lakhmani, K. Stowers, A. R. Selkowitz, J. Wright, and M. Barnes, "Situation awareness-based agent transparency and human-autonomy teaming effectiveness," Theoretical Issues in Ergonomics Science, vol. 19, no. 3, pp. 259-282, Feb. 2018, doi: 10.1080/1463922X.2017.1315750.
[71] Y. Su, Y. Lin, and C. Lai, "Collaborating with ChatGPT in argumentative writing classrooms," Assessing Writing, vol. 57, Jul. 2023, doi: 10.1016/j.asw.2023.100752.
[72] N. Tenhundfeld, "Two Birds With One Stone: Writing a Paper Entitled 'ChatGPT as a Tool for Studying Human-AI Interaction in the Wild' with ChatGPT," preprint, Feb. 2023. [Online]. Available: https://www.researchgate.net/publication/368300523_Two_Birds_With_One_Stone_Writing_a_Paper_Entitled_ChatGPT_as_a_Tool_for_Studying_Human-AI_Interaction_in_the_Wild_with_ChatGPT
[73] R. Parasuraman and C. Wickens, "Humans: Still vital after all these years of automation," Human Factors, vol. 50, no. 3, pp. 511-520, Jun. 2008, doi: 10.1518/001872008X312198.
[74] T. O'Neill, N. McNeese, A. Barron, and B. Schelble, "Human-autonomy teaming: A review and analysis of the empirical literature," Human Factors, vol. 65, no. 5, pp. 904-938, Oct. 2020, doi: 10.1177/0018720820960865.
[75] J. R. Hayes, "Modeling and remodeling writing," Written Communication, vol. 29, no. 3, pp. 369-388, Jul. 2012, doi: 10.1177/0741088312451260.
[76] J. Wilson, N. G. Olinghouse, and G. N. Andrada, "Does Automated Feedback Improve Writing Quality?" Learning Disabilities, vol. 12, no. 1, pp. 93-118, 2014.
[77] R. Parasuraman, T. B. Sheridan, and C. D. Wickens, "A model for types and levels of human interaction with automation," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 30, no. 3, pp. 286-297, Jun. 2000, doi: 10.1109/3468.844354.
[78] L. Onnasch, C. D. Wickens, H. Li, and D. Manzey, "Human performance consequences of stages and levels of automation: An integrated meta-analysis," Human Factors, vol. 56, no. 3, pp. 476-488, 2014, doi: 10.1177/0018720813501549.
[79] D. B. Kaber, "Issues in human–automation interaction modeling: Presumptive aspects of frameworks of types and levels of automation," Journal of Cognitive Engineering and Decision Making, vol. 12, no. 1, pp. 7-24, Oct. 2017, doi: 10.1177/1555343417737203.
[80] J. D. Lee and K. A. See, "Trust in automation: Designing for appropriate reliance," Human Factors, vol. 46, no. 1, pp. 50-80, 2004, doi: 10.1518/hfes.46.1.50_30392.
[81] R. C. Mayer and J. H. Davis, "The effect of the performance appraisal system on trust for management: A field quasi-experiment," Journal of Applied Psychology, vol. 84, no. 1, pp. 123-136, 1999.