0% found this document useful (0 votes)

47 views19 pages

Impacts of Interacting With An AI Chatbot On Prese

This study investigates the impact of interacting with an AI chatbot on preservice teachers' (PSTs) responsive teaching skills in math education. It finds that engagement with a responsive AI chatbot enhances PSTs' noticing abilities and questioning practices, leading to improved perceptions of their effectiveness and confidence in teaching. The research highlights the potential of AI technologies in teacher education, suggesting further exploration of their long-term effects on teaching skills.

Uploaded by

Fadime Ulusoy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

47 views19 pages

Impacts of Interacting With An AI Chatbot On Prese

Uploaded by

Fadime Ulusoy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Received: 30 March 2024 Revised: 26 August 2024 Accepted: 26 October 2024

DOI: 10.1111/jcal.13091

ORIGINAL ARTICLE

Impacts of interacting with an AI chatbot on preservice

teachers' responsive teaching skills in math education

Dabae Lee 1 | Taekwon Son 2 | Sheunghyun Yeo 3

1
School of Instructional Technology and
Innovation, Kennesaw State University, Abstract
Kennesaw, Georgia, USA
Background: Artificial Intelligence (AI) technologies offer unique capabilities for pre-
2
Department of Mathematics Education,
Korea National University of Education, service teachers (PSTs) to engage in authentic and real-time interactions using natural
Cheongju, Republic of Korea language. However, the impact of AI technology on PSTs' responsive teaching skills
3
Department of Mathematics Education,
remains uncertain.
Daegu National University of Education,
Daegu, Republic of Korea Objectives: The primary objective of this study is to examine whether interaction
with a responsive AI-based chatbot that acts as a virtual student improves pre-
Correspondence
Taekwon Son, Department of Mathematics service teachers' noticing abilities. The second objective is to compare how the pres-
Education, Korea National University of ence or absence of chatbot responses affects changes in PSTs' questioning practices.
Education, Address: 250, Taeseongtabyeon-ro,
Gangnae-myeon, Heungdeok-gu, Cheongju-si, Finally, the third objective is to investigate how the experience of interacting with
Chungcheongbuk-do, Republic of Korea. the responsive virtual student affects PSTs' perceptions of the effectiveness of their
Email: [email protected]
questioning, their satisfaction with the interactions, and their confidence about inter-
[Correction added on 15 November 2024,
acting with a real student compared to the non-responsive chatbot.
after first online publication: Affiliation details
for the first author were corrected in this Methods: A randomised controlled pre- and post-test design was used with 50 PSTs.
version.]
PSTs' noticing, interaction with the chatbot, and post-survey data were collected,
and a t-test was conducted to examine significant differences by group.
Results and Conclusion: In the experimental group, the virtual student responded to
PSTs' questions, while in the control group, she did not. Notable differences were
observed in their questioning practices.
Takeaways: Overall, AI-based chatbots hold promise for enhancing PSTs' responsive
teaching skills. Future research is needed to examine the long-term impact of respon-
sive chatbot use on PSTs' noticing skills.

KEYWORDS
approximation of practice, artificial intelligence, chatbot, math elementary education, practice-
based teacher education, responsive teaching

1 | I N T RO DU CT I O N Effective teaching involves finding the mathematics in

students' comments and actions, considering what stu-
Building a solid mathematical foundation in early education is vital for dents appear to know in light of the intended learning
nurturing mathematical interests and proficiency. (National Council of goals and progression, and determining how to give
Supervisors of Mathematics and National Council of Teachers the best response and support to students based on
of Mathematics, 2019) This requires mathematics teaching that is their current understandings. (p. 56)
grounded in a robust definition of mathematical proficiency (National
Research Council, 2001) The National Council of Teachers of Mathe- In short, effective mathematics teaching is responsive to the math-
matics (2014) observed that ematical ideas that children express in the moment.

J Comput Assist Learn. 2025;41:e13091. wileyonlinelibrary.com/journal/jcal © 2024 John Wiley & Sons Ltd. 1 of 19
https://doi.org/10.1111/jcal.13091
2 of 19 LEE ET AL.

Responsive teaching is a pedagogical approach in which the users to interact with a technology system in a natural language
teacher's instructional decisions are continually adjusted in response instead of a programming language or a set of preset options
to student thinking rather than a pre-written script (Jacobs & (Mathew et al., 2021). Machine Learning (ML) allows computing sys-
Empson, 2016). In responsive teaching, learners' unique ideas can be tems to repeatedly learn from data and check for accuracy to perform
presented, attended to, and responded to, and richer, more productive tasks without being directly scripted (Kharb & Singh, 2021). These
mathematical discussions can take place (Jacobs & Empson, 2016). technical advantages can innovate the way approximations of practice
Responsive teaching allows teachers to “respond productively to stu- have been delivered, potentially creating an authentic teaching simu-
dents' emergent thinking as they grapple with tasks in ways that push lation that would be otherwise unattainable. The perceived authentic-
them to take their thinking farther than they could on their own” ity and immersion in such authentic settings can offer educational
(Kavanagh et al., 2020, p. 96). As a result, responsive teaching encour- benefits to PSTs.
ages students to express mathematical ideas (Flood et al., 2020), However, it is still unclear whether the use of AI technologies
deepens students' mathematical understanding (Robertson makes meaningful impacts on teaching and learning (Zawacki-Richter
et al., 2015), and fosters inclusive learning environments where indi- et al., 2019). Specifically, the effects of using such AI conversational
vidual differences – demographic or different ideas or levels of agents or chatbots in mathematics teacher education have not been
understanding – become assets to learning by seeking to understand systematically examined according to a recent meta-analysis (Dai
students' ideas from diverse backgrounds (National Council of et al., 2023). In this study, we tried to provide a more authentic simu-
Teachers of Mathematics, 2014; Robertson et al., 2015). lated environment by utilising an AI-based chatbot to offer PSTs
Practicing responsive teaching is complex and multifaceted as it opportunities for approximations of practice. We examined the
requires teachers to engage in and respond to a wide spectrum of stu- impacts of this intervention on their responsive teaching skills, ques-
dents' unique thinking without imposing their ideas on students tioning practices, and perceptions of the experiences through a ran-
(Ball, 1993; Lampert, 2013). Therefore, sufficient support must be domised controlled design.
provided for preservice teachers (PSTs) to develop competencies in
responsive teaching (McDonald et al., 2013). Nevertheless, PSTs have
limited opportunities to develop these skills before their field 2 | LI T E RA T U R E RE V I E W
experiences.
One approach to developing PSTs' responsive teaching skills is 2.1 | Practice-based approach in mathematics
implementing practice-based teacher education. This involves (1) pre- teacher education
senting a representation of teaching practice, (2) decomposing com-
plex teaching practices into discrete and manageable components, As the demand for reforming teaching practice in mathematics educa-
and (3) approximating practices by providing PSTs with opportunities tion has increased, there has been a growing call for practice-based
to practice and reflect on each component of teaching practices teacher education that centrers on teachers' enactment of decom-
within environments that reduce the complexity of the situation posed practicable skills (Grossman et al., 2009; Loewenberg Ball &
(Grossman et al., 2009; Grossman & McDonald, 2008). Grossman Forzani, 2009). Eliciting and extending students' mathematical think-
et al. (2009) argued that breaking down complex teaching practices ing through questioning is a crucial component of effective practice
into manageable practical skills and approximating those skills in a (Jacobs & Empson, 2016; National Council of Teachers of
simplified context could provide PSTs with opportunities to under- Mathematics, 2014). Although practicing such skills with real students
stand and enact individual components of complex practices. would constitute authentic learning opportunities for PSTs
Educational researchers have explored various ways to approxi- (e.g., Webel & Yeo, 2021), it is often not feasible to arrange extensive
mate practices. For example, some researchers have developed experiences of working with real children within teacher preparation
technology-based teaching simulations that allow users to engage programmes. Also, children might not be well-served by working
with virtual students to try out and learn from teaching moves in a directly with novices who are still learning how to support students'
low-stakes context (Estapa et al., 2018; Webel et al., 2018; Webel & mathematical development (Webel & Hopkins, 2019).
Conner, 2017). These studies have demonstrated the potential affor- Grossman et al. (2009) conceptualised a practice-based teacher
dances of practicing pedagogical skills in settings that do not require education approaches with three components: representations,
human students. However, these studies often lacked the incorpora- decomposition, and approximations of practice. These three compo-
tion of authentic teaching situations. For example, technology-based nents are crucial to help PSTs understand the intricacies of teaching.
simulations that present only a limited set of predefined teaching Representations “comprise the different ways that practice is repre-
moves to PSTs may not reflect a real instructional situation (Howell & sented in professional education and what these various representa-
Mikeska, 2021). Opportunities for authentic practice are crucial for tions make visible to novices” (p. 2058). In the field of mathematics
adequately preparing PSTs for the challenges they may face in their education, representations can be accomplished in various ways, such
future careers as educators. as presenting classroom observations, video recordings of classroom
Artificial Intelligence (AI) technologies offer unique affordances to interactions, lesson or unit plans, and instructional materials. Decom-
overcome this limitation. Natural Language Processing (NLP) allows position of practice provides PSTs opportunities to examine a
LEE ET AL. 3 of 19

complex practice, broken down into manageable components, and When deciding how to respond, teachers can choose two com-
then gradually reintegrated into the complete practice. mon forms: asking follow-up questions (questioning) or deciding on
Approximation of practice is a way for PSTs to transfer those the next problems (Jacobs et al., 2022). Those questions can be used
skills into their teaching practice, providing them with “opportunities to elicit students' thinking and to assess their understanding (Guarrella
to rehearse and enact discrete components of complex practice in the et al., 2023) as well as to expand and develop students' mathematical
setting of reduced complexity” (Grossman et al., 2009, p. 283). strengths (Scheiner, 2023). Therefore, questioning is an essential
Approximations of practice help PSTs pay attention to and develop a assessment and instructional method in responsive teaching.
specific aspect of teaching practice in an attainable manner Consider a student working on a fraction addition problem, 1/4
(Santagata & Yeh, 2014). Several studies have explored how approxi- + 2/4. The student might draw two circles, divide each into four parts,
mations can enhance various pedagogical skills of PSTs. For example, colour one part of a circle, and colour two parts of another circle.
Ayalon and Wilkie (2020) approximated the practice of formative Then the student might count all parts and coloured parts and answer
assessment by asking novice teachers to analyze various students' it 3/8. Observing this, the teacher might direct her attention to the
answers to an open-ended task to support their development of student's two circles, coloured segments, and the response of 3/8.
assessment criteria. They found that PSTs drew on their content Subsequently, the teacher might interpret the student's mathematical
knowledge when developing assessment criteria. Through multiple understanding based on the employed mathematical strategies. The
iterations, the assessment criteria were refined. student understands how to represent a fraction using a circular
It may not be difficult to provide PSTs with approximations of model and adds numerators and denominators separately, similar to
practice for design and development tasks like creating assessment the addition of natural numbers. This reveals an incomplete under-
criteria. However, it can be challenging to simulate authentic situa- standing concerning the role of denominators as units, highlighting a
tions for the enactment of teaching. Responsive teaching requires potential area for intervention. Lastly, the teacher would decide how
having PSTs elicit and respond to children's thinking (Richards & to respond based on her interpretation of the student's current under-
Robertson, 2015). Therefore, simulating an authentic teaching situa- standing. The teacher might ask a question to further explore the stu-
tion to develop responsive teaching skills is even more challenging dent's thinking. Based on the student's answer to the question, the
due to the interactive and spontaneous nature of responsive teaching teacher would engage in another noticing process. This process recurs
(Jacobs & Empson, 2016). In the next section, we will unpack respon- until the student reaches a complete understanding.
sive teaching and discuss how approximations of practice were pre- The effectiveness of a teacher's questioning can be evaluated by
pared to develop PSTs' responsive teaching skills. how well it incorporates the student's thinking (Webel &
Conner, 2017) and promotes the development of high-order mathe-
matical thinking (Loewenberg Ball & Forzani, 2009). Son et al. (2024)
2.2 | Developing responsive teaching skills and investigated the nature of PSTS' questioning practice and categorised
noticing through simulations their questions into two dimensions: (1) high or low responsiveness
based on their incorporation of students' thinking, and (2) high or low-
Responsive teaching is a pedagogical approach that values and elicits leverage questions based on their potential to develop mathematical
students' thinking and engages students in their learning processes thinking. They found that PSTs asked various types of questions that
(Flood et al., 2020; Jacobs & Empson, 2016). Noticing is seen as a core varied across these two dimensions, calling for a measure to improve
practice of responsive teaching that takes place at the moment before PSTs's questioning practice.
teachers' observable teaching moves (Jacobs et al., 2022). Jacobs and Because of the interactive and iterative nature of responsive
Empson (2016) conceptualised responsive teaching into two genera- teaching, simulations have been proposed as a way to approximate
tive instructional practices: (1) noticing children's mathematical think- practices. Some studies have employed simulations in-person settings
ing and (2) enacting moves to support and extend children's where adults play the role of children (e.g., Shaughnessy &
mathematical thinking. Noticing is defined as “how teachers pay Boerst, 2018), and others utilised digital simulations, such as video-
attention to and make sense of what happens in the complexity of based simulations (e.g., Codreanu et al., 2020) and web-based teach-
instructional situations” (Sherin et al., 2010, p. 1). According to Jacobs ing simulations (e.g., Lee, 2021; Webel & Conner, 2017). These efforts
et al. (2022), professional noticing is composed of (1) attending to chil- provide PSTs with rehearsal opportunities to engage in mathematical
dren's strategies, (2) interpreting children's understandings, and discourses without the challenges and risks associated with PSTs
(3) deciding how to respond. Attending involves “teachers recognising interacting with real children.
mathematically noteworthy aspects of children's strategies” (p. 3). Specifically, digital teaching simulations have been used to
Interpreting involves teachers reasoning through the details of chil- enhance PSTs' responsive teaching skills. For example, the LessonS-
dren's strategies to discern their understanding. Deciding how to ketch tool has been used to provide approximations of teaching prac-
respond involves teachers using “what they have learned from chil- tices, such as making professional decisions while reviewing student
dren's strategy details and understandings to determine their next work (Herbst et al., 2011; Herbst et al., 2014), helping PSTs notice
instructional steps” (p. 3). The three elements are connected and students' reasoning and errors in a mathematical problem (Lee &
occur simultaneously. Lim, 2020), and questions in response to student thinking (Webel &
4 of 19 LEE ET AL.

Conner, 2017). Theelen et al. (2019) found that computer-based class- Several chatbots have been developed to serve as a distance learning
room simulations had positive associations with PSTs' classroom man- assistant (e.g., Tamayo et al., 2020), gather attitudes towards bullying
agement and teaching skills and bridged the common gap between issues (e.g., Oh et al., 2020), practice foreign languages (e.g., Fryer
teacher education programmes and field placements. et al., 2019; Jeon, 2022; Oh & Song, 2021), and enhance literacy skills
However, designing and implementing teaching simulations – in- (e.g., Xu et al., 2021). A recent review of AI chatbots in education
person or computer-based comes with challenges and limitations. In highlighted some benefits of chatbots to students and teachers
the work by Shaughnessy and Boerst (2018), adults were trained to (Labadze et al., 2023). Students appreciated homework and study
perform in a way that mimics a real student's mathematical thinking assistance, a flexible personalised learning experience providing indivi-
and accounts, and only a few selected PSTs got to try their questions. dualised guidance and feedback, and the development of various
LessonSketch activities, such as those designed by Webel and Conner skills, such as writing and problem-solving skills. Teachers valued
(2017), typically provided a few closed-ended decision points in which time-saving administrative assistance and improved instructional and
users were asked to select among multiple response choices. In real assessment practice by leveraging the capabilities of ChatGPT. A
teaching situations, however, teachers are not prompted with possible recent meta-analysis of 24 randomised studies on AI choatbots
options that they could choose, and their teaching moves are open- revealed several educational benefits such as academic outcomes,
ended with many pedagogical contingencies (Rowland et al., 2005). In especially with college students, self-efficacy, interest, and perceived
our study, we tried to overcome these limitations by utilising AI to value of learning (Wu & Yu, 2024). These effects were larger with col-
better approximate teaching situations, including affording open- lege students than elementary students and shorter durations of
ended interactions and offering personalised responses from a virtual interventions.
student to PSTs' questions. Another meta-analysis provided another perspective on the
effects of AI chatbots on academic outcomes. Dai et al. (2023) found
that AI chatbots used in simulation-based learning had a medium
2.3 | AI in education overall effect on learning outcomes, and types of AI technologies used
in the AI chatbots moderated the overall effects in simulation-based
Rapid advances in AI technologies have led to scholarly efforts to learning. Dai and Ke (2022) defined several types of AI technologies,
explore how their unique affordances enhance education (Chiu such as scripted AI, rule-based AI, module-based AI, and NLP and
et al., 2023) as well as practical issues associated with the use of AI in ML. Scripted AI uses coding scripts of a list of responses to be exe-
education (Šedlbauer et al., 2024). Recent efforts focus on preparing cuted linearly. Rule-based AI utilises existing algorithms in decision-
teachers to integrate AI into teaching and learning. For example, Kim making processes. Module-based AI uses a Bayesian model in
and Kwon (2023) explored Korean elementary teachers' experiences decision-making processes. NLP and ML adopt more advanced algo-
teaching AI curricula. They found that teachers were least confident rithms based on neural networks and deep learning than the other AI
in the content knowledge of AI, pointing out a lack of appropriate technologies. According to Dai et al. (2023), chatbots utilising
ways to learn about core AI concepts or principles. This lack of under- advanced AI technologies like NLP (that allow users to interact with
standing led to their anxiety about designing and implementing AI cur- chatbots in their own words) and ML (that enable the system to auto-
ricula. However, other studies show that this can be resolved by matically process and learn from data) had a larger effect measured by
offering appropriate professional development to enhance teachers' Hedge's g (g = 0.42) than using scripted (g = 0.33) or rule-based AI
AI competencies (Kitcharoen et al., 2024) or opportunities to interact (g = 0.23). This finding highlights the possibility of creating effective
with AI systems (Šedlbauer et al., 2024). Therefore, it is critical to and enhanced learning environments through the unique affordances
understand how AI's unique capabilities can be utilised to enhance of these AI technologies. Artificial Intelligence-based educational sys-
education. tems have been utilised in math education as well. According to a
There have been many studies to investigate how to take advan- meta-synthesis, 25% of studies conducted on AI in education pub-
tage of AI's unique capabilities to create enhanced learning environ- lished from 2011 to 2021 were on math education (Crompton
ments that would otherwise be extremely difficult or impossible to et al., 2022). Reflecting the overall trend of AI in education, most of
replicate (Feng & Law, 2021; Zawacki-Richter et al., 2019). According those studies featured intelligent tutoring systems (e.g., Lenat &
to a recent review of AI in education, the intelligent tutoring system is Durlach, 2014; Liu et al., 2019; Ritter et al., 2007). Recently,
the most widely used AI-based system (Feng & Law, 2021; Zawacki- researchers have explored the possibilities of utilising AI chatbots in
Richter et al., 2019). Intelligent tutoring systems provide a way to mathematics education. Nguyen et al. (2019) developed an interactive
scale individualised instruction by assessing student knowledge and intelligent chatbot for mathematical learning at a high school and
adjusting instruction accordingly (Graesser et al., 2001; Zawacki- described its functionalities. Their chatbot used a scripted-based sys-
Richter et al., 2019). However, in such systems, users are considered tem that did not utilise NLP or ML, which was known to bring a larger
as knowledge recipients playing a passive role, rather than as contrib- effect than scripted or rule-based AI (Dai et al., 2023).
utors playing an active role in their learning (Lee & Yeo, 2022). Beyer (2022) developed a chatbot using NLP and ML for mathe-
AI-based chatbots have the potential to provide more interactive matics teachers to support material-based, self-regulated professional
and flexible learning platforms than intelligent tutoring systems. development. Datta et al. (2021) also developed chatbots, where PSTs
LEE ET AL. 5 of 19

engaged in directly teaching the chatbot a mathematical idea. Most application of mathematics to students' lives, and the integration of
recently, Lee and Yeo (2022) developed a chatbot to enhance PSTs' mathematics with other academic disciplines. Preservice teachers
responsive teaching skills. In their design-based research, they went learned the importance of and purposefulness of questioning and
through two iterations to develop a chatbot that served as a virtual interaction between teachers and students. It was emphasised that
student with mathematical misconceptions. At the final iteration, the questioning allowed teachers to probe and extend students' mathe-
chatbot was able to cover 97% of users' input, and the users per- matical thinking. However, they did not learn about effective ques-
ceived the virtual student's responses to be realistic, which may con- tioning or the notions of noticing or responsive teaching.
tribute to the perceived authenticity of the interactions with the The four sections shared the same instructional goals, lesson
chatbot. materials, assignments, and instructional methods. Although the four
Although there have been several studies that documented the sections were taught by different instructors, the instructors met
process of the output of the AI chatbot development, few studies weekly to plan the sessions to ensure the same course content was
examined the chatbot's impacts on intended outcomes in math educa- delivered using the same planned instructional strategies. Preservice
tion (Dai et al., 2023). More specifically, the potential impacts of using teachers did not have any AI-chatbot experience throughout the
a chatbot for approximations of practice on PSTs' responsive teaching course, except for this intervention. They were informed how to
skills are unknown, despite promising possibilities afforded by the access and interact with the chatbot.
medium's unique abilities to enhance the authenticity of a simulated
environment. The purpose of this study was to systematically examine
the effects of a chatbot that acted as a virtual student who displayed 3.2 | Research design and procedures
a partial understanding of fraction concepts when solving a fraction
addition task. To examine its effect, PSTs were randomly assigned to This study employed a pre- and post-test experimental design, where
two conditions: (1) experimental condition where the virtual student 50 participants were randomly assigned to the four sections of a
responded to PSTs' questions simulating a child's responses mathematics methods course, which were again randomly assigned to
(a responsive chatbot) and (2) control condition where the virtual stu- either control (n = 25) or experimental conditions (n = 25). In both
dent solicited questions but did not respond to the PSTs' questions conditions, pre- and post-tests on noticing were administered to mea-
(a non-responsive chatbot). For the experimental condition, the AI sure each group's noticing abilities before and after the intervention.
chatbot developed by Lee and Yeo (2022) was adopted. We aimed to Table 1 summarises the experimental design and procedure.
answer the following research questions (RQs). The study began in Session 7. During sessions 6 and 7, the partici-
pants learned whole numbers and operations and were encouraged to
1. Does interacting with a responsive AI-based chatbot that serves anchor their instruction in students' mathematical thinking and
the role of a virtual student improve PSTs' noticing abilities? to build upon it. They analyzed work samples that were solved by ele-
2. How does interacting with the responsive virtual student, com- mentary students. When analyzing those samples, PSTs interpreted
pared to the non-responsive control group, shape the questioning the intention and meaning of the students' drawings and representa-
practice of PSTs? tions in the strategies. At the end of Session 7, a pre-test was adminis-
3. Does the experience of interacting with the responsive virtual stu- tered to assess PSTs' noticing abilities in terms of attending,
dent affect PSTs' perceptions of the effectiveness of their ques- interpreting, and deciding on how to respond (Appendix A). During
tioning, their satisfaction with the interactions, and their the pretest, PSTs watched a short video clip that showed a student's
confidence about interacting with a real student compared to the solution to a fraction problem with a general misconception of
non-responsive chatbot? fractions.
During Session 8, both groups engaged in the topic of fraction
concepts. Multiple meanings of fraction concepts such as part-whole
3 | METHODS construct, measurement, and quotient constructs were introduced
with problem-solving activities. Then they watched a video, in which a
3.1 | Setting and participants

The participants were 50 PSTs at a research one southern university TABLE 1 Experimental design and procedure.
in the US in their second year of a teacher preparation programme.
Control Experimental
The participants were from four sections of the 15-week methods
course for elementary mathematics. The course objectives included Before Pretest on noticing

learning pedagogical content knowledge and teaching skills for effec- During Watch Jiwoo's video and pose potential questions for Jiwoo

tive teaching of various mathematical domains, such as numbers and Chatbot with no Chatbot responding to PSTs'
operations, geometry, and measurement. Preservice teachers were response questions

also encouraged to develop a professional vision to focus on elemen- After Posttest on noticing
Reflection survey
tary students' mathematical thinking as the construction of ideas, the
6 of 19 LEE ET AL.

FIGURE 1 The webpage with the fraction problem, video of Jiwoo's problem-solving, and chat window.

third grader named Jiwoo solved a fraction concept problem. Jiwoo chatbot solicited three questions from the participants, by presenting
tried to evaluate whether the amount of flour was bigger than one the prompt, “What question do you have for me about my solution?”
cup when 3/4 and 3/6 cups of flour were added together. Jiwoo but did not provide any responses to their questions. The participants
solved the problem, and her answer was 6/10. were supposed to pose questions as they would to a student to
Figure 1 is a screenshot of the webpage where the fraction problem, explore the virtual student's thinking or facilitate learning but did not
Jiwoo's solution video, and the chatbot were embedded. Through the receive a response from Jiwoo.
conversation, we expected PSTs to support the virtual student in devel- For the experimental group, the chatbot developed by Lee and
oping an appropriate understanding of the fraction concept and the Yeo (2022) was refined to address more types of questions about
imperativeness of the whole. After watching the video, the participants in Jiwoo's conceptual understanding of fractions. This chatbot was
both conditions interacted with different chatbots, which are described in designed to start with the same prompt but respond to the partici-
the following section through the interactions, PSTs in both groups were pants' questions as if Jiwoo would. The chatbot was programmed with
expected to practice their responsive teaching skills by asking questions. several interaction capabilities.
At the end of Session 8, a post-test was administered after com- First, the chatbot responded to the participants' questions reflect-
pleting the chatbot interaction. A different video clip was used with ing Jiwoo's current understanding and misconceptions. To questions
the same assessment questions on PSTs' noticing skills (Appendix A). about Jiwoo's misconceptions, she responded incorrectly reflecting
After that, a reflection survey was completed (Appendix B). They were her misconceptions. To do so, possible PST questions were cate-
asked to rate their perceptions of the effectiveness of their question- gorised into specific users' intents, and a response to each intent was
ing, satisfaction with their interactions with the chatbots, and their formulated based on Jiwoo's current understanding. For example,
confidence in teaching in similar situations with a 5-point Likert scale. Jiwoo understood the different sizes of fractions but had not learned
how to make common denominators. So, to a question like “Is ¼ larger
than 1/6?”, Jiwoo answered, “Umm. 1/4 is larger than 1/6.” To a
3.3 | Chatbots question like, “Can you make a common denominator?”, Jiwoo
answered, “I don't know what a common denominator means.”
Different chatbots were used for the experimental and control groups. Second, when a user asked multiple questions beyond Jiwoo's
Figure 2 presents a sample interaction from each condition. For the current understanding repeatedly, such as questions on common
control group, a chatbot was developed to elicit questions from PSTs denominators, Jiwoo reminded the user of her current understanding
but not to respond to their questions. In the control condition, the and encouraged them to ask a different question, saying
LEE ET AL. 7 of 19

FIGURE 2 Sample interactions of the control and experimental conditions.

“I am a third grader, and I have not learned how to add evaluation criteria were used, but different videos on different frac-
fractions yet., So, I don't know how to make bottom tion tasks were used. Both groups watched a short video clip of a stu-
values the same or what the common denominator dent's solution for each test and responded to three prompts related
is. However, I can read fractions and compare the sizes to noticing skills (Jacobs et al., 2022): (1) Describe what you noticed in
of them. Can you ask a question that I can the video clip (attending), (2) Interpret or analyze what you noticed
understand?” (interpreting), and (3) Describe how you would support the student if
you were the teacher (deciding how to respond). See Appendix A.
Third, the chatbot was designed to be able to learn from the In the video for the pre-test, a student solved a fraction problem,
interaction. When multiple key questions that touched on Jiwoo's 4–1/8. The student used a strategy to make a common denominator
misunderstanding were asked, she resolved her cognitive dissonance between 4 and 1/8 as 8 by multiplying 1 and 8. With an incomplete
and reached an aha moment. For instance, when asked to compare understanding, the student did not multiply the numerators with the
3/6 with ½, and then “How much more flour do you need if you opposite denominators and arrived at an incorrect answer (4/8–
already have /4 cup full?”, Jiwoo realised, “Oh, I got it! I think the total
3
1/8 = 3/8). In the video for the post-test, another student solved a
amount should be greater than one cup because both of the red and different fraction problem, 2 1/4–1 3/8. The students subtracted
blue parts are more than half. Thanks!” 1 from 2, then changed from 1/4 to 2/8 as an equivalent fraction to
match the denominator. To find a fractional part, the student resulted
in 1/8 by subtracting 2/8 from 3/8 (2/8–3/8) rather than decompos-
3.4 | Data collection ing and composing the smaller fractional part with the whole number
(1 + 2/8 = 10/8).
Regarding RQ 1, pre- and post-tests were administered for both con- To answer RQ 2, we collected the participants' conversational
trol and experimental groups to evaluate three noticing components data with the chatbot for both groups. The chat data was extracted
(attending, interpreting, and deciding how to respond) to examine from IBM Watson and transferred to spreadsheets to compare both
their impacts on responsive teaching skills. The same prompts and groups' questioning practices.
8 of 19 LEE ET AL.

For RQ 3, the participants rated how they perceived their interac- attending, interpreting, and deciding how to respond (e.g., “In the
tion with the chatbot in the reflection survey (Appendix B). They rated video clip, Mathew doesn't multiply the denominator and numerator
their perceptions of the effectiveness of their questioning, their satis- by the same number, so he doesn't understand that the denominator
faction with the chatbot interaction, and their confidence in teaching and numerator must be multiplied by the same number. In the follow-
similar situations on a 5-point Likert scale. All 50 participants com- ing learning, I will ask him to use the visual representations of fraction
pleted the survey, reaching a 100% response rate. changes. Asking him to draw 4/1 and 4/8 might give him a chance to
check his misconception for himself.”). Two points were awarded
when it demonstrates some connection among them based on limited
3.5 | Data analysis evidence (e.g., “He didn't understand how to make an equivalent frac-
tion. If I were his teacher, I would have him draw a picture of the 4/
3.5.1 | RQ 1. Noticing assessment 8 to show that it is one half and not equal to 4.”). One point was
granted when little connection is made with lack of evidence (e.g., “I
For RQ 1, two authors scored the pre- and post-test data using a would see if he could figure that step out on his own or give him a
rubric from the professional noticing framework (Jacobs et al., 2010) small hint”).
to quantify each PST's noticing expertise. The assessment consisted All data were double coded by two authors. The inter-rater reli-
of three parts: attending, interpreting, and deciding how to respond. ability for each set of coding was 91%, and discrepancies were
Each part was scored from 1 to 3 depending on the extent to which resolved through discussion. Using the scored data, independent t-
the response was evidence-based: 3 = Robust Evidence, 2 = Limited tests were performed to determine if there were any significant differ-
Evidence, 1 = Lack of Evidence, resulting in a total maximum score of ences between the two groups in the pre- and post-test results.
9 for the three components. Paired-samples t-tests were conducted to examine improvements
Attending was scored according to the extent to which PSTs from the pre-test to post-test in both groups.
noticed misconceptions and provided specific and salient explanations Effect sizes were estimated to understand the magnitude of the
related to them. Three points were given when the PST's explanation differences and improvements that statistical significance tests alone
contained the student's misconceptions and significant mathematical cannot provide (Balow, 2017). Both Cohen's d and Hedge's g were
details (e.g., “He knows how to do this by adding both of the two frac- calculated for statistically significant effects. Cohen's d is a commonly
tion's denominators by the same number…. However, he does not used effect size measure, and Hedge's g is known to adjust the ten-
understand the numerators must also be multiplied by that same dency of Cohen's d to overestimate the population effect size espe-
number…”). Two points were awarded when limited evidence was cially when small sample sizes are employed (Fritz et al., 2012). The
presented by describing general features of the student's problem- following are the benchmark values of Cohen (1988): 0.2 being small,
solving strategies without explicit explanation of the student's mis- 0.5 being medium, 0.8 being large. However, it should be noted that
conceptions (e.g., “To solve the problem, the student began by sub- Cohen cautioned that these values should be used as a general rule of
tracting the numerators to get 3 and then found the common thumb when no previous findings exist (Volker, 2006). Hedges and
denominator of 8. He then put the 3 over 8 for his answer.”). One Hedberg (2007) argued educational research reporting effect sizes
point was given when it included an incorrect explanation about the around 0.20 are of policy interest.
student's problem-solving process or nonmathematical explanations
(e.g., “He had a hard time talking through the process that he com-
pleted in his head.”). 3.5.2 | RQ 2. Chatbot conversation data
Interpreting was scored by evaluating how clearly PSTs presented
their analyses based on observed evidence. Three points were The chatbot conversation data was analyzed to determine if there
awarded when their interpretations were based on robust evidence were any differences in the types of questions posed by the partici-
that supported a reasonable explanation of the student's thinking pants in the two conditions. The intents and categories of the chatbot
(e.g., “Mathew does not understand that the numerator must also be system were used as the coding scheme. First, one author reviewed
multiplied by the value that the denominator is multiplied by. In the the users' input data and the intents that were automatically assigned
video, he referred to the numbers and rules about fractions that he to each input to see if any questions were inadequately assigned by
knew while working on the problem, but he did not multiply the the system. In the experimental group, 25 users had one interaction
numerator at the end of the equation.”). Two points were given when for each, and out of the total 244 user inputs, 46 questions' intents
limited evidence was included when explaining the student's thinking were inadequately assigned by the chatbot. In the control group,
(e.g., “He divides a given rectangle into parts and the missing 25 users made 35 interactions and a total of 189 inputs. On average,
fractional pieces make a whole.”). One point was granted when the the 25 users made 1.6 interactions each. Out of the 189 inputs,
explanation did not include supporting evidence (e.g., “He understood 54 inputs were inadequately assigned. In both groups, a total of
how to create a common denominator.”). 433 inputs were made, and 100 of them were inadequately assigned.
Deciding how to respond was scored based on connecting Second, the conversation data were reviewed, recoded when
attending and interpreting to educational decision-making. Three inadequately coded by the system and cleaned. Two authors reviewed
points were given when the response made a solid connection among the 100 inadequately assigned questions from both groups and
LEE ET AL. 9 of 19

reassigned proper codes to them. In this process, new intents were (t = 1.809, p = 0.077). This result indicates that the two
developed, and the data was cleaned. Six categories were identified: groups were at comparable levels of noticing expertise in the pre-test.
(1) General, (2) Concept, (3) Parts and Whole, (4) Compare Size, At the post-test, the score of the experimental group (M = 5.72,
(5) Add and Subtract, and (6) Formula-Related. Appendix C presents SD = 1.792) was significantly higher than that of the control group
the final coding scheme, where definitions of the categories and their (M = 3.88, SD = 1.563, t = 3.869, p < 0.0001). For the post-test
intents can be found. Invalid user inputs that were not questions on difference, Cohen's d was 1.094, and Hedge's g was 1.077. Both
the fraction task were excluded. For example, greetings (e.g., hello, effect sizes indicated a large effect according to the guidelines by
goodbye, etc.), compliments (e.g., You are a good student.), or short Cohen (1988).
answers (e.g., yes, no, great, etc.) were removed from the data. As a Table 3 shows the paired t-test results that compared pre- and
result, 150 questions remained in the experimental group, and post-test results. In the control group, there was no significant
96 questions remained in the control group for analysis. improvement (t = 0.405, p = 0.689). In the experimental group, a
Subsequently, the frequencies of the categories and intents were significant increase was observed (t = 3.99, p = 0.001). These
tallied within each group. For further analysis, to understand how results indicated that the responsive chatbot was effective in improv-
many interactions contained each category of questions, the numbers ing PSTs' noticing expertise compared to the non-responsive chatbot.
of PST questions of each category were counted in each interaction The effect sizes of the experimental group (d = 0.934, g = 0.919) indi-
for both groups. For example, the numbers of interactions that con- cated a large effect according to Cohen's (1988) guideline.
tained at least one question in the General category were counted for
the control and experimental group.
4.1.2 | Changes in the components of noticing
expertise
3.5.3 | RQ 3. Survey data
Table 4 presents the results of independent t-tests between the con-
To analyze the survey data, we performed independent t-tests to trol and experimental groups for the three noticing components in the
examine differences in PSTs' perceptions of their effectiveness, satis- pre-test. Although the means of the experimental group were slightly
faction, and confidence between the two groups. As we did with the higher than those of the control group, no significant difference was
noticing assessment data, we calculated effect sizes using the mea- observed for all three components. This indicates that the levels of
sures of Cohen's d and Hedge's g to understand the magnitude of the the three noticing components of the two groups were comparable
differences, and the confidence interval (CI) was accepted as 95%. before interacting with the chatbots.
The benchmark values for Cohen's d were determined following the Table 5 presents the results of independent t-tests between the
guidelines of Cohen (1988): 0.2 small, 0.5 medium, and 0.8 large. two groups for the three noticing components in the post-test. There
The benchmark values for Hedge's g coefficient were determined by were significant differences between the two groups for all compo-
the criteria by Cohen et al. (2007): 0–0.20 weak, 0.21–0.50 small, nents (Attending p = 0.002, Interpreting p = 0.010, Deciding
0.51–1.00 medium, and if it is greater than 1, the effect is large. p < 0.0001). Attending (d = 0.924, g = 0.910) and deciding how to
respond (d = 1.093, g = 1.076) had large effect sizes larger than 0.8,
and interpreting had a medium effect size (d = 0.761, g = 0.749).
4 | RESULTS Figure 3 visualises the post-test differences between the two groups
with CIs. As shown, the experimental group scored significantly higher
4.1 | RQ 1. Effects on noticing expertise than the control group in all three components. The CIs of the two
groups did not overlap.
4.1.1 | Changes in the overall noticing expertise Table 6 displays the results of paired t-tests for the three noticing
components comparing the pre-test to the post-test. The participants'
Table 2 shows the independent t-test results between the two abilities to attend, interpret, and decide how to respond to students'
groups. There was no significant difference between the two groups thinking significantly improved after interacting with the responsive

TABLE 2 Independent t-test results of noticing expertise between the control and experimental group.

Test Group Mean SD CI df t p Cohen's d Hedge's g

Pre-test Control 3.76 0.723 [3.477, 4.043] 48 1.809 0.077 -
Experimental 4.28 1.242 [3.793, 4.767] -
Post-test Control 3.88 1.563 [3.267, 4.493] 48 3.869 0.000** 1.094
Experimental 5.72 1.792 [5.018, 6.422] 1.077

**p < 0.01.

10 of 19 LEE ET AL.

TABLE 3 Results of paired samples t-test for Pre- and Post- tests in the control and experimental group.

Mean (SD) CI

Group Pre-test Post-test Pre Post df t p Cohen's d Hedge's g

Control 3.76 3.88 [3.477, 4.043] [3.267, 4.493] 24 0.405 0.689 -
(0.723) (1.563) -
Experimental 4.28 5.72 [3.793, 4.767] [5.018, 6.422] 24 3.99 0.001** 0.934
(1.242) (1.792) 0.919

**p < 0.01.

TABLE 4 Independent t-test results of the noticing components between the control and experimental group in the pre-test.

Component Group Mean SD CI df t p Cohen's d Hedge's g

Attending Control 1.4 0.500 [1.204, 1.596] 48 1.124 0.267 0.318
Experimental 1.56 0.507 [1.361, 1.759] 0.313
Interpreting Control 1.24 0.436 [1.069, 1.411] 48 1.496 0.141 0.423
Experimental 1.44 0.507 [1.241, 1.639] 0.417
Deciding Control 1.12 0.332 [0.990, 1.250] 48 1.414 0.164 0.400
Experimental 1.28 0.458 [1.100, 1.460] 0.394

TABLE 5 Independent t-test results of the noticing components between the control and experimental group in the post-test.

Component Group Mean SD CI df t p Cohen's d Hedge's g

Attending Control 1.36 0.638 [1.110, 1.610] 48 3.27 0.002** 0.924
Experimental 1.92 0.572 [1.696, 2.144] 0.910
Interpreting Control 1.36 0.638 [1.110, 1.610] 48 2.691 0.010* 0.761
Experimental 1.88 0.726 [1.595, 2.165] 0.749
Deciding Control 1.16 0.473 [0.975, 1.345] 48 3.865 0.000** 1.093
Experimental 1.92 0.862 [1.582, 2.258] 1.076

*p < 0.05.**p < 0.05.

0 0.5 1 1.5 2 2.5

Attending

Control 1.36

Experimental 1.92
Interpreting

Control 1.36

Experimental 1.88

Control 1.16
Deciding

Experimental 1.92

FIGURE 3 Visualisation of the mean differences in noticing components (Post-test means with confidence intervals). Black bars indicate.

chatbot. The effect size of deciding how to respond was largest 4.2 | RQ 2. Questioning practice
(d = 0.927, g = 0.913), which indicates that among the three compo-
nents of noticing expertise, deciding how to respond showed the The questioning practices of the two groups were characterised dif-
greatest improvement. ferently. The control and experimental groups were compared in
LEE ET AL. 11 of 19

TABLE 6 Paired samples t-test results for each noticing component of the experimental group.

Component Test Mean SD df CI t p Cohen's d Hedge's g

Attending Pre 1.56 0.507 24 [1.361, 1.759] 2.377 0.026* 0.666
Post 1.92 0.572 24 [1.696, 2.144] 2.377 0.026* 0.656

Interpreting Pre 1.44 0.507 24 [1.241, 1.639] 2.864 0.009** 0.703

Post 1.88 0.726 24 [1.595, 2.165] 2.864 0.009** 0.692

Deciding Pre 1.28 0.458 24 [1.100, 1.460] 4.226 0.000** 0.927

Post 1.92 0.862 24 [1.582, 2.258] 4.226 0.000** 0.913

*p < 0.05.**p < 0.001, n = 25.

TABLE 7 Frequencies of the categories. TABLE 8 Frequencies of the intents.

Control Experimental Control

Category
f. % f. % Category Intent f. %
General 37 39% 36 24% General #explain_how 25 26%
Concept 1 1% 3 2% Add & Subtract #add_denominator 16 17%
Parts & Whole 5 5% 12 8% Compare Size #compare_part 15 16%
Compare Size 18 19% 51 34% Formula Related #make_common 8 8%
Add & Subtract 26 27% 21 14% Add & Subtract #add_numerator 6 6%
Formula Related 9 9% 27 18% General #explain_why 4 4%
Sum 96 100% 150 100% Add & Subtract #add_different 4 4%

Note: f: frequency, %: percentage. General #try_other 3 3%

General #ask_correct 3 3%
General #try_model 2 2%
Parts & Whole #represent_fraction 2 2%
terms of the frequencies of the categories (Table 7), the frequencies
Concept #identify_what 1 1%
of the intents (Table 8), and the number of interactions in which at
least one inquiry of each category was made (Table 9). For example, in Parts & Whole #count_whole 1 1%

the control group, a total of 35 interactions took place, and 49% of Parts & Whole #explain_partitioning 1 1%
the participants made 24 interactions containing at least one General Compare Size #compare_whole 1 1%
type of question. We found noteworthy differences between the two Compare Size #compare_one 1 1%
groups. Compare Size #make_equivalent 1 1%
First, the experimental group asked more specific and varied Formula Related #show_formula 1 1%
questions than the control group. Whereas the control group asked Parts & Whole #convert_whole 1 1%
general questions like “How did you solve the problem?”, the ques-
Total 19 96 100%
tions from the experimental group consisted of specific questions like
Experimental
“Are all of the pieces the same size?” As shown in Table 7, in the con-
trol group, 39% of the questions were categorised as general ques- Category Intent f. %

tions, and 61% were specific questions. In the experimental group, General #explain_how 25 17%
24% of the questions were general, and 76% were specific. Given that Formula Related #make_common 20 13%
the control group's percentage of participants that asked at least one Compare Size #compare_one 17 11%
General question was smaller (49%) than the experimental group Compare Size #compare_part 14 9%
(68%) as shown in Table 9, it can be reasoned that general questions Compare Size #compare_half 11 7%
were asked more frequently in each interaction in the control group.
Add & Subtract #add_denominator 10 7%
Looking at Table 8, a wider spectrum of questions were asked by the
Compare Size #make_equivalent 9 6%
experimental group (26 intents) than the control group (19 intents).
General #try_other 6 4%
Second, interestingly, the experimental group asked more
Add & Subtract #add_different 5 3%
formula-related questions (18%) than the control group (9%) as shown
Formula Related #multiply_denominator 5 3%
in Table 7. Supporting that, 52% of the total interactions made by the
experimental group contained formula-related questions, compared to Parts & Whole #explain_fraction 5 3%

18% by the control group (see Table 9). More specifically, questions (Continues)
12 of 19 LEE ET AL.

TABLE 8 (Continued) T A B L E 9 Numbers of interactions in which at least one inquiry in

each category made.
Experimental
Control Experimental
Category Intent f. % Category
Parts & Whole #show_fraction 4 3% # of % of # of % of
Interactions participants Interactions participants
Add & Subtract #add_numerator 3 2%
General 24 49% 17 68%
Concept #explain_concept 2 1%
Concept 1 2% 3 12%
Formula Related #reduce_fraction 2 1%
Parts & 5 10% 6 24%
General #try_model 2 1%
Whole
Add & Subtract #add_more 1 1%
Compare 14 29% 18 72%
Add & Subtract #substract_fraction 1 1% Size
Add & Subtract #add_fraction 1 1% Add & 20 41% 13 52%
Concept #contrast_concept 1 1% Subtract

General #ask_correct 1 1% Formula 9 18% 13 52%

Related
General #compare_rectangles 1 1%
# of Total 35 25
General #state_answer 1 1%
Interactions
Parts & Whole #represent_whole 1 1%
Note: #: number, %: percentage.
Parts & Whole #show_all 1 1%
Parts & Whole #represent_fraction 1 1%
Total 26 150 100% on her understanding. Conversely, the control group touched on com-

Note: f: frequency, %: percentage. paring different fractions (intent: compare_part), but their questions
did not evolve to different Compare Size questions. Given that, the
experimental group's questioning practice seemed more effective in
on making common denominators were more frequently asked (see addressing Jiwoo's misconceptions.
Table 8). These questions aimed at teaching Jiwoo to solve the frac-
tion task correctly rather than exploring Jiwoo's thoughts. This ten-
dency was more evident in the experimental group. For example, a 4.3 | RQ 3. Perceptions of Interaction
participant in the experimental group said, “Do you think the answer
should be different? You should find a common denominator.” We An independent t-test was conducted to examine the difference in
conjecture that with Jiwoo answering their questions, they felt more perception between the control group and the experimental group
encouraged to correct her misconceptions by teaching her the com- after interaction with the AI chatbot. Table 10 shows the t-test
mon denominator method that they were familiar with rather than results. The differences in scores between the two groups regarding
engaging in responsive teaching. preservice teachers' perceptions were all statistically significant, with
Third, converse to the second point, the experimental group's substantial effect sizes observed for all three factors. This indicates
questioning delved into the depth of Jiwoo's misconceptions com- that the participants in the experimental group exhibited significantly
pared to the control group. The experimental group asked more ques- more favourable perceptions on the effectiveness of their question-
tions on comparing sizes of fractions or pieces (34%) than the control ing, their satisfaction with the interactions, and their confidence about
group (19%) as in Table 7. The control group asked more questions on interacting with a real student.
fraction addition and subtraction (27%) than the experimental group
(14%). Although the task was adding fractions, Jiwoo's misconceptions
about adding fractions stemmed from her lack of understanding 5 | IMPLICATIONS
regarding varying sizes of denominators. As in Table 9, 72% of the
participants in the experimental group asked at least one Compare 5.1 | Discussion
Size question compared to 29% of the control group.
Additionally, inspecting the Compare Size intents closely used in There have been few efforts to utilise AI chatbots in mathematics
both groups in Table 8 the experimental group asked more questions teacher education (Datta et al., 2021; Lee & Yeo, 2022), and even
on comparing Jiwoo's solution to one whole (intent: compare_one, fewer studies examined their effects. This experimental study sheds
e.g., “How would you tell whether the amount of flour is more or less light on the impact of a responsive chatbot on preservice teachers'
than a cup?”), after comparing different parts of fractions within the noticing ability. The experimental group that interacted with Jiwoo
fraction problem (intent: compare_part), or comparing them with a ref- who responded to their questions showed a significant improvement
erence point, 1/2 (intent: compare_half). This shows a desirable pat- in their noticing ability. The improvement was most pronounced in
tern of questioning from probing Jiwoo's understanding and building the “deciding how to respond” component. The effect sizes in this
LEE ET AL. 13 of 19

TABLE 10 Independent t-test result of PSTs' perception after interacting with AI chatbot.

Perception Group Mean SD df CI t p Cohen's d Hedge's g

Effectiveness Control 2.36 0.860 24 [2.023, 2.697] 5.023 0.000** 1.421
Experimental 3.52 0.770 [3.218, 3.822] 1.399
Satisfaction Control 2.32 0.900 24 [1.967, 2.673] 4.033 0.000** 1.141
Experimental 3.32 0.852 [2.986, 3.654] 1.123
Confidence Control 2.56 1.121 24 [2.121, 2.999] 3.679 0.001** 1.041
Experimental 3.56 0.768 [3.259, 3.861] 1.024

**p < 0.01.

study were notably larger than the reported effect sizes of similar However, in those approaches, PSTs may not receive learners'
chatbots. According to the recent meta-analysis by Dai et al. (2023), authentic responses to their questions, which is critical in practicing
interventions that employed chatbots utilising NLP and ML in responsive teaching. In responsive teaching, learners' ideas should be
simulation-based learning had effect sizes ranging from 0.32 to 0.52. presented, attended to, and responded to (Jacobs & Empson, 2016).
Questioning is an essential method in responsive teaching that Thus, learners' authentic responses are critical in the development of
helps teachers explore, understand, and extend understanding of stu- PSTs' responsive teaching skills to generate rich and productive math-
dents' thinking. Questioning skills are not developed solely from ematical discussions. In approaches like rehearsal without students or
teaching experiences and should be purposely developed (Casey & peer-to-peer microteaching, PSTs receive less authentic responses. In
Amidon, 2020; Copur-Gencturk & Rodrigues, 2021). The questioning the simulated approaches, PSTs' questions are often limited to a few
practice of the experimental group provides some insights into how options. For example, LessonSketch activities, such as those designed
the questioning skills should be developed. Although some showed a by Webel and Conner (2017), typically only provide a few closed-
tendency to teach a certain method instead of exploring Jiwoo's ended decision points where users are asked to select among multiple
thoughts, a majority, in comparison to the control group, asked more response choices. In real teaching situations, however, teachers are
productive questions that probed her thoughts and delved deeper not prompted with possible options that they could choose, and their
into uncovering her misconceptions. In addition, the experimental teaching moves are open-ended with many pedagogical contingencies
group perceived the interaction as more effective, satisfying, and (Rowland et al., 2005).
confidence-building compared to the control group. Given the signifi- Our approach of using an AI-based chatbot allowed PSTs to enact
cant large effects of the chatbot on PSTs' noticing skills, the chatbot responsive teaching in a more authentic teaching situation. Utilising
can be effectively integrated into various teacher education pro- the unique capabilities of NLP and ML afforded authentic learning
grammes that emphasise responsive teaching practice. In most experiences by allowing users to ask questions in their own words,
teacher preparation programmes, PSTs are not given opportunities to rather than choosing from a menu, and having the chatbot adequately
interact with children until field placement, which limits their develop- respond to their questions. This constitutes a substantial advance-
ment of pedagogical skills. By simulating a virtual student with an AI ment over existing practice-based simulations, which tend to have rel-
chatbot, this limitation can be overcome, enhancing the quality of the atively restricted options for practicing teaching moves. Also, by
programmes, especially for important accreditation requirements, imitating a child's mathematical thinking that reflected the child's
such as instructional practice to provide inclusive learning experiences unique understanding, the chatbot could provide more authentic
(CAEP, 2022). responses to PSTs.
As the importance of practice-based education is emphasised, it Noticing entails (1) attending to children's strategies, (2) interpret-
becomes essential to provide PSTs with opportunities to practice ing their understanding, and (3) deciding how to respond (Jacobs
pedagogical skills in teacher education (National Council of et al., 2022). By showing Jiwoo's specific problem-solving strategies
Teachers of Mathematics, 2014). A key element of practice-based and enabling PSTs to explore Jiwoo's thinking by asking questions,
education is approximations of practice (Roberts & Olarte, 2023). PSTs can be exposed to her problem-solving strategies and mathe-
Several approaches have been utilised to approximate practice, such matical thinking, allowing them to attend to the multi-faceted
as PSTs rehearsing a certain skill without students (Roberts & mathematical understanding of Jiwoo. This also offers various oppor-
Olarte, 2023), peer-to-peer microteaching (Arsal, 2014), video- tunities for PSTs to interpret her level of understanding and decide
based simulations (Codreanu et al., 2020), animated simulation how to respond. The ongoing interaction allows PSTs to continuously
(Estapa et al., 2018; Lee, 2021; Webel & Conner, 2017), or utilising reflect on and adjust their questioning practice based on Jiwoo's
mixed reality simulations (Aguilar & Kang, 2023). These approaches responses, which helps PSTs refine their questing skills.
provided PSTs with opportunities to practice pedagogical skills Our research has shown that AI-based chatbots may provide an
without the challenges and risks associated with PSTs interacting effective, low-stakes, relatively authentic, and personalised learning
with real children. experience where PSTs can practice and develop their responsive
14 of 19 LEE ET AL.

teaching skills before experiencing the full complexity of interacting Third, this study focused on evaluating the impact of the respon-
with real students. The improvement of PSTs' responsive teaching sive chatbot in comparison to the non-responsive one on PSTs' notic-
skills may lead to a more inclusive learning environment where every ing skills. Consequently, our research findings do not offer evidence
student's idea is respected and used as a unique asset of the student regarding whether interacting with the chatbot is more effective than
to be built upon. conventionally used methods or alternative simulation approaches. To
Moreover, we uncovered an interesting tendency. Despite being establish a more comprehensive understanding of the chatbot's com-
trained to practice responsive teaching, many PSTs instinctively parative effectiveness, additional research is needed to assess its
attempted to teach Jiwoo using the common denominator method effects in relation to other pedagogical methods and existing simula-
without exploring her thought process. This inclination was more pro- tion techniques. This broader investigation will provide valuable
nounced in the experimental group, where the interactions were insights into the relative advantages and limitations of incorporating
designed to closely mimic authentic teaching situations, compared to chatbots into teacher education programmes as compared to other
the control group. This finding highlights the possibility that PSTs may approaches.
default to a didactic teaching approach when they are in real teaching Finally, while our research findings indicate a positive and imme-
situations with children even after receiving training in responsive diate impact on the noticing skills of PSTs through their interaction
teaching. It underscores the importance of offering opportunities for with a responsive chatbot, it remains uncertain whether these newly
PSTs to gain practical experience where they can truly grasp and apply acquired skills will transfer to real teaching situations. To address this,
responsive teaching pedagogy. In essence, it highlights the signifi- we recommend future research studies that focus on the long-term
cance of offering approximations of practice where the principles of effects of engaging with a responsive chatbot on PSTs noticing skills.
responsive teaching can be internalised and enacted by PSTs. This Such studies should extend beyond the confines of teacher education
approach may allow them to better resonate with students and incor- courses, encompassing field placements and their subsequent careers
porate responsive teaching into their teaching repertoire. in the teaching profession. This broader perspective would provide a
In conclusion, our findings contribute to the existing knowledge more comprehensive understanding of the enduring benefits and
base by providing an effective example of how AI chatbots can sup- practical applicability of the skills developed during their interaction
port the development of a teaching skill that is important for student with the chatbot.
learning: responding to student thinking by supporting and extending
that thinking. This innovative approach not only enhanced preservice AUTHOR CONTRIBU TIONS
teachers' responsive teaching skills but also emphasised the signifi- Dabae Lee: Conceptualization; funding acquisition; visualization;
cance of approximations of practice in teacher education. Our findings writing – review and editing; writing – original draft; supervision; data
ultimately contribute to a more inclusive and effective learning envi- curation; formal analysis; project administration; validation; methodol-
ronment where all students' ideas are respected and nurtured. ogy; investigation; software. Taekwon Son: Writing – review and
editing; writing – original draft; funding acquisition; investigation; con-
ceptualization; validation; project administration; data curation; meth-
5.2 | Limitations and future research odology; visualization; software. Sheunghyun Yeo: Writing – review
and editing; writing – original draft; funding acquisition; formal analy-
This study bears some notable limitations that warrant consideration. sis; software; resources; methodology; conceptualization.
First, the sample used in this research is not representative of the
broader population of PSTs in the United States, and the sample size CONFLIC T OF INTER E ST STATEMENT
is relatively small. This limitation implies that the generalizability of The authors declare no conflicts of interest.
our findings to a more diverse and extensive population of PSTs may
be limited (Bracht & Glass, 1968; Olsen & Orr, 2016). To improve the DATA AVAILABILITY STAT EMEN T
external validity and statistical power of our study, future research The data that support the findings of this study are available from the
should aim to employ a larger and more diverse sample that better corresponding author upon reasonable request.
reflects the PST population.
Second, it is important to acknowledge that the two experimen- ET HICS S TAT E MENT
tal conditions in our study were administered by different instruc- An IRB approval was obtained for collecting and using data.
tors. While we took measures to standardise the instructional
materials and methods across these conditions, the potential influ- OR CID
ence of instructor differences on the noticing skills of the partici-
Taekwon Son https://orcid.org/0000-0003-4497-9188
pants cannot be entirely ruled out (Schneider et al., 2007; Shadish
et al., 2002). To address this issue and enhance the internal validity RE FE RE NCE S
of our findings, future research could benefit from having the same
Aguilar, J. J., & Kang, S. (2023). Innovating with in-service mathematics
instructor administer both conditions to ensure consistency in teachers' professional development: The intersection among mixed-
instructional delivery. reality simulations, approximation-of-practice, and technology-
LEE ET AL. 15 of 19

acceptance. International Electronic Journal of Mathematics Education, Estapa, A. T., Amador, J., Kosko, K. W., Weston, T., De Araujo, Z., &
18(4), em0750. https://doi.org/10.29333/iejme/13628 Aming-Attai, R. (2018). Preservice teachers' articulated noticing
Arsal, Z. (2014). Microteaching and pre-service teachers' sense of self- through pedagogies of practice. Journal of Mathematics Teacher Educa-
efficacy in teaching. European Journal of Teacher Education, 37(4), 453– tion, 21(4), 387–415. https://doi.org/10.1007/s10857-017-9367-1
464. https://doi.org/10.1080/02619768.2014.912627 Feng, S., & Law, N. (2021). Mapping artificial intelligence in education
Ayalon, M., & Wilkie, K. J. (2020). Developing assessment literacy through research: A network-based keyword analysis. International Journal of
approximations of practice: Exploring secondary mathematics pre- Artificial Intelligence in Education, 31, 277–303. https://doi.org/10.
service teachers developing criteria for a rich quadratics task. Teaching 1007/s40593-021-00244-4
and Teacher Education, 89, 103011. https://doi.org/10.1016/j.tate. Flood, V. J., Shvarts, A., & Abrahamson, D. (2020). Teaching with embodied
2019.103011 learning technologies for mathematics: responsive teaching for
Ball, D. L. (1993). With an eye on the mathematical horizon: Dilemmas of embodied learning. ZDM, 52(7), 1307–1331. https://doi.org/10.1007/
teaching elementary school mathematics. The Elementary School Jour- s11858-020-01165-7
nal, 93(4), 373–397. http://www.jstor.org/stable/1002018 Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Cur-
Balow, C. (2017). The “effect size” in educational research: What is it and rent use, calculations, and interpretation. Journal of Experimental Psy-
how to use it? Illuminate Education. www.illuminateed.com/blog/ chology: General, 141(1), 2–18. https://doi.org/10.1037/a0024338
2017/06/effect-size-educational-research-use/ Fryer, L. K., Nakao, K., & Thompson, A. (2019). Chatbot learning partners:
Beyer, S. (2022). Developing a chatbot for mathematics teachers to sup- Connecting learning experiences, interest and competence. Computers
port digital innovation of subject-matter teaching and learning Society in Human Behavior, 93, 279–289. https://doi.org/10.1016/j.chb.2018.
for Information Technology & Teacher Education International Confer- 12.023
ence 2022, San Diego, CA, United States. https://www.learntechlib. Graesser, A. C., VanLehn, K., Rose, C. P., Jordan, P. W., & Harter, D.
org/p/220914 (2001). Intelligent tutoring systems with conversational dialogue. AI
Bracht, G. H., & Glass, G. V. (1968). The External Validity of Experiments. Magazine, 22(4), 39. https://doi.org/10.1609/aimag.v22i4.1591
American Educational Research Journal, 5(4), 437–474. https://doi.org/ Grossman, P., Hammerness, K., & McDonald, M. (2009). Redefining teach-
10.3102/00028312005004437 ing, re-imagining teacher education. Teachers and Teaching: Theory and
CAEP. (2022). 2022 CAEP standards. https://caepnet.org/standards/ Practice, 15(2), 273–289. https://doi.org/10.1080/135406009028
2022-itp/introduction 75340
Casey, S., & Amidon, J. (2020). Do You See What I See? Formative Assess- Grossman, P., & McDonald, M. (2008). Back to the future: Directions for
ment of Preservice Teachers' Noticing of Students' Mathematical research in teaching and teacher education. American Educational
Thinking. Mathematics Teacher Educator, 8(3), 88–104. https://doi.org/ Research Journal, 45(1), 184–205. https://doi.org/10.3102/0002831
10.5951/mte.2020.0009 207312906
Chiu, T. K. F., Xia, Q., Zhou, X., Chai, C. S., & Cheng, M. (2023). Systematic Guarrella, C., Van Driel, J., & Cohrssen, C. (2023). Toward assessment for
literature review on opportunities, challenges, and future research rec- playful learning in early childhood: Influences on teachers' science
ommendations of artificial intelligence in education. Computers and assessment practices. Journal of Research in Science Teaching, 60(3),
Education: Artificial Intelligence, 4, 100118. https://doi.org/10.1016/j. 608–642. https://doi.org/10.1002/tea.21811
caeai.2022.100118 Hedges, L. V., & Hedberg, E. C. (2007). Intraclass Correlation Values for
Codreanu, E., Sommerhoff, D., Huber, S., Ufer, S., & Seidel, T. (2020). Planning Group-Randomized Trials in Education. Educational evaluation
Between authenticity and cognitive demand: Finding a balance in and policy analysis, 29(1), 60–87. https://doi.org/10.3102/
designing a video-based simulation in the context of mathematics 0162373707299706
teacher education. Teaching and Teacher Education, 95, 103146. Herbst, P., Chazan, D., Chen, C., Chieu, V., & Weiss, M. (2011). Using
https://doi.org/10.1016/j.tate.2020.103146 comics-based representations of teaching, and technology, to bring
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Law- practice to teacher education courses. ZDM, 43(1), 91–103. https://
rence Erlbaum. doi.org/10.1007/s11858-010-0290-5
Cohen, L., Manion, L., & Morrison, K. (2007). Research methods in educa- Herbst, P., Chieu, V., & Rougée, A. (2014). Approximating the practice of
tion. Routledge. mathematics teaching: What learning can web-based, multimedia
Copur-Gencturk, Y., & Rodrigues, J. (2021). Content-specific noticing: A storyboarding software enable? Contemporary Issues in Technology and
large-scale survey of mathematics teachers' noticing. Teaching and Teacher Education, 14(4), 356–383. https://citejournal.org/volume-14/
Teacher Education, 101, 103320. https://doi.org/10.1016/j.tate.2021. issue-4-14/mathematics/approximating-the-practice-of-mathematics-
103320 teaching-what-learning-can-web-based-multimedia-storyboarding-
Crompton, H., Jones, M. V., & Burke, D. (2022). Affordances and chal- software-enable
lenges of artificial intelligence in K-12 education: a systematic review. Howell, H., & Mikeska, J. N. (2021). Approximations of practice as a frame-
Journal of Research on Technology in Education, 1-21, 248–268. work for understanding authenticity in simulations of teaching. Journal
https://doi.org/10.1080/15391523.2022.2121344 of Research on Technology in Education, 53(1), 8–20. https://doi.org/
Dai, C.-P., & Ke, F. (2022). Educational applications of artificial intelligence 10.1080/15391523.2020.1809033
in simulation-based learning: A systematic mapping review. Computers Jacobs, V. R., & Empson, S. B. (2016). Responding to children's mathemati-
and Education: Artificial Intelligence, 3, 100087. https://doi.org/10. cal thinking in the moment: An emerging framework of teaching
1016/j.caeai.2022.100087 moves. ZDM, 48(1–2), 185–197. https://doi.org/10.1007/s11858-
Dai, C.-P., Ke, F., Pan, Y., Moon, J., & Liu, Z. (2023). A meta-analysis on the 015-0717-0
effects of using artificial intelligence powered virtual agents in simulation- Jacobs, V. R., Empson, S. B., Jessup, N. A., Dunning, A., Pynes, D. A.,
based learning. American Educational Research Association. Krause, G., & Franke, T. M. (2022). Profiles of teachers' expertise in
Datta, D., Phillips, M., Bywater, J. P., Chiu, J., Watson, G. S., Barnes, L., & professional noticing of children's mathematical thinking. Journal of
Brown, D. (2021). Virtual pre-service teacher assessment and feed- Mathematics Teacher Education, 27, 295–324. https://doi.org/10.
back via conversational agents. Proceedings of the 16th Workshop on 1007/s10857-022-09558-z
Innovative Use of NLP for Building Educational Applications, Associa- Jacobs, V. R., Lamb, L. L. C., & Philipp, R. A. (2010). Professional noticing of
tion for Computational Linguistics. children's mathematical thinking. Journal for Research in Mathematics
16 of 19 LEE ET AL.

Education, 41(2), 169–202. https://doi.org/10.5951/jresematheduc. National Council of Teachers of Mathematics. (2014). Principles to actions:
41.2.0169 Ensuring mathematical success for all. https://www.nctm.org/Store/
Jeon, J. (2022). Exploring AI chatbot affordances in the EFL classroom: Products/Principles-to-Actions–Ensuring-Mathematical-Success-
Young learners' experiences and perspectives. Computer Assisted Lan- for-All/
guage Learning, 1-26, 1–26. https://doi.org/10.1080/09588221.2021. National Research Council. (2001). Adding it up: Helping children learn
2021241 mathematics. National Academy Press.
Kavanagh, S. S., Metz, M., Hauser, M., Fogo, B., Taylor, M. W., & Nguyen, H. D., Pham, V. T., Tran, D. A., & Le, T. T. (2019). Intelligent tutor-
Carlson, J. (2020). Practicing Responsiveness: Using Approximations ing chatbot for solving mathematical problems in high-school. 11th
of Teaching to Develop Teachers' Responsiveness to Students' Ideas. International Conference on Knowledge and Systems Engineering
Journal of Teacher Education, 71(1), 94–107. https://doi.org/10.1177/ (KSE), Da Nang, Vietnam.
0022487119841884 Oh, E., Song, D., & Hong, H. (2020). Interactive computing technology in
Kharb, L., & Singh, P. (2021). Role of Machine Learning in Modern Educa- anti-bullying education: The effects of conversation-bot's role on K-12
tion and Teaching. In S. Verma & P. Tomar (Eds.), Impact of AI Technol- students' attitude change toward bullying problems. Journal of Educa-
ogies on Teaching, Learning, and Research in Higher Education (pp. 99– tional Computing Research, 58(1), 200–219. https://doi.org/10.1177/
123). IGI Global. https://doi.org/10.4018/978-1-7998-4763-2.ch006 0735633119839177
Kim, K., & Kwon, K. (2023). Exploring the AI competencies of elementary Oh, E. Y., & Song, D. (2021). Developmental research on an interactive
school teachers in South Korea. Computers and Education: Artificial application for language speaking practice using speech recognition
Intelligence, 4, 100137. https://doi.org/10.1016/j.caeai.2023.100137 technology. Educational Technology Research and Development, 69,
Kitcharoen, P., Howimanporn, S., & Chookaew, S. (2024). Enhancing 861–884. https://doi.org/10.1007/s11423-020-09910-1
Teachers' AI Competencies through Artificial Intelligence of Things Olsen, R. B., & Orr, L. L. (2016). On the “Where” of Social Experiments:
Professional Development Training. International Journal of Interactive Selecting More Representative Samples to Inform Policy. New Direc-
Mobile Technologies (iJIM), 18(2), 4–15. https://doi.org/10.3991/ijim. tions for Evaluation, 2016(152), 61–71. https://doi.org/10.1002/ev.
v18i02.46613 20207
Labadze, L., Grigolia, M., & Machaidze, L. (2023). Role of AI chatbots in Richards, J., & Robertson, A. D. (2015). A review of the research on
education: systematic literature review. International Journal of Educa- responsive teaching in science and mathematics. In A. D. Robertson,
tional Technology in Higher Education, 20(1), 1–17. https://doi.org/10. R. E. Scherr, & D. Hammer (Eds.), Responsive teaching in science and
1186/s41239-023-00426-1 mathematics (pp. 54–73). Routledge.
Lampert, M. (2013). Studying teaching as a thinking practice. In J. Greeno & Ritter, S., Anderson, J. R., Koedinger, K. R., & Corbett, A. (2007). Cognitive
S. Goldman (Eds.), Thinking practices in mathematics and science learning tutor: Applied research in mathematics education. Psychonomic Bulle-
(pp. 53–78). Routledge. https://doi.org/10.4324/9780203053119 tin & Review, 14(2), 249–255. https://doi.org/10.3758/BF03194060
Lee, D., & Yeo, S. (2022). Developing an AI-based chatbot for practicing Roberts, S. A., & Olarte, T. R. (2023). Enacting multilingual learner core
responsive teaching in mathematics. Computers & Education, 191, practices: a PST's approximations of practice of mathematics language
104646. https://doi.org/10.1016/j.compedu.2022.104646 routines. Journal of Mathematics Teacher Education. https://doi.org/10.
Lee, M. Y. (2021). Improving preservice teachers' noticing skills through 1007/s10857-023-09600-8
technology-aided interventions in mathematics pedagogy courses. Robertson, A. D., Scherr, R., & Hammer, D. (2015). Responsive teaching in
Teaching and Teacher Education, 101, 103301. https://doi.org/10. science and mathematics. Routledge.
1016/j.tate.2021.103301 Rowland, T., Huckstep, P., & Thwaites, A. (2005). Elementary teachers'
Lee, M. Y., & Lim, W. (2020). Investigating patterns of pre-service teachers' mathematics subject knowledge: The knowledge Quartet and the case
written feedback on procedure-based mathematics assessment items. of Naomi. Journal of Mathematics Teacher Education, 8(3), 255–281.
International Electronic Journal of Mathematics Education, 15(1), https://doi.org/10.1007/s10857-005-0853-5
em0561. https://doi.org/10.29333/iejme/5946 Santagata, R., & Yeh, C. (2014). Learning to teach mathematics and to ana-
Lenat, D. B., & Durlach, P. J. (2014). Reinforcing math knowledge by lyze teaching effectiveness: evidence from a video- and practice-based
immersing students in a simulated learning-by-teaching experience. approach. Journal of Mathematics Teacher Education, 17(6), 491–514.
International Journal of Artificial Intelligence in Education, 24(3), 216– https://doi.org/10.1007/s10857-013-9263-2
250. https://doi.org/10.1007/s40593-014-0016-x Scheiner, T. (2023). Shifting the ways prospective teachers frame and
Liu, R., Stamper, J., Davenport, J., Crossley, S., McNamara, D., Nzinga, K., & notice student mathematical thinking: from deficits to strengths. Edu-
Sherin, B. (2019). Learning linkages: Integrating data streams of multi- cational Studies in Mathematics, 114(1), 35–61. https://doi.org/10.
ple modalities and timescales. Journal of Computer Assisted Learning, 1007/s10649-023-10235-y
35(1), 99–109. https://doi.org/10.1111/jcal.12315 Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J.
Loewenberg Ball, D., & Forzani, F. M. (2009). The work of teaching and (2007). Estimating causal effects: Using experimental and observa-
the challenge for teacher education. Journal of Teacher Education, tional designs (report from the Governing Board of the American Edu-
60(5), 497–511. https://doi.org/10.1177/0022487109348479 cational Research Association Grants Program). (0935302344).
Mathew, A. N., Rohini, V., & Paulose, J. (2021). NLP-based personal learn- Šedlbauer, J., Činčera, J., Slavík, M., & Hartlová, A. (2024). Students' reflec-
ing assistant for school education. International Journal of Electrical and tions on their experience with <scp>ChatGPT</scp>. Journal of Com-
Computer Engineering, 11(5), 4522. https://doi.org/10.11591/ijece. puter Assisted Learning, 40(4), 1526–1534. https://doi.org/10.1111/
v11i5.pp4522-4530 jcal.12967
McDonald, M., Kazemi, E., & Kavanagh, S. S. (2013). Core practices and Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and
pedagogies of teacher education: A call for a common language quasi-experimental designs for generalized causal inference. Wadsworth
and collective activity. Journal of Teacher Education, 64(5), 378–386. Cengage learning.
https://doi.org/10.1177/0022487113493807 Shaughnessy, M., & Boerst, T. A. (2018). Uncovering the skills that preser-
National Council of Supervisors of Mathematics and National Council of vice teachers bring to beacher education: The practice of eliciting a
Teachers of Mathematics. (2019). Building STEM education on a student's thinking. Journal of Teacher Education, 69(1), 40–55. https://
sound mathematical foundation. https://www.nctm.org/Standards- doi.org/10.1177/0022487117702574
and-Positions/Position-Statements/Building-STEM-Education-on-a- Sherin, M. G., Jacobs, V. R., & Philipp, R. A. (2010). Mathematics teacher
Sound-Mathematical-Foundation/ noticing. Routledge.
LEE ET AL. 17 of 19

Son, T., Yeo, S., & Lee, D. (2024). Exploring elementary preservice Webel, C., & Yeo, S. (2021). Developing skills for exploring children's think-
teachers' responsive teaching in mathematics through an artificial ing from extensive one-on-one work with students. Mathematics
intelligence-based Chatbot. Teaching and Teacher Education, 146, Teacher Educator, 10(1), 84–102. https://doi.org/10.5951/mte.2020-
104640. https://doi.org/10.1016/j.tate.2024.104640 0003
Tamayo, P. A., Herrero, A., Martín, J., Navarro, C., & Tránchez, J. M. (2020). Wu, R., & Yu, Z. (2024). Do <scp>AI</scp> chatbots improve students
Design of a chatbot as a distance learning assistant. Open Praxis, 12(1), learning outcomes? Evidence from a meta-analysis. British Journal of
145–153. https://doi.org/10.5944/openpraxis.12.1.1063 Educational Technology, 55(1), 10–33. https://doi.org/10.1111/bjet.
Theelen, H., Beemt, A., & Brok, P. (2019). Using 360-degree videos in 13334
teacher education to improve preservice teachers' professional inter- Xu, Y., Wang, D., Collins, P., Lee, H., & Warschauer, M. (2021). Same bene-
personal vision. Journal of Computer Assisted Learning, 35(5), 582–594. fits, different communication patterns: Comparing Children's reading
https://doi.org/10.1111/jcal.12361 with a conversational agent vs. a human partner. Computers & Educa-
Volker, M. A. (2006). Reporting effect size estimates in school psychology tion, 161, 104059. https://doi.org/10.1016/j.compedu.2020.104059
research. Psychology in the Schools, 43(6), 653–672. https://doi.org/ Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Sys-
10.1002/pits.20176 tematic review of research on artificial intelligence applications in
Webel, C., Conner, K., & Zhao, W. (2018). Simulations as a tool for practic- higher education – where are the educators? International Journal of
ing questioning. In O. Buchbinder & S. Kuntze (Eds.), Mathematics Educational Technology in Higher Education, 16(1), 1–27. https://doi.
teachers engaging with representations of practice (pp. 95–112). Springer. org/10.1186/s41239-019-0171-0
Webel, C., & Conner, K. A. (2017). Using simulated teaching experiences
to perturb preservice teachers' mathematics questioning practices.
Mathematics Teacher Educator MTE, 6(1), 9–26. https://doi.org/10.
5951/mathteaceduc.6.1.0009 How to cite this article: Lee, D., Son, T., & Yeo, S. (2025).
Webel, C., & Hopkins, S. (2019). How an immersiveschool-based partner- Impacts of interacting with an AI chatbot on preservice
ship surfaced critical tensionsin practice for preservice teachers. In S. teachers' responsive teaching skills in math education. Journal
Otten, A. G. Candela, Z. d. Araujo, C. Haines, & C. Munter (Eds.), Pro-
of Computer Assisted Learning, 41(1), e13091. https://doi.org/
ceedings of the 41st Annual Meeting of the North American Chapter of
the International Group for the Psychology of Mathematics Education 10.1111/jcal.13091
(pp. 671–675). University of Missouri.
18 of 19 LEE ET AL.

APP E NDIX A: PRE/POST-TEST FOR NOTICING

A.1 | PRE-TEST

Watch Mathew's solution strategy (4 ⅛), then respond to the following prompts.
1. Describe what you noticed in the video clip.
2. Interpret or analyze what you noticed.
3. Describe how you would support the student if you were the teacher.

A.2 | POST-TEST

Watch Emma's solution strategy (2 1/4–1 ⅜), then respond to the following prompts.
1. Describe what you noticed in the video clip.
2. Interpret or analyze what you noticed.
3. Describe how you would support the student if you were the teacher.

APP E NDIX B : REFLECTION SURVEY

Effectiveness, Satisfaction and Confidence in Interacting with Mathbot

Based on your interaction with Mathbot, please select the response that best represents your experience.
1. My questioning skills were effective.
□ Strongly Agree □ Agree □ Neutral □ Disagree □ Strongly Disagree
2. I am satisfied with my interaction with Mathbot.
□ Strongly Agree □ Agree □ Neutral □ Disagree □ Strongly Disagree
3. I feel confident about interacting with a student with similar misconceptions.
□ Strongly Agree □ Agree □ Neutral □ Disagree □ Strongly Disagree
LEE ET AL. 19 of 19

APP E NDIX C : FINAL CODING SCHEME

Category Code/Intent Description

General explain_how asking the student to explain the process of the solution.
Questions on the overall solution, try_other asking the student to try other strategies to solve the problem
strategy, and answer try_model asking the student to try other models such as an area or length model.
ask_correct asking the student if the student thinks her answer is correct
compare_rectangles asking the student to compare the two rectangles that she drew in her solution
state_answer asking the student what the answer is.
explain_why asking the student to explain why the student used the current strategy.
Concept explain_concept asking the student to explain a certain concept related to fractions
Questions that check conceptual contrast_concept asking the student to compare different concepts related to fractions
understanding identify_what asking the student to identify the numerator or denominator of a fraction.
Parts & Whole explain_fraction asking the student to connect a part of a rectangle to a denominator or
numerator
Questions on parts and whole of show_fraction asking the student to show what a certain fraction represent in the picture
fractions or represent_whole asking the student what a rectangle represents
pieces of the rectangles in the student's
solution show_all asking the student to put together all parts/pieces
represent_fraction asking the student what the red or blue rectangle represents
count_whole asking the student how many wholes there are
explain_partitioning asking the student to explain how or why she partitioned the rectangles in her
solution.
convert_whole asking the student to represent a whole in a fraction form. For example, What is
the whole of 3/6?
Compare Size compare_one asking the student to compare all combined parts to one
Questions on comparing sizes of compare_part asking the student to compare unequal parts of the drawing/fractions
fractions compare_half asking the student to compare a fraction against half
make_equivalent asking the student if a fraction can be converted to an equivalent fraction
compare_whole asking the student to compare the wholes of the fractions
Add & Subtract add_denominator asking the student how they got the current denominator
Questions on addition and subtraction of add_different asking the student to add different parts or sizes
fractions add_numerator asking the student to add numerators only.
add_more asking the student to add some more to a certain fraction or number
substract_fraction asking the student to subtract a fraction from another fraction
add_fraction asking the student about adding two fractions
Formula Related make_common asking the student how to make a common denominator
Questions to ask her to use certain multiply_denominator asking the student to multiply denominators
formula reduce_fraction asking the student to convert a fraction to a reduced form
in her solution
show_formula asking the student to show the number expression of the formula