Abstract.
BACKGROUND: Atypical visual processing and social attention patterns are
commonly observed in children with Autism Spectrum Disorder (ASD), making
the application of eye-tracking technology an area of growing interest in the re-
search community. However, most existing studies focus on isolated aspects within
the fields of special education or psychology, without offering technical approaches
for identifying these visual features.
OBJECTIVE: The purpose of this paper is to develop an eye-tracking method that
provides detailed visual features of children with ASD.
METHODS: We designed an eye-tracking-based method according to the requirements
of applying eye-tracking technology across various aspects of supporting children with
ASD. We conducted an experiment with 29 typically developing (TD) children and
26 children with ASD.
RESULTS: The results demonstrate differences in visual processing and attention
between children with ASD and their typically developing peers when viewing
stimuli with and without social elements that have been evaluated by special ed-
ucation experts as beneficial in supporting ASD children. Additionally, an initial
pre-screening of children with ASD using the extracted statistical eye movement
features with an SVM model achieved an accuracy of 90.91%.
CONCLUSIONS: Experts can utilize this method to analyze visual behavior, support
diagnosis, adjust intervention strategies, and evaluate the effectiveness of interventions
for children with ASD, thereby swiftly identifying and addressing their challenges,
enhancing their learning abilities, and improving their social integration and
quality of life.
Keywords. Autism Spectrum Disorder, eye-tracking technology, visual features,
classification
1 Corresponding Author: Thi Duyen Ngo, Human-Machine Interaction Laboratory, Faculty of Information
Technology, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam. E-mail:
[email protected].
1. Introduction
To the best of our knowledge, little research has aimed at developing a comprehensive
solution for the automatic collection, preprocessing, and extraction of visual features.
Such information is crucial for addressing various tasks, including the diagnosis of ASD,
interventions for children with ASD, and the evaluation of these interventions.
In this paper, we propose a comprehensive eye-tracking-based method aimed
at providing visual features of children with ASD. The method supports various
applications for these children, including detection, severity classification,
intervention, and the evaluation of intervention effectiveness. Furthermore, these ex-
tracted visual features and their visualizations can assist experts in diagnosing and de-
signing more tailored intervention strategies for each child, providing recommendations
to address existing challenges, teach essential skills, and enhance opportunities for social
integration. The experimental findings demonstrate that the proposed method effectively
distinguishes differences in visual processing between children with ASD and typically
developing (TD) children when exposed to social and non-social visual stimuli, corrobo-
rating observations made in clinical practice. These differences, validated by special ed-
ucation experts, are recognized as valuable in supporting interventions for children with
ASD. Furthermore, by utilizing the extracted statistical eye movement features as input
for a machine learning algorithm, the results reveal the method’s significant potential for
the automated pre-screening of children with ASD.
The remainder of this paper is structured as follows: Section 2 reviews related pre-
vious research. Section 3 illustrates the proposed methodology for providing visual fea-
tures in children with ASD. Section 4 details the experimental procedures and presents
the results. Finally, Section 5 provides the conclusion and discusses potential directions
for future research.
2. Related works
In recent years, eye-tracking technology has proven effective in detecting atypical gaze
behavior related to social interactions in children with ASD. An individual’s interest and
attention to an object can be determined using various measures: the number of fixations,
the average duration of eye fixations, the amount of time spent on a visual stimulus, and
the ability to shift gaze. Fixation and saccade analysis has been applied in most
studies [11, 12, 13]. A fixation refers to a period during which our visual gaze remains
at a specific location, whereas a saccade describes the rapid eye movements between
two successive fixation points [14]. Some studies have applied eye-tracking technology
to analyze attentional patterns, cognitive development, learning abilities, and social in-
teractions, revealing a reduced preference for social stimuli in ASD children compared
to their TD peers. Sasson et al. [15] found that children with ASD showed a greater
decrease in fixation time and sustained fixation on faces when objects of circumscribed
interests (e.g., trains) were present, indicating a broad impact on several aspects of social
attention. Differences in visual attention were also evident, as TD children showed a sig-
nificant increase in both fixation counts and fixation duration on the eyes, whereas chil-
dren with ASD exhibited a significant increase in fixation duration on the mouth [17, 18].
In addition to findings consistent with these studies, Julia Vacas et al. [16] discovered
that children with ASD exhibited heightened emotional sensitivity, demonstrating atyp-
ical visual orientation towards objects when these objects competed with neutral faces.
They tend to identify positive emotions, such as happiness, more easily than negative
emotions, such as anger [19, 20]. In learning contexts, Thompson et al. [21] discovered
that ASD children engaged with e-books for only half the time they were displayed, with
half of that time focused on salient stimuli, and showed slightly better attention to print
when text was both read aloud and highlighted compared to when it was only presented
or read aloud. These findings underscore the potential of eye-tracking technology for
the early detection and assessment of ASD.
Based on the identification of distinctive characteristics in visual processing and
social attention in children with ASD, several studies have focused on applying eye-
tracking technology to support the diagnosis of this condition. Guobin Wan et al. [23]
showed children aged 4–6 a 10-second video of a woman speaking, and used
fixation duration on the mouth and body, along with a Support Vector Machine (SVM)
classifier, to distinguish between children with ASD and TD children, achieving an accu-
racy of 85.1%. Jessica S. Oliveira et al. [24] proposed a computational method that in-
tegrates concepts of Visual Attention Models (VAM), image processing techniques, and
artificial intelligence to develop a model for supporting the diagnosis of children with
ASD, utilizing ET data. Jiannan Kang et al. [25] combined Electroencephalogram and
ET data while ASD children viewed own-race and other-race stranger faces, using an
SVM classification model to achieve an accuracy of 85.44%. Ibrahim Abdulrab Ahmed
et al. [26] used fixations and saccades as ET input features, combining machine learning
and deep learning techniques to diagnose ASD in children, achieving a high accuracy of
99.8%.
Intervention is typically regarded as a process designed to support individuals facing
challenges in the learning, behavioral, emotional, and social domains due to deficiencies in
specific skills. Eye-tracking can be a valuable tool for designing plans to enhance learn-
ing in children with ASD, making the intervention process more effective. Fabienne Giu-
liani et al. [27] combined eye-tracking with the TEACCH intervention method to support
two adolescents with ASD, aged 14 and 16. ET data were used to assess visual charac-
teristics and make tailored adjustments to their intervention programs. Quan Wang et al.
[28] implemented gaze-contingent adaptive cueing to guide children with ASD towards
typical looking patterns of an actress in videos, finding that this approach effectively im-
proved their attention to social faces on screen. Additionally, some studies have proposed
combining eye-tracking systems with virtual reality (VR) to enhance communication,
joint attention, and learning in children with ASD [29, 30]. Eye-tracking technology has
the potential to offer an objective assessment of the effectiveness of intervention meth-
ods and to test hypotheses related to early intervention theories for children with ASD.
Kim et al. [31] conducted a quality review of technology-assisted reading interventions
for ASD students using eye-tracking technology, finding that these technologies can ben-
efit these students in learning various reading skills, such as word recognition through
images, and vocabulary comprehension. Additionally, Trembath et al. [32] investigated
the hypothesis that children with ASD learn more effectively when responding to visual
instructions (e.g., images) compared to verbal instructions.
However, most of these studies focus on addressing individual issues in psychology
and special education. Typically, they offer visual information that specialists can use to
propose specific solutions for supporting individuals with ASD without providing clear
technical and technological descriptions. In several countries, during autism interven-
tion, teachers and specialists need extensive visual information about the children, but
this information is mainly gathered through their observations. As a result, the outcomes
depend on the observers’ skills and experience, leading to subjective, potentially inaccu-
rate, or incomplete results. Furthermore, to the best of our knowledge, there has been lit-
tle research aimed at developing a comprehensive solution for the automatic collection,
preprocessing, and extraction of visual features. Such information could address various
issues, including autism diagnosis, intervention, and intervention assessment.
3. Methodology
The visual stimuli integration module receives input in the form of a list of visual stimu-
lus images provided by special education experts or teachers. A visual stimulus can be an
image or video containing content (e.g., a human face, social scene, food, toy) designed
to stimulate the viewer’s visual attention, selected based on the specific objectives of
the task at hand. Given the tendency of children with ASD to show reduced attention to
social factors, the selection of visual stimuli content typically ensures a comprehensive
representation of both social elements (e.g., faces, conversations) and nonsocial elements
(e.g., toys, vehicles), allowing for a thorough and complete assessment of their visual
attention characteristics. The selected visual stimuli are resized to fit the screen dimen-
sions. They are then compiled into a video sequence with the following structure: ini-
tially, a red “+” symbol is displayed on a gray background for 0.5 seconds to capture the
child’s attention. This is followed by the presentation of the visual stimuli for
t_present = 5 seconds to ensure children’s sustained attention and sufficient gaze data
collection. Afterward, the red “+” symbol reappears on a gray background for another
0.5 seconds to
regain the child’s focus. This sequence is repeated until all the visual stimuli have been
presented. Figure 2 depicts an example of a visual stimuli video. To prevent the exper-
iment from becoming overly lengthy and to maintain the child’s focus, the number of
images is limited to 12.
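To make the timing protocol concrete, the sketch below assembles such a presentation timeline. It is a minimal illustration under our own naming assumptions (the Segment class, build_timeline function, and constant names are not from the paper); only the durations and the 12-image cap come from the description above.

```python
from dataclasses import dataclass

FIXATION_CROSS_S = 0.5   # red "+" on a gray background
T_PRESENT_S = 5.0        # per-stimulus display time
MAX_STIMULI = 12         # cap to keep the session short

@dataclass
class Segment:
    kind: str            # "cross" or "stimulus"
    source: str          # image path ("" for the cross)
    duration_s: float

def build_timeline(stimulus_paths):
    """Interleave a 0.5 s fixation cross with each 5 s stimulus."""
    segments = []
    for path in stimulus_paths[:MAX_STIMULI]:
        segments.append(Segment("cross", "", FIXATION_CROSS_S))
        segments.append(Segment("stimulus", path, T_PRESENT_S))
    segments.append(Segment("cross", "", FIXATION_CROSS_S))  # closing cross
    return segments
```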
The eye movement recording module captures the children’s eye movements in response
to each visual stimulus displayed on the monitor, storing the raw data within the system.
During the experimental process, the system must be connected to an eye-tracking de-
vice. This device features an infrared camera mounted on a lightweight frame that can
be attached to the monitor. The eye tracker emits infrared light onto the participant’s
eye, with a portion of the light reflecting off the cornea. The method then identifies the
center of the eye (the point where the gaze direction passes through) and the location
of the corneal reflection by capturing images of the participant’s eye using one or more
infrared cameras within the eye tracker. The gaze vector is determined from the infor-
mation about the eye’s center and the corneal reflection point. Finally, the coordinates
of the gaze point on the screen are identified as the intersection of the gaze vector with
the screen. Before starting ET recording, it is essential to calibrate the eye tracker for
each participant. During recording, the eye movement signals captured by the device are
represented within the system as a sequence of points (x, y, t), where x and y denote the
gaze coordinates on the screen, and t represents the timestamp for each data point. The
sequence of timestamps depends on the sampling rate of the eye tracker used.
The ET signals collected may contain noise, which can arise from imperfections in the
data collection setup, environmental interferences during the experiments, as well as
minor eye movements such as tremors or micro-saccades. This noise can significantly
affect the performance and accuracy of eye movement characteristic calculations. In this
study, we minimized noise by applying a moving average filter with a window size of
3 to the x and y coordinate sequences independently. Figure 3 illustrates an example of
the results obtained from this signal denoising process. With proper adjustments, this
method effectively eliminates high-frequency oscillations in the data; however, excessive
smoothing could potentially result in data loss.
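As an illustration, a centered moving average with window size 3, applied independently to the x and y sequences, can be written as follows. Edge handling via nearest-neighbor padding is our assumption; the paper does not specify it.

```python
import numpy as np

def moving_average(values: np.ndarray, window: int = 3) -> np.ndarray:
    """Centered moving average over a 1-D gaze coordinate sequence.

    Edges are padded with the nearest sample (an assumption, since the
    paper does not state its edge handling), keeping output length equal
    to input length.
    """
    pad = window // 2
    padded = np.pad(values, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Usage: smooth x and y independently, as described above.
# x_smooth = moving_average(x_raw, window=3)
# y_smooth = moving_average(y_raw, window=3)
```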
Figure 3. An example of ET signals before (left) and after (right) denoising. The signals become smoother,
while still preserving the key features of the sampled data after applying the moving average filter.
ET data loss can occur when the eye tracker is unable to estimate the pupil’s direc-
tion from the captured eye image, often due to children blinking (an action that may ac-
count for approximately 2% of the recorded data) or when the child’s gaze shifts outside
the tracking range of the device. In children with ASD, particularly those with impaired
attention spans, the likelihood of ET signal loss may be higher compared to TD children.
Reconstructing ET signals through interpolation is crucial, as it not only ensures smooth
transitions between adjacent data points but also preserves the key characteristics of pupil
size, namely its temporal continuity and slow variation over hundreds of milliseconds
[37]. Several interpolation methods have been employed to manage missing
ET data (e.g., linear interpolation, polynomial interpolation, spline interpolation); how-
ever, linear interpolation is the most commonly used method due to its simplicity and
lowest mean error relative to the ground truth [38]. Let $(x', y', t')$ be the missing point,
and $(x_1, y_1, t_1)$, $(x_2, y_2, t_2)$ be the two known points closest in time to the
missing point. The linear interpolation is given by Equation 1:

$$x' = x_1 + \frac{(t' - t_1)(x_2 - x_1)}{t_2 - t_1}, \qquad y' = y_1 + \frac{(t' - t_1)(y_2 - y_1)}{t_2 - t_1}. \quad (1)$$
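A direct sketch of Equation 1 in Python; the tuple layout and function name are illustrative.

```python
def interpolate_missing(t, p1, p2):
    """Linear interpolation of a missing gaze sample (Equation 1).

    p1 = (x1, y1, t1) and p2 = (x2, y2, t2) are the valid samples closest
    in time before and after the gap; returns the estimated (x', y').
    """
    x1, y1, t1 = p1
    x2, y2, t2 = p2
    alpha = (t - t1) / (t2 - t1)          # fractional position within the gap
    return x1 + alpha * (x2 - x1), y1 + alpha * (y2 - y1)
```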
Area of Interest (AOI) analysis is a technique that assigns eye movements to specific
regions within a visual scene [14]. Each of these regions contains an object that may be
related to humans, such as a face, eyes, or mouth, or could represent animals or inan-
imate objects displayed at particular coordinates on the monitor. Unlike obtaining eye
movement measures across the entire scene, AOI analysis offers semantically targeted
eye movement metrics that are especially valuable for research focused on attention. In
previous research utilizing eye-tracking technology to support children with ASD, AOIs
were delineated entirely manually. This approach heavily relied on the expertise and
knowledge of the individual segmenting the visual stimulus, which could potentially in-
troduce errors. Moreover, given that the proposed method aims to support intervention,
intervention assessment, and personalization, it is crucial to establish a method capa-
ble of accurately processing large and diverse sets of visual stimuli. In this study, we
have suggested two distinct methods for identifying AOIs using the Segment Anything
Model. These methods include automatic AOIs identification and semi-automatic AOIs
identification.
Figure 4. An example of using SAM to automatically recognize all internal objects in an image.
When used to support intervention and personalization, the identified AOIs must be
applicable to a diverse range of children, reusable, and highly accurate. In this context,
utilizing the semi-automatic AOI identification method yields superior results.
Specifically, a labeler se-
lects an object, and the SAM then automatically determines the mask for that object. The
labeler subsequently refines this mask to achieve higher quality. According to research
on the SAM model, this semi-automatic labeling method is faster than manual labeling
from scratch, with the average time per mask decreasing from 34 seconds to 14 sec-
onds. Besides object selection by clicking, the SAM model can also receive prompts by
drawing a bounding box or sketching a rough mask over an object.
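A hedged sketch of the click-to-mask step using the public segment-anything package: the helper below returns the highest-scoring candidate mask for a single foreground click. The helper name and click coordinates are ours; the checkpoint file must be downloaded from the SAM repository.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Checkpoint path is a placeholder; obtain it from the SAM release.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def mask_from_click(image_rgb: np.ndarray, x: int, y: int) -> np.ndarray:
    """Return the best SAM mask for one foreground click at (x, y)."""
    predictor.set_image(image_rgb)             # HxWx3 uint8, RGB order
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),            # 1 marks a foreground click
        multimask_output=True,                 # returns 3 candidate masks
    )
    return masks[np.argmax(scores)]            # HxW boolean AOI mask
```

The labeler can then refine this boolean mask by hand, matching the semi-automatic flow described above.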
Given as input a sequence of gaze points obtained from the eye movement recording
module, the feature extraction module calculates the children’s eye movement features.
These extracted features include fixation, identified using the PeyeMMV algorithm [35],
and saccade, determined through the I-VT algorithm. Both the velocity threshold and
dispersion threshold-based algorithms have been shown to achieve high accuracy and
computational efficiency when applied in practice [36].
Fixations are identified using the PeyeMMV algorithm, which utilizes a two-step
spatial threshold (denoted as parameters t1 and t2 ) along with a minimum duration thresh-
old. Figure 5 illustrates the three steps of PeyeMMV to determine the list of fixations.
Beginning from the initial point, the mean coordinates are computed as long as the Eu-
clidean distance between this mean point and the current point remains less than t1 . If
the distance exceeds t1 , a new fixation cluster is established. Subsequently, the distance
between each point within a cluster and the cluster’s mean point is calculated, with any
point exceeding the threshold t2 being excluded from the cluster. The duration of each
fixation is determined by the time difference between its start and end points. Fixation
clusters with a duration shorter than the minimum specified value are removed from the
list. A calculated fixation is formatted as (x, y, d, t_start, t_end, n_points), where x, y
represent the coordinates of the fixation point, d = t_end − t_start is the duration of the
fixation, and n_points is the number of gaze points in that fixation cluster.
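The sketch below restates this two-step clustering logic in Python. It is a simplified illustration, not the reference PeyeMMV implementation [35]; all names are ours, and it assumes a non-empty list of time-sorted gaze points.

```python
import math

def detect_fixations(points, t1, t2, min_dur):
    """Simplified two-step spatial-threshold fixation detection.

    points: time-sorted list of (x, y, t); t1, t2 in pixels; min_dur in
    seconds. Returns (x, y, d, t_start, t_end, n_points) tuples.
    """
    clusters, current = [], [points[0]]
    for p in points[1:]:
        # Step 1: grow the cluster while the point stays within t1 of the mean.
        mx = sum(q[0] for q in current) / len(current)
        my = sum(q[1] for q in current) / len(current)
        if math.hypot(p[0] - mx, p[1] - my) < t1:
            current.append(p)
        else:
            clusters.append(current)
            current = [p]
    clusters.append(current)

    fixations = []
    for c in clusters:
        mx = sum(q[0] for q in c) / len(c)
        my = sum(q[1] for q in c) / len(c)
        # Step 2: drop points farther than t2 from the cluster mean.
        kept = [q for q in c if math.hypot(q[0] - mx, q[1] - my) <= t2]
        if not kept:
            continue
        d = kept[-1][2] - kept[0][2]
        # Step 3: discard clusters shorter than the minimum duration.
        if d >= min_dur:
            fx = sum(q[0] for q in kept) / len(kept)
            fy = sum(q[1] for q in kept) / len(kept)
            fixations.append((fx, fy, d, kept[0][2], kept[-1][2], len(kept)))
    return fixations
```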
The extracted fixation data can be further processed in the subsequent step for
visualization or integrated with the list of AOIs identified in the AOI recognition
module. The calculated statistical features include fixation_duration, fixation_count,
and time_to_first_fixation: fixation_duration is the total time, in seconds, spent on
fixations within the target AOI; fixation_count is the number of fixations within the
target AOI; and time_to_first_fixation is the interval, in seconds, between the onset of
the visual stimulus and the first fixation on the target AOI.
The I-VT algorithm for saccade detection employs a velocity threshold to identify
which gaze points belong to a saccade. It starts by calculating the velocity between
consecutive points. If the velocity exceeds the threshold, the gaze point is marked as the
beginning of a saccade; the first subsequent point whose velocity falls below the
threshold marks the end of the saccade. The duration of the saccade is then calculated
as the time difference between its end and start points, and saccades with a duration
shorter than the minimum duration threshold are excluded.
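A minimal I-VT-style sketch, assuming gaze points sorted by time and a velocity threshold in pixels per second; names and units are illustrative, and a saccade still open at the end of the recording is simply dropped.

```python
import math

def detect_saccades(points, v_threshold, min_dur):
    """I-VT-style saccade detection sketch.

    points: time-sorted list of (x, y, t); v_threshold in px/s; min_dur in
    seconds. A run of consecutive above-threshold samples forms one saccade.
    """
    saccades, start = [], None
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        v = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        if v >= v_threshold:
            if start is None:
                start = t0                  # saccade onset
        elif start is not None:
            if t0 - start >= min_dur:       # drop too-short saccades
                saccades.append((start, t0))
            start = None
    return saccades                          # list of (t_start, t_end)
```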
3.7. Visualization
The developed method aims to provide detailed visual characteristics of children with
ASD. One application of this method is to offer visual insights to assist professionals
and educators. Therefore, it is essential to present the results in a manner that enables
experts to easily observe eye movements, facilitating accurate diagnoses, classifications,
and intervention strategies for the children. The visualization module is employed to
display the features obtained from previous modules. In this module, the list of fixations
from the prior step is used to illustrate the information through heatmaps and scanpaths.
Heatmaps can illustrate the spatial distribution of a child’s eye movements in re-
sponse to visual stimuli. While they do not depict the sequence of fixations, they analyze
the spatial distribution of fixation points, highlighting areas within the visual stimulus
where the child focuses more or less. In the proposed method, heatmaps are generated
by applying a Gaussian mask to the input visual stimulus to visualize fixation frequency
through color representation. Initially, the system creates a Gaussian distribution repre-
senting fixation points as a 2D Gaussian mask of specified size. Using this Gaussian dis-
tribution and the fixation array, a heat map is produced, normalized, and converted into
an 8-bit grayscale representation. For the provided input stimulus image, the heatmap
is resized to match the image dimensions, and a color map is applied to produce a
colored heatmap. This color-encoded heatmap is then overlaid onto the input image.
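A compact sketch of this pipeline with OpenCV; the Gaussian width, blend weight, and color map choice (JET) are our assumptions, since the paper does not fix them.

```python
import cv2
import numpy as np

def fixation_heatmap(image_bgr, fixations, sigma=40, alpha=0.5):
    """Overlay a Gaussian fixation heatmap on a stimulus image.

    fixations: iterable of (x, y, duration) in image coordinates.
    sigma and alpha are illustrative defaults, not values from the paper.
    """
    h, w = image_bgr.shape[:2]
    heat = np.zeros((h, w), dtype=np.float32)
    for x, y, d in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:
            heat[yi, xi] += d                      # duration-weighted hits
    heat = cv2.GaussianBlur(heat, (0, 0), sigma)   # spread each fixation
    heat = cv2.normalize(heat, None, 0, 255, cv2.NORM_MINMAX)
    heat8 = heat.astype(np.uint8)                  # 8-bit grayscale map
    colored = cv2.applyColorMap(heat8, cv2.COLORMAP_JET)
    return cv2.addWeighted(image_bgr, 1 - alpha, colored, alpha, 0)
```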
Scanpaths illustrate the eye movement patterns during a task, consisting of a se-
quence of fixation points and saccades. Typically, scanpaths are represented as a series
of connected nodes (denoting fixations) and edges (indicating saccadic movements be-
tween consecutive fixations) overlaid on the visual stimulus image. The method gener-
ates a mask featuring fixations with sizes proportional to the duration of each fixation,
and lines connecting successive fixations representing saccadic movements. The size of
each fixation is determined relative to the duration of the longest fixation, with other fix-
ations scaled accordingly. By utilizing scanpaths, experts can analyze the areas of focus,
the initial points of gaze, the regions receiving the most attention, and the sequence of
eye movements within the visual stimulus data.
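A matching sketch for scanpath rendering; colors, line width, and the minimum node radius are illustrative choices, and at least one fixation is assumed.

```python
import cv2

def draw_scanpath(image_bgr, fixations, max_radius=30):
    """Draw fixation circles scaled by duration, linked by saccade lines.

    fixations: temporally ordered list of (x, y, duration).
    """
    out = image_bgr.copy()
    longest = max(d for _, _, d in fixations)      # scale reference
    pts = [(int(x), int(y)) for x, y, _ in fixations]
    for p, q in zip(pts, pts[1:]):                 # saccade edges
        cv2.line(out, p, q, (0, 255, 0), 2)
    for (x, y, d), p in zip(fixations, pts):       # fixation nodes
        r = max(4, int(max_radius * d / longest))  # radius ~ duration
        cv2.circle(out, p, r, (0, 0, 255), 2)
    return out
```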
Another critical aspect of the proposed method is its potential to assist specialists
in the early identification of children with ASD. Eye-tracking features have demonstrated
potential as biomarkers for identifying atypical social attention and possible
visual information-processing deficits in children with ASD. In this study, to evalu-
ate the effectiveness of the extracted system features in supporting the early diagno-
sis of children with ASD automatically, we utilized these statistical features of eye
movements derived from the feature extraction module as input for the machine
learning classification method. The three primary eye movement features used were
time_to_first_fixation, fixation_duration, and fixation_count for each of the
12 visual stimuli; these features were then combined into a single input feature vector
and fed into a machine learning model to classify children as either ASD or TD. We
employed the SVM classifier, with parameters selected through a grid search algorithm.
SVM is a supervised learning algorithm that has been previously utilized for distinguish-
ing between individuals with and without ASD [23, 39]. The primary objective of the
SVM classifier is to construct an optimal hyperplane within a multidimensional space
using labeled training data. Classification of testing samples is performed based on the
sign of their distance from the hyperplane, with the magnitude of this distance indicat-
ing the likelihood of the samples belonging to a particular category. Given the limited
amount of data, we trained this SVM model using a 5-fold cross-validation approach.
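The classification setup can be sketched with scikit-learn as below. The feature file paths, the grid values, and the added standardization step are our assumptions, not details reported in the paper; nesting the grid search inside the outer 5-fold loop is one reasonable reading of the described protocol.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row per child, 36 features = 3 metrics x 12 stimuli
# (time_to_first_fixation, fixation_duration, fixation_count per AOI).
# y: 1 = ASD, 0 = TD. File names are placeholders, not released data.
X = np.load("features.npy")
y = np.load("labels.npy")

param_grid = {
    "svc__C": [0.1, 1, 10, 100],          # illustrative grid values
    "svc__gamma": ["scale", 0.01, 0.001],
    "svc__kernel": ["rbf", "linear"],
}
pipe = make_pipeline(StandardScaler(), SVC())
search = GridSearchCV(pipe, param_grid, cv=5)   # grid search over SVM params
scores = cross_val_score(search, X, y, cv=5)    # 5-fold CV accuracy estimate
print(f"mean accuracy: {scores.mean():.4f}")
```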
4.1. Participants
Figure 6. The 12 images appearing in the stimuli video. Each stimulus was labeled sequentially from obj1 to
obj12, arranged from left to right and top to bottom.
The experimental task was designed as a free-viewing activity, allowing children to freely
observe any stimuli displayed on the screen to capture their visual information and at-
tentional processes. The visual stimuli were selected based on specific criteria, which
included incorporating both familiar and unfamiliar objects for both groups of children,
covering social and non-social elements, and ensuring gender neutrality. These stimuli
were subsequently validated by special education experts to ensure their suitability for
evaluating children’s attention and interest levels. The stimulus set comprised 12 images,
each measuring 900 × 900 pixels, with each image featuring an object positioned randomly. These
images were compiled into a single video, where each image was displayed for 5 sec-
onds. Figure 6 illustrates the images presented in the stimuli video.
4.3. Procedure
Eye movements were recorded using a Tobii Eye Tracker with a sampling rate of 90 Hz
and an operating distance of 50–95 cm. The visual stimuli were displayed on a 14-inch
screen with Full HD resolution (1920 × 1080 pixels) and a refresh rate of 60 Hz.
Each child participating in the experiment was seated in a quiet room alongside two
individuals: a supervising teacher and a technician responsible for adjusting the equip-
ment. The experimental room was devoid of any objects that might attract the children’s
attention. For each child, the data collection device was adjusted so that the child’s eye
level was aligned with the height of the display screen, with the distance from the child’s
eyes to the center of the screen being approximately 60 cm. This setup was designed to
ensure consistency across all participants during the experiment.
For each child, the eye-tracking device needed to be calibrated twice before com-
mencing the experiment. Initially, the teacher would use verbal cues and hand gestures
to direct the child’s attention to the screen, after which the technician would play the
visual stimulus video and allow the child to observe freely. Typically, for children with
typical development, data collection could be completed in a single session, lasting ap-
proximately 5 to 10 minutes. However, for children with ASD, who might exhibit dis-
tractibility or fail to focus on the screen, data collection might need to be repeated after
a rest period, extending the total duration to 20 to 30 minutes.
The metrics time_to_first_fixation, fixation_duration, and fixation_count were
extracted as visual features for both groups (ASD and TD children) across 12 different
AOIs using the proposed method. To examine visual attention patterns toward different objects
in both groups, mixed-design ANOVAs were conducted. ANOVA [44] is one of the most
widely applied statistical methods for hypothesis testing. Key values commonly used in
ANOVA include the F-statistic, which represents the ratio of differences between groups
(or conditions) to variability within groups, and the p-value, which indicates the probabil-
ity of observing an F-value equal to or greater than the actual value if the null hypothesis
is true. A p-value less than 0.05 is typically considered statistically significant, allowing
researchers to reject the null hypothesis and conclude that the factor under investigation
has an effect. Additionally, partial eta-squared (η_p²) measures the effect size, indicating
the extent to which the independent variable influences the dependent variable. In this
study, each ANOVA used one dependent variable (time to first fixation, fixation duration,
or fixation count), with group (ASD, TD) as the between-subject factor and the 12 AOIs as
the within-subject factor. Significant interactions between group and object type across
the twelve AOIs would indicate that children with ASD exhibit reduced visual attention
when observing different objects presented on the screen.
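For illustration, such a mixed-design ANOVA can be run with the pingouin package, assuming a long-format table whose column names are ours.

```python
import pandas as pd
import pingouin as pg

# df: long-format table, one row per (subject, AOI), with columns assumed
# to be: subject, group ("ASD"/"TD"), aoi ("obj1".."obj12"), plus one
# column per dependent variable. File name is a placeholder.
df = pd.read_csv("eye_metrics_long.csv")

for dv in ["time_to_first_fixation", "fixation_duration", "fixation_count"]:
    aov = pg.mixed_anova(data=df, dv=dv, within="aoi",
                         between="group", subject="subject",
                         effsize="np2")     # partial eta-squared
    print(dv)
    print(aov[["Source", "F", "p-unc", "np2"]])
```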
4.5. Results
Tables 2, 3, and 4 present the mean values of the eye-tracking metrics, including time
to first fixation, fixation duration, and fixation count, for both groups. Each stimulus,
depicted in Figure 6, is labeled from obj1 to obj12. Additionally, Figure 7 illustrates
the mean values and variances of these three metrics for selected representative AOIs.
The mixed-design ANOVA analyses of visual attention metrics revealed distinct
group-level differences between children with ASD and TD children across various
measures. For time to first fixation, a significant main effect of group was observed,
F(1, 48) = 7.11, p = .012, η_p² = .173, with children in the ASD group demonstrating
significantly longer times to first fixation compared to the TD group. Children with ASD
showed slower visual responses compared to TD children when stimuli appeared on the
screen (see Figure 7). The type of AOI significantly influenced first fixation times
irrespective of group, p < .001, η_p² = .174, whereas the interaction between group and
AOI was non-significant, p = .647, η_p² = .023, indicating consistent fixation patterns
across AOIs for both groups. For fixation duration, group differences were also
significant, F(1, 48) = 6.35, p = .017, η_p² = .157, with children in the ASD group
displaying reduced focus, as evidenced by shorter average fixation durations compared
to TD children.
Figure 7. Estimated average values for Fixation Count (top), Fixation Duration (middle), and Time to First
Fixation (bottom) illustrating the visual attention patterns of ASD and TD groups while observing several
objects.
AOI type showed a substantial effect on fixation duration, p < .001, η_p² = .137,
while an interaction effect, p = .043, η_p² = .052, suggested that group differences varied
across specific AOIs. Finally, for fixation count, significant differences between groups
emerged, F(1, 48) = 7.21, p = .011, η_p² = .175, with lower fixation counts observed
in the ASD group. AOI type accounted for considerable variability in fixation counts,
p < .001, η_p² = .280, but no significant interaction was detected, p > .05, η_p² = .051.
Heatmaps were also extracted and averaged within each of the two groups of children, as il-
lustrated in Figure 8. For non-social objects (e.g., toys, spinning tops), children with ASD
exhibit gaze patterns that are largely similar to those of the TD group. When viewing
stimulus object 1, children with ASD tend to focus on the button that triggers interactive
surprises rather than on the toy bear that creates the surprises. From this observation, we
deduced that children with ASD are more interested in understanding the mechanisms
that produce actions and surprises than in the animals involved. In other words, chil-
dren with ASD, especially those with average and above-average cognitive abilities, can
comprehend causality. Conversely, when it comes to objects with social elements (e.g.,
puppet faces, eyes on toy cars), children with ASD demonstrate atypical gaze patterns
compared to TD children. Most children with ASD avoid looking at the eyes or faces
on these objects and exhibit slower reactions and less focus when viewing social objects
compared to non-social ones, unlike the TD group.
The findings indicate that the proposed method is effective in providing valuable
insights into the visual characteristics of children with ASD when observing stimuli im-
ages. Key screening traits in children with ASD can be identified through their prefer-
ences, interests, number of fixation points, and gaze patterns, including direction. The
visual processing capabilities of children with ASD are sufficiently developed to under-
stand basic life issues and needs. Observations reveal that characteristics typically seen
in direct interactions, such as aversion to eye contact, gaze avoidance, and reduced fo-
cus, are also present when children view images. For familiar objects, children show re-
sponses similar to those in direct environments. These results have been evaluated and
positively received by experts in the field of special education, who affirm that the visual
characteristics and presentations of the solution are beneficial in supporting children with
ASD.
We also utilized the extracted eye movement features from a cohort of 55 chil-
dren to facilitate the early identification of ASD using a machine learning classifier.
The SVM model achieved a commendable average accuracy of 90.91%, with a sensi-
tivity of 86.67% (the proportion of correctly identified ASD children) and a specificity
of 96.67% (the proportion of correctly identified TD children). Furthermore, we investi-
gated whether any specific stimuli could effectively differentiate between ASD and TD
children by analyzing the discriminative weights of input features derived from SVM
coefficients. The analysis revealed that two stimuli—one featuring puppets and the other
animated cars—significantly distinguished ASD children from their TD counterparts.
However, this result, though promising, is lower than those reported in several state-of-
the-art studies utilizing eye-tracking technology for early ASD diagnosis. We identified
that a key factor contributing to this discrepancy is the selection of visual stimuli used
during the eye movement data recording. Previous studies have predominantly employed
stimuli with strong social elements, such as facial images or conversation videos, to em-
phasize the differences in social attention between ASD and TD children [23, 40, 41]. In
contrast, our study used 12 visual stimuli designed to support intervention, which did not
prominently feature social cues. Despite this, the preliminary results underscore the po-
tential of the proposed method in providing valuable eye movement features, particularly
September 2024
when children are exposed to stimuli containing both social and non-social elements,
thereby aiding in the automatic pre-screening of ASD using advanced machine learning
techniques.
5. Conclusions
Acknowledgements
This research was funded by the research project QG.23.39 of Vietnam National Univer-
sity, Hanoi.
Author contributions
CONCEPTION: Thi Duyen Ngo, Thi Cam Huong Nguyen, Nu Tam An Nguyen, Thanh
Ha Le and Thi Quynh Hoa Nguyen.
PERFORMANCE OF WORK: Thi Duyen Ngo, Thi Cam Huong Nguyen, Nu Tam An
Nguyen, Thanh Ha Le and Thi Quynh Hoa Nguyen.
INTERPRETATION OR ANALYSIS OF DATA: Duc Duy Le, Thi Quynh Hoa Nguyen,
Dang Khoa Ta, Thi Duyen Ngo, Nu Tam An Nguyen and Thi Cam Huong Nguyen.
PREPARATION OF THE MANUSCRIPT: Duc Duy Le, Thi Duyen Ngo, Thi Quynh
Hoa Nguyen and Thanh Ha Le.
REVISION FOR IMPORTANT INTELLECTUAL CONTENT: Thi Duyen Ngo and
Thanh Ha Le.
SUPERVISION: Thi Duyen Ngo and Thanh Ha Le.
Conflict of interest
References
[13] Sabatos-DeVito M, Schipul SE, Bulluck JC, Belger A, Baranek GT. Eye Tracking
Reveals Impaired Attentional Disengagement Associated with Sensory Response
Patterns in Children with Autism. J Autism Dev Disord. 2016;46(4):1319-1333.
doi:10.1007/s10803-015-2681-5
[14] Mahanama B, Jayawardana Y, Rengarajan S, Jayawardena G, Chukoskie L, Snider
J, Jayarathna S. Eye movement and pupil measures: A review. Frontiers in Com-
puter Science. 2022;3:1-22. doi:10.3389/fcomp.2021.733531
[15] Sasson NJ, Touchstone EW. Visual attention to competing social and object im-
ages by preschool children with autism spectrum disorder. J Autism Dev Disord.
2014;44(3):584-592. doi:10.1007/s10803-013-1910-z
[16] Vacas J, Antolı́ A, Sánchez-Raya A, Pérez-Dueñas C, Cuadrado F. Visual
preference for social vs. non-social images in young children with autism
spectrum disorders. An eye tracking study. PLoS One. 2021;16(6):e0252795.
doi:10.1371/journal.pone.0252795
[17] Bataineh E, Almourad MB, Marir F, Stocker J. Visual attention toward Socially
Rich context information for Autism Spectrum Disorder (ASD) and Normal Devel-
oping Children: An Eye Tracking Study. In: Proceedings of the 16th International
Conference on Advances in Mobile Computing and Multimedia. New York, NY,
USA: ACM; 2018. doi:10.1145/3282353.3282856
[18] Zhang K, Yuan Y, Chen J, Wang G, Chen Q, Luo M. Eye Tracking Research on
the Influence of Spatial Frequency and Inversion Effect on Facial Expression Pro-
cessing in Children with Autism Spectrum Disorder. Brain Sci. 2022;12(2):283.
Published 2022 Feb 18. doi:10.3390/brainsci12020283
[19] Tsang V. Eye-tracking study on facial emotion recognition tasks in individuals
with high-functioning autism spectrum disorders. Autism. 2018;22(2):161-170.
doi:10.1177/1362361316667830
[20] Matsuda S, Minagawa Y, Yamamoto J. Gaze Behavior of Children with ASD
toward Pictures of Facial Expressions. Autism Res Treat. 2015;2015:617190.
doi:10.1155/2015/617190
[21] Thompson JL, Plavnick JB, Skibbe LE. Eye-Tracking Analysis of Atten-
tion to an Electronic Storybook for Minimally Verbal Children With Autism
Spectrum Disorder. The Journal of Special Education. 2019;53(1):41-50.
doi:10.1177/0022466918796504
[22] Hosozawa M, Tanaka K, Shimizu T, Nakano T, Kitazawa S. How children with spe-
cific language impairment view social situations: an eye tracking study. Pediatrics.
2012;129(6):e1453-e1460. doi:10.1542/peds.2011-2278
[23] Wan G, Kong X, Sun B, et al. Applying Eye Tracking to Identify Autism
Spectrum Disorder in Children. J Autism Dev Disord. 2019;49(1):209-215.
doi:10.1007/s10803-018-3690-y
[24] Oliveira JS, Franco FO, Revers MC, et al. Computer-aided autism diagnosis
based on visual attention models using eye tracking. Sci Rep. 2021;11(1):10131.
doi:10.1038/s41598-021-89023-8
[25] Kang J, Han X, Song J, Niu Z, Li X. The identification of children with autism
spectrum disorder by SVM approach on EEG and eye-tracking data. Comput Biol
Med. 2020;120:103722. doi:10.1016/j.compbiomed.2020.103722
[26] Ahmed IA, Senan EM, Rassem TH, Ali MAH, Shatnawi HSA, Alwazer SM, Al-
shahrani M. Eye Tracking-Based Diagnosis and Early Detection of Autism Spec-
trum Disorder Using Machine Learning and Deep Learning Techniques. Electron-
ics. 2022;11(4):530. doi:10.3390/electronics11040530
[27] Giuliani F. Case Report: Using Eye-Tracking as Support for the TEACCH Program
and Two Teenagers with Autism Spectrum Disorders. Journal of Clinical Neuro-
science. 2016;1. doi:10.4172/jnscr.1000104
[28] Wang Q, Wall CA, Barney EC, et al. Promoting social attention in 3-year-olds
with ASD through gaze-contingent eye tracking. Autism Res. 2020;13(1):61-73.
doi:10.1002/aur.2199
[29] Feng Y, Cai Y. A Gaze Tracking System for Children with Autism Spectrum Dis-
orders. Gaming Media and Social Effects. 2017:137-145. doi:10.1007/978-981-10-
0861-0_10
[30] Mei C, Zahed BT, Mason L, Quarles J. Towards Joint Attention Training for
Children with ASD - a VR Game Approach and Eye Gaze Exploration. 2018
IEEE Conference on Virtual Reality and 3D User Interfaces (VR). 2018:289-296.
doi:10.1109/VR.2018.8446242
[31] Kim SY, Rispoli M, Mason RA, Lory C, Gregori E, Roberts CA, Whitford D,
David M. A Systematic Quality Review of Technology-Aided Reading Interven-
tions for Students With Autism Spectrum Disorder. Remedial and Special Educa-
tion. 2022;43(6):404-420. doi:10.1177/07419325211063612
[32] Trembath D, Vivanti G, Iacono T, Dissanayake C. Accurate or assumed: visual
learning in children with ASD. J Autism Dev Disord. 2015;45(10):3276-3287.
doi:10.1007/s10803-015-2488-4
[33] Banire B, Al-Thani D, Qaraqe M, Khowaja K, Mansoor B. The Effects of Visual
Stimuli on Attention in Children With Autism Spectrum Disorder: An Eye-Tracking
Study. IEEE Access. 2020;8:225663-74. doi:10.1109/ACCESS.2020.3045042
[34] Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, et al. Segment
Anything. 2023. doi:10.48550/arXiv.2304.02643
[35] Krassanakis V. PeyeMMV: Python implementation of EyeMMV’s fixa-
tion detection algorithm. Software Impacts. 2023;15:100475.
doi:10.1016/j.simpa.2023.100475
[36] Salvucci DD, Goldberg JH. Identifying fixations and saccades in eye-tracking
protocols. In: Proceedings of the symposium on Eye tracking research &
applications - ETRA ’00. New York, New York, USA: ACM Press; 2000.
doi:10.1145/355017.355028
[37] Strauch C, Georgi J, Huckauf A, Ehlers J. Slow trends - A problem in analysing
pupil dynamics. In: Proceedings of the 2nd International Conference on Physiologi-
cal Computing Systems. SCITEPRESS - Science and Technology Publications;
2015. doi:10.5220/0005329400610066
[38] Grootjen JW, Weingärtner H, Mayer S. Uncovering and addressing blink-related
challenges in using eye tracking for interactive systems. In: Proceedings of the CHI
Conference on Human Factors in Computing Systems. New York, NY, USA: ACM;
2024. p. 1–23. doi:10.1145/3613904.3642086
[39] Zhao Z, Tang H, Zhang X, Qu X, Hu X, Lu J. Classification of Children With
Autism and Typical Development Using Eye-Tracking Data From Face-to-Face
Conversations: Machine Learning Model Development and Performance Evalua-
tion. J Med Internet Res. 2021;23(8):e29328. doi:10.2196/29328
[40] Ozturk MU, Arman AR, Bulut GC, Findik OTP, Yilmaz SS, Genc HA, et al. Sta-
tistical analysis and multimodal classification on noisy eye tracker and applica-
tion log data of children with autism and ADHD. Intell Automat Soft Comput.
2018;24(4):891-905. doi:10.31209/2018.100000058
[41] Minissi ME, Chicchi Giglioli IA, Mantovani F, Alcañiz Raya M. Assessment of
the Autism Spectrum Disorder Based on Machine Learning and Social Visual
Attention: A Systematic Review. J Autism Dev Disord. 2022;52(5):2187-2202.
doi:10.1007/s10803-021-05106-5
[42] American Psychiatric Association. Diagnostic and statistical manual of mental dis-
orders (5th ed.). 2013. doi:10.1176/appi.books.9780890425596
[43] Robins DL, Fein D, Barton ML, Green JA. The Modified Checklist for Autism
in Toddlers: an initial study investigating the early detection of autism and perva-
sive developmental disorders. J Autism Dev Disord. 2001 Apr;31(2):131-44. doi:
10.1023/a:1010738829569.
[44] Ståhle L, Wold S. Analysis of variance (ANOVA). Chemometrics and Intelli-
gent Laboratory Systems. 1989;6(4):259-272.