
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 30, 2022

A Multimodal Approach for Identifying Autism Spectrum Disorders in Children

Junxia Han, Guoqian Jiang, Member, IEEE, Gaoxiang Ouyang, Member, IEEE, and Xiaoli Li

Abstract — Identification of autism spectrum disorder (ASD) in children is challenging due to the complexity and heterogeneity of ASD. Currently, most existing methods mainly rely on a single modality with limited information and often cannot achieve satisfactory performance. To address this issue, this paper investigates ASD from internal neurophysiological and external behavioral perspectives simultaneously and proposes a new multimodal diagnosis framework for identifying ASD in children by fusing electroencephalogram (EEG) and eye-tracking (ET) data. Specifically, we designed a two-step multimodal feature learning and fusion model based on a typical deep learning algorithm, the stacked denoising autoencoder (SDAE). In the first step, two SDAE models are designed for feature learning for the EEG and ET modalities, respectively. Then, a third SDAE model in the second step performs multimodal fusion with the learned EEG and ET features in a concatenated way. Our multimodal identification model can automatically capture correlations and complementarity between the behavioral and neurophysiological modalities in a latent feature space and generate informative feature representations with better discriminability and generalization for enhanced identification performance. We collected a multimodal dataset containing 40 ASD children and 50 typically developing (TD) children to evaluate our proposed method. Experimental results showed that our proposed method achieved superior performance compared with two unimodal methods and a simple feature-level fusion method, and it has promising potential to provide an objective and accurate diagnosis to assist clinicians.

Index Terms — Autism spectrum disorders (ASD), multimodal fusion, electroencephalogram (EEG), eye-tracking (ET), stacked denoising autoencoders, classification.

Manuscript received 16 September 2021; revised 5 May 2022 and 6 June 2022; accepted 15 July 2022. Date of publication 19 July 2022; date of current version 22 July 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 62003228 and Grant 61761166003, in part by the Science and Technology Development Project of Beijing Municipal Education Commission of China under Grant KM202010028019, and in part by the National Key Research and Development Program of China under Grant 2017YFC0820205. (Junxia Han and Guoqian Jiang contributed equally to this work.) (Corresponding author: Xiaoli Li.)

This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was granted by the School of Psychology Research Ethics Committee of Beijing Normal University.

Junxia Han is with the Beijing Key Laboratory of Learning and Cognition, School of Psychology, Capital Normal University, Beijing 100048, China (e-mail: [email protected]).
Guoqian Jiang is with the School of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China (e-mail: [email protected]).
Gaoxiang Ouyang and Xiaoli Li are with the State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/TNSRE.2022.3192431

I. INTRODUCTION

AUTISM spectrum disorder (ASD) is a complex neurodevelopmental disorder characterized by social and communication impairments and restricted and stereotyped behaviors [1]. ASD typically appears during early childhood and affects the child's cognitive ability, social emotion, sensory and motor functioning, and social interaction. It is becoming more widespread. According to estimates from the Centers for Disease Control and Prevention (CDC) in the United States, about 1 in 54 children has been identified with ASD [2]. Consequently, the diagnosis and treatment of ASD have become a worldwide public health concern and attracted considerable attention. However, the cause of ASD still remains unclear. Currently, clinical diagnosis of ASD mainly relies on behavioral diagnosis and scale assessment [3]. However, there is limited understanding of the neural patterns behind ASD and the severity of the disorder. Also, there is a lack of experienced experts for ASD diagnosis [4]. Therefore, there is an increasing need to develop objective and effective tools to identify and diagnose ASD in children and assist clinicians in making accurate diagnoses.

In recent years, different neuroimaging techniques, including functional magnetic resonance imaging (fMRI) [5], [6], magnetoencephalography (MEG) [7], and electroencephalogram (EEG) [8], have been used to explore the characteristics of brain structure and function associated with ASD. Among these neuroimaging techniques, EEG is a relatively easy-to-use, low-cost brain measurement tool that has been widely used for monitoring atypical brain development. Previous studies have shown that patients with ASD have abnormalities in neural oscillations and brain functional connections at different developmental stages [8]-[10]; accordingly, various EEG-based features or indicators were extracted from different aspects, such as neural oscillation rhythm, functional connectivity, and nonlinear information dynamics, to quantitatively describe the differences between ASD children and TD children. To further facilitate automated diagnosis, machine learning techniques have been used to develop diagnosis models with extracted EEG features. For example, Wadhera et al. developed a support vector machine (SVM) classification model with a combination of two features, average weighted degree and mutual information, and obtained a detection accuracy of 92.34% [11]. Baygin et al. proposed a new hybrid deep lightweight feature extractor to extract deep features using a combination of pre-trained models and achieved 96.44% accuracy with an SVM classifier [12].
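As a rough, hypothetical sketch of this kind of pipeline (not the actual code or features of [11] or [12]): an SVM is trained on pre-extracted per-subject EEG feature vectors and evaluated by cross-validation. The random feature matrix and the 40/50 class split below are placeholders standing in for real extracted features.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Placeholder data: in a real study, each row would hold EEG features
# (e.g., connectivity or spectral measures) for one subject.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 125))        # 90 subjects x 125 hypothetical features
y = np.array([1] * 40 + [0] * 50)     # 1 = ASD, 0 = TD (placeholder labels)

# RBF-kernel SVM on min-max normalized features, 10-fold cross-validation.
clf = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=10)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

On random features, accuracy hovers near chance; the published accuracies above reflect informative, carefully engineered EEG features, not this toy setup.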
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Moreover, individuals with ASD exhibit not only brain abnormalities but also atypical eye gaze patterns, such as eye contact avoidance and altered joint attention in social activities. Eye-tracking (ET) technology can provide a direct measure of gaze allocation and goal-directed looking behaviors and has been primarily used to study the attention allocation of the ASD population [13], [14]. Nakano et al. demonstrated that children with ASD spent less time looking at faces and social interactions than TD children [15]. Liu et al. presented a machine learning framework to deal with an ET dataset in a face recognition task to classify ASD children and TD children, and the maximum classification accuracy reached 88.51% [16]. Wan et al. employed a linear discriminant analysis using the fixation time of children watching a 10-second video of a female speaking, and the classification accuracy reached 85.1% [17].

In summary, EEG and ET have been independently applied in ASD studies to identify effective biomarkers and then to design diagnosis models with advanced machine learning algorithms. Notably, these existing studies mainly focus on single-modality data analysis. However, ASD is a complex and heterogeneous disorder with abnormal manifestations from the cellular level to the behavioral level; therefore, it is difficult to accurately and effectively identify ASD relying solely on unimodal data, such as EEG or ET. EEG and ET data are completely different modalities and can be viewed from internal neurophysiological and external behavioral perspectives, respectively. These two modalities contain rich and complementary information associated with ASD [18], [19]. Due to the data heterogeneity of the neurophysiological and behavioral modalities, it is still challenging to explore hidden correlations and complementarity directly from the original data. To address this challenge, multimodal fusion is a great option. In recent years, multimodal fusion has attracted considerable attention, especially in the medical context, and has been applied to the diagnosis of ASD [20], [21] as well as other diseases, such as Parkinson's disease [22], Alzheimer's disease [23], and depression [24]. In a recent study, Cociu et al. integrated three different neuroimaging techniques, EEG, fMRI, and diffusion tensor imaging (DTI), to characterize the autistic brain and provided a better understanding of the neurobiological basis of ASD [20]. Mash et al. concentrated on multimodal analysis to explore the relation in ASD between fMRI and EEG measures of spontaneous brain activity [21]. In Vasquez-Correa et al. [22], a deep learning-based multimodal diagnosis model was proposed to classify patients with Parkinson's disease in different stages of the disease by integrating information from speech, handwriting, and gait signals. In Shi et al. [23], multimodal neuroimaging data, magnetic resonance imaging (MRI) and positron emission tomography (PET), were fused to perform the diagnosis of Alzheimer's disease and achieved superior performance in binary and multi-class classification tasks. These studies have proven that multimodal information fusion can take full advantage of the strengths of individual modality data and overcome their respective weaknesses, yielding enhanced performance.

Motivated by previous studies, this study aims to develop a reliable and accurate diagnosis model based on the fusion of EEG and ET data and proposes a new multimodal diagnostic framework to identify ASD in children. Specifically, we attempt to design a deep learning-based fusion model to capture correlations and complementarity from EEG and ET data. To evaluate the performance of our proposed framework, we collect a multimodal dataset containing resting-state EEG data and task-state ET data from 40 children with ASD and 50 TD children. Experimental results demonstrate the superior performance of our proposed method.

The rest of this paper is organized as follows. Section II details our proposed multimodal framework for identification of children with ASD. Section III presents detailed experimental and performance evaluation results. Lastly, Section IV concludes this paper.

II. METHODOLOGY

The pipeline of the proposed multimodal identification framework for children with ASD is shown in Fig. 1, where EEG and ET data are used as two individual input modalities. It aims to integrate the complementary information in both modalities to enhance identification performance. It mainly consists of three sequential steps: data acquisition, feature extraction, and multimodal identification. Firstly, resting-state EEG data and ET data were acquired and preprocessed to remove unrelated noise signals, respectively. Then, in the feature extraction stage, we extracted typical features from each modality as the initial EEG feature set and ET feature set, which contain rich but redundant diagnostic information related to ASD. In the multimodal identification stage, a two-step multimodal fusion network based on stacked denoising autoencoders (SDAE) is designed to learn useful EEG and ET representations from the two initial feature sets and to further fuse the learned multimodal information for the final classification between ASD and TD. Our multimodal fusion network can capture the complementary characteristics of the neurophysiological (EEG) and behavioral (ET) modalities and enhance identification performance, which is evaluated through a comparative study with unimodal data in Section III. The details of each part are described in the following subsections.

A. Data Acquisition and Preprocessing

1) Subjects: In our study, a total of 90 subjects, including 40 ASD children and 50 typically developing (TD) children aged 3-6 years, were enrolled. The detailed demographics of all subjects are listed in Table I. All ASD children were recruited and received diagnostic confirmation based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V) [1]. However, we had limited access to reliable information at the school site. Thus, the children were assessed using the parent report on the Autism Behavior Checklist (ABC), Social Communication Questionnaire (SCQ), Social Responsiveness Scale (SRS), and Clancy Behavior Scale (CABS). Details of the sample demographics and behavior scores are shown in Table I. All TD children
Fig. 1. Proposed multimodal identification framework for children with ASD by fusion of multimodal EEG and ET data, which are from the neurophysiological and behavioral views, respectively.

TABLE I. Demographic information of all subjects, where p values were obtained by a two-independent-sample t test. (std: standard deviation; SCQ: Social Communication Questionnaire; SRS: Social Responsiveness Scale; ABC: Autism Behavior Checklist; CABS: Clancy Behavior Scale)

were recruited from a local kindergarten. We also employed the ABC, SRS, and SCQ, reported by their teachers, to examine whether there were any autistic symptoms in the TD group. No TD children reached the cut-off scores of the ABC, SRS, and SCQ. The present study was approved by the School of Psychology Research Ethics Committee of Beijing Normal University, and informed consent was obtained for all children with the permission of their parents before subject enrollment.

2) EEG Data: In our study, continuous open-eye resting-state EEG signals were recorded with a high-density array of 128 Ag/AgCl passive electrodes (Electrical Geodesics Inc., EGI) at a sampling rate of 1000 Hz for at least five minutes, as shown in Fig. 2(a). All participating children were instructed to sit comfortably in an armchair, usually accompanied by their caregivers, in a quiet room. Before the EEG recording, scalp impedance was checked online by employing Net Station (EGI, Inc.) and was reduced below 50 kilo-ohm. The EEG data were referenced online to Cz.

EEG data preprocessing was done using EEGLAB [25] and MATLAB. The EEG signal was first downsampled to 250 Hz. A notch filter centered at 50 Hz was employed to remove the line noise, and the data were then band-pass filtered (0.5-45 Hz). EEG data were divided into 4 s segments with no overlap. An artifact detection algorithm proposed in [26] was utilized to select segments without artifact involvement, including eye movements, eye blinks, power supply, breathing, muscle movements, abrupt slopes, and outlier values. After that, a visual inspection was performed to reject segments containing noise. Sensors exceeding a 200 μV threshold during individual recording segments or throughout the entire recording were marked as bad channels and were interpolated from neighboring channels, as described in our previous study [8]. Finally, 4.5 ± 2.2 (mean ± std) bad channels were identified and processed, leaving 25.11 ± 5.95 segments for further analysis (ASD: 20.4 ± 4.9 versus TD: 27.2 ± 5.3). In this study, we selected 62 electrodes of interest from the 128-channel GSN to ensure maximal spatial coverage of the frontal, central, temporal, and occipital regions. The whole brain was divided into ten regions, as shown in Fig. 2(b).

3) ET Data: ET experiments were carried out after each subject finished EEG data collection and had a break. Fig. 3 shows the ET data acquisition experimental setup. A Tobii TX300 eye tracker was used to record the gaze behavior of each subject at a sampling frequency of 300 Hz. The screen resolution was set to 1024 pixels × 768 pixels. Before the formal experiment, a five-point calibration program was performed, and the experiment proceeded after all 5 points were captured with small error vectors. All subjects were
required to be seated in front of a monitor with an eye-to-monitor distance of about 60 cm, and they were instructed to watch the dynamic visual stimuli. The video stimulus material was selected from Tiger Qiao Hu (https://kids.qiaohu.com/) and contains social interaction between a child and an adult, as shown in Fig. 3(b). In each experiment, the video stimulus clips were presented, and each clip lasted about 30 seconds. During the interval between trials, a dynamic kitten with sound was presented in the middle of the screen to attract the children's attention. A total of 2 trials were displayed in a random order for each child. During the whole experiment, no response was required from the children. To explore the child's engagement with each AOI, we processed the eye-tracking data according to the Tobii fixation filter. Linear interpolation with a maximum gap length of 100 ms was used to fill in missing gaze data. Eye-tracking sample data were calculated as the averaged gaze positions of the left and right eyes. All selected subjects had more than 70% screen-looking time captured by the eye-tracking equipment.

Fig. 2. EEG experimental setup. (a) Resting-state EEG data acquisition; (b) electrodes of interest, with different colors representing different brain regions, where LF: left frontal, RF: right frontal, LC: left central, RC: right central, LT: left temporal, RT: right temporal, LO: left occipital, RO: right occipital, LP: left parietal, and RP: right parietal.

Fig. 3. Eye-tracking experiment setup: (a) eye-tracking data acquisition and (b) a visual stimuli video clip and its ROIs.

B. Feature Extraction

1) EEG Features: In this study, multi-domain features from different analytic perspectives are extracted to highlight the characteristics of EEG signals associated with ASD.

1) Relative power energy features: Previous studies showed atypical activity in multiple EEG oscillatory measures in ASD [27], [28]. Resting-state EEG studies of ASD showed reduced alpha power in individuals with ASD and increased power in low-frequency bands (delta band and theta band) [29]. Therefore, power spectral density (PSD) analysis was used to calculate spectral features. The spectral power was computed by employing a Hanning window on each 4-second segment using the fast Fourier transform (FFT). Relative power energy is defined as the ratio of the power within each frequency band to the total power over the whole power spectrum. For the ten brain regions, we first calculated the relative power energy of each channel, and then the mean relative power energy was calculated in five frequency bands: delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (30-45 Hz), thus yielding 5 × 10 = 50 spectral features in total.

2) Multiscale entropy features: Many recent reports have described the exploration of abnormal brain signal complexity in ASD with multiscale entropy (MSE) [10], [30]. MSE is a method to describe the complexity of signals on multiple time scales by introducing a multiscale coarse-graining process. For a given time series X = [x_1, x_2, x_3, ..., x_N], where N is the length of the time series, the series was first coarse-grained using the scale factor s, with non-overlapping windows, as follows:

    y_j^(s) = (1/s) * sum_{i=(j-1)s+1}^{js} x_i,  1 ≤ j ≤ N/s    (1)

Then, the sample entropy of each coarse-grained time series was calculated to describe the complexity of the original time series at different time scales s. In this study, the scale factor s was set to 20, as suggested by previous studies [10]. With the MSE method, we calculated the averaged signal complexity in four scale ranges (scales 1-5, scales 6-10, scales 11-15, and scales 16-20) for the ten brain regions shown in Fig. 2(b). Finally, a total of 10 × 4 = 40 complexity features were obtained.

3) Brain network features: Previous studies have shown that the brain network in children with ASD is disrupted [8], [31]. In this study, a complex network with 62 nodes was constructed based on graph theory analysis to explore the brain network features of ASD children and TD children. Functional connectivity quantifies the relationship of EEG oscillatory activities between two nodes. Specifically, we calculated the phase lag index (PLI) between two nodes as the edge weight of the connectivity matrix of the brain network. To describe the differences in the brain functional networks of ASD and TD children, we computed the following seven network metrics [8] of the whole brain: global efficiency, clustering coefficient, path length, normalized clustering coefficient, normalized path length, small-worldness, and transitivity. Note that these features are global, not local. In summary, a total of 7 × 5 = 35 network features over the five frequency bands were extracted.
2) ET Features: For the ET data analysis, we consider eight areas of interest (AOIs), shown in Fig. 3(b): (1) background, (2) adult body, (3) child body, (4) adult eyes, (5) child eyes, (6) adult mouth, (7) child mouth, and (8) joint attention. For each AOI, we extracted six statistical indicators: time to first fixation, fixations before, total fixation duration, fixation count, fixation duration, and visit count. We analyzed two different dynamic video clips in the experiment. In sum, we finally obtained a total of 6 × 8 × 2 = 96 ET features for each subject.

After performing the above initial feature extraction for the EEG and ET data, respectively, a multimodal dataset containing 125 EEG features and 96 ET features is generated, which is used for multimodal feature learning and fusion for ASD identification. Considering that the EEG and ET features have different scales, which would impact the subsequent model training performance, each feature is linearly normalized to the range [0, 1] using Eq. (2):

    x_norm = (x - x_min) / (x_max - x_min)    (2)

where x and x_norm are the original and normalized feature values, respectively, and x_min and x_max are the minimum and maximum values of the original features. It should be noted that feature normalization is first performed on the training set; the testing set is then rescaled according to the maximum and minimum values of the training set, thus ensuring that both data sets lie in a similar range.

C. Multimodal Identification Model

EEG and ET data are collected from neurophysiological and behavioral perspectives, respectively; they belong to two different modalities and contain rich and complementary information associated with ASD. Due to the data heterogeneity of the two modalities, it is difficult to explore hidden correlations and complementarity directly from the original data. To effectively fuse both modalities for improved performance, we designed a two-step multimodal feature learning and fusion model, the multimodal stacked denoising autoencoder (MMSDAE), as shown in Fig. 1. It consists of two core modules: a unimodal feature learning module and a multimodal feature fusion module. The first is used to learn high-level, compact features in a latent space through multiple nonlinear transformations with a designed SDAE multilayer network structure from the high-dimensional, correlated, and redundant EEG and ET features. The second aims to fuse complementary information between the high-level EEG-based and ET-based representations learned in the previous stage.

Fig. 4. Illustration of an SDAE model with three DAEs, which consists of two steps: unsupervised pre-training and supervised fine-tuning.

1) SDAE: The SDAE is a typical deep neural network architecture that consists of multiple denoising autoencoders (DAEs) in a stacked way [32]. Fig. 4 illustrates a typical SDAE model with three DAEs. Specifically, a DAE consists of an encoder network and a decoder network. Note that a DAE has the same number of neurons in the input layer and the output layer and reproduces its inputs at its output layer. In other words, it attempts to reconstruct itself with input data only, without extra label information. A DAE aims to recover a data sample x from its corrupted version x̃, obtained with a typical zero-masking strategy [32]. In doing so, it can prevent the autoencoder from simply learning the identity mapping and help obtain robust representations from noisy data.

The encoder network of the DAE transforms the corrupted version x̃ by a nonlinear mapping function f into a hidden representation h, as in (3):

    h = f(W_1 x̃ + b)    (3)

where W_1 is the weight matrix and b is the bias vector. In this study, we use the sigmoid function f(x) = 1/(1 + exp(-x)) [32] for the nonlinear mapping. The learned latent representation h can be viewed as a compression of the input data, with some loss, when the number of hidden units is less than the number of input units. It can capture the main variations in the high-dimensional input data and eliminate less important information through dimension reduction, as demonstrated in Section III-B.

A decoder network then maps the hidden representation h back to a reconstruction output x̂, as

    x̂ = g(W_2 h + c)    (4)

where W_2 is the weight matrix, c is the bias vector, and g is the activation function; likewise, the sigmoid function is chosen here. The training process of the DAE is to find the optimal parameters θ = {W_1, W_2, b, c} by minimizing the mean square error between the original input and the reconstructed output, which
2008 IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 30, 2022

is performed in an unsupervised manner only with input data multimodal feature fusion via Fusion-SDAE are performed,
while without any label information. and finally give the identification label (ASD or TD).
As shown in Fig. 4, the training process of an SDAE model
consists of two steps: an unsupervised pre-training step and III. E XPERIMENTS AND R ESULS
a supervised fine-tuning step. Given a training set data, the To demonstrate the effectiveness and superiority of our pro-
learning of SDAE is started by a greedy layer-wise pre- posed method, we compare it with two unimodal-based meth-
training procedure which learns a stack of DAEs one by ods, EEG-SDAE and ET-SDAE. For EEG-SDAE, unimodal
one in an unsupervised learning manner. The key concept in EEG data is used for training an SDAE-based identification
greedy layer-wise learning is to train one layer every time. model. For ET-SDAE, an SDAE-based model is trained with
In this way, the network parameters are initialized, reducing unimodal ET data. Also, a simple feature-level fusion method
the problem of local minima. In the fine-tuning phase, all named CONCAT-SDAE is compared, where EEG data and
the learned hidden layers from several DAEs are stacked to ET data are simply concatenated to train an SDAE model. All
form a deep network, the decoder networks of each DAE are compared methods are evaluated using the same dataset.
removed, and a softmax layer is added in the top of the encoder
network, as shown in Fig. 4 (b). Then, the whole network
parameters can be jointly optimized and fine-tuned using the A. Evaluation Metric
back-propagation (BP) algorithm with the label information in To evaluate the performance of our proposed method,
a supervised manner. Thus, the learned representations from we used three common classification metrics [23], including
the unsupervised learning step can be improved with better accuracy, sensitivity, and specificity, which are defined as
intra-class compactness and inter-class discriminability. follows:
Recent studies have shown that SDAE has powerful non- TP +TN
Accur acy = (8)
linear feature learning ability and can capture more hidden T P + T N + FP + FN
information and high-level features with compactness. It has TP
been widely used in many challenging tasks, such as diagnosis Sensi ti vi t y = (9)
T P + FN
of Alzheimer’s disease [33], [34], classification of attention TN
deficit/hyperactivity disorder (ADHD) [35], and machinery Speci f i ci t y = (10)
T N + FP
fault diagnosis [36], and gained successful achievements.
where TP, TN, FP, and FN denote true positive, true negative,
Motivated by its excellent property in feature presentation
false positive, and false negative, respectively. In our study,
learning, in this study, SDAE is used for EEG and ET feature
we define ASD children as the positive class and TD chil-
learning and fusion.
dren as the negative class. For a comprehensive evaluation,
2) Unimodal Feature Learning Module: To reduce the
the Receiver Operating Characteristic (ROC) curve and the
high-dimensionality and redundancy of the extracted initial
resulting Area Under Curve (AUC) [22] are also used.
EEG and ET features, we designed an EEG-SDAE and an
Due to the limited samples, we perform a 10-fold subject-
ET-SDAE to learn the high-level EEG and ET feature repre-
independent cross-validation to evaluate the performance of
sentations, respectively. For two input modalities, X_EEG and X_ET, two feature learning models, f_EEG and f_ET, are trained through an unsupervised pre-training followed by a supervised fine-tuning, as shown in Fig. 4. Once the two models are trained, the top layer (i.e., the classification layer) obtained in the supervised step can be removed, and the output of the second-to-last layer can be treated as the learned high-level representation, denoted as follows:

H_EEG = f_EEG(X_EEG)   (5)
H_ET = f_ET(X_ET)   (6)

3) Multimodal Feature Fusion Module: Considering that the two modalities provide different and complementary discriminability for TD and ASD, we designed another Fusion-SDAE to fuse the high-level representations H_EEG and H_ET, which are concatenated to form a multimodal feature vector H_Fusion as follows:

H_Fusion = [H_EEG; H_ET]   (7)

Using H_Fusion as the input, the Fusion-SDAE model is trained to learn the unified representations hidden in the two different modalities.

Once the proposed framework is trained, feature extraction, unimodal feature learning via the EEG-SDAE and ET-SDAE, and multimodal feature fusion via the Fusion-SDAE are performed in sequence to identify ASD.

A 10-fold cross-validation strategy is adopted to evaluate the proposed model, where each subject is regarded as a sample. For TD and ASD subjects, we perform a stratified sampling strategy to ensure that each fold contains TD and ASD samples in a similar ratio. All samples are divided into 10 folds, nine of which are used as the training set and the remaining fold for testing. Thus, each fold is tested once and the evaluation process is repeated 10 times. Finally, the average testing performance over the ten folds is reported.

B. Parameter Setup

In our proposed framework, there are three SDAE models, for unimodal feature learning and multimodal feature fusion of the EEG and ET data, respectively. For each SDAE, we designed a 4-layer network structure consisting of one input layer, two hidden layers, and one output layer. Specifically, the three network structures are set as follows: 125-64-50-2 for EEG feature learning, 96-64-50-2 for ET feature learning, and 100-50-20-2 for multimodal feature fusion. For simplicity, the other training parameters are set to the same values for all three SDAEs. The noise level of each DAE is set to 0.1. The learning rates for pretraining and fine-tuning are set to 0.1 and 0.2, respectively. The iterations for pretraining and fine-tuning are 100 and 200, respectively. During the training phase, we minimize the cost function by using the stochastic gradient descent (SGD) optimization algorithm. All experiments are implemented with a deep learning toolbox developed in MATLAB, which is available online.2

HAN et al.: MULTIMODAL APPROACH FOR IDENTIFYING AUTISM SPECTRUM DISORDERS IN CHILDREN 2009

TABLE II
PERFORMANCE (%) OF DIFFERENT METHODS
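The concatenation in Eq. (7) is a simple per-subject operation; a sketch with NumPy follows. The representations below are random stand-ins for the 50-unit penultimate-layer outputs of the two unimodal SDAEs, yielding the 100-dimensional Fusion-SDAE input.

```python
import numpy as np

# Hypothetical learned representations for a batch of 3 subjects:
# 50-d outputs of EEG-SDAE and ET-SDAE (random stand-ins, not real features).
rng = np.random.default_rng(1)
H_eeg = rng.normal(size=(3, 50))
H_et = rng.normal(size=(3, 50))

# Eq. (7): H_Fusion = [H_EEG; H_ET] -- per-subject concatenation, producing
# the 100-dimensional input expected by the 100-50-20-2 Fusion-SDAE.
H_fusion = np.concatenate([H_eeg, H_et], axis=1)
print(H_fusion.shape)  # (3, 100)
```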
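To make the pretraining settings concrete, the following NumPy sketch runs 100 SGD pretraining iterations of a single denoising-autoencoder layer with the stated noise level (0.1) and pretraining learning rate (0.1), using the EEG branch's first layer sizes (125 to 64). The sigmoid activation, squared-error cost, masking corruption, and tied weights are illustrative assumptions, not the toolbox's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, lr, noise = 125, 64, 0.1, 0.1   # EEG layer sizes; stated lr and noise

X = rng.random((8, n_in))                    # a toy mini-batch of "features"
W = rng.normal(scale=0.01, size=(n_in, n_hid))
b, c = np.zeros(n_hid), np.zeros(n_in)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def reconstruct(Xn):
    H = sigmoid(Xn @ W + b)                  # encode
    return H, sigmoid(H @ W.T + c)           # decode (tied weights: an assumption)

mse_before = float(((reconstruct(X)[1] - X) ** 2).mean())
for _ in range(100):                         # 100 pretraining iterations
    X_noisy = X * (rng.random(X.shape) >= noise)   # mask ~10% of the inputs
    H, X_rec = reconstruct(X_noisy)
    err = X_rec - X                          # reconstruct the CLEAN input
    dZ2 = err * X_rec * (1 - X_rec)          # backprop through the decoder
    dZ1 = (dZ2 @ W) * H * (1 - H)            # ...and through the encoder
    W -= lr * (X_noisy.T @ dZ1 + dZ2.T @ H) / len(X)
    b -= lr * dZ1.mean(axis=0)
    c -= lr * dZ2.mean(axis=0)

mse_after = float(((reconstruct(X)[1] - X) ** 2).mean())
print(round(mse_before, 4), round(mse_after, 4))
```

Stacking such layers and then fine-tuning the whole network with a classification layer on top reproduces the two-phase (pretrain, then fine-tune) scheme described above.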

C. Overall Performance Comparison


Table II gives the comparative results of the different models in terms of accuracy, sensitivity, and specificity, where the average and standard deviation of the three metrics over ten folds are reported for each method. Our proposed MMSDAE model achieved the best performance, with 95.56% accuracy, 92.5% sensitivity, and 98% specificity, significantly outperforming the two unimodal methods (EEG-SDAE and ET-SDAE). MMSDAE also shows the best stability and robustness, with the smallest standard deviation among all compared methods. Comparing the two multimodal fusion methods, CONCAT-SDAE performs worse than our MMSDAE. This result demonstrates that the complex relations between the EEG and ET modalities are difficult to capture with a simple feature-level fusion; in contrast, our MMSDAE can learn shared representations between the two modalities at a higher level via a multilayer network architecture. In more detail, comparing the two unimodal methods, the ET features obtained a higher average classification accuracy (86.67%) than the EEG features (81.11%), which means that the ET modality has better discriminative ability between ASD and TD children.

Fig. 5 shows the ROC curves of the different models with their corresponding AUC values. Our MMSDAE clearly obtains the best performance, with an AUC value of 0.984, significantly higher than those of the other three methods. Also, the true positive rate of our MMSDAE rises steeply at the beginning of the ROC curve, which means a high identification rate at a low misdiagnosis rate for ASD subjects. This will provide accurate and reliable diagnosis results in clinical applications. Notably, the performance of ET exceeds that obtained with EEG, indicating that ET features have an advantage in classifying ASD. Moreover, the two multimodal fusion methods (CONCAT-SDAE and MMSDAE) significantly outperform the two unimodal methods (EEG-SDAE and ET-SDAE). This can be explained by the fact that multimodal fusion combines the complementary information in each modality and effectively enhances performance. These results demonstrate the effectiveness of multimodal information fusion combining EEG and ET data for ASD identification and diagnosis.

Fig. 5. ROC curves of different models with the corresponding AUC values. The horizontal axis of the ROC curve is the false positive rate (FPR), also called the misdiagnosis rate, defined as the ratio between the number of falsely identified ASD children (FP) and the total number of true TD children (TN + FP). The vertical axis is the true positive rate (TPR), also called the diagnosis rate, defined as the ratio between the number of accurately identified ASD children (TP) and the total number of true ASD children (TP + FN).

2 https://github.com/rasmusbergpalm/DeepLearnToolbox

It should be noted that, in terms of model structure complexity, our proposed MMSDAE needs to train three SDAE models, while the CONCAT-SDAE model requires only one. Therefore, our proposed MMSDAE has a much higher computational cost, especially during the model training phase. In practice, a trade-off should be made between the computational cost and the identification performance.

D. Investigation of Complementary Characteristics of EEG and ET Data

To further investigate the complementary characteristics of the EEG and ET data, we calculate the confusion matrices of the four different models, which reveal the classification ability of each modality. Fig. 6 shows the results. Comparing Fig. 6 (a) and (b), we can conclude that EEG and ET data have important complementary characteristics. Specifically, EEG-SDAE obtained a higher classification accuracy (82.5%) than ET-SDAE (77.7%) for ASD subjects, whereas for TD subjects, ET significantly outperforms EEG (94.0% versus 80%). This proves that EEG and ET data contain complementary information and have different classification abilities for ASD and TD children. Fig. 6 (c) and (d) show that the two multimodal models achieved significant performance improvements over the two unimodal models (EEG-SDAE and ET-SDAE). The multimodal fusion methods identified ASD children with an accuracy of 92.5% (a 10% improvement), with only three ASD children wrongly classified as TD. Also, our proposed MMSDAE model improved the classification accuracy for TD children from 94% to 98%, a 4% improvement that yields the lowest misdiagnosis rate (only 1 child is wrongly classified as ASD). As shown in Table II, in terms of overall accuracy, MMSDAE gained improvements of 14.45% and 8.89% over EEG-SDAE and ET-SDAE, respectively. These results reveal the contribution of each modality to ASD identification and diagnosis and why the fusion of both modalities can
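The FPR and TPR definitions in the Fig. 5 caption can be computed directly from labels and classifier scores; the sketch below uses made-up toy scores (not the paper's outputs) and estimates the AUC as the probability that a randomly chosen ASD subject scores above a randomly chosen TD subject.

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])             # 1 = ASD, 0 = TD (toy labels)
scores = np.array([0.9, 0.8, 0.35, 0.7, 0.6, 0.2, 0.1, 0.3])  # toy classifier scores

fpr, tpr = [0.0], [0.0]
for t in np.sort(np.unique(scores))[::-1]:               # sweep thresholds high -> low
    pred = scores >= t
    tpr.append(np.sum(pred & (y_true == 1)) / np.sum(y_true == 1))  # TP / (TP + FN)
    fpr.append(np.sum(pred & (y_true == 0)) / np.sum(y_true == 0))  # FP / (TN + FP)

# AUC as the probability a random ASD score exceeds a random TD score (ties count 0.5)
pos, neg = scores[y_true == 1], scores[y_true == 0]
auc = float(np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg]))
print(auc)   # 0.9375 for these toy scores
```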
enhance the identification performance. The multimodal fusion methods integrate the advantage of EEG in classifying ASD children and the advantage of ET in classifying TD children, making full use of the complementary information between EEG and ET to enhance the classification accuracy for each modality.

Fig. 6. Confusion matrices of different methods: (a) EEG-SDAE; (b) ET-SDAE; (c) CONCAT-SDAE and (d) MMSDAE. Each row represents the predicted labels and each column the true labels.
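The reported metrics can be read directly off a confusion matrix. In the sketch below, the counts are inferred from the MMSDAE percentages in the text (92.5% sensitivity with three missed ASD children; 98% specificity with one misclassified TD child), which imply 40 ASD and 50 TD test subjects; these cohort sizes are an assumption derived from the percentages, not figures stated here.

```python
import numpy as np

# Rows are the TRUE classes and columns the PREDICTED classes in this sketch
# (note: Fig. 6 uses the transposed layout, rows = predicted).
#                 pred ASD  pred TD
conf = np.array([[37,       3],     # true ASD: 37 correct, 3 missed (FN)
                 [1,        49]])   # true TD : 1 misdiagnosed (FP), 49 correct

tp, fn = conf[0]
fp, tn = conf[1]
sensitivity = tp / (tp + fn)          # 37/40 = 0.925
specificity = tn / (tn + fp)          # 49/50 = 0.98
accuracy = (tp + tn) / conf.sum()     # 86/90 ~ 0.9556
print(round(float(sensitivity), 4), round(float(specificity), 4), round(float(accuracy), 4))
```

These three values reproduce the 92.5% / 98% / 95.56% figures reported for MMSDAE in Table II.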

E. Feature Visualization
To explicitly show the feature learning and fusion ability of our proposed method, we adopt the t-SNE technique [37] for 2-D feature visualization. Specifically, we reduce the unimodal original EEG and ET features, the unimodal EEG and ET features learned via SDAE, the concatenated multimodal features, and the multimodal features fused via MMSDAE into 2-D maps. The scatterplots of the different features are shown in Fig. 7. It can be seen that the original features (EEG, ET, and concatenated) are randomly distributed in the 2-D mapping with a large overlap between ASD and TD, which indicates the difficulty of identifying ASD directly from the original features. We can also observe that the learned EEG features are not well separated between TD and ASD. A possible reason is that our proposed SDAE model performs dimension reduction on the raw EEG features, which loses some useful information and therefore results in poor performance. In contrast, the ET features learned via SDAE and the multimodal features fused via MMSDAE exhibit better intra-class clustering and inter-class discriminability. Our proposed MMSDAE presents the best separation, consistent with the highest overall classification accuracy listed in Table II.

Fig. 7. Feature visualization: (a) original EEG features; (b) original ET features; (c) concatenated EEG and ET features; (d) learned EEG features via EEG-SDAE; (e) learned ET features via ET-SDAE and (f) fused multimodal features via MMSDAE.

IV. CONCLUSION AND FUTURE WORKS

In this paper, we proposed a new multimodal feature learning and fusion framework for identifying ASD in children. Its core idea is a two-step multimodal learning model: in the first step, high-level EEG and ET feature representations are learned via an EEG-SDAE and an ET-SDAE, respectively, from initial high-dimensional features with information redundancy; in the second step, the learned EEG and ET representations are fused for final classification via a Fusion-SDAE. Our proposed model realizes the joint modeling and analysis of EEG and ET data, learns the complementary information between the two modalities, and enhances identification performance. Experimental results have demonstrated that our proposed method achieved better identification performance, with an overall accuracy of 95.56%, than the unimodal methods and a simple feature-level fusion method. It should be noted that our proposed framework is data-driven and can automatically learn and fuse useful information from the neurophysiological and behavioral modalities to identify ASD without requiring extensive diagnostic expertise. It provides a new tool for an easier and more objective diagnosis of ASD in children, which can assist clinicians in making precise diagnostic decisions and improve diagnostic efficiency, suggesting its great potential in clinical applications.

It is worth noting that our method has the following limitations. On the one hand, to deploy our model in practice, multimodal data from the EEG and ET modalities must be available simultaneously. On the other hand, our model is a two-step approach containing three SDAE models, which incurs a much higher computational cost than training a single SDAE. To address these limitations, in our future work we will investigate more efficient models by introducing advanced neural network algorithms, such as
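A minimal sketch of the t-SNE projection behind Fig. 7 follows, assuming scikit-learn; the two Gaussian blobs are random stand-ins for the learned 50-d representations of the two groups, and the perplexity value is an illustrative choice.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (40, 50)),    # stand-in "ASD" representations
                   rng.normal(3, 1, (50, 50))])   # stand-in "TD" representations

# Project the 50-d features to 2-D for scatter-plotting, as in Fig. 7
emb = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(feats)
print(emb.shape)   # (90, 2): one 2-D point per subject
```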
convolutional neural networks (CNNs) and attention networks, to fuse the multimodal data, especially in the absence of one modality during model training. In addition, we will attempt to explore more effective features associated with ASD using advanced signal processing methods.

REFERENCES

[1] C. Sarmiento and C. Lau, Diagnostic and Statistical Manual of Mental Disorders, 5th ed. Hoboken, NJ, USA: Wiley, 2020, pp. 125–129.
[2] G. Xu, L. Strathearn, B. Liu, and W. Bao, “Prevalence of autism spectrum disorder among US children and adolescents, 2014–2016,” JAMA, vol. 319, no. 1, pp. 81–82, 2018.
[3] Z.-A. Huang, Z. Zhu, C. H. Yau, and K. C. Tan, “Identifying autism spectrum disorder from resting-state fMRI using deep belief network,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 7, pp. 2847–2861, Jul. 2021.
[4] R. P. Goin-Kochel, V. H. Mackintosh, and B. J. Myers, “How many doctors does it take to make an autism spectrum diagnosis?” Autism, vol. 10, no. 5, pp. 439–451, Sep. 2006.
[5] H. C. Hazlett et al., “Early brain development in infants at high risk for autism spectrum disorder,” Nature, vol. 542, no. 7641, pp. 348–351, 2017.
[6] H. Zhang, R. Li, X. Wen, Q. Li, and X. Wu, “Altered time-frequency feature in default mode network of autism based on improved Hilbert–Huang transform,” IEEE J. Biomed. Health Informat., vol. 25, no. 2, pp. 485–492, Feb. 2020.
[7] M. Kikuchi, Y. Yoshimura, K. Mutou, and Y. Minabe, “Magnetoencephalography in the study of children with autism spectrum disorder,” Psychiatry Clin. Neurosci., vol. 70, no. 2, pp. 74–88, Feb. 2016.
[8] J. Han et al., “Development of brain network in children with autism from early childhood to late childhood,” Neuroscience, vol. 367, pp. 134–146, Dec. 2017.
[9] T.-M. Heunis, C. Aldrich, and P. J. De Vries, “Recent advances in resting-state electroencephalography biomarkers for autism spectrum disorder—A review of methodological and clinical challenges,” Pediatric Neurol., vol. 61, pp. 28–37, Aug. 2016.
[10] T. Takahashi et al., “Enhanced brain signal variability in children with autism spectrum disorder during early childhood,” Human Brain Mapping, vol. 37, no. 3, pp. 1038–1050, Mar. 2016.
[11] T. Wadhera and D. Kakkar, “Social cognition and functional brain network in autism spectrum disorder: Insights from EEG graph-theoretic measures,” Biomed. Signal Process. Control, vol. 67, May 2021, Art. no. 102556.
[12] M. Baygin et al., “Automated ASD detection using hybrid deep lightweight features extracted from EEG signals,” Comput. Biol. Med., vol. 134, Jul. 2021, Art. no. 104548.
[13] G. Tan, K. Xu, J. Liu, and H. Liu, “A trend on autism spectrum disorder research: Eye tracking-EEG correlative analytics,” IEEE Trans. Cognit. Develop. Syst., early access, Aug. 5, 2021, doi: 10.1109/TCDS.2021.3102646.
[14] V. Yaneva, L. A. Ha, S. Eraslan, Y. Yesilada, and R. Mitkov, “Detecting high-functioning autism in adults using eye tracking and machine learning,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 28, no. 6, pp. 1254–1261, Jun. 2020.
[15] T. Nakano et al., “Atypical gaze patterns in children and adults with autism spectrum disorders dissociated from developmental changes in gaze behaviour,” Proc. Roy. Soc. B, Biol. Sci., vol. 277, no. 1696, pp. 2935–2943, Oct. 2010.
[16] W. Liu, M. Li, and L. Yi, “Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework,” Autism Res., vol. 9, pp. 888–898, Aug. 2016.
[17] G. Wan et al., “Applying eye tracking to identify autism spectrum disorder in children,” J. Autism Develop. Disorders, vol. 49, pp. 209–215, Jan. 2019.
[18] J. Kang, X. Han, J. Song, Z. Niu, and X. Li, “The identification of children with autism spectrum disorder by SVM approach on EEG and eye-tracking data,” Comput. Biol. Med., vol. 120, May 2020, Art. no. 103722.
[19] S. Zhang, D. Chen, Y. Tang, and L. Zhang, “Children ASD evaluation through joint analysis of EEG and eye-tracking recordings with graph convolution network,” Frontiers Human Neurosci., vol. 15, May 2021, Art. no. 651349.
[20] B. A. Cociu et al., “Multimodal functional and structural brain connectivity analysis in autism: A preliminary integrated approach with EEG, fMRI, and DTI,” IEEE Trans. Cognit. Develop. Syst., vol. 10, no. 2, pp. 213–226, Jun. 2018.
[21] L. E. Mash et al., “Atypical relationships between spontaneous EEG and fMRI activity in autism,” Brain Connectivity, vol. 10, no. 1, pp. 18–28, Feb. 2020.
[22] J. C. Vásquez-Correa, T. Arias-Vergara, J. R. Orozco-Arroyave, B. Eskofier, J. Klucken, and E. Nöth, “Multimodal assessment of Parkinson’s disease: A deep learning approach,” IEEE J. Biomed. Health Inform., vol. 23, no. 4, pp. 1618–1630, Jul. 2019.
[23] J. Shi, X. Zheng, Y. Li, Q. Zhang, and S. Ying, “Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer’s disease,” IEEE J. Biomed. Health Inform., vol. 22, no. 1, pp. 173–183, Jan. 2018.
[24] H. Dibeklioglu, Z. Hammal, and J. F. Cohn, “Dynamic multimodal measurement of depression severity using deep autoencoding,” IEEE J. Biomed. Health Inform., vol. 22, no. 2, pp. 525–536, Mar. 2018.
[25] A. Delorme and S. Makeig, “EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis,” J. Neurosci. Methods, vol. 134, no. 1, pp. 9–21, Mar. 2004.
[26] P. J. Durka, H. Klekowicz, K. J. Blinowska, W. Szelenberger, and S. Niemcewicz, “A simple system for detection of EEG artifacts in polysomnographic recordings,” IEEE Trans. Biomed. Eng., vol. 50, no. 4, pp. 526–528, Apr. 2003.
[27] S. Matlis, K. Boric, C. J. Chu, and M. A. Kramer, “Robust disruptions in electroencephalogram cortical oscillations and large-scale functional networks in autism,” BMC Neurol., vol. 15, no. 1, p. 97, Dec. 2015.
[28] A. R. Levin, K. J. Varcin, H. M. O’Leary, H. Tager-Flusberg, and C. A. Nelson, “EEG power at 3 months in infants at high familial risk for autism,” J. Neurodevelop. Disorders, vol. 9, no. 1, p. 34, Dec. 2017.
[29] J. Wang, J. Barstein, L. E. Ethridge, M. W. Mosconi, Y. Takarae, and J. A. Sweeney, “Resting state EEG abnormalities in autism spectrum disorders,” J. Neurodevelop. Disorders, vol. 5, no. 1, p. 24, Dec. 2013.
[30] A. Catarino, O. Churches, S. Baron-Cohen, A. Andrade, and H. Ring, “Atypical EEG complexity in autism spectrum conditions: A multiscale entropy analysis,” Clin. Neurophysiol., vol. 122, no. 12, pp. 2375–2383, Dec. 2011.
[31] K. Zeng et al., “Disrupted brain network in children with autism spectrum disorder,” Sci. Rep., vol. 7, no. 1, pp. 1–12, Dec. 2017.
[32] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.
[33] S. Liu et al., “Multimodal neuroimaging feature learning for multiclass diagnosis of Alzheimer’s disease,” IEEE Trans. Biomed. Eng., vol. 62, no. 4, pp. 1132–1140, Apr. 2015.
[34] R. Ferri et al., “Stacked autoencoders as new models for an accurate Alzheimer’s disease classification support using resting-state EEG and MRI measurements,” Clin. Neurophysiol., vol. 132, no. 1, pp. 232–245, Jan. 2021.
[35] S. Liu et al., “Deep spatio–temporal representation and ensemble classification for attention deficit/hyperactivity disorder,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 29, pp. 1–10, 2021.
[36] G. Jiang, H. He, P. Xie, and Y. Tang, “Stacked multilevel-denoising autoencoders: A new representation learning approach for wind turbine gearbox fault diagnosis,” IEEE Trans. Instrum. Meas., vol. 66, no. 9, pp. 2391–2402, Sep. 2017.
[37] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, no. 11, pp. 2579–2605, Nov. 2008.
