Slides
Mentor
Debasis Samanta
Associate Professor
Department of Computer Science, IIT Kharagpur
Abstract
• In this study, we implemented a data-driven approach to classify ASD
patients and typically developing (TD) participants using resting-state
fMRI data.
• SVM and KNN were used for classification.
• Neural networks were also used, for comparison.
• Each classifier was fine-tuned via grid search over its hyper-parameters,
with cross-validation.
• Finally, a stacked model was built with the fine-tuned classifiers as its
base models.
What is Autism Spectrum Disorder?
Autism Spectrum Disorder
• It is a complex developmental condition that involves persistent
challenges in social interaction, speech and nonverbal
communication, and restricted/repetitive behaviors.
• One in 59 children in the United States is estimated to have autism.
Why Machine Learning?
Autism Spectrum Disorder
• There is no medical test for autism.
• Early diagnosis and treatment are important for reducing the
symptoms of autism and improving the quality of life of people with
autism and their families.
• The diagnostic procedure is time-consuming, often taking 5-10 years.
• Hence, the objective is to use machine learning algorithms for
classification.
Why is Functional Connectivity Analysis Important?
Role of Functional Connectivity
• Extensive brain imaging studies have reported that ASD is associated
with altered brain connectivity.
• It has been found that in autism some brain regions show
weaker-than-normal functional connectivity, which may underlie
difficulties in social behaviour, while others are more strongly
correlated than normal, which may account for an exceptional ability
to concentrate.
• It has also been claimed that many gold medallists in olympiads such
as the IMO, IOI, and IPhO are on the autism spectrum.
The Dataset
ABIDE Dataset
• For this task, we used the ABIDE dataset.
• It contains aggregated functional and structural brain imaging data
collected from laboratories around the world to accelerate our
understanding of the neural bases of autism.
• The ABIDE 1 dataset contains fMRI images of 1112 subjects, 539 from
individuals with ASD, and 573 from typical controls.
Pre-processing
Pre-processing
• For this task, I used only the data provided by NYU, because:
• Data from different laboratories differ in instrument configuration,
examination protocol, etc.
• Of all the sites, NYU had the largest number of subjects (172).
• For pre-processing, I used the CPAC pipeline, as the task involves
functional connectivity analysis.
nilearn
• nilearn is the library for doing ‘Statistics for Neuroimaging in Python’.
• It uses scikit-learn (sklearn) as its backend.
• This library is used in this project for:
• Extracting pre-processed ABIDE 1 dataset
• Building connectivity matrices from the time-series signals obtained from
fMRI images
Brain MAP / Atlas
Brain Atlas
• As per Wikipedia, a brain atlas is composed of serial sections along
different anatomical planes of the healthy or diseased developing or
adult animal or human brain where each relevant brain structure is
assigned a number of coordinates to define its outline or volume.
• Brain atlases are contiguous, comprehensive results of visual brain
mapping and may include anatomical, genetical or functional
features.
• In this project, I used AAL atlas, which has 116 ROIs (functional
clusters).
Extracting data and building
connectivity matrices
Snippet
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nilearn
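The snippet above only shows the imports. A hedged sketch of the remaining steps is given below: in the actual project the per-subject ROI time series come from nilearn's ABIDE fetcher (CPAC pipeline, AAL atlas), but to keep the example self-contained it builds the connectivity matrix from synthetic time series; the array sizes mirror the NYU data (116 AAL ROIs) and names like `n_rois` are illustrative.

```python
import numpy as np

# In the real pipeline the ROI time series would be fetched with something like:
#   from nilearn.datasets import fetch_abide_pcp
#   data = fetch_abide_pcp(pipeline='cpac', derivatives=['rois_aal'], SITE_ID='NYU')
# Here we simulate one subject's time series instead: 176 time points x 116 ROIs.
rng = np.random.default_rng(0)
n_timepoints, n_rois = 176, 116
time_series = rng.standard_normal((n_timepoints, n_rois))

# Functional connectivity = pairwise Pearson correlation between ROI signals.
# nilearn's ConnectivityMeasure(kind='correlation') computes the same matrix.
conn_matrix = np.corrcoef(time_series.T)  # shape (116, 116)

# Vectorize the lower triangle (excluding the diagonal) into one feature
# vector per subject: 116 * 115 / 2 = 6670 features.
tril_idx = np.tril_indices(n_rois, k=-1)
features = conn_matrix[tril_idx]
print(conn_matrix.shape, features.shape)  # (116, 116) (6670,)
```

One such feature vector per subject is what the classifiers below are trained on.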
KNN grid-search results (p is the Minkowski distance power):

p    Best Accuracy    k
1    64.55            29
2    63.99            23
SVM vs KNN
SVM vs KNN
• The best accuracy observed for SVM is 67.97, with the RBF (radial) kernel.
• The best accuracy observed for KNN is 64.55, with k = 29, p = 1.
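The tuning described above can be sketched with scikit-learn's grid search. The slides do not list the exact grids searched, so the parameter values below are illustrative, and synthetic data stands in for the connectivity feature vectors.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the connectivity features (172 NYU subjects;
# the real vectors have 6670 features, a smaller matrix keeps this fast).
rng = np.random.default_rng(0)
X = rng.standard_normal((172, 100))
y = rng.integers(0, 2, size=172)

# Illustrative grids -- the exact values searched are not given in the slides.
svm_grid = GridSearchCV(SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}, cv=5)
knn_grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [5, 15, 23, 29], "p": [1, 2]},  # p=1 Manhattan, p=2 Euclidean
    cv=5,
)
svm_grid.fit(X, y)
knn_grid.fit(X, y)
print(svm_grid.best_params_, knn_grid.best_params_)
```

`best_params_` and `best_score_` then give the tuned configuration and its cross-validated accuracy for each classifier.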
Stacked model architecture:
• Base models: LR, SVM, KNN
• Meta model: LR
Performance of Stacked model
• The standalone fine-tuned SVM classifier outperforms the stacked model.
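A minimal sketch of the stacked model (base models LR, SVM, KNN with an LR meta model), using scikit-learn's `StackingClassifier` on synthetic data; the hyper-parameter values shown are placeholders, not the tuned ones from the grid search.

```python
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the connectivity features.
rng = np.random.default_rng(0)
X = rng.standard_normal((172, 50))
y = rng.integers(0, 2, size=172)

# Base models LR, SVM, KNN; meta model LR.
# Placeholder hyper-parameters, for illustration only.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(kernel="rbf", C=1.0)),
        ("knn", KNeighborsClassifier(n_neighbors=29, p=1)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions of the base models train the meta model
)
stack.fit(X, y)
print(stack.score(X, y))
```

The `cv=5` argument matters: the meta model is trained on out-of-fold base-model predictions, which avoids leaking the base models' training fit into the meta model.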
Classification using Neural Networks
Neural Networks
• The best classical ML classifier for the task is SVM, with accuracy as
high as 67.97.
• Let’s see whether neural networks can outperform the SVM.
Tuning Neural Networks
• Neural Networks have a large number of hyper-parameters
• # of hidden layers
• # of nodes in each hidden layer
• Non-linear activation used in hidden layers
• Learning rate
• Optimization algorithms (Adam, Momentum, RMSProp, Adagrad)
• Regularization algorithms (Dropout, L2-regularization)
• .
• .
• .
Tuning Neural Networks
• For this task, I am using sklearn’s MLP API.
• Some background information:
• It uses the ‘adam’ solver by default, a stochastic gradient-based
optimizer.
• We can vary the (initial) learning rate and the architecture
• I trained the model for 3000 epochs.
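The setup above can be sketched with `MLPClassifier`. The slides do not report the architecture or learning rate finally chosen, so the layer sizes and learning rate below are illustrative; synthetic data again stands in for the connectivity features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the connectivity features.
rng = np.random.default_rng(0)
X = rng.standard_normal((172, 50))
y = rng.integers(0, 2, size=172)

# solver='adam' is the default (a stochastic gradient-based optimizer);
# the architecture and initial learning rate are the main knobs varied.
# These particular values are illustrative, not the ones from the slides.
mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),
    activation="relu",
    learning_rate_init=1e-3,
    max_iter=3000,  # trained for up to 3000 epochs, as in the slides
    random_state=0,
)
mlp.fit(X, y)
print(mlp.score(X, y))
```

Note that with a stochastic solver `max_iter` counts epochs, and training stops earlier if the loss plateaus (controlled by `tol` and `n_iter_no_change`).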
Performance of NN