Prediction of Autism Spectrum Disorder
YELAHANKA, BANGALORE.
Department of Computer Science & Engineering
❏ The choice of data depends entirely on the problem you are trying to solve.
Picking the right data must be your goal. Luckily, almost every topic you can think
of has several datasets that are public and free.
Three good free websites for finding datasets are:
❏ Kaggle, which is well organized. Its datasets are described in detail, with
information on the features, data types, and number of records. You can also use
its kernels, so you do not have to download the dataset.
❏ Reddit, which is great for requesting the datasets you want.
❏ Google Dataset Search, which is still in beta but already very useful.
System Analysis
Handling the data:
➢ This is one of the hardest steps and the one that will probably take the longest,
unless you are lucky enough to have a complete, clean dataset, which is rarely the
case. Handling missing data in the wrong way can seriously distort the results.
➢ Common ways of handling missing values include:
○ Null value replacement
○ Mode / median / average value replacement
○ Deleting the whole record
○ Interpolation / extrapolation
○ Forward filling / backward filling (hot deck)
○ Multiple imputation
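A few of the strategies listed above can be sketched in plain Python. This is only an illustrative sketch: the helper names (`fill_with`, `forward_fill`, `drop_incomplete`) and the toy `ages` column are assumptions made for this example, not part of the project code, and missing values are assumed to be stored as `None`.

```python
from statistics import mean, median, mode

def fill_with(values, strategy):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def drop_incomplete(records):
    """Delete every record (dict) that has at least one missing field."""
    return [r for r in records if None not in r.values()]

# Hypothetical toy column with two missing entries.
ages = [4, None, 6, 4, None, 10]
mean_filled = fill_with(ages, "mean")
median_filled = fill_with(ages, "median")
mode_filled = fill_with(ages, "mode")
ffilled = forward_fill(ages)

# Hypothetical toy records; the second one is incomplete and gets dropped.
records = [{"age": 4, "score": 7}, {"age": None, "score": 3}]
complete = drop_incomplete(records)
```

Which strategy is appropriate depends on the column: deleting records is simplest but shrinks the dataset, while mean or median replacement keeps every record at the cost of reducing the variance of the column.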
REQUIREMENT ANALYSIS
Hardware Requirements:
▪ System : Pentium IV 2.4 GHz.
▪ Hard Disk : 500 GB.
▪ RAM : 4 GB
▪ Any desktop / laptop system with the above configuration or higher
Software Requirements:
▪ Operating system : Windows XP / 7
▪ Coding Language : Python
▪ Software : Anaconda
▪ IDE : Jupyter Notebook
▪ Database : SQLite
System Analysis
Feature Selection:
❏ Feature engineering is the process of using domain knowledge of the data to create
features that make machine learning algorithms work. Done correctly, it increases the
predictive power of machine learning algorithms by creating features from raw data
that facilitate the learning process.
❏ Feature engineering often makes the difference between a good model and a bad one.
It transforms raw data into features that better represent the underlying problem to
the predictive models, resulting in improved model accuracy on unseen data.
•A1 Does your child look at you when you call his/her name?
•A2 How easy is it for you to get eye contact with your child?
•A3 Does your child point to indicate that s/he wants something? (e.g. a toy that is
out of reach)
•A4 Does your child point to share interest with you? (e.g. pointing at an interesting
sight)
•A7 If you or someone else in the family is visibly upset, does your child show signs
of wanting to comfort them? (e.g. stroking hair, hugging them)
•A8 Would you describe your child’s first words as:
Next, we slice the single data set into a training set and a test set.
Make sure that the test set is representative of the data set as a whole. In other
words, don't pick a test set with different characteristics than the training set.
Assuming that the test set meets this condition, the goal is to create a model that
generalizes well to new data. The test set serves as a proxy for new data.
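One way to keep the test set representative is a stratified split, which preserves the class proportions in both parts. The sketch below uses only the standard library. The function name `stratified_split`, the 20% test fraction, the seed, and the toy labels are assumptions chosen for illustration, not the settings used in this project.

```python
import random
from collections import defaultdict

def stratified_split(records, labels, test_fraction=0.2, seed=42):
    """Split (record, label) pairs so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random choice of which rows of this class go to the test set
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    train = [(records[i], labels[i]) for i in train_idx]
    test = [(records[i], labels[i]) for i in test_idx]
    return train, test

# Hypothetical toy data: 20 records, 25% of them in the positive class.
records = list(range(20))
labels = [1] * 5 + [0] * 15
train, test = stratified_split(records, labels)
```

With these toy labels, both the training set and the test set end up with 25% positive cases, so the test set has the same class balance as the data it stands in for.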
System Implementation
Step 1 − Select random samples from the given dataset.
Step 2 − Construct a decision tree for every sample and get the prediction result
from every decision tree.
Step 3 − Perform voting over the predicted results.
Step 4 − Select the most voted prediction result as the final prediction result.
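The four steps above can be sketched in plain Python. This is a deliberately simplified illustration, not the project's implementation: each "tree" is a one-split decision stump over binary 0/1 features, the per-split random feature selection of a full random forest is omitted, and the names `fit_stump`, `fit_forest`, `predict_forest` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Fit a one-split decision tree (a stump) on binary 0/1 features."""
    fallback = Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        # Collect the labels that fall on each side of the split "feature f is 0 or 1".
        sides = {0: [], 1: []}
        for row, y in zip(rows, labels):
            sides[row[f]].append(y)
        # Each side predicts its majority label (or the overall majority if empty).
        pred = {v: (Counter(ys).most_common(1)[0][0] if ys else fallback)
                for v, ys in sides.items()}
        errors = sum(pred[row[f]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, pred)
    _, f, pred = best
    return lambda row: pred[row[f]]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        # Step 1: draw a random sample of the rows, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: build one tree for this sample.
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees

def predict_forest(trees, row):
    # Step 3: every tree votes. Step 4: the most voted class wins.
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the label simply equals the first feature.
rows = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
        [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
trees = fit_forest(rows, labels)
```

Calling `predict_forest(trees, [1, 0, 0])` then returns the class that most of the 25 stumps vote for. Because every tree sees a different random sample, a single unlucky sample has little influence on the final vote.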
Model 2-SVM
An SVM model represents the different classes as regions of a multidimensional
space separated by a hyperplane. SVM generates the hyperplane iteratively so that
the classification error is minimized. The goal of SVM is to divide the dataset into
classes by finding a maximum marginal hyperplane (MMH).
Support Vectors − The data points that are closest to the hyperplane are called
support vectors. The separating line is defined with the help of these data points.
Margin − The gap between the two lines that pass through the closest data points
of the different classes. It can be calculated as the perpendicular distance from the
separating line to the support vectors. A large margin is considered a good margin
and a small margin is considered a bad margin.
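The perpendicular distance mentioned above can be computed directly: for a hyperplane w·x + b = 0, the distance of a point x is |w·x + b| / ‖w‖. The sketch below applies this to a hand-picked hyperplane and toy points. The values of `w`, `b`, and `points` are assumptions for illustration only; a real SVM would learn `w` and `b` from the training data.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance of point x from the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def closest_points(w, b, points, tol=1e-9):
    """Return the smallest distance and the points that attain it (the support vectors)."""
    dists = [distance_to_hyperplane(w, b, p) for p in points]
    d_min = min(dists)
    support = [p for p, d in zip(points, dists) if d - d_min <= tol]
    return d_min, support

# Hypothetical separating line x1 + x2 - 4 = 0 and four toy points.
w, b = (1.0, 1.0), -4.0
points = [(1, 1), (1, 2), (3, 2), (4, 4)]
d_min, support = closest_points(w, b, points)
```

Here `d_min` is the distance from the line to the support vectors. When the line sits midway between the two classes, the full gap between the two boundary lines is twice this value.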
RESULT ANALYSIS
❏ A confusion matrix is the easiest way to measure the performance of a classification
model whose output can be of two or more classes.
❏ A confusion matrix is a table with two dimensions, “Actual” and “Predicted”. For
a two-class problem its four cells hold the counts of “True Positives (TP)”, “True
Negatives (TN)”, “False Positives (FP)”, and “False Negatives (FN)”.
The terms associated with a confusion matrix are explained as follows:
•True Positives (TP) − The actual class and the predicted class of the data point
are both 1.
•True Negatives (TN) − The actual class and the predicted class of the data point
are both 0.
•False Positives (FP) − The actual class of the data point is 0 and the predicted
class is 1.
•False Negatives (FN) − The actual class of the data point is 1 and the predicted
class is 0.
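The four counts defined above can be tallied directly from the actual and predicted labels. The following is a minimal sketch: the function name `confusion_counts` and the toy label lists are assumptions for illustration and are not results from this project.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels, where 1 is the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:  # a == 1 and p == 0
            counts["FN"] += 1
    return counts

# Hypothetical toy labels, not project results.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)

# Accuracy is the share of correct predictions: (TP + TN) / total.
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
```

For these toy labels the counts are TP = 3, TN = 3, FP = 1, FN = 1, which gives an accuracy of 0.75.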