
BCSE497J Project I Report

The document presents a project focused on the early detection of Alzheimer's disease using ensemble machine learning techniques. It outlines the significance of early diagnosis for improving patient outcomes and details the methodologies employed, including various machine learning models like Random Forests and Gradient Boosting Machines. The findings suggest that these techniques can enhance prediction accuracy, paving the way for timely interventions and better management of the disease.

Uploaded by

nishantha3762

B.Tech.

BCSE497J - Project-I

TECHNIQUES FOR CLASSIFYING EARLY DETECTION OF
ALZHEIMER'S DISEASE

Submitted in partial fulfillment of the requirements for the degree of

Bachelor of Technology
in
B.Tech
by

21BCT0093 Arun SR
21BCE2401 Dharun BS
21BCE3574 Kishore Kumaar RK

Under the Supervision of


Sahaya Arul Mary
Designation
School of Computer Science and Engineering (SCOPE)

November 2024
ACKNOWLEDGEMENTS

I am deeply grateful to the management of Vellore Institute of Technology (VIT) for providing
me with the opportunity and resources to undertake this project. Their commitment to fostering a
conducive learning environment has been instrumental in my academic journey. The support and
infrastructure provided by VIT have enabled me to explore and develop my ideas to their fullest
potential.

My sincere thanks to Dr. Ramesh Babu K, the Dean of the School of Computer Science and
Engineering (SCOPE), for his unwavering support and encouragement. His leadership and vision
have greatly inspired me to strive for excellence. The Dean’s dedication to academic excellence
and innovation has been a constant source of motivation for me. I appreciate his efforts in creating
an environment that nurtures creativity and critical thinking.

I express my profound appreciation to [Head of Department’s Name], the Head of the
[Department Name], for his/her insightful guidance and continuous support. His/her expertise and
advice have been crucial in shaping the direction of my project. The Head of Department’s
commitment to fostering a collaborative and supportive atmosphere has greatly enhanced my
learning experience. His/her constructive feedback and encouragement have been invaluable in
overcoming challenges and achieving my project goals.

I am immensely thankful to my project supervisor, [Supervisor’s Name], for his/her dedicated
mentorship and invaluable feedback. His/her patience, knowledge, and encouragement have been
pivotal in the successful completion of this project. My supervisor’s willingness to share his/her
expertise and provide thoughtful guidance has been instrumental in refining my ideas and
methodologies. His/her support has not only contributed to the success of this project but has also
enriched my overall academic experience.

Thank you all for your contributions and support.

Name of the Candidate

TABLE OF CONTENTS

Sl.No Contents

Abstract
1. INTRODUCTION
1.1 Background
1.2 Motivations
1.3 Scope of the Project
2. PROJECT DESCRIPTION AND GOALS
2.1 Literature Review
2.2 Research Gap
2.3 Objectives
2.4 Problem Statement
2.5 Project Plan
3. TECHNICAL SPECIFICATION
3.1 Requirements
3.1.1 Functional
3.1.2 Non-Functional
3.2 Feasibility Study
3.2.1 Technical Feasibility
3.2.2 Economic Feasibility
3.2.3 Social Feasibility
3.3 System Specification
3.3.1 Hardware Specification
3.3.2 Software Specification
4. DESIGN APPROACH AND DETAILS
4.1 System Architecture
4.2 Design
4.2.1 Data Flow Diagram
4.2.2 Use Case Diagram
4.2.3 Class Diagram
4.2.4 Sequence Diagram
5. METHODOLOGY AND TESTING
<< Module Description >>
<< Testing >>
6. PROJECT DEMONSTRATION
7. RESULT AND DISCUSSION
8. CONCLUSION
9. REFERENCES
APPENDIX A – SAMPLE CODE

List of Abbreviations

Abbreviation Full Form


AD Alzheimer’s Disease
ML Machine Learning
MRI Magnetic Resonance Imaging
PET Positron Emission Tomography
SVM Support Vector Machine
k-NN k-Nearest Neighbors
CNN Convolutional Neural Network
ROC Receiver Operating Characteristic
IDE Integrated Development Environment
DFD Data Flow Diagram
F1 Score F1 Measure (Harmonic mean of Precision and Recall)

ABSTRACT
Alzheimer's disease (AD) is one of the most prevalent forms of dementia,
characterized by progressive cognitive decline that impairs memory, thinking, and
daily functioning. While early symptoms are subtle, they gradually worsen, leading
to severe impairment and dependency. A major challenge in combating Alzheimer's
is the lack of a cure, with current treatments primarily aimed at managing
symptoms. This makes early detection of the disease crucial, as interventions at
earlier stages may help slow its progression and mitigate its impact. Early diagnosis
allows for timely therapeutic actions, improved care planning, and a better quality
of life for patients and caregivers.

In this study, we explored the use of ensemble machine learning models to predict
the onset of Alzheimer’s disease. Ensemble learning combines multiple models to
improve prediction accuracy, making it an ideal approach for a complex condition
like Alzheimer’s, where a multitude of factors—including genetic, clinical, and
lifestyle variables—contribute to disease progression. Specifically, we implemented
Random Forests, Gradient Boosting Machines (GBMs), and Voting Classifiers to
create a robust prediction framework. Additionally, we employed Stacked
Generalization (stacking), which combines multiple base models with a meta-
classifier to further enhance predictive performance.

The findings of this study highlight the effectiveness of ensemble machine learning
techniques in predicting the early onset of Alzheimer’s disease. By leveraging these
models, we can improve the early identification of individuals at risk, enabling more
timely interventions. Future work may include incorporating deep learning
approaches and expanding datasets to further enhance prediction accuracy.
Ultimately, these advances can contribute to better management of Alzheimer’s
disease, offering hope for slowing its progression and improving patient outcomes.

1. INTRODUCTION

Alzheimer's disease (AD) is a chronic, progressive neurological disorder in which brain cells
degenerate and die, causing the brain to shrink (atrophy) and memory to deteriorate. Genetics,
aging, and environmental factors all contribute to AD. The disease causes a continuous decline
in behavioral and social skills that affects a person's ability to function independently.
Forgetting recent conversations or events is one of its early symptoms. As the disease
progresses, a person with Alzheimer's disease develops severe memory impairment and loses
the ability to perform daily tasks.

Approximately 29.5 million people worldwide suffered from Alzheimer’s disease in 2015. It
most often begins in people over the age of 65, but 4% to 5% of cases are early-onset and begin
before that age. Dementia caused about 1.9 million deaths in 2015. AD is one of the most
financially costly diseases in developed countries. In India, more than 4 million people suffer
from some form of dementia. By 2050, the number of people suffering from AD is set to triple.
According to one source, approximately 5.8 million Americans aged 65 and older have
Alzheimer's disease, and eighty percent of them are 75 or older. Between 60% and 70% of the
estimated 50 million dementia sufferers worldwide are thought to have Alzheimer's disease.

Alzheimer's disease has no cure. Complications such as dehydration, malnutrition, or infection
occur in the advanced stages of the disease and lead to death. This places a huge psychological
and economic burden on people, society, and the country. No effective drug existed for a very
long time. Aduhelm was recently approved as the first therapeutic drug, but it has not yet
shown its efficacy.

Machine learning methods have been explored and used in many medical areas, such as lung
cancer, skin cancer, and breast cancer, and machine learning now plays a key role in
health-related fields. Machine learning provides novel techniques to address high-dimensional
data, integrate data from different sources, model etiological and clinical heterogeneity,
and discover new biomarkers. These directions have the potential to help us better manage
disease progression and develop novel treatment strategies. The aim of this work is to detect
Alzheimer's disease at an early stage and to summarize the different ML methods that have
been applied to study AD.

1.1 Background:
Alzheimer's disease (AD) is a progressive neurological disorder that primarily affects memory,
thinking, and behavior. It is a major cause of disability and dependency among older adults, with
early detection being crucial for managing and potentially slowing its progression. Advances in
machine learning provide promising methods for detecting AD at an earlier stage, potentially
before symptoms become severe.
1.2 Motivations:
The motivation behind this project is to harness machine learning for early Alzheimer’s detection.
Accurate early diagnosis could significantly enhance patient care and slow progression through
timely intervention. This project seeks to apply and assess machine learning classifiers to support
clinicians in recognizing early AD indicators.
1.3 Scope of the Project:
The project aims to develop a system that classifies early signs of Alzheimer's using machine
learning techniques. It focuses on experimenting with classifiers, preprocessing techniques, and
model evaluation to enhance accuracy, and provides a foundational framework that could be
expanded for clinical use.

2. PROJECT DESCRIPTION AND GOALS

2.1 Literature Review

Each entry below lists the paper number, followed by its Research Work, Model Used, and Future Work.

1. Research Work: Investigate either the behavior of the main existing off-the-shelf CNNs or a deep ensemble-based strategy aimed at realizing a comprehensive CAD framework based on patient MRIs and fMRIs.
   Model Used: 2D CNN, 3D CNN, Ensemble models.
   Future Work: Combining HC and CNN features. Handle unbalanced classes.

2. Research Work: To diagnose AD and MCI, the ISDL model uses joint deep feature extraction and critical cortical region identification.
   Model Used: RVM, CNN, Attention-CNN, en3D CNN, SCFR, 3D GCNet, 3D Efficient-B0, ISDL.
   Future Work: Deep learning model capable of dealing with inter-class similarity and intra-class variation.

3. Research Work: Systematically review the current state of using deep learning techniques in the diagnosis of Alzheimer's disease using neuroimaging data.
   Model Used: Multitask framework based on LSTM, multi-modal or multi-data approach, 2D CNN, 3D-CapsNet, and 3D-AEs.
   Future Work: Future efforts will be made to address poor performance brought on by small datasets due to the likelihood of overfitting occurrences.

4. Research Work: Deep learning model for auxiliary Alzheimer's disease diagnosis that simulates the clinical diagnostic process.
   Model Used: CNN, ANN, SVM, linear method.
   Future Work: NIL

5. Research Work: Assessing the recent memory loss in interactions with people and between the virtual and physical worlds; abnormal recognition, expression, and understanding of diagnostic differences in language issues words.
   Model Used: IoT systems, Machine Learning Models and Deep Learning Models.
   Future Work: Early Prediction of Alzheimer’s Disease.

6. Research Work: Multi-task learning approach based on hybrid feature maps and a high-order discriminative convolutional Boltzmann machine.
   Model Used: SVM, DBN-3, GDBM-2, FitNet-10, GoogleNet, CDBN etc.
   Future Work: If the discriminant version of DCssCDBM acts better, investigate it to see if it might be used for early disease identification, such as epilepsy detection.

7. Research Work: Comparative analysis of around 100 publications published since 2019 that use generative models, CNNs, and other fundamental deep architectures for Alzheimer's Disease diagnosis.
   Model Used: Deep feed forward neural networks (DFFNN), Convolutional neural networks (CNNs), Recurrent neural networks (RNNs) and Deep polynomial networks (DPNs).
   Future Work: NIL

8. Research Work: Proposed a technique for analyzing medical data with the goal of identifying risk categories.
   Model Used: APRIORI algorithm and Generation of Association Rules.
   Future Work: NIL

9. Research Work: Alzheimer’s disease classification using SVM and Artificial Neural networks to distinguish various stages of the disease.
   Model Used: SVM and ANN.
   Future Work: Research work is needed to devise Deep Learning algorithms to integrate data from several early detection modalities.

10. Research Work: Proposing an efficient framework for identifying epistatic interactions between all pairs of nucleotides in a DNA sequence by Integrating Multifactor Dimensionality Reduction (MDR) with Deep Learning.
    Model Used: MDR model using Deep Learning.
    Future Work: NIL

11. Research Work: Proposing a uni-data, a multi-model framework for Alzheimer's disease detection which is implemented using five-fold cross-validation which boosts the classification performance and thereby reduces overfitting.
    Model Used: 3D-ResNet, Random Tree Embedding, XGBoost, ensemble models.
    Future Work: Extending the model to detect both AD and MCI. Also plans to add brain images acquired from imaging modalities to increase the diversity of individual learners.

12. Research Work: A conditional deep triplet network model is used to overcome the limitation of lack of image data (limited image samples) and to provide higher accuracy with minimum samples.
    Model Used: Deep Triplet network.
    Future Work: NIL

13. Research Work: Early prediction of Alzheimer's disease.
    Model Used: Support vector machine and decision tree.
    Future Work: They are combining MRI scans with psychological parameters.

14. Research Work: Alzheimer’s disease in the ADNI goal is to investigate whether Positron Emission Tomography (PET), sequential Magnetic Resonance Image (MRI), and the biotic markers, neuropsychological and the objective evaluation are being connected.
    Model Used: Neural networks, Random forest, SVM, KNN, Gradient Boost.
    Future Work: NIL

15. Research Work: The preprocessed fMRI 4D data in Nifti format were concatenated across z and t axes and then converted to a stack of 2D images in JPEG using Neuroimaging package Nibabel and Python OpenCV. Next, images were labeled for binary classification of Alzheimer’s Vs Normal data. The labeled images were converted to lmdb storage Databases for high-throughput to be fed into Deep Learning platform. LeNet model which is based on Convolutional Neural Network architecture from Caffe DIGITS 0.2 - deep learning framework (Nvidia version) - was used to perform binary image classification.
    Model Used: Neural networks - CNN.
    Future Work: Need to generalize this method for all age groups and extend this method for other stages of Alzheimer's disease as well.

16. Research Work: Six different machine learning models are used to find the five different stages of Alzheimer’s disease using ADNI dataset.
    Model Used: K-NN, Naive Bayes, Decision Tree, Rule Induction, Generalized Linear Model, Deep learning models.
    Future Work: The accuracy of the AD stages classification could be further improved by increasing the number of instances for EMCI and SMC classes so that the model can be trained with sufficient and balanced data for all classes.

17. Research Work: They created a deep learning architecture with stacked auto-encoders and a softmax output layer to circumvent the issue and aid in the diagnosis of AD and its symptomatic stage (Mild Cognitive Impairment).
    Model Used: Auto encoders, softmax regression, ROI sensitivity evaluation.
    Future Work: Fine tune parameters.

18. Research Work: Review paper based on Diagnosis of Alzheimer’s disease using Machine learning models.
    Model Used: Paperwork based on major models such as ANN, SVM, DL, deep learning and ensemble learning were discussed.
    Future Work: Novel variants of ANN can be used. More importance must be given to clinical interpretability of deep learning models.

19. Research Work: Developed models that can extract and classify digital EEG signal (dEEG) dataset patterns using an ML technique known as Support Vector Machines (SVM).
    Model Used: SVM models have been used to search patterns in EEG epochs.
    Future Work: SVM parameters can be tuned.

20. Research Work: Modelling our brain using deep learning method in such a way that it differentiates normal brain and Alzheimer’s disease affected brain.
    Model Used: SVM, DNN.
    Future Work: Accuracy is only enough to test in a real clinic. Not enough to trust in the real-time process.
2.2 Research Gap:
While progress has been made in detecting Alzheimer’s at an advanced stage, early-stage detection
remains challenging. Many existing models are optimized for accuracy with specific datasets but
often fail to generalize across different populations. Furthermore, deep learning models used in
imaging require extensive computational resources and can be prohibitive for non-specialized clinical
settings. This project seeks to bridge this gap by evaluating classifiers that are less resource-intensive,
can work with tabular clinical data, and may be generalized to detect early signs of Alzheimer’s
disease across diverse datasets.
2.3 Objectives:
The primary objectives of this project are as follows:
1. Develop a classification model that can accurately predict early Alzheimer’s indicators using
clinical data.
2. Evaluate and compare the effectiveness of machine learning models like Random Forest and
Voting classifiers, considering factors such as accuracy, sensitivity, and specificity.
3. Explore deep learning techniques to assess if they provide substantial improvements in early
detection accuracy.
4. Identify optimal data preprocessing techniques, such as normalization and feature selection,
to enhance model performance and consistency across various datasets.
2.4 Problem Statement:
Alzheimer’s disease affects millions of individuals worldwide, and early detection is critical for
effective management. Despite advancements, detecting Alzheimer’s in its early stages remains
difficult and requires a more efficient, accurate classification approach that is accessible in clinical
environments. This project aims to address this problem by developing a machine learning model that
enhances early-stage detection accuracy, potentially aiding in quicker, more reliable diagnostic
processes.
2.5 Project Plan:
The project follows a systematic plan with these key stages:
1. Data Collection and Preprocessing: Gather and preprocess clinical data, focusing on selecting
features relevant to Alzheimer’s progression. Apply data cleaning, normalization, and feature
engineering to ensure model readiness.
2. Model Training and Evaluation: Train and evaluate models like Random Forest, Voting classifiers,
and, potentially, deep learning models. Perform cross-validation to assess performance metrics and
ensure reliability.
3. Comparative Analysis: Compare model performance based on criteria like accuracy, sensitivity,
specificity, and computational efficiency. Highlight any trade-offs between model complexity and
effectiveness.
4. Result Interpretation and Recommendations: Analyze results to determine the most effective
model, provide insights into clinical applicability, and discuss potential improvements and extensions
for future work.

3. TECHNICAL SPECIFICATION
• 3.1 Requirements
o 3.1.1 Functional Requirements: Outline the essential functions the system must perform to
meet project goals.
▪ Data Preprocessing: The system must clean, normalize, and transform raw data to
ensure compatibility with machine learning algorithms.
▪ Model Training and Evaluation: It should support training with various machine
learning models, particularly Random Forest, Voting classifiers, and potentially deep
learning models.
▪ Classification: The system should classify Alzheimer’s stages with a focus on early
detection and generate predictive outputs.
▪ Performance Monitoring: Log and track model accuracy, sensitivity, specificity, and
execution time to refine the approach.
o 3.1.2 Non-Functional Requirements: Define performance-related requirements that enhance
the system's usability and reliability.
▪ Accuracy: Achieve a high level of prediction accuracy, particularly in early-stage
Alzheimer’s classification.
▪ Scalability: Ensure the system can handle larger datasets in case of future expansions.
▪ Usability: Provide an intuitive interface or clear outputs that healthcare professionals
can interpret easily.
▪ Security: Implement data protection measures to safeguard sensitive patient
information.
▪ Efficiency: Optimize processing time to allow faster model training and predictions.
• 3.2 Feasibility Study
o 3.2.1 Technical Feasibility: Assess the project's technical requirements, including the
availability of machine learning libraries (e.g., scikit-learn, TensorFlow) and hardware
capabilities for model training. Confirm that the team has the necessary skills and resources to
develop, test, and deploy the system.
o 3.2.2 Economic Feasibility: Evaluate the cost-effectiveness of the project, considering
resource requirements like computational power, potential costs of data acquisition, and
software licensing. Determine if benefits, such as improved diagnosis accuracy and potential
clinical utility, justify the costs.
o 3.2.3 Social Feasibility: Examine the social impact, specifically in improving early diagnosis
of Alzheimer’s. Highlight potential benefits, such as better patient outcomes, enhanced
healthcare efficiency, and increased awareness of Alzheimer’s symptoms among clinicians and
caregivers.
• 3.3 System Specification
o 3.3.1 Software Specification: Specify the software tools, libraries, and environments needed
for the project:
▪ Programming Languages: Python for data handling and model development.
▪ Libraries: scikit-learn for machine learning algorithms, TensorFlow or PyTorch for
potential deep learning implementations.
▪ Data Processing Tools: Pandas and NumPy for data manipulation, and Matplotlib or
Seaborn for data visualization.
▪ Development Environment: Jupyter Notebook or any IDE suitable for collaborative
development and code management.

4. DESIGN APPROACH AND DETAILS
4.1 System Architecture

4.2 Design
4.2.1 Data Flow Diagram

4.2.2 Use Case Diagram

4.2.3 Class Diagram

4.2.4 Sequence Diagram

5. METHODOLOGY AND TESTING

We first obtained a dataset for Alzheimer’s disease prediction. The dataset contained 374 records
and 15 attributes, of which the “Group” attribute was taken as the target attribute. The dataset
contained many null values, so we pre-processed it by removing the rows that contained null
values. The redundant columns/attributes were also dropped so that the models could be trained
more efficiently. After pre-processing, the dataset was split into training and testing data. We then
fitted 5 different machine learning models (ensemble/hybrid models) to the dataset, and the
corresponding accuracies were obtained and compared.
The main goal of this paper is to predict the presence of Alzheimer's disease given a set of
attributes. To achieve this, we implemented 5 different ensemble models, obtained their
accuracies, and concluded which was the best-fitting ensemble model.
4.1. DATASET
The dataset used was obtained from the Kaggle website and is named “Detecting Early
Alzheimer’s” (OASIS dataset). The dataset contained 374 different records with 15 attributes.
The chosen target attribute was “Group”, which contained the values “Demented”,
“Non-Demented” and “Converted”.
4.2. DATA PRE-PROCESSING
Data pre-processing is an essential phase of the data analysis life cycle: it cleans the data so
that it can be used to obtain accurate and efficient results.
The pre-processing steps that were implemented are as follows:
1. Identifying Null Values.
2. Removal of Null values from the Dataset.
3. Identifying the Rows which contained the Null Values.
4. Removal of all the Rows that contained Null values.
5. Replacement of Values of Group Attribute from Converted to Demented.
6. Encoding the values of Target Attribute (Group) [Demented - 0, Non-Demented - 1]
7. Dropping of redundant columns.
8. Identifying Rows with missing values.
9. Removal of all the Rows with missing values.
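The steps listed above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the toy DataFrame, its values, the helper name `preprocess`, and the choice of "Subject ID" as the redundant column are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def preprocess(df, target="Group", drop_cols=("Subject ID",)):
    """Apply the listed pre-processing steps to a copy of df (hypothetical helper)."""
    # Steps 1 and 3: count the null values per column before removing anything.
    null_counts = df.isnull().sum()
    # Steps 2, 4, 8 and 9: drop every row that contains a null/missing value.
    clean = df.dropna().copy()
    # Step 5: treat "Converted" subjects as "Demented".
    clean[target] = clean[target].replace({"Converted": "Demented"})
    # Step 6: encode the target (Demented -> 0, Non-Demented -> 1).
    clean[target] = clean[target].map({"Demented": 0, "Non-Demented": 1})
    # Step 7: drop columns assumed to be redundant for prediction.
    clean = clean.drop(columns=list(drop_cols))
    return clean.reset_index(drop=True), null_counts


# Toy stand-in for the real dataset; the values are made up for illustration.
raw = pd.DataFrame({
    "Subject ID": ["S1", "S2", "S3", "S4"],
    "Group": ["Demented", "Non-Demented", "Converted", "Non-Demented"],
    "Age": [75, 80, 71, 68],
    "MMSE": [23.0, 29.0, 27.0, np.nan],
})

clean, null_counts = preprocess(raw)
print(null_counts)
print(clean)
```

Counting nulls before dropping rows keeps a record of how much data each column lost, which is worth reporting when the dataset is as small as this one.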
4.3. DATA SPLITTING
The Pre-Processed data is now split into Training and Testing data as follows:
1. Splitting the Dataset into variables X and Y.
2. Here X variable contains the Predictor Attributes and Y variable contains the Target Attribute.
3. The X and Y variables were divided into Training and Testing sets respectively.
4. The split ratio that was implemented was 80:20.
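A minimal sketch of the 80:20 split with scikit-learn is shown below. The synthetic data from `make_classification` stands in for the pre-processed dataset, and the `random_state` and `stratify` arguments are assumptions added for reproducibility and class balance; the report does not say whether they were used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pre-processed data: 100 rows, 8 predictor attributes.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# test_size=0.2 yields the 80:20 split; random_state and stratify are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```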
4.4. DATA MODELING
The Training and Testing sets are then applied to fit various Machine Learning ensemble
models to determine the appropriate model to be used for the classification of early Alzheimer’s
disease. The various models used are listed and discussed below:
4.4.1. Random Forest Classifier Model
Random Forest is a well-known supervised machine learning algorithm that can be
applied to both classification and regression problems. It is founded on ensemble
learning, the process of combining several classifiers to solve a challenging problem
and enhance the performance of the model. As the name implies, a Random Forest is a
classifier that "contains a number of decision trees on various subsets of the provided
dataset and takes the average to enhance the predictive accuracy of that dataset."
Instead of depending on a single decision tree, the random forest collects the
predictions of all of the trees and outputs the class that received the most votes.
Algorithm:
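from sklearn.ensemble import RandomForestClassifier
# Build the forest with default settings and fit it on the training split.
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
# Mean accuracy on the held-out test split.
clf.score(X_test, y_test)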
4.4.2. Gradient Boosting Classifier Model
Each predictor in gradient boosting aims to improve on the one before it by
reducing its errors. The distinctive idea of gradient boosting is that, rather than
fitting a predictor to the data at each iteration, it fits a new predictor to the residual
errors made by the preceding predictor. The resulting technique, known as
gradient-boosted trees when a decision tree is the weak learner, often outperforms
random forest. A gradient-boosted trees model is built in the same stage-wise fashion
as earlier boosting techniques, but it generalizes them by allowing the optimization of
any differentiable loss function.
Algorithm:
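from sklearn.ensemble import GradientBoostingClassifier
# Fit gradient-boosted trees with default settings on the training split.
clf = GradientBoostingClassifier()
clf.fit(X_train, y_train)
# Mean accuracy on the held-out test split.
clf.score(X_test, y_test)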
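The residual-fitting idea described above can be illustrated with a hand-rolled boosting loop. The sketch below is a simplification under stated assumptions: it uses a regression problem with squared loss (where the residual is exactly the negative gradient), synthetic data, and arbitrary values for the learning rate, tree depth, and number of stages. `GradientBoostingClassifier` applies the same stage-wise scheme to a classification loss instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.5                      # assumed value for the sketch
prediction = np.full_like(y, y.mean())   # stage 0: a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(20):
    # For the loss 0.5 * (y - f)^2, the residual y - f is the negative gradient.
    residuals = y - prediction
    # Fit a shallow tree (the weak learner) to the residuals, not to y itself.
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    # Add a damped copy of the new tree's output to the running prediction.
    prediction += learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

Because each tree predicts the mean residual in its leaves, every stage can only lower (or keep) the training error, which is why the number of stages and the learning rate need to be controlled to avoid overfitting.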
4.4.3. AdaBoost Classifier Model
Yoav Freund and Robert Schapire proposed AdaBoost, or Adaptive Boosting, an
ensemble boosting classifier, in 1996. AdaBoost is an iterative ensemble algorithm that
combines a number of weak classifiers to create a strong classifier with a high degree
of accuracy. The fundamental idea underlying AdaBoost is to re-weight the training
samples and set the classifier weights in each iteration so that later classifiers
concentrate on the observations that earlier ones predicted incorrectly. Any machine
learning method that accepts weights on the training set can be used as the base
classifier.
Algorithm:
from sklearn.ensemble import AdaBoostClassifier
# A fixed random_state makes the boosted ensemble reproducible.
clf = AdaBoostClassifier(random_state=96)
clf.fit(X_train, y_train)
4.4.4. Extra Trees Classifier Model
Extremely Randomized Trees Classifier, also known as Extra Trees Classifier, is a
form of ensemble learning technique that combines the results of various de-correlated
decision trees gathered in a "forest" to produce its classification outcome. It differs
conceptually from a Random Forest Classifier only in how the decision trees in the
forest are built.
Each decision tree in the Extra Trees forest is built from the original training
sample. At each test node, each tree is given a random sample of k features from the
feature set, from which it must choose the best feature to split the data according to a
mathematical criterion (typically the Gini Index).
Algorithm:
from sklearn.ensemble import ExtraTreesClassifier
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
4.4.5. Voting Classifier Model
A voting classifier is a machine learning model that trains a collection of several
models and predicts the output class that has the highest aggregate support among
them.
It simply aggregates the results of each classifier passed into it: with hard voting it
predicts the class that receives the majority of votes, and with soft voting it averages
the predicted class probabilities and picks the most likely class. The idea is to build a
single model that learns from these models and predicts the output from their combined
vote, rather than building separate dedicated models and determining the accuracy of
each one.
Algorithm:
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
# Three heterogeneous base classifiers.
clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
# probability=True is required so that the SVC can take part in soft voting.
clf3 = SVC(kernel='rbf', probability=True)
# Soft voting averages the predicted class probabilities, weighted 2:1:2.
eclf = VotingClassifier(estimators=[('dt', clf1), ('knn', clf2), ('svc', clf3)],
                        voting='soft', weights=[2, 1, 2])
# Fitting the ensemble fits a clone of each base classifier on the training split.
eclf = eclf.fit(X_train, y_train)
eclf.score(X_test, y_test)
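The five ensemble models described above can be fitted and compared on the same split in one loop. The following is a sketch under assumptions: synthetic data replaces the pre-processed dataset, the `random_state` values not given in the report are arbitrary, and the printed accuracies are not the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed data, split 80:20.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=96),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "Voting": VotingClassifier(
        estimators=[
            ("dt", DecisionTreeClassifier(max_depth=4)),
            ("knn", KNeighborsClassifier(n_neighbors=7)),
            ("svc", SVC(kernel="rbf", probability=True)),
        ],
        voting="soft",
        weights=[2, 1, 2],
    ),
}

# Fit every model on the training split and record its test accuracy.
accuracies = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

# Report the models from best to worst test accuracy.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Keeping the split fixed across models makes the accuracies directly comparable, although with a test set this small the ranking can change from one split to another.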
4.5. IMPLEMENTATION
The ML algorithms are implemented using standard libraries such as scikit-learn, whose
tools allow the direct application of various complex ML algorithms.
● Python, version 3.9.12, an open-source programming language, was used to implement the
machine learning models.
● Scikit-learn, version 1.1, supports both supervised and unsupervised machine learning
algorithms. Additionally, it offers a number of tools for data preprocessing, model selection,
model evaluation, and many other utilities. The ensemble models are taken from this library.
Performance Analysis:
Machine learning tasks are associated with evaluation metrics. For classification and regression tasks,
various metrics are available. Some metrics, such as precision, and recall, are useful for a variety of
tasks. Classification and regression are examples of supervised learning, accounting for most machine
learning applications. We should be able to increase our model's overall predictive power using
various metrics for performance assessment before deploying it for production on unrecognized data.
When the respective model is deployed on unseen data, failing to rigorously evaluate the Machine
Learning model using different evaluation metrics and relying solely on accuracy can lead to
problems and poor predictions.
This is useful because it provides a rough target for a machine learning engineer or data scientist to
work forward into. However, the evaluation metric might alter over time due to the nature of
experimentation.
In this project, we evaluated the models using scikit-learn's evaluation tools: the built-in score
method of the estimator, the scoring parameter, and problem-specific metric functions.
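The three evaluation routes named above can be sketched side by side. This is only an illustration under stated assumptions: the data is synthetic (make_classification), and the Random Forest settings and the 75/25 split are placeholders rather than the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption), split into train and test parts.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# 1. Built-in score method: mean accuracy for classifiers.
score_acc = model.score(X_test, y_test)

# 2. Problem-specific metric functions applied to the predictions.
y_pred = model.predict(X_test)
metric_acc = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# 3. The scoring parameter selects the metric used inside cross-validation.
cv_recall = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=5, scoring="recall",
)
```

For a classifier, the score method and accuracy_score report the same number; the metric functions and the scoring parameter are what give access to precision, recall and F1.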
Accuracy, the area under the ROC curve, the confusion matrix, and the classification report are
used to evaluate classification models. We are looking for the Goldilocks model: one that performs
well not only on our dataset but also on previously unseen examples. We could use a validation set
to test different hyperparameters, but since we don't have much data, we use cross-validation.
K-fold cross-validation is the most common type. It entails dividing the data into k folds, then
training the model on k-1 folds and testing it on the remaining fold, so that each fold serves as the
test set once. We prefer the cross-validated accuracy and take it into consideration even though the
mean accuracy on a single train/test split is higher.
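A minimal sketch of k-fold cross-validation follows, assuming k = 5, stratified folds, and synthetic data in place of the project's dataset; none of these choices is stated in the report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=150, n_features=8, random_state=0)

# Five stratified folds: class proportions are kept similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Count how often each sample is held out for testing across the folds.
test_counts = np.zeros(len(y), dtype=int)
for train_idx, test_idx in cv.split(X, y):
    test_counts[test_idx] += 1

# One accuracy value per fold; their mean is the cross-validated accuracy.
fold_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring="accuracy",
)
mean_cv_accuracy = fold_scores.mean()
```

Every sample is used for testing exactly once, which is why the averaged score is less sensitive to one lucky or unlucky split than a single train/test evaluation.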
Starting from the baseline model, we attempted to improve the model through hyperparameter
tuning and then re-evaluated it. Each model has a set of dials (hyperparameters) that control how it
performs; changing these values may improve or degrade performance. The techniques used in this
project are manual tuning, GridSearchCV, and RandomizedSearchCV. After tuning, the accuracy is
lower than that of the baseline model, but other metrics such as precision, recall, and F1-score
improve.
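The two automated search strategies can be sketched as follows. The parameter ranges, the F1 scoring choice, and the synthetic data are assumptions made for illustration; the report does not list the grids that were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Grid search: tries every combination in the grid (2 x 2 = 4 candidates).
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
grid = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="f1"
)
grid.fit(X, y)

# Randomized search: samples a fixed number of candidates (n_iter=5)
# from a larger space instead of trying all of them.
param_dist = {
    "n_estimators": [10, 25, 50, 100],
    "max_depth": [2, 3, 5, None],
    "min_samples_leaf": [1, 2, 4],
}
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0), param_dist,
    n_iter=5, cv=3, scoring="f1", random_state=0,
)
rand.fit(X, y)

best_grid_params = grid.best_params_
best_rand_params = rand.best_params_
```

Scoring on F1 rather than accuracy is one way a tuned model can end up with lower accuracy than the baseline while improving precision and recall, since the search optimizes whichever metric it is given.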

6. PROJECT DEMONSTRATION

The Project Demonstration section illustrates how the Alzheimer’s disease detection model operates, from data
input to prediction output. This section highlights key components, model workflow, and results, demonstrating
the system's capability to accurately classify early-stage Alzheimer’s symptoms.

1. Data Preparation:
o Input Data: The model uses clinical assessment data, focusing on features that may indicate early
cognitive decline related to Alzheimer’s. Preprocessing steps include handling missing values,
normalizing the data, and selecting relevant features to enhance model accuracy.
o Feature Engineering: Key features used in the model are selected based on their significance in
detecting early Alzheimer’s symptoms, allowing the model to focus on meaningful indicators.
2. Model Training and Selection:
o Model Pipeline: The training pipeline involves splitting the data into training, validation, and test sets,
followed by hyperparameter tuning to optimize model performance.
o Classifier Comparison: Classifiers such as Random Forest and Voting classifiers were compared,
with the optimal model chosen based on metrics like accuracy, sensitivity, and specificity. This
comparison helped ensure the model’s robustness and reliability in real-world applications.
3. System Workflow:
o Step-by-Step Demonstration:
1. Data Input: Data is fed into the system in a standardized format.
2. Preprocessing: The system preprocesses the data to ensure compatibility with the model.
3. Model Prediction: The trained model processes the input and generates a prediction, indicating
the likelihood of early-stage Alzheimer’s.
4. Result Output: The system displays the predicted classification and relevant statistical metrics,
offering insights into the likelihood of Alzheimer’s at an early stage.
o Real-Time Analysis: The system is designed to deliver predictions efficiently, making it suitable for
potential clinical use where timely decision-making is crucial.
4. Results Visualization:
o Confusion Matrix: A confusion matrix is presented to illustrate the model’s prediction accuracy and
any misclassifications.

o Performance Metrics: Metrics such as F1 score, precision, and recall are calculated, showing the
model’s effectiveness in distinguishing early-stage Alzheimer’s.
o Graphs and Charts: Additional visualizations, such as ROC curves and feature importance charts,
provide an intuitive view of model performance and key factors affecting predictions.
5. User Interface (if applicable):
o If a user interface is implemented, it is designed to be accessible and user-friendly for healthcare
professionals, displaying patient data, predictions, and summary statistics in an easily interpretable
format.
6. Limitations and Observations:
o Some limitations were observed, including data constraints that may affect model generalizability
across different populations. Despite these challenges, the model performed well in the demonstration
phase, with accurate classifications and valuable insights into early detection.
7. Future Enhancements (if applicable):
o Potential improvements include expanding the dataset, experimenting with additional machine learning
models, and incorporating feedback mechanisms to continually refine the model based on new data.
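The system workflow in item 3 above (data input, preprocessing, model prediction, result output) could be sketched as a single scikit-learn Pipeline. The step names, median imputation, standard scaling, and the synthetic records with injected missing values are all assumptions for illustration, not the project's actual preprocessing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for clinical records (assumption), with about 5% of
# the values blanked out to mimic missing measurements.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Preprocessing and the model are chained so new data follows the same path.
workflow = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # handle missing values
    ("scale", StandardScaler()),                   # normalize the features
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
workflow.fit(X[:150], y[:150])

# Prediction step: class label plus the estimated probability of class 1.
new_records = X[150:]
predicted = workflow.predict(new_records)
likelihood = workflow.predict_proba(new_records)[:, 1]
```

Keeping the preprocessing inside the pipeline means the imputation and scaling statistics are learned from the training records only and then reused unchanged for new input.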

7. RESULT AND DISCUSSION


The Result and Discussion section presents the outcomes of the Alzheimer’s disease detection model,
evaluating its effectiveness in identifying early-stage symptoms and analyzing the implications of the results.

1. Model Performance:
o The model was assessed using accuracy, sensitivity, specificity, and F1 score, key metrics for
evaluating classification models. The Random Forest classifier achieved an accuracy of approximately
95%, indicating reliable performance. The Voting classifier showed a slight improvement with an
accuracy of 97%, suggesting that combining multiple classifiers enhanced overall robustness.
o Confusion Matrix: A confusion matrix shows the model’s true positive, false positive, true
negative, and false negative counts, allowing for deeper insight into misclassification patterns and
areas where the model could improve.
2. Evaluation of Classifiers:
o Random Forest: This classifier demonstrated strong results, particularly in terms of sensitivity,
making it effective at correctly identifying cases with early-stage Alzheimer’s. It also maintained a
balanced specificity, reducing false positives and ensuring accurate predictions.

o Voting Classifier: The Voting classifier, a combination of multiple models, achieved higher accuracy
and reliability. It integrated the strengths of different algorithms, resulting in fewer misclassifications
and better generalization across varying data samples.
3. Feature Importance and Impact:
o Analysis of feature importance identified certain clinical features as significant indicators of early
Alzheimer’s disease. Understanding the importance of each feature enables clinicians to focus on
high-impact areas during assessments, potentially enhancing the interpretability of the model’s
predictions.
4. Discussion of Results:
o Strengths: The model achieved high accuracy and sensitivity, meeting the project’s primary objective
of facilitating early Alzheimer’s detection. The results suggest that machine learning classifiers,
particularly ensemble methods like Voting, can be instrumental in aiding clinical diagnosis.
o Limitations: The model’s performance is influenced by the quality and diversity of the training
dataset. Limited availability of specific demographic or medical data can affect the generalizability of
the results. Additionally, the complexity of the Voting classifier may require more computational
resources, which could limit its feasibility in some clinical settings.
o Cost Analysis (if applicable): The computational cost of running models such as Voting classifiers
may be higher due to the need for multiple classifiers. However, the benefits of enhanced accuracy and
early diagnosis can justify these costs, particularly in specialized clinical settings where early
intervention is critical.
5. Implications and Future Directions:
o The project’s findings indicate that machine learning can play a valuable role in early Alzheimer’s
detection, potentially supporting healthcare providers in recognizing symptoms earlier and improving
patient outcomes.
o Future work could focus on gathering more diverse datasets to improve model generalizability and
exploring advanced deep learning models for enhanced accuracy. Integrating feedback mechanisms for
real-time adaptation based on new data could also increase the model’s effectiveness over time.
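The accuracy, sensitivity, specificity, and F1 score discussed above all derive from the four cells of the confusion matrix. The worked example below uses ten made-up labels purely to show the arithmetic; the values are not the project's results.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only (1 = early-stage AD, 0 = no AD).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# For binary labels 0/1, ravel() yields the cells in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all cases that are correct
sensitivity = tp / (tp + fn)                 # recall: share of AD cases detected
specificity = tn / (tn + fp)                 # share of non-AD cases cleared
precision = tp / (tp + fp)                   # share of AD predictions that are right
f1 = 2 * precision * sensitivity / (precision + sensitivity)
```

Here one AD case is missed and one healthy case is flagged, giving an accuracy of 0.8, a sensitivity of 0.75 and a specificity of 5/6. In a screening setting, sensitivity is usually the figure to watch, since a false negative means a missed early diagnosis.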

8. CONCLUSION
We have thus performed Alzheimer’s disease detection in the early stages of the disease. The Random Forest and
Voting classifiers have the highest accuracy among the classifiers compared. Our analysis is based on a dataset
derived from MRI images: the Open Access Series of Imaging Studies (OASIS) is the organization that converted
the MRI images into longitudinal MRI data. Our model is therefore efficient with data in the format of our dataset,
and image data must first be preprocessed into that format before our implementation can use it. For raw MRI
images, deep learning models such as CNNs, DNNs, and RNNs would be needed, since they are more effective
than classical machine learning classifiers on images. Our model could help ease the economic burden of AD and
address AD-related problems to a good extent.
APPENDIX A – Sample Code
