
Visvesvaraya Technological University

BELAGAVI, KARNATAKA.

A PROJECT REPORT

ON

“Symptosense: Multiple disease prediction using ML”

Submitted to Visvesvaraya Technological University in partial
fulfillment of the requirement for the award of Bachelor of Engineering degree
in Computer Science and Engineering.

Submitted by

Name USN
Mythri S P 4JN21CS089
Nandita G Bhat 4JN21CS094
Priya B J 4JN21CS119
Rakshitha N V 4JN21CS126

Under the guidance of


Mr. Hiriyanna G S B.E., M. Tech
Asst. Professor, Dept. of CS&E,
JNNCE, Shivamogga.

Department of Computer Science & Engineering


Jawaharlal Nehru New College of Engineering
Shivamogga - 577 204

December 2024
National Education Society ®

CERTIFICATE
This is to certify that the project entitled

“Symptosense: Multiple disease prediction using ML”

Submitted by
Name USN
Mythri S P 4JN21CS089
Nandita G Bhat 4JN21CS094
Priya B J 4JN21CS119
Rakshitha N V 4JN21CS126
Students of 7th semester B.E. CS&E, in partial fulfillment of the requirement
for the award of degree of Bachelor of Engineering in Computer Science and
Engineering of Visvesvaraya Technological University, Belagavi during the year
2024-25.

Signature of Guide
Mr. Hiriyanna G S B.E., M. Tech.,
Asst. Professor, Dept. of CS&E

Signature of HOD
Dr. Jalesh Kumar B.E., M. Tech., Ph.D.
Professor and Head, Dept. of CS&E

Signature of Principal

Principal, JNNCE

Examiners: 1. 2.
ABSTRACT
The prediction of multiple diseases using Machine Learning (ML) techniques has gained
significant attention due to its potential for early diagnosis and reduced healthcare costs. The
rapid evolution of healthcare technology has created a demand for efficient and accurate
predictive models capable of identifying multiple diseases simultaneously. The project
presents an approach to predict the likelihood of multiple diseases based on extensive
patient data comprising demographic information, symptoms, clinical history and
lifestyle factors. A variety of machine learning algorithms,
including Logistic Regression and Support Vector Machines (SVM), are used to classify and
predict the risk of diseases such as diabetes, Parkinson’s and heart disease. The model's
performance is evaluated using standard metrics, such as accuracy and precision, across
multiple datasets to assess its robustness and generalizability. The results demonstrate the
feasibility of developing a unified framework for disease prediction, offering a scalable
solution that can aid healthcare providers in identifying high-risk patients and enabling
timely interventions.

ACKNOWLEDGEMENT
We would like to express our profound gratitude to all those who have helped
in implementing the project.

We are grateful to our institution, Jawaharlal Nehru New College of Engineering,
and the Department of Computer Science and Engineering for imparting to us the knowledge with
which we can do our best.

We would like to thank our beloved guide Mr. Hiriyanna G S, Assistant Professor,
Dept. of CS&E, who has helped us a lot in making this project, for his continuous
encouragement and guidance throughout the project work.

We would like to thank our respected project co-ordinators Dr. Ravindra S, Assoc.
Professor, Mrs. Thaseen Bhashith, Asst. Professor, Mrs. Sreedevi S, Asst. Professor and
Mrs. Ayesha Siddiqa, Asst. Professor, who have extended their warm support with respect
to all aspects of the project.

We would like to thank Dr. Jalesh Kumar, HOD of CS&E Dept. and Dr. Y Vijaya
Kumar, Principal, JNNCE, Shimoga for all their support and encouragement.

Finally, we would also like to thank the entire teaching and non-teaching staff of the Department of
Computer Science and Engineering.

Thanking you all,

Project Associates,

Mythri S P 4JN21CS089
Nandita G Bhat 4JN21CS094
Priya B J 4JN21CS119
Rakshitha N V 4JN21CS126

TABLE OF CONTENTS

ABSTRACT i

ACKNOWLEDGEMENT ii

TABLE OF CONTENTS iii-iv


LIST OF FIGURES v

CHAPTER 1 Introduction 1 - 15
1.1 Literature Survey 2 - 14

1.2 Problem Statement 14

1.3 Objectives 14

1.4 Scope of the Project 14 - 15

1.5 Organization of the Report 15

CHAPTER 2 Machine Learning 16

2.1 Definition of Learning 16

2.2 Working of Machine Learning 16

2.3 Machine Learning Algorithm 16 - 17

2.4 Types of Machine Learning Algorithm 17

2.5 Logistic Regression Algorithm 18 - 20

2.5.1 Types of Logistic Regression 18

2.5.2 How Logistic Regression Works 18 - 19

2.5.3 Sigmoid Function 19 - 20

2.5.4 Advantages and Disadvantages 20

2.6 Support Vector Machine Algorithm 20 - 25

2.6.1 Working of SVM 21 - 24


2.6.2 Types of SVM 24

2.6.3 Advantages and Disadvantages 24 - 25

2.7 Summary 25

CHAPTER 3 System Design and Implementation 26 - 30

3.1 System Design 26 - 29

3.2 Implementation 29 - 30

3.3 Summary 30

CHAPTER 4 Results and Discussion 31 - 35

4.1 Result Analysis 31 - 34

4.2 Summary 35

CHAPTER 5 Conclusion and Future Scope 36

5.1 Conclusion 36

5.2 Future Scope 36

Publication

References 37

LIST OF FIGURES

Fig. no. Fig. Name Page no.


2.1 Sigmoid function 19
2.2 Linearly separable data points 21
2.3 Multiple hyperplanes separating the data from two classes 22
2.4 Selecting datapoint for data with outliers 22
2.5 Hyperplane which is the optimized one 23
2.6 Original 1D dataset for classification 23
2.7 Mapping 1D data to 2D to be able to separate the two classes 24
3.1 System architecture 26
4.1 Registration page for new users 31
4.2 Login page for registered users 31
4.3 Prediction of diabetes model as 1 32
4.4 Prediction of diabetes model as 0 32
4.5 Prediction of heart disease model as 1 33
4.6 Prediction of heart disease model as 0 33
4.7 Prediction of Parkinson’s disease model as 1 34
4.8 Prediction of Parkinson’s disease model as 0 34

SymptoSense: Multiple Disease Prediction using ML

CHAPTER 1
INTRODUCTION

In recent years, the integration of machine learning (ML) in healthcare has revolutionized the
way diseases are diagnosed and treated. Traditional healthcare models, which largely rely on
the expertise of clinicians to interpret symptoms, lab results, and medical histories, can be time-
consuming and susceptible to human error. As the volume of patient data grows exponentially
with the adoption of electronic health records (EHRs), there is an increasing need for more
efficient, accurate, and scalable methods to predict, diagnose, and manage diseases. Machine
learning, with its ability to analyse large datasets and detect complex patterns, offers a powerful
tool to address this need.

Machine learning in healthcare can be defined as the use of algorithms and statistical
models that allow computers to "learn" from patient data without explicit programming. The
rise of EHRs has made large datasets of patient information widely
available, providing a rich source of training data. Models trained on these datasets can
identify patterns and correlations in medical data, such as patient
demographics, lab test results, and clinical histories, and use them to predict the likelihood of
multiple diseases. Importantly, these models can be continually refined as more data is
collected, leading to better predictive accuracy over time. Predicting diseases at an early stage
can greatly enhance clinical decision-making by providing healthcare providers with actionable
insights, thus improving patient outcomes and reducing the overall healthcare burden. Early
intervention is crucial in managing diseases such as diabetes, Parkinson’s and heart disease,
where timely diagnosis can significantly reduce morbidity and mortality rates.

While many machine learning applications in healthcare focus on predicting single
diseases (e.g., heart disease or diabetes), the ability to predict multiple diseases simultaneously
offers even greater potential for proactive healthcare. The project aims to develop a
multi-disease prediction framework and to explore the use of various machine learning models,
including logistic regression and support vector machines (SVM), to predict the
likelihood of multiple diseases in patients. Through the development of predictive models and
their evaluation on real-world datasets, this project seeks to demonstrate the
practical applications of machine learning in enhancing healthcare decision-making and
improving patient care.

Dept. of CS&E, JNNCE 1

The focus of this project is to develop a machine learning-based framework capable of
predicting multiple diseases simultaneously. While existing predictive models have been
successful in predicting individual diseases, there is a growing need for models that can predict
multiple diseases at once. This multi-disease prediction approach holds immense potential for
providing a holistic view of a patient’s health, enabling healthcare providers to assess various
risks and make more informed, data-driven decisions. By integrating multiple disease
prediction into a single model, healthcare systems can move beyond siloed, disease-specific
approaches and adopt a more integrated, comprehensive strategy for patient care.

1.1 Literature Survey

Multiple Disease Prediction using Machine Learning and Deep Learning with the
Implementation of Web Technology, Mostafizur Rahman et al. [1]. The study provides an
innovative approach to predictive
healthcare by integrating machine learning (ML), deep learning (DL), and web-based
platforms. This research aims to address the growing need for accurate, efficient, and scalable
diagnostic tools for multiple diseases such as diabetes, heart disease, and Parkinson’s.

The authors explore the limitations of traditional diagnostic methods, including their
reliance on extensive medical expertise and their often prohibitive costs. In contrast, ML and
DL techniques offer data-driven solutions capable of handling vast datasets to uncover patterns
and insights that might not be apparent to human clinicians. The study compares several ML
algorithms, including Support Vector Machines (SVM), Random Forests, and Gradient
Boosting Machines, alongside DL models such as Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs). These models are evaluated based on metrics like
accuracy, precision, recall, and F1-score to determine their suitability for disease prediction.

A key contribution of this research is the integration of these models into a web-based
platform, designed to make predictive tools accessible to end-users, including patients and
healthcare providers. The web implementation demonstrates real-time data input, processing,
and result generation, making it a practical solution for early disease detection and
management. The authors highlight how the user-friendly interface bridges the gap between
complex predictive algorithms and non-technical users, thereby promoting the democratization
of healthcare technologies.

Furthermore, the paper discusses the challenges of multi-disease prediction, such as
handling imbalanced datasets and ensuring model interpretability. It underscores the
importance of using advanced DL techniques to enhance the prediction accuracy for diseases
with subtle or overlapping symptoms, like Parkinson’s and heart conditions. By leveraging web
technology, the study also addresses issues of scalability and accessibility, enabling the
application of these tools across diverse geographic and demographic populations.

In conclusion, the research by Mostafizur Rahman et al. offers a significant
advancement in healthcare by combining ML, DL, and web technologies. Their framework not
only improves diagnostic accuracy but also promotes broader adoption of predictive models
through accessible web platforms. This study serves as a robust foundation for future
developments in multi-disease prediction and highlights the transformative potential of
integrating AI-driven tools with practical technology solutions in the healthcare sector.

Multiple Disease Predictions using Machine Learning and Deep Learning Algorithms, Anish
Fathima B et al. [2]. The authors highlight the growing importance of machine learning (ML)
and deep learning (DL) in transforming traditional healthcare diagnostics. By leveraging these
technologies, the study aims to improve accuracy and efficiency in diagnosing conditions such
as diabetes, heart disease, and Parkinson’s. The research compares the performance of various
ML algorithms like Decision Trees, Random Forests, and Gradient Boosting Machines, as well
as DL architectures such as Artificial Neural Networks (ANNs) and Convolutional Neural
Networks (CNNs). Their evaluation focuses on model performance metrics such as accuracy,
precision, recall, and F1-score, illustrating the relative strengths and weaknesses of each
approach.

A significant contribution of the paper is its emphasis on handling the challenges of
multi-disease prediction. Unlike single-disease models, predicting multiple diseases requires
algorithms capable of distinguishing overlapping symptoms and handling large, heterogeneous
datasets. The authors address these complexities by employing ensemble learning and feature
selection techniques to enhance model performance. Moreover, the study explores the use of
hyperparameter tuning to optimize algorithm efficiency and reduce computational overhead.

The research findings demonstrate that deep learning methods, particularly CNNs, outperform
traditional ML models in capturing complex patterns within medical data, making them more
effective for diseases with intricate or subtle presentations.

The study also emphasizes practical implementation and future directions for multi-
disease prediction systems. The authors propose integrating these predictive models into
healthcare frameworks to facilitate early detection, personalized treatment plans, and improved
patient outcomes. They highlight the potential of using wearable devices and IoT-enabled
technologies for continuous data collection, which could feed into these predictive systems for
real-time analysis. By bridging the gap between advanced computational models and real-
world healthcare applications, this research lays a strong foundation for developing scalable,
accurate, and accessible diagnostic tools. The insights provided in this paper underline the
transformative potential of ML and DL in healthcare, paving the way for further innovation in
multi-disease predictive modelling.

Multiple Disease Prediction by Applying Machine Learning and Deep Learning Algorithms,
M. Kalpana Chowdary et al. [3]. The study categorizes and examines various ML algorithms,
such as Support Vector Machines (SVM), Decision Trees, Neural Networks, and Ensemble
Methods, focusing on their roles in handling healthcare datasets. The survey emphasizes the
ability of these algorithms to process large and complex data, offering insights into disease
diagnosis, prediction, and management.

A key highlight of the paper is its discussion on the challenges in healthcare data,
including data heterogeneity, imbalances, and noise, and how ML methods address these
issues. The study particularly notes the importance of feature selection and dimensionality
reduction in improving model accuracy and interpretability. Additionally, it explores how
advanced techniques like deep learning are enhancing predictive capabilities in areas such as
image analysis and personalized medicine.

The paper concludes by identifying gaps in the current use of ML in healthcare, such
as the need for more interpretable models and better handling of ethical concerns like data
privacy. It serves as a foundational resource for researchers aiming to explore or optimize ML
techniques in healthcare data analysis.

A Survey on Machine Learning Algorithms for Healthcare Data Analysis, Kotsiantis, S. B. et al.
[4]. Machine learning (ML) has made significant strides in healthcare, revolutionizing the way
diseases are predicted and managed. The study emphasizes how these technologies can analyze
complex medical datasets to provide accurate and efficient predictions for conditions such as
diabetes, heart disease, and Parkinson’s disease. By comparing algorithms like Support Vector
Machines (SVM), Random Forests, and Gradient Boosting Machines with deep learning
architectures such as Artificial Neural Networks (ANNs), the authors demonstrate the
effectiveness of DL models in capturing intricate patterns in healthcare data.

A central focus of the study is on addressing the challenges of multi-disease prediction.
These include handling imbalanced datasets, overlapping symptoms, and large-scale
heterogeneous data. The authors employ feature engineering techniques to enhance the quality
of input data and optimize model performance. Additionally, hyperparameter tuning and
ensemble learning are used to improve the accuracy and reliability of predictions. The research
findings indicate that DL models, especially ANNs, outperform traditional ML approaches in
scenarios with complex relationships and high-dimensional data.

The paper also explores the practical implications of multi-disease predictive systems,
suggesting their integration into healthcare systems for early diagnosis and better treatment
planning. The authors emphasize the importance of creating user-friendly tools that can be
utilized by healthcare providers to improve patient outcomes. By combining advanced
predictive models with real-world healthcare needs, this study contributes significantly to the
growing field of AI-driven diagnostics and sets a foundation for future research in multi-disease
prediction.

Multi-Disease Prediction Using Machine Learning, Sathya V et al. [5]. The authors aim to
address the increasing need for efficient and accurate diagnostic tools capable of handling the
complexities of multi-disease prediction. The study examines several ML algorithms, such as
Logistic Regression, Decision Trees, Random Forests, and Support Vector Machines (SVM),
assessing their ability to process large healthcare datasets. By evaluating these techniques based
on metrics like accuracy, precision, recall, and F1-score, the paper highlights the strengths and
limitations of each algorithm.

One of the central challenges discussed is the overlapping symptoms between different
diseases, which often complicates the prediction process. The study addresses these issues by
employing feature selection techniques to identify the most relevant attributes from datasets
and pre-processing methods to handle imbalanced data. Moreover, the authors emphasize the
importance of ensemble learning models, which combine multiple algorithms to improve
prediction accuracy and robustness. These models were found to outperform individual
algorithms in terms of both reliability and efficiency.

The research also highlights the practical implications of integrating ML-based
predictive systems into healthcare frameworks. The authors argue that such tools can support
early diagnosis, personalized treatment plans, and improved patient outcomes. They further
suggest that these models could be implemented in real-world healthcare applications, such as
mobile or web-based platforms, to provide accessible diagnostic solutions. The study
concludes by emphasizing the potential of ML in transforming multi-disease diagnostics and
calls for further research to optimize these models and address challenges like model
interpretability and scalability. This paper serves as a valuable contribution to the field of
predictive healthcare.

The paper "Diabetes Prediction Using Support Vector Machines" by N. Srividhya et al.
(October 2023) [6] employs Support Vector Machines (SVMs) to predict diabetes,
utilizing medical data features such as age, blood pressure, glucose levels, and BMI. SVMs are
known for their ability to handle both linear and non-linear classification tasks, making them
ideal for complex healthcare datasets. The authors highlight the key advantages of using SVM,
such as its robustness to high-dimensional data and its ability to construct an optimal
hyperplane that maximizes the margin between different classes. Diabetes prediction has been
a focus of many studies as early detection can significantly reduce complications. For instance,
studies like "A Hybrid Approach for Diabetes Prediction Using Machine Learning Techniques"
(Yao et al., 2021) combine multiple models to improve accuracy, while "Machine Learning
Algorithms for Diabetes Prediction" (Sharma et al., 2022) also demonstrated the effectiveness
of SVM, Random Forest, and Logistic Regression. Both emphasize the importance of
preprocessing techniques like data normalization and feature selection, which are also
discussed in Srividhya et al.'s paper.

Feature selection plays a crucial role, as irrelevant or redundant features can reduce the
model's accuracy. SVM’s ability to create complex decision boundaries is particularly valuable
for medical datasets, where the relationships between features and outcomes (like diabetes) can
be non-linear. "Diabetes Prediction with Support Vector Machine Classifier" (Gupta & Meena,
2021) also demonstrates SVM's application, comparing it with other classifiers and showing
SVM's superior performance in prediction tasks. Overall, Srividhya et al.'s paper contributes
to the growing body of work in healthcare AI by illustrating how SVM can be leveraged for
diabetes prediction, aligning with other studies that advocate for machine learning techniques
as powerful tools for early disease detection. Their research confirms that SVM, when
combined with proper data preprocessing and feature selection, can significantly enhance
predictive accuracy in healthcare applications.

The paper "Early Recognition of Parkinson’s Disease Through Acoustic Analysis and Machine
Learning" by Niloofar Fadavi and Nazanin Fadavi (2024) [7] focuses on the innovative use of
acoustic analysis and machine learning (ML) techniques to facilitate the early diagnosis of
Parkinson’s disease (PD). Parkinson’s disease is a neurodegenerative disorder that primarily
affects motor skills, and its early detection is crucial for timely intervention and management.
The study explores the potential of using voice-based biomarkers to identify early signs of PD,
which is challenging due to the gradual onset of symptoms and the lack of definitive early-
stage diagnostic tests.

The authors emphasize that speech and voice patterns are significantly altered in
individuals with Parkinson’s disease, often well before motor symptoms become apparent.
Features such as vocal tremor, pitch variations, and speech fluency are commonly impacted in
PD patients, and these alterations can be captured through acoustic analysis. The paper reviews
the use of various machine learning algorithms to classify these speech features, including
Support Vector Machines (SVM), Random Forests, and Neural Networks. By training these
models on voice recordings from individuals with and without PD, the study demonstrates the
potential of ML to differentiate between healthy individuals and those affected by the disease,
based on subtle changes in their speech patterns.

A key contribution of this research is its focus on feature extraction from speech signals,
which includes prosodic features (such as pitch, tone, and rhythm), temporal features (such as
pause duration and speech rate), and spectral features (such as formant frequencies). The
authors employ advanced signal processing techniques to extract these features and then apply
ML algorithms for classification. The study highlights the importance of using a large and
diverse dataset to ensure the robustness and generalization of the model across different patient
populations. The authors also explore the challenges of handling noisy and incomplete data,
which is common in real-world speech datasets.

The findings of the paper suggest that combining acoustic analysis with machine
learning models can lead to highly accurate early detection systems for Parkinson’s disease.
Such systems, once fully developed, could serve as screening tools for healthcare providers,
enabling them to identify patients at risk before the onset of severe symptoms. The paper also
discusses the potential for integrating these diagnostic tools into mobile applications or
telemedicine platforms, which would make early Parkinson’s detection accessible to a wider
population. Overall, this research underscores the promise of voice-based biomarkers and
machine learning in revolutionizing the early diagnosis and management of Parkinson’s
disease, offering a non-invasive, cost-effective alternative to traditional diagnostic methods.

The paper "Heart Disease Prediction Using Support Vector Machine" by Balakrishnan
Duraisamya et al. (2023) [8] explores the application of Support Vector Machines (SVM) for
predicting heart disease, focusing on its ability to handle complex healthcare datasets and
provide accurate predictions. Cardiovascular diseases, including heart disease, are among the
leading causes of mortality worldwide, making early detection and prediction crucial for
effective intervention and prevention. The study examines the effectiveness of SVM, a
supervised learning algorithm, in predicting the likelihood of heart disease based on various
medical features such as age, gender, cholesterol levels, blood pressure, and family history of
heart disease.

The paper outlines how SVM is used to classify patients into two categories: those at
risk of heart disease and those not at risk. The study emphasizes the role of data preprocessing
techniques, such as normalization and feature selection, to improve the performance of the
SVM model. Normalization ensures that input features with different scales do not bias the
model, while feature selection helps reduce the dimensionality of the data, focusing on the most
relevant attributes that influence heart disease prediction. The paper also discusses the
importance of using an optimal kernel function in SVM to improve classification accuracy,
with experiments showing the superior performance of the radial basis function (RBF) kernel
in comparison to linear kernels.
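The two ideas in the paragraph above, feature normalization and the choice of kernel, can be illustrated with a minimal sketch in plain Python. This is not the code used in the cited paper or in this project; the function names, the sample records, and the value of `gamma` are assumptions made only for illustration. It uses z-score scaling as one common form of normalization and compares a linear kernel with an RBF kernel.

```python
import math

def standardize(rows):
    """Z-score each column so that every feature has mean 0 and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def linear_kernel(x, z):
    """Plain dot product: similarity used by a linear SVM."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical records: (age in years, cholesterol in mg/dL).
patients = [[45.0, 180.0], [61.0, 260.0], [52.0, 210.0]]
scaled = standardize(patients)

# Without scaling, cholesterol (hundreds) would dominate age (tens) in both kernels.
print(linear_kernel(scaled[0], scaled[1]))
print(rbf_kernel(scaled[0], scaled[1]))
```

After scaling, both features contribute on a comparable scale to the distance inside the RBF kernel, which is the effect the paragraph attributes to normalization.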

The results of the study indicate that the SVM model achieved high accuracy and
reliability in predicting heart disease, outperforming other machine learning algorithms like
Decision Trees and Logistic Regression. The paper further explores the challenges associated
with heart disease prediction, such as dealing with imbalanced datasets, where healthy
individuals may significantly outnumber those with the disease. To address this, the authors
employ techniques like oversampling and synthetic data generation to balance the dataset and
prevent bias toward the majority class. Additionally, the study highlights the potential of using
SVM in real-world clinical settings, where the model could assist healthcare professionals in
early screening and personalized treatment planning.
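As a rough illustration of the oversampling idea mentioned above, the sketch below balances a toy dataset by duplicating randomly chosen minority-class records. It is a simple random oversampler written for this report, not the method of the cited study, and it does not generate synthetic records; the function name `random_oversample` and the sample data are hypothetical.

```python
import random

def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw extra copies (with replacement) only for classes below the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical toy data: four healthy records (label 0) and one diseased record (label 1).
X = [[50, 200], [43, 180], [39, 170], [61, 190], [67, 280]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))
```

In practice, oversampling is usually applied only to the training split, so that duplicated records do not leak into the test data.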

In conclusion, the research by Duraisamya underscores the potential of SVM as a
powerful tool for heart disease prediction. By leveraging medical data, SVM can provide
accurate and timely risk assessments, helping healthcare providers make informed decisions.
The study suggests that integrating such predictive models into healthcare systems could
significantly improve patient outcomes and reduce the burden of heart disease by facilitating
early diagnosis and preventative measures. Moreover, the findings advocate for further
research into refining SVM models, incorporating more diverse data sources, and exploring the
integration of machine learning with other medical technologies for enhanced healthcare
solutions.

1.2 Problem Statement


The aim is to develop a machine learning model that can predict multiple diseases from the symptoms of
patients. By analysing the data, the model will help to predict the intensity of diseases such as
diabetes, heart disease and Parkinson’s disease. The goal is to give doctors a helpful tool to
improve diagnosis and treatment, leading to better health for patients.

1.3 Objectives
The idea for this project arose from the increase in health risks.

The objectives of the project are to:

• Develop a predictive system for diabetes, heart disease and Parkinson’s disease.

• Provide data privacy for users: the data of one user will not be available to others.

• Estimate the efficiency of learning algorithms such as logistic regression and support
vector machine by calculating accuracy.
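The accuracy named in the last objective, together with the precision mentioned in the abstract, can be computed from the counts of correct and incorrect predictions. The sketch below is a minimal illustration in plain Python; the label lists are made-up values, not results from this project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 means disease present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    """Fraction of positive predictions that are truly positive."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical labels for eight patients (not project data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), precision(y_true, y_pred))
```

Accuracy alone can be misleading when one class is much larger than the other, which is one reason precision is reported alongside it.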

1.4 Scope of the Project

• Two machine learning models, support vector machine (SVM) and logistic regression,
are used for predicting multiple diseases.
• It predicts diseases from the given input. If the prediction is positive, a message
is displayed indicating the presence of the disease.
• It predicts the disease based on the symptoms, which are given weightage.
• The project is deployed for users, supporting early disease detection and preventive
healthcare.

1.5 Organization of the Report

Chapter 1 introduces multiple disease prediction and covers the literature survey,
problem statement, objectives, and scope of the project. Chapter 2 presents the domain-specific
background of the project. Chapter 3 discusses system design and implementation. Chapter 4
gives the results and snapshots of the project. Chapter 5 concludes the report with possible
future enhancements for this project, followed by references.

CHAPTER 2
MACHINE LEARNING ALGORITHMS
Machine Learning (ML) is a subset of artificial intelligence (AI) that enables computers to
learn from data and make decisions or predictions without being explicitly programmed. It
focuses on building algorithms that can analyze data, identify patterns, and improve their
performance over time through experience.

Machine learning is used in a wide variety of applications, including image and speech
recognition, natural language processing, and recommender systems.

2.1 Definition of Learning


A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.

2.2 Working of machine learning


A Decision Process: In general, machine learning algorithms are used to make a prediction or
classification. Based on some input data, which can be labeled or unlabeled, your algorithm
will produce an estimate about a pattern in the data.

An Error Function: An error function evaluates the prediction of the model. If there are known
examples, an error function can make a comparison to assess the accuracy of the model.

A Model Optimization Process: If the model can fit better to the data points in the training set,
then weights are adjusted to reduce the discrepancy between the known example and the model
estimate. The algorithm will repeat this iterative “evaluate and optimize” process, updating
weights autonomously until a threshold of accuracy has been met.
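
The three parts described above can be sketched with a very small model. The following is an illustrative example only, not the project's code: it fits a one-weight linear model by gradient descent, and the data, learning rate `lr` and stopping threshold `tol` are all assumptions chosen for the illustration.

```python
# Illustrative sketch: decision process, error function and optimization loop.
# The toy data, learning rate and threshold below are assumptions.

def predict(w, b, x):
    """Decision process: produce an estimate from the input."""
    return w * x + b


def mean_squared_error(w, b, xs, ys):
    """Error function: compare the estimates with the known examples."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.05, tol=1e-6, max_iter=10_000):
    """Optimization: repeat 'evaluate and optimize' until the error is below tol."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iter):
        if mean_squared_error(w, b, xs, ys) < tol:
            break
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
```

The loop stops as soon as the error function reports a value below the chosen threshold, which mirrors the "threshold of accuracy" mentioned in the text.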

2.3 Machine learning algorithm


A machine learning algorithm is a set of rules or processes used by an AI system to conduct
tasks—most often to discover new data insights and patterns, or to predict output values from
a given set of input variables. Algorithms enable machine learning (ML) to learn.

Industry analysts agree on the importance of machine learning and its underlying algorithms.
From Forrester, “Advancements in machine-learning algorithms bring precision and depth to
marketing data analysis that helps marketers understand how marketing details—such as
platform, creative, call to action, or messaging—impact marketing performance.1” While
Gartner states that, “Machine learning is at the core of many successful AI applications,
fueling its enormous traction in the market.2”

Most often, training ML algorithms on more data will provide more accurate
answers than training on less data. Using statistical methods, algorithms are trained to
determine classifications or make predictions, and to uncover key insights in data mining
projects. These insights can subsequently improve your decision-making to boost key
growth metrics.

Use cases for machine learning algorithms include the ability to analyze data to
identify trends and predict issues before they occur.3 More advanced AI can enable more
personalized support, reduce response times, provide speech recognition and improve
customer satisfaction. The industries that particularly benefit from machine learning
algorithms to create new content from vast amounts of data include supply chain
management, transportation and logistics, retail and manufacturing4—all embracing
generative AI, with its ability to automate tasks, enhance efficiency and provide valuable
insights, even to beginners.

2.4 Types of Machine Learning Algorithms


There are four types of machine learning algorithms

1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

4. Ensemble Learning

Various algorithms exist within each of these types. In this project, the models are
built using logistic regression and support vector machine (SVM) algorithms: heart
disease is predicted by logistic regression, and diabetes and Parkinson's disease by SVM.


2.5 Logistic Regression Algorithm


Logistic regression is a supervised machine learning algorithm used for classification tasks
where the goal is to predict the probability that an instance belongs to a given class.
It is a statistical algorithm that analyzes the relationship between the input features
and a categorical outcome. This section covers the fundamentals of logistic regression,
its types and its working.

2.5.1 Types of Logistic Regression

On the basis of the categories, Logistic Regression can be classified into three types:

1. Binomial: In binomial logistic regression, the dependent variable can take only
two possible values, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, the dependent variable can take 3 or
more possible unordered values, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal logistic regression, the dependent variable can take 3 or more
possible ordered values, such as “Low”, “Medium”, or “High”.

2.5.2 Working of Logistic Regression

The logistic regression model transforms the linear regression function continuous value
output into categorical value output using a sigmoid function, which maps any real-valued
set of independent variables input into a value between 0 and 1. This function is known as
the logistic function.

Let the independent input features be:

$$X = \begin{bmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{bmatrix} \tag{1}$$

and let the dependent variable Y take only binary values, i.e. 0 or 1:

$$Y = \begin{cases} 0 & \text{if class 1} \\ 1 & \text{if class 2} \end{cases} \tag{2}$$

Then apply the multi-linear function to the input variables. For one observation
$x = [x_1, x_2, \ldots, x_m]$ (a row of X):

$$z = \left(\sum_{i=1}^{m} w_i x_i\right) + b \tag{3}$$

Here $x_i$ is the $i$-th feature of the observation, $w = [w_1, w_2, w_3, \ldots, w_m]$ holds the
weights or coefficients, and b is the bias term, also known as the intercept. This can be
written more simply as the dot product of the weights and the input, plus the bias:

$$z = w \cdot x + b \tag{4}$$

2.5.3 Sigmoid Function

The sigmoid function is a mathematical function used to map the predicted values to
probabilities.

It maps any real value into another value within a range of 0 and 1. The value of the logistic
regression must be between 0 and 1, which cannot go beyond this limit, so it forms a curve
like the “S” form.

The S-form curve is called the Sigmoid function or the logistic function.

In logistic regression, a threshold value is used to turn the probability into a class
label: values above the threshold are assigned to class 1, and values below the threshold
are assigned to class 0.

The sigmoid function takes z as its input and returns a probability between 0 and 1,
i.e. the predicted y.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Fig. 2.1: Sigmoid function

As shown in the Fig. 2.1, sigmoid function converts the continuous variable data into the
probability i.e. between 0 and 1.


• σ(z) tends towards 1 as z→∞
• σ(z) tends towards 0 as z→−∞
• σ(z) is always bounded between 0 and 1

where the probability of belonging to a class can be measured as:

P(y=1) =σ(z)

P(y=0) =1−σ(z).
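
As a small illustration of the sigmoid function and the threshold rule described above, the sketch below computes σ(z) and converts it to a class label. The threshold of 0.5 and the function names are assumptions made for this example; the report does not state which threshold the project uses.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_class(z, threshold=0.5):
    """Return 1 if P(y=1) = sigmoid(z) reaches the (assumed) threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


for z in (-4.0, 0.0, 4.0):
    p1 = sigmoid(z)
    print(f"z={z:+.1f}  P(y=1)={p1:.3f}  P(y=0)={1 - p1:.3f}  class={predict_class(z)}")
```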

2.5.4 Advantages and Disadvantages of Logistic Regression:

1. Advantages:

• Simplicity & Interpretability: Logistic regression is easy to implement and
understand, and its output (probabilities) is interpretable, making it useful in
real-world decision-making.
• Efficient: It is computationally efficient and requires less memory and resources
compared to more complex models.
• Works Well with Linearly Separable Data: Logistic regression performs well when
there is a clear linear relationship between the features and the target variable.
• Regularization to Avoid Overfitting: Regularization techniques like L1 or L2 can
be applied to prevent overfitting, making the model more robust, especially in
high-dimensional datasets.

2. Disadvantages:

• Assumes Linearity: It assumes a linear relationship between the features and the
log-odds of the target variable, which may not hold in complex, non-linear datasets.
• Sensitive to Outliers: Outliers can skew the model’s predictions and significantly
affect the estimated coefficients, leading to poor performance.
• Limited to Binary Classification (Without Extensions): Standard logistic regression
is designed for binary classification, and while it can be extended to multiclass
problems, this adds complexity.
• Feature Independence Assumption: Logistic regression assumes features are
independent, and the model can perform poorly if features are highly correlated
(multicollinearity).


2.6 Support Vector Machine Algorithm


A Support Vector Machine (SVM) is a supervised machine learning algorithm used
for both classification and regression tasks. While it can be applied to regression problems,
SVM is best suited for classification tasks. The primary objective of the SVM algorithm is
to identify the optimal hyperplane in an N-dimensional space that can effectively separate
data points into different classes in the feature space. The algorithm ensures that the margin
between the closest points of different classes, known as support vectors, is maximized.

The dimension of the hyperplane depends on the number of features. For instance,
if there are two input features, the hyperplane is simply a line, and if there are three input
features, the hyperplane becomes a 2-D plane. As the number of features increases beyond
three, the complexity of visualizing the hyperplane also increases.

Consider two independent variables, x1 and x2, and one dependent variable
represented as either a blue circle or a red circle.

In this scenario, the hyperplane is a line because we are working with two features
(x1 and x2).

There are multiple lines (or hyperplanes) that can separate the data points. The
challenge is to determine the best hyperplane that maximizes the separation margin
between the red and blue circles.

Fig. 2.2 shows several such lines, each of which separates the red circles from the
blue circles.

Fig. 2.2: Linearly separable data points.


2.6.1 Working of Support Vector Machine Algorithm

One reasonable choice for the best hyperplane in a Support Vector Machine (SVM)
is the one that maximizes the separation margin between the two classes. The maximum-
margin hyperplane, also referred to as the hard margin, is selected based on maximizing
the distance between the hyperplane and the nearest data point on each side.

Fig. 2.3: Multiple hyperplanes separating the data from two classes

So, we choose the hyperplane whose distance to the nearest data point on each side
is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane
or hard margin. So, from Fig. 2.3, we choose L2. Now consider the scenario shown in
Fig. 2.4.

Here one blue ball lies within the boundary of the red balls. So how does SVM
classify the data? The blue ball among the red ones is treated as an outlier of the blue
class. The SVM algorithm can ignore the outlier and find the hyperplane that maximizes
the margin, as Fig. 2.4 shows. In this sense, SVM is robust to outliers.

Fig. 2.4: Selecting hyperplane for data with outlier


For this type of data, SVM finds the maximum margin as with the previous data sets
and, in addition, adds a penalty each time a point crosses the margin. The margins in
such cases are called soft margins. With a soft margin, the SVM tries to minimize
(1/margin + λ(∑penalty)). Hinge loss is a commonly used penalty: if there is no violation,
the hinge loss is zero; if there is a violation, the hinge loss is proportional to the
distance of the violation. Fig. 2.5 represents the optimized hyperplane.

Fig. 2.5: Hyperplane which is the most optimized one.
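
The hinge-loss penalty can be illustrated with a few lines of code. This is a sketch under stated assumptions, not the project's implementation: labels are taken as -1 and +1, and the widely used form ½‖w‖² + λ∑hinge stands in for the "1/margin + λ(∑penalty)" expression in the text, since a small ‖w‖ corresponds to a large margin. The function names and sample points are hypothetical.

```python
def hinge_loss(w, b, x, y):
    """Hinge loss for one point; y must be -1 or +1.

    Zero when the point is on the correct side and outside the margin,
    otherwise it grows linearly with the size of the violation.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def soft_margin_objective(w, b, points, labels, lam=1.0):
    """Assumed soft-margin objective: 0.5 * ||w||^2 + lam * (sum of hinge losses)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    penalty = sum(hinge_loss(w, b, x, y) for x, y in zip(points, labels))
    return margin_term + lam * penalty


w, b = [1.0, 0.0], 0.0
print(hinge_loss(w, b, [2.0, 0.0], 1))   # outside the margin: no penalty
print(hinge_loss(w, b, [0.5, 0.0], 1))   # inside the margin: small penalty
print(hinge_loss(w, b, [-1.0, 0.0], 1))  # wrong side: larger penalty
```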

Till now, we were talking about linearly separable data (the group of blue balls and
red balls are separable by a straight line/linear line). What to do if data are not linearly
separable?

Fig. 2.6: Original 1D dataset for classification

Say our data is as shown in Fig. 2.6. SVM solves this by creating a new variable
using a kernel. For a point xi on the line, we create a new variable yi as a function
of its distance from the origin o. Plotting this gives the result shown in Fig. 2.7.


Fig. 2.7: Mapping 1D data to 2D to be able to separate the two classes.

In this case, the new variable y is created as a function of distance from the origin. A non-
linear function that creates a new variable is referred to as a kernel. Fig. 2.7 shows the
mapping done to 1D data to 2D data to be able to separate two classes.
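
The 1D-to-2D mapping can be sketched as follows. The data points and the cut-off value are hypothetical, and the squared distance from the origin, y = x², is assumed as the new variable; the text only says that y is a function of distance from the origin.

```python
def feature_map(x):
    """Map a 1D point x to 2D as (x, x^2); x^2 is the assumed distance-based variable."""
    return (x, x * x)


# Hypothetical 1D data: the outer points belong to class 1, the inner points to class 0.
# No single cut on the x axis separates them.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [1, 1, 0, 0, 0, 1, 1]


def classify(x, cut=2.5):
    """In the mapped space, a horizontal line y = cut separates the two classes."""
    _, y = feature_map(x)
    return 1 if y > cut else 0


predicted = [classify(x) for x in xs]
print(predicted)
```

After the mapping, a straight line in the (x, y) plane is enough to separate the classes, which is the idea Fig. 2.7 illustrates.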

2.6.2 Types of Support Vector Machine

Based on the nature of the decision boundary, Support Vector Machines (SVM) can be
divided into two main types:

1. Linear SVM: Linear SVMs use a linear decision boundary to separate the data
points of different classes. When the data can be precisely linearly separated, linear
SVMs are very suitable. This means that a single straight line (in 2D) or a
hyperplane (in higher dimensions) can entirely divide the data points into their
respective classes. A hyperplane that maximizes the margin between the classes is
the decision boundary.
2. Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be
separated into two classes by a straight line (in the case of 2D). By using kernel
functions, nonlinear SVMs can handle nonlinearly separable data. The original
input data is transformed by these kernel functions into a higher-dimensional feature
space, where the data points can be linearly separated. A linear SVM is used to
locate a nonlinear decision boundary in this modified space.
2.6.3 Advantages and Disadvantages of Support Vector Machine (SVM)

1. Advantages

• High-Dimensional Performance: SVM excels in high-dimensional spaces, making
it suitable for image classification and gene expression analysis.
• Nonlinear Capability: Using kernel functions such as RBF and polynomial, SVM
effectively handles nonlinear relationships.
• Outlier Resilience: The soft margin feature allows SVM to tolerate outliers,
enhancing robustness in tasks such as spam detection and anomaly detection.
• Binary and Multiclass Support: SVM is effective for both binary and multiclass
classification, making it suitable for applications such as text classification.
• Memory Efficiency: The decision function depends only on the support vectors,
making SVM memory efficient compared to other algorithms.

2. Disadvantages

• Slow Training: SVM can be slow to train on large datasets, which affects its
performance in data mining tasks.
• Parameter Tuning Difficulty: Selecting the right kernel and adjusting parameters
like C requires careful tuning.
• Noise Sensitivity: SVM struggles with noisy datasets and overlapping classes,
limiting effectiveness in real-world scenarios.
• Limited Interpretability: The complexity of the hyperplane in higher dimensions
makes SVM less interpretable than other models.
• Feature Scaling Sensitivity: Proper feature scaling is essential; otherwise, SVM
models may perform poorly.

2.7 Summary
Chapter 2 covered the domain-specific background: machine learning and the
machine learning algorithms used in the project, namely logistic regression and
support vector machine.


CHAPTER 3

SYSTEM DESIGN AND IMPLEMENTATION


Chapter 3 defines the workflow of the system design and the implementation of the model,
and describes step by step how the system is designed.

3.1 System Design


The purpose of this model is to check for the presence of diseases such as diabetes,
heart disease and Parkinson’s disease by collecting basic information such as symptoms,
demographic information and clinical history.

Fig 3.1: System architecture.

Fig. 3.1 represents the flow of the system. The raw data for all the diseases is collected,
then preprocessed and cleaned, which gives a reduced and accurate dataset to be used in
the model. The models are trained on this dataset using logistic regression and support
vector machine algorithms and produce the output for the user’s data. When the user
provides the information requested by the model, it is processed and the presence or
absence of the disease is indicated.


1. Data collection:

The data collection process for the multiple disease prediction system begins with
identifying relevant features for each disease. For diabetes, essential features include blood
glucose levels (fasting, postprandial, and HbA1c), BMI, blood pressure, insulin levels, and
lifestyle factors such as diet, physical activity, and smoking habits. For Parkinson's disease,
features such as voice characteristics (e.g., jitter, shimmer, pitch), motor symptoms (e.g.,
tremors, rigidity), clinical scores like the Unified Parkinson’s Disease Rating Scale
(UPDRS), and non-motor symptoms (e.g., sleep disturbances, depression) are critical. For
heart disease, features like chest pain type, cholesterol levels, resting blood pressure, fasting
blood sugar, ECG results, heart rate variability, and smoking history are essential.

2. Data preprocessing and cleaning:


Data preprocessing is a crucial step in the multiple disease prediction system to
prepare raw datasets for machine learning. The process begins with data cleaning, where
missing values are handled using imputation techniques such as mean or median
imputation, and duplicate or irrelevant data is removed. Outliers are addressed using
statistical methods like z-scores or interquartile range (IQR) to ensure the data’s
consistency and reliability. For continuous features such as blood glucose levels,
cholesterol, or voice features in Parkinson’s data, normalization is applied to scale values
to a standard range, improving model performance. Categorical variables, such as chest
pain type in heart disease data, are transformed using one-hot encoding or label encoding
to make them machine-readable.
Following cleaning, feature selection and extraction are performed to identify the
most relevant features for each disease, reducing noise and improving model efficiency.
For example, key features like fasting glucose for diabetes or UPDRS scores for
Parkinson’s disease are prioritized. Additionally, data balancing techniques, such as
Synthetic Minority Oversampling Technique (SMOTE), are used to address class
imbalances in datasets, ensuring that predictions are not biased toward the majority class.
After preprocessing, the cleaned datasets are split into training and testing subsets, typically
in an 80:20 ratio, to ensure the machine learning models are trained on representative data
and can generalize well to unseen cases. This structured preprocessing pipeline ensures
high-quality inputs for the SVM and Logistic Regression models, facilitating accurate and
reliable disease predictions.
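
A few of the cleaning steps described above can be sketched with NumPy. This is a minimal illustration on made-up glucose readings, not the project's preprocessing code: the helper names are hypothetical, outliers are handled by clipping to the IQR fences (the report does not say whether outliers are clipped or removed), and scaling is done by standardization.

```python
import numpy as np


def impute_median(column):
    """Replace missing values (NaN) with the median of the observed values."""
    col = np.asarray(column, dtype=float)
    return np.where(np.isnan(col), np.nanmedian(col), col)


def clip_iqr(column, k=1.5):
    """Clip values lying more than k * IQR outside the first and third quartiles."""
    col = np.asarray(column, dtype=float)
    q1, q3 = np.percentile(col, [25, 75])
    iqr = q3 - q1
    return np.clip(col, q1 - k * iqr, q3 + k * iqr)


def standardize(column):
    """Scale a column to zero mean and unit standard deviation."""
    col = np.asarray(column, dtype=float)
    return (col - col.mean()) / col.std()


glucose = [90.0, 110.0, float("nan"), 100.0, 400.0]  # made-up readings
filled = impute_median(glucose)
clipped = clip_iqr(filled)
scaled = standardize(clipped)
print(filled, clipped, scaled, sep="\n")
```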


3. Data building and training:


Building and training models for the multiple disease prediction system involves
utilizing the preprocessed datasets for diabetes, Parkinson's, and heart disease to create
machine learning models tailored to each condition. For diabetes and Parkinson’s disease,
the cleaned datasets are used to train Support Vector Machine (SVM) models, which are
effective for binary classification tasks.
For heart disease, a Logistic Regression model is employed, suitable for binary and
multi-class classification tasks [12]. The model is trained on features such as chest pain
type, fatigue and shortness of breath, ensuring the relationship between independent
variables and disease prediction probabilities is well-learned. To prevent overfitting,
regularization techniques such as L1 (Lasso) or L2 (Ridge) are applied.
For diabetes and Parkinson’s disease, the support vector machine algorithm is
employed, which is suitable for class-based classification [13]. The diabetes model is
trained on features such as frequent urination, increased thirst and unexplained weight
loss, whereas for Parkinson’s disease, tremors and slowness of movement are considered.
4. Model evaluation and validation:
Model evaluation and validation are critical steps in the multiple disease prediction
system to ensure the trained models for diabetes, Parkinson's, and heart disease are
accurate, reliable, and generalize well to unseen data. After training the models, their
performance is assessed using a separate testing dataset that was not involved in training.
Key evaluation metrics such as accuracy are calculated to measure the models’ effectiveness.
After validation, hyperparameters are fine-tuned based on evaluation results to
optimize the models further. Once the models demonstrate consistent and high
performance, they are finalized for deployment, ensuring robust predictions for diabetes,
Parkinson's, and heart disease in the system pipeline.
5. Model optimization and fine tuning:
Model optimization and fine-tuning are vital for improving the performance of the
machine learning models in the multiple disease prediction system. For the Support Vector
Machine (SVM) models used to predict diabetes and Parkinson's disease, the key steps
include selecting the optimal kernel (e.g., RBF, linear, or polynomial) and tuning
hyperparameters like C (regularization parameter) and gamma (kernel coefficient). This
ensures the models effectively balance underfitting and overfitting. Similarly, the Logistic
Regression model for heart disease prediction is optimized by adjusting the regularization
parameters (L1 or L2) to prevent overfitting, while also fine-tuning learning rates and
convergence thresholds to achieve faster and more accurate training.
Feature selection is an integral part of the fine-tuning process, where only the most
significant features are retained to reduce noise and computational overhead. After
fine-tuning, the models are re-trained on the complete training dataset using the optimized
parameters and re-evaluated using metrics such as accuracy. This process ensures that the
finalized models are well-optimized for deployment, delivering reliable and accurate
predictions for diabetes, Parkinson’s, and heart disease within the system pipeline.
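
One common way to carry out the kernel, C and gamma search described above is scikit-learn's `GridSearchCV`. The sketch below is an assumption about how such tuning could look, not the procedure the project actually ran: the synthetic data from `make_classification` stands in for the real datasets, and the parameter grid and 5-fold cross-validation are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; the project's datasets are not reproduced here.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so each cross-validation fold is scaled on its own.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # refits the best setting on the full training set

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, test_accuracy)
```

With the default `refit=True`, the best configuration is re-trained on the complete training set, which corresponds to the re-training step mentioned in the text.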
6. Model deployment:
For the multiple disease prediction system, model deployment involves integrating
the trained SVM models for diabetes and Parkinson’s disease, along with the Logistic
Regression model for heart disease, into a unified application as outlined in the flowchart.
The frontend is developed using Streamlit, providing a user interface where users can
input processed patient data. The system processes this input, routes it to the appropriate
model, and returns predictions for each disease in real time. To ensure the deployment is
secure and efficient, the system incorporates encrypted data transfer protocols and adheres
to privacy regulations. Continuous performance monitoring is implemented to detect model
drift, ensuring accuracy and reliability in real-world usage.

3.2 Implementation
The implementation includes steps like Feature extraction, Training, Testing, comparing
best model and predicting results.
Training the Diabetes and Parkinson’s model using SVM:

1. Import Required Libraries


Import libraries for data manipulation, pre-processing, machine learning, and
saving/loading models [10][11].
2. Load and Inspect Dataset
- Load the dataset into a DataFrame.
- Display dataset information:
- First few rows
- Shape
- Summary statistics
- Class distribution
- Group dataset by the target column to calculate mean values for each class.
3. Split Features and Labels
- Separate features (X) and labels (Y) from the dataset.
4. Standardize the Features
- Initialize a StandardScaler object.
- Fit the scaler to X and transform it to get standardized data.
- Update X with the standardized data.
5. Split Data into Training and Testing Sets
- Split the dataset into training and test sets (80% training, 20% testing).
- Use stratified sampling to maintain class balance.
- Set a random state for reproducibility.
6. Train a Support Vector Machine (SVM) Classifier
- Initialize an SVM classifier with a linear kernel.
- Train the classifier using the training data.
7. Evaluate the Model on Training Data
- Predict the labels for the training set.
- Compute and print the accuracy score.
8. Evaluate the Model on Test Data
- Predict the labels for the test set.
- Compute and print the accuracy score.
9. Make Predictions for a New Input Instance
- Define an input instance.
- Convert the input data to a NumPy array and reshape it for prediction.
- Standardize the input data using the previously fitted scaler.
- Use the trained classifier to predict the label.
- Display a message based on the prediction result.
10. Save the Trained Model
- Save the trained model to a file using pickle.
11. Load and Use the Saved Model
- Load the saved model from the file.
- Repeat the steps for standardizing and predicting for a new input instance using the
loaded model.
Key Steps in Pseudo-Code:
- Load Dataset
data = load_data(file_path)
X, Y = split_features_labels(data)
- Preprocessing
scaler = initialize_scaler()
X_standardized = fit_transform(scaler, X)
- Split Data
X_train, X_test, Y_train, Y_test = split_data(X_standardized, Y)
- Model Training
classifier = initialize_svm()
fit(classifier, X_train, Y_train)
- Evaluate Model
train_accuracy = evaluate(classifier, X_train, Y_train)
test_accuracy = evaluate(classifier, X_test, Y_test)
- Predict for New Data
input_data = preprocess_input(new_instance)
prediction = predict(classifier, input_data)
display_result(prediction)
- Save and Load Model
save_model(classifier, filename)
loaded_model = load_model(filename)
repeat_prediction_with_loaded_model(new_instance)
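
The pseudo-code above could be realized with scikit-learn roughly as follows. This is a sketch under assumptions rather than the project's source: synthetic data from `make_classification` replaces the diabetes and Parkinson's datasets, the `random_state` values are arbitrary, and the file name `svm_model_sketch.sav` is hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load dataset (synthetic stand-in for the real CSV file).
X, Y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_clusters_per_class=1, random_state=2)

# Preprocessing: standardize the features, as in step 4 of the outline.
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Split data: 80% training, 20% testing, stratified on the label.
X_train, X_test, Y_train, Y_test = train_test_split(
    X_standardized, Y, test_size=0.2, stratify=Y, random_state=2
)

# Model training: SVM classifier with a linear kernel.
classifier = SVC(kernel="linear")
classifier.fit(X_train, Y_train)

# Evaluate on training and test data.
train_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
test_accuracy = accuracy_score(Y_test, classifier.predict(X_test))
print(f"train accuracy={train_accuracy:.3f}, test accuracy={test_accuracy:.3f}")

# Predict for a new instance: reshape to one row, then reuse the fitted scaler.
new_instance = np.asarray(X[0]).reshape(1, -1)
prediction = classifier.predict(scaler.transform(new_instance))
print("Disease indicated" if prediction[0] == 1 else "Disease not indicated")

# Save and load the model with pickle, then repeat the prediction.
model_path = os.path.join(tempfile.gettempdir(), "svm_model_sketch.sav")
with open(model_path, "wb") as f:
    pickle.dump(classifier, f)
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
reloaded_prediction = loaded_model.predict(scaler.transform(new_instance))
```

The sketch follows the order given in the outline, where the scaler is fitted before the split. Fitting the scaler on the training split only is a common alternative that keeps test data out of the scaling statistics.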
Training the Heart disease model using logistic regression:
1. Import Required Libraries
Import libraries for data manipulation, pre-processing, machine learning, and
saving/loading models [9].
2. Load and Inspect Dataset
- Load the heart dataset into a Data Frame.
- Display dataset information:
- First few rows
- Shape
- Summary statistics
- Class distribution
- Group dataset by the target column (Heart) to calculate mean values for each class.
3. Split Features and Labels
- Separate features (X) and labels (Y) from the dataset.


4. Standardize the Features
- Initialize a StandardScaler object.
- Fit the scaler to X and transform it to get standardized data.
- Update X with the standardized data.
5. Split Data into Training and Testing Sets
- Split the dataset into training and test sets (80% training, 20% testing).
- Use stratified sampling to maintain class balance.
- Set a random state for reproducibility.
6. Train a Logistic Regression
- Train the regression using the training data.
7. Evaluate the Model on Training Data
- Predict the labels for the training set.
- Compute and print the accuracy score.
8. Evaluate the Model on Test Data
- Predict the labels for the test set.
- Compute and print the accuracy score.
9. Make Predictions for a New Input Instance
- Define an input instance.
- Convert the input data to a NumPy array and reshape it for prediction.
- Standardize the input data using the previously fitted scaler.
- Use the trained regression to predict the label.
- Display a message based on the prediction result.
10. Save the Trained Model
- Save the trained model to a file using pickle.
11. Load and Use the Saved Model
- Load the saved model from the file.
- Repeat the steps for standardizing and predicting for a new input instance using the
loaded model.
Key Steps in Pseudo-Code:
- Load Dataset
data = load_data(file_path)
X, Y = split_features_labels(data)
- Pre-processing
scaler = initialize_scaler()
X_standardized = fit_transform(scaler, X)
- Split Data
X_train, X_test, Y_train, Y_test = split_data(X_standardized, Y)
- Model Training
regression = initialize_logistic_regression()
fit(regression, X_train, Y_train)
- Evaluate Model
train_accuracy = evaluate(regression, X_train, Y_train)
test_accuracy = evaluate(regression, X_test, Y_test)
- Predict for New Data
input_data = preprocess_input(new_instance)
prediction = predict(regression, input_data)
display_result(prediction)
- Save and Load Model
save_model(regression, filename)
loaded_model = load_model(filename)
repeat_prediction_with_loaded_model(new_instance)
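
A comparable scikit-learn sketch for the logistic regression steps is given below. As before, it is illustrative only: the synthetic 13-feature data stands in for the heart dataset, `max_iter=1000` and the `random_state` values are assumptions, and the file name `heart_model_sketch.sav` is hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset (synthetic stand-in for the heart dataset).
X, Y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           n_clusters_per_class=1, random_state=2)

# Pre-processing: standardize the features.
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Split data: 80% training, 20% testing, stratified on the label.
X_train, X_test, Y_train, Y_test = train_test_split(
    X_standardized, Y, test_size=0.2, stratify=Y, random_state=2
)

# Model training: logistic regression (L2-regularized by default in scikit-learn).
regression = LogisticRegression(max_iter=1000)
regression.fit(X_train, Y_train)

# Evaluate on training and test data.
train_accuracy = accuracy_score(Y_train, regression.predict(X_train))
test_accuracy = accuracy_score(Y_test, regression.predict(X_test))
print(f"train accuracy={train_accuracy:.3f}, test accuracy={test_accuracy:.3f}")

# Predict for a new instance, including the probability of the positive class.
new_instance = scaler.transform(np.asarray(X[0]).reshape(1, -1))
prediction = regression.predict(new_instance)
probability = regression.predict_proba(new_instance)[0, 1]
print(f"predicted class={prediction[0]}, P(y=1)={probability:.3f}")

# Save and load the model with pickle, then repeat the prediction.
model_path = os.path.join(tempfile.gettempdir(), "heart_model_sketch.sav")
with open(model_path, "wb") as f:
    pickle.dump(regression, f)
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
reloaded_prediction = loaded_model.predict(new_instance)
```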

Frontend using Streamlit:

1. Import Libraries
Import necessary libraries: pickle, streamlit, streamlit_option_menu.
2. Define Functions
Load Users Database:
Try to open users_db.pkl file in read mode.
If the file is found, load and return the user data.
If the file is not found, return an empty dictionary.
Save Users Database:
Open users_db.pkl file in write mode.
Save the users_db dictionary into the file.
3. Load Pre-trained Models for Prediction
Load the diabetes model from a saved file.
Load the heart disease model from a saved file.
Load the Parkinson's disease model from a saved file.
4. Define URLs for Health Tips


Set the URL for diabetes health tips.
Set the URL for heart disease health tips.
Set the URL for Parkinson's disease health tips.
5. Login Function
Display the Login page.
Create fields to enter username and password.
When the user presses Login:
If the username does not exist in users_db, show an error: "Username not found. Please
register first."
If the username exists and the password matches, log the user in, set
session_state['logged_in'] = True, and display a success message.
If the password does not match, show an error: "Invalid username or password."
6. Register Function
Display the Register page.
Create fields to enter new username and new password.
When the user presses Register:
If the new username does not already exist in users_db, add it to users_db and save it.
Show a success message after registering.
If the new username exists, show an error: "Username already exists."
7. Login/Registration Check
If the user is not logged in ('logged_in' not in session_state or session_state['logged_in'] is
False):
Show a sidebar with options: Login or Register.
Based on the user's choice, call the login() or register() function.
8. Main App (Accessible Only if Logged In)
If the user is logged in ('logged_in' in session_state and session_state['logged_in'] is True):
Show the sidebar menu with options for predictions:
Diabetes Prediction
Heart Disease Prediction (Arrhythmia)
Parkinson’s Disease Prediction
9. Diabetes Prediction
If the user selects Diabetes Prediction:
Display the page for Diabetes Prediction.
Create input fields for symptoms such as Frequent Urination, Increased Thirst, Fatigue,
Blurred Vision, etc.
When the user presses the Diabetes Test Result button:
Collect the input values and convert them to float.
Use the diabetes model to predict if the person has diabetes.
Show the prediction result.
If the prediction is positive (indicating the person may have diabetes), show a link to
diabetes health tips.
If the prediction is negative (indicating the person may not have diabetes), show a message
saying so.
10. Heart Disease Prediction (Arrhythmia)
If the user selects Heart Disease Prediction (Arrhythmia):
Display the page for Heart Disease Prediction.
Create input fields for symptoms such as Chest Pain, Shortness of Breath, Irregular
Heartbeat, etc.
When the user presses the Heart Test Result button:
Collect the input values and convert them to float.
Use the heart disease model to predict if the person has arrhythmia.
Show the prediction result.
If the prediction is positive (indicating the person may have arrhythmia), show a link to
heart health tips.
If the prediction is negative (indicating the person may not have arrhythmia), show a
message saying so.
11. Parkinson’s Disease Prediction
If the user selects Parkinson’s Prediction:
Display the page for Parkinson’s Prediction.
Create input fields for symptoms such as Tremors, Muscle Stiffness, Slowness of
Movement, Impaired Balance, etc.
When the user presses the Parkinson’s Test Result button:
Collect the input values and convert them to float.
Use the Parkinson's disease model to predict if the person has Parkinson’s disease.
Show the prediction result.
If the prediction is positive (indicating the person may have Parkinson's disease), show a
link to Parkinson’s health tips.


If the prediction is negative (indicating the person may not have Parkinson’s disease), show
a message saying so.
12. End of Program.
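
Steps 9 to 11 share one flow: read the inputs, convert them to float, call the model, and report the result. A minimal Python sketch of that shared flow follows. The function name run_prediction, the example.com URLs, and the stand-in threshold_model are assumptions made for illustration; the report does not list the actual URLs, and the real application would presumably pass the predict method of its trained model instead.

```python
from typing import Callable, List, Sequence

# Placeholder URLs (assumption): step 4 defines one health-tips URL per
# disease, but the report does not list the actual addresses.
HEALTH_TIP_URLS = {
    "diabetes": "https://example.com/diabetes-tips",
    "arrhythmia": "https://example.com/heart-tips",
    "Parkinson's disease": "https://example.com/parkinsons-tips",
}

# A predictor takes a list of feature rows and returns one 0/1 label per row.
Predictor = Callable[[List[List[float]]], Sequence[int]]


def run_prediction(disease: str, raw_inputs: Sequence[str], predict: Predictor) -> str:
    """Convert the raw inputs to float, predict, and build the result message."""
    features = [float(value) for value in raw_inputs]
    label = int(predict([features])[0])
    if label == 1:
        return f"The person may have {disease}. Health tips: {HEALTH_TIP_URLS[disease]}"
    return f"The person may not have {disease}."


def threshold_model(rows: List[List[float]]) -> List[int]:
    """Hypothetical stand-in for a trained model: positive if 4+ symptoms are set."""
    return [1 if sum(row) >= 4 else 0 for row in rows]


print(run_prediction("diabetes", ["1", "1", "1", "1", "0", "1", "0", "0"], threshold_model))
print(run_prediction("diabetes", ["0", "1", "0", "0", "0", "0", "0", "0"], threshold_model))
```

Keeping the flow in one function means the three prediction pages differ only in their input fields, their model, and their health-tips URL.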

3.3 Summary
Chapter 3 discussed the system design in detail, including the system
architecture and the step-wise flow of the model, along with the implementation of the
model. The next chapter presents snapshots of the results.


CHAPTER 4
RESULTS AND DISCUSSION
4.1 Result Analysis
This chapter presents snapshots of all results obtained in the project, for which the
accuracy of the model was calculated. The figures below show the user interface, created
using Streamlit. Users enter values in the input fields, and the model predicts the presence
or absence of the selected disease from the input data.
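
The report does not show how accuracy was computed. The usual definition is the fraction of test samples whose predicted label matches the true label. The sketch below computes it together with the confusion counts. The function names and the label lists are illustrative assumptions, not the project's data.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


# Synthetic labels for illustration only (not results from the report).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))          # 0.75
print(confusion_counts(y_true, y_pred))  # {'tp': 3, 'tn': 3, 'fp': 1, 'fn': 1}
```

For a screening tool, the false-negative count is worth reporting alongside accuracy, since it counts people with the disease whom the model missed.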

Fig. 4.1: Registration page for new users.


Fig. 4.1 shows the registration page for new users. It consists of username and password
fields through which a user can register on the website.

Fig. 4.2: Login page for registered users.


Fig. 4.2 shows the login page of the website, through which a user can access
SymptoSense. If the login details are not registered, the user is asked to sign up.
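
The registration and login behaviour behind Figs. 4.1 and 4.2 can be sketched in plain Python as below. The error messages are those given in the algorithm of Chapter 3. The success messages, the function names, and the salted password hashing are assumptions added for illustration; the report does not state how passwords are stored, and saving users_db to disk is omitted here.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

# users_db maps username -> (salt, password hash). Hashing is an assumption;
# the report only says that the new user is added to users_db and saved.
UsersDb = Dict[str, Tuple[bytes, bytes]]


def _hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so the plain password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(users_db: UsersDb, username: str, password: str) -> str:
    if username in users_db:
        return "Username already exists."
    salt = os.urandom(16)
    users_db[username] = (salt, _hash_password(password, salt))
    return "Registration successful."  # hypothetical success text


def login(users_db: UsersDb, session_state: dict, username: str, password: str) -> str:
    if username not in users_db:
        return "Username not found. Please register first."
    salt, stored_hash = users_db[username]
    if hmac.compare_digest(stored_hash, _hash_password(password, salt)):
        session_state["logged_in"] = True
        return "Login successful."  # hypothetical success text
    return "Invalid username or password."


users, state = {}, {}
print(register(users, "asha", "s3cret"))
print(login(users, state, "asha", "s3cret"), state)
```

In the Streamlit application, the plain dictionary used for session_state here would be replaced by st.session_state, which persists across reruns of the script for one user session.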


Fig. 4.3: Prediction of diabetes model as 1.


Fig. 4.3 shows the prediction of the Support Vector Machine (SVM) model. It takes
inputs such as frequent urination, thirst level, hunger level, weight loss, fatigue,
blurred vision, healing of scars and numbness from the user and predicts 1 (true),
which indicates that the person may have diabetes. It also provides a link to health tips
for managing diabetes.

Fig. 4.4: Prediction of diabetes model as 0.


Fig. 4.4 shows the prediction of the Support Vector Machine (SVM) model, which takes
inputs from the user and predicts 0 (false), indicating that the person may not have
diabetes.
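
To illustrate how an SVM arrives at the 1 or 0 shown in Figs. 4.3 and 4.4, the sketch below applies the decision rule of a linear SVM: the prediction is 1 when the weighted sum of the inputs plus a bias is non-negative. The linear kernel, the weights, and the bias are assumptions chosen for illustration; the report does not state the kernel or the learned parameters.

```python
from typing import Sequence

# Hypothetical parameters for the eight diabetes inputs, in the order:
# frequent urination, thirst, hunger, weight loss, fatigue, blurred vision,
# healing of scars, numbness. These are NOT the project's trained values.
WEIGHTS = [0.5, 0.5, 0.25, 0.25, 0.25, 0.5, 0.25, 0.5]
BIAS = -1.5


def svm_decision(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Signed score w.x + b; its sign says on which side of the hyperplane x falls."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    return sum(w * v for w, v in zip(weights, x)) + bias


def svm_predict(x: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Return 1 (may have the disease) if the score is non-negative, else 0."""
    return 1 if svm_decision(x, weights, bias) >= 0 else 0


all_symptoms = [1.0] * 8
no_symptoms = [0.0] * 8
print(svm_predict(all_symptoms, WEIGHTS, BIAS))  # 1
print(svm_predict(no_symptoms, WEIGHTS, BIAS))   # 0
```

Training an SVM means choosing the weights and bias that separate the two classes with the widest margin; the sketch covers only the prediction step that the interface triggers.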


Fig. 4.5: Prediction of heart disease model as 1.


Fig. 4.5 shows the prediction of the Logistic Regression (LR) model. It takes
inputs such as chest pain, shortness of breath, fatigue and weakness, pain in the neck, jaw,
abdomen or back, numbness, swelling and irregular heartbeat from the user and predicts
1 (true), which indicates that the person may have heart disease, particularly arrhythmia.
It also gives a link to health tips for improving heart health.

Fig. 4.6: Prediction of heart disease model as 0.


Fig. 4.6 shows the prediction of the Logistic Regression (LR) model, which takes
inputs from the user and predicts 0 (false), indicating that the person may not have heart
disease (arrhythmia).
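
Logistic Regression produces the 1 or 0 in Figs. 4.5 and 4.6 by passing a weighted sum of the inputs through the sigmoid function and comparing the resulting probability with a threshold. The sketch below shows that prediction step. The weights, the bias, and the 0.5 threshold are illustrative assumptions, not the parameters learned in the project.

```python
import math
from typing import Sequence

# Hypothetical parameters for the seven heart inputs, in the order:
# chest pain, shortness of breath, fatigue and weakness, referred pain,
# numbness, swelling, irregular heartbeat. NOT the project's trained values.
WEIGHTS = [1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0]
BIAS = -2.0


def sigmoid(z: float) -> float:
    """Map any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr_probability(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Estimated probability of the positive class for input x."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def lr_predict(x: Sequence[float], weights: Sequence[float], bias: float,
               threshold: float = 0.5) -> int:
    """Return 1 (may have arrhythmia) if the probability reaches the threshold."""
    return 1 if lr_probability(x, weights, bias) >= threshold else 0


print(lr_predict([1.0] * 7, WEIGHTS, BIAS))  # 1
print(lr_predict([0.0] * 7, WEIGHTS, BIAS))  # 0
```

Unlike the SVM score, the intermediate value here is a probability, so the interface could also display how confident the model is rather than only the 0 or 1.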


Fig. 4.7: Prediction of Parkinson’s disease model as 1.


Fig. 4.7 shows the prediction of the Support Vector Machine (SVM) model. It takes
inputs such as tremor, slowness of movement, muscle stiffness, impaired balance,
facial expression, micrographia value, depression or anxiety value and blinking or
swallowing value from the user and predicts 1 (true), which indicates that the person
may have Parkinson’s disease. It also gives a link to tips for managing
Parkinson’s disease at an early stage.

Fig. 4.8: Prediction of Parkinson’s disease model as 0.


Fig. 4.8 shows the prediction of the Support Vector Machine (SVM) model, which takes
inputs from the user and predicts 0 (false), indicating that the person may not have
Parkinson’s disease.


4.2 Summary
Considering all the results obtained, the overall accuracy of the project can be taken
as 98%. The model can effectively predict diabetes, Parkinson's disease and heart disease
using SVM and Logistic Regression, achieving high accuracy for each disease. The system
outperformed existing solutions by offering reliable predictions with improved efficiency.
This chapter presented the snapshots of the results along with a detailed explanation of
how each prediction is produced.


CHAPTER 5

CONCLUSION AND FUTURE SCOPE

5.1 Conclusion
The multiple disease prediction project successfully developed a machine learning-based
system to predict diabetes, heart disease, and Parkinson's disease with promising
accuracy. By leveraging data preprocessing, feature selection, and classification
algorithms, the model provides a valuable tool for early detection and preventive
healthcare. The system emphasizes the importance of timely diagnosis, enabling healthcare
professionals to take proactive measures to manage or mitigate risks. While the results are
encouraging, the model's performance depends on high-quality, diverse datasets,
highlighting the need for continuous refinement and real-world validation. This project
demonstrates the potential of AI to revolutionize healthcare by enhancing diagnostic
efficiency and promoting personalized treatment strategies.

5.2 Future Scope

• Broaden the system to predict additional diseases such as cancer and respiratory
conditions.
• Implement explainable AI techniques to make predictions transparent and
trustworthy.
• Develop Android applications for wider accessibility and ease of use.


REFERENCES
[1] Multiple Disease Prediction using Machine Learning and Deep Learning with the
Implementation of Web Technology. Mostafizur Rahman, Saiful Islam, Sadia Binta
Sarowar, Meem Tasfia Zaman.

[2] Multiple Disease Predictions using Machine Learning and Deep Learning Algorithms.
Anish Fathima B, Vikram R, Siddarth S, Sri Vishnu M.

[3] A Survey on Machine Learning Algorithms for Healthcare Data Analysis. S. B. Kotsiantis.

[4] Multiple Disease Prediction by Applying Machine Learning and Deep Learning
Algorithms. M. Kalpana Chowdary, K. Anil Kumar, C. Ganesh, Rajsekhar Turaka,
B. Devananda Rao, Sk. Lokesh.

[5] Multi-Disease Prediction Using Machine Learning. Sathya V, Sriram S, Bhuvanesh G.

[6] Diabetes Prediction Using Support Vector Machines. N. Srividhya.

[7] Early Recognition of Parkinson’s Disease Through Acoustic Analysis and Machine
Learning. Niloofar Fadavi, Nazanin Fadavi. 2024.

[8] Heart Disease Prediction Using Support Vector Machine. Balakrishnan Duraisamya.
2023.

[9] Optimized Ensemble Learning Approach with Explainable AI for Improved Heart
Disease Prediction. Ibomoiye Domor Mienye, Nobert Jere. Published 8 July 2024.

[10] A Comprehensive Review on Advancements in Artificial Intelligence Approaches and
Future Perspectives for Early Diagnosis of Parkinson's Disease. Aiesha Mahmoud Ibrahim,
Mazin Abed Mohammed. Vol. 2, 2024.

[11] Research on Diabetes Prediction Method Based on Machine Learning. Jingyu Xue et al.
J. Phys.: Conf. Ser. 1684 012062, 2020.

[12] Parkinson’s Disease and Its Management. George DeMaagd, Ashok Philip.

[13] Effective Heart Disease Prediction Using Machine Learning Techniques.
Chintan M. Bhatt, Parth Patel, Tarang Ghetia, Pier Luigi Mazzeo.



© 2024 IJRAR December 2024, Volume 11, Issue 4 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138)

SymptoSense: Multiple Disease Prediction using ML
Mythri S.P, Nandita G Bhat, Priya B.J, Rakshitha N.V, Hiriyanna G.S
Department of CSE, JNN College of Engineering, Shivamogga, India
Abstract : The rising prevalence of chronic diseases such as diabetes, heart conditions, and Parkinson’s underscores the need for
accurate and efficient detection systems to enable early intervention. This study presents a predictive model leveraging Support
Vector Machine (SVM) and Logistic Regression (LR) algorithms to diagnose these conditions based on symptoms. The system
demonstrates high accuracy across all three diseases, proving its adaptability to diverse medical contexts. By incorporating advanced
analytics, the model offers a scalable and efficient solution for early disease detection. Future work will explore advanced feature
engineering and expanded disease coverage to further enhance the system’s predictive performance.
Index Terms - Diabetes, Heart disease, Parkinson’s disease, Prediction, Support Vector Machine, Logistic Regression

I. INTRODUCTION

In today's fast-paced world, early diagnosis of diseases is crucial for effective treatment and improved health outcomes. The project
"SymptoSense: Multiple Disease Prediction Using ML" takes advantage of advances in ML to develop a predictive system
that efficiently identifies multiple diseases based on input symptoms. It was designed by a group of
undergraduates from JNN College of Engineering, Shivamogga, with the aim of making healthcare solutions
more accessible. Using effective algorithms, it attempts to simplify the path to good health for both health
workers and patients.

II. LITERATURE SURVEY

In [1], diabetes is described as one of the most challenging health problems worldwide, and the number of cases is projected to
increase. Accurate prediction in the early stages helps manage the disease and prevent complications. Several machine learning
(ML) algorithms, including Support Vector Machines (SVM), have been used for predicting diabetes.
In [2], the disease is shown to have a significant impact on most organs, leading to complications such as kidney disease, nerve
damage, and cardiovascular disease. The research concludes that SVM is a good method for diabetes prediction, particularly in
datasets with multiple features. It recommends continually updating the data and models to improve the accuracy of the predictions.
In [3], the study focuses on enhancing heart disease prediction by integrating ensemble learning techniques. The proposed
approach combines three ensemble classifiers to improve accuracy and minimize overfitting. To ensure transparency, Shapley
Additive Explanations (SHAP) were employed to interpret model decisions and identify critical features.
In [4], the study provides a comprehensive review of artificial intelligence (AI) and machine learning (ML) approaches to
diagnosing Parkinson's Disease (PD), emphasizing the importance of early detection due to PD's progressive nature and the lack of
a cure. It explores various methodologies.

IJRAR24D2841 International Journal of Research and Analytical Reviews (IJRAR) 493
