Saniya LT L

Saniya work

Uploaded by

Guddu
Copyright © All Rights Reserved

GNDEC CURE-LIFE ANALYSIS SYSTEM

Using Machine Learning and Deep Learning

By
SANIYA MAHVEEN
UNDER THE GUIDANCE OF
DR PREMALA

DEPARTMENT OF COMPUTER SCIENCE

Aim:- Building a system that predicts


🩺Disease Diagnosis
🥼Early Diabetes Diagnosis
⚕️ Liver Disease Prediction
🦟Malaria Detection
🫁Pneumonia Detection

CHAPTER 1
INTRODUCTION

In terms of data collection and processing, healthcare is one of the most demanding industries. With the advent of the digital era and technological advancements, a vast quantity of multidimensional patient data is generated, including clinical factors, hospital resources, disease diagnostic information, patient records, and medical equipment data. This enormous, dense, and complex data must be processed and evaluated to extract knowledge for effective decision making. Medical data mining offers great potential for uncovering hidden patterns in medical datasets.
By identifying significant patterns and detecting correlations and relationships among many variables in large databases, data mining tools and machine learning approaches have transformed healthcare organizations. They serve as important instruments in the medical sector, providing and comparing existing data to guide future courses of action. These technologies combine multiple analytic methodologies with modern, complex algorithms, allowing massive amounts of data to be explored. In healthcare, they are used to gather, organize, and analyze patient data systematically. They can help identify inherent inefficiencies and best practices for delivering better services, which may lead to improved diagnosis, better medicine, and more successful treatment, and they provide a platform for a deeper understanding of the mechanisms in practically all areas of the medical domain. Overall, they assist in the early detection and prevention of disease epidemics by searching medical databases for pertinent information.

The process of determining a condition from a person's symptoms and indicators is known as medical diagnosis. In the diagnostic process, one or more diagnostic procedures, such as diagnostic tests, are performed. Diagnosing chronic illnesses is a vital issue in the medical industry because it depends on many symptoms; it is a complex procedure that frequently leads to incorrect assumptions. When diagnosing illnesses, clinical judgment is based mostly on the patient's symptoms as well as the physician's knowledge and experience. Furthermore, as medical systems evolve and new treatments become available, it becomes more difficult for physicians to keep up with current innovations in clinical practice. For effective therapy, medical practitioners must be well-versed in all pertinent diagnostic criteria, the patient's history, and combinations of drug therapies. However, mistakes are possible because physicians often make judgments intuitively, based on information and experience gained from past patients, and their cognitive capacity is limited by factors such as multitasking, restricted analysis time, and memory. As a result, it is difficult for a physician to make the right judgment consistently without the support of clinical tests and patient history. Even experienced physicians can benefit from a computer-aided diagnostic system when making medical judgments. Medical professionals are therefore very interested in automating the diagnosis process by integrating machine learning techniques with physician expertise. Data mining and machine learning approaches aim to translate available data intelligently into valuable information, improving the efficiency of the diagnostic process.

MOTIVATION:
Several studies have explored the diagnostic abilities of machine learning. It was found that, whereas the most experienced physicians diagnose with 79.97% accuracy, machine learning algorithms could do so with 91.1% accuracy. Machine learning techniques are applied to disease datasets to extract features for optimal disease diagnosis, prediction, prevention, and therapy. The dataset used for chronic renal disease had 25 features (e.g., red blood cell count and white blood cell count) and was collected over a 2-month period in India. This study aims to predict multiple diseases, such as diabetes, heart disease, chronic kidney disease, and cancer. Various classification algorithms (KNN, SVM, Random Forest, Logistic Regression, and Naive Bayes) are used to perform disease detection. The accuracy of each algorithm is validated and compared with the others to find the best one for prediction. A separate dataset is used for each disease to achieve the highest possible accuracy. The best-performing algorithm for each disease is then chosen and integrated into a web application, where the user can obtain a prediction for a disease by entering the attribute (input) values for that disease.
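A minimal sketch of the compare-and-select step described above, using scikit-learn. It is illustrative rather than the project's own code: scikit-learn's bundled Wisconsin breast cancer data (`load_breast_cancer`) stands in for one disease dataset, mean 5-fold cross-validated accuracy is assumed as the comparison metric, and all models use default hyperparameters apart from feature scaling for the scale-sensitive ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption): scikit-learn's bundled breast cancer data.
X, y = load_breast_cancer(return_X_y=True)

# The five classifier families named in the text; scale-sensitive models get a scaler.
candidates = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Naive Bayes": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")

best_name = max(scores, key=scores.get)
print("Best model for this dataset:", best_name)
```

Running the same loop once per disease dataset and keeping each winner mirrors the per-disease selection described above. For screening, recall or F1 may be a more informative selection metric than accuracy when one class is rare.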

CHAPTER 2

LITERATURE SURVEY

2.1 LITERATURE REVIEW

[1] Liva Faes et al. There is great promise in employing deep learning algorithms in medical diagnosis [1]. Although further research into the practical use of such algorithms is required, the accuracy of deep learning algorithms is on par with that of healthcare experts. The more relevant conclusion concerns methodology and reporting: an overstated claim from a poorly planned or badly reported study may damage the credibility of such diagnostic algorithms and their path to impact. In this summary, the authors emphasize some of the more important design and reporting concerns that researchers should consider.
[2] M. Chen et al. Using both structured and unstructured hospital data, the authors present a novel convolutional neural network (CNN)-based multimodal disease risk prediction method. According to the authors, no previous research in medical big data analytics appears to have combined the two kinds of data. Their approach achieves a prediction accuracy of 94.8% and converges faster than the CNN-based unimodal disease risk prediction algorithm, among other benchmarks.
[3] Pushpa Singh et al. Machine-based diagnosis of disease. Machine learning methods such as support vector machine (SVM), k-nearest neighbor (KNN), naive Bayes (NB), and decision tree allow decisions to be made more quickly and with less bias. Python is explored as a practical language for implementing these algorithms, which are used to diagnose major disorders such as cancer, diabetes, epilepsy, and heart attack.
[4] Alexander Selvikvag et al. From image analysis to natural language processing, deep neural networks are increasingly used in both academia and industry as cutting-edge machine learning models. Much progress is being made, and the potential for these innovations to improve medical imaging, medical data analysis, medical diagnostics, and healthcare as a whole is enormous.

[5] Yuanyuan Pan et al. Heart disease prognostication aids have been proposed, such as the Enhanced Deep learning aided Convolutional Neural Network (EDCNN). The EDCNN model incorporates regularisation learning strategies into a deeper architecture than that of the standard multi-layer perceptron. As a decision support system, EDCNN runs on the IoMT platform, which stores data about cardiac patients and makes it accessible from anywhere in the world through the cloud, allowing clinicians to make accurate diagnoses wherever they are located.
[6] Dhiraj Dahiwade et al. The authors employ the K-Nearest Neighbor (KNN) and Convolutional Neural Network (CNN) machine learning algorithms to forecast diseases accurately. Data on disease symptoms must be collected to forecast the spread of illness, and a person's lifestyle and medical history should be taken into account when making a general disease prognosis. CNN outperforms KNN in predicting common diseases by 8.5 percentage points, and KNN requires more time and storage than CNN.
[7] Dur-e-maknoon Nisar et al. The core components of every system are the tools and algorithms used to operate it. In deep learning, software transforms raw data into useful insights, and deep learning models differ in many ways between supervised and unsupervised approaches. Given the significance of these algorithms, medical instruments such as the microscope, phonendoscope, and EKG should be developed gradually on the basis of these models to compensate for the limits of physicians' perceptual capabilities. The use of deep learning to analyze and make predictions from large datasets is becoming increasingly important.
[8] Jérôme Thevenot et al. Many remote and non-invasive imaging technologies are outlined, each of which can shed light on the diagnostic value of various kinds of symptoms in relation to certain diseases. In this study, computer vision techniques were used to automatically recognise symptoms of over 30 different diseases and make preliminary diagnoses.
[9] Rahatara Ferdousi et al. The authors propose a machine-learning-based health cyber-physical system (CPS) framework that efficiently analyzes data from wearable IoT sensors to forecast, at an early stage, the likelihood of developing non-communicable diseases such as diabetes. An authentic diabetes dataset was used for training, whereas sensor data generated in a lab were used for testing.

[10] Alice Othmania et al. Novel deep learning algorithms (DLA) and methods are discussed. DLA can aid the clinical judgements of future radiologists, automate parts of the radiologist's job, and help even novice radiologists make decisions. The goal of DLA is to help doctors by automatically detecting and categorizing lesions so that a more accurate diagnosis can be made. Doctors benefit from DLA because it speeds up the interpretation of medical images and reduces the likelihood of mistakes.

2.2 INFERENCES FROM LITERATURE SURVEY

Existing systems for multi-disease prediction typically rely on traditional diagnostic methods, such as physical exams, medical history reviews, and laboratory tests. While these methods have been effective in many cases, they can be time-consuming and require significant resources. In addition, traditional diagnostic methods are not always accurate, which can lead to misdiagnosis and delayed treatment.

Machine learning algorithms have emerged as a promising alternative to traditional diagnostic methods, offering the potential for improved accuracy and efficiency. Several studies have explored machine learning algorithms for predicting specific diseases, such as heart disease and cancer. However, more research is needed on multi-disease prediction, which involves predicting multiple diseases simultaneously.

One existing system for multi-disease prediction is based on a combination of multiple machine learning algorithms. For example, a study by Park et al. (2019) used a combination of logistic regression, decision trees, and random forests to predict the onset of four different diseases: hypertension, diabetes, hyperlipidemia, and metabolic syndrome. The study found that the combination of these algorithms was more accurate than any individual algorithm alone.
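Combining logistic regression, decision trees, and random forests, as in the study above, can be sketched with scikit-learn's `VotingClassifier`. This is an illustrative assumption about how such a combination might be built (soft voting, which averages the models' predicted probabilities), not a reproduction of Park et al.'s method; the bundled breast cancer dataset is a stand-in.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# Soft voting averages the class probabilities of the three base models.
ensemble = VotingClassifier(estimators=base_models, voting="soft")
ensemble.fit(X_train, y_train)

# Compare each base model with the ensemble on the same held-out split.
results = {}
for name, model in base_models:
    results[name] = model.fit(X_train, y_train).score(X_test, y_test)
results["ensemble"] = ensemble.score(X_test, y_test)
print(results)
```

Whether the ensemble beats every individual model depends on the data; the comparison loop makes that check explicit rather than assuming it.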

Another existing system for multi-disease prediction is based on deep learning algorithms, such as CNNs. These algorithms are particularly well-suited to image classification tasks, such as identifying tumors in medical images. For example, a study by Yan et al. (2020) used a CNN to predict the onset of breast cancer from mammography images and found that the CNN achieved high accuracy.
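The operation that gives CNNs their name is a 2-D convolution (strictly a cross-correlation, which is what most deep learning libraries implement): a small kernel slides over the image and computes a weighted sum at each position. A minimal NumPy sketch with "valid" padding and stride 1; the vertical-edge kernel is hand-picked for illustration, whereas a real CNN learns its kernels from data.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Cross-correlate a 2-D image with a 2-D kernel ('valid' padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise ReLU activation, as applied after a conv layer."""
    return np.maximum(x, 0.0)


# Toy 5x5 "image": dark left part (0), bright right part (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-picked vertical-edge detector (illustrative; a CNN would learn these weights).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)  # each row is [3. 3. 0.]: the response is high where the edge is
```

A CNN stacks many such learned filters, interleaves them with pooling, and ends in fully connected layers that turn the feature maps into class scores.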

2.3 OPEN PROBLEMS IN EXISTING SYSTEM

Existing systems are not built on real-time datasets, and they suffer from low accuracy, low efficiency, and long implementation times. In addition, training and testing are not performed with a proper train-test split.

CHAPTER 3

DESCRIPTION OF PROPOSED SYSTEM

4.1 SELECTED METHODOLOGY OR PROCESS MODEL

MODULE 1: Dataset Collection and preprocessing

The Data Preprocessing module is a crucial component of the proposed multi-disease prediction system. This module will be responsible for cleaning and transforming the raw data into a format that is suitable for machine learning algorithms. The following steps will be performed in this module:
Data Cleaning: The raw data may contain missing values, outliers, or errors that need to be addressed before training the machine learning models. The Data Preprocessing module will handle these issues by imputing missing values, removing outliers, and correcting errors.
Feature Selection: The raw data may contain many features that are not relevant to predicting the diseases. The Data Preprocessing module will perform feature selection to identify the features that contribute most to the prediction of each disease. This can improve the accuracy of the machine learning models and reduce the complexity of the data.
Data Encoding: The machine learning algorithms require numerical input. The Data Preprocessing module will encode categorical variables using techniques such as one-hot encoding or label encoding. The module will also scale numerical variables so that they are on comparable scales.
Data Splitting: The preprocessed data will be split into training and testing sets
to evaluate the performance of the machine learning models. The training set will
be used to train the models, while the testing set will be used to evaluate their
performance.
Data Visualization: The Data Preprocessing module will also include data
visualization tools to help users explore and understand the data. This will include
histograms, scatter plots, and other visualizations that can reveal patterns and
relationships in the data.
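The cleaning, encoding, scaling, feature-selection, and splitting steps above can be chained with scikit-learn so that the same fitted transformations are reused later at prediction time. A minimal sketch under stated assumptions: the tiny DataFrame, its column names (`age`, `glucose`, `bmi`, `gender`, `smoker`, `outcome`) and its values are hypothetical; median imputation and `SelectKBest` with `k=3` are illustrative choices; and outlier removal is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data with missing values and categorical columns.
df = pd.DataFrame({
    "age":     [50, 31, 32, 21, 33, 30, 26, 29, 53, 54, 45, 60],
    "glucose": [148, 85, np.nan, 89, 137, 116, 78, 115, 197, 125, 160, 170],
    "bmi":     [33.6, 26.6, 23.3, np.nan, 43.1, 25.6, 31.0, 35.3, 30.5, 32.0, 36.1, 34.2],
    "gender":  ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M"],
    "smoker":  ["no", "yes", "no", "no", "yes", "no", "no", "yes", "yes", "no", "yes", "yes"],
    "outcome": [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
})
X = df.drop(columns="outcome")
y = df["outcome"]

numeric = ["age", "glucose", "bmi"]
categorical = ["gender", "smoker"]

preprocess = ColumnTransformer([
    # Impute missing numbers with the column median, then standardize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # One-hot encode categories; ignore categories unseen during fitting.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(score_func=f_classif, k=3)),  # keep the 3 most informative columns
])

# Stratified split so both classes appear in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

X_train_ready = pipeline.fit_transform(X_train, y_train)  # fit on training data only
X_test_ready = pipeline.transform(X_test)  # reuse the fitted steps; no refitting on test data
print(X_train_ready.shape, X_test_ready.shape)
```

Fitting the imputer, scaler, encoder, and selector on the training split only, then calling `transform` on the test split, keeps test-set information from leaking into preprocessing.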

MODULE 2: Model Training

Model training is a critical step in the proposed multi-disease prediction system. The goal of model training is to build accurate and robust machine learning models that can predict the likelihood of developing each disease. The following steps will be performed in the Model Training module:
Model Selection: The first step in model training is to select the appropriate
machine learning algorithms for each disease. The selected algorithms will be
trained on the preprocessed data and evaluated based on their performance metrics
such as accuracy, precision, recall, and F1-score.
Hyperparameter Tuning: After selecting the machine learning algorithms, hyperparameter tuning is performed to find the combination of hyperparameters that optimizes the performance of the models. Hyperparameters are set before training and are not learned from the data; examples are the learning rate and the regularization parameter.
Cross-validation: To detect and reduce overfitting, cross-validation is used to estimate how well the models perform on unseen data. The training data is split into several subsets (folds); each model is trained on all but one fold and evaluated on the held-out fold. This is repeated so that every fold serves once as the validation set, and the results are averaged to obtain a more reliable estimate of the models' performance.
Model Evaluation: Once the models are trained and validated, their performance is evaluated on the testing set. The evaluation metrics used in this step are the same as those used during hyperparameter tuning and cross-validation.
Model Selection and Saving: Based on the evaluation metrics, the best-performing models for each disease are selected and saved for future use.

The Model Training module is critical to the success of the multi-disease
prediction system. By selecting and training accurate machine learning models,
the system can make more reliable and accurate predictions about the likelihood
of developing each disease.
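The tuning, cross-validation, evaluation, and saving steps above can be sketched with scikit-learn's `GridSearchCV` and `joblib` (which the report already mentions for loading models). This is an assumption-laden illustration rather than the project's configuration: the Random Forest model, the small hyperparameter grid, F1 as the scoring metric, the bundled breast cancer dataset, and the output file name `breast_cancer_rf.joblib` are all hypothetical choices.

```python
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in dataset (assumption).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Hyperparameters are fixed before training; GridSearchCV tries every combination.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

# 5-fold cross-validation on the training set only, with F1 as the selection metric.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
print("Mean CV F1:", round(search.best_score_, 3))

# Final evaluation on the untouched test set: accuracy, precision, recall, F1.
y_pred = search.best_estimator_.predict(X_test)
print(classification_report(y_test, y_pred))

# Persist the best model for the prediction module (hypothetical file name).
joblib.dump(search.best_estimator_, "breast_cancer_rf.joblib")
```

Cross-validation runs only on the training split, so the test set stays untouched until the final evaluation; that is what keeps the last score an honest estimate.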

MODULE 3: Predictions

The Prediction module is a crucial component of the proposed multi-disease prediction system. This module allows users to input their medical information, such as age, gender, blood pressure, blood sugar level, cholesterol level, smoking habits, and family history of disease, and uses the trained machine learning models to predict the likelihood of developing each disease.
The input data is first preprocessed using the same preprocessing steps used
during model training. The preprocessed data is then passed through the selected
machine learning models to obtain predictions for each disease. The predictions
are based on the patterns and relationships learned by the models from the training
data.
The predictions are displayed in an interactive user interface using Streamlit. The
interface shows the probability of developing each disease based on the user's
input data. The user can explore the predictions in real-time and visualize the most
important features that contributed to the predictions. This allows users to better
understand the factors that influence the likelihood of developing each disease
and take proactive steps to prevent or manage the diseases.
The Prediction module also includes a data visualization component that allows
users to explore the dataset and view descriptive statistics about the variables.
This can help users understand the distribution of the variables and identify any
trends or patterns in the data.
In addition, the Prediction module is designed to be easily updated with new data. The system can be periodically retrained on new data to improve the accuracy of the predictions over time. This allows healthcare providers to stay up to date with the latest information and make more informed diagnoses and treatment plans for their patients.
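The key requirement above is that user input passes through exactly the same preprocessing as the training data. Bundling the preprocessing and the classifier into one scikit-learn `Pipeline` makes that automatic. A self-contained sketch: the feature names, the training rows, and the helper `predict_risk` are hypothetical, and in the real application the fitted pipeline would be loaded from disk (for example with `joblib.load`) rather than trained inline.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["age", "glucose", "bmi", "blood_pressure"]  # hypothetical input fields

# Hypothetical training data standing in for a real disease dataset.
train = pd.DataFrame({
    "age":            [50, 31, 32, 21, 33, 30, 26, 29, 53, 54],
    "glucose":        [148, 85, 183, 89, 137, 116, 78, 115, 197, 125],
    "bmi":            [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3, 30.5, 32.0],
    "blood_pressure": [72, 66, 64, 66, 40, 74, 50, 70, 70, 96],
    "outcome":        [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})

# Preprocessing and classifier bundled together, so inference reuses the fitted scaler.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(train[FEATURES], train["outcome"])


def predict_risk(user_input: dict) -> float:
    """Hypothetical helper: probability of the positive class for one user's input."""
    row = pd.DataFrame([user_input], columns=FEATURES)  # one row, columns in training order
    return float(model.predict_proba(row)[0, 1])


risk = predict_risk({"age": 45, "glucose": 160, "bmi": 34.0, "blood_pressure": 80})
print(f"Estimated probability of disease: {risk:.1%}")
```

Returning a probability rather than a hard label matches the interface described above, which shows the likelihood of each disease.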
MODULE 4: Web development

The development of a web-based multi-disease prediction system using Streamlit involves several steps. Streamlit is a Python library that makes it easy to build and deploy interactive web applications for data science and machine learning projects.
The first step is to install Streamlit and set up the project environment. Once
installed, the next step is to load the preprocessed data and the saved machine
learning models. This can be done using libraries such as pandas and joblib.
The next step is to design the user interface of the web application. Streamlit
provides a range of pre-built widgets, such as sliders, dropdown menus, and text
boxes, that can be customized to suit the needs of the application. The user
interface should be easy to navigate and intuitive to use.
The application should also include a prediction function that takes the user's input
data, preprocesses it using the same preprocessing steps used during model
training, and passes it through the saved machine learning models to obtain
predictions for each disease. The predictions can be displayed in real-time using
data visualization tools such as Matplotlib and Plotly.
The final step is to deploy the web application to a web server or cloud platform
such as Heroku or AWS. This will allow healthcare providers and patients to
access the application from anywhere with an internet connection.
Overall, a web-based multi-disease prediction system built with Streamlit can be a powerful tool for healthcare providers and patients alike. By combining machine learning with Streamlit, the application can provide real-time predictions that may help improve patient outcomes and reduce the burden on healthcare providers.
We create a separate form for each disease in our model (diabetes, liver cancer, heart disease, and breast cancer) and then combine all of these into a single HTML file.

4.2 ARCHITECTURE

Fig 4.1: Flowchart

Fig 4.2: System Architecture

4.3 DESCRIPTION OF SOFTWARE FOR IMPLEMENTATION AND
TESTING PLAN OF THE PROPOSED SYSTEM

The proposed system for predicting multiple diseases using various classification
algorithms consists of several stages, which include data preprocessing, algorithm
selection, model training, and model evaluation.
The first stage of the proposed system is data preprocessing, which involves handling missing values, scaling the features, and encoding categorical variables. Missing values can be handled using methods such as imputation, deletion, or interpolation. Scaling puts features measured on different ranges onto comparable scales so that no feature dominates simply because of its units; this matters especially for distance- and gradient-based models such as KNN, SVM, and Logistic Regression. It can be achieved using methods such as normalization or standardization. Categorical variables must be encoded as numerical values before they can be used in the model, using methods such as one-hot encoding or label encoding.
The next stage is algorithm selection, which involves selecting the appropriate
classification algorithms for disease prediction. The selected algorithms for this
project are K-Nearest Neighbors (KNN), Logistic Regression, Linear Regression,
Support Vector Machines (SVM), Decision Tree, Random Forest, Naive Bayes,
and Gradient Boosting.
KNN is a non-parametric algorithm that classifies a new data point based on its similarity to the k nearest points in the training set. The value of k determines the number of neighbors to consider.
Logistic Regression is a parametric algorithm that models the probability of a binary outcome using a logistic function. It assumes a linear relationship between the features and the log-odds of the outcome.
Linear Regression is a parametric algorithm that models the relationship between the dependent and independent variables using a linear equation. It assumes a linear relationship between the features and the outcome.
SVM finds the hyperplane that separates the classes with the maximum margin. With a non-linear kernel, it implicitly maps the input data to a higher-dimensional space where such a hyperplane can be found, and in that form it is usually treated as non-parametric.
Decision Tree is a non-parametric algorithm that builds a tree-like model of decisions and their possible consequences. It splits the data on the most informative feature at each node.
Random Forest is an ensemble method that builds multiple decision trees and aggregates their predictions to improve the accuracy of the model.
Naive Bayes is a probabilistic algorithm that models the probability of each class from the features of the data, assuming that the features are conditionally independent of each other given the class.
Gradient Boosting is an ensemble method that combines multiple weak learners into a strong learner. It works by sequentially adding new models that correct the errors of the previous models.
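To make the KNN description concrete, here is a from-scratch NumPy sketch of its prediction rule: Euclidean distance to every training point, then a majority vote among the k nearest. It is illustrative only; the toy points are made up, and in practice the project would use scikit-learn's `KNeighborsClassifier`.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train: np.ndarray, y_train: np.ndarray, x: np.ndarray, k: int = 3):
    """Predict the label of x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy 2-feature data: class 0 clusters near (1, 1), class 1 near (5, 5).
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.3, 0.9],
                    [5.0, 5.1], [4.8, 5.3], [5.2, 4.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # → 0
print(knn_predict(X_train, y_train, np.array([4.9, 5.0]), k=3))  # → 1
```

Because KNN relies on raw distances, features should be scaled first, as in the preprocessing stage; otherwise features with large numeric ranges dominate the vote.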

After selecting the appropriate algorithms, the next stage is model training. This
involves fitting the selected algorithms to the training data to learn the underlying
patterns in the data. The training process involves adjusting the parameters of the
algorithm to minimize the error between the predicted and actual values.
The final stage is model evaluation, which involves evaluating the performance
of the trained models on the test data. This is done using appropriate metrics such
as accuracy, precision, recall, and F1-score. Accuracy is the ratio of correct
predictions to the total number of predictions. Precision is the ratio of true
positives to the sum of true positives and false positives. Recall is the ratio of true
positives to the sum of true positives and false negatives. F1-score is the harmonic
mean of precision and recall.
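The four metric definitions above can be checked with a few lines of arithmetic on confusion-matrix counts. The counts below are hypothetical; scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities directly from true and predicted labels.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set counts for one disease model.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)  # accuracy = 0.85, precision = 0.80, recall ≈ 0.889, F1 ≈ 0.842
```

F1 can also be written as 2TP / (2TP + FP + FN), which here is 80 / 95 ≈ 0.842.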
The proposed system for predicting multiple diseases using various classification
algorithms can aid in early detection and diagnosis of diseases. The results of this
project demonstrate the effectiveness of different classification algorithms for
disease prediction, and they can guide healthcare professionals in making
informed decisions.

CHAPTER 5

IMPLEMENTATION DETAILS

5.1 ALGORITHM

• We propose an end-to-end application for 🩺 Disease Diagnosis, 🥼 Early Diabetes Diagnosis, ⚕️ Liver Disease Prediction, 🦟 Malaria Detection, and 🫁 Pneumonia Detection.
• Accurate analysis by the proposed application supports early disease prediction, patient care, and community services.
• The proposed prediction model follows a stepwise approach: data cleaning, feature extraction, and classification.
• Feature selection eliminates irrelevant features so that only relevant inputs are fed to the neural network.
• The proposed application is simple and performs efficiently.
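The last two steps (drop irrelevant features, then feed the remaining ones to a neural network) might look like the following in scikit-learn. This is a sketch only: the report does not specify its network or selection method, so the univariate `SelectKBest` filter with `k=10`, the small `MLPClassifier` architecture, and the stand-in breast cancer dataset are all assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption).
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, stratify=data.target, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),  # keep the 10 highest-scoring features
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)),
])
model.fit(X_train, y_train)

# Names of the features that survived selection.
kept = data.feature_names[model.named_steps["select"].get_support()]
print("Features kept:", list(kept))
print("Test accuracy:", round(model.score(X_test, y_test), 3))
```

Putting the selector inside the pipeline means feature scores are computed on the training split only, and the same columns are automatically kept at prediction time.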

Fig 5.1: Accuracy

Fig 5.2: Comparison Table

Fig 5.3: Algorithm Comparison

We gathered the data for our project for the following diseases in CSV and image formats:

• Diabetes: CSV dataset
• Pneumonia: image dataset

Data preprocessing is a data mining technique used to turn raw data into a format that is both practical and effective:

• For datasets in CSV format:
  • Removing unwanted columns
  • Removing null values
• For the pneumonia dataset in image format:
  • Removing unwanted images
  • Removing noisy images
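A minimal pandas sketch of the two CSV cleaning steps listed above. The report does not say which columns were removed, so `patient_id` and `notes` are hypothetical stand-ins, and all values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical raw diabetes-style table; "patient_id" and "notes" stand in for unwanted columns.
raw = pd.DataFrame({
    "patient_id": [101, 102, 103, 104, 105],
    "notes": ["", "follow-up", "", "", "new"],
    "glucose": [148, 85, np.nan, 89, 137],
    "bmi": [33.6, 26.6, 23.3, np.nan, 43.1],
    "outcome": [1, 0, 1, 0, 1],
})

# Step 1: remove columns that carry no predictive information.
clean = raw.drop(columns=["patient_id", "notes"])

# Step 2: remove rows that still contain null values, then renumber the index.
clean = clean.dropna().reset_index(drop=True)

print(clean)
```

Dropping rows is the simplest way to remove nulls but discards data; imputing missing values, as described in Module 1, keeps every row.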

Fig 5.4: Diabetes Attribute Values

TRAINING AND VALIDATION:

• Once we have split the dataset into training and test sets, we train a machine learning algorithm on the training set and evaluate its performance.

• In this research project, we fit the models using scikit-learn, one of the most widely used Python machine learning libraries.

• One of its key tools is the fit() method: after a model has been initialized, fit() trains the algorithm on the training data.

• The training inputs and labels are referred to as X_train and Y_train. X_train, the input passed to fit(), must be two-dimensional (rows are samples, columns are features) for the model to be fitted.

• We further evaluate our model on the validation dataset and save the best results. We then adjust the model according to the dataset to obtain the best possible results.
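The 2-D requirement for `X_train` is easy to trip over when a dataset has a single feature: passing a 1-D array to `fit()` makes scikit-learn raise a `ValueError` asking for a 2-D array. A small sketch, with made-up glucose values and labels, of reshaping the input into one column:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A single feature stored as a 1-D array (hypothetical glucose readings and labels).
glucose = np.array([85, 89, 116, 137, 148, 183])
Y_train = np.array([0, 0, 0, 1, 1, 1])

# fit() expects X with shape (n_samples, n_features); reshape the 1-D array into one column.
X_train = glucose.reshape(-1, 1)
print(X_train.shape)  # (6, 1)

model = LogisticRegression()
model.fit(X_train, Y_train)

# New inputs need the same 2-D layout: here two samples with one feature each.
print(model.predict(np.array([[95], [170]])))
```

The same rule applies at prediction time, which is why even a single new sample is passed as a nested list such as `[[95]]`.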

• Main Page :-

CHAPTER 6
RESULTS

DIABETES

Accuracy

REFERENCES

[1] Al-Turaiki, I. (2021). Heart Disease Prediction using Machine Learning Techniques: A Review. International Journal of Advanced Research in Computer Science, 12(2), 19-28.

[2] American Diabetes Association. (2021). Standards of Medical Care in Diabetes-2021. Diabetes Care, 44(Supplement 1), S1-S232.

[3] American Liver Foundation. (2021). Liver Disease. Retrieved from https://liverfoundation.org/liver-disease/
El-shafey, M. A., & Hassanien, A. E. (2020).

[4] Han, J., & Kamber, M. (2011). Data mining: concepts and techniques.
Elsevier.

[5] National Breast Cancer Foundation. (2021). Breast Cancer. Retrieved from https://www.nationalbreastcancer.org/breast-cancer/

[6] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
... & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal
of Machine Learning Research, 12(Oct), 2825-2830.

[7] Python Software Foundation. (2021). Python Language Reference, version 3.10. Retrieved from https://docs.python.org/3/

[8] Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical
modeling with python. Proceedings of the 9th Python in Science Conference,
57-61.

[9] Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., ... & van der Walt, S. J. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3), 261-272.

APPENDIX

A. SCREENSHOTS

Diabetes Webpage

Liver Disease Diagnosis

Malaria Detection

Data Source

B. SOURCE CODE
