Tarp Final

This document describes a proposed web portal for predicting multiple diseases using machine learning. Key details include using machine learning algorithms and a patient's uploaded information to predict diseases. The goal is to assist physicians by providing automatic disease predictions so that effective treatment can be determined. The project aims to address resource and time constraints in healthcare by developing an accessible machine learning-based prediction tool.

Uploaded by

Namrata Singhal
Web Portal for Multiple Disease Prediction Using Machine Learning

Technical Answers for Real World Problems (TARP)

Team Details:

Namrata Singhal | 19BCE0643

Khush Khandelwal | 19BCE0644

Under the Supervision of:

Prof. RAMANI S
INDEX

Serial No. Topic

1 Project Title

2 Abstract

3 Introduction

4 Literature Survey

5 Survey Table

6 Proposed Work

7 Proposed Model and Implementation

8 HTML Snippets

9 Results and Discussions

10 Conclusion and Future Work

11 References
PROJECT TITLE - Web Portal for Multiple Disease Prediction Using Machine Learning

ABSTRACT

In medical imaging, Computer Aided Diagnosis (CAD) is a rapidly growing and dynamic area of
research. In recent years, significant efforts have been made to improve CAD applications,
because errors in medical diagnostic systems can lead to seriously misleading medical
treatment. Machine learning plays an important role in Computer Aided Diagnosis. Objects
such as organs often cannot be described accurately by a simple equation, so pattern
recognition fundamentally involves learning from examples. In the biomedical field, pattern
recognition and machine learning promise to improve the accuracy of disease detection and
diagnosis. They also promote objectivity in decision-making. For the analysis of
high-dimensional and multimodal biomedical data, machine learning offers a sound approach
to building sophisticated, automatic algorithms. In this project, we have built a web portal
that helps physicians predict diseases simply by uploading some information about the
patient.

Keywords: Machine Learning, Computer Aided Diagnosis, Machine Learning Techniques, Web Portal

INTRODUCTION

Diseases are extremely dangerous to the human body. They can be caused by a variety of
viruses or by chemical reactions in the body. Among the many life-threatening illnesses,
diseases with similar symptoms have received much attention in medical research. Diagnosing
diseases with similar symptoms is a challenging task, and a system that provides automatic
predictions about a patient's illness can make further treatment more effective. The
diagnosis of such diseases is usually based on the patient's symptoms, signs and medical
images. A major challenge facing healthcare organisations, such as hospitals and medical
centres, is the lack of affordable resources and the limited time available. Healthcare is a
major industry that provides value-based care to millions of people while also producing
significant revenue for many countries. Excellence, value and outcome are three buzzwords
that still surround healthcare and promise a lot, and today healthcare experts and
stakeholders all over the world are searching for new ways to deliver on that promise.
Technology now helps healthcare specialists build alternative staffing models, capitalise on
intellectual property, provide smart healthcare, and reduce administrative and supply costs,
in addition to playing a vital role in patient care, billing and medical records.

During the COVID pandemic, medical staff have concentrated on COVID patients, which has
reduced the workforce available in other medical departments. Our objective is to assist the
physician by using classification models. Heart disease, breast cancer, diabetes, migraine,
jaundice, chickenpox, and other diseases and health issues have a major impact on one's
health and can lead to death if overlooked. The healthcare industry can make better
decisions by 'mining' its massive databases, that is, by extracting hidden trends and
relationships in the data and then using various machine learning algorithms to predict
disease.

Machine learning is steadily gaining momentum in the healthcare industry. The importance of
machine learning in the research and development of imaging modalities, as well as in their
practical application in a medical environment, cannot be overstated. Image segmentation,
image registration, image fusion, image-guided therapy, image annotation, and image database
retrieval are all examples of machine learning applications in medical imaging. Today,
machine learning is increasingly in demand in the healthcare and medical sectors. When such
techniques are applied properly, valuable insight can be extracted from vast databases,
allowing physicians to make more informed decisions and improve health care.

LITERATURE SURVEY

1] Breast cancer prediction using varying parameters of machine learning models [5]:

In this research paper, six supervised machine learning algorithms, including KNN, Logistic
Regression, Decision Tree, an ensemble method and Support Vector Machine, were applied,
along with deep learning through an artificial neural network using Adam and Gradient
Descent as optimizers. The data came from the Wisconsin Breast Cancer dataset, which
consists of 30 features computed from fine needle aspirates of breast masses. The best
performance, 98.9%, was obtained with Adam Gradient Descent Learning.

Advantages:

Ø The rectified linear unit (ReLU) was used; it did not cause any vanishing gradient issue
and allowed the model to learn faster and perform better.

Ø Taking the advantages of the Support Vector Machine into account, the developed model was
found to be superior.

Disadvantages:

Ø Work could not be extended for breast cancer classification.

Ø Random Forest algorithm was found to be the worst for high dimensional sparse data.

2] Analysis and prediction of diabetes complication disease using data mining algorithm [1]:

The most common diabetes complications in Indonesia are retinopathy, nephropathy and
neuropathy. This research paper built a prediction model for these three complications. The
diabetes risk factors were divided into seven features: age, gender, BMI, family history of
diabetes, blood pressure, duration of diabetes and blood glucose level. The data were
analysed using several techniques: Naïve Bayes tree and C4.5 decision tree classification,
and k-means clustering. The research has three main phases: data attribute selection and
preprocessing; analysis of the data mining algorithms and their evaluation criteria; and
generation of rules for the three major microvascular diabetes complications. The accuracy
of the proposed model is 68%, with the highest accuracy on the retinopathy prediction model.

Advantages:

Ø Proposed model could show the most powerful risk factor for diabetes complication
illnesses.

Ø It showed that compared to the clustering technique, classification technique gives better
information.

Ø Can predict diabetes complications at an early stage.

Disadvantages:

Ø The developed model does not yet provide an automatic pre-diagnosis system that helps
patients understand each risk factor of the complication disease.

Ø To improve the prediction model, more diabetes medical reports are required, especially
sample datasets from all regions of Indonesia.

Ø Models that might give better performance, such as logistic model tree and random forest,
were mentioned but not implemented in the research.

3] Data-driven modelling and prediction of blood glucose dynamics: Machine learning
applications in type 1 diabetes:

The purpose of this research work is to develop a compact guide to modelling options and
machine learning strategies, as well as hybrid systems, focused on predicting the dynamics
of blood glucose (BG) in type 1 diabetes, with applications in the artificial pancreas,
custom contour modelling, custom decision support systems and BG alarm events. The
literature was collected from various online databases, including Google Scholar,
ScienceDirect and many more. The research covers different types of machine learning
algorithms, such as artificial neural networks, support vector machines, Bayesian neural
networks, and decision trees. [6]

Advantages:

Ø The proposed model can obtain the best prediction of the test value. [6]
Ø The network topology was successfully used to model and predict the blood glucose level
of patients with type 1 diabetes [6]

Disadvantages:

Ø Due to some complexity, the prediction of BG dynamics was not accurate. [6]

Ø The time lapse between continuous blood glucose monitoring (CGM) and actual blood
glucose levels is not covered. [6]

Ø Lack of clear methods to estimate carbohydrate intake. [6]

4] Comparison of machine learning algorithms for clinical event prediction (risk of coronary
heart disease) [2]:

This research paper uses several supervised machine learning algorithms to predict clinical
events, with accuracy as the metric. The results were compared across two statistical
platforms, RStudio and RapidMiner. Machine learning algorithms such as decision tree,
random forest, support vector machine, neural networks and logistic regression were
implemented. The data were extracted from an open database of the Framingham Heart Study,
in which 57.1% of the observations corresponded to women and 42.9% to men. The accuracy
values for both sexes were calculated using the formula:
accuracy = (sensitivity * prevalence) + (specificity * (1 − prevalence))
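
The formula above can be checked with a few lines of Python. This is only a worked illustration: the function name `accuracy_from_rates` and the input values are assumptions chosen for the example, not figures from the cited paper.

```python
def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Diseased cases contribute sensitivity, healthy cases contribute specificity.
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Hypothetical inputs (not taken from the paper): 80% sensitivity,
# 90% specificity, 15% of observations have the disease.
example_accuracy = accuracy_from_rates(sensitivity=0.80, specificity=0.90, prevalence=0.15)
print(round(example_accuracy, 3))  # 0.885
```

When prevalence is low, the result is dominated by specificity, which is why accuracy alone can hide poor sensitivity.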

Advantages:

Ø The procedures developed can enhance the diagnostic and prognostic capacity of more
traditional regression techniques.

Ø The investigation was done on a comparatively small database, so it can be conducted on
any personal computer.

Ø All changes could be tracked in a reproducible file (script), which made the work easier
to replicate and correct and provided a sense of assurance.

Disadvantages:

Ø The data were released as part of a machine learning competition, so neither their
clinical reliability nor the quality of the results can be guaranteed.

Ø The dataset has a limited number of observations, limiting the prediction capability of
the trained models.

Ø A much more precise model could be obtained by using more information about the risk
factors of patients.
5] Developing a dengue prediction model based on climate in Tawau, Malaysia [3]:

Malaysia had a 10-fold increase in dengue cases over the last decade (i.e., 2006 – 2017).
This paper studies the relationship between weather variables, including their lagged terms,
and dengue incidence in the district of Tawau. A time series model was proposed to predict
future outbreaks in Tawau. The correlation between the weather variables and dengue
incidence was studied using the Spearman rank correlation.

Advantages:

Ø The developed model demonstrated an ability to forecast potential dengue outbreaks 1 to 4
months in advance.

Ø The developed model (i.e., the SARIMA model) with regression was more precise when
forecasting the timing of dengue incidence.

Disadvantages:

Ø The established model remains a work in progress, needing more varied and larger data.

Ø The data suffered from under-reporting (1 case reported to 23 unreported) and did not
contain measures such as sunshine hours and wind velocity.

Ø The developed model may not be sustainable in the long term.

6] An improved ensemble learning approach for the prediction of heart disease risk [4]:

In this research, enhanced machine learning techniques (KNN, Logistic Regression, support
vector machine, linear discriminant analysis) were proposed to predict heart disease
successfully. The Cleveland and Framingham datasets were used and partitioned into smaller
subsets using a mean-based splitting approach. The subsets were then modelled using
classification and regression trees (CART). The classification accuracies attained on the
Cleveland and Framingham datasets were 93% and 91% respectively.

Advantages:

Ø The results obtained outperformed other machine learning methods and similar scholarly
works.

Ø The proposed model showed that heart disease risk can be predicted well.

Ø Can be of assistance in medical counselling.
7] Evaluation of Dengue Model Performances Developed Using Artificial Neural Network
and Random Forest Classifiers

The goals of this research are: [7]

1) To evaluate the performance of models used to predict the correct category in a given
data set. These models are developed using artificial neural network (ANN) classifiers and
random forest (RF) classifiers, respectively.

2) To find the classifier with the best performance.

The accuracy obtained was low because of the small amount of data used in this research. It
was not possible to compare the results with similar studies, since none had been done
before. The data used in the project came from the department of microbiology. [7]

Advantages:

Ø The results obtained can be used for the development of a machine learning model, which
can predict the clinical degree of dengue at a critical stage. [7]

Ø A classifier provided the best performance. [7]

Ø Obtained result can contribute to health care research [7]

Disadvantages:

Ø Third party opinions were used. [7]

Ø Use of insufficient data. It consisted of only 77 dengue patients. [7]

Ø ANN architecture can be further developed for achieving higher accuracy rate. [7]

8] Efficient Heart Disease Prediction System

This paper presents a well-planned framework for determining the risk level of patients
based on certain parameters. A data mining algorithm was used for heart disease prediction,
and a good level of accuracy was obtained. The dataset used was the Cleveland dataset. [8]

Advantages:

Ø The system makes it easier for doctors to make their decisions. [8]

Ø It can predict the coronary illness risk level with high accuracy. [8]

Ø Patients can get early results, as the system works well even without retraining. [8]

9] F-test feature selection in Stacking ensemble model for breast cancer prediction

In this research paper, existing models were combined with supervised machine learning
algorithms to develop a new model for the prediction of breast cancer. ML algorithms such as
SVM, KNN, Logistic Regression and Naïve Bayes were used. The dataset used was the Wisconsin
dataset. To achieve higher accuracy, the F-test and a variance threshold were considered.

Advantages:

Ø F-test feature selection provided better accuracy when stacking ensemble models. [9]

Ø Shows that learning occurs even at the stacked level, which is different from the
majority voting used in bagging and boosting. [9]

Disadvantages:

Ø Inclusion of feature selection algorithm showed that the performance of the model cannot
improve. [9]

10] Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis

This research paper compares different machine learning algorithms: support vector machine,
decision tree, Naïve Bayes and KNN. The dataset used was the Wisconsin breast cancer
dataset. The study found that the support vector machine achieved the highest accuracy with
a very low error rate. [10]

Advantages:

Ø The SVM model showed greater accuracy in breast cancer prediction in comparison to the
other models. [10]

Ø The developed algorithm had the best performance in terms of precision and low error
rate. [10]

Disadvantages:

Ø The precision and efficiency of the other classifier models are lower than those of the
SVM classifier. [10]
SURVEY TABLE:

Paper | Model Used | Metrics | Preprocessing of Data | Dataset | Disease

[1] | Naïve Bayes tree, C4.5 decision tree and k-means clustering | Accuracy | Attribute selection, data mining | Indonesian diabetic patients | Diabetes

[2] | Decision tree, random forest, support vector machine, neural networks and logistic regression | Precision, Recall and Accuracy | Duplicate, eliminate and combine | Kaggle | Heart

[3] | SARIMA Model | Precision and Accuracy | Spearman Rank-Correlation Test | Tawau, Malaysia | Dengue

[4] | KNN, Logistic Regression, support vector machine, linear discriminant analysis | Precision, Recall, sensitivity, fscore and Accuracy | Classification and model tree | Cleveland and Framingham | Heart

[5] | Supervised machine learning algorithms with deep learning | Precision, Recall, fscore and Accuracy | Standardisation | Wisconsin Breast Cancer | Breast Cancer

[6] | Artificial neural network, support vector machines, Bayesian neural networks and decision tree | Mean square error (MSE), error grid analysis | Training, Validation & Testing | Kaggle | Blood glucose dynamics

[7] | Artificial Neural Network (ANN) classifier and Random Forest (RF) classifier | Precision, Recall and Accuracy | Training, Validation & Testing | Department of microbiology | Dengue

[8] | Data mining | Accuracy | Original Rules, Pruned Rules and Rules without duplicates | Cleveland | Heart

[9] | Support Vector Machines, Naive Bayes, K-NN, Logistic Regression and feature selection | Accuracy | Filter, Ensemble, Wrapper Feature Selections | Wisconsin dataset | Breast Cancer

PROPOSED WORK

In practice, when the demand for tests is high, it is difficult for doctors and lab
assistants to check the results quickly enough, however fast they work. Communicating
reports to the patient also takes significant time, which can put the patient's life in
danger. Our solution assists doctors and lab professionals in predicting multiple diseases
by speeding up prediction to a matter of seconds, using high-accuracy machine learning
algorithms. The sources of the datasets used are:

o Breast cancer - sklearn

o Heart disease - UCI

o Dengue - BioGPS

o Diabetes - UCI

The tools and libraries required are:

• Atom (for building the website)

• Jupyter Notebook (for the machine learning work)

• Sklearn

• TensorFlow

• Keras

• Flask

• MongoDB

• SMTP

• HTML

• CSS

• PyMongo

• Pandas

• NumPy

Our website deals with diseases like dengue, breast cancer, diabetes and heart disease risk.
We will be implementing multiple machine learning algorithms to minimise false negatives
(predicting absence when the disease is actually present), which can be fatal for the
patient. There will be a sign-up and login feature for the lab assistant, who can then enter
the patient's details along with all symptoms and medical logs. All the patient details will
be stored in the MongoDB database.
Fig 1: Overall working of the web Portal

Fig 2: Use Case Diagram


The proposed workflow of the portal begins with a login/register option for a patient, who
can register the symptoms they are experiencing. These are stored in the database; our ML
model then predicts the disease, updates the database and notifies the doctor.

Fig 3: CNN Architecture

CNN FEATURES: A CNN has two main parts. The first is a convolution tool that separates and
identifies the various features of the image for analysis, in a process called feature
extraction. The feature extraction network consists of many pairs of convolutional and
pooling layers. The second is a fully connected layer that uses the output of the
convolution process and predicts the class of the image based on the features extracted in
the previous stages. The feature extraction stage aims to reduce the number of features
present in a dataset: it creates new features that summarise the existing features of the
original set. The CNN has many layers, as shown in the CNN architecture diagram.

POOLING LAYER: A convolutional layer is followed by a pooling layer. The primary aim of this
layer is to decrease the size of the convolved feature map in order to reduce computational
cost. This is done by decreasing the connections between layers; the pooling layer operates
independently on each feature map and essentially summarises the features generated by a
convolution layer. Depending on the method used, there are several types of pooling
operations. In max pooling, the largest element is taken from each section of the feature
map. Average pooling calculates the average of the elements in a section of predefined
size. Sum pooling computes the total sum of the elements in the predefined section. The
pooling layer usually serves as a bridge between the convolutional layer and the fully
connected (FC) layer. Pooling generalises the features extracted by the convolution layer
and helps the network recognise the features independently. It also reduces the amount of
computation in the network.
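
The three pooling operations described above can be illustrated with a small pure-Python sketch. It assumes non-overlapping square windows (stride equal to the window size), which is a common default but is not stated in this report; the helper names `pool2d` and `mean` and the 4x4 feature map are made up for the example.

```python
from typing import Callable, List, Sequence


def pool2d(feature_map: Sequence[Sequence[float]],
           size: int,
           reduce: Callable[[List[float]], float]) -> List[List[float]]:
    """Summarise each non-overlapping size x size window with `reduce`."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the elements of the current window into a flat list.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce(window))
        pooled.append(pooled_row)
    return pooled


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical 4x4 feature map, pooled with 2x2 windows.
fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]
max_pooled = pool2d(fmap, 2, max)   # [[6, 4], [7, 9]]
avg_pooled = pool2d(fmap, 2, mean)  # [[3.75, 2.25], [3.5, 5.0]]
sum_pooled = pool2d(fmap, 2, sum)   # [[15, 9], [14, 20]]
```

In each case the 4x4 map shrinks to 2x2, which is the size reduction the pooling layer is used for.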
PROPOSED MODEL IMPLEMENTATION

We have used the deep learning algorithm CNN to increase the accuracy of the dengue
diagnosis part of our system. The convolutional neural network (CNN, or ConvNet) is one of
the most important deep learning algorithms and is widely used to analyse image data. CNNs
are a variation of multilayer perceptrons designed to require minimal preprocessing. They
use relatively little preprocessing compared with other image classification algorithms:
the network learns the filters that were hand-engineered in traditional algorithms. Another
reason why CNNs do so much better than classic neural networks on images is that the
convolutional layers take advantage of inherent properties of images.

1. Convolutions

● Simple feedforward neural networks do not see any order in their inputs. If you
shuffled the pixels of all your images in the same way, the neural network would have
the very same performance it had when trained on unshuffled images.
● A CNN, by contrast, takes advantage of the local spatial coherence of images. This
means that it can dramatically reduce the number of operations needed to process an
image by using convolution on patches of adjacent pixels, because adjacent pixels
together are meaningful. This is also called local connectivity. Each feature map is
then filled with the result of convolving a small patch of pixels, slid as a window
over the whole image.
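
The sliding-window idea can be sketched in pure Python as follows. This is an illustration only, not the project's code: the function name `conv2d_valid`, the tiny image and the edge-detecting kernel are assumptions. As in most deep learning libraries, the kernel is applied without flipping (strictly a cross-correlation), with no padding and a stride of 1.

```python
from typing import List, Sequence


def conv2d_valid(image: Sequence[Sequence[float]],
                 kernel: Sequence[Sequence[float]]) -> List[List[float]]:
    """Slide `kernel` over `image` and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Each output value depends only on a small patch of adjacent pixels.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The same four kernel weights are reused at every position, so the number of parameters does not grow with the image size.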

2. Pooling layers

a. There are also the pooling layers, which downscale the image. This is possible because we
retain, throughout the network, features that are organised spatially like an image, and thus
downscaling them makes sense in the same way as reducing the size of an image. On classic
inputs you cannot downscale a vector, as there is no coherence between an input and the one
next to it.

This independence from prior knowledge and human effort in feature design is of great
benefit. CNNs have applications in image and video recognition, recommender systems, image
classification, medical image analysis, and natural language processing. A CNN contains an
input and an output layer, as well as many hidden layers. The hidden layers usually consist
of convolutional layers, pooling layers, fully connected layers and normalisation layers.
The comparative results of the existing and proposed system are as follows:

Figure 4: Working of CNN Visualised


Table 2: Comparison between existing and proposed system

HTML Snippets:

Figure 5: Login Page for Patients


Figure 6: Login Page for Doctors

Figure 7: Page for selecting disease


Figure 8: Patient’s database

Figure 9: Doctor’s Database


RESULTS AND DISCUSSION

● Breast Cancer

Figure 10: KNN analysis

● Our model: Logistic Regression


● Features used – radius mean, perimeter mean, perimeter worst
● Metrics used – Accuracy, recall, Precision, F-score

Figure 11: Performance Comparison with other algorithms


Class | Precision | Recall | F-score | Support

0 | 1.00 | 0.96 | 0.98 | 75

1 | 0.93 | 1.00 | 0.96 | 39

Accuracy | - | - | 0.97 | 114

Macro avg | 0.96 | 0.98 | 0.97 | 114

Weighted avg | 0.98 | 0.97 | 0.97 | 114

Table 3: Class Distribution by KNN model
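
For readers unfamiliar with these metrics, the sketch below shows how precision, recall and F-score are derived from a confusion matrix. The counts are an assumption: they were chosen to be consistent with the rounded values and supports in Table 3 and are not taken from the project's output. The function name `per_class_metrics` is likewise hypothetical.

```python
from typing import List, Tuple


def per_class_metrics(confusion: List[List[int]], label: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f_score) for one class.

    confusion[t][p] counts samples whose true class is t and predicted class is p.
    """
    true_positive = confusion[label][label]
    predicted_as_label = sum(row[label] for row in confusion)
    actually_label = sum(confusion[label])
    precision = true_positive / predicted_as_label if predicted_as_label else 0.0
    recall = true_positive / actually_label if actually_label else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


# Assumed counts (rows = true class, columns = predicted class):
# 75 samples of class 0, of which 3 are predicted as class 1;
# 39 samples of class 1, all predicted correctly.
confusion = [
    [72, 3],
    [0, 39],
]
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total

for label in (0, 1):
    precision, recall, f_score = per_class_metrics(confusion, label)
    print(label, round(precision, 2), round(recall, 2), round(f_score, 2))
print("accuracy", round(accuracy, 2))
```

With these assumed counts the script reproduces the per-class rows and the accuracy of the table after rounding to two decimals.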

● Heart Disease

Table 4: Accuracy of other algorithms

● Our model: Logistic Regression


● Features used – cp, restecg, thalach, slope
● Performance – best among the non-deep models
● Only a limited set of features was used, for convenience

Accuracy Score 0.7540983606557377

F-Score 0.7457627117644068

Table 5: Our model’s Accuracy and f-score


● Dengue

Table 6: Performance of the model developed using ANN classifier

● Trained for 50 epochs on dengue blood sample dataset


● 89% accuracy on test data
● 99% on train data

Figure 12: Loss VS Epochs


Figure 13: Accuracy VS Epochs

● Diabetes

Table 7 :Impact of learning rate on accuracy measurement

● Our model: Logistic Regression


● Features used: BMI, Glucose, Age

Class | Precision | Recall | F1-score | Support

0 | 0.86 | 0.77 | 0.81 | 56

1 | 0.52 | 0.67 | 0.58 | 21

Accuracy | - | - | 0.74 | 77

Macro avg | 0.69 | 0.72 | 0.70 | 77

Weighted avg | 0.77 | 0.74 | 0.75 | 77

Table 8: Precision, recall and F-score
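
As a reminder of how a logistic regression model turns the three features listed above (BMI, Glucose, Age) into a prediction, here is a minimal sketch. The weights, bias, threshold and patient values are hypothetical placeholders chosen for illustration; they are not the coefficients fitted in this project.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features: Dict[str, float],
                  weights: Dict[str, float],
                  bias: float) -> float:
    """Probability of the positive class under a logistic regression model."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(score)


# Hypothetical coefficients, NOT the values learned by the portal's model.
WEIGHTS = {"bmi": 0.09, "glucose": 0.035, "age": 0.03}
BIAS = -8.5

# Hypothetical patient record.
patient = {"bmi": 32.0, "glucose": 150.0, "age": 45.0}
probability = predict_proba(patient, WEIGHTS, BIAS)
predicted_class = int(probability >= 0.5)  # 1 = diabetic, 0 = not diabetic
print(round(probability, 3), predicted_class)
```

Lowering the 0.5 threshold trades precision for recall on the positive class, which is one way to reduce missed diagnoses.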


CONCLUSION AND FUTURE WORK

We have successfully created a web portal to assist doctors and lab workers in making faster
and more accurate assessments of test results, in particular for heart disease, diabetes,
dengue and breast cancer. To be useful, a prediction model must provide accurate and
validated estimates of the risks to the individual and ultimately improve the individual's
outcome or the cost-effectiveness of care. We obtained comparable performance using
traditional machine learning models and a limited set of features. We also integrated a
communication module that sends the probable test results immediately to the patient or
doctor concerned.

In future, we will try to extend our portal to more diseases with higher accuracy and, in
particular, fewer false positive results. We will also add more features for doctors and
organise doctors by section for better tracking of patients.

REFERENCES

[1] Fiarni, C., Sipayung, E. M., & Maemunah, S. (2019). Analysis and prediction of diabetes
complication disease using data mining algorithm. Procedia Computer Science, 161,
449-457.

[2] Beunza, J. J., Puertas, E., García-Ovejero, E., Villalba, G., Condes, E., Koleva, G., ... &
Landecho, M. F. (2019). Comparison of machine learning algorithms for clinical event
prediction (risk of coronary heart disease). Journal of biomedical informatics, 97, 103257.

[3] Jayaraj, V. J., Avoi, R., Gopalakrishnan, N., Raja, D. B., & Umasa, Y. (2019). Developing
a dengue prediction model based on climate in Tawau, Malaysia. Acta tropica, 197, 105055.

[4] Mienye, I. D., Sun, Y., & Wang, Z. (2020). An improved ensemble learning approach for
the prediction of heart disease risk. Informatics in Medicine Unlocked, 20, 100402.

[5] Gupta, P., & Garg, S. (2020). Breast cancer prediction using varying parameters of
machine learning models. Procedia Computer Science, 171, 593-601.

[6] Woldaregay, A. Z., Årsand, E., Walderhaug, S., Albers, D., Mamykina, L., Botsis, T., &
Hartvigsen, G. (2019). Data-driven modeling and prediction of blood glucose dynamics:
Machine learning applications in type 1 diabetes. Artificial intelligence in medicine, 98,
109-134.

[7] Silitonga, P., Dewi, B. E., Bustamam, A., & Al-Ash, H. S. (2021). Evaluation of Dengue
Model Performances Developed Using Artificial Neural Network and Random Forest
Classifiers. Procedia Computer Science, 179, 135-143.

[8] Saxena, K., & Sharma, R. (2016). Efficient heart disease prediction system. Procedia
Computer Science, 85, 962-969.

[9] Dhanya, R., Paul, I. R., Akula, S. S., Sivakumar, M., & Nair, J. J. (2020). F-test feature
selection in Stacking ensemble model for breast cancer prediction. Procedia Computer
Science, 171, 1561-1570.
[10] Asri, H., Mousannif, H., Al Moatassime, H., & Noel, T. (2016). Using machine learning
algorithms for breast cancer risk prediction and diagnosis. Procedia Computer Science, 83,
1064-1069.

[11] Khanam, J. J., & Foo, S. Y. (2021). A comparison of machine learning algorithms for
diabetes prediction. ICT Express.
