
Disease Prediction

Uploaded by hitarthpatel001
Copyright © All Rights Reserved

Symptoms Based Disease Prediction Using Machine Learning Techniques

Rutul Gandhi (19BEC033) and Hitarth Patel (19BEC039)
Dept. of Electronics and Communication, Institute of Technology, Nirma University
[email protected], [email protected]

Abstract—In this paper, we design a model that takes unstructured symptoms as input from the user and returns a list of possible diseases. The user can also view detailed information about the symptoms, causes, diagnosis, and treatment of each predicted disease. Based on the symptoms entered, the system recommends further related symptoms. The system is user friendly, so people with little medical knowledge can use it easily, and it can help with early disease identification and diagnosis. It can also help people who are hesitant to see a doctor about minor symptoms by giving them a rough indication of how serious their illness may be.

Index Terms—Symptoms, Disease, Diagnosis, Machine learning model.

I. INTRODUCTION
The use of machine learning in healthcare and biomedicine has led to earlier disease detection and improved diagnosis, and in recent years it has improved patient care. Many people now turn to the internet to identify probable health-related issues. Several existing disease prediction systems target only one particular disease, such as heart disease or neurological disorders. A symptom-based disease prediction system, by contrast, helps doctors and medical experts detect a disease at an early stage from its symptoms. When a user enters a query, the most probable disorders are offered, ranked by their probabilities and scores.

II. IMPLEMENTATION
The system first asks the user to enter symptoms, from which it predicts the diseases with the highest probability. The flow chart in Fig. 1 depicts the process of disease prediction from user input symptoms. Each module is discussed in detail in the following subsections.

A. Symptoms Preprocessing
The system takes the symptoms as input on a single line, separated by commas (,), and applies the following preprocessing steps:
• Prepare a list of symptoms by splitting the input on commas.
• Convert all letters to lowercase.
• Remove punctuation and other special characters (if any).

B. Symptom Expansion using Synonyms
Symptom expansion uses synonyms from thesaurus.com and Princeton University's WordNet, which is available in Python. To find the synonym set, each symptom is broken into its combinations. Fig. 2 shows the user input symptoms and the symptoms that match the synonym string.
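The preprocessing and expansion steps above, together with the token-overlap similarity check that Section II-C uses to pick suggestions, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the `SYNONYMS` dictionary is a hypothetical stand-in for the thesaurus.com and WordNet lookups, "combinations" is read as every order-preserving subset of a symptom's words, and the 0.5 similarity threshold is an assumed value because the paper does not state one.

```python
import itertools
import re

# Hypothetical stand-in for thesaurus.com / WordNet lookups (assumed data).
SYNONYMS = {
    "fever": {"pyrexia", "febrile"},
    "high fever": {"pyrexia", "high temperature"},
    "rash": {"eruption", "skin rash"},
    "tiredness": {"fatigue", "exhaustion"},
}


def preprocess(raw):
    """Split comma-separated input, lowercase it, and strip special characters."""
    symptoms = []
    for part in raw.split(","):
        # Replace anything that is not a lowercase letter, digit or space.
        cleaned = re.sub(r"[^a-z0-9 ]", " ", part.lower())
        cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
        if cleaned:
            symptoms.append(cleaned)
    return symptoms


def combinations_of(symptom):
    """Order-preserving word combinations, shortest first (assumed reading)."""
    words = symptom.split()
    return [" ".join(combo)
            for size in range(1, len(words) + 1)
            for combo in itertools.combinations(words, size)]


def expand(symptoms):
    """Each symptom, its word combinations, and the synonyms of those."""
    expanded = set()
    for symptom in symptoms:
        for combo in combinations_of(symptom):
            expanded.add(combo)
            expanded.update(SYNONYMS.get(combo, set()))
    return expanded


def suggest(expanded, candidates, threshold=0.5):
    """Candidates whose share of tokens found in `expanded` exceeds threshold."""
    suggestions = []
    for candidate in candidates:
        tokens = candidate.split()
        if not tokens:
            continue  # skip empty candidate strings
        score = sum(token in expanded for token in tokens) / len(tokens)
        if score > threshold:
            suggestions.append(candidate)
    return suggestions


user_symptoms = preprocess("High Fever, skin-rash!")
print(user_symptoms)  # ['high fever', 'skin rash']
print(suggest(expand(user_symptoms),
              ["mild fever", "skin eruption", "joint pain", "pyrexia"]))
# ['skin eruption', 'pyrexia']
```

Here "mild fever" is not suggested because only half of its tokens appear in the expanded set, which does not exceed the assumed threshold; lowering `threshold` makes suggestions more permissive.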
Fig. 1. Flow Chart
Fig. 4. Predicted Diseases

Fig. 2. Input Symptoms


C. Symptoms Suggestion and Selection
Here, we find similar or related symptoms after expanding the symptom query. Each symptom is split into tokens, and each token is checked for its presence in the expanded set. From this, a similarity score is computed; if the score is above a threshold value, the symptom is considered similar to the user's input symptom and is offered to the user as a suggestion. The user selects some of the suggested symptoms. The user is then shown further symptoms, namely the top symptoms that co-occur with the selected ones. The user has three options: select symptoms, skip, or end the selection process. Fig. 3 shows the selection process.

Fig. 3. Symptom Suggestion

III. DISEASE PREDICTION
From the final symptom list, vectors specific to each model are generated and disease prediction is performed.

A. Prediction using Logistic Regression Model
A binary table is generated that contains 1 for each symptom present in the user's selection list and 0 otherwise. The model is trained on the given dataset and then used for prediction. The input to the model is the symptom table, and the output is a list of the top k diseases sorted in decreasing order of probability. Fig. 4 shows the list of predicted diseases with the probabilities obtained from the logistic regression model.

B. Prediction using TF.IDF and Cosine Similarity Model
• Term Frequency (TF) is the number of occurrences of a symptom in a disease.
• Document Frequency (DF) is the number of diseases in which a symptom occurs. Inverse document frequency (IDF) is computed as:
$$\mathrm{IDF} = \log_{10}\left(\frac{\mathrm{count}(\mathrm{AllDiseases})}{\mathrm{DF}}\right) \tag{1}$$
• The symptom vector entry for a symptom and a disease is calculated as:
$$\mathrm{TF.IDF}(sym, dis) = \mathrm{IDF} \cdot \log_{10}(1 + \mathrm{TF}) \tag{2}$$
• The score between the user's symptom query Q and a disease A is given by the following equation. A higher score indicates a higher level of similarity between the two vectors.
$$\mathrm{TF.IDF\ Score}(Q, A) = \mathrm{dot}(Q, A) \tag{3}$$
• The cosine similarity between the query vector Q and a disease vector A is computed as:
$$\mathrm{cos.sim}(Q, A) = \frac{\mathrm{dot}(Q, A)}{|Q| \cdot |A|} \tag{4}$$
A higher cosine similarity represents higher similarity between the disease and the query vector. The scores are sorted in descending order, and a list of the top K diseases is obtained.

IV. IMPROVEMENTS
• Initially, we predicted the disease directly from the symptoms entered by the user; this was then improved by adding a query expansion feature.
• A feature that suggests common co-occurring symptoms was added to provide more flexibility.
• More details about each disease, along with treatment recommendations, are provided to the user to make it a more complete medical system.

V. RESULT AND ANALYSIS
Various machine learning algorithms were applied to the dataset and evaluated on the basis of the accuracy obtained by each technique. Initially, we used the Multinomial Naive Bayes classifier to predict the top diseases, since it works efficiently with discrete values, but it gave only average accuracy. We then observed that higher accuracy could be obtained using Logistic Regression. The Accuracy vs. Classifier graph in Fig. 5 depicts the accuracy of each model.
Fig. 5. Model Accuracy Comparison
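The logistic-regression pipeline of Section III-A (binary symptom table in, top-k diseases with probabilities out) can be sketched as below. It is a dependency-free illustration, not the authors' implementation: the symptom vocabulary and training rows are hypothetical toy data, and the multinomial (softmax) model is fitted by plain batch gradient descent with an assumed learning rate and epoch count. In practice a library model such as scikit-learn's `LogisticRegression`, whose `predict_proba` method returns class probabilities, would normally fill this role.

```python
import math

# Hypothetical symptom vocabulary and training rows (not the paper's dataset).
SYMPTOMS = ["fever", "cough", "headache", "rash", "itching", "nausea"]
TRAINING = [
    ("flu", {"fever", "cough"}),
    ("flu", {"fever", "cough", "headache"}),
    ("flu", {"cough", "headache"}),
    ("allergy", {"rash", "itching"}),
    ("allergy", {"itching"}),
    ("allergy", {"rash"}),
    ("gastroenteritis", {"nausea"}),
    ("gastroenteritis", {"nausea", "fever"}),
    ("gastroenteritis", {"nausea", "headache"}),
]


def to_binary(selected):
    """One row of the binary table: 1 if the user selected the symptom, else 0."""
    return [1.0 if symptom in selected else 0.0 for symptom in SYMPTOMS]


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(model, x):
    """Class probabilities for one binary row x."""
    classes, weights, bias = model
    logits = [bias[c] + sum(w * xj for w, xj in zip(weights[c], x))
              for c in range(len(classes))]
    return softmax(logits)


def train(data, epochs=500, lr=0.5):
    """Multinomial logistic regression fitted by batch gradient descent."""
    classes = sorted({disease for disease, _ in data})
    n_feat = len(SYMPTOMS)
    model = (classes, [[0.0] * n_feat for _ in classes], [0.0] * len(classes))
    rows = [(to_binary(s), classes.index(d)) for d, s in data]
    for _ in range(epochs):
        grad_w = [[0.0] * n_feat for _ in classes]
        grad_b = [0.0] * len(classes)
        for x, label in rows:
            for c, p in enumerate(predict_proba(model, x)):
                err = p - (1.0 if c == label else 0.0)  # d(cross-entropy)/d(logit)
                grad_b[c] += err
                for j in range(n_feat):
                    grad_w[c][j] += err * x[j]
        _, weights, bias = model
        for c in range(len(classes)):  # average the gradient, then step
            bias[c] -= lr * grad_b[c] / len(rows)
            for j in range(n_feat):
                weights[c][j] -= lr * grad_w[c][j] / len(rows)
    return model


def top_k_diseases(model, selected, k=3):
    """Top-k diseases sorted in decreasing order of predicted probability."""
    probs = predict_proba(model, to_binary(selected))
    ranked = sorted(zip(model[0], probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


model = train(TRAINING)
for disease, prob in top_k_diseases(model, {"rash", "itching"}):
    print(f"{disease}: {prob:.2f}")
```

Because rash and itching occur only in the allergy rows of the toy data, allergy comes out on top for that query; the exact probabilities depend on the assumed learning rate and epoch count.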

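Equations (1) to (4) of Section III-B can likewise be sketched as a small ranking function. The disease-to-symptom counts are hypothetical toy data, and giving every query symptom TF = 1 is an assumption, since the paper does not state how the query vector is weighted. Vectors are stored as sparse dictionaries so that the dot product only visits symptoms the two vectors share.

```python
import math

# Hypothetical toy data: how often each symptom is recorded for each disease.
DISEASE_SYMPTOMS = {
    "flu": {"fever": 3, "cough": 2, "headache": 1},
    "allergy": {"rash": 2, "itching": 2},
    "gastroenteritis": {"nausea": 3, "fever": 1},
}


def idf_table(diseases):
    """Eq. (1): IDF = log10(count(AllDiseases) / DF), DF = diseases containing it."""
    df = {}
    for counts in diseases.values():
        for symptom in counts:
            df[symptom] = df.get(symptom, 0) + 1
    return {s: math.log10(len(diseases) / d) for s, d in df.items()}


def tfidf_vector(counts, idf):
    """Eq. (2): weight = IDF * log10(1 + TF); symptoms with no IDF are dropped."""
    return {s: idf[s] * math.log10(1 + tf) for s, tf in counts.items() if s in idf}


def dot(q, a):
    """Eq. (3): dot product over symptoms present in both sparse vectors."""
    return sum(w * a[s] for s, w in q.items() if s in a)


def cosine(q, a):
    """Eq. (4): dot(Q, A) / (|Q| * |A|), or 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(q, q)) * math.sqrt(dot(a, a))
    return dot(q, a) / norm if norm else 0.0


def rank(query_symptoms, diseases, k=3, metric=cosine):
    """Score every disease against the query and return the top k."""
    idf = idf_table(diseases)
    query = tfidf_vector({s: 1 for s in query_symptoms}, idf)  # assumed TF = 1
    scores = {d: metric(query, tfidf_vector(c, idf)) for d, c in diseases.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


for disease, score in rank({"fever", "cough"}, DISEASE_SYMPTOMS):
    print(f"{disease}: {score:.3f}")
```

Passing `metric=dot` ranks by the raw TF.IDF score of Eq. (3) instead of the cosine similarity of Eq. (4); for this toy query both rank flu first, since cough occurs only in flu and so carries the larger IDF weight.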
VI. CONCLUSION
In this paper, we have designed a symptom-based disease prediction system using the Logistic Regression technique, which is simple to implement and efficient to train. The designed model predicts diseases from the symptoms entered by the user and returns a list of the top predicted diseases with their probabilities. The dataset was also evaluated with various machine learning techniques, which were compared on the basis of the accuracy each obtained. Logistic Regression achieved an appreciable accuracy of about 91.18%.
ACKNOWLEDGEMENT
We would like to express our gratitude to Prof. Manish I. Patel, who gave us the opportunity to carry out research on a topic of our interest and present it in the form of a paper. We would also like to thank them for their guidance and support whenever it was needed. Finally, we thank the authors of the research papers that we referred to and gained relevant information from.