AI-based chatbot for skin disease prediction
using DCNN and Ensemble model
Team Members:
Vishnupriya N 2019103599
Ishwarya Rani M 2019103527
Navvya L 2019103548
Under the guidance of
Dr. P. Geetha
Introduction:
• Skin diseases are a common problem that affects people of all ages, genders, and ethnicities. They can be caused by
a variety of factors such as genetics, environmental factors, and lifestyle habits.
• However, access to dermatologists and other medical experts is often limited, and skin diseases can be difficult to
diagnose accurately. Delayed or incorrect diagnoses can further worsen the condition.
• With the rapid advancements in technology, a skin disease prediction chatbot has become a practical way to address
this problem.
• In recent years, chatbots have emerged as a promising tool for providing healthcare services to people in remote or
underserved areas. Such a chatbot stores the relevant disease information and symptoms in advance.
• A patient who would otherwise wait to consult a doctor can put questions to the chatbot instead. It is easy to use
and responds immediately with information and suggested remedies.
• This technology can provide individuals with immediate access to dermatological expertise, especially in areas with
limited healthcare resources. In addition, the skin disease chatbot can help reduce the workload of dermatologists by
screening and prioritizing cases that require immediate attention.
• The skin disease chatbot system works by allowing users to upload images of their skin condition through a user-
friendly interface. The chatbot then analyses the images and provides information on the type of skin disease. The
chatbot can also answer basic questions about the condition and suggest preventive measures.
• Overall, the skin disease chatbot is a promising solution to the problem of limited access to dermatologists and can
significantly improve the accuracy and timeliness of skin disease diagnoses, leading to better health outcomes.
• The main idea is to build a chatbot that takes several inputs about the user's lifestyle together with an image of the
affected skin. It then predicts the disease by asking the user about their symptoms and suggests a suitable
remedy.
• The chatbot system is particularly useful for people in rural or remote areas where access to dermatologists may be
limited. By providing accurate and reliable information about skin diseases, the chatbot can help users make informed
decisions about their health and seek appropriate treatment.
• In this project we use the ISIC2019 dataset where each disease has its own set of images. This dataset is small and
imbalanced and consists of images of six skin diseases namely actinic keratosis, basal cell carcinoma, melanoma,
nevus, seborrheic keratosis and squamous cell carcinoma.
• We will use Convolutional Neural Networks for Feature Extraction and ID3 Decision Tree Model to predict skin
diseases based on the symptoms.
• A customized convolutional neural network (CNN) has been developed to classify skin disease images. For plain
classification, the CNN uses a softmax function to calculate the probability of each disease appearing in the image, so
that the probabilities sum to 1.
• For the chatbot, however, a sigmoid function is used so that several diseases can be shortlisted for one image. The
sigmoid values do not necessarily sum to 1, and every disease scoring above a threshold of 0.5 is shortlisted.
• Classification does not rely on appearance alone: external factors such as the patient's region, smoking and drinking
habits, bleeding, and changes in the wound are also taken into consideration through the questions the chatbot
asks.
• A decision tree has been built to model these external factors, and the order of the questions for the chatbot is
determined based on this decision tree. It can be used to guide the chatbot to ask questions and help it make a more
accurate prediction of skin disease by getting responses from the user.
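The difference between the two output functions described above can be illustrated with a minimal sketch. The raw scores below are made-up values chosen for illustration, not output of the project's model, and the helper names are hypothetical.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Score every class independently; the values need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical raw network outputs for the six disease classes.
logits = [2.0, 1.5, -1.0, -2.0, 0.3, -0.5]

soft = softmax(logits)
sig = sigmoid(logits)

# With sigmoid outputs, several classes can exceed the 0.5 threshold at once.
above_threshold = [i for i, p in enumerate(sig) if p > 0.5]

print(round(sum(soft), 6), round(sum(sig), 2), above_threshold)
```

Softmax forces the classes to compete for a total probability of 1, whereas independent sigmoid scores allow more than one disease to pass the threshold, which is what the shortlisting step relies on.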
Overall Objective
• To develop a system that can predict skin diseases based on images and provide basic information about the disease
condition through a chatbot.
• The system should be able to analyze skin images submitted by users and classify them into different categories of skin
diseases. Additionally, the chatbot asks basic questions to narrow the result down to a single disease.
• The system should be user-friendly and accessible to people with different levels of technical expertise.
• The accuracy of the skin disease prediction should be evaluated and validated against a set of known skin diseases to
ensure that it is reliable and effective.
Literature Survey
1. Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion Classification. Yao, P., Shen, S., Xu, M.,
Liu, P., Zhang, F., Xing, J., Shao, P., Kaffenberger, B. and Xu, R.X. (2022). IEEE Transactions on Medical Imaging,
41(5), pp.1242–1254.
• Methodology: A novel skin lesion classification method consisting of modified DCNNs integrating regularization
DropOut and DropBlock, Modified RandAugment, MultiWeighted New Loss and an end-to-end Cumulative Learning Strategy.
• Advantages: The models with moderate complexity outperform the larger ones. Achieves significant performance with
less computing resources and shorter time. Deals with the class imbalance issue, improves the accuracy of key classes,
and reduces the interference of outliers in the network training.
• Disadvantages: It totally depends on training. So, if there are symptoms too different from the training dataset,
the prescription may not be valid.
• Ideas for adoption: Dealing with the class imbalance issue. The use of regularization techniques such as DropOut and
DropBlock can help to prevent overfitting and improve the generalization of the model. The use of a modified version of
RandAugment can also help to increase the diversity of the training data and improve the robustness of the model. The
use of a MultiWeighted New Loss function is another important aspect of this approach. This allows the model to assign
different weights to different classes based on their frequency in the dataset, which can help to address the class
imbalance issue.
2. An AI-Based Medical Chatbot Model for Infectious Disease Prediction. Chakraborty, S., Paul, H., Ghatak, S.,
Pandey, S.K., Kumar, A., Singh, K.U. and Shah, M.A. (2022). IEEE Access, 10, pp.128469–128483.
• Methodology: An AI chatbot interaction and prediction model using a deep feedforward multilayer perceptron for a
Covid-19 dataset. It uses TensorFlow to build the NLP for chatbots and utilizes a DNN architecture.
• Advantages: All-round 24/7 support; provides necessary information about the availability of hospital beds in an
area where the user wants the patient to be taken.
• Disadvantages: If it is implemented in its low accuracy state, the bot will not be helpful for the future user and
causes problems when it is about to give the perfect reply which is wanted by the user.
• Ideas for adoption: It's important to note that accuracy is crucial in any medical prediction model. If the model
is not accurate, it can potentially lead to misdiagnosis or inappropriate treatment, which can have serious
consequences for patients. Information about hospital bed availability is a useful feature for users, but it's
important to ensure that the information provided is up-to-date and accurate. Inaccurate information could lead to
users being directed to hospitals that are full or unable to provide the necessary care, which could be potentially
harmful.
3. Skin Lesion Classification by Ensembles of Deep Convolutional Networks and Regularly Spaced Shifting.
Thurnhofer-Hemsi, K., Lopez-Rubio, E., Dominguez, E. and Elizondo, D.A. (2021). IEEE Access, 9, pp.112193–112205.
• Methodology: Combines several convolutional deep classifiers which form an ensemble, by merging the class
information provided by the classifiers when they are run on shifted versions of the test image. This way, the
advantages of each classifier are exploited, while positional invariance is enhanced by the shifting procedure.
• Advantages: Yields a more accurate assessment of the lesion; the nevi class scores were higher in most of the
predictions.
• Disadvantages: More deep networks and other topologies of the lattice can be tested. Image transformations, such
as rotations combined with the proposed shifts, may improve the generalization level of the model. Not related to
standard train time data augmentation by training image shifting.
4. Skin Disease Prediction. Sanas, S., Pawale, P., Ghadage, G. and Sahani, M. (2021). 8(4), pp.4344–4347.
• Methodology: The computer algorithm ResNet152V2, which contains a few steps that involve image processing, image
feature extraction, and classification of data, has been implemented with the help of a classifier such as an
artificial neural network (ANN). The algorithm uses feature extraction and the soft-max classifier of a Convolutional
neural network (CNN) for the detection of skin disease.
• Advantages: An effective, low-cost solution. Provides better accuracy than other algorithms like BPN, Logistic
regression, KNN, Transfer Learning.
• Disadvantages: Can be integrated with a website or an Android app to provide real-time data for skin disease
prediction. Dataset used is small. For better performance, designing deep learning network structures, using
adaptive learning rates, and training it on clusters of data rather than the whole dataset can be done.
• Ideas for adoption: Training the algorithm on clusters of data rather than the entire dataset is also a good
suggestion. This can help to reduce the training time and improve the efficiency of the algorithm.
5. Targeted Ensemble Machine Classification Approach for Supporting IoT Enabled Skin Disease Detection. Yu, H.Q. and
Reiff-Marganiec, S. (2021). IEEE Access, pp.1–1.
• Methodology: A dynamic AI-model configuration and secured IoT-Fog-Cloud and a new classification process to
produce better classification results in skin disease detection are used. To achieve this, the existing machine
learning models are evaluated and cross validated in a controlled and standard testing environment.
• Advantages: A new classification process was provided to produce better classification results in skin disease
detection. Having a two-phase classification process can produce a better result than only using one specific CNN
model.
• Disadvantages: Classification can be done for a wider skin-related disease classification.
• Ideas for adoption: One key feature of the proposed approach is the use of a two-phase classification process,
which the authors suggest can produce better results than using a single CNN model. The approach is also said to
enable classification of a wider range of skin-related diseases.
6. Prediction of Skin Diseases Using Machine Learning. Mtende Mkandawire and Dr. Glorindal Selvam (2022).
International Journal of Advanced Research in Science, Communication and Technology, pp.54–61.
• Methodology: Using computer-aided techniques in Machine learning such as Ensemble Algorithm and Data Mining
Algorithms to predict skin diseases in real time.
• Advantages: Application developed is light-weight and can be used in machines with low system specifications. It
also has a simple user interface for the convenience of the user.
• Disadvantages: Research and execution of limited medical information. It can be explored with recent advances in
AI and the benefits of diagnosis assisted with AI.
• Ideas for adoption: Ensuring that the system is trained on diverse and representative data sets is critical to its
accuracy and generalizability.
7. Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review. Safi, Z., Abd-Alrazaq, A.,
Khalifa, M. and Househ, M. (2020). Journal of Medical Internet Research, 22(12), p.e19127.
• Methodology: It follows a scoping review methodology, specifically, the PRISMA extension of scoping reviews.
• Advantages: Results showed that the common language of communication between the user and chatbot is English; the
most common technique was pattern matching for developing text understanding.
• Disadvantages: It is important to conduct more in-depth systematic reviews on the effectiveness of chatbots in
supporting and enhancing positive clinical outcomes.
• Ideas for adoption: English as the base language for the chatbot. Question generation method to be dependent on
the answers in a dynamic way.
8. IntelliDoctor – AI based Medical Assistant. Gandhi, M., Kumar Singh, V. and Kumar, V. (2019). In: 2019 Fifth
International Conference on Science Technology Engineering and Mathematics (ICONSTEM).
• Methodology: The app utilizes predictive analytics to generate periodic health reports which are tailored
according to individual users' needs by acquiring information on the patients' everyday activities and environment.
• Advantages: System tracks their physical activities like periodic step count and their calorie intake; it allows
them to set a goal to follow. The system allows users to add the food and it automatically fetches their nutrition
information. All the information is displayed in a graphical interface. Periodic health reports are generated for
the users to follow.
• Disadvantages: Not reliable or accurate compared to traditional methods. Relies on user's health activities like
step counts, sleep tracking etc., so any discrepancies in these readings could affect its accuracy too. Requires
users to input their medical information accurately, which can sometimes lead to incorrect results if the data
provided by them is inaccurate.
• Ideas for adoption: A user friendly application. A focus on usability and simplicity, provide clear instructions
and feedback, ensure data accuracy, consider data privacy and security.
9. A Convolutional Neural Network Model for Online Medical Guidance. Yao, C., Qu, Y., Jin, B., Guo, L., Li, C.,
Cui, W. and Feng, L. (2016). IEEE Access, 4, pp.4094–4103.
• Methodology: CNN is used for text classification and to perform feature construction and transformation on raw,
noisy data.
• Advantages: The variation of the system accuracy with varying number of returned answers; if the standard answer
is among the answers returned by the system, the system answer is considered to be correct.
• Disadvantages: Does not have an online service platform and accuracy can be improved (70% accuracy). If only one
disease name can be returned as the final answer, the results are not ideal.
• Ideas for adoption: A set of diseases can be predicted using the image, while an addition of chatbot to know the
other information and conclude on one disease.
10. Healthcare Chatbot using Artificial Intelligence. Patil, A. (2022). International Journal for Research in
Applied Science and Engineering Technology, 10(8), pp.905–909.
• Methodology: Uses computer-aided techniques in Machine learning such as Ensemble Algorithm and Data Mining
Algorithms to predict skin diseases in real time.
• Advantages: Application developed is light-weight and can be used in machines with low system specifications. It
also has a simple user interface for the convenience of the user.
• Disadvantages: Research and execution of limited medical information. It can be explored with recent advances in
AI and the benefits of diagnosis assisted with AI.
• Ideas for adoption: Having a simple user interface can also help to make the tool more user-friendly and increase
its adoption by both healthcare professionals and patients. However, it's important to ensure that the limited
medical information used in the development of the chatbot is reliable and accurate.
Summary of Issues
The surveyed systems depend heavily on their training data: when a user's symptoms differ too much from the
training set, the output may not be valid, and a chatbot deployed at low accuracy is of little help to the user.
More in-depth systematic reviews are needed to assess how effectively chatbots support and enhance positive
clinical outcomes. On the modelling side, deeper networks and other topologies of the lattice remain to be tested,
and image transformations such as rotations may improve the generalization level of the model. Existing predictors
could be integrated with a website or an Android app to provide real-time skin disease prediction, but the datasets
used are small; better performance may come from designing deep learning network structures, using adaptive
learning rates, and training on clusters of data rather than the whole dataset. Classification should cover a
wider range of skin-related diseases, and work based on limited medical information can be extended with recent
advances in AI-assisted diagnosis. Some systems are less reliable or accurate than traditional methods and require
users to input their medical information accurately. One system has no online service platform, its accuracy can
be improved, and its results are not ideal when only one disease name can be returned as the final answer.
Proposed System
Our main contribution is a chatbot driven by a questionnaire model.
• The questionnaire includes a set of questions related to the user's symptoms, medical history, and other
relevant factors that can help identify the skin disease.
• The questions on symptoms and lifestyle are hard-coded into the questionnaire model in advance and are asked in
an order determined by the decision tree.
• The resulting ordered questionnaire guides the chatbot's questions, and the user's responses help it make a more
accurate prediction of the skin disease.
Overall Architecture
Detailed Module Design
Module 1: Customized CNN Model Creation on ISIC2019 Dataset
Module 2: Shortlisting of skin diseases for given user image
Module 3: Dynamic ID3 Decision Tree Model
Module 4: Creation of Chatbot and Questionnaire Model
Module 5: Prediction of exact Skin Disease by the user-chatbot interaction
Module 1: Customized CNN Model Creation on ISIC2019 Dataset
• A CNN model is created for the ISIC2019 skin disease image dataset.
• This dataset consists of images of six skin diseases: actinic keratosis, basal cell carcinoma, melanoma,
nevus, seborrheic keratosis and squamous cell carcinoma.
• The customized CNN model is built and trained on this image dataset.
INPUT - ISIC2019 skin disease image data set.
OUTPUT - CNN model
Module 2: Shortlisting of skin diseases for given user image
• Skin diseases are shortlisted for a given user image using the CNN model created in Module 1.
• The user image is collected and passed to the CNN model.
• The skin diseases whose predicted score is above the threshold of 0.5 are shortlisted for the next process.
INPUT – CNN model and User Image
OUTPUT – Shortlisted skin diseases
Module 3: Dynamic ID3 Decision Tree Model
• Here another dataset, PAD-UFES-20, which contains the external factors, is used.
• Only the details of the diseases that were shortlisted in the previous module are extracted from the PAD-UFES-20
dataset.
• An ID3 decision tree model is created for the extracted details.
INPUT – Shortlisted skin diseases
OUTPUT – ID3 Decision tree model
Module 4: Creation of Chatbot and Questionnaire Model
• The chatbot is created using the questions that need to be answered by the user.
• A model for creating customized questions and answers is built. Using this model, the chatbot generates the
customized question-answer set for the dynamically generated decision tree.
• The Chatbot collects the user image for further processing.
INPUT - NLP Model and Question-Answer model
OUTPUT - Chatbot
Module 5: Prediction of exact Skin Disease by the user-chatbot interaction
• The chatbot asks the user targeted questions based on the generated questionnaire.
• Follow-up questions are generated based on the user's responses about the symptoms.
• The skin disease of the user is predicted.
• Remedies for the predicted disease are also provided.
INPUT - Chatbot
OUTPUT - Skin disease of the user is predicted
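The question-and-answer loop of this module could look roughly like the sketch below. It assumes the decision tree is stored as nested dictionaries; the tree contents, the attribute names and the function name `predict_by_questions` are hypothetical and only illustrate the idea, not the project's actual code.

```python
def predict_by_questions(tree, get_answer):
    """Walk a decision tree, asking one question per internal node.

    tree: nested dicts {"attribute": ..., "branches": {answer: subtree_or_leaf}}
    get_answer: callable that returns the user's answer for an attribute
    """
    asked = []
    node = tree
    while isinstance(node, dict):
        attribute = node["attribute"]
        answer = get_answer(attribute)
        asked.append(attribute)
        if answer not in node["branches"]:
            return None, asked  # answer not covered by the tree: no prediction
        node = node["branches"][answer]
    return node, asked  # a leaf holds the predicted disease

# Hypothetical tree for two shortlisted diseases plus nevus.
tree = {
    "attribute": "bleed",
    "branches": {
        "yes": {
            "attribute": "changed",
            "branches": {"yes": "melanoma", "no": "basal cell carcinoma"},
        },
        "no": "nevus",
    },
}

# Scripted answers stand in for the live chatbot conversation.
scripted_answers = {"bleed": "yes", "changed": "no"}
disease, asked = predict_by_questions(tree, scripted_answers.get)
print(disease, asked)
```

In a live chatbot, `get_answer` would prompt the user instead of reading from a dictionary, so only the questions on the path actually taken are asked.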
IMPLEMENTATION DETAILS 30%
Module 1: Customized CNN Model Creation on ISIC2019 Dataset
All the required libraries and packages are imported.
The path for reading the images from the ISIC2019 image dataset is defined, and the image count is displayed.
The class names of the skin diseases are listed and stored in a list.
The images are visualised using matplotlib.
The distribution of each disease in the training dataset is visualised. The visualisation shows that the
data is unevenly distributed among the classes and that each class has few images.
The class imbalance is rectified using Augmentor. Each class is brought to a count of 1000 images. Left and right
rotations, each with a maximum magnitude of 10, are applied to the data at random.
After data augmentation, 1000 images have been created in each class, giving 6000 generated images in total for
the 6 classes.
The distribution of the augmented data is visualised after the new images are added to the training data. All the
classes are now balanced.
The data is split for training and validation, with a split of 0.2.
The CNN model is created by adding customised layers to it.
A summary of the created CNN model is displayed.
The model is compiled with the Adam optimizer and trained for 25 epochs.
The accuracy of the CNN model is displayed for each epoch. At the 25th epoch, a training accuracy of 0.89 and
a validation accuracy of 0.87 are obtained.
The model is saved in the path final_model under the name cnn_sigmoid_model.h5.
Module 2: Shortlisting of skin diseases for given user image
The previously created CNN model is loaded.
The map list is created and the threshold for shortlisting is set to 0.5.
A function is defined to find the diseases that are above the threshold. The predicted value for each of the
six classes is checked, and if it is above 0.5 the disease is appended to the list of shortlisted diseases.
The CNN model is applied to an example test image to predict the shortlisted diseases.
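A minimal sketch of such a shortlisting function is given below. The class order, the example scores and the function name `shortlist_diseases` are assumptions made for illustration; in practice the order must match the one used when the CNN was trained, and the scores would come from the model's sigmoid outputs.

```python
# Assumed class order; it must match the label order used during training.
CLASS_NAMES = [
    "actinic keratosis",
    "basal cell carcinoma",
    "melanoma",
    "nevus",
    "seborrheic keratosis",
    "squamous cell carcinoma",
]
THRESHOLD = 0.5

def shortlist_diseases(scores, class_names=CLASS_NAMES, threshold=THRESHOLD):
    """Return the names of all classes whose score is above the threshold."""
    if len(scores) != len(class_names):
        raise ValueError("expected one score per class")
    return [name for name, score in zip(class_names, scores) if score > threshold]

# Made-up sigmoid outputs for one image (not real model output).
example_scores = [0.08, 0.71, 0.64, 0.12, 0.03, 0.49]
shortlisted = shortlist_diseases(example_scores)
print(shortlisted)
```

Because the scores are independent, the result may contain several diseases, a single disease, or none at all; the last case would need separate handling in the chatbot.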
Module 3: Dynamic ID3 Decision Tree Model
The ID3 decision tree algorithm is implemented from scratch. Only the data of the shortlisted diseases is
extracted, and a decision tree is built for it.
The decision tree is built for the shortlisted diseases. The PAD-UFES-20 dataset and the list of shortlisted
diseases are passed as the arguments for building the tree.
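The extraction step can be sketched as a simple filter over the metadata records. The records below are invented, and the column names (`smoke`, `bleed`, `changed`, `diagnostic`) and diagnosis codes are assumptions about how the PAD-UFES-20 metadata is laid out; they should be checked against the actual file.

```python
def extract_shortlisted(records, shortlisted, label_key="diagnostic"):
    """Keep only the rows whose diagnosis is one of the shortlisted diseases."""
    wanted = set(shortlisted)
    return [row for row in records if row[label_key] in wanted]

# Invented example rows; real rows would be read from the dataset's metadata file.
records = [
    {"smoke": "no", "bleed": "yes", "changed": "yes", "diagnostic": "MEL"},
    {"smoke": "yes", "bleed": "yes", "changed": "no", "diagnostic": "BCC"},
    {"smoke": "no", "bleed": "no", "changed": "no", "diagnostic": "NEV"},
    {"smoke": "yes", "bleed": "no", "changed": "no", "diagnostic": "SEK"},
]

subset = extract_shortlisted(records, ["MEL", "BCC"])
print(len(subset), [row["diagnostic"] for row in subset])
```

Building the tree only on this subset keeps the questions focused on telling the shortlisted diseases apart.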
Performance Measures
• For the performance evaluation, we first denote by TP, FP, TN and FN the true positives (the
number of instances correctly predicted as required), false positives (the number of
instances incorrectly predicted as required), true negatives (the number of instances
correctly predicted as not required) and false negatives (the number of instances incorrectly
predicted as not required), respectively. Four measurements can then be obtained: accuracy,
precision, recall and F1-measure. The F1-measure is the harmonic mean of
precision and recall and represents the overall performance.
• In addition to the aforementioned evaluation criteria, the receiver operating characteristic
(ROC) curve and the area under the curve (AUC) can be used to evaluate the pros and cons of
the classifier. The ROC curve shows the trade-off between the true positive rate (TPR) and
the false positive rate (FPR), where the TPR and FPR are defined as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
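As a worked illustration of these measures, the sketch below computes them from the four counts using their standard definitions. The counts are invented numbers, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion counts."""
    def ratio(numerator, denominator):
        # Guard against empty denominators (e.g. no positive predictions).
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as the TPR
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
    }

# Invented counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```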
• The closer the ROC curve is to the upper left corner of the graph, the better the model. The AUC is the
area under the curve; the closer the area is to 1, the better the model. In medical data, more
attention should be paid to recall than to accuracy: the higher the recall, the lower the
probability that a patient who is at risk of disease is predicted to have no disease risk.
• Balanced accuracy (BACC) is used as the main evaluation measure. BACC is equivalent to the
average sensitivity or recall, which treats all the classes equally, and is expressed as:
$$\mathrm{BACC} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FN_i}$$
• Here TP_i and FN_i denote the true positives and false negatives of class i, and C denotes the number of classes.
The averaged specificity and the average area under the receiver operating characteristic curve
(AUC) are also reported for the evaluation of the results of state-of-the-art algorithms.
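Balanced accuracy, taken as the mean of the per-class recalls, can be sketched as follows. The labels are invented and the function name `balanced_accuracy` is hypothetical; the example is chosen so that plain accuracy and BACC differ on imbalanced data.

```python
def balanced_accuracy(y_true, y_pred):
    """Average the recall TP / (TP + FN) over all classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)

# Invented, imbalanced example: six melanoma images and two nevus images.
y_true = ["mel"] * 6 + ["nev"] * 2
y_pred = ["mel"] * 6 + ["nev", "mel"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, bacc)
```

Plain accuracy is dominated by the larger class, while BACC weights the minority class equally, which is why it suits an imbalanced dataset.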
Test Cases
Test Case 1:
Image 1 is imported from the test folder:
The image is loaded and resized to a height of 180 and a width of 180:
The image is converted to an array and passed to the CNN model to obtain the list of shortlisted
diseases.
The decision tree is generated for the data of the shortlisted diseases:
Test Case 2:
Image 2 is imported from the test folder:
The image is loaded and resized to a height of 180 and a width of 180:
The image is converted to an array and passed to the CNN model to obtain the list of shortlisted
diseases.
The decision tree is generated for the data of the shortlisted diseases:
References
[1] Yao, P., Shen, S., Xu, M., Liu, P., Zhang, F., Xing, J., Shao, P., Kaffenberger, B. and Xu, R.X.
(2022). Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion
Classification. IEEE Transactions on Medical Imaging, 41(5), pp.1242–1254.
doi:10.1109/tmi.2021.3136682.
[2] Chakraborty, S., Paul, H., Ghatak, S., Pandey, S.K., Kumar, A., Singh, K.U. and Shah, M.A.
(2022). An AI-Based Medical Chatbot Model for Infectious Disease Prediction. IEEE Access, 10,
pp.128469–128483. doi:10.1109/access.2022.3227208.
[3] Thurnhofer-Hemsi, K., Lopez-Rubio, E., Dominguez, E. and Elizondo, D.A. (2021). Skin Lesion
Classification by Ensembles of Deep Convolutional Networks and Regularly Spaced Shifting. IEEE
Access, 9, pp.112193–112205. doi:10.1109/access.2021.3103410.
[4] Sanas, S., Pawale, P., Ghadage, G. and Sahani, M. (2021). Skin Disease Prediction. 8(4),
pp.4344–4347.
[5] Yu, H.Q. and Reiff-Marganiec, S. (2021). Targeted Ensemble Machine Classification
Approach for Supporting IoT Enabled Skin Disease Detection. IEEE Access, pp.1–1.
doi:10.1109/access.2021.3069024.
[6] Mtende Mkandawire and Dr. Glorindal Selvam (2022). Prediction of Skin Diseases
using Machine Learning Algorithms. International Journal of Advanced Research in
Science, Communication and Technology, pp.54–61. doi:10.48175/ijarsct-7139.
[7] Safi, Z., Abd-Alrazaq, A., Khalifa, M. and Househ, M. (2020). Technical Aspects of
Developing Chatbots for Medical Applications: Scoping Review. Journal of Medical
Internet Research, 22(12), p.e19127. doi:10.2196/19127.
[8] Gandhi, M., Kumar Singh, V. and Kumar, V. (2019). IntelliDoctor – AI based Medical Assistant.
In: 2019 Fifth International Conference on Science Technology Engineering and Mathematics
(ICONSTEM).
[9] Yao, C., Qu, Y., Jin, B., Guo, L., Li, C., Cui, W. and Feng, L. (2016). A Convolutional
Neural Network Model for Online Medical Guidance. IEEE Access, 4, pp.4094–4103.
doi:10.1109/access.2016.2594839.
[10] Patil, A. (2022). Healthcare Chatbot using Artificial Intelligence. International
Journal for Research in Applied Science and Engineering Technology, 10(8), pp.905–
909. doi:10.22214/ijraset.2022.46299.
Algorithm:
1. Calculate the entropy of the dataset.
2. For each attribute
2.1. Calculate the entropy for each of its categorical values.
2.2. Calculate the information gain for the attribute.
3. Select the attribute with the highest information gain as the node at the current level.
4. Repeat steps 1 to 3 on each branch until leaf nodes are reached and the decision tree is complete.
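Entropy:
i - ranges over the set of classes in the dataset
p_i - is the probability of class i in the dataset
$$E = -\sum_{i} p_i \log_2 p_i$$
Information Gain:
T = Target
A = the variable (column) we are testing
v = each value in A
T_v = the subset of T for which A takes the value v
$$\mathrm{Gain}(T, A) = E(T) - \sum_{v \in A} \frac{|T_v|}{|T|} \, E(T_v)$$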
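The algorithm and the two formulas can be put together in a compact sketch like the one below. It is an illustrative from-scratch version under simplifying assumptions (categorical attributes only, rows stored as dictionaries, majority vote when attributes run out), not the project's own implementation; the toy rows and column names are invented.

```python
import math
from collections import Counter

def entropy(rows, target):
    """E = -sum(p_i * log2(p_i)) over the class labels found in rows."""
    total = len(rows)
    counts = Counter(row[target] for row in rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    """Entropy of rows minus the weighted entropy after splitting on attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_id3(rows, attributes, target):
    """Recursively build a tree as nested dicts; leaves are class labels."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node: stop splitting
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # fall back to majority class
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted(set(row[best] for row in rows)):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_id3(subset, remaining, target)
    return {"attribute": best, "branches": branches}

# Invented toy data: "bleed" separates the classes, "smoke" does not.
rows = [
    {"bleed": "yes", "smoke": "no", "diagnostic": "MEL"},
    {"bleed": "yes", "smoke": "yes", "diagnostic": "MEL"},
    {"bleed": "no", "smoke": "yes", "diagnostic": "NEV"},
    {"bleed": "no", "smoke": "no", "diagnostic": "NEV"},
]
tree = build_id3(rows, ["bleed", "smoke"], "diagnostic")
print(tree)
```

The attribute chosen at the root is the one the chatbot would ask about first, which is how the tree fixes the order of the questions.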