
Conference Paper 28-02

This document discusses the development of an ensemble learning approach to enhance multi-disease detection accuracy, specifically for diabetes, kidney diseases, and heart diseases. The proposed system integrates multiple machine learning models, achieving up to 95% accuracy, significantly outperforming standalone models. The research highlights the potential of ensemble learning to improve diagnostic precision and patient care in healthcare.

Enhancing Multiple Disease Detection Using Ensemble Learning: A Robust Approach for Improved Accuracy

Mahadevuni Shirisha
Sreenidhi Institute of Science and Technology
Hyderabad, India
[email protected]

Devaruppala Kathyayani
Sreenidhi Institute of Science and Technology
Hyderabad, India
[email protected]

M Lakshmi Sravani
Sreenidhi Institute of Science and Technology
Hyderabad, India
[email protected]

Dr M Shailaja
Hyderabad, India

ABSTRACT – While traditional diagnostic methods have been reliable, they often require significant time and manual effort and are prone to human error. As healthcare data becomes more complex, there is a growing need for faster and more accurate disease detection. Machine learning-powered automated diagnostic systems are proving to be a game changer in this space.

Existing systems rely on individual machine learning models like Support Vector Machines (SVM), Decision Trees, Random Forest, K-Nearest Neighbors (KNN), and Logistic Regression. While these models perform well for detecting a single disease, they tend to struggle when diagnosing multiple conditions. Issues such as overfitting, poor generalization, and difficulty handling complex medical data can reduce accuracy and increase misclassification rates. Despite the promise of ensemble learning, a technique that combines multiple models to improve accuracy, it remains underutilized in multi-disease detection.

To tackle this issue, our research introduces an ensemble learning approach that integrates multiple models into a more advanced diagnostic system. By leveraging the strengths of each model, our system improves accuracy, reduces errors, and processes diverse medical datasets efficiently. Our results show that this ensemble model achieves 95% accuracy, outperforming standalone models like Logistic Regression (88%), SVM (89%), Random Forest (91%), and KNN (90%).

These findings highlight how ensemble learning can revolutionize multi-disease detection, paving the way for earlier diagnoses, improved patient care, and more advanced healthcare solutions.

Keywords

Multi-disease detection, ensemble learning, machine learning, SVM, Decision Trees, Random Forest, KNN, Logistic Regression.

Introduction

According to the World Health Organization, as of 2022 around 14% of adults aged 18 and older worldwide are living with diabetes. About 10% of the global population suffers from kidney disease, and 32% of deaths globally are due to cardiovascular diseases. Among heart diseases, coronary heart disease (CHD) is a deadly disease caused by the formation of plaque in the walls of the coronary arteries, i.e., the arteries that carry oxygenated blood to the heart muscle. The plaque restricts the flow of blood to the heart, causing chest pain termed angina. Complete blockage of a coronary artery by plaque leads to a heart attack [1].

Diabetes is a condition that occurs due to an increase in blood glucose levels. The inability of the body to cope with such high levels of glucose may lead to a group of diseases. Kidney diseases occur when the body is unable to meet its eliminatory needs, mostly due to injury or trauma.

In India, 61% of deaths are due to non-communicable diseases such as heart disorders, cancer, kidney diseases and diabetes, which may be caused by environmental conditions and people's living habits [4].

In the medical field, machine learning can be used for the prediction, detection and diagnosis of disease [5].

Ensemble learning is a machine learning technique that achieves better results by combining multiple models. In this study, the models used for human disease prediction are the random forest classifier, gradient boosting, decision tree, XGBoost, logistic regression, KNN and SVM, applied to the prediction of kidney disease, heart disease and diabetes.

Literature Survey

Pratyush Lohumi et al. [1] used ensemble learning classification for medical diagnostics, explaining various machine learning techniques for the prediction of heart disease using logistic regression, support vector machines, decision trees and random forest. Performance was measured by accuracy, sensitivity, the receiver operating characteristic (ROC) curve and AUROC. For decision trees, bagging and boosting techniques were applied, and boosting was reported to be more accurate. The study concludes that the random forest algorithm performed better than all other algorithms.

Yi Yang et al. [2] used multiple graph convolutional networks and random forest for the prediction of disease-related miRNAs, combining a layer-attention GCN with prediction by RF. The materials used were miRNA sequence similarity, miRNA function similarity, disease target similarity and a heterogeneous network. The study concludes that miRNAs have been confirmed to be inextricably linked to the emergence of complex human diseases and that MGCNRF can serve as a scientific tool for predicting potential miRNA-disease associations (MDAs).

Rahma Atallah et al. [3] used a machine learning majority voting ensemble method for the detection of heart disease. The first test was run with the default parameters of the classifier and produced an accuracy of 80%. After a grid search with cross-validation, the optimized parameters were found and the accuracy increased to 88%. The study concludes that the model can be used to assist doctors in analyzing patient cases in order to validate their diagnosis and help decrease human error.

Sayali Ambekar et al. [4] used a convolutional neural network for disease risk prediction, using the Naive Bayes, KNN and CNN-UDRP algorithms. An accuracy of 65% was reported for disease risk prediction with structured data, and 82% accuracy was found using Naive Bayes.

Vijetha Sharma et al. [5] used machine learning techniques for heart disease prediction, namely support vector machine, decision trees, Naive Bayes and random forest classification. In this study, the model developed with SVM gave 98% accuracy, which is 8% greater than decision trees, and the best result, 99% accuracy, was given by random forest.

Methodologies Used

Machine Learning: Machine Learning (ML) is a branch of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to learn from and make predictions or decisions based on data, without being explicitly programmed for each task. In the supervised setting used here, these algorithms are trained using labeled datasets, where the input data (features) are paired with corresponding output labels (target variable), allowing the algorithm to learn the mapping between the input and output.

Support Vector Machine (SVM): SVM is a powerful classification algorithm that helps distinguish different diseases by finding the best possible boundary (hyperplane) between them. It is particularly effective in handling complex, high-dimensional medical data, making it a valuable tool for disease detection. However, SVM can be computationally intensive, especially when dealing with large datasets, which may limit its scalability.

Decision Trees: Decision Trees take a straightforward yet effective approach by making decisions based on feature conditions. For example, a patient's blood pressure or glucose level could lead to different branches in the tree, ultimately predicting a specific disease. While highly interpretable, Decision Trees can sometimes overfit the data, making them less reliable for unseen cases.

Random Forest: To address the overfitting issue of Decision Trees, we incorporated Random Forest, an ensemble learning technique that builds multiple Decision Trees and combines their predictions. By averaging results from various trees, Random Forest improves accuracy and generalization, making it a robust choice for analyzing complex medical data.

K-Nearest Neighbors (KNN): KNN is a simple yet effective method that classifies a patient's condition based on the most similar cases in the dataset.

Logistic Regression: Logistic Regression is a well-established statistical model used for both binary and multi-class disease prediction. It estimates the probability of a disease based on medical features, providing clear and interpretable results.

By combining these models using ensemble learning, we were able to capitalize on their strengths while minimizing their weaknesses. This approach led to a highly accurate multi-disease detection system, paving the way for early diagnosis, improved patient care, and a more reliable future in healthcare.

Block Diagram: [figure]
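
As a minimal sketch of how predictions from several base models can be combined, the example below applies majority voting to three stand-in classifiers. The rules, thresholds and feature names (`glucose`, `bmi`, `age`) are hypothetical placeholders for trained models such as SVM, KNN or Logistic Regression, and the paper does not state which combination rule its system uses.

```python
from collections import Counter

# Hypothetical stand-ins for trained base classifiers: each maps a patient
# record (a dict of features) to a label, 1 = disease, 0 = no disease.
def glucose_rule(patient):
    return 1 if patient["glucose"] >= 126 else 0

def bmi_rule(patient):
    return 1 if patient["bmi"] >= 30 else 0

def age_rule(patient):
    return 1 if patient["age"] >= 50 else 0

def majority_vote(models, patient):
    """Return the label chosen by most base models (ties go to the smaller label)."""
    votes = Counter(model(patient) for model in models)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)

models = [glucose_rule, bmi_rule, age_rule]
patient_a = {"glucose": 140, "bmi": 32.0, "age": 41}  # two of three rules fire
patient_b = {"glucose": 98, "bmi": 31.0, "age": 35}   # one of three rules fires

print(majority_vote(models, patient_a))
print(majority_vote(models, patient_b))
```

With an odd number of base models and two classes, a tie cannot occur; the tie rule only matters when the number of voters is even.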
Random Forest: [figure]

Bagging (Bootstrap Aggregating): Bagging improves model stability by training multiple versions of the same algorithm on slightly different subsets of data. These subsets are created through random sampling with replacement, meaning some data points appear multiple times while others are omitted. Once all models are trained, their predictions are averaged (for regression) or combined using majority voting (for classification). This method helps reduce variance and prevent overfitting, making predictions more stable and reliable. A well-known example is Random Forest, which builds multiple Decision Trees and combines their outputs to achieve higher accuracy.
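
The bagging procedure can be sketched in a few lines of plain Python. This is an illustrative toy, not the implementation used in the study: the base learner is a one-feature threshold rule (a decision stump), and the data values, the seed and the number of stumps are all invented for the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so some repeat and some are left out."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Fit the rule 'predict 1 when value >= threshold' by choosing the
    observed value that makes the fewest errors on the given rows."""
    best_threshold, best_errors = None, None
    for threshold in sorted({value for value, _ in rows}):
        errors = sum((1 if value >= threshold else 0) != label
                     for value, label in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_predict(thresholds, value):
    """Majority vote over all stumps."""
    votes = Counter(1 if value >= t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy (measurement, label) pairs; the values are invented for illustration.
data = [(85, 0), (90, 0), (99, 0), (105, 0), (110, 0), (118, 1),
        (121, 0), (130, 1), (140, 1), (155, 1), (160, 1), (180, 1)]

rng = random.Random(0)  # fixed seed so the run is repeatable
thresholds = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(sorted(set(thresholds)))  # thresholds learned from the bootstrap samples
print(bagged_predict(thresholds, 150), bagged_predict(thresholds, 95))
```

Using an odd number of stumps (25 here) avoids tied votes in a two-class problem.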

KNN: [figure]

Boosting: Unlike bagging, boosting follows a sequential approach, where models are trained one after another. Each new model learns from the mistakes of the previous one, focusing more on misclassified cases to refine predictions. This iterative process significantly boosts accuracy and improves pattern recognition, which is particularly beneficial in medical diagnostics where some diseases may be harder to detect. Popular boosting algorithms, such as AdaBoost, Gradient Boosting, and XGBoost, are known for their ability to handle imbalanced datasets and enhance classification performance.
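
To make the reweighting idea concrete, here is a small from-scratch sketch of AdaBoost with decision stumps on a one-dimensional toy problem. It is an assumption-laden illustration rather than the study's code: the data are invented, and the pattern is chosen so that no single threshold can separate the classes while three boosted stumps can.

```python
import math

def stump_predict(x, threshold, polarity):
    """Predict `polarity` (+1 or -1) when x >= threshold, the opposite otherwise."""
    return polarity if x >= threshold else -polarity

def fit_weighted_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Invented toy data: labels are +1 / -1 and not separable by one threshold.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_stump = adaboost(xs, ys, rounds=1)
three_stumps = adaboost(xs, ys, rounds=3)
print(sum(predict(one_stump, x) == y for x, y in zip(xs, ys)), "of 10 correct")
print(sum(predict(three_stumps, x) == y for x, y in zip(xs, ys)), "of 10 correct")
```

Each stump's vote is scaled by `alpha`, so stumps with lower weighted error have more influence on the final prediction.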

Stacking: Stacking takes ensemble learning to the next level by combining predictions from multiple diverse models using a meta-model. Instead of simply averaging results like bagging, stacking allows the meta-model (often a Logistic Regression or Neural Network) to learn how to best integrate the predictions from base models. This approach is especially valuable when dealing with complex medical datasets, where different models capture different aspects of the data.
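
The sketch below illustrates the meta-model idea with a simplified, holdout-based variant of stacking (sometimes called blending). Everything specific in it is an assumption made for illustration: the two base "models" are fixed threshold rules on hypothetical features, the eight patient records are invented, and the meta-model is a small logistic regression fitted by gradient descent. Full stacking would normally feed the meta-model out-of-fold predictions from trained base models.

```python
import math

# Hypothetical base models: in practice these would be trained classifiers.
def glucose_rule(patient):
    return 1 if patient["glucose"] >= 126 else 0

def bmi_rule(patient):
    return 1 if patient["bmi"] >= 30 else 0

base_models = [glucose_rule, bmi_rule]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_model(meta_rows, labels, lr=0.5, epochs=2000):
    """Logistic regression on base-model outputs, fitted by batch gradient descent."""
    weights, bias = [0.0] * len(meta_rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in zip(meta_rows, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias) - label
            for j, v in enumerate(row):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(labels) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(labels)
    return weights, bias

# Invented holdout records: the glucose rule matches every label here,
# while the BMI rule is right only half of the time.
holdout = [({"glucose": 150, "bmi": 33}, 1), ({"glucose": 142, "bmi": 24}, 1),
           ({"glucose": 131, "bmi": 35}, 1), ({"glucose": 160, "bmi": 22}, 1),
           ({"glucose": 95, "bmi": 31}, 0), ({"glucose": 88, "bmi": 23}, 0),
           ({"glucose": 101, "bmi": 36}, 0), ({"glucose": 110, "bmi": 21}, 0)]

labels = [label for _, label in holdout]
meta_rows = [[model(patient) for model in base_models] for patient, _ in holdout]
weights, bias = fit_meta_model(meta_rows, labels)

def stacked_predict(patient):
    row = [model(patient) for model in base_models]
    score = sum(w * v for w, v in zip(weights, row)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

print(weights, bias)  # the informative base model receives the larger weight
print([stacked_predict(patient) for patient, _ in holdout])
```

Unlike plain voting, the meta-model here learns to rely on the base model whose predictions track the labels and to discount the one that does not.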

By integrating bagging, boosting, and stacking, our system achieves higher accuracy, stronger generalization, and more reliable disease detection. This not only enhances diagnostic precision but also contributes to earlier detection, better patient outcomes, and more efficient healthcare solutions.
Outcome Graphs and Heatmaps:

Heart disease:
The analysis of heart disease prediction shows that ensemble models like Random Forest and Gradient Boosting perform exceptionally well, with high accuracy and reliability. The Decision Tree model leads with the highest AUC of 0.83, while Logistic Regression and XGBoost also demonstrate strong predictive capabilities. K-Nearest Neighbors performs moderately well, whereas Support Vector Machine struggles with an AUC of 0.50, making it less effective. The accuracy comparison further supports these findings, with Random Forest, Gradient Boosting, and XGBoost achieving the best results. On the other hand, SVM shows the lowest accuracy, indicating it may not be suitable for this dataset. Overall, ensemble learning techniques prove to be highly effective for heart disease detection.

Diabetes disease:
The evaluation of diabetes prediction models shows that ensemble techniques like Random Forest, XGBoost, and Gradient Boosting perform the best, achieving high accuracy and strong predictive power. Logistic Regression also delivers reliable results, making it a solid choice. K-Nearest Neighbors and Decision Tree models show moderate performance, while Support Vector Machine struggles with the lowest accuracy and ROC score, making it less effective for this dataset. The similarity between accuracy and ROC values across models suggests consistent classification performance. Overall, the results highlight the strength of ensemble learning in enhancing diabetes prediction accuracy.
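
Accuracy and ROC AUC, the two measures reported in these results, can diverge. The sketch below uses made-up labels and scores to compute both from scratch, and shows that a model assigning every case the same score reaches an AUC of exactly 0.50 while still scoring above 50% accuracy on an imbalanced test set. Whether this explains the SVM result reported here is not stated in the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive case is scored above a random
    negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented test set: 3 positive cases, 5 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

ranked_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]  # mostly well ordered
constant_scores = [0.2] * len(y_true)                      # no ranking ability

for name, scores in [("ranked", ranked_scores), ("constant", constant_scores)]:
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(name, accuracy(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise definition used in `roc_auc` is equivalent to the area under the ROC curve, which is why a model that cannot rank positives above negatives lands at 0.50 regardless of its accuracy.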
Kidney disease:
The evaluation of kidney disease prediction models reveals that ensemble techniques like Random Forest and Gradient Boosting perform the best, achieving the highest accuracy and reliability. The Decision Tree model also shows strong predictive power, making it a valuable choice. Logistic Regression maintains high accuracy, while K-Nearest Neighbors and Support Vector Machine deliver moderate performance. On the other hand, XGBoost struggles with the lowest accuracy and ROC score, making it less effective for this dataset. Overall, these findings emphasize the importance of ensemble learning in improving the accuracy and reliability of kidney disease prediction.

Conclusion

In this experiment, we applied ensemble learning with the random forest classifier, gradient boosting, decision tree, XGBoost, logistic regression, KNN and SVM algorithms to the prediction of kidney disease, heart disease and diabetes. The random forest classifier was found to be more accurate than the other methods, with an accuracy of more than 90%. We conclude that, due to its randomness, the random forest model captured the data well and provides the best fit for the data. Ensemble learning is concluded to be more effective than using a single method would have been.

Further Research

The field of multiple disease detection using ensemble learning has immense potential for further development. Future research can explore more advanced deep learning techniques, such as transformer-based models, to enhance accuracy and efficiency. Expanding datasets with diverse medical records from different demographics will help improve model generalization and reduce biases.

Additionally, real-time implementation of ensemble models in clinical settings could be a game-changer, providing instant diagnostic support to healthcare professionals. Integration with wearable devices and electronic health records (EHRs) can further improve early disease detection and personalized treatment recommendations.

Another promising direction is explainable AI (XAI), which can make these models more transparent and trustworthy for medical practitioners. By focusing on model interpretability, regulatory compliance, and real-world validation, ensemble learning can play a crucial role in shaping the future of AI-driven healthcare.

REFERENCES
[1] P. Lohumi, S. Garg, T. P. Singh and M. Gopal, "Ensemble Learning Classification for Medical Diagnosis," 2020 5th International Conference on Computing, Communication and Security (ICCCS), Patna, India, 2020, pp. 1-5, doi: 10.1109/ICCCS49678.2020.9277277.
[2] Y. Yang, Y. Sun, F. Li, B. Guan, J.-X. Liu and J. Shang, "MGCNRF: Prediction of Disease-Related miRNAs Based on Multiple Graph Convolutional Networks and Random Forest," IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 11, pp. 15701-15709, Nov. 2024, doi: 10.1109/TNNLS.2023.3289182.
[3] R. Atallah and A. Al-Mousa, "Heart Disease Detection Using Machine Learning Majority Voting Ensemble Method," 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS), Amman, Jordan, 2019, pp. 1-6, doi: 10.1109/ICTCS.2019.8923053.
[4] S. Ambekar and R. Phalnikar, "Disease Risk Prediction by Using Convolutional Neural Network," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 2018, pp. 1-5, doi: 10.1109/ICCUBEA.2018.8697423.
[5] D. Shah, S. Patel and S. K. Bharti, "Heart disease prediction using machine learning techniques," SN Computer Science, vol. 1, no. 6, p. 345, 2020.
