A Study on Predictive Algorithms in Health Care System Analysis
Volume 11, Issue 10, October 2020, pp. 1239-1245, Article ID: IJARET_11_10_119
Available online at http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=10
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: 10.34218/IJARET.11.10.2020.119
ABSTRACT
In this era of data science, a huge amount of data is being generated. The health care
system already uses predictive algorithms and machine learning techniques, but they are
still not sufficiently effective. Algorithms used in health care can be made more effective
and can be used for predictions. The COVID-19 period has made the need for good health care
facilities a reason for countries to revamp their health care strategies. Giving proper
predictions, routing patients to the appropriate treatment, providing good health care
facilities, and making correct diagnoses can all be supported by machine learning algorithms.
These algorithms are widely used in various areas of the medical field. This paper suggests
some predictive algorithms and uses of visualization that can be applied in the field of
health care.
Key words: Visualization, predictive algorithms, Health care.
Cite this Article: K. Tamilselvi and Dr. K. Ramesh Kumar, A Study on Predictive
Algorithms in Health Care System Analysis, International Journal of Advanced
Research in Engineering and Technology, 11(10), 2020, pp. 1239-1245.
http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=10
1. INTRODUCTION
1.1. Predictive Analytics
In today’s world, data generation and data analysis are a boon, but getting the correct data
as proper input is the biggest challenge. Health science provides huge data sets. More
predictive models and data visualization tools become available day by day. In some cases
these models give an accurate result; in other cases the model does not match the data.
Taking the proper input, processing it, and producing the correct output
are the key factors. Countries that boasted of strong health care facilities
declined so drastically that they were not able to cope with the current situation.
With the help of data science, a hospital can predict deterioration in a patient’s health,
provide preventive measures, and start early treatment, which helps reduce the
risk of further aggravation of the patient’s condition. Prediction can also play a role in
monitoring the logistics and supplies of hospitals.
Monitoring patients’ health, tracking and preventing diseases, and providing virtual
assistance can all be done with the help of predictive modeling.
In this pandemic period, many people find it difficult to visit hospitals. Since
most people have information about their vitals (blood pressure, blood sugar, temperature,
oxygen level, pulse, etc.), can this information be used to predict their health condition
without help from a hospital? Not all cases can be handled this way, but predictions can help.
The various fields in which predictive analytics is used are Medicine and Healthcare,
Traffic congestion, Business intelligence, Education, Finance, HR, Entertainment, etc. Almost
37% of the analytics are carried out in the field of medicine and healthcare.[1]
2. METHODOLOGY
Since medical datasets are generally available, the steps to be taken in
prediction modeling need to be discussed.
STEP 1:
Identifying the data. An environment has to be set up that enables us to automate, extract,
aggregate, and integrate relevant information (patient details or any other relevant data)
and to apply good analytics to measure and organize processes and outcomes.
STEP 2:
Prediction Modeling
To gather the initial data and evaluate the algorithm models C4.5, KNN, and Random Forest
To select the best-performing algorithm
To run it in a real-world environment
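Step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: two simple stand-in classifiers (a majority-class baseline and a one-feature decision stump) take the place of C4.5, KNN, and Random Forest, and the function names and the toy vital-sign values are hypothetical.

```python
from collections import Counter

def fit_majority(rows, labels):
    """Baseline: always predict the most common training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda row: most_common

def fit_stump(rows, labels):
    """One-level decision tree: the single feature/threshold split with fewest training errors."""
    best = None  # (errors, feature index, threshold, left label, right label)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: fall back to the baseline
        return fit_majority(rows, labels)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def select_best(candidates, train, test):
    """Fit every candidate on the training data and rank by held-out accuracy."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    scores = {name: accuracy(fit(x_tr, y_tr), x_te, y_te)
              for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

# Made-up toy data: [systolic BP, fasting glucose] -> 1 = at risk, 0 = not at risk
train = ([[118, 85], [122, 90], [115, 88], [150, 160],
          [145, 155], [160, 170], [119, 92]], [0, 0, 0, 1, 1, 1, 0])
test = ([[120, 87], [155, 165], [117, 91], [148, 150]], [0, 1, 0, 1])

candidates = {"majority": fit_majority, "decision_stump": fit_stump}
best_name, scores = select_best(candidates, train, test)
print(best_name, scores)
```

Only the selected model would then move on to the real-world run; the held-out test set here plays the role of the evaluation stage.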
3. ARCHITECTURE
Figure 1: Evaluating prediction models (C4.5, KNN model)
4. ALGORITHMS
In a country like India, the volume of data to be collected is huge. Some authorities
officially test particular areas for the coronavirus. Some records are kept manually, and
some data are digitized. Will the data remain with the Government for future use? A
sophisticated tool is needed to collect all this data. The data can come from a record book,
a mobile phone, a WhatsApp message, or a hospital. So the data might not arrive in
.csv form and has to be collected from various data sources. An architecture for doing so
has to be designed. Various applications can then be used for visualization. Visualization
and models can play a big role in data analytics.
Machine learning algorithms have received a huge amount of interest from researchers in
the healthcare industry over the past few years for predictive applications on health data.
Experimental work has been carried out using various classification algorithms, such as
Decision Tree (DT), Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR),
and Adaptive Boosting (AdaBoost). Opt-KNN is reported to give better prediction accuracy
than traditional KNN through its choice of the K value.[2]
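The role of the K value can be illustrated with a small sketch. This is not the Opt-KNN method of [2]; it is a plain KNN classifier whose K is picked by accuracy on a validation set, which is one common way to choose K. The function names and the toy points (including one deliberately mislabeled training point) are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def choose_k(train_x, train_y, val_x, val_y, candidates=(1, 3, 5)):
    """Return the candidate k with the best validation accuracy (smallest k on ties)."""
    def acc(k):
        hits = sum(knn_predict(train_x, train_y, x, k) == y
                   for x, y in zip(val_x, val_y))
        return hits / len(val_y)
    scores = {k: acc(k) for k in candidates}
    return max(sorted(scores), key=scores.get), scores

# Made-up 2-D points; the last training point is a mislabeled outlier
train_x = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
           [3.0, 3.0], [3.2, 2.9], [2.8, 3.1], [2.9, 2.9]]
train_y = [0, 0, 0, 1, 1, 1, 0]
val_x = [[1.1, 1.0], [2.9, 2.95], [3.1, 3.0], [0.9, 1.0]]
val_y = [0, 1, 1, 0]

best_k, scores = choose_k(train_x, train_y, val_x, val_y)
print(best_k, scores)
```

With K = 1 the mislabeled neighbor decides one validation case on its own; with a larger K it is outvoted, which is why the choice of K affects accuracy.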
The study in [3] suggests that predictive algorithms can be useful in health data
science. There, a C4.5 decision tree model was built from input variables and
a target variable, and sensitivity, specificity, accuracy, and positive
and negative predictive values were used to evaluate the model. The experiment gave
good results according to that paper.[3]
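The five evaluation measures named above all follow from the confusion matrix of a binary classifier. The sketch below uses their standard definitions; the function name and the example labels are made up for illustration and are not data from [3].

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Compute standard diagnostic measures from true and predicted binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # None marks a measure that is undefined for this data
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }

# Made-up example: 4 diseased and 6 healthy patients
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```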
In [4], an improved KNN algorithm is proposed in place of the standard KNN algorithm; it
achieves its expected purpose of reducing the running time of KNN.
The KNN algorithm also proves to be good at analyzing health data.[4]
In [7], eight disease categories were predicted. Overall, the Random Forest (RF) ensemble
learning method outperformed SVM, bagging, and boosting in terms of the area under the
receiver operating characteristic (ROC) curve (AUC). RF also has the advantage of
computing the importance of each variable in the classification process.
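The AUC used for this comparison can be computed directly from predicted scores: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below uses that pairwise definition; the function name and the four example scores are illustrative assumptions, not results from [7].

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Each pair scores 1 if the positive is ranked higher, 0.5 on a tie, else 0
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores from a hypothetical classifier
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes it usable on imbalanced data where raw accuracy can mislead.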
In [8], the proposed feature-ranking-based models achieved better training
(cross-validation) accuracy as well as better independent testing accuracy for medical data
classification than the baseline (i.e., without feature ranking). Their performance
is also quite promising in terms of the testing accuracy and F-score of the best models
reported in that paper.
Four algorithms, SVM, NB, k-NN, and C4.5, were compared on the Wisconsin Breast Cancer
datasets for efficiency and effectiveness in terms of accuracy, precision, sensitivity,
and specificity; SVM reached an accuracy of 97.13% and outperformed the others.[9]
In [10], the authors discuss two challenges: developing (1) algorithms and data structures
for evaluating knowledge integrity in the data set and (2) algorithms that measure the
influence that modifications of data values have on the discovered statistical importance of
patterns. They also stress the role of visualization techniques, since a
picture is the easiest form for people to understand and can provide plenty of information
in a snapshot of the results.
The authors of [11] showed that a machine learning classification model, the random forest
model, accurately predicted fatty liver disease in patients using a minimum of clinical variables.
In [12], the authors present a model that implements the Random Forest algorithm
boosted by the AdaBoost algorithm, with an F1 score of 0.86 on the COVID-19 patient
dataset. The model uses the patient's geographical, travel, health, and demographic data to
predict the severity of the case and the possible outcome, recovery or death. It is reported
to provide accurate predictions even on imbalanced datasets. The data analyzed revealed that
death rates were higher among Wuhan natives than among non-natives. Male patients also
had a higher death rate than female patients. The majority of affected patients were
aged between 20 and 70 years.
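The reason an F1 score is reported for imbalanced data, rather than accuracy alone, can be shown with a short sketch. The numbers below are invented for illustration and are not taken from [12]: with 5 deaths among 100 patients, a model that never predicts death is 95% accurate yet has an F1 score of zero.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class (0.0 when undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up imbalanced outcome data: 1 = death, 0 = recovery
y_true = [1] * 5 + [0] * 95
always_recovery = [0] * 100                     # ignores the minority class
useful_model = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93  # finds 4 of 5 deaths, 2 false alarms

acc_a, f1_a = accuracy(y_true, always_recovery), precision_recall_f1(y_true, always_recovery)[2]
acc_b, f1_b = accuracy(y_true, useful_model), precision_recall_f1(y_true, useful_model)[2]
print(acc_a, f1_a)
print(acc_b, f1_b)
```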
The paper [13] used six popular machine learning algorithms for predictive analytics:
SVM, KNN, LR, DT, RF, and NB. Predictions about diabetes were made on
the PIMA Indian dataset, which consists of 768 records; 8 attributes were selected for the
model. The experimental results showed that SVM and KNN gave the highest accuracy for
predicting diabetes. Limitations include the size of the dataset and missing attribute values.
To build a prediction model for diabetes with 99.99% accuracy, the paper suggests having
thousands of records with zero missing values, which would reveal more insights and give
better prediction accuracy.
A comprehensive study on a diabetes dataset in [14] with Random Forest (RF), SVM, k-NN,
CART, and LDA algorithms showed that RF gives more accurate predictions
than the other algorithms when predicting gestational, Type 1, or Type 2 diabetes.
The study in [15], which built a model and evaluated the AdaBoost and Gradient Boosting
ensemble machine learning algorithms for predicting diabetes on the Pima Indians Diabetes
Dataset, suggests that Gradient Boosting achieves better prediction accuracy than AdaBoost.
Steps to be taken while using predictive modeling:
In [16], rules for planning a predictive model are discussed.
These rules can be taken into account when a prediction model is designed:
1. Use out-of-sample prediction to generate more accurate and generalizable models
2. Keep training and testing data independent
3. Use internal validation (i.e., cross-validation) as a practical solution for validating
predictive models
4. Share data, code, and models to facilitate external validation and
open science
5. Predict the actual quantity that needs to be predicted
6. One model will not fit all problems
7. Interpretation plays a key role
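Rules 1 to 3 can be sketched with k-fold cross-validation, in which every record is used for testing exactly once and never appears in the training set of its own fold. This is a minimal sketch assuming a shuffled, non-stratified split; the function names and the toy data are hypothetical, and the majority-class model stands in for any real classifier.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, test_idx

def cross_validate(fit, rows, labels, k=5, seed=0):
    """Mean out-of-sample accuracy of `fit` over k folds."""
    fold_accuracies = []
    for train_idx, test_idx in k_fold_indices(len(rows), k, seed):
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(model(rows[i]) == labels[i] for i in test_idx)
        fold_accuracies.append(hits / len(test_idx))
    return sum(fold_accuracies) / len(fold_accuracies)

def fit_majority(rows, labels):
    """Stand-in model: always predict the most common training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda row: most_common

# Made-up data: 10 records, 8 negative and 2 positive
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
splits = list(k_fold_indices(len(rows), k=5))
mean_acc = cross_validate(fit_majority, rows, labels, k=5)
print(mean_acc)
```

Because the model is refit on each training subset, the reported accuracy is always measured on data the model has not seen, which is the point of out-of-sample prediction.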
Several models work on health data, but predicting correctly is the key.
For example, the testing methods used to predict COVID-19 in various countries are not
foolproof.
The present scenario teaches us that not all models work every
time. Finding a predictive model alone is not enough; implementation also
involves capital and an investment of time. Sometimes the data might provide fewer insights.
So a foolproof model has to be designed.
5. VISUALIZATION
Visualizing data improves the way we understand it. Visualization has many advantages.
When predicting disease, it is useful to present results in a good visual format
that can be understood by everyone.
For example:
Figure 2
Figure 2 visualizes the Cleveland Clinic Heart Disease dataset, which contains 13
variables related to patient diagnostics and one outcome variable indicating the presence or
absence of heart disease.[17] Predicting and visualizing in this way can be helpful to the
wider community.
R packages are flexible and very powerful for data visualization. The predicted data can
be visualized using any graph suited to the application. In [18], the occurrence of COVID-19,
positive or negative, in Louisiana and Massachusetts is pictured clearly using ggplot.
The first diagram shows the positive cases and the second shows both
cases for comparison.
Figure 3
Figure 4
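The data preparation behind such a chart can be sketched as follows. This is not the ggplot code of [18]: it is a dependency-free stand-in that counts records per state and test result and renders them as text bars. The function names and the record counts are invented for illustration and are not real case data.

```python
from collections import Counter

def count_by(records, *keys):
    """Count records per combination of the given keys, e.g. (state, result)."""
    return Counter(tuple(r[k] for k in keys) for r in records)

def text_bar_chart(counts, width=30):
    """Render counts as horizontal text bars scaled to the largest count."""
    if not counts:
        return ""
    top = max(counts.values())
    lines = []
    for key in sorted(counts):
        bar = "#" * round(width * counts[key] / top)
        lines.append(f"{' / '.join(key):<26} {bar} {counts[key]}")
    return "\n".join(lines)

# Made-up test records (not real case counts)
records = (
    [{"state": "Louisiana", "result": "positive"}] * 3
    + [{"state": "Louisiana", "result": "negative"}] * 9
    + [{"state": "Massachusetts", "result": "positive"}] * 4
    + [{"state": "Massachusetts", "result": "negative"}] * 12
)
counts = count_by(records, "state", "result")
chart = text_bar_chart(counts)
print(chart)
```

The same grouped counts could be passed to any plotting library; grouping by state and result is what allows the positive cases to be shown alone or side by side with the negative ones.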
6. CONCLUSION
Major challenges are always present in the medical field. Machine learning methods,
visualization tools, deep learning, and ANNs have proved to be a great way of understanding
and exploiting information in the medical field. In an age of mobile apps and widespread
misleading information, a proper prediction can be vital. In this paper, various
prediction models are reviewed and visualization tools are discussed. The combination of the
two can be even more valuable in the prediction of diseases. Prediction and detection are
going to play a key role in the medical field, as they may reduce both the complexity of
care and the mortality rate.
REFERENCES
[1] Močarníková, Katarína, and Michal Greguš. "Conceptualization of Predictive Analytics by
Literature Review." Data-Centric Business and Applications. Springer, Cham, 2020. 205-234.
[2] Sarker, Iqbal H., et al. "K-Nearest Neighbor Learning based Diabetes Mellitus Prediction and
Analysis for eHealth Services." EAI Endorsed Transactions on Scalable Information
Systems 7.26 (2018). doi:10.4108/eai.13-7-2018.162737
[3] Hamed Sabbagh Gol, Hamed Sabbagh Gol, A detection of type2 diabetes using C4.5 decision
tree, J. Health biomed. info. 2018; 5 (2): 293-303
[4] Xing, Wenchao, and Yilin Bei. "Medical Health Big Data Classification Based on KNN
Classification Algorithm." IEEE Access 8 (2019): 28808-28819.
[5] Kaur, Harleen, and Vinita Kumari. "Predictive modelling and analytics for diabetes using a
machine learning approach." Applied computing and informatics (2020).
[6] Boukenze, Basma, Hajar Mousannif, and Abdelkrim Haqiq. "Predictive analytics in healthcare
system using data mining techniques." Comput Sci Inf Technol 1 (2016): 1-9.
[7] Khalilia, Mohammed, Sounak Chakraborty, and Mihail Popescu. "Predicting disease risks
from highly imbalanced data using random forest." BMC medical informatics and decision
making 11.1 (2011): 51.
[8] Alam, Md Zahangir, M. Saifur Rahman, and M. Sohel Rahman. "A Random Forest based
predictor for medical data classification using feature ranking." Informatics in Medicine
Unlocked 15 (2019): 100180.
[9] Asri, Hiba, et al. "Using machine learning algorithms for breast cancer risk prediction and
diagnosis." Procedia Computer Science 83 (2016): 1064-1069.
[10] Milovic, Boris, and Milan Milovic. "Prediction and decision making in health care using data
mining." Kuwait Chapter of the Arabian Journal of Business and Management Review 1.12
(2012): 126.
[11] Wu, Chieh-Chen, et al. "Prediction of fatty liver disease using machine learning
algorithms." Computer methods and programs in biomedicine 170 (2019): 23-29.
[12] Iwendi, Celestine, et al. "Covid-19 patient health prediction using boosted random forest
algorithm." Frontiers in public health 8 (2020): 357.
[13] Sarwar, Muhammad Azeem, et al. "Prediction of diabetes using machine learning algorithms
in healthcare." 2018 24th International Conference on Automation and Computing (ICAC).
IEEE, 2018.
[14] Kumar, P. Suresh, and S. Pranavi. "Performance analysis of machine learning algorithms on
diabetes dataset using big data analytics." 2017 International Conference on Infocom
Technologies and Unmanned Systems (Trends and Future Directions)(ICTUS). IEEE, 2017.
[15] Bahad, Pritika, and Preeti Saxena. "Study of adaboost and gradient boosting algorithms for
predictive analytics." International Conference on Intelligent Computing and Smart
Communication 2019. Springer, Singapore, 2020.
[16] Scheinost, Dustin, et al. "Ten simple rules for predictive modeling of individual differences in
neuroimaging." NeuroImage 193 (2019): 35-45.
[17] https://www.r-bloggers.com/2019/09/heart-disease-prediction-from-patient-data-in-r/
[18] https://www.infoworld.com/article/3533453/easier-ggplot-with-the-ggeasy-r-package.html