Improving Heart Disease Prediction Accuracy Using A Hybrid Machine Learning Approach: A Comparative Study of SVM and KNN Algorithms

The article presents a study on improving heart disease prediction accuracy using a hybrid machine learning approach that combines KNN and SVM algorithms. The proposed hybrid model outperformed individual algorithms, achieving an accuracy of 81% compared to 76% for SVM and 75% for KNN. The research utilized a dataset from the UCI machine learning repository and implemented the model using Python and Jupyter Notebook.


International Journal of Computations, Information and Manufacturing (IJCIM) 3(1) -2023

Contents available at the publisher website: GAFTIM.COM

Journal homepage: https://journals.gaftim.com/index.php/ijcim/index

Improving Heart Disease Prediction Accuracy Using a Hybrid Machine Learning Approach: A Comparative Study of SVM and KNN Algorithms
Rehan Ahmed ¹, Maria Bibi ², Sibtain Syed³
¹˒³ Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Pakistan
² University of Engineering and Technology Taxila, Pakistan

ARTICLE INFO

Keywords: Heart Disease Prediction, KNN, Machine Learning, Hybrid Model, SVM.

Received: May 11, 2023
Accepted: June 22, 2023
Published: June 23, 2023

ABSTRACT

Heart disease is the leading cause of mortality worldwide, and early identification is critical in limiting disease development. Early approaches for detecting cardiovascular illnesses helped determine the changes that high-risk persons should make, reducing their risks. The major goal is to save lives by recognising anomalies in cardiac conditions, which is done by identifying and analysing raw data produced from cardiac information. Machine learning can provide an efficient method for making decisions and creating accurate forecasts, and machine learning techniques are used extensively in the medical field. This study proposes a machine learning technique to predict cardiac disease. The study made use of an open source heart disease dataset from Kaggle. Hybrid algorithms for machine learning prediction are a logical combination of several earlier methodologies, designed to improve efficiency and produce better outcomes. The presented work introduces a hybrid method that uses classification for prediction analysis. We used real patient data to build a hybrid technique for predicting cardiac disease. KNN and SVM classification techniques were utilized in this paper. Jupyter Notebook is used to implement this hybrid method. The hybrid technique outperforms the other algorithms in the prediction analysis of heart disease.

1. INTRODUCTION
The practice of extracting usable information and patterns from a range of raw datasets is usually referred to as data mining. It comprises analyzing massive amounts of data and discovering trends or patterns using one or more techniques. It is useful in a variety of contexts, including analysis, research, and healthcare. Because data mining is a method of investigation that can detect trends in large volumes of data, numerous excellent early prediction systems for healthcare have evolved from medical datasets (J. H. Joloudari, et al., 2019). Improving the level of healthcare in the medical sector is best described by the guiding variables that influence it, among which is healthcare data, which may be viewed as the system's foundation for improvement for each patient. The application of data mining techniques to extract knowledge from medical records or datasets will help in discovering the occurrence, evolution, and recognition of illness, along with significant facts to establish the sources of diagnostic procedures based on the components existing inside healthcare. An investigation of the information cycle for the classification of illnesses might potentially involve data mining techniques. It would reveal hidden relationships and detect patterns in the data, leading to better diagnostic recognition.

This paper gives insight into the subject of heart disease prognosis. The heart, which weighs roughly 250 to 350 grams in an adult, is a vital body organ. It sits slightly left of the center of the chest, shielded by the ribcage. The heart supplies all the body's organs with blood through a system of blood vessels. The blood helps the body stay healthy by supplying it with the nutrients and oxygen it needs. Heart illness or heart dysfunction can cause serious health issues like heart attacks, strokes, and even death. To guarantee appropriate treatment and care, it is crucial to identify any heart disease symptoms at an early stage.

The aim of this study is to design a system that aids in determining whether a patient has cardiac disease by suggesting a hybrid approach utilizing data mining techniques. For this, predictions are made using a predictive analysis model and a variety of algorithms. The model's procedure is divided into four steps. The first step pre-processes the raw data. The second transforms the processed data into a form the model can use. The third trains the model. The fourth uses the trained model to create predictions and then reviews them as necessary.

2. LITERATURE REVIEW
(Jabbar, et al., 2016) conducted research on heart disease using the random forest method and the Cleveland dataset. To carry out the investigation, the author uses Chi Square feature selection as well as a GA-based feature selection model. Although the evaluation was confined to existing machine learning models, the experimental findings reveal that the suggested model with GA feature selection beat the existing models.

(Al-Milli, Nabeel, 2013) explores the use of a back propagation neural network in predicting cardiac disease. The author implemented a deep learning model known for its accuracy in disease prediction. The Cleveland dataset was used in the study, and a simulation was done in Matlab. While the research has yielded promising results, there is room for improvement by employing deep learning models and applying the findings to real-world applications.

(Hashi, E.K. and Zaman, M.S.U, 2020) use a cognitive method to predict heart disease. The study assesses five machine learning techniques for prediction based on their accuracy. The logistic model tree is used to increase prediction accuracy by utilising an AdaBoost and bagging model. The experimental results show that the random forest model predicted cardiac disease with great accuracy.

(Soni, Jyoti, et al., 2011) investigate the prediction of cardiac disease using methods based on data mining. The decision tree algorithm, the KNN method, Bayesian classification, and neural network classification techniques are all evaluated in the study. Furthermore, the author looks into the use of genetic algorithms for feature selection in identifying critical traits for heart disease. The decision tree model achieves good accuracy in the experiments.

(Alkeshuosh, Azhar Hussein, et al., 2017) describe a method for detecting cardiac disease that employs the Particle Swarm Optimization method. The author created explicit rules based on the Particle Swarm Optimization algorithm and tested them to find a more precise rule for detecting heart disease. Following the evaluation of the rules, the author employed the C5.0 algorithm for binary disease classification. The author reported that great accuracy was achieved using Particle Swarm Optimization and a decision tree algorithm.

(UCI Official site) offers a study on the prediction of heart disease using data mining techniques. The research looks at techniques like the KNN algorithm, decision tree algorithms, neural network classifications, and Bayesian classification techniques. Furthermore, the author investigates the use of genetic algorithms for selecting critical heart disease features. The study tests different strategies and assesses their performance, concluding that the decision tree model achieves excellent accuracy.

(P. M. Barnaghi, et al., 2012) first use random forest and decision tree separately for prediction and measure their accuracy. The author then uses a hybrid approach consisting of decision tree and random forest for the prediction of heart disease, measures its accuracy, and compares it with the earlier results. The hybrid gives excellent performance compared to the other models.

https://doi.org/10.54489/ijcim.v3i1.223 Published by: GAFTIM, https://gaftim.com



3. ALGORITHMS USED

3.1. KNN
The K-Nearest Neighbor (KNN) classifier is a supervised learning algorithm that assigns each point in a dataset to a class based on the classes of its nearest neighbours, with the number of neighbours k chosen by the user (P. M. Barnaghi, et al., 2012). The algorithm is flexible and can be applied to both classification and regression problems. KNN's fundamental premise is that similar items tend to lie close together, and the algorithm measures closeness as the distance between data points. Since KNN stores the data and performs operations on it during the classification phase rather than learning directly from the training dataset, it is sometimes referred to as a lazy learner algorithm. A new data point's classification is determined by the majority vote of its closest neighbours. Modifying the value of k can affect the accuracy of the algorithm.

3.2. Support Vector Machine
SVM is a specialised supervised machine learning classifier that may be utilised in statistical learning (Alkeshuosh, Azhar Hussein, et al., 2017) for classifying linear and non-linear datasets. It works by using a non-linear mapping function to transform the original dataset into a higher-dimensional representation in which the classes are easier to separate. SVM then searches this transformed space for a linear hyperplane that can partition the data points into different classes. The optimal hyperplane serves as the decision boundary, and SVM defines it using support vectors. With an appropriate kernel function, the hyperplane can also separate classes that are not linearly separable. Despite being a precise classification approach, SVM is computationally expensive since it involves solving quadratic programming problems whose calculations can take time (Soni, Jyoti, et al., 2011).

3.3. Hybrid Approach
A hybrid is typically defined as a combination of two or more elements with traits that are either similar or dissimilar. Different elements have different characteristics, but once they are joined, the final element may have the characteristics of both. A hybrid approach combines two or more algorithms, each with its own set of benefits. Integrating algorithms produces new results that may be more accurate and precise than using the techniques alone. The hybrid model is created by merging the KNN and SVM methods. SVM probabilities are used in the hybrid model. The KNN probabilities are combined with the training data and passed to the SVM algorithm. Likewise, the SVM probabilities are computed and applied to the test data. Finally, the values are predicted. The model is applied to a preprocessed dataset and predicts cardiovascular disease for the provided test dataset.

5. METHODOLOGY
The suggested work is written in Python and implemented in Jupyter Notebook. All implementation phases of the methodology are carried out there. The dataset used to train the system is obtained from the UCI machine learning repository (UCI Official site).

5.1. Data Description
This study utilized a heart disease dataset obtained from the UCI machine learning repository to construct a model. The dataset included various attributes such as sex, age, resting blood pressure, chest pain, fasting blood sugar, cholesterol, maximum heart rate achieved, resting electrocardiogram, ST depression induced by exercise, exercise-induced angina, number of vessels, slope of peak exercise ST, the predicted attribute, and thalassemia.

Figure 1

5.2. Working
The proposed workflow uses two machine learning algorithms and a hybrid model to predict heart disease accurately. The advantage of this approach is that the hybrid model yields an optimized model. The methodology involved collecting the dataset from uci.edu, performing data visualization, and splitting the dataset into training and test data. The KNN and SVM models are then applied for training and analysis. The model is trained using 70% of the dataset as training input, and the remaining 30% is used as testing data for heart disease prediction. KNN, SVM, and the hybrid of both are used to predict heart disease. The predicted values are then plotted and compared for accuracy.

5.3. Comparative analysis
In this work, the proposed strategy was compared against current approaches to establish its usefulness. The findings demonstrated that the suggested method is more precise, efficient, and appropriate for predictive analysis. The assessment is carried out using confusion matrix parameters, which are typically used to evaluate a model's performance on a dataset containing known actual values. A confusion matrix is a table that summarises the outcomes of a classification problem, offering four values: True Positive (TP), False Negative (FN), True Negative (TN), and False Positive (FP). Several measures are derived from the confusion matrix to compare the strategies. The study computed three measures for each technique: accuracy, precision, and recall.

Accuracy: the proportion of predicted values that agree with the true values.
Accuracy = (TP + TN) / (TP + FP + TN + FN)

Precision: the proportion of positive predictions that are correct.
Precision = TP / (TP + FP)

Recall: the proportion of actual positive cases that the algorithm classifies correctly.
Recall = TP / (TP + FN)

6. DISCUSSION ON THE RESULTS
Python was used to implement the proposed study, together with relevant libraries such as sklearn, pandas, and matplotlib. The study's dataset was taken from uci.edu and consisted of heart disease cases. To predict cardiac disease, machine learning algorithms such as KNN and SVM were used. A hybrid model integrating KNN and SVM was also created to add novelty to this study. According to the findings shown in Table 1, the hybrid model was successful in diagnosing cardiac disease. KNN had an accuracy of roughly 75%, SVM had an accuracy of 76%, and the hybrid model had an accuracy of 81%.

Table 1. Experimental results

Algorithm   Accuracy (%)   Precision (%)   Recall (%)   F1 (%)
SVM         76             80              80           80
KNN         75             80              78           79
Hybrid      81             80              89           84

Table 1 shows the performance metrics achieved by the proposed hybrid technique and the base algorithms after implementation, including recall, precision, accuracy, and F1-score.
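The confusion-matrix measures from Section 5.3 can be written out directly, as in the short sketch below. The counts `tp`, `fp`, `tn`, `fn` are hypothetical values chosen only for illustration and are not taken from the study. The F1-score is not defined in the text, so the sketch assumes the standard definition as the harmonic mean of precision and recall, and uses it to cross-check the F1 column of Table 1 against the table's own precision and recall columns.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that match the true labels."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of positive predictions that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)


def f1(p, r):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * p * r / (p + r)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, tn, fn = 40, 10, 35, 5
print("accuracy ", accuracy(tp, fp, tn, fn))
print("precision", precision(tp, fp))
print("recall   ", recall(tp, fn))

# Precision, recall, and reported F1 (all in %) as listed in Table 1.
table1 = {
    "SVM": (80, 80, 80),
    "KNN": (80, 78, 79),
    "Hybrid": (80, 89, 84),
}
for name, (p, r, reported_f1) in table1.items():
    print(name, "computed F1:", round(f1(p, r), 2), "reported:", reported_f1)
```

Rounded to whole percentages, the computed F1 values agree with the F1 column reported in Table 1.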


Table 2. Accuracy results

Algorithm   Accuracy (%)
SVM         76
KNN         75
Hybrid      81

Table 2 compares the proposed hybrid model to the individual algorithms on the basis of accuracy. The proposed hybrid technique outperforms both baseline algorithms, SVM and KNN, in terms of accuracy. The hybrid approach's values exceed the results of the other algorithms, as indicated by the graph shown in Figure 2. This performance analysis reveals that the suggested approach outperforms the individual algorithms in predicting heart disease.

Figure 2

7. CONCLUSION
Data mining involves analyzing raw data to reveal significant patterns that can inform future applications. This study applies several classification techniques to predict heart disease. Our approach uses KNN to extract features from a large dataset and applies SVM classification to create a model for predictive analysis. Compared to established algorithms such as KNN and SVM, the proposed approach yields superior results. Our investigation shows that the hybrid approach achieves an accuracy of 81% in predicting heart disease, outperforming the other algorithms on the same dataset.

REFERENCES
J. H. Joloudari, H. Saadatfar, A. Dehzangi, and S. Shamshirband, “Computer-aided decision-making for predicting liver disease using PSO-based optimized SVM with feature selection,” Informatics Med. Unlocked, vol. 17, no. October, p. 100255, 2019, doi: 10.1016/j.imu.2019.100255.
Jabbar, M. A., B. L. Deekshatulu, and Priti Chandra. “Intelligent heart disease prediction system using random forest and evolutionary approach.” Journal of Network and Innovative Computing 4.2016 (2016): 175-184.
Al-Milli, Nabeel. “Backpropagation neural network for prediction of heart disease.” Journal of Theoretical and Applied Information Technology 56.1 (2013): 131-135.
Hashi, E.K. and Zaman, M.S.U., 2020. “Developing a Hyperparameter Tuning Based Machine Learning Approach of Heart Disease Prediction.” Journal of Applied Science Process Engineering, 7(2), pp. 631-647.
Soni, Jyoti, et al. “Predictive data mining for medical diagnosis: An overview of heart disease prediction.” International Journal of Computer Applications 17.8 (2011): 43-48.
Alkeshuosh, Azhar Hussein, et al. “Using PSO algorithm for producing best rules in diagnosis of heart disease.” 2017 International Conference on Computer and Applications (ICCA). IEEE, 2017.
“UCI Official site.” [Online]. Available: https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset).
P. M. Barnaghi, V. A. Sahzabi, and A. A. Bakar, “A Comparative Study for Various Methods of Classification,” Int. Conf. Inf. Comput. Networks, vol. 27, no. Icicn, pp. 62–66, 2012.
Dr. M. Kavitha, G. Gnaneswar, R. Dinesh, Y. Rohith Sai, R. Sai Sura et al. “Heart Disease Prediction using Hybrid machine Learning Model.” Sixth International Conference on Inventive Computation Technologies [ICICT 2021]. IEEE.
