Predicting Heart Disease Through Machine Learning Methods
Abstract:- Heart diseases, including heart attacks, cause about 31% of global deaths, remaining a significant health threat despite preventability. Limited tech advancements and awareness, especially in developing nations, amplify this challenge. Machine learning offers promise in tackling this issue, with studies advocating ensemble methods for accurate predictive models. These models analyze extensive medical data to efficiently predict heart diseases, undergoing stages like data exploration, feature selection, model implementation, and comparative analysis. A model using Logistic Regression, Naive Bayes, and Random Forest initially identified top-performing models, later refined to CatBoost, RandomForest, and XGBoost through cross-validation and tuning. A hybrid model, combining Logistic Regression, CatBoost, and RandomForest, achieved a 97% accuracy, showcasing improved precision, recall, F1 score, and ROC AUC. This underscores machine learning's potential in enhancing predictive accuracy and refining strategies to combat heart diseases effectively.

Keywords:- Logistic Regression (LR), K-Nearest Neighbors (KNN), RandomForest (RF), CatBoost (CB), XGBoost (XB), Stochastic Gradient Descent (SGD), Cross-Validation (CV), Support Vector Machine (SVM), Hyperparameter Tuning (HT) and Voting Classifier (VC).

I. INTRODUCTION

In today's fast-paced world, the emphasis on self-care often gets overshadowed by the demands of daily life, leading to heightened stress levels and neglect of one's health. Even with the progress that medicine has made, diseases like cancer, heart disease, and tuberculosis still take a lot of lives each year. Globally, cardiovascular disease (CVD) is now the leading cause of death, accounting for about 31% of all deaths, according to the World Health Organisation (WHO). Over the span of 15 years, WHO reported an alarming 15.2 million deaths attributed to heart-related diseases, underscoring the persistent threat posed by these conditions. Notably, heart-related ailments inflicted a significant economic toll, amounting to around $237 billion in India alone between 2005 and 2015.

The heart, as a vital organ responsible for blood circulation, plays a crucial role in supplying oxygen and nutrients throughout the body. Any dysfunction in this essential organ severely impacts the functionality of other bodily organs, presenting a formidable challenge. Unhealthy dietary habits and the rapid pace of modern lifestyles contribute substantially to the heightened risk of heart-related diseases.

Leveraging machine learning and deep learning techniques to analyse diverse patient data within the medical field offers a promising avenue for assessing risks, identifying symptoms, and predicting heart-related diseases. Factors such as diabetes, smoking, excessive alcohol consumption, high cholesterol, high blood pressure, and obesity significantly elevate the risk of heart issues. Despite efforts to manage these factors, heart diseases can manifest regardless of gender or age.

The purpose of this project is to use machine learning and deep learning techniques to predict heart disease by doing a thorough examination of these risk variables. These kinds of predictive powers could transform healthcare and improve people's lives. Furthermore, the research delves into a range of heart disorders, such as Cardiomyopathy, Congenital Heart Disease, Heart Failure, and coronary artery disease, each with unique traits and implications for the cardiovascular system.

In the initial phase of the study, the dataset was loaded, and multiple machine learning algorithms were employed, including SGD, NB, RF, CB, XB, KNN, LR, and SVM. Performance metrics such as precision, recall, accuracy, F1 score, and ROC AUC were computed, identifying the top-performing models before hyperparameter adjustment.

Subsequently, cross-validation and hyperparameter tuning were performed, leading to the identification of another set of top-performing models with enhanced predictive capabilities. Most models exhibited noticeable improvements across various criteria following hyperparameter adjustment, particularly in precision, recall, accuracy, and F1 score.

Finally, a hybrid model combining LR, CB, and RF was developed using a voting classifier. This model demonstrated remarkable predictive performance, achieving high accuracy and impressive precision and recall scores. The balanced F1 score and outstanding ROC AUC further underscored the model's overall performance.

This comprehensive approach utilizing machine learning techniques highlights the potential to accurately predict heart disease, marking significant progress in early identification
and intervention against cardiovascular ailments. The integration of various algorithms and methodologies signifies the potential for impactful advancements in healthcare and enhanced patient outcomes.

II. LITERATURE SURVEY

In the paper "Heart disease identification from patients' social posts, machine learning solution on spark" by H. Ahmed, E.M.G. Younis, A. Hendawi, and A.A. Ali, Apache Spark and Apache Kafka are utilized alongside machine learning methods such as Decision Tree, Support Vector Machine, RF Classifier, and LR Classifier to create a real-time system for predicting heart disease from medical data streams. The methodology includes feature selection algorithms, machine learning algorithms, hyperparameter tuning, and cross-validation. However, limitations exist in terms of sample size, data quality, and generalizability to other populations [1].

S. Matin Malakouti's paper, "Heart disease classification based on ECG using machine learning models," explores the automated categorization of Electrocardiography (ECG) data using Gaussian NB, RF, LR, and Linear Discriminant Analysis. The study discusses the advantages and disadvantages of these methods, emphasizing the use of 10-fold cross-validation to reduce prediction variance and avoid biased assessment. However, the study's limitation lies in the challenges of accurately distinguishing between healthy and sick individuals using machine learning and deep learning methods [2].

Md Mamun Ali et al.'s paper, "Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison," investigates various machine learning classifiers for heart disease prediction. While the RF method achieved 100% accuracy, sensitivity, and specificity on a specific dataset, the study's reliance on a single dataset raises concerns about generalizability to other datasets [3].

L. Sharan Monica et al.'s paper, "Latest trends on heart disease prediction using machine learning and image fusion," aims to develop a program for reliable and instant disease diagnosis. The methodology involves exploratory data analysis, attribute selection, and the use of machine learning methods such as NB, decision trees, SVM, and artificial neural networks. Similar to previous studies, the reliance on a single dataset limits the generalizability of the findings [4].

Ivan Miguel Pires et al.'s paper, "Machine learning for the evaluation of the presence of heart disease," explores different machine learning techniques for detecting cardiac illness. Despite achieving high accuracy using Decision Tree and Support Vector Machine approaches, the paper lacks a detailed description of feature extraction, selection, and model training methods [5].

Jinny, S. V., & Mate, Y. V.'s paper, "Early prediction model for coronary heart disease using genetic algorithms, hyper-parameter optimization, and machine learning techniques," aims to identify heart diseases using machine learning methods and heart rate features. While the study uses advanced techniques such as genetic algorithms and hyper-parameter optimization, the absence of a thorough explanation of feature selection procedures is noted [6].

Katarya, R., & Meena, S. K.'s literature review paper explores the use of machine learning and deep learning techniques for heart disease analysis. The systematic review of existing literature aims to guide future research in the healthcare industry. However, the paper's reliance on secondary sources and lack of original research may limit its contributions [7].

Abeer Alsadoon's paper compares the accuracy of different machine learning models for heart disease prediction. While the study recommends specific models for classification, the lack of detail regarding feature selection is identified as a drawback [8].

Katarya, R., & Meena, S. K.'s paper delves into the application of machine learning for heart disease prediction, emphasizing the increasing prevalence of heart disease and the need for efficient data analysis in the medical sector. The study reviews various risk factors and employs algorithms such as LR, K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, and Decision Trees for prediction and classification [9].

Finally, the paper by Naseri, A., Tax, D., van der Harst, P., Reinders, M., & van der Bilt explores the use of machine learning methods to detect atrial fibrillation and heart failure from wearable devices. While the study presents innovative methods for cardiovascular outcome prediction, data privacy concerns and limited sample size may impact the generalizability of the findings [11].

III. PROPOSED METHODOLOGY

Before implementing cross-validation and hyperparameter tuning, LR was the leading model, exhibiting commendable accuracy. However, following cross-validation and hyperparameter tuning, CB emerged as the top-performing model, showcasing superior accuracy. Throughout these processes, the RF algorithm consistently demonstrated strong performance both before and after tuning.

Given the robust performances of LR, CB, and RF individually, a hybrid model was crafted using a voting classifier, leveraging the strengths of these three algorithms.

The code demonstrates the creation and evaluation of a VC ensemble, amalgamating three distinct algorithms: RF Classifier, CB Classifier, and Logistic Regression. The 'voting' parameter is set to 'soft', indicating that the final prediction is determined by the weighted average probability of each classifier.

After training the VC on the given dataset, it's evaluated using various metrics. The achieved performance metrics are impressive: an accuracy of 97%, with a precision of 99%, recall of 95%, and an F1 score of 97%. Additionally, the Receiver Operating Characteristic (ROC) curve showcases an
Area Under the Curve (AUC) of 99.85%, signifying exceptional model discrimination ability across different thresholds.

The 'soft' voting method considers the probabilities predicted by each model, weighing them and making predictions based on these weighted probabilities. This tends to offer more nuanced decisions by taking into account the confidence levels of individual models. In contrast, 'hard' voting considers only the class labels predicted by each model and selects the majority class as the final prediction. The 'soft' approach can often lead to improved performance when models are well-calibrated and have reliable probability estimates.

IV. SYSTEM DESIGN & IMPLEMENTATION

To create a reliable predictive model for heart disease, the suggested methodology includes sophisticated machine learning algorithms, deliberate data preprocessing, and model validation. The steps in the methodology are as follows:

A. Data Collection and Preprocessing

Data Sourcing:
Obtaining a comprehensive dataset involves sourcing diverse patient information from various sources, including hospitals, research databases, or healthcare institutions. This dataset should encompass:

Demographics: Age, gender, ethnicity, etc.
Medical History: Pre-existing conditions (diabetes, hypertension), medication history.
Vital Signs: Blood pressure, heart rate, BMI.
Lab Results: Cholesterol levels, blood glucose, etc.

Data Cleaning:
Cleaning the dataset is essential to ensure data quality and consistency:

Handling Missing Values: Address missing values through imputation (mean, median, mode) or deletion based on the extent of missingness.
Outlier Treatment: Identify and handle outliers using statistical methods (e.g., Z-score, IQR) to prevent skewing of results.
Normalization/Standardization: To improve model performance and convergence, normalize or standardize numerical features to bring them to a common scale.

Data Split:
Partitioning the dataset into test, validation, and training sets is essential for building and assessing models:

Training Set: The predictive model is trained using the training set.
Validation Set: Used to evaluate model performance during training and adjust hyperparameters.
Test Set: Used to assess the performance of the finished model on unobserved data.

B. Exploratory Data Analysis (EDA):

Descriptive Analysis:
Understand the dataset's characteristics, distributions, and statistical summaries:

Central Tendency: Mean, median, mode of features.
Dispersion: Standard deviation, range, interquartile range (IQR).
Correlation Analysis: Identify relationships between variables (e.g., correlation matrix) to understand feature importance.

Visualization:
Utilize visual tools to gain deeper insights and identify potential patterns related to heart disease:

Histograms: Display distributions of numerical variables.
Heatmaps: Visualize correlations between features.
Scatter Plots: Explore relationships between two numerical variables.
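The preprocessing, splitting, and EDA steps described above can be outlined in code. The following is only a minimal sketch under stated assumptions: the file name ("heart.csv"), the label column name ("target"), and the particular choices of median imputation, IQR clipping, standardization, and an 80/20 split are illustrative, since the paper does not report the exact feature names or preprocessing parameters.

    # Minimal preprocessing and EDA sketch (file, column names, and parameter choices are assumed)
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("heart.csv")  # hypothetical file name

    # Handling missing values: impute numeric columns with the median
    df = df.fillna(df.median(numeric_only=True))

    # Outlier treatment: clip values outside 1.5 * IQR for each numeric column
    numeric_cols = df.select_dtypes(include="number").columns
    q1, q3 = df[numeric_cols].quantile(0.25), df[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    df[numeric_cols] = df[numeric_cols].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr, axis=1)

    # Descriptive analysis: central tendency, dispersion, and correlations
    print(df.describe())
    corr = df.corr(numeric_only=True)

    # Visualization: histograms of numerical variables and a correlation heatmap
    df.hist(figsize=(12, 8))
    plt.matshow(corr)
    plt.colorbar()
    plt.show()

    # Data split and standardization ("target" is the assumed label column)
    X, y = df.drop(columns=["target"]), df["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

The resulting X_train, X_test, y_train, and y_test splits are reused by the later sketches in this paper.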
prone to categorization issues. The LR formula for establishing the probability that input X belongs in class 1 can be expressed as:

P(y = 1 | X) = 1 / (1 + e^(-(wX + b)))

Here b is the bias and w is the weight that is multiplied by input X.

K-Nearest Neighbors (KNN):
A flexible supervised machine learning technique for regression and classification applications is the KNN algorithm. It functions according to the similarity principle, which states that the majority class of a sample's KNN in the feature space determines its class. To ascertain the KNN for a new data point, KNN calculates the Euclidean distance between the new point and each point in the training set. A majority vote among these neighbours then determines the class of the new point. In regression tasks, KNN uses a weighted average or an average of the target values of its KNN to forecast the value of the incoming data point. In an n-dimensional feature space, the Euclidean distance between two points p and q is determined using the subsequent formula:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)

CatBoost (CB):
CB also incorporates robust handling of missing data and provides excellent accuracy by default, requiring minimal hyperparameter tuning, making it an efficient and user-friendly choice for predictive modeling tasks, especially in scenarios with complex datasets containing categorical features.

XGBoost (XB):
XB, an abbreviation for extreme Gradient Boosting, stands out as a highly efficient and accurate ensemble learning technique tailored for structured or tabular data. Belonging to the gradient boosting family, it constructs models sequentially, addressing the shortcomings of its predecessors. By integrating weak learners, typically decision trees, XB mitigates loss through the optimization of a predefined objective function. Its methodology entails a gradient descent algorithm, which computes gradients for updating model parameters. This algorithm aims to minimize a regularized objective, comprising both a loss function and a penalty term, thereby preventing overfitting and enhancing generalization. The final prediction of the XB model results from a weighted aggregation of predictions generated by individual trees within the ensemble. The objective function of XB incorporates a loss function (L) for error measurement and a regularization term (Ω) to manage model complexity, formulated as:

Objective = L(predictions, targets) + Ω(complexity)
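For reference, the classifiers discussed in this section can be instantiated as sketched below. The hyperparameter values shown are placeholders only; the paper does not list the pre-tuning settings used, so everything beyond the class names (from scikit-learn, xgboost, and catboost) is an illustrative assumption.

    # Illustrative instantiation of the models described above (hyperparameters are placeholders)
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier
    from catboost import CatBoostClassifier

    models = {
        "LR": LogisticRegression(max_iter=1000),     # sigmoid of wX + b
        "KNN": KNeighborsClassifier(n_neighbors=5),  # Euclidean distance by default
        "NB": GaussianNB(),
        "SGD": SGDClassifier(),
        "SVM": SVC(probability=True),                # probabilities needed for ROC AUC
        "RF": RandomForestClassifier(n_estimators=100),
        "XB": XGBClassifier(eval_metric="logloss"),  # regularized boosting objective
        "CB": CatBoostClassifier(verbose=0),         # handles categorical features natively
    }

This models dictionary is the form assumed by the evaluation loop sketched later in Section IV.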
Model Development: Implement the selected algorithms using appropriate libraries (e.g., scikit-learn) to create predictive models.
Training: Train each model on the training dataset using appropriate parameters.

E. Model Assessment:
A crucial first step in evaluating the efficacy, precision, and resilience of machine learning models for heart disease prediction is model evaluation. A few crucial elements of model evaluation are as follows:

Accuracy:
Formula: (TP + TN) / (TP + TN + FP + FN)
Measures the proportion of correct predictions out of the total predictions made. In heart disease prediction, it reflects the overall correctness of identifying both healthy individuals and those with heart disease.

Precision:
Formula: TP / (TP + FP)
Indicates the accuracy of positive predictions. In heart disease prediction, it measures the proportion of correctly identified individuals with heart disease among all predicted positive cases. High precision means fewer false positives, reducing unnecessary interventions or treatments for individuals who are actually healthy.

Recall (Sensitivity):
Formula: TP / (TP + FN)
Evaluates how well the model can accurately recognise every positive case. When it comes to heart disease prediction, it measures the percentage of accurately diagnosed heart disease patients among all true positive cases. A high recall rate indicates that the model is successful in identifying heart disease patients, lowering the possibility of overlooking those who need medical attention.

Area Under Curve - Receiver Operating Characteristic (ROC-AUC):
The ROC Curve is a plot of True Positive Rate (Sensitivity) against False Positive Rate (1 - Specificity). How successfully the model can distinguish between the two groups (heart disease vs. no heart disease) is shown by the area under the ROC curve (AUC). A greater AUC in heart disease prediction indicates improved ability to distinguish between those with and without heart disease.

The code initializes an empty dictionary model_scores1 to store evaluation metrics for various machine learning models. After that, iterating through a dictionary of models, each model is assessed using X_test and y_test data after being trained using X_train and y_train data. Using the appropriate functions from scikit-learn, it computes evaluation metrics for each model, including precision, recall, accuracy, F1 score, and ROC AUC. These metrics are then appended to the model_scores1 dictionary along with the model's name. This process creates a structured collection of evaluation scores for each model, allowing easy comparison of their performance.

The code employs Python libraries like matplotlib, pandas, and scikit-learn to visualize and analyze the performance metrics of multiple machine learning models for classification tasks. Initially, it imports the necessary modules for plotting, data manipulation, and model evaluation. Assuming the existence of a populated DataFrame model_scores1 containing model performance metrics (precision, recall, accuracy, F1 score, ROC AUC) for various models, it converts this data into a pandas DataFrame scores_df. The subsequent section uses matplotlib to create a 2x3 subplot grid, plotting bar graphs for each metric (precision, recall, accuracy, F1 score, ROC AUC) against different model names on separate subplots, enabling visual comparison of model performances. It then identifies and prints the top-performing models based on each metric and displays their individual performance metrics like precision, recall, accuracy, F1 score, and ROC AUC. The code concludes by summarizing the overall analysis of the top models' performances, aiming to provide insights into the most effective models for the classification task at hand. This layout enables a comprehensive analysis and comparison of multiple models' performances, aiding in model selection and decision-making based on key evaluation metrics.
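A condensed sketch of the evaluation loop and the metric comparison described above is given below. It assumes the models dictionary and the X_train/X_test/y_train/y_test split from the earlier sketches; the exact structure and plotting layout of the original code may differ.

    # Sketch of the per-model evaluation loop and metric comparison (layout is illustrative)
    # `models`, X_train, X_test, y_train, y_test are assumed from the earlier sketches
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    model_scores1 = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Probability of the positive class for ROC AUC (decision scores as a fallback)
        if hasattr(model, "predict_proba"):
            y_score = model.predict_proba(X_test)[:, 1]
        else:
            y_score = model.decision_function(X_test)
        model_scores1[name] = {
            "Accuracy": accuracy_score(y_test, y_pred),
            "Precision": precision_score(y_test, y_pred),
            "Recall": recall_score(y_test, y_pred),
            "F1 Score": f1_score(y_test, y_pred),
            "ROC AUC": roc_auc_score(y_test, y_score),
        }

    # Convert to a DataFrame and plot one bar chart per metric in a 2x3 grid
    scores_df = pd.DataFrame(model_scores1).T
    fig, axes = plt.subplots(2, 3, figsize=(15, 8))
    for ax, metric in zip(axes.flat, scores_df.columns):
        scores_df[metric].plot(kind="bar", ax=ax, title=metric)
    plt.tight_layout()
    plt.show()

    # Report the top-performing model for each metric
    print(scores_df.idxmax())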
G. Re-Model Evaluation after Cross-Validation and Hyperparameter Tuning
We evaluate the model performance on the test set using various metrics like precision, recall, accuracy, F1 score, and ROC AUC again after cross-validation and hyperparameter tuning. The reasons for performing model evaluation again after cross-validation are as follows:

Performance Evaluation on Test Set:
The initial assessment performed before any tuning provides a baseline. However, after cross-validation and hyperparameter tuning, the model might have changed significantly. Hence, it's essential to evaluate the tuned models on an unseen dataset (the test set) to get a realistic estimate of how well the models generalize to new, unseen data.

Comparison with Initial Results:
Comparing the performance metrics before and after tuning helps gauge the improvement achieved through hyperparameter tuning. It allows us to verify whether the changes made to the model indeed enhance its predictive capabilities.

Selecting the Best Model:
Post-tuning, this evaluation helps identify the top-performing models based on their performance on the unseen test data. It ensures that the best-performing model is selected for deployment or further consideration.

Providing Final Conclusions:
This evaluation assists in summarizing the outcomes of the entire process, emphasizing the improvements achieved through tuning and aiding in decision-making for model selection or next steps.

Therefore, re-evaluating the model on the test set post-tuning is a vital step to ensure an accurate understanding of the model's performance and to make informed decisions about which model(s) to proceed with.
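The tuning-then-re-evaluation workflow described above can be sketched as follows. The paper does not state which search routine or parameter grids were used, so GridSearchCV and the RF grid shown here are assumptions chosen purely for illustration; the key point is that cross-validation runs on the training data only, while the final check uses the held-out test set.

    # Illustrative cross-validated tuning followed by re-evaluation on the held-out test set
    # X_train, X_test, y_train, y_test are assumed from the earlier preprocessing sketch
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import accuracy_score, f1_score

    param_grid = {"n_estimators": [100, 200, 500], "max_depth": [None, 5, 10]}  # assumed grid
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)      # cross-validation is performed on the training data only

    best_rf = search.best_estimator_
    y_pred = best_rf.predict(X_test)  # the unseen test set gives the realistic post-tuning estimate
    print("Best params:", search.best_params_)
    print("Test accuracy:", accuracy_score(y_test, y_pred))
    print("Test F1 score:", f1_score(y_test, y_pred))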
H. Comparative Analysis and Evaluation:

Before Tuning:
The initial assessment of the models revealed varied performances. Models like LR and NB displayed commendable precision, recall, accuracy, and F1 Score, whereas KNN exhibited relatively lower performance in comparison. Notably, models such as SVM and SGD depicted suboptimal performance, evident from lower accuracy, recall, and F1 Score.

After Tuning (Cross-Validation & Hyperparameter Tuning):
Following cross-validation and hyperparameter tuning, there was a marked improvement across most models. LR showed significant enhancements in precision and F1 Score, indicating better predictive capabilities after tuning. Post-tuning, RF showed notable gains in ROC AUC, accuracy, and precision, indicating its increased robustness as a classifier. CB and XB retained their high-performance levels across multiple metrics, solidifying their positions as top-performing models even after tuning.

Comparative Analysis and Evaluation:
The tuning process had a considerable impact on refining the models' predictive abilities. Models that initially had weaker performance, like KNN, exhibited notable improvements in accuracy, precision, and F1 Score. Furthermore, the SVM and SGD models, which initially performed poorly, showed noticeable enhancements in multiple metrics post-tuning.

Best Model Selection:
The evaluation highlights that CB, after tuning, emerged as the most consistent and robust performer. It displayed noteworthy improvements in precision, recall, accuracy, F1 Score, and ROC AUC. These enhancements position CB as the top-performing model among the others, showcasing its suitability for this specific dataset and problem context.
I. Development of Hybrid Model

Introduction
The hybrid model we've constructed amalgamates the predictive strengths of three key models: LR, deemed the best performer before cross-validation and hyperparameter tuning; CB, identified as the superior model after this tuning; and RF, demonstrating consistent competence both before and after the tuning process. The inclusion of LR, CB, and RF within this hybrid framework leverages the varied strengths and diverse learning methodologies of these models. LR, recognized for its interpretability and simplicity, acts as a strong baseline, whereas CB, with its advanced boosting technique and optimized parameters post-tuning, bolsters predictive accuracy. Additionally, RF, exhibiting commendable performance across different scenarios, contributes to the ensemble's robustness and adaptability. This hybridization strategy aims to capture the collective prowess of these models, potentially enhancing predictive accuracy and resilience across a wide spectrum of datasets and real-world scenarios.

Hybrid Model Implementation
The code demonstrates the creation of a hybrid model using the VotingClassifier (VC) ensemble from Scikit-Learn. The objective is to merge the predictive capabilities of three distinct machine learning algorithms: RF Classifier, CB Classifier, and LR. Initially, individual instances of these classifiers are initialized with specific hyperparameters: an RF Classifier with 100 estimators, a CB Classifier with 100 iterations, and an LR instance. These models are integrated into a VC, which acts as a meta-estimator, combining the predictions of its constituent models. The 'soft' voting scheme, employed in this instance, weighs predictions based on the probabilities assigned by each model, ultimately enhancing the ensemble's predictive accuracy.

Subsequently, the VC is trained on the given training dataset (X_train and y_train) via the fit() function, which allows the ensemble to learn from the provided data. Through this process, the hybrid model learns to make predictions by aggregating the outputs of the individual classifiers, harnessing the diverse strengths and approaches of each model. Upon training completion, the voting_classifier instance is equipped to predict on new, unseen data, capitalizing on the collective intelligence derived from the constituent classifiers to potentially improve overall predictive performance.

Hybrid Model Performance
Evaluating the performance of the trained hybrid VotingClassifier model involves using various assessment metrics and visual aids. Following the training of the 'voting_classifier' on the dataset, the model undergoes testing using the test set (X_test) to generate predictions ('y_pred') for the target values. Evaluation metrics such as precision, recall, accuracy, and F1 score are calculated to assess the model's predictive accuracy, indicating its capability to accurately classify instances from the test data. Furthermore, the Receiver Operating Characteristic (ROC) curve and its associated Area Under the Curve (AUC) metric are determined, offering insights into the model's balance between true positive rate and false positive rate across varying threshold values. The resulting ROC curve visualization illustrates the model's discriminative performance, highlighting its ability to distinguish between classes effectively. This thorough assessment aids in comprehending the model's strengths and weaknesses, facilitating the interpretation and evaluation of its predictive abilities.
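A condensed sketch of the hybrid VotingClassifier construction, training, and evaluation described above is shown below. The estimator settings (100 trees, 100 CatBoost iterations, soft voting) follow the text; the remaining details (variable names, additional constructor arguments, plotting specifics) are illustrative assumptions rather than the paper's exact code.

    # Hybrid soft-voting ensemble of RF, CatBoost, and LR, as described in the text
    # X_train, X_test, y_train, y_test are assumed from the earlier preprocessing sketch
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score, roc_curve)
    from catboost import CatBoostClassifier

    voting_classifier = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
            ("cb", CatBoostClassifier(iterations=100, verbose=0)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # average the predicted class probabilities
    )
    voting_classifier.fit(X_train, y_train)

    y_pred = voting_classifier.predict(X_test)
    y_prob = voting_classifier.predict_proba(X_test)[:, 1]

    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall:", recall_score(y_test, y_pred))
    print("F1 score:", f1_score(y_test, y_pred))
    print("ROC AUC:", roc_auc_score(y_test, y_prob))

    # ROC curve for the hybrid model
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    plt.plot(fpr, tpr, label="Hybrid (LR + CB + RF)")
    plt.plot([0, 1], [0, 1], linestyle="--")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend()
    plt.show()

Soft voting is used here because all three constituent classifiers expose calibrated-style class probabilities, which is the condition under which the text notes that soft voting tends to outperform hard voting.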
After undergoing cross-validation and hyperparameter tuning, a refined set of models emerged.

Metrics for Models after Tuning:
After hyperparameter tuning, there was a noticeable improvement in the performance of most models across different metrics. Models like RF, CB, XB, and LR improved their scores across metrics like precision, recall, accuracy, and F1 score, indicating better-tuned parameters and enhanced predictive capabilities.
Table 3: Performance Metrics for Hybrid Model

The development of a blended hybrid model, which merges LR, CB, and RF through a voting classifier, yielded outstanding predictive capabilities.

Hyperparameter tuning significantly improved the performance of the individual models, enhancing their predictive capabilities.

The hybrid model, leveraging the strengths of LR, CB, and RF, emerged as a powerful ensemble, showcasing exceptional performance across multiple evaluation metrics and indicating its potential for robust predictions on the heart disease dataset.

FUTURE WORK

Future advancements may include integrating hyperparameter optimization with emerging technologies like reinforcement learning, boosting adaptability. Additionally, the model's methodologies might evolve to address multi-objective optimization, considering factors like interpretability, fairness, and robustness in model optimization.

REFERENCES

[9]. Naseri, A., Tax, D., van der Harst, P., Reinders, M., & van der Bilt, I. (2023). Data-efficient machine learning methods in the ME-TIME study: Rationale and design of a longitudinal study to detect atrial fibrillation and heart failure from wearables. Cardiovascular Digital Health Journal.
[10]. Pires, I. M., Marques, G., Garcia, N. M., & Ponciano, V. (2020). Machine learning for the evaluation of the presence of heart disease. Procedia Computer Science, 177, 432-437.
[11]. Rimal, Y., Paudel, S., Sharma, N., & Alsadoon, A. (2023). Machine learning model matters its accuracy: a comparative study of ensemble learning and AutoML using heart disease prediction. Multimedia Tools and Applications, 1-18.