Prediction of Heart Problems In Diabetic Patients

The document discusses the use of machine learning algorithms, specifically Support Vector Machines (SVM), to predict heart problems in diabetic patients. It outlines the methodology for data collection, preprocessing, feature selection, model training, evaluation, and deployment, emphasizing the potential for early detection and improved healthcare decision-making. The findings indicate a high accuracy in predicting cardiovascular risks among diabetics, highlighting the significance of data mining in healthcare.

Prediction of Heart Problems in Diabetic Patients

using Machine Learning Algorithm


BadriGopal A, Bharath Ayyar, Bharanidharan R, Jasmine Ilakkia N
Dept. of Instrumentation and Control Engineering
Sri Krishna College of Technology
Coimbatore, Tamilnadu, India
[email protected] [email protected] [email protected]

Saravanakumar K, Dhanaselvam J
Asst. Professor, Dept. of Electrical and Electronics Engineering
Sri Krishna College of Technology
Coimbatore, Tamilnadu.
[email protected] [email protected]

Rajesh R
Department of Engineering Design
Indian Institute of Technology Madras
Chennai, Tamilnadu, India 600036.
[email protected]

Abstract— Data mining is a key step in the Knowledge Discovery in Databases (KDD) process, a central technique in the field of
information extraction and analysis. Although the terms "data mining" and "knowledge discovery in databases" are frequently used
synonymously, data mining is one particular element within the larger knowledge discovery framework. The main goal of data mining
techniques is the comprehensive examination of large datasets. These methods seek to reveal hidden trends and clarify complex
connections that might otherwise remain invisible to the human eye. The insights derived from data mining are essential for making
well-informed decisions in a variety of fields, including healthcare. Take diabetes, a chronic illness brought on either by the
pancreas producing insufficient amounts of insulin or by the body using the insulin it produces inefficiently. In this area, a few
studies have used Support Vector Machines (SVM), a reliable machine learning technique, to great effect for classification tasks.
In keeping with this trend, we also used an SVM classifier with a radial basis function kernel in our trials. The proposed system
showed high accuracy in predicting diabetic patients' risk of heart disease. This development has the potential to transform
decision making in healthcare and to help greatly in the early detection and prevention of cardiovascular problems in diabetics.

Keywords—Data Mining, Type 2 Diabetes, Heart Disease, Knowledge Discovery, Machine Learning Method, Support Vector
Machine.

I. INTRODUCTION
The goal of knowledge discovery in databases (KDD), sometimes referred to as data mining, is to extract insightful information
from large datasets. This process involves multiple stages, including data cleaning, data selection, data mining (pattern
identification), and knowledge presentation [1]. Data mining technology extracts non-trivial information from medical datasets
and bridges many domains, including data warehousing, statistics, machine learning, and neural networks. Data mining technologies
promote predictive analysis by allowing businesses to concentrate on the important information found in their data repositories [2].
This enables proactive, knowledge-driven decision-making. Several data mining techniques are available, and the best one to use
depends on the application domain.
Data mining has enormous promise in the healthcare industry since it can automatically find predictive insights in large databases
[3]. Classification prediction models are built using training data, and their effectiveness is evaluated using testing data.
Diabetes mellitus is a health condition characterized by persistent hyperglycemia that affects the metabolism of proteins, fats,
and carbohydrates as a result of impaired insulin production [4].
Diabetes is classified into two types: Type 1 (T1D), which results from inadequate insulin production, typically beginning in
childhood, and requires lifelong insulin injections to survive; and Type 2 (T2D), which results from the body's inefficient use of
insulin, typically in adulthood, and is frequently associated with obesity, inactivity, and poor diet [5]. Diabetes carries a
serious risk of heart disease, renal failure, and blindness, among other conditions [6]. "Heart problem" is a broad term that
refers to a variety of heart-related diseases that can cause symptoms such as difficulty breathing, chest discomfort, and heart
attacks [7]. Type 2, the most prevalent kind of diabetes, makes up 90% of cases worldwide and is linked to poor dietary choices,
sedentary lifestyles, and obesity. Treatment options include dietary changes, weight loss activities, oral medicines, and, if
required, insulin [8]. Diabetes is a chronic illness brought on either by the pancreas producing insufficient amounts of insulin
or by the body using the insulin it produces inefficiently. In this area, major studies have used Support Vector Machines (SVM)
[9], a reliable machine learning technique, for classification tasks. In keeping with this approach, we have also used an SVM
classifier with a radial basis function kernel in our trials. The results that our proposed system produced show great promise
[10]. It showed a high degree of accuracy in predicting diabetic patients' vulnerability to heart disease. This development has
the potential to transform healthcare decision-making and greatly aid in the early detection and prevention of cardiovascular
problems in diabetics.
Diabetes is a major risk factor for cardiovascular disease, which is the main cause of death for diabetics. Individuals with Type 2
diabetes have a life expectancy five to ten years shorter than those without the illness [11]-[14]. Treating cardiovascular
illnesses accounts for a large portion of the related expenses. However, there are ways to greatly lower the risk of cardiovascular
disease, including dietary modifications, greater physical activity, weight loss, and lifestyle changes [15].
The cost of treating cardiovascular disease and its associated mortality can be decreased by early identification of people who are
at risk for diabetes. While current systems mainly predict diabetes risk [16], intelligent automatic diagnosis systems have the
potential to identify vulnerable subsets of diabetic patients.
Similarly, there are various systems in place that predict the risk of cardiovascular diseases, and their usefulness in the medical
field is clear [17]. Because Support Vector Machines (SVMs) have proven to be effective in these kinds of systems, we used an SVM
classifier in our experiments.
II. METHODOLOGY
A. Data Collection
The data collection process involves defining objectives, determining the data types needed, and selecting appropriate methods.
A dataset is gathered that includes relevant features and labels (heart disease presence or absence) for diabetic patients. The
dataset should be well curated and contain enough data, and rigorous testing phases are executed to guarantee the accuracy and
dependability of the chosen classifier [18]. These tests are crucial in validating the predictive power of the model and ensuring
its suitability for medical applications. By applying machine learning and data mining methods [19] to this dataset, the study
seeks to extract meaningful information that may help with early identification, diagnosis, and even the creation of more
effective treatment plans for individuals with diabetes.
B. Data Preprocessing
This step of the data pre-processing method randomly selects a point on either the x-axis or the y-axis and calculates the intercept
of this point on the opposite axis using the notion of a parallelogram. This procedure partitions the dataset into a training
dataset and a testing dataset [20]. The method then calculates the quantity of training and test data located on each side of the
intercept.
C. Feature Selection/Engineering
Identify the relevant features that may influence heart problems in diabetic patients. Feature selection methods such as
correlation analysis or feature importances from tree-based models [21] can be useful.
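As an illustration of the correlation analysis mentioned above, the following is a minimal sketch that ranks features by the absolute Pearson correlation between each feature column and the binary label. The function names (`pearson`, `rank_features`) and the toy values are assumptions made for this example; they are not taken from the study's dataset or code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, labels):
    """Return (name, |r|) pairs sorted from strongest to weakest correlation."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up toy values for illustration only (not the paper's data).
toy_columns = {
    "chol": [180, 240, 200, 260, 190, 250],
    "age": [50, 52, 55, 51, 53, 50],
}
toy_labels = [0, 1, 0, 1, 0, 1]  # 1 = heart disease present, 0 = absent
ranking = rank_features(toy_columns, toy_labels)
```

Correlation only captures linear association with the label, so a low score does not prove that a feature is useless to a non-linear classifier such as an RBF-kernel SVM.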
D. Splitting Data
It is important to divide the dataset into two parts when preparing the data for model development: a training set and a testing or
validation set. Generally, 70% to 80% of the data is used for training and the remaining 20% to 30% is used for testing [22]. This
division supports robust and accurate forecasting of heart disease risk among diabetic patients by allowing the model to learn
from one portion of the data while its performance is evaluated separately on unseen data.
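The split described above can be sketched in a few lines. This is a minimal illustration, assuming a simple random shuffle with a fixed seed; the function name `train_test_split`, the seed, and the placeholder records are choices made for this example, and the paper does not state how its split was randomized.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder records standing in for patient rows.
records = list(range(1000))
train, test = train_test_split(records, test_fraction=0.3)
```

With 1000 records and a 30% test fraction this gives 700 training and 300 testing records. For imbalanced labels, a stratified split that preserves the class ratio in both parts is often preferable.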
E. Model Selection
The SVM (Support Vector Machine) method is a sound choice for the classification problem of predicting heart disease among diabetic
individuals. It handles binary classification problems effectively and robustly [23]. Its capability to define clear decision
boundaries and to generalize well to unseen data makes it a suitable choice for this medical application.
F. Training of Model
Next, train the SVM model on the training data. Different SVM configurations (e.g., linear or radial basis function kernels) can
be tried, checking the accuracy of each, to find the best configuration [24]. This process helps to identify the best settings for
accurate heart disease prediction among diabetic patients.
G. Model Evaluation
Evaluate the performance of the model using appropriate metrics such as accuracy, precision, recall, F1-score, and ROC-AUC on the
testing/validation set. Consider using cross-validation for a more robust evaluation [25]. Cross-validation divides the data into
subsets and evaluates the model on each in turn, thus checking that the model's performance is consistent across different data
splits.
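The threshold-based metrics listed above all follow from the four confusion-matrix counts. The sketch below shows the arithmetic; the helper names and the toy label vectors are assumptions for illustration and do not reproduce the study's predictions. Sensitivity is the same quantity as recall for the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Derive the usual metrics; undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(*confusion_counts(y_true, y_pred))
```

ROC-AUC is not covered here because it needs continuous decision scores rather than hard class predictions.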
H. Hyperparameter Tuning
The performance of the SVM (Support Vector Machine) model is improved by hyperparameter tuning using techniques such as random
search or grid search. To achieve this, parameters such as learning rates and regularization strengths were adjusted. A systematic
search was conducted over a range of values for each parameter to find the optimal combination, and cross-validation was
considered for assessing model performance across different parameter sets. Training and validation results were monitored to
avoid overfitting or underfitting. Various algorithms and their hyperparameters were tried to find the most suitable
configuration, and the tuning process was updated regularly based on new insights and data. Through a methodical exploration of
various hyperparameter combinations, the model is optimized, leading to increased predictive accuracy and efficacy in identifying
heart disease risks in patients with diabetes [26]-[29].
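A grid search of the kind described above amounts to scoring every combination of candidate values and keeping the best one. The following is a minimal sketch under stated assumptions: `grid_search` is a hypothetical helper, the candidate values are arbitrary, and `toy_score` is a made-up stand-in for "train an SVM with these parameters and return its mean cross-validation accuracy".

```python
import itertools
import math

def grid_search(param_grid, score_fn):
    """Score every parameter combination; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(C, gamma):
    # Placeholder surface that peaks at C = 10, gamma = 0.1. In real use this
    # function would fit the classifier and return a validation score.
    return -((math.log10(C) - 1) ** 2) - (math.log10(gamma) + 1) ** 2

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
best_params, best_score = grid_search(grid, toy_score)
```

Candidate values are usually spaced logarithmically, as here, because the useful range of C and gamma spans several orders of magnitude. Random search samples from the same ranges instead of enumerating them.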
I. Model Interpretation
Examine the SVM model to determine the significance of the various features in predicting heart disease in patients with diabetes.
The SVM's coefficients or feature importance scores can be used for this purpose [30]. Healthcare providers can use this
information to make well-informed decisions about patient care and intervention tactics [31]. It is also crucial for understanding
the underlying causes of heart disease in diabetic patients.
J. Deployment
Once the model's performance is satisfactory, it is deployed within the healthcare system for real-time predictions, which aids in
the identification and management of heart problem risks in diabetic patients [32]-[34]. Continuous monitoring ensures its
reliability and ongoing support for informed healthcare decisions. The attributes used in the diagnosis are given in Table 1.
K. Monitoring and Maintenance
Monitor the model's performance continuously and update it, if necessary, as new data becomes available. If the model's
effectiveness drops or if there are shifts in the data, consider retraining it with the updated data or adjusting hyperparameters.
This proactive approach ensures that the SVM model remains a reliable tool for predicting heart disease among patients with
diabetes and adapts to changing healthcare scenarios and patient populations [35].
L. Interpretability
If interpretability is critical, consider techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable
Model-agnostic Explanations) to gain insight into the individual predictions made by the SVM (Support Vector Machine) model for
heart disease prediction among diabetic patients. These methods provide transparent means of understanding the decision-making
process of the model [36]. SHAP values, for instance, reveal the contribution of each feature to a specific prediction, while LIME
generates locally interpretable models to explain individual predictions, enhancing trust and facilitating informed healthcare
decisions. The block diagram of the process is given in Figure 1.

[Figure 1: block diagram. Input attributes: Age, Gender, Chest Pain Type, Resting Blood Pressure, Cholesterol, Fasting Blood Sugar,
Resting ECG, Maximum Heart Rate, Exercise Induced Angina, ST Depression, ST Segment Slope, Fluoroscopy Coloured Vessels, Thallium
Stress, Target. Blocks: Data Acquisition, ML System, Data Pre-Processing, Feature Selection, SVM Prediction Model, Performance
Measure, Classified Output.]

Figure 1 Block Diagram


1. Linear Regression:
Linear regression is a supervised learning method used to predict a continuous target output from one or more independent
variables. It operates on the assumption that the relationship between the input features and the output is linear [37]. Here's a
brief explanation of linear regression:

- Goal: Predict a continuous numeric value (for example, stock prices, house prices, temperature) based on input features.
- Model: Linear regression assumes that a linear equation captures the relation between the input variables and the target
variable [38].
- Equation: The equation for simple linear regression (one input feature) is:
y = mx + b,
where x is the input feature, m is the slope, b is the intercept, and y is the target variable [39]. In multiple linear regression
(multiple input features), the equation has one coefficient per input feature.
- Training: The algorithm learns the best-fit line (a hyperplane when there are multiple input features) by determining the values
of the coefficients (m and b in the simple case) that minimize the sum of squared differences between the predicted values and the
actual target values [40].
- Evaluation: Common metrics used in the evaluation of linear regression include Mean Squared Error (MSE), Root Mean Squared
Error (RMSE), and R-squared (coefficient of determination).
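For the single-feature case y = mx + b, the least-squares coefficients have a closed form, which the sketch below uses together with the MSE and R-squared metrics named above. The function names and the toy points are assumptions chosen for illustration (the points lie exactly on y = 2x + 1); they are not data from the study.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def mean_squared_error(xs, ys, m, b):
    """Average squared difference between predictions and targets."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def r_squared(xs, ys, m, b):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Toy points on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, b = fit_line(xs, ys)
mse = mean_squared_error(xs, ys, m, b)
r2 = r_squared(xs, ys, m, b)
```

RMSE is the square root of the MSE. When linear regression is used as a classifier, as in the comparison later in the paper, its continuous output has to be thresholded to obtain a class label.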

2. Supervised Machine Learning Algorithm


When an algorithm learns from a labelled dataset (one in which every example is linked to the right response or target value), it
is referred to as supervised machine learning. Supervised learning teaches the algorithm a mapping between the input parameters
(predictors) and the corresponding target values (labels) so that it can predict new, unseen data accurately [41]. The elements of
supervised machine learning are as follows:

Dataset Creation: A dataset that contains input data and the corresponding output data (labels or target values) is the starting
point for supervised learning. The dataset is split into two sections: a training set that is used to train the model, and a test
or validation set that is used to estimate the model's performance.
Feature Extraction: This step involves choosing critical information from various health data. It helps focus on key factors such
as blood sugar levels, cholesterol, and blood pressure. By simplifying complex data, machine learning models can better understand
the patterns and make accurate predictions [42]. This enhances model performance in identifying heart issues in diabetic
individuals. In simple terms, feature extraction streamlines the relevant information, which helps machine learning algorithms
make more precise predictions for better healthcare outcomes.
Selection of Model: Select a machine learning model or algorithm that is suitable for the current task. Common supervised learning
algorithms include:
- Linear Regression: prediction of continuous values (e.g., predicting house prices).
- Logistic Regression: binary classification problems (e.g., spam or not spam).
- Support Vector Machine (SVM): classification and regression with the goal of maximizing the margin.
- Neural Networks: deep learning methods that can handle complex patterns and are used for a wide range of tasks.
Model Training: During the training phase, the algorithm discovers the underlying relationships or patterns in the training data
[43]. To reduce the difference between its predictions and the actual target values, as measured by a predefined loss function,
the model modifies its internal parameters. This optimization process continues until the model converges to a satisfactory state.
Hyperparameter Tuning: Many algorithms have hyperparameters that need to be set before training. Finding the ideal values for
these parameters through methods like grid search, random search, or Bayesian optimization is known as hyperparameter tuning [44].
Model Evaluation: Once the model is trained, it needs to be evaluated to assess its performance on new, unseen data.
Model Deployment: Once the model's performance is satisfactory, deploy it within the healthcare system for real-time predictions
[45], which aids in the identification and management of heart problem risks in diabetic patients. Deployment may involve creating
APIs, integrating with applications, or running the model on cloud-based servers.
Monitoring and Maintenance: To make sure the model keeps performing well after deployment, it should be monitored regularly [46].
The model requires retraining or fine-tuning if performance deteriorates over time because of shifting data patterns.

3. Logistic Regression:
Logistic regression is applied to binary classification tasks (e.g., yes/no, spam/not spam, disease/no disease) where the objective
is to predict one of two possible outcomes. Logistic regression can be described as follows:
- Goal: Estimate the likelihood that a given input belongs to a particular class (e.g., 0 or 1).
- Model: The logistic function is used in logistic regression to model the probability of the binary outcome. The output, which
represents a probability, is limited to values between 0 and 1 [47].
- Equation: The logistic regression equation is:
p(y = 1 | x) = 1 / (1 + e^(-z)),
where z is the linear combination of the input features and their corresponding coefficients, x is the vector of input features,
and p(y = 1 | x) is the probability of the positive class [48].
- Training: The algorithm finds the coefficients that maximize the likelihood of the observed data under the model.
- Evaluation: Common metrics include accuracy, precision, recall, F1-score, and the area under the Receiver Operating
Characteristic (ROC) curve.
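The logistic equation above can be evaluated directly. The sketch below computes z as a bias plus a weighted sum of the features and passes it through the logistic function; the weights, bias, feature values, and the 0.5 decision threshold are illustrative assumptions, not fitted coefficients from the study.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(features, weights, bias):
    """p(y = 1 | x), where z is the linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Made-up coefficients and inputs for illustration only.
weights = [0.5, -0.25]
bias = 0.0
p_even = predict_proba([1.0, 2.0], weights, bias)  # z = 0
label_high = predict_label([4.0, 0.0], weights, bias)  # z = 2
```

The two branches in `sigmoid` give the same mathematical value; splitting on the sign of z simply keeps `math.exp` from being called with a large positive argument.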

ATTRIBUTES            REPRESENTATION   DESCRIPTION
Gender                Gender           Male or female
Age                   AGE              In years
Urea                  Urea             In mmol per liter (1.8 – 7.1 mmol/liter)
Creatinine            Cr               Normal result: 0.7 – 1.3 mg/dL (61.9 – 114.9 µmol/L)
Glycated Hemoglobin   HbA1C            Normal range is between 4% and 5.6%
Cholesterol           Chol             Less than 200 mg/dL
Blood Pressure        SP               Between 90/60 mmHg and 120/80 mmHg
Triglycerides         TG               Below 150 mg/dL
HDL                   HDL              More than 40 mg/dL
LDL                   LDL              Less than 100 mg/dL
VLDL                  VLDL             2 – 30 mg/dL
BMI                   BMI              Normal range is 18.5 – 24.9
Table 1 Attributes used for the diagnosis

III. EXPERIMENTAL RESULT AND DISCUSSION


In this study, a Support Vector Machine (SVM) algorithm is used to predict the incidence of heart disease in diabetic patients
without using plaque data. The dataset used for this analysis consisted of medical records of diabetic patients with a variety of
clinical and demographic characteristics. Our SVM model showed encouraging results in predicting heart disease in patients with
diabetes. 70% of the dataset was allocated for training and 30% for testing. The model classified patients as having or not having
heart disease with an accuracy of about 85%, a sensitivity of 82%, and a specificity of 88%. These findings imply that SVM may be a
useful technique for risk assessment and early identification in diabetics who are at risk for heart disease.

With an accuracy of about 85%, our SVM model demonstrated its ability to differentiate between diabetic patients with and without
heart disease. Linear regression is designed for regression tasks and fits a linear relationship, so it tends to have lower
accuracy when applied to classification problems such as this one. Its accuracy of about 74.3% in our study reflects these
limitations. Logistic regression, despite being designed for binary classification, reached an accuracy of only about 77.93%. It
performs better than linear regression but still falls short of the SVM.

By using kernel functions, SVM is well suited to identifying intricate, non-linear relationships in the data. SVM is also robust to
outliers due to its margin-based approach: outliers have limited impact on the decision boundary, which makes SVM well suited to
medical datasets that may contain noisy data points. Linear regression can be sensitive to outliers, as it attempts to minimize the
sum of squared errors, which can distort the model's predictions. Logistic regression, like linear regression, can be affected by
outliers, potentially leading to misclassification. SVMs can learn complex, non-linear decision boundaries through kernel
functions, allowing for better separation of patients with and without heart disease. Because logistic regression and linear
regression models are limited to linear decision boundaries, they are less effective on datasets with complex relationships.

Our results show that, for predicting heart disease in diabetic individuals without plaque data, SVM handles complex datasets more
effectively than both logistic regression and linear regression. Because of its strong capacity to deal with non-linearity, its
ability to handle non-linear decision boundaries, and its tolerance of outliers, it is a good choice for these challenging
problems, and it helps provide accurate and dependable predictions in healthcare.

A dataset of 1000 records, all pertinent to people with diabetes, was used to train the machine learning model. Of these, 424
records (cases) were of people with heart disease, and the remaining 576 records (cases) were of people without heart disease. The
records were preprocessed and then used to train the SVM classifier. The model is described by its kernel parameters in the case
of non-linear SVMs and, for linear SVMs, by coefficients that indicate the importance of features. The effectiveness of the model
can be evaluated by analysing metrics like recall, accuracy, precision, and F1-score. The decision boundary can be shown
graphically using visualisation tools, demonstrating how the model classifies data points. In machine learning, the SVM output is
important for detailed analysis, performance assessment, and classification with the model. The data points that are located
closest to the hyperplane are the support vectors. They are essential for calculating the decision boundary and the margin. The
output of the model includes the support vectors and their corresponding labels.

Classification Output: For each new data point, the SVM model yields a classification output that shows which side of the
hyperplane the data point falls on. Based on their location in relation to the hyperplane, data points are assigned to one of the
predefined classes.
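For a kernel SVM, "which side of the hyperplane" is decided by the sign of the decision function f(x) = sum_i c_i K(s_i, x) + b, where the s_i are the support vectors and each c_i is the product of a learned multiplier and the support vector's label. The sketch below evaluates this function with an RBF kernel K(a, b) = exp(-gamma * ||a - b||^2). The support vectors, coefficients, bias, and gamma are made-up illustrative values, not parameters from the trained model.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """Signed score f(x); its sign gives the side of the decision boundary."""
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )

def classify(x, support_vectors, dual_coefs, bias, gamma):
    """Return 1 (positive class) when f(x) >= 0, otherwise 0."""
    return 1 if decision_value(x, support_vectors, dual_coefs, bias, gamma) >= 0 else 0

# Hypothetical two-support-vector model for illustration only.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
dual_coefs = [-1.0, 1.0]  # negative-class and positive-class support vector
bias = 0.0
gamma = 1.0

near_positive = classify((1.9, 2.1), support_vectors, dual_coefs, bias, gamma)
near_negative = classify((0.1, 0.0), support_vectors, dual_coefs, bias, gamma)
```

Only the support vectors enter the sum, which is why they are enough to describe the trained model. The magnitude of f(x) indicates how far a point lies from the boundary in the kernel-induced feature space.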

During training, we experimented with values for the parameters C and γ of the RBF-kernel SVM and checked the accuracy of the
resulting model for every combination of C and γ. A good classifier should remain accurate on data outside the training set, so we
used a technique known as "10-fold cross-validation" to determine the accuracy of the classifier.

We separated the training set into ten equal-sized subsets for 10-fold cross-validation. Each subset in turn was then tested using
a classifier trained on the remaining nine subsets. In this manner, each instance in our training dataset was predicted once. The
cross-validation accuracy is the percentage of all tested instances that were correctly classified. Using cross-validation helps
prevent overfitting.
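The fold rotation described above can be sketched as follows. This is a minimal illustration: `k_fold_indices` and `cross_val_accuracy` are hypothetical helper names, the labels are made up, and a majority-class baseline stands in for the SVM so that the example stays self-contained.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx); each index is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_accuracy(labels, k, fit_and_predict):
    """Fraction of instances predicted correctly across all k test folds."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        predictions = fit_and_predict(train_idx, test_idx)
        correct += sum(1 for i, p in zip(test_idx, predictions) if labels[i] == p)
    return correct / len(labels)

# Made-up labels; a majority-class baseline stands in for the classifier.
labels = [0] * 14 + [1] * 6

def majority_baseline(train_idx, test_idx):
    ones = sum(labels[i] for i in train_idx)
    majority = 1 if ones * 2 > len(train_idx) else 0
    return [majority] * len(test_idx)

accuracy = cross_val_accuracy(labels, 10, majority_baseline)
```

The folds here are contiguous index blocks, so the data should be shuffled beforehand; otherwise ordered labels (as in this toy list) produce folds that contain a single class.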

After conducting trials, we determined that the classifier achieved its best accuracy of 94.60% when using C = 5.0 and γ = 1.0.

The outcomes are listed below. The classifier achieves an overall classification accuracy of 94.60%. For the positive class,
recall is 83.10% and precision is 97.52%. For the negative class, the classifier maintains good recall (99.10%) and high precision
(93.67%). The heatmap for data visualization is given in Figure 5.

Figure 2. Logistic Regression

Figure 4. Linear Regression


Figure 5. SVM Input 1

Figure 6. SVM Input 2

The output of a linear regression offers vital information for understanding and evaluating the model's performance. Each
coefficient shows how much the dependent variable changes when the corresponding independent variable changes by one unit while the
other variables are held constant, thereby quantifying the effect of the independent variables on the dependent variable. As a
baseline, the intercept shows the dependent variable's estimated value when all independent variables are zero. In a linear
regression graph, the X- and Y-axes represent the independent and dependent variables, respectively, and the data are displayed as
dots in a scatter plot. The regression line, commonly expressed as "Y = mx + b", with "m" standing for the slope and "b" for the
intercept, is the best-fit straight line that shows the linear relationship. The actual data points are plotted together with the
regression line, which displays the predicted values. The vertical lines that extend from the data points to the line represent
the errors in the predictions. An optional R-squared value indicates how well the model explains the data. This visual provides an
overview of prediction accuracy, model fit and data relationships.

A support vector machine model performs binary or multi-class classification by producing a decision boundary that maximizes the
margin between classes. Data points are assigned to classes according to which side of the boundary they fall on. Because we are
using high-dimensional data, the SVM model is well suited to this setting. In an SVM model the boundaries are defined by critical
data points from the dataset. Where the classes are clearly distinct, SVMs serve as an effective tool for classification tasks.
Using a support vector machine is beneficial here, as it can be highly efficient in predicting heart disease in diabetic
individuals. First, a support vector machine can manage high-dimensional data effectively, which is essential when working with
medical datasets that have multiple features.
We have included a variety of clinical characteristics in our study, such as age, gender, cholesterol, blood pressure, and glucose
levels. Secondly, we have not used plaque information in the model. Without a doubt, plaque information is essential in evaluating
the risk of heart disease. Nevertheless, removing it from the model offers a more practical, effective and economical method of
predicting heart disease in people with diabetes.

This is especially important in healthcare settings with limited resources because plaque assessment may not always be available.
Our study is not without limitations, though. It is imperative to conduct additional validation on larger and more diverse datasets as
the used dataset might not accurately reflect the entire diabetic population. The selection of hyperparameters can also affect the
model's performance, and fine-tuning could increase its accuracy even more. In Figures 8 and 9, various inputs are provided for the
SVM algorithm.
IV. CONCLUSION
Diabetic individuals have a significant risk of developing heart disease. Classifiers of this kind can help in the early
identification of heart disease risk in diabetic patients. In the future, machine learning techniques will be increasingly utilised
to enhance the analysis of heart disorders and enable early disease prediction. This will help reduce the number of deaths by
raising awareness of these diseases, and patients can be warned to alter their lifestyles. In this way, diabetic patients can be
better protected from heart disease, which will lower their mortality rates and save the state money on medical expenses. SVMs have
been studied using the ROC curve for both training and testing data, and they have been shown to be a classification technique with
excellent predictive performance. Therefore, this SVM model is recommended for classification of the diabetic dataset. This study
examined the effectiveness of Linear Regression, Logistic Regression and Support Vector Machine (SVM) models in predicting heart
disease in individuals who have diabetes, with 70% of the dataset allocated for training and 30% for testing. Owing to its ability
to capture complicated relationships, handle outliers efficiently and produce detailed decision boundaries, SVM beats the other two
algorithms in identifying the risk of heart disease in diabetic patients. Compared to the linear regression and logistic regression
models, which had accuracies of around 74.3% and 77.93%, respectively, our SVM model achieved an accuracy of about 85.12%. Because
of this performance, the SVM is an effective tool for risk assessment and early identification, especially in healthcare settings
with limited resources, where it can be difficult to obtain precise aggregated information. Our research suggests that SVM is
highly capable in such cases. However, SVMs have some drawbacks: they may struggle with large datasets and can be sensitive to the
choice of kernel function. In addition, tuning parameters for optimal performance can be challenging, making SVMs less
straightforward for prediction in complex medical scenarios. To verify and enhance these results, more study employing larger and
more diverse datasets is required. Nevertheless, the results of the research highlight the ability of machine learning methods,
especially the SVM model, to improve healthcare diagnosis by assisting in the early detection of heart disease in individuals with
diabetes, which can help improve patients' health and reduce medical expenses.
REFERENCES

[1] Han, J. and Kamber, M., 2006. Data Mining: Concepts and Techniques. A. Stephan. San Francisco, Morgan Kaufmann
Publishers is an imprint of Elsevier.
[2] Larose, D.T. and Larose, C.D., 2014. Discovering knowledge in data: an introduction to data mining (Vol. 4). John Wiley &
Sons.
[3] Sutar, J.E. and Kumbhar, V.S., An Overview of Data Mining Techniques with Applications. FUTURISTIC TRENDS IN
INFORMATION TECHNOLOGY, p.177.
[4] World Health Organization, 2006. Definition and diagnosis of diabetes mellitus and intermediate hyperglycaemia: report of a
WHO/IDF consultation.
[5] Pereira, M.D.G., 2016. Beyond life style interventions in type 2 diabetes. Revista Latino Americana de Enfermagem, 24.
[6] Heikes, K.E., Arondekar, B., Eddy, D.M. and Schlessinger, L., 2008. Diabetes Risk Calculator: a simple tool for detecting
undiagnosed diabetes and pre-diabetes. Diabetes Care, 31(5), pp.1040-1045.
[7] Fox, C.S., Coady, S., Sorlie, P.D., D’Agostino Sr, R.B., Pencina, M.J., Vasan, R.S., Meigs, J.B., Levy, D. and Savage, P.J.,
2007. Increasing cardiovascular disease burden due to diabetes mellitus: the Framingham Heart Study. Circulation, 115(12),
pp.1544-1550.
[8] Emerging Risk Factors Collaboration, 2010. Diabetes mellitus, fasting blood glucose concentration, and risk of vascular
disease: a collaborative meta-analysis of 102 prospective studies. The lancet, 375(9733), pp.2215-2222.
[9] Srinivas, K., Rao, G.R. and Govardhan, A., 2010, August. Analysis of coronary heart disease and prediction of heart attack in
coal mining regions using data mining techniques. In 2010 5th International Conference on Computer Science & Education
(pp. 1344-1349). IEEE.
[10] Djerioui, M., Brik, Y., Ladjal, M. and Attallah, B., 2019. Neighborhood component analysis and support vector machines for
heart disease prediction. Ingénierie des Systèmes d Inf., 24(6), pp.591-595.
