
ANDALAS JOURNAL OF ELECTRICAL AND ELECTRONIC ENGINEERING TECHNOLOGY - VOL. XX NO. XX (20XX) XXX-XXX

Available online at: http://ajeeet.ft.unand.ac.id/

Andalas Journal of Electrical and Electronic Engineering Technology
ISSN 2777-0079


Classification of Diabetes Using Naïve Bayes, K-Means and K-Nearest Neighbor (KNN) Methods

Rudini¹
¹Department of Electrical Engineering, Engineering Faculty of Andalas University, Padang 25163, Indonesia

ABSTRACT

Type 2 diabetes mellitus (T2DM) is a prevalent chronic metabolic disease, affecting around 422 million people
globally. Characterized by chronic hyperglycemia due to disorders of insulin secretion and insulin action, T2DM
accounts for 90-95% of all diabetes cases. Major risk factors include obesity, heredity, age, an inactive lifestyle,
and a high-calorie, high-fat diet. Although age is a significant T2DM risk factor, it is often not included as an
independent criterion for screening, leading to underdiagnosis in the elderly. The pathogenesis involves insulin
resistance and pancreatic beta-cell dysfunction, leading to various complications such as cardiovascular disease
and neuropathy. Beyond BMI, waist circumference and waist-to-hip ratio are also crucial indicators of T2DM risk.

In diagnosing stroke, data mining techniques such as Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN)
are used. These methods were applied to the Brain Stroke Prediction Dataset from Kaggle, consisting of 4981
data points. Data preprocessing ensures high-quality input for model evaluation, and nominal and ordinal data
improve the model's accuracy. Naïve Bayes achieved a training accuracy of 80.35% and a test accuracy of 93%,
while K-Means showed varying accuracies. The results indicate that Naïve Bayes and K-Means are more
suitable than KNN for diagnosing diabetes.

Keywords : Type 2 diabetes mellitus (T2DM), Obesity, Naïve Bayes, K-Means, K-Nearest Neighbor (KNN),
Model accuracy

INTRODUCTION

Type 2 diabetes mellitus (T2DM) is one of the most common chronic metabolic diseases worldwide, affecting approximately 422 million people according to the World Health Organization (WHO) report. This disease is characterized by chronic hyperglycemia due to disorders in insulin secretion, insulin action, or both. T2DM accounts for about 90-95% of all diabetes cases. With the increasing global population and lifestyle changes, the prevalence of T2DM continues to rise, making it a significant public health issue.

The primary risk factors for developing T2DM include obesity, heredity, age, an inactive lifestyle, as well as a high-calorie and high-fat diet. Obesity, usually measured by body mass index (BMI), is a significant risk factor. However, age also plays a crucial role in the development of this disease. As age increases, the risk of T2DM also rises, even in individuals with a normal BMI. Nevertheless, many clinical guidelines do not include age as an independent criterion for diabetes screening, which can lead to underdiagnosis in the elderly population. The pathogenesis of T2DM involves a combination of insulin resistance and pancreatic beta-cell dysfunction. In the early stages, the body attempts to compensate for insulin resistance by increasing insulin production. However, over time, the ability of beta cells to secrete insulin decreases, resulting in hyperglycemia. Chronic hyperglycemia can lead to various serious complications, including cardiovascular disease, diabetic nephropathy, retinopathy, and neuropathy. Recent research indicates that besides BMI, other body parameters such as waist circumference and waist-to-hip ratio are also crucial indicators for assessing the risk of T2DM and other chronic conditions. Using BMI alone has limitations in measuring body fat distribution and overall body composition, so a combination of several measurements can provide a more accurate risk assessment.
METHOD

Classification and clustering are closely related to grouping. Grouping is the determination of similar data groups, while studying the structure of a data set that has been partitioned into groups referred to as categories or classes. The classification process using the Naïve Bayes method and clustering using K-Means and K-Nearest Neighbor (KNN) will be explained in the methodology section, which includes the output in the form of model evaluation results to measure the success of the system in diagnosing stroke disease.

2.1. Methodology
The method used in diagnosing stroke disease is applied to patient data that has been previously tested. The data mining techniques applied to the dataset aim to diagnose stroke disease early so that prevention and faster treatment can be carried out. The data mining techniques used are Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN). The system method is shown in Figure 1.

2.2. Data Collection and Preprocessing
The initial stage of the data mining process is data collection and preprocessing. Data preprocessing is the critical stage, because only valid data will produce accurate output. Data preprocessing is carried out to turn raw data into efficient, high-quality data before proceeding to the next process. The dataset used contains several inconsistent records, such as empty values in a variable. The preprocessing stage also involves converting the data type of a field to match the other data types.

2.3. Database
The dataset is stored in a database obtained from Kaggle (Brain Stroke Prediction Dataset) with a total of 4981 data points consisting of 10 variables and one output target, which is stroke or non-stroke [13]. After obtaining the dataset, data mining techniques are applied to diagnose stroke disease.

2.4. Data Mining Techniques
In this system's data grouping, three data mining techniques are used: Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN). The grouped data is called the training dataset. Using this data, data collection and preprocessing are carried out to obtain the testing data, making it possible to apply stroke diagnosis. The Naïve Bayes, K-Means, and KNN algorithms find the relationship between predictor values and target values. The model learns from the training set, and this knowledge is then evaluated on the test data for prediction.

2.4.1. Nominal and Ordinal Data
Nominal and ordinal data are types of data commonly used in the data mining process.
Nominal data is categorical data whose categories have no order or ranking. Examples include gender (male or female), marital status (single, married, divorced), or occupation (doctor, teacher, engineer). Nominal data only provides information about the category without indicating differences or order among the categories [14].
Ordinal data is categorical data that has an order or ranking. Examples include education level (elementary, middle school, high school, bachelor's, master's, doctorate), satisfaction level (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied), or disease severity (mild, moderate, severe). Ordinal data provides information about the categories as well as the order or ranking among them, but the distances between the ranks need not be equal or measurable.
The use of nominal and ordinal data in data mining techniques is crucial because it provides additional information that can improve the accuracy of the model. In the classification and clustering process, nominal and ordinal data are used to identify patterns and relationships between variables that can be used to make more accurate predictions or groupings.

2.4.2. Naïve Bayes
Naïve Bayes is a classification method that uses simple probabilities by calculating a set of probabilities that summarize the combinations of values and frequencies in the given dataset. Naïve Bayes has the advantage of being easy to build, as it does not require complicated parameter estimation; it is easy to apply to large datasets, and the classification results are easily interpreted by laypeople.
The Naïve Bayes method is written using Equation 1:

P(A|B) = P(B|A) · P(A) / P(B)    (1)

Where:
• P(A): prior probability of hypothesis A
• P(B): prior probability (evidence) of condition B
• P(A|B): posterior probability of hypothesis A given condition B
• P(B|A): likelihood of condition B given hypothesis A

The classification process using the Naïve Bayes algorithm for diagnosing stroke disease is illustrated in the flowchart in Figure 2.
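To make Equation 1 concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a frequency-based Naïve Bayes classifier over made-up nominal records; the feature names and labels are invented for illustration only:

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(class) and P(feature=value | class) by counting frequencies."""
    priors = Counter(labels)
    likelihoods = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            likelihoods[(i, y)][v] += 1
    return priors, likelihoods

def predict(priors, likelihoods, row):
    """Score each class with P(class) * prod P(value | class), take the argmax."""
    total = sum(priors.values())
    best, best_score = None, -1.0
    for y, count in priors.items():
        score = count / total  # prior P(A)
        for i, v in enumerate(row):
            score *= likelihoods[(i, y)][v] / count  # frequency-based likelihood
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical nominal records: (gender, smoker) -> indicated / not indicated
rows = [("male", "yes"), ("male", "no"), ("female", "no"),
        ("female", "yes"), ("male", "yes")]
labels = ["indicated", "not", "not", "not", "indicated"]
priors, likelihoods = train_naive_bayes(rows, labels)
print(predict(priors, likelihoods, ("male", "yes")))
```

In practice a smoothing term is usually added so that an unseen feature value does not force a class score to zero; the sketch omits it for brevity.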
2.4.3. K-Means
K-Means clustering is a non-hierarchical clustering method that partitions objects based on their characteristics. Objects with similar characteristics are grouped into the same cluster, while objects with different characteristics are grouped into another cluster. The steps of the K-Means algorithm are as follows:
a. Determine the number of clusters, K.
b. Determine the initial centroid values randomly.
c. Determine the data closest to each centroid using the Euclidean distance formula shown in Equation 2.

D(x,y) = √((xi − si)² + (yi − ti)²)    (2)

Where D(x,y) is the distance from data x to cluster center y, xi and yi are the centroid coordinates, and si and ti are the data records.
d. Group the data based on the closest distance to the centroid.
e. Return to step c (iteration) if the members of any cluster change from the previous iteration. Before recalculating using Equation 2, recalculate the centroid values using the formula shown in Equation 3.

Sl = (1/Zl) Σn tnl    (3)

Where Sl is the new cluster average (centroid), Zl is the number of data points in the l-th cluster, and tnl is the n-th pattern that is part of the l-th cluster [16].

The clustering process using the K-Means algorithm for diagnosing stroke disease is illustrated in the flowchart in Figure 3.

2.4.4. K-Nearest Neighbor (KNN)
K-Nearest Neighbor (KNN) is a classification method that groups data based on proximity or similarity to existing data. This method is simple yet effective, especially when applied to datasets that are not too large [17]. The steps of the KNN algorithm are as follows:
a. Determine the number of nearest neighbors (the K value).
b. Calculate the distance between the data to be classified and all data in the training set using the Euclidean distance formula.
c. Sort these distances and select the K closest data points.
d. Determine the class of the new data based on the majority class of the K nearest neighbors.

The classification process using the KNN algorithm for diagnosing stroke disease is illustrated in the flowchart in Figure 4.
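The steps of both algorithms above can be sketched in a few lines of plain Python; this is an illustrative toy (the 2-D points, initial centroids, and class labels are made up, not taken from the dataset):

```python
import math
from collections import Counter

def euclidean(p, q):
    """Equation 2: straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, iters=10):
    """Steps a-e: assign each point to its nearest centroid, then move each
    centroid to the mean of its members (Equation 3), and repeat."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def knn_predict(train, labels, query, k=3):
    """KNN steps a-d: sort training points by distance, vote among the K nearest."""
    nearest = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # two centroids, one near each group of points
print(knn_predict(points, ["low", "low", "high", "high"], (8.5, 8.0), k=3))
```

The empty-cluster guard (`if c else centroids[i]`) is one common convention; other treatments (re-seeding the centroid) are equally valid.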

2.4.5. Confusion Matrix
This method is used to determine how well the data mining methods perform. Performance measurement using the confusion matrix consists of a representation of the classification process, namely True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The confusion matrix is shown in Figure 5.

In Table 1, the rows of the confusion matrix represent the predicted class values, while the columns represent the actual class values, where:
• TP (True Positive): the number of correctly predicted positive data points.
• TN (True Negative): the number of correctly predicted negative data points.
• FP (False Positive): the number of data points incorrectly predicted as positive.
• FN (False Negative): the number of data points incorrectly predicted as negative.

Based on the confusion matrix above, the performance of the classification can be evaluated using the following calculations:
1. Accuracy: measures how many correct predictions the model made over the entire test dataset.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

2. Precision: measures how many of the predicted positive cases are actually positive, indicating how reliable the model's positive predictions are.

Precision = TP / (TP + FP)

3. Recall (Sensitivity): tells us how many of the actual positive cases the model predicts correctly. A low recall means many actual positive cases are missed and counted as FN.

Recall = TP / (TP + FN)

4. F1-Score: the harmonic mean of Precision and Recall, providing a combined insight into these two metrics. It is maximal when precision equals recall.

F1-Score = 2 · (Precision · Recall) / (Precision + Recall)
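The four formulas can be checked with a short script; the counts used here are taken from the Naïve Bayes test-set confusion matrix reported in Section 3.1 (TP = 47, FN = 3, FP = 4, TN = 46):

```python
def evaluate(tp, tn, fp, fn):
    """Compute the four confusion-matrix metrics from Section 2.4.5."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts from the Naïve Bayes test-set confusion matrix in Section 3.1
acc, prec, rec, f1 = evaluate(tp=47, tn=46, fp=4, fn=3)
print(f"ACC={acc:.0%}  Precision={prec:.0%}  Recall={rec:.0%}  F1={f1:.0%}")
# → ACC=93%  Precision=92%  Recall=94%  F1=93%
```

These values match the accuracy, precision, and recall reported for the Naïve Bayes test data.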
RESULTS AND DISCUSSION

3.1 Results
In this diabetes system, analysis has been carried out using the data mining methods Naïve Bayes, K-Means, and KNN. The following are the results obtained after performing the processes according to the methods.

Training Data Overview

NO Age BMI Chol TG HDL LDL Cr BUN
1 50 24 4,2 0,9 2,4 1,4 46 4,7
2 26 23 3,7 1,4 1,1 2,1 62 4,5
3 33 21 4,9 1 0,8 2 46 7,1
4 45 21 2,9 1 1 1,5 24 2,3
5 50 24 3,6 1,3 0,9 2,1 50 2
6 48 24 2,9 0,8 0,9 1,6 47 4,7
7 43 21 3,8 0,9 2,4 3,7 67 2,6
8 32 24 3,8 2 2,4 3,8 28 3,6
9 31 23 3,6 0,7 1,7 1,6 55 4,4
10 33 21 4 1,1 0,9 2,7 53 3,3
11 30 22 4,9 1,3 1,2 3,2 42 3
12 45 23 4,2 1,7 1,2 2,2 54 4,6
13 50 24 4 1,5 1,2 2,2 39 3,5
14 50 21 3,6 1,1 1 2,1 74 5,5
15 50 21 5,3 0,8 1,1 4,1 53 5,9
16 49 24 5 1,3 1,2 3,3 28 2,2
17 49 23 4,4 0,9 1 1,3 55 3,8
18 49 24 0,5 1,9 1,3 2,8 175 13,5
19 42 24 6,2 1 1,1 4,6 73 5
20 33 24 4,2 1,5 1,2 2,3 62 5,3
21 50 23 4,8 1 0,9 3,6 28 4
22 39 21 4,6 1,3 1 3 55 3,2
23 30 22 4,9 1,3 1,2 3,2 45 3
24 30 19 5,5 1,8 1,2 3,5 80 4,8
25 41 22 2,8 2,9 0,8 3,8 99 4,2
26 33 22 3,7 1,3 0,8 2,4 54 2
27 44 22 5,6 1,4 1,4 3,6 49 4,3
28 48 23 3,2 1,8 1,6 0,9 82 7,5
29 47 24 4,6 0,8 0,9 4,2 55 4,6
30 36 22 3,6 0,7 1,3 1,9 70 3,8
31 47 23 6,5 1,5 0,9 4,9 67 5,6
32 39 22 4 0,6 1,1 2,6 67 6
33 38 21 3 1,2 0,6 2 20 2
34 46 21 3,7 1,3 0,8 2,4 54 2
35 44 24 3,7 1,5 1,1 2 74 3,8
36 41 23 4,4 1,6 0,8 3 48 3,4
37 45 24 3,8 0,7 1,3 2,2 31 4,7
38 44 21 9,5 1,7 1,3 2,5 39 3
39 43 21 4,7 1,9 1,4 2,6 44 4,1
40 33 24 4,2 1,4 1,3 2,6 47 2,7
41 49 21 5,2 1,1 0,9 1,4 74 5,7
42 31 23 4,3 2,1 1 2,4 67 3,6
43 49 24 4,7 1,8 0,7 3,3 60 3,8
44 47 22 4,7 1,6 1 2,4 48 3,7
45 44 24 4,9 1,7 1,4 2,8 49 3,1
46 45 22 4,3 0,9 1,1 4,2 57 4
47 43 21 4,8 1,9 1,1 3 40 2,4
48 47 20 4,1 0,7 1,7 2,8 53 3,3
49 50 24 3,9 0,7 2,3 0,3 69 4,4
50 50 23 3,2 0,8 1,2 1,7 52 5,4
51 49 22 3,1 1,3 1 2,5 52 2,3
52 50 22 3,4 0,7 1,1 2 32 4,5
53 49 21 4,2 0,8 0,9 3 38 2,1
54 44 21 5,2 1,5 1 3,5 42 3,9
55 48 23 3,6 0,6 2,1 1,2 55 2,8
56 47 23 4 1,3 0,9 2,6 45 4
57 47 24 4,3 2,3 0,9 2,4 85 7,6
58 44 21 4,9 2,8 2 1,8 64 6,8
59 35 22 3,8 5,9 0,5 4,3 38 3,9
60 40 23 4,8 2,5 1,1 2,7 63 5
61 35 20 4,7 2,5 1,3 2,4 50 2,8
62 42 21 3 1,1 1,1 1,4 45 2,1
63 59 23 4,9 1,2 0,9 3,4 38 5,2
64 31 22 4,4 1,8 1,1 2,6 6 3,9
65 40 22 7,6 1,3 0,9 3,4 40 4,7
66 41 21 3,2 4,5 1,3 1,8 48 3,8
67 41 21 3,4 1,2 1,7 1,1 39 2
68 43 24 4,1 1,1 1,2 2,4 54 4
69 44 21 3,4 1,3 1,3 1,5 56 4,4
70 59 24 6,3 0,6 1,1 4,9 58 4,7
71 35 23 4,1 1,9 4 1,3 44 3,3
72 51 20 4,1 1,5 0,9 2,7 88 4,5
NO Age BMI Chol TG HDL LDL Cr BUN
73 30 24 6,5 1,8 1,5 4,2 61 3,6
74 50 21 4,4 2,7 1,3 3,1 61 6
75 57 22 3,2 1,3 0,9 3 97 4,6
76 50 22 4,5 1,2 1,8 4,1 88 6,3
77 35 24 4,3 1,3 0,8 1,3 61 3,6
78 63 20 4,8 1,7 1,1 3 106 6,6
79 36 20 4,9 2,5 0,9 1,9 70 3,3
80 50 21 2 1,2 1,3 3 61 5,8
81 25 22 4,3 3,5 0,8 1,3 35 10
82 40 24 4,6 1,5 0,7 3 123 5,8
83 40 22 4,3 0,8 0,8 1,8 79 6,3
84 50 21 3,2 1,8 1,6 0,9 97 5,5
85 30 24 3,9 1,6 0,9 3,3 79 5,5
86 50 22 3,8 5,9 0,5 4,3 203 9,6
87 60 24 3,4 5,3 1,1 3,6 70 7,5
88 77 24 3,9 2,1 1,2 4,2 106 5
89 44 21 5,2 1,9 2,5 3 132 7,3
90 40 24 3,1 1,6 1,1 1,3 159 22
91 54 20 4,3 2 1,3 2,2 106 6,3
92 50 24 3,7 0,9 1,2 2,7 70 3,3
93 60 24 3,4 5,3 1,1 3,6 70 7,5
94 77 19 0 2,8 0,8 1,8 106 5
95 59 22 4,5 1,8 1,8 1,8 58 4,7
96 38 24 4,5 1,7 0,9 2,8 83 6,1
97 34 23 6,2 3,9 0,8 1,9 81 3,9
98 34 23 6,2 3,9 0,8 3,8 81 3,9
99 31 24 4,9 1,6 1 3,2 55 3,4
100 43 25 4,7 5,3 0,9 1,7 55 2,1
101 42 23 5,9 3,7 1,3 3,1 53 5,4
102 47 23 3,7 1,8 1 2 87 4,1
103 50 24 4 3 1 1,8 59 4,3
104 49 25 2 0,8 0,6 1 74 5
105 50 25 4,2 2,2 0,8 2,5 53 4,7
106 49 24 4 2,1 1,4 1,9 59 3,5
107 49 21 5,6 1,9 0,75 1,35 44 3,3
108 50 24 2 0,8 0,6 1 74 5
109 49 23 5,6 1,9 0,75 1,35 44 3,3
110 39 22 4,7 1,3 1,1 3,1 38 3
111 50 24 4 2,4 1 1,8 59 4,3
112 39 24 4,7 1,3 1,1 3,1 46 3
113 49 24 3,6 2,4 1,9 1,1 75 3,1
114 50 25 2 0,8 0,6 1 74 5
115 33 25 4,8 1,1 1,7 2,6 64 4,8
116 50 25 5,3 1,3 1 3,7 62 4,8
117 30 22 5,4 1,7 1,4 3,3 53 5,7
118 50 19 5,3 1,3 1 3,7 62 4,8
119 50 25 5,4 1,7 1,4 3,3 53 5,7
120 49 25 4,8 1,4 0,7 3,9 60 4,6
121 33 19 4,2 1,4 1,3 2,6 47 2,7
122 49 24 4,8 1,1 1,7 2,6 46 4,8
123 50 24 4,2 2,2 0,8 2,5 53 4,7
124 50 25 4 2,1 1,4 1,9 59 3,5
125 55 24 3,6 3 1,5 0,8 60 4,8
126 40 30 2,1 2,3 0,9 2,8 52 2,1
127 40 31 6,5 3,8 1 3,9 64 3,4
128 35 32 4 2,5 1,3 2,3 37 4,4
129 41 21 4,7 5,3 0,9 1,7 62 5,9
130 43 29 4,3 1,8 1,6 1,9 60 4,4
131 30 21 4,9 1,6 1,7 2,5 344 17,1
132 54 28 4,4 2,9 0,6 2,5 88 4
133 30 19 4,2 1,7 1,2 2,2 97 6
134 31 37 4,1 2,2 0,7 2,4 60 3
135 30 27 4,1 1,1 1,2 2,4 81 7,1
136 45 34 4,8 1,3 0,9 3,3 63 4,1
137 45 29 3,9 1,5 1,3 2 77 5,3
138 31 24 4,9 1,6 1 3,2 55 3,4
139 30 34 4,5 1,8 1,2 2,6 80 5
140 35 27 3,7 1 1,2 2 64 4,8
141 45 31 4,7 1,8 0,8 3,1 82 4,8
142 45 22 6,1 3,7 0,7 3,9 80 3,6
143 50 29 4,4 2 1 2,5 56 4
144 48 23 4,4 2,3 1,3 2,2 38 4
145 38 40 5,3 2 1,6 2,9 59 5,8
146 46 24 5,7 3,8 1,3 2,8 59 3
147 45 25 4,4 1,5 1 2,8 42 2,3
148 54 22 9,5 1,7 1,3 2,5 39 3
149 43 21 5,9 2 1,1 3,9 62 5,4
150 49 24 5,1 1,7 3,9 0,8 65 3,9
151 49 25 6 3,5 1,1 3,5 56 3,8
152 45 24 5,9 1,8 1,6 3,5 54 3,1
153 47 23 6,3 2,2 1,1 2,8 65 3,5
154 38 47 5,2 2 1,1 3,2 67 4
NO Age BMI Chol TG HDL LDL Cr BUN
155 42 25 4,7 2,5 1,3 2,4 39 2,8
156 39 25 6,7 2,5 1,1 4,5 49 4,3
157 30 25 5,5 1,8 1,2 3,5 80 4,8
158 40 24 5 2,1 1,6 3 76 5,9
159 46 24 6,8 0,7 1,7 4,7 47 4,4
160 45 25 2,5 2,2 1 0,6 49 3,7
161 33 21 2,4 1,9 0,8 2,5 76 3,3
162 40 40 4 1,8 0,9 2,4 72 4,3
163 40 28 4,4 1,4 1,3 2,5 74 7,1
164 50 23 5,2 2,1 1,1 3,2 67 7,7
165 63 32 5,8 1,7 1,7 3,4 96 6,6
166 44 23 6,2 2,3 1,2 4,1 64 6,8
167 49 25 4,2 1,1 1,1 2,7 53 4,3
168 42 22 5,6 2,1 0,9 3,8 91 4,6
169 44 25 5,3 1,8 0,9 3,6 32 4
170 33 31 3,7 1,2 1,6 1,5 31 1,8
171 48 25 4,4 2,3 1,3 2,2 38 4
172 57 37 4 6 2,5 3,5 370 4,6
173 47 23 5,3 2,3 0,7 3,7 68 5,1
174 57 37 6,1 6 2,5 3,5 370 4,6
175 33 24 6,2 3,8 0,8 3,7 56 4,6
176 33 23 6,8 3,1 1 3,9 48 5,7
177 34 21 5,1 1,2 1,4 0,9 80 7,7
178 43 23 6,2 3,2 1 3,9 42 3,2
179 28 24 5,3 3,2 0,8 0,8 73 4,1
180 47 24 7 2,8 0,9 4,9 62 5,8
181 39 20 7,1 1,5 1,2 4,1 80 9,1
182 39 25 4,4 1,7 2,8 0,7 45 4,2
183 49 23 6,6 3,8 1 4,1 23 2,2
184 50 24 6,3 4,4 1 3,6 106 2,6
185 56 35 4,8 1,7 1,3 2,8 92 8,5
186 51 31 3,8 3,8 1 1,1 65 7,3
187 52 33 3,8 3,2 0,8 1,7 60 3
188 56 39 4,1 1,5 0,8 1,7 44 3,4
189 59 38 4,2 2 2,3 0,9 91 3,8
190 54 37 3,1 1,1 3,1 1,2 52 4,3
191 69 33 5,4 1,3 1,7 3,1 71 5,9
192 60 26 4,7 2,3 1,4 1,6 76 6,6
193 54 32 5,4 1,3 1,7 3,1 71 5,9
194 57 33 5,5 1,9 1 3,7 77 2
195 55 33 5,6 4,6 0,8 2,9 76 5
196 60 27 7,2 2,2 0,8 2,2 45 2
197 60 27 7,2 2,2 1 2,2 45 2
198 73 27 5,3 1,4 1,5 3,2 79 4,3
199 61 28 4,1 4,2 1,2 1,4 23 2,1
200 51 32 3,5 1,8 1,8 1,95 70 6,5
201 55 31 4,5 1,5 1,2 2,7 64 4,16
202 55 30 4,6 1,7 1 2,9 52 2,7
203 73 28 5,3 1,4 1,5 3,2 79 4,3
204 63 29 5,9 2,2 1,2 3,7 93 8,7
205 52 31 2,7 1,2 0,8 1,4 76 6
206 55 30 4,1 2,7 1 2 46 2,1
207 51 36 4,1 2,7 1 2 46 2,1
208 57 34 4,5 1,6 2,1 1,9 72 4,8
209 55 31 4,5 1,8 1,1 2,7 78 5,1
210 58 33 6,6 2,9 1,1 4,3 800 20,8
211 60 26 4,4 2,1 1,1 2,5 72 6
212 56 26 4,7 1,3 0,9 3,3 60 3,5
213 61 38 2,6 1,1 0,9 1,6 92 5,7
214 73 34 4,2 1,9 1,95 9,9 67 4,3
215 55 35 4,3 1,5 1 2,6 46 3,8
216 60 37 4,7 1,3 0,9 3,3 60 3,5
217 53 39 5,4 3,8 1,9 3 68 4,5
218 54 33 3,8 1,7 1,1 3 67 5
219 61 38 2,6 1,1 0,9 2 92 5,7
220 54 33 2 1,9 0,9 2,5 25 1,2
221 66 26 4,2 1 1,4 2,4 46 3,2
222 52 33 6 1,2 1,1 2 34 2
223 61 29 4,4 2 1 2,5 56 4,3
224 55 29 4,1 1 1,1 2,1 44 2,9
225 56 32 4,9 2,5 0,5 3,4 33 3,2
226 55 30 5,2 1,8 1,3 3,2 85 5,4
227 56 30 4,1 0,6 1,3 1,4 45 4
228 66 30 3,6 5,1 0,9 2,5 63 4,1
229 66 33 5,8 3,3 1 3,4 146 14,1

1. Results using Naïve Bayes
Diabetes classification with the Naïve Bayes method based on the data obtained produces a training data accuracy of 80.35% and a test data accuracy of 93%. The confusion matrices and evaluation results are shown in the tables below.
A. Training Data

Confusion Matrix
                                Predicted Class
Actual Class                Indicated disease (1)   Not indicated (0)   Total
Indicated disease (1)               68                     28             96
Not indicated disease (0)           17                    116            133
Total data: 229

Evaluation metrics
ERR    ACC    TPR    FPR    Recall    Precision    F1-Score
19%    80%    70%    12%    70%       80%          75%

B. Test Data

Confusion Matrix
                                Predicted Class
Actual Class                Indicated disease (1)   Not indicated (0)   Total
Indicated disease (1)               47                      3             50
Not indicated disease (0)            4                     46             50
Total data: 100

Evaluation metrics
ERR    ACC    TPR    FPR    Recall    Precision    F1-Score
7%     93%    94%    8%     94%       92%          93%

2. Results using K-Means
Diabetes classification using the K-Means method based on the training data produces a not-indicated accuracy of 75% and an indicated accuracy of 44%. On the test data, the not-indicated accuracy was 78% and the indicated accuracy was 92%. The confusion matrices and evaluation results are shown in the tables below.

A. Training Data

                       Predicted Class
Actual Class       Not indicated   Indicated   Total
Not indicated           100            33        133
Indicated                54            42         96
Total data: 229

No   Category        Samples   TP Rate   FP Rate   Accuracy
1    Not indicated     133      0,75      0,25       75%
2    Indicated          96      0,44      0,56       44%

B. Test Data

                       Predicted Class
Actual Class       Not indicated   Indicated   Total
Not indicated            39            11         50
Indicated                 4            46         50
Total data: 100

No   Category        Samples   TP Rate   FP Rate   Accuracy
1    Not indicated      50      0,78      0,22       78%
2    Indicated          50      0,92      0,08       92%
Discussion

Based on the results obtained after testing the three data mining methods for diagnosing diabetes, accuracy values were obtained for each of the three methods: Naïve Bayes, K-Means, and KNN. The accuracy, precision, and recall values for Naïve Bayes and K-Means are higher than those for KNN. This shows that the more suitable methods for diagnosing diabetes are Naïve Bayes and K-Means.

CONCLUSION

It can be concluded that diabetes diagnosis using an intelligent system can be performed with various methods, including Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN). Naïve Bayes uses probability to group existing data, while K-Means uses the distance between each data point, and KNN uses the closest distance from the selected data. To evaluate the accuracy, precision, and recall of these methods, a confusion matrix is needed. The results and discussion show that the accuracy of the Naïve Bayes and K-Means methods is much higher than that of the KNN method. Therefore, it can be concluded that the Naïve Bayes and K-Means methods are more suitable for implementing intelligent systems for diagnosing diabetes.

ACKNOWLEDGMENT

The author would like to thank the various parties who played a role in preparing this article. With great respect, sincerity, and humility, the author thanks:
1. Prof. Dr. Eng. Ir. Muhammad Ilhamdi Rusydi, S.T., M.T., the supervisor, who provided much input and many suggestions regarding the writing of this article.
2. The author's parents, siblings, and extended family, who provided many prayers and much encouragement until the completion of this article.
3. Fellow Andalas University students, who provided support and motivation to the author.
4. All parties who helped complete this article, whose names the author cannot mention one by one. May Allah bestow His mercy and guidance on them.
May Allah SWT reward all your guidance, help, and support. Amen.
REFERENCES

1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7056531/
2) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8920809/
3) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316578/
4) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054063/
5) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10724412/
6) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5586853/
7) https://www.ncbi.nlm.nih.gov/books/NBK507821/
8) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10663898/
9) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8173137/
10) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6444850/
11) Berl T, Schrier RW. Disorders of water metabolism. In: Schrier RW, editor. Renal and Electrolyte Disorders. 6th ed. Philadelphia: Lippincott Williams and Wilkins; 2002. pp. 1-63.
12) Dossetor JB. Creatininemia versus uremia. The relative significance of blood urea nitrogen and serum creatinine concentrations in azotemia. Ann Intern Med. 1966;65:1287-1299. doi: 10.7326/0003-4819-65-6-1287.
13) Hosten AO. BUN and creatinine. In: Walker HK, Hall WD, Hurst JW, editors. Clinical Methods: The History, Physical, and Laboratory Examinations. 3rd ed. Boston: Butterworths; 1990. pp. 874-878.
14) Kalim S, Karumanchi SA, Thadhani RI, Berg AH. Protein carbamylation in kidney disease: pathogenesis and clinical implications. Am J Kidney Dis. 2014;64:793-803. doi: 10.1053/j.ajkd.2014.04.034.
15) Lau WL, Vaziri ND. Urea, a true uremic toxin: the empire strikes back. Clin Sci (Lond). 2017;131:3-12. doi: 10.1042/CS20160203.
16) Vanholder R, Gryp T, Glorieux G. Urea and chronic kidney disease: the comeback of the century? (in uraemia research). Nephrol Dial Transplant. 2018;33:4-12. doi: 10.1093/ndt/gfx039.
17) Cauthen CA, Lipinski MJ, Abbate A, Appleton D, Nusca A, Varma A, et al. Relation of blood urea nitrogen to long-term mortality in patients with heart failure. Am J Cardiol. 2008;101:1643-1647. doi: 10.1016/j.amjcard.2008.01.047.
18) Matsushita K, Kwak L, Hyun N, Bessel M, Agarwal SK, Loehr LR, et al. Community burden and prognostic impact of reduced kidney function among patients hospitalized with acute decompensated heart failure: the atherosclerosis risk in communities (ARIC) study community surveillance. PLoS One. 2017;12:e0181373. doi: 10.1371/journal.pone.0181373.
19) Bouby N, Bachmann S, Bichet D, Bankir L. Effect of water intake on the progression of chronic renal failure in the 5/6 nephrectomized rat. Am J Phys. 1990;258:F973-F979.
20) Cirillo P, Gersch MS, Mu W, Scherer PM, Kim KM, Gesualdo L, et al. Ketohexokinase-dependent metabolism of fructose induces proinflammatory mediators in proximal tubular cells. J Am Soc Nephrol. 2009;20:545-553. doi: 10.1681/ASN.2008060576.
