
ANDALAS JOURNAL OF ELECTRICAL AND ELECTRONIC ENGINEERING TECHNOLOGY - VOL. XX NO. XX (20XX) XXX-XXX

Available online at: http://ajeeet.ft.unand.ac.id/

Andalas Journal of Electrical and Electronic Engineering Technology
ISSN 2777-0079


Classification of Diabetes Using Naïve Bayes, K-Means and K-Nearest Neighbor (KNN) Methods

Rudini¹
¹Department of Electrical Engineering, Engineering Faculty of Andalas University, Padang 25163, Indonesia

ABSTRACT

Type 2 diabetes mellitus (T2DM) is a prevalent chronic metabolic disease, affecting around 422 million people
globally. Characterized by chronic hyperglycemia due to disorders of insulin secretion and insulin action, T2DM
accounts for 90-95% of all diabetes cases. Major risk factors include obesity, heredity, age, an inactive lifestyle,
and a high-calorie, high-fat diet. Although age is a significant T2DM risk factor, it is often not included as an
independent criterion for screening, leading to underdiagnosis in the elderly. The pathogenesis involves insulin
resistance and pancreatic beta-cell dysfunction, leading to various complications such as cardiovascular disease
and neuropathy. Beyond BMI, waist circumference and waist-to-hip ratio are also crucial indicators of T2DM risk.

In diagnosing stroke, data mining techniques such as Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN)
are used. These methods were applied to the Brain Stroke Prediction Dataset from Kaggle, consisting of 4981
data points. Data preprocessing ensures high-quality input for model evaluation, and nominal and ordinal data
improve the model's accuracy. Naïve Bayes achieved a training accuracy of 80.35% and a test accuracy of 93%,
while K-Means showed varying accuracies. The results indicate that Naïve Bayes and K-Means are more
suitable than KNN for diagnosing diabetes.

Keywords : Type 2 diabetes mellitus (T2DM), Obesity, Naïve Bayes, K-Means, K-Nearest Neighbor (KNN),
Model accuracy

INTRODUCTION

Type 2 diabetes mellitus (T2DM) is one of the most common chronic metabolic diseases worldwide, affecting approximately 422 million people according to the World Health Organization (WHO) report. This disease is characterized by chronic hyperglycemia due to disorders in insulin secretion, insulin action, or both. T2DM accounts for about 90-95% of all diabetes cases. With the increasing global population and lifestyle changes, the prevalence of T2DM continues to rise, making it a significant public health issue.

The primary risk factors for developing T2DM include obesity, heredity, age, an inactive lifestyle, as well as a high-calorie and high-fat diet. Obesity, usually measured by body mass index (BMI), is a significant risk factor. However, age also plays a crucial role in the development of this disease. As age increases, the risk of T2DM also rises, even in individuals with a normal BMI. Nevertheless, many clinical guidelines do not include age as an independent criterion for diabetes screening, which can lead to underdiagnosis in the elderly population. The pathogenesis of T2DM involves a combination of insulin resistance and pancreatic beta-cell dysfunction. In the early stages, the body attempts to compensate for insulin resistance by increasing insulin production. However, over time, the ability of beta cells to secrete insulin decreases, resulting in hyperglycemia. Chronic hyperglycemia can lead to various serious complications, including cardiovascular disease, diabetic nephropathy, retinopathy, and neuropathy. Recent research indicates that besides BMI, other body parameters such as waist circumference and waist-to-hip ratio are also crucial indicators for assessing the risk of T2DM and other chronic conditions. Using BMI alone has limitations in measuring body fat distribution and overall body composition, so a combination of several measurements can provide a more accurate risk assessment.
METHOD

Classification and clustering are closely related to grouping. Grouping is the determination of similar data groups, while studying the structure of a data set that has been partitioned into groups referred to as categories or classes. The classification process using the Naïve Bayes method and clustering using K-Means and K-Nearest Neighbor (KNN) will be explained in the methodology section, which includes the output in the form of model evaluation results to measure the success of the system in diagnosing stroke disease.

2.1. Methodology
The method used in diagnosing stroke disease is applied to patient data that has been previously tested. The data mining techniques applied to the dataset aim to diagnose stroke disease early so that prevention and faster treatment can be carried out. The data mining techniques used are Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN). The system method is shown in Figure 1.

2.2. Data Collection and Preprocessing
The initial stage of the data mining process is data collection and preprocessing. Data preprocessing is the critical stage, because only valid data will produce accurate output. Data preprocessing is carried out to turn raw data into efficient, high-quality data before proceeding to the next process. The dataset used contains several inconsistent records, such as empty values in a variable. The preprocessing stage also involves converting the data type of a field to match the other data types.

2.3. Database
The dataset is stored in a database obtained from Kaggle (Brain Stroke Prediction Dataset) with a total of 4981 data points consisting of 10 variables and one output target, which is stroke or non-stroke [13]. After obtaining the dataset, data mining techniques are applied to diagnose stroke disease.

2.4. Data Mining Techniques
In this system's data grouping, three data mining techniques are used: Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN). The grouped data is called the training dataset. Using this data, data collection and preprocessing are carried out to obtain the testing data, making it possible to apply stroke diagnosis. The Naïve Bayes, K-Means, and KNN algorithms find the relationship between predictor values and target values. The model learns from the training set, and this knowledge is then evaluated on the test data for prediction.

2.4.1. Nominal and Ordinal Data
Nominal and ordinal data are types of data commonly used in the data mining process.
Nominal data is categorical data whose categories have no order or ranking. Examples include gender (male or female), marital status (single, married, divorced), or occupation (doctor, teacher, engineer). Nominal data only provides information about the category without indicating differences or order among the categories [14].
Ordinal data is categorical data that has an order or ranking. Examples include education level (elementary, middle school, high school, bachelor's, master's, doctorate), satisfaction level (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied), or disease severity (mild, moderate, severe). Ordinal data provides information about the categories as well as the order or ranking among them, but the distances between the ranks need not be equal or measurable.
The use of nominal and ordinal data in data mining techniques is crucial because it provides additional information that can improve the accuracy of the model. In the classification and clustering process, nominal and ordinal data are used to identify patterns and relationships between variables that can be used to make more accurate predictions or groupings.

2.4.2. Naïve Bayes
Naïve Bayes is a classification method that uses simple probabilities by calculating a set of probabilities that summarize the combinations of values and frequencies in the given dataset. Naïve Bayes has the advantage of being easy to build, as it does not require complicated parameter estimation; it is easy to apply to large datasets, and the classification results are easily interpreted by laypeople.
The Naïve Bayes method is written using Equation 1:

P(A|B) = P(B|A) · P(A) / P(B)    (1)

Where:
• P(A): prior probability of hypothesis A
• P(B): prior probability (evidence) of condition B
• P(A|B): posterior probability of hypothesis A given condition B
• P(B|A): likelihood of condition B given hypothesis A

The classification process using the Naïve Bayes algorithm for diagnosing stroke disease is illustrated in the flowchart in Figure 2.
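To make Equation 1 concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a frequency-based Naïve Bayes classifier over made-up nominal records; the feature names and labels are invented for illustration only:

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(class) and P(feature=value | class) by counting frequencies."""
    priors = Counter(labels)
    likelihoods = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            likelihoods[(i, y)][v] += 1
    return priors, likelihoods

def predict(priors, likelihoods, row):
    """Score each class with P(class) * prod P(value | class), take the argmax."""
    total = sum(priors.values())
    best, best_score = None, -1.0
    for y, count in priors.items():
        score = count / total  # prior P(A)
        for i, v in enumerate(row):
            score *= likelihoods[(i, y)][v] / count  # frequency-based likelihood
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical nominal records: (gender, smoker) -> indicated / not indicated
rows = [("male", "yes"), ("male", "no"), ("female", "no"),
        ("female", "yes"), ("male", "yes")]
labels = ["indicated", "not", "not", "not", "indicated"]
priors, likelihoods = train_naive_bayes(rows, labels)
print(predict(priors, likelihoods, ("male", "yes")))
```

In practice a smoothing term is usually added so that an unseen feature value does not force a class score to zero; the sketch omits it for brevity.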
2.4.3. K-Means
K-Means clustering is a non-hierarchical clustering method that partitions objects based on their characteristics. Objects with similar characteristics are grouped into the same cluster, while objects with different characteristics are grouped into another cluster. The steps of the K-Means algorithm are as follows:
a. Determine the number of clusters, K.
b. Determine the initial centroid values randomly.
c. Determine the data closest to each centroid using the Euclidean distance formula shown in Equation 2.

D(x,y) = √((xi − si)² + (yi − ti)²)    (2)

Where D(x,y) is the distance from data x to cluster center y, xi and yi are the centroid coordinates, and si and ti are the data records.
d. Group the data based on the closest distance to the centroid.
e. Return to step c (iteration) if the members of any cluster change from the previous iteration. Before recalculating using Equation 2, recalculate the centroid values using the formula shown in Equation 3.

Sl = (1/Zl) Σn tnl    (3)

Where Sl is the new cluster average (centroid), Zl is the number of data points in the l-th cluster, and tnl is the n-th pattern that is part of the l-th cluster [16].

The clustering process using the K-Means algorithm for diagnosing stroke disease is illustrated in the flowchart in Figure 3.

2.4.4. K-Nearest Neighbor (KNN)
K-Nearest Neighbor (KNN) is a classification method that groups data based on proximity or similarity to existing data. This method is simple yet effective, especially when applied to datasets that are not too large [17]. The steps of the KNN algorithm are as follows:
a. Determine the number of nearest neighbors (the K value).
b. Calculate the distance between the data to be classified and all data in the training set using the Euclidean distance formula.
c. Sort these distances and select the K closest data points.
d. Determine the class of the new data based on the majority class of the K nearest neighbors.

The classification process using the KNN algorithm for diagnosing stroke disease is illustrated in the flowchart in Figure 4.
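The steps of both algorithms above can be sketched in a few lines of plain Python; this is an illustrative toy (the 2-D points, initial centroids, and class labels are made up, not taken from the dataset):

```python
import math
from collections import Counter

def euclidean(p, q):
    """Equation 2: straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, iters=10):
    """Steps a-e: assign each point to its nearest centroid, then move each
    centroid to the mean of its members (Equation 3), and repeat."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def knn_predict(train, labels, query, k=3):
    """KNN steps a-d: sort training points by distance, vote among the K nearest."""
    nearest = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # two centroids, one near each group of points
print(knn_predict(points, ["low", "low", "high", "high"], (8.5, 8.0), k=3))
```

The empty-cluster guard (`if c else centroids[i]`) is one common convention; other treatments (re-seeding the centroid) are equally valid.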

2.4.5. Confusion Matrix
This method is used to determine how well the data mining methods perform. Performance measurement using the confusion matrix consists of a representation of the classification process, namely True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The confusion matrix is shown in Figure 5.

In Table 1, the rows of the confusion matrix represent the predicted class values, while the columns represent the actual class values, where:
• TP (True Positive): the number of correctly predicted positive data points.
• TN (True Negative): the number of correctly predicted negative data points.
• FP (False Positive): the number of data points incorrectly predicted as positive.
• FN (False Negative): the number of data points incorrectly predicted as negative.

Based on the confusion matrix above, the performance of the classification can be evaluated using the following calculations:
1. Accuracy: measures how many correct predictions the model made over the entire test dataset.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

2. Precision: measures how many of the predicted positive cases are actually positive, indicating how reliable the model's positive predictions are.

Precision = TP / (TP + FP)

3. Recall (Sensitivity): tells us how many of the actual positive cases the model predicts correctly. A low recall means many actual positive cases are missed and counted as FN.

Recall = TP / (TP + FN)

4. F1-Score: the harmonic mean of Precision and Recall, providing a combined insight into these two metrics. It is maximal when precision equals recall.

F1-Score = 2 · (Precision · Recall) / (Precision + Recall)
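The four formulas can be checked with a short script; the counts used here are taken from the Naïve Bayes test-set confusion matrix reported in Section 3.1 (TP = 47, FN = 3, FP = 4, TN = 46):

```python
def evaluate(tp, tn, fp, fn):
    """Compute the four confusion-matrix metrics from Section 2.4.5."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts from the Naïve Bayes test-set confusion matrix in Section 3.1
acc, prec, rec, f1 = evaluate(tp=47, tn=46, fp=4, fn=3)
print(f"ACC={acc:.0%}  Precision={prec:.0%}  Recall={rec:.0%}  F1={f1:.0%}")
# → ACC=93%  Precision=92%  Recall=94%  F1=93%
```

These values match the accuracy, precision, and recall reported for the Naïve Bayes test data.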
RESULTS AND DISCUSSION

3.1 Results
In this diabetes system, analysis has been carried out using the data mining methods Naïve Bayes, K-Means, and KNN. The following are the results obtained after performing the processes according to the methods.

Training Data Overview

NO Age BMI Chol TG HDL LDL Cr BUN
1 50 24 4,2 0,9 2,4 1,4 46 4,7
2 26 23 3,7 1,4 1,1 2,1 62 4,5
3 33 21 4,9 1 0,8 2 46 7,1
4 45 21 2,9 1 1 1,5 24 2,3
5 50 24 3,6 1,3 0,9 2,1 50 2
6 48 24 2,9 0,8 0,9 1,6 47 4,7
7 43 21 3,8 0,9 2,4 3,7 67 2,6
8 32 24 3,8 2 2,4 3,8 28 3,6
9 31 23 3,6 0,7 1,7 1,6 55 4,4
10 33 21 4 1,1 0,9 2,7 53 3,3
11 30 22 4,9 1,3 1,2 3,2 42 3
12 45 23 4,2 1,7 1,2 2,2 54 4,6
13 50 24 4 1,5 1,2 2,2 39 3,5
14 50 21 3,6 1,1 1 2,1 74 5,5
15 50 21 5,3 0,8 1,1 4,1 53 5,9
16 49 24 5 1,3 1,2 3,3 28 2,2
17 49 23 4,4 0,9 1 1,3 55 3,8
18 49 24 0,5 1,9 1,3 2,8 175 13,5
19 42 24 6,2 1 1,1 4,6 73 5
20 33 24 4,2 1,5 1,2 2,3 62 5,3
21 50 23 4,8 1 0,9 3,6 28 4
22 39 21 4,6 1,3 1 3 55 3,2
23 30 22 4,9 1,3 1,2 3,2 45 3
24 30 19 5,5 1,8 1,2 3,5 80 4,8
25 41 22 2,8 2,9 0,8 3,8 99 4,2
26 33 22 3,7 1,3 0,8 2,4 54 2
27 44 22 5,6 1,4 1,4 3,6 49 4,3
28 48 23 3,2 1,8 1,6 0,9 82 7,5
29 47 24 4,6 0,8 0,9 4,2 55 4,6
30 36 22 3,6 0,7 1,3 1,9 70 3,8
31 47 23 6,5 1,5 0,9 4,9 67 5,6
32 39 22 4 0,6 1,1 2,6 67 6
33 38 21 3 1,2 0,6 2 20 2
34 46 21 3,7 1,3 0,8 2,4 54 2
35 44 24 3,7 1,5 1,1 2 74 3,8
36 41 23 4,4 1,6 0,8 3 48 3,4
37 45 24 3,8 0,7 1,3 2,2 31 4,7
38 44 21 9,5 1,7 1,3 2,5 39 3
39 43 21 4,7 1,9 1,4 2,6 44 4,1
40 33 24 4,2 1,4 1,3 2,6 47 2,7
41 49 21 5,2 1,1 0,9 1,4 74 5,7
42 31 23 4,3 2,1 1 2,4 67 3,6
43 49 24 4,7 1,8 0,7 3,3 60 3,8
44 47 22 4,7 1,6 1 2,4 48 3,7
45 44 24 4,9 1,7 1,4 2,8 49 3,1
46 45 22 4,3 0,9 1,1 4,2 57 4
47 43 21 4,8 1,9 1,1 3 40 2,4
48 47 20 4,1 0,7 1,7 2,8 53 3,3
49 50 24 3,9 0,7 2,3 0,3 69 4,4
50 50 23 3,2 0,8 1,2 1,7 52 5,4
51 49 22 3,1 1,3 1 2,5 52 2,3
52 50 22 3,4 0,7 1,1 2 32 4,5
53 49 21 4,2 0,8 0,9 3 38 2,1
54 44 21 5,2 1,5 1 3,5 42 3,9
55 48 23 3,6 0,6 2,1 1,2 55 2,8
56 47 23 4 1,3 0,9 2,6 45 4
57 47 24 4,3 2,3 0,9 2,4 85 7,6
58 44 21 4,9 2,8 2 1,8 64 6,8
59 35 22 3,8 5,9 0,5 4,3 38 3,9
60 40 23 4,8 2,5 1,1 2,7 63 5
61 35 20 4,7 2,5 1,3 2,4 50 2,8
62 42 21 3 1,1 1,1 1,4 45 2,1
63 59 23 4,9 1,2 0,9 3,4 38 5,2
64 31 22 4,4 1,8 1,1 2,6 6 3,9
65 40 22 7,6 1,3 0,9 3,4 40 4,7
66 41 21 3,2 4,5 1,3 1,8 48 3,8
67 41 21 3,4 1,2 1,7 1,1 39 2
68 43 24 4,1 1,1 1,2 2,4 54 4
69 44 21 3,4 1,3 1,3 1,5 56 4,4
70 59 24 6,3 0,6 1,1 4,9 58 4,7
71 35 23 4,1 1,9 4 1,3 44 3,3
72 51 20 4,1 1,5 0,9 2,7 88 4,5
NO Age BMI Chol TG HDL LDL Cr BUN
73 30 24 6,5 1,8 1,5 4,2 61 3,6
74 50 21 4,4 2,7 1,3 3,1 61 6
75 57 22 3,2 1,3 0,9 3 97 4,6
76 50 22 4,5 1,2 1,8 4,1 88 6,3
77 35 24 4,3 1,3 0,8 1,3 61 3,6
78 63 20 4,8 1,7 1,1 3 106 6,6
79 36 20 4,9 2,5 0,9 1,9 70 3,3
80 50 21 2 1,2 1,3 3 61 5,8
81 25 22 4,3 3,5 0,8 1,3 35 10
82 40 24 4,6 1,5 0,7 3 123 5,8
83 40 22 4,3 0,8 0,8 1,8 79 6,3
84 50 21 3,2 1,8 1,6 0,9 97 5,5
85 30 24 3,9 1,6 0,9 3,3 79 5,5
86 50 22 3,8 5,9 0,5 4,3 203 9,6
87 60 24 3,4 5,3 1,1 3,6 70 7,5
88 77 24 3,9 2,1 1,2 4,2 106 5
89 44 21 5,2 1,9 2,5 3 132 7,3
90 40 24 3,1 1,6 1,1 1,3 159 22
91 54 20 4,3 2 1,3 2,2 106 6,3
92 50 24 3,7 0,9 1,2 2,7 70 3,3
93 60 24 3,4 5,3 1,1 3,6 70 7,5
94 77 19 0 2,8 0,8 1,8 106 5
95 59 22 4,5 1,8 1,8 1,8 58 4,7
96 38 24 4,5 1,7 0,9 2,8 83 6,1
97 34 23 6,2 3,9 0,8 1,9 81 3,9
98 34 23 6,2 3,9 0,8 3,8 81 3,9
99 31 24 4,9 1,6 1 3,2 55 3,4
100 43 25 4,7 5,3 0,9 1,7 55 2,1
101 42 23 5,9 3,7 1,3 3,1 53 5,4
102 47 23 3,7 1,8 1 2 87 4,1
103 50 24 4 3 1 1,8 59 4,3
104 49 25 2 0,8 0,6 1 74 5
105 50 25 4,2 2,2 0,8 2,5 53 4,7
106 49 24 4 2,1 1,4 1,9 59 3,5
107 49 21 5,6 1,9 0,75 1,35 44 3,3
108 50 24 2 0,8 0,6 1 74 5
109 49 23 5,6 1,9 0,75 1,35 44 3,3
110 39 22 4,7 1,3 1,1 3,1 38 3
111 50 24 4 2,4 1 1,8 59 4,3
112 39 24 4,7 1,3 1,1 3,1 46 3
113 49 24 3,6 2,4 1,9 1,1 75 3,1
114 50 25 2 0,8 0,6 1 74 5
115 33 25 4,8 1,1 1,7 2,6 64 4,8
116 50 25 5,3 1,3 1 3,7 62 4,8
117 30 22 5,4 1,7 1,4 3,3 53 5,7
118 50 19 5,3 1,3 1 3,7 62 4,8
119 50 25 5,4 1,7 1,4 3,3 53 5,7
120 49 25 4,8 1,4 0,7 3,9 60 4,6
121 33 19 4,2 1,4 1,3 2,6 47 2,7
122 49 24 4,8 1,1 1,7 2,6 46 4,8
123 50 24 4,2 2,2 0,8 2,5 53 4,7
124 50 25 4 2,1 1,4 1,9 59 3,5
125 55 24 3,6 3 1,5 0,8 60 4,8
126 40 30 2,1 2,3 0,9 2,8 52 2,1
127 40 31 6,5 3,8 1 3,9 64 3,4
128 35 32 4 2,5 1,3 2,3 37 4,4
129 41 21 4,7 5,3 0,9 1,7 62 5,9
130 43 29 4,3 1,8 1,6 1,9 60 4,4
131 30 21 4,9 1,6 1,7 2,5 344 17,1
132 54 28 4,4 2,9 0,6 2,5 88 4
133 30 19 4,2 1,7 1,2 2,2 97 6
134 31 37 4,1 2,2 0,7 2,4 60 3
135 30 27 4,1 1,1 1,2 2,4 81 7,1
136 45 34 4,8 1,3 0,9 3,3 63 4,1
137 45 29 3,9 1,5 1,3 2 77 5,3
138 31 24 4,9 1,6 1 3,2 55 3,4
139 30 34 4,5 1,8 1,2 2,6 80 5
140 35 27 3,7 1 1,2 2 64 4,8
141 45 31 4,7 1,8 0,8 3,1 82 4,8
142 45 22 6,1 3,7 0,7 3,9 80 3,6
143 50 29 4,4 2 1 2,5 56 4
144 48 23 4,4 2,3 1,3 2,2 38 4
145 38 40 5,3 2 1,6 2,9 59 5,8
146 46 24 5,7 3,8 1,3 2,8 59 3
147 45 25 4,4 1,5 1 2,8 42 2,3
148 54 22 9,5 1,7 1,3 2,5 39 3
149 43 21 5,9 2 1,1 3,9 62 5,4
150 49 24 5,1 1,7 3,9 0,8 65 3,9
151 49 25 6 3,5 1,1 3,5 56 3,8
152 45 24 5,9 1,8 1,6 3,5 54 3,1
153 47 23 6,3 2,2 1,1 2,8 65 3,5
154 38 47 5,2 2 1,1 3,2 67 4
NO Age BMI Chol TG HDL LDL Cr BUN
155 42 25 4,7 2,5 1,3 2,4 39 2,8
156 39 25 6,7 2,5 1,1 4,5 49 4,3
157 30 25 5,5 1,8 1,2 3,5 80 4,8
158 40 24 5 2,1 1,6 3 76 5,9
159 46 24 6,8 0,7 1,7 4,7 47 4,4
160 45 25 2,5 2,2 1 0,6 49 3,7
161 33 21 2,4 1,9 0,8 2,5 76 3,3
162 40 40 4 1,8 0,9 2,4 72 4,3
163 40 28 4,4 1,4 1,3 2,5 74 7,1
164 50 23 5,2 2,1 1,1 3,2 67 7,7
165 63 32 5,8 1,7 1,7 3,4 96 6,6
166 44 23 6,2 2,3 1,2 4,1 64 6,8
167 49 25 4,2 1,1 1,1 2,7 53 4,3
168 42 22 5,6 2,1 0,9 3,8 91 4,6
169 44 25 5,3 1,8 0,9 3,6 32 4
170 33 31 3,7 1,2 1,6 1,5 31 1,8
171 48 25 4,4 2,3 1,3 2,2 38 4
172 57 37 4 6 2,5 3,5 370 4,6
173 47 23 5,3 2,3 0,7 3,7 68 5,1
174 57 37 6,1 6 2,5 3,5 370 4,6
175 33 24 6,2 3,8 0,8 3,7 56 4,6
176 33 23 6,8 3,1 1 3,9 48 5,7
177 34 21 5,1 1,2 1,4 0,9 80 7,7
178 43 23 6,2 3,2 1 3,9 42 3,2
179 28 24 5,3 3,2 0,8 0,8 73 4,1
180 47 24 7 2,8 0,9 4,9 62 5,8
181 39 20 7,1 1,5 1,2 4,1 80 9,1
182 39 25 4,4 1,7 2,8 0,7 45 4,2
183 49 23 6,6 3,8 1 4,1 23 2,2
184 50 24 6,3 4,4 1 3,6 106 2,6
185 56 35 4,8 1,7 1,3 2,8 92 8,5
186 51 31 3,8 3,8 1 1,1 65 7,3
187 52 33 3,8 3,2 0,8 1,7 60 3
188 56 39 4,1 1,5 0,8 1,7 44 3,4
189 59 38 4,2 2 2,3 0,9 91 3,8
190 54 37 3,1 1,1 3,1 1,2 52 4,3
191 69 33 5,4 1,3 1,7 3,1 71 5,9
192 60 26 4,7 2,3 1,4 1,6 76 6,6
193 54 32 5,4 1,3 1,7 3,1 71 5,9
194 57 33 5,5 1,9 1 3,7 77 2
195 55 33 5,6 4,6 0,8 2,9 76 5
196 60 27 7,2 2,2 0,8 2,2 45 2
197 60 27 7,2 2,2 1 2,2 45 2
198 73 27 5,3 1,4 1,5 3,2 79 4,3
199 61 28 4,1 4,2 1,2 1,4 23 2,1
200 51 32 3,5 1,8 1,8 1,95 70 6,5
201 55 31 4,5 1,5 1,2 2,7 64 4,16
202 55 30 4,6 1,7 1 2,9 52 2,7
203 73 28 5,3 1,4 1,5 3,2 79 4,3
204 63 29 5,9 2,2 1,2 3,7 93 8,7
205 52 31 2,7 1,2 0,8 1,4 76 6
206 55 30 4,1 2,7 1 2 46 2,1
207 51 36 4,1 2,7 1 2 46 2,1
208 57 34 4,5 1,6 2,1 1,9 72 4,8
209 55 31 4,5 1,8 1,1 2,7 78 5,1
210 58 33 6,6 2,9 1,1 4,3 800 20,8
211 60 26 4,4 2,1 1,1 2,5 72 6
212 56 26 4,7 1,3 0,9 3,3 60 3,5
213 61 38 2,6 1,1 0,9 1,6 92 5,7
214 73 34 4,2 1,9 1,95 9,9 67 4,3
215 55 35 4,3 1,5 1 2,6 46 3,8
216 60 37 4,7 1,3 0,9 3,3 60 3,5
217 53 39 5,4 3,8 1,9 3 68 4,5
218 54 33 3,8 1,7 1,1 3 67 5
219 61 38 2,6 1,1 0,9 2 92 5,7
220 54 33 2 1,9 0,9 2,5 25 1,2
221 66 26 4,2 1 1,4 2,4 46 3,2
222 52 33 6 1,2 1,1 2 34 2
223 61 29 4,4 2 1 2,5 56 4,3
224 55 29 4,1 1 1,1 2,1 44 2,9
225 56 32 4,9 2,5 0,5 3,4 33 3,2
226 55 30 5,2 1,8 1,3 3,2 85 5,4
227 56 30 4,1 0,6 1,3 1,4 45 4
228 66 30 3,6 5,1 0,9 2,5 63 4,1
229 66 33 5,8 3,3 1 3,4 146 14,1

1. Results using Naïve Bayes
Diabetes classification with the Naïve Bayes method based on the data obtained produces a training data accuracy of 80.35% and a test data accuracy of 93%. The confusion matrices and evaluation results are shown in the tables below.
A. Training Data

Confusion Matrix
                                Predicted Class
Actual Class                Indicated disease (1)   Not indicated (0)   Total
Indicated disease (1)               68                     28             96
Not indicated disease (0)           17                    116            133
Total data: 229

Evaluation metrics
ERR    ACC    TPR    FPR    Recall    Precision    F1-Score
19%    80%    70%    12%    70%       80%          75%

B. Test Data

Confusion Matrix
                                Predicted Class
Actual Class                Indicated disease (1)   Not indicated (0)   Total
Indicated disease (1)               47                      3             50
Not indicated disease (0)            4                     46             50
Total data: 100

Evaluation metrics
ERR    ACC    TPR    FPR    Recall    Precision    F1-Score
7%     93%    94%    8%     94%       92%          93%

2. Results using K-Means
Diabetes classification using the K-Means method based on the training data produces a not-indicated accuracy of 75% and an indicated accuracy of 44%. On the test data, the not-indicated accuracy was 78% and the indicated accuracy was 92%. The confusion matrices and evaluation results are shown in the tables below.

A. Training Data

                       Predicted Class
Actual Class       Not indicated   Indicated   Total
Not indicated           100            33        133
Indicated                54            42         96
Total data: 229

No   Category        Samples   TP Rate   FP Rate   Accuracy
1    Not indicated     133      0,75      0,25       75%
2    Indicated          96      0,44      0,56       44%

B. Test Data

                       Predicted Class
Actual Class       Not indicated   Indicated   Total
Not indicated            39            11         50
Indicated                 4            46         50
Total data: 100

No   Category        Samples   TP Rate   FP Rate   Accuracy
1    Not indicated      50      0,78      0,22       78%
2    Indicated          50      0,92      0,08       92%
Discussion

Based on the results obtained after testing the three data mining methods for diagnosing diabetes, accuracy values were obtained for each of the three methods: Naïve Bayes, K-Means, and KNN. The accuracy, precision, and recall values for Naïve Bayes and K-Means are higher than those for KNN. This shows that the more suitable methods for diagnosing diabetes are Naïve Bayes and K-Means.

CONCLUSION

It can be concluded that diabetes diagnosis using an intelligent system can be performed with various methods, including Naïve Bayes, K-Means, and K-Nearest Neighbor (KNN). Naïve Bayes uses probability to group existing data, while K-Means uses the distance between each data point, and KNN uses the closest distance from the selected data. To evaluate the accuracy, precision, and recall of these methods, a confusion matrix is needed. The results and discussion show that the accuracy of the Naïve Bayes and K-Means methods is much higher than that of the KNN method. Therefore, it can be concluded that the Naïve Bayes and K-Means methods are more suitable for implementing intelligent systems for diagnosing diabetes.

ACKNOWLEDGMENT

The author would like to thank the various parties who played a role in preparing this article. With great respect, sincerity, and humility, the author thanks:
1. Prof. Dr. Eng. Ir. Muhammad Ilhamdi Rusydi, S.T., M.T., the supervisor, who provided much input and many suggestions regarding the writing of this article.
2. The author's parents, siblings, and extended family, who provided many prayers and much encouragement until the completion of this article.
3. Fellow Andalas University students, who provided support and motivation to the author.
4. All parties who helped complete this article, whose names the author cannot mention one by one. May Allah bestow His mercy and guidance on them.
May Allah SWT reward all your guidance, help, and support. Amen.
REFERENCES

1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7056531/
2) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8920809/
3) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316578/
4) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054063/
5) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10724412/
6) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5586853/
7) https://www.ncbi.nlm.nih.gov/books/NBK507821/
8) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10663898/
9) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8173137/
10) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6444850/
11) Berl T, Schrier RW. Disorders of water metabolism. In: Schrier RW, editor. Renal and Electrolyte Disorders. 6th ed. Philadelphia: Lippincott Williams and Wilkins; 2002. pp. 1-63.
12) Dossetor JB. Creatininemia versus uremia. The relative significance of blood urea nitrogen and serum creatinine concentrations in azotemia. Ann Intern Med. 1966;65:1287-1299. doi: 10.7326/0003-4819-65-6-1287.
13) Hosten AO. BUN and creatinine. In: Walker HK, Hall WD, Hurst JW, editors. Clinical Methods: The History, Physical, and Laboratory Examinations. 3rd ed. Boston: Butterworths; 1990. pp. 874-878.
14) Kalim S, Karumanchi SA, Thadhani RI, Berg AH. Protein carbamylation in kidney disease: pathogenesis and clinical implications. Am J Kidney Dis. 2014;64:793-803. doi: 10.1053/j.ajkd.2014.04.034.
15) Lau WL, Vaziri ND. Urea, a true uremic toxin: the empire strikes back. Clin Sci (Lond). 2017;131:3-12. doi: 10.1042/CS20160203.
16) Vanholder R, Gryp T, Glorieux G. Urea and chronic kidney disease: the comeback of the century? (in uraemia research). Nephrol Dial Transplant. 2018;33:4-12. doi: 10.1093/ndt/gfx039.
17) Cauthen CA, Lipinski MJ, Abbate A, Appleton D, Nusca A, Varma A, et al. Relation of blood urea nitrogen to long-term mortality in patients with heart failure. Am J Cardiol. 2008;101:1643-1647. doi: 10.1016/j.amjcard.2008.01.047.
18) Matsushita K, Kwak L, Hyun N, Bessel M, Agarwal SK, Loehr LR, et al. Community burden and prognostic impact of reduced kidney function among patients hospitalized with acute decompensated heart failure: the atherosclerosis risk in communities (ARIC) study community surveillance. PLoS One. 2017;12:e0181373. doi: 10.1371/journal.pone.0181373.
19) Bouby N, Bachmann S, Bichet D, Bankir L. Effect of water intake on the progression of chronic renal failure in the 5/6 nephrectomized rat. Am J Phys. 1990;258:F973-F979.
20) Cirillo P, Gersch MS, Mu W, Scherer PM, Kim KM, Gesualdo L, et al. Ketohexokinase-dependent metabolism of fructose induces proinflammatory mediators in proximal tubular cells. J Am Soc Nephrol. 2009;20:545-553. doi: 10.1681/ASN.2008060576.
