
IT0089 TB391 Decision Tree - Coyohan

The document describes a study that uses a decision tree algorithm to predict whether individuals have diabetes based on attributes like age, BMI, glucose levels, and blood pressure. The dataset contains 768 rows with these variables. The author trains the decision tree model on 70% of the data and tests it on the remaining 30%. The decision tree is then used to make predictions on the test data and a confusion matrix is created to evaluate performance. Finally, the tree is plotted and the results are interpreted.

Uploaded by

Kim Vincere

Coyohan, Elijah Brian

Diabetes Identification: Unveiling Patterns with Decision Tree Algorithm

Objectives: To practice our last discussion on classification algorithms in the R
language, we will perform an activity in which you study the data and create a
classification model that detects whether someone has diabetes based on the attributes.

About the dataset:


1. Download the diabetes_dataset.csv available in our shared folder.
2. This dataset has 768 rows.
3. Identify the dependent variable and independent variable.

Dependent variable    Independent variables
Outcome               Age
                      BMI
                      Glucose
                      BloodPressure
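To make the roles of the variables listed above concrete, here is a minimal sketch in R of separating the dependent variable from the independent variables. The data frame `toy` and all of its values are made up for illustration; only the column names are taken from the activity.

```r
# Hypothetical miniature data frame: the values are invented, only the
# column names follow the activity's description of the dataset.
toy <- data.frame(
  Age           = c(25, 47, 36, 52, 29, 61),
  BMI           = c(22.4, 31.0, 27.5, 35.2, 24.8, 29.9),
  Glucose       = c(90, 150, 110, 170, 95, 130),
  BloodPressure = c(70, 82, 76, 88, 64, 80),
  Outcome       = c(0, 1, 0, 1, 0, 1)
)

# The dependent variable is the column the model should predict.
target <- "Outcome"

# Every remaining column is treated as an independent variable.
predictors <- setdiff(names(toy), target)

# A classification target should be stored as a factor, not as a number.
toy[[target]] <- factor(toy[[target]], levels = c(0, 1))

print(predictors)
print(table(toy[[target]]))  # class balance of the target
```

Referring to columns by name, as done here, avoids depending on the position of Outcome in the CSV file.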

Codes (start from importing data)

# Decision Tree Classification

# Importing the dataset
setwd('C:\\Users\\201910082\\Desktop\\dataset')
dataset = read.csv('diabetes_dataset.csv')
head(dataset)

# Encoding the target feature as factor
dataset$Outcome = factor(dataset$Outcome, levels = c(0, 1))

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Outcome, SplitRatio = 0.70)
head(split)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

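# Feature Scaling
# Skipped here: rpart splits on thresholds, so rescaling the predictors does
# not change the fitted tree, and unscaled values keep the split points in
# their original units (for example Glucose < 144).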

# Fitting Decision Tree Classification to the Training set


# install.packages('rpart')
library(rpart)
classifier = rpart(formula = Outcome ~ .,
data = training_set)

classifier
# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set, type = 'class')

# Making the Confusion Matrix (rows = actual, columns = predicted)
cm = table(Actual = test_set$Outcome, Predicted = y_pred)
cm
# Visualising the Training and Test set results
# The tree uses four predictors, so a two-dimensional plot can only show a
# slice: Age and BMI vary while Glucose and BloodPressure are held at their
# training-set medians. This assumes these four columns are the only
# predictors in the dataset. No extra package is needed for the plot.
plot_regions = function(set, title) {
  X1 = seq(min(set$Age) - 1, max(set$Age) + 1, length.out = 200)
  X2 = seq(min(set$BMI) - 1, max(set$BMI) + 1, length.out = 200)
  grid_set = expand.grid(Age = X1, BMI = X2)
  grid_set$Glucose = median(training_set$Glucose)
  grid_set$BloodPressure = median(training_set$BloodPressure)
  y_grid = predict(classifier, newdata = grid_set, type = 'class')
  plot(set[, c('Age', 'BMI')], main = title,
       xlab = 'Age', ylab = 'BMI',
       xlim = range(X1), ylim = range(X2))
  contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
  points(grid_set[, c('Age', 'BMI')], pch = '.',
         col = ifelse(y_grid == 1, 'springgreen3', 'tomato'))
  points(set[, c('Age', 'BMI')], pch = 21,
         bg = ifelse(set$Outcome == 1, 'green4', 'red3'))
}
plot_regions(training_set, 'Decision Tree Classification (Training set)')
plot_regions(test_set, 'Decision Tree Classification (Test set)')

# Plotting the tree
plot(classifier)
text(classifier)

# install.packages('rpart.plot')
library(rpart.plot)
rpart.plot(classifier)
prp(classifier)
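A confusion matrix such as `cm` in the code above is usually summarised with a few ratios. The sketch below shows one way to compute accuracy, sensitivity, and specificity in base R. The vectors `actual` and `predicted` are invented for illustration and are not results from the diabetes data.

```r
# Hypothetical labels: 1 = diabetes, 0 = no diabetes (invented values).
actual    <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1))

# Rows are the actual classes, columns are the predicted classes.
cm <- table(Actual = actual, Predicted = predicted)

# Share of all cases that were classified correctly.
accuracy <- sum(diag(cm)) / sum(cm)

# Share of actual positives (row "1") that were predicted as positive.
sensitivity <- cm["1", "1"] / sum(cm["1", ])

# Share of actual negatives (row "0") that were predicted as negative.
specificity <- cm["0", "0"] / sum(cm["0", ])

print(cm)
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

For a screening task like diabetes detection, sensitivity is often worth reporting next to accuracy, because a missed positive case and a false alarm do not cost the same.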
Screenshots of results
Decision Tree Graph
Interpretation of Results

- If Glucose is less than 144 and Glucose is less than 102, then the person has a
6 percent chance of having no diabetes.

- If Glucose is less than 144, Glucose is greater than 102, and BMI is less than 26,
then the person has an 8 percent chance of having no diabetes.

- If Glucose is less than 144, Glucose is less than 102, and BMI is less than 26,
then the person has a 26 percent chance of having no diabetes.

- If Glucose is less than 144, Glucose is less than 102, BMI is greater than 26,
age is greater than 29, and age is less than 55, then the person has an 18 percent
chance of having no diabetes.

- If Glucose is less than 144, Glucose is less than 102, BMI is greater than 26,
age is greater than 29, age is greater than or equal to 55, and blood pressure is
less than 89, then the person has a 25 percent chance of having no diabetes.

- If Glucose is less than 144, Glucose is less than 102, BMI is greater than 26,
age is greater than 29, age is greater than or equal to 55, blood pressure is
greater than or equal to 89, age is less than 33, and BMI is less than 32, then the
person has a 33 percent chance of having no diabetes.

- If Glucose is less than 144, Glucose is less than 102, BMI is greater than 26,
age is greater than 29, age is greater than or equal to 55, blood pressure is
greater than or equal to 89, age is less than 33, and BMI is greater than 32, then
the person has a 67 percent chance of having diabetes.

- If Glucose is less than 144, Glucose is less than 102, BMI is greater than 26,
age is greater than 29, age is greater than or equal to 55, blood pressure is
greater than or equal to 89, age is greater than 33, and Glucose is less than 107,
then the person has a 38 percent chance of having no diabetes.

- If Glucose is less than 144, Glucose is less than 102, BMI is greater than 26,
age is greater than 29, age is greater than or equal to 55, blood pressure is
greater than or equal to 89, age is greater than 33, and Glucose is greater than
107, then the person has a 71 percent chance of having diabetes.

- If Glucose is greater than 144, Glucose is less than 162, age is less than 41,
and BMI is less than 30, then the person has a 25 percent chance of having no
diabetes.

- If Glucose is greater than 144, Glucose is less than 162, age is less than 41,
BMI is greater than 30, and age is less than 33, then the person has a 43 percent
chance of having no diabetes.

- If Glucose is greater than 144, Glucose is less than 162, age is less than 41,
BMI is greater than 30, and age is greater than 33, then the person has a 73
percent chance of having diabetes.

- If Glucose is greater than 144, Glucose is less than 162, and age is greater than
41, then the person has an 81 percent chance of having diabetes.

- If Glucose is less than 144 and Glucose is greater than 102, then the person has
an 87 percent chance of having diabetes.
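
Each statement above follows one path from the root of the tree to a leaf. As a rough sketch, the same paths and the class share in each leaf can also be read from the fitted object instead of from the plot. The example uses `kyphosis`, a sample data set bundled with rpart, as a stand-in because the diabetes file is not available here, so the numbers it prints are unrelated to the results above.

```r
library(rpart)

# kyphosis ships with rpart; it only stands in for the diabetes data.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where gives, for every training row, the row of fit$frame (the leaf)
# in which that observation ends up.
leaf_of_row <- fit$where

# Share of "present" cases among the training rows in each leaf.
observed_share <- tapply(kyphosis$Kyphosis == "present", leaf_of_row, mean)

# Probability of "present" that the tree predicts for each leaf.
predicted_prob <- tapply(predict(fit, type = "prob")[, "present"],
                         leaf_of_row, function(p) p[1])

# Split conditions leading to each leaf, one character vector per leaf.
leaf_nodes <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])
rules <- path.rpart(fit, nodes = leaf_nodes, print.it = FALSE)

print(round(observed_share, 2))
print(rules[[1]])
```

With the default settings used here, the probability predicted for a leaf equals the share of that class among the training rows that reach it, which is the figure each interpretation statement should quote.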
