210596_ML_Labtask5.ipynb_k - Colab
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, ConfusionMatrixDisplay
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.tree import DecisionTreeClassifier as DT
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
dataset.head()
   Age  Gender  Speed_of_Impact  Helmet_Used  Seatbelt_Used  Survived
0   56  Female             27.0           No             No         1
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 196 non-null int64
1 Gender 196 non-null object
2 Speed_of_Impact 196 non-null float64
3 Helmet_Used 196 non-null object
4 Seatbelt_Used 196 non-null object
5 Survived 196 non-null int64
dtypes: float64(1), int64(2), object(3)
memory usage: 9.3+ KB
dataset.dtypes
Age int64
Gender object
Speed_of_Impact float64
Helmet_Used object
Seatbelt_Used object
Survived int64
dtype: object
encoder1=LabelEncoder()
dataset['Gender']=encoder1.fit_transform(dataset['Gender'])
encoder2=LabelEncoder()
dataset['Helmet_Used']=encoder2.fit_transform(dataset['Helmet_Used'])
encoder3=LabelEncoder()
dataset['Seatbelt_Used']=encoder3.fit_transform(dataset['Seatbelt_Used'])
encoder4= LabelEncoder()
dataset['Survived']=encoder4.fit_transform(dataset['Survived'])
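To make the encoding step easier to read, here is a minimal sketch of how `LabelEncoder` assigns integers. The `toy` frame is a hypothetical stand-in, not the lab dataset; it only reuses the 'Female'/'Male' strings that appear in the `Gender` column. `classes_` is sorted, so the position of a label in `classes_` is its integer code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column (assumption): same string values as the 'Gender' column.
toy = pd.DataFrame({"Gender": ["Female", "Male", "Male", "Female"]})

encoder = LabelEncoder()
codes = encoder.fit_transform(toy["Gender"])

# classes_ is sorted, so the label at index i is encoded as the integer i.
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}

# inverse_transform recovers the original strings from the integer codes.
decoded = encoder.inverse_transform(codes)

print(mapping)
print(list(codes), list(decoded))
```

Inspecting `classes_` right after fitting, before the column is overwritten, keeps the string-to-integer mapping visible.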
https://colab.research.google.com/drive/1VZpYsgzp-ngT7qoaguxyI4vMpI_QSBfF#scrollTo=gM4UulYfg9u4&printMode=true 1/8
3/17/25, 9:13 AM 210596_ML_Labtask5.ipynb - Colab
dataset
     Age  Gender  Speed_of_Impact  Helmet_Used  Seatbelt_Used  Survived
0     56       0             27.0            0              0         1
1     69       0             46.0            0              1         1
2     46       1             46.0            1              1         0
3     32       1            117.0            0              1         0
4     60       0             40.0            1              1         0
..   ...     ...              ...          ...            ...       ...
191   69       0            111.0            0              1         1
192   30       0             51.0            0              1         1
193   58       1            110.0            0              1         1
194   20       1            103.0            0              1         1
195   56       0             43.0            0              1         1
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 196 non-null int64
1 Gender 196 non-null int64
2 Speed_of_Impact 196 non-null float64
3 Helmet_Used 196 non-null int64
4 Seatbelt_Used 196 non-null int64
5 Survived 196 non-null int64
dtypes: float64(1), int64(5)
memory usage: 9.3 KB
encoder1.classes_
array([0, 1])
encoder1.transform(encoder1.classes_)
array([0, 1])
encoder2.classes_
array([0, 1])
encoder2.transform(encoder2.classes_)
array([0, 1])
x = dataset.drop(columns=['Survived'])
y = dataset['Survived']
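`train_test_split` is imported at the top of the notebook, but the cells shown here predict on the full `x`. As a sketch of how a held-out split could look, the block below builds a synthetic frame with the same row count and class balance (196 rows, 96 vs. 100). The feature values, the 25% test size, and the `random_state` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): random features, same size and class counts as the lab data.
rng = np.random.default_rng(0)
n_rows = 196
toy = pd.DataFrame({
    "Age": rng.integers(18, 70, size=n_rows),
    "Speed_of_Impact": rng.integers(20, 120, size=n_rows).astype(float),
    "Survived": np.r_[np.zeros(96, dtype=int), np.ones(100, dtype=int)],
})

x_toy = toy.drop(columns=["Survived"])
y_toy = toy["Survived"]

# stratify keeps the 0/1 ratio roughly equal in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    x_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

print(x_train.shape, x_test.shape)
print(y_train.mean(), y_test.mean())
```

A model fit on `x_train` and scored on `x_test` gives an estimate that does not reuse the rows the model has already seen.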
KNeighborsClassifier(n_neighbors=1)
predict=modelKNN2.predict(x)
predict
array([1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1])
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 1.0
Accuracy: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[ 96 0]
[ 0 100]]
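A perfect score for k=1 is expected whenever a model is scored on the rows it was fit on, because each row is its own nearest neighbor. The sketch below illustrates this on synthetic data with labels that are pure noise; the data, split size, and seeds are assumptions and have nothing to do with the lab dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in (assumption): continuous random features, labels with no signal.
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(200, 3))
y_toy = rng.integers(0, 2, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x_toy, y_toy, test_size=0.3, random_state=0
)

knn1 = KNeighborsClassifier(n_neighbors=1).fit(x_train, y_train)

# On the training rows every point matches itself at distance 0,
# so the score is perfect even though the labels are random.
train_acc = knn1.score(x_train, y_train)
test_acc = knn1.score(x_test, y_test)

print("train accuracy:", train_acc)
print("held-out accuracy:", test_acc)
```

The gap between the two numbers is what a score of 1.0 on the fitted rows hides.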
KNeighborsClassifier(n_neighbors=3)
predict=modelKNN2.predict(x)
predict
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1])
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 0.7551020408163265
Accuracy: 0.7448979591836735
Recall: 0.74
F1 Score: 0.7474747474747475
Confusion Matrix:
[[72 24]
[26 74]]
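The four scores printed for k=3 can be recomputed by hand from the confusion matrix, which is a useful check that they belong together. The sketch below uses only the matrix printed above and scikit-learn's layout (rows are true labels, columns are predicted labels); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix printed above for KNeighborsClassifier(n_neighbors=3).
cm = np.array([[72, 24],
               [26, 74]])

# For binary labels the flattened order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)           # 74 / 98
recall = tp / (tp + fn)              # 74 / 100
accuracy = (tp + tn) / cm.sum()      # 146 / 196
f1 = 2 * precision * recall / (precision + recall)

print("Precision:", precision)
print("Accuracy:", accuracy)
print("Recall:", recall)
print("F1 Score:", f1)
```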
KNeighborsClassifier()
predict=modelKNN2.predict(x)
predict
array([1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0,
0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 0.7184466019417476
Accuracy: 0.7193877551020408
Recall: 0.74
F1 Score: 0.729064039408867
Confusion Matrix:
[[67 29]
[26 74]]
KNeighborsClassifier(n_neighbors=7)
predict=modelKNN2.predict(x)
predict
array([1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0,
0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1])
print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)
Precision: 0.7184466019417476
Accuracy: 0.7193877551020408
Recall: 0.74
F1 Score: 0.729064039408867
Confusion Matrix:
[[67 29]
[26 74]]
DecisionTreeClassifier()
predict=modelDT.predict(x)
predict
array([1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1])
precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)
print('Precision:', precision_test)
print('Accuracy:', accuracy_test)
print('Recall:', recall_test)
print('F1 Score:', f1_test)
print('Confusion Matrix:\n', confusion_test)
Precision: 1.0
Accuracy: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[ 96 0]
[ 0 100]]
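An unrestricted `DecisionTreeClassifier()` keeps splitting until its leaves are pure, so a perfect score on the fitted rows says little about new data. The sketch below compares an unrestricted tree with a depth-limited one on synthetic, noisy data. The data, the `max_depth=2` value, and the seeds are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): the label depends on one feature plus noise,
# so it is only partly learnable.
rng = np.random.default_rng(1)
x_toy = rng.normal(size=(300, 4))
y_toy = (x_toy[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x_toy, y_toy, test_size=0.3, random_state=1
)

full_tree = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(x_train, y_train)

# Report depth plus training and held-out accuracy for each tree.
for name, tree in [("unrestricted", full_tree), ("max_depth=2", shallow_tree)]:
    print(
        name,
        "| depth:", tree.get_depth(),
        "| train:", tree.score(x_train, y_train),
        "| held-out:", tree.score(x_test, y_test),
    )
```

The unrestricted tree memorizes the training rows; whether the shallower tree does better on held-out rows depends on the data, which is why the comparison should be run rather than assumed.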
The comparison between K-Nearest Neighbors (KNN) and the Decision Tree model highlights key differences in their behavior and
performance.
KNN (k=1): Accuracy = 1.0 (100%) (likely overfitting; scored on all 196 rows, not on a held-out set)
KNN (k=3): Accuracy = 0.7449 (74.49%)
KNN (k=5): Accuracy = 0.7194 (71.94%)
KNN (k=7): Accuracy = 0.7194 (71.94%)
Decision Tree: Accuracy = 1.0 (100%) (likely overfitting; scored on all 196 rows, not on a held-out set)
Final Thoughts:
If the dataset is small and well-separated, Decision Trees may be a better choice due to their interpretability.
If the dataset is large and complex, KNN with a well-chosen k-value can offer competitive results.
Hyperparameter tuning (pruning or depth limits for Decision Trees, k-value selection for KNN) is essential for optimal performance, and it should be judged on held-out data rather than on the 196 rows scored above.
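One way the tuning mentioned above could be done is a cross-validated grid search. The sketch below is an illustration under stated assumptions: the data come from `make_classification` rather than the lab dataset, and the candidate values for `n_neighbors`, `max_depth`, and `ccp_alpha` are arbitrary choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), roughly the size of the lab dataset.
x_toy, y_toy = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# k-value selection for KNN, scored by 5-fold cross-validated accuracy.
knn_search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
    scoring="accuracy",
)
knn_search.fit(x_toy, y_toy)

# Depth limits and cost-complexity pruning for the decision tree.
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5, None], "ccp_alpha": [0.0, 0.01]},
    cv=5,
    scoring="accuracy",
)
tree_search.fit(x_toy, y_toy)

print("KNN:", knn_search.best_params_, round(knn_search.best_score_, 3))
print("Tree:", tree_search.best_params_, round(tree_search.best_score_, 3))
```

Because each candidate is scored on folds it was not fit on, k=1 and an unrestricted tree no longer get a perfect score by construction.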