210596_ML_Labtask5.ipynb - Colab

The document outlines a machine learning lab task on an accident dataset, predicting survival from features such as age, gender, speed of impact, and safety-equipment usage. It covers preprocessing (label encoding of the categorical columns) and fits a K-Nearest Neighbors (KNN) classifier with several values of K, reporting precision, accuracy, recall, F1 score, and a confusion matrix for each, alongside a Decision Tree baseline. Because every model is evaluated on the same data it was trained on, KNN with K=1 and the Decision Tree score perfectly, while higher K values show reduced performance.



import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, ConfusionMatrixDisplay
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.tree import DecisionTreeClassifier as DT
# Imported but unused in the cells below:
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

dataset = pd.read_csv('/content/accident- DATASET ML Lab Task 3.csv')

dataset.head()

   Age  Gender  Speed_of_Impact Helmet_Used Seatbelt_Used  Survived
0   56  Female             27.0          No            No         1
1   69  Female             46.0          No           Yes         1
2   46    Male             46.0         Yes           Yes         0
3   32    Male            117.0          No           Yes         0
4   60  Female             40.0         Yes           Yes         0


dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Age              196 non-null    int64
 1   Gender           196 non-null    object
 2   Speed_of_Impact  196 non-null    float64
 3   Helmet_Used      196 non-null    object
 4   Seatbelt_Used    196 non-null    object
 5   Survived         196 non-null    int64
dtypes: float64(1), int64(2), object(3)
memory usage: 9.3+ KB

dataset.dtypes

Age                  int64
Gender              object
Speed_of_Impact    float64
Helmet_Used         object
Seatbelt_Used       object
Survived             int64
dtype: object

# Encode each categorical column as integers, keeping one encoder per column
# so the learned mappings can be inspected later. Survived is already 0/1
# (int64), so its encoding is a no-op kept for consistency.
encoder1 = LabelEncoder()
dataset['Gender'] = encoder1.fit_transform(dataset['Gender'])
encoder2 = LabelEncoder()
dataset['Helmet_Used'] = encoder2.fit_transform(dataset['Helmet_Used'])
encoder3 = LabelEncoder()
dataset['Seatbelt_Used'] = encoder3.fit_transform(dataset['Seatbelt_Used'])
encoder4 = LabelEncoder()
dataset['Survived'] = encoder4.fit_transform(dataset['Survived'])


dataset

     Age  Gender  Speed_of_Impact  Helmet_Used  Seatbelt_Used  Survived
0     56       0             27.0            0              0         1
1     69       0             46.0            0              1         1
2     46       1             46.0            1              1         0
3     32       1            117.0            0              1         0
4     60       0             40.0            1              1         0
..   ...     ...              ...          ...            ...       ...
191   69       0            111.0            0              1         1
192   30       0             51.0            0              1         1
193   58       1            110.0            0              1         1
194   20       1            103.0            0              1         1
195   56       0             43.0            0              1         1

[196 rows x 6 columns]


dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Age              196 non-null    int64
 1   Gender           196 non-null    int64
 2   Speed_of_Impact  196 non-null    float64
 3   Helmet_Used      196 non-null    int64
 4   Seatbelt_Used    196 non-null    int64
 5   Survived         196 non-null    int64
dtypes: float64(1), int64(5)
memory usage: 9.3 KB

encoder1.classes_

array([0, 1])

encoder1.transform(encoder1.classes_)

array([0, 1])

encoder2.classes_

array([0, 1])

encoder2.transform(encoder2.classes_)

array([0, 1])

(Note: classes_ showing integers rather than the original string labels such as 'Female'/'Male' suggests the encoding cell was re-run on already-encoded columns; on the raw strings, classes_ would list the label names.)

# Features and target
x = dataset.drop(columns=['Survived'])
y = dataset['Survived']
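
Note that train_test_split is imported above but never used: every model below is fit and then scored on the same 196 rows, which is why the K=1 and Decision Tree results come out perfect. A minimal sketch of a held-out evaluation (the 0.2 test fraction, random_state=42, and the choice of K=3 are arbitrary illustrative choices, not part of the original task):

# Hold out 20% of the rows so the metrics reflect unseen data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y)

model = KNN(n_neighbors=3)
model.fit(x_train, y_train)          # learn from the training split only
test_pred = model.predict(x_test)    # predict on rows the model never saw
print("Held-out accuracy:", accuracy_score(y_test, test_pred))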

For KNN as K=1


modelKNN2 = KNN(n_neighbors=1)
modelKNN2.fit(x, y)

KNeighborsClassifier(n_neighbors=1)


predict = modelKNN2.predict(x)   # predictions on the same rows used for fitting
predict

array([1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1])

precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)

print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)

Precision: 1.0
Accuracy: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[ 96 0]
[ 0 100]]

confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix=confusion_test, display_labels=[0, 1])
confusion_test_plot.plot()
plt.show()

For KNN as K=3


modelKNN2 = KNN(n_neighbors=3)
modelKNN2.fit(x, y)

KNeighborsClassifier(n_neighbors=3)

predict = modelKNN2.predict(x)
predict

array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1])

precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)

print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)

Precision: 0.7551020408163265
Accuracy: 0.7448979591836735
Recall: 0.74
F1 Score: 0.7474747474747475
Confusion Matrix:
[[72 24]
[26 74]]

confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix=confusion_test, display_labels=[0, 1])
confusion_test_plot.plot()
plt.show()

For KNN as K=5


modelKNN2 = KNN(n_neighbors=5)
modelKNN2.fit(x, y)

KNeighborsClassifier()

predict = modelKNN2.predict(x)
predict

array([1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0,

0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])

precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)

print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)

Precision: 0.7184466019417476
Accuracy: 0.7193877551020408
Recall: 0.74
F1 Score: 0.729064039408867
Confusion Matrix:
[[67 29]
[26 74]]

confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix=confusion_test, display_labels=[0, 1])
confusion_test_plot.plot()
plt.show()

For KNN as K=7


modelKNN2 = KNN(n_neighbors=7)
modelKNN2.fit(x, y)

KNeighborsClassifier(n_neighbors=7)

predict = modelKNN2.predict(x)
predict

array([1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0,
1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0,
0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,

1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1])

precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)

print("Precision:", precision_test)
print("Accuracy:", accuracy_test)
print("Recall:", recall_test)
print("F1 Score:", f1_test)
print("Confusion Matrix:\n", confusion_test)

Precision: 0.7184466019417476
Accuracy: 0.7193877551020408
Recall: 0.74
F1 Score: 0.729064039408867
Confusion Matrix:
[[67 29]
[26 74]]

confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix=confusion_test, display_labels=[0, 1])
confusion_test_plot.plot()
plt.show()

Decision Tree Model


modelDT = DT()
modelDT.fit(x, y)

DecisionTreeClassifier()

predict = modelDT.predict(x)
predict

array([1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1])

precision_test = precision_score(y, predict)
accuracy_test = accuracy_score(y, predict)
recall_test = recall_score(y, predict)
f1_test = f1_score(y, predict)
confusion_test = confusion_matrix(y, predict)

print('Precision:', precision_test)
print('Accuracy:', accuracy_test)
print('Recall:', recall_test)
print('F1 Score:', f1_test)
print('Confusion Matrix:\n', confusion_test)

Precision: 1.0
Accuracy: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[ 96 0]
[ 0 100]]

confusion_test_plot = ConfusionMatrixDisplay(confusion_matrix=confusion_test, display_labels=[0, 1])
confusion_test_plot.plot()
plt.show()

Conclusion: KNN vs. Decision Tree Model

The comparison between K-Nearest Neighbors (KNN) and the Decision Tree model highlights key differences in their behavior and performance.

1. KNN Model Performance:

- The choice of k significantly impacts KNN's accuracy.
- Lower k values (e.g., k=1, k=3) lead to higher variance (overfitting): the model can fit the training data almost perfectly yet generalize poorly to new data (see the sketch after this list).
- Higher k values (e.g., k=5, k=7) improve generalization by reducing sensitivity to noise, but the extra smoothing may lower accuracy slightly.
- KNN is computationally expensive at prediction time on large datasets, since it must compute distances to every stored training point.
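
As a hedged illustration of the variance claim, the loop below compares training and held-out accuracy for several k values, reusing the x_train/x_test split from the sketch earlier in the notebook (both the split and the k grid are assumptions for illustration, not part of the original task):

# Train vs. held-out accuracy for several k values (reuses the earlier split)
for k in (1, 3, 5, 7):
    knn = KNN(n_neighbors=k).fit(x_train, y_train)
    print(f"k={k}: train acc={knn.score(x_train, y_train):.3f}, "
          f"test acc={knn.score(x_test, y_test):.3f}")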

2. Decision Tree Model Performance:

- Decision Trees tend to be faster at prediction time and easier to interpret than KNN.
- However, they are prone to overfitting, especially if depth is not controlled; an unrestricted tree can memorize the training set, which is exactly what the perfect scores above show.
- Pruning or a depth limit can improve Decision Tree generalization (a sketch follows this list).
- Unlike KNN, Decision Trees do not rely on distance metrics, making them more robust to irrelevant features.
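
A minimal sketch of the depth-limit point, again reusing the earlier held-out split; max_depth=3 and random_state=0 are arbitrary illustrative choices (scikit-learn's ccp_alpha parameter offers cost-complexity pruning as an alternative):

# An unrestricted tree memorizes the training split; a shallow tree trades
# training accuracy for better generalization.
full_tree = DT(random_state=0).fit(x_train, y_train)
shallow_tree = DT(max_depth=3, random_state=0).fit(x_train, y_train)
print("unrestricted:", full_tree.score(x_train, y_train), full_tree.score(x_test, y_test))
print("max_depth=3: ", shallow_tree.score(x_train, y_train), shallow_tree.score(x_test, y_test))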

Accuracy Summary:

The models produced the following accuracy scores (all measured on the training data):

KNN (k=1): Accuracy = 1.0 (100%) (perfect only because it is scored on the training data; overfitting)
KNN (k=3): Accuracy ≈ 0.7449 (74.49%)
KNN (k=5): Accuracy ≈ 0.7194 (71.94%)
KNN (k=7): Accuracy ≈ 0.7194 (71.94%), identical summary metrics to k=5 on this data
Decision Tree: Accuracy = 1.0 (100%) (likely overfitting)

Comparison Based on k-Values:

k=1: memorizes the training data, causing overfitting.
k=3: balances fit and smoothing, yielding about 74.49% accuracy.
k=5 and k=7: accuracy settles at about 71.94%, showing less variance but reduced performance.
Decision Tree: perfect accuracy, likely due to overfitting.

Final Thoughts:
If the dataset is small and well separated, Decision Trees may be the better choice for their interpretability.
If the dataset is large and complex, KNN with a well-chosen k can offer competitive results.
Hyperparameter tuning (pruning for Decision Trees, k selection for KNN) is essential for optimal performance; a cross-validation sketch for choosing k follows below.
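
One hedged way to do that tuning for k is cross-validation; a minimal sketch using scikit-learn's cross_val_score (the odd-k grid and 5 folds are arbitrary choices, not from the original task):

from sklearn.model_selection import cross_val_score

# Mean 5-fold cross-validated accuracy for each candidate k
for k in range(1, 16, 2):
    scores = cross_val_score(KNN(n_neighbors=k), x, y, cv=5)
    print(f"k={k:2d}: mean CV accuracy = {scores.mean():.3f}")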

