Exp 5

import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from scipy.stats import skew

KNN classification algorithm
First we figure out the value of K by trial and error.

Steps

• Let k = 3, i.e. we find the three nearest datapoints to the query point using Euclidean
distance.
• If all three belong to the same class, the query point is also classified as that class.

• If k = 10,
• the nearest neighbours may come from two different classes,
• because the points in one class are limited, so the search looks to the other class to
complete the K neighbours,
• and we classify the point as the class with the greater number of points among those K.

• If k = 20,
• the prediction for the query point may sometimes be incorrect,
• because the larger neighbourhood can take in more datapoints from the other class, which
leads to a wrong classification.

So we have to choose K carefully. A small sketch of this majority-vote idea follows below.
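A minimal sketch of the majority-vote idea above, on a tiny made-up 2-D dataset (the points, labels, and the knn_predict helper are illustrative only, not part of the diabetes workflow):

import numpy as np
from collections import Counter

# Toy training data (illustrative only): 2-D points with class labels 0/1
X_toy = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                  [6.0, 9.0], [1.2, 0.5], [7.0, 9.5]])
y_toy = np.array([0, 0, 1, 1, 0, 1])

def knn_predict(query, X, y, k=3):
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(((X - query) ** 2).sum(axis=1))
    # Indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Majority vote among their class labels
    return Counter(y[nearest]).most_common(1)[0][0]

print(knn_predict(np.array([1.1, 1.0]), X_toy, y_toy, k=3))   # all 3 neighbours are class 0 -> 0
print(knn_predict(np.array([6.5, 9.0]), X_toy, y_toy, k=3))   # all 3 neighbours are class 1 -> 1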

data = pd.read_csv('diabetes-dataset.csv')
data.head(3)

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            2      138             62             35        0  33.6
1            0       84             82             31      125  38.2
2            0      145              0              0        0  44.2

   DiabetesPedigreeFunction  Age  Outcome
0                     0.127   47        1
1                     0.233   23        0
2                     0.630   31        1

data.isna().sum()

Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
print(data.isin({0}).sum())

Pregnancies 301
Glucose 13
BloodPressure 90
SkinThickness 573
Insulin 956
BMI 28
DiabetesPedigreeFunction 0
Age 0
Outcome 1316
dtype: int64

# Zero is not a valid measurement for these columns; replace zeros with the column median / mean
for col in ['BMI', 'Glucose', 'BloodPressure']:
    data[col] = data[col].replace({0: data[col].median()})

for col in ['Insulin', 'SkinThickness']:
    data[col] = data[col].replace({0: data[col].mean()})

def skewness(data):
    # Compute the skew of every numeric column and sort by absolute skew
    skew_df = pd.DataFrame(data.select_dtypes(np.number).columns, columns=['Feature'])
    skew_df['Skew'] = skew_df['Feature'].apply(lambda feature: skew(data[feature]))
    skew_df['Absolute Skew'] = skew_df['Skew'].apply(abs)
    return skew_df.sort_values(by='Absolute Skew', ascending=False).reset_index(drop=True)

skewness(data)

                    Feature      Skew  Absolute Skew
0                   Insulin  2.946441       2.946441
1  DiabetesPedigreeFunction  1.810620       1.810620
2             SkinThickness  1.575336       1.575336
3                       Age  1.180381       1.180381
4               Pregnancies  0.981629       0.981629
5                       BMI  0.936902       0.936902
6                   Outcome  0.666133       0.666133
7                   Glucose  0.515607       0.515607
8             BloodPressure  0.219439       0.219439

All of the features are noticeably skewed except Glucose and BloodPressure. Let us apply a
log1p transformation to reduce the skew (applied below to all the numeric features, including
the mildly skewed ones).

for col in ['Insulin', 'DiabetesPedigreeFunction', 'SkinThickness',
            'Age', 'Pregnancies', 'BMI', 'Glucose', 'BloodPressure']:
    data[col] = np.log1p(data[col])

from sklearn.model_selection import train_test_split

X = data.drop('Outcome', axis=1)
y = data['Outcome'].astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

Scaling

mean = X_train.mean()
std = X_train.std()   # use the training-set mean and std for both splits

X_train = (X_train - mean) / std
X_train = np.c_[np.ones(X_train.shape[0]), X_train]   # prepend a constant column of ones
X_test = (X_test - mean) / std
X_test = np.c_[np.ones(X_test.shape[0]), X_test]
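An equivalent way to do this standardisation, shown only as a sketch, is sklearn's StandardScaler: it fits the mean and std on the training split and reuses them on the test split. It assumes the raw, unscaled splits and does not add the extra column of ones.

from sklearn.preprocessing import StandardScaler

# Sketch of the same scaling step with sklearn (applied to the raw train/test splits)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit mean/std on training data only
X_test_scaled = scaler.transform(X_test)         # reuse the training statistics on the test data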

len(X_train)

1600

len(X_test)

400

Create KNN (K nearest neighbors classifier)


from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=21)
# default metric is minkowski with p=2, which is the Euclidean distance

knn.fit(X_train,y_train)

KNeighborsClassifier(n_neighbors=21)

knn.score(X_test,y_test)

0.83
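The choice n_neighbors=21 comes from the trial-and-error idea described at the start. A minimal sketch of that sweep, assuming the X_train/X_test/y_train/y_test arrays prepared above (the exact accuracies will vary):

# Sketch: evaluate accuracy over a range of K values to pick n_neighbors
scores = {}
for k in range(1, 31, 2):   # odd K values avoid ties in the majority vote
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])

In practice K would be tuned with cross-validation on the training split rather than on the test set, so that the final test score is not used twice.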

The confusion matrix tells us, class by class, which predictions we got right and which we got wrong.

from sklearn.metrics import confusion_matrix

# Predict the labels for the test set
y_pred = knn.predict(X_test)

# Generate confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, cmap='Blues', fmt='g', cbar=False)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.title('Confusion Matrix')
plt.show()

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.84      0.91      0.87       253
           1       0.82      0.69      0.75       147

    accuracy                           0.83       400
   macro avg       0.83      0.80      0.81       400
weighted avg       0.83      0.83      0.83       400

Precision: Precision is the ratio of correctly predicted positive observations (true positives) to the
total predicted positives (true positives + false positives). For class 0, precision is 0.84, and for
class 1, precision is 0.82. This means that 84% of the samples predicted as class 0 are actually
class 0, and 82% of the samples predicted as class 1 are actually class 1.

Recall: Recall, also known as sensitivity, is the ratio of correctly predicted positive observations
to all observations in the actual class (true positives + false negatives). For class 0, recall is 0.91,
and for class 1, recall is 0.69. This means that 91% of the actual class 0 samples are correctly
identified, and 69% of the actual class 1 samples are correctly identified.

F1-score: The F1-score is the harmonic mean of precision and recall, so it accounts for both false
positives and false negatives. For class 0, the F1-score is 0.87, and for class 1, the F1-score is
0.75. The weighted-average F1-score across both classes is 0.83.

Accuracy: Accuracy is the ratio of correctly predicted observations to the total observations. The
overall accuracy of the model is 0.83, which means that 83% of the predictions are correct.
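
As a check on these definitions, the same numbers can be recomputed directly from the confusion matrix cm produced above (a minimal sketch; the variable names follow the earlier cells):

# Sketch: recompute the class-1 metrics and the overall accuracy from the confusion matrix
tn, fp, fn, tp = cm.ravel()            # rows = true labels, columns = predicted labels

precision_1 = tp / (tp + fp)           # correct class-1 predictions / all class-1 predictions
recall_1 = tp / (tp + fn)              # correct class-1 predictions / all actual class-1 samples
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)   # harmonic mean
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2), round(accuracy, 2))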

Overall these metrics indicate a reasonably good model, although the lower recall for class 1 (0.69) means a noticeable share of the positive (diabetic) cases are still missed.
