Diabetic Prediction Using LogicalRegression

The document discusses building a logistic regression model to predict diabetes using health parameter data. It defines the problem, imports necessary libraries, builds the dataset by loading data, exploring it, and splitting it into training and test sets. Finally, it trains a logistic regression model on the training set.

Uploaded by

Yagnesh Vyas


logisticregression-1

June 8, 2023

1 Health Parameter Analysis and Diabetes Prediction using Logistic Regression
Logistic regression is a useful model for predicting binary outcomes, where there are only two
possible classes. It is commonly used because its coefficients are interpretable, it estimates class
probabilities, and it is computationally efficient. It makes fewer distributional assumptions than
some alternatives, but it models the log-odds as a linear function of the features, so non-linear
relationships have to be introduced through transformed or interaction features, and extreme values
can influence the fitted coefficients. It also supports regularization to reduce overfitting.
Logistic regression is a well-studied and established model in statistics and machine learning.
However, it may not be suitable for all classification problems, especially those with highly
nonlinear relationships.
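
As a minimal sketch of how the model turns a linear score into a probability, the snippet below applies the logistic (sigmoid) function to a weighted sum of two features. The weights, intercept, and feature values are made-up numbers for illustration only; they are not the coefficients of the model fitted later in this notebook.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and one hypothetical observation (illustration only)
weights = np.array([0.03, 0.08])
intercept = -5.0
x = np.array([120.0, 30.0])

log_odds = intercept + weights @ x      # linear part: b + w . x
probability = sigmoid(log_odds)         # probability of the positive class
predicted_class = int(probability >= 0.5)

print(f"log-odds = {log_odds:.2f}, probability = {probability:.3f}, class = {predicted_class}")
```

The 0.5 cut-off used here matches the default decision rule of a binary classifier's predict step; a different threshold could be chosen if false negatives and false positives have different costs.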

1.1 Define the Problem:

The problem is to create a logistic regression model that predicts whether a person has diabetes
from the health parameters in the provided dataset. The dataset contains eight feature columns and
an Outcome column, which records whether a person has diabetes (1) or not (0). The goal is to
train the model on the training set and evaluate its performance on the testing set using the
confusion matrix and accuracy score.

1.2 Importing necessary libraries:

[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

1.3 Build the dataset:

To build the dataset, you need to perform the following steps:

a) Load the dataset using pandas: Use the pandas library to load the dataset from the
‘diabetes.csv’ file

[2]: data = pd.read_csv("diabetes.csv")
data.head(5)

[2]: Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1

DiabetesPedigreeFunction Age Outcome

0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1

b) Exploring the dataset: Exploring the dataset is crucial for understanding its structure,
identifying missing values, data distribution, correlations, and outliers, as well as for making
informed decisions regarding data preprocessing and feature selection. It provides insights that
guide data analysis and model building.
[3]: data.shape # Checking the number of rows and columns in the dataset

[3]: (768, 9)

[4]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
data.info() - Checks the information about the dataset, including the number of entries, the
number of columns, column names, the data type of each column, and the non-null count per column
(which reveals missing values). It helps in understanding the structure and properties of the dataset.

[5]: data.describe()

[5]: Pregnancies Glucose BloodPressure SkinThickness Insulin \

count 768.000000 768.000000 768.000000 768.000000 768.000000
mean 3.845052 120.894531 69.105469 20.536458 79.799479
std 3.369578 31.972618 19.355807 15.952218 115.244002
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.000000 99.000000 62.000000 0.000000 0.000000
50% 3.000000 117.000000 72.000000 23.000000 30.500000
75% 6.000000 140.250000 80.000000 32.000000 127.250000
max 17.000000 199.000000 122.000000 99.000000 846.000000

BMI DiabetesPedigreeFunction Age Outcome

count 768.000000 768.000000 768.000000 768.000000
mean 31.992578 0.471876 33.240885 0.348958
std 7.884160 0.331329 11.760232 0.476951
min 0.000000 0.078000 21.000000 0.000000
25% 27.300000 0.243750 24.000000 0.000000
50% 32.000000 0.372500 29.000000 0.000000
75% 36.600000 0.626250 41.000000 1.000000
max 67.100000 2.420000 81.000000 1.000000

data.describe() - Checks the descriptive statistics for the numerical columns in the dataset. The
statistics include count, mean, standard deviation, minimum value, 25th percentile (Q1), median
(50th percentile or Q2), 75th percentile (Q3), and maximum value. It provides a summary of the
central tendency, spread, and distribution of the numerical data.
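
The quartiles reported above can also be used for a quick outlier check. As a rough sketch, the snippet below applies the common 1.5 x IQR rule (a convention chosen here for illustration, not something the original notebook uses) to the Insulin column, with the quartiles and maximum copied from the data.describe() output.

```python
# Quartiles and maximum for Insulin, copied from the data.describe() output
q1, q3, max_value = 0.0, 127.25, 846.0

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # values below this are flagged
upper_fence = q3 + 1.5 * iqr      # values above this are flagged

print("IQR:", iqr)
print("Upper fence:", upper_fence)
print("Maximum exceeds upper fence:", max_value > upper_fence)
```

Under this rule the Insulin maximum lies well above the upper fence, which suggests the column has a long right tail worth inspecting before modelling.
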
[6]: data.isnull().sum() # Checking if the dataset contains any null values.

[6]: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
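
The null check above only counts NaN entries. The descriptive statistics show a minimum of 0 for Glucose, BloodPressure, SkinThickness, Insulin, and BMI, which is not physiologically plausible, so zeros in these columns may stand in for missing measurements. The sketch below shows one way such zeros could be counted and recoded. It uses the five rows printed by data.head(5) as a small stand-in, and the column list zero_as_missing is an assumption introduced here, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# First five rows as printed by data.head(5); a small stand-in for the full dataset
sample = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Assumption: a zero in these columns means "not measured"
zero_as_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Count zeros per column
zero_counts = (sample[zero_as_missing] == 0).sum()
print(zero_counts)

# Recode zeros as NaN so that isnull() reports them
cleaned = sample[zero_as_missing].replace(0, np.nan)
null_counts = cleaned.isnull().sum()
print(null_counts)
```

How to handle the recoded values (dropping rows, imputing a median, or leaving them as is) is a modelling decision that this sketch does not make.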

[7]: num_duplicates = data.duplicated().sum() # Checking if the dataset contains any duplicate rows

if num_duplicates > 0:
    print(f"The dataset contains {num_duplicates} duplicate values")
    data = data.drop_duplicates() # drop_duplicates() returns a new DataFrame
    print("Number of duplicate values after dropping:", data.duplicated().sum())
else:
    print("The dataset doesn't contain any duplicate values.")

The dataset doesn't contain any duplicate values.

[8]: data.hist(figsize=(10, 8)) # Checking Data Distribution

plt.tight_layout()
plt.show()

c) Extract data from every column except the outcome column as a variable named
X: Extract the data from all columns except the ‘Outcome’ column and assign them to a variable
called X.
[9]: X = data.iloc[:,:-1]
X.head(5)

[9]: Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \

0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1

DiabetesPedigreeFunction Age
0 0.627 50
1 0.351 31
2 0.672 32
3 0.167 21
4 2.288 33

d) Extract data from the outcome column as a variable named Y: Extract the values
from the ‘Outcome’ column and assign them to a variable called Y.
[10]: Y = data.iloc[:,-1]
Y.head(5)

[10]: 0 1
1 0
2 1
3 0
4 1
Name: Outcome, dtype: int64
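
The positional slicing used for X and Y relies on Outcome being the last column. A hedged alternative is to select by column name, which keeps working if the column order changes. The sketch below uses a three-column subset of the rows from data.head(5) as a stand-in; the variable name sample is introduced here for illustration.

```python
import pandas as pd

# Small stand-in built from values printed by data.head(5)
sample = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "Outcome": [1, 0, 1, 0, 1],
})

X = sample.drop(columns="Outcome")   # every column except the target
Y = sample["Outcome"]                # the target column only

print(list(X.columns))
print(Y.tolist())
```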

e) Divide the dataset into two parts for training and testing: Split the dataset into a
training set and a testing set in a 70% - 30% proportion. This will be used to train the model on
the training set and evaluate its performance on the testing set.
[11]: X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.30, random_state = 51)

[12]: print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)

(537, 8)
(231, 8)
(537,)
(231,)
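
A plain random split does not guarantee that both parts contain the same share of positive cases. The mean of Outcome is about 0.349, which corresponds to 268 positives among 768 rows. As a sketch of one option, the stratify argument of train_test_split keeps the class proportions roughly equal in both parts. The features below are random stand-in values, and the names X_demo and Y_demo are introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same size and class balance as the dataset (268 positives of 768)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(768, 8))
Y_demo = np.array([1] * 268 + [0] * 500)

X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.30, random_state=51, stratify=Y_demo
)

print(X_tr.shape, X_te.shape)
print("Positive share overall:", Y_demo.mean())
print("Positive share in test set:", Y_te.mean())
```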

1.4 Train the model:

[13]: logistic = LogisticRegression()
logistic.fit(X_train, Y_train)

[13]: LogisticRegression()
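
LogisticRegression() with default settings uses the lbfgs solver with max_iter=100, and on features with very different scales the solver may stop before converging; because the notebook suppresses warnings, such a message would not be visible. A hedged variant, not the notebook's method, is to standardize the features in a pipeline and allow more iterations. The sketch below runs on synthetic data from make_classification so that it is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the diabetes features (768 rows, 8 columns)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=51)

# The scaler is fitted on the training data only, then applied to the test data
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

test_accuracy = model.score(X_te, y_te)
print("Test accuracy on synthetic data:", test_accuracy)
```

Wrapping the scaler and the classifier in one pipeline means the same preprocessing is applied automatically whenever predict is called.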

1.5 Evaluate the model:
[14]: Y_predict = logistic.predict(X_test)
print("Y_predict:\n",Y_predict)

Y_predict:
[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1
0 1 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0
1 0 0 0 1 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0 1 1 1 0
0 0 0 0 1 0 1 0 0]

[15]: print("Y_test:\n",Y_test)

Y_test:
737 0
505 0
296 1
711 0
329 0
..
405 0
315 0
131 1
364 0
322 1
Name: Outcome, Length: 231, dtype: int64

[16]: score = accuracy_score(Y_test, Y_predict)

print("Accuracy Score: ",score * 100)

Accuracy Score: 79.22077922077922

[17]: cm = confusion_matrix(Y_test, Y_predict) # separate name so the imported function is not overwritten

print("Confusion Matrix : \n",cm)

Confusion Matrix :
[[131 11]
[ 37 52]]
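
Accuracy alone hides how the errors are distributed. In scikit-learn's layout, rows are the actual classes and columns are the predicted classes, so the matrix above reads as 131 true negatives, 11 false positives, 37 false negatives, and 52 true positives. The sketch below derives a few further metrics from those four counts; the metric selection is an addition for illustration, not part of the original notebook.

```python
import numpy as np

# Confusion matrix copied from the notebook output: rows = actual, columns = predicted
cm = np.array([[131, 11],
               [37, 52]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()        # share of all predictions that are correct
precision = tp / (tp + fp)             # share of predicted positives that are correct
recall = tp / (tp + fn)                # share of actual positives that are found
specificity = tn / (tn + fp)           # share of actual negatives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy    = {accuracy:.4f}")
print(f"precision   = {precision:.4f}")
print(f"recall      = {recall:.4f}")
print(f"specificity = {specificity:.4f}")
print(f"F1 score    = {f1:.4f}")
```

The accuracy reproduces the 79.22% reported above, while the recall of roughly 58% shows that 37 of the 89 diabetic cases in the test set are missed, which matters for a screening task.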

1.6 Visually Understanding the performance of the model

[18]: # Create a figure and axis
fig, ax = plt.subplots()

# Plot the actual outcomes
ax.scatter(range(len(Y_test)), Y_test, color='blue', label='Actual Outcome')

# Plot the predicted outcomes
ax.scatter(range(len(Y_predict)), Y_predict, color='red', label='Predicted Outcome')

# Set axis labels and title

ax.set_xlabel('Data Point')
ax.set_ylabel('Outcome')
ax.set_title('Actual vs. Predicted Outcomes')

# Add a legend
ax.legend()

# Show the plot

plt.show()

1.7 Use the model:
Once the model is trained and evaluated, you can use it to make predictions on new, unseen data.
This can be done by providing new input values to the model and using the predict function to
obtain the predicted outcome.
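
A minimal sketch of that step follows. So that it runs on its own, it fits a stand-in model on the five rows printed by data.head(5), which is far too little data for a meaningful model; in the notebook the already fitted logistic object would be used instead. The values in new_patient are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Stand-in training data: the five rows printed by data.head(5)
X_small = pd.DataFrame([
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33],
], columns=columns)
Y_small = pd.Series([1, 0, 1, 0, 1], name="Outcome")

stand_in_model = LogisticRegression(max_iter=5000)
stand_in_model.fit(X_small, Y_small)

# Hypothetical new observation with the same columns, in the same order, as the training data
new_patient = pd.DataFrame([[2, 130, 70, 25, 100, 30.5, 0.45, 40]], columns=columns)

predicted_outcome = stand_in_model.predict(new_patient)           # class label, 0 or 1
predicted_proba = stand_in_model.predict_proba(new_patient)       # [P(class 0), P(class 1)]

print("Predicted outcome:", predicted_outcome[0])
print("Predicted probabilities:", predicted_proba[0])
```

predict_proba is often more informative than predict for this kind of task, because it reports how confident the model is rather than only the class label.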
