
Experiment - 2

Aim:- Heart Disease Prediction using Machine Learning.

Theory:- Predicting and diagnosing heart disease is one of the biggest challenges in the medical industry. It relies on factors such as the patient's physical examination, symptoms, and signs.

# Core libraries for data handling and plotting
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Render plots inline (Jupyter) and set a consistent plot style
%matplotlib inline
sns.set_style("whitegrid")
plt.style.use("fivethirtyeight")

# Load the dataset and preview the first five rows
df = pd.read_csv("heart.csv")
df.head()
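Note that %matplotlib inline is a Jupyter notebook magic; if this code is run as a plain Python script, that line should be removed and plt.show() used to display the figures.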

Output:-
# Bar chart of the class balance (target: 1 = heart disease, 0 = none)
df.target.value_counts().plot(kind="bar", color=["salmon", "lightblue"])

Output:-

We have 165 people with heart disease and 138 people without, so the classes are reasonably balanced.
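The 165/138 figures can be read directly from the counts behind that bar chart:

# Exact class counts
df.target.value_counts()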
# Checking for missing values
df.isna().sum()

Output:-

age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64

After checking for missing values, the dataset looks ready to use: no column contains any nulls.

# Split columns into categorical vs. continuous by their number of unique values
categorical_val = []
continous_val = []
for column in df.columns:
    print('==============================')
    print(f"{column} : {df[column].unique()}")
    if len(df[column].unique()) <= 10:
        categorical_val.append(column)
    else:
        continous_val.append(column)
Output:-
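Given the columns listed in the missing-value check above, this split should place sex, cp, fbs, restecg, exang, slope, ca, thal, and target in categorical_val, and age, trestbps, chol, thalach, and oldpeak in continous_val, which matches the col_to_scale list used in the preprocessing step below.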

# Histogram of each categorical feature, split by target class
plt.figure(figsize=(15, 15))
for i, column in enumerate(categorical_val, 1):
    plt.subplot(3, 3, i)
    df[df["target"] == 0][column].hist(bins=35, color='blue',
                                       label='Have Heart Disease = NO', alpha=0.6)
    df[df["target"] == 1][column].hist(bins=35, color='red',
                                       label='Have Heart Disease = YES', alpha=0.6)
    plt.legend()
    plt.xlabel(column)

Output:-
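At this point categorical_val still contains target along with the eight categorical features, so the nine histograms fill the 3x3 subplot grid exactly.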
# Create another figure
plt.figure(figsize=(10, 8))

# Scatter plot of the positive examples
plt.scatter(df.age[df.target == 1],
            df.thalach[df.target == 1],
            c="salmon")

# Scatter plot of the negative examples
plt.scatter(df.age[df.target == 0],
            df.thalach[df.target == 0],
            c="lightblue")

# Add some helpful info
plt.title("Heart Disease as a Function of Age and Max Heart Rate")
plt.xlabel("Age")
plt.ylabel("Max Heart Rate")
plt.legend(["Disease", "No Disease"]);

Output:-
Data Processing
# One-hot encode the categorical features, keeping 'target' as the label
categorical_val.remove('target')
dataset = pd.get_dummies(df, columns=categorical_val)

from sklearn.preprocessing import StandardScaler

# Standardize the continuous features to zero mean and unit variance
s_sc = StandardScaler()
col_to_scale = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
dataset[col_to_scale] = s_sc.fit_transform(dataset[col_to_scale])
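As a quick illustration of what get_dummies does here (a sketch; the exact output columns depend on the values present in the data):

# Illustrative sketch: 'cp' (chest pain type) takes a handful of discrete
# values, so one-hot encoding expands it into one indicator column per value,
# e.g. cp_0, cp_1, cp_2, cp_3.
pd.get_dummies(df[['cp']], columns=['cp']).head()

Standardizing the continuous columns afterwards keeps large-scale features such as chol (hundreds) from dominating small-scale ones such as oldpeak (single digits).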

# Now let's split the data into training and test sets,
# using 70% of the data for training and 30% for testing:

from sklearn.model_selection import train_test_split

# Features are every column except the label
X = dataset.drop('target', axis=1)
y = dataset.target

# 70/30 split; random_state fixes the shuffle so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=42)
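Because the classes are only mildly imbalanced (165 vs. 138), a plain random split is reasonable here; train_test_split also accepts a stratify=y argument if you want to preserve the class ratio in both splits.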

Now let's train the logistic regression model and print its classification report:
from sklearn.linear_model import LogisticRegression

# liblinear is a solid solver choice for a small dataset like this one
lr_clf = LogisticRegression(solver='liblinear')
lr_clf.fit(X_train, y_train)
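The two calls below rely on a print_score helper that is never defined in this document. A minimal sketch of such a helper, assuming it is meant to print the accuracy and classification report for either the training or the test split:

from sklearn.metrics import accuracy_score, classification_report

def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    # Pick the split to evaluate, then report accuracy and per-class metrics
    if train:
        split_name, X, y = "Train", X_train, y_train
    else:
        split_name, X, y = "Test", X_test, y_test
    pred = clf.predict(X)
    print(f"{split_name} Result:")
    print(f"Accuracy Score: {accuracy_score(y, pred) * 100:.2f}%")
    print(classification_report(y, pred))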

print_score(lr_clf, X_train, y_train, X_test, y_test, train=True)
print_score(lr_clf, X_train, y_train, X_test, y_test, train=False)

Output:-
from sklearn.metrics import accuracy_score

test_score = accuracy_score(y_test, lr_clf.predict(X_test)) * 100
train_score = accuracy_score(y_train, lr_clf.predict(X_train)) * 100

# Collect the scores in a small comparison table
results_df = pd.DataFrame(data=[["Logistic Regression", train_score, test_score]],
                          columns=['Model', 'Training Accuracy %',
                                   'Testing Accuracy %'])
results_df

Output:-
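Comparing the two columns is a quick overfitting check: a training accuracy far above the testing accuracy would suggest the model has memorized the training split rather than learned a general pattern.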
