0% found this document useful (0 votes)
36 views

Artificial Neural Networks: Supriya A Jadhav

This document discusses building an artificial neural network model to predict startup profit. It includes data preprocessing steps like outlier treatment and feature engineering. Exploratory data analysis is performed through univariate, bivariate and graphical analyses. Different ANN models are built using Keras and TensorFlow by experimenting with various activation functions.

Uploaded by

Jadhav A.S
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views

Artificial Neural Networks: Supriya A Jadhav

This document discusses building an artificial neural network model to predict startup profit. It includes data preprocessing steps like outlier treatment and feature engineering. Exploratory data analysis is performed through univariate, bivariate and graphical analyses. Different ANN models are built using Keras and TensorFlow by experimenting with various activation functions.

Uploaded by

Jadhav A.S
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 40

Artificial Neural Networks

Instructions:
Please share your answers filled in-line in the word document. Submit code
separately wherever applicable.

Please ensure you update all the details:


Name: Supriya A Jadhav
Batch ID: DSWDMCOH 081221 (071221)
Topic: Artificial Neural Networks

Grading Guidelines:
1. An assignment submission is considered complete only when correct and executable
code(s) are submitted along with the documentation explaining the method and
results. Failing to submit either of those will be considered an invalid submission and
will not be considered for evaluation.
2. Assignments submitted after the deadline will affect your grades.

Grading:
Ans Date Ans Date
Correct On time A 100
80% & above On time B 85 Correct Late
50% & above On time C 75 80% & above Late
50% & below On time D 65 50% & above Late
E 55 50% & below
Copied/No Submission F 45

● Grade A: (>= 90): When all assignments are submitted on or before the given
deadline.
● Grade B: (>= 80 and < 90):
o When assignments are submitted on time but less than 80% of problems are
completed.
(OR)
o All assignments are submitted after the deadline.

● Grade C: (>= 70 and < 80):


© 2013 - 2021 360DigiTMG. All Rights Reserved.
o When assignments are submitted on time but less than 50% of the problems
are completed.
(OR)
o Less than 80% of problems in the assignments are submitted after the
deadline.

● Grade D: (>= 60 and < 70):


o Assignments submitted after the deadline and with 50% or less problems.

● Grade E: (>= 50 and < 60):


o Less than 30% of problems in the assignments are submitted after the
deadline.
(OR)
o Less than 30% of problems in the assignments are submitted before the
deadline.

● Grade F: (< 50): No submission (or) malpractice.


Hints:

1.Business Problem
1.1. What is the business objective?
1.2. Are there any constraints?

2.Work on each feature of the dataset to create a data dictionary as


displayed in the below image:

2.1 Make a table as shown above and provide information about the features such as
its data type and its relevance to the model building. And if not relevant, provide
reasons and a description of the feature.

3.Data Pre-processing
3.1 Data Cleaning, Feature Engineering, etc.
© 2013 - 2021 360DigiTMG. All Rights Reserved.
3.2 Outlier Treatment if applicable.
4.Exploratory Data Analysis (EDA):
4.1. Summary.
4.2. Univariate analysis.
4.3. Bivariate analysis.

5.Model Building:
5.1Build an Artificial Neural Network model on the given datasets.
5.2Use TensorFlow and Keras packages.
5.3Briefly explain the output in the documentation for each step in your
own words.
5.4Use different activation functions to get the best model.
6.Write about the benefits/impact of the solution - in what way does the
business (client) benefit from the solution provided?

© 2013 - 2021 360DigiTMG. All Rights Reserved.


1. We have a dataset which contains the details of 50 startups. Build an ANN
model to predict the profit of a new startup based on certain features.

Business Problem
What is the business objective?
To find out what factors affect a startup company and if it will be profitable or
not.

Data Dictionaries:

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Name of Description Type Relevance
Feature
R&D Spend Total amount of Continuous Relevant(It’s our input
money spent on variable i.e.
Research and Independent variable)
Development by
the startup.
Administration Total amount of Continuous Relevant(It’s our input
money spent on variable i.e.
Administration by Independent variable
the startup.
Marketing Total amount of Relevant(It’s our input
Spend money spent on Continuous variable i.e.
Marketing by the Independent variable
startup.
State The State or region Nominal Relevant(It’s our input
in which the variable i.e.
startup is launched Independent variable
or operates.
Profit the Profit acquired Continuous Relevant(It’s our target
by the Startup. variable i.e. Dependent
Variable)

import pandas as pd

#loading the dataset


startup = pd.read_csv("C:\\Users\\ankush\\Desktop\\DataSets\ANN\\50_Startups
(2).csv")

#details of startup
startup.info()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


startup.describe()

#rename the columns


startup.rename(columns = {'R&D Spend':'rd_spend', 'Marketing Spend' : 'm_spend'}
, inplace = True)

#data types
startup.dtypes

#checking for na value


startup.isna().sum()
startup.isnull().sum()

#checking unique value for each columns


startup.nunique()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Above fig shows the unique value of each feature.

"""Exploratory Data Analysis (EDA):


Summary
Univariate analysis
Bivariate analysis """

EDA ={"column ": startup.columns,


"mean": startup.mean(),
"median":startup.median(),
"mode":startup.mode(),
"standard deviation": startup.std(),
"variance":startup.var(),
"skewness":startup.skew(),
"kurtosis":startup.kurt()}

EDA
'standard deviation': rd_spend 45902.256482
Administration 28017.802755
m_spend 122290.310726
Profit 40306.180338
dtype: float64,
'variance': rd_spend 2.107017e+09
Administration 7.849973e+08
m_spend 1.495492e+10
Profit 1.624588e+09
dtype: float64,
'skewness': rd_spend 0.164002
Administration -0.489025
m_spend -0.046472
Profit 0.023291
© 2013 - 2021 360DigiTMG. All Rights Reserved.
dtype: float64,
'kurtosis': rd_spend -0.761465
Administration 0.225071
m_spend -0.671701
Profit -0.063859

# covariance for data set


covariance = startup.cov()
covariance

# Correlation matrix
co = startup.corr()
co

according to correlation coefficient no correlation of Administration & State with


model_profit
####### graphistartup repersentation

##historgam and scatter plot


import seaborn as sns
sns.pairplot(startup.iloc[:, :])

© 2013 - 2021 360DigiTMG. All Rights Reserved.


According scatter plot strong correlation between model_profit and rd_spend and
also some relation between model_dffit and m_spend.

#boxplot for every columns


startup.columns
startup.nunique()

startup.boxplot(column=['rd_spend', 'Administration', 'm_spend', 'Profit']) #no


outlier

© 2013 - 2021 360DigiTMG. All Rights Reserved.


here we can see there is outlier For profit

# Detection of outliers (find limits for RM based on IQR)


IQR = startup['Profit'].quantile(0.75) - startup['Profit'].quantile(0.25)
lower_limit = startup['Profit'].quantile(0.25) - (IQR * 1.5)

####################### 2.Replace ############################


# Now let's replace the outliers by the maximum and minimum limit
#Graphical Representation
import numpy as np
import matplotlib.pyplot as plt # mostly used for visualization purposes

#startup['Profit']= pd.DataFrame( np.where(startup['Profit'] < lower_limit,


lower_limit, startup['Profit']))

import seaborn as sns


sns.boxplot(startup.Profit);plt.title('Boxplot');plt.show()

# rd_spend
plt.bar(height = startup.rd_spend, x = np.arange(1, 51, 1))
plt.hist(startup.rd_spend) #histogram
plt.boxplot(startup.rd_spend) #boxplot

© 2013 - 2021 360DigiTMG. All Rights Reserved.


# Administration
plt.bar(height = startup.Administration, x = np.arange(1, 51, 1))
plt.hist(startup.Administration) #histogram
plt.boxplot(startup.Administration) #boxplot

# m_spend
plt.bar(height = startup.m_spend, x = np.arange(1, 51, 1))
plt.hist(startup.m_spend) #histogram
plt.boxplot(startup.m_spend) #boxplot

© 2013 - 2021 360DigiTMG. All Rights Reserved.


#profit
plt.bar(height = startup.Profit, x = np.arange(1, 51, 1))
plt.hist(startup.Profit) #histogram
plt.boxplot(startup.Profit) #boxplot

# Jointplot

sns.jointplot(x=startup['Profit'], y=startup['rd_spend'])

© 2013 - 2021 360DigiTMG. All Rights Reserved.


# Q-Q Plot
from scipy import stats
import pylab

stats.probplot(startup.Profit, dist = "norm", plot = pylab)


plt.show()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


# startupfit is normally distributed

stats.probplot(startup.Administration, dist = "norm", plot = pylab)


plt.show()

# administration is normally distributed

© 2013 - 2021 360DigiTMG. All Rights Reserved.


stats.probplot(startup.rd_spend, dist = "norm", plot = pylab)
plt.show()

stats.probplot(startup.m_spend, dist = "norm", plot = pylab)


plt.show()

#normal

© 2013 - 2021 360DigiTMG. All Rights Reserved.


# Normalization function using z std. all are continuous data.
def norm_func(i):
x = (i-i.mean())/(i.std())
return (x)

# Normalized data frame (considering the numerical part of data)


df_norm = norm_func(startup.iloc[:,[0,1,2]])
df_norm.describe()

"""
from sklearn.preprocessing import OneHotEncoder
# creating instance of one-hot-encoder
enc = OneHotEncoder(handle_unknown='ignore')
sta=startup.iloc[:,[3]]
enc_df = pd.DataFrame(enc.fit_transform(sta).toarray())"""

# Create dummy variables on categorcal columns

enc_df = pd.get_dummies(startup.iloc[:,[3]])
enc_df.columns
enc_df.rename(columns={"State_New York":'State_New_York'},inplace= True)

model_df = pd.concat([enc_df, df_norm, startup.iloc[:,4]], axis =1)

# Rearrange the order of the variables


model_df = model_df.iloc[:, [6, 0,1, 2, 3,4,5]]

##################################
###upport Vector Machines MODEL###
"""
import numpy as np
np.random.seed(10)

X= model_df.iloc[:,1:]
Y= model_df.iloc[:,0]

© 2013 - 2021 360DigiTMG. All Rights Reserved.


from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,Y, test_size = 0.2,random_state =
457) # 20% test data

from tensorflow.keras import Sequential


from tensorflow.keras.layers import Dense
import sklearn.metrics as skl_mtc
from tensorflow import keras
import matplotlib.pyplot as plt

model = keras.models.Sequential()
model.add(keras.layers.Dense(5000, activation='relu', input_dim=6))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(500, activation='relu'))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(50, activation='relu'))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(1, kernel_initializer='uniform'))
model.compile(loss=keras.losses.MeanSquaredError(),
optimizer=keras.optimizers.Nadam(
learning_rate=0.009,
beta_1=0.8,
beta_2=0.999),metrics=["mse"])

early_stopping = keras.callbacks.EarlyStopping(
monitor='val_loss',
verbose=1,
patience=20,
mode='auto',
restore_best_weights=True)

reduce_lr = keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.2,
patience=10,
verbose=1,
mode='auto',
min_delta=0.0005,
© 2013 - 2021 360DigiTMG. All Rights Reserved.
cooldown=0,
min_lr=1e-6)

# fitting model on train data


model.fit(x=x_train,y=y_train,batch_size=2,epochs=100)

# Evaluating the model on test data


eval_score_test = model.evaluate(x_test,y_test,verbose = 1)

# accuracy on test data set

# accuracy score on train data


eval_score_train = model.evaluate(x_train,y_train,verbose=1)
predict_y = model.predict(x_test)

#R2-score
result = skl_mtc.r2_score(y_test, predict_y)
print(f'R2-score in test set: {np.round(result, 4)}')

R2-score in test set: -5.1253

# test residual values


# accuracy on train data set

pred_df = pd.DataFrame(predict_y, columns =['predict_y'])


pred_y= pred_df.iloc[:,0]
test_resid = pred_y - y_test

# RMSE value for test data


test_rmse = np.sqrt(np.mean(test_resid * test_resid))
test_rmse
Out[100]: 106012.58840077846

#graph for eppchs


history = model.fit(x_train, y_train, epochs=10, batch_size=2, verbose=1,
validation_split=0.2)

print(history.history.keys())

dict_keys(['loss', 'mse', 'val_loss', 'val_mse'])


© 2013 - 2021 360DigiTMG. All Rights Reserved.
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


2. We have a dataset about 517 fires from the Montesano natural park in
Portugal. For each incident, weekday, month, coordinates, and the burnt area
are recorded, as well as several meteorological data such as rain,
temperature, humidity, and wind. Predict the burnt area of forest fires with
the help of an Artificial Neural Network model.

Business Objective:
To predict the size of the burnt area in forest fires annually so that they can be
better prepared in future calamities.
© 2013 - 2021 360DigiTMG. All Rights Reserved.
Data Dictionaries:

Name of Description Type Relevance


Feature
Month Month of the Categorical Relevant
forestfire
Day Day of the forestfire Categorical Relevant
FFMC FFMC Continuous Relevant
DMC DMC Continuous Relevant
DC DC Continuous Irrelevant
ISI ISI Continuous Relevant
Temp Temp at the time Continuous Irrelevant
RH RH Continuous Irrelevant
Wind Wind Continuous Relevant
Rain Rain Continuous Relevant
Area Area Continuous Relevant
size_category size_category Categorical Relevant(It is
our Y variable)

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Layer, Lambda

forestfires =
pd.read_csv("C:\\Users\\ankush\\Desktop\\DataSets\\ANN\\fireforests.csv")

#As dummy variables are already created, we will remove the month and alsoday
columns
forestfires.drop(["month", "day"], axis = 1, inplace = True)

forestfires["area"] = np.where(forestfires["area"] > 50, 1, 0)

forestfires["area"].value_counts()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


forestfires.isnull().sum()
forestfires.describe()
#Normalization being done.
def norm_func(i):
x = (i - i.min()) / (i.max() - i.min())
return (x)

predictors = forestfires.iloc[ :, 0:8]


target = forestfires.iloc[ :, 8]

predictors1 = norm_func(predictors)
#data = pd.concat([predictors1,target],axis=1)

from sklearn.model_selection import train_test_split


x_train, x_test, y_train, y_test= train_test_split(predictors1, target, test_size = 0.2,
stratify = target)

def prep_model(hidden_dim):
model = Sequential()
for i in range(1, len(hidden_dim) - 1):
if (i == 1):
model.add(Dense(hidden_dim[i], input_dim = hidden_dim[0], activation =
"relu"))
else:
model.add(Dense(hidden_dim[i], activation = "relu"))
model.add(Dense(hidden_dim[-1], kernel_initializer = "normal", activation =
"sigmoid"))
model.compile(loss = "binary_crossentropy", optimizer = "rmsprop", metrics =
["accuracy"])
return model

first_model = prep_model([8, 50, 40, 20, 1])


first_model.fit(np.array(x_train), np.array(y_train), epochs = 750)
pred_train = first_model.predict(np.array(x_train))

#Converting the predicted values to series


pred_train = pd.Series([i[0] for i in pred_train])

© 2013 - 2021 360DigiTMG. All Rights Reserved.


size = ["small", "large"]
pred_train_class = pd.Series(["small"]*413)
pred_train_class[[i > 0.5 for i in pred_train]] = "large"

train = pd.concat([x_train, y_train], axis = 1)


train["area"].value_counts()

# Cheking with prediction for training data


from sklearn.metrics import confusion_matrix
train["original_class"] = "small"
train.loc[train["area"] == 1, "original_class"] = "large"
train.original_class.value_counts()
confusion_matrix(pred_train_class, train["original_class"])

By observing above confusion matrix we can say that there is 10 times large fires
and 392 times small fires.

np.mean(pred_train_class == pd.Series(train["original_class"]).reset_index(drop =
True))

The accuracy is 97.33%


pd.crosstab(pred_train_class,pd.Series(train["original_class"]).reset_index(drop =
True))

© 2013 - 2021 360DigiTMG. All Rights Reserved.


By observing above confusion matrix we can say that there is 10 times large fires
and 392 times small fires.

#For test data


pred_test = first_model.predict(np.array(x_test))
pred_test = pd.Series([i[0] for i in pred_test])
pred_test_class = pd.Series(["small"]*104)
pred_test_class[[i>0.5 for i in pred_test]] = "large"
test =pd.concat([x_test, y_test], axis = 1)
test["original_class"] = "small"
test.loc[test["area"] == 1, "original_class"] = "large"
test["original_class"].value_counts()

np.mean(pred_test_class==pd.Series(test["original_class"]).reset_index(drop =
True))

Accuracy is 93.36%
confusion_matrix(pred_test_class,test["original_class"])

© 2013 - 2021 360DigiTMG. All Rights Reserved.


pd.crosstab(pred_test_class,pd.Series(test["original_class"]).reset_index(drop =
True))

By observing above confusion matrix we can say that there is 0 times large fires
and 97 times small fires.

© 2013 - 2021 360DigiTMG. All Rights Reserved.


3. The following dataset consists of 1030 instances with 9 attributes and has no
missing values. There are 8 input variables and 1 output variable. Seven input
variables represent the amount of raw material (measured in kg/m³) and one
represents Age (in Days). The target variable is Concrete Compressive
Strength measured in (Mega Pascal). Build a Neural network model to predict
the compressive strength.

Business Problem
What is the business objective?
To predict the compressive strength of Concrete.

Data Dictionaries:
Name of Description Type Relevance
© 2013 - 2021 360DigiTMG. All Rights Reserved.
Feature
Cement One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable)
Slag One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable
Ash One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable
Water One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable
Superplastic One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable
Coarseagg One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable
Fineagg One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable
age One of the content Continuous Relevant(It’s our input
in concrete. variable i.e.
Independent variable
Strength One of the content Continuous Relevant(It’s our target
in concrete. variable i.e. Dependent
Variable)

import pandas as pd
from pathlib import Path
from sklearn import model_selection
from sklearn import preprocessing
import matplotlib.pyplot as plt
from keras import models, layers, metrics

import numpy as np
np.random.seed(22)

© 2013 - 2021 360DigiTMG. All Rights Reserved.


concrete_data=
pd.read_csv("C:\\Users\\ankush\\Desktop\\DataSets\\ANN\\concrete.csv")

print(concrete_data.head())

predictors = concrete_data.iloc[:,0:8].values
outcomes = concrete_data.iloc[:,8].values

min_max_scaler = preprocessing.MinMaxScaler()
predictors_scaled = min_max_scaler.fit_transform(predictors)
predictors_scaled[:5,]

X_train, X_test, y_train, y_test = model_selection.train_test_split(predictors,


outcomes, test_size=0.33, random_state=22)
print('X_train {0}, y_train {1}'.format(X_train.shape, y_train.shape))
print('X_test {0}, y_test {1}'.format(X_test.shape, y_test.shape))

network = models.Sequential()
network.add(layers.Dense(10, activation='relu', input_shape=(X_train.shape[1], )))
network.add(layers.Dense(5, activation='relu'))
network.add(layers.Dense(1))
network.compile(optimizer='adam',
loss='mean_squared_error')

fig, axes = plt.subplots(2, 5, figsize=(16,8), sharex=True, sharey=True)


losses = []
for i in range(2):
for j in range(5):
© 2013 - 2021 360DigiTMG. All Rights Reserved.
network.fit(X_train, y_train, epochs=50, batch_size=128, verbose=0);
pred_loss = network.evaluate(X_test, y_test, verbose=0)
losses.append(pred_loss)
preds = network.predict(X_test)
axes[i,j].scatter(preds, y_test, alpha=0.2)
axes[i,j].set_title('{0} epochs'.format((5*i+j+1)*50))
axes[i,j].set_ylabel('Actual')
axes[i,j].set_xlabel('Predicted')

fig, ax = plt.subplots(1, 1, figsize=(10, 5))


ax.plot(losses)

ax.set_title('Concrete Compressive Strength Regression Model Loss')


epochs = [str(i*50) for i in range(1, len(losses)+1)]
ax.set_xticks(range(len(losses)))
ax.set_xticklabels(epochs)
ax.set_xlabel('Epochs')
ax.set_ylabel('Mean Squared Error')
ax.text(len(losses)-2, losses[len(losses)-1]+10, 'Min MSE:
{0:.2f}'.format(losses[len(losses)-1]));

© 2013 - 2021 360DigiTMG. All Rights Reserved.


1. RPL Banking and Financing company wants to study the behavior patterns of
their customers so that they can efficiently provide their services and solve
the problem of churn. They have historical data of their customers. Build an
Artificial Neural Network with Exited as the target variable.

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Business Problem
What is the business objective?
To study the behavior patterns of their customers so that they can efficiently
provide their services and solve the problem of churn.
Data Dictionaries:
Name of Description Type Relevance
Feature
CustomerId CustomerId of the Continuous Irelevant
customer
Surname Surname of the Continuous Irelevant
customer
CreditScore CreditScore of the Continuous Relevant(It’s our input
customer variable i.e.
Independent variable
Geography Geography of the Continuous Relevant(It’s our input
customer variable i.e.
Independent variable
Gender Gender of the Continuous Relevant(It’s our input
customer variable i.e.
Independent variable
Age Age of the Continuous Relevant(It’s our input
customer variable i.e.
Independent variable
Tenure Tenure of the Continuous Relevant(It’s our input
customer variable i.e.
Independent variable
Balance Balance of the Continuous Relevant(It’s our input
customer variable i.e.
Independent variable
NumOfProducts NumOfProducts of Continuous Relevant(It’s our input
the customer variable i.e.
Independent variable
EstimatedSalary EstimatedSalary of Continuous Relevant(It’s our input
the customer variable i.e.
Independent variable
Exited Exited Continuous Relevant(It’s our target
variable i.e. Dependent
Variable)

import pandas as pd
© 2013 - 2021 360DigiTMG. All Rights Reserved.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
ann_data = pd.read_csv("C:\\Users\\ankush\\Desktop\\DataSets\\ANN\\RPL.csv")
ann_data.head()

ann_data.info()

ann_data.describe()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


sns.pairplot(data = ann_data, hue = "Exited")

sns.set(rc = {"figure.figsize": (8, 6)})


sns.boxplot(x="Exited", y="Age", data=ann_data)

© 2013 - 2021 360DigiTMG. All Rights Reserved.


sns.violinplot(x="Exited", y="NumOfProducts", data=ann, palette="Set1")

© 2013 - 2021 360DigiTMG. All Rights Reserved.


sns.boxplot(x="Geography", y="Exited", data=ann_data)

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Correlation

corr = ann_data.corr()
ax =
sns.heatmap(corr,annot=True,cmap='RdYlGn',linewidths=0.1,annot_kws={'size':12})
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)
fig=plt.gcf()
fig.set_size_inches(12,10)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Encoding Categorical Data
ann1 = pd.get_dummies(ann_data)

Input and Output variable

X = ann1.iloc[:, 3:13].values
y = ann1.iloc[:, 13].values
X, y

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Spliting
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
random_state = 0)

PreProcessing
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Building the Neural Network


import keras
from keras.models import Sequential
from keras.layers import Dense

# Initialising the ANN


classifier = Sequential()

# Adding the input layer and the first hidden layer


classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu',
input_dim = 10))

# Adding the second hidden layer


classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer


classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN


classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics =
['accuracy'])

© 2013 - 2021 360DigiTMG. All Rights Reserved.


# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 200)

# Predicting the Test set results


y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
y_pred

from sklearn.metrics import confusion_matrix


cm = confusion_matrix(y_test, y_pred)
cm
Out[42]: array([[2000]], dtype=int64)

ax =
sns.heatmap(cm,annot=True,cmap='RdYlGn',linewidths=0.1,annot_kws={'size':12})
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)
fig=plt.gcf()
fig.set_size_inches(12,10)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

© 2013 - 2021 360DigiTMG. All Rights Reserved.


As you can see there are no missclassified samples in the
Confusion Matrix. Therefor, the Accuracy Score = 0.1

print(f"Accuracy: {1*100}%")

Accuracy: 100%

Hence Accuracy of the model is 100% which is preety good

© 2013 - 2021 360DigiTMG. All Rights Reserved.

You might also like