
Lab 8: Feed Forward Neural Network

CLO-2

Part 1

Representing the feed-forward neural network using Python

Let us create the sample weights and biases that are applied at the input layer and at the first and second hidden layers.
import numpy as np
from sklearn import datasets
#
# Generate a dataset and plot it
#
np.random.seed(0)
X, y = datasets.make_moons(200, noise=0.20)
#
# Neural network architecture
# No of nodes in input layer = 2 (make_moons produces 2 features per sample)
# No of nodes in output layer = 3
# No of nodes in each hidden layer = 6
#
input_dim = 2 # input layer dimensionality; must match the number of features in X
output_dim = 3 # output layer dimensionality
hidden_dim = 6 # hidden layer dimensionality
#
# Weights and bias element for layer 1
# These weights are applied for calculating
# weighted sum arriving at neurons in 1st hidden layer
#
W1 = np.random.randn(input_dim, hidden_dim)
b1 = np.zeros((1, hidden_dim))
#
# Weights and bias element for layer 2
# These weights are applied for calculating
# weighted sum arriving at neurons in 2nd hidden layer
#
W2 = np.random.randn(hidden_dim, hidden_dim)
b2 = np.zeros((1, hidden_dim))
#
# Weights and bias element for layer 3
# These weights are applied for calculating
# weighted sum arriving at neurons in the final / output layer
#
W3 = np.random.randn(hidden_dim, output_dim)
b3 = np.zeros((1, output_dim))

Python code implementation for the propagation of the input signal through the different layers towards the output layer
#
# Forward propagation of input signals
# to 6 neurons in first hidden layer
# activation is calculated based on the tanh function
#
z1 = X.dot(W1) + b1
a1 = np.tanh(z1)
#
# Forward propagation of activation signals from first hidden layer
# to 6 neurons in second hidden layer
# activation is calculated based on the tanh function
#
z2 = a1.dot(W2) + b2
a2 = np.tanh(z2)
#
# Forward propagation of activation signals from second hidden layer
# to 3 neurons in output layer
#
z3 = a2.dot(W3) + b3
#
# Probability is calculated as an output
# of softmax function
#
probs = np.exp(z3) / np.sum(np.exp(z3), axis=1, keepdims=True)
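
We can verify the output of the softmax step: each row of probs sums to 1, and np.argmax gives the predicted class for every sample. A minimal check might look like this:

#
# Optional check on the forward pass above
#
print(probs.shape)                       # (200, 3): one probability vector per sample
print(probs.sum(axis=1)[:5])             # each row sums to 1 because of softmax
predictions = np.argmax(probs, axis=1)   # predicted class index for every sample
print(predictions[:10])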

Part 2

Now we will train a deep Artificial Neural Network (ANN) to better classify the datasets that the logistic regression model struggled with: Moons and Circles. We will also classify an even harder Sine Wave dataset to demonstrate that an ANN can form really complex decision boundaries.

1. Complex Data - Moons

While building Keras models for logistic regression above, we performed the following steps (a minimal sketch follows the list):

Step 1: Define a Sequential model.

Step 2: Add a Dense layer with sigmoid activation function. This was the
only layer we needed.

Step 3: Compile the model with an optimizer and loss function.

Step 4: Fit the model to the dataset.

Step 5: Analyze the results: plotting loss/accuracy curves, plotting the decision boundary, looking at the classification report, and understanding the confusion matrix.
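
For reference, a minimal sketch of that logistic regression workflow in tf.keras might look like the following; the optimizer choice and number of epochs are assumptions, since the lab text does not show them:

import numpy as np
from sklearn import datasets
from tensorflow import keras
from tensorflow.keras import layers

X, y = datasets.make_moons(200, noise=0.20)                     # toy dataset (Moons)
model = keras.Sequential()                                      # Step 1: Sequential model
model.add(layers.Dense(1, input_dim=2, activation='sigmoid'))   # Step 2: single Dense layer with sigmoid
model.compile(optimizer='adam',                                 # Step 3: optimizer + loss ('adam' assumed)
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X, y, epochs=100, verbose=0)                # Step 4: fit the model
# Step 5: analyze results, e.g. history.history['loss'] and history.history['accuracy']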

While building a deep neural network, we only need to change step 2: we add several Dense layers one after another. The output of
one layer becomes the input of the next. Keras again does most of the
heavy lifting by initializing the weights and biases, and by connecting the
output of one layer to the input of the next. We only need to specify how
many nodes we want in a given layer and the activation function. It's as
simple as that.
We first add a layer with 4 nodes and tanh activation function. Tanh is a
commonly used activation function. We then add another layer with 2
nodes again using tanh activation. We finally add the last layer with 1
node and sigmoid activation. This is the final layer that we also used in
the logistic regression model.
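
A minimal sketch of this network, reusing the imports and the Moons data from the sketch above; the layer sizes follow the description (4 tanh, 2 tanh, 1 sigmoid), while the optimizer and epoch count are assumptions:

model = keras.Sequential()
model.add(layers.Dense(4, input_dim=2, activation='tanh'))    # first hidden layer: 4 nodes, tanh
model.add(layers.Dense(2, activation='tanh'))                 # second hidden layer: 2 nodes, tanh
model.add(layers.Dense(1, activation='sigmoid'))              # output layer: 1 node, sigmoid
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=200, verbose=0)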

This is not a very deep ANN; it only has 3 layers: 2 hidden layers and the output layer. But notice a couple of patterns:

Output layer still uses the sigmoid activation function since we’re working
on a binary classification problem.

Hidden layers use the tanh activation function. If we added more hidden
layers, they would also use tanh activation. We have a couple of options
for activation functions: sigmoid, tanh, relu, and variants of relu.

We have fewer nodes in each subsequent layer. It's common to have fewer nodes as we stack layers on top of one another, giving a sort of triangular shape.

We didn’t build a very deep ANN here because it wasn’t necessary. We already achieve 100% accuracy with this configuration.

The ANN is able to come up with a perfect separator to distinguish the classes: 100% precision, nothing misclassified.

2. Complex Data - Circles

Now let’s look at the Circles dataset, where the LR model achieved only
50% accuracy. The model is the same as above; we only change the input
to the fit function to use the current dataset. And we again achieve 100%
accuracy.
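
A sketch of that change, assuming the same model-building code as above; the noise and factor values passed to make_circles are assumptions:

X, y = datasets.make_circles(200, noise=0.05, factor=0.3)     # new dataset: concentric circles
model.fit(X, y, epochs=200, verbose=0)                        # only the data passed to fit changes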
Similarly the decision boundary looks just like the one we would draw
by hand ourselves. The ANN was able to figure out an optimal
separator.

Just like above we get 100% accuracy.

3. Complex Data - Sine Wave


Let’s try to classify one final toy dataset. In the previous sections, the
classes were separable by one continuous decision boundary. The
boundary had a complex shape, it wasn’t linear, but one continuous
decision boundary was still enough. An ANN can draw an arbitrary number of
complex decision boundaries, and we will demonstrate that.

Let’s create a sinusoidal dataset that looks like the sine function, with every up
and down belonging to an alternating class. As we can see in the figure, a
single decision boundary won’t be able to separate out the classes. We
will need a series of non-linear separators.
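
The lab does not show the data-generation code for this dataset, so the helper below is only an illustrative sketch; the name make_sine_wave, the point count, and the noise level are assumptions:

def make_sine_wave(n_points=2400, noise=0.2):
    # Points scattered around a sine curve; every "up" and "down" hump
    # (each half-period) is assigned to an alternating class.
    x = np.random.uniform(0, 4 * np.pi, n_points)
    y_coord = np.sin(x) + np.random.normal(0, noise, n_points)
    labels = (np.floor(x / np.pi) % 2).astype(int)            # class alternates every half-period
    return np.column_stack([x, y_coord]), labels

X, y = make_sine_wave()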

Now we need a more complex model for accurate classification. So we
have 3 hidden layers and an output layer. The number of nodes per layer
has also increased to improve the learning capacity of the model.
Choosing the right number of hidden layers and nodes per layer is more of
an art than a science, usually decided by trial and error.
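
A sketch of such a model; the exact node counts per layer are not given in the lab, so the values below are assumptions:

model = keras.Sequential()
model.add(layers.Dense(32, input_dim=2, activation='tanh'))   # hidden layer 1 (node count assumed)
model.add(layers.Dense(32, activation='tanh'))                # hidden layer 2
model.add(layers.Dense(32, activation='tanh'))                # hidden layer 3
model.add(layers.Dense(1, activation='sigmoid'))              # output layer for the binary classes
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=300, verbose=0)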
The ANN was able to model a pretty complex set of decision boundaries.

Precision is 99%; we only have 14 misclassified points out of 2400. Pretty good.

4. Multiclass Classification
In the previous sections we worked on binary classification. Now we will
take a look at a multi-class classification problem, where the number of
classes is more than 2. We will pick 3 classes for demonstration, but our
approach generalizes to any number of classes.

Here’s what our dataset looks like: spiral data with 3 classes, generated by a make_multiclass helper function (this is not one of scikit-learn's built-in generators).
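
The exact generator is not shown in the lab; an illustrative spiral generator (adapted from the classic CS231n spiral example, with the parameter values as assumptions) could look like this:

def make_multiclass(points_per_class=500, num_classes=3):
    # Spiral data: each class forms one arm of a spiral.
    X = np.zeros((points_per_class * num_classes, 2))
    y = np.zeros(points_per_class * num_classes, dtype=int)
    for c in range(num_classes):
        ix = range(points_per_class * c, points_per_class * (c + 1))
        r = np.linspace(0.0, 1.0, points_per_class)                   # radius
        t = np.linspace(c * 4, (c + 1) * 4, points_per_class) + np.random.randn(points_per_class) * 0.2  # angle + noise
        X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
        y[ix] = c
    return X, y

X, y = make_multiclass()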

Softmax Regression

As we know, Logistic Regression (LR) is a classification method for 2 classes. It works with binary labels 0/1. Softmax Regression (SR) is a
generalization of LR where we can have more than 2 classes. In our
current dataset we have 3 classes, represented as 0/1/2.

Activation function: SR uses softmax. Softmax scales the values of the output nodes such that they represent probabilities and sum up to 1. So in
our case P(class=0) + P(class=1) + P(class=2) = 1. It doesn’t do this
naively by dividing each value by the sum, though; it uses
the exponential function, so higher values get emphasized more and
lower values get squashed more. We will talk in detail about what softmax does
in another tutorial. For now you can simply think of it as a normalization
function which lets us interpret the output values as probabilities.
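
For example, raw output values (logits) of [2.0, 1.0, 0.1] are mapped by softmax to approximately [0.66, 0.24, 0.10], which sum to 1:

z = np.array([2.0, 1.0, 0.1])             # raw output values (logits) for 3 classes
softmax = np.exp(z) / np.sum(np.exp(z))   # exponentiate, then normalize
print(softmax)                            # approx. [0.66 0.24 0.10], sums to 1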

Loss function: In a binary classification problem, the loss function is binary_crossentropy. In the multiclass case, the loss function is
categorical_crossentropy. Categorical crossentropy is the generalization of
binary crossentropy to more than 2 classes.
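
A minimal Keras sketch of Softmax Regression on this dataset: a single Dense layer with 3 output nodes and softmax activation, compiled with categorical_crossentropy (the labels must be one-hot encoded for this loss); the optimizer and epoch count are assumptions:

from tensorflow.keras.utils import to_categorical

y_cat = to_categorical(y, num_classes=3)                            # one-hot labels for the 3 classes
sr_model = keras.Sequential()
sr_model.add(layers.Dense(3, input_dim=2, activation='softmax'))    # a single layer: linear model + softmax
sr_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
sr_model.fit(X, y_cat, epochs=50, verbose=0)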

Training the model gives us an accuracy of around 50%. The most naive
method, which always predicts class 1 no matter what the input is, would
have an accuracy of 33%. The SR model is not much of an improvement
over it, which is expected because the dataset is not linearly separable.

Looking at the decision boundary confirms that we still have a linear classifier. The lines look jagged due to floating point rounding, but in reality
they’re straight.

Here’s the precision and recall corresponding to the 3 classes. And the
confusion matrix is all over the place. Clearly this is not an optimal
classifier.
5. Deep ANN

Now let’s build a deep ANN for multiclass classification. We will do the
same thing again, adding a couple of Dense layers with the tanh activation
function.

Note that the output layer still has 3 nodes, and uses the softmax
activation. The loss function also didn’t change; it's still
categorical_crossentropy. These won’t change going from a linear model
to a deep ANN, since the problem definition hasn’t changed. We’re still
working on multiclass classification, but now we are using a more powerful
model, and that power comes from adding more layers to our neural net.
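
A sketch of such a model, reusing the one-hot labels from the Softmax Regression sketch above; the hidden layer sizes and epoch count are assumptions:

deep_model = keras.Sequential()
deep_model.add(layers.Dense(32, input_dim=2, activation='tanh'))    # hidden layer 1 (node count assumed)
deep_model.add(layers.Dense(32, activation='tanh'))                 # hidden layer 2
deep_model.add(layers.Dense(3, activation='softmax'))               # output layer: 3 nodes, softmax
deep_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
deep_model.fit(X, y_cat, epochs=50, verbose=0)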

We achieve 99% accuracy in just a couple of epochs.


The decision boundary is non-linear.

We got almost 100% accuracy; in total we misclassified only 5 points out of 1500.
