Lab 7
CLO-2
Part 1
Let us create sample weights and biases that will be used to compute the
weighted sums arriving at the first hidden layer, the second hidden layer, and the output layer.
import numpy as np
from sklearn import datasets
#
# Generate a dataset and plot it
#
np.random.seed(0)
X, y = datasets.make_moons(200, noise=0.20)
#
# Neural network architecture
# No of nodes in input layer = 4
# No of nodes in output layer = 3
# No of nodes in each of the two hidden layers = 6
#
input_dim = 4 # input layer dimensionality
output_dim = 3 # output layer dimensionality
hidden_dim = 6 # hidden layer dimensionality
#
# Weights and bias element for layer 1
# These weights are applied for calculating
# weighted sum arriving at neurons in 1st hidden layer
#
W1 = np.random.randn(input_dim, hidden_dim)
b1 = np.zeros((1, hidden_dim))
#
# Weights and bias element for layer 2
# These weights are applied for calculating
# weighted sum arriving at neurons in 2nd hidden layer
#
W2 = np.random.randn(hidden_dim, hidden_dim)
b2 = np.zeros((1, hidden_dim))
#
# Weights and bias element for layer 3
# These weights are applied for calculating
# weighted sum arriving at neurons in the final / output layer
#
W3 = np.random.randn(hidden_dim, output_dim)
b3 = np.zeros((1, output_dim))
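
To see how these weights are used, here is a minimal sketch of a single forward pass through the network. The tanh activation in the hidden layers and the softmax at the output are assumptions for illustration (Part 1 only defines the weights and biases), and a dummy batch with input_dim features is used since the moons data has only 2 features.

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

X_demo = np.random.randn(5, input_dim)   # dummy batch of 5 samples

a1 = np.tanh(X_demo.dot(W1) + b1)        # weighted sum + activation, 1st hidden layer
a2 = np.tanh(a1.dot(W2) + b2)            # weighted sum + activation, 2nd hidden layer
probs = softmax(a2.dot(W3) + b3)         # output layer: probabilities over 3 classes
print(probs.shape)                       # (5, 3)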
Part 2
Now we will train a deep Artificial Neural Network (ANN) to better classify
the datasets the logistic regression model struggled with: Moons and
Circles. We will also classify an even harder sine-wave dataset to
demonstrate that an ANN can form really complex decision boundaries.
Recall Step 2 from the logistic regression model: add a Dense layer with a
sigmoid activation function. That was the only layer we needed.
While building a deep neural network, we only need to change Step 2: we
add several Dense layers one after another. The output of
one layer becomes the input of the next. Keras again does most of the
heavy lifting by initializing the weights and biases, and connecting the
output of one layer to the input of the next. We only need to specify how
many nodes we want in a given layer, and the activation function. It’s as
simple as that.
We first add a layer with 4 nodes and tanh activation function. Tanh is a
commonly used activation function. We then add another layer with 2
nodes again using tanh activation. We finally add the last layer with 1
node and sigmoid activation. This is the final layer that we also used in
the logistic regression model.
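
In Keras this stack of layers can be written as the following sketch. The tensorflow.keras import path, the 2-feature input shape (the Moons data has two features), and the adam optimizer are assumptions; the layer sizes and activations follow the description above.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='tanh'))   # 1st hidden layer
model.add(Dense(2, activation='tanh'))                      # 2nd hidden layer
model.add(Dense(1, activation='sigmoid'))                   # output layer
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])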
This is not a very deep ANN; it only has 3 layers: 2 hidden layers and the
output layer. But notice a couple of patterns:
The output layer still uses the sigmoid activation function, since we’re working
on a binary classification problem.
The hidden layers use the tanh activation function. If we added more hidden
layers, they would also use tanh activation. We have a couple of options
for activation functions: sigmoid, tanh, relu, and variants of relu.
Now let’s look at the Circles dataset, where the LR model achieved only
50% accuracy. The model is the same as above; we only change the input
to the fit function to use the current dataset. And we again achieve 100%
accuracy.
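
A sketch of that change, using scikit-learn's make_circles; the noise and factor values and the number of epochs are assumptions.

from sklearn.datasets import make_circles

X_circles, y_circles = make_circles(n_samples=200, noise=0.05, factor=0.3)
model.fit(X_circles, y_circles, epochs=100, verbose=0)   # same model as above, new data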
Similarly, the decision boundary looks just like the one we would draw
by hand ourselves. The ANN was able to figure out an optimal
separator.
Let’s create a sinusoidal dataset that looks like the sine function, with every up
and down belonging to an alternating class. As we can see in the figure, a
single decision boundary won’t be able to separate the classes. We
will need a series of non-linear separators.
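
One way to construct such a dataset is sketched below; the exact recipe (number of points, noise level, and labelling every half-period alternately) is an assumption for illustration.

import numpy as np

np.random.seed(0)
n = 500
x = np.random.uniform(0, 4 * np.pi, n)                       # spread over several periods
y_coord = np.sin(x) + np.random.normal(scale=0.15, size=n)   # points scattered around sin(x)
X_sine = np.column_stack([x, y_coord])
y_sine = (np.floor(x / np.pi) % 2).astype(int)               # alternate the class every half-period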
4. Multiclass Classification
In the previous sections we worked on binary classification. Now we will
take a look at a multi-class classification problem, where the number of
classes is more than 2. We will pick 3 classes for demonstration, but our
approach generalizes to any number of classes.
Here’s what our dataset looks like: spiral data with 3 classes, generated
with a make_multiclass helper method.
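
Such a helper could be written along the following lines; the spiral construction below (sample counts, noise level, one arm per class) is an assumption for illustration, since make_multiclass is not a standard scikit-learn function.

import numpy as np

def make_multiclass(n_samples=500, n_classes=3, noise=0.2):
    # Build a 2D spiral dataset with one arm per class
    per_class = n_samples // n_classes
    X = np.zeros((per_class * n_classes, 2))
    y = np.zeros(per_class * n_classes, dtype=int)
    for c in range(n_classes):
        ix = range(per_class * c, per_class * (c + 1))
        r = np.linspace(0.0, 1.0, per_class)                                        # radius
        t = np.linspace(c * 4, (c + 1) * 4, per_class) + np.random.randn(per_class) * noise  # angle
        X[ix] = np.column_stack([r * np.sin(t), r * np.cos(t)])
        y[ix] = c
    return X, y

X_spiral, y_spiral = make_multiclass()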
Softmax Regression
Training the model gives us an accuracy of around 50%. The most naive
method, which always predicts class 1 no matter what the input is, would
have an accuracy of 33%. The softmax regression (SR) model is not much of an
improvement over it, which is expected because the dataset is not linearly separable.
Here are the precision and recall values corresponding to the 3 classes. And the
confusion matrix is all over the place. Clearly this is not an optimal
classifier.
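
For reference, the softmax regression model here is just a single Dense layer with 3 output nodes and softmax activation. The sketch below uses the hypothetical X_spiral, y_spiral from the helper above; the optimizer, epoch count, and one-hot encoding step are assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

y_cat = to_categorical(y_spiral, num_classes=3)                   # one-hot encode the 3 class labels

sr_model = Sequential()
sr_model.add(Dense(3, input_shape=(2,), activation='softmax'))    # linear model followed by softmax
sr_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
sr_model.fit(X_spiral, y_cat, epochs=50, verbose=0)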
5. Deep ANN
Now let’s build a deep ANN for multiclass classification. We will do the
same as before, adding a couple of Dense layers with the tanh activation
function.
Note that the output layer still has 3 nodes and uses the softmax
activation. The loss function also didn’t change; it is still
categorical_crossentropy. These won’t change going from a linear model
to a deep ANN, since the problem definition hasn’t changed: we’re still
working on multiclass classification. But we are now using a more powerful
model, and that power comes from adding more layers to our neural net.
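
A sketch of this deep multiclass model is below, again reusing the hypothetical X_spiral and y_cat from the earlier sketches; the hidden-layer sizes and training settings are assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

deep_model = Sequential()
deep_model.add(Dense(32, input_shape=(2,), activation='tanh'))    # 1st hidden layer
deep_model.add(Dense(32, activation='tanh'))                      # 2nd hidden layer
deep_model.add(Dense(3, activation='softmax'))                    # output layer: 3 classes
deep_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
deep_model.fit(X_spiral, y_cat, epochs=100, verbose=0)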