Lecture 4
$f(b_0, b_1) = \sum_i \varepsilon_i^2 = \sum_i \left[ y_i - (b_0 + b_1 x_i) \right]^2$ – least-squares objective for a linear fit
In a neural network, the weights are usually found using an optimization algorithm that
minimizes a chosen loss function. In the case of Mean Squared Error (MSE) loss, the
objective is to minimize the average squared difference between the predicted output and
the true output for a given set of input data.
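As a concrete sketch, the MSE objective can be written in a couple of lines of NumPy (the names and data here are illustrative):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Average squared difference between predicted and true outputs
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse_loss(y_pred, y_true))  # (0.01 + 0.04 + 0.09) / 3 ≈ 0.0467
```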
Here's a general overview of how the weights are updated in a neural network using MSE
loss:
1. Initialize the weights of the neural network randomly.
2. Forward pass: Feed the input data through the neural network and obtain the
predicted output.
3. Compute the MSE loss between the predicted output and the true output.
4. Backward pass: Calculate the gradient of the loss with respect to each weight in the
network using backpropagation.
5. Use an optimization algorithm such as Stochastic Gradient Descent (SGD) or
Adam to update the weights in the direction that reduces the loss (opposite to the
gradient). The size of each weight update is controlled by the learning rate
hyperparameter.
6. Repeat steps 2-5 for multiple epochs (passes through the entire dataset) until the loss
converges to a minimum, stops improving significantly, or other stopping conditions
are satisfied.
During the training process, the weights are adjusted in a way that minimizes the MSE
loss between the predicted output and the true output. This is done by iteratively updating
the weights in the direction of steepest descent of the loss function until convergence.
Once the weights have converged to a minimum, the neural network can be used to make
predictions on new, unseen data.
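A minimal sketch of steps 1-6 for a single linear neuron trained on MSE (full-batch gradient descent for simplicity; the dataset, learning rate, and epoch count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: targets follow y = 2*x1 - 3*x2 + 1
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1

w = rng.normal(size=2)   # 1. initialize the weights randomly
b = 0.0
lr = 0.1                 # learning rate hyperparameter

for epoch in range(200):                # 6. repeat for multiple epochs
    y_pred = X @ w + b                  # 2. forward pass
    err = y_pred - y
    loss = np.mean(err ** 2)            # 3. MSE loss
    grad_w = 2 * X.T @ err / len(y)     # 4. gradient of the loss w.r.t. weights
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w                    # 5. step opposite to the gradient
    b -= lr * grad_b

print(loss, w, b)  # loss approaches 0; w recovers [2, -3], b recovers 1
```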
Artificial Neural Networks (ANN)
$h^{(1)} = f(W^{(1)} x + b^{(1)})$ – outputs of first hidden layer
$h^{(2)} = f(W^{(2)} h^{(1)} + b^{(2)})$ – outputs of second hidden layer
$s = W^{(3)} h^{(2)} + b^{(3)}$ – output (one output neuron – scalar number output)
$s = W^{(3)} f\left(W^{(2)} f\left(W^{(1)} x + b^{(1)}\right) + b^{(2)}\right) + b^{(3)}$
Input is a [3x1] vector
The weights W(1), of size [4x3], form the matrix with the connections into the first hidden
layer, and the biases are collected in the vector b(1), of size [4x1].
A single neuron has its weights in one row of W(1).
Matrix-vector multiplication evaluates the activations of all neurons in that layer.
W(2) is a [4x4] matrix with the connections between the hidden layers, and W(3) a [1x4]
matrix for the last (output) layer.
The full forward pass is simply three matrix multiplications and applications of the
activation functions.
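This forward pass can be sketched directly in NumPy (random values stand in for trained weights, and a sigmoid is assumed for the activation f):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda z: 1.0 / (1.0 + np.exp(-z))  # activation function (sigmoid assumed)

x = rng.normal(size=(3, 1))             # input: [3x1] vector
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 1))
W3, b3 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))

h1 = f(W1 @ x + b1)   # first hidden layer:  [4x1]
h2 = f(W2 @ h1 + b2)  # second hidden layer: [4x1]
s = W3 @ h2 + b3      # output:              [1x1] scalar
print(s)
```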
The size of the network is usually given by the number of parameters and the number of
layers.
This network has 4 + 4 + 1 = 9 neurons, [4 x 3] + [4 x 4] + [1 x 4] = 12 + 16 + 4 = 32
weights and 4 + 4 + 1 = 9 biases, for a total of 41 learnable parameters.
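The count can be checked directly from the weight shapes used above:

```python
shapes = [(4, 3), (4, 4), (1, 4)]        # W(1), W(2), W(3)
weights = sum(r * c for r, c in shapes)  # 12 + 16 + 4 = 32
biases = sum(r for r, _ in shapes)       # 4 + 4 + 1 = 9
print(weights + biases)                  # 41 learnable parameters
```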
Activation Functions
An activation function in a neural network defines how the weighted sum of the input is
transformed into an output from a node or nodes in a layer of the network.
It decides whether a neuron should be activated or not. This means that it decides
whether the neuron's input to the network is important or not in the process of prediction.
Sometimes the activation function is called a “transfer function.” If the output range of
the activation function is limited, then it may be called a “squashing function.” Many
activation functions are nonlinear and may be referred to as the “nonlinearity” in the layer
or the network design.
The choice of activation function has a large impact on the capability and performance of
the neural network, and different activation functions may be used in different parts of
the model.
Technically, the activation function is applied within or after the internal processing of each
node in the network, although networks are typically designed to use the same activation
function for all nodes in a layer.
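A few common choices, written out as plain functions (a sketch; which ones a given network uses is a design decision):

```python
import numpy as np

def sigmoid(z):
    # Squashes the input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes the input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Passes positive inputs through, zeroes out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```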
A network may have three types of layers: input layers that take raw input from the domain,
hidden layers that take input from another layer and pass output to another layer, and output
layers that make a prediction.
All hidden layers typically use the same activation function. The output layer will
typically use a different activation function from the hidden layers and is dependent upon
the type of prediction required by the model.
Activation functions are also typically differentiable, meaning the first-order derivative
can be calculated for a given input value. This is required given that neural networks are
typically trained using the backpropagation of error algorithm that requires the derivative
of prediction error in order to update the weights of the model.
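For example, the derivatives of the activations above have simple closed forms that backpropagation can evaluate cheaply (a sketch; ReLU uses a subgradient convention at 0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_prime(z):
    # 1 for z > 0, 0 otherwise (subgradient at z = 0)
    return (z > 0).astype(float)

z = np.array([-1.0, 0.5, 2.0])
print(sigmoid_prime(z), relu_prime(z))
```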
There are many different types of activation functions used in neural networks, although
perhaps only a small number of them are used in practice for hidden and output layers.
$s = W^{(3)} W^{(2)} W^{(1)} x$ – without activation
$s = W^{(3)} f\left(W^{(2)} f\left(W^{(1)} x\right)\right)$ – perceptron equation
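Without the activation f, the stacked layers collapse into a single linear map, which is why the nonlinearity is essential (a small numeric check with illustrative random matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(4, 4))
W3 = rng.normal(size=(1, 4))
x = rng.normal(size=(3, 1))

# Three stacked layers with no activation ...
s = W3 @ (W2 @ (W1 @ x))
# ... are exactly one linear layer with W = W3 @ W2 @ W1
W = W3 @ W2 @ W1
print(np.allclose(s, W @ x))  # True
```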
Derivatives