Lecture 4

Simple Linear Regression

Sample data consists of n observed pairs:


(x1, y1), …, (xn, yn), i = 1…n.

The fitted regression line is ŷ = b0 + b1x, with unknowns b0, b1.


The “best fit” line is motivated by the principle of least squares, which can be
traced back to the German mathematician Gauss (1777–1855):
A line provides the best fit to the data if the sum of the squared vertical distances
(deviations) from the observed points to that line is as small as it can be.

f(b0, b1) = Σ εi² = Σ [yi – (b0 + b1xi)]²

The minimizing values of b0 and b1 are found by taking partial derivatives of


f(b0, b1) with respect to both b0 and b1, equating them both to zero [analogously
to f′(b) = 0 in univariate calculus], and solving the equations:
The least squares estimate of the slope coefficient β1 of the true regression line
is:

b1 = Sxy / Sxx, where Sxy = Σxiyi – (Σxi)(Σyi)/n and Sxx = Σxi² – (Σxi)²/n


(Typically columns for xi, yi, xiyi and xi² are constructed, and then Sxy and Sxx are
calculated.)
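A minimal sketch of this tabular computation in Python with numpy (the data arrays below are made-up placeholders, and the intercept uses the standard companion formula b0 = ȳ – b1x̄):

import numpy as np

# hypothetical sample data (xi, yi)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# the same column sums as in the tabular method
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
Sxx = np.sum(x**2) - np.sum(x)**2 / n

b1 = Sxy / Sxx                     # least squares slope estimate
b0 = np.mean(y) - b1 * np.mean(x)  # least squares intercept estimate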

numpy.polyfit - Least squares polynomial fit.


p = numpy.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False)
p(x) = p[0] * x**deg + p[1] *x**(deg-1) + ... + p[deg-1]*x + p[deg]

numpy.polyval - Evaluate a polynomial at specific values.


numpy.polyval(p, x)
Overfitting can occur when deg is too high.
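For instance, fitting a straight line (deg=1) and a needlessly high-degree polynomial to the same made-up data illustrates both calls and the overfitting risk:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2, 4.8, 6.1, 7.2])

p1 = np.polyfit(x, y, deg=1)      # p1 = [slope, intercept]
y_line = np.polyval(p1, x)        # evaluate the fitted line at x

p5 = np.polyfit(x, y, deg=5)      # high degree: follows the noise, overfits
y_poly = np.polyval(p5, x)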
Biological and Artificial Neuron
Artificial neuron related names
Perceptron
Sample data consists of n observed pairs:
(x1, y1), …, (xi, yi), …, (xn, yn), i = 1…n.
xi – input vector, yi – label

The model is ŷ = b + w1x1 + … + wDxD (i.e. ŷ = b + w·x), with unknowns b, w1, …, wD.


Mean Squared Error (MSE): MSE = (1/n) Σ (yi – ŷi)².
In the context of neural networks, this is the MSE loss.
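As a quick sketch, the MSE for a batch of predictions can be computed directly with numpy (the arrays below are hypothetical):

import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # true outputs (labels)
y_pred = np.array([0.9, 0.2, 0.7, 1.1])   # network predictions

mse = np.mean((y_pred - y_true) ** 2)     # average squared difference
print(mse)                                # 0.0375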

In a neural network, the weights are usually found using an optimization algorithm that
minimizes a chosen loss function. In the case of Mean Squared Error (MSE) loss, the
objective is to minimize the average squared difference between the predicted output and
the true output for a given set of input data.
Here's a general overview of how the weights are updated in a neural network using MSE
loss:
1. Initialize the weights of the neural network randomly.
2. Forward pass: Feed the input data through the neural network and obtain the
predicted output.
3. Compute the MSE loss between the predicted output and the true output.
4. Backward pass: Calculate the gradient of the loss with respect to each weight in the
network using backpropagation.
5. Use an optimization algorithm such as Stochastic Gradient Descent (SGD) or
ADAM to update the weights in the direction that reduces the loss (opposite to
gradient). The amount of weight update is determined by the learning rate
hyperparameter.
6. Repeat steps 2-5 for multiple epochs (passes through the entire dataset) until the loss
converges to a minimum, stops improving significantly, or other stopping conditions
are satisfied.
During the training process, the weights are adjusted in a way that minimizes the MSE
loss between the predicted output and the true output. This is done by iteratively updating
the weights in the direction of steepest descent of the loss function until convergence.
Once the weights have converged to a minimum, the neural network can be used to make
predictions on new unseen data.
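A minimal sketch of steps 1-6 for a single linear neuron trained with full-batch gradient descent and MSE loss (the data and hyperparameters are made up; a real network would normally use a framework such as PyTorch or TensorFlow):

import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: 100 samples, 3 input features, scalar target
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 + rng.normal(scale=0.1, size=100)

w = rng.normal(size=3)    # 1. initialize weights randomly
b = 0.0
lr = 0.1                  # learning rate hyperparameter

for epoch in range(200):                        # 6. repeat for multiple epochs
    y_pred = X @ w + b                          # 2. forward pass
    loss = np.mean((y_pred - y) ** 2)           # 3. MSE loss
    grad_w = 2 * X.T @ (y_pred - y) / len(y)    # 4. gradient of loss w.r.t. weights
    grad_b = 2 * np.mean(y_pred - y)            #    gradient of loss w.r.t. bias
    w -= lr * grad_w                            # 5. update opposite to the gradient
    b -= lr * grad_b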
Artificial Neural Networks (ANN)

Multi-Layer Perceptrons (MLP)

Feed Forward Network


An N-layer neural network with inputs, hidden layers of K neurons each, and one output
layer.
There are connections (synapses) between neurons across layers, but not within a layer.
An N-layer neural network is counted excluding the input layer.
A single-layer NN has no hidden layers: the input is mapped directly onto the output
(e.g. SVM, logistic regression).
Output layer neurons most commonly do not have an activation function: the last output
layer represents the class scores.

h(1) = f(W(1)x + b(1)) – outputs of first hidden layer

h(2) = f(W(2)h(1) + b(2)) – outputs of second hidden layer

o = W(3)h(2) + b(3) – output (one output neuron – scalar number output)

o = W(3) f(W(2) f(W(1)x + b(1)) + b(2)) + b(3)
Input is a [3x1] vector
The weight matrix W(1) of size [4x3] holds the connections of the first hidden layer, and its biases are
in vector b(1), of size [4x1].
Each single neuron has its weights in one row of W(1).
A matrix-vector multiplication evaluates the activations of all neurons in that layer.
W(2) is a [4x4] matrix with the connections of the second hidden layer, and W(3) a [1x4] matrix for the last (output) layer.
The full forward pass is simply three matrix multiplications and applications of the
activation functions.
Size of the network - the number of parameters, number of layers.
This network has 4 + 4 + 1 = 9 neurons, [4 x 3] + [4 x 4] + [1 x 4] = 12 + 16 + 4 = 32
weights and 4 + 4 + 1 = 9 biases, for a total of 41 learnable parameters.
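A minimal numpy sketch of this forward pass with the stated shapes (random values stand in for trained weights, and a sigmoid is assumed as the activation function f):

import numpy as np

def f(z):
    # assumed activation function (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x  = rng.normal(size=(3, 1))    # input vector, [3x1]
W1 = rng.normal(size=(4, 3))    # first hidden layer weights,  [4x3]
b1 = rng.normal(size=(4, 1))    # first hidden layer biases,   [4x1]
W2 = rng.normal(size=(4, 4))    # second hidden layer weights, [4x4]
b2 = rng.normal(size=(4, 1))
W3 = rng.normal(size=(1, 4))    # output layer weights, [1x4]
b3 = rng.normal(size=(1, 1))

h1 = f(W1 @ x + b1)             # outputs of first hidden layer
h2 = f(W2 @ h1 + b2)            # outputs of second hidden layer
o  = W3 @ h2 + b3               # scalar output, no activation on the last layer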
Activation Functions
An activation function in a neural network defines how the weighted sum of the input is
transformed into an output from a node or nodes in a layer of the network.
It decides whether a neuron should be activated or not. This means that it will decide
whether the neuron's input to the network is important or not in the process of prediction.
Sometimes the activation function is called a “transfer function.” If the output range of
the activation function is limited, then it may be called a “squashing function.” Many
activation functions are nonlinear and may be referred to as the “nonlinearity” in the layer
or the network design.
The choice of activation function has a large impact on the capability and performance of
the neural network, and different activation functions may be used in different parts of
the model.
Technically, the activation function is used within or after the internal processing of each
node in the network, although networks are designed to use the same activation function
for all nodes in a layer.
A network may have three types of layers: input layers that take raw input from the domain,
hidden layers that take input from another layer and pass output to another layer, and output
layers that make a prediction.
All hidden layers typically use the same activation function. The output layer will
typically use a different activation function from the hidden layers and is dependent upon
the type of prediction required by the model.
Activation functions are also typically differentiable, meaning the first-order derivative
can be calculated for a given input value. This is required given that neural networks are
typically trained using the backpropagation of error algorithm that requires the derivative
of prediction error in order to update the weights of the model.
There are many different types of activation functions used in neural networks, although
only a small number of them are used in practice for hidden and output layers.
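As an illustration, a few of the activation functions most often seen in practice can be written directly in numpy (a sketch of popular choices, not an exhaustive list):

import numpy as np

def sigmoid(z):
    # logistic sigmoid: squashes input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # hyperbolic tangent: squashes input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # rectified linear unit: zero for negative input, identity otherwise
    return np.maximum(0.0, z)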

Importance of activation functions


o = W(3) f(W(2) f(W(1)x + b(1)) + b(2)) + b(3) – with activation functions

o = W(3)(W(2)(W(1)x + b(1)) + b(2)) + b(3) = Wx + b – without activation the layers
collapse into a single linear map

o = Wx + b – perceptron equation
Derivatives
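Assuming this heading refers to the derivatives used by backpropagation, a sketch of the derivatives of the activation functions above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_deriv(z):
    # d/dz tanh(z) = 1 - tanh(z)**2
    return 1.0 - np.tanh(z) ** 2

def relu_deriv(z):
    # d/dz relu(z) = 0 for z < 0 and 1 for z > 0 (subgradient 0 used at z = 0)
    return (z > 0).astype(float)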
