Neural Networks and Cost Function
Andrew Jarrett
11/30/19
Review

- Perceptrons take several inputs and produce one output. They use weights and biases to decide the output.
- Sigmoid neurons
  - A sigmoid neuron allows the output to be any number in between zero and one, instead of only 0 or 1 (see the sketch after this list).
- The main objective is to create an algorithm which lets us find the weights and biases so that the output from the network approximates y(x) for each of the training examples x.
  - In the textbook example this y(x) is a 10-dimensional vector.
  - Each dimension corresponds to one digit of output; in this case the 1 means the output is an 8.
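As a rough illustration (an assumption added for this write-up, not code from the slides or textbook), the Python sketch below shows how a perceptron and a sigmoid neuron each combine the same inputs, weights, and bias into an output; the sigmoid output lands smoothly in between zero and one.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example: three inputs, three weights, and one bias.
x = np.array([0.5, 0.1, 0.9])   # inputs
w = np.array([0.4, -0.2, 0.7])  # weights
b = -0.3                        # bias

# Perceptron: output is 0 or 1, depending on the sign of w.x + b.
perceptron_output = 1 if np.dot(w, x) + b > 0 else 0

# Sigmoid neuron: output is a smooth value between 0 and 1.
sigmoid_output = sigmoid(np.dot(w, x) + b)

print(perceptron_output, sigmoid_output)
```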
Cost Function

- The equation below is the quadratic cost function, or mean squared error:

      C(w, b) = 1/(2n) * Σ_x || y(x) − a ||²

- C is a function of the weights w and biases b, n is the number of training inputs, a is the vector of actual outputs from the network for training input x, and the sum runs over all of the training inputs x.
- C will approach 0 when y(x) is approximately equal to the output a for all the training inputs.
- This cost is used because it is a smooth function of the weights and biases, unlike the number of images correctly classified, which does not change smoothly.
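To make the cost concrete, here is a small Python sketch (the helper and values are assumptions for illustration, not from the slides) that evaluates the quadratic cost for two 10-dimensional training examples, matching the textbook digit setup.

```python
import numpy as np

def quadratic_cost(desired_outputs, actual_outputs):
    # Mean squared error: C = 1/(2n) * sum over x of ||y(x) - a||^2.
    n = len(desired_outputs)
    total = sum(np.sum((y - a) ** 2) for y, a in zip(desired_outputs, actual_outputs))
    return total / (2.0 * n)

# Hypothetical 10-dimensional outputs, one dimension per digit.
y1 = np.zeros(10); y1[8] = 1.0          # desired output: the digit 8
a1 = np.full(10, 0.1); a1[8] = 0.85     # network output close to y1

y2 = np.zeros(10); y2[3] = 1.0          # desired output: the digit 3
a2 = np.full(10, 0.05); a2[3] = 0.9     # network output close to y2

print(quadratic_cost([y1, y2], [a1, a2]))  # small value, since a ≈ y(x)
```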
Gradient Descent

- Gradient descent is used to solve minimization problems, i.e. to find the minimum of a function.
- Visualize a ball rolling down a hill until it comes to a complete stop at the bottom.
- The "ball" sits at a position v and moves by a small amount Δv; the resulting change in the cost is the ΔC in the equation below:

      ΔC ≈ ∇C · Δv

- The previous equation allows us to choose a value of Δv that makes ΔC negative. Therefore we pick:

      Δv = −η ∇C, so that ΔC ≈ −η ||∇C||² ≤ 0

- η is a small positive parameter, the learning rate; it dictates how fast the program will learn.
- The resulting update rule shows how the "ball" is rolling down the hill:

      v → v′ = v − η ∇C

- Summary: gradient descent works by repeatedly computing the gradient of the cost function and then moving in the opposite direction.
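Here is a minimal Python sketch of that update rule, v → v − η∇C, applied to a simple two-variable bowl-shaped cost (the cost function and starting point are assumptions chosen for illustration).

```python
import numpy as np

def C(v):
    # A simple bowl-shaped cost: C(v) = v1^2 + v2^2, minimized at v = (0, 0).
    return v[0] ** 2 + v[1] ** 2

def grad_C(v):
    # Gradient of the cost above: (2*v1, 2*v2).
    return np.array([2 * v[0], 2 * v[1]])

eta = 0.1                      # learning rate: small positive parameter
v = np.array([3.0, -4.0])      # starting position of the "ball"

# Repeatedly move a small step in the direction opposite to the gradient.
for step in range(100):
    v = v - eta * grad_C(v)

print(v, C(v))  # v is now very close to the minimum at (0, 0)
```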
Gradient Descent and Learning

- Applying the same update rule to the network means repeatedly adjusting each weight and bias in the direction that decreases the cost:

      w_k → w_k′ = w_k − η ∂C/∂w_k
      b_l → b_l′ = b_l − η ∂C/∂b_l
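As a sketch of that learning rule on the smallest possible case, the code below trains a single sigmoid neuron on one example under the quadratic cost; the data, learning rate, and number of steps are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hypothetical training example for a single sigmoid neuron.
x = np.array([0.5, 0.1, 0.9])   # inputs
y = 1.0                         # desired output
w = np.array([0.4, -0.2, 0.7])  # weights
b = -0.3                        # bias
eta = 0.5                       # learning rate

for step in range(1000):
    a = sigmoid(np.dot(w, x) + b)
    # Gradient of the quadratic cost C = 0.5*(a - y)^2 for this one example,
    # using sigmoid'(z) = a*(1 - a).
    delta = (a - y) * a * (1 - a)   # dC/dz
    w = w - eta * delta * x         # w_k -> w_k - eta * dC/dw_k
    b = b - eta * delta             # b   -> b   - eta * dC/db

print(sigmoid(np.dot(w, x) + b))    # output has moved close to the desired y = 1.0
```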
Stochastic Gradient Descent

- A problem occurs when using this cost function with gradient descent.
- To find ∇C we must compute the gradient ∇C_x for every training input x separately and then average them:

      ∇C = 1/n * Σ_x ∇C_x

- This is extremely time consuming if the number of training samples is large.
- Stochastic gradient descent instead estimates ∇C by computing the gradient for a small random sample (mini-batch) of training inputs rather than the entire training set at once.
- This would be helpful in our project due to the large data source.
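A minimal Python sketch of the mini-batch idea (the model, data, and batch size below are assumptions for illustration): instead of averaging ∇C_x over all n training inputs, each step averages over a small random sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n training inputs for a one-parameter model a = w*x,
# with per-example cost C_x = 0.5 * (w*x - y)^2.
n = 100_000
xs = rng.uniform(-1.0, 1.0, size=n)
ys = 2.0 * xs                       # the "true" relationship is y = 2*x

def grad_Cx(w, x, y):
    # Gradient of the per-example cost 0.5*(w*x - y)^2 with respect to w.
    return (w * x - y) * x

w = 0.0
eta = 0.5
batch_size = 32

for step in range(2000):
    # Estimate the full gradient from a small random mini-batch,
    # instead of summing over all n training inputs.
    idx = rng.integers(0, n, size=batch_size)
    grad_estimate = np.mean([grad_Cx(w, xs[i], ys[i]) for i in idx])
    w = w - eta * grad_estimate

print(w)  # close to the true value 2.0
```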
