Basic Machine Learning: Case Study
(Linear Regression using Gradient Descent)
Consider using machine learning when you have a complex task or problem involving a large amount of data and lots of variables, but no existing formula or equation. For example, machine learning is a good option if you need to handle situations like these:

• Hand-written rules and equations are too complex, as in face recognition and speech recognition.
• The rules of a task are constantly changing, as in fraud detection from transaction records.
• The nature of the data keeps changing, and the program needs to adapt, as in automated trading, energy demand forecasting, and predicting shopping trends.
Linear Regression using Gradient Descent:
Linear Regression:
In statistics, linear regression is a linear approach to modelling the relationship between a
dependent variable and one or more independent variables. Let X be the independent
variable and Y be the dependent variable. We will define a linear relationship between
these two variables as follows:
Y = mX + c
This is the equation for a line that you studied in high school. m is the slope of the line
and c is the y intercept. Today we will use this equation to train our model with a given
dataset and predict the value of Y for any given value of X.
Our challenge today is to determine the values of m and c such that the line corresponding to those values is the best-fitting line, i.e. the one that gives the minimum error.
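To make this concrete, here is a small illustrative sketch. The data below is synthetic and made up purely for illustration; it is not the case study's dataset. The point is that any choice of m and c defines a line, and each candidate line gives different predictions, and therefore a different error, on the same data. The next section defines that error precisely.

import numpy as np

# Synthetic toy data (made up for illustration, not the case study's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 * X + 5.0 + rng.normal(0.0, 1.0, size=50)

# Any pair (m, c) defines a line, and hence a prediction for every x.
m, c = 1.0, 0.0          # an arbitrary (poor) initial guess
Y_pred = m * X + c       # Y = mX + c, applied element-wise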
Loss function:
The loss is the error in our predictions for the current values of m and c. Our goal is to minimize this error to obtain the most accurate values of m and c.
We will use the Mean Squared Error function to calculate the loss.
There are three steps in this function:
1. Find the difference between the actual y and the predicted y value (ŷ = mx + c) for a given x.
2. Square this difference.
3. Find the mean of the squares for every value in X.
The Mean Squared Error function is

    E = (1/n) * Σ (yᵢ - ŷᵢ)²

Here yᵢ is the actual value and ŷᵢ is the predicted value. Let's substitute the value of ŷᵢ = mxᵢ + c:

    E = (1/n) * Σ (yᵢ - (mxᵢ + c))²

So we square the error and find the mean, hence the name Mean Squared Error.
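The three steps above translate directly into a few lines of numpy. A minimal sketch, assuming X and Y are numpy arrays of equal length and m, c are the current parameters (the function name is an illustrative choice, not from the original code):

import numpy as np

def mean_squared_error(X, Y, m, c):
    """Mean Squared Error of the line y = m*x + c on the data (X, Y)."""
    Y_pred = m * X + c            # predicted y for every x
    errors = Y - Y_pred           # step 1: difference from the actual y
    return np.mean(errors ** 2)   # steps 2 and 3: square, then take the mean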
Now that we have defined the loss function, let's get into the interesting part: minimizing it and finding m and c.
Imagine a valley and a person with no sense of direction who wants to get to the bottom
of the valley. He goes down the slope and takes large steps when the slope is steep and
small steps when the slope is less steep. He decides his next position based on his current
position and stops when he gets to the bottom of the valley which was his goal.
Let's try applying gradient descent to m and c and approach it step by step:
1. Initially let m = 0 and c = 0. Let L be our learning rate. This controls how much the values of m and c change with each step. L could be a small value like 0.0001 for good accuracy.
2. Calculate the partial derivative of the loss function with respect to m, and plug in the current values of x, y, m and c to obtain the derivative value Dm:

       Dm = (-2/n) * Σ xᵢ(yᵢ - ŷᵢ)

   Similarly, let's find Dc, the partial derivative of the loss function with respect to c:

       Dc = (-2/n) * Σ (yᵢ - ŷᵢ)

3. Now we update the current values of m and c using the following equations:

       m = m - L * Dm
       c = c - L * Dc
4. We repeat this process until our loss function is a very small value or ideally 0 (which means 0 error or 100% accuracy). The values of m and c that we are left with now will be the optimum values (a short numpy sketch of this update step appears just below).
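As a minimal sketch of steps 2 and 3, assuming X and Y are numpy arrays holding the data, and m, c, L are the current parameters and learning rate (the function name gradient_step is an illustrative choice, not from the original code):

import numpy as np

def gradient_step(X, Y, m, c, L):
    """One gradient descent update of m and c for the line y = m*x + c."""
    n = len(X)
    Y_pred = m * X + c                          # current predictions
    Dm = (-2 / n) * np.sum(X * (Y - Y_pred))    # partial derivative w.r.t. m
    Dc = (-2 / n) * np.sum(Y - Y_pred)          # partial derivative w.r.t. c
    return m - L * Dm, c - L * Dc               # step 3: move against the gradient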
Now going back to our analogy, m can be considered the current position of the person. D is equivalent to the steepness of the slope and L can be the speed with which he moves. The new value of m that we calculate using the above equation will be his next position, and L * D will be the size of the step he takes. When the slope is steeper (D is larger) he takes longer steps, and when it is less steep (D is smaller), he takes smaller steps. Finally he arrives at the bottom of the valley, which corresponds to our loss = 0.
We repeat the same process to find the value of c as well. Now, with the optimum values of m and c, our model is ready to make predictions!
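Putting the steps together, here is a minimal training-loop sketch. It assumes the dataset has already been loaded into numpy arrays X and Y (for example from a CSV file); the epoch count and variable names are illustrative choices, and the values printed below come from the case study's own dataset.

import numpy as np
import matplotlib.pyplot as plt   # used by the plotting code further below

# Assumed: X and Y are 1-D numpy arrays holding the dataset.
L = 0.0001         # learning rate
epochs = 1000      # number of gradient descent iterations (illustrative)
m, c = 0.0, 0.0    # step 1: start from m = 0 and c = 0

n = float(len(X))
for _ in range(epochs):
    Y_pred = m * X + c                          # current predictions
    Dm = (-2 / n) * np.sum(X * (Y - Y_pred))    # derivative w.r.t. m
    Dc = (-2 / n) * np.sum(Y - Y_pred)          # derivative w.r.t. c
    m = m - L * Dm                              # update m
    c = c - L * Dc                              # update c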
print (m, c)
1.4796491688889395 0.10148121494753726
# Making predictions
Y_pred = m*X + c
plt.scatter(X, Y)
plt.plot([min(X), max(X)], [min(Y_pred), max(Y_pred)], color='red') # predicted
plt.show()
Conclusion:
The biggest advantage gradient descent has is that it requires no knowledge whatsoever of the fundamentals of the model. We can apply the regressor we've built without knowing anything about the theory behind linear regression. In particular, we don't need to know that linear regression has a closed-form solution, or what that solution looks like, or how to derive it. Instead we just pick a metric, compute its derivative, and then use a computer to brute-force a solution.
A gradient descent solution to modeling can be applied to any model metric, so long as the metric has two properties: it's differentiable (most things are) and convex. Convexity is the property that no matter where you are on the metric surface, the derivative points towards a point with better "fit", right up until you get to the bottom. Things that are convex include funnels, contact lenses, and, it turns out, the linear regression loss surface.
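As a hedged sanity check on this claim (not part of the original case study): for simple linear regression the closed-form least-squares fit can be computed directly, for example with numpy's polyfit, and because the MSE surface is convex, gradient descent run with a small enough learning rate for enough iterations converges towards the same optimum. The function name below is illustrative.

import numpy as np

def closed_form_fit(X, Y):
    """Closed-form least-squares slope and intercept for y = m*x + c."""
    m_exact, c_exact = np.polyfit(X, Y, deg=1)   # degree-1 polynomial fit
    return m_exact, c_exact

# Comparing (m_exact, c_exact) with the (m, c) found by gradient descent
# on the same dataset should show the two agree to several decimal places.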
Reference:
https://github.com/chasinginfinity/ml-from-scratch/tree/master/02%20Linear%20Regression%20using%20Gradient%20Descent
https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931
end of report