Problem Set: Linear Regression and Gradient Descent
1. Suppose you have a dataset with three training examples given by $(x, y)$:
   $\{(1, 2), (2, 3), (3, 6)\}$. Your hypothesis is
   $$h_\theta(x) = \theta_0 + \theta_1 x,$$
   and your parameters are initialized as $\theta_0 = 0$ and $\theta_1 = 1$. The mean
   squared error (MSE) loss function is
   $$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,$$
   where $m$ is the number of training examples. Compute the value of
   $J(\theta_0, \theta_1)$ for the given data at the initial parameters.
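A minimal Python sketch, assuming the data and initialization above, that evaluates this loss (variable names are illustrative):

```python
# Evaluate J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)
# with h(x) = theta0 + theta1 * x on the three training examples.
xs = [1, 2, 3]
ys = [2, 3, 6]
theta0, theta1 = 0.0, 1.0

m = len(xs)
J = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
print(J)  # ((-1)^2 + (-1)^2 + (-3)^2) / 6 = 11/6 ≈ 1.833
```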
2. Using the same data as above, $\{(1, 2), (2, 3), (3, 6)\}$, and the same hypothesis
   $h_\theta(x) = \theta_0 + \theta_1 x$, assume the loss function is again the MSE
   defined above. Let the learning rate be $\alpha = 0.1$, and suppose your initial
   parameters are $\theta_0 = 0$ and $\theta_1 = 1$.
   (a) Write down the expressions for the partial derivatives
       $\frac{\partial J}{\partial \theta_0}$ and $\frac{\partial J}{\partial \theta_1}$.
(b) Compute the partial derivatives given the three data points.
   (c) Perform one gradient descent update step, i.e., compute the new parameter values
       $$\theta_0^{(\text{new})} = \theta_0 - \alpha \frac{\partial J}{\partial \theta_0}, \qquad
         \theta_1^{(\text{new})} = \theta_1 - \alpha \frac{\partial J}{\partial \theta_1}.$$
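A minimal Python sketch of parts (b) and (c), assuming the data, initialization, and $\alpha = 0.1$ above; it uses the analytic MSE gradients $\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})$ and $\frac{\partial J}{\partial \theta_1} = \frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\, x^{(i)}$:

```python
# One batch gradient descent step on the three training examples.
xs = [1, 2, 3]
ys = [2, 3, 6]
theta0, theta1 = 0.0, 1.0
alpha = 0.1
m = len(xs)

# Residuals h(x) - y at the current parameters: [-1, -1, -3].
residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]

# Gradients of J = (1/2m) * sum(residual^2).
dJ_dtheta0 = sum(residuals) / m                             # -5/3
dJ_dtheta1 = sum(r * x for r, x in zip(residuals, xs)) / m  # -4

# Simultaneous update of both parameters.
theta0, theta1 = theta0 - alpha * dJ_dtheta0, theta1 - alpha * dJ_dtheta1
print(theta0, theta1)  # ≈ 0.167, 1.4
```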
Part 2: Simulation on Given Datasets
Use the MSE loss defined above, where $m$ is the number of examples. Starting with
$\theta_0 = 0$ and $\theta_1 = 1$, simulate the gradient descent algorithm on each of
Dataset 1, Dataset 2, and Dataset 3 below for:
1. A constant learning rate, $\alpha = 0.1$.
2. A decaying learning rate, $\alpha = \frac{1000}{1000 + t}$, where $t$ is the
   iteration number (a simulation sketch follows this list).
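A minimal Python sketch of the simulation loop, supporting both schedules above. Since the numeric tables for the datasets are not reproduced here, the example call uses hypothetical values, and `n_iters` is an illustrative choice:

```python
# Batch gradient descent on a single-feature dataset with a constant
# or a decaying learning rate.
def gradient_descent(xs, ys, schedule="constant", n_iters=1000):
    theta0, theta1 = 0.0, 1.0  # initialization given in the problem
    m = len(xs)
    for t in range(n_iters):
        alpha = 0.1 if schedule == "constant" else 1000.0 / (1000.0 + t)
        residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        theta0 -= alpha * sum(residuals) / m
        theta1 -= alpha * sum(r * x for r, x in zip(residuals, xs)) / m
    return theta0, theta1

# Hypothetical (hours studied, test score) pairs in the spirit of Dataset 1:
# theta0, theta1 = gradient_descent([1, 2, 3, 4], [52, 60, 71, 80], "constant")
```

Note that with raw features on large scales (e.g., house size in square feet), these step sizes can make the updates diverge, so scaling the feature before running the simulation is a standard precaution.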
• Dataset 1: A single-feature dataset where the feature is the number of hours a student
studies, and the target is the student’s test score on a 100-point exam.
Interpretation:
x = Hours studied, y = Final test score.
• Dataset 2: A single-feature dataset where the feature is the size of a house in square
feet, and the target is its price (in thousands of dollars).
Interpretation:
x = House size (sq ft), y = House price (in $1,000s).
• Dataset 3: A single-feature dataset where the feature is the age of a car and the
target is its price.
Interpretation:
x = Car age, y = Car price.
Part 3: Conceptual Questions

1. Explain the principle of gradient descent. What are the roles of the learning rate
   and the number of iterations in this algorithm?
4. Explain the differences between gradient descent, stochastic gradient descent (SGD),
and mini-batch gradient descent.
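For intuition about question 4, a minimal Python sketch contrasting what each variant averages over per update; the data and step function are illustrative:

```python
import random

# One parameter update for a single-feature linear model, computed
# from whatever subset of examples is passed in.
def step(params, batch, alpha=0.1):
    theta0, theta1 = params
    m = len(batch)
    g0 = sum(theta0 + theta1 * x - y for x, y in batch) / m
    g1 = sum((theta0 + theta1 * x - y) * x for x, y in batch) / m
    return theta0 - alpha * g0, theta1 - alpha * g1

data = [(1, 2), (2, 3), (3, 6)]
params = (0.0, 1.0)
params = step(params, data)                     # batch GD: all m examples
params = step(params, [random.choice(data)])    # SGD: a single example
params = step(params, random.sample(data, 2))   # mini-batch: a small subset
```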
5. What does it mean for gradient descent to converge? How can you tell if gradient
descent has failed to converge in practice?
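For question 5, one common practical check, sketched with an assumed tolerance: treat the run as converged when the loss stops improving, and suspect failure when the recorded losses grow or oscillate (often a sign that the learning rate is too large).

```python
# Convergence test on a history of loss values, one per iteration.
def has_converged(loss_history, tol=1e-6):
    if len(loss_history) < 2:
        return False
    # Converged when the most recent improvement is below the tolerance.
    return abs(loss_history[-1] - loss_history[-2]) < tol
```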
6. What are some potential drawbacks or challenges of using stochastic gradient descent,
especially in terms of the final stages of convergence?
7. Describe the cost function used in linear regression. Why is it important to minimize
this function?
8. What is the hypothesis function in linear regression? How does it differ in its
   formulation between simple linear regression and multiple linear regression?
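For reference, the two formulations that question 8 contrasts:
$$h_\theta(x) = \theta_0 + \theta_1 x \quad \text{(simple)}, \qquad
  h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^\top x \quad \text{(multiple, with } x_0 = 1).$$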