Machine Learning
Linear Regression
Gradient Descent
[Figure: a sequence of surface plots of the cost $J(w_0, w_1)$ over the parameters $w_0$ and $w_1$, illustrating successive gradient descent steps moving toward a minimum.]
Taylor Expansion
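A short sketch of the standard argument: expanding the cost to first order around the current iterate,
$$J(\mathbf{w} + \boldsymbol{\delta}) \approx J(\mathbf{w}) + \nabla J(\mathbf{w})^{\mathsf{T}} \boldsymbol{\delta}$$
For a small fixed step length, the right-hand side decreases most when $\boldsymbol{\delta}$ points opposite the gradient, which motivates the update $\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla J(\mathbf{w})$ with learning rate $\alpha > 0$.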
Gradient Descent
• Update step: $w_j \leftarrow w_j - \alpha \, \dfrac{\partial J(\mathbf{w})}{\partial w_j}$, applied simultaneously for all $j$, with learning rate $\alpha$.
• The key operation in the above update step is the calculation of each partial derivative.
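A minimal NumPy sketch of these updates for linear regression with the squared-error cost; the function name and synthetic data are illustrative, not from the slides.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for J(w) = 1/(2N) * sum((X @ w - y)**2)."""
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iters):
        residuals = X @ w - y         # h_w(x_n) - y_n for every example
        grad = X.T @ residuals / N    # vector of partial derivatives dJ/dw_j
        w -= alpha * grad             # simultaneous update of all w_j
    return w

# Illustrative usage on synthetic data (true weights: bias 2.0, slope 3.0)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1.0, 1.0, 100)])
y = X @ np.array([2.0, 3.0]) + 0.1 * rng.standard_normal(100)
print(gradient_descent(X, y))        # approximately [2.0, 3.0]
```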
• Note that the second issue (converging to a local rather than the global minimum) does not arise for convex problems, since the error surface has a single global minimum.
Stochastic Gradient Descent (SGD)
• Why SGD? Each batch gradient step requires a full pass over all $N$ training examples; SGD instead computes the update from just 1 example, or a mini-batch of K examples (a small number), per step.
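A minimal sketch of the mini-batch variant (batch_size = 1 gives pure SGD); the names and constants are illustrative assumptions.

```python
import numpy as np

def sgd(X, y, alpha=0.05, batch_size=8, n_epochs=50, seed=0):
    """Mini-batch SGD for linear regression with squared-error cost."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_epochs):
        order = rng.permutation(N)                # reshuffle once per epoch
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]  # 1 or K examples
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= alpha * grad                     # noisy but cheap update
    return w
```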
Polynomial Basis Functions
Regression: Curve Fitting
[Figure: noisy observations $y$, the unknown target function $f$, and the learned hypothesis $h$ fitted to the data.]
$$h(x) = h(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$
The coefficients $w_j$ are the parameters; the powers $x^j$ are the features.
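A minimal sketch of turning the powers $x^j$ into columns of a design matrix (an illustrative helper, not prescribed by the slides):

```python
import numpy as np

def poly_features(x, M):
    """Map inputs x to rows [1, x, x**2, ..., x**M]."""
    return np.vander(x, N=M + 1, increasing=True)  # shape (len(x), M + 1)

# The model h(x) = sum_j w_j x^j is then the product poly_features(x, M) @ w.
```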
Polynomial Curve Fitting
• Parametric model:
$$h(x) = h(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$
• Learning objective (least squares):
$$\hat{\mathbf{w}} = \operatorname*{argmin}_{\mathbf{w}} J(\mathbf{w}), \qquad J(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \bigl( h_{\mathbf{w}}(\mathbf{x}_n) - y_n \bigr)^2$$
• Least-squares estimate (closed form):
$$\hat{\mathbf{w}} = (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$$
Polynomial Curve Fitting
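A minimal sketch of computing this estimate; in practice one solves the normal equations rather than forming the matrix inverse explicitly (illustrative, not from the slides):

```python
import numpy as np

def least_squares(X, y):
    """Least-squares estimate: solve (X^T X) w = X^T y for w."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# A more numerically robust alternative:
# w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```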
• Add a regularizer to the error and minimize the penalized objective:
$$\mathbf{w}^* = \operatorname*{argmin}_{\mathbf{w}} E(\mathbf{w})$$
Ridge Regression
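For reference, writing the ridge objective as $\frac{1}{2}\lVert \mathbf{X}\mathbf{w} - \mathbf{y} \rVert^2 + \frac{\lambda}{2}\lVert \mathbf{w} \rVert^2$ (the $\frac{1}{2N}$ scaling used elsewhere only rescales $\lambda$), the minimizer has a standard closed form:
$$\mathbf{w}^* = \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} + \lambda \mathbf{I} \right)^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$$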
Source: https://scikit-learn.org/stable/modules/cross_validation.html
K-fold Cross-Validation
• Split the training data into K folds and try a wide range of tuning-parameter values (see the code sketch after this list):
  - split the data into K folds of roughly equal size
  - iterate over a set of values for $\lambda$
    • iterate over k = 1, 2, ⋯, K
      - use all folds except k for training
      - validate (calculate the test error) on the k-th fold
    • error[$\lambda$] = average error over the K folds
  - choose the value of $\lambda$ that gives the smallest error.
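A minimal NumPy sketch of this procedure for selecting $\lambda$ in ridge regression; the fold assignment, helper names, and ridge solver are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Illustrative ridge solver: w = (X^T X + lam*I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_select_lambda(X, y, lambdas, K=5, seed=0):
    """Pick the lambda with the smallest K-fold cross-validation error."""
    N = X.shape[0]
    fold = np.random.default_rng(seed).permutation(N) % K  # fold of each row
    cv_error = {}
    for lam in lambdas:                      # iterate over candidate lambdas
        errs = []
        for k in range(K):                   # iterate over k = 1, ..., K
            train, val = fold != k, fold == k
            w = ridge_fit(X[train], y[train], lam)  # train on all folds but k
            errs.append(np.mean((X[val] @ w - y[val]) ** 2))  # error on fold k
        cv_error[lam] = np.mean(errs)        # average error over the K folds
    return min(cv_error, key=cv_error.get)
```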
Regularization: Ridge vs. Lasso
• Ridge regression:
$$J(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \bigl( h_{\mathbf{w}}(\mathbf{x}_n) - t_n \bigr)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} w_j^2$$
• Lasso:
$$J(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \bigl( h_{\mathbf{w}}(\mathbf{x}_n) - t_n \bigr)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|$$
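A minimal sketch contrasting the two penalties with scikit-learn, whose `alpha` plays the role of $\lambda$ up to scaling conventions; the data here is synthetic and illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w_true = np.array([3.0, -2.0] + [0.0] * 8)   # only two informative features
y = X @ w_true + 0.1 * rng.standard_normal(200)

ridge = Ridge(alpha=1.0).fit(X, y)           # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)           # L1 penalty: zeros many of them

print(np.round(ridge.coef_, 3))              # all ten coefficients nonzero
print(np.round(lasso.coef_, 3))              # most coefficients exactly 0.0
```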
[Figure] Contours of the unregularized error function (blue) together with the constraint region (3.30) for the quadratic regularizer $q = 2$ (left) and the lasso regularizer $q = 1$ (right); the optimal parameter vector is denoted by $\mathbf{w}^*$. The lasso gives a sparse solution in which $w_1^* = 0$.
Regularization