CIS 4526: Foundations of Machine Learning Linear Regression: (Modified From Sanja Fidler)
Linear Regression
(modified from Sanja Fidler)
• Curve Fitting
Regression Problems
• What do all these problems have in common?
– Input: d-dimensional samples/vectors
– Output: continuous target value
• How to make predictions?
– A model: a function that represents the relationship between the input x and the target y
– A loss (or cost, or objective) function, which tells us how well our model approximates the training examples
– Optimization: a way of finding the parameters of our model that minimize the loss function
Simple 1-D example
[Figure: 1-D data points and a fitted curve y(x)]
Model Selection
• Model Complexity
– In what form should we parameterize the prediction
function?
– How complex should the model be?
• Example: linear, quadratic, or degree-d polynomial? (1-d case)
• Common Belief
– Simple models
• less flexible, but may be easy to solve (such as a linear model)
– Complex models
• more powerful, but difficult to solve, and prone to overfitting
• We will start by building simple, linear models
Linear Model
• Given d-dimensional training samples (x_n, y_n), n = 1, ..., N
• Linear model: y = w_1 x_1 + ... + w_d x_d + b
– Equivalent form: y = w′x_n, with w = [b, w_1, ..., w_d]′ and x_n = [1, x_{n,1}, ..., x_{n,d}]′
– So we usually apply "augmented", (d+1)-dimensional data
• More convenient to derive closed-form solutions
Training (In-Sample) Error, Ein
• Model: ŷ = Xw, where w ∈ R^((d+1)×1) is the weight vector, y ∈ R^(N×1) stacks the N targets, and X ∈ R^(N×(d+1)) stacks the N augmented samples as rows
• Loss function: Ein(w) = (1/N) ||Xw − y||^2
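The matrix form of Ein can be checked numerically. A minimal NumPy sketch; the data, shapes, and names here are hypothetical, chosen only to illustrate the (d+1)-augmentation:

```python
import numpy as np

# Hypothetical toy data: N = 4 samples, d = 2 features.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.0],
                  [4.0, 3.0]])
y = np.array([3.5, 2.0, 4.0, 7.5])          # targets, N x 1

# Augment with a leading column of ones: X is N x (d+1).
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

def E_in(w, X, y):
    """Training error Ein(w) = (1/N) * ||Xw - y||^2."""
    r = X @ w - y
    return r @ r / len(y)

w0 = np.zeros(X.shape[1])                   # a candidate (d+1) x 1 weight vector
print(E_in(w0, X, y))                       # error of the all-zero model -> 22.125
```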
Minimizing Ein by closed-form solution
In order to minimize a function E(w), here
  Ein(w) = (1/N) ||Xw − y||^2 = (1/N) (Xw − y)′(Xw − y),
we need to set its gradient to 0,
  ∇Ein(w) = (2/N) X′(Xw − y) = 0,
and then solve the resultant equation (usually easier):
  X′Xw = X′y, so w = (X′X)^(−1) X′y = X⁺y.
Why? If we want to solve Xw = y, X is rectangular (no inverse defined). But the pseudo inverse X⁺ = (X′X)^(−1) X′ applies here, as if X could be inverted (it is the counterpart/generalization of square matrix inverse in solving linear systems).
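The closed-form solution is easy to verify on synthetic, noiseless data. A sketch with NumPy; the sizes and true weights are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N = 50 samples, d = 3 features, known true weights.
N, d = 50, 3
X_raw = rng.normal(size=(N, d))
w_true = np.array([0.5, -1.0, 2.0, 3.0])    # [bias, w1, w2, w3]
X = np.hstack([np.ones((N, 1)), X_raw])     # augmented design matrix
y = X @ w_true                              # noiseless targets

# Normal-equation solution: X'X w = X'y.
# np.linalg.solve is preferred over forming the explicit inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(w_hat, w_true))           # True: recovers the true weights
```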
Pseudo Inverse by SVD
• A non-square m-by-n matrix A typically has no inverse; there are two definitions of its pseudo inverse A⁺:
– (1) Mathematically flavored definition: the Moore-Penrose generalized inverse is the n-by-m matrix A⁺ such that AA⁺A = A, A⁺AA⁺ = A⁺, and both AA⁺ and A⁺A are symmetric
– (2) Machine learning flavored definition: A⁺b is the (least-squares) solution of the linear system Ax = b
• From the SVD A = UΣV′, the pseudo inverse is A⁺ = VΣ⁺U′, where Σ⁺ transposes Σ and inverts its nonzero singular values
• Properties of U, V (orthogonal) and Σ (diagonal, invertible on its nonzero entries)
– How to prove that the SVD-based VΣ⁺U′ is the pseudo inverse of A (by using the properties of U, V, and Σ above)?
• either verify the Moore-Penrose conditions in (1),
• or show that A⁺b is the solution of min ||Ax − b|| in (2)
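Both definitions can be spot-checked numerically. A sketch building the SVD-based pseudo inverse by hand and comparing it against NumPy's built-in Moore-Penrose pseudo inverse (the matrix is random, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))            # rectangular: no ordinary inverse

# SVD: A = U S V'.  Pseudo inverse: A+ = V S+ U',
# where S+ transposes the shape and inverts the nonzero singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Matches NumPy's Moore-Penrose pseudo inverse.
print(np.allclose(A_pinv, np.linalg.pinv(A)))   # True

# One of the Moore-Penrose conditions, A A+ A = A:
print(np.allclose(A @ A_pinv @ A, A))           # True
```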
Linear Regression Algorithm
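Read as pseudocode, the linear regression algorithm amounts to two steps: augment the data, then solve with the pseudo inverse. A hedged NumPy sketch (the function names and the test data are my own, not from the slide):

```python
import numpy as np

def fit_linear_regression(X_raw, y):
    """Linear regression: (1) augment with a ones column, (2) w = X+ y."""
    X = np.hstack([np.ones((len(X_raw), 1)), X_raw])
    return np.linalg.pinv(X) @ y

def predict(w, X_raw):
    """Apply the learned weights to new (un-augmented) inputs."""
    X = np.hstack([np.ones((len(X_raw), 1)), X_raw])
    return X @ w

# Usage on hypothetical noiseless data:
rng = np.random.default_rng(2)
X_raw = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X_raw[:, 0] - 3.0 * X_raw[:, 1]
w = fit_linear_regression(X_raw, y)
print(np.round(w, 6))   # approximately [1., 2., -3.]
```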
Augmented Linear Model
• Can we obtain both (1) a closed-form solution and (2) the capacity to model nonlinear shapes?
• Nonlinear Data Augmentation
– If we want to use the following model
  • y = w_d x^d + ... + w_2 x^2 + w_1 x + w_0 (a degree-d polynomial of a 1-D input x)
– Then the regression can be written as Xw = y, where each row of X is the augmented sample
  [x_n^d, ..., x_n^2, x_n, 1]
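The augmentation keeps the problem linear in w, so the same closed-form solve applies. A sketch for the 1-D polynomial case (data and helper name are hypothetical):

```python
import numpy as np

def poly_augment(x, degree):
    """Map 1-D inputs x to augmented rows [x^d, ..., x^2, x, 1]."""
    return np.vstack([x**p for p in range(degree, -1, -1)]).T

# Hypothetical 1-D data from a quadratic, fit with the usual closed form.
x = np.linspace(-1, 1, 20)
y = 2.0 * x**2 - 1.0 * x + 0.5

X = poly_augment(x, degree=2)            # columns: x^2, x, 1
w = np.linalg.pinv(X) @ y                # still linear regression, in feature space
print(np.allclose(w, [2.0, -1.0, 0.5]))  # True: recovers [w2, w1, w0]
```

The model is nonlinear in x but linear in the weights, which is why the closed-form solution survives the augmentation unchanged.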
Minimizing Loss by Gradient Descent
• Gradient descent: repeatedly step against the gradient of the loss function,
  w ← w − η ∇Ein(w)
• Stochastic gradient descent: estimate the gradient from a single sample (or a small mini-batch) per update, instead of the full training set
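The update rule above can be sketched as full-batch gradient descent on Ein; the step size η and iteration count here are assumptions, not tuned values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data, same augmented setup as the closed-form slides.
N, d = 200, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def grad(w, X, y):
    """Gradient of Ein(w) = (1/N)||Xw - y||^2, i.e. (2/N) X'(Xw - y)."""
    return 2.0 / len(y) * X.T @ (X @ w - y)

w = np.zeros(d + 1)
eta = 0.1                         # step size (assumed)
for _ in range(2000):             # full-batch gradient descent
    w -= eta * grad(w, X, y)

print(np.allclose(w, w_true, atol=1e-6))   # True: converges to the true weights
```

Stochastic gradient descent would replace the full-batch gradient with the gradient computed on one randomly drawn sample (or a mini-batch) per update, trading noisier steps for much cheaper iterations.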