Linear Regression
Na Lu
Xi’an Jiaotong University
Machine Learning
• Supervised learning
– Learn to predict an output when given an input vector.
• Reinforcement learning
– Learn to select an action to maximize payoff.
• Unsupervised learning
– Discover a good internal representation of the input and uncover its underlying structure.
Two types of supervised learning
• Each training case consists of an input vector x and a target output y.
• Regression: the target output is a real number or a whole vector of real numbers.
– The price of a stock.
– The temperature during a day.
– Aim: get a prediction as close as possible to the true number.
• Classification: the target output is a class label.
– The simplest case is a binary choice between labels 1 and 0.
– Recognizing facial identities is a multi-class example with many possible labels.
– Aim: assign the input to the correct category.
How does supervised learning work?
Example input features (here, cell measurements from a breast-tumor diagnosis task):
– Clump thickness
– Uniformity of cell size
– Uniformity of cell shape
– …
Unsupervised learning
Applications
Cocktail party problem
[Figure: two overlapping speakers recorded by two microphones; the separation algorithm recovers Output 1 and Output 2 from the mixed recordings]
% One-line blind source separation in Octave/MATLAB (SVD of a reweighted covariance):
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
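The one-liner above is Octave/MATLAB. As a rough Python sketch of the same cocktail-party idea, the following uses scikit-learn's FastICA (an assumed stand-in algorithm, not the method on the slide) on synthetic signals and a made-up mixing matrix:

```python
# Cocktail-party sketch with FastICA (assumed stand-in for the slide's
# SVD one-liner; sources and mixing matrix are synthetic).
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # "speaker 1"
s2 = np.sign(np.sin(3 * t))              # "speaker 2"
S = np.c_[s1, s2]                        # true sources, shape (2000, 2)

A = np.array([[1.0, 0.5],                # mixing matrix: each microphone
              [0.5, 1.0]])               # hears a weighted sum of speakers
X = S @ A.T                              # the two microphone recordings

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)             # recovered sources (up to sign/scale/order)
```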
Problem
[Figure: Housing Prices (Portland, OR) — scatter plot of price (in 1000s of dollars, 0–500) against size (feet², 0–3000)]
Training Set → Learning Algorithm → hypothesis $h$
Size of house ($x$) → $h$ → Estimated price ($y$)

$h_\theta(x) = \theta_0 + \theta_1 x$
This model is linear regression with one variable, also called univariate linear regression.
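As a minimal sketch (the function name and argument names are my own), the hypothesis is one line of Python:

```python
def h(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x
```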
Linear regression
with one variable
Cost function
Training Set:

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
 852                 178
  …                    …
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
How do we choose the parameters $\theta_0, \theta_1$?
[Figure: three example lines through the data for different choices of $\theta_0, \theta_1$]
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Goal: minimize $J(\theta_0, \theta_1)$ over the parameters:

$$\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$$
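A direct transcription of this cost into Python might look as follows (a sketch; the names are my own, and x, y are NumPy arrays holding the m training inputs and targets):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(y)
    predictions = theta0 + theta1 * x           # h_theta(x) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)
```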
Cost function intuition I
Simplified hypothesis: $h_\theta(x) = \theta_1 x$ (set $\theta_0 = 0$)
Parameter: $\theta_1$
Cost Function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
Goal: $\min_{\theta_1} J(\theta_1)$
[Figure: left, $h_\theta(x)$ plotted against $x$ for a fixed $\theta_1$ (a function of $x$); right, $J(\theta_1)$ as a function of the parameter $\theta_1$]
$$J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_1 x^{(i)} - y^{(i)} \right)^2$$
[Figure: left, the line $h_\theta(x) = 0.5x$ through the points (1, 1), (2, 2), (3, 3); right, the point $(0.5, J(0.5))$ on the $J(\theta_1)$ curve]
$$J(0.5) = \frac{1}{6}\left[(0.5-1)^2 + (1-2)^2 + (1.5-3)^2\right] = \frac{3.5}{6} \approx 0.58$$
[Figure: left, the horizontal line $h_\theta(x) = 0$ ($\theta_1 = 0$) against the same points; right, the point $(0, J(0))$ on the $J(\theta_1)$ curve]
$$J(0) = \frac{1}{6}\left[1^2 + 2^2 + 3^2\right] = \frac{14}{6} \approx 2.3$$
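Both worked values are easy to check numerically; a sketch on the three toy points (1, 1), (2, 2), (3, 3) used above:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J = (1/2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(y)
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

print(compute_cost(0.0, 0.5, x, y))   # 0.5833... = 3.5/6, matching J(0.5)
print(compute_cost(0.0, 0.0, x, y))   # 2.3333... = 14/6, matching J(0)
print(compute_cost(0.0, 1.0, x, y))   # 0.0: theta1 = 1 fits these points exactly
```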
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
[Figure: left, the housing data (price in $1000s vs. size in feet²) with a candidate line $h_\theta(x)$, for fixed $\theta_0, \theta_1$ a function of $x$; right, $J(\theta_0, \theta_1)$ as a function of the parameters, drawn as a surface/contour plot; successive frames show different choices of $(\theta_0, \theta_1)$]
Linear regression
with one variable
Gradient descent
Have some function $J(\theta_0, \theta_1)$.
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
Outline:
• Start with some $\theta_0, \theta_1$ (say, $\theta_0 = 0$, $\theta_1 = 0$).
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$, until we hopefully end up at a minimum.
[Figure: two 3D surface plots of $J(\theta_0, \theta_1)$ over the $(\theta_0, \theta_1)$ plane; starting gradient descent from different initial points can lead to different local minima]
Gradient descent algorithm

repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$    (simultaneously for $j = 0$ and $j = 1$)
}

where $\alpha$ is the learning rate.
Gradient descent intuition
Gradient descent algorithm (one-parameter version): $\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$

If $\alpha$ is too small, gradient descent can be slow. If $\alpha$ is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.

[Figure: $J(\theta_1)$ against the current value of $\theta_1$, showing the step taken by each update]
Gradient descent can converge to a local minimum even with the learning rate $\alpha$ held fixed: as we approach a local minimum, the gradient shrinks, so gradient descent automatically takes smaller steps. There is no need to decrease $\alpha$ over time.
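Both behaviors show up even on a toy objective. A hypothetical demo on $J(\theta) = \theta^2$ (gradient $2\theta$), where a small $\alpha$ crawls, a moderate $\alpha$ converges, and $\alpha > 1$ diverges:

```python
def descend(alpha, theta=1.0, steps=20):
    """Run `steps` gradient descent updates on J(theta) = theta**2."""
    for _ in range(steps):
        theta -= alpha * 2 * theta     # theta := theta - alpha * dJ/dtheta
    return theta

print(descend(0.01))   # ~0.67: too small, still far from the minimum at 0
print(descend(0.30))   # ~1e-8: converges quickly
print(descend(1.10))   # ~38:   too large, |theta| grows every step (diverges)
```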
Linear regression with one variable
Gradient descent for linear regression
$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$

$$j = 0: \quad \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$j = 1: \quad \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}$$
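The two analytic derivatives can be sanity-checked against central finite differences (a sketch on the toy data; all names are my own):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(y)

def J(t0, t1):
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

t0, t1 = 0.3, 0.7
errors = t0 + t1 * x - y
grad0 = errors.sum() / m                # analytic dJ/dtheta0
grad1 = (errors * x).sum() / m          # analytic dJ/dtheta1

eps = 1e-6                              # central finite differences agree
print(grad0, (J(t0 + eps, t1) - J(t0 - eps, t1)) / (2 * eps))
print(grad1, (J(t0, t1 + eps) - J(t0, t1 - eps)) / (2 * eps))
```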
Gradient descent algorithm for linear regression

repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}

Update $\theta_0$ and $\theta_1$ simultaneously.
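Putting the update rule into code: a minimal batch implementation (a sketch under my own naming; $\alpha$ and the iteration count are arbitrary demo choices):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = (theta0 + theta1 * x) - y
        grad0 = errors.sum() / m           # both gradients are computed first,
        grad1 = (errors * x).sum() / m     # so the update below is simultaneous
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# On the toy data the fit approaches theta0 = 0, theta1 = 1:
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))
```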
[Figure: for linear regression, $J(\theta_0, \theta_1)$ is a convex, bowl-shaped surface over the $(\theta_0, \theta_1)$ plane, so gradient descent has a single global minimum to find]
[Figure: successive gradient descent steps traced on the contour plot of $J(\theta_0, \theta_1)$ (a function of the parameters); at each step the left panel shows the current line $h_\theta(x)$, for fixed $\theta_0, \theta_1$ a function of $x$, fit to the housing data]
“Batch” Gradient Descent: each step of gradient descent uses all $m$ training examples.
Thank you