CSE445 T3 Linear Regression One Variable
Linear Regression with One Variable
Housing Price

[Figure: scatter plot of the training data — Price (in 1000s of $) against Size (feet²)]

Training set:

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Housing Price

[Figure: the same scatter plot with a straight line fitted through the data — Price (in 1000s of $) against Size (feet²)]

Hypothesis: h(x) = θ0 + θ1x
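A minimal sketch of this hypothesis as Python code; the parameter values below are made-up placeholders chosen only to show how predictions are produced, not values fitted to the data.

    def h(theta0, theta1, x):
        # Hypothesis for linear regression with one variable: h(x) = theta0 + theta1*x
        return theta0 + theta1 * x

    # Training set from the table above (size in feet^2, price in 1000s of $)
    sizes = [2104, 1416, 1534, 852]
    prices = [460, 232, 315, 178]

    # Hypothetical, un-fitted parameters, purely for illustration
    theta0, theta1 = 50.0, 0.15
    predictions = [h(theta0, theta1, x) for x in sizes]
    print(predictions)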
[Figure: three example hypotheses —
 θ0 = 1.5, θ1 = 0   (horizontal line at 1.5);
 θ0 = 0,   θ1 = 0.5 (line through the origin with slope 0.5);
 θ0 = 1,   θ1 = 0.5 (slope 0.5, intercept 1)]
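As a quick check of how the parameters change the line, the three (θ0, θ1) pairs above can be evaluated at a few x values (a small sketch; the x values are arbitrary):

    def h(theta0, theta1, x):
        return theta0 + theta1 * x

    # The three parameter pairs from the panels above
    for theta0, theta1 in [(1.5, 0), (0, 0.5), (1, 0.5)]:
        print(theta0, theta1, [h(theta0, theta1, x) for x in (0, 1, 2, 3)])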
The parameters θ0 and θ1 determine how the hypothesis maps an input x to a predicted y.
• Based on our training set, we want to generate parameters which define the straight line
• Choose these parameters so that hθ(x) is close to y for our training examples
• In other words, use the x's in the training set with hθ(x) to produce an output that is as close to the actual y value as possible
• Think of hθ(x) as a "y imitator" - it tries to convert the x into y, and since we already have the true y, we can evaluate how well hθ(x) does this (see the cost-function sketch below)
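The way "close to y" is measured is with a cost function. The standard squared-error cost, which matches the J(θ1) values shown later, is J(θ0, θ1) = (1/2m) · Σ (hθ(x) − y)² over the m training examples. A minimal sketch:

    def h(theta0, theta1, x):
        return theta0 + theta1 * x

    def cost(theta0, theta1, xs, ys):
        # J(theta0, theta1) = (1/2m) * sum over training examples of (h(x) - y)^2
        m = len(xs)
        return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)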
Simplified hypothesis: hθ(x) = θ1x (fix θ0 = 0, so the line passes through the origin)
hθ(x) (for fixed θ1, this is a function of x)          J(θ1) (function of the parameter θ1)

[Figure, left: with θ1 = 1 the line hθ(x) = x passes through all three training points (1, 1), (2, 2), (3, 3).
 Figure, right: the corresponding point on the cost curve — J(1) = 0.]
hθ(x) (for fixed θ1, this is a function of x)          J(θ1) (function of the parameter θ1)

[Figure, left: with θ1 = 0.5 the line hθ(x) = 0.5x undershoots the training points.
 Figure, right: the corresponding point on the cost curve — J(0.5) ≈ 0.58.]
hθ(x) (for fixed θ1, this is a function of x)          J(θ1) (function of the parameter θ1)

[Figure, left: the lines hθ(x) for θ1 = 1, θ1 = 0.5 and θ1 = 0 plotted together.
 Figure, right: J(θ1) traced out over a range of θ1 values — a bowl-shaped curve with its minimum at θ1 = 1.]
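The cost values in the panels can be reproduced directly. Assuming, as the plots suggest, the toy training set (1, 1), (2, 2), (3, 3), a minimal sketch:

    def cost(theta1, xs, ys):
        # J(theta1) = (1/2m) * sum of (theta1*x - y)^2, for the simplified hypothesis h(x) = theta1*x
        m = len(xs)
        return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

    xs, ys = [1, 2, 3], [1, 2, 3]
    print(cost(1.0, xs, ys))   # 0.0   - perfect fit, as in the first panel
    print(cost(0.5, xs, ys))   # 0.583 - rounds to the 0.58 shown on the slide
    print(cost(0.0, xs, ys))   # 2.333 - the horizontal line through the origin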
• Derivative term
• What the derivative says:
• Take the tangent to J(θ1) at the current point and look at the slope of that line
• If we are to the left of the minimum (the curve slopes down), the derivative is negative; α is always positive, so the update θ1 := θ1 − α · (negative number) increases θ1, moving it towards the minimum and making J(θ1) smaller
• Similarly, if we are on an upward slope (to the right of the minimum), the derivative is positive and the update decreases θ1, again making J(θ1) smaller (a numeric sketch follows the two cases below)
Positive slope (to the right of the minimum):
    d/dθ1 J(θ1) ≥ 0, so θ1 := θ1 − α · d/dθ1 J(θ1) decreases θ1

Negative slope (to the left of the minimum):
    d/dθ1 J(θ1) ≤ 0, so θ1 := θ1 − α · d/dθ1 J(θ1) increases θ1
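A small numeric sketch of a single update, again assuming the toy training set (1, 1), (2, 2), (3, 3); the derivative is approximated numerically here rather than derived analytically:

    def cost(theta1, xs, ys):
        m = len(xs)
        return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

    def dcost(theta1, xs, ys, eps=1e-6):
        # Numerical approximation of d/dtheta1 J(theta1)
        return (cost(theta1 + eps, xs, ys) - cost(theta1 - eps, xs, ys)) / (2 * eps)

    xs, ys = [1, 2, 3], [1, 2, 3]
    alpha = 0.1

    theta1 = 2.0   # right of the minimum: positive slope, so the update decreases theta1
    print(theta1 - alpha * dcost(theta1, xs, ys))

    theta1 = 0.0   # left of the minimum: negative slope, so the update increases theta1
    print(theta1 - alpha * dcost(theta1, xs, ys))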
θ1 := θ1 − α · d/dθ1 J(θ1)

If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
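A small sketch of this behaviour on the same toy data; the specific α values are illustrative choices, not taken from the slides:

    def grad(theta1, xs, ys):
        # Analytic derivative: d/dtheta1 J(theta1) = (1/m) * sum of (theta1*x - y) * x
        m = len(xs)
        return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

    xs, ys = [1, 2, 3], [1, 2, 3]

    for alpha in (0.01, 0.1, 0.5):
        theta1 = 0.0
        for _ in range(20):
            theta1 -= alpha * grad(theta1, xs, ys)
        print(alpha, theta1)
    # alpha = 0.01 creeps slowly towards the minimum at theta1 = 1
    # alpha = 0.1  reaches theta1 ~ 1 comfortably
    # alpha = 0.5  overshoots on every step and diverges for this data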
J(θ1), θ1 ∈ ℝ

[Figure: if θ1 is already at a local optimum, the slope d/dθ1 J(θ1) is zero, so the update θ1 := θ1 − α · 0 leaves θ1 unchanged.]
Update θ0 and θ1 simultaneously
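A minimal sketch of what "simultaneously" means in code: both updates are computed from the old θ0 and θ1 before either parameter is overwritten. The gradient expressions are the standard ones for the squared-error cost.

    def gradient_descent_step(theta0, theta1, xs, ys, alpha):
        m = len(xs)
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]

        # Compute both updates from the *current* theta0 and theta1 ...
        temp0 = theta0 - alpha * sum(errors) / m
        temp1 = theta1 - alpha * sum(e * x for e, x in zip(errors, xs)) / m

        # ... then assign them together (the simultaneous update)
        return temp0, temp1

    # Updating theta0 first and then reusing the new theta0 when computing
    # theta1's update would NOT be a simultaneous update.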