Lecture 2: Linear Regression with One Variable

Linear regression with one variable

Model representation
Instructor: Wei Ding
Adapted from the lecture notes of Andrew Ng, Stanford University
Background
• Parametric models for classification and regression.
• We make some assumptions about the nature of the data distribution (either p(y|x) for a supervised problem or p(x) for an unsupervised problem).
• These assumptions, known as inductive bias, are often embodied in the form of a parametric model, which is a statistical model with a fixed number of parameters. Examples:
  – Linear regression
  – Logistic regression

[Figure: Housing Prices (Portland, OR). Scatter plot of Price (in 1000s of dollars) against Size (feet²); a house of size 1250 feet² is read off the fitted line at roughly 220K.]
Supervised learning: the "right answer" is given for each example in the data.
Regression problem: predict a real-valued output.
Training set of housing prices (Portland, OR):

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460        ← (x⁽¹⁾, y⁽¹⁾)
  1416                 232
  1534                 315
  852                  178
  …                    …          (m = 47 examples)

Notation:
  m: number of training examples
  x: "input" variable / feature
  y: "output" variable / "target" variable
  (x, y): one training example
  (x⁽ⁱ⁾, y⁽ⁱ⁾): the i-th training example

Quiz: x⁽³⁾ = ? y⁽³⁾ = ? (From the table: x⁽³⁾ = 1534 and y⁽³⁾ = 315.)
Training Set → Learning Algorithm → h (the hypothesis)

h takes the size of a house x as input and outputs the estimated price, the estimated value of y; that is, h maps from x's to y's.

How do we represent h?

  h_θ(x) = θ₀ + θ₁x        (shorthand: h(x))

This model is linear regression with one variable, also called univariate linear regression.
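As a concrete illustration (not part of the original slides), here is a minimal sketch of this hypothesis in Python; the parameter values below are made up purely for the example:

```python
def h(x, theta0, theta1):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Hypothetical parameter values, chosen only for illustration.
theta0, theta1 = 50.0, 0.2
print(h(1250, theta0, theta1))  # estimated price (in $1000s) for a 1250 ft^2 house -> 300.0
```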
Linear regression with one variable
Cost function
Training set:

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460
  1416                 232
  1534                 315
  852                  178
  …                    …

Hypothesis: h_θ(x) = θ₀ + θ₁x
θ₀, θ₁: parameters

How do we choose the θ's?
[Figure: three example hypotheses for different parameter choices, e.g. h(x) = 1.5 + 0·x (a horizontal line), h(x) = 0 + 0.5·x, and h(x) = 1 + 0.5·x.]
m is the number of training examples.

Idea: choose θ₀ and θ₁ so that h_θ(x) is close to y for our training examples (x, y).

Minimize over θ₀, θ₁ the cost function

  J(θ₀, θ₁) = (1/2m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

where h_θ(x⁽ⁱ⁾) = θ₀ + θ₁x⁽ⁱ⁾. This J is called the squared error cost function.
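To make the formula concrete, here is a minimal sketch of the cost computation in Python (the parameter values passed at the bottom are arbitrary, chosen only to exercise the function):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared error cost: J = (1/2m) * sum over i of (h_theta(x^(i)) - y^(i))^2."""
    m = len(x)
    predictions = theta0 + theta1 * x      # h_theta(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# The four housing examples from the training-set table above.
x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in feet^2
y = np.array([460.0, 232.0, 315.0, 178.0])       # price in $1000s
print(compute_cost(x, y, 0.0, 0.2))              # J for the arbitrary guess theta0=0, theta1=0.2
```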
Linear regression with one variable
Cost function intuition I
Simplified hypothesis: set θ₀ = 0, so that

  h_θ(x) = θ₁x

Parameter: θ₁
Cost function: J(θ₁) = (1/2m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Goal: minimize J(θ₁) over θ₁

With θ₀ = 0, every hypothesis h(x) is a line through the origin.
(For a fixed θ₁, h_θ(x) is a function of x; J(θ₁) is a function of the parameter θ₁.)

[Figure: left, the line h_θ(x) = x (θ₁ = 1) passing exactly through the training points (1,1), (2,2), (3,3); right, the point J(1) = 0 on the J(θ₁) curve.]

For θ₁ = 1, every prediction matches its target, so

  J(1) = (1/2m) · Σ_{i=1}^{m} (θ₁x⁽ⁱ⁾ − y⁽ⁱ⁾)² = (1/(2·3)) · (0² + 0² + 0²) = 0
[Figure: left, the line h_θ(x) = 0.5x against the same three training points, with the vertical errors h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ marked; right, the point J(0.5) on the J(θ₁) curve.]

For θ₁ = 0.5:

  J(0.5) = (1/(2·3)) · [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] = (1/6) · 3.5 ≈ 0.58
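These two values are easy to check numerically with the compute_cost sketch from earlier (assumed to be in scope; the three training points are read off the slide's plot):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(x, y, 0.0, 1.0))  # 0.0             -> J(1)
print(compute_cost(x, y, 0.0, 0.5))  # 0.5833... ≈ 0.58 -> J(0.5)
```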
[Figure: left, several candidate lines h_θ(x) for different values of θ₁; right, plotting J(θ₁) for each value traces out a bowl-shaped curve, minimized at θ₁ = 1.]
Linear regression with one variable
Cost function intuition II
Hypothesis: h_θ(x) = θ₀ + θ₁x

Parameters: θ₀, θ₁

Cost function:

  J(θ₀, θ₁) = (1/2m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
[Figure: the housing data, Price ($) in 1000's against Size in feet² (x), with one candidate hypothesis line.]

We discussed what the cost function looks like with one parameter θ₁. For two parameters θ₀, θ₁, what does it look like?
J(θ₀, θ₁) is now a 3D bowl-shaped surface. We will use contour plots (contour figures) to show this 3D surface.
[Figure: left, the line h_θ(x) for θ₀ = 800, θ₁ = −0.15 against the housing data; right, the contour plot of J(θ₀, θ₁). The three marked points lie on the same contour line, so they have the same value of J(θ₀, θ₁).]

Where is the minimum value of J(θ₀, θ₁) on the contour plot?
[Figure: the flat hypothesis h(x) = 360 + 0·x (θ₀ = 360, θ₁ = 0) against the housing data, and the corresponding point on the contour plot of J(θ₀, θ₁).]
[Figures: two more example hypotheses h(x) and their corresponding points on the contour plot; points closer to the center of the contours correspond to lines that fit the data better.]
Linear regression with one variable
Gradient descent
Have some function J(θ₀, θ₁).
Want min over θ₀, θ₁ of J(θ₀, θ₁).

Outline:
• Start with some θ₀, θ₁ (for example, θ₀ = 0, θ₁ = 0).
• Keep changing θ₀, θ₁ to reduce J(θ₀, θ₁) until we hopefully end up at a minimum.
J(0,1)

1
0

Andrew Ng
Let us start from a different initial value of θ₀ and θ₁.

[Figure: the same surface J(θ₀, θ₁); the search now starts near, but not at, the initial point of the previous slide.]

The two searches end at two different local minima.
Gradient descent algorithm:

  repeat until convergence {
    θⱼ := θⱼ − α · (∂/∂θⱼ) J(θ₀, θ₁)      (for j = 0 and j = 1)
  }

α is the learning rate.

Correct (simultaneous update):
  temp0 := θ₀ − α · (∂/∂θ₀) J(θ₀, θ₁)
  temp1 := θ₁ − α · (∂/∂θ₁) J(θ₀, θ₁)
  θ₀ := temp0
  θ₁ := temp1

Incorrect (sequential update):
  temp0 := θ₀ − α · (∂/∂θ₀) J(θ₀, θ₁)
  θ₀ := temp0
  temp1 := θ₁ − α · (∂/∂θ₁) J(θ₀, θ₁)      (this now uses the already-updated θ₀)
  θ₁ := temp1
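In code, the "simultaneous" requirement just means computing both temporaries from the old parameters before assigning either. A minimal sketch in Python, where grad_j0 and grad_j1 are assumed to be functions returning the two partial derivatives (their linear-regression forms appear later in the lecture):

```python
def gradient_step(theta0, theta1, grad_j0, grad_j1, alpha):
    """One simultaneous gradient-descent update of (theta0, theta1)."""
    # Both temporaries are computed from the OLD theta0 and theta1 ...
    temp0 = theta0 - alpha * grad_j0(theta0, theta1)
    temp1 = theta1 - alpha * grad_j1(theta0, theta1)
    # ... and only then are both parameters overwritten together.
    return temp0, temp1
```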
Linear regression with one variable
Gradient descent intuition
Gradient descent algorithm (one-parameter case):

  repeat until convergence {
    θ₁ := θ₁ − α · (d/dθ₁) J(θ₁)
  }

α is the learning rate; (d/dθ₁) J(θ₁) is the derivative term. If the slope at the current θ₁ is positive, the update decreases θ₁; if it is negative, the update increases θ₁. In both cases θ₁ moves toward the minimum.
If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
What happens at a local optimum? The derivative there is zero, so the update θ₁ := θ₁ − α · 0 leaves the current value of θ₁ unchanged: gradient descent stays put.
Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach a local minimum, the derivative shrinks, so gradient descent automatically takes smaller steps. There is no need to decrease α over time.
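A tiny numerical illustration of the shrinking steps, using J(θ) = θ² as an arbitrary stand-in cost (not a function from the slides):

```python
theta, alpha = 2.0, 0.1
for step in range(5):
    grad = 2 * theta                      # derivative of J(theta) = theta^2
    new_theta = theta - alpha * grad
    print(f"step {step}: theta = {theta:.4f}, step size = {abs(new_theta - theta):.4f}")
    theta = new_theta
# The printed step sizes shrink as theta approaches the minimum at 0,
# even though alpha never changes.
```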
Linear regression with one variable
Gradient descent for linear regression
Gradient descent algorithm:

  repeat until convergence {
    θⱼ := θⱼ − α · (∂/∂θⱼ) J(θ₀, θ₁)      (for j = 0 and j = 1)
  }

Linear regression model:

  h_θ(x) = θ₀ + θ₁x
  J(θ₀, θ₁) = (1/2m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Working out the partial derivatives of J for this model gives:

  (∂/∂θ₀) J(θ₀, θ₁) = (1/m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
  (∂/∂θ₁) J(θ₀, θ₁) = (1/m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
Gradient descent algorithm for linear regression:

  repeat until convergence {
    θ₀ := θ₀ − α · (1/m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
    θ₁ := θ₁ − α · (1/m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
  }

Update θ₀ and θ₁ simultaneously.
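Putting the pieces together, here is a minimal, self-contained sketch of these updates in Python; the learning rate, iteration count, and the three-point toy dataset are illustrative choices, not values from the slides:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for univariate linear regression.

    Each iteration computes both partial derivatives over all m examples,
    then updates theta0 and theta1 simultaneously.
    """
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        error = (theta0 + theta1 * x) - y            # h_theta(x^(i)) - y^(i)
        grad0 = error.sum() / m                       # dJ/dtheta0
        grad1 = (error * x).sum() / m                 # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy dataset from the cost-function intuition slides: (1,1), (2,2), (3,3).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))  # approaches (0.0, 1.0), i.e. h(x) = x
```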
J(0,1)

1
0

Andrew Ng
Andrew Ng
[Figure sequence: as gradient descent runs, the current point on the contour plot of J(θ₀, θ₁) moves step by step toward the minimum, and the corresponding hypothesis line h_θ(x) fits the housing data progressively better.]
“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all m training examples.
