
Lecture 3a

Linear Regression
with One Variable

Silvia Ahmed CSE445 Machine Learning ECE@NSU 1


Learning goals

• After this presentation, you should be able to:


• Understand the Model Representation
• Explain the Hypothesis Function
• Understand the Cost Function
• Apply the Gradient Descent Algorithm
• Describe Batch Gradient Descent
• Use Matrix Multiplication for Implementation

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 2


Linear Regression
• Housing price data example
• Supervised learning regression problem
• What do we start with?
  • Training set (this is your data set):

  Size in feet² (x) | Price ($) in 1000's (y)
  2104              | 460
  1416              | 232
  1534              | 315
  852               | 178
  …                 | …

• Notation (used throughout the course)
  • m = number of training examples
  • x's = input variables / features
  • y's = output variable / "target" variable
  • (x, y) - a single training example
  • (x(i), y(i)) - a specific example (the ith training example)
  • i is an index into the training set

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 3


Training Set of Housing prices

[Scatter plot "Housing Price": size in feet² (x-axis, 0-3000) vs. price in $1000s (y-axis, 0-600).]

  Size in feet² (x) | Price ($) in 1000's (y)
  2104              | 460
  1416              | 232
  1534              | 315
  852               | 178
  …                 | …

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 4


Model Representation
• With our training set defined - how do
we use it?
• Take training set
• Pass into a learning algorithm
• Algorithm outputs a function (denoted h )
(h = hypothesis)
• This function takes an input (e.g. size of new
house)
• Tries to output the estimated value of y

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 5


How do we Represent h?
• Hypothesis function h -
• hθ(x) = θ0 + θ1x

[Scatter plot "Housing Price" of the housing data - size in feet² (x-axis, 0-3000) vs. price in $1000s (y-axis, 0-600) - with the fitted straight line h(x) = θ0 + θ1x drawn through the points.]

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 6


Model Representation
• What does this mean?
• Means y is a linear function of x.
• θi are parameters
• θ0 is the zero condition (intercept - the value of h when x = 0)
• θ1 is the gradient (slope)
• This kind of function is a linear regression with one variable
• Also called Univariate Linear Regression
• So in summary
• A hypothesis takes in some variable
• Uses parameters determined by a learning system
• Outputs a prediction based on that input

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 7


Hypothesis Function
hθ(x) = θ0 + θ1x
• The equation of a straight line.
• We set values for θ0 and θ1 to get the estimated output hθ(x).
• hθ is a function that tries to map the input data (the x's) to the output data (the y's).
• Example -

  Input x | Output y
  0       | 4
  1       | 7
  2       | 7
  3       | 8

• Let θ0 = 2 and θ1 = 2, then hθ(x) = 2 + 2x
• For input 1 to our hypothesis, hθ(1) = 4 - off by 3 from the target y = 7.
• We need to try out various values of θ0 and θ1 to find the best possible fit -
• or the most representative "straight line" through the data points mapped on the x-y plane.

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 8
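As a quick sanity check of the example above, here is a small Python sketch (not from the slides) that evaluates the guessed hypothesis hθ(x) = 2 + 2x on the four sample points and prints how far each prediction is from the target:

```python
# Illustrative check of the slide's example: theta0 = 2, theta1 = 2.
theta0, theta1 = 2.0, 2.0

def h(x):
    return theta0 + theta1 * x  # h_theta(x) = theta0 + theta1 * x

for x, y in [(0, 4), (1, 7), (2, 7), (3, 8)]:
    print(f"x={x}  h(x)={h(x):.0f}  y={y}  error={h(x) - y:.0f}")
# For x = 1 the prediction is 4 while the target is 7, i.e. off by 3,
# matching the slide.
```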


Cost Function
  Size in feet² (x) | Price ($) in 1000's (y)
  2104              | 460
  1416              | 232
  1534              | 315
  852               | 178
  …                 | …

• Hypothesis: hθ(x) = θ0 + θ1x
• θi's: parameters

How to choose θi's?

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 9


Cost Function
• A cost function lets us figure out how to fit the best straight line
to our data.
• We can measure the accuracy of our hypothesis function by
using a cost function.
• Choosing values for θi (parameters)
• Different values give different functions.

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 10


Cost Function
hθ(x) = θ0 + θ1x

[Three example lines, each plotted on 0-3 axes:]

  h(x) = 1.5            (θ0 = 1.5, θ1 = 0)
  h(x) = 0.5x           (θ0 = 0,   θ1 = 0.5)
  h(x) = 1 + 0.5x       (θ0 = 1,   θ1 = 0.5)

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 11


Linear Regression – Cost Function

• Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y)

• Based on our training set we want to generate parameters which make the straight line fit the data as well as possible
• Choose these parameters so hθ(x) is close to y for our training examples
  • Basically, use the x's in the training set with hθ(x) to give an output which is as close to the actual y value as possible
  • Think of hθ(x) as a "y imitator" - it tries to convert the x into y, and since we already have the actual y, we can evaluate how well hθ(x) does this

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 12


Cost Function
• To formalize this;
• We want to solve a minimization problem
• Minimize the difference between hθ(x) and y for each training example:
  • minimize (hθ(x) − y)²
• Sum this over the training set

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 13


Cost Function
• Minimize the squared difference between the predicted house price and the actual house price
• 1/(2m)
  • 1/m - means we determine the average
  • 1/(2m) - the 2 makes the math a bit easier, and doesn't change the values we determine at all (i.e. half the smallest value is still the smallest value!)
• Minimizing this difference means we get the values of θ0 and θ1 which on average give the minimal deviation of hθ(x) from y when we use those parameters in our hypothesis function
• More cleanly, this is a cost function:

  J(θ0, θ1) = (1/2m) · Σ (i = 1 to m) (hθ(x(i)) − y(i))²

• And we want to minimize this cost function: minimize J(θ0, θ1) over θ0, θ1

• Our cost function is (because of the summation term) inherently looking at ALL the data in the training set at any time

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 14
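To make the formula concrete, here is a minimal Python sketch of the cost function. The function name compute_cost and the reuse of the four housing rows from the earlier slides are illustrative choices, not part of the lecture:

```python
def compute_cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)

# Example call with the housing rows shown earlier (size in ft^2, price in $1000s):
sizes  = [2104, 1416, 1534, 852]
prices = [460, 232, 315, 178]
print(compute_cost(0.0, 0.2, sizes, prices))
```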


Cost Function
• Let's consider some intuition about the cost function and why we want to use it
• The cost function determines the parameters
  • The values of the parameters determine how your hypothesis behaves; different values generate different costs
• Simplified hypothesis
  • Assume θ0 = 0
  • hθ(x) = θ1x
  • [Plot: the line hθ(x) = θ1x on 0-3 axes, passing through the origin since θ0 = 0.]

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 15


Cost Function
• Cost function and goal here are very similar to when we have θ0, but with a
simpler parameter
• Simplified hypothesis makes visualizing cost function J(θ) a bit easier
• So the hypothesis passes through (0, 0)
• Two key functions we want to understand
  • hθ(x)
    • the hypothesis as a function of x - a function of what the size of the house is
  • J(θ1)
    • the cost as a function of the parameter θ1

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 16


Cost Function
hθ(x) = θ1x

[Left plot: hθ(x) as a function of x (for fixed θ1). With θ1 = 1 the line passes exactly through all the training points.
Right plot: J(θ1) as a function of the parameter θ1. Since the fit is exact at θ1 = 1, J(1) = 0.]

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 17


Cost Function
hθ(x) = θ1x

[Left plot: hθ(x) with θ1 = 0.5 - the line now passes below the training points.
Right plot: J(θ1) as a function of the parameter θ1 - the corresponding cost is J(0.5) ≈ 0.58.]

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 18
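The J(0.5) ≈ 0.58 value can be reproduced in a few lines, assuming (as the plots suggest) the toy training set (1, 1), (2, 2), (3, 3); that data set is inferred from the figures, not stated in the text:

```python
# Assumed toy data from the plots: the line with theta1 = 1 fits it exactly (J = 0).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

theta1 = 0.5
m = len(xs)
J = sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
print(round(J, 2))  # 0.58, matching the right-hand plot
```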


Cost Function
hθ(x) = θ1x

[Left plot: hθ(x) for several values of θ1 (0, 0.5, 1).
Right plot: plotting J(θ1) for each value traces out a bowl-shaped curve in θ1, whose minimum is at θ1 = 1.]

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 19


Deeper Insight - Cost Function
• Using our original hypothesis with two parameters, θ0 and θ1,
• the cost function is
  • J(θ0, θ1)
• Example,
• Say
• θ0 = 50
• θ1 = 0.06
• Previously we plotted our cost function by plotting
• θ1 vs J(θ1)

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 20


Deeper Insight - Cost Function
• Now we have two parameters
• Plot becomes a bit more
complicated
• Generates a 3D surface plot where the axes are
  • X = θ1
  • Z = θ0
  • Y = J(θ0, θ1)
• the height (y) indicates the value
of the cost function, so find where
y is at a minimum

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 21


Deeper Insight - Cost Function
• Instead of a surface plot we can use contour figures/plots
  • A set of ellipses in different colors
  • Each ellipse (color) is a set of points with the same value of J(θ0, θ1); they plot to different locations because θ0 and θ1 vary
  • Imagine a bowl-shaped function coming out of the screen, so the middle of the concentric ellipses is the minimum

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 22


Deeper Insight - Cost Function
• Each point on the contour plot represents a pair of parameter values for θ0 and θ1
• Our example here put the values at
• θ0 = ~800
• θ1 = ~-0.15
• Not a good fit
• i.e. these parameters give a value on our contour plot far from the center
• If we have
• θ0 = ~360
• θ1 = 0
• This gives a better hypothesis, but still not great - not in the center of the contour plot
• Finally we find the minimum, which gives the best hypothesis
• Doing this by eye/hand is a painful task
• What we really want is an efficient algorithm for finding the minimum for θ0 and θ1

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 23


Deeper Insight - Cost Function

[Slides 24-28: figure-only slides (hypothesis lines and the corresponding contour plots of J(θ0, θ1)); no extractable text.]


Gradient Descent Algorithm
• Minimize cost function J
• Gradient descent
• Used all over machine learning for minimization
• Start by looking at a general J() function
• Problem
• We have J(θ0, θ1)
• We want to get min J(θ0, θ1)
• Gradient descent applies to more general functions
• J(θ0, θ1, θ2 .... θn)
• min J(θ0, θ1, θ2 .... θn)

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 29


Gradient Descent Algorithm
How does it work?
• Start with initial guesses
  • Start at (0, 0) (or any other value)
• Keep changing θ0 and θ1 a little bit to try and reduce J(θ0, θ1)
  • Each time you change the parameters, you take a step in the direction that reduces J(θ0, θ1) the most
• Repeat
  • Do so until you converge to a local minimum
• Has an interesting property
  • Where you start can determine which minimum you end up in

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 30


Gradient Descent Algorithm

• Here we can see one initialization point led to one local minimum
• The other led to a different one
Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 31
Gradient Descent Algorithm

• What does this all mean?
• Each parameter θj is repeatedly updated as
  • θj := θj − α · ∂/∂θj J(θ0, θ1)
  • i.e. subtract from θj the learning rate α times the partial derivative of the cost function with respect to θj
• Notation
  • := denotes assignment
  • α (alpha)
    • is a number called the learning rate
    • controls how big a step you take
    • if α is big we have an aggressive gradient descent
    • if α is small we take tiny steps
  • ∂/∂θj J(θ0, θ1) is the derivative term
Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 32


Gradient Descent Algorithm

• There is a subtlety about how this gradient descent algorithm is implemented
• We do this for θ0 and θ1
  • For j = 0 and j = 1, we simultaneously update both
• How do we do this?
  • Compute the right-hand side for both θ0 and θ1
  • So we need temp values
  • Then update θ0 and θ1 at the same time (see the sketch below)

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 33
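A minimal sketch of that simultaneous update, assuming helper functions dJ_dtheta0 and dJ_dtheta1 (hypothetical names) that evaluate the two partial derivatives at the current parameters:

```python
def gradient_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    # Evaluate BOTH right-hand sides with the old theta0 and theta1 ...
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    # ... and only then assign, so neither update sees a half-updated parameter.
    return temp0, temp1
```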


Gradient Descent Algorithm
• To understand gradient descent, we'll return to a simpler function where
we minimize one parameter to help explain the algorithm in more detail
• where θ1 is a real number
• Two key terms in the algorithm
• Alpha
• Derivative term
• Notation nuances
  • Partial derivative (∂) vs. derivative (d)
    • Use the partial derivative when the function has multiple variables but we differentiate with respect to only one
    • Use the ordinary derivative when the function has a single variable

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 34


Gradient Descent Algorithm
• Derivative term: d/dθ1 J(θ1)
• The derivative says
  • take the tangent at the current point and look at the slope of that line
  • if the slope is negative (we are to the left of the minimum), θ1 := θ1 − α · (negative number) increases θ1, moving it towards the minimum
  • if the slope is positive (we are to the right of the minimum), θ1 decreases, again moving towards the minimum
  • either way, J(θ1) becomes smaller

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 35


Gradient Descent Algorithm
[Two plots of J(θ1) for θ1 ∈ ℝ:]

• Positive slope (point to the right of the minimum):
  d/dθ1 J(θ1) ≥ 0, so θ1 := θ1 − α · (positive number) - θ1 decreases.

• Negative slope (point to the left of the minimum):
  d/dθ1 J(θ1) ≤ 0, so θ1 := θ1 − α · (negative number) - θ1 increases.

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 36


Gradient Descent Algorithm

[Two plots of J(θ1) for θ1 ∈ ℝ, illustrating the update θ1 := θ1 − α · d/dθ1 J(θ1):]

• If α is too small, gradient descent takes tiny steps and can be slow.
• If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 37
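A toy illustration (not from the slides) of both failure modes, using the one-parameter cost J(θ) = θ², whose derivative is 2θ:

```python
def run(alpha, theta=1.0, steps=10):
    for _ in range(steps):
        theta = theta - alpha * (2 * theta)  # gradient descent on J(theta) = theta^2
    return theta

print(run(alpha=0.01))  # too small: after 10 steps theta is still ~0.82 (slow)
print(run(alpha=0.5))   # reasonable: reaches the minimum at theta = 0
print(run(alpha=1.1))   # too large: |theta| grows every step (diverges)
```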


Gradient Descent Algorithm
• Suppose θ1 is already at a local optimum of J(θ1) (θ1 ∈ ℝ).
• There the slope of the tangent is zero: d/dθ1 J(θ1) = 0.
• So the update θ1 := θ1 − α · 0 leaves θ1 unchanged - gradient descent stays at the optimum.

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 38


Gradient Descent Algorithm
• Gradient descent can converge to a local minimum, even with the
learning rate α fixed.
• The update is θ1 := θ1 − α · d/dθ1 J(θ1).
• As we approach a local minimum, the derivative term gets smaller, so gradient descent automatically takes smaller steps. There is no need to decrease α over time.

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 39


Gradient Descent Algorithm

[Slides 40-41: figure-only content; no extractable text.]


Gradient Descent Algorithm

For the linear regression cost function, plugging the derivatives into the update rule gives:

repeat until convergence {
    θ0 := θ0 − α · (1/m) · Σ (i = 1 to m) (hθ(x(i)) − y(i))
    θ1 := θ1 − α · (1/m) · Σ (i = 1 to m) (hθ(x(i)) − y(i)) · x(i)
}
(update θ0 and θ1 simultaneously)

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 42
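A short sketch of how those two derivative terms could be computed for a given training set; the function name gradients and the use of plain Python lists are illustrative choices, not from the slides:

```python
def gradients(theta0, theta1, xs, ys):
    """Return (1/m)*sum(h(x)-y) and (1/m)*sum((h(x)-y)*x)."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d_theta0, d_theta1
```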


Gradient Descent Algorithm

[Slides 43-54: figure-only slides illustrating successive gradient descent steps; no extractable text.]


“Batch” Gradient Descent

• “Batch”: each step of gradient descent uses all the training examples

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 55
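Putting the pieces together, a minimal batch gradient descent sketch might look like the following; every iteration sums over all m training examples, which is what makes it "batch". (With the raw house sizes and no feature scaling, a very small α would be needed in practice.)

```python
def batch_gradient_descent(xs, ys, alpha, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Each step uses ALL m training examples ("batch").
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1
```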


Implementation
House sizes: 2104, 1416, 1534, 852
Hypothesis: hθ(x) = −40 + 0.25x

Build a data matrix with one row per house (a leading 1 multiplies θ0) and multiply it by the parameter vector:

  [ 1  2104 ]               [ −40 × 1 + 0.25 × 2104 ]
  [ 1  1416 ]   [ −40  ]    [ −40 × 1 + 0.25 × 1416 ]
  [ 1  1534 ] × [ 0.25 ] =  [ −40 × 1 + 0.25 × 1534 ]
  [ 1   852 ]               [ −40 × 1 + 0.25 × 852  ]

Prediction = Data Matrix × Parameters

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 56
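The same computation in NumPy (an implementation choice, not prescribed by the slides): stack a column of ones next to the sizes and multiply by the parameter vector:

```python
import numpy as np

X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1,  852]], dtype=float)   # each row: [1, house size]
theta = np.array([-40.0, 0.25])          # h(x) = -40 + 0.25x
predictions = X @ theta
print(predictions)                       # [486.  314.  343.5 173. ]
```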


Implementation
House sizes: 2104, 1416, 1534, 852

Have 3 competing hypotheses:
1. hθ(x) = −40 + 0.25x
2. hθ(x) = 200 + 0.1x
3. hθ(x) = −150 + 0.4x

  [ 1  2104 ]                           [ 486  410  692 ]
  [ 1  1416 ]   [ −40    200   −150 ]   [ 314  342  416 ]
  [ 1  1534 ] × [ 0.25   0.1    0.4 ] = [ 344  353  464 ]
  [ 1   852 ]                           [ 173  285  191 ]

  Column 1: predictions of the 1st hθ
  Column 2: predictions of the 2nd hθ
  Column 3: predictions of the 3rd hθ

Silvia Ahmed (SvA) CSE445 Machine Learning ECE@NSU 57
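The three-hypothesis case is the same NumPy product with a parameter matrix instead of a vector; column j of the result holds the predictions of hypothesis j:

```python
import numpy as np

X = np.array([[1, 2104], [1, 1416], [1, 1534], [1, 852]], dtype=float)
Theta = np.array([[-40.0, 200.0, -150.0],   # theta0 of hypotheses 1, 2, 3
                  [ 0.25,   0.1,    0.4]])  # theta1 of hypotheses 1, 2, 3
print(X @ Theta)  # 4x3 matrix matching the table on the slide (rounded)
```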
