Locally Weighted Regression
Locally Weighted Linear Regression
By Prof. Amira Yassien
Head of Computers & Control Systems Eng. Dept.
Non-parametric learning algorithm
It helps to somewhat alleviate the need to choose
features very carefully.
This leads us into our discussion of locally weighted
regression.
Linear regression is an example of a parametric learning
algorithm.
A parametric learning algorithm is an algorithm that has a
fixed number of parameters θ that are fit to the data.
In contrast, a non-parametric learning algorithm is an
algorithm where the number of parameters grows with m,
the size of the training set.
Usually it is defined as an algorithm whose number of
parameters grows linearly with the size of the training set.
A slightly less formal definition is that the amount of data
your learning algorithm needs to keep around grows linearly
with the size of the training set; put another way, it is an
algorithm that needs to keep the entire training set around,
even after learning.
Locally weighted regression is a specific non-parametric
learning algorithm.
It is an algorithm that allows us to worry a little bit
less about having to choose features very carefully.
Let’s say that I have a training set that looks like this.
If you run linear regression on it, you may get a straight
line (red) that does not appear to be a good fit.
[Figure: scatter plot of the training data (Y vs. X) with a straight-line (red) fit]
So maybe you want to try a quadratic function, but this
isn’t really quadratic either.
So maybe you want to model this as x + x² + sin(x).
After a while you can probably come up with a set of features
with which the model fits the data well.
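As a minimal sketch of this kind of hand-crafted feature fitting (the toy data, the noise level, and the particular feature set here are only assumptions for illustration):

```python
import numpy as np

# Hypothetical 1-D toy data; any (x, y) training set would do here.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = x + 0.5 * x**2 + 3.0 * np.sin(x) + rng.normal(size=50)

# Hand-chosen features: [1, x, x^2, sin(x)].
X = np.column_stack([np.ones_like(x), x, x**2, np.sin(x)])

# Ordinary least squares fit of theta to the engineered features.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```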
Suppose you want to evaluate your hypothesis h at a certain
query point x.
Let’s say you want to know the predicted value of y
at this position of x.
[Figure: the same scatter plot, with a vertical bar marking the query point on the X axis]
For linear regression, to make a prediction at the query point x we would:
1. Fit θ to minimize ∑i (y(i) − θT x(i))².
2. Return θT x.
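A minimal sketch of these two steps, assuming a design matrix X of shape (m, n), targets y of shape (m,), and the normal equations as one way to carry out the fit:

```python
import numpy as np

def fit_linear_regression(X, y):
    """Fit theta to minimize sum_i (y(i) - theta^T x(i))^2 via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def predict(theta, x_query):
    """Return theta^T x for a query point x_query."""
    return x_query @ theta
```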
In contrast, locally weighted linear regression looks at
this point x and takes into account only the data points
that are in the little vicinity of x.
So it looks at the neighborhood of the point where we want
to evaluate the hypothesis.
Then I’m going to take these few points and apply linear
regression to fit a straight line just to this subset of
the data.
So we take this subset of the data, fit a straight line to it,
and maybe get a straight line like in the next figure.
• Then we evaluate that straight line at the query point, and
that will be the value my algorithm returns.
[Figure: scatter plot with a straight line fit locally around the marked query point]
• This would be the predicted value that the hypothesis outputs
in locally weighted regression.
Locally Weighted Regression
Fit θ to minimize ∑i w(i) (y(i) − θT x(i))².
[Figure: scatter plot with a vertical bar marking the query point x]
Then I’m going to give the points close to this query point a
large weight and give the points far away a small weight.
So for the points that are far away, w(i) will be close to
zero and they will not contribute much at all to this
summation.
So the effect of this weighting is that locally weighted
linear regression fits a set of parameters θ, paying much
more attention to fitting the nearby points accurately
while ignoring the contribution from faraway points.
If |x(i) − x| is small, then w(i) ≈ 1.
If |x(i) − x| is large, then w(i) ≈ 0.
There’s one other parameter to this algorithm, which will be
denoted as τ.
w(i) = exp(−(x(i) − x)² / (2τ²))
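As a quick numeric illustration of how this weight behaves (assuming τ = 1, which is just an illustrative choice):

```python
import numpy as np

tau = 1.0
for dist in (0.1, 1.0, 3.0):                      # |x(i) - x|
    w = np.exp(-dist**2 / (2 * tau**2))           # bell-shaped weight
    print(f"|x(i) - x| = {dist}: w = {w:.3f}")    # 0.995, 0.607, 0.011
```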
This exponentially decaying weight function is a reasonably
common one and tends to be a reasonable choice
on many problems.
It is just a convenient function that happens to be
bell-shaped.
Imagine this bell-shaped curve centered
around the position where you want to evaluate your
hypothesis h.
[Figure: bell-shaped weighting curve centered at the query point over the scatter plot]
Then you can say that the weight of a point (training
example) is proportional to the height of the bell-
shaped function evaluated at that point.
[Figure: scatter plot with vertical bars marking a training example and the query point]
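Putting the pieces together, a minimal sketch of a single locally weighted prediction might look like the following (the name lwr_predict and the default τ are just for illustration; for vector-valued inputs the squared Euclidean distance is used inside the weight):

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    """Predict at a single query point with locally weighted linear regression.

    X: (m, n) design matrix, y: (m,) targets, x_query: (n,) query point,
    tau: bandwidth controlling how fast the weights fall off with distance.
    """
    # Bell-shaped weights: w(i) = exp(-|x(i) - x|^2 / (2 tau^2)).
    diffs = X - x_query
    w = np.exp(-np.sum(diffs**2, axis=1) / (2.0 * tau**2))

    # Fit theta to minimize sum_i w(i) (y(i) - theta^T x(i))^2,
    # i.e. solve the weighted normal equations X^T W X theta = X^T W y.
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    # Return the prediction theta^T x at the query point.
    return x_query @ theta
```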
So because locally weighted regression is a non-
parametric algorithm, every time you make a prediction
you need to fit θ to your entire training set again.
If you have a very large training set, this is an
expensive algorithm to use, because every time you want
to make a prediction you need to fit a straight line to a
huge data set again.
There are ways to make this much more efficient for
large data sets as well.
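For example, reusing the lwr_predict sketch above on a small 1-D toy problem (the data and the τ value here are assumptions), predicting along a grid of query points means one weighted least-squares solve per query:

```python
import numpy as np

x = np.linspace(0, 10, 300)
y = np.sin(x) + 0.1 * x**2
X = np.column_stack([np.ones_like(x), x])   # intercept + raw feature

# One call (and hence one re-fit of theta) per query point.
# The constant intercept column contributes nothing to the distance.
queries = np.linspace(0, 10, 50)
preds = [lwr_predict(X, y, np.array([1.0, q]), tau=0.5) for q in queries]
```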
Remember
In the original linear regression algorithm, to make a prediction
at a query point x (i.e., to evaluate h(x)), we would:
1. Fit θ to minimize ∑i (y(i) − θT x(i))².
2. Output θT x.