
Machine Learning

Locally weighted linear regression

By
Prof. Amira Yassien
Head of Computers & Control Systems Eng. Dept.
Non-parametric learning algorithm
 It helps to somewhat alleviate the need to choose features very carefully.
 This leads us into our discussion of locally weighted regression.
 Linear regression is an example of a parametric learning algorithm.
 A parametric learning algorithm is an algorithm that has a fixed number of parameters ϴ that are fit to the data.
 In contrast, a nonparametric learning algorithm is one where the number of parameters grows with m, the size of the training set.
 Usually it is defined as an algorithm whose number of parameters grows linearly with the size of the training set.
 A slightly less formal definition is that the amount of data the learning algorithm needs to keep around grows linearly with the size of the training set; in other words, the algorithm must keep the entire training set around, even after learning.
 Locally weighted regression is a specific non-parametric learning algorithm.
 This will be an algorithm that allows us to worry a little bit less about having to choose features very carefully.
 Let's say that I have a training set that looks like this.
 If you run linear regression on this, you may get a straight line (red) which appears not to be a good fit.
[Figure: scatter plot of the training data (Y vs. X) with a straight-line fit that misses the curvature of the data]
 So maybe you want to try a quadratic function, but this isn't really quadratic either.
 So maybe you want to model this as X + X² + sin(X).
 After a while you can probably come up with a set of features that the model fits well (a rough sketch of this follows).
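As a rough illustration of this kind of manual feature selection (a sketch of my own, not from the slides; the data and the helper name fit_manual_features are made up), one could build a design matrix with the features X, X², and sin(X) and fit it by ordinary least squares:

import numpy as np

def fit_manual_features(x, y):
    # Hand-picked features: intercept, x, x^2, sin(x).
    X = np.column_stack([np.ones_like(x), x, x ** 2, np.sin(x)])
    # Least-squares fit of theta to these features.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Made-up data, just to show the call.
x = np.linspace(0.0, 6.0, 30)
y = x + 0.5 * np.sin(x) + np.random.normal(scale=0.1, size=x.shape)
theta = fit_manual_features(x, y)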
 Suppose you want to evaluate your hypothesis h at a certain point x.
 Let's say you want to know the predicted value of Y at this position of X.
[Figure: the same scatter plot, with a vertical mark at the query point x on the X axis]
 For linear regression, we would:
 Fit ϴ to minimize ∑i (ϴT x(i) − y(i))2
 Return ϴT x
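A minimal sketch of these two steps, assuming a one-dimensional input with an added intercept term (the function name ols_predict and all variable names are mine, for illustration only):

import numpy as np

def ols_predict(x_query, x_train, y_train):
    # Step 1: fit theta to minimize sum_i (theta^T x(i) - y(i))^2
    # using the normal equations on [1, x] features.
    X = np.column_stack([np.ones_like(x_train), x_train])
    theta = np.linalg.solve(X.T @ X, X.T @ y_train)
    # Step 2: return theta^T x at the query point.
    return np.array([1.0, x_query]) @ theta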
 In contrast, locally weighted linear regression looks at this point x and takes into account only the data points in the little vicinity of x.
 So it looks at the vicinity of the point where we want to evaluate the hypothesis.
 Then I'm going to take these few points and apply linear regression to fit a straight line just to this subset of the data.
 So we take this subset of the data, fit a straight line to it, and maybe get a straight line like in the next figure.
• Then I evaluate that straight line at this particular value of x, and that is the value my algorithm returns.

[Figure: the scatter plot with a straight line fitted only to the points near the query, and a vertical mark at the query point]
• This would be the predicted value that the hypothesis outputs in locally weighted regression.
Locally Weighted Regression
 Fit ϴ to minimize ∑i w(i) (y(i) − ϴT x(i))2
 The w(i) are called weights:
 w(i) = exp(−(x(i) − x)2 / 2)
 Suppose you have a training example x(i) that is very close to x (which makes (x(i) − x) small).
 If (x(i) − x) is small, close to zero, then w(i) will be close to one.
 Also, if x(i) is very far from x, then w(i) will be close to zero (a quick numerical check follows).
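A quick numerical check of this behaviour, as a sketch (the numbers are made up; the formula is the one stated above, without the bandwidth term that is introduced later):

import numpy as np

def weight(x_i, x):
    # w(i) = exp(-(x(i) - x)^2 / 2)
    return np.exp(-((x_i - x) ** 2) / 2.0)

print(weight(3.05, 3.0))  # x(i) close to x: weight is about 0.999 (close to one)
print(weight(9.0, 3.0))   # x(i) far from x: weight is about 1.5e-8 (close to zero)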
 So if I'm querying at a certain point x, shown on the X axis, and if my data set, say, looks like that in the figure,

[Figure: the scatter plot with the query point x marked on the X axis]
 then I'm going to give the points close to this query a large weight and give the points far away a small weight.
 So for the points that are far away, w(i) will be close to zero, and they will not contribute much at all to this summation.
 So the effect of using this weighting is that locally weighted linear regression fits a set of parameters ϴ, paying much more attention to fitting the nearby points accurately, while ignoring the contribution from the far-away points.
 If |x(i) − x| is small then w(i) ≈ 1
 If |x(i) − x| is large then w(i) ≈ 0
 There's one other parameter to this algorithm, which will be denoted as τ:
 w(i) = exp(−(x(i) − x)2 / (2τ2))
 This exponential decay weight function is a reasonably common one that seems to be a reasonable choice on many problems.
 It is just a convenient function that happens to be bell-shaped.
 If you imagine placing this bell-shaped function centered at the position where you want to evaluate your hypothesis h,

[Figure: the scatter plot with a bell-shaped weighting function centered at the query point on the X axis]
 then you can say that the weight of a point (training example) will be proportional to the height of the bell-shaped function evaluated at that point.
 The weight given to the next training example will likewise be proportionate to that height, and so on.
 And so training examples that are really far away get a very small weight.
[Figure: the scatter plot again, with the query point marked on the X axis; far-away points fall under the low tails of the bell]
 This parameter τ is called the bandwidth parameter, and informally it controls how fast the weights fall off with distance.
 So if τ is very small, then you end up choosing a fairly narrow bell shape, so that the weights of the points far away fall off rapidly.
 Whereas if τ is large, then you end up choosing a weighting function that falls off relatively slowly with distance from your query.
[Figure: two bell-shaped weighting functions centered at the query point; a large τ gives a wide bell shape, a small τ a narrow one]
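As a small numerical sketch of this (my own example values, not from the slides), here is the weight of a training point one unit away from the query for a small and a large bandwidth:

import numpy as np

def weight(x_i, x, tau):
    # w(i) = exp(-(x(i) - x)^2 / (2 * tau^2))
    return np.exp(-((x_i - x) ** 2) / (2.0 * tau ** 2))

print(weight(1.0, 0.0, tau=0.2))  # small tau, narrow bell: weight is about 3.7e-6
print(weight(1.0, 0.0, tau=2.0))  # large tau, wide bell:  weight is about 0.88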
Conclusions
 So if you apply locally weighted linear regression to a data set and ask what your hypothesis output is at a certain point (the query), you end up fitting a straight line to make that prediction.
 It turns out that every time you evaluate your hypothesis, i.e., every time you ask your learning algorithm to make a prediction for how much a new house costs or whatever, you need to run a new fitting procedure and then evaluate the fitted line just at the position of the query x where you are trying to make a prediction.
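A sketch of one such per-query fit, putting the Gaussian weights together with a weighted least-squares solve (the name lwr_predict and the default tau are my own choices for illustration; this is not code from the slides):

import numpy as np

def lwr_predict(x_query, x_train, y_train, tau=1.0):
    # Weights: nearby training points count for more than far-away ones.
    w = np.exp(-((x_train - x_query) ** 2) / (2.0 * tau ** 2))
    # Features with an intercept term.
    X = np.column_stack([np.ones_like(x_train), x_train])
    W = np.diag(w)
    # Fit theta to minimize sum_i w(i) (y(i) - theta^T x(i))^2
    # via the weighted normal equations (X^T W X) theta = X^T W y.
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)
    # Evaluate the fitted straight line at the query point.
    return np.array([1.0, x_query]) @ theta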
 But if you do this for every point along the X axis, then you find that locally weighted regression is able to trace out a non-linear curve for a data set like this.

[Figure: the scatter plot with the non-linear curve traced out by locally weighted regression across the X axis]
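For example, sweeping the query point across the X axis with the lwr_predict sketch above traces out a smooth non-linear curve (the data here are made up for illustration):

import numpy as np

# Made-up non-linear training data.
x_train = np.linspace(0.0, 10.0, 100)
y_train = np.sin(x_train) + 0.1 * np.random.randn(100)

# One small weighted fit per query point along the X axis.
x_grid = np.linspace(0.0, 10.0, 200)
y_curve = np.array([lwr_predict(xq, x_train, y_train, tau=0.5) for xq in x_grid])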
 So because locally weighted regression is a non-parametric algorithm, every time you make a prediction you need to fit ϴ to your entire training set again.
 If you have a very large training set, then this is an expensive algorithm to use.
 Because every time you want to make a prediction, you need to fit a straight line to a huge data set again.
 There are ways to make this much more efficient for large data sets as well.
Remember
 In the original linear regression algorithm, to make a prediction at a query point x (i.e., to evaluate h(x)), we would:
 1. Fit θ to minimize ∑i (y(i) − θT x(i))2.
 2. Output θT x.

 In contrast, the locally weighted linear regression algorithm does the following:
 1. Fit θ to minimize ∑i w(i) (y(i) − θT x(i))2.
 2. Output θT x.
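In terms of the sketches given earlier (ols_predict and lwr_predict, both hypothetical helper names), the practical difference is simply which of the two is called at the query point:

x_query = 4.2  # an arbitrary query point
y_hat_linear = ols_predict(x_query, x_train, y_train)           # all points weighted equally
y_hat_local = lwr_predict(x_query, x_train, y_train, tau=0.5)   # weighted fit centered at the query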
