Linear Regression
Problem Formulation
As a refresher, we will start by learning how to implement linear regression. The main idea is to get
familiar with objective functions, computing their gradients and optimizing the objectives over a set
of parameters. These basic tools will form the basis for more sophisticated algorithms later. Readers who want additional details may refer to the Lecture Notes (http://cs229.stanford.edu/notes/cs229notes1.pdf) on Supervised Learning.
Our goal in linear regression is to predict a target value $y$ starting from a vector of input values $x \in \mathbb{R}^n$. For example, we might want to make predictions about the price of a house so that $y$ represents the price of the house in dollars and the elements $x_j$ of $x$ represent features that describe the house (such as its size and the number of bedrooms). Suppose that we are given many examples of houses where the features for the $i$-th house are denoted $x^{(i)}$ and the price is $y^{(i)}$. For short, we will refer to the pair $(x^{(i)}, y^{(i)})$ as a training example.
Our goal is to find a function $y = h(x)$ so that we have $y^{(i)} \approx h(x^{(i)})$ for each training example. If we succeed in finding a function $h(x)$ like this, and we have seen enough examples of houses and their prices, we hope that the function $h(x)$ will also be a good predictor of the house price even when we are given the features for a new house where the price is not known.

To find a function $h(x)$ where $y^{(i)} \approx h(x^{(i)})$ we must first decide how to represent the function $h(x)$. To start out we will use linear functions: $h_\theta(x) = \sum_j \theta_j x_j = \theta^\top x$. Here, $h_\theta(x)$ represents a large family of
functions parametrized by the choice of $\theta$. (We call this space of functions a hypothesis class.) With this representation for $h$, our task is to find a choice of $\theta$ so that $h_\theta(x^{(i)})$ is as close as possible to $y^{(i)}$. In particular, we will search for a choice of $\theta$ that minimizes:
$$J(\theta) = \frac{1}{2} \sum_i \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2} \sum_i \left( \theta^\top x^{(i)} - y^{(i)} \right)^2$$
This function is the cost function for our problem; it measures how much error is incurred in predicting $y^{(i)}$ for a particular choice of $\theta$. This may also be called a loss, penalty, or objective function.
Function Minimization
We now want to find the choice of $\theta$ that minimizes $J(\theta)$ as given above. There are many algorithms for minimizing functions like this one, and we will describe some very effective ones that are easy to implement yourself in a later section on Gradient Descent. For now, let's take for granted the fact that most commonly-used algorithms for function minimization require us to provide two pieces of information about $J(\theta)$: we will need to write code to compute $J(\theta)$ and $\nabla_\theta J(\theta)$ on demand for any choice of $\theta$. After that, the rest of the optimization procedure to find the best choice of $\theta$ will be handled by the optimization algorithm. (Recall that the gradient $\nabla_\theta J(\theta)$ of a differentiable function $J$ is a vector that points in the direction of steepest increase as a function of $\theta$, so it is easy to see how an optimization algorithm could use this to make a small change to $\theta$ that decreases (or increases) $J(\theta)$.)
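To make this concrete, here is a minimal sketch of a plain gradient-descent loop that uses exactly these two quantities. It is an illustration only (the exercise below hands optimization off to the minFunc package instead), and the helper function cost_and_grad, the step size, and the iteration count are assumptions rather than part of the starter code:

    % Illustration only: plain gradient descent using J(theta) and its gradient.
    % cost_and_grad is a hypothetical helper returning [J, g] for a given theta;
    % X and y are assumed to hold the training examples and targets.
    alpha = 0.01;                   % step size (assumed value)
    theta = zeros(size(X, 1), 1);   % initial guess for the parameters
    for iter = 1:500                % fixed number of iterations (assumed)
      [J, g] = cost_and_grad(theta, X, y);   % objective value and gradient
      theta  = theta - alpha * g;            % small step that decreases J(theta)
    end

Each iteration moves $\theta$ a small distance opposite the gradient, which is exactly the "small change to $\theta$ that decreases $J(\theta)$" described above.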
The above expression for $J(\theta)$, given a training set of $x^{(i)}$ and $y^{(i)}$, is easy to implement in MATLAB to compute $J(\theta)$ for any choice of $\theta$. The remaining requirement is to compute the gradient:
$$\nabla_\theta J(\theta) = \begin{bmatrix} \dfrac{\partial J(\theta)}{\partial \theta_1} \\ \dfrac{\partial J(\theta)}{\partial \theta_2} \\ \vdots \\ \dfrac{\partial J(\theta)}{\partial \theta_n} \end{bmatrix}$$
Differentiating the cost function $J(\theta)$ as given above with respect to a particular parameter $\theta_j$ gives us:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_i x_j^{(i)} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
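If the differentiation step is not obvious, it follows from the chain rule applied to a single term of the sum; since $h_\theta(x^{(i)}) = \sum_k \theta_k x_k^{(i)}$, only the $k = j$ term depends on $\theta_j$:

$$\frac{\partial}{\partial \theta_j} \, \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \left( h_\theta(x^{(i)}) - y^{(i)} \right) \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} = x_j^{(i)} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

Summing these contributions over all training examples $i$ gives the expression above.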
3. The code calls the minFunc optimization package. minFunc will attempt to find the best choice of $\theta$ by minimizing the objective function implemented in linear_regression.m (a rough sketch of this call appears after this list). It will be your job to implement linear_regression.m to compute the objective function value and the gradient with respect to the parameters.
4. After minFunc completes (i.e., after training is finished), the training and testing errors are printed out. Optionally, it will plot a quick visualization of the predicted and actual prices for the examples in the test set.
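For reference, the minFunc call made by the starter code has roughly the following shape; the exact options and initial value of $\theta$ used in ex1_linreg.m may differ, so treat the specific values below as assumptions:

    % Assumed sketch of the minFunc call made by the starter code; the actual
    % options and initial theta in ex1_linreg.m may differ.
    options = struct('MaxIter', 200);   % limit on optimization iterations
    theta0  = rand(size(X, 1), 1);      % some initial parameter vector
    theta   = minFunc(@linear_regression, theta0, options, X, y);
    % minFunc repeatedly calls linear_regression(theta, X, y), which must
    % return [f, g]: the objective value and the gradient with respect to theta.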
The ex1_linreg.m file calls the linear_regression.m file that must be filled in with your code. The linear_regression.m file receives the training data X, the training target values (house prices) y, and the current parameters $\theta$.
Complete the following steps for this exercise:
1. Fill in the linear_regression.m file to compute $J(\theta)$ for the linear regression problem as defined earlier. Store the computed value in the variable f .
2. Compute the gradient $\nabla_\theta J(\theta)$ as well, and store it in the variable g .
You may complete both of these steps by looping over the examples in the training set (the columns
of the data matrix X) and, for each one, adding its contribution to f and g . We will create a faster
version in the next exercise.
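A minimal sketch of such a loop-based implementation is given below. It assumes the setup described above (the function receives $\theta$, X, and y and returns the objective f and gradient g); the variable names used inside the loop are assumptions and are of course up to you:

    function [f, g] = linear_regression(theta, X, y)
      % Loop-based objective and gradient for linear regression.
      % X is n-by-m (one example per column), y is 1-by-m, theta is n-by-1.
      m = size(X, 2);          % number of training examples
      f = 0;
      g = zeros(size(theta));
      for i = 1:m
        h   = theta' * X(:, i);    % prediction h_theta(x^(i))
        err = h - y(i);            % residual for example i
        f   = f + 0.5 * err^2;     % this example's contribution to J(theta)
        g   = g + X(:, i) * err;   % its contribution to the gradient
      end
    end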
Once you complete the exercise successfully, the resulting plot should look something like the one
below:
[Plot: predicted and actual house prices for the examples in the test set]
(Yours may look slightly different depending on the random choice of training and testing sets.)
Typical values for the RMS training and testing error are between 4.5 and 5.
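As a sanity check, the RMS error is simply the square root of the mean squared residual on the relevant set. A short sketch follows; the variable names test_X and test_y (and their orientation) are assumptions, not the starter code's actual names:

    % Assumed variable names: test_X is n-by-m, test_y is an m-by-1 column vector.
    predictions = (theta' * test_X)';                   % predicted price for each test example
    rms_test    = sqrt(mean((predictions - test_y).^2)); % root-mean-square test error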