Lecture 3-4
Gradient descent in
practice I: Feature Scaling
Machine Learning
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. x_1 = size (0–2000 feet²)
x_2 = number of bedrooms (1–5)
Andrew Ng
Feature Scaling
Get every feature into approximately a −1 ≤ x_i ≤ 1 range.
Mean normalization
Replace x_i with x_i − μ_i to make features have approximately zero mean
(Do not apply to x_0 = 1).
E.g. x_1 = (size − 1000) / 2000, x_2 = (#bedrooms − 2) / 5.
More generally: x_i := (x_i − μ_i) / s_i, where μ_i is the average value of x_i in the training set and s_i is the range (max − min) or the standard deviation.
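Mean normalization can be sketched in numpy as follows; the dataset (sizes and bedroom counts) is illustrative and the range is used for s_i:

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization: replace each feature x_i with (x_i - mu_i) / s_i,
    where mu_i is the feature's mean and s_i its range (max - min)."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)   # the std. deviation also works
    return (X - mu) / s, mu, s

# Illustrative data: size in feet^2 and number of bedrooms
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_norm, mu, s = mean_normalize(X)
# Each column of X_norm now has zero mean and a range of 1
```

Keeping mu and s around matters: the same shift and scale must be applied to any new example before predicting.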
Gradient descent in
practice II: Learning rate
Gradient descent update rule: θ_j := θ_j − α · (∂/∂θ_j) J(θ), with learning rate α.
Making sure gradient descent is working correctly.
[Plot: J(θ) vs. number of iterations (0–400); J(θ) should decrease after every iteration.]
Example automatic convergence test:
Declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.
Making sure gradient descent is working correctly.
[Plot: J(θ) increasing with the number of iterations — gradient descent is not working; use a smaller α.]
• If α is too small: convergence is slow.
• If α is too large: J(θ) may not decrease on every iteration; it may not converge.
To choose α, try values roughly 3× apart: …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
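The convergence test above can be sketched as follows; the tiny dataset (y = 1 + 2x exactly) and the α sweep values are illustrative:

```python
import numpy as np

def gradient_descent(X, y, alpha, epsilon=1e-3, max_iters=10000):
    """Gradient descent that records J(theta) every iteration and declares
    convergence when J decreases by less than epsilon in one iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    J_history = []
    for _ in range(max_iters):
        err = X @ theta - y
        J_history.append((err @ err) / (2 * m))          # J(theta)
        if len(J_history) > 1 and J_history[-2] - J_history[-1] < epsilon:
            break                                         # converged
        theta -= (alpha / m) * (X.T @ err)                # simultaneous update
    return theta, J_history

# Illustrative data: x0 = 1 column included, y = 1 + 2x exactly
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# Try alpha values roughly 3x apart and inspect J_history for each
for alpha in (0.01, 0.03, 0.1, 0.3):
    theta, J_hist = gradient_descent(X, y, alpha)
```

Plotting `J_hist` against the iteration index reproduces the diagnostic plot described above: a healthy run decreases on every iteration.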
Linear Regression with
multiple variables
Features and
polynomial regression
Housing prices prediction
Polynomial regression
[Plot: Price (y) vs. Size (x).]
Fit, e.g., a cubic h_θ(x) = θ_0 + θ_1·(size) + θ_2·(size)² + θ_3·(size)³ by treating x_1 = (size), x_2 = (size)², x_3 = (size)³ as features of an ordinary linear regression.
Choice of features
[Plot: Price (y) vs. Size (x).]
Instead of polynomial terms, other features can be chosen, e.g. h_θ(x) = θ_0 + θ_1·(size) + θ_2·√(size).
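Building polynomial features can be sketched as below; the size values are illustrative. Note how feature scaling becomes essential once powers are involved:

```python
import numpy as np

# Turn a single "size" feature into x, x^2, x^3, so polynomial regression
# reduces to linear regression in the expanded feature space.
sizes = np.array([100.0, 250.0, 500.0, 750.0, 1000.0])   # illustrative
X_poly = np.column_stack([sizes, sizes**2, sizes**3])

# Scaling matters here: the columns span wildly different ranges
# (size up to 1e3, size^2 up to 1e6, size^3 up to 1e9).
mu = X_poly.mean(axis=0)
s = X_poly.max(axis=0) - X_poly.min(axis=0)
X_scaled = (X_poly - mu) / s
```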
An alternative, efficient approach
for problems with multiple features:
NORMAL EQUATION
Multiple features (variables).

Size (feet²) | Price ($1000)
2104 | 460
1416 | 232
1534 | 315
852 | 178
… | …
Multiple features
Approaches: 1) Gradient Descent
2) Normal equation
Multiple features (variables).

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
… | … | … | … | …

Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
Hypothesis:
Previously (one feature): h_θ(x) = θ_0 + θ_1·x
Now (n features): h_θ(x) = θ_0 + θ_1·x_1 + θ_2·x_2 + … + θ_n·x_n
For convenience of notation, define x_0 = 1, so that h_θ(x) = θᵀx.
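With x_0 = 1 prepended, the hypothesis is just an inner product; a minimal sketch with illustrative parameter and feature values:

```python
import numpy as np

# h(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n = theta^T x once x_0 = 1
theta = np.array([80.0, 0.1, 20.0])     # [theta_0, theta_1, theta_2], illustrative
x = np.array([1.0, 2104.0, 3.0])        # [x_0, size, bedrooms]

h = theta @ x                                      # vectorized theta^T x
h_loop = sum(t * xi for t, xi in zip(theta, x))    # same sum, element-wise
```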
Identity Matrix
Denoted I (or I_{n×n}).
Examples of identity matrices:
2×2: [[1, 0], [0, 1]]
3×3: [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
4×4: [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
For any matrix A: A·I = I·A = A.
Linear Algebra
review (optional)
Inverse and
transpose
Not all numbers have an inverse (e.g. 0 does not).
Matrix inverse:
If A is an m×m (square) matrix and it has an inverse, then A·A⁻¹ = A⁻¹·A = I.
Let A and B be matrices. Then in general, A·B ≠ B·A (matrix multiplication is not commutative).
E.g. A = [[1, 1], [0, 0]] and B = [[0, 0], [2, 0]]: A·B = [[2, 0], [0, 0]] but B·A = [[0, 0], [2, 2]].
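Inverse, transpose, identity, and non-commutativity can all be checked in numpy; the matrices below are illustrative:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 2.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Inverse: A has one because det(A) = 2 != 0
A_inv = np.linalg.inv(A)
I = np.eye(2)                  # 2x2 identity matrix
inv_ok = np.allclose(A @ A_inv, I) and np.allclose(A_inv @ A, I)

# Transpose flips rows and columns
A_T = A.T

# Non-commutativity: in general A @ B != B @ A
AB, BA = A @ B, B @ A
```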
Linear Regression with
multiple variables
Gradient descent:
Repeat {
  θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)
}
(simultaneously update θ_j for j = 0, …, n)
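A single simultaneous update can be sketched in vectorized form, which computes every θ_j from the same old θ for free; the three-example dataset is illustrative:

```python
import numpy as np

def gd_step(theta, X, y, alpha):
    """One simultaneous gradient-descent update:
    theta := theta - (alpha/m) * X^T (X theta - y)."""
    m = len(y)
    return theta - (alpha / m) * (X.T @ (X @ theta - y))

# Illustrative data (x0 = 1 column already included)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

theta = gd_step(np.zeros(2), X, y, alpha=0.1)
```

Updating θ_j one at a time with a loop risks using an already-updated θ_0 when computing θ_1; the vectorized form avoids that bug by construction.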
Linear Regression with
multiple variables
Normal equation
Gradient descent finds θ iteratively; the normal equation instead solves for θ analytically, in one step.
Intuition: If θ is a scalar (1D), J(θ) is a quadratic function; set the derivative dJ/dθ to zero and solve for θ.
For θ ∈ ℝ^(n+1): set ∂J/∂θ_j = 0 (for every j) and solve for θ_0, θ_1, …, θ_n.
Examples: m = 5 training examples, n = 4 features.

x_0 | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 | 852 | 2 | 1 | 36 | 178
1 | 3000 | 4 | 1 | 38 | 540

With X the m×(n+1) matrix of the feature columns (including x_0) and y the m-vector of prices:
θ = (XᵀX)⁻¹ Xᵀy
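The normal equation applied to the table above can be sketched as follows (solving the linear system is used instead of forming the inverse explicitly, which is numerically preferable):

```python
import numpy as np

# Design matrix from the table above, x0 = 1 column included
X = np.array([[1.0, 2104.0, 5.0, 1.0, 45.0],
              [1.0, 1416.0, 3.0, 2.0, 40.0],
              [1.0, 1534.0, 3.0, 2.0, 30.0],
              [1.0,  852.0, 2.0, 1.0, 36.0],
              [1.0, 3000.0, 4.0, 1.0, 38.0]])
y = np.array([460.0, 232.0, 315.0, 178.0, 540.0])

# theta = (X^T X)^{-1} X^T y, computed as a linear solve
theta = np.linalg.solve(X.T @ X, X.T @ y)
predictions = X @ theta
```

No feature scaling and no learning rate are needed here, which is exactly the trade-off the next comparison slide summarizes.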
m training examples, n features.

Gradient Descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation:
• No need to choose α.
• Don't need to iterate.
• Need to compute (XᵀX)⁻¹.
• Slow if n is very large.
Linear Regression with
multiple variables
Normal equation
and non-invertibility
θ = (XᵀX)⁻¹ Xᵀy, where (XᵀX)⁻¹ is the inverse of the matrix XᵀX.
Octave: pinv(X' * X) * X' * y
What if XᵀX is non-invertible (singular / degenerate)?
• Redundant features (linearly dependent).
E.g. x_1 = size in feet²
x_2 = size in m²
Then x_1 = (3.28)² · x_2, so the columns of X are linearly dependent and XᵀX has no inverse.
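The redundant-feature case can be reproduced numerically; the pseudoinverse (numpy's `pinv`, the counterpart of Octave's `pinv`) still returns a sensible minimum-norm solution where a plain inverse would fail. The sizes and prices are illustrative:

```python
import numpy as np

# x2 (size in m^2) is a fixed multiple of x1 (size in feet^2), since
# 1 m^2 = 3.28^2 feet^2, so the columns of X are linearly dependent
# and X^T X is singular.
feet2 = np.array([2104.0, 1416.0, 1534.0, 852.0])
m2 = feet2 / (3.28 ** 2)                     # redundant feature
X = np.column_stack([np.ones(4), feet2, m2])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Pseudoinverse instead of inverse: well-defined even for singular X^T X
theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
residual = np.linalg.norm(X @ theta - y)
```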
Example of Normal Equation: Ordinary Least Squares (OLS)
Ordinary Least Squares (OLS) is a method used to estimate the parameters of a linear regression model. It solves for θ analytically, ending in the closed-form solution θ = (XᵀX)⁻¹ Xᵀy.
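A small worked OLS example; the four data points are illustrative, and the closed-form result is cross-checked against numpy's least-squares solver:

```python
import numpy as np

# Fit y = theta0 + theta1 * x to four points
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([6.0, 5.0, 7.0, 10.0])

theta_ne = np.linalg.inv(X.T @ X) @ X.T @ y          # normal equation
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)     # SVD-based solver
# Both give the same OLS line: y = 3.5 + 1.4 x
```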