Lecture 3-4

- Ordinary Least Squares (OLS) is a method to estimate the parameters of a linear regression model by minimizing the sum of squared residuals.
- It provides a closed-form solution for the parameters without needing an iterative process like gradient descent.
- The OLS solution is $\theta = (X^T X)^{-1} X^T Y$, where $X$ is the feature matrix and $Y$ is the target variable.


Python Tools for Machine Learning

Gradient descent in
practice I: Feature Scaling

Machine Learning
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. $x_1$ = size (0–2000 feet²), $x_2$ = number of bedrooms (1–5)
Scaled: $x_1 = \dfrac{\text{size (feet}^2)}{2000}$,  $x_2 = \dfrac{\text{number of bedrooms}}{5}$

Andrew Ng
Feature Scaling
Get every feature into approximately a $-1 \le x_i \le 1$ range.

Andrew Ng
Mean normalization
Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean
(do not apply to $x_0 = 1$).
E.g. $x_1 = \dfrac{\text{size} - 1000}{2000}$,  $x_2 = \dfrac{\#\text{bedrooms} - 2}{5}$

Andrew Ng
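A minimal NumPy sketch of the two ideas above (dividing by the feature range, plus mean normalization); the function name `mean_normalize` and the sample values are illustrative, not from the slides:

```python
import numpy as np

def mean_normalize(X):
    """Replace x_i with (x_i - mu_i) / s_i, where mu_i is the feature mean
    and s_i is the feature range (max - min). Do not apply to x_0 = 1."""
    mu = X.mean(axis=0)                  # per-feature mean
    s = X.max(axis=0) - X.min(axis=0)    # per-feature range
    return (X - mu) / s

# x1 = size (feet^2), x2 = number of bedrooms
X = np.array([[2104.0, 5], [1416.0, 3], [1534.0, 3], [852.0, 2]])
print(mean_normalize(X))  # every feature now lies in a similar, small range
```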
Gradient descent in
practice II: Learning rate

Machine Learning
Gradient descent

- "Debugging": How to make sure gradient descent is working correctly.
- How to choose learning rate $\alpha$.

Andrew Ng
Making sure gradient descent is working correctly.

Plot $J(\theta)$ against the number of iterations; it should decrease after every iteration.

Example automatic convergence test: declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

[Plot: $J(\theta)$ vs. no. of iterations (0–400), decreasing and flattening out]
Andrew Ng
Making sure gradient descent is working correctly.
If $J(\theta)$ is increasing or repeatedly overshooting as the iterations proceed, gradient descent is not working: use a smaller $\alpha$.

[Plots: $J(\theta)$ vs. no. of iterations, increasing or oscillating]

- For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
- But if $\alpha$ is too small, gradient descent can be slow to converge.
Andrew Ng
Summary:
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.

To choose $\alpha$, try a sequence of values such as $\ldots, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, \ldots$

Andrew Ng
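To put the summary into practice, one can sweep the suggested $\alpha$ values and watch $J(\theta)$; a minimal sketch assuming NumPy and a toy dataset (all names and data here are illustrative):

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5]])  # x0 = 1 column included
y = np.array([1.0, 2.0, 3.0])

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    theta = np.zeros(2)
    prev = cost(X, y, theta)
    for it in range(400):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)  # one GD step
        J = cost(X, y, theta)
        if prev - J < 1e-3:   # automatic convergence test from the slides
            break             # (also trips immediately if J increases)
        prev = J
    print(f"alpha={alpha}: stopped after {it + 1} iterations, J={J:.4f}")
```

Small $\alpha$ values take many iterations; too-large values stop immediately with a higher $J$, the divergence signature described above.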
Linear Regression with
multiple variables

Features and
polynomial regression
Machine Learning
Housing prices prediction:
$h_\theta(x) = \theta_0 + \theta_1 \times \text{frontage} + \theta_2 \times \text{depth}$
Alternatively, create a new feature: area $x = \text{frontage} \times \text{depth}$, and fit $h_\theta(x) = \theta_0 + \theta_1 x$.

Andrew Ng
Polynomial regression

[Plot: Price ($y$) vs. Size ($x$), with quadratic and cubic fits]

$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ (quadratic) or $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ (cubic).
Treating $x_1 = (\text{size})$, $x_2 = (\text{size})^2$, $x_3 = (\text{size})^3$ turns this into multivariate linear regression; feature scaling then becomes important.
Andrew Ng
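A sketch of building the polynomial feature columns, assuming NumPy (the size values are illustrative):

```python
import numpy as np

size = np.array([852.0, 1416.0, 1534.0, 2104.0])

# x1 = size, x2 = size^2, x3 = size^3: the cubic model becomes ordinary
# multivariate linear regression on these three columns.
X_poly = np.column_stack([size, size**2, size**3])

# Feature scaling matters: the columns span ~10^3, ~10^6, and ~10^9.
mu = X_poly.mean(axis=0)
s = X_poly.max(axis=0) - X_poly.min(axis=0)
print((X_poly - mu) / s)
```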
Choice of features

[Plot: Price ($y$) vs. Size ($x$)]

Instead of a cubic, a reasonable choice is $h_\theta(x) = \theta_0 + \theta_1 (\text{size}) + \theta_2 \sqrt{\text{size}}$, which keeps increasing but gradually flattens out.
Andrew Ng
An alternative, efficient approach
for problems with multiple features

NORMAL EQUATION

Machine Learning
Multiple features (variables).

Size (feet²) | Price ($1000)
2104 | 460
1416 | 232
1534 | 315
852 | 178
… | …

Andrew Ng
Multiple features
Approaches: 1) Gradient Descent
2) Normal equation

Revise basic concepts of linear algebra.

Andrew Ng
Andrew Ng
Multiple features (variables).
Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
… | … | … | … | …

Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$th training example

Andrew Ng
Hypothesis:
Previously (one variable): $h_\theta(x) = \theta_0 + \theta_1 x$
Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$

Andrew Ng
For convenience of notation, define $x_0 = 1$. Then $x = [x_0, x_1, \ldots, x_n]^T$, $\theta = [\theta_0, \theta_1, \ldots, \theta_n]^T$, and

$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x$

Multivariate linear regression.
Andrew Ng
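With $x_0 = 1$ in place, computing the hypothesis is a single dot product; a tiny sketch assuming NumPy (the $\theta$ and $x$ values are made up):

```python
import numpy as np

theta = np.array([80.0, 0.1, 25.0])  # [theta_0, theta_1, theta_2]
x = np.array([1.0, 2104.0, 5.0])     # [x_0 = 1, size, bedrooms]

h = theta @ x                        # h_theta(x) = theta^T x
print(h)                             # 80 + 0.1*2104 + 25*5 = 415.4
```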
Example: matrix-vector and matrix-matrix multiplication (worked example; see the sketch below).
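A generic sketch of the two products in NumPy (the matrices here are illustrative, not the slide's originals):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # 3 x 2
x = np.array([7, 2])                      # 2-vector
print(A @ x)                              # (3 x 2)(2,) -> (3,): [11 29 47]

B = np.array([[1, 0], [2, 3]])            # 2 x 2
print(A @ B)                              # (3 x 2)(2 x 2) -> (3 x 2)
```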
Identity Matrix
Denoted $I$ (or $I_{n \times n}$).
Examples of identity matrices:

$I_{2 \times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$,  $I_{3 \times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$,  and similarly for $4 \times 4$.

For any matrix $A$:  $A \cdot I = I \cdot A = A$.
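A quick check of the identity property in NumPy (the example matrix is ours):

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])
I = np.eye(2)                     # 2 x 2 identity matrix

assert np.allclose(A @ I, A)      # A I = A
assert np.allclose(I @ A, A)      # I A = A
```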
Linear Algebra
review (optional)
Inverse and
transpose
Machine Learning
Not all numbers have an inverse (e.g. 0 does not).
Matrix inverse:
If $A$ is an $m \times m$ matrix, and if it has an inverse, then $A A^{-1} = A^{-1} A = I$.

Matrices that don't have an inverse are "singular" or "degenerate".
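In NumPy, `np.linalg.inv` computes the inverse and raises `LinAlgError` for a singular matrix; a sketch (matrices ours):

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])
A_inv = np.linalg.inv(A)
assert np.allclose(A @ A_inv, np.eye(2))   # A A^{-1} = I

S = np.array([[1.0, 2.0], [2.0, 4.0]])     # rows linearly dependent
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    print("S is singular/degenerate: it has no inverse")
```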


Matrix Transpose
Let $A$ be an $m \times n$ matrix, and let $B = A^T$.
Then $B$ is an $n \times m$ matrix, and $B_{ij} = A_{ji}$.

Example: $A = \begin{bmatrix} 1 & 2 & 0 \\ 3 & 5 & 9 \end{bmatrix}$, so $A^T = \begin{bmatrix} 1 & 3 \\ 2 & 5 \\ 0 & 9 \end{bmatrix}$.
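The same fact in NumPy, where the transpose is the `.T` attribute:

```python
import numpy as np

A = np.array([[1, 2, 0], [3, 5, 9]])  # 2 x 3
B = A.T                                # 3 x 2

assert B.shape == (3, 2)
assert B[0, 1] == A[1, 0]              # B_ij = A_ji
print(B)                               # [[1 3], [2 5], [0 9]]
```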
Linear Algebra
review (optional)
Matrix multiplication
properties

Machine Learning
Let $A$ and $B$ be matrices. Then in general, $A \times B \ne B \times A$ (not commutative).

E.g. see the concrete pair checked in the sketch below.
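A concrete pair demonstrating non-commutativity (matrices ours):

```python
import numpy as np

A = np.array([[1, 1], [0, 0]])
B = np.array([[0, 0], [2, 0]])

print(A @ B)  # [[2 0], [0 0]]
print(B @ A)  # [[0 0], [2 2]]
assert not np.array_equal(A @ B, B @ A)   # A B != B A in general
```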
Linear Regression with
multiple variables

Gradient descent for
multiple variables
Machine Learning
Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$ (with $x_0 = 1$)
Parameters: $\theta_0, \theta_1, \ldots, \theta_n$
Cost function: $J(\theta) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$
} (simultaneously update for every $j = 0, \ldots, n$)
Andrew Ng
New algorithm ($n \ge 1$):
Gradient Descent
Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
} (simultaneously update $\theta_j$ for $j = 0, \ldots, n$)

Previously ($n = 1$):
Repeat {
  $\theta_0 := \theta_0 - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
} (simultaneously update $\theta_0, \theta_1$)
Andrew Ng
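Putting the update rule together as one function; a minimal sketch assuming NumPy, that X already contains the $x_0 = 1$ column, and that features are scaled (all names ours):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000, tol=1e-3):
    """Linear regression by gradient descent.

    Applies, simultaneously for every j:
        theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
    """
    m, n = X.shape
    theta = np.zeros(n)
    prev = np.inf
    for _ in range(iters):
        err = X @ theta - y                      # h_theta(x^(i)) - y^(i)
        theta = theta - alpha * (X.T @ err) / m  # simultaneous update
        J = (err @ err) / (2 * m)
        if prev - J < tol:                       # convergence test from earlier
            break
        prev = J
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(gradient_descent(X, y))   # approaches [1, 2] (y = 1 + 2x)
```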
Linear Regression with
multiple variables

Normal equation

Machine Learning
Gradient Descent: iteratively steps toward the minimum of $J(\theta)$.

Normal equation: method to solve for $\theta$ analytically, in one step:

$\theta = (X^T X)^{-1} X^T y$

Andrew Ng
Intuition: If 1D ($\theta \in \mathbb{R}$), $J(\theta)$ is a quadratic function; set $\dfrac{d}{d\theta} J(\theta) = 0$ and solve for $\theta$.

For $\theta \in \mathbb{R}^{n+1}$: set $\dfrac{\partial}{\partial \theta_j} J(\theta) = 0$ (for every $j$), and solve for $\theta_0, \theta_1, \ldots, \theta_n$.
Andrew Ng
Examples ($m = 4$):

x₀ | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 | 852 | 2 | 1 | 36 | 178

Stack the feature rows (including $x_0 = 1$) into the $m \times (n+1)$ design matrix $X$ and the prices into the vector $y$. Then

$\theta = (X^T X)^{-1} X^T y$

Math behind this equation: it comes from setting the gradient of $J(\theta)$ to zero and solving for $\theta$ in closed form.
Andrew Ng
Examples ($m = 5$):

x₀ | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 | 852 | 2 | 1 | 36 | 178
1 | 3000 | 4 | 1 | 38 | 540
Andrew Ng
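The five-example table above, solved via the normal equation in NumPy; with $m = 5$ examples and 5 parameters the system is square, so this $\theta$ fits the table exactly (a sketch):

```python
import numpy as np

# Design matrix: x0 = 1 plus the four features; y holds prices ($1000s)
X = np.array([
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
    [1, 3000, 4, 1, 38],
], dtype=float)
y = np.array([460.0, 232.0, 315.0, 178.0, 540.0])

# theta = (X^T X)^{-1} X^T y; solve() avoids forming the inverse explicitly
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)
print(X @ theta)   # reproduces y (up to floating-point error)
```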
$m$ training examples, $n$ features.

Gradient Descent:
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.

Normal Equation:
• No need to choose $\alpha$.
• Don't need to iterate.
• Need to compute $(X^T X)^{-1}$.
• Slow if $n$ is very large.
Andrew Ng
Linear Regression with
multiple variables

Normal equation
and non-invertibility
Machine Learning
$(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.

Andrew Ng
What if $X^T X$ is non-invertible?
• Redundant features (linearly dependent).
  E.g. $x_1$ = size in feet², $x_2$ = size in m² (then $x_1 = (3.28)^2 \cdot x_2$, so the columns of $X$ are linearly dependent).
• Too many features (e.g. $m \le n$).
  → Delete some features, or use regularization.

Andrew Ng
Normal equation

- What if $X^T X$ is non-invertible (singular/degenerate)?
- Octave: pinv(X'*X)*X'*y — the pseudo-inverse routine pinv still returns a usable $\theta$ even when $X^T X$ is non-invertible.

Andrew Ng
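The Octave one-liner in NumPy, where `np.linalg.pinv` plays the role of pinv; a sketch with a deliberately redundant feature (data illustrative):

```python
import numpy as np

# Redundant features: x2 = size in m^2 is a constant multiple of
# x1 = size in feet^2, so X^T X is singular and inv() would fail.
size_ft2 = np.array([2104.0, 1416.0, 1534.0, 852.0])
X = np.column_stack([np.ones(4), size_ft2, size_ft2 / 3.28**2])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Octave: pinv(X'*X)*X'*y
theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
print(X @ theta)   # predictions remain sensible despite singular X^T X
```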
Example of Normal Equation

Ordinary Least Squares (OLS) is a method used to estimate the parameters of a linear regression model. It solves for $\theta$ analytically, ending up with the closed-form solution $\theta = (X^T X)^{-1} X^T y$.
