ML Lectures 2022 Part 2

Introduction
Welcome
Machine Learning

Andrew Ng

[Slide: example of spam email classification]

Andrew Ng
Machine Learning
- Grew out of work in AI
- New capability for computers

Examples:
- Database mining
  Large datasets from growth of automation/web.
  E.g., web click data, medical records, biology, engineering
- Applications we can't program by hand.
  E.g., autonomous helicopter, handwriting recognition, most of
  Natural Language Processing (NLP), Computer Vision.
- Self-customizing programs
  E.g., Amazon, Netflix product recommendations
- Understanding human learning (brain, real AI).

Andrew Ng
Introduction
What is machine learning?
Machine Learning

Andrew Ng
Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study that gives computers
  the ability to learn without being explicitly programmed.
• Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said
  to learn from experience E with respect to some task T and some performance
  measure P, if its performance on T, as measured by P, improves with
  experience E.

Andrew Ng

"A computer program is said to learn from experience E with respect to
some task T and some performance measure P, if its performance on T,
as measured by P, improves with experience E."

Suppose your email program watches which emails you do or do not mark as spam,
and based on that learns how to better filter spam. What is the task T in this
setting?

- Classifying emails as spam or not spam.
- Watching you label emails as spam or not spam.
- The number (or fraction) of emails correctly classified as spam/not spam.
- None of the above; this is not a machine learning problem.

Machine learning algorithms:
- Supervised learning
- Unsupervised learning
Others: Reinforcement learning, recommender systems.

Also talk about: practical advice for applying learning algorithms.

Andrew Ng
Introduction
Supervised Learning
Machine Learning

Andrew Ng
Housing price prediction

[Plot: Price ($) in 1000's vs. Size in feet^2]

Supervised Learning: "right answers" given.
Regression: predict continuous-valued output (price).

Andrew Ng

Breast cancer (malignant, benign)

Classification: discrete-valued output (0 or 1)

[Plot: Malignant? (1 = Yes, 0 = No) vs. Tumor Size]

Andrew Ng
Features:
- Age
- Tumor Size
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
- ...

[Plot: Age vs. Tumor Size]

Andrew Ng
You’re running a company, and you want to develop learning algorithms to address each
of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many
of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each
account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?


Treat both as classification problems.

Treat problem 1 as a classification problem, problem 2 as a regression problem.

Treat problem 1 as a regression problem, problem 2 as a classification problem.

Treat both as regression problems.


Andrew Ng
Introduction
Unsupervised Learning
Machine Learning

Andrew Ng

Supervised Learning
[Plot: labeled data points in (x1, x2) space]

Andrew Ng

Unsupervised Learning
[Plot: unlabeled data points in (x1, x2) space]

Andrew Ng
Andrew Ng

[Figure: clustering example, data matrix of genes (rows) x individuals (columns)]

[Source: Daphne Koller]

Andrew Ng

More unsupervised learning applications: organize computing clusters,
social network analysis, market segmentation, astronomical data analysis.

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)

Andrew Ng
Cocktail party problem

Speaker #1 Microphone #1

Speaker #2 Microphone #2

Andrew Ng
Microphone #1: Output #1:

Microphone #2: Output #2:

[Audio clips courtesy of Te-Won Lee.] Andrew Ng


Cocktail party problem algorithm

[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
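(Here x is assumed to hold the two recorded microphone signals, one per row; this single line attempts to unmix the two speakers by taking the SVD of a sample-weighted covariance of the mixtures.)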

[Source: Sam Roweis, Yair Weiss & Eero Simoncelli] Andrew Ng


Of the following examples, which would you address using an
unsupervised learning algorithm? (Check all that apply.)

Given email labeled as spam/not spam, learn a spam filter.

Given a set of news articles found on the web, group them into
sets of articles about the same story.
Given a database of customer data, automatically discover market
segments and group customers into different market segments.
Given a dataset of patients diagnosed as either having diabetes or
not, learn to classify new patients as having diabetes or not.
Andrew Ng
Linear regression with one variable
Model representation
Machine Learning

Andrew Ng
Housing Prices (Portland, OR)

[Plot: Price (in 1000s of dollars) vs. Size (feet^2)]

Supervised Learning: given the "right answer" for each example in the data.
Regression Problem: predict real-valued output.

Andrew Ng
Training set of housing prices (Portland, OR):

Size in feet^2 (x)   Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
...                  ...

Notation:
  m   = number of training examples
  x's = "input" variable / features
  y's = "output" variable / "target" variable

Andrew Ng
Training Set -> Learning Algorithm -> h

Size of house -> h -> Estimated price

How do we represent h?
  hθ(x) = θ0 + θ1x

Linear regression with one variable.
Univariate linear regression.

Andrew Ng
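As a concrete illustration (not part of the original slides), the univariate hypothesis can be written in Octave as a tiny function; the parameter values in the usage comment are purely hypothetical:

% h_theta(x) = theta0 + theta1 * x, evaluated for a vector of house sizes.
function pred = hypothesis(theta, x)
  % theta is a 2-vector [theta0; theta1]; x is a column vector of sizes
  pred = theta(1) + theta(2) .* x;
end

% Example usage (made-up parameters):
%   theta = [50; 0.2];
%   hypothesis(theta, [2104; 1416])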
Linear regression with one variable
Cost function
Machine Learning

Andrew Ng

Training Set:

Size in feet^2 (x)   Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
...                  ...

Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1 (the θi's)
How to choose the θi's?

Andrew Ng

[Plots: hθ(x) for three different choices of (θ0, θ1)]

Andrew Ng

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

Andrew Ng

Linear regression with one variable
Cost function intuition I
Machine Learning

Andrew Ng

Simplified
Hypothesis: hθ(x) = θ1x  (i.e., θ0 = 0)
Parameters: θ1
Cost Function: J(θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2
Goal: minimize J(θ1) over θ1

Andrew Ng
(for fixed θ1, this is a function of x)      (function of the parameter θ1)

[Plots: left, the fit hθ(x) over the data for a fixed θ1; right, J(θ1) as a
function of θ1]

Andrew Ng

Linear regression with one variable
Cost function intuition II
Machine Learning

Andrew Ng

Hypothesis: hθ(x) = θ0 + θ1x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2

Goal: minimize J(θ0, θ1) over θ0, θ1

Andrew Ng
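A small Octave sketch of this cost function (an illustration consistent with the formula above, not code from the slides); X is assumed to already contain a leading column of ones:

% J(theta) = (1/(2m)) * sum((X*theta - y).^2)
% X: m x 2 design matrix [ones(m,1), sizes], y: m x 1 prices, theta: 2 x 1.
function J = computeCost(X, y, theta)
  m = length(y);                    % number of training examples
  errors = X * theta - y;           % h_theta(x(i)) - y(i) for every example
  J = (1 / (2 * m)) * sum(errors .^ 2);
end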
(for fixed θ0, θ1, this is a function of x)      (function of the parameters θ0, θ1)

[Plots: left, hθ(x) over the housing data (Price ($) in 1000's vs. Size in feet^2);
right, contour plot of J(θ0, θ1), with contours shrinking toward the minimum]

Andrew Ng

Linear regression with one variable
Gradient descent
Machine Learning

Andrew Ng

Have some function J(θ0, θ1)
Want: minimize J(θ0, θ1) over θ0, θ1

Outline:
• Start with some θ0, θ1 (e.g., θ0 = 0, θ1 = 0)
• Keep changing θ0, θ1 to reduce J(θ0, θ1)
  until we hopefully end up at a minimum

Andrew Ng

[Surface plots: J(θ0, θ1) over the (θ0, θ1) plane]

Andrew Ng

Gradient descent algorithm

Repeat until convergence {
  θj := θj - α · ∂/∂θj J(θ0, θ1)      (for j = 0 and j = 1)
}

Correct: simultaneous update             Incorrect:
  temp0 := θ0 - α ∂/∂θ0 J(θ0, θ1)         temp0 := θ0 - α ∂/∂θ0 J(θ0, θ1)
  temp1 := θ1 - α ∂/∂θ1 J(θ0, θ1)         θ0 := temp0
  θ0 := temp0                             temp1 := θ1 - α ∂/∂θ1 J(θ0, θ1)
  θ1 := temp1                             θ1 := temp1

Andrew Ng
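To make the "simultaneous update" point concrete, here is a minimal Octave sketch (illustrative only; gradJ0 and gradJ1 are assumed helpers that return the two partial derivatives at the given point):

% One gradient-descent step with a correct simultaneous update:
% both partials are evaluated at the OLD (theta0, theta1) before either
% parameter is overwritten.
temp0  = theta0 - alpha * gradJ0(theta0, theta1);
temp1  = theta1 - alpha * gradJ1(theta0, theta1);
theta0 = temp0;
theta1 = temp1;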
Linear regression with one variable
Gradient descent intuition
Machine Learning

Andrew Ng

Gradient descent algorithm
  θ1 := θ1 - α · d/dθ1 J(θ1)

Andrew Ng

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum.
It may fail to converge, or even diverge.

Andrew Ng

At a local optimum, the derivative term is zero, so a gradient descent step
leaves the current value of θ1 unchanged.

Andrew Ng

Gradient descent can converge to a local minimum, even with the learning rate α
fixed. As we approach a local minimum, gradient descent will automatically take
smaller steps (the derivative term shrinks), so there is no need to decrease α
over time.

Andrew Ng

Linear regression with one variable
Gradient descent for linear regression
Machine Learning

Andrew Ng

Gradient descent algorithm                  Linear Regression Model
Repeat {                                      hθ(x) = θ0 + θ1x
  θj := θj - α · ∂/∂θj J(θ0, θ1)              J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2
  (for j = 1 and j = 0)
}

Andrew Ng

Gradient descent algorithm (for linear regression)
Repeat {
  θ0 := θ0 - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i))
  θ1 := θ1 - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)
}
(update θ0 and θ1 simultaneously)

Andrew Ng
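A compact Octave sketch of these updates in vectorized form (an illustration consistent with the slide, not the course's official code); X is assumed to be the m x 2 matrix [ones(m,1), x]:

% Batch gradient descent for univariate linear regression.
% X: m x 2 design matrix, y: m x 1 targets, theta: 2 x 1, alpha: learning rate.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    errors = X * theta - y;            % h_theta(x(i)) - y(i) for all examples
    grad   = (1 / m) * (X' * errors);  % both partial derivatives at once
    theta  = theta - alpha * grad;     % simultaneous update of theta0, theta1
  end
end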
[Plots: surface and contour views of J(θ0, θ1)]

Andrew Ng

(for fixed θ0, θ1, this is a function of x)      (function of the parameters θ0, θ1)

[Animation: successive gradient descent steps shown on the hθ(x) fit (left) and
on the contour plot of J(θ0, θ1) (right)]

Andrew Ng

"Batch" Gradient Descent

"Batch": each step of gradient descent uses all the training examples.

Andrew Ng

Linear Regression with multiple variables
Multiple features
Machine Learning

Multiple features (variables).

Size (feet^2)   Price ($1000)
2104            460
1416            232
1534            315
852             178
...             ...

Andrew Ng

Multiple features (variables).

Size (feet^2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104            5                    1                  45                    460
1416            3                    2                  40                    232
1534            3                    2                  30                    315
852             2                    1                  36                    178
...             ...                  ...                ...                   ...

Notation:
  n      = number of features
  x(i)   = input (features) of the i-th training example
  x(i)_j = value of feature j in the i-th training example

Andrew Ng

Hypothesis:
  Previously: hθ(x) = θ0 + θ1x
  Now:        hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn

Andrew Ng

For convenience of notation, define x0 = 1. Then
  hθ(x) = θ0x0 + θ1x1 + ... + θnxn = θᵀx.

Multivariate linear regression.

Andrew Ng
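As a sketch (illustrative, not from the slides), the vectorized hypothesis in Octave, assuming each row of X already starts with the x0 = 1 entry:

% Vectorized multivariate hypothesis: predictions for all m examples at once.
% X: m x (n+1) matrix whose first column is all ones, theta: (n+1) x 1 vector.
predictions = X * theta;     % m x 1 vector of h_theta(x(i))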
Linear Regression with multiple variables
Gradient descent for multiple variables
Machine Learning

Hypothesis: hθ(x) = θᵀx = θ0x0 + θ1x1 + ... + θnxn
Parameters: θ0, θ1, ..., θn  (think of them as one (n+1)-dimensional vector θ)
Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2

Gradient descent:
Repeat {
  θj := θj - α · ∂/∂θj J(θ)
}
(simultaneously update for every j = 0, ..., n)

Andrew Ng

New algorithm (n ≥ 1):
Repeat {
  θj := θj - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)_j
}
(simultaneously update θj for j = 0, ..., n)

Previously (n = 1):
Repeat {
  θ0 := θ0 - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i))
  θ1 := θ1 - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)
}
(simultaneously update θ0, θ1)

Andrew Ng
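The same vectorized update from the earlier sketch carries over unchanged to n features (illustrative Octave; X is assumed to be m x (n+1) with a leading ones column):

% One gradient-descent step for multivariate linear regression;
% all n+1 parameters are updated simultaneously.
theta = theta - (alpha / m) * (X' * (X * theta - y));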
Linear Regression with multiple variables
Gradient descent in practice I: Feature Scaling
Machine Learning

Feature Scaling
Idea: make sure features are on a similar scale.
E.g., x1 = size (0-2000 feet^2)
      x2 = number of bedrooms (1-5)

[Contour plots: J(θ) is elongated with unscaled features, roughly circular after scaling]

Andrew Ng

Feature Scaling
Get every feature into approximately a -1 ≤ xi ≤ 1 range.

Andrew Ng

Mean normalization
Replace xi with xi - μi to make features have approximately zero mean
(do not apply to x0 = 1).
More generally: xi := (xi - μi) / si, where μi is the mean of feature i and si is
its range (max - min) or standard deviation.

Andrew Ng
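A small Octave sketch of mean normalization (illustrative; featureNormalize is not a name used in the slides):

% Scale each feature to zero mean and roughly unit scale.
% X here excludes the x0 = 1 column; add the ones column after normalizing.
function [X_norm, mu, sigma] = featureNormalize(X)
  mu     = mean(X);               % 1 x n row vector of feature means
  sigma  = std(X);                % 1 x n row vector of standard deviations
  X_norm = (X - mu) ./ sigma;     % broadcasts: (xi - mu_i) / s_i per column
end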
Linear Regression with multiple variables
Gradient descent in practice II: Learning rate
Machine Learning

Gradient descent
  θj := θj - α · ∂/∂θj J(θ)

- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate α.

Andrew Ng

Making sure gradient descent is working correctly.

[Plot: J(θ) vs. number of iterations (0-400); J(θ) should decrease on every
iteration and flatten out as gradient descent converges]

Example automatic convergence test:
Declare convergence if J(θ) decreases by less than some small ε (e.g., 10^-3)
in one iteration.

Andrew Ng

Making sure gradient descent is working correctly.

[Plots: J(θ) vs. number of iterations when α is too large]
Gradient descent not working: use a smaller α.

- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.

Andrew Ng

Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; may not converge.

To choose α, try a range of values spaced by roughly constant factors,
e.g. ..., 0.001, 0.01, 0.1, 1, ...

Andrew Ng
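One way to implement the "plot J(θ) against iterations" advice in Octave (a sketch assuming computeCost, the design matrix X, targets y, and alpha, theta, num_iters from the earlier sketches are defined):

% Record the cost after every gradient-descent step, then plot the curve.
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  theta = theta - (alpha / m) * (X' * (X * theta - y));
  J_history(iter) = computeCost(X, y, theta);
end
plot(1:num_iters, J_history);
xlabel('No. of iterations');
ylabel('J(theta)');        % should decrease on every iteration for a good alpha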
Linear Regression with multiple variables
Features and polynomial regression
Machine Learning

Housing prices prediction
You are free to define new features, e.g., instead of using frontage and depth
as two separate features, use area = frontage x depth.

Andrew Ng

Polynomial regression

[Plot: Price (y) vs. Size (x), fit with a polynomial curve]
E.g., hθ(x) = θ0 + θ1(size) + θ2(size)^2 + θ3(size)^3
(feature scaling becomes important here, since size^2 and size^3 have very
different ranges)

Andrew Ng

Choice of features

[Plot: Price (y) vs. Size (x)]
E.g., hθ(x) = θ0 + θ1(size) + θ2·sqrt(size)

Andrew Ng
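A short Octave sketch of building polynomial features and scaling them (illustrative; X_raw is a hypothetical matrix of raw inputs, and featureNormalize is the sketch from the feature-scaling section):

% Turn a single 'size' feature into polynomial features, then normalize them.
% Without scaling, size^3 would dwarf size and slow gradient descent badly.
sizes  = X_raw(:, 1);                          % m x 1 vector of house sizes
X_poly = [sizes, sizes .^ 2, sizes .^ 3];      % m x 3: size, size^2, size^3
[X_poly, mu, sigma] = featureNormalize(X_poly);
X = [ones(length(sizes), 1), X_poly];          % prepend the x0 = 1 column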
Linear Regression with multiple variables
Normal equation
Machine Learning

Gradient Descent: iteratively take steps to minimize J(θ).

Normal equation: method to solve for θ analytically, in one step.

Andrew Ng

Intuition: if θ were a single scalar (1D), J(θ) = aθ^2 + bθ + c; set the derivative
d/dθ J(θ) to zero and solve for θ. For a vector θ, set ∂/∂θj J(θ) = 0 for every j
and solve for θ0, θ1, ..., θn.

Andrew Ng

Examples: m = 4.

x0   Size (feet^2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104            5                    1                  45                    460
1    1416            3                    2                  40                    232
1    1534            3                    2                  30                    315
1    852             2                    1                  36                    178

Andrew Ng

m examples (x(1), y(1)), ..., (x(m), y(m)); n features.
Build the m x (n+1) design matrix X (one training example per row, with x0 = 1)
and the m-vector y of targets. Then

  θ = (XᵀX)^(-1) Xᵀy

where (XᵀX)^(-1) is the inverse of the matrix XᵀX.

Octave: pinv(X'*X)*X'*y

Andrew Ng
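For concreteness, a hedged Octave sketch that applies the formula to the example table above (assembling X and y by hand):

% Design matrix: x0 = 1, size, bedrooms, floors, age; one row per example.
X = [1 2104 5 1 45;
     1 1416 3 2 40;
     1 1534 3 2 30;
     1  852 2 1 36];
y = [460; 232; 315; 178];          % prices in $1000

theta = pinv(X' * X) * X' * y;     % normal equation; no feature scaling needed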
m training examples, n features.

Gradient Descent                        Normal Equation
• Need to choose α.                     • No need to choose α.
• Needs many iterations.                • Don't need to iterate.
• Works well even when n is large.      • Need to compute (XᵀX)^(-1).
                                        • Slow if n is very large.

Andrew Ng

Linear Regression with multiple variables
Normal equation and non-invertibility (optional)
Machine Learning

Normal equation
  θ = (XᵀX)^(-1) Xᵀy

- What if XᵀX is non-invertible (singular/degenerate)?
- Octave: pinv(X'*X)*X'*y still computes a sensible θ, because pinv uses the
  pseudo-inverse.

Andrew Ng

What if XᵀX is non-invertible?

• Redundant features (linearly dependent).
  E.g., x1 = size in feet^2
        x2 = size in m^2

• Too many features (e.g., m ≤ n).
  - Delete some features, or use regularization.

Andrew Ng
Logistic Regression
Classification
Machine Learning

Classification

Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign?

y ∈ {0, 1}
  0: "Negative Class" (e.g., benign tumor)
  1: "Positive Class" (e.g., malignant tumor)

Andrew Ng

[Plot: Malignant? (1 = Yes, 0 = No) vs. Tumor Size, with a straight-line fit hθ(x)]

Threshold classifier output hθ(x) at 0.5:
  If hθ(x) ≥ 0.5, predict "y = 1"
  If hθ(x) < 0.5, predict "y = 0"

Andrew Ng

Classification: y = 0 or 1

Linear regression: hθ(x) can be > 1 or < 0

Logistic Regression: 0 ≤ hθ(x) ≤ 1

Andrew Ng

Logistic Regression
Hypothesis Representation
Machine Learning

Logistic Regression Model
Want 0 ≤ hθ(x) ≤ 1:

  hθ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(-z))

[Plot: the sigmoid g(z), rising from 0 through 0.5 at z = 0 toward 1]

Sigmoid function = logistic function

Andrew Ng
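A minimal Octave sketch of this hypothesis (illustrative; the function name is mine, not the course's):

% Logistic/sigmoid function; works elementwise on vectors and matrices.
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end

% Hypothesis for all examples at once, X being m x (n+1) with a leading ones column:
%   probs = sigmoid(X * theta);    % m x 1 vector of estimated P(y = 1 | x; theta)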
Interpretation of Hypothesis Output
hθ(x) = estimated probability that y = 1 on input x

Example: if x = [x0; x1] = [1; tumorSize] and hθ(x) = 0.7,
tell the patient there is a 70% chance of the tumor being malignant.

hθ(x) = P(y = 1 | x; θ)
"probability that y = 1, given x, parameterized by θ"

Andrew Ng

Logistic Regression
Decision boundary
Machine Learning

Logistic regression
  hθ(x) = g(θᵀx),  g(z) = 1 / (1 + e^(-z))

Suppose we predict "y = 1" if hθ(x) ≥ 0.5, which happens exactly when θᵀx ≥ 0;
and predict "y = 0" if hθ(x) < 0.5, i.e., when θᵀx < 0.

Andrew Ng

Decision Boundary

[Plot: training data in (x1, x2) with a straight-line decision boundary]
E.g., predict "y = 1" if θ0 + θ1x1 + θ2x2 ≥ 0; the line where this expression
equals 0 is the decision boundary.

Andrew Ng

Non-linear decision boundaries

[Plot: training data in (x1, x2) with a circular decision boundary]
E.g., with hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1^2 + θ4x2^2),
predict "y = 1" if θ0 + θ1x1 + θ2x2 + θ3x1^2 + θ4x2^2 ≥ 0.

Andrew Ng
Logistic Regression
Cost function
Machine Learning

Training set: {(x(1), y(1)), (x(2), y(2)), ..., (x(m), y(m))}, m examples
x ∈ [x0; x1; ...; xn], x0 = 1, y ∈ {0, 1}
hθ(x) = 1 / (1 + e^(-θᵀx))

How to choose parameters θ?

Andrew Ng

Cost function
Linear regression used J(θ) = (1/m) Σ_{i=1}^{m} (1/2)(hθ(x(i)) - y(i))^2.
With the sigmoid inside hθ, that J(θ) is "non-convex" (many local optima);
we want a "convex" cost instead.

Andrew Ng

Logistic regression cost function
  Cost(hθ(x), y) = -log(hθ(x))        if y = 1

[Plot: -log(hθ(x)) for hθ(x) in (0, 1]; cost grows without bound as hθ(x) → 0,
and is 0 at hθ(x) = 1]

Andrew Ng

Logistic regression cost function
  Cost(hθ(x), y) = -log(1 - hθ(x))    if y = 0

[Plot: -log(1 - hθ(x)); cost is 0 at hθ(x) = 0, and grows without bound as hθ(x) → 1]

Andrew Ng

Logistic Regression
Simplified cost function and gradient descent
Machine Learning

Logistic regression cost function
  J(θ) = (1/m) Σ_{i=1}^{m} Cost(hθ(x(i)), y(i))
  Cost(hθ(x), y) = -y·log(hθ(x)) - (1 - y)·log(1 - hθ(x))

so

  J(θ) = -(1/m) Σ_{i=1}^{m} [ y(i)·log(hθ(x(i))) + (1 - y(i))·log(1 - hθ(x(i))) ]

Andrew Ng

Logistic regression cost function

To fit parameters θ: minimize J(θ) over θ.

To make a prediction given a new x:
  Output hθ(x) = 1 / (1 + e^(-θᵀx))   (the estimated probability that y = 1)

Andrew Ng

Gradient Descent
Want min over θ of J(θ):
Repeat {
  θj := θj - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)_j
}
(simultaneously update all θj)

The update rule looks identical to linear regression; the difference is that
hθ(x) is now the sigmoid 1 / (1 + e^(-θᵀx)) rather than θᵀx.

Andrew Ng
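A hedged Octave sketch of this cost and gradient in vectorized form (written to match the formulas above; not the course's official code):

% Cost and gradient for (unregularized) logistic regression.
% X: m x (n+1) with a leading ones column, y: m x 1 of 0/1 labels, theta: (n+1) x 1.
function [J, grad] = logisticCost(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));                    % sigmoid of theta' * x
  J = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));
  grad = (1 / m) * (X' * (h - y));                     % same shape as theta
end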
Logistic Regression
Advanced optimization
Machine Learning

Optimization algorithm
Cost function J(θ). Want min over θ of J(θ).
Given θ, we have code that can compute
  - J(θ)
  - ∂/∂θj J(θ)   (for j = 0, 1, ..., n)

Gradient descent:
Repeat {
  θj := θj - α · ∂/∂θj J(θ)
}

Andrew Ng

Optimization algorithm
Given θ, we have code that can compute
  - J(θ)
  - ∂/∂θj J(θ)   (for j = 0, 1, ..., n)

Optimization algorithms:        Advantages:
- Gradient descent              - No need to manually pick α.
- Conjugate gradient            - Often faster than gradient descent.
- BFGS
- L-BFGS                        Disadvantages:
                                - More complex.

Andrew Ng

Example: minimize J(θ) = (θ1 - 5)^2 + (θ2 - 5)^2

function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1)-5)^2 + (theta(2)-5)^2;   % value of J(theta)
  gradient = zeros(2,1);
  gradient(1) = 2*(theta(1)-5);             % dJ/dtheta(1)
  gradient(2) = 2*(theta(2)-5);             % dJ/dtheta(2)
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = ...
    fminunc(@costFunction, initialTheta, options);

Andrew Ng

theta = [theta(1); theta(2); ...; theta(n+1)]

function [jVal, gradient] = costFunction(theta)

  jVal = [ code to compute J(θ) ];

  gradient(1) = [ code to compute ∂/∂θ0 J(θ) ];

  gradient(2) = [ code to compute ∂/∂θ1 J(θ) ];
  ...
  gradient(n+1) = [ code to compute ∂/∂θn J(θ) ];

end

Andrew Ng
Logistic Regression
Multi-class classification: One-vs-all
Machine Learning

Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagnoses: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow

Andrew Ng

Binary classification:                  Multi-class classification:
[Plot: two classes in (x1, x2)]         [Plot: three classes in (x1, x2)]

Andrew Ng

One-vs-all (one-vs-rest):

[Plots: the three-class data set in (x1, x2) split into three binary problems,
one per class]

Class 1: hθ^(1)(x)
Class 2: hθ^(2)(x)
Class 3: hθ^(3)(x)

Andrew Ng

One-vs-all

Train a logistic regression classifier hθ^(i)(x) for each class i to predict the
probability that y = i.

On a new input x, to make a prediction, pick the class i that maximizes hθ^(i)(x).

Andrew Ng
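A sketch of one-vs-all in Octave (illustrative only; trainLogReg is a hypothetical helper that fits a single binary logistic regression, e.g. via fminunc with the costFunction pattern above, and X_new is a hypothetical matrix of new examples):

% Train K one-vs-rest classifiers, then predict by picking the most confident one.
% X: m x (n+1) with a leading ones column, y: m x 1 with labels 1..K.
K = max(y);
all_theta = zeros(K, size(X, 2));
for k = 1:K
  all_theta(k, :) = trainLogReg(X, (y == k))';   % fit class k vs. the rest
end

% Prediction for the examples in X_new:
probs = 1 ./ (1 + exp(-(X_new * all_theta')));   % m x K matrix of h_theta^(k)(x)
[~, predictions] = max(probs, [], 2);            % index of the most probable class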
Regularization
The problem of overfitting
Machine Learning

Example: Linear regression (housing prices)

[Plots: Price vs. Size fit three ways: an underfit straight line, a reasonable
quadratic fit, and an overfit high-order polynomial]

Overfitting: if we have too many features, the learned hypothesis may fit the
training set very well (J(θ) ≈ 0), but fail to generalize to new examples
(e.g., fail to predict prices on new examples).

Andrew Ng
Example: Logistic regression

[Plots: decision boundaries in (x1, x2) that underfit, fit well, and overfit]

(g = sigmoid function)

Andrew Ng

Addressing overfitting:

Many possible features:
  x1 = size of house
  x2 = no. of bedrooms
  x3 = no. of floors
  x4 = age of house
  x5 = average income in neighborhood
  x6 = kitchen size
  ...

[Plot: Price vs. Size]

Andrew Ng
Addressing overfitting:

Options:
1. Reduce number of features.
   - Manually select which features to keep.
   - Model selection algorithm (later in course).
2. Regularization.
   - Keep all the features, but reduce the magnitude/values of the parameters θj.
   - Works well when we have a lot of features, each of which contributes a bit
     to predicting y.

Andrew Ng
Regularization
Cost function

Machine Learning
Intuition

[Plots: Price vs. Size of house, a quadratic fit vs. an overfit higher-order
polynomial]

Suppose we penalize the higher-order parameters (e.g., θ3 and θ4) and make them
really small: the fit becomes close to quadratic again.

Andrew Ng

Regularization.

Small values for the parameters θ0, θ1, ..., θn
- "Simpler" hypothesis
- Less prone to overfitting

Housing:
- Features: x1, x2, ..., xn
- Parameters: θ0, θ1, ..., θn

Regularized cost:
  J(θ) = (1/2m) [ Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2 + λ Σ_{j=1}^{n} θj^2 ]

Andrew Ng

Regularization.

[Plot: Price vs. Size of house; the regularized fit is smoother]

Andrew Ng
In regularized linear regression, we choose θ to minimize

  J(θ) = (1/2m) [ Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2 + λ Σ_{j=1}^{n} θj^2 ]

What if λ is set to an extremely large value (perhaps too large for our problem)?

- Algorithm works fine; setting λ to be very large can't hurt it.
- Algorithm fails to eliminate overfitting.
- Algorithm results in underfitting (fails to fit even the training data well).
- Gradient descent will fail to converge.

Andrew Ng

In regularized linear regression, we choose θ to minimize

  J(θ) = (1/2m) [ Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2 + λ Σ_{j=1}^{n} θj^2 ]

What if λ is set to an extremely large value? All of θ1, ..., θn are penalized
toward 0, leaving hθ(x) ≈ θ0: a flat line that underfits.

[Plot: Price vs. Size of house with the resulting flat fit]

Andrew Ng
Regularization
Regularized linear regression
Machine Learning

Regularized linear regression

Gradient descent
Repeat {
  θ0 := θ0 - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)_0
  θj := θj - α · [ (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)_j + (λ/m)·θj ]    (j = 1, ..., n)
}
Equivalently: θj := θj·(1 - α·λ/m) - α·(1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)_j

Andrew Ng
Normal equation
  θ = (XᵀX + λ·M)^(-1) Xᵀy
where M is the (n+1) x (n+1) matrix that is the identity except that the entry
corresponding to θ0 is 0.

Andrew Ng
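A hedged Octave sketch of this regularized normal equation (illustrative; lambda is assumed to be set already):

% Regularized normal equation; theta0 (the first parameter) is not penalized.
p = size(X, 2);                   % X is m x (n+1) with a leading ones column
M = eye(p);
M(1, 1) = 0;                      % do not regularize theta0
theta = pinv(X' * X + lambda * M) * X' * y;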
Non-invertibility (optional/advanced).
Suppose m ≤ n (m = #examples, n = #features); then XᵀX is non-invertible/singular.

If λ > 0, the matrix XᵀX + λ·M is invertible, so the regularized normal equation
still works.

Andrew Ng

Regularization
Regularized logistic regression
Machine Learning
Regularized logistic regression.

[Plot: (x1, x2) data with an overfit, highly wiggly decision boundary]

Cost function:
  J(θ) = -(1/m) Σ_{i=1}^{m} [ y(i)·log(hθ(x(i))) + (1 - y(i))·log(1 - hθ(x(i))) ]
         + (λ/2m) Σ_{j=1}^{n} θj^2

Andrew Ng
Gradient descent
Repeat {
  θ0 := θ0 - α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)_0
  θj := θj - α · [ (1/m) Σ_{i=1}^{m} (hθ(x(i)) - y(i)) · x(i)_j + (λ/m)·θj ]    (j = 1, ..., n)
}
(looks identical to regularized linear regression, but here hθ(x) = 1 / (1 + e^(-θᵀx)))

Andrew Ng
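A hedged Octave sketch of the regularized cost and gradient, matching the formulas above (not official course code):

% Regularized logistic regression cost and gradient; theta(1) (= theta0) is not penalized.
function [J, grad] = logisticCostReg(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));
  theta_reg = [0; theta(2:end)];                 % zero out the theta0 term
  J = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
      + (lambda / (2 * m)) * sum(theta_reg .^ 2);
  grad = (1 / m) * (X' * (h - y)) + (lambda / m) * theta_reg;
end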
Advanced optimization

function [jVal, gradient] = costFunction(theta)

  jVal = [ code to compute J(θ) ];

  gradient(1) = [ code to compute ∂/∂θ0 J(θ) ];

  gradient(2) = [ code to compute ∂/∂θ1 J(θ) ];

  gradient(3) = [ code to compute ∂/∂θ2 J(θ) ];
  ...
  gradient(n+1) = [ code to compute ∂/∂θn J(θ) ];

end
