ML Lectures 2022 Part 2

Welcome
Machine Learning
Andrew Ng

Example application: SPAM filtering
Machine Learning
- Grew out of work in AI
- New capability for computers

Examples:
- Database mining
  Large datasets from growth of automation/web.
  E.g., web click data, medical records, biology, engineering
- Applications we can't program by hand
  E.g., autonomous helicopter, handwriting recognition, most of
  Natural Language Processing (NLP), Computer Vision
- Self-customizing programs
  E.g., Amazon, Netflix product recommendations
- Understanding human learning (brain, real AI)
Introduction
What is machine learning?
Machine Learning
Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
• Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?
Housing price prediction.
[Figure: Price ($) in 1000's (0 to 400) vs. Size in feet² (0 to 2500), a regression problem]

Classification example: tumor features
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
…
plotted against Age and Tumor Size.
You’re running a company, and you want to develop learning algorithms to address each
of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many
of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each
account decide if it has been hacked/compromised.
Supervised Learning
[Figure: labeled data in the (x1, x2) plane]

Unsupervised Learning
[Figure: unlabeled data in the (x1, x2) plane, forming clusters]
[Figure: clustering genes across individuals]

Cocktail party problem:
Speaker #1 → Microphone #1
Speaker #2 → Microphone #2

Cocktail party problem algorithm (Octave one-liner):
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
Given a set of news articles found on the web, group them into sets of articles about the same story.
Given a database of customer data, automatically discover market segments and group customers into different market segments.
Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
Linear regression with one variable
Model representation
Machine Learning
Housing Prices (Portland, OR)
[Figure: Price (in 1000s of dollars), 0 to 500, vs. Size (feet²), 0 to 3000]
Supervised Learning: Regression Problem
Training Set → Learning Algorithm → h (hypothesis)
How do we represent h?
Linear regression with one variable
Cost function
Machine Learning
Training Set:
  Size in feet² (x) | Price ($) in 1000's (y)
  2104              | 460
  1416              | 232
  1534              | 315
   852              | 178
  …                 | …

Hypothesis: hθ(x) = θ0 + θ1·x
θi's: parameters
How to choose the θi's?
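In the lectures, the parameters are chosen to minimize the squared-error cost J(θ0, θ1) = (1/2m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))². A minimal pure-Python sketch on the training set above (the all-zero parameter values are illustrative, not from the slides):

```python
# Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2),
# computed on the four training examples from the table above.

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [2104, 1416, 1534, 852]   # Size in feet^2
ys = [460, 232, 315, 178]      # Price in $1000's

print(cost(0.0, 0.0, xs, ys))  # 49541.625 for the all-zero parameters
```

Better parameter choices give a smaller J; gradient descent (later in this section) searches for the minimizing θ0, θ1 automatically.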
Linear regression with one variable
Cost function intuition I
Machine Learning

Simplified setting (θ0 = 0):
Hypothesis: hθ(x) = θ1·x
Parameter: θ1
Cost Function: J(θ1) = (1/2m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))²
Goal: minimize J(θ1) over θ1
(for fixed θ1, this is a function of x)    (function of the parameter θ1)
[Figure: left, hθ(x) = θ1·x plotted against x; right, J(θ1) plotted against the parameter θ1]
Linear regression with one variable
Cost function intuition II
Machine Learning

Hypothesis: hθ(x) = θ0 + θ1·x
Parameters: θ0, θ1
Cost Function: J(θ0, θ1) = (1/2m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))²
Goal: minimize J(θ0, θ1) over θ0, θ1
(for fixed θ0, θ1, this is a function of x)    (function of the parameters θ0, θ1)
[Figure: left, hθ(x) as Price ($) in 1000's vs. Size in feet² (x); right, J(θ0, θ1) over the parameters θ0, θ1]
Linear regression with one variable
Gradient descent
Machine Learning
Have some function J(θ0, θ1)
Want min over θ0, θ1 of J(θ0, θ1)

Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum
[Figure: surface plots of J(θ0,θ1) over the (θ0, θ1) plane]
Gradient descent algorithm:
repeat until convergence {
  θj := θj − α · ∂/∂θj J(θ0, θ1)   (for j = 0 and j = 1)
}
α is the learning rate; update θ0 and θ1 simultaneously.
Linear regression with one variable
Gradient descent intuition
Machine Learning
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
At a local optimum the derivative term is zero, so the update leaves θ1 unchanged.
Gradient descent can converge to a local minimum, even with the learning rate α fixed: as we approach a local minimum, the derivative (and hence the step size) automatically gets smaller, so there is no need to decrease α over time.
Applying the gradient descent algorithm to the linear regression model gives:
repeat until convergence {
  θ0 := θ0 − α (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))
  θ1 := θ1 − α (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x^(i)
}
Gradient descent algorithm: update θ0 and θ1 simultaneously.
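The simultaneous update can be sketched in pure Python by computing both temporaries before assigning. The toy data (an exact line y = 1 + 2x), learning rate, and iteration count are illustrative assumptions, not from the slides:

```python
# Gradient descent for one-variable linear regression. temp0/temp1 are
# computed from the same (theta0, theta1) before either is overwritten,
# which is exactly the "simultaneous update" requirement.

def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        temp0 = theta0 - alpha * sum(err) / m
        temp1 = theta1 - alpha * sum(e * x for e, x in zip(err, xs)) / m
        theta0, theta1 = temp0, temp1   # simultaneous update
    return theta0, theta1

# Data lying exactly on y = 1 + 2x, so the minimizer is theta = (1, 2).
t0, t1 = gradient_descent([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

Updating θ0 first and then using the new θ0 inside the θ1 update would be a subtly different (and incorrect) algorithm.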
(for fixed θ0, θ1, this is a function of x)    (function of the parameters θ0, θ1)
[Figure: successive gradient descent steps; left, the hθ(x) fit on the data; right, the trajectory of (θ0, θ1) on J(θ0, θ1)]
"Batch" Gradient Descent: "batch" refers to the fact that each step of gradient descent uses all m training examples.
Linear Regression with multiple variables
Multiple features
Machine Learning

Multiple features (variables).
Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104         | 5                  | 1                | 45                  | 460
1416         | 3                  | 2                | 40                  | 232
1534         | 3                  | 2                | 30                  | 315
 852         | 2                  | 1                | 36                  | 178
 …           | …                  | …                | …                   | …

Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
Hypothesis:
Previously: hθ(x) = θ0 + θ1·x
Now: hθ(x) = θ0 + θ1·x1 + θ2·x2 + … + θn·xn

For convenience of notation, define x0 = 1. Then hθ(x) = θ^T x.

Gradient descent:
Repeat {
  θj := θj − α (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x_j^(i)
}
(simultaneously update θj for j = 0, …, n)
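With x0 = 1 prepended, the multivariate hypothesis is just an inner product. A small sketch, where the feature vector is the table's first example and the θ values are made up for illustration:

```python
# Multivariate hypothesis h_theta(x) = theta^T x with x0 = 1 prepended.

def h(theta, x):
    return sum(t * xi for t, xi in zip(theta, [1.0] + x))  # prepend x0 = 1

x = [2104.0, 5.0, 1.0, 45.0]          # size, bedrooms, floors, age
theta = [80.0, 0.1, 10.0, -5.0, 0.5]  # theta0..theta4, illustrative only
price = h(theta, x)                   # predicted price in $1000's
```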
Linear Regression with multiple variables
Gradient descent in practice I: Feature Scaling
Machine Learning
Feature Scaling
Idea: make sure features are on a similar scale, so gradient descent converges faster.
E.g. x1 = size (0-2000 feet²), x2 = number of bedrooms (1-5): dividing each feature by its range (x1 = size/2000, x2 = #bedrooms/5) puts both in roughly the 0-1 range.
Feature Scaling
Get every feature into approximately a −1 ≤ xi ≤ 1 range.
Mean normalization
Replace xi with xi − μi to make features have approximately zero mean (do not apply to x0 = 1).
More generally: xi := (xi − μi) / si, where μi is the mean of feature i and si is its range (max − min) or standard deviation.
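A sketch of mean normalization with range scaling, applied to the bedroom counts from the table above (using the range max − min as si is one of the choices mentioned):

```python
# Mean normalization: x := (x - mu) / s, with mu the feature mean and
# s the feature's range (max - min). The result is zero-mean and spans
# an interval of width 1.

def mean_normalize(values):
    mu = sum(values) / len(values)
    s = max(values) - min(values)
    return [(v - mu) / s for v in values]

bedrooms = [5, 3, 3, 2]
scaled = mean_normalize(bedrooms)
```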
Linear Regression with multiple variables
Gradient descent in practice II: Learning rate
Machine Learning
Making sure gradient descent is working correctly: plot J(θ) against the number of iterations; J(θ) should decrease after every iteration.
Example automatic convergence test: declare convergence if J(θ) decreases by less than some small threshold (e.g. 10^-3) in one iteration.
To choose α, try a range of values, e.g. …, 0.001, 0.01, 0.1, 1, …
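The convergence test can be sketched as a loop that stops once J decreases by less than a threshold in one iteration. The one-parameter toy cost, α, and the threshold below are illustrative assumptions, not values from the slides:

```python
# Gradient descent with an automatic convergence test: stop when the
# per-iteration decrease of J drops below eps.

def descend(alpha=0.3, eps=1e-3):
    theta = 10.0
    J = lambda t: (t - 4.0) ** 2       # toy convex cost, minimum at theta = 4
    grad = lambda t: 2.0 * (t - 4.0)
    prev = J(theta)
    for i in range(1000):
        theta -= alpha * grad(theta)
        cur = J(theta)
        if prev - cur < eps:           # declare convergence
            return theta, i + 1
        prev = cur
    return theta, 1000

theta, iters = descend()               # stops near theta = 4 well before 1000 iters
```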
Linear Regression with multiple variables
Features and polynomial regression
Machine Learning

Housing prices prediction
Polynomial regression
[Figure: Price (y) vs. Size (x), fit with a curve rather than a straight line]

Choice of features
[Figure: Price (y) vs. Size (x)]
Instead of fitting a straight line, we can create new features from the old ones (e.g. powers of size) and fit them with the same linear-regression machinery.
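Creating polynomial features can be sketched as follows (degree 3 chosen for illustration); after this, the multivariate linear-regression machinery applies unchanged, though feature scaling becomes important because the powers differ wildly in magnitude:

```python
# Build polynomial features x1 = size, x2 = size^2, x3 = size^3 from a
# single raw feature, so linear regression can fit a cubic curve.

def poly_features(size, degree=3):
    return [size ** d for d in range(1, degree + 1)]

features = poly_features(10.0)   # [10.0, 100.0, 1000.0]
```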
Linear Regression with multiple variables
Normal equation
Machine Learning

Gradient Descent: iterative, takes many steps.
Normal equation: method to solve for θ analytically, in one step.
Intuition: in 1D (θ ∈ ℝ), minimize J(θ) by setting the derivative dJ/dθ to zero and solving for θ.
For θ ∈ ℝ^(n+1): set the partial derivative ∂J/∂θj to zero (for every j), and solve for θ0, θ1, …, θn.
Examples: m = 4.
x0 | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
1  | 2104         | 5                  | 1                | 45                  | 460
1  | 1416         | 3                  | 2                | 40                  | 232
1  | 1534         | 3                  | 2                | 30                  | 315
1  |  852         | 2                  | 1                | 36                  | 178

Stacking the feature rows into the m × (n+1) design matrix X and the prices into the vector y, the solution is θ = (X^T X)^(-1) X^T y.

m examples (x^(1), y^(1)), …, (x^(m), y^(m)); n features.
θ = (X^T X)^(-1) X^T y, where (X^T X)^(-1) is the inverse of the matrix X^T X.
Octave: pinv(X'*X)*X'*y
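A pure-Python sketch of the normal equation: rather than forming the inverse explicitly, it solves the linear system (X^T X)θ = X^T y by Gaussian elimination. The toy data (an exact line y = 1 + 2x) is an illustrative assumption:

```python
# Normal equation theta = (X^T X)^(-1) X^T y, implemented by solving the
# system (X^T X) theta = X^T y with Gaussian elimination + partial pivoting.

def normal_equation(X, y):
    n = len(X[0])
    # A = X^T X (n x n), b = X^T y (length n)
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    for col in range(n):                       # forward elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    theta = [0.0] * n                          # back substitution
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - s) / A[i][i]
    return theta

# x0 = 1 intercept column plus one feature; data lies exactly on y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
theta = normal_equation(X, y)                  # ~[1.0, 2.0]
```

pinv in the Octave one-liner additionally handles the non-invertible case via a pseudo-inverse; this sketch assumes X^T X is invertible.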
m training examples, n features.

Gradient Descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation:
• No need to choose α.
• Don't need to iterate.
• Need to compute (X^T X)^(-1).
• Slow if n is very large.
Linear Regression with multiple variables
Normal equation and non-invertibility (optional)
Machine Learning

Normal equation: θ = (X^T X)^(-1) X^T y
What if X^T X is non-invertible (singular/degenerate)? Common causes:
• Redundant features (linearly dependent), e.g. x1 = size in feet² and x2 = size in m².
• Too many features (e.g. m ≤ n): delete some features, or use regularization.
(Octave's pinv computes a pseudo-inverse, so it still returns a usable θ.)
Logistic Regression
Classification
Machine Learning

Classification: y ∈ {0, 1} (0: "negative class", 1: "positive class")
[Figure: Malignant? (Yes) 1 / (No) 0 plotted against Tumor Size]

Logistic Regression: 0 ≤ hθ(x) ≤ 1
Logistic Regression
Hypothesis Representation
Machine Learning

Logistic Regression Model
Want 0 ≤ hθ(x) ≤ 1:
hθ(x) = g(θ^T x), where g(z) = 1 / (1 + e^(−z)) is the sigmoid (logistic) function.
[Figure: g(z) rises from 0 through 0.5 at z = 0 toward 1]
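The sigmoid and the resulting hypothesis can be sketched directly from the definitions above:

```python
# Sigmoid g(z) = 1 / (1 + e^-z): g(0) = 0.5, g -> 1 as z -> +inf,
# g -> 0 as z -> -inf. h_theta(x) = g(theta^T x), with x0 = 1 prepended.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, [1.0] + x)))
```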
Interpretation of Hypothesis Output
hθ(x) = estimated probability that y = 1 on input x.
Example: if hθ(x) = 0.7 for a tumor-size input, the model estimates a 70% chance that the tumor is malignant (y = 1).
Logistic Regression
Decision boundary
Machine Learning

Logistic regression: hθ(x) = g(θ^T x), with g(z) ≥ 0.5 exactly when z ≥ 0.
Suppose we predict "y = 1" if hθ(x) ≥ 0.5, i.e. whenever θ^T x ≥ 0,
and predict "y = 0" if hθ(x) < 0.5, i.e. whenever θ^T x < 0.
Decision Boundary
[Figure: two classes in the (x1, x2) plane, separated by a straight line]
Predict "y = 1" if θ^T x ≥ 0; the line θ^T x = 0 is the decision boundary.
Non-linear decision boundaries
[Figure: classes in the (x1, x2) plane separated by a circle of radius 1 around the origin]
Adding polynomial features (x1², x2²) allows boundaries such as: predict "y = 1" if −1 + x1² + x2² ≥ 0.
Logistic Regression
Cost function
Machine Learning

Training set: m examples {(x^(1), y^(1)), …, (x^(m), y^(m))}
With the squared-error cost, J(θ) for logistic regression is "non-convex"; we want a "convex" cost that gradient descent can minimize reliably.
Logistic regression cost function:
Cost(hθ(x), y) = −log(hθ(x))       if y = 1
Cost(hθ(x), y) = −log(1 − hθ(x))   if y = 0
[Figures: Cost plotted against hθ(x) on (0, 1), for y = 1 and for y = 0]
Logistic Regression
Simplified cost function and gradient descent
Machine Learning

Logistic regression cost function (the two cases combined):
Cost(hθ(x), y) = −y·log(hθ(x)) − (1 − y)·log(1 − hθ(x))
J(θ) = (1/m) Σ_{i=1..m} Cost(hθ(x^(i)), y^(i))
Logistic regression cost function
To fit parameters θ: minimize J(θ).
To make a prediction given a new x: output hθ(x) = 1 / (1 + e^(−θ^T x)).
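A sketch of J(θ) from the definition above, on made-up one-feature data; with θ = 0 every hθ(x) is 0.5, so the cost is log 2:

```python
# Logistic regression cost
# J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)) on toy data.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        hi = sigmoid(sum(t * v for t, v in zip(theta, [1.0] + xi)))
        total += -yi * math.log(hi) - (1 - yi) * math.log(1 - hi)
    return total / m

X = [[1.0], [2.0], [-1.0], [-2.0]]   # one feature, illustrative
y = [1, 1, 0, 0]
J = cost([0.0, 0.0], X, y)           # h = 0.5 everywhere, so J = log(2)
```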
Gradient Descent
Want min over θ of J(θ):
Repeat {
  θj := θj − α (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x_j^(i)
}
(simultaneously update all θj)
The update rule looks identical to linear regression's, but hθ(x) is now g(θ^T x), so it is a different algorithm.
Optimization algorithm
Given θ, we have code that can compute:
- J(θ)
- ∂/∂θj J(θ)   (for j = 0, 1, …, n)
These two quantities can be fed to gradient descent or to more advanced optimizers (e.g. conjugate gradient, BFGS, L-BFGS), which return the fitted parameters:
theta = …
Machine Learning
Multiclass classification

Email foldering/tagging: Work, Friends, Family, Hobby
Binary classification: [Figure: two classes in the (x1, x2) plane]
Multi-class classification: [Figure: three classes in the (x1, x2) plane]
One-vs-all (one-vs-rest):
[Figure: the three-class problem in the (x1, x2) plane split into three binary problems: Class 1 vs. rest, Class 2 vs. rest, Class 3 vs. rest]

One-vs-all:
Train a logistic regression classifier hθ^(i)(x) for each class i to predict the probability that y = i.
On a new input x, pick the class i that maximizes hθ^(i)(x).
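One-vs-all prediction can be sketched as follows; the fitted θ vectors are made up for illustration (in practice each would come from training a logistic regression classifier on class i vs. the rest):

```python
# One-vs-all prediction: evaluate every per-class classifier on x and
# return the index of the class with the highest probability.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    xv = [1.0] + x                                   # prepend x0 = 1
    scores = [sigmoid(sum(t * v for t, v in zip(th, xv))) for th in thetas]
    return max(range(len(scores)), key=lambda i: scores[i])

thetas = [[1.0, -2.0, 0.0],    # class 0 (illustrative parameters)
          [-1.0, 2.0, -1.0],   # class 1
          [-3.0, 0.0, 2.0]]    # class 2
label = predict_one_vs_all(thetas, [2.0, 0.5])
```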
Regularization
The problem of overfitting
Machine Learning
Example: Linear regression (housing prices)
[Figures: Price vs. Size fit three ways: underfit ("high bias"), good fit, overfit ("high variance")]
[Figures: the same progression for logistic regression decision boundaries in the (x1, x2) plane, where hθ = sigmoid function of the features]
Addressing overfitting:
Many features: size of house, no. of bedrooms, no. of floors, age of house, average income in neighborhood, kitchen size, …
[Figure: Price vs. Size]
Addressing overfitting:
Options:
1. Reduce number of features.
   ― Manually select which features to keep.
   ― Model selection algorithm (later in course).
2. Regularization.
   ― Keep all the features, but reduce magnitude/values of parameters θj.
   ― Works well when we have a lot of features, each of which contributes a bit to predicting y.
Regularization
Cost function
Machine Learning

Intuition
[Figures: Price vs. Size; heavily penalizing the parameters of high-order terms shrinks them toward zero and yields a simpler fit]

Regularization: small values for the parameters θ0, θ1, …, θn give a "simpler" hypothesis that is less prone to overfitting.
[Figure: Price vs. Size of house]

In regularized linear regression, we choose θ to minimize
J(θ) = (1/2m) [ Σ_{i=1..m} (hθ(x^(i)) − y^(i))² + λ Σ_{j=1..n} θj² ]
where λ is the regularization parameter (note the penalty sum starts at j = 1, not j = 0).
Regularization
Regularized linear regression
Machine Learning

Regularized linear regression: gradient descent
Repeat {
  θ0 := θ0 − α (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x0^(i)
  θj := θj − α [ (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x_j^(i) + (λ/m)·θj ]   (j = 1, 2, …, n)
}
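One regularized gradient-descent step can be sketched directly from the update rule above; all numbers are illustrative. On data the model already fits exactly, the only remaining effect is the (λ/m)·θj shrinkage of θ1, while θ0 is untouched:

```python
# One regularized gradient-descent step for linear regression:
# theta_0 is updated without the lambda term; theta_j (j >= 1) gets the
# extra (lambda/m) * theta_j shrinkage term.

def reg_step(theta, X, y, alpha, lam):
    m = len(X)
    Xb = [[1.0] + row for row in X]                  # prepend x0 = 1
    err = [sum(t * v for t, v in zip(theta, xi)) - yi
           for xi, yi in zip(Xb, y)]
    new = []
    for j in range(len(theta)):
        grad = sum(e * xi[j] for e, xi in zip(err, Xb)) / m
        if j > 0:
            grad += (lam / m) * theta[j]             # regularization term
        new.append(theta[j] - alpha * grad)
    return new

# Data lies exactly on y = x with theta = [0, 1], so the errors are zero
# and only the shrinkage acts: theta1 moves from 1.0 to 0.95.
theta = reg_step([0.0, 1.0], [[1.0], [2.0]], [1.0, 2.0], alpha=0.1, lam=1.0)
```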
Normal equation (regularized):
θ = (X^T X + λ·M)^(-1) X^T y, where M is the (n+1)×(n+1) identity matrix with its top-left entry (for θ0) set to 0.

Non-invertibility (optional/advanced).
Suppose m ≤ n (#examples ≤ #features): then X^T X is non-invertible / singular.
If λ > 0, the regularized matrix X^T X + λ·M is invertible, so regularization also fixes this.
Regularization
Regularized logistic regression
Machine Learning

Regularized logistic regression
[Figure: non-linear decision boundary in the (x1, x2) plane]
Cost function:
J(θ) = −(1/m) Σ_{i=1..m} [ y^(i)·log(hθ(x^(i))) + (1 − y^(i))·log(1 − hθ(x^(i))) ] + (λ/2m) Σ_{j=1..n} θj²
Gradient descent
Repeat {
  θj := θj − α [ (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x_j^(i) + (λ/m)·θj ]   (j ≥ 1; θ0 is updated without the λ term)
}

Advanced optimization
function [jVal, gradient] = costFunction(theta)
  jVal = [ code to compute J(θ) ];