Supervised Learning
RECOGNITION

SUPERVISED LEARNING

CLASSIFICATION
LEARNING A CLASS FROM EXAMPLES
Class C of a “family car”
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Positive (+) and negative (–) examples
Input representation: x1 = price, x2 = engine power
TRAINING SET X
For each car: $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, with label
$r = \begin{cases} 1 & \text{if } \mathbf{x} \text{ is positive} \\ 0 & \text{if } \mathbf{x} \text{ is negative} \end{cases}$
$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$
CLASS C
Class C is defined by a rectangle in the price–engine power space.
CLASS C
$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(\mathbf{x}^t) \neq r^t\big)$
HYPOTHESIS CLASS H
How to read:
$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(\mathbf{x}^t) \neq r^t\big)$
is the error of hypothesis h given the training set X.
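The error count above can be sketched directly in code. Here the hypothesis h is taken to be an axis-aligned rectangle in the (price, engine power) plane, as in the class-C slides; the rectangle bounds and all data values below are made up for illustration, and only the counting rule comes from the slides.

```python
# Empirical error E(h | X) = sum over t of 1(h(x^t) != r^t),
# with h an axis-aligned rectangle in the (price, engine power) plane.
# Rectangle bounds and data values are illustrative.

def h(x, p1=10, p2=20, e1=60, e2=120):
    """Rectangle hypothesis: 1 inside the rectangle, 0 outside."""
    price, power = x
    return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0

def empirical_error(X, r):
    """Number of training examples the hypothesis gets wrong."""
    return sum(1 for x_t, r_t in zip(X, r) if h(x_t) != r_t)

X = [(15, 90), (12, 70), (25, 200), (5, 40)]   # (price, engine power)
r = [1, 1, 0, 1]                               # 1 = family car, 0 = not

print(empirical_error(X, r))  # → 1 (the last positive falls outside the rectangle)
```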
S, G, AND THE VERSİON SPACE
h Î H, between S and G is
consistent
9
MULTIPLE CLASSES
Classes $C_i$, $i = 1, \ldots, K$

$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}$, where
$r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$

Train hypotheses $h_i(\mathbf{x})$, $i = 1, \ldots, K$:
$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$
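The one-per-class labelling $r_i^t$ can be sketched as follows; the class names and labels below are illustrative:

```python
# Build one-vs-rest targets: r_i^t = 1 if x^t belongs to class C_i, else 0.
# Class names and labels are illustrative.

def one_vs_rest_labels(y, classes):
    """For each class C_i, produce a binary label vector over the training set."""
    return {c: [1 if y_t == c else 0 for y_t in y] for c in classes}

y = ["sedan", "sports", "sedan", "luxury"]
labels = one_vs_rest_labels(y, ["sedan", "sports", "luxury"])
print(labels["sedan"])   # → [1, 0, 1, 0]
```

Each binary vector is then used to train one two-class hypothesis, giving K hypotheses in total.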
MULTIPLE CLASSES
A K-class problem = K two-class problems.
Positive examples for one class (e.g. Luxury Sedan); all the rest are negative examples.
LINEAR REGRESSION

EXAMPLE
WHAT IS LINEAR?
A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.
EXAMPLE
Dataset giving the living areas and prices of 50 houses
EXAMPLE
We can plot this data.

NOTATIONS
The “input” variables: $x^{(i)}$ (living area in this example)
The “output” or target variable that we are trying to predict: $y^{(i)}$ (the price)
REGRESSION
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
CHOICE OF HYPOTHESIS
Decision: how to represent the hypothesis h.
For linear regression we assume the hypothesis is linear:
$h(x) = \theta_0 + \theta_1 x$
HYPOTHESIS
Generally we’ll have more than one input feature, e.g.
x1 = living area, x2 = # of bedrooms:
$h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
HYPOTHESIS
$h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
To show dependence on θ, write $h_\theta(x)$ or $h(x \mid \theta)$:
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

Define $x_0 = 1$:
$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 = \sum_{i=0}^{2} \theta_i x_i$

The θs are called the parameters and are real numbers. For n features:
$h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$

It is the job of the learning algorithm to find, or learn, these parameters.
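A minimal sketch of evaluating $h_\theta(x) = \theta^T x$ with NumPy; the parameter values and the example input are illustrative, not from the slides:

```python
import numpy as np

# h_theta(x) = sum_i theta_i * x_i = theta^T x, with x_0 = 1 prepended.
# theta values and the example input are illustrative.

def h_theta(x, theta):
    """Linear hypothesis: dot product of parameters with [1, x_1, ..., x_n]."""
    x = np.concatenate(([1.0], x))   # define x_0 = 1
    return theta @ x

theta = np.array([50.0, 0.1, 25.0])   # theta_0, theta_1, theta_2
x = np.array([2104.0, 3.0])           # living area, # of bedrooms
print(h_theta(x, theta))              # 50 + 0.1*2104 + 25*3 = 335.4
```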
CHOOSING THE REGRESSION LINE
Which of these lines to choose? (figure: two candidate lines through the same data in the X–Y plane)
$y = h_\theta(x) = \theta_0 + \theta_1 x$
The error, or residual, for point i is the vertical distance $\hat{y}_i - y_i$ between the predicted value $\hat{y}_i$ and the observed value $y_i$ (figure: regression line with residuals).
CHOOSING THE REGRESSION LINE
How to choose the best-fit line:
$\min_\theta \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$
CHOOSING THE REGRESSION LINE
Sum the squared error over the m training examples: squaring ensures we don’t get negative values, and the factor 1/2 simplifies later calculations.
$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$
We then minimize $J(\theta)$ over θ.
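The cost $J(\theta)$ can be sketched for the one-feature model $h_\theta(x) = \theta_0 + \theta_1 x$; the data values below are illustrative:

```python
import numpy as np

# J(theta) = (1/2) * sum_i (h_theta(x^(i)) - y^(i))^2 for the 1-D model
# h_theta(x) = theta_0 + theta_1 * x. Data values are illustrative.

def cost(theta0, theta1, x, y):
    residuals = theta0 + theta1 * x - y
    return 0.5 * np.sum(residuals ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])          # y = 2x exactly
print(cost(0.0, 2.0, x, y))            # → 0.0 (perfect fit)
print(cost(0.0, 1.0, x, y))            # residuals -1, -2, -3: 0.5 * 14 = 7.0
```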
GRADIENT DESCENT
Choose initial values of θ0 and θ1 and keep moving in the direction of steepest descent (downhill on the surface J(θ0, θ1)). The step size is controlled by a parameter (the learning rate), whose choice is important.
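A minimal sketch of batch gradient descent for the two-parameter model; the learning rate, step count, and data below are illustrative choices, not from the slides:

```python
import numpy as np

# Batch gradient descent on J(theta) = (1/2) * sum_i (theta0 + theta1*x_i - y_i)^2.
# dJ/dtheta0 = sum_i (h(x_i) - y_i);  dJ/dtheta1 = sum_i (h(x_i) - y_i) * x_i.
# alpha (the step size / learning rate), the step count, and the data are illustrative.

def gradient_descent(x, y, alpha=0.01, steps=5000):
    theta0, theta1 = 0.0, 0.0            # initial values
    for _ in range(steps):
        err = theta0 + theta1 * x - y
        theta0 -= alpha * np.sum(err)        # step along -dJ/dtheta0
        theta1 -= alpha * np.sum(err * x)    # step along -dJ/dtheta1
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                # true line: slope 2, intercept 1
t0, t1 = gradient_descent(x, y)
print(round(t0, 3), round(t1, 3))   # converges near (1.0, 2.0)
```

Too large an alpha makes the updates overshoot and diverge; too small an alpha makes convergence slow.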
MODEL SELECTION
$g(x) = w_1 x + w_0$
Life is not always this simple; non-linear regression, e.g.:
$g(x) = w_2 x^2 + w_1 x + w_0$
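One way to see why the linear model falls short on curved data is to fit both model classes and compare their errors. This sketch uses `np.polyfit` as a stand-in for the least-squares fit; the coefficients generating the data are illustrative:

```python
import numpy as np

# Fit a line and a quadratic to data generated from a quadratic
# g(x) = w2*x^2 + w1*x + w0 (coefficients below are illustrative).

x = np.linspace(-3, 3, 50)
y = 1.5 * x**2 - 2.0 * x + 0.5        # noiseless quadratic data

linear = np.polyfit(x, y, deg=1)      # least-squares line: w1, w0
quad = np.polyfit(x, y, deg=2)        # least-squares quadratic: w2, w1, w0

sse_linear = np.sum((np.polyval(linear, x) - y) ** 2)
sse_quad = np.sum((np.polyval(quad, x) - y) ** 2)
print(sse_linear > sse_quad)          # → True: the line cannot fit the curvature
```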
MODEL SELECTION
Inductive Bias
The set of assumptions we make to make learning possible is called the inductive bias of the learning algorithm.
Examples:
Choosing the hypothesis class – rectangle
Regression – assuming the function is linear
GENERALIZATION
Generalization: how well a model performs on new data.
Overfitting: the chosen hypothesis is too complex. For example: fitting a 3rd-order polynomial to linear data.
Underfitting: the chosen hypothesis is too simple. For example: fitting a line to a quadratic function.
CROSS VALIDATION
SUMMARY
Model: $h_\theta(x)$ or $h(x \mid \theta)$
Loss function: $E(\theta \mid \mathcal{X}) = J(\theta) = \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$
Optimization: $\theta^* = \arg\min_\theta E(\theta \mid \mathcal{X})$
COVARIANCE
$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$
cov(X,Y) > 0: X and Y are positively correlated
cov(X,Y) < 0: X and Y are inversely correlated
cov(X,Y) = 0: X and Y are uncorrelated (zero covariance does not by itself imply independence)
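The sample-covariance formula above translates directly to code; the data values are illustrative:

```python
import numpy as np

# Sample covariance: cov(x, y) = sum_i (x_i - mean(x)) * (y_i - mean(y)) / (n - 1).
# Data values are illustrative.

def sample_cov(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(sample_cov(x, y))   # positive: x and y move together
```

NumPy's `np.cov(x, y)[0, 1]` computes the same quantity, since it also uses the n − 1 denominator by default.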
CORRELATION COEFFICIENT
Pearson’s correlation coefficient is standardized covariance (unitless):
$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}$
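The standardization can be sketched as follows; because cov and var share the same n − 1 denominator, those factors cancel. The data values are illustrative:

```python
import numpy as np

# Pearson's r = cov(x, y) / sqrt(var(x) * var(y)). With the same (n - 1)
# denominator in cov and var, the factors cancel. Data are illustrative.

def pearson_r(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.sum((x - x.mean()) * (y - y.mean()))
    return cov / np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [2.0, 4.0, 6.0, 8.0]))   # close to +1: perfectly linear
print(pearson_r(x, [8.0, 6.0, 4.0, 2.0]))   # close to -1: perfectly inverse
```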
CORRELATION COEFFICIENT
Measures the relative strength of the linear relationship between two variables.
Unitless; ranges from –1 to +1.
The closer to +1, the stronger the positive linear relationship.
The closer to –1, the stronger the negative linear relationship.
The closer to 0, the weaker the linear relationship.
CORRELATION COEFFICIENT
(Scatter plots illustrating different correlation values: r = –0.8, r = –0.6, r = +0.8, r = +0.2)

CORRELATION COEFFICIENT
Strong relationships vs. weak relationships (scatter-plot examples)
ACKNOWLEDGEMENTS
Machine Intelligence, Dr M. Hanif, UET, Lahore
Machine Learning, Andrew Ng – Stanford University
Lecture Slides, Introduction to Machine Learning