
PATTERN RECOGNITION

1
SUPERVISED LEARNING
CLASSIFICATION

2
LEARNING A CLASS FROM EXAMPLES
• Class C of a "family car"
• Prediction: Is car x a family car?
• Knowledge extraction: What do people expect from a family car?
• Positive (+) and negative (–) examples
• Input representation:
  x1: price, x2: engine power

3
TRAINING SET X
For each car:

$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad r = \begin{cases} 1 & \text{if } x \text{ is positive} \\ 0 & \text{if } x \text{ is negative} \end{cases}$

For N training examples:

$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$

4
CLASS C

$(p_1 \le \text{price} \le p_2)$ AND $(e_1 \le \text{engine power} \le e_2)$

For suitable values of p1, p2, e1 and e2, class C is defined by a rectangle in the price–engine power space.

5
CLASS C

$(p_1 \le \text{price} \le p_2)$ AND $(e_1 \le \text{engine power} \le e_2)$

This equation fixes the hypothesis class H – the set of axis-aligned rectangles.
The learning algorithm finds a particular hypothesis h ∈ H that approximates C as closely as possible.

• The expert defines the hypothesis class
• The algorithm finds the parameters
6
HYPOTHESIS CLASS H

$h(x) = \begin{cases} 1 & \text{if } h \text{ classifies } x \text{ as positive} \\ 0 & \text{if } h \text{ classifies } x \text{ as negative} \end{cases}$

Training error: the predictions of h which do not match the required values in $\mathcal{X}$

$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(x^t) \ne r^t\big)$
7
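Below is a minimal NumPy sketch (my own illustration, not from the slides) of a rectangle hypothesis and its training error E(h | X); the toy prices, engine powers and the bounds p1, p2, e1, e2 are made-up values.

```python
import numpy as np

def h(x, p1, p2, e1, e2):
    """Rectangle hypothesis: 1 if (p1 <= price <= p2) and (e1 <= power <= e2)."""
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

def training_error(X, r, p1, p2, e1, e2):
    """E(h | X): number of training examples the hypothesis misclassifies."""
    return sum(h(x, p1, p2, e1, e2) != label for x, label in zip(X, r))

# Toy training set: (price, engine power) pairs with labels r^t
X = np.array([[15.0, 100.0], [18.0, 120.0], [40.0, 300.0]])
r = np.array([1, 1, 0])
print(training_error(X, r, p1=10, p2=25, e1=80, e2=150))  # 0 misclassifications here
```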
HYPOTHESIS CLASS H – How to read?

$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(x^t) \ne r^t\big)$

Read as: the error of hypothesis h given the training set $\mathcal{X}$.

8
S, G, AND THE VERSION SPACE

• Most specific hypothesis, S
• Most general hypothesis, G
• Any h ∈ H between S and G is consistent, and together they make up the version space

9
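As a further illustration (my own, not from the slides), for the rectangle hypothesis class the most specific hypothesis S can be taken as the tightest axis-aligned rectangle enclosing the positive examples; a minimal sketch under that assumption:

```python
import numpy as np

def most_specific_rectangle(X, r):
    """S: the tightest axis-aligned rectangle containing all positive examples."""
    positives = X[r == 1]          # rows labelled positive
    lower = positives.min(axis=0)  # (p1, e1)
    upper = positives.max(axis=0)  # (p2, e2)
    return lower, upper

X = np.array([[15.0, 100.0], [18.0, 120.0], [40.0, 300.0], [12.0, 90.0]])
r = np.array([1, 1, 0, 1])
print(most_specific_rectangle(X, r))  # tightest rectangle around the positive cars
```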
MULTIPLE CLASSES

Classes $C_i$, $i = 1, \dots, K$, with training set $\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$ where

$r_i^t = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}$

Train K hypotheses $h_i(x)$, $i = 1, \dots, K$:

$h_i(x^t) = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}$

10
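A minimal sketch (my own, not from the slides) of building the K one-vs-rest label vectors $r_i^t$ from a vector of class indices:

```python
import numpy as np

def one_vs_rest_labels(y, K):
    """Return an (N, K) matrix where column i holds r_i^t = 1 iff x^t belongs to class i."""
    N = len(y)
    R = np.zeros((N, K), dtype=int)
    R[np.arange(N), y] = 1
    return R

y = np.array([0, 2, 1, 2])        # class indices for 4 examples, K = 3
print(one_vs_rest_labels(y, K=3))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
```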
MULTIPLE CLASSES

Classes $C_i$, $i = 1, \dots, K$: a K-class problem is treated as K two-class problems.

• Positive examples for one class (e.g. Luxury Sedan)
• All the rest are negative examples

11
LINEAR REGRESSION

12
EXAMPLE

David Beckham: 1.83 m    Brad Pitt: 1.83 m    George Bush: 1.81 m
Victoria Beckham: 1.68 m    Angelina Jolie: 1.70 m    Laura Bush: ?

• Goal: predict the height of the wife in a couple, based on the husband's height
• Response (outcome or dependent) variable (Y): height of the wife
• Predictor (explanatory or independent) variable (X): height of the husband

13
WHAT IS LINEAR
• Remember this?

14
WHAT IS LINEAR
• A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

15
EXAMPLE
• Dataset giving the living areas and prices of 50 houses

16
EXAMPLE
• We can plot this data.
• Given data like this, how can we learn to predict the prices of other houses as a function of the size of their living areas?

17
NOTATIONS
• The "input" variable – x(i) (living area in this example)
• The "output" or target variable that we are trying to predict – y(i) (price)
• A pair (x(i), y(i)) is called a training example
• A list of m training examples {(x(i), y(i)); i = 1, . . . , m} is called a training set
• X denotes the space of input values, and Y the space of output values

18
REGRESSION
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.

19
CHOICE OF HYPOTHESIS
• Decision: how to represent the hypothesis h
• For linear regression we assume that the hypothesis is linear:

$h(x) = \theta_0 + \theta_1 x$

20
HYPOTHESIS
• Generally we'll have more than one input feature, e.g. x1 = living area, x2 = # of bedrooms:

$h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

21
HYPOTHESIS
• Hypothesis: $h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
• To show the dependence on θ, write

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$   OR   $h(x \mid \theta) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

• This is the price that the hypothesis predicts for a given house with living area x1 and number of bedrooms x2.

22
HYPOTHESIS
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

• For conciseness, define $x_0 = 1$, so that

$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 = \sum_{i=0}^{2} \theta_i x_i$

• For n features:

$h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$

• The θs are called the parameters and are real numbers. It is the job of the learning algorithm to find or learn these parameters.
23
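A minimal NumPy sketch (my own, not from the slides) of the vectorized hypothesis $\theta^T x$ with the added $x_0 = 1$ term; the parameter values and house features are made up:

```python
import numpy as np

def h(theta, x):
    """Linear hypothesis theta^T x, where x already includes x0 = 1."""
    return theta @ x

theta = np.array([50.0, 0.1, 20.0])   # [theta_0, theta_1, theta_2] (made-up values)
x = np.array([1.0, 2104.0, 3.0])      # [x0 = 1, living area, # of bedrooms]
print(h(theta, x))                    # predicted price (in made-up units)
```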
CHOOSING THE REGRESSION LINE

• Which of these lines to choose?

[Figure: two scatter plots of Y against X, each with a different candidate regression line]

24
y h ( x )  0  1 x

CHOSING THE REGRESSION LINE


The predicted value is:
yˆ i h ( xi )  0  1 xi

Y The true value for xi is yi

yˆ i
ˆ i  yi
Error or residual y
yi

Consider this point xi

xi X 25
CHOOSING THE REGRESSION LINE

• How to choose the best-fit line? In other words: how to choose the θs?

$\min_\theta \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$

• Minimize the sum of the squared (why squared?) distances of the points ($y_i$'s) from the line over the m training examples.
26
CHOOSING THE REGRESSION LINE

$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$

• The squared term is the difference between what the hypothesis predicted and what the actual value is; squaring means we don't get negative values.
• The sum runs over the m training examples; the factor 1/2 is there to simplify later calculations.
• Find the θ which minimizes this expression.
27
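A minimal NumPy sketch (my own, not from the slides) of the cost $J(\theta)$ above; the toy living areas and prices are made-up numbers:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/2 * sum of squared errors over the m training examples."""
    residuals = X @ theta - y   # h_theta(x^(i)) - y^(i) for every example
    return 0.5 * np.sum(residuals ** 2)

# Toy data: each row of X is [x0 = 1, living area]; y holds prices (made-up numbers)
X = np.array([[1.0, 2104.0], [1.0, 1600.0], [1.0, 2400.0]])
y = np.array([400.0, 330.0, 369.0])
print(cost(np.array([0.0, 0.15]), X, y))
```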
GRADIENT DESCENT

$\min_\theta J(\theta)$

• Choose initial values of θ0 and θ1 and keep moving in the direction of steepest descent on the surface J(θ0, θ1)

28
GRADIENT DESCENT
• Choose initial values of θ0 and θ1 and keep moving in the direction of steepest descent
• The step size is controlled by a parameter called the learning rate
• The starting point is important (on a general cost surface, different starting points can end up in different minima)
29
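A minimal NumPy sketch (my own, not from the slides) of batch gradient descent on J(θ); the learning rate, iteration count and toy data are illustrative choices:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for linear regression with J(theta) = 1/2 * sum of squared errors."""
    theta = np.zeros(X.shape[1])      # starting point: all parameters at 0
    for _ in range(n_iters):
        residuals = X @ theta - y     # h_theta(x^(i)) - y^(i)
        grad = X.T @ residuals        # gradient of J(theta)
        theta -= alpha * grad         # step in the direction of steepest descent
    return theta

# Toy data generated from y = 1 + 2x plus noise; features kept small so alpha = 0.01 converges
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
X = np.column_stack([np.ones_like(x), x])   # add x0 = 1
y = 1.0 + 2.0 * x + rng.normal(0, 0.05, size=50)
print(gradient_descent(X, y))               # approximately [1, 2]
```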
MODEL SELECTION

• Life is not as simple as the linear model $g(x) = w_1 x + w_0$
• Non-linear regression, e.g. a higher-order polynomial: $g(x) = w_2 x^2 + w_1 x + w_0$
30

MODEL SELECTION
• Inductive bias
  • The set of assumptions we make to make learning possible is called the inductive bias of the learning algorithm.
  • Examples:
    • Classification – choosing the hypothesis class (rectangles)
    • Regression – assuming the function is linear
• Learning – we need to choose a bias
• How to choose the right bias? Model selection

31
GENERALIZATION
• Generalization: how well a model performs on new data
• Overfitting:
  • The chosen hypothesis is too complex
  • For example: fitting a 3rd-order polynomial to linear data
• Underfitting:
  • The chosen hypothesis is too simple
  • For example: fitting a line to a quadratic function

32
CROSS VALIDATION
• To estimate the generalization error, we need data unseen during training. We split the data as:
  • Training set (50%)
  • Validation set (25%)
  • Test (publication) set (25%)
• Choose the hypothesis that is best on the validation set – cross validation (a splitting sketch follows this slide)
33
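A minimal sketch (my own, not from the slides) of the 50/25/25 split described above:

```python
import numpy as np

def split_data(X, y, seed=0):
    """Shuffle and split the data into 50% training, 25% validation, 25% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = len(y) // 2
    n_val = len(y) * 3 // 4
    train, val, test = idx[:n_train], idx[n_train:n_val], idx[n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

X = np.arange(40, dtype=float).reshape(20, 2)    # 20 toy examples with 2 features
y = np.arange(20, dtype=float)
train, val, test = split_data(X, y)
print(len(train[1]), len(val[1]), len(test[1]))  # 10 5 5
```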
CROSS VALIDATION
• Example: finding the right order of polynomial in regression (see the sketch after this slide)
  • Use the training set to estimate the coefficients
  • Calculate the errors on the validation set
  • Choose the order with the least validation error
• Question: what is the expected error of the chosen model?
  • We can NOT use the validation error
  • The validation data has been used to choose the model – it is effectively part of the training data
  • Use the TEST data set instead

34
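A minimal sketch (my own, not from the slides) of picking the polynomial order by validation error, using NumPy's polyfit/polyval for the fitting:

```python
import numpy as np

def choose_order(x_tr, y_tr, x_val, y_val, max_order=5):
    """Fit polynomials of increasing order on the training set; pick the lowest validation error."""
    best_order, best_err = None, np.inf
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(x_tr, y_tr, deg=order)   # estimate coefficients on training data
        val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
        if val_err < best_err:
            best_order, best_err = order, val_err
    return best_order

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = 1 - 2 * x + 3 * x**2 + rng.normal(0, 0.1, size=60)   # quadratic data plus noise
print(choose_order(x[:40], y[:40], x[40:], y[40:]))       # typically 2
```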
SUMMARY
• Model: $h_\theta(x)$ or $h(x \mid \theta)$
• Loss function: $E(\theta \mid \mathcal{X}) = J(\theta) = \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$
• Optimization: $\min_\theta E(\theta \mid \mathcal{X})$
35
COVARIANCE

$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$

• cov(X,Y) > 0: X and Y are positively correlated
• cov(X,Y) < 0: X and Y are inversely correlated
• cov(X,Y) = 0: X and Y are uncorrelated (independent variables have zero covariance, but zero covariance does not by itself imply independence)

36
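A minimal sketch (my own, not from the slides) of the sample covariance formula above; the heights are made-up numbers:

```python
import numpy as np

def covariance(x, y):
    """Sample covariance with the n - 1 denominator, matching the formula above."""
    x_bar, y_bar = x.mean(), y.mean()
    return np.sum((x - x_bar) * (y - y_bar)) / (len(x) - 1)

x = np.array([1.68, 1.70, 1.75, 1.60, 1.72])   # made-up wife heights (m)
y = np.array([1.80, 1.83, 1.85, 1.70, 1.82])   # made-up husband heights (m)
print(covariance(x, y))
print(np.cov(x, y)[0, 1])                       # NumPy's built-in agrees
```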
CORRELATION COEFFICIENT
• Pearson's correlation coefficient is standardized covariance (unitless):

$r = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}$

37
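A minimal sketch (my own, not from the slides) of Pearson's r; it reuses the made-up heights from the covariance example:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return np.sum(x_c * y_c) / np.sqrt(np.sum(x_c ** 2) * np.sum(y_c ** 2))

x = np.array([1.68, 1.70, 1.75, 1.60, 1.72])   # same made-up heights as above
y = np.array([1.80, 1.83, 1.85, 1.70, 1.82])
print(pearson_r(x, y))
print(np.corrcoef(x, y)[0, 1])                  # NumPy's built-in agrees
```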
CORRELATION COEFFICIENT
• Measures the relative strength of the linear relationship between two variables
• Unit-less
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to +1, the stronger the positive linear relationship
• The closer to 0, the weaker any linear relationship
38
CORRELATION COEFFICIENT

[Figure: four scatter plots of Y against X illustrating r = –0.8, r = –0.6, r = +0.8 and r = +0.2]

39
CORRELATION COEFFICIENT

[Figure: scatter plots contrasting strong relationships with weak relationships]

40
ACKNOWLEDGEMENTS
• Machine Intelligence, Dr M. Hanif, UET, Lahore
• Machine Learning, Andrew Ng – Stanford University
• Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press.

41
