Cours FLD

The document discusses discriminant functions used for classification. It begins by explaining that a classifier computes multiple discriminant functions and selects the class corresponding to the largest function value. It then derives the optimal discriminant function as the log probability of the class conditional density plus the log prior. For normal distributions, the discriminant function takes a quadratic form based on the mean and covariance of each class. The document examines special cases where the covariances are equal or proportional, showing the discriminant functions become linear. It provides examples calculating the decision boundaries for two-dimensional data. In general, with arbitrary covariances, the discriminant functions are quadratic and decision boundaries are quadratic surfaces.


Discriminant Functions

Master MLSD - Université Paris Cité

[email protected]

1 Université Paris Descartes


Discriminant Functions
▪ A classifier can be viewed as a network that computes
m discriminant functions g1(x), …, gm(x) and selects the
category corresponding to the largest discriminant
[Figure: network view — the features x1, …, xd feed into the m discriminant functions g1(x), …, gm(x); the class giving the maximum discriminant is selected]

▪ Each gi(x) can be replaced by f(gi(x)) for any monotonically
increasing function f; the classification results are unchanged
20
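A minimal sketch (not from the slides) of the idea above: classification as an argmax over discriminant functions. The centers and discriminants below are made-up placeholders.

import numpy as np

def classify(x, discriminants):
    """Evaluate every discriminant g_i(x) and return the index of the largest."""
    scores = [g(x) for g in discriminants]
    return int(np.argmax(scores))

# Placeholder discriminants: negative squared distance to an assumed class "center".
centers = [np.array([0.0, 0.0]), np.array([3.0, 1.0]), np.array([-1.0, 4.0])]
discriminants = [lambda x, m=m: -np.sum((x - m) ** 2) for m in centers]

print(classify(np.array([2.5, 0.5]), discriminants))  # -> 1 (closest to centers[1])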
Discriminant Functions
▪ The minimum error-rate classification is achieved by
the discriminant function
    gi(x) = P(ci|x) = P(x|ci)P(ci)/P(x)
▪ Since P(x) does not depend on the class, an
equivalent discriminant function is
    gi(x) = P(x|ci)P(ci)
▪ For the normal density it is convenient to take logarithms.
Since the logarithm is a monotonically increasing
function, an equivalent discriminant function is
    gi(x) = ln P(x|ci) + ln P(ci)

21
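A hedged sketch (not part of the slides) of this log-discriminant for Gaussian class-conditional densities, using scipy's multivariate normal; all numbers are made up for illustration.

import numpy as np
from scipy.stats import multivariate_normal

def log_discriminant(x, mean, cov, prior):
    """g_i(x) = ln p(x|c_i) + ln P(c_i) for a Gaussian class-conditional density."""
    return multivariate_normal(mean=mean, cov=cov).logpdf(x) + np.log(prior)

# Hypothetical two-class decision.
x = np.array([1.0, 2.0])
g1 = log_discriminant(x, mean=[0.0, 0.0], cov=np.eye(2), prior=0.25)
g2 = log_discriminant(x, mean=[2.0, 2.0], cov=np.eye(2), prior=0.75)
print("assign x to class", 1 if g1 > g2 else 2)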
Discriminant Functions for the Normal Density
▪ Suppose that for class ci the class-conditional density
p(x|ci) is N(μi, Σi)
    p(x \mid c_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i)\right\}

▪ Discriminant function gi(x) = ln P(x|ci)+ ln P(ci)

▪ Plugging in p(x|ci) and P(ci) gives


    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

The term -\frac{d}{2}\ln 2\pi is constant for all i, so it can be dropped:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)
22
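A small sketch (not from the slides) that evaluates this discriminant directly with NumPy; the parameters are made up. The -d/2·ln 2π term shifts every g_i by the same amount, so it never changes which class attains the maximum.

import numpy as np

def g(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - d/2 ln(2*pi) - 1/2 ln|Sigma| + ln P(c_i)."""
    d = len(mu)
    diff = x - mu
    Sigma_inv = np.linalg.inv(Sigma)
    return (-0.5 * diff @ Sigma_inv @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Hypothetical class: mean at the origin, identity covariance, prior 0.5.
x = np.array([0.5, -0.5])
print(g(x, np.array([0.0, 0.0]), np.eye(2), 0.5))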
Case Σi = σ²I
▪ That is, each covariance matrix is the identity scaled by σ²:
    Σi = σ²I
▪ In this case the features x1, x2, …, xd are independent,
with different means and equal variance σ²

23
Case Σi = σ²I
▪ Discriminant function
    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)
▪ det(Σi) = σ^{2d} and Σi⁻¹ = (1/σ²)I
▪ Can simplify the discriminant function: the term -\frac{1}{2}\ln\sigma^{2d} is constant for all i and can be dropped, giving
    g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(c_i)

24
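A hedged check (not from the slides, made-up numbers) that the simplified form above and the full Gaussian discriminant differ only by a class-independent constant, so they always select the same class.

import numpy as np

def g_full(x, mu, sigma2, prior):
    """Full Gaussian discriminant with Sigma_i = sigma^2 I (class-independent terms kept)."""
    d = len(mu)
    return (-np.sum((x - mu) ** 2) / (2 * sigma2)
            - 0.5 * d * np.log(2 * np.pi * sigma2)
            + np.log(prior))

def g_simple(x, mu, sigma2, prior):
    """Simplified discriminant: -||x - mu||^2 / (2 sigma^2) + ln P(c_i)."""
    return -np.sum((x - mu) ** 2) / (2 * sigma2) + np.log(prior)

x, sigma2 = np.array([1.0, 1.0]), 0.5
mus, priors = [np.array([0.0, 0.0]), np.array([2.0, 0.0])], [0.25, 0.75]
full = [g_full(x, m, sigma2, p) for m, p in zip(mus, priors)]
simple = [g_simple(x, m, sigma2, p) for m, p in zip(mus, priors)]
print(np.argmax(full) == np.argmax(simple))  # True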
Case Σi = σ²I: Geometric Interpretation

If ln P(ci) = ln P(cj) for all i, j, then
    g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2}
and x is assigned to the class with the nearest mean
(Voronoi diagram: points in each cell are closer to the mean
in that cell than to any other mean).

If ln P(ci) ≠ ln P(cj), then
    g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(c_i)

[Figure: decision regions for c1, c2, c3 around the means μ1, μ2, μ3 — the Voronoi partition for equal priors (left) and the shifted regions for unequal priors (right)]
25
Case Σi = σ²I

Expanding the squared norm,
    g_i(x) = -\frac{1}{2\sigma^2}\left(x^t x - 2\mu_i^t x + \mu_i^t \mu_i\right) + \ln P(c_i)
The term x^t x is constant for all classes, so it can be dropped:
    g_i(x) = \frac{1}{\sigma^2}\mu_i^t x - \frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(c_i)

The discriminant function is linear

26
Case Σi = σ²I

    g_i(x) = w_i^t x + w_{i0}, \quad w_i = \frac{1}{\sigma^2}\mu_i, \quad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(c_i)

where w_{i0} is constant in x and w_i^t x is linear in x:
    w_i^t x = \sum_{j=1}^{d} w_{ij} x_j

▪ Thus the discriminant function is linear

▪ Therefore the decision boundaries
gi(x) = gj(x) are linear:
▪ lines if x has dimension 2
▪ planes if x has dimension 3
▪ hyper-planes if x has dimension larger than 3
27
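A small sketch (assumed parameters, not the slide's) of how the boundary gi(x) = gj(x) becomes a line in 2D once the linear weights above are known: it is simply (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0.

import numpy as np

def linear_weights(mu, sigma2, prior):
    """w_i = mu_i / sigma^2, w_i0 = -||mu_i||^2 / (2 sigma^2) + ln P(c_i)."""
    return mu / sigma2, -mu @ mu / (2 * sigma2) + np.log(prior)

# Hypothetical pair of classes with equal priors.
sigma2 = 1.0
w1, w10 = linear_weights(np.array([0.0, 0.0]), sigma2, 0.25)
w2, w20 = linear_weights(np.array([2.0, 2.0]), sigma2, 0.25)
a, b = w1 - w2, w10 - w20
print(f"boundary: {a[0]:.2f} x1 + {a[1]:.2f} x2 + {b:.2f} = 0")  # x1 + x2 = 2 here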
Case Σi = σ²I: Example
▪ 3 classes, each a 2-dimensional Gaussian with Σi = σ²I and priors
    P(c1) = P(c2) = 1/4 and P(c3) = 1/2

▪ The discriminant function is
    g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(c_i)

▪ Plug in the parameters for each class

28
Case Σi = σ²I: Example
▪ Need to find where gi(x) > gj(x) for each pair i, j = 1, 2, 3
▪ Can be done by solving gi(x) = gj(x) for i, j = 1, 2, 3
▪ Let's take g1(x) = g2(x) first

▪ Simplifying gives a line equation (the decision boundary between c1 and c2)
29
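The slide's numeric parameters are not reproduced above, so here is an illustrative sketch with assumed means (priors 1/4, 1/4, 1/2 as in the example): it prints each pairwise boundary line gi(x) = gj(x) and the point where g1 = g2 = g3.

import numpy as np

def weights(mu, sigma2, prior):
    """Linear weights for the Sigma_i = sigma^2 I case (see the derivation above)."""
    return mu / sigma2, -mu @ mu / (2 * sigma2) + np.log(prior)

sigma2 = 1.0
mus = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([2.0, 3.0])]  # assumed means
priors = [0.25, 0.25, 0.5]
ws = [weights(m, sigma2, p) for m, p in zip(mus, priors)]

# Pairwise boundaries g_i(x) = g_j(x): (w_i - w_j)^t x + (w_i0 - w_j0) = 0.
for i, j in [(0, 1), (1, 2), (0, 2)]:
    a = ws[i][0] - ws[j][0]
    b = ws[i][1] - ws[j][1]
    print(f"g{i+1}=g{j+1}: {a[0]:.2f} x1 + {a[1]:.2f} x2 = {-b:.2f}")

# The point where g1 = g2 = g3: intersection of two boundary lines.
A = np.array([ws[0][0] - ws[1][0], ws[1][0] - ws[2][0]])
c = -np.array([ws[0][1] - ws[1][1], ws[1][1] - ws[2][1]])
print("triple point:", np.linalg.solve(A, c))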
Case Σi = σ²I: Example
▪ Next solve g2(x) = g3(x)

▪ Then solve g1(x) = g3(x)

▪ Finally solve g1(x) = g2(x) = g3(x)

30
Case Σi = σ²I: Example
▪ Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2

[Figure: decision regions for c1, c2, c3 — the lines connecting the means are perpendicular to the decision boundaries]

31
Case Σi = Σ

▪ The covariance matrices are equal but otherwise arbitrary: Σi = Σ

▪ In this case the features x1, x2, …, xd are not
necessarily independent

32
Case Σi = Σ
▪ Discriminant function
    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma| + \ln P(c_i)
The term -\frac{1}{2}\ln|\Sigma| is constant for all classes and can be dropped
▪ The discriminant function becomes
    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1}(x-\mu_i) + \ln P(c_i)
where (x-\mu_i)^t \Sigma^{-1}(x-\mu_i) is the squared Mahalanobis distance

▪ Mahalanobis distance:
    \|x-y\|_{\Sigma^{-1}}^2 = (x-y)^t \Sigma^{-1}(x-y)
▪ If Σ = I, the Mahalanobis distance becomes the usual
Euclidean distance:
    \|x-y\|^2 = (x-y)^t (x-y)
33
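A quick sketch (not from the slides, made-up numbers) of the Mahalanobis distance defined above, showing that it reduces to the squared Euclidean distance when Σ = I.

import numpy as np

def mahalanobis_sq(x, y, Sigma):
    """Squared Mahalanobis distance (x - y)^t Sigma^{-1} (x - y)."""
    diff = x - y
    return diff @ np.linalg.inv(Sigma) @ diff

x, y = np.array([2.0, 1.0]), np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

print(mahalanobis_sq(x, y, Sigma))      # distance shaped by the covariance
print(mahalanobis_sq(x, y, np.eye(2)))  # with Sigma = I this is ||x - y||^2 = 5.0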
Euclidean vs. Mahalanobis Distances

Euclidean:  \|x-\mu\|^2 = (x-\mu)^t (x-\mu)
Mahalanobis:  \|x-\mu\|_{\Sigma^{-1}}^2 = (x-\mu)^t \Sigma^{-1}(x-\mu)

[Figure: points x at equal Euclidean distance from μ lie on a circle; points x at equal Mahalanobis distance from μ lie on an ellipse whose axes are the eigenvectors of Σ — Σ stretches circles into ellipses]
34
Case Σi = Σ: Geometric Interpretation

If ln P(ci) = ln P(cj) for all i, j, then
    g_i(x) = -\frac{1}{2}\|x-\mu_i\|_{\Sigma^{-1}}^2
and x is assigned to the class with the nearest mean
(points in each cell are closer to the mean in that cell
than to any other mean under the Mahalanobis distance).

If ln P(ci) ≠ ln P(cj), then
    g_i(x) = -\frac{1}{2}\|x-\mu_i\|_{\Sigma^{-1}}^2 + \ln P(c_i)

[Figure: decision regions for c1, c2, c3 around the means μ1, μ2, μ3 in both cases]
35
Case Σi = Σ
▪ Can simplify the discriminant function: expanding the quadratic form, the term x^t \Sigma^{-1} x is constant for all classes and can be dropped, giving
    g_i(x) = w_i^t x + w_{i0}, \quad w_i = \Sigma^{-1}\mu_i, \quad w_{i0} = -\frac{1}{2}\mu_i^t \Sigma^{-1}\mu_i + \ln P(c_i)

▪ Thus in this case the discriminant is also linear

36
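A hedged check (assumed parameters) that the linear form above and the Mahalanobis-based discriminant differ only by -1/2 x^t Σ^{-1} x, which is the same for every class, so the argmax is unchanged.

import numpy as np

def g_mahalanobis(x, mu, Sigma_inv, prior):
    """-1/2 (x - mu)^t Sigma^{-1} (x - mu) + ln P(c_i)."""
    diff = x - mu
    return -0.5 * diff @ Sigma_inv @ diff + np.log(prior)

def g_linear(x, mu, Sigma_inv, prior):
    """w_i^t x + w_i0 with w_i = Sigma^{-1} mu_i, w_i0 = -1/2 mu_i^t Sigma^{-1} mu_i + ln P(c_i)."""
    w = Sigma_inv @ mu
    w0 = -0.5 * mu @ Sigma_inv @ mu + np.log(prior)
    return w @ x + w0

Sigma_inv = np.linalg.inv(np.array([[2.0, 0.3], [0.3, 1.0]]))
x = np.array([1.0, -1.0])
mus, priors = [np.array([0.0, 0.0]), np.array([2.0, 1.0])], [0.25, 0.75]
gm = [g_mahalanobis(x, m, Sigma_inv, p) for m, p in zip(mus, priors)]
gl = [g_linear(x, m, Sigma_inv, p) for m, p in zip(mus, priors)]
print(np.allclose(np.diff(gm), np.diff(gl)))  # True: the class-to-class differences agree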
Case Σi = Σ: Example
▪ 3 classes, each a 2-dimensional Gaussian sharing the same covariance Σ

▪ Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3

37
Case Σi = Σ: Example
▪ Let's solve the general case first: the boundary gi(x) = gj(x) is
    (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0
where (w_i - w_j)^t is a row vector and (w_{i0} - w_{j0}) is a scalar

38
Case Σi = Σ: Example

▪ Now substitute for i, j = 1, 2:
    [-2 \;\; 0]\,x = 0  \;\Rightarrow\;  x_1 = 0

▪ Now substitute for i, j = 2, 3:
    [-3.14 \;\; -1.4]\,x = -2.41  \;\Rightarrow\;  3.14\,x_1 + 1.4\,x_2 = 2.41

▪ Now substitute for i, j = 1, 3:
    [-5.14 \;\; -1.43]\,x = -2.41  \;\Rightarrow\;  5.14\,x_1 + 1.43\,x_2 = 2.41
39
Case Σi = Σ: Example
▪ Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2

[Figure: decision regions for c1, c2, c3 — the lines connecting the means are not, in general, perpendicular to the decision boundaries]

40
General Case: Σi arbitrary
▪ The covariance matrix of each class is arbitrary
▪ In this case the features x1, x2, …, xd are not
necessarily independent

41
General Case: Σi arbitrary
▪ From the previous discussion,
    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

▪ This can't be simplified, but we can rearrange it:

42
General Case: Σi arbitrary

    g_i(x) = x^t W_i x + w_i^t x + w_{i0}

where w_{i0} is constant in x, w_i^t x is linear in x, and x^t W_i x is quadratic in x since
    x^t W_i x = \sum_{j=1}^{d}\sum_{k=1}^{d} w_{jk} x_j x_k = \sum_{j,k=1}^{d} w_{jk} x_j x_k

▪ Thus the discriminant function is quadratic

▪ Therefore the decision boundaries gi(x) = gj(x) are
quadratic surfaces (e.g. ellipses and paraboloids)

43
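The slide's boxed formulas for W_i, w_i and w_{i0} are not reproduced above, so the sketch below uses the standard rearrangement (an assumption consistent with the derivation): W_i = -1/2 Σ_i^{-1}, w_i = Σ_i^{-1} μ_i, w_{i0} = -1/2 μ_i^t Σ_i^{-1} μ_i - 1/2 ln|Σ_i| + ln P(c_i). It checks that this reproduces the un-rearranged discriminant.

import numpy as np

def quadratic_params(mu, Sigma, prior):
    """Rearranged form g_i(x) = x^t W_i x + w_i^t x + w_i0 (standard rearrangement, assumed here)."""
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv
    w = Sigma_inv @ mu
    w0 = (-0.5 * mu @ Sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))
    return W, w, w0

def g_direct(x, mu, Sigma, prior):
    """-1/2 (x - mu)^t Sigma^{-1} (x - mu) - 1/2 ln|Sigma| + ln P(c_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))

# Made-up parameters: both forms give the same value.
mu, Sigma, prior = np.array([1.0, 2.0]), np.array([[1.5, 0.2], [0.2, 0.8]]), 0.5
W, w, w0 = quadratic_params(mu, Sigma, prior)
x = np.array([0.3, -0.7])
print(np.isclose(x @ W @ x + w @ x + w0, g_direct(x, mu, Sigma, prior)))  # True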
General Case: Σi arbitrary: Example
▪ 3 classes, each a 2-dimensional Gaussian with its own mean and covariance

▪ Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2

▪ Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3

▪ This requires solving a set of quadratic inequalities in 2 variables
44
General Case: Σi arbitrary: Example

[Figure: quadratic decision boundaries for c1, c2, c3 — the region for c1 appears on both sides of the regions for c2 and c3]
45
Important Points
▪ The Bayes classifier for normally distributed classes
is in general quadratic
▪ If the covariance matrices are equal and proportional to
the identity matrix, the Bayes classifier is linear
▪ If, in addition, the priors on the classes are equal, the Bayes
classifier is the minimum Euclidean distance classifier
▪ If the covariance matrices are equal, the Bayes
classifier is linear
▪ If, in addition, the priors on the classes are equal, the Bayes
classifier is the minimum Mahalanobis distance classifier
▪ Popular classifiers (minimum Euclidean and Mahalanobis
distance) are optimal only if the distribution of the data
is appropriate (normal)

46
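A hedged numerical check (assumed means and covariance, not from the slides) of two of the points above: with a shared covariance and equal priors the Bayes rule picks the class with the smallest Mahalanobis distance, and with Σ = σ²I and equal priors it picks the class with the smallest Euclidean distance.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0]), np.array([1.0, 4.0])]  # assumed means
Sigma = np.array([[1.5, 0.4], [0.4, 0.9]])                                # assumed shared covariance
Sigma_inv = np.linalg.inv(Sigma)
x = rng.normal(size=2)

# Equal priors + shared covariance: Bayes rule = minimum Mahalanobis distance.
bayes = np.argmax([multivariate_normal(m, Sigma).logpdf(x) for m in mus])
maha = np.argmin([(x - m) @ Sigma_inv @ (x - m) for m in mus])
print(bayes == maha)  # True

# Equal priors + Sigma = sigma^2 I: Bayes rule = minimum Euclidean distance.
bayes_iso = np.argmax([multivariate_normal(m, 0.7 * np.eye(2)).logpdf(x) for m in mus])
eucl = np.argmin([np.sum((x - m) ** 2) for m in mus])
print(bayes_iso == eucl)  # True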
