Non-Parametric Classification: Pattern Recognition
Non-parametric classification
The assumption that the probability density function (pdf) that generated the data has some parametric form (e.g. Gaussian) may not hold in many cases.
Non-parametric classifiers do not assume any parametric form for the pdf.
Instead, classification is based on the similarity between data samples.
Given training samples (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn) and a test sample xt:
Class(xt) = class of the nearest neighbour of xt
Problem Statement
Can we LEARN to recognise a rugby player?
Features
Rugby players = short + heavy?
[Scatter plot: height (130-190 cm) vs. weight (60-90 kg)]
Features
Ballet dancers = tall + skinny?
[Scatter plot: height (130-190 cm) vs. weight (60-90 kg)]
Feature Space
Rugby players cluster separately in the space.
[Scatter plot: weight vs. height, showing the two clusters]
Who's this?
Distance Measure
Euclidean distance
d = sqrt((w - w1)^2 + (h - h1)^2)
[Plot: points (w, h) and (w1, h1) in the weight-height plane]
Advantage: Surprisingly good classifier!
Disadvantage: Have to store the entire training set in
memory
Who's this?
Distance Measure
Euclidean distance still works in 3-d, 4-d, 5-d, etc.
d = sqrt((x - x1)^2 + (y - y1)^2 + (z - z1)^2)
x = Height
y = Weight
z = Shoe size
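The nearest-neighbour rule above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code; the sample heights, weights and shoe sizes are made-up values chosen to match the rugby/ballet example.

```python
import math

def euclidean(a, b):
    # d = sqrt(sum_k (a_k - b_k)^2); works in 2-d, 3-d, 4-d, etc.
    return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)))

def nearest_neighbour(train, query):
    # train: list of (feature_vector, label); return the label of the
    # training sample closest to the query point
    _, label = min(train, key=lambda s: euclidean(s[0], query))
    return label

# Hypothetical (height cm, weight kg, shoe size) samples
train = [((190, 110, 46), "rugby"), ((185, 105, 45), "rugby"),
         ((175, 55, 39), "ballet"), ((168, 50, 38), "ballet")]
print(nearest_neighbour(train, (188, 100, 44)))  # -> rugby
```

Note the disadvantage mentioned earlier: `train` (the entire training set) must stay in memory for every query.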
Over-fitting
An Important Concept in Pattern Recognition
Over-fitting
Looks good so far
Oh no! Mistakes!
What happened?
Over-fitting
While an overly complex model may allow perfect
classification of the training samples, it is unlikely to give
good classification of novel patterns
Features
Choosing the wrong features makes classification difficult.
Too many features make it computationally intensive.
Possible features:
- Shoe size
- Height
- Age
- Weight
[Scatter plot: age vs. shoe size]
PR Problem
Now, how is this problem like handwriting recognition?
[Scatter plot: weight vs. height]
Handwriting Recognition
Let's say the axes now represent pixel values.
A two-pixel image
[Plot: pixel 1 value vs. pixel 2 value, each axis 0-255; the image is the point (190, 85)]
Handwriting Recognition
A three-pixel image
[3-D plot: pixel 1, pixel 2 and pixel 3 axes]
Handwriting Recognition
Distances between images
A three-pixel image
[3-D plot: two images as points, with the distance between them]
Handwriting Recognition
A four-pixel image.
Handwriting Recognition
16 x 16 image. How many dimensions?
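A 16 x 16 image is a point in 256-dimensional space, and the same Euclidean distance applies there. A small sketch (the blank/white images are illustrative, not from the slides):

```python
# A 16x16 grayscale image is just a point in 256-dimensional space
rows, cols = 16, 16
image_a = [[0] * cols for _ in range(rows)]    # all-black image
image_b = [[255] * cols for _ in range(rows)]  # all-white image

vec_a = [p for row in image_a for p in row]    # flatten to 256 values
vec_b = [p for row in image_b for p in row]
print(len(vec_a))  # -> 256

# Euclidean distance between the two images
dist = sum((pa - pb) ** 2 for pa, pb in zip(vec_a, vec_b)) ** 0.5
print(dist)  # sqrt(256 * 255^2) = 4080.0
```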
Handwriting Recognition
Distances between digits.
[Pairs of digit images at increasing distances: ? / maybe / probably not]
Which is closest neighbour in N-dimensions?
K-Means Clustering
1. Choose k initial cluster centres (e.g. at random).
2. Assign each sample to its nearest centre.
3. Recompute each centre as the mean of the samples assigned to it.
4. Repeat steps 2-3 until the assignments no longer change.
[Figures: successive iterations of k-means on example data]
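The four steps above can be sketched directly. This is a minimal illustration under simple assumptions (random initialisation from the data, squared Euclidean distance); the example points are made up.

```python
import random

def k_means(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose k initial centres at random from the data
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Step 2: assign every point to its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((pj - cj) ** 2
                                      for pj, cj in zip(p, centroids[c])))
            clusters[i].append(p)
        # Step 3: recompute each centre as the mean of its cluster
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        # Step 4: stop once the centres no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Three hypothetical well-separated groups of 2-D points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
centroids, clusters = k_means(points, k=3)
```

Like all k-means implementations, this can converge to a poor local optimum; in practice it is run several times with different seeds.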
Assignment
Write code to identify the three clusters in the given image.
The image will be emailed to you.
You have to submit the code, and resulting image
by next week.
k-Nearest Neighbour (k-NN) Classification
[Figures: 1-NN and 3-NN decision examples]
The test sample (green circle) should be classified either to the first class of
blue squares or to the second class of red triangles. If k = 3 it is assigned to
the second class because there are 2 triangles and only 1 square inside the
inner circle. If k = 5 it is assigned to the first class (3 squares vs. 2 triangles
inside the outer circle).
Image Source: Wikipedia
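The k = 3 vs. k = 5 behaviour described in the caption can be reproduced with a majority vote over the k nearest neighbours. The coordinates below are made up, arranged so that the two values of k disagree just as in the figure.

```python
from collections import Counter
import math

def knn_classify(train, query, k):
    # Sort training samples by Euclidean distance to the query point
    by_dist = sorted(train, key=lambda s: math.dist(s[0], query))
    # Majority vote among the k nearest neighbours
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Two triangles close to the query, three squares a bit further away
train = [((1.0, 0.0), "triangle"), ((0.0, 1.1), "triangle"),
         ((1.2, 1.2), "square"), ((1.5, 1.5), "square"), ((1.6, 1.4), "square")]
print(knn_classify(train, (0.0, 0.0), k=3))  # -> triangle
print(knn_classify(train, (0.0, 0.0), k=5))  # -> square
```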
Distance Metric
A distance metric D(.,.) is a function that gives a generalized distance between two patterns.
For any three vectors a, b and c, a distance metric must satisfy the following properties:
Non-negativity: D(a, b) >= 0
Reflexivity: D(a, b) = 0 if and only if a = b
Symmetry: D(a, b) = D(b, a)
Triangle inequality: D(a, c) <= D(a, b) + D(b, c)
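The four properties can be spot-checked numerically for the Euclidean distance. This is an illustration on a few example vectors, not a proof.

```python
import math

def D(a, b):
    return math.dist(a, b)  # Euclidean distance

a, b, c = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

assert D(a, b) >= 0                      # non-negativity
assert D(a, a) == 0 and D(a, b) > 0      # reflexivity: D(a,b) = 0 iff a = b
assert D(a, b) == D(b, a)                # symmetry
assert D(a, c) <= D(a, b) + D(b, c)      # triangle inequality
```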
Distance Metric
Euclidean distance possesses all four properties.

Euclidean distance in d dimensions:
D(a, b) = ||a - b|| = sqrt( sum_{k=1}^{d} (a_k - b_k)^2 )

More generally, the Minkowski metric:
L_k(a, b) = ( sum_{i=1}^{d} |a_i - b_i|^k )^(1/k)
(Euclidean distance is the k = 2 case; k = 1 gives the Manhattan distance.)
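A small sketch of the Minkowski metric, showing the familiar k = 1 and k = 2 special cases on an example pair of vectors:

```python
def minkowski(a, b, k):
    # L_k(a, b) = (sum_i |a_i - b_i|^k)^(1/k)
    return sum(abs(ai - bi) ** k for ai, bi in zip(a, b)) ** (1.0 / k)

a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, 1))  # L1 (Manhattan)  -> 7.0
print(minkowski(a, b, 2))  # L2 (Euclidean)  -> 5.0
```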
Distance Metric
Distance metrics may not be invariant to linear
transformations (e.g. scaling)
Distance Metric
Data normalization
If there is a large disparity in the ranges of the full
data in each dimension, a common procedure is to
rescale all the data to equalize such ranges
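One common rescaling is min-max normalisation, which maps every dimension to [0, 1]. A minimal sketch (the height/shoe-size numbers are illustrative):

```python
def rescale(data):
    # Rescale each dimension independently to the [0, 1] range
    dims = list(zip(*data))
    lo = [min(d) for d in dims]
    hi = [max(d) for d in dims]
    return [tuple((x - l) / (h - l) if h > l else 0.0
                  for x, l, h in zip(p, lo, hi)) for p in data]

# Height in cm (~150-200) vs. shoe size (~38-46): very different ranges
data = [(150.0, 38.0), (200.0, 46.0), (175.0, 42.0)]
print(rescale(data))  # -> [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
```

Without this step, the dimension with the larger numeric range (height) would dominate any Euclidean distance.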
Decision Regions
Voronoi Diagram
Given a set of points (referred to as sites or nodes) a
Voronoi diagram is a partition of space into regions,
within which all points are closer to some particular
node than to any other node
Voronoi Editing
Each cell contains one sample, and every location within
the cell is closer to that sample than to any other
sample.
Every query point will be assigned the classification of
the sample within that cell.
Voronoi Editing
Knowledge of this boundary is sufficient to classify new
points.
The boundary itself is rarely computed; many algorithms
seek to retain only those points necessary to generate
an identical boundary.
Voronoi Editing
The boundaries of the Voronoi regions separating
those regions whose nodes are of different class
contribute to a portion of the decision boundary
Voronoi Editing
The nodes of the Voronoi regions whose boundaries
did not contribute to the decision boundary are
redundant and can be safely deleted from the training
set
Voronoi Editing
Points A1 and B1 are well separated. They are preserved by Voronoi editing to maintain a portion of the decision boundary. However, assuming that new points will come from the same distribution as the training set, the portions of the decision boundary remote from the concentration of training points are of lesser importance.
Gabriel Graph
Two points A and B are said to be Gabriel neighbours if
their diametral sphere (i.e. the sphere such that AB is
its diameter) doesn't contain any other points
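The diametral-sphere test has a simple algebraic form: a point c lies inside the sphere with diameter AB exactly when the angle acb exceeds 90 degrees, i.e. when |ac|^2 + |bc|^2 < |ab|^2. A sketch on made-up points:

```python
import math

def gabriel_neighbours(a, b, points):
    # a and b are Gabriel neighbours iff no other point lies inside the
    # sphere having segment ab as its diameter; c is inside that sphere
    # exactly when |ac|^2 + |bc|^2 < |ab|^2
    ab2 = math.dist(a, b) ** 2
    return all(math.dist(a, c) ** 2 + math.dist(b, c) ** 2 >= ab2
               for c in points if c != a and c != b)

pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (5.0, 5.0)]
print(gabriel_neighbours((0.0, 0.0), (2.0, 0.0), pts))  # (1.0, 0.1) is inside -> False
print(gabriel_neighbours((0.0, 0.0), (1.0, 0.1), pts))  # -> True
```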
Wilson Editing
Remove points that do not agree with the
majority of their k nearest neighbours
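Wilson editing as described above can be sketched directly: each sample is tested against the majority vote of its k nearest neighbours among the remaining samples. The two clusters and the single mislabelled point below are made-up example data.

```python
from collections import Counter
import math

def wilson_edit(samples, k=3):
    # Keep only the samples whose label agrees with the majority label
    # of their k nearest neighbours among the other samples
    kept = []
    for i, (x, y) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: math.dist(s[0], x))
        votes = Counter(label for _, label in others[:k])
        if votes.most_common(1)[0][0] == y:
            kept.append((x, y))
    return kept

# Two hypothetical clean clusters plus one mislabelled point at (0.5, 0.5)
samples = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
           ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
           ((0.5, 0.5), "b"),
           ((10.0, 10.0), "b"), ((10.0, 11.0), "b"),
           ((11.0, 10.0), "b"), ((11.0, 11.0), "b")]
edited = wilson_edit(samples, k=3)
print(len(edited))  # the mislabelled point is removed -> 8
```

This cleans up overlapping or mislabelled regions before applying a nearest-neighbour classifier.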
Overlapping classes
[Figures: original data; the consistent set after editing; the minimum consistent set]
References
Chapter 4 of Pattern Classification by Richard O. Duda, Peter E. Hart & David G. Stork
Some material from Chapter 10 of the same book