Block
COMPUTER VISION-II
Unit 11 Object Detection 247
Unit 12 Object Recognition using Supervised Learning Approaches 291
Unit 13 Object Classification using Unsupervised Learning Approaches 321
PROGRAMME DESIGN COMMITTEE
Prof. (Retd.) S.K. Gupta, IIT, Delhi
Prof. Ela Kumar, IGDTUW, Delhi
Prof. T.V. Vijay Kumar, JNU, New Delhi
Prof. Gayatri Dhingra, GVMITM, Sonipat
Mr. Milind Mahajan, Impressico Business Solutions, New Delhi
Sh. Shashi Bhushan Sharma, Associate Professor, SOCIS, IGNOU
Sh. Akshay Kumar, Associate Professor, SOCIS, IGNOU
Dr. P. Venkata Suresh, Associate Professor, SOCIS, IGNOU
Dr. V.V. Subrahmanyam, Associate Professor, SOCIS, IGNOU
Sh. M.P. Mishra, Assistant Professor, SOCIS, IGNOU
Dr. Sudhansh Sharma, Assistant Professor, SOCIS, IGNOU
SOCIS FACULTY
Assistant Professor,
School of Computers and Information Sciences, IGNOU.
PRINT PRODUCTION
Sh. Sanjay Aggarwal
Assistant Registrar, MPDD, IGNOU, New Delhi
June, 2023
© Indira Gandhi National Open University, 2023
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the University's office at Maidan Garhi, New Delhi-110068.
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by MPDD, IGNOU.
Laser Typesetter: Tessa Media & Computers, C-206, Shaheen Bagh, Jamia Nagar, New Delhi-110025
BLOCK 4 INTRODUCTION
This block deals with Object detection, Object recognition, and Object
classification techniques for images.
Unit 11 deals with image segmentation, which includes the detection of edges, lines, boundaries and regions. Various edge and line detection algorithms are discussed. Various techniques of region-based segmentation are discussed, along with boundary detection algorithms.
Unit 12 discusses object recognition using supervised learning approaches, which includes various image classifiers, viz. the Bayesian and minimum distance classifiers; linear and non-linear discriminant functions are also explained.
Unit 13 relates to object classification using unsupervised learning approaches. It starts with an explanation of clustering, along with the need for and applications of clustering. The Hierarchical Clustering and Partition-based Clustering approaches are then discussed.
UNIT 11 OBJECT DETECTION
Structure Page No.
11.1 Introduction 247
Objectives
11.2 Object Detection 248
11.3 Image Segmentation 251
11.3.1 Image Segmentation Techniques
11.4 Edge Detection 260
11.4.1Gradient Operators
11.4.2 Laplacian Operation
11.4.3 Line Detection
11.5 Region Detection 272
11.6 Boundary Detection 281
11.7 Feature Extraction 284
11.8 Summary 287
Objectives
After studying this unit, you should be able to
In short, the basic features of an object class are to be defined and included in a database of object models. Using a feature extraction process, specific features of the object we are looking for are to be identified and matched with the database for identifying the object class.
Object detection is majorly classified as (1) Edge Detection, (2) Region Detection and (3) Boundary Detection.
1. Medical Imaging
systems. Knowledge of the size of the crowd and tracking its motion can be used to monitor traffic intersections. Intelligent walk signal systems can be designed based on the number of people waiting to cross the road. Knowledge of the size of the crowd is helpful in general safety, crowd control and planning urban environments.
4. Security and Surveillance
Security of national assets such as bridges, dams, tunnels, etc. is critical in today's world. Automated smart systems to detect 'suspicious' movements or activities, or to detect left baggage or vehicles, are crucial for safety. Automated face detection systems try to match a criminal's face in a crowded place.
5. License Plate Recognition (LPR)
Automated license plate reading is a very useful and practical approach
as it helps in monitoring existing and illegally acquired license plates.
LPR can be used in private parking management, traffic monitoring,
automatic traffic ticket issuing, automatic toll payment, surveillance and security enforcement. Fig. 6 shows the segmented license plate.
Figure 7 (Source: https://scikit-image.org/docs/stable/auto_examples/applications/plot_thresholding.html)
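The scikit-image page referenced above demonstrates threshold-based segmentation. As a minimal sketch (the sample image and the choice of Otsu's method are illustrative, not taken from this unit), a global threshold can be chosen automatically and applied as follows:

# Minimal sketch of global thresholding with scikit-image.
from skimage import data, filters

image = data.camera()                      # built-in sample grayscale image
t = filters.threshold_otsu(image)          # automatically chosen global threshold
binary = image > t                         # boolean foreground/background mask
print("Otsu threshold:", t)
print("Foreground fraction:", binary.mean())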
The thresholding based segmentation can be further classified as:
6. Normalized histogram:
Canny edge detector
The Canny edge detector works on the fact that for edge detection, there is a tradeoff between noise reduction (smoothing) and edge localisation.
Algorithm
Smooth the image with a Gaussian filter
Compute the gradient magnitude and orientation
Apply non-maximal suppression to the gradient magnitude image
Use hysteresis thresholding to detect and link edges
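The four steps above are all performed inside scikit-image's canny function; the following minimal sketch applies it (the sample image, sigma and the two hysteresis thresholds are illustrative choices):

# Canny edge detection: Gaussian smoothing, gradient computation, non-maximal
# suppression and hysteresis thresholding all happen inside feature.canny().
from skimage import data, feature

image = data.camera()
edges = feature.canny(image,
                      sigma=2.0,            # width of the Gaussian smoothing filter
                      low_threshold=0.05,   # hysteresis: weak-edge threshold
                      high_threshold=0.15)  # hysteresis: strong-edge threshold
print("Number of edge pixels:", int(edges.sum()))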
3. Region-Based Segmentation
• Region Growing
Figure 11 (Source: https://towardsdatascience.com/image-segmentation-part-2-8959b609d268)
Figure 13 (Source: https://towardsdatascience.com/image-segmentation-part-2-8959b609d268)
|Z_max − Z_min| ≤ threshold
where Z_max is the maximum pixel intensity value in a region and Z_min is the minimum pixel intensity value in a region.
Some properties that must be followed in region based segmentation are:
- Completeness: The segmentation must be complete, i.e., ∪ R_i = R (every pixel must be in a region).
- Connectedness: The points of a region must be connected.
- Disjointness: Regions must be disjoint: R_i ∩ R_j = Ø for all i ≠ j.
- Satisfiability: Pixels of a region must satisfy at least one common property P, i.e., P(R_i) = TRUE for all i; any region must satisfy a homogeneity predicate P.
- Segmentability: Different regions satisfy different properties, i.e., P(R_i ∪ R_j) = FALSE; any two adjacent regions cannot be merged into a single region.
Example: Segment the given image by the split and merge algorithm.
E3) Segmentation algorithms are generally based on which two properties of intensity?
51 52 53 59    50 53 150 160
54 52 53 62    51 53 150 180
50 52 53 68    58 55 154 170
55 52 53 55    54 56 156 155
     (a)              (b)
Fig. 15: 2 Sub-Images
Different edge models have been defined based on their intensity profiles. Fig. 16(a) shows a 'step' edge, which involves a transition between two intensity values over a distance of one pixel. This is an ideal edge where no additional processing is needed for identification. A 'ramp' edge is shown in Fig. 16(b), where the transition between two intensity levels takes place over several pixels. In practice, all edges get blurred and noisy because of focusing limitations and inherent noise present in electronic components. A 'point' (Fig. 16(c)) is defined as only one or two isolated pixels having different gray level values compared to their neighbours, whereas a 'roof' edge (Fig. 16(d)) is defined as multiple pixels having the same or similar gray level values which are different from their neighbours.
g(x, y) = Σ_{i=−1}^{1} Σ_{j=−1}^{1} f(x + i, y + j) w(i, j)    ... (3)
Now, before we discuss the edge detection approaches, let us discuss line
detection.
A line can be a small number of pixels of a different color or gray level on an otherwise unchanging background. For the sake of simplicity it is assumed that the line is only a single pixel thick. Fig. 18 shows line detection masks; Fig. 19 shows the detection of a horizontal line.
Fig. 20: First and second derivative for (a) a light strip on a dark background, (b) a dark strip on a light background
Fig. 21: Example of Edge Detection
Gradient is a first order derivative and is defined for an image f(x, y) as

∇f = [Gx, Gy]^T = [∂f/∂x, ∂f/∂y]^T    ... (4)
It points in the direction of maximum rate of change of f at a point (x, y). The magnitude of the gradient is mag(∇f) = (Gx² + Gy²)^(1/2).
For the 3 × 3 neighbourhood Z1, ..., Z9 (taken row by row), the Sobel operator computes

Gx = (Z7 + 2Z8 + Z9) − (Z1 + 2Z2 + Z3)
Gy = (Z3 + 2Z6 + Z9) − (Z1 + 2Z4 + Z7)

using the masks

−1 −2 −1        −1  0  1
 0  0  0        −2  0  2
 1  2  1        −1  0  1

Fig. 23: Sobel Operator
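As a small illustration of how these masks are used (the sample image and the convolution routine are illustrative choices), Gx, Gy and the gradient magnitude can be computed by convolution:

# Applying the two Sobel masks of Fig. 23 by convolution to obtain Gx, Gy and
# the gradient magnitude mag = sqrt(Gx^2 + Gy^2).
import numpy as np
from scipy import ndimage
from skimage import data

image = data.camera().astype(float)
gx_mask = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])   # mask for Gx
gy_mask = np.array([[-1,  0,  1],
                    [-2,  0,  2],
                    [-1,  0,  1]])   # mask for Gy
gx = ndimage.convolve(image, gx_mask)
gy = ndimage.convolve(image, gy_mask)
magnitude = np.sqrt(gx**2 + gy**2)
print("Maximum gradient magnitude:", magnitude.max())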
∇²f = ∂²f/∂x² + ∂²f/∂y²

Two Laplacian masks are shown in Fig. 25. For Fig. 25(a), the Laplacian equation is

∇²f = 4Z5 − (Z2 + Z4 + Z6 + Z8),

and for Fig. 25(b), the Laplacian equation is

∇²f = 8Z5 − (Z1 + Z2 + Z3 + Z4 + Z6 + Z7 + Z8 + Z9).
The Gaussian filter is H(r) = e^(−r²/2σ²), where r² = x² + y² and σ is the standard deviation.

The first derivative of the Gaussian filter is H′(r) = −(r/σ²) e^(−r²/2σ²).

The second derivative of the Gaussian filter is H″(r) = (1/σ²)(r²/σ² − 1) e^(−r²/2σ²).

In two dimensions this becomes H″(x, y) = c (x² + y² − σ²) e^(−(x²+y²)/2σ²), where c is a normalising constant.
A 5 × 5 LoG mask is given in Fig. 26. Due to its shape, the LoG is also known as the 'Mexican hat'. Computing the second derivative in this way is robust and efficient. Zero crossings are obtained at r = ±σ.
 0   0  −1   0   0
 0  −1  −2  −1   0
−1  −2  16  −2  −1
 0  −1  −2  −1   0
 0   0  −1   0   0
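A quick sketch that checks the defining property of this mask (its coefficients sum to zero, so flat regions give zero response) and applies a Laplacian-of-Gaussian filter; the sample image and sigma are illustrative choices:

# The 5x5 LoG mask above sums to zero, so its response is zero in flat regions.
import numpy as np
from scipy import ndimage
from skimage import data

log_mask = np.array([[ 0,  0, -1,  0,  0],
                     [ 0, -1, -2, -1,  0],
                     [-1, -2, 16, -2, -1],
                     [ 0, -1, -2, -1,  0],
                     [ 0,  0, -1,  0,  0]])
print("Mask coefficient sum (should be 0):", log_mask.sum())

# A continuous LoG filter of comparable scale; sign changes of the response
# (zero crossings) mark candidate edge locations.
image = data.camera().astype(float)
response = ndimage.gaussian_laplace(image, sigma=2.0)
sign_change = (response[:-1, :] * response[1:, :]) < 0
print("Vertical zero-crossing count:", int(sign_change.sum()))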
Fig. 27(a) is the input image, Fig. 27(b) is the output of the Prewitt filter, Fig. 27(c) is the output of the Roberts filter, and Fig. 27(d), Fig. 27(e) and Fig. 27(f) are the outputs of the Laplacian, Canny and Sobel filters respectively. As is clear from the figures, each filter extracts different edges in the image. The Laplacian and Canny filters extract a lot of inner detail, while the Sobel and Roberts filters extract only the boundary. The Prewitt filter extracts the entire boundary of the flower without any gaps in the boundary.
Fig. 27
Try the following exercises.
we use the angle of the line and its distance from the origin to represent lines. If 'r' and 'θ' represent the distance of the line from the origin and its angle, then the line is described by

x cos θ + y sin θ = r    ... (2)
When we represent lines in the form y = ax + b there is one problem. In this form, the algorithm won't be able to detect vertical lines, because the slope a is undefined/infinite for vertical lines. This would mean a computer would need an infinite amount of memory to represent all possible values of a.
So, in the Hough transform we use the parametric form to describe lines, i.e.

ρ = r cos θ + c sin θ,

where ρ is the normal distance of the line from the origin, and θ is the angle that the normal to the line makes with the positive direction of the x-axis.
(Source: https://towardsdatascience.com/lines-detection-with-hough-transform-84020b3b1549)
2. Compute the ρ value for each edge pixel (r, c), for multiple values of θ, and store the votes in the accumulator array.
3. Finally, take the highest values in the above array. These will correspond to the strongest lines in the image, and can be converted back to the y = ax + b form.
Hough transform: The Hough transform is a powerful tool for identifying lines, and it can be extended to other parametric shapes as well.
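A compact sketch of the voting procedure described above, using a synthetic set of edge pixels and an accumulator over (ρ, θ); the grid resolutions and the peak selection are illustrative choices:

# Minimal Hough transform for lines: every edge pixel votes in a (rho, theta)
# accumulator; the accumulator peak gives the dominant line.
import numpy as np

edge_points = [(10, c) for c in range(30)]     # synthetic edge pixels along row 10
thetas = np.deg2rad(np.arange(0, 180))         # angle grid, 1-degree steps
max_rho = 100                                  # assumed bound on |rho| for this sketch
rhos = np.arange(-max_rho, max_rho + 1)        # distance grid, 1-pixel steps
accumulator = np.zeros((len(rhos), len(thetas)), dtype=int)

for r, c in edge_points:
    for t_idx, theta in enumerate(thetas):
        rho = r * np.cos(theta) + c * np.sin(theta)   # rho = r cos(theta) + c sin(theta)
        accumulator[int(round(rho)) + max_rho, t_idx] += 1

r_idx, t_idx = np.unravel_index(accumulator.argmax(), accumulator.shape)
print("Strongest line: rho =", rhos[r_idx], ", theta =", np.rad2deg(thetas[t_idx]), "degrees")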
Example: Using the Hough transform, show that the points (1, 1), (2, 2) and (3, 3) are collinear, and find the equation of the line.
Solution: The equation of a line is y = mx + c. In order to perform the Hough transform we convert the line from the (x, y) plane to the (m, c) plane; the equation in the (m, c) plane is c = −mx + y.
Step 1: y = mx + c
For (1, 1): 1 = m + c, so c = −m + 1.
If c = 0, then 0 = −m + 1, so m = 1.
b) Convolution Based Technique
Lines are detected using equation (3), from the response obtained after convolving the line detection masks with the image.
An image may have multiple regions corresponding to various portions of the objects in it.
Therefore, it is necessary to partition an image into several regions that
correspond to objects or parts of things in order to interpret accurately. In
general, pixels in a region will have similar features. Pixels belonging to a specific object can be identified by testing the following conditions:
A. The mean of the grey value of pixels of an image and the mean of the grey value of pixels of a specific object in the image will be different.
B. The standard deviation of the grey value of pixels belonging to a specific object in the image will lie within a specific range.
C. The texture of the pixels of an object in the image will have a unique property.
But the connection between regions and objects is not perfect due to
segmentation errors. Therefore, we need to apply object-specific knowledge
in later stages for image interpretation.
Region-based segmentation and boundary estimation using edge detection are two methods for splitting a picture into areas.
Further, in boundary detection, semantic boundaries are considered to find
different objects or sections of an image. It is different from edge detection
because it does not use boundaries between light and dark pixels in an image.
The brief discussion on Region-based segmentation, boundary estimation,
and boundary detection is given below:
a) Region based segmentation: In region-based segmentation, all pixels of an image belonging to the same area are grouped and labelled together. Here, pixels are assigned to areas based on a characteristic feature which distinguishes them from the other parts of the image. Value similarity and spatial closeness are two important features of this segmentation process. If two pixels are very close to one another and have similar intensity characteristics, they may be allocated to the same region. For example, similar grey values can represent similar pixels and Euclidean distance can represent the closeness of the pixels.
Figure 32
Here, the initial segmentation may be the entire image (no segmentation). The criterion for inhomogeneity of a segment may be the variance of gray levels or the difference in its textures, etc. Splitting and merging seem to be top-down and bottom-up versions of the same method, but there is a basic difference: merging two segments is straightforward, whereas in splitting we need to know the sub-segment boundary.
Let us discuss region growing.
Region growing is a process of merging adjacent pixel segments into one segment. It is one of the simplest and most popular methods of segmentation and is used in many applications. It needs a set of starting pixels called 'seed' points. The process consists of picking a seed from the set, examining all 4- or 8-connected neighbours of this seed, and merging similar neighbours with the seed as shown in Fig. 33(a). The seed point is modified based on all merged neighbours, Fig. 33(b). The algorithm continues until the seed set is empty.
The algorithm below segments all pixels having gray level value g, assigning them a new grey level k (here k = 1, k ≠ g). Let (x, y) be the coordinates of the initial seed, and let (a, b) be the coordinates of the pixel under investigation.
The algorithm
Push(x, y)
This is a recursive algorithm. The final region is extracted by selecting all pixels having grey level value k (= 1). The algorithm can be modified by changing the similarity measure to incorporate a range of values for merging. The statement "if f(a, b) = g" can be changed to "if g1 ≤ f(a, b) ≤ g2".
Thus, if the grey level value of pixel (a, b) is between g1 and g2, then it is segmented. The algorithm can be further modified to incorporate multiple seed points. In the above algorithm, only four neighbours are considered. It can be modified for an eight-neighbourhood: instead of using four push instructions, eight push instructions can be used with the coordinates of all eight neighbours.
The region growing algorithm is simple to implement. The only inputs needed are the seed points and the selection criterion. Multiple criteria can also be applied. The algorithm works well in a noisy environment also.
Major disadvantage of region growing is that the seed points are user
dependent. Selection of wrong seed points can lead to wrong segmentation
results. The algorithm is highly iterative and requires high computational
time and power.
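A minimal, stack-based sketch of the region-growing idea described above (single seed, 4-neighbourhood, absolute grey-level difference from the seed value as the similarity criterion; the function name and test image are illustrative):

# Region growing from a single seed: 4-neighbourhood, pixels whose grey level
# differs from the seed value by at most T are merged into the region.
import numpy as np

def region_grow(image, seed, T):
    rows, cols = image.shape
    seed_value = int(image[seed])
    region = np.zeros(image.shape, dtype=bool)
    stack = [seed]                                  # iterative stand-in for the recursive Push(x, y)
    while stack:
        r, c = stack.pop()
        if region[r, c] or abs(int(image[r, c]) - seed_value) > T:
            continue
        region[r, c] = True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # the four neighbours
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                stack.append((nr, nc))
    return region

img = np.array([[0, 0, 5, 6, 7],
                [1, 1, 5, 8, 7],
                [0, 1, 6, 7, 7],
                [2, 0, 7, 6, 6],
                [0, 1, 5, 6, 5]])
print(region_grow(img, (2, 1), T=3).astype(int))    # seed pixel (row 2, col 1) has value 1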
Example 1: In the image segment given in Fig. 34(a), seed points are given at (3, 2) and (3, 4). The similarity criterion is the grey level difference. Find the segmented image, if a) T = 3 and b) T = 8.
Solution: For T = 3, region growing starts with pixel (3, 2). All the pixels having grey level difference < 3 are assigned 'a' and denoted as region R1. Another region growing starts at (3, 4). All pixels with grey level difference < 3 are assigned 'b' and denoted as region R2. The output is shown in Fig. 34(b).
For T = 8, all the pixels have grey level difference less than 8, so only one region is formed, with all pixels being assigned as 'a'. The output is shown in Fig. 34(c).
     1  2  3  4  5
1    0  0  5  6  7
2    1  1  5  8  7
3    0  1  6  7  7
4    2  0  7  6  6
5    0  1  5  6  5

Fig. 34: (a) Input image; (b) and (c) segmented outputs for T = 3 and T = 8
b) In horizontal, vertical and diagonal directions (8-neighbourhood). The similarity criterion is that the difference between two pixel values is less than or equal to 5.
10 10 10 10 10 10 10
10 10 10 69 70 10 10
59 10 60 64 59 66 60
10 59 10 60 70 63 62
10 60 59 65 67 10 65
10 10 10 10 10 10 10
10 10 10 10 10 10 10
Solution: a) Region growing starts with the seed point pixel with grey value 60 in the centre. It moves horizontally and vertically to check how much a given pixel value differs from 60. If the difference is less than or equal to 5, then the pixel is assigned 'a' and merged with the region, else it is assigned 'b'. Fig. 36(a) shows the output.
b) If diagonal elements are also included, then the region grows more, as shown in Fig. 36(b).
Fig. 36(a): Output for 4-neighbourhood
b b b b b b b
b b b b b b b
b b a a a b b
b a b a b b b
b a a a b b b
b b b b b b b
b b b b b b b

Fig. 36(b): Output for 8-neighbourhood
b b b b b b b
b b b b b b b
a b a a a b a
b a b a b a a
b a a a b b a
b b b b b b b
b b b b b b b
Fig. 39: Split and merge algorithm
subdivided. Fig. 39(d) shows the segmented image and its final quad-tree
structure. Now all regions are homogeneous and hence no further splitting is
possible.
There is a need for a stopping rule as stated in Step 10. We would only include the spur at the right if we stopped when we reached the initial point without checking the next point. Starting from the topmost, leftmost point in Fig. 41(a) results in Fig. 41(b). In Fig. 41(c) the algorithm has returned to the starting point again; the rest of the boundary could not be traced.
Fig. 41: Boundary tracing example, (a)-(c)
Now, we shall discuss the chain codes.
Fig. 42: Direction numbers: (a) 4-direction chain code, (b) 8-direction chain code
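A small sketch of computing a 4-direction chain code and its first difference for a boundary given as an ordered list of pixel coordinates (the direction coding and the example boundary are illustrative choices):

# 4-direction chain code (0: right, 1: up, 2: left, 3: down) of a closed
# boundary, and its first difference.
def chain_code_4(boundary):
    moves = {(0, 1): 0, (-1, 0): 1, (0, -1): 2, (1, 0): 3}   # (d_row, d_col) -> code
    pairs = zip(boundary, boundary[1:] + boundary[:1])       # consecutive points, wrapping around
    return [moves[(r2 - r1, c2 - c1)] for (r1, c1), (r2, c2) in pairs]

def first_difference(code):
    return [(code[i] - code[i - 1]) % 4 for i in range(len(code))]

# A 2x2 square traced clockwise from its top-left pixel.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
cc = chain_code_4(square)
print("Chain code:      ", cc)                    # [0, 3, 2, 1]
print("First difference:", first_difference(cc))  # [3, 3, 3, 3]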
Now, try an exercise.
E12) Find chain code and first difference of the following boundary shape.
11.7 FEATURE EXTRACTION
Feature extraction divides a large set of data into smaller groups for quick processing. There are a large number of variables in these huge data sets, which require a large amount of processing power. Feature extraction derives the best features from the data by selecting and combining variables into features.
Deep Learning Techniques for Feature Extraction
Convolutional neural networks (CNNs) can replace traditional feature extractors because of their strong ability to efficiently extract complex features that express a more detailed part of an image, and because they can learn task-specific features.
This architecture uses a single shared encoder that then divides into two heads referred to as decoders. One head is responsible for locating potential points of interest, while the other is in charge of describing those points of interest. Both of these tasks make use of the majority of the network's parameters. Unlike traditional systems, which locate interest points first and then compute descriptors, this architecture is able to share processing and representation between the two tasks. As a consequence, a system has been developed that is effective for tasks such as homography estimation, which require matching geometric shapes. [1]
D2-Net: It is a trainable CNN-based local feature detector and dense feature descriptor.
i) Obtaining the local descriptors d_ij at a given spatial position (i, j) is as easy as traversing all the n feature maps D_k;
ii) Keypoint detection scores s_ij are calculated during training using a soft detection score.
11.8 SUMMARY

In this unit, we have discussed the following:
1. image segmentation techniques;
2. edge based segmentation;
3. line based segmentation;
4. various region based segmentation techniques; and
5. boundary detection algorithms.
∇f = [∂f/∂x, ∂f/∂y]^T
2 2 2 2 2 1 1 1
2 2 2 2 2 1 1 1
1 1 2 2 2 2 2 2
1 2 2 2 2 1 2 2
2 2 2 2 2 2 1 1
2 2 2 2 2 2 1 1
1 2 1 2 2 2 2 2
1 1 2 2 2 2 2 2
UNIT 12 OBJECT RECOGNITION USING SUPERVISED LEARNING APPROACHES
Structure Page No.
12.1 Introduction 291
Objectives
12.2 Basic Concepts 292
12.3 Discriminant Functions 296
12.4 Bayesian Classification 303
12.5 Minimum Distance Classifiers 308
12.6 Machine Learning Algorithms 311
12.7 Supervised Learning Approach 312
classify it into one out of c classes. In the previous units we discussed how the decision boundary surface between various classes can be used to assign a class to each point in the test data. In this unit, a different approach is proposed, where the class assigned will depend on how close the point of the test data (that is, the pattern) is to a particular class. This gives rise to the Minimum Distance Classifier.
And now, we will list the objectives of this unit. After going through the unit, please read this list again and make sure you have achieved the objectives.
Objectives
After studying this unit, you should be able to
• define pattern recognition
• apply different types of classifiers
• describe discriminant Functions (linear and non-linear)
• use Bayesian classification.
• find minimum distance classifiers;
• apply machine learning algorithm;
• describe supervised learning approach;
• describe unsupervised learning approach.
Suppose somebody tells us that one type of lumber is usually lighter colored than ash. Then brightness becomes an obvious feature. We might attempt to classify the lumber merely by seeing whether or not the average brightness 'x' exceeds some critical value.
One characteristic of human pattern recognition is that it involvesa teacher.
Similarly a machine pattern recognition system needs to be trained. A
common mode of learning is to be givena collection of labeled examples,
known astraining data set. From thetraining data set, structure information is
distilled and used forclassifying new inputs.
Try an exercise.
The goal of pattern recognition is to reach an optimal decision rule to categorize the incoming data into their respective categories. A pattern recognition
investigation may consist of several stages, enumerated below. Not all stages
may be present; some may be merged together so that the distinction between
two operations may not be clear, even if both arecarried out; also, there may
be some application-specific data processing that may not be regarded as one
of the stages listed. However, thepoints below arefairly typical.
1. Formulation of the problem: gaininga clear understanding of the aims
ofthe investigation and planning the remaining stages.
2. Data collection: making measurements on appropriate variables and
recording details of the data collection procedure (ground truth).
3. Initial examination of the data: checking the data, calculating summary
statistics and producing plots in order to geta feel for the structure.
4. Feature selection or feature extraction: selecting variables from the
measured set that are appropriate for the task. These new variables may
be obtained by a linear or nonlinear transformation of the original set (feature extraction).
Some properties that could be possibly used to distinguish between the two
types of fishes are
• Length
• Lightness (Dark colour or light colour)
• Width
• Number and shape of fins
• Position of the mouth, etc...
This is the set of all suggested features to explore for use in the classifier.
Feature Vector
A single feature may not always be useful for classification. A set of features used for classification forms a feature vector. For example, here the relevant feature vector could be a pair of the above features, such as lightness and width.
Feature Space
The samples of input (when represented by their features) are represented as points in the feature space. If a single feature is used, then the feature space is the one-dimensional feature space shown in Fig. 2. If the number of features is 2, then we get points in 2D space as shown in Fig. 3. We can also have an n-dimensional feature space.
Fig. 3: Sample Points in a 2-Dimensional Feature Space
You may see clearly that in Fig. 9(a) the discriminant function is simply a cut-off, in Fig. 9(b) the discriminant function is a line, and in Fig. 9(c) the discriminant function is a plane.
g(x) = w^T x + w0 = Σ_i w_i x_i + w0
Table 1
Candidate No. | English | Math | Decision
1             | 80      | 85   | Accept
2             | 70      | 60   | Reject
3             | 50      | 70   | Reject
4             | 90      | 70   | Accept
5             | 85      | 75   | Accept
Fig. 11: Plot of the decision boundary g(x) = 0
(ii) To plot g(x) = 0, the easiest way is to set x1 = 0 and find the value of x2 so that g(x) = 0.
Likewise, we can also set x2 = 0 and find the value of x1 so that g(x) = 0, i.e. 0 = x1 + 0 − 150, so x1 = 150. Hence [150, 0]^T is on the hyperplane.
Plot a straight line linking [0, 150]^T and [150, 0]^T, as shown in Fig. 11.
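The worked example above can be reproduced in a few lines of code: evaluate g(x) = x1 + x2 − 150 for each candidate in Table 1 and classify by the sign of g(x):

# Admission example of Table 1: g(x) = x1 + x2 - 150, accept when g(x) > 0.
import numpy as np

w = np.array([1.0, 1.0])
w0 = -150.0
candidates = {1: (80, 85), 2: (70, 60), 3: (50, 70), 4: (90, 70), 5: (85, 75)}

for number, scores in candidates.items():
    g = w @ np.array(scores) + w0                 # g(x) = w.x + w0
    decision = "Accept" if g > 0 else "Reject"
    print(f"Candidate {number}: g(x) = {g:+.0f} -> {decision}")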
Next, we shall discuss another discriminant function.
Piecewise Linear Discriminant Functions
Suppose we have m classes; define m linear discriminant functions g_i(x), i = 1, ..., m. A pattern x is assigned to class ω_i if

g_i(x) ≥ g_j(x) for all j ≠ i.

Such a classifier is called a linear machine that divides the feature space into m decision regions, with g_i(x) being the largest discriminant if x is in region R_i. The boundary H_ij between two adjacent regions R_i and R_j is the portion of the hyperplane on which g_i(x) = g_j(x), and the distance of a point x from H_ij is d(x, H_ij) = (g_i(x) − g_j(x)) / ||w_i − w_j||.
In a multi-class problem, a pattern x is assigned to the class for which the discriminant function has the maximum value. A linear discriminant function divides the feature space by a hyperplane whose orientation is determined by the weight vector w, and whose distance from the origin is determined by the weight threshold w0.
The next discriminant function is the quadratic discriminant function.
The classification rule is similar as well. You just find the class k which
maximizes the quadratic discriminant function. The decision boundaries are
quadratic equations in x. QDF allows more flexibility for the covariance
matrix, and tends to fit the data better than LDF, but then it has more parameters to estimate. The number of parameters increases significantly with QDF, as a separate covariance matrix is required for every class. If you have many classes and not so many sample points, this can be a problem.
Fig. 15: Bayes Theorem

In simple words, posterior = (prior × likelihood) / evidence.
In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features W_i are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, W_1, ..., W_n), which can be rewritten using the chain rule and the conditional independence assumptions.
This means that under the above independence assumptions, the conditional
distribution over the class variable can be expressed like this:
P(C | W_1, ..., W_n) = (1/Z) P(C) ∏_{i=1}^{n} P(W_i | C)

where Z (the evidence) is a scaling factor dependent only on W_1, ..., W_n, i.e., a constant if the values of the feature variables are known.
All model parameters (i.e., class priors and feature probability distributions) can be approximated with relative frequencies from the training set. These are maximum likelihood estimates of the probabilities. A class's prior may be calculated by assuming equiprobable classes (i.e., priors = 1 / (number of classes)), or by calculating an estimate for the class probability from the training set (i.e., (prior for a given class) = (number of samples in the class) / (total number of samples)).
If a given class and feature value never occur together in the training set, then the frequency-based probability estimate will be zero. This is problematic since it will wipe out all information in the other probabilities when they are multiplied. It is therefore often desirable to incorporate a small-sample correction in all probability estimates such that no probability is ever set to be exactly zero.
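As a minimal sketch of such a small-sample (Laplace) correction, the conditional probability of each feature value can be estimated from counts with one added to every count (the counts below are made up purely for illustration):

# Laplace ("add one") smoothing: every count is incremented by 1 so that a
# feature value never seen with a class still gets a small non-zero probability.
from collections import Counter

observed = ["red", "red", "blue", "red"]         # feature values seen for one class (made up)
possible_values = ["red", "blue", "green"]       # all values the feature can take

counts = Counter(observed)
n, k = len(observed), len(possible_values)
for value in possible_values:
    p = (counts[value] + 1) / (n + k)            # (count + 1) / (N + number of values)
    print(f"P({value} | class) = {p:.3f}")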
Despite the fact that the far-reaching independence assumptions are often
inaccurate, the naive Bayes classifier has several properties that make it
surprisingly useful in practice. In particular, the decoupling of the class
conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution.
This in turn helps to alleviate problems stemming from the curse of dimensionality, such as the need for data sets that scale exponentially with the number of features. Like all probabilistic classifiers under the MAP decision rule, it arrives at the correct classification as long as the correct class is more probable than any other class; hence class probabilities do not have to be estimated very well. In other words, the overall classifier is robust enough to ignore serious deficiencies in its underlying naive probability model.
Properties of Bayes Classifiers
1. Incrementality: with each training example, the prior and the likelihood
can be updated dynamically. It is flexible and robust to errors.
2. Combines prior knowledge and observed data: prior probability of a
hypothesis is multiplied with probability of the hypothesis given the
training data.
3. Probabilistic hypotheses: the output is not only a classification, but a probability distribution over all classes.
4. Meta-classification: the outputs of several classifiers can be combined, e.g., by multiplying the probabilities that all classifiers predict for a given class.
P(f) = 0.02
So, the probability (percentage) of people having a cold along with fever, out of all those having fever, is 4/20 = 0.2 (20%).
The probability of a joint event, that a sample comes from class C and has the feature value X, is:
P(C and X) = P(C) · P(X | C) = 0.01 × 0.4 = 0.004
E6) Explain Bayes classifier.
E7) Explain properties of Bayes classifier.
1. Let the user select prototypes, i.e., one "example" pixel per class.
(Reduces the utility of a clustering procedure.)
2. Devise an unbiased procedure for selecting prototypes (random selection,
selection at vertices of an arbitrary grid etc)
3. Use the user-selected prototype or unbiased selection procedure as the
starting point of an optimization procedure.
We shall discuss Euclidean distance classifier and Mahalanobis distance
classifiers here.
d_i(x) = −D_i(x)
The larger (less negative) d_i(x), the closer the measurement vector lies relative to the prototype vector z_i. The maximum value of d_i(x) is zero and occurs when x matches the prototype vector exactly.
Algorithm
Step 2: Select a pixel with measurement vector x. The selection scheme is arbitrary; pixels could be selected at random.
Step 3: Let the first pixel be taken as the first cluster centre, z_1.
Step 4: Select the next pixel from the image.
Step 5: Compute the distance functions D_i(x). Compute the distance function for each of the classes established at this point, i.e., compute D_i(x) for i = 1, ..., N, where N is the number of classes (N = 1 initially).
Step 6: Compare the D_i(x) with T.
a) if D_i(x) < T, then x ∈ C_i;
b) if D_i(x) ≥ T for all i, then let x become a new prototype vector: assign x → z_{N+1}. (Do not compute D_{N+1} for pixels already assigned to an existing class.)
It must be stated that the Euclidean classifier is often used, even if we know
that the previously stated assumptions are not valid, because of its simplicity.
It assigns a pattern to the class whose mean is closest to it with respect to the
Euclidean norm.
In many applications, the ranges of the feature values may differ widely; one could be in hundreds while another could be in decimal fractions. If this issue is overlooked, some feature values will effectively get neglected. If one relaxes the assumptions required by the Euclidean classifier and removes the last one, the one requiring the covariance matrix to be diagonal and with equal elements, the optimal Bayesian classifier takes the form of the minimum Mahalanobis distance classifier. That is, given an unknown x, it is assigned to class m_i if the Mahalanobis distance

d_M(x, m_i) = ((x − m_i)^T S^{-1} (x − m_i))^(1/2)

is smallest over all classes, where S is the common covariance matrix. The presence of the covariance matrix accounts for the shape of the Gaussian distributions of the various features.
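A compact sketch contrasting the Euclidean and Mahalanobis minimum-distance rules; the class means, the common covariance matrix and the test point are illustrative values, not taken from the text:

# Minimum-distance classification: assign x to the class whose mean is closest,
# measured with either the Euclidean or the Mahalanobis distance.
import numpy as np

means = {"class 1": np.array([2.0, 2.0]),        # prototype (mean) vectors, illustrative
         "class 2": np.array([8.0, 3.0])}
S = np.array([[4.0, 1.5],                        # assumed common covariance matrix
              [1.5, 1.0]])
S_inv = np.linalg.inv(S)
x = np.array([5.0, 2.0])                         # unknown pattern to classify

def euclidean(x, z):
    return np.sqrt((x - z) @ (x - z))

def mahalanobis(x, z):
    d = x - z
    return np.sqrt(d @ S_inv @ d)

for name, dist in (("Euclidean", euclidean), ("Mahalanobis", mahalanobis)):
    label = min(means, key=lambda c: dist(x, means[c]))
    print(f"{name} rule assigns x to {label}")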
As we see, machine learning uses a combination of a training algorithm and a prediction (or inference) algorithm. The training algorithm uses data to gradually determine parameters. The set of all learned parameters is called a model, basically a "set of rules" established by the algorithm, applicable even to unknown data. The inference algorithm then uses the model and applies it to any given data. Finally, it delivers the desired results.
Equipped with the right vocabulary, we can take a closer look at the
execution of a machine learning project:
Supervised learning often leaves the probability for inputs undefined. This model is not needed as long as the inputs are available, but if some of the input values are missing, it is not possible to infer anything about the outputs. In unsupervised learning, all the observations are assumed to be caused by latent variables, that is, the observations are assumed to be at the end of the causal chain. Examples of supervised learning and unsupervised learning are shown in Fig. 2.
Step 1 (Collect the dataset): If a requisite expert is available, then s/he
could suggest which fields (attributes, features) are the most
informative. If not, then the simplest method is that of measuring
everything available in the hope that the right (informative, relevant)
features can be isolated.
Step 2 (Data preparation and data pre-processing): Depending on the circumstances, there are a number of methods to choose from to handle missing data and outlier (noise) detection. There is a variety of procedures for sampling instances from a large dataset. Feature subset selection is the process of identifying and removing as many irrelevant and redundant features as possible. This reduces the dimensionality of the data and enables data mining algorithms to operate faster and more effectively.
Step 3 (Define a training set): The goal of the learning algorithm is to minimize the error with respect to the given inputs. These inputs, often called the "training set", are the examples from which the agent tries to learn. But learning the training set well is not necessarily the best thing to do. For instance, if I tried to teach you exclusive-or, but only showed you combinations consisting of one true and one false, but never both false or both true, you might learn the rule that the answer is always true. Similarly, with machine learning algorithms, a common problem is over-fitting the data and essentially memorizing the training set rather than learning a more general concept.
• Text categorization
• Face Detection
In general, a good strategy for honing in on the right machine learning approach is to:
12.9 SOLUTION/ANSWERS
E1) pattern → sensor → feature selector/extractor → classifier → decision
Optical sensing is used to distinguish two patterns. A camera takes pictures of the object and passes them on to a feature extractor. The feature extractor reduces the data by measuring certain "properties" that distinguish pictures of one object from the other. These features are then passed to a classifier that evaluates the evidence presented and makes a final decision about the object type.
One characteristic of human pattern recognition is that it involves a teacher. Similarly, a machine pattern recognition system needs to be trained. A common mode of learning is to be given a collection of labeled examples, known as a training data set. From the training data set, structure information is distilled and used for classifying new inputs.
E2) The samples of input (when represented by their features) are represented as points in the feature space. If a single feature is used, then we work on a one-dimensional feature space. If the number of features is 2, then we get points in 2D space. We can also have an n-dimensional feature space.
g(x) = w^T x + w0
E5) LDF assumes that the data are Gaussian. More specifically, it assumes that all classes share the same covariance matrix.
• LDF finds linear decision boundaries in a K − 1 dimensional subspace. As such, it is not suited if there are higher-order interactions between the independent variables.
• LDF should be used with care when its assumptions (Gaussian classes with a shared covariance matrix) do not hold.
E7)
• Incrementality: with each training example, the prior and the
likelihood can be updated dynamically. It is flexible and robust to
errors.
• Combines prior knowledge and observed data: prior probability
ofa hypothesis is multiplied with probability of the hypothesis
given the training data.
• Probabilistic hypotheses: the output is not only a classification, but a probability distribution over all classes.
• Meta-classification: the outputs of several classifiers can be combined, e.g., by multiplying the probabilities that all classifiers predict for a given class.
E8) The Euclidean distance, D_i(x), of a measurement vector x from the prototype vector z_i is:
D_i(x) = ||x − z_i|| = [(x − z_i)^T (x − z_i)]^(1/2)
E9) The discriminant function is usually defined as the negative of the separation distance:
d_i(x) = −D_i(x)
E10) Supervised learning algorithms are essentially complex algorithms,
categorized as either classification or regression models
UNIT 13 OBJECT CLASSIFICATION USING UNSUPERVISED LEARNING
Structure Page No.
13.1 Introduction 321
Objectives
13.2 Introduction to Clustering 321
13.3 Major Clustering Approaches 326
13.4 Clustering Methods 327
13.5 Hierarchical Clustering 328
13.6 Partitional Clustering 334
13.7 k-Means Clustering 336
13.8 Summary 339
13.9 Solution/Answers 339
• define clustering
• define and use different clustering techniques
• apply Hierarchical Clustering
• use partition based clustering,
• apply k-Means clustering.
In fact, clustering is one of the most utilized data mining techniques. It has a long history, and is used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc.
In recent years, due to the rapid increase of online documents, text clustering
becomes important.
Let us see some real-life examples of clustering.
Example 1: Group people of similar sizes together to make "small", "medium" and "large" T-shirts.
by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
We can measure the quality of clustering by a dissimilarity/similarity metric. Similarity is expressed in terms of a distance function, which is typically a metric: d(i, j). There is a separate "quality" function that measures the
“goodness” ofa cluster. The definitions of distance functions are usually very
different for interval-scaled, boolean, categorical, and ordinal variables.
Weights should be associated with different variables based on applications
and data semantics. It is hard to define “similar enough” or “good enough”.
The answer is typically highly subjective.
Let us define a cluster in the following definition.
A cluster can be defined as a collection of similar objects grouped together. A cluster is a set of entities which are alike, and at the same time entities from different clusters are not alike.
In general, clusters may be defined as a collection of points in a test space such that the distance between any two points in the cluster is less than the distance between any point in the cluster and any point outside the cluster.
In general, similarity and dissimilarity between data points are measured as a
function of the distance between them. The objects may also be grouped into
clusters based on different shapes and sizes.
Cluster analysis embraces a variety of techniques, the main objective of which is to group observations or variables into homogeneous and distinct clusters. A simple numerical example will help explain these objectives.
The daily expenditures on food (x1) and clothing (x2) of five persons are shown in Fig. 2.
Person   x1    x2
a        2     4
b        8     2
c        9     3
d        1     5
e        8.5   1
d(x_i, x_j) = ((x_i1 − x_j1)² + (x_i2 − x_j2)² + ... + (x_id − x_jd)²)^(1/2)

For symmetric binary attributes, the dissimilarity is

d(x_i, x_j) = (r + s) / (q + r + s + t)
where q is the number of attributes that equal 1 for both objects; t is the number of attributes that equal 0 for both objects; and s and r are the numbers of attributes that are unequal for the two objects.
A binary attribute is asymmetric, if its states are not equally important
(usually the positive outcome is considered more important). In this case, the
denominator ignores the unimportant negative matches (t). This is called the
Jaccard coefficient:
d(x_i, x_j) = (r + s) / (q + r + s)
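A short sketch computing both the simple matching and the Jaccard dissimilarities for two binary attribute vectors (the vectors are illustrative):

# Binary dissimilarity: q and t count matching 1s and 0s, r and s count mismatches.
x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 1, 0, 1, 0, 0, 0]

q = sum(a == 1 and b == 1 for a, b in zip(x, y))
t = sum(a == 0 and b == 0 for a, b in zip(x, y))
r = sum(a == 1 and b == 0 for a, b in zip(x, y))
s = sum(a == 0 and b == 1 for a, b in zip(x, y))

print("Simple matching (symmetric):", (r + s) / (q + r + s + t))
print("Jaccard (asymmetric):       ", (r + s) / (q + r + s))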
When the attributes are nominal, two main approaches may be used:
Clustering techniques are broadly divided into hierarchical and nonhierarchical (partitional) methods.
Fig. 4: A Hierarchical Clustering of Four Points. (a) Dendrogram, (b) Nested Clusters
Cluster    b       c       d       e
a        6.325   7.071   1.414   7.159
b        0       1.414   7.616   1.118
c                0       8.246   2.062
d                        0       8.500
e                                0
(a)
Cluster   (be)      a       c       d
(be)      0       6.325   1.414   7.616
a                 0       7.071   1.414
c                         0       8.246
d                                 0

Fig. 7: Nearest Neighbour Method (Step 2): (a) distance matrix after merging b and e; (b) scatter plot of the five points
Two pairs of clusters are closest to one another at distance 1.414; these are (ad) and (bce). We arbitrarily select (ad) as the new cluster, as shown in Fig. 7(b).
The distance between (be) and (ad) is
D(be, ad) = min{D(be, a), D(be, d)} = min{6.325, 7.616} = 6.325, while that between c and (ad) is
D(c, ad) = min{D(c, a), D(c, d)} = min{7.071, 8.246} = 7.071.
The three clusters remaining at this step and the distances between these clusters are shown in Fig. 8(a). We merge (be) with c to form the cluster (bce) shown in Fig. 8(b).
The distance between the two remaining clusters is
D(ad, bce) = min{D(ad, be), D(ad, c)} = min{6.325, 7.071} = 6.325.
The grouping of these two clusters, it will be noted, occurs at a distance of 6.325, a much greater distance than that at which the earlier groupings took place. Fig. 9 shows the final grouping.
The groupings and the distances between the clusters are also shown in the tree diagram (dendrogram) of Fig. 10. One usually searches the dendrogram for large jumps in the grouping distance as guidance in arriving at the number of groups. In this example, it is clear that the elements in each of the clusters (ad) and (bce) are close (they were merged at a small distance), but the clusters are distant (the distance at which they merge is large).
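The single-link (nearest neighbour) clustering of the five persons a-e can be reproduced with scipy; as a sketch, the merge distances reported below correspond to those derived above (1.118, 1.414, 1.414 and 6.325):

# Single-link hierarchical clustering of the five persons a-e.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[2, 4],      # a
                   [8, 2],      # b
                   [9, 3],      # c
                   [1, 5],      # d
                   [8.5, 1]])   # e

Z = linkage(points, method="single", metric="euclidean")
print("Merge distances:", np.round(Z[:, 2], 3))                      # 1.118, 1.414, 1.414, 6.325
print("Two-cluster cut:", fcluster(Z, t=2, criterion="maxclust"))    # (a, d) vs (b, c, e)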
Complete-link clustering (also called the diameter method, the maximum method or the furthest neighbour method) refers to methods that consider the distance between two clusters to be equal to the longest distance from any member of one cluster to any member of the other cluster. The nearest neighbour is not the only method for measuring the distance between clusters. Under the furthest neighbour (or complete linkage) method, the distance between two clusters is the distance between their two most distant members. This method tends to produce clusters at the early stages that have objects within a narrow range of distances from each other. If we visualize them as objects in space, the objects in such clusters would have a more spherical shape, as shown in Fig. 11.
Fig. 12: Furthest Neighbour Method (Step 2)
The nearest clusters are (a) and (d), which are now grouped into the cluster (ad). The remaining steps are executed similarly.
You may confirm from Example 4 and Example 5 that the nearest and furthest neighbour methods produce the same results. In other cases, however, the two methods may not agree. Consider Fig. 13(a) as an example.
The nearest neighbour method will probably not form the two groups perceived by the naked eye. This is so because at some intermediate step the method will probably merge the two "nose" points joined in Fig. 13(a) into the same cluster, and proceed to string along the remaining points in chain-link fashion. The furthest neighbour method will probably identify the two clusters, because it tends to resist merging clusters whose elements vary substantially in distance from those of the other cluster. On the other hand, the nearest neighbour method will probably succeed in forming the two groups marked in Fig. 13(b), but the furthest neighbour method will probably not.
The following example applies the k-means algorithm to five sample points.

(a) x-y coordinates for the 5 points:
Sample   x    y
1        4    4
2        8    4
3       15    8
4       24    4
5       24   12

(b) First iteration (sample point and its nearest cluster centroid):
(4, 4)   -> (4, 4)
(8, 4)   -> (8, 4)
(15, 8)  -> (8, 4)
(24, 4)  -> (8, 4)
(24, 12) -> (8, 4)

For Step 4, we compute the centroids (6, 4) and (21, 8) of the clusters. As no sample changed clusters, the algorithm terminates.
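A brief sketch of the k-means iteration on the five sample points above (NumPy only; taking the first two points as initial centroids is an illustrative choice). With these data it terminates at the centroids (6, 4) and (21, 8) found above:

# k-means: assign every point to its nearest centroid, recompute centroids as
# cluster means, and stop when no sample changes clusters.
import numpy as np

points = np.array([[4, 4], [8, 4], [15, 8], [24, 4], [24, 12]], dtype=float)
centroids = points[[0, 1]].copy()           # initial centroids (4, 4) and (8, 4)

assignment = None
for _ in range(100):
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    new_assignment = distances.argmin(axis=1)
    if np.array_equal(new_assignment, assignment):
        break                               # no sample changed clusters: terminate
    assignment = new_assignment
    for k in range(len(centroids)):
        centroids[k] = points[assignment == k].mean(axis=0)

print("Cluster assignments:", assignment)   # [0 0 1 1 1]
print("Final centroids:\n", centroids)      # (6, 4) and (21, 8)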
Try an exercise.
In the first step, shown in Fig. 19(a), points are assigned to the initial centroids, which are all in the larger group of points. For this example, we use the mean as the centroid. After the points are assigned, the centroids are updated again. In steps 2, 3, and 4, which are shown in Fig. 19(b), (c), and (d) respectively, two of the centroids move to the two small groups of points.
Fig. 19: (a) First iteration, (b) Second iteration, (c) Third iteration, (d) Fourth iteration
Fig. 20: k-Means Method (Step 1)
D(a, abd) = √((2 − 3.67)² + (4 − 3.67)²) = 1.702
D(a, ce) = √((2 − 8.75)² + (4 − 2)²) = 7.040
Cluster 1                          Cluster 2
Observation   x1    x2             Observation   x1    x2
a             2     4              c             9     3
d             1     5              e             8.5   1
                                   b             8     2
Average       1.5   4.5            Average       8.5   2
Advantages:
1) Very fast algorithm (O(k · d · N), if we limit the number of iterations)
2) Convenient centroid vector for every cluster
3) Can be run multiple times to get different results
Limitations:
1) Difficult to choose the number of clusters, k
2) Cannot be used with arbitrary distances
3) Sensitive to scaling; requires careful preprocessing
4) Does not produce the same result every time
5) Sensitive to outliers (squared errors emphasize outliers)
6) Cluster sizes can be quite unbalanced (e.g., one-element outlier clusters)
Try an exercise.
the Minkowski metric:

d(x_i, x_j) = (Σ_{k=1}^{d} |x_ik − x_jk|^p)^(1/p)
Another well-known measure is the Manhattan distance, which is obtained when p = 1.