
MCS-230

Digital Image Processing and Computer Vision

Indira Gandhi National Open University

Block

COMPUTER VISION-II
Unit 11 Object Detection 247
Unit 12 Object Recognition using Supervised Learning
Approaches 291
Unit 13 Object Classification using Unsupervised Learning
Approaches 321

PROGRAMME DESIGN COMMITTEE
Prof. (Retd.) S.K. Gupta, IIT, Delhi Sh. Shashi Bhushan Sharma, Associate Professor, SOCIS, IGNOU
Prof. Ela Kumar, IGDTUW, Delhi Sh. Akshay Kumar, Associate Professor, SOCIS, IGNOU
Prof. T.V. Vijay Kumar JNU, New Delhi Dr. P. Venkata Suresh, Associate Professor, SOCIS, IGNOU
Prof. Gayatri Dhingra, GVMITM, Sonipat Dr. V.V. Subrahmanyam, Associate Professor, SOCIS, IGNOU
Mr. Milind Mahajan, Impressico Business Sh. M.P. Mishra, Assistant Professor, SOCIS, IGNOU
Solutions, New Delhi Dr. Sudhansh Sharma, Assistant Professor, SOCIS, IGNOU

COURSE DESIGN COMMITTEE


Prof. T.V. Vijay Kumar, JNU, New Delhi Sh. Shashi Bhushan Sharma, Associate Professor, SOCIS, IGNOU
Prof. S. Balasundaram, JNU, New Delhi Sh. Akshay Kumar, Associate Professor, SOCIS, IGNOU
Prof. D.P. Vidyarthi, JNU, New Delhi Dr. P. Venkata Suresh, Associate Professor, SOCIS, IGNOU
Prof. Anjana Gosain, USICT, GGSIPU, New Delhi Dr. V.V. Subrahmanyam, Associate Professor, SOCIS, IGNOU
Sh. M.P. Mishra, Assistant Professor, SOCIS, IGNOU
Dr. Ayesha Choudhary, JNU, New Delhi Dr. Sudhansh Sharma, Assistant Professor, SOCIS, IGNOU

SOCIS FACULTY

Assistant Professor,
School of Computers and Information Sciences, IGNOU.

PRINT PRODUCTION
Sh. Sanjay Aggarwal
Assistant Registrar, MPDD, IGNOU, New Delhi
June, 2023
© Indira Gandhi National Open University, 2023
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the University's office at Maidan Garhi, New Delhi-110068.
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by MPDD, IGNOU.
Laser Typesetter: Tessa Media & Computers, C-206, Shaheen Bagh, Jamia Nagar, New Delhi-110025

BLOCK4 INTRODUCTION
This block deals with Object detection, Object recognition, and Object
classification techniques for images.
Unit 11 deals with image segmentation, which includes the detection of edges, lines, boundaries and regions. Various edge and line detection algorithms are discussed. Various techniques of region based segmentation are discussed along with boundary detection algorithms.
Unit 12 discusses Object recognition using Supervised Learning Approaches, which includes various image classifiers, viz. the Bayesian and Minimum distance classifiers; linear and non-linear discriminant functions are also explained.
Unit 13 relates to Object classification using Unsupervised Learning Approaches. It starts with an explanation of clustering along with the need for and applications of clustering. Hierarchical Clustering and Partition based clustering approaches are also discussed.

UNIT 11 OBJECT DETECTION
Structure Page No.
11.1 Introduction 247
Objectives
11.2 Object Detection 248
11.3 Image Segmentation 251
11.3.1 Image Segmentation Techniques
11.4 Edge Detection 260
11.4.1 Gradient Operators
11.4.2 Laplacian Operator
11.4.3 Line Detection
11.5 Region Detection 272
11.6 Boundary Detection 281
11.7 Feature Extraction 284
11.8 Summary 287

Fig. 1 shows the output of an object detection algorithm within a room and in a scene on the road. We can see objects like bottles, glasses, laptop, chair, bag etc. inside the room, whereas in the outdoor environment we can see car, two-wheelers, bus etc.

Fig. 1: Object detection
Sometimes we are interested in the analysis and interpretation of various features which are parts of the image. Object detection involves image segmentation, which is the process of finding partitions of an image in the form of groups of pixels which are homogeneous with respect to the feature under consideration. We shall begin this unit by discussing object detection in Sec. 11.2. In Sec. 11.3, we shall discuss image segmentation and its various techniques. We shall discuss edge detection, region detection and boundary detection, along with their applications, in Secs. 11.4, 11.5 and 11.6 respectively.
And now, we will list the objectives of this unit. After going through the unit, please read this list again to make sure you have achieved the objectives.

Objectives
After studying this unit, you should be able to

• define Object Detection/image segmentation techniques


• apply edge based, line based, region based and boundary based segmentation techniques

Fig. 2: The object detection process

In short, the basic features of an object class are to be defined and included in a database of object models. Using a feature extraction process, specific features of the object we are looking for are to be identified and matched with the database for identifying the object class.

We can represent an object in multiple ways, and accordingly features can be extracted for object identification. The inner region of the object can be represented by some features like gradient, moments, texture etc. Similarly, the boundary can be identified based on the pattern of pixels, Fourier descriptors etc.
Object detection is majorly classified as (1) Edge Detection, (2) Region Detection and (3) Boundary Detection.

Object detection involves image segmentation, which further involves the division of an image into meaningful structures. These structures depict the objects that constitute the image. Segmentation is the first step in image analysis and pattern recognition. Different features in images are identified and extracted by segmentation algorithms. The accuracy of the extracted features decides the accuracy of automatic recognition algorithms. Thus, a suitable and rugged segmentation algorithm should be chosen very carefully. Selection of a suitable algorithm is highly application dependent. Image segmentation is one of the most difficult tasks in image processing. Generally, many image processing tasks are aimed at finding a group of pixels in an image that are similar or connected to each other in some way. We show the steps of image analysis, object representation, visualization, understanding and classification in Fig. 3.

Fig. 4: Image to Knowledge Mapping

Image segmentation is the fundamental step in image analysis, understanding, interpretation and recognition tasks. It is the process of decomposing a scene into different components. Segmentation partitions an image into multiple homogeneous regions with respect to some characteristics. In practice, it groups the pixels having similar attributes into one group. The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image. Each pixel in a particular region is similar to the other pixels with respect to some characteristics such as colour, edge, texture, etc. Segmentation is an intermediate stage between low level and high level image processing tasks. Low level tasks manipulate pixel values for irregularity correction or for image enhancement, whereas high level tasks manipulate and analyse a group of pixels that convey some information.

Segmentation is the most important step in automated recognition systems, which have numerous applications; some of them are discussed below:

1. Medical Imaging

Segmentation is used to locate tumors, measure tissue volumes, and in computer aided surgery, diagnosis, treatment planning, the study of anatomical structures etc. Fig. 5 shows the segmented portion of a brain.

systems. Knowledge of the size of the crowd and tracking its motion can be used to monitor a traffic intersection. An intelligent walk signal system can be designed based on the number of people waiting to cross the road. Knowledge of the size of the crowd is helpful in general safety, crowd control and planning the urban environment.
4. Security and Surveillance
Security of national assets such as bridges, dams, tunnels etc. is critical in today's world. Automated smart systems to detect 'suspicious' movements or activities, or to detect left baggage or vehicles, are crucial for safety. Automated face detection systems try to match a criminal's face in a crowded place.
5. License Plate Recognition (LPR)
Automated license plate reading is a very useful and practical approach, as it helps in monitoring existing and illegally acquired license plates. LPR can be used in private parking management, traffic monitoring, automatic traffic ticket issuing, automatic toll payment, surveillance and security enforcement. Fig. 6 shows the segmented license plate.

Fig. 6: Example of image segmentation in LPR

6. Industrial Inspection and Automation

Now we shall discuss the classification of segmentation in the following


section.

11.3 IMAGE SEGMENTATION


Image segmentation helps in simplifying the tasks and goals of computer vision and image processing techniques. Segmenting an image is often considered the first step in image analysis.
Image segmentation is done by splitting an image into multiple parts based on similar characteristics of pixels for identifying objects.
There are several techniques through which an image can be segmented, based on dividing and grouping specific pixels which can be further assigned labels and classified according to these labels. The generated labels can be used in several supervised, semi-supervised and unsupervised training and testing tasks in machine learning and deep learning applications.
Image segmentation plays a vital role in computer vision and has several applications in various industries and research. Some of the commonly used applications are facial recognition, number plate recognition, image search, analysis of medical images etc.
Classification of Image Segmentation Methods: Researchers have been working on image segmentation for over a decade. The commonly used methods involve "Classification based on method of Identification", where the images can be segmented either by grouping similar pixels or by differentiating them by identifying the boundary. The Region based Identification method and the Boundary based Identification method of segmentation are discussed below:
• Region based Identification: In this method, similar pixels are selected based on a predefined threshold. Then these selected pixels are grouped together using algorithms like K-Means clustering, SVM, nearest neighbour etc. based on similar attributes or features. These attributes or features can be used for grouping similar pixels for region merging, region growing, region spreading etc.

1. Thresholding based Segmentation

In this method, the image is converted into a binary image using the thresholding technique, as shown in Fig. 7.

Fig. 7 (Source: https://scikit-image.org/docs/stable/auto_examples/applications/plot_thresholding.html)
The thresholding based segmentation can be further classified as:

1. Simple Thresholding, and


2. Adaptive Thresholding

Simple Thresholding: In the Simple Thresholding method (also known as global thresholding), all pixels are converted into white or black based on a reference pixel intensity value. If the intensity value is less than the reference (threshold value), the pixel is converted into black, and if it is greater, the pixel is converted into white.
Algorithm
1. Make an initial estimate of the threshold T.
2. Segmentation using T: G1, pixels brighter than T; G2, pixels darker than (or equal to) T.
3. Compute the average intensities m1 and m2 of G1 and G2.
4. Set the new threshold value T = (m1 + m2)/2, and repeat steps 2 to 4 until T stops changing.
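A minimal NumPy sketch of this iterative global thresholding scheme is given below (this is an illustration, not part of the original text; the function name and the stopping tolerance eps are assumed, and the image is assumed to contain both dark and bright pixels so that neither group is empty):

import numpy as np

def iterative_threshold(img, eps=0.5):
    t = img.mean()                       # Step 1: initial estimate of T (mean grey level)
    while True:
        g1 = img[img > t]                # Step 2: G1, pixels brighter than T
        g2 = img[img <= t]               #         G2, pixels darker than (or equal to) T
        m1, m2 = g1.mean(), g2.mean()    # Step 3: average intensities of G1 and G2
        t_new = (m1 + m2) / 2.0          # Step 4: new threshold T = (m1 + m2)/2
        if abs(t_new - t) < eps:         # stop when T no longer changes appreciably
            return t_new
        t = t_new

# Usage sketch: binary = (img > iterative_threshold(img)).astype(np.uint8) * 255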

Typical applications of thresholding include scanning documents, removing unwanted colours, pattern recognition etc.
Algorithm
1. Otsu's method is aimed at finding the optimal value for the global threshold.
2. It is based on between-class variance maximization.
3. Well thresholded classes have well discriminated intensity values.
4. For an M × N image histogram with L intensity levels [0, ..., L − 1],
5. n_i is the number of pixels of intensity i.
6. The normalized histogram is p_i = n_i / (M·N).

7. For each candidate threshold, calculate the between-class variance value.
8. The final threshold is the value that maximizes the between-class variance.
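The steps above can be coded directly. The following sketch (an illustration under the assumption of an 8-bit greyscale image, so L = 256) builds the normalized histogram and returns the threshold that maximizes the between-class variance:

import numpy as np

def otsu_threshold(img):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()                          # normalized histogram p_i = n_i / (M*N)
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()          # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0     # class mean intensities
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (m0 - m1) ** 2     # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Usage sketch: binary = (img >= otsu_threshold(img)).astype(np.uint8) * 255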

Thresholding is called:
- Variable thresholding, if T can change over the image.
- Local or regional thresholding, if T depends on a neighborhood of (x, y).
- Adaptive thresholding, if T is a function of (x, y).
- Multiple thresholding:
  g(x, y) = a, if f(x, y) > T2
            b, if T1 < f(x, y) <= T2
            c, if f(x, y) <= T1
After thresholding based segmentation, the next method is Edge Based Segmentation, which is discussed below:
2. Edge-Based Segmentation
In this method, the objects are identified based on edge detection. The edge detection is done based on pixel properties like texture, contrast, colour, saturation, intensity etc. The results of edge-based image segmentation are shown in Fig. 9.

Fig. 9 (Source: Internet)


There are two commonly used methods for edge detection:
• Search-Based Edge method: In this method, edge detection is done based on edge strength. It is calculated by searching for local directional maxima of the gradient magnitude, using a computed estimate of the edge's local orientation.
• Zero-Crossing Based Edge method: In this method, the edges are detected by locating zero crossings in a second-order derivative computed from the image.
- Identify edge pixels as those for which there is a zero-crossing in L, the result of filtering the image with the Laplacian of a radially-symmetric 2D Gaussian
  G(x, y) = exp(−(x² + y²)/2σ²).
  The Laplacian of this (the LoG) is
  ∇²G(x, y) = ((x² + y² − 2σ²)/σ⁴) exp(−(x² + y²)/2σ²).
Canny edge detector

The Canny edge detector works on the fact that, for edge detection, there is a tradeoff between noise reduction (smoothing) and edge localisation.

Algorithm
Smooth the image with a Gaussian filter
Compute the gradient magnitude and orientation
Apply non-maximal suppression to the gradient magnitude image
Use hysteresis thresholding to detect and link edges
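In practice the Canny detector is rarely coded from scratch; a hedged sketch using OpenCV is shown below. The file name, kernel size, sigma and the two hysteresis thresholds (50 and 150) are illustrative choices, not values prescribed by the text:

import cv2

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)   # any 8-bit greyscale image
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)          # step 1: Gaussian smoothing
edges = cv2.Canny(blurred, 50, 150)                   # gradient, non-maximal suppression
                                                      # and hysteresis thresholding
cv2.imwrite('edges.png', edges)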
3. Region-Based Segmentation

This algorithm identifies groups of pixels with specific characteristics, either from a small section or from a bigger portion of the input image, called a seed point. The algorithm then adds more pixels, or shrinks, based on specific characteristics of the pixels around all other seed points. Thus, we can get a segmented region.

(Source: Internet)
• Region Growing

In this method, the pixels are merged according to particular similarity conditions wherein, at first, a small set of pixels is grouped together and merged. This process is continued iteratively until all the pixels are grouped and merged with one another. Basically, the algorithm picks up a pixel randomly, finds matching neighbouring pixels, adds them, and continues in the same way until it finds a dissimilar pixel. After that it will find another seed point and continue.

To avoid overfitting, these algorithms will grow multiple regions simultaneously. Such algorithms work even with noisy images.
Fig. 11 (Source: https://towardsdatascience.com/image-segmentation-part-2-8959b609d268)

• Region Splitting and Merging

This algorithm focuses on splitting and merging portions of the image. It splits the image based on attributes and then merges regions based on similar attributes. While splitting, the whole image is considered, whereas region growing concentrates on specific points.

Fig. 13 (Source: https://towardsdatascience.com/image-segmentation-part-2-8959b609d268)

A region is considered homogeneous if
|Zmax − Zmin| <= threshold
where Zmax is the maximum pixel intensity value in a region and Zmin is the minimum pixel intensity value in a region.
Some properties that must be followed in region based segmentation:
- Completeness: The segmentation must be complete, i.e. ∪ Ri = R; every pixel must be in a region.
- Connectedness: The points of a region must be connected.
- Disjointness: Regions must be disjoint: Ri ∩ Rj = Ø, for all i ≠ j.
- Satisfiability: Pixels of a region must satisfy at least one common property P, i.e. P(Ri) = TRUE for all i; any region must satisfy a homogeneity predicate P.
- Segmentability: Different regions satisfy different properties, i.e. P(Ri ∪ Rj) = FALSE; any two adjacent regions cannot be merged into a single region.
Example: Segment the given image by the split and merge algorithm.

Researchers have introduced and analyzed convolutional neural networks, generative adversarial networks, deep belief networks, extreme learning machines etc. to perform excellent image segmentation for various applications in healthcare, traffic monitoring, satellite visualization, bakery, plant diseases, etc.
Segmentation through neural networks, especially a ConvNet, is done by generating a feature map for the input image. Then a region based filter is further applied to generate the mask according to one's objectives and applications. Bounding boxes play a very important role in image segmentation. They can be generated through various techniques and consist of the coordinates of the segmented part.
A great example of a neural network for image segmentation has been released by the experts at Facebook AI Research (FAIR), who created a deep learning architecture called Mask R-CNN which can make a pixel-wise mask for every object present in an image. It is an enhanced version of the Faster R-CNN object detection architecture. Faster R-CNN uses two pieces of data for every object in an image: the bounding box coordinates and the class of the object. With Mask R-CNN, you get an additional output in this process: Mask R-CNN outputs the object mask after performing the segmentation.
11.3.1 Image Segmentation Techniques
Image segmentation partitions an image into a set of regions. In many applications, regions represent meaningful areas in an image. In other applications, regions might be sets of border pixels grouped into structures such as line segments, edges etc. The level of partitioning of an image depends on the problem to be solved. Segmentation should stop when the objects of interest have been isolated. Segmentation has two objectives:
a) To decompose an image into regions for further analysis.
b) To perform a change of representation of an image for faster analysis.
There are a number of segmentation techniques available. Segmentation is application dependent. A single segmentation technique may not be suitable for different applications. Hence, these techniques have to be combined with domain knowledge in order to effectively solve the problem. Generally, the techniques fall into two broad categories:

i) Edge Based Segmentation: The edge based method is a commonly used technique to detect boundaries and discontinuities in an image. With this technique, detected edges are assumed to represent object boundaries and are used to identify objects. The assumption here is that every part of the object is sufficiently uniform such that the parts can be separated on the basis of discontinuity alone.
ii) Region Based Segmentation: Edge based techniques find the object boundaries and then locate the objects, whereas region based techniques look for uniformity within a sub-region based on a suitable property like intensity, colour, texture etc. Region based segmentation starts in the middle of an object and then 'grows' outwards till it meets the object boundary.
Try an exercise.

E3) Segmentation algorithms are generally based on which two properties of intensity?

In the following section, we shall discuss the various techniques of image


segmentation.

11.4 EDGE DETECTION — EDGE BASED SEGMENTATION

As an object can be fully represented by its edges, segmentation of an image into separate objects can be achieved by locating the edges of these objects. Edge detection is a fundamental tool in image processing and computer vision. The aim is to identify points in an image at which the brightness changes sharply. Changes in image brightness can be due to discontinuities in depth, discontinuities in surface orientation, changes in material property or variation in scene illumination. The result of an edge detection algorithm is a set of connected curves that may indicate object boundaries. This significantly reduces the amount of data to be processed.
We have already discussed edges in earlier sections. You may recall that an edge may be loosely defined as a line of pixels showing an 'observable' difference. For example, consider the two sub-images shown in Fig. 15. In the sub-image of Fig. 15(b), there is a clear difference between the gray levels in the second and third columns, which can be easily picked up by the human eye, whereas in the sub-image of Fig. 15(a) no such difference can be seen.

(a)             (b)
51 52 53 59     50 53 150 160
54 52 53 62     51 53 150 180
50 52 53 68     58 55 154 170
55 52 53 55     54 56 156 155

Fig. 15: Two Sub-Images

Different edge models have been defined based on their intensity profiles. Fig. 16(a) shows a 'step' edge, which involves a transition between two intensity values over a distance of one pixel. This is an ideal edge where no additional processing is needed for identification. A 'ramp' edge is shown in Fig. 16(b), where the transition between two intensity levels takes place over several pixels. In practice, all edges get blurred and noisy because of focusing limitations and the inherent noise present in electronic components. A 'point' (Fig. 16(c)) is defined as only one or two isolated pixels having different gray level values as compared to their neighbours, whereas a 'roof' edge (Fig. 16(d)) is defined as multiple pixels having the same or similar gray level values which are different from their neighbours.

The first order derivative of f(x) is given by
∂f/∂x = f'(x) = f(x + 1) − f(x)      (1)
The second order derivative of f(x) is given by
∂²f/∂x² = f''(x) = f(x + 1) + f(x − 1) − 2f(x)      (2)
Let the values of the ramp edge in Fig. 16(b) from left to right be
f(x) = 20 20 20 20 20 50 100 180 180 180 180 180
First derivative f'(x) [using Eqn. (1)]:
0 0 0 0 +30 +50 +80 0 0 0 0
Second derivative f''(x) [using Eqn. (2)]:
0 0 0 0 +30 +20 +30 −80 0 0 0

Generally, for the implementation of the first or second derivative, masks are generated. These masks are convolved with the image to get the result. For the 3 × 3 mask shown in Fig. 17, the output is calculated by

g(x, y) = Σ_{i=−1}^{1} Σ_{j=−1}^{1} f(x + i, y + j) w(i, j)      (3)

where f(x, y) is the image with which the mask w is being convolved.

w(−1,−1)  w(0,−1)  w(1,−1)
w(−1,0)   w(0,0)   w(1,0)
w(−1,1)   w(0,1)   w(1,1)

Fig. 17: 3 × 3 mask
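A direct, unoptimised implementation of Eqn. (3) might look as follows (an illustrative sketch; the function name is assumed, the mask w is taken as a 3 × 3 NumPy array indexed from 0, and real code would normally use a library convolution such as scipy.ndimage.convolve):

import numpy as np

def apply_mask(f, w):
    # g(x, y) = sum over i, j in {-1, 0, 1} of f(x+i, y+j) * w(i, j), as in Eqn. (3)
    rows, cols = f.shape
    g = np.zeros_like(f, dtype=float)
    for x in range(1, rows - 1):              # skip the one-pixel border
        for y in range(1, cols - 1):
            acc = 0.0
            for i in (-1, 0, 1):
                for j in (-1, 0, 1):
                    acc += f[x + i, y + j] * w[i + 1, j + 1]
            g[x, y] = acc
    return g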

Now, before we discuss the edge detection approaches, let us discuss line detection.
A line can be a small number of pixels of a different colour or gray level on an otherwise unchanging background. For the sake of simplicity it is assumed that the line is only a single pixel thick. Fig. 18 shows line detection masks, and the results of applying them to an image of a circuit are shown in Fig. 19(a) to Fig. 19(g).

Now let us discuss edge detection approaches.

Edge detection is the most common approach used in segmentation. A typical edge may be the border between a block of red colour and a block of yellow. An edge can also be the boundary between two regions with relatively distinct gray-level properties. Computation of a local derivative operator can enhance edges. Fig. 20 shows the response of the first and second derivatives to a light strip on a dark background (a) and a dark strip on a light background (b). An edge is a smooth change in grey levels. The first derivative is positive at the leading edge of the transition and negative at the trailing edge of the transition. It is zero in areas of constant gray level.
The response of the second derivative is different from that of the first derivative. It is positive on the dark side, negative on the light side and zero in constant areas. The 'magnitude' of the first derivative is used to detect the presence of an edge. The 'sign' of the second derivative is used to determine whether an edge pixel lies on the dark side or on the light side of the edge. The 'zero crossing' is at the midpoint of a transition in gray level.
Edge detection is a non-trivial task. It is not as simple as it looks from the earlier section. In practice, edges are corrupted by noise and blurring. This can be illustrated by the following examples of edge detection on a one-dimensional array. Looking at Fig. 21(a), we can intuitively say that there is an edge between the 4th and 5th pixels. But, if the intensity difference between the 4th and 5th pixels is smaller because of noise, and if the intensity difference between the 5th and 6th pixels is higher, as in Fig. 21(b), it would not be easy to identify the location of the edge precisely. As the edges are corrupted by noise, we observe multiple edges instead of a single edge.

(e) +45° line  (f) Result of horizontal mask  (g) Horizontal line
Fig. 19
(a) (b)
Fig. 20: First and second derivative response to (a) light strip on dark background, (b) dark strip on light background

(a) (b)
Fig. 21: Example of Edge Detection

11.4.1 Gradient Operator

There are many edge detection methods; broadly, we can classify them into two categories: first order gradient based search methods, and Laplacian based zero crossing methods. The first order derivative based search methods detect edges by computing the gradient magnitude and then searching for local directional maxima of the gradient magnitude. The zero crossing based methods search for zero crossings in a second order derivative computed from the image to find edges.

The gradient is a first order derivative and is defined for an image f(x, y) as

∇f = [Gx, Gy]^T = [∂f/∂x, ∂f/∂y]^T      (4)

It points in the direction of the maximum rate of change of f at a point (x, y).
The magnitude of the gradient is mag(∇f) = [Gx² + Gy²]^(1/2).
This can be approximated as |∇f| ≈ |Gx| + |Gy|, and the direction of the gradient is given by α(x, y) = tan⁻¹(Gy/Gx), where α is measured with respect to the x-axis.
Now, we present a discussion of various gradient operators, such as the Prewitt operator and the Sobel operator.
i) Prewitt Operator
It uses 3 × 3 masks that approximate the first derivative. The x-direction mask and y-direction mask are shown in Fig. 22. The approach used is
Gx = (z7 + z8 + z9) − (z1 + z2 + z3)
Gy = (z3 + z6 + z9) − (z1 + z4 + z7)

ii) Sobel Operator
The Sobel operator uses the masks shown in Fig. 23:

−1 −2 −1     −1  0  1
 0  0  0     −2  0  2
 1  2  1     −1  0  1

Fig. 23: Sobel Operator

The Sobel operator has the advantage of providing both a derivative and a smoothing effect. This smoothing effect has noise suppression characteristics. Diagonal edges can be detected by the Prewitt and Sobel masks by rotating the earlier masks by 45° counter-clockwise. Fig. 24 shows the masks.

(a) Prewitt                  (b) Sobel
−1 −1  0    0  1  1      0  1  2   −2 −1  0
−1  0  1   −1  0  1     −1  0  1   −1  0  1
 0  1  1   −1 −1  0     −2 −1  0    0  1  2

Fig. 24: Prewitt and Sobel Masks for Diagonal Edge Detection
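The masks above translate into only a few lines of code. The sketch below (illustrative, using SciPy's convolve; the Prewitt masks can be obtained by replacing the 2's with 1's) computes the Sobel gradient components, the |Gx| + |Gy| magnitude approximation and the gradient direction:

import numpy as np
from scipy.ndimage import convolve

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal changes
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)  # vertical changes

def sobel_edges(img):
    gx = convolve(img.astype(float), sobel_x)
    gy = convolve(img.astype(float), sobel_y)
    magnitude = np.abs(gx) + np.abs(gy)       # |Gx| + |Gy| approximation of the gradient
    direction = np.arctan2(gy, gx)            # gradient direction in radians
    return magnitude, direction

# Usage sketch: edges = sobel_edges(img)[0] > some_threshold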
Now, let us discuss the Laplacian operator.

11.4.2 The Laplacian Operator

The Laplacian, a second order derivative, is defined for a 2D function f(x, y) as

∇²f = ∂²f/∂x² + ∂²f/∂y²

Two Laplacian masks are shown in Fig. 25. For Fig. 25(a), the Laplacian equation is

∇²f = 4z5 − (z2 + z4 + z6 + z8)

and for Fig. 25(b), the Laplacian equation is

∇²f = 8z5 − (z1 + z2 + z3 + z4 + z6 + z7 + z8 + z9).

The Gaussian smoothing function is given by
H(r) = e^(−r²/2σ²)
where r² = x² + y² and σ is the standard deviation.

The first derivative of the Gaussian filter is H'(r) = −(r/σ²) e^(−r²/2σ²).

The second derivative of the Gaussian filter is H''(r) = (1/σ²)(r²/σ² − 1) e^(−r²/2σ²).

After returning to the original coordinates x, y and introducing a normalizing coefficient C, a convolution mask of the LoG (Laplacian of Gaussian) operator is given by

H''(x, y) = C ((x² + y² − 2σ²)/σ⁴) e^(−(x² + y²)/2σ²)

266
A5 x5 LOG mask is given in Fig. 26. Due to its shape, LOG is also known Object Detection
as ‘Mexicanhat’. Computing second derivative in this way is robust and
efficient. Zero crossings are obtained atr = =o.

0 0 —1 0 0
0 —1 —2 —1 0
—1 —2 — 16 —2 —1
0 —1 —2 —1 0
0 0 —1 0 0

Fig. 26: LOG as an image, LOG 3 D plot, 5•5 LOG mask
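A small sketch of LoG-based edge detection using the 5 × 5 mask of Fig. 26 is given below (illustrative only; the function names are assumptions, and zero crossings are marked wherever the filter response changes sign between horizontally or vertically adjacent pixels):

import numpy as np
from scipy.ndimage import convolve

log_mask = np.array([[ 0,  0, -1,  0,  0],
                     [ 0, -1, -2, -1,  0],
                     [-1, -2, 16, -2, -1],
                     [ 0, -1, -2, -1,  0],
                     [ 0,  0, -1,  0,  0]], dtype=float)   # 5x5 LoG mask of Fig. 26

def log_zero_crossings(img):
    response = convolve(img.astype(float), log_mask)        # LoG response of the image
    edges = np.zeros(img.shape, dtype=bool)
    # an edge pixel is marked where the response changes sign towards a neighbour
    edges[:, :-1] |= (response[:, :-1] * response[:, 1:]) < 0
    edges[:-1, :] |= (response[:-1, :] * response[1:, :]) < 0
    return edges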

Fig. 27(a) is the input image, Fig. 27(b) is the output of the Prewitt filter, Fig. 27(c) is the output of the Roberts filter, and Fig. 27(d), Fig. 27(e) and Fig. 27(f) are the outputs of the Laplacian, Canny and Sobel filters respectively. As is clear from the figures, each filter extracts different edges in the image. The Laplacian and Canny filters extract a lot of inner detail, while the Sobel and Roberts filters extract only the boundary. The Prewitt filter extracts the entire boundary of the flower without any gaps in the boundary.

(a) Original image (b) Output of Prewitt filter
(c) Output of Roberts filter (d) Output of Laplacian filter
(e) Output of Canny filter (f) Output of Sobel filter

Fig. 27
Try the following exercises.

E4) What is an edge?


E5) List the properties of the second derivative around an edge?

Instead of the slope-intercept form, the angle of a line and its distance from the origin can be used to represent it. If 'r' and 'θ' represent the distance of the line from the origin and its angle, then

r = x cos θ + y sin θ      (2)

Fig. 28 (Source: Internet)

We can represent any line using equation (2), where θ ∈ [0°, 360°) and r ≥ 0.

a) Hough Transform (HT): The main constraint of any image processing algorithm is the amount of data. We need to reduce the data while preserving the relevant information about objects. Edge detection can do this work effectively. However, the output of an edge detector alone cannot identify lines. The HT was initially developed for line detection and was later extended to shape detection too.

When we represent lines in the form y = ax + b there is one problem: in this form, the algorithm won't be able to detect vertical lines, because the slope a is undefined/infinite for vertical lines. This would mean a computer would need an infinite amount of memory to represent all possible values of a.

So, in the Hough transform we use the parametric form to describe the lines, i.e.
p = r cos(θ) + c sin(θ),
where p is the normal distance of the line from the origin, and θ is the angle that the normal to the line makes with the positive direction of the x-axis.

Fig. 29: Representation of a straight line in the Hough parameter space

The steps of the Hough transform are:
1. Create a two-dimensional accumulator array over the (p, θ) parameter space, initialised to zero.
2. Compute the value of p for each edge pixel (r, c), for multiple values of θ, and increment the corresponding cells of the array.
3. Finally, take the highest values in the above array. These will correspond to the strongest lines in the image, and can be converted back to the y = ax + b form.
Hough transform: The Hough transform is an incredible tool that lets you
identify lines. Not just lines, but other shapes as well.
Example: Using the Hough transform, show that the points (1,1), (2,2) and (3,3) are collinear, and find the equation of the line.
Solution: The equation of a line is y = mx + c. In order to perform the Hough transform we need to convert each point from the (x, y) plane to a line in the (m, c) plane. The equation in the (m, c) plane is c = −xm + y.
Step 1: y = mx + c
For (1,1): 1 = m + c, i.e. c = −m + 1. If c = 0 then m = 1; if m = 0 then c = 1, giving the intercept pair (m, c) = (1, 1).
Similarly for the other points:
If (x, y) = (2, 2), the intercept pair is (m, c) = (1, 2).
If (x, y) = (3, 3), the intercept pair is (m, c) = (1, 3).
Step 2: Plot the three lines in the (m, c) plane using the intercept pairs (1,1), (1,2) and (1,3). They all intersect at the point (m, c) = (1, 0).
Step 3: The original equation of the line is y = mx + c. Putting m = 1 and c = 0 into this equation gives y = x.
Hence the points (1,1), (2,2) and (3,3) are collinear.
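The same voting idea, in the (p, θ) parameterisation used above, can be sketched in a few lines of Python. This is an illustrative accumulator implementation, not library code; the 1-pixel resolution in p, 1-degree resolution in θ and the vote threshold are arbitrary assumptions:

import numpy as np

def hough_lines(edge_img, n_theta=180, threshold=100):
    rows, cols = edge_img.shape
    diag = int(np.ceil(np.hypot(rows, cols)))            # maximum possible |p|
    thetas = np.deg2rad(np.arange(0, 180, 180 / n_theta))
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=int)
    ys, xs = np.nonzero(edge_img)                         # coordinates of edge pixels
    for x, y in zip(xs, ys):
        for t_idx, theta in enumerate(thetas):
            p = int(round(x * np.cos(theta) + y * np.sin(theta))) + diag
            acc[p, t_idx] += 1                            # one vote per (p, theta) cell
    peaks = np.argwhere(acc > threshold)                  # strongest accumulator cells
    return [(p - diag, thetas[t_idx]) for p, t_idx in peaks]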

b) Convolution Based Technique

Convolution masks are used to detect lines in this technique. Basically, there are 4 different varieties of convolution masks: horizontal, vertical, oblique (+45 degrees) and oblique (−45 degrees).

Horizontal (R1)   Vertical (R3)   Oblique +45° (R2)   Oblique −45° (R4)
−1 −1 −1          −1  2 −1        −1 −1  2             2 −1 −1
 2  2  2          −1  2 −1        −1  2 −1            −1  2 −1
−1 −1 −1          −1  2 −1         2 −1 −1            −1 −1  2

Lines are detected from the responses obtained after convolving these masks with the image: a pixel is associated with the line direction k for which

|Rk| > |Rj|, for all j ≠ k      (3)

11.5 REGION DETECTION

Images of a scene typically contain several objects, which have multiple regions corresponding to various portions of the objects. Therefore, it is necessary to partition an image into several regions that correspond to objects or parts of objects in order to interpret it accurately. In general, pixels in a region will have similar features. Pixels belonging to a specific object can be identified by testing the following conditions:
A. The mean of the grey values of the pixels of an image and the mean of the grey values of the pixels of a specific object in the image will be different.
B. The standard deviation of the grey values of pixels belonging to a specific object in the image will lie within a specific range.
C. The texture of the pixels of an object in the image will have a unique property.
But the connection between regions and objects is not perfect due to segmentation errors. Therefore, we need to apply object-specific knowledge in later stages for image interpretation.
Region-based segmentation and boundary estimation using edge detection are two methods for splitting a picture into areas.
Further, in boundary detection, semantic boundaries are considered to find
different objects or sections of an image. It is different from edge detection
because it does not use boundaries between light and dark pixels in an image.
The brief discussion on Region-based segmentation, boundary estimation,
and boundary detection is given below:
a) Region based segmentation: In region-based segmentation, all pixels of an image belonging to the same area are grouped and labelled together. Here, pixels are assigned to areas based on unique characteristic features which are different from other parts of the image. Value similarity and spatial closeness are two important features of this segmentation process. If two pixels are very close to one another and have similar intensity characteristics, they may be allocated to the same region. For example, similar grey values can represent similar pixels, and Euclidean distance can represent the closeness of the pixels.

Fig. 31: Boundary identified by grouping similar pixels

b) Boundary Detection: In boundary detection, semantic boundaries are considered to find different objects or sections of an image. It is different from edge detection because it does not use boundaries between light and dark pixels in an image. A zebra, for example, has several internal boundaries between black and white stripes that humans would not consider part of the zebra's boundary. The focus is more on approximate boundary detection methods using training data, because a perfect solution requires high-level semantic knowledge about the scene in the image.

Fig. 32

Image segmentation using merging has the following steps:

Step 1: Obtain an initial segmentation of the image.
Step 2: Merge two adjacent segments to form a single segment if they are similar in some way.
Step 3: Repeat Step 2 until no segments to be merged remain.
The initial segmentation can be all individual pixels. The basic idea is to combine two pixels (regions) if they are similar. The similarity criteria can be based on grey level similarity, texture of the segment etc.
Image segmentation using splitting has the following steps:
Step 1: Obtain an initial segmentation of the image.
Step 2: Split each segment that is inhomogeneous in some way.
Step 3: Repeat Step 2 until all segments are homogeneous.
Here, the initial segmentation may be the entire image (no segmentation). The criterion for inhomogeneity of a segment may be the variance of gray levels or the difference in its textures etc. Splitting and merging may seem to be top-down and bottom-up versions of the same method, but there is a basic difference: merging two segments is straightforward, whereas in splitting we need to know the sub-segment boundary.
Let us discuss region growing.
Region growing is a process of merging adjacent pixel segments into one segment. It is one of the simplest and most popular methods of segmentation and is used in many applications. It needs a set of starting pixels called 'seed' points. The process consists of picking a seed from the set, examining all 4- or 8-connected neighbours of this seed and merging similar neighbours with the seed, as shown in Fig. 33(a). The seed point is modified based on all merged neighbours, Fig. 33(b). The algorithm continues until the seed set is empty.

Suppose the region to be segmented consists of all connected pixels having grey level value g, and segmented pixels are assigned grey level k = 1, with k ≠ g. Let (x, y) be the coordinates of the initial seed, and let (a, b) be the coordinates of the pixel under investigation.
The algorithm:
Push (x, y)
Do till stack is not empty
    Pop (a, b)              / take the point from the top of the stack /
    If f(a, b) = g          / if the pixel has the desired value g /
        Set f(a, b) = 1     / the segmented pixel is assigned the value 1 /
        Push (a, b + 1)     / test all four neighbours of (a, b) by /
        Push (a, b − 1)     / pushing them on the top of the stack /
        Push (a + 1, b)
        Push (a − 1, b)
End

This is a recursive algorithm. The final region is extracted by selecting all pixels having grey level value 1 (i.e. k). The algorithm can be modified by changing the similarity measure to incorporate a range of values for merging. The statement 'If f(a, b) = g' can be changed to

g1 < f(a, b) < g2.

Thus, if the grey level value of pixel (a, b) is between g1 and g2, it is segmented. The algorithm can be further modified to incorporate multiple seed points. In the above algorithm, only four neighbours are considered. It can be modified for an eight neighbourhood: instead of using four push instructions, eight push instructions can be used with the coordinates of all eight neighbours.
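A runnable version of the stack-based algorithm above is sketched below (illustrative only; the function name is assumed, a grey level range [g1, g2] is used as the similarity measure, and the label value must lie outside that range, mirroring the requirement k ≠ g in the text):

import numpy as np

def region_grow(img, seed, g1, g2, label=255):
    # label must lie outside [g1, g2] so that a segmented pixel is not revisited
    out = img.copy()
    stack = [seed]                                   # Push (x, y)
    while stack:                                     # do till stack is not empty
        a, b = stack.pop()                           # Pop (a, b)
        if (0 <= a < out.shape[0] and 0 <= b < out.shape[1]
                and g1 <= out[a, b] <= g2):          # similarity test
            out[a, b] = label                        # segmented pixel gets the label
            stack += [(a, b + 1), (a, b - 1), (a + 1, b), (a - 1, b)]  # 4 neighbours
    return out

# Usage sketch: segmented = region_grow(img, (3, 2), g1=0, g2=2, label=255)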

The major advantage of region growing is that it is simple to implement. The only inputs needed are the seed points and the selection criterion. Multiple criteria can also be applied. The algorithm works well in a noisy environment also.
The major disadvantage of region growing is that the seed points are user dependent. Selection of wrong seed points can lead to wrong segmentation results. The algorithm is highly iterative and requires high computational time and power.
Example 1: In the image segment given in Fig. 34(a), seed points are given at (3, 2) and (3, 4). The similarity criterion is the grey level difference. Find the segmented image, if a) T = 3 and b) T = 8.
Solution: For T = 3, region growing starts with pixel (3, 2). All the pixels having grey level difference < 3 are assigned 'a' and denoted as region R1. Another region growing starts at (3, 4). All pixels with grey level difference < 3 are assigned 'b' and denoted as region R2. The output is shown in Fig. 34(b).
For T = 8, all the pixels have grey level difference less than 8, so only one region is formed, with all pixels being assigned 'a'. The output is shown in Fig. 34(c).

     1  2  3  4  5
1    0  0  5  6  7
2    1  1  5  8  7
3    0  1  6  7  7
4    2  0  7  6  6
5    0  1  5  6  5

(a) Input Image Segment for Example 1

(b) Output for T = 3    (c) Output for T = 8

Example 2: Grow the regions for the seed pixel (grey value 60, at the centre) marked in Fig. 35:
a) in the horizontal and vertical directions (4 neighbourhood);
b) in the horizontal, vertical and diagonal directions (8 neighbourhood).
The similarity criterion is that the difference between two pixel values is less than or equal to 5.

10 10 10 10 10 10 10
10 10 10 69 70 10 10
59 10 60 64 59 66 60
10 59 10 60 70 63 62
10 60 59 65 67 10 65
10 10 10 10 10 10 10
10 10 10 10 10 10 10

Fig. 35: Input image segment

Solution: a) Region growing starts with the seed pixel with grey value 60 in the centre. It moves horizontally and vertically, up and down, to check how much a given pixel value differs from 60. If the difference is less than or equal to 5, the pixel is assigned 'a' and merged with the region, else it is assigned 'b'. Fig. 36(a) shows the output.

b) If diagonal elements are also included, then the region grows more, as shown in Fig. 36(b).

b b b b b b b     b b b b b b b
b b b b b b b     b b b b b b b
b b a a a b b     a b a a a b a
b a b a b b b     b a b a b a a
b a a a b b b     b a a a b b a
b b b b b b b     b b b b b b b
b b b b b b b     b b b b b b b

(a) Output for 4 Neighbourhood    (b) Output for 8 Neighbourhood

Then, further subdivision of the quadrants is done. When a merging operation is added to the segmentation process, recursive splitting and merging of image segments is done as shown in Fig. 37. The original image is shown in Fig. 37(a), and Fig. 37(b) shows the entire image split into four segments. Fig. 37(c) shows a further split of the four segments. In Fig. 37(d), further segmentation is done if the grey level variance in the sub-block is non-zero. Merging of the sub-blocks having the same grey levels is shown in Fig. 37(e). Continuing this process, we end up with two segments, one being the object and the other being the background.

(a) (b) (c) (d) (e)

Fig. 37: Example of Split and Merge Segmentation

The split and merge algorithm uses a 'quad tree' for representing segments. Fig. 38 shows that there is a one-to-one relation between splitting an image recursively into quadrants and the corresponding quad tree representation. However, there is a limitation in this representation, as we cannot model merging of two segments at different levels of the pyramid.
Let R represent the entire image. A homogeneity criterion H is selected. If the region R is not homogeneous (H(R) = false), then split R into four quadrants R1, R2, R3, R4. Any four regions with the same parent can be merged into a single homogeneous region if they are homogeneous.
The steps of the Split and Merge algorithm are as follows:
Step 1: Define an initial segmentation into regions, a homogeneity criterion and a pyramid data structure.
Step 2: If any region R in the pyramid data structure is not homogeneous (H(R) = False), split it into four child regions.
Step 3: When no further splitting is possible, merge any two adjacent regions Ri and Rj for which H(Ri ∪ Rj) = True.
Several modifications to this basic algorithm are possible. For example, in Step 2, merging of homogeneous regions is allowed, which results in a simpler and faster algorithm.
The major advantage of this algorithm is that the image can be split progressively according to the required resolution, because the number of splitting levels is decided by the user. The major disadvantage is that it may produce blocky segments, as splitting is done in rectangular quadrants. This problem can be reduced by splitting at a higher level, but this will increase the computational time.
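The splitting half of the algorithm maps naturally onto a recursive quad-tree function. The following sketch is illustrative only (the names are assumptions, homogeneity is judged by grey-level variance, and the image is assumed square with a power-of-two side); a subsequent merging pass would combine adjacent blocks whose means are similar:

import numpy as np

def quadtree_split(img, r0=0, c0=0, size=None, var_thresh=25.0, min_size=2):
    # recursively split a square block into quadrants until each block is homogeneous;
    # returns a list of homogeneous blocks as (row, col, size, mean) tuples
    if size is None:
        size = img.shape[0]
    block = img[r0:r0 + size, c0:c0 + size]
    if size <= min_size or block.var() <= var_thresh:     # homogeneity predicate H
        return [(r0, c0, size, float(block.mean()))]
    half = size // 2
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            blocks += quadtree_split(img, r0 + dr, c0 + dc, half, var_thresh, min_size)
    return blocks

# A merging pass over the returned blocks (combining neighbours with close means)
# completes the split-and-merge procedure described above.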
Example 3: Segment the image in Fig. 39(a) by the split and merge algorithm. The homogeneity criterion is the grey levels of the pixels.
Solution: The image is divided into four quadrants. Fig. 39(b) shows the image and its quad tree. Quadrants 2 and 3 are homogeneous, so no further splitting is done. Quadrants 1 and 4 are non-homogeneous, hence they are divided further into 4 quadrants each. Fig. 39(c) shows the splitting and the corresponding quad tree. Now only one quadrant, 12, is still non-homogeneous; hence it is further subdivided. Fig. 39(d) shows the segmented image and its final quad-tree structure. Now all regions are homogeneous and hence no further splitting is possible.

(a) (b) (c) (d) (e)
Fig. 39: Split and merge algorithm

Now the merging operation takes place between adjacent quadrants. Quadrants 43 and 44 are merged into a single region. Quadrants 123 and 124 are merged, and quadrants 11 and 14 are merged. Finally, all homogeneous quadrants are merged. Fig. 39(e) is the final segmented image after merging is complete.

Now try the following exercises.

E9) Distinguish between image segmentation based on thresholding and image segmentation based on region-growing techniques.
E10) Consider the image segment:

128 128 128  64  64  32  32   8
 64  64 128 128 128   8  32  32
 32   8  64 128 128  64  64  64
  8 128 128  64  64   8  64  64
128  64  64  64 128 128   8   8
 64  64  64 128 128 128  32  32
11.6 BOUNDARY DETECTION

Many boundary descriptors depend, for example, on the number of concavities in the boundary. Many algorithms require the points in the boundary of a region in an ordered clockwise (or anti-clockwise) direction. For boundary detection (also called boundary following or tracking), the following assumptions are made:
1) The image is binary, where 1 = foreground and 0 = background.
2) The image is padded with a border of 0's so an object cannot merge with the border.
3) We limit the discussion to single regions. The extension to multiple regions is straightforward.
Moore Boundary Tracking Algorithm:
Given a binary region R or its boundary:
Step 1: Let the starting point b0 be the uppermost, leftmost point in the image labelled 1.
Step 2: Denote by c0 the west neighbour of b0. c0 is always a background point.
Step 3: Examine the 8-neighbours of b0, starting at c0 and proceeding in a clockwise direction.
Step 4: Let b1 denote the first neighbour encountered whose value is 1.
Step 5: Let c1 denote the background point immediately preceding b1 in the sequence.
Step 6: Store the locations of b0 and b1 for use in Step 10.
Step 7: Let b = b1 and c = c1.
Step 8: Let the 8-neighbours of b, starting at c and proceeding clockwise, be denoted n1, n2, ..., n8. Find the first nk which is foreground (i.e., a '1').
Step 9: Let b = nk and c = nk−1.
Step 10: Repeat Steps 8 and 9 until b = b0 and the next boundary point found is b1.
The sequence of b points found when the algorithm stops is the set of ordered boundary points.

Fig. 40: Illustration of the Boundary Following Algorithm
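A compact Python sketch of the tracking loop is given below. It follows Steps 1 to 9 above but uses a simplified stopping rule (stop on the first return to b0); as discussed next, the full Step 10 rule is needed for shapes with spurs. The offset table and function name are illustrative assumptions:

import numpy as np

# 8-neighbour offsets of (row, col), clockwise starting from the west neighbour
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def moore_trace(img):
    # img: binary numpy array, 1 = foreground, padded with a border of zeros
    rows, cols = np.nonzero(img)                     # row-major scan: first hit is the
    b0 = (int(rows[0]), int(cols[0]))                # uppermost, leftmost foreground pixel
    b, c = b0, (b0[0], b0[1] - 1)                    # c0 = west neighbour (background)
    boundary = [b0]
    while True:
        k = OFFSETS.index((c[0] - b[0], c[1] - b[1]))        # position of c among neighbours
        for i in range(1, 9):                                # examine neighbours clockwise
            dr, dc = OFFSETS[(k + i) % 8]
            if img[b[0] + dr, b[1] + dc]:                    # first foreground neighbour nk
                pr, pc = OFFSETS[(k + i - 1) % 8]
                c = (b[0] + pr, b[1] + pc)                   # background point preceding nk
                b = (b[0] + dr, b[1] + dc)
                break
        if b == b0 and len(boundary) > 1:                    # simplified stop: back at b0
            return boundary
        boundary.append(b)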

There is a need for the stopping rule as stated in Step 10: if we stopped on reaching the initial point without checking the next point, we would only include the spur on the right. Starting from the topmost, leftmost point in Fig. 41(a) results in Fig. 41(b). In Fig. 41(c) the algorithm has returned to the starting point again; the rest of the boundary could not be traced.

(a) (b) (c)

Fig. 41: Example of an erroneous result of the Boundary Following Algorithm

Now, we shall discuss chain codes.

Chain codes are used to represent a boundary by a connected sequence of straight line segments of specified length and direction. Freeman codes [1961] represent a boundary by a sequence of straight line segments of specified length and direction. The direction is coded by a numbering scheme (4- or 8-connectivity) as shown in Fig. 42.

(a) 4-direction chain code   (b) 8-direction chain code
Fig. 42: Direction Numbers

Fig. 43: Digital boundary with resampling grid, and the resampled, coded boundary

If we start from the topmost, leftmost corner, the chain code for Fig. 43 is
0 7 6 6 6 6 6 4 5 3 3 2 1 2 1 2
The chain code depends on the starting point. To normalize it, we treat the code as a circular sequence of direction numbers and redefine the starting point so that the resulting sequence forms an integer of minimum magnitude.
To account for rotation, we use the first difference of the chain code instead of the code itself. The first difference is obtained by counting the number of direction changes, in the counter-clockwise direction, that separate two adjacent elements of the code.
For the boundary in the example of Fig. 43, the chain code is 0 7 6 6 6 6 6 4 5 3 3 2 1 2 1 2.
The first difference is 6 7 7 0 0 0 0 6 1 1 0 7 7 1 7 1.
The first value is calculated by considering the code as a circular sequence of integers, hence counting the direction changes from the last element (2) to the first element (0) counter-clockwise, and so on.
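Given an ordered list of boundary points (for example from the Moore tracking sketch earlier), the 8-direction chain code and its first difference can be computed as follows. This is an illustrative sketch; the direction numbering follows Fig. 42(b), with 0 = east and numbers increasing counter-clockwise, and consecutive boundary points are assumed to be 8-neighbours:

# map from (row, col) offset between consecutive boundary points to a direction number
DIR_OF_OFFSET = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
                 (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code(boundary):
    # boundary: ordered list of (row, col) points forming a closed curve
    code = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:] + boundary[:1]):
        code.append(DIR_OF_OFFSET[(r1 - r0, c1 - c0)])
    return code

def first_difference(code):
    # counter-clockwise direction changes between adjacent elements, treating the
    # code as a circular sequence (code[-1] wraps around to the last element)
    return [(code[i] - code[i - 1]) % 8 for i in range(len(code))]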
Example 4: Find chain code and first difference of the following boundary
shapes.

(a) (b) (c) (d)


Fig. 44
Solution:
1

Now, try an exercise.

E12) Find chain code and first difference of the following boundary shape.

11.7 FEATURE EXTRACTION
Feature extraction divides a large set of data into smaller groups for quick processing. There are a large number of variables in these huge data sets, which require a large amount of processing power. Feature extraction obtains the best features from the data by selecting and combining variables into features.

Applications of Feature Extraction

Image Processing, Auto-encoders and Bag of Words are some applications of Feature Extraction.

1. Image Processing: In image processing, we experiment with images


using different techniques to comprehend them better.

2. Auto-encoders: Auto-encoders perform efficient unsupervised data coding. As a result, the feature extraction technique may be used to discover the key features of the data.
3. SURF (Speeded-Up Robust Features): This technique is a simplified variant of SIFT.

4. FAST (Features from Accelerated Segment Test): In comparison to SURF, this is a substantially faster corner detection algorithm.

5. BRIEF (Binary Robust Independent Elementary Features): This feature descriptor can be used with any other feature detector. By converting floating point numbers to binary strings, this approach minimizes memory utilization.

6. Oriented FAST and Rotated BRIEF (ORB): This OpenCV algorithm uses the FAST key-point detector and the BRIEF descriptor. It is an alternative to SIFT and SURF.
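Since ORB ships with OpenCV, extracting these features takes only a few calls. The sketch below is illustrative (the file name and the nfeatures value are assumptions, not from the text):

import cv2

img = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)    # any 8-bit greyscale image
orb = cv2.ORB_create(nfeatures=500)                     # FAST keypoints + rotated BRIEF
keypoints, descriptors = orb.detectAndCompute(img, None)
# descriptors is an N x 32 array of packed binary descriptors, matched with a
# Hamming-distance matcher, e.g. cv2.BFMatcher(cv2.NORM_HAMMING)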

Deep Learning Techniques for feature extraction
Convolutional neural networks (CNNs) can replace traditional feature extractors because of their strong ability and efficiency to extract complex features that express a more detailed part of an image, and because they can learn task specific features.

1. SuperPoint: It detects points of interest and computes descriptors using a fully convolutional neural network. The extracted features are encoded in a VGG style and then, using two decoders, it generates interest points and descriptors.
The network then divides into two heads that are referred to as decoders. One head is responsible for locating potential points of interest, while the other is in charge of describing those potential points of interest. Both of these activities make use of the majority of the network's parameters. Unlike previous systems, which locate interest points first and then compute descriptors, this one is able to share computation and representation between the two tasks. As a consequence, a system has been developed that is effective for tasks such as homography estimation, which require matching geometric shapes. [1]
2. D2-Net: It is a trainable CNN based local feature detector and dense feature descriptor (a feature descriptor with minimal nonzero values).


Figure 46: Detect and Describe D2 network

It is a fully convolutional neural network (FCNN) used for extracting feature maps with a double purpose:

i) Obtaining the local descriptors d_ij at a given spatial position (i, j) is as easy as traversing all the n feature maps D_k;
ii) Keypoint detection scores s_ij are calculated during training using a soft local-maximum criterion.

It can be thought of as both a feature detector and a dense feature descriptor at the same time. The key points obtained using this method are more stable than their traditional equivalents, which are based on the early detection of low-level structures. This is achieved by delaying the detection until a later stage. It has been demonstrated that pixel-level correspondences can be used to train this model [3].
Now let us summarise what we have discussed in this unit.

11.8 SUMMARY
In this unit, we have discussed the following:
1. image segmentation techniques;
2. edge based segmentation;
3. line based segmentation;
4. various region based segmentation techniques; and
5. boundary detection algorithms.

11.9 SOLUTIONS AND ANSWERS


E1) Image segmentation partitions an image into set of regions. In many
applications, regions represent meaningful areas in an image. In other
applications, regions might be set of border pixels grouped into
structures such as line segments, edges etc. The level of partitioning
of an image depends on the problem to be solved. Segmentation
should stop when the objects of interests have been isolated.
Segmentation has two objectives:
a) To decompose an image into regions for further analysis.
b) To performa change of representation of an image for faster
analysis.
E2) Segmentation techniques are based on two basic properties of intensity values:
i. Discontinuity
In this approach, images are partitioned based on the difference or discontinuity of the gray level values. Edge based segmentation methods fall in this category.
ii. Similarity
Images are partitioned based on the similarity of the gray level values of the pixels according to a pre-defined criterion. Thresholding, region based clustering and matching based segmentation techniques fall in this category.
E4) A typical edge may be the border between a block of red colour and a block of yellow. An edge can also be the boundary between two regions with relatively distinct gray-level properties.
E5) The 'sign' of the second derivative is used to determine whether an edge pixel lies on the dark side or on the light side of the edge. The 'zero crossing' is at the midpoint of a transition in gray level.
E6) Gradient Operator
The gradient is a first order derivative and is defined for an image f(x, y) as
∇f = [Gx, Gy]^T = [∂f/∂x, ∂f/∂y]^T
It points in the direction of the maximum rate of change of f at a point (x, y).
The magnitude of the gradient is mag(∇f) = [Gx² + Gy²]^(1/2).
This can be approximated as |∇f| ≈ |Gx| + |Gy|.
E9) Image segmentation based on thresholding applies a single fixed


criterion to all pixels in the image simultaneously. Hence, it is rigid.
On the other hand, image segmentation based on the region-based
approach is more flexible; hence it is possible to adjust the acceptance
criteria in the course of region-growing process so that they can
depend on the shape of the growing regions if desired.

E10) Step 1: Computation of the histogram of the input image.

The histogram of the image gives the frequency of occurrence of each gray level.

The histogram threshold is fixed as 32. Now the input image is divided into two regions as follows:

Region 1: Gray level <= 32
Region 2: Gray level > 32

The input image after this decision is given as

2 2 2 2 2 1 1 1
2 2 2 2 2 1 1 1
1 1 2 2 2 2 2 2
1 2 2 2 2 1 2 2
2 2 2 2 2 2 1 1
2 2 2 2 2 2 1 1
1 2 1 2 2 2 2 2
1 1 2 2 2 2 2 2

[1] https://openaccess.thecvf.com/content cvpr 2018 workshops/papers/w9/DeTone


SuperPoint Self-Supervised Interest CVPR 2018 aper.pdf
[2] https://openaccess.thecvf.com/content CVPR 2019/papers/Dusmanu D2-
Net A Trainable CNN for Joint Description and Detection of CVPR 2019 ape
r.pdf
[3] https://www.semanticscholar.org/paper/D2-Net%3A-A-Trainable-CNN-for-Joint-
Description-and-Dusmanu-Rocco/l62d660eaaaleb2l44d8030l02f3e6bele80ce50
[4] https://web.ipac.caltech.edu/staff/fmasci/home/astro refs/HoughTrans lines 09.pdf
[5] https://www2.ph.ed.ac.uk/ wjh/teaching/dia/documents/edge-ohp.pdf
[6] https://en.wikipedia.org/wiki/Line detection
[7] https://towardsdatascience.com/image-feature-extraction-traditional-and-deep-
learning-techniques-ccc059l95d04
[8] https://morioh.com/p/l4d27a725a0e
https://www.researchgate.net/figure/An-example-of-edge-based-segmentation-18 fig2 283447261

UNIT 12 OBJECT RECOGNITION USING SUPERVISED LEARNING APPROACHES
Structure Page No.
12.1 Introduction 291
Objectives
12.2 Basic Concepts 292
12.3 Discriminant Functions 296
12.4 Bayesian Classification 303
12.5 Minimum Distance Classifiers 308
12.6 Machine Learning Algorithms 311
12.7 Supervised Learning Approach 312

classify it into one out of c classes. In the previous units it was discussed how the decision boundary surface between various classes can be used to assign a class to each point in the test data. In this unit, a different approach is proposed, where the class assigned will depend on how close the point of the test data (that is, the pattern) is to a particular class. This gives rise to the Minimum Distance Classifier.
And now, we will list the objectives of this unit. After going through the unit, please read this list again to make sure you have achieved the objectives.
Objectives
After studying this unit, you should be able to
• define pattern recognition
• apply different types of classifiers
• describe discriminant Functions (linear and non-linear)
• use Bayesian classification.
• find minimum distance classifiers;
• apply machine learning algorithm;
• describe supervised learning approach;
• describe unsupervised learning approach.

We begin the unit by discussing some basic concepts in the following


section.

12.2 BASIC CONCEPTS


Although humans can perform some of these perceptual tasks with ease, there is not sufficient understanding of how to duplicate the performance with a computer. Because of the complex nature of the problem, much pattern recognition research has been concerned with the more moderate problem of pattern classification: the assignment of a physical object or event to one of several pre-specified categories.

Suppose somebody sorting lumber tells us that birch is usually lighter colored than ash. Then brightness becomes an obvious feature. We might attempt to classify the lumber merely by seeing whether or not the average brightness 'x' exceeds some critical value.
One characteristic of human pattern recognition is that it involves a teacher. Similarly, a machine pattern recognition system needs to be trained. A common mode of learning is to be given a collection of labeled examples, known as a training data set. From the training data set, structure information is distilled and used for classifying new inputs.
Try an exercise.

E1) Explain a pattern classification system.

The goal of pattern recognition is to reach an optimal decision rule to categorize the incoming data into their respective categories. A pattern recognition investigation may consist of several stages, enumerated below. Not all stages may be present; some may be merged together so that the distinction between two operations may not be clear, even if both are carried out; also, there may be some application-specific data processing that may not be regarded as one of the stages listed. However, the points below are fairly typical.
1. Formulation of the problem: gaininga clear understanding of the aims
ofthe investigation and planning the remaining stages.
2. Data collection: making measurements on appropriate variables and
recording details of the data collection procedure (ground truth).
3. Initial examination of the data: checking the data, calculating summary
statistics and producing plots in order to geta feel for the structure.
4. Feature selection or feature extraction: selecting variables from the
measured set that are appropriate for the task. These new variables may
be obtained by a linear or nonlinear transformation of the original set
(feature extraction).

A feature is a distinctive characteristic or property of an object (symbolic or numeric, i.e. quantifiable) which is used to distinguish between (or classify) two different objects. For example, consider sorting incoming fish on a conveyor according to species using optical sensing. There are two categories of species: sea bass and salmon.

Some properties that could be possibly used to distinguish between the two
types of fishes are
• Length
• Lightness (Dark colour or light colour)
• Width
• Number and shape of fins
• Position of the mouth, etc...
This is the set of all suggested features to explore for use in the classifier.
Feature Vector
A single feature may not always be sufficient for classification. A set of features used for classification forms a feature vector. For example, here the relevant feature vector could be

Fish xT = [x1, x2] = [Lightness, Width]

Feature Space
The samples of input (when represented by their features) are represented as points in the feature space. If a single feature is used, then the feature space is a one-dimensional feature space as shown in Fig. 2. If the number of features is 2, then we get points in 2D space as shown in Fig. 3. We can also have an n-dimensional feature space.

Fig. 3: Sample Points in a 2-Dimensional Feature Space

Decision Region and Decision Boundary


As we know, the goal of pattern recognition is to reach an optimal decision rule to categorize the incoming data into their respective categories (classes). The decision boundary separates points belonging to one class from points of the other. The decision boundary partitions the feature space into decision regions. The nature of the decision boundary is decided by the discriminant function which is used for the decision. It is a function of the feature vector.

(a) Decision boundary in one-dimensional case with two classes

(b) Decision boundary in 2- (or 3-) dimensional case with three classes

Fig. 4

Hyper Planes and Hyper Surfaces

The accuracy of classification depends on two things, which are


i) The optimality of the decision rule used: The central task is to find an optimal decision rule which categorizes the training samples correctly, and can generalise to correctly categorising unseen samples as far as possible. This decision theory leads to a minimum error-rate classification.
ii) The accuracy in measurements of feature vectors: This inaccuracy is because of the presence of noise. Hence the classifier should deal with noisy and missing features too.
There are various types of classifiers used. We define them as follows:
a) Nonparametric: Nonparametric techniques do not rely on a set of parameters/weights.
b) Parametric: These models are parameterized, with their parameters/weights determined by fitting the model to the training data set through some parameter optimization algorithm.
c) Supervised: The training samples are given as input/output pairs. The output is the desired response for the input. The parameters/weights are adjusted so as to minimize the errors between the response of the network and the desired response.
d) Unsupervised: Suppose that we are given data samples without being told which classes they belong to. There are schemes that aim to discover significant patterns in the input data without a teacher (labeled data samples).
Try the following exercises.

E2) What is a feature space?

E3) You are given a set of data S = {dolphin, Pomeranian dog, humming bird, frog, rattlesnake, bat}. Develop a suitable classification

directly from data. Direct estimation of the decision boundaries is sometimes


referred to as discriminative modeling. The choice of discriminant function
may depend on prior knowledge about the patterns to be classified or may be
a particular functional form whose parameters are adjusted by a training
procedure. Many different forms of discriminant function have been
considered in the literature, varying in complexity from the linear
discriminant function (in which g is a linear combination of the xi) to multi-
parameter nonlinear functions such as the multilayer perceptron.
Here, we will discuss some discriminant functions.
Linear Discriminant Functions (LDF)
If no probability distribution or parameters are known, we can estimate the parameters of the discriminant function from labeled data. The shape of the discriminant function is known, such as shown in Fig. 5. If we have samples x1, x2, ..., xn from 2 classes, we assume that the 2 classes can be separated by a linear boundary l(θ) with some unknown parameters θ. We fit the "best" boundary to the data by optimizing over the parameters θ, i.e., by minimizing the classification error on the training data, as shown in Fig. 6.

Fig. 5: Linear Discriminant Function

Fig. 7: Block Diagram forLinear Discriminant Function

For a new sample x and a given discriminant function, we can decide that x belongs to Class 1 if g(x) > 0; otherwise it belongs to Class 2.

A discriminant function that is a linear combination of the components of x can be written as

g(x) = wT x + w0,

where w is called the weight vector and w0 the threshold weight (also referred to as bias). These are the parameters that we want to estimate (learn) based on training data. A classifier based entirely on linear discriminant functions is called a linear classifier or a linear machine.
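To make this decision rule concrete, here is a short Python sketch (illustrative only; the weight vector w and threshold w0 are hypothetical values assumed to have been learned already) that evaluates g(x) = wT x + w0 and applies the sign test described above.

```python
import numpy as np

def linear_discriminant(x, w, w0):
    # g(x) = w^T x + w0
    return np.dot(w, x) + w0

def classify(x, w, w0):
    # Decision rule: Class 1 if g(x) > 0, otherwise Class 2
    return 1 if linear_discriminant(x, w, w0) > 0 else 2

# Hypothetical parameters for a two-feature problem
w = np.array([1.0, 1.0])
w0 = -150.0
print(classify(np.array([80.0, 85.0]), w, w0))   # g = 15 > 0  -> Class 1
print(classify(np.array([70.0, 60.0]), w, w0))   # g = -20 < 0 -> Class 2
```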


Fig. 8: Linear Discriminant Function for2 Classes

The decision boundary surface that separates data samples assigned to Class 1 from data samples assigned to Class 2 is given by g(x) = wT x + w0 = 0.

The equation g(x) = 0 defines the decision boundary. This is a hyperplane.

Fig. 9: Discriminant Function g(x)

You may see clearly that in Fig. 9(a) the discriminant function is simply a cut-off, in Fig. 9(b) the discriminant function is a line, and in Fig. 9(c) the discriminant function is a plane.

g(x) = wT x + w0 = Σi wi xi + w0

This is a linear discriminant function, a complete specification of which is achieved by prescribing the weight vector w and threshold weight w0. The equation g(x) = 0 is the equation of a hyperplane with unit normal in the direction of w and a perpendicular distance |w0| / |w| from the origin. The
value of the discriminant function for a pattern x is a measure of the perpendicular distance from the hyperplane, as shown in Fig. 10.


considers the outcomes.


The following are the situations in which Linear Discriminant Analysis is useful.
• When the classes are well-separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem.
• If n is small and the distribution of the predictors X is approximately normal in each of the classes, the linear discriminant model is again more stable than the logistic regression model.
• Linear discriminant analysis is popular when there are more than two response classes.
Let us understand this through the following example.
Example 1: In order to select the best candidates, an over-subscribed secondary school sets an entrance exam on two subjects, English and Mathematics. The marks of 5 applicants are listed in Table 1 below, and the decision for acceptance is passing an average mark of 75.
(i) Show that the decision rule is equivalent to the method of linear discriminant function.
(ii) Plot the decision hyperplane, indicating the half planes of both Accept and Reject, and the location of the 5 applicants.

Table 1
Candidate No. | English | Math | Decision
1 | 80 | 85 | Accept
2 | 70 | 60 | Reject
3 | 50 | 70 | Reject
4 | 90 | 70 | Accept
5 | 85 | 75 | Accept

Solution: (i) Denote the marks of English and Math as x1 and x2, respectively. The decision rule is as follows: accept if (x1 + x2)/2 ≥ 75, i.e. if g(x) = x1 + x2 - 150 ≥ 0, which is a linear discriminant function.

Fig. 11

(ii) To plot g(x) = 0, the easiest way is to set x1 = 0 and find the value of x2 so that g(x) = 0.

For this, 0 = 0 + x2 - 150, so x2 = 150. Thus [0, 150]T is on the hyperplane.

Likewise we can also set x2 = 0 and find the value of x1 so that g(x) = 0, i.e. 0 = x1 + 0 - 150, so x1 = 150. Thus [150, 0]T is on the hyperplane.

Plot a straight line linking [0, 150]T and [150, 0]T as shown in Fig. 11.
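As a quick check, the sketch below (Python, illustrative) applies the discriminant g(x) = x1 + x2 - 150 to the five applicants of Table 1 and reproduces the Accept/Reject decisions.

```python
# Verify the decision rule g(x) = x1 + x2 - 150 on the Table 1 data
applicants = {1: (80, 85), 2: (70, 60), 3: (50, 70), 4: (90, 70), 5: (85, 75)}

for cand, (x1, x2) in applicants.items():
    g = x1 + x2 - 150                       # linear discriminant value
    decision = "Accept" if g >= 0 else "Reject"
    print(cand, g, decision)
# 1 -> Accept, 2 -> Reject, 3 -> Reject, 4 -> Accept, 5 -> Accept
```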

Next, we shall discuss another discriminant function.
Piecewise Linear Discriminant Functions

Suppose we have m classes; define m linear discriminant functions

gi(x) = wiT x + wi0, i = 1, ..., m.

Given x, assign class ci if

gi(x) ≥ gj(x) for all j ≠ i.

Such a classifier is called a linear machine that divides the feature space into m decision regions, with gi(x) being the largest discriminant if x is in the region Ri.

The boundary between two adjacent regions Ri and Rj is a portion of the hyperplane Hij defined by gi(x) = gj(x), and the distance of x from Hij is (gi(x) - gj(x)) / ||wi - wj||.

Fig. 13: Groups notSeparable bya Linear Discriminant

In a multi-class problem, a pattern x is assigned to the class for which the discriminant function has the maximum value. A linear discriminant function divides the feature space by a hyperplane whose orientation is determined by the weight vector w and whose distance from the origin is determined by the weight threshold w0.
The next discriminant function is the quadratic discriminant function.

Quadratic Discriminant Function


A quadratic discriminant function is a mapping g: X → R with

g(x) = (1/2) xT W x + wT x + w0.

In a quadratic discriminant function, the model parameter is θ = {W, w, w0}. Depending on W, the geometry of g could be convex, concave, or neither. Fig. 14 shows a quadratic discriminant function separating an inner and an outer cluster of data points.

A quadratic discriminant function is able to classify data using quadratic


surfaces. This example shows an ellipsoid surface for separating an inner and
outer cluster of data points. QDF is not really that much different from LDF
except that you assume that the covariance matrix can be different for each
class, and so we estimate the covariance matrix separately for each class.

Fig. 14: Quadratic Discriminant Function

The classification rule is similar as well. You just find the class k which
maximizes the quadratic discriminant function. The decision boundaries are
quadratic equations in x. QDF allows more flexibility for the covariance matrix and tends to fit the data better than LDF, but then it has more parameters to estimate. The number of parameters increases significantly with QDF, as a separate covariance matrix is required for every class. If you have many classes and not so many sample points, this can be a problem.

After quadratic discriminant function, let us now discuss the non-linear


discriminant function.

Now, try the following exercises.


E4) Explain the linear discriminant function.

E5) What are the properties of LDA?

In the following section, we shall discuss Bayesian classification.

12.4 BAYESIAN CLASSIFICATION


The goal of most classification procedures is to estimate the probability that a pattern to be classified belongs to each of the various possible classes, based on the values of some feature or set of features.
Here, we are discussing Bayesian decision making or the Bayes Classifier. This method refers to choosing the most likely class, given the value of the feature(s). Bayes' theorem calculates the probability of class membership. In most cases we decide which is the most likely class. We need a mathematical

number of times (occurrences) of X, if it belongs to class w1.


The goal is to measure P(wi | X), the posteriori probability, from the above three values. This is the probability of any vector X being assigned to class wi.

Fig. 15: Bayes Theorem (inputs P(X | w), P(w), P(X); output P(w | X))

Here is another model, called the naive Bayes probabilistic model.


The probability model for a classifier is a conditional model over a dependent class variable X with a small number of outcomes or classes, conditional on several feature variables W1 through Wn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

P(X | W1, ..., Wn) = p(X) p(W1, ..., Wn | X) / p(W1, ..., Wn).

In simple words,

posterior = (prior × likelihood) / evidence.
In practice, we are only interested in the numerator of that fraction, since the
denominator does not depend on X and the values of the features Wi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(X, W1, ..., Wn), which, under the naive conditional-independence assumption, can be written as

p(X | W1, ..., Wn) ∝ p(X) p(W1 | X) p(W2 | X) ... = p(X) Π(i = 1 to n) p(Wi | X).

This means that under the above independence assumptions, the conditional distribution over the class variable can be expressed like this:

P(X | W1, ..., Wn) = (1/Z) P(X) Π(i = 1 to n) p(Wi | X),

where Z (the evidence) is a scaling factor dependent only on W1, ..., Wn, i.e., a constant if the values of the feature variables are known.

Let us now discuss how Parameter estimation is done.

All model parameters (i.e., class priors and feature probability distributions) can be approximated with relative frequencies from the training set. These are maximum likelihood estimates of the probabilities. A class's prior may be calculated by assuming equiprobable classes (i.e., prior = 1 / (number of classes)), or by calculating an estimate of the class probability from the training set (i.e., (prior for a given class) = (number of samples in the class) / (total number of samples)).

To estimate the parameters for a feature's distribution, one must assume a distribution or generate nonparametric models for the features from the training set. If one is dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution.

For example, suppose the training data contains a continuous attribute, x. We first segment the data by the class and then compute the mean and variance of x in each class. Let μc be the mean of the values of x associated with class c, and let σc^2 be the variance of the values of x associated with class c. Then, the probability of some value given a class, p(x = v | c), can be computed by plugging v into the equation for a Normal distribution:

p(x = v | c) = (1 / sqrt(2π σc^2)) exp(-(v - μc)^2 / (2 σc^2))

If a given class and feature value never occur together in the training set, the frequency-based probability estimate will be zero. This is problematic since it will wipe out all information in the other probabilities when they are multiplied. It is therefore often desirable to incorporate a small-sample correction in all probability estimates such that no probability is ever set to be exactly zero.
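A minimal Python sketch of this estimation step, assuming a Gaussian class-conditional model for a single continuous feature (the class names and data values are illustrative only):

```python
import math

# Hypothetical training data: one continuous feature per labelled sample
training = {"class_a": [5.0, 5.5, 4.8, 5.2], "class_b": [8.1, 7.9, 8.4, 8.0]}

def gaussian_params(values):
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)   # sigma^2
    return mu, var

def gaussian_likelihood(v, mu, var):
    # p(x = v | c) under a Normal distribution with mean mu and variance var
    return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

params = {c: gaussian_params(vals) for c, vals in training.items()}
total = sum(len(v) for v in training.values())
priors = {c: len(vals) / total for c, vals in training.items()}

# MAP decision for a new value: pick the class maximizing prior * likelihood
v = 6.0
scores = {c: priors[c] * gaussian_likelihood(v, *params[c]) for c in training}
print(max(scores, key=scores.get))
```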

Now, we shall construct a classifier from the probability model. The discussion so far has derived the independent feature model, that is, the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule. The corresponding classifier is the function classify defined as follows:

classify(w1, ..., wn) = argmax over c of p(C = c) Π(i = 1 to n) p(Wi = wi | C = c)

Despite the fact that the far-reaching independence assumptions are often
inaccurate, the naive Bayes classifier has several properties that make it
surprisingly useful in practice. In particular, the decoupling of the class-conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution.
This in turn helps to alleviate problems stemming from the curse of dimensionality, such as the need for data sets that scale exponentially with the number of features. Like all probabilistic classifiers under the MAP decision rule, it arrives at the correct classification as long as the correct class is more probable than any other class; hence class probabilities do not have to be estimated very well. In other words, the overall classifier is robust enough to ignore serious deficiencies in its underlying naive probability model.
Properties of Bayes Classifiers
1. Incrementality: with each training example, the prior and the likelihood
can be updated dynamically. It is flexible and robust to errors.
2. Combines prior knowledge and observed data: prior probability ofa
hypothesis is multiplied with probability of the hypothesis given the
training data.
3. Probabilistic hypotheses: the output is not only a classification, but a probability distribution over all classes.

P(f) = 0.02

Let us take an example with values to verify:

Total population = 1000.
Thus, people having cold = 10.
People having both fever and cold = 4.
Thus, people having only cold = 10 - 4 = 6.

Fig. 15: A Venn Diagram


People having fever (with and without cold) = 0.02 * 1000 = 20.
People having fever without cold = 20 - 4 = 16.

So, the probability (percentage) of people having cold along with fever, out of all those having fever, is 4/20 = 0.2 (20%).

Probability of a joint event, i.e. a sample comes from class C and has the feature value X:

P(C and X) = P(C) . P(X | C) = 0.01 * 0.4

Or, P(C and X) = P(X) . P(C | X) = 0.02 * 0.2

Also verify, for a K class problem:

P(X) = P(w1) P(X | w1) + P(w2) P(X | w2) + ... + P(wk) P(X | wk)
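The arithmetic of this example can be checked with a few lines of Python (a sketch; the numbers are those used above, with P(C) = 0.01 and P(X | C) = 0.4 taken as the stated prior and likelihood):

```python
# Bayes rule check for the cold/fever example
P_C = 0.01          # prior: P(cold) = 10/1000
P_X_given_C = 0.4   # likelihood: P(fever | cold) = 4/10
P_X = 0.02          # evidence: P(fever) = 20/1000

P_C_given_X = P_X_given_C * P_C / P_X            # posterior P(cold | fever)
print(P_C_given_X)                               # 0.2, i.e. 4 out of 20

# The joint probability computed both ways agrees
print(P_C * P_X_given_C, P_X * P_C_given_X)      # 0.004 and 0.004
```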

properties to independently contribute to the probability that this fruit is an


apple.
Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without believing in Bayesian probability or using any Bayesian methods.

An advantage of the naive Bayes classifier is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix.
Try the following exercises.

E6) Explain the Bayes classifier.
E7) Explain properties of Bayes classifier.

12.5 MINIMUM DISTANCE CLASSIFIERS


A minimum distance classifier is a pattern classification scheme defined by distance functions. Here, pixels which are close to each other in feature space are likely to belong to the same class. The measure of similarity is the "distance" between pixels in feature space (an n-D histogram). All dimensions should be in comparable units, and distance may be scaled in pixels, radiance, reflectance, etc. The classification is most effective if the clusters are disjoint. It requires the least amount of prior information to operate.

Distance in feature space is the primary measure of similarity in minimum distance classifier algorithms. Pixels that are "close" in feature space will be grouped in the same class. The relative distances may change when data are calibrated, atmospherically corrected or rescaled in ways that treat different

The decision boundary for the single prototype, simple distance


discriminant function is the set of planar surfaces perpendicular to and
bisecting the lines connecting pairs of prototypes, as shown in Fig. 1. This is a minimum-distance classifier. If the prototype is the mean value of the training pixels for each class, it is called a minimum-distance-to-mean classifier.

Fig. 1: Decision boundary


The results of clustering will depend strongly on the choice of the prototype. Alternatives for prototype selection are:

1. Let the user select prototypes, i.e., one "example" pixel per class. (This reduces the utility of a clustering procedure.)
2. Devise an unbiased procedure for selecting prototypes (random selection, selection at vertices of an arbitrary grid, etc.).
3. Use the user-selected prototype or unbiased selection procedure as the
starting point of an optimization procedure.
We shall discuss Euclidean distance classifier and Mahalanobis distance
classifiers here.

The Euclidean Distance Classifier

The optimal Bayesian classifier is significantly simplified under the


following assumptions:

Di(x) = ||x - zi|| = [(x - zi)T (x - zi)]^(1/2)

The discriminant function is usually defined as the negative of the separation distance:

di(x) = -Di(x)

The larger (less negative) di(x), the closer the measurement vector lies relative to the prototype vector zi. The maximum value of di(x) is zero and occurs when x matches the prototype vector exactly.

Algorithm

Step 1: Select a threshold, T. T is a representative distance in measurement space. The choice of T in this algorithm is entirely arbitrary; it is also the only input required of the user.

Step 2: Select a pixel with measurement vector, x. The selection scheme is arbitrary. Pixels could be selected at random.

Step 3: Let the first pixel be taken as the first cluster center, z1.

Step 4: Select the next pixel from the image.

Step 5: Compute the distance functions, Di(x). Compute the distance function for each of the classes established at this point, i.e., compute Di(x) for i = 1, ..., N, where N = the number of classes. (N = 1 initially.)

Step 6: Compare the Di(x) with T.
a) if Di(x) < T, then x ∈ Ci;
b) if Di(x) > T for all i, then let x become a new prototype vector: assign x → zN+1. (Do not compute DN+1 for pixels already assigned to an existing class.)

It must be stated that the Euclidean classifier is often used, even if we know that the previously stated assumptions are not valid, because of its simplicity. It assigns a pattern to the class whose mean is closest to it with respect to the Euclidean norm.
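A minimal sketch of such a minimum-distance-to-mean classifier in Python (the class prototypes and the test vector are hypothetical values used only for illustration):

```python
import numpy as np

# Hypothetical class prototypes: mean feature vectors of the training pixels
prototypes = {"water": np.array([20.0, 15.0]), "vegetation": np.array([60.0, 80.0])}

def classify_min_distance(x, prototypes):
    # Assign x to the class whose prototype (mean) is nearest in Euclidean distance
    distances = {c: np.linalg.norm(x - z) for c, z in prototypes.items()}
    return min(distances, key=distances.get)

print(classify_min_distance(np.array([25.0, 20.0]), prototypes))   # "water"
```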

The Mahalanobis Distance Classifier

In many applications, the ranges of the feature values may differ widely; one could be in hundreds while another could be in decimal fractions. If this issue is overlooked, some feature values will get neglected. If one relaxes the assumptions required by the Euclidean classifier and removes the last one, the one requiring the covariance matrix to be diagonal and with equal elements, the optimal Bayesian classifier takes the form of the minimum Mahalanobis distance classifier. That is, given an unknown x, it is assigned to class ωi if the Mahalanobis distance

dm(x, μi) = [(x - μi)T S^(-1) (x - μi)]^(1/2)

is minimum over all classes, where S is the common covariance matrix. The presence of the covariance matrix accounts for the shape of the Gaussian distributions of the various features.
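A short Python sketch of the Mahalanobis distance computation (a sketch only; the class means and the common covariance matrix S are hypothetical):

```python
import numpy as np

# Hypothetical class means and a common covariance matrix S
means = {"class_1": np.array([2.0, 3.0]), "class_2": np.array([7.0, 8.0])}
S = np.array([[4.0, 1.0], [1.0, 2.0]])
S_inv = np.linalg.inv(S)

def mahalanobis(x, mu, S_inv):
    d = x - mu
    return float(np.sqrt(d @ S_inv @ d))   # [(x - mu)^T S^-1 (x - mu)]^(1/2)

x = np.array([3.0, 4.0])
print(min(means, key=lambda c: mahalanobis(x, means[c], S_inv)))   # "class_1"
```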

Try the following exercises.

E8) What distance measure is used by Euclidean Distance Classifier?

E9) What is the discriminant function for Euclidean distance classifier?

In the following section, we shall discuss Machine learning algorithm.

12.6 MACHINE LEARNING ALGORITHMS

term for executing the ML model and outputting a result.

As we see, machine learning uses the combination of a training algorithm and a prediction (or inference) algorithm. The training algorithm uses data to gradually determine parameters. The set of all learned parameters is called a model, basically a "set of rules" established by the algorithm, applicable even to unknown data. The inference algorithm then uses the model and applies it to any given data. Finally, it delivers the desired results.

Equipped with the right vocabulary, we can take a closer look at the execution of a machine learning project:

• We select the machine learning method for which we want to train a model. The choice will depend on the problem to be solved, the available data, the experience and also on gut feeling.
• Then we divide the available data into two parts: the training data and the test data. We apply our training data and thus obtain our model. The model is checked on the unknown test data. It is most important that the test data aren't used during the training phase under any circumstances. The reason is obvious: computers are great at learning by heart. Complex models like neural networks can actually start to memorize by themselves. The results might be quite remarkable. There's only one flaw: they're not based on a model formulated by the program, but on "memorized" data. This effect is called "overfitting". However, the test data are supposed to simulate the "unknown" during quality control and to check whether the model has really "learned" something. A good model achieves about the same error rate on the test data as on the training data without ever having seen the test data before.
• We use the training data to develop the model with the learning algorithm. The more data we have, the "stronger" the model becomes. Using up all available data once for the training algorithm is called an "epoch".
• In order to test it, the trained model is used on the test data unknown to it and makes predictions. If we did everything right, the predictions on unknown data should be as good as on the training data; the model can generalize and solve the problem. Now it is ready for practical use, as sketched below.
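A minimal illustration of this train/test discipline, using scikit-learn (assumed to be available; the dataset and the choice of a k-nearest-neighbour model are only examples):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# Hold out test data that the training phase never sees
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Comparable scores on both sets suggest the model generalizes rather than memorizes
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
```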

learning is about designing algorithms that allow a computer to learn. Learning does not necessarily involve consciousness; rather, learning is a matter of finding statistical regularities or other patterns in the data. Thus, many machine learning algorithms will barely resemble how a human might approach a learning task. However, learning algorithms can give insight into the relative difficulty of learning in different environments.
Now we shall discuss supervised learning approach in detail in the following
section.

12.7 SUPERVISED LEARNING APPROACH


Supervised learning is fairly common in classification problems because the goal is often to get the computer to learn a classification system that we have created. Digit recognition, once again, is a common example of classification learning. More generally, classification learning is appropriate for any problem where deducing a classification is useful and the classification is easy to determine. In some cases, it might not even be necessary to give predetermined classifications to every instance of a problem if the agent can work out the classifications for itself. This would be an example of unsupervised learning in a classification context.

Supervised learning often leaves the probability for inputs undefined. This model is not needed as long as the inputs are available, but if some of the input values are missing, it is not possible to infer anything about the outputs. In unsupervised learning, all the observations are assumed to be caused by latent variables; that is, the observations are assumed to be at the end of the causal chain. Examples of supervised learning and unsupervised learning are shown in Fig. 2.

Fig. 3: Learning phase ofa supervised learning algorithm

Inductive machine learning is the process of learning a set of rules from instances (examples in a training set), or more generally speaking, creating a classifier that can be used to generalize from new instances. The process of applying supervised ML to a real-world problem is described below.

Step 1 (Collect the dataset): If a requisite expert is available, then s/he could suggest which fields (attributes, features) are the most informative. If not, then the simplest method is that of measuring everything available in the hope that the right (informative, relevant) features can be isolated.
Step 2 (Data preparation and data pre-processing): Depending on the
circumstances, there are a number of methods to choose from to handle missing data and outlier (noise) detection. There is a variety of procedures for sampling instances from a large dataset. Feature subset selection is the process of identifying and removing as many irrelevant and redundant features as possible. This reduces the dimensionality of the data and enables data mining algorithms to operate faster and more effectively.

Step 3 (Define a training set): The goal of the learning algorithm is to minimize the error with respect to the given inputs. These inputs, often called the "training set", are the examples from which the agent tries to learn. But learning the training set well is not necessarily the best thing to do. For instance, if I tried to teach you exclusive-or, but only showed you combinations consisting of one true and one false, but never both false or both true, you might learn the rule that the answer is always true. Similarly, with machine learning algorithms, a common problem is over-fitting the data and essentially memorizing the training set rather than learning a more

essentially complex algorithms, categorized as either classification or regression models, as shown in Fig. 4.

1) Classification Models: Classification models are used for problems where the output variable can be categorized, such as "Yes" or "No", or "Pass" or "Fail". Classification models are used to predict the category of the data. Real-life examples include spam detection, sentiment analysis, scorecard prediction of exams, etc.

2) Regression Models: Regression models are used for problems where the output variable is a real value such as a unique number, dollars, salary, weight or pressure, for example. They are most often used to predict numerical values based on previous data observations. Some of the more familiar regression algorithms include linear regression, logistic regression, polynomial regression, and ridge regression.


Fig. 4: Categories of Supervised Machine Learning algorithms

There are some very practical applications of supervised learning algorithms


in real life, including:

• Text categorization
• Face Detection
• Signature recognition

Let us list the Differences between Supervised and Unsupervised Learning


Approaches.

Parameters | Supervised machine learning technique | Unsupervised machine learning technique
Process | In a supervised learning model, input and output variables will be given. | In an unsupervised learning model, only input data will be given.
Input Data | Algorithms are trained using labeled data. | Algorithms are used against data which is not labeled.
Algorithms Used | Support vector machine, neural network, linear and logistic regression, random forest, and classification trees. | Unsupervised algorithms can be divided into different categories, like cluster algorithms, k-means, hierarchical clustering, etc.
Computational Complexity | Supervised learning is a simpler method. | Unsupervised learning is computationally complex.
Use of Data | A supervised learning model uses training data to learn a link between the input and the outputs. | Unsupervised learning does not use output data.
Accuracy of Results | Highly accurate and trustworthy method. | Less accurate and trustworthy method.
Real Time Learning | Learning method takes place offline. | Learning method takes place in real time.
Number of Classes | Number of classes is known. | Number of classes is not known.
Main Drawback | Classifying big data can be a real challenge in supervised learning. | You cannot get precise information regarding data sorting, and the output, as data used in unsupervised learning, is not labeled and not known.

• For example, a baby can identify other dogs based on past supervised learning.
• Regression and Classification are two types of supervised machine learning techniques.
• Clustering and Association are two types of unsupervised learning.
• In a supervised learning model, input and output variables will be given, while in an unsupervised learning model, only input data will be given.

Now the question arises: when should we choose supervised learning, and when unsupervised learning?

In manufacturing, a large number of factors affect which machine learning approach is best for any given task. And, since every machine learning problem is different, deciding on which technique to use is a complex process.

In general, a good strategy for honing in on the right machine learning approach is to:

• Evaluate the data. Is it labeled/unlabelled? Is there available expert knowledge to support additional labeling? This will help to determine whether a supervised, unsupervised, semi-supervised or reinforced learning approach should be used.
• Define the goal. Is the problem a recurring, defined one? Or will the algorithm be expected to predict new problems?
• Review available algorithms that may suit the problem with regard to dimensionality (number of features, attributes or characteristics). Candidate algorithms should be suited to the overall volume of data and its structure.
• Study successful applications of the algorithm type on similar problems.

Let us now summarize, what we have discussed in this unit.

6) Likelihoods can be estimated based on frequencies. Sparse data poses a huge problem. A possible solution is to use the m-estimate.
7) Concept of minimum distance classifier.
8) Supervised Learning.
9) Unsupervised Learning.
10) Differences between Supervised Learning/ Unsupervised Learning.

12.9 SOLUTION/ANSWERS

E1) A pattern classification system consists of a sensor, a feature selector/extractor and a classifier: the sensor produces a representation pattern, the feature selector/extractor produces a feature pattern, and the classifier produces the decision.

Optical sensing is used to distinguish two patterns. A camera takes pictures of the object and passes them on to a feature extractor. The feature extractor reduces the data by measuring certain "properties" that distinguish pictures of one object from the other. These features are then passed to a classifier that evaluates the evidence presented and makes a final decision about the object type.

One characteristic of human pattern recognition is that it involves a teacher. Similarly, a machine pattern recognition system needs to be trained. A common mode of learning is to be given a collection of labeled examples, known as a training data set. From the training data set, structure information is distilled and used for classifying new inputs.
E2) The samples of input (when represented by their features) are represented as points in the feature space. If a single feature is used, then we work in a one-dimensional feature space. If the number of features is 2, then we get points in 2D space. We can also have an n-dimensional feature space.

For a new sample x and a given discriminant function, we can decide that x belongs to Class 1 if g(x) > 0; otherwise it belongs to Class 2.

A discriminant function that is a linear combination of the components of x can be written as

g(x) = wT x + w0,

where w is called the weight vector and w0 the threshold weight. These are the parameters that we want to estimate based on training data. A classifier based entirely on linear discriminant functions is called a linear classifier or a linear machine.


E5) LDF assumes that the data are Gaussian. More specifically, it assumes that all classes share the same covariance matrix.
• LDF finds linear decision boundaries in a K - 1 dimensional subspace. As such, it is not suited if there are higher-order interactions between the independent variables.
• LDF is well suited for multi-class problems but should be used with care.

make it surprisingly useful in practice. In particular, the decoupling of the class-conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution.

E7)
• Incrementality: with each training example, the prior and the
likelihood can be updated dynamically. It is flexible and robust to
errors.
• Combines prior knowledge and observed data: the prior probability of a hypothesis is multiplied with the probability of the hypothesis given the training data.
• Probabilistic hypotheses: the output is not only a classification, but a probability distribution over all classes.
• Meta-classification: the outputs of several classifiers can be combined, e.g., by multiplying the probabilities that all classifiers predict for a given class.
E8) The Euclidean distance, Di(x), of a measurement vector, x, from the prototype vector, zi:

Di(x) = ||x - zi|| = [(x - zi)T (x - zi)]^(1/2)
E9) The discriminant function is usually defined as the negative of the separation distance:

di(x) = -Di(x)
E10) Supervised learning algorithms are essentially complex algorithms, categorized as either classification or regression models.

1) Classification Models: Classification models are used for problems where the output variable can be categorized, such as "Yes" or "No", or "Pass" or "Fail". Classification models are used to predict the category of the data. Real-life examples include spam detection, sentiment analysis, scorecard prediction of exams, etc.

• Predicting housing prices based on the prevailing market price


• Stock price predictions, among others.

UNIT 13 OBJECT CLASSIFICATION USING UNSUPERVISED LEARNING
Structure Page No.
13.1 Introduction 321
Objectives
13.2 Introduction to Clustering 321
13.3 Major Clustering Approaches 326
13.4 Clustering Methods 327
13.5 Hierarchical Clustering 328
13.6 Partitional Clustering 334
13.7 k-Means Clustering 336
13.8 Summary 339
13.9 Solution/Answers 339

• define clustering
• define and use different clustering techniques
• apply hierarchical clustering
• use partition-based clustering
• apply k-means clustering.
13.2 INTRODUCTION TO CLUSTERING


The problem of pattern clustering may be regarded as one of discriminating the input data, not between individual patterns, but between populations. These populations are searched for a match with the new object with the help of its features. The objective is to categorize patterns into classes so that patterns belonging to different classes are well separated. The process may start with or without any knowledge of the feature space. When we have prior knowledge about the class of a subset of data, it is called supervised classification or, simply, classification. When nothing is known a priori, the scheme is called unsupervised classification or clustering. In the case of supervised classification, the labelled subset of data is called training data.

Being unsupervised in nature, clustering is a very difficult task, as the same data may reveal many different inherent structures depending on the shape and size of its distribution.
A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way". A cluster is therefore a collection of objects which are "similar" to each other and "dissimilar" to the objects belonging to other clusters. Cluster analysis is also used to form descriptive statistics to ascertain whether or not the data consists of a set of distinct subgroups, each group representing objects with substantially different properties. The latter goal requires an assessment of the degree of difference between the objects assigned to the respective clusters.

The notion of a cluster is not well defined. To better understand the difficulty of deciding what constitutes a cluster, consider Fig. 1, which shows twenty points and three different ways of dividing them into clusters. The shapes of
Fig. 1: Different ways ofClustering Same Data Points

The various examples of clustering applications are as follows:

1) Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs.
2) Land use: Identification of areas of similar land use in an earth observation database.
3) Insurance: Identifying groups of motor insurance policy holders with a high average claim cost.
4) City-planning: Identifying groups of houses according to their house type, value, and geographical location.
5) Earthquake studies: Observed earthquake epicenters should be clustered along continent faults.
6) Image processing: Clustering parts of an image having similar RGB values, so that the image is clustered into regions such as sky, greenery, road, house, etc.

In fact, clustering is one of the most utilized data mining techniques. It has a
long history, and used in almost every field, e.g., medicine, psychology,
botany, sociology, biology, archeology, marketing, insurance, libraries, etc.
In recent years, due to the rapid increase of online documents, text clustering
becomes important.
Let us see some real-life examples of clustering.
Example 1: Group people of similar size together to make "small", "medium" and "large" T-shirts.

- Tailor-made for each person: too expensive.

- One-size-fits-all: does not fit all.


by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns. We can measure the quality of clustering by a dissimilarity/similarity metric. Similarity is expressed in terms of a distance function, which is typically a metric: d(i, j). There is a separate "quality" function that measures the "goodness" of a cluster. The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, and ordinal variables. Weights should be associated with different variables based on applications and data semantics. It is hard to define "similar enough" or "good enough"; the answer is typically highly subjective.
Let us define a cluster in the following definition.

Clusters can be defined as collections of similar objects grouped together. A cluster is a set of entities which are alike, and at the same time entities from different clusters are not alike.

In general, clusters may be defined as collections of points in a test space such that the distance between any two points in the cluster is less than the distance between any point in the cluster and any point outside the cluster.

In general, similarity and dissimilarity between data points is measured as a function of the distance between them. The objects may also be grouped into clusters based on different shapes and sizes.
Cluster analysis embraces a variety of techniques, the main objective of which is to group observations or variables into homogeneous and distinct clusters. A simple numerical example will help explain these objectives. The daily expenditures on food (x1) and clothing (x2) of five persons are shown in Fig. 2.
Person | x1 | x2
a 2 4
b 8 2
c 9 3
d 1 5

according to the degree of proximity among the cluster elements and of the separation among the clusters. Unfortunately, this is not feasible because in most cases in practice the number of all possible clusters is very large and out of reach of current computers. Cluster analysis offers a number of methods that operate much as a person would in attempting to reach systematically a reasonable grouping of observations or variables.

Since clustering is the grouping of similar instances/objects, some sort of measure that can determine whether two objects are similar or dissimilar is required. There are two main types of measures used to estimate this relation: distance measures and similarity measures.

Many clustering methods use distance measures to determine the similarity or dissimilarity between any pair of objects. The formula used to compute the distance between two data points can lead to different classification results. Domain knowledge must be used to guide the formulation of a suitable distance measure for each particular application. For high dimensional data, a popular measure is the
Minkowski metric: d(xi, xj) = ( Σ(k = 1 to d) |xik - xjk|^p )^(1/p),

where d is the dimensionality of the data.


Special Cases:
If p = 2, then the distance is the Euclidean distance, and if p = 1, then the distance is the Manhattan distance.

The commonly used Euclidean distance between two objects is achieved when p = 2:

d(xi, xj) = ((xi1 - xj1)^2 + (xi2 - xj2)^2 + ... + (xid - xjd)^2)^(1/2)

Another well-known measure is the Manhattan distance, which is defined when p = 1:

d(xi, xj) = |xi1 - xj1| + |xi2 - xj2| + ... + |xid - xjd|
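These distance measures can be written directly as a short Python sketch (illustrative only); the example points (2,4) and (8,2) are persons a and b from Fig. 2:

```python
def minkowski(x, y, p):
    # d(x, y) = (sum_k |x_k - y_k|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = [2, 4], [8, 2]
print(minkowski(x, y, 2))   # Euclidean distance: sqrt(36 + 4) ≈ 6.325
print(minkowski(x, y, 1))   # Manhattan distance: 6 + 2 = 8
```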

d(xi, xj) = (r + s) / (q + r + s + t),

where q is the number of attributes that equal 1 for both objects; t is the number of attributes that equal 0 for both objects; and s and r are the numbers of attributes that are unequal for the two objects.

A binary attribute is asymmetric if its states are not equally important (usually the positive outcome is considered more important). In this case, the denominator ignores the unimportant negative matches (t). This is called the Jaccard coefficient:

d(xi, xj) = (r + s) / (q + r + s)
When the attributes are nominal, two main approaches may be used:

i) Simple matching: d(xi, xj) = (p - m) / p, where p is the total number of attributes and m is the number of matches.
ii) Creating a binary attribute for each state of each nominal attribute and computing their dissimilarity as described above.
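A small Python sketch of these binary and nominal dissimilarities (the example vectors are hypothetical):

```python
def binary_dissimilarity(x, y, asymmetric=False):
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)   # 1-1 matches
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)   # 0-0 matches
    mismatches = sum(1 for a, b in zip(x, y) if a != b)     # r + s
    denom = q + mismatches if asymmetric else q + mismatches + t
    return mismatches / denom

def simple_matching(x, y):
    p, m = len(x), sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p

x, y = [1, 0, 1, 1, 0], [1, 1, 0, 1, 0]
print(binary_dissimilarity(x, y))                   # symmetric: 2/5 = 0.4
print(binary_dissimilarity(x, y, asymmetric=True))  # Jaccard: 2/4 = 0.5
print(simple_matching(["red", "small"], ["red", "large"]))  # (2-1)/2 = 0.5
```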
Try the following exercise.

E1) What are the different distance measures used for clustering?

Now, we shall discuss the major clustering approaches in the following section.

13.3 MAJOR CLUSTERING APPROACHES


We shall begin this section by describing the major clustering approaches.
1) Partitioning algorithms: Construct various partitions and then evaluate
them by some criterion.

one example of the exclusive clustering algorithms.

• Overlapping Clustering
Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.
• Hierarchical Clustering
The hierarchical clustering algorithm has two versions: agglomerative clustering and divisive clustering.
• Agglomerative clustering: It is based on the union between the two nearest clusters. The beginning condition is realized by setting every datum as a cluster. After a few iterations it reaches the final clusters wanted. Basically, this is a bottom-up version.
• Divisive clustering: It starts from one cluster containing all data items. At each step, clusters are successively split into smaller clusters according to some dissimilarity. Basically, this is a top-down version.
• Probabilistic Clustering
Probabilistic clustering, e.g. a mixture of Gaussians, uses a completely probabilistic approach.
The following requirements should be satisfied by clustering algorithm.
1) Scalability
2) Dealing with different types of attributes
3) Discovering clusters of arbitrary shape
4) Ability to deal with noise and outliers
5) High dimensionality
6) Insensitivity to the order of attributes
7) Interpretability and usability
Major problems encountered with clustering algorithms are:
• Dealing with a large number of dimensions and a large number of objects

1. Hierarchic versus Non-hierarchic Methods: This is a major distinction involving both the methods and the classification structures designed with them. The hierarchic methods generate clusters as nested structures, in a hierarchical fashion; the clusters of higher levels are aggregations of the clusters of lower levels. Non-hierarchic methods result in a set of un-nested clusters. Sometimes the user, even when he utilizes a hierarchical clustering algorithm, is interested rather in partitioning the set of the entities considered.
2. Agglomerative versus Divisive Methods: The agglomerative method is a bottom-up approach and involves merging smaller clusters into larger ones, while the divisive method is a top-down approach where large clusters are split into smaller ones. Agglomerative methods have been developed for processing mostly similarity/dissimilarity data, while the divisive methods mostly work with attribute-based information, producing attribute-driven subdivisions (conceptual clustering).

Fig. 3: Classical Taxonomy of Clustering Methods (Clustering divides into Hierarchical and Nonhierarchical; Hierarchical into Agglomerative and Divisive; Nonhierarchical into Overlapping and Nonoverlapping)


Try an exercise.

a cluster.

Fig. 4: A Hierarchical Clustering of Four Points: (a) Dendrogram, (b) Nested Clusters

Logically, several approaches are possible to find a hierarchy associated with the data. The popular approach is to construct the hierarchy level-by-level, from bottom to top (agglomerative clustering) or from top to bottom (divisive clustering). Let us discuss the hierarchical clustering methods one by one in detail.
Agglomerative Hierarchical Clustering

Agglomerative hierarchical techniques are the more commonly used methods for clustering. Each object initially represents a cluster of its own. Then clusters are successively merged until the desired cluster structure is obtained. In divisive hierarchical clustering, all objects initially belong to one cluster. Then the cluster is divided into sub-clusters, which are successively divided into their own sub-clusters. This process continues until the desired cluster structure is obtained. The result of the hierarchical methods is a dendrogram, representing the nested grouping of objects and the similarity levels at which groupings change. A clustering of the data objects is obtained by cutting the dendrogram at the desired similarity level. The merging or division of clusters is performed according to some similarity measure, chosen so as to optimize some criterion (such as a sum of squares).

The steps of the general agglomerative clustering algorithm are as follows:

Step 1: Begin with N clusters. Each cluster consists of one sample.

Step 2: Repeat Step 3 a total of N - 1 times.

Fig. 5: Cluster Distance in Nearest Neighbour Method

Example 4: Let us suppose that Euclidean distance is the appropriate measure of proximity. Consider the five observations a, b, c, d and e shown in Fig. 6(b), each forming its own cluster. The distance between each pair of observations is shown in Fig. 6(a).

For example, the distance between a and b is

sqrt((2 - 8)^2 + (4 - 2)^2) = sqrt(36 + 4) = 6.325.


Observations b and e are nearest (most similar) and, as shown in Fig. 6(b), are grouped in the same cluster. Assuming the nearest neighbour method is used, the distance between the cluster (be) and another observation is the smaller of the distances between that observation, on the one hand, and b and e, on the other.

Cluster b c d e
a 6.325 7.071 1.414 7.159
b 0 1.414 7.616 1.118
c 0 8.246 2.062
d 0 8.500
e 0

(a)

(a) Distance matrix after merging b and e:

Cluster | (be) | a | c | d
(be) | 0 | 6.325 | 1.414 | 7.616
a | | 0 | 7.071 | 1.414
c | | | 0 | 8.246
d | | | | 0

Fig. 7: Nearest Neighbour Method (Step 2)
Two pairs of clusters are closest to one another at distance 1.414; these are (a, d) and ((be), c). We arbitrarily select (ad) as the new cluster, as shown in Fig. 7(b).
The distance between (be) and (ad) is

D(be, ad) = min{D(be, a), D(be, d)} = min{6.325, 7.616} = 6.325,

while that between c and (ad) is

D(c, ad) = min{D(c, a), D(c, d)} = min{7.071, 8.246} = 7.071.

The three clusters remaining at this step and the distances between these clusters are shown in Fig. 8(a). We merge (be) with c to form the cluster (bce) shown in Fig. 8(b).

The distance between the two remaining clusters is

D(ad, bce) = min{D(ad, be), D(ad, c)} = min{6.325, 7.071} = 6.325.
The grouping of these two clusters, it will be noted, occurs at a distance of 6.325, a much greater distance than that at which the earlier groupings took place. Fig. 9 shows the final grouping.

Cluster | (be) | (ad) | c
(be) | 0 | 6.325 | 1.414
(ad) | | 0 | 7.071

The groupings and the distances between the clusters are also shown in the tree diagram (dendrogram) of Fig. 10. One usually searches the dendrogram for large jumps in the grouping distance as guidance in arriving at the number of groups. In this example, it is clear that the elements in each of the clusters (ad) and (bce) are close (they were merged at a small distance), but the clusters are distant (the distance at which they merge is large).

Fig. 10: Nearest neighbour method, (Dendrogram)
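The same single-link (nearest neighbour) clustering can be reproduced with SciPy (assumed to be available). The coordinates of a, b, c and d are those of Fig. 2, and e = (8.5, 1) is inferred so as to be consistent with the distance matrix of Fig. 6(a):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

labels = ["a", "b", "c", "d", "e"]
X = np.array([[2, 4], [8, 2], [9, 3], [1, 5], [8.5, 1]])

# Single linkage = nearest neighbour method
Z = linkage(pdist(X), method="single")
print(Z)   # merge heights 1.118, 1.414, 1.414, 6.325 (ties may be ordered differently)

# dendrogram(Z, labels=labels)  # within a matplotlib session this reproduces Fig. 10
```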

Complete-link clustering (also called the diameter method, the maximum method or the furthest neighbour method) considers the distance between two clusters to be equal to the longest distance from any member of one cluster to any member of the other cluster. The nearest neighbour is not the only method for measuring the distance between clusters. Under the furthest neighbour (or complete linkage) method, the distance between two clusters is the distance between their two most distant members. This method tends to produce clusters at the early stages that have objects within a narrow range of distances from each other. If we visualize them as objects in space, the objects in such clusters would have a more spherical shape, as shown in Fig. 11.

Fig. 11: Cluster Distance (Furthest Neighbour Method)

Example 5:

Cluster | (be) | a | c | d
(be) | 0 | 7.159 | 2.062 | 8.500
a | | 0 | 7.071 | 1.414
c | | | 0 | 8.246
d | | | | 0

(a)
Fig. 12: Furthest Neighbour Method (Step 2)

The nearest clusters are (a) and (d), which are now grouped into the cluster (ad). The remaining steps are similarly executed.

You may confirm from Example 4 and Example 5 that the nearest and furthest neighbour methods produce the same results. In other cases, however, the two methods may not agree. Consider Fig. 13(a) as an example. The nearest neighbour method will probably not form the two groups perceived by the naked eye. This is so because at some intermediate step the method will probably merge the two "nose" points joined in Fig. 13(a) into the same cluster, and proceed to string along the remaining points in chain-link fashion. The furthest neighbour method will probably identify the two clusters, because it tends to resist merging clusters the elements of which vary substantially in distance from those of the other cluster. On the other hand, the nearest neighbour method will probably succeed in forming the two groups marked in Fig. 13(b), but the furthest neighbour method will probably not.

(a) x-y coordinates for 6 points    (b) Graph for 6 two-dimensional points
p1 p2 p3 p4 p5 p6
p1 0.00 0.24 0.22 0.37 0.34 0.23
p2 0.24 0.00 0.15 0.20 0.14 0.25
p3 0.22 0.15 0.00 0.15 0.28 0.11
p4 0.37 0.20 0.15 0.00 0.29 0.22
p5 0.34 0.14 0.28 0.29 0.00 0.39
p6 0.23 0.25 0.11 0.22 0.39 0.00
(c) Euclidean Distance Matrix for6 Points
Fig. 17
Perform clustering using
(i) single link clustering
(ii) complete link clustering
(iii) average link clustering
(iv) Ward's method

In the following section, we shall discuss partitional clustering.

13.6 PARTITIONAL CLUSTERING


Partitional clustering begins with a starting cluster partition which is iteratively improved until a locally optimal partition is reached. The starting clusters can be either random or the cluster output from some clustering pre-process (e.g. hierarchical clustering). In the resulting clusters, the objects in the groups together add up to the full object set. Partitioning procedures differ with respect to the methods used to determine the initial partition of the data, how assignments are made during each pass or iteration, and the clustering criterion used. The most frequently used method assigns objects to the clusters having the nearest centroid. This procedure creates initial partitions based on the results from preliminary hierarchical cluster procedures such as the average linkage method or Ward's method, a procedure that resulted in

Step 1: Initialize the cluster centroids to the seed points.

Step 2: For each sample, find the cluster centroid nearest to it. Put the
sample in the cluster identified with the nearest cluster centroid.
Step 3: If no samples changed clusters in Step 2, stop.
Step 4: Compute the centroids of the resulting clusters and go to Step 2.

Let us apply these steps in the following example.


Example 6: Perform partitional clustering using Forgy's method for the
data given in Fig. 18(a) with k = 2 (two clusters). Use the first two sample
points (4,4) and (8,4) as seed points.

       x     y
1      4     4
2      8     4
3      15    8
4      24    4
5      24    12

(a) x-y Coordinates for 5 Points

Sample      Nearest cluster centroid
(4,4)       (4,4)
(8,4)       (8,4)
(15,8)      (8,4)
(24,4)      (8,4)
(24,12)     (8,4)

(b) First Iteration

Sample      Nearest cluster centroid
(4,4)       (4,4)
(8,4)       (4,4)
(15,8)      (17.75,7)
(24,4)      (17.75,7)
(24,12)     (17.75,7)

(c) Second Iteration

Sample      Nearest cluster centroid
(4,4)       (6,4)
(8,4)       (6,4)
(15,8)      (21,8)
(24,4)      (21,8)
(24,12)     (21,8)

(d) Third Iteration

For Step 4, we compute the centroids (6,4) and (21,8) of the clusters. As no
sample changed clusters in the next pass, the algorithm terminates.
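A compact sketch of this Forgy-style procedure, applied to the five points of
Example 6 with the first two points as seeds, is given below (an assumption:
plain Python 3.8+ is available, since math.dist is used). It reproduces the
final centroids (6, 4) and (21, 8):

# A minimal sketch of the partitional (Forgy-style) procedure of this section,
# run on the data of Example 6 with seeds (4,4) and (8,4).
import math

points = [(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]
centroids = [(4, 4), (8, 4)]          # seed points, k = 2

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(p, centroids[i]))

assignment = [None] * len(points)
while True:
    # Step 2: assign every sample to the nearest centroid.
    new_assignment = [nearest(p, centroids) for p in points]
    # Step 3: stop if no sample changed its cluster.
    if new_assignment == assignment:
        break
    assignment = new_assignment
    # Step 4: recompute the centroid of each resulting cluster.
    for i in range(len(centroids)):
        members = [p for p, a in zip(points, assignment) if a == i]
        if members:   # keep the old centroid if a cluster becomes empty
            centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))

print(centroids)    # expected: [(6.0, 4.0), (21.0, 8.0)]
print(assignment)   # expected: [0, 0, 1, 1, 1]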

Try an exercise.

E5) Consider the data

Perform a partitional clustering using

(i) k = 2 and use the first two samples in the list as seed points.
(ii) k = 3 and use the first three samples in the list as seed points.

In the following section, we discuss k-means clustering.

13.7 K-MEANS CLUSTERING


The K-means clustering technique is simple. We first choose k initial
centroids, where k is a user-specified parameter, namely, the number of
clusters desired. Each point is then assigned to the closest centroid, and each
collection of points assigned to a centroid is a cluster. The centroid of each
cluster is then updated based on the points assigned to the cluster. We repeat
the assignment and update steps until no point changes clusters, or
equivalently, until the centroids remain the same. In its simplest form, the
k-means method follows the steps below.

Step 1: Specify the number of clusters and, arbitrarily or deliberately, the
members of each cluster.

In the first step, shown in Fig. 19(a), points are assigned to the initial
centroids, which are all in the larger group of points. For this example, we
use the mean as the centroid. Each part of Fig. 19 shows the centroids at the
beginning of the step and the assignment of points to those centroids; after
the points are assigned, the centroids are updated. In steps 2, 3 and 4, which
are shown in Fig. 19(b), (c) and (d), respectively, two of the centroids move
to the two small groups of points

(a) First Iteration   (b) Second Iteration   (c) Third Iteration   (d) Fourth Iteration

Fig. 19: Using the K-Means Algorithm


at the bottom of the figures. In the second step, points are assigned to the
updated centroids, and the centroids are recomputed once more. When the
K-means algorithm terminates in Fig. 19(d), because no more changes occur,
the centroids have identified the natural groupings of points.
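In practice one rarely codes the loop by hand. The sketch below uses
scikit-learn's KMeans (an assumption: scikit-learn is installed) on synthetic
data that loosely mimics the one large and two small groups of Fig. 19:

# A sketch of K-means with scikit-learn; the blobs below are illustrative
# stand-ins for the point groups of Fig. 19.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 5.0), scale=0.8, size=(40, 2)),   # large group
    rng.normal(loc=(-3.0, 0.0), scale=0.3, size=(8, 2)),    # small group 1
    rng.normal(loc=(3.0, 0.0), scale=0.3, size=(8, 2)),     # small group 2
])

km = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.labels_[:10])       # cluster index of the first ten points
print(km.inertia_)           # sum of squared distances to the nearest centroid

Running the algorithm with several random initializations (n_init) and keeping
the best result is the usual way of coping with its sensitivity to the starting
centroids.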
Let us understand this in the following example.
Example 7: Suppose two clusters are to be formed for the observations listed
in Fig. 20(a). We begin by arbitrarily assigning a, b and d to Cluster 1, and
c and e to Cluster 2. The cluster centroids are calculated as shown in Fig.
20(a).
The cluster centroid is the point with coordinates equal to the average values
of the variables for the observations in that cluster. Thus, the centroid of
Cluster 1 is the point (X1 = 3.67, X2 = 3.67), and that of Cluster 2 the point
(8.75, 2). The two centroids are marked by C1 and C2 in Fig. 20(a). The
cluster's centroid, therefore, can be considered the center of the observations
in that cluster.

(b) Plot of the observations (axes X1, X2) with the centroids C1 and C2
Fig. 20: K-Means Method (Step 1)
The distance of observation a from each of the two cluster centroids is:

D(a, abd) = √((2 − 3.67)² + (4 − 3.67)²) = 1.702.
D(a, ce) = √((2 − 8.75)² + (4 − 2)²) = 7.040.

Observe that a is closer to the centroid of Cluster 1, to which it is currently
assigned. Therefore, a is not reassigned. Next, we calculate the distance
between b and the two cluster centroids:

D(b, abd) = √((8 − 3.67)² + (2 − 3.67)²) = 4.641.

D(b, ce) = √((8 − 8.75)² + (2 − 2)²) = 0.750.
Since b is closer to Cluster 2's centroid than to that of Cluster 1, it is
reassigned to Cluster 2. The new cluster centroids are calculated as shown in
Fig. 21(a). The new centroids are plotted in Fig. 21(b). The distances of the
observations from the new cluster centroids are shown in Fig. 21(c), where an
asterisk indicates the nearest centroid.

Cluster 1                          Cluster 2
Observation    x1    x2            Observation    x1     x2
a              2     4             c              9      3
d              1     5             e              8.5    1
                                   b              8      2
Average        1.5   4.5           Average        8.5    2

(a)

(b) Plot of the observations with the new centroids C1 and C2

Fig. 21: K-Means Method (Step 2)


Now every observation belongs to the cluster whose centroid is nearest to it,
and the k-means method stops. The elements of the two clusters are shown in
Fig. 21(c).
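The arithmetic of Example 7 can be verified with a few lines of Python (the
coordinates are taken from the tables of Figs. 20 and 21; small differences
from the printed values arise from rounding the centroid to 3.67):

# A quick numerical check of the distances computed in Example 7.
import math

obs = {"a": (2, 4), "b": (8, 2), "c": (9, 3), "d": (1, 5), "e": (8.5, 1)}

def centroid(names):
    """Mean of the observations whose labels are listed in `names`."""
    pts = [obs[n] for n in names]
    return tuple(sum(c) / len(pts) for c in zip(*pts))

c1 = centroid("abd")    # initial Cluster 1 = {a, b, d}
c2 = centroid("ce")     # initial Cluster 2 = {c, e}
print(c1, c2)                     # approx (3.67, 3.67) and (8.75, 2)

print(math.dist(obs["a"], c1))    # approx 1.70 -> a stays in Cluster 1
print(math.dist(obs["a"], c2))    # approx 7.04
print(math.dist(obs["b"], c1))    # approx 4.64
print(math.dist(obs["b"], c2))    # 0.75       -> b is reassigned to Cluster 2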

Now, we list the benefits and drawbacks of k-means methods.


Benefits:

1) Very fast algorithm (O(k·d·N), if we limit the number of iterations)
2) Convenient centroid vector for every cluster
3) Can be run multiple times to get different results

Limitations:
1) Difficult to choose the number of clusters, k
2) Cannot be used with arbitrary distances
3) Sensitive to scaling; requires careful preprocessing
4) Does not produce the same result every time
5) Sensitive to outliers (squared errors emphasize outliers)
6) Cluster sizes can be quite unbalanced (e.g., one-element outlier clusters)
Try an exercise.

E6) What are the advantages and disadvantages of k-means clustering
methods?

Now let us summarise what we have learnt in this unit.

The Minkowski metric:

d(X_i, X_j) = ( Σ_{k=1}^{d} | x_ik − x_jk |^p )^(1/p),

where d is the dimensionality of the data.


Special Cases:
• p=2: Euclidean distance

• p=1: Manhattan distance


The commonly used Euclidean distance between two objects is
obtained when p = 2:

d_ij = ( (x_i1 − x_j1)² + (x_i2 − x_j2)² + ... + (x_id − x_jd)² )^(1/2)

Another well-known measure is the Manhattan distance, which is obtained
when p = 1:

d_ij = | x_i1 − x_j1 | + | x_i2 − x_j2 | + ... + | x_id − x_jd |

The Mahalanobis distance is another very important distance measure
used in statistics. It measures the statistical distance between two
populations of Gaussian mixtures having means μ_i and μ_j and a
common covariance matrix Σ. This measure is given by

d = √( (μ_i − μ_j)^T Σ⁻¹ (μ_i − μ_j) )
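These measures are straightforward to compute. The sketch below (the two
vectors and the covariance matrix are illustrative) evaluates the Minkowski
metric for p = 2 and p = 1, and the Mahalanobis distance:

# A small sketch of the distance measures summarized above.
import numpy as np

x = np.array([2.0, 4.0, 1.0])
y = np.array([5.0, 0.0, 2.0])

def minkowski(x, y, p):
    """Minkowski metric: (sum_k |x_k - y_k|^p)^(1/p)."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

print(minkowski(x, y, 2))   # Euclidean distance (p = 2): about 5.10
print(minkowski(x, y, 1))   # Manhattan distance (p = 1): 8.0

# Mahalanobis distance between two population means with a common covariance.
mu_i = np.array([0.0, 0.0])
mu_j = np.array([2.0, 1.0])
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
diff = mu_i - mu_j
print(float(np.sqrt(diff @ np.linalg.inv(sigma) @ diff)))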

E2) i) Exclusive clustering


ii) Overlapping clustering
iii) Agglomerative clustering

E3) Model Based, Graph Theoretic and Spectral clustering approaches.

E4) (i) Single link clustering

(a) Single Link Clustering (b) Single Link Dendrogram


dist({3,6}, {2,5}) = min( dist(3,2), dist(6,2), dist(3,5), dist(6,5) )
                   = min( 0.15, 0.25, 0.28, 0.39 )
                   = 0.15.

(ii) Complete link clustering

(a) Complete Link Clustering (b) Complete Link Dendrogram

(iii) Average link clustering

(a) Group Average Clustering (b) Group Average Dendrogram

dist({3,6,4}, {1}) = (0.22 + 0.37 + 0.23)/(3 × 1)
                   = 0.28

dist({2,5}, {1}) = (0.2357 + 0.3421)/(2 × 1)
                 = 0.2889

dist({3,6,4}, {2,5}) = (0.15 + 0.28 + 0.25 + 0.39 + 0.20 + 0.29)/(3 × 2)
                     = 0.26

(iv) Clustering using Ward's method


(a) Ward's Clustering (b) Ward's Dendrogram

E6) Benefits of the k-means algorithm:

1) Very fast algorithm (O(k·d·N), if we limit the number of
iterations)
2) Convenient centroid vector for every cluster
3) Can be run multiple times to get different results

Limitations of the k-means algorithm:

1) Difficult to choose the number of clusters, k
2) Cannot be used with arbitrary distances
3) Sensitive to scaling; requires careful preprocessing
4) Does not produce the same result every time
5) Sensitive to outliers (squared errors emphasize outliers)
6) Cluster sizes can be quite unbalanced (e.g., one-element outlier
clusters)

