
CSE464 & CSE468: Image Processing and Pattern Recognition

Clustering

by:
Hossam El Din Hassan Abd El Munim
‫حسام الدين حسن عبد المنعم‬
Computer & Systems Engineering Dept.,
Ain Shams University,
1 El-Sarayat Street, Abbassia, Cairo 11517

Today

▪ New Topic: Unsupervised Learning


▪ Supervised vs. unsupervised learning
▪ Unsupervised learning
▪ Nonparametric unsupervised learning = clustering
▪ Proximity Measures
▪ Criterion Functions
▪ Flat Clustering
▪ k-means
▪ Hierarchical Clustering
▪ Divisive
▪ Agglomerative
Supervised vs. Unsupervised Learning
▪ Up to now we considered the supervised learning scenario, where we are given
  1. samples x1,…, xn
  2. class labels for all samples x1,…, xn
▪ This is also called learning with a teacher, since the correct answer (the true class) is provided

▪ In the next few lectures we consider the unsupervised learning scenario, where we are only given
  1. samples x1,…, xn
▪ This is also called learning without a teacher, since the correct answer is not provided
▪ we do not split the data into training and test sets
Unsupervised Learning
▪ Data is not labeled

1. Parametric Approach (a lot is known about the data: "easier")
   ▪ assume a parametric distribution of the data
   ▪ estimate the parameters of this distribution
   ▪ much "harder" than the supervised case
2. Nonparametric Approach (little is known about the data: "harder")
   ▪ group the data into clusters; each cluster (hopefully) says something about the categories (classes) present in the data
Why Unsupervised Learning?
▪ Unsupervised learning is harder
  ▪ How do we know if the results are meaningful? No answer labels are available.
    ▪ Let the expert look at the results (external evaluation)
    ▪ Define an objective function on the clustering (internal evaluation)
▪ We nevertheless need it because
  1. Labeling large datasets is very costly (speech recognition)
     ▪ sometimes we can label only a few examples by hand
  2. We may have no idea what/how many classes there are (data mining)
  3. We may want to use clustering to gain some insight into the structure of the data before designing a classifier
     ▪ Clustering as data description
Clustering
▪ Seek "natural" clusters in the data

▪ What is a good clustering?
  ▪ internal (within-cluster) distances should be small
  ▪ external (between-cluster) distances should be large
▪ Clustering is a way to discover new categories (classes)
What we Need for Clustering
1. Proximity measure, either
   ▪ similarity measure s(xi, xk): large if xi, xk are similar
   ▪ dissimilarity (or distance) measure d(xi, xk): small if xi, xk are similar
   [figure: dissimilar points have large d, small s; similar points have large s, small d]

2. Criterion function to evaluate a clustering
   [figure: examples of a good clustering and a bad clustering]

3. Algorithm to compute the clustering
   ▪ For example, by optimizing the criterion function
How Many Clusters?
[figure: the same data set interpreted as 3 clusters or 2 clusters]

▪ Possible approaches
  1. fix the number of clusters to k
  2. find the best clustering according to the criterion function (the number of clusters may vary)
Proximity Measures
▪ a good proximity measure is VERY application dependent
  ▪ Clusters should be invariant under the transformations "natural" to the problem
  ▪ For example, for object recognition, we should have invariance to rotation (a rotated object is at distance 0 from the original)
  ▪ For character recognition, no invariance to rotation (a rotated "9" becomes a "6")
Distance (Dissimilarity) Measures
▪ Euclidean distance
  $$d(x_i, x_j) = \sqrt{\sum_{k=1}^{d} \left( x_i^{(k)} - x_j^{(k)} \right)^2}$$
  ▪ translation invariant

▪ Manhattan (city block) distance
  $$d(x_i, x_j) = \sum_{k=1}^{d} \left| x_i^{(k)} - x_j^{(k)} \right|$$
  ▪ approximation to Euclidean distance, cheaper to compute

▪ Chebyshev distance
  $$d(x_i, x_j) = \max_{1 \le k \le d} \left| x_i^{(k)} - x_j^{(k)} \right|$$
  ▪ approximation to Euclidean distance, cheapest to compute
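As a quick illustration (not from the slides), a minimal NumPy sketch of the three distances:

```python
import numpy as np

def euclidean(xi, xj):
    """Euclidean distance: square root of the sum of squared coordinate differences."""
    return np.sqrt(np.sum((xi - xj) ** 2))

def manhattan(xi, xj):
    """Manhattan (city block) distance: sum of absolute coordinate differences."""
    return np.sum(np.abs(xi - xj))

def chebyshev(xi, xj):
    """Chebyshev distance: largest absolute coordinate difference."""
    return np.max(np.abs(xi - xj))

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([4.0, 0.0, 3.0])
print(euclidean(xi, xj), manhattan(xi, xj), chebyshev(xi, xj))  # ~3.606, 5.0, 3.0
```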
Similarity Measures
▪ Cosine similarity:
  $$s(x_i, x_j) = \frac{x_i^T x_j}{\lVert x_i \rVert \, \lVert x_j \rVert}$$
  ▪ the smaller the angle, the larger the similarity
  ▪ scale invariant measure
  ▪ popular in text retrieval

▪ Correlation coefficient
  ▪ popular in image processing
  $$s(x_i, x_j) = \frac{\sum_{k=1}^{d} \left( x_i^{(k)} - \bar{x}_i \right) \left( x_j^{(k)} - \bar{x}_j \right)}{\left[ \sum_{k=1}^{d} \left( x_i^{(k)} - \bar{x}_i \right)^2 \sum_{k=1}^{d} \left( x_j^{(k)} - \bar{x}_j \right)^2 \right]^{1/2}}$$
Feature Scale
Simplest Clustering Algorithm
• Two issues
  1. How to measure similarity between samples?
  2. How to evaluate a partitioning?
• If distance is a good measure of dissimilarity, the distance between samples in the same cluster must be smaller than the distance between samples in different clusters
• Two samples belong to the same cluster if the distance between them is less than a threshold d0
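A minimal sketch of this threshold rule, assuming we interpret it as taking the connected components of the graph whose edges join samples closer than d0 (the function name and the Euclidean distance choice are illustrative assumptions):

```python
import numpy as np

def threshold_clustering(X, d0):
    """Group samples into connected components of the graph whose edges
    join pairs of samples whose Euclidean distance is below d0."""
    n = len(X)
    labels = -np.ones(n, dtype=int)        # -1 means "not yet assigned"
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        # traverse the "distance < d0" graph from sample i (flood fill)
        while stack:
            p = stack.pop()
            dists = np.linalg.norm(X - X[p], axis=1)
            neighbors = np.where((dists < d0) & (labels == -1))[0]
            labels[neighbors] = current
            stack.extend(neighbors.tolist())
        current += 1
    return labels

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(threshold_clustering(X, d0=1.0))  # [0 0 1 1]
```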
Scaling Axis
Criterion Functions for Clustering
▪ Have samples x1,…,xn
▪ Suppose we partition the samples into c subsets D1,…,Dc
  [figure: a data set partitioned into clusters D1, D2, D3]
▪ There are approximately c^n / c! distinct partitions
▪ Can define a criterion function J(D1,…,Dc) which measures the quality of a partitioning D1,…,Dc
▪ Then the clustering problem is a well-defined problem
  ▪ the optimal clustering is the partition which optimizes the criterion function
SSE Criterion Function
▪ Let ni be the number of samples in Di, and define the mean of the samples in Di:
  $$\mu_i = \frac{1}{n_i} \sum_{x \in D_i} x$$
▪ Then the sum-of-squared-errors criterion function (to minimize) is:
  $$J_{SSE} = \sum_{i=1}^{c} \sum_{x \in D_i} \lVert x - \mu_i \rVert^2$$
▪ Note that the number of clusters, c, is fixed
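A short sketch of this criterion in code (illustrative; it assumes a NumPy array X of samples and one integer cluster label per sample):

```python
import numpy as np

def sse(X, labels):
    """Sum-of-squared-errors criterion: squared distance of every sample
    to the mean of its own cluster, summed over all clusters."""
    total = 0.0
    for c in np.unique(labels):
        cluster = X[labels == c]
        mu = cluster.mean(axis=0)              # cluster mean
        total += np.sum((cluster - mu) ** 2)   # squared errors within the cluster
    return total
```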


SSE Criterion Function
$$J_{SSE} = \sum_{i=1}^{c} \sum_{x \in D_i} \lVert x - \mu_i \rVert^2$$

▪ The SSE criterion is appropriate when the data forms compact clouds that are relatively well separated

▪ The SSE criterion favors equally sized clusters, and may not be appropriate when the "natural" groupings have very different sizes
  [figure: with clusters of very different sizes, the natural grouping has large JSSE while an "unnatural" balanced split has small JSSE]
K-means Clustering
▪ We now consider an example of an iterative optimization algorithm for the special case of the JSSE objective function
  $$J_{SSE} = \sum_{i=1}^{k} \sum_{x \in D_i} \lVert x - \mu_i \rVert^2$$
  ▪ for a different objective function, we need a different optimization algorithm, of course

▪ Fix the number of clusters to k (c = k)

▪ k-means is probably the most famous clustering algorithm
  ▪ it has a smart way of moving from the current partitioning to the next one
K-means Clustering (k = 3)
[figure: k-means iterations on a 2-D data set with k = 3; crosses mark the cluster centers]

1. Initialize
   ▪ pick k cluster centers arbitrarily
   ▪ assign each example to the closest center

2. Compute the sample mean of each cluster

3. Reassign all samples to the closest mean

4. If the clusters changed at step 3, go to step 2
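A compact NumPy sketch of these four steps (an illustrative implementation, not the course's own code; the initialization here simply picks k random samples as the initial centers):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means: alternately assign samples to the nearest center and
    recompute each center as the mean of its cluster, until membership is stable."""
    rng = np.random.default_rng(seed)
    # step 1: pick k distinct samples as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # assign every sample to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # step 4: stop if cluster membership did not change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # steps 2-3: recompute each center as the sample mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# usage: three Gaussian blobs, k = 3
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 3], [0, 3])])
labels, centers = kmeans(X, k=3)
```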


K-means Clustering (Means Derivation)
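The derivation itself is not carried over in the extracted text; as a minimal sketch of the standard argument, for fixed cluster assignments the center that minimizes the contribution of cluster Di to JSSE is its sample mean:

$$\frac{\partial}{\partial \mu_i} \sum_{x \in D_i} \lVert x - \mu_i \rVert^2 = -2 \sum_{x \in D_i} (x - \mu_i) = 0 \;\;\Longrightarrow\;\; \mu_i = \frac{1}{n_i} \sum_{x \in D_i} x$$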
K-means Clustering
▪ In practice, k-means clustering usually performs well

▪ It is very efficient

▪ Its solution can be used as a starting point for other clustering algorithms

▪ There are still hundreds of papers on variants and improvements of k-means clustering every year
Clustering Criteria Again

• Scatter criteria
  • Scatter matrices used in multiple discriminant analysis, i.e., the within-cluster scatter matrix SW and the between-cluster scatter matrix SB
    ST = SB + SW
• Note:
  • ST does not depend on the partitioning
  • In contrast, SB and SW depend on the partitioning
• Two approaches:
  • minimize the within-cluster scatter
  • maximize the between-cluster scatter


• The trace (sum of diagonal elements) is the simplest scalar measure of the scatter matrix:
  $$\mathrm{tr}[S_W] = \sum_{i=1}^{c} \mathrm{tr}[S_i] = \sum_{i=1}^{c} \sum_{x \in D_i} \lVert x - m_i \rVert^2 = J_e$$
  • proportional to the sum of the variances in the coordinate directions
  • This is the sum-of-squared-error criterion, JSSE.


• As tr[ST] = tr[SW] + tr[SB] and tr[ST] is independent of the partitioning, no new results can be derived by minimizing tr[SB]

• However, seeking to minimize the within-cluster criterion JSSE = tr[SW] is equivalent to maximizing the between-cluster criterion
  $$\mathrm{tr}[S_B] = \sum_{i=1}^{c} n_i \lVert m_i - m \rVert^2$$
  where m is the total mean vector:
  $$m = \frac{1}{n} \sum_{x \in D} x = \frac{1}{n} \sum_{i=1}^{c} n_i m_i$$
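As an illustration of these identities (not part of the slides), a small NumPy check that tr[ST] = tr[SW] + tr[SB] holds for any partitioning:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-cluster (SW), between-cluster (SB) and total (ST) scatter matrices."""
    m = X.mean(axis=0)                              # total mean vector
    d = X.shape[1]
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        SW += (Xc - mc).T @ (Xc - mc)               # sum of (x - mi)(x - mi)^T
        SB += len(Xc) * np.outer(mc - m, mc - m)    # ni (mi - m)(mi - m)^T
    ST = (X - m).T @ (X - m)
    return SW, SB, ST

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
labels = rng.integers(0, 3, size=100)               # an arbitrary partitioning
SW, SB, ST = scatter_matrices(X, labels)
print(np.allclose(ST, SW + SB))                     # True: ST = SW + SB
print(np.trace(SW))                                 # equals Je for this partition
```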


Iterative Optimization
• Clustering → a discrete optimization problem
• A finite data set → a finite number of partitions
• What is the cost of exhaustive search?
  • approximately c^n / c! partitions for c clusters; for c = 5 and n = 100 this is already on the order of 10^67, so exhaustive search is not a good idea

• Typically iterative optimization is used:
  • start from a reasonable initial partition
  • redistribute samples so as to minimize the criterion function
  • this guarantees only local, not global, optimization
• Consider an iterative procedure to minimize the sum-of-squared-error criterion Je
  $$J_e = \sum_{i=1}^{c} J_i \quad \text{where} \quad J_i = \sum_{x \in D_i} \lVert x - m_i \rVert^2$$
  where Ji is the effective error per cluster.

• Moving a sample x̂ from cluster Di to Dj changes the errors in the two clusters by:
  $$J_j^* = J_j + \frac{n_j}{n_j + 1} \lVert \hat{x} - m_j \rVert^2 \qquad\qquad J_i^* = J_i - \frac{n_i}{n_i - 1} \lVert \hat{x} - m_i \rVert^2$$


• Hence, the transfer is advantageous if the decrease in Ji is larger than the increase in Jj:
  $$\frac{n_i}{n_i - 1} \lVert \hat{x} - m_i \rVert^2 > \frac{n_j}{n_j + 1} \lVert \hat{x} - m_j \rVert^2$$
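A hedged sketch of this single-sample update rule (assuming Euclidean distance and that cluster means and sizes are kept up to date; the helper name try_move and the array layout are illustrative, not from the text):

```python
import numpy as np

def try_move(x_hat, i, j, means, counts):
    """Move sample x_hat from cluster i to cluster j if the decrease in Ji
    exceeds the increase in Jj; update means and counts incrementally.
    Returns True if the move was made."""
    if i == j or counts[i] <= 1:
        return False
    decrease = counts[i] / (counts[i] - 1) * np.sum((x_hat - means[i]) ** 2)
    increase = counts[j] / (counts[j] + 1) * np.sum((x_hat - means[j]) ** 2)
    if decrease <= increase:
        return False
    # incremental mean updates: remove x_hat from cluster i, add it to cluster j
    means[i] = (counts[i] * means[i] - x_hat) / (counts[i] - 1)
    means[j] = (counts[j] * means[j] + x_hat) / (counts[j] + 1)
    counts[i] -= 1
    counts[j] += 1
    return True
```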


• Algorithm 3 is the sequential version of the k-means algorithm
  • Algorithm 3 updates each time a sample is reclassified
  • k-means waits until all n samples have been reclassified before updating

• Algorithm 3 can get trapped in local minima
  • the result depends on the order of the samples
  • However, it is at least a stepwise optimal procedure, and it can easily be modified to apply to problems in which samples are acquired sequentially and clustering must be done on-line.

• Algorithm 2 is the fuzzy k-means clustering.


Selecting the Number of Clusters (Elbow Method)
[figure: criterion value (e.g., JSSE) plotted against the number of clusters k; the "elbow" of the curve suggests a good k]
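Since this slide is figure-only in the extraction, here is a brief sketch of how such an elbow curve could be produced, reusing the kmeans() and sse() sketches from earlier in these notes (illustrative only):

```python
import numpy as np
import matplotlib.pyplot as plt

# assumes the kmeans() and sse() functions sketched earlier in these notes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.4, size=(60, 2))
               for loc in ([0, 0], [4, 0], [2, 3])])   # 3 "natural" clusters

ks = range(1, 9)
errors = []
for k in ks:
    labels, _ = kmeans(X, k)
    errors.append(sse(X, labels))       # JSSE decreases as k grows

plt.plot(list(ks), errors, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("J_SSE")
plt.title("Elbow method: look for the bend in the curve")
plt.show()
```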

Hierarchical Clustering
• Many times, clusters are not disjoint; a cluster may have subclusters, which in turn have sub-subclusters, etc.
• Consider a sequence of partitions of the n samples into c clusters
  • The first is a partition into n clusters, each containing exactly one sample
  • The second is a partition into n-1 clusters, the third into n-2, and so on, until the n-th, in which there is only one cluster containing all of the samples
  • At level k in the sequence, c = n-k+1
• Given any two samples x and x', they will be grouped together at some level, and once they are grouped at level k, they remain grouped for all higher levels
• Hierarchical clustering → a tree representation called a dendrogram


• Are the groupings natural or forced? Check the similarity values
  • Evenly distributed similarity → no justification for grouping

• Another representation is based on sets, e.g., Venn diagrams


• Hierarchical clustering can be divided into agglomerative and divisive approaches.

• Agglomerative (bottom up, clumping): start with n singleton clusters and form the sequence by merging clusters

• Divisive (top down, splitting): start with all of the samples in one cluster and form the sequence by successively splitting clusters


Agglomerative Hierarchical Clustering

• The procedure terminates when the specified number of clusters has been obtained, and returns the clusters as sets of points, rather than a mean or a representative vector for each cluster
• At any level, the distance between the nearest clusters can provide the dissimilarity value for that level
• To find the nearest clusters, one can use
  $$d_{min}(D_i, D_j) = \min_{x \in D_i,\, x' \in D_j} \lVert x - x' \rVert$$
  $$d_{max}(D_i, D_j) = \max_{x \in D_i,\, x' \in D_j} \lVert x - x' \rVert$$
  $$d_{avg}(D_i, D_j) = \frac{1}{n_i n_j} \sum_{x \in D_i} \sum_{x' \in D_j} \lVert x - x' \rVert$$
  $$d_{mean}(D_i, D_j) = \lVert m_i - m_j \rVert$$
  which behave quite similarly if the clusters are hyperspherical and well separated.
• The computational complexity is O(cn^2 d^2), n >> c
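The four cluster-to-cluster distances are easy to state in code; a minimal NumPy sketch (illustrative, assuming Di and Dj are arrays of samples from two clusters):

```python
import numpy as np

def pairwise(Di, Dj):
    """All Euclidean distances between samples of cluster Di and cluster Dj."""
    return np.linalg.norm(Di[:, None, :] - Dj[None, :, :], axis=2)

def d_min(Di, Dj):  return pairwise(Di, Dj).min()    # used by single linkage
def d_max(Di, Dj):  return pairwise(Di, Dj).max()    # used by complete linkage
def d_avg(Di, Dj):  return pairwise(Di, Dj).mean()   # used by average linkage
def d_mean(Di, Dj): return np.linalg.norm(Di.mean(axis=0) - Dj.mean(axis=0))
```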
Nearest-Neighbor Algorithm (Single Linkage)
• dmin is used
• The use of dmin as the distance measure in agglomerative clustering:
  [figure: example of single-linkage merging]


Farthest-Neighbor Algorithm (Complete Linkage)
• dmax is used
• When two clusters are merged, the graph is changed by adding edges between every pair of nodes in the two clusters

• All procedures involving minima or maxima are sensitive to outliers. The use of dmean or davg is a natural compromise.
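For completeness, a small usage sketch with SciPy's hierarchical clustering routines, which implement single, complete, and average linkage directly (library usage shown for illustration; it is not part of the lecture):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2))
               for loc in ([0, 0], [3, 3], [0, 3])])

# 'single' corresponds to d_min, 'complete' to d_max, 'average' to d_avg
Z = linkage(X, method="complete")

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
dendrogram(Z, no_labels=True)                     # the tree (dendrogram) view
plt.show()
```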
Applications of Clustering
▪ Image segmentation
  ▪ Find interesting "objects" in images to focus attention on

[figure from: Image Segmentation by Nested Cuts, O. Veksler, CVPR 2000]


Applications of Clustering
▪ Image Database Organization
▪ for efficient search
Applications of Clustering
▪ Data Mining
▪ Technology watch
  ▪ The Derwent database contains all patents filed worldwide in the last 10 years
  ▪ Searching by keywords leads to thousands of documents
  ▪ Find clusters in the database to see if there are any emerging technologies and what the competition is up to
▪ Marketing
▪ Customer database
▪ Find clusters of customers and tailor marketing
schemes to them
Applications of Clustering
▪ Profiling Web Users
▪ Use web access logs to generate a feature vector for
each user
▪ Cluster users based on their feature vectors
▪ Identify common goals for users
▪ Shopping
▪ Job Seekers
▪ Product Seekers
▪ Tutorials Seekers
▪ Can use clustering results to improve web content and design
Summary
▪ Clustering (nonparametric unsupervised learning)
is useful for discovering inherent structure in data
▪ Clustering is immensely useful in different fields
▪ Clustering comes naturally to humans (in up to 3
dimensions), but not so to computers
▪ It is very easy to design a clustering algorithm, but it is very hard to say whether it does anything good
▪ General-purpose clustering does not exist; for best results, clustering should be tuned to the application at hand
