K-Means Clustering in Image Segmentation
Pragna Goura
Clustering
- Clustering is an unsupervised classification technique: it groups similar data together.
- Given a vector of N measurements describing each pixel (or group of pixels, i.e.,
region) in an image, similarity of the measurement vectors, and hence their
proximity in the N-dimensional measurement space, implies similarity of the
corresponding pixels or pixel groups.
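To make the measurement space concrete, here is a minimal sketch, assuming NumPy and using random data as a stand-in for a real H x W x 3 RGB image, that turns an image into per-pixel measurement vectors:

import numpy as np

# Stand-in for a real input: a hypothetical 120 x 160 RGB image.
image = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)

# One row per pixel, one column per measurement (N = 3 for RGB).
# Rows that lie close together in this 3-D measurement space
# correspond to similar-looking pixels.
measurements = image.reshape(-1, 3).astype(np.float64)
print(measurements.shape)  # (19200, 3)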
K-Means
- K-Means clustering finds a grouping of the measurements that minimizes
the within-cluster sum-of-squares.
- In this method, each measurement, represented by a vector of length N,
is assigned to one of a fixed number of clusters. The number of clusters
is determined by the number of seeds given as the second argument of
K-Means.
- Measurements are transferred from one cluster to another when doing
so decreases the within-cluster distances. The algorithm stops when no
more transfers can occur.
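As a rough illustration of this transfer-until-stable procedure, here is a plain Lloyd-style K-Means sketch in NumPy. The seeds parameter plays the role of the seed list described above; this is a teaching sketch, not an optimized implementation:

import numpy as np

def kmeans(measurements, seeds, max_iters=100):
    # The number of clusters equals the number of seeds.
    centroids = np.asarray(seeds, dtype=np.float64)
    for _ in range(max_iters):
        # Assignment step: each measurement joins its nearest centroid.
        dists = np.linalg.norm(
            measurements[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous position.
        new_centroids = np.array([
            measurements[labels == k].mean(axis=0)
            if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        # Stop when no more transfers occur (centroids unchanged).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

Reshaping labels back to the image's height and width yields the segmentation; replacing each pixel with its centroid color gives a quick visual check.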
Why Initialization Matters
K-Means works by iteratively refining the centroid positions until they stabilize, i.e., until no data point changes its
cluster assignment. However, where the centroids start can affect:
● Which points are grouped together: Poor initialization may cause the algorithm to converge to
suboptimal clusters where similar points are incorrectly grouped.
● The number of iterations: Bad initial centroids may cause the algorithm to take longer to converge or
get stuck in a local minimum.
● Segmentation quality: In image segmentation, this can lead to improper or inaccurate segmentation,
particularly in cases where the image has complex structures or subtle intensity variations.
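One quick way to see this sensitivity is a sketch using scikit-learn's KMeans with a single random initialization per run (n_init=1), on synthetic blob data standing in for flattened pixel vectors; differing final inertias (within-cluster sums-of-squares) and iteration counts across seeds show the dependence on the starting centroids:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic clustered data as a stand-in for image pixels.
X, _ = make_blobs(n_samples=3000, centers=6, random_state=0)

for seed in range(3):
    # init="random", n_init=1: one random initialization, no restarts.
    km = KMeans(n_clusters=6, init="random", n_init=1,
                random_state=seed).fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.1f}  iterations={km.n_iter_}")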
Types of Initialization in K-Means
Random Initialization (Standard K-Means): The most basic method is to randomly pick K data points from the dataset (or the image) as the initial centroids.
K-Means++ Initialization: The algorithm first selects one centroid at random. It then chooses each subsequent centroid by favoring data points that are far from the
centroids already chosen. This tends to spread the centroids across the dataset, leading to more reliable clusters.
● Pros: Reduces the chances of poor initialization, often leading to better clustering results and faster convergence.
● Cons: Slightly slower than random initialization but more effective for complex data.
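For reference, here is a sketch of the K-Means++ seeding step itself, assuming NumPy: each new centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far:

import numpy as np

def kmeans_pp_init(X, k, seed=None):
    rng = np.random.default_rng(seed)
    # First centroid: uniformly at random from the data.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen centroid.
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2
              ).sum(axis=2).min(axis=1)
        # Far-away points are proportionally more likely to be picked.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)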
Forgy Initialization:
● Another method of randomly selecting K centroids directly from the data points (similar to random initialization).
● Pros: Simple to implement.
● Cons: Same limitations as random initialization; prone to suboptimal clustering.
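A Forgy-style pick is only a few lines in NumPy (a sketch; X is any array of measurement vectors):

import numpy as np

def forgy_init(X, k, seed=None):
    # Pick k distinct data points uniformly at random as centroids.
    rng = np.random.default_rng(seed)
    return X[rng.choice(len(X), size=k, replace=False)]

Because nothing keeps the picks apart, two initial centroids can land in the same dense region, which is exactly the suboptimal-clustering risk noted above.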