Unit-5
CLUSTERING
2. Image Segmentation
Application area: computer vision. Partitions images into meaningful
segments (e.g., objects and background).
It is used in medical imaging, object recognition, and photo editing
applications.
Dr. Suresh Chimkode
A few key uses of clustering are:
3. Document Clustering
Application areas: information retrieval and natural language processing.
Groups similar documents together for topic extraction, information retrieval,
and document organization.
This is useful for search engines, recommendation systems, and organizing
large collections of text.
4. Anomaly Detection
Application areas: cybersecurity and fraud detection. Identifies unusual
patterns that do not conform to expected behavior.
Clustering can help detect fraudulent transactions, network intrusions, and
other anomalies by grouping normal data points and identifying outliers.
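The idea of "grouping normal data points and identifying outliers" can be sketched as follows. This is a minimal illustration, assuming the cluster centroids of normal data are already known (for example from K-Means) and that a point is flagged when its distance to the nearest centroid exceeds a chosen threshold. The function name flag_outliers, the toy points and the threshold value are all hypothetical.

```python
import math

def flag_outliers(points, centroids, threshold):
    """Return the points whose distance to the nearest centroid exceeds threshold.

    Assumes `centroids` summarize the normal data (e.g., produced by K-Means).
    """
    outliers = []
    for p in points:
        nearest = min(math.dist(p, c) for c in centroids)  # Euclidean distance
        if nearest > threshold:
            outliers.append(p)
    return outliers

# Hypothetical toy data: two groups of normal points plus one far-away point.
normal_centroids = [(1.0, 1.0), (8.0, 8.0)]
points = [(1, 1), (1.5, 0.5), (8, 8), (7.5, 8.5), (20, 1)]
print(flag_outliers(points, normal_centroids, threshold=3.0))  # [(20, 1)]
```

In practice the threshold would be chosen from the data (e.g., from the typical distances of normal points to their centroids); the fixed value here is only for illustration.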
K-Means Clustering
It divides the data into 𝑘 clusters such that the points within each cluster are as
similar as possible, while points in different clusters are as dissimilar as possible.
1. Initialization
Choose the Number of Clusters 𝑘: The user specifies the number of clusters
they want to create.
Initialize Centroids: Randomly select 𝑘 points from the dataset as the initial
centroids. These centroids act as the initial center points for the clusters.
2. Assignment Step
Assign Points to Nearest Centroid:
For each data point in the dataset, calculate its distance to each centroid.
Assign the data point to the cluster whose centroid is closest.
3. Update Step
Recalculate Centroids:
After all points are assigned to clusters, recalculate the centroids.
The new centroid of each cluster is the mean of all the points assigned to
that cluster.
4. Repeat
Iterate: Repeat the Assignment and Update steps until the centroids no longer
change significantly or until a predefined number of iterations is reached.
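The four steps above can be sketched in plain Python. This is a minimal, non-optimized illustration that assumes points are tuples of numbers, uses Euclidean distance, and treats "no longer change significantly" as a maximum centroid shift below a small tolerance tol. The function name kmeans and the parameters max_iters, tol and seed are hypothetical choices for this sketch.

```python
import math
import random

def kmeans(points, k, max_iters=100, tol=1e-6, seed=0):
    """Plain K-Means on a list of coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # 1. Initialization: randomly select k data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # 2. Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # 3. Update step: the new centroid is the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's old centroid
        # 4. Repeat until the centroids stop moving (or max_iters is reached).
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Because the initial centroids are random, different seeds can lead to different local optima on harder data; running the algorithm several times and keeping the result with the lowest WCSS is a common remedy.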
Objective Function (Cost Function)
The goal of the K-Means algorithm is to minimize the sum of squared
distances between the data points and their assigned centroids. This is
known as the within-cluster sum of squared errors (WCSS).
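The objective function to minimize is:
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
where C_j is the set of points assigned to cluster j and \mu_j is its centroid
(the mean of the points in C_j).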
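The WCSS described above can be computed directly from the points, their cluster labels and the centroids. A minimal sketch, assuming Euclidean distance and clusters labelled 0 to k-1; the function name wcss and the toy values are hypothetical.

```python
import math

def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances between points and their centroids."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

points = [(1, 1), (1, 3), (5, 5), (7, 5)]
labels = [0, 0, 1, 1]
centroids = [(1, 2), (6, 5)]  # the mean of each cluster
print(wcss(points, labels, centroids))  # 1 + 1 + 1 + 1 = 4.0
```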
Example Dataset
Cluster Assignments:
Since neither the assignments nor the centroids changed after the update, the
algorithm has converged.
Final Clusters:
For a given pair 𝑚𝑎 (a current medoid) and 𝑚𝑏 (a non-medoid point), the cost of
swapping them is computed; the swap that leads to the greatest improvement in
minimizing the cost function is selected.
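The swap step described above can be sketched as one greedy pass in the style of PAM (Partitioning Around Medoids): every (medoid, non-medoid) pair is tried and the swap with the lowest total cost is kept. This is an illustration under assumptions, not the method from the slides: the cost is taken as the sum of Euclidean distances from each point to its nearest medoid, and the names total_cost and best_swap are hypothetical.

```python
import math
from itertools import product

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def best_swap(points, medoids):
    """Try every (medoid, non-medoid) swap; return the best medoid set and its cost."""
    best, best_cost = list(medoids), total_cost(points, medoids)
    non_medoids = [p for p in points if p not in medoids]
    for i, candidate in product(range(len(medoids)), non_medoids):
        trial = list(medoids)
        trial[i] = candidate            # swap medoid i with a non-medoid point
        cost = total_cost(points, trial)
        if cost < best_cost:            # keep only a strict improvement
            best, best_cost = trial, cost
    return best, best_cost

points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
print(best_swap(points, [(1, 1), (2, 1)]))  # ([(1, 1), (8, 8)], 4.0)
```

Repeating this pass until no swap lowers the cost gives the usual iterative K-Medoids loop.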
4. Recompute Medoids:
After swapping, the clusters are re-formed by assigning each point to its closest
new medoid, and the medoids are updated based on these new clusters.
The point with the minimum total dissimilarity will be chosen as the new
medoid. In this case, 𝑥2 (2,3) has the minimum dissimilarity of 2.82 and will
be selected as the new medoid for Cluster 1, i.e., 𝑥2 is the optimal medoid for
Cluster 1 based on the total dissimilarity calculation.
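The rule "choose the point with the minimum total dissimilarity" can be sketched as below. The sketch assumes Euclidean distance as the dissimilarity; the function name new_medoid and the toy cluster are hypothetical and unrelated to the 𝑥2 example.

```python
import math

def new_medoid(cluster):
    """Return the point with the smallest total distance to all points in the cluster."""
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))

cluster = [(0, 0), (1, 0), (5, 0)]
# Total dissimilarities: (0,0) -> 6, (1,0) -> 5, (5,0) -> 9
print(new_medoid(cluster))  # (1, 0)
```

Unlike a K-Means centroid (here the mean would be (2, 0), which is not a data point), the medoid is always an actual data point, which makes it less sensitive to the outlier (5, 0).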
Divisive hierarchical clustering is also known as top-down clustering. It starts
with all data points in a single cluster and iteratively splits the least cohesive
cluster until each data point is its own cluster or a desired number of clusters
is reached.
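The top-down procedure described above can be sketched as follows. The slides do not specify how a cluster is split or how cohesion is measured, so this sketch makes two labelled assumptions: cohesion is the sum of squared distances (SSE) to the cluster mean (largest SSE = least cohesive), and each split uses 2-means seeded with the two farthest-apart points of the cluster (a "bisecting K-Means" style split). The names sse, split_in_two and divisive are hypothetical.

```python
import math

def sse(cluster):
    """Sum of squared distances to the cluster mean (lower = more cohesive)."""
    mean = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return sum(math.dist(p, mean) ** 2 for p in cluster)

def split_in_two(cluster, iters=20):
    """Split one cluster with 2-means, seeded by its two farthest-apart points."""
    a, b = max(((p, q) for p in cluster for q in cluster),
               key=lambda pq: math.dist(*pq))
    centroids = [a, b]
    for _ in range(iters):
        groups = ([], [])
        for p in cluster:
            nearer = 0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            groups[nearer].append(p)
        centroids = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
    return groups[0], groups[1]

def divisive(points, n_clusters):
    """Top-down clustering: start with one cluster, repeatedly split the least cohesive."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        # Only clusters with at least two distinct points can be split.
        candidates = [i for i, c in enumerate(clusters) if len(c) > 1 and sse(c) > 0]
        if not candidates:
            break
        worst = clusters.pop(max(candidates, key=lambda i: sse(clusters[i])))
        clusters.extend(split_in_two(worst))
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11), (30, 30)]
print(divisive(points, 3))
```

Setting n_clusters to len(points) continues splitting until every point is its own cluster, which corresponds to the other stopping condition in the text.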