ML Unit III
Clustering
• Unsupervised learning
• Requires data, but no labels
• Detect patterns, e.g.:
  Group emails or search results
  Customer shopping patterns
  Regions of images
Clustering
• Clustering is a technique in machine learning used to group similar data
points together in an unsupervised manner.
• In clustering, the goal is to partition a set of data points into subsets or
clusters based on the similarity of their attributes or features.
• The clusters are formed such that the data points within a cluster are more
similar to each other than to those in other clusters.
Example: Let's understand the clustering technique with a real-world example of a shopping
mall: when we visit a mall, we can observe that items with similar usage are grouped
together. T-shirts are grouped in one section and trousers in another; similarly, in the
vegetable section, apples, bananas, mangoes, etc., are kept in separate groups so that we
can easily find what we need. The clustering technique works in the same way.
▪ In general, clustering is a grouping of objects such that the objects in a group (cluster) are
similar (or related) to one another and different from (or unrelated to) the objects
in other groups.
▪ A good clustering keeps intra-cluster distances small (points within a cluster are close
together) and inter-cluster distances large (clusters are well separated).
Why Clustering
• Clustering is important because it reveals the intrinsic grouping present in unlabelled data.
• There is no single criterion for a good clustering; it depends on the user and on the
criteria that satisfy their need.
• For instance, we could be interested in finding representatives for homogeneous groups
(data reduction), in finding "natural clusters" and describing their unknown properties
("natural" data types), or in finding useful and suitable groupings ("useful" data classes).
• A clustering algorithm must make some assumptions about what constitutes the similarity
of points, and each assumption leads to different, equally valid clusters.
Clustering Applications
• Customer segmentation: dividing a company's customers into groups that reflect similarity
among the customers in each group.
• Fraud detection: using techniques such as K-Means clustering, one can identify patterns of
unusual activity; detecting an outlier may indicate that a fraud event has taken place.
• Document grouping
• Image segmentation
• Anomaly detection
Clustering Methods
• Partitioning Methods: These methods partition the objects into k clusters, and each
partition forms one cluster. They optimize an objective criterion (a similarity function),
typically based on distance. Example: K-Means.
• Hierarchical Methods: The clusters formed by these methods have a tree-type structure
based on the hierarchy. They are divided into two categories:
  • Agglomerative (bottom-up approach)
  • Divisive (top-down approach)
• Density-Based Methods: These methods consider clusters to be dense regions of the space
having some similarities, separated from lower-density regions. These methods have good
accuracy and the ability to merge two clusters.
Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Dunn Index: Performance Measure
• The Dunn Index is used to identify dense and well-separated clusters.
• It is the ratio between the minimum inter-cluster distance and the maximum intra-cluster
distance:
  Dunn Index = min over i ≠ j of d(i, j) / max over k of d'(k)
• Here d(i, j) is the inter-cluster distance between clusters i and j (the numerator takes the
minimum over all pairs of clusters), and d'(k) is the intra-cluster distance (diameter) of
cluster k (the denominator takes the maximum over all clusters).
• Algorithms that create clusters with a high Dunn index are more desirable, since such
clusters are more compact and better separated from each other. A small computational
sketch is given below.
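• A minimal computational sketch (not from the original slides) of the Dunn index for a data
matrix and cluster labels, assuming Euclidean distances; the helper name dunn_index is our own:

# Illustrative sketch: Dunn index from data X and cluster labels (assumes Euclidean distance).
import numpy as np
from scipy.spatial.distance import cdist

def dunn_index(X, labels):
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Maximum intra-cluster distance (cluster "diameter")
    max_intra = max(cdist(c, c).max() for c in clusters)
    # Minimum inter-cluster distance over all pairs of clusters
    min_inter = min(cdist(clusters[i], clusters[j]).min()
                    for i in range(len(clusters))
                    for j in range(i + 1, len(clusters)))
    return min_inter / max_intra

X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))   # higher values indicate compact, well-separated clusters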
K-Means Clustering
• K-Means clustering is an unsupervised iterative clustering technique.
• It partitions the given data set into k predefined distinct clusters.
• A cluster is defined as a collection of data points exhibiting certain similarities
K-Means Clustering
It partitions the data set such that:
• Each data point belongs to the cluster with the nearest mean.
K-Means Clustering
K-Means Clustering Algorithm-
Step-01: Choose the number of clusters K.
Step-02: Randomly select K data points as the initial cluster centers.
Step-03: Calculate the distance between each data point and each cluster center.
Step-04:
• Assign each data point to some cluster.
• A data point is assigned to the cluster whose center is nearest to that data point.
Step-05:
• Re-compute the centers of the newly formed clusters.
• The center of a cluster is computed by taking the mean of all the data points contained in that
cluster.
Step-06:
Keep repeating the procedure from Step-03 to Step-05 until any of the following stopping
criteria is met:
• The centers of the newly formed clusters do not change
• Data points remain in the same clusters
• The maximum number of iterations is reached
K-Means Clustering
• More detailed version of the algorithm:
• Input: A set of n data points X = {x1, x2, ..., xn}, and a number k of clusters
to form.
• Output: A set of k cluster centroids C = {c1, c2, ..., ck} and a set of k
clusters S = {S1, S2, ..., Sk}.
• Randomly select k data points as the initial centroids C = {c1, c2, ..., ck}.
• Repeat until convergence:
• a. For each data point xi, find the nearest centroid cj using Euclidean
distance.
• b. Assign xi to the cluster with centroid cj.
• c. Update the centroid cj by calculating the mean of all data points
assigned to it.
• Return the set of k centroids C and the set of k clusters S. (A NumPy sketch of this
procedure is given below.)
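• A compact NumPy sketch of the procedure listed above (our own illustration, not code from
the slides); it assumes numeric data and Euclidean distance, and omits empty-cluster handling:

# Illustrative NumPy sketch of the K-Means procedure described above.
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly select k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster with the nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recompute each centroid as the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop when the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
centroids, labels = kmeans(X, k=2)
print(centroids, labels)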
Advantages
Advantages of K-means clustering algorithm
• Relatively easy to understand and implement.
Disadvantages
Disadvantages of K-means clustering algorithm
• Choosing K manually and being dependent on the initial values
Elbow Method
K Means Clustering Using the Elbow Method
• In the Elbow method, we vary the number of clusters (K), for example from 1 to 10.
• For each value of K, we calculate the WCSS (Within-Cluster Sum of Squares).
• WCSS is the sum of the squared distances between each point and the centroid of its cluster.
• When we plot WCSS against the K value, the plot looks like an elbow.
• As the number of clusters increases, the WCSS value decreases.
• The WCSS value is largest when K = 1. When we analyze the graph, we can see that it changes
rapidly at some point, creating an elbow shape.
• From this point onwards, the graph moves almost parallel to the X-axis. The K value
corresponding to this point is the optimal value of K, i.e., the optimal number of clusters.
A short code sketch is given below.
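• A short scikit-learn sketch of the Elbow method described above (illustrative; it assumes X
is a 2-D NumPy array of features, and uses the inertia_ attribute of KMeans, which is the WCSS):

# Sketch of the Elbow method with scikit-learn (X is assumed to be the feature matrix).
from sklearn.cluster import KMeans
import matplotlib.pyplot as mtp

wcss = []
for k in range(1, 11):                      # vary K from 1 to 10
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    km.fit(X)
    wcss.append(km.inertia_)                # inertia_ = within-cluster sum of squares
mtp.plot(range(1, 11), wcss, marker='o')
mtp.title('The Elbow Method')
mtp.xlabel('Number of clusters (K)')
mtp.ylabel('WCSS')
mtp.show()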
Variations
▪ K-medoids: Similar problem definition as in K-means, but the centroid of the cluster
is defined to be one of the points in the cluster (the medoid).
▪ A medoid can be defined as a point in the cluster, whose dissimilarities with all the
other points in the cluster are minimum.
Hierarchical Clustering
Introduction
• Hierarchical clustering is another unsupervised machine learning algorithm used to group
unlabeled datasets into clusters; it is also known as Hierarchical Cluster Analysis (HCA).
• In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this
tree-shaped structure is known as the dendrogram.
• A dendrogram is a diagram that shows the hierarchical relationship between objects.
• It is most commonly created as an output of hierarchical clustering.
• The main use of a dendrogram is to work out the best way to allocate objects to clusters.
• Hierarchical clustering algorithms group similar objects into groups called clusters.
Hierarchical Clustering: Introduction
• There are two types of hierarchical clustering algorithms:
• Agglomerative — Bottom up approach. Start with many small clusters and merge
them together to create bigger clusters.
▪ Start with the points as individual clusters
▪ At each step, merge the closest pair of clusters until only one cluster (or k clusters) left
• Divisive — Top down approach. Start with a single cluster, then break it up into
smaller clusters.
  ▪ Start with one, all-inclusive cluster
  ▪ At each step, split a cluster until each cluster contains a single point (or there are k clusters)
Divisive Clustering
• The divisive clustering algorithm is a top-down clustering approach: initially, all the
points in the dataset belong to one cluster, and splits are performed recursively as one
moves down the hierarchy.
• Steps of Divisive Clustering:
• Initially, all points in the dataset belong to one single cluster.
• Partition the cluster into the two least similar clusters.
• Proceed recursively to form new clusters until the desired number of clusters is obtained.
Agglomerative Clustering
▪ Produces a set of nested clusters organized as a
hierarchical tree
▪ Can be visualized as a dendrogram
▪ A tree like diagram that records the sequences of merge or splits
Strengths of Hierarchical Clustering
▪ Do not have to assume any particular number of clusters
▪ Any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level
Agglomerative Clustering
• Also known as bottom-up approach or Hierarchical Agglomerative
Clustering (HAC).
• This clustering algorithm does not require us to pre-specify the number of
clusters.
• Bottom-up algorithms treat each data point as a singleton cluster at the outset and then
successively agglomerate pairs of clusters until all clusters have been merged into a single
cluster that contains all the data.
• In other words, this algorithm considers each data point as a single cluster at the
beginning and then starts combining the closest pairs of clusters.
• It does this until all the clusters are merged into a single cluster that contains the
entire dataset.
Measure for the distance between two clusters
• Single Linkage: the distance between two clusters is defined as the minimum distance
between any pair of points, one from each cluster.
• Complete Linkage: the distance between two clusters is defined as the maximum distance
between any pair of points, one from each cluster.
• Average Linkage: the linkage method in which the distances between every pair of points
(one point from each cluster) are added up and divided by the total number of pairs, giving
the average distance between the two clusters. It is also one of the most popular linkage
methods.
• Centroid Linkage: the linkage method in which the distance between the centroids of the two
clusters is calculated.
Cluster Distance Measures
Example: Given a data set of five objects characterised by a single feature, assume that
there are two clusters: C1: {a, b} and C2: {c, d, e}.

Feature values:  a = 1, b = 2, c = 4, d = 5, e = 6

1. Calculate the distance matrix.

       a   b   c   d   e
   a   0   1   3   4   5
   b   1   0   2   3   4
   c   3   2   0   1   2
   d   4   3   1   0   1
   e   5   4   2   1   0

2. Calculate the three cluster distances between C1 and C2.

Single link:
  dist(C1, C2) = min{d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e)}
               = min{3, 4, 5, 2, 3, 4} = 2

Complete link:
  dist(C1, C2) = max{d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e)}
               = max{3, 4, 5, 2, 3, 4} = 5

Average link:
  dist(C1, C2) = [d(a,c) + d(a,d) + d(a,e) + d(b,c) + d(b,d) + d(b,e)] / 6
               = (3 + 4 + 5 + 2 + 3 + 4) / 6 = 21 / 6 = 3.5

(These values can be checked with the short code snippet below.)
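• A short scipy check of the three values above (illustrative; the feature values are treated
as 1-D points):

# Verifying the single, complete and average link distances between C1 and C2.
import numpy as np
from scipy.spatial.distance import cdist

C1 = np.array([[1], [2]])          # a, b
C2 = np.array([[4], [5], [6]])     # c, d, e
D = cdist(C1, C2)                  # all pairwise distances between the two clusters
print(D.min())    # single link   -> 2.0
print(D.max())    # complete link -> 5.0
print(D.mean())   # average link  -> 3.5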
Hierarchical Clustering: Problems and
Limitations
▪ Computational complexity in time and space
Customer Dataset

CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)  Cluster
    1          1     19         15                    39                3
    2          1     21         15                    81                4
    3          0     20         16                     6                3
    4          0     23         16                    77                4
    5          0     31         17                    40                3
    6          0     22         17                    76                4
    7          0     35         18                     6                3
    8          0     23         18                    94                4
    9          1     64         19                     3                3
   10          0     30         19                    72                4

Rows: 200 and Columns: 5
Implementation
• Step 1: Data Pre-processing:
• Importing the libraries
# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
• The above lines of code import the libraries needed for specific tasks: numpy for
mathematical operations, matplotlib for drawing graphs or scatter plots, and pandas for
importing the dataset.
Implementation
• Importing the dataset
• # Importing the dataset
dataset = pd.read_csv('Mall_Customers_data.csv')
• Extracting the matrix of features
• Here we will extract only the matrix of features, as we don't have any further information
about a dependent variable.
x = dataset.iloc[:, [3, 4]].values
• Here we have extracted only columns 3 and 4 because we will use a 2D plot to see the
clusters. So we are considering the Annual Income and Spending Score as the matrix of
features.
Implementation
• Step-2: Finding the optimal number of clusters using the Dendrogram
• Now we will find the optimal number of clusters using the Dendrogram
for our model. For this, we are going to use scipy library as it provides a
function that will directly return the dendrogram for our code.
• #Finding the optimal number of clusters using the dendrogram
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
Implementation
• In the above lines of code, we have imported the hierarchy module
of scipy library.
• This module provides a method shc.dendrogram(), which takes the output of linkage() as a
parameter. The linkage function defines the distance between two clusters, so here we have
passed x (the matrix of features) and method="ward", a popular linkage method for
hierarchical clustering.
• Ward linkage minimizes the sum of squared differences within all clusters.
Implementation
• Output: Executing the above lines of code produces the dendrogram plot.
Implementation
• Using this dendrogram, we will now determine the optimal number of clusters for our model.
For this, we look for the longest vertical distance that can be drawn without crossing any
horizontal bar.
Implementation
• Among the vertical distances that do not cut any horizontal bar, the 4th distance appears
to be the largest, so according to this, the number of clusters will be 5 (the vertical
lines in this range).
Implementation
• Step-3: The hierarchical clustering model
• As we know the required optimal number of clusters, we can now
train our model.
• #training the hierarchical model on dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
y_pred = hc.fit_predict(x)
Implementation
• In the code, we have imported the AgglomerativeClustering class of cluster
module of scikit learn library.
• Then we have created the object of this class named as hc.
• The AgglomerativeClustering class takes the following parameters:
• n_clusters=5: It defines the number of clusters, and we have taken here 5
because it is the optimal number of clusters.
• affinity='euclidean': the metric used to compute the linkage (note: in recent scikit-learn
versions this parameter has been renamed to metric, and with linkage='ward' it must be
Euclidean).
• linkage='ward': the linkage criterion; here we have used the "ward" linkage, the popular
linkage method that we already used for creating the dendrogram.
• In the last line, we created the variable y_pred by fitting the model. fit_predict not only
trains the model but also returns the cluster to which each data point belongs.
Implementation
• Step-4: Visualizing the clusters
• As we have trained our model successfully, now we can visualize the clusters
corresponding to the dataset.
• #visualizing the clusters
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Implementation
• Output: Executing the above lines of code produces a scatter plot of the five customer
clusters.
Density based Clustering
• Why do we need a Density-Based clustering algorithm like DBSCAN when we already have
K-means clustering?
• K-Means clustering may cluster loosely related observations together and every observation
becomes a part of some cluster eventually, even if the observations are scattered far away in the
vector space.
• Since clusters depend on the mean value of cluster elements, each data point plays a role in
forming the clusters and a slight change in data points might affect the clustering outcome.
• This problem is greatly reduced in DBSCAN due to the way clusters are formed.
• Another challenge with K-Means is that you need to specify the number of clusters ("k") in
order to use it, and much of the time we won't know a reasonable value of k a priori.
• What's nice about DBSCAN is that you don't have to specify the number of clusters to use it.
• All you need is a function to calculate the distance between values and some guidance on
what amount of distance is considered "close".
• DBSCAN also produces more reasonable results than K-Means across a variety of different
distributions.
▪ Partitioning methods (K-means) and hierarchical clustering work for finding
spherical-shaped clusters or convex clusters.
• In other words, they are suitable only for compact and well-separated clusters.
Moreover, they are also severely affected by the presence of noise and outliers
in the data.
The DBSCAN algorithm requires two parameters:
1. eps: It defines the neighborhood around a data point, i.e., if the distance between two
points is lower than or equal to 'eps' then they are considered neighbors. If the eps value
is chosen too small, a large part of the data will be considered outliers. If it is chosen
very large, clusters will merge and the majority of the data points will end up in the same
cluster.
2. MinPts: The minimum number of neighbors (data points) within the eps radius. The larger
the dataset, the larger the value of MinPts that should be chosen. As a general rule, MinPts
can be derived from the number of dimensions D in the dataset as MinPts >= D + 1, and it
should be at least 3.
▪ DBSCAN is a Density-Based Clustering algorithm
▪ Reminder: In density based clustering we partition points into dense
regions separated by not-so-dense regions.
▪ Important Questions:
▪ How do we measure density?
▪ What is a dense region?
▪ DBSCAN:
▪ Density at point p: number of points within a circle of radius Eps
▪ Dense Region: A circle of radius Eps that contains at least MinPts
points
▪ In this algorithm, we have 3 types of data points.
Core Point: A point is a core point if it has more than MinPts points
within eps.
Border Point: A point which has fewer than MinPts within eps but it is
in the neighborhood of a core point.
Noise or outlier: A point which is not a core point or border point.
▪ Characterization of points
▪ A point is a core point if it has more than a specified number
of points (MinPts) within Eps
▪ These points belong in a dense region and are at the interior of a
cluster
▪ Label points as core, border and noise
▪ Eliminate noise points
▪ For every core point p that has not been
assigned to a cluster
▪ Create a new cluster with the point p and all the
points that are density-connected to p.
▪ Assign border points to the cluster of
the closest core point.
DBSCAN algorithm can be abstracted in the
following steps:
• The algorithm proceeds by arbitrarily picking a point in the dataset (until all points
have been visited).
• If there are at least 'MinPts' points within a radius of 'eps' of the point, then we
consider all these points to be part of the same cluster.
• The clusters are then expanded by recursively repeating the neighborhood calculation for
each neighboring point.
DBSCAN algorithm can be abstracted in the
following steps:
• Find all the neighbor points within eps and identify the core points, i.e., the points
that have more than MinPts neighbors.
• For each core point, if it is not already assigned to a cluster, create a new cluster.
• Recursively find all its density-connected points and assign them to the same cluster as
the core point.
  Two points a and b are said to be density connected if there exists a point c which has a
sufficient number of points in its neighborhood and both a and b can be reached from c
through points within the eps distance. This is a chaining process: if b is a neighbor of c,
c is a neighbor of d, and d is a neighbor of e, which in turn is a neighbor of a, then b is
connected to a.
• Iterate through the remaining unvisited points in the dataset; points that do not belong
to any cluster are noise. A small scikit-learn example is given after these steps.
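• A small scikit-learn example of the algorithm described above (illustrative, not from the
slides); the synthetic dataset and parameter values are assumptions chosen for demonstration:

# Illustrative DBSCAN example on a synthetic "two moons" dataset.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)   # two crescent-shaped clusters
db = DBSCAN(eps=0.2, min_samples=5)        # eps and MinPts as described above
labels = db.fit_predict(X)                 # label -1 marks noise/outlier points
print(np.unique(labels))                   # typically two clusters (0, 1), plus -1 if noise is found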
▪ Idea is that for points in a cluster, their kth nearest neighbors are
at roughly the same distance
▪ Noise points have the kth nearest neighbor at farther distance
▪ So, plot sorted distance of every point to its kth nearest neighbor
▪ Find the distance d where there is a “knee” in the curve
▪ Eps = d, MinPts = k
[Figure: sorted k-distance plot; with MinPts = 4 (k = 4), the knee suggests Eps ≈ 7-10 for the
example dataset.]
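• A sketch of how such a k-distance plot can be produced with scikit-learn (illustrative;
assumes X is the data matrix and k = MinPts):

# Sorted k-th nearest-neighbor distance plot used to pick Eps.
import numpy as np
import matplotlib.pyplot as mtp
from sklearn.neighbors import NearestNeighbors

k = 4                                                # MinPts
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)    # +1 because each point is its own nearest neighbor
distances, _ = nbrs.kneighbors(X)
kth_dist = np.sort(distances[:, -1])                 # distance of every point to its k-th nearest neighbor
mtp.plot(kth_dist)
mtp.xlabel('Points sorted by distance')
mtp.ylabel('Distance to k-th nearest neighbor')
mtp.show()                                           # read Eps off at the "knee" of this curve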
When DBSCAN works well:
• Resistant to noise
• Can handle clusters of different shapes and sizes
When DBSCAN does not work well (e.g., MinPts = 4 with Eps = 9.75 or Eps = 9.92 on the same data):
• Varying densities
• High-dimensional data
Other Clustering Algorithms
▪ PAM, CLARANS: solutions for the k-medoids problem.
▪ BIRCH: constructs a hierarchical tree that acts as a summary of the data, and then clusters
the leaves.
▪ MST: clustering using the Minimum Spanning Tree.
▪ ROCK: clusters categorical data by neighbor and link analysis.
▪ LIMBO, COOLCAT: cluster categorical data using information-theoretic tools.
▪ CURE: a hierarchical algorithm that uses a different representation of the cluster.
▪ CHAMELEON: a hierarchical algorithm that uses closeness and interconnectivity for merging.
K-Medoids
• K-Medoids is an unsupervised method for clustering unlabelled data. It is an improved
version of the K-Means algorithm, mainly designed to deal with K-Means' sensitivity to
outliers.
• Compared to other partitioning algorithms, the algorithm is simple, fast, and easy to
implement.
• A medoid can be defined as a point in the cluster whose dissimilarity to all the other
points in the cluster is minimum.
• The dissimilarity of the medoid (Ci) and an object (Pi) is calculated using E = |Pi - Ci|.
• Medoid: a point in the cluster from which the sum of distances to the other data points is
minimal.
• Instead of the centroids used as reference points in K-Means, the K-Medoids algorithm takes
a medoid as the reference point.
• The partitioning is carried out such that:
  • Each cluster has at least one object
  • Each object belongs to exactly one cluster
K-Medoids: Algorithm
• Given the value of k and unlabelled data:
• Choose k random points from the data and assign these k points to k clusters. These are the
initial medoids.
• For all the remaining data points, calculate the distance from each medoid and assign each
point to the cluster with the nearest medoid.
• Calculate the total cost (the sum of the distances of all the data points from their
medoids).
• Select a random point as a new medoid and swap it with a previous medoid, then repeat
steps 2 and 3.
• If the total cost with the new medoid is less than that with the previous medoid, make the
new medoid permanent and repeat step 4.
• If the total cost with the new medoid is greater than the cost with the previous medoid,
undo the swap and repeat step 4.
• The repetitions continue until the medoids (and the resulting assignment of data points) no
longer change. A simplified code sketch is given after this list.
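• A simplified NumPy sketch of the swap-based procedure above (our own illustration, not code
from the slides); it uses Manhattan distance and random candidate swaps:

# Simplified sketch of swap-based K-Medoids with Manhattan distance.
import numpy as np

def total_cost(X, medoid_idx):
    # Sum of Manhattan distances from every point to its nearest medoid
    d = np.abs(X[:, None, :] - X[medoid_idx][None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()

def k_medoids(X, k, n_swaps=200, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), size=k, replace=False))   # initial random medoids
    cost = total_cost(X, medoids)
    for _ in range(n_swaps):
        m = rng.integers(k)                    # medoid position to replace
        candidate = int(rng.integers(len(X)))  # random candidate point
        if candidate in medoids:
            continue
        trial = medoids.copy()
        trial[m] = candidate
        trial_cost = total_cost(X, trial)
        if trial_cost < cost:                  # keep the swap only if the total cost decreases
            medoids, cost = trial, trial_cost
    labels = np.abs(X[:, None, :] - X[medoids][None, :, :]).sum(axis=2).argmin(axis=1)
    return np.array(medoids), labels, cost

X = np.array([[9, 6], [10, 4], [4, 4], [5, 8], [3, 8], [2, 5],
              [8, 5], [4, 6], [8, 4], [9, 3]], dtype=float)
medoids, labels, cost = k_medoids(X, k=2)
print(X[medoids], cost)   # typically settles at medoids (8,4) and (4,6) with cost 19, as in the example below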
K-Medoids: Example
• Let's suppose we have the data given below, and we want to divide the data into two
clusters, i.e., k = 2.

S. No.    X    Y
  1       9    6
  2      10    4
  3       4    4
  4       5    8
  5       3    8
  6       2    5
  7       8    5
  8       4    6
  9       8    4
 10       9    3
K-Medoids: Example
• For step 1, let's pick two random medoids (as our k = 2). So we pick M1(8,4) and M2(4,6) as
our initial medoids.
• Let's calculate the distance of each data point from both medoids.

S. No.    X    Y    Distance from M1(8,4)    Distance from M2(4,6)
  1       9    6             3                         5
  2      10    4             2                         8
  3       4    4             4                         2
  4       5    8             7                         3
  5       3    8             9                         3
  6       2    5             7                         3
  7       8    5             1                         5
  8       4    6             -                         -
  9       8    4             -                         -
 10       9    3             2                         8

(The distances used here are Manhattan distances, |Δx| + |Δy|.)
K-Medoids: Example
• Each point is assigned to the cluster of the medoid whose distance to it is smaller.
• Points (1, 2, 7, 10) are assigned to M1(8,4) and points (3, 4, 5, 6) are assigned to M2(4,6).
• Therefore,
• Cost = (3 + 2 + 1 + 2) + (2 + 3 + 3 + 3)
•      = 8 + 11
•      = 19
• Now let us randomly select another medoid and swap it in. Let us check M1 = (8,5).
• The new medoids are M1(8,5) and M2(4,6).
K-Medoids: Example
• Therefore, the distance of each point from M1(8,5) and M2(4,6) is calculated as follows:

S. No.    X    Y    Distance from M1(8,5)    Distance from M2(4,6)
  1       9    6             2                         5
  2      10    4             3                         8
  3       4    4             5                         2
  4       5    8             6                         3
  5       3    8             8                         3
  6       2    5             6                         3
  7       8    5             -                         5
  8       4    6             -                         -
  9       8    4             1                         -
 10       9    3             3                         8
K-Medoids: Example
• With the new medoids, points 1, 2, 9 and 10 are nearest to M1(8,5), and points 3, 4, 5 and 6
are nearest to M2(4,6).
• New cost = (2 + 3 + 1 + 3) + (2 + 3 + 3 + 3) = 9 + 11 = 20.
• Since the new cost (20) is greater than the previous cost (19), the swap is undone.
• Our final medoids are therefore M1(8,4) and M2(4,6), and the two clusters are formed with
these medoids.
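• The two costs can be reproduced with a few lines of NumPy (illustrative):

# Quick check of the two total costs in this example (Manhattan distance).
import numpy as np
pts = np.array([[9, 6], [10, 4], [4, 4], [5, 8], [3, 8], [2, 5], [8, 5], [4, 6], [8, 4], [9, 3]])

def cost(medoids):
    d = np.abs(pts[:, None, :] - np.array(medoids)[None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()

print(cost([[8, 4], [4, 6]]))   # 19 -> original medoids
print(cost([[8, 5], [4, 6]]))   # 20 -> after the swap, so the swap is undone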
K-Medoids: Example
• In the resulting plot, the orange dots represent the first cluster, the blue dots represent
the second cluster, and the triangles represent the medoids of the clusters.
Advantages and Disadvantages of using
K-Medoids:
• Advantages
• Deals with noise and outlier data effectively
• Easily implementable and simple to understand
• Faster compared to other partitioning algorithms
• Disadvantages:
• Not suitable for Clustering arbitrarily shaped groups of data points.
• As the initial medoids are chosen randomly, the results might vary based on
the choice in different runs.
Spectral Clustering
• Spectral clustering is an increasingly popular clustering algorithm that has performed
better than many traditional clustering algorithms in many cases.
• It treats each data point as a graph node and thus transforms the clustering problem into a
graph-partitioning problem.
• Spectral clustering is a technique with roots in graph theory, where the approach is used
to identify communities of nodes in a graph based on the edges connecting them. The method is
flexible and allows us to cluster non-graph data as well.
• Spectral clustering uses information from the eigenvalues (spectrum) of special matrices
built from the graph or the data set.
How to do Spectral Clustering?
• The three major steps involved in spectral clustering are:
constructing a similarity graph, projecting data onto a
lower-dimensional space, and clustering the data.
• Given a set of points S in a higher-dimensional space, it can be
elaborated as follows:
• 1. Form a distance matrix.
• 2. Transform the distance matrix into an affinity matrix A.
• 3. Compute the degree matrix D and the Laplacian matrix L = D - A.
• 4. Find the eigenvalues and eigenvectors of L.
• 5. Form a matrix whose columns are the eigenvectors corresponding to the k smallest
eigenvalues of L (equivalently, the k largest eigenvalues when a normalized affinity matrix
is used instead).
• 6. Normalize the rows of this matrix.
• 7. Cluster the data points, now represented as rows of this matrix, in the k-dimensional
space, e.g. with K-Means. A rough code sketch is given below.
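• A rough NumPy/scikit-learn sketch of these steps (illustrative; the RBF affinity, the gamma
value, and the use of K-Means in the final step are assumptions):

# Rough sketch of spectral clustering following the steps above.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, k, gamma=1.0):
    dist = cdist(X, X)                          # 1. distance matrix
    A = np.exp(-gamma * dist ** 2)              # 2. affinity matrix (RBF kernel)
    np.fill_diagonal(A, 0)
    D = np.diag(A.sum(axis=1))                  # 3. degree matrix and Laplacian L = D - A
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)        # 4. eigenvalues/eigenvectors of L (ascending order)
    U = eigvecs[:, :k]                          # 5. eigenvectors of the k smallest eigenvalues
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)   # 6. normalize the rows
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)   # 7. cluster in k-dim space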
Pros and Cons of Spectral Clustering
• Spectral clustering helps us overcome two major problems in clustering: the shape of the
clusters and the dependence on cluster centroids.
• The K-Means algorithm generally assumes that clusters are spherical or round, i.e., lie
within some radius of the cluster centroid.
• In K-Means, many iterations are required to determine the cluster centroids; in spectral
clustering, the clusters are not required to follow a fixed shape or pattern.
• Points that are far away but connected belong to the same cluster, while points that are
close to each other may belong to different clusters if they are not connected. This implies
that the algorithm can be effective for data of different shapes and sizes. The comparison
sketch below illustrates this.
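• A short comparison sketch on a non-convex "two moons" dataset (illustrative; the dataset and
parameters are assumptions): K-Means tends to cut each crescent in half, while spectral
clustering with a nearest-neighbors affinity graph usually recovers the two shapes.

# Illustrative comparison of K-Means and spectral clustering on non-spherical clusters.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                               n_neighbors=10, random_state=0).fit_predict(X)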
Spectral Clustering: Applications
• Spectral clustering has applications in many areas, including image segmentation,
educational data mining, speech separation, spectral clustering of protein sequences, and
text-image segmentation.
• Though spectral clustering is a technique based on graph theory, the approach is used to
identify communities of vertices in a graph based on the edges connecting them.
• This method is flexible and allows us to cluster non-graph data as well, either with or
without the original data.
THANK YOU