
Hierarchical Clustering in Machine Learning

Clustering is an unsupervised learning technique that groups data points
based on their similarity to one another. There are several types of
clustering methods in machine learning.

 Connectivity-based clustering: This type of algorithm builds clusters
based on how data points are connected to one another. Hierarchical
clustering is the main example.
 Centroid-based clustering: This type of algorithm groups data points
around cluster centroids. Examples include K-Means and K-Modes clustering.
 Distribution-based clustering: Statistical distributions are used to
model the clusters. It assumes that the data points in a cluster are
generated from a particular probability distribution, and the method
estimates the parameters of that distribution in order to group comparable
data points into clusters. Example: Gaussian Mixture Models (GMM).
 Density-based clustering: With this kind of method, data points in
high-density regions are grouped together, while points in low-density
regions are separated or treated as noise. The core idea is to find
regions of high point density in the data space and cluster the points
within them. DBSCAN (Density-Based Spatial Clustering of Applications
with Noise) is one example. A minimal sketch instantiating one algorithm
from each family follows this list.
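
For reference, the following is a minimal sketch that instantiates one
representative scikit-learn estimator from each family on a small toy
array X; the dataset and the parameter values (such as eps for DBSCAN)
are assumptions chosen purely for illustration.

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

# small two-dimensional toy dataset (illustrative assumption)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# connectivity-based: hierarchical (agglomerative) clustering
print(AgglomerativeClustering(n_clusters=2).fit_predict(X))

# centroid-based: K-Means
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))

# distribution-based: Gaussian Mixture Model
print(GaussianMixture(n_components=2, random_state=0).fit_predict(X))

# density-based: DBSCAN (eps chosen for this toy data)
print(DBSCAN(eps=2.5, min_samples=2).fit_predict(X))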

Hierarchical clustering

Hierarchical clustering is a connectivity-based clustering method that
groups nearby data points according to their similarity or distance. Data
points that are closer together are considered more similar, or more
strongly connected, than those that are farther apart.

The hierarchical relationships between groups are shown by a dendrogram,
a tree-like diagram produced by hierarchical clustering. In the
dendrogram, the largest cluster, which contains all of the data points,
sits at the top, while the individual data points sit at the bottom. The
dendrogram can be cut at different heights to produce different numbers
of clusters.
Dendrogram

What is a Dendrogram?

A dendrogram is a tree diagram that shows the arrangement of the clusters
produced by hierarchical clustering.

 Vertical lines connect each cluster to the point at which it merges
with another cluster.
 The height of the horizontal line that joins two clusters indicates the
distance between them.
 The heights at which clusters merge can guide the choice of the optimal
number of clusters.

To build the dendrogram, clusters are iteratively merged or split
according to a distance or similarity metric between data points.
Clusters keep being merged or split until every data point lies in a
single cluster (or in its own singleton cluster), or until the target
number of clusters is reached.

To determine a suitable number of clusters, we can examine the dendrogram
and find the height at which the branches separate into distinct
clusters. Cutting the dendrogram at that height yields the corresponding
number of clusters.
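
As a minimal sketch of cutting the tree at a chosen height (the toy array
X and the cut height of 5 are assumptions made purely for illustration),
SciPy's fcluster can return the flat clusters obtained from such a cut:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy dataset (illustrative assumption)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# build the merge tree with Ward linkage
Z = linkage(X, 'ward')

# cut the dendrogram at height 5: merges that occur above this
# distance are discarded, and the remaining flat clusters are returned
labels = fcluster(Z, t=5, criterion='distance')
print(labels)

For this particular data, cutting at height 5 yields two flat clusters,
one for each column of points.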
Types of Hierarchical Clustering

There are two main types of hierarchical clustering:

1. Agglomerative Clustering
2. Divisive Clustering

Hierarchical Agglomerative Clustering

Agglomerative clustering, also called hierarchical agglomerative
clustering (HAC), is the bottom-up approach. It produces an organized
hierarchy of clusters that conveys more information than the unstructured
set of groups obtained from flat clustering, and the number of clusters
does not need to be specified in advance. Bottom-up algorithms start by
treating each data point as a singleton cluster. They then repeatedly
merge pairs of clusters until all points are combined into a single
cluster that contains the entire dataset.

Algorithm:

given a dataset (d1, d2, d3, ..., dN) of size N

# compute the distance matrix
for i = 1 to N:
    # the distance matrix is symmetric about the primary
    # diagonal, so we only compute its lower triangle
    for j = 1 to i:
        dis_mat[i][j] = distance(di, dj)

# initially, each data point is a singleton cluster
repeat
    merge the two clusters having the minimum distance
    update the distance matrix
until only a single cluster remains
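
The following is a minimal from-scratch sketch of this procedure in
Python, assuming single linkage (the minimum pairwise distance between
clusters) and the same small toy dataset used later in this article; it
mirrors the pseudocode rather than aiming for efficiency:

import numpy as np

def naive_agglomerative(X, n_clusters=1):
    # start with every point in its own singleton cluster
    clusters = [[i] for i in range(len(X))]
    # pairwise Euclidean distance matrix between points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    while len(clusters) > n_clusters:
        # find the pair of clusters with the minimum single-linkage
        # distance (smallest distance between any two of their points)
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        # merge the two closest clusters and drop the absorbed one
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
print(naive_agglomerative(X, n_clusters=2))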


Hierarchical Agglomerative Clustering
Steps:

 Consider each letter (A, B, C, D, E, F) as a singleton cluster and
calculate the distance of each cluster from all the other clusters.
 In the second step, the most similar clusters are merged into a single
cluster. Say cluster (B) and cluster (C) are very similar, so we merge
them; likewise clusters (D) and (E). We are left with the clusters
[(A), (BC), (DE), (F)].
 We recalculate the proximities according to the algorithm and merge the
two nearest clusters, (DE) and (F), to obtain the clusters
[(A), (BC), (DEF)].
 Repeating the process, the clusters (DEF) and (BC) are the closest and
are merged into a new cluster, leaving [(A), (BCDEF)].
 Finally, the two remaining clusters are merged into a single cluster,
[(ABCDEF)].

Python implementation of the above algorithm using the scikit-learn
library:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# here we need to specify the number of clusters,
# otherwise the result will be a single cluster
# containing all the data
clustering = AgglomerativeClustering(n_clusters=2).fit(X)

# print the class labels
print(clustering.labels_)

Output:

[1 1 1 0 0 0]
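
As a related sketch, continuing the example above, scikit-learn's
AgglomerativeClustering can also be run without fixing the number of
clusters by supplying a distance threshold at which the hierarchy is cut;
the threshold of 5 used below is an assumption chosen for this toy data.

# continuing the example above: instead of fixing n_clusters,
# cut the hierarchy at a chosen linkage distance
clustering = AgglomerativeClustering(n_clusters=None,
                                     distance_threshold=5).fit(X)
print(clustering.n_clusters_)  # number of clusters found by the cut
print(clustering.labels_)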

Hierarchical Divisive Clustering

Divisive clustering is also known as the top-down approach. This
algorithm likewise does not require the number of clusters to be
specified in advance. Top-down clustering starts from a single cluster
that contains all of the data and recursively splits it until every data
point ends up in its own singleton cluster.

Algorithm:

given a dataset (d1, d2, d3, ..., dN) of size N

# at the top, all of the data is in one cluster
the cluster is split using a flat clustering method, e.g. K-Means

repeat
    choose the best cluster among all the clusters to split
    split that cluster with the flat clustering algorithm
until each data point is in its own singleton cluster
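
A minimal sketch of this top-down procedure, assuming K-Means with two
centers as the flat "subroutine" and using the within-cluster scatter to
decide which cluster to split next (both choices are illustrative
assumptions, not part of the algorithm statement above):

import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters=2):
    # start with all points in a single cluster (store row indices)
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # pick the cluster with the largest within-cluster scatter to split
        scatters = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        target = int(np.argmax(scatters))
        idx = clusters.pop(target)
        # split it with a flat clustering method (here, 2-means)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
print(divisive_clustering(X, n_clusters=2))

Recent versions of scikit-learn also provide a BisectingKMeans estimator
that follows a similar split-in-two strategy.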

Hierarchical Divisive clustering

Computing Distance Matrix

While merging clusters, we measure the distance between each pair of
clusters and combine the two that are most similar, that is, the two with
the smallest distance. The question, though, is how that distance is
calculated. The distance (or similarity) between clusters can be defined
in a variety of ways. Some of them are:

1. Min Distance (single linkage): the minimum distance between any two
points, one from each cluster.
2. Max Distance (complete linkage): the maximum distance between any two
points, one from each cluster.
3. Group Average (average linkage): the average distance over all pairs
of points, one from each cluster.
4. Ward's Method: the similarity of two clusters is based on the increase
in squared error when the two clusters are merged.

For example, if we group the same data using different linkage methods,
we may get different results:

Distance Matrix Comparison in Hierarchical Clustering
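
To make this concrete, the short sketch below (reusing the same toy array
X as in the earlier examples; cutting each tree into two clusters is an
assumption made for illustration) compares the flat partitions produced
by different linkage methods in SciPy:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy dataset (illustrative assumption)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# build one hierarchy per linkage method and cut each into two flat clusters
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method)
    labels = fcluster(Z, t=2, criterion='maxclust')
    print(method, labels)

Because each linkage rule measures inter-cluster distance differently,
the resulting dendrograms, and therefore the flat clusters obtained by
cutting them, can differ for the same data.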

Implementation code

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# perform hierarchical clustering with Ward linkage
Z = linkage(X, 'ward')

# plot the dendrogram
dendrogram(Z)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()

Output:

Hierarchical Clustering Dendrogram

Hierarchical Agglomerative vs Divisive Clustering


 Divisive clustering is more complex than agglomerative clustering,
because it needs a flat clustering method as a "subroutine" to split each
cluster until every data point sits in its own singleton cluster.
 Divisive clustering can be more efficient if we do not generate a
complete hierarchy all the way down to individual data points. The time
complexity of naive agglomerative clustering is O(n^3), because in each
of the N-1 iterations we exhaustively scan the N x N distance matrix
dist_mat for the lowest distance. Using a priority queue, this can be
reduced to O(n^2 log n), and with further optimizations it can be brought
down to O(n^2). For divisive clustering, given a fixed number of top
levels and an efficient flat algorithm such as K-Means, the running time
is linear in the number of patterns and clusters.
 A divisive algorithm can also be more accurate. Agglomerative
clustering makes decisions based on local patterns or neighbouring points
without initially taking the global distribution of the data into
account, and these early decisions cannot be undone. Divisive clustering,
by contrast, takes the global distribution of the data into consideration
when making top-level partitioning decisions.
