4.4 Hierarchical Clustering Methods
● Hierarchical clustering is a technique that organizes data points into a tree-like
structure of nested clusters. It offers insights into both macro- and micro-level
relationships within data.
● Key properties of hierarchical clustering:
○ Generates a clustering hierarchy (drawn as a dendrogram)
○ Does not require specifying K, the number of clusters
○ More deterministic than partitioning methods such as K-Means
○ No iterative refinement
● Two categories of algorithms:
○ Agglomerative: Start with singleton clusters, continuously merge two
clusters at a time to build a bottom-up hierarchy of clusters
○ Divisive: Start with a huge macro-cluster, split it continuously into two
groups, generating a top-down hierarchy of clusters
● Key Concepts:
○ Agglomerative and divisive algorithms build hierarchies from bottom-up
and top-down, respectively.
○ Hierarchical clustering produces dendrogram visualizations.
○ Provides flexibility in exploring clusters at different granularity levels.
○ Suitable for various data types and proximity measures.
Dendrogram: Shows How Clusters are Merged
● Dendrogram: Decomposes a set of data objects into a tree of clusters via
multi-level nested partitioning
● A clustering of the data objects is obtained by cutting the dendrogram at the
desired level; each connected component then forms a cluster, as sketched below
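As a concrete illustration of cutting a dendrogram, here is a minimal sketch using SciPy; the toy data, the average-linkage choice, and the two-cluster cut level are illustrative assumptions, not part of the slides.

```python
# Minimal sketch: build a merge hierarchy and cut it at a desired level.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# Two loose blobs of toy points in the plane
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

Z = linkage(X, method="average")                 # full merge hierarchy
labels = fcluster(Z, t=2, criterion="maxclust")  # cut so 2 clusters remain
print(labels)  # each connected component below the cut is one cluster
```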
Agglomerative Clustering Algorithms
● Agglomerative hierarchical clustering starts with individual data points as
separate clusters and progressively merges them.
● AGNES (AGglomerative NESting) (Kaufmann and Rousseeuw, 1990)
○ Use the single-link method and the dissimilarity matrix
○ Continuously merge nodes that have the least dissimilarity
○ Eventually all nodes belong to the same cluster
● Agglomerative clustering variants differ in the similarity measure used between
clusters (compared in the sketch below):
○ Single link (nearest neighbor)
○ Complete link (diameter)
○ Average link (group average)
○ Centroid link (centroid similarity)
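A hedged comparison sketch using scikit-learn's AgglomerativeClustering, which supports single, complete, average, and Ward linkage (centroid linkage is instead available in SciPy's linkage() as method="centroid"); the toy blobs are assumptions for illustration.

```python
# Sketch: the same data clustered under three linkage criteria.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])

for link in ("single", "complete", "average"):
    model = AgglomerativeClustering(n_clusters=2, linkage=link)
    labels = model.fit_predict(X)
    print(link, np.bincount(labels))  # cluster sizes under each criterion
```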
Single Link vs. Complete Link in Hierarchical Clustering
● Single link (nearest neighbor)
○ The similarity between two clusters is the similarity between their most
similar (nearest neighbor) members
○ Local similarity-based: Emphasizing more on close regions, ignoring the
overall structure of the cluster
○ Capable of clustering non-elliptically shaped groups of objects
○ Sensitive to noise and outliers
● Complete link (diameter)
○ The similarity between two clusters is the similarity between their most
dissimilar members
○ Merge two clusters to form one with the smallest diameter
○ Nonlocal in behavior, obtaining compact shaped clusters
○ Sensitive to outliers
Agglomerative Clustering: Average vs. Centroid Links
● Agglomerative clustering with average link
○ Average link: The average distance between an
element in one cluster and an element in the other
(i.e., all pairs in two clusters)
■ Expensive to compute
● Agglomerative clustering with centroid link
○ Centroid link: The distance between the centroids of
two clusters
● Group Averaged Agglomerative Clustering (GAAC)
○ Let two clusters Ca and Cb be merged into Ca∪b. The new centroid is the
size-weighted average of the two centroids:
$$\vec{c}_{a \cup b} = \frac{N_a \vec{c}_a + N_b \vec{c}_b}{N_a + N_b}$$
where Na and Nb are the cardinalities of Ca and Cb. A numeric check follows.
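A small NumPy check of the merged-centroid formula; the random clusters are illustrative.

```python
# Verify: the centroid of the union equals the size-weighted combination.
import numpy as np

rng = np.random.default_rng(1)
Ca, Cb = rng.normal(0, 1, (10, 2)), rng.normal(3, 1, (15, 2))

ca, cb = Ca.mean(axis=0), Cb.mean(axis=0)
Na, Nb = len(Ca), len(Cb)

merged_direct = np.vstack([Ca, Cb]).mean(axis=0)   # centroid of the union
merged_formula = (Na * ca + Nb * cb) / (Na + Nb)   # weighted combination
print(np.allclose(merged_direct, merged_formula))  # True
```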
Divisive Clustering Algorithms
● Divisive hierarchical clustering begins with all data points in a single cluster
and recursively divides them.
● DIANA (Divisive Analysis) (Kaufmann and Rousseeuw, 1990)
○ Implemented in some statistical analysis packages, e.g., Splus
● Inverse order of AGNES: Eventually each node forms a cluster on its own
● Divisive clustering is a top-down approach
○ The process starts at the root with all the points as one cluster
○ It recursively splits the higher-level clusters to build the dendrogram
○ Can be considered a global approach
○ Can be more efficient than agglomerative clustering when the hierarchy need
not be fully generated (a bisecting sketch follows)
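DIANA itself splits a cluster by moving points toward its most disparate member; the sketch below instead approximates divisive clustering by recursively bisecting with 2-means, a common simplification. The toy data and the depth cutoff are assumptions.

```python
# Hedged sketch of top-down (divisive) clustering via recursive bisection.
import numpy as np
from sklearn.cluster import KMeans

def bisect(X, depth, max_depth=2):
    """Recursively split X into two groups; return a list of leaf clusters."""
    if depth == max_depth or len(X) < 4:
        return [X]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    return bisect(X[labels == 0], depth + 1) + bisect(X[labels == 1], depth + 1)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.5, (25, 2)) for c in (0, 4, 8, 12)])
leaves = bisect(X, depth=0)
print([len(leaf) for leaf in leaves])  # sizes of the leaf clusters
```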
● Dendrogram Visualization:
○ Starts with a single cluster that splits into subclusters.
○ Divisions occur at varying levels of granularity.
○ Also yields a dendrogram, highlighting the hierarchical structure.
○ Divisive clustering allows exploration of finer details within clusters but
can be challenging to implement effectively.
Extensions to Hierarchical Clustering
● Hierarchical clustering can be extended to handle specific challenges and data
characteristics.
● Single Linkage (Minimum Linkage):
○ Measures distance between two clusters as the minimum distance between
any pair of points.
○ Vulnerable to the "chaining" effect, where clusters get strung together through chains of close intermediate points.
● Complete Linkage (Maximum Linkage):
○ Measures distance between two clusters as the maximum distance
between any pair of points.
○ Can lead to the "crowding" problem, where compact clusters are merged.
● Average Linkage:
○ Measures distance between two clusters as the average distance between
all pairs of points.
○ Balanced approach between single and complete linkage.
● Major weaknesses of hierarchical clustering methods
○ Can never undo what was done previously
○ Do not scale well
■ Time complexity of at least O(n²), where n is the total number of
objects
● Other hierarchical clustering algorithms
○ BIRCH (1996): Use CF-tree and incrementally adjust the quality of sub-
clusters
○ CURE (1998): Represent a cluster using a set of well-scattered representative
points
○ CHAMELEON (1999): Use graph partitioning methods on the K-nearest-neighbor graph of the data
BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
● BIRCH is a hierarchical clustering approach designed for large datasets by
employing a micro-clustering strategy.
● A multiphase clustering algorithm (Zhang, Ramakrishnan & Livny,
SIGMOD’96)
● Incrementally construct a CF (Clustering Feature) tree, a hierarchical data
structure for multiphase clustering
○ Phase 1: Scan DB to build an initial in-memory CF tree (a multi-level
compression of the data that tries to preserve the inherent clustering
structure of the data)
○ Phase 2: Use an arbitrary clustering algorithm to cluster the leaf nodes of
the CF tree (a usage sketch follows)
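A minimal sketch of the two phases using scikit-learn's Birch estimator; the parameter values and toy data are illustrative assumptions.

```python
# Sketch: BIRCH's two phases via scikit-learn's Birch estimator.
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.7, (200, 2)) for c in (0, 5, 10)])

# Phase 1: threshold and branching_factor control the in-memory CF tree;
# Phase 2: n_clusters applies a global clustering to the leaf sub-clusters.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)
print(np.bincount(labels))  # points per final macro-cluster
```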
● Key idea: Multi-level clustering
○ Low-level micro-clustering: Reduce complexity and increase scalability
○ High-level macro-clustering: Leave enough flexibility for high-level
clustering
● Scales linearly: Find a good clustering with a single scan and improve the
quality with a few additional scans
Clustering Feature Vector in BIRCH
● Clustering Feature (CF): CF = (N, LS, SS)
○ N: number of data points
○ LS: linear sum of the N points: $LS = \sum_{i=1}^{N} \vec{x}_i$
○ SS: square sum of the N points: $SS = \sum_{i=1}^{N} \vec{x}_i^{\,2}$
● Clustering feature:
○ Summary of the statistics of a given sub-cluster: the 0th, 1st, and 2nd
moments of the sub-cluster from the statistical point of view
○ Registers crucial measurements for computing clusters and uses storage
efficiently (additivity is sketched below)
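A sketch of the CF summary and its key additivity property: merging two sub-clusters only requires adding their (N, LS, SS) triples, never revisiting the raw points. The CF class here is a hypothetical illustration, not BIRCH's actual implementation.

```python
# Sketch: CF vectors are additive, so merges are O(1) in the data size.
import numpy as np

class CF:
    def __init__(self, points):
        self.N = len(points)
        self.LS = points.sum(axis=0)   # linear sum (a vector)
        self.SS = (points ** 2).sum()  # square sum (a scalar)

    def merge(self, other):
        out = CF.__new__(CF)
        out.N = self.N + other.N
        out.LS = self.LS + other.LS
        out.SS = self.SS + other.SS
        return out

rng = np.random.default_rng(4)
A, B = rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (80, 2))
merged = CF(A).merge(CF(B))
print(np.allclose(merged.LS, np.vstack([A, B]).sum(axis=0)))  # True
```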
Measures of Cluster: Centroid, Radius and Diameter
● Centroid: the "middle" of a cluster
$$\vec{x}_0 = \frac{\sum_{i=1}^{n} \vec{x}_i}{n}$$
○ n: number of points in the cluster
○ $\vec{x}_i$: the i-th point in the cluster
● Radius: R
○ Average distance from member objects to the centroid
○ The square root of the average squared distance from any point of the
cluster to its centroid:
$$R = \sqrt{\frac{\sum_{i=1}^{n} (\vec{x}_i - \vec{x}_0)^2}{n}}$$
● Diameter: D
○ Average pairwise distance within a cluster
○ The square root of the average squared distance between all pairs of
points in the cluster:
$$D = \sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{n} (\vec{x}_i - \vec{x}_j)^2}{n(n-1)}}$$
○ Both R and D can be computed from the CF summary alone, as the sketch
below verifies for R
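A NumPy sketch computing R and D directly from points, then recovering R from the CF summary alone via R² = SS/N − ‖LS/N‖²; the data are illustrative.

```python
# Sketch: radius and diameter from points, and radius from CF stats alone.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(0, 1, (100, 2))
n, x0 = len(X), X.mean(axis=0)

R = np.sqrt(((X - x0) ** 2).sum() / n)                       # radius
pair_sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
D = np.sqrt(pair_sq.sum() / (n * (n - 1)))                   # diameter

LS, SS = X.sum(axis=0), (X ** 2).sum()                       # CF summaries
R_from_cf = np.sqrt(SS / n - (LS / n) @ (LS / n))
print(np.allclose(R, R_from_cf))  # True: the CF suffices to compute R
```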
The CF Tree Structure in BIRCH
● Incremental insertion of new points (similar to a B+-tree)
● For each point in the input
○ Find the closest leaf entry
○ Add the point to that entry and update its CF
○ If the entry diameter exceeds max_diameter, split the leaf, and possibly
its parents
● A CF tree has two parameters
○ Branching factor: maximum number of children per node
○ Maximum diameter of sub-clusters stored at the leaf nodes
● A CF tree: A height-balanced tree that stores the clustering features (CFs)
● The non-leaf nodes store sums of the CFs of their children
BIRCH: A Scalable and Flexible Clustering Method
● An integration of agglomerative clustering with other (flexible) clustering
methods
○ Low-level micro-clustering
■ Exploits the CF feature and the BIRCH tree structure
■ Preserves the inherent clustering structure of the data
○ Higher-level macro-clustering
■ Provides sufficient flexibility for integration with other clustering
methods
● Has influenced many other clustering methods and applications
● Concerns
○ Sensitive to the insertion order of data points
○ Because leaf nodes have a fixed size, the resulting clusters may not
correspond to natural groupings
○ Clusters tend to be spherical, given the radius and diameter measures
BIRCH: Advantages and Applications
● Applications:
○ Network traffic analysis for intrusion detection.
○ E-commerce customer segmentation.
○ Biological sequence clustering.
○ Sensor data analysis.
● BIRCH's innovative micro-clustering strategy makes it a valuable tool for
scalable and memory-efficient clustering tasks.
Case Study: Retail Store Clustering with Hierarchical Methods
● Let's apply hierarchical clustering to a retail store dataset for location-based
segmentation.
● Objective:
○ Group stores based on geographical sales patterns.
○ Identify clusters with similar market behavior.
● Steps:
○ Data Preparation: Convert store sales and location data into a suitable
format.
○ Agglomerative Clustering: Apply agglomerative hierarchical clustering.
○ Dendrogram Analysis: Analyze the dendrogram to determine the optimal cluster
count.
○ Cluster Interpretation: Interpret the characteristics of the formed clusters.
○ Strategy Development: Design marketing strategies for each cluster.
● Hierarchical clustering helps unravel store behavior and guides targeted
marketing efforts; a sketch of the first three steps follows.
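A hedged sketch of the preparation, clustering, and dendrogram-analysis steps on synthetic store data; the feature names and values are hypothetical, not from a real retail dataset.

```python
# Sketch: case-study pipeline on synthetic (hypothetical) store features.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
# Hypothetical features per store: [latitude, longitude, monthly_sales]
stores = np.column_stack([
    rng.uniform(30, 45, 40),     # latitude
    rng.uniform(-120, -75, 40),  # longitude
    rng.lognormal(10, 0.4, 40),  # monthly sales
])
# Standardize so sales don't dominate the geographic features
Z = linkage((stores - stores.mean(0)) / stores.std(0), method="ward")

dendrogram(Z)  # inspect merge heights to pick a cut / cluster count
plt.xlabel("store index"); plt.ylabel("merge distance")
plt.show()
```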
Hierarchical Clustering vs. Partitioning-Based Clustering
● Hierarchical Clustering:
○ Produces nested clusters in a tree-like structure.
○ Offers exploration at different granularity levels.
○ Dendrogram provides visual insights into cluster relationships.
○ Suitable for diverse data types.
● Partitioning-Based Clustering:
○ Divides data into distinct, non-overlapping clusters.
○ Requires specifying the number of clusters (K).
○ Efficient on large datasets (e.g., K-Means scales well).
○ Offers flexibility in selecting different similarity measures.
● Choosing between hierarchical and partitioning-based clustering depends on
the data characteristics and analysis goals
Challenges and Considerations
● Hierarchical clustering methods come with their own set of challenges and
considerations.
● Challenges:
○ Dendrogram Interpretation: Determining the optimal number of clusters
from a dendrogram.
○ Computational Complexity: Agglomerative clustering can be resource-
intensive for large datasets.
○ Choice of Linkage: Selecting the appropriate linkage method based on
data characteristics.
○ Hierarchical Nature: Dendrograms may become complex and hard to
interpret.
● Considerations:
○ Experiment with different linkage methods.
○ Use silhouette scores or other metrics to evaluate cluster quality (see the
sketch below).
○ Balance granularity and interpretability based on analysis goals.
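A sketch of using silhouette scores to choose where to cut the dendrogram; the candidate range of k and the toy data are assumptions.

```python
# Sketch: score candidate cut levels of one hierarchy with silhouettes.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(c, 0.8, (40, 2)) for c in (0, 4, 9)])
Z = linkage(X, method="average")

for k in range(2, 6):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, round(silhouette_score(X, labels), 3))  # higher is better
```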
Visualizing Hierarchical Clustering
● Visualizing hierarchical clustering results enhances our understanding of
formed clusters.
● Dendrogram Visualization:
○ Depicts merging/dividing process.
○ Provides insights into cluster relationships.
○ Helpful for selecting optimal cluster count.
● Cluster Visualization:
○ Plot clusters in multidimensional space.
○ Use dimensionality reduction techniques (e.g., PCA, t-SNE), as sketched below.
○ Visualize cluster characteristics, separability, and patterns.
● Visualizations empower us to extract meaningful insights from complex
hierarchical clustering outcomes.
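A sketch of PCA-based cluster visualization; the 5-dimensional toy data and the choice of three clusters are illustrative assumptions.

```python
# Sketch: project clustered high-dimensional data to 2-D for inspection.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(c, 1.0, (50, 5)) for c in (0, 5, 10)])  # 5-D toys

labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
X2 = PCA(n_components=2).fit_transform(X)  # project to two components

plt.scatter(X2[:, 0], X2[:, 1], c=labels)  # color by cluster assignment
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.show()
```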
Hierarchical Clustering in Data Exploration
● Hierarchical clustering can be an effective tool for exploratory data analysis.
● Data Exploration Benefits:
○ Pattern Discovery: Identify inherent data patterns and relationships.
○ Anomaly Detection: Detect outliers and unusual data points.
○ Data Reduction: Group similar data points for summarization.
○ Segmentation: Uncover distinct segments or subgroups.
● By applying hierarchical clustering to data exploration, we unveil valuable
insights that guide subsequent analyses and decision-making.
Recap and Key Takeaways
● Agglomerative clustering merges data points into nested clusters.
● Divisive clustering recursively divides data points into clusters.
● Linkage methods (single, complete, average) influence clustering outcomes.
● BIRCH utilizes micro-clustering for memory-efficient clustering.
● Applications include retail store segmentation and gene expression analysis.
● As you continue your journey through hierarchical clustering, remember the
significance of dendrograms and the diverse applications of this method in
various fields.
Summary
Hierarchical Clustering
● Hierarchical clustering organizes data in a tree-like structure.
● Agglomerative starts with individual data points and merges them.
● Divisive begins with all data in one cluster and splits iteratively.
● Useful for exploring data hierarchy and relationships.