
Divisive Hierarchical Clustering

Author: Amrithkala M Shetty

Date: September 2024


Table of Contents
1. Introduction

2. Divisive Hierarchical Clustering: An Overview

3. The Algorithm

4. Distance Metrics and Splitting Criteria

5. Example

6. Advantages and Disadvantages

7. Applications

8. Conclusion

1. Introduction
Hierarchical clustering is a popular method of cluster analysis that seeks to build a
hierarchy of clusters. It can be broadly categorized into two types: agglomerative
(bottom-up) and divisive (top-down) clustering. While agglomerative clustering starts with
individual points and merges them to form clusters, divisive clustering takes the opposite
approach. This document will focus on divisive hierarchical clustering, its significance,
algorithm, and applications.
2. Divisive Hierarchical Clustering: An Overview
Divisive hierarchical clustering, also known as the top-down approach, starts with all data
points in a single cluster. It recursively splits the clusters into smaller ones until each data
point is in its own cluster, or until another stopping criterion is met. This method is less
commonly used compared to agglomerative clustering but is particularly useful in certain
scenarios where large clusters need to be divided into more granular subgroups.
3. The Algorithm
The divisive clustering algorithm follows these steps:

1. Start with all data points in a single cluster.

2. Recursively split a cluster into two or more sub-clusters based on a chosen criterion,
such as distance.

3. Continue splitting until each data point is its own cluster or another stopping condition is
met.

A runnable Python sketch of the divisive hierarchical clustering algorithm (the
stopping condition and the split strategy are passed in as functions):


```
def divisive_clustering(data_points, stopping_condition, split):
    # Start with every data point in a single cluster.
    clusters = [list(data_points)]
    while not stopping_condition(clusters):
        # Splitting the largest cluster first is one common heuristic.
        cluster = max(clusters, key=len)
        if len(cluster) <= 1:
            break  # only singletons remain; nothing left to split
        clusters.remove(cluster)
        clusters.extend(split(cluster))
    return clusters
```
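
The sketch above leaves `stopping_condition` and `split` as parameters. As a minimal
illustration (the farthest-pair heuristic below is an assumption chosen for simplicity,
not part of the algorithm's definition), a split can be seeded with the two most distant
points in the cluster, assigning every other point to the nearer seed:

```
import math

def euclidean(p, q):
    # Straight-line distance between two points (math.dist, Python 3.8+).
    return math.dist(p, q)

def split_by_farthest_pair(cluster):
    # Seed the split with the two most distant points, then assign
    # each point to the nearer seed (an illustrative heuristic).
    a, b = max(
        ((p, q) for i, p in enumerate(cluster) for q in cluster[i + 1:]),
        key=lambda pair: euclidean(*pair),
    )
    left = [p for p in cluster if euclidean(p, a) <= euclidean(p, b)]
    right = [p for p in cluster if euclidean(p, a) > euclidean(p, b)]
    return [left, right]
```

Passing `stopping_condition=lambda cs: len(cs) >= k` then stops once k clusters have been
produced, roughly mirroring a dendrogram cut at k clusters.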
4. Distance Metrics and Splitting Criteria
The choice of distance metric and splitting criterion significantly impacts the results of
divisive clustering. Common distance metrics include Euclidean distance, Manhattan
distance, and cosine distance (one minus cosine similarity). The splitting criterion might
involve maximizing the inter-cluster distance or minimizing the intra-cluster distance,
for example by always splitting the cluster with the largest diameter first. These choices
should be made based on the specific characteristics of the data and the desired outcome.
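
As a hedged sketch of these choices, the three metrics above can be written as plain
functions, and a cluster's diameter (its largest intra-cluster distance) is one possible
criterion for deciding which cluster to split next; the `diameter` helper here is an
illustrative assumption, not a fixed part of the method:

```
import math

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine_distance(p, q):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(p, q))
    return 1.0 - dot / (math.hypot(*p) * math.hypot(*q))

def diameter(cluster, dist=euclidean):
    # Largest pairwise distance inside a cluster; splitting the
    # cluster with the largest diameter first is a common choice.
    return max(
        (dist(p, q) for i, p in enumerate(cluster) for q in cluster[i + 1:]),
        default=0.0,
    )
```

Because cosine distance depends only on orientation, not vector length, it is often
preferred for directional data such as text vectors.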
5. Example
Consider a small dataset of five points in 2D space: A(1,2), B(2,3), C(3,4), D(5,6),
and E(8,9). The divisive clustering process might
start by placing all points in a single cluster. The algorithm then evaluates and splits the
cluster into sub-clusters based on the distance between points, resulting in a hierarchy of
clusters. A visual representation of the resulting dendrogram can help illustrate the
clustering process.
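
To make this concrete, the hedged sketch below runs `divisive_clustering` (Section 3)
with the illustrative `split_by_farthest_pair` heuristic on these five points, assuming
both definitions are in scope. With Euclidean distance, A(1,2) and E(8,9) are the most
distant pair, so the first split separates {A, B, C} from {D, E}; further splits then
break each sub-cluster down to single points:

```
# The five example points A, B, C, D, E.
points = [(1, 2), (2, 3), (3, 4), (5, 6), (8, 9)]

# Split all the way down to singletons to trace the full hierarchy.
hierarchy = divisive_clustering(
    points,
    stopping_condition=lambda cs: all(len(c) == 1 for c in cs),
    split=split_by_farthest_pair,
)
print(hierarchy)  # five singleton clusters at the bottom of the hierarchy
```

Recording the sub-clusters produced at each split, rather than only the final partition,
yields exactly the tree that a dendrogram visualizes.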
6. Advantages and Disadvantages
### Advantages
- **Intuitive and Easy to Understand:** The top-down structure mirrors how analysts
naturally refine broad groups into finer subgroups, making the results easy to interpret.
- **Dendrogram Visualization:** The hierarchical nature allows for the creation of a
dendrogram, which visually represents the relationships between clusters.
- **No Need to Pre-specify Number of Clusters:** Unlike k-means clustering, divisive
clustering does not require the user to specify the number of clusters in advance.

### Disadvantages
- **Computationally Expensive:** The recursive splitting process is computationally
intensive, especially for large datasets; a cluster of n points admits 2^(n-1) - 1
possible two-way splits, so practical implementations rely on heuristic splits rather
than exhaustive search.
- **Sensitive to Noise and Outliers:** Divisive clustering can be affected by noise and
outliers, potentially leading to inaccurate clustering results.
- **Imbalanced Clusters:** The algorithm may produce clusters of significantly different
sizes, which can be undesirable in some applications.
7. Applications
Divisive hierarchical clustering has applications in various domains, including:
- **Biology:** It is used to classify species into hierarchical taxonomies based on genetic
similarities.
- **Marketing:** Helps in segmenting customers into distinct groups based on purchasing
behavior.
- **Social Network Analysis:** Divisive clustering is used to identify communities within
social networks, where users are grouped based on interaction patterns.
8. Conclusion
Divisive hierarchical clustering is a powerful method for clustering data, particularly when
there is a need to break down large clusters into more detailed sub-clusters. Despite its
computational challenges, it provides a clear and interpretable structure for analyzing
complex datasets. The ability to create a dendrogram adds an extra layer of insight, making
divisive clustering a valuable tool in various fields of study.