Clustering Metrics in Machine Learning
Clustering is a technique in Machine Learning that is used to group similar data points. While a clustering algorithm uncovers patterns and structure in the data, it is equally important to judge how well it does so. Several metrics have been designed to evaluate the performance of clustering algorithms.
In this article, we will explore these metrics and see the mathematical concepts that lie behind them. After that, we will demonstrate their practical implementation using scikit-learn.
What is Clustering?
Clustering is an unsupervised machine-learning approach that is used to group comparable data points based on specific traits or attributes. Clustering algorithms do not require labelled data, which makes them ideal for finding patterns in large datasets. It is a widely used technique in applications like customer segmentation, image recognition, anomaly detection, etc.
There are multiple clustering algorithms, and each has its own way of grouping data points. Clustering metrics are used to evaluate all of them. Let us take a look at some of the most commonly used clustering metrics:
1. Silhouette Score
The Silhouette Score is a way to measure how good the clusters are in a dataset. It helps us understand how well the data points have been grouped. The score ranges from -1 to 1.
- A score close to 1 means a point fits really well in its group (cluster) and is far from other groups.
- A score close to 0 means the point is on the border between two clusters.
- A score close to -1 means the point might be in the wrong cluster.
Silhouette Score (S) for a data point i is calculated as:
S(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}
where,
- a(i) is the average distance from i to other data points in the same cluster.
- b(i) is the smallest average distance from i to the points of any other cluster, i.e. the distance to the nearest neighbouring cluster.
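To make the formula concrete, here is a minimal sketch that computes a(i) and b(i) by hand for a single point of a small made-up dataset (the toy data and labels below are purely illustrative) and cross-checks the result against scikit-learn's silhouette_samples:
Python
import numpy as np
from sklearn.metrics import silhouette_samples

# Toy 1-D data split into two hand-labelled clusters (illustrative only)
X = np.array([[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

i = 0  # evaluate the first point
same_idx = np.where(labels == labels[i])[0]
same_idx = same_idx[same_idx != i]            # exclude the point itself
other_idx = np.where(labels != labels[i])[0]

# a(i): mean distance to the other points of the same cluster
a_i = np.mean(np.linalg.norm(X[same_idx] - X[i], axis=1))
# b(i): mean distance to the nearest other cluster (only one other cluster here)
b_i = np.mean(np.linalg.norm(X[other_idx] - X[i], axis=1))

s_i = (b_i - a_i) / max(a_i, b_i)
print(round(s_i, 3), round(silhouette_samples(X, labels)[i], 3))  # the two values should agree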
2. Davies-Bouldin Index
The Davies-Bouldin Index (DBI) helps us measure how good the clustering is in a dataset. It looks at how tight each cluster is (compactness), and how far apart the clusters are (separation).
- Lower DBI = better, clearer clusters
- Higher DBI = messy, overlapping clusters
A lower score is better, because it means:
- Points in the same cluster are close to each other.
- Different clusters are far apart from one another.
Davies-Bouldin Index (DB) is calculated as:
DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \left( \frac{S_i + S_j}{d_{ij}} \right)
where,
- k is the total number of clusters.
- S_i is the compactness of cluster i (the average distance of its points to the cluster centroid).
- S_j is the compactness of cluster j.
- d_{ij} is the distance between the centroids of clusters i and j.
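Under these definitions the index can be computed by hand. Below is a minimal sketch on made-up 2-D clusters (toy data only), cross-checked against scikit-learn's davies_bouldin_score:
Python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Two well-separated toy clusters (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

k = 2
centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
# S_c: compactness = mean distance of the cluster's points to its centroid
S = np.array([np.mean(np.linalg.norm(X[labels == c] - centroids[c], axis=1)) for c in range(k)])

db = 0.0
for i in range(k):
    ratios = [(S[i] + S[j]) / np.linalg.norm(centroids[i] - centroids[j])
              for j in range(k) if j != i]
    db += max(ratios)  # the most similar (worst-case) neighbouring cluster
db /= k

print(round(db, 4), round(davies_bouldin_score(X, labels), 4))  # the two values should agree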
3. Calinski-Harabasz Index (Variance Ratio Criterion)
The Calinski-Harabasz Index measures how good the clusters are in a dataset.
It looks at:
- How close the points are within each cluster (compactness).
- How far apart the clusters are from each other (separation).
A higher score is better, as it means the clusters are tight and well-separated. It helps determine the ideal number of clusters.
Calinski-Harabasz Index (CH) is calculated as:
CH = \frac{B}{W} \times \frac{N - K}{K - 1}
where,
- B is the sum of squares between clusters.
- W is the sum of squares within clusters.
- N is the total number of data points.
- K is the number of clusters.
Calculating the between-group sum of squares (B)
B = \sum_{k=1}^{K} n_k \, ||C_k - C||^2
where,
- n_k is the number of observations in cluster 'k'
- C_k is the centroid of cluster 'k'
- C is the centroid of the dataset
- K is number of clusters
Calculating the within-group sum of squares (W)
W = \sum_{k=1}^{K} \sum_{i=1}^{n_k} ||X_{ik} - C_k||^2
where,
- n_k is the number of observations in cluster 'k'
- X_{ik} is the i-th observation of cluster 'k'
- C_k is the centroid of cluster 'k'
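Putting B and W together, a minimal sketch on made-up data (toy points and labels only) computes the index directly from the formulas above and compares it with scikit-learn's calinski_harabasz_score:
Python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Two compact toy clusters (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

N, K = len(X), 2
C = X.mean(axis=0)                         # centroid of the whole dataset
B = W = 0.0
for k in range(K):
    Xk = X[labels == k]
    Ck = Xk.mean(axis=0)                   # centroid of cluster k
    B += len(Xk) * np.sum((Ck - C) ** 2)   # between-group sum of squares
    W += np.sum((Xk - Ck) ** 2)            # within-group sum of squares

ch = (B / W) * (N - K) / (K - 1)
print(round(ch, 2), round(calinski_harabasz_score(X, labels), 2))  # the two values should agree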
4. Adjusted Rand Index (ARI)
The Adjusted Rand Index (ARI) helps us measure how accurate a clustering result is by comparing it to the true labels (ground truth).
It checks how well the pairs of points are grouped:
- Are the same pairs together in both the real and predicted clusters?
- Are different pairs also kept apart correctly?
The score ranges from -1 to 1:
- 1 means perfect match - the clustering is exactly right.
- 0 means random guess - no better than chance.
- Below 0 means worse than random - very poor clustering.
Adjusted Rand Index (ARI) is calculated as:
ARI = \frac{RI - \text{Expected}_{RI}}{\max(RI) - \text{Expected}_{RI}}
where,
- RI is the Rand Index.
- Expected_{RI} is the expected value of the Rand Index.
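A quick way to get a feel for the score is to compare a few hand-made label vectors with scikit-learn's adjusted_rand_score (the label vectors below are made up for illustration). Note that ARI only cares about the grouping, not the cluster names:
Python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 1, 1, 2, 2]

print(adjusted_rand_score(true_labels, [0, 0, 1, 1, 2, 2]))  # 1.0 - identical clustering
print(adjusted_rand_score(true_labels, [2, 2, 0, 0, 1, 1]))  # 1.0 - same grouping, renamed clusters
print(adjusted_rand_score(true_labels, [0, 1, 2, 0, 1, 2]))  # negative - worse than random grouping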
5. Mutual Information (MI)
Mutual Information (MI) measures how much two variables are related or connected. In clustering, it compares how much the true cluster labels agree with the predicted labels: it tells us how much knowing one set of labels helps us predict the other. The more agreement there is, the higher the score.
- Higher values mean better agreement between the clusters.
- Zero means no agreement at all.
MI between true labels Y and predicted labels Z is calculated as:
MI(y, z) = \sum_{i} \sum_{j} p(y_i, z_j) \cdot \log \left( \frac{p(y_i, z_j)}{p(y_i) \cdot p(z_j)} \right)
where,
- y_i is a true label.
- z_j is a predicted label.
- p(y_i, z_j) is the joint probability of y_i and z_j.
- p(y_i) and p(z_j) are the marginal probabilities.
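A small sketch with made-up label vectors shows how the score behaves (mutual_info_score uses the natural logarithm, so the raw value is not bounded by 1; normalized_mutual_info_score rescales it to the [0, 1] range):
Python
from sklearn.metrics import mutual_info_score, normalized_mutual_info_score

true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [1, 1, 0, 0, 2, 2]   # same grouping as the truth, different cluster names

print(mutual_info_score(true_labels, pred_labels))             # maximal agreement: log(3) ≈ 1.10 here
print(normalized_mutual_info_score(true_labels, pred_labels))  # 1.0 after normalisation
print(mutual_info_score(true_labels, [0, 0, 0, 0, 0, 0]))      # 0.0 - one big cluster carries no information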
These clustering metrics help in evaluating the quality and performance of clustering algorithms, allowing for informed decisions when selecting the most suitable clustering solution for a given dataset.
Steps to Evaluate Clustering Using Sklearn
Let's consider an example using the Iris dataset and the K-Means clustering algorithm. We will calculate the Silhouette Score, Davies-Bouldin Index, Calinski-Harabasz Index, and Adjusted Rand Index to evaluate the clustering.
Import Libraries
Import the necessary libraries, including scikit-learn (sklearn).
Python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score
from sklearn.metrics import mutual_info_score, adjusted_rand_score
Load Your Data
Load or generate your dataset for clustering. The Iris dataset consists of 150 samples of iris flowers from three species (setosa, versicolor, and virginica), each described by four features: sepal length, sepal width, petal length, and petal width.
Python
# Example using a built-in dataset (e.g., Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
Fit a Clustering Algorithm
Choose a clustering algorithm, such as K-Means, and fit it to your data.
K-Means is an unsupervised technique that creates clusters based on similarity. It iteratively assigns data points to the nearest cluster center and updates the centroids until convergence.
Python
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed for reproducible labels
kmeans.fit(X)
Calculate Clustering Metrics
Use the appropriate clustering metrics to evaluate the clustering results.
Python
# Calculate clustering metrics
silhouette = silhouette_score(X, kmeans.labels_)
db_index = davies_bouldin_score(X, kmeans.labels_)
ch_index = calinski_harabasz_score(X, kmeans.labels_)
ari = adjusted_rand_score(iris.target, kmeans.labels_)
mi = mutual_info_score(iris.target, kmeans.labels_)
# Print the metric scores
print(f"Silhouette Score: {silhouette:.2f}")
print(f"Davies-Bouldin Index: {db_index:.2f}")
print(f"Calinski-Harabasz Index: {ch_index:.2f}")
print(f"Adjusted Rand Index: {ari:.2f}")
print(f"Mutual Information (MI): {mi:.2f}")
Output:
Silhouette Score: 0.55
Davies-Bouldin Index: 0.66
Calinski-Harabasz Index: 561.63
Adjusted Rand Index: 0.73
Mutual Information (MI): 0.83
Interpreting the Metrics
Here's an interpretation of the metric scores obtained:
Silhouette Score (0.55)
This score reveals how similar data points are to their own cluster compared to data points from other clusters. A result of 0.55 indicates that there is some separation between the clusters, but there is still room for improvement. Values closer to 1 suggest better-defined clusters.
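If the average score looks middling like this, the per-sample silhouette values can show which points sit near cluster borders. A short optional sketch, continuing from the K-Means fit above (the 0.25 threshold is an arbitrary illustrative choice):
Python
import numpy as np
from sklearn.metrics import silhouette_samples

# Per-point silhouette values for the fitted K-Means labels
sample_scores = silhouette_samples(X, kmeans.labels_)
borderline = np.sum(sample_scores < 0.25)   # arbitrary threshold, for illustration only
print(f"{borderline} of {len(X)} points have a silhouette value below 0.25")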
Davies-Bouldin Index (0.66)
This index measures the average similarity between each cluster and the cluster most similar to it. A lower score is preferable, and 0.66 suggests reasonably good separation between clusters.
Calinski-Harabasz Index (561.63)
This index is the ratio of between-cluster variance to within-cluster variance. Higher values suggest more distinct groups; a score of 561.63 indicates compact, well-separated clusters.
Adjusted Rand Index (0.73)
This index compares the predicted cluster labels with the true class labels. A score of 0.73 shows that the clustering results correspond fairly well to the actual classes.
Mutual Information (0.83)
This metric measures the agreement between the true class labels and the predicted cluster labels. A score of 0.83 indicates a substantial amount of shared information between the true labels and the clusters assigned by the algorithm.