Clustering

Clustering is an unsupervised machine learning technique that groups similar data points together without labels. It discovers natural groupings or patterns in datasets. Some common applications of clustering include customer segmentation, image segmentation, document grouping, and anomaly detection. There are different types of clustering algorithms such as partitioning clustering (e.g. k-means), density-based clustering, distribution model-based clustering (e.g. Gaussian mixture models), hierarchical clustering, and fuzzy clustering.

Uploaded by

Hareesh K
Copyright
© All Rights Reserved

Clustering

Clustering is a machine learning and data analysis technique used to group similar data points
together based on certain characteristics or features. The goal of clustering is to find patterns,
structures, or natural groupings within a dataset, without prior knowledge of the class labels.
It is often an unsupervised learning technique, meaning that the algorithm discovers the
clusters autonomously without labeled data.

Note: Clustering is somewhat similar to classification, but the difference lies in the type of
dataset used. In classification, we work with a labeled dataset, whereas in clustering, we
work with an unlabeled dataset.

Example: Let's understand clustering with the real-world example of a shopping mall. When
we visit a mall, we can observe that items with similar uses are grouped together: t-shirts
are in one section and shirts in another. Similarly, in the fruit and vegetable section,
apples, bananas, mangoes, etc., are each kept in separate areas so that we can easily find
things. Clustering works in the same way.

Here's an explanation of clustering with an example:

Clustering Example: Customer Segmentation

Let's say you are the manager of a retail store, and you want to better understand your customer
base to tailor marketing strategies and product offerings. You have collected data on your
customers' purchase history, including information such as age, income, and shopping
behavior. Clustering can help you segment your customer base into distinct groups to better
target your marketing efforts.

1. Data Collection: You gather data on a sample of your customers. The data includes
features like age, income, and purchase behavior, and it is represented as a dataset
with multiple data points (each customer) and features (age, income, etc.).
2. Data Preprocessing: You may need to clean and preprocess the data by handling
missing values, scaling the features, and ensuring that the data is in a suitable format
for clustering algorithms.
3. Selecting a Clustering Algorithm: There are various clustering algorithms to choose
from, such as K-Means, hierarchical clustering, DBSCAN, and more. Each algorithm
has its own strengths and weaknesses. For this example, let's use K-Means clustering.
4. Applying K-Means Clustering:
• Choose the number of clusters (K) that you want to divide your customers into. This
is a hyperparameter you need to specify.
• K-Means iteratively assigns each customer to the nearest cluster centroid based on
their features and then recalculates the cluster centroids. This process continues until
convergence.
• After convergence, you will have K clusters, each containing customers with
similar features.

5. Interpreting the Clusters: Once the clustering algorithm has run, you can analyze the
results to understand the characteristics of each cluster. For example, you might find:
• Cluster 1: Young, high-income customers who buy luxury products.
• Cluster 2: Middle-aged, middle-income customers who purchase everyday items.
• Cluster 3: Seniors on fixed incomes who shop infrequently.

6. Business Insights and Actions: With the clusters identified, you can tailor marketing
strategies and product recommendations for each group. For example:
• Cluster 1 may receive targeted advertisements for high-end products.
• Cluster 2 could be offered loyalty discounts on frequently purchased items.
• Cluster 3 might benefit from senior citizen discounts and special events.

Clustering helps you gain insights into your customer base and make data-driven decisions to
improve business operations. This is just one example of how clustering can be applied.
Clustering is widely used in various fields, including customer segmentation, image
segmentation, document grouping, and anomaly detection, among others. The specific
application and choice of clustering algorithm depend on the problem and the nature of the
data.
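
The customer segmentation steps above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes NumPy and scikit-learn are available, and the customer values, the feature choice (age, annual income, purchases per month), and K = 3 are all invented for the example rather than taken from real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Step 1: hypothetical customers as [age, annual income, purchases per month].
# All values are made up for illustration.
customers = np.array([
    [24, 95_000, 6], [27, 102_000, 7], [29, 98_000, 5],    # young, high income
    [42, 55_000, 12], [45, 58_000, 11], [40, 52_000, 13],  # middle-aged, middle income
    [68, 28_000, 2], [72, 25_000, 1], [70, 30_000, 2],     # seniors, fixed income
], dtype=float)

# Step 2: scale the features so that income (large numbers) does not
# dominate the distance computation.
scaled = StandardScaler().fit_transform(customers)

# Steps 3-4: run K-Means with K = 3 (a hyperparameter chosen up front).
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(scaled)

# Step 5: describe each cluster by the mean of its members' raw features.
for k in range(3):
    members = customers[labels == k]
    age, income, purchases = members.mean(axis=0)
    print(f"cluster {k}: n={len(members)}, mean age={age:.0f}, "
          f"mean income={income:.0f}, purchases/month={purchases:.1f}")
```

The cluster numbers that K-Means assigns are arbitrary, so the groups are best interpreted through their summary statistics rather than their label values.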

Applications of Clustering

Below are some commonly known applications of the clustering technique in machine learning:

In Identification of Cancer Cells: Clustering algorithms are widely used to help identify
cancerous cells by dividing cancerous and non-cancerous data points into different groups.

In Search Engines: Search engines also make use of clustering. Results are returned based on
the objects closest to the search query, which is done by grouping similar data objects
together and keeping them apart from dissimilar ones. The accuracy of the results for a query
depends on the quality of the clustering algorithm used.

Customer Segmentation: It is used in market research to segment customers based on their
choices and preferences.

In Biology: It is used in biology to classify different species of plants and animals
using image recognition techniques.

In Land Use: Clustering is used to identify areas of similar land use in a GIS database.
This can be very useful for determining the purpose for which a particular piece of land is
most suitable.

Types of Clustering Methods

Clustering methods are broadly divided into hard clustering (each data point belongs to only
one group) and soft clustering (a data point can also belong to more than one group). Various
other approaches to clustering also exist. Below are the main clustering methods used
in machine learning:

• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering

1. Partitioning Clustering

It is a type of clustering that divides the data into non-hierarchical groups. It is also known as
the centroid-based method. The most common example of partitioning clustering is the
K-Means clustering algorithm.

In this type, the dataset is divided into K groups, where K is the number of groups specified
in advance. The cluster centers are chosen so that each data point is closer to the centroid
of its own cluster than to the centroid of any other cluster.
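
To make the centroid idea concrete, here is a minimal from-scratch sketch of the usual K-Means loop (often called Lloyd's algorithm), assuming only NumPy. The function name `k_means`, its parameters, and the synthetic two-blob data are illustrative choices, not part of any library.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Illustrative K-Means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids as k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels

# Synthetic data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(20, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can give different label numbering, and on harder data K-Means may settle in a poor local optimum, which is why library implementations typically run several initializations.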

2. Density-Based Clustering

The density-based clustering method connects highly dense areas into clusters, so arbitrarily
shaped clusters can be formed as long as the dense regions can be connected. The algorithm
identifies regions of high density in the dataset and joins neighboring dense regions into
clusters, which are separated from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has varying
densities and high dimensions.
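
As an illustration of arbitrarily shaped clusters, the sketch below runs DBSCAN, a well-known density-based algorithm, on a ring that surrounds a compact blob. It assumes scikit-learn is installed; the data and the `eps` and `min_samples` values are hand-picked for this toy example and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data (invented for illustration): a dense blob at the origin,
# a ring of radius 3 around it, and one isolated point.
grid = np.linspace(-0.4, 0.4, 5)
blob = np.array([[x, y] for x in grid for y in grid])          # 25 points
angles = np.deg2rad(np.arange(0, 360, 6))
ring = 3.0 * np.column_stack([np.cos(angles), np.sin(angles)])  # 60 points
outlier = np.array([[10.0, 10.0]])
X = np.vstack([blob, ring, outlier])

# eps is the neighborhood radius; min_samples is how many points (including
# the point itself) must fall inside that radius for a region to be "dense".
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

print("blob labels:  ", np.unique(labels[:25]))
print("ring labels:  ", np.unique(labels[25:85]))
print("outlier label:", labels[-1])  # DBSCAN marks noise points with -1
```

A centroid-based method would struggle here because the ring and the blob share the same center, whereas a density-based method follows the connected dense region around the ring.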

3. Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the probability
that a data point belongs to a particular distribution. The grouping is done by assuming some
distribution, most commonly the Gaussian distribution.

An example of this type is the Expectation-Maximization clustering algorithm, which uses
Gaussian Mixture Models (GMM).
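
A short sketch of this idea, assuming scikit-learn is available: `GaussianMixture` fits the mixture with Expectation-Maximization, and `predict_proba` returns the probability of each point belonging to each Gaussian component. The one-dimensional data below is synthetic and chosen only to keep the example small.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data: two Gaussian groups centered at 0 and 6.
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.normal(0.0, 1.0, 200),
    rng.normal(6.0, 1.0, 200),
]).reshape(-1, 1)

# Fit a two-component Gaussian mixture (EM runs inside fit()).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("estimated means:", gmm.means_.ravel())

# Soft assignment: one probability per component for each query point.
queries = np.array([[0.0], [3.0], [6.0]])
probs = gmm.predict_proba(queries)
print(probs.round(3))
```

Points near one of the centers get a probability close to 1 for that component, while a point between the two groups receives a more evenly split assignment.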

4. Hierarchical Clustering

Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no
requirement to pre-specify the number of clusters to be created. In this technique, the dataset
is organized into a tree-like hierarchy of clusters, which is represented as a dendrogram. Any
desired number of clusters can be obtained by cutting the tree at the appropriate level.
The most common example of this method is the Agglomerative Hierarchical algorithm.
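
The tree-cutting idea can be sketched with SciPy's hierarchy module, assuming SciPy is installed. The nine points below are made up, and average linkage is just one of several possible choices for measuring the distance between clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2-D points forming three loose groups.
points = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [4.0, 4.0], [4.1, 3.8], [3.9, 4.2],
    [10.0, 0.0], [10.2, 0.1], [9.9, -0.2],
])

# Agglomerative (bottom-up) clustering: every point starts as its own
# cluster and the two closest clusters are merged repeatedly. Z records
# the merges, i.e. the tree that a dendrogram would draw.
Z = linkage(points, method="average")

# Cutting the same tree at different levels gives different numbers of
# clusters, without re-running the algorithm.
two_clusters = fcluster(Z, t=2, criterion="maxclust")
three_clusters = fcluster(Z, t=3, criterion="maxclust")

print("2 clusters:", two_clusters)
print("3 clusters:", three_clusters)
```

If matplotlib is available, `scipy.cluster.hierarchy.dendrogram(Z)` draws the tree itself, which helps in choosing the level at which to cut.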

5. Fuzzy Clustering

Fuzzy clustering is a soft method in which a data object may belong to more than one
group or cluster. Each data point has a set of membership coefficients, one per cluster, which
express its degree of membership in that cluster. The Fuzzy C-means algorithm is an example
of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
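
The membership coefficients can be illustrated with a small from-scratch sketch of Fuzzy C-means, assuming only NumPy. The function name `fuzzy_c_means`, the fuzzifier value m = 2, and the seven sample points are illustrative assumptions; the update rules follow the standard textbook formulation.

```python
import numpy as np

def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Illustrative Fuzzy C-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row (one point) sums to 1.
    u = rng.random((len(points), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership**m.
        w = u ** m
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        # Memberships are proportional to distance**(-2 / (m - 1)),
        # normalized so each point's memberships sum to 1.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        dists = np.maximum(dists, 1e-12)  # avoid division by zero
        inv = dists ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

# Two tight groups plus one point roughly halfway between them.
points = np.array([
    [0.0, 0.0], [0.5, 0.2], [0.2, 0.5],
    [6.0, 6.0], [6.4, 5.8], [5.9, 6.3],
    [3.1, 3.1],
])
centers, u = fuzzy_c_means(points, c=2)
print(u.round(3))
```

Points inside a group end up with a membership close to 1 for that group, while the in-between point has its membership split across both clusters, which is the behavior that distinguishes soft clustering from hard clustering.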
