CLUSTERING TECHNIQUES
SUBMITTED BY
RICHARD MARTIN
M TECH IV SEMESTER
TOPICS COVERED
• Gentle introduction to clustering
• Types of Clustering
• Clustering Algorithms
• Applications of Clustering
• Advantages and Disadvantages
• Reference
WHAT IS CLUSTERING
Clustering, or cluster analysis, is a machine learning technique
that groups an unlabelled dataset. It can be defined as "a way
of grouping data points into different clusters, each consisting of
similar data points. Objects with possible similarities remain in
a group that has few or no similarities with another group."
It does this by finding similar patterns in the unlabelled dataset,
such as shape, size, color, or behavior, and dividing the data points
according to the presence or absence of those patterns.
It is an unsupervised learning method: no supervision is provided
to the algorithm, and it works with an unlabelled dataset.
After the clustering technique is applied, each cluster or group is
given a cluster ID. An ML system can use this ID to simplify
the processing of large and complex datasets.
EXAMPLES OF CLUSTERING
Let's understand the clustering technique with the real-world
example of a shopping mall.
When we visit a shopping mall, we can observe that things with
similar usage are grouped together: t-shirts are in one section
and trousers in another. Similarly, in the produce section, apples,
bananas, mangoes, etc. are grouped separately so that we can
find things easily. The clustering technique works in the same
way. Another example of clustering is grouping documents by
topic.
CLUSTERING TECHNIQUES
Clustering is widely used across various tasks. Some of the most common uses of this technique are:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Apart from these general uses, Amazon uses clustering in its recommendation system to suggest products based on
past searches. Netflix also uses this technique to recommend movies and web series to its users based on their
watch history.
TYPES OF CLUSTERING
Clustering methods are broadly divided into hard
clustering (each data point belongs to only one
group) and soft clustering (a data point can also
belong to other groups). Various other approaches
to clustering also exist. Below are the main
clustering methods used in machine learning:
1. Partitioning Clustering
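The difference between hard and soft clustering described above can be illustrated with a minimal sketch. The function names, the `stiffness` parameter, and the softmax weighting over squared distances are assumptions made for this illustration, not a specific published algorithm.

```python
import math

def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

def soft_assign(point, centers, stiffness=1.0):
    """Soft clustering: return one membership weight per center, summing to 1.

    The weights are a softmax over negative squared distances. This weighting
    scheme is an illustrative assumption only.
    """
    scores = [-stiffness * math.dist(point, c) ** 2 for c in centers]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical data: two cluster centers and one point lying between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.5, 0.0)

hard_label = hard_assign(point, centers)    # a single cluster index
soft_weights = soft_assign(point, centers)  # a degree of membership per cluster
print(hard_label, [round(w, 3) for w in soft_weights])
```

The hard assignment commits the point to one group, while the soft assignment keeps a nonzero weight for every group.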
Clustering algorithms can be divided according to the models explained above. Many clustering algorithms have been
published, but only a few are commonly used. The choice of algorithm depends on the kind of data being used. For
example, some algorithms require the number of clusters in the dataset to be guessed in advance, whereas others
need to find the minimum distance between the observations of the dataset.
TYPES OF CLUSTERING ALGORITHMS
Here we discuss the popular clustering algorithms that are widely used in machine learning:
• K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It classifies the
dataset by dividing the samples into clusters of equal variance. The number of clusters must be specified
in advance. It is fast and requires relatively few computations, with complexity linear in the number of samples, O(n).
• Mean-shift algorithm: The mean-shift algorithm tries to find dense areas in a smooth density of data points. It is
an example of a centroid-based model that works by updating candidate centroids to be the center of the
points within a given region.
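The k-means procedure described above can be sketched in plain Python as an alternation of an assignment step and a centroid-update step (commonly called Lloyd's algorithm). The function name `k_means`, the sample points, and the choice of the first k points as initial centroids are assumptions made to keep the sketch deterministic. Practical implementations usually initialize randomly.

```python
import math

def k_means(points, k, iterations=100):
    """Alternate assignment and centroid-update steps until nothing moves."""
    # Assumption: use the first k points as initial centroids so the run is
    # deterministic. Real implementations typically pick them at random.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point gets the ID of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Each pass costs one distance computation per point per centroid, which is where the linear dependence on the number of samples comes from.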
TYPES OF CLUSTERING ALGORITHMS
• DBSCAN algorithm: DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is an
example of a density-based model, similar to mean-shift but with some notable advantages. In this
algorithm, areas of high density are separated by areas of low density. Because of this, clusters of any
arbitrary shape can be found.
• Expectation-Maximization clustering using GMM: This algorithm can be used as an alternative to the
k-means algorithm, or in cases where k-means may fail. In a GMM (Gaussian mixture model), the data points
are assumed to be Gaussian distributed.
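The density-based idea behind DBSCAN can be sketched as follows. This is a minimal, unoptimized version: the parameter names `eps` and `min_pts`, the `NOISE` marker, and the sample points are assumptions chosen for the illustration, and neighbor search is done by brute force.

```python
import math

NOISE = -1  # label used here for points in low-density areas

def dbscan(points, eps, min_pts):
    """Label each point with a cluster ID, or NOISE if it lies in a sparse area."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # dense point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

# Hypothetical data: an elongated chain, a second small group, and one outlier.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 5), (1, 5), (2, 5), (10, 10)]
labels = dbscan(points, eps=1.5, min_pts=2)
print(labels)
```

The chain is recovered as a single cluster even though its two ends are far apart, because each point is linked to the next through a dense neighborhood. The isolated point is marked as noise rather than forced into a cluster.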
TYPES OF CLUSTERING ALGORITHMS
• Agglomerative hierarchical algorithm: The agglomerative hierarchical algorithm performs bottom-up
hierarchical clustering. Each data point is treated as a single cluster at the outset, and clusters are then
successively merged. The cluster hierarchy can be represented as a tree structure.
• Affinity Propagation: Unlike algorithms such as k-means, it does not require the number of clusters to be
specified. Messages are exchanged between pairs of data points until convergence. Its main drawback is its
O(N²T) time complexity, where N is the number of samples and T is the number of iterations.
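The bottom-up merging performed by the agglomerative hierarchical algorithm can be sketched as below. The single-linkage distance (closest pair across two clusters), the function name `agglomerative`, and the sample points are assumptions for this illustration. Other linkage criteria are equally valid.

```python
import math

def agglomerative(points, num_clusters):
    """Bottom-up clustering with single linkage; returns clusters and merge history."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []  # (members_a, members_b, distance): one entry per tree node

    def linkage(a, b):
        # Single linkage: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        dist, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], dist))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical data: two nearby pairs of points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
clusters, merges = agglomerative(points, num_clusters=2)
print(clusters)
for a, b, dist in merges:
    print(a, "+", b, "at distance", round(dist, 2))
```

The recorded merge history is the tree structure: reading it from first to last gives the hierarchy from the leaves upward, and stopping at a different `num_clusters` corresponds to cutting the tree at a different height.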
APPLICATIONS OF CLUSTERING
Below are some commonly known applications of the clustering
technique in machine learning:
• In identification of cancer cells: Clustering algorithms
are widely used to identify cancerous cells. They divide
cancerous and non-cancerous data into different groups.
• In search engines: Search engines also rely on clustering.
Search results are returned based on the objects closest to
the search query, which is done by grouping similar data
objects together, far from dissimilar objects. The accuracy
of a query's results depends on the quality of the
clustering algorithm used.
APPLICATIONS OF CLUSTERING
• Customer segmentation: It is used in market
research to segment customers based on
their choices and preferences.
• In biology: It is used to classify different
species of plants and animals using image
recognition techniques.
• In land use: Clustering is used to identify areas
of similar land use in a GIS database. This can
help determine the purpose for which a
particular piece of land is most suitable.
ADVANTAGES OF CLUSTERING
Failover Support. Failover support
ensures that a business intelligence
system remains available for use if an
application or hardware failure occurs. ...
Work Fencing.
DISADVANTAGES OF CLUSTERING
REFERENCE
https://www.javatpoint.com/clustering-in-machine-learning
https://www.geeksforgeeks.org/clustering-in-machine-learning
https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering
https://developers.google.com/machine-learning/clustering/overview