clustering

Uploaded by

richard martin
Copyright
© © All Rights Reserved
CLUSTERING TECHNIQUES

Submitted by
Richard Martin
M Tech IV Semester

TOPICS COVERED

• A gentle introduction to clustering
• Types of clustering
• Clustering algorithms
• Applications of clustering
• Advantages and disadvantages
• References

WHAT IS CLUSTERING
Clustering, or cluster analysis, is a machine learning technique
that groups an unlabelled dataset. It can be defined as "a way
of grouping the data points into different clusters, consisting of
similar data points. The objects with the possible similarities
remain in a group that has less or no similarities with another
group."
It works by finding similar patterns in the unlabelled dataset,
such as shape, size, color, or behavior, and divides the data
points according to the presence or absence of those patterns.
It is an unsupervised learning method: no supervision is
provided to the algorithm, and it works on unlabelled data.
After clustering, each cluster or group is given a cluster ID.
An ML system can use this ID to simplify the processing of
large and complex datasets.
EXAMPLES OF CLUSTERING
Let's understand the clustering technique with the real-world
example of a shopping mall.
When we visit a shopping mall, we can observe that things
with similar uses are grouped together. T-shirts are in one
section and trousers in another; similarly, in the fruit and
vegetable section, apples, bananas, mangoes, etc. are kept in
separate groups so that we can find things easily. The
clustering technique works in the same way. Another example
of clustering is grouping documents according to their topic.
CLUSTERING TECHNIQUES

Clustering is widely used across many tasks. Some of the most common uses of this technique are:
• Market segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Apart from these general uses, Amazon uses it in its recommendation system to suggest products based on
past searches. Netflix also uses this technique to recommend movies and web series to its users based on
their watch history.
TYPES OF CLUSTERING

Clustering methods are broadly divided into hard clustering
(each data point belongs to exactly one group) and soft
clustering (a data point can belong to more than one group).
Several other approaches to clustering also exist. Below are
the main clustering methods used in machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
PARTITIONING CLUSTERING

It is a type of clustering that divides the data into
non-hierarchical groups. It is also known as the
centroid-based method. The most common example of
partitioning clustering is the K-Means clustering algorithm.
In this type, the dataset is divided into a set of K groups,
where K is the pre-defined number of groups. The cluster
centers are chosen so that each data point is closer to the
centroid of its own cluster than to the centroid of any
other cluster.
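
The partitioning idea above can be sketched in a few lines of plain Python. This is a minimal illustration of K-Means (assignment step, then centroid update), not a production implementation; the function names, the sample points, and the choice of K = 2 are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=100, seed=0):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)     # the first three points share one label, the last three the other
print(centroids)
```

Which group receives label 0 depends on the random starting centroids, so only the grouping itself should be relied on.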
DENSITY-BASED CLUSTERING

The density-based clustering method connects highly dense
areas into clusters, so clusters of arbitrary shape can be
formed as long as the dense regions can be connected. The
dense areas in the data space are separated from each other
by sparser areas.
These algorithms can have difficulty clustering the data
points if the dataset has varying densities or high
dimensionality.
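
A compact sketch of the density-based idea, following the general DBSCAN scheme: a point with enough neighbours inside a radius is a core point, clusters grow by connecting core points, and isolated points are marked as noise. The parameter names eps and min_pts, the NOISE marker, and the sample data are assumptions for illustration.

```python
import math

NOISE = -1  # label used here for points that belong to no dense region


def region_query(points, index, eps):
    """Indices of all points within distance eps of points[index] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[index], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, keep expanding
        cluster_id += 1
    return labels


# Hypothetical sample data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike K-Means, the number of clusters is not passed in; it follows from eps and min_pts.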
DISTRIBUTION MODEL-BASED CLUSTERING

In the distribution model-based clustering method, the
data is divided based on the probability that a data point
belongs to a particular distribution. The grouping is done
by assuming some distribution, most commonly the Gaussian
distribution.
An example of this type is the Expectation-Maximization
clustering algorithm, which uses Gaussian Mixture Models
(GMM).
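
To make the Expectation-Maximization idea concrete, here is a small sketch for a one-dimensional Gaussian mixture. It is an illustration under simplifying assumptions (1-D data, caller-supplied starting means, unit starting variances, a fixed number of iterations); the function names and the sample data are hypothetical.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def fit_gmm_1d(data, initial_means, iterations=50):
    """Minimal EM sketch for a 1-D Gaussian mixture.

    Returns (weights, means, variances, responsibilities).
    """
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k          # assumption: unit variance to start
    weights = [1.0 / k] * k        # assumption: equal mixing weights to start
    resp = []
    for _ in range(iterations):
        # E-step: probability that each point was generated by each component.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density well defined
    return weights, means, variances, resp


# Hypothetical sample data: one group around 1 and one around 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, resp = fit_gmm_1d(data, initial_means=(0.0, 6.0))

# A hard assignment picks the most probable component for each point.
labels = [max(range(len(means)), key=row.__getitem__) for row in resp]
print(means)
print(labels)
```

The responsibilities in resp are probabilities, so this method produces a soft assignment first; the hard labels are derived from it only at the end.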
HIERARCHICAL CLUSTERING

Hierarchical clustering can be used as an
alternative to partitioning clustering, as
there is no need to pre-specify the number
of clusters to be created. In this technique,
the dataset is divided into clusters that form
a tree-like structure, also called a
dendrogram. Any desired number of clusters
can be obtained by cutting the tree at the
appropriate level. The most common example
of this method is the Agglomerative
Hierarchical algorithm.
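
The bottom-up merging behind the dendrogram can be sketched as follows. This is a simple illustration that assumes single linkage (the distance between two clusters is the distance between their closest members); stopping when num_clusters remain plays the role of cutting the tree at a chosen level. The names and the sample points are assumptions.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: the closest pair of members."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, num_clusters):
    """Minimal agglomerative sketch: returns (clusters, merges).

    merges records (left, right, distance) for each merge, which is the
    information a dendrogram would display.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge cluster j into cluster i (j > i, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


# Hypothetical sample data: two close pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, num_clusters=3)
print(clusters)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

Running the loop down to a single cluster and keeping the merges list would give the full tree.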
FUZZY CLUSTERING

Fuzzy clustering is a type of soft method in
which a data object may belong to more
than one group or cluster. Each data object
has a set of membership coefficients, which
express its degree of membership in each
cluster. The Fuzzy C-means algorithm is an
example of this type of clustering; it is
sometimes also known as the Fuzzy k-means
algorithm.
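
The membership coefficients can be illustrated with a short Fuzzy C-means sketch. It assumes the commonly used fuzzifier m = 2, caller-supplied starting centers, and a fixed iteration count; all names and the sample data are hypothetical.

```python
import math


def memberships(points, centers, m=2.0):
    """Membership of every point in every cluster; each row sums to 1."""
    result = []
    for p in points:
        # A tiny floor avoids division by zero when a point sits on a center.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        inverse = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inverse)
        result.append([v / total for v in inverse])
    return result


def update_centers(points, u, m=2.0):
    """Each center becomes the mean of all points, weighted by membership**m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(
            tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dims))
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Minimal Fuzzy C-means sketch: returns (centers, membership matrix)."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


# Hypothetical sample data: two groups plus one point halfway between them.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (3.0, 3.0)]
centers, u = fuzzy_c_means(points, centers=[(0.0, 0.0), (6.0, 6.0)])
for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])
```

Points inside a group get a membership close to 1 for that group, while the halfway point is shared between both clusters, which is exactly what distinguishes soft from hard clustering.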
CLUSTERING ALGORITHMS

Clustering algorithms can be grouped according to the models explained above. Many clustering algorithms
have been published, but only a few are commonly used. The choice of algorithm depends on the kind of data
being used. For example, some algorithms need the number of clusters in the dataset to be guessed in
advance, whereas others work by finding the minimum distance between the observations in the dataset.
TYPES OF CLUSTERING ALGORITHMS

The popular clustering algorithms that are widely used in machine learning are discussed here:
• K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It classifies the
dataset by dividing the samples into different clusters of equal variance. The number of clusters must be specified
in this algorithm. It is fast and needs few computations: each iteration has linear complexity, O(n), in the
number of samples.
• Mean-shift algorithm: The mean-shift algorithm tries to find the dense areas in a smooth density of data points.
It is an example of a centroid-based model that works by updating the candidates for centroids to be the center
of the points within a given region.
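
The mean-shift update described in the list above can be sketched as follows. This illustration assumes a flat kernel (every point inside the bandwidth counts equally) and merges candidates that end up within half a bandwidth of each other; the names, the merge rule, and the sample data are assumptions.

```python
import math


def shift_point(point, points, bandwidth):
    """Move a candidate to the mean of the data points inside its window."""
    neighbors = [q for q in points if math.dist(point, q) <= bandwidth]
    if not neighbors:
        return point  # nothing in the window, so the candidate stays put
    return tuple(sum(axis) / len(neighbors) for axis in zip(*neighbors))


def mean_shift(points, bandwidth, iterations=50, tol=1e-6):
    """Minimal mean-shift sketch: returns (centers, labels)."""
    modes = []
    for p in points:
        candidate = p
        for _ in range(iterations):
            shifted = shift_point(candidate, points, bandwidth)
            if math.dist(shifted, candidate) < tol:
                break  # the candidate has stopped moving
            candidate = shifted
        modes.append(candidate)
    # Candidates that converged to nearly the same place form one cluster.
    centers, labels = [], []
    for mode in modes:
        for index, center in enumerate(centers):
            if math.dist(mode, center) < bandwidth / 2:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = mean_shift(points, bandwidth=2.0)
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

The number of clusters is not specified here; it falls out of the bandwidth, which is the main parameter to tune.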
TYPES OF CLUSTERING ALGORITHMS

• DBSCAN algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise. It is an
example of a density-based model similar to mean-shift, but with some remarkable advantages. In this
algorithm, the areas of high density are separated by areas of low density. Because of this, clusters of
any arbitrary shape can be found.

• Expectation-Maximization Clustering using GMM: This algorithm can be used as an alternative to the
k-means algorithm, or in cases where k-means may fail. In GMM, it is assumed that the data points are
generated from a mixture of Gaussian distributions.
TYPES OF CLUSTERING ALGORITHMS

• Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm performs bottom-up
hierarchical clustering. Each data point is treated as a single cluster at the outset, and clusters are then
successively merged. The cluster hierarchy can be represented as a tree structure.

• Affinity Propagation: It differs from other clustering algorithms in that the number of clusters does not
need to be specified. Pairs of data points exchange messages until convergence. Its time complexity is
O(N²T), where N is the number of data points and T is the number of iterations, which is the main
drawback of this algorithm.
APPLICATIONS OF CLUSTERING

Below are some commonly known applications of the clustering
technique in machine learning:
• In Identification of Cancer Cells: Clustering algorithms
are widely used for the identification of cancerous cells. They
divide the cancerous and non-cancerous data into
different groups.
• In Search Engines: Search engines also work on the
clustering technique. The search results appear based on the
objects closest to the search query. This is done by grouping
similar data objects in one group that is far from the other,
dissimilar objects. The accuracy of a query result depends on
the quality of the clustering algorithm used.
APPLICATIONS OF CLUSTERING

• Customer Segmentation: It is used in market
research to segment customers based on
their choices and preferences.
• In Biology: It is used in biology to
classify different species of plants and animals
using image recognition techniques.
• In Land Use: The clustering technique is used to
identify areas of similar land use in a GIS
database. This can be very useful for finding the
purpose for which a particular piece of land is
most suitable.
ADVANTAGES OF CLUSTERING

In the server sense of the term (several machines running
one service), clustering offers:
• Failover support: ensures that a business intelligence
system remains available for use if an application or
hardware failure occurs.
• Load balancing
• Project distribution and project failover
• Work fencing
DISADVANTAGES OF CLUSTERING

For server clusters, the disadvantages of clustering are complexity and the inability to recover
from database corruption. In a clustered environment, the cluster uses the same IP address for Directory
Server and Directory Proxy Server, regardless of which cluster node is actually running the
service.
REFERENCES

• https://www.javatpoint.com/clustering-in-machine-learning
• https://www.geeksforgeeks.org/clustering-in-machine-learning
• https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering
• https://developers.google.com/machine-learning/clustering/overview
