Mod 4 - Clustering

This document discusses different clustering techniques, including K-means clustering, hierarchical clustering, and density-based clustering (DBSCAN). It defines clustering as a machine learning technique that groups an unlabelled dataset into clusters of similar data points, describes how K-means assigns data points to clusters based on their distance from the cluster centres and recomputes those centres in each iteration, and gives an overview of DBSCAN, a density-based method that, unlike K-means, can identify clusters of arbitrary shape. It closes with methods for combining multiple learners: voting, bagging, and boosting.


Module 4

Clustering: Introduction - Similarity measures - Clustering criteria - Distance functions - K-means clustering,
Hierarchical clustering, Density-based clustering (DBSCAN)
Combining Multiple Learners: Voting, Bagging, Boosting

By:
Sherry O. Panicker
MCA, M. Phil
Introduction

• Clustering, or cluster analysis, is a machine learning technique that groups an unlabelled
dataset. It can be defined as "a way of grouping the data points into different clusters
consisting of similar data points; the objects with possible similarities remain in a group
that has few or no similarities with any other group."

• Clustering is the process of grouping a set of data objects into multiple groups or clusters
so that objects within a cluster have high similarity, but are very dissimilar to objects in other
clusters.
• Dissimilarities and similarities are assessed based on the attribute values and often involve
distance measures.

Introduction
Application areas of clustering techniques:
• As a data mining tool in biology, security, business intelligence, and Web search
• Market segmentation
• Customer segmentation
• Image segmentation
• Statistical data analysis
• Social network analysis
• Anomaly detection, etc.
• Search engines
• Identification of cancer cells
• Land use
• Biology

Introduction

• Amazon uses clustering in its recommendation system to provide recommendations based on a user's past product searches.
• Netflix also uses this technique to recommend movies and web series to its users based on their watch history.

Introduction

• It is an unsupervised learning method.

• After applying a clustering technique, each cluster or group is given a cluster ID.

• An ML system can use this ID to simplify the processing of large and complex datasets.

Clustering is similar to a classification algorithm, but the difference is the type of dataset we use:

In classification, we work with a labelled dataset (supervised learning).

In clustering, we work with an unlabelled dataset (unsupervised learning).

Introduction

• Categories of clustering techniques: broadly divided into hard clustering (each data point belongs to only one group) and soft clustering (a data point can also belong to more than one group). Within these, common method families include:
• Partitioning methods (e.g. k-means, k-medoid, CLARA, CLARANS)
• Hierarchical methods (e.g. BIRCH)
• Density-based methods (e.g. DBSCAN)
• Probability-based methods
• Grid-based methods
• Distribution model-based clustering
• Fuzzy clustering
Introduction..
Working of the clustering algorithm

Similarity Measures
• Similarity denotes the strength of the relationship between two data items; it represents how similar two data patterns are.
• Clustering is done based on a similarity measure, so that similar data objects are grouped together.
• The clusters are formed in such a way that any two data objects within a cluster have a small distance value, while any two data objects in different clusters have a large distance value.
• Clustering using distance functions, called distance-based clustering, is a very popular technique and has given good results.
• The similarity measure is based on distance functions that are used to group objects into clusters, such as:
• Euclidean distance
• Manhattan distance
• Minkowski distance
• Cosine similarity

Clustering Criteria
• A good clustering method will produce high-quality clusters where:
– the intra-class (within-cluster) similarity is high
– the inter-class (between-cluster) similarity is low
• The quality of a clustering result also depends on both the similarity measure used by
the method and its implementation.

Distance functions
(Figure: formulas for the common distance functions; standard definitions are given below.)
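As a reference (these are the standard definitions, not reproduced from the slide's figure), for two points written generically as x = (x1, ..., xn) and y = (y1, ..., yn):

$$d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

$$d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

$$d_{\text{Minkowski}}(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^p \Big)^{1/p}$$

$$\text{cosine similarity}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

The Minkowski distance reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2.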

K-Means clustering
• Can be used to cluster many types of data.

• Aim: grouping of data. For example, YouTube groups people on the basis of age, location, etc.

K-Means clustering…
I. 1D data: 2, 4, 10, 8, 16, 30, 12, 6
1. Suppose k = 2, i.e. we are forming 2 clusters.
2. Randomly select the cluster centres, say 6 and 12.
3. Calculate similarity using the distance function: decide which cluster each remaining data item goes to. Take each data item and calculate its absolute difference from each cluster centre. The item belongs to the cluster with the minimum distance, i.e. maximum similarity.
Iteration 1 (centres 6 and 12):
|2-6| = 4 (less)    |2-12| = 10 (more)  → centre 6
|4-6| = 2 (less)    |4-12| = 8 (more)   → centre 6
|8-6| = 2 (less)    |8-12| = 4 (more)   → centre 6
|10-6| = 4 (more)   |10-12| = 2 (less)  → centre 12
|16-6| = 10 (more)  |16-12| = 4 (less)  → centre 12
|30-6| = 24 (more)  |30-12| = 18 (less) → centre 12
End of iteration 1: cluster 1 = {2, 4, 6, 8} and cluster 2 = {10, 12, 16, 30} (the centres 6 and 12 are themselves data items and stay in their own clusters).
K-Means clustering…
Iteration 2: Find the cluster centres again by taking the average of the data items in each cluster: (2+4+6+8)/4 = 5 and (10+12+16+30)/4 = 17. The new values may or may not be part of the actual data.

Here 5 and 17 are not part of the data, but that is fine.

K-Means clustering…

Repeat the assignment step. Note: if the distances to both centres are the same for a data item, it can belong to either C1 or C2.
|2-5| = 3 (less)    |2-17| = 15 (more)  → centre 5
|4-5| = 1 (less)    |4-17| = 13 (more)  → centre 5
|6-5| = 1 (less)    |6-17| = 11 (more)  → centre 5
|8-5| = 3 (less)    |8-17| = 9 (more)   → centre 5
|10-5| = 5 (less)   |10-17| = 7 (more)  → centre 5
|12-5| = 7 (more)   |12-17| = 5 (less)  → centre 17
|16-5| = 11 (more)  |16-17| = 1 (less)  → centre 17
|30-5| = 25 (more)  |30-17| = 13 (less) → centre 17
End of iteration 2: cluster 1 = {2, 4, 6, 8, 10} and cluster 2 = {12, 16, 30}.

K-Means clustering…

Iteration 3: Find the cluster centres again by taking the average of the data items in each cluster; the old centres 5 and 17 are not included, since they are not part of the data: (2+4+6+8+10)/5 = 6 and (12+16+30)/3 ≈ 19.3.

The new centre 6 happens to be a data value; 19.3 is not, but that is fine. Repeat the distance calculation and re-form the clusters as before.

End of iteration 3.

K-Means clustering…

(Figure: the final clusters, labelled 0 and 1.)

How do we decide the number of iterations, i.e. when to stop iterating? There are 2 options:

1. If all the current cluster centres are the same as the previous ones, stop iterating.
2. If a number of iterations n has been set, stop after n rounds.

• Clusters are named starting from label 0 onwards. This is clustering of 1D data; a short code sketch of the same procedure is given below.
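A minimal sketch of this 1D procedure in plain Python (written for these notes as an illustration, not code taken from the slides), using the same data and initial centres and applying both stopping rules:

# 1D k-means sketch: data and initial centres from the worked example above
data = [2, 4, 10, 8, 16, 30, 12, 6]
centres = [6, 12]

for iteration in range(10):                      # option 2: a fixed maximum number of rounds
    clusters = [[], []]
    for item in data:                            # assignment step: nearest centre wins
        nearest = 0 if abs(item - centres[0]) <= abs(item - centres[1]) else 1
        clusters[nearest].append(item)
    new_centres = [sum(c) / len(c) for c in clusters]   # update step: average of each cluster
    if new_centres == centres:                   # option 1: centres did not change, so stop
        break
    centres = new_centres

print("centres:", centres)                       # the final centres (label 0 and label 1)
print("clusters:", clusters)                     # the members of each cluster

Ties go to the first cluster here, which matches the note above that a tied data item may be placed in either cluster.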

K-Means clustering…
• Drawback: k-means uses circular or elliptical shapes, i.e. rigid boundaries, so points near or on a border may be handled poorly. It is better to allow arbitrary shapes; DBSCAN makes this possible.

II. Clustering of 2D data
Suppose D2 and D4 are chosen as the cluster centres; then calculate the distances
D1-D2   D1-D4
D3-D2   D3-D4
D5-D2   D5-D4
In 2D, the distance between data points is calculated using the Euclidean distance. Example:
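For instance, with two illustrative points (values assumed here, not taken from the slide's figure) D1 = (1, 2) and D2 = (4, 6):

$$d(D_1, D_2) = \sqrt{(4-1)^2 + (6-2)^2} = \sqrt{9 + 16} = 5$$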

K-Means clustering…
Hands on for K-Means

• Number of items = 150 (the Iris dataset)
• Dimensionality = 4
• Since k-means is unsupervised, we take only the inputs x and do not consider the output y; the data is treated as unlabelled.
• Group the data based on similarity:
  within a cluster: high similarity
  between clusters: very low similarity (high dissimilarity)
• Follow the same steps as in the 1D example.
K-Means clustering…- Hands on for K-Means
• Project the data onto a 2D plane, so we take only 2 features. Suppose we take sepal length (SL) and sepal width (SW).

(Figure: the final clustering result.)

K-Means clustering…- Hands on for K-Means
The code has 4 steps (c=labels means the points are coloured according to the cluster variable 'labels'):

#1 Import the KMeans estimator from the sklearn package
from sklearn.cluster import KMeans
ML=KMeans(n_clusters=3,max_iter=5)

#2 Load the data
import pandas as pd
file=pd.read_csv("/content/irisexcel.csv")
x=file[["sepal_length","sepal_width"]]
ML.fit(x)

#3 Find the cluster centers and labels
centers=ML.cluster_centers_
labels=ML.labels_

#4 Draw the graph
import matplotlib.pyplot as plt
xaxis=file["sepal_length"]
yaxis=file["sepal_width"]
plt.scatter(xaxis,yaxis,c=labels,cmap="rainbow")
plt.show()

(Output: a scatter plot of the data points coloured by the three clusters.)
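As an optional extra (not part of the original slides), the cluster centres found in step 3 can be drawn on the same scatter plot by adding one line before plt.show():

plt.scatter(centers[:,0],centers[:,1],c="black",marker="x")   # mark the 3 cluster centres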
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• Deals with clusters of arbitrary shape.

DBSCAN parameters: a neighbourhood radius R (often called eps) and a minimum number of points M (often called MinPts).

Example: M = 3 means that, for the given radius, a minimum of 3 neighbours is required.

DBSCAN: Density-Based Spatial Clustering of Applications with Noise

DBSCAN – points
Each point is either:
• Core point
• Border point
• Outlier point
Core point example
• Let R = 2 and M = 3.
• Consider a point x and a circle of radius 2 around it. If there are at least 3 other points within this radius, then x is a core point. (There can be more than 3, too.)

(Figure: a point x, a circle of radius 2 around it, and three other points inside the circle.)

DBSCAN…
Density reachability comes in three forms (the numbers 1-3 refer to the figures on the original slides).
(1) p is a core point, and q is a point in p's neighbourhood.
(2) If q is also a core point, we can form a cluster from q. Let r be a point in q's neighbourhood. Then, through q, r is density-reachable from p, so p, q and r can be considered a single group.

DBSCAN…
(3) p is a core point, and q and r are two points within p's boundary.
• q is also a core point, and s is a point within the boundary of q.
• Later we find that r is also a core point, and t is a point within its boundary.
• Then we can say that t and s are density-connected.

Thus the points form a chain and can be grouped into a single category.
DBSCAN…
DBSCAN steps
1. Suppose we have a set of points.
2. Select a random point and check whether it is a core point (are there the minimum number of points in its neighbourhood?) and whether it can form a cluster.
3. If it forms a cluster, mark all the points inside it as visited with a tick mark.

(Figure: all 4 points inside the blue boundary get a tick mark.)

4. Go to the next random point and repeat the previous steps.

(Figure: now a total of 8 points have been visited.)

DBSCAN…

5. Suppose there is another random point that cannot form a cluster.
6. We select that too and mark it with an X (it is an outlier/noise point).
7. Continue the process until all points have been visited and marked.
8. Combine the groups based on the three density-reachability conditions.
9. Advantages: arbitrary cluster shapes are possible, and the number of clusters does not have to be given in advance.

Implementing in Python (a sketch follows):
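The slides show their Python implementation as an image; a possible scikit-learn sketch (the dataset and parameter values below are illustrative assumptions, not taken from the slides) is:

# DBSCAN sketch with scikit-learn on synthetic, arbitrarily shaped data
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)   # two crescent-shaped groups

model = DBSCAN(eps=0.2, min_samples=3)   # eps is the radius R, min_samples is M
labels = model.fit_predict(X)            # cluster labels; -1 marks outlier (noise) points

print(set(labels))                       # the cluster labels found, e.g. {0, 1}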

Hierarchical Clustering

• An unsupervised machine learning algorithm.
• Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters.
• This means the algorithm:
• considers each data point as a single cluster at the beginning, and then
• starts combining the closest pairs of clusters,
• doing this until all the clusters are merged into a single cluster that contains all the data points.
• Here, the hierarchy of clusters is developed in the form of a tree, and this tree-shaped structure is known as a dendrogram.

Hierarchical Clustering..

The hierarchical clustering technique has two approaches:

1. Agglomerative, or AGglomerative NESting (AGNES): a bottom-up approach.
• This algorithm starts with N groups, each initially containing one training instance,
• and merges similar groups to form larger groups, until there is a single one.

2. Divisive, or DIvisive ANAlysis (DIANA): the divisive algorithm is the reverse of the agglomerative algorithm, as it is a top-down approach.
• It starts with a single large group and divides it into smaller groups, until each group contains a single instance.

Hierarchical Clustering - Agglomerative algorithm (AGNES)
• At each iteration of an agglomerative algorithm, we choose the two closest groups to merge.
• There are various ways to calculate the distance between two clusters, and these ways decide the rule for clustering.
• These measures are called linkage methods (a code sketch using them follows this list).
• In single-link clustering, the distance is defined as the smallest distance over all possible pairs of elements of the two groups.
• In complete-link clustering, the distance between two groups is taken as the largest distance over all possible pairs.
• In the average-link method, the average of the distances between all pairs is used.
• In centroid distance, the distance between the centroids (means) of the two groups is measured.
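A minimal sketch (not from the slides) of agglomerative clustering with these linkage methods, using SciPy; the points are made up purely for illustration:

# Agglomerative clustering and its dendrogram with SciPy
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

points = np.array([[1, 2], [2, 1], [4, 5], [5, 4], [9, 9], [10, 10]])

# method can be "single", "complete", "average" or "centroid" (the linkage methods above)
Z = linkage(points, method="single", metric="euclidean")

dendrogram(Z)        # the tree-shaped structure (dendrogram) described in this module
plt.show()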

Hierarchical Clustering..
(Figures: single-link clustering, complete-link clustering, and centroid distance.)

Ways to calculate the distance measures include:
• Euclidean distance measure
• Squared Euclidean distance measure
• Manhattan distance measure
• Cosine distance measure

Hierarchical Clustering..

• Consider that we have a few points on a 2D plane with x-y coordinates.
• Here, each data point is a cluster of its own.
• Next we find the points with the least distance between them, and start grouping them together to form clusters of multiple points.

Hierarchical Clustering..
• Proceeding this way we get three groups: P1-P2, P3-P4, and P5-P6.
• Similarly, we have three dendrograms, as shown in the figures.

(Figure: the first cluster and its representation as a dendrogram.)
Hierarchical Clustering..
In the next step, we bring two groups together. Now the two groups P3-P4 and P5-P6 are under one dendrogram, because they are closer to each other than to the P1-P2 group. This is as shown:

(Figure: the box shows P3-P4 and P5-P6 in one cluster and hence under one dendrogram.)

Hierarchical Clustering..
Finally, we bring everything together by joining P1-P2 with the cluster of P3-P4 and P5-P6.

How do we represent a cluster of more than one point?

Here, we make use of centroids: the centroid of a cluster is the average of its points.
Let's first take the points (1, 2) and (2, 1) and group them together, because they are close. For these points, we compute the point in the middle and mark it as (1.5, 1.5); it is the centroid of those two points.
DIANA Hierarchical Clustering

When do we stop combining clusters?

There are several approaches to choose from:

• Approach 1: Pick the number of clusters, k. We decide the number of clusters required (say six or seven) at the beginning, and we finish when we reach k clusters.
• Approach 2: Stop when the next merge would create a cluster with low cohesion. We keep clustering until the next merge would create a bad, low-cohesion cluster, meaning it no longer makes sense to bring the groups together.
• Approach 3.1: Diameter of a cluster. The diameter is the maximum distance between any pair of points in the cluster. We finish when the diameter of a new cluster would exceed a threshold.
• Approach 3.2: Radius of a cluster (the maximum distance from a point in the cluster to its centroid), used in the same way.

Combining Multiple Learners - Voting, Bagging, Boosting

We discussed many different learning algorithms in the previous chapters. Though these are
generally successful, no one single algorithm is always the most accurate. Now, we are going to
discuss models composed of multiple learners that complement each other so that by combining
them, we attain higher accuracy.
What are the different ways to combine classifiers in machine learning? They can be divided into two big groups:
1. Ensemble methods: Bagging (Bootstrap Aggregating) and Boosting are the most widely used.
2. Hybrid methods

An ensemble is a machine learning model that combines the predictions from two or more
models. The models that contribute to the ensemble, referred to as ensemble members, may be
the same type or different types and may or may not be trained on the same training data.

Combining Multiple Learners - Model Combination Schemes

Ways to combine multiple base-learners:

1. Multiexpert combination methods have base-learners that work in parallel. These methods can in turn be divided into two:
• Global approach, or learner fusion: given an input, all base-learners generate an output and all of these outputs are used. Examples: voting and stacking.
• Local approach, or learner selection: a gating model looks at the input and chooses one (or very few) of the learners as responsible for generating the output.

2. Multistage combination methods use a serial approach, where the next base-learner is trained with, or tested on, only the instances where the previous base-learners are not accurate enough. Equivalently, the base-learners are sorted in increasing complexity, so that a complex base-learner is not used unless the preceding simpler base-learners are not confident. Example: cascading.

Combining Multiple Learners - Voting, Bagging, Boosting…

What is ensemble learning?

It is a machine learning paradigm in which multiple models (often called weak learners or base models) are trained to solve the same problem and combined to get better performance. The main hypothesis is that if we combine the weak learners in the right way, we can obtain more accurate and/or more robust models.
• It uses multiple learning algorithms together for the same task.
• It gives better predictions than the individual learning models.

• Suppose we have a classifier that is 80 percent accurate.
• When we choose a second classifier, we do not care about its overall accuracy; we care only about how accurate it is on the 20 percent of instances that the first classifier misclassifies.

Combining Multiple Learners - Voting, Bagging, Boosting…

Types of Ensembling techniques include:
• Bagging or Bootstrap Aggregation
• Boosting
• Stacking Classifier
• Voting Classifier

17.4 Voting
• This is the simplest way to combine multiple classifiers.
• It corresponds to taking a linear combination of the learners.
• This approach is also known as ensembles or linear opinion pools.
• In simple voting, all learners are given equal weight and we take the average (written out below).
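In the usual notation (following the standard formulation of voting; d_ji is learner j's output for class C_i and w_j is the weight of learner j), the combined output for class C_i is the linear combination

$$y_i = \sum_{j=1}^{L} w_j \, d_{ji}, \qquad w_j \ge 0, \quad \sum_{j=1}^{L} w_j = 1,$$

and simple voting is the special case $w_j = 1/L$ for all $j$.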

Table 17.1: classifier combination rules. Table 17.2: example of the combination rules on three learners and three classes.

• The sum rule is the most widely used in practice.
• The median rule is more robust to outliers.
• The minimum and maximum rules are pessimistic and optimistic, respectively.
• With the product rule, each learner has veto power.

17.4 Voting…

Voting Classifier:
• A voting classifier is a machine learning estimator that trains various base models or estimators and predicts by aggregating the findings of each base estimator.
• It can be a homogeneous or a heterogeneous type of ensemble learning; that is, the base classifiers can be of the same or of different types.
• It also works as an extension of bagging (e.g. random forest).
The voting criterion can be of two types (a scikit-learn sketch follows):
• Hard voting: the vote is calculated on the predicted output class.
• Soft voting: the vote is calculated on the predicted probability of the output class.
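A possible scikit-learn sketch of a voting classifier (the base estimators, dataset and settings below are illustrative choices, not taken from the slides):

# Voting classifier sketch: three different base classifiers combined by voting
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier()),
                ("nb", GaussianNB())],
    voting="soft",     # "hard" votes on the predicted classes, "soft" on the predicted probabilities
)
clf.fit(X, y)
print(clf.predict(X[:5]))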

Simple voting is a special case where all voters have equal weight.
• This is called plurality voting, where the class having the maximum number of votes is the winner.
• When there are two classes, this is majority voting.
• Voting schemes can be seen as approximations under a Bayesian framework; this is Bayesian model combination.
17.4 Voting…
(Figure 1: voting classifier in "hard" mode. Figure 2: voting classifier in "soft" mode.)

17.6 Bagging (Bootstrap Aggregating)

• Bagging is a voting method in which the base-learners are made different by training them on slightly different training sets.
• Unstable algorithm: a learning algorithm is unstable if small changes in the training set cause a large difference in the generated learner.
• Bagging, short for bootstrap aggregating, uses the bootstrap to generate L training sets X_j, trains L base-learners d_j on them using an unstable learning procedure, and then, during testing, takes an average of their predictions.
• Bagging can be used both for classification and for regression. In the case of regression, to be more robust, one can take the median instead of the average when combining predictions.
• Algorithms such as decision trees and multilayer perceptrons are unstable; nearest neighbour is stable, but condensed nearest neighbour is unstable. (A scikit-learn sketch of bagging follows.)
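A minimal scikit-learn sketch of bagging an unstable learner, here a decision tree (the dataset and parameter values are illustrative; note that older scikit-learn versions name the first parameter base_estimator instead of estimator):

# Bagging sketch: L = 10 decision trees trained on bootstrap samples and combined by voting
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # the unstable base-learner d_j
    n_estimators=10,                      # L bootstrap samples / base-learners
    bootstrap=True,                       # row sampling with replacement
    random_state=0,
)
bag.fit(X, y)
print(bag.score(X, y))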
17.6 Bagging (Bootstrap Aggregating)…
• Bagging is composed of two parts: bootstrapping and aggregation.
Bootstrapping is a sampling method in which a sample is drawn from a set with replacement; this is called row sampling with replacement. The learning algorithm is then run on the samples selected (a small illustration follows).
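A small NumPy illustration of row sampling with replacement (the "rows" here are just the indices 0-9, an assumption made for the example):

# Bootstrap (row sampling with replacement): the sample has the same size as the data,
# so some rows appear more than once and others not at all
import numpy as np

rng = np.random.default_rng(0)
rows = np.arange(10)
bootstrap_sample = rng.choice(rows, size=len(rows), replace=True)
print(bootstrap_sample)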
Aggregation: the model predictions are aggregated to combine them into the final prediction. The aggregation can be based on the counts of the predicted outcomes or on the predicted probabilities.
Advantages of bagging
• It allows many weak learners to combine and outdo a single strong learner.
• It helps to reduce variance,
• hence reducing the overfitting of models in the procedure.
Disadvantage of bagging
• It introduces a loss of interpretability of the model.

17.6 Bagging (Bootstrap Aggregating)…

(Figure: bootstrapping of the training data followed by aggregation of the model predictions.)

17.7 Boosting
• Here, we try to generate complementary base-learners by training the next learner on the mistakes of the previous learners.
• The original boosting algorithm combines three weak learners to generate a strong learner:
1. Given a large training set, randomly divide it into three parts X1, X2 and X3.
2. Use X1 to train d1.
3. Then take X2 and feed it to d1.
4. All instances misclassified by d1, together with as many instances from X2 on which d1 is correct, form the training set of d2.
5. Then take X3 and feed it to d1 and d2.
6. The instances on which d1 and d2 disagree form the training set of d3.
7. Testing: feed an instance to d1 and d2; if they agree, that is the response, otherwise the response of d3 is taken as the output.
8. This overall system has a reduced error rate.
17.7 Boosting…

Four boosting algorithms used in machine learning:
1. Gradient Boosting Machine (GBM)
2. Extreme Gradient Boosting Machine (XGBM)
3. LightGBM
4. CatBoost

Disadvantage of the original boosting algorithm:
• Though successful, it requires a very large training sample.
• The sample has to be divided into three, and the second and third classifiers are only trained on the subsets on which the previous ones err.
• So without a large training set, d2 and d3 will not have training sets of reasonable size.

17.7 Boosting…

Though it is quite successful, the disadvantage of the original boosting method is that it requires a very large training sample. AdaBoost is a variant of the boosting technique.

• AdaBoost (Adaptive Boosting) was the first boosting algorithm in the history of machine learning to combine various weak classifiers into a single strong classifier.
• It primarily focuses on solving classification tasks, such as binary classification.
• AdaBoost uses the same training set over and over, so the set need not be large, but the classifiers should be simple so that they do not overfit. AdaBoost can also combine an arbitrary number of base-learners, not just three.
• AdaBoost outline: we initially assign equal probability to every data instance and give a portion of the data to the first learner; using the trained first learner we classify the whole dataset; we then update the probability of each instance so that misclassified instances have a higher chance of being chosen and fed to the next learner; this process is repeated over all learners in a serial manner, and the combined system is expected to classify the instances correctly. A scikit-learn sketch follows.
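A possible scikit-learn sketch of AdaBoost using simple decision stumps as the weak classifiers (the dataset and parameter values are illustrative; older scikit-learn versions name the first parameter base_estimator instead of estimator):

# AdaBoost sketch: 50 decision stumps trained serially on reweighted data
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)      # a binary classification task

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # simple learners so that they do not overfit
    n_estimators=50,                                # the number of weak learners combined
    random_state=0,
)
ada.fit(X, y)
print(ada.score(X, y))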

End of Module 4

Thank you

