22AIP3101A Session 9
MACHINE LEARNING
22AIP3101A
Topic:
CLUSTERING models
Session - 9
To familiarize students with the concepts of unsupervised machine learning, how it differs from
supervised machine learning, and the use of unsupervised learning, particularly clustering.
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
• The data set has three natural groups of data points, i.e., 3 natural
clusters.
What is clustering for?
• A clustering algorithm
▪ Partitional clustering
▪ Hierarchical clustering
▪ …
• A disk version of k-means can be used to cluster large datasets that do not fit in main memory.
• It is not the best method; there are other scale-up algorithms, e.g., BIRCH.
A disk version of k-means (cont …)
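The idea of running k-means on data that does not fit in main memory can be sketched as follows. This is only an illustration, assuming the common formulation in which each iteration scans the data once and the centroids are recomputed from running sums and counts. The name `disk_kmeans` and the `read_points` callable (which would typically wrap a file reader) are hypothetical and not taken from the slides.

```python
def disk_kmeans(read_points, centroids, max_iters=50):
    """k-means that streams the points instead of holding them in memory.

    read_points: hypothetical callable returning a fresh iterator over the
                 data, e.g. a generator that reads a file line by line.
    centroids:   list of k initial centroids, each a tuple of numbers.
    """
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iters):
        # Only k running sums and k counts are kept in memory.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in read_points():  # one sequential scan of the data
            # Index of the nearest centroid (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # New centroid = running sum / count; an empty cluster keeps its old one.
        new_centroids = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
            for j in range(k)
        ]
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids
```

In this sketch the memory needed is proportional to k times the number of dimensions, not to the number of data points, which is what makes a disk-resident dataset workable.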
Strengths of k-means
• Strengths:
▪ Simple: easy to understand and to implement
▪ Efficient: Time complexity: O(tkn), where n is the number of data points, k
is the number of clusters, and t is the number of iterations.
▪ Since both k and t are usually small, k-means is considered a linear algorithm, i.e., linear in the number of data points n.
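To see where the O(tkn) cost comes from, here is a minimal k-means sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the initial centroids are drawn at random from the data, Euclidean distance is used, and the names `kmeans` and `squared_distance` are chosen here for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=50, seed=0):
    """Plain k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: seeds drawn from the data
    labels = [None] * len(points)
    for _ in range(max_iters):  # at most t iterations
        # Assignment step: n points times k centroids distance computations.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

Each pass through the outer loop does work proportional to k times n, and the loop runs t times, which gives the O(tkn) total.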
Hierarchical Clustering
• Divisive (top down) clustering: It starts with all data points in one
cluster, the root.
▪ It splits the root into a set of child clusters, and each child cluster is
recursively divided further.
▪ It stops when only singleton clusters remain, i.e., each cluster contains
only a single data point.
AGGLOMERATIVE CLUSTERING
DIVISIVE HIERARCHICAL CLUSTERING
DIVISIVE CLUSTERING
• This approach starts with all of the objects in the same cluster.
• In each successive iteration, a cluster is split into smaller clusters.
• This continues until each object is in its own cluster or the termination condition
holds.
• This method is rigid, i.e., once a merge or split is done, it can never
be undone.
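The top-down procedure described above can be sketched in plain Python. The slides do not say which cluster to split or how to split it, so this sketch makes two assumptions: it always splits the cluster with the largest diameter, and it divides that cluster around its two farthest points. The termination condition is modelled as an optional `max_clusters` limit. All function names are hypothetical, and the points are assumed to be distinct.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_pair(cluster):
    """Return (squared distance, p, q) for the two farthest points."""
    return max(
        ((squared_distance(p, q), p, q)
         for i, p in enumerate(cluster) for q in cluster[i + 1:]),
        key=lambda t: t[0],
    )


def divisive_clustering(points, max_clusters=None):
    """Top-down clustering: start with one cluster and keep splitting."""
    clusters = [list(points)]  # all objects start in the same cluster
    # Termination condition (assumed): stop at max_clusters if it is given.
    while max_clusters is None or len(clusters) < max_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:  # only singleton clusters remain
            break
        # Assumed rule: split the cluster with the largest diameter.
        parent = max(splittable, key=lambda c: farthest_pair(c)[0])
        _, a, b = farthest_pair(parent)
        # Each point joins the nearer of the two farthest points a and b.
        left = [p for p in parent if squared_distance(p, a) <= squared_distance(p, b)]
        right = [p for p in parent if squared_distance(p, a) > squared_distance(p, b)]
        # The split is final: the parent is replaced and never re-merged.
        clusters.remove(parent)
        clusters.extend([left, right])
    return clusters
```

Calling `divisive_clustering(points)` with no limit splits all the way down to singleton clusters, while passing `max_clusters=3` stops as soon as three clusters exist.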
APPLICATIONS
Summary
▪ The centroid representation alone works well if the clusters are
hyper-spherical in shape.
(a) TRUE
(b) FALSE
Density-Based
Self-Assessment Questions
(a) DBSCAN
(b) Hierarchy
(c) Grid
(d) Project based
4. In the __________ method, the clusters formed make up a tree-type structure based on the hierarchy.
(a) DBSCAN
(b) Hierarchy
(c) Grid
(d) Project based