Module 9: Clustering
Part A: Introduction and k-means
Sudeshna Sarkar
IIT Kharagpur
Unsupervised learning
• Unsupervised learning:
– Data have no target attribute; the goal is to describe hidden structure in unlabeled data.
– Explore the data to find intrinsic structures in them.
• Clustering: the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to objects in other clusters.
• Useful for
– Automatically organizing data.
– Understanding hidden structure in data.
– Preprocessing for further analysis.
Applications: News Clustering (Google)
Gene Expression Clustering
Other Applications
• Biology: grouping plants and animals according to their features.
• Marketing: customer segmentation based on a database of customer properties and past buying records.
• Web usage mining: clustering weblog data to discover groups of similar access patterns.
• Social networks: recognizing communities.
An illustration
• This data set has four natural clusters.
[Figure: scatter plot of the data points showing the four natural clusters.]
Aspects of clustering
• A clustering algorithm, such as
– Partitional clustering, e.g., k-means
– Hierarchical clustering, e.g., AHC (agglomerative hierarchical clustering)
– Mixture of Gaussians
• A distance or similarity function
– such as Euclidean, Minkowski, or cosine (see the sketch below)
• Clustering quality
– Inter-cluster distance maximized
– Intra-cluster distance minimized
The quality of a clustering result depends on the algorithm, the distance function, and the application.
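To make the distance options above concrete, here is a minimal sketch using SciPy's distance functions; the two feature vectors are made up purely for illustration.

import numpy as np
from scipy.spatial import distance

# Two illustrative feature vectors (hypothetical values).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Euclidean distance: straight-line distance (Minkowski with p = 2).
print("Euclidean:", distance.euclidean(x, y))

# Minkowski distance with p = 3 (p = 1 gives Manhattan, p = 2 gives Euclidean).
print("Minkowski (p=3):", distance.minkowski(x, y, p=3))

# Cosine distance = 1 - cosine similarity; it ignores magnitude, so these
# two vectors, which point in the same direction, are at cosine distance 0.
print("Cosine:", distance.cosine(x, y))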
Major Clustering Approaches
• Partitioning: Construct various partitions and then evaluate
them by some criterion
• Hierarchical: Create a hierarchical decomposition of the set of
objects using some criterion
• Model-based: Hypothesize a model for each cluster and find
best fit of models to data
• Density-based: Guided by connectivity and density functions
• Graph-Theoretic Clustering
Partitioning Algorithms
• Partitioning method: construct a partition of a database D of m objects into a set of k clusters.
• Given k, find a partition into k clusters that optimizes the chosen partitioning criterion:
– Global optimum: exhaustively enumerate all partitions.
– Heuristic method: k-means (MacQueen, 1967); a sketch follows below.
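As a rough sketch of the k-means heuristic (the function name, the toy data, and the choice of k = 2 are my own, made for illustration), the algorithm alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points:

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Basic k-means: returns (labels, centroids). A teaching sketch, not production code."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy usage: two well-separated blobs of 2-D points (synthetic data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)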
Hierarchical Clustering
[Figure: example taxonomy tree, with animal at the root splitting into vertebrate and invertebrate.]
• A similarity measure such as Pearson correlation can be used (see the sketch below).
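As a minimal sketch of agglomerative hierarchical clustering (AHC), assuming SciPy is available; the toy data, the average-linkage choice, and the cut into two clusters are assumptions made for the example:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data: two synthetic groups (made up for illustration).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

# Agglomerative hierarchical clustering: start with singleton clusters and
# repeatedly merge the two closest clusters, here using average linkage.
Z = linkage(X, method="average", metric="euclidean")

# Cut the resulting tree (dendrogram) so that at most 2 clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)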
Quality of Clustering
• Internal evaluation:
– assigns the best score to the algorithm that produces clusters with high similarity within a cluster and low similarity between clusters, e.g., the Davies-Bouldin index.
• External evaluation:
– evaluates the clustering against external data such as known class labels and benchmarks, e.g., the Rand index, Jaccard index, or F-measure.
Both styles are illustrated in the sketch below.
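A minimal sketch of both evaluation styles, assuming scikit-learn is available; the toy data set and its "true" labels are made up solely to demonstrate the metrics:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, rand_score

# Toy data with known ("true") labels, purely for demonstration.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
true_labels = np.array([0] * 30 + [1] * 30)

# Cluster with scikit-learn's k-means.
pred_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Internal evaluation: Davies-Bouldin index (lower is better);
# uses only the data and the clustering, no labels needed.
print("Davies-Bouldin:", davies_bouldin_score(X, pred_labels))

# External evaluation: Rand index (closer to 1 is better);
# compares the clustering against the known class labels.
print("Rand index:", rand_score(true_labels, pred_labels))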
Thank You