
Clustering Algorithms

The document discusses different clustering techniques including hierarchical and partitional clustering. It describes hierarchical agglomerative clustering and three linkage methods - single, complete, and average linkage. It also explains k-means clustering including how it works, the algorithm, updating cluster means, and stopping criteria. Clustering and biclustering of microarray data is also briefly mentioned.

Uploaded by

Ayesha Khan

Clustering

Dr. Zoya Khalid


[email protected]
Clustering techniques

• Hierarchical: Organizes elements into a tree; leaves represent the objects (genes, etc.) and the lengths of the paths between leaves represent the distances between objects. Similar objects lie within the same subtrees. It has two types:
• Agglomerative (Bottom-Up): Start with every element in its own cluster, and iteratively join clusters together
• Divisive (Top-Down): Start with one cluster and iteratively divide it into smaller clusters
Continued…
Measures of similarity and dissimilarity (distance)
• There are many different ways of calculating similarity and distance
• Knowing your data is important
• When working with distances, pay attention to three properties: positivity, symmetry, and the triangle inequality
• Examples
• Euclidean distance
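As a concrete sketch (the point values are illustrative, not from the slides), Euclidean distance and the three properties named above can be checked directly:

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

a, b, c = (0, 0), (3, 4), (6, 0)
assert euclidean(a, b) >= 0                                  # positivity
assert euclidean(a, b) == euclidean(b, a)                    # symmetry
assert euclidean(a, c) <= euclidean(a, b) + euclidean(b, c)  # triangle inequality
```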
Hierarchical Agglomerative Clustering
Most Hierarchical clustering algorithms are agglomerative
Three Techniques
Hierarchical clustering: Recomputing distances

d_min(C, C*) = min d(x, y)  for all elements x in C and y in C*

• Distance between two clusters is the smallest distance between any pair of their elements (single-linkage)

d_max(C, C*) = max d(x, y)  for all elements x in C and y in C*

• Distance between two clusters is the largest distance between any pair of their elements (complete-linkage)

d_avg(C, C*) = (1 / (|C| |C*|)) Σ d(x, y)  for all elements x in C and y in C*

• Distance between two clusters is the average distance between all pairs of their elements (average-linkage)
Single Linkage example
Single Linkage continued

(dendrogram figure over elements A, B, D, F)
Continued
Complete Linkage Method
Continued…
Continued…
Continued…
Which Distance Measure is Better?
• Each method has both advantages and disadvantages; the choice is application-dependent. Single-link and complete-link are the most common methods
• Single-link
• Can find irregular-shaped clusters
• Sensitive to outliers
• Complete-link, Average-link
• Robust to outliers
• Tend to break large clusters
• Prefer spherical clusters (smaller sized)
Partitional clustering
• It determines all clusters at once. Such methods include:
• K-means and derivatives
• Fuzzy c-means clustering
• QT clustering algorithm
K-Means Clustering
• Consider an example in which our vectors have 2 dimensions

(figure: profiles plotted as points in two dimensions, with "+" marking each cluster center)
K-Means clustering
• each iteration involves two steps
• assignment of profiles to clusters
• re-computation of the cluster centers (means)
(figure: left panel, assignment of profiles to the nearest cluster center; right panel, re-computation of the cluster centers)

Example
Distance between two clusters
Distance from Cluster 1 – Cluster 2
Tabulate them
Tabulate the new dataset

(figures on these slides: scatter plots of the example points in the x-y plane at each step)
Elbow Method
• Run the algorithm multiple times with an increasing number of clusters, then plot a clustering score (e.g. the within-cluster sum of squares) as a function of the number of clusters; the "elbow" in the curve suggests a good value of K.
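A minimal sketch of the elbow method on 2-D points, using the within-cluster sum of squares as the clustering score (the helper name `kmeans_inertia` and the toy data are illustrative, not from the slides):

```python
import random

def kmeans_inertia(points, k, iters=50, seed=0):
    """Run a basic k-means on 2-D points and return the within-cluster
    sum of squared distances (the clustering score to plot)."""
    def sq(p, m):
        return (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2
    means = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest mean
            clusters[min(range(k), key=lambda j: sq(p, means[j]))].append(p)
        means = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                 if c else means[j] for j, c in enumerate(clusters)]
    return sum(min(sq(p, m) for m in means) for p in points)

# two well-separated groups: the score drops sharply from k=1 to k=2,
# then flattens -- the "elbow" suggests k = 2
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
scores = {k: kmeans_inertia(data, k) for k in (1, 2, 3)}
```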
How the K-means algorithm works
K-Means clustering algorithm
• Input: K, the number of clusters, and a set X = {x1, …, xN} of data points, where the xi are p-dimensional vectors
• Initialize
• Select initial cluster means f1, …, fK
• Repeat until convergence
• Assign each xi to cluster C(i) such that

C(i) = argmin_{1 ≤ k ≤ K} ǁ xi − fk ǁ²

• Re-estimate the mean of each cluster based on its new members


K-means: updating the mean
• To compute the mean of the kth cluster:

fk = (1 / Nk) Σ_{i : C(i) = k} xi

where Nk is the number of points in cluster k, and the sum runs over all points assigned to cluster k


K-means stopping criteria
1. Assignments of objects to clusters do not change (convergence)

2. Maximum number of iterations is reached
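Putting the algorithm and both stopping criteria together, a minimal sketch (the slides do not specify an initialization, so this one simply takes the first K points as the starting means):

```python
def kmeans(X, K, max_iters=100):
    """Basic k-means: assign points, re-estimate means, stop when the
    assignments no longer change or max_iters is reached."""
    means = X[:K]  # assumed initialization: first K points as initial means
    assign = [None] * len(X)
    for _ in range(max_iters):
        # assignment step: C(i) = argmin_k || xi - fk ||^2
        new_assign = [
            min(range(K), key=lambda k: sum((xi - mk) ** 2
                                            for xi, mk in zip(x, means[k])))
            for x in X
        ]
        if new_assign == assign:  # stopping criterion 1: no change
            break
        assign = new_assign
        for k in range(K):        # re-estimate each cluster mean
            members = [X[i] for i in range(len(X)) if assign[i] == k]
            if members:
                means[k] = tuple(sum(col) / len(members)
                                 for col in zip(*members))
    return means, assign
```

For two well-separated pairs of points, e.g. `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)`, the loop converges in a few iterations with each pair in its own cluster.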


Microarray data
Clustering of microarray data
Clustering and Biclustering
• Biclustering - Identifies groups of genes with similar/coherent expression patterns under a specific subset of the conditions.
• Clustering - Identifies groups of genes/conditions that show similar activity patterns across the full set of conditions/genes under analysis.
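As a toy illustration of the distinction (the matrix values are made up):

```python
# genes x conditions expression matrix (toy values)
M = [
    [1.0, 1.1, 5.0, 5.1],  # gene g0
    [0.9, 1.0, 5.1, 5.0],  # gene g1
    [4.0, 0.2, 0.1, 4.2],  # gene g2
]

def row_dist(a, b):
    """Euclidean distance between two genes across ALL conditions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# clustering compares full rows: g0 and g1 behave alike over every condition
assert row_dist(M[0], M[1]) < row_dist(M[0], M[2])

# biclustering instead looks at a subset of genes under a subset of
# conditions, e.g. genes {g0, g1} restricted to conditions {2, 3}
bicluster = [[M[g][c] for c in (2, 3)] for g in (0, 1)]
```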
