0% found this document useful (0 votes)
12 views

07Clustering

Uploaded by

hussienayman366
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views

07Clustering

Uploaded by

hussienayman366
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 34

Cluster Analysis

What is Cluster Analysis?


 Finding groups of objects such that the objects in
a group will be similar (or related) to one another
and different from (or unrelated to) the objects in
other groups Inter-cluster
Intra-cluster distances are
distances are maximized
minimized
What is Cluster Analysis?
 Cluster: a collection of data objects
 Similar to one another within the same cluster
 Dissimilar to the objects in other clusters
 Cluster analysis
 Grouping a set of data objects into clusters
 Clustering is unsupervised classification: no
predefined classes
 Clustering is used:
 As a stand-alone tool to get insight into data distribution
 Visualization of clusters may unveil important information
 As a preprocessing step for other algorithms
 Efficient indexing or compression often relies on clustering
Some Applications of Clustering
What Is Good Clustering?
 A good clustering method will produce high
quality clusters with
 high intra-class similarity
 low inter-class similarity
 The quality of a clustering result depends on both
the similarity measure used by the method and its
implementation.
 The quality of a clustering method is also
measured by its ability to discover some or all of
the hidden patterns.
Requirements of Clustering in Data
Mining
 Scalability
 Ability to deal with different types of attributes
 Discovery of clusters with arbitrary shape
 Minimal requirements for domain knowledge to
determine input parameters
 Able to deal with noise and outliers
 Insensitive to order of input records
 High dimensionality
 Incorporation of user-specified constraints
 Interpretability and usability
Clustering Algorithms
Four of the most used clustering Algorithms
Distances Measure
K-Means Clustering
Algorithm
K-means Clustering

 Partitional clustering approach


 Each cluster is associated with a centroid (center
point)
 Each point is assigned to the cluster with the
closest centroid
 Number of clusters, K, must be specified
 The basic algorithm is very simple
K-means Clustering – Details
 Initial centroids are often chosen randomly.
 Clusters produced vary from one run to another.
 The centroid is (typically) the mean of the points in
the cluster.
 ‘Closeness’ is measured by Euclidean distance,
cosine similarity, correlation, etc.
 Most of the convergence happens in the first few
iterations.
 Often the stopping condition is changed to ‘Until relatively
few points change clusters’
 Complexity is O( n * K * I * d )
 n = number of points, K = number of clusters,
I = number of iterations, d = number of attributes
Two different K-means Clusterings
3

2.5

2
Original Points
1.5

y
1

0.5

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2


x

3 3

2.5 2.5

2 2

1.5 1.5
y

y
1 1

0.5 0.5

0 0

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2


x x

Optimal Clustering Sub-optimal Clustering


How the K-Mean Clustering algorithm works?
Example of K-Means Clustering
𝐺 𝑖 = 𝐺 𝑖+1 That the objects does not move group anymore
Hierarchical Clustering
Algorithms
How They Work
Step 3 can be done in different ways:
Example
How to calculate distance between newly grouped
clustered (D,F) and other clusters?
Assignment
A hierarchical clustering of distances in kilometers between some Italian
cities. The method used is single-linkage.
Input distance matrix (L = 0 for all the clusters):

The process is summarized by


the following hierarchical tree

You might also like