
Clustering
Clustering is the task of dividing a population of data points into groups such that points in the same group are more similar to one another than to points in other groups. In other words, it is a grouping of objects on the basis of the similarity and dissimilarity between them.
Why Clustering?
Clustering is important because it determines the intrinsic grouping among unlabeled data. There is no single criterion for a good clustering; it depends on the user and on whatever criteria satisfy their need. For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding "natural clusters" and describing their unknown properties ("natural" data types), in finding useful and suitable groupings ("useful" data classes), or in finding unusual data objects (outlier detection). A clustering algorithm must make some assumptions about what constitutes the similarity of points, and different assumptions yield different, equally valid, clusterings.
Clustering Methods:
Density-Based Methods: These methods treat clusters as dense regions of the space, separated by regions of lower density. They have good accuracy and the ability to merge two clusters. Examples: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure), etc.

Hierarchical Methods: The clusters formed by these methods build a tree-type structure based on a hierarchy; new clusters are formed from previously formed ones. They are divided into two categories:
 Agglomerative (bottom-up approach)
 Divisive (top-down approach)
Partitioning Methods: These methods partition the objects into k groups, each partition forming one cluster. They optimize an objective criterion (a similarity function) in which distance is a major parameter. Examples: K-means, CLARANS (Clustering Large Applications based upon RANdomized Search), etc.
Grid-Based Methods: In these methods, the data space is divided into a finite number of cells that form a grid-like structure. All clustering operations done on these grids are fast and independent of the number of data objects. Examples: STING (STatistical INformation Grid), WaveCluster, CLIQUE (CLustering In QUEst), etc. A short code sketch of these method families follows.
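
The following is a minimal sketch of three of these four families in Python with scikit-learn (the notes name no language or library, so both are assumptions, and the toy data X is invented for illustration). Grid-based methods such as STING and CLIQUE have no scikit-learn implementation, so they are omitted here.

import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

# Toy 2-D data: two loose groups of three points each.
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 9], [8, 9]], dtype=float)

# Partitioning: K-means assigns each point to the nearest of k means.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))

# Density-based: DBSCAN grows clusters from dense regions (label -1 = noise).
print(DBSCAN(eps=2.0, min_samples=2).fit_predict(X))

# Hierarchical (agglomerative, bottom-up): repeatedly merge closest clusters.
print(AgglomerativeClustering(n_clusters=2).fit_predict(X))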

Applications of Clustering in different fields


 Marketing: It can be used to characterize and discover customer segments for marketing purposes.
 Biology: It can be used for classifying different species of plants and animals.
 Libraries: It is used to cluster books on the basis of topic and information.
 Insurance: It is used to group customers by their policies and to identify frauds.
 City Planning: It is used to group houses and to study their values based on their geographical locations and other factors.
 Earthquake studies: By studying earthquake-affected areas, we can determine the dangerous zones.

Clustering Algorithms
K-means clustering algorithm
K-means is the simplest unsupervised learning algorithm that solves the clustering problem. It partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.


The K-means classification algorithm


 Step 1: In the beginning, determine the number of classes (clusters) k.
 Step 2: Choose the centers of these classes; we can choose these centers either by selecting certain points of the image at random or based on some considerations.
 Step 3: Calculate the Euclidean distance between the points of the image and the class centers according to the following equation:

$D(X_i, Z_j) = \|X_i - Z_j\|$

 Step 4: Each pixel x is assigned to its nearest class based on the minimum distance.
 Step 5: For each class, recompute its center by finding the mean of the class according to the following equation:
$Z_j(n+1) = \frac{1}{N_j} \sum_{i=1}^{N_j} X_i$
where $Z_j$ is the new mean, $N_j$ is the number of points in class $j$, and $X_i$ are the points belonging to class $j$.
 Step 6: Compare the old class centers with the new centers. If there is no change in the centers, the algorithm stops; otherwise, repeat from Step 3.
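
The steps above map directly onto a short program. Below is a minimal sketch in Python with NumPy for one-dimensional data (the notes prescribe no language or library, so both are assumptions, and the function name kmeans is illustrative only).

import numpy as np

def kmeans(points, centers, max_iter=100):
    # Steps 1-2 are the caller's job: pick k and the initial centers.
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        # Steps 3-4: Euclidean distance (absolute difference in 1-D),
        # then assign each point to its nearest class.
        labels = np.abs(points[:, None] - centers[None, :]).argmin(axis=1)
        # Step 5: recompute each center as the mean of its class
        # (keeping the old center if a class ends up empty).
        new_centers = np.array([points[labels == j].mean()
                                if np.any(labels == j) else centers[j]
                                for j in range(len(centers))])
        # Step 6: stop when the old and new centers coincide.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

Note that argmin breaks distance ties toward the first center, so an equidistant point such as age 19 in the example below lands in class 1 rather than class 2; the intermediate iterations then differ slightly from the hand computation, but the final centers agree.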

Example:
Suppose we want to group the visitors to a website using just their age (a one-dimensional space):
n = 19
ages: 15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65
Initial clusters (centroids chosen at random or as averages):
k = 2, c1 = 16, c2 = 22


Iteration 1 (c1 = 16, c2 = 22):

xi    d(xi, c1)   d(xi, c2)   Nearest cluster
15        1           7              1
15        1           7              1
16        0           6              1
19        3           3              2
19        3           3              2
20        4           2              2
20        4           2              2
21        5           1              2
22        6           0              2
28       12           6              2
35       19          13              2
40       24          18              2
41       25          19              2
42       26          20              2
43       27          21              2
44       28          22              2
60       44          38              2
61       45          39              2
65       49          43              2

New centroids (Z): c1 = (15 + 15 + 16)/3 = 15.33, c2 = 36.25.
(The two points at age 19 are equidistant from c1 and c2; here the tie is broken in favor of c2.)
Iteration 2 (c1 = 15.33, c2 = 36.25):

xi    d(xi, c1)   d(xi, c2)   Nearest cluster
15      0.33        21.25            1
15      0.33        21.25            1
16      0.67        20.25            1
19      3.67        17.25            1
19      3.67        17.25            1
20      4.67        16.25            1
20      4.67        16.25            1
21      5.67        15.25            1
22      6.67        14.25            1
28     12.67         8.25            2
35     19.67         1.25            2
40     24.67         3.75            2
41     25.67         4.75            2
42     26.67         5.75            2
43     27.67         6.75            2
44     28.67         7.75            2
60     44.67        23.75            2
61     45.67        24.75            2
65     49.67        28.75            2

New centroids (Z): c1 = 18.56, c2 = 45.9.
Iteration 3 (c1 = 18.56, c2 = 45.9):

xi    d(xi, c1)   d(xi, c2)   Nearest cluster
15      3.56        30.9             1
15      3.56        30.9             1
16      2.56        29.9             1
19      0.44        26.9             1
19      0.44        26.9             1
20      1.44        25.9             1
20      1.44        25.9             1
21      2.44        24.9             1
22      3.44        23.9             1
28      9.44        17.9             1
35     16.44        10.9             2
40     21.44         5.9             2
41     22.44         4.9             2
42     23.44         3.9             2
43     24.44         2.9             2
44     25.44         1.9             2
60     41.44        14.1             2
61     42.44        15.1             2
65     46.44        19.1             2

New centroids (Z): c1 = 19.50, c2 = 47.89.


Iteration 4 (c1 = 19.5, c2 = 47.89):

xi    d(xi, c1)   d(xi, c2)   Nearest cluster
15      4.50        32.89            1
15      4.50        32.89            1
16      3.50        31.89            1
19      0.50        28.89            1
19      0.50        28.89            1
20      0.50        27.89            1
20      0.50        27.89            1
21      1.50        26.89            1
22      2.50        25.89            1
28      8.50        19.89            1
35     15.50        12.89            2
40     20.50         7.89            2
41     21.50         6.89            2
42     22.50         5.89            2
43     23.50         4.89            2
44     24.50         3.89            2
60     40.50        12.11            2
61     41.50        13.11            2
65     45.50        17.11            2

New centroids (Z): c1 = 19.50, c2 = 47.89.

The new centers equal the old centers, so the algorithm stops. The final clusters are {15, 15, 16, 19, 19, 20, 20, 21, 22, 28} with center 19.5 and {35, 40, 41, 42, 43, 44, 60, 61, 65} with center 47.89.
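
As a cross-check, a few lines of scikit-learn (an assumed dependency, not mentioned in the notes) reproduce the final centers from the same initial values:

import numpy as np
from sklearn.cluster import KMeans

ages = np.array([15, 15, 16, 19, 19, 20, 20, 21, 22, 28,
                 35, 40, 41, 42, 43, 44, 60, 61, 65], dtype=float)
# Same initial centers as the worked example: c1 = 16, c2 = 22.
init = np.array([[16.0], [22.0]])
km = KMeans(n_clusters=2, init=init, n_init=1).fit(ages.reshape(-1, 1))
print(km.cluster_centers_.ravel())  # approximately [19.5, 47.89]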


H.w.
Suppose you have the following 6 × 6 image of grey levels:

100 120 140 150 100 120
110 130 170 160 110 130
100 120 180 100 100 120
120 150 140 180 120 150
100 120 140 150 100 120
110 130 170 160 110 130

and you have the following initial centers:

90 120 180 220

Apply K-means classification to the above image.
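
A minimal starting point for the homework, again in Python with NumPy (assumed, as before); it reuses the same assign-and-update loop on the flattened grey levels:

import numpy as np

image = np.array([
    [100, 120, 140, 150, 100, 120],
    [110, 130, 170, 160, 110, 130],
    [100, 120, 180, 100, 100, 120],
    [120, 150, 140, 180, 120, 150],
    [100, 120, 140, 150, 100, 120],
    [110, 130, 170, 160, 110, 130],
], dtype=float)
centers = np.array([90.0, 120.0, 180.0, 220.0])

pixels = image.ravel()  # Steps 3-6 operate on the grey levels directly.
for _ in range(100):
    labels = np.abs(pixels[:, None] - centers[None, :]).argmin(axis=1)
    # A class may end up empty (no grey level here is nearest to 220);
    # keep its old center in that case.
    new_centers = np.array([pixels[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers
print(labels.reshape(image.shape))
print(centers)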
