Clustering
Clustering is the task of dividing the data points into a number of groups such that points in the same group are more similar to each other than to points in other groups. In other words, it is the grouping of objects on the basis of their similarity and dissimilarity.
Why Clustering?
Clustering is important because it reveals the intrinsic grouping in unlabeled data. There is no single criterion for a good clustering; it depends on the user and on which criteria satisfy their need. For instance, we could be interested in finding representatives of homogeneous groups (data reduction), finding "natural clusters" and describing their unknown properties ("natural" data types), finding useful and suitable groupings ("useful" data classes), or finding unusual data objects (outlier detection). Every clustering algorithm must make some assumptions about what constitutes the similarity of points, and different assumptions yield different, equally valid clusterings.
Clustering Methods:
Density-Based Methods: These methods treat clusters as dense regions of the space, separated from regions of lower density. They offer good accuracy and the ability to merge two clusters. Examples: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure), etc.
Hierarchical Methods: The clusters formed by these methods have a tree-like structure based on a hierarchy: new clusters are formed from previously formed ones. They fall into two categories:
Agglomerative (bottom-up approach)
Divisive (top-down approach)
Partitioning Methods: These methods partition the objects into k clusters, with each partition forming one cluster. They optimize an objective criterion, a similarity function in which distance is the major parameter. Examples: K-means, CLARANS (Clustering Large Applications based upon Randomized Search), etc.
Grid-based Methods: In these methods, the data space is divided into a finite number of cells that form a grid-like structure. All clustering operations performed on these grids are fast and independent of the number of data objects. A short sketch contrasting these families follows.
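As a rough illustration of these families, the sketch below runs a partitioning method (K-means), a density-based method (DBSCAN), and a bottom-up hierarchical method on the same toy data. It assumes Python with NumPy and scikit-learn installed; the data and the parameter values (eps, min_samples) are made up for demonstration.

import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

# Toy 2-D data: two loose blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(6, 1, size=(50, 2))])

# Partitioning method: K-means with k = 2.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Density-based method: DBSCAN; eps and min_samples control what
# counts as a dense region, and the label -1 marks noise points.
db = DBSCAN(eps=0.9, min_samples=5).fit_predict(X)

# Hierarchical method: agglomerative (bottom-up) clustering.
ag = AgglomerativeClustering(n_clusters=2).fit_predict(X)

print(set(km), set(db), set(ag))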
Clustering Algorithms
K-means clustering algorithm
It is one of the simplest unsupervised learning algorithms for solving the clustering problem. The K-means algorithm partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster.
Step1: Choose the number of classes k and select k initial class centers (at random or by averaging the data).
Step2: Compute the distance between each point Xi and each class center.
Step3: For each point, find the class center at minimum distance.
Step4: Each point x is assigned to its nearest class based on the minimum distance.
Step5: For each class, recompute its center by finding the mean of the class according to the following equation:
Z_j(n+1) = (1/N_j) * Σ_{i=1..N_j} X_i

where Z_j(n+1) is the new mean of class j, N_j is the number of points in class j, and X_i are the points belonging to class j.
Step6: Compare the old class centers with the new centers. If there is no change in the centers, the algorithm stops; otherwise, repeat from Step 2.
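A minimal from-scratch sketch of these six steps, assuming Python with NumPy (the function name kmeans and its arguments are my own choices, not from the notes). Ties in distance are broken toward the first center here, whereas the worked example below breaks its one tie toward the second; on this data both choices converge to the same final centers.

import numpy as np

def kmeans(points, k, centers, max_iter=100):
    # K-means for 1-D data, following Steps 1-6 above.
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        # Steps 2-4: distance from every point to every center,
        # then assign each point to its nearest class.
        dists = np.abs(points[:, None] - centers[None, :])
        labels = dists.argmin(axis=1)
        # Step 5: recompute each center as the mean of its class
        # (assumes no class becomes empty, which holds here).
        new_centers = np.array([points[labels == j].mean()
                                for j in range(k)])
        # Step 6: stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels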
Example:
Suppose we want to group the visitors to a website using just their age (a one-dimensional space):
n = 19
15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65
Initial centers (chosen at random or by averaging):
k = 2, c1 = 16, c2 = 22
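Before tracing the iterations by hand, the first pass can be checked with a few lines of Python/NumPy (a sketch using the same tie-breaking rule as the tables below, i.e. ties go to cluster 2):

import numpy as np

ages = np.array([15, 15, 16, 19, 19, 20, 20, 21, 22, 28,
                 35, 40, 41, 42, 43, 44, 60, 61, 65])
c1, c2 = 16.0, 22.0
d1, d2 = np.abs(ages - c1), np.abs(ages - c2)
nearest = np.where(d1 < d2, 1, 2)           # ties go to cluster 2
print(round(ages[nearest == 1].mean(), 2))  # 15.33
print(round(ages[nearest == 2].mean(), 2))  # 36.25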
Iteration 1 (c1 = 16, c2 = 22):

xi    Distance to c1    Distance to c2    Nearest cluster
15          1                 7                 1
15          1                 7                 1
16          0                 6                 1
19          3                 3                 2
19          3                 3                 2
20          4                 2                 2
20          4                 2                 2
21          5                 1                 2
22          6                 0                 2
28         12                 6                 2
35         19                13                 2
40         24                18                 2
41         25                19                 2
42         26                20                 2
43         27                21                 2
44         28                22                 2
60         44                38                 2
61         45                39                 2
65         49                43                 2

The points at age 19 are equidistant from both centers; the tie is broken toward cluster 2. New centers (Z): c1 = mean(15, 15, 16) = 15.33; c2 = mean of the remaining 16 points = 36.25.
Iteration 2 (c1 = 15.33, c2 = 36.25):

xi    Distance to c1    Distance to c2    Nearest cluster
15         0.33             21.25               1
15         0.33             21.25               1
16         0.67             20.25               1
19         3.67             17.25               1
19         3.67             17.25               1
20         4.67             16.25               1
20         4.67             16.25               1
21         5.67             15.25               1
22         6.67             14.25               1
28        12.67              8.25               2
35        19.67              1.25               2
40        24.67              3.75               2
41        25.67              4.75               2
42        26.67              5.75               2
43        27.67              6.75               2
44        28.67              7.75               2
60        44.67             23.75               2
61        45.67             24.75               2
65        49.67             28.75               2

New centers (Z): c1 = mean(15, ..., 22) = 18.56; c2 = mean(28, ..., 65) = 45.9.

Iteration 3 (c1 = 18.56, c2 = 45.9):

xi    Distance to c1    Distance to c2    Nearest cluster
15         3.56             30.9                1
15         3.56             30.9                1
16         2.56             29.9                1
19         0.44             26.9                1
19         0.44             26.9                1
20         1.44             25.9                1
20         1.44             25.9                1
21         2.44             24.9                1
22         3.44             23.9                1
28         9.44             17.9                1
35        16.44             10.9                2
40        21.44              5.9                2
41        22.44              4.9                2
42        23.44              3.9                2
43        24.44              2.9                2
44        25.44              1.9                2
60        41.44             14.1                2
61        42.44             15.1                2
65        46.44             19.1                2

New centers (Z): c1 = mean(15, ..., 28) = 19.5; c2 = mean(35, ..., 65) = 47.89.
Iteration 4 (c1 = 19.5, c2 = 47.89):

xi    Distance to c1    Distance to c2    Nearest cluster
15         4.50             32.89               1
15         4.50             32.89               1
16         3.50             31.89               1
19         0.50             28.89               1
19         0.50             28.89               1
20         0.50             27.89               1
20         0.50             27.89               1
21         1.50             26.89               1
22         2.50             25.89               1
28         8.50             19.89               1
35        15.50             12.89               2
40        20.50              7.89               2
41        21.50              6.89               2
42        22.50              5.89               2
43        23.50              4.89               2
44        24.50              3.89               2
60        40.50             12.11               2
61        41.50             13.11               2
65        45.50             17.11               2

New centers (Z): c1 = 19.50, c2 = 47.89. The centers are unchanged, so by Step 6 the algorithm stops. The final clusters are {15, 15, 16, 19, 19, 20, 20, 21, 22, 28} and {35, 40, 41, 42, 43, 44, 60, 61, 65}.
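Running the kmeans sketch from earlier on the same ages array (reusing the hypothetical helper and array defined above) reproduces these converged centers:

centers, labels = kmeans(ages, k=2, centers=[16.0, 22.0])
print(np.round(centers, 2))   # [19.5  47.89]
print(np.bincount(labels))    # [10  9] points per cluster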
Homework:
Suppose you have the following data