
CHAPTER 4: Clustering

Contents
• Introduction to Unsupervised Learning
• Introduction to Clustering
• Clustering Applications
• Partitioning-based Clustering
• Hierarchical-based Clustering
• Density-based Clustering
• Evaluation Methods for Clustering
Types of ML

Supervised
• Has a target column
• Data points have a known outcome

Unsupervised
• Does not have a target column
• Data points have an unknown outcome

• Unsupervised algorithms are relevant when we don't have an outcome or labeled variable we are trying to predict.
Unsupervised ML – Grouping News Articles by Example

• Unsupervised algorithms are helpful for finding structure within a data set, and for partitioning a data set into smaller pieces.
• They explore the data to find intrinsic structures in it.
Types of Unsupervised ML

Clustering
• The process of partitioning a set of data objects (or observations) into subsets
• Identifies unknown structure in data
• Examples: K-Means, K-Medoids, Agglomerative Clustering, DBSCAN

Dimensionality Reduction
• Uses structural characteristics to simplify data
• Examples: Principal Component Analysis, Non-Negative Matrix Factorization
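As a hedged illustration (not from the original slides), the minimal sketch below runs one method from each family on placeholder data using scikit-learn; the dataset X and all parameter values are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)  # placeholder data: 100 objects, 5 attributes

# Clustering: partition the 100 objects into 3 subsets
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: simplify the 5 attributes to 2 components
X_2d = PCA(n_components=2).fit_transform(X)

print(labels[:10])  # cluster index of the first 10 objects
print(X_2d.shape)   # (100, 2)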
Cluster Analysis

• Cluster: a collection of data objects.
• Clustering: the process of partitioning a set of data objects (or observations) into subsets.
• Each subset is a cluster, such that:
  • objects in a cluster are similar to one another
  • objects are dissimilar to objects in other clusters

Definition. Given a database D = {t1, t2, …, tn} of tuples and an integer value k, the clustering problem is to define a mapping f : D → {1, …, k} where each ti is assigned to one cluster Kj, 1 ≤ j ≤ k. A cluster Kj contains precisely those tuples mapped to it; that is, Kj = {ti | f(ti) = j, 1 ≤ i ≤ n, and ti ∈ D}.
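For instance, with D = {t1, …, t5} and k = 2, the mapping f(t1) = f(t3) = 1 and f(t2) = f(t4) = f(t5) = 2 yields the clusters K1 = {t1, t3} and K2 = {t2, t4, t5}.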
Clustering Applications

• Classification
  • Clustering of documents into topics
• Image pattern recognition
  • Handwritten character recognition
  • Image compression
• Web search
  • Grouping of search results
• Customer segmentation
  • Helps marketers discover distinct groups, so that they can characterize their customer groups based on purchasing patterns
• As a data mining tool, clustering can be used to gain insight into the distribution of data.
Application of Clustering

• Clustering can be used for outlier/anomaly detection, as the sketch below illustrates.
• An outlier is a value that lies far away from the values in any cluster.
• Example: credit card fraud detection.
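A minimal sketch of this idea (assuming scikit-learn; the placeholder data and the three-standard-deviation cutoff are illustrative assumptions, not from the slides):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # placeholder data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# Distance from each point to its own cluster centroid
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag points far from every cluster as candidate outliers
threshold = dist.mean() + 3 * dist.std()  # illustrative cutoff
outliers = X[dist > threshold]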
Requirements for Cluster Analysis

• Scalability: clustering on only a sample of a given large data set may lead to biased results; therefore, highly scalable clustering algorithms are needed.
• Ability to deal with different types of attributes: a clustering algorithm should be able to cluster numeric, nominal, ordinal, and other types of data.
• Ability to deal with noisy data: clustering algorithms can be sensitive to noise (outliers; missing, unknown, or erroneous values) and may then produce poor-quality clusters; therefore, we need clustering methods that are robust to noise.
• Interpretability: the clustering results should be interpretable, comprehensible, and usable.
• High dimensionality: the clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
Categories of Clustering Methods

(Figure: overview of the categories of clustering methods, including partitioning-based, hierarchical-based, and density-based methods.)
Clustering Methods: Partitioning

• Partitioning methods divide n objects into k partitions of the data.
• Each group (cluster) must contain at least one object.
• They adopt exclusive cluster separation: each object must belong to exactly one group.
• Partitioning methods are distance based.
• They follow an iterative relocation technique: by moving objects from one group to another, the partitioning is improved.
• In a good partition, objects in the same cluster are close, while objects in different clusters are far apart.
• Examples: k-means and k-medoids.
Clustering Methods: K-Means

• The k-means clustering algorithm was proposed by J. Hartigan and M. A. Wong [1979].
• Given a set of n distinct objects, the k-means clustering algorithm partitions the objects into k clusters such that intracluster similarity is high but intercluster similarity is low.
• In this algorithm, the user has to specify k, the number of clusters. The objects are assumed to be defined by numeric attributes, so any distance metric can be used to demarcate the clusters.
Clustering Methods: K-Means

The algorithm can be stated as follows.

Step 1: Randomly assign cluster centroids.
• First, select k objects at random from the set of n objects. These k objects are treated as the centroids, or centers of gravity, of the k clusters (see the sketch below).
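A minimal numpy sketch of this initialization step (all names and the toy data are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((16, 2))  # n = 16 objects with 2 numeric attributes
k = 3
# Step 1: pick k distinct objects at random to serve as initial centroids
centroids = X[rng.choice(len(X), size=k, replace=False)]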
Step 2: Assign each point to its closest centroid.
• Each of the remaining objects is assigned to the closest centroid. The collection of objects assigned to a centroid is called a cluster (a sketch follows).
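Continuing the sketch from Step 1, the assignment step in numpy:

# Step 2: assign every object to its closest centroid (Euclidean distance)
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
labels = dists.argmin(axis=1)  # index of the nearest centroid per object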
Step 3: Move each centroid to its cluster's mean.
• The centroid of each cluster is updated by calculating the mean value of each attribute over the objects in the cluster (see the sketch below).
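Continuing the same sketch, the update step in numpy:

# Step 3: move each centroid to the mean of its assigned objects
# (this simple version assumes no cluster ends up empty)
centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])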
• The assignment and update steps are repeated until some stopping criterion is reached (such as a maximum number of iterations, centroids remaining unchanged, or no reassignments).
• When the points no longer change clusters, the algorithm has converged.


k-Means Algorithm
Algorithm : k-Means clustering
 Input:
 D : a dataset containing n objects,
 k : the number of cluster
 Output: A set of k clusters

Steps:
1. arbitrarily choose k objects from D as the initial cluster
centroids.
2. For each of the objects in D do
 Compute distance between the current objects and k
cluster centroids
 Assign the current object to that cluster to which it is
closest.
3. Compute the “cluster centers” of each cluster. These become
the new cluster centroids.
4. Repeat step 2-3 until the convergence criterion is satisfied
21
5. Stop
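Putting the steps together, here is a minimal, self-contained numpy sketch of the algorithm; convergence is checked via unchanged centroids, and all names are illustrative rather than a reference implementation:

import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    # X is an (n, m) array of objects with m numeric attributes
    rng = np.random.default_rng(seed)
    # Step 1: arbitrarily choose k objects as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster happens to be empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop when the centroids no longer change (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

For example, labels, centroids = k_means(X, k=3) partitions the rows of X into three clusters.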
k-Means Algorithm

Notes:
1) Objects are defined in terms of a set of attributes A = {A1, A2, …, Am}, where each Ai is of continuous data type.
2) Distance computation: any distance can be used, such as L1, L2, L3, or cosine similarity (see the sketch below).
3) Minimum distance is the measure of closeness between an object and a centroid.
4) Mean calculation: the mean value of each attribute over all objects in the cluster.
5) Convergence criteria: any one of the following can serve as the termination condition of the algorithm:
   • The maximum permissible number of iterations is reached.
   • No change of centroid values in any cluster.
   • Zero (or no significant) movement of objects from one cluster to another.
   • Cluster quality reaches a certain level of acceptance.
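As a hedged illustration of note 2 (assuming scipy is available; the data and centroids are placeholders), cdist computes several of these distances directly:

import numpy as np
from scipy.spatial.distance import cdist

X = np.random.rand(16, 2)   # placeholder objects
C = X[:3]                   # placeholder centroids

d_l1 = cdist(X, C, metric='cityblock')        # L1 (Manhattan) distance
d_l2 = cdist(X, C, metric='euclidean')        # L2 (Euclidean) distance
d_l3 = cdist(X, C, metric='minkowski', p=3)   # L3 (Minkowski, p = 3)
d_cos = cdist(X, C, metric='cosine')          # cosine distance (1 - similarity)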
Example: Clustering by k-Means Partitioning

Consider the set of objects in Table 1 below. Let k = 3; that is, the user would like the objects to be partitioned into three clusters.

Table 1: 16 objects with two attributes, A1 and A2.

A1    A2
6.8   12.6
0.8   9.8
1.2   11.6
2.8   9.6
3.8   9.9
4.4   6.5
4.8   1.1
6.0   19.9
6.2   18.5
7.6   17.4
7.8   12.2
6.6   7.7
8.2   4.5
8.4   6.9
9.0   3.4
9.6   11.1

(Fig 1: scatter plot of the data in Table 1, with A1 on the x-axis and A2 on the y-axis.)
Step 1: We arbitrarily choose three objects as the three initial cluster centers.

(Plot of the data in Table 1 with the three chosen centers highlighted.)
Step 2: Each object is assigned to the cluster whose center is nearest.
• Let d1, d2, and d3 denote the Euclidean distance from an object to c1, c2, and c3, respectively.

Initial centroids (chosen randomly):

Centroid   A1    A2
c1         3.8   9.9
c2         7.8   12.2
c3         6.2   18.5
• The distance calculations are shown in the table below; these numbers can be reproduced with the short sketch that follows it.

A1    A2     d1     d2     d3
6.8   12.6   4.0    1.1    5.9
0.8   9.8    3.0    7.4    10.2
1.2   11.6   3.1    6.6    8.5
2.8   9.6    1.0    5.6    9.5
3.8   9.9    0.0    4.6    8.9
4.4   6.5    3.5    6.6    12.1
4.8   1.1    8.9    11.5   17.5
6.0   19.9   10.2   7.9    1.4
6.2   18.5   8.9    6.5    0.0
7.6   17.4   8.4    5.2    1.8
7.8   12.2   4.6    0.0    6.5
6.6   7.7    3.6    4.7    10.8
8.2   4.5    7.0    7.7    14.1
8.4   6.9    5.5    5.3    11.8
9.0   3.4    8.3    8.9    15.4
9.6   11.1   5.9    2.1    8.1
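A short numpy sketch that reproduces these numbers (data and centroids taken directly from the tables above):

import numpy as np

X = np.array([[6.8, 12.6], [0.8, 9.8], [1.2, 11.6], [2.8, 9.6],
              [3.8, 9.9], [4.4, 6.5], [4.8, 1.1], [6.0, 19.9],
              [6.2, 18.5], [7.6, 17.4], [7.8, 12.2], [6.6, 7.7],
              [8.2, 4.5], [8.4, 6.9], [9.0, 3.4], [9.6, 11.1]])
C = np.array([[3.8, 9.9], [7.8, 12.2], [6.2, 18.5]])  # c1, c2, c3

# Euclidean distance of every object to each centroid: columns d1, d2, d3
D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
print(D.round(1))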
• The assignment of each object to its nearest centroid is shown in the right-most column below; the resulting clustering is shown in the figure that follows.

A1    A2     d1     d2     d3     cluster
6.8   12.6   4.0    1.1    5.9    2
0.8   9.8    3.0    7.4    10.2   1
1.2   11.6   3.1    6.6    8.5    1
2.8   9.6    1.0    5.6    9.5    1
3.8   9.9    0.0    4.6    8.9    1
4.4   6.5    3.5    6.6    12.1   1
4.8   1.1    8.9    11.5   17.5   1
6.0   19.9   10.2   7.9    1.4    3
6.2   18.5   8.9    6.5    0.0    3
7.6   17.4   8.4    5.2    1.8    3
7.8   12.2   4.6    0.0    6.5    2
6.6   7.7    3.6    4.7    10.8   1
8.2   4.5    7.0    7.7    14.1   1
8.4   6.9    5.5    5.3    11.8   2
9.0   3.4    8.3    8.9    15.4   1
9.6   11.1   5.9    2.1    8.1    2

(Figure: the clustering obtained after the first assignment.)
Step 3: Update the cluster centers.
• The mean value of each cluster is recalculated based on the current objects in the cluster.
• Using the new cluster centers, the objects are redistributed to the clusters based on which cluster center is nearest.
New centroids after the first update (computed as in the sketch below):

Centroid   A1    A2
c1         4.6   7.1
c2         8.2   10.7
c3         6.6   18.6
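Continuing the numpy sketch above, the new centroids are just the per-cluster means:

labels = D.argmin(axis=1)  # 0-based cluster index of each object
new_C = np.array([X[labels == j].mean(axis=0) for j in range(3)])
print(new_C.round(1))  # approx. c1 = (4.6, 7.1), c2 = (8.2, 10.7), c3 = (6.6, 18.6)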
• We next reassign the 16 objects to the three clusters by determining which centroid is closest to each one. This gives the revised set of clusters shown in Fig 4.
• Note that point p moves from cluster C2 to cluster C1.

(Fig 4: clusters after the first iteration.)
• The centroids obtained after the second iteration are given in the table below. Note that centroid c3 remains unchanged, while c1 and c2 change slightly.
• With respect to the newly obtained cluster centers, the 16 points are reassigned again. These turn out to be the same clusters as before; hence, their centroids also remain unchanged.
• Taking this as the termination criterion, the k-means algorithm stops here. The final clustering in Fig 5 is the same as in Fig 4.

Cluster centers after the second iteration:

Centroid   A1    A2
c1         5.0   7.1
c2         8.1   12.0
c3         6.6   18.6

(Fig 5: clusters after the second iteration.)

• The process of iteratively reassigning objects to clusters to improve the partitioning is referred to as iterative relocation.
• Eventually, no reassignment of the objects in any cluster occurs, and the process terminates.
• The k-means method is not guaranteed to converge to the global optimum and often terminates at a local optimum. The results may depend on the initial random selection of cluster centers (a common mitigation is sketched below).
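A common mitigation (a hedged scikit-learn sketch, not part of the original slides) is to run k-means from several random initializations and keep the best run; scikit-learn's n_init parameter does exactly this:

from sklearn.cluster import KMeans

# Run 10 independent initializations and keep the run with the lowest
# within-cluster sum of squared errors (inertia), reducing the risk of
# a poor local optimum from one unlucky random start.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)  # SSE of the best of the 10 runs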
Comments on the k-Means Algorithm

Limitations:
• k-means has trouble clustering data that contains outliers. When the SSE is used as the objective function, outliers can unduly influence the clusters that are produced: in the presence of outliers, the cluster centroids are not as representative as they would otherwise be, and the SSE measure itself is also affected.
• k-means cannot handle non-globular clusters, or clusters of different sizes and densities (a sketch follows).
• k-means does not really escape the scalability issue (and is not so practical for very large databases).
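As a hedged illustration of the non-globular limitation (make_moons and the eps value are illustrative choices), compare k-means with DBSCAN, the density-based method listed earlier, on two crescent-shaped clusters:

from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.3).fit_predict(X)
# k-means tends to split each crescent in half, while DBSCAN
# typically recovers the two crescents as separate clusters.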
Thank You
