
K-Means Clustering

K-means clustering is an algorithm that partitions n observations into k clusters by minimizing the sum of squared distances between each observation and its nearest cluster centroid, where k is a positive integer specified in advance. It works by first randomly assigning observations to k initial centroids and then iteratively reassigning observations between clusters and recalculating centroids until the clusters stabilize. The goal is to classify observations such that observations within each cluster are as similar as possible while observations in different clusters are as dissimilar as possible.


K-MEANS CLUSTERING
⚫ The k-means algorithm is an algorithm to cluster
n objects based on attributes into k partitions,
where k < n.
⚫ It is similar to the expectation-maximization
algorithm for mixtures of Gaussians in that they
both attempt to find the centers of natural clusters
in the data.
⚫ It assumes that the object attributes form a vector
space.
⚫ An algorithm for partitioning (or clustering) N
data points into K disjoint subsets Sj so as to
minimize the sum-of-squares criterion

J = Σ (j=1..K) Σ (n ∈ Sj) || xn − uj ||²

where xn is a vector representing the nth
data point and uj is the geometric centroid of
the data points in Sj.
⚫ Simply speaking, k-means clustering is an
algorithm to classify or group objects
based on attributes/features into K groups.
⚫ K is a positive integer.
⚫ The grouping is done by minimizing the sum
of squares of distances between data and the
corresponding cluster centroid.
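The sum-of-squares criterion above can be made concrete with a short sketch. This is an illustration, not part of the original slides; the two clusters below are hypothetical points chosen so the arithmetic is easy to check by hand.

```python
def sum_of_squares(clusters):
    # Criterion J: for each cluster Sj, add the squared Euclidean
    # distance of every point to that cluster's geometric centroid uj.
    total = 0.0
    for points in clusters:
        # geometric centroid uj of the points in Sj
        centroid = [sum(dim) / len(points) for dim in zip(*points)]
        total += sum(sum((a - b) ** 2 for a, b in zip(p, centroid))
                     for p in points)
    return total

# Two hypothetical clusters for illustration.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],   # centroid (1, 0); each point contributes 1.0
    [(5.0, 5.0), (5.0, 7.0)],   # centroid (5, 6); each point contributes 1.0
]
print(sum_of_squares(clusters))  # 4.0
```

A good clustering is one whose assignment of points to the K subsets makes this total as small as possible.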
How does the K-Means Clustering
algorithm work?
⚫ Step 1: Begin with a decision on the value of k =
number of clusters.
⚫ Step 2: Choose any initial partition that classifies the
data into k clusters. You may assign the
training samples randomly, or systematically
as follows:
1. Take the first k training samples as single-
element clusters.
2. Assign each of the remaining (N-k) training
samples to the cluster with the nearest
centroid. After each assignment, recompute
the centroid of the gaining cluster.
⚫ Step 3: Take each sample in sequence and
compute its distance from the centroid of
each of the clusters. If a sample is not
currently in the cluster with the closest
centroid, switch this sample to that cluster
and update the centroid of the cluster
gaining the new sample and the cluster
losing the sample.
⚫ Step 4: Repeat step 3 until convergence is
achieved, that is, until a pass through the
training samples causes no new assignments.
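The steps above can be sketched in plain Python. This is a minimal sketch, not the slides' reference implementation: it uses the first k points as initial centroids (the systematic option from Step 2) and the common batch variant that recomputes centroids after a full pass, rather than immediately after each individual switch as Step 3 describes.

```python
def kmeans(points, k, max_iters=100):
    # Steps 1-2: k is given; take the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(max_iters):
        changed = False
        # Step 3: assign each point to the cluster with the nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            nearest = dists.index(min(dists))
            if assignments[i] != nearest:
                assignments[i] = nearest
                changed = True
        # Recompute each centroid as the mean of its cluster's points.
        for j in range(k):
            members = [p for i, p in enumerate(points) if assignments[i] == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
        # Step 4: stop when a full pass produces no new assignments.
        if not changed:
            break
    return assignments, centroids

# Example with two well-separated groups of points.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(pts, 2)
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[0.0, 0.5], [10.0, 10.5]]
```

Both variants converge to a local minimum of the sum-of-squares criterion; the result can depend on the choice of initial centroids.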
K-means: an example
(Figure sequence: initialize centers randomly, assign points to the
nearest center, readjust centers; the assign/readjust steps repeat
until a pass produces no changes.)
No changes: Done
Another example showing the
implementation of the k-means algorithm
(using K=2)
Step 1:
Initialization: We randomly choose the following two centroids
(k=2) for two clusters.
In this case the two centroids are: m1=(1.0,1.0) and
m2=(5.0,7.0).
Step 2:
⚫ Thus, we obtain two clusters
containing:
{1,2,3} and {4,5,6,7}.
⚫ Their new centroids are:
Step 3:
⚫ Now using these centroids
we compute the Euclidean
distance of each object, as
shown in the table.
⚫ Therefore, the new
clusters are:
{1,2} and {3,4,5,6,7}
⚫ Next centroids are:
m1=(1.25,1.5) and m2=(3.9,5.1)
⚫ Step 4:
The clusters obtained are:
{1,2} and {3,4,5,6,7}
⚫ Therefore, there is no
change in the clusters.
⚫ Thus, the algorithm comes
to a halt here, and the final
result consists of 2 clusters:
{1,2} and {3,4,5,6,7}.
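The worked example can be reproduced with a short NumPy sketch. The original data table is not shown in these slides, so the seven points below are hypothetical, chosen to be consistent with the centroids the example reports (m1=(1.25,1.5) and m2=(3.9,5.1)).

```python
import numpy as np

# Assumed coordinates for objects 1-7 (the slides' table is missing).
points = np.array([
    [1.0, 1.0], [1.5, 2.0],              # objects 1-2
    [3.0, 4.0], [5.0, 7.0],              # objects 3-4
    [3.5, 5.0], [4.5, 5.0], [3.5, 4.5],  # objects 5-7
])
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])  # initial m1, m2

while True:
    # Euclidean distance of every object to each centroid.
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)  # nearest-centroid assignment
    new = np.array([points[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new, centroids):  # Step 4: stop when centroids stabilize
        break
    centroids = new

print(labels)     # [0 0 1 1 1 1 1]  -> clusters {1,2} and {3,4,5,6,7}
print(centroids)  # [[1.25 1.5 ] [3.9  5.1 ]]
```

With these points the run passes through the same intermediate clustering {1,2,3} / {4,5,6,7} before settling on {1,2} and {3,4,5,6,7}, matching the example.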
PLOT
CONCLUSION
⚫ The k-means algorithm is useful for undirected
knowledge discovery and is relatively simple.
K-means has found widespread usage in many
fields, including unsupervised learning in
neural networks, pattern recognition,
classification analysis, artificial intelligence,
image processing, and machine vision.
