K-Means Clustering
Clearly Explained
What is Unsupervised Learning?
Before we jump into what K-Means Clustering is, let's see what the
difference between Unsupervised and Supervised Learning is:

Supervised Learning

Supervised Learning usually consists of an
algorithm that learns from labeled data. The
model learns to recognize characteristics
based on the training data and the label (e.g.
predicting property prices based on size,
location etc.). It is then able to predict the label
of unlabelled data based on those
characteristics.

Unsupervised Learning
Unsupervised Learning is a type of machine
learning where an algorithm learns patterns,
structures, or relationships in data without any
labeled outputs. The goal is to uncover hidden
insights or organize data into meaningful
groups based on similarities. K-Means
Clustering is a type of Unsupervised Learning
Algorithm.

What is K-Means Clustering?
As we mentioned in the previous page, K-Means Clustering is an
Unsupervised Learning method. It works very simply:

Divides data into K distinct clusters
Groups similar data points based on their features
Ensures data points in a cluster are closer to each other than to
points in other clusters

[Figure: unclustered data → a first attempt at clustering ("could be better?") → an improved clustering ("looks much better")]

Since it’s an unsupervised method, we don’t have any labels to train a
model on a training set. The K-means algorithm iterates by creating
clusters (we give the number of clusters) and then measuring if it’s
good. So how does it create clusters and how does it measure the
“goodness” of a cluster? Let’s see how it works!

How does it work?
Let’s start with a one-dimensional data set : [1, 3, 4, 8, 10, 12]

1 3 4 8 10 12

Step 1: Decide on the number of clusters (K)

We’ll create 2 clusters here. So we take K = 2.

Step 2: Select the initial Centroids

Randomly select two initial “Centroids” (equal to K) from the data points.
Here centroid simply means the mean of all the points in a cluster. Since
we don’t have a cluster yet, we will select 2 data points. The initial
Centroids we picked here are: [3, 10]

Step 3: Calculate the Distance of Each Centroid from the Data Points

Data Point | Distance to Centroid 3 | Distance to Centroid 10 | Assigned Cluster
1          | 2                      | 9                       | 3
3          | 0                      | 7                       | 3
4          | 1                      | 6                       | 3
8          | 5                      | 2                       | 10
10         | 7                      | 0                       | 10
12         | 9                      | 2                       | 10

How does it work?
We calculated the distances in the table above. Since it’s one-
dimensional data, the distance is just the absolute difference between
the centroid and the data point. We assigned each point to the cluster
with the closest centroid.
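To make Step 3 concrete, here is a minimal Python sketch of the assignment step for this 1-D example (the variable names are just for illustration):

```python
# Step 3 sketch: in 1-D, the distance is just the absolute difference,
# and each point joins the cluster of its nearest centroid.
points = [1, 3, 4, 8, 10, 12]
centroids = [3, 10]  # our two randomly picked initial centroids

clusters = {c: [] for c in centroids}
for p in points:
    nearest = min(centroids, key=lambda c: abs(p - c))
    clusters[nearest].append(p)

print(clusters)  # {3: [1, 3, 4], 10: [8, 10, 12]}
```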

1 3 4 8 10 12

Step 4: Update the Centroids

Now that we have our two clusters, we calculate new Centroids. It’s
simply the mean of all the points in the cluster.

New Centroid for Cluster 1 = (1 + 3 + 4) / 3 = 2.67

New Centroid for Cluster 2 = (8 + 10 + 12) / 3 = 10

These two points are now our new Centroids [2.67, 10]
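As a quick Python sketch of this update step (continuing the example above):

```python
# Step 4 sketch: each new centroid is simply the mean of its cluster's points.
clusters = {3: [1, 3, 4], 10: [8, 10, 12]}
new_centroids = [sum(pts) / len(pts) for pts in clusters.values()]
print(new_centroids)  # [2.67, 10.0] (approximately)
```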
We then repeat steps 3 and 4 again. We will calculate the distances to the new Centroids and reassign the points.

1 2.67 3 4 8 10 12

How does it work?
Step 5: Repeat Steps 3 - 4

Data Point | Distance to Centroid 2.67 | Distance to Centroid 10 | Assigned Cluster
1          | 1.67                      | 9                       | 2.67
3          | 0.33                      | 7                       | 2.67
4          | 1.33                      | 6                       | 2.67
8          | 5.33                      | 2                       | 10
10         | 7.33                      | 0                       | 10
12         | 9.33                      | 2                       | 10

Now from what we can see above:

Even though the Centroids changed (at least one did), the clusters
remained the same: [1, 3, 4] & [8, 10, 12].
If we were to repeat steps 3 and 4 again, we would get the same
result, since we’d get the same Centroids again. The algorithm
thus stops here, and we take this as the best cluster found.
The algorithm stops when the Centroids don’t change or when it
reaches the max number of iterations.
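Putting steps 3 to 5 together, here is a minimal, self-contained Python sketch of the 1-D loop described above (it ignores edge cases such as empty clusters):

```python
def kmeans_1d(points, centroids, max_iters=100):
    """Minimal 1-D K-means: iterate until the centroids stop changing."""
    clusters = []
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its cluster
        new_centroids = [sum(c) / len(c) for c in clusters]
        # Step 5: stop when nothing changed (or when max_iters is hit)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

print(kmeans_1d([1, 3, 4, 8, 10, 12], [3, 10]))
# ([2.67, 10.0], [[1, 3, 4], [8, 10, 12]]) approximately
```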

Now let’s see how it would perform on a 2 Dimensional Dataset!

How does it work?

Step 1: Let’s create 3 clusters this time. So K = 3

Step 2: With a 2-D data set, we will again start by picking the
Centroids. These will be 3 randomly selected points.

The Red, Green and Yellow points are our initial centroids.

How does it work?
Step 3: Similar to when we had 1-D data, we have to calculate
the distance of each point from the centroids and then put them
in the cluster with the closest centroid. Now, to calculate the
distance in 2-D and more dimensions, we use something called
the Euclidean Distance: d = √((x2 − x1)² + (y2 − y1)²)

Step 4: After calculating the distances, we would then assign
each point to the cluster belonging to the Centroid that is at the
least distance from it.

These are our new clusters. We now update the centroids again,
by calculating the mean of all the points in a particular cluster.

New Centroid for Cluster 1 = ( (x1 + x2 + x3 + x4) / 4 , (y1 + y2 + y3 + y4) / 4 )

We do the same for Cluster 2 and Cluster 3.
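A short NumPy sketch of one full 2-D iteration; the sample points here are made up purely for illustration:

```python
import numpy as np

points = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
                   [8.0, 8.0], [9.0, 9.5], [9.0, 11.0],
                   [4.0, 9.0], [3.5, 10.0], [5.0, 8.5]])
k = 3
rng = np.random.default_rng(0)
# Step 2: pick k distinct data points as the initial centroids
centroids = points[rng.choice(len(points), size=k, replace=False)]

# Step 3: Euclidean distance of every point to every centroid, shape (n, k)
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)  # index of the nearest centroid per point

# Step 4: each new centroid is the mean of the points assigned to it
new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
```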

How does it work?
Step 5: Repeat Steps 3 & 4

[Figure: the three updated Centroids C1, C2 and C3 plotted on the 2-D data]

With our new Centroids C1, C2 and C3, we again calculate the
distance of each point from each Centroid and bucket them in
the cluster belonging to the nearest Centroid.

[Figure: the points re-bucketed into clusters around C1, C2 and C3]

This looks like a much better set of clusters! The algorithm will
continue to create new centroids and reassign clusters until either
the centroids don’t change significantly or the max number of
iterations is reached.

How do we decide K?
A question people often have is: how do we decide on the value
of K? In our examples it was somewhat clear by looking at the
data how many clusters would be ideal. But with higher
dimensional data, we will not be able to visualise it. So how do we
decide how many clusters to create?

Since with each update of the centroids, the distance between
the points and their centroid decreases, we are essentially
decreasing the variation within a cluster. Variation is simply
the sum of the squared distances between the centroid and
the data points.

As we increase K, this total variation will decrease (with more
clusters, every point sits closer to some centroid). However, at
some point, the decrease in variation will start to get smaller and
smaller.
[Plot: decrease in variation on the y-axis against K = 1 to 5 on the x-axis;
the curve drops steeply and then flattens at the “Elbow Point”.]

We usually choose a K that’s at the “Elbow Point”, above which
there are diminishing returns in creating more clusters.
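In practice this is easy to automate. Here is a sketch using scikit-learn, assuming it is installed; X stands in for your data, and inertia_ is sklearn's name for the total within-cluster variation:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 5)  # placeholder data for illustration

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to nearest centroid

# Plot inertias against k; the K where the curve flattens is the Elbow Point.
```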

Let’s Summarise
Let’s summarise what K-means Clustering is:

1. What It Is
A simple and popular unsupervised learning algorithm.
Groups data into K clusters based on similarity.

2. How It Works
Step 1: Choose K
Step 2: Initialize K centroids randomly.
Step 3: Assign data points to the nearest centroid.
Step 4: Update centroids as the mean of assigned points.
Step 5: Repeat until centroids stabilize (see the code sketch after this summary).
3. Goal
Minimize within-cluster variance (make clusters compact).

4. Strengths
Easy to understand and implement.
Works well for structured, spherical data.

5. Challenges
Requires predefining K (number of clusters).
Sensitive to initialization and outliers.
Assumes clusters are spherical and similar in size.

6. Applications
Market Segmentation: Group customers by behavior.
Anomaly Detection: Identify unusual patterns.
Recommender Systems: Cluster similar users or items.
Image Compression: Reduce pixel redundancy.
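As referenced above, here is a minimal scikit-learn sketch of the five steps in practice; X stands in for any (n_samples, n_features) array:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)  # placeholder data for illustration

# The library handles initialization, assignment, updates, and stopping.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)           # the cluster assigned to each point
print(km.cluster_centers_)  # the final centroids
```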

Enjoyed reading?

Follow for everything Data and AI!

linkedin.com/in/vikrantkumar95
