Chap2 Part1 KMEANS

The document discusses the K-means clustering algorithm. It explains that K-means aims to partition observations into K clusters where each observation belongs to the cluster with the nearest mean. The algorithm works by assigning observations to initial cluster means, calculating new means as the centroid of each cluster, and repeating this process until cluster means stop changing.

Uploaded by

houcem.swissi

Chapter 2 : Generative Models

Kmeans
Machine Learning Team
UP GL-BD
Learning Outcomes

- Apply unsupervised models that exploit the geometric relationships between data, through the K-means algorithm.

- Evaluate the performance of generative models.

Machine Learning team Esprit 2022/2023 2

Plan
1. Introduction

2. Unsupervised learning categories

3. Clustering

4. K-means

5. Bibliography

Introduction
• Discriminative models draw boundaries in the data space, while generative models try to model
how the data is distributed throughout the space.

• A generative model focuses on explaining how the data was generated, while a discriminative
model focuses on predicting the labels of the data.

• An unsupervised learning method is a method in which we draw inferences from datasets consisting of input data without labeled responses.

• Generally, it is used as a process to find:

o Meaningful structure,
o Explanatory underlying processes,
o Generative features,
o and groupings inherent in a set of examples.

Unsupervised learning categories

Different tasks are associated with unsupervised learning:

• Clustering
• Association rules
• Dimensionality reduction

Clustering
Definition
• Clustering is the task of dividing a population or a set of data points into a number of groups (clusters).

• Objects are grouped on the basis of the similarity and dissimilarity between them:
• Data points in the same group are more similar to each other than to data points in other groups
• Data points in different groups are dissimilar.

• No predefined classes => unlabeled data

=> The quality of a clustering depends on the similarity measure
=> A good method will produce clusters whose elements have:
- strong intra-class similarity.
- low inter-class similarity.

Similarity Measure
• Similarity between objects depends on:
- The type of data
- The type of similarity

Data type: similarity measures and remarks

• Numerical data
o Euclidean distance
- The data needs to be normalized before using this distance measure.
- Works well on low-dimensional data.
- Overweights outliers.
o Manhattan distance
- Does not overweight outliers.
- The calculation times are particularly long.
o Minkowski distance
- Offers a great deal of flexibility over the distance metric.
- The parameter p can be troublesome to work with, since finding the right value can be computationally inefficient depending on the use case.

• Binary data
o Binary distance: d(0,0) = d(1,1) = 0 and d(0,1) = d(1,0) = 1

• Enumerated data
o The distance is 0 if the values are equal and 1 otherwise
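As a rough illustration of the distance measures listed above, the sketch below implements them in plain Python. The function names are illustrative only, and summing the 0/1 mismatches over several attributes is an assumption, since the table defines the binary and enumerated distances for a single value.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two numerical vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Manhattan distance: Minkowski with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance: Minkowski with p = 2."""
    return minkowski(a, b, 2)


def mismatch(a, b):
    """Binary / enumerated distance: 0 per equal value, 1 per different value.

    Assumption: the mismatches are summed over the attributes.
    """
    return sum(0 if x == y else 1 for x, y in zip(a, b))


# Hypothetical example vectors.
u, v = (1, 1), (4, 5)
print(manhattan(u, v))                 # |1-4| + |1-5|
print(euclidean(u, v))                 # sqrt(3^2 + 4^2)
print(mismatch((0, 1, 1), (1, 1, 0)))  # two positions differ
```

Writing the Manhattan and Euclidean distances as special cases of Minkowski makes the role of the parameter p explicit.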
Applications of Clustering

• Bank & Insurance: used to identify customers and their policies, and to detect fraud.

• Medicine: patient segmentation (similar behaviors) and location of tumors in the brain.

• City planning: used to group houses and to study their values based on their geographical locations and other factors.

Types
• Centroid-based Clustering: finds k sets of points that are grouped based on their proximity to a centroid

• Distribution-based Clustering: assumes data is composed of distributions, such as Gaussian distributions

• Hierarchical Clustering: builds a hierarchy (tree) of nested clusters

• Density-based Clustering: connects areas of high example density into clusters

Centroid-based Clustering:
K-means (MacQueen, 1967)

K-Means
Working principle
• Find homogeneous groups within a heterogeneous population

Objective: identify groups (clusters) of observations with similar characteristics (e.g. discovering
customer segments for marketing purposes, or grouping books by topic and content)
(1) Individuals in the same group are as similar as possible
(2) Individuals in different groups are as different as possible
Why?
o Identify underlying structures in the data
o Summarize behaviors
o Assign new individuals to categories

• The K-means clustering algorithm computes the centroids and iterates until they stop changing, which yields a locally optimal set of centroids.

• It assumes that the number of clusters is already known.

• This number of clusters is the ‘K’ in K-means.

• Data points are assigned to clusters in such a way that the sum of the squared distances between the data points and their cluster centroid is minimized.

• Initialize the K means with random values

• For each item, find the closest mean by computing the Euclidean
distance between the item and each of the means
• Assign the item to that mean
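The assignment step can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `assign` and the toy data are hypothetical, and ties are broken in favor of the lowest mean index.

```python
import math


def assign(points, means):
    """Return, for each point, the index of its closest mean (Euclidean distance)."""
    return [
        min(range(len(means)), key=lambda k: math.dist(p, means[k]))
        for p in points
    ]


# Hypothetical toy data: three items and two current means.
points = [(0, 0), (1, 0), (9, 9)]
means = [(0, 0), (10, 10)]
labels = assign(points, means)
print(labels)  # the first two items go to mean 0, the last one to mean 1
```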

• Update each mean by moving it to the average of the items in its cluster
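A possible sketch of this update step is shown below. The function name `update_means` and the data are hypothetical, and leaving the mean of an empty cluster unchanged is an assumption that the slides do not address.

```python
def update_means(points, labels, means):
    """Move each mean to the coordinate-wise average of the items assigned to it."""
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # Assumption: an empty cluster keeps its previous mean.
            new_means.append(tuple(old))
            continue
        new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_means


# Hypothetical toy data: items 0 and 1 belong to cluster 0, item 2 to cluster 1.
points = [(0, 0), (2, 0), (9, 9)]
labels = [0, 0, 1]
means = update_means(points, labels, [(0, 0), (10, 10)])
print(means)  # cluster 0 moves to the midpoint of its two items
```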

• Reassign each item to its closest updated mean

Total inertia = inter-class inertia + intra-class inertia (Huygens theorem)
◼ Cluster weight = sum (weight of each observation) / number of observations
◼ Weight of an observation (by default) = 1 / number of observations
The inertia is computed from a distance measurement between points and barycenters.

Inter-class inertia:
• Dispersion of the barycenters around the global barycenter
• Cluster separability indicator

Intra-class inertia (W):
• Dispersion within each group
• Cluster compactness indicator

The objective of automatic clustering is to minimize the intra-class inertia W, for a fixed number of clusters K.
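The Huygens decomposition can be checked numerically. The sketch below is only an illustration under stated assumptions: every observation has weight 1/n, so a cluster of n_k observations has weight n_k/n; the distance is the squared Euclidean distance; the four points and their labels are hypothetical.

```python
def mean(points):
    """Barycenter (coordinate-wise average) of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia_decomposition(points, labels):
    """Return (total, inter-class, intra-class) inertia with weights 1/n."""
    n = len(points)
    g = mean(points)  # global barycenter
    total = sum(sq_dist(p, g) for p in points) / n

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    between = within = 0.0
    for members in clusters.values():
        gk = mean(members)  # cluster barycenter
        between += (len(members) / n) * sq_dist(gk, g)
        within += sum(sq_dist(p, gk) for p in members) / n
    return total, between, within


# Hypothetical data: two clusters of two points each.
points = [(0, 0), (0, 2), (4, 0), (4, 2)]
labels = [0, 0, 1, 1]
total, between, within = inertia_decomposition(points, labels)
print(total, between, within)  # total equals between + within
```

Since the total inertia does not depend on the partition, minimizing the intra-class inertia W for a fixed K is equivalent to maximizing the inter-class inertia.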
K-Means
Pseudocode
Input: X (n observations, p variables), K (number of clusters)
Initialize the K means with random values
REPEAT
• Assign each individual to the cluster whose center is closest
• Recalculate the cluster centers from the attached individuals
UNTIL convergence
Output: a partition of the individuals, characterized by the K cluster centers Gk

Fundamental property: the intra-class inertia decreases at each step.

Convergence: the number of iterations is fixed, or the loop stops when
• No individual changes class
• Or W no longer decreases
• Or the Gk are steady
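The pseudocode could be turned into runnable Python roughly as follows. This is a sketch under assumptions, not the course's own implementation: the "random values" are taken to be K observations drawn from X, the loop stops when no individual changes class or after a hypothetical `max_iter` iterations, and an empty cluster keeps its previous center.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means; returns the list of centers Gk and a label per point."""
    rng = random.Random(seed)
    # Assumption: initial means are k observations drawn at random from X.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each individual to the cluster whose center is closest.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # no individual changes class
            break
        labels = new_labels
        # Recalculate the cluster centers from the attached individuals.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical data: two well-separated groups of three points.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(X, k=2)
print(centers, labels)
```

Which cluster receives index 0 depends on the random initialization, so only the grouping itself should be compared between runs.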

K-Means
Number of clusters

K-Means
Advantages

• Relatively simple to implement.

• Scales to large data sets.

• Easy to interpret.

• Easily adapts to new examples…

K-Means
Disadvantages

• k must be chosen manually.

• The result depends on the initial values.

• Clustering data of varying sizes and density: k-means has trouble clustering data where clusters
are of varying sizes and density.

• Clustering outliers: centroids can be dragged by outliers, or outliers might get their own cluster
instead of being ignored. Consider removing or clipping outliers before clustering.

K-Means
Exercise
Four drugs are each described by two variables, concentration and efficacy. We want to
create two clusters (k=2).

Drug   Concentration   Efficacy
A      1               1
B      2               1
C      4               3
D      5               4

We randomly designate A and B as the initial cluster centers: C1=A and C2=B

NB: The distance used is the Euclidean distance
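One way to check a hand-computed solution is to trace the iterations in Python. The sketch below uses the data and initial centers given in the exercise; the variable names are illustrative, and the loop stops when no drug changes cluster.

```python
import math

drugs = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centers = [drugs["A"], drugs["B"]]  # C1 = A and C2 = B, as stated in the exercise

labels = None
while True:
    # Assignment step: index 0 stands for C1, index 1 for C2.
    new_labels = {
        name: min((0, 1), key=lambda k: math.dist(p, centers[k]))
        for name, p in drugs.items()
    }
    if new_labels == labels:  # no drug changes cluster: stop
        break
    labels = new_labels
    # Update step: each center becomes the average of its drugs.
    for k in (0, 1):
        members = [drugs[n] for n, lab in labels.items() if lab == k]
        centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    print(labels, centers)
```

With this setup the first pass groups B, C and D around C2, the second pass moves B to C1, and the loop then stabilizes with clusters {A, B} and {C, D} and centers (1.5, 1) and (4.5, 3.5).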

Bibliography
• MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., Vol. 1, Univ. of Calif. Press, 281-297.

• Ester, M., et al. A Density Based Algorithm for Discovering Density Varied Clusters in Large Spatial Databases. KDD-96 Proceedings.

• Rokach, L., and Maimon, O. (2005). Clustering methods. Data Mining and Knowledge Discovery Handbook. Springer US, 321-352.

• Hartigan, J. A., and Wong, M. A. (1979). Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 28, No. 1, 100-108.
