Chap2 Part1 KMEANS
K-means
Machine Learning Team
UP GL-BD
Learning Outcomes
3. Clustering
4. K-means
5. Bibliography
• A generative model focuses on explaining how the data was generated, while a discriminative model focuses on predicting the labels (responses) of the data.
Unsupervised Learning
• Clustering
• Association rules
• Dimensionality reduction
• Clustering organizes a collection of objects into groups on the basis of the similarity and dissimilarity between them:
• Data points in the same group are similar to one another.
• Data points in different groups are dissimilar.
Minkowski distance
• It offers a great deal of flexibility in the choice of distance metric: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
• The parameter p can be troublesome to work with, since finding the right value can be computationally expensive depending on the use case.
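For two points x and y with n coordinates, the Minkowski distance of order p is d_p(x, y) = (Σ_i |x_i - y_i|^p)^(1/p). A minimal sketch follows; the function name `minkowski_distance` is hypothetical and the check p >= 1 is an added assumption (below 1 the formula is no longer a metric).

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    # Sum |x_i - y_i|^p over all coordinates, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Same pair of points, different metrics from one formula.
u, v = (0, 0), (3, 4)
print(minkowski_distance(u, v, 1))  # 7.0
print(minkowski_distance(u, v, 2))  # 5.0
```

As p grows, the value approaches max_i |x_i - y_i| (the Chebyshev distance), so a single parameter sweeps through a whole family of metrics.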
City planning
Clustering is used to group houses and to study their values based on their geographical location and other factors.
• In K-means, ‘K’ denotes the number of clusters the algorithm identifies in the data.
• The algorithm assigns the data points to clusters so that the sum of the squared distances between each data point and the centroid of its cluster is minimized.
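The quantity being minimized is often written J = Σ_{k=1}^{K} Σ_{x ∈ C_k} ||x - μ_k||^2, where C_k is the k-th cluster and μ_k its centroid. The sketch below simply evaluates J for a given assignment; the function name `kmeans_objective` and the four points are made up for illustration.

```python
def kmeans_objective(points, labels, centroids):
    """Sum of squared Euclidean distances between each point and its cluster centroid.

    labels[i] is the cluster index of points[i]; centroids[k] is the centroid of cluster k.
    """
    total = 0.0
    for point, k in zip(points, labels):
        # Squared distance: no square root, matching the "sum of squared distances" criterion.
        total += sum((a - c) ** 2 for a, c in zip(point, centroids[k]))
    return total


points = [(0, 0), (0, 2), (6, 0), (6, 2)]
centroids = [(0, 1), (6, 1)]
print(kmeans_objective(points, [0, 0, 1, 1], centroids))  # 4.0
print(kmeans_objective(points, [0, 1, 1, 1], centroids))  # 40.0
```

Sending each point to its nearest centroid gives the smaller value; K-means repeatedly exploits this by reassigning points and recomputing centroids until J stops decreasing.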
Advantages
• Easy to interpret.
Disadvantages
• Choosing k manually: the number of clusters must be fixed by the user before running the algorithm.
• Clustering data of varying sizes and density: k-means has trouble clustering data where clusters are of varying sizes and density.
• Clustering outliers: centroids can be dragged by outliers, or outliers might get their own cluster instead of being ignored. Consider removing or clipping outliers before clustering.
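The advice to clip outliers can be sketched as follows. This uses Tukey's interquartile-range fences [Q1 - 1.5·IQR, Q3 + 1.5·IQR], applied to each feature independently; that choice of rule, the factor 1.5, and the function names `clip_feature` and `clip_points` are assumptions for illustration, not the course's prescribed method.

```python
import statistics


def clip_feature(values, k=1.5):
    """Clip one feature to Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def clip_points(points, k=1.5):
    """Clip every coordinate of a list of points, one feature (column) at a time."""
    columns = [clip_feature(list(col), k) for col in zip(*points)]
    return [tuple(row) for row in zip(*columns)]


data = [(1, 10), (2, 11), (2, 12), (3, 10), (3, 11), (4, 12), (100, 11)]
print(clip_points(data)[-1])  # (7.0, 11): x = 100 is pulled back to the upper fence
```

Clipping keeps the point in the data set but limits how far it can drag a centroid; dropping the points that fall outside the fences would be the "removing" alternative.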
Point | Attribute 1 | Attribute 2
B | 2 | 1
C | 4 | 3
D | 5 | 4
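A sketch of K-means run on the points B, C and D above, with K = 2. It assumes Euclidean distance and takes B and C as the initial centroids (an arbitrary choice made here). It is the standard Lloyd-style iteration (assign, then update), not the Hartigan-Wong AS 136 procedure cited in the bibliography; `kmeans` is a hypothetical helper name.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd-style K-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are unchanged too
        labels = new_labels
        # Update step: each centroid moves to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


table = {"B": (2, 1), "C": (4, 3), "D": (5, 4)}
labels, centroids = kmeans(list(table.values()), [table["B"], table["C"]])
print(dict(zip(table, labels)))  # {'B': 0, 'C': 1, 'D': 1}
print(centroids)                 # [(2.0, 1.0), (4.5, 3.5)]
```

By hand: D is at distance √18 ≈ 4.24 from B and √2 ≈ 1.41 from C, so it joins C; the second centroid then moves to ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5), and a second pass leaves every assignment unchanged.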
• Ester, Martin, et al. "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise." Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996.
• Rokach, Lior, and Oded Maimon. "Clustering Methods." In Data Mining and Knowledge Discovery Handbook, Springer US, 2005, pp. 321-352.
• Hartigan, J. A., and M. A. Wong. "Algorithm AS 136: A K-Means Clustering Algorithm." Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 28, No. 1, 1979, pp. 100-108.