Unit4-Clustering
Clustering
Basic Concept and Terminologies
●
Both the Euclidean distance and the Manhattan distance satisfy the
following mathematical requirements of a distance function:
1. d(i, j) ≥ 0: Distance is a nonnegative number.
2. d(i, i) = 0: The distance of an object to itself is 0.
3. d(i, j) = d( j, i): Distance is a symmetric function.
4. d(i, j) ≤ d(i, h) + d(h, j): Going directly from object i to object j in space is no
more than making a detour over any other object h (triangle inequality).
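The four requirements can be checked numerically on a small sample of points. The sketch below is only an illustration: the function names (`euclidean`, `manhattan`, `satisfies_metric_axioms`), the tolerance `tol`, and the sample points are assumptions made for this example. Passing the check on a finite sample does not prove the properties in general.

```python
import itertools
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def satisfies_metric_axioms(dist, points, tol=1e-9):
    """Check the four requirements on every combination of the given points."""
    for i in points:
        if abs(dist(i, i)) > tol:                       # d(i, i) = 0
            return False
    for i, j in itertools.product(points, repeat=2):
        if dist(i, j) < 0:                              # d(i, j) >= 0
            return False
        if abs(dist(i, j) - dist(j, i)) > tol:          # d(i, j) = d(j, i)
            return False
    for i, j, h in itertools.product(points, repeat=3):
        if dist(i, j) > dist(i, h) + dist(h, j) + tol:  # triangle inequality
            return False
    return True

# Hypothetical sample points, chosen only for illustration.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, -2.0), (-5.0, 2.5)]
print(satisfies_metric_axioms(euclidean, points))
print(satisfies_metric_axioms(manhattan, points))
```

As a contrast, the squared Euclidean distance fails requirement 4: for the one-dimensional points 0, 1, and 2, the direct squared distance from 0 to 2 is 4, while the detour through 1 costs only 1 + 1 = 2.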
COM 315 Advanced Programming Techniques
2024 BBIS, KU Unit 4: Clustering
Minkowski Distance
●
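Minkowski distance is a generalization of both Euclidean
distance and Manhattan distance. For two n-dimensional objects
i = (x_{i1}, x_{i2}, ..., x_{in}) and j = (x_{j1}, x_{j2}, ..., x_{jn}), it is defined as
$$d(i, j) = \left( |x_{i1} - x_{j1}|^p + |x_{i2} - x_{j2}|^p + \cdots + |x_{in} - x_{jn}|^p \right)^{1/p}$$
where p is a positive integer. This distance is also called the Lp
norm in some literature.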
●
It represents the Manhattan distance when p = 1 (i.e., L1
norm) and Euclidean distance when p = 2 (i.e., L2 norm).
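A minimal sketch of the Minkowski distance, computed as the sum of the p-th powers of the absolute coordinate differences, raised to the power 1/p. The function name `minkowski` and the sample points are assumptions for illustration. Setting p = 1 should reproduce the Manhattan distance, and p = 2 the Euclidean distance.

```python
def minkowski(a, b, p):
    """Minkowski (Lp) distance between two equal-length points."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    if len(a) != len(b):
        raise ValueError("points must have the same number of attributes")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

# Hypothetical sample points.
a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, 1))  # Manhattan distance (L1 norm): |3| + |4|
print(minkowski(a, b, 2))  # Euclidean distance (L2 norm): sqrt(9 + 16)
```

As p grows, the largest single coordinate difference dominates the sum, so the value moves toward max(|3|, |4|) = 4 for these two points.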
●
Weighting can also be applied to the Manhattan and
Minkowski distances, with each attribute assigned a weight
according to its perceived importance.
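One common way to apply weighting is to multiply each attribute's term by a non-negative weight before summing. The sketch below assumes that form; the function name `weighted_minkowski` and the example weights are hypothetical and not taken from the slides.

```python
def weighted_minkowski(a, b, weights, p):
    """Minkowski distance with one non-negative weight per attribute.

    Assumed form: (sum_f w_f * |a_f - b_f|^p)^(1/p).
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(w * abs(x - y) ** p for w, x, y in zip(weights, a, b))
    return total ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
# Weighted Manhattan distance (p = 1): the first attribute counts double,
# the second counts half.
print(weighted_minkowski(a, b, (2.0, 0.5), 1))
# With all weights equal to 1 this is the unweighted Minkowski distance.
print(weighted_minkowski(a, b, (1.0, 1.0), 2))
```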
●
The k-means algorithm takes the input parameter, k, and
partitions a set of n objects into k clusters so that the
resulting intra-cluster similarity is high but the inter-cluster
similarity is low.
●
Cluster similarity is measured with respect to the mean value of
the objects in a cluster, which can be viewed as the cluster's
centroid or center of gravity.
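The procedure can be sketched as an alternation between assigning each object to its nearest centroid and recomputing each centroid as the mean of its members. The code below is a minimal illustration, not a definitive implementation. Choosing the initial centroids at random from the data, the `seed` and `max_iter` parameters, and the handling of empty clusters are all assumptions of this sketch.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points (the centroid)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    """Partition points into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean_point(members)
    return centroids, labels

# Hypothetical data: two visibly separated groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
print(centroids)
print(labels)
```

The result depends on the initial centroids in general, so practical implementations often run the algorithm several times with different starting points and keep the best partition.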
●
The k-means and the k-modes methods can be integrated to
cluster data with mixed numeric and categorical values.
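One known way to integrate the two methods is the k-prototypes approach. It measures dissimilarity as a squared Euclidean term on the numeric attributes plus a weighted count of mismatches on the categorical attributes. The cluster representative uses the mean for numeric attributes and the mode for categorical ones. The sketch below assumes that scheme; the function names, the index-list arguments, and the weight `gamma` are hypothetical choices for illustration.

```python
from collections import Counter

def mixed_dissimilarity(a, b, numeric_idx, categorical_idx, gamma=1.0):
    """Numeric squared distance plus gamma times the number of categorical mismatches."""
    numeric_part = sum((a[f] - b[f]) ** 2 for f in numeric_idx)
    categorical_part = sum(1 for f in categorical_idx if a[f] != b[f])
    return numeric_part + gamma * categorical_part

def prototype(members, numeric_idx, categorical_idx):
    """Cluster representative: mean of numeric attributes, mode of categorical ones."""
    proto = list(members[0])
    for f in numeric_idx:
        proto[f] = sum(m[f] for m in members) / len(members)
    for f in categorical_idx:
        proto[f] = Counter(m[f] for m in members).most_common(1)[0][0]
    return tuple(proto)

# Hypothetical records: (numeric attribute, categorical attribute).
records = [(1.0, "red"), (3.0, "red"), (2.0, "blue")]
print(prototype(records, numeric_idx=[0], categorical_idx=[1]))
print(mixed_dissimilarity(records[0], (3.0, "blue"), [0], [1], gamma=0.5))
```

The weight `gamma` controls how much a categorical mismatch counts relative to numeric differences, so it usually needs tuning to the scale of the numeric attributes.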