Cluster
Unsupervised learning
• Unsupervised learning:
– The data have no target attribute; the goal is to describe hidden structure in unlabeled data.
– Explore the data to find intrinsic structure in them.
• Clustering: the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar to each other than to
those in other clusters.
• Useful for
– Automatically organizing data.
– Understanding hidden structure in data.
– Preprocessing for further analysis.
Applications
• Biology: classification of plants and animals given their features
• Marketing: customer segmentation based on a database of customer properties and past buying records
• Clustering weblog data to discover groups of similar access
patterns.
• Recognize communities in social networks.
Aspects of clustering
• A clustering algorithm, such as
– Partitional clustering, e.g., k-means
– Hierarchical clustering
– Mixture of Gaussians
• A distance or similarity function
– such as Euclidean, Minkowski, cosine (see the sketch after this list)
• Clustering quality
– Inter-cluster distance maximized
– Intra-cluster distance minimized
• The quality of a clustering result depends on the algorithm, the distance function, and the application.
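To make the distance functions above concrete, here is a minimal NumPy sketch; the vectors x, y and the order p are made-up examples, not from the slides:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 4.0])

# Euclidean distance: sqrt(sum_i (x_i - y_i)^2)
euclidean = np.sqrt(np.sum((x - y) ** 2))

# Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1.0 / p)

# Cosine similarity: x.y / (||x|| ||y||); 1 - similarity acts as a distance
cosine_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

print(euclidean, minkowski, 1.0 - cosine_sim)
```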
Partitioning
Algorithms
• Partitioning method: Construct a partition of a
database D of m objects into a set of k
clusters
• Given k, find a partition into k clusters that optimizes the chosen partitioning criterion
– Global optimum: exhaustively enumerate all partitions
– Heuristic method: k-means (MacQueen, 1967); see the usage sketch below
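A minimal usage sketch of the k-means heuristic, assuming scikit-learn is available; the toy data and the choice k = 2 are illustrative, not from the slides:

```python
import numpy as np
from sklearn.cluster import KMeans

# Four objects in a 2-attribute space, forming two obvious groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment of each object
print(km.cluster_centers_)  # the k cluster means
```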
Partitioning Algorithms
• Given k
• Construct a partition of m objects $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m\}$, where each $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ is a vector in a real-valued space $X \subseteq \mathbb{R}^n$ and $n$ is the number of attributes,
• into a set of k clusters $S = \{S_1, S_2, \ldots, S_k\}$
• The cluster mean $\mu_i$ serves as the prototype of cluster $S_i$.
• Find the k clusters that optimize a chosen criterion
– e.g., the within-cluster sum of squares (WCSS), the sum of squared distances from each point in a cluster to the cluster mean:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \left\| x - \mu_i \right\|^2$$
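The convergence argument on the next slide is easier to follow with the algorithm in hand, so here is a from-scratch sketch of Lloyd's k-means minimizing the WCSS objective above. All names (kmeans, mu, labels) are our own, and the sketch assumes no cluster becomes empty:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the k means with k randomly chosen objects.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each x goes to the cluster S_i with the nearest mean.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each mean mu_i over its cluster S_i.
        # (Sketch assumes every cluster keeps at least one member.)
        new_mu = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_mu, mu):
            break  # assignments stable: WCSS can no longer decrease
        mu = new_mu
    wcss = sum(((X[labels == i] - mu[i]) ** 2).sum() for i in range(k))
    return labels, mu, wcss
```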
Convergence of K-Means
• Recomputing the cluster means monotonically decreases the squared error, because for each cluster $j$ the error $\sum_{x_i \in S_j} \| x_i - a \|^2$ is minimized by setting its derivative with respect to the prototype $a$ to zero ($m_j$ is the number of members in cluster $j$):
$$-2 \sum_{x_i \in S_j} (x_i - a) = 0 \;\Rightarrow\; \sum_{x_i \in S_j} x_i = m_j a \;\Rightarrow\; a = \frac{1}{m_j} \sum_{x_i \in S_j} x_i = c_j$$
• The minimizer is exactly the cluster mean $c_j$, so each update step can only lower (or keep) the error.
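A quick numeric check of this derivation: for a toy cluster, the mean c_j yields a smaller sum of squared errors than any random competing prototype a (the data are made up for illustration):

```python
import numpy as np

X_j = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]])  # members of cluster j
c_j = X_j.mean(axis=0)                                # the cluster mean

def sse(a):
    # Sum of squared distances from every member of cluster j to prototype a.
    return ((X_j - a) ** 2).sum()

rng = np.random.default_rng(0)
for a in rng.normal(size=(5, 2)):  # random competing prototypes
    assert sse(c_j) <= sse(a)
print("mean:", c_j, "SSE at mean:", sse(c_j))
```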