Partitioning Method (K-Means) in Data Mining
Last Updated: 04 Sep, 2024
Partitioning Method: This clustering method classifies the data into multiple groups based on the characteristics and similarity of the data. It is up to the data analyst to specify the number of clusters to be generated. Given a database (D) that contains multiple (N) objects, the partitioning method constructs a user-specified number (K) of partitions of the data, in which each partition represents a cluster and a particular region. Many algorithms come under the partitioning method; some of the popular ones are K-Means, PAM (K-Medoids), and CLARA (Clustering Large Applications). In this article, we look at the working of the K-Means algorithm in detail.
K-Means (A Centroid-Based Technique): The K-Means algorithm takes the input parameter K from the user and partitions the dataset containing N objects into K clusters so that the similarity among data objects inside a cluster (intracluster) is high, but the similarity to data objects outside the cluster (intercluster) is low. The similarity of a cluster is measured with respect to the mean value of the cluster. It is a type of squared-error algorithm: it tries to minimize the sum of squared distances between each object and the mean of its cluster. At the start, K objects are chosen at random from the dataset, each representing a cluster mean (centre). Each of the remaining objects is assigned to the nearest cluster based on its distance from the cluster mean. The new mean of each cluster is then calculated from the objects assigned to it.
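The squared-error idea can be made concrete with a small sketch. The function name `sum_squared_error` and the two toy groupings below are illustrative assumptions, not part of the original article; the sketch only shows that a grouping with tight clusters gives a smaller total squared error than a mixed-up one.

```python
def sum_squared_error(clusters):
    """Total squared distance of every object from the mean of its cluster (1-D data)."""
    total = 0.0
    for points in clusters:
        mean = sum(points) / len(points)
        total += sum((p - mean) ** 2 for p in points)
    return total

# Hypothetical toy data: the same six values grouped in two different ways.
tight = [[1, 2, 3], [10, 11, 12]]   # similar values kept together
loose = [[1, 2, 10], [3, 11, 12]]   # similar values split apart

print(sum_squared_error(tight))  # 4.0
print(sum_squared_error(loose))  # much larger than the tight grouping
```

K-Means looks for a grouping with a low value of this quantity, although it is only guaranteed to reach a local minimum, not necessarily the best possible grouping.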
Algorithm:
K-Means:
Input:
K: The number of clusters into which the dataset has to be divided
D: A dataset containing N objects
Output:
A set of K clusters
Method:
- Randomly choose K objects from the dataset (D) as the initial cluster centres (C).
- (Re)assign each object to the cluster whose mean it is nearest to.
- Update the cluster means, i.e., recalculate the mean of each cluster from its updated members.
- Repeat Steps 2 and 3 until no change occurs.
Figure - K-mean Clustering (flowchart)
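The method above can be sketched in a few lines of Python. This is a minimal illustration for one-dimensional data, not a reference implementation: the function name `k_means`, the optional `initial_centres` argument, the `max_iter` safety limit, and the rule of keeping the old centre when a cluster becomes empty are all assumptions made for the sketch.

```python
import random

def k_means(data, k, initial_centres=None, max_iter=100):
    """Minimal 1-D K-Means sketch; returns (centres, clusters)."""
    # Step 1: choose K objects from the dataset as the initial centres.
    # Note: random.sample may pick two equal values if the data has duplicates.
    centres = list(initial_centres) if initial_centres else random.sample(data, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: (re)assign each object to the nearest cluster centre.
        new_assignment = [
            min(range(k), key=lambda j: abs(x - centres[j])) for x in data
        ]
        # Step 4: stop when no object changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recalculate the mean of each cluster.
        for j in range(k):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[j] = sum(members) / len(members)
    clusters = [[x for x, a in zip(data, assignment) if a == j] for j in range(k)]
    return centres, clusters

# Usage on hypothetical toy data with fixed starting centres.
centres, clusters = k_means([1, 2, 3, 10, 11, 12], 2, initial_centres=[1, 2])
print(centres)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

Passing `initial_centres` makes the run repeatable; without it, different random starts can lead to different final clusters.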
Example: Suppose we want to group the visitors to a website using just their age as follows:
16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial Cluster:
K=2
Centroid(C1) = 16 [16]
Centroid(C2) = 22 [22]
Note: These two points are chosen randomly from the dataset.
Iteration-1:
C1 = 16.33 [16, 16, 17]
C2 = 37.25 [20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-2:
C1 = 19.56 [16, 16, 17, 20, 20, 21, 21, 22, 23]
C2 = 46.90 [29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-3:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-4:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
There is no change between Iterations 3 and 4, so we stop. The K-Means algorithm therefore yields two clusters: (16-29) and (36-66).
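The iterations of this example can be reproduced with a short script. It is a sketch under a few assumptions: the helper name `trace_k_means` is hypothetical, the starting centroids are fixed to 16 and 22 as in the example instead of being drawn at random, and no cluster ever becomes empty (true for this data). Means are printed rounded to two decimals, so the C1 mean in the second iteration (176 / 9) prints as 19.56.

```python
ages = [16, 16, 17, 20, 20, 21, 21, 22, 23, 29,
        36, 41, 42, 43, 44, 45, 61, 62, 66]

def trace_k_means(data, centres):
    """Run K-Means on 1-D data and record (means, clusters) for every iteration."""
    history = []
    previous = None
    while True:
        # Assign each value to the nearest current centre.
        clusters = [[] for _ in centres]
        for x in data:
            nearest = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # Recompute each mean (assumes no cluster is empty).
        centres = [sum(c) / len(c) for c in clusters]
        history.append((centres, clusters))
        # Stop once an iteration produces the same clusters as the one before.
        if clusters == previous:
            return history
        previous = clusters

history = trace_k_means(ages, [16.0, 22.0])
for i, (means, clusters) in enumerate(history, start=1):
    print(f"Iteration-{i}:")
    for name, mean, members in zip(("C1", "C2"), means, clusters):
        print(f"  {name} = {mean:.2f} {members}")
```

The script stops after four iterations, with 29 as the largest age in the first cluster and 36 as the smallest in the second.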