B43 Exp5 ML
Experiment No. 05
A.1 Aim:
To implement K-means clustering
A.2 Prerequisite:
Python Basic Concepts
A.3 Outcome:
Students will be able to implement K-means clustering.
A.4 Theory:
K-means clustering is one of the most widely used unsupervised machine learning
algorithms that forms clusters of data based on the similarity between data instances.
For this particular algorithm to work, the number of clusters has to be defined
beforehand. The K in K-means refers to the number of clusters.
The K-means algorithm starts by randomly choosing a centroid value for each cluster.
After that, the algorithm iteratively performs three steps: (i) find the Euclidean distance
between each data instance and the centroids of all the clusters; (ii) assign each data
instance to the cluster whose centroid is nearest; (iii) calculate new centroid
values as the mean of the coordinates of all the data instances in the
corresponding cluster. These steps are repeated until the cluster assignments stop changing.
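The three steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name kmeans, the parameters max_iter and seed, the choice of picking initial centroids from the data points, and the sample points are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Assumption: the random initial centroids are k distinct data points.
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Steps (i) and (ii): measure the Euclidean distance from each point
        # to every centroid and assign the point to the nearest one.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Step (iii): move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Hypothetical sample data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this sample the first three points end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random initial centroids.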
Hierarchical-Based Methods: The clusters formed by these methods make up a tree-type
structure based on the hierarchy. New clusters are formed using the previously formed
ones. The methods are divided into two categories: agglomerative (bottom-up) and divisive (top-down).
Agglomerative Clustering:
Agglomerative algorithms start with each individual item in its own cluster and
iteratively merge clusters until all items belong in one cluster. Different agglomerative
algorithms differ in how the clusters are merged at each level. Outputting the dendrogram
produces a set of clusterings rather than just one clustering. The user can determine which
of these clusterings to use by cutting the dendrogram at a chosen distance threshold.
Agglomerative Algorithm
Compute the distance matrix between the input data points.
Let each data point be a cluster.
Repeat
Merge the two closest clusters.
Update the distance matrix.
Until only a single cluster remains.
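The merge loop of the agglomerative algorithm can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions: the function name agglomerative, the distance_threshold stopping parameter, and the sample points are hypothetical, the distance between two clusters is taken to be the minimum pairwise distance between their members, and cluster distances are recomputed on every pass instead of being kept in a distance matrix.

```python
import math


def agglomerative(points, distance_threshold):
    """Merge the closest clusters until they are farther apart than the threshold."""
    # Let each data point be its own cluster (stored as a list of point indices).
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Assumption: minimum pairwise distance between members of a and b.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    merges = []  # (distance, merged cluster) records, one per dendrogram level
    while len(clusters) > 1:
        # Find the closest pair of clusters (recomputed each pass for clarity).
        d, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > distance_threshold:
            break  # remaining clusters are too far apart to merge
        merged = clusters[i] + clusters[j]
        merges.append((d, sorted(merged)))
        # Replace the two clusters by their union (delete j first, since j > i).
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sample data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
clusters, merges = agglomerative(points, distance_threshold=3.0)
print(clusters)
print(merges)
```

Passing math.inf as the threshold lets the loop run until a single cluster remains, and the recorded merges then describe the full dendrogram.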
Distance between two clusters: Each cluster is a set of points. The distance between
two clusters can be defined in the following ways.
Single Link:
Distance between clusters Ci and Cj is the minimum distance between any object in
Ci and any object in Cj.
Complete Link:
Distance between clusters Ci and Cj is the maximum distance between any object in
Ci and any object in Cj.
Average Link:
Distance between clusters Ci and Cj is the average of the distances between every object in Ci
and every object in Cj.
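The three linkage definitions can be compared on a small example. The sketch below assumes Euclidean distance between objects. The function names and the two sample clusters ci and cj are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def pairwise_distances(ci, cj):
    """Euclidean distances between every object in ci and every object in cj."""
    return [math.dist(p, q) for p, q in product(ci, cj)]


def single_link(ci, cj):
    # Minimum distance over all cross-cluster pairs.
    return min(pairwise_distances(ci, cj))


def complete_link(ci, cj):
    # Maximum distance over all cross-cluster pairs.
    return max(pairwise_distances(ci, cj))


def average_link(ci, cj):
    # Mean distance over all cross-cluster pairs.
    distances = pairwise_distances(ci, cj)
    return sum(distances) / len(distances)


# Hypothetical clusters: the cross-cluster distances are 4 and 5.
ci = [(0.0, 0.0), (0.0, 3.0)]
cj = [(4.0, 0.0)]
print(single_link(ci, cj), complete_link(ci, cj), average_link(ci, cj))
```

For these two clusters the single-link distance is 4, the complete-link distance is 5, and the average-link distance is 4.5, so the choice of linkage changes which pair of clusters counts as closest.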
PART B
(PART B: TO BE COMPLETED BY STUDENTS)
Roll. No.: B43 Name: Nikhil Aher
Grade:
B.1 Software Code written by student:
1. K-Means Clustering:
2. Agglomerative Clustering:
**********