B43 Exp5 ML

Uploaded by Nikhil Aher

PART A

(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 05
A.1 Aim:
To implement K-means clustering

A.2 Prerequisite:
Python Basic Concepts

A.3 Outcome:
Students will be able to implement K-means clustering.

A.4 Theory:

K-means clustering is one of the most widely used unsupervised machine learning
algorithms that forms clusters of data based on the similarity between data instances.
For this particular algorithm to work, the number of clusters has to be defined
beforehand. The K in the K-means refers to the number of clusters.

The K-means algorithm starts by randomly choosing a centroid value for each cluster.
After that, the algorithm iteratively performs three steps: (i) find the Euclidean distance
between each data instance and the centroids of all clusters; (ii) assign each data
instance to the cluster whose centroid is nearest; (iii) recalculate each centroid as the
mean of the coordinates of all data instances assigned to that cluster.
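The three iterative steps above can be sketched with scikit-learn's KMeans on synthetic data (the dataset and parameter values here are illustrative assumptions, not the experiment's actual settings):

```python
# Minimal K-means sketch using scikit-learn on synthetic data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 two-dimensional points around 3 centres.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# K (n_clusters) must be chosen beforehand; n_init repeats the
# algorithm with several random centroid initialisations and keeps
# the best result, mitigating sensitivity to the starting centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one centroid per cluster
```

After fitting, `cluster_centers_` holds the final centroid coordinates and `labels_` the cluster index assigned to each point.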

Hierarchical Based Methods: Clusters formed by these methods have a tree-type
structure based on the hierarchy; new clusters are formed from previously formed
ones. Hierarchical clustering is divided into two categories:

Agglomerative (bottom up approach)

Divisive (top down approach) .

Agglomerative Clustering:
Agglomerative algorithms start with each individual item in its own cluster and
iteratively merge clusters until all items belong to one cluster. Different agglomerative
algorithms differ in how the clusters are merged at each level. Outputting the dendrogram
produces a set of clusterings rather than just one; the user can decide which clustering
(based on a distance threshold) to use.

Agglomerative Algorithm

1. Compute the distance matrix between the input data points.
2. Let each data point be its own cluster.
3. Repeat until only one cluster remains:
   - Merge the two closest clusters.
   - Update the distance matrix.

Distance between two clusters: Each cluster is a set of points, and the distance
between two clusters can be defined in the following ways.

Single Link:

Distance between clusters Ci and Cj is the minimum distance between any object in
Ci and any object in Cj.
Complete Link:

Distance between clusters Ci and Cj is the maximum distance between any object in
Ci and any object in Cj.

Average Link:

Distance between clusters Ci and Cj is the average distance between any object in Ci
and any object in Cj.
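The three linkage criteria above correspond directly to the `linkage` parameter of scikit-learn's AgglomerativeClustering. A small sketch on synthetic data (an illustration under assumed parameters, not the experiment's actual code):

```python
# Compare single, complete, and average linkage on the same data.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

for linkage in ("single", "complete", "average"):
    # linkage controls how inter-cluster distance is computed:
    # minimum (single), maximum (complete), or average pairwise distance.
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    print(linkage, "->", len(set(labels)), "clusters")
```

Fixing `n_clusters=3` cuts the dendrogram so that exactly three clusters remain; alternatively, `distance_threshold` can be set to cut at a chosen distance instead.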

PART B
(PART B: TO BE COMPLETED BY STUDENTS)
Roll No.: B43 Name: Nikhil Aher

Class: Fourth Year (B) Batch: B3

Date of Experiment: 09/08/24 Date of Submission: 16/08/24

Grade:
B.1 Software Code written by student:
1. K-Means Clustering:
2. Agglomerative Clustering:

B.2 Input and Output:


1. K-Means Clustering:
2. Agglomerative Clustering:

B.3 Observations and learning:


K-means clustering efficiently groups data into clusters by iteratively assigning
points to the nearest centroid and recalculating centroids. It assumes clusters are
spherical and equally sized, which may not fit all datasets. The number of clusters must
be specified beforehand, and the algorithm’s performance depends on the initial centroid
placement. Using multiple initializations can improve results. Visualization and evaluation
metrics like the Elbow Method and Silhouette Score are crucial for assessing clustering
quality and determining the optimal number of clusters.
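The Elbow Method and Silhouette Score mentioned above can be sketched as follows (synthetic data and parameter ranges are assumptions for illustration):

```python
# Choosing K: inertia (Elbow Method) and silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=1)

scores = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    # inertia_ is the within-cluster sum of squared distances; it
    # always decreases as k grows, so one looks for the "elbow"
    # where the decrease levels off. The silhouette score instead
    # peaks at a well-separated clustering.
    scores[k] = (km.inertia_, silhouette_score(X, km.labels_))

best_k = max(scores, key=lambda k: scores[k][1])
print("k with highest silhouette score:", best_k)
```

Plotting inertia against k makes the elbow visible, while the silhouette score gives a single number per k that can be compared directly.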
B.4 Conclusion:
K-means clustering partitions data into clusters by assigning points to the nearest
centroid and adjusting centroids iteratively. It requires specifying the number of clusters
in advance and assumes clusters are spherical and of similar size. The algorithm’s
effectiveness can be improved by using multiple initializations to mitigate sensitivity to
centroid placement. Visualization and techniques like the Elbow Method aid in evaluating
clustering results and determining the optimal number of clusters. Despite its assumptions,
K-means is a powerful tool for understanding data structure and identifying patterns.

**********
