ML Exp5 C36

Prathmesh Gaikwad

TUS3F202128 C36

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 5
A.1 Aim:
To implement K-means clustering

A.2 Prerequisite:
Python Basic Concepts

A.3 Outcome:
Students will be able to implement K-means clustering.

A.4 Theory:

K-means clustering is one of the most widely used unsupervised machine learning
algorithms; it forms clusters of data based on the similarity between data instances.
For this algorithm to work, the number of clusters has to be defined
beforehand: the K in K-means refers to the number of clusters.

The K-means algorithm starts by randomly choosing a centroid value for each
cluster. After that, the algorithm iteratively performs three steps: (i) find the
Euclidean distance between each data instance and the centroids of all the clusters;
(ii) assign each data instance to the cluster of the nearest centroid; (iii)
calculate new centroid values as the mean of the coordinates of all the data
instances in the corresponding cluster.
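These three steps translate directly into a short NumPy sketch (the function below, its parameter names, and the random initialization are illustrative, not a library API):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # X: (n_samples, n_features) array; k: number of clusters (chosen beforehand)
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # (i) Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # (ii) Assign each point to the cluster of its nearest centroid
        labels = dists.argmin(axis=1)
        # (iii) Recompute each centroid as the mean of its cluster's points
        # (assumes no cluster becomes empty; real code would reseed empty clusters)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids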

Hierarchical-Based Methods: The clusters formed by this method form a tree-like
structure based on the hierarchy; new clusters are formed using previously
formed ones. It is divided into two categories:

Agglomerative (bottom-up approach)

Divisive (top-down approach)

Agglomerative Clustering:

Agglomerative algorithms start with each individual item in its own cluster and
iteratively merge clusters until all items belong to one cluster. Different
agglomerative algorithms differ in how the clusters are merged at each level.
Outputting the dendrogram produces a set of clusterings rather than just one;
the user can then decide, based on a distance threshold, which clustering to use.

Agglomerative Algorithm

Compute the distance matrix between the input data points, and let each data
point start as its own cluster. Then repeat until only a single cluster remains:

Merge the two closest clusters.

Update the distance matrix.

A brief SciPy sketch of this procedure follows.
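As a sketch of this loop in practice, SciPy's hierarchy module computes the full sequence of merges in one call, and fcluster then cuts the resulting tree at a chosen distance threshold (the toy data and the threshold 2.0 below are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy data: two loose groups of 2-D points
X = np.array([[1, 1], [1.5, 1], [1, 1.5],
              [5, 5], [5.5, 5], [5, 5.5]])

Z = linkage(X, method='single')  # agglomerative merge sequence (single link)
dendrogram(Z)                    # visualize the merge tree
plt.show()

# Cut the dendrogram at distance 2.0 to obtain flat cluster labels
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)                    # two clusters on this toy data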



Distance between two clusters: each cluster is a set of points, and the distance
between two clusters can be defined in the following ways.

Single Link:

Distance between clusters Ci and Cj is the minimum distance between any object in
Ci and any object in Cj.

Complete Link:

Distance between clusters Ci and Cj is the maximum distance between any object in
Ci and any object in Cj.

Average Link:

Distance between clusters Ci and Cj is the average distance between the objects in
Ci and the objects in Cj, averaged over all pairs.
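These three definitions correspond to the linkage parameter of scikit-learn's AgglomerativeClustering; a minimal sketch comparing them on the same illustrative toy data as above:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1.5, 1], [1, 1.5],
              [5, 5], [5.5, 5], [5, 5.5]])

for link in ('single', 'complete', 'average'):
    model = AgglomerativeClustering(n_clusters=2, linkage=link)
    # All three linkage criteria agree on well-separated data like this
    print(link, model.fit_predict(X))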

PART B
(PART B : TO BE COMPLETED BY STUDENTS)

Roll No: BE-C36 Name: Prathmesh Krishna Gaikwad


Class: BE-Comps Batch: C2
Date of Experiment: 26/09/2023 Date of Submission: 26/09/2023
Grade:

B.1 Software Code written by student:


1. Mall_Customers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


from sklearn.cluster import KMeans

import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('Mall_Customers.csv')
df.head()
df.columns
X1 = df[['Age', 'Spending Score (1-100)']].values
inertia = []
for n in range(1, 15):
    algorithm = KMeans(n_clusters=n, init='k-means++', n_init=10, max_iter=300,
                       tol=0.0001, random_state=111, algorithm='elkan')
    algorithm.fit(X1)
    inertia.append(algorithm.inertia_)
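# Assumed addition (not in the original listing): plot the inertia values just
# computed so the elbow that justifies k = 4 below is visible
plt.figure(figsize=(10, 4))
plt.plot(range(1, 15), inertia, 'o-')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.show()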
algorithm = KMeans(n_clusters=4, init='k-means++', n_init=10, max_iter=300,
                   tol=0.0001, random_state=111, algorithm='elkan')
algorithm.fit(X1)
labels1 = algorithm.labels_
centroids1 = algorithm.cluster_centers_
# Mesh grid over the feature space, used to draw the cluster decision regions
h = 0.02
x_min, x_max = X1[:, 0].min() - 1, X1[:, 0].max() + 1
y_min, y_max = X1[:, 1].min() - 1, X1[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = algorithm.predict(np.c_[xx.ravel(), yy.ravel()])

plt.figure(1, figsize=(15, 7))
plt.clf()
Z = Z.reshape(xx.shape)
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Pastel2, aspect='auto', origin='lower')

# Data points coloured by cluster label, centroids overlaid in red
plt.scatter(x='Age', y='Spending Score (1-100)', data=df, c=labels1, s=100)
plt.scatter(x=centroids1[:, 0], y=centroids1[:, 1], s=300, c='red', alpha=0.5)
plt.ylabel('Spending Score (1-100)')
plt.xlabel('Age')
plt.show()

2. Iris
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("/content/Iris.csv")
df.head()
df['Species'], categories = pd.factorize(df['Species'])
df.head()
df.describe()
df.isna().sum()
sns.scatterplot(data=df, x="SepalLengthCm", y="SepalWidthCm",hue="Species");
sns.scatterplot(data=df, x="PetalLengthCm", y="PetalWidthCm",hue="Species");

3. Housing
import pandas as pd
home_data = pd.read_csv('housing.csv', usecols = ['longitude', 'latitude', 'median_house_value'])
home_data.head()

import seaborn as sns


sns.scatterplot(data = home_data, x = 'longitude', y = 'latitude', hue = 'median_house_value')

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(home_data[['latitude', 'longitude']],
home_data[['median_house_value']], test_size=0.33, random_state=0)

from sklearn import preprocessing


# Note: preprocessing.normalize scales each row (each lat/long pair) to unit norm
X_train_norm = preprocessing.normalize(X_train)
X_test_norm = preprocessing.normalize(X_test)

from sklearn.cluster import KMeans


kmeans = KMeans(n_clusters = 3, random_state = 0, n_init='auto')

kmeans.fit(X_train_norm)

sns.scatterplot(data = X_train, x = 'longitude', y = 'latitude', hue = kmeans.labels_)


sns.boxplot(x = kmeans.labels_, y = y_train['median_house_value'])

from sklearn.metrics import silhouette_score


# Silhouette score ranges from -1 to 1; values near 1 indicate well-separated clusters
silhouette_score(X_train_norm, kmeans.labels_, metric='euclidean')

K = range(2, 8)
fits = []
score = []

for k in K:
    # Train the model for the current value of k on the training data
    model = KMeans(n_clusters=k, random_state=0, n_init='auto').fit(X_train_norm)
    # Append the fitted model to fits
    fits.append(model)
    # Append the silhouette score to score
    score.append(silhouette_score(X_train_norm, model.labels_, metric='euclidean'))

sns.scatterplot(data = X_train, x = 'longitude', y = 'latitude', hue = fits[0].labels_)


sns.scatterplot(data = X_train, x = 'longitude', y = 'latitude', hue = fits[2].labels_)
sns.lineplot(x = K, y = score)
sns.scatterplot(data = X_train, x = 'longitude', y = 'latitude', hue = fits[3].labels_)
sns.boxplot(x = fits[3].labels_, y = y_train['median_house_value'])

B.2 Input and Output:


1. Mall_Customers
(Output: clustered scatter plot of Age vs. Spending Score with cluster regions and centroids.)

2. Iris
(Output: scatter plots of sepal and petal measurements coloured by species.)

3. Housing
(Output: scatter plots of clusters over longitude/latitude, box plots of median house value per cluster, and a silhouette-score line plot over k.)

B.3 Observations and learning:


K-means clustering is a method of vector quantization, originally from signal
processing, that aims to partition n observations into k clusters in which each
observation belongs to the cluster with the nearest mean (cluster centre or
centroid), which serves as a prototype of the cluster.
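Formally, K-means seeks a partition S = {S_1, ..., S_k} with cluster means mu_i that minimizes the within-cluster sum of squares:

\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2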

B.4 Conclusion
Hence, we successfully learned and implemented K-means clustering.

B.5 Question of Curiosity (Handwritten any 3)


1. What is Agglomerative clustering? Explain in detail with algorithm.
2. Explain Divisive clustering (top-down approach).
3. What are the limitations while implementing K-means clustering?
4. Explain the steps involved in clustering the data using K-means clustering algorithm?
5. How are centroids calculated using K-means clustering algorithm?
6. What are the disadvantages of K-means?
7. How is K-means clustering applied to a pandas DataFrame in Python?
