ML Exp5 C36
TUS3F202128 C36
PART A
(PART A: TO BE REFERRED BY STUDENTS)
Experiment No. 5
A.1 Aim:
To implement K-means clustering
A.2 Prerequisite:
Python Basic Concepts
A.3 Outcome:
Students will be able to implement K-means clustering.
A.4 Theory:
K-means clustering is one of the most widely used unsupervised machine learning
algorithms that forms clusters of data based on the similarity between data instances.
For this particular algorithm to work, the number of clusters has to be defined
beforehand. The K in the K-means refers to the number of clusters.
The K-means algorithm starts by randomly choosing an initial centroid for each
cluster. After that the algorithm iteratively performs three steps: (i) Find the
Euclidean distance between each data instance and the centroids of all the clusters;
(ii) Assign each data instance to the cluster whose centroid is nearest; (iii)
Calculate new centroid values as the mean of the coordinates of all the data
instances in the corresponding cluster. The steps repeat until the assignments stop
changing or an iteration limit is reached.
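The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the scikit-learn implementation used in Part B: the function name `kmeans`, its parameters, and the toy data are assumptions made for the example, and the initial centroids are simply drawn from the data points.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means on an (n_samples, n_features) array (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # (i) Euclidean distance from every point to every centroid: shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # (ii) Assign each point to the cluster of its nearest centroid.
        labels = dists.argmin(axis=1)
        # (iii) New centroid = mean of the points assigned to the cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (assumed for illustration): two small groups of points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, the cluster numbering (and, on harder data, the final clustering) can change with `seed`.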
Hierarchical Methods: The clusters formed by these methods form a tree-like
structure based on the hierarchy. New clusters are formed from previously formed
ones. The methods fall into two categories: agglomerative (bottom-up) and divisive
(top-down).
Prathmesh Gaikwad
Agglomerative Clustering:
Agglomerative algorithms start with each individual item in its own cluster and
iteratively merge clusters until all items belong in one cluster. Different
agglomerative algorithms differ in how the clusters are merged at each level.
Outputting the dendrogram produces a set of clusterings rather than just one
clustering. The user can then choose which clustering to use, for example by cutting
the dendrogram at a distance threshold.
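Choosing a clustering by distance threshold can be illustrated with SciPy's hierarchical clustering routines. This is a sketch under assumptions: the five toy points, the single-link method, and the thresholds 2.0 and 6.0 are chosen only for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (assumed): two nearby pairs of points and one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0],
              [5.0, 0.0], [5.0, 1.0],
              [20.0, 0.0]])

# Build the merge tree; each of the n - 1 rows of Z records one merge.
Z = linkage(X, method="single")

# Cut the same tree at two different distance thresholds.
tight = fcluster(Z, t=2.0, criterion="distance")
loose = fcluster(Z, t=6.0, criterion="distance")

print(len(set(tight)), "clusters at threshold 2.0")
print(len(set(loose)), "clusters at threshold 6.0")
```

The same tree yields a finer clustering at the lower threshold and a coarser one at the higher threshold, which is why the dendrogram describes a family of clusterings.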
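Agglomerative Algorithm
Compute the distance matrix between the input data points.
Let each data point be a cluster.
Repeat
  Merge the two closest clusters.
  Update the distance matrix.
Until only a single cluster remains.
Distance between two clusters: Each cluster is a set of points, so the distance
between two clusters can be defined in the following ways.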
Single Link:
Distance between clusters Ci and Cj is the minimum distance between any object in
Ci and any object in Cj.
Complete Link:
Distance between clusters Ci and Cj is the maximum distance between any object in
Ci and any object in Cj.
Average Link:
Distance between clusters Ci and Cj is the average distance over all pairs of
objects, one taken from Ci and one from Cj.
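The three linkage definitions can be compared on a small example. The sketch below assumes two toy clusters `Ci` and `Cj` lying on a line and a hypothetical helper `pairwise_distances`; it only illustrates the definitions and is not taken from any library.

```python
import numpy as np

def pairwise_distances(Ci, Cj):
    """Euclidean distance between every point of Ci and every point of Cj."""
    return np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=2)

# Toy clusters (assumed for illustration).
Ci = np.array([[0.0, 0.0], [1.0, 0.0]])
Cj = np.array([[4.0, 0.0], [6.0, 0.0]])

D = pairwise_distances(Ci, Cj)   # pairwise distances: [[4, 6], [3, 5]]

single = D.min()     # single link: closest pair, 3.0
complete = D.max()   # complete link: farthest pair, 6.0
average = D.mean()   # average link: (4 + 6 + 3 + 5) / 4 = 4.5

print(single, complete, average)
```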
PART B
(PART B : TO BE COMPLETED BY STUDENTS)
1. Mall Customers
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import plotly as py
import plotly.graph_objs as go
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('Mall_Customers.csv')
df.head()
df.columns
X1 = df[['Age' , 'Spending Score (1-100)']].iloc[: , :].values
# Elbow method: record the inertia for k = 1..14
inertia = []
for n in range(1, 15):
    algorithm = KMeans(n_clusters=n, init='k-means++', n_init=10, max_iter=300,
                       tol=0.0001, random_state=111, algorithm='elkan')
    algorithm.fit(X1)
    inertia.append(algorithm.inertia_)
# Fit the model with 4 clusters and keep its labels and centroids
algorithm = KMeans(n_clusters=4, init='k-means++', n_init=10, max_iter=300,
                   tol=0.0001, random_state=111, algorithm='elkan')
algorithm.fit(X1)
labels1 = algorithm.labels_
centroids1 = algorithm.cluster_centers_
h = 0.02
x_min, x_max = X1[:, 0].min() - 1, X1[:, 0].max() + 1
y_min, y_max = X1[:, 1].min() - 1, X1[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = algorithm.predict(np.c_[xx.ravel(), yy.ravel()])
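The `inertia` list gathered in the loop above stores scikit-learn's `inertia_` attribute, the sum of squared distances from each sample to its closest cluster centre. The following NumPy sketch recomputes that quantity by hand; the helper name `inertia_of` and the toy points are assumptions for illustration.

```python
import numpy as np

def inertia_of(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    # Squared distance from every point to every centroid: shape (n, k).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Toy data (assumed): two vertical pairs of points, 8 units apart.
X = np.array([[1.0, 2.0], [1.0, 4.0], [9.0, 2.0], [9.0, 4.0]])

# One centroid in the middle: every point is at squared distance 16 + 1 = 17.
one_cluster = inertia_of(X, np.array([[5.0, 3.0]]))
# One centroid per pair: every point is at squared distance 1.
two_clusters = inertia_of(X, np.array([[1.0, 3.0], [9.0, 3.0]]))

print(one_cluster, two_clusters)
```

Inertia keeps shrinking as more clusters are added, so the elbow method looks for the value of k after which the decrease levels off rather than for the smallest inertia.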
2. Iris
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("/content/Iris.csv")
df.head()
df['Species'], categories = pd.factorize(df['Species'])
df.head()
df.describe()
df.isna().sum()
sns.scatterplot(data=df, x="SepalLengthCm", y="SepalWidthCm",hue="Species");
sns.scatterplot(data=df, x="PetalLengthCm", y="PetalWidthCm",hue="Species");
3. Housing
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
home_data = pd.read_csv('housing.csv', usecols = ['longitude', 'latitude', 'median_house_value'])
home_data.head()
# kmeans (a KMeans instance) and X_train_norm (the normalized training features)
# are assumed to be defined by earlier steps that are not shown here
kmeans.fit(X_train_norm)
K = range(2, 8)
fits = []
score = []
for k in K:
    # train the model for the current value of k on the training data
    model = KMeans(n_clusters=k, random_state=0, n_init='auto').fit(X_train_norm)
    # append the model to fits
    fits.append(model)
    # append the silhouette score to score
    score.append(silhouette_score(X_train_norm, model.labels_, metric='euclidean'))
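The silhouette score collected in the loop above compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), giving (b - a) / max(a, b). The sketch below is a hand-written approximation of that idea in NumPy, not scikit-learn's implementation; the function name `mean_silhouette` and the toy data are assumptions, and it expects at least two clusters and no duplicate points.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances (illustrative sketch)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # a point alone in its cluster scores 0 by convention
        # a: mean distance to the other members of the same cluster
        # (D[i, i] is 0, so dividing by n_same - 1 excludes the point itself).
        a = D[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Toy data (assumed): two pairs of points far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = mean_silhouette(X, np.array([0, 0, 1, 1]))  # pairs kept together
bad = mean_silhouette(X, np.array([0, 1, 0, 1]))   # pairs split apart

print(good, bad)
```

A labelling that keeps nearby points together scores close to 1, while one that splits them scores below 0, which is why the value of k with the highest score is usually preferred.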
B.4 Conclusion
Hence, we successfully learned and implemented K-means clustering.