Clustering - Jupyter Notebook
Clustering
k-means Clustering
https://en.wikipedia.org/wiki/K-means_clustering
In [2]: style.use('default')
In [4]: iris_df['species'].value_counts()
Out[4]: virginica 50
setosa 50
versicolor 50
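iter = 1
while True:
    # 2a. Assign labels based on closest center
    labels = pairwise_distances_argmin(X, centers)
    # 2b. Find new centers from the means of the assigned points
    new_centers = np.array([X[labels == i].mean(axis=0)
                            for i in range(len(centers))])
    # 2c. Check for convergence: stop once the centers no longer move
    if np.all(centers == new_centers):
        break
    centers = new_centers
    iter += 1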
1 1
2 1
2 2
5 1
5 2
5 3
5 4
5 5
10 1
10 2
10 3
10 4
10 5
10 6
10 7
10 8
10 9
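The loop above is only a fragment, so here is a minimal self-contained sketch of the same idea (Lloyd's algorithm). The function name `find_clusters`, the `max_iter` and `seed` parameters, the random choice of starting centers, and the synthetic two-blob data are all assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin


def find_clusters(X, n_clusters, max_iter=100, seed=0):
    """Plain k-means sketch; names and defaults are illustrative assumptions."""
    rng = np.random.RandomState(seed)
    # 1. Pick n_clusters distinct data points as the initial centers
    centers = X[rng.permutation(X.shape[0])[:n_clusters]]
    for n_iter in range(1, max_iter + 1):
        # 2a. Assign each point to its closest center
        labels = pairwise_distances_argmin(X, centers)
        # 2b. Move each center to the mean of its assigned points
        #     (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(n_clusters)
        ])
        # 2c. Stop once the centers no longer move
        if np.all(centers == new_centers):
            break
        centers = new_centers
    return centers, labels, n_iter


# Two well-separated synthetic blobs (made-up data, for illustration only)
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels, n_iter = find_clusters(X, n_clusters=2)
print(n_iter, centers)
```

Capping the loop with `max_iter` is a design choice in this sketch: it guarantees termination even if exact equality of the centers is never reached.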
In [9]: iris_df
Using sklearn
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:882: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
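One way to follow the warning's suggestion from inside a notebook is sketched below. It assumes the variable needs to be set before scikit-learn (and its numerical backends) are first imported in the process, which is typically the case for OpenMP thread settings; setting it in the shell before launching Jupyter is an alternative.

```python
import os

# Assumption: this cell runs before scikit-learn is imported in this process
os.environ["OMP_NUM_THREADS"] = "1"

from sklearn.cluster import KMeans  # imported after the variable is set

print(os.environ["OMP_NUM_THREADS"])
```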
In [12]: kmc.cluster_centers_
In [13]: kmc.labels_
Out[13]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2,
2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2,
2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1])
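The cell that created and fitted `kmc` is not shown above. The following is a rough sketch of how such a fit could look. It assumes three clusters and the four iris measurements as features, and it loads the data from `sklearn.datasets.load_iris` rather than from `iris_df`; `n_init=10` and `random_state=0` are illustrative choices. Cluster numbering is arbitrary, so the labels may be permuted relative to the output shown.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # 150 samples x 4 measurements

# Assumed settings: 3 clusters, 10 restarts, fixed seed for repeatability
kmc = KMeans(n_clusters=3, n_init=10, random_state=0)
kmc.fit(X)

print(kmc.cluster_centers_.shape)  # one row per cluster, one column per feature
print(kmc.labels_[:10])            # cluster index assigned to the first samples
```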
row_0  setosa  versicolor  virginica
0          50           0          0
1           0          48         14
2           0           2         36
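A table like the one above, with cluster labels as rows and species as columns, can be produced with `pandas.crosstab`. The sketch below is an assumption about how it might be done, not the notebook's own cell: it rebuilds the species column from `load_iris` and refits k-means with illustrative settings.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
# Map the integer targets back to species names
species = pd.Series(iris.target_names[iris.target], name="species")
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(iris.data)

# Rows: cluster label; columns: true species
table = pd.crosstab(labels, species)
print(table)
```

Each column sums to 50 because every species has 50 samples; a perfect clustering would put each column's 50 samples in a single row.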
Metrics
https://scikit-learn.org/stable/modules/clustering.html#clustering-evaluation
Out[15]: 0.6128676734836785
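The cell that produced this score is not shown, so it is unclear which metric it is. As a hedged illustration of the clustering-evaluation page linked above, the sketch below computes two common scores: the adjusted Rand index, which compares clusters with the known species, and the silhouette coefficient, which uses only the data and the cluster labels. The data loading and `KMeans` settings are assumptions, and neither value is claimed to reproduce the number shown.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score

iris = load_iris()
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(iris.data)

# External metric: agreement between cluster labels and the true species
ari = adjusted_rand_score(iris.target, labels)
# Internal metric: cohesion vs. separation, no ground truth needed
sil = silhouette_score(iris.data, labels)
print(ari, sil)
```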