Lesson 5 - Unsupervised Learning

INTRODUCTION TO ARTIFICIAL INTELLIGENCE
BUI NGOC DUNG

CHAPTER 5: UNSUPERVISED LEARNING


K-MEANS CLUSTERING
UNSUPERVISED LEARNING
❑ Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset
and left to find patterns in that data without any supervision.
❑ The aim of an unsupervised algorithm is to find the underlying structure of a dataset, group the data
according to similarities, and represent the dataset in a compressed format.
CLUSTER
❑ Clustering organizes unlabeled data into similarity groups called clusters.
❑ A cluster is a collection of data items that are “similar” to one another and “dissimilar” to data items in
other clusters.
CLUSTERING
Clustering is a type of unsupervised learning that automatically forms clusters of similar things.
K-MEANS CLUSTERING
K-means is an algorithm that finds k clusters for a given dataset. The number of clusters k is user defined. Each cluster is
described by a single point known as the centroid, so called because it lies at the center (the mean) of all the points in the
cluster.
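As a worked equation for the centroid, the block below writes the mean of a cluster explicitly. The symbols are assumptions introduced here for illustration: C_j denotes the set of points currently assigned to cluster j, |C_j| the number of those points, and \mu_j the centroid of that cluster.

```latex
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
```

For example, a cluster holding the points (1, 2) and (3, 4) has the centroid (2, 3).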
K-MEANS CLUSTERING
❑ Pros: Easy to implement
❑ Cons: Can converge to a local minimum; slow on very large datasets
❑ Works with: Numeric values
PSEUDO-CODE
Create k points for starting centroids (often randomly)
While any point has changed cluster assignment:
    for every point in our dataset:
        for every centroid:
            calculate the distance between the centroid and the point
        assign the point to the cluster with the lowest distance
    for every cluster:
        calculate the mean of the points in that cluster
        assign the centroid to the mean
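The pseudo-code above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the names `k_means` and `euclidean`, the `seed` and `max_iters` parameters, the choice to start from k randomly sampled data points, and the rule of keeping the old centroid when a cluster becomes empty are all assumptions made for this sketch.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, seed=0, max_iters=100):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Create k starting centroids (assumption: sample k distinct data points).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)

    for _ in range(max_iters):
        changed = False
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            if nearest != assignments[i]:
                assignments[i] = nearest
                changed = True
        # Stop once no point has changed cluster assignment.
        if not changed:
            break
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for i, p in enumerate(points) if assignments[i] == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assignments


# Hypothetical toy data: two loose groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = k_means(data, k=2)
print(centers)
print(labels)
```

Because the starting centroids are random, a different `seed` may lead to a different final clustering, which is the local-minimum behaviour listed under the cons.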
DISTANCE MEASURE
❑ The distance measure determines the similarity between two elements and influences the shape of the clusters.
❑ K-means clustering supports various kinds of distance measures; the most commonly used is the Euclidean
distance between two points.
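As an illustration of the Euclidean measure, the distance between two points p and q is the square root of the sum of squared coordinate differences. The short sketch below computes it directly; the function name `euclidean_distance` is chosen here for illustration.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of the same dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance((0, 0), (3, 4)))  # prints 5.0
```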
GENERAL APPROACH TO K-MEANS CLUSTERING
1. Collect: Any method.
2. Prepare: Numeric values are needed for a distance calculation, and nominal values can be mapped into
binary values for distance calculations.
3. Analyze: Any method.
4. Train: Doesn’t apply to unsupervised learning.
5. Test: Apply the clustering algorithm and inspect the results. Quantitative error measurements such as sum
of squared error (introduced later) can be used.
6. Use: Anything you wish. Often, the cluster centers can be treated as representative of the whole
cluster when making decisions.
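Two of the steps above lend themselves to a short sketch: mapping nominal values into binary values (Prepare) and scoring a clustering with the sum of squared error (Test). The helper names `one_hot` and `sum_of_squared_error`, the sample records, and the hand-picked labels and centroids are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def one_hot(value, categories):
    """Map a nominal value to a list of 0/1 flags, one per known category."""
    return [1.0 if value == c else 0.0 for c in categories]


def sum_of_squared_error(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its centroid."""
    total = 0.0
    for p, label in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[label]))
    return total


# Prepare: hypothetical records with one numeric and one nominal attribute.
colors = ["red", "green", "blue"]
records = [(1.0, "red"), (1.5, "red"), (9.0, "blue")]
points = [[size] + one_hot(color, colors) for size, color in records]

# Test: score a hand-made clustering (labels and centroids are illustrative).
labels = [0, 0, 1]
centroids = [[1.25, 1.0, 0.0, 0.0], [9.0, 0.0, 0.0, 1.0]]
print(sum_of_squared_error(points, centroids, labels))  # prints 0.125
```

A lower sum of squared error means the points sit closer to their assigned centroids, so it can be used to compare the results of several runs on the same data.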
ILLUSTRATION
❑ https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
❑ http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
THANK YOU