Lecture 3 Types of Machine Learning
Unsupervised learning algorithms apply the following techniques to describe the data:
• Clustering: an exploration of the data that segments it into meaningful groups (i.e., clusters)
based on internal patterns, without prior knowledge of the group definitions.
• The groups are defined by the similarity of individual data objects to one another and by their
dissimilarity from the rest (which can also be used to detect anomalies).
• Dimensionality reduction: incoming data often contains a lot of noise; machine learning algorithms
use dimensionality reduction to remove this noise while distilling the relevant information.
Clustering
• Clustering is a technique for finding similarity groups in data, called clusters: it groups data instances
that are similar to (near) each other into one cluster, and data instances that are very different (far away)
from each other into different clusters.
• Clustering is called an unsupervised learning task because no class values denoting an a priori grouping
of the data instances are given, as they are in supervised learning.
• It is a main task of exploratory data mining and a common technique for statistical data analysis, used in
many fields, including machine learning, pattern recognition, and image analysis.
• For example, a taxi agent might gradually develop a concept of “good traffic days” and “bad traffic days”
without ever being given labeled examples of each by a teacher.
Unsupervised Learning Process Flow
The data has no labels; the machine simply looks for whatever patterns it can find.
Training: text, documents, images, etc. → feature vectors → machine learning algorithm
Prediction: new text, documents, images, etc. → feature vectors → predictive model → likelihood, cluster ID, or better representation
Unsupervised Learning vs. Supervised Learning
The only difference is the labels in the training data.
Unsupervised: training data (text, documents, images, etc.) → feature vectors → machine learning algorithm → predictive model; new data → feature vectors → predictive model → likelihood, cluster ID, or better representation.
Supervised: training data (text, documents, images, etc.) plus labels → feature vectors → machine learning algorithm → predictive model; new data → feature vectors → predictive model → expected label.
Unsupervised Learning: Example
Clustering similar-looking birds/animals based on their features.
Application of Unsupervised Learning
Unsupervised learning can be used for anomaly detection as well as clustering.
[Figure: left, a scatter plot in which outlying points are flagged as anomalies (anomaly detection); right, a scatter plot in which similar points are grouped together (clustering).]
Clustering
Grouping objects based on the information found in the data that describes the
objects or their relationships.
[Figure: a scatter plot partitioned into five groups, Cluster 0 through Cluster 4.]
Need for Clustering
• To determine the intrinsic grouping in a set of unlabeled data
• To understand and extract value from large sets of structured and unstructured data
Types of Clustering
Clustering is either hierarchical or partitional.
[Figure: an agglomerative hierarchical clustering of points A–F, shown as four panels with the corresponding dendrograms (dissimilarity on the vertical axis):]
1. Combine A and B based on similarity; combine D and E based on similarity.
2. The combination of A and B is combined with C.
3. The combination of D and E is combined with F.
4. The final tree contains all clusters, combined into a single cluster.
Working: Hierarchical Clustering
1. Assign each item to its own cluster, so that if you have N items, you now have N clusters.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster; you now have one less cluster.
3. Compute the distances (similarities) between the new cluster and every old cluster.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
Distance Measures
• Complete-linkage clustering: the distance between two clusters is the maximum possible distance
between points belonging to the two different clusters.
• Single-linkage clustering: the distance between two clusters is the minimum distance between points
belonging to the two different clusters.
• Hierarchical clustering can be agglomerative (bottom-up, repeatedly merging clusters) or divisive
(top-down, repeatedly splitting clusters).
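The linkage choices above can be written as one-line distance functions. A minimal sketch, assuming clusters of 1-D points and absolute difference as the point-to-point distance (the same idea carries over to Euclidean distance in higher dimensions):

```python
# Linkage distances between two clusters of 1-D points.

def complete_linkage(a, b):
    """Complete linkage: maximum distance between any cross-cluster pair."""
    return max(abs(x - y) for x in a for y in b)

def single_linkage(a, b):
    """Single linkage: minimum distance between any cross-cluster pair."""
    return min(abs(x - y) for x in a for y in b)

c1, c2 = [1.0, 2.0], [5.0, 8.0]
print(complete_linkage(c1, c2))  # farthest pair: |1 - 8| = 7.0
print(single_linkage(c1, c2))    # closest pair:  |2 - 5| = 3.0
```

The two measures can give very different cluster shapes: single linkage tends to produce long "chained" clusters, while complete linkage favors compact ones.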
Hierarchical Clustering: Example
A hierarchical clustering of the distances, in kilometers, between six Italian cities: BA (Bari), FI (Florence), MI (Milan), NA (Naples), RM (Rome), and TO (Turin).
[Figure: a map of the six cities and the resulting dendrogram, with leaves BA, NA, RM, FI, TO, MI.]
Hierarchical Clustering: Step 1
Create the distance matrix of the data:

        BA   FI   MI   NA   RM   TO
BA       0  662  877  255  412  996
FI     662    0  295  468  268  400
MI     877  295    0  754  564  138
NA     255  468  754    0  219  869
RM     412  268  564  219    0  669
TO     996  400  138  869  669    0

Hierarchical Clustering: Step 2
From the distance matrix, MI and TO have the smallest distance (138), so they form a cluster MI/TO. Each remaining entry for MI/TO is the minimum of the old MI and TO entries (single linkage):

        BA   FI  MI/TO  NA   RM
BA       0  662   877  255  412
FI     662    0   295  468  268
MI/TO  877  295     0  754  564
NA     255  468   754    0  219
RM     412  268   564  219    0
Hierarchical Clustering: Step 3
Now NA and RM have the smallest distance (219), so they are merged into NA/RM:

        BA   FI  MI/TO  NA/RM
BA       0  662   877    255
FI     662    0   295    268
MI/TO  877  295     0    564
NA/RM  255  268   564      0
Hierarchical Clustering: Step 3 (Contd.)
BA and NA/RM are now closest (255), so they are merged into BA/(NA/RM):

             BA/(NA/RM)   FI  MI/TO
BA/(NA/RM)           0   268    564
FI                 268     0    295
MI/TO              564   295      0
Hierarchical Clustering: Step 3 (Contd.)
FI is now closest to BA/(NA/RM) (268), so they are merged into BA/(NA/RM)/FI:

                BA/(NA/RM)/FI  (MI/TO)
BA/(NA/RM)/FI              0      295
(MI/TO)                  295        0
Hierarchical Clustering: Step 4
The two remaining clusters are merged at distance 295. Derive the final dendrogram:

                BA/(NA/RM)/FI  (MI/TO)
BA/(NA/RM)/FI              0      295
(MI/TO)                  295        0

[Figure: the final dendrogram, with leaves BA, NA, RM, FI, TO, MI.]
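The merge sequence in the worked example can be reproduced with a short single-linkage agglomerative sketch. This is a minimal illustration, not an optimized implementation; it recomputes cluster distances from the original matrix at every step:

```python
# Single-linkage agglomerative clustering on the city distance matrix.
# At each step, merge the pair of clusters with the smallest minimum
# pairwise distance, as in Steps 2-4 above.

cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = {
    ("BA", "FI"): 662, ("BA", "MI"): 877, ("BA", "NA"): 255,
    ("BA", "RM"): 412, ("BA", "TO"): 996, ("FI", "MI"): 295,
    ("FI", "NA"): 468, ("FI", "RM"): 268, ("FI", "TO"): 400,
    ("MI", "NA"): 754, ("MI", "RM"): 564, ("MI", "TO"): 138,
    ("NA", "RM"): 219, ("NA", "TO"): 869, ("RM", "TO"): 669,
}

def dist(a, b):
    # Look up the symmetric pairwise distance.
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def cluster_dist(c1, c2):
    # Single linkage: minimum distance over all cross-cluster pairs.
    return min(dist(a, b) for a in c1 for b in c2)

clusters = [frozenset([c]) for c in cities]
merges = []
while len(clusters) > 1:
    # Find the closest pair of clusters.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: cluster_dist(clusters[p[0]], clusters[p[1]]),
    )
    d = cluster_dist(clusters[i], clusters[j])
    merged = clusters[i] | clusters[j]
    merges.append((sorted(merged), d))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for members, d in merges:
    print(members, d)
# Merge distances: 138 (MI/TO), 219 (NA/RM), 255, 268, 295
```

The printed merge distances (138, 219, 255, 268, 295) are exactly the heights at which the branches of the final dendrogram join.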
K-means Algorithm: Steps
1. Choose the number of clusters, k, and select k initial centroids.
2. Assign each data point to its nearest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2 and 3 until the assignments no longer change.
Note: as k increases, the clusters shrink, and so the distortion (the total distance from each point to its assigned centroid) also becomes smaller.
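A minimal sketch of the k-means algorithm (Lloyd's algorithm), using 1-D data for brevity; the toy data values below are illustrative, not from the slides:

```python
import random

# Minimal k-means on 1-D data: assign each point to its nearest centroid,
# then move each centroid to the mean of its assigned points; repeat
# until the centroids stop changing.

def kmeans(points, k, iters=100):
    centroids = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        new_centroids = [sum(g) / len(g) if g else centroids[i]
                         for i, g in enumerate(groups)]
        if new_centroids == centroids:
            break  # assignments are stable; converged
        centroids = new_centroids
    return centroids, groups

random.seed(0)
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]   # two well-separated groups
centroids, groups = kmeans(data, k=2)
print(sorted(round(c, 2) for c in centroids))  # the two group means
```

Unlike hierarchical clustering, k-means requires the number of clusters k up front, and the result can depend on the random initial centroids.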
Knowledge Check 1
Can decision trees be used for performing clustering?
a. True
b. False
THANK YOU
For queries, write to: [email protected]