Chapter 3: Unsupervised Learning
Unsupervised Learning
By: Yeshambel A.
Unsupervised learning
• Cluster analysis: finding similarities between data objects according to the characteristics found in the data, and grouping similar data objects into clusters.
What is Cluster Analysis?
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
• Intra-cluster distances are minimized.
• Inter-cluster distances are maximized.
Clustering: Applications
Example: Clustering
⚫ The example below demonstrates the clustering of padlocks of the same kind.
⚫ There are three different kinds of padlocks, which can be grouped into three different clusters.
⚫ The padlocks of the same kind are clustered into one group, as shown below.
Examples of Clustering Applications
⚫ Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs.
⚫ Land use: Identification of areas of similar land use in an earth observation database.
⚫ Insurance: Identifying groups of motor insurance policy holders with a high average claim cost.
Feature Selection
Identifying the most effective subset of the original features to use in clustering.
Feature Extraction
Transformations of the input features to produce new salient features.
Inter-pattern Similarity
Measured by a distance function defined on pairs of patterns.
Grouping
Methods to group similar patterns into the same cluster.
Outliers
[Figure: a cluster of data points with a few outlier points lying far away from the cluster.]
⚫ Cohesion measures how near the data points in a cluster are to the cluster centroid.
⚫ Sum of squared error (SSE) is a commonly used measure:
SSE = Σ_j Σ_{x in C_j} dist(m_j, x)^2, where m_j is the centroid of cluster C_j.
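As a concrete illustration, here is a minimal sketch (assuming NumPy arrays for the points, labels, and centroids; the names are illustrative) of how SSE can be computed:

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared distances of each point to its cluster centroid."""
    total = 0.0
    for j, c in enumerate(centroids):
        members = points[labels == j]        # points assigned to cluster j
        total += np.sum((members - c) ** 2)  # squared Euclidean distances
    return total

# Tiny usage example with two obvious clusters
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 0.5], [10.0, 10.5]])
print(sse(points, labels, centroids))  # -> 1.0
```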
Measuring clustering validity
Internal index:
• Validates a clustering without external information.
• Allows comparing clusterings with different numbers of clusters.
• Helps determine the number of clusters.
⚫ SSE is good for comparing two clusterings or two clusters (average SSE).
⚫ SSE can also be used to estimate the number of clusters.
[Figure: a 2-D scatter plot of a sample dataset (left) and the corresponding SSE-versus-K curve (right); the SSE drops sharply as K approaches the natural number of clusters and then flattens.]
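A minimal sketch of estimating the number of clusters from the SSE curve (the "elbow" heuristic), assuming scikit-learn is available; the dataset below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with a known number of clusters (illustrative only)
X, _ = make_blobs(n_samples=300, centers=5, random_state=0)

# KMeans exposes the final SSE as `inertia_`
sse_per_k = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse_per_k[k] = km.inertia_

for k, v in sse_per_k.items():
    print(f"K={k:2d}  SSE={v:10.1f}")
# The SSE curve typically bends ("elbows") near the true number of clusters.
```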
Internal Measures: Cohesion and Separation
⚫ Cluster cohesion: measures how closely related the objects in a cluster are.
⚫ Cluster separation: measures how distinct or well separated a cluster is from other clusters.
⚫ A proximity-graph-based approach can also be used for cohesion and separation:
⚫ Cluster cohesion is the sum of the weights of all links within a cluster.
⚫ Cluster separation is the sum of the weights of links between nodes in the cluster and nodes outside the cluster.
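A minimal sketch of this proximity-graph view, assuming the graph is given as a symmetric similarity (weight) matrix; the matrix, labels, and function name are illustrative:

```python
import numpy as np

def cohesion_and_separation(W, labels, cluster):
    """Cohesion: sum of link weights inside `cluster`.
    Separation: sum of link weights crossing the cluster boundary."""
    inside = labels == cluster
    cohesion = W[np.ix_(inside, inside)].sum() / 2  # each within-cluster link counted once
    separation = W[np.ix_(inside, ~inside)].sum()   # links leaving the cluster
    return cohesion, separation

# Illustrative 4-node proximity graph: nodes 0, 1 in cluster 0; nodes 2, 3 in cluster 1
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.2, 0.1],
              [0.1, 0.2, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
labels = np.array([0, 0, 1, 1])
print(cohesion_and_separation(W, labels, cluster=0))  # -> (0.9, 0.4)
```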
Categories of Clustering Algorithms
Partitioning method (k-means)
1. Choose k initial cluster centers (centroids), e.g., at random.
2. Partition the objects into k subsets: an object is assigned to class j if its distance to the mean of class j is smaller than its distance to the means of the other classes.
3. Compute the new centroids of the clusters of the current partition: the centroid of the jth cluster is the mean of the data points assigned to class j in the previous step.
Steps 2 and 3 are repeated until the centroids no longer change; a minimal code sketch follows.
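Here is a minimal from-scratch sketch of this procedure, assuming NumPy and the Euclidean distance; the function and variable names are illustrative:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (a production version would also handle clusters that become empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # stop when centroids no longer change
            break
        centroids = new_centroids
    return labels, centroids

# Usage on a tiny example
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```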
• Strengths:
• Simple: easy to understand and to implement.
• Efficient: time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
• Since both k and t are usually small, k-means is considered a linear-time algorithm.
• Outliers are data points that are very far away from other data points.
• Outliers could be errors in the data recording, or special data points with very different values.
Iteration 1
First, we list all points in the first column of the table below. The initial cluster centers (centroids) are (2, 10), (8, 4) and (1, 2), chosen randomly.
The table shows the distance of each data point (instance) from the chosen centroids; the values are Manhattan (city-block) distances. The last column shows the cluster each instance should be assigned to, based on its distance from the centroids.
Data Points | Distance to centroid 1 (2, 10) | Distance to centroid 2 (8, 4) | Distance to centroid 3 (1, 2) | Cluster
A1 (2, 10)  | 0  | 12 | 9  | 1
A2 (2, 5)   | 5  | 7  | 4  | 3
A3 (8, 4)   | 12 | 0  | 9  | 2
A4 (5, 8)   | 5  | 7  | 10 | 1
A5 (7, 5)   | 10 | 2  | 9  | 2
A6 (6, 4)   | 10 | 2  | 7  | 2
A7 (1, 2)   | 9  | 9  | 0  | 3
A8 (4, 9)   | 3  | 9  | 10 | 1
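The table above can be reproduced with a short sketch, assuming the Manhattan (city-block) distance implied by the values; the names are illustrative:

```python
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centroids = [(2, 10), (8, 4), (1, 2)]  # initial centroids

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

for name, p in points.items():
    dists = [manhattan(p, c) for c in centroids]
    cluster = dists.index(min(dists)) + 1  # clusters are numbered from 1
    print(name, dists, "-> cluster", cluster)
```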
Iteration 1
⚫ Next, we need to re-compute the new cluster centers. We do so by taking the mean of all points in each cluster.
⚫ For Cluster 1, we have three points (A1, A4, A8) and take their average as the new centroid, i.e. ((2+5+4)/3, (10+8+9)/3) = (3.67, 9).
⚫ For Cluster 2, we have three points (A3, A5, A6). The new centroid is ((8+7+6)/3, (4+5+4)/3) = (7, 4.33).
⚫ For Cluster 3, we have two points (A2, A7). The new centroid is ((2+1)/2, (5+2)/2) = (1.5, 3.5).
⚫ Since the centroids changed in iteration 1, we go to the next iteration (epoch 2) using the new means we computed.
⚫ The iteration continues until the centroids do not change anymore.
Second epoch
⚫ After the 2nd epoch the results would be: cluster 1: {A1, A4, A8} with new centroid (3.67, 9); cluster 2: {A3, A5, A6} with new centroid (7, 4.33); cluster 3: {A2, A7} with new centroid (1.5, 3.5).
⚫ Using the new centroids, compute the cluster members again.
Data Points | Distance to centroid 1 (3.67, 9) | Distance to centroid 2 (7, 4.33) | Distance to centroid 3 (1.5, 3.5) | Cluster
A1 (2, 10)  | 2.67 | 10.67 | 7.0 | 1
A2 (2, 5)   | 5.67 | 5.67  | 2.0 | 3
A3 (8, 4)   | 9.33 | 1.33  | 7.0 | 2
A4 (5, 8)   | 2.33 | 5.67  | 8.0 | 1
A5 (7, 5)   | 7.33 | 0.67  | 7.0 | 2
A6 (6, 4)   | 7.33 | 1.33  | 5.0 | 2
A7 (1, 2)   | 9.67 | 8.33  | 2.0 | 3
A8 (4, 9)   | 0.33 | 7.67  | 8.0 | 1
(Manhattan distances to the new centroids, rounded to two decimals.)
Final results
⚫ Finally, in the 2nd epoch there is no change in the members of the clusters or in the centroids, so the algorithm stops.
⚫ The result of clustering is shown in the following figure
Hierarchical Clustering
• Produces a nested sequence of clusters, a tree, also called a dendrogram.
Example of hierarchical clustering
Hierarchical clustering
Types of hierarchical clustering
• Agglomerative (bottom-up) clustering: starts with each data point as a singleton cluster and repeatedly merges the closest clusters.
• Stops when all the data points are merged into a single cluster (i.e., the root cluster).
• Divisive (top-down) clustering: starts with all data points in one cluster, the root.
• Splits the root into a set of child clusters; each child cluster is recursively divided further.
• Stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.
Types of hierarchical clustering
[Figure: AGNES (Agglomerative Nesting) merges the points a, b, c, d, e bottom-up (steps 0-4) into clusters ab, de, cde and finally abcde; DIANA (Divisive Analysis) runs in the reverse direction, splitting abcde back into the individual points.]
Agglomerative clustering
• Start with singleton clusters and go on merging the closest clusters until a single cluster remains; a minimal sketch follows.
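A minimal sketch of agglomerative clustering with SciPy, assuming SciPy is available; the data points and the single-link choice are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five illustrative 2-D points
X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 9.0]])

# Build the merge tree (dendrogram) with single-link (nearest-neighbour) merging
Z = linkage(X, method="single")

# Cut the tree into 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # cluster label for each of the five points
```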
Reinforcement Learning
[Figure: agent-environment loop — at each time step t the agent receives observation Ot and reward Rt and emits action At.]
▪ Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment.
▪ Reinforcement Learning is learning how to act in order to maximize a numerical reward.
[Figure: at each step the agent in state s_t receives reward r_t and takes action a_t; the environment then returns the next state s_{t+1}.]
Elements of reinforcement learning:
• Policy: what to do.
• Reward: what is good.
• Value: what is good because it predicts reward.
• Model of environment: what follows what.
▪ Reward function:
▪ Defines the goal in an RL problem
▪ Policy is altered to achieve this goal
▪ The task is to learn an optimal policy that maps states of the world to actions of the agent, i.e., if this patch of the room is dirty, I clean it; if my battery is empty, I recharge it.
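A policy can be viewed as a simple mapping from states to actions. A minimal sketch for the cleaning-robot example above (the state and action names are made up for illustration):

```python
# A deterministic policy: each observed state maps to one action
policy = {
    ("patch_dirty", "battery_ok"):    "clean",
    ("patch_clean", "battery_ok"):    "move_to_next_patch",
    ("patch_dirty", "battery_empty"): "recharge",
    ("patch_clean", "battery_empty"): "recharge",
}

def act(state):
    """Look up the action the current policy prescribes for a state."""
    return policy[state]

print(act(("patch_dirty", "battery_ok")))     # -> clean
print(act(("patch_clean", "battery_empty")))  # -> recharge
```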
Markov Decision Process
▪ An MDP is a classical formalization of sequential decision making, where actions influence not just immediate rewards but also subsequent situations, or states, and through those, future rewards.
▪ It is meant to be a straightforward framing of the problem of learning from interaction to achieve a goal.
▪ The learner and decision maker is called the agent. The thing it interacts with, comprising everything outside the agent, is called the environment.
▪ These interact continually, the agent selecting actions and the environment responding to these actions and presenting new situations to the agent.
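A minimal sketch of this interaction loop; the environment dynamics and the random placeholder policy are purely illustrative:

```python
import random

states = ["high", "low"]        # e.g. battery levels
actions = ["search", "wait"]

def environment_step(state, action):
    """Illustrative environment: returns (next_state, reward)."""
    if action == "search":
        next_state = "low" if random.random() < 0.5 else state  # searching may drain the battery
        return next_state, 2.0
    return state, 0.5  # waiting keeps the state and collects fewer cans

state = random.choice(states)        # initial state
total_reward = 0.0
for t in range(10):                  # interact for 10 steps
    action = random.choice(actions)  # placeholder policy: act at random
    state, reward = environment_step(state, action)
    total_reward += reward
print(total_reward)
```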
Markov Decision Process
▪ At time step t = 0, the environment samples an initial state s0 ~ p(s0).
▪ Searching is better but runs down the battery; if the robot runs out of power while searching, it has to be rescued (which is bad).
Markov Decision Process
▪ Example: Recycle robot MDP
▪ Decisions are made on the basis of the current energy level: high or low.
▪ Reward = number of cans collected.
Rsearch = expected no. of cans collected while searching
Rwait = expected no. of cans collected while waiting
S = {high, low}
A(high) = {search, wait}
[Figure: state-transition diagram for the recycle-robot MDP with the actions search and wait.]
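A minimal sketch of how this MDP's states, actions, and expected rewards could be written down; the numeric values and the action set for the low state are illustrative assumptions, not values from the slides:

```python
# State set and the actions available in each state
S = ["high", "low"]
A = {"high": ["search", "wait"],
     "low":  ["search", "wait"]}   # assumption: same actions in the low state

# Expected number of cans collected per action (placeholder values)
R_search = 2.0
R_wait = 0.5
reward = {"search": R_search, "wait": R_wait}

def expected_reward(state, action):
    assert action in A[state], f"{action} is not available in state {state}"
    return reward[action]

print(expected_reward("high", "search"))  # -> 2.0
```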
Dimensionality reduction techniques
Dimensionality reduction
• Represent the data x1, . . . , xn ∈ R^d in a subspace of lower dimension with as little loss of information as possible.
• Advantages:
• Visualization
• Lower computation and time complexity
• Avoid overfitting and reduce noise
Dimensionality reduction techniques
• Problem setting:
• Given x1, . . . , xn ∈ R^d,
• find a k-dimensional subspace
• such that the data projected onto that space
• is as close to the original data as possible.
• Common techniques (please read the details of each; a short PCA sketch follows this list):
• Principal Component Analysis (PCA)
• Factor Analysis
• Linear Discriminant Analysis (LDA)
• Multiple Correspondence Analysis (MCA)
• t-Distributed Stochastic Neighbour Embedding (t-SNE)
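As one concrete example from the list above, a minimal PCA sketch, assuming scikit-learn is available; the data is synthetic and illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 points in R^5 that mostly vary along 2 directions
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# Project onto a k = 2 dimensional subspace with as little loss as possible
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1: little information lost
```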