Analysis of High-Dimensional Data
Leif Kobbelt
Motivation
Given: data samples X = [x_1, …, x_n] ∈ R^{d×n}  (d dimensions, n samples)
Decrease d → dimensionality reduction:
  PCA
  MDS
PCA
Covariance matrix of the samples:
  C = (X − X̄)(X − X̄)^T ∈ R^{d×d}
  C = (X J)(X J)^T = X J J^T X^T,   with centering matrix J = I − (1/n) 1 1^T
Principal component(s): eigenvector(s) v_i to the largest eigenvalue(s) λ_i of C
Low-rank approximation:
  C = V D V^T = [v_1 … v_d] diag(λ_1, …, λ_d) [v_1 … v_d]^T
    ≈ [v_1 … v_q] diag(λ_1, …, λ_q) [v_1 … v_q]^T
Projection onto the first q principal components:
  X' = [v_1 … v_q]^T X J ∈ R^{q×n}
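A minimal numpy sketch of this pipeline (function and variable names are illustrative, not from the slides): center the data with J, form the covariance matrix, and project onto the q leading eigenvectors.

```python
import numpy as np

def pca(X, q):
    """PCA of a d x n data matrix X: project onto the q leading principal components."""
    d, n = X.shape
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix J = I - (1/n) 1 1^T
    Xc = X @ J                              # centered data (equivalently X minus its column mean)
    C = Xc @ Xc.T                           # d x d (scaled) covariance matrix
    lam, V = np.linalg.eigh(C)              # eigenvalues ascending, eigenvectors as columns
    V_q = V[:, ::-1][:, :q]                 # q eigenvectors with the largest eigenvalues
    return V_q.T @ Xc                       # X' = [v_1 ... v_q]^T X J, shape q x n
```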
Relation to SVD
SVD of the centered data matrix:  X J = V Σ U^T
  C = X J (X J)^T = V Σ U^T U Σ V^T = V Σ² V^T
so the principal directions are the left singular vectors of X J and λ_i = σ_i².
Instead of the d × d matrix C one can work with the n × n matrix
  C̃ = (X J)^T (X J) ∈ R^{n×n}
which is cheaper when n < d. If C̃ v = λ v, then w = X J v is an eigenvector of C:
  C w = X J (X J)^T X J v = X J (C̃ v) = λ X J v = λ w
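A small numpy check of this relation on made-up data: an eigenvector of the small n × n matrix is mapped to an eigenvector of the d × d covariance matrix by multiplication with X J.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 10                              # more dimensions than samples
X = rng.standard_normal((d, n))
J = np.eye(n) - np.ones((n, n)) / n
XJ = X @ J

C = XJ @ XJ.T                              # d x d covariance (expensive for large d)
C_tilde = XJ.T @ XJ                        # n x n, much smaller

lam, v = np.linalg.eigh(C_tilde)
w = XJ @ v[:, -1]                          # map top eigenvector of C~ to one of C
print(np.allclose(C @ w, lam[-1] * w))     # True: C w = lambda w up to numerical error
```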
Example
10 points in R²
Covariance matrix:
  C = [ 0.617  0.615 ]
      [ 0.615  0.717 ]
Eigenvectors (principal axes):
  e_1 ≈ (0.74, −0.68)^T
  e_2 ≈ (0.68, 0.74)^T   (largest eigenvalue, principal direction)
Multi-Dimensional Scaling
Given: data samples X = {x_1, …, x_n}, x_i ∈ R^d, accessible only through their matrix of
pairwise distances in high-dimensional space:
  D ∈ R^{n×n},   D_{i,j} = ||x_i − x_j||²
More generally, the samples x_i may come from an abstract set A; all that is needed is a
dissimilarity matrix D ∈ R^{n×n} with entries D_{i,j}.
Goal: find an embedding X' ∈ R^{q×n} whose pairwise distances D' reproduce D as well as possible.
Multi-Dimensional Scaling
Objective:
  Φ(D, D') = || J (D − D') J^T ||²_F
Other error measures Φ(D, D') are possible, but the resulting problems cannot be solved as easily.
Multi-Dimensional Scaling
Closed-form solution via the eigen-decomposition of
  B = −(1/2) J D J ∈ R^{n×n}
Take the q eigenvectors v_1, …, v_q of B with the largest eigenvalues λ_1, …, λ_q and set
  X' = diag(√λ_1, …, √λ_q) [v_1 … v_q]^T ∈ R^{q×n}
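A classical-MDS sketch in numpy following the formulas above (the function name is illustrative; D is assumed to contain squared dissimilarities).

```python
import numpy as np

def classical_mds(D, q):
    """Embed n items with pairwise squared dissimilarities D (n x n) into R^q."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D @ J                        # double-centered matrix B = -1/2 J D J
    lam, V = np.linalg.eigh(B)                  # eigenvalues ascending
    lam, V = lam[::-1][:q], V[:, ::-1][:, :q]   # q largest eigenpairs
    return (V * np.sqrt(np.maximum(lam, 0))).T  # X' = diag(sqrt(lam)) V^T, shape q x n
```

Applied to a D computed from Euclidean points, this recovers the point set up to rotation, reflection and translation.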
Motivation
Given: data samples X = [x_1, …, x_n] ∈ R^{d×n}
Decrease n → clustering:
  k-means
  EM
  Mean shift
  Spectral clustering
  Hierarchical clustering
Cluster Analysis
k-means Clustering
Given: data samples x_1, …, x_n ∈ R^d
Goal: partition the n samples into k sets (k ≤ n) S = {S_1, S_2, …, S_k} such that
  arg min_S Σ_{i=1}^k Σ_{x_j ∈ S_i} ||x_j − μ_i||²
where μ_i is the mean of the samples in S_i.
k-means Clustering
Iterate the following two steps until the assignments no longer change:
Assignment step: assign every sample to its nearest current mean
  S_i^t = { x_j : ||x_j − m_i^t|| ≤ ||x_j − m_{i*}^t||  for all i* = 1, …, k }
Update step: recompute every mean from the samples assigned to it
  m_i^{t+1} = (1 / |S_i^t|) Σ_{x_j ∈ S_i^t} x_j
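A compact numpy sketch of the two alternating steps (samples are stored as rows here for convenience; initialization by randomly chosen samples and the empty-cluster handling are illustrative choices).

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means on samples given as rows of X (n x d)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]    # initial means m_i^0
    for _ in range(iters):
        # assignment step: index of the nearest mean for every sample
        dist = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # update step: mean of the samples assigned to each cluster
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else means[i] for i in range(k)])
        if np.allclose(new_means, means):                   # assignments stable: converged
            break
        means = new_means
    return labels, means
```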
(Figures: successive k-means iterations on an example data set, alternating assignments and mean updates.)
Advantages:
  Efficient
  Always converges to a solution
Drawbacks:
  The number of clusters k must be chosen in advance
  Converges only to a local minimum; the result depends on the initialization
Clustering Results
(Figures: k-means clustering results on example data sets.)
EM Algorithm
Given: data samples x_1, …, x_n ∈ R^d
Assumption: the data was generated by k Gaussians
Goal: fit a Gaussian mixture model (GMM) to the data X
Find, for j = 1, …, k:
  means μ_j and covariances Σ_j of the Gaussians
  probabilities (weights) π_j that a sample comes from Gaussian j
EM Algorithm
1. Initialization: choose initial parameters π_j^0, μ_j^0, Σ_j^0 and compute the initial log-likelihood
   L^0 = (1/n) Σ_{i=1}^n log Σ_{j=1}^k π_j^0 N(x_i | μ_j^0, Σ_j^0)
2. E-step: compute the responsibilities
   γ_ij^m = π_j^m N(x_i | μ_j^m, Σ_j^m) / Σ_{l=1}^k π_l^m N(x_i | μ_l^m, Σ_l^m),   i = 1, …, n,  j = 1, …, k
   and the effective cluster sizes
   n_j^m = Σ_{i=1}^n γ_ij^m,   j = 1, …, k
3. M-step: update the parameters
   π_j^{m+1} = n_j^m / n
   μ_j^{m+1} = (1/n_j^m) Σ_{i=1}^n γ_ij^m x_i
   Σ_j^{m+1} = (1/n_j^m) Σ_{i=1}^n γ_ij^m (x_i − μ_j^{m+1})(x_i − μ_j^{m+1})^T
4. Recompute the log-likelihood
   L^{m+1} = (1/n) Σ_{i=1}^n log Σ_{j=1}^k π_j^{m+1} N(x_i | μ_j^{m+1}, Σ_j^{m+1})
   and repeat E- and M-step until L converges.
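A direct numpy/scipy translation of the steps above, as a minimal sketch: full covariances, a fixed number of iterations instead of a convergence test on L, and a small regularizer on the covariances for numerical stability; all names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, iters=100, seed=0):
    """Fit a k-component GMM to samples given as rows of X (n x d)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                                  # weights pi_j
    mu = X[rng.choice(n, size=k, replace=False)]              # means mu_j
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * k)    # covariances Sigma_j
    for _ in range(iters):
        # E-step: responsibilities gamma_ij
        dens = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], sigma[j])
                                for j in range(k)])           # n x k
        gamma = dens / dens.sum(axis=1, keepdims=True)
        nj = gamma.sum(axis=0)                                # effective cluster sizes n_j
        # M-step: update weights, means, covariances
        pi = nj / n
        mu = (gamma.T @ X) / nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            # small diagonal term keeps the covariance positive definite
            sigma[j] = (gamma[:, j, None] * diff).T @ diff / nj[j] + 1e-6 * np.eye(d)
    return pi, mu, sigma
```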
Example (2D)
Ground truth: means, covariance matrices and weights of the Gaussians (shown in the figure)
Input to the EM algorithm: 1000 samples drawn from this mixture
(Figures: the initial estimate and the fitted mixture after the 1st, 2nd, 3rd and subsequent iterations.)
Mean Shift
Given: data samples x_1, …, x_n ∈ R^d
Multi-variate kernel density estimate with radially symmetric kernel K(x) and window radius h:
  f̂(x) = (1 / (n h^d)) Σ_{i=1}^n K((x − x_i) / h),    with K(x) = c_{k,d} k(||x||²)
Mean shift vector:
  m_h(x) = [ Σ_{i=1}^n x_i g(||(x − x_i)/h||²)  /  Σ_{i=1}^n g(||(x − x_i)/h||²) ] − x
where g(x) = −k'(x) is the negative derivative of the kernel profile k(x).
The denominator Σ_i g(||(x − x_i)/h||²) is proportional to the density estimate at x (computed
with the kernel induced by g), and the bracketed term is the weighted mean of the samples in
the window around x.
Mean shift procedure, applied to every sample x_i:
1. Compute the mean shift vector m(x_i^t)
2. Translate the point: x_i^{t+1} = x_i^t + m(x_i^t)
Iterate until ∇f̂(x_i) = 0, i.e. until the point has reached a mode of the density estimate;
samples that converge to the same mode form one cluster.
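A minimal sketch of this procedure with a flat kernel profile, so g is 1 inside the window of radius h and 0 outside and the update is simply the mean of the samples in the window; grouping the converged points into clusters by proximity of their modes is an illustrative choice.

```python
import numpy as np

def mean_shift(X, h, iters=50, tol=1e-5):
    """Shift every sample (row of X, n x d) to a mode of the kernel density estimate."""
    modes = X.copy()
    for i in range(len(X)):
        x = X[i].copy()
        for _ in range(iters):
            # flat kernel: g = 1 inside the window of radius h, 0 outside
            in_window = np.linalg.norm(X - x, axis=1) <= h
            if not np.any(in_window):
                break
            shifted = X[in_window].mean(axis=0)       # weighted mean of the samples near x
            if np.linalg.norm(shifted - x) < tol:     # mean shift vector ~ 0: mode reached
                break
            x = shifted
        modes[i] = x
    # samples whose modes (almost) coincide belong to the same cluster
    labels = np.full(len(X), -1)
    centers = []
    for i, m in enumerate(modes):
        for c, center in enumerate(centers):
            if np.linalg.norm(m - center) < h / 2:
                labels[i] = c
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```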
(Figures: trajectory x_i^0, x_i^1, x_i^2, … of a sample under the mean shift iterations until it reaches a mode.)
Advantages:
  The number of clusters does not need to be specified in advance
Drawbacks:
  Computationally expensive: every iteration evaluates the kernel over all n samples for every point
  The result depends on the choice of the bandwidth h
Summary
Given: data samples X = [x_1, …, x_n] ∈ R^{d×n}
Decrease d → dimensionality reduction:
  PCA
  MDS
Decrease n → clustering:
  k-means
  EM
  Mean shift
  Spectral clustering
  Hierarchical clustering
Spectral Clustering
Model the similarity between data points as a graph.
Graphs: the samples become vertices; pairs of similar samples are connected by edges whose
weights w_ij encode the similarity between x_i and x_j.
Spectral Clustering
Model the similarity between data points as a graph with weight matrix W.
Graph Laplacian L = D − W, where D is the diagonal degree matrix with D_ii = Σ_j w_ij.
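A small numpy illustration of the Laplacian and of the property discussed next (the multiplicity of the eigenvalue 0 counts the connected components); the weight matrix is made up for illustration.

```python
import numpy as np

# weight matrix of a small graph with two connected components: {0,1,2} and {3,4}
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 2],
              [0, 0, 0, 2, 0]], dtype=float)

D = np.diag(W.sum(axis=1))          # degree matrix, D_ii = sum_j w_ij
L = D - W                           # graph Laplacian

eigvals = np.linalg.eigvalsh(L)     # real, non-negative eigenvalues
print(np.round(eigvals, 6))         # two eigenvalues are 0: two connected components
```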
Spectral Clustering
Properties of the Graph Laplacian L:
  For every vector f ∈ R^n:  f^T L f = (1/2) Σ_{i,j} w_ij (f_i − f_j)²
  L is symmetric and positive semi-definite
  The smallest eigenvalue of L is 0; the corresponding eigenvector is the constant one vector 1
  L has n non-negative, real-valued eigenvalues 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n
Spectral Clustering
The multiplicity k of the eigenvalue 0 of L equals the number of connected components in the graph.
Consider k = 1 and assume f is an eigenvector with eigenvalue 0. Then
  0 = f^T L f = (1/2) Σ_{i,j} w_ij (f_i − f_j)²
and all terms w_ij (f_i − f_j)² must vanish: wherever w_ij > 0 we get f_i = f_j, so f is constant
on a connected graph.
Spectral Clustering
The Laplacian of a graph with one connected component therefore has exactly one constant
eigenvector with eigenvalue 0.
For k > 1: w.l.o.g. assume that the vertices are ordered according to the connected components.
Then L is block diagonal with one block L_i per component:
  Each block L_i is itself a proper graph Laplacian of a connected graph.
  Each block contributes exactly one eigenvalue 0, with the indicator vector of its component
  (padded with zeros) as eigenvector, so 0 has multiplicity k.
Spectral Clustering
(Figures: a small example graph and its Graph Laplacian.)
Spectral Clustering
Similarity Graph:
(Figures: weight matrix W, Graph Laplacian L and eigenvectors of a small 4-vertex example.)
Eigenvalues: 0, 0.4, 2, 2
Spectral Clustering
Partition the graph into 2 sets of vertices such that the weight of the edges connecting them is minimal:
  Vertices in each set should be similar to vertices in the same set, but dissimilar to vertices
  from the other set.
  Such partitions are often not balanced: the minimum cut tends to simply split off isolated vertices.
Spectral Clustering
Partition the graph into 2 sets of vertices such that the weight of the edges connecting them
is minimal and the partitions have similar size.
Spectral Clustering
Min-Cut: minimize  cut(A, B) = Σ_{i∈A, j∈B} w_ij
Normalized Cut: minimize  Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B),  with vol(A) = Σ_{i∈A} D_ii
The factor 1/vol(A) + 1/vol(B) is minimal if vol(A) = vol(B), so balanced partitions are favored.
Spectral Clustering
Reformulate with the Graph Laplacian.
Construct f:  f_i = √(vol(B)/vol(A)) if v_i ∈ A,  f_i = −√(vol(A)/vol(B)) if v_i ∈ B
Then f^T L f = vol(V) · Ncut(A, B),  f^T D f = vol(V)  and  D f ⊥ 1.
Spectral Clustering
Reformulate Ncut:
  Minimize f^T L f
  subject to f of the above form, f^T D f = vol(V) and D f ⊥ 1
Relaxing f to arbitrary real values turns this into a generalized eigenvalue problem L f = λ D f,
solved by the eigenvector to the second-smallest eigenvalue.
For k > 2 we can similarly construct indicator vectors like f and relax the problem for minimization:
  Project the vertices into the subspace spanned by the first k eigenvectors of L.
  Clustering the embedded vertices (e.g. with k-means) yields the solution.
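A compact sketch of this recipe with the unnormalized Laplacian: build W from a Gaussian similarity (an illustrative choice), embed the vertices with the eigenvectors to the k smallest eigenvalues, and cluster the embedding with k-means (here scipy's kmeans2).

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, k, sigma=1.0):
    """Cluster samples (rows of X) into k groups via the graph Laplacian."""
    # fully connected similarity graph with Gaussian weights
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dist / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # unnormalized graph Laplacian
    # embed the vertices with the eigenvectors to the k smallest eigenvalues
    _, V = np.linalg.eigh(L)
    embedding = V[:, :k]                        # one row per vertex
    # cluster the embedded vertices
    _, labels = kmeans2(embedding, k, minit='++')
    return labels
```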
Spectral Clustering
(Figures: results of Mean Shift, k-means and Spectral Clustering compared on example data sets.)
Summary: model the similarity between data points as a graph; the eigenvectors of the graph
Laplacian embed the vertices so that a simple clustering of the embedding recovers the groups.
Hierarchical Clustering
Bottom up (agglomerative): start with every sample as its own cluster and iteratively merge the
closest pair of clusters.
Hierarchical Clustering
Requirements: a distance measure (linkage) between whole clusters, not only between individual samples.
Hierarchical Clustering
Algorithm: start with n singleton clusters and repeatedly merge the two clusters with the smallest
linkage distance until only one cluster (or the desired number of clusters) remains.
Common linkage criteria (see the sketch below):
  Maximum linkage: d(A, B) = max_{a∈A, b∈B} ||a − b||
  Average linkage: d(A, B) = (1 / (|A|·|B|)) Σ_{a∈A} Σ_{b∈B} ||a − b||
  Ward linkage: merge the pair of clusters that causes the smallest increase of the total
  within-cluster variance
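A sketch of the bottom-up procedure with maximum linkage, written naively (O(n³)) for clarity; names are illustrative.

```python
import numpy as np

def agglomerative_clustering(X, n_clusters):
    """Bottom-up hierarchical clustering of samples (rows of X) with maximum linkage."""
    clusters = [[i] for i in range(len(X))]          # start: every sample is its own cluster
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    def linkage(a, b):
        # maximum linkage: largest pairwise distance between the two clusters
        return max(dist[i, j] for i in a for j in b)

    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest linkage distance and merge it
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```

SciPy's scipy.cluster.hierarchy.linkage provides the same linkage criteria (complete, average, Ward) in an efficient implementation.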
Hierarchical Clustering
We can add connectivity constraints that restrict which clusters are allowed to be merged.
Hierarchical Clustering
Summary: produces a whole hierarchy of clusterings (dendrogram); the number of clusters can be
chosen afterwards by cutting the hierarchy at the desired level.