Approximating A Tensor As A Sum of Rank-One Components: Petros Drineas
Petros Drineas
Rensselaer Polytechnic Institute
Computer Science Department
To access my web page: search for "drineas"
Research interests in my group

Algorithmic tools: (randomized, approximate) matrix/tensor algorithms and, in particular, matrix/tensor decompositions.
Goal: learn a model for the underlying physical system generating the data.
Matrices/tensors in data mining

Data are represented by matrices.
Numerous modern datasets are in matrix form: we are given m objects and n features describing the objects, and A_ij shows the importance of feature j for object i.
[Figure: an m-objects-by-n-features data matrix A.]
Data are also represented by tensors.
Linear algebra and numerical analysis provide the fundamental mathematical and algorithmic tools to deal with matrix and tensor computations.
The TensorCUR algorithm
Mahoney, Maggioni & Drineas KDD 06, SIMAX 08; Drineas & Mahoney LAA 07

- Definition of Tensor-CUR decompositions
- Theory behind Tensor-CUR decompositions
- Applications of Tensor-CUR decompositions: recommendation systems, hyperspectral image analysis
[Figure: an m customers × n products × n products tensor, from which a small number of sample slabs are drawn.]
Theorem: unfold R along the a dimension and pre-multiply by CU; the error of the resulting approximation is bounded with respect to the best rank-k_a approximation to A^[a].
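To make the C·U·R idea concrete, here is a minimal matrix (order-2) CUR sketch in numpy. It is an illustration only, not the TensorCUR algorithm of the cited papers: rows and columns are sampled uniformly (the actual algorithms use data-dependent sampling probabilities), and U is built from the pseudoinverse of the sampled intersection.

```python
import numpy as np

def matrix_cur(A, c, r, seed=0):
    """Toy CUR sketch: approximate A (m x n) by C @ U @ R.

    C holds c sampled columns of A, R holds r sampled rows of A, and
    U is the pseudoinverse of the row/column intersection W.
    Uniform sampling is used here purely for illustration."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    cols = rng.choice(n, size=c, replace=False)
    rows = rng.choice(m, size=r, replace=False)
    C = A[:, cols]                 # m x c
    R = A[rows, :]                 # r x n
    W = A[np.ix_(rows, cols)]      # r x c intersection
    U = np.linalg.pinv(W)          # c x r, so C @ U @ R is m x n
    return C, U, R

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))  # rank-5 matrix
C, U, R = matrix_cur(A, c=20, r=20)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))  # small relative error
```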
Overview
Preliminaries, notation, etc.
Negative results
Positive results
Existential result (full proof)
Algorithmic result (sketch of the algorithm)
Open problems
Approximating a tensor

Fundamental Question
Given a tensor A and an integer k, find k rank-one tensors such that their sum is as close to A as possible.

Notation
A is an order-r tensor (i.e., a tensor with r modes).
A rank-one component is an outer product of r vectors: x^(1) ∘ x^(2) ∘ ... ∘ x^(r).
A rank-one component has the same dimensions as A, and its (i_1, ..., i_r) entry equals x^(1)_{i_1} x^(2)_{i_2} ... x^(r)_{i_r}.
Frobenius norm: ||A||_F = ( Σ_{i_1,...,i_r} A_{i_1 ... i_r}^2 )^{1/2}.
Spectral norm: ||A||_2 = max over unit vectors x^(1), ..., x^(r) of | A(x^(1), ..., x^(r)) | = | Σ_{i_1,...,i_r} A_{i_1 ... i_r} x^(1)_{i_1} ... x^(r)_{i_r} |.
(For r=2 these are equivalent to the corresponding matrix norms.)
We will measure the error of the approximation as ||A - Σ_{i=1}^k B_i||, where B_1, ..., B_k are the rank-one components; the results below bound this error in the spectral norm, relative to ||A||_F.
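For r=2 these definitions indeed reduce to the usual matrix norms; the quick numpy check below (an illustrative sketch, not part of the original slides) verifies both identities on a random matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Frobenius norm: square root of the sum of squared entries.
fro = np.sqrt((A ** 2).sum())
print(np.isclose(fro, np.linalg.norm(A, 'fro')))        # True

# Spectral norm: max over unit vectors x, y of |sum_ij A_ij x_i y_j|,
# which for a matrix is attained by the top singular vectors.
U, s, Vt = np.linalg.svd(A)
x, y = U[:, 0], Vt[0, :]                                # maximizing unit vectors
print(np.isclose(abs(x @ A @ y), s[0]))                 # True
print(np.isclose(s[0], np.linalg.norm(A, 2)))           # True
```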
Approximating a tensor: negative results

Fundamental Question
Given a tensor A and an integer k, find k rank-one tensors such that their sum is as close to A as possible.

Negative results (A is an order-r tensor)
1. For r=3, computing the minimal k such that A is exactly equal to the sum of k rank-one components is NP-hard [Hastad 89, 90].
2. For r=3, a set of k rank-one components minimizing the Frobenius norm error of the approximation might not even exist (L.-H. Lim 04).
3. For r=3, identifying k rank-one components that minimize the Frobenius norm error of the approximation (assuming such components exist) is NP-hard.
Approximating a tensor: positive results!

Fundamental Question
Given a tensor A and an integer k, find k rank-one tensors such that their sum is as close to A as possible.

Positive results! Both are from a paper of Kannan et al. in STOC 05. (A is an order-r tensor.)

1. (Existence) For any tensor A and any ε > 0, there exist at most k = 1/ε² rank-one tensors B_1, ..., B_k such that
   ||A - Σ_{i=1}^k B_i||_2 ≤ ε ||A||_F.

2. (Algorithmic) For any tensor A and any ε > 0, we can find at most k = 4/ε² rank-one tensors B_1, ..., B_k such that, with probability at least .75,
   ||A - Σ_{i=1}^k B_i||_2 ≤ ε ||A||_F.
   Time: polynomial in the size of A, but exponential in poly(1/ε) (see the algorithmic outline below).
The matrix case

Matrix result
For any matrix A and any ε > 0, we can find at most k = 1/ε² rank-one matrices B_1, ..., B_k such that
   ||A - Σ_{i=1}^k B_i||_2 ≤ ε ||A||_F.

To prove this, simply recall that the best rank-k approximation to A is A_k, as computed by the SVD. By setting k = 1/ε², we get
   ||A - A_k||_2 = σ_{k+1} ≤ ||A||_F / (k+1)^{1/2} ≤ ε ||A||_F,
since (k+1) σ_{k+1}² ≤ Σ_i σ_i² = ||A||_F².

From an existential perspective, the result is the same for matrices and higher-order tensors.
From an algorithmic perspective, in the matrix case the algorithm is (i) more efficient, (ii) returns fewer rank-one components, and (iii) has no failure probability.
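A small numerical illustration of the matrix claim (a sketch under the notation above, not the tensor algorithm): keep the top k = ⌈1/ε²⌉ rank-one SVD terms; since (k+1)·σ_{k+1}² ≤ Σ_i σ_i² = ||A||_F², the spectral norm of the residual is at most ε·||A||_F.

```python
import numpy as np

def top_rank_one_terms(A, eps):
    """Return the k = ceil(1/eps^2) leading rank-one SVD terms of A.

    Their sum A_k satisfies ||A - A_k||_2 = sigma_{k+1} <= eps * ||A||_F."""
    k = min(int(np.ceil(1.0 / eps ** 2)), min(A.shape))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k)]

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 150))
eps = 0.25
terms = top_rank_one_terms(A, eps)                 # at most 16 rank-one matrices
residual = A - sum(terms)
print(len(terms),
      np.linalg.norm(residual, 2) <= eps * np.linalg.norm(A, 'fro'))  # 16 True
```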
Existential result: the proof

1. (Existence) For any tensor A and any ε > 0, there exist at most k = 1/ε² rank-one tensors B_1, ..., B_k such that ||A - Σ_{i=1}^k B_i||_2 ≤ ε ||A||_F.

Proof
If ||A||_2 ≤ ε ||A||_F, then we are done. Otherwise, by the definition of the spectral norm of the tensor, there exist (w.l.o.g. unit-norm) vectors x^(1), ..., x^(r) such that the scalar λ = A(x^(1), ..., x^(r)) satisfies λ > ε ||A||_F.

Consider the tensor B = A - λ x^(1) ∘ x^(2) ∘ ... ∘ x^(r). We can prove (easily) that
   ||B||_F² = ||A||_F² - λ².

Now combine: since λ > ε ||A||_F, each such step reduces the squared Frobenius norm by more than ε² ||A||_F².

We now iterate this process using B instead of A. Since at every step we reduce the Frobenius norm of the residual, this process will eventually terminate. The number of steps is at most k = 1/ε², thus leading to k rank-one tensors.
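The proof suggests a greedy "peel off one rank-one piece at a time" procedure. The only non-constructive step is finding the unit vectors that achieve (or nearly achieve) the spectral norm; the sketch below substitutes an alternating power-iteration heuristic for that step (the exact maximization is NP-hard in general), so it illustrates the structure of the argument rather than the guaranteed algorithm of the next section.

```python
import numpy as np

def rank_one_heuristic(T, iters=100, seed=0):
    """Alternating heuristic for unit vectors x, y, z that make
    T(x, y, z) = sum_ijk T[i,j,k] x_i y_j z_k large.  No optimality guarantee."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(T.shape[0]); x /= np.linalg.norm(x)
    y = rng.standard_normal(T.shape[1]); y /= np.linalg.norm(y)
    for _ in range(iters):
        z = np.einsum('ijk,i,j->k', T, x, y); z /= np.linalg.norm(z)
        y = np.einsum('ijk,i,k->j', T, x, z); y /= np.linalg.norm(y)
        x = np.einsum('ijk,j,k->i', T, y, z); x /= np.linalg.norm(x)
    lam = np.einsum('ijk,i,j,k->', T, x, y, z)
    return lam, x, y, z

def greedy_peel(T, eps):
    """Repeatedly subtract lam * x∘y∘z from the residual, as in the proof.
    Each subtraction lowers ||residual||_F^2 by exactly lam^2."""
    A_F = np.linalg.norm(T)                      # Frobenius norm of the input
    residual, terms = T.copy(), []
    for _ in range(int(np.ceil(1.0 / eps ** 2))):
        lam, x, y, z = rank_one_heuristic(residual)
        if abs(lam) <= eps * A_F:                # proxy for ||residual||_2 <= eps*||T||_F
            break
        terms.append(lam * np.einsum('i,j,k->ijk', x, y, z))
        residual = residual - terms[-1]
    return terms, residual

# Demo: a tensor with three planted rank-one components plus noise.
rng = np.random.default_rng(1)
n = 10
T = 0.5 * rng.standard_normal((n, n, n))
for _ in range(3):
    a, b, c = (v / np.linalg.norm(v) for v in rng.standard_normal((3, n)))
    T += 20.0 * np.einsum('i,j,k->ijk', a, b, c)
terms, R = greedy_peel(T, eps=0.3)
print(len(terms), np.linalg.norm(R) / np.linalg.norm(T))
```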
Algorithmic result: outline

2. (Algorithmic) For any tensor A and any ε > 0, we can find at most k = 4/ε² rank-one tensors such that, with probability at least .75, ||A - Σ_{i=1}^k B_i||_2 ≤ ε ||A||_F.

Ideas:
For simplicity, focus on order-3 tensors. The only part of the existential proof that is not constructive is how to identify unit vectors x, y, and z such that
   A(x, y, z) = Σ_{j1,j2,j3} A_{j1 j2 j3} x_{j1} y_{j2} z_{j3}
is maximized.
Algorithmic result: outline (cont'd)

Good news!
If x and y are known, then in order to maximize A(x, y, z) over all unit vectors z, we can set z to be the (normalized) vector whose j3-th entry is
   z_{j3} = Σ_{j1,j2} A_{j1 j2 j3} x_{j1} y_{j2}, for all j3.
Algorithmic result: outline (cont'd)

Approximating z
Instead of computing the entries of z exactly, we approximate them by sub-sampling:
we draw a set S of random tuples (j1, j2) — roughly 1/ε² such tuples suffice — and we approximate the entries of z using the tuples in S only!
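A small sketch of this sub-sampling step (illustrative code, with names of my own choosing): the exact z has entries z_{j3} = Σ_{j1,j2} A[j1,j2,j3]·x_{j1}·y_{j2}, and the estimate uses only the tuples in S. For simplicity the tuples here are drawn uniformly and reweighted to keep the estimate unbiased; the actual algorithm draws them from the weighted distribution described on the next slide.

```python
import numpy as np

def z_exact(A, x, y):
    """z_{j3} = sum_{j1,j2} A[j1,j2,j3] * x[j1] * y[j2], then normalized."""
    z = np.einsum('ijk,i,j->k', A, x, y)
    return z / np.linalg.norm(z)

def z_sampled(A, x, y, num_tuples, seed=0):
    """Estimate z from a sample S of (j1, j2) tuples only.

    Uniform tuple sampling with weights 1/(|S| * p) keeps the estimate
    unbiased; weighted sampling would reduce its variance."""
    rng = np.random.default_rng(seed)
    n1, n2, _ = A.shape
    p = 1.0 / (n1 * n2)                            # uniform tuple probability
    j1 = rng.integers(0, n1, size=num_tuples)
    j2 = rng.integers(0, n2, size=num_tuples)
    z = sum(A[a, b, :] * x[a] * y[b] / (num_tuples * p) for a, b in zip(j1, j2))
    return z / np.linalg.norm(z)

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 30, 30))
x = rng.standard_normal(30); x /= np.linalg.norm(x)
y = rng.standard_normal(30); y /= np.linalg.norm(y)
# Cosine similarity between the exact z and the subsampled estimate.
print(np.dot(z_exact(A, x, y), z_sampled(A, x, y, num_tuples=400)))
```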
Algorithmic result: outline (cont'd)

Weighted sampling
Weighted sampling is used in order to pick the tuples (j1, j2): each tuple is drawn with a probability that depends on the magnitudes of the corresponding entries of A, rather than uniformly.
Algorithmic result: outline (cont'd)

Exhaustive search in a discretized interval
We only need the values of x_{j1} and y_{j2} for the tuples (j1, j2) in the set S.
We exhaustively try all possible values for them (by placing a fine grid on the interval [-1, 1]).
This leads to a number of trials that is exponential in |S|.
Algorithmic result: outline (cont'd)

Recursively figure out x and y
Each assignment of values to x_{j1} and y_{j2} for (j1, j2) in S leads to a candidate vector z.
We treat that vector as the true z, and we try to figure out x and y recursively!
This is a smaller (lower-order) problem.
Algorithmic result: outline (cont'd)

Done!
Return the best x, y, and z.
The running time is dominated by the exhaustive search over the sample S, which is not too bad assuming that ε is a constant.
The algorithm can also be generalized to higher-order tensors.
Approximating Max-r-CSP problems

Max-r-CSP (= Max-SNP)
The goal of the Kannan et al. paper was to design PTAS (polynomial-time approximation schemes) for a large class of Max-r-CSP problems.
Max-r-CSP problems are constraint satisfaction problems with n Boolean variables and m constraints: each constraint is the logical OR of exactly r variables.
The goal is to maximize the number of satisfied constraints.
Max-r-CSP problems model a large number of problems, including Max-Cut, Bisection, Max-k-SAT, 3-coloring, Dense-k-subgraph, etc.
Interestingly, tensors may be used to model Max-r-CSP as an optimization problem, and tensor decompositions help reduce its dimensionality (a toy illustration follows the references below).
See also: Arora, Karger & Karpinski 95; Frieze & Kannan 96; Goldreich, Goldwasser & Ron 96; Alon, de la Vega, Kannan & Karpinski 02, 03; Drineas, Kannan & Mahoney 05, 07.
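As a toy illustration of the "constraints as a low-order polynomial / tensor contraction" viewpoint (a sketch of my own, not code from the paper): for Max-Cut, the number of cut edges is a quadratic form in the 0/1 assignment vector, i.e. an order-2 tensor contraction; constraints of arity r give order-r tensors in the same way.

```python
import numpy as np
from itertools import product

# Tiny graph on 4 vertices; each edge (i, j) is the constraint "x_i XOR x_j".
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
n = 4

# cut(x) = sum over edges of (x_i + x_j - 2*x_i*x_j) is a degree-2 polynomial,
# encoded below by a matrix Q (an order-2 tensor) plus a linear term.
Q = np.zeros((n, n))
lin = np.zeros(n)
for i, j in edges:
    Q[i, j] -= 1.0
    Q[j, i] -= 1.0
    lin[i] += 1.0
    lin[j] += 1.0

def cut_value(x):
    x = np.asarray(x, dtype=float)
    return x @ Q @ x + lin @ x          # tensor-contraction form of the objective

# Brute force over all 2^n assignments; agrees with direct edge counting.
best = max(product([0, 1], repeat=n), key=cut_value)
direct = sum(1 for i, j in edges if best[i] != best[j])
print(best, int(cut_value(best)), direct)
```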
Open problems
Similar error bounds for other norm combinations?
What about Frobenius on both sides, or spectral on both sides?
Existential and/or algorithmic results are interesting.
Is it possible to get constant (or any) factor approximations in the case where the
optimal solution exists?
Improved algorithmic results
The exponential dependency on ε is totally impractical.
Provable algorithms would be preferable.