03 Matrix
node/link
• Level 1: diameter, connectivity, graph-level classification, graph-level embedding, graph kernel, graph structure learning, graph generator,…
• Level 2: frequent subgraphs, clustering, community detection, motif, teams, dense subgraphs, subgraph matching, NetFair, …
• Level 3: node proximity, node classification, link prediction, anomaly detection, node embedding, network alignment, NetFair, …
• Beyond: network of X, …
2
Matrix & Tensor Tools
• Matrix Tools
– Proximity (covered in Lecture 2)
– Low-rank approximation
– Co-clustering
• Tensor Tools
3
Motivation
• Q: How to find patterns?
– e.g., communities, anomalies, etc.
• A (Common Approach): Low-Rank
Approximation (LRA) for Adjacency Matrix.
A ≈ L × M × R
4
Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos: Colibri: fast
mining of large static and dynamic graphs. KDD 2008: 686-694
LRA for Graph Mining
Example: Author × Conference adjacency matrix A
(authors: John, Tom, Bob, Carl, Van, Roy; conferences: ICDM, KDD, ISMB, RECOMB)

        ICDM  KDD  ISMB  RECOMB
John      1    1    0     0
Tom       1    1    0     0
Bob       1    1    0     0
Carl      0    1    1     1
Van       0    0    1     1
Roy       0    0    1     1
5
LRA for Graph Mining: Communities
Adjacency matrix A (authors John, Tom, Bob, Carl, Van, Roy × conferences ICDM, KDD, ISMB, RECOMB) ≈ L × M × R
– L: Author-Group matrix
– M: Group-Group Interaction matrix
– R: Conf.-Group matrix
– The groups found in L and R are the communities
6
LRA for Graph Mining: Anomalies
The same Author × Conf. adjacency matrix: A ≈ L × M × R
Recon. error is high
→ ‘Carl’ is abnormal
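To make this concrete, here is a minimal Python/numpy sketch on the toy author-conference matrix above; it uses a rank-2 truncated SVD as the low-rank approximation (Colibri/CUR would instead build L from actual columns of A), and the per-row reconstruction error singles out 'Carl':

    import numpy as np

    authors = ['John', 'Tom', 'Bob', 'Carl', 'Van', 'Roy']
    # Rows: authors; columns: ICDM, KDD, ISMB, RECOMB (the toy matrix above)
    A = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 1, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

    # Rank-2 low-rank approximation (truncated SVD stands in for L x M x R here)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]

    # Per-author (row-wise) reconstruction error: 'Carl' has the largest one
    err = np.linalg.norm(A - A2, axis=1)
    print(dict(zip(authors, err.round(3))))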
7
Challenges – Problem 1
• Prob. 1: Given a static graph A
– (C1) How to get (L, M, R) efficiently?
  • Both time and space
– (C2) What is the interpretation of (L, M, R)?
8
Challenges – Problem 2
• Prob. 2: Given a dynamic graph A_t (t = 1, 2, …)
– (C3) How to get (L_t, M_t, R_t) incrementally?
  • Track patterns over time
9
Roadmap - LRA
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
10
Overview
A ≈ L × M × R
– Find L: this is where the different methods differ
– M and R: the projection of A onto the subspace spanned by L (same for different methods)
11
Matrix & Vector
• Matrix B: rows = authors (Philip Yu, William Cohen, John Smith), columns = conferences (SIGMOD, ICML); each entry counts an author's papers at that venue (e.g., Philip Yu: 3 SIGMOD papers, 1 ICML paper; John Smith: 0 and 0). Each author is thus a point/vector in the SIGMOD-ICML plane.
12
Column Space
• The same matrix B: its column space is the subspace spanned by B's columns (the SIGMOD column and the ICML column).
13
Projection & Projection Matrix
The projection of a vector v (e.g., a KDD column) onto the column space of B:
  ṽ = B (B^T B)^+ B^T v

Core Matrix
Projecting every column of A in the same way gives the low-rank approximation
  Ã = B (B^T B)^+ B^T A = L × M × R,  with L = B, M = (B^T B)^+ (the "core matrix"), R = B^T A
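A small numpy sketch of this projection and of the (L, M, R) factors; the matrix B below is only in the spirit of the running example (its entries are placeholders, not necessarily the slide's exact values):

    import numpy as np

    # Toy author-venue matrix B (rows: three authors; columns: SIGMOD, ICML)
    B = np.array([[3.0, 1.0],
                  [1.0, 1.0],
                  [0.0, 0.0]])

    # Core matrix and projection onto the column space of B: P = B (B^T B)^+ B^T
    M = np.linalg.pinv(B.T @ B)          # core matrix M = (B^T B)^+
    P = B @ M @ B.T

    v = np.array([1.0, 2.0, 3.0])        # some new column, e.g. a "KDD" column
    v_tilde = P @ v                      # its projection onto span(B)

    # For a whole matrix A: A ~ L M R with L = B, M = (B^T B)^+, R = B^T A
    A = np.random.rand(3, 5)
    L, R = B, B.T @ A
    A_tilde = L @ M @ R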
15
Roadmap
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
16
Singular-Value-Decomposition (SVD)
A = [a_1 a_2 … a_m] ≈ σ_1 u_1 v_1^T + … + σ_k u_k v_k^T = U_k Σ_k V_k^T
– u_1, …, u_k: left singular vectors; v_1, …, v_k: right singular vectors
– σ_1 ≥ … ≥ σ_k: singular values
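For concreteness, a minimal numpy sketch of a truncated (rank-k) SVD on a toy random matrix (not the lecture's data):

    import numpy as np

    A = np.random.rand(100, 80)                   # toy matrix
    k = 5                                         # target rank

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

    # Relative reconstruction error in the Frobenius norm
    print(np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro'))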
18
SVD: advantages
• Optimal Low-Rank Approximation
– In both the L2 (spectral) norm and the Frobenius norm (L_F)
19
SVD: drawbacks
• (C1) Efficiency (A = U Σ V^T)
– Time: O(min(n^2 m, n m^2)) [footnote: or O(|E| · #iterations) for iterative methods on sparse graphs]
– Space: (U, V) are dense
• (C2) Interpretation
20
SVD: drawbacks
• (C3) Dynamic: not easy
A_t = U_t Σ_t V_t^T  →  A_{t+1} = U_{t+1} Σ_{t+1} V_{t+1}^T (recomputed from scratch)
21
Roadmap
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
22
CUR (CX) decomposition
[Drineas+ 2005]
A ≈ C (C^T C)^+ C^T A
• Sample columns from A (A: n × m; C: the sampled columns)
• Project A onto them
– Left matrix: C
– Middle matrix U: (C^T C)^+
– Right matrix R: C^T A
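A minimal sketch of the CX idea, using squared-column-norm sampling (a common choice; the published CUR/CX algorithms also rescale the sampled columns, which is omitted here, so this is illustrative rather than the exact procedure of [Drineas+ 2005]):

    import numpy as np

    def cx_decomposition(A, c, seed=0):
        # Sample c columns with probability proportional to squared column norms
        rng = np.random.default_rng(seed)
        p = (A ** 2).sum(axis=0)
        p = p / p.sum()
        idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
        C = A[:, idx]                      # left matrix: sampled columns
        U = np.linalg.pinv(C.T @ C)        # middle matrix: (C^T C)^+
        R = C.T @ A                        # right matrix: C^T A
        return C, U, R

    A = np.random.rand(200, 150)
    C, U, R = cx_decomposition(A, c=30)
    A_tilde = C @ U @ R                    # low-rank reconstruction of A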
CUR (CX): advantages
• (C0) Quality: Near-Optimal
• (C1) Efficiency (better than SVD)
– Time: O(c^2 n) or O(c^3 + c m)
  • (c is # of sampled columns)
– Space (C, R) are sparse
• (C2) Interpretation
24
CUR (CX): drawbacks
• (C1) Redundancy in C
– e.g., the sampled C may contain 3 copies of one column ('green'), 2 copies of another ('red'), and 2 copies of a third ('purple'), where purple = 0.5 × green + red, i.e., the columns are linearly dependent
25
Redundant Columns Do Not Help
(e.g., sampling the KDD column several times alongside the ICML, SIGMOD, VLDB columns spans exactly the same subspace)
Observations:
#1: Does not help the approximation
#2: Wastes time & space
26
CUR (CX): drawbacks
• (C3) Dynamic: not easy
– The columns sampled at time t (C_t) do not directly give the columns needed at time t+1 (C_{t+1})
27
Roadmap
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
28
CMD [Sun+ 2007]
CUR (CX) → CMD: remove duplicate columns from the sampled C
– CUR keeps, e.g., 3 copies of green, 2 copies of red, 2 copies of purple (purple = 0.5 × green + red)
– CMD keeps a single copy of each distinct sampled column; linearly dependent but non-identical columns (like purple) remain
– Left matrix: the deduplicated C; Middle matrix: (C^T C)^+; Right matrix: C^T A
30
Roadmap
• Motivation
• Survey: Existing Methods
• Proposed Methods: Colibri
– Colibri-S for static graphs (Problem 1)
– Colibri-D for dynamic graphs (Problem 2)
• Experimental Results
• Conclusion
31
Colibri-S: Basic Idea
CUR (CX) → Colibri-S: keep only linearly independent sampled columns
– Left matrix L: the linearly independent columns chosen from the sampled C
– Middle matrix M: (L^T L)^{-1}
– Right matrix R: L^T A
33
A: Find L & M incrementally!
– Start from the initially sampled matrix C and scan its columns one by one
– Test whether the next column is redundant (already in the span of the current L)
– If No: expand L & M; if Yes: skip it
34
Step 1: How to test if KDD is redundant ?
– Project the KDD column onto the subspace spanned by the columns already in L (here ICML and SIGMOD), using the current core matrix:
  projection = L × M_old × (L^T × KDD)
– residual = KDD − projection
– If the residual is (numerically) zero, KDD lies in the span of {ICML, SIGMOD} and is redundant
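A small numpy sketch of this redundancy test (column names, tolerance, and toy numbers are illustrative):

    import numpy as np

    def is_redundant(L, x, tol=1e-10):
        # Project the candidate column x onto the subspace spanned by the
        # columns already in L; if the residual is numerically zero, x is
        # redundant and can be skipped.
        M = np.linalg.pinv(L.T @ L)          # current core matrix
        residual = x - L @ (M @ (L.T @ x))
        return float(residual @ residual) <= tol

    L = np.array([[1.0, 0.0],                # e.g., ICML and SIGMOD columns
                  [0.0, 1.0],
                  [1.0, 1.0]])
    kdd = np.array([1.0, 1.0, 2.0])          # "KDD" column = ICML + SIGMOD
    print(is_redundant(L, kdd))              # True: it lies in the current span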
35
Step 2: How to update core matrix ?
– Current core matrix: M_old = ([ICML SIGMOD]^T [ICML SIGMOD])^{-1}
– After appending the non-redundant KDD column:
  M_new = ([ICML SIGMOD KDD]^T [ICML SIGMOD KDD])^{-1}
– Question: how to get M_new from M_old without a fresh matrix inversion?
36
Q: How to update the core matrix?
A: Incrementally.
Theorem 1 [Tong et al. KDD 2008]: M_new has a closed form in terms of M_old, the new (KDD) column, and its residual after projection onto the current subspace; no inversion from scratch is needed. We only need to know the new column's projection coefficients and its residual.
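A sketch of such an incremental core update via the standard block-matrix-inverse identity; this conveys the idea behind Theorem 1, but it is not a verbatim transcription of the paper's formula:

    import numpy as np

    def expand_core(L, M_old, x):
        # Append a non-redundant column x to L and update M = (L^T L)^{-1}
        # using the block-matrix inverse (no inversion from scratch).
        y = M_old @ (L.T @ x)                      # projection coefficients
        delta = float(x @ x - x @ (L @ y))         # squared residual norm
        top = np.hstack([M_old + np.outer(y, y) / delta, -y[:, None] / delta])
        bottom = np.hstack([-y[None, :] / delta, np.array([[1.0 / delta]])])
        return np.hstack([L, x[:, None]]), np.vstack([top, bottom])

    L = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    M = np.linalg.inv(L.T @ L)
    x = np.array([0.0, 1.0, 0.0])                  # a new, non-redundant column
    L_new, M_new = expand_core(L, M, x)
    print(np.allclose(M_new, np.linalg.inv(L_new.T @ L_new)))   # True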
37
Colibri-S vs. CUR(CMD)
• (C0) Quality: Colibri-S = CUR (CMD)
• (C1) Efficiency, example: if c = 1,000 columns are sampled but only c̃ = 200 of them are linearly independent, Colibri-S is about (1000/200)^3 = 125× faster
44
Problem Definition
• Given (e.g., Author-Conference Graphs)
A_1, A_2, A_3, …
• Find, incrementally:
(L_1, M_1, R_1), (L_2, M_2, R_2), (L_3, M_3, R_3), …
45
Colibri-D for dynamic graphs
– At time t: A_t ≈ L_t × M_t × R_t
– At time t+1: how to obtain L_{t+1}, M_{t+1}, R_{t+1} from (L_t, M_t, R_t)?
47
Colibri-D: How-To
– At time t: some of the initially sampled columns were selected into L_t (they span the subspace), the rest were redundant
– At time t+1: only the columns that changed from t need to be re-examined; the unchanged columns of L_t still span their part of the subspace and are reused for L_{t+1} and M_{t+1}
48
How to Get the Core Matrix for the Unchanged Columns?
– At time t we already have M_t = [(L_t)^T L_t]^{-1}
– Let L̃_t be L_t restricted to its unchanged columns; we need M̃_t = [(L̃_t)^T L̃_t]^{-1}, ideally without inverting from scratch
49
How to Get the Core Matrix for the Unchanged Columns? (cont.)
Let s: # of changed columns in L_t (a code sketch of one such downdating step follows below)
51
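A sketch of one way to 'downdate' the core matrix when s columns are dropped, using the Schur-complement identity for the inverse of a principal submatrix; again this is the flavor of update Colibri-D relies on, not the paper's exact formula:

    import numpy as np

    def downdate_core(M, changed):
        # Given M = (L^T L)^{-1} and the indices of the changed columns,
        # return (Ltilde^T Ltilde)^{-1} for the unchanged columns only.
        keep = np.setdiff1d(np.arange(M.shape[0]), changed)
        M11 = M[np.ix_(keep, keep)]
        M12 = M[np.ix_(keep, changed)]
        M22 = M[np.ix_(changed, changed)]
        return M11 - M12 @ np.linalg.inv(M22) @ M12.T

    L = np.random.rand(50, 6)
    M = np.linalg.inv(L.T @ L)
    changed = np.array([1, 4])                     # s = 2 changed columns
    L_keep = np.delete(L, changed, axis=1)
    print(np.allclose(downdate_core(M, changed),
                      np.linalg.inv(L_keep.T @ L_keep)))        # True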
Comparison SVD, CUR/CMD vs. Colibri
52
Roadmap
• Motivation
• Survey: Existing Methods
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
53
Experimental Setup
• Data set: network traffic
– 21,837 sources/destinations
– 1,222 consecutive hours (~2 months)
– ~22,800 edges per hour
• Metrics: reconstruction accuracy and space cost
54
Performance
of Colibri-S (vs. SVD, CUR, CMD)
• Accuracy: same as the baselines (91%+)
• Time: 12× faster than CMD, 28× faster than CUR
• Space: ~1/3 of CMD, ~10% of CUR
55
Performance of Colibri-D
(vs. CMD, the prior best method, and Colibri-S)
• Data: network traffic (21,837 nodes, 1,220 hours, ~22,800 edges/hr)
• Accuracy: same (93%+)
• Time vs. # of changed columns: Colibri-D achieves up to 112× speedups
56
Conclusion: Colibri
• Colibri-S (for static graphs)
– Idea: remove redundancy
– Up to 52x speedup; 2/3 space saving
– No quality loss (w.r.t. CUR/CMD)
• Colibri-D (for dynamic graphs)
– Idea: leverage “smoothness”
– Up to 112× speedup over CMD
57
optional
58
Graph Mining by Low-Rank Approximation
More on LRA
• Q0: SVD + example-based LRA
• Q1: Nonnegative Matrix Factorization
• Q2: Non-negative Residual Matrix Factorization
• Q3: Nuclear norm related technologies
60
Low Rank Approximation
• Nonnegative Matrix Factorization (NMF)
Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788-791 (21 October 1999).

Nonnegative Matrix Factorization (NMF)
X ≈ F G^T with F (m × r) ≥ 0 and G (n × r) ≥ 0; each row of G holds the r nonnegative coefficients for one column of X
62
NMF Solutions: Multiplicative Updates
• Multiplicative update method
Daniel D. Lee and H. Sebastian Seung (2001). Algorithms for Non-negative Matrix Factorization. NIPS 2001.
H. Zhou, K. Lange, and M. Suchard (2010). Graphical processing units and high-dimensional optimization. Statistical Science, 25:311-324.
63
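A minimal sketch of Lee & Seung-style multiplicative updates for X ≈ F G^T (the epsilon guard, iteration count, and random data are illustrative):

    import numpy as np

    def nmf_multiplicative(X, r, n_iter=200, eps=1e-9, seed=0):
        # Multiplicative updates keep F, G nonnegative if initialized so;
        # no convergence test or regularization in this sketch.
        rng = np.random.default_rng(seed)
        m, n = X.shape
        F = rng.random((m, r))
        G = rng.random((n, r))
        for _ in range(n_iter):
            F *= (X @ G) / (F @ (G.T @ G) + eps)
            G *= (X.T @ F) / (G @ (F.T @ F) + eps)
        return F, G

    X = np.random.rand(50, 40)
    F, G = nmf_multiplicative(X, r=5)
    print(np.linalg.norm(X - F @ G.T, 'fro') / np.linalg.norm(X, 'fro'))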
NMF Solutions: Alternating Nonnegative Least Squares
• Initialize F and G with nonnegative values
• Iterate the following procedure:
– Fixing F, solve min_{G ≥ 0} ||X − F G^T||_F for G
– Fixing G, solve min_{F ≥ 0} ||X − F G^T||_F for F
(a minimal code sketch follows after the references below)
P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(1):111-126, 1994.
C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19 (2007), 2756-2779.
D. Kim, S. Sra, I. S. Dhillon. Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem. SDM 2007.
J. Kim and H. Park. Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons. ICDM 2008.
64
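The promised sketch of ANLS for X ≈ F G^T, using scipy's nnls solver row by row and column by column; practical implementations use the faster methods cited above (projected gradient, block pivoting, Newton-type), so this only shows the alternating structure:

    import numpy as np
    from scipy.optimize import nnls

    def nmf_anls(X, r, n_iter=30, seed=0):
        rng = np.random.default_rng(seed)
        m, n = X.shape
        F = rng.random((m, r))
        G = rng.random((n, r))
        for _ in range(n_iter):
            # Fix G: each row of F solves a small nonnegative least-squares problem
            F = np.vstack([nnls(G, X[i, :])[0] for i in range(m)])
            # Fix F: same for each row of G (i.e., each column of X)
            G = np.vstack([nnls(F, X[:, j])[0] for j in range(n)])
        return F, G

    X = np.random.rand(30, 20)
    F, G = nmf_anls(X, r=4)
    print(np.linalg.norm(X - F @ G.T, 'fro') / np.linalg.norm(X, 'fro'))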
Application of NMF: Privacy-Aware On-line User
Role Tracking [AAAI11]
• Problem Definitions
– Given: the user-activity log that changes over time
– Monitor: (1) the user role/cluster; and (2) the role/cluster description.
• Design Objective
– (1) Privacy-aware; and (2) Efficiency (in both time and space).
65
Key Ideas
• Minimize an upper bound of the original/exact objective function
  min ||(X + ΔX) − (F + ΔF)(G + ΔG)^T||_F  (= min ||X + ΔX − F G^T − ΔF G^T − F ΔG^T − ΔF ΔG^T||_F)
  (X changes by ΔX at each time stamp; F and G are updated by ΔF and ΔG)
H. Tong, C. Lin: Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM 2011.
70
Visual Comparisons
[Figure: visual comparison of Original vs. NrMF vs. SVD reconstructions]
71
Low Rank Approximation
• Nonnegative Matrix Factorization
• Non-negative Residual Matrix Factorization
• Nuclear norm related technologies
• Convex relaxation
M. Fazel, H. Hindi, S. Boyd. A Rank Minimization Heuristic with Application to Minimum Order System Approximation. Proceedings American Control Conference, 6:4734-4739, June 2001.
73
Nuclear Norm Minimization
• Singular Value Thresholding
– http://svt.stanford.edu/
• Accelerated gradient
– http://www.public.asu.edu/~jye02/Software/SLEP/index.htm
• Interior point methods
– http://abel.ee.ucla.edu/cvxopt/applications/nucnrm/
J-F. Cai, E.J. Candès and Z. Shen. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM Journal on
Optimization. Volume 20 Issue 4, January 2010 Pages 1956-1982.
Shuiwang Ji and Jieping Ye. An Accelerated Gradient Method for Trace Norm Minimization. The Twenty-Sixth International
Conference on Machine Learning (ICML 2009)
Z. Liu, Lieven Vandenberghe. Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications (2009).
74
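A minimal sketch of the singular value thresholding (shrinkage) operator at the core of the SVT approach; tau and the toy matrix are illustrative:

    import numpy as np

    def svt_shrink(X, tau):
        # D_tau(X): soft-threshold the singular values of X, i.e. the
        # proximal operator of tau * (nuclear norm).
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    X = np.random.rand(20, 15)
    X_low = svt_shrink(X, tau=3.0)
    # Thresholding drives small singular values to zero, so the rank drops
    print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X_low))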
◼ From LRA to Co-clustering
Co-clustering
• Let X and Y be discrete random variables
– X and Y take values in {1, 2, …, m} and {1, 2, …, n}
– p(X, Y) denotes the joint probability distribution—if
not known, it is often estimated based on co-occurrence
data
– Application areas: text mining, market-basket analysis,
analysis of browsing behavior, etc.
• Key Obstacles in Clustering Contingency Tables
– High Dimensionality, Sparsity, Noise
– Need for robust and scalable algorithms
Reference:
1. Dhillon et al. Information-Theoretic Co-clustering, KDD’03
76
P(X, Y) (m × n, e.g., terms × documents):

.05 .05 .05   0    0    0
.05 .05 .05   0    0    0
 0   0   0   .05  .05  .05
 0   0   0   .05  .05  .05
.04 .04  0   .04  .04  .04
.04 .04 .04   0   .04  .04

Co-clustering approximates P(X, Y) with P(X | X̂) · P(X̂, Ŷ) · P(Y | Ŷ)^T,
where P(X | X̂) is m × k, P(X̂, Ŷ) is k × l, and P(Y | Ŷ)^T is l × n (here k = 3, l = 2):

P(X | X̂):        P(X̂, Ŷ):      P(Y | Ŷ)^T:
.5  0  0          .3  0          .36 .36 .28  0    0    0
.5  0  0           0 .3           0   0   0  .28  .36  .36
 0 .5  0          .2 .2
 0 .5  0
 0  0 .5
 0  0 .5

The resulting approximation of P(X, Y):

.054 .054 .042  0    0    0
.054 .054 .042  0    0    0
 0    0    0   .042 .054 .054
 0    0    0   .042 .054 .054
.036 .036 .028 .028 .036 .036
.036 .036 .028 .028 .036 .036
77
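A quick numpy check that the factors above reproduce the approximation (values exactly as reconstructed on this slide):

    import numpy as np

    # p(X | Xhat): 6 terms x 3 term-groups
    PX = np.array([[.5, 0, 0], [.5, 0, 0],
                   [0, .5, 0], [0, .5, 0],
                   [0, 0, .5], [0, 0, .5]])
    # p(Xhat, Yhat): 3 term-groups x 2 doc-groups
    PXY = np.array([[.3, 0], [0, .3], [.2, .2]])
    # p(Y | Yhat), stored transposed: 2 doc-groups x 6 docs
    PYT = np.array([[.36, .36, .28, 0, 0, 0],
                    [0, 0, 0, .28, .36, .36]])

    Q = PX @ PXY @ PYT      # co-clustering approximation of p(X, Y)
    print(np.round(Q, 3))
    # rows 1-2: .054 .054 .042 0 0 0 ; rows 5-6: .036 .036 .028 .028 .036 .036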
Same example, annotated: the two column clusters correspond to two document groups (e.g., 'med. doc' vs. 'cs doc'); the left factor is the term × term-group matrix, the right factor is the doc-group × doc matrix, and the middle factor couples the groups, so the approximation is (term × term-group) × (group × group) × (doc-group × doc), with the same numeric values as above.
78
Co-clustering
Observations
• uses KL divergence, instead of L2 or LF
• the middle matrix is not diagonal
– we’ll see that again in the Tucker tensor
decomposition
79
Matrix & Tensor Tools
• Matrix Tools
• Tensor Tools
– Tensor Basics
– Tucker
• Tucker 1
• Tucker 2
• Tucker 3
– PARAFAC
80
Tensor Basics
Reminder: SVD
A (m × n) ≈ U Σ V^T
– Best rank-k approximation in L2 or LF
82
Reminder: SVD
A (m × n) = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + …  (a sum of rank-1 matrices)
Analogously, an I × J × K tensor can be written as a sum of rank-1 terms, using factor matrices of size I × R and J × R (and K × R for the third mode) with an R × R × R core.
84
Main points:
• 2 major types of tensor decompositions:
PARAFAC and Tucker
• both can be solved with "alternating least squares" (ALS)
• Details follow – we start with terminology:
85
[T. Kolda,’07]
A tensor is a multidimensional array.
An I × J × K (3rd-order) tensor X has entries x_ijk:
– mode 1 has dimension I, mode 2 has dimension J, mode 3 has dimension K
– Fibers: Column (Mode-1), Row (Mode-2), Tube (Mode-3)
– Slices: Horizontal, Lateral, Frontal
Note: the focus is on 3rd-order tensors, but everything can be extended to higher orders.
86
details [T. Kolda,’07]
Matricization: Converting a Tensor to a Matrix
– X_(n): the mode-n fibers are rearranged to be the columns of a matrix
– Matricize (unfold): (i, j, k) → (i′, j′); Reverse matricize folds back: (i′, j′) → (i, j, k)
– e.g., a 2 × 2 × 2 tensor with frontal slices [1 3; 2 4] and [5 7; 6 8] has mode-1 unfolding [1 3 5 7; 2 4 6 8]
87
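A small numpy sketch of mode-n matricization under the ordering convention used above (the earlier of the remaining indices varies fastest):

    import numpy as np

    # A 2 x 2 x 2 tensor: frontal slice k=0 is [[1, 3], [2, 4]], k=1 is [[5, 7], [6, 8]]
    X = np.zeros((2, 2, 2))
    X[:, :, 0] = [[1, 3], [2, 4]]
    X[:, :, 1] = [[5, 7], [6, 8]]

    def unfold(X, mode):
        # Mode-n matricization: the mode-n fibers become the columns of a matrix
        return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

    print(unfold(X, 0))
    # [[1. 3. 5. 7.]
    #  [2. 4. 6. 8.]]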
details: [Figure (T. Kolda, '07): a Time × Location data tensor decomposed into Time clusters and Location clusters]
89
details: [Figure (T. Kolda, '07): Time × Location example, continued]
90
details
Outer, Kronecker, & Khatri-Rao Products
– 3-way outer product: a ∘ b ∘ c is a rank-1 tensor
– Review, matrix Kronecker product: (M × N) ⊗ (P × Q) → MP × NQ
– Matrix Khatri-Rao product: column-wise Kronecker product, (M × R) ⊙ (N × R) → MN × R
91 [T. Kolda,’07]
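Small numpy sketches of the three products (shapes are toy values):

    import numpy as np

    # 3-way outer product: a rank-1 tensor
    a, b, c = np.array([1.0, 2.0]), np.array([1.0, 0.0, 1.0]), np.array([2.0, 3.0])
    rank1 = np.einsum('i,j,k->ijk', a, b, c)                  # shape (2, 3, 2)

    # Kronecker product: (M x N) kron (P x Q) -> (MP x NQ)
    K = np.kron(np.random.rand(2, 3), np.random.rand(4, 5))   # shape (8, 15)

    def khatri_rao(A, B):
        # Column-wise Kronecker product: (M x R), (N x R) -> (MN x R)
        M, R = A.shape
        N, _ = B.shape
        return np.einsum('ir,jr->ijr', A, B).reshape(M * N, R)

    print(khatri_rao(np.random.rand(3, 4), np.random.rand(5, 4)).shape)   # (15, 4)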
Specially Structured Tensors
• Tucker tensor: X = G ×_1 A ×_2 B ×_3 C, with a dense "core" tensor G
• Kruskal tensor: X = Σ_r a_r ∘ b_r ∘ c_r, a sum of R rank-1 tensors (equivalently, a Tucker tensor whose core is superdiagonal)
[T. Kolda,’07]
93
Outline: Part 2
• Matrix Tools
• Tensor Tools
– Tensor Basics
– Tucker
• Tucker 1
• Tucker 2
• Tucker 3
– PARAFAC
95
Tensor Decompositions
Tucker Decomposition - intuition
Recall the co-clustering example: the matrix is approximated as (term × term-group) × (group-group) × (doc-group × doc), with the same numeric factors as before. Tucker generalizes this picture to tensors: one factor matrix per mode plus a (generally non-diagonal) core.
98
Tucker Decomposition
99
details
Tucker Variations
See Kroonenberg & De Leeuw, Psychometrika, 1980 for discussion.
• Tucker2: X (I × J × K) ≈ core (R × S × K) ×_1 A (I × R) ×_2 B (J × S); the mode-3 factor is the identity matrix
• Tucker1: X (I × J × K) ≈ core (R × J × K) ×_1 A (I × R); finding principal components in only mode 1, which can be solved via a rank-R matrix SVD
100
details
Solving for Tucker [T. Kolda,'07]
• Given orthonormal A (I × R), B (J × S), C (K × T), the optimal core is G = X ×_1 A^T ×_2 B^T ×_3 C^T
  (the tensor norm is the square root of the sum of all the elements squared)
• Minimize ||X − G ×_1 A ×_2 B ×_3 C|| s.t. A, B, C orthonormal; eliminating the core, this is equivalent to maximizing ||X ×_1 A^T ×_2 B^T ×_3 C^T|| with the other factors fixed
• If B & C are fixed, solve for A: A = R leading left singular vectors of X_(1)(C ⊗ B)
• Alternating algorithm:
  – Initialize: choose R, S, T; calculate A, B, C via HO-SVD
  – Until converged:
    A = R leading left singular vectors of X_(1)(C ⊗ B)
    B = S leading left singular vectors of X_(2)(C ⊗ A)
    C = T leading left singular vectors of X_(3)(B ⊗ A)
  – Solve for the core: G = X ×_1 A^T ×_2 B^T ×_3 C^T
104
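A compact numpy sketch of this alternating scheme (HO-SVD initialization followed by HOOI-style updates); it is a sketch only, with a fixed iteration count instead of a convergence test:

    import numpy as np

    def unfold(X, mode):
        return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

    def mode_mult(X, A, mode):
        # Mode-n product of tensor X with matrix A (A has X.shape[mode] columns)
        return np.moveaxis(np.tensordot(A, np.moveaxis(X, mode, 0), axes=1), 0, mode)

    def tucker_hooi(X, ranks, n_iter=20):
        # HO-SVD init: leading left singular vectors of each unfolding
        factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
                   for n, r in enumerate(ranks)]
        for _ in range(n_iter):
            for n in range(3):
                Y = X
                for m in range(3):
                    if m != n:
                        Y = mode_mult(Y, factors[m].T, m)   # project the other modes
                factors[n] = np.linalg.svd(unfold(Y, n),
                                           full_matrices=False)[0][:, :ranks[n]]
        G = X
        for m in range(3):
            G = mode_mult(G, factors[m].T, m)               # core G = X x_n A_n^T
        return G, factors

    X = np.random.rand(10, 12, 14)
    G, (A, B, C) = tucker_hooi(X, ranks=(3, 4, 5))
    print(G.shape, A.shape, B.shape, C.shape)   # (3, 4, 5) (10, 3) (12, 4) (14, 5)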
Outline: Part 2
• Matrix Tools
• Tensor Tools
– Tensor Basics
– Tucker
• Tucker 1
• Tucker 2
• Tucker 3
– PARAFAC
105
CANDECOMP/PARAFAC (CP) Decomposition
X (I × J × K) ≈ a_1 ∘ b_1 ∘ c_1 + … + a_R ∘ b_R ∘ c_R, with factor matrices A (I × R), B (J × R), C (K × R)
• Solved with ALS: find all the vectors in one mode at a time, keeping the other two fixed
• The per-mode update uses the Khatri-Rao product (column-wise Kronecker product) of the two fixed factor matrices and the Hadamard (element-wise) product of their Gram matrices
109
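A minimal CP-ALS sketch in numpy, using the standard least-squares update A = X_(1) (C ⊙ B) [(C^T C) * (B^T B)]^+ and its analogues for B and C (iteration count and sizes are illustrative):

    import numpy as np

    def unfold(X, mode):
        return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

    def khatri_rao(A, B):
        M, R = A.shape
        N, _ = B.shape
        return np.einsum('ir,jr->ijr', A, B).reshape(M * N, R)

    def cp_als(X, R, n_iter=50, seed=0):
        # Alternating least squares: solve for one factor matrix at a time,
        # keeping the other two fixed.
        rng = np.random.default_rng(seed)
        A, B, C = (rng.random((X.shape[n], R)) for n in range(3))
        for _ in range(n_iter):
            A = unfold(X, 0) @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
            B = unfold(X, 1) @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
            C = unfold(X, 2) @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        return A, B, C

    X = np.random.rand(8, 9, 10)
    A, B, C = cp_als(X, R=4)
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)       # sum of R rank-1 tensors
    print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))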
Tensor tools - summary
• Two main tools
– PARAFAC
– Tucker
• Both find row-, column-, tube-groups
– but in PARAFAC the three groups are identical
• To solve: Alternating Least Squares
110
Tensor tools - resources
• Toolbox: from Tamara Kolda:
csmr.ca.sandia.gov/~tgkolda/TensorToolbox/
• T. G. Kolda and B. W. Bader. Tensor
Decompositions and Applications. SIAM
Review 2008
• csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf
111
Key Papers
Core Papers
• Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos: Colibri: fast mining
of large static and dynamic graphs. KDD 2008: 686-694
• Dhillon et al. Information-Theoretic Co-clustering, KDD’03
• T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review 2008
Further Reading
• Chih-Jen Lin: Projected Gradient Methods for Non-negative Matrix Factorization.
https://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf
• Candès, Emmanuel J., and Benjamin Recht. "Exact matrix completion via convex optimization."
Foundations of Computational mathematics 9, no. 6 (2009): 717.
• Rendle, S. (2010, December). Factorization machines. In 2010 IEEE International Conference on Data
Mining (pp. 995-1000). IEEE.
• Tamara G. Kolda, Brett W. Bader, Joseph P. Kenny: Higher-Order Web Link Analysis Using Multilinear
Algebra. ICDM 2005: 242-249
• U Kang, Evangelos E. Papalexakis, Abhay Harpale, Christos Faloutsos: GigaTensor: scaling tensor analysis
up by 100 times - algorithms and discoveries. KDD 2012: 316-324
• Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S. Modha, Christos Faloutsos: Fully automatic
cross-associations. KDD 2004: 79-88
• Trigeorgis, G., Bousmalis, K., Zafeiriou, S., & Schuller, B. (2014, January). A deep semi-nmf model for
learning hidden representations. In International Conference on Machine Learning (pp. 1692-1700).
• Risi Kondor, Nedelina Teneva, and Vikas Garg. 2014. Multiresolution matrix factorization. In International
Conference on Machine Learning. 1620–1628
112