L21 Mining Social Network Graphs
Lecture 8
December 4, 2017 1
CS6220 – Data Mining Techniques · Fall 2017 · Derbinsky
Outline
1. Social Networks as Graphs
a) Examples
b) Representation
c) Properties
2. Clustering
a) Distance Measures
b) Girvan-Newman
3. Spectral Analysis
a) Networks as Matrices
b) Connectivity
c) Partitioning
d) Clustering
Less Obvious
• Phone/E-mail (within K units of time)
• Collaboration (academic papers, patents)
• Biological (proteins, genes)
Representation
• Nodes = entities
– People, papers, …
– Single vs. multiple node types
• e.g., del.icio.us: people, websites, tags
• Edges = relationships
– Discrete vs. continuous (i.e., weighted)
– Directed (e.g., following) vs. undirected
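A minimal sketch of one way to hold such a network in memory: an adjacency map from each node to its neighbors and edge weights. The function name `build_graph` and the sample names are hypothetical, chosen only for illustration.

```python
def build_graph(edges, directed=False):
    """Return an adjacency map {node: {neighbor: weight}}."""
    adj = {}
    for u, v, weight in edges:
        adj.setdefault(u, {})[v] = weight
        adj.setdefault(v, {})          # make sure v appears even with no out-edges
        if not directed:
            adj[v][u] = weight         # undirected: store the edge both ways
    return adj

# Directed, unweighted relationship (weight fixed at 1.0), e.g. "following"
follows = build_graph([("ann", "bob", 1.0)], directed=True)

# Undirected, weighted relationship, e.g. strength of a friendship
friends = build_graph([("ann", "bob", 0.8)])

print(follows)   # {'ann': {'bob': 1.0}, 'bob': {}}
print(friends)   # {'ann': {'bob': 0.8}, 'bob': {'ann': 0.8}}
```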
Checkup
• Given a graph of 7 nodes (A-G), how
many undirected edges are possible?
Answer
• Given a graph of 7 nodes (A-G), how
many undirected edges are possible?
$$\binom{7}{2} = \frac{7!}{2!\,(7-2)!} = \frac{7(7-1)}{2} = 21$$
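The count can be double-checked by enumerating every unordered pair of nodes. This is only a sanity-check sketch; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

nodes = "ABCDEFG"

# Every undirected edge is an unordered pair of distinct nodes.
possible_edges = list(combinations(nodes, 2))
max_edges = comb(len(nodes), 2)

print(len(possible_edges), max_edges)   # 21 21
```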
Locality
• Typically assumed that if node A is
connected to both B and C, then it is
more likely than random that B and C are
connected
Example Graph
[Figure: example graph on nodes A–G with edges A–B, A–C, B–C, B–D, D–E, D–G, D–F, E–F, G–F]
Communities to Be Detected
[Figure: the example graph with its communities marked]
• Nodes: 7
• Edges: 9
• Avg Connectivity: 9/21 ~ 0.43
• Expected Locality: 7/19 ~ 0.37 (the two edges of a triple are given, leaving 7 edges for the other 19 node pairs)
Actual?
• What are all the triples?
• Triples whose third edge exists (✅) = 9, triples whose third edge is missing (❌) = 7
• ✅ / (✅ + ❌) = 9/16 ~ 0.56
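The triple count can be reproduced with a short script. This is a sketch under the assumption that a "triple" means a node X together with an unordered pair of its neighbors Y and Z, counted as ✅ when Y and Z are themselves connected. The edge list is the one from the example graph.

```python
from itertools import combinations

EDGES = ["AB", "AC", "BC", "BD", "DE", "DG", "DF", "EF", "GF"]

neighbors = {}
for u, v in EDGES:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

n, m = len(neighbors), len(EDGES)
pairs = n * (n - 1) // 2
density = m / pairs                  # average connectivity: 9/21
expected = (m - 2) / (pairs - 2)     # expected locality: 7/19

closed = missing = 0
for x, nbrs in neighbors.items():
    for y, z in combinations(sorted(nbrs), 2):
        if z in neighbors[y]:
            closed += 1              # X-Y, X-Z and Y-Z all present
        else:
            missing += 1             # Y-Z edge absent

actual = closed / (closed + missing)
print(closed, missing, round(density, 2), round(expected, 2), round(actual, 2))
```

With the example edges this gives 9 closed and 7 missing triples, matching the slide.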
Local!
• Expected Locality: 7/19 ~ 0.37
• Actual Locality: 9/16 ~ 0.56 (>> Expected!)
Let’s Cluster!
Distance Measure?
• Distance measures on social-network graphs
can be tricky
Betweenness
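• One of the simplest measures, based on
finding the edges that are the least likely
to be inside a community
– Edge betweenness: number of node pairs whose shortest path crosses the edge (a pair with several shortest paths is split fractionally among them)
– Clustering: remove high betweenness first!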
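[Figure: the example graph with each edge labeled by its betweenness: A–B 5, A–C 1, B–C 5, B–D 12, D–E 4.5, D–G 4.5, D–F 4, E–F 1.5, G–F 1.5]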
[Figure: breadth-first search from E, with levels {D, F}, {B, G}, {A, C}. Edge credits: E–D 4.5, E–F 1.5, D–B 3, D–G 0.5, F–G 0.5, B–A 1, B–C 1]
[Figure: breadth-first search from A, with levels {B, C}, {D}, {E, F, G}. Edge credits: A–B 5, A–C 1, B–D 4, D–E 1, D–F 1, D–G 1]
[Figure: breadth-first search from C, with levels {B, A}, {D}, {E, F, G}. Edge credits: C–B 5, C–A 1, B–D 4, D–E 1, D–F 1, D–G 1]
[Figure: breadth-first search from G, with levels {F, D}, {E, B}, {A, C}. Edge credits: G–F 1.5, G–D 4.5, F–E 0.5, D–E 0.5, D–B 3, B–A 1, B–C 1]
Edge Contributions
Edge:         AB   AC   BC   BD   DE    DG    DF   EF    GF
Betweenness:  5    1    5    12   4.5   4.5   4    1.5   1.5
[Figure: the example graph with each edge labeled by its betweenness]
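The table can be reproduced with a breadth-first search from every node. The sketch below follows the usual credit scheme: count shortest paths top-down, pass credit bottom-up in proportion to those counts, sum over all roots, then halve because each pair is seen from both ends. The function names (`adjacency`, `edge_credits`, `edge_betweenness`) are illustrative rather than taken from the slides.

```python
from collections import deque

EDGES = ["AB", "AC", "BC", "BD", "DE", "DG", "DF", "EF", "GF"]

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_credits(adj, root):
    """Credit each edge receives from shortest paths starting at root."""
    dist, paths, parents, order = {root: 0}, {root: 1}, {root: []}, []
    queue = deque([root])
    while queue:                                   # top-down: count shortest paths
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v], paths[v], parents[v] = dist[u] + 1, 0, []
                queue.append(v)
            if dist[v] == dist[u] + 1:             # u is a parent of v in the BFS DAG
                paths[v] += paths[u]
                parents[v].append(u)
    credit = {u: 1.0 for u in order}
    edges = {}
    for v in reversed(order):                      # bottom-up: pass credit to parents
        for u in parents[v]:
            share = credit[v] * paths[u] / paths[v]
            edges[frozenset((u, v))] = share
            credit[u] += share
    return edges

def edge_betweenness(adj):
    total = {}
    for root in adj:
        for edge, share in edge_credits(adj, root).items():
            total[edge] = total.get(edge, 0.0) + share
    return {edge: value / 2 for edge, value in total.items()}

adj = adjacency(EDGES)
from_e = edge_credits(adj, "E")
betweenness = edge_betweenness(adj)
for name in EDGES:
    print(name, betweenness[frozenset(name)])
```

The single-root call `edge_credits(adj, "E")` corresponds to one breadth-first pass; the full betweenness is the halved sum over all seven roots.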
Girvan-Newman
1. Repeat until no edges left
a) Calculate betweenness of edges
b) Remove edge(s) with highest betweenness
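A compact sketch of the loop above, applied to the example graph. It recomputes betweenness after every removal, removes all edges tied for the maximum, and records the communities each time the number of connected components grows. The helper names and the tie tolerance of 1e-9 are assumptions made for this illustration.

```python
from collections import deque

EDGES = ["AB", "AC", "BC", "BD", "DE", "DG", "DF", "EF", "GF"]

def edge_betweenness(adj):
    """Sum BFS credits over every root, then halve (each pair is seen twice)."""
    total = {}
    for root in adj:
        dist, paths, parents, order = {root: 0}, {root: 1}, {root: []}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], paths[v], parents[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    parents[v].append(u)
        credit = {u: 1.0 for u in order}
        for v in reversed(order):
            for u in parents[v]:
                share = credit[v] * paths[u] / paths[v]
                credit[u] += share
                key = frozenset((u, v))
                total[key] = total.get(key, 0.0) + share
    return {edge: value / 2 for edge, value in total.items()}

def components(adj):
    seen, found = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        found.add(frozenset(comp))
    return found

def girvan_newman(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels = [components(adj)]
    while any(adj.values()):                       # 1. repeat until no edges left
        scores = edge_betweenness(adj)             # a) calculate betweenness
        top = max(scores.values())
        for edge, score in scores.items():         # b) remove the highest edge(s)
            if abs(score - top) < 1e-9:
                u, v = tuple(edge)
                adj[u].discard(v)
                adj[v].discard(u)
        current = components(adj)
        if len(current) > len(levels[-1]):
            levels.append(current)
    return levels

levels = girvan_newman(EDGES)
for level in levels:
    print(sorted("".join(sorted(c)) for c in level))
```

On the example graph the first removal is B–D, which splits the nodes into {A, B, C} and {D, E, F, G}.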
Example (0)
Example (1)
Step 1: Step 2:
Selecting # of Clusters
Direct Partitioning
• We now look at an approach to divide a
graph into two disjoint groups, the task of
bi-partitioning, via spectral analysis
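Adjacency matrix A of the example graph (rows and columns ordered A, B, C, D, E, F, G):
$$A = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}$$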
Mining Social-Network Graphs
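Degree matrix D of the example graph (rows and columns ordered A, B, C, D, E, F, G):
$$D = \begin{bmatrix}
2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2
\end{bmatrix}$$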
Laplacian (L)
L=D-A
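$$L = \begin{bmatrix}
 2 & -1 & -1 &  0 &  0 &  0 &  0 \\
-1 &  3 & -1 & -1 &  0 &  0 &  0 \\
-1 & -1 &  2 &  0 &  0 &  0 &  0 \\
 0 & -1 &  0 &  4 & -1 & -1 & -1 \\
 0 &  0 &  0 & -1 &  2 & -1 &  0 \\
 0 &  0 &  0 & -1 & -1 &  3 & -1 \\
 0 &  0 &  0 & -1 &  0 & -1 &  2
\end{bmatrix}$$
(rows and columns ordered A, B, C, D, E, F, G)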
Checkup
• Rows sum to…?
0 (each diagonal entry is the node’s degree, offset by one -1 per incident edge)
• Diagonally dominant?
Yes! (edge case: an isolated node, whose row is all zeros)
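The construction L = D - A and both checkup answers can be verified on the example graph. This is a plain-Python sketch using nested lists; the variable names are illustrative, and "diagonally dominant" is checked in the weak sense (diagonal entry at least the sum of the absolute off-diagonal entries).

```python
NODES = "ABCDEFG"
EDGES = ["AB", "AC", "BC", "BD", "DE", "DG", "DF", "EF", "GF"]

index = {name: i for i, name in enumerate(NODES)}
n = len(NODES)

# Adjacency matrix: 1 where an undirected edge exists.
A = [[0] * n for _ in range(n)]
for u, v in EDGES:
    A[index[u]][index[v]] = 1
    A[index[v]][index[u]] = 1

# Degree matrix: node degrees on the diagonal.
D = [[sum(A[i]) if i == j else 0 for j in range(n)] for i in range(n)]

# Laplacian: L = D - A.
L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]

row_sums = [sum(row) for row in L]
diagonally_dominant = all(
    L[i][i] >= sum(abs(L[i][j]) for j in range(n) if j != i) for i in range(n)
)

for name, row in zip(NODES, L):
    print(name, row)
print(row_sums, diagonally_dominant)
```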
The Plan
• We will now analyze the eigen-
decomposition (Lv = 𝜆v) of the Laplacian
Checkup
• What is a non-trivial solution to the
equation Lv = 𝜆v for 𝜆=0 (i.e. Lv=0)?
$$L = \begin{bmatrix}
 2 & -1 & -1 &  0 &  0 &  0 &  0 \\
-1 &  3 & -1 & -1 &  0 &  0 &  0 \\
-1 & -1 &  2 &  0 &  0 &  0 &  0 \\
 0 & -1 &  0 &  4 & -1 & -1 & -1 \\
 0 &  0 &  0 & -1 &  2 & -1 &  0 \\
 0 &  0 &  0 & -1 & -1 &  3 & -1 \\
 0 &  0 &  0 & -1 &  0 & -1 &  2
\end{bmatrix}$$
Answer
• What is a non-trivial solution to the
equation Lv = 𝜆v for 𝜆=0 (i.e. Lv=0)?
• v = [1 1 1 1 1 1 1]^T (or any nonzero multiple), because every row of L sums to 0
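One candidate is the all-ones vector: since every row of the Laplacian sums to zero, multiplying L by a constant vector gives the zero vector. A quick check, with the example Laplacian written out by hand and a small hypothetical `matvec` helper:

```python
L = [
    [ 2, -1, -1,  0,  0,  0,  0],
    [-1,  3, -1, -1,  0,  0,  0],
    [-1, -1,  2,  0,  0,  0,  0],
    [ 0, -1,  0,  4, -1, -1, -1],
    [ 0,  0,  0, -1,  2, -1,  0],
    [ 0,  0,  0, -1, -1,  3, -1],
    [ 0,  0,  0, -1,  0, -1,  2],
]

def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

ones = [1] * 7
scaled = [-3] * 7          # any constant vector works equally well

print(matvec(L, ones))     # [0, 0, 0, 0, 0, 0, 0]
print(matvec(L, scaled))   # [0, 0, 0, 0, 0, 0, 0]
```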
Checkup
• Laplacian of the following graph?
[Figure: four nodes with two edges, W–X and Y–Z]
Answer
• Laplacian of the following graph?
$$L = \begin{bmatrix}
 1 & -1 &  0 &  0 \\
-1 &  1 &  0 &  0 \\
 0 &  0 &  1 & -1 \\
 0 &  0 & -1 &  1
\end{bmatrix}$$
(rows and columns ordered W, X, Y, Z)
Checkup
• Null-space of L? (i.e. Lv=0)
$$L = \begin{bmatrix}
 1 & -1 &  0 &  0 \\
-1 &  1 &  0 &  0 \\
 0 &  0 &  1 & -1 \\
 0 &  0 & -1 &  1
\end{bmatrix}$$
(rows and columns ordered W, X, Y, Z)
Answer
• Null-space of L? (i.e. Lv=0)
• Spanned by $[1\ 1\ 0\ 0]^{T}$ and $[0\ 0\ 1\ 1]^{T}$ (one indicator vector per connected component; rows ordered W, X, Y, Z)
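A short sketch of why the null space looks this way: the indicator vector of each connected component satisfies Lv = 0, and so does any linear combination of them. The components are read directly off the nonzero off-diagonal entries of L; the helper names are illustrative.

```python
NODES = "WXYZ"
L = [
    [ 1, -1,  0,  0],
    [-1,  1,  0,  0],
    [ 0,  0,  1, -1],
    [ 0,  0, -1,  1],
]

def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def component_indicators(laplacian):
    """One 0/1 vector per connected component of the underlying graph."""
    n = len(laplacian)
    seen, indicators = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            i = stack.pop()
            if i not in comp:
                comp.add(i)
                stack.extend(j for j in range(n) if j != i and laplacian[i][j] != 0)
        seen |= comp
        indicators.append([1 if i in comp else 0 for i in range(n)])
    return indicators

indicators = component_indicators(L)
combo = [3 * a - 2 * b for a, b in zip(*indicators)]   # an arbitrary combination

print(indicators)                       # [[1, 1, 0, 0], [0, 0, 1, 1]]
print([matvec(L, v) for v in indicators])
print(matvec(L, combo))                 # [0, 0, 0, 0]
```

Two components give two independent solutions, so the eigenvalue 0 appears twice for this graph.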
Second-Smallest Eigenvalue
• Let’s call this $\lambda_1$ (with corresponding eigenvector $v_1$)
• If $\lambda_1$ is 0, what do we know?
The graph is already disconnected, so there’s not much sense in bi-partitioning :)
Setup
• For a given graph, we want to assign each
node to one of two groups
– Let’s represent this assignment, per node, as
the value -1 or +1 for variable $n_i$
– So, a vector, n, of length |V| of either -1 or 1
Checkup
What is $(n_i - n_j)^2$ for two connected nodes that …
Converting to Matrices
• So we have $\sum (n_i - n_j)^2$
– Or: $\sum (n_i^2 - 2 n_i n_j + n_j^2)$
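Taken over the edges of the graph, this sum equals the quadratic form n^T L n, a standard identity for the Laplacian that is assumed here rather than recovered from the slide text. The sketch checks it on the example graph with one hypothetical assignment: A, B, C in the -1 group and D, E, F, G in the +1 group.

```python
NODES = "ABCDEFG"
EDGES = ["AB", "AC", "BC", "BD", "DE", "DG", "DF", "EF", "GF"]

assignment = {"A": -1, "B": -1, "C": -1, "D": 1, "E": 1, "F": 1, "G": 1}

# Edge form: each edge contributes 0 (same group) or 4 (different groups).
edge_sum = sum((assignment[u] - assignment[v]) ** 2 for u, v in EDGES)

# Matrix form: build L = D - A, then evaluate n^T L n.
index = {name: i for i, name in enumerate(NODES)}
size = len(NODES)
L = [[0] * size for _ in range(size)]
for u, v in EDGES:
    i, j = index[u], index[v]
    L[i][j] -= 1
    L[j][i] -= 1
    L[i][i] += 1
    L[j][j] += 1

n = [assignment[name] for name in NODES]
quadratic_form = sum(n[i] * L[i][j] * n[j] for i in range(size) for j in range(size))

cut_edges = edge_sum // 4
print(edge_sum, quadratic_form, cut_edges)   # 4 4 1
```

Only B–D crosses between the two groups, so both forms evaluate to 4, that is, four times the number of cut edges.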
Checkup
• What is the easiest way to minimize the
following function: $n_i^2 - 2 n_i n_j + n_j^2$
Answer
• What is the easiest way to minimize the
following function: $n_i^2 - 2 n_i n_j + n_j^2$
• n = [0 0 … 0]
– So we need to force values…
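One common way to force values, sketched here as an assumption because the constraints used on the original slides did not survive extraction, is to relax each $n_i$ from {-1, +1} to a real number, fix the length of n, and require n to be orthogonal to the all-ones vector (which rules out the trivial constant solution):

```latex
\min_{n \in \mathbb{R}^{|V|}} \; n^{T} L\, n
  \;=\; \sum_{(i,j) \in E} (n_i - n_j)^2
\quad \text{subject to} \quad
n^{T} n = |V|, \qquad \mathbf{1}^{T} n = 0
% By the Rayleigh quotient, the minimizer is proportional to v_1,
% the eigenvector of the second-smallest eigenvalue \lambda_1,
% and the minimum value is \lambda_1 |V|.
```

Under this relaxation the sign of each entry of the minimizer is read back as the group assignment of that node.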
Ln n=0
[Figure: four-node graph with edges A–B, A–C, B–D]
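$$L = \begin{bmatrix}
 2 & -1 & -1 &  0 \\
-1 &  2 &  0 & -1 \\
-1 &  0 &  1 &  0 \\
 0 & -1 &  0 &  1
\end{bmatrix}$$
(rows and columns ordered A, B, C, D)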
Sign = Partitioning (+ vs -)
+ {B, D}
- {A, C}
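The partition can be reproduced without a linear-algebra library. The sketch below approximates the eigenvector of the second-smallest eigenvalue by power iteration on (shift·I - L) while repeatedly projecting out the all-ones direction; in practice a library eigen-solver would be the usual choice. The function name, the starting vector, and the iteration count are assumptions for this illustration, and the overall sign of an eigenvector is arbitrary, so only the grouping is meaningful.

```python
NODES = "ABCD"
L = [
    [ 2, -1, -1,  0],
    [-1,  2,  0, -1],
    [-1,  0,  1,  0],
    [ 0, -1,  0,  1],
]

def second_eigenvector(laplacian, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue."""
    n = len(laplacian)
    # Every Laplacian eigenvalue is at most twice the largest degree,
    # so shift*I - L has no negative eigenvalues.
    shift = 2 * max(laplacian[i][i] for i in range(n))
    v = [float(i + 1) for i in range(n)]            # arbitrary starting vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]                   # remove the all-ones component
        w = [shift * v[i] - sum(laplacian[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

v1 = second_eigenvector(L)
positive = {name for name, x in zip(NODES, v1) if x > 0}
negative = {name for name, x in zip(NODES, v1) if x <= 0}

print([round(x, 3) for x in v1])
print(sorted(positive), sorted(negative))
```

The two sign groups come out as {A, C} and {B, D}, which cuts only the single edge A–B.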
• Network/Node properties
– Triangles, neighborhoods
– Centrality, influence
• Network formation
• Problems
– Link prediction