
04 Communities

The document discusses community detection in networks, defining communities as sets of nodes that are densely connected internally. It outlines various methods for detecting communities, including the Clique Percolation Method and different community detection criteria. Additionally, it highlights the importance of community detection in fields such as sociology, biology, and computer science.

Uploaded by Husein Yusuf
Copyright © All Rights Reserved

Communities

What is a community?
Finding communities
CONCOR
Edge removal (Girvan-Newman)
Outline
• What does “network community” mean?
• Community detection versus graph partitioning
versus hierarchical clustering
• Graph partitioning algorithms
– Spectral partitioning (Fiedler’s method based on
graph Laplacian)
• Modularity metric for community detection
– Spectral-based modularity optimization
– Other methods for modularity optimization
• Community detection methods that do not rely on
modularity metric
– Betweenness-Centrality method
– Radicchi et al. method
• Hierarchical agglomerative clustering
Cliques Review
• Dfn: A clique is a maximal,
completely connected subgraph of a
given graph.
[Figure: three example graphs on nodes A, B, C, D]
Clique Review

• Clique problem: any problem of finding particular (complete) subgraphs ("cliques") in a graph, i.e., sets of elements in which every pair of elements is connected.

Source: http://sebastian.doc.gold.ac.uk/
• Note: the notion of clique here does not necessarily refer to a complete subgraph.

Complete graph: there is an edge between every pair of nodes

Dense graph: the number of edges is close to the maximal number of edges
Sparse graph: the graph has only a few edges
Clique Percolation Method (CPM)
• The clique definition is very strict, and cliques are unstable
• Cliques are therefore normally used as cores from which larger communities are found
• CPM is such a method; it finds overlapping communities
– Input
• A parameter k, and a network

– Procedure
1. Find all cliques of size k in the given network
2. Construct a clique graph: two cliques are adjacent if they share k-1 nodes
3. Each connected component in the clique graph forms a community
Example: Clique Percolation Method
Step 1: Find all Cliques of size 3

{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8},
{6, 7, 8}

Step 2: Construct Clique Graph

{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8}

[Figure: the clique graph on these seven cliques]
Step 3: Finding Communities
Two cliques are adjacent if they share k-1 nodes (here k-1 = 2)

{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8}

Communities:
{1, 2, 3, 4}
{4, 5, 6, 7, 8}
(Node 4 belongs to both communities, so they overlap.)
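The three CPM steps can be sketched in a few lines of Python. This is a minimal brute-force sketch that tests every k-node subset, so it only suits small graphs. The edge list is an assumption: it is reconstructed from the seven 3-cliques listed in the example, because the figure itself is not reproduced here.

```python
from itertools import combinations


def clique_percolation(edges, k):
    """Clique Percolation Method (brute force; suitable for small graphs only)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Step 1: all k-cliques (every pair inside the k-subset must be an edge)
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Step 2: clique graph, tracked with union-find;
    # two k-cliques are adjacent if they share k - 1 nodes
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Step 3: each connected component of the clique graph is a community
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Edge list assumed from the 3-cliques in the example above
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6), (5, 6),
         (5, 7), (6, 7), (5, 8), (6, 8), (7, 8)]
print(clique_percolation(EDGES, 3))  # [[1, 2, 3, 4], [4, 5, 6, 7, 8]]
```

Raising k to 4 on the same graph leaves only the 4-clique {5, 6, 7, 8} as a community. This shows how strongly the result depends on k.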
Communities cont.
• In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally.
• In the particular case of non-overlapping community finding, this implies that the network divides naturally into groups of nodes with dense connections internally and sparser connections between groups.
• But overlapping communities are also allowed. The more general definition is based on the principle that pairs of nodes are more likely to be connected if they are both members of the same community (or communities), and less likely to be connected if they do not share communities.
Network Communities cont.
• One of the most relevant features of graphs representing real systems is community structure, or clustering:
• the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters.
• Such clusters, or communities, can be considered fairly independent compartments of a graph, playing a role similar to that of tissues or organs in the human body.
Communities cont.
• Detecting communities is of great importance in sociology, biology, and computer science.
• Community detection is the task of defining and identifying communities in social and information networks,
• i.e., in graphs in which the nodes represent underlying social entities and the edges represent interactions between pairs of nodes.
GROUP DISCUSSION:

How would you define a community/cluster/partition in a network?
Structural equivalence
• Dfn: Two nodes i and j are structurally equivalent for graph G if:

$$ G_{ik} = G_{jk}, \quad \forall k \neq i,\ k \neq j $$

i.e., their edges to all other nodes are the same: relationships to all other nodes are identical!
• In practice: too rigid
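The definition translates directly into a check over an adjacency matrix. This is a minimal sketch. The function name and the three-node star graph are illustrative and do not come from the slides.

```python
def structurally_equivalent(G, i, j):
    """True if nodes i and j have identical ties to every other node k (k != i, j).

    G is an adjacency matrix given as a list of lists (G[i][k] == 1 for an edge).
    """
    return all(G[i][k] == G[j][k] for k in range(len(G)) if k not in (i, j))


# Illustrative star graph: node 0 is the hub, nodes 1 and 2 are leaves
STAR = [[0, 1, 1],
        [1, 0, 0],
        [1, 0, 0]]
print(structurally_equivalent(STAR, 1, 2))  # True: both tie only to the hub
print(structurally_equivalent(STAR, 0, 1))  # False: they differ on node 2
```

A single differing tie is enough to break equivalence. This is why the criterion is too rigid for real networks.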
Community structure defined on nodes/vertices

• Dfn: A community structure, Π, is a collection of disjoint subsets of V (i.e., a partition) whose union is V.
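The partition definition can be checked mechanically. Every block must lie inside V, no node may appear in two blocks, and together the blocks must cover V. This is a minimal sketch with a hypothetical helper name. The examples reuse the node set and the communities from the CPM example.

```python
def is_partition(blocks, V):
    """Check that blocks are pairwise disjoint subsets of V whose union is V."""
    seen = set()
    for block in blocks:
        block = set(block)
        if not block <= V or seen & block:
            return False  # element outside V, or overlap with an earlier block
        seen |= block
    return seen == V  # union must cover all of V


V = set(range(1, 9))
print(is_partition([{1, 2, 3}, {4, 5, 6, 7, 8}], V))     # True
print(is_partition([{1, 2, 3, 4}, {4, 5, 6, 7, 8}], V))  # False: node 4 is in both
print(is_partition([{1, 2, 3}, {5, 6, 7, 8}], V))        # False: node 4 is not covered
```

The CPM communities fail this test because node 4 sits in both. Overlapping communities are therefore not a community structure in this strict sense.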
Why find communities?
• Improve web client performance: clustering web clients with similar interests, or that are geographically near, can improve performance
• Recommender systems: customers with similar interests can be clustered to help recommendation systems
• Efficient graph storage/access: clusters in large graphs can be used to create data structures for efficient storage of graph data, to handle queries or path searches
• Study node relationships: study the relationships/mediation among nodes
Taxonomy of Community Criteria
- Community detection methods fall into four categories:
• Node-Centric Community Detection
– Each node in a group satisfies certain properties
• Group-Centric Community Detection
– Consider the connections within a group as a whole. The group
has to satisfy certain properties without zooming into node-level
• Network-Centric Community Detection
– Partition the whole network into several disjoint sets
• Hierarchy-Centric Community Detection
– Construct a hierarchical structure of communities
[Figure: A classification of community detection and graph clustering methods]
Community Detection
• Network communities
– Sets of nodes where the nodes in the same set are similar (more internal links) and the nodes in different sets are dissimilar (fewer external links)
– Communities, clusters, modules, groups, etc.
• Non-overlapping community detection
– Finding a good partition of nodes
– Clusters do NOT overlap
Overlapping Community Detection

• A person (node) can belong to multiple communities, e.g., family, friends, colleagues, etc.
• Overlapping community detection allows a node to be included in different groups

[Figure: a node belonging to overlapping family, friends, and colleagues groups]
Existing Methods
• Node-based: a node overlaps if more than one of its belonging coefficients exceeds some threshold
– Label Propagation (COPRA) [Gregory 2010, Subelj and Bajec 2011]
• Structure-based: a node overlaps if it participates in multiple base structures with different memberships
– Clique Percolation (CPM) [Palla et al. 2005, Derenyi et al. 2005]
– Link Partition [Evans and Lambiotte 2009, Ahn et al. 2010]

[Figure: node-based example where node i has belonging coefficients f(i,c1)=0.35, f(i,c2)=0.05, f(i,c3)=0.4, …, computed as f(i,c)=mean(f(j,c)) over its neighbors j; structure-based examples where node i belongs to several base structures (links, in the link-partition case)]
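The node-based rule from the figure can be sketched as two steps: update f(i,c) = mean(f(j,c)) over i's neighbors j, then apply a threshold test. This is a simplified illustration under stated assumptions, not the full COPRA algorithm. The threshold values, the toy graph, and the function names are hypothetical.

```python
def update_belonging(neighbors, f, i):
    """One update of node i: f(i, c) = mean over neighbors j of f(j, c).

    neighbors: dict node -> list of neighbor nodes
    f: dict node -> {community: belonging coefficient}
    (a community missing from f[j] counts as coefficient 0)
    """
    totals = {}
    for j in neighbors[i]:
        for c, value in f[j].items():
            totals[c] = totals.get(c, 0.0) + value
    n = len(neighbors[i])
    return {c: total / n for c, total in totals.items()}


def memberships(coeffs, threshold):
    """Communities whose coefficient exceeds the threshold (more than one => overlap)."""
    return sorted(c for c, value in coeffs.items() if value > threshold)


# Hypothetical toy example: node "i" has two neighbors with different memberships
neighbors = {"i": ["a", "b"]}
f = {"a": {"c1": 1.0}, "b": {"c1": 0.5, "c2": 0.5}}
coeffs = update_belonging(neighbors, f, "i")
print(coeffs)                    # {'c1': 0.75, 'c2': 0.25}
print(memberships(coeffs, 0.2))  # ['c1', 'c2'] -> node i overlaps
```

With the coefficients from the figure (0.35, 0.05, 0.4) and a hypothetical threshold of 0.3, node i would belong to c1 and c3.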
Limitations of Existing Methods

• The existing methods do not perform well for
– 1. networks with many highly overlapping nodes,
– 2. networks with various base structures, and
– 3. networks with many weak ties

[Figure: three failure cases, with a weak tie marked: an overlapping node i with f(i,c1)=0.2, f(i,c2)=0.15, f(i,c3)=0.25, f(i,c4)=0.2 (COPRA fails); a non-overlapping node i (CPM fails); a non-overlapping node i (link partition fails)]
CONCOR: CONvergence of iterated CORrelations
CONCOR Intuition
• In a partition, nodes are similar (w.r.t. edges)
• Find similar nodes via correlation
– Pattern of edges: adjacency matrix
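Review: Adjacency Matrix
• The adjacency matrix pre-calculates N(v), the neighbors of each node v: row v has a 1 in column u exactly when u ∈ N(v). For the 13-node example network (A to M), the nonzero entries are:
A: B
B: A
C: E
D: E
E: C, D
F: G, H
G: F, I, J
H: F, J, M
I: G, K
J: G, H, K
K: I, J, L
L: K, M
M: H, L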
Background: Pearson Correlation
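• How related are variables X and Y?
+1: positively related
0: not (linearly) related
-1: inversely related

• How to calculate it (for a sample)?

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where n is the sample size, x_i and y_i are the sample values, and x̄ and ȳ are the sample means. The numerator corresponds to the covariance of X and Y. The denominator corresponds to the product of the standard deviations of the X and Y samples.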
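The sample formula maps term by term onto code. This is a minimal sketch, and the function name is ours. It assumes neither sample is constant, because a zero standard deviation makes the denominator zero.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation r_xy (assumes neither sample is constant)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # numerator: sum of co-deviations (corresponds to the covariance)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # denominator: product of root sums of squared deviations (standard deviations)
    den = (sqrt(sum((xi - x_bar) ** 2 for xi in x))
           * sqrt(sum((yi - y_bar) ** 2 for yi in y)))
    return num / den


print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0: positively related
print(round(pearson([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0: inversely related
```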


CONCOR Intuition
• In a partition, nodes are similar
(w.r.t. edges)
• Find similar nodes via correlation
– Pattern of edges: adjacency matrix
– 2D adjacency matrix -> 2D correlation matrix
– Iterate! The matrix eventually becomes stable
Iterated Correlation Matrices

C(0) (adjacency matrix):
    A  B  C  D  E  F  G  H  I  J  K  L  M
A   0  1  0  0  0  0  0  0  0  0  0  0  0
B   1  0  0  0  0  0  0  0  0  0  0  0  0
C   0  0  0  0  1  0  0  0  0  0  0  0  0
D   0  0  0  0  1  0  0  0  0  0  0  0  0
E   0  0  1  1  0  0  0  0  0  0  0  0  0
F   0  0  0  0  0  0  1  1  0  0  0  0  0
G   0  0  0  0  0  1  0  0  1  1  0  0  0
H   0  0  0  0  0  1  0  0  0  1  0  0  1
I   0  0  0  0  0  0  1  0  0  0  1  0  0
J   0  0  0  0  0  0  1  1  0  0  1  0  0
K   0  0  0  0  0  0  0  0  1  1  0  1  0
L   0  0  0  0  0  0  0  0  0  0  1  0  1
M   0  0  0  0  0  0  0  1  0  0  0  1  0

C(1) (correlations between the rows of C(0)):
       A      B      C      D      E      F      G      H      I      J      K      L      M
A   1.00  -0.08  -0.08  -0.08  -0.12  -0.12  -0.16  -0.16  -0.12  -0.16  -0.16  -0.12  -0.12
B  -0.08   1.00  -0.08  -0.08  -0.12  -0.12  -0.16  -0.16  -0.12  -0.16  -0.16  -0.12  -0.12
C  -0.08  -0.08   1.00   1.00  -0.12  -0.12  -0.16  -0.16  -0.12  -0.16  -0.16  -0.12  -0.12
D  -0.08  -0.08   1.00   1.00  -0.12  -0.12  -0.16  -0.16  -0.12  -0.16  -0.16  -0.12  -0.12
E  -0.12  -0.12  -0.12  -0.12   1.00  -0.18  -0.23  -0.23  -0.18  -0.23  -0.23  -0.18  -0.18
F  -0.12  -0.12  -0.12  -0.12  -0.18   1.00  -0.23  -0.23   0.41   0.78  -0.23  -0.18   0.41
G  -0.16  -0.16  -0.16  -0.16  -0.23  -0.23   1.00   0.57  -0.23  -0.30   0.57  -0.23  -0.23
H  -0.16  -0.16  -0.16  -0.16  -0.23  -0.23   0.57   1.00  -0.23  -0.30   0.13   0.27  -0.23
I  -0.12  -0.12  -0.12  -0.12  -0.18   0.41  -0.23  -0.23   1.00   0.78  -0.23   0.41  -0.18
J  -0.16  -0.16  -0.16  -0.16  -0.23   0.78  -0.30  -0.30   0.78   1.00  -0.30   0.27   0.27
K  -0.16  -0.16  -0.16  -0.16  -0.23  -0.23   0.57   0.13  -0.23  -0.30   1.00  -0.23   0.27
L  -0.12  -0.12  -0.12  -0.12  -0.18  -0.18  -0.23   0.27   0.41   0.27  -0.23   1.00  -0.18
M  -0.12  -0.12  -0.12  -0.12  -0.18   0.41  -0.23  -0.23  -0.18   0.27   0.27  -0.18   1.00

C(t) (after iterating to convergence):
    A   B   C   D   E   F   G   H   I   J   K   L   M
A   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
B   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
C   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
D   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
E   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
F  -1  -1  -1  -1  -1   1  -1  -1   1   1  -1   1   1
G   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
H   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
I  -1  -1  -1  -1  -1   1  -1  -1   1   1  -1   1   1
J  -1  -1  -1  -1  -1   1  -1  -1   1   1  -1   1   1
K   1   1   1   1   1  -1   1   1  -1  -1   1  -1  -1
L  -1  -1  -1  -1  -1   1  -1  -1   1   1  -1   1   1
M  -1  -1  -1  -1  -1   1  -1  -1   1   1  -1   1   1

+1, -1: two communities! {A, B, C, D, E, G, H, K} and {F, I, J, L, M}
(The slide shows C(t) next to C(1), which is the same correlation matrix as on the previous slide.)
CONCOR Summary
• Node similarity by correlation
• Iterate to stability, bisect graph
• Repeat!
• Issues:
– Meaningful bisection?
– How many bisections?
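The CONCOR loop can be sketched on the 13-node example. Correlate the rows of the adjacency matrix C(0) to get C(1), correlate again, and repeat until the entries stop changing, then bisect by sign. This is a minimal pure-Python sketch. The tolerance, the iteration cap, and the choice to split on the signs in node A's row are assumptions. Zero-variance rows are treated as uncorrelated to avoid dividing by zero.

```python
from math import sqrt

# 13-node example network (edges read off the adjacency matrix C(0))
NODES = "ABCDEFGHIJKLM"
EDGES = ["AB", "CE", "DE", "FG", "FH", "GI", "GJ", "HJ", "HM", "IK", "JK", "KL", "LM"]


def adjacency(nodes, edges):
    """Symmetric 0/1 adjacency matrix (list of lists) for an undirected graph."""
    index = {v: i for i, v in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = A[index[v]][index[u]] = 1
    return A


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlate(M):
    """Entry (i, j) is the correlation between row i and row j of M."""
    return [[pearson(r, s) for s in M] for r in M]


def concor_bisect(A, tol=1e-6, max_iter=200):
    """Iterate correlations starting from C(1) until stable, then split by sign."""
    C = correlate(A)  # C(1)
    for _ in range(max_iter):
        nxt = correlate(C)
        change = max(abs(a - b) for r, s in zip(C, nxt) for a, b in zip(r, s))
        C = nxt
        if change < tol:
            break
    # Bisect: nodes positively correlated with the first node vs. the rest
    first = [j for j in range(len(C)) if C[0][j] > 0]
    second = [j for j in range(len(C)) if C[0][j] <= 0]
    return first, second


A = adjacency(NODES, EDGES)
C1 = correlate(A)
print(round(C1[NODES.index("F")][NODES.index("J")], 2))  # 0.78, matching C(1)
first, second = concor_bisect(A)
print([NODES[j] for j in first], [NODES[j] for j in second])  # compare with C(t)
```

Each bisection would then be repeated on the two halves. Deciding when to stop is one of the open issues listed in the summary.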
Edge Removal: Girvan-Newman Method

Edge Removal (Girvan-Newman 2002)

• Local bridges connect weakly interacting parts of the network
– Remove local bridges (weak ties)!
• Questions:
– Many bridges? No bridges?
– Rank edges by betweenness (or centrality, etc.)
Edge Removal: Betweenness
• Each pair of nodes v and u sends a flow of 1 between them
– The flow is divided equally among the shortest paths from v to u
• Dfn: The betweenness of edge e is the total flow through e, summed over all pairs.
INDIVIDUAL EXERCISE:

What is the betweenness of
a) Edge 7-8?
b) Edge 8-9?
Girvan-Newman Method
1. Calculate betweenness of all edges
2. Cut (remove the edge with max betweenness)
3. Repeat!
Computing Betweenness (E&K 3.6.B)

1. BFS from each node v (not just A)
2. Count the # of shortest paths from v to each node
3. Propagate flow:
A. At each node, count the flow from below, plus 1
B. Split that flow up among the parent nodes according to their # of shortest paths
• Repeat for each node and sum the flow on each edge

[Figure: BFS from node A, annotated with the # of shortest paths to each node (1, 1, 1, 1; 2, 1, 2; 3, 3; 6) and the flow propagated along each edge]
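The three steps can be sketched as a breadth-first search from every node, followed by a bottom-up flow propagation. This is a minimal sketch in which the graph is a dict of neighbor sets. Every pair is counted once from each endpoint, so the summed flows are halved at the end. The two-triangle graph is a hypothetical example.

```python
from collections import deque


def edge_betweenness(adj):
    """adj: dict node -> set of neighbors (undirected graph).

    Returns {frozenset({u, v}): betweenness} for every edge.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Steps 1-2: BFS from s, counting shortest paths (sigma) to each node
        dist, sigma, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    sigma[w] = 0
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
        # Step 3: propagate flow upward, deepest nodes first
        flow = {v: 1.0 for v in order}  # A. each node contributes its own unit
        for w in reversed(order):
            for u in adj[w]:
                if dist[u] == dist[w] - 1:  # u is a parent of w
                    share = flow[w] * sigma[u] / sigma[w]  # B. split by # shortest paths
                    bet[frozenset((u, w))] += share
                    flow[u] += share
    # Each pair {v, u} was counted once from v and once from u
    return {e: b / 2 for e, b in bet.items()}


# Hypothetical example: two triangles {1,2,3} and {4,5,6} joined by the bridge 3-4
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
eb = edge_betweenness(adj)
print(eb[frozenset((3, 4))])  # 9.0: all 3 x 3 cross pairs use the bridge
```

The bridge has by far the highest betweenness, so it is the first edge the Girvan-Newman method would cut.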
Girvan-Newman Method
1. Calculate betweenness of all edges
2. Cut (remove the edge with max betweenness)
3. Repeat!

• When do we stop?
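Modularity (Stopping Criterion)
• Intuition: more edges inside a community than expected by random chance

$$ Q = \frac{1}{2m} \sum_{v,u} \left[ \mathbf{1}_{v,u} - \frac{k_v k_u}{2m} \right] \delta(c_v, c_u) $$

– 1_{v,u} = 1 if there is an edge between v and u (0 otherwise)
– k_v k_u / 2m: expected number of edges between v and u in a random version of the network (preview); k_v is the degree of v
– m: total number of edges (the sum is divided by 2m)
– c_v: the community v is assigned to; δ(c_v, c_u) = 1 if c_v = c_u (0 otherwise)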
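The modularity formula can be evaluated directly as a double sum over ordered node pairs. This is a minimal sketch that assumes a simple undirected graph given as an edge list (no duplicate edges or self-loops). The function name and the two-triangle graph are hypothetical.

```python
def modularity(edges, communities):
    """Q = (1/2m) * sum over ordered pairs (v, u) of [1_{v,u} - k_v k_u / 2m] * delta(c_v, c_u)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    m = len(edges)
    community_of = {v: idx for idx, c in enumerate(communities) for v in c}
    q = 0.0
    for v in adj:
        for u in adj:
            if community_of[v] == community_of[u]:  # delta(c_v, c_u) = 1
                a_vu = 1.0 if u in adj[v] else 0.0  # 1_{v,u}
                q += a_vu - len(adj[v]) * len(adj[u]) / (2 * m)
    return q / (2 * m)


# Hypothetical example: two triangles joined by the bridge 3-4
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(round(modularity(EDGES, [{1, 2, 3}, {4, 5, 6}]), 4))  # 0.3571 (= 5/14)
print(round(modularity(EDGES, [{1, 2, 3, 4}, {5, 6}]), 4))  # 0.1224
```

Splitting at the bridge scores highest. Putting every node in a single community gives Q = 0, since internal edges then exactly match the random expectation.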
Edge removal summary
• Remove edges
– E.g., pick by betweenness
– This bisects the graph
• Modularity as stopping criterion
• In practice:
– OK for several thousand nodes
– Bigger networks: betweenness must be approximated
• Hierarchical clustering
• Stochastic Block Models
– Brief review of probability theory
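The edge-removal summary can be put together into one loop. Repeatedly remove the highest-betweenness edge, and use modularity, computed on the original graph, to pick which of the resulting partitions to report. This is a minimal sketch under stated assumptions. It runs until no edges remain and keeps the best partition seen, rather than stopping at the first drop in Q. Recomputing betweenness after every cut is what makes the method slow on large graphs. The two-triangle graph is a hypothetical example.

```python
from collections import deque


def neighbors(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Flow-based edge betweenness: BFS from every node, then propagate flow upward."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, order, queue = {s: 0}, {s: 1}, [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w], sigma[w] = dist[u] + 1, 0
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
        flow = {v: 1.0 for v in order}
        for w in reversed(order):
            for u in adj[w]:
                if dist[u] == dist[w] - 1:
                    share = flow[w] * sigma[u] / sigma[w]
                    bet[frozenset((u, w))] += share
                    flow[u] += share
    return {e: b / 2 for e, b in bet.items()}


def components(adj):
    """Connected components, found by depth-first search."""
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp, stack = set(), [s]
            while stack:
                v = stack.pop()
                if v not in comp:
                    comp.add(v)
                    stack.extend(adj[v] - comp)
            seen |= comp
            comps.append(comp)
    return comps


def modularity(adj, m, comps):
    """Q of partition comps, evaluated on the ORIGINAL graph (adj, m)."""
    q = 0.0
    for c in comps:
        for v in c:
            for u in c:
                q += (1.0 if u in adj[v] else 0.0) - len(adj[v]) * len(adj[u]) / (2 * m)
    return q / (2 * m)


def girvan_newman(edges):
    original, m = neighbors(edges), len(edges)
    current = {v: set(nbrs) for v, nbrs in original.items()}  # working copy
    best_q, best = float("-inf"), None
    while True:
        comps = components(current)
        q = modularity(original, m, comps)
        if q > best_q:  # keep the first partition with the highest Q
            best_q, best = q, comps
        bet = edge_betweenness(current)  # 1. betweenness of all remaining edges
        if not bet:
            break  # no edges left
        u, v = tuple(max(bet, key=bet.get))  # 2. cut the max-betweenness edge
        current[u].discard(v)
        current[v].discard(u)
    return best_q, sorted(sorted(c) for c in best)


EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
q, parts = girvan_newman(EDGES)
print(parts)        # [[1, 2, 3], [4, 5, 6]]
print(round(q, 4))  # 0.3571
```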
Intuition: Hierarchical Clustering
• Node similarity (distance) metric
• Find communities, starting from the bare nodes (no edges):
– Apply threshold t_0 and add edges between nodes that are close enough
– Draw the graph of communities G(t_0)
– Community = component of G(t_0)
• Decrease (increase) the threshold and repeat, yielding G(t_{n+1})
Hier. Clustering Example (1)
• Node distance: Manhattan distance (exclude 2 nodes)

     1   2   3   4   5   6   7   8   9  10
 1   1   1   1   1   0   0   0   0   0   0
 2   1   1   1   0   0   0   0   0   0   0
 3   1   1   1   0   0   0   0   0   0   0
 4   1   0   0   1   1   0   0   0   0   0
 5   0   0   0   1   1   1   0   0   0   0
 6   0   0   0   0   1   1   1   0   1   0
 7   0   0   0   0   0   1   1   1   1   0
 8   0   0   0   0   0   0   1   1   1   0
 9   0   0   0   0   0   1   1   1   1   0
10   0   0   0   0   0   0   0   0   0   1

[Figure: graph G(0) and partition Π(0) at distance = 0]

Hier. Clustering Example (2)
• Node distance: Manhattan distance (exclude 2 nodes), on the same matrix as in Example (1)

[Figure: graph G(1) and partition Π(1) at distance = 1]
INDIVIDUAL EXERCISE:

What is Π(2)? (Use the same matrix as in Example (1).)
Hier. Clustering Example (3)
• Node distance: Manhattan distance (exclude 2 nodes), on the same matrix as in Example (1)

[Figure: graph G(2) and partition Π(2) at distance = 2]
Cluster Dendrogram
• Clusters merge over time
• Issues:
– When do you stop?
(Just report the tree?
Stopping criteria?)
– Ignoring structure of G(t)
[Figure: cluster dendrogram]
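The threshold procedure can be sketched on the matrix from Example (1). Compute pairwise distances, link every pair whose distance is at most t, and report the connected components. This sketch rests on an assumption: it reads "exclude 2 nodes" as skipping columns i and j when comparing rows i and j, as in the structural-equivalence definition. The slides may intend something else, and the function names are hypothetical.

```python
# Matrix from Hier. Clustering Example (1); row/column r corresponds to node r + 1
M = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]


def distance(M, i, j):
    """Manhattan distance between rows i and j.

    ASSUMPTION: "exclude 2 nodes" is read as skipping columns i and j.
    """
    return sum(abs(M[i][k] - M[j][k]) for k in range(len(M)) if k not in (i, j))


def threshold_partition(M, t):
    """Pi(t): link every pair at distance <= t, then return connected components."""
    n = len(M)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if distance(M, i, j) <= t:
                parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v + 1)  # 1-based node labels
    return sorted(groups.values())


print(threshold_partition(M, 0))  # [[1], [2, 3], [4], [5], [6], [7, 9], [8], [10]]
```

Calling threshold_partition for t = 0, 1, 2, … traces the merges that the dendrogram records. The function itself does not decide where to stop, which is the issue raised above.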

STOCHASTIC BLOCK MODELS

Probabilities!
Generative vs. Discriminative
• Given network G, find the best partition Π*:

$$ \Pi^* = \arg\max_{\Pi} P(\Pi \mid G) $$

Generative models
• Use Bayes' rule: P(Π | G) = P(G | Π) · P(Π) / P(G)
• Model the generation of G
• Estimate parameters for the model from data

Discriminative models
• Directly calculate P(Π | G)
• Discriminate between possible Π values
• Define features, find patterns implying Π
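General formulation

$$ L(G \mid \Pi, \eta) = \prod_{ij \in G} \eta_{\Pi(i)\Pi(j)} \prod_{ij \notin G} \left(1 - \eta_{\Pi(i)\Pi(j)}\right) $$

– L: likelihood function, i.e., the prob. of the edges observed in G, given a possible partition Π (Π(i) is the community of i) and assuming probability parameters η
– First product: prob. of producing the edges observed in G
– Second product: prob. of not producing the edges that weren't observed in G
– Parameters η: 1 prob. per i,j pair!

TOO ABSTRACT!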
Simplify: In- or Out-block
• General: η → 1 prob. per i,j pair
• Copic/Jackson/Kirman: η → p_in, p_out
– p_in: prob. of a link within a community
– p_out: prob. of a link outside a community
• Pairs in the same community under Π (Π(i) is the community of i):

$$ In(\Pi) = \{\, ij \ \text{s.t.}\ \Pi(i) = \Pi(j) \,\} $$

• # links in communities under Π:

$$ T_{in}(G, \Pi) = |G \cap In(\Pi)| $$
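On the generative side, the in/out simplification fully specifies how a graph is produced from a partition. Each pair is linked independently with probability p_in if both nodes share a community, and with probability p_out otherwise. This is a minimal sampling sketch. The function name and parameters are hypothetical, and setting p_in = 1, p_out = 0 makes the output deterministic for illustration.

```python
import random
from itertools import combinations


def sample_in_out_graph(partition, p_in, p_out, seed=None):
    """Generate G from the in/out block model.

    Each pair (i, j) is linked independently with prob. p_in if i and j are
    in the same community, else with prob. p_out.
    """
    rng = random.Random(seed)
    block = {v: b for b, nodes in enumerate(partition) for v in nodes}
    edges = []
    for i, j in combinations(sorted(block), 2):
        p = p_in if block[i] == block[j] else p_out
        if rng.random() < p:
            edges.append((i, j))
    return edges


print(sample_in_out_graph([[1, 2, 3], [4, 5]], 1.0, 0.0))
# [(1, 2), (1, 3), (2, 3), (4, 5)]
```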
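Maximum Likelihood Algorithm

$$ L(G \mid \Pi, \eta) = \prod_{ij \in G} \eta_{\Pi(i)\Pi(j)} \prod_{ij \notin G} \left(1 - \eta_{\Pi(i)\Pi(j)}\right) $$

With probability p_in for pairs i and j within a community and p_out for pairs i and j not in a community:

$$ L(G \mid \Pi, p_{in}, p_{out}) = p_{in}^{T_{in}(G,\Pi)} (1 - p_{in})^{|In(\Pi)| - T_{in}(G,\Pi)} \; p_{out}^{T_{out}(G,\Pi)} (1 - p_{out})^{|Out(\Pi)| - T_{out}(G,\Pi)} $$

where Out(Π) is the set of pairs in different communities and T_out(G,Π) = |G ∩ Out(Π)|.
• Take the log of both sides and rearrange:

$$ l(G \mid \Pi, p_{in}, p_{out}) = (k_1 - k_3)\,|In(\Pi)| + (k_2 - k_4)\,T_{in}(G,\Pi) + r $$

with k_1 = log(1 − p_in), k_2 = log(p_in / (1 − p_in)), k_3 = log(1 − p_out), k_4 = log(p_out / (1 − p_out)), and r = |G| k_4 + N k_3, where N is the total number of node pairs. None of these depend on Π, so scoring a partition needs only two simple calculations: |In(Π)|, the # of pairs in communities, and T_in(G,Π), the # of links in communities.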
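Because the log-likelihood depends on Π only through a few counts, scoring a candidate partition is cheap. This is a minimal sketch that evaluates the p_in/p_out log-likelihood directly from |In(Π)|, T_in, |Out(Π)|, and T_out. The two-triangle graph, the function names, and the values p_in = 0.9, p_out = 0.1 are illustrative assumptions.

```python
from itertools import combinations
from math import log


def in_out_counts(edges, partition):
    """Return |In(Pi)|, T_in(G, Pi), |Out(Pi)|, T_out(G, Pi)."""
    block = {v: b for b, nodes in enumerate(partition) for v in nodes}
    n_in = sum(1 for i, j in combinations(block, 2) if block[i] == block[j])
    n_out = len(block) * (len(block) - 1) // 2 - n_in
    t_in = sum(1 for i, j in edges if block[i] == block[j])
    t_out = len(edges) - t_in
    return n_in, t_in, n_out, t_out


def log_likelihood(edges, partition, p_in, p_out):
    """l(G | Pi, p_in, p_out); assumes 0 < p_in < 1 and 0 < p_out < 1."""
    n_in, t_in, n_out, t_out = in_out_counts(edges, partition)
    return (t_in * log(p_in) + (n_in - t_in) * log(1 - p_in)
            + t_out * log(p_out) + (n_out - t_out) * log(1 - p_out))


# Hypothetical graph: two triangles joined by the bridge 3-4
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
split = [[1, 2, 3], [4, 5, 6]]
lumped = [[1, 2, 3, 4, 5, 6]]
print(in_out_counts(EDGES, split))  # (6, 6, 9, 1)
print(log_likelihood(EDGES, split, 0.9, 0.1) > log_likelihood(EDGES, lumped, 0.9, 0.1))  # True
```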
Implementing ML: Issues
• Estimate pin and pout alongside Π
– Solution: Pick pin and pout, build Π,
then estimate pin and pout… iterate!
• Exponential blowup of possible
Π… can’t test all
– Solution: Approximate which Π to
explore
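The "estimate p_in and p_out" half of the iteration has a closed form. For a fixed Π, the likelihood is maximized by the observed link densities, p_in = T_in/|In(Π)| and p_out = T_out/|Out(Π)|. This is a minimal sketch of that step only. The search over Π, where the exponential blowup bites, is left out, and the names and example graph are hypothetical.

```python
from itertools import combinations


def estimate_p(edges, partition):
    """ML estimates for a fixed Pi: p_in = T_in/|In(Pi)|, p_out = T_out/|Out(Pi)|."""
    block = {v: b for b, nodes in enumerate(partition) for v in nodes}
    pairs = list(combinations(block, 2))
    n_in = sum(1 for i, j in pairs if block[i] == block[j])
    n_out = len(pairs) - n_in
    t_in = sum(1 for i, j in edges if block[i] == block[j])
    t_out = len(edges) - t_in
    p_in = t_in / n_in if n_in else 0.0  # guard: no within-community pairs
    p_out = t_out / n_out if n_out else 0.0  # guard: no between-community pairs
    return p_in, p_out


# Hypothetical graph: two triangles joined by the bridge 3-4
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(estimate_p(EDGES, [[1, 2, 3], [4, 5, 6]]))  # (1.0, 0.1111111111111111)
```

In the alternating scheme, these estimates would feed back into the next search over Π, and the two steps repeat until the partition stops changing.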
Latent Space Estimation
• Copic/Jackson/Kirman: η → pin, pout
• Other bases for probabilities?
– Define space S
– Any info about nodes, links, groups!
L(G | S)
