Week 2 - Social Network Analysis

This document discusses community detection in networks. It defines a community as a subset of densely connected nodes within a graph. Community detection algorithms aim to identify these communities by optimizing metrics like modularity. The document describes several popular community detection algorithms, including edge-betweenness clustering, Louvain modularity, and InfoMap. It explains how each algorithm works and compares their benefits and limitations.

Uploaded by

Jasmine Zhu
Copyright
© © All Rights Reserved

SOCIAL NETWORK ANALYSIS – WEEK 2

Kanchana Padmanabhan

1 10/17/22 This is the slide footer, which can be edited under Header & Footer, along with slide numbers and date
Community Structure

A community is defined as a subset of nodes within the graph such that


connections between the nodes are denser than connections with the rest
of the network

2
Community Structure - What does it mean in real-world networks?

• Friends on social media with similar interests


• Neighborhoods that are likely to be affected by a hurricane
• People that are likely to be affected by a contagious disease

3
Community Structure - Why do we care?

• Allows us to study large-scale networks, since individual communities act like meta-nodes
• In biology, individual communities also shed light on the function of the system
• Affects information spreading
• Affects disease spreading
• Prediction of missing links or identification of false links in the network

4
Clustering vs. Community Detection

• Community detection – groups real-world entities by similarity


• Clustering – only cares about structural properties (e.g., min-cut)
• For our class, the two terms are one and the same

5
What is Community Detection?

The process of dividing the nodes of a graph into (possibly overlapping) subsets, where the nodes in each subset are considered related by some similarity measure

[Figure: the same set of objects grouped by shape (2 communities) and grouped by color (3 communities)]

6
Community Detection

• Finding Communities in undirected, directed, and weighted graphs


• Betweenness Based
• Louvain Modularity
• InfoMap

• Evaluating Results

7
Betweenness Centrality - Recap
15

A measure of the degree to which a given node (or edge) lies on the shortest paths (geodesics) between other nodes in the graph.

Two types
• Vertex Betweenness
• Edge Betweenness
Edge Betweenness

The number of shortest paths in the graph G that pass through a given edge, e.g. (S, B)

Sharon (S) and Bob (B) both study at NCSU, and they are the only link between the NY DANCE and CISCO groups, so every shortest path between the two groups passes through the edge (S, B)
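As a rough illustration of the definition, the sketch below counts shortest paths per edge with a breadth-first search from every node (Brandes-style accumulation). The six-node graph and the member names d1, d2, c1, c2 are assumptions made up for this example; only the Sharon–Bob bridge comes from the slide.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path count per edge of an unweighted, undirected graph.

    adj maps each node to the set of its neighbours. If several shortest
    paths join a pair of nodes, each path contributes a fractional share.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, number of shortest paths, predecessors.
        dist, sigma = {s: 0}, {s: 1.0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing path shares onto edges.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was visited from both of its endpoints.
    return {edge: total / 2.0 for edge, total in score.items()}

# Hypothetical graph: two triangles joined only by the Sharon-Bob edge.
edges = [("d1", "d2"), ("d1", "Sharon"), ("d2", "Sharon"),
         ("Sharon", "Bob"),
         ("Bob", "c1"), ("Bob", "c2"), ("c1", "c2")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = edge_betweenness(adj)
bridge = max(scores, key=scores.get)
```

In this toy graph the bridge carries all 3 × 3 = 9 shortest paths between the two groups, which is why it ends up with the highest score.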

9
Edge-Betweenness Clustering

• Also called Girvan & Newman Clustering


• Michelle Girvan and Mark Newman developed the algorithm in 2002
• Edges with high betweenness form good starting points to identify communities

10
Girvan & Newman Clustering

Input: graph G

• Compute the edge betweenness of every edge
• Find the edge with the largest betweenness (in the slide's example, edge (3,4) with value 0.571)
• Remove that edge from the graph

Repeat the steps above until the highest edge betweenness is below a threshold μ or until you have k communities
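The loop above can be sketched in plain Python as follows. This is a minimal version under stated assumptions: the graph is unweighted and undirected, the stopping rule is "k communities" (the μ threshold is not implemented), and the six-node graph is a made-up example, not the graph from the slide.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path count per edge (BFS from every node, unweighted graph)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma = {s: 0}, {s: 1.0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return {edge: total / 2.0 for edge, total in score.items()}

def components(adj):
    """Connected components, each returned as a set of nodes."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in group:
                group.add(v)
                stack.extend(adj[v] - group)
        seen |= group
        groups.append(group)
    return groups

def girvan_newman(adj, k):
    """Remove the highest-betweenness edge until the graph has k components."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while len(components(adj)) < k:
        scores = edge_betweenness(adj)  # recomputed after every removal
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge (3,4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

communities = girvan_newman(adj, k=2)
```

Betweenness is recomputed after each removal because deleting one edge reroutes the shortest paths that used it.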

11
Louvain Modularity

• Developed by Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre in 2008
• Method was devised when they all were at the Université catholique de Louvain
• Undirected and Weighted Graphs
• Variant was later proposed for Directed Graphs

12
Louvain Modularity - How does it work?

• Greedy optimization method that attempts to optimize the "modularity" metric


• Modularity measures the density of connections within clusters compared to the density of connections between clusters
• The goal is to maximize the modularity of the entire graph

13
Modularity

• Formalized by M. E. J. Newman in a 2006 PNAS paper


• It was designed to measure the strength of division of a network into modules (also called groups, clusters or communities)
• Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes
in different modules.
• Biological networks exhibit a high degree of modularity

14
Modularity - Formula

• The value of the modularity lies between -1/2 and 1 (the range is often quoted more loosely as -1 to 1)


• Maximizing modularity exactly is computationally hard (the problem is NP-hard).
• Blondel et al. created a fast and scalable greedy algorithm

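$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

$A_{ij}$ – edge weight between nodes $i$ and $j$
$k_i$ – sum of the weights of the edges attached to node $i$
$\delta(c_i, c_j) = \begin{cases} 1, & c_i = c_j \\ 0, & c_i \neq c_j \end{cases}$
$m$ – the sum of all the edge weights in the graph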
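To make the formula concrete, here is a small sketch that evaluates it literally, as a double sum over node pairs. The function name, the edge-dictionary input format and the six-node graph are assumptions for illustration; self-loops are assumed absent.

```python
def modularity(weights, community):
    """Evaluate Q = 1/(2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j).

    weights:   dict {(i, j): w}, each undirected edge listed once
    community: dict node -> community label
    """
    m = float(sum(weights.values()))
    A, k = {}, {}
    for (i, j), w in weights.items():
        # Store the symmetric matrix entries and accumulate weighted degrees.
        A[(i, j)] = A.get((i, j), 0.0) + w
        A[(j, i)] = A.get((j, i), 0.0) + w
        k[i] = k.get(i, 0.0) + w
        k[j] = k.get(j, 0.0) + w
    q = 0.0
    for i in k:
        for j in k:
            if community[i] == community[j]:  # delta(c_i, c_j) = 1
                q += A.get((i, j), 0.0) - k[i] * k[j] / (2.0 * m)
    return q / (2.0 * m)

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge (3,4).
weights = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0, (3, 4): 1.0,
           (4, 5): 1.0, (4, 6): 1.0, (5, 6): 1.0}

split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
merged = {node: "a" for node in split}

q_split = modularity(weights, split)    # 5/14, about 0.357
q_merged = modularity(weights, merged)  # 0: one big community scores nothing
```

The double loop costs O(n²) and is meant only to mirror the formula; practical implementations sum per community instead.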

15
Louvain Modularity (Two Phases)

Start with all nodes in their own community


Repeat until there is no change in modularity or the change is less than some threshold θ
• Optimization Phase:
• In some random order, compare each node with each of its neighbors and keep track of the change in modularity.
• Move each node into the community of the neighbor that provides the best improvement in modularity. Only a positive change in modularity leads to a move.
• A node may also stay where it is and not merge with any neighbor.

• Aggregation Phase:
• Create a new network where the communities identified in the optimization phase become the new nodes.
• The weight of the connection between two communities is the sum of the weights of all the edges between the nodes in the two corresponding communities.
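The two phases can be sketched as below. This is a simplified teaching version, not the authors' implementation: it visits nodes in a fixed order rather than a random one (so the result is reproducible), and it recomputes the full modularity for every trial move instead of using the fast incremental gain. The graph is stored as a symmetric weight matrix (dict of dicts), and all function names are made up for this example.

```python
def modularity(A, comm):
    """Q for a symmetric weight matrix A (dict of dicts) and node -> community."""
    two_m = sum(sum(row.values()) for row in A.values())
    inside = sum(w for i, row in A.items() for j, w in row.items()
                 if comm[i] == comm[j])
    totals = {}
    for i, row in A.items():
        totals[comm[i]] = totals.get(comm[i], 0.0) + sum(row.values())
    return inside / two_m - sum(t * t for t in totals.values()) / two_m ** 2

def optimize(A):
    """Optimization phase: move nodes to a neighbour's community while Q improves."""
    comm = {i: i for i in A}  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for i in A:  # fixed order here; the slides suggest a random order
            best_c, best_q = comm[i], modularity(A, comm)
            for j in A[i]:
                if comm[j] == comm[i]:
                    continue
                old, comm[i] = comm[i], comm[j]  # trial move
                q = modularity(A, comm)
                comm[i] = old                    # undo the trial
                if q > best_q + 1e-12:           # only positive gains count
                    best_c, best_q = comm[j], q
            if best_c != comm[i]:
                comm[i] = best_c
                improved = True
    return comm

def aggregate(A, comm):
    """Aggregation phase: communities become nodes, edge weights are summed."""
    B = {}
    for i, row in A.items():
        B.setdefault(comm[i], {})
        for j, w in row.items():
            B[comm[i]][comm[j]] = B[comm[i]].get(comm[j], 0.0) + w
    return B

def louvain(A, theta=1e-9):
    membership = {i: i for i in A}  # original node -> current community
    q_old = modularity(A, membership)
    while True:
        comm = optimize(A)
        A = aggregate(A, comm)
        membership = {v: comm[c] for v, c in membership.items()}
        q_new = modularity(A, {c: c for c in A})
        if q_new - q_old <= theta:
            return membership, q_new
        q_old = q_new

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge (3,4).
A = {}
for u, v in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]:
    A.setdefault(u, {})[v] = 1.0
    A.setdefault(v, {})[u] = 1.0

membership, q = louvain(A)
```

Storing the aggregated graph as a full matrix (internal weight lands on the diagonal, counted in both directions) keeps the modularity of the coarse graph equal to that of the original partition.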

16
Louvain Modularity

17
Modularity - Simplified

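$$Q = \sum_{c=1}^{n} \left[ \frac{L_c}{m} - \left( \frac{k_c}{2m} \right)^2 \right]$$

$n$ – the number of communities
$L_c$ – sum of the edge weights between nodes within community $c$ (each edge counted once; the pairwise sum over $i$ and $j$ visits each internal edge twice, which is why $\frac{1}{2m}$ becomes $\frac{1}{m}$)
$k_c$ – sum of all edge weights for nodes within community $c$ (including edges which link to other communities)
$m$ – the sum of all the edge weights in the graph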
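For readers who want to see where the simplified form comes from, the following derivation groups the pairwise sum by community. It assumes $L_c$ denotes the total weight of edges inside community $c$ with each edge counted once, so that the double sum over ordered pairs in $c$ (which considers each edge twice) equals $2L_c$. The numeric example uses a made-up graph of two triangles joined by a single edge.

```latex
\begin{align*}
Q &= \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \\
  &= \sum_{c} \left[ \frac{1}{2m} \sum_{i,j \in c} A_{ij}
     - \frac{1}{(2m)^2} \Big( \sum_{i \in c} k_i \Big) \Big( \sum_{j \in c} k_j \Big) \right] \\
  &= \sum_{c} \left[ \frac{2 L_c}{2m} - \left( \frac{k_c}{2m} \right)^2 \right]
   = \sum_{c} \left[ \frac{L_c}{m} - \left( \frac{k_c}{2m} \right)^2 \right]
\end{align*}

% Worked example: two triangles joined by one edge, unit weights.
% m = 7; each triangle has L_c = 3 and k_c = 2 + 2 + 3 = 7.
\[
Q = 2 \left[ \frac{3}{7} - \left( \frac{7}{14} \right)^2 \right]
  = 2 \left[ \frac{3}{7} - \frac{1}{4} \right] = \frac{5}{14} \approx 0.357
\]
```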

18
Louvain Modularity
19

Pros
• Steps are intuitive and easy to implement, and the outcome is unsupervised
• The algorithm is extremely fast; computer simulations on large ad-hoc modular networks suggest that its complexity is linear on typical and sparse data
• Possible gains in modularity are easy to compute
• The number of communities decreases drastically after just a few passes

Cons
• Might assign outliers to the closest community
Resolution parameter

• Most implementations have a resolution parameter $\lambda$


• $\lambda < 1$ – lower resolution = more but smaller communities
• $\lambda > 1$ – higher resolution = fewer but larger communities

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{1}{\lambda} \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
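The effect of the parameter can be checked on a toy graph. The sketch below assumes the convention in the formula above, where the expected-weight term is divided by λ; some libraries instead multiply that term by their resolution parameter, which reverses the direction, so it is worth checking the documentation of whichever implementation is used. The function name and graph are hypothetical.

```python
def modularity(edges, community, lam=1.0):
    """Per-community modularity with the null-model term scaled by 1/lam.

    edges:     list of (u, v) pairs, unweighted and undirected
    community: dict node -> community label
    """
    m = float(len(edges))
    degree, inside, total = {}, {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            inside[c] = inside.get(c, 0) + 1   # L_c, each edge counted once
    for node, k in degree.items():
        c = community[node]
        total[c] = total.get(c, 0) + k         # k_c, total degree of c
    return sum(inside.get(c, 0) / m - (total[c] / (2.0 * m)) ** 2 / lam
               for c in total)

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge (3,4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
merged = {node: "a" for node in split}

# lam = 1: the two-triangle split scores higher than one big community.
prefers_split_at_1 = modularity(edges, split, 1.0) > modularity(edges, merged, 1.0)
# lam = 4: the penalty shrinks and the single large community wins.
prefers_merged_at_4 = modularity(edges, merged, 4.0) > modularity(edges, split, 4.0)
```

For this graph the preference flips at λ = 3.5: below it the split has the higher score, above it the merged partition does.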

20
InfoMap

• The core of the algorithm is identical to the procedure of Blondel et al.


• The algorithm repeats the same two phases (optimization and aggregation) until an objective function is optimized.
• Proposed by Rosvall and Bergstrom in 2008
• The objective function to be optimized is called the map equation.

21
Map Equation

• A random walker moves randomly from object to object in the network.


• The larger the weight of a connection, the more likely the random walker is to use that connection to reach the next object.
• The goal is to form clusters in which the random walker stays for a long time.
• This happens when the weights of the connections within a cluster are greater than the weights of the connections between objects of different clusters.
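The intuition behind the random walker can be illustrated numerically. The sketch below is not the map equation itself; it only computes, for an undirected weighted graph, the walker's step probabilities and the chance that a walker currently inside a cluster leaves it on the next step (cut weight divided by the total weighted degree of the cluster, assuming the walk has reached its stationary distribution). The weights and function names are invented for this example.

```python
def transition_probabilities(weights, node):
    """Probability of stepping from `node` to each neighbour, by edge weight."""
    out = {}
    for (u, v), w in weights.items():
        if u == node:
            out[v] = out.get(v, 0.0) + w
        elif v == node:
            out[u] = out.get(u, 0.0) + w
    total = sum(out.values())
    return {nbr: w / total for nbr, w in out.items()}

def exit_probability(weights, cluster):
    """Chance that a walker inside `cluster` leaves it on the next step."""
    leaving, volume = 0.0, 0.0
    for (u, v), w in weights.items():
        u_in, v_in = u in cluster, v in cluster
        if u_in:
            volume += w       # weighted degree contributed by endpoint u
        if v_in:
            volume += w       # weighted degree contributed by endpoint v
        if u_in != v_in:
            leaving += w      # edge crosses the cluster boundary
    return leaving / volume

# Hypothetical graph: heavy triangles (weight 3) joined by a light bridge (weight 1).
weights = {(1, 2): 3.0, (1, 3): 3.0, (2, 3): 3.0,
           (3, 4): 1.0,
           (4, 5): 3.0, (4, 6): 3.0, (5, 6): 3.0}

step_from_3 = transition_probabilities(weights, 3)   # 3/7, 3/7 and 1/7
good_cluster = exit_probability(weights, {1, 2, 3})  # 1/19: walker rarely leaves
poor_cluster = exit_probability(weights, {3, 4})     # 12/14: walker leaves quickly
```

A grouping with a low exit probability is one where the walker "stays for a long time", which is the kind of cluster the map equation rewards.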

22
InfoMap - Benefits

• The algorithm is also fast, but Louvain is faster


• InfoMap can potentially leave outliers out of communities

23
Evaluation of Algorithms - What are we evaluating?

• There is typically no ground truth to evaluate against


• You can manually look at a few examples to see if they make sense (you should do that anyway!)
• You can overlay additional information (e.g., a word cloud from posts/tweets)
• You can use metrics that evaluate the quality of the community structure (a good way to compare multiple algorithms)

24
Evaluation of Quality

Given communities $C = \{S_1, S_2, S_3, \dots\}$

$$Q(C) = \frac{\sum_i f(S_i)}{|C|}$$

where $f(\cdot)$ is some quality metric

25
Quality Metrics

• Internal connectivity (Maximize)


• Number of edges within a community (Density)
• Average degree within the community

• External connectivity (Minimize)


• Number of edges with nodes outside of the community (Expansion)

• Hybrid functions
• Ratio of Expansion to (Density + Expansion) (Conductance)

This section shows formulas for undirected graphs. They can be extended to directed and weighted graphs.

26
Internal Connectivity

Given a community $S$
$n_S$ – the number of nodes in $S$
$m_S$ – the number of edges in $S$

Internal Density: $f(S) = \frac{2 m_S}{n_S (n_S - 1)}$

Average Degree: $f(S) = \frac{2 m_S}{n_S}$

Large clusters (> 10 nodes) tend to get progressively sparser
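A quick way to check the two internal-connectivity formulas is to compute them on a small community. The helper names and the graph below are assumptions for illustration; the graph is unweighted and undirected, and the community is assumed to have at least two nodes.

```python
def community_counts(edges, S):
    """Return (n_S, m_S): nodes in S and edges with both endpoints in S."""
    n_s = len(S)
    m_s = sum(1 for u, v in edges if u in S and v in S)
    return n_s, m_s

def internal_density(edges, S):
    """Fraction of possible internal edges that exist: 2*m_S / (n_S*(n_S-1))."""
    n_s, m_s = community_counts(edges, S)
    return 2.0 * m_s / (n_s * (n_s - 1))

def average_degree(edges, S):
    """Average number of internal neighbours per node: 2*m_S / n_S."""
    n_s, m_s = community_counts(edges, S)
    return 2.0 * m_s / n_s

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge (3,4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

triangle = {1, 2, 3}
density = internal_density(edges, triangle)       # 1.0, every pair is linked
avg_deg = average_degree(edges, triangle)         # 2.0
loose = internal_density(edges, {1, 2, 3, 4})     # 4 of 6 possible edges
```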

27
External Connectivity

Given a community $S$
$n_S$ – the number of nodes in $S$
$c_S$ – the number of edges where only one endpoint of the edge is in $S$

Expansion: $f(S) = \frac{c_S}{n_S}$

28
Hybrid Functions

Given a community $S$
$m_S$ – the number of edges in $S$
$c_S$ – the number of edges where only one endpoint of the edge is in $S$

Conductance: $f(S) = \frac{c_S}{2 m_S + c_S}$
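The external and hybrid metrics, together with the averaged score Q(C) from the "Evaluation of Quality" slide, can be sketched as follows. The function names and the example graph are assumptions; the graph is unweighted and undirected, and each community is assumed to touch at least one edge so that the conductance denominator is non-zero.

```python
def boundary_edges(edges, S):
    """c_S: edges with exactly one endpoint in S."""
    return sum(1 for u, v in edges if (u in S) != (v in S))

def internal_edges(edges, S):
    """m_S: edges with both endpoints in S."""
    return sum(1 for u, v in edges if u in S and v in S)

def expansion(edges, S):
    """Boundary edges per node: c_S / n_S (lower is better)."""
    return boundary_edges(edges, S) / float(len(S))

def conductance(edges, S):
    """Share of edge endpoints in S that lead outside: c_S / (2*m_S + c_S)."""
    c_s = boundary_edges(edges, S)
    m_s = internal_edges(edges, S)
    return c_s / float(2 * m_s + c_s)

def average_quality(edges, communities, f):
    """Q(C): the mean of the metric f over all communities."""
    return sum(f(edges, S) for S in communities) / float(len(communities))

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge (3,4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

good = [{1, 2, 3}, {4, 5, 6}]
bad = [{1, 2}, {3, 4}, {5, 6}]

exp_triangle = expansion(edges, {1, 2, 3})           # 1/3
cond_triangle = conductance(edges, {1, 2, 3})        # 1/7
q_good = average_quality(edges, good, conductance)   # 1/7
q_bad = average_quality(edges, bad, conductance)     # larger, so worse
```

Because conductance is a metric to minimize, the partition with the lower average is the better one when comparing the output of several algorithms.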

29
References

• Images on slides 9, 11, 12, 13 from Nagiza F. Samatova, William Hendrix, John Jenkins, Kanchana Padmanabhan, and Arpan Chakraborty. 2013. Practical Graph Mining with R. Chapman & Hall/CRC.
• Image on slide 15 from Newman, M. E. J. (2006). "Modularity and community structure in networks". Proceedings of the National Academy of Sciences of the United States of America. 103 (23): 8577–8582. arXiv:physics/0602124.
• Image on slide 17 from Blondel, Vincent D.; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment. 2008 (10): P10008.

30