Week 2 - Social Network Analysis
Kanchana Padmanabhan
1 10/17/22 This is the slide footer, which can be edited under Header & Footer, along with slide numbers and date
Community Structure
2
Community Structure - What does it mean in real-world networks?
3
Community Structure - Why do we care?
• Allows us to study large-scale networks, since individual communities act like meta-nodes
• In biology, individual communities also shed light on the function of the system
• Affects information spreading
• Affects disease spreading
• Prediction of missing links or identification of false links in the network
4
Clustering vs. Community Detection
5
What is Community Detection?
The process of dividing the nodes of a graph into (possibly overlapping) subsets, where the nodes in each subset are considered related by some similarity measure
[Figure: the same set of nodes grouped by shape (2 communities) and grouped by color (3 communities)]
6
Community Detection
• Evaluating Results
7
Betweenness Centrality - Recap
A measure of the degree to which a given node (or edge) lies on the shortest paths (geodesics) between other nodes in the graph.
Two types
• Vertex Betweenness
• Edge Betweenness
Edge Betweenness
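The number of shortest paths in the graph G that pass through a given edge, e.g. (S, B)
Example: Sharon and Bob both study at NCSU, and they are the only link between the NY DANCE and CISCO groups, so every shortest path between the two groups passes through the edge between them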
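The definition above can be tried out on a small graph. The sketch below is a minimal illustration, not the course's implementation: the toy graph, the node names D1, D2, C1, C2 and the helper functions are all hypothetical. It assumes an unweighted, undirected, connected graph, and when a pair of nodes has several shortest paths it splits one unit of credit equally among them (the usual convention).

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    # BFS from source records distances and all shortest-path predecessors.
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if target not in dist:
        return []
    # Walk predecessors back from the target to rebuild the paths.
    paths = [[target]]
    while paths[0][0] != source:
        paths = [[p] + path for path in paths for p in preds[path[0]]]
    return paths

def edge_betweenness(adj):
    """Map each undirected edge to the (fractional) count of shortest paths using it."""
    score = {}
    for s, t in combinations(sorted(adj), 2):
        paths = shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                edge = tuple(sorted((u, v)))
                score[edge] = score.get(edge, 0.0) + 1.0 / len(paths)
    return score

# Hypothetical toy graph: S and B bridge two otherwise separate triangles.
adj = {
    "S": ["D1", "D2", "B"],
    "D1": ["S", "D2"],
    "D2": ["S", "D1"],
    "B": ["S", "C1", "C2"],
    "C1": ["B", "C2"],
    "C2": ["B", "C1"],
}
scores = edge_betweenness(adj)
bridge = max(scores, key=scores.get)
print(bridge, scores[bridge])
```

All nine cross-group pairs route through the bridging edge, which is why it ends up with the highest score.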
9
Edge-Betweenness Clustering
10
Girvan & Newman Clustering
Input: graph G
• Compute edge betweenness for every edge
• Remove the edge with the highest betweenness
• Repeat until the highest edge betweenness is below μ or until you have k communities
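The loop on this slide can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the original authors' code: the graph is unweighted and undirected with no self-loops, edge betweenness is computed with a Brandes-style accumulation, and the function names, the toy graph and the parameters `k` and `mu` are hypothetical names chosen to mirror the stopping rules.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {}
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0.0
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]  # count shortest paths reaching v via u
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((u, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {e: s / 2.0 for e, s in score.items()}

def components(adj):
    """Connected components of the graph as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj, k=None, mu=None):
    """Remove highest-betweenness edges until k communities or max score < mu."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while True:
        comps = components(adj)
        if k is not None and len(comps) >= k:
            return comps
        score = edge_betweenness(adj)
        if not score:
            return comps
        edge = max(score, key=score.get)
        if mu is not None and score[edge] < mu:
            return comps
        u, v = tuple(edge)
        adj[u].discard(v)
        adj[v].discard(u)

# Hypothetical toy graph: two triangles joined by the single edge S-B.
adj = {
    "S": {"D1", "D2", "B"}, "D1": {"S", "D2"}, "D2": {"S", "D1"},
    "B": {"S", "C1", "C2"}, "C1": {"B", "C2"}, "C2": {"B", "C1"},
}
communities = girvan_newman(adj, k=2)
print(communities)
```

Betweenness is recomputed after every removal because deleting one edge reroutes the shortest paths that used it.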
11
Louvain Modularity
• Developed by Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre in 2008
• The method was devised while they were all at the Université catholique de Louvain
• Applies to undirected, weighted graphs
• A variant was later proposed for directed graphs
12
Louvain Modularity - How does it work?
13
Modularity
14
Modularity - Formula
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
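As a rough illustration of the formula, the sketch below evaluates Q directly from an adjacency matrix. It assumes the standard reading of the symbols: A is the (weighted) adjacency matrix, k_i is the degree of node i, m is the total edge weight, and δ(c_i, c_j) is 1 when nodes i and j share a community and 0 otherwise. The function name and the toy graph are hypothetical.

```python
def modularity(A, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)."""
    n = len(A)
    k = [sum(row) for row in A]   # (weighted) degree of each node
    two_m = sum(k)                # every edge contributes to two degrees
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:   # the delta term
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(A, [0, 0, 0, 1, 1, 1])    # one community per triangle
q_merged = modularity(A, [0, 0, 0, 0, 0, 0])   # everything in one community
print(q_split, q_merged)
```

Putting every node in a single community always gives Q = 0, which makes it a convenient baseline for comparing partitions.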
15
Louvain Modularity (Two Phases)
• Aggregation Phase:
• Create a new network where the communities identified in the optimization phase become the new nodes.
• The weight of the connection between two communities is the sum of the weights of all the edges between the nodes in the two corresponding communities
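The aggregation step can be sketched as a small function that collapses each community into one node. This is an illustrative sketch, not a full Louvain implementation: the function name, the edge-dictionary input format and the toy data are assumptions. Following Blondel et al., edges inside a community become a self-loop on the new node; summing each internal edge once into that self-loop is one possible convention and is an assumption here.

```python
from collections import defaultdict

def aggregate(edges, community):
    """Collapse communities into nodes.

    edges: {(u, v): weight} for an undirected graph (each edge listed once).
    community: {node: community label} from the optimization phase.
    Returns {(c1, c2): summed weight}; (c, c) entries are self-loops.
    """
    new_edges = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        key = (cu, cv) if cu <= cv else (cv, cu)   # undirected: normalize the order
        new_edges[key] += w
    return dict(new_edges)

# Hypothetical weighted graph and partition.
edges = {
    ("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 1.0,
    ("c", "d"): 1.0, ("d", "e"): 3.0,
}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
meta_graph = aggregate(edges, community)
print(meta_graph)
```

The resulting meta-graph can be fed back into the optimization phase, which is how the two phases alternate.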
16
Louvain Modularity
17
Modularity - Simplified
$$Q = \sum_{c=1}^{n} \left[ \frac{L_c}{m} - \left( \frac{k_c}{2m} \right)^2 \right]$$
$L_c$ – the sum of edge weights between nodes within community $c$ (each edge is counted once)
$k_c$ – the sum of the weighted degrees of the nodes in community $c$ (internal edges count twice; edges which link to other communities count once)
$m$ – the sum of all the edge weights in the graph
$n$ – the number of communities
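The per-community form can be checked numerically. In the sketch below, which is an illustration under stated assumptions rather than a reference implementation, L_c counts each internal edge once; with that convention L_c/m matches the pairwise modularity formula. The function name, the edge-dictionary input and the toy graph are hypothetical, and the graph is assumed to have no self-loops.

```python
def modularity_by_community(edges, community):
    """Q = sum_c [ L_c/m - (k_c/(2m))^2 ] for an undirected graph.

    edges: {(u, v): weight}, each edge listed once, no self-loops.
    community: {node: community label}.
    """
    m = sum(edges.values())
    L = {}   # internal edge weight per community (each edge counted once)
    k = {}   # total weighted degree per community
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        k[cu] = k.get(cu, 0.0) + w
        k[cv] = k.get(cv, 0.0) + w
        if cu == cv:
            L[cu] = L.get(cu, 0.0) + w
    return sum(L.get(c, 0.0) / m - (k[c] / (2 * m)) ** 2 for c in k)

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = {
    (0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
    (2, 3): 1.0,
    (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
}
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity_by_community(edges, community)
print(q)
```

For each triangle, L_c = 3, k_c = 7 and m = 7, so each community contributes 3/7 − (7/14)² and the total is 5/14.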
18
Louvain Modularity
19
Pros
• Steps are intuitive and easy to implement, and the outcome is unsupervised
• The algorithm is extremely fast:
• Computer simulations on large ad-hoc modular networks suggest that its complexity is linear on typical and sparse data
• Possible gains in modularity are easy to compute
• The number of communities decreases drastically after just a few passes
Cons
• Might assign outliers to the closest community
Resolution parameter
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{1}{\lambda} \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
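To see what the resolution parameter does, the sketch below scores two partitions of the same toy graph at two values of λ. It is a hedged illustration: the function name, the parameter name `lam` and the graph are hypothetical, and the 1/λ factor is applied to the expected-edges term exactly as in the formula on this slide (some texts instead write the factor as a multiplier γ = 1/λ).

```python
def modularity_with_resolution(A, community, lam=1.0):
    """Q = 1/(2m) * sum_ij [A_ij - (1/lam) * k_i*k_j/(2m)] * delta(c_i, c_j)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                q += A[i][j] - (1.0 / lam) * k[i] * k[j] / two_m
    return q / two_m

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
split = [0, 0, 0, 1, 1, 1]
merged = [0, 0, 0, 0, 0, 0]
for lam in (1.0, 4.0):
    print(lam,
          modularity_with_resolution(A, split, lam),
          modularity_with_resolution(A, merged, lam))
```

In this convention, λ = 1 recovers standard modularity and prefers the two triangles, while a larger λ weakens the penalty term enough that the single merged community scores higher.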
20
InfoMap
21
Map Equation
22
InfoMap - Benefits
23
Evaluation of Algorithms - What are we evaluating?
24
Evaluation of Quality
25
Quality Metrics
• Hybrid functions
• Ratio of Expansion to (Average Degree + Expansion) (Conductance)
The formulas in this section are for undirected graphs. They can be extended to directed and weighted graphs.
26
Internal Connectivity
Given a community $S$
$n_S$ - the number of nodes in $S$
$m_S$ - the number of edges in $S$
Internal Density: $f(S) = \frac{2 m_S}{n_S (n_S - 1)}$
Average Degree: $f(S) = \frac{2 m_S}{n_S}$
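Both internal measures are simple to compute from an edge list. The following is a minimal sketch for an unweighted, undirected graph; the function name and the toy graph are hypothetical, and a community with fewer than two nodes is given density 0 by assumption.

```python
def internal_connectivity(edges, S):
    """Return (internal density, average degree) of community S.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    """
    S = set(S)
    n_s = len(S)
    # m_s counts edges with both endpoints inside S.
    m_s = sum(1 for u, v in edges if u in S and v in S)
    density = 2 * m_s / (n_s * (n_s - 1)) if n_s > 1 else 0.0
    average_degree = 2 * m_s / n_s
    return density, average_degree

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
density, average_degree = internal_connectivity(edges, {0, 1, 2})
print(density, average_degree)
```

A triangle contains every possible internal edge, so its internal density is 1 and each member has two neighbors inside the community.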
27
External Connectivity
Given a community $S$
$n_S$ - the number of nodes in $S$
$c_S$ - the number of edges where only one endpoint of the edge is in $S$
Expansion: $f(S) = \frac{c_S}{n_S}$
28
Hybrid Functions
Given a community $S$
$m_S$ - the number of edges in $S$
$c_S$ - the number of edges where only one endpoint of the edge is in $S$
Conductance: $f(S) = \frac{c_S}{2 m_S + c_S}$
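Expansion and conductance both depend on the boundary edge count, so they can be computed together. This is a minimal sketch for an unweighted, undirected graph; the function name and the toy graph are hypothetical, and S is assumed to be non-empty and to touch at least one edge.

```python
def expansion_and_conductance(edges, S):
    """Return (expansion, conductance) of community S.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    """
    S = set(S)
    n_s = len(S)
    m_s = sum(1 for u, v in edges if u in S and v in S)      # both endpoints in S
    c_s = sum(1 for u, v in edges if (u in S) != (v in S))   # exactly one endpoint in S
    expansion = c_s / n_s
    conductance = c_s / (2 * m_s + c_s)
    return expansion, conductance

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
expansion, conductance = expansion_and_conductance(edges, {0, 1, 2})
print(expansion, conductance)
```

Lower values indicate a better-separated community: here only one edge leaves the triangle, against three internal edges.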
29
References
• Images on slides 9, 11, 12, 13 from Nagiza F. Samatova, William Hendrix, John Jenkins, Kanchana Padmanabhan, and Arpan Chakraborty (2013). Practical Graph Mining with R. Chapman & Hall/CRC.
• Image on slide 15 from M. E. J. Newman (2006). "Modularity and community structure in networks". Proceedings of the National Academy of Sciences of the United States of America, 103 (23): 8577–8582. arXiv:physics/0602124.
• Image on slide 17 from Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre (2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment, 2008 (10): P10008.
30