Weic-Dmkd10 Overlappingcommunity
1 Introduction
The earliest result related to community detection in computer science is probably the
graph partition problem, which dates back to the design of VLSI [6] and the modeling of
roles and positions in social structure [24]. While these approaches are relevant to
community detection, Newman pointed out a few facts that make them unsuitable for
detecting communities [17]. For instance, in a typical graph partitioning problem, the
number of nodes to be partitioned and the number of groups are usually known in advance,
which differs greatly from the community detection problem we are considering. A second
serious drawback of these graph-partition-based methods is that all of them are
essentially variations of finding partitions of a graph so that the number of crossing
edges between partitions is minimized. However, a small number of crossing edges alone
may not be a good indication of communities without considering the intrinsic
connections among the nodes in the graph.
Newman’s revolutionary notion of modularity is the first successful attempt to re-
solve the drawbacks specified above [17]. The modularity is defined on a partition of
the nodes in a graph. Let G = (V, E) be an undirected graph modeling a social network
with n nodes and m edges. Assume that each node v belongs to community cv . We
define the indicator function δ(cu , cv ) = 1 if and only if cu = cv , i.e., u, v are in the
same community, and otherwise δ(cu , cv ) = 0 for two nodes u, v ∈ V . The modularity
Q of this specific community partition is
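\[
Q = \frac{1}{2m} \sum_{u,v} \left( A_{uv} - \frac{d_u d_v}{2m} \right) \delta(c_u, c_v),
\]
where A is the adjacency matrix of G and d_u is the degree of node u.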
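As a minimal sketch of the modularity formula above, the following Python function evaluates Q by summing over all ordered node pairs in the same community. It assumes a simple undirected graph given as an edge list without self-loops or duplicate edges; the function name `modularity` and the small example graph are illustrative only.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = (1/2m) * sum_{u,v} (A_uv - d_u*d_v/(2m)) * delta(c_u, c_v)."""
    m = len(edges)
    degree = defaultdict(int)
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    nodes = list(degree)
    total = 0.0
    # Sum over ordered pairs (u, v), including u == v, as in the formula.
    for u in nodes:
        for v in nodes:
            if community[u] != community[v]:
                continue  # delta(c_u, c_v) = 0
            a_uv = 1.0 if (u, v) in adjacent else 0.0
            total += a_uv - degree[u] * degree[v] / (2.0 * m)
    return total / (2.0 * m)

# Illustrative graph: two triangles joined by one bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
q_two = modularity(edges, two_groups)   # one community per triangle
q_one = modularity(edges, one_group)    # everything in one community
```

For this graph, splitting along the bridge gives Q = 5/14, while putting all nodes in a single community gives Q = 0.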
Strategy space and Nash equilibrium. In our community formation game, the strategies
of agent vi are subsets of communities that it wants to join, i.e., all subsets of [k].
We denote Li ⊆ [k] as a strategy of vi , which we also refer to as the community label
of vi . We allow Li = ∅, which means that i chooses to not belong to any community.
Define L = (L1 , L2 , . . . , Ln ) as a strategy profile, which is a vector of community
labels for all agents.
The utility of vi is measured by a gain function gi (·) and a loss function ℓi (·), which
map L to real numbers.^4 Let the community labels of agents other than i be L−i , and
we use (L−i , L′i ) to denote a strategy profile where the i-th entry of L is replaced by
L′i . We define the utility function for vi to be: ui (L) = gi (L) − ℓi (L).
In the community formation game, given the strategies of other agents L−i , the best
response strategy (or strategies) of agent vi is:
\[
\arg\max_{L'_i \subseteq [k]} \; g_i(L_{-i}, L'_i) - \ell_i(L_{-i}, L'_i).
\]
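The best response can be sketched as a brute-force search over all 2^k subsets of [k]. The code below is only an illustration under stated assumptions: `best_responses` and `toy_utility` are hypothetical names, and the toy utility (one unit of gain per other agent sharing a label, a loss of 1/2 per label beyond the first) is not one of the functions proposed in this paper.

```python
from itertools import combinations

def all_label_sets(k):
    """Yield every subset of [k] = {1, ..., k} as a frozenset."""
    labels = range(1, k + 1)
    for size in range(k + 1):
        for subset in combinations(labels, size):
            yield frozenset(subset)

def best_responses(i, profile, k, utility):
    """Return (best utility, maximizing label sets) for agent i,
    holding the other agents' label sets in `profile` fixed."""
    best_value, best_sets = None, []
    for candidate in all_label_sets(k):
        trial = list(profile)
        trial[i] = candidate
        value = utility(i, trial)
        if best_value is None or value > best_value:
            best_value, best_sets = value, [candidate]
        elif value == best_value:
            best_sets.append(candidate)
    return best_value, best_sets

def toy_utility(i, profile):
    """Hypothetical utility: gain minus loss, for illustration only."""
    gain = sum(1 for j, labels in enumerate(profile)
               if j != i and profile[i] & labels)
    loss = 0.5 * max(len(profile[i]) - 1, 0)
    return gain - loss

profile = [frozenset(), frozenset({1}), frozenset({1}), frozenset({2})]
value, sets = best_responses(0, profile, 2, toy_utility)
```

Here agent 0 does best by joining both communities (utility 2.5). The enumeration is exponential in k, which is why restricted strategy spaces become attractive.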
Definition 1 (Pure Nash equilibrium). Given graph G, the strategy profile L = (L1 , L2 ,
. . . , Ln ) forms a (pure) Nash equilibrium of the community formation game if all agents
are playing their best strategies, that is,
∀i and L′i ≠ Li , ui (L−i , L′i ) ≤ ui (L−i , Li ).
In other words, in a Nash equilibrium, no agent can improve her own utility by
changing her strategy unilaterally. One can interpret that each agent is satisfied with
her community selection at the state of a Nash equilibrium. Since each node may select
more than one community, the communities detected at the equilibrium naturally can
be overlapping with each other, which shall reflect what occurs in the real world.
^4 A more appropriate notation would be g_i^G (·) and ℓ_i^G (·). However, since the
underlying graph G is static in our game, it is simpler to omit the superscript G.
Existence and computation of Nash equilibria. In general, Nash equilibria may not
exist in a community formation game. To see this, one can easily formulate a “matching
pennies” game [20] in the community formation game, in which one node u always
prefers to be in the same community as another node v, while v always prefers not to
be in the same community as u. It is thus interesting to know when a Nash equilibrium
exists in the community formation game.
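One possible way to make the matching-pennies argument concrete is the two-agent game sketched below, with k = 2. The utilities are hypothetical and chosen only for illustration: u gains 1 for sharing a community with v and pays 0.6 per label beyond the first, while v gains 1 only if it belongs to some community and shares none with u. The code enumerates all 16 strategy profiles and collects those satisfying Definition 1.

```python
from itertools import combinations, product

K = 2
# All subsets of [K]: {}, {1}, {2}, {1, 2}.
STRATEGIES = [frozenset(s)
              for r in range(K + 1)
              for s in combinations(range(1, K + 1), r)]

def utility_u(lu, lv):
    # Hypothetical: u wants to share a community with v; extra labels cost 0.6.
    return (1.0 if lu & lv else 0.0) - 0.6 * max(len(lu) - 1, 0)

def utility_v(lu, lv):
    # Hypothetical: v wants to be in some community, but never with u.
    return 1.0 if lv and not (lu & lv) else 0.0

def is_pure_nash(lu, lv):
    """Definition 1: no agent gains by deviating unilaterally."""
    u_ok = all(utility_u(alt, lv) <= utility_u(lu, lv) for alt in STRATEGIES)
    v_ok = all(utility_v(lu, alt) <= utility_v(lu, lv) for alt in STRATEGIES)
    return u_ok and v_ok

equilibria = [(lu, lv)
              for lu, lv in product(STRATEGIES, repeat=2)
              if is_pure_nash(lu, lv)]
```

With these utilities the list `equilibria` is empty: whenever u shares a community with v, v moves away, and whenever they are separated, u follows.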
Let us recall that potential games are a general class of games that permit pure Nash
equilibria [19]. In a potential game, there is an associated potential function Φ(·) defined
on the strategy profiles of the agents. A community formation game is a potential game
if Φ(L) − Φ(L−i , L′i ) = ui (L−i , L′i ) − ui (L) for every strategy profile L and every
strategy L′i of vi . In other words, if an agent changes her strategy to improve her own
utility, the potential function strictly decreases by the same amount as the increase
in the agent’s utility. In any potential game that contains a finite number of strategy
profiles, Nash equilibria always exist. Furthermore, every better response dynamic, in
which each agent sequentially changes her strategy to improve her own utility, will
converge to a Nash equilibrium.
Next, we provide a sufficient condition for a community formation game to be a potential
game, and thus address the existence of Nash equilibria for community detection purposes.
Definition 2. A set of functions {fi (·) : 1 ≤ i ≤ n} is locally linear with linear factor
ρ if for every strategy profile L and every strategy L′i of vi , the following relation holds:
We show that if the gain and loss functions in a community formation game are
locally linear, the game is a potential game.
Theorem 1. Let {gi (·) : i ∈ [n]} and {ℓi (·) : i ∈ [n]} be the sets of gain and
loss functions of a community formation game. If {gi (·)} and {ℓi (·)} are locally linear
functions with linear factors ρg and ρℓ , respectively, then the community formation game
is a potential game.
With locally linear gain and loss functions, we can guarantee the existence of Nash
equilibria, but it is not necessarily true that finding a Nash equilibrium is easy. The
following lemma indicates that computing a Nash equilibrium could be hard. Due to space
limitations, we omit the proof of the lemma.
Lemma 1. There exists a community formation game, in which the sets of gain and loss
functions are locally linear, such that both computing the best response for an individual
agent and computing a Nash Equilibrium in the game are NP-hard.
Gain and loss functions. We now propose a set of gain and loss functions. These gain
and loss functions have natural economic interpretations and can be computed
efficiently. Additionally, our experiments demonstrate that the equilibria obtained with
these gain and loss functions provide accurate information about the community
structure.
The gain function we use here is a generalized version of the well accepted mod-
ularity function, which is adapted to fit the scenario that one node may participate in
multiple communities. We define δ̂(i, j) = 1 if |Li ∩Lj | ≥ 1 and δ̂(i, j) = 0 otherwise.
Let A be the adjacency matrix of graph G.
Definition 4 (Linear loss function). Let c > 0 be a constant. The loss of a node vi
with the linear loss function is (|Li | − 1) · c.
It is easy to verify that both the personalized modularity function and the linear
loss function are locally linear functions, with linear factor 1/2 and 1, respectively.
Therefore, we have the following result.
Theorem 2. Let gi (L) be the personalized modularity function and ℓi (L) be a linear
loss function. The community formation game has a Nash equilibrium.
Algorithm 1 LocalEquilibrium(G)
  initialize each node to a singleton community
  repeat the following process until no node can improve itself:
    randomly pick a node vi , and perform the best operation among join, leave and switch

We have shown that if the gain and loss functions are locally linear, there always
exists a Nash equilibrium in our community formation game (Theorem 1). However, by
Lemma 1, computing the best response might be hard even in some simple cases. It is
thus not reasonable to assume that individuals always make the best response. Instead,
we propose that an agent will only choose a strategy from a restricted space that
depends on her current state when she needs to respond to the other agents’ strategies. In
particular, an agent can only locally implement the following three operations,
1. join. Agent vi joins a new community on top of the communities she joins by adding
a new label in Li .
2. leave. Agent vi leaves a community she is in by removing a label from Li .
3. switch. Agent vi switches from one community to another by replacing a label in
Li .
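The three operations can be sketched as a function that enumerates every label set reachable from the current one in a single step. This is a minimal illustration; the name `local_strategy_space` and the convention that community labels are the integers 1, ..., k are assumptions made here.

```python
def local_strategy_space(labels, k):
    """Label sets reachable from `labels` by one join, leave, or switch."""
    labels = frozenset(labels)
    outside = [c for c in range(1, k + 1) if c not in labels]
    moves = set()
    for new in outside:                 # join: add one new label
        moves.add(labels | {new})
    for old in labels:                  # leave: remove one current label
        moves.add(labels - {old})
    for old in labels:                  # switch: replace one label by another
        for new in outside:
            moves.add((labels - {old}) | {new})
    return moves

# An agent in communities {1, 2} out of k = 4 possible labels.
space = local_strategy_space({1, 2}, 4)
```

For this agent there are 2 join moves, 2 leave moves and 4 switch moves, so `space` contains 8 label sets rather than all 2^4 = 16 subsets.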
In the restricted strategy spaces, an equilibrium is a state where no agent can improve
her utility by deviating from her current strategy within the locally allowed strategy
space. Such equilibria are referred to as local equilibria [1] in the literature. In the
community formation game, the entire strategy space of agent i is S = 2^{[k]} . For each
agent i with the current community label set Li , we use ls(Li ) to denote i’s local
strategy space, which is the set of possible label sets we could obtain by applying one
of the operations join, leave and switch once on Li . The local equilibrium is defined
as follows:
Definition 5 (Local equilibrium).^5 Given G, the strategy profile L = (L1 , L2 , . . . , Ln )
forms a local equilibrium of the community formation game if all agents are playing
their locally optimal strategies, that is,
∀i and L′i ∈ ls(Li ), ui (L−i , L′i ) ≤ ui (L−i , Li ).
Unlike at a Nash equilibrium, at a local equilibrium the utility of each agent attains
a local maximum instead of a global one. This is useful when the local strategy space
is easy to explore, while computing a globally optimal solution is not feasible.
Restricting the strategy space to our local strategy space is further justified by the
fact that in the real world individuals usually consider joining or leaving one
community at a time.
In our setting, computing the local best response in the local strategy space takes
polynomial time, by simply enumerating all O(k) possible join, leave and switch
operations. We show in the following lemma that, when using our personalized
modularity gain function and linear loss function, the computation can be made efficient
by maintaining a quantity for each community, and by only checking new communities
to add or switch to from the set of communities of one’s neighbors.
Lemma 2. Let ∆i be the degree of the node vi and Ni be the set of neighbors of vi in
the graph G. Let gi (L) be the personalized modularity function and ℓi (L) be a linear
loss function. The time complexity to find the best local operation for agent i among
join, leave and switch is O(|Li | · |L(Ni )| · ∆i ), where L(Ni ) = ∪j∈Ni Lj .
^5 The local equilibrium in [1] is defined on Euclidean strategy spaces.
Proof. It is sufficient to show that the function Qi (L) can be efficiently updated. For
each community U , we maintain a quantity $\hat{Q}_U = \sum_{v_j \in U} \frac{d_j}{2m}$.
Notice that if only one member leaves U or joins U in each step, Q̂U can be updated in
constant time.
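The bookkeeping for Q̂U can be sketched as follows. This is an illustrative data structure, not the authors' implementation: the class name `CommunityDegreeMass` and its method names are assumptions, and exact fractions are used only to keep the example free of rounding error.

```python
from collections import defaultdict
from fractions import Fraction

class CommunityDegreeMass:
    """Maintains Q_hat[U] = sum_{v_j in U} d_j / (2m) under single-node updates."""

    def __init__(self, degree, m):
        self.degree = degree            # node -> degree d_j
        self.two_m = 2 * m
        self.q_hat = defaultdict(Fraction)

    def join(self, node, community):
        # Constant-time update when one node joins U.
        self.q_hat[community] += Fraction(self.degree[node], self.two_m)

    def leave(self, node, community):
        # Constant-time update when one node leaves U.
        self.q_hat[community] -= Fraction(self.degree[node], self.two_m)

    def degree_term(self, node, community):
        # d_i * Q_hat[U] equals sum_{j in U} d_i * d_j / (2m), read off in O(1).
        return self.degree[node] * self.q_hat[community]

# Degrees of two triangles joined by a bridge edge between nodes 2 and 3 (m = 7).
degree = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2}
mass = CommunityDegreeMass(degree, m=7)
for v in (0, 1, 2):
    mass.join(v, "a")
after_join = mass.q_hat["a"]            # (2 + 2 + 3) / 14
mass.leave(2, "a")
after_leave = mass.q_hat["a"]           # (2 + 2) / 14
term = mass.degree_term(2, "a")         # 3 * 4 / 14
```

Each join or leave touches a single entry of `q_hat`, so the degree term for any candidate community is available without iterating over its members.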
For the join operation, if vi is joining a community not in L(Ni ), Qi (L) as well
as ui (L) will be strictly decreasing. Therefore, vi only considers the communities in
L(Ni ) to join. For the switch operation, the same argument applies, except that vi may
have a possible gain if she switches to a brand new community U ′ , but in this case vi
can gain the same utility by simply removing the old community. Therefore vi only
needs to consider communities in L(Ni ).
In joining a new community U , Qi (L) can be updated in O(∆i ) time. This is because
the term $-\sum_{j \in U} \frac{d_i d_j}{2m}$ can be computed in constant time given the
maintained Q̂U . The switch operation will consider leaving |Li | communities and joining
|L(Ni )| communities. Therefore, the total running time is O(|Li | · |L(Ni )| · ∆i ). □
The fast computation of the local best operation is important in our experiments. This
is, in particular, one reason we propose to use |Li ∩ Lj | in defining the personalized
modularity function. Next, we use the simple algorithm LocalEquilibrium(G) (Algorithm 1)
to compute a local equilibrium, which essentially simulates the best local response
dynamic. Although in general the local response dynamic may take a long time to converge,
we show below that for the case of the personalized modularity function and the linear
loss function, the convergence is fast if we set the parameter c in the linear loss
function appropriately.
Theorem 3. Let gi (L) be the personalized modularity function and ℓi (L) = c(|Li | − 1)
be a linear loss function with a constant c such that 4cm² is an integer. LocalEquilibrium
takes at most O(m²) steps to reach a local equilibrium.
Proof. Define the potential function Φ(L) = ℓ(L) − g(L)/2, where ℓ(L) = ∑i ℓi (L)
and g(L) = ∑i gi (L). Notice that Φ(L) ≥ −1/2 since g(L) ≤ 1 and ℓ(L) ≥ 0.
Now consider 4m² · Φ(L). It is straightforward to verify that 4m² · Φ(L) is an integer
function. Notice that we have Φ(L) = 0 in the initial state of LocalEquilibrium, and
Φ(L) ≥ −1/2 during the algorithm execution. Since 4m² · Φ(L) strictly decreases in
each round, Φ(L) will achieve its minimum in at most 2m² steps. □
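The counting step of this proof can be spelled out as a short chain of inequalities. The symbols Φ_t (the potential after t steps) and T (the total number of steps) are introduced here only for illustration; the chain uses nothing beyond the three facts stated in the proof: the initial value 0, the lower bound −1/2, and the strict decrease of an integer-valued quantity.

```latex
\[
  4m^2\,\Phi_0 = 0, \qquad
  4m^2\,\Phi_{t+1} \le 4m^2\,\Phi_t - 1, \qquad
  4m^2\,\Phi_t \ge -2m^2
\]
\[
  \Longrightarrow\quad
  -2m^2 \;\le\; 4m^2\,\Phi_T \;\le\; -T
  \quad\Longrightarrow\quad
  T \le 2m^2 .
\]
```

The middle inequality is where integrality matters: a strictly decreasing integer must drop by at least 1 per step.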
Our LocalEquilibrium algorithm uses the initial configuration in which every agent
has one community of her own. One reason we choose this starting point instead of
other possible ones, such as all nodes sharing one single community, is that with this
starting point most of the agents’ activities will be joining communities, which is likely
to be an individual decision. In contrast, when people share one community and the
community evolves by splitting, the splitting decision is usually a collective decision
made by a group of people, which is not modeled in our current game-theoretic frame-
work and it would be computationally expensive to incorporate group decisions. Our
experimental results show that this starting point indeed leads to the extraction of
reasonable community structures. Additionally, if we already have some partial knowledge
about which nodes are in the same communities, we can choose it as our starting point
and not allow the local dynamic to change it, and thus community learning can be
naturally incorporated in our framework as well. This is a future direction we will
pursue next.
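The best local response dynamic of Algorithm 1, started from singleton communities, can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' implementation: community labels are taken to be node identifiers, a node is picked at random among those that can strictly improve, and `toy_utility` is a hypothetical stand-in (gain equal to the sum of |Li ∩ Lj| over neighbors, linear loss with c = 1/2) used only to exercise the loop. It is not the personalized modularity function.

```python
import random
from fractions import Fraction

def local_moves(labels, candidates):
    """Label sets reachable by one join, leave, or switch."""
    outside = [c for c in candidates if c not in labels]
    moves = {labels | {c} for c in outside}                       # join
    moves |= {labels - {old} for old in labels}                   # leave
    moves |= {(labels - {old}) | {c}
              for old in labels for c in outside}                 # switch
    return moves

def best_local_move(i, profile, utility, candidates):
    """Return (utility, label set) of the best strictly improving move, or None."""
    best_value, best_set = utility(i, profile), None
    for move in local_moves(profile[i], candidates):
        trial = dict(profile)
        trial[i] = move
        value = utility(i, trial)
        if value > best_value:
            best_value, best_set = value, move
    return None if best_set is None else (best_value, best_set)

def local_equilibrium(nodes, utility, seed=0):
    rng = random.Random(seed)
    profile = {v: frozenset({v}) for v in nodes}   # singleton communities
    candidates = list(nodes)                       # assumption: labels = node ids
    while True:
        improvable = [v for v in nodes
                      if best_local_move(v, profile, utility, candidates)]
        if not improvable:
            return profile                         # no node can improve itself
        v = rng.choice(improvable)
        profile[v] = best_local_move(v, profile, utility, candidates)[1]

# Illustrative graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
nodes = list(range(6))
neighbors = {v: set() for v in nodes}
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

C = Fraction(1, 2)

def toy_utility(i, profile):
    """Hypothetical utility: shared labels with neighbors minus linear loss."""
    gain = sum(len(profile[i] & profile[j]) for j in neighbors[i])
    loss = C * (len(profile[i]) - 1)
    return gain - loss

result = local_equilibrium(nodes, toy_utility, seed=1)
```

The toy utility is symmetric between neighbors, so every strict improvement lowers a potential function and the loop terminates. The returned profile depends on the random order of moves and says nothing about the quality of the communities found with the paper's actual gain function.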
4 Experiments
It is very hard to obtain ground truth community information from real-world networks.
Therefore, detecting communities sometimes is more of an art than a science [5].
Nevertheless, we conduct experiments on some real-world networks as well as several
benchmark graphs. In the experiment, our algorithm uses the personalized modularity
functions as the gain functions and the simple loss functions with the loss factor
c = 1/m.
Fig. 1. The communities discovered for the dolphin network (left). The communities discovered
for the Zachary’s karate club (right).
which essentially has to compute all pairs of shortest paths. In particular, we are only
able to obtain results of CONGA for the graphs with 1,000 nodes.
In Figure 2, we show the results of different algorithms on the benchmark graphs
with overlapping communities. The networks used to produce the left figure consist of
1,000 nodes, whereas those of the right figure consist of 5,000 nodes. In each figure,
the community sizes of the two upper diagrams range between smin = 10 and smax =
50, and the community sizes of the two bottom diagrams range between smin = 20 and
smax = 100. The mixing parameter µ, i.e., the portion of crossing edges, is 0.1 for
the two left diagrams and 0.3 for the two right diagrams. The other parameters are τ1 = 2,
τ2 = 1, kavg = 20, kmax = 50 and om = 2. The x-axis represents the portion of
nodes that belong to multiple communities. The graphs are generated in the following
steps: (1) generate the number of communities that each node will belong to; assign
the degrees to the nodes based on a power law distribution with exponent τ1 ; (2) assign
the community sizes from another power law distribution with exponent τ2 for a fixed
number of communities; (3) generate the bipartite graph between the nodes and the
communities with the configuration model [14]; (4) for each node, assign the cross-
community degree and internal degrees within each community based on µ; (5) build
the graphs for each community and the cross-community edges with the configuration
model. For more detailed steps on how the graphs are generated and refined, readers
can refer to [11].
As shown in Figure 2, our algorithm for the game-theoretic framework performs
better for the bottom two networks. In other words, our algorithm works better on
graphs with larger communities. Notice that the community sizes in the bottom two
graphs are within 20 and 100, while the upper two graphs have communities with size
10 to 50.
[Figure 2 plots; y-axis: Normalized mutual information]
Fig. 2. The network on the left consists of 1,000 nodes. The minimum degree and maximum
degree of the network are 10 and 50 respectively. The network on the right consists of 5,000
nodes. The minimum degree and maximum degree of the network are 10 and 50 respectively.
Both our algorithm and the clique percolation algorithm perform very well on the two
upper left networks, with mutual information above 90%. For the two upper right
networks in Figure 2, the clique percolation algorithm outperforms our algorithm when
the fraction of overlapping nodes is small. However, our algorithm is more stable than
the clique percolation algorithm over all instances. The performance of the clique
percolation algorithm drops significantly when the portion of overlapping nodes
increases. In particular, when half of the nodes belong to multiple communities (at the
point 0.5 on the x-axis), our algorithm performs as well as the clique percolation
algorithm on graphs with 1,000 nodes, and better on graphs with 5,000 nodes.
Compared with the CONGA algorithm, our algorithm is better for µ = 0.1, and is not
as good as CONGA for µ = 0.3. Again, the performance of our algorithm is more stable
than that of CONGA, and is better when the fraction of overlapping nodes is large. Also
notice that CONGA is not able to finish in reasonable time for graphs with 5,000 nodes.
Wei Chen (MSRA)
  Community 1: Jialin Zhang, Chao Jin, Zheng Zhang, Likun Liu, Shiding Lin, Ming Chen,
    Shaomei Wu, Yu Chen, Qiao Lian, Ben Y. Zhao, Xuezheng Liu
  Community 2: Marcos Kawazoe Aguilera, Sam Toueg
Wei Chen (ZJU)
  Community 1: William M. Andrews, Aidong Lu, David S. Ebert, Mario Costa Sousa,
    Ross Maciejewski, Tobias Isenberg
  Community 2: Zhongding Jiang, Yi Gong, Yu Guan, Jin Wang, Yingchao Zhao, Chunxiao Liu,
    Zi’ang Ding, Guofeng Zhang, Yingzhen Yang, Ling Zhuang, Hongxin Zhang,
    Chengfang Song, Huafeng Liu, Huagen Wan, Luying Li, Hujun Bao, Xiao Liang,
    Qunsheng Peng, Qifeng Tan, Pengcheng Shi, Yubo Zhang, Shang-Hua Teng,
    Lincan Zou, Xiaobo An, Xueying Qin, Long Zhang, Yinan Fan, Dong Xu,
    Yun Zeng, Wei Hua, Zhao Dong
Table 1. The partition of the co-authors of two “Wei Chen”s
The first “Wei Chen” is the first author of this paper. His co-authors are split into
two communities. One consists of his research collaborators after he joined Microsoft
Research Asia, and the other consists of his collaborators from when he was at Cornell.
The second “Wei Chen” is a faculty member at Zhejiang University. The first group
of his collaborators represents his connection to Purdue University. The second group of
co-authors consists of his colleagues at Zhejiang University, with the exception of
“Shang-Hua Teng” and “Yingchao Zhao”. These two authors are actually co-authors of “Wei
Chen (MSRA)”. One explanation for the misclassification is that Teng and Zhao
co-authored only one paper with “Wei Chen (MSRA)” (by the end of 2008) and they did not
collaborate with any of the other co-authors of “Wei Chen (MSRA)”. On the other hand, Teng
collaborated with “Harry Shum”, who has strong connections with the graphics
researchers, one of whom is “Wei Chen (ZJU)”. In this respect, it is actually hard to say
that it was a “misclassification”, since the two authors collaborated with “Wei Chen
(MSRA)” only once.
5 Conclusion
We propose for the first time a game-theoretic framework to detect community
structures in social networks. This formulation intuitively matches the dynamic formation of
communities in real-world scenarios. Furthermore, since we do not require each agent
to join exactly one community, the resulting community structure naturally incorporates
overlapping communities.
Our experiments show that, even with simple utility functions defined on the agents,
our method is effective in discovering overlapping communities in several benchmark
graphs and real-world networks. Since the algorithm we use to find a local equilibrium
only implements local operations, the running time is short and the algorithm can fit into
a parallel framework.
There remain many interesting open problems under this framework. One direction
is to find more appropriate gain and loss functions. The proposed ones in this paper,
though simple and effective, are by no means the best choices for the community for-
mation games. In particular, we believe better gain and loss functions can be obtained by
deeper understanding of the community formation process in the real world networks.
6 Acknowledgement
The authors thank Prof. Wei Chen from State Key Lab of CAD&CG, Zhejiang Univer-
sity for commenting on the partition produced by our algorithm.
References
1. C. Alós-Ferrer and A. Ania. Local equilibria in economic games. Economics Letters,
70(2):165–173, 2001.
2. S. Athey and S. Jha. A theory of community formation and social hierarchy. working paper,
2006.
3. U. Brandes and T. Erlebach. Network Analysis: methodological foundations. Springer Ver-
lag, 2005.
4. A. Clauset, M. E. J. Newman, and C. Moore. Finding Community Structure in Very Large
Networks. Phys. Rev. E, 70(6):066111, Dec 2004.
5. J. Copic, M. O. Jackson, and A. Kirman. Identifying Community Structures from Network
Data via Maximum Likelihood Methods. The B.E. Journal of Theoretical Economics, 9,
2009. working paper.
6. P. O. Fjällström. Algorithms for graph partitioning: A Survey. In Linkoping Electronic
Articles in Computer and Information Science, 3., 1998.
7. S. Fortunato and M. Barthélemy. Resolution limit in community detection. Proceedings of
the National Academy of Sciences, 104(1):36, 2007.
8. S. Gregory. A fast algorithm to find overlapping communities in networks. In ECML/PKDD.
Springer, 2008.
9. J. D. Kasarda and M. Janowitz. Community Attachment in Mass Society. American Socio-
logical Review, 39(3):328–339, 1974.
10. P. Kotler and G. Zaltman. Social Marketing: An Approach to Planned Social Change. The
Journal of Marketing, 35(3):3–12, 1971.
11. A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algo-
rithms on directed and weighted graphs with overlapping communities. Physical Review
E, 80(1):16118, 2009.
12. D. Lusseau. The emergent properties of a dolphin social network. Proceedings: Biological
Sciences, 270:S186–S188, 2003.
13. D. McKenzie-Mohr and W. Smith. Fostering Sustainable Behavior: An Introduction to
Community-Based Social Marketing. New Society Publishers, 1999.
14. M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence.
Random Structures and Algorithms, 6(2-3):161–180, 1995.
15. M. E. J. Newman. Coauthorship networks and patterns of scientific collaboration. Pro-
ceedings of the National Academy of Sciences of the United States of America, 101(Suppl
1):5200–5205, 2004.
16. M. E. J. Newman. Who Is the Best Connected Scientist? A Study of Scientific Coauthorship
Networks. Complex Networks, 650:337–370, 2004.
17. M. E. J. Newman. Modularity and community structure in networks. Proceedings of the
National Academy of Sciences, 103(23):8577–8582, 2006.
18. V. Nicosia, G. Mangioni, V. Carchiolo, and M. Malgeri. Extending the definition of modu-
larity to directed graphs with overlapping communities. J. Stat. Mech, 3024, 2009.
19. N. Nisan, T. Roughgarden, É. Tardos, and V. V. Vazirani. Algorithmic game theory. Cam-
bridge University Press, 2007.
20. M. J. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, 1994.
21. G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community struc-
ture of complex networks in nature and society. Nature, 435(7043):814, 2005.
22. R. J. Sampson and W. B. Groves. Community Structure and Crime: Testing Social-
Disorganization Theory. American Journal of Sociology, 94(4):774, 1989.
23. S. Sarason. The Psychological Sense of Community. Jossey-Bass, 1974.
24. H. C. White, S. A. Boorman, and R. L. Breiger. Social Structure from Multiple Networks. I.
Blockmodels of Roles and Positions. American Journal of Sociology, 81(4):730, 1976.
25. W. W. Zachary. An information flow model for conflict and fission in small groups. Journal
of Anthropological Research, 33(4):452–473, 1977.