Data Mining and BI: Social Network Analytics: Credits: Lada Adamic
Network Analytics
Introduction
Some Research Facts
• Making and maintaining friendships is costly.
• The average person can maintain only about 150 intimate relationships.
• So a friendship network like the one shown is hardly possible.
Some Research Facts
• It is a small world: on average, any two people in the world are separated by a chain of about six friendship links.
Ngram graph
Organizational Network
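Example directed network (nodes 1 to 5) and its adjacency matrix (row = from node, column = to node):
    1 2 3 4 5
1   0 0 0 0 0
2   0 0 1 1 0
3   0 1 0 1 0
4   0 0 0 0 1
5   1 1 0 0 0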
Quiz: How would this change if there was a self-loop in node #1?
Data Representation
• Adjacency matrix
• Edgelist
• Adjacency list
The edgelist representation is usually preferred for the computational processing of large-scale graphs (i.e. very large |E|).
Edgelist of the example network:
From  To
2     3
2     4
3     2
3     4
4     5
5     1
5     2
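A minimal Python sketch of turning the edgelist into the adjacency matrix, assuming nodes are labelled 1 to 5 and that row i holds the outgoing links of node i. The names `EDGES`, `N`, and `edgelist_to_matrix` are illustrative only, not part of the original slides.

```python
# Edgelist of the 5-node example network (directed: from -> to).
EDGES = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 1), (5, 2)]
N = 5  # nodes are labelled 1..N


def edgelist_to_matrix(edges, n):
    """Return an n x n 0/1 adjacency matrix; row = from node, column = to node."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        # Node labels start at 1, list indices start at 0.
        matrix[src - 1][dst - 1] = 1
    return matrix


if __name__ == "__main__":
    for row in edgelist_to_matrix(EDGES, N):
        print(" ".join(map(str, row)))
```

The matrix always occupies |V|² cells no matter how few edges exist, which is the cost the edgelist and adjacency list avoid for sparse graphs.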
Data Representation
• Adjacency matrix
• Edgelist
• Adjacency list
The adjacency-list representation is usually preferred because it provides a compact way to represent sparse graphs, i.e. those for which |E| is much less than |V|².
Adjacency list of the example network:
Node  Adj nodes
1     (none)
2     3, 4
3     2, 4
4     5
5     1, 2
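A possible Python sketch of the adjacency list as a dictionary mapping each node to the nodes its out-links point to, built from the same 5-node edgelist. The function name `edgelist_to_adjlist` is hypothetical; storage grows with |V| + |E| rather than |V|².

```python
from collections import defaultdict

# Edgelist of the 5-node example network (directed: from -> to).
EDGES = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 1), (5, 2)]


def edgelist_to_adjlist(edges, nodes):
    """Map each node to the sorted list of nodes its out-links point to."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    # Nodes without out-links (node 1 here) still get an entry, an empty list.
    return {v: sorted(adj[v]) for v in nodes}


adj = edgelist_to_adjlist(EDGES, range(1, 6))
print(adj)
# Retrieving all neighbors of a node is a single dictionary lookup.
print(adj[5])
```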
Quiz
• Which representation should you use if you want to quickly retrieve all neighbors of a node?
Computing Metrics
• Degree & degree distribution
• Connected components
Degree: Which node is most connected?
• Node #4 in-degree: ?
• Node #3 out-degree: ?
• Node #1 degree: ?
Node degree from matrix values
    1 2 3 4 5
1   0 0 0 0 0
2   0 0 1 1 0
3   0 1 0 1 0
4   0 0 0 0 1
5   1 1 0 0 0
• Out-degree of node i = sum of the entries in row i
• In-degree of node j = sum of the entries in column j
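A short Python sketch reading degrees off the 5-node example's adjacency matrix (row = from node, column = to node): out-degree is a row sum and in-degree is a column sum. The helper names are illustrative, not from the slides.

```python
# Adjacency matrix of the 5-node example (row = from node, column = to node).
MATRIX = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]


def out_degrees(matrix):
    """Out-degree of node i = number of 1s in row i."""
    return [sum(row) for row in matrix]


def in_degrees(matrix):
    """In-degree of node j = number of 1s in column j (zip(*m) yields columns)."""
    return [sum(col) for col in zip(*matrix)]


out_deg = out_degrees(MATRIX)
in_deg = in_degrees(MATRIX)
for node, (i, o) in enumerate(zip(in_deg, out_deg), start=1):
    print(f"node {node}: in={i} out={o} total={i + o}")
```

Each edge adds one to exactly one row sum and one column sum, so the in-degrees and out-degrees both total the number of edges (7 in this example).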
Network metrics: degree sequence and degree distribution
(Example: directed network with nodes 1 to 8)
• Degree sequence: an ordered list of the (in, out)-degree of each node
– In-degree sequence:
• [2, 2, 2, 1, 1, 1, 1, 0]
– Out-degree sequence:
• [2, 2, 2, 2, 1, 1, 1, 0]
– (undirected) degree sequence:
• [3, 3, 3, 2, 2, 1, 1, 1]
• Degree distribution: a frequency count of the occurrence of each degree, written as (degree, number of nodes) pairs
– In-degree distribution:
• [(2,3) (1,4) (0,1)]
– Out-degree distribution:
• [(2,4) (1,3) (0,1)]
– (undirected) distribution:
• [(3,3) (2,2) (1,3)]
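The 8-node figure behind these sequences is not reproduced in this text, so the Python sketch below applies the same two definitions to the 5-node example network instead. It is an illustration under that assumption, with hypothetical helper names. The undirected view treats the reciprocal links 2 → 3 and 3 → 2 as a single edge.

```python
from collections import Counter

# Edgelist of the 5-node example network (directed: from -> to).
EDGES = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 1), (5, 2)]
NODES = range(1, 6)


def degree_sequence(degree_of, nodes):
    """Degrees of all nodes, sorted from largest to smallest."""
    return sorted((degree_of[v] for v in nodes), reverse=True)


def degree_distribution(degree_of, nodes):
    """(degree, number of nodes) pairs, largest degree first."""
    counts = Counter(degree_of[v] for v in nodes)
    return sorted(counts.items(), reverse=True)


# A Counter returns 0 for missing keys, e.g. node 1 has no out-links.
in_deg = Counter(dst for _, dst in EDGES)
out_deg = Counter(src for src, _ in EDGES)

# Undirected view: frozenset({2, 3}) merges the reciprocal pair into one edge.
undirected_edges = {frozenset(e) for e in EDGES}
und_deg = Counter(v for e in undirected_edges for v in e)

for name, deg in [("in", in_deg), ("out", out_deg), ("undirected", und_deg)]:
    print(name, degree_sequence(deg, NODES), degree_distribution(deg, NODES))
```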
Quiz: In-degree distribution?
Is everything connected?
Connected components
• Strongly connected components: each node within the
component can be reached from every other node in the
component by following directed links
– Strongly connected components
• {2, 3, 4, 5}
• {1}
• {6, 7}
• {8}
• Weakly connected components: every node can be reached from
every other node by following links in either direction
– Weakly connected components
• {1, 2, 3, 4, 5}
• {6, 7, 8}
• In undirected networks, one talks simply about ‘connected components’.
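One way to compute both kinds of component in Python, sketched here with Kosaraju's algorithm for the strongly connected components and an undirected flood fill for the weakly connected ones. The edges for nodes 1 to 5 are the earlier 5-node example. The three edges among nodes 6, 7, 8 are HYPOTHETICAL, chosen only so the output resembles the lists above; they are not read off the slide's figure.

```python
from collections import defaultdict

# 5-node example edges, plus three HYPOTHETICAL edges among nodes 6-8.
EDGES = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 1), (5, 2),
         (6, 7), (7, 6), (7, 8)]
NODES = list(range(1, 9))


def strongly_connected_components(nodes, edges):
    """Kosaraju: DFS finish order on G, then DFS on the reversed graph."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in fwd[u]:
            if v not in seen:
                visit(v)
        order.append(u)  # u finishes after all nodes reachable from it

    for u in nodes:
        if u not in seen:
            visit(u)

    components, assigned = [], set()

    def collect(u, comp):
        assigned.add(u)
        comp.add(u)
        for v in rev[u]:
            if v not in assigned:
                collect(v, comp)

    # Latest-finishing node first; each search on the reversed graph is one SCC.
    for u in reversed(order):
        if u not in assigned:
            comp = set()
            collect(u, comp)
            components.append(comp)
    return components


def weakly_connected_components(nodes, edges):
    """Ignore edge direction and flood-fill each component with a stack."""
    undirected = defaultdict(set)
    for u, v in edges:
        undirected[u].add(v)
        undirected[v].add(u)
    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in undirected[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(comp)
    return components


print([sorted(c) for c in strongly_connected_components(NODES, EDGES)])
print([sorted(c) for c in weakly_connected_components(NODES, EDGES)])
```

Node 1 forms its own strongly connected component because it has an incoming link from 5 but no way back, while ignoring direction joins it to {2, 3, 4, 5}.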
Quiz: How many strongly connected
components are in this network?
Giant component
• If the largest component encompasses a
significant fraction of the graph, it is called the
giant component
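What counts as a "significant fraction" is not fixed by a single threshold, so the Python sketch below only reports the size of the largest component relative to the whole graph. The 10-node undirected network is HYPOTHETICAL: one large group, one isolated pair, and two singletons.

```python
from collections import deque

# HYPOTHETICAL undirected network: nodes 1-10.
NODES = list(range(1, 11))
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 6), (7, 8)]


def component_sizes(nodes, edges):
    """Sizes of the connected components found by BFS, largest first."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


sizes = component_sizes(NODES, EDGES)
print(sizes, sizes[0] / len(NODES))
```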
Learning from Flickr & Yahoo
• Types of network activity:
– “Singletons,” who have no connections and are the least central
– The “giant component,” which is the largest group of nodes, tightly connected to the central nodes and to each other
– The “middle region,” which consists of isolated groups that interact among themselves but not with the rest of the network, forming isolated stars. These groups grow one user at a time, and over time they merge with the giant component.
Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '06). ACM, New York, NY, USA, 611-617. DOI=http://dx.doi.org/10.1145/1150402.1150476