
Data Mining and BI: Social Network Analytics: Credits: Lada Adamic

This document provides an introduction to social network analysis and complex network theory. It discusses how networks can be used to model complex systems found in nature and society. Some key points made in the document include:
- Networks consist of nodes connected by links or edges, and can be used to represent interactions between components of various systems.
- Understanding networks is important for understanding complex systems like social groups, biological systems, and technological networks.
- Network science provides a shared language for studying relationships across different domains.
- Real-world networks often exhibit properties like strong clustering, short path lengths, and heavy-tailed degree distributions.
- Different data structures like adjacency matrices, edge lists, and adjacency lists can

Uploaded by marouli90. Copyright © All Rights Reserved.

Data Mining and BI: Social Network Analytics
Introduction

Credits: Lada Adamic


Source: https://github.com/ladamalina/coursera-sna/tree/master/Week%201.%20Introduction
INTRODUCTION
Networks and Complex Systems
• Society is a collection of billions of individuals
• Communication systems link electronic devices
• Information and knowledge are organized and linked
• Interactions between thousands of genes regulate life
• Our thoughts are hidden in the connections between billions of neurons in our brain
The network
• Behind many systems there is an intricate wiring diagram, a network, that defines the interactions between the components

We will never understand these systems unless we understand the networks behind them!
Why Networks? Why Now?
• Universal language for describing complex data
– Networks from science, nature, and technology are
more similar than one would expect
• Shared vocabulary between fields
– Computer Science, Social Science, Physics, Economics, Statistics, Biology
• Data availability (and computational challenges)
– Web/mobile, bio, health, and medical
• Impact!
– Social networking, Social media, Drug design
Some Research Facts
• Relationships are mostly mutual (reciprocity): if you like me, I will like you.
[Figure: nodes A and B]
• A friend of my friend tends to become my friend (triadic closure).
[Figure: nodes A, B and C]
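The two tendencies above can be measured on real data: reciprocity as the share of directed ties that are returned, and triadic closure by listing pairs who share a friend but are not yet linked. A minimal Python sketch, assuming a directed edge list of (source, target) pairs for reciprocity and an undirected adjacency dict of sets for friendships; the function names are hypothetical, for illustration only.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def closure_candidates(friends):
    """Unlinked pairs (a, b) that share at least one common friend.

    `friends` is an undirected adjacency dict: node -> set of neighbors.
    Triadic closure predicts that such pairs are likely to become friends.
    """
    candidates = set()
    for common, neighbors in friends.items():
        ordered = sorted(neighbors)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if b not in friends.get(a, set()):
                    candidates.add((a, b))
    return candidates


# A likes B and B likes A (mutual); A likes C, but C does not reciprocate.
print(reciprocity([("A", "B"), ("B", "A"), ("A", "C")]))  # 0.6666666666666666
# A and B are both friends with C, but not with each other.
print(closure_candidates({"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}))  # {('A', 'B')}
```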
Some Research Facts
• Making and maintaining friendships is costly.
• The average person can maintain only about 150 intimate relationships.
• So a friendship network this dense is hardly possible.
Some Research Facts
• This is a small world: on average, any two people in the world are separated by a chain of about six friendship links.
• On Facebook, there are on average 3.74 people in between any two users.
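The friendship distance above is a shortest-path length, which can be measured with breadth-first search (BFS). A minimal Python sketch, assuming an unweighted graph given as an adjacency dict; the function names are illustrative. Note that the number of people in between two users is their distance minus one, so 3.74 intermediaries corresponds to an average distance of 4.74 links.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:  # first visit in BFS is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs (s, t), t != s, t reachable from s."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Path a - b - c - d: pair distances 1, 2, 3, 1, 2, 1 (average 10/6).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(average_path_length(path))
```

Unreachable pairs are simply skipped here; on a graph with several components this reports the average within components rather than treating such distances as infinite.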
COURSE ORGANIZATION
Details
• Instructor: K. Tserpes
• 2 courses
– Duration: 3 hours
– Main Auditorium
• Organized based on (and using material from) the:
– Social Networking course taught by Lada Adamic, Assoc. Prof. of
Information at the Univ. of Michigan
(https://github.com/ladamalina/coursera-sna)
– CS224W: Social and Information Network Analysis, Jure Leskovec,
Stanford University (http://cs224w.stanford.edu)
• webpage: eclass
• Contact details:
[email protected]
– Office 4.6
BASIC NETWORK PROPERTIES
Components of a Network
Concept Representation Notation
Objects nodes, vertices V
Interactions links, edges E
System network, graph G(V,E)

• Nodes are entities/actors
– Can be people, organizations, countries
• Edges are relationships
– Can be friendship, transaction, trust, relationship
[Figure: example network with nodes A, B, C and D]
Networks Vs Graphs
• The term “network” often refers to real systems
– Web, social, metabolic networks
– Language
• Network, Node, Link
• Graph is a formal notation for representing a
network
– Web Graph, Social Graph
– Language
• Graph, vertex, edge
Edges
• Directed (also called arcs, links)
– A->B
• A likes B, A gave a gift to B, A is B’s child
• Undirected
– A <-> B or A – B
• A and B like each other
• A and B are siblings
• A and B are co-authors
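In code, the difference between the two edge types comes down to whether the pair of endpoints is ordered. A minimal Python sketch, assuming edges are stored in sets: an ordered tuple for a directed edge and a frozenset for an undirected one (one common choice, not the only one; the helper names are hypothetical).

```python
# Directed edge: an ordered pair, so A -> B is not the same as B -> A.
likes = {("A", "B")}  # A likes B

# Undirected edge: an unordered pair, so A - B is the same as B - A.
siblings = {frozenset(("A", "B"))}  # A and B are siblings


def has_arc(arcs, u, v):
    """True if the directed edge u -> v exists."""
    return (u, v) in arcs


def has_edge(edges, u, v):
    """True if the undirected edge u - v exists (order does not matter)."""
    return frozenset((u, v)) in edges


print(has_arc(likes, "A", "B"), has_arc(likes, "B", "A"))  # True False
print(has_edge(siblings, "B", "A"))  # True
```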
How do you model something as a network?

Ngram graph

Bird Migration Network

Organizational Network

How do you model this? Supply (transportation) network

What can you make out of these models?

Edge attributes
• Examples
– Weight (e.g. frequency of communication)
– Ranking (best friend, second best friend…)
– Type (friend, relative, co-worker)
– Properties depending on the structure of the rest
of the graph: e.g. betweenness
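Edge attributes can be stored alongside each edge, for example as a small dictionary. A minimal Python sketch, assuming a hypothetical list of (source, target, attributes) triples with made-up weights; it derives a ranking (best friend, second best friend, …) from the weight and filters edges by type.

```python
# Hypothetical data: weight = e.g. number of messages exchanged (made-up values).
edges = [
    ("A", "B", {"weight": 12, "type": "friend"}),
    ("A", "C", {"weight": 3, "type": "co-worker"}),
    ("A", "D", {"weight": 7, "type": "relative"}),
]


def rank_ties(edges, node):
    """Rank a node's outgoing ties by weight: 1 = strongest (e.g. best friend)."""
    ties = [(target, attrs["weight"]) for source, target, attrs in edges if source == node]
    ties.sort(key=lambda tie: tie[1], reverse=True)
    return {target: rank for rank, (target, _) in enumerate(ties, start=1)}


def ties_of_type(edges, node, kind):
    """Targets of `node`'s outgoing edges whose 'type' attribute equals `kind`."""
    return [target for source, target, attrs in edges
            if source == node and attrs.get("type") == kind]


print(rank_ties(edges, "A"))  # {'B': 1, 'D': 2, 'C': 3}
print(ties_of_type(edges, "A", "relative"))  # ['D']
```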
Directed Networks

Girls’ school dormitory dining-table partners, 1st and 2nd choices (Moreno, The Sociometry Reader, 1960)
Positive and negative weights
• e.g. one person trusting/distrusting another
• Research challenges:
– How does one ‘propagate’ negative feelings in a
social network?
– Is my enemy’s enemy my friend?
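One classical way to formalize "my enemy's enemy is my friend" is structural balance: a triangle of signed ties is balanced when the product of its three signs is positive. A minimal Python sketch, assuming undirected signed ties (+1 trust, -1 distrust) keyed by frozenset; this is one possible model, not a method prescribed by the slides, and it ignores direction, which real trust networks usually have.

```python
def sign(signs, u, v):
    """Sign of the undirected tie u - v: +1 (trust) or -1 (distrust)."""
    return signs[frozenset((u, v))]


def is_balanced(signs, a, b, c):
    """A triangle is balanced if the product of its three edge signs is positive."""
    return sign(signs, a, b) * sign(signs, b, c) * sign(signs, a, c) > 0


# "My enemy's enemy is my friend": two negative ties and one positive tie.
ties = {
    frozenset(("me", "X")): -1,  # X is my enemy
    frozenset(("X", "Y")): -1,   # Y is X's enemy
    frozenset(("me", "Y")): +1,  # Y is my friend
}
print(is_balanced(ties, "me", "X", "Y"))  # True

ties[frozenset(("me", "Y"))] = -1  # now all three distrust each other
print(is_balanced(ties, "me", "X", "Y"))  # False
```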
Data Representation
• Adjacency Matrix
• Edgelist
• Adjacency list

An adjacency-matrix representation may be preferred when the graph is dense (|E| is close to |V|^2) or when we need to be able to tell quickly whether there is an edge connecting two given vertices.

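[Figure: directed example network with nodes 1 to 5]

Adjacency matrix (rows = source node, columns = target node):
      1 2 3 4 5
  1   0 0 0 0 0
  2   0 0 1 1 0
  3   0 1 0 1 0
  4   0 0 0 0 1
  5   1 1 0 0 0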

Quiz: How would this change if there were a self-loop at node #1?
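The adjacency matrix of the example network can be built directly from its directed edges. A minimal Python sketch using plain nested lists (no libraries), assuming nodes are numbered 1 to n and that row i, column j holds 1 when there is an edge i -> j; the edge list is read off the example network. Testing whether an edge exists is then a single constant-time lookup.

```python
# Directed edges (from, to) of the 5-node example network.
EDGES = [(2, 4), (2, 3), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]


def adjacency_matrix(n, edges):
    """n x n list of lists; A[i-1][j-1] == 1 iff there is an edge i -> j."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
    return A


def has_edge(A, i, j):
    """Constant-time edge test: one index lookup, whatever the graph size."""
    return A[i - 1][j - 1] == 1


A = adjacency_matrix(5, EDGES)
for row in A:
    print(row)
print(has_edge(A, 2, 3), has_edge(A, 3, 5))  # True False
```

The price of the fast lookup is memory: the matrix always has n x n cells, even when most of them are 0.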
Data Representation
• Adjacency Matrix
• Edgelist
• Adjacency list

The edgelist representation is usually preferred for the computational processing of large-scale graphs (i.e., very large |E|).

[Figure: directed example network with nodes 1 to 5]

From  To
2     4
2     3
3     2
3     4
4     5
5     2
5     1
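An edge list maps naturally onto a plain text file with one "from to" pair per line, which can be processed as a stream without ever building a |V| x |V| matrix. A minimal Python sketch, assuming whitespace-separated integer node IDs and an optional header line such as "From To"; this file format is an assumption, not one defined by the slides.

```python
def read_edgelist(lines):
    """Yield (source, target) integer pairs from 'source target' text lines.

    Works on any iterable of strings, e.g. an open file, so a large edge list
    is read one line at a time. Header and malformed lines are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            yield int(parts[0]), int(parts[1])


text = """From To
2 4
2 3
3 2
3 4
4 5
5 2
5 1
"""
edges = list(read_edgelist(text.splitlines()))
print(len(edges))  # 7
print(edges[:2])   # [(2, 4), (2, 3)]
```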
Data Representation
• Adjacency Matrix
• Edgelist
• Adjacency list

The adjacency-list representation is usually preferred, because it provides a compact way to represent sparse graphs, those for which |E| is much less than |V|^2.

[Figure: directed example network with nodes 1 to 5]

Node  Adj nodes
2     3 4
3     2 4
4     5
5     2 1
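Building the adjacency list from the edge list takes a single pass, after which all out-neighbors of a node are available directly. A minimal Python sketch using a dict of lists, assuming the same directed 5-node example; neighbor lists are sorted here only to make the output deterministic, and node 1, which has no outgoing edges, gets no entry, as in the table above.

```python
from collections import defaultdict

# Directed edges (from, to) of the 5-node example network.
EDGES = [(2, 4), (2, 3), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]


def adjacency_list(edges):
    """Map each node to the sorted list of nodes its outgoing edges point to."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    return {u: sorted(vs) for u, vs in adj.items()}


adj = adjacency_list(EDGES)
print(adj)             # {2: [3, 4], 3: [2, 4], 4: [5], 5: [1, 2]}
print(adj.get(1, []))  # []
```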
Quiz
• Which representation would you use to quickly retrieve all neighbors of a node?
Computing Metrics
• Degree & degree distribution
• Connected components
Degree: Which node is most connected?

Les Miserables: coappearance weighted network of characters in the novel Les Miserables. D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA (1993).
Nodes
• Node network properties
– From immediate connections
• Indegree: how many directed edges (arcs) end at a node
• Outdegree: how many directed edges (arcs) originate at a node
• Degree (in or out): number of edges incident on a node
– From the entire graph
• Centrality (betweenness, closeness)

[Figure: directed example network with nodes 1 to 5]
Node #4 Indegree: ?
Node #3 Outdegree: ?
Node #1 Degree: ?
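In-degree, out-degree and undirected degree can all be counted from the edge list. A minimal Python sketch, assuming the same directed 5-node example; here "undirected degree" is taken to mean the number of distinct neighbors once edge direction is ignored, so a reciprocated pair such as 2 -> 3 and 3 -> 2 counts only once.

```python
from collections import Counter

# Directed edges (from, to) of the 5-node example network.
EDGES = [(2, 4), (2, 3), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]


def in_degrees(edges):
    """Number of edges ending at each node (missing nodes count as 0)."""
    return Counter(v for _, v in edges)


def out_degrees(edges):
    """Number of edges starting at each node (missing nodes count as 0)."""
    return Counter(u for u, _ in edges)


def undirected_degrees(edges):
    """Number of distinct neighbors per node, ignoring direction and self-loops."""
    neighbors = {}
    for u, v in edges:
        if u != v:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


indeg, outdeg = in_degrees(EDGES), out_degrees(EDGES)
for node in range(1, 6):
    print(node, indeg[node], outdeg[node])
print(undirected_degrees(EDGES))
```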
Node degree from matrix values
[Figure: directed example network with nodes 1 to 5]

      1 2 3 4 5
  1   0 0 0 0 0
  2   0 0 1 1 0
  3   0 1 0 1 0
  4   0 0 0 0 1
  5   1 1 0 0 0
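With rows as source nodes and columns as target nodes, degrees can be read straight off the matrix: summing a row gives that node's out-degree, and summing a column gives its in-degree. A minimal Python sketch over the example matrix, using plain lists and zip(*A) to walk the columns.

```python
# Adjacency matrix of the 5-node example (row = from-node, column = to-node).
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]

out_degree = [sum(row) for row in A]       # row sums: edges leaving each node
in_degree = [sum(col) for col in zip(*A)]  # column sums: edges entering each node

for node, (i, o) in enumerate(zip(in_degree, out_degree), start=1):
    print(f"node {node}: in={i} out={o}")

# Every edge leaves one node and enters one node, so both totals equal |E|.
assert sum(out_degree) == sum(in_degree) == 7
```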
Network metrics: degree sequence and degree distribution
[Figure: directed example network with nodes 1 to 8]
• Degree sequence: a list of the degrees (in-, out-, or undirected) of all nodes, sorted in decreasing order
– In-degree sequence:
• [2, 2, 2, 1, 1, 1, 1, 0]
– Out-degree sequence:
• [2, 2, 2, 1, 1, 1, 1, 0]
– (undirected) degree sequence:
• [3, 3, 3, 2, 2, 1, 1, 1]
• Degree distribution: a frequency count of the occurrence of each degree, written as (degree, count) pairs
– In-degree distribution:
• [(2,3) (1,4) (0,1)]
– Out-degree distribution:
• [(2,3) (1,4) (0,1)]
– (undirected) distribution:
• [(3,3) (2,2) (1,3)]
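Both summaries are easy to compute once each node's degree is known. A minimal Python sketch, assuming degrees are given as a plain list with one entry per node; the distribution is returned as (degree, count) pairs in decreasing order of degree. The final check uses the fact that in any directed graph the in-degrees and the out-degrees both sum to the number of edges.

```python
from collections import Counter


def degree_sequence(degrees):
    """Degrees of all nodes, sorted in decreasing order."""
    return sorted(degrees, reverse=True)


def degree_distribution(degrees):
    """(degree, number of nodes with that degree) pairs, highest degree first."""
    return sorted(Counter(degrees).items(), reverse=True)


in_seq = [2, 2, 2, 1, 1, 1, 1, 0]          # in-degree sequence from the slide
undirected_seq = [3, 3, 3, 2, 2, 1, 1, 1]  # undirected degree sequence from the slide
print(degree_distribution(in_seq))          # [(2, 3), (1, 4), (0, 1)]
print(degree_distribution(undirected_seq))  # [(3, 3), (2, 2), (1, 3)]

# Sanity check on the 5-node example: in- and out-degrees must both sum to |E| = 7.
in_5 = [1, 2, 1, 2, 1]
out_5 = [0, 2, 2, 1, 2]
assert sum(in_5) == sum(out_5) == 7
print(degree_sequence(out_5))  # [2, 2, 2, 1, 0]
```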
Quiz: Indegree distribution?
Is everything connected?
Connected components
[Figure: directed example network with nodes 1 to 8]
• Strongly connected components: each node within the component can be reached from every other node in the component by following directed links
– Strongly connected components
• {2, 3, 4, 5}
• {1}
• {6, 7}
• {8}
• Weakly connected components: every node can be reached from every other node by following links in either direction
– Weakly connected components
• {1, 2, 3, 4, 5}
• {6, 7, 8}
• In undirected networks one talks simply about ‘connected components’
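Weakly connected components can be found with any graph traversal once direction is ignored; strongly connected components need a direction-aware algorithm. A minimal Python sketch, using iterative depth-first search for the weak components and Kosaraju's algorithm (one standard choice among several, e.g. Tarjan's) for the strong ones. It is run on the 5-node example from the data-representation slides, since the edges of the 8-node figure are not listed; the recursive DFS in the Kosaraju part is fine for small graphs but would hit Python's recursion limit on very large ones.

```python
from collections import defaultdict


def weakly_connected_components(nodes, edges):
    """Components when every directed edge is treated as undirected."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def strongly_connected_components(nodes, edges):
    """Kosaraju: order nodes by DFS finish time, then DFS the reversed graph."""
    forward, backward = defaultdict(list), defaultdict(list)
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)

    finished, visited = [], set()

    def visit(u):
        visited.add(u)
        for v in forward[u]:
            if v not in visited:
                visit(v)
        finished.append(u)  # u finishes after all nodes reachable from it

    for n in nodes:
        if n not in visited:
            visit(n)

    components, assigned = [], set()

    def collect(u, component):
        assigned.add(u)
        component.add(u)
        for v in backward[u]:
            if v not in assigned:
                collect(v, component)

    for n in reversed(finished):  # latest finish time first
        if n not in assigned:
            component = set()
            collect(n, component)
            components.append(component)
    return components


NODES = [1, 2, 3, 4, 5]
EDGES = [(2, 4), (2, 3), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
print(weakly_connected_components(NODES, EDGES))    # [{1, 2, 3, 4, 5}]
print(strongly_connected_components(NODES, EDGES))  # [{2, 3, 4, 5}, {1}]
```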
Quiz: How many strongly connected components are in this network?
Giant component
• If the largest component encompasses a
significant fraction of the graph, it is called the
giant component
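Whether the largest component counts as giant depends on what fraction of all nodes it holds. A minimal Python sketch that computes that fraction from a list of components (sets of nodes that together cover every node, singletons included), illustrated with the weakly connected components of the 8-node example; the slides give no numeric threshold for "significant", and none is assumed here.

```python
def largest_component_share(components):
    """Return the largest component and the fraction of all nodes it contains."""
    total = sum(len(c) for c in components)
    largest = max(components, key=len)
    return largest, len(largest) / total


# Weakly connected components of the 8-node example network.
giant, share = largest_component_share([{1, 2, 3, 4, 5}, {6, 7, 8}])
print(sorted(giant), share)  # [1, 2, 3, 4, 5] 0.625
```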
Learning from Flickr & Yahoo
• Types of network activity:
– “Singletons,” who have no connections and are least central
– The “giant component,” which is the largest group of nodes tightly connected to the central nodes and to each other
– The “middle region,” which consists of isolated groups that interact amongst themselves but not with the rest of the network, forming isolated stars. These groups grow one user at a time. Over time they merge with the giant component.
Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '06). ACM, New York, NY, USA, 611-617. DOI=http://dx.doi.org/10.1145/1150402.1150476
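The three regions described above can be read off a list of connected components. A minimal Python sketch of that partition, assuming an undirected view of the network and treating every non-giant component with two or more nodes as part of the middle region; this is a simplification for illustration, not the exact procedure of Kumar et al., and the toy data is hypothetical.

```python
def partition_regions(components):
    """Split components (sets of nodes) into singletons, the giant component, and the middle region."""
    giant = max(components, key=len)
    singletons = [c for c in components if len(c) == 1]
    middle = [c for c in components if len(c) > 1 and c is not giant]
    return singletons, giant, middle


# Hypothetical toy network: one large component, two small groups, two isolated users.
components = [{1, 2, 3, 4, 5, 6}, {7, 8, 9}, {10, 11}, {12}, {13}]
singletons, giant, middle = partition_regions(components)
print(len(singletons), len(giant), [len(c) for c in middle])  # 2 6 [3, 2]
```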
