Social Network Analysis in R
Drew Conway
August 6, 2009
Introduction
The number of software suites and packages available for conducting social network analysis (SNA) has exploded over the past ten years.
- In general, this software can be categorized in two ways:
  - Type: many SNA tools are developed as standalone applications, while others are language-specific packages
  - Intent: consumers and producers of SNA come from a wide range of technical expertise and needs; therefore, there are simple tools for data collection and basic analysis, as well as complex suites for advanced research
All benchmark tests in the comparison below were performed on a 2.5 GHz Intel Core 2 Duo MacBook Pro with 4 GB of 667 MHz DDR2 RAM.
Drew Conway Social Network Analysis in R
Comparison of SNA in R vs. Python
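NX Code 1

    import time
    import networkx

    def betweenness_test(G):
        start = time.perf_counter()  # time.clock() was removed in Python 3.8
        B = networkx.betweenness_centrality(G)  # 2009 name: brandes_betweenness_centrality
        return time.perf_counter() - start

igraph Code 1

    library(igraph)
    betweenness_test <- function(graph) {
        return(betweenness(graph))
    }
    system.time(B <- betweenness_test(G))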
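NetworkX's betweenness routine (named after Brandes in the 2009 API) uses Brandes' algorithm: one breadth-first search per source that counts shortest paths, followed by a back-propagation of "dependencies." As an illustration only, here is a minimal pure-Python sketch for unweighted, undirected graphs. It is not the NetworkX or igraph code; the function name `brandes_betweenness` and the dict-of-neighbours input format are assumptions. Scores are unnormalized and count each pair of actors once.

```python
from collections import deque


def brandes_betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours (hypothetical format).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths by BFS, counting paths in sigma
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every path was counted once from each endpoint
    return {v: c / 2.0 for v, c in bc.items()}


# Path a - b - c: only b lies on a shortest path between two other actors
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(brandes_betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```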
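NX Code 2

    def layout_test(G, i=50):
        start = time.perf_counter()
        # spring_layout is NetworkX's Fruchterman-Reingold layout
        v = networkx.spring_layout(G, iterations=i)
        return time.perf_counter() - start

igraph Code 2

    layout_test <- function(graph, i=50) {
        # layout_with_fr() is the current name of layout.fruchterman.reingold()
        return(layout_with_fr(graph, niter=i))
    }
    system.time(v <- layout_test(G))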
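Both layout calls run the Fruchterman-Reingold force-directed algorithm for a fixed number of iterations, so the iteration count directly drives runtime. The sketch below is a simplified illustration, not either library's code: the function name `fruchterman_reingold`, the random start in a unit square, linear cooling, and the absence of frame clamping are all assumptions. Each iteration costs O(n²) for the pairwise repulsive forces.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, width=1.0, seed=0):
    """Simplified Fruchterman-Reingold layout; returns {node: (x, y)}.

    nodes must be a list; edges is a list of (u, v) pairs.
    """
    rng = random.Random(seed)
    pos = {v: [rng.random() * width, rng.random() * width] for v in nodes}
    k = width * math.sqrt(1.0 / len(nodes))  # ideal distance between nodes
    temp = width / 10.0                       # max displacement, cooled each step
    cooling = temp / iterations

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes, magnitude k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along each edge, magnitude d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, but by at most `temp`
        for v in nodes:
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            step = min(length, temp)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temp -= cooling

    return {v: (p[0], p[1]) for v, p in pos.items()}


ring = list(range(6))
ring_edges = [(i, (i + 1) % 6) for i in ring]
layout = fruchterman_reingold(ring, ring_edges, iterations=50)
print(len(layout))  # 6
```

Fixing the seed makes the layout reproducible, which matters when comparing timings or plots across runs.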
NX Code 3

    def diameter_test(G):
        start = time.perf_counter()
        D = networkx.diameter(G)  # 2009 name: networkx.distance.diameter
        return time.perf_counter() - start

igraph Code 3

    diameter_test <- function(graph) {
        return(diameter(graph))
    }
    system.time(D <- diameter_test(G))
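The diameter is the longest of all shortest paths, so for an unweighted graph it can be found with one breadth-first search per node (O(n·m) overall). A minimal sketch, assuming a graph stored as a dict of neighbour lists; the function names are hypothetical. Like NetworkX, it raises an error on a disconnected graph, whereas igraph's `diameter()` by default returns the largest diameter among the connected components.

```python
from collections import deque


def bfs_eccentricity(adj, source):
    """Largest shortest-path distance (in hops) from source to any other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if len(dist) < len(adj):
        raise ValueError("graph is disconnected; diameter is infinite")
    return max(dist.values())


def diameter(adj):
    """Diameter of a connected, unweighted graph: the maximum eccentricity."""
    return max(bfs_eccentricity(adj, v) for v in adj)


print(diameter({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))  # 2
```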
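NX Code 4

    def max_clique_test(G):
        start = time.perf_counter()
        # find_cliques() returns a lazy generator; list() forces the
        # enumeration so the timing covers the actual work
        C = list(networkx.find_cliques(G))
        return time.perf_counter() - start

igraph Code 4

    max_clique_test <- function(graph) {
        return(max_cliques(graph))  # formerly maximal.cliques()
    }
    system.time(M <- max_clique_test(G))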
Finding maximal cliques can require several nested loops, which may contribute to R’s
poor performance
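Maximal-clique enumeration is usually done with a recursive backtracking search from the Bron-Kerbosch family, which is where the nested loops come from. NetworkX documents `find_cliques` as based on Bron and Kerbosch's algorithm as adapted by later authors; the sketch below is the textbook version with pivoting, not NetworkX's code, and the name `bron_kerbosch` and the dict-of-neighbour-sets input are assumptions.

```python
def bron_kerbosch(adj):
    """Yield every maximal clique (as a set) of an undirected graph.

    adj maps each node to a set of its neighbours, with no self-loops.
    """
    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            yield r  # nothing can be added to r, so it is maximal
            return
        # Pivot on the node adjacent to the most candidates to prune branches
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# A triangle {1, 2, 3} with a pendant edge 3 - 4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(c) for c in bron_kerbosch(adj)))  # [[1, 2, 3], [3, 4]]
```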
Social network analysis is often used to identify key actors within a social group. To identify these actors, various centrality metrics can be computed from a network's structure:
- Degree (number of connections)
- Betweenness (number of shortest paths between other actors that pass through an actor)
- Closeness (how near an actor is to all other actors, based on the inverse of its total distance to them)
- Eigenvector centrality (the actor's entry in the leading eigenvector of the sociomatrix)
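As a concrete reference for degree, closeness, and eigenvector centrality, here is a small pure-Python sketch for a connected, unweighted graph stored as a dict mapping each actor to its neighbours (an assumed format; function names are hypothetical). Closeness uses the normalized form (n - 1) / (sum of distances); igraph's `closeness()` omits the n - 1 factor by default. Eigenvector centrality uses power iteration on A + I, which has the same leading eigenvector as A but also converges on bipartite graphs.

```python
import math
from collections import deque


def degree(adj):
    """Degree centrality: the number of neighbours of each actor."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness(adj):
    """Normalized closeness: (n - 1) / sum of BFS distances to all others."""
    n = len(adj)
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        scores[s] = (n - 1) / sum(dist.values())
    return scores


def eigenvector(adj, iterations=1000, tol=1e-10):
    """Eigenvector centrality by power iteration, scaled to unit length."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One multiplication by (A + I)
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(degree(star))     # {0: 3, 1: 1, 2: 1, 3: 1}
print(closeness(star))  # {0: 1.0, 1: 0.6, 2: 0.6, 3: 0.6}
print({v: round(s, 3) for v, s in eigenvector(star).items()})  # {0: 0.707, 1: 0.408, 2: 0.408, 3: 0.408}
```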
One method for using these metrics to identify key actors is to plot actors' eigenvector centrality (EC) scores against their betweenness scores. In theory, these two metrics should be approximately linearly related; therefore, actors that fall far from the linear trend are of note.
- An actor with very high betweenness but low EC may be a critical gatekeeper to a central actor
- Likewise, an actor with low betweenness but high EC may have unique access to central actors
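The code that produced the residuals used below is not shown, but one straightforward way to measure distance from the linear trend is to regress EC on betweenness by ordinary least squares and rank actors by absolute residual. A minimal sketch under that assumption, with hypothetical names and parallel lists of scores as input: a strongly negative residual marks high betweenness relative to EC (a possible gatekeeper), and a strongly positive one marks high EC relative to betweenness.

```python
def ols_residuals(x, y):
    """Residuals of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def rank_outliers(ids, betweenness, eigenvector, top=3):
    """Actors sorted by |residual| of EC regressed on betweenness, largest first.

    Negative residual: EC lower than the trend predicts (high betweenness, low EC).
    Positive residual: EC higher than the trend predicts (low betweenness, high EC).
    """
    res = ols_residuals(betweenness, eigenvector)
    ranked = sorted(zip(ids, res), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:top]


# Toy scores: actor "c" sits well above the trend
ids = ["a", "b", "c", "d"]
btw = [0.0, 1.0, 2.0, 3.0]
ec = [0.0, 1.0, 5.0, 3.0]
print(rank_outliers(ids, btw, ec, top=1))
```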
For this example, we will use the main (largest connected) component of the social network collected on drug users in Hartford, CT. The main component has 194 nodes and 273 edges.
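Extracting the main component amounts to finding the largest connected component. A minimal sketch, assuming the data arrive as a whitespace-separated edge list with one "u v" pair per line (the format read with igraph's edgelist reader later on); `read_edgelist` and `main_component` are hypothetical names, and self-loops are not handled specially.

```python
from collections import defaultdict, deque


def read_edgelist(lines):
    """Build undirected adjacency sets from 'u v' lines (extra fields ignored)."""
    adj = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        u, v = fields[0], fields[1]
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def main_component(adj):
    """Node set of the largest connected component (BFS from each unseen node)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


edges = ["1 2", "2 3", "3 1", "4 5"]
adj = read_edgelist(edges)
main = main_component(adj)
sub = {v: adj[v] & main for v in main}  # subgraph induced by the main component
n_edges = sum(len(nbrs) for nbrs in sub.values()) // 2
print(len(main), n_edges)  # 3 3
```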
[Figure: Eigenvector centrality (y-axis) plotted against betweenness centrality (x-axis) for each actor, with points labeled by actor ID. The residuals, abs(res), are used to color and shape the points so we know who is who.]
Using the drug network data, we will now locate the key actors from the previous analysis within the network.
- We will use the same residual data from before to size the nodes and locate the key actors

First, however, we'll look at the network as a whole using igraph's Tcl/Tk interface.
    library(igraph)
    # Read the edge list and drop edge direction
    G <- as_undirected(read_graph("drug_main.txt", format="edgelist"))
    tkplot(G, layout=layout_with_fr)
    # This opens a new interactive Tk (X11) window with a plot of G
Network plot

    # Create positions for all of the nodes w/ a force-directed layout
    l <- layout_with_fr(G, niter=500)
    # Set the nodes' size relative to the residuals (res) from the previous analysis
    V(G)$size <- abs(res) * 10
    # Only display the labels of key players: set nodes[i] <- NA for actors
    # that are not key players (the selection rule is not shown here).
    # igraph vertex IDs are now 1-based, so the original "+1" is not needed
    nodes <- as.vector(V(G))
    # Save plot as PDF
    pdf('actor_plot.pdf', pointsize=7)
    plot(G, layout=l, vertex.label=nodes, vertex.label.dist=0.25,
         vertex.label.color='red', edge.width=1)
    dev.off()

[Figure: Network plot of the drug network's main component, with node size proportional to abs(res); labeled actors include 20, 28, 44, 47, 50, 53, 58, 79, 102, 141, and 155.]
Online Resources
igraph
- Network Analysis with igraph
  - An excellent resource for learning how to use igraph in R that also reviews many of the basic concepts of SNA
statnet
- Statnet Users Guide
  - This package combines functionality from several popular R packages for SNA, and the online users guide contains reference material for:
    - network: a package for managing relational data in R
    - ergm: a package to fit, simulate, and diagnose exponential-family models for networks
    - latentnet: a package for fitting latent cluster models for networks
    - sna: a package for social network analysis
    - dynamicnetwork and rSoNIA: prototype packages for managing and animating longitudinal network data
    - networksis: a package to simulate bipartite graphs with fixed marginals through sequential importance sampling
Helpful Experts
Several experts in both SNA in R and SNA more generally are active online and can be very helpful for those trying these methods for the first time.
- SNA in R experts
  - Nicole Radziwill - networks researcher
    Web: http://qualityandinnovation.com/
    Twitter: @nicoleradziwill
  - Michael Bommarito - PhD student in political science at U Michigan
    Web: http://computationallegalstudies.com/
    Twitter: @mjbommar
- Contact (Drew Conway)
  - Email: [email protected]
  - Web: http://www.drewconway.com/zia
  - Twitter: @drewconway