Social Network Analysis
Social Networks and Social Network Analysis (SNA)
• Social media platforms allow members and visitors to connect to each other, to brands, and to groups
• Such interconnections on these platforms give rise to networks that may consist of individuals, groups, or
brands/companies
• It is important for the marketer to study the network of connections in order to uncover the role of an
individual/brand page
• Social Network Analysis (SNA) is the process of investigating social structures through the use of network
and graph theories
• A typical network consists of nodes (represented through circles) and edges/connections (represented through
lines); the resulting diagram is technically referred to as a sociogram
• Sociograms were developed to analyze choices or preferences within a group and represent the structure and
patterns of group interactions through a diagram
• A sociogram can be drawn on the basis of many different criteria: friend requests, interactions with comments, memes
spread, information circulation, business networks, knowledge networks, and likes on posts, among many others
• These visualizations provide a means of qualitatively assessing networks by varying the visual representation of
their nodes and edges to reflect attributes of interest
• SNA can process a large amount of relational data and describe the
overall relational network structure
Traditional participation statistics can provide important insights into the volume of engagement of a community but
can say little about the structure of the connections between community members. Network analysis can help explain
important social phenomena such as group formation, group cohesion, social roles, personal influence, and overall
community health. Combining traditional participation metrics with network metrics provides the best of both worlds and
allows you to answer important questions such as the following:
• What kinds of social roles are being performed within a social media collection? Does a community have enough people
filling the important roles?
• Which individuals play important social roles within a group or collection? Who would make a good administrator based
on that person’s network position?
• What subgroups exist? Do connections between subgroups exist? Who plays the bridge roles that connect otherwise
unconnected groups?
• How do new ideas propagate through a network? Who are the influencers that spark the spread of ideas?
• How do the overall structures of a social network change after a particular event (e.g., a company social, a round of new
hires or layoffs, a product launch or recall)?
Applications of Social Networks
Figure: This network graph visualization paints a picture of the social relationships among the Twitter accounts of
members of the United States Senate in 2018
• Nodes/Vertices:
• Nodes are the entities in the network
• Often, they represent people or social structures such as workgroups, teams, organizations, institutions, states,
or even countries
• At other times they represent content such as web pages, keywords, or videos
• Edges:
• Edges are the connections between the nodes. Edges can be any relationship that the marketer wishes to
capture
• Edges can represent many different types of relationships like proximity, collaboration, kinship, friendship,
trade partnership, literature citation, investment, hyperlink, transaction, or any shared attribute (e.g., people
who attended the same University)
• An edge can be said to exist if it has some official status, is recognized by the participants, or is observed by
exchange or interaction between them
Directed and Undirected Networks
• Directed Networks:
• Directed edges (also known as asymmetric edges) have a clear origin and destination
• For example, money is lent from one person to another, a Twitter user follows another user, an email is
sent from an author to a recipient, or a web page links to another web page
• They are represented on a graph as a line with an arrow pointing from the source node to the recipient
node
• Undirected Networks:
• An undirected edge (also known as a symmetric or mutual edge) simply exists between two people or
things: a couple is married, two Facebook users are friends, or two people are members of the same
organization
• No origin or destination is clear in these mutual relationships
• They cannot exist unless they are reciprocated. Undirected edges are represented on a graph as a line
connecting two vertices with no arrows
• For example, the marketer may be interested in learning more about a network where specific individuals
ask questions to others, while a specific set of individuals answer these questions. The researcher can draw
up to two separate directed networks:
• One in which these individuals who pose questions have an arrow directed from them to the one to
whom the question is being asked
• Another in which the direction of the arrow points from the answer giver to the one who is asking
questions
• In an undirected network, only connections are captured and the direction is not important
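The difference between the two directed networks and the undirected one can be sketched in a few lines of Python. This is only an illustration: the names and the "asks a question to" ties are hypothetical, and `build_adjacency` is a helper invented for this example.

```python
from collections import defaultdict

def build_adjacency(edges, directed):
    """Return {node: set of neighbors} from a list of (source, target) pairs."""
    adj = defaultdict(set)
    for source, target in edges:
        adj[source].add(target)
        adj[target]  # touch the key so the target is listed even with no outgoing ties
        if not directed:
            adj[target].add(source)  # undirected: the tie counts in both directions
    return dict(adj)

# Hypothetical "asks a question to" ties (asker, person asked)
asks = [("Ann", "Bob"), ("Carol", "Bob"), ("Bob", "Dave")]

asks_net = build_adjacency(asks, directed=True)
# Reversing every arrow gives the second directed network (answer giver -> asker)
answers_net = build_adjacency([(t, s) for s, t in asks], directed=True)
# Ignoring direction keeps only the fact that a connection exists
undirected_net = build_adjacency(asks, directed=False)
```

In `asks_net`, Bob points only to Dave; in `undirected_net`, Bob is simply connected to Ann, Carol, and Dave.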
Unweighted and Weighted Networks
• Unweighted Networks:
• An unweighted edge, or binary edge, only indicates whether an edge exists or not
• For example, a friendship tie between Facebook users either exists or does not
• Weighted Networks:
• A weighted edge includes values associated with each edge that indicate the strength or frequency of a tie
• For example, a weighted edge between two Facebook users may indicate the number of photo comments
exchanged or the duration since the creation of a friendship
• Weighted edges are often represented visually as thicker or darker or as more or less opaque lines
• Including weighted edge data in a network dataset is preferable because this provides additional
information about each tie
However, many social network analysis metrics are designed for unweighted networks
Fortunately, any weighted network can be converted to an unweighted one by choosing a cutoff point. For example, an
unweighted edge could be shown between individuals who exchanged at least 10 email messages, with no edge between
people who exchanged fewer than 10 messages
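The cutoff conversion described above can be sketched as follows. The email counts are made up for illustration, and `binarize` is a hypothetical helper name.

```python
def binarize(weighted_edges, cutoff):
    """Keep an unweighted edge only where the weight is at least `cutoff`."""
    return {pair for pair, weight in weighted_edges.items() if weight >= cutoff}

# Hypothetical counts of email messages exchanged between pairs of people
emails = {("Ann", "Bob"): 25, ("Ann", "Carol"): 3, ("Bob", "Carol"): 10}

strong_ties = binarize(emails, cutoff=10)
```

With a cutoff of 10, Ann and Carol (3 messages) are no longer connected, while the other two pairs keep an unweighted edge.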
Shortest Path
• The shortest path between two people is called a “geodesic”
• The geodesic distance is the length of that shortest path, counted in edges
• If you think of the edges as roads and the vertices as houses, the geodesic distance would be the number of
roads someone must take to get from one house to another, assuming that the person is traveling on the
shortest path possible
• The maximum geodesic distance, or diameter of a network, is the largest geodesic distance in the network,
or the distance between the two vertices that are farthest from each other
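As a rough sketch, geodesic distances in an unweighted network can be found with a breadth-first search, and the diameter is the largest of them. The four "houses" below are hypothetical, and the sketch assumes the network is connected and undirected.

```python
from collections import deque

def geodesic_distances(adj, start):
    """Breadth-first search: number of edges on the shortest path from `start` to each node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:  # first visit is always along a shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(adj):
    """Largest geodesic distance in the network (assumes a connected network)."""
    return max(max(geodesic_distances(adj, node).values()) for node in adj)

# Hypothetical houses joined by roads in a line: A - B - C - D
houses = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

from_a = geodesic_distances(houses, "A")
widest = diameter(houses)
```

Here the two farthest houses are A and D, three roads apart, so the diameter is 3.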
Social Network Analysis: SNA Metrics
The table describes a directed network because not all connections are reciprocated. For example, Ann “points to” Bob
as shown in row 1, but Bob does not “point to” Ann as shown in row 2. If it were an undirected network, it would be a
symmetric matrix; if Ann points to Bob, then Bob must necessarily point to Ann
This network is a binary network because it only includes 1s and 0s, where a 1 indicates that there is a connection and a 0
indicates that there is no connection
Allowing additional values would create a weighted network. For example, the 1s could be replaced with the number of
email messages sent or phone calls made to the other person
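The properties discussed above (directed versus undirected, binary versus weighted) can be checked directly on a matrix. The small matrix below is a hypothetical stand-in, not the table from the slide; rows "point to" columns.

```python
names = ["Ann", "Bob", "Carol"]

# Hypothetical adjacency matrix: matrix[i][j] == 1 means names[i] points to names[j]
matrix = [
    [0, 1, 1],  # Ann points to Bob and Carol
    [0, 0, 1],  # Bob points to Carol, but not back to Ann
    [1, 0, 0],  # Carol points to Ann
]

def is_symmetric(m):
    """An undirected network has a symmetric matrix: every tie is reciprocated."""
    size = len(m)
    return all(m[i][j] == m[j][i] for i in range(size) for j in range(size))

def is_binary(m):
    """A binary (unweighted) network contains only 0s and 1s."""
    return all(value in (0, 1) for row in m for value in row)

directed = not is_symmetric(matrix)
binary = is_binary(matrix)
```

Replacing a 1 with, say, the number of messages sent would make `is_binary` return `False`, which corresponds to a weighted network.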
An alternative to the matrix data format that is a more efficient representation of a network is called an “edge list”
It is simply a list of all edges in the network. Individuals in the Vertex1 column “point to” those in the Vertex2 column
Unless data describing the value of each edge are provided in additional columns, the network is implied to be a binary one
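To show that the two formats carry the same information, here is a minimal sketch that turns an edge list into a binary matrix. The edge list is hypothetical; the tuple positions play the role of the Vertex1 and Vertex2 columns.

```python
def edge_list_to_matrix(edges):
    """Convert (vertex1, vertex2) pairs into a binary adjacency matrix."""
    names = sorted({vertex for edge in edges for vertex in edge})
    index = {name: position for position, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for vertex1, vertex2 in edges:
        matrix[index[vertex1]][index[vertex2]] = 1  # vertex1 points to vertex2
    return names, matrix

# Hypothetical edge list: only existing ties are stored, one per row
edges = [("Ann", "Bob"), ("Ann", "Carol"), ("Bob", "Carol"), ("Carol", "Ann")]

names, matrix = edge_list_to_matrix(edges)
```

The edge list stores four rows, whereas the matrix needs nine cells, most of them zeros; this gap grows quickly as networks get larger and sparser.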
Social Network Analysis: Network
Types of Networks
Social networks range in size from a handful of people to national and planetary populations. They also differ in
the types of vertices they include, the nature of the edges that connect them, and the ways in which they are
formed
It is often useful to consider social networks from an individual member’s point of view. Network analysts call
the individual that is the focus of attention “ego” and the people he or she is connected to “alters”
Some networks, called egocentric networks, only include individuals who are connected to a specified ego
For example, a network of your personal Facebook friends would be an egocentric network because you are, by
definition, connected to all other vertices
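A small sketch of extracting an egocentric network from a larger one, assuming an undirected network stored as a dictionary of neighbor sets. The names are hypothetical.

```python
def ego_network(adj, ego):
    """Keep the ego, its alters, and only the ties among those people."""
    members = {ego} | adj[ego]
    return {person: adj[person] & members for person in members}

# Hypothetical friendship network; Zed is a friend of Ann but not of "You"
friends = {
    "You": {"Ann", "Bob"},
    "Ann": {"You", "Bob", "Zed"},
    "Bob": {"You", "Ann"},
    "Zed": {"Ann"},
}

my_network = ego_network(friends, "You")
```

Zed drops out because he is not directly connected to the ego, while the tie between the two alters Ann and Bob is kept.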
• Unimodal, Multimodal, and Affiliation Networks
The standard networks are called unimodal networks because they include one type (i.e., mode) of vertex. They connect
users to users, or they connect documents to documents, but they don’t include both users and documents
However, networks can include different types of vertices, creating multimodal networks: for example, a network connecting
Marvel movies to the characters in those movies. Rich sets of intersecting networks often form in social media environments
composed of connections between people, photos, videos, messages, documents, groups, organizations, locations, and
services. In many cases, these multimodal networks have to be transformed into simpler unimodal networks to perform
meaningful network analysis, as most network metrics are designed for unimodal networks
A common type of multimodal network is a bimodal network with exactly two types of vertices. Data for these networks often
include individuals and some event, activity, or content with which they are affiliated, creating an affiliation network. For
example, an affiliation network may connect users with the wiki pages they have edited
People are affiliated with pages. In this network, no two users would directly connect to each other. Likewise, no two pages
would directly connect to each other. Pages only link to people (i.e., editors)
Bimodal affiliation networks can be transformed into two separate unimodal networks: a “user edits page” network can be
converted into a user-to-user network and a page to page network
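One common way to perform this transformation, sketched below, is to connect two users when they share at least one page and to use the number of shared pages as the edge weight. The editing data are hypothetical, and weighting by shared count is an assumption of this sketch rather than the only option.

```python
from collections import defaultdict
from itertools import combinations

def project(affiliations):
    """{member: set of items} -> {(member_a, member_b): number of shared items}."""
    weights = {}
    for a, b in combinations(sorted(affiliations), 2):
        shared = len(affiliations[a] & affiliations[b])
        if shared:
            weights[(a, b)] = shared
    return weights

def invert(affiliations):
    """Flip {user: pages} into {page: users} so pages can be projected too."""
    flipped = defaultdict(set)
    for member, items in affiliations.items():
        for item in items:
            flipped[item].add(member)
    return dict(flipped)

# Hypothetical "user edits page" affiliation network
edits = {"Ann": {"Cats", "Dogs"}, "Bob": {"Dogs"}, "Carol": {"Cats", "Dogs", "Fish"}}

user_to_user = project(edits)          # users linked by pages they both edited
page_to_page = project(invert(edits))  # pages linked by editors they share
```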
Social Network Analysis: SNA Metrics
• These quantitative network metrics allow analysts to systematically inspect the patterns of
connection within the social world, creating a basis on which to compare networks, track
changes in a network over time, and determine the relative position of individuals and
clusters within a network
• Centrality measures are a set of metrics that describe the position that an individual
occupies in a network. These metrics describe how a particular node can be said to be in
the ‘middle’ of the network. Leading measures of centrality include:
• Degree Centrality
• Betweenness Centrality
• Closeness Centrality
• Eigenvector/PageRank Centrality
For closeness centrality, it is more common to normalize the score so that it represents the average length of the shortest paths rather than their sum. This adjustment
allows comparisons of the closeness centrality of nodes of graphs of different sizes
Degree Centrality
Degree Centrality: It is a simple count of the total number of connections linked to a node.
In other words, it measures the number of neighbors of the node
• It can be thought of as a kind of popularity measure, but a crude one that does
not recognize a difference between quantity and quality. Degree centrality does
not differentiate between a link to the CEO of a big company and a link to its most
recent trainee hire
• For directed networks, where relationships have an origin and a destination, in-degree is the
number of directed edges that arrive at a node, and out-degree is the
number of directed edges that originate at a node
Normalized degree centrality, rather than returning a count, is the degree of the node divided by the total possible
number of edges that the node could have (N-1 in a network of N nodes)
For the case of the directed graph, in-degree and out-degree
would likely be treated separately
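A minimal sketch of both ideas: normalized degree centrality for an undirected network, and separate in-degree and out-degree counts for a directed one. The two small networks are hypothetical.

```python
def degree_centrality(adj):
    """Degree divided by the N-1 edges a node could possibly have."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

def in_out_degrees(edges):
    """Count incoming and outgoing edges separately for a directed edge list."""
    in_deg, out_deg = {}, {}
    for source, target in edges:
        out_deg[source] = out_deg.get(source, 0) + 1
        in_deg[target] = in_deg.get(target, 0) + 1
        # Make sure every node has an entry in both tables
        out_deg.setdefault(target, 0)
        in_deg.setdefault(source, 0)
    return in_deg, out_deg

# Hypothetical undirected star: A is connected to everyone else
star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
scores = degree_centrality(star)

# Hypothetical directed edges (source, target)
arrows = [("C", "A"), ("C", "B"), ("C", "D"), ("B", "C")]
in_deg, out_deg = in_out_degrees(arrows)
```

In the star, A reaches the maximum score of 1.0. In the directed example, C has in-degree 1 and out-degree 3, for a total degree of 4.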
• Betweenness centrality can be thought of as a kind of “bridge” score, a measure of how much removing a person
would disrupt the connections between other people in the network
• A “structural hole” is a term for recognizing a missing bridge. Wherever two or more groups
fail to connect, one can argue that there is a structural hole, a gap waiting to be filled
Social network analysis has many strategic applications for people in an organization to analyze their position and the
position of others. Managers and leaders can recognize gaps or disconnections within organizations and devote resources to
bridging the divide. People may be able to apply social network analysis to identify locations in which a gap exists and elect
to fill them, recognizing the value they can generate as brokers between two otherwise separate groups
Betweenness Centrality
$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
Here,
σst is used to indicate the total number of shortest paths from node s to node t
σst(v) is used to indicate the number of those paths that pass through v
For standardization, the summation is divided by (N-1)(N-2)/2 for undirected networks and (N-1)(N-2) for directed
networks
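The calculation can be sketched for a small undirected, unweighted network as follows. This is a straightforward illustration rather than an efficient implementation: it counts shortest paths with a breadth-first search from every node and then sums, for each pair (s, t), the fraction of shortest paths that pass through v. The five-node network is hypothetical.

```python
from collections import deque
from itertools import combinations

def count_shortest_paths(adj, source):
    """BFS returning (distance, number of shortest paths) from `source` to every node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]  # paths reaching neighbor through node
    return dist, sigma

def betweenness(adj, normalized=True):
    info = {s: count_shortest_paths(adj, s) for s in adj}
    n = len(adj)
    scores = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):  # each unordered pair once (undirected network)
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path when the two legs add up to the s-t distance
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                scores[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    if normalized:
        scale = (n - 1) * (n - 2) / 2
        scores = {v: score / scale for v, score in scores.items()}
    return scores

# Hypothetical network: triangle A-B-C with a tail C-D-E
net = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C", "E"}, "E": {"D"}}

raw = betweenness(net, normalized=False)
standardized = betweenness(net)
```

Node C sits on the shortest paths of four pairs and D on three, so they are the bridges of this network; A, B, and E score zero.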
Closeness centrality assumes that the node that is closest to all the other nodes is the most important. It is the ratio of the highest
possible degree for the node (N-1) to the sum of the lengths of the shortest paths to the other nodes.
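A minimal sketch of this ratio, assuming a connected, undirected, unweighted network so that every other node is reachable. The four-node line network is hypothetical.

```python
from collections import deque

def closeness_centrality(adj, node):
    """(N - 1) divided by the sum of shortest-path lengths from `node` to all others."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return (len(adj) - 1) / sum(dist.values())

# Hypothetical line network: A - B - C - D
line = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

middle = closeness_centrality(line, "B")  # distances 1, 1, 2 -> 3 / 4
end = closeness_centrality(line, "A")     # distances 1, 2, 3 -> 3 / 6
```

The node in the middle of the line scores higher than the node at the end because it is, on average, fewer steps away from everyone else.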
Intuitively, eigenvector centrality considers not just “how many people you know,” but also “who you know”
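The "who you know" idea can be sketched with a simple power iteration, in which each node's score is repeatedly replaced by the sum of its neighbors' scores. Two details are assumptions of this sketch: each node also keeps its own score at every step (a common shift that prevents oscillation without changing the ranking), and scores are rescaled to unit length. The network is hypothetical.

```python
import math

def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    scores = {node: 1.0 for node in adj}
    for _ in range(max_iter):
        # New score = own score + neighbors' scores, so well-connected neighbors count more
        new = {node: scores[node] + sum(scores[nb] for nb in adj[node]) for node in adj}
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {node: value / norm for node, value in new.items()}
        if max(abs(new[node] - scores[node]) for node in adj) < tol:
            return new
        scores = new
    return scores

# Hypothetical network: D and F each know exactly one person,
# but D's contact (A) is better connected than F's contact (E)
net = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "E"},
    "D": {"A"},
    "E": {"C", "F"},
    "F": {"E"},
}

scores = eigenvector_centrality(net)
```

D and F have the same degree (1), yet D receives the higher score because its single contact is itself more central.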
• There is a variety of software for undertaking social network analysis including free and
paid versions:
• Some of the leading SNA software include UCINET, Gephi, NodeXL, and Pajek
The most popular person should have the highest number of friends. Thus, degree centrality is the most
appropriate measure
Undirected Network: Centrality Example
Suppose the following network refers to the information flow network of an organization. If
you are interested in finding the section that can most frequently control information
flow in the network, which centrality measure is the most appropriate? Answer with
reasons why it is the most appropriate
To control information flow, a node should be between other nodes because the node can interrupt information flow
between them. Thus, betweenness centrality is the most appropriate measure
Undirected Network: Centrality Example
Suppose the following network refers to the information flow network of an
organization. Each node represents a section in the organization, and each edge represents a
possible information exchange between the sections at the ends. If you are interested in
finding the section that can most efficiently obtain information from every other
section, which centrality measure is the most appropriate? Answer with reasons why it is the
most appropriate
To obtain information, one should be near everyone. In this sense, the node in the nearest position on average can most
efficiently obtain information. Thus, closeness centrality is the most appropriate
Undirected Networks: Centrality Exercises
Calculate:
Degree Centrality
Betweenness Centrality
Closeness Centrality
Also, find the most important nodes based on different centrality measures
Directed Networks: Degree and Betweenness Centrality
Degrees: Node C has the highest degree centrality: degree 4 (in-degree 1, out-degree 3)
Node B: degree 2 (in-degree 1, out-degree 1)
Betweenness: Calculated in the same way as for an undirected network, except that the sum is divided by (N-1)(N-2) because the
direction is known and each ordered pair of nodes is counted
Directed Networks: Closeness Centrality
Calculate:
Degree Centrality
Betweenness Centrality
Closeness Centrality
Also, find the most important nodes based on different centrality measures
Social Network Analysis: SNA Metrics
• Community Detection:
• Community detection algorithms in SNA can allow an individual to be included in more than
one group, therefore giving a more realistic evaluation of the groups to which that person may
belong
• Visualizing and making sense of large networks can be challenging, particularly if they are
densely connected
• One strategy for understanding large networks is to filter out information. Many
criteria can be used to filter out vertices and/or edges. For example, vertices with low
centrality scores can be filtered out, leaving only those most important in the network.
Other data associated with vertices, such as age, country of origin, time zone, or the number
of Twitter followers, can be used to filter vertices
• Filtering can also be applied to edges. For example, if edges represent the number of email
messages exchanged between two people, a network of “strongly” connected individuals
may filter out those who have sent fewer than 10 messages to one another
Grouping and Filtering
• Density is the number of edges present divided by the number of edges possible. On a scale of 0 to 1, a 0 would mean
that there are no connections at all, and a 1 would indicate that all possible edges are present (a perfectly connected network)
• A high density value indicates that the nodes are more
cohesive. On the other hand, a low density value indicates a less connected (sparse) network.
In a dense network, information can flow more easily and quickly than in a sparse network
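As a quick worked sketch of the 0 to 1 density scale, assuming density is computed as edges present divided by edges possible: an undirected network of N nodes can hold N(N-1)/2 edges, and a directed one N(N-1). The function name and the example sizes are hypothetical.

```python
def density(num_nodes, num_edges, directed=False):
    """Share of all possible edges that are actually present."""
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible /= 2  # each undirected pair is counted once
    return num_edges / possible

fully_connected = density(4, 6)  # 4 nodes, all 6 possible undirected edges
half_connected = density(4, 3)   # same nodes, only 3 edges
no_ties = density(4, 0)
```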
• There are many ways to approach community detection in networks. The modularity
optimization method is one of the widely used methods. Modularity is a measure of the
extent to which like is connected to like in a network
Community Detection
• Many algorithms are used for community detection. However, the two widely used
algorithms are
• Girvan-Newman Algorithm
• Louvain Algorithm
• Girvan-Newman detects communities by progressively removing edges from the original graph. On the other hand,
Louvain simply compares node modularity and does not actually change the original graph’s setup
Girvan-Newman Algorithm
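Repeat these steps until no edges are left:
• Calculate the betweenness of every edge in the network
• Remove the edge (or edges) with the highest betweenness
• Recalculate the betweenness of the remaining edges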
• The Louvain method of community detection uses a greedy approach and initially assigns
each node to an individual community
Greedy Modularity Communities
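Greedy modularity maximization begins with each node in its own community and repeatedly joins the
pair of communities that leads to the largest increase in modularity until no further increase in modularity is possible (a
maximum)
$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$
Q = Modularity
m = Sum of the weights of all edges in the graph (if unweighted graph, then count of all edges in the graph)
A_ij = Weight of the edge between nodes i and j (0 if there is no edge)
k_i = Sum of the weights of the edges attached to node i (its degree in an unweighted graph)
δ(ci, cj) = 1 if both nodes i and j are in the same community, else 0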
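To make the quantities concrete, here is a minimal sketch that evaluates modularity for an unweighted, undirected network, assuming the standard definition Q = (1/2m) Σ over all pairs i, j of [A_ij - k_i k_j / 2m] δ(ci, cj), where A_ij is 1 if i and j are connected and k_i is the degree of node i. The network (two triangles joined by one edge) is hypothetical.

```python
def modularity(adj, community):
    """Q for an unweighted, undirected network given {node: community label}."""
    two_m = sum(len(neighbors) for neighbors in adj.values())  # every edge is counted twice
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(ci, cj) = 0
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Hypothetical network: triangle 1-2-3 and triangle 4-5-6 joined by the edge 3-4
net = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

two_groups = modularity(net, {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"})
one_group = modularity(net, {node: "all" for node in net})
```

Splitting the network into its two triangles gives Q = 5/14 (about 0.357), whereas putting everyone in a single community gives Q = 0, so the split is the better description of "like connected to like".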
Louvain Algorithm Steps
• Initially, assign each node to its own community, so that the number of communities equals the number of
nodes
• Then, in an iterative manner, move every node i into the community of a neighboring node j and
recalculate the modularity of the graph. If modularity improves compared with leaving node i
where it was, keep i in community j; otherwise, leave it where it was
• Repeat for multiple iterations until moving any node to a neighbor’s community yields no further gain in
modularity; at that point, a maximum of modularity has been
reached
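The steps above can be sketched as follows for an unweighted, undirected network. This is a deliberately simple and slow illustration: it recomputes modularity from scratch for every trial move, and it covers only the node-moving steps listed here (full Louvain implementations also merge each community into a single node and repeat). The modularity function assumes the standard definition, and the network is hypothetical.

```python
def modularity(adj, community):
    """Q for an unweighted, undirected network given {node: community label}."""
    two_m = sum(len(neighbors) for neighbors in adj.values())
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                q += (1 if j in adj[i] else 0) - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

def local_moving(adj):
    community = {node: node for node in adj}  # every node starts in its own community
    best_q = modularity(adj, community)
    improved = True
    while improved:  # repeat until no single move raises modularity
        improved = False
        for i in adj:
            original = community[i]
            best_target = original
            for j in sorted(adj[i]):
                if community[j] == original:
                    continue  # already in the same community
                community[i] = community[j]  # trial move into the neighbor's community
                q = modularity(adj, community)
                if q > best_q + 1e-12:
                    best_q, best_target = q, community[j]
            community[i] = best_target  # keep the best move, or put the node back
            if best_target != original:
                improved = True
    return community, best_q

# Hypothetical network: triangle 1-2-3 and triangle 4-5-6 joined by the edge 3-4
net = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

communities, q = local_moving(net)
```

On this small example the procedure ends with the two triangles as separate communities.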
Facebook Case Study
• This dataset consists of ‘circles’ (or ‘friends lists’) from Facebook. The dataset includes
node features (profiles), circles, and ego networks
• Facebook data has been anonymized by replacing the Facebook-internal ids for each user
with a new value. Also, while feature vectors from this dataset have been provided, the
interpretation of those features has been obscured. For instance, where the original dataset
may have contained a feature “political=Democratic Party”, the new data would simply
contain “political=anonymized feature 1”. Thus, using the anonymized data it is possible
to determine whether two users have the same political affiliations, but not what their
individual political affiliations represent
Summary
Centrality Measures
Degree Centrality
• Definition: Degree centrality assigns an importance score based simply on the number of
links held by each node
• What it tells us: How many direct, ‘one hop’ connections each node has to other nodes in
the network
• When to use it: For finding very connected individuals, popular individuals, individuals
who are likely to hold the most information, or individuals who can quickly connect with
the wider network
• A bit more detail: Degree centrality is the simplest measure of node connectivity.
Sometimes it’s useful to look at in-degree (number of inbound links) and out-degree
(number of outbound links) as distinct measures, for example when looking at transactional
data or account activity
Betweenness Centrality
• Definition: Betweenness centrality measures the number of times a node lies on the
shortest path between other nodes
• What it tells us: This measure shows which nodes are ‘bridges’ between nodes in a
network. It does this by identifying all the shortest paths and then counting how many
times each node falls on one
• When to use it: For finding the individuals who influence the flow around a system
• A bit more detail: Betweenness is useful for analyzing communication dynamics but
should be used with care. A high betweenness count could indicate someone holds
authority over disparate clusters in a network, or just that they are on the periphery of both
clusters
Closeness Centrality
• Definition: Closeness centrality scores each node based on its ‘closeness’ to all
other nodes in the network
• What it tells us: This measure calculates the shortest paths between all nodes,
then assigns each node a score based on its sum of shortest paths
• When to use it: For finding the individuals who are best placed to influence the
entire network most quickly
• A bit more detail: Closeness centrality can help find good ‘broadcasters’, but in
a highly-connected network, you will often find all nodes have a similar score.
What may be more useful is using Closeness to find influencers in a single cluster
Eigenvector Centrality
• Definition: Like degree centrality, Eigenvector Centrality measures a node’s influence based
on the number of links it has to other nodes in the network. Eigenvector Centrality then goes
a step further by also taking into account how well-connected a node is, and how many links
their connections have, and so on through the network
• What it tells us: By calculating the extended connections of a node, Eigenvector Centrality
can identify nodes with influence over the whole network, not just those directly connected to
it
• When to use it: Eigenvector Centrality is a good ‘all-round’ SNA score, handy for
understanding human social networks, but also for understanding networks like malware
propagation
PageRank
• What it tells us: This measure uncovers nodes whose influence extends beyond their direct
connections into the wider network
• When to use it: Because it takes into account direction and connection weight, PageRank
can be helpful for understanding citations and authority
• A bit more detail: PageRank is famously one of the ranking algorithms behind the original
Google search engine (the ‘Page’ part of its name comes from its creator and Google co-founder,
Larry Page)
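A minimal power-iteration sketch of PageRank on a directed network, to show how link direction shapes the scores. The damping factor of 0.85, the fixed number of iterations, and the even redistribution of rank from nodes with no outgoing links are conventional choices assumed here; the citation data are hypothetical, and link weights are left out for brevity.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """out_links: {node: list of nodes it links to}. Returns {node: rank}, summing to 1."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared equally among all nodes
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                # Each node passes its rank on in equal shares along its outgoing links
                new[target] += damping * rank[node] / len(out_links[node])
        rank = new
    return rank

# Hypothetical citations: A, B, and D all cite C; C cites only A
citations = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(citations)
top = max(ranks, key=ranks.get)
```

C ranks highest because three nodes point to it. A comes second despite having a single incoming link, because that link comes from the highly ranked C, while B and D receive no links at all.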