Sna It Unit5
Introduction
Social networks are built when actors belonging to different social groups are connected to each
other. Sociograms were first used to represent social networks and marked the beginning of the
formal study of social relations. Today, social network analysis is still an active topic that
attracts much attention, particularly for analyzing online social networks. Some important metrics
for network analysis are described below.
Graph Theory
Graph theory is the study of graphs, which are mathematical structures used to model pairwise
relations between objects. Many fundamental concepts and metrics in social network analysis are
derived from graph theory, because graph theory formally represents social networks with
structural properties. Some fundamental concepts related to graph theory are:
Node degree:
The degree of a node in a graph is the number of edges incident to the node. A loop (an edge from
a node to itself) is counted twice toward the degree of its node. The maximum number of unique
edges in a graph with N nodes is obtained when loops (and multiple edges) are excluded:
- Undirected graph: N(N-1)/2 edges
- Directed graph: N(N-1) edges
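Both rules above can be checked with a short Python sketch. It assumes the graph is given as a list of (u, v) edge pairs; the function names `degrees` and `max_edges` and the example edges are illustrative only.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected graph given as (u, v) pairs.

    A loop (u, u) contributes 2 to the degree of u.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # for a loop, u == v, so the same node is incremented twice
    return dict(deg)


def max_edges(n, directed=False):
    """Maximum number of unique edges among n nodes when loops are excluded."""
    return n * (n - 1) if directed else n * (n - 1) // 2


edges = [("A", "B"), ("B", "C"), ("C", "C")]  # hypothetical graph; ("C", "C") is a loop
print(degrees(edges))      # {'A': 1, 'B': 2, 'C': 3}
print(max_edges(4))        # 6
print(max_edges(4, True))  # 12
```

The degree sum (6) is twice the number of edges (3), which is why a loop has to count twice.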
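Density:
The density of a graph is the ratio of the number of edges to the maximum possible number of
edges. A dense graph is one in which the number of edges is close to this maximum.
- Undirected graph: D = 2E / (N(N-1)), where E is the number of edges and N the number of nodes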
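As a minimal sketch, the density formula can be computed directly from the node and edge counts; the function name `density` and the directed variant (E divided by N(N-1)) are illustrative assumptions.

```python
def density(num_nodes, num_edges, directed=False):
    """Ratio of existing edges to the maximum possible number of edges (no loops)."""
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2  # each undirected edge joins an unordered pair of nodes
    return num_edges / possible if possible else 0.0


print(density(4, 3))  # 0.5  (3 of the 6 possible edges)
print(density(4, 6))  # 1.0  (complete graph)
```

For an undirected graph, E / (N(N-1)/2) is the same as 2E / (N(N-1)).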
UNIT V
Path length:
The path length is the number of edges in the sequence that a walk follows. In a path, every node
and edge appears only once in the sequence. The distance between two nodes is the length of the
shortest path between them, and the average path length is the average of these distances over
all pairs of nodes in the network.
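In an unweighted graph, distances can be found with breadth-first search (BFS). The Python sketch below assumes the graph is a dict mapping each node to a list of neighbours; the function names and the small example graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """Mean distance over all unordered node pairs; assumes a connected undirected graph."""
    nodes = list(adj)
    total = pairs = 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            total += dist[v]  # raises KeyError if v is unreachable from u
            pairs += 1
    return total / pairs


# hypothetical chain A - B - C - D
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(bfs_distances(line, "A"))   # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
print(average_path_length(line))  # 10 / 6, about 1.67
```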
Component size:
To compute component size, the connected parts of the graph need to be identified first, since
the component size is the number of nodes in a connected part. An undirected graph is connected
if every pair of nodes is joined by a path, i.e. each node of a pair is reachable from the other.
On the other hand, if a graph is not connected, it can be partitioned into several connected
subgraphs (components), and the size of each component is the number of nodes it contains.
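The partition into components can be sketched in Python by running a breadth-first search from every node not yet visited. The adjacency-dict input format and the function name are assumptions for illustration.

```python
from collections import deque


def component_sizes(adj):
    """Sizes of the connected components of an undirected graph, largest first."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        # breadth-first search collects every node reachable from start
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"],  # component {A, B, C}
         "D": ["E"], "E": ["D"],                   # component {D, E}
         "F": []}                                  # isolated node
print(component_sizes(graph))  # [3, 2, 1]
```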
Centrality
Centrality is a measure indicating the importance of a node in the network. Centrality is thus
used to give a rough indication of the social power of a node based on how well it connects the
network. HITS and PageRank are the two best-known ranking methods based on centrality.
HITS identifies important nodes by calculating Authorities (based on in-degrees) and Hubs (based
on out-degrees).
PageRank scores a node by the scores of the nodes linking to it, each divided by the out-degree of
the linking node.
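The following Python sketch illustrates both ideas on a small directed graph given as a dict of out-links. The damping factor d = 0.85, the fixed iteration counts, and the even redistribution of rank from nodes without out-links are common conventions assumed here, not details taken from these notes; the example graph is hypothetical.

```python
import math


def pagerank(out_links, d=0.85, iters=100):
    """Power-iteration PageRank: each node splits its rank evenly among its out-links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = d * rank[u] / len(targets)  # divided by u's out-degree
                for v in targets:
                    new[v] += share
            else:  # node without out-links (assumed convention): spread rank over all nodes
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank


def hits(out_links, iters=50):
    """HITS: authorities are pointed to by good hubs; hubs point to good authorities."""
    nodes = list(out_links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iters):
        auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                auth[v] += hub[u]  # authority score grows with in-links from hubs
        hub = {u: sum(auth[v] for v in out_links[u]) for u in nodes}  # sum over out-links
        for scores in (auth, hub):  # normalize both vectors to unit length
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for key in scores:
                scores[key] /= norm
    return hub, auth


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical link graph
rank = pagerank(web)
hub, auth = hits(web)
print(max(rank, key=rank.get))  # C
print(max(auth, key=auth.get))  # C (pointed to by both A and B)
print(max(hub, key=hub.get))    # A (points to both B and C)
```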
The three most widely adopted methods to measure the centrality of a node in a social network are
listed below:
Degree
Betweenness
Closeness
Degree centrality:
Degree centrality is defined as the number of edges incident upon a node, and it is therefore
usually the first measure used to find the nodes with the greatest potential to influence others.
Under degree centrality, nodes that have direct connections to a large number of other nodes are
considered central. If the edges in a graph are directed, in-degree centrality is distinguished
from out-degree centrality.
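A minimal Python sketch of in- and out-degree centrality for a directed graph follows. Dividing by n - 1 (the largest degree possible without loops) is a common normalization assumed here; the function name and the example graph are hypothetical.

```python
from collections import Counter


def degree_centrality(nodes, edges):
    """In- and out-degree centrality of a directed graph, normalized by n - 1."""
    n = len(nodes)
    indeg = Counter(v for _, v in edges)   # edges arriving at each node
    outdeg = Counter(u for u, _ in edges)  # edges leaving each node
    return ({v: indeg[v] / (n - 1) for v in nodes},
            {v: outdeg[v] / (n - 1) for v in nodes})


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
in_c, out_c = degree_centrality(nodes, edges)
print(out_c["A"])  # 1.0: A links directly to every other node
print(in_c["C"])   # about 0.67: C is linked from two of the three other nodes
```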
Betweenness centrality:
Betweenness centrality is another key metric, computing the extent to which a node lies between
other nodes in the network, i.e. on the shortest paths that connect other pairs of nodes. It
measures the connectivity of the neighbors of a node and gives higher values to nodes that bridge
clusters. This measure also reflects the number of nodes that a node connects indirectly through
its direct links.
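Formally, the betweenness of v is the sum over pairs s != v != t of sigma_st(v) / sigma_st, where sigma_st is the number of shortest s-t paths and sigma_st(v) the number of those passing through v. The Python sketch below computes it with Brandes' algorithm, a standard method chosen here for illustration (these notes do not name an algorithm); it assumes an unweighted, undirected graph given as an adjacency dict and returns unnormalized scores.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality of an unweighted, undirected graph (Brandes' algorithm).

    Returns raw (unnormalized) scores: for every node v, the sum over pairs
    s != v != t of sigma_st(v) / sigma_st.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # 1. BFS from s, counting shortest paths (sigma) and recording predecessors
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        preds = {v: [] for v in adj}
        stack = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # 2. walk back from the farthest nodes, accumulating each node's dependency
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair {s, t} was counted once from s and once from t
    return {v: score / 2 for v, score in bc.items()}


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(line))  # {'A': 0.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}
```

In a "bow tie" of two triangles sharing one node, that shared node gets the only non-zero score, matching the idea that betweenness rewards nodes bridging clusters.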
Closeness centrality:
Closeness centrality takes into account how distant a node is from the other nodes in the
network. Nodes ranked high by closeness centrality can be seen as the nodes that are more likely
to act as information distributors in the social network.
A node is considered important if it is relatively close to all other nodes.
The farness of a node is the sum of its distances to all other nodes.
Closeness is the inverse of the farness:
$C_C(u) = \frac{1}{\sum_{v \neq u} d(u, v)}$
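The closeness formula translates into a short Python sketch: one breadth-first search per node gives its distances, their sum is the farness, and closeness is its inverse. It assumes a connected, unweighted, undirected graph stored as an adjacency dict; the function name and example graph are illustrative.

```python
from collections import deque


def closeness(adj):
    """C_C(u) = 1 / (sum of distances from u); assumes a connected undirected graph."""
    scores = {}
    for u in adj:
        # breadth-first search gives shortest-path distances from u
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for w in adj[x]:
                if w not in dist:
                    dist[w] = dist[x] + 1
                    queue.append(w)
        farness = sum(dist.values())  # dist[u] == 0, so this is the sum over v != u
        scores[u] = 1 / farness if farness else 0.0
    return scores


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(closeness(line))  # A and D: 1/6, B and C: 1/4
```

The inner nodes B and C score higher because they are closer, on average, to everyone else.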
Clustering
Also called a community, a cluster is a group of nodes having denser relations with each other
than with the rest of the network.
Clustering coefficient:
The clustering coefficient measures the degree to which the nodes of a graph tend to cluster
together. For a single node, it quantifies how close its neighbors are to forming a complete
graph. As nodes in real-world social networks tend to form groups with a relatively high density
of ties, the clustering coefficient is also used for small-world analysis.
Example:
Global clustering coefficient = 3 × TC / CT, where TC is the number of triangles in the graph and
CT is the number of connected triples (pairs of edges sharing a node).
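The 3 × TC / CT formula and the per-node version can both be sketched in Python, assuming an undirected graph stored as a dict of neighbour sets. Each triangle is found once from each of its three corners, which supplies the factor 3 automatically. Function names and the example graph are hypothetical.

```python
from itertools import combinations


def global_clustering(adj):
    """3 * TC / CT for an undirected graph given as {node: set_of_neighbours}."""
    closed = 0   # closed triples; equals 3 * TC, since each triangle is seen at 3 corners
    triples = 0  # CT: connected triples, i.e. pairs of edges sharing a centre node
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0


def local_clustering(adj, v):
    """Fraction of the possible links among v's neighbours that actually exist."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# triangle A-B-C plus a pendant node D attached to C
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(global_clustering(g))      # 0.6  (3 * 1 triangle / 5 connected triples)
print(local_clustering(g, "C"))  # about 0.33 (only A-B exists among C's neighbours)
```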
Node-Edge Diagrams
A node-edge diagram is an intuitive way to visualize social networks. With node-edge
visualization, many network analysis tasks, such as component size calculation, centrality
analysis, and pattern sketching, can be presented in a more straightforward manner.
Different layouts have their own pros and cons for displaying the network graph, depending on the
size, complexity, and structure of the social network. There are three kinds of layouts:
Random layout
Force-directed layout
Tree layout
Random Layout
Random layout places the nodes at random geometric locations in the graph.
It may not yield very clear visualization results, particularly when the number of nodes
grows very large, e.g. to thousands of nodes or more.
Since a random layout algorithm can draw the social network graph in linear
time, O(N), it can sometimes be used to visualize very large network graphs.
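A random layout is just one pass over the nodes, which is where the O(N) cost comes from. The Python sketch below assumes a unit drawing area by default and an optional seed for reproducibility; both are illustrative choices.

```python
import random


def random_layout(nodes, width=1.0, height=1.0, seed=None):
    """Place every node at a uniformly random point: one pass over the nodes, O(N)."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    return {v: (rng.uniform(0, width), rng.uniform(0, height)) for v in nodes}


layout = random_layout(range(10_000), seed=42)
print(len(layout))  # 10000
```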
Force-Directed Layout
Also known as a spring layout, this method simulates the graph as a virtual physical system.
The edges act as springs and the nodes act as repelling objects.
Attractive and repulsive forces (analogous to gravitational attraction and magnetic repulsion)
act between the nodes of the graph.
Generally, an initial random layout is generated first, and then the force-directed
algorithm runs iteratively to adjust the positions of the nodes until the attractive and
repulsive forces balance out and the layout converges.
Since a force-directed layout may take hundreds of iterations to obtain a stable layout, the
running time is at least O(N log N) or O(E), where N is the number of nodes and E is the
number of edges.
The running cost of a force-directed layout is much higher than that of a random layout,
especially when the number of nodes is large. It is therefore not suitable for graphs with more
than a few hundred nodes.
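The steps above (random start, repulsion between all nodes, springs along edges, repeated small moves) can be sketched in Python. The force formulas k²/d for repulsion and d²/k for attraction follow the Fruchterman-Reingold style and are an assumption, as are the parameter names `k` (ideal edge length) and `step` (initial move limit, which "cools" linearly to zero). This naive version costs O(N²) per iteration.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, k=0.1, step=0.01, seed=0):
    """Spring-embedder sketch using Fruchterman-Reingold-style forces.

    k    -- hypothetical ideal edge length (attraction and repulsion balance at d == k)
    step -- initial cap on how far a node may move per iteration ("temperature")
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    # 1. start from a random layout
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    for it in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # 2. every pair of nodes repels with force k^2 / d (O(N^2) per iteration)
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # 3. each edge acts as a spring pulling its endpoints together with force d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # 4. move each node along its net force, capped by a temperature that cools to 0
        temp = step * (1 - it / iterations)
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                pos[v][0] += dx / d * min(d, temp)
                pos[v][1] += dy / d * min(d, temp)
    return {v: tuple(p) for v, p in pos.items()}


pos = force_directed_layout(["A", "B", "C"], [("A", "B"), ("B", "C")])
```

The cooling temperature keeps the iterations from oscillating forever, so connected nodes settle near the ideal distance k.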
Tree Layout
A basic tree layout chooses a node as the root of the tree, and the nodes connected to the root
become children of the root node.
Nodes two levels away from the root become the grandchildren of the root, and so on.
It can display a more structured layout than general graph layouts by considering more contextual
information.
Tree layouts were proposed to improve the visualization.
Tree visualizations use the idea of focus + context, together with animation techniques, to
improve the visualization and help users obtain both global and local views of a social network
in a 2D display.
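The basic root/children/grandchildren placement can be sketched with a breadth-first search: each node's depth becomes its y coordinate, and the nodes of each level are spread evenly along x. This is a naive level-by-level placement assumed for illustration, not a published tree-drawing algorithm such as Reingold-Tilford; the names and the example tree are hypothetical.

```python
from collections import deque


def tree_layout(adj, root):
    """Assign each node an (x, y) position: y is the BFS level (0 = root),
    x spreads the nodes of each level evenly across the interval (0, 1)."""
    level = {root: 0}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in level:  # first visit: w is a child of u
                level[w] = level[u] + 1
                order.append(w)
                queue.append(w)
    by_level = {}
    for v in order:
        by_level.setdefault(level[v], []).append(v)
    pos = {}
    for depth, members in by_level.items():
        for i, v in enumerate(members):
            pos[v] = ((i + 1) / (len(members) + 1), depth)
    return pos


tree = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
print(tree_layout(tree, "A")["D"])  # (0.5, 2): D is a grandchild of the root A
```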
Matrix Representations
The matrix-based representation of graphs offers an alternative to traditional node-edge
diagrams and can help minimize the occlusion problems caused by them.
With a matrix-based representation, clusters and associations among the nodes can also be
better discovered when the number of nodes increases. In particular, when the relationships are
complex, a matrix-based representation can outperform a node-edge diagram in readability, since
the high connectivity of a node-edge representation easily diffuses the viewer's focus.
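A minimal Python sketch of the matrix view: rows and columns are nodes, and a marked cell is an edge. The example (two hypothetical communities joined by one edge) shows why row/column order matters: the same graph looks scattered in one order and shows two dense blocks in another. The function names and text rendering are illustrative assumptions.

```python
def adjacency_matrix(order, edges):
    """0/1 adjacency matrix of an undirected graph with rows/columns in the given order."""
    index = {v: i for i, v in enumerate(order)}
    m = [[0] * len(order) for _ in order]
    for u, v in edges:
        m[index[u]][index[v]] = m[index[v]][index[u]] = 1
    return m


def render(order, m):
    """Text rendering: '#' marks a connection, '.' marks none."""
    lines = ["  " + " ".join(order)]
    for v, row in zip(order, m):
        lines.append(v + " " + " ".join("#" if x else "." for x in row))
    return "\n".join(lines)


# two hypothetical communities, {A, B, C} and {D, E, F}, joined by the edge C-D
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("D", "F"), ("E", "F"), ("C", "D")]
print(render(list("ADBECF"), adjacency_matrix(list("ADBECF"), edges)))  # scattered cells
print(render(list("ABCDEF"), adjacency_matrix(list("ABCDEF"), edges)))  # two dense blocks
```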
Web Communities
Different social network services were created on the Web to help people maintain their
social relationships.
The SixDegrees.com website was an early representative created on the basis of the Web
interaction model between 1997 and 2001.
o Various social network websites and Web-based dating services have been
established to provide people with more convenient ways to build up their social
relationships and communities. In addition, many social network websites are
developed with interactive visualization interfaces to help people connect with
their communities and maintain social relationships.
In 2003, Club Nexus was established based on the friendship network data of Stanford
students and allowed them to explicitly list their friends in their profiles.
o For example, students registered on Club Nexus had profiles listing their year in school,
major, residence, gender, personality, hobbies and interests, to facilitate interaction within
their online social networks.
In addition to listing actors with their profiles for social network analysis, a modern
social network visualization, Vizster, was developed in 2005 with customized techniques to
visualize social relationships and community structures.
o Vizster was developed based on node-edge network layouts for exploring
connectivity in large graph structures, supporting visual search and analysis, and
automatically identifying and visualizing community structures.
The visualization techniques are mainly introduced to deal with complex social
relationships based on human-centric or user-centric views. With the development of the
Semantic Web, a project called FOAF (Friend-of-a-friend) was proposed to visualize
such human-centric social relationships based on Semantic Web social metadata.
With the XML/RDF format, FOAF relations can be explicitly defined for further social
network analysis and visualization.
Microsoft Research Asia proposed a novel object-level search service, called Entity
Cube, to help people discover real-world entities, such as people, locations, and
organizations, and explore their social relationships.
Web entities, including those with only a modest Web presence, are summarized from billions of Web pages.
Email Groups
Email service is one of the most popular applications that people use to connect with each
other and deliver messages in their daily lives. Personal online social networks are thus
constructed through people's daily social interactions.
In 2004, Soylent was developed to study the social patterns and the temporal rhythms of
daily email activities. Through the Soylent visualization, mutual interactions between
different users and groups, and their everyday collaboration activities can be clearly
displayed.
Two visual metaphors, Social Network Fragments (SNF) and PostHistory, were
employed to visualize the two major dimensions of email activities: people and time.
o In SNF different colors are utilized to indicate people from different contexts of
ego’s social life.
o PostHistory presents social network visualization with a calendar panel on the
left and a contacts panel on the right. The email exchange activities with time
progress can thus be visualized in an interactive calendar-like interface.
In 2006, an improved calendar-like visualization interface, called Themail, was
developed to help analyze email-based social networks with a chronological sequence
and the corresponding topics.
o Through the analysis of email content, the social relations and mutual interactions
between a user and her contacts can be clearly presented in Themail.
Digital Libraries
In digital libraries, social networks can be mainly analyzed from two aspects: authors and
writings.
Co-Authorship Networks
On the aspect of authors, co-authorships can be mined from existing publications and used to
build co-authorship networks.
From the visualization of the co-authorship networks illustrated above, a small world graph can
be drawn with the connections of authors from different places in the world.
Other characteristics of social network analysis, such as higher clustering coefficient and longer
path length, also indicate that co-authors of one author are more likely to publish together in the
JCDL (Joint Conference on Digital Libraries) community, and authors from different groups are
not as well connected as those in other co-authorship communities.
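Mining co-authorships from publications can be sketched in a few lines of Python. The sketch assumes each publication is given as a plain list of author names (the paper data and the function name are hypothetical) and builds a weighted co-authorship network in which an edge weight counts the papers two authors wrote together.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_network(papers):
    """Build a weighted co-authorship network from a list of author lists.

    weight[a][b] is the number of publications that a and b wrote together.
    """
    weight = defaultdict(lambda: defaultdict(int))
    for authors in papers:
        # every pair of authors on the same paper becomes (or strengthens) an edge
        for a, b in combinations(sorted(set(authors)), 2):
            weight[a][b] += 1
            weight[b][a] += 1
    return {a: dict(nbrs) for a, nbrs in weight.items()}


papers = [["Ana", "Ben"], ["Ana", "Ben", "Cai"], ["Cai", "Dev"]]  # hypothetical data
net = coauthorship_network(papers)
print(net["Ana"])  # {'Ben': 2, 'Cai': 1}
```

The resulting network can then be fed to the clustering-coefficient and path-length measures described earlier.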
With the matrix representation, the interlacing problem of the node-edge representation caused by
a large number of nodes and complex relations can be effectively reduced.
Co-Citation Relations
Social networks in digital libraries can be discovered from the citations and co-citations among
writings themselves.
In 2006, a novel visualization tool, called Circle View, was proposed to visualize academic
citation relations with an interactive design and highlighting colors.
In 2007, an interactive visualization tool was developed to present large co-citation networks
with latent visual cues and to allow direct interaction with the visualized graphs.
In 2009, an innovative visualization technique, called FP-tree, was developed to present the
co-citation network from a new perspective, namely, visualizing social networks based on a
paper-reference matrix instead of a reference-reference matrix.
The paper-reference matrix was transformed into an FP-tree visualization to analyze the
intellectual structure of two domains: Information Visualization and the Sloan Digital Sky Survey
(SDSS).
Although the FP-tree visualization helps users analyze the intellectual structure, it also
causes the same reference to appear in multiple places, making the tree structure several orders
of magnitude larger than the co-citation network.
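The link between a paper-reference matrix and co-citation can be sketched in Python: two references are co-cited once for every paper that cites both, which equals the off-diagonal entries of R^T R when R is the 0/1 paper-reference matrix. The input format (a dict from citing paper to its reference list), the function name, and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(paper_refs):
    """paper_refs maps each citing paper to the references it cites.

    Two references are co-cited once for every paper that cites both of them.
    """
    counts = Counter()
    for refs in paper_refs.values():
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


papers = {  # hypothetical citing papers and their reference lists
    "P1": ["R1", "R2", "R3"],
    "P2": ["R1", "R2"],
    "P3": ["R2", "R3"],
}
cc = cocitation_counts(papers)
print(cc[("R1", "R2")], cc[("R2", "R3")], cc[("R1", "R3")])  # 2 2 1
```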
Explore Interactively
Both the matrix and node-link representations support the analysis of the network at different
levels of detail. For instance, if an analyst is looking for an overview of the network to identify
its main communities, the matrix is the best option. Then, when a more detailed analysis is
required, for example to identify actors bridging two communities, node-link diagrams
constitute a better alternative. With Matrix Explorer, we provide multiple views of the network
and a number of tools to interactively manipulate matrix and node-link representations.
Selecting a visual pattern in the matrix and visualizing its equivalent in the node-link
diagram also eases the understanding and learning of matrix representations, making them
accessible to less expert users.
The tools available for manipulating matrix and node-link representations are listed below:
Interactive specification of visual attributes
Interactive layout and reordering
Automatic layout and reordering techniques
Computer-assisted layout and reordering techniques
Interactive filtering
Interactive clustering
Overview+Detail techniques to navigate in both representations
Present Findings
While matrix representations may prove effective when exploring large networks, node-link
diagrams are essential to communicate findings to a wide audience. Node-link diagrams may be
created for presenting results with different filters and possibly different aggregations. To ease
this process, Matrix Explorer allows users to generate pictures while performing the exploration.
Hybrid Representations
Providing both matrix and node-link diagrams to the user has a number of advantages but also
drawbacks:
It requires a large amount of display space.
At least two display monitors are required to comfortably use Matrix Explorer.
Switching from one representation to the other may induce a high cognitive load on the
user.
Two hybrid representations were developed, namely:
o MatLink
o NodeTrix
Augmenting Matrices
The principle of MatLink is to augment a standard matrix representation with links on its
borders. These links provide a dual encoding of the connections between actors. Two types of
links are added to the representation:
static links (in white on the figure) and
interactive links (in a darker shade).
When a row or column is selected, the interactive links show a shortest path to any other row or
column placed under the cursor.
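The query behind MatLink's interactive links (a shortest path from the selected actor to the actor under the cursor) can be illustrated with breadth-first search in Python. This sketches the underlying graph computation under the assumption of an unweighted network stored as an adjacency dict; it is not MatLink's actual implementation, and the names are hypothetical.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Return one shortest path from source to target as a list of nodes,
    or None if target is unreachable (unweighted graph, BFS)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            # follow parent pointers back to the source, then reverse
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None


net = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["A", "D"],
       "F": []}
print(shortest_path(net, "A", "D"))  # ['A', 'E', 'D']
```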
Assessing the Readability of MatLink
MatLink was evaluated on specific tasks of social network analysis: find a cut point, find
a clique and find communities (strongly connected groups).
On these tasks, MatLink significantly improves on standard matrix representations.
The only task for which node-link diagrams still perform better is the identification of
cut points. With MatLink, this task requires identifying specific visual patterns of the
links.
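A cut point (articulation point) is a node whose removal disconnects part of the network. Computationally, cut points can be found with the classic Hopcroft-Tarjan depth-first search, sketched below in Python; the algorithm choice, adjacency-dict input, and example graph are assumptions made for illustration. The recursion means very large graphs may exceed Python's default recursion limit.

```python
def cut_points(adj):
    """Articulation points of an undirected graph (Hopcroft-Tarjan depth-first search).

    Removing any returned node disconnects some part of the network.
    """
    disc, low = {}, {}  # discovery time and lowest reachable discovery time
    result = set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[u]:
            if w not in disc:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # no back edge from w's subtree climbs above u, so u separates it
                if parent is not None and low[w] >= disc[u]:
                    result.add(u)
            elif w != parent:
                low[u] = min(low[u], disc[w])
        # a DFS root is a cut point only if it has two or more DFS children
        if parent is None and children > 1:
            result.add(u)

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return result


# two triangles A-B-C and C-D-E sharing node C (a "bow tie")
bow_tie = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D", "E"],
           "D": ["C", "E"], "E": ["C", "D"]}
print(cut_points(bow_tie))  # {'C'}
```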
Interactive Exploration
NodeTrix developed a number of interactions based on the traditional drag-and-drop of objects with
the mouse cursor to ease the creation, exploration and editing of matrices.
Matrix representations have the advantage of placing the actors of the network linearly (in rows
and in columns), which makes it easy to identify the community members connected to external
actors. To add or remove actors from a matrix, users simply select the node or row/column
representing an actor and drag it into or out of the matrix. Other interactions include the
possibility to merge two matrices or to split them to get back to the original node-link
representation.
Drawback:
An actor cannot be placed in two different communities.
Presenting Findings:
NodeTrix can be used for both exploration and communication, because matrices can be
expanded to show detailed information on actors, while the connections between them show
higher-level connection patterns.
Organizational Issues
Organizational Behavior is the study and application of knowledge about how people,
individuals, and groups act within an organization. In any organization, cooperation and
information sharing among the workers are very important for the success of the organization.
SNA can also be used to identify the key or central persons of an organization, which helps
to identify the important go-to people in the organization.
Team Formation
For the success of any project, forming the right team is a crucial issue which requires
careful analysis of the available human resources of an organization.
In larger organizations, it is common for two individuals to work on similar projects without
realizing it. Using SNA, it is possible to form teams of individuals having similar skills and
interests.
Apart from the skills of team members, trust is a central influence on the performance of project
teams, both tangibly and intangibly.
The team's cohesiveness is also a key factor affecting the project's success.
Multi-disciplinary projects benefit from an appropriate multi-agency team in terms of
better performance and results.
Identifying bottlenecks
A team may not function as expected even with the right members and the right information
resources. Bottlenecks such as an uneven distribution of workload and resources may
affect decision making and information sharing.
Social network analysis can identify such bottlenecks in a team. The team can then address these
issues and plan ways to improve efficiency and unblock the flow of resources in the network.
Hidden barriers
Hidden barriers arise because of differences in race, religion, caste, age, gender, professional
or educational background, department, etc.
Social network analysis has also been used as a tool to identify such hidden barriers, to
understand their effects, and to help people plan simple, targeted
interventions.
Recommendation Systems
Most e-commerce sites, such as Amazon and eBay, have their own
recommendation systems for recommending customized products to customers, and also
try to improve the targeted marketing of products.
Various social network analysis techniques are applied to such information systems to
retrieve user interest patterns and to find other users with the same likes or dislikes.
These recommendation systems collect a database of users and purchased items for further
analysis, which is mostly done using various data mining techniques.
Social network analysis in recommendation systems helps to enhance sales by
converting browsers into buyers. These websites also act as recommender agents that
learn about customers, obtain their preferences and provide items of interest to them.
SNA makes use of various metrics, such as centrality, cohesiveness and vertex degree,
each of which may have a different meaning in recommendation system analysis.
o A node with high centrality has a high impact on other nodes.
o Vertex similarity may be used as a metric to search for individuals
having the same interests or preferences.
o The cohesiveness property of a network defines a group of nodes bound to
each other by some relation, which may share common characteristics.
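Vertex similarity can be sketched in Python with the Jaccard coefficient on the sets of items two users bought; Jaccard is one common choice assumed here (the notes do not name a specific similarity measure). The recommendation rule (suggest what the most similar user bought), the function names, and the purchase data are all hypothetical.

```python
def jaccard(a, b):
    """Vertex similarity of two users: shared items / all items of either user."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user):
    """Suggest items bought by the most similar other user but not yet by `user`."""
    others = [u for u in purchases if u != user]
    peer = max(others, key=lambda u: jaccard(purchases[user], purchases[u]))
    return sorted(set(purchases[peer]) - set(purchases[user]))


purchases = {  # hypothetical user -> items data
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen", "mug"},
    "carol": {"tv"},
}
print(jaccard(purchases["alice"], purchases["bob"]))  # 0.5
print(recommend(purchases, "alice"))                  # ['mug']
```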
Covert Networks
Covert networks are hidden: the actors of such networks do not disclose their information to
the external world. Covert groups have a cellular network structure, which is different from that
of hierarchical organizations. Terrorist and criminal networks are good examples of such
networks.
SNA has been successfully applied in such domains to understand covert cell operations and
their organization. SNA is applied to terrorism databases for predicting nodes and links,
discovering interesting patterns and identifying the actors involved in an event.
Another vital application of SNA to terrorism databases is predicting terrorist activities. SNA
tools have been used to identify these organizational structures and provide critical information
for terrorist detection and terrorism prediction.
SNA techniques applied to terrorist networks vary from basic measures to complex graph
algorithms and data mining techniques. SNA treats terrorist network analysis as a problem
of connecting dots: connecting multiple pairs of dots exposes the whole network. Centrality is the
most important and widely used measure in SNA for identifying key players in a terrorist
network.
To facilitate this, the regular day-to-day activities of the key players are monitored. Hidden
actors are discovered by monitoring the contacts, and the extent of contacts, of known terrorists
with other people.
With advanced graph-theoretic and link analysis techniques, SNA is applied to terrorist
networks to support the prosecution of criminal activities.
Web Applications
The Web being a wealth of information, SNA finds many applications in this domain. The Web is
used by different communities for various purposes, such as academic improvement, knowledge
sharing, interest sharing, communication and profiling, research, business, etc. Hence,
different techniques are required to improve and optimize the usability of the Web.
Researchers have also studied the network of the World Wide Web as a social
network. This helps to understand how social relations evolve with respect to the contents of
the Web.
SNA models the Web as a graph in which web pages are represented as nodes and hyperlinks as edges.
Node-similarity-based SNA techniques are employed to classify the Web based on its usage and
contents, in order to understand the scope and density of domains.
SNA is also used in search engines such as Google to enhance keyword search quality. Google
uses PageRank as a measure of popularity, which is obtained by simulating a random walk on the
network of web pages and computing the prestige of web pages.
Community Welfare
SNA techniques are not limited to scientific and research areas; they are also used to improve
community welfare. SNA is used to analyze different types of relations, such as
communication patterns, physical contacts, sexual relationships, etc. SNA may reveal the
patterns of human contact that can lead to the spread of diseases such as HIV in a population.
Another interesting application is to use SNA to examine and observe farm animal networks to
identify patterns of disease spread from one animal to another.
Mass surveillance is one of the modern practices undertaken by some organizations and
governments to monitor the behavior of suspected members of the population. This is done with
the purpose of protecting people from criminals, terrorists or political subversives and
maintaining social control.
Social Networks which are made for strengthening community resilience against disasters
(natural or human-made) can reveal vulnerabilities within a network.
Collaboration Networks
A collaboration network consists of groups of persons working together to perform a particular
activity, and studying human collaboration is an important topic in sociology. The collaboration
networks most widely studied by researchers in the context of social networks are scientific
co-authorship networks and movie actor collaboration networks.
The co-authorship network is analyzed by various researchers to study dynamics in patterns of
interactions between educational entities or communities. Co-authorship network analysis
also helps to study and understand interdisciplinary research, which is a key factor for
innovation.
Examples of co-authorship networks are Wikipedia article authors, the network of the Pacific Asia
Conference on Information Systems (PACIS), the network of the European Conference on Information
Systems (ECIS), etc.
The datasets required for co-authorship network analysis are mostly extracted from sources
including scientific journals, bibliographic records and digital libraries.
Another type of collaboration network is the knowledge collaboration network. Information
about Open Source Software needs to be distributed among the community of users, because not
all members have the knowledge or skills required to use and develop such software. Hence,
the success of such software depends highly on the distribution of knowledge using tools such as
emails, discussion forums, web blogs, etc.
Co-Citation Networks
Co-citation is used as a measure of similarity between two objects: two documents (or authors)
are co-cited when a third document cites both. Co-citation analysis helps to understand the
status and structure of scientific research. The two basic approaches of co-citation are
author co-citation and
document co-citation.
The basic application of co-citation analysis is to study scientific communication. There are
different examples of co-citation analysis. In the field of methodological evaluation, co-citation
analysis has been employed to search for invisible colleges. This reveals the research network
consisting of different institutions informally linked to each other through references to each
other's documents/papers, which can be used to find groups of institutes with similar ongoing
research.