Graph Storage Formats and Visualization

The document discusses various graph storage formats, including adjacency lists, matrices, and edge lists, highlighting their characteristics, advantages, and limitations. It also covers graph visualizations, such as node-edge diagrams and matrix representations, which are used to analyze and present social media graphs. Additionally, it explores applications of these graphs in social networks, web communities, and digital libraries.

Uploaded by

rakshithavasan22

Social Media Graphs, Graph Storage Formats and Visualization
Unit V
Agenda
• Graph Storage Formats
• Graph Visualizations
• Social Media Graphs
GRAPH STORAGE FORMATS
Graph Storage formats
• Adjacency List
• Adjacency Matrix
• Edge List
• Compressed Sparse Row (CSR)
• Compressed Sparse Column (CSC)
• Edge Dictionary
• Adjacency List of Edge Attributes (ALEA)
• Graph Database Formats
Adjacency List
• An adjacency list represents a graph as a collection of lists or
arrays, where each node in the graph has an associated list
that stores its adjacent (neighbor) nodes.
• Representation in Memory:
– Directed Graph: For a directed edge from node A to node B, B is
stored in A’s adjacency list but not vice versa.
– Undirected Graph: Each edge between A and B is stored in both A’s
and B’s adjacency lists.
• Data Structure Options:
o In languages like Python, adjacency lists are often implemented as
a dictionary of lists, where each key is a node and the value is a list
of its neighbors. For example:
graph = {
'A': ['B', 'C'],
'B': ['A', 'D'],
'C': ['A', 'D'],
'D': ['B', 'C']
}
o In languages like C++, adjacency lists are commonly represented as
an array of vectors.
• Weighted Graphs:
o For graphs with edge weights, each adjacent node in the list is
paired with the weight of the edge. For instance:
graph = {
'A': [('B', 3), ('C', 5)],
'B': [('A', 3), ('D', 2)],
'C': [('A', 5), ('D', 4)],
'D': [('B', 2), ('C', 4)]
}
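The directed and undirected storage rules above can be sketched as a small builder. This is only an illustration: the function name `build_adjacency_list` and the `(u, v, weight)` input format are assumptions, not part of the slides.

```python
def build_adjacency_list(edges, directed=False):
    """Build a dict-of-lists adjacency list from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        # The edge u -> v is always stored in u's list.
        graph.setdefault(u, []).append((v, w))
        # Make sure v appears as a key even if it has no outgoing edges.
        neighbors_of_v = graph.setdefault(v, [])
        if not directed:
            # Undirected: the same edge is also stored in v's list.
            neighbors_of_v.append((u, w))
    return graph


edges = [('A', 'B', 3), ('A', 'C', 5), ('B', 'D', 2), ('C', 'D', 4)]
undirected = build_adjacency_list(edges)
directed = build_adjacency_list(edges, directed=True)
```

With `directed=False` the result matches the weighted dictionary shown above; with `directed=True` node 'D' ends up with an empty list because it has no outgoing edges.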
Key Characteristics of Adjacency Lists

Space Complexity: O(V+E), where V is the number of vertices (nodes) and E is the
number of edges.
o This makes adjacency lists very space-efficient for sparse graphs, where E is
much smaller than V^2.
Time Complexity:
o Neighbor Lookup: Checking for a specific neighbor is O(degree of node), since
the node's list has to be scanned.
o Edge Insertion: O(1) for adding an edge in either an undirected or a directed
graph (one or two list appends, without checking for duplicates).
o Edge Deletion: O(degree of node) to locate and remove a specific edge.
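The costs listed above can be seen directly in a minimal sketch of the three operations on an undirected dictionary-of-lists graph. The helper names `has_edge`, `add_edge`, and `remove_edge` are chosen here for illustration.

```python
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C'],
}


def has_edge(graph, u, v):
    # Linear scan of u's list: O(degree of u).
    return v in graph.get(u, [])


def add_edge(graph, u, v):
    # Two appends, amortized O(1); duplicates are not checked.
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)


def remove_edge(graph, u, v):
    # list.remove scans the list: O(degree of u) + O(degree of v).
    graph[u].remove(v)
    graph[v].remove(u)


before = has_edge(graph, 'A', 'D')
add_edge(graph, 'A', 'D')
after_add = has_edge(graph, 'A', 'D')
remove_edge(graph, 'A', 'D')
after_remove = has_edge(graph, 'A', 'D')
```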
Advantages of Adjacency Lists
1. Space Efficiency: They are much more efficient in terms of memory usage than
adjacency matrices, especially for sparse graphs (graphs with relatively few edges compared
to the number of possible node pairs).
2. Ease of Traversal: Each node's neighbors are stored together, so iterating over them costs
only O(degree of node). This makes adjacency lists efficient for traversal algorithms like
DFS (Depth-First Search) and BFS (Breadth-First Search), which run in O(V+E).
3. Flexible Data Storage: It's easy to add additional edge information, like weights or
labels, directly within each node's adjacency list.
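As an illustration of the traversal advantage, here is a minimal breadth-first search over a dictionary-of-lists graph. The function name `bfs_order` is an assumption made for this sketch.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        # Each adjacency list is scanned once, giving O(V + E) overall.
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C'],
}
order = bfs_order(graph, 'A')  # ['A', 'B', 'C', 'D']
```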

Limitations of Adjacency Lists


1. Slow Edge Lookup: Unlike an adjacency matrix, where you can check the existence of an
edge in O(1), adjacency lists require O(degree of node) time to check if a specific edge
exists.
2. Inefficient for Dense Graphs: For dense graphs (where the number of edges is close to
V^2), adjacency matrices are more memory-efficient and provide faster access to edges.
3. No Random Access to Neighbors: Adjacency lists do not provide random access to
neighbors, as finding a specific neighbor could involve scanning part of the list.
Variations and Enhancements
1. HashMap-Based Adjacency Lists: For faster access to neighbors, some implementations
store each node's neighbors in a hash map or hash set instead of a list, which makes edge
lookup and removal O(1) on average. This is especially useful if edge removal is frequent.
2. Edge Attributes: Adjacency lists can store additional data such as weights, capacities (for
network flow), or even labels associated with each edge.
3. Compressed Adjacency Lists: In large-scale applications, adjacency lists may be
compressed to reduce memory usage by encoding edges efficiently.
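The hash-based variation can be sketched with Python sets in place of lists. This is an illustrative sketch, not a prescribed implementation; the helper names are assumptions.

```python
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D'},
    'C': {'A', 'D'},
    'D': {'B', 'C'},
}


def has_edge(graph, u, v):
    # Set membership is O(1) on average instead of O(degree).
    return v in graph.get(u, set())


def remove_edge(graph, u, v):
    # discard() is O(1) on average and does nothing if the edge is absent.
    graph[u].discard(v)
    graph[v].discard(u)


had_edge = has_edge(graph, 'A', 'B')
remove_edge(graph, 'A', 'B')
has_edge_now = has_edge(graph, 'A', 'B')
```

The trade-off is that sets carry more memory overhead per neighbor than lists and do not preserve insertion order.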

Example Use Cases


1. Social Networks: Representing user connections, where each user (node) connects to other
users (friends or followers) via edges.
2. Web Crawling and Link Analysis: Storing links between web pages, where each page
links to other pages.
3. Routing and Pathfinding: Maps and navigation systems often use adjacency lists to
represent locations and paths between them, especially when distances or weights are
relevant.
4. Recommendation Systems: Representing user-item interactions, where each user connects
to items they interact with.
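For the routing use case, a weighted adjacency list plugs directly into a shortest-path search. The sketch below uses Dijkstra's algorithm, which the slides do not name; it is chosen here as one common option and assumes non-negative weights. The function name `shortest_distances` is hypothetical.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm on a dict mapping node -> [(neighbor, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float('inf')):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph[node]:
            candidate = d + weight
            if candidate < dist.get(neighbor, float('inf')):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = {
    'A': [('B', 3), ('C', 5)],
    'B': [('A', 3), ('D', 2)],
    'C': [('A', 5), ('D', 4)],
    'D': [('B', 2), ('C', 4)],
}
distances = shortest_distances(graph, 'A')
```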
Adjacency Matrix
An adjacency matrix represents a graph as a V x V matrix in which cell (i, j) records
whether an edge exists between node i and node j (or stores the weight of that edge).
Key Characteristics of Adjacency Matrices:
1. Space Complexity: O(V^2), where V is the number of vertices.
o This space requirement makes adjacency matrices practical only for
small or dense graphs.
2. Time Complexity:
o Edge Lookup: O(1) time to check if an edge exists between two
nodes, as it involves direct access to a matrix cell.
o Edge Insertion/Deletion: O(1) time to add or remove an edge, as it
only involves setting or unsetting a matrix cell.
3. Traversal:
o Accessing all neighbors of a node takes O(V) time, as the algorithm
must scan through the entire row (or column) for that node.
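A minimal sketch of these characteristics, using a list of lists as the matrix. The node names and the helper names `add_edge`, `has_edge`, and `neighbors` are illustrative assumptions.

```python
nodes = ['A', 'B', 'C', 'D']
index = {name: i for i, name in enumerate(nodes)}

# V x V matrix of zeros: O(V^2) space regardless of the number of edges.
matrix = [[0] * len(nodes) for _ in nodes]


def add_edge(u, v):
    # Setting two cells (undirected edge): O(1).
    i, j = index[u], index[v]
    matrix[i][j] = 1
    matrix[j][i] = 1


def has_edge(u, v):
    # Direct cell access: O(1).
    return matrix[index[u]][index[v]] == 1


def neighbors(u):
    # The whole row must be scanned: O(V).
    row = matrix[index[u]]
    return [nodes[j] for j, cell in enumerate(row) if cell]


for u, v in [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]:
    add_edge(u, v)
```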
Applications of Adjacency Matrices
1. Graph Algorithms for Dense Graphs: Algorithms where quick access to edge
information is essential, such as Floyd-Warshall for all-pairs shortest paths,
benefit from adjacency matrices.
2. Matrix Operations on Graphs: In computational mathematics and computer
science, adjacency matrices enable matrix multiplication and other matrix
operations, which can reveal structural properties of the graph.
3. Social Networks and Interaction Graphs: Where relationships or interactions
are numerous, adjacency matrices allow for efficient representation and quick
lookup.
4. Computer Vision and Image Processing: Used to represent pixel adjacency
in grid-based image processing applications, where each pixel connects densely
to its neighbors.
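One concrete example of a matrix operation revealing structure: entry (i, j) of the squared adjacency matrix counts the walks of length two from node i to node j. The sketch below uses plain Python lists; `mat_mul` is a hypothetical helper written for this illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Undirected graph with edges A-B, A-C, B-D, C-D (node order A, B, C, D).
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

squared = mat_mul(adjacency, adjacency)
walks_a_to_d = squared[0][3]  # two walks: A-B-D and A-C-D
degree_of_a = squared[0][0]   # for an undirected simple graph, the diagonal holds degrees
```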
Edge List
An edge list is a simple and compact way to represent a graph by explicitly listing
all edges in the graph, rather than focusing on nodes or their neighbors.
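A minimal sketch of an edge list as a Python list of tuples. The helper name `has_edge` is an assumption; note that an edge lookup has to scan all E entries.

```python
from collections import Counter

# Each tuple is one undirected edge.
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]


def has_edge(edges, u, v):
    # No per-node index exists, so this is an O(E) scan.
    return (u, v) in edges or (v, u) in edges


# Node degrees are recovered with one pass over the list.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
```

Edge lists are convenient as an exchange or input format, and are typically converted to one of the other structures before running traversal algorithms.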
4. Compressed Sparse Row (CSR)

Description: Combines elements of adjacency lists and adjacency matrices, using one
array of row pointers and one array of column indices (plus an optional array of edge values).
Storage: Space-efficient for sparse graphs, needing O(V+E) entries.
Pros: Compact; fast for traversal algorithms, as the neighbors of a node are stored
contiguously.
Cons: Checking for a specific edge is slower than in a matrix, and adding or removing edges
requires rebuilding the arrays; better suited for graph traversal than for querying or
updating specific edges.
Applications: Frequently used in scientific computing for sparse matrix operations.
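The two CSR arrays can be built from a directed edge list with a counting pass and a prefix sum. This is a sketch under the assumption that nodes are numbered 0..V-1; the names `build_csr`, `row_ptr`, and `col_idx` are chosen for illustration.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays from directed (src, dst) pairs with integer node ids."""
    # Count the outgoing edges of every node.
    row_ptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1
    # Prefix sum: row_ptr[i] becomes the start of node i's neighbor block.
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]
    # Fill each node's block with its destination nodes.
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is not modified
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx


def neighbors(row_ptr, col_idx, node):
    # The neighbors of a node are one contiguous slice.
    return col_idx[row_ptr[node]:row_ptr[node + 1]]


edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
row_ptr, col_idx = build_csr(4, edges)
```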

5. Compressed Sparse Column (CSC)

Description: The column-oriented counterpart of CSR: it uses an array of column
pointers and an array of row indices.
Storage: The entries of each column are stored in contiguous memory locations. When rows
are edge sources and columns are edge targets, the sources of each node's incoming edges
sit together.
Pros: Efficient for column-based traversal, such as finding all nodes that have an edge
into a particular node.
Cons: Not as efficient for general traversals; requires additional transformations for
node-centric operations such as listing a node's outgoing neighbors.
Applications: Suitable for certain machine learning tasks involving graph data.
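A CSC sketch mirrors the CSR construction but groups edges by target instead of source. It assumes integer node ids 0..V-1 and the convention that rows are sources and columns are targets; `build_csc`, `col_ptr`, and `row_idx` are illustrative names.

```python
def build_csc(num_nodes, edges):
    """Build CSC arrays from directed (src, dst) pairs with integer node ids."""
    # Count the incoming edges of every node.
    col_ptr = [0] * (num_nodes + 1)
    for _, dst in edges:
        col_ptr[dst + 1] += 1
    # Prefix sum: col_ptr[j] becomes the start of node j's in-neighbor block.
    for j in range(num_nodes):
        col_ptr[j + 1] += col_ptr[j]
    # Fill each block with the source nodes of the incoming edges.
    row_idx = [0] * len(edges)
    next_slot = col_ptr[:-1]  # slicing copies, so col_ptr is not modified
    for src, dst in edges:
        row_idx[next_slot[dst]] = src
        next_slot[dst] += 1
    return col_ptr, row_idx


def in_neighbors(col_ptr, row_idx, node):
    # All sources pointing at node are one contiguous slice.
    return row_idx[col_ptr[node]:col_ptr[node + 1]]


edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
col_ptr, row_idx = build_csc(4, edges)
```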

6. Edge Dictionary

Description: Uses dictionaries (hash tables) to store edges, often with each node as a
key, and an associated dictionary or set of connected nodes.
Storage: Space-efficient, especially for sparse graphs.
Pros: Fast lookups for neighbors; efficient for dynamic graph changes.
Cons: Less efficient for dense graphs due to the overhead of dictionary structures.
Applications: Practical for storing large, dynamic graphs where nodes and edges are
frequently added or removed.
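A dictionary-of-dictionaries sketch shows why this format suits dynamic graphs: adding an edge and removing a node touch only the affected entries. The helper names `add_edge` and `remove_node` and the `since` attribute are hypothetical.

```python
def add_edge(graph, u, v, **attrs):
    # graph[u][v] holds the attributes of the undirected edge u-v.
    graph.setdefault(u, {})[v] = attrs
    graph.setdefault(v, {})[u] = attrs


def remove_node(graph, node):
    # Drop the node, then drop the back-references held by its neighbors.
    for neighbor in graph.pop(node, {}):
        if neighbor != node:  # a self-loop has no other entry to clean up
            graph[neighbor].pop(node, None)


graph = {}
add_edge(graph, 'A', 'B', since=2019)
add_edge(graph, 'A', 'C', since=2021)
add_edge(graph, 'B', 'C', since=2020)

a_knows_b = 'B' in graph['A']  # O(1) average lookup
remove_node(graph, 'A')
```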
7. Adjacency List of Edge Attributes (ALEA)

Description: Extends the adjacency list by attaching edge attributes (e.g., weights or
labels) to each edge in the adjacency list.
Storage: Each edge has a space for additional attributes.
Pros: Useful for graphs with complex edge data; efficient access to neighbor and edge
attribute information.
Cons: Space-intensive for dense graphs.
Applications: Useful in networks where edges have significant metadata, such as
transportation networks or knowledge graphs.
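As an illustration, a small transportation-style graph can store an attribute dictionary next to each neighbor. The attribute names `distance_km` and `mode` and the helper `neighbors_by` are invented for this sketch.

```python
graph = {
    'A': [('B', {'distance_km': 3, 'mode': 'road'}),
          ('C', {'distance_km': 5, 'mode': 'rail'})],
    'B': [('A', {'distance_km': 3, 'mode': 'road'}),
          ('D', {'distance_km': 2, 'mode': 'rail'})],
    'C': [('A', {'distance_km': 5, 'mode': 'rail'}),
          ('D', {'distance_km': 4, 'mode': 'road'})],
    'D': [('B', {'distance_km': 2, 'mode': 'rail'}),
          ('C', {'distance_km': 4, 'mode': 'road'})],
}


def neighbors_by(graph, node, **required):
    """Return the neighbors reached by edges whose attributes match."""
    return [
        neighbor
        for neighbor, attrs in graph[node]
        if all(attrs.get(key) == value for key, value in required.items())
    ]


rail_from_a = neighbors_by(graph, 'A', mode='rail')
road_km_from_d = sum(
    attrs['distance_km'] for _, attrs in graph['D'] if attrs['mode'] == 'road'
)
```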
Graph Database Formats

Neo4j’s Native Storage: Uses its native adjacency list storage, optimized for fast
traversal with relationships stored alongside nodes.
Apache TinkerPop: Supports various formats, including adjacency list and
graph-based serialization formats like Gryo.
Parquet: Used in graph data lakes for compressed storage and columnar access
patterns.
Pros: Optimized for scalability and handling complex relationships; supports
efficient querying.
Cons: More complex than simple in-memory structures; relies on the underlying
database engine.
Applications: Widely used in industry for large-scale graph applications, including
social network analysis and fraud detection.
Final Note:

Each of these formats offers trade-offs in terms of memory efficiency, query
performance, and traversal speed, and the choice of format depends heavily on
the specific requirements of the graph structure and application context.
GRAPH VISUALIZATIONS
Graph Visualizations
• Some visual representations are considered
appropriate to present network structures,
such as
– node-edge diagrams and
– matrix representations
• These visual representations have also been
popularly employed in visualizing social
networks
Node-Edge Diagrams
• A node-edge diagram is an intuitive way to
visualize social networks.
• With the node-edge visualization, many
network analysis tasks, such as component
size calculation, centrality analysis, and
pattern sketching, can be better presented.
• Many node-edge layouts exist
– To mention a few: random layout, force-directed
layout, and tree layout
Random Layout
• A random layout places the nodes at
random geometric locations in the drawing
– may not yield very clear visualization results,
particularly when the number of nodes becomes
very large
• But it can draw the social network
graph efficiently, in linear time O(N)
– because it does not take the structural
characteristics of the graph into account
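A random layout is a single pass over the nodes, which is where the O(N) cost comes from. A minimal sketch, assuming positions in the unit square; the function name `random_layout` and the `seed` parameter are illustrative.

```python
import random


def random_layout(nodes, seed=None):
    """Assign every node a uniformly random (x, y) position in the unit square."""
    rng = random.Random(seed)
    # One pass over the nodes; the edges are never consulted.
    return {node: (rng.random(), rng.random()) for node in nodes}


positions = random_layout(['A', 'B', 'C', 'D'], seed=42)
```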
A random geographic layout
Force-Directed Layout
• A force-directed layout is also known as a spring
layout, which simulates the graph as a virtual
physical system.
• In a force-directed layout, the edges act as springs
and the nodes act as repelling objects
• Generally, an initial random layout is generated
first, and then the force-directed algorithm
runs iteratively to adjust the positions of the nodes
until the repulsive forces between nodes and the attractive
forces between adjacent nodes converge to an equilibrium.
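The iteration described above can be sketched as follows. This is a simplified illustration in the spirit of the Fruchterman-Reingold method, not the specific algorithm the slides have in mind: the force formulas, the ideal distance `k`, the step cap, and the cooling factor are all assumptions, and repulsion is computed naively over all node pairs (O(N^2) per iteration).

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return node -> (x, y) after a fixed number of spring iterations."""
    rng = random.Random(seed)
    # Start from a random layout.
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # assumed ideal distance between nodes
    step = 0.1                       # maximum movement per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Attraction: every edge pulls its endpoints together like a spring.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each node along its net force, capped by the current step.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                scale = min(d, step) / d
                pos[n][0] += dx * scale
                pos[n][1] += dy * scale
        step *= 0.97  # cooling: smaller moves as the layout settles
    return {n: (p[0], p[1]) for n, p in pos.items()}


nodes = ['A', 'B', 'C', 'D']
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
layout = force_directed_layout(nodes, edges)
```

A fixed iteration count stands in for a real convergence test here; production layouts usually stop once the total displacement falls below a threshold.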
A force-based graph layout
Force-Directed Layout
• Since a force-directed layout may take hundreds of
iterations to obtain a stable layout, the running
time is at least O(N log N) or O(E), where N is the
number of nodes and E is the number of edges.
• Force-directed layout vs. random layout
– the running cost of a force-directed layout is much
higher than that of a random layout, especially when
the number of nodes is large
• It is therefore not suitable for graphs larger than
a few hundred nodes.
Tree Layout
• A tree layout can display a more structured picture than
general graph layouts by taking more contextual
information into account.
– Because of the hierarchical nature of a tree layout, trees
are easier for the human eye to grasp than general
graphs.
• Drawing a tree layout is subject to more constraints than
drawing a general graph, since tree structures are a
special case of graphs.
• More contextual information of a graph can be
extracted to present a hierarchical layout and facilitate
network analysis
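One simple way to turn a hierarchy into coordinates is a layered layout: depth becomes the y coordinate, leaves are spread left to right, and each parent is centered over its children. This is only one of many tree-drawing schemes, sketched here with a hypothetical `tree_layout` function and a node-to-children dictionary as input.

```python
def tree_layout(tree, root):
    """Return node -> (x, depth) for a tree given as node -> list of children."""
    pos = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        children = tree.get(node, [])
        if not children:
            # Leaves take consecutive x slots from left to right.
            x = next_leaf_x
            next_leaf_x += 1
        else:
            # Internal nodes are centered over their children.
            xs = [place(child, depth + 1) for child in children]
            x = sum(xs) / len(xs)
        pos[node] = (x, depth)
        return x

    place(root, 0)
    return pos


tree = {'root': ['a', 'b'], 'a': ['a1', 'a2']}
positions = tree_layout(tree, 'root')
```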
Variants of the tree layout
• Examples: the hyperbolic tree layout and the radial tree layout
• They use the idea of focus+context to improve the
visualization with animation
techniques and help users to obtain both
global and local views of a social network in a
2D display
Matrix Representations
• A graph can be shown as a simple Boolean matrix
– whose rows and columns represent the vertices of the graph
• The cells can also carry valued attributes
– associated with the edges to provide more informative network
visualizations
• Advantages:
– minimizes the occlusion problems caused by the node-edge
diagram
– clusters and associations among the nodes can also be better
discovered when the number of nodes increases
– outperforms a node-edge diagram in readability for highly
connected graphs, where the many crossing edges of a node-edge
diagram easily diffuse the viewer's focus
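A Boolean matrix view can be sketched even in plain text: one row and one column per vertex, with a mark wherever an edge exists. The function name `render_matrix` and the '#' and '.' symbols are assumptions, and the sketch expects single-character node names so that the columns line up.

```python
def render_matrix(nodes, edges):
    """Render an undirected graph as a text matrix ('#' = edge, '.' = none)."""
    index = {name: i for i, name in enumerate(nodes)}
    cells = [['.'] * len(nodes) for _ in nodes]
    for u, v in edges:
        cells[index[u]][index[v]] = '#'
        cells[index[v]][index[u]] = '#'
    header = '  ' + ' '.join(nodes)
    rows = [name + ' ' + ' '.join(row) for name, row in zip(nodes, cells)]
    return '\n'.join([header] + rows)


picture = render_matrix(
    ['A', 'B', 'C', 'D'],
    [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')],
)
print(picture)
```

Because every edge occupies its own cell, no two edges can overlap, which is the occlusion advantage listed above.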
MatrixExplorer
• Dual-Representation - matrix and node-edge
SOCIAL MEDIA GRAPHS
Social Media Graphs - Visualizing
Online Social Networks
• Including Web communities, email groups,
digital libraries, and Web 2.0 services.
• Online social network visualizations are based on
different views of social relationships
– e.g. user-centric social relationships, content-centric
social relationships, and hybrid social
relationships.
Web Communities
• 2003, Club Nexus - Web community of over 2,000 Stanford students
• established based on the friendship network data of Stanford students
and allowed them to explicitly list their friends by their profiles
Web Communities…
• Vizster was developed based on node-edge network layouts for exploring connectivity
in large graph structures, supporting visual search and analysis, and automatically
identifying and visualizing community structures
Web Communities…
• FOAF (Friend-of-a-friend) was proposed to
visualize such human-centric social
relationships based on Semantic Web social
metadata
• Microsoft Research Asia - object-level search
service, called EntityCube
– to help people discover real-world entities, such
as people, locations, and organizations, and
explore their social relationships
FOAF: groups of actors with shared interests and social relations
People EntityCube: visualizing human social relationships
Email Groups
• Some recurrent patterns discovered in the
social networks:
– onion pattern, the nexus pattern, and the butterfly
pattern - suggest regular ways of understanding
their interactions
Soylent: visualizing social relationships among
email groups
SNF: visualization of a
complex cluster of contacts
Digital Libraries
• Co-Authorship Networks
– With the visualization of co-authorships, characteristics
such as the clustering coefficient and the
average path length of co-authorship networks
can be analyzed
• Co-Citation Relations
– With proper visualization of co-citation networks,
documents with high impact or similar citation
patterns can be immediately identified, and the
co-citation relationships can be intuitively observed as
well
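The two co-authorship measures mentioned above can be computed on a small example. The paper list below is invented purely for illustration, and the function names `local_clustering` and `average_path_length` are assumptions; the sketch also assumes the network is connected.

```python
from collections import deque
from itertools import combinations

# Hypothetical data: each inner list holds the authors of one paper.
papers = [['Ann', 'Bob', 'Cy'], ['Cy', 'Dee'], ['Dee', 'Eve']]

# Two authors are linked if they wrote at least one paper together.
graph = {}
for authors in papers:
    for author in authors:
        graph.setdefault(author, set())
    for a, b in combinations(authors, 2):
        graph[a].add(b)
        graph[b].add(a)


def local_clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for source in graph:
        # Breadth-first search gives shortest distances in unweighted graphs.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


clustering_of_cy = local_clustering(graph, 'Cy')
mean_distance = average_path_length(graph)
```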
Co-authorship network of D-Lib plus JODL research community
CircleView: visualization of paper citation relations
Web 2.0 Services
• Many Web 2.0 applications are popularly
accessed by users to connect their social
networks, such as Twitter and Facebook
