Module 5
Graph Database
Graph databases are a type of NoSQL database that employ graph structures to store and
manage data.
Unlike traditional relational databases that use tables and rows, graph databases utilize
nodes, edges, and properties to represent and store information.
Nodes represent entities such as people, products, or places, while edges represent the
relationships between these entities.
Properties are used to store additional attributes or metadata about nodes and edges.
Graph databases excel at modelling and querying complex relationships, making them
particularly suited for applications with highly connected data.
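As a rough illustration of the node, edge, and property model described above, the following Python sketch stores a tiny graph in memory. The class names (`Node`, `Edge`, `PropertyGraph`) and the sample data are assumptions made for this example only; they do not correspond to the API of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # entity type, e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                     # node_id of the start node
    target: str                     # node_id of the end node
    rel_type: str                   # relationship type, e.g. "KNOWS"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}             # node_id -> Node
        self.edges = []             # list of Edge

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, rel_type, **properties):
        # An edge may only connect nodes that already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        # Follow outgoing edges, optionally filtered by relationship type.
        return [e.target for e in self.edges
                if e.source == node_id
                and (rel_type is None or e.rel_type == rel_type)]


graph = PropertyGraph()
graph.add_node("alice", "Person", name="Alice")
graph.add_node("bob", "Person", name="Bob")
graph.add_node("laptop", "Product", name="Laptop", price=1200)
graph.add_edge("alice", "bob", "KNOWS", since=2020)
graph.add_edge("alice", "laptop", "BOUGHT")

print(graph.neighbours("alice"))            # every node Alice points to
print(graph.neighbours("alice", "KNOWS"))   # only the people Alice knows
```

Both nodes and edges carry their own property dictionaries, which is what distinguishes this model from a plain adjacency list.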
Graph DB
Advantages
Flexible Schema: Graph databases offer a flexible schema that can adapt to evolving data
models without requiring costly schema migrations.
Efficient Queries: Graph databases enable efficient traversal of relationships, allowing for
fast and complex queries on connected data.
Disadvantages
Scalability Issues: While graph databases are designed to scale horizontally, maintaining
performance at scale can be challenging, especially for write-heavy workloads.
Data Modelling Complexity: Designing effective graph schemas and managing evolving
data structures can be complex, requiring expertise in graph database design and modelling.
Graph Databases
- Consistency
- Transaction
- Availability
- Query features
- Scaling
Consistency
Graph databases keep data consistent by preserving the integrity of the relationships
between nodes: every edge must connect two existing nodes, so no relationship is left
dangling.
When a node or edge changes, every related element affected by that change is updated as
part of the same operation, so the graph remains coherent and accurate.
Consistency - Example
In a social media platform, when a user adds another user as a friend, it's crucial that both
users' friend lists are updated simultaneously to maintain consistency.
If user A adds user B as a friend but user B's friend list is not updated, the two lists
disagree, which can cause confusion and errors when displaying mutual connections.
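The friend-list scenario above can be sketched in Python as follows. This is a minimal in-memory illustration, not how any specific graph database implements consistency; the `FriendGraph` class and its method names are hypothetical.

```python
class FriendGraph:
    def __init__(self):
        self.friends = {}           # user -> set of that user's friends

    def add_user(self, user):
        self.friends.setdefault(user, set())

    def add_friendship(self, a, b):
        # Validate first, so that a failure leaves both lists untouched.
        if a not in self.friends or b not in self.friends:
            raise KeyError("both users must exist")
        # Update both sides together; neither list changes without the other.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def mutual_friends(self, a, b):
        return self.friends[a] & self.friends[b]

    def is_consistent(self):
        # Every friendship must be recorded on both sides.
        return all(user in self.friends[friend]
                   for user, friend_set in self.friends.items()
                   for friend in friend_set)


g = FriendGraph()
for user in ("A", "B", "C"):
    g.add_user(user)
g.add_friendship("A", "B")
g.add_friendship("A", "C")
g.add_friendship("B", "C")

print(g.mutual_friends("A", "B"))   # friends that A and B have in common
print(g.is_consistent())
```

Checking the endpoints before touching either set is the design choice that keeps a failed request from leaving a one-sided friendship behind.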
Transaction
Transaction support ensures that database operations are executed reliably and that the
database remains in a consistent state even in the event of failures.
Transaction - Example
Banking Transactions
In banking systems, when a customer transfers funds from one account to another, the
transaction must be completed securely and reliably to maintain the integrity of the
customer's financial data.
The transaction must adhere to the principles of ACID (Atomicity, Consistency, Isolation,
Durability) to ensure that funds are transferred accurately and that the customer's account
balances are updated correctly.
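The atomicity part of ACID can be illustrated with a short Python sketch: either both the debit and the credit happen, or neither does. The `transfer` function, the `InsufficientFunds` exception, and the account data are assumptions for this example. Real databases typically rely on mechanisms such as logging and locking rather than copying state, and isolation and durability are not shown here.

```python
import copy


class InsufficientFunds(Exception):
    pass


def transfer(balances, source, target, amount):
    """Move `amount` from source to target, all-or-nothing."""
    snapshot = copy.deepcopy(balances)      # state to restore on failure
    try:
        balances[source] -= amount          # step 1: debit
        if balances[source] < 0:
            raise InsufficientFunds(source)
        balances[target] += amount          # step 2: credit (KeyError if unknown)
    except Exception:
        balances.clear()
        balances.update(snapshot)           # roll back: undo any partial change
        raise


accounts = {"alice": 100, "bob": 50}
transfer(accounts, "alice", "bob", 30)      # succeeds

try:
    transfer(accounts, "alice", "carol", 10)    # fails: unknown target account
except KeyError:
    pass

print(accounts)     # the failed transfer left no partial debit behind
```

The total across all accounts is the same before and after each call, whether the call succeeds or fails.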
Availability
High availability is a critical feature of graph databases, ensuring that users can access data
without interruption.
This is achieved through features such as data replication, fault tolerance, and automatic
failover mechanisms, which ensure continuous access to data even in the face of hardware
failures or network outages.
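The combination of replication and failover described above can be sketched as follows. This is a simplified, assumed model: the `Replica` class, `read_with_failover` function, and server names are hypothetical, and a real cluster would also handle leader election, replication lag, and writes.

```python
import copy


class ReplicaDown(Exception):
    pass


class Replica:
    def __init__(self, name, data, healthy=True):
        self.name = name
        self.data = copy.deepcopy(data)     # each replica holds its own copy
        self.healthy = healthy

    def read(self, key):
        if not self.healthy:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    # Try replicas in order; move on to the next one when a server is down.
    for replica in replicas:
        try:
            return replica.read(key), replica.name
        except ReplicaDown:
            continue
    raise ReplicaDown("no healthy replica available")


data = {"user:42": ["laptop", "headphones"]}
cluster = [
    Replica("primary", data),
    Replica("replica-1", data),
    Replica("replica-2", data),
]

cluster[0].healthy = False                  # simulate a hardware failure
value, served_by = read_with_failover(cluster, "user:42")
print(value, served_by)
```

Because every replica holds the same data, the read still succeeds after the first server fails; the caller only notices which server answered.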
Availability - Example
In an online retail recommendation system, when a user browses products, the system
needs to be continuously available to provide personalized recommendations in real time.
Query Features
Graph databases offer powerful query capabilities for traversing and analysing the
relationships between nodes.
This enables users to perform complex graph queries efficiently, such as finding the
shortest path between two nodes, identifying patterns within the graph, or performing
graph-based analytics.
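As an example of the kind of traversal mentioned above, the following Python sketch finds a shortest path (fewest edges) between two nodes using breadth-first search. The `shortest_path` function and the sample `network` are assumptions for illustration; a graph database would normally expose this through its query language or built-in algorithms rather than application code.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}        # visited nodes and how each was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in previous:
                continue            # already visited
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}

print(shortest_path(network, "A", "E"))
print(shortest_path(network, "E", "A"))     # no outgoing edges from E
```

Breadth-first search explores nodes in order of their distance from the start, so the first time the goal is reached the path is guaranteed to use the fewest edges.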
Query Features - Example
Graph database query features allow healthcare providers to perform complex analyses,
such as identifying patterns of disease spread, optimizing patient care pathways, and
predicting healthcare outcomes based on historical data.
Scaling
Graph databases are designed to scale horizontally so that they can handle large-scale
connected datasets.
Horizontal scaling involves distributing the data and workload across multiple servers,
enabling the database to handle increasing data volumes and user loads without sacrificing
performance.
Scaling - Example
In a social networking platform, as the number of users and connections between them
grows, the graph database must be able to scale horizontally to accommodate the increasing
volume of data.
Horizontal scaling allows the platform to distribute the workload across multiple servers,
ensuring that performance remains consistent even as the social network expands in size
and complexity.
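One simple way to distribute graph data across servers is to assign each node to a shard by hashing its identifier. The sketch below assumes this hash-based scheme purely for illustration; the function names `shard_for` and `partition` are hypothetical, and production systems often use more sophisticated, graph-aware partitioning.

```python
import hashlib


def shard_for(node_id, shard_count):
    # A stable hash, so the same node always maps to the same server.
    digest = hashlib.sha256(node_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def partition(edges, shard_count):
    """Assign each node to a shard and count edges that cross shards."""
    shards = {i: set() for i in range(shard_count)}
    cross_shard = 0
    for source, target in edges:
        s = shard_for(source, shard_count)
        t = shard_for(target, shard_count)
        shards[s].add(source)
        shards[t].add(target)
        if s != t:
            cross_shard += 1        # traversing this edge needs a network hop
    return shards, cross_shard


friendships = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("dave", "alice"),
]
shards, cross_shard = partition(friendships, shard_count=3)

for shard_id, members in shards.items():
    print(shard_id, sorted(members))
print("edges crossing shards:", cross_shard)
```

The cross-shard edge count hints at why scaling a graph is harder than scaling independent records: every relationship that spans two servers makes traversals more expensive.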
Use Cases of Graph Database
Connected Data
Graph databases are well-suited for applications that deal with highly interconnected data,
such as social networks, where relationships between entities are as important as the
entities themselves.
Internet of Things (IoT)
Graph databases can effectively manage and analyse the complex networks of devices and
sensors in IoT deployments.
Recommendation Engine
Fraud Detection
Graph databases are used in fraud detection applications to identify patterns of fraudulent
behaviour and detect suspicious activities.
Neo4j
Neo4j is a leading graph database management system known for its performance,
scalability, and ease of use.
It offers a robust set of features, including support for ACID transactions, built-in graph
algorithms, and a powerful query language called Cypher.
Neo4j's architecture consists of a graph database engine, storage layer, and query
processing components, all optimized for handling graph data efficiently.
Graph Database Engine: Responsible for executing graph queries and managing the
traversal of relationships between nodes.
Storage Layer: Stores graph data efficiently on disk and manages data retrieval and storage
operations.
Query Processing: Executes Cypher queries and performs optimizations to ensure efficient
query execution.
Example Queries:
Creating Nodes: `CREATE (Alice:Person {name: 'Alice'}), (Bob:Person {name: 'Bob'})`
Cypher
Cypher is a declarative query language for Neo4j, designed specifically for querying and
manipulating graph data.
It allows users to express patterns and relationships in a concise and readable manner.
Examples of Cypher Queries