Sayan Ghosh 26900123054 Distributed Database System Cse 6th Sem
ROLL NO - 26900123054
DEPARTMENT - CSE
REGISTRATION NO - 232690120125
SEMESTER - 6TH
SESSION - 2023-2024
Parallel Database Systems:
An Overview
Parallel database systems are designed to improve performance
by executing multiple operations simultaneously. These
systems are essential for managing large datasets and complex
queries in distributed environments. This presentation will
explore the key concepts, architectures, techniques, and real-
world implementations of parallel database systems.
by Sayan Ghosh
Distributed vs. Parallel Databases: Core
Differences
Distributed Databases
Data is spread across multiple machines, emphasizing location
transparency and autonomy. The focus is on data distribution,
fault tolerance, and geographic dispersion. These databases are
loosely coupled and potentially heterogeneous, ideal for
worldwide banking systems with local data management.
Parallel Databases
A centralized system with multiple processors, emphasizing
performance and throughput via parallel processing. The focus is
on performance, scalability, and high availability within a
single system. These databases are tightly coupled and typically
homogeneous, suitable for large data warehouses used for complex
analytics.
Architectures for Parallel
Databases
3 Parallel Join
Joins large tables in parallel using techniques like hash join
and sort-merge join to improve join performance. Hash
join involves partitioning tables based on hash values and
joining partitions in parallel.
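A minimal sketch of the partitioned hash join described above, written in Python. The table contents, the function names, and the use of a thread pool to stand in for separate nodes are all assumptions made for illustration; a real parallel database would run each partition pair on its own processor or node.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    """Split rows into n buckets by hashing the join key."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets


def join_partition(left_part, right_part, key):
    """Build/probe hash join on one pair of matching partitions."""
    table = defaultdict(list)
    for left_row in left_part:            # build phase
        table[left_row[key]].append(left_row)
    out = []
    for right_row in right_part:          # probe phase
        for left_row in table.get(right_row[key], []):
            out.append({**left_row, **right_row})
    return out


def parallel_hash_join(left, right, key, n=3):
    """Partition both inputs the same way, then join partition pairs concurrently."""
    left_parts = partition(left, key, n)
    right_parts = partition(right, key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = pool.map(join_partition, left_parts, right_parts, [key] * n)
        return [row for part in results for row in part]


# Hypothetical sample tables
customers = [{"customer_id": i, "name": f"c{i}"} for i in range(1, 7)]
orders = [{"customer_id": i % 3 + 1, "order_id": 100 + i} for i in range(6)]
joined = parallel_hash_join(customers, orders, "customer_id")
```

Because both tables are partitioned with the same hash function, rows with equal keys always fall into the same partition pair, so no pair needs to look at any other pair's data.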
Data Partitioning Strategies
1 Horizontal Partitioning
Divides rows of a table across multiple nodes. Round Robin
distributes rows evenly, while Hash Partitioning distributes
rows based on a hash function applied to a key column (e.g.,
customer_id). Range Partitioning distributes rows based on
ranges of values in a key column (e.g., customer_id 1-1000).
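Range partitioning can be sketched as a lookup against a sorted list of range boundaries. The boundary values below (1000 and 2000) and the zero-based node numbering are assumptions chosen to extend the customer_id 1-1000 example, not values from a particular system.

```python
from bisect import bisect_left

# Hypothetical upper bounds: node 0 holds ids 1-1000, node 1 holds 1001-2000,
# and node 2 holds everything above 2000.
UPPER_BOUNDS = [1000, 2000]


def range_node(customer_id):
    """Return the index of the node whose range contains customer_id."""
    return bisect_left(UPPER_BOUNDS, customer_id)


placement = {cid: range_node(cid) for cid in (1, 1000, 1001, 2000, 5000)}
```

Keeping the boundaries sorted lets the lookup run as a binary search, and it keeps neighbouring key values on the same node, which suits range queries.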
2 Round Robin Example
Node 1 gets rows 1, 4, 7; Node 2 gets rows 2, 5, 8; Node 3 gets
rows 3, 6, 9, ensuring even distribution across nodes.
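The assignment in this example can be reproduced with a short Python sketch. The function name `round_robin` is hypothetical; node numbers start at 1 to match the slide.

```python
def round_robin(row_ids, num_nodes):
    """Assign each row to the next node in turn, cycling back to node 1."""
    nodes = {n: [] for n in range(1, num_nodes + 1)}
    for position, row_id in enumerate(row_ids):
        nodes[position % num_nodes + 1].append(row_id)
    return nodes


assignment = round_robin(range(1, 10), 3)
# assignment == {1: [1, 4, 7], 2: [2, 5, 8], 3: [3, 6, 9]}
```

The placement depends only on arrival order, not on row contents, which gives even load but no way to find a given key without asking every node.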
3 Hash Partitioning Example
Hashing customer_id to distribute customer data across nodes,
ensuring related data can be processed together.
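A minimal sketch of hash partitioning on customer_id follows. The choice of CRC32 as the hash function, the node count of 3, and the sample rows are assumptions for illustration; real systems use their own hash functions.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # assumed cluster size


def hash_node(customer_id, num_nodes=NUM_NODES):
    """Map a key to a node; crc32 gives the same result on every run."""
    return zlib.crc32(str(customer_id).encode("utf-8")) % num_nodes


def place(rows):
    """Group rows by the node their customer_id hashes to."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash_node(row["customer_id"])].append(row)
    return nodes


# Hypothetical sample rows from two tables sharing the same key
customers = [{"customer_id": 42, "name": "A"}, {"customer_id": 7, "name": "B"}]
orders = [{"customer_id": 42, "order_id": 1}, {"customer_id": 7, "order_id": 2},
          {"customer_id": 42, "order_id": 3}]

customer_nodes = place(customers)
order_nodes = place(orders)
```

Since both tables use the same function on the same key, a customer row and all of that customer's orders end up on one node, so they can be joined or aggregated there without moving data.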
Parallel Query Optimization
Techniques
Query Decomposition
Breaks down complex queries into smaller, parallelizable
tasks that can be executed concurrently.
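As one illustration of decomposition, an average over a partitioned column can be split into per-partition (sum, count) sub-tasks and a final combine step. This is a sketch under assumed data; the function names are hypothetical, and a thread pool stands in for the nodes that would run the sub-tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(partition):
    """Sub-task executed on one partition: returns (sum, row count)."""
    return sum(partition), len(partition)


def parallel_average(partitions):
    """Run the sub-tasks concurrently, then combine the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical column values spread over three partitions
partitions = [[10, 20, 30], [40, 50], [60]]
avg = parallel_average(partitions)  # 210 / 6 = 35.0
```

The average is decomposed into sums and counts rather than per-partition averages because partitions of unequal size would otherwise be weighted incorrectly.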
Data Localization
Moves computation to the data to minimize data transfer,
applying filters on data at the node where the data resides
before transferring it.
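The effect of filtering at the node before transfer can be sketched as follows. The node names, rows, and the `amount > 500` predicate are assumptions for illustration only.

```python
def local_filter(node_rows, predicate):
    """Runs where the data lives; only matching rows leave the node."""
    return [row for row in node_rows if predicate(row)]


# Hypothetical rows stored on two nodes
nodes = {
    "node1": [{"id": 1, "amount": 50}, {"id": 2, "amount": 900}],
    "node2": [{"id": 3, "amount": 1200}, {"id": 4, "amount": 10}],
}


def is_large(row):
    return row["amount"] > 500


# Each node applies the filter locally, then ships its matches to the coordinator.
shipped = {name: local_filter(rows, is_large) for name, rows in nodes.items()}
result = [row for rows in shipped.values() for row in rows]

rows_total = sum(len(rows) for rows in nodes.values())
rows_transferred = len(result)  # 2 of 4 rows cross the network
```

Without the local filter, all four rows would be transferred and then half of them discarded at the coordinator.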
Concurrency Control and Transaction
Management
1 Distributed Locking
Manages locks across multiple nodes to ensure data consistency,
using protocols like two-phase locking.
2 Two-Phase Commit (2PC)
Ensures that transactions are either fully committed or fully
rolled back across all nodes, maintaining atomicity.
3 Distributed Deadlock Detection
Detects and resolves deadlocks that occur across multiple nodes,
using a global deadlock detector.
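The all-or-nothing behaviour of two-phase commit can be sketched with an in-memory coordinator and participants. The `Participant` class and its `can_commit` flag are hypothetical; a real implementation also needs durable logging, timeouts, and recovery, which this sketch leaves out.

```python
class Participant:
    """A node taking part in the transaction (in-memory stand-in)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumed flag simulating a local failure
        self.state = "initial"

    def prepare(self):
        """Phase 1: vote yes (True) or no (False)."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: voting
    if all(votes):                                # phase 2: decision
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


ok_nodes = [Participant("n1"), Participant("n2")]
outcome_ok = two_phase_commit(ok_nodes)

mixed_nodes = [Participant("n1"), Participant("n2", can_commit=False)]
outcome_mixed = two_phase_commit(mixed_nodes)
```

In the second run a single "no" vote causes every participant to roll back, including the one that had already prepared successfully.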
Fault Tolerance and High Availability
Replication
Creating multiple copies of data on different nodes to ensure
data is available even if one node fails. Can be synchronous or
asynchronous.
Data Partitioning with Redundancy
Distributing data across nodes with redundant copies to ensure
data availability. Utilizing RAID configurations and mirroring
data across nodes.
Automatic Failover
Automatically switching to a backup node in case of a failure,
using heartbeat mechanisms to detect node failures.
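Heartbeat-based failover can be sketched as a monitor that records the last time each node reported in and switches to the backup once the primary has been silent for longer than a timeout. The class name, node names, and the 3-second timeout are assumptions; timestamps are passed in explicitly so the example is deterministic.

```python
class FailoverMonitor:
    """Tracks heartbeats and promotes the backup if the primary goes silent."""

    def __init__(self, primary, backup, timeout):
        self.primary = primary
        self.backup = backup
        self.timeout = timeout      # seconds of silence tolerated (assumed value)
        self.active = primary
        self.last_seen = {}

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def check(self, now):
        """Return the node that should serve requests at time `now`."""
        last = self.last_seen.get(self.active, float("-inf"))
        if self.active == self.primary and now - last > self.timeout:
            self.active = self.backup   # failover
        return self.active


monitor = FailoverMonitor("node1", "node2", timeout=3.0)
monitor.heartbeat("node1", now=0.0)
monitor.heartbeat("node2", now=0.0)

before = monitor.check(now=2.0)   # primary heard from 2 s ago: still active
after = monitor.check(now=5.0)    # primary silent for 5 s: backup takes over
```

The timeout trades detection speed against false alarms: a short value reacts quickly but may fail over on a brief network delay.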
Case Studies: Real-World Implementations