Sayan Ghosh 26900123054 Distributed Database System Cse 6th Sem

The presentation provides an overview of parallel database systems, emphasizing their ability to improve performance through simultaneous operations on large datasets. It contrasts parallel databases with distributed databases, detailing architectures, query processing techniques, and data partitioning strategies. The document also discusses real-world implementations and the future of parallel databases, highlighting trends such as cloud adoption and big data integration.

Uploaded by

Sayan Ghosh

PRESENTATION ON - PARALLEL DATABASE SYSTEM

PAPER NAME- DISTRIBUTED DATABASE SYSTEM


FOR- Continuous Assessment 1 (CA1)

NAME – SAYAN GHOSH

ROLL NO - 26900123054
DEPARTMENT - CSE
REGISTRATION NO - 232690120125
SEMESTER - 6TH

SESSION-2023-2024
Parallel Database Systems:
An Overview
Parallel database systems are designed to improve performance
by executing multiple operations simultaneously. These
systems are essential for managing large datasets and complex
queries in distributed environments. This presentation will
explore the key concepts, architectures, techniques, and real-
world implementations of parallel database systems.

We will begin with an introduction to parallel database
systems, comparing them to traditional systems and
highlighting their key benefits. Then, we will delve into the
architectures, query processing techniques, and data
partitioning strategies used in these systems.

by Sayan Ghosh
Distributed vs. Parallel Databases: Core Differences

Distributed Databases
Data is spread across multiple machines, emphasizing location transparency and autonomy. The focus is on data distribution, fault tolerance, and geographic dispersion. These databases are loosely coupled and potentially heterogeneous, ideal for worldwide banking systems with local data management.

Parallel Databases
A centralized system with multiple processors, emphasizing performance and throughput via parallel processing. The focus is on performance, scalability, and high availability within a single system. These databases are tightly coupled and typically homogeneous, suitable for large data warehouses used for complex analytics.
Architectures for Parallel
Databases

Shared Memory
Multiple processors access a common memory space, facilitating easy communication and low latency. However, this architecture suffers from memory contention and limited scalability. Oracle Exadata exemplifies this with its tightly integrated hardware and software.

Shared Disk
Multiple processors share common disks, providing high availability and moderate scalability. Disk contention and complex concurrency control are its drawbacks. IBM DB2 with shared disk cluster configurations is a notable example.

Shared Nothing
Each processor has its own memory and disks, communicating via a network. This offers high scalability and fault tolerance but involves complex communication and higher latency. Teradata systems and Hadoop clusters are representative of this architecture.
Parallel Query Processing:
Core Techniques
1 Parallel Scan
Distributes table scans across multiple processors to speed up data retrieval. For example, scanning a 1TB table using 10 processors, each scanning 100GB.

2 Parallel Sort
Sorts large datasets in parallel using algorithms like parallel merge sort, enhancing sorting performance. For example, sorting a 500GB dataset in parallel using multiple sorter nodes.

3 Parallel Join
Joins large tables in parallel using techniques like hash join
and sort-merge join to improve join performance. Hash
join involves partitioning tables based on hash values and
joining partitions in parallel.
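
The hash join described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`partition`, `join_partition`, `parallel_hash_join`) and the sample tables are hypothetical, and a thread pool stands in for separate processors or nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    """Split rows into n buckets by hashing the join key."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets


def join_partition(left_rows, right_rows, key):
    """Build a hash table on the left partition, then probe it with the right."""
    table = {}
    for left in left_rows:
        table.setdefault(left[key], []).append(left)
    return [{**left, **right}
            for right in right_rows
            for left in table.get(right[key], [])]


def parallel_hash_join(left, right, key, n_workers=4):
    """Partition both inputs the same way, then join matching partitions concurrently."""
    left_parts = partition(left, key, n_workers)
    right_parts = partition(right, key, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(join_partition, left_parts, right_parts,
                           [key] * n_workers)
        return [row for part in results for row in part]


# Hypothetical sample data, not taken from the presentation.
customers = [{"customer_id": 1, "name": "Asha"},
             {"customer_id": 2, "name": "Ravi"}]
orders = [{"customer_id": 1, "total": 40},
          {"customer_id": 1, "total": 15},
          {"customer_id": 3, "total": 9}]
joined = parallel_hash_join(customers, orders, "customer_id")
```

Because both tables are partitioned with the same hash function, rows with equal keys always land in the same partition pair, so each pair can be joined without looking at any other. In CPython, threads do not give real CPU parallelism; a production system would run each partition on its own processor or node.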
Data Partitioning Strategies
1 Horizontal Partitioning
Divides rows of a table across multiple nodes. Round Robin distributes rows evenly, while Hash Partitioning distributes rows based on a hash function applied to a key column (e.g., customer_id). Range Partitioning distributes rows based on ranges of values in a key column (e.g., customer_id 1-1000).

2 Round Robin Example
Node 1 gets rows 1, 4, 7; Node 2 gets rows 2, 5, 8; Node 3 gets rows 3, 6, 9, ensuring even distribution across nodes.

3 Hash Partitioning Example
Hashing customer_id to distribute customer data across nodes, ensuring related data can be processed together.
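
The three horizontal partitioning schemes can be sketched as small Python functions. The function names, the use of SHA-256 as the hash function, and the range boundaries are assumptions chosen for illustration; nodes are numbered from 0 here, whereas the slide numbers them from 1.

```python
import hashlib
from bisect import bisect_left


def round_robin(rows, n_nodes):
    """Deal rows out to nodes in turn, like cards."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def hash_node(key, n_nodes):
    """Pick a node from a stable hash of the key (SHA-256 is an assumption)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def range_node(key, upper_bounds):
    """Pick a node by range; upper_bounds holds the inclusive upper limit
    of every node except the last, in ascending order."""
    return bisect_left(upper_bounds, key)


# Rows 1..9 over three nodes reproduce the round robin example above.
layout = round_robin(list(range(1, 10)), 3)

# Hypothetical ranges: node 0 holds customer_id 1-1000, node 1 holds
# 1001-2000, node 2 holds everything larger.
bounds = [1000, 2000]
```

Round robin balances load but scatters related rows. Hash and range partitioning keep all rows for one `customer_id` on one node, and range partitioning additionally keeps neighbouring keys together, at the risk of skew if the key values are unevenly distributed.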
Parallel Query Optimization Techniques

Query Decomposition
Breaks down complex queries into smaller, parallelizable tasks that can be executed concurrently.

Cost-Based Optimization
Chooses the most efficient execution plan based on estimated costs, considering factors like CPU, I/O, and network costs.

Parallel Join Ordering
Determines the optimal order to perform joins in parallel, often joining the smallest tables first to reduce intermediate result sizes.

Data Localization
Moves computation to the data to minimize data transfer, applying filters on data at the node where the data resides before transferring it.
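
The effect of data localization can be shown by counting how many rows cross the network under two strategies. This is a toy sketch: the function names, the in-memory "nodes", and the sample rows are all hypothetical.

```python
def ship_then_filter(nodes, predicate):
    """Naive plan: send every row to the coordinator, then filter there."""
    shipped = [row for node in nodes for row in node]
    return [row for row in shipped if predicate(row)], len(shipped)


def filter_then_ship(nodes, predicate):
    """Localized plan: each node filters its own rows and ships only matches."""
    shipped = [row for node in nodes for row in node if predicate(row)]
    return shipped, len(shipped)


def is_large_order(row):
    return row["total"] >= 40


# Two hypothetical nodes, each holding part of an orders table.
nodes = [
    [{"customer_id": 1, "total": 40}, {"customer_id": 2, "total": 5}],
    [{"customer_id": 3, "total": 90}, {"customer_id": 4, "total": 7}],
]

naive_rows, naive_cost = ship_then_filter(nodes, is_large_order)
local_rows, local_cost = filter_then_ship(nodes, is_large_order)
```

Both plans return the same rows, but the localized plan transfers only the rows that survive the filter. A cost-based optimizer would weigh this kind of difference in network cost when choosing between plans.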
Concurrency Control and Transaction Management

1 Distributed Locking
Manages locks across multiple nodes to ensure data consistency, using protocols like two-phase locking.

2 Two-Phase Commit (2PC)
Ensures that transactions are either fully committed or fully rolled back across all nodes, maintaining atomicity.

3 Distributed Deadlock Detection
Detects and resolves deadlocks that occur across multiple nodes, using a global deadlock detector.
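
A minimal sketch of the two-phase commit idea follows. It deliberately leaves out what a real protocol needs (durable logging, timeouts, coordinator recovery), and the `Participant` class and `two_phase_commit` function are hypothetical names used only for illustration.

```python
class Participant:
    """One node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes."""
    # Phase 1 (voting): collect a vote from every node.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): apply one global outcome to all nodes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


healthy = [Participant("node-1"), Participant("node-2")]
ok = two_phase_commit(healthy)

mixed = [Participant("node-1"), Participant("node-2", can_commit=False)]
failed = two_phase_commit(mixed)
```

A single "no" vote forces every node to roll back, which is what gives the transaction its all-or-nothing behaviour across nodes.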
Fault Tolerance and High Availability

Replication
Creating multiple copies of data on different nodes to ensure data is available even if one node fails. Can be synchronous or asynchronous.

Data Partitioning with Redundancy
Distributing data across nodes with redundant copies to ensure data availability. Utilizing RAID configurations and mirroring data across nodes.

Automatic Failover
Automatically switching to a backup node in case of a failure, using heartbeat mechanisms to detect node failures.
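
Heartbeat-based failover can be sketched as a monitor that promotes the backup when the active node has been silent for too long. The `FailoverMonitor` class, the node names, and the timeout value are assumptions for illustration; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class FailoverMonitor:
    """Tracks heartbeats and promotes the backup if the active node goes quiet."""

    def __init__(self, primary, backup, timeout, now=0.0):
        self.active = primary
        self.backup = backup
        self.timeout = timeout
        # Treat both nodes as alive at start-up.
        self.last_seen = {primary: now, backup: now}

    def heartbeat(self, node, now):
        """Record that a node reported in at time `now`."""
        self.last_seen[node] = now

    def check(self, now):
        """Return the node that should serve requests at time `now`."""
        silent_for = now - self.last_seen[self.active]
        if silent_for > self.timeout and self.backup is not None:
            # Active node missed its heartbeat window: switch to the backup.
            self.active, self.backup = self.backup, None
        return self.active


monitor = FailoverMonitor("node-a", "node-b", timeout=3.0)
monitor.heartbeat("node-a", now=1.0)
before = monitor.check(now=3.5)  # 2.5 s of silence, within the timeout
after = monitor.check(now=5.0)   # 4.0 s of silence, backup is promoted
```

The timeout is a trade-off: a short value detects failures quickly but risks promoting the backup during a brief network delay, while a long value does the opposite.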
Case Studies: Real-World Implementations

Teradata
Utilizes a shared-nothing architecture for large-scale data warehousing, serving major retailers and financial institutions.

IBM DB2
Employs a shared-disk architecture for high availability and scalability, used by enterprises for transactional processing and data warehousing.

Oracle Exadata
Features a shared-memory architecture optimized for Oracle databases, catering to organizations needing high performance and scalability.
Conclusion: The Future of Parallel Databases

1 Cloud Adoption
Adoption of cloud-based parallel database solutions like Amazon Redshift and Google BigQuery is on the rise.

2 Big Data Integration
Seamless integration with big data technologies such as Hadoop and Spark continues to evolve.

3 Algorithm Development
The development of new parallel query processing algorithms and optimization techniques is ongoing and crucial.

Parallel databases will continue to evolve, playing a critical role in data management and analytics. They
are essential for handling large datasets and complex queries in distributed environments, driving
innovation and efficiency in various industries.
