
Parallel and Distributed Databases Overview

Parallel and distributed databases allow data to be stored across multiple machines to improve performance and handle large datasets. There are three main architectures: shared memory, shared disk, and shared-nothing. Parallel query processing distributes operations like scans, joins, and sorts across multiple machines through intra-operator and inter-operator parallelism. The performance of parallel algorithms depends on how the data is partitioned among the machines. Optimizing parallel query plans requires different techniques than optimizing sequential plans.

Uploaded by

heyramzz


PARALLEL & DISTRIBUTED DATABASES
CS561 - SPRING 2012
WPI, MOHAMED ELTABAKH

INTRODUCTION
In a centralized database:
Data is located in one place (one server)
All DBMS functionalities are performed by that server
Enforcing ACID properties of transactions
Concurrency control and recovery mechanisms
Answering queries

In distributed databases:
Data is stored in multiple places (each running a DBMS)
New notion of distributed transactions
DBMS functionalities are now distributed over many machines
We must revisit how these functionalities work in a distributed environment

WHY DISTRIBUTED DATABASES

Data is too large
Applications are by nature distributed
Bank with many branches
Chain of retail stores with many locations
Library with many branches

Benefit from distributed and parallel processing
Faster response time for queries

PARALLEL VS. DISTRIBUTED DATABASES

Distributed processing usually implies parallel processing (not vice versa)
Can have parallel processing on a single machine

Assumptions about architecture

Parallel Databases
Machines are physically close to each other, e.g., same server room
Machines connect through dedicated high-speed LANs and switches
Communication cost is assumed to be small
Can be shared-memory, shared-disk, or shared-nothing architecture

Distributed Databases
Machines can be far from each other, e.g., on different continents
Can be connected using a public network, e.g., the Internet
Communication cost and problems cannot be ignored
Usually shared-nothing architecture

PARALLEL DATABASE & PARALLEL PROCESSING

WHY PARALLEL PROCESSING

Scanning 1 Terabyte at 10 MB/s takes about 1.2 days
With 1,000x parallelism, scanning 1 Terabyte takes about 1.5 minutes
Divide a big problem into many smaller ones to be solved in parallel
Increase bandwidth (in our case, decrease query response time)
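The scan-time figures can be checked with a little arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes) and a perfectly even split of the work over the workers. The function name is illustrative and not from the slides.

```python
def scan_seconds(data_bytes, bytes_per_second, workers=1):
    """Time to scan data_bytes when the work is split evenly over workers."""
    return data_bytes / (bytes_per_second * workers)

TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

serial = scan_seconds(TB, 10 * MB)
parallel = scan_seconds(TB, 10 * MB, workers=1000)

print(f"serial:   {serial / 86400:.2f} days")
print(f"parallel: {parallel / 60:.2f} minutes")
```

Under these assumptions the serial scan takes 100,000 seconds (a bit over a day) and the 1,000-way scan takes 100 seconds, which is in the same ballpark as the rounded figures on the slide.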

DIFFERENT ARCHITECTURES
Three possible architectures for passing information
Shared-memory
Shared-disk
Shared-nothing

1- SHARED-MEMORY ARCHITECTURE
Every processor has its own disk
Single memory address space for all processors
Reading from or writing to far memory can be slightly more expensive
Every processor can have its own local memory and cache as well

2- SHARED-DISK ARCHITECTURE
Every processor has its own memory (not accessible by others)
All machines can access all disks in the system
The number of disks does not necessarily match the number of processors

3- SHARED-NOTHING ARCHITECTURE
Most common architecture nowadays
Every machine has its own memory and disk
Many cheap machines (commodity hardware)
Communication is done through a high-speed network and switches
Machines are usually organized in a hierarchy
Machines on the same rack
Then racks are connected through high-speed switches

Scales better
Easier to build
Cheaper

TYPES OF PARALLELISM
Pipeline Parallelism (Inter-operator parallelism)
Tasks are ordered (or partially ordered), and different machines perform different tasks
[Figure: pipeline of sequential tasks, with an order between them]

Partitioned Parallelism (Intra-operator parallelism)
A task is divided over all machines to run in parallel
[Figure: a task partitioned into sequential sub-tasks that run side by side]
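The two kinds of parallelism can be contrasted in a small sketch. This is only an illustration under simplifying assumptions: the pipeline is modeled with Python generators (which show the dataflow but run in one thread), and the machines of the partitioned case are modeled with worker threads. All function names and the sample rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def scan(rows):
    # First pipeline stage: produce tuples one at a time.
    for row in rows:
        yield row

def select(rows, predicate):
    # Second stage: consumes tuples as soon as the scan produces them.
    for row in rows:
        if predicate(row):
            yield row

def project(rows, column):
    # Third stage: keeps a single column.
    for row in rows:
        yield row[column]

def pipeline_plan(rows):
    """Pipeline (inter-operator) style: different stages, chained in order."""
    return list(project(select(scan(rows), lambda r: r["qty"] > 1), "item"))

def partitioned_plan(partitions):
    """Partitioned (intra-operator) style: the same work on every partition."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = pool.map(pipeline_plan, partitions)
    return [item for part in results for item in part]

rows = [
    {"item": "a", "qty": 1},
    {"item": "b", "qty": 2},
    {"item": "c", "qty": 3},
    {"item": "d", "qty": 5},
]
partitions = [rows[0::2], rows[1::2]]  # two hypothetical machines

print(pipeline_plan(rows))
print(sorted(partitioned_plan(partitions)))
```

Both plans return the same set of items; only the order in which partitions deliver their results may differ.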

IDEAL SCALABILITY SCENARIO

Speed-Up
More resources means proportionally less time for a given amount of data.

Scale-Up
If resources are increased in proportion to the increase in data size, time is constant.

[Figure: two plots against degree of parallelism, each showing the ideal curve: throughput (Xact/sec.) for speed-up and response time (sec./Xact) for scale-up]
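Both notions can be written as simple ratios. The sketch below uses the usual definitions (speed-up as one-machine time over m-machine time, scale-up as small-problem time over the time of an m-times-larger problem on an m-times-larger system). The `ideal_time` cost model is an assumption made only to show the ideal case.

```python
def speed_up(time_one_machine, time_m_machines):
    """Same data, more machines: the ideal value equals the number of machines."""
    return time_one_machine / time_m_machines

def scale_up(time_small_system, time_large_system):
    """Data and machines grow together: the ideal value is 1.0."""
    return time_small_system / time_large_system

def ideal_time(data_size, machines, seconds_per_unit=1.0):
    # Assumed cost model: work divides perfectly and there is no overhead.
    return data_size * seconds_per_unit / machines

print(speed_up(ideal_time(1000, 1), ideal_time(1000, 10)))
print(scale_up(ideal_time(1000, 1), ideal_time(10000, 10)))
```

Real systems fall short of these ideal values because of startup, communication, and skew costs, which this model deliberately ignores.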

PARTITIONING OF DATA
To partition a relation R over m machines:
Range partitioning
Hash-based partitioning
Round-robin partitioning
[Figure: the three partitioning schemes, each drawn over five machines labeled A...E, F...J, K...N, O...S, T...Z]

Shared-nothing architecture is sensitive to partitioning
Good partitioning depends on which operations are common
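A minimal sketch of the three partitioning schemes follows. The function names, the boundary list, and the sample names are illustrative assumptions. `zlib.crc32` is used in place of Python's built-in `hash` only so that the hash placement is the same on every run.

```python
import bisect
import zlib

def round_robin_partition(rows, m):
    """Tuple i goes to machine i mod m."""
    parts = [[] for _ in range(m)]
    for i, row in enumerate(rows):
        parts[i % m].append(row)
    return parts

def hash_partition(rows, m, key):
    """Tuples with equal key values always land on the same machine."""
    parts = [[] for _ in range(m)]
    for row in rows:
        h = zlib.crc32(repr(key(row)).encode("utf-8"))
        parts[h % m].append(row)
    return parts

def range_partition(rows, boundaries, key):
    """boundaries: sorted inclusive upper bounds of all but the last range."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_left(boundaries, key(row))].append(row)
    return parts

names = ["Adams", "Evans", "Fox", "Jones", "King", "Owen", "Smith", "Turner", "Zhu"]
first_letter = lambda name: name[0].upper()

# Ranges A...E, F...J, K...N, O...S, T...Z as in the figure.
by_range = range_partition(names, ["E", "J", "N", "S"], first_letter)
by_hash = hash_partition(names, 5, first_letter)
by_round_robin = round_robin_partition(names, 5)

print(by_range)
```

Range partitioning keeps neighboring keys together, hashing keeps equal keys together, and round robin only balances sizes, which is why the best choice depends on the common operations.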

PARALLEL ALGORITHMS FOR DBMS OPERATIONS

PARALLEL SCAN σ_c(R)

Relation R is partitioned over m machines
Each partition of R has around |R|/m tuples
Each machine scans its own partition and applies the selection condition c

If data are partitioned using round robin or a hash function (over the entire tuple)
The resulting relation is expected to be well distributed over all nodes
All partitions will be scanned

If data are range partitioned or hash partitioned (on the selection column)
The resulting relation can be clustered on a few nodes
Only a few partitions need to be touched

Parallel projection is also straightforward
All partitions will be touched
Not sensitive to how data is partitioned
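The effect of partitioning on a parallel scan can be sketched as follows. The layout (three range partitions on an integer `id`), the dictionary structure, and the function names are hypothetical, and threads stand in for machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: R is range-partitioned on "id" over three machines.
partitions = [
    {"range": (0, 99), "rows": [{"id": 5}, {"id": 42}, {"id": 77}]},
    {"range": (100, 199), "rows": [{"id": 101}, {"id": 120}, {"id": 135},
                                   {"id": 150}, {"id": 180}]},
    {"range": (200, 299), "rows": [{"id": 210}, {"id": 250}]},
]

def overlaps(part_range, lo, hi):
    """True if the partition's key range intersects the query range [lo, hi]."""
    return part_range[0] <= hi and lo <= part_range[1]

def local_scan(rows, predicate):
    """What each machine does: scan its partition and apply condition c."""
    return [row for row in rows if predicate(row)]

def parallel_scan(partitions, predicate, key_range=None):
    """Scan only the partitions that can hold matches; return rows and count touched."""
    targets = [p for p in partitions
               if key_range is None or overlaps(p["range"], *key_range)]
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        results = list(pool.map(lambda p: local_scan(p["rows"], predicate), targets))
    return [row for part in results for row in part], len(targets)

condition = lambda row: 120 <= row["id"] <= 150

# Partitioning is on the selection column, so the range can prune partitions.
rows_pruned, touched_pruned = parallel_scan(partitions, condition, key_range=(120, 150))
# Without usable partitioning information, every partition is scanned.
rows_all, touched_all = parallel_scan(partitions, condition)

print(touched_pruned, touched_all)
```

Both calls return the same rows; the pruned call touches one partition instead of three.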

PARALLEL DUPLICATE ELIMINATION

If the relation is range or hash partitioned
Identical tuples are in the same partition
So, eliminate duplicates in each partition independently

If the relation is round-robin partitioned
Re-partition the relation using a hash function
Every machine creates m partitions and sends the ith partition to machine i
Machine i can now perform the duplicate elimination

The same idea applies to set operations (Union, Intersect, Except)
But apply the same partitioning to both relations R and S
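The round-robin case can be sketched as a hash re-partitioning followed by local duplicate elimination. The code simulates the m machines as lists in one process; the names `bucket`, `repartition`, and `parallel_distinct` are illustrative assumptions.

```python
import zlib

def bucket(row, m):
    """Deterministic hash over the entire tuple."""
    return zlib.crc32(repr(row).encode("utf-8")) % m

def repartition(local_rows, m):
    """Each machine splits its local tuples into m outgoing partitions."""
    outgoing = [[] for _ in range(m)]
    for row in local_rows:
        outgoing[bucket(row, m)].append(row)
    return outgoing

def parallel_distinct(machines):
    """machines: one list of tuples per machine (e.g. round-robin placed)."""
    m = len(machines)
    outgoing = [repartition(rows, m) for rows in machines]
    result = []
    for i in range(m):
        # Machine i receives the ith partition from every machine ...
        received = [row for sender in outgoing for row in sender[i]]
        # ... and can now eliminate duplicates locally.
        result.append(sorted(set(received)))
    return result

machines = [
    [(1, "a"), (2, "b")],
    [(2, "b"), (3, "c")],
    [(1, "a"), (3, "c"), (4, "d")],
]
distinct = parallel_distinct(machines)
flat = [row for part in distinct for row in part]
print(sorted(flat))
```

Because identical tuples hash to the same machine, no tuple can survive on two machines after the local step.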

PARALLEL JOIN R(X,Y) ⋈ S(Y,Z)

Re-partition R and S on the join attribute Y (natural join or equi-join)
Hash-based or range-based partitioning
Each machine i receives all ith partitions from all machines (from R and S)
Each machine can locally join the partitions it has
Depending on the partition sizes of R and S, local joins can be hash-based or merge-join
[Figure: hash partitioning: the original relations (R, then S) are read from disk through an input buffer, hash function h routes each tuple to one of B-1 output buffers (B main memory buffers in total), and the B-1 partitions are written to disk]
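The re-partition-then-join-locally idea can be sketched as below. This is a single-process simulation under stated assumptions: R tuples are (x, y) pairs, S tuples are (y, z) pairs, the local join is a simple in-memory hash join, and all function names are illustrative.

```python
import zlib
from collections import defaultdict

def bucket(value, m):
    """Deterministic hash of the join attribute value."""
    return zlib.crc32(repr(value).encode("utf-8")) % m

def repartition(fragments, m, key):
    """Send every tuple to the machine chosen by hashing its join attribute."""
    received = [[] for _ in range(m)]
    for local_rows in fragments:
        for row in local_rows:
            received[bucket(key(row), m)].append(row)
    return received

def local_hash_join(r_rows, s_rows):
    """Build a hash table on R's Y values, then probe it with S."""
    table = defaultdict(list)
    for x, y in r_rows:
        table[y].append(x)
    return [(x, y, z) for y, z in s_rows for x in table.get(y, [])]

def parallel_join(r_fragments, s_fragments):
    m = len(r_fragments)
    r_parts = repartition(r_fragments, m, key=lambda row: row[1])  # Y in R(X,Y)
    s_parts = repartition(s_fragments, m, key=lambda row: row[0])  # Y in S(Y,Z)
    # Machine i joins only the ith partitions of R and S.
    return [local_hash_join(r_parts[i], s_parts[i]) for i in range(m)]

r_fragments = [[(1, "a"), (2, "b")], [(3, "a"), (4, "c")]]
s_fragments = [[("a", 10)], [("b", 20), ("d", 30)]]

joined = sorted(row for part in parallel_join(r_fragments, s_fragments) for row in part)
print(joined)
```

Using the same hash function for both relations is what guarantees that matching R and S tuples meet on the same machine.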

PARALLEL SORTING
Range-based
Re-partition R based on ranges into m partitions
Machine i receives all ith partitions from all machines and sorts that partition
The entire R is now sorted
Skewed data is an issue
Apply a sampling phase first
Ranges can be of different widths

Merge-based
Each node sorts its own data
All nodes start sending their sorted data (one block at a time) to a single machine
This machine applies the merge-sort technique as data arrive
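Both sorting strategies can be sketched in a few lines. The code simulates the machines as lists; the sampling is done centrally for brevity (a simplifying assumption), and the function names and sample size are illustrative.

```python
import bisect
import heapq
import random

def choose_boundaries(fragments, m, sample_size=100, seed=0):
    """Sampling phase: pick m-1 split points at evenly spaced sample quantiles."""
    rng = random.Random(seed)
    values = [v for fragment in fragments for v in fragment]
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    return [sample[(i * len(sample)) // m] for i in range(1, m)]

def range_based_sort(fragments):
    """Re-partition by range, then let every machine sort what it received."""
    m = len(fragments)
    boundaries = choose_boundaries(fragments, m)
    received = [[] for _ in range(m)]
    for fragment in fragments:
        for v in fragment:
            received[bisect.bisect_right(boundaries, v)].append(v)
    # Reading machine 0, 1, ..., m-1 in order yields the fully sorted relation.
    return [sorted(part) for part in received]

def merge_based_sort(fragments):
    """Every node sorts locally; one machine merges the sorted streams."""
    return list(heapq.merge(*(sorted(fragment) for fragment in fragments)))

fragments = [[9, 1, 7, 3], [8, 2, 6, 4], [5, 0, 11, 10]]

by_range = range_based_sort(fragments)
by_merge = merge_based_sort(fragments)
print(by_range)
print(by_merge)
```

Taking boundaries from sample quantiles rather than fixed-width ranges is what keeps partitions balanced when the data are skewed. In the merge-based variant the single merging machine can become the bottleneck.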

COMPLEX PARALLEL QUERY PLANS

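All previous examples are intra-operator parallelism
Complex queries can also have inter-operator parallelism
Different machines perform different tasks
[Figure: query plan whose two inputs, A and B, are processed on sites 1-4 and sites 5-8, with the operator above them running on sites 1-8]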

PERFORMANCE OF PARALLEL ALGORITHMS
In many cases, parallel algorithms reach their expected lower bound (or come close to it)
If the degree of parallelism is m, then the parallel cost is 1/m of the sequential cost
Cost mostly refers to the query's response time

Example
Parallel selection or projection is 1/m of the sequential cost
[Figure: ideal curves plotted against degree of parallelism: throughput (Xact/sec.) and response time (sec./Xact)]

PERFORMANCE OF PARALLEL ALGORITHMS (CONT'D)
Total disk I/Os (sum over all machines) of parallel algorithms can be larger than that of their sequential counterparts
But we get the benefit of the work being done in parallel

Example
Merge-sort join (serial case) has I/O cost = 3(B(R) + B(S))
Merge-sort join (parallel case) has total (sum) I/O cost = 5(B(R) + B(S))
Taking the parallelism into account, the cost is 5(B(R) + B(S)) / m
B(R) and B(S) are the numbers of pages of relations R and S
OPTIMIZING PARALLEL ALGORITHMS

Best serial plan != the best parallel one
Trivial counter-example:
Table partitioned over two nodes, with a local secondary index at each node
Range query: selects all data of node 1 and 1% of node 2
Node 1 should do a scan of its partition
Node 2 should use the secondary index
[Figure: node 1 (A..M) performs a table scan; node 2 (N..Z) performs an index scan]

Different optimization algorithms are needed for parallel plans (more candidate plans)
Different machines may perform the same operation but using different plans
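The counter-example can be sketched as a per-node choice of access path. The 10% threshold below is a hypothetical rule of thumb, not a value from the slides, and a real optimizer would compare estimated costs rather than apply a fixed cutoff.

```python
def choose_access_path(selectivity, index_threshold=0.1):
    """Pick a local plan from the fraction of the node's tuples that qualify.

    index_threshold is an assumed cutoff used only for illustration.
    """
    return "index scan" if selectivity < index_threshold else "table scan"

# Fraction of each node's partition selected by the range query in the example.
local_selectivity = {
    "node 1 (A..M)": 1.00,  # all data of node 1
    "node 2 (N..Z)": 0.01,  # 1% of node 2
}

plan = {node: choose_access_path(s) for node, s in local_selectivity.items()}
for node, access_path in plan.items():
    print(node, "->", access_path)
```

The same logical operation ends up with a different physical plan on each node, which is one reason the space of candidate parallel plans is larger than in the serial case.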

SUMMARY OF PARALLEL DATABASES

Three possible architectures
Shared-memory
Shared-disk
Shared-nothing (the most common one)

Parallel algorithms
Intra-operator
Scans, projections, joins, sorting, set operators, etc.
Inter-operator
Distributing different operators in a complex query to different nodes

Partitioning and data layout are important and affect performance
Optimization of parallel algorithms is a challenge
