adbms-unit4

Parallel database systems utilize multiple CPUs to enhance performance by executing tasks concurrently across various nodes, improving I/O speeds and throughput. They offer advantages such as improved response time and greater flexibility, but also face challenges such as high start-up costs and interference. The architectures include shared-memory, shared-disk, and shared-nothing models, each with its own benefits and limitations; the unit also covers query parallelism techniques such as I/O parallelism and intra-query parallelism.

Parallel Database Systems

 Multiple CPUs work in parallel to improve performance through parallel
implementation of various operations such as loading data, building indexes and
evaluating queries.
 A large task is divided into many smaller tasks that are executed concurrently on
several nodes.
 I/O speeds improve because the CPUs and disks work in parallel.

Advantages of Parallel Databases

1. Increased Throughput
2. Improved Response Time
3. Useful for querying extremely large databases.
4. Substantial Performance improvement
5. Increased availability of System
6. Greater Flexibility
7. Possible to serve a large number of users.

Disadvantages of Parallel Processing

1. Higher start-up costs
2. Interference problem
3. Skew problem

ARCHITECTURE OF PARALLEL DATABASES.

The most prominent architectural models for parallel machines are

Shared-memory multiple CPU

Shared-Disk multiple CPU

Shared-nothing multiple CPU

1. Shared-Memory Multiple CPU

A computer has several active CPUs that are attached to an interconnection network
and share a single main memory and a common array of disk storage. A single copy of a
multithreaded OS and a multithreaded DBMS can support multiple CPUs. This structure
is used to achieve moderate parallelism.

[Figure: shared-memory architecture — multiple CPUs connected through an
interconnection network to a shared memory and a common array of disks]
Benefits

 Communication between CPUs is extremely efficient: a CPU can send messages to
the others using efficient memory writes.
 Communication overhead is low.

Limitations

 The design must take special precautions so that different CPUs have equal access
to the common memory.
 This architecture is not scalable beyond 80 or 100 CPUs in parallel; the
interconnection network becomes a bottleneck as the number of CPUs increases.
 Adding more CPUs may cause CPUs to spend time waiting for their turn to access
memory.
2. Shared-disk Multiple CPU Parallel Database Architecture

In this system, multiple CPUs are attached to an interconnection network;
each CPU has its own memory, but all of them have access to the same disk
storage. Scalability is determined by the capacity and throughput of the
interconnection network.
Each node has its own copy of the OS and DBMS. A global locking mechanism
is needed to ensure the preservation of data integrity. This architecture is
sometimes called a cluster.

[Figure: shared-disk architecture — each CPU with its own private memory, all
CPUs connected through an interconnection network to shared disks]

Benefits

 Easy to load-balance, since data does not have to be permanently divided among
CPUs.
 The memory bus is not a bottleneck.
 Provides a high degree of fault tolerance: if a CPU fails, the other CPUs take over
its tasks, since the database resides on disks that are accessible from all CPUs.
 Accepted in a wide range of applications.
Limitations
 Interference and memory contention become bottlenecks as the number of CPUs
increases.
 Scalability problems: the interconnection to the disk subsystem becomes a
bottleneck.

3. Shared-Nothing Multiple CPU Parallel Database Architecture


In this system, multiple CPUs are attached to an interconnection network,
and each CPU has its own local memory and disk storage; no two CPUs can
access the same disk storage area. Each CPU has its own copy of the OS, its
own copy of the DBMS, and its own portion of the data managed by that DBMS.
CPUs perform transactions and queries by dividing up the work and
communicating by messages over the high-speed network.

[Figure: shared-nothing architecture — each CPU with its own memory and disk,
connected only through an interconnection network]

Benefits

 Minimises contention among CPUs by not sharing resources, and therefore offers a
high degree of scalability.
 Only queries, accesses to non-local disks and result relations pass through the
network.
 Adding more CPUs and more disks enables the system to grow, providing
near-linear speed-up and near-linear scale-up, so it can easily support a large
number of CPUs.

Limitations

 Difficult to load-balance. Proper splitting of the workload across a shared-nothing
system requires an administrator to partition the data across the various disks so
that each CPU is kept roughly as busy as the others.
 Data may have to be redistributed to take advantage of new hardware, which
requires more extensive reorganization of the DBMS code.
 The cost of communication is high.
 High-speed networks are limited in size, which requires the CPUs of a parallel
architecture to be physically close together, e.g. connected by a LAN.
 This architecture introduces single points of failure: if a CPU goes down, the data
stored on its disks becomes inaccessible.
 It requires an OS capable of supporting heavy inter-process communication.

Applications:

1. Well suited for relatively cheap CPU technology.

2. It forms the basis for massive parallel processing.

KEY ELEMENTS OF PARALLEL DATABASE PROCESSING

1. Speed UP
2. Scale UP
3. Synchronisation
4. Locking
1. Speed UP

It is the property in which the time taken to perform a task decreases in
proportion to the number of CPUs and disks working in parallel, i.e. the
property of running a given task in less time by increasing the degree of
parallelism. Speed-up due to parallelism can be defined as

Speed up = To/Tp

To - execution time of the task on the smaller (original) machine.

Tp - execution time of the task on the parallel machine.

For example, if the original system takes 60 sec to perform a task and two parallel
systems take 30 sec to perform the same task, then the speed-up is 60/30 = 2. This
value indicates linear speed-up. A parallel system is said to demonstrate linear
speed-up if the speed-up is N when the larger system has N times the resources of
the smaller system. If the speed-up is less than N, the system is said to
demonstrate sub-linear speed-up.

[Figure: speed-up curve — speed plotted against resources; linear speed-up is a
straight line, sub-linear speed-up falls below it]

2. Scale UP
Scale-up is the property in which the performance of the parallel database is
sustained when the number of CPUs and disks is increased in proportion to the
amount of data.
Scale-up due to parallelism can be defined as

Scale up = Vp/Vo

where Vp = parallel (large) processing volume and Vo = original (small)
processing volume.

[Figure: scale-up curve — performance plotted against problem size (number of
CPUs, database size); linear scale-up is a straight line, sub-linear scale-up
falls below it]


The scale-up curve shows that adding more CPUs enables users to process
larger tasks.
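Both ratios can be expressed as one-line helpers; a minimal sketch (the 60 sec / 30 sec figures are taken from the speed-up example above):

```python
def speed_up(t_original, t_parallel):
    """Speed up = To / Tp: time on the smaller machine over time on the parallel machine."""
    return t_original / t_parallel

def scale_up(v_parallel, v_original):
    """Scale up = Vp / Vo: volume processed by the larger system over the original volume."""
    return v_parallel / v_original

# The example from the text: 60 sec on the original system, 30 sec on two parallel systems.
print(speed_up(60, 30))   # 2.0 -> linear speed-up for a system with twice the resources
```

A speed-up below N for N times the resources (e.g. `speed_up(60, 40)` with two CPUs) is the sub-linear case described above.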
3. Synchronization

It is the coordination of concurrent tasks. For successful parallel processing,
a task should be divided so that the synchronization requirement is minimal;
with less synchronization, better speed-up and scale-up can be achieved.

4. Locking
It is a method of synchronizing concurrent tasks. For external locking, a
distributed lock manager (DLM) is used, which is a part of the OS. The DLM
allows applications to synchronize access to resources such as data, s/w and
peripherals.

QUERY PARALLELISM

Parallelism is used to provide speed-up and scale-up, so that queries are
executed faster by adding more resources. The main challenge of parallel
databases is query parallelism. The main forms of query parallelism are:

 I/O Parallelism
 Intra Query Parallelism
 Inter Query Parallelism
 Intra-Operation Parallelism
 Inter-operation parallelism

I/O Parallelism (Data Partitioning)

I/O parallelism is the simplest form of parallelism, in which relations are
partitioned across multiple disks to reduce the time needed to retrieve them from
disk. The input data is partitioned and each partition is processed in parallel; the
results are combined after all partitions have been processed. This is also called
data partitioning.

4 types of partitioning techniques used are

 Hash Partitioning
 Range Partitioning
 Round-Robin Partitioning
 Schema Partitioning
HASH PARTITIONING

 In this technique a hash function whose range is {0, 1, …, n-1} is applied to the
partitioning attribute of each tuple. The output of the function determines the
disk on which the tuple is placed.
 For example, if the data is to be partitioned across n disks d0, d1, …, dn-1 and
the hash function returns the value 2, then the tuple is placed on disk d2.
 Advantages:
Even distribution of data across the available disks, helping to
prevent skew.
Best suited for point queries on the partitioning attribute.
Useful for sequential scans of the entire relation.
 Disadvantages
Not suited for point queries on non-partitioning attributes.
Not suited for answering range queries.
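A minimal sketch of hash partitioning (the tuples, attribute name and the use of Python's built-in `hash` are illustrative assumptions):

```python
def hash_partition(tuples, attr, n_disks):
    """Place each tuple on disk hash(partitioning attribute) mod n_disks
    (disks numbered 0 .. n_disks-1, as in the d0..dn-1 example above)."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[hash(t[attr]) % n_disks].append(t)
    return disks

# Hypothetical employee tuples, partitioned on empno across 3 disks.
employees = [{"empno": e} for e in range(10)]
disks = hash_partition(employees, "empno", 3)
# A point query on the partitioning attribute touches exactly one disk:
target_disk = disks[hash(7) % 3]   # only this disk can hold empno == 7
```

This is why point queries on the partitioning attribute are cheap, while a range query must still probe every disk.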

RANGE PARTITIONING

 Range partitioning distributes contiguous attribute-value ranges to each disk.


 For example, range partitioning with 3 disks numbered 0, 1, 2 might place tuples
with employee numbers up to 10000 on disk 0, tuples with empno between 10001 and
15000 on disk 1, and so forth.
 Advantages:
o Offers good performance for range-based queries and reasonable
performance for point queries.
 Disadvantages
o It causes skew in some cases; data may not be evenly distributed.
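The empno ranges above can be sketched with a sorted list of boundary values (the boundaries and tuples reuse the text's illustrative figures):

```python
import bisect

def range_partition(tuples, attr, boundaries):
    """boundaries is a sorted list of upper bounds; a tuple with value v goes to
    the first range that contains it. With boundaries [10000, 15000]:
    v <= 10000 -> disk 0, 10001..15000 -> disk 1, everything above -> disk 2."""
    disks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        disks[bisect.bisect_left(boundaries, t[attr])].append(t)
    return disks

emps = [{"empno": e} for e in (500, 10000, 10001, 15000, 20000)]
disks = range_partition(emps, "empno", [10000, 15000])
# A range query such as 12000 <= empno <= 14000 only needs to scan disk 1.
```

If most employee numbers fall into one range, that disk receives most of the tuples, which is exactly the skew problem noted above.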

ROUND ROBIN PARTITIONING

o The relation is scanned in any order and the i-th tuple is sent to disk number
i mod n.
o It ensures an even distribution of tuples across disks: each disk has
approximately the same number of tuples as the others.
Advantages
o Ideally suited for applications that wish to read the entire relation
sequentially for each query.

Disadvantages:

o Both point queries and range queries are complicated to process, since
each of the n disks must be used for search.
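Round-robin placement is one line of work per tuple; a minimal sketch:

```python
def round_robin_partition(tuples, n_disks):
    """Scan the relation in any order; the i-th tuple goes to disk i mod n_disks,
    so tuple counts per disk differ by at most one."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks

disks = round_robin_partition(list(range(10)), 3)
print([len(d) for d in disks])    # [4, 3, 3] -> even spread
```

Because placement ignores attribute values entirely, answering a point or range query requires searching all n disks, as the disadvantage above states.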

SCHEMA PARTITIONING

o Different relations within a database are placed on different disks.


o It is prone to data skewing.
2. Intra-Query Parallelism

It refers to the execution of a single query in parallel on multiple CPUs, typically
using a shared-nothing parallel architecture. It is also called parallel query
processing.

For example, suppose a relation has been partitioned across multiple disks by range
partitioning on some attribute, and a user wants to sort on the partitioning
attribute. The sort operation can be implemented by sorting each partition in
parallel and then concatenating the sorted partitions to get the final sorted relation.
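The sort example can be sketched with a thread pool standing in for the CPUs (an illustrative assumption; a real system would sort each disk-resident partition at its own node):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(range_partitions, key=None):
    """Sort each range partition in parallel, then concatenate the sorted
    partitions in range order. Because the input is range-partitioned on the
    sort attribute, the concatenation is already globally sorted."""
    with ThreadPoolExecutor() as pool:
        sorted_parts = list(pool.map(lambda p: sorted(p, key=key), range_partitions))
    return [t for part in sorted_parts for t in part]

# Two range partitions on the sort attribute: values 1-3 and values 4-6.
print(parallel_sort([[3, 1, 2], [6, 4, 5]]))    # [1, 2, 3, 4, 5, 6]
```

The concatenation step does no comparisons at all; the range partitioning guarantees every tuple in partition i sorts before every tuple in partition i+1.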

Advantages

o Speeds up long running queries.


o They are beneficial for decision support applications, read-only queries,
queries involving multiple joins.

[Figure: intra-query parallelism — a single query (Query 1) executed across CPU1,
CPU2 and CPU3 over the interconnection network]

3. Inter-Query Parallelism

o Multiple transactions are executed in parallel. It is also called parallel
transaction processing.
o The primary goal is to increase transaction throughput.
o Incoming requests are routed to the least busy processor, keeping the overall
workload balanced.
o Efficient lock management is another method used by the DBMS to support
parallelism.
o The DBMS must understand the locks held by different transactions in order to
ensure integrity.
o If memory is shared, the lock information is kept in the global memory.
o If disk is shared, then the lock information must be kept on the shared disk.
o Oracle 8 and Oracle Rdb are examples of shared-disk parallel database systems
that support inter-query parallelism.
o Adv: Increased transaction throughput.
o It is the easiest form of parallelism.
o It scales up a transaction processing system to support a larger number of
transactions per second.
o Disadv: Response times are no faster.
o It is more complicated in shared-disk and shared-nothing architectures.

[Figure: inter-query parallelism — Transaction1, Transaction2 and Transaction3
executed on CPU1, CPU2 and CPU3 over the interconnection network]

4.Intra-Operation Parallelism

Here, the execution of each individual operation of a task, such as sorting,
projection or join, is parallelized. It scales better with increasing parallelism.

Adv: It is natural in a database system.

The degree of parallelism is potentially enormous.

5.Inter-operation Parallelism

Here, the different operations in a query expression are executed in parallel. The
two types of inter-operation parallelism are

1. Pipelined Parallelism
2. Independent Parallelism

Pipelined Parallelism

In pipelined parallelism, the output tuples of one operation A are consumed by a
second operation B even before A has produced its entire output, so it is possible
to run both operations simultaneously.

Adv: Useful with a small number of CPUs.

Avoids writing intermediate results to disk.

Disadv:

Does not scale up well: pipeline chains often do not attain sufficient length, and
only marginal speed-up is obtained in the frequent cases.

Independent Parallelism

Operations in a query expression that do not depend on one another can be
executed in parallel.

Adv: Useful with a lower degree of parallelism.

Disadv: Less useful in highly parallel systems.


Distributed Databases (DB)
A distributed database system consists of loosely coupled sites that share no physical
component. The database systems that run at each site are independent of each other,
and transactions may access data at one or more sites.
Multiple CPUs are loosely coupled and geographically distributed over several sites
interconnected by telephone lines, optical fibre, satellite links, etc., with no sharing
of physical components.
A distributed database system (DDBS) is a database physically stored on several
systems connected together via a communication network.
Each site is a database system in its own right.

[Figure: four sites A, B, C and D connected through a communication network]

Difference between Parallel & Distributed Databases


Parallel                                      Distributed
Tightly coupled, in close vicinity            Loosely coupled, geographically dispersed
Local & global transactions are not           Local & global transactions are
differentiated                                differentiated
Desirable Properties of DB
1. Distributed Data Independence
2. Distributed Transaction Atomicity
1. DD Independence
It enables users to ask queries without specifying where the relations, or copies
or fragments of relations, are located.
2. DD Transaction Atomicity
It enables users to write transactions that access and update data at several sites
just as they would over purely local data. Moreover, all changes persist if the
transaction commits, and none persist if it aborts.
TYPES OF DD
 Homogeneous
 Heterogeneous
Homogeneous Distributed Databases
The simplest form of DD, where several sites each run their
own applications on the same DBMS s/w. All sites have identical DBMS s/w, all
applications see the same schema and run the same transactions, and local
transparency is provided.
Heterogeneous Distributed Databases
Different sites run under the control of different DBMSs and
are connected to enable access to data from multiple sites. Different sites use
different schemas and different DBMS s/w. Each server (site) is an independent and
autonomous centralized DBMS. This is also called a multidatabase system or a
federated database system. Here one server may be an RDBMS, another some other
DBMS, and a third an ORDBMS or centralized DBMS.
Functions of DD
 Ability to keep track of data, fragmentation and replication.
 Provide local autonomy.
 Should be location independent
 Distributed Catalogue management
 No reliance on central site
 Ability of replicated data management
 Ability to manage distributed query processing.
 Ability of distributed transaction management.
 Should have fragmentation independence.
 h/w independent
 OS independent
 Recovery Management
 N/w independent
 DBMS independent
 Proper management of security of data.
Advantages of DD
1. Sharing of data
2. Increased Efficiency
3. Efficient management of distributed data.
4. Structure of database mirrors the structure of the enterprise.
5. Increased Local autonomy.
6. Increased Accessibility
7. Increased Availability
8. Increased Reliability
9. Increased Performance
10. Improved Scalability
11. Easier Expansion
12. Parallel Evaluation.
Disadvantages
1. Recovery is complex.
2. Increased Complexity
3. Increased Transparency
4. Increased s/w cost
5. Great potential for bugs.
6. Increased processing overheads
7. Technical problem of connecting dissimilar machines.
8. Difficulty in maintaining integrity.
9. Security is difficult to maintain.
10. Lack of Standards.

Architecture of DD
1. Client-server
2. Collaborating Server
3. Middleware Systems
Client-Server Architecture
 The DBMS workload is split into two logical components: client and server.
 The client is the user of the resource, whereas the server is the provider of the
resource.
 Applications and tools are placed on one or more client platforms and are connected
to a DBMS that resides on the server.
 The DBMS in turn services these requests and returns the results to the clients.
 Most modern information systems are based on client/server architecture.

[Figure: client-server architecture — users with applications & tools connected
to a database server]

Components
 Clients in the form of intelligent workstations.
 DBMS Server
 Communication n/w
 S/w applications connecting clients, server and network.
Benefits
 Simple to implement
 Better Adaptability
 Use of GUI
 Less Expensive
 Optimally Utilized.
 Computing platform Independence
 Productivity improvement
 Improved Performance.
Limitations
 A single query cannot be spanned across multiple servers.
 The client process is quite complex.
 An increase in the number of users and processing sites often creates security
problems.
2. Collaborating Server Systems
There are several database servers, each capable of running
transactions against local data, which cooperatively execute transactions spanning
multiple servers.
When a server receives a query that requires access to other
servers, it generates the appropriate sub-queries and sends them to the other
servers. In this way collaboration among the various servers takes place.
3. Middleware Systems
Also called data-access middleware, it is designed to allow a
single query to span multiple servers without requiring all servers to be capable of
managing such execution strategies.
Middleware provides users with a consistent interface to multiple DBMSs and file
systems, and hides the heterogeneous environment from programmers.
Middleware is a layer of s/w which works with a special server and coordinates the
execution of queries and transactions. It is responsible for routing a local request
to one or more remote servers, translating the request from one SQL dialect to
another as needed, and converting data from one format to another.
This architecture consists of an API, a middleware engine, drivers and native
interfaces. The API usually consists of a series of function calls as well as a
series of data access statements.

DISTRIBUTED DATABASE DESIGN (DDBS)


Some of the strategies and objectives of distributed database design are:
 Replication
o System maintains multiple copies of data, stored in different sites, for faster
retrieval and fault tolerance.
 Fragmentation
o Relation is partitioned into several fragments stored in distinct sites
 Replication and fragmentation can be combined
o Relation is partitioned into several fragments: system maintains several
identical replicas of each such fragment.
 Replication Transparency
o All copies are updated when changes are made in one copy.
 Location Transparency
o The location of the data is hidden from the user.
 Configuration Independence
o Enables user to add/remove hardware without changing the existing s/w
components of the DBMS.
 Non-homogeneous DBMS support
o Integrating databases maintained by different DBMSs.
1. DATA REPLICATION
A relation or fragment of a relation is replicated if it is stored redundantly in two or
more sites.

Full replication of a relation is the case where the relation is stored at all sites.

Fully redundant databases are those in which every site contains a copy of the
entire database.

Advantages of Replication
a. Availability: failure of a site containing relation r does not result in
unavailability of r if replicas exist.
b. Parallelism: queries on r may be processed by several nodes in parallel.
c. Reduced data transfer: relation r is available locally at each site containing a
replica of r.
Disadvantages of Replication
d. Increased cost of updates: each replica of relation r must be updated.
e. Increased complexity of concurrency control: concurrent updates to distinct
replicas may lead to inconsistent data unless special concurrency control
mechanisms are implemented.
i. One solution: choose one copy as primary copy and apply
concurrency control operations on primary copy

2. DATA FRAGMENTATION
Division of relation r into fragments r1, r2, …, rn which contain sufficient information to
reconstruct relation r.

Horizontal fragmentation: each tuple of r is assigned to one or more fragments

Vertical fragmentation: the schema for relation r is split into several smaller schemas

All schemas must contain a common candidate key (or superkey) to ensure
lossless join property.

A special attribute, the tuple-id attribute may be added to each schema to serve
as a candidate key.
Horizontal Fragmentation of the account Relation

account1 = σ branch_name = "Hillside" (account)

branch_name    account_number    balance
Hillside       A-305             500
Hillside       A-226             336
Hillside       A-155             62

account2 = σ branch_name = "Valleyview" (account)

branch_name    account_number    balance
Valleyview     A-177             205
Valleyview     A-402             10000
Valleyview     A-408             1123
Valleyview     A-639             750

Vertical Fragmentation of the employee_info Relation

deposit1 = Π branch_name, customer_name, tuple_id (employee_info)

branch_name    customer_name    tuple_id
Hillside       Lowman           1
Hillside       Camp             2
Valleyview     Camp             3
Valleyview     Kahn             4
Hillside       Kahn             5
Valleyview     Kahn             6
Valleyview     Green            7

deposit2 = Π account_number, balance, tuple_id (employee_info)

account_number    balance    tuple_id
A-305             500        1
A-226             336        2
A-177             205        3
A-402             10000      4
A-155             62         5
A-408             1123       6
A-639             750        7

ADVANTAGES OF FRAGMENTATION
o Horizontal:
o allows parallel processing on fragments of a relation
o allows a relation to be split so that tuples are located where they are most
frequently accessed
o Vertical:
o allows tuples to be split so that each part of the tuple is stored where it is
most frequently accessed
o tuple-id attribute allows efficient joining of vertical fragments
o allows parallel processing on a relation
o Vertical and horizontal fragmentation can be mixed.
o Fragments may be successively fragmented to an arbitrary depth.
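Both kinds of fragmentation, and the lossless reconstruction the tuple-id enables, can be sketched on a tiny account-style relation (dictionary-based tuples are an illustrative assumption):

```python
def horizontal_fragment(relation, predicate):
    """Horizontal fragmentation: each fragment is a selection on the relation."""
    return [t for t in relation if predicate(t)]

def vertical_fragment(relation, attrs):
    """Vertical fragmentation: each fragment is a projection; tuple_id is kept
    in every fragment so the fragments can be joined back losslessly."""
    return [{a: t[a] for a in attrs} for t in relation]

account = [
    {"branch_name": "Hillside",   "account_number": "A-305", "balance": 500,   "tuple_id": 1},
    {"branch_name": "Valleyview", "account_number": "A-402", "balance": 10000, "tuple_id": 2},
]

# Horizontal: account1 holds the Hillside tuples.
account1 = horizontal_fragment(account, lambda t: t["branch_name"] == "Hillside")

# Vertical: split the schema, keeping tuple_id in both fragments.
frag1 = vertical_fragment(account, ["branch_name", "account_number", "tuple_id"])
frag2 = vertical_fragment(account, ["balance", "tuple_id"])

# Lossless reconstruction by joining the vertical fragments on tuple_id.
rebuilt = [{**a, **b} for a in frag1 for b in frag2 if a["tuple_id"] == b["tuple_id"]]
assert rebuilt == account
```

Without the shared tuple_id (a candidate key common to both schemas), the final join could pair unrelated tuples and the decomposition would be lossy.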

CONCURRENCY CONTROL IN DISTRIBUTED DATABASES


1. Single Lock-Manager Approach
The system maintains a single lock manager that resides at a single chosen site,
say Si.

When a transaction needs to lock a data item, it sends a lock request to Si, and the
lock manager determines whether the lock can be granted immediately.

If yes, the lock manager sends a message to the site that initiated the request.

If no, the request is delayed until it can be granted, at which time a message is
sent to the initiating site.

 The transaction can read the data item from any one of the sites at which a replica
of the data item resides.
 Writes must be performed on all replicas of a data item.
 Advantages of the scheme:
o Simple implementation
o Simple deadlock handling
 Disadvantages of the scheme:
o Bottleneck: the lock-manager site becomes a bottleneck.
o Vulnerability: the system is vulnerable to lock-manager site failure.

2. Distributed Lock manager


In this approach, the functionality of locking is implemented by lock managers at
each site. Lock managers control access to local data items, but special protocols
may be used for replicas.

Advantage: work is distributed and can be made robust to failures.

Disadvantage: deadlock detection is more complicated, since the lock managers must
cooperate for deadlock detection.

Several variants of this approach

a. Primary copy
b. Majority protocol
c. Biased protocol
d. Quorum consensus
Primary copy
 Choose one replica of data item to be the primary copy.
o Site containing the replica is called the primary site for that data item
o Different data items can have different primary sites
 When a transaction needs to lock a data item Q, it requests a lock at the primary site
of Q.
o Implicitly gets lock on all replicas of the data item
 Benefit
o Concurrency control for replicated data handled similarly to unreplicated
data - simple implementation.
 Drawback
o If the primary site of Q fails, Q is inaccessible even though other sites
containing a replica may be accessible.
Majority Protocol
 Local lock manager at each site administers lock and unlock requests for data items
stored at that site.
 When a transaction wishes to lock an unreplicated data item Q residing at site Si, a
message is sent to Si's lock manager.
o If Q is locked in an incompatible mode, then the request is delayed until it
can be granted.
o When the lock request can be granted, the lock manager sends a message
back to the initiator indicating that the lock request has been granted.
 When a transaction wishes to lock a replicated data item Q with n replicas, it sends
lock requests to more than half of the n sites at which replicas of Q are stored; the
lock is considered granted only when a majority of those sites grant it.

Biased Protocol

 Local lock manager at each site as in majority protocol, however, requests for
shared locks are handled differently than requests for exclusive locks.
 Shared locks. When a transaction needs to lock data item Q, it simply requests a lock
on Q from the lock manager at one site containing a replica of Q.
 Exclusive locks. When transaction needs to lock data item Q, it requests a lock on Q
from the lock manager at all sites containing a replica of Q.
 Advantage - imposes less overhead on read operations.
 Disadvantage - additional overhead on writes
3. Time stamping

Timestamp based concurrency-control protocols can be used in distributed systems.


Each transaction must be given a unique timestamp.

Main problem: how to generate a timestamp in a distributed fashion

 Each site generates a unique local timestamp using either a logical


counter or the local clock.
 Global unique timestamp is obtained by concatenating the unique
local timestamp with the unique identifier.

DEADLOCK HANDLING IN DDBMs

 A global wait-for graph is constructed and maintained at a single site by the
deadlock-detection coordinator.

o Real graph: the real, but unknown, state of the system.

o Constructed graph: an approximation generated by the coordinator during the
execution of its algorithm.

 The global wait-for graph can be (re)constructed when:

o a new edge is inserted in or removed from one of the local wait-for graphs;

o a number of changes have occurred in a local wait-for graph;

o the coordinator needs to invoke cycle detection.

 If the coordinator finds a cycle, it selects a victim and notifies all sites. The
sites roll back the victim transaction.
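The coordinator's cycle check can be sketched as a depth-first search over the merged global graph (the graph representation and function name are illustrative):

```python
def find_deadlock_victim(wait_for):
    """DFS over the global wait-for graph; wait_for maps each transaction to the
    set of transactions it is waiting for. Returns a transaction on a cycle
    (a candidate victim) or None if the graph is acyclic."""
    colour = {}                        # absent -> unvisited, 1 -> on stack, 2 -> done

    def visit(t):
        colour[t] = 1
        for u in wait_for.get(t, ()):
            if colour.get(u) == 1:     # back edge: u is on the current path -> cycle
                return u
            if colour.get(u) is None:
                victim = visit(u)
                if victim is not None:
                    return victim
        colour[t] = 2
        return None

    for t in list(wait_for):
        if colour.get(t) is None:
            victim = visit(t)
            if victim is not None:
                return victim
    return None

# Local graphs: site 1 sees T1 -> T2, site 2 sees T2 -> T1.
# Neither local graph has a cycle; only the merged global graph reveals the deadlock.
merged = {"T1": {"T2"}, "T2": {"T1"}}
print(find_deadlock_victim(merged))    # a transaction on the cycle
```

This also illustrates why the coordinator is needed: each local wait-for graph can be acyclic while the global graph contains a cycle.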
[Figure: local wait-for graphs at each site and the global wait-for graph formed
by combining them]

RECOVERY CONTROL IN DDBMs

 Transaction may access data at several sites.

 Each site has a local transaction manager responsible for:

o Maintaining a log for recovery purposes

o Participating in coordinating the concurrent execution of the transactions


executing at that site.

 Each site has a transaction coordinator, which is responsible for:

o Starting the execution of transactions that originate at the site.

o Distributing subtransactions at appropriate sites for execution.

o Coordinating the termination of each transaction that originates at the site,


which may result in the transaction being committed at all sites or aborted
at all sites.
COMMIT PROTOCOLS

 Commit protocols are used to ensure atomicity across sites

o a transaction which executes at multiple sites must either be committed at


all the sites, or aborted at all the sites.

o not acceptable to have a transaction committed at one site and aborted at


another

 The two-phase commit (2PC) protocol is widely used

 The three-phase commit (3PC) protocol is more complicated and more expensive,
but avoids some drawbacks of two-phase commit protocol. This protocol is not
used in practice.

TWO PHASE COMMIT (2PC)

 Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be
Ci.

Phase-1 (Obtaining the Decision)

 The coordinator asks all participants to prepare to commit transaction T.

o Ci adds the record <prepare T> to the log and forces the log to stable storage

o sends prepare T messages to all sites at which T executed

 Upon receiving the message, the transaction manager at the site determines whether
it can commit the transaction

o if not, add a record <no T> to the log and send abort T message to Ci
o if the transaction can be committed, then:

o add the record <ready T> to the log

o force all records for T to stable storage

o send ready T message to Ci

Phase-2 (Recording the Decision)

 T can be committed if Ci received a ready T message from all the participating
sites; otherwise T must be aborted.

 The coordinator adds a decision record, <commit T> or <abort T>, to the log and
forces the record onto stable storage. Once the record reaches stable storage, the
decision is irrevocable (even if failures occur).

 Coordinator sends a message to each participant informing it of the decision


(commit or abort)

 Participants take appropriate action locally.
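The two phases can be sketched as a small simulation (vote functions stand in for the participants' transaction managers; the log strings mirror the records above):

```python
def two_phase_commit(participants, coordinator_log):
    """participants maps a site name to a vote function returning True (ready T)
    or False (no T). Phase 1 collects votes; phase 2 logs and returns the decision."""
    # Phase 1: force <prepare T> to the log, then ask every site to prepare.
    coordinator_log.append("<prepare T>")
    votes = {site: vote() for site, vote in participants.items()}

    # Phase 2: commit only if every participant answered ready T.
    decision = "commit" if all(votes.values()) else "abort"
    coordinator_log.append(f"<{decision} T>")   # once on stable storage, irrevocable
    return decision                             # sent to each participant

log = []
print(two_phase_commit({"S1": lambda: True, "S2": lambda: True}, log))   # commit
print(two_phase_commit({"S1": lambda: True, "S2": lambda: False}, log))  # abort
```

A single no vote (or a missing reply, treated as no) is enough to abort, which is what guarantees the all-or-nothing outcome across sites.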

HANDLING OF FAILURES

Site Failure

 When site Sk recovers, it examines its log to determine the fate of transactions
active at the time of the failure.

 Log contains a <commit T> record: the site executes redo(T).

 Log contains an <abort T> record: the site executes undo(T).

 Log contains a <ready T> record: the site must consult Ci to determine the fate of T.

o If T committed, redo(T).

o If T aborted, undo(T).

 The log contains no control records concerning T: this implies that Sk failed
before responding to the prepare T message from Ci.

o Since the failure of Sk precludes the sending of such a response, Ci must
abort T.

o Sk must execute undo(T).


Coordinator Failure

 If coordinator fails while the commit protocol for T is executing then participating
sites must decide on T’s fate:

o If an active site contains a <commit T> record in its log, then T must be
committed.

o If an active site contains an <abort T> record in its log, then T must be
aborted.

o If some active participating site does not contain a <ready T> record in its
log, then the failed coordinator Ci cannot have decided to commit T. Can
therefore abort T.

o If none of the above cases holds, then all active sites must have a <ready T>
record in their logs, but no additional control records (such as <abort T> or
<commit T>). In this case the active sites must wait for Ci to recover to learn
the decision.

 Blocking problem: active sites may have to wait for failed coordinator to recover.

Three Phase Commit 3PC

 It is an extension of 2PC that avoids the blocking limitation of 2PC.
 It assumes that no network partition occurs and that no more than a predetermined
number of sites fail.
 It introduces an extra third phase, in which multiple sites are involved in the
decision to commit. In 3PC, the coordinator effectively postpones the decision to
commit until it is sure that a predetermined number of sites know that it intends
to commit the transaction.
 If the coordinator fails, the remaining sites first select a new coordinator. The
new coordinator checks the status of the protocol; a commit decision made by the
old coordinator is respected, otherwise the new coordinator aborts the transaction.
 3PC introduces a 3rd phase called precommit phase between voting & decision
making.
Adv: Does not block the sites.
Disadv: Overhead and cost.
Distributed Query Processing

 For centralized systems, the primary criterion for measuring the cost of a
particular strategy is the number of disk accesses. In a distributed system, other
issues must be taken into account:

o The cost of a data transmission over the network.

o The potential gain in performance from having several sites process parts of
the query in parallel.

Query Transformation

 Translating algebraic queries on fragments.

o It must be possible to construct relation r from its fragments

o Replace relation r by the expression to construct relation r from its


fragments.

Example:

Consider the horizontal fragmentation of the account relation into

account1 = σ branch_name = "Hillside" (account)

account2 = σ branch_name = "Valleyview" (account)

The query σ branch_name = "Hillside" (account) becomes

σ branch_name = "Hillside" (account1 ∪ account2)

which is optimized into

σ branch_name = "Hillside" (account1) ∪ σ branch_name = "Hillside" (account2)
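The equivalence the optimizer relies on — a selection over a union of fragments equals the union of selections pushed into each fragment — can be checked on toy data (the tuple values are illustrative):

```python
def select(predicate, relation):
    """Relational selection: sigma_predicate(relation)."""
    return [t for t in relation if predicate(t)]

hillside = lambda t: t["branch_name"] == "Hillside"
account1 = [{"branch_name": "Hillside",   "account_number": "A-305"}]   # Hillside fragment
account2 = [{"branch_name": "Valleyview", "account_number": "A-177"}]   # Valleyview fragment

# Unoptimized: reconstruct account from its fragments, then select.
unoptimized = select(hillside, account1 + account2)
# Optimized: push the selection into each fragment; the selection on account2 is
# empty by the fragmentation predicate, so account2 need not be shipped at all.
optimized = select(hillside, account1) + select(hillside, account2)
assert unoptimized == optimized
```

In a distributed setting the optimized form avoids transferring the Valleyview fragment entirely, which is exactly the network-cost saving the section is after.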

1. Simple Join Processing

 Consider the following relational algebra expression, in which the three relations
are neither replicated nor fragmented:

o account ⋈ depositor ⋈ branch

 account is stored at site S1, depositor at S2, and branch at S3.

 For a query issued at site SI, the system needs to produce the result at site SI.
2. Semijoin Strategy

Let r1 be a relation with schema R1 stored at site S1, and let r2 be a relation with
schema R2 stored at site S2. To evaluate the expression r1 ⋈ r2 and obtain the
result at S1:

o Compute temp1 ← Π R1 ∩ R2 (r1) at S1.
o Ship temp1 from S1 to S2.
o Compute temp2 ← r2 ⋈ temp1 at S2.
o Ship temp2 from S2 to S1.
o Compute r1 ⋈ temp2 at S1. This is the same as r1 ⋈ r2.

The semijoin of r1 with r2 is denoted r1 ⋉ r2.

Thus, r1 ⋉ r2 selects those tuples of r1 that contribute to r1 ⋈ r2.

For joins of several relations, the above strategy can be extended to a series of
semijoin steps.
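The five steps can be traced on toy relations (attribute names and data are illustrative; dictionaries model tuples):

```python
def project(relation, attrs):
    """Duplicate-eliminating projection Pi_attrs(relation)."""
    seen, out = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: t[a] for a in attrs})
    return out

def natural_join(r, s):
    """Natural join on the attributes common to both schemas."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

r1 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]   # at site S1, schema R1 = {a, b}
r2 = [{"a": 1, "c": "p"}, {"a": 3, "c": "q"}]   # at site S2, schema R2 = {a, c}

temp1 = project(r1, ["a"])         # Pi_{R1 ∩ R2}(r1) at S1, shipped to S2
temp2 = natural_join(r2, temp1)    # r2 reduced by temp1 at S2, shipped back to S1
result = natural_join(r1, temp2)   # final join at S1
assert result == natural_join(r1, r2)   # same answer, less data shipped
```

Only the join column of r1 travels to S2, and only the matching r2 tuples travel back, which is the transmission saving the strategy is designed for.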
