Distributed DBMS Reliability Unit IV
RELIABILITY
By Purvi Gautam
Distributed DBMS Reliability
A reliable distributed database management system
is one that can continue to process user requests
even when the underlying system is unreliable.
Even when components of the distributed
computing environment fail, a reliable distributed
DBMS should be able to continue executing user
requests without violating database consistency.
Two specific aspects of reliability protocols that
need to be discussed in relation to these properties
are the commit and the recovery protocols.
Reliability Concepts and Measures
System, State, and Failure: Reliability refers to a
system that consists of a set of components.
The system has a state, which changes as the system
operates.
The behavior of the system in providing response to
all the possible external stimuli is laid out in an
authoritative specification of its behavior.
Any deviation of a system from the behavior
described in the specification is considered a failure.
We differentiate between errors (or faults and failures)
that are permanent and those that are not permanent.
Permanence can apply to a failure, a fault, or an error.
Permanent faults cause permanent errors, which result in
permanent failures.
The characteristic of these failures is that recovery from them
requires intervention to “repair” the fault.
Systems also experience intermittent and transient faults. These
two are typically not differentiated; they are jointly called soft
faults.
An intermittent fault refers to a fault that demonstrates itself
occasionally.
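The most common communication failures are errors in the messages and
improperly ordered messages.
If a communication line fails, in addition to losing the messages in
transit, it may also divide the network into two or more disjoint
groups.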
This is called network partitioning.
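One way to picture network partitioning is as a connected-components computation over the links that are still working. The sketch below is only an illustration: the site names, the link list, and the function name `partitions` are assumptions, not part of any particular DBMS.

```python
from collections import deque


def partitions(sites, links):
    """Group sites into sets that can still reach each other over working links.

    sites: iterable of site names (hypothetical).
    links: iterable of (site, site) pairs for links that are currently up.
    """
    sites = list(sites)
    neighbours = {site: set() for site in sites}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sites:
        if start in seen:
            continue
        # Breadth-first search collects every site reachable from `start`.
        group = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            site = queue.popleft()
            group.add(site)
            for nxt in neighbours[site]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


sites = ["A", "B", "C", "D"]
all_links = [("A", "B"), ("B", "C"), ("C", "D")]

# Simulate the failure of the B-C communication line.
working_links = [link for link in all_links if link != ("B", "C")]

print(len(partitions(sites, all_links)))      # one group: no partition
print(len(partitions(sites, working_links)))  # more than one group: partitioned
```

More than one group in the result means the network is partitioned: sites in different groups cannot exchange messages until the failed line is repaired.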
Local Reliability Protocols
This part covers the functions performed by the local recovery
manager (LRM) that exists at each site.
These functions maintain the atomicity and durability
properties of local transactions.
Architectural Considerations
Recovery Information
In-Place Update Recovery Information
Out-of-Place Update Recovery Information
Execution of LRM Commands
Begin transaction, Read, and Write Commands
Distributed Reliability Protocols
Two-Phase Commit Protocol: Two-phase commit (2PC) is a very
simple and elegant protocol that ensures the atomic commitment of
distributed transactions.
It extends the effects of local atomic commit actions to distributed
transactions by insisting that all sites involved in the execution of a
distributed transaction agree to commit the transaction before its
effects are made permanent.
Two rules govern this decision, which, together, are called the global
commit rule:
If even one participant votes to abort the transaction, the coordinator
has to reach a global abort decision.
If all the participants vote to commit the transaction, the coordinator
has to reach a global commit decision.
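The global commit rule can be sketched in a few lines. This is a minimal in-memory illustration, not a full 2PC implementation: it ignores logging, timeouts, and site failures, and the names `Vote`, `Decision`, `Participant`, `global_decision`, and `two_phase_commit` are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "vote-commit"
    ABORT = "vote-abort"


class Decision(Enum):
    GLOBAL_COMMIT = "global-commit"
    GLOBAL_ABORT = "global-abort"


def global_decision(votes):
    """Apply the global commit rule to the participants' votes."""
    votes = list(votes)
    if not votes:
        raise ValueError("at least one participant vote is required")
    # Rule 1: a single abort vote forces a global abort.
    if any(vote is Vote.ABORT for vote in votes):
        return Decision.GLOBAL_ABORT
    # Rule 2: all participants voted to commit.
    return Decision.GLOBAL_COMMIT


class Participant:
    """Hypothetical participant whose vote is fixed in advance."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.outcome = None

    def prepare(self):
        return Vote.COMMIT if self.can_commit else Vote.ABORT

    def finish(self, decision):
        self.outcome = decision


def two_phase_commit(participants):
    # Phase 1: the coordinator collects one vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = global_decision(votes)
    # Phase 2: the coordinator sends the same decision to every participant.
    for p in participants:
        p.finish(decision)
    return decision


group = [Participant("site1", True), Participant("site2", False)]
print(two_phase_commit(group).value)
```

Because every participant receives the same decision, no site makes the transaction's effects permanent unless all sites agreed to commit.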
Variations of 2PC
Two variations of 2PC have been proposed to improve its
performance.
This is accomplished by reducing (1) the number of messages
that are transmitted between the coordinator and the
participants, and (2) the number of times logs are written.
These protocols are called presumed abort and presumed
commit.
Presumed abort is a protocol that is optimized to handle read-
only transactions as well as those update transactions, some of
whose processes do not perform any updates to the database.
The presumed commit protocol is optimized to handle the
general update transactions.
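The names of the two variants hint at how they save messages and log writes. As these protocols are commonly described, the coordinator may forget some transactions early and answer later inquiries about them with a presumed outcome. The sketch below illustrates only that idea, under the assumption that the coordinator's log is a plain dictionary; `answer_inquiry` and the transaction identifiers are hypothetical.

```python
def answer_inquiry(coordinator_log, txn_id, presumption):
    """Return the outcome reported to a participant that asks about txn_id.

    coordinator_log: dict mapping transaction id to "commit" or "abort" for
        the transactions the coordinator still remembers (an assumption made
        for this sketch).
    presumption: "abort" under presumed abort, "commit" under presumed commit.
    """
    if presumption not in ("abort", "commit"):
        raise ValueError("presumption must be 'abort' or 'commit'")
    # A remembered transaction gets its recorded outcome; a forgotten one
    # gets the presumed outcome of the protocol variant in use.
    return coordinator_log.get(txn_id, presumption)


# Presumed abort: the record of aborted transaction T2 was discarded.
pa_log = {"T1": "commit"}
print(answer_inquiry(pa_log, "T1", "abort"))
print(answer_inquiry(pa_log, "T2", "abort"))

# Presumed commit: the record of committed transaction T3 was discarded.
pc_log = {"T4": "abort"}
print(answer_inquiry(pc_log, "T3", "commit"))
```

The saving comes from the records that no longer have to be kept or acknowledged for the presumed outcome.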
Parallel Database System Architectures