DISTRIBUTED DBMS RELIABILITY (Unit IV)

By Purvi Gautam
Distributed DBMS Reliability
 A reliable distributed database management system
is one that can continue to process user requests
even when the underlying system is unreliable.
 Even when components of the distributed
computing environment fail, a reliable distributed
DBMS should be able to continue executing user
requests without violating database consistency.
 Two specific aspects of reliability protocols that
need to be discussed in relation to these properties
are the commit and the recovery protocols.
Reliability Concepts and Measures
 System, State, and Failure: Reliability is discussed with reference to a system that consists of a set of components.
 The system has a state, which changes as the system operates.
 The behavior of the system in providing responses to all the possible external stimuli is laid out in an authoritative specification of its behavior.
 Any deviation of a system from the behavior
described in the specification is considered a failure.
Contd.
 We differentiate between errors (or faults and failures) that are permanent and those that are not permanent.
 Permanence can apply to a failure, a fault, or an error, although we typically use the term with respect to faults.
 A permanent fault, also commonly called a hard fault, is one that reflects an irreversible change in the behavior of the system.
 Permanent faults cause permanent errors that result in permanent failures.
Contd..
 The characteristic of these failures is that recovery from them requires intervention to “repair” the fault.
 Systems also experience intermittent and transient faults. These two are typically not differentiated; they are jointly called soft faults.
 An intermittent fault refers to a fault that demonstrates itself occasionally due to unstable hardware or varying hardware or software states.
 A transient fault describes a fault that results from temporary environmental conditions. A transient fault might occur, for example, due to a sudden increase in the room temperature.
Failures in Distributed DBMS
 Designing a reliable system that can recover from
failures requires identifying the types of failures
with which the system has to deal.
 In a distributed database system, we need to deal
with four types of failures: transaction failures
(aborts), site (system) failures, media (disk)
failures, and communication line failures.
 Software failures are typically caused by “bugs” in
the code.
Transaction Failures
 A transaction failure can be due to an error in the transaction caused by incorrect input data, as well as the detection of a present or potential deadlock.
 The usual approach to take in cases of transaction failure is to abort the transaction, thus resetting the database to its state prior to the start of this transaction.
Site (System) Failures
 The reasons for system failure can be traced back to a hardware
or to a software failure.
 The important point from the perspective of this discussion is
that a system failure is always assumed to result in the loss of
main memory contents.
 Any part of the database that was in main memory buffers is lost
as a result of a system failure.
 The database that is stored in secondary storage is assumed to be
safe and correct.
 In distributed database terminology, system failures are typically
referred to as site failures, since they result in the failed site
being unreachable from other sites in the distributed system.
Media Failures
 Media failure refers to the failures of the secondary
storage devices that store the database.
 Such failures may be due to operating system
errors, as well as to hardware faults such as head
crashes or controller failures.
 Media failures are frequently treated as problems
local to one site and therefore not specifically
addressed in the reliability mechanisms of
distributed DBMSs.
Communication Failures
 Communication failures, however, are unique to the distributed case.
 There are a number of types of communication failures. The most common ones are errors in the messages, improperly ordered messages, lost (or undeliverable) messages, and communication line failures.
 If a communication line fails, in addition to losing the message(s) in transit, it may also divide the network into two or more disjoint groups.
 This is called network partitioning.
 Network partitions point to a unique aspect of failures in distributed computer systems.
Local Reliability Protocols
 Local reliability protocols concern the functions performed by the local recovery manager (LRM) that exists at each site.
 These functions maintain the atomicity and durability
properties of local transactions.
 Architectural Considerations
 Recovery Information
 In-Place Update Recovery Information
 Out-of-Place Update Recovery Information
 Execution of LRM Commands
 Begin transaction, Read, and Write Commands
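As a rough illustration of these functions, the sketch below shows a toy local recovery manager that handles the begin transaction, read, and write commands with an in-place update scheme protected by a write-ahead log. The class, its data structures, and the log record format are illustrative assumptions, not the design of any particular DBMS.

class LocalRecoveryManager:
    """Toy LRM: in-place updates on a buffer, protected by a write-ahead log."""

    def __init__(self):
        self.log = []          # append-only log records (on stable storage in practice)
        self.buffer = {}       # volatile database buffer
        self.stable_db = {}    # stand-in for the database on secondary storage

    def begin_transaction(self, tid):
        self.log.append(("BEGIN", tid))

    def read(self, tid, key):
        # Serve reads from the buffer, fetching from stable storage on a miss.
        if key not in self.buffer:
            self.buffer[key] = self.stable_db.get(key)
        return self.buffer[key]

    def write(self, tid, key, new_value):
        # Write-ahead logging: record before/after images before updating the
        # buffered copy, so the change can be undone or redone after a failure.
        old_value = self.read(tid, key)
        self.log.append(("UPDATE", tid, key, old_value, new_value))
        self.buffer[key] = new_value

    def commit(self, tid):
        self.log.append(("COMMIT", tid))
        # Whether buffered pages are flushed now or later depends on the
        # force/no-force buffer management policy.

    def abort(self, tid):
        # Undo the transaction by restoring before-images in reverse order.
        for rec in reversed(self.log):
            if rec[0] == "UPDATE" and rec[1] == tid:
                self.buffer[rec[2]] = rec[3]
        self.log.append(("ABORT", tid))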
Distributed Reliability Protocols
 Two-Phase Commit Protocol: Two-phase commit (2PC) is a very simple and elegant protocol that ensures the atomic commitment of distributed transactions.
 It extends the effects of local atomic commit actions to distributed
transactions by insisting that all sites involved in the execution of a
distributed transaction agree to commit the transaction before its
effects are made permanent.
 Two rules govern this decision, which, together, are called the global
commit rule:
 If even one participant votes to abort the transaction, the coordinator
has to reach a global abort decision.
 If all the participants vote to commit the transaction, the coordinator has to reach a global commit decision.
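The voting and decision phases implied by the global commit rule can be sketched as below. The Participant interface (prepare/commit/abort), the vote constants, and the coordinator log are assumptions made for illustration; a real 2PC implementation also needs forced log writes, timeouts, and termination/recovery protocols.

VOTE_COMMIT = "VOTE-COMMIT"
VOTE_ABORT = "VOTE-ABORT"

class Participant:
    """Stub participant; a real one would force-write its own log records."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
    def prepare(self, tid):
        return VOTE_COMMIT if self.can_commit else VOTE_ABORT
    def commit(self, tid):
        print(f"{self.name}: commit {tid}")
    def abort(self, tid):
        print(f"{self.name}: abort {tid}")

def two_phase_commit(coordinator_log, participants, tid):
    # Phase 1 (voting): ask every participant whether it can commit.
    coordinator_log.append(("PREPARE", tid))
    votes = [p.prepare(tid) for p in participants]

    # Global commit rule: one VOTE-ABORT forces a global abort;
    # unanimous VOTE-COMMIT allows a global commit.
    decision = "GLOBAL-COMMIT" if all(v == VOTE_COMMIT for v in votes) else "GLOBAL-ABORT"

    # Phase 2 (decision): record the decision, then inform every participant.
    coordinator_log.append((decision, tid))
    for p in participants:
        p.commit(tid) if decision == "GLOBAL-COMMIT" else p.abort(tid)
    return decision

log = []
print(two_phase_commit(log, [Participant("site1"), Participant("site2", can_commit=False)], "T1"))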
Variations of 2PC
 Two variations of 2PC have been proposed to improve its
performance.
 This is accomplished by reducing (1) the number of messages
that are transmitted between the coordinator and the
participants, and (2) the number of times logs are written.
 These protocols are called presumed abort and presumed
commit.
 Presumed abort is a protocol that is optimized to handle read-
only transactions as well as those update transactions, some of
whose processes do not perform any updates to the database.
 The presumed commit protocol is optimized to handle the
general update transactions.
Parallel Database System Architectures

 The objectives of parallel database systems are covered by those of distributed DBMSs (performance, availability, extensibility).
 Parallel database systems combine database management and parallel processing to increase performance and availability.
 A parallel database system should provide the following advantages.
Contd..
 High performance: this can be obtained through several complementary solutions: database-oriented operating system support, parallel data management, query optimization, and load balancing.
 Parallelism can increase throughput, using inter-query parallelism, and decrease transaction response times, using intra-query parallelism.
 High availability: because a parallel database system consists of many redundant components, it can readily increase data availability and fault-tolerance.
 In a highly parallel system with many nodes, the probability of a node failure at any time can be relatively high.
Contd..
 Extensibility: In a parallel system, accommodating
increasing database sizes or increasing performance
demands (e.g., throughput) should be easier.
 Ideally, the parallel database system should demonstrate two extensibility advantages: linear speedup and linear scaleup.
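As a hedged aside, the standard textbook formulations of these two metrics can be written as small helper functions; the function names and the interpretation (linear speedup approaches n, linear scaleup stays near 1) are illustrative.

def speedup(elapsed_one_node, elapsed_n_nodes):
    # Same problem, n times the resources: linear speedup means this ratio
    # grows roughly in proportion to n.
    return elapsed_one_node / elapsed_n_nodes

def scaleup(elapsed_small_on_one_node, elapsed_n_times_larger_on_n_nodes):
    # Problem size and resources both grow n times: linear scaleup means this
    # ratio stays close to 1 (constant elapsed time).
    return elapsed_small_on_one_node / elapsed_n_times_larger_on_n_nodes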
Parallel DBMS Architectures
 Shared-Memory: In this approach any processor
has access to any memory module or disk unit
through a fast interconnect.
 All the processors are under the control of a single
operating system.
 Shared-memory has two strong advantages:
simplicity and load balancing.
 Shared-memory has three problems: high cost,
limited extensibility and low availability.
Contd..
 Shared-Disk: In this approach any processor has
access to any disk unit through the interconnect but
exclusive (non-shared) access to its main memory.
 Shared-disk has a number of advantages: lower cost,
high extensibility, load balancing, availability, and
easy migration from centralized systems.
 Shared-disk suffers from higher complexity and
potential performance problems.
 It requires distributed database system protocols,
such as distributed locking and two-phase commit.
Contd.
 Shared-Nothing: In the shared-nothing approach each processor has exclusive access to its main memory and disk unit(s). Similar to shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system.
 Shared-nothing has three main virtues: lower cost, high extensibility, and high availability. The cost advantage is better than that of shared-disk, which requires a special interconnect for the disks.
 Shared-nothing is much more complex to manage than either shared-memory or shared-disk.
 Hybrid Architectures: Various combinations of the three basic architectures are possible to obtain different trade-offs between cost, performance, extensibility, availability, etc.
Contd..
 Hybrid architectures try to obtain the advantages of different
architectures: typically the efficiency and simplicity of shared-
memory and the extensibility and cost of either shared disk or
shared nothing.
 A cluster is a set of independent server nodes interconnected to
share resources and form a single system. The shared resources,
called clustered resources, can be hardware such as disk or
software such as data management services.
 A cluster architecture has important advantages. It combines the
flexibility and performance of shared-memory at each node with
the extensibility and availability of shared-nothing or shared-
disk.
Parallel Data Placement
 Data placement in a parallel database system exhibits similarities with data fragmentation
in distributed databases.
 We use the terms partitioning and partition instead of horizontal fragmentation and horizontal fragment, respectively, to contrast with the alternative strategy, which consists of clustering a relation at a single node.
 The main problem is to avoid resource contention, which may result in the entire system
thrashing.
 Data placement must be done so as to maximize system performance, which can be
measured by combining the total amount of work done by the system and the response
time of individual queries.
 Round-robin partitioning is the simplest strategy; it ensures uniform data distribution. With n partitions, the ith tuple in insertion order is assigned to partition (i mod n). This strategy enables sequential access to a relation to be done in parallel.
 Hash partitioning applies a hash function to some attribute that yields the partition
number. This strategy allows exact-match queries on the selection attribute to be
processed by exactly one node and all other queries to be processed by all the nodes in
parallel.
Contd.
 Range partitioning distributes tuples based on the
value intervals (ranges) of some attribute. In
addition to supporting exact-match queries (as in
hashing), it is well-suited for range queries.
 Note: a serious problem in data placement is dealing with skewed data distributions that may lead to non-uniform partitioning and hurt load balancing.
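A minimal sketch of the three partitioning strategies, assuming a relation spread over n nodes, follows; the attribute choice, hash function, and range boundaries are illustrative, and Python's built-in hash stands in for whatever hash function a real system would use.

def round_robin_partition(i, n):
    # The i-th tuple in insertion order goes to partition (i mod n).
    return i % n

def hash_partition(tup, attr_index, n):
    # A hash of the partitioning attribute yields the partition number, so an
    # exact-match query on that attribute touches exactly one node.
    return hash(tup[attr_index]) % n

def range_partition(tup, attr_index, boundaries):
    # boundaries such as [100, 200] define the intervals
    # (-inf, 100], (100, 200], (200, +inf); range queries on the attribute
    # only touch the partitions whose intervals overlap the query range.
    value = tup[attr_index]
    for partition_no, upper in enumerate(boundaries):
        if value <= upper:
            return partition_no
    return len(boundaries)

# Example: place the tuple ("E3", 150) by each strategy over n = 3 partitions.
print(round_robin_partition(2, 3), hash_partition(("E3", 150), 0, 3),
      range_partition(("E3", 150), 1, [100, 200]))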
Parallel Query Processing
 The objective of parallel query processing is to transform queries into
execution plans that can be efficiently executed in parallel.
 This is achieved by exploiting parallel data placement and the various
forms of parallelism offered by high-level queries.
 Query Parallelism: Parallel query execution can exploit two forms of
parallelism: inter- and intra-query.
 Inter-query parallelism enables the parallel execution of multiple
queries generated by concurrent transactions, in order to increase the
transactional throughput.
 Intra-query parallelism decreases the response time of a single query and is itself obtained through inter-operator and intra-operator parallelism.
 Inter-operator parallelism is obtained by executing several operators of the query tree in parallel on several processors, while with intra-operator parallelism the same operator is executed by many processors, each one working on a subset of the data.
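As a rough illustration of intra-operator parallelism, the sketch below runs the same selection operator over every partition of a relation, each worker handling one subset of the data. The partitioned EMP relation, the predicate, and the use of a thread pool are assumptions made for the example; a parallel DBMS would run the operator instances on separate processors or nodes.

from concurrent.futures import ThreadPoolExecutor

def select_partition(partition, predicate):
    # One instance of the selection operator, applied to a single partition.
    return [t for t in partition if predicate(t)]

def parallel_select(partitions, predicate):
    # Run one operator instance per partition and union the partial results.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(select_partition, p, predicate) for p in partitions]
        return [t for f in futures for t in f.result()]

# EMP partitioned over three nodes; select employees earning more than 5000.
emp_partitions = [
    [("E1", 4000), ("E2", 7000)],
    [("E3", 9000)],
    [("E4", 3000), ("E5", 6500)],
]
print(parallel_select(emp_partitions, lambda t: t[1] > 5000))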
Parallel Query Optimization
 Parallel query optimization exhibits similarities with distributed query processing.
 It focuses much more on taking advantage of both intra-operator parallelism and inter-operator parallelism.
 A parallel query optimizer can be seen as three components: a search space, a cost model, and a search strategy.
 Search Space: Execution plans are abstracted by means of operator trees, which define the order in which the operators are executed.
 In a parallel DBMS, an important execution aspect to be reflected by annotations is the fact that two subsequent operators can be executed in pipeline: the second operator starts consuming tuples as soon as the first one produces them (see the sketch below).
 Cost Model: The optimizer cost model is responsible for estimating the cost of a given execution plan. It consists of two parts: architecture-dependent and architecture-independent.
 To estimate the cost of an execution plan, the cost model uses database statistics and organization information, such as relation cardinalities and partitioning, as with distributed query optimization.
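The pipeline annotation mentioned under the search space can be illustrated with Python generators: the second operator starts consuming tuples as soon as the first one produces them. The relation contents and the predicate are made up for the example.

def scan(relation):
    for tup in relation:
        yield tup                      # produce tuples one at a time

def select(producer, predicate):
    for tup in producer:
        if predicate(tup):             # consume each tuple as soon as it is produced
            yield tup

emp = [("E1", 4000), ("E2", 7000), ("E3", 9000)]
pipelined_plan = select(scan(emp), lambda t: t[1] > 5000)   # select consumes scan in pipeline
print(list(pipelined_plan))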
Contd.
 Search Strategy: The search strategy does not
need to be different from either centralized or
distributed query optimization.
 The search space tends to be much larger because
there are more parameters that impact parallel
execution plans, in particular, pipeline and store
annotations.
Load Balancing
 Good load balancing is crucial for the performance of a
parallel system.
 The response time of a set of parallel activities is that of the longest one; minimizing the time of the longest activity is thus important for minimizing response time.
 Balancing the load of different transactions and queries among
different nodes is also essential to maximize throughput.
 Although the parallel query optimizer incorporates decisions on how to execute a parallel execution plan, load balancing can be hurt by several problems occurring at execution time.
 The principal problems introduced by parallel query execution
are initialization, interference and skew.
