
Chapter 7

Replication
Replication and its Reasons
Replication is the maintenance of multiple copies of data at multiple
computers. It helps to make a distributed system more effective by enhancing performance.

Reasons for Replication


1. Performance enhancement
Placing a copy of the data in the proximity of the process that uses it decreases the time needed to
access the data. This enhances the performance of the distributed system.
Example: Web browsers store copies of previously fetched web pages locally as cached data
to reduce the latency of fetching resources from the server.
2. Increased availability
Users want services to be highly available. Replication provides data redundancy,
i.e. the data remains available even if a server fails. If each of n servers has an independent
probability p of crashing, then the probability that at least one replica is available is 1 − p^n.
Example: with a 5% chance of a server failure within a given period, two independent servers give
1 − (0.05)^2 = 1 − 0.0025 = 0.9975, i.e. 99.75% availability.
3. Fault Tolerance
Even if one server fails, the data on the other servers can still be provided to users, so file
access can continue when one file server is down. A crash of a single server should not bring the entire
system down while that server is being rebooted.
There are three ways in which replication can be done.
i. Explicit file replication: The programmer explicitly controls the entire replication process.
ii. Lazy replication: Only one copy of each file is created, on some server. Later, the
server itself makes replicas on other servers automatically, without the programmer’s
knowledge.
iii. Group communication: All WRITE system calls are simultaneously transmitted to all
the servers at once, so extra copies are made at the same time the original is made.

(a) Explicit file replication (b) Lazy file replication (c) File replication using a group

Problems with Replication


Inconsistency: If one copy is modified, it becomes inconsistent with the rest of the
copies. To ensure consistency, all the copies must be updated.
Consistency Models
A consistency model is a contract between processes and the data store. Ideally, a read
operation should return the value written by the most recent write on that data item, but in the
absence of a global clock it is difficult to determine which write was the last one.
To use distributed shared memory (DSM), a distributed synchronization service must also be
implemented, using locks, semaphores, and message passing. In most implementations, data is read from
local copies, but updates to the data must be propagated to the other copies.
Memory consistency models determine when data updates are propagated and what level of
inconsistency is acceptable.
Types of Consistency
a) Strict Consistency
Strict consistency is the strongest consistency model. Under this model, a write to a variable
by any processor must be seen instantaneously by all processors; the time constraint is
instantaneous. It can be understood as though a global clock were present and every write had to
be reflected in all processor caches by the end of that clock period.
b) Sequential Consistency
The sequential consistency model is a weaker memory model than strict consistency. A write
to a variable does not have to be seen instantaneously; however, writes to variables by
different processors have to be seen in the same order by all processors.
As defined by Lamport (1979), sequential consistency is met if "the result of any execution is
the same as if the operations of all the processors were executed in some sequential order,
and the operations of each individual processor appear in this sequence in the order specified
by its program."

c) Causal Consistency
Causal consistency weakens sequential consistency by categorizing events
into those that are causally related and those that are not. It requires only that write operations
which are causally related be seen in the same order by all processes; causally unrelated
(concurrent) writes may be seen in different orders by different processes.
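
As an illustrative sketch (the class and message format are assumptions, not from the notes), a replica can enforce causal ordering by applying a write only after every write it causally depends on has been applied; unrelated writes may be applied in any order.

```python
class CausalReplica:
    """Each write carries the ids of the writes it causally depends on; it is
    applied only after all of those dependencies have been applied. Causally
    unrelated writes may be applied in different orders at different replicas,
    which causal consistency allows."""
    def __init__(self):
        self.applied = set()       # ids of writes already applied
        self.pending = []          # writes still waiting for their dependencies
        self.data = {}

    def deliver(self, write_id, deps, key, value):
        self.pending.append((write_id, frozenset(deps), key, value))
        self._apply_ready()

    def _apply_ready(self):
        progress = True
        while progress:
            progress = False
            for w in list(self.pending):
                wid, deps, key, value = w
                if deps <= self.applied:       # all causal predecessors seen
                    self.data[key] = value
                    self.applied.add(wid)
                    self.pending.remove(w)
                    progress = True
```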
d) Processor Consistency
The processor consistency model was derived to maintain data consistency in scalable
multiprocessor systems in which every processor has its own memory.
All processors must agree on the order in which they see the writes done by any single
processor, and on the order of writes by different processors to the same location
(coherence is maintained). However, they need not agree on the order of writes by
different processors to different locations.
e) Release consistency
The release consistency model relaxes the weak consistency model by distinguishing the
entry synchronization operation from the exit synchronization operation. Under weak
ordering, before a synchronization operation completes and the processor proceeds, all
operations on all processors must be made visible.
Under the release consistency model, on entry to a critical section, termed
"acquire", all operations with respect to the local memory variables must be completed.
On exit, termed "release", all changes made by the local processor must be
propagated to all other processors.
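
A rough sketch of this behaviour (hypothetical class and method names): writes made while holding the lock are buffered locally and pushed to the other replicas only at the release.

```python
import threading

class ReleaseConsistentNode:
    """Writes inside the critical section stay local; release() propagates
    them to all peers, matching the rule that local changes must be visible
    to other processors by the time of the release."""
    def __init__(self, peers):
        self.peers = peers         # the other replicas
        self.memory = {}
        self.buffered = {}         # updates not yet propagated

    def acquire(self, lock):
        lock.acquire()             # enter the critical section

    def write(self, key, value):
        self.memory[key] = value
        self.buffered[key] = value

    def release(self, lock):
        for peer in self.peers:    # propagate everything written since acquire
            peer.memory.update(self.buffered)
        self.buffered.clear()
        lock.release()

# Usage: node.acquire(lock); node.write("x", 1); node.release(lock)
lock = threading.Lock()
```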
f) Entry Consistency
This is a variant of the release consistency model. It also uses acquire and
release instructions to explicitly mark entry to and exit from a critical section. However, under
entry consistency, every shared variable is assigned a synchronization variable specific to it.
Thus, when the acquire is performed on variable x, only the operations related to x need to be
completed with respect to that processor. This allows critical sections protecting different
shared variables to execute concurrently, while critical operations on the same shared variable
remain serialized. Such a consistency model is useful when, for example, different matrix
elements can be processed at the same time.
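
The per-variable synchronization can be sketched as follows (the variable names and structure are assumptions): because each shared variable has its own lock, critical sections on different variables run concurrently, while accesses to the same variable are serialized.

```python
import threading

# One synchronization variable (here a lock) per shared variable.
locks = {"x": threading.Lock(), "y": threading.Lock()}
shared = {"x": 0, "y": 0}

def update(var, value):
    with locks[var]:          # acquire only the lock tied to this variable
        shared[var] = value   # updates to other variables may proceed in parallel
```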

(a) Consistency models not using synchronization operations. (b) Models with
synchronization operations.
Object Replication
Data in a distributed system consists of a collection of items, called objects. An
object can be a file or an object generated by a programming paradigm. Object replication is the
mechanism of creating physical replicas of such objects, each stored at a single computer and tied
to the others by some degree of consistency.
Problem:
 If objects are shared, concurrent accesses to the shared objects must be managed
to guarantee state consistency.
Solution:
The process of handling concurrent invocations is shown in the given figure:

There are essentially two categories of replication protocol:
• active replication
• single-copy passive replication

i. Active replication:
The idea is that every replica sees exactly the same set of messages in the same
order and will process them in that order. It assumes objects are deterministic and requires
group communication mechanism to deliver the same set of messages to each active replica
in the same order
It is often the preferred choice where masking of replica failures with minimum time penalty
is highly desirable
Benefits:
 Every server is able to respond to client queries with up-to-date data
 Immediate fail-over
Limitations:
 Waste of resources, since all replicas are doing the same work
 Only updates are propagated, which requires the replicas to be deterministic
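
A minimal sketch of active replication (the class and request format are assumptions; the group-communication layer that produces the ordered log is not shown): every replica applies the same ordered requests with the same deterministic logic, so all replicas end up in the same state and any of them can answer queries.

```python
class ActiveReplica:
    """Applies a totally ordered stream of requests deterministically."""
    def __init__(self):
        self.state = {}

    def apply(self, request):
        op, key, value = request          # same input in the same order
        if op == "write":
            self.state[key] = value       # -> same state at every replica
        return self.state.get(key)

# The group-communication mechanism delivers this identical log to all replicas.
log = [("write", "x", 1), ("write", "y", 2), ("write", "x", 3)]
replicas = [ActiveReplica() for _ in range(3)]
for request in log:
    for r in replicas:
        r.apply(request)
assert all(r.state == replicas[0].state for r in replicas)
```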

ii. Passive replication:


This method does not require complex, order-preserving group communication protocols and
can be implemented using traditional RPC.
One server plays a special primary role:
 It performs all the updates.
 It may propagate them to the backup replicas eagerly or lazily.
 It maintains the most up-to-date state.
 Backup servers may take some of the load of processing client requests off the primary.

The performance of the system in the presence of primary failures can be substantially poorer
than under no failures. Passive replication can be implemented without deterministic operations
and is typically easier to implement than active replication. It requires less network traffic
during normal operation, but recovery takes longer and data may be lost.
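
The primary-backup structure can be sketched as below (hypothetical class names; propagation is shown eagerly here, but it could equally be done lazily): only the primary executes updates, so no determinism is required, and the backups merely install the state it pushes to them.

```python
class Backup:
    """Holds a copy of the state installed by the primary; does no processing."""
    def __init__(self):
        self.state = {}

class Primary:
    """Performs every update itself and propagates the result to the backups."""
    def __init__(self, backups):
        self.state = {}
        self.backups = backups

    def update(self, key, value):
        self.state[key] = value        # perform the update locally
        for b in self.backups:         # eager propagation of the new state
            b.state[key] = value

backups = [Backup(), Backup()]
primary = Primary(backups)
primary.update("x", 42)
assert all(b.state == primary.state for b in backups)
```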

Fault Tolerant Services

A system is said to have failed when it does not meet its specification.

a) Component faults: faults in individual components, which may be:
o Transient faults: occur once and then disappear. E.g. a bird flying through the beam of
a microwave transmitter may cause lost bits on some network; a retry may succeed.
o Intermittent faults: occur, vanish, then reappear, and so on. E.g. a loose
contact on a connector.
o Permanent faults: continue to exist until the fault is repaired. E.g. burnt-out chips,
software bugs, and disk head crashes.
b) System failures: caused by processor faults, of which there are two types:
o Fail-silent faults: a faulty processor just stops and does not respond.
o Byzantine faults: the processor continues to run but gives wrong answers.

There are three kinds of fault-tolerance approaches:

i. Information redundancy: extra bits are added so that garbled bits can be recovered.
ii. Time redundancy: an action is performed, and if necessary it is performed again.
iii. Physical redundancy: extra components are added.

There are two ways to organize the extra physical equipment: active replication (all
components are used at the same time) and primary backup (the backup is used only if the primary fails).

Issues for physical redundancy:

 The degree of replication required.


 The average and worst-case performance in the absence of faults.
 The average and worst-case performance when a fault occurs.

1. Fault tolerance using active replication:

It can also be referred to as the state machine approach. Each device is replicated three times,
and following each stage in the circuit is a triplicated voter. Each voter is a circuit with three
inputs and one output: if two or three of the inputs are the same, the output is equal to that
input; if all three inputs are different, the output is undefined. This kind of design is known as
TMR (Triple Modular Redundancy).

Suppose element A2 fails. Each of the voters V1, V2 and V3 gets two good inputs and one
rogue input, and each of them outputs the correct value to the second stage. The effect of A2
failing is completely masked, so the inputs to B1, B2 and B3 are exactly the same as they
would have been had no fault occurred. Even if B3 and C1 are faulty in addition to A2, their
effects are also masked, so the three final outputs are still correct.

E.g. aircraft (a 747 has four engines but can fly on three), sports (multiple referees
in case one misses an event).
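
The notes describe the voter as a hardware circuit; as a rough model of its behaviour (illustrative Python only), a majority voter can be written as:

```python
def vote(a, b, c):
    """Triplicated-voter model: if at least two inputs agree, output that
    value; if all three differ, the output is undefined (None here)."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None                        # all three inputs differ: undefined

# A single faulty element feeding one input (e.g. A2 going rogue) is masked:
assert vote(1, 0, 1) == 1
```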

How much replication is needed?

A system is said to be k-fault-tolerant if it can survive faults in k components and still meet its
specification. k+1 processors suffice to tolerate k fail-stop faults: if k of them fail, the one
left can still do the work. However, 2k+1 processors are needed to tolerate k Byzantine faults,
because even if k processors send out wrong replies, there are still k+1 processors giving the
correct answer, so a majority vote still yields the correct result.
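
The same majority-vote argument can be checked with a small sketch (hypothetical helper; the reply values are made up): with 2k + 1 replicas and at most k Byzantine (wrong) replies, the correct answer still holds the majority.

```python
from collections import Counter

def majority_answer(replies):
    """Return the reply held by a strict majority, or None if there is none."""
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None

# k = 2 Byzantine faults => 2k + 1 = 5 processors are required.
assert majority_answer([7, 7, 7, 9, 4]) == 7   # 3 correct replies outvote 2 wrong ones
```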

2. Fault tolerance using primary backup:

With a primary-backup approach, one server (the primary) does all the work. When the
primary fails, the backup takes over. The backup may periodically ping the primary with "are
you alive" messages; if it fails to get an acknowledgement, the backup assumes that
the primary has failed and takes over the functions of the primary.

If the system is asynchronous, there is no upper bound on message delay, so no timeout value
for the pings can reliably distinguish a crashed primary from a merely slow one. This is a problem.
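
The ping-and-timeout behaviour can be sketched as follows (the send_ping callable and the timeout values are assumptions): if no acknowledgement arrives within the timeout, the backup assumes the primary has failed, even though in an asynchronous system that assumption may be wrong.

```python
import time

def backup_monitor(send_ping, timeout=2.0, interval=1.0):
    """send_ping(timeout) is assumed to return True if the primary
    acknowledged an "are you alive" message within `timeout` seconds."""
    while True:
        if not send_ping(timeout):       # no ack: presume the primary is dead
            return "take over as primary"
        time.sleep(interval)             # otherwise keep monitoring
```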

If the primary crashes:

 Before doing the work (step 2): no harm is done.
 After doing the work but before sending the update: the work will be done a second time,
by the backup.
 After step 4 but before step 6: the work may end up being done three times (once by the primary,
once by the backup as a result of step 3, and once after the backup becomes the
primary).

Recovery from a primary failure may be time-consuming and/or complex depending on the
needs for continuous operation and application recovery.

a) Cold failover: entails restarting the applications on the backup machine. When
a backup machine takes over, it starts all the applications that were previously running
on the primary system. Of course, any work that the primary may have done is now
lost.
b) Warm failover: applications periodically write checkpoint files onto stable storage
that is shared with the backup system. When the backup system takes over, it reads
the checkpoint files to bring the applications back to the state of the last checkpoint
(see the sketch after this list).
c) Hot failover: applications on the backup run in lockstep synchrony with the applications
on the primary, taking the same inputs as the primary. When the backup takes
over, it is in exactly the state the primary was in when it failed.
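
Warm failover can be sketched as follows (the checkpoint path and work_fn are assumptions): the primary periodically checkpoints its state to shared stable storage, and on failover the backup resumes from the last checkpoint, losing whatever was done after it.

```python
import json
import time

CHECKPOINT = "/shared/app.ckpt"          # assumed path on shared stable storage

def run_primary(state, work_fn, checkpoint_every=30):
    """Do application work, then periodically checkpoint the state."""
    while True:
        work_fn(state)                    # caller-supplied application work
        with open(CHECKPOINT, "w") as f:
            json.dump(state, f)           # write the checkpoint file
        time.sleep(checkpoint_every)

def take_over_as_backup():
    """Reload the last checkpoint; work done after it by the primary is lost."""
    with open(CHECKPOINT) as f:
        return json.load(f)
```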

Advantage: relatively easy to design, since requests do not have to be multicast to a group of
machines and there are no decisions to be made about who takes over.

Disadvantage: another backup is needed immediately after a failover. Primary-backup schemes also
work poorly with Byzantine faults, since the backup may not be able to detect that the primary
has actually failed.

Highly Available Services

Transactions with Replicated Data

Prepared and Compiled By: Er. Sapana Thakulla, Lecturer, Kathmandu Engineering College
