Chapter 7
Replication
Replication and its Reasons
Replication is the mechanism by which multiple copies of data are maintained at multiple
computers. It helps make a distributed system effective by enhancing performance.
c) Causal Consistency
Causal consistency weakens sequential consistency by categorizing events into those that are
causally related and those that are not. It requires only that write operations which are
causally related be seen in the same order by all processes; concurrent writes may be observed
in different orders by different processes.
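Causal relatedness between writes is commonly tracked with vector clocks, a mechanism not
covered in this chapter; the following minimal sketch (an illustration, not part of the model's
definition) shows the happens-before test that decides whether two writes must be seen in the
same order everywhere.

    # Sketch: write A is causally related to write B if A's vector clock is
    # component-wise <= B's and strictly smaller in at least one component;
    # otherwise the writes are concurrent and may be seen in different orders.
    def causally_precedes(va, vb):
        return all(a <= b for a, b in zip(va, vb)) and \
               any(a < b for a, b in zip(va, vb))

    w1 = [1, 0]   # write by process 0
    w2 = [1, 1]   # write by process 1, issued after it has seen w1
    w3 = [2, 0]   # later write by process 0 that has not seen w2

    print(causally_precedes(w1, w2))   # True  -> everyone sees w1 before w2
    print(causally_precedes(w2, w3))   # False -+
    print(causally_precedes(w3, w2))   # False -+- w2 and w3 are concurrent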
d) Processor Consistency
The Processor consistency model was derived to maintain data consistency in scalable
multiprocessor systems where every processor has its own memory.
All processors must agree on the order in which they see the writes done by any one
processor, and on the order of writes by different processors to the same location
(so coherence is maintained). However, they need not agree on the order of writes by
different processors to different locations.
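The allowed and forbidden orderings can be made concrete with a small, purely illustrative
sketch; the observation histories below are hypothetical, not taken from the text.

    # P1 writes x=1 and P2 writes y=1 (different processors, different
    # locations): observers may legitimately disagree on the order.
    view_p3 = [("x", 1), ("y", 1)]   # P3 sees P1's write first
    view_p4 = [("y", 1), ("x", 1)]   # P4 sees P2's write first -- still allowed

    # P1 writes z=1 then z=2 (same location): coherence demands that every
    # observer sees the two writes in the same order.
    view_p3_z = [("z", 1), ("z", 2)]
    view_p4_z = [("z", 1), ("z", 2)]

    def writes_to(view, loc):
        # The sequence of values one observer sees for a single location.
        return [val for (l, val) in view if l == loc]

    # Coherence check: all observers agree on the per-location write order.
    assert writes_to(view_p3_z, "z") == writes_to(view_p4_z, "z")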
e) Release consistency
The Release consistency model relaxes the Weak consistency model by distinguishing the
entrance synchronization operation from the exit synchronization operation. Under weak
ordering, before a synchronization operation completes and the processor proceeds, all
outstanding operations on all processors must be visible. Under the Release consistency
model, on entry to a critical section, termed the "acquire", all operations on the local
memory variables must be completed. On exit, termed the "release", all changes made by the
local processor must be propagated to all other processors.
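As a rough analogy (an assumption of this sketch, not a statement about any particular
hardware), ordinary lock usage follows the same acquire/release pattern; the comments mark
where the model's two requirements apply.

    import threading

    shared = {"balance": 0}
    sync = threading.Lock()          # the synchronization variable

    def deposit(amount):
        sync.acquire()               # "acquire": entry to the critical section;
                                     # pending local operations must be complete
        shared["balance"] += amount  # updates made inside the critical section
        sync.release()               # "release": the updates are propagated to
                                     # the other processors before anyone else
                                     # can acquire the lock

    threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(5)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(shared["balance"])         # 50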
f) Entry Consistency
This is a variant of the Release Consistency model. It also requires the use of Acquire and
Release instructions to explicitly mark entry to and exit from a critical section. However,
under Entry Consistency, every shared variable is assigned its own synchronization variable.
This way, when a processor performs an Acquire on variable x, only the operations related to x
need to be completed with respect to that processor. This allows critical sections guarding
different shared variables to execute concurrently; critical sections on the same shared
variable cannot run concurrently. Such a consistency model is useful, for example, when
different matrix elements can be processed at the same time.
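A minimal sketch of the per-variable idea, assuming one lock per matrix row as the
synchronization variable (the row granularity is just an illustration): critical sections
that acquire different synchronization variables can run at the same time.

    import threading

    matrix = [[0] * 4 for _ in range(4)]
    row_locks = [threading.Lock() for _ in range(4)]   # one sync variable per row

    def scale_row(i, factor):
        with row_locks[i]:                               # Acquire only row i's lock;
            matrix[i] = [v * factor for v in matrix[i]]  # other rows stay available

    # Rows 0 and 1 are guarded by different synchronization variables,
    # so these two critical sections may execute concurrently.
    t1 = threading.Thread(target=scale_row, args=(0, 2))
    t2 = threading.Thread(target=scale_row, args=(1, 3))
    t1.start(); t2.start(); t1.join(); t2.join()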
i. Active replication:
The idea is that every replica sees exactly the same set of messages in the same
order and processes them in that order. It assumes objects are deterministic and requires a
group communication mechanism that delivers the same set of messages to each active replica
in the same order.
It is often the preferred choice where masking replica failures with a minimum time penalty
is highly desirable (a minimal sketch follows the limitations below).
Benefits:
Every server is able to respond to client queries with updated data
Immediate fail-over
Limitations:
Waste of resources, since all replicas do the same work
Requires determinism, since every replica executes each request itself rather than receiving
update propagation only
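A minimal sketch of the state-machine idea behind active replication, under the assumption
that a totally ordered log of commands has already been delivered to every replica (the group
communication layer itself is not shown): deterministic replicas that apply the same commands
in the same order end in the same state.

    class Replica:
        def __init__(self):
            self.balance = 0                 # deterministic state

        def apply(self, command):            # deterministic state transition
            op, amount = command
            if op == "deposit":
                self.balance += amount
            elif op == "withdraw":
                self.balance -= amount

    # The same set of messages, delivered to every replica in the same order.
    ordered_log = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]

    replicas = [Replica() for _ in range(3)]
    for r in replicas:
        for cmd in ordered_log:
            r.apply(cmd)

    assert all(r.balance == 75 for r in replicas)   # identical final state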
ii. Passive replication (primary backup):
The performance of the system in the presence of primary failures can be substantially poorer
than under no failures. Passive replication is implementable without deterministic operations
and is typically easier to implement than active replication. It requires less network traffic
during normal operation, but recovery is longer and data may be lost.
a) Component faults
b) Transient faults: occur once and then disappear. E.g. a bird flying through the beam of
a microwave transmitter may cause lost bits on some network; a retry may succeed.
c) Intermittent faults: occur, then vanish, then reappear, and so on. E.g. a loose
contact on a connector.
d) Permanent faults: continue to exist until the fault is repaired. E.g. burnt-out chips,
software bugs, and disk head crashes.
e) System failures: There are two types of processor faults:
o Fail-silent faults: a faulty processor just stops and does not respond.
o Byzantine faults: a faulty processor continues to run but gives wrong answers.
There are two ways to organize extra physical equipment: active replication (use the
components at the same time) and primary backup (use the backup if one fails).
It can also be referred to as the State Machine Approach. Each device is replicated three
times, and each stage in the circuit is followed by a triplicated voter. Each voter is a
circuit that has three inputs and one output: if two or three of the inputs are the same, the
output is equal to that input; if all three inputs are different, the output is undefined.
This kind of design is known as TMR (Triple Modular Redundancy).
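A minimal sketch of a single voter (only an illustration of the voting rule described above):
with three replicated inputs, one faulty value is masked by the majority.

    def tmr_vote(a, b, c):
        # If two or three inputs agree, output that value;
        # if all three differ, the output is undefined (None here).
        if a == b or a == c:
            return a
        if b == c:
            return b
        return None

    print(tmr_vote(1, 1, 0))   # 1    -- a single faulty input is masked
    print(tmr_vote(1, 2, 3))   # None -- no majority exists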
E.g. Aircraft (a 747 has four engines but can fly with three), Sports (multiple referees
in case one misses an event)
A system is said to be k fault tolerant if it can survive faults in k components and still meet
its specifications. k+1 processors can tolerate k fail-stop faults: if k of them fail, the one
left can still work. However, 2k+1 processors are needed to tolerate k Byzantine faults,
because even if k processors send out wrong replies, there are still k+1 processors giving the
correct answer, and by majority vote a correct answer can still be obtained.
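The arithmetic behind the 2k+1 bound can be seen in a short sketch (the reply values are made
up): with 2k+1 replies of which at most k are wrong, the correct value still holds a strict
majority.

    from collections import Counter

    def majority(replies):
        value, count = Counter(replies).most_common(1)[0]
        return value if count > len(replies) // 2 else None

    k = 2
    replies = [42] * (k + 1) + [7, 13]   # k+1 correct replies, k Byzantine ones
    print(majority(replies))             # 42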
With a primary backup approach, one server (the primary) does all the work. When the
server fails, the backup takes over. A backup may periodically ping the primary with "are
you alive" messages. If it fails to get an acknowledgement, then the backup may assume that
the primary failed and it will take over the functions of the primary.
If the system is asynchronous, there is no upper bound on message delay, so no timeout value
for the pings is guaranteed to be correct. This is a fundamental problem.
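A minimal sketch of the "are you alive" check, assuming a hypothetical ping() call that
returns True on an acknowledgement and a take_over() callback supplied by the backup; the
timeout is an arbitrary guess, which is exactly the difficulty in an asynchronous system.

    import time

    TIMEOUT_SECONDS = 2.0    # assumed value; no bound is guaranteed to be safe
    PING_INTERVAL = 0.5

    def monitor_primary(ping, take_over):
        last_ack = time.monotonic()
        while True:
            if ping():                                    # got an acknowledgement
                last_ack = time.monotonic()
            elif time.monotonic() - last_ack > TIMEOUT_SECONDS:
                take_over()                               # assume the primary failed
                return
            time.sleep(PING_INTERVAL)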
Recovery from a primary failure may be time-consuming and/or complex depending on the
needs for continuous operation and application recovery.
a) Cold Failover: Cold failover entails application restart on the backup machine. When
a backup machine takes over, it starts all the applications that were previously running
on the primary system. Of course, any work that the primary may have done is now
lost.
b) Warm failover: applications periodically write checkpoint files onto stable storage
that is shared with the backup system. When the backup system takes over, it reads
the checkpoint files to bring the applications to the state of the last checkpoint (a
minimal sketch follows this list).
c) Hot failover: applications on the backup run in lockstep synchrony with applications
on the primary, taking the same inputs as on the primary. When the backup takes
over, it is in the exact state that the primary was in when it failed.
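A minimal sketch of warm-failover checkpointing, assuming a hypothetical shared-storage path
and a state that fits in a JSON file: the primary checkpoints periodically, and on takeover
the backup restores the last checkpoint, losing only the work done since then.

    import json, os

    CHECKPOINT_PATH = "/shared/app.ckpt"     # assumed shared, stable storage

    def write_checkpoint(state):
        # Write to a temporary file first, then rename atomically so the
        # backup never reads a half-written checkpoint.
        tmp = CHECKPOINT_PATH + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, CHECKPOINT_PATH)

    def restore_checkpoint():
        # Called by the backup when it takes over.
        try:
            with open(CHECKPOINT_PATH) as f:
                return json.load(f)          # state as of the last checkpoint
        except FileNotFoundError:
            return {}                        # no checkpoint yet: start cold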
Advantage: relatively easy to design, since requests do not have to be multicast to a group of
machines and there are no decisions to be made about who takes over.
Disadvantage: once the backup takes over, another backup is needed immediately. Backup servers
also work poorly with Byzantine faults, since the backup may not be able to detect that the
primary has actually failed.