
CS3551-DISTRIBUTED COMPUTING –UNIT 4

UNIT IV CONSENSUS AND RECOVERY


Consensus and Agreement Algorithms: Problem Definition – Overview of Results – Agreement in
a Failure-Free System (Synchronous and Asynchronous) – Agreement in Synchronous Systems
with Failures; Checkpointing and Rollback Recovery: Introduction – Background and Definitions –
Issues in Failure Recovery – Checkpoint-based Recovery – Coordinated Checkpointing Algorithm -
Algorithm for Asynchronous Checkpointing and Recovery.

CONSENSUS AND AGREEMENT ALGORITHMS:


4.1 PROBLEM DEFINITION
 Agreement among the processes in a distributed system is a fundamental requirement for a
wide range of applications.
 Processes exchange information to negotiate with one another and reach a common
understanding or agreement before taking application-specific actions.
Examples of consensus problems:
 Deciding whether to commit a distributed transaction to a database.
 Designating a node as the leader for some distributed task.
 Synchronizing state machine replicas and ensuring consistency among them.
Assumptions in Consensus algorithms
 Failure models
 Synchronous/asynchronous communication
 Network connectivity
 Sender identification
 Channel reliability
 Authenticated vs. non-authenticated messages
 Agreement variable
Failure models:
 Some of the processes may be faulty in distributed systems.
 A faulty process can behave in any manner allowed by the failure model assumed.
 Some well-known failure models include fail-stop, crash, send omission,
receive omission, general omission, and Byzantine (malicious) failures.
 Among the n processes in the system, at most f processes can be faulty.

Synchronous/asynchronous communication:
Synchronous Computation:
 Processes run in a lock-step manner: a process receives a message sent to it earlier,
performs computation, and sends a message to another process.
 A step of synchronous computation is called a round.
Asynchronous Computation:
 Computation does not proceed in lock step.

 A process can send and receive messages and perform computation at any time.
Network connectivity:
 The system has full logical connectivity, i.e., each process can communicate with any
other by direct message passing.
Sender identification:
 A process that receives a message always knows the identity of the sender process.
 When multiple messages are expected from the same sender in a single round, a
scheduling algorithm is employed that sends these messages in sub-rounds, so that each
message sent within the round can be uniquely identified.
Channel reliability:
 The channels are reliable, and only the processes may fail.
Authenticated vs. non-authenticated messages:
 With unauthenticated messages, when a faulty process relays a message to other
processes
(i) it can forge the message and claim that it was received from another process,
(ii) it can also tamper with the contents of a received message before relaying it.
 When a process receives a message, it has no way to verify its authenticity. This is
known as an unauthenticated message, oral message, or unsigned message.
 Using authentication via techniques such as digital signatures, the agreement
problem is easier to solve because, if some process forges a message or tampers with the
contents of a received message before relaying it, the recipient can detect the forgery or
tampering. This is known as an authenticated or signed message.
Agreement variable:
 The agreement variable may be boolean or multivalued, and need not be an integer.
 This simplifying assumption does not affect the results for other data types, but helps in
the abstraction while presenting the algorithms.
Byzantine General problem
 The Byzantine Generals’ Problem (BGP) is a classic problem faced by any
distributed computer system network.
 Imagine that the grand Eastern Roman empire aka Byzantine empire has decided to
capture a city.
 There is fierce resistance from within the city.
 The Byzantine army has completely encircled the city.
 The army has many divisions and each division has a general.
 The generals communicate with one another, and with the lieutenants within their
divisions, only through messengers.
 All the generals or commanders have to agree upon one of two plans of action:
the exact time to attack all at once or, if faced by fierce resistance, the time to retreat all at
once. The army cannot hold on forever.

 If the attack or retreat is attempted without full strength, it means only one thing: unacceptable,
brutal defeat.
 If all generals and messengers were trustworthy, the solution would be very simple.
 However, some of the messengers and even a few generals/commanders are
traitors. They are spies or even enemy soldiers.
 There is a very high chance that they will not follow orders or will pass on an incorrect
message. The level of trust in the army is very low.
 Consider just a case of 1 commander and 2 Lieutenants and just 2 types of messages-
‘Attack’ and ‘Retreat’.


Performance Aspects of Agreement Protocols:
A few performance metrics are as follows:
Time: Number of rounds needed to reach an agreement.
Message traffic: Number of messages exchanged to reach an agreement.
Storage overhead: Amount of information that needs to be stored at processors during the
execution of the protocol.
Problem Specifications
 Byzantine agreement problem
 Consensus problem
 Interactive consistency problem.
Byzantine agreement problem:

The Byzantine agreement problem requires a designated source process, with an initial
value, to reach agreement with the other processes about its initial value, subject to:

 Agreement: All non-faulty processes must agree on the same value.


 Validity: If the source process is non-faulty, then the agreed upon value by all the non-
faulty processes must be the same as the initial value of the source.
 Termination: Each non-faulty process must eventually decide on a value.


Consensus Problem
Every process has an initial value, and all the correct processes must agree on a single value. This
is the consensus problem.
The requirements of the consensus problem are:
 Agreement: All non-faulty processes must agree on the same (single) value.
 Validity: If all the non-faulty processes have the same initial value, then the
agreed upon value by all the non-faulty processes must be that same value.
 Termination: Each non-faulty process must eventually decide on a value.
Interactive Consistency Problem
Every process has an initial value, and all the correct processes must agree upon a set of values,
with one value for each process. This is the interactive consistency problem.
The formal specifications are:
 Agreement: All non-faulty processes must agree on the same array of values A
[v1, …,vn].
 Validity: If process i is non-faulty and its initial value is vi, then all non-faulty
processes agree on vi as the ith element of the array A. If process j is faulty, then the
non-faulty processes can agree on any value for A[j].
 Termination: Each non-faulty process must eventually decide on the array A.
The difference between the agreement problem and the consensus problem is that, in the
agreement problem, a single process has the initial value, whereas in the consensus
problem, all processes have an initial value.
4.2 OVERVIEW OF RESULTS:
Some important facts to remember are:
 Consensus is not solvable in asynchronous systems even if one process can fail by
crashing (the FLP impossibility result). Consensus is attainable in the failure-free case.
 In a synchronous system, common knowledge of the consensus value is also
attainable. In the asynchronous case, concurrent common knowledge of the consensus
value is attainable.
S.No | Failure Mode                  | Synchronous System                                                                  | Asynchronous System
1    | No failure                    | Agreement is attainable. Common knowledge is also attainable.                       | Agreement is attainable. Concurrent common knowledge is also attainable.
2    | Crash failure                 | Agreement is attainable: f < n crash processes, f + 1 rounds.                       | Agreement is not attainable.
3    | Byzantine (malicious) failure | Agreement is attainable: f <= floor((n - 1)/3) Byzantine processes, f + 1 rounds.   | Agreement is not attainable.

Solvable variants of the agreement problem

The following are weaker versions of the consensus problem that are solvable in asynchronous systems:

 Terminating reliable broadcast: A correct process always gets a message, even if
the sender crashes while sending. If the sender crashes while sending the message,
the delivered message may even be null, but it must still be delivered to the correct processes.
 K-set consensus: The non-faulty processes may agree on different values, as long
as the size of the set of values agreed upon is bounded by k. It is solvable as long
as the number of crash failures f is less than the parameter k.
 Approximate agreement: The consensus value is from a multivalued domain. The
values agreed upon by the non-faulty processes must be within a small tolerance of
each other.
 Renaming problem: It requires the processes to agree on necessarily distinct values.
 Reliable broadcast: A weaker version of terminating reliable broadcast in which
the termination condition is dropped; it is solvable under crash failures.


Fig: Solvable variants of the agreement problem in asynchronous systems

4.3 AGREEMENT IN A FAILURE-FREE SYSTEM


 In a failure-free system, consensus can be reached by collecting information from the
different processes, arriving at a decision, and distributing this decision in the system.
 A distributed mechanism would have each process broadcast its values to others, and each
process computes the same function on the values received.
 The decision can be reached by using an application specific function.
 Algorithms to collect the initial values and then distribute the decision may be based on
token circulation on a logical ring, the three-phase tree-based
broadcast-convergecast-broadcast, or direct communication with all nodes.
In a synchronous system, this can be done simply in a constant number of rounds.
 Further, common knowledge of the decision value can be obtained using an additional
round.
In an asynchronous system, consensus can similarly be reached in a constant number of
message hops.
 Further, concurrent common knowledge of the consensus value can also be attained.
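The failure-free scheme described above can be sketched in a few lines of Python (a toy single-round simulation; the function name and the choice of min as the application-specific decision function are illustrative assumptions):

```python
# Failure-free consensus sketch: every process broadcasts its value to all
# others, then every process applies the SAME deterministic function to the
# full set of values, so all decisions are necessarily identical.

def failure_free_consensus(initial_values):
    n = len(initial_values)
    # Round 1: each process broadcasts its value; with no failures,
    # every process receives the same multiset of values.
    received = [list(initial_values) for _ in range(n)]
    # Decision: the same application-specific function (here, min) everywhere.
    decisions = [min(vals) for vals in received]
    assert len(set(decisions)) == 1      # all processes agree
    return decisions[0]

failure_free_consensus([7, 2, 9])        # every process decides 2
```

Any deterministic function of the full value set (min, max, majority) would work equally well, since all processes evaluate it on identical inputs.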
4.4 AGREEMENT IN (MESSAGE-PASSING) SYNCHRONOUS SYSTEM WITH
FAILURES
 This is a consensus algorithm for crash failures in a message-passing synchronous system.
 The algorithm works for n processes, of which up to f (where f < n) may fail under the
fail-stop failure model.
 The consensus variable x is an integer; each process Pi has an initial value xi. If up to f
failures are to be tolerated, the algorithm runs for f + 1 rounds. In each round, a process
sends the value of its variable x to all other processes if that value has not been sent before.
 Of all the values received within a round and its own value of x at the start of the
round, the process takes the minimum and updates x. After f + 1 rounds, the local value
of x is guaranteed to be the consensus value.


 If one process among three is faulty, then f = 1, so the agreement requires f + 1 = 2 rounds.
 Suppose the faulty process crashes partway through its broadcast, sending 0 to one process
and nothing (or a different value) to another. In the second round, each non-faulty process
relays the value it received, so by the end of round 2 all non-faulty processes have seen the
same set of values and compute the same minimum.
(global constants)
integer: f; // maximum number of crash failures tolerated
(local variables)
integer: x  local value;

Process Pi (1 <= i <= n) executes the consensus algorithm for up to f crash failures:

(1a) for round from 1 to f + 1 do
(1b)   if the current value of x has not been broadcast then
(1c)     broadcast(x);
(1d)   yj  value (if any) received from process j in this round;
(1e)   x  min_j(x, yj);
(1f) output x as the consensus value.

 The agreement condition is satisfied because in the f + 1 rounds, there must be at least one
round in which no process failed.
 In this round, say round r, all the processes that have not failed so far succeed in
broadcasting their values, and all these processes take the minimum of the values
broadcast and received in that round.
 Thus, the local values at the end of round r are the same for all non-failed processes.
 In further rounds, this value may be sent by each process at most once, and no
process will update its value of x.
 The validity condition is satisfied because processes do not send fictitious values in this
failure model.
 For all i, if the initial values are identical, then the only value sent by any process is that
value, and it is the value agreed upon as per the agreement condition.
 The termination condition is seen to be satisfied.
Complexity

 The algorithm requires f + 1 rounds, where f < n. In each round O(n^2) messages are
exchanged, so the total number of messages is O((f + 1) · n^2).
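The (f + 1)-round algorithm above can be exercised with a small Python simulation (the crash model, function name, and parameters below are illustrative assumptions, not part of the original algorithm):

```python
def min_consensus(initial_values, f, crashes=None):
    """Simulate the (f+1)-round min-consensus algorithm for crash failures.

    initial_values: one integer per process.
    f: maximum number of crash failures tolerated.
    crashes: optional dict {pid: (round, recipients)} meaning the process
             crashes in that round after sending only to `recipients`.
    Returns the decided values of the surviving processes.
    """
    n = len(initial_values)
    crashes = crashes or {}
    x = list(initial_values)           # current value at each process
    sent = [set() for _ in range(n)]   # values each process has broadcast
    dead = set()

    for rnd in range(1, f + 2):        # rounds 1 .. f+1
        inbox = [[] for _ in range(n)]
        for p in range(n):
            if p in dead or x[p] in sent[p]:
                continue               # nothing new to broadcast this round
            sent[p].add(x[p])
            if p in crashes and crashes[p][0] == rnd:
                recipients = crashes[p][1]   # partial broadcast, then crash
                dead.add(p)
            else:
                recipients = range(n)
            for q in recipients:
                inbox[q].append(x[p])
        for p in range(n):
            if p not in dead:
                x[p] = min([x[p]] + inbox[p])   # step (1e)
    return [x[p] for p in range(n) if p not in dead]

# Three processes, f = 1: process 0 crashes in round 1 after sending its
# value 0 only to process 1. The extra round lets process 1 relay the 0,
# so both survivors still decide 0.
decisions = min_consensus([0, 5, 7], f=1, crashes={0: (1, [1])})
```

The crash scenario mirrors the three-process example in the text: without the second round, the survivors would decide different values (0 and 5).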


Consensus algorithms for Byzantine failures (synchronous system)

Upper bound on Byzantine processes

 In a system of n processes, the Byzantine agreement problem can be solved in a
synchronous system only if the number of Byzantine processes f satisfies

f <= floor((n - 1)/3)

i.e., agreement is possible only when n >= 3f + 1.

Fig: Impossibility of achieving Byzantine agreement with n = 3 processes and f = 1 malicious process

 With n = 3 and f = 1, the condition f <= floor((n - 1)/3) is violated, since
floor((3 - 1)/3) = 0 < 1; hence, by the bound above, Byzantine agreement is not possible.

 Two cases must be considered: one in which the source P0 is faulty, and one in which the
source P0 is non-faulty but another process is faulty.

 If the source is non-faulty but, say, P2 is faulty, the source sends the same value to P1
and P2, while the faulty P2 relays a different value to P1. P1 then receives conflicting
values and cannot distinguish this case from the one in which the source itself is faulty,
so no algorithm can let it decide correctly in both cases.


 Agreement is possible when f = 1 and the total number of processes is 4, since 4 >= 3f + 1.
Consider a commander Pc and three lieutenants Pa, Pb, and Pd.

 Suppose the source Pc is faulty: it sends 0 to Pb and Pd but 1 to Pa in the first round. In the
second round each lieutenant relays the value it received to the other two: Pa relays 1, while
Pb and Pd, being non-faulty, relay 0.

 Each lieutenant now holds the values {1, 0, 0}: its own value from the commander plus the
two relayed values.

 Taking the majority, every lieutenant obtains 0.

 Thus, even if the source is faulty, the processes reach agreement, and the agreed-upon value
(the agreement variable) is 0.
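The majority-vote step of the four-process example can be sketched as follows (a simplified single-rebroadcast illustration in Python; the function and variable names are assumptions, and tolerating larger f requires the full recursive oral-messages algorithm):

```python
from collections import Counter

def om1_decide(received_from_commander, relayed):
    """One lieutenant's decision: majority over its own value from the
    commander and the values the other lieutenants claim to have received."""
    values = [received_from_commander] + list(relayed)
    return Counter(values).most_common(1)[0][0]

# Faulty commander Pc sends 1 to Pa but 0 to Pb and Pd.
# Each lieutenant relays what it received to the other two.
pa = om1_decide(1, [0, 0])   # relays from Pb, Pd
pb = om1_decide(0, [1, 0])   # relays from Pa, Pd
pd = om1_decide(0, [1, 0])   # relays from Pa, Pb
# pa == pb == pd == 0: all non-faulty lieutenants agree despite the faulty source
```

With only three processes the same vote would see one genuine value against one forged relay, which is exactly why n = 3, f = 1 is impossible.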


4.5 CHECKPOINTING AND ROLLBACK RECOVERY: INTRODUCTION

 Restoring the system to its normal (consistent) state after a failure is called failure recovery.

 Rollback recovery protocols restore the system back to a consistent state after a failure.

 They achieve fault tolerance by periodically saving the state of a process during failure-free
execution.
 They treat a distributed application as a collection of processes that communicate
over a network.

Checkpoints

The saved state is called a checkpoint, and the procedure of restarting from a previously
checkpointed state is called rollback recovery. A checkpoint can be saved on either stable storage
or volatile storage.
Why is rollback recovery of distributed systems complicated?
Messages induce inter-process dependencies during failure-free operation
Rollback propagation
The dependencies among messages may force some of the processes that did not fail to roll back.
This phenomenon of cascaded rollback is called the domino effect.
Uncoordinated checkpointing

If each process takes its checkpoints independently, then the system cannot avoid the domino
effect; this scheme is called independent or uncoordinated checkpointing.
Techniques that avoid domino effect

1. Coordinated checkpointing rollback recovery - Processes coordinate their checkpoints to
form a system-wide consistent state.
2. Communication-induced checkpointing rollback recovery - Forces each process to take
checkpoints based on information piggybacked on application messages.
3. Log-based rollback recovery - Combines checkpointing with logging of non-deterministic
events; relies on the piecewise deterministic (PWD) assumption.
4.6 BACKGROUND AND DEFINITIONS

System model

 A distributed system consists of a fixed number of processes, P1, P2, …, PN, which
communicate only through messages.
 Processes cooperate to execute a distributed application and interact with the outside
world by receiving and sending input and output messages, respectively.
 Rollback-recovery protocols generally make assumptions about the reliability of the
inter-process communication.
 Some protocols assume that the communication uses first-in-first-out (FIFO) order, while
other protocols assume that the communication subsystem can lose, duplicate, or reorder
messages.
 Rollback-recovery protocols therefore must maintain information about the internal
interactions among processes and also the external interactions with the outside world.

An example of a distributed system with three processes.

A local checkpoint

 All processes save their local states at certain instants of time

 A local check point is a snapshot of the state of the process at a given instance
 Assumption

– A process stores all local checkpoints on the stable storage


– A process is able to roll back to any of its existing local checkpoints
 𝐶𝑖,𝑘 – The kth local checkpoint at process 𝑃𝑖

 𝐶𝑖,0 – A process 𝑃𝑖 takes a checkpoint 𝐶𝑖,0 before it starts execution


Consistent states
 A global state of a distributed system is a collection of the individual states of all
participating processes and the states of the communication channels
 Consistent global state
o a global state that may occur during a failure-free execution of the distributed
computation
o if a process’s state reflects a message receipt, then the state of the corresponding
sender must reflect the sending of the message
 A global checkpoint is a set of local checkpoints, one from each process
 A consistent global checkpoint is a global checkpoint such that no message is sent by a
process after taking its local checkpoint that is received by another process before taking
its checkpoint.

 For instance, Figure shows two examples of global states.

 The state in fig (a) is consistent and the state in Figure (b) is inconsistent.

 Note that the consistent state in Figure (a) shows message m1 to have been sent but not
yet received, but that is alright.
 The state in Figure (a) is consistent because, for every message that has been received,
there is a corresponding message send event.
 The state in Figure (b) is inconsistent because process P2 is shown to have received m2
but the state of process P1 does not reflect having sent it.
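The consistency condition can be captured in a short Python check (a toy sketch; representing each process's recorded send/receive events as sets of message ids is an assumption made for the example):

```python
# A global checkpoint is consistent iff every message recorded as received
# is also recorded as sent; messages sent but not yet received (in-transit)
# are allowed, while received-but-never-sent (orphan) messages are not.

def is_consistent(sent_by, received_by):
    """sent_by / received_by: dicts mapping each process to the set of
    message ids recorded in its local checkpoint."""
    all_sent = set().union(*sent_by.values())
    all_received = set().union(*received_by.values())
    return all_received <= all_sent

# Figure (a): m1 sent by P1 but not yet received -> consistent
assert is_consistent({"P1": {"m1"}, "P2": set()},
                     {"P1": set(), "P2": set()})
# Figure (b): P2 received m2 but P1's state does not reflect sending it
assert not is_consistent({"P1": set(), "P2": set()},
                         {"P1": set(), "P2": {"m2"}})
```

The two asserts correspond directly to the consistent and inconsistent states of Figures (a) and (b).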
Interactions with outside world
Outside World Process (OWP)

 It is a special process that interacts with the rest of the system through message passing.


 It is therefore necessary that the outside world sees a consistent behavior of the system
despite failures.
 Thus, before sending output to the OWP, the system must ensure that the state from
which the output is sent will be recovered despite any future failure.
A common approach is to save each input message on the stable storage before allowing the
application program to process it.
An interaction with the outside world to deliver the outcome of a computation is shown on the
process-line by the symbol “||”.
Different types of Messages
1. In-transit message

 messages that have been sent but not yet received

2. Lost messages

 messages whose send is done but receive is undone due to rollback


3. Delayed messages

 messages whose receive is not recorded because the receiving process was
either down or the message arrived after rollback
4. Orphan messages

 messages with receive recorded but message send not recorded

 do not arise if processes roll back to a consistent global state

5. Duplicate messages

 arise due to message logging and replaying during process recovery


Example:


In-transit messages

 In the figure, the global state {C1,8, C2,9, C3,8, C4,8} shows that message m1 has been
sent but not yet received. We call such a message an in-transit message.

 Message m2 is also an in-transit message.


Delayed messages

 Messages whose receive is not recorded because the receiving process was either down or
the message arrived after the rollback of the receiving process, are called delayed
messages.

 For example, messages m2 and m5 in Figure are delayed messages.


Lost messages

 Messages whose send is not undone but receive is undone due to rollback are called lost
messages.

 This type of message occurs when a process rolls back to a checkpoint prior to the
reception of the message while the sender does not roll back beyond the send operation of
the message. In the figure, message m1 is a lost message.
Duplicate messages

 Duplicate messages arise due to message logging and replaying during process recovery.
For example, in Figure, message m4 was sent and received before the rollback.

 However, due to the rollback of process P4 to C4,8 and process P3 to C3,8, both send and
receipt of message m4 are undone.

 When process P3 restarts from C3,8, it will resend message m4.Therefore, P4 should not
replay message m4 from its log.

 If P4 replays message m4, then message m4 is called a duplicate message.

4.7 ISSUES IN FAILURE RECOVERY

 In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and
recovery


 The computation comprises three processes Pi, Pj, and Pk, connected through a
communication network.

 The processes communicate solely by exchanging messages over fault-free, FIFO
communication channels.

 Processes Pi, Pj , and Pk have taken checkpoints

 The rollback of process 𝑃𝑖 to checkpoint 𝐶𝑖,1 created an orphan message.

 Orphan message I is created due to the roll back of process 𝑃𝑗 to checkpoint 𝐶𝑗,1

 Messages C, D, E, and F are potentially problematic

 Message C: a delayed message

 Message D: a lost message, since the send event for D is recorded in the restored
state for 𝑃𝑗, but the receive event has been undone at process 𝑃𝑖.

 Lost messages can be handled by having processes keep a message log of all the sent
messages.

 Messages E, F: delayed orphan messages. After resuming execution from their
checkpoints, processes will generate both of these messages.

Major Issues:

 Domino Effect
 Live-Lock
Domino Effect


 A cumulative effect is produced when one event initiates a series of similar events; here,
the rollback of one process forces other processes to roll back in turn.
 How to avoid this?
o By coordinating checkpoints, rather than allowing each process to take its
checkpoints independently.

Live-Lock
• Livelock is the case where a single failure can cause an infinite number of rollbacks.

• The Livelock problem may arise when a process rolls back to its checkpoint after
a failure and requests all the other affected processes also to roll back.

• In such a situation if the roll back mechanism has no synchronization, it may lead
to the livelock problem as described in the example

• In case I, process Y fails before receiving message ‘n1’ sent by X.

• Y rolls back to y1 with no record of sending message ‘m1’, causing X to roll back
to x1.

• The above sequence can repeat indefinitely


4.8 CHECKPOINT-BASED RECOVERY

Checkpoint-based rollback-recovery techniques can be classified into three categories:

1. Uncoordinated checkpointing

2. Coordinated checkpointing

3. Communication-induced checkpointing

1. Uncoordinated Checkpointing
 Each process has autonomy in deciding when to take checkpoints

 The processes record the dependencies among their checkpoints caused by message
exchange during failure-free operation

Advantages:

 Lower runtime overhead during normal execution


Disadvantages:
 Domino effect during a recovery
 Recovery from a failure is slow because processes need to iterate to find a consistent set of checkpoints
 Each process maintains multiple checkpoints and periodically invokes a garbage collection algorithm
 Not suitable for applications with frequent output commits


2. Coordinated Checkpointing

In coordinated checkpointing, processes orchestrate their checkpointing activities so that all


local checkpoints form a consistent global state
Types

1 Blocking Checkpointing:

 After a process takes a local checkpoint, to prevent orphan messages, it remains
blocked until the entire checkpointing activity is complete

 Disadvantage: The computation is blocked during the checkpointing


2 Non-blocking Checkpointing:
 The processes need not stop their execution while taking checkpoints.
 A fundamental problem in coordinated checkpointing is to prevent a process from
receiving application messages that could make the checkpoint inconsistent.
 Example (a): checkpoint inconsistency

3. Communication-induced Checkpointing

 Communication-induced checkpointing is another way to avoid the domino effect, while


allowing processes to take some of their checkpoints independently.

 Processes may be forced to take additional checkpoints

 Communication-induced checkpointing piggybacks protocol-related information on each
application message
 The receiver of each application message uses the piggybacked information to determine
if it has to take a forced checkpoint to advance the global recovery line
 The forced checkpoint must be taken before the application may process the contents of
the message
Two types of communication-induced checkpointing

1. Model-based checkpointing

2. Index-based checkpointing.

Model-based checkpointing

 Model-based checkpointing prevents patterns of communications and checkpoints


that could result in inconsistent states among the existing checkpoints.
 No control messages are exchanged among the processes during normal operation.
All information necessary to execute the protocol is piggybacked on application
messages
 There are several domino-effect-free checkpoint and communication models.

 The MRS (mark, send, and receive) model of Russell avoids the domino effect by
ensuring that within every checkpoint interval all message receiving events precede
all message-sending events.
Index-based checkpointing.

 Index-based communication-induced checkpointing assigns monotonically increasing


indexes to checkpoints, such that the checkpoints having the same index at different
processes form a consistent state.
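The forced-checkpoint rule of index-based checkpointing can be sketched as follows (a minimal Python illustration; the class, its methods, and the message format are assumptions made for the example, not a standard protocol implementation):

```python
# Index-based communication-induced checkpointing: every message piggybacks
# the sender's checkpoint index, and a receiver takes a forced checkpoint
# BEFORE processing a message carrying a higher index than its own, so that
# checkpoints with the same index form a consistent global state.

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.index = 0            # index of the current checkpoint interval
        self.checkpoints = []     # (index, state) pairs on "stable storage"
        self.state = 0
        self.take_checkpoint()    # C_{i,0} taken before execution starts

    def take_checkpoint(self):
        self.checkpoints.append((self.index, self.state))

    def send(self, payload):
        return (self.index, payload)          # piggyback the current index

    def receive(self, message):
        sender_index, payload = message
        if sender_index > self.index:
            # Forced checkpoint: catch up to the sender's index before
            # processing, keeping same-index checkpoints consistent.
            self.index = sender_index
            self.take_checkpoint()
        self.state += payload                 # now process the message

p, q = Process("P"), Process("Q")
p.index += 1
p.take_checkpoint()          # P takes a basic checkpoint with index 1
q.receive(p.send(10))        # Q is forced to checkpoint at index 1 first
```

After the exchange, Q's checkpoint with index 1 precedes its processing of the message, so the pair of index-1 checkpoints contains no orphan message.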
4.9 KOO AND TOUEG COORDINATED CHECKPOINTING AND RECOVERY TECHNIQUE:
 The Koo and Toueg coordinated checkpointing and recovery technique takes a consistent set
of checkpoints and avoids the domino effect and livelock problems during recovery.
 Includes 2 parts:
o Check pointing algorithm and
o Recovery algorithm
The Checkpointing Algorithm

The checkpoint algorithm makes the following assumptions about the distributed system:

 Processes communicate by exchanging messages through communication channels.

 Communication channels are FIFO.

 Assume that end-to-end protocols (such as the sliding window protocol) exist to cope with
message loss due to rollback recovery and communication failure.
 Communication failures do not divide the network.
 The checkpoint algorithm takes two kinds of checkpoints on the stable storage:


• Permanent and
• Tentative.
 A permanent checkpoint is a local checkpoint at a process and is a part of a consistent global
checkpoint.
A tentative checkpoint is a temporary checkpoint that is made a permanent checkpoint on the
successful termination of the checkpoint algorithm.
The algorithm consists of two phases,
First Phase

1. An initiating process Pi takes a tentative checkpoint and requests all other processes to
take tentative checkpoints. Each process informs Pi whether it succeeded in taking a
tentative checkpoint.
2. A process says “no” to a request if it fails to take a tentative checkpoint

3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides
that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the
tentative checkpoints should be thrown-away.
Second Phase

1. Pi informs all the processes of the decision it reached at the end of the first phase.

2. A process, on receiving the message from Pi will act accordingly.

3. Either all or none of the processes advance the checkpoint by taking permanent
checkpoints.
4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot
send messages related to the basic computation until it is informed of Pi’s decision.
Correctness: for two reasons
i. Either all or none of the processes take permanent checkpoint
ii. No process sends message after taking permanent checkpoint
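The two-phase structure of the algorithm can be sketched in Python (a toy, single-threaded illustration; real implementations exchange messages and handle timeouts and stable storage, and all names here are assumptions):

```python
# Koo-Toueg two-phase coordinated checkpointing sketch:
# Phase 1: the initiator asks everyone for a tentative checkpoint.
# Phase 2: all tentative checkpoints become permanent, or all are discarded.

class Participant:
    def __init__(self, can_checkpoint=True):
        self.can_checkpoint = can_checkpoint
        self.tentative = None
        self.permanent = []
        self.state = "s0"

    def take_tentative(self):
        if self.can_checkpoint:
            self.tentative = self.state
        return self.can_checkpoint       # the "yes"/"no" reply to the initiator

    def decide(self, commit):
        if commit and self.tentative is not None:
            self.permanent.append(self.tentative)   # make it permanent
        self.tentative = None                        # otherwise discard it

def coordinated_checkpoint(initiator, others):
    everyone = [initiator] + others
    # Phase 1: all processes attempt a tentative checkpoint and reply.
    replies = [p.take_tentative() for p in everyone]
    # Phase 2: commit only if every reply was "yes"; otherwise discard all.
    commit = all(replies)
    for p in everyone:
        p.decide(commit)
    return commit

ps = [Participant(), Participant(), Participant(can_checkpoint=False)]
ok = coordinated_checkpoint(ps[0], ps[1:])   # one refusal aborts the round
```

A single "no" reply aborts the round for everyone, which is exactly the all-or-none property that makes the resulting global checkpoint consistent.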

An Optimization

The above protocol may cause a process to take a checkpoint even when it is not necessary for
consistency. Since taking a checkpoint is an expensive operation, such unnecessary checkpoints
should be avoided.
The Rollback Recovery Algorithm

 The rollback recovery algorithm restores the system state to a consistent state after a
failure.

 The rollback recovery algorithm assumes that a single process invokes the algorithm.

 It assumes that the checkpoint and the rollback recovery algorithms are not invoked
concurrently.

 The rollback recovery algorithm has two phases.


First Phase
1. An initiating process Pi sends a message to all other processes to check if they all are
willing to restart from their previous checkpoints.
2. A process may reply “no” to a restart request for any reason (e.g., it is already
participating in a checkpointing or a recovery process initiated by some other process).
3. If Pi learns that all processes are willing to restart from their previous checkpoints, Pi
decides that all processes should roll back to their previous checkpoints.
4. Otherwise, Pi aborts the rollback attempt and may attempt a recovery at a later time.
Second Phase
1. Pi propagates its decision to all the processes.
2. On receiving Pi’s decision, a process acts accordingly.
3. During the execution of the recovery algorithm, a process cannot send messages related
to the underlying computation while it is waiting for Pi’s decision.
Correctness: All processes resume from a consistent state.
Optimization: Not all processes need to roll back, since some of the processes may not have
changed anything since their last checkpoint.
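The two-phase rollback protocol can be sketched in the same style. Again this is an illustrative sketch under assumptions: the `RecoverableProcess` class, its fields, and the `busy` flag (modelling a “no” reply) are inventions for this example, not part of the algorithm as stated.

```python
# Illustrative sketch of the two-phase rollback recovery protocol.
# RecoverableProcess and its fields are assumptions for this example.

class RecoverableProcess:
    def __init__(self, name, checkpoint, busy=False):
        self.name = name
        self.checkpoint = checkpoint   # last permanent checkpoint
        self.state = "state-after-checkpoint"
        self.busy = busy               # e.g. already in another protocol run

    def willing_to_restart(self):
        """First phase: reply to the initiator's restart request."""
        return not self.busy

    def restore(self):
        """Second phase: roll back to the previous checkpoint."""
        self.state = self.checkpoint


def rollback_recovery(initiator, others):
    procs = [initiator] + others
    # First phase: poll every process for its willingness to restart.
    if not all(p.willing_to_restart() for p in procs):
        return False  # abort; the initiator may retry the recovery later
    # Second phase: propagate the decision; every process rolls back.
    for p in procs:
        p.restore()
    return True
```

As in the checkpointing phase, the decision is all-or-nothing: either every process restores its previous checkpoint, or none does and the initiator retries later.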
4.10 ALGORITHM FOR ASYNCHRONOUS CHECKPOINTING AND RECOVERY

This section describes the algorithm of Juang and Venkatesan for recovery in a system that uses
asynchronous checkpointing.
System Model and Assumptions
The algorithm makes the following assumptions about the underlying system:
 The communication channels are reliable, deliver messages in FIFO order, and have
infinite buffers.
 The message transmission delay is arbitrary but finite.
 The underlying computation/application is event-driven: a process P is in state s, receives
message m, processes the message, moves to state s’, and sends messages out. The
triplet (s, m, msgs_sent) thus represents the state of P.
Two types of log storage are maintained:
– Volatile log: short access time, but its contents are lost if the processor crashes.
Records are moved to the stable log periodically.
– Stable log: longer access time, but its contents survive a processor crash.

Asynchronous Checkpointing
– After executing an event, the triplet is recorded without any synchronization with
other processes.
– A local checkpoint consists of a set of such records; they are first stored in the
volatile log and then moved to the stable log.
The Recovery Algorithm
Notations and data structures
The following notations and data structures are used by the algorithm:
• RCVDi←j(CkPti) represents the number of messages received by processor pi from processor
pj, from the beginning of the computation till the checkpoint CkPti.
• SENTi→j(CkPti) represents the number of messages sent by processor pi to processor pj, from
the beginning of the computation till the checkpoint CkPti.
Basic idea
 Since the algorithm is based on asynchronous checkpointing, the main issue in the
recovery is to find a consistent set of checkpoints to which the system can be restored.
 The recovery algorithm achieves this by making each processor keep track of both the
number of messages it has sent to other processors and the number of messages it
has received from other processors.
 Whenever a processor rolls back, it is necessary for all other processors to find out if any
message has become an orphan message. Orphan messages are discovered by comparing
the number of messages sent to and received from neighboring processors.
For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (that is, the number of messages received
by processor pi from processor pj is greater than the number of messages sent by processor pj to
processor pi, according to the current states of the processors), then one or more messages at
processor pj are orphan messages.
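This count comparison can be expressed directly in code. The sketch below is illustrative only; the function names and the event-log representation (a list of cumulative receive counts) are assumptions for this example. Following the worked example later in this section, the rollback target is taken as the latest event whose receive count does not exceed c.

```python
# Minimal sketch of the orphan-message test and the resulting rollback
# target. Function names and data layout are assumptions for this example.

def has_orphan_messages(rcvd_i_from_j, sent_j_to_i):
    """RCVD_{i<-j}(CkPt_i) > SENT_{j->i}(CkPt_j) implies orphan messages:
    pi has recorded receives that pj's restored state never sent."""
    return rcvd_i_from_j > sent_j_to_i


def latest_consistent_event(rcvd_history, c):
    """Given pi's per-event cumulative count of messages received from pj
    (in execution order), return the index of the latest event e with
    RCVD_{i<-j}(e) <= c, i.e. the point pi must roll back to."""
    best = 0
    for idx, rcvd in enumerate(rcvd_history):
        if rcvd <= c:
            best = idx
    return best
```

For instance, if pi's events record 0, 1, then 3 cumulative receives from pj but pj's checkpoint shows only 2 sends, pi must roll back to its second event.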
The Algorithm

When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it
has failed.

Procedure RollBack_Recovery
processor pi executes the following:

STEP (a)
if processor pi is recovering after a failure then
    CkPti := latest event logged in the stable storage
else
    CkPti := latest event that took place in pi {The latest event at pi can be either in stable
    or in volatile storage.}
end if

STEP (b)
for k = 1 to N do {N is the number of processors in the system}
    for each neighboring processor pj do
        compute SENTi→j(CkPti)
        send a ROLLBACK(i, SENTi→j(CkPti)) message to pj
    end for
    for every ROLLBACK(j, c) message received from a neighbor j do
        if RCVDi←j(CkPti) > c then {Implies the presence of orphan messages}
            find the latest event e such that RCVDi←j(e) = c {Such an event e may be in the
            volatile storage or stable storage.}
            CkPti := e
        end if
    end for
end for {for k}
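The procedure above can be simulated on message counts alone. The toy model below is a simplification under stated assumptions: the `Proc` class, the log layout (each event stores cumulative per-neighbour SENT/RCVD counts), and the synchronous rounds are inventions for this sketch; real executions exchange ROLLBACK messages asynchronously.

```python
# Toy simulation of the RollBack_Recovery loop. Each log entry holds the
# cumulative per-neighbour SENT/RCVD counts at that event; CkPt_i is just
# an index into the log, and rolling back means selecting an earlier event.

class Proc:
    def __init__(self, name, log):
        self.name = name
        self.log = log              # [{"sent": {j: n}, "rcvd": {j: n}}, ...]
        self.ckpt = len(log) - 1    # restart point: latest logged event

    def sent_to(self, j):
        return self.log[self.ckpt]["sent"].get(j, 0)

    def rcvd_from(self, j):
        return self.log[self.ckpt]["rcvd"].get(j, 0)

    def on_rollback(self, j, c):
        # RCVD_{i<-j}(CkPt_i) > c implies orphan messages: roll back to
        # the latest event whose receive count from j does not exceed c.
        while self.rcvd_from(j) > c:
            self.ckpt -= 1


def recover(procs):
    """Run N synchronous rounds of ROLLBACK(i, SENT) exchanges."""
    for _ in range(len(procs)):
        sent = {(p.name, q.name): p.sent_to(q.name)
                for p in procs for q in procs if p is not q}
        for p in procs:
            for q in procs:
                if p is not q:
                    p.on_rollback(q.name, sent[(q.name, p.name)])
    return {p.name: p.ckpt for p in procs}
```

For instance, if P restarts from an event where it had sent only one message to Q while Q's latest event records two receives from P, the second receive is an orphan and Q rolls back one event.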
An Example

Consider the example shown in Figure 2, consisting of three processors. Suppose processor Y
fails and restarts. If event ey2 is the latest checkpointed event at Y, then Y will restart from the
state corresponding to ey2.

Figure 2: An example of the Juang-Venkatesan algorithm.
 Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is
initiated at processors X and Z.
 Initially, X, Y, and Z set CkPtX ← ex3, CkPtY ← ey2, and CkPtZ ← ez2, respectively,
and X, Y, and Z send the following messages during the first iteration:
 Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z;
 X sends ROLLBACK(X,2) to Y and ROLLBACK(X,0) to Z;
 Z sends ROLLBACK(Z,0) to X and ROLLBACK(Z,1) to Y.
Since RCVDX←Y(CkPtX) = 3 > 2 (2 is the value received in the ROLLBACK(Y,2) message
from Y), X will set CkPtX to ex2, satisfying RCVDX←Y(ex2) = 1 ≤ 2.
Since RCVDZ←Y(CkPtZ) = 2 > 1, Z will set CkPtZ to ez1, satisfying RCVDZ←Y(ez1) = 1 ≤ 1.
At Y, RCVDY←X(CkPtY) = 1 < 2 and RCVDY←Z(CkPtY) = 1 = SENTZ→Y(CkPtZ), so Y need
not roll back further.

In the second iteration, Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z; Z sends
ROLLBACK(Z,1) to Y and ROLLBACK(Z,0) to X; and X sends ROLLBACK(X,0) to Z and
ROLLBACK(X,1) to Y.
If Y rolls back beyond ey3 and loses the message from X that caused ey3, X can resend this
message to Y, because ex2 is logged at X and the message is available in the log. The second and
third iterations progress in the same manner. The set of recovery points chosen at the end of
the first iteration, {ex2, ey2, ez1}, is consistent, and no further rollbacks occur.
