unit 4 final-1
Synchronous/asynchronous communication:
Synchronous Computation:
Processes run in a lock-step manner: a process receives a message sent to it earlier, performs computation, and sends a message to another process.
A step of synchronous computation is called a round.
Asynchronous Computation:
Computation does not proceed in lock step.
CS3551-DISTRIBUTED COMPUTING –UNIT 4
A process can send and receive messages and perform computation at any time.
Network connectivity:
The system has full logical connectivity, i.e., each process can communicate with any
other by direct message passing.
Sender identification:
A process that receives a message always knows the identity of the sender process.
When multiple messages are expected from the same sender in a single round, a
scheduling algorithm is employed that sends these messages in sub-rounds, so that each
message sent within the round can be uniquely identified.
Channel reliability:
The channels are reliable, and only the processes may fail.
Authenticated vs. non-authenticated messages:
With unauthenticated messages, when a faulty process relays a message to other processes,
(i) it can forge the message and claim that it was received from another process, and
(ii) it can tamper with the contents of a received message before relaying it.
When a process receives a message, it has no way to verify its authenticity. Such a message is known as an unauthenticated message, an oral message, or an unsigned message.
With authentication via techniques such as digital signatures, it is easier to solve the agreement problem: if some process forges a message or tampers with the contents of a received message before relaying it, the recipient can detect the forgery or tampering. Such a message is known as an authenticated or signed message.
Agreement variable:
The agreement variable may be boolean or multivalued, and need not be an integer.
This simplifying assumption does not affect the results for other data types, but helps in
the abstraction while presenting the algorithms.
Byzantine Generals Problem
The Byzantine Generals’ Problem (BGP) is a classic problem faced by any
distributed computer system network.
Imagine that the Eastern Roman Empire (also known as the Byzantine Empire) has decided to capture a city.
There is fierce resistance from within the city.
The Byzantine army has completely encircled the city.
The army has many divisions, and each division has a general.
The generals communicate with each other, as well as with the lieutenants within their division, only through messengers.
All the generals or commanders have to agree upon one of two plans of action: the exact time to attack all at once, or, if faced with fierce resistance, the time to retreat all at once. The army cannot hold on forever.
If the attack or retreat happens without full strength, it means only one thing: unacceptable, brutal defeat.
If all generals and messengers were trustworthy, the problem would have a very simple solution.
However, some of the messengers and even a few generals/commanders are traitors. They may be spies or even enemy soldiers.
There is a very high chance that they will not follow orders or will pass on an incorrect message. The level of trust in the army is very low.
Consider just the case of 1 commander and 2 lieutenants, and just 2 types of messages: “Attack” and “Retreat”.
Performance Aspects of Agreement Protocols:
A few performance metrics are as follows:
Time: the number of rounds needed to reach an agreement.
Message Traffic: the number of messages exchanged to reach an agreement.
Storage Overhead: the amount of information that needs to be stored at processors during execution of the protocol.
Problem Specifications
Byzantine agreement problem
Consensus problem
Interactive consistency problem.
Byzantine agreement problem:
The Byzantine agreement problem requires a designated source process, with an initial value, to reach agreement with the other processes about its initial value, subject to:
Agreement: All non-faulty processes must agree on the same value.
Validity: If the source process is non-faulty, then the value agreed upon by all the non-faulty processes must be the same as the initial value of the source.
Termination: Each non-faulty process must eventually decide on a value.
Consensus Problem
Every process has an initial value, and all the correct processes must agree on a single value. This is the consensus problem.
The requirements of the consensus problem are:
Agreement: All non-faulty processes must agree on the same (single) value.
Validity: If all the non-faulty processes have the same initial value, then the
agreed upon value by all the non-faulty processes must be that same value.
Termination: Each non-faulty process must eventually decide on a value.
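These three conditions can be expressed as a small check. The sketch below is a hypothetical helper (not part of the source material) that tests a consensus outcome against Agreement, Validity, and Termination:

```python
def check_consensus(initial, decided, faulty):
    """Check Agreement, Validity, Termination for a consensus outcome.

    initial - dict: process id -> initial value
    decided - dict: process id -> decided value (present only if decided)
    faulty  - set of faulty process ids
    """
    correct = set(initial) - faulty
    # Termination: every non-faulty process eventually decides.
    termination = all(p in decided for p in correct)
    # Agreement: all non-faulty processes decide on the same value.
    values = {decided[p] for p in correct if p in decided}
    agreement = len(values) <= 1
    # Validity: if all non-faulty initial values are equal, that is the decision.
    inits = {initial[p] for p in correct}
    validity = (len(inits) != 1) or values == inits
    return termination and agreement and validity

# Example: process 3 is faulty; the correct processes start with 0 and decide 0.
print(check_consensus({1: 0, 2: 0, 3: 1}, {1: 0, 2: 0}, {3}))  # True
```

Note that the faulty process's decision (if any) is ignored: the three conditions constrain only the non-faulty processes.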
Interactive Consistency Problem
Every process has an initial value, and all the correct processes must agree upon a set of values, with one value for each process. This is the interactive consistency problem.
The formal specifications are:
Agreement: All non-faulty processes must agree on the same array of values A = [v1, …, vn].
Validity: If process i is non-faulty and its initial value is vi, then all non-faulty processes agree on vi as the ith element of the array A. If process j is faulty, then the non-faulty processes can agree on any value for A[j].
Termination: Each non-faulty process must eventually decide on the array A.
The difference between the agreement problem and the consensus problem is that, in the
agreement problem, a single process has the initial value, whereas in the consensus
problem, all processes have an initial value.
4.2 OVERVIEW OF RESULTS:
Some important facts to remember are:
Consensus is not solvable in asynchronous systems even if one process can fail by crashing. Consensus is attainable in the no-failure case.
In a synchronous system, common knowledge of the consensus value is also attainable. In the asynchronous case, concurrent common knowledge of the consensus value is attainable.
S.No | Failure Mode | Synchronous System | Asynchronous System
1 | No failure | Agreement is attainable. Common knowledge is also attainable. | Agreement is attainable. Concurrent common knowledge is also attainable.
If one process among three processes is faulty, then f = 1, so agreement requires f + 1 = 2 rounds.
Suppose the faulty process sends 0 to one process and 1 to another. In the first round, the process that received 0 broadcasts 0, and the process that received 1 broadcasts 1.
This completes one round; in the next round, the process that received 1 relays 1, and the process that received 0 relays 0.
(global constants)
integer: f; // maximum number of crash failures tolerated
(local variables)
integer: x ← local value;

Process Pi (1 ≤ i ≤ n) executes the consensus algorithm for up to f crash failures:
(1a) for round from 1 to f + 1 do
(1b)   if the current value of x has not been broadcast then
(1c)     broadcast(x);
(1d)   yj ← value (if any) received from process j in this round;
(1e)   x ← min(x, yj) over all j;
(1f) output x as the consensus value.
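The round-by-round behavior of this algorithm can be sketched as a small simulation. The code below is a simplified model under the assumption that a crashed process sends nothing in the round it crashes (a real crash can occur mid-broadcast, reaching only some processes); `crash_consensus` and its parameters are illustrative names, not from the source:

```python
def crash_consensus(values, crash_round, f):
    """Simulate f+1 rounds of min-value consensus under (clean) crash failures.

    values      - list of initial integer values, one per process
    crash_round - dict: process index -> round in which it crashes
    f           - maximum number of crash failures tolerated
    """
    n = len(values)
    x = list(values)             # current value x at each process
    last_sent = [None] * n       # last value each process broadcast
    alive = [True] * n
    for r in range(1, f + 2):    # rounds 1 .. f+1
        for p, cr in crash_round.items():
            if cr == r:
                alive[p] = False   # crashes before sending in round r
        # each alive process broadcasts x if its current value was not broadcast yet
        msgs = []
        for p in range(n):
            if alive[p] and x[p] != last_sent[p]:
                msgs.append(x[p])
                last_sent[p] = x[p]
        # each alive process takes the minimum over its value and received values
        for p in range(n):
            if alive[p]:
                for y in msgs:
                    x[p] = min(x[p], y)
    return {p: x[p] for p in range(n) if alive[p]}

# Three processes, f = 1: P0 crashes in round 2; the survivors agree on the minimum.
print(crash_consensus([2, 5, 7], {0: 2}, f=1))  # {1: 2, 2: 2}
```

Since at least one of the f + 1 rounds is failure-free, every surviving process sees the same set of broadcasts in that round and converges to the same minimum.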
The agreement condition is satisfied because in the f + 1 rounds, there must be at least one round in which no process failed.
In this round, say round r, all the processes that have not failed so far succeed in
broadcasting their values, and all these processes take the minimum of the values
broadcast and received in that round.
Thus, the local values at the end of the round are the same, say x_i^r, for all non-failed processes.
In further rounds, only this value may be sent by each process at most once, and no process i will update its value x_i^r.
The validity condition is satisfied because processes do not send fictitious values in this failure model.
For all i, if the initial value is identical, then the only value sent by any process is that value, and it is the value agreed upon as per the agreement condition.
The termination condition is seen to be satisfied.
Complexity
The algorithm requires f + 1 rounds, where f < n. In each round O(n^2) messages are sent, so the total number of messages is O((f + 1) · n^2).
Consider a system with up to f malicious processes.
Byzantine agreement requires n ≥ 3f + 1 processes, i.e., f ≤ (n − 1)/3. With n = 3 and f = 1 this condition is violated, so Byzantine agreement is not possible.
Consider two cases: in one case the source P0 is faulty, and in the other the source P0 is non-faulty.
If the source is non-faulty but some other process, say P2, is faulty, then P0 sends the same value to P1 and P2, but the faulty P2 relays a different value.
Agreement is possible when f = 1 and the total number of processes is 4. To see how, consider the commander Pc.
Pc is the source and is faulty: in the first round it sends 0 to Pd and Pb, but 1 to Pa. Pa, after receiving 1, relays 1 to both of its neighbors; similarly, Pb, after receiving 0, relays 0, since it is not faulty.
Now consider the values received at each lieutenant: one sees 1, 0, and 0, so the majority is 0; the others likewise see 1, 0, and 0, so their majority is also 0.
In this case, even though the source is faulty, the processes reach an agreement, and the agreed-upon value (the agreement variable) is 0.
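The majority step in this example can be sketched as follows. This is a simplified, hypothetical model of the single relay round with four processes (one faulty commander and three correct lieutenants), not a full oral-messages protocol:

```python
from collections import Counter

def majority(values):
    """Return the most common value among those received."""
    return Counter(values).most_common(1)[0][0]

# Faulty commander Pc sends 1 to Pa but 0 to Pb and Pd (as in the example).
first_round = {"Pa": 1, "Pb": 0, "Pd": 0}

# Each non-faulty lieutenant relays what it received to the other two,
# then decides by majority over its own value and the relayed values.
decisions = {}
for p in first_round:
    relayed = [first_round[q] for q in first_round if q != p]
    decisions[p] = majority([first_round[p]] + relayed)

print(decisions)  # {'Pa': 0, 'Pb': 0, 'Pd': 0}
```

Every lieutenant sees the multiset {1, 0, 0}, so all of them decide 0, matching the agreement reached in the text.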
Restoring the system to its normal state after a failure is called failure recovery.
Rollback recovery protocols restore the system back to a consistent state after a failure.
They achieve fault tolerance by periodically saving the state of a process during failure-free execution.
They treat a distributed system application as a collection of processes that communicate over a network.
Checkpoints
The saved state is called a checkpoint, and the procedure of restarting from a previously checkpointed state is called rollback recovery. A checkpoint can be saved on either stable storage or volatile storage.
Why is rollback recovery of distributed systems complicated?
Messages induce inter-process dependencies during failure-free operation
Rollback propagation
The dependencies among messages may force some of the processes that did not fail to roll back.
This phenomenon of cascaded rollback is called the domino effect.
Uncoordinated checkpointing
If each process takes its checkpoints independently, then the system cannot avoid the domino effect; this scheme is called independent or uncoordinated checkpointing.
Techniques that avoid domino effect
System model
A local checkpoint
A local checkpoint is a snapshot of the state of the process at a given instant.
Assumption
Consistent states
A global state of a distributed system is a collection of the individual states of all
participating processes and the states of the communication channels
Consistent global state
o a global state that may occur during a failure-free execution of the distributed computation
o if a process’s state reflects a message receipt, then the state of the corresponding sender must reflect the sending of the message
A global checkpoint is a set of local checkpoints, one from each process
A consistent global checkpoint is a global checkpoint such that no message is sent by a process after taking its local checkpoint that is received by another process before taking its checkpoint.
The state in fig (a) is consistent and the state in Figure (b) is inconsistent.
Note that the consistent state in Figure (a) shows message m1 to have been sent but not
yet received, but that is alright.
The state in Figure (a) is consistent because, for every message that has been received, there is a corresponding send event.
The state in Figure (b) is inconsistent because process P2 is shown to have received m2
but the state of process P1 does not reflect having sent it.
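This consistency condition can be checked mechanically. The sketch below uses a hypothetical encoding (each process numbers its events, and a checkpoint is the index of the last event it covers) to test a global checkpoint for orphan messages:

```python
def is_consistent(checkpoints, messages):
    """Check a global checkpoint against the recorded messages.

    checkpoints - dict: process -> index of its last checkpointed event
    messages    - list of (sender, send_event, receiver, recv_event) tuples
    Consistent iff no message has its receive reflected in the state
    while its send is not.
    """
    for sender, send_ev, receiver, recv_ev in messages:
        received = recv_ev <= checkpoints[receiver]   # receipt is in the state
        sent = send_ev <= checkpoints[sender]         # send is in the state
        if received and not sent:
            return False   # orphan message: receive recorded, send not
    return True

# Like m1 in Figure (a): sent (event 3 <= 3) but not yet received (5 > 4) - fine.
print(is_consistent({"P1": 3, "P2": 4}, [("P1", 3, "P2", 5)]))  # True
# Like m2 in Figure (b): received (2 <= 4) but send (5) is after P1's checkpoint.
print(is_consistent({"P1": 3, "P2": 4}, [("P1", 5, "P2", 2)]))  # False
```

A message that is sent but not yet received (in transit) does not violate consistency; only a receive without a matching send does.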
Interactions with outside world
Outside World Process (OWP)
It is a special process that interacts with the rest of the system through message passing.
It is therefore necessary that the outside world see a consistent behavior of the system
despite failures.
Thus, before sending output to the OWP, the system must ensure that the state from
which the output is sent will be recovered despite any future failure.
A common approach is to save each input message on the stable storage before allowing the
application program to process it.
An interaction with the outside world to deliver the outcome of a computation is shown on the
process-line by the symbol “||”.
Different types of Messages
1. In-transit messages
2. Lost messages
3. Delayed messages – messages whose receive is not recorded because the receiving process was either down or the message arrived after rollback
4. Orphan messages
5. Duplicate messages
In-transit messages
In Figure the global state {C1,8 , C2, 9 , C3,8, C4,8} shows that message m1 has been
sent but not yet received. We call such a message an in-transit message.
Messages whose receive is not recorded because the receiving process was either down or
the message arrived after the rollback of the receiving process, are called delayed
messages.
Messages whose send is not undone but whose receive is undone due to rollback are called lost messages.
This type of message occurs when the process rolls back to a checkpoint prior to the reception of the message, while the sender does not roll back beyond the send operation of the message. In the figure, message m1 is a lost message.
Duplicate messages
Duplicate messages arise due to message logging and replaying during process recovery.
For example, in Figure, message m4 was sent and received before the rollback.
However, due to the rollback of process P4 to C4,8 and process P3 to C3,8, both send and
receipt of message m4 are undone.
When process P3 restarts from C3,8, it will resend message m4. Therefore, P4 should not replay message m4 from its log.
In a failure recovery, we must not only restore the system to a consistent state, but also appropriately handle messages that are left in an abnormal state due to the failure and recovery.
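The abnormal message states can be told apart by whether the send and receive events survive in the restored state. The function below is a simplified, illustrative classifier (its name and labels are assumptions, not from the source):

```python
def classify(send_in_state, recv_in_state):
    """Classify a message with respect to a restored global state.

    send_in_state - the send event is still reflected in the restored state
    recv_in_state - the receive event is still reflected in the restored state
    """
    if send_in_state and not recv_in_state:
        return "lost or in-transit"   # send survives; receive undone or pending
    if recv_in_state and not send_in_state:
        return "orphan"               # receive survives; send has been undone
    return "normal"

print(classify(True, False))   # like m1 above: "lost or in-transit"
print(classify(False, True))   # like an orphan message: "orphan"
```

Duplicates are a separate concern: they arise from replaying logged messages that the sender will resend anyway, so the recovery protocol must suppress the replay rather than classify the message by its surviving events.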
The computation comprises three processes Pi, Pj, and Pk, connected through a communication network.
The processes communicate solely by exchanging messages over fault-free, FIFO communication channels.
Orphan message I is created due to the rollback of process Pj to checkpoint Cj,1.
Message D is a lost message, since the send event for D is recorded in the restored state for Pj, but the receive event has been undone at process Pi.
Lost messages can be handled by having processes keep a message log of all the sent messages.
Major Issues:
Domino Effect
Live-Lock
Domino Effect
A cumulative effect is produced when one event initiates a series of similar events.
How does the domino effect arise?
o By allowing processes to take checkpoints independently.
Live-Lock
• Livelock is the case where a single failure can cause an infinite number of rollbacks.
• The livelock problem may arise when a process rolls back to its checkpoint after a failure and requests all the other affected processes to also roll back.
• In such a situation, if the rollback mechanism has no synchronization, it may lead to the livelock problem, as described in the example.
1. Uncoordinated checkpointing
2. Coordinated checkpointing
3. Communication-induced checkpointing
1. Uncoordinated Checkpointing
Each process has autonomy in deciding when to take checkpoints
The processes record the dependencies among their checkpoints caused by message
exchange during failure-free operation
Advantages:
2. Coordinated Checkpointing
1 Blocking Checkpointing:
3. Communication-induced Checkpointing
Two types of communication-induced checkpointing
1. Model-based checkpointing
2. Index-based checkpointing.
Model-based checkpointing
The MRS (mark, send, and receive) model of Russell avoids the domino effect by
ensuring that within every checkpoint interval all message receiving events precede
all message-sending events.
Index-based checkpointing.
The checkpoint algorithm makes the following assumptions about the distributed system:
Assume that end-to-end protocols (such as the sliding window protocol) exist to handle message loss due to rollback recovery and communication failure.
Communication failures do not partition the network.
The checkpoint algorithm takes two kinds of checkpoints on the stable storage:
• Permanent and
• Tentative.
A permanent checkpoint is a local checkpoint at a process and is a part of a consistent global
checkpoint.
A tentative checkpoint is a temporary checkpoint that is made a permanent checkpoint on the
successful termination of the checkpoint algorithm.
The algorithm consists of two phases,
First Phase
1. An initiating process Pi takes a tentative checkpoint and requests all other processes to
take tentative checkpoints. Each process informs Pi whether it succeeded in taking a
tentative checkpoint.
2. A process says “no” to a request if it fails to take a tentative checkpoint.
3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides
that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the
tentative checkpoints should be discarded.
Second Phase
1. Pi informs all the processes of the decision it reached at the end of the first phase.
2. Either all or none of the processes advance the checkpoint by taking permanent checkpoints.
3. The algorithm requires that after a process has taken a tentative checkpoint, it cannot send messages related to the basic computation until it is informed of Pi’s decision.
Correctness: the algorithm is correct for two reasons:
i. Either all or none of the processes take a permanent checkpoint.
ii. No process sends a message after taking a permanent checkpoint.
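The two-phase structure above can be sketched as follows. The `Proc` interface is hypothetical (method names are illustrative), and a real implementation must also enforce the no-message rule between the phases:

```python
def coordinated_checkpoint(processes):
    """Two-phase coordinated checkpointing, run by an initiator."""
    # Phase 1: ask every process to take a tentative checkpoint;
    # each reports whether it succeeded (the list avoids short-circuiting,
    # so every process is asked).
    ok = all([p.take_tentative() for p in processes])
    # Phase 2: commit for all processes, or abort for all processes.
    for p in processes:
        p.make_permanent() if ok else p.discard()
    return ok

class Proc:
    def __init__(self, succeeds=True):
        self.succeeds = succeeds
        self.state = "none"
    def take_tentative(self):
        if self.succeeds:
            self.state = "tentative"
        return self.succeeds
    def make_permanent(self):
        if self.state == "tentative":
            self.state = "permanent"
    def discard(self):
        self.state = "none"

procs = [Proc(), Proc(), Proc()]
print(coordinated_checkpoint(procs), [p.state for p in procs])
# True ['permanent', 'permanent', 'permanent']
```

If any process answers “no” in phase 1, the initiator's decision makes every process discard its tentative checkpoint, preserving the all-or-none property.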
An Optimization
The above protocol may cause a process to take a checkpoint even when it is not necessary for consistency. Since taking a checkpoint is an expensive operation, such unnecessary checkpoints should be avoided.
The Rollback Recovery Algorithm
The rollback recovery algorithm restores the system state to a consistent state after a
failure.
The rollback recovery algorithm assumes that a single process invokes the algorithm.
It assumes that the checkpoint and the rollback recovery algorithms are not invoked concurrently.
First Phase
1. An initiating process Pi sends a message to all other processes to check if they all are
willing to restart from their previous checkpoints.
2. A process may reply “no” to a restart request for any reason (e.g., it is already participating in a checkpointing or a recovery process initiated by some other process).
3. If Pi learns that all processes are willing to restart from their previous checkpoints, Pi
decides that all processes should roll back to their previous checkpoints. Otherwise,
4. Pi aborts the roll back attempt and it may attempt a recovery at a later time.
Second Phase
1. Pi informs all the processes of the decision it reached at the end of the first phase, and the processes act accordingly.
2. During the execution of the recovery algorithm, a process cannot send messages related to the underlying computation while it is waiting for Pi’s decision.
Correctness: Resume from a consistent state
Optimization: it may not be necessary to roll back all processes, since some of them did not change anything.
The algorithm of Juang and Venkatesan is used for recovery in a system that uses asynchronous checkpointing.
System Model and Assumptions
The algorithm makes the following assumptions about the underlying system:
The communication channels are reliable, deliver the messages in FIFO order and have
infinite buffers.
The message transmission delay is arbitrary, but finite.
The underlying computation/application is event-driven: a process P is at state s, receives message m, processes the message, moves to state s’, and sends messages out. So the triplet (s, m, msgs_sent) represents the state of P.
Two types of log storage are maintained:
– Volatile log: short access time, but lost if the processor crashes. Records are moved to the stable log periodically.
– Stable log: longer access time, but contents survive a crash.
– After executing an event, the triplet is recorded without any synchronization with other processes.
– A local checkpoint consists of a set of records; they are first stored in the volatile log and then moved to the stable log.
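A sketch of this logging scheme, with an illustrative `Processor` class (the triplet layout and method names are assumptions for this example, not from Juang and Venkatesan's paper):

```python
class Processor:
    """Asynchronous checkpointing: after each event, log the triplet
    (state, message, messages_sent) without coordinating with anyone."""
    def __init__(self, state):
        self.state = state
        self.volatile_log = []   # fast to access, lost on a crash
        self.stable_log = []     # slower, but survives a crash

    def handle(self, msg, next_state, msgs_sent):
        self.state = next_state
        # record the triplet immediately, with no synchronization
        self.volatile_log.append((self.state, msg, tuple(msgs_sent)))

    def flush(self):
        # periodically move volatile records to stable storage
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

p = Processor("s0")
p.handle("m1", "s1", ["to_q"])
p.flush()
p.handle("m2", "s2", [])
# After a crash, only stable_log survives: the s2 record would be lost.
print(p.stable_log)  # [('s1', 'm1', ('to_q',))]
```

The gap between the volatile and stable logs is exactly why the recovery algorithm must search both kinds of storage for the latest usable event.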
The Recovery Algorithm
Notations and data structure
The following notations and data structure are used by the algorithm:
Since the algorithm is based on asynchronous checkpointing, the main issue in recovery is to find a consistent set of checkpoints to which the system can be restored.
The recovery algorithm achieves this by making each processor keep track of both the
number of messages it has sent to other processors as well as the number of messages it
has received from other processors.
Whenever a processor rolls back, it is necessary for all other processors to find out if any
message has become an orphan message. Orphan messages are discovered by comparing
the number of messages sent to and received from neighboring processors.
For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (that is, the number of messages received by processor pi from processor pj is greater than the number of messages sent by processor pj to processor pi, according to the current states of the processors), then one or more messages received by pi have become orphan messages, and pi must roll back.
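The counter comparison can be sketched directly. The dictionaries below are a hypothetical encoding of the RCVD and SENT counters:

```python
def find_orphans(rcvd, sent):
    """Detect orphan messages by comparing per-channel counters.

    rcvd - dict: (i, j) -> number of messages i has received from j,
           according to i's current (restored) state
    sent - dict: (j, i) -> number of messages j has sent to i,
           according to j's current (restored) state
    """
    orphans = {}
    for (i, j), r in rcvd.items():
        s = sent.get((j, i), 0)
        if r > s:                 # i recorded receipts that j never sent
            orphans[(i, j)] = r - s
    return orphans

# X has received 3 messages from Y, but Y's restored state shows only 2 sent:
print(find_orphans({("X", "Y"): 3}, {("Y", "X"): 2}))  # {('X', 'Y'): 1}
```

A non-empty result tells the receiver how far it must roll back: to an event where its receive count again agrees with the sender's send count.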
When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it had failed.
Procedure RollBack_Recovery
STEP (a)
STEP (b)
end for
find the latest event e such that RCVDi←j(e) = c {such an event e may be in the volatile storage or stable storage}
CkPti := e
end if
end for
end for {for k}
An Example
Since RCVDX←Y(CkPtX) = 3 > 2 (2 is the value received in the ROLLBACK(Y,2) message from Y), X will set CkPtX to ex2, satisfying RCVDX←Y(ex2) = 1 ≤ 2.
Since RCVDZ←Y(CkPtZ) = 2 > 1, Z will set CkPtZ to ez1, satisfying RCVDZ←Y(ez1) = 1 ≤ 1.
At Y, RCVDY←X(CkPtY) = 1 < 2 and RCVDY←Z(CkPtY) = 1 = SENTZ→Y(CkPtZ).