
Agreement Protocols Agreement Problem: M Out of N Processors May Fail

The document discusses agreement protocols for distributed systems. It describes the agreement problem where all sites must agree on a value. The system model assumes some number of processors may fail. Agreement is impossible in asynchronous systems, even with single processor failures. Several failure modes are described including crash, omission, and Byzantine faults. The taxonomy of agreement problems includes Byzantine agreement, consensus, and interactive consistency. Impossibility results are presented, as well as algorithms like the Lamport-Shostak-Pease algorithm for solving Byzantine agreement.

Uploaded by uma_sai
Copyright: © All Rights Reserved

Agreement Protocols

Agreement Problem
• all sites must agree on a value, say, 0 or 1
• example: decision to commit a DB transaction
• just voting is not enough
• processors may send inconsistent votes to different sites

System Model
Assume:
• m out of n processors may fail
• system is fully connected, pairwise
• receiver knows sender's identity
• communications are reliable

Synchronous vs. Asynchronous


• synchronous: all processors proceed in "lock step"
• asynchronous: each processor proceeds at its own pace

The agreement problem is not solvable by any deterministic protocol in an asynchronous system, even if only a single processor may fail.

Failure Modes
• crash fault
• omission fault
• malicious (Byzantine) fault

The synchronous model allows detection of the first two kinds of failures.

Byzantine failures may be due to hardware or software failures, or due to malicious attacks.

Other Issues
• authentication of messages
• metrics: time, message traffic, storage overhead

Taxonomy of Problems
All non-faulty processors must agree on value(s) from a non-faulty processor

• Byzantine agreement:
o The source processor broadcasts its initial value to all other processors.
o Agreement: All nonfaulty processors agree on the same value.
o Validity: If the source processor is nonfaulty, then the common value agreed upon by all nonfaulty processors should be the initial value of the source.
• consensus:
o Every processor broadcasts its initial value to all other processors.
o Agreement: All nonfaulty processors agree on the same value.
o Validity: If the initial value of every nonfaulty processor is v, then the common value agreed upon by all nonfaulty processors must be v.
• interactive consistency:
o Every processor broadcasts its initial value to all other processors.
o Agreement: All nonfaulty processors agree on the same vector (v_1, v_2, ..., v_n).
o Validity: If the ith processor is nonfaulty and its initial value is v_i, then the ith value to be agreed on by all nonfaulty processors must be v_i.

Byzantine agreement is the most basic one.

Algorithms to solve the other problems can be constructed from an algorithm to solve
the Byzantine agreement problem, though more direct algorithms may also exist.
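The reduction can be sketched in a few lines. This Python sketch assumes a hypothetical callable `ba(source, value, processors)` that returns the value all nonfaulty processors agree on when `source` broadcasts `value`. Here a fault-free stand-in replaces it. Interactive consistency runs one Byzantine agreement per processor. Consensus takes the majority of the resulting vector and falls back to an assumed default of 0.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical interface: ba(source, value, processors) returns the value that
# all nonfaulty processors agree on for `source`'s broadcast of `value`.
ByzantineAgreement = Callable[[int, int, List[int]], int]

DEFAULT = 0  # assumed default when no value has a strict majority


def interactive_consistency(initial: Dict[int, int],
                            ba: ByzantineAgreement) -> List[int]:
    """Run one Byzantine agreement instance per processor, each acting as source."""
    procs = sorted(initial)
    return [ba(p, initial[p], procs) for p in procs]


def consensus(initial: Dict[int, int], ba: ByzantineAgreement) -> int:
    """Agree on the strict-majority entry of the interactive-consistency vector."""
    vector = interactive_consistency(initial, ba)
    value, count = Counter(vector).most_common(1)[0]
    return value if 2 * count > len(vector) else DEFAULT


def fault_free_ba(source: int, value: int, procs: List[int]) -> int:
    # Stand-in for a real protocol such as OM(m): with no faults,
    # every processor simply accepts the source's value.
    return value


if __name__ == "__main__":
    initial = {0: 1, 1: 1, 2: 1, 3: 0}
    print(interactive_consistency(initial, fault_free_ba))  # [1, 1, 1, 0]
    print(consensus(initial, fault_free_ba))                # 1
```

The majority step is safe only because nonfaulty processors form a majority when n ≥ 3m+1. If they all start with v, their entries in the agreed vector are all v.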

Impossibility Results
• Byzantine agreement is impossible if m > ⌊(n−1)/3⌋
o e.g., for n = 3, ⌊(3−1)/3⌋ = 0, so not even a single faulty processor can be tolerated
• Byzantine agreement is impossible with fewer than m+1 message exchanges
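The first bound says n processors can tolerate at most ⌊(n−1)/3⌋ traitors. The second puts a floor of m+1 on the number of exchanges. Here is a minimal Python sketch of both checks. The function names are illustrative and do not come from the text.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest m that n processors can tolerate: floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("need at least one processor")
    return (n - 1) // 3


def min_message_exchanges(m: int) -> int:
    """Lower bound on message exchanges when m processors may fail."""
    return m + 1


if __name__ == "__main__":
    for n in (3, 4, 7, 10):
        m = max_byzantine_faults(n)
        print(f"n={n}: m <= {m}, at least {min_message_exchanges(m)} exchanges")
```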

We will see some algorithms for solving the Byzantine agreement problem that fall
within these bounds. However, we will also see that the algorithms are fairly complex.
This should naturally lead one to think twice when designing a system, to see if there
is a way to avoid creating situations that require agreement.
See the following simple example with 3 processors, from the text. The arrows indicate
state information made available to other nodes. In the first case, processor A initiates
the agreement protocol and processor B is maliciously faulty.

C sees that B has decided for 0 and A has decided for 1. To satisfy the Byzantine
agreement problem, C must decide for 1, since A is not faulty and A has decided for
1. This implies that the algorithm followed by C (and hence by any non-faulty non-
initiating processor) must break ties in favor of the initiating processor.

The next case is where the processor A is a traitor, and reports different values to B
and C.

B thinks A has decided for 0 and C thinks A has decided for 1. If the algorithm breaks
ties in favor of the initiator, C must decide for 1. However, B must follow the same
algorithm, and so it must decide for 0. This means we have no agreement among the
two nonfaulty processors.
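The indistinguishability argument can be replayed mechanically. In this Python sketch, all names are illustrative. A lieutenant's entire view is the pair (value received from A, value its peer reports). The rule `decide` is the one Validity forces in the first scenario. The assertion checks that C's view is identical in both scenarios, so applying the same rule in the second scenario splits B and C.

```python
from typing import Dict, Tuple

Direct = Dict[str, int]               # value A sent directly to each lieutenant
Relayed = Dict[Tuple[str, str], int]  # (sender, receiver) -> "A told me ..." claim


def view(lieutenant: str, direct: Direct, relayed: Relayed) -> Tuple[int, int]:
    """Everything a lieutenant observes: (value from A, value its peer reports)."""
    peer = "C" if lieutenant == "B" else "B"
    return direct[lieutenant], relayed[(peer, lieutenant)]


def decide(observed: Tuple[int, int]) -> int:
    # Validity in scenario 1 forces a tie to be broken toward the commander.
    from_commander, _from_peer = observed
    return from_commander


# Scenario 1: A is loyal and orders 1; B is the traitor and tells C "A said 0".
s1_direct: Direct = {"B": 1, "C": 1}
s1_relayed: Relayed = {("B", "C"): 0, ("C", "B"): 1}

# Scenario 2: A is the traitor (0 to B, 1 to C); B and C relay honestly.
s2_direct: Direct = {"B": 0, "C": 1}
s2_relayed: Relayed = {("B", "C"): 0, ("C", "B"): 1}

# C observes exactly the same thing in both scenarios.
assert view("C", s1_direct, s1_relayed) == view("C", s2_direct, s2_relayed)

b_decision = decide(view("B", s2_direct, s2_relayed))
c_decision = decide(view("C", s2_direct, s2_relayed))
print(f"scenario 2: B decides {b_decision}, C decides {c_decision}")
# prints "scenario 2: B decides 0, C decides 1"
```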

Proof of the full theorem generalizes this reasoning to a larger number of processors.

Lamport-Shostak-Pease Algorithm -- No failures
• solves Byzantine agreement for n ≥ 3m+1 processors in the presence of m faulty processors
• recursively defined, as OM(m), m ≥ 0

This is called the "Oral Message" algorithm, because the conditions correspond to
what we would expect if messages are delivered orally, in person, by pairwise
conversations between the parties involved in the consensus.

"Oral" Messages
• every message that is sent is delivered correctly
• the receiver of a message knows who sent it
• the absence of a message can be detected

Lamport Terminology for Byzantine Agreement
• every processor is a general
• the general who initiates the agreement protocol is the commander
• the value suggested by the commander is the order
• the other generals, to whom the commander sends the order, are his lieutenants
• the faulty processors are traitors
• the nonfaulty processors are loyal

OM(0,S)
If there are no traitors, achieving agreement is easy:
• The commander i sends the proposed value v to every lieutenant j in S − {i}.
• Each lieutenant j accepts the value v from i.

OM(m,S) for m > 0

S is the set of generals for which we want agreement.
1. The commander i sends a value v directly to every lieutenant j ∈ S − {i}.
2. For each lieutenant j ∈ S − {i}, let v_j be the value lieutenant j receives from the
commander i, or RETREAT if he receives no value.
Lieutenant j initiates OM(m−1, S − {i}) (recursively) with value v_j, acting as
commander.

The notation v_j here helps us to remember that j received the value v_j from i in
the previous round, and j is asking the other generals to agree on this fact. At
the end of each of these recursive executions, every loyal
lieutenant j ∈ S − {i} has agreed on a set of pairs (k, v_k), one for each k ∈ S − {i}.

3. When Step 2 has been completed by all lieutenants, each lieutenant j tabulates
the pairs it obtained in Step 2: its own pair, containing the original value from
its commander, and the other pairs, containing the values agreed on in the recursive
invocations of OM(m−1, S − {i}) commanded by the other lieutenants. It agrees on the
value v = majority({v_k | k ∈ S − {i}}), the value held by the majority of those pairs,
as the result of OM(m, S).

One feature of this algorithm that some people have found confusing is the way in
which the results of the recursive algorithms are combined. That is, the values must be
retained and then combined, by taking the majority, after the entire round has
completed.

Another feature that some people have found confusing is that there must be an
arbitrary rule, such as choosing the lower value, to break ties. Since traitors may not
send messages, there also must be a default value, such as 0, that is used for all
generals from which no pair is received. Likewise, if there is no majority, a default
value must be used for the result of OM(m,S). So long as all loyal generals agree on
the tie-breaking rule and the default value, there will still be consensus among the
loyal generals.
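A minimal Python simulation of OM(m, S) may help with both points above. It is a sketch under stated assumptions:

- values are binary;
- the default (RETREAT) value is 0;
- a missing strict majority also yields 0, which for binary values means "choose the lower value" on ties;
- message omission is not modelled;
- traitors follow a hypothetical strategy of sending `receiver % 2`.

Step 3 runs only after every recursive invocation in Step 2 has returned. Until then the results are kept in `agreed`.

```python
from collections import Counter
from typing import Dict, List, Set

DEFAULT = 0  # assumed RETREAT/default order; also used when there is no majority


def majority(values: List[int]) -> int:
    """Strict-majority value, or DEFAULT when no value has a strict majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else DEFAULT


def send(sender: int, receiver: int, value: int, traitors: Set[int]) -> int:
    """Loyal senders pass `value` on unchanged. Traitors follow a hypothetical
    strategy: send receiver % 2, so different receivers get different orders."""
    return receiver % 2 if sender in traitors else value


def om(m: int, commander: int, lieutenants: List[int], value: int,
       traitors: Set[int]) -> Dict[int, int]:
    """Simulate OM(m, S) with S = {commander} plus lieutenants; return each
    lieutenant's decision."""
    # Step 1: the commander sends a value to every lieutenant.
    received = {j: send(commander, j, value, traitors) for j in lieutenants}
    if m == 0:
        return received  # OM(0): each lieutenant accepts what it received
    # Step 2: every lieutenant j runs OM(m-1, S - {i}) as commander with v_j.
    # agreed[k][j] = value lieutenant k settled on for "what j received".
    agreed: Dict[int, Dict[int, int]] = {k: {} for k in lieutenants}
    for j in lieutenants:
        others = [k for k in lieutenants if k != j]
        for k, v in om(m - 1, j, others, received[j], traitors).items():
            agreed[k][j] = v
    # Step 3: each lieutenant takes the majority of its own v_j and the others.
    return {k: majority([received[k], *agreed[k].values()]) for k in lieutenants}


if __name__ == "__main__":
    # n = 4, m = 1: commander 0 is loyal and orders 1; lieutenant 3 is a traitor.
    print(om(1, 0, [1, 2, 3], 1, {3}))  # {1: 1, 2: 1, 3: 1}
    # n = 4, m = 1: the commander itself is the traitor.
    print(om(1, 0, [1, 2, 3], 1, {0}))  # {1: 1, 2: 1, 3: 1}
```

The simulation computes a traitor's own entry as if the traitor were loyal, so that entry carries no meaning. The Agreement and Validity Conditions constrain only the loyal lieutenants' decisions.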

To understand this algorithm, it helps to start with the case that the commander i is
loyal. In that case, each lieutenant j will receive the same value v from i. The loyal
ones can simply accept the value v and it will not matter what the traitors do.

However, since there is no way for a lieutenant j to tell whether the commander i is a
traitor, one must assume that he may be a traitor. To protect against the commander
sending different values to the different lieutenants, the lieutenants must hold a ballot
to reach consensus on what message the commander sent to each one of them. The
rest of the algorithm is the procedure for that ballot.
Since the messages are transmitted "orally" (not broadcast), the lieutenants must all
exchange information about what they received in the previous round, before they can
hold the ballot. The ballot would still be easy if we could trust every processor to
report accurately what it received. However, we must allow for the possibility that
some lieutenants are traitors, and so will report different things to different other
lieutenants. That is why we need to do a Byzantine agreement on each of the
messages that was sent to a lieutenant in the previous round.

When we get to the recursive invocation of OM(m−1, S − {i}), it is not obvious that we
have reduced the problem sufficiently to satisfy the preconditions for OM(m−1, S − {i}).
There are two possibilities:

a. The commander i is a traitor. In this case, it is clear that the recursion should
work. At most m−1 of the lieutenants are traitors. We are assuming |S| ≥ 3m+1, so
|S − {i}| ≥ 3m > 3(m−1)+1. It follows that OM(m−1, S − {i}) can achieve
Byzantine agreement on the message "i sent j the value v" among the loyal
lieutenants in S − {i}.
b. The commander i is loyal. In this case, it is not so clear that the recursion should
work. We have reduced the number of processors in the consensus by one, but
there may remain m traitors in S − {i}. If m is the number of traitors, how can
we get away with using OM(m−1, ...)?

The second case is dealt with by the Validity Lemma, which is stated and proven
below. This lemma guarantees that if the commander is loyal, OM(m,S) can tolerate up
to k traitors if |S| > 2k+m. We will explain this lemma in more detail below, using the
original theorems and proofs of Lamport, Shostak, and Pease.

Byzantine Agreement Conditions


1. Agreement: All loyal generals agree on the same value.
2. Validity: If the commander is loyal, then the common agreed upon value for all
loyal lieutenants is the initial one given by the commander.

Validity Lemma
Lemma: For any m and k, OM(m,S) satisfies the Validity Condition if there are more
than 2k+m processors and at most k of them are traitors.

Proof:
The proof is by induction on m. As a basis for the induction, we consider the case
of OM(0). The Validity Condition only specifies what must happen if the commander
is loyal. It is easy to see that if the commander is loyal OM(0) satisfies the Validity
Condition, since all the processes get the same value v and agree upon that. We
therefore can assume the lemma is true for OM(m−1) and prove that it is true
for OM(m), m > 0.

For the induction step, we have m ≥ 1. In Step 1, the loyal commander i sends a
value v to all the other processors. At Step 2, each loyal
lieutenant j applies OM(m−1, S − {i}). Since we are assuming that |S| > 2k + m, we have
|S − {i}| > 2k + (m−1), so we can apply the induction hypothesis to conclude that
every loyal lieutenant agrees on the value v_j = v for each invocation of OM(m−1, S − {i})
by a loyal commander j. Since there are at most k traitors, and |S − {i}| > 2k + (m−1) ≥
2k, a majority of the lieutenants in S − {i} are loyal. Hence, when each loyal lieutenant gets
to Step 3, it will find that a majority of the tabulated values equal v, and so it
will agree to the value v. This confirms the Validity Condition.

Agreement Theorem
Theorem: For any m, OM(m,S) satisfies the Validity and Agreement Conditions if
there are more than 3m generals and at most m of them are traitors.

Proof:

The proof is by induction on m, similar to that of the Validity Lemma. As a basis for
the induction, we consider the case of OM(0). If there are no traitors, it is easy to see
that OM(0) satisfies the Validity and Agreement Conditions. We therefore can assume
the theorem is true for OM(m−1) and prove that it is true for OM(m), m > 0.

For the induction step, we have m ≥ 1. We consider two cases, depending on whether the
commander is a traitor.

1. Suppose the commander i is loyal. By taking k equal to m in the Validity
Lemma, we see that OM(m) satisfies the Validity Condition. Moreover, since
we are assuming the commander is loyal, all loyal lieutenants agree on the
commander's value, so the Agreement Condition is also satisfied.
2. Suppose the commander is a traitor. At most m−1 of the lieutenants can be
traitors. Since there are more than 3m processes, there are more than 3m−1 >
3(m−1) processes in S − {i}. We may therefore apply the induction hypothesis to
conclude that OM(m−1) satisfies the Agreement and Validity Conditions. So, for
each lieutenant k, the loyal lieutenants agree on the same value v_k in the
invocation commanded by k (by Validity if k is loyal, by Agreement if k is a traitor).
Hence, any two loyal lieutenants get the same vector of values v_1, …, v_{n−1}, and
therefore obtain the same value majority(v_1, …, v_{n−1}) in Step 3, proving the
Agreement Condition.

Do an example, for 4 processors, interactively

Four Processor Example: Nonfaulty Commander

Round 1: processor A executes OM(1), where processor C (in red) is faulty.

Round 2: processors B, C, and D execute OM(0). Dashed lines indicate messages sent
during the previous round.
Four Processor Example: Faulty Commander

Round 1: processor A executes OM(1), where processor A is faulty.

Round 2: processors B, C, and D execute OM(0).
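OM(1) with one traitor is small, so both four-processor scenarios can also be checked exhaustively. This Python sketch hard-codes OM(1) with commander A and lieutenants B, C, D; the names are illustrative. It enumerates two sets of cases:

- every loyal-commander case with one lying lieutenant, asserting Validity;
- every set of orders a faulty commander could send, asserting Agreement.

```python
from itertools import product
from typing import Dict, Tuple

LIEUTENANTS = "BCD"
PAIRS = [(y, x) for y in LIEUTENANTS for x in LIEUTENANTS if y != x]


def om1_decisions(direct: Dict[str, int],
                  relays: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """OM(1) with commander A. direct[x]: order A sent to x (round 1);
    relays[(y, x)]: value y told x it received from A (round 2)."""
    decisions = {}
    for x in LIEUTENANTS:
        votes = [direct[x]] + [relays[(y, x)] for y in LIEUTENANTS if y != x]
        decisions[x] = 1 if sum(votes) >= 2 else 0  # 3 binary votes: strict majority
    return decisions


def check_all_cases() -> int:
    """Exhaustively check OM(1) with one traitor among A, B, C, D."""
    checked = 0
    # Loyal commander A; one faulty lieutenant may lie to each of the other two.
    for order, traitor in product((0, 1), LIEUTENANTS):
        direct = {x: order for x in LIEUTENANTS}
        loyal = [x for x in LIEUTENANTS if x != traitor]
        for lies in product((0, 1), repeat=len(loyal)):
            relays = {(y, x): direct[y] for (y, x) in PAIRS}
            relays.update({(traitor, x): lie for x, lie in zip(loyal, lies)})
            decisions = om1_decisions(direct, relays)
            assert all(decisions[x] == order for x in loyal)  # Validity
            checked += 1
    # Faulty commander A sends arbitrary orders; B, C, D relay honestly.
    for orders in product((0, 1), repeat=len(LIEUTENANTS)):
        direct = dict(zip(LIEUTENANTS, orders))
        relays = {(y, x): direct[y] for (y, x) in PAIRS}
        assert len(set(om1_decisions(direct, relays).values())) == 1  # Agreement
        checked += 1
    return checked


if __name__ == "__main__":
    print(check_all_cases(), "cases checked")  # 32 cases checked
```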

Message Complexity
• T(0,n) = n−1
• T(m,n) = (n−1)T(m−1,n−1), for m > 0
• T(m,n) = (n−1)(n−2)(n−3)…(n−m−1) ∈ O(n^{m+1})
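The recurrence and its closed form can be checked numerically. This Python sketch follows the recurrence for T(m,n) exactly as given. The closed form is a product of m+1 factors, each at most n, so for fixed m the count grows as O(n^{m+1}).

```python
from math import prod


def T(m: int, n: int) -> int:
    """Message count from the recurrence: T(0,n) = n-1, T(m,n) = (n-1)T(m-1,n-1)."""
    return n - 1 if m == 0 else (n - 1) * T(m - 1, n - 1)


def closed_form(m: int, n: int) -> int:
    """(n-1)(n-2)...(n-m-1): a product of m + 1 factors."""
    return prod(n - k for k in range(1, m + 2))


if __name__ == "__main__":
    for m in range(4):
        n = 3 * m + 1  # smallest n that tolerates m traitors
        print(f"m={m}, n={n}: T={T(m, n)}")
```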

Dolev et alia Algorithm
• polynomial message complexity
• 2m+3 message rounds
This is an example of a time-message trade-off: compared with the m+1 rounds of OM(m), it uses more rounds in exchange for far fewer messages.

The Basis of the Algorithm
• LOW = m + 1, HIGH = 2m + 1
• any subset of at least LOW processors has at least one nonfaulty processor
• we can throw out any assertion that is denied by at least LOW processors
• any subset of at least HIGH processors includes at least m + 1 nonfaulty processors (a majority of the subset)
• we can rely on an assertion that is supported by a subset of at least HIGH processors, since a majority of them are nonfaulty
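The counting facts behind LOW and HIGH can be confirmed by brute force for small m. This Python sketch assumes n = 3m+1 and exactly m faulty processors; fewer faults only helps. It checks only subsets of exactly LOW and HIGH processors, because any larger subset contains one of these.

```python
from itertools import combinations


def thresholds_hold(m: int) -> bool:
    """Brute-force check of the LOW/HIGH facts for n = 3m + 1 and every
    choice of m faulty processors."""
    n, low, high = 3 * m + 1, m + 1, 2 * m + 1
    for faulty in map(set, combinations(range(n), m)):
        # Every LOW-sized subset contains at least one nonfaulty processor.
        if any(set(s) <= faulty for s in combinations(range(n), low)):
            return False
        # Every HIGH-sized subset contains at least m + 1 nonfaulty processors.
        if any(len(set(s) - faulty) < m + 1 for s in combinations(range(n), high)):
            return False
    return True


if __name__ == "__main__":
    for m in range(1, 4):
        print(f"m={m}: {thresholds_hold(m)}")
```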

Messages Used by the Algorithm
• * : sender is asserting a value of 1
• name of a processor : sender received a * from that processor

W_i^x = set of processors that sent x to i = witnesses of x for i

i is a direct supporter of P iff P ∈ W_i^*

i is an indirect supporter of P iff |W_i^P| ≥ LOW

i confirms P iff |W_i^P| ≥ HIGH
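The witness-set definitions translate directly into set operations. In this Python sketch, an assumed representation `inbox` maps each sender to every message received from it so far. The example data is C's inbox from the worked example later in these notes, where m = 1, so LOW = 2 and HIGH = 3. Broadcasts are assumed to include the sender, so C's own messages appear in its inbox.

```python
from typing import Dict, Set

Inbox = Dict[str, Set[str]]  # sender -> every message received from it so far


def witnesses(inbox: Inbox, x: str) -> Set[str]:
    """W_i^x: the processors that have sent message x to this processor."""
    return {sender for sender, msgs in inbox.items() if x in msgs}


def is_direct_supporter(inbox: Inbox, p: str) -> bool:
    return p in witnesses(inbox, "*")


def is_indirect_supporter(inbox: Inbox, p: str, low: int) -> bool:
    return len(witnesses(inbox, p)) >= low


def confirms(inbox: Inbox, p: str, high: int) -> bool:
    return len(witnesses(inbox, p)) >= high


if __name__ == "__main__":
    LOW, HIGH = 2, 3  # m = 1
    # C's inbox at the start of round 3: "*" and "A" from both B and D.
    round3: Inbox = {"B": {"*", "A"}, "D": {"*", "A"}}
    print(is_direct_supporter(round3, "B"), is_direct_supporter(round3, "D"))  # True True
    print(is_indirect_supporter(round3, "A", LOW), confirms(round3, "A", HIGH))  # True False
    # Start of round 4: B, D, and C itself have all broadcast "B" and "D".
    round4: Inbox = {"B": {"*", "A", "B", "D"},
                     "C": {"A", "B", "D"},
                     "D": {"*", "A", "B", "D"}}
    print(confirms(round4, "B", HIGH), confirms(round4, "D", HIGH))  # True True
```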

To make this algorithm work, I believe we need to assume the following:

• "broadcast" means to send the message to all processors, including oneself
• therefore, whenever a process sends a message to the other processors, it will
receive the same message, and will be added to its own witness set for that
message in the next round

Initiation Condition
• 2nd round: receives a * from the source in round 1
• (K+1)st round: at least LOW + max(0, K/2 − 2) processors are confirmed
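Here is the initiation condition as a Python predicate. It is only a sketch. The notes write K/2 without saying how to round it, so rounding up here is an assumption. The worked example later in these notes, with K = 3, gives the same threshold of 2 under any rounding.

```python
import math
from typing import Set


def initiation_threshold(k: int, m: int) -> int:
    """Confirmed processors needed at the end of round k to initiate in round k+1.
    Assumption: K/2 is rounded up; the notes do not specify the rounding."""
    low = m + 1
    return low + max(0, math.ceil(k / 2) - 2)


def may_initiate(round_no: int, m: int, star_from_source: bool,
                 confirmed: Set[str]) -> bool:
    """Initiation condition checked at the start of round `round_no` (>= 2)."""
    if round_no == 2:
        return star_from_source  # received * from the source in round 1
    return len(confirmed) >= initiation_threshold(round_no - 1, m)


if __name__ == "__main__":
    # Worked example: m = 1, C has confirmed B and D by the start of round 4.
    print(may_initiate(4, 1, False, {"B", "D"}))  # True
```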

The Algorithm
• round 1: the source broadcasts its value to all processors
• round k > 1:
o broadcast the names of all processors for which it is a direct or indirect
supporter and which it has not previously broadcast
o if the initiation condition was true at the end of the previous round and * was not
previously broadcast, broadcast * now
• commit to 1 if HIGH processors are confirmed
• after round 2m+3, if 1 is not committed, agree on 0

If time permits, go through an example of the algorithm.

In round 1, suppose there are four processors A, B, C, and D (so m = 1, LOW = 2, HIGH = 3), and the faulty processor A initiates, sending "*" messages to D & B,
but not to C.
At the start of round 2, processors B & D have each received a "*" message from A
and so become direct supporters of A. They respond by broadcasting "A". Since this is
round 2, B & D also satisfy the initiation condition, which causes them to broadcast "*".

At the start of round 3, processors B & D have received "*" from B & D, so both
become direct supporters of B & D, and broadcast "B" & "D". Processor C has
received "*" from B & D, so it becomes a direct supporter of both, and broadcasts "B" &
"D". Processor C has also received 2 witness messages for A, from B & D.
Since LOW=m+1=2, C now becomes an indirect supporter of A and so it also
broadcasts "A".

At the start of round 4, processor C has 3 witnesses (B, C, and D) for each of B and D, so it confirms both.
Since LOW + max{0, K/2 − 2} = 2 for K = 3, and C has confirmed two processors, C satisfies the initiation condition. C initiates and
broadcasts "*".
At the start of round 5, processors B, C, & D have all just received "*" from C. They
become direct supporters of C and broadcast "C".

By the end of round 5 each nonfaulty processor has confirmed HIGH = 3 processors
supporting commitment, so all three nonfaulty processors commit.

What happens if the faulty processor A only sends a "*" message to one of the other
processors?

What happens if the first initiating processor is not faulty?

Analysis of Dolev Algorithm
• Number of rounds may be up to 2m+3
• What is the worst-case number of messages?
The text says it is polynomial. What is the polynomial?
