CSE446 Lecture 4
Even though we only show nodes here, there could be processes underneath
https://www.networkstraining.com/wp-content/uploads/2021/06/Client-Server-p2p.png
Network type
• Dwork et al.* categorised three types of networks exhibiting
different properties: synchronous, asynchronous, and
partially/eventually synchronous
• The latency involved in delivering a message to all nodes in a
synchronous network is bounded by some time denoted as ∆
• Any message sent at time T will be delivered by T + ∆
• On the other hand, the latency in an asynchronous network cannot
be reliably bounded by any ∆; the only guarantee is that a message
will eventually be delivered
*C. Dwork, N. Lynch, and L. Stockmeyer, “Consensus in the presence of partial synchrony”. Journal of the ACM (JACM),
35(2):288-323, 1988.
Network type
• In a partially/eventually synchronous network, it is assumed
that
• the network will eventually act as a synchronous network
• even though it might be asynchronous over some arbitrary
period of time
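The three network types can be contrasted by asking "by when is a message sent at time T guaranteed to arrive?". The sketch below is only an illustration: the function names and the parameter gst (the time at which a partially synchronous network starts behaving synchronously) are assumptions introduced here, and the max(T, gst) + ∆ bound is one common way of formalising partial synchrony rather than something stated on the slides.

```python
from typing import Optional


def latest_delivery_sync(send_time: float, delta: float) -> float:
    """Synchronous network: a message sent at T is delivered by T + delta."""
    return send_time + delta


def latest_delivery_partial(send_time: float, delta: float, gst: float) -> float:
    """Partially synchronous network (assumed model): the delta bound only
    holds once the network has stabilised at time gst, so a message sent
    before gst is only guaranteed to arrive by gst + delta."""
    return max(send_time, gst) + delta


def latest_delivery_async(send_time: float) -> Optional[float]:
    """Asynchronous network: delivery is eventual, so there is no finite bound."""
    return None


print(latest_delivery_sync(10.0, 2.0))            # 12.0
print(latest_delivery_partial(10.0, 2.0, 50.0))   # 52.0 (sent before gst)
print(latest_delivery_partial(60.0, 2.0, 50.0))   # 62.0 (sent after gst)
print(latest_delivery_async(10.0))                # None
```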
Fault model
• In a distributed system, a node (process) might behave differently
for various reasons (e.g. intentional or unintentional corruption)
• When this happens, we call these nodes faulty nodes
• Up to f nodes out of N (total number) may fail
• f is usually a function of N, like f < N/2 or f < N/3
• We mostly look at two types of faults:
• Crash failure
• Byzantine failure
• Nodes that do not fail are called “honest” or “correct” nodes
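To make the bounds f < N/2 and f < N/3 concrete, the following sketch computes the largest number of faulty nodes each bound allows for a given N. The helper name max_faulty is hypothetical and chosen only for this illustration.

```python
def max_faulty(n: int, divisor: int) -> int:
    """Largest integer f that satisfies f < n / divisor."""
    return (n - 1) // divisor


for n in (3, 4, 7, 10):
    # Columns: N, max f under f < N/2, max f under f < N/3
    print(n, max_faulty(n, 2), max_faulty(n, 3))
```

For example, with N = 4 both bounds allow a single faulty node, while with N = 10 the bound f < N/2 allows 4 and the bound f < N/3 allows 3.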
Fault model: crash failure
• The crash failure model deals with nodes that simply fail to
respond due to some hardware or software failure
• E.g. Hardware crash, hard disk bad sector, software crash, etc.
• It may happen at any time without any prior warning
• The corresponding faulty node remains unresponsive until
further actions are taken
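Because a crashed node simply stops responding, other nodes usually notice it through silence. The sketch below assumes a heartbeat scheme that the slides do not describe: each node records when it last heard from every peer and suspects those that have been silent for longer than a timeout. All names and values are illustrative. Note that a timeout can only suspect a crash; in a network without a delay bound, a slow node looks the same as a crashed one.

```python
from typing import Dict, List


def suspected_crashed(last_heard: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return the nodes whose last message is older than `timeout`."""
    return sorted(node for node, t in last_heard.items() if now - t > timeout)


# Hypothetical timestamps (in seconds) of the last message from each node.
last_heard = {"n1": 9.5, "n2": 4.0, "n3": 9.9}
print(suspected_crashed(last_heard, now=10.0, timeout=2.0))  # ['n2']
```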
Fault model: Byzantine Generals Problem (BGP)
• The Byzantine Generals
Problem, by Leslie Lamport,
Robert Shostak, and Marshall
Pease. ACM TOPLAS 1982
• Byzantine army divisions
camped outside the walls of an
enemy city
• Each division is led by a
general
• Generals decide on a common
plan of action
Fault model: byzantine fault
• There are two types of
generals: Loyal or Traitor
• Conditions needed to be
met:
• Loyal generals decide
upon the same plan of
action
• A small number of traitors
should not be able to lead
the loyal generals to make a
bad decision
Fault model: byzantine fault
• General 2 receives
ATTACK, ATTACK
• General 3 receives
ATTACK, ATTACK
• So ATTACK is not a bad
decision
Fault model: byzantine fault
• General 2 receives ATTACK,
ATTACK
• General 3 receives RETREAT,
ATTACK
• Now, ATTACK or RETREAT?
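The two scenarios can be replayed with a simple tally. The slides do not state how a general turns the received orders into a decision, so the strict-majority rule below is an assumption made purely for illustration, as is the function name decide.

```python
from collections import Counter
from typing import List, Optional


def decide(orders: List[str]) -> Optional[str]:
    """Return the order held by a strict majority, or None if there is none."""
    if not orders:
        return None
    order, votes = Counter(orders).most_common(1)[0]
    return order if votes > len(orders) / 2 else None


# First scenario: both generals see the same orders.
print(decide(["ATTACK", "ATTACK"]))   # ATTACK
# Second scenario: General 3 sees conflicting orders and cannot decide.
print(decide(["RETREAT", "ATTACK"]))  # None
```

In the second scenario General 2 would still decide ATTACK while General 3 is left undecided, so the loyal generals are no longer guaranteed to follow the same plan.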
Fault model: byzantine fault
• Coping with node failures other than crashes
• A (faulty) Byzantine node sends conflicting information to
different parts of the system (the Byzantine behavior)
• Non-malicious reasons: software bugs
• Malicious reasons: a compromised machine
• P2P Networks:
• Faulty nodes generate corrupted and misleading messages
• Good nodes have to “agree to do the same thing” (agreement)
• Agreement in the presence of faults is challenging
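The "conflicting information" behavior can be illustrated as follows. An honest sender gives every peer the same value, whereas a Byzantine sender may give each peer a different one. This is a minimal sketch with hypothetical names; it assumes the receiving peers truthfully compare what they were told, which cannot be taken for granted when the peers themselves may be faulty.

```python
from typing import Dict, List


def honest_broadcast(value: str, peers: List[str]) -> Dict[str, str]:
    """An honest node sends the same value to every peer."""
    return {peer: value for peer in peers}


def conflicting_values(received: Dict[str, str]) -> bool:
    """True if the peers, after comparing notes, hold different values."""
    return len(set(received.values())) > 1


peers = ["B", "C", "D"]
honest = honest_broadcast("ATTACK", peers)
# A Byzantine sender is free to tell each peer something different.
byzantine = {"B": "ATTACK", "C": "RETREAT", "D": "ATTACK"}

print(conflicting_values(honest))     # False
print(conflicting_values(byzantine))  # True
```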
Fault model: byzantine failure
• Byzantine failure deals with nodes that misbehave due to
some software bugs or because of the nodes being
compromised by an adversary
• A Byzantine node can behave maliciously by arbitrarily
sending deceptive messages to others
• This might affect the security of distributed systems
• Hence, such nodes are mostly relevant in applications with
security implications
The need for consensus algorithm
• Consensus is a fundamental problem in
distributed applications
• One use-case is database replication
(aka Replicated Database)
• Database replication is the process of
storing data in more than one site or
node
• It is useful in ensuring resilience against
node failures within a network
• E.g. data are not lost when one or more
nodes fail to function in an expected
fashion
• This improves the availability of data
https://databand.ai/blog/data-replication-the-basics-risks-and-best-practices/
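The availability benefit of replication can be seen in a toy key-value store. The sketch below is deliberately naive and all names in it are hypothetical: every write is copied to each live replica, and a read is served by any live replica that has the key. It does not handle concurrent writers or replicas that missed a write, which is where an agreement mechanism becomes necessary.

```python
from typing import Dict, List, Optional


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.alive = True


def replicated_write(replicas: List[Replica], key: str, value: str) -> int:
    """Copy the value to every live replica; return how many stored it."""
    acks = 0
    for replica in replicas:
        if replica.alive:
            replica.data[key] = value
            acks += 1
    return acks


def read_any(replicas: List[Replica], key: str) -> Optional[str]:
    """Serve the read from the first live replica that holds the key."""
    for replica in replicas:
        if replica.alive and key in replica.data:
            return replica.data[key]
    return None


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
replicated_write(replicas, "balance", "100")
replicas[0].alive = False  # simulate two node failures
replicas[1].alive = False
print(read_any(replicas, "balance"))  # 100 (still available from r3)
```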
The need for consensus algorithm
• It is simply copying data from one
server to another server
• So that all the users can share the
same data without any
inconsistency
• To ensure synchronisation across
multiple nodes, the mechanism of
consensus is used
https://databand.ai/blog/data-replication-the-basics-risks-and-best-practices/
• Consensus enables all nodes to agree