Ch8 Distributed
Supervisor: Dr. Ahmed Salem
Code Dealers Team:
• A'laa Magdy
• Hadeer Anwar
• Martina Monir
• Menna Mohsen
• Nermeen Ashraf
• Toqua Magdy
What is Consensus?
Consensus ensures that all processes in a distributed system agree on a single value.
It's crucial for reliability and fault tolerance in systems like databases and blockchains.
Example: Imagine multiple friends trying to decide where to eat. Consensus is reached when all of them agree on one restaurant despite some miscommunication.
Why Do We Need Consensus?
Reliability: If a node fails, the system can still function because all nodes agree on the current state.
Consistency: All nodes maintain the same data, avoiding mismatches or errors.
Fault Tolerance: Consensus handles failures, such as crashed or malicious nodes, to ensure the system works as expected.
Challenges in Consensus
Failures: Nodes may crash or behave maliciously (e.g., Byzantine failures).
Network Issues: Messages can be delayed, lost, or arrive out of order.
Agreement Requirements: All non-faulty nodes must agree, even with some
faulty ones.
Achieving Consensus
In distributed systems, multiple computers (nodes) work together to perform tasks. However, challenges like failures or communication delays can lead to disagreements between nodes. Consensus algorithms are used to ensure that all nodes in the system agree on a single value or decision, even when some nodes fail or act unpredictably.
What happens when the leader crashes?

Key Points:
• Simplifies consensus for easier understanding.
• Divides roles into a Leader (proposes values) and Followers (accept values).
• Works in phases: Leader election → Log replication → Commit.

Example:
• A team elects a captain (leader) who collects votes (logs) from members and finalizes a decision. (A minimal leader/follower sketch follows below.)
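The sketch below is a minimal, simplified illustration (not a full implementation) of the leader/follower flow described above; the Node class, append_entry, and majority calculation are hypothetical names chosen for this example.

# Minimal sketch of leader-based log replication (hypothetical, simplified).
# A leader proposes a value; followers store it; the leader commits once a
# majority has acknowledged. Real protocols also handle elections and terms.

class Node:
    def __init__(self, name):
        self.name = name
        self.log = []          # replicated log of committed values
        self.pending = None    # value received but not yet committed

    def append_entry(self, value):
        """Follower stores the leader's proposal and acknowledges it."""
        self.pending = value
        return True            # ack

    def commit(self):
        """Move the pending value into the committed log."""
        if self.pending is not None:
            self.log.append(self.pending)
            self.pending = None


def replicate(leader, followers, value):
    """Leader-side flow: propose -> collect acks -> commit on majority."""
    leader.pending = value
    acks = 1  # the leader counts as acknowledging its own proposal
    for f in followers:
        if f.append_entry(value):
            acks += 1
    majority = (1 + len(followers)) // 2 + 1
    if acks >= majority:
        leader.commit()
        for f in followers:
            f.commit()
        return True
    return False


followers = [Node(n) for n in ["B1", "B2", "B3"]]
leader = Node("L")
print(replicate(leader, followers, "restaurant: pizza"))  # True
print(leader.log)                                          # ['restaurant: pizza']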
Paxos Algorithm
Key Points:
• Roles: Proposers (suggest values), Acceptors (vote on values), and Learners (learn the chosen value).
• Works in two phases: Prepare/Promise → Accept/Accepted.
• A value is chosen once a majority of acceptors accept it.
Consensus under arbitrary failure semantics
System model
• We consider a primary P and n−1 backups B1, ..., Bn−1.
• A client sends v ∈ {T, F} to P.
• Messages may be lost, but this can be detected.
• Messages cannot be corrupted beyond detection.
• A receiver of a message can reliably detect its sender.
Byzantine agreement requirements
BA1: Every nonfaulty backup process stores the same value.
BA2: If the primary is nonfaulty, then every nonfaulty backup process stores exactly what the primary had sent.
Observation
• Primary faulty ⇒ BA1 says that backups may store the same, but different (and thus wrong) value than originally sent by the client.
• Primary not faulty ⇒ satisfying BA2 implies that BA1 is satisfied. (A small BA1/BA2 sketch follows below.)
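As a rough illustration of these requirements (Python; the helper names are hypothetical), the checks below state BA1 and BA2 over the values the backups ended up storing. This is not an agreement protocol, only a restatement of the conditions.

# Illustrative checks of the Byzantine agreement requirements (not a protocol).
# stored[i] is the value backup Bi stored; client_value is what the client sent
# to the primary P, which a nonfaulty primary forwards unchanged.

def satisfies_ba1(stored):
    """BA1: every nonfaulty backup stores the same value."""
    return len(set(stored)) == 1

def satisfies_ba2(stored, client_value, primary_faulty):
    """BA2: if the primary is nonfaulty, every nonfaulty backup stores
    exactly what the primary sent (here, the client's value)."""
    if primary_faulty:
        return True  # BA2 places no constraint when the primary is faulty
    return all(v == client_value for v in stored)

# Faulty primary: backups may agree (BA1 holds) on a value the client never sent.
print(satisfies_ba1(["F", "F", "F"]))               # True
print(satisfies_ba2(["F", "F", "F"], "T", True))    # True (vacuously)
# Nonfaulty primary: satisfying BA2 implies BA1.
print(satisfies_ba2(["T", "T", "T"], "T", False))   # True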
Practical Byzantine Fault Tolerance (PBFT)
Key Points:
• Designed for systems with Byzantine failures (nodes acting maliciously or unpredictably).
• Works with a primary-backup model and requires 3f + 1 nodes to tolerate f faulty nodes.
• Phases: Pre-prepare → Prepare → Commit → Reply.

Example:
• In a group of 4 friends deciding on dinner, one might lie about the choice, but the majority still ensures the right decision is made. (A sketch of the quorum arithmetic follows below.)
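The following is a minimal sketch (Python; helper names are illustrative, not the PBFT implementation) of the quorum arithmetic behind PBFT: with n = 3f + 1 replicas, a replica moves through the Prepare and Commit phases only after hearing from enough peers that any two quorums must overlap in at least one correct replica.

# Sketch of PBFT quorum sizes for n = 3f + 1 replicas (illustrative only).

def pbft_sizes(f):
    n = 3 * f + 1          # replicas needed to tolerate f Byzantine faults
    quorum = 2 * f + 1     # matching messages needed to prepare/commit
    return n, quorum

def prepared(prepare_msgs, f):
    """A replica is 'prepared' once it has the pre-prepare plus 2f matching
    PREPARE messages from distinct other replicas."""
    return len(prepare_msgs) >= 2 * f

def committed(commit_msgs, f):
    """A request commits locally once 2f + 1 replicas report being prepared."""
    return len(commit_msgs) >= 2 * f + 1

f = 1
print(pbft_sizes(f))                    # (4, 3)
print(prepared({"B1", "B2"}, f))        # True: 2f = 2 matching prepares
print(committed({"P", "B1", "B2"}, f))  # True: 2f + 1 = 3 commits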
Handling Primary Failures
When the primary (leader) in PBFT fails, the system needs to switch to a new primary without losing progress or creating inconsistencies. This process ensures that any pending operation is executed exactly once by all non-faulty servers.

Key Procedure:
1. Detect failure: A backup server detects that the primary has failed and broadcasts a VIEW-CHANGE message to initiate a new view (v + 1).
2. New primary: The next primary P* is chosen deterministically (in a pre-determined order).
3. Collect view-change messages: P* waits for 2k + 1 VIEW-CHANGE messages (from non-faulty backups) that include information about previously sent prepares.
4. Broadcast new view: P* combines this information into a NEW-VIEW message:
   ■ X: all prepares from the previous view.
   ■ O: a new set of pre-prepare messages to handle pending operations.
5. Replay the view: Non-faulty backups replay the necessary operations from the previous view to ensure consistency in the new view (v + 1).

Essence:
• This process ensures that:
  ⚬ Pending operations are not lost.
  ⚬ All servers execute the same operations in the same order.
  ⚬ The system transitions smoothly to a new leader after a failure.

Example:
1. Primary P crashes while processing a request.
2. Backup B1 detects the failure, broadcasts VIEW-CHANGE(2, P), and sends information about previous prepares.
3. The new primary P* collects messages from B1, B2, B3 and sends a NEW-VIEW message to all backups.
4. All backups replay pending operations and resume execution in the new view. (See the view-change sketch below.)
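Below is a rough sketch (Python; the message structure and function names are hypothetical) of the view-change flow just described: backups broadcast VIEW-CHANGE messages, the deterministically chosen next primary waits for 2k + 1 of them, and then re-issues pre-prepares (O) for the pending operations it learned about (X) in the new view. Real PBFT messages also carry certificates and checkpoints, which are omitted here.

# Illustrative view-change flow (simplified).

def next_primary(view, replicas):
    """The primary of a view is chosen deterministically, e.g. round-robin."""
    return replicas[view % len(replicas)]

def new_view(view_change_msgs, k, view):
    """The new primary waits for 2k + 1 VIEW-CHANGE messages, collects the
    prepares they report (X), and builds fresh pre-prepares (O) so pending
    operations are replayed in the new view."""
    if len(view_change_msgs) < 2 * k + 1:
        return None  # not enough evidence yet; keep waiting
    X = [p for msg in view_change_msgs for p in msg["prepares"]]
    O = [{"view": view, "op": p["op"], "seq": p["seq"]} for p in X]
    return {"type": "NEW-VIEW", "view": view, "X": X, "O": O}

replicas = ["P", "B1", "B2", "B3"]   # k = 1 faulty replica tolerated
msgs = [
    {"from": "B1", "prepares": [{"op": "write x", "seq": 7}]},
    {"from": "B2", "prepares": [{"op": "write x", "seq": 7}]},
    {"from": "B3", "prepares": []},
]
print(next_primary(2, replicas))          # the replica that takes over in view 2
print(new_view(msgs, k=1, view=2))        # NEW-VIEW carrying X and O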
CAP Theorem:
• In distributed systems, achieving Consistency, Availability, and Partition tolerance simultaneously is impossible. Systems often trade off one of these properties during partitions.
C: consistency, by which a shared and replicated data item appears as a single, up-to-date copy.
A: availability, by which updates will always eventually be executed.
P: partition tolerance, i.e. tolerance to the partitioning of the process group. (A toy illustration follows below.)
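As a toy illustration (Python; names are hypothetical) of the trade-off above: during a partition, a replica that cannot reach its peers must either refuse the request (staying consistent but unavailable) or answer with possibly stale data (staying available but not consistent).

# Toy illustration of the CAP trade-off during a network partition.

def handle_read(replica, partitioned, prefer_consistency):
    """During a partition, a replica must pick C or A; it cannot have both."""
    if not partitioned:
        return replica["value"]        # normal case: both consistent and available
    if prefer_consistency:
        raise RuntimeError("unavailable: cannot confirm the value is up to date")
    return replica["value"]            # available, but possibly stale

replica = {"value": "x = 1 (last known)"}
print(handle_read(replica, partitioned=True, prefer_consistency=False))
# Preferring consistency instead would refuse the read until the partition heals.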
Failure Detection:
• Detecting crashes:
  ⚬ A process checks for heartbeats from another process within a timeout.
  ⚬ If no heartbeat is received, the other process is suspected to have crashed.
• Challenges:
  ⚬ False suspicions occur when delays are mistaken for crashes. (A heartbeat sketch follows below.)
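A minimal heartbeat-based failure detector sketch (Python; the timeout value and class name are illustrative): a process records the last heartbeat time for each peer and suspects any peer it has not heard from within the timeout, which is exactly why delayed messages can cause false suspicions.

import time

# Minimal heartbeat-based failure detector (illustrative).
class FailureDetector:
    def __init__(self, timeout=2.0):
        self.timeout = timeout       # seconds without a heartbeat => suspect
        self.last_heartbeat = {}     # peer -> time the last heartbeat arrived

    def heartbeat(self, peer):
        """Record a heartbeat received from a peer."""
        self.last_heartbeat[peer] = time.monotonic()

    def suspected(self, peer):
        """A peer is suspected if no heartbeat arrived within the timeout.
        A slow network can delay heartbeats and cause a false suspicion."""
        last = self.last_heartbeat.get(peer)
        return last is None or time.monotonic() - last > self.timeout

fd = FailureDetector(timeout=2.0)
fd.heartbeat("B1")
print(fd.suspected("B1"))   # False: a heartbeat just arrived
print(fd.suspected("B2"))   # True: never heard from B2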
Reliable Communication:
Messages between processes must:
■ Be received and delivered to all intended members.
■ Handle potential faults, like lost messages or crashes. (A simple ack/retransmit sketch follows below.)
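One common way to approach this, sketched below in Python with hypothetical names, is an acknowledge-and-retransmit scheme: the sender keeps resending until every intended member acknowledges, and the receiver de-duplicates by message id so retransmissions are delivered only once.

# Sketch of an ack/retransmit scheme for reliable delivery (illustrative).

def broadcast_reliably(msg_id, payload, members, send, max_rounds=5):
    """Resend to every member that has not acknowledged yet; receiver-side
    de-duplication turns at-least-once sending into exactly-once delivery."""
    pending = set(members)
    for _ in range(max_rounds):
        acked = {m for m in pending if send(m, msg_id, payload)}  # send may fail
        pending -= acked
        if not pending:
            return True
    return False  # some members are still unreachable (possibly crashed)

delivered = {}  # receiver side: msg_id -> payload, used to drop duplicates

def receive(member, msg_id, payload):
    if msg_id not in delivered:      # deliver only the first copy
        delivered[msg_id] = payload
    return True                      # acknowledge receipt

print(broadcast_reliably("m1", "commit tx 42", ["B1", "B2", "B3"], receive))
print(delivered)                     # {'m1': 'commit tx 42'}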
Message Logging:
⚬ Log nondeterministic events (e.g., received messages) to replay them during recovery.
⚬ Prevent orphan processes: if a logged message is missing, processes that depend on it may behave inconsistently. (A logging/replay sketch follows below.)
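The small sketch below (Python; class and method names are hypothetical) illustrates the idea: received messages, being nondeterministic events, are appended to a log before they are applied, so a recovering process can replay them in order. A message that was delivered but never logged cannot be replayed, which is what leaves the processes depending on it orphaned.

# Sketch of message logging and replay for recovery (illustrative).

class LoggedProcess:
    def __init__(self):
        self.log = []     # stable log of received (nondeterministic) messages
        self.state = 0

    def receive(self, message):
        """Log the message before acting on it, so it can be replayed later."""
        self.log.append(message)
        self.apply(message)

    def apply(self, message):
        self.state += message        # deterministic state update

    def recover(self):
        """After a crash, rebuild state by replaying the logged messages.
        A delivered-but-unlogged message cannot be replayed, leaving any
        process that depended on it in an inconsistent (orphaned) state."""
        self.state = 0
        for message in self.log:
            self.apply(message)

p = LoggedProcess()
p.receive(5)
p.receive(7)
p.recover()
print(p.state)   # 12: the same state as before the (simulated) crash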