4.1.6. Coordinated Checkpointing Algorithm-1

Distributed Computing unit-4 Coordinated Checkpointing Algorithm

UNIT IV

4.1. Checkpointing and Rollback Recovery

4.1.6. Coordinated Checkpointing Algorithm
Source:
Ajay D Kshemkalyani & Mukesh Singhal (2010). Distributed Computing: Principles, Algorithms and Systems. Cambridge University Press.

1 RIT/CSE/CS8603-DS/UNIT IV
TOPICS
4.1. Checkpointing and Rollback Recovery
  • 4.1.1. Introduction
  • 4.1.2. Background and Definitions
  • 4.1.3. Issues in Failure Recovery
  • 4.1.4. Checkpoint-based Recovery
  • 4.1.5. Log-based Rollback Recovery
  • 4.1.6. Coordinated Checkpointing Algorithm
  • 4.1.7. Algorithm for Asynchronous Checkpointing and Recovery
4.2. Consensus and Agreement Algorithms
  • 4.2.1. Problem Definition
  • 4.2.2. Overview of Results
  • 4.2.3. Agreement in a Failure-free System
  • 4.2.4. Agreement in a Synchronous System with Failures

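4.1.6. Coordinated Checkpointing Algorithm (Koo-Toueg)
• It takes a consistent set of checkpoints: processes coordinate their local checkpointing actions so that the set of all checkpoints in the system is consistent.
• It avoids the domino effect and livelock problems during recovery.
• It has two parts: the checkpointing algorithm and the recovery algorithm.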

Checkpointing algorithm
• Assumptions:
  - Communication channels are FIFO.
  - End-to-end protocols (such as a sliding-window protocol) cope with message loss caused by rollback recovery and communication failures.
  - Communication failures do not partition the network.
  - A single process initiates the algorithm at a time, and no process fails during its execution.
• Two kinds of checkpoints:
  - Permanent checkpoint: a local checkpoint that is part of a consistent global checkpoint.
  - Tentative checkpoint: a temporary checkpoint that becomes permanent only when the algorithm terminates successfully.
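The two kinds of checkpoints can be pictured as two slots per process. The sketch below is a hypothetical helper (the name `CheckpointStore` and its methods are illustrative, not from the source): a tentative checkpoint never overwrites the permanent one until the initiator's decision says so.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CheckpointStore:
    """Hypothetical per-process checkpoint storage (illustration only).

    permanent: local checkpoint belonging to the last consistent global checkpoint.
    tentative: checkpoint taken during a run of the algorithm that has not finished.
    """
    permanent: Optional[Any] = None
    tentative: Optional[Any] = None

    def take_tentative(self, state: Any) -> None:
        # Save a temporary copy of the local state; the permanent one is untouched.
        self.tentative = state

    def make_permanent(self) -> None:
        # Decision "commit": the tentative checkpoint replaces the permanent one.
        if self.tentative is None:
            raise RuntimeError("no tentative checkpoint to make permanent")
        self.permanent = self.tentative
        self.tentative = None

    def discard_tentative(self) -> None:
        # Decision "abort": drop the tentative checkpoint, keep the old permanent one.
        self.tentative = None
```

For example, after `s = CheckpointStore(permanent={"x": 0})`, `s.take_tentative({"x": 5})` followed by `s.discard_tentative()` leaves `s.permanent == {"x": 0}`, so an aborted run never disturbs the last consistent checkpoint.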

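Checkpointing algorithm: two phases
• Phase 1: The initiating process takes a tentative checkpoint and requests all other processes to take tentative checkpoints. Each process replies whether it succeeded. If all succeeded, the initiator decides to make the tentative checkpoints permanent; otherwise, it decides to discard them.
• After taking a tentative checkpoint, a process must not send application messages until it learns the initiator's decision.
• Phase 2: The initiator sends its final decision (make permanent or discard) to all processes, and every process acts on it. Thus all processes end up with the same decision.
• Correctness, for two reasons:
  - Either all processes or none advance their checkpoints by taking permanent checkpoints.
  - No process sends a message after taking a tentative checkpoint until it receives the initiator's decision, by which time all processes have taken their checkpoints. So no checkpoint records a message as received while the sender's checkpoint does not record it as sent.
• Optimization:
  - Not every process needs to take a new checkpoint. For example, a process that has sent no messages since its last checkpoint can keep that checkpoint, because no other process can have recorded receiving a message from it that its old checkpoint does not record as sent.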
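The two phases can be simulated in a single Python program. This is a minimal sketch, assuming synchronous, in-process "messages" (plain method calls) and a hypothetical `can_checkpoint` flag that models a process failing to take its tentative checkpoint; `Process` and `coordinated_checkpoint` are illustrative names, not an API from the source.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Process:
    pid: int
    state: Any
    can_checkpoint: bool = True      # hypothetical: False models a failed tentative checkpoint
    permanent: Optional[Any] = None
    tentative: Optional[Any] = None
    blocked: bool = False            # True between the tentative checkpoint and the decision
    outbox: List[Any] = field(default_factory=list)

    def take_tentative(self) -> bool:
        """Phase 1 reply: True ("yes") if a tentative checkpoint was taken."""
        if not self.can_checkpoint:
            return False
        self.tentative = self.state
        self.blocked = True  # no application messages until the decision arrives
        return True

    def apply_decision(self, commit: bool) -> None:
        """Phase 2: make the tentative checkpoint permanent, or discard it."""
        if commit and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None
        self.blocked = False

    def send(self, msg: Any) -> None:
        if self.blocked:
            raise RuntimeError(f"P{self.pid} may not send while awaiting the decision")
        self.outbox.append(msg)  # stands in for real transmission


def coordinated_checkpoint(initiator: Process, others: List[Process]) -> bool:
    everyone = [initiator] + others
    # Phase 1: the initiator checkpoints tentatively and asks every other process
    # to do the same (a list, not a short-circuiting all(), so everyone is asked).
    votes = [p.take_tentative() for p in everyone]
    decision = all(votes)
    # Phase 2: every process receives the same decision and acts on it.
    for p in everyone:
        p.apply_decision(decision)
    return decision
```

With `ps = [Process(i, {"v": i}) for i in range(3)]`, calling `coordinated_checkpoint(ps[0], ps[1:])` makes all three states permanent; if any process answers "no", every tentative checkpoint is discarded and the previous permanent ones remain, which is the all-or-none property behind the correctness argument.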

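The rollback recovery algorithm
It restores the system to a consistent state after a failure. Assumptions:
• A single process initiates recovery, and the checkpointing and rollback recovery algorithms are not invoked concurrently.
Two phases:
• Phase 1: The initiating process sends a message to all other processes asking whether they are willing to restart from their previous checkpoints. A process may answer "no", for example if it is already taking part in a checkpointing or recovery run started by another process.
• All processes must agree: the initiator decides to roll back only if every process is willing; otherwise it aborts the rollback and may try again later.
• Phase 2: The initiating process sends the final decision to all processes, and each process acts on it after receiving it.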
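The recovery algorithm has the same two-phase shape as the checkpointing algorithm. The sketch below assumes in-process method calls as messages and a hypothetical `busy` flag as the reason a process refuses to restart; `Proc` and `coordinated_rollback` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Proc:
    pid: int
    state: Any
    permanent: Any        # last permanent checkpoint
    busy: bool = False    # hypothetical: already in another checkpoint/recovery run

    def willing_to_restart(self) -> bool:
        # Phase 1 reply: a process may say "no", e.g. because it is busy.
        return not self.busy

    def apply_decision(self, restart: bool) -> None:
        # Phase 2: roll back to the permanent checkpoint only if told to.
        if restart:
            self.state = self.permanent


def coordinated_rollback(initiator: Proc, others: List[Proc]) -> bool:
    everyone = [initiator] + others
    # Phase 1: ask every process (a list, so no reply is skipped).
    replies = [p.willing_to_restart() for p in everyone]
    decision = all(replies)
    # Phase 2: the same decision reaches all processes; on "no" the
    # initiator aborts this attempt and may retry later.
    for p in everyone:
        p.apply_decision(decision)
    return decision
```

If one process is busy, nobody rolls back, so the system never ends up with some processes restored and others not.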

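• Correctness: all processes resume from a consistent state, because the checkpointing algorithm only produces consistent sets of permanent checkpoints.

• Optimization: not every process needs to roll back. For example, a process that has exchanged no messages with the others since its last checkpoint can continue without restarting.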
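Both bullets above can be made concrete. The sketch below assumes each checkpoint is summarized by the sets of (unique, hypothetical) message IDs its process had sent and received when it was taken: a global checkpoint is consistent if no message is recorded as received but not as sent. The `must_roll_back` helper is a conservative illustration of the optimization, rolling back only processes that interacted, directly or transitively, with the failed process since their last checkpoints; all names are illustrative.

```python
from typing import Dict, Set, Tuple

# Hypothetical summary of one checkpoint: (IDs of messages sent, IDs received).
Checkpoint = Tuple[Set[str], Set[str]]


def orphan_messages(global_ckpt: Dict[str, Checkpoint]) -> Set[str]:
    """Messages some checkpoint records as received but no checkpoint records as sent."""
    all_sent: Set[str] = set()
    all_received: Set[str] = set()
    for sent, received in global_ckpt.values():
        all_sent |= sent
        all_received |= received
    return all_received - all_sent


def is_consistent(global_ckpt: Dict[str, Checkpoint]) -> bool:
    # In-transit messages (sent but not yet received) are allowed; orphans are not.
    return not orphan_messages(global_ckpt)


def must_roll_back(failed: str, interactions: Set[Tuple[str, str]]) -> Set[str]:
    """Conservative set of processes to roll back.

    interactions holds (sender, receiver) pairs for messages exchanged after
    the processes' last checkpoints; a process with no such link to the
    failed process (even transitively) keeps running.
    """
    result = {failed}
    changed = True
    while changed:
        changed = False
        for a, b in interactions:
            if (a in result) != (b in result):
                result |= {a, b}
                changed = True
    return result
```

For instance, `must_roll_back("X", {("X", "Y")})` gives `{"X", "Y"}`, while a process `Z` that only talked to a process outside that set since its last checkpoint is left alone.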
