4.1.6. Coordinated Checkpointing Algorithm-1
1 RIT/CSE/CS8603-DS/UNIT IV
TOPICS
4.1. Checkpointing and Rollback Recovery
4.1.1. Introduction
4.1.2. Background and Definitions
4.1.3. Issues in Failure Recovery
4.1.4. Checkpoint-based Recovery
4.1.5. Log-based Rollback Recovery
4.1.6. Coordinated Checkpointing Algorithm
4.1.7. Algorithm for Asynchronous Checkpointing and Recovery
4.2. Consensus and Agreement Algorithm
4.2.1. Problem Definition
4.2.2. Overview of Results
4.2.3. Agreement in a Failure-free System
4.2.4. Agreement in Synchronous Systems with Failures
4.1.6. Coordinated Checkpointing Algorithm (Koo-Toueg)
It takes a consistent set of checkpoints
It avoids the domino effect and livelock problems during recovery
It has 2 parts: the checkpointing algorithm and the recovery algorithm
Checkpointing algorithm
Assumptions:
Channels are FIFO
End-to-end protocols cope with message loss
Communication failures do not partition the network
A single process initiates the algorithm, and no process fails during its execution
Two kinds of checkpoints:
Permanent
Tentative
Permanent checkpoint:
Local checkpoint, part of a consistent global checkpoint
Tentative checkpoint:
Temporary checkpoint; becomes a permanent checkpoint when the algorithm terminates successfully
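Checkpointing algorithm:
2 phases
Phase 1: The initiating process takes a tentative checkpoint and requests all other processes to take tentative checkpoints. Each process tells the initiator whether it succeeded.
After taking a tentative checkpoint, a process cannot send messages of the underlying computation until it learns the final decision.
Phase 2: All processes reach the same single decision: make the tentative checkpoints permanent, or discard them. The checkpoints are made permanent only if every process took a tentative checkpoint.
All processes receive the final decision from the initiating process and act accordingly.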
Correctness: the set of permanent checkpoints is consistent for 2 reasons
Either all or none of the processes advance their checkpoint by taking a permanent checkpoint
No process sends a message after taking its checkpoint until all checkpoints have been made permanent
Optimization:
Not all processes may need to take a checkpoint (for example, a process that has not changed since its last checkpoint)
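The two phases above can be sketched in a few lines of Python. This is a minimal single-machine simulation, not the full Koo-Toueg protocol: message passing is replaced by direct method calls, and the names `Process`, `willing`, `take_tentative`, `finish` and `coordinated_checkpoint` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Process:
    pid: int
    state: int = 0                   # current application state
    permanent: int = 0               # last committed (permanent) checkpoint
    tentative: Optional[int] = None  # set only while a round is in progress
    willing: bool = True             # hypothetical flag: can it checkpoint now?

    def can_send(self) -> bool:
        # Application messages are held back while a tentative checkpoint is pending.
        return self.tentative is None

    def take_tentative(self) -> bool:
        # Phase 1: save the current state tentatively, or refuse.
        if not self.willing:
            return False
        self.tentative = self.state
        return True

    def finish(self, commit: bool) -> None:
        # Phase 2: promote the tentative checkpoint or discard it.
        if commit and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None


def coordinated_checkpoint(initiator: Process, others: List[Process]) -> bool:
    group = [initiator] + others
    # Phase 1: the initiator checkpoints first, then asks everyone else.
    replies = [p.take_tentative() for p in group]
    commit = all(replies)  # a single "no" discards the whole round
    # Phase 2: the same decision is delivered to every process.
    for p in group:
        p.finish(commit)
    return commit
```

Because `finish` receives the same `commit` value at every process, either all tentative checkpoints become permanent or none do, which is the first correctness condition listed above.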
The rollback recovery algorithm
Restores the system to a consistent state after a failure, under these assumptions:
Single initiator; the checkpointing and rollback recovery algorithms are not invoked concurrently
2 phases
Phase 1: The initiating process sends a message to all other processes, asking whether they are willing to restart from their previous checkpoints.
The decision is all or nothing: the processes roll back only if every process agrees.
Phase 2: The initiating process sends the final decision to all processes, and each process acts accordingly on receiving it.
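The recovery phases can be sketched the same way. This is a simplified illustration under the stated assumptions (single initiator, no concurrent checkpointing round); `RecoveringProcess`, `willing`, `vote_rollback`, `apply` and `coordinated_rollback` are hypothetical names, and messages are again modelled as direct calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveringProcess:
    pid: int
    state: int             # current application state
    permanent: int         # last permanent checkpoint
    willing: bool = True   # hypothetical flag: agrees to restart?

    def vote_rollback(self) -> bool:
        # Phase 1 reply: is this process willing to restart from its checkpoint?
        return self.willing

    def apply(self, rollback: bool) -> None:
        # Phase 2: restore the checkpointed state only if the decision is "roll back".
        if rollback:
            self.state = self.permanent


def coordinated_rollback(initiator: RecoveringProcess,
                         others: List[RecoveringProcess]) -> bool:
    group = [initiator] + others
    # Phase 1: collect every process's preference.
    decision = all(p.vote_rollback() for p in group)
    # Phase 2: send the final decision to all processes.
    for p in group:
        p.apply(decision)
    return decision
```

If any process declines, no process restores its checkpoint, so the system never ends up with some processes rolled back and others not.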