Data Recovery
Crash Recovery
A DBMS is a highly complex system with hundreds of transactions being executed every second.
The durability and robustness of a DBMS depend on its architecture and on the underlying
hardware and system software. If it fails or crashes in the middle of transactions, the system is
expected to follow a recovery algorithm or technique to restore the lost data.
Failure Classification
To identify where a problem has occurred, failures are grouped into the following
categories −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from which it
cannot proceed. This is called a transaction failure, and it affects only a few transactions or
processes. A transaction can fail for the following reasons −
Logical errors − A transaction cannot complete because of a code error or an internal
error condition.
System errors − The database system itself terminates an active transaction because the
DBMS is not able to execute it, or because it has to stop due to some system condition.
For example, in case of deadlock or resource unavailability, the system aborts an active
transaction.
System Crash
Problems external to the system, such as an interruption in the power supply, may cause the
underlying hardware or software to fail and bring the system to an abrupt stop.
Disk Failure
In the early days of technology evolution, hard-disk drives and other storage drives failed
frequently.
Disk failures include the formation of bad sectors, an unreachable disk, a disk head crash, or any
other failure that destroys all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into
two categories −
Volatile storage − As the name suggests, volatile storage cannot survive system
crashes. Volatile storage devices are placed very close to the CPU; normally they are
embedded onto the chipset itself. Main memory and cache memory are examples of
volatile storage. They are fast but can store only a small amount of information.
Non-volatile storage − This storage is made to survive system crashes. It is huge in
data storage capacity but slower to access. Examples include hard disks, magnetic
tapes, flash memory, and non-volatile (battery-backed) RAM.
Recovery and Atomicity
When a DBMS recovers from a crash, it should do the following −
It should check the states of all the transactions that were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
It should check whether the transaction can be completed now or needs to be rolled
back.
Two types of techniques can help a DBMS recover while maintaining the atomicity of a
transaction −
Maintaining the logs of each transaction and writing them onto stable storage before
actually modifying the database.
Maintaining shadow paging, where the changes are made in volatile memory or on a shadow
copy, and the actual database is updated later.
Log-based Recovery
A log is a sequence of records that captures the actions performed by a transaction. It is
important that the log records are written before the actual modification and stored on stable,
failsafe storage media.
When a transaction enters the system and starts execution, it writes a log record about it −
<Tn, Start>
When the transaction finishes, it logs −
<Tn, Commit>
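The write-before-modify rule can be illustrated with a minimal Python sketch. The `LoggedStore` class, its in-memory `log` list (standing in for a log file on stable storage), and the update record format `(txn, key, old_value, new_value)` are all assumptions made for illustration, not part of any particular DBMS.

```python
class LoggedStore:
    """Toy key-value store that appends a log record before each change."""

    def __init__(self):
        self.log = []    # hypothetical stand-in for a log file on stable storage
        self.data = {}   # hypothetical stand-in for the database itself

    def start(self, txn):
        self.log.append((txn, "Start"))

    def write(self, txn, key, new_value):
        old_value = self.data.get(key)
        # Log first: the record is appended before the data is changed.
        self.log.append((txn, key, old_value, new_value))
        self.data[key] = new_value

    def commit(self, txn):
        self.log.append((txn, "Commit"))


store = LoggedStore()
store.start("T1")
store.write("T1", "X", 100)
store.commit("T1")
for record in store.log:
    print(record)
```

Keeping the old value in the update record is one possible design choice; it is what would let a recovery routine undo the change later.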
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory
space available in the system. As time passes, the log file may grow too big to be handled at all.
A checkpoint is a mechanism where all the previous logs are removed from the system and stored
permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a
consistent state and all the transactions were committed.
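As a rough sketch of this idea, the Python snippet below moves every existing log record into an archive and leaves a checkpoint marker in the active log. The function name `take_checkpoint`, the list-based logs, and the `("Checkpoint",)` marker are hypothetical; a real system would write to disk rather than to Python lists.

```python
def take_checkpoint(active_log, archive):
    """Move every record in active_log to archive, then mark the checkpoint."""
    archive.extend(active_log)          # store the previous logs permanently
    active_log.clear()                  # remove them from the active log
    active_log.append(("Checkpoint",))  # hypothetical checkpoint marker


# All transactions are committed at this point, as the checkpoint requires.
active_log = [("T1", "Start"), ("T1", "Commit"), ("T2", "Start"), ("T2", "Commit")]
archive = []
take_checkpoint(active_log, archive)
print(active_log)
print(len(archive))
```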
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following
manner −
The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn,
Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but finds no commit or abort log, it
puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. For all the
transactions in the redo-list, the previous logs are removed, and the transactions are redone
before their logs are saved.
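The classification step described above can be sketched in Python as follows. The tuple-based record format, the `("Checkpoint",)` marker, the `"Abort"` action name, and the function name `classify` are assumptions for illustration only.

```python
def classify(log):
    """Return (undo_list, redo_list) for the records after the last checkpoint."""
    committed, aborted, started = set(), set(), set()
    for record in reversed(log):           # read the log backwards from the end
        if record == ("Checkpoint",):
            break                          # stop at the last checkpoint
        if len(record) != 2:
            continue                       # skip anything that is not Start/Commit/Abort
        txn, action = record
        if action == "Commit":
            committed.add(txn)
        elif action == "Abort":
            aborted.add(txn)
        elif action == "Start":
            started.add(txn)
    redo_list = sorted(committed)                      # Commit seen, with or without Start
    undo_list = sorted(started - committed - aborted)  # Start seen, no Commit or Abort
    return undo_list, redo_list


log = [
    ("T1", "Start"),
    ("T1", "Commit"),
    ("T4", "Start"),
    ("Checkpoint",),
    ("T2", "Start"),
    ("T3", "Start"),
    ("T2", "Commit"),
    ("T4", "Commit"),   # only the Commit of T4 appears after the checkpoint
    ("T5", "Start"),
    ("T5", "Abort"),
]
undo_list, redo_list = classify(log)
print(undo_list)  # ['T3']
print(redo_list)  # ['T2', 'T4']
```

In this sketch T3 lands in the undo-list because it started but never committed or aborted, while T4 lands in the redo-list on the strength of its commit record alone.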
We can have checkpoints at multiple stages so as to save the contents of the database
periodically.
The state of the active database in volatile memory can be periodically dumped onto
stable storage, which may also contain logs, active transactions, and buffer blocks.
<dump> can be marked on a log file whenever the database contents are dumped from
volatile memory to stable storage.
Recovery
When the system recovers from a failure, it can restore the latest dump.
Remote backup − Here, a backup copy of the database is stored at a remote location from
where it can be restored in case of a catastrophe.
Alternatively, database backups can be taken on magnetic tapes and stored at a safe
place. This backup can later be transferred onto a freshly installed database to bring it to
the point of backup.
Large databases are too bulky to be backed up frequently. In such cases, there are techniques
that can restore a database just by looking at its logs, so all that we need to do is take a
backup of the logs at frequent intervals. The database can be backed up once a week, and the
logs, being very small, can be backed up every day or as frequently as possible.
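A minimal Python sketch of restoring from an older backup plus more recent logs is shown below. It assumes, purely for illustration, that the backup is a dictionary and that update records have the form `(txn, key, old_value, new_value)`; the function name `restore` is hypothetical as well.

```python
def restore(dump, log_records):
    """Rebuild the database from the latest dump plus the logs taken since."""
    database = dict(dump)                  # start from a copy of the backup
    committed = {r[0] for r in log_records if r[1:] == ("Commit",)}
    for record in log_records:
        if len(record) == 4 and record[0] in committed:
            txn, key, old_value, new_value = record
            database[key] = new_value      # reapply committed changes in log order
    return database


weekly_dump = {"X": 100, "Y": 50}
daily_log = [
    ("T1", "Start"),
    ("T1", "X", 100, 120),
    ("T1", "Commit"),
    ("T2", "Start"),
    ("T2", "Y", 50, 0),    # T2 never committed, so this change is not reapplied
]
print(restore(weekly_dump, daily_log))  # {'X': 120, 'Y': 50}
```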
Remote Backup
Remote backup provides a sense of security in case the primary location where the database is
located gets destroyed. Remote backup can be offline, or real-time (online). In case it is
offline, it is maintained manually.
Online backup systems work in real time and are lifesavers for database administrators and
investors. An online backup system is a mechanism where every bit of the real-time data is
backed up simultaneously at two distant places. One of them is directly connected to the system
and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses the failure and switches
the user system to the remote storage. Sometimes this is so quick that users do not even notice
the failure.
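The behaviour of an online backup can be sketched in Python as below. The `Site` and `MirroredDatabase` classes and the `alive` flag are hypothetical names used only to illustrate writing to two places and switching reads to the remote copy on failure; real systems detect failures and replicate over a network.

```python
class Site:
    """One storage location holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class MirroredDatabase:
    """Writes to both sites; reads switch to the remote site if the primary is down."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote

    def write(self, key, value):
        # Every write is applied at both places while they are reachable.
        for site in (self.primary, self.remote):
            if site.alive:
                site.data[key] = value

    def read(self, key):
        # Serve from the primary while it is up; otherwise fail over to the remote copy.
        site = self.primary if self.primary.alive else self.remote
        return site.name, site.data[key]


primary, remote = Site("primary"), Site("remote")
db = MirroredDatabase(primary, remote)
db.write("X", 1)
print(db.read("X"))      # ('primary', 1)
primary.alive = False    # simulate the primary storage failing
print(db.read("X"))      # ('remote', 1)
```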
Shadow paging in DBMS
In this method, all the transactions are executed in primary memory or on a shadow
copy of the database. Once all the transactions have executed completely, the changes are
applied to the database. Hence, a failure in the middle of a transaction is not reflected
in the database.
A database pointer always points to the consistent copy of the database, and transactions
make their updates on a separate copy. Once all the transactions are complete, the DB
pointer is modified to point to the new copy of the DB, and the old copy is deleted. If
there is any failure during a transaction, the pointer still points to the old copy of the
database, and the shadow database is deleted.
The DB pointer therefore always points to a consistent and stable database. This
mechanism assumes that there will not be any disk failure and that only one transaction
executes at a time, so that the shadow DB can hold the data for that transaction. The
shadow DB takes the same memory space as the actual DB. Hence it is not efficient for
huge DBs. In addition, it handles only one transaction at a time.
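The pointer switch can be illustrated with a small Python sketch. The `ShadowDatabase` class, the dictionary used as the database, and the two example transactions are assumptions for illustration; the whole database is copied here for simplicity, and only one transaction runs at a time.

```python
import copy


class ShadowDatabase:
    """Keeps a pointer to a consistent copy; transactions run on a shadow copy."""

    def __init__(self, initial):
        self.current = initial    # the DB pointer: always a consistent copy

    def run(self, transaction):
        shadow = copy.deepcopy(self.current)   # the transaction updates the shadow copy
        try:
            transaction(shadow)
        except Exception:
            return False          # failure: shadow is discarded, pointer is untouched
        self.current = shadow     # success: the pointer now refers to the new copy
        return True


def transfer(db):
    db["A"] -= 30
    db["B"] += 30


def broken_transfer(db):
    db["A"] -= 30
    raise RuntimeError("failure in the middle of the transaction")


db = ShadowDatabase({"A": 100, "B": 0})
print(db.run(transfer), db.current)         # True {'A': 70, 'B': 30}
print(db.run(broken_transfer), db.current)  # False {'A': 70, 'B': 30}
```

The failed transaction leaves no trace because its partial update existed only in the discarded shadow copy.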