Dbms - 5
Concurrency Control is the management procedure that is required for controlling concurrent
execution of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent execution.
o In a multi-user system, multiple users can access and use the same database at one time,
which is known as concurrent execution of the database. It means that the same database
is accessed simultaneously on a multi-user system by different users.
o While working with database transactions, multiple users often need to use the database
to perform different operations at the same time, and in that case the transactions are
executed concurrently.
o The point is that this simultaneous execution happens in an interleaved manner, and no
operation should affect the other executing operations, so that the consistency of the
database is maintained. Thus, when transaction operations are executed concurrently,
several challenging problems arise that need to be solved.
In a database transaction, the two main operations are READ and WRITE. These two operations
need to be managed during the concurrent execution of transactions, because if the interleaving
of these operations is not controlled, the data may become inconsistent. The following problems
occur with concurrent execution of the operations:
Lost Update Problem
This problem occurs when two different database transactions perform read/write operations on
the same database items in an interleaved manner (i.e., concurrent execution) in a way that makes
the values of the items incorrect, hence making the database inconsistent.
For example:
Consider the below diagram where two transactions TX and TY are performed on the same
account A, where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
o At time t2, transaction TX deducts $50 from account A, giving $250 (only deducted locally,
not yet written).
o Meanwhile, at time t3, transaction TY reads the value of account A, which is still $300
because TX has not written its update yet.
o At time t4, transaction TY adds $100 to account A, giving $400 (only added locally, not
yet written).
o At time t6, transaction TX writes the value of account A, which is updated to $250, since
TY has not written its value yet.
o Similarly, at time t7, transaction TY writes the value of account A that it computed at time
t4, i.e., $400. This means the value written by TX is lost, i.e., the update to $250 is lost.
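The following minimal Python sketch (illustrative only, not DBMS code; variable names such as account_A are hypothetical) replays the schedule above and shows how TX's update is overwritten:

```python
# Replaying the interleaved schedule above to show the lost update.
account_A = 300          # shared data item

tx_local = account_A     # t1: TX reads A (300)
tx_local -= 50           # t2: TX deducts 50 locally (250)
ty_local = account_A     # t3: TY reads A (still 300, TX has not written yet)
ty_local += 100          # t4: TY adds 100 locally (400)
account_A = tx_local     # t6: TX writes back, A = 250
account_A = ty_local     # t7: TY writes back, A = 400 -- TX's update is lost

print(account_A)         # 400, although a serial execution would give 350
```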
Dirty Read Problem
The dirty read problem occurs when one transaction updates an item of the database, that
transaction then fails, and before the updated data is rolled back, the updated database item is
read by another transaction. This is the Read-Write conflict between the two transactions.
For example: transaction TX updates the balance of account A, transaction TY reads the updated
(uncommitted) value, and then TX fails and is rolled back; TY has now read a "dirty" value that
never officially existed in the database.
Unrepeatable Read Problem
This problem is also known as the Inconsistent Retrievals Problem. It occurs when, within one
transaction, two different values are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account A,
having an available balance = $300. The diagram is shown below:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that will
be read as $400.
o It means that within the same transaction TX, two different values of account A are read:
$300 initially, and $400 after the update made by transaction TY. This is an unrepeatable
read and is therefore known as the Unrepeatable Read problem.
Thus, in order to maintain consistency in the database and avoid such problems during
concurrent execution, management is needed, and that is where the concept of Concurrency
Control comes into play.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing the
concurrent execution of database operations and thus avoiding the inconsistencies in the
database. Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.
Concurrency Control Protocols
Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an appropriate
lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. With a shared lock, the data item can only be read by
the transaction.
o It can be shared between transactions because while a transaction holds a shared lock, it
cannot update the data item.
2. Exclusive lock:
o With an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive, so multiple transactions cannot modify the same data item
simultaneously.
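As a rough illustration of lock compatibility, the following Python sketch of a toy lock table grants a request only when it is compatible with the locks already held; the LockTable class and its methods are hypothetical, not part of any real DBMS:

```python
# Toy lock table: shared (S) locks are compatible with each other,
# any combination involving an exclusive (X) lock conflicts.
class LockTable:
    def __init__(self):
        self.locks = {}                       # item -> (mode, set of transaction ids)

    def acquire(self, txn, item, mode):       # mode is "S" or "X"
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":  # only S/S is compatible
            holders.add(txn)
            return True
        return False                          # caller must wait (or be rolled back)

    def release(self, txn, item):
        _, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lt = LockTable()
print(lt.acquire("T1", "A", "S"))   # True  -- shared lock granted
print(lt.acquire("T2", "A", "S"))   # True  -- shared locks are compatible
print(lt.acquire("T3", "A", "X"))   # False -- exclusive lock conflicts, T3 waits
```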
Simplistic Lock Protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based protocols
require every transaction to obtain a lock on the data before performing an insert, delete, or
update on it. The data item is unlocked after the transaction completes.
Pre-claiming Lock Protocol
o Pre-claiming lock protocols evaluate the transaction to list all the data items on which it
needs locks.
o Before initiating execution of the transaction, it requests the DBMS for locks on all of
those data items.
o If all the locks are granted, this protocol allows the transaction to begin. When the
transaction completes, it releases all the locks.
o If all the locks are not granted, the transaction rolls back and waits until all the locks are
granted.
Two-phase Locking (2PL)
o The two-phase locking protocol divides the execution of a transaction into three parts.
o In the first part, when the transaction starts executing, it seeks permission for the locks it
requires.
o In the second part, the transaction acquires all the locks. The third part starts as soon as
the transaction releases its first lock.
o In the third part, the transaction cannot demand any new locks; it only releases the
acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released,
but no new locks can be acquired.
If lock conversion is allowed, then upgrading a lock (from a shared lock S(a) to an exclusive
lock X(a)) is allowed only in the growing phase, and downgrading a lock (from X(a) to S(a))
must be done in the shrinking phase.
Example:
The sketch below illustrates how locking and unlocking work under 2-PL for two transactions
T1 and T2.
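This is a minimal Python sketch (illustrative only) that enforces just the two-phase rule: once a transaction releases any lock, it may not acquire new ones. It does not model conflicts or waiting between T1 and T2.

```python
# Enforcing the 2PL rule: no lock may be acquired after the first unlock.
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.shrinking = False            # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violated - lock({item}) after an unlock")
        print(f"{self.name}: lock({item})")

    def unlock(self, item):
        self.shrinking = True             # growing phase ends here
        print(f"{self.name}: unlock({item})")

t1, t2 = TwoPhaseTransaction("T1"), TwoPhaseTransaction("T2")
t1.lock("A"); t1.lock("B")                # T1 growing phase
t1.unlock("A")                            # T1 enters its shrinking phase
t2.lock("A"); t2.lock("B")                # T2 growing phase
t1.unlock("B"); t2.unlock("A"); t2.unlock("B")
# t1.lock("C")                            # would raise: lock after unlock violates 2PL
```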
Strict Two-phase Locking (Strict-2PL)
o The first phase of Strict-2PL is the same as in 2PL: after acquiring all the locks, the
transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock
immediately after using it.
o Strict-2PL waits until the whole transaction commits, and then releases all the locks at
once.
o The Strict-2PL protocol therefore does not have a shrinking phase of gradual lock release.
o It does not suffer from cascading aborts the way 2PL can.
Timestamp Ordering Protocol
o The Timestamp Ordering Protocol is used to order the transactions based on their
timestamps. The order of the transactions is simply the ascending order of their creation
times.
o The older transaction has higher priority, which is why it executes first. To determine the
timestamp of a transaction, this protocol uses system time or a logical counter.
o The lock-based protocol manages the order between conflicting pairs of transactions at
execution time, whereas timestamp-based protocols start working as soon as a transaction
is created.
o Let's assume there are two transactions, T1 and T2. Suppose transaction T1 entered the
system at time 007 and transaction T2 entered the system at time 009. T1 has the higher
priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read' and
'write' operations on each data item.
1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:
o If W_TS(X) > TS(Ti), the operation is rejected and Ti is rolled back.
o If W_TS(X) <= TS(Ti), the operation is executed and R_TS(X) is set to
max(R_TS(X), TS(Ti)).
2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:
o If TS(Ti) < R_TS(X) or TS(Ti) < W_TS(X), the operation is rejected and Ti is rolled back.
o Otherwise, the operation is executed and W_TS(X) is set to TS(Ti).
Where TS(Ti) denotes the timestamp of transaction Ti, R_TS(X) denotes the read timestamp of
data item X, and W_TS(X) denotes the write timestamp of data item X. A sketch of these checks
follows this list.
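A minimal Python sketch of these checks, assuming per-item dictionaries R_TS and W_TS that default to 0 for items never read or written (names are illustrative):

```python
# Basic timestamp-ordering checks for Read(X) and Write(X).
R_TS, W_TS = {}, {}                     # per-item read / write timestamps

def read_item(ts, item):
    if ts < W_TS.get(item, 0):          # a younger transaction already wrote item
        return "rollback"               # Ti is rolled back
    R_TS[item] = max(R_TS.get(item, 0), ts)
    return "read ok"

def write_item(ts, item):
    if ts < R_TS.get(item, 0) or ts < W_TS.get(item, 0):
        return "rollback"               # the write would arrive "too late"
    W_TS[item] = ts
    return "write ok"

print(write_item(9, "A"))               # write ok  (W_TS(A) = 9)
print(read_item(7, "A"))                # rollback  (TS 7 tries to read after TS 9 wrote)
```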
o The TS protocol ensures freedom from deadlock, since no transaction ever waits.
o However, the schedule may not be recoverable and may not even be cascade-free.
Validation-Based Protocol
The validation-based protocol is also known as the optimistic concurrency control technique. In
this protocol, a transaction is executed in the following three phases:
1. Read phase: In this phase, transaction T is read and executed. It reads the values of the
various data items and stores them in temporary local variables. All write operations are
performed on these temporary variables without updating the actual database.
2. Validation phase: In this phase, the values of the temporary variables are validated
against the actual data to check whether serializability would be violated.
3. Write phase: If the transaction passes validation, the temporary results are written to the
database; otherwise the transaction is rolled back.
Validation(Ti): the time when Ti finishes its read phase and starts its validation phase.
o This protocol determines the timestamp used to serialize a transaction from the timestamp
of its validation phase, as that is the phase which actually determines whether the
transaction will commit or roll back.
o Hence TS(T) = Validation(T).
o Serializability is determined during the validation process; it cannot be decided in
advance.
o While executing transactions, this approach allows a greater degree of concurrency and
causes fewer conflicts.
o Thus, it results in fewer transaction rollbacks. A simplified sketch of the validation test is
given below.
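This is a simplified, hedged Python sketch of a validation test: before transaction Ti enters its write phase, its read set is compared against the write sets of transactions that validated earlier. The dictionary fields (read_set, write_set, start_time, finish_time) are illustrative only, and the full textbook test has additional cases that this sketch omits.

```python
# Simplified optimistic validation: Ti may commit only if no earlier transaction
# overwrote something Ti read while Ti was running.
def validate(ti, earlier_committed):
    """Return True if Ti may commit, False if it must be rolled back."""
    for tj in earlier_committed:
        if tj["finish_time"] < ti["start_time"]:
            continue                                   # Tj finished before Ti started
        if tj["write_set"] & ti["read_set"]:
            return False                               # Ti read something Tj overwrote
    return True

t_old = {"write_set": {"A"}, "finish_time": 12}
t_new = {"read_set": {"A", "B"}, "start_time": 10}
print(validate(t_new, [t_old]))                        # False -> roll back and restart
```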
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is said to be one of the most feared complications in a
DBMS, as no task ever gets finished and everything remains in a waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update
some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the
grade table and needs to update the rows in the Student table held by Transaction T1.
Now the main problem arises: transaction T1 is waiting for T2 to release its lock and, similarly,
transaction T2 is waiting for T1 to release its lock. All activity comes to a halt and remains at a
standstill until the DBMS detects the deadlock and aborts one of the transactions.
Deadlock Avoidance
o When a database can get stuck in a deadlock state, it is better to avoid the deadlock than to
abort or restart transactions afterwards, since that is a waste of time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A
method like the "wait-for graph" can be used for detecting deadlock situations, but this
method is suitable only for smaller databases. For larger databases, a deadlock prevention
method has to be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect
whether the transaction is involved in a deadlock or not. The lock manager maintains a wait-for
graph to detect deadlock cycles in the database.
o This is a suitable method for deadlock detection. In this method, a graph is created based
on the transactions and their locks. If the created graph contains a cycle (closed loop), then
there is a deadlock.
o The wait-for graph is maintained by the system: it has an edge for every transaction that is
waiting for some data item held by another. The system keeps checking whether there is
any cycle in the graph.
For the above scenario, the wait-for graph contains an edge from T1 to T2 and an edge from T2
to T1, which form a cycle, so the two transactions are deadlocked. A sketch of cycle detection is
given below.
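The following small Python sketch detects deadlock by looking for a cycle in the wait-for graph with a depth-first search; the adjacency-dictionary representation is purely illustrative.

```python
# Edge Ti -> Tj means "Ti is waiting for a lock held by Tj".
def has_cycle(graph):
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_stack:                 # back edge -> cycle -> deadlock
                return True
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in graph if n not in visited)

# The student/grade example above: T1 waits for T2 and T2 waits for T1.
wait_for = {"T1": ["T2"], "T2": ["T1"]}
print(has_cycle(wait_for))                      # True -> the DBMS aborts one transaction
```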
Deadlock Prevention
o The deadlock prevention method is suitable for large databases. If resources are allocated
in such a way that a deadlock can never occur, then deadlock is prevented.
o The database management system analyzes the operations of a transaction to decide
whether they can create a deadlock situation. If they can, the DBMS never allows that
transaction to be executed.
Wait-Die scheme
In this scheme, if a transaction requests a resource that is already held with a conflicting lock by
another transaction, the DBMS simply checks the timestamps of both transactions and allows
only the older transaction to wait until the resource becomes available.
Let's assume there are two transactions, Ti and Tj, and let TS(T) be the timestamp of any
transaction T. If Tj holds a lock on some resource and Ti requests that resource, the DBMS
performs the following actions:
1. Check if TS(Ti) < TS(Tj) - Ti is the older transaction and Tj holds the resource, so Ti is
allowed to wait until the data item is available. That means if the older transaction is
waiting for a resource locked by the younger transaction, the older transaction is allowed
to wait for the resource until it becomes available.
2. Check if TS(Ti) > TS(Tj) - Ti is the younger transaction and it is requesting a resource
held by the older transaction Tj, so Ti is killed (rolled back) and restarted later after a
random delay, but with the same timestamp. In other words, a younger transaction is never
allowed to wait for an older one; it dies instead. A sketch of this decision is given below.
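A minimal Python sketch of the wait-die decision, assuming the requesting and holding transactions are identified only by their timestamps (function and variable names are illustrative):

```python
# Wait-die: an older requester waits, a younger requester dies (is rolled back).
def wait_die(requester_ts, holder_ts):
    if requester_ts < holder_ts:      # requester is older -> it is allowed to wait
        return "wait"
    return "die"                      # requester is younger -> roll back and restart
                                      # later with the SAME timestamp

print(wait_die(7, 9))                 # 'wait': older transaction waits for the younger
print(wait_die(9, 7))                 # 'die' : younger transaction is rolled back
```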
Transaction Failure: The transaction itself fails and must be aborted, for example because of a
logical error (bad input, data not found) or a system error such as deadlock.
System Crash: A hardware or software failure (e.g., a power failure) halts the system and the
contents of volatile storage (main memory) are lost, while non-volatile storage remains intact.
Data-transfer Failure: A disk failure occurs during a data-transfer operation, so that blocks on
the disk lose their content (for example, because of a head crash).
Shadow paging is a recovery technique used to recover a database. In this technique, the
database is considered to be made up of fixed-size logical units of storage called pages. The
pages are mapped onto physical blocks of storage with the help of a page table, which has one
entry for each logical page of the database. This method uses two page tables, named the current
page table and the shadow page table. The entries in the current page table point to the most
recent database pages on disk. The shadow page table is created when the transaction starts, by
copying the current page table. After this, the shadow page table is saved on disk and the current
page table is used for the transaction. Entries in the current page table may change during
execution, but entries in the shadow page table never change. After the transaction commits,
both tables become identical. This technique is also known as out-of-place updating.
To understand the concept, consider a transaction that performs two write operations, on page 3
and page 5. Before the write operation on page 3 starts, the current page table points to the old
page 3. When the write operation starts, the following steps are performed:
1. First, a search is made for an available free block among the disk blocks.
2. After a free block is found, page 3 is copied to that free block, represented as Page 3 (New).
3. Now the current page table points to Page 3 (New) on disk, while the shadow page table still
points to the old page 3, because it is never modified.
4. The changes are now applied to Page 3 (New), which is pointed to by the current page table.
COMMIT Operation:
To commit a transaction, the following steps are performed:
1. All the modifications made by the transaction that are still in buffers are transferred to the
physical database.
2. The current page table is output to disk.
3. The disk address of the current page table is output to the fixed location in stable storage
that holds the address of the shadow page table. This overwrites the address of the old shadow
page table; with this, the current page table becomes the shadow page table and the transaction
is committed. A sketch of this commit-by-pointer-swap is given below.
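A rough Python sketch of the idea, assuming an in-memory dictionary stands in for the disk and a variable root stands in for the fixed stable-storage location pointing to the shadow page table (all names are hypothetical):

```python
# Shadow paging: pages are updated out-of-place; commit swaps the root pointer.
disk_pages = {0: "old page 3", 1: "old page 5", 2: None, 3: None}  # block -> contents
shadow_table = {3: 0, 5: 1}           # logical page -> physical block (never changed)
root = shadow_table                   # fixed location in stable storage

# Transaction start: the current table begins as a copy of the shadow table.
current_table = dict(shadow_table)

# Write to logical page 3: copy it into a free block and redirect the entry.
disk_pages[2] = "new page 3"
current_table[3] = 2                  # shadow_table still points to block 0

# Commit: atomically make the current table the new shadow table.
root = current_table                  # single pointer update = commit point

# A crash before the pointer update leaves `root` at the old shadow table,
# so the database state before the transaction is recovered with no undo/redo.
print(root[3])                        # 2 -> the new version of page 3
```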
Failure: If the system crashes during execution of the transaction but before the commit
operation, it is sufficient to free the modified database pages and discard the current page table.
The state of the database before the transaction is recovered by reinstalling the shadow page
table. If the system crashes after the commit operation has completed (i.e., after the shadow page
table address has been overwritten), the propagation of the changes made by the transaction is
not affected: the changes are preserved and there is no need to perform a redo operation.
Advantages:
This method requires fewer disk accesses to perform operations.
In this method, recovery from a crash is inexpensive and quite fast.
There is no need for operations like undo and redo.
Recovery using this method is faster.
Improved fault tolerance: Shadow paging provides improved fault tolerance since it isolates
transactions from each other. This means that if one transaction fails, it does not affect the
other transactions that are currently executing.
Increased concurrency: Since modifications made during a transaction are written to the
shadow copy instead of the actual database, multiple transactions can be executed
concurrently without interfering with each other. This leads to increased concurrency and
better performance.
Simplicity: Shadow paging is a relatively simple technique to implement. It requires minimal
modifications to the existing database system, making it easier to integrate into existing
systems.
No need for log files: In traditional database systems, log files are used to maintain a record
of all changes made to the database. Shadow paging eliminates the need for log files since all
changes are made to the shadow copy. This reduces the overhead associated with maintaining
log files and makes the system more efficient.
Disadvantages:
Because updated pages change their location on disk, it is quite difficult to keep related
database pages close together on disk.
During the commit operation, the blocks pointed to by the old shadow page table have to be
returned to the collection of free blocks, otherwise they become inaccessible.
Committing a single transaction requires multiple block outputs, which decreases execution
speed.
It is difficult to extend this technique to allow multiple transactions to execute concurrently.
Data fragmentation: The main disadvantage of this technique is that updated data suffers from
fragmentation, because the data is divided into pages that may not remain in linear order on
disk for large sets of related pages; this requires complex storage management strategies.
Garbage collection: Garbage accumulates in pages on disk as data is updated and pages lose
their references. For example, if a page contains a data item X that is replaced with a new
value, then a new page is created, and once the shadow page table is updated, nothing
references the old value of X. The operation that switches between the current and shadow
directories must be implemented as an atomic operation.
Performance overhead: Since modifications made during a transaction are written to the
shadow copy, there is a performance overhead associated with copying the changes back to
the actual database once the transaction is committed. This can impact the overall
performance of the system.
Limited concurrency control: Shadow paging does not provide strong concurrency control
mechanisms. While it allows for multiple transactions to execute concurrently, it does not
prevent conflicts between transactions. This means that transactions can interfere with each
other, leading to inconsistencies in the database.
Difficult to implement for some systems: Shadow paging can be difficult to implement for
some systems that have complex data structures or use a lot of shared memory. In these cases,
it may not be possible to maintain a shadow copy of the entire database.
Limited fault tolerance: While shadow paging does provide improved fault tolerance in
some cases, it does not provide complete fault tolerance. In the event of a crash, there is still a
risk of data loss if the changes made during a transaction are not properly copied to the actual
database.
The atomicity property of a DBMS states that either all the operations of a transaction must be
performed or none of them. The modifications made by an aborted transaction should not be
visible in the database, while the modifications made by a committed transaction should be
visible. To achieve atomicity, the system must first output, to stable storage, information
describing the modifications, without modifying the database itself. This information helps
ensure that all modifications performed by committed transactions are reflected in the database,
and that no modifications made by an aborted transaction persist in the database.
Log based Recovery in DBMS
Use of Checkpoints – When a system crash occurs, the log must be consulted to determine
which transactions need to be redone and which need to be undone. In principle, the entire log
has to be searched to determine this information. There are two major difficulties with this
approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already
written their updates into the database. Although redoing them will cause no harm, it will
cause recovery to take longer.
To reduce this overhead, checkpoints are introduced. A log record of the form <checkpoint L>
is used to represent a checkpoint in the log, where L is the list of transactions active at the time
of the checkpoint. When a checkpoint log record is added to the log, all the transactions that
committed before this checkpoint have a <Ti commit> log record before the checkpoint record.
Any database modification made by such a Ti is written to the database either prior to the
checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform
a redo operation on Ti. After a system crash has occurred, the system examines the log to find
the last <checkpoint L> record. The redo or undo operations need to be applied only to the
transactions in L and to all transactions that started execution after the record was written to the
log. Let us denote this set of transactions as T. The same rules of undo and redo apply to T as
mentioned in the Recovery using Log Records part. Note that only the part of the log starting
with the last checkpoint log record needs to be examined to find the set of transactions T, and to
find out whether a commit or abort record occurs in the log for each transaction in T.
For example, consider the set of transactions {T0, T1, ..., T100}. Suppose that the most recent
checkpoint took place during the execution of transactions T67 and T69, while T68 and all
transactions with subscripts lower than 67 completed before the checkpoint. Thus, only
transactions T67, T69, ..., T100 need to be considered during the recovery scheme. Each of
them needs to be redone if it has completed (that is, either committed or aborted); otherwise, it
was incomplete and needs to be undone.
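The following small Python sketch follows the rule stated above (redo completed transactions, undo incomplete ones) starting from the last checkpoint record; the tuple-based log format is purely illustrative, not a real DBMS log layout.

```python
# Recovery using the last <checkpoint L> record.
log = [
    ("start", "T67"), ("start", "T69"),
    ("checkpoint", ["T67", "T69"]),           # L = {T67, T69}
    ("start", "T70"),
    ("commit", "T67"), ("commit", "T70"),
    # crash happens here; T69 never committed
]

# Find the last checkpoint and build the set T of transactions to consider.
cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
T = set(log[cp][1])
redo = set()
for kind, who in log[cp + 1:]:
    if kind == "start":
        T.add(who)                             # started after the checkpoint
    elif kind in ("commit", "abort"):
        redo.add(who)                          # completed -> redo its updates
undo = T - redo                                # incomplete -> undo its updates

print("redo:", sorted(redo))                   # ['T67', 'T70']
print("undo:", sorted(undo))                   # ['T69']
```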
Log-based recovery is a technique used in database management systems (DBMS) to recover
a database to a consistent state in the event of a failure or crash. It involves the use of transaction
logs, which are records of all the transactions performed on the database.
In log-based recovery, the DBMS uses the transaction log to reconstruct the database to a
consistent state. The transaction log contains records of all the changes made to the database,
including updates, inserts, and deletes. It also records information about each transaction, such as
its start and end times.
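As a rough illustration, update records in such a log can be thought of as carrying the transaction identifier, the data item, the old value (for undo), and the new value (for redo). The dictionary-based format below is hypothetical, not a real DBMS log layout.

```python
# Sketch of appending log records for one transaction.
log = []

def log_update(txn_id, item, old_value, new_value):
    # Each update record stores enough to undo (old value) and redo (new value).
    log.append({"txn": txn_id, "item": item, "old": old_value, "new": new_value})

log.append({"txn": "T1", "type": "start"})
log_update("T1", "A", 300, 250)          # conceptually <T1, A, 300, 250>
log.append({"txn": "T1", "type": "commit"})
print(log)
```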
When a failure occurs, the DBMS uses the transaction log to determine which transactions
were incomplete at the time of the failure. It then performs a series of operations to undo the
incomplete transactions and redo the completed ones. This process is called the redo/undo
recovery algorithm.
The redo operation involves reapplying the changes made by completed transactions that
were not yet saved to the database at the time of the failure. This ensures that all changes are
applied to the database.
The undo operation involves undoing the changes made by incomplete transactions that were
saved to the database at the time of the failure. This restores the database to a consistent state by
reversing the effects of the incomplete transactions.
Once the redo and undo operations are completed, the DBMS can bring the database back
online and resume normal operations.
Log-based recovery is an essential feature of modern DBMSs and provides a reliable
mechanism for recovering from failures and ensuring the consistency of the database.