Unit 4 Transaction Management
What is a Transaction?
A transaction is a set of logically related operations. The main operations of a transaction are:
Read(A): The read operation Read(A), or R(A), reads the value of A from the database and stores it in a buffer in main memory.
Write(A): The write operation Write(A), or W(A), writes the value back to the database from the buffer.
(Note: a write does not always reach the database immediately; it first records the change in the buffer, which is why the dirty read problem can arise.)
Let us take a debit transaction on an account, which consists of the following operations:
R(A);
A = A - 1000;
W(A);
Assume A's initial value in the database is 5000.
The first operation reads the value of A from the database and stores it in a buffer.
The second operation decreases its value by 1000, so the buffer will contain 4000.
The third operation writes the value from the buffer to the database, so A's final value will be 4000.
But a transaction may also fail after executing some of its operations. The failure can be due to hardware, software, or power problems. For example, if the debit transaction discussed above fails after executing operation 2, the value of A will remain 5000 in the database, which is not acceptable to the bank. To handle this, the database provides two important operations:
Commit: After all instructions of a transaction are successfully executed, the changes made by the transaction are made permanent in the database.
Rollback: If a transaction is not able to execute all of its operations successfully, all the changes made by the transaction are undone.
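The commit/rollback behaviour above can be sketched with Python's built-in sqlite3 module. This is an illustrative example only; the table name, account name, and amounts are made up for the demo:

```python
import sqlite3

# Set up an in-memory database with a single account A holding 5000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 5000)")
conn.commit()

def debit(conn, name, amount):
    """Debit `amount` from the named account inside one transaction."""
    try:
        cur = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (name,))
        balance = cur.fetchone()[0]                      # Read(A)
        if balance < amount:
            raise ValueError("insufficient funds")       # simulated failure
        conn.execute(
            "UPDATE account SET balance = ? WHERE name = ?",
            (balance - amount, name))                    # Write(A)
        conn.commit()                                    # make changes permanent
    except Exception:
        conn.rollback()                                  # undo partial changes
        raise

debit(conn, "A", 1000)        # succeeds: A becomes 4000
try:
    debit(conn, "A", 9000)    # fails: rolled back, A stays 4000
except ValueError:
    pass
balance = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
print(balance)  # 4000
```

The failed debit leaves the balance untouched, which is exactly the guarantee that Commit and Rollback together provide.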
Properties of a transaction
Atomicity:
As a transaction is a set of logically related operations, either all of them should be executed or none. The debit transaction discussed above should either execute all three operations or none. If the debit transaction fails after executing operations 1 and 2, then its new value 4000 will not be updated in the database, which leads to inconsistency.
Consistency:
If operations of debit and credit transactions on the same account are executed concurrently, the database may be left in an inconsistent state.
For example, with T1 (debit of Rs. 1000 from A) and T2 (credit of Rs. 500 to A) executing concurrently, the database can reach an inconsistent state.
Let us assume the account balance of A is Rs. 5000. T1 reads A (5000) and stores the value in its local buffer space. Then T2 reads A (5000) and also stores the value in its local buffer space. T1 writes its buffer back, so A's value is updated to 4000 in the database, and then T2 writes the value from its buffer back to the database. A's value is updated to 5500, which shows that the effect of the debit transaction is lost and the database has become inconsistent.
Table 1 (the interleaving described above):

T1 (debit 1000)       T2 (credit 500)
R(A)   // reads 5000
                      R(A)   // reads 5000
A = A - 1000
W(A)   // A = 4000
                      A = A + 500
                      W(A)   // A = 5500
Isolation:
The results of a transaction should not be visible to others before the transaction is committed. For example, let us assume that A's balance is Rs. 5000 and T1 debits Rs. 1000 from A. A's new balance will be 4000. If T2 credits Rs. 500 to A's new balance, A will become 4500, and after this T1 fails. Then we have to roll back T2 as well, because it used a value produced by T1. So a transaction's results are not made visible to other transactions before it commits.
Durability:
Once the database has committed a transaction, the changes made by the transaction should be permanent. For example, if a person has credited $500,000 to his account, the bank can't say that the update has been lost. To avoid this problem, multiple copies of the database are stored at different locations.
Serial Schedule: When one transaction completely executes before another transaction starts, the schedule is called a serial schedule. A serial schedule is always consistent. For example, if a schedule S has a debit transaction T1 and a credit transaction T2, the possible serial schedules are T1 followed by T2 (T1 -> T2) or T2 followed by T1 (T2 -> T1). A serial schedule has low throughput and poor resource utilization.
Question: Consider the following transaction involving two bank accounts x and y:
read(x);
x := x – 50;
write(x);
read(y);
y := y + 50;
write(y);
The constraint that the sum of the accounts x and y should remain constant is that of:
• Atomicity
• Consistency
• Isolation
• Durability
3
Solution: As discussed under the properties of transactions, the consistency property says that the sum of accounts x and y should remain constant before the start and after the completion of the transaction. So, the correct answer is B (Consistency).
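The invariant in this question can be checked directly in code. The Python sketch below (the dictionary-based account store is purely illustrative) asserts that the sum of x and y is unchanged by the transfer:

```python
def transfer(accounts, src, dst, amount):
    """Transfer `amount` from src to dst; the sum of balances is the invariant."""
    total_before = accounts[src] + accounts[dst]
    accounts[src] -= amount   # read(x); x := x - 50; write(x)
    accounts[dst] += amount   # read(y); y := y + 50; write(y)
    total_after = accounts[src] + accounts[dst]
    # Consistency: the integrity constraint must hold after the transaction.
    assert total_before == total_after, "consistency violated"
    return accounts

accounts = transfer({"x": 200, "y": 100}, "x", "y", 50)
print(accounts)  # {'x': 150, 'y': 150}
```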
A transaction is a single logical unit of work which accesses and possibly modifies the contents
of a database. Transactions access data using read and write operations.
In order to maintain consistency in a database, before and after the transaction, certain properties
are followed. These are called ACID properties.
Atomicity
By this, we mean that either the entire transaction takes place at once or it doesn't happen at all. There is no midway; transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves the following two operations:
Abort: if a transaction aborts, the changes it made to the database are not visible.
Commit: if a transaction commits, the changes it made are visible.
Consider the following transaction T, consisting of two parts T1 and T2: a transfer of 100 from account X to account Y.
T1: Read(X); X = X - 100; Write(X);
T2: Read(Y); Y = Y + 100; Write(Y);
If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.
Consistency
This means that integrity constraints must be maintained, so that the database is consistent before and after the transaction. It refers to the correctness of a database. Referring to the example above, the total amount before and after the transaction must be maintained.
Isolation
This property ensures that multiple transactions can occur concurrently without leading to an inconsistent database state. Transactions occur independently, without interference. Changes occurring in a particular transaction are not visible to any other transaction until the change has been written to memory or committed. This property ensures that concurrent execution of transactions results in a state that is equivalent to the state achieved if they were executed serially in some order.
Suppose T has been executed up to Read(Y), and then a second transaction T'' (which computes X + Y) starts. Interleaving of operations takes place, so T'' reads the updated value of X but the old value of Y. With X = 50,000 and Y = 500 initially, after T's Write(X) we have X = 49,900, so T'' computes X + Y = 49,900 + 500 = 50,400 instead of 50,500.
This results in database inconsistency, due to an apparent loss of 100 units. Hence, transactions must take place in isolation, and changes should be visible only after they have been written to main memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a system
failure occurs. These updates now become permanent and are stored in non-volatile memory.
The effects of the transaction, thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a database, in such a way that each transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other operations, and makes updates that are durably stored.
4.3 Schedules
A schedule is the process of lining up the transactions and executing them one by one. When multiple transactions run concurrently and the order of operations needs to be set so that the operations do not overlap, scheduling is brought into play and the transactions are timed accordingly.
Serial Schedules:
Schedules in which the transactions are executed non-interleaved, i.e., in which no transaction starts until the currently running transaction has ended, are called serial schedules.
Example: Consider the following schedule involving two transactions T1 and T2:

T1              T2
R(A)
W(A)
R(B)
W(B)
                R(A)
                W(B)

where R(A) denotes that a read operation is performed on data item 'A'. This is a serial schedule, since the transactions execute serially in the order T1 -> T2.
Non-Serial Schedule:
This is a type of scheduling where the operations of multiple transactions are interleaved. This may give rise to concurrency problems. The transactions are executed in a non-serial manner, while keeping the end result correct and the same as that of a serial schedule. Unlike a serial schedule, where one transaction must wait for another to complete all its operations, in a non-serial schedule a transaction proceeds without waiting for the previous transaction to complete. Such a schedule therefore provides the benefits of concurrent execution, but the interleaving must be controlled so that it does not leave the database inconsistent. The non-serial schedule can be divided further into two types: Serializable and Non-Serializable.
4.4 Serializable:
Serializability is used to maintain the consistency of the database. It is mainly used with non-serial schedules, to verify whether a given scheduling will lead to any inconsistency. A serial schedule, on the other hand, does not need a serializability check, because it runs a transaction only after the previous transaction is complete. A non-serial schedule of n transactions is said to be serializable only when it is equivalent to some serial schedule of those transactions. Since concurrency is allowed in this case, multiple transactions can execute concurrently, and a serializable schedule helps in improving both resource utilization and CPU throughput. Serializability is of two types:
1. Conflict Serializable: A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations. (Two operations conflict if they belong to different transactions, access the same data item, and at least one of them is a write.)
2. View Serializable: A schedule is called view serializable if it is view equal to some serial schedule (one with no overlapping transactions). Every conflict serializable schedule is view serializable, but a view serializable schedule that contains blind writes may not be conflict serializable.
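Conflict serializability is usually tested with a precedence graph. The Python sketch below, using a made-up (txn, op, item) encoding of a schedule, builds the graph from conflicting pairs and checks it for cycles; a schedule is conflict serializable exactly when the graph is acyclic:

```python
from collections import defaultdict

def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op 'R' or 'W'.
    Build the precedence graph and return True iff it is acyclic."""
    edges = defaultdict(set)
    txns = {t for t, _, _ in schedule}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflicting pair: same item, different txns, at least one write.
            if x == y and ti != tj and (op_i == "W" or op_j == "W"):
                edges[ti].add(tj)   # ti's operation precedes tj's
    # Cycle detection by depth-first search.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}
    def has_cycle(t):
        color[t] = GRAY
        for u in edges[t]:
            if color[u] == GRAY or (color[u] == WHITE and has_cycle(u)):
                return True
        color[t] = BLACK
        return False
    return not any(color[t] == WHITE and has_cycle(t) for t in txns)

# The serial schedule T1 -> T2 from the text: trivially conflict serializable.
serial = [("T1","R","A"), ("T1","W","A"), ("T1","R","B"), ("T1","W","B"),
          ("T2","R","A"), ("T2","W","B")]
# An interleaving with a cycle T1 -> T2 -> T1: not conflict serializable.
cyclic = [("T1","R","A"), ("T2","W","A"), ("T1","W","A")]
print(is_conflict_serializable(serial))  # True
print(is_conflict_serializable(cyclic))  # False
```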
Non-Serializable:
The non-serializable schedule is divided into two types: Recoverable and Non-recoverable schedules.
Recoverable Schedule:
Schedules in which transactions commit only after all the transactions whose changes they read have committed are called recoverable schedules. In other words, if some transaction Tj reads a value updated or written by some other transaction Ti, then the commit of Tj must occur after the commit of Ti.
Example: Consider the following schedule involving two transactions T1 and T2:

T1              T2
R(A)
W(A)
                W(A)
                R(A)
Commit
                Commit

This is a recoverable schedule, since T1 commits before T2, which makes the value read by T2 correct.
Cascading Schedule:
When a failure in one transaction leads to the rolling back or aborting of other dependent transactions, the scheduling is referred to as a cascading rollback or cascading abort. A schedule that prevents this is said to avoid cascading aborts/rollbacks (ACA); see the cascadeless schedule below.
Cascadeless Schedule:
Schedules in which transactions read values only after all the transactions whose changes they are going to read have committed are called cascadeless schedules. This avoids a situation where a single transaction abort leads to a series of transaction rollbacks. The strategy to prevent cascading aborts is to disallow a transaction from reading uncommitted changes of another transaction in the same schedule. In other words, if some transaction Tj wants to read a value updated or written by some other transaction Ti, then Tj must read it only after the commit of Ti.
Example: Consider the following schedule involving two transactions T1 and T2:

T1              T2
R(A)
W(A)
                W(A)
Commit
                R(A)
                Commit

This schedule is cascadeless, since the updated value of A is read by T2 only after the updating transaction, i.e. T1, commits.
Example: Consider the following schedule involving two transactions T1 and T2:

T1              T2
R(A)
W(A)
                R(A)
                W(A)
abort
                abort

This is a recoverable schedule, but it does not avoid cascading aborts. If T1 aborts, T2 has to be aborted too in order to maintain the correctness of the schedule, as T2 has already read the uncommitted value written by T1.
Strict Schedule:
A schedule is strict if, for any two transactions Ti and Tj, whenever a write operation of Ti precedes a conflicting operation of Tj (either a read or a write), the commit or abort event of Ti also precedes that conflicting operation of Tj. In other words, Tj can read or write a value updated or written by Ti only after Ti commits or aborts.
Example: Consider the following schedule involving two transactions T1 and T2:

T1              T2
R(A)
                R(A)
W(A)
commit
                W(A)
                R(A)
                commit

This is a strict schedule, since T2 reads and writes A, which is written by T1, only after the commit of T1.
Non-Recoverable Schedule:
Example: Consider the following schedule involving two transactions T1 and T2:

T1              T2
R(A)
W(A)
                W(A)
                R(A)
                Commit
Abort

T2 read the value of A written by T1, and committed. T1 later aborted, so the value read by T2 is wrong; but since T2 has already committed, this schedule is non-recoverable.
Cascadeless schedules are stricter than recoverable schedules, i.e., they are a subset of recoverable schedules.
Strict schedules are stricter than cascadeless schedules, i.e., they are a subset of cascadeless schedules.
Serial schedules satisfy the constraints of recoverable, cascadeless and strict schedules, and hence are a subset of strict schedules.
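The recoverability condition can be checked mechanically by tracking the reads-from relation. A rough Python sketch follows (the schedule encoding and function name are invented for illustration; the two sample schedules mirror the recoverable and non-recoverable situations described above):

```python
def is_recoverable(schedule):
    """schedule: list of (txn, op, item) with op in {'R','W','C'}; item is
    None for commits. Recoverable: a txn commits only after every txn it
    read uncommitted data from has committed."""
    last_writer = {}    # item -> txn that last wrote it
    reads_from = {}     # txn -> set of txns it read uncommitted data from
    committed = set()
    for txn, op, item in schedule:
        reads_from.setdefault(txn, set())
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                reads_from[txn].add(writer)
        elif op == "C":
            # Committing before a txn we read from commits -> not recoverable.
            if any(w not in committed for w in reads_from[txn]):
                return False
            committed.add(txn)
    return True

# T2 reads T1's uncommitted write but commits after T1: recoverable.
ok  = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
       ("T1","C",None), ("T2","C",None)]
# T2 reads T1's uncommitted write and commits first (T1 later aborts):
# not recoverable.
bad = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
       ("T2","C",None)]
print(is_recoverable(ok))   # True
print(is_recoverable(bad))  # False
```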
The relation between the various types of schedules can be depicted as: serial schedules are a subset of strict schedules, which are a subset of cascadeless schedules, which are a subset of recoverable schedules (the Venn diagram is not reproduced here).
Question: Consider the following schedule involving three transactions T1, T2 and T3:

T1              T2              T3
R(A)
W(A)
Commit
                W(A)
                                W(A)
                Commit
                                Commit

Which of the following is correct?
(A) The schedule is a view serializable schedule and a strict recoverable schedule
(B) The schedule is a view serializable schedule and is not a strict recoverable schedule
(C) The schedule is a non-serializable schedule and is not a strict recoverable schedule
(D) The schedule is a serializable schedule and is not a strict recoverable schedule
Solution: First of all, it is a view serializable schedule, as it is view equal to the serial schedule T1 -> T2 -> T3, which satisfies the initial read, updated reads, and final write on data item A required for view serializability. However, there is a write-write pair performed by transactions T2 followed by T3 that violates the strict-schedule condition stated above: T3 is supposed to perform its write only after T2 commits, which does not hold in the given schedule. Hence the given schedule is serializable but not strict recoverable.
4.5 Concurrency Control
The concept of concurrency control comes under transactions in a database management system (DBMS). It is a procedure in the DBMS that manages simultaneous processes so that they execute without conflicts with each other; such conflicts occur in multi-user systems.
Concurrency control is the working concept required for controlling and managing the concurrent execution of database operations and thus avoiding inconsistencies in the database. For maintaining the concurrency of the database, we have concurrency control protocols.
If many transactions try to access the same data, inconsistency can arise. Concurrency control is required to maintain data consistency.
For example, consider ATM machines: without concurrency control, multiple people could not draw money at the same time in different places. This is where we need concurrency.
Advantages
• In a multi-user system, multiple users can access and use the same database at one time, which is known as concurrent execution of the database. It means that the same database is used simultaneously on a multi-user system by different users.
• While working with database transactions, multiple users may need to use the database for performing different operations, in which case concurrent execution of the database is performed.
• This simultaneous execution should be done in an interleaved manner, and no operation should affect the other executing operations, thus maintaining the consistency of the database. However, concurrent execution of transaction operations gives rise to several challenging problems that need to be solved.
In a database transaction, the two main operations are READ and WRITE. These operations need to be managed carefully during concurrent execution of transactions, because if they are interleaved badly the data may become inconsistent. The following problems occur with the concurrent execution of operations:
Lost Update Problem:
This problem occurs when two different database transactions perform read/write operations on the same database item in an interleaved manner (i.e., concurrent execution) in such a way that one update is overwritten, making the value of the item incorrect and the database inconsistent.
For example:
Consider two transactions TX and TY performed on the same account A, where the balance of account A is $300.
• At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
• At time t2, transaction TX deducts $50 from account A in its buffer, which becomes $250 (only deducted, not yet written).
• Meanwhile, at time t3, transaction TY reads the value of account A, which is still $300, because TX has not written its value yet.
• At time t4, transaction TY adds $100 to account A in its buffer, which becomes $400 (only added, not yet written).
• At time t6, transaction TX writes the value of account A; the database is updated to $250, as TY has not written its value yet.
• Similarly, at time t7, transaction TY writes its value of account A, i.e., $400 as computed at time t4. The value written by TX is lost: the $250 update is overwritten.
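The interleaving above can be replayed step by step. In this Python sketch, each transaction's local buffer is just a variable and the "database" is a dictionary; the time labels follow the bullet list:

```python
# Each "transaction" keeps its own buffer; writes reach the database only
# at the write step, so TY's write clobbers TX's update.
db = {"A": 300}

tx_buf = db["A"]        # t1: TX reads A = 300
tx_buf -= 50            # t2: TX deducts 50 locally -> 250
ty_buf = db["A"]        # t3: TY reads A = 300 (TX has not written yet)
ty_buf += 100           # t4: TY adds 100 locally -> 400
db["A"] = tx_buf        # t6: TX writes 250
db["A"] = ty_buf        # t7: TY writes 400 -- TX's update is lost

print(db["A"])          # 400, although a serial execution would give 350
```

A serial execution (TX fully, then TY) would end with 300 - 50 + 100 = 350, so the $50 debit has been lost.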
Dirty Read Problem:
The dirty read problem occurs when one transaction updates an item of the database, the transaction then fails, and before the data gets rolled back, the updated (uncommitted) item is read by another transaction.
Unrepeatable Read Problem:
For example: Consider two transactions TX and TY performing read/write operations on account A, with an available balance of $300:
• At time t1, transaction TX reads the value from account A, i.e., $300.
• At time t2, transaction TY reads the value from account A, i.e., $300.
• At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
• At time t4, transaction TY writes the updated value, i.e., $400.
• After that, at time t5, transaction TX reads the available value of account A, and that
will be read as $400.
• It means that within the same transaction, TX reads two different values of account A: $300 initially, and $400 after the update made by transaction TY. This is an unrepeatable read and is therefore known as the unrepeatable read problem.
Thus, in order to maintain consistency in the database and avoid such problems that take place in
concurrent execution, management is needed, and that is where the concept of Concurrency
Control comes into role.
1. Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of locks:
1. Shared lock:
• It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
• It can be shared between transactions, because a transaction holding only a shared lock cannot update the data item.
2. Exclusive lock:
• Under an exclusive lock, the data item can be both read and written by the transaction.
• This lock is exclusive: only one transaction can hold it, so multiple transactions cannot modify the same data simultaneously.
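A minimal lock table for these two modes might look like the following Python sketch. It is illustrative only: incompatible requests are simply denied rather than queued or blocked, and lock upgrades are not handled:

```python
class LockManager:
    """Minimal shared/exclusive lock table (no blocking, no deadlock handling)."""
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holder txn names)

    def acquire(self, txn, item, mode):
        """mode is 'S' or 'X'; return True if the lock is granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only S-with-S is compatible; any combination involving X is denied.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))  # True  -- shared lock granted
print(lm.acquire("T2", "A", "S"))  # True  -- S is compatible with S
print(lm.acquire("T3", "A", "X"))  # False -- X conflicts with held S locks
lm.release("T1", "A"); lm.release("T2", "A")
print(lm.acquire("T3", "A", "X"))  # True  -- granted once S locks are gone
```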
1. Simplistic lock protocol: This is the simplest way of locking data during a transaction. Simplistic lock-based protocols require every transaction to obtain a lock on the data before it inserts, deletes or updates it. The data item is unlocked after the transaction completes.
2. Pre-claiming lock protocol:
• Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.
• Before initiating execution of the transaction, it requests the DBMS for locks on all of those data items.
• If all the locks are granted, the protocol allows the transaction to begin. When the transaction completes, it releases all the locks.
• If the locks are not all granted, the transaction rolls back and waits until all the locks are granted.
3. Two-phase locking (2PL):
• The two-phase locking protocol divides the execution of a transaction into three parts.
• In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.
• In the second part, the transaction acquires all the locks. The third phase starts as soon as the transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks; it only releases the acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, new locks on data items may be acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
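The growing/shrinking discipline can be enforced mechanically. The toy Python class below (class and method names invented for illustration) rejects any lock request made after the transaction's first unlock:

```python
class TwoPhaseTxn:
    """Tracks a transaction's lock calls and rejects any acquisition
    after the first release (i.e., once the shrinking phase has begun)."""
    def __init__(self, name):
        self.name = name
        self.shrinking = False
        self.held = set()

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violation, lock after unlock")
        self.held.add(item)          # growing phase: acquisitions allowed

    def unlock(self, item):
        self.shrinking = True        # the lock point has passed
        self.held.discard(item)

t1 = TwoPhaseTxn("T1")
t1.lock("A"); t1.lock("B")           # growing phase
t1.unlock("A")                       # shrinking phase begins
try:
    t1.lock("C")                     # illegal under 2PL
except RuntimeError as e:
    print(e)                         # T1: 2PL violation, lock after unlock
```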
Example: The following shows how locking and unlocking work with 2PL for two transactions T1 and T2, assuming lock conversion is allowed (the step-by-step operation tables for T1 and T2 are not reproduced here; the step numbers below refer to that example).
• Growing phase: from step 2-6
• Shrinking phase: from step 8-9
• Lock point: at 6
Strict Two-Phase Locking (Strict-2PL):
• The first phase of Strict-2PL is the same as in 2PL: after acquiring all its locks, the transaction continues to execute normally.
• The difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately after using it.
• Strict-2PL waits until the whole transaction commits, and then releases all the locks at once.
• Thus, the Strict-2PL protocol does not have a gradual shrinking phase of lock release.
2. Timestamp Ordering Protocol
• The timestamp ordering protocol is used to order transactions based on their timestamps. The order of the transactions is simply the ascending order of their creation times.
• An older transaction has higher priority, which is why it executes first. To determine the timestamp of a transaction, this protocol uses the system time or a logical counter.
• The lock-based protocol manages the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
Let's assume there are two transactions, T1 and T2. Suppose transaction T1 entered the system at time 007 and transaction T2 entered at time 009. T1 has the higher priority, so it executes first, as it entered the system first. The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on each data item.
1. Whenever a transaction Ti issues a Read(X) operation, check the following condition:
• If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back.
• Otherwise, the operation is executed, and R_TS(X) is set to the maximum of R_TS(X) and TS(Ti).
2. Whenever a transaction Ti issues a Write(X) operation, check the following condition:
• If TS(Ti) < R_TS(X) or TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back.
• Otherwise, the operation is executed, and W_TS(X) is set to TS(Ti).
Here TS(Ti) denotes the timestamp of transaction Ti, R_TS(X) denotes the timestamp of the last read of data item X, and W_TS(X) denotes the timestamp of the last write of X.
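These checks can be sketched as two small Python functions (function and variable names are invented; timestamps are plain integers, and untouched items default to timestamp 0):

```python
def read_item(ts_ti, x, r_ts, w_ts):
    """Basic timestamp-ordering check when Ti (timestamp ts_ti) issues Read(X).
    r_ts/w_ts map item -> last read/write timestamp (0 if never touched)."""
    if ts_ti < w_ts.get(x, 0):
        return "rollback"                    # Ti is too old: a younger txn wrote X
    r_ts[x] = max(r_ts.get(x, 0), ts_ti)     # record the read
    return "execute"

def write_item(ts_ti, x, r_ts, w_ts):
    """Basic timestamp-ordering check when Ti issues Write(X)."""
    if ts_ti < r_ts.get(x, 0) or ts_ti < w_ts.get(x, 0):
        return "rollback"                    # a younger txn already read/wrote X
    w_ts[x] = ts_ti
    return "execute"

r_ts, w_ts = {}, {}
print(write_item(9, "A", r_ts, w_ts))  # execute  -- W_TS(A) becomes 9
print(read_item(7, "A", r_ts, w_ts))   # rollback -- TS(Ti)=7 < W_TS(A)=9
print(read_item(9, "A", r_ts, w_ts))   # execute
```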
3. Validation-Based Protocol
The validation-based protocol is also known as the optimistic concurrency control technique. In this protocol, a transaction is executed in the following three phases:
1. Read phase: In this phase, transaction T is read and executed. It reads the values of the various data items and stores them in temporary local variables. It can perform all its write operations on the temporary variables, without updating the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the actual data to check whether serializability would be violated.
3. Write phase: If the validation of the transaction succeeds, the temporary results are written to the database; otherwise the transaction is rolled back.
Each transaction Ti has three associated timestamps:
• Start(Ti): the time when Ti starts its read phase.
• Validation(Ti): the time when Ti finishes its read phase and starts its validation phase.
• Finish(Ti): the time when Ti finishes its write phase.
This protocol determines the timestamp used for serialization from the time of the validation phase, as that is the phase which actually decides whether the transaction will commit or roll back. Serializability is therefore determined during the validation process and cannot be decided in advance. By deferring conflict checks in this way, the protocol provides a greater degree of concurrency and a smaller number of conflicts while transactions execute.
The Thomas write rule provides serializability guarantees for the protocol and improves the basic timestamp ordering algorithm. When a transaction T issues a Write(X):
• If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is rejected.
• If TS(T) < W_TS(X), then the Write(X) operation is not executed (the outdated write is simply ignored) and processing continues.
• If neither condition 1 nor condition 2 holds, the WRITE operation is executed by transaction T and W_TS(X) is set to TS(T).
If we use the Thomas write rule, some schedules can be permitted that are serializable but not conflict serializable, as illustrated by the schedule in the figure (not reproduced here): in that schedule, T1's read precedes T1's write of the same data item, and the schedule is not conflict serializable.
The Thomas write rule relies on the fact that T2's write is never seen by any transaction. If we delete that write operation from transaction T2, a conflict serializable schedule is obtained, as shown in the corresponding figure (not reproduced here).
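The three write cases can be sketched in Python alongside the basic checks above (names invented for illustration); note that the outdated write is silently skipped rather than causing a rollback:

```python
def thomas_write(ts_t, x, r_ts, w_ts):
    """Write(X) check under the Thomas write rule.
    r_ts/w_ts map item -> last read/write timestamp (0 if never touched)."""
    if ts_t < r_ts.get(x, 0):
        return "rollback"          # a younger txn already read X: must abort
    if ts_t < w_ts.get(x, 0):
        return "ignore"            # obsolete write: skip it, keep running
    w_ts[x] = ts_t                 # otherwise perform the write
    return "execute"

r_ts, w_ts = {}, {}
print(thomas_write(9, "A", r_ts, w_ts))  # execute -- W_TS(A) = 9
print(thomas_write(7, "A", r_ts, w_ts))  # ignore  -- outdated, but no rollback
```

Under basic timestamp ordering the second call would roll the transaction back; the Thomas write rule just discards the obsolete write, permitting more schedules.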
Multiple Granularity:
Multiple granularity can be defined as hierarchically breaking up the database into blocks that can be locked. The multiple granularity protocol enhances concurrency and reduces lock overhead. It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can be represented graphically as a tree.
Hence, the levels of the tree, starting from the top, are as follows:
1. Database
2. Area
3. File
4. Record
In this hierarchy, the highest level is the entire database; the levels below it are area, file, and record.
In addition to the S and X lock modes, multiple granularity introduces three intention lock modes:
1. Intention-shared (IS): indicates explicit locking at a lower level of the tree, but only with shared locks.
2. Intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.
3. Shared & intention-exclusive (SIX): the node is locked in shared mode, and some lower-level node is locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The table below describes the compatibility matrix for these lock modes (Yes = compatible):

        IS    IX    S     SIX   X
IS      Yes   Yes   Yes   Yes   No
IX      Yes   Yes   No    No    No
S       Yes   No    Yes   No    No
SIX     Yes   No    No    No    No
X       No    No    No    No    No
The protocol uses these intention lock modes to ensure serializability. It requires that when a transaction Ti attempts to lock a node, it must follow these rules:
1. Transaction Ti must observe the lock-compatibility matrix.
2. Transaction Ti must lock the root of the tree first, and it can be locked in any mode.
3. Transaction Ti can lock a node in S or IS mode only if it currently has the parent of that node locked in either IS or IX mode.
4. Transaction Ti can lock a node in X, SIX, or IX mode only if it currently has the parent of that node locked in either IX or SIX mode.
5. Transaction Ti can lock a node only if it has not previously unlocked any node (i.e., Ti is two-phase).
6. Transaction Ti can unlock a node only if none of the children of that node are currently locked by Ti.
Observe that in multiple granularity locking, locks are acquired in top-down (root-to-leaf) order, while they must be released in bottom-up (leaf-to-root) order.
• If transaction T1 reads record Ra9 in file Fa, then T1 needs to lock the database, area A1 and file Fa in IS mode. Finally, it needs to lock Ra9 in S mode.
• If transaction T2 modifies record Ra9 in file Fa, then it can do so after locking the database, area A1 and file Fa in IX mode. Finally, it needs to lock Ra9 in X mode.
• If transaction T3 reads all the records in file Fa, then T3 needs to lock the database and area A1 in IS mode. At last, it needs to lock Fa in S mode.
• If transaction T4 reads the entire database, then T4 needs to lock the database in S mode.
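The compatibility matrix can be encoded directly, and granting a lock on a node then reduces to checking the requested mode against every mode already held on that node. A Python sketch (illustrative only; it checks a single node, not the whole tree protocol):

```python
# Compatibility matrix for the five lock modes (True = compatible).
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held_modes):
    """A lock can be granted only if it is compatible with every held mode."""
    return all(COMPAT[requested][h] for h in held_modes)

print(can_grant("IS", ["IX"]))   # True  -- intention modes coexist
print(can_grant("S",  ["IX"]))   # False -- IX signals exclusive locks below
print(can_grant("X",  []))       # True  -- nothing held on the node
```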
• Whenever more than one transaction is executed, their logs are interleaved. During recovery, it would become difficult for the recovery system to backtrack through all the logs and then start recovering.
• To ease this situation, the 'checkpoint' concept is used by most DBMSs. Checkpoints have been discussed in the Transaction Processing Concept section of this tutorial, so you can go through those concepts again to make things clearer.