DBMS Unit 4 Notes
UNIT – IV
Transaction Management: Transaction Concept, Transaction State, Implementation of Atomicity and
Durability, Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for
serializability, Lock Based Protocols, Timestamp Based Protocols, Validation- Based Protocols, Multiple
Granularity, Recovery and Atomicity, Log–Based Recovery, Recovery with Concurrent Transactions.
1. TRANSACTION
Definition: A transaction is a single logical unit of work consisting of one or more database access
operations.
Example: Withdrawing 1000 rupees from an ATM.
The following set of operations is performed to withdraw 1000 rupees from account A in the database:
Read(A)
A = A - 1000
Write(A)
2. ACID PROPERTIES
ACID properties are used for maintaining the integrity of the database during transaction processing.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
i. Atomicity: This property ensures that either all of the tasks of a transaction are performed or
none of them is. In simple words, it is referred to as the "all or nothing rule".
A transaction is said to be atomic if, when one part of the transaction fails, the entire
transaction fails. When all parts of the transaction complete successfully, the transaction is
said to be successful. ("all or nothing rule")
ii. Consistency: The consistency property ensures that the database must be in a consistent state
before and after the transaction. There must not be any possibility that some data is incorrectly
affected by the execution of a transaction.
For example, when transferring funds from one account to another, the consistency property ensures
that the total value of funds in both accounts is the same before and after the transaction.
i.e., assume initially A balance = $400 and B balance = $700.
The total balance of A + B = $1100 (before transferring $100 from A to B)
The total balance of A + B = $1100 (after transferring $100 from A to B)
iii. Isolation: For every pair of transactions that use some common data item, one transaction
should not start execution before the other transaction completes. That is, if transaction T1 is
executing and using the data item X, then transaction T2 should not start until T1 ends, if T2
also uses the same data item X.
For example, Transaction T1: Transfer $100 from account A to account B.
Transaction T2: Transfer $150 from account B to account C.
Assume initially, A balance = B balance = C balance = $1000.
Time       Transaction T1                                      Transaction T2
10:00 AM   Read A's balance ($1000)                            Read B's balance ($1000)
10:01 AM   A balance = A balance - $100 (1000 - 100 = $900)    B balance = B balance - $150 (1000 - 150 = $850)
10:02 AM   Read B's balance ($1000)                            Read C's balance ($1000)
10:03 AM   B balance = B balance + $100 (1000 + 100 = $1100)   C balance = C balance + $150 (1000 + 150 = $1150)
10:04 AM   Write A's balance ($900)                            Write B's balance ($850)
10:05 AM   Write B's balance ($1100)                           Write C's balance ($1150)
10:06 AM   COMMIT                                              COMMIT
After completion of transactions T1 and T2: A balance = $900, B balance = $1100, and C balance
= $1150. But B's balance should be $950 (1000 - 150 + 100). B's balance is wrong because T1 and T2
executed in parallel, and account B is common to both transactions. The last write on account B
happens at 10:05 AM, so B's balance ends up as $1100 (the write on account B at 10:04 AM is
overwritten). Isolation prevents exactly this kind of interference.
iv. Durability: Once a transaction completes successfully, the changes it has made to the
database should be permanent, even if there is a system failure. The recovery-management
component of the database system ensures the durability of transactions. For example, assume
account A's balance = $1000. If A withdraws $100 today, then A's balance = $900. After two
days or a month, A's balance should still be $900, if no other transactions are performed on A.
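The "all or nothing" behaviour of atomicity, together with durability at commit, can be demonstrated with a short program. Below is a minimal Python sketch using the standard sqlite3 module; the accounts table, the account names and the balances are made up for illustration.

    import sqlite3

    # Toy bank database (in memory, for illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 1000), ("B", 1000)])
    conn.commit()

    def transfer(amount):
        """Move `amount` from A to B as one atomic unit: all or nothing."""
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
            (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = 'A'").fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")   # consistency check
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))
            conn.commit()       # durability: the change survives after this point
        except Exception:
            conn.rollback()     # atomicity: undo the partial debit
            raise

    transfer(100)        # succeeds: A = 900, B = 1100
    try:
        transfer(5000)   # fails: both accounts keep their previous balances
    except ValueError:
        pass

If the balance check raises an error, the rollback undoes the partial debit, so the database never exposes a half-finished transfer.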
3. STATES OF TRANSACTION
A transaction goes through many different states throughout its life cycle. These states are
called transaction states. They are:
Active State:
This is the first state in the life cycle of a transaction.
Once the transaction starts executing, it is said to be in the active state.
During this state it performs operations like READ and WRITE on some data items. All
the changes made by the transaction are stored in a buffer in main memory; they
are not yet updated in the database.
From the active state, a transaction can go into either a partially committed state or a failed
state.
Partially Committed State:
After the last operation of the transaction has been executed, it enters into a partially
committed state.
The changes made by the transaction are still held in the buffer in main memory and are not
yet written permanently to the database, so the transaction may still be aborted if a failure
occurs at this point.
From the partially committed state, a transaction can go into either a committed state or a
failed state.
Committed State:
After all the changes made by the transaction have been successfully updated in the
database, it enters into a committed state and the transaction is considered to be fully
committed.
After a transaction has entered the committed state, it is not possible to roll back (undo)
the transaction. This is because the system is updated into a new consistent state and the
changes are made permanent.
The only way to undo the changes is by carrying out another transaction called
as compensating transaction that performs the reverse operations.
Failed State:
If a transaction is executing in the active state or the partially committed state and
some failure occurs that makes it impossible to continue its execution, it
enters into a failed state.
Aborted State:
After the transaction has failed and entered into a failed state, all the changes made by it
have to be undone.
To undo the changes made by the transaction, it becomes necessary to roll back the
transaction.
After the transaction has rolled back completely, it enters into an aborted state.
Terminated State:
This is the last state in the life cycle of a transaction.
After entering the committed state or aborted state, the transaction finally enters into a
terminated state where its life cycle finally comes to an end.
4. SCHEDULE
A schedule is the chronological order in which the operations of multiple transactions are
executed. Schedules are of two types:
i. Serial Schedules:
All the transactions execute serially one after the other.
When one transaction executes, no other transaction is allowed to execute.
Examples:
Schedule-1 (T1 followed by T2):
T1              T2
Read(A)
A = A - 100
Write(A)
Read(B)
B = B + 100
Write(B)
COMMIT
                Read(A)
                A = A + 500
                Write(A)
                COMMIT

Schedule-2 (T2 followed by T1):
T1              T2
                Read(A)
                A = A + 500
                Write(A)
                COMMIT
Read(A)
A = A - 100
Write(A)
Read(B)
B = B + 100
Write(B)
COMMIT
ii. Non-Serial Schedules:
The operations of multiple transactions are interleaved: a transaction may start before a
previously started transaction has committed.
Examples:

Schedule-1:
T1              T2
Read(A)
A = A - 100
Write(A)
                Read(A)
                A = A + 500
Read(B)
B = B + 100
Write(B)
COMMIT
                Write(A)
                COMMIT

Schedule-2:
T1              T2
Read(A)
                Read(A)
A = A - 100
Write(A)
                A = A + 500
Read(B)
B = B + 100
Write(B)
COMMIT
                Write(A)
                COMMIT
In Schedule-1 and Schedule-2, the two transactions T1 and T2 are executing concurrently. The
operations of T1 and T2 are interleaved. So, these schedules are Non-Serial Schedules.
Conflict Serializability:
Two operations in a schedule are said to conflict if they satisfy all three of the following rules:
(1) The operations belong to different transactions.
(2) Both operations access the same data item.
(3) At least one of the operations is a Write operation.
In Schedule-1, only rules (1) & (2) are true, but rule (3) does not hold. So, the operations do not conflict.
In Schedule-2, rules (1), (2) & (3) are true. So, the operations conflict.
In Schedule-3, only rules (1) & (3) are true, but rule (2) does not hold. So, the operations do not conflict.
In Schedule-4, rules (1), (2) & (3) are true. So, the operations conflict.
To test for conflict serializability, list all the conflicting operations and determine the
dependency between the transactions, then draw a precedence graph with one node per transaction
and an edge from Ta to Tb for each conflict in which Ta's operation comes first.
(Thumb rule to find conflicting operations: for each Write(X) in Ta, make a pair with each Read(X)
and Write(X) in Tb. The order within each pair matters, i.e., Read after Write on X, or Write after
Read on X, as the operations appear in the given schedule.)
If there exists a cycle in the precedence graph, the schedule S is not conflict serializable.
If there exists no cycle in the precedence graph, the schedule S is conflict serializable.
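This cycle test is easy to mechanise. The following Python sketch is one possible implementation: it builds the precedence graph from a schedule written as (transaction, operation, data item) triples using the conflict rules above, then looks for a cycle with a depth-first search. The demo schedule is a hypothetical lost-update interleaving, not one taken from these notes.

    from collections import defaultdict

    def is_conflict_serializable(schedule):
        """schedule: list of (txn, op, item) triples, op in {'R', 'W'}.
        Returns True when the precedence graph has no cycle."""
        edges = defaultdict(set)
        txns = set()
        for i, (ti, op1, x) in enumerate(schedule):
            txns.add(ti)
            for tj, op2, y in schedule[i + 1:]:
                # Conflict: different transactions, same item, at least one Write.
                if ti != tj and x == y and 'W' in (op1, op2):
                    edges[ti].add(tj)        # Ti's operation comes first: Ti -> Tj
        WHITE, GREY, BLACK = 0, 1, 2         # DFS colours for cycle detection
        colour = {t: WHITE for t in txns}
        def has_cycle(t):
            colour[t] = GREY
            for u in edges[t]:
                if colour[u] == GREY or (colour[u] == WHITE and has_cycle(u)):
                    return True              # back edge found: cycle
            colour[t] = BLACK
            return False
        return not any(has_cycle(t) for t in txns if colour[t] == WHITE)

    # Hypothetical lost-update interleaving: R1(A) R2(A) W1(A) W2(A)
    s = [('T1', 'R', 'A'), ('T2', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'W', 'A')]
    print(is_conflict_serializable(s))       # False: edges T1->T2 and T2->T1 form a cycle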
View Serializability: Two schedules S1 and S2 are said to be view equivalent if both of
them satisfy the following three rules:
(1) Initial Read: The first read operation on each data item must be the same in both schedules.
For each data item X, if the first read on X is done by transaction Ta in schedule S1, then in
schedule S2 also, the first read on X must be done by transaction Ta.
(2) Updated Read: It should be the same in both schedules.
If transaction Ta reads a value of X that was written by transaction Tb in schedule S1, then in
schedule S2 also, Ta must read the value of X written by Tb.
(3) Final Write: The final write operation on each data item must be the same in both schedules.
For each data item X, if X has been updated last by transaction Ti in schedule S1, then in
schedule S2 also, X must be updated last by transaction Ti.
Note: Every conflict serializable schedule is also a view serializable schedule, but not vice-versa.
Problem 03: Check whether the given schedule S is view serializable or not
Schedule – 1
T1              T2
Read(A)
Write(A)
                Read(A)
                Write(A)
Read(B)
Write(B)
                Read(B)
                Write(B)
Solution:
For the given Schedule-1, a possible equivalent serial schedule is Schedule-2 (T1 followed by T2).

Schedule-1 (S1):
T1              T2
Read(A)
Write(A)
                Read(A)
                Write(A)
Read(B)
Write(B)
                Read(B)
                Write(B)

Schedule-2 (S2):
T1              T2
Read(A)
Write(A)
Read(B)
Write(B)
                Read(A)
                Write(A)
                Read(B)
                Write(B)
Now let us check whether the three rules of view equivalence are satisfied.
(1) Initial Read: In S1, the first read on A is done by T1; in S2, the first read on A is also done
by T1. Similarly, the first read on B is done by T1 in both schedules. Rule satisfied.
(2) Updated Read: In S1, T2 reads the value of A written by T1, and T2 reads the value of B written
by T1. The same holds in S2. Rule satisfied.
(3) Final Write: In S1, the final write on A is done by T2; in S2, the final write on A is also done
by T2. Similarly, the final write on B is done by T2 in both schedules. Rule satisfied.
All three rules are satisfied, so S1 is view equivalent to the serial schedule S2. Therefore, the
given schedule is view serializable.
Note: Another way of solving it: if we are able to prove that S1 is conflict serializable, then S1 is also view serializable. (Refer to the
conflict serializability problems. Every conflict serializable schedule is also view serializable, but not vice-versa.)
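The three view-equivalence rules can also be checked programmatically. Below is a small Python sketch using the same (transaction, operation, data item) encoding as the earlier sketch; reads_from records, for every read, which transaction wrote the value being read (None meaning the initial value).

    def view_equivalent(s1, s2):
        """s1, s2: schedules as lists of (txn, op, item), op in {'R', 'W'}."""
        def initial_read(s):
            first = {}
            for t, op, x in s:
                if op == 'R':
                    first.setdefault(x, t)       # rule 1: first reader per item
            return first
        def final_write(s):
            return {x: t for t, op, x in s if op == 'W'}   # rule 3: last writer
        def reads_from(s):
            writer, rf = {}, []
            for t, op, x in s:
                if op == 'W':
                    writer[x] = t
                else:
                    rf.append((t, x, writer.get(x)))       # rule 2: who wrote the value read
            return sorted(rf, key=str)
        return (initial_read(s1) == initial_read(s2)
                and final_write(s1) == final_write(s2)
                and reads_from(s1) == reads_from(s2))

    # Problem 03: S1 (interleaved) against the serial schedule S2 = T1, T2.
    S1 = [('T1','R','A'), ('T1','W','A'), ('T2','R','A'), ('T2','W','A'),
          ('T1','R','B'), ('T1','W','B'), ('T2','R','B'), ('T2','W','B')]
    S2 = [('T1','R','A'), ('T1','W','A'), ('T1','R','B'), ('T1','W','B'),
          ('T2','R','A'), ('T2','W','A'), ('T2','R','B'), ('T2','W','B')]
    print(view_equivalent(S1, S2))               # True: S1 is view serializable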
5. IMPLEMENTATION OF ATOMICITY AND DURABILITY
The recovery-management component of a database system implements atomicity and durability. A
simple scheme is the shadow copy scheme: a transaction that wants to update the database first
creates a complete copy of the database. All updates are done on the new copy, leaving the
original copy, the shadow copy, untouched. A pointer called db-pointer always points to the
current consistent copy of the database.
If the transaction completes successfully, then the database system updates the pointer db-pointer
to point to the new copy of the database; the new copy then becomes the current copy of the
database, and the old copy is deleted. If the transaction fails, the db-pointer is left unchanged
and the new copy is deleted, so the old shadow copy remains the state of the database.
6. RECOVERABILITY
During execution, if any transaction in a schedule is aborted, the database may be led into an
inconsistent state. If anything goes wrong, the completed operations in the schedule may need to
be undone. Sometimes, undoing these operations may not be possible. The recoverability of a
schedule depends on whether its operations can be undone. A schedule is recoverable if each
transaction commits only after all the transactions whose changes it has read have committed.
T1              T2
Read(A)
Write(A)
                Read(A)    //Dirty Read
                Write(A)
COMMIT
                COMMIT     //Delayed
Here,
T2 performs a dirty read operation.
The commit operation of T2 is delayed till T1 commits or rolls back.
T1 commits later; T2 is then allowed to commit.
In case T1 had failed, T2 would still have had a chance to recover by rolling back. Hence, the
schedule is recoverable.
7. IMPLEMENTATION OF ISOLATION
Isolation determines how the changes made by a transaction are visible to other users and systems.
It means that a transaction should take place in the system in such a way that it appears to be the
only transaction accessing the resources of the database system.
An isolation level defines the degree to which a transaction must be isolated from the data
modifications made by other transactions in the database system. The phenomena used to
define the levels of isolation are:
a) Dirty Read
b) Non-repeatable Read
c) Phantom Read
Dirty Read: If a transaction reads a data value updated by an uncommitted transaction, then
this type of read is called a dirty read.
T1              T2
Read(A)
Write(A)
                Read(A)    //Dirty Read
                Write(A)
                COMMIT
ROLLBACK
As T1 aborted, the results produced by T2 become wrong. This is because T2 read the value of A
(a dirty read) that was updated by the still-uncommitted transaction T1.
Non-Repeatable Read: A non-repeatable read occurs when a transaction reads the same data value
twice and gets a different value each time. It happens when a transaction reads once before and
once after committed UPDATEs from another transaction.
T1              T2              Value of A in the database
Read(A)                         A = 10
                Write(A)        A = 20
                COMMIT
Read(A)                         A = 20

First, T1 reads data item A and gets A = 10.
Next, T2 writes data item A as A = 20 and commits.
Last, T1 reads data item A again and gets A = 20, a different value within the same transaction.
Phantom Read: A phantom read occurs when a transaction runs the same query twice and gets a
different set of rows each time. It happens when a transaction reads once before and once after
committed INSERTs and/or DELETEs from another transaction.
Non-repeatable read                                  Phantom read
When T1 performs the second read, there is no        When T1 performs the second read, the number of
change in the number of rows in the given table.     rows in the given table has increased or decreased.
T2 performs an UPDATE operation on the               T2 performs an INSERT and/or DELETE operation
given table.                                         on the given table.
Based on these three phenomena, SQL defines four isolation levels. They are:
(1) Read Uncommitted: This is the lowest level of isolation. At this level, one transaction
may read data items modified by another transaction that has not yet committed. It means dirty
reads are allowed. At this level, transactions are not isolated from each other.
(2) Read Committed: This isolation level guarantees that any data read is committed at the
moment it is read. Thus, it does not allow dirty reads. The writing transaction holds a lock
on the data object, and thus prevents other transactions from reading, updating or deleting it
until the writer commits.
(3) Repeatable Read: This is a more restrictive isolation level. The transaction holds read
locks on all rows it references and write locks on all rows it inserts, updates, or deletes.
Since other transactions cannot read, update or delete these rows, it avoids the non-repeatable
read. (Phantom reads are still possible, because row locks do not cover rows newly inserted by
other transactions.)
(4) Serializable: This is the highest isolation level. A serializable execution is guaranteed to
be equivalent to some serial execution: concurrently executing transactions appear to be
executing one after the other.
The table given below clearly depicts the relationship between isolation levels and the read
phenomena and locks.
Isolation Level     Dirty Read       Non-repeatable Read   Phantom Read
Read Uncommitted    May occur        May occur             May occur
Read Committed      Does not occur   May occur             May occur
Repeatable Read     Does not occur   Does not occur        May occur
Serializable        Does not occur   Does not occur        Does not occur
From the above table, it is clear that the serializable isolation level gives the strongest
guarantees, at the cost of the lowest concurrency.
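In practice an application requests an isolation level per session or per transaction. Below is a minimal Python sketch assuming a PostgreSQL server accessed through the psycopg2 driver; the connection string and the accounts table are hypothetical.

    import psycopg2

    conn = psycopg2.connect("dbname=bank user=app")    # hypothetical connection string
    # Request one of the four SQL isolation levels for this session.
    conn.set_session(isolation_level="SERIALIZABLE")

    with conn:                        # commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute("SELECT balance FROM accounts WHERE name = %s", ("A",))
            print(cur.fetchone())
    # Under SERIALIZABLE, the server may abort a transaction with a
    # serialization failure; the application is expected to retry it.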
8. CONCURRENCY CONTROL
Concurrency is the ability of a database to execute multiple transactions simultaneously.
Concurrency control is a mechanism to manage simultaneously executing transactions
such that no transaction interferes with any other transaction.
Executing multiple transactions concurrently improves system performance.
Concurrency control increases the throughput and reduces the waiting time of transactions.
If concurrency control is not done, then it may lead to problems like lost updates, dirty
reads, non-repeatable reads, phantom reads, etc. (Refer to section 7 for more details.)
Lost Update: It occurs when two transactions update the same data item at the same time.
The first write is lost and only the second write is visible.
Concurrency control Protocols:
The concurrency can be controlled with the help of the following Protocols
(1) Lock-Based Protocol
(2) Timestamp-Based Protocol
(3) Validation-Based Protocol
9. LOCK-BASED PROTOCOL
A lock assures that one transaction does not retrieve or update a record which another
transaction is updating.
For example, at a traffic junction, there are signals which indicate stop and go. When one side's
signal is green (vehicles are allowed to pass), the other sides' signals are red (locked: vehicles
are not allowed to pass). Similarly, in database transactions, while one transaction's operations
are under execution, the other transactions are locked out.
If, at a junction, a green signal is given to more than one side, there may be chances of
accidents. Similarly, in database transactions, if the locking is not done properly, inconsistent
and corrupt data will be exposed.
There are two lock modes: (1). Shared Lock (2). Exclusive Lock
Shared locks are represented by S. If a transaction Ti applies a shared lock on data item A, then
Ti can only read A but not write to A. A shared lock is requested using the lock-S instruction.
Exclusive locks are represented by X. If a transaction Ti applies an exclusive lock on data item A,
then Ti can both read and write data item A. An exclusive lock is requested using the lock-X
instruction.
Whenever a transaction wants to read a data item, it should apply a shared lock, and when a
transaction wants to write, it should apply an exclusive lock. If the lock is not applied,
the transaction is not allowed to perform the operation.
Two Phase Locking (2PL): Once a transaction releases any of its acquired locks, it cannot acquire
any more new locks; it can only release the remaining acquired locks one after the other during
the rest of its execution.
(Figure: the number of locks held by a transaction over time, between the begin and end of the
transaction: the growing phase is followed by the shrinking phase.)
The Two Phase Locking (2PL) has two phases. They are:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released. (Only get new locks but no release of locks).
Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released,
but no new locks can be acquired. (Only release locks but no more getting new locks).
Example:
Time    T1              T2
0       LOCK-S(A)
1                       LOCK-S(A)
2       Read(A)
3                       Read(A)
4       LOCK-X(B)
5       --
6       Read(B)
7       B = B + 100
8       Write(B)
9       UNLOCK(A)
10                      LOCK-X(C)
11      UNLOCK(B)       --
12                      Read(C)
13                      C = C + 500
14                      Write(C)
15      COMMIT
16                      UNLOCK(A)
17                      UNLOCK(C)
18                      COMMIT
The following shows how locking and unlocking work with 2PL in the above schedule.
Transaction T1:
Growing phase: steps 0 to 4 (locks acquired at steps 0 and 4).
Lock point: step 4 (no new locks after this).
Shrinking phase: steps 9 to 11 (locks released at steps 9 and 11).
Transaction T2:
Growing phase: steps 1 to 10 (locks acquired at steps 1 and 10).
Lock point: step 10 (no new locks after this).
Shrinking phase: steps 16 to 17 (locks released at steps 16 and 17).
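The two-phase discipline itself is mechanical and can be enforced per transaction. The following Python sketch tracks one transaction's lock and unlock calls and rejects any lock requested after the first unlock; it deliberately ignores lock compatibility between transactions, which a full lock manager would also have to enforce.

    class TwoPhaseViolation(Exception):
        pass

    class Transaction2PL:
        """Enforces 2PL for a single transaction: once any lock is
        released (shrinking phase), no new lock may be acquired."""
        def __init__(self, name):
            self.name = name
            self.locks = {}            # data item -> 'S' or 'X'
            self.shrinking = False     # becomes True at the first unlock

        def lock(self, item, mode):
            if self.shrinking:
                raise TwoPhaseViolation(
                    f"{self.name}: LOCK-{mode}({item}) requested in shrinking phase")
            self.locks[item] = mode    # upgrading S -> X is fine while growing

        def unlock(self, item):
            self.shrinking = True      # the lock point has passed
            del self.locks[item]

    t1 = Transaction2PL("T1")
    t1.lock("A", "S")                  # growing phase
    t1.lock("B", "X")                  # lock point reached here
    t1.unlock("A")                     # shrinking phase begins
    t1.lock("C", "X")                  # raises TwoPhaseViolation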
10. TIMESTAMP-BASED PROTOCOL
Each transaction Ti is assigned a unique timestamp TS(Ti) when it enters the system; an older
transaction has a smaller timestamp than a younger one. Each data item X is associated with two
timestamps:
W_timestamp(X): the largest timestamp of any transaction that successfully executed Write(X).
R_timestamp(X): the largest timestamp of any transaction that successfully executed Read(X).
The protocol executes conflicting operations in timestamp order; an operation that arrives too
late is rejected and its transaction is rolled back.
There are mainly two Timestamp Ordering Algorithms in DBMS. They are:
Basic Timestamp Ordering
Thomas Write rule
Basic Timestamp Ordering:
Check the following conditions whenever a transaction Ti issues a Read(X) operation:
o If W_timestamp(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
o If W_timestamp(X) <= TS(Ti), then the operation is executed and R_timestamp(X) is set to
max(R_timestamp(X), TS(Ti)).
(A read by Ti is not allowed if any transaction younger than Ti has already written X.)
Check the following conditions whenever a transaction Ti issues a Write(X) operation:
o If R_timestamp(X) > TS(Ti) or W_timestamp(X) > TS(Ti), then the operation is rejected and Ti
is rolled back; otherwise the operation is executed and W_timestamp(X) is set to TS(Ti).
(A write by Ti is not allowed if any transaction younger than Ti has already read or written X;
Ti is rolled back and restarted later with a new timestamp.)
Thomas Write Rule: Whenever a transaction Ta issues a Write(X) operation, check the following
conditions:
(i). If R_TS(X) > TS(Ta), then abort and roll back Ta and reject the operation.
Example: T1 arrives at 9:00 AM, so TS(T1) = 9:00 AM; T2 arrives at 9:02 AM, so TS(T2) = 9:02 AM.
Initially A = 100.
T2 executes Read(A) and reads A = 100, so R_TS(A) = 9:02 AM.
T1 then issues Write(A) to set A = 200. Since R_TS(A) = 9:02 AM > TS(T1) = 9:00 AM, the operation
is rejected and T1 is rolled back.
(ii). If W_TS(X) > TS(Ta), then do not execute the Write operation of Ta, but continue processing
Ta. This is the case of outdated or obsolete writes.
Example: T1 arrives at 9:00 AM, so TS(T1) = 9:00 AM; T2 arrives at 9:02 AM, so TS(T2) = 9:02 AM.
Initially A = 100.
T2 executes Write(A) and sets A = 400, so W_TS(A) = 9:02 AM.
T1 then issues Write(A) to set A = 500. Since W_TS(A) = 9:02 AM > TS(T1) = 9:00 AM, this is an
outdated write: it is rejected (skipped), but T1 continues processing.
(iii). If neither the condition in (i) nor the condition in (ii) is satisfied, then execute the
Write(X) operation of Ta and set W_TS(X) to TS(Ta).
Under the Thomas Write Rule, outdated writes are rejected while the transaction continues, whereas
the Basic TO protocol rejects the write operation and rolls back the transaction.
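Both write checks can be seen side by side in a small simulator. This Python sketch applies the Basic TO tests to operations tagged with their transactions' timestamps and, when the thomas_write_rule flag is set, skips outdated writes instead of rolling the transaction back; the numeric timestamps and the schedule are made up for illustration.

    def timestamp_ordering(schedule, thomas_write_rule=False):
        """schedule: list of (ts, txn, op, item); ts is the issuing
        transaction's timestamp, op is 'R' or 'W'."""
        r_ts, w_ts = {}, {}
        aborted = set()
        for ts, txn, op, x in schedule:
            if txn in aborted:
                continue                          # rolled-back txn restarts later
            if op == 'R':
                if w_ts.get(x, 0) > ts:           # a younger txn already wrote x
                    aborted.add(txn)
                    print(f"{txn}: Read({x}) rejected, {txn} rolled back")
                else:
                    r_ts[x] = max(r_ts.get(x, 0), ts)
            else:
                if r_ts.get(x, 0) > ts:           # a younger txn already read x
                    aborted.add(txn)
                    print(f"{txn}: Write({x}) rejected, {txn} rolled back")
                elif w_ts.get(x, 0) > ts:         # a younger txn already wrote x
                    if thomas_write_rule:
                        print(f"{txn}: Write({x}) outdated, skipped; {txn} continues")
                    else:
                        aborted.add(txn)
                        print(f"{txn}: Write({x}) rejected, {txn} rolled back")
                else:
                    w_ts[x] = ts

    # Example (ii) above: T2 (younger) writes A, then the older T1 writes A.
    timestamp_ordering([(2, 'T2', 'W', 'A'), (1, 'T1', 'W', 'A')],
                       thomas_write_rule=True)
    # Output: T1: Write(A) outdated, skipped; T1 continues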
11. VALIDATION-BASED PROTOCOL
In this protocol (also called optimistic concurrency control), each transaction Ta executes in
three phases: a read phase, in which it reads values from the database and performs all its
writes on temporary local variables; a validation phase, in which a test is applied to determine
whether the transaction can commit without violating serializability; and a write phase, in which
the temporary local values are written to the database.
To perform the validation test, we need to know when the various phases of transaction Ta took
place. We therefore associate three different timestamps with transaction Ta:
(i). Start(Ta): the time when Ta started its execution.
(ii). Validation(Ta): the time when Ta finished its read phase and started its validation phase.
(iii). Finish(Ta): the time when Ta finished its write phase.
The serializability order is determined by taking the timestamp of a transaction T as
TS(T) = Validation(T). Hence the serializability order is determined during the validation phase
and cannot be decided in advance; this allows a greater degree of concurrency while executing the
transactions.
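The validation test itself can be stated compactly: transaction Ta passes validation if, for every already-validated transaction Ti, either Ti finished before Ta started, or Ti finished before Ta entered validation and Ti's write set does not intersect Ta's read set. Below is a Python sketch of this test; the transaction records are hypothetical dictionaries.

    def validate(ta, validated):
        """ta: dict with 'start', 'validation', 'read_set', 'write_set'.
        validated: earlier transactions, each also carrying 'finish'.
        Returns True if ta may enter its write phase."""
        for ti in validated:
            if ti['finish'] < ta['start']:
                continue          # purely serial: Ti ended before Ta began
            if (ti['finish'] < ta['validation']
                    and not (ti['write_set'] & ta['read_set'])):
                continue          # Ta never read anything Ti wrote
            return False          # possible conflict: roll back Ta
        return True

    t1 = {'start': 1, 'validation': 4, 'finish': 5,
          'read_set': {'A'}, 'write_set': {'A'}}
    t2 = {'start': 2, 'validation': 6,
          'read_set': {'B'}, 'write_set': {'B'}}
    print(validate(t2, [t1]))     # True: T1's writes touch nothing T2 read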
12. MULTIPLE GRANULARITY
Granularity is the size of the data item that is locked. A database contains multiple tables,
each table contains multiple records, and each record contains multiple field values. For example,
consider table D and record R2: these two are not mutually exclusive, since R2 is a part of D. So
granularity means different levels of data, where the smaller levels are nested inside the higher
levels: inside the database we have tables, inside a table we have records, and inside a record we
have field values. This hierarchy can be represented with a tree, as shown below.
(Tree: the root node DB is the entire database; its children A, B, C, D are the tables; each
table's children are its records; each record's children are its field values.)
A lock can be applied at a node if and only if there does not exist any lock on the descendants
(children and grandchildren) of that node; otherwise the lock cannot be applied. If a lock is
applied on table A, it implies that the lock also applies to the whole sub-tree rooted at node A.
If a lock is applied on the database (at the root node), it implies that the lock applies to all
the nodes in the tree.
The larger the object size on which a lock is applied, the lower the degree of concurrency
permitted. On the other hand, the smaller the object size, the larger the number of locks the
system has to maintain; more locks cause higher overhead and need more disk space. So, what is
the best object size on which a lock can be applied? It depends on the types of transactions
involved. If a typical transaction accesses data values from a single record, it is advantageous
to lock just that one record. On the other hand, if a transaction typically accesses many records
in the same table, it may be better to lock that table.
Locking at higher levels requires knowledge of the locks held at lower levels. This information is
provided by additional types of locks called intention locks. The idea behind intention locks is
for a transaction to indicate, along the path from the root to the desired node, what type of lock
(shared or exclusive) it will require on one of the node's descendants. There are three types of
intention locks:
(1) Intention-shared (IS): It indicates that one or more shared locks will be requested on
some descendant node(s).
(2) Intention-exclusive (IX): It indicates that one or more exclusive locks will be requested
on some descendant node(s).
(3) Shared-intention-exclusive (SIX): It indicates that the current node is locked in shared
mode but that one or more exclusive locks will be requested on some descendant node(s).
The compatibility table of the three intention locks and the shared and exclusive locks is shown
below.
Mode IS IX S SIX X
IS Yes Yes Yes Yes No
IX Yes Yes No No No
S Yes No Yes No No
SIX Yes No No No No
X No No No No No
The multiple granularity locking protocol uses the intention lock modes to ensure serializability.
It requires that a transaction Ti attempting to lock a node must follow these rules (a sketch of
the resulting locking behaviour is given below):
(1) Ti must observe the lock compatibility table above.
(2) Ti must lock the root of the tree first, and it can lock it in any mode.
(3) Ti can lock a node in S or IS mode only if it currently has the parent of that node locked in
IS or IX mode.
(4) Ti can lock a node in X, SIX, or IX mode only if it currently has the parent of that node
locked in IX or SIX mode.
(5) Ti can lock a node only if it has not previously unlocked any node (i.e., Ti is two-phase).
(6) Ti can unlock a node only if it currently has none of the children of that node locked.
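Here is a minimal Python sketch of rule-following lock acquisition: to lock a node, the transaction first takes the matching intention lock on every ancestor along the root-to-target path, checking the compatibility table at each step. The node names are illustrative, and the sketch does not release partially acquired intention locks on failure, which a real lock manager would.

    # Lock compatibility matrix from the table above.
    COMPAT = {
        'IS':  {'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
        'IX':  {'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
        'S':   {'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
        'SIX': {'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
        'X':   {'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
    }

    def request(held, path, mode):
        """held: node -> list of modes held by other transactions.
        path: root-to-target node list, e.g. ['DB', 'TableD', 'R2'].
        Each ancestor is locked in the matching intention mode first."""
        intention = 'IS' if mode in ('S', 'IS') else 'IX'
        for node in path[:-1]:
            if not all(COMPAT[intention][m] for m in held.get(node, [])):
                return False                   # conflict on an ancestor
            held.setdefault(node, []).append(intention)
        target = path[-1]
        if not all(COMPAT[mode][m] for m in held.get(target, [])):
            return False                       # conflict on the target itself
        held.setdefault(target, []).append(mode)
        return True

    locks = {}
    print(request(locks, ['DB', 'TableD', 'R2'], 'X'))   # True: IX, IX, then X
    print(request(locks, ['DB', 'TableD'], 'S'))         # False: S conflicts with IX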
13. RECOVERY AND ATOMICITY
When a system crashes, it may have many transactions being executed and many files may be
open for them. When a DBMS recovers from a crash, it must do the following:
It must check the states of all the transactions that were being executed.
Some transactions may be in the middle of an operation; the DBMS must ensure the
atomicity of such transactions in this case.
It must check, for each transaction, whether its execution should be accepted (redone) or
rolled back.
No transaction should be allowed to leave the database in an inconsistent state.
The following techniques facilitate a DBMS in recovering as well as maintaining the atomicity
of a transaction:
Log based recovery
Check point
Shadow paging
14. LOG-BASED RECOVERY
The log is a sequence of records, kept on stable storage, that records all the update activities
of the transactions. A transaction's updates can be applied to the database in two ways:
i. Deferred database modification: In this technique, the database is not modified until the
transaction reaches its commit point; all updates are first recorded in the log file. If a
transaction fails before it commits, UNDO is not needed. It may be necessary to REDO the effects
of the operations that are recorded in the log, because their effects may not yet have been
written to the database.
ii. Immediate database modification: In this technique, the database may be modified
immediately after every operation. However, these operations are recorded in the log file
before they are applied to the database, making recovery still possible. If a transaction
fails to reach its commit point, the effects of its operations must be undone, i.e., the
transaction must be rolled back; hence we require both UNDO and REDO.
15. RECOVERY WITH CONCURRENT TRANSACTIONS
(Figure: a timeline showing a checkpoint followed by a system failure, with transactions T1, T2
and T3 committing before the failure and T4 still active when the failure occurs.)
The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system finds in the log both <Ti, Start> and <Ti, Commit>, or just <Ti, Commit>,
it puts the transaction Ti in the redo-list.
For example: in the log file, transaction T1 has only <T1, Commit> (it started before the last
checkpoint), while transactions T2 and T3 have both <Ti, Start> and <Ti, Commit>. Therefore
transactions T1, T2 and T3 are added to the redo-list.
If the recovery system finds a log record <Ti, Start> but no commit or abort record for Ti, it
puts the transaction Ti in the undo-list.
For example: transaction T4 has <T4, Start> but no commit. So T4 is put into the undo-list,
since this transaction is not yet complete and failed in the middle.
All the transactions in the undo-list are then undone by scanning their log records backward and
restoring the old values.
All the transactions in the redo-list are redone by scanning their log records forward and
re-applying the new values. (A sketch of this procedure is given below.)
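The list construction and the two passes can be sketched in a few lines of Python. The log encoding used here is invented for illustration: ('START', T), ('COMMIT', T) and ('UPDATE', T, item, old_value, new_value) records, scanned forward from the last checkpoint.

    def recover(log):
        """Build the undo- and redo-lists from the log (forward scan),
        then undo backward and redo forward over a toy database dict."""
        undo_list, redo_list = [], []
        for rec in log:
            if rec[0] == 'START':
                undo_list.append(rec[1])         # provisionally incomplete
            elif rec[0] in ('COMMIT', 'ABORT') and rec[1] in undo_list:
                undo_list.remove(rec[1])
                if rec[0] == 'COMMIT':
                    redo_list.append(rec[1])
            elif rec[0] == 'COMMIT':
                redo_list.append(rec[1])         # started before the checkpoint
        db = {}
        for rec in reversed(log):                # undo pass: restore old values
            if rec[0] == 'UPDATE' and rec[1] in undo_list:
                db[rec[2]] = rec[3]
        for rec in log:                          # redo pass: re-apply new values
            if rec[0] == 'UPDATE' and rec[1] in redo_list:
                db[rec[2]] = rec[4]
        return undo_list, redo_list, db

    log = [('COMMIT', 'T1'),                     # T1 started before the checkpoint
           ('START', 'T2'), ('UPDATE', 'T2', 'A', 100, 200), ('COMMIT', 'T2'),
           ('START', 'T4'), ('UPDATE', 'T4', 'B', 50, 75)]   # T4 never committed
    print(recover(log))     # (['T4'], ['T1', 'T2'], {'B': 50, 'A': 200})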
16. ARIES ALGORITHM (Algorithm for Recovery and Isolation Exploiting Semantics)
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is one of the log-based
recovery methods. It uses the Write Ahead Log (WAL) protocol: every change is written to the log
on stable storage before the corresponding database page is written. Recovery proceeds in three
phases:
Analysis phase
Redo phase
Undo phase
(1) Analysis phase: The recovery subsystem scans the log file forward from the last checkpoint
up to the end. The purpose of the scan is to obtain information about the following:
The starting point from where the redo pass should start.
The list of transactions to be rolled back in the undo pass.
The list of dirty pages.
(2) Redo: In this phase, the log file is read forward, starting from the smallest LSN (Log
Sequence Number) of a dirty page, up to the end, and each update found in the log is redone.
The purpose of this redo pass is to repeat history, reconstructing the database to the state it
was in at the time of the system failure.
(3) Undo: The log is scanned backward and the updates of loser transactions (those still active
at the time of the failure) are undone in reverse chronological order. Operations of aborted
transactions that have already been undone are skipped; they need not be undone again.
17. DATABASE BACKUP
Remote backup: A copy of the database is created and stored at a remote site with the help of a
network. This remote copy is periodically updated from the current database so that it stays in
sync with it. The remote database can be updated manually, which is called offline backup. It can
also be backed up online, where the data is updated at the current and remote databases
simultaneously. In this case, as soon as the current database fails, the system automatically
switches over to the remote database and starts functioning; the user will not even know that
there was a failure.
Full backup or Normal backup: A full backup is also known as a normal backup. In this,
an exact duplicate copy of the original database is created and stored every time the
backup is made. The advantage of this type of backup is that restoring lost data is very
fast compared to the other methods. The disadvantage is that it takes more time to back up.
Incremental backup: Instead of backing up the entire database every time, back up only the
files that have been updated since the last full backup. For this, a normal backup has to be
taken at least once a week. While incremental database backups do run faster, the recovery
process is a bit more complicated.
Differential backup: A differential backup is similar to an incremental backup, but the
difference is that the recovery process is simplified by not clearing the archive bit. So a file
that is updated after a normal backup will be archived every time a differential backup is run,
until the next normal backup runs and clears the archive bit.