DBMS Unit-4
(AUTONOMOUS)
Accredited by NAAC & NBA (Under Tier - I) ISO 9001:2015 Certified Institution
Approved by AICTE, New Delhi. and Affiliated to JNTUK, Kakinada
L.B. REDDY NAGAR, MYLAVARAM, KRISHNA DIST., A.P.-521 230.
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
UNIT IV
By
Ms. P. Sarala
Sr. Assistant Professor
Dept. of CSE, LBRCE (A)
A transaction is a unit of program execution that accesses
and possibly updates various data items.
A transaction is delimited by statements (or function calls) of
the form begin transaction and end transaction. The
transaction consists of all operations executed between the
begin transaction and end transaction.
Let’s take an example of a simple transaction. Suppose a bank employee transfers
Rs 500 from A's account to B's account. This very simple and small transaction
involves several low-level tasks.
A’s Account
Open_Account (A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account (A)
B’s Account
Open_Account (B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account (B)
ACID Properties
A transaction in a database system must maintain Atomicity, Consistency,
Isolation, and Durability − commonly known as ACID properties − in order to
ensure accuracy, completeness, and data integrity.
Atomicity (A)
This property ensures that either all the operations of a transaction are reflected in the
database or none are.
Example
Suppose Account A has a balance of Rs. 400 and Account B has Rs. 700, and Account A
transfers Rs. 100 to Account B. This transaction has two operations: debiting Rs. 100 from A
and crediting Rs. 100 to B.
Suppose the first operation succeeds while the second fails. In that case A's balance would be
Rs. 300 while B's would remain Rs. 700 instead of Rs. 800.
This is unacceptable in a banking system. Either the transaction should fail without executing
any of the operations, or it should perform both of them. The atomicity property ensures
exactly that.
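A minimal sketch of this idea in Python, using the standard sqlite3 module. The accounts table and its columns are hypothetical, chosen only to mirror the example above; either both updates are committed together or both are rolled back.

import sqlite3

def transfer(conn, from_acc, to_acc, amount):
    cur = conn.cursor()
    try:
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE acc_no = ?",
                    (amount, from_acc))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE acc_no = ?",
                    (amount, to_acc))
        conn.commit()      # both updates become visible together
    except Exception:
        conn.rollback()    # neither update is applied if anything fails
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 400), ('B', 700)")
transfer(conn, 'A', 'B', 100)
print(list(conn.execute("SELECT * FROM accounts")))   # [('A', 300), ('B', 800)]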
Consistency (C)
Integrity constraints must be maintained, so that the database is consistent before and after
the transaction. To preserve the consistency of the database, the execution of a transaction
should take place in isolation (that is, no other transaction should run concurrently while a
transaction is already running).
Example
Account A has a balance of 400 and is transferring 100 each to accounts B and C, so we have
two transactions. Suppose these transactions run concurrently and both of them read a
balance of 400; in that case the final balance of A would be 300 instead of 200.
This is wrong. If the transactions ran in isolation, the second transaction would have read the
correct balance of 300 (before debiting 100) once the first transaction completed successfully.
Isolation (I)
Even though multiple transactions may execute concurrently, the system
guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that
either Tj finished execution before Ti started, or Tj started execution after Ti
finished. Thus, each transaction is unaware of other transactions executing
concurrently in the system.
Durability (D)
Once a transaction completes successfully, the changes it has made to the database must be
permanent even if there is a system failure. The recovery-management component of the
database system ensures the durability of transactions.
Transaction States
Any transaction at any point of time must be in one of the following states:
Active: The initial state; the transaction stays in this state while it is executing.
Partially committed: After the final statement has been executed.
Failed: After the discovery that normal execution can no longer proceed.
Aborted: After the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction
Committed: After successful completion.
Transaction T1 transfers $50 from account A to account B. It is defined as
T1: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It
is defined as
T2: read(A);
temp := A * 0.1;
A := A − temp;
write(A);
read(B);
B := B + temp;
write(B).
Suppose the current values of accounts A and B are $1000 and $2000, respectively.
Schedule 2: a concurrent schedule of T1 and T2 (figure not reproduced here).
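For instance, executing T1 and then T2 serially on these values: T1 moves 50, leaving A = 950 and B = 2050; T2 then moves 10 percent of 950, that is 95, leaving A = 855 and B = 2145. Executing T2 and then T1 instead gives A = 850 and B = 2150. Both serial schedules preserve the sum A + B = 3000, which is the consistency requirement any acceptable concurrent schedule must also satisfy.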
Serial Schedules
A serial schedule is one in which the transactions are ordered one after the other: the first
transaction runs to completion, and only then does the next transaction execute. Because the
transactions run in a serial manner, this type of schedule is called a serial schedule.
Equivalence Schedules
Equivalence between two schedules can be of the following types −
Result Equivalence
View Equivalence
Conflict Equivalence
Result Equivalence:
If two schedules produce the same result after execution, they are said to be
result equivalent. They may yield the same result for some values and different results for
another set of values. That is why this equivalence is generally not considered significant.
View Equivalence
Two schedules are view equivalent if the transactions in both the schedules perform similar
actions in a similar manner.
For example −
If Ti reads the initial data in S1, then it also reads the initial data in S2.
If Ti reads the value written by Tj in S1, then it also reads the value written by Tj in S2.
If Ti performs the final write on the data value in S1, then it also performs the
final write on the data value in S2.
Conflict Equivalence
Two operations are said to be conflicting if they have the following properties –
Both belong to different transactions.
Both access the same data item.
At least one of them is a write operation.
Example:
The pair (R1(A), W2(A)) is conflicting because the two operations belong to two different
transactions, act on the same data item A, and one of them is a write operation.
Similarly,
(W1(A), W2(A)) and (W1(A), R2(A)) are also conflicting pairs.
On the other hand, (R1(A), W2(B)) is a non-conflicting pair because the two operations act on
different data items.
Similarly, (W1(A), W2(B)) is a non-conflicting pair.
Conflict Serializable Schedules
A schedule S is conflict serializable if it is conflict equivalent to some serial schedule of the
same transactions.
Compare the given schedule S with the serial schedule <T1 T2>: S and S12 do not follow the
same order for the conflicting operation pairs. Now compare S with the serial schedule
<T2 T1>: S and S21 also do not follow the same order for the conflicting operation pairs.
So the given schedule S is not a conflict serializable schedule.
S: R1(A) W1(A) R2(A) W2(A) R1(B) W1(B) R2(B) W2(B)
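The conflict-serializability test can be mechanized by building a precedence graph: one node per transaction and an edge Ti -> Tj for every conflicting pair in which Ti's operation comes first; the schedule is conflict serializable exactly when this graph has no cycle. A minimal Python sketch follows; the (transaction, operation, item) tuple representation, and the example interleaving of T1 and T2 used at the end, are assumptions made purely for illustration.

def conflicts(op1, op2):
    # Two operations conflict if they come from different transactions,
    # touch the same data item, and at least one of them is a write.
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and ('W' in (a1, a2))

def is_conflict_serializable(schedule):
    # Build the precedence graph: edge Ti -> Tj for every conflicting
    # pair in which Ti's operation appears first in the schedule.
    edges = set()
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            if conflicts(schedule[i], schedule[j]):
                edges.add((schedule[i][0], schedule[j][0]))
    # The schedule is conflict serializable iff the graph has no cycle.
    nodes = {t for t, _, _ in schedule}
    visiting, done = set(), set()
    def has_cycle(n):
        visiting.add(n)
        for (u, v) in edges:
            if u == n:
                if v in visiting or (v not in done and has_cycle(v)):
                    return True
        visiting.discard(n)
        done.add(n)
        return False
    return not any(has_cycle(n) for n in nodes if n not in done)

# Example interleaving of T1 and T2 (for illustration only): the precedence
# graph contains the cycle T1 -> T2 -> T1, so it is not conflict serializable.
s = [(1,'R','A'),(1,'W','A'),(2,'R','A'),(2,'W','A'),
     (2,'R','B'),(2,'W','B'),(1,'R','B'),(1,'W','B')]
print(is_conflict_serializable(s))   # False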
View Serializable Schedules
A schedule S is said to be a view serializable schedule if it is view equivalent to any one of
the serial schedules of the same transactions.
Step 1: Generate all possible serial schedules for the given schedule. If a schedule contains n
transactions, then the number of possible serial schedules is n!.
Step 2: Compare each serial schedule with the given schedule for view equivalence. If any
serial schedule is view equivalent to the given schedule, then the given schedule is view
serializable; otherwise it is not a view serializable schedule.
Example:
Check whether the schedule is view Serializable or not?
S: R2 (B); R2 (A); R1 (A); R3 (A); W1 (B); W2 (B); W3 (B);
Solution: With 3 transactions, the total number of possible serial schedules is 3! = 6:
<T1 T2 T3>
<T1 T3 T2>
<T2 T3 T1>
<T2 T1 T3>
<T3 T1 T2>
<T3 T2 T1>
Step 1: Initial reads
A: T2 (R2(A) appears before R1(A) and R3(A))
B: T2
T2 reads the initial value of B, so in any view-equivalent serial schedule T2 must come before
T1 and T3, which both write B. Remaining schedules:
<T2 T1 T3>
<T2 T3 T1>
Step 2: Final writes
The final write on B in S is W3(B), so T3 must also perform the final write on B in the serial
schedule; T3 must therefore come last. Remaining schedule: <T2 T1 T3>.
Step 3: Updated reads
No transaction in S reads a value written by another transaction (A is never written, and B is
read only before any write on it), so there are no further constraints.
S is view equivalent to the serial schedule <T2 T1 T3>, so S is a view serializable schedule.
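The two steps above can also be carried out by brute force: enumerate the n! serial orders and compare initial reads, reads-from relationships and final writes. A minimal Python sketch, with the (transaction, operation, item) tuple representation assumed only for illustration:

from itertools import permutations
from collections import Counter

def reads_from(schedule):
    # For every read, record which transaction wrote the value it sees
    # (None means the initial database value).
    last_writer, result = {}, []
    for txn, op, item in schedule:
        if op == 'R':
            result.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return Counter(result)

def final_writes(schedule):
    fw = {}
    for txn, op, item in schedule:
        if op == 'W':
            fw[item] = txn
    return fw

def is_view_serializable(schedule):
    txns = sorted({t for t, _, _ in schedule})
    ops = {t: [o for o in schedule if o[0] == t] for t in txns}
    for order in permutations(txns):
        serial = [o for t in order for o in ops[t]]
        if reads_from(serial) == reads_from(schedule) and \
           final_writes(serial) == final_writes(schedule):
            return True, order
    return False, None

# The example above: S is view equivalent to the serial schedule <T2 T1 T3>.
S = [(2,'R','B'), (2,'R','A'), (1,'R','A'), (3,'R','A'),
     (1,'W','B'), (2,'W','B'), (3,'W','B')]
print(is_view_serializable(S))   # (True, (2, 1, 3))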
Failure Classification
There are various types of failure that may occur in a system, each of which
needs to be dealt with in a different manner. The failures are classified as follows
Transaction failure.
System crash.
Disk failure.
Transaction failure
There are two types of errors that may cause a transaction to fail:
Logical error
The transaction can no longer continue with its normal execution because of
some internal condition, such as bad input, data not found, overflow, or resource
limit exceeded.
System error
The system has entered an undesirable state (for example, deadlock), as a result
of which a transaction cannot continue with its normal execution. The
transaction, however, can be executed later.
System crash
There is a hardware malfunction, or a bug in the database software or the
operating system, that causes the loss of the content of volatile storage and brings
transaction processing to a halt. The content of non-volatile storage remains
intact and is not corrupted.
Disk failure
A disk block loses its content as a result of either a head crash or failure during
a data transfer operation.
To determine how the system should recover from failures, we need to identify
the failure modes of those devices used for storing data. Next, we must consider
how these failure modes affect the contents of the database.
Storage Structure
The storage structure can be divided into two categories
Volatile storage
As the name suggests, volatile storage cannot survive system crashes.
Examples:
Main memory (RAM) and cache memory are examples of volatile storage.
Non-volatile storage
Non-volatile storage is built to survive system crashes. It offers huge data storage capacity
but is slower to access.
Examples:
Hard disks, magnetic tapes, flash memory, and non-volatile (battery-backed-up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items.
The atomicity of transactions must be maintained, that is, either all the operations of a
transaction are executed or none of them are.
It should check the states of all the transactions, which were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to be
rolled back.
No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques that can help a DBMS in recovering as well as maintaining
the atomicity of a transaction:
Maintaining the logs of each transaction and writing them onto some stable
storage before modifying the database.
Maintaining shadow paging, where the changes are done on a volatile memory,
and later, the actual database is updated.
Log-based Recovery
The log maintains a record of the actions performed by each transaction. It is important that
the log records are written prior to the actual modification and stored on a stable storage
medium, which is failsafe.
When a transaction enters the system and starts execution, it writes a log record for it:
<Tn, Start>
When the transaction modifies an item X, it writes a log record as follows:
<Tn, X, V1, V2>
This record states that Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs −
<Tn, commit>
Deferred Database Modification
In this scheme, all the write operations of a transaction are deferred until the transaction
reaches its partial commit; if the transaction fails before that point, none of its writes are
applied to the database.
In this method the log record for a write does not need the old value, i.e., it has the form
(write, T, X, new value).
An (abort, T) entry is never needed in this method.
If a system crash occurs, the recovery manager scans the log file. For every transaction T
whose (commit, T) entry is found in the log, it redoes all of T's (write, T, ...) entries against the
database. If no (commit, T) entry is found for a transaction, nothing needs to be done for it.
This method is therefore also called the no-undo/redo recovery scheme.
Redo must be performed in the order in which the log records were written.
Example:
(Start, T0)
(Write, T0, A, 9)
Crash (nothing is to be done because no commit record is found)
(Commit, T0)
(Start, T1)
(Write, T1, A, 7)
Crash (T0 committed, so redo T0)
(Commit, T1)
Crash (T0 and T1 are committed, so redo T0 and T1)
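A minimal Python sketch of this redo-only idea: scan the log and redo the writes only of transactions whose commit record is present. The tuple representation of log records and the placeholder initial database value are assumptions made for illustration.

def recover_deferred(log, database):
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    # Redo, in log order, every write that belongs to a committed transaction.
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, txn, item, new_value = rec
            database[item] = new_value
    # Writes of uncommitted transactions never reached the database,
    # so nothing has to be undone.
    return database

# The second crash point above: T0 has committed, T1 has not.
log = [('start', 'T0'), ('write', 'T0', 'A', 9), ('commit', 'T0'),
       ('start', 'T1'), ('write', 'T1', 'A', 7)]
print(recover_deferred(log, {'A': 0}))   # {'A': 9} -- only T0's write is redone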
Immediate Database Modification
In this scheme, database modifications may be applied before the transaction commits, so
each log record (write, T, X, old value, new value) stores both the old and the new value.
After a crash, uncommitted transactions are undone using the old values and committed
transactions are redone using the new values; this is an undo/redo recovery scheme.
Example:
(Start, T0)
(Write, T0, A, 3, 9)
Crash (undo T0, so A = 3)
(Commit, T0)
(Start, T1)
(Write, T1, B, 2, 5)
(Write, T1, A, 9, 7)
Crash (undo T1 and redo T0, so B = 2, A = 9)
(Commit, T1)
Crash (redo T0 and T1, so B = 5, A = 7)
Checkpoint
A checkpoint declares a point before which the DBMS was in a consistent state and all the
transactions were committed.
Checkpointing is a mechanism by which all the previous logs are removed from the system
and stored permanently on the storage disk.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in
the following manner
The recovery system reads the logs backwards from the end to the last
checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>,
it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort record, it puts the
transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed.
All the transactions in the redo-list are redone and their log records are retained.
Example:
(Start, T1)
(Write, T1, B, 2, 3)
(Start, T2)
(Commit, T1)
(Write, T2, C, 5, 7)
(Commit, T2)
(Checkpoint, {T2})
(Start, T3)
(Write, T3, A, 1, 9)
(Commit, T3)
(Start, T4)
(Write, T4, C, 7, 2)
In the above example the undo list is {T4} and the redo list is {T2, T3}.
The undo list is processed first (T4 is undone, restoring C = 7), and then the redo list is
processed (T2 and T3 are redone, giving C = 7 and A = 9).
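A minimal Python sketch of how the undo and redo lists above can be built by scanning the log backwards; the tuple representation of the log and checkpoint records is an assumption made for illustration.

def undo_redo_lists(log):
    undo_list, redo_list = [], []
    committed, pending = set(), None   # pending: transactions active at the checkpoint
    for rec in reversed(log):          # scan backwards from the end of the log
        kind = rec[0]
        if kind == 'commit':
            committed.add(rec[1])
        elif kind == 'start':
            txn = rec[1]
            (redo_list if txn in committed else undo_list).append(txn)
            if pending is not None:
                pending.discard(txn)
                if not pending:        # starts of all checkpoint-active txns found
                    break
        elif kind == 'checkpoint':
            pending = set(rec[1])
            if not pending:
                break
    return undo_list, redo_list

log = [('start','T1'), ('write','T1','B',2,3), ('start','T2'), ('commit','T1'),
       ('write','T2','C',5,7), ('commit','T2'), ('checkpoint',['T2']),
       ('start','T3'), ('write','T3','A',1,9), ('commit','T3'),
       ('start','T4'), ('write','T4','C',7,2)]
print(undo_redo_lists(log))   # (['T4'], ['T3', 'T2'])
# Undo is applied first, in reverse log order; redo is then applied in forward log order.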
Deadlock Handling
A deadlock is a state of a database system with two or more transactions in which each
transaction is waiting for a data item that is locked by another of the transactions.
A deadlock can be indicated by a cycle in the wait-for-graph. This is a directed
graph in which the vertices denote transactions and the edges denote waits for
data items.
Example
Transaction T1 is waiting for data item X which is locked by T3. T3 is waiting
for Y which is locked by T2 and T2 is waiting for Z which is locked by T1.
Hence, a waiting cycle is formed, and none of the transactions can proceed
executing.
Wait-for graph: T1 -> T3 (T1 waits for X, held by T3), T3 -> T2 (T3 waits for Y, held by T2),
T2 -> T1 (T2 waits for Z, held by T1). The edges form a cycle, so the system is deadlocked.
There are three classical approaches for deadlock handling, namely
Deadlock prevention.
Deadlock avoidance.
Deadlock detection and removal.
All of the three approaches can be incorporated in both a centralized and a
distributed database system.
Deadlock Prevention
The deadlock prevention approach does not allow any transaction to acquire
locks that will lead to deadlock.
The convention is that when more than one transaction requests a lock on the same data item,
only one of them is granted the lock.
One of the most popular deadlock prevention methods is pre-acquisition of all
the locks.
In this method, a transaction acquires all the locks before it starts to execute and retains them
for the entire duration of the transaction.
Using this approach, the system is prevented from becoming deadlocked, since none of the
waiting transactions holds any lock.
There are two algorithms for this purpose, namely wait-die and wound-wait.
Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data
item which is already locked by T2. The algorithms are as follows
Wait-Die
The wait–die scheme is a non preemptive technique. When transaction Ti requests
a data item currently held by Tj , Ti is allowed to wait only if it has a timestamp
smaller than that of Tj (that is, Ti is older than Tj ). Otherwise, Ti is rolled back (dies).
Wound-Wait
The wound–wait scheme is a preemptive technique. It is a counterpart to the wait–die
scheme. When transaction Ti requests a data item currently held by Tj , Ti is allowed
to wait only if it has a timestamp larger than that of Tj (that is, Ti is younger than Tj ).
Otherwise, Tj is rolled back (Tj is wounded by Ti).
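A minimal Python sketch of the two decisions; a smaller timestamp means an older transaction, and the function names are illustrative only.

def wait_die(ts_requester, ts_holder):
    # Non-preemptive: an older requester waits, a younger requester dies.
    return "wait" if ts_requester < ts_holder else "roll back requester"

def wound_wait(ts_requester, ts_holder):
    # Preemptive: an older requester wounds (rolls back) the holder,
    # a younger requester waits.
    return "roll back holder" if ts_requester < ts_holder else "wait"

# T1 (timestamp 5, older) requests an item held by T2 (timestamp 9, younger):
print(wait_die(5, 9))     # wait             -- T1 is older, so it is allowed to wait
print(wound_wait(5, 9))   # roll back holder -- T1 wounds T2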
Deadlock Avoidance
The deadlock avoidance approach handles deadlocks before they occur. It analyzes
the transactions and the locks to determine whether waiting leads to a deadlock or not .
When a transaction requests a lock on a data item, the lock manager checks whether the lock
is available. If it is available, the lock manager allocates the data item and the transaction
acquires the lock.
If the item is locked by some other transaction in incompatible mode, the lock
manager runs an algorithm to test whether keeping the transaction in waiting state will
cause a deadlock or not.
Deadlock Detection and Removal
The deadlock detection and removal approach runs a deadlock detection
algorithm periodically and removes deadlock in case there is one.
When a transaction requests a lock, the lock manager checks whether it is
available. If it is available, the transaction is allowed to lock the data item;
otherwise the transaction is allowed to wait.
To detect deadlocks, the lock manager periodically checks if the wait-for graph
has cycles.
If the system is deadlocked, the lock manager chooses a victim transaction from
each cycle. The victim is aborted and rolled back; and then restarted later.
Lock-Based Protocols
Lock-based protocols require that data items are accessed in a mutually exclusive manner,
i.e., while one transaction is accessing a data item, no other transaction can modify that item.
Shared lock (S): a transaction holding this lock can only read the data item; it cannot write it.
Exclusive lock (X): a transaction holding this lock can both read and write the data item.
The compatibility between these two modes is simple: a shared lock is compatible with other
shared locks, but an exclusive lock is not compatible with any other lock, shared or exclusive.
In general we have two types of locking protocols:
Simple locking protocol
Two-phase locking (2PL) protocol
In two-phase locking, each transaction acquires locks in a growing phase and releases them in
a shrinking phase; once it has released any lock, it cannot acquire a new one. For 2PL we have
seen:
Serializability: it is guaranteed.
Cascading rollback: it is possible, which is bad.
Deadlock: it is possible.
The Venn diagram (not reproduced here) shows the classification of schedules under rigorous,
strict, and conservative two-phase locking.
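A minimal Python sketch of the two-phase rule itself; lock-compatibility checking and deadlock handling are omitted, and the class name is illustrative only. Once a transaction releases any lock it enters its shrinking phase and may not acquire new locks.

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False      # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(self.name + ": 2PL violation - lock after unlock")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True       # the growing phase is over

t = TwoPhaseTransaction("T1")
t.lock("A"); t.lock("B")    # growing phase
t.unlock("A")               # shrinking phase begins
t.lock("C")                 # raises RuntimeError: violates the two-phase rule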
Thomas Write Rule
When a transaction T issues a write on data item X (with read timestamp R_TS(X) and write
timestamp W_TS(X)), the Thomas Write Rule proceeds as follows:
1. If R_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If W_TS(X) > TS(T), then do not execute the write operation and continue processing. This
is a case of an outdated or obsolete write. Outdated writes are ignored under the Thomas
Write Rule, whereas a transaction following the basic timestamp-ordering (TO) protocol
would be aborted in this situation.
3. If neither condition 1 nor condition 2 holds, then execute the W_item(X) operation of T and
set W_TS(X) to TS(T).
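A minimal Python sketch of these three checks; the dictionaries holding R_TS, W_TS and the data values are assumptions made for illustration.

def thomas_write(item, ts, value, r_ts, w_ts, db):
    if r_ts[item] > ts:
        return "abort"     # rule 1: a younger transaction has already read X
    if w_ts[item] > ts:
        return "ignore"    # rule 2: obsolete write, skipped (basic TO would abort)
    db[item] = value       # rule 3: perform the write
    w_ts[item] = ts        # and advance the write timestamp of X
    return "written"

r_ts, w_ts, db = {'X': 4}, {'X': 6}, {'X': 10}
print(thomas_write('X', 5, 99, r_ts, w_ts, db))   # 'ignore' -- W_TS(X)=6 > TS(T)=5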
Validation Based Protocol (Optimistic Concurrency)
The validation-based protocol executes a transaction in three phases:
Read phase
In this phase, the transaction T is read and executed. The transaction reads the values of the
various data items and stores them in temporary local variables. It performs all its write
operations on these temporary variables, without updating the actual database.
Validation phase
In this phase, the temporary variable values are validated against the actual data to check
whether they violate serializability.
Write phase
If the transaction passes validation, the temporary results are written to the database or
system; otherwise the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
The protocol uses the timestamp of the validation phase as the transaction's timestamp for
serialization, because the validation phase is the phase that actually determines whether the
transaction will commit or roll back.
Hence TS(T) = Validation(T).
Serializability is determined during the validation process; it cannot be decided in advance.
While executing transactions, this protocol allows a greater degree of concurrency and fewer
conflicts; thus transactions suffer fewer rollbacks.
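The standard validation test checks the transaction being validated against every transaction that validated earlier: either the earlier transaction finished before the current one started, or it finished before the current one entered validation and its write set does not intersect the current transaction's read set. A minimal Python sketch, with the dictionary representation of a transaction's timestamps and read/write sets assumed only for illustration:

def validate_pair(earlier, current):
    if earlier['finish'] < current['start']:
        return True                      # earlier finished before current started
    if earlier['finish'] < current['validation']:
        # earlier's writes complete before current validates, so they must not
        # touch anything current has read
        return not (earlier['write_set'] & current['read_set'])
    return False

def validate(current, earlier_transactions):
    return all(validate_pair(e, current) for e in earlier_transactions)

t1 = {'start': 1, 'validation': 4, 'finish': 6, 'read_set': {'A'}, 'write_set': {'A'}}
t2 = {'start': 5, 'validation': 8, 'finish': 9, 'read_set': {'B'}, 'write_set': {'B'}}
print(validate(t2, [t1]))   # True -- T1's writes do not touch anything T2 read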
Multiple Granularity
Granularity: it is the size of the data item that is allowed to be locked.
Multiple Granularity can be defined as hierarchically breaking up the database
into blocks which can be locked.
The Multiple Granularity protocol enhances concurrency and reduces lock
overhead.
It keeps track of what to lock and how to lock.
It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can be
graphically represented as a tree.
Example
Consider a tree which has four levels of nodes.
The first (highest) level represents the entire database.
The second level represents nodes of type area; the database consists of exactly these areas.
The area consists of children nodes which are known as files. No file can be
present in more than one area.
Finally, each file contains child nodes known as records. The file consists of exactly those
records that are its child nodes. No record is present in more than one file.
Hence, the levels of the tree starting from the top level are as follows:
1.Database
2.Area
3.File
4.Record
There are three additional lock modes with multiple granularity:
Intention Mode Lock
Intention-shared (IS): indicates explicit locking at a lower level of the tree, but only with
shared locks.
Intention-exclusive (IX): indicates explicit locking at a lower level, with exclusive or shared
locks.
Shared and intention-exclusive (SIX): the node itself is locked in shared mode, and explicit
exclusive-mode locking is being done by the same transaction at a lower level.
Compatibility Matrix
The compatibility among the five lock modes is as follows (yes = compatible):
        IS    IX    S     SIX   X
IS      yes   yes   yes   yes   no
IX      yes   yes   no    no    no
S       yes   no    yes   no    no
SIX     yes   no    no    no    no
X       no    no    no    no    no
The multiple-granularity protocol uses these intention lock modes to ensure serializability. A
transaction Ti that attempts to lock a node must follow these rules:
Ti must observe the lock-compatibility matrix.
Ti must lock the root of the tree first, and it can lock it in any mode.
Ti can lock a node in S or IS mode only if it currently holds the parent of that node in IS or
IX mode.
Ti can lock a node in X, SIX, or IX mode only if it currently holds the parent of that node in
IX or SIX mode.
Ti can lock a node only if it has not previously unlocked any node (Ti is two-phase).
Ti can unlock a node only if it currently holds no locks on any child of that node.
Thus, in multiple granularity, locks are acquired in top-down (root-to-leaf) order, and locks
must be released in bottom-up (leaf-to-root) order.
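A minimal Python sketch of the parent-lock rules (the third and fourth rules above); the tree and the lock bookkeeping structures are assumptions made for illustration.

PARENT_MODES_REQUIRED = {
    'S':   {'IS', 'IX'},  'IS': {'IS', 'IX'},
    'X':   {'IX', 'SIX'}, 'IX': {'IX', 'SIX'}, 'SIX': {'IX', 'SIX'},
}

def may_lock(node, mode, parent_of, held):
    # held maps node -> mode currently held by this transaction on that node.
    # The root (parent None) may be locked in any mode; any other node may be
    # locked only if the required intention lock is held on its parent.
    parent = parent_of.get(node)
    if parent is None:
        return True
    return held.get(parent) in PARENT_MODES_REQUIRED[mode]

parent_of = {'DB': None, 'Area1': 'DB', 'File1': 'Area1', 'Rec1': 'File1'}
held = {'DB': 'IX', 'Area1': 'IX', 'File1': 'IX'}
print(may_lock('Rec1', 'X', parent_of, held))          # True  -- IX is held on File1
print(may_lock('Rec1', 'X', parent_of, {'DB': 'IS'}))  # False -- no intention lock on File1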
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES)
Analysis
The analysis step identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of the crash.
The appropriate point in the log where the REDO operation should start is also
determined
REDO
The REDO phase reapplies updates from the log to the database, starting from the point
determined during the analysis phase; in this way the effects of committed transactions are
guaranteed to be reflected in the database, while the effects of uncommitted transactions are
removed afterwards during the UNDO phase.
ARIES will have information which provides the start point for REDO
Information stored by ARIES and in the data pages will allow ARIES to determine
whether the operation to be redone had been applied to the database.
Thus only the necessary REDO operations are applied during recovery.
UNDO
During the UNDO phase, the log is scanned backwards and the operations of
transactions that were active at the time of the crash are undone in reverse order.
The information needed for ARIES to accomplish its recovery procedure
includes the log, the Transaction Table, and the Dirty Page Table.
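A highly simplified Python sketch of the three passes over a toy log. The record format is assumed for illustration, and compensation log records, prevLSN chains and fuzzy checkpoints are deliberately omitted, so this shows only the overall structure rather than real ARIES.

def aries_recover(log, pages, page_lsn):
    # Analysis: rebuild the transaction table (loser transactions still active)
    # and the dirty page table (page -> recLSN).
    active, dirty = set(), {}
    for rec in log:
        if rec['type'] == 'update':
            active.add(rec['txn'])
            dirty.setdefault(rec['page'], rec['lsn'])
        elif rec['type'] == 'commit':
            active.discard(rec['txn'])
    # Redo: repeat history from the smallest recLSN in the dirty page table,
    # skipping updates already reflected on the page (page_lsn >= record lsn).
    redo_start = min(dirty.values(), default=0)
    for rec in log:
        if rec['type'] == 'update' and rec['lsn'] >= redo_start:
            if page_lsn.get(rec['page'], 0) < rec['lsn']:
                pages[rec['page']] = rec['value']
                page_lsn[rec['page']] = rec['lsn']
    # Undo: roll back loser transactions in reverse log order
    # (real ARIES would write compensation log records here).
    for rec in reversed(log):
        if rec['type'] == 'update' and rec['txn'] in active:
            pages[rec['page']] = rec['old']
    return pages

log = [{'lsn': 1, 'type': 'update', 'txn': 'T1', 'page': 'P1', 'old': 0, 'value': 5},
       {'lsn': 2, 'type': 'update', 'txn': 'T2', 'page': 'P2', 'old': 0, 'value': 8},
       {'lsn': 3, 'type': 'commit', 'txn': 'T1'}]
print(aries_recover(log, {}, {}))   # {'P1': 5, 'P2': 0} -- T1 redone, loser T2 undone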