Chapter 6
For example, an airline reservations system is used by hundreds of travel agents and reservation
clerks concurrently.
In these systems, hundreds or thousands of users are typically operating on the database by
submitting transactions concurrently to the system.
Multiple users can access databases and use computer systems simultaneously because of the
concept of multiprogramming.
A single central processing unit (CPU) can execute at most one process at a time. However,
multiprogramming operating systems execute some commands from one process, then suspend that
process and execute some commands from the next process, and so on.
One way of specifying the transaction boundaries is to place explicit begin transaction and end
transaction statements in an application program.
If the database operations in a transaction do not update the database but only retrieve data, the
transaction is called a read-only transaction; otherwise it is known as a read-write transaction.
write_item(X). Writes the value of program variable X into the database item named X.
The basic unit of data transfer from disk to main memory is one block.
Executing a read_item(X) command includes the following steps:
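The steps themselves are not listed in these notes. A common textbook description is: find the disk block that contains item X, copy that block into a main-memory buffer if it is not already there, then copy X from the buffer into the program variable X. The sketch below follows that description; the dictionaries `disk`, `block_of`, and `buffer_pool` are hypothetical stand-ins for real storage structures.

```python
# Hypothetical in-memory stand-ins for disk blocks and the buffer pool.
disk = {0: {"X": 90, "Y": 90}, 1: {"Z": 5}}   # block id -> items stored in that block
block_of = {"X": 0, "Y": 0, "Z": 1}           # item name -> block id (its "address")
buffer_pool = {}                              # block id -> copy of the block in main memory

def read_item(name):
    """Return the value of database item `name` for use as a program variable."""
    block_id = block_of[name]                 # step 1: find the block that contains the item
    if block_id not in buffer_pool:           # step 2: bring the whole block into a buffer
        buffer_pool[block_id] = dict(disk[block_id])
    return buffer_pool[block_id][name]        # step 3: copy the item out of the buffer

x = read_item("X")
```

Note that the whole block is transferred even though only one item was requested, so a later read of Y would be served from the buffer.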
2. A transaction or system error. Some operation in the transaction may cause it to fail, such as
integer overflow or division by zero. Transaction failure may also occur because of erroneous
parameter values or because of a logical programming error.
Additionally, the user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction. For example, data for the
transaction may not be found. An exception condition, such as insufficient account balance in a
banking database, may cause a transaction, such as a fund withdrawal, to be canceled.
Why Recovery Is Needed
4. Concurrency control enforcement. The concurrency control method may abort a transaction because
it violates serializability or to resolve a deadlock. Transactions aborted for these reasons are
typically restarted automatically at a later time.
5. Disk failure. Some disk blocks may lose their data because of a read or write malfunction or
because of a disk read/write head crash.
6. Physical problems and catastrophes. This refers to an endless list of problems that includes
power or air-conditioning failure, fire, theft, and so on.
The recovery manager of the DBMS needs to keep track of the following operations:
BEGIN_TRANSACTION. This marks the beginning of transaction execution.
READ or WRITE. These specify read or write operations on the database items that are executed
as part of a transaction.
END_TRANSACTION. This specifies that READ and WRITE transaction operations have
ended and marks the end of transaction execution.
COMMIT_TRANSACTION. This signals a successful end of the transaction so that any
changes (updates) executed by the transaction can be safely committed to the database and
will not be undone.
ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database must be undone.
A transaction goes into an active state immediately after it starts execution, where it can execute
its READ and WRITE operations.
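When the transaction ends, it moves to the partially committed state. At this point, some recovery
protocols need to check that a system failure will not result in an inability to record the changes
of the transaction permanently.
Once this check is successful, the transaction is said to have reached its commit point and enters
the committed state.
A transaction can go to the failed state if one of the checks fails or if the transaction is aborted
during its active state.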
The terminated state corresponds to the transaction leaving the system.
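The state transitions described above can be sketched as a small state machine. This is a minimal illustration, not a DBMS component; the `Transaction` class and the `TRANSITIONS` table are hypothetical names introduced for the example.

```python
# Allowed state transitions, following the states described above.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "committed": {"terminated"},
    "failed": {"terminated"},
    "terminated": set(),
}

class Transaction:
    def __init__(self, name):
        self.name = name
        self.state = "active"      # a transaction is active as soon as it starts

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: cannot go from {self.state} to {new_state}")
        self.state = new_state

t1 = Transaction("T1")             # a transaction that completes successfully
for state in ("partially committed", "committed", "terminated"):
    t1.move_to(state)

t2 = Transaction("T2")
t2.move_to("failed")               # aborted during its active state
t2.move_to("terminated")
```

An attempt to jump straight from active to committed raises an error, which mirrors the requirement that the commit point is reached only after the partially committed check.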
The System Log
To be able to recover from failures that affect transactions, the system maintains a log to keep
track of all transaction operations that affect the values of database items.
The following are the types of entries called log records that are written to the log file and the
corresponding action for each log record.
2. [write_item, T, X, old_value, new_value]. Indicates that transaction T has changed the value of
database item X from old_value to new_value.
3. [read_item, T, X]. Indicates that transaction T has read the value of database item X.
4. [commit, T]. Indicates that transaction T has completed successfully, and affirms that its effect
can be committed (recorded permanently) to the database.
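As a rough illustration of why write_item records carry old_value, the sketch below undoes the writes of every transaction that has no commit record. The log contents and the `undo_uncommitted` helper are hypothetical; a real recovery manager does considerably more (for example, redoing committed work).

```python
# Hypothetical log; the records follow the shapes listed above.
log = [
    ("write_item", "T1", "X", 90, 87),
    ("read_item", "T2", "X"),
    ("write_item", "T2", "Y", 90, 92),
    ("commit", "T1"),
]
database = {"X": 87, "Y": 92}      # assumed database state at the time of the failure

def undo_uncommitted(log, database):
    """Restore old values for writes made by transactions with no commit record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                      # undo in reverse order of execution
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            database[item] = old_value
    return database

undo_uncommitted(log, database)
```

Here T1 committed, so its write to X survives, while the write to Y by the uncommitted T2 is reset to its old value.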
That is, the execution of a transaction should not be interfered with by any other transactions
executing concurrently.
Isolation is enforced by the concurrency control subsystem of the DBMS. One way to enforce it is
for every transaction to keep its updates (write operations) invisible to other transactions until
it is committed.
4. Durability or permanency. The changes applied to the database by a committed transaction must
persist in the database. These changes must not be lost because of any failure.
Schedules of Transaction
When transactions are executing concurrently in an interleaved fashion, the order of
execution of operations from all the various transactions is known as a schedule (or history).
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the
transactions.
Operations from different transactions can be interleaved in the schedule S. However, for each
transaction Ti that participates in the schedule S, the operations of Ti in S must appear in the
same order in which they occur in Ti.
A shorthand notation for describing a schedule uses the symbols b, r, w, e, c, and a for the
operations begin_transaction, read_item, write_item, end_transaction, commit, and abort,
respectively, and appends as a subscript the transaction id (transaction number) to each operation
in the schedule.
Two operations in a schedule are said to conflict if they satisfy all three of the following
conditions:
(1) they belong to different transactions;
(2) they access the same item X; and
(3) at least one of the operations is a write_item(X).
In schedule Sa, the operations r1(X) and w2(X) conflict, as do the operations r2(X) and w1(X), and
the operations w1(X) and w2(X).
However, the operations r1(X) and r2(X) do not conflict, since they are both read operations; the
operations w2(X) and w1(Y) do not conflict because they operate on distinct data items X and Y;
and the operations r1(X) and w1(X) do not conflict because they belong to the same transaction.
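The three conditions translate directly into a small test. Schedule Sa is not shown in these notes, so the list `Sa` below is an assumed schedule chosen to be consistent with the operations named above; the tuple encoding (kind, transaction number, item) is also an assumption of this sketch.

```python
from itertools import combinations

# Each operation is (kind, transaction number, item); this schedule is an assumed example.
Sa = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
      ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]

def conflicts(op1, op2):
    kind1, txn1, item1 = op1
    kind2, txn2, item2 = op2
    return (txn1 != txn2                 # (1) different transactions
            and item1 == item2           # (2) same item
            and "w" in (kind1, kind2))   # (3) at least one write

def conflicting_pairs(schedule):
    """Return all conflicting pairs, each listed in schedule order."""
    return [(a, b) for a, b in combinations(schedule, 2) if conflicts(a, b)]

pairs = conflicting_pairs(Sa)
```

For this assumed schedule, `pairs` contains exactly the three conflicts listed above: r1(X) with w2(X), r2(X) with w1(X), and w1(X) with w2(X).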
Characterizing Schedules Based on Recoverability
The criterion is that once a transaction T is committed, it should never be necessary to roll back T.
The schedules that theoretically meet this criterion are called recoverable schedules; those
that do not are called nonrecoverable and hence should not be permitted by the DBMS.
Recoverable schedule: a schedule S is recoverable if no transaction T in S commits until all
transactions T′ that have written some item X that T reads have committed.
[Figure: a nonrecoverable schedule (S1) and a recoverable schedule (S2)]
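The recoverability condition can be checked mechanically. The sketch below is a simplified illustration: operations are encoded as (kind, transaction, item) with kinds r, w, c, and a, the two example schedules are hypothetical, and the check ignores the corner case where the writer aborted before the read took place.

```python
def reads_from(schedule):
    """Yield (reader, writer) whenever a transaction reads an item last written by another."""
    last_writer = {}
    for kind, txn, item in schedule:
        if kind == "w":
            last_writer[item] = txn
        elif kind == "r" and last_writer.get(item, txn) != txn:
            yield txn, last_writer[item]

def is_recoverable(schedule):
    commit_pos = {txn: pos for pos, (kind, txn, _) in enumerate(schedule) if kind == "c"}
    for reader, writer in reads_from(schedule):
        if reader in commit_pos:
            # The writer must have committed before the reader commits.
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True

# Hypothetical schedules: T2 reads X from T1 in both.
unsafe = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X"),
          ("c", 2, None), ("a", 1, None)]       # T2 commits, then T1 aborts
safe = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X"),
        ("c", 1, None), ("c", 2, None)]         # T1 commits before T2
```

In `unsafe`, T2 has already committed a value derived from T1's write when T1 aborts, so T2 would have to be rolled back after committing; `safe` avoids this by delaying T2's commit.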
Because cascading rollback can be quite time-consuming, since numerous transactions can be
rolled back, it is important to characterize the schedules where this phenomenon is guaranteed
not to occur.
A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the
schedule reads only items that were written by committed transactions.
Finally, there is a third, more restrictive type of schedule, called a strict schedule, in which
transactions can neither read nor write an item X until the last transaction that wrote X has
committed (or aborted). Strict schedules simplify the recovery process.
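The cascadeless and strict conditions differ only in which operations must wait for the last writer to finish, which the following sketch makes explicit. It is a simplified check under stated assumptions: operations are (kind, transaction, item) tuples with kinds r, w, c, and a, the example schedules are hypothetical, and the last writer is not reset to an earlier writer after an abort.

```python
def ended_before(schedule, txn, pos):
    """True if txn committed or aborted before position pos."""
    return any(kind in ("c", "a") and t == txn for kind, t, _ in schedule[:pos])

def check(schedule, restricted_kinds):
    last_writer = {}
    for pos, (kind, txn, item) in enumerate(schedule):
        if kind in restricted_kinds:
            writer = last_writer.get(item)
            if writer not in (None, txn) and not ended_before(schedule, writer, pos):
                return False          # touched an item whose last writer is still running
        if kind == "w":
            last_writer[item] = txn
    return True

def is_cascadeless(schedule):
    return check(schedule, {"r"})         # only reads must wait

def is_strict(schedule):
    return check(schedule, {"r", "w"})    # reads and writes must wait

dirty_read = [("w", 1, "X"), ("r", 2, "X"), ("c", 1, None), ("c", 2, None)]
blind_overwrite = [("w", 1, "X"), ("w", 2, "X"), ("a", 1, None)]
waits_for_commit = [("w", 1, "X"), ("c", 1, None), ("r", 2, "X"), ("w", 2, "X"), ("c", 2, None)]
```

`blind_overwrite` is cascadeless (nobody reads uncommitted data) but not strict, because T2 overwrites X before T1 has ended; `waits_for_commit` satisfies both conditions.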
Characterizing Schedules Based on Serializability
We now characterize the types of schedules that are always considered to be correct when concurrent
transactions are executing. Such schedules are known as serializable schedules.
If no interleaving of operations is permitted, there are only two possible outcomes:
1. Execute all the operations of transaction T1 (in sequence) followed by all the operations of
transaction T2 (in sequence), or
2. Execute all the operations of transaction T2 (in sequence) followed by all the operations of
transaction T1 (in sequence).
The disadvantages of serial schedules are that they:
1. limit concurrency
2. waste CPU time (the CPU sits idle while a transaction waits, for example for disk I/O)
3. can make smaller transactions wait a long time behind larger ones
Schedules A and B are called serial schedules because the operations of each transaction are
executed consecutively, without any interleaved operations from the other transaction.
The concept of serializability of schedules is used to identify which schedules are correct when
transaction executions have interleaving of their operations in the schedules.
Schedules C and D are called nonserial because each sequence interleaves operations from the two
transactions.
Assume that the initial values of database items are X = 90 and Y = 90 and that N = 3 and M = 2.
After executing transactions T1 and T2, we would expect the database values to be X = 89 and
Y = 93, according to the meaning of the transactions. Executing either of the serial schedules A or
B gives the correct results.
Now consider the nonserial schedules C and D. Schedule C gives the results X = 92 and Y = 93, in
which the X value is erroneous, whereas schedule D gives the correct results.
Schedule C gives an erroneous result because of the lost update problem: transaction T2 reads
the value of X before it is changed by transaction T1, so T2 later overwrites the update made by
T1 and only the effect of T2 on X is reflected in the database.
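The numbers above can be reproduced with a short simulation. The transactions are not shown in these notes, so the sketch assumes that T1 subtracts N from X and adds N to Y, and that T2 adds M to X; the interleavings used for schedules C and D are likewise assumptions chosen to match the quoted results.

```python
N, M = 3, 2

# Each step takes the database and the transaction's local program variables.
T1 = [
    lambda db, v: v.update(X=db["X"]),        # read_item(X)
    lambda db, v: v.update(X=v["X"] - N),     # X := X - N
    lambda db, v: db.update(X=v["X"]),        # write_item(X)
    lambda db, v: v.update(Y=db["Y"]),        # read_item(Y)
    lambda db, v: v.update(Y=v["Y"] + N),     # Y := Y + N
    lambda db, v: db.update(Y=v["Y"]),        # write_item(Y)
]
T2 = [
    lambda db, v: v.update(X=db["X"]),        # read_item(X)
    lambda db, v: v.update(X=v["X"] + M),     # X := X + M
    lambda db, v: db.update(X=v["X"]),        # write_item(X)
]

def run(order):
    """Execute the steps in the given order; `order` names which transaction moves next."""
    db = {"X": 90, "Y": 90}
    programs = {"T1": iter(T1), "T2": iter(T2)}
    local = {"T1": {}, "T2": {}}
    for txn in order:
        next(programs[txn])(db, local[txn])
    return db

serial_a = run(["T1"] * 6 + ["T2"] * 3)
serial_b = run(["T2"] * 3 + ["T1"] * 6)
# T2 reads X before T1 writes it, then overwrites T1's update (lost update).
schedule_c = run(["T1", "T1", "T2", "T2", "T1", "T1", "T2", "T1", "T1"])
# T2 touches X only after T1 has written it.
schedule_d = run(["T1"] * 3 + ["T2"] * 3 + ["T1"] * 3)
```

Both serial orders and the assumed schedule D end with X = 89 and Y = 93, while the assumed schedule C ends with X = 92 because T1's subtraction is lost.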
Conflict serializability
Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same
in both schedules. Recall that two operations in a schedule are said to conflict if they belong to different
transactions, access the same database item, and either both are write_item operations or one is a
write_item and the other a read_item.
1. R1(X)W2(X)
2. R2(X)W1(X)
3. W1(X)W2(X)
The schedule is NOT conflict serializable (not even with respect to the serial order T2, T1).
If R2(X) and W1(X) are switched to W1(X) R2(X), the schedule becomes conflict serializable.
Testing for Conflict Serializability of a Schedule
We can check by drawing a precedence graph. Let S be a schedule; construct a directed graph
known as the precedence graph. This graph is a pair G = (V, E), where V is a set of
vertices and E is a set of edges.
The algorithm is:
1. Create a node for each transaction.
2. Add a directed edge Ti → Tj if Tj reads a value of an item written by Ti.
3. Add a directed edge Ti → Tj if Tj writes a value of an item after it has been read by Ti.
4. Add a directed edge Ti → Tj if Tj writes a value of an item after Ti writes it.
A schedule is conflict serializable if the precedence graph has no cycle (is acyclic).
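The test can be sketched in a few lines. This version adds an edge for every pair of conflicting operations in schedule order, which is a common simplified formulation of rules 2 to 4; the (kind, transaction, item) encoding and the two example schedules are assumptions of the sketch.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) for a schedule of (kind, txn, item) operations."""
    edges = set()
    for i, (kind_i, ti, item_i) in enumerate(schedule):
        for kind_j, tj, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (kind_i, kind_j):
                edges.add((ti, tj))       # Ti's operation precedes a conflicting one in Tj
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):                      # depth-first search for a back edge
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

interleaved_bad = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
                   ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
interleaved_ok = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
                  ("w", 2, "X"), ("r", 1, "Y"), ("w", 1, "Y")]
```

The first schedule produces edges in both directions between T1 and T2, so its graph has a cycle; the second produces only T1 → T2 and is therefore equivalent to the serial order T1, T2.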
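Example: checking conflict serializability with a precedence graph
[Figure: two precedence graphs over transactions T1, T2, and T3, with edges labeled by data items
(X, Y, and Z in the first graph; X in the second)]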
Transaction initiation is done implicitly when particular SQL statements are encountered.
However, every transaction must have an explicit end statement, which is either a COMMIT or a
ROLLBACK.
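Implicit start and explicit end can be seen with Python's built-in sqlite3 module, used here only as a convenient stand-in for an SQL system. The sketch relies on the module's default transaction handling, in which a data-modifying statement opens a transaction implicitly; other DBMSs and drivers may behave differently, and the `account` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")     # throwaway in-memory database for the example
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")   # a transaction starts implicitly here
conn.commit()                                          # explicit end: COMMIT

conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
(balance,) = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()
if balance < 0:
    conn.rollback()                                    # explicit end: ROLLBACK
else:
    conn.commit()

(final_balance,) = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()
```

Because the withdrawal would leave a negative balance, the second transaction is rolled back and the committed balance is unchanged.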
Every transaction has certain characteristics attributed to it. These characteristics are specified
by a SET TRANSACTION statement in SQL. The characteristics are the access mode, the
diagnostic area size, and the isolation level.
Transaction Support in SQL
1. The access mode can be specified as READ ONLY or READ WRITE. A mode of READ WRITE allows select,
update, insert, delete, and create commands to be executed. A mode of READ ONLY, as the name
implies, is simply for data retrieval.
2. The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n, which
indicates the number of conditions that can be held simultaneously in the diagnostic area. These
conditions supply feedback information (errors or exceptions) to the user or program on the n
most recently executed SQL statements.
3. The isolation level option is specified using the statement ISOLATION LEVEL <isolation>,
where the value for <isolation> can be READ UNCOMMITTED, READ COMMITTED,
REPEATABLE READ, or SERIALIZABLE.
The choice of isolation level matters because, at levels below SERIALIZABLE, one or more of the
following three violations may occur:
1. Dirty read. A transaction T1 may read the update of a transaction T2, which has not yet committed.
If T2 fails and is aborted, then T1 would have read a value that does not exist and is incorrect.
2. Nonrepeatable read. A transaction T1 may read a given value from a table. If another transaction
T2 later updates that value and T1 reads that value again, T1 will see a different value.
3. Phantoms. A transaction T1 may read a set of rows from a table, perhaps based on some
condition specified in the SQL WHERE-clause. Now suppose that a transaction T2 inserts a new row
into the table used by T1 that also satisfies the WHERE-clause condition used in T1. If T1 is
repeated, then T1 will see a phantom, a row that previously did not exist.
Concurrency Control Techniques
One important set of protocols, known as two-phase locking protocols, employs the technique of
locking data items to prevent multiple transactions from accessing the items concurrently.
Locking protocols are used in most commercial DBMSs.
Another set of concurrency control protocols uses timestamps. A timestamp is a unique
identifier for each transaction, generated by the system. Timestamp values are generated in the
same order as the transaction start times.
A lock is a variable associated with a data item that describes the status of the item with respect
to possible operations that can be applied to it. Generally, there is one lock for each data item in
the database.
Binary Locks. A binary lock can have two states or values: locked and unlocked (or 1 and 0, for
simplicity).
A distinct lock is associated with each database item X. If the value of the lock on X is 1, item X
cannot be accessed by a database operation that requests the item.
If the value of the lock on X is 0, the item can be accessed when requested, and the lock value is
changed to 1. We refer to the current value (or state) of the lock associated with item X as lock(X).
Two operations, lock_item and unlock_item, are used with binary locking. Each is implemented as an
indivisible unit (known as a critical section in operating systems): no interleaving should
be allowed once a lock or unlock operation is started until the operation terminates or the
transaction waits.
The wait command within the lock_item(X) operation is usually implemented by putting the
transaction in a waiting queue for item X until X is unlocked and the transaction can be granted
access to it.
If the simple binary locking scheme described here is used, every transaction must
obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X) or
write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and
write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on
item X.
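The binary locking scheme, including the waiting queue and rules 3 and 4, can be sketched as follows. This is a single-threaded simulation in which "waiting" just means being placed in the queue for the item; the `BinaryLockTable` class is a hypothetical name, and a real DBMS would also have to make each operation indivisible.

```python
from collections import deque

class BinaryLockTable:
    """Single-threaded simulation of lock_item and unlock_item with a wait queue."""

    def __init__(self):
        self.holder = {}     # item -> transaction holding the lock (absent means lock(X) = 0)
        self.waiting = {}    # item -> queue of transactions waiting for the item

    def lock_item(self, txn, item):
        if self.holder.get(item) == txn:
            raise RuntimeError("rule 3: transaction already holds this lock")
        if item not in self.holder:
            self.holder[item] = txn              # lock(X) goes from 0 to 1
            return True
        self.waiting.setdefault(item, deque()).append(txn)
        return False                             # the caller must wait

    def unlock_item(self, txn, item):
        if self.holder.get(item) != txn:
            raise RuntimeError("rule 4: transaction does not hold this lock")
        queue = self.waiting.get(item)
        if queue:
            self.holder[item] = queue.popleft()  # hand the lock to the first waiter
        else:
            del self.holder[item]                # lock(X) goes back to 0

locks = BinaryLockTable()
got_t1 = locks.lock_item("T1", "X")   # granted
got_t2 = locks.lock_item("T2", "X")   # queued behind T1
locks.unlock_item("T1", "X")          # the lock passes to T2
```

Rules 1 and 2 concern where a transaction places its lock and unlock calls, so they are obligations on the transaction rather than something this table can enforce by itself.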
The preceding binary locking scheme is too restrictive for database items because at most one
transaction can hold a lock on a given item.
We should allow several transactions to access the same item X if they all access X for reading
purposes only. This is because read operations on the same item by different transactions are not
conflicting.
However, if a transaction is to write an item X, it must have exclusive access to X. For this
purpose, a different type of lock called a multiple-mode lock is used.
Data items can be locked in two modes:
1. Exclusive (X) mode. The data item can be both read and written. An X-lock is requested using
the lock-X instruction.
2. Shared (S) mode. The data item can only be read. An S-lock is requested using the lock-S
instruction.
Lock requests are made to the concurrency-control manager by the programmer. A transaction can
proceed only after the request is granted.
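Whether a request can be granted depends on the modes already held on the item by other transactions. A minimal sketch of that decision, with the `COMPATIBLE` table and `can_grant` function introduced here purely for illustration:

```python
# (mode already held, mode requested) -> whether the two can coexist on one item.
COMPATIBLE = {
    ("S", "S"): True,    # several readers may share an item
    ("S", "X"): False,   # a writer must wait for readers
    ("X", "S"): False,   # a reader must wait for the writer
    ("X", "X"): False,   # only one writer at a time
}

def can_grant(requested_mode, held_modes):
    """held_modes: modes currently held on the item by other transactions."""
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)
```

For example, `can_grant("S", ["S", "S"])` is true, while `can_grant("X", ["S"])` is false, so the requesting transaction would have to wait.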
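Phase 1: Growing Phase (new locks can be acquired but no locks can be released)
◦ Transaction may obtain locks
◦ Transaction may not release locks
Phase 2: Shrinking Phase (existing locks can be released but no new locks can be acquired)
◦ Transaction may release locks
◦ Transaction may not obtain locks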
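Whether a single transaction respects the two-phase rule can be checked by scanning its lock and unlock requests in order. The function below is a small sketch under the assumption that each request is encoded as an (action, item) pair; the encoding and the function name are hypothetical.

```python
def follows_two_phase_locking(operations):
    """operations: list of ("lock" | "unlock", item) issued by one transaction, in order."""
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True          # the first release starts the shrinking phase
        elif action == "lock" and shrinking:
            return False              # acquiring a lock after a release breaks the rule
    return True

two_phase = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]
not_two_phase = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]
```

The second sequence releases X and then locks Y, so it acquires a lock during its shrinking phase and does not follow the protocol.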