Chapter 6

Uploaded by

nat yesu
Copyright
© All Rights Reserved
CHAPTER 6: TRANSACTION PROCESSING, CONCURRENCY CONTROL, AND RECOVERY
Transaction and System Concepts
The concept of a transaction provides a mechanism for describing logical units of database
processing.
Transaction processing systems are systems with large databases and hundreds of
concurrent users executing database transactions.
Examples of such systems include airline reservations, banking, credit card processing,
online retail purchasing, stock markets, supermarket checkouts, and many other
applications.
These systems require high availability and fast response time for hundreds of
concurrent users.
A transaction is typically implemented by a computer program, which includes
database commands such as retrievals, insertions, deletions, and updates.
Single-User versus Multiuser Systems
One criterion for classifying a database system is according to the number of users who can use
the system concurrently.
A DBMS is single-user if at most one user at a time can use the system;
it is multiuser if many users can use the system and access the database concurrently.

For example, an airline reservations system is used by hundreds of travel agents and reservation
clerks concurrently.

In these systems, hundreds or thousands of users are typically operating on the database by
submitting transactions concurrently to the system.
Multiple users can access databases and use computer systems simultaneously because of the
concept of multiprogramming.

A single central processing unit (CPU) can execute at most one process at a time.

Multiprogramming operating systems execute some commands from one process, then suspend that
process and execute some commands from the next process, so the execution of concurrent
processes is interleaved.

One way of specifying the transaction boundaries is by writing explicit begin transaction and end
transaction statements in an application program.

If the database operations in a transaction do not update the database but only retrieve data, the
transaction is called a read-only transaction; otherwise it is known as a read-write transaction.

The basic database access operations that a transaction can include are:

read_item(X): Reads a database item named X into a program variable (also named X, to simplify the notation).

write_item(X): Writes the value of program variable X into the database item named X.
The basic unit of data transfer from disk to main memory is one block.
Executing a read_item(X) command includes the following steps:

1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.

Executing a write_item(X) command includes the following steps:

1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in
time).
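
The steps of read_item(X) and write_item(X) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the "disk" is a dictionary of blocks, and the names disk, buffer_pool, dirty, find_block, fetch, and flush are hypothetical helpers invented for the illustration, not part of any real DBMS.

```python
# Toy model (illustrative names only): the "disk" maps block addresses to
# blocks, and each block maps item names to values.
disk = {0: {"X": 90, "Y": 90}, 1: {"Z": 5}}
buffer_pool = {}   # block address -> in-memory copy of the block
dirty = set()      # addresses of buffered blocks not yet written back


def find_block(item):
    """Step 1: find the address of the disk block that contains the item."""
    for address, block in disk.items():
        if item in block:
            return address
    raise KeyError(item)


def fetch(address):
    """Step 2: copy the block into a main-memory buffer if it is not there yet."""
    if address not in buffer_pool:
        buffer_pool[address] = dict(disk[address])
    return buffer_pool[address]


def read_item(item):
    """Step 3 of read_item: copy the item from the buffer to a program variable."""
    return fetch(find_block(item))[item]


def write_item(item, value):
    """Step 3 of write_item: copy the program variable into the buffered block."""
    address = find_block(item)
    fetch(address)[item] = value
    dirty.add(address)


def flush():
    """Step 4 of write_item: store updated blocks back to disk (here, deferred)."""
    for address in dirty:
        disk[address] = dict(buffer_pool[address])
    dirty.clear()


X = read_item("X")
X = X - 3
write_item("X", X)
on_disk_before_flush = disk[0]["X"]   # the disk copy is still the old value
flush()
on_disk_after_flush = disk[0]["X"]    # the disk copy now holds the new value
```

The gap between write_item and flush is the "at some later point in time" case of step 4: until the block is written back, the new value exists only in the main-memory buffer.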
Why Concurrency Control Is Needed
If concurrent execution is uncontrolled, it may lead to problems, such as an inconsistent
database.
1. The Lost Update Problem. This problem occurs when two transactions that access the same
database items have their operations interleaved in a way that makes the value of some
database items incorrect.
Example: Suppose that transactions T1 and T2 are submitted at approximately the same time, and
suppose that their operations are interleaved. Then the final value of item X is incorrect if T2
reads the value of X before T1 changes it in the database, because the updated value resulting
from T1 is lost.
2. The Temporary Update (or Dirty Read) Problem. This problem occurs when one transaction
updates a database item and then the transaction fails for some reason.
Meanwhile, the updated item is accessed (read) by another transaction before it is changed back
to its original value.

3. The Incorrect Summary Problem. If one transaction is calculating an aggregate summary
function on a number of database items while other transactions are updating some of these items,
the aggregate function may calculate some values before they are updated and others after they
are updated.
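
The dirty read and incorrect summary problems can be made concrete with a small single-threaded simulation. The item names, values, and the transactions themselves are assumptions chosen for the illustration; each line stands for one operation of an interleaved schedule.

```python
def dirty_read_demo():
    db = {"X": 90}
    before_image = db["X"]
    db["X"] = before_image - 3   # T1: w1(X), not yet committed
    seen_by_t2 = db["X"]         # T2: r2(X) reads the uncommitted value
    db["X"] = before_image       # T1 fails; X is changed back
    return seen_by_t2, db["X"]


def incorrect_summary_demo():
    db = {"A": 10, "B": 20}      # the correct total is 30 before and after T1
    db["A"] = db["A"] - 5        # T1: moves 5 out of A ...
    total = db["A"]              # T3: reads A after T1 updated it
    total += db["B"]             # T3: reads B before T1 updates it
    db["B"] = db["B"] + 5        # T1: ... and into B
    return total, db["A"] + db["B"]


dirty_value, restored_value = dirty_read_demo()
summary, true_total = incorrect_summary_demo()
```

In the first case T2 has worked with a value (87) that never officially existed in the database; in the second, T3 reports a total that is off by the amount in transit.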
Why Recovery Is Needed
A transaction may fail in the middle of execution for several reasons:
1. A computer failure (system crash). A hardware, software, or network error occurs in the
computer system during transaction execution. Hardware crashes are usually media failures,
for example, main memory failure.

2. A transaction or system error. Some operation in the transaction may cause it to fail, such as
integer overflow or division by zero. Transaction failure may also occur because of erroneous
parameter values or because of a logical programming error.
Additionally, the user may interrupt the transaction during its execution.

3. Local errors or exception conditions detected by the transaction. For example, data for the
transaction may not be found. An exception condition, such as insufficient account balance in a
banking database, may cause a transaction, such as a fund withdrawal, to be canceled.
4. Concurrency control enforcement. The concurrency control method may abort a transaction
because it violates serializability, or to resolve a deadlock among several transactions.
Transactions aborted for these reasons are typically restarted automatically at a later time.

5. Disk failure. Some disk blocks may lose their data because of a read or write malfunction or
because of a disk read/write head crash.

6. Physical problems and catastrophes. This refers to an endless list of problems that includes
power or air-conditioning failure, fire, theft, and so on.
The recovery manager of the DBMS needs to keep track of the following operations:
BEGIN_TRANSACTION. This marks the beginning of transaction execution.
READ or WRITE. These specify read or write operations on the database items that are executed
as part of a transaction.
END_TRANSACTION. This specifies that READ and WRITE transaction operations have
ended and marks the end of transaction execution.
COMMIT_TRANSACTION. This signals a successful end of the transaction so that any
changes (updates) executed by the transaction can be safely committed to the database and
will not be undone.
ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database must be undone.
A transaction goes into an active state immediately after it starts execution, where it can execute
its READ and WRITE operations.
When the transaction ends, it moves to the partially committed state. At this point, the
concurrency control and recovery protocols may perform additional checks to determine whether
the transaction can be committed.
Once these checks are successful, the transaction is said to have reached its commit point and enters
the committed state.
However, a transaction can go to the failed state if one of the checks fails or if the transaction is aborted
during its active state.
The terminated state corresponds to the transaction leaving the system.
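
The state transitions described above can be written down as a small lookup table. This is a sketch only: the operation label TERMINATE and the function name run are hypothetical, introduced here just to drive the transaction out of the system.

```python
# Allowed transitions, keyed by (current state, operation).
# "TERMINATE" is an illustrative label for the transaction leaving the system.
TRANSITIONS = {
    ("active", "READ"): "active",
    ("active", "WRITE"): "active",
    ("active", "END_TRANSACTION"): "partially committed",
    ("active", "ABORT"): "failed",
    ("partially committed", "COMMIT"): "committed",
    ("partially committed", "ABORT"): "failed",
    ("committed", "TERMINATE"): "terminated",
    ("failed", "TERMINATE"): "terminated",
}


def run(operations, state="active"):
    """Follow a sequence of operations and return the resulting state."""
    for op in operations:
        key = (state, op)
        if key not in TRANSITIONS:
            raise ValueError(f"{op} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


committed = run(["READ", "WRITE", "END_TRANSACTION", "COMMIT"])
failed = run(["READ", "ABORT"])
```

Any operation that is not listed for the current state, such as a WRITE after the commit point, raises an error in this sketch.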
The System Log
To be able to recover from failures that affect transactions, the system maintains a log to keep
track of all transaction operations that affect the values of database items.
The following are the types of entries called log records that are written to the log file and the
corresponding action for each log record.

1. [start_transaction, T]. Indicates that transaction T has started execution.

2. [write_item, T, X, old_value, new_value]. Indicates that transaction T has changed the value of
database item X from old_value to new_value.

3. [read_item, T, X]. Indicates that transaction T has read the value of database item X.
4. [commit, T]. Indicates that transaction T has completed successfully, and affirms that its effect
can be committed (recorded permanently) to the database.

5. [abort, T]. Indicates that transaction T has been aborted.
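
As a rough illustration of why write_item records carry old_value, the sketch below undoes a failed transaction by scanning the log backward. The tuple layout mirrors the log records listed above; the function name undo, the in-memory db dictionary, and the sample values are assumptions made for the example.

```python
def undo(log, db, failed):
    """Roll back transaction `failed` by scanning the log backward and
    restoring old_value for each of its write_item records."""
    for record in reversed(log):
        if record[0] == "write_item" and record[1] == failed:
            _, _, item, old_value, _new_value = record
            db[item] = old_value


log = [
    ("start_transaction", "T1"),
    ("read_item", "T1", "X"),
    ("write_item", "T1", "X", 90, 87),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 90, 92),
    ("commit", "T2"),
    ("write_item", "T1", "X", 87, 50),
]
db = {"X": 50, "Y": 92}   # database state when T1 fails

undo(log, db, "T1")
log.append(("abort", "T1"))
```

Scanning backward matters when a transaction wrote the same item more than once: the last restore applied is the oldest before-image, so X returns to its value before T1 started, while the committed update of T2 to Y is left untouched.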


Transactions should possess several properties, often called the ACID properties; they should be
enforced by the concurrency control and recovery methods of the DBMS. The following are the
ACID properties:

1. Atomicity. A transaction is an atomic unit of processing; it should either be performed in its
entirety or not performed at all. The transaction recovery subsystem of a DBMS is responsible for
ensuring atomicity.

2. Consistency preservation. A transaction should be consistency preserving, meaning that if it is
completely executed from beginning to end without interference from other transactions, it should
take the database from one consistent state to another.

3. Isolation. A transaction should appear as though it is being executed in isolation from other
transactions, even though many transactions are executing concurrently.
That is, the execution of a transaction should not be interfered with by any other transactions
executing concurrently.
Isolation is enforced by the concurrency control subsystem of the DBMS. If every transaction does
not make its updates (write operations) visible to other transactions until it is committed, one form
of isolation is enforced that solves the temporary update problem.

4. Durability or permanency. The changes applied to the database by a committed transaction must
persist in the database. These changes must not be lost because of any failure.
Schedules of Transaction
When transactions are executing concurrently in an interleaved fashion, the order of
execution of operations from all the various transactions is known as a schedule (or history).
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the
transactions.
Operations from different transactions can be interleaved in the schedule S. However, for each
transaction Ti that participates in the schedule S, the operations of Ti in S must appear in the
same order in which they occur in Ti.
A shorthand notation for describing a schedule uses the symbols b, r, w, e, c, and a for the
operations begin_transaction, read_item, write_item, end_transaction, commit, and abort,
respectively, and appends as a subscript the transaction id (transaction number) to each operation
in the schedule. For example, r1(X) denotes read_item(X) executed by transaction T1.
Two operations in a schedule are said to conflict if they satisfy all three of the following
conditions:
(1) they belong to different transactions;
(2) they access the same item X; and
(3) at least one of the operations is a write_item(X).

For example, in a schedule Sa such as r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y), the operations
r1(X) and w2(X) conflict, as do the operations r2(X) and w1(X), and the operations w1(X) and w2(X).
However, the operations r1(X) and r2(X) do not conflict, since they are both read operations; the
operations w2(X) and w1(Y) do not conflict because they operate on distinct data items X and Y;
and the operations r1(X) and w1(X) do not conflict because they belong to the same transaction.
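
The three conflict conditions translate directly into a predicate. The following sketch parses the r/w shorthand into (operation, transaction, item) triples and lists every conflicting pair in a schedule; the function names parse, conflicts, and conflicting_pairs are illustrative, and the sample schedule is assumed for the example.

```python
import re
from itertools import combinations


def parse(schedule):
    """Parse shorthand such as 'r1(X); w2(X)' into (op, txn, item) triples."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]


def conflicts(a, b):
    """Apply the three conditions: different transactions, same item,
    and at least one of the two operations is a write."""
    return a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])


def conflicting_pairs(schedule):
    """Return all conflicting pairs, each in the order it occurs in the schedule."""
    ops = parse(schedule)
    return [(a, b) for a, b in combinations(ops, 2) if conflicts(a, b)]


pairs = conflicting_pairs("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)")
# pairs holds: r1(X)/w2(X), r2(X)/w1(X), and w1(X)/w2(X)
```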
Characterizing Schedules Based on Recoverability

For some schedules it is easy to recover from transaction failures. The basic criterion is that
once a transaction T is committed, it should never be necessary to roll back T.

This ensures that the durability property of transactions is not violated.

The schedules that theoretically meet this criterion are called recoverable schedules; those
that do not are called nonrecoverable and hence should not be permitted by the DBMS.

Recoverable schedule: a schedule S is recoverable if no transaction T in S commits until all
transactions T′ that have written some item X that T reads have committed.
[Figure: a nonrecoverable schedule (S1) and a recoverable schedule (S2)]
In a recoverable schedule, cascading rollback (or cascading abort) can still occur: an
uncommitted transaction has to be rolled back because it read an item from a transaction that failed.
Because cascading rollback can be quite time-consuming, since numerous transactions can be
rolled back, it is important to characterize the schedules where this phenomenon is guaranteed
not to occur.
A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the
schedule reads only items that were written by committed transactions.

Finally, there is a third, more restrictive type of schedule, called a strict schedule, in which
transactions can neither read nor write an item X until the last transaction that wrote X has
committed (or aborted). Strict schedules simplify the recovery process.
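
The three definitions can be checked mechanically on small schedules. The sketch below is a simplified classifier, not a production algorithm: schedules are lists of (op, txn, item) tuples with op in r, w, c, a; the function name classify is hypothetical; and an aborted transaction's writes are simply treated as restored, which is an assumption made to keep the example short.

```python
def classify(schedule):
    """Return (recoverable, cascadeless, strict) for a schedule given as a
    list of (op, txn, item) tuples; item is None for 'c' and 'a'."""
    committed = set()
    last_writer = {}   # item -> transaction that wrote the current value
    reads_from = {}    # txn -> set of transactions it has read from
    recoverable = cascadeless = strict = True
    for op, txn, item in schedule:
        if op in ("r", "w"):
            writer = last_writer.get(item)
            other = writer is not None and writer != txn
            if other and writer not in committed:
                strict = False               # touched an uncommitted write
                if op == "r":
                    cascadeless = False      # read an uncommitted write
            if op == "r" and other:
                reads_from.setdefault(txn, set()).add(writer)
            if op == "w":
                last_writer[item] = txn
        elif op == "c":
            # Every transaction this one read from must already be committed.
            if not reads_from.get(txn, set()) <= committed:
                recoverable = False
            committed.add(txn)
        elif op == "a":
            # Simplification: the aborted transaction's writes are restored.
            for it in [i for i, w in last_writer.items() if w == txn]:
                del last_writer[it]
    return recoverable, cascadeless, strict


# T2 reads X from T1 and commits before T1, which later aborts.
nonrecoverable = classify([("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
                           ("w", 2, "X"), ("c", 2, None), ("a", 1, None)])
# Same reads, but T1 commits before T2 does.
recoverable_only = classify([("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
                             ("w", 2, "X"), ("c", 1, None), ("c", 2, None)])
# T2 overwrites T1's uncommitted X without reading it.
cascadeless_not_strict = classify([("w", 1, "X"), ("w", 2, "X"),
                                   ("c", 1, None), ("c", 2, None)])
# T2 touches X only after T1 has committed.
strict_schedule = classify([("w", 1, "X"), ("c", 1, None), ("r", 2, "X"),
                            ("w", 2, "X"), ("c", 2, None)])
```

The four sample schedules show that the classes are nested: every strict schedule is cascadeless, and every cascadeless schedule is recoverable, but not the other way around.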
Characterizing schedules based on serializability
We now characterize the types of schedules that are always considered to be correct when concurrent
transactions are executing. Such schedules are known as serializable schedules.
If no interleaving of operations is permitted, there are only two possible outcomes for two
transactions T1 and T2:
1. Execute all the operations of transaction T1 (in sequence) followed by all the operations of
transaction T2 (in sequence), or
2. Execute all the operations of transaction T2 (in sequence) followed by all the operations of
transaction T1 (in sequence).
Schedules A and B are called serial schedules because the operations of each transaction are
executed consecutively, without any interleaved operations from the other transaction.
The disadvantages of serial schedules are that they:
1. limit concurrency
2. waste CPU time (the CPU sits idle while a transaction waits, for example for I/O)
3. may force smaller transactions to wait a long time behind a long transaction
The concept of serializability of schedules is used to identify which schedules are correct when
transaction executions have interleaving of their operations in the schedules.
Schedules C and D are called nonserial because each sequence interleaves operations from the two
transactions.
Assume that the initial values of database items are X = 90 and Y = 90 and that N = 3 and M = 2.
After executing transactions T1 and T2, we would expect the database values to be X = 89 and
Y = 93, according to the meaning of the transactions. Executing either of the serial schedules A or
B gives the correct results.
Now consider the nonserial schedules C and D. Schedule C gives the results X = 92 and Y = 93, in
which the X value is erroneous, whereas schedule D gives the correct results.
Schedule C gives an erroneous result because of the lost update problem: transaction T2 reads
the value of X before it is changed by transaction T1, so only the effect of T2 on X is reflected
in the database.
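
The arithmetic can be replayed step by step. The sketch below assumes that T1 subtracts N from X and adds N to Y, and that T2 adds M to X; the transaction bodies are not shown in this text, so this is an assumption, chosen because it reproduces the values quoted above. The function names are illustrative.

```python
def serial_t1_then_t2(db, N, M):
    db = dict(db)
    x = db["X"]          # T1: read_item(X)
    db["X"] = x - N      # T1: write_item(X)
    y = db["Y"]          # T1: read_item(Y)
    db["Y"] = y + N      # T1: write_item(Y)
    x = db["X"]          # T2: read_item(X)
    db["X"] = x + M      # T2: write_item(X)
    return db


def lost_update_interleaving(db, N, M):
    db = dict(db)
    x1 = db["X"]         # r1(X)
    x2 = db["X"]         # r2(X): T2 reads X before T1 has written it
    db["X"] = x1 - N     # w1(X)
    y1 = db["Y"]         # r1(Y)
    db["X"] = x2 + M     # w2(X): overwrites T1's update, which is lost
    db["Y"] = y1 + N     # w1(Y)
    return db


start = {"X": 90, "Y": 90}
serial_result = serial_t1_then_t2(start, N=3, M=2)               # X = 89, Y = 93
interleaved_result = lost_update_interleaving(start, N=3, M=2)   # X = 92, Y = 93
```

The interleaved run ends with X = 90 + 2 = 92 instead of 90 - 3 + 2 = 89, because the write by T2 is computed from the value of X that T2 read before T1's update.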
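Conflict serializability
Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same
in both schedules. Recall that two operations in a schedule are said to conflict if they belong to different
transactions, access the same database item, and either both are write_item operations or one is a
write_item and the other a read_item. A schedule S is conflict serializable if it is conflict equivalent to
some serial schedule.

Example S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)

Serial schedule S1 (T1 followed by T2): R1(X) W1(X) R1(Y) W1(Y) R2(X) W2(X)

The conflicting pairs in S, in the order in which they occur, are:
1. R1(X) W2(X)
2. R2(X) W1(X)
3. W1(X) W2(X)
Pair 2 appears in the opposite order in S1, so S is not conflict equivalent to S1. S is also not conflict
equivalent to the serial schedule T2 followed by T1, where pairs 1 and 3 would be reversed. Hence S is
NOT conflict serializable.
If we switch R2(X) and W1(X) in S, so that W1(X) comes before R2(X), the resulting schedule is conflict
serializable (it is conflict equivalent to S1).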
Testing for Conflict Serializability of a Schedule
We can check for conflict serializability by drawing a precedence graph. Let S be a schedule;
construct a directed graph G = (V, E), known as the precedence graph, where V is a set of
vertices (one per transaction) and E is a set of directed edges.
The algorithm is:
1. Create a node for each transaction.
2. Create a directed edge Ti → Tj if Tj reads an item after it has been written by Ti.
3. Create a directed edge Ti → Tj if Tj writes an item after it has been read by Ti.
4. Create a directed edge Ti → Tj if Tj writes an item after Ti writes it.
A schedule is conflict serializable if and only if its precedence graph is acyclic (has no cycle).
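
The precedence graph test is short enough to sketch in full. The representation of a schedule as (op, txn, item) tuples and the function names below are assumptions for the illustration; rules 2 to 4 collapse into a single check that two operations conflict and that the operation of Ti comes first.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) for a list of (op, txn, item) tuples."""
    edges = set()
    for i, (op_i, t_i, x_i) in enumerate(schedule):
        for op_j, t_j, x_j in schedule[i + 1:]:
            # Rules 2-4: different transactions, same item, at least one write.
            if t_i != t_j and x_i == x_j and "w" in (op_i, op_j):
                edges.add((t_i, t_j))
    return edges


def has_cycle(nodes, edges):
    """Depth-first search; a node met again while still in progress is a cycle."""
    adjacent = {n: [] for n in nodes}
    for a, b in edges:
        adjacent[a].append(b)
    state = {n: 0 for n in nodes}   # 0 = unvisited, 1 = in progress, 2 = done

    def visit(n):
        state[n] = 1
        for m in adjacent[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


def is_conflict_serializable(schedule):
    nodes = {txn for _, txn, _ in schedule}   # rule 1: one node per transaction
    return not has_cycle(nodes, precedence_graph(schedule))


# R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
S = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
     ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
# The same schedule with R2(X) and W1(X) switched.
S_switched = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
              ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
```

For S the graph contains both T1 → T2 and T2 → T1, which is a cycle; for the switched schedule only T1 → T2 remains, so it is conflict equivalent to the serial order T1, T2.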
Example 1
Check whether the schedule is conflict serializable.
[Figure: precedence graph with nodes T1, T2, and T3 and edges labeled X, Y, and Z]
There is no cycle in this graph, so the schedule is conflict serializable.


Example 2
Check whether this schedule is conflict serializable.
Solution:
[Figure: precedence graph with nodes T1, T2, and T3 and all edges labeled X]
There is a cycle in this graph, so the schedule is not conflict serializable.


Transaction Support in SQL

With SQL, there is no explicit Begin_Transaction statement.

Transaction initiation is done implicitly when particular SQL statements are encountered.
However, every transaction must have an explicit end statement, which is either a COMMIT or a
ROLLBACK.
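
Implicit transaction start with an explicit COMMIT or ROLLBACK can be observed from Python's standard sqlite3 module, whose default mode opens a transaction implicitly before a data-modifying statement. This is only a sketch: the account table, the withdraw function, and the amounts are assumptions made for the example, and SQLite is used simply because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Withdraw funds; roll the transaction back if the balance would go negative."""
    # No explicit begin statement: the UPDATE implicitly starts a transaction.
    conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                 (amount, account_id))
    (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                              (account_id,)).fetchone()
    if balance < 0:
        conn.rollback()   # explicit end: undo the update
        return False
    conn.commit()         # explicit end: make the update permanent
    return True


ok_small = withdraw(conn, 1, 30)     # succeeds and is committed
ok_large = withdraw(conn, 1, 500)    # insufficient balance, rolled back
final_balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

The second withdrawal is an instance of a local exception condition detected by the transaction itself: its update is undone, so only the first withdrawal is reflected in the final balance.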

Every transaction has certain characteristics attributed to it. These characteristics are specified
by a SET TRANSACTION statement in SQL. The characteristics are the access mode, the
diagnostic area size, and the isolation level.

1. The access mode can be specified as READ ONLY or READ WRITE.

A mode of READ WRITE allows select, update, insert, delete, and create commands to be
executed. A mode of READ ONLY, as the name implies, is simply for data retrieval.

2. The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n, which
indicates the number of conditions that can be held simultaneously in the diagnostic area. These
conditions supply feedback information (errors or exceptions) to the user or program on the n
most recently executed SQL statements.
3. The isolation level option is specified using the statement ISOLATION LEVEL <isolation>,
where the value for <isolation> can be READ UNCOMMITTED, READ COMMITTED,
REPEATABLE READ, or SERIALIZABLE.
If a transaction executes at a lower isolation level than SERIALIZABLE, then one or more of the
following three violations may occur:

1. Dirty read. A transaction T1 may read the update of a transaction T2, which has not yet committed.
If T2 fails and is aborted, then T1 would have read a value that does not exist and is incorrect.
2. Nonrepeatable read. A transaction T1 may read a given value from a table. If another transaction
T2 later updates that value and T1 reads that value again, T1 will see a different value.
3. Phantoms. A transaction T1 may read a set of rows from a table, perhaps based on some
condition specified in the SQL WHERE-clause. Now suppose that a transaction T2 inserts a new row
that also satisfies the WHERE-clause condition used in T1, into the table used by T1. If T1 is
repeated, then T1 will see a phantom, a row that previously did not exist.
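
The relationship between the four isolation levels and the three violations, as defined in the SQL standard, fits in a small lookup table. The dictionary and the helper function below are illustrative names; actual DBMS products may implement a given level more strictly than the standard requires.

```python
# Violations that remain possible at each SQL isolation level,
# listed from the weakest level to the strongest.
POSSIBLE_VIOLATIONS = {
    "READ UNCOMMITTED": {"dirty read", "nonrepeatable read", "phantom"},
    "READ COMMITTED": {"nonrepeatable read", "phantom"},
    "REPEATABLE READ": {"phantom"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(violations):
    """Return the weakest isolation level at which none of the given
    violations can occur."""
    for level, possible in POSSIBLE_VIOLATIONS.items():
        if not (possible & set(violations)):
            return level
    raise ValueError("unknown violation")


level_for_dirty_reads = weakest_level_preventing({"dirty read"})
level_for_phantoms = weakest_level_preventing({"phantom"})
```

Reading the table top to bottom, each stronger level removes one more violation, which is why only SERIALIZABLE rules out phantoms.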
Concurrency Control Techniques

One important set of protocols, known as two-phase locking protocols, employs the technique of
locking data items to prevent multiple transactions from accessing the items concurrently.
Locking protocols are used in most commercial DBMSs.
Another set of concurrency control protocols uses timestamps. A timestamp is a unique
identifier for each transaction, generated by the system. Timestamp values are generated in the
same order as the transaction start times.
A lock is a variable associated with a data item that describes the status of the item with respect
to possible operations that can be applied to it. Generally, there is one lock for each data item in
the database.
Binary Locks. A binary lock can have two states or values: locked and unlocked (or 1 and 0, for
simplicity).
A distinct lock is associated with each database item X. If the value of the lock on X is 1, item X
cannot be accessed by a database operation that requests the item.
If the value of the lock on X is 0, the item can be accessed when requested, and the lock value is
changed to 1. We refer to the current value (or state) of the lock associated with item X as lock(X).
Two operations, lock_item and unlock_item, are used with binary locking. Each of them must be
implemented as an indivisible unit (known as a critical section in operating systems); that is, no
interleaving should be allowed once a lock or unlock operation is started until the operation
terminates or the transaction waits.
The wait command within the lock_item(X) operation is usually implemented by putting the
transaction in a waiting queue for item X until X is unlocked and the transaction can be granted
access to it.
If the simple binary locking scheme described here is used, every transaction must
obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X) or
write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and
write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on
item X.
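
A binary lock table with a waiting queue per item might look like the sketch below. The class name BinaryLockManager and its return conventions are assumptions for the illustration; the simulation is single-threaded, so each call is trivially indivisible, whereas a real lock manager would have to protect these operations as critical sections.

```python
from collections import deque


class BinaryLockManager:
    def __init__(self):
        self.holder = {}    # item -> transaction holding the lock (lock(X) = 1)
        self.waiting = {}   # item -> queue of transactions waiting for the item

    def lock_item(self, txn, item):
        """Grant the lock and return True, or queue the transaction and
        return False if the item is already locked by another transaction."""
        if self.holder.get(item) == txn:
            raise RuntimeError("rule 3: transaction already holds this lock")
        if item not in self.holder:          # lock(X) = 0
            self.holder[item] = txn          # set lock(X) to 1
            return True
        self.waiting.setdefault(item, deque()).append(txn)
        return False

    def unlock_item(self, txn, item):
        """Release the lock and hand it to the next waiting transaction, if any.
        Returns that transaction, or None if nobody was waiting."""
        if self.holder.get(item) != txn:
            raise RuntimeError("rule 4: transaction does not hold this lock")
        del self.holder[item]                # set lock(X) to 0
        queue = self.waiting.get(item)
        if queue:
            next_txn = queue.popleft()
            self.holder[item] = next_txn
            return next_txn
        return None


manager = BinaryLockManager()
t1_granted = manager.lock_item("T1", "X")    # lock(X) was 0, so T1 gets it
t2_granted = manager.lock_item("T2", "X")    # T2 must wait in the queue
woken = manager.unlock_item("T1", "X")       # T2 is taken off the queue
```

Rules 1 and 2 are obligations on the transaction itself (lock before access, unlock after the last access), so they are not enforced by the lock table in this sketch.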
The preceding binary locking scheme is too restrictive for database items because at most one
transaction can hold a lock on a given item.
We should allow several transactions to access the same item X if they all access X for reading
purposes only. This is because read operations on the same item by different transactions are not
conflicting.
However, if a transaction is to write an item X, it must have exclusive access to X. For this
purpose, a different type of lock called a multiple-mode lock is used.
Data items can be locked in two modes:
1. Exclusive (X) mode. The data item can be both read and written. An X-lock is requested using
the lock-X instruction.
2. Shared (S) mode. The data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager by the programmer. A transaction can
proceed only after its request is granted.
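
Whether a requested lock can be granted depends on the modes already held on the item by other transactions. A minimal sketch of that compatibility check follows; the names COMPATIBLE and can_grant are illustrative.

```python
# (mode already held by another transaction, requested mode) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,    # several readers may share an item
    ("S", "X"): False,   # a writer must wait for readers
    ("X", "S"): False,   # a reader must wait for the writer
    ("X", "X"): False,   # only one writer at a time
}


def can_grant(requested, held_modes):
    """A request is granted only if it is compatible with every lock that
    other transactions currently hold on the item."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


shared_with_readers = can_grant("S", ["S", "S"])     # granted
exclusive_with_reader = can_grant("X", ["S"])        # must wait
exclusive_on_free_item = can_grant("X", [])          # granted
```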

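Two-Phase Locking (2PL) Protocol

A transaction follows the two-phase locking protocol if all of its locking operations precede its
first unlock operation. Such a transaction can be divided into two phases:

Phase 1: Growing phase (the transaction can acquire new locks, but no existing lock can be released)
◦ Transaction may obtain locks
◦ Transaction may not release locks

Phase 2: Shrinking phase (existing locks can be released, but no new locks can be acquired)
◦ Transaction may release locks
◦ Transaction may not obtain locks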
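
Checking whether a single transaction obeys the two phases only requires remembering whether it has released a lock yet. In the sketch below a transaction is assumed to be a list of (operation, item) pairs using the labels lock-S, lock-X, and unlock; the function name follows_2pl is illustrative.

```python
def follows_2pl(operations):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False
    for op, _item in operations:
        if op == "unlock":
            shrinking = True            # the shrinking phase has begun
        elif op in ("lock-S", "lock-X") and shrinking:
            return False                # a new lock after a release breaks 2PL
    return True


two_phase = follows_2pl([
    ("lock-S", "Y"), ("read", "Y"), ("lock-X", "X"),
    ("unlock", "Y"), ("write", "X"), ("unlock", "X"),
])
not_two_phase = follows_2pl([
    ("lock-S", "Y"), ("read", "Y"), ("unlock", "Y"),
    ("lock-X", "X"), ("write", "X"), ("unlock", "X"),
])
```

The second transaction releases Y before it locks X, so another transaction could change Y in between; this is exactly the interleaving that the protocol rules out.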
