Chapter 4 Transaction
Multiple users can access databases and use computer systems simultaneously because of the concept
of multiprogramming, which allows the operating system of the computer to execute multiple
programs or processes at the same time.
A data item can be a database record, but it can also be a larger unit such as a whole disk block, or
even a smaller unit such as an individual field (attribute) value of some record in the database.
The basic database access operations that a transaction can include are as follows:
■ read_item(X): Reads a database item named X into a program variable. To simplify our notation,
we assume that the program variable is also named X.
■ write_item(X): Writes the value of program variable X into the database item named X.
The DBMS will maintain in the database cache a number of data buffers in main memory. Each
buffer typically holds the contents of one database disk block, which contains some of the database
items being processed.
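The system also keeps track of when each transaction starts, terminates, and commits or aborts, using the following transaction operations:
■ BEGIN_TRANSACTION: This marks the beginning of transaction execution.
■ READ or WRITE: These specify read or write operations on the database items that are executed
as part of a transaction.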
■ END_TRANSACTION: This specifies that READ and WRITE transaction operations have ended
and marks the end of transaction execution. However, at this point it may be necessary to check
whether the changes introduced by the transaction can be permanently applied to the database
(committed) or whether the transaction has to be aborted because it violates serializability or for some
other reason.
■ COMMIT_TRANSACTION: This signals a successful end of the transaction so that any changes
(updates) executed by the transaction can be safely committed to the database and will not be undone.
■ ROLLBACK (or ABORT): This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database must be undone.
Fig 4.1: State transition diagram illustrating the states for transaction execution
As shown in Fig 4.1, a transaction enters the active state immediately after it starts
execution, where it can execute its READ and WRITE operations. When the transaction ends, it
moves to the partially committed state. At this point, some recovery protocols need to ensure that a
system failure will not result in an inability to record the changes of the transaction permanently
(usually by recording changes in the system log). Once this check is successful, the transaction is said
to have reached its commit point and enters the committed state.
A transaction T reaches its commit point when all its operations that access the database have been
executed successfully and the effect of all the transaction operations on the database has been
recorded in the log. Beyond the commit point, the transaction is said to be committed, and its effect
must be permanently recorded in the database. The transaction then writes a commit record [commit,
T] into the log.
However, a transaction can go to the failed state if one of the checks fails or if the transaction is
aborted during its active state. The transaction may then have to be rolled back to undo the effect of
its WRITE operations on the database. The terminated state corresponds to the transaction leaving the
system.
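The states and transitions described above can be sketched as a small state machine. This is a minimal illustration that assumes hypothetical event names (`read_or_write`, `end_transaction`, `commit`, `abort`, `terminate`). A real DBMS tracks these states internally rather than through a lookup table like this one.

```python
# Transaction states and the events that move between them, following the
# state transition diagram described in the text. Event names are hypothetical.
TRANSITIONS = {
    ("active", "read_or_write"): "active",
    ("active", "end_transaction"): "partially_committed",
    ("active", "abort"): "failed",
    ("partially_committed", "commit"): "committed",  # commit point reached
    ("partially_committed", "abort"): "failed",      # a recovery check failed
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",           # after rollback
}


def run(events):
    """Replay events for one transaction and return the state it ends in."""
    state = "active"  # a transaction is active as soon as it starts
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[(state, event)]
    return state


print(run(["read_or_write", "end_transaction", "commit", "terminate"]))  # terminated
print(run(["read_or_write", "abort", "terminate"]))                      # terminated
```

Any event missing from the table raises an error. For example, committing directly from the active state is rejected, which mirrors the rule that a transaction must pass through the partially committed state before it commits.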
ACID stands for Atomicity, Consistency, Isolation and Durability. Let us look at what each of these
properties states.
A: Atomicity states that every transaction should be atomic in nature. A transaction in a relational
database can contain either a single SQL statement or multiple SQL statements. An atomic
transaction is therefore “all or none”. Either all SQL statements/steps in the transaction execute
successfully, or the transaction fails as a single unit: none of its steps is treated as executed, and
the system is returned to its original state.
For example, suppose account-A and account-B both have a balance of $2000, and you have to transfer
$1000 from account-A to account-B. This involves two steps: first a withdrawal from account-A, and
second a deposit into account-B. Both steps must be treated as a single, atomic unit, so that at the
end account-A has $1000 and account-B has $3000. If the system fails or any error occurs after the
first step, the first step must also be rolled back and the $1000 withdrawn from account-A
re-deposited to it, restoring $2000 in both accounts. There should be no intermediate state in which
account-A has $1000 and account-B still has $2000.
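The transfer can be sketched with Python's built-in `sqlite3` module. The table name `account`, the helper names and the `fail_midway` flag are assumptions made for illustration. Both UPDATE statements belong to one transaction, so `rollback()` undoes the withdrawal if anything fails before `commit()`.

```python
import sqlite3


def make_db():
    """Create the two accounts from the example, each holding $2000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 2000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Withdraw from src and deposit into dst as one all-or-none unit."""
    try:
        # Step 1: withdrawal (sqlite3 implicitly opens a transaction here).
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure after the first step")
        # Step 2: deposit.
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.commit()    # both steps become permanent together
    except Exception:
        conn.rollback()  # undo step 1 as well
        raise


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


conn = make_db()
try:
    transfer(conn, "A", "B", 1000, fail_midway=True)
except RuntimeError:
    pass
print(balances(conn))  # {'A': 2000, 'B': 2000}: the withdrawal was rolled back
transfer(conn, "A", "B", 1000)
print(balances(conn))  # {'A': 1000, 'B': 3000}
```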
C: Consistency states that any transaction executed on a database takes it from one consistent state
to another consistent state. The data finally recorded in the database must be valid according to the
defined rules, constraints, cascades, triggers, etc. If any of these rules is violated, the changes made
by the transaction should be rolled back, returning the system to its earlier consistent state.
For example, suppose the money deposit process has a trigger defined on it. If, during the money
transfer, the trigger fails or a database node fails, the system should automatically roll back the
complete transaction and return to the consistent state that existed before the transaction started.
If everything executes successfully, the system is committed to a new consistent state.
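The same idea can be shown with a declared constraint instead of a trigger, a substitution made here for brevity. In this sketch, the rule `balance >= 0` is an assumed integrity constraint. If the withdrawal would break it, SQLite rejects the statement with `sqlite3.IntegrityError`. The deposit already made in the same transaction is then undone with `rollback()`, which returns the database to its previous consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed integrity rule for the example: no account may go below zero.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 2000), ("B", 2000)])
conn.commit()


def transfer(src, dst, amount):
    """Deposit first, then withdraw; commit only if every rule still holds."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the withdrawal. Roll back the whole
        # transaction, including the deposit that already succeeded.
        conn.rollback()
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


print(transfer("A", "B", 5000), balances())  # False {'A': 2000, 'B': 2000}
print(transfer("A", "B", 500), balances())   # True {'A': 1500, 'B': 2500}
```

On its own, SQLite only undoes the single statement that failed. The explicit `rollback()` is what removes the earlier deposit and restores the consistent state.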
I: Isolation means that concurrently executing transactions should not interfere with one another.
Although the DBMS interleaves the operations of transactions to provide more concurrency, each
transaction should behave as if it were running alone. This prevents problems such as dirty reads and
lost updates. Proper transaction isolation levels and locking are needed to enforce this.
For example, suppose two people access a joint account with a $5000 balance from two terminals to
withdraw money. Say that at the same time, John and Mary each request a $4000 withdrawal from two
different ATMs. If the two transactions do not run in isolation and their operations interleave, both
John and Mary may be able to withdraw $4000, i.e. $8000 in total from an account holding $5000. To
make sure this cannot happen, the transactions must not be allowed to interfere with each other. This
is enforced by setting transaction isolation levels and/or locking the database objects.
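The John and Mary scenario can be reproduced with a toy simulation (not a DBMS). Each withdrawal is split into a read step and a write step. The `run` helper and its schedule format are assumptions for illustration. When the steps interleave, both withdrawals check the same stale balance. When the transactions run one after the other, which is the effect isolation is meant to guarantee, the second withdrawal is refused.

```python
def run(schedule, start=5000, amount=4000):
    """Execute the read/write steps of withdrawal transactions in the given order.

    schedule: list of (person, step) pairs with step "read" or "write".
    Returns (final_balance, total_paid_out).
    """
    balance = start
    seen = {}     # each transaction's local copy of the balance it read
    paid_out = 0
    for person, step in schedule:
        if step == "read":
            seen[person] = balance             # read_item(balance)
        elif seen[person] >= amount:           # the check uses the value read earlier
            balance = seen[person] - amount    # write_item(balance)
            paid_out += amount
    return balance, paid_out


interleaved = [("John", "read"), ("Mary", "read"), ("John", "write"), ("Mary", "write")]
one_after_other = [("John", "read"), ("John", "write"), ("Mary", "read"), ("Mary", "write")]
print(run(interleaved))      # (1000, 8000): $8000 paid out of a $5000 account
print(run(one_after_other))  # (1000, 4000): Mary's withdrawal is refused
```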
D: Durability means that once a transaction is committed, its data is stored permanently and remains
available after a power failure, a system failure or crash, or any other error. In short, committed
data should never be lost in any mishap, and it should be possible to recover it using backups,
logging and other recovery methods.
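A rough way to observe durability is to commit to an on-disk SQLite database, close the connection, and reopen the file as a stand-in for a restart. Closing a connection is not a real crash, so this only illustrates the idea. The file name `bank.db` is an assumption. The committed value survives, while the update that was never committed does not, because Python's `sqlite3` does not commit pending changes when a connection is closed.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 1000)")
conn.commit()  # committed: this change is now stored in the database file

conn.execute("UPDATE account SET balance = 0 WHERE name = 'A'")
conn.close()   # closed without commit: the pending update is discarded

# Simulated restart: a fresh connection sees only committed data.
conn = sqlite3.connect(path)
row = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()
conn.close()
print(row)  # (1000,)
```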
A schedule (or history) S of n transactions T1, T2, …, Tn is an ordering of the operations of the
transactions. Operations from different transactions can be interleaved in the schedule S. However, for
each transaction Ti that participates in the schedule S, the operations of Ti in S must appear in the
same order in which they occur in Ti. The order of operations in S is considered to be a total ordering,
meaning that for any two operations in the schedule, one must occur before the other.
For the purpose of recovery and concurrency control, we are mainly interested in the read_item and
write_item operations of the transactions, as well as the commit and abort operations. A shorthand
notation for describing a schedule uses the symbols b, r, w, e, c, and a for the operations
begin_transaction, read_item, write_item, end_transaction, commit, and abort, respectively, and
appends as a subscript the transaction id (transaction number) to each operation in the schedule.
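The shorthand can be converted into data with a small parser. This sketch assumes operations are written as `r1(X)`, `w2(Y)`, `c1` and so on, separated by semicolons. That is a common textbook convention, not something fixed by the text above. Each operation becomes a `(kind, transaction, item)` tuple, with `None` as the item for b, e, c and a.

```python
import re

# A letter for the operation, the transaction number, and for r/w an item in parentheses.
OP = re.compile(r"([brweca])(\d+)(?:\((\w+)\))?")


def parse(schedule):
    """Parse e.g. 'r1(X); w1(X); c1' into [('r', 1, 'X'), ('w', 1, 'X'), ('c', 1, None)]."""
    ops = []
    for token in schedule.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing semicolon
        m = OP.fullmatch(token)
        if m is None:
            raise ValueError(f"cannot parse {token!r}")
        kind, txn, item = m.group(1), int(m.group(2)), m.group(3)
        if (kind in "rw") != (item is not None):
            raise ValueError(f"{token!r}: only r and w take an item")
        ops.append((kind, txn, item))
    return ops


print(parse("b1; r1(X); w1(X); e1; c1"))
```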
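Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:
1. They belong to different transactions.
2. They access the same item X.
3. At least one of the operations is a write_item(X).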
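The three conditions (different transactions, same item, at least one write) translate directly into a predicate. In this sketch each operation is a `(kind, transaction, item)` tuple such as `('w', 1, 'X')`, with item `None` for commit and abort. The function names are illustrative.

```python
from itertools import combinations


def conflict(op1, op2):
    """True if two operations conflict under the three conditions."""
    kind1, txn1, item1 = op1
    kind2, txn2, item2 = op2
    return (
        txn1 != txn2                              # 1. different transactions
        and item1 is not None and item1 == item2  # 2. same item X
        and "w" in (kind1, kind2)                 # 3. at least one write_item(X)
    )


def conflicting_pairs(schedule):
    """Index pairs (i, j), i < j, of conflicting operations in a schedule."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(schedule), 2)
        if conflict(a, b)
    ]


s = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("w", 2, "X"), ("c", 1, None), ("c", 2, None)]
print(conflicting_pairs(s))  # [(0, 3), (1, 2), (2, 3)]
```

Two reads of the same item, such as r1(X) and r2(X), do not conflict, because neither changes X.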
The rest of this section covers some theoretical definitions concerning schedules. A schedule S of n
transactions T1, T2 ... Tn is said to be a complete schedule if the following conditions hold:
1. The operations in S are exactly those operations in T1, T2 ... Tn, including a commit or abort
operation as the last operation for each transaction in the schedule.
2. For any pair of operations from the same transaction Ti, their relative order of appearance
in S is the same as their order of appearance in Ti.
3. For any two conflicting operations, one of the two must occur before the other in the
schedule.
The preceding condition (3) allows for two non-conflicting operations to occur in the schedule
without defining which occurs first, thus leading to the definition of a schedule as a partial order of
the operations in the n transactions.
However, a total order must be specified in the schedule for any pair of conflicting operations
(condition 3) and for any pair of operations from the same transaction (condition 2). Condition 1
simply states that all operations in the transactions must appear in the complete schedule. Since every
transaction has either committed or aborted, a complete schedule will not contain any active
transactions at the end of the schedule.
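Conditions 1 and 2 can be checked mechanically. In this sketch a schedule is a Python list, which already imposes a total order, so condition 3 holds automatically. Each operation is a `(kind, transaction, item)` tuple, and `transactions` maps each transaction number to its operations in program order. The representation and the name `is_complete` are assumptions.

```python
def is_complete(schedule, transactions):
    """Check conditions 1 and 2 for a complete schedule.

    schedule: list of (kind, txn, item) tuples; a list is a total order, so
    condition 3 holds automatically.
    transactions: {txn: [its operations in program order]}.
    """
    for txn, ops in transactions.items():
        # Condition 1: each transaction must end with a commit or an abort.
        if not ops or ops[-1][0] not in ("c", "a"):
            return False
        # Conditions 1 and 2: the operations of txn in the schedule are exactly
        # its own operations, in the same order.
        if [op for op in schedule if op[1] == txn] != ops:
            return False
    # Condition 1: the schedule contains no operations of other transactions.
    return all(op[1] in transactions for op in schedule)


T = {
    1: [("r", 1, "X"), ("w", 1, "X"), ("c", 1, None)],
    2: [("r", 2, "X"), ("a", 2, None)],
}
s = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("a", 2, None), ("c", 1, None)]
print(is_complete(s, T))       # True
print(is_complete(s[:-1], T))  # False: T1 has no commit in the schedule
```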
For some schedules it is easy to recover from transaction and system failures, whereas for other
schedules the recovery process can be quite involved. In some cases, it is even not possible to recover
correctly after a failure. Hence, it is important to characterize the types of schedules for which
recovery is possible, as well as those for which recovery is relatively simple.
A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written
some item X that T reads have committed. A transaction T reads from transaction T’ in a schedule S if
some item X is first written by T’ and later read by T. In addition, T’ should not have been aborted
before T reads item X, and there should be no transactions that write X after T’ writes it and before T
reads it (unless those transactions, if any, have aborted before T reads X).
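The definition can be checked in one pass over a schedule. This sketch uses `(kind, transaction, item)` tuples. For each item it tracks the non-aborted transactions that wrote it and records which transactions each reader reads from. It then rejects the schedule if a transaction commits before all of those transactions have committed. The function name and representation are assumptions.

```python
def is_recoverable(schedule):
    """True if no transaction commits before every transaction it read from has committed.

    schedule: list of (kind, txn, item) tuples with kind in {'r', 'w', 'c', 'a'}.
    """
    writers = {}      # item -> transactions that wrote it, oldest first (aborted ones removed)
    sources = {}      # txn -> transactions it has read from
    committed = set()
    for kind, txn, item in schedule:
        if kind == "w":
            writers.setdefault(item, []).append(txn)
        elif kind == "r":
            live = writers.get(item, [])
            if live and live[-1] != txn:
                # txn reads from the latest non-aborted writer of item.
                sources.setdefault(txn, set()).add(live[-1])
        elif kind == "a":
            # Writes by an aborted transaction no longer count as sources.
            for item_writers in writers.values():
                item_writers[:] = [t for t in item_writers if t != txn]
        elif kind == "c":
            if not sources.get(txn, set()) <= committed:
                return False  # txn commits before a transaction it read from
            committed.add(txn)
    return True


# T2 reads X written by T1 and commits while T1 is still active.
s_bad = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
         ("w", 2, "X"), ("c", 2, None), ("a", 1, None)]
# Same reads and writes, but T1 commits before T2 does.
s_ok = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
        ("w", 2, "X"), ("w", 1, "Y"), ("c", 1, None), ("c", 2, None)]
print(is_recoverable(s_bad), is_recoverable(s_ok))  # False True
```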
In a recoverable schedule, no committed transaction ever needs to be rolled back, and so the definition
of committed transaction as durable is not violated. However, it is possible for a phenomenon known
as cascading rollback (or cascading abort) to occur in some recoverable schedules, where an
uncommitted transaction has to be rolled back because it read an item from a transaction that failed.
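Cascading rollback follows the reads-from relationship transitively. If T1 fails, every transaction that read from T1 must be rolled back, then every transaction that read from those, and so on. This sketch computes that set for a schedule that contains no aborts yet. The representation and names are assumptions. If a transaction that must be rolled back has already committed, the sketch reports that the schedule was not recoverable.

```python
def cascade(schedule, failed):
    """Transactions that must be rolled back, directly or transitively, when `failed` aborts.

    schedule: list of (kind, txn, item) tuples with kind in {'r', 'w', 'c'}.
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    readers_of = {}    # writer -> transactions that read an item it wrote
    committed = set()
    for kind, txn, item in schedule:
        if kind == "w":
            last_writer[item] = txn
        elif kind == "r" and last_writer.get(item, txn) != txn:
            readers_of.setdefault(last_writer[item], set()).add(txn)
        elif kind == "c":
            committed.add(txn)

    victims, stack = set(), [failed]
    while stack:
        for reader in readers_of.get(stack.pop(), ()):
            if reader != failed and reader not in victims:
                victims.add(reader)
                stack.append(reader)
    if victims & committed:
        # A committed transaction would need undoing: the schedule was not recoverable.
        raise ValueError(f"committed transactions {sorted(victims & committed)} read dirty data")
    return victims


s = [("w", 1, "X"), ("r", 2, "X"), ("w", 2, "Y"), ("r", 3, "Y"), ("r", 4, "Z")]
print(cascade(s, 1))  # {2, 3}
```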
In the previous section, we characterized schedules based on their recoverability properties. Now we
characterize the types of schedules that are always considered to be correct when concurrent
transactions are executing. Such schedules are known as serializable schedules. Suppose that two
users, for example two airline reservation agents, submit transactions T1 and T2 to the DBMS at
approximately the same time. If no interleaving of operations is permitted, there are only two possible
outcomes:
1. Execute all the operations of transaction T1 (in sequence) followed by all the operations of
transaction T2 (in sequence).
2. Execute all the operations of transaction T2 (in sequence) followed by all the operations of
transaction T1 (in sequence).
Formally, a schedule S is serial if, for every transaction T participating in the schedule, all the
operations of T are executed consecutively in the schedule; otherwise, the schedule is called non-
serial. Therefore, in a serial schedule, only one transaction at a time is active: the commit (or abort) of
the active transaction initiates execution of the next transaction. No interleaving occurs in a serial
schedule. One reasonable assumption we can make, if we consider the transactions to be independent,
is that every serial schedule is considered correct.
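Whether a schedule is serial can be tested by checking that no transaction reappears after another transaction has started running. The tuple representation and the function name are assumptions of this sketch.

```python
def is_serial(schedule):
    """True if the operations of each transaction appear consecutively.

    schedule: list of (kind, txn, item) tuples, e.g. ('r', 1, 'X').
    """
    finished = set()  # transactions whose run of operations has ended
    current = None
    for _, txn, _ in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn resumes after another transaction ran: interleaving
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial = [("r", 1, "X"), ("w", 1, "X"), ("c", 1, None), ("r", 2, "X"), ("w", 2, "X"), ("c", 2, None)]
interleaved = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("w", 2, "X"), ("c", 1, None), ("c", 2, None)]
print(is_serial(serial), is_serial(interleaved))  # True False
```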
The definition of conflict equivalence of schedules is as follows: Two schedules are said to be conflict
equivalent if the order of any two conflicting operations is the same in both schedules.
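Conflict equivalence can be tested by collecting, for each schedule, the set of conflicting pairs in the order they occur and then comparing the sets. This sketch assumes both schedules use the `(kind, transaction, item)` tuple representation and that no transaction performs an identical operation twice, so each tuple identifies exactly one operation. The names are illustrative.

```python
from collections import Counter
from itertools import combinations


def conflict(a, b):
    """Different transactions, same item, at least one write."""
    return a[1] != b[1] and a[2] is not None and a[2] == b[2] and "w" in (a[0], b[0])


def ordered_conflicts(schedule):
    """Set of (earlier, later) pairs of conflicting operations in the schedule."""
    return {(a, b) for a, b in combinations(schedule, 2) if conflict(a, b)}


def conflict_equivalent(s1, s2):
    """True if both schedules have the same operations and order every conflicting pair alike."""
    if Counter(s1) != Counter(s2):
        return False
    return ordered_conflicts(s1) == ordered_conflicts(s2)


r1x, w1x, r1y, w1y = ("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y")
r2x, w2x = ("r", 2, "X"), ("w", 2, "X")
serial = [r1x, w1x, r1y, w1y, r2x, w2x]   # T1 then T2
s_a = [r1x, w1x, r2x, w2x, r1y, w1y]      # X conflicts in the same order
s_b = [r1x, r2x, w1x, w2x, r1y, w1y]      # r2(X) now precedes w1(X)
print(conflict_equivalent(serial, s_a), conflict_equivalent(serial, s_b))  # True False
```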
With SQL, there is no explicit Begin_Transaction statement. Transaction initiation is done implicitly
when particular SQL statements are encountered. However, every transaction must have an explicit
end statement, which is either a COMMIT or a ROLLBACK. Every transaction has certain
characteristics attributed to it. These characteristics are specified by a SET TRANSACTION
statement in SQL. The characteristics are the access mode, the diagnostic area size, and the isolation
level.
The access mode can be specified as READ ONLY or READ WRITE. The default is READ WRITE,
unless the isolation level of READ UNCOMMITTED is specified, in which case READ ONLY is
assumed. A mode of READ WRITE allows select, update, insert, delete, and create commands to be
executed. A mode of READ ONLY, as the name implies, is simply for data retrieval.
The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n, which indicates
the number of conditions that can be held simultaneously in the diagnostic area. These conditions
supply feedback information (errors or exceptions) to the user or program about the most recently
executed SQL statement.
The isolation level option is specified using the statement ISOLATION LEVEL <isolation>, where
the value for <isolation> can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE
READ, or SERIALIZABLE.
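The access mode and isolation level can be combined into a single statement. This hypothetical Python helper only builds the SQL text and applies the defaults described above. It leaves out the diagnostic area size, separates the transaction modes with commas as in the SQL standard's syntax, and does not check whether a particular DBMS accepts the result.

```python
ISOLATION_LEVELS = ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE")
ACCESS_MODES = ("READ ONLY", "READ WRITE")


def set_transaction(isolation="SERIALIZABLE", access_mode=None):
    """Build a SET TRANSACTION statement (hypothetical helper, defaults as in the text)."""
    if isolation not in ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {isolation!r}")
    if access_mode is None:
        # READ WRITE by default, but READ ONLY is assumed under READ UNCOMMITTED.
        access_mode = "READ ONLY" if isolation == "READ UNCOMMITTED" else "READ WRITE"
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode!r}")
    return f"SET TRANSACTION {access_mode}, ISOLATION LEVEL {isolation};"


print(set_transaction())
# SET TRANSACTION READ WRITE, ISOLATION LEVEL SERIALIZABLE;
print(set_transaction("READ UNCOMMITTED"))
# SET TRANSACTION READ ONLY, ISOLATION LEVEL READ UNCOMMITTED;
```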
The default isolation level is SERIALIZABLE, although some systems use READ COMMITTED as
their default. The use of the term SERIALIZABLE here is based on not allowing violations that cause
dirty read, unrepeatable read, and phantoms, and it is thus not identical to the way serializability was
defined earlier. If a transaction executes at a lower isolation level than SERIALIZABLE, then one or
more of the following three violations may occur:
Dirty read: A transaction T1 may read the update of a transaction T2, which has not yet
committed. If T2 fails and is aborted, then T1 would have read a value that does not exist
and is incorrect.
Nonrepeatable read: A transaction T1 may read a given value from a table. If another
transaction T2 later updates that value and T1 reads that value again, T1 will see a different
value.
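Phantoms: A transaction T1 may read a set of rows from a table, perhaps based on some
condition specified in the SQL WHERE-clause. Now suppose that a transaction T2 inserts into
that table a new row r that also satisfies the WHERE-clause condition used in T1. If T1 repeats
its read, it sees the row r, which did not exist before; r is called a phantom.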
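The three violations can be reproduced with a toy in-memory store that does no locking at all. `ToyDB` and its methods are hypothetical. Committed values live in one dictionary and uncommitted writes are kept per transaction. `read(..., dirty=True)` imitates READ UNCOMMITTED by also looking at other transactions' pending writes, and `select` sees committed rows only.

```python
class ToyDB:
    """Committed values plus each transaction's uncommitted writes; no locking."""

    def __init__(self, rows):
        self.committed = dict(rows)  # key -> committed value
        self.pending = {}            # txn -> {key: uncommitted value}

    def write(self, txn, key, value):
        self.pending.setdefault(txn, {})[key] = value

    def read(self, txn, key, dirty=False):
        own = self.pending.get(txn, {})
        if key in own:
            return own[key]  # a transaction sees its own writes
        if dirty:
            # READ UNCOMMITTED behaviour: other transactions' pending writes are visible.
            for changes in self.pending.values():
                if key in changes:
                    return changes[key]
        return self.committed.get(key)

    def select(self, predicate):
        """Keys of committed rows whose value satisfies predicate (a WHERE-clause stand-in)."""
        return sorted(k for k, v in self.committed.items() if predicate(v))

    def commit(self, txn):
        self.committed.update(self.pending.pop(txn, {}))

    def abort(self, txn):
        self.pending.pop(txn, None)


# Dirty read: T1 reads T2's uncommitted update, then T2 aborts.
db = ToyDB({"X": 100})
db.write("T2", "X", 50)
dirty_value = db.read("T1", "X", dirty=True)
db.abort("T2")
print(dirty_value, db.read("T1", "X"))  # 50 100: T1 saw a value that never existed

# Nonrepeatable read: T2 updates X and commits between T1's two reads.
db = ToyDB({"X": 100})
first = db.read("T1", "X")
db.write("T2", "X", 70)
db.commit("T2")
second = db.read("T1", "X")
print(first, second)  # 100 70

# Phantom: T2 inserts a row matching T1's condition (value > 10) and commits.
db = ToyDB({"a": 20, "b": 5})
before = db.select(lambda v: v > 10)
db.write("T2", "c", 30)
db.commit("T2")
after = db.select(lambda v: v > 10)
print(before, after)  # ['a'] ['a', 'c']
```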
The following table summarizes possible violations for the different isolation levels. An entry of Yes
indicates that a violation is possible and an entry of No indicates that it is not possible. READ
UNCOMMITTED is the most forgiving, and SERIALIZABLE is the most restrictive in that it avoids
all three of the problems mentioned above.

Isolation level      Dirty read   Nonrepeatable read   Phantom
READ UNCOMMITTED     Yes          Yes                  Yes
READ COMMITTED       No           Yes                  Yes
REPEATABLE READ      No           No                   Yes
SERIALIZABLE         No           No                   No
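The possible violations per isolation level can also be encoded as data, which makes the "most forgiving to most restrictive" ordering easy to check. Each level permits a subset of the violations permitted by the level before it. The dictionary follows the SQL standard's minimum guarantees; actual systems may prevent more than these. `VIOLATIONS` and `weakest_level_preventing` are illustrative names.

```python
# True means the violation may occur at that isolation level (weakest level first).
VIOLATIONS = {
    "READ UNCOMMITTED": {"dirty read": True, "nonrepeatable read": True, "phantom": True},
    "READ COMMITTED": {"dirty read": False, "nonrepeatable read": True, "phantom": True},
    "REPEATABLE READ": {"dirty read": False, "nonrepeatable read": False, "phantom": True},
    "SERIALIZABLE": {"dirty read": False, "nonrepeatable read": False, "phantom": False},
}


def weakest_level_preventing(*violations):
    """Least restrictive isolation level at which none of the given violations can occur."""
    for level, possible in VIOLATIONS.items():  # dicts keep insertion order
        if not any(possible[v] for v in violations):
            return level
    raise ValueError(f"no isolation level prevents {violations}")


# Each level allows a subset of what the previous (weaker) level allows.
levels = list(VIOLATIONS)
for weaker, stronger in zip(levels, levels[1:]):
    allowed_weak = {v for v, possible in VIOLATIONS[weaker].items() if possible}
    allowed_strong = {v for v, possible in VIOLATIONS[stronger].items() if possible}
    assert allowed_strong <= allowed_weak

print(weakest_level_preventing("dirty read"))                        # READ COMMITTED
print(weakest_level_preventing("dirty read", "nonrepeatable read"))  # REPEATABLE READ
print(weakest_level_preventing("phantom"))                           # SERIALIZABLE
```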