Chapter 4 Transaction

advanced database

Uploaded by

joyeshu7

CHAPTER FOUR

TRANSACTION PROCESSING CONCEPTS


4.1. Introduction to Transaction Processing
The concept of transaction provides a mechanism for describing logical units of database processing.
Transaction processing systems are systems with large databases and hundreds of concurrent users
executing database transactions. Examples of such systems include airline reservations, banking,
credit card processing, online retail purchasing, stock markets, supermarket checkouts, and many
other applications. These systems require high availability and fast response time for hundreds of
concurrent users.

4.1.1. Single-User versus Multiuser Systems


One criterion for classifying a database system is according to the number of users who can use the
system concurrently. A DBMS is single-user if at most one user at a time can use the system, and it is
multiuser if many users can use the system and hence access the database concurrently. Single-user
DBMSs are mostly restricted to personal computer systems; most other DBMSs are multiuser. For
example, an airline reservations system is used by hundreds of travel agents and reservation clerks
concurrently. Database systems used in banks, insurance agencies, stock exchanges, supermarkets,
and many other applications are multiuser systems.

Multiple users can access databases and use computer systems simultaneously because of the concept
of multiprogramming, which allows the operating system of the computer to execute multiple
programs or processes at the same time.

4.1.2. Transactions, Database Items, Read and Write Operations


A transaction is an executing program that forms a logical unit of database processing. A transaction
includes one or more database access operations; these can include insertion, deletion, modification, or
retrieval operations. The database operations that form a transaction can either be embedded within an
application program or they can be specified interactively via a high-level query language such as
SQL. One way of specifying the transaction boundaries is by specifying explicit begin transaction
and end transaction statements in an application program. If the database operations in a transaction
do not update the database but only retrieve data, the transaction is called a read-only transaction;
otherwise it is known as a read-write transaction.

A data item can be a database record, but it can also be a larger unit such as a whole disk block, or
even a smaller unit such as an individual field (attribute) value of some record in the database.

The basic database access operations that a transaction can include are as follows:

■ read_item(X): Reads a database item named X into a program variable. To simplify our notation,
we assume that the program variable is also named X.

■ write_item(X): Writes the value of program variable X into the database item named X.

The DBMS will maintain in the database cache a number of data buffers in main memory. Each
buffer typically holds the contents of one database disk block, which contains some of the database
items being processed.
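
The interaction between read_item, write_item, and the buffers can be sketched as follows. This is a
minimal illustration, not the design of any particular DBMS: the BufferedDatabase class, the
dictionary used as a stand-in for the disk, the disk_reads counter, and the flush method are all
assumptions made for the example.

```python
class BufferedDatabase:
    """Toy model (hypothetical): a 'disk' of blocks plus a main-memory buffer cache."""

    def __init__(self, disk_blocks, item_location):
        self.disk = disk_blocks              # block id -> {item name: value}
        self.item_location = item_location   # item name -> block id
        self.buffers = {}                    # block id -> in-memory copy of the block
        self.disk_reads = 0                  # how many blocks were copied from disk

    def _fetch(self, item):
        """Return the buffer holding `item`, copying its block from disk if needed."""
        block_id = self.item_location[item]
        if block_id not in self.buffers:
            self.buffers[block_id] = dict(self.disk[block_id])
            self.disk_reads += 1
        return self.buffers[block_id]

    def read_item(self, item):
        """Copy the item's value from the buffer into a program variable."""
        return self._fetch(item)[item]

    def write_item(self, item, value):
        """Copy a program variable into the buffer; the disk is not touched yet."""
        self._fetch(item)[item] = value

    def flush(self):
        """Write every buffered block back to the simulated disk."""
        for block_id, block in self.buffers.items():
            self.disk[block_id] = dict(block)


db = BufferedDatabase({0: {"X": 80, "Y": 20}}, {"X": 0, "Y": 0})
X = db.read_item("X")    # read_item(X)
X = X - 5
db.write_item("X", X)    # write_item(X)
```

In this sketch the updated value lives only in the buffer until flush is called, which is why a
failure between the write and the flush is a concern for recovery.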

4.2. Transaction and System Concepts


4.2.1. Transaction States and Additional Operations
A transaction is an atomic unit of work that should either be completed in its entirety or not done at
all. For recovery purposes, the system needs to keep track of when each transaction starts, terminates,
and commits or aborts. Therefore, the recovery manager of the DBMS needs to keep track of the
following operations:

■ BEGIN_TRANSACTION: This marks the beginning of transaction execution.

■ READ or WRITE: These specify read or write operations on the database items that are executed
as part of a transaction.

■ END_TRANSACTION: This specifies that READ and WRITE transaction operations have ended
and marks the end of transaction execution. However, at this point it may be necessary to check
whether the changes introduced by the transaction can be permanently applied to the database
(committed) or whether the transaction has to be aborted because it violates serializability or for some
other reason.

■ COMMIT_TRANSACTION: This signals a successful end of the transaction so that any changes
(updates) executed by the transaction can be safely committed to the database and will not be undone.

■ ROLLBACK (or ABORT): This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database must be undone.

Fig 4.1: State transition diagram illustrating the states for transaction execution
As shown in Fig 4.1, a transaction goes into an active state immediately after it starts
execution, where it can execute its READ and WRITE operations. When the transaction ends, it
moves to the partially committed state. At this point, some recovery protocols need to ensure that a
system failure will not result in an inability to record the changes of the transaction permanently
(usually by recording changes in the system log). Once this check is successful, the transaction is said
to have reached its commit point and enters the committed state.

A transaction T reaches its commit point when all its operations that access the database have been
executed successfully and the effect of all the transaction operations on the database has been
recorded in the log. Beyond the commit point, the transaction is said to be committed, and its effect
must be permanently recorded in the database. The transaction then writes a commit record [commit,
T] into the log.

However, a transaction can go to the failed state if one of the checks fails or if the transaction is
aborted during its active state. The transaction may then have to be rolled back to undo the effect of
its WRITE operations on the database. The terminated state corresponds to the transaction leaving the
system.
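
The state transitions described above can be written down as a small lookup table. This is only a
sketch: the state names follow the text, while the event name "terminate" and the run helper are
assumptions introduced for the example.

```python
# (current state, event) -> next state, following the transitions described in the text.
TRANSITIONS = {
    ("active", "end_transaction"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",
}


def run(events, state="active"):
    """Follow `events` starting from `state`; raise ValueError on an illegal move."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

For instance, run(["end_transaction", "commit", "terminate"]) follows the successful path, whereas
run(["commit"]) is rejected because an active transaction must first reach the partially committed
state.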

4.2.2. The System Log


To be able to recover from failures that affect transactions, the system maintains a log to keep track of
all transaction operations that affect the values of database items, as well as other transaction
information that may be needed to permit recovery from failures. The log is a sequential, append-only
file that is kept on disk, so it is not affected by any type of failure except for disk or catastrophic
failure. Typically, one (or more) main memory buffers hold the last part of the log file, so that log
entries are first added to the main memory buffer. When the log buffer is filled, or when certain other
conditions occur, the log buffer is appended to the end of the log file on disk. In addition, the log file
from disk is periodically backed up to archival storage (tape) to guard against catastrophic failures.
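
The buffering behaviour of the log can be sketched as below. The SystemLog class is hypothetical, and
so are the record layouts other than the commit record; flushing on a commit record is one assumed
example of the "certain other conditions" mentioned above.

```python
class SystemLog:
    """Toy append-only log: records go to a memory buffer first, then to 'disk'."""

    def __init__(self, buffer_capacity=4):
        self.buffer_capacity = buffer_capacity
        self.buffer = []   # last part of the log, held in main memory
        self.disk = []     # stand-in for the sequential log file on disk

    def append(self, record):
        self.buffer.append(record)
        # Flush when the buffer is full, or (assumption) when a commit record arrives.
        if len(self.buffer) >= self.buffer_capacity or record[0] == "commit":
            self.flush()

    def flush(self):
        """Append the buffered records to the end of the log file and empty the buffer."""
        self.disk.extend(self.buffer)
        self.buffer.clear()


log = SystemLog(buffer_capacity=4)
log.append(("start", "T1"))                # assumed record layout
log.append(("write", "T1", "X", 80, 75))   # assumed layout: item, old value, new value
buffered_before_commit = len(log.buffer)
log.append(("commit", "T1"))               # the [commit, T] record from the text
```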

4.3. Properties of Transaction


For a transaction to execute reliably in a database system, it must follow a few rules. These rules
are standard across relational database management systems (RDBMSs) and are known as the ACID
properties.

ACID stands for Atomicity, Consistency, Isolation, and Durability. Each property is described
below.

A: Atomicity states that every transaction must be atomic. A transaction in a relational
database can contain a single SQL statement or several. Atomic means “all or none”: either all SQL
statements (steps) in the transaction execute successfully, or the transaction fails as a single unit,
none of its statements is treated as executed, and the system is returned to its original state.

For example: Suppose account-A and account-B each have a $2000 balance and you have to transfer
$1000 from account-A to account-B. This involves two steps: first a withdrawal from account-A, and
second a deposit into account-B. Both steps must be treated as a single atomic unit, so that at the end
account-A has $1000 and account-B has $3000. If the system fails or an error occurs after the first
step, the first step must also be rolled back: the $1000 withdrawn from account-A is re-deposited,
restoring $2000 in both accounts. There must be no lasting intermediate state in which account-A
has $1000 and account-B still has $2000.
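
The transfer example can be tried out with Python's built-in sqlite3 module. This is a minimal sketch
under stated assumptions: the account table, the transfer and balances helpers, and the
fail_after_withdrawal switch (used to simulate a crash between the two steps) are all invented for
the illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_after_withdrawal=False):
    """Move `amount` from src to dst as one atomic unit; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_after_withdrawal:
                raise RuntimeError("simulated failure after the first step")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 2000), ("B", 2000)])
conn.commit()
```

Calling transfer(conn, "A", "B", 1000, fail_after_withdrawal=True) leaves both balances at 2000
because the withdrawal is rolled back; calling it without the simulated failure leaves 1000 in A and
3000 in B.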

C: Consistency states that any transaction takes the database from one consistent state to another
consistent state. The data finally recorded in the database must be valid according to the defined
rules, constraints, cascades, triggers, etc. If a transaction violates any of these rules, its changes
must be rolled back, which returns the system to its earlier consistent state.

For example: Suppose the money deposit process has a trigger built on top of it. If the trigger
fails during a money transfer, or a database node fails, the system should automatically roll back
the complete transaction and return to the consistent state that existed before the transaction
started. If everything executes successfully, the transaction is committed and the system moves to a
new consistent state.
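
A declared constraint can play the same role as the trigger in this example. The sketch below uses
Python's sqlite3 module and assumes a rule that is not part of the text above, namely that a balance
may never become negative; the table and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed consistency rule for the illustration: balances may not go below zero.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 2000), ("B", 2000)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Deposit first, then withdraw; a rule violation undoes both steps."""
    try:
        with conn:  # rolls back the whole transaction if a statement raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A call such as transfer(conn, "A", "B", 5000) violates the rule on the second statement, so the
deposit that had already been applied to B is rolled back as well and the database stays in its
earlier consistent state.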

I: Isolation means that concurrently executing transactions must not interfere with one another:
each transaction should appear to run as if it were alone in the system, even when the DBMS
interleaves their operations. This avoids problems such as dirty reads and writes. Proper
transaction isolation levels and locking are needed to enforce it.

For example: Two people access a joint account with a $5000 balance from two terminals. Suppose
John and Mary each apply to withdraw $4000 at the same time from two different ATMs. If the two
transactions do not run in isolation, both may read the $5000 balance before either updates it, and
John and Mary will each be able to withdraw $4000, i.e. $8000 in total from an account holding
$5000. To prevent this, the transactions must not be allowed to interfere with each other, which is
achieved by setting transaction isolation levels and/or using locking on the database objects.
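
The joint-account scenario can be reproduced with a deterministic simulation. This sketch does not use
a real DBMS or real threads; the two functions are hypothetical and simply model "every transaction
reads the balance before any of them writes" versus "each transaction runs alone".

```python
def withdraw_interleaved(balance, amounts):
    """All transactions read the balance first, then each checks and writes."""
    seen_balances = [balance for _ in amounts]   # every read sees the original balance
    paid_out = 0
    for seen, amount in zip(seen_balances, amounts):
        if amount <= seen:            # check against a stale value
            balance = seen - amount   # overwrites the other transaction's update
            paid_out += amount
    return balance, paid_out


def withdraw_isolated(balance, amounts):
    """Each transaction reads, checks, and writes before the next one starts."""
    paid_out = 0
    for amount in amounts:
        seen = balance                # read happens after the previous write
        if amount <= seen:
            balance = seen - amount
            paid_out += amount
    return balance, paid_out
```

With a starting balance of 5000 and two withdrawals of 4000, the interleaved version pays out 8000 in
total while recording a balance of 1000; the isolated version pays out 4000 and refuses the second
withdrawal.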

D: Durability states that once a transaction has committed, its changes must be stored permanently
and remain available after a power failure, system crash, or other error. Committed data must not
be lost in any such mishap, and it must be possible to recover it through backups (restore),
logging, and other methods.

4.4. Schedules and Recoverability


When transactions are executing concurrently in an interleaved fashion, then the order of execution of
operations from all the various transactions is known as a schedule (or history). We characterize
schedules in terms of the interference of participating transactions, leading to the concepts of
serializability and serializable schedules.

4.4.1. Schedules (Histories) of Transactions

A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the
transactions. Operations from different transactions can be interleaved in the schedule S. However, for
each transaction Ti that participates in the schedule S, the operations of Ti in S must appear in the
same order in which they occur in Ti. The order of operations in S is considered to be a total ordering,
meaning that for any two operations in the schedule, one must occur before the other.
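
To make the ordering rule concrete, the sketch below enumerates every schedule of two transactions
that keeps each transaction's own operations in their original order. The interleavings function and
the sample transactions are assumptions made for the illustration.

```python
def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves the order inside each one."""
    if not t1 or not t2:
        yield list(t1) + list(t2)
        return
    for rest in interleavings(t1[1:], t2):   # next operation taken from t1
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):   # next operation taken from t2
        yield [t2[0]] + rest


T1 = ["r1(X)", "w1(X)"]
T2 = ["r2(X)", "w2(X)"]
schedules = list(interleavings(T1, T2))
```

For two transactions of two operations each there are six such schedules; two of them contain no
interleaving at all.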

For the purpose of recovery and concurrency control, we are mainly interested in the read_item and
write_item operations of the transactions, as well as the commit and abort operations. A shorthand
notation for describing a schedule uses the symbols b, r, w, e, c, and a for the operations
begin_transaction, read_item, write_item, end_transaction, commit, and abort, respectively, and
appends as a subscript the transaction id (transaction number) to each operation in the schedule.

Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:

1. They belong to different transactions;

2. They access the same item X; and

3. At least one of the operations is a write_item(X).
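
The three conditions translate directly into code. The sketch below is illustrative only: the parse
and conflicts helpers are hypothetical, and the textual form "r1(X); w2(X)" is an assumed way of
typing the shorthand notation with the transaction id written inline instead of as a subscript.

```python
import re


def parse(text):
    """Turn 'r1(X); w2(X)' into [('r', 1, 'X'), ('w', 2, 'X')]."""
    return [
        (op, int(txn), item)
        for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", text)
    ]


def conflicts(schedule):
    """Return every ordered pair of operations that satisfies all three conditions."""
    pairs = []
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            different_txn = t1 != t2          # condition 1
            same_item = x1 == x2              # condition 2
            has_write = "w" in (op1, op2)     # condition 3
            if different_txn and same_item and has_write:
                pairs.append(((op1, t1, x1), (op2, t2, x2)))
    return pairs


S = parse("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)")
```

In this sample schedule, r1(X) and r2(X) do not conflict because both are reads, and w2(X) and w1(Y)
do not conflict because they access different items.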

The rest of this section covers some theoretical definitions concerning schedules. A schedule S of n
transactions T1, T2 ... Tn is said to be a complete schedule if the following conditions hold:

1. The operations in S are exactly those operations in T1, T2 ... Tn, including a commit or abort
operation as the last operation for each transaction in the schedule.

2. For any pair of operations from the same transaction Ti, their relative order of appearance
in S is the same as their order of appearance in Ti.

3. For any two conflicting operations, one of the two must occur before the other in the
schedule.

The preceding condition (3) allows for two non-conflicting operations to occur in the schedule
without defining which occurs first, thus leading to the definition of a schedule as a partial order of
the operations in the n transactions.

However, a total order must be specified in the schedule for any pair of conflicting operations
(Condition 3) and for any pair of operations from the same transaction (condition 2). Condition 1
simply states that all operations in the transactions must appear in the complete schedule. Since every
transaction has either committed or aborted, a complete schedule will not contain any active
transactions at the end of the schedule.
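
For a schedule that is written out as a single sequence, conditions 2 and 3 hold automatically with
respect to that sequence, so the remaining thing to check is that every transaction ends with a commit
or an abort. The sketch below does only that; the helper names and the inline notation such as "c1"
for commit and "a1" for abort are assumptions.

```python
import re


def parse(text):
    """Turn 'r1(X); c1' into [('r', 1, 'X'), ('c', 1, None)]."""
    ops = re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", text)
    return [(op, int(txn), item or None) for op, txn, item in ops]


def is_complete(schedule):
    """True if each transaction's last operation is a commit or an abort."""
    last_op = {}
    for op, txn, _ in schedule:
        if last_op.get(txn) in ("c", "a"):
            return False               # an operation appears after the transaction ended
        last_op[txn] = op
    return all(op in ("c", "a") for op in last_op.values())
```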

In general, it is difficult to encounter complete schedules in a transaction processing system because
new transactions are continually being submitted to the system.

4.4.2. Characterizing Schedules Based on Recoverability

For some schedules it is easy to recover from transaction and system failures, whereas for other
schedules the recovery process can be quite involved. In some cases, it is not even possible to recover
correctly after a failure. Hence, it is important to characterize the types of schedules for which
recovery is possible, as well as those for which recovery is relatively simple.

A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written
some item X that T reads have committed. A transaction T reads from transaction T’ in a schedule S if
some item X is first written by T’ and later read by T. In addition, T’ should not have been aborted
before T reads item X, and there should be no transactions that write X after T’ writes it and before T
reads it (unless those transactions, if any, have aborted before T reads X).

In a recoverable schedule, no committed transaction ever needs to be rolled back, and so the definition
of committed transaction as durable is not violated. However, it is possible for a phenomenon known
as cascading rollback (or cascading abort) to occur in some recoverable schedules, where an
uncommitted transaction has to be rolled back because it read an item from a transaction that failed.
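
The recoverability condition can be checked mechanically. The following is a simplified sketch: the
helper names and the inline notation are assumptions, and a read is attributed to the most recent
writer of the item unless that writer has already aborted, which does not cover every corner of the
reads-from definition above.

```python
import re


def parse(text):
    """Turn 'w1(X); r2(X); c2' into [('w', 1, 'X'), ('r', 2, 'X'), ('c', 2, None)]."""
    ops = re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", text)
    return [(op, int(txn), item or None) for op, txn, item in ops]


def is_recoverable(schedule):
    """False if some transaction commits before a transaction it read from has committed."""
    last_writer = {}     # item -> transaction that wrote it most recently
    reads_from = {}      # transaction -> set of transactions it has read from
    committed, aborted = set(), set()
    for op, txn, item in schedule:
        if op == "w":
            last_writer[item] = txn
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in aborted:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "c":
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
        elif op == "a":
            aborted.add(txn)
    return True
```

In the schedule r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1, transaction T2 reads X from T1 and commits
before T1 ends, so the check fails; if T1 later aborts, the committed T2 would have to be undone.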

So far in this section, we have characterized schedules based on their recoverability properties. Now
we characterize the types of schedules that are always considered to be correct when concurrent
transactions are executing. Such schedules are known as serializable schedules. Suppose that two
users (for example, two airline reservations agents) submit to the DBMS transactions T1 and T2 at
approximately the same time. If no interleaving of operations is permitted, there are only two possible
outcomes:

1. Execute all the operations of transaction T1 (in sequence) followed by all the operations of
transaction T2 (in sequence).

2. Execute all the operations of transaction T2 (in sequence) followed by all the operations of
transaction T1 (in sequence).

4.5. Serializability of Schedules


The concept of serializability of schedules is used to identify which schedules are correct when
transaction executions have interleaving of their operations in the schedules. This section defines
serializability and discusses how it may be used in practice.

4.5.1. Serial, Nonserial, and Conflict-Serializable Schedules


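The two non-interleaved orderings of T1 and T2 (T1 followed by T2, and T2 followed by T1), referred
to here as schedules A and B, are called serial because the operations of each transaction are executed
consecutively, without any interleaved operations from the other transaction. In a serial schedule,
entire transactions are performed in serial order: T1 and then T2 in schedule A, and T2 and then T1
in schedule B.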

Formally, a schedule S is serial if, for every transaction T participating in the schedule, all the
operations of T are executed consecutively in the schedule; otherwise, the schedule is called
non-serial. Therefore, in a serial schedule, only one transaction at a time is active: the commit (or
abort) of the active transaction initiates execution of the next transaction. No interleaving occurs
in a serial schedule. One reasonable assumption we can make, if we consider the transactions to be
independent, is that every serial schedule is considered correct.
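
The formal definition amounts to checking that no transaction resumes after another one has started.
A minimal sketch, assuming each operation is represented as a hypothetical (operation, transaction
id, item) tuple:

```python
def is_serial(schedule):
    """True if the operations of every transaction form one consecutive run."""
    finished = set()   # transactions whose run of operations has already ended
    current = None     # transaction whose operations are currently executing
    for _, txn, _ in schedule:
        if txn != current:
            if txn in finished:
                return False           # txn resumes after being interrupted
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial_t1_t2 = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X")]
interleaved = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("w", 2, "X")]
```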

The definition of conflict equivalence of schedules is as follows: Two schedules are said to be conflict
equivalent if the order of any two conflicting operations is the same in both schedules.
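
Conflict equivalence can be tested by comparing the ordered pairs of conflicting operations in the
two schedules. The sketch below also includes a brute-force test for conflict serializability, taking
it to mean "conflict equivalent to some serial schedule"; that definition is assumed here because the
text names the concept without defining it. All function names are hypothetical, and trying every
serial order is practical only for a handful of transactions.

```python
from itertools import permutations


def label(schedule):
    """Tag each operation with its position inside its own transaction."""
    counts, labelled = {}, []
    for op, txn, item in schedule:
        labelled.append((op, txn, item, counts.get(txn, 0)))
        counts[txn] = counts.get(txn, 0) + 1
    return labelled


def conflict_pairs(schedule):
    """Set of (earlier, later) pairs of conflicting operations."""
    ops = label(schedule)
    return {
        (a, b)
        for i, a in enumerate(ops)
        for b in ops[i + 1:]
        if a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])
    }


def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair appears in the same order."""
    same_ops = sorted(label(s1)) == sorted(label(s2))
    return same_ops and conflict_pairs(s1) == conflict_pairs(s2)


def is_conflict_serializable(schedule):
    """Brute force: compare against every serial order of the transactions."""
    txns = sorted({txn for _, txn, _ in schedule})
    return any(
        conflict_equivalent(
            schedule, [op for t in order for op in schedule if op[1] == t]
        )
        for order in permutations(txns)
    )


C = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
     ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
D = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
     ("w", 2, "X"), ("r", 1, "Y"), ("w", 1, "Y")]
```

Schedule D here orders every conflicting pair the same way as the serial schedule T1 followed by T2,
whereas schedule C contains one conflict that puts T1 first and another that puts T2 first, so no
serial order matches it.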

4.6. Transaction Support in SQL


This section gives only a brief overview; there are many more details, and the newer SQL standards
have more commands for transaction processing. The basic definition of an SQL transaction is similar
to our already defined concept of a transaction. That is, it is a logical unit of work and is
guaranteed to be atomic. A single SQL statement is always considered to be atomic: either it completes
execution without an error or it fails and leaves the database unchanged.

With SQL, there is no explicit Begin_Transaction statement. Transaction initiation is done implicitly
when particular SQL statements are encountered. However, every transaction must have an explicit
end statement, which is either a COMMIT or a ROLLBACK. Every transaction has certain
characteristics attributed to it. These characteristics are specified by a SET TRANSACTION
statement in SQL. The characteristics are the access mode, the diagnostic area size, and the isolation
level.
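
Implicit transaction start followed by an explicit COMMIT or ROLLBACK can be observed with Python's
built-in sqlite3 module. This is a sketch of that driver's default behaviour rather than of the SQL
standard itself (SQLite has no SET TRANSACTION statement); the seat table and its rows are invented
for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, passenger TEXT)")
states = [conn.in_transaction]        # no transaction is open yet

conn.execute("INSERT INTO seat (passenger) VALUES ('Ana')")
states.append(conn.in_transaction)    # the INSERT opened a transaction implicitly
conn.commit()                         # explicit end: COMMIT
states.append(conn.in_transaction)

conn.execute("INSERT INTO seat (passenger) VALUES ('Ben')")
states.append(conn.in_transaction)    # a new transaction was started implicitly
conn.rollback()                       # explicit end: ROLLBACK
states.append(conn.in_transaction)

rows = [row[0] for row in conn.execute("SELECT passenger FROM seat")]
```

Only the committed row survives: the second INSERT is undone by the rollback.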

The access mode can be specified as READ ONLY or READ WRITE. The default is READ WRITE,
unless the isolation level of READ UNCOMMITTED is specified, in which case READ ONLY is
assumed. A mode of READ WRITE allows select, update, insert, delete, and create commands to be
executed. A mode of READ ONLY, as the name implies, is simply for data retrieval.

The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n, which indicates
the number of conditions that can be held simultaneously in the diagnostic area. These conditions
supply feedback information (errors or exceptions) to the user or program on the n most recently
executed SQL statements.

The isolation level option is specified using the statement ISOLATION LEVEL <isolation>, where
the value for <isolation> can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE
READ, or SERIALIZABLE.

The default isolation level is SERIALIZABLE, although some systems use READ COMMITTED as
their default. The use of the term SERIALIZABLE here is based on not allowing violations that cause
dirty read, unrepeatable read, and phantoms, and it is thus not identical to the way serializability was
defined earlier. If a transaction executes at a lower isolation level than SERIALIZABLE, then one or
more of the following three violations may occur:

■ Dirty read: A transaction T1 may read the update of a transaction T2, which has not yet
committed. If T2 fails and is aborted, then T1 would have read a value that does not exist
and is incorrect.

■ Nonrepeatable read: A transaction T1 may read a given value from a table. If another
transaction T2 later updates that value and T1 reads that value again, T1 will see a different
value.

■ Phantoms: A transaction T1 may read a set of rows from a table, perhaps based on some
condition specified in the SQL WHERE-clause. If another transaction T2 then inserts a new row
that also satisfies that condition and T1 repeats its read, T1 will see a phantom: a row that
previously did not exist.
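
The dirty read case can be imitated with a toy in-memory store. This is only a sketch: the TinyStore
class is hypothetical and models just two behaviours, a reader that may see uncommitted writes and a
reader that sees committed values only.

```python
class TinyStore:
    """Toy store that keeps uncommitted writes separate from committed data."""

    def __init__(self, data):
        self.committed = dict(data)
        self.pending = {}   # transaction id -> {item: uncommitted value}

    def write(self, txn, item, value):
        self.pending.setdefault(txn, {})[item] = value

    def read(self, item, level="READ COMMITTED"):
        if level == "READ UNCOMMITTED":
            # A reader at this level may see another transaction's uncommitted write.
            for writes in self.pending.values():
                if item in writes:
                    return writes[item]
        return self.committed[item]

    def commit(self, txn):
        self.committed.update(self.pending.pop(txn, {}))

    def abort(self, txn):
        self.pending.pop(txn, None)   # discard the uncommitted writes


store = TinyStore({"X": 100})
store.write("T2", "X", 50)                        # T2 updates X but has not committed
dirty_value = store.read("X", "READ UNCOMMITTED")
clean_value = store.read("X", "READ COMMITTED")
store.abort("T2")                                 # T2 fails and is aborted
value_after_abort = store.read("X", "READ UNCOMMITTED")
```

The value seen by the READ UNCOMMITTED reader never becomes part of the committed database, which is
exactly the problem described in the first bullet.
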
The following table summarizes possible violations for the different isolation levels. An entry of Yes
indicates that a violation is possible and an entry of No indicates that it is not possible. READ
UNCOMMITTED is the most forgiving, and SERIALIZABLE is the most restrictive in that it avoids
all three of the problems mentioned above.

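Table 4.1: Possible Violations Based on Isolation Levels as Defined in SQL

Isolation level      Dirty read   Nonrepeatable read   Phantom
READ UNCOMMITTED     Yes          Yes                  Yes
READ COMMITTED       No           Yes                  Yes
REPEATABLE READ      No           No                   Yes
SERIALIZABLE         No           No                   No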
