Lecturenotes BCS403 Databasemanagementsystem

Uploaded by

tharanir.aiml

SCHEME - 2022

Module - 4
BCS403-Database Management System

Transaction Processing

Introduction to Transaction Processing


 This module defines the concept of a transaction and presents a simple model of transaction
execution based on read and write database operations.
 This model is used as the basis for defining and formalizing concurrency control
and recovery concepts.
 Informal examples show why concurrency control techniques are needed
in multiuser systems.
 Finally, the module explains why techniques are needed to handle recovery from system and
transaction failures by describing the different ways in which transactions can fail while
executing.

Single-User versus Multiuser Systems


 A DBMS is single-user if at most one user at a time can use the system, and it is
multiuser if many users can use the system— and hence access the database—
concurrently.
 One criterion for classifying a database system is according to the number of users
who can use the system concurrently.
 Single-user DBMSs are mostly restricted to personal computer systems; most other
DBMSs are multiuser.
 Database systems used in banks, insurance agencies, stock exchanges, supermarkets,
and many other applications are multiuser systems.
 In these systems, hundreds or thousands of users are typically operating on the
database by submitting transactions concurrently to the system.
 Multiple users can access databases—and use computer systems—simultaneously
because of the concept of multiprogramming, which allows the operating system of the
computer to execute multiple programs—or processes—at the same time.
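 A single central processing unit (CPU) can execute at most one process at a time. A
multiprogramming operating system therefore executes some commands from one process,
suspends that process, and executes some commands from the next process. A process is
resumed at the point where it was suspended whenever it gets its turn to use the CPU again.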
 Hence, concurrent execution of processes is actually interleaved, as illustrated in
Figure 20.1, which shows two processes, A and B, executing concurrently in an
interleaved fashion.

R.THARANI,AP,AIML,SSCE,ANEKAL Page 1
 In a multiuser DBMS, the stored data items are the primary resources that may be
accessed concurrently by interactive users or application programs, which are constantly
retrieving information from and modifying the database.

Transactions, Database Items, Read and Write Operations, and DBMS Buffers
 A transaction is an executing program that forms a logical unit of database processing. A
transaction includes one or more database access operations—these can include
insertion, deletion, modification (update), or retrieval operations.
 The database operations that form a transaction can either be embedded within an
application program or they can be specified interactively via a high-level query
language such as SQL.
 One way of specifying the transaction boundaries is by specifying explicit begin
transaction and end transaction statements in an application program; in this case, all
database access operations between the two are considered as forming one transaction.
 If the database operations in a transaction do not update the database but only retrieve
data, the transaction is called a read-only transaction; otherwise it is known as a read-
write transaction.
 A database is basically represented as a collection of named data items. The size of a
data item is called its granularity.
 A data item can be a database record, but it can also be a larger unit such as a
whole disk block, or even a smaller unit such as an individual field (attribute) value
of some record in the database.
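 Using this simplified database model, the basic database access operations that a
transaction can include are as follows:
read_item(X).
Reads a database item named X into a program variable.
write_item(X).
Writes the value of a program variable X into the database item named X.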


 The DBMS will maintain in the database cache a number of data buffers in main
memory.
 A transaction includes read_item and write_item operations to access and update the
database.
 The read-set of a transaction is the set of all items that the transaction reads, and the
write-set is the set of all items that the transaction writes.
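
The interaction between read_item, write_item, and the buffers can be sketched as follows. This is only a minimal illustration: the `disk`, `buffers`, and `dirty` structures and the `flush` helper are hypothetical names introduced here, and one data item is assumed to occupy one disk block.

```python
# Toy model: one data item per "disk block"; all names are illustrative.
disk = {"X": 80, "Y": 50}      # stands in for blocks stored on disk
buffers = {}                   # stands in for main-memory buffers in the DBMS cache
dirty = set()                  # items changed in a buffer but not yet on disk

def read_item(name):
    """Copy the item's block into a buffer if needed, then return its value."""
    if name not in buffers:
        buffers[name] = disk[name]
    return buffers[name]

def write_item(name, value):
    """Update the buffered copy; the disk copy changes only when flushed."""
    if name not in buffers:
        buffers[name] = disk[name]
    buffers[name] = value
    dirty.add(name)

def flush(name):
    """Write the buffered value back to disk (immediately or at a later time)."""
    disk[name] = buffers[name]
    dirty.discard(name)

read_set, write_set = set(), set()

# A transaction that moves 5 seats from X to Y, tracking its read-set and write-set.
x = read_item("X"); read_set.add("X")
write_item("X", x - 5); write_set.add("X")
y = read_item("Y"); read_set.add("Y")
write_item("Y", y + 5); write_set.add("Y")
```

Until `flush` is called, the new values exist only in the buffers, which is why the recovery techniques discussed later are needed.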

Why Concurrency Control Is Needed


 Several problems can occur when concurrent transactions execute in an uncontrolled
manner. We illustrate some of these problems by referring to a much simplified airline
reservations database in which a record is stored for each airline flight.
 Each record includes the number of reserved seats on that flight as a named (uniquely
identifiable) data item, among other information.
 When a database access program is written, it has the flight number, the flight date, and
the number of seats to be booked as parameters; hence, the same program can be used
to execute many different transactions, each with a different flight number, date, and
number of seats to be booked.
 Figure 20.2(a) shows a transaction T1 that transfers N reservations from one flight
whose number of reserved seats is stored in the database item named X to another
flight whose number of reserved seats is stored in the database item named Y.

The Lost Update Problem

 This problem occurs when two transactions that access the same database items have
their operations interleaved in a way that makes the value of some database items
incorrect.
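
A lost update can be reproduced with a hand-written interleaving. The sketch below assumes T1 transfers N seats from X to Y as described above, and introduces a hypothetical second transaction T2 that adds M reservations to X; the starting values and the particular interleaving are illustrative assumptions.

```python
# Shared "database"; the values are illustrative.
db = {"X": 80, "Y": 50}
N, M = 5, 4

# Each transaction keeps private program variables (local copies).
t1, t2 = {}, {}

# Interleaved schedule: both transactions read X before either writes it.
t1["X"] = db["X"]            # T1: read_item(X)   -> 80
t1["X"] -= N                 # T1: X := X - N     -> 75
t2["X"] = db["X"]            # T2: read_item(X)   -> 80 (stale once T1 writes)
t2["X"] += M                 # T2: X := X + M     -> 84
db["X"] = t1["X"]            # T1: write_item(X)  -> 75
t1["Y"] = db["Y"]            # T1: read_item(Y)   -> 50
db["X"] = t2["X"]            # T2: write_item(X)  -> 84, overwrites T1's update
t1["Y"] += N                 # T1: Y := Y + N     -> 55
db["Y"] = t1["Y"]            # T1: write_item(Y)

serial_result = 80 - N + M   # value of X after either serial order
```

The final X differs from `serial_result` because T1's subtraction of N is overwritten by T2.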

The Temporary Update (or Dirty Read) Problem

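 This problem occurs when one transaction updates a database item and then the
transaction fails for some reason. Meanwhile, the updated item is read by another
transaction before it is changed back to its original value.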

The Incorrect Summary Problem

 If one transaction is calculating an aggregate summary function on a number of
database items while other transactions are updating some of these items, the
aggregate function may calculate some values before they are updated and others after
they are updated.

The Unrepeatable Read Problem

 Another problem that may occur is called unrepeatable read, where a transaction T
reads the same item twice and the item is changed by another transaction T′ between
the two reads.

Why Recovery Is Needed


Whenever a transaction is submitted to a DBMS for execution, the system is responsible for
making sure that either all the operations in the transaction are completed successfully and
their effect is recorded permanently in the database, or that the transaction does not have
any effect on the database or any other transactions.

Types of Failures
Failures are generally classified as transaction, system, and media failures.

There are several possible reasons for a transaction to fail in the middle of execution:

 A computer failure (system crash).


A hardware, software, or network error occurs in the computer system during
transaction execution. Hardware crashes are usually media failures—for example, main
memory failure.

 A transaction or system error


Some operation in the transaction may cause it to fail, such as integer overflow or division
by zero. Transaction failure may also occur because of erroneous parameter values or because
of a logical programming error. Additionally, the user may interrupt the transaction during its
execution.

 Local errors or exception conditions detected by the transaction.


During transaction execution, certain conditions may occur that necessitate cancellation of
the transaction. For example, an insufficient account balance in a banking database may cause
a withdrawal transaction to be canceled.
 Concurrency control enforcement.
The concurrency control method may abort a transaction because it violates serializability, or
it may abort one or more transactions to resolve a state of deadlock among several transactions.
Transactions aborted because of serializability violations or deadlocks are typically restarted
automatically at a later time.

 Disk failure
Some disk blocks may lose their data because of a read or write malfunction or
because of a disk read/write head crash. This may happen during a read or a write
operation of the transaction.

 Physical problems and catastrophes.


This refers to an endless list of problems that includes power or air-conditioning
failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a
wrong tape by the operator.

Transaction and System Concepts


This section describes the various states a transaction can be in and discusses other operations
needed in transaction processing.

Transaction States and Additional Operations

 A transaction is an atomic unit of work that should either be completed in its
entirety or not done at all. For recovery purposes, the system needs to keep track of
when each transaction starts, terminates, and commits or aborts.
 Therefore, the recovery manager of the DBMS needs to keep track of the following
operations:
BEGIN_TRANSACTION.
This marks the beginning of transaction execution.

READ or WRITE.
These specify read or write operations on the database items that are
executed as part of a transaction.

END_TRANSACTION.
This specifies that READ and WRITE transaction operations have ended and
marks the end of transaction execution.

COMMIT_TRANSACTION

This signals a successful end of the transaction so that any changes (updates)
executed by the transaction can be safely committed to the database and will not be
undone.

ROLLBACK (or ABORT)

This signals that the transaction has ended unsuccessfully, so that any changes or effects
that the transaction may have applied to the database must be undone.

 Figure 20.4 shows a state transition diagram that illustrates how a transaction moves
through its execution states.

 A transaction goes into an active state immediately after it starts execution, where it can
execute its READ and WRITE operations. When the transaction ends, it moves to the
partially committed state.
 When a transaction is committed, it has concluded its execution successfully and all its
changes must be recorded permanently in the database, even if a system failure occurs.
 A transaction can go to the failed state if it is aborted while it is active or partially
committed. The terminated state corresponds to the transaction leaving the system.
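
The state transitions can be sketched as a small state machine. The failed and terminated states and the exact transition table follow the usual textbook form of this diagram and are assumptions here; the class and operation names are illustrative.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"

# Allowed transitions: (current state, operation) -> next state.
TRANSITIONS = {
    (State.ACTIVE, "end_transaction"): State.PARTIALLY_COMMITTED,
    (State.ACTIVE, "abort"): State.FAILED,
    (State.PARTIALLY_COMMITTED, "commit"): State.COMMITTED,
    (State.PARTIALLY_COMMITTED, "abort"): State.FAILED,
    (State.COMMITTED, "terminate"): State.TERMINATED,
    (State.FAILED, "terminate"): State.TERMINATED,
}

class Transaction:
    def __init__(self):
        # begin_transaction puts the transaction in the active state.
        self.state = State.ACTIVE

    def apply(self, operation):
        # READ and WRITE keep an active transaction in the active state.
        if operation in ("read", "write") and self.state is State.ACTIVE:
            return self.state
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"{operation!r} not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state

t = Transaction()
for op in ["read", "write", "end_transaction", "commit"]:
    t.apply(op)
```

Any operation that is not listed for the current state raises an error, so a failed transaction can never be committed.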

The System Log


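 To be able to recover from failures that affect transactions, the system maintains a log
to keep track of all transaction operations that affect the values of database items, as
well as other transaction information that may be needed to permit recovery from
failures.
 The log is a sequential, append-only file that is kept on disk, so it is not affected by any
type of failure except for disk or catastrophic failure.
 Typically, one or more main memory buffers, called log buffers, hold the last part of the
log file, so log entries are first added to a log buffer. When the log buffer is filled, or
when certain other conditions occur, the log buffer is appended to the end of the log file
on disk.
 The following are the types of entries, called log records, that are written to the log
file and the corresponding action for each log record.
 In these entries, T refers to a unique transaction-id that is generated automatically by
the system for each transaction and that is used to identify each transaction:
[start_transaction, T].
Indicates that transaction T has started execution.
[write_item, T, X, old_value, new_value].
Indicates that transaction T has changed the value of database item X from old_value to
new_value.
[read_item, T, X].
Indicates that transaction T has read the value of database item X.
[commit, T].
Indicates that transaction T has completed successfully and that its effect can be
committed (recorded permanently) to the database.
[abort, T].
Indicates that transaction T has been aborted.

 Protocols for recovery that avoid cascading rollbacks (which include nearly all practical
protocols) do not require that READ operations be written to the system log.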

Commit Point of a Transaction


 A transaction T reaches its commit point when all its operations that access the database
have been executed successfully and the effect of all the transaction operations on the
database has been recorded in the log.

 Beyond the commit point, the transaction is said to be committed, and its effect must be
permanently recorded in the database.
 Transactions that have written their commit record in the log must also have recorded
all their WRITE operations in the log, so their effect on the database can be redone from
the log records.
 At the time of a system crash, only the log entries that have been written back to disk
are considered in the recovery process if the contents of main memory are lost.
 Hence, before a transaction reaches its commit point, any portion of the log that has not
been written to the disk yet must now be written to the disk.
 This process is called force-writing the log buffer to disk before committing a
transaction.
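
Force-writing can be sketched with a toy log that separates a main-memory buffer from the on-disk file. The `Log` class, its method names, and the tuple layout of the records (type, transaction-id, item, old value, new value) are assumptions made for illustration; a real DBMS log is considerably more involved.

```python
class Log:
    """Toy log: a main-memory buffer plus an append-only 'disk' list."""
    def __init__(self):
        self.disk = []      # survives a crash
        self.buffer = []    # lost in a crash

    def append(self, record):
        self.buffer.append(record)

    def force_write(self):
        # Append the buffered records to the end of the log file on disk.
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def commit(self, tid):
        # The commit record and everything before it must reach disk
        # before the transaction is considered committed.
        self.append(("commit", tid))
        self.force_write()

    def crash(self):
        self.buffer.clear()  # main-memory contents are lost

def committed_writes(log):
    """Return the write records on disk that belong to committed transactions."""
    committed = {rec[1] for rec in log.disk if rec[0] == "commit"}
    return [rec for rec in log.disk if rec[0] == "write_item" and rec[1] in committed]

log = Log()
log.append(("start_transaction", "T1"))
log.append(("write_item", "T1", "X", 80, 75))   # (type, T, item, old, new)
log.commit("T1")
log.append(("start_transaction", "T2"))
log.append(("write_item", "T2", "Y", 50, 60))   # never forced to disk
log.crash()

redo = committed_writes(log)
```

After the simulated crash, only T1's write can be redone, because T2's records never left the buffer.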

DBMS-Specific Buffer Replacement Policies


 The DBMS cache will hold the disk pages that contain information currently being
processed in main memory buffers.
 If all the buffers in the DBMS cache are occupied and new disk pages are required to
be loaded into main memory from disk, a page replacement policy is needed to
select the particular buffers to be replaced.
 Some page replacement policies that have been developed specifically for database
systems are briefly discussed next.

Domain Separation (DS) Method

 In a DBMS, various types of disk pages exist: index pages, data file pages, log file
pages, and so on. In this method, the DBMS cache is divided into separate domains
(sets of buffers).
 Each domain handles one type of disk pages, and page replacements within each
domain are handled via the basic LRU (least recently used) page replacement.
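
The domain separation idea can be sketched as one LRU-managed buffer set per page type. The class name, the domain names, and the fixed capacities are illustrative assumptions, not part of the method as described above.

```python
from collections import OrderedDict

class DomainSeparatedCache:
    """Toy DBMS cache: one LRU-managed buffer set per page type (domain)."""
    def __init__(self, capacity_per_domain):
        # e.g. {"index": 2, "data": 2}; the sizes are arbitrary illustrative values
        self.capacity = dict(capacity_per_domain)
        self.domains = {name: OrderedDict() for name in self.capacity}

    def access(self, domain, page_id):
        """Reference a page; return the evicted page id, or None if nothing was evicted."""
        pages = self.domains[domain]
        if page_id in pages:
            pages.move_to_end(page_id)              # mark as most recently used
            return None
        evicted = None
        if len(pages) >= self.capacity[domain]:
            evicted, _ = pages.popitem(last=False)  # drop the least recently used page
        pages[page_id] = True
        return evicted

cache = DomainSeparatedCache({"index": 2, "data": 2})
cache.access("index", "I1")
cache.access("index", "I2")
cache.access("data", "D1")
cache.access("index", "I1")            # I1 becomes most recently used
victim = cache.access("index", "I3")   # index domain is full, so I2 is evicted
```

Because each domain has its own buffers, a burst of data-page references cannot push index pages out of the cache.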

Hot Set Method

 The hot set method determines for each database processing algorithm the set of
disk pages that will be accessed repeatedly, and it does not replace them until their
processing is completed.
 This page replacement algorithm is useful in queries that have to scan a set of
pages repeatedly, such as when a join operation is performed using the nested-loop
method.

The DBMIN Method

 This page replacement policy uses a model known as QLSM (query locality set
model), which predetermines the pattern of page references for each algorithm for a
particular type of database operation.
 The DBMIN page replacement policy will calculate a locality set using QLSM for
each file instance involved in the query.
 The concept of a locality set is analogous to the concept of a working set, which is
used in page replacement policies for processes by the operating system, but there
are multiple locality sets, one for each file instance in the query.

Desirable Properties of Transactions


Transactions should possess several properties, often called the ACID properties; they
should be enforced by the concurrency control and recovery methods of the DBMS.

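The following are the ACID properties:

Atomicity.
A transaction is an atomic unit of processing; it should either be performed in its
entirety or not performed at all.

Consistency preservation.
A transaction should be consistency preserving, meaning that if it is completely executed
from beginning to end without interference from other transactions, it takes the database
from one consistent state to another.

Isolation.
A transaction should appear as though it is being executed in isolation from other
transactions, even though many transactions are executing concurrently.

Durability (or permanency).
The changes applied to the database by a committed transaction must persist in the
database and must not be lost because of any failure.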

Levels of Isolation.

 There have been attempts to define the level of isolation of a transaction.


 A transaction is said to have level 0 (zero) isolation if it does not overwrite the dirty
reads of higher-level transactions.
 Level 1 (one) isolation has no lost updates, and level 2 isolation has no lost updates
and no dirty reads.
 Finally, level 3 isolation (also called true isolation) has, in addition to level 2
properties, repeatable reads.
 Another type of isolation is called snapshot isolation, and several practical
concurrency control methods are based on this.


Characterizing Schedules Based on Recoverability


When transactions are executing concurrently in an interleaved fashion, then the order of
execution of operations from all the various transactions is known as a schedule (or history).

Schedules (Histories) of Transactions

 A schedule (or history) S of n transactions T1, T2, … , Tn is an ordering of the operations
of the transactions. Operations from different transactions can be interleaved in the
schedule S.
 The order of operations in S is considered to be a total ordering, meaning that for any
two operations in the schedule, one must occur before the other.
 It is possible theoretically to deal with schedules whose operations form partial orders,
but we will assume for now total ordering of the operations in a schedule.
 For the purpose of recovery and concurrency control, we are mainly interested in
the read_item and write_item operations of the transactions, as well as the commit
and abort operations.
 A shorthand notation for describing a schedule uses the symbols b, r, w, e, c, and a for
the operations begin_transaction, read_item, write_item, end_transaction, commit, and
abort, respectively, and appends as a subscript the transaction id (transaction number)
to each operation in the schedule.
 In this notation, the database item X that is read or written follows the r and w
operations in parentheses.
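
The shorthand notation is regular enough to parse mechanically. The sketch below assumes operations are separated by semicolons and written as a letter, a transaction number, and an optional item in parentheses; the `Op` type, the `parse_schedule` function, and the sample schedule are illustrative.

```python
import re
from typing import NamedTuple, Optional

class Op(NamedTuple):
    kind: str            # one of b, r, w, e, c, a
    txn: int             # transaction number (the subscript)
    item: Optional[str]  # data item for r and w, otherwise None

_OP = re.compile(r"^([brweca])(\d+)(?:\((\w+)\))?$")

def parse_schedule(text):
    """Parse shorthand such as 'r1(X); w2(X); c1' into a list of Op tuples."""
    ops = []
    for token in text.split(";"):
        token = token.strip()
        if not token:
            continue
        match = _OP.match(token)
        if match is None:
            raise ValueError(f"cannot parse operation {token!r}")
        kind, txn, item = match.groups()
        # Only read and write operations name a data item.
        if (kind in "rw") != (item is not None):
            raise ValueError(f"only r and w take an item: {token!r}")
        ops.append(Op(kind, int(txn), item))
    return ops

schedule = parse_schedule("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);")
```

The list order is the total order of the schedule, so later checks can simply iterate over it.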

Conflicting Operations in a Schedule

 Two operations in a schedule are said to conflict if they satisfy all three of the
following conditions: (1) they belong to different transactions; (2) they access the
same item X; and (3) at least one of the operations is a write_item(X).
 For example, in schedule Sa, the operations r1(X) and w2(X) conflict, as do the
operations r2(X) and w1(X), and the operations w1(X) and w2(X).
 Intuitively, two operations are conflicting if changing their order can result in a
different outcome.
 For example, if we change the order of the two operations r1(X); w2(X) to w2(X);
r1(X), then the value of X that is read by transaction T1 changes, because in the
second ordering the value of X is read by r1(X) after it is changed by w2(X), whereas
in the first ordering the value is read before it is changed.
 This is called a read-write conflict. The other type is called a write-write conflict and
is illustrated by the case where we change the order of two operations such as
w1(X); w2(X) to w2(X); w1(X).
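
The three conditions translate directly into a predicate over pairs of operations. In this sketch, operations are plain (kind, transaction, item) tuples, and the schedule `sa` is an assumed example chosen so that it contains the conflicts mentioned above; the function names are illustrative.

```python
from itertools import combinations

def conflicts(op1, op2):
    """Operations are (kind, txn, item) tuples, e.g. ('r', 1, 'X')."""
    kind1, txn1, item1 = op1
    kind2, txn2, item2 = op2
    return (
        txn1 != txn2                      # (1) different transactions
        and item1 == item2                # (2) same data item
        and "w" in (kind1, kind2)         # (3) at least one write
    )

def conflicting_pairs(schedule):
    """Return every conflicting pair, in schedule order, labelled by conflict type."""
    pairs = []
    for first, second in combinations(schedule, 2):
        # Only read and write operations can conflict.
        if first[0] in ("r", "w") and second[0] in ("r", "w") and conflicts(first, second):
            label = "write-write" if first[0] == second[0] == "w" else "read-write"
            pairs.append((first, second, label))
    return pairs

# Assumed example schedule: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
sa = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
      ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
pairs = conflicting_pairs(sa)
```

For this schedule the function finds exactly three conflicting pairs: r1(X) with w2(X), r2(X) with w1(X), and w1(X) with w2(X).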


Characterizing Schedules Based on Recoverability


 In some cases, it is not even possible to recover correctly after a failure. Hence, it is
important to characterize the types of schedules for which recovery is possible, as well
as those for which recovery is relatively simple.
 These characterizations do not actually provide the recovery algorithm; they only
attempt to theoretically characterize the different types of schedules.
 Some recoverable schedules may require a complex recovery process, as we shall see,
but if sufficient information is kept (in the log), a recovery algorithm can be devised for
any recoverable schedule.
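 A schedule S is recoverable if no transaction T in S commits until all transactions T′ that
have written some item X that T reads have committed.
 Consider the schedule Sa′, which is the same as schedule Sa except that two
commit operations have been added to Sa.
 In schedule Sc, T2 reads item X after T1 has written it, but T2 commits before T1 does, so
Sc is not recoverable. For the schedule to be recoverable, the c2 operation in Sc must be
postponed until after T1 commits, as shown in Sd.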
 If T1 aborts instead of committing, then T2 should also abort as shown in Se, because
the value of X it read is no longer valid. In Se, aborting T2 is acceptable since it has not
committed yet, which is not the case for the nonrecoverable schedule Sc.
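
A recoverability check can be sketched by tracking which transactions each transaction has read from. The schedules `sc`, `sd`, and `se` below are assumed examples built to match the descriptions of Sc, Sd, and Se above, not verbatim copies of them, and the function name is illustrative.

```python
def is_recoverable(schedule):
    """schedule: list of (kind, txn, item) tuples with kinds r, w, c, a.

    Simplification: a write by a transaction that later aborts is still
    treated as the latest write of that item.
    """
    last_writer = {}     # item -> transaction that wrote it most recently
    reads_from = {}      # txn -> set of transactions whose writes it has read
    committed = set()
    for kind, txn, item in schedule:
        if kind == "w":
            last_writer[item] = txn
        elif kind == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif kind == "c":
            # Every transaction this one read from must already have committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True

prefix = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"), ("w", 2, "X")]
sc = prefix + [("c", 2, None), ("a", 1, None)]                  # T2 commits too early
sd = prefix + [("w", 1, "Y"), ("c", 1, None), ("c", 2, None)]   # c2 postponed after c1
se = prefix + [("w", 1, "Y"), ("a", 1, None), ("a", 2, None)]   # both abort

results = (is_recoverable(sc), is_recoverable(sd), is_recoverable(se))
```

Only the first schedule fails the check, because T2 commits while the transaction it read from is still uncommitted.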


Characterizing Schedules Based on Serializability


 Now we characterize the types of schedules that are always considered to be correct
when concurrent transactions are executing. Such schedules are known as serializable
schedules.
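 Suppose that two users (for example, two airline reservations agents) submit to the
DBMS transactions T1 and T2 in Figure 20.2 at approximately the same time. If no
interleaving of operations is permitted, there are only two possible outcomes:
1. Execute all the operations of transaction T1 (in sequence) followed by all the
operations of transaction T2 (in sequence).
2. Execute all the operations of transaction T2 (in sequence) followed by all the
operations of transaction T1 (in sequence).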

 These two schedules—called serial schedules—are shown in Figures 20.5(a) and (b),
respectively. If interleaving of operations is allowed, there will be many possible orders
in which the system can execute the individual operations of the transactions.
 Two possible schedules are shown in Figure 20.5(c). The concept of serializability of
schedules is used to identify which schedules are correct when transaction executions
have interleaving of their operations in the schedules.

Serial, Nonserial, and Conflict-Serializable Schedules


 Schedules A and B in Figures 20.5(a) and (b) are called serial because the operations of
each transaction are executed consecutively, without any interleaved operations from
the other transaction.
 In a serial schedule, entire transactions are performed in serial order: T1 and then T2 in
Figure 20.5(a), and T2 and then T1 in Figure 20.5(b). Schedules C and D in Figure 20.5(c)
are called nonserial because each sequence interleaves operations from the two
transactions.
 Formally, a schedule S is serial if, for every transaction T participating in the schedule, all
the operations of T are executed consecutively in the schedule; otherwise, the schedule
is called nonserial.


 The concept used to decide which nonserial schedules are correct is that of the
serializability of a schedule.
 The definition of a serializable schedule is as follows: A schedule S of n transactions is
serializable if it is equivalent to some serial schedule of the same n transactions.
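
The formal definition of a serial schedule can be checked in one pass over the operations. In this sketch, operations are (kind, transaction, item) tuples, and both example schedules are illustrative assumptions.

```python
def is_serial(schedule):
    """True if each transaction's operations are consecutive in the schedule."""
    finished = set()     # transactions whose block of operations has ended
    current = None
    for _kind, txn, _item in schedule:
        if txn != current:
            if txn in finished:
                return False        # txn resumes after another transaction ran
            if current is not None:
                finished.add(current)
            current = txn
    return True

# T1 runs to completion, then T2.
serial = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y"),
          ("r", 2, "X"), ("w", 2, "X")]
# Operations of T1 and T2 are interleaved.
nonserial = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
             ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]

checks = (is_serial(serial), is_serial(nonserial))
```

A nonserial schedule may still be serializable; this function only tests the stricter property of being serial.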

Conflict Equivalence of Two Schedules

Two schedules are said to be conflict equivalent if the relative order of any two conflicting
operations is the same in both schedules.

Serializable Schedules

 Using the notion of conflict equivalence, we define a schedule S to be serializable
if it is (conflict) equivalent to some serial schedule S′. In such a case, we can reorder
the nonconflicting operations in S until we form the equivalent serial schedule S′.

Testing for Serializability of a Schedule

 There is a simple algorithm for determining whether a particular schedule is (conflict)
serializable or not. Most concurrency control methods do not actually test for
serializability.
 Rather, protocols, or rules, are developed that guarantee that any schedule that follows
these rules will be serializable.
 Some methods guarantee serializability in most cases, but do not guarantee it
absolutely, in order to reduce the overhead of concurrency control.
 Algorithm 20.1 can be used to test a schedule for conflict serializability.
 The algorithm looks at only the read_item and write_item operations in a schedule to
construct a precedence graph (or serialization graph), which is a directed graph G = (N,
E) that consists of a set of nodes N = {T1, T2, … , Tn } and a set of directed edges E = {e1,
e2, … , em }.
 There is one node in the graph for each transaction Ti in the schedule. Each edge ei in
the graph is of the form (Tj → Tk ), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node of ei
and Tk is the ending node of ei.

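 The precedence graph is constructed as described in Algorithm 20.1. An edge Tj → Tk is
created whenever an operation of Tj appears in the schedule before a conflicting operation
of Tk, and the schedule is conflict serializable if and only if the graph has no cycles.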
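
The precedence graph test can be sketched as follows: add an edge from Ti to Tj whenever an operation of Ti precedes a conflicting operation of Tj, then look for a cycle. This is a minimal version of the standard construction rather than a transcription of Algorithm 20.1, and the two example schedules are assumptions chosen for illustration.

```python
def precedence_graph(schedule):
    """Build edges Ti -> Tj for conflicting operation pairs where Ti's comes first."""
    nodes = {txn for _kind, txn, _item in schedule}
    edges = set()
    ops = [op for op in schedule if op[0] in ("r", "w")]
    for i, (kind_i, txn_i, item_i) in enumerate(ops):
        for kind_j, txn_j, item_j in ops[i + 1:]:
            if txn_i != txn_j and item_i == item_j and "w" in (kind_i, kind_j):
                edges.add((txn_i, txn_j))
    return nodes, edges

def has_cycle(nodes, edges):
    """Depth-first search with three colours to detect a directed cycle."""
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in successors[node]:
            if colour[nxt] == GREY:
                return True               # back edge: a cycle exists
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)

def is_conflict_serializable(schedule):
    nodes, edges = precedence_graph(schedule)
    return not has_cycle(nodes, edges)

# Interleaving with a lost update: edges T1 -> T2 and T2 -> T1 form a cycle.
lost_update = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
               ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
# Interleaving in which T1 finishes with X before T2 touches it.
harmless = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
            ("w", 2, "X"), ("r", 1, "Y"), ("w", 1, "Y")]

verdicts = (is_conflict_serializable(lost_update), is_conflict_serializable(harmless))
```

When the graph is acyclic, any topological order of its nodes gives an equivalent serial schedule.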


How Serializability Is Used for Concurrency Control


 Being serializable is distinct from being serial, however. A serial schedule represents
inefficient processing because no interleaving of operations from different transactions
is permitted.
 This can lead to low CPU utilization while a transaction waits for disk I/O, or allow a long
transaction to delay other transactions, thus slowing down transaction processing
considerably.
 A serializable schedule gives the benefits of concurrent execution without giving up any
correctness. In practice, it is difficult to test for the serializability of a schedule.
 If transactions are executed at will and then the resulting schedule is tested for
serializability, we must cancel the effect of the schedule if it turns out not to be
serializable.
 The approach taken in most commercial DBMSs is to design protocols (sets of rules)
that—if followed by every individual transaction or if enforced by a DBMS concurrency
control subsystem—will ensure serializability of all schedules in which the transactions
participate.
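 When transactions are submitted continuously to the system, it is also difficult to
determine when a schedule begins and when it ends. Serializability theory can be adapted to
deal with this problem by considering only the committed projection of a schedule S, which
includes only the operations in S that belong to committed transactions.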

View Equivalence and View Serializability


 Another less restrictive definition of equivalence of schedules is called view equivalence.
This leads to another definition of serializability called view serializability.
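 Two schedules S and S′ are said to be view equivalent if the following three conditions
hold:
1. The same set of transactions participates in S and S′, and S and S′ include the same
operations of those transactions.
2. For any operation ri(X) of Ti in S, if the value of X read by the operation has been
written by an operation wj(X) of Tj (or if it is the original value of X before the schedule
started), the same condition must hold for the value of X read by operation ri(X) of Ti in S′.
3. If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk
must also be the last operation to write item Y in S′.

 A schedule S is said to be view serializable if it is view equivalent to a serial schedule.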


 The definitions of conflict serializability and view serializability are similar if a condition
known as the constrained write assumption (or no blind writes) holds on all transactions
in the schedule.
 A blind write is a write operation in a transaction T on an item X that is not dependent
on the old value of X, so it is not preceded by a read of X in the transaction T.
 The definition of view serializability is less restrictive than that of conflict serializability
under the unconstrained write assumption, where the value written by an operation
wi(X) in Ti can be independent of its old value.

Other Types of Equivalence of Schedules


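 Serializability of schedules is sometimes considered to be too restrictive as a condition
for ensuring the correctness of concurrent executions.
 Some applications can produce correct schedules under conditions less stringent than
conflict or view serializability. An example is debit-credit transactions, such as those that
apply deposits and withdrawals to a data item whose value is the balance of a bank account.
 The semantics of debit-credit operations is that they update the value of a data item X
by either subtracting from or adding to the value of the data item.
 For example, consider transactions that each transfer an amount of money between two
bank accounts. Because addition and subtraction are commutative, such transactions can be
interleaved in ways that are not serializable and still produce the correct final balances.
 Also, in certain domains of applications, such as computer-aided design (CAD) of
complex systems like aircraft, design transactions last over a long time period.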
 Researchers have been working on extending concurrency control theory to deal
with cases where serializability is considered to be too restrictive as a condition for
correctness of schedules.
 In such applications, more relaxed schemes of concurrency control have been proposed
to maintain consistency of the database, such as eventual consistency.

Transaction Support in SQL


 This section gives a brief introduction to transaction support in SQL. There are many
more details, and the newer standards have more commands for transaction processing.
 The basic definition of an SQL transaction is similar to our already defined concept of a
transaction. That is, it is a logical unit of work and is guaranteed to be atomic.
 A single SQL statement is always considered to be atomic: either it completes execution
without an error or it fails and leaves the database unchanged. With SQL, there is no
explicit Begin_Transaction statement. Transaction initiation is done implicitly when
particular SQL statements are encountered. However, every transaction must have an
explicit end statement, which is either a COMMIT or a ROLLBACK.
 Every transaction has certain characteristics attributed to it. These characteristics are
specified by a SET TRANSACTION statement in SQL. The characteristics are the access mode,
the diagnostic area size, and the isolation level.
 The access mode can be specified as READ ONLY or READ WRITE. The default is READ
WRITE, unless the isolation level of READ UNCOMMITTED is specified (see below), in
which case READ ONLY is assumed.
 A mode of READ WRITE allows select, update, insert, delete, and create commands to be
executed. A mode of READ ONLY, as the name implies, is simply for data retrieval.
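 The isolation level can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or
SERIALIZABLE; the default is SERIALIZABLE. If a transaction executes at a lower isolation
level than SERIALIZABLE, then one or more of the following three violations may occur:

1. Dirty read. A transaction T1 may read the update of a transaction T2, which has not yet
committed. If T2 fails and is aborted, then T1 would have read a value that does not exist
and is incorrect.
2. Nonrepeatable read. A transaction T1 may read a given value from a table. If another
transaction T2 later updates that value and T1 reads that value again, T1 will see a
different value.
3. Phantoms. A transaction T1 may read a set of rows from a table, perhaps based on
some condition specified in the SQL WHERE-clause. Now suppose that a transaction T2
inserts a new row r that also satisfies the WHERE-clause condition used in T1. The record r
is called a phantom record because it was not there when T1 starts but is there when T1 ends.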
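
Implicit transaction start and explicit transaction end can be observed with Python's built-in sqlite3 module. This is only a sketch under assumptions: the table and column names are invented, and SQLite does not accept the SET TRANSACTION statement, so access mode and isolation level are not demonstrated here.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative.
# isolation_level="DEFERRED" asks the sqlite3 module to open a transaction
# implicitly before data-modifying statements.
conn = sqlite3.connect(":memory:", isolation_level="DEFERRED")
conn.execute("CREATE TABLE flight (number TEXT PRIMARY KEY, reserved INTEGER)")
conn.execute("INSERT INTO flight VALUES ('AI101', 80)")
conn.commit()                      # explicit end of the first transaction

# No BEGIN is issued here: the UPDATE implicitly starts a new transaction.
conn.execute("UPDATE flight SET reserved = reserved + 5 WHERE number = 'AI101'")
inside = conn.execute("SELECT reserved FROM flight").fetchone()[0]
started_implicitly = conn.in_transaction

conn.rollback()                    # explicit end: undo the uncommitted update
after = conn.execute("SELECT reserved FROM flight").fetchone()[0]
conn.close()
```

The update is visible inside the transaction but disappears after the rollback, which is the atomicity guarantee described above.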


Snapshot Isolation
 Another isolation level, known as snapshot isolation, is used in some commercial
DBMSs, and some concurrency control protocols exist that are based on this concept.
 The basic definition of snapshot isolation is that a transaction sees the data items that it
reads based on the committed values of the items in the database snapshot (or
database state) when the transaction starts.
 Snapshot isolation will ensure that the phantom record problem does not occur, since
the database transaction, or in some cases the database statement, will only see the
records that were committed in the database at the time the transaction starts.
 Any insertions, deletions, or updates that occur after the transaction starts will not be
seen by the transaction.
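
The snapshot idea can be sketched with a toy multiversion store in which every committed write carries a commit timestamp. The `SnapshotStore` class and its methods are hypothetical and leave out write-conflict detection, which practical snapshot isolation protocols also need.

```python
class SnapshotStore:
    """Toy multiversion store: each committed write is tagged with a commit timestamp."""
    def __init__(self):
        self.clock = 0
        self.versions = {}   # key -> list of (commit_ts, value); value None = deleted

    def commit(self, writes):
        """Atomically install a dict of writes as one committed transaction."""
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))

    def begin(self):
        """Start a transaction: its snapshot is the state as of the current clock."""
        return self.clock

    def read(self, snapshot_ts, key):
        """Return the latest value committed at or before the snapshot, else None."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

    def scan(self, snapshot_ts):
        """Return all keys with a non-deleted value visible in the snapshot."""
        return sorted(k for k in self.versions if self.read(snapshot_ts, k) is not None)

store = SnapshotStore()
store.commit({"row1": 10, "row2": 20})
t1 = store.begin()                       # T1's snapshot
before = store.scan(t1)
store.commit({"row3": 30, "row1": 11})   # another transaction commits later
after = store.scan(t1)                   # T1 still sees its original snapshot
fresh = store.scan(store.begin())        # a new transaction sees the new row
```

T1's second scan returns the same rows as its first, so the newly inserted row never appears to it as a phantom.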
