UNIT 4
Often, a collection of several operations on the database appears to be a single unit from the
point of view of the database user. For example, a transfer of funds from a checking account to a
savings account is a single operation from the customer’s standpoint; within the database system,
however, it consists of several operations.
Clearly, it is essential that all these operations occur, or that, in case of a failure, none occur. It
would be unacceptable if the checking account were debited, but the savings account were not
credited.
Collections of operations that form a single logical unit of work are called transactions. A
database system must ensure proper execution of transactions despite failures: either the entire
transaction executes, or none of it does. Furthermore, it must manage concurrent execution of
transactions in a way that avoids the introduction of inconsistency. In our funds-transfer
example, a transaction computing the customer’s total money might see the checking-account
balance before it is debited by the funds-transfer transaction, but see the savings balance after it is
credited. As a result, it would obtain an incorrect result.
Each high-level operation can be divided into a number of low-level tasks or operations. For
example, a data update operation can be divided into three tasks −
read(X) − reads the value of data item X from storage into main memory.
modify − changes the value of X in main memory.
write(X) − writes the modified value from main memory back to storage.
Transaction Operations
commit − A signal to specify that the transaction has been successfully completed in its
entirety and will not be undone.
rollback − A signal to specify that the transaction has been unsuccessful and so all
temporary changes in the database are undone. A committed transaction cannot be rolled
back.
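The following is a minimal sketch of these two signals using Python's built-in sqlite3 module; the account table and the balances are assumptions made only for this illustration. The debit and credit either become permanent together with commit, or are undone together with rollback.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")
conn.commit()

try:
    # The two updates form one logical unit of work (a funds transfer).
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()      # commit: the transaction completed; its changes will not be undone
except Exception:
    conn.rollback()    # rollback: the transaction failed; all temporary changes are undone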
Transaction States
A transaction may pass through a subset of five states: active, partially committed, committed,
failed, and aborted.
Active − The initial state; the transaction remains in this state while it is executing read,
write, or other operations.
Partially Committed − The transaction enters this state after the last statement of the
transaction has been executed.
Committed − The transaction enters this state after successful completion of the
transaction, when the system checks have issued the commit signal.
Failed − The transaction moves from the partially committed or active state to the failed
state when it is discovered that normal execution can no longer proceed or the system
checks fail.
Aborted − This is the state after the transaction has been rolled back following a failure and the
database has been restored to the state it was in before the transaction began.
The following state transition diagram depicts the states of a transaction and the low-level
transaction operations that cause the changes in state.
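The transitions can also be written down directly. The following minimal Python sketch encodes the five states and the events that move between them, following the descriptions above; the event names are illustrative assumptions, not part of any standard API.

from enum import Enum

class TxnState(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"

# Allowed state transitions, following the descriptions above.
TRANSITIONS = {
    (TxnState.ACTIVE, "last statement executed"): TxnState.PARTIALLY_COMMITTED,
    (TxnState.ACTIVE, "failure detected"): TxnState.FAILED,
    (TxnState.PARTIALLY_COMMITTED, "commit"): TxnState.COMMITTED,
    (TxnState.PARTIALLY_COMMITTED, "failure detected"): TxnState.FAILED,
    (TxnState.FAILED, "rollback"): TxnState.ABORTED,
}

def next_state(state, event):
    """Return the next state for an event, or raise KeyError if the transition is not allowed."""
    return TRANSITIONS[(state, event)]

print(next_state(TxnState.ACTIVE, "last statement executed"))   # TxnState.PARTIALLY_COMMITTED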
Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation, and
Durability.
Atomicity − This property states that a transaction is an atomic unit of processing, that
is, either it is performed in its entirety or not performed at all. No partial update should
exist.
Consistency − A transaction should take the database from one consistent state to
another consistent state. It should not adversely affect any data item in the database.
Isolation − A transaction should be executed as if it were the only one in the system. There
should not be any interference from other transactions running concurrently.
Durability − If a committed transaction brings about a change, that change should be
durable in the database and not lost in case of any failure.
To gain a better understanding of ACID properties and the need for them, consider a simplified
banking system consisting of several accounts and a set of transactions that access and update
those accounts. For the time being, we assume that the database permanently resides on disk, but
that some portion of it is temporarily residing in main memory. Transactions access data using
two operations:
read(X), which transfers the data item X from the database to a local buffer belonging to the
transaction that executed the read operation.
write(X), which transfers the data item X from the local buffer of the transaction that
executed the write back to the database.
In a real database system, the write operation does not necessarily result in the immediate update
of the data on the disk; the write operation may be temporarily stored in memory and executed
on the disk later. For now, however, we shall assume that the write operation updates the
database immediately. We shall return to this subject in Recovery System.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be
defined as
Ti: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Let us now consider each of the ACID requirements. (For ease of presentation, we consider them
in an order different from the order A-C-I-D).
Consistency: The consistency requirement here is that the sum of A and B be unchanged by
the execution of the transaction. Without the consistency requirement, money could be
created or destroyed by the transaction! It can be verified easily that, if the database is
consistent before an execution of the transaction, the database remains consistent after the
execution of the transaction. Ensuring consistency for an individual transaction is the
responsibility of the application programmer who codes the transaction. This task may be
facilitated by automatic testing of integrity constraints.
Atomicity: Suppose that, just before the execution of transaction Ti the values of accounts A
and B are $1000 and $2000, respectively. Now suppose that, during the execution of
transaction Ti, a failure occurs that prevents Ti from completing its execution successfully.
Examples of such failures include power failures, hardware failures, and software errors.
Further, suppose that the failure happened after the write(A) operation but before the write(B)
operation. In this case, the values of accounts A and B reflected in the database are $950 and
$2000. The system destroyed $50 as a result of this failure. In particular, we note that the sum
A + B is no longer preserved. Thus, because of the failure, the state of the system no longer
reflects a real state of the world that the database is supposed to capture. We term such a state
an inconsistent state. We must ensure that such inconsistencies are not visible in a database
system. Note, however, that the system must at some point be in an inconsistent state. Even if
transaction Ti is executed to completion, there exists a point at which the value of account A
is $950 and the value of account B is $2000, which is clearly an inconsistent state. This state,
however, is eventually replaced by the consistent state where the value of account A is $950,
and the value of account B is $2050. Thus, if the transaction never started or was guaranteed
to complete, such an inconsistent state would not be visible except during the execution of the
transaction. That is the reason for the atomicity requirement: If the atomicity property is
present, all actions of the transaction are reflected in the database, or none are. The basic idea
behind ensuring atomicity is this: The database system keeps track (on disk) of the old values
of any data on which a transaction performs a write, and, if the transaction does not complete
its execution, the database system restores the old values to make it appear as though the
transaction never executed. Ensuring atomicity is the responsibility of the database system
itself; specifically, it is handled by a component called the transaction-management
component.
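A minimal sketch of this idea is given below, with a Python dictionary standing in for the database and an in-memory record of old values standing in for the on-disk log; it illustrates only the restore-the-old-values principle, not an actual transaction-management component.

db = {"A": 1000, "B": 2000}

def transfer(db, amount, fail_midway=False):
    old_values = {}                       # old values recorded before each write
    try:
        old_values["A"] = db["A"]
        db["A"] -= amount                 # write(A)
        if fail_midway:
            raise RuntimeError("crash between write(A) and write(B)")
        old_values["B"] = db["B"]
        db["B"] += amount                 # write(B)
    except RuntimeError:
        for item, value in old_values.items():
            db[item] = value              # restore old values: as if the transaction never executed

transfer(db, 50, fail_midway=True)
print(db)   # {'A': 1000, 'B': 2000}: the partial update was undone, and the sum A + B is preserved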
Durability: Once the execution of the transaction completes successfully, and the user who
initiated the transaction has been notified that the transfer of funds has taken place, it must be
the case that no system failure will result in a loss of data corresponding to this transfer of
funds. The durability property guarantees that, once a transaction completes successfully, all
the updates that it carried out on the database persist, even if there is a system failure after the
transaction completes execution.
We assume for now that a failure of the computer system may result in loss of data in main
memory, but data written to disk are never lost. We can guarantee durability by ensuring that
either
1. The updates carried out by the transaction have been written to disk before the transaction
completes.
2. Information about the updates carried out by the transaction and written to disk is sufficient to
enable the database to reconstruct the updates when the database system is restarted after the
failure.
Ensuring durability is the responsibility of a component of the database system called the
recovery-management component. The transaction-management component and the recovery-
management component are closely related, and we describe them in Recovery System.
Isolation: Even if the consistency and atomicity properties are ensured for each transaction, if
several transactions are executed concurrently, their operations may interleave in some
undesirable way, resulting in an inconsistent state.
For example, as we saw earlier, the database is temporarily inconsistent while the transaction to
transfer funds from A to B is executing, with the deducted total written to A and the increased
total yet to be written to B. If a second concurrently running transaction reads A and B at this
intermediate point and computes A+B, it will observe an inconsistent value. Furthermore, if this
second transaction then performs updates on A and B based on the inconsistent values that it
read, the database may be left in an inconsistent state even after both transactions have
completed.
We discuss the problems caused by concurrently executing transactions in the next section. The
isolation property of a transaction ensures that the concurrent execution of transactions results in
a system state that is equivalent to a state that could have been obtained had these transactions
executed one at a time in some order. We shall return to the principles of isolation later.
Ensuring the isolation property is the responsibility of a component of the database system called
the concurrency-control component, which we discuss later, in Concurrency Control.
Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run concurrently.
Allowing multiple transactions to update data concurrently causes several complications with
consistency of the data, as we saw earlier. Ensuring consistency in spite of concurrent execution
of transactions requires extra work; it is far easier to insist that transactions run serially, that is,
one at a time, each starting only after the previous one has completed. However, there are two
good reasons for allowing concurrency:
Improved throughput and resource utilization. A transaction consists of many steps. Some
involve I/O activity; others involve CPU activity. The CPU and the disks in a computer
system can operate in parallel. Therefore, I/O activity can be done in parallel with processing
at the CPU. The parallelism of the CPU and the I/O system can therefore be exploited to run
multiple transactions in parallel. While a read or write on behalf of one transaction is in
progress on one disk, another transaction can be running in the CPU, while another disk may
be executing a read or write on behalf of a third transaction. All of this increases the
throughput of the system, that is, the number of transactions executed in a given amount of
time. Correspondingly, the processor and disk utilization also increase; in other words, the
processor and disk spend less time idle, or not performing any useful work.
Reduced waiting time. There may be a mix of transactions running on a system, some short and
some long. If transactions run serially, a short transaction may have to wait for a preceding long
transaction to complete, which can lead to unpredictable delays in running a transaction. If the
transactions are operating on different parts of the database, it is better to let them run
concurrently, sharing the CPU cycles and disk accesses among them. Concurrent execution
reduces the unpredictable delays in running transactions. Moreover, it also reduces the average
response time: the average time for a transaction to be completed after it has been submitted.
The database system must control the interaction among the concurrent transactions to prevent
them from destroying the consistency of the database. It does so through a variety of mechanisms
called concurrency-control schemes. We study concurrency-control schemes in Concurrency
Control; for now, we focus on the concept of correct concurrent execution.
Let T1 and T2 be two transactions that transfer funds from one account to another. Transaction
T1 transfers $50 from account A to account B. It is defined as
T1: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It is defined as
T2: read(A);
temp := A * 0.1;
A := A − temp;
write(A);
read(B);
B := B + temp;
write(B).
Suppose the current values of accounts A and B are $1000 and $2000, respectively. Suppose also
that the two transactions are executed one at a time in the order T1 followed by T2. This
execution sequence appears in Figure 4.2. In the figure, the sequence of instruction steps is in
chronological order from top to bottom, with instructions of T1 appearing in the left column and
instructions of T2 appearing in the right column. The final values of accounts A and B, after the
execution in Figure 4.2 takes place, are $855 and $2145, respectively. Thus, the total amount of
money in accounts A and B (that is, the sum A + B) is preserved after the execution of both
transactions.
Figure 4.2 Schedule 1—a serial schedule in which T1 is followed by T2.
Similarly, if the transactions are executed one at a time in the order T2 followed by T1, then the
corresponding execution sequence is that of Figure 4.3. Again, as expected, the sum A + B is
preserved, and the final values of accounts A and B are $850 and $2150, respectively.
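The arithmetic of these two serial executions can be checked with a short Python sketch; the two functions below simply reproduce the computations of T1 and T2 given earlier (the function names are illustrative).

def t1(a, b):
    """T1: transfer $50 from account A to account B."""
    return a - 50, b + 50

def t2(a, b):
    """T2: transfer 10 percent of A's balance from A to B."""
    temp = a * 0.1
    return a - temp, b + temp

a, b = t2(*t1(1000, 2000))   # schedule 1: T1 followed by T2
print(a, b, a + b)           # 855.0 2145.0 3000.0

a, b = t1(*t2(1000, 2000))   # schedule 2: T2 followed by T1
print(a, b, a + b)           # 850.0 2150.0 3000.0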
The execution sequences just described are called schedules. They represent the chronological
order in which instructions are executed in the system. Clearly, a schedule for a set of
transactions must consist of all instructions of those transactions, and must preserve the order in
which the instructions appear in each individual transaction. For example, in transaction T1, the
instruction write(A) must appear before the instruction read(B), in any valid schedule. In the
following discussion, we shall refer to the first execution sequence (T1 followed by T2) as
schedule 1, and to the second execution sequence (T2 followed by T1) as schedule 2. These
schedules are serial: Each serial schedule consists of a sequence of instructions from various
transactions, where the instructions belonging to one single transaction appear together in that
schedule. Thus, for a set of n transactions, there exist n! different valid serial schedules.
Figure 4.3 Schedule 2—a serial schedule in which T2 is followed by T1.
When the database system executes several transactions concurrently, the corresponding
schedule no longer needs to be serial. If two transactions are running concurrently, the operating
system may execute one transaction for a little while, then perform a context switch, execute the
second transaction for some time, and then switch back to the first transaction for some time, and
so on. With multiple transactions, the CPU time is shared among all the transactions.
Several execution sequences are possible, since the various instructions from both transactions
may now be interleaved. In general, it is not possible to predict exactly how many instructions of
a transaction will be executed before the CPU switches to another transaction. Thus, the number
of possible schedules for a set of n transactions is much larger than n!.
Figure 4.4 Schedule 3—a concurrent schedule equivalent to schedule 1.
We can ensure consistency of the database under concurrent execution by making sure that any
schedule that is executed has the same effect as a schedule that could have occurred without any
concurrent execution. That is, the schedule should, in some sense, be equivalent to a serial
schedule.
Let us consider a schedule S in which there are two consecutive instructions Ii and Ij, of
transactions Ti and Tj , respectively (i ≠ j). If Ii and Ij refer to different data items, then we can
swap Ii and Ij without affecting the results of any instruction in the schedule. However, if Ii and
Ij refer to the same data item Q, then the order of the two steps may matter. Since we are dealing
with only read and write instructions, there are four cases that we need to consider:
1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is
read by Ti and Tj , regardless of the order.
2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is
written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written
by Tj. Thus, the order of Ii and Ij matters.
3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those of the
previous case.
4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these
instructions does not affect either Ti or Tj . However, the value obtained by the next read(Q)
instruction of S is affected, since the result of only the latter of the two write instructions is
preserved in the database. If there is no other write(Q) instruction after Ii and Ij in S, then the
order of Ii and Ij directly affects the final value of Q in the database state that results from
schedule S.
Thus, only in the case where both Ii and Ij are read instructions does the relative order of their
execution not matter. We say that Ii and Ij conflict if they are operations by different transactions
on the same data item, and at least one of these instructions is a write operation. To illustrate the
concept of conflicting instructions, we consider schedule 3, in Figure 4.4. The write(A) instruction of
T1 conflicts with the read(A) instruction of T2. However, the write(A) instruction of T2 does not
conflict with the read(B) instruction of T1, because the two instructions access different data
items.
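The conflict test itself is mechanical. A minimal Python sketch is given below, where an operation is represented as a (transaction, action, item) triple; this encoding is an assumption made only for illustration.

def conflicts(op1, op2):
    """Two operations conflict if they belong to different transactions,
    access the same data item, and at least one of them is a write."""
    t1, act1, item1 = op1
    t2, act2, item2 = op2
    return t1 != t2 and item1 == item2 and "write" in (act1, act2)

print(conflicts(("T1", "write", "A"), ("T2", "read", "A")))   # True
print(conflicts(("T2", "write", "A"), ("T1", "read", "B")))   # False (different data items)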
Since the write(A) instruction of T2 in schedule 3 of Figure 4.4 does not conflict with the read(B)
instruction of T1, we can swap these instructions to generate an equivalent schedule, schedule 5.
Regardless of the initial system state, schedules 3 and 5 both produce the same final
system state. We continue to swap nonconflicting instructions:
The final result of these swaps, schedule 6 of Figure 4.8, is a serial schedule. Thus, we have
shown that schedule 3 is equivalent to a serial schedule. This equivalence implies that, regardless
of the initial system state, schedule 3 will produce the same final state as will some serial
schedule.
Finally, consider schedule 7, which consists of only the significant operations (that is, the
read and write) of transactions T3 and T4. This schedule is not conflict serializable, since it is not
equivalent to either the serial schedule <T3,T4> or the serial schedule <T4,T3>.
It is possible to have two schedules that produce the same outcome, but that are not conflict
equivalent. For example, consider transaction T5, which transfers $10 from account B to account
A. Let schedule 8 be as defined in the corresponding figure. We claim that schedule 8 is not conflict equivalent to
the serial schedule <T1,T5>, since, in schedule 8, the write(B) instruction of T5 conflicts with
the read(B) instruction of T1.
Thus, we cannot move all the instructions of T1 before those of T5 by swapping consecutive
nonconflicting instructions. However, the final values of accounts A and B after the execution of
either schedule 8 or the serial schedule <T1,T5> are the same: $960 and $2040, respectively.
View Serializability
In this section, we consider a form of equivalence that is less stringent than conflict equivalence,
but that, like conflict equivalence, is based on only the read and write operations of transactions.
Consider two schedules S and S', where the same set of transactions participates in both
schedules. The schedules S and S' are said to be view equivalent if three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S', also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was
produced by a write(Q) operation executed by transaction Tj , then the read(Q) operation of
transaction Ti must, in schedule S', also read the value of Q that was produced by the same
write(Q) operation of transaction Tj .
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in
schedule S must perform the final write(Q) operation in schedule S'.
Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and,
therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2, ensures
that both schedules result in the same final system state.
In our previous examples, schedule 1 is not view equivalent to schedule 2, since, in schedule 1,
the value of account A read by transaction T2 was produced by T1, whereas this case does not
hold in schedule 2. However, schedule 1 is view equivalent to schedule 3, because the values of
account A and B read by transaction T2 were produced by T1 in both schedules.
The concept of view equivalence leads to the concept of view serializability. We say that a
schedule S is view serializable if it is view equivalent to a serial schedule. As an illustration,
suppose that we augment schedule 7 with transaction T6, and obtain schedule 9.
Schedule 9 is view serializable. Indeed, it is view equivalent to the serial schedule <T3, T4, T6>,
since the one read(Q) instruction reads the initial value of Q in both schedules, and T6 performs
the final write of Q in both schedules.
Every conflict-serializable schedule is also view serializable, but there are view-serializable
schedules that are not conflict serializable. Indeed, schedule 9 is not conflict serializable, since
every pair of consecutive instructions conflicts, and, thus, no swapping of instructions is
possible.
Observe that, in schedule 9, transactions T4 and T6 perform write(Q) operations without having
performed a read(Q) operation. Writes of this sort are called blind writes. Blind writes appear in
any view-serializable schedule that is not conflict serializable.
A simple and efficient method for determining conflict serializability of a schedule is to construct
a directed graph, called a precedence graph, from the schedule: the vertices are the transactions,
and the graph contains an edge Ti → Tj if the two transactions have conflicting operations on
some data item and an operation of Ti appears first in the schedule.
Figure 4.10 Precedence graph for (a) schedule 1 and (b) schedule 2.
If an edge Ti → Tj exists in the precedence graph, then, in any serial schedule S' equivalent to
S, Ti must appear before Tj. For example, the precedence graph for schedule 1 in Figure 4.10(a)
contains the single edge T1 → T2, since all the instructions of T1 are executed before the first
instruction of T2 is executed. Similarly, Figure 4.10(b) shows the precedence graph for schedule 2
with the single edge T2 → T1, since all the instructions of T2 are executed before the first
instruction of T1 is executed.
The precedence graph for schedule 4 contains the edge T1 → T2, because
T1 executes read(A) before T2 executes write(A). It also contains the edge T2 → T1, because
T2 executes read(B) before T1 executes write(B). If the precedence graph for S has a cycle,
then schedule S is not conflict serializable.
If the graph contains no cycles, then the schedule S is conflict serializable. A serializability
order of the transactions can be obtained through topological sorting, which determines a
linear order consistent with the partial order of the precedence graph. There are, in general,
several possible linear orders that can be obtained through a topological sorting. For example,
the graph of part (a) of the corresponding figure has the two acceptable linear orderings shown in parts (b) and (c).
Thus, to test for conflict serializability, we need to construct the precedence graph and to
invoke a cycle-detection algorithm. Cycle-detection algorithms can be found in standard
textbooks on algorithms. Cycle-detection algorithms, such as those based on depth-first
search, require on the order of n² operations, where n is the number of vertices in the graph
(that is, the number of transactions). Thus, we have a practical scheme for determining conflict
serializability.
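A minimal Python sketch of this test is shown below: it builds the precedence graph from a schedule of (transaction, action, item) steps, and a depth-first search either detects a cycle or produces one topological (serializability) order. The schedule encoding and transaction names are illustrative assumptions.

def precedence_graph(schedule):
    """Add an edge Ti -> Tj whenever an operation of Ti conflicts with a later
    operation of Tj (different transactions, same item, at least one write)."""
    edges = {t: set() for t, _, _ in schedule}
    for i, (ti, acti, itemi) in enumerate(schedule):
        for tj, actj, itemj in schedule[i + 1:]:
            if ti != tj and itemi == itemj and "write" in (acti, actj):
                edges[ti].add(tj)
    return edges

def serializability_order(edges):
    """Return a serial order if the graph is acyclic, or None if it contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in edges}
    order = []

    def visit(t):
        colour[t] = GREY
        for u in edges[t]:
            if colour[u] == GREY:                      # back edge: cycle found
                return False
            if colour[u] == WHITE and not visit(u):
                return False
        colour[t] = BLACK
        order.append(t)
        return True

    for t in edges:
        if colour[t] == WHITE and not visit(t):
            return None
    return list(reversed(order))

# Schedule 1 (T1 followed by T2) yields the single edge T1 -> T2 and the order ['T1', 'T2'].
s1 = [("T1", "read", "A"), ("T1", "write", "A"), ("T1", "read", "B"), ("T1", "write", "B"),
      ("T2", "read", "A"), ("T2", "write", "A"), ("T2", "read", "B"), ("T2", "write", "B")]
print(serializability_order(precedence_graph(s1)))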
Returning to our previous examples, note that the precedence graphs for schedules 1 and 2
indeed do not contain cycles. The precedence graph for schedule 4, on the other hand,
contains a cycle, indicating that this schedule is not conflict serializable. Testing for view
serializability is rather complicated. In fact, it has been shown that the problem of testing for
view serializability is itself NP-complete. Thus, almost certainly there exists no efficient
algorithm to test for view serializability. See the bibliographical notes for references on
testing for view serializability. However, concurrency-control schemes can still use sufficient
conditions for view serializability. That is, if the sufficient conditions are satisfied, the
schedule is view serializable, but there may be view-serializable schedules that do not satisfy
the sufficient conditions.
So far, we have studied what schedules are acceptable from the viewpoint of consistency of the
database, assuming implicitly that there are no transaction failures. We now address the effect of
transaction failures during concurrent execution. If a transaction Ti fails, for whatever reason, we
need to undo the effect of this transaction to ensure the atomicity property of the transaction. In a
system that allows concurrent execution, it is necessary also to ensure that any transaction Tj that
is dependent on Ti (that is, Tj has read data written by Ti) is also aborted. To achieve this, we
need to place restrictions on the type of schedules permitted in the system.
In the following two subsections, we address the issue of what schedules are acceptable from the
viewpoint of recovery from transaction failure. We describe in Concurrency Control how to
ensure that only such acceptable schedules are generated.
Recoverable Schedules
Consider schedule 11, in which T9 is a transaction that performs only one instruction:
read(A). Suppose that the system allows T9 to commit immediately after executing the read(A)
instruction. Thus, T9 commits before T8 does. Now suppose that T8 fails before it commits.
Since T9 has read the value of data item A written by T8, we must abort T9 to ensure transaction
atomicity. However, T9 has already committed and cannot be aborted. Thus, we have a situation
where it is impossible to recover correctly from the failure of T8.
Schedule 11, with the commit happening immediately after the read(A) instruction, is an
example of a nonrecoverable schedule, which should not be allowed. Most database systems
require that all schedules be recoverable. A recoverable schedule is one where, for each pair of
transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the commit operation of Tj.
Cascadeless Schedules
Even if a schedule is recoverable, to recover correctly from the failure of a transaction Ti, we
may have to roll back several transactions. Such situations occur if transactions have read data
written by Ti. As an illustration, consider a partial schedule in which transaction T10 writes
a value of A that is read by transaction T11.
Transaction T11 writes a value of A that is read by transaction T12. Suppose that, at this point,
T10 fails. T10 must be rolled back. Since T11 is dependent on T10, T11 must be rolled back.
Since T12 is dependent on T11, T12 must be rolled back. This phenomenon, in which a single
transaction failure leads to a series of transaction rollbacks, is called cascading rollback.
A computer system, like any other device, is subject to failure from a variety of causes: disk
crash, power outage, software error, a fire in the machine room, even sabotage. In any failure,
information may be lost. Therefore, the database system must take actions in advance to ensure
that the atomicity and durability properties of transactions, introduced in Transactions, are
preserved. An integral part of a database system is a recovery scheme that can restore the
database to the consistent state that existed before the failure. The recovery scheme must also
provide high availability; that is, it must minimize the time for which the database is not usable
after a crash.
Failure Classification
There are various types of failure that may occur in a system, each of which needs to be dealt
with in a different manner. The simplest type of failure is one that does not result in the loss of
information in the system. The failures that are more difficult to deal with are those that result in
loss of information. In this chapter, we shall consider only the following types of failure:
Transaction failure. There are two types of errors that may cause a transaction to fail:
a. Logical error. The transaction can no longer continue with its normal execution because of
some internal condition, such as bad input, data not found, overflow, or resource limit
exceeded.
b. System error. The system has entered an undesirable state (for example, deadlock), as a
result of which a transaction cannot continue with its normal execution. The transaction,
however, can be reexecuted at a later time.
System crash. There is a hardware malfunction, or a bug in the database software or the
operating system, that causes the loss of the content of volatile storage, and brings transaction
processing to a halt. The content of nonvolatile storage remains intact, and is not corrupted.
The assumption that hardware errors and bugs in the software bring the system to a halt, but do
not corrupt the nonvolatile storage contents, is known as the fail-stop assumption. Well-designed
systems have numerous internal checks, at the hardware and the software level, that bring the
system to a halt when there is an error. Hence, the fail-stop assumption is a reasonable one.
Disk failure. A disk block loses its content as a result of either a head crash or failure during a
data transfer operation. Copies of the data on other disks, or archival backups on tertiary
media, such as tapes, are used to recover from the failure.
To determine how the system should recover from failures, we need to identify the failure modes
of those devices used for storing data. Next, we must consider how these failure modes affect the
contents of the database.
Consider again transaction Ti, which transfers $50 from account A to account B, and suppose
that the system crashes after the write(A) operation has reached the database but before write(B)
has. There are two possible recovery actions:
Reexecute Ti. This procedure will result in the value of A becoming $900, rather than $950.
Thus, the system enters an inconsistent state.
Do not reexecute Ti. The current system state has values of $950 and $2000 for A and B,
respectively. Thus, the system enters an inconsistent state.
In either case, the database is left in an inconsistent state, and thus this simple recovery scheme
does not work. The reason for this difficulty is that we have modified the database without
having assurance that the transaction will indeed commit. Our goal is to perform either all or no
database modifications made by Ti. However, if Ti performed multiple database modifications,
several output operations may be required, and a failure may occur after some of these
modifications have been made, but before all of them are made.
To achieve our goal of atomicity, we must first output information describing the modifications
to stable storage, without modifying the database itself. As we shall see, this procedure will
allow us to output all the modifications made by a committed transaction, despite failures. There
are two ways to perform such outputs; we shall assume that transactions are executed serially; in
other words, only a single transaction is active at a time.
The most widely used structure for recording database modifications is the log. The log is a
sequence of log records; an update log record describes a single database write, and has the
following fields:
Transaction identifier is the unique identifier of the transaction that performed the write
operation.
Data-item identifier is the unique identifier of the data item written. Typically, it is the
location on disk of the data item.
Old value is the value of the data item prior to the write.
New value is the value that the data item will have after the write.
Other special log records exist to record significant events during transaction processing, such as
the start of a transaction and the commit or abort of a transaction. We denote the various types of
log records as:
<Ti start>. Transaction Ti has started.
<Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj; Xj had value V1
before the write, and will have value V2 after the write.
<Ti commit>. Transaction Ti has committed.
<Ti abort>. Transaction Ti has aborted.
For log records to be useful for recovery from system and disk failures, the log must reside in
stable storage. For now, we assume that every log record is written to the end of the log on stable
storage as soon as it is created. Observe that the log contains a complete record of all database
activity. As a result, the volume of data stored in the log may become unreasonably large.
The version of the deferred-modification technique that we describe in this section assumes that
transactions are executed serially. When a transaction partially commits, the information on the
log associated with the transaction is used in executing the deferred writes. If the system crashes
before the transaction completes its execution, or if the transaction aborts, then the information
on the log is simply ignored.
The execution of transaction Ti proceeds as follows. Before Ti starts its execution, a record <Ti
start> is written to the log. A write(X) operation by Ti results in the writing of a new record to
the log. Finally, when Ti partially commits, a record <Ti commit> is written to the log.
When transaction Ti partially commits, the records associated with it in the log are used in
executing the deferred writes. Since a failure may occur while this updating is taking place, we
must ensure that, before the start of these updates, all the log records are written out to stable
storage. Once they have been written, the actual updating takes place, and the transaction enters
the committed state.
Observe that only the new value of the data item is required by the deferred-modification
technique. Thus, we can simplify the general update-log record structure that we saw in the
previous section, by omitting the old-value field. To illustrate, reconsider our simplified banking
system. Let T0 be a transaction that transfers $50 from account A to account B:
T0: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Let T1 be a transaction that withdraws $100 from account C:
T1: read(C);
C := C − 100;
write(C).
Suppose that these transactions are executed serially, in the order T0 followed by T1, and that the
values of accounts A, B, and C before the execution took place were $1000, $2000, and $700,
respectively. The portion of the log containing the relevant information on these two transactions
appears below. There are various orders in which the actual outputs can take place to both the
database system and the log as a result of the execution of T0 and T1. Note that the value of A is
changed in the database only after the record <T0, A, 950> has been placed in the log.
<T0 start>
<T0 , A, 950>
<T0 , B, 2050>
<T0 commit>
<T1 start>
<T1 , C, 600>
<T1 commit>
Figure: Portion of the database log corresponding to T0 and T1.
Using the log, the system can handle any failure that results in the loss of information on volatile
storage. The recovery scheme uses the following recovery procedure:
redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
The set of data items updated by Ti and their respective new values can be found in the log. The
redo operation must be idempotent; that is, executing it several times must be equivalent to
executing it once. This characteristic is required if we are to guarantee correct behavior even if a
failure occurs during the recovery process.
After a failure, the recovery subsystem consults the log to determine which transactions need to
be redone. Transaction Ti needs to be redone if and only if the log contains both the record <Ti
start> and the record <Ti commit>. Thus, if the system crashes after the transaction completes its
execution, the recovery scheme uses the information in the log to restore the system to a previous
consistent state after the transaction had completed.
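A minimal sketch of this redo-only scan is shown below in Python, with the log as a list of simplified records and a dictionary standing in for the on-disk database; the record encoding is an assumption for illustration only.

def recover_deferred(log, db):
    """Redo a transaction iff the log contains both its start and commit records."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    redo = started & committed
    for rec in log:
        if rec[0] == "write" and rec[1] in redo:
            _, txn, item, new_value = rec
            db[item] = new_value       # idempotent: repeating it after a second crash is harmless

db = {"A": 1000, "B": 2000, "C": 700}
log = [("start", "T0"), ("write", "T0", "A", 950), ("write", "T0", "B", 2050),
       ("commit", "T0"), ("start", "T1"), ("write", "T1", "C", 600)]
recover_deferred(log, db)
print(db)   # {'A': 950, 'B': 2050, 'C': 700}: T0 is redone, the incomplete T1 is ignored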
As an illustration, let us return to our banking example with transactions T0 and T1 executed one
after the other in the order T0 followed by T1. Figure shows the log that results from the
complete execution of T0 and T1. Let us suppose that the system crashes before the completion
of the transactions, so that we can see how the recovery technique restores the database to a
consistent state. Assume that the crash occurs just after the log record for the step
write(B)
of transaction T0 has been written to stable storage.
Figure 4.15 State of the log and database corresponding to T0 and T1.
The log at the time of the crash appears in Figure 4.15(a). When the system comes back up, no
redo actions need to be taken, since no commit record appears in the log. The values of accounts
A and B remain $1000 and $2000, respectively.
The log records of the incomplete transaction T0 can be deleted from the log.
Now, let us assume the crash comes just after the log record for the step
write(C)
of transaction T1 has been written to stable storage. In this case, the log at the time of the crash is
as in Figure 4.15(b). When the system comes back up, the operation redo(T0) is performed, since the
record
<T0 commit>
appears in the log on the disk. After this operation is executed, the values of accounts A and B
are $950 and $2050, respectively. The value of account C remains $700. As before, the log
records of the incomplete transaction T1 can be deleted from the log. Finally, assume that a crash
occurs just after the log record
<T1 commit>
is written to stable storage. The log at the time of this crash is as in Figure 4.15(c). When the system
comes back up, two commit records are in the log: one for T0 and one for T1. Therefore, the
system must perform operations redo(T0) and redo(T1), in the order in which their commit
records appear in the log. After the system executes these operations, the values of accounts A,
B, and C are $950, $2050, and $600, respectively. Finally, let us consider a case in which a
second system crash occurs during recovery from the first crash. Some changes may have been
made to the database as a result of the redo operations, but all changes may not have been made.
When the system comes up after the second crash, recovery proceeds exactly as in the preceding
examples. For each commit record
<Ti commit>
found in the log, the system performs the operation redo(Ti). In other words, it restarts the
recovery actions from the beginning. Since redo writes values to the database independent of the
values currently in the database, the result of a successful second attempt at redo is the same as
though redo had succeeded the first time.
Before a transaction Ti starts its execution, the system writes the record <Ti start> to the log.
During its execution, any write(X) operation by Ti is preceded by the writing of the appropriate
new update record to the log. When Ti partially commits, the system writes the record <Ti
commit> to the log.
Since the information in the log is used in reconstructing the state of the database, we cannot
allow the actual update to the database to take place before the corresponding log record is
written out to stable storage. We therefore require that, before execution of an output(B)
operation, the log records corresponding to B be written onto stable storage.
As an illustration, let us reconsider our simplified banking system, with transactions T0 and T1
executed one after the other in the order T0 followed by T1. The portion of the log containing the
relevant information concerning these two transactions appears below.
The corresponding figure shows one possible order in which the actual outputs took place in both the database
system and the log as a result of the execution of T0 and T1.
<T0 start>
<T0 , A, 1000, 950>
<T0 , B, 2000, 2050>
<T0 commit>
<T1 start>
<T1 , C, 700, 600>
<T1 commit>
Figure : Portion of the system log corresponding to T0 and T1.
4.7 Checkpoints
When a system failure occurs, we must consult the log to determine those transactions that need
to be redone and those that need to be undone. In principle, we need to search the entire log to
determine this information. There are two major difficulties with this approach:
1. The search process is time consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already
written their updates into the database. Although redoing them will cause no harm, it will
nevertheless cause recovery to take longer.
To reduce these types of overhead, we introduce checkpoints. During execution, the system
maintains the log, using one of the two techniques described above. In addition, the system
periodically performs checkpoints, which require the following sequence of actions to take place:
1. Output onto stable storage all log records currently residing in main memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>. Transactions are not allowed to
perform any update actions, such as writing to a buffer block or writing a log record, while a
checkpoint is in progress.
The presence of a <checkpoint> record in the log allows the system to streamline its recovery
procedure. Consider a transaction Ti that committed prior to the checkpoint. For such a
transaction, the <Ti commit> record appears in the log before the <checkpoint> record. Any
database modifications made by Ti must have been written to the database either prior to the
checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform
a redo operation on Ti. This observation allows us to refine our previous recovery schemes. (We
continue to assume that transactions are run serially.) After a failure has occurred, the recovery
scheme examines the log to determine the most recent transaction Ti that started executing
before the most recent checkpoint took place. It can find such a transaction by searching the log
backward, from the end of the log, until it finds the first <checkpoint> record (since we are
searching backward, the record found is the final <checkpoint> record in the log); then it
continues the search backward until it finds the next <Ti start> record. This record identifies a
transaction Ti.
Once the system has identified transaction Ti, the redo and undo operations need to be applied to
only transaction Ti and all transactions Tj that started executing after transaction Ti. Let us
denote these transactions by the set T. The remainder (earlier part) of the log can be ignored, and
can be erased whenever desired. The exact recovery operations to be performed depend on the
modification technique being used. For the immediate-modification technique, the recovery
operations are:
For all transactions Tk in T that have no <Tk commit> record in the log, execute undo(Tk).
For all transactions Tk in T such that the record <Tk commit> appears in the log, execute
redo(Tk).
Obviously, the undo operation does not need to be applied when the deferred-modification
technique is being employed.
As an illustration, consider the set of transactions {T0, T1, . . ., T100} executed in the order
of the subscripts. Suppose that the most recent checkpoint took place during the execution of
transaction T67. Thus, only transactions T67, T68, . . ., T100 need to be considered during the
recovery scheme. Each of them needs to be redone if it has committed; otherwise, it needs to
be undone.
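A minimal Python sketch of the backward scan described above: it locates the final <checkpoint> record, finds the most recent transaction that started before it, and returns the set T of transactions that the recovery scheme must still consider. The log encoding is the same illustrative one used in the earlier sketches.

def transactions_to_consider(log):
    """log: list of records such as ('start', T), ('commit', T), ('checkpoint',)."""
    cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
    start_of_ti = cp                                 # default: no transaction started before the checkpoint
    for i in range(cp - 1, -1, -1):                  # search backward from the checkpoint
        if log[i][0] == "start":
            start_of_ti = i                          # the most recent start record before the checkpoint
            break
    return {rec[1] for rec in log[start_of_ti:] if rec[0] == "start"}

log = [("start", "T66"), ("commit", "T66"),
       ("start", "T67"), ("checkpoint",), ("commit", "T67"),
       ("start", "T68"), ("start", "T69")]
print(transactions_to_consider(log))   # {'T67', 'T68', 'T69'}; the earlier part of the log is ignored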
Figure 4.17 State of system log and database corresponding to T0 and T1.
Using the log, the system can handle any failure that does not result in the loss of information in
nonvolatile storage. The recovery scheme uses two recovery procedures:
undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
The set of data items updated by Ti and their respective old and new values can be found in the
log. The undo and redo operations must be idempotent to guarantee correct behavior even if a
failure occurs during the recovery process. After a failure has occurred, the recovery scheme
consults the log to determine which transactions need to be redone, and which need to be
undone:
Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not
contain the record <Ti commit>.
Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record
<Ti commit>.
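A minimal sketch of this undo/redo decision is given below, again over a list of simplified log records (this time carrying both the old and the new value); undo is applied by scanning the log backward, then redo by scanning forward, mirroring the order used in the text.

def recover_immediate(log, db):
    """Records: ('start', T), ('write', T, item, old, new), ('commit', T)."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    undo, redo = started - committed, started & committed
    for rec in reversed(log):                      # undo(Ti): restore old values, scanning backward
        if rec[0] == "write" and rec[1] in undo:
            _, txn, item, old, new = rec
            db[item] = old
    for rec in log:                                # redo(Ti): install new values, scanning forward
        if rec[0] == "write" and rec[1] in redo:
            _, txn, item, old, new = rec
            db[item] = new

db = {"A": 950, "B": 2000, "C": 700}               # crash after write(A) of T0 reached the disk
log = [("start", "T0"), ("write", "T0", "A", 1000, 950)]
recover_immediate(log, db)
print(db)   # undo(T0) restores A: {'A': 1000, 'B': 2000, 'C': 700}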
As an illustration, return to our banking example, with transactions T0 and T1 executed one after
the other in the order T0 followed by T1. Suppose that the system crashes before the completion
of the transactions. We shall consider three cases. The state of the logs for each of these cases
appears in the corresponding figure. First, let us assume that the crash occurs just after the log record for the step
write(B)
of transaction T0 has been written to stable storage. When the system comes back up, it finds the
record <T0 start> in the log, but no corresponding <T0 commit> record. Thus, transaction T0
must be undone, so an undo(T0) is performed. As a result, the values in accounts A and B (on the
disk) are restored to $1000 and $2000, respectively. Next, let us assume that the crash comes just
after the log record for the step
write(C)
of transaction T1 has been written to stable storage. When the system comes back up, two
recovery actions need to be taken. The operation undo(T1) must be performed, since the record
<T1 start> appears in the log, but there is no record <T1 commit>. The operation redo(T0) must
be performed, since the log contains both the record <T0 start> and the record <T0 commit>. At
the end of the entire recovery procedure, the values of accounts A, B, and C are $950, $2050,
and $700, respectively. Note that the undo(T1) operation is performed before the redo(T0). In
this example, the same outcome would result if the order were reversed. However, the order of
doing undo operations first, and then redo operations, is important for the recovery algorithm.
Finally, let us assume that the crash occurs just after the log record <T1 commit> has been
written to stable storage. When the system comes back up, both T0 and T1 need to be redone,
since the records <T0 start> and <T0 commit> appear in the log, as do the records <T1 start>
and <T1 commit>. After the system performs the recovery procedures redo(T0) and redo(T1),
the values in accounts A, B, and C are $950, $2050, and $600, respectively.
For example, assume a set of transactions {T0, T1, T2, ..., Tn}. T0 needs a resource X to complete
its task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is
waiting for resource Z, which is held by T0. Thus, all the processes wait for each other to release
resources. In this situation, none of the processes can finish their task. This situation is known
as a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions
involved in the deadlock are either rolled back or restarted.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the
operations that transactions are about to execute. The DBMS inspects the operations and
analyzes whether they can create a deadlock situation. If it finds that a deadlock situation might
occur, then that transaction is never allowed to be executed.
There are deadlock prevention schemes that use the timestamp ordering mechanism of transactions
in order to predetermine a deadlock situation.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item), which is already held
with a conflicting lock by another transaction, then one of the two possibilities may occur −
If TS(Ti) < TS(Tj) − that is, Ti, which is requesting a conflicting lock, is older than Tj −
then Ti is allowed to wait until the data item is available.
If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a
random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item) that is already held
with a conflicting lock by some other transaction, one of two possibilities may occur −
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is
restarted later with a random delay but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an
item held by a younger one, the older transaction forces the younger one to abort and release the
item.
In both cases, the transaction that entered the system later is the one that is aborted.
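The two decisions can be summarized in a small Python sketch, where a smaller timestamp means an older transaction; the return strings are purely illustrative.

def wait_die(ts_requester, ts_holder):
    """Ti (requester) asks for an item held by Tj (holder): older waits, younger dies."""
    return "wait" if ts_requester < ts_holder else "die and restart later with the same timestamp"

def wound_wait(ts_requester, ts_holder):
    """Ti (requester) asks for an item held by Tj (holder): older wounds Tj, younger waits."""
    return "wound the holder (it restarts later)" if ts_requester < ts_holder else "wait"

print(wait_die(5, 9))     # older requester  -> 'wait'
print(wait_die(9, 5))     # younger requester -> 'die and restart later with the same timestamp'
print(wound_wait(5, 9))   # older requester  -> 'wound the holder (it restarts later)'
print(wound_wait(9, 5))   # younger requester -> 'wait'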
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance
mechanisms can be used to detect any deadlock situation in advance. Methods like the "wait-for
graph" are available, but they are suitable only for systems where transactions are lightweight
and hold few instances of each resource. In a heavily loaded system, deadlock prevention
techniques may work better.
Wait-for Graph
This is a simple method available to track whether any deadlock situation may arise. For each
transaction entering the system, a node is created. When a transaction Ti requests a lock
on an item, say X, which is held by some other transaction Tj, a directed edge is created from
Ti to Tj. If Tj releases item X, the edge between them is dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data items held
by others. The system keeps checking if there's any cycle in the graph. If a cycle is found, either
of the following two approaches can be used −
First, do not allow any request for an item which is already locked by another
transaction. This is not always feasible and may cause starvation, where a transaction
indefinitely waits for a data item and can never acquire it.
The second option is to roll back one of the transactions. It is not always feasible to roll
back the younger transaction, as it may be more important than the older one. With the help of
a suitable algorithm, the transaction to be aborted is chosen. This transaction
is known as the victim and the process is known as victim selection.
A distributed database is a collection of multiple, logically interrelated databases that are
physically spread across various locations and communicate via a computer network. Its main
features are listed below.
Features
Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with
a transaction processing system.
Advantages of Distributed Databases
Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the
existing functioning. However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the distributed system,
with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to
a halt. However, in distributed systems, when a component fails, the functioning of the system
continues, though possibly at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met
from local data itself, thus providing faster response. On the other hand, in centralized systems,
all queries have to pass through the central computer for processing, which increases the
response time.
Lower Communication Cost − In distributed database systems, if data is located locally where
it is mostly used, then the communication costs for data manipulation can be minimized. This is
not feasible in centralized systems.
Disadvantages of Distributed Databases
Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several
sites.
Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the
sites.
Data integrity − The need for updating data at multiple sites poses problems of data
integrity.
Overheads for improper data distribution − Responsiveness of queries is largely
dependent upon proper data distribution. Improper data distribution often leads to very
slow response to user requests.
In this section, we will see how concurrency control techniques such as locking and timestamp
ordering are implemented in a distributed database system.
The basic principle of distributed two-phase locking is the same as that of the basic two-phase
locking protocol. However, in a distributed system there are sites designated as lock managers. A lock
manager controls lock acquisition requests from transaction monitors. In order to enforce co-
ordination between the lock managers in various sites, at least one site is given the authority to
see all transactions and detect lock conflicts.
Depending upon the number of sites that can detect lock conflicts, distributed two-phase
locking approaches can be of three types −
Centralized two-phase locking − In this approach, one site is designated as the central
lock manager. All the sites in the environment know the location of the central lock
manager and obtain locks from it during transactions.
Primary copy two-phase locking − In this approach, a number of sites are designated as
lock control centers. Each of these sites has the responsibility of managing a defined set
of locks. All the sites know which lock control center is responsible for managing the locks
of which data table/fragment item.
Distributed two-phase locking − In this approach, there are a number of lock managers,
where each lock manager controls locks of data items stored at its local site. The
location of the lock manager is based upon data distribution and replication.
For implementing timestamp ordering algorithms, each site has a scheduler that maintains a
separate queue for each transaction manager. During a transaction, a transaction manager sends a
lock request to the site's scheduler. The scheduler puts the request in the corresponding queue in
increasing timestamp order. Requests are processed from the front of the queues in the order of
their timestamps, i.e. the oldest first.
Conflict Graphs
Another method is to create conflict graphs. For this, transaction classes are defined. A
transaction class contains two sets of data items, called the read set and the write set. A transaction
belongs to a particular class if the transaction’s read set is a subset of the class’ read set and the
transaction’s write set is a subset of the class’ write set. In the read phase, each transaction
issues its read requests for the data items in its read set. In the write phase, each transaction
issues its write requests.
A conflict graph is created for the classes to which active transactions belong. This contains a
set of vertical, horizontal, and diagonal edges. A vertical edge connects two nodes within a class
and denotes conflicts within the class. A horizontal edge connects two nodes across two classes
and denotes a write-write conflict among different classes. A diagonal edge connects two nodes
across two classes and denotes a write-read or a read-write conflict among two classes.
The conflict graphs are analyzed to ascertain whether two transactions within the same class or
across two different classes can be run in parallel.
Distributed optimistic concurrency control extends validation-based (optimistic) concurrency
control by applying the following two rules −
Rule 1 − According to this rule, a transaction must be validated locally at all sites where it
executes. If a transaction is found to be invalid at any site, it is aborted. Local validation
guarantees that the transaction maintains serializability at the sites where it has been executed.
Rule 2 − According to this rule, after a transaction passes local validation test, it should be
globally validated. Global validation ensures that if two conflicting transactions run together at
more than one site, they should commit in the same relative order at all the sites they run
together. This may require a transaction to wait for the other conflicting transaction after
validation and before commit. This requirement makes the algorithm less optimistic since a
transaction may not be able to commit as soon as it is validated at a site.
In pre-computerization days, organizations would create physical directories of employees
and distribute them across the organization. In general, a directory is a listing of information
about some class of objects such as persons. Directories can be used to find information about a
specific object, or in the reverse direction to find objects that meet a certain requirement. In the
world of physical telephone directories, directories that satisfy lookups in the forward direction
are called white pages, while directories that satisfy lookups in the reverse direction are called
yellow pages.
A directory system is implemented as one or more servers, which service multiple clients.
Clients use the application programmer interface defined by the directory system to communicate
with the directory servers. Directory access protocols also define a data model and access
control.
The X.500 directory access protocol, defined by the International Organization for
Standardization (ISO), is a standard for accessing directory information. However, the protocol
is rather complex, and is not widely used. The Lightweight Directory Access Protocol (LDAP)
provides many of the X.500 features, but with less complexity, and is widely used.
LDAP directories store entries, which are similar to objects. Each entry must have a
distinguished name (DN), which uniquely identifies the entry. A DN is in turn made up of a
sequence of relative distinguished names (RDNs).
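For example, a hypothetical entry for a person might have the distinguished name cn=Jane Doe, ou=People, dc=example, dc=com (all names here are illustrative). The leftmost RDN, cn=Jane Doe, names the entry within its parent; each subsequent RDN names an ancestor, so the DN as a whole traces a path up the directory tree.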
Data Manipulation
Unlike SQL, LDAP does not define either a data-definition language or a data manipulation
language. However, LDAP defines a network protocol for carrying out data definition and
manipulation. Users of LDAP can either use an application programming interface, or use tools
provided by various vendors to perform data definition and manipulation. LDAP also defines a
file format called LDAP Data Interchange Format (LDIF) that can be used for storing and
exchanging information.
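As an illustration, a single hypothetical person entry might be written in LDIF as follows; the attribute names come from the commonly used inetOrgPerson object class, and all values are made up for this example.

dn: cn=Jane Doe,ou=People,dc=example,dc=com
objectClass: inetOrgPerson
cn: Jane Doe
sn: Doe
mail: jane.doe@example.com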
Distributed Directory Trees
Information about an organization may be split into multiple directory information trees (DITs),
each of which stores information about some entries. A node in a DIT may contain a referral to
another node in
another DIT. Referrals are the key component that help organize a distributed collection of
directories into an integrated system. When a server gets a query on a DIT, it may return a
referral to the client, which then issues a query on the referenced DIT. Access to the referenced
DIT is transparent, proceeding without the user’s knowledge. Alternatively, the server itself
may issue the query to the referred DIT and return the results along with locally computed
results.
The hierarchical naming mechanism used by LDAP helps break up control of information
across parts of an organization. The referral facility then helps integrate all the directories in an
organization into a single virtual directory.