Unit 4 Notes

The document discusses transaction processing concepts, defining a transaction as a unit of program execution that may access and update data items. It emphasizes the importance of ACID properties (Atomicity, Consistency, Isolation, Durability) to ensure data integrity during transaction execution, and explains the differences between serial and non-serial schedules, including conflict and view serializability. Additionally, it outlines how to check for serializability and the implications of non-serializable schedules.

Uploaded by

Diwakar Singh
Copyright
© © All Rights Reserved

UNIT - IV

TRANSACTIONS PROCESSING CONCEPTS


Transaction Concept
Transaction: A Transaction is a list of actions to perform a single logical unit of work.
or
A transaction is a unit of program execution that accesses and possibly updates various data items.
i. Two main issues to deal with:
 Failures of various kinds, such as hardware failures and system crashes
 Concurrent execution of multiple transactions

ii. Access to the DB is done by two operations: Read(x) Write(x)


E.g. Transaction to transfer $50 from account A to account B:
T1:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Transaction State

Active: The initial state; the transaction stays in this state while it is executing
Partially committed: After the final statement has been executed.
Failed: After the discovery that normal execution can no longer proceed.
Aborted: After the transaction has been rolled back and the database restored to its state prior to the start
of the transaction.
Two options after it has been aborted:
i. restart the transaction ii. kill the transaction

Committed: After successful completion. A transaction is said to have terminated if it has either
committed or aborted.
ACID Properties
To preserve the integrity of data the database system must ensure:
Atomicity: Either all operations of the transaction are properly reflected in the database or none are.
Consistency: Execution of a transaction in isolation preserves the consistency of the database.
Isolation: Although multiple transactions may execute concurrently, each transaction must be unaware
of other concurrently executing transactions. Intermediate transaction results must be hidden from other
concurrently executed transactions.
i. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution
before Ti started, or Tj started execution after Ti finished.

Durability: After a transaction completes successfully, the changes it has made to the database persist,
even if there are system failures.

Example of Fund Transfer


 Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement
 If the transaction fails after step 3 and before step 6, money will be “lost”, leading to an inconsistent
database state. The failure could be due to software or hardware.
 The system should ensure that updates of a partially executed transaction are not reflected in the
database

Durability requirement
Once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken
place), the updates to the database by the transaction must persist even if there are software or hardware
failures.
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement in above example:
The sum of A and B is unchanged by the execution of the transaction
In general, consistency requirements include
 A transaction must see a consistent database.
 During transaction execution the database may be temporarily inconsistent.
 When the transaction completes successfully the database must be consistent

Isolation requirement:
If between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it
will see an inconsistent database (the sum A + B will be less than it should be).
T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured by running transactions serially that is, one after the other. However, executing
multiple transactions concurrently has significant benefits.
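The isolation problem above can be reproduced with a few lines of Python. This is only a sketch: the starting balances (A = 1000, B = 2000) and the helper name `t2_report` are assumptions made for the illustration, not part of the notes.

```python
# Assumed starting balances, chosen only for illustration.
db = {"A": 1000, "B": 2000}

def t2_report(db):
    """T2: read(A), read(B), print(A+B). Returns the sum that T2 observes."""
    return db["A"] + db["B"]

# T1, steps 1-3: debit A and write it back.
a = db["A"]
a = a - 50
db["A"] = a

# T2 is allowed to run here, between step 3 and step 6 of T1.
seen_mid_transfer = t2_report(db)

# T1, steps 4-6: credit B and write it back.
b = db["B"]
b = b + 50
db["B"] = b

seen_after_t1 = t2_report(db)
print(seen_mid_transfer, seen_after_t1)  # 2950 3000
```

T2 sees a sum that is 50 less than it should be, although the database is consistent again once T1 finishes.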
Schedules
Schedule: A sequence of instructions that specifies the chronological order (i.e., from the earliest
to the most recent) in which instructions of concurrent transactions are executed.
 A schedule for a set of transactions must consist of all instructions of those transactions.
 It must preserve the order in which the instructions appear in each individual transaction.

A transaction that successfully completes its execution will have a commit instruction as the last statement
 by default, transaction assumed to execute commit instruction as its last step

A transaction that fails to successfully complete its execution will have an abort instruction as the last
statement.

 A schedule is the order in which the operations of multiple transactions appear for execution.
 Serial schedules are always consistent.
 Non-serial schedules are not always consistent.
Serial Schedules-

In serial schedules,

 All the transactions execute serially one after the other.


 When one transaction executes, no other transaction is allowed to execute.

Characteristics-
Serial schedules are always-

 Consistent
 Recoverable
 Cascadeless
 Strict

Example-01:

In this schedule,

 There are two transactions T1 and T2 executing serially one after the other.
 Transaction T1 executes first.
 After T1 completes its execution, transaction T2 executes.
 So, this schedule is an example of a Serial Schedule.
Non-Serial Schedules-

In non-serial schedules,

 Multiple transactions execute concurrently.


 Operations of all the transactions are interleaved or mixed with each other.

Characteristics-

Non-serial schedules are NOT always-

 Consistent
 Recoverable
 Cascadeless
 Strict

Example-01:

In this schedule,

 There are two transactions T1 and T2 executing concurrently.


 The operations of T1 and T2 are interleaved.
 So, this schedule is an example of a Non-Serial Schedule.
Transaction T1 to transfer $50 from account A to account B:
T1
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Transaction T2 transfers 10% of the balance from account A to account B:
T2
1. read(A);
2. temp:= A* 0.1;
3. A := A – temp;
4. write(A);
5. read(B);
6. B := B + temp;
7. write(B).
Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
A serial schedule in which T1 is followed by T2 :

Schedule 1.2

Schedule 2
In Schedules 1.1, 1.2 and 2, the sum A + B is preserved.
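That A + B is preserved can be checked for the two serial orders of T1 and T2. The sketch below assumes starting balances A = 1000 and B = 2000 and uses integer division for the 10% so that the arithmetic stays exact. The function names `t1`, `t2` and `run_serial` are made up for this example.

```python
def t1(db):
    """T1: transfer 50 from A to B."""
    a = db["A"]
    a = a - 50
    db["A"] = a
    b = db["B"]
    b = b + 50
    db["B"] = b

def t2(db):
    """T2: transfer 10% of the balance of A to B."""
    a = db["A"]
    temp = a // 10          # 10% of A, kept as an integer (assumption)
    a = a - temp
    db["A"] = a
    b = db["B"]
    b = b + temp
    db["B"] = b

def run_serial(order, start):
    """Run the transactions one after the other on a copy of the start state."""
    db = dict(start)
    for txn in order:
        txn(db)
    return db

start = {"A": 1000, "B": 2000}      # assumed starting balances
after_t1_t2 = run_serial([t1, t2], start)
after_t2_t1 = run_serial([t2, t1], start)
print(after_t1_t2, after_t2_t1)
```

The two serial orders leave different final balances, but in both cases the sum A + B is still 3000.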
Serializability
 Some non-serial schedules may lead to inconsistency of the database.
 Serializability is a concept that helps to identify which non-serial schedules are correct and will maintain
the consistency of the database.
Serializable Schedules-

If a given non-serial schedule of ‘n’ transactions is equivalent to some serial schedule of ‘n’ transactions,
then it is called as a serializable schedule.

Characteristics-

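Serializable schedules produce the same effect on the database as some serial schedule.

Thus, serializable schedules are always-

 Consistent

Serializability by itself says nothing about when transactions commit, so a serializable schedule may or may not be-

 Recoverable
 Cascadeless
 Strict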
Serial Schedules Vs Serializable Schedules-

Serial Schedules | Serializable Schedules

No concurrency is allowed. Thus, all the transactions necessarily execute serially one after the other. | Concurrency is allowed. Thus, multiple transactions can execute concurrently.

Serial schedules lead to less resource utilization and CPU throughput. | Serializable schedules improve both resource utilization and CPU throughput.

Serial schedules are less efficient as compared to serializable schedules (due to above reason). | Serializable schedules are always better than serial schedules (due to above reason).

Types of Serializability-

Serializability is mainly of two types-

1. Conflict Serializability
2. View Serializability

Basic Assumption: Each transaction preserves database consistency.


Thus, serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is Serializable if it is equivalent to a serial schedule.
Conflict Serializability-

If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting
operations, then it is called as a conflict serializable schedule.

Conflicting Operations-

Two operations are called as conflicting operations if all the following conditions hold true for them-

 Both the operations belong to different transactions


 Both the operations are on the same data item
 At least one of the two operations is a write operation

Example-

Consider the following schedule-

In this schedule,

 W1 (A) and R2 (A) are called as conflicting operations.


 This is because all the above conditions hold true for them.

Checking Whether a Schedule is Conflict Serializable Or Not-


Follow the following steps to check whether a given non-serial schedule is conflict serializable or not-

Step-01:

Find and list all the conflicting operations.

Step-02:

Start creating a precedence graph by drawing one node for each transaction.

Step-03:

 For each conflict pair Xi (V) and Yj (V), where Xi (V) appears before Yj (V) in the schedule, draw an edge
from Ti to Tj.
 This edge records that Ti must execute before Tj in any equivalent serial schedule.

Step-04:

 Check if there is any cycle formed in the graph.


 If there is no cycle found, then the schedule is conflict serializable otherwise not.
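The four steps above can be sketched in Python. The schedule representation (a list of `(transaction, operation, item)` tuples) and the two sample schedules are assumptions made for this illustration; they are not the schedules from the figures in these notes.

```python
from itertools import combinations

def precedence_edges(schedule):
    """Steps 1-3: return the edges Ti -> Tj, one for each conflicting pair.

    schedule is a list of (txn, op, item) in execution order, op is 'R' or 'W'.
    """
    edges = set()
    # combinations() keeps schedule order, so the first step is the earlier one.
    for (ti, op_i, x), (tj, op_j, y) in combinations(schedule, 2):
        if ti != tj and x == y and "W" in (op_i, op_j):
            edges.add((ti, tj))
    return edges

def has_cycle(nodes, edges):
    """Step 4: repeatedly remove nodes that have no incoming edge."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        free = [n for n in remaining if not any(v == n for (_, v) in edges)]
        if not free:
            return True            # every remaining node lies on or behind a cycle
        remaining.difference_update(free)
        edges = {(u, v) for (u, v) in edges if u in remaining}
    return False

def is_conflict_serializable(schedule):
    nodes = {txn for txn, _, _ in schedule}
    return not has_cycle(nodes, precedence_edges(schedule))

# Hypothetical schedules, for illustration only.
s_ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
        ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
s_bad = [("T1", "R", "A"), ("T2", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

print(is_conflict_serializable(s_ok), is_conflict_serializable(s_bad))  # True False
```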

Problem-01:

Check whether the given schedule S is conflict serializable or not. If yes, then determine all the possible
serialized schedules-
Solution-

Checking Whether S is Conflict Serializable Or Not-

Step-01:

List all the conflicting operations and determine the dependency between the transactions-

 R4(A) , W2(A) (T4 → T2)


 R3(A) , W2(A) (T3 → T2)
 W1(B) , R3(B) (T1 → T3)
 W1(B) , W2(B) (T1 → T2)
 R3(B) , W2(B) (T3 → T2)

Step-02:

Draw the precedence graph-


 Clearly, there exists no cycle in the precedence graph.
 Therefore, the given schedule S is conflict serializable.

Finding the Serialized Schedules-

 All the possible topological orderings of the above precedence graph will be the possible serialized
schedules.
 The topological orderings can be found by performing the Topological Sort of the above precedence
graph.

After performing the topological sort, the possible serialized schedules are-

1. T1 → T3 → T4 → T2
2. T1 → T4 → T3 → T2
3. T4 → T1 → T3 → T2
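The three orderings can be verified by brute force from the dependencies listed in Step-01. The sketch below simply tries every permutation of the four transactions and keeps those that respect every edge. This is fine for a handful of transactions but is not how a real system would do it. The function name `serial_orders` is made up for this example.

```python
from itertools import permutations

# Edges taken from the dependencies listed in Step-01 (duplicates removed).
edges = {("T4", "T2"), ("T3", "T2"), ("T1", "T3"), ("T1", "T2")}
nodes = ["T1", "T2", "T3", "T4"]

def serial_orders(nodes, edges):
    """Return every ordering in which each edge (u, v) has u before v."""
    orders = []
    for perm in permutations(nodes):
        pos = {t: i for i, t in enumerate(perm)}
        if all(pos[u] < pos[v] for u, v in edges):
            orders.append(perm)
    return orders

for order in serial_orders(nodes, edges):
    print(" → ".join(order))
```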

View Serializability-

If a given schedule is found to be view equivalent to some serial schedule, then it is called as a view
serializable schedule.

View Equivalent Schedules-

Consider two schedules S1 and S2 each consisting of two transactions T1 and T2.
Schedules S1 and S2 are called view equivalent if the following three conditions hold true for them-

Condition-01:

For each data item X, if transaction Ti reads X from the database initially in schedule S1, then in schedule
S2 also, Ti must perform the initial read of X from the database.

Thumb Rule
“Initial readers must be same for all the data items”.

Condition-02:

If transaction Ti reads a data item that has been updated by the transaction Tj in schedule S1, then in
schedule S2 also, transaction Ti must read the same data item that has been updated by the transaction
Tj.

Thumb Rule
“Write-read sequence must be same.”.

Condition-03:

For each data item X, if the final update of X is made by transaction Ti in schedule S1, then in schedule S2
also, the final update of X must be made by transaction Ti.

Thumb Rule
“Final writers must be same for all the data items”.

Checking Whether a Schedule is View Serializable Or Not-


Method-01:

Check whether the given schedule is conflict serializable or not.

 If the given schedule is conflict serializable, then it is surely view serializable. Stop and report your
answer.
 If the given schedule is not conflict serializable, then it may or may not be view serializable. Go and check
using other methods.

Thumb Rules
 All conflict serializable schedules are view serializable.
 A view serializable schedule may or may not be conflict serializable.

Method-02:

Check if there exists any blind write operation.

(Writing without reading is called as a blind write).

 If the schedule is not conflict serializable and there does not exist any blind write, then it is surely
not view serializable. Stop and report your answer.
 If there exists any blind write, then the schedule may or may not be view serializable. Go and check using
other methods.

Thumb Rule
A schedule that is not conflict serializable and has no blind write is not a view serializable schedule.

Method-03:

In this method, try finding a view equivalent serial schedule.

 By using the above three conditions, write all the dependencies.


 Then, draw a graph using those dependencies.
 If there exists no cycle in the graph, then the schedule is view serializable otherwise not.
PRACTICE PROBLEMS BASED ON VIEW SERIALIZABILITY-

Problem-01:

Check whether the given schedule S is view serializable or not-

Solution-

 We know, if a schedule is conflict serializable, then it is surely view serializable.


 So, let us check whether the given schedule is conflict serializable or not.

Checking Whether S is Conflict Serializable Or Not-

Step-01:

List all the conflicting operations and determine the dependency between the transactions-

 R1(A) , W3(A) (T1 → T3)


 R2(A) , W3(A) (T2 → T3)
 R2(A) , W1(A) (T2 → T1)
 W3(A) , W1(A) (T3 → T1)
Step-02:

Draw the precedence graph-

 Clearly, there exists a cycle in the precedence graph.


 Therefore, the given schedule S is not conflict serializable.

Now,

 Since, the given schedule S is not conflict serializable, so, it may or may not be view serializable.
 To check whether S is view serializable or not, let us use another method.
 Let us check for blind writes.

Checking for Blind Writes-

 There exists a blind write W3 (A) in the given schedule S.


 Therefore, the given schedule S may or may not be view serializable.

Now,

 To check whether S is view serializable or not, let us use another method.


 Let us derive the dependencies and then draw a dependency graph.

Drawing a Dependency Graph-

 T1 reads the initial value of A, and T3 is the first transaction to update A.


 So, T1 must execute before T3.
 Thus, we get the dependency T1 → T3.
 The final update on A is made by the transaction T1.
 So, T1 must execute after all other transactions.
 Thus, we get the dependency (T2, T3) → T1.
 There exists no write-read sequence.

Now, let us draw a dependency graph using these dependencies-

 Clearly, there exists a cycle in the dependency graph.


 Thus, we conclude that the given schedule S is not view serializable.
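The same conclusion can be checked by brute force: compare the schedule against every serial order using the three view-equivalence conditions. The schedule figure is not reproduced in these notes, so the schedule `s` below, R1(A), R2(A), W3(A), W1(A), is an assumption reconstructed from the conflict pairs listed in Step-01. The second schedule, `s_blind`, is a hypothetical example added for contrast.

```python
from itertools import permutations

def view_profile(schedule):
    """Summarise what view equivalence compares.

    Returns (reads, final_writer):
      reads        txn -> list of (item, writer it read from, or None for the initial value)
      final_writer item -> last transaction to write it
    """
    last_writer, reads = {}, {}
    for txn, op, item in schedule:
        if op == "R":
            reads.setdefault(txn, []).append((item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads, last_writer

def is_view_serializable(schedule):
    txns = sorted({t for t, _, _ in schedule})
    target = view_profile(schedule)
    for order in permutations(txns):
        # Build the serial schedule that runs the transactions in this order.
        serial = [step for t in order for step in schedule if step[0] == t]
        if view_profile(serial) == target:
            return True
    return False

# Assumed reconstruction of schedule S: R1(A), R2(A), W3(A), W1(A).
s = [("T1", "R", "A"), ("T2", "R", "A"), ("T3", "W", "A"), ("T1", "W", "A")]
# Hypothetical contrast: R1(A), W2(A), W1(A), W3(A), with blind writes by T2 and T3.
s_blind = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]

print(is_view_serializable(s), is_view_serializable(s_blind))  # False True
```

The second schedule is not conflict serializable (T1 → T2 and T2 → T1), yet it is view equivalent to the serial order T1, T2, T3.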

Non-Serializable Schedules-

 A non-serial schedule which is not serializable is called as a non-serializable schedule.


 A non-serializable schedule is not guaranteed to produce the same effect as produced by some serial
schedule on any consistent database.

Characteristics-

Non-serializable schedules-

 may or may not be consistent


 may or may not be recoverable

Irrecoverable Schedules-
If in a schedule,

 A transaction performs a dirty read operation from an uncommitted transaction


 And commits before the transaction from which it has read the value
then such a schedule is known as an Irrecoverable Schedule.

Example-

Consider the following schedule-

Here,

 T2 performs a dirty read operation.


 T2 commits before T1.
 T1 fails later and rolls back.
 The value that T2 read is now incorrect.
 T2 cannot recover since it has already committed.

Recoverable Schedules-
If in a schedule,

 A transaction performs a dirty read operation from an uncommitted transaction


 And its commit operation is delayed till the uncommitted transaction either commits or rolls back
then such a schedule is known as a Recoverable Schedule.

Here,

 The commit operation of the transaction that performs the dirty read is delayed.
 This ensures that it still has a chance to recover if the uncommitted transaction fails later.

Example-

Consider the following schedule-

Here,

 T2 performs a dirty read operation.


 The commit operation of T2 is delayed till T1 commits or rolls back.
 T1 commits later.
 T2 is now allowed to commit.
 In case T1 had failed, T2 would still have had a chance to recover by rolling back.

Checking Whether a Schedule is Recoverable or Irrecoverable-

Method-01:

Check whether the given schedule is conflict serializable or not.

 Conflict serializability constrains only the order of conflicting operations, not the order of commits.
So it does not settle recoverability on its own.
 For example, the schedule W1(A), R2(A), Commit(T2), Abort(T1) has the acyclic precedence graph T1 → T2
but is irrecoverable.
 Whatever the result of this check, go on to Method-02.

Thumb Rules
 A conflict serializable schedule may or may not be recoverable.
 A recoverable schedule may or may not be conflict serializable.

Method-02:

Check if there exists any dirty read operation.

(Reading from an uncommitted transaction is called as a dirty read)

 If there does not exist any dirty read operation, then the schedule is surely recoverable. Stop and report
your answer.
 If there exists any dirty read operation, then the schedule may or may not be recoverable.

If there exists a dirty read operation, then follow the following cases-

Case-01:

If the commit operation of the transaction performing the dirty read occurs before the commit or abort
operation of the transaction which updated the value, then the schedule is irrecoverable.
Case-02:

If the commit operation of the transaction performing the dirty read is delayed till the commit or abort
operation of the transaction which updated the value, then the schedule is recoverable.

Thumb Rule
No dirty read means a recoverable schedule.
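Method-02 can be sketched in Python as follows. The schedule format (tuples with the actions 'R', 'W', 'COMMIT', 'ABORT') and the two sample schedules are assumptions for this illustration. As a simplification, the sketch does not restore the earlier writer of an item after an abort.

```python
def is_recoverable(schedule):
    """schedule: list of (txn, action, item); action is 'R', 'W', 'COMMIT' or 'ABORT'."""
    last_writer = {}     # item -> transaction that wrote it most recently
    read_from = {}       # reader -> set of transactions it dirty-read from
    finished = set()     # transactions that have committed or aborted
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in finished:
                read_from.setdefault(txn, set()).add(writer)   # dirty read
        elif action == "COMMIT":
            # Case-01: committing before a transaction we read from has finished.
            if any(w not in finished for w in read_from.get(txn, ())):
                return False
            finished.add(txn)
        elif action == "ABORT":
            finished.add(txn)
    return True

# Hypothetical schedules, for illustration only.
irrecoverable = [("T1", "W", "A"), ("T2", "R", "A"),
                 ("T2", "COMMIT", None), ("T1", "ABORT", None)]
recoverable = [("T1", "W", "A"), ("T2", "R", "A"),
               ("T1", "COMMIT", None), ("T2", "COMMIT", None)]

print(is_recoverable(irrecoverable), is_recoverable(recoverable))  # False True
```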

Recoverable Schedules-

If in a schedule,

 A transaction performs a dirty read operation from an uncommitted transaction


 And its commit operation is delayed till the uncommitted transaction either commits or rolls back
then such a schedule is called as a Recoverable Schedule.

Types of Recoverable Schedules-

A recoverable schedule may be any one of these kinds-

1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule

Cascading Schedule-
 If in a schedule, failure of one transaction causes several other dependent transactions to rollback or
abort, then such a schedule is called as a Cascading Schedule or Cascading Rollback or Cascading Abort.
 It simply leads to the wastage of CPU time.

Example-

Here,

 Transaction T2 depends on transaction T1.


 Transaction T3 depends on transaction T2.
 Transaction T4 depends on transaction T3.

In this schedule,

 The failure of transaction T1 causes the transaction T2 to rollback.


 The rollback of transaction T2 causes the transaction T3 to rollback.
 The rollback of transaction T3 causes the transaction T4 to rollback.
Such a rollback is called as a Cascading Rollback.

NOTE-

If the transactions T2, T3 and T4 had committed before the failure of transaction T1, then the
schedule would have been irrecoverable.

Cascadeless Schedule-

If in a schedule, a transaction is not allowed to read a data item until the last transaction that has written
it is committed or aborted, then such a schedule is called as a Cascadeless Schedule.

In other words,

 Cascadeless schedule allows only committed read operations.


 Therefore, it avoids cascading roll back and thus saves CPU time.

Example-
NOTE-

 Cascadeless schedule allows only committed read operations.


 However, it allows uncommitted write operations.

Example-
Strict Schedule-

If in a schedule, a transaction is neither allowed to read nor write a data item until the last transaction that
has written it is committed or aborted, then such a schedule is called as a Strict Schedule.

In other words,

 Strict schedule allows only committed read and write operations.


 Clearly, strict schedule implements more restrictions than cascadeless schedule.

Example-

Remember-
 Strict schedules are more strict than cascadeless schedules.
 All strict schedules are cascadeless schedules.
 Not all cascadeless schedules are strict schedules.
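The difference between the two definitions can be seen in a small checker. This is a sketch only: the schedule format and the sample schedules are assumptions for the illustration, and an aborted transaction's write is not undone in the bookkeeping.

```python
def classify(schedule):
    """Return (cascadeless, strict) for a schedule.

    schedule: list of (txn, action, item); action is 'R', 'W', 'COMMIT' or 'ABORT'.
    """
    last_writer = {}
    finished = set()
    cascadeless = strict = True
    for txn, action, item in schedule:
        if action in ("COMMIT", "ABORT"):
            finished.add(txn)
            continue
        writer = last_writer.get(item)
        uncommitted = writer is not None and writer != txn and writer not in finished
        if uncommitted:
            strict = False               # read or write over uncommitted data
            if action == "R":
                cascadeless = False      # dirty read
        if action == "W":
            last_writer[item] = txn
    return cascadeless, strict

# Hypothetical schedules, for illustration only.
uncommitted_write = [("T1", "W", "A"), ("T2", "W", "A"),
                     ("T1", "COMMIT", None), ("T2", "COMMIT", None)]
committed_only = [("T1", "W", "A"), ("T1", "COMMIT", None),
                  ("T2", "R", "A"), ("T2", "W", "A"), ("T2", "COMMIT", None)]
dirty_read = [("T1", "W", "A"), ("T2", "R", "A"),
              ("T1", "COMMIT", None), ("T2", "COMMIT", None)]

print(classify(uncommitted_write))  # (True, False)
print(classify(committed_only))     # (True, True)
print(classify(dirty_read))         # (False, False)
```

The first schedule is cascadeless but not strict, because T2 overwrites A before T1 has committed.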

Equivalence of Schedules-

In DBMS, schedules may have the following three different kinds of equivalence relations among them-

1. Result Equivalence
2. Conflict Equivalence
3. View Equivalence

1. Result Equivalent Schedules-

 If any two schedules generate the same result after their execution, then they are called as result
equivalent schedules.
 This equivalence relation is considered of least significance.
 This is because some schedules might produce same results for some set of values and different results
for some other set of values.

2. Conflict Equivalent Schedules-

If any two schedules satisfy the following two conditions, then they are called as conflict equivalent
schedules-

1. The set of transactions present in both the schedules is same.


2. The order of pairs of conflicting operations of both the schedules is same.

3. View Equivalent Schedules-

View equivalent schedules have already been discussed above.

PRACTICE PROBLEMS BASED ON EQUIVALENCE OF SCHEDULES-

Problem-01:

Are the following three schedules result equivalent?


Solution-

To check whether the given schedules are result equivalent or not,

 We will consider some arbitrary values of X and Y.


 Then, we will compare the results produced by each schedule.
 Those schedules which produce the same results will be result equivalent.

Let X = 2 and Y = 5.

On substituting these values, the results produced by each schedule are-

Results by Schedule S1- X = 21 and Y = 10

Results by Schedule S2- X = 21 and Y = 10

Results by Schedule S3- X = 11 and Y = 10


 Clearly, the results produced by schedules S1 and S2 are same.
 Thus, we conclude that S1 and S2 are result equivalent schedules.

Failure Classification
Transaction failure:
Logical errors: transaction cannot complete due to some internal error condition
System errors: the database system must terminate an active transaction due to an error condition (e.g.,
deadlock)
System crash: a power failure or other hardware or software failure causes the system to crash.
Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
Recoverability from Transaction Failure
To ensure atomicity despite failures, we first output information describing the modifications to stable
storage without modifying the database itself.
We study two approaches:
1. log-based recovery, and
2. shadow paging
Log-Based Recovery
A log is kept on stable storage.
The log is a sequence of log records, and maintains a record of update activities on the database.
Each and every log record is given a unique id called log sequence number.
A log record is written for each of the following actions:
1. Updating a page
2. Commit
3. Abort
4. End
5. Undoing an update
Every log record contains the following fields:
Prev LSN |Trans Id |Type| Page ID| Length| Offset |Before Image |After image

Other special log records exist to record significant events during transaction processing, such as the start
of a transaction and the commit or abort of a transaction. We denote the various types of log records as:

 <Ti start> Transaction Ti has started.
 <Ti, Xj, V1, V2> Transaction Ti has performed a write on data item Xj. Xj had value V1 before the
write, and will have value V2 after the write.
 <Ti commit> Transaction Ti has committed.
 <Ti abort> Transaction Ti has aborted.

Whenever a transaction performs a write, it is essential that the log record for that write be created before
the database is modified. Once a log record exists, we can output the modification to the database if that
is desirable. We also have the ability to undo a modification that has already been output to the
database, by using the old-value field in the log records.
For log records to be useful for recovery from system and disk failures, the log must reside on stable storage.
However, since the log contains a complete record of all database activity, the volume of data stored in the
log may become unreasonably large.

 Two approaches using logs


1. Deferred database modification
2. Immediate database modification
Deferred Database Modification
The deferred-modification technique ensures transaction atomicity by recording all database
modifications in the log, but deferring all write operations of a transaction until the transaction partially
commits (i.e., once the final action of the transaction has been executed). Then the information in the logs
is used to execute the deferred writes. If the system crashes or if the transaction aborts, then the
information in the logs is ignored.

Let T0 be transaction that transfers $50 from account A to account B:


T0: read(A);
A := A – 50;
write(A);
read(B);
B := B + 50;
write(B).
Immediate Database Modification

The immediate-update technique allows database modifications to be output to the database while the
transaction is still in the active state. These modifications are called uncommitted modifications. In the
event of a crash or transaction failure, the system must use the old-value field of the log records to
restore the modified data items.
Suppose transactions T0 and T1 are executed one after the other, in the order T0 followed by T1. The portion of
the log containing the relevant information concerning these two transactions is as follows:
Portion of the system log corresponding to T0 and T1
< T0 start >
< T0, A, 1000, 950 >
< T0, B, 2000, 2050 >
< T0 commit >
< T1 start >
< T1, C, 700, 600 >
< T1 commit >
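How such a log is used after a crash can be sketched in Python. The tuple encoding of the log records, the function name `recover` and the on-disk state at the time of the crash are assumptions made for the illustration. The last record of the log is taken to be the commit of T1.

```python
# Log records as tuples: ("start", T), ("write", T, item, old, new), ("commit", T).
log = [
    ("start", "T0"),
    ("write", "T0", "A", 1000, 950),
    ("write", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 700, 600),
    ("commit", "T1"),
]

def recover(log, db):
    """Undo writes of uncommitted transactions, then redo writes of committed ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Undo pass: scan backwards and restore the old values.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            db[item] = old
    # Redo pass: scan forwards and reapply the new values.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            db[item] = new
    return db

# Assume the crash happens just before <T1 commit> reaches the log, and that at
# that moment the write of B had not reached disk while the write of C had.
log_at_crash = log[:6]
disk_at_crash = {"A": 950, "B": 2000, "C": 600}
print(recover(log_at_crash, dict(disk_at_crash)))
# {'A': 950, 'B': 2050, 'C': 700}
```

T0 is redone because its commit record is in the log, and T1 is undone because its commit record is missing.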

Checkpoints

When a system failure occurs, we must consult the log to determine those transactions that need to be
redone and those that need to be undone. Rather than reprocessing the entire log, which is time-
consuming and much of it unnecessary, we can use checkpoints:

 Output onto stable storage all the log records currently residing in main memory.
 Output to the disk all modified buffer blocks.
 Output onto stable storage a log record, <checkpoint>.

 The recovery system reads the logs backwards from the end to the last checkpoint.
 It maintains two lists, an undo-list and a redo-list.
 If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts
the transaction in the redo-list.
 If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the
transaction in undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in
the redo-list and their previous logs are removed and then redone before saving their logs.
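The construction of the two lists can be sketched as below. The record encoding, the function name `build_lists` and the sample log are assumptions for this illustration. The sketch also puts a transaction in the undo-list if only its write records (and not its start record) appear after the checkpoint, which goes slightly beyond the rule as stated above.

```python
def build_lists(log):
    """Scan the log backwards from the end to the last checkpoint.

    Returns (redo_list, undo_list).
    """
    redo, undo, ended = [], [], set()
    for rec in reversed(log):
        kind = rec[0]
        if kind == "checkpoint":
            break
        txn = rec[1]
        if kind == "commit":
            redo.append(txn)
            ended.add(txn)
        elif kind == "abort":
            ended.add(txn)
        elif txn not in ended and txn not in undo:
            # A start or write record with no later commit or abort.
            undo.append(txn)
    return redo, undo

# Hypothetical log, for illustration only.
sample_log = [
    ("start", "T1"), ("write", "T1", "A", 1, 2), ("commit", "T1"),
    ("checkpoint",),
    ("start", "T2"), ("write", "T2", "B", 5, 6), ("commit", "T2"),
    ("start", "T3"), ("write", "T3", "C", 7, 8),
]
print(build_lists(sample_log))  # (['T2'], ['T3'])
```

T1 committed before the checkpoint, so it appears in neither list.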
Shadow Paging

Shadow paging is an alternative to log-based recovery techniques, which has both advantages and
disadvantages. It may require fewer disk accesses, but it is hard to extend paging to allow multiple
concurrent transactions. The paging is very similar to paging schemes used by the operating system for
memory management.
The idea is to maintain two page tables during the life of a transaction: the current page table and
the shadow page table. When the transaction starts, both tables are identical. The shadow page table is never
changed during the life of the transaction. The current page table is updated with each write operation. Each
table entry points to a page on the disk. When the transaction is committed, the shadow page table entry
becomes a copy of the current page table entry and the disk block with the old data is released. If the
shadow page table is stored in nonvolatile memory and a system crash occurs, then the shadow page table is copied
to the current page table. This guarantees that the shadow page table will point to the database pages
corresponding to the state of the database prior to any transaction that was active at the time of the crash,
making aborts automatic.
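The two page tables can be modelled in a few lines. This is a toy sketch under simplifying assumptions: one transaction at a time, an in-memory dict standing in for the disk, and old blocks that are never reclaimed. The class name `ShadowPagedDB` and its methods are made up for the example.

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.disk = dict(enumerate(pages))         # block id -> contents
        self.shadow = {i: i for i in self.disk}    # page no -> block id (committed state)
        self.current = dict(self.shadow)           # page no -> block id (working state)
        self.next_block = len(self.disk)

    def read(self, page):
        return self.disk[self.current[page]]

    def write(self, page, value):
        # Write the new version to a fresh block; only the current table is repointed.
        block = self.next_block
        self.next_block += 1
        self.disk[block] = value
        self.current[page] = block

    def commit(self):
        # The shadow table becomes a copy of the current table.
        self.shadow = dict(self.current)

    def abort(self):
        # Also models crash recovery: fall back to the shadow table.
        self.current = dict(self.shadow)

db = ShadowPagedDB(["a0", "b0"])
db.write(0, "a1")
db.abort()                 # the uncommitted change to page 0 disappears
after_abort = db.read(0)
db.write(1, "b1")
db.commit()
db.abort()                 # nothing to undo, the change to page 1 was committed
after_commit = db.read(1)
print(after_abort, after_commit)  # a0 b1
```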

There are drawbacks to the shadow-page technique:

1. Commit overhead. The commit of a single transaction using shadow paging requires multiple
blocks to be output -- the current page table, the actual data and the disk address of the current
page table. Log-based schemes need to output only the log records.
2. Data fragmentation. Shadow paging causes database pages to change location, so they are no
longer contiguous.
3. Garbage collection. Each time that a transaction commits, the database pages containing the old
version of data changed by the transactions must become inaccessible. Such pages are considered
to be garbage since they are not part of the free space and do not contain any usable information.
Periodically it is necessary to find all of the garbage pages and add them to the list of free pages.
This process is called garbage collection and imposes additional overhead and complexity on the
system.

Deadlock Handling
System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for
another transaction in the set.
Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some
prevention strategies:
1. Require that each transaction locks all its data items before it begins execution (predeclaration).
2. Impose partial ordering of all data items and require that a transaction can lock data items only in the
order specified by the partial order (graph-based protocol).
Following schemes use transaction timestamps for the sake of deadlock prevention alone.
wait-die scheme - non-preemptive
1. older transaction may wait for younger one to release data item. Younger transactions never wait for
older ones; they are rolled back instead.
2. a transaction may die several times before acquiring needed data item
wound-wait scheme – preemptive
1. older transaction wounds (forces rollback of) younger transaction instead of waiting for it. Younger
transactions may wait for older ones.
2. may be fewer rollbacks than wait-die scheme.
Deadlock prevention
Both in wait-die and in wound-wait schemes, a rolled back transaction is restarted with its original
timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.
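The two schemes differ only in what happens when a lock request meets a holder. A minimal sketch, assuming that a smaller timestamp means an older transaction (the function names and return strings are made up for this example):

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester is rolled back"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger requester waits."""
    if requester_ts < holder_ts:
        return "holder is rolled back"
    return "requester waits"

# An old transaction (timestamp 5) and a young one (timestamp 9).
print(wait_die(5, 9), "|", wait_die(9, 5))
print(wound_wait(5, 9), "|", wound_wait(9, 5))
```

In both schemes the transaction that is rolled back is the younger one, which is why restarting it with its original timestamp eventually makes it the oldest.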
Deadlock Detection
Deadlocks can be described by a wait-for graph, which consists of a pair G = (V, E), where V is a set of vertices (all
the transactions in the system) and E is a set of edges; each element is an ordered pair Ti → Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data
item.
When Ti requests a data item currently being held by Tj, then the edge Ti → Tj is inserted in the wait-for
graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. Must invoke a deadlock-
detection algorithm periodically to look for cycles.
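One possible deadlock-detection routine is a depth-first search for a cycle in the wait-for graph. The sketch below assumes the graph is given as a dict mapping each transaction to the set of transactions it waits for. The function name `find_deadlock` and the sample graphs are made up for this illustration.

```python
def find_deadlock(wait_for):
    """Return a list of transactions that form a cycle, or None if there is no deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, fully explored
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:
                return path[path.index(u):]    # u is on the current path: a cycle
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None

# Hypothetical wait-for graphs, for illustration only.
deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
no_deadlock = {"T1": {"T2"}, "T2": {"T3"}}

print(find_deadlock(deadlocked) is not None, find_deadlock(no_deadlock))  # True None
```

T4 waits for a deadlocked transaction but is not itself part of the cycle, so it is not a candidate victim.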
Deadlock Recovery
When deadlock is detected:
Some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the
transaction that will incur the minimum cost.
Rollback: determine how far to roll back the transaction.
Total rollback: abort the transaction and then restart it.
It is more effective to roll back the transaction only as far as necessary to break the deadlock (partial rollback).
Starvation happens if the same transaction is always chosen as victim. Include the number of rollbacks in the
cost factor to avoid starvation.
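Victim selection with a rollback count in the cost can be sketched as follows. The cost formula, the `work_done` measure and the `penalty` weight are all assumptions chosen for illustration; a real system would use its own cost factors.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    work_done: int      # hypothetical measure of the work lost on rollback
    rollbacks: int = 0  # how many times this transaction was already a victim


def choose_victim(cycle, penalty=10):
    """Pick the transaction in the cycle with the minimum assumed cost."""
    # Each earlier rollback raises the cost, so the same transaction
    # is not chosen as victim again and again.
    return min(cycle, key=lambda t: t.work_done + penalty * t.rollbacks)


t1 = Txn("T1", work_done=5)
t2 = Txn("T2", work_done=8)

first = choose_victim([t1, t2])   # T1 is cheaper to roll back
first.rollbacks += 1
second = choose_victim([t1, t2])  # T1 now costs 15, so T2 is chosen
print(first.name, second.name)
```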
Distributed DBMS - Distributed Databases
A distributed database is a collection of multiple interconnected databases, which are spread
physically across various locations that communicate via a computer network.
Advantages of Distributed Database System:

 Transparent Management of Distributed and Replicated Data

 Reliability Through Distributed Transactions

 Improved Performance

 Easier System Expansion

Disadvantages of Distributed Database System:

 Design Issues

 Cost of Update Replication and Synchronization

 Cost of Security Constraints

 Recovery from Failure and Synchronization


Distributed Data Storage

There are 2 ways in which data can be stored on different sites. These are:
1. Replication
In this approach, the entire relation is stored redundantly at 2 or more sites. If the entire database is
available at all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of data.
This is advantageous as it increases the availability of data at different sites. Also, query requests can now
be processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated. Any change made at
one site needs to be recorded at every site where that relation is stored, or else it may lead to inconsistency. This
is a lot of overhead. Also, concurrency control becomes much more complex, as concurrent access now
needs to be checked over a number of sites.
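The update overhead of replication can be seen in a toy model. The sketch assumes a "write to all copies, read from any copy" policy and ignores failures and concurrency. The class `ReplicatedRelation` and the site names are hypothetical.

```python
class ReplicatedRelation:
    """Toy model: one full copy of the relation per site."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        # Every update must reach every copy, otherwise the copies diverge.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, site, key):
        # Any site can answer a read from its local copy.
        return self.copies[site][key]


accounts = ReplicatedRelation(["site1", "site2", "site3"])
accounts.write("acc-101", 500)
print(accounts.read("site2", "acc-101"))
```

A single logical write turns into one write per site, which is the overhead described above.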

2. Fragmentation
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the
fragments is stored at the different sites where it is required. It must be made sure that the fragments are
such that they can be used to reconstruct the original relation (i.e., there isn’t any loss of data).
Fragmentation is advantageous because it doesn’t create copies of data, so consistency is not a problem.
Fragmentation of relations can be done in two ways:
 Horizontal fragmentation – Splitting by rows – The relation is fragmented into groups of tuples so
that each tuple is assigned to at least one fragment.
 Vertical fragmentation – Splitting by columns – The schema of the relation is divided into smaller
schemas. Each fragment must contain a common candidate key so as to ensure lossless join.
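Both kinds of fragmentation, and the reconstruction of the original relation, can be sketched on a small in-memory relation. The sample rows, the column names and the helper functions `horizontal` and `vertical` are made up for this example. The key `id` plays the role of the common candidate key.

```python
# A small sample relation; the data is invented for illustration.
employees = [
    {"id": 1, "name": "Asha", "city": "Delhi"},
    {"id": 2, "name": "Ravi", "city": "Mumbai"},
    {"id": 3, "name": "Meera", "city": "Delhi"},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def vertical(rows, columns):
    """Vertical fragment: only the given columns of every row."""
    return [{c: r[c] for c in columns} for r in rows]


# Horizontal fragmentation: split by rows, reconstruct by union.
delhi = horizontal(employees, lambda r: r["city"] == "Delhi")
others = horizontal(employees, lambda r: r["city"] != "Delhi")
rebuilt_h = sorted(delhi + others, key=lambda r: r["id"])

# Vertical fragmentation: split by columns, reconstruct by joining on the key.
names = vertical(employees, ["id", "name"])
cities = vertical(employees, ["id", "city"])
city_by_id = {r["id"]: r for r in cities}
rebuilt_v = [{**n, **city_by_id[n["id"]]} for n in names]

print(rebuilt_h == employees, rebuilt_v == employees)
```

If the vertical fragments did not share the key `id`, the join could not match the pieces of each row and the reconstruction would not be lossless.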

Directory System
A Derby database
A Derby database contains dictionary objects such as tables, columns, indexes,
and jar files. A Derby database can also store its own configuration information.

The database directory


A Derby database is stored in files that live in a directory of the same name as the database. Database
directories typically live in system directories.

Note: An in-memory database does not use the file system, but the size limits listed in the table later in
this topic still apply. For some limits, the maximum value is determined by the available main memory
instead of the available disk space and file system limitations.

A database directory contains the following, as shown in the following figure.

 log directory

Contains files that make up the database transaction log, used internally for data recovery (not
the same thing as the error log).

 seg0 directory

Contains one file for each user table, system table, and index (known as conglomerates).
 service.properties file

A text file with internal configuration information.

 tmp directory

(might not exist.) A temporary directory used by Derby for large sorts and deferred updates and
deletes. Sorts are used by a variety of SQL statements. For databases on read-only media, you
might need to set a property to change the location of this directory. See
"Creating Derby Databases for Read-Only Use".

 jar directory

(might not exist.) A directory in which jar files are stored when you use database class loading.

The following figure shows the files and directories in the Derby database directories that are used by
the Derby software.

Figure 1. An example of a Derby database directory and file structure

Derby imposes relatively few limitations on the number and size of databases and database objects. The
following table shows some size limitations of Derby databases and database objects.

Table 1. Size limits for Derby database objects

Type of Object | Limit
Tables in each database | java.lang.Long.MAX_VALUE. Some operating systems impose a limit to the number of files allowed in a single directory.
Indexes in each table | 32,767 or storage
Columns in each table | 1,012
Number of columns on an index key | 16
Rows in each table | No limit.
Size of table | No limit. Some operating systems impose a limit on the size of a single file.
Size of row | No limit. Rows can span pages. Rows cannot span tables so some operating systems impose a limit on the size of a single file, which results in limiting the size of a table and size of a row in that table.
