
DBMS unit4

notes
Copyright
© © All Rights Reserved
Transaction

A transaction is a set of logically related operations that together perform a
single logical unit of work. A transaction usually changes the data stored in
the database, and protecting that data from system failures is one of the
primary uses of a DBMS.
We can define a transaction as a group of tasks. A single task is the minimum
processing unit and cannot be divided further. Consider a simple transaction:
a bank employee transfers Rs 1000 from X's account to Y's account. Even this
small transaction involves several low-level tasks.
o A transaction is a set of logically related operations. It contains a group
of tasks.
o A transaction is an action or series of actions performed by a single user
to access or modify the contents of the database.

X’s Account
Open_Account(X)
Old_Bank_Balance = X.balance
New_Bank_Balance = Old_Bank_Balance - 1000
X.balance = New_Bank_Balance
Close_Bank_Account(X)
Y's Account
Open_Account(Y)
Old_Bank_Balance = Y.balance
New_Bank_Balance = Old_Bank_Balance + 1000
Y.balance = New_Bank_Balance
Close_Bank_Account(Y)

Example: Suppose an employee of the bank transfers Rs 800 from X's account to
Y's account. This small transaction contains several low-level tasks:

X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Properties of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that either all operations of the transaction take effect or none
of them do. If the transaction cannot complete, it is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either runs to completion or is not
executed at all.

Atomicity involves the following two operations:

Abort: If a transaction aborts, then none of the changes it made are visible.

Commit: If a transaction commits, then all the changes it made are visible.

Example: Assume a transaction T consisting of two parts, T1 and T2. Account A
holds Rs 600 and account B holds Rs 300. T transfers Rs 100 from account A to
account B.

T1              T2
Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)

After completion of the transaction, A holds Rs 500 and B holds Rs 400.

If the transaction T fails after the completion of T1 but before the
completion of T2, then the amount will be deducted from A but not added to B.
This leaves the database in an inconsistent state. To ensure the correctness
of the database state, the transaction must be executed in its entirety.
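
The all-or-nothing behaviour described above can be tried out with a small
sketch. It uses Python's built-in sqlite3 module with an in-memory database.
The table name `account`, the helper names and the `fail_midway` flag are
assumptions made for illustration; they are not part of the notes.

```python
import sqlite3

def make_db():
    """Create an in-memory database with A = 600 and B = 300."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])
    conn.commit()
    return conn

def balances(conn):
    """Return the current balances as a dict, e.g. {'A': 600, 'B': 300}."""
    return dict(conn.execute("SELECT name, balance FROM account"))

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        # T1: debit the source account
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:
            # hypothetical failure after T1 but before T2
            raise RuntimeError("simulated failure between T1 and T2")
        # T2: credit the destination account
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()      # both changes become visible together
        return True
    except RuntimeError:
        conn.rollback()    # the debit is undone, nothing is visible
        return False
```

With `fail_midway=True` the debit from A is rolled back, so the total stays at
Rs 900. Without the failure, A ends at Rs 500 and B at Rs 400.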

Consistency
o The integrity constraints are maintained so that the database is consistent
before and after the transaction.
o The execution of a transaction leaves the database in either its prior
stable state or a new stable state.
o The consistency property states that every transaction sees a consistent
database instance.
o A transaction transforms the database from one consistent state to another
consistent state.

For example: The total amount must be the same before and after the
transaction.

1. Total before T occurs = 600 + 300 = 900
2. Total after T occurs = 500 + 400 = 900

Therefore, the database is consistent. If T1 is completed but T2 fails, then
inconsistency will occur.

Isolation
o It states that data being used by one transaction during its execution
cannot be used by a second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and is using the data
item X, then that data item can't be accessed by any other transaction T2
until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation
property.

Durability
o The durability property indicates the permanence of the database's
consistent state. It states that the changes made by a committed transaction
are permanent.
o They cannot be lost through the erroneous operation of a faulty transaction
or through a system failure. When a transaction is completed, the database
reaches a consistent state, and that state cannot be lost, even in the event
of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability
property.

States of Transaction

In a database, the transaction can be in one of the following states -

Active state
o The active state is the first (initial) state of every transaction. In this
state, the transaction is being executed.
o For example: Insertion, deletion or updating of a record is done here, but
the changes are not yet saved to the database.

Partially committed
o In the partially committed state, a transaction has executed its final
operation, but the data is still not saved to the database.
o For example, in a transaction that calculates total marks, the final step
of displaying the total marks is executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its
operations successfully. In this state, all the effects are permanently saved
in the database system.

Failed state
o For example, in a transaction that calculates total marks, if the database
is not able to run the query that fetches the marks, then the transaction
will fail to execute.
o If any check made by the database recovery system fails, then the
transaction is in a failed state. A failed transaction cannot proceed
further.

Aborted
o If any of the checks fail and the transaction has reached a failed state,
the database recovery system makes sure that the database is in its previous
consistent state. If it is not, the recovery system aborts or rolls back the
transaction to bring the database into a consistent state.
o If the transaction fails partway through, all the operations it has already
executed are rolled back, returning the database to the consistent state it
was in before the transaction started.
o After aborting the transaction, the database recovery module will select
one of the two operations:
1. Re-start the transaction
2. Kill the transaction
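
The states above can be read as a small state machine. The following is a
minimal sketch, assuming the usual transitions: active to partially committed
or failed, partially committed to committed or failed, and failed to aborted.
The transition from partially committed to failed is not spelled out in these
notes and is included as an assumption, and the names `ALLOWED` and `advance`
are illustrative only.

```python
# Allowed next states for each transaction state (a simplified model).
ALLOWED = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state (a restart would be a new transaction)
}

def advance(state, new_state):
    """Return new_state if the transition is legal, otherwise raise ValueError."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state

def run(path):
    """Walk a list of states, starting from 'active', and return the last one."""
    state = "active"
    for nxt in path:
        state = advance(state, nxt)
    return state
```

For example, `run(["partially committed", "committed"])` models a successful
transaction and `run(["failed", "aborted"])` models one that is rolled back.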
Operations of Transaction:

Following are the main operations of transaction:


Read(X): The read operation reads the value of X from the database and stores
it in a buffer in main memory.

Write(X): The write operation writes the value back to the database from the
buffer.

Let's take an example of a debit transaction on an account, which consists of
the following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will
contain 3500.
o The third operation will write the buffer's value to the database. So X's
final value will be 3500.

But it is possible that, because of a hardware, software or power failure,
the transaction fails before finishing all the operations in the set.

For example: If the debit transaction above fails after executing operation 2,
then X's value will remain 4000 in the database, which is not acceptable to
the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
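
The debit example can be sketched in a few lines. This is a simplified model,
not how a real DBMS is implemented: it assumes that updates stay in the
transaction's buffer until commit (a deferred-update simplification), and the
class names `Database` and `Transaction` are hypothetical.

```python
class Database:
    """Stands in for the values stored on disk."""
    def __init__(self, values):
        self.disk = dict(values)

class Transaction:
    """Keeps a private buffer and touches the disk only on commit."""
    def __init__(self, db):
        self.db = db
        self.buffer = {}

    def read(self, item):
        # R(X): copy the value from the database into the buffer
        self.buffer[item] = self.db.disk[item]
        return self.buffer[item]

    def write(self, item, value):
        # W(X): in this sketch the new value waits in the buffer
        self.buffer[item] = value

    def commit(self):
        # save the work done permanently
        self.db.disk.update(self.buffer)
        self.buffer.clear()

    def rollback(self):
        # undo the work done by discarding the buffer
        self.buffer.clear()
```

Reading X = 4000, writing 3500 and then calling `rollback()` leaves 4000 in
the database, while calling `commit()` instead stores 3500.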

Scheduling in DBMS
Overview

A series of operations from multiple transactions, arranged in the order in
which they are executed, is known as a schedule. Scheduling is the technique
of ordering the operations of concurrent transactions while preserving the
order of the operations within each individual transaction.

What is Scheduling, and why is it required?

Transactions are sets of instructions that perform operations on a database.
When multiple transactions run concurrently, the order in which their
operations are performed must be fixed, because only one operation can be
performed on the database at a time. This sequence of operations is known as
a schedule, and the process is known as scheduling.

When multiple transactions execute simultaneously in an uncontrolled manner,
several problems can arise, which are known as concurrency problems.
Scheduling is required to overcome these problems.

Types of Schedules

There are mainly two types of schedules -

1. Serial Schedule
2. Non-serial Schedule

Further, they are divided into their subcategories, as shown below.
Serial Schedule

As the name says, all the transactions are executed serially, one after the
other. In a serial schedule, a transaction does not start execution until the
currently running transaction finishes execution. This type of execution is
also known as non-interleaved execution. Serial schedules are always
recoverable, cascadeless, strict and consistent. A serial schedule always
gives the correct result.

Transactions in this schedule are executed serially: after the instructions
of Ti complete, the instructions of Tj are executed, where j = i + 1.
Serial schedules guarantee consistency -
o For 2 transactions, total number of serial schedules possible = 2.
o For 3 transactions, total number of serial schedules possible = 6.

2 Transactions    3 Transactions
T1->T2            T1->T2->T3
T2->T1            T1->T3->T2
                  T2->T1->T3
                  T2->T3->T1
                  T3->T1->T2
                  T3->T2->T1

Consider two transactions T1 and T2 shown above, which perform some
operations. If there is no interleaving of operations, then there are two
possible outcomes: either all T1 operations execute, followed by all T2
operations, or all T2 operations execute, followed by all T1 operations. In
the above figure, the schedule shown is the serial schedule where T1 is
followed by T2, i.e. T1 -> T2. Here R(A) denotes reading some data item 'A',
and W(B) denotes writing/updating some data item 'B'. If n = number of
transactions, then the number of serial schedules possible = n!.

Therefore, for the above 2 transactions, the total number of serial schedules
possible = 2! = 2.
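
The n! count can be checked by listing the orders directly. The sketch below
uses itertools.permutations from the Python standard library; the function
name `serial_schedules` is chosen here for illustration.

```python
from itertools import permutations
from math import factorial

def serial_schedules(transactions):
    """List every serial order of the given transactions as 'T1->T2' strings."""
    return ["->".join(order) for order in permutations(transactions)]

two = serial_schedules(["T1", "T2"])
three = serial_schedules(["T1", "T2", "T3"])
print(two)                         # the 2 serial schedules for 2 transactions
print(len(three), factorial(3))    # both counts are 6
```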
Non-serial Schedule

In a non-serial schedule, multiple transactions execute concurrently, unlike
the serial schedule, where one transaction must wait for another to complete
all its operations. In a non-serial schedule, the next transaction proceeds
without waiting for the completion of the previous transaction. The
operations of the transactions are interleaved or mixed with each other.
Non-serial schedules are NOT always recoverable, cascadeless, strict and
consistent.

In this schedule, there are two transactions, T1 and T2, executing
concurrently. The operations of T1 and T2 are interleaved. So, this schedule
is an example of a non-serial schedule.
Total number of non-serial schedules = Total number of schedules - Total
number of serial schedules

Non-serial schedules are further categorized into serializable and
non-serializable schedules. Let's now discuss serializability.
Serializability in DBMS

Serializability is a concept that helps to identify which non-serial
schedules are correct and will maintain the consistency of the database.
A serializable schedule always leaves the database in a consistent state. A
serial schedule is always a serializable schedule because, in a serial
schedule, a transaction only starts when the other transaction has finished
execution. A non-serial schedule of n transactions is said to be a
serializable schedule if it is equivalent to some serial schedule of those n
transactions. A serial schedule does not allow concurrency; only one
transaction executes at a time, and the next starts when the running
transaction is finished.

Difference between Serial Schedule and Serializable Schedule


Non-serial Schedules
A schedule is non-serial when the operations of transactions T1 and T2
overlap.
Example

T1              T2
READ1(A)
WRITE1(A)
                READ2(B)
                WRITE2(B)
READ1(B)
WRITE1(B)
READ1(B)

Types of Non-serial Schedules

Non-serial schedules are divided into serializable and non-serializable
schedules. Let us first discuss serializability.
There are two types of serializability, which are as follows -
View serializability
A schedule is view serializable if it is view equivalent to a serial schedule.
Two schedules S and S' are view equivalent if the following rules hold for
every data item A -
o Initial read: if T1 reads the initial value of A in S, then T1 also reads
the initial value of A in S'.
o Updated read: if T1 reads a value of A written by T2 in S, then T1 also
reads the value of A written by T2 in S'.
o Final write: if T1 performs the final write on A in S, then T1 also
performs the final write on A in S'.
Conflict serializability
A schedule is conflict serializable if it orders every pair of conflicting
operations in the same way as some serial execution. A pair of operations is
said to conflict if they operate on the same data item and at least one of
them is a write operation.

That means,
o Readi(x), Readj(x) - no conflict (read-read operations).
o Readi(x), Writej(x) - conflict (read-write operations).
o Writei(x), Readj(x) - conflict (write-read operations).
o Writei(x), Writej(x) - conflict (write-write operations).

Conflicting Operations-

Two operations are called conflicting operations if all the following
conditions hold true for them-
o Both the operations belong to different transactions
o Both the operations are on the same data item
o At least one of the two operations is a write operation
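
The three conditions translate directly into a predicate. In this sketch an
operation is a small record with an action ("R" or "W"), a transaction number
and a data item; the names `Op` and `conflicts` are assumptions made for
illustration.

```python
from typing import NamedTuple

class Op(NamedTuple):
    action: str   # "R" for read, "W" for write
    txn: int      # number of the transaction issuing the operation
    item: str     # data item being accessed

def conflicts(a, b):
    """True when a and b satisfy all three conflict conditions."""
    different_transactions = a.txn != b.txn
    same_item = a.item == b.item
    at_least_one_write = "W" in (a.action, b.action)
    return different_transactions and same_item and at_least_one_write
```

For instance, `conflicts(Op("W", 1, "A"), Op("R", 2, "A"))` holds, while two
reads of the same item never conflict.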

Example-

Consider the following schedule-


In this schedule,
o W1(A) and R2(A) are called conflicting operations.
o This is because all the above conditions hold true for them.

Checking Whether a Schedule is Conflict Serializable Or Not-

Follow these steps to check whether a given non-serial schedule is conflict
serializable or not-

Step-01:

Find and list all the conflicting operations.

Step-02:

Start creating a precedence graph by drawing one node for each transaction.

Step-03:

o Draw an edge for each conflict pair: if Xi(V) and Yj(V) form a conflict
pair, with Xi(V) occurring first, then draw an edge from Ti to Tj.
o This ensures that Ti gets executed before Tj.

Step-04:

o Check if there is any cycle formed in the graph.
o If there is no cycle found, then the schedule is conflict serializable,
otherwise not.

NOTE-

o By performing the Topological Sort of the Directed Acyclic Graph so
obtained, the corresponding serial schedule(s) can be found.
o Such schedules can be more than 1.
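
The four steps can be sketched as a short program. It assumes schedules are
written as strings such as "R1(A), W2(B)" and uses graphlib from the Python
standard library (available from Python 3.9) for the topological sort. The
function names are illustrative and are not taken from these notes.

```python
import re
from graphlib import TopologicalSorter, CycleError

def parse(schedule):
    """Turn 'R1(A), W2(B)' into [('R', 1, 'A'), ('W', 2, 'B')]."""
    found = re.findall(r"([RW])(\d+)\((\w+)\)", schedule)
    return [(action, int(txn), item) for action, txn, item in found]

def precedence_edges(ops):
    """Steps 01 and 03: one edge (Ti, Tj) per conflict pair, earlier op first."""
    edges = set()
    for i, (a1, t1, x1) in enumerate(ops):
        for a2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):
                edges.add((t1, t2))
    return edges

def serial_order(schedule):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    ops = parse(schedule)
    # Step 02: one node per transaction (graphlib maps node -> predecessors)
    graph = {txn: set() for _, txn, _ in ops}
    for ti, tj in precedence_edges(ops):
        graph[tj].add(ti)
    try:
        # Step 04: a topological order exists only when there is no cycle
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
```

A result of None means the schedule is not conflict serializable. Otherwise
the returned list is one of the possibly several equivalent serial orders.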

PRACTICE PROBLEMS BASED ON CONFLICT SERIALIZABILITY-


Problem-01:

Check whether the given schedule S is conflict serializable or not-


S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)

Solution-

Step-01:

List all the conflicting operations and determine the dependency between the
transactions-
o R2(A) , W1(A) (T2 → T1)
o R1(B) , W2(B) (T1 → T2)
o R3(B) , W2(B) (T3 → T2)

Step-02:

Draw the precedence graph-

o Clearly, there exists a cycle in the precedence graph (T1 → T2 → T1).
o Therefore, the given schedule S is not conflict serializable.
Problem-02:

Check whether the given schedule S is conflict serializable and recoverable
or not-

Solution-

Checking Whether S is Conflict Serializable Or Not-

Step-01:

List all the conflicting operations and determine the dependency between the
transactions-
o R2(X) , W3(X) (T2 → T3)
o R2(X) , W1(X) (T2 → T1)
o W3(X) , W1(X) (T3 → T1)
o W3(X) , R4(X) (T3 → T4)
o W1(X) , R4(X) (T1 → T4)
o W2(Y) , R4(Y) (T2 → T4)

Step-02:

Draw the precedence graph-

o Clearly, there exists no cycle in the precedence graph.
o Therefore, the given schedule S is conflict serializable.

Checking Whether S is Recoverable Or Not-

o Conflict serializability alone does not guarantee recoverability, so
recoverability must be checked separately.
o There exists no dirty read operation.
o This is because all the transactions which update the values commit
immediately.
o Therefore, the given schedule S is recoverable.
o Also, S is a Cascadeless Schedule.
Problem-03:

Check whether the given schedule S is conflict serializable or not. If yes, then
determine all the possible serialized schedules-

Solution-

Checking Whether S is Conflict Serializable Or Not-

Step-01:

List all the conflicting operations and determine the dependency between the
transactions-
o R4(A) , W2(A) (T4 → T2)
o R3(A) , W2(A) (T3 → T2)
o W1(B) , R3(B) (T1 → T3)
o W1(B) , W2(B) (T1 → T2)
o R3(B) , W2(B) (T3 → T2)

Step-02:

Draw the precedence graph-

o Clearly, there exists no cycle in the precedence graph.
o Therefore, the given schedule S is conflict serializable.

Finding the Serialized Schedules-

o All the possible topological orderings of the above precedence graph will
be the possible serialized schedules.
o The topological orderings can be found by performing the Topological Sort
of the above precedence graph.
After performing the topological sort, the possible serialized schedules are-
1. T1 → T3 → T4 → T2
2. T1 → T4 → T3 → T2
3. T4 → T1 → T3 → T2
Problem-04:
Determine all the possible serialized schedules for the given schedule-
Solution-

The given schedule S can be rewritten as-


This is because we are only concerned about the read and write operations
taking place on the database.
Checking Whether S is Conflict Serializable Or Not-
Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
o R1(A) , W2(A) (T1 → T2)
o R2(A) , W1(A) (T2 → T1)
o W2(A) , W1(A) (T2 → T1)
o R2(B) , W1(B) (T2 → T1)
o R1(B) , W2(B) (T1 → T2)
o W1(B) , W2(B) (T1 → T2)

Step-02:

Draw the precedence graph-

o Clearly, there exists a cycle in the precedence graph.
o Therefore, the given schedule S is not conflict serializable.
o Thus, Number of possible serialized schedules = 0.

View Serializability

A schedule is view serializable if it is view equivalent to a serial
schedule. If a schedule is conflict serializable, then it is also view
serializable. A view serializable schedule that is not conflict serializable
contains blind writes.

Non-Serializability in DBMS

A non-serial schedule that is not serializable is called a non-serializable
schedule. Non-serializable schedules may or may not be consistent or
recoverable. Non-serializable schedules are divided into two types:
1. Recoverable Schedule
2. Non-recoverable Schedule

Recoverable Schedule
A schedule is recoverable if each transaction commits only after all the
transactions from which it has read have committed. In other words, if some
transaction Ty reads a value that has been updated/written by some other
transaction Tx, then the commit of Ty must occur after the commit of Tx.
Consider the following example −

T1              T2
R(X)
W(X)
                W(X)
                R(X)
READ1(B)
Commit
                Commit
Here, transaction T2 is reading the value written by transaction T1 and the
commit of T2 occurs after the commit of T1. Hence, it is a recoverable
schedule.
Non-Recoverable Schedule
If a transaction reads a value written by an uncommitted transaction and
commits before the transaction from which it read the value, then the
schedule is called a Non-Recoverable schedule. With a non-recoverable
schedule, when there is a failure, we may not be able to recover to a
consistent database state. In other words, if Tj reads a value written by Ti
and the commit of Ti does not occur before the commit of Tj, the schedule is
non-recoverable.
Consider an example of a non-recoverable schedule as given below -
Schedule 1

T1              T2
read(x)
x=x-n
write(x)
                read(x)
                x=x+n
                write(x)
                commit
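
The recoverable and non-recoverable definitions above can be checked
mechanically. The sketch below assumes a schedule is a list of (transaction,
action, item) tuples with actions "R", "W" and "C" (commit). It ignores
aborts, which is a simplification, and the function name `is_recoverable` is
hypothetical.

```python
def is_recoverable(schedule):
    """True if every transaction commits only after all the transactions
    it has read from have committed."""
    last_writer = {}    # item -> transaction that most recently wrote it
    reads_from = {}     # reader -> set of transactions it has read from
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "C":
            # every transaction this one read from must already be committed
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True

# Schedule 1 above: T2 reads x written by T1 and commits while T1 has not.
schedule_1 = [
    ("T1", "R", "x"), ("T1", "W", "x"),
    ("T2", "R", "x"), ("T2", "W", "x"), ("T2", "C", None),
]
# Same reads and writes, but T1 commits before T2.
recoverable = [
    ("T1", "R", "x"), ("T1", "W", "x"),
    ("T2", "R", "x"), ("T1", "C", None), ("T2", "C", None),
]
print(is_recoverable(schedule_1), is_recoverable(recoverable))
```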
Recoverable schedules are further categorized into 3 types:

1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule

Cascading Schedule
A cascading schedule is classified as a recoverable schedule. A recoverable
schedule is a schedule in which the commit operation of a transaction that
has read a value is delayed until the uncommitted transaction that wrote the
value either commits or rolls back.
A cascading rollback is a type of rollback in which the failure of one
transaction causes the rollback of other dependent transactions. The main
disadvantage of cascading rollback is that it can waste CPU time.
Given below is an example of a cascading schedule -

T1          T2          T3          T4
Read(A)
Write(A)
            Read(A)
            Write(A)
                        Read(A)
                        Write(A)
                                    Read(A)
                                    Write(A)
Failure

The above schedule leads to a cascading rollback: because of the failure of
T1, T2 is rolled back, the rollback of T2 causes T3 to roll back, and the
rollback of T3 causes T4 to roll back.
Cascadeless schedule
When a transaction is not allowed to read a data item until the last
transaction that has written it is committed or aborted, the schedule is
called a cascadeless schedule.
Given below is an example of a cascadeless schedule -

T1              T2
R(X)
W(X)
                W(X)
Commit
                R(X)
                Commit

Here, the updated value of X is read by transaction T2 only after the commit
of transaction T1. Hence, the schedule is a cascadeless schedule.
Strict Schedule

If, in a schedule, a transaction is neither allowed to read nor write a data
item until the last transaction that has written it is committed or aborted,
then such a schedule is called a Strict Schedule. Let's say we have two
transactions Ta and Tb. If a write operation of transaction Ta precedes a
read or write operation of transaction Tb on the same data item, then the
commit or abort operation of Ta should also precede that read or write of Tb.
A strict schedule allows only committed read and write operations. This
schedule imposes more restrictions than a cascadeless schedule.

Given below is an example of a strict schedule -

T1              T2
R(X)
                R(X)
W(X)
Commit
                W(X)
                R(X)
                Commit
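
The difference between cascadeless and strict schedules can be made concrete
with a small checker. It assumes a schedule is a list of (transaction,
action, item) tuples with actions "R", "W", "C" (commit) and "A" (abort), and
it tracks only the most recent uncommitted writer of each item, which is a
simplification. The function name `classify` is illustrative.

```python
def classify(schedule):
    """Return (is_cascadeless, is_strict) for the given schedule."""
    uncommitted_writer = {}   # item -> transaction with an uncommitted write
    cascadeless = strict = True
    for txn, action, item in schedule:
        if action in ("R", "W"):
            holder = uncommitted_writer.get(item)
            if holder is not None and holder != txn:
                strict = False            # read or write of uncommitted data
                if action == "R":
                    cascadeless = False   # dirty read
            if action == "W":
                uncommitted_writer[item] = txn
        elif action in ("C", "A"):
            # the transaction's writes are no longer uncommitted
            for it in [i for i, t in uncommitted_writer.items() if t == txn]:
                del uncommitted_writer[it]
    return cascadeless, strict

cascadeless_example = [
    ("T1", "R", "X"), ("T1", "W", "X"), ("T2", "W", "X"),
    ("T1", "C", None), ("T2", "R", "X"), ("T2", "C", None),
]
strict_example = [
    ("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "X"),
    ("T1", "C", None), ("T2", "W", "X"), ("T2", "R", "X"), ("T2", "C", None),
]
print(classify(cascadeless_example), classify(strict_example))
```

Under these assumptions the cascadeless example is not strict, because T2
overwrites X before T1 commits, while the strict example satisfies both
properties.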

Failure Classification

To find where a problem has occurred, we group failures into the following
categories:

1. Transaction failure
2. System crash
3. Disk failure

1. Transaction failure

A transaction failure occurs when a transaction fails to execute or reaches a
point from where it can't go any further. If only a few transactions or
processes are affected, this is called a transaction failure.

Reasons for a transaction failure could be -

1. Logical errors: A logical error occurs if a transaction cannot complete
due to some code error or an internal error condition.
2. System errors: These occur when the DBMS itself terminates an active
transaction because the database system is not able to execute it. For
example, the system aborts an active transaction in case of deadlock or
resource unavailability.

2. System Crash
o A system crash can occur due to a power failure or another hardware or
software failure. Example: Operating system error.

Fail-stop assumption: In a system crash, non-volatile storage is assumed not
to be corrupted.

3. Disk Failure
o It occurs when hard-disk drives or other storage drives fail. This was a
common problem in the early days of technology evolution, when drives failed
frequently.
o Disk failure occurs due to the formation of bad sectors, a disk head crash,
unreachability of the disk or any other failure that destroys all or part of
the disk storage.
DBMS Concurrency Control

Concurrency Control is the management procedure that is required for
controlling concurrent execution of the operations that take place on a
database.

But before knowing about concurrency control, we should know about concurrent
execution.

Concurrent Execution in DBMS

o In a multi-user system, multiple users can access and use the same database
at one time, which is known as the concurrent execution of the database. It
means that the same database is used simultaneously by different users.
o Database transactions often require that multiple users use the database to
perform different operations at the same time, and in that case, concurrent
execution of the database is performed.
o The simultaneous execution should be done in an interleaved manner, and no
operation should affect the other executing operations, thus maintaining the
consistency of the database. Concurrent execution of transaction operations
therefore raises several challenging problems that need to be solved.

Problems with Concurrent Execution

In a database transaction, the two main operations are READ and WRITE. These
two operations must be managed during the concurrent execution of
transactions, because if they are interleaved without control, the data may
become inconsistent. The following problems occur with the concurrent
execution of the operations:

Lost Update Problem (W-W Conflict)

The problem occurs when two different database transactions perform
read/write operations on the same database items in an interleaved manner
(i.e., concurrent execution) that makes the values of the items incorrect,
hence making the database inconsistent.
Consider the below diagram where two transactions TX and TY are performed on
the same account A, where the balance of account A is $300.

o At time t1, transaction TX reads the value of account A, i.e., $300 (only
read).
o At time t2, transaction TX deducts $50 from account A, which becomes $250
(only deducted and not yet written).
o At time t3, transaction TY reads the value of account A, which is still
$300 because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only
added but not yet written).
o At time t6, transaction TX writes the value of account A, which is updated
to $250, as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the value of account A
computed at time t4, i.e., $400. It means the value written by TX is lost,
i.e., $250 is lost.

Hence the data becomes incorrect, and the database becomes inconsistent.
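
The timeline above can be replayed step by step. The sketch below follows the
same values ($300, minus $50, plus $100) and compares the interleaved result
with a serial execution. The function names are chosen for illustration.

```python
def interleaved():
    """Replay the interleaving t1..t7 described above."""
    account_a = 300
    tx_value = account_a    # t1: TX reads 300
    tx_value -= 50          # t2: TX computes 250, not yet written
    ty_value = account_a    # t3: TY still reads 300
    ty_value += 100         # t4: TY computes 400, not yet written
    account_a = tx_value    # t6: TX writes 250
    account_a = ty_value    # t7: TY writes 400, overwriting TX's update
    return account_a

def serial():
    """Run TX to completion, then TY."""
    account_a = 300
    account_a -= 50         # TX
    account_a += 100        # TY
    return account_a

print(interleaved(), serial())
```

The interleaved run ends with $400 while the serial run ends with $350, so
the $50 deduction made by TX has been lost.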

Dirty Read Problem (W-R Conflict)

The dirty read problem occurs when one transaction updates an item of the
database and then fails, and before the update is rolled back, the updated
item is read by another transaction. This creates a write-read conflict
between the two transactions.

For example:

Consider two transactions TX and TY in the below diagram performing
read/write operations on account A, where the available balance in account A
is $300:

o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e.,
$350.
o Then at time t4, transaction TY reads account A, which is read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the
value changes back to $300 (as initially).
o But transaction TY has already read $350 for account A, a value that was
never committed. This is a dirty read, and the situation is therefore known
as the Dirty Read Problem.
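
The same steps can be replayed in a few lines. This sketch keeps one variable
for the last committed value and one for the value currently visible in the
database, which is a deliberate simplification. All names are illustrative.

```python
def dirty_read_demo():
    """Replay t1..t5 and return (value seen by TY, final value of A)."""
    committed = 300         # last committed balance of account A
    visible = committed     # value currently stored in the database
    tx_value = visible      # t1: TX reads 300
    tx_value += 50          # t2: TX computes 350
    visible = tx_value      # t3: TX writes 350 without committing
    ty_seen = visible       # t4: TY reads the uncommitted 350
    visible = committed     # t5: TX rolls back, A is 300 again
    return ty_seen, visible

print(dirty_read_demo())
```

TY ends up holding $350 even though the database never committed that value.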

Unrepeatable Read Problem (R-W Conflict)

Also known as the Inconsistent Retrievals Problem, it occurs when, within a
single transaction, two different values are read for the same database item.

For example:
Consider two transactions, TX and TY, performing read/write operations on
account A, which has an available balance of $300. The diagram is shown
below:

o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to
the available balance, and it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account
A, and it is read as $400.
o It means that within the same transaction TX, two different values of
account A are read, i.e., $300 initially, and $400 after the update made by
transaction TY. It is an unrepeatable read and is therefore known as the
Unrepeatable Read Problem.

Thus, in order to maintain consistency in the database and avoid the problems
that arise in concurrent execution, management is needed, and that is where
the concept of Concurrency Control comes into play.

Concurrency Control

Concurrency Control is the mechanism required for controlling and managing
the concurrent execution of database operations and thus avoiding
inconsistencies in the database. To maintain concurrency in the database, we
have the concurrency control protocols.
Concurrency Control Protocols

The concurrency control protocols ensure the atomicity, consistency,
isolation, durability and serializability of the concurrent execution of
database transactions. These protocols are categorized as:

o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

Lock based Protocol in DBMS


A lock is a mechanism that ensures that the integrity of data is maintained.
It does that by locking the data while a transaction is running: a
transaction cannot read or write the data until it acquires the appropriate
lock. There are two types of lock that can be placed while accessing the data
so that a concurrent transaction cannot alter the data while we are
processing it.

1. Shared Lock(S)
2. Exclusive Lock(X)

1. Shared Lock(S): A shared lock is placed when we are reading the data.
Multiple shared locks can be placed on the data, but while a shared lock is
placed, no exclusive lock can be placed.

To understand the lock mechanism, let's take an example of a conflict:

You and your brother have a joint bank account, from which you both can
withdraw money. Now let's say you both go to different branches of the same
bank at the same time and each try to withdraw 5000 INR, while your joint
account has a balance of only 6000 INR.

If we don't have concurrency control in place, you both can get 5000 INR at
the same time, but once both the transactions finish, the account balance
would be -4000 INR, which is not possible and leaves the database in an
inconsistent state.

We need something that controls the transactions in such a way that they can
run concurrently while the consistency of the data is maintained.

Solution of the above problem using Shared lock:

For example, when two transactions are reading Steve's account balance, let
them read by placing shared locks, but if at the same time another
transaction wants to update Steve's account balance by placing an exclusive
lock, do not allow it until the reading is finished.

2. Exclusive Lock(X): An exclusive lock is placed when we want to read and
write the data. This lock allows both the read and write operations. Once
this lock is placed on the data, no other lock (shared or exclusive) can be
placed on the data until the exclusive lock is released.

For example, when a transaction wants to update Steve's account balance, let
it do so by placing an X lock on it. If a second transaction wants to read
the data (S lock), don't allow it, and if another transaction wants to write
the data (X lock), don't allow that either.

So based on this we can create a table like this:

Lock Compatibility Matrix

+-----+-------+-------+
|     |   S   |   X   |
+-----+-------+-------+
|  S  | True  | False |
+-----+-------+-------+
|  X  | False | False |
+-----+-------+-------+
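
The matrix can be turned into a small lock table. The following is a minimal sketch, assuming a hypothetical `LockTable` class that either grants a request at once or refuses it; a real lock manager would queue the refused request instead.

```python
# Compatibility of (mode already held, mode requested), as in the matrix.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockTable:
    """Hypothetical lock table: item -> list of (transaction, mode)."""
    def __init__(self):
        self.held = {}

    def request(self, txn, item, mode):
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            # A lock held by another transaction must be compatible.
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]

table = LockTable()
print(table.request("T1", "balance", "S"))  # reader
print(table.request("T2", "balance", "S"))  # second reader
print(table.request("T3", "balance", "X"))  # writer while readers hold S
```

The two readers are granted their shared locks, while the writer is refused until both shared locks are released.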

Difference between Shared Lock and Exclusive Lock


One of the methods to ensure the isolation property in transactions is to
require that data items be accessed in a mutually exclusive manner. That means,
while one transaction is accessing a data item, no other transaction can modify
that data item. So, the most common method used to implement this requirement
is to allow a transaction to access a data item only if it is currently holding
a lock on that item. Thus, the lock operation is required to ensure isolation
of transactions.
1. Shared Lock (S):
• Another transaction that tries to read the same data is permitted to read, but
a transaction that tries to update the data will be prevented from doing so
until the shared lock is released.
• Shared lock is also called read lock, used for reading data items only.
• Shared locks support read integrity. They ensure that a record is not in the
process of being updated during a read-only request.
• Shared locks can also be used to prevent any kind of update of a record.
• It is denoted by Lock-S.
• S-lock is requested using the Lock-S instruction.
For example, consider a case where initially A=100 and there are two
transactions which are reading A. If one of the transactions were allowed to
update A, the other transaction would be reading a wrong value. The shared lock
prevents the update until the other transaction has finished reading.

2. Exclusive Lock (X) :

• When a statement modifies data, its transaction holds an exclusive lock on
the data that prevents other transactions from accessing the data.
• This lock remains in place until the transaction holding the lock issues a
commit or rollback.
• They can be owned by only one transaction at a time.
• With the Exclusive Lock, a data item can be read as well as written. Also
called write lock.
• Any transaction that requires an exclusive lock must wait if another
transaction currently owns an exclusive lock or a shared lock against the
requested resource.
• It is denoted as Lock-X.
• X-lock is requested using the Lock-X instruction.

For example, consider a case where initially A=100 and a transaction needs to
deduct 50 from A. We can allow this transaction by placing an X lock on A.
Then, when any other transaction wants to read or write A, the exclusive lock
prevents it.

Lock Compatibility Matrix :

Compatibility matrix for locks

• If the transaction T1 is holding a shared lock on data item A, then the
concurrency control manager can grant a shared lock to transaction T2 as the
compatibility is TRUE, but it cannot grant an exclusive lock as the
compatibility is FALSE.
• In simple words, if transaction T1 is reading a data item A, then the same
data item A can be read by another transaction T2 but cannot be written by
another transaction.
• Similarly, if an exclusive lock (i.e. a lock for read and write operations) is
held on the data item by some transaction, then no other transaction can
acquire a Shared or Exclusive lock, as the compatibility function denotes
FALSE.

Difference between Shared Lock and Exclusive Lock :


S.No. | Shared Lock | Exclusive Lock
1. | Lock mode is read only operation. | Lock mode is read as well as write operation.
2. | Shared lock can be placed on objects that do not have an exclusive lock already placed on them. | Exclusive lock can only be placed on objects that do not have any other kind of lock.
3. | Prevents others from updating the data. | Prevents others from reading or updating the data.
4. | Issued when transaction wants to read item that does not have an exclusive lock. | Issued when transaction wants to update unlocked item.
5. | Any number of transactions can hold shared lock on an item. | Exclusive lock can be held by only one transaction.
6. | S-lock is requested using lock-S instruction. | X-lock is requested using lock-X instruction.

Two Phase Locking Protocol

We have discussed briefly the first type of Concurrency Control Protocol, i.e.,
Lock-based Protocol.
Recalling where we left off, there are two types of locks available: Shared
S(a) and Exclusive X(a). Implementing this lock system without any restrictions
gives us the Simple Lock-based protocol, but it has its own disadvantage: it
does not guarantee Serializability. Schedules may follow the preceding rules
and still be non-serializable.
To guarantee serializability, we must follow an additional protocol concerning
the positioning of locking and unlocking operations in every transaction. This
is where the concept of Two-Phase Locking (2-PL) comes into the picture: 2-PL
ensures serializability. Now, let’s dig deep!
Two-Phase Locking –

A transaction is said to follow the Two-Phase Locking protocol if Locking and
Unlocking can be done in two phases.
1. Growing Phase: New locks on data items may be acquired but none can be
released.
2. Shrinking Phase: Existing locks may be released but no new locks can be
acquired.
Note – If lock conversion is allowed, then upgrading of a lock (from S(a) to
X(a)) is allowed in the Growing Phase, and downgrading of a lock (from X(a)
to S(a)) must be done in the Shrinking Phase.
Let’s see a transaction implementing 2-PL.

      T1            T2
1     lock-S(A)
2                   lock-S(A)
3     lock-X(B)
4     …….           ……
5     Unlock(A)
6                   Lock-X(C)
7     Unlock(B)
8                   Unlock(A)
9                   Unlock(C)
10    …….           ……
This is just a skeleton transaction that shows how unlocking and locking work
with 2-PL. Note for:
Transaction T1:
• The growing Phase is from steps 1-3.
• The shrinking Phase is from steps 5-7.
• Lock Point at 3
Transaction T2:
• The growing Phase is from steps 2-6.
• The shrinking Phase is from steps 8-9.
• Lock Point at 6
Hey, wait!
What is LOCK POINT? The point at which the growing phase ends, i.e., when a
transaction takes the final lock it needs to carry on its work. Now look at the
schedule again, you’ll surely understand.
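
The two-phase rule and the lock point can be checked mechanically. Below is a minimal sketch, assuming a hypothetical helper `check_two_phase` that receives the lock and unlock steps of one transaction; the step numbers are the ones listed above for T1 and T2.

```python
def check_two_phase(ops):
    """ops: list of (step, action, item) for ONE transaction, where
    action is 'lock' or 'unlock'. Returns (obeys_2pl, lock_point_step)."""
    lock_point = None
    shrinking = False
    for step, action, item in ops:
        if action == "lock":
            if shrinking:
                # A lock requested after an unlock breaks the protocol.
                return False, lock_point
            lock_point = step          # last lock seen so far
        elif action == "unlock":
            shrinking = True           # the shrinking phase has begun
    return True, lock_point

T1 = [(1, "lock", "A"), (3, "lock", "B"), (5, "unlock", "A"), (7, "unlock", "B")]
T2 = [(2, "lock", "A"), (6, "lock", "C"), (8, "unlock", "A"), (9, "unlock", "C")]

print(check_two_phase(T1))
print(check_two_phase(T2))
# A transaction that locks again after unlocking is not two-phase.
print(check_two_phase([(1, "lock", "A"), (2, "unlock", "A"), (3, "lock", "B")]))
```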
I have said that 2-PL ensures serializability, but there are still some
drawbacks of 2-PL. Let’s glance at the drawbacks:
• Cascading Rollback is possible under 2-PL.
• Deadlocks and Starvation are possible.
Cascading Rollbacks in 2-PL –
Let’s see the following Schedule:

Take a moment to analyze the schedule. Yes, you’re correct: because of the
Dirty Reads in T2 and T3 in lines 8 and 12 respectively, when T1 fails we have
to roll back the others also. Hence, Cascading Rollbacks are possible in 2-PL.
I have taken skeleton schedules as examples because it’s easy to understand
when it’s kept simple. When explained with real transaction problems with many
variables, it becomes very complex.
Deadlock in 2-PL –
Consider this simple example, it will be easy to understand. Say we have two
transactions T1 and T2.
Schedule: Lock-X1(A) Lock-X2(B) Lock-X1(B) Lock-X2(A)
T1 holds A and waits for B, while T2 holds B and waits for A. Drawing the
wait-for graph, you may detect the loop. So Deadlock is also possible in 2-PL.
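
The loop can be found by building a wait-for graph from the lock requests. This is a rough sketch under assumptions: `wait_for_edges` and `has_cycle` are hypothetical helpers, every request is for an exclusive lock, and a blocked transaction is simply recorded as waiting instead of being suspended.

```python
def wait_for_edges(requests):
    """requests: list of (txn, item) exclusive-lock requests in schedule
    order. Returns edges (waiter, holder) of the wait-for graph."""
    owner = {}
    edges = set()
    for txn, item in requests:
        holder = owner.get(item)
        if holder is None:
            owner[item] = txn            # lock is free: grant it
        elif holder != txn:
            edges.add((txn, holder))     # txn must wait for the holder
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True                  # back edge: a cycle exists
        visiting.add(node)
        found = any(visit(n) for n in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(n) for n in list(graph))

# Lock-X1(A) Lock-X2(B) Lock-X1(B) Lock-X2(A)
schedule = [("T1", "A"), ("T2", "B"), ("T1", "B"), ("T2", "A")]
edges = wait_for_edges(schedule)
print(edges, has_cycle(edges))
```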
Two-phase locking may also limit the amount of concurrency that occurs in a
schedule because a Transaction may not be able to release an item after it has
used it. This may be because of the protocols and other restrictions we may put
on the schedule to ensure serializability, deadlock freedom, and other factors.
This is the price we have to pay to ensure serializability and other factors,
hence it can be considered as a bargain between concurrency and maintaining
the ACID properties.
The above-mentioned type of 2-PL is called Basic 2PL. To sum it up it
ensures Conflict Serializability but does not prevent Cascading Rollback and
Deadlock. Further, we will study three other types of 2PL, Strict 2PL,
Conservative 2PL, and Rigorous 2PL.

Categories of Two Phase Locking (Strict , Rigorous & Conservative)

Strict:
A transaction needs to comply with 2PL, and release its write (exclusive) locks
only after it has ended, i.e. after being either committed or aborted.
The first phase of Strict-2PL is the same as 2PL.
After acquiring all the locks in the first phase, the transaction continues to
execute normally.
Strict-2PL holds all the exclusive locks until the commit point and releases
them all at once.
Strict-2PL does not have cascading abort as 2PL does.
On the other hand, read (shared) locks are released regularly during phase 2.
Rigorous :

It requires that, in addition to the locking being two-phase, all Exclusive (X)
and Shared (S) locks held by the transaction not be released until after the
transaction commits.
• Rigorous two-phase locking is even stricter than Strict 2PL: here all locks
are held till commit/abort. In this protocol transactions can be serialized in
the order in which they commit.
• It does not guarantee that deadlock or starvation cannot occur: locks are
still acquired one at a time, so two transactions can end up waiting for each
other.
Conservative:

• Conservative two-phase locking (C2PL) is a locking method used in DBMS and
relational databases.
• It prevents deadlocks but not starvation and cascading rollbacks.
• This is to ensure that a transaction that already holds some locks will not
block waiting for other locks.
• The difference between 2PL and C2PL is that C2PL's transactions obtain all
the locks they need before the transactions begin.
• All locks are acquired before the transaction begins and none are acquired
during the transaction, therefore no deadlock can occur, but the rollback
problem is still there.
Validation Based Protocol in DBMS

Validation based protocol does not restrict transactions with locks while they
run. It works on the assumption that conflicts between concurrently running
transactions are rare, so that in most cases no interference occurs. This is
why it is also called Optimistic Concurrency Control Technique.

In this protocol, a transaction doesn’t make any changes to the database
directly; instead it performs all the changes on local copies of the data items
that are maintained in the transaction itself. At the end of the transaction, a
validation is performed on the transaction. If it doesn’t violate any
serializability rule, the transaction commits the changes to the database;
otherwise it is rolled back and restarted.

Three phases of Validation based Protocol

1. Read phase: In this phase, a transaction reads the values of data items from
the database and stores them in temporary local variables. The transaction
then starts executing, but it doesn’t update the data items in the database;
instead it performs all the operations on the temporary local variables.
2. Validation phase: In this phase, a validation check is done on the
temporary variables to see whether writing them would violate the rules of
serializability.
3. Write phase: This is the final phase of validation based protocol. In this
phase, if the validation of the transaction is successful then the values of
the temporary local variables are written to the database and the
transaction is committed. If the validation fails in the second phase then
the updates are discarded and the transaction is rolled back to be restarted
later.

Let’s look at the timestamps of each phase of a transaction:

Start(Tn): It represents the timestamp when the transaction Tn starts the
execution.

Validation(Tn): It represents the timestamp when the transaction Tn finishes
the read phase and starts the validation phase.

Finish(Tn): It represents the timestamp when the transaction Tn finishes all the
write operations.

This protocol uses Validation(Tn) as the timestamp of the transaction Tn
because this is the actual phase of the transaction where all the checks happen.
So it is safe to say that TS(Tn) = Validation(Tn).

If there are two transactions T1 & T2 managed by validation based protocol and
if Finish(T1) < Start(T2) then the validation will be successful as the
serializability is maintained because T1 finished the execution well before the
transaction T2 started the read phase.
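
The validation test can be sketched as a function over these three timestamps. This is only an illustration: the `Txn` record and `validates` are hypothetical names, and the second condition in the loop (overlapping execution is still accepted when the earlier transaction wrote nothing that the later one read) is the usual textbook rule, which is an assumption here because the notes above state only the `Finish(T1) < Start(T2)` case.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    name: str
    start: int
    validation: int
    finish: int
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validates(tj, earlier):
    """earlier: transactions Ti with TS(Ti) < TS(tj)."""
    for ti in earlier:
        # Case from the notes: Ti finished before tj started.
        if ti.finish < tj.start:
            continue
        # Assumed textbook case: Ti finished its writes before tj began
        # validating, and tj read nothing that Ti wrote.
        if tj.start < ti.finish < tj.validation and not (ti.write_set & tj.read_set):
            continue
        return False
    return True

T1 = Txn("T1", start=1, validation=3, finish=4, read_set={"A"}, write_set={"A"})
T2 = Txn("T2", start=5, validation=7, finish=8, read_set={"A"}, write_set={"A"})
T3 = Txn("T3", start=2, validation=6, finish=9, read_set={"A"})
T4 = Txn("T4", start=2, validation=6, finish=9, read_set={"B"}, write_set={"B"})

print(validates(T2, [T1]))  # T1 finished before T2 started
print(validates(T3, [T1]))  # T3 overlapped T1 and read what T1 wrote
print(validates(T4, [T1]))  # T4 overlapped T1 but touched other data
```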

Checkpoints in DBMS
Why do we need Checkpoints ?
Whenever transaction logs are created in a real-time environment, they eat up
lots of storage space. Keeping track of every update and maintaining it also
increases the physical space used by the system. Eventually, the transaction
log file may become impossible to handle as its size keeps growing. This can be
addressed with checkpoints. The mechanism by which the updates recorded in
earlier log records are saved to permanent storage, so that those log records
can be removed, is called a Checkpoint.
What is a Checkpoint ?
The checkpoint is used to declare a point before which the DBMS was in a
consistent state and all transactions were committed. During transaction
execution, such checkpoints are traced. After execution, transaction log files
will be created.
Upon reaching the checkpoint, the updates recorded in the log file are saved to
the database and the log file is then destroyed. A new log is then created for
the upcoming operations of the transaction; it is updated until the next
checkpoint, and the process continues.
How to use Checkpoints in database ?
Steps :
1. Write begin_checkpoint record into log.
2. Collect checkpoint data in the stable storage.
3. Write end_checkpoint record into log.
The behavior when the system crashes and recovers when concurrent
transactions are executed is shown below –

Understanding Checkpoints in multiple Transactions

• The recovery system reads the logs backward from the end to the last
checkpoint, i.e. from T4 to T1.
• It will keep track of two lists – Undo and Redo.
• Whenever there is a log with instructions <Tn, start> and <Tn, commit>, or
only <Tn, commit>, it will put that transaction in the Redo List. T2 and T3
contain <Tn, Start> and <Tn, Commit>, whereas T1 will have only <Tn,
Commit>. Here, T1, T2, and T3 are in the redo list.
• Whenever a log record with no instruction of commit or abort is found, that
transaction is put in the Undo List. Here, T4 has <Tn, Start> but no <Tn,
commit> as it is an ongoing transaction. T4 will be put in the undo list.
All the transactions in the redo-list are deleted with their previous logs and
then redone before saving their logs. All the transactions in the undo-list are
undone and their logs are deleted.
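
The backward scan can be sketched as follows. The log layout is an assumption made for illustration: each record is a tuple `(txn, 'start')` or `(txn, 'commit')`, the checkpoint is the one-element tuple `('checkpoint',)`, and `classify` is a hypothetical helper. Transactions that were active at the checkpoint and never commit afterwards are not tracked by this simplified scan.

```python
def classify(log):
    """Scan the log backward from the end to the last checkpoint and
    return (redo_list, undo_list)."""
    redo, undo = [], []
    committed = set()
    for record in reversed(log):
        if record[0] == "checkpoint":
            break                        # stop at the last checkpoint
        txn, action = record
        if action == "commit":
            committed.add(txn)
            redo.append(txn)             # committed after the checkpoint
        elif action == "start" and txn not in committed:
            undo.append(txn)             # started but never committed
    return sorted(redo), sorted(undo)

log = [
    ("T1", "start"),
    ("checkpoint",),
    ("T2", "start"),
    ("T1", "commit"),
    ("T3", "start"),
    ("T2", "commit"),
    ("T3", "commit"),
    ("T4", "start"),
]
redo_list, undo_list = classify(log)
print(redo_list, undo_list)
```

With this log, T1 shows only a commit after the checkpoint, T2 and T3 show both start and commit, and T4 shows only a start, matching the case described above.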
Relevance of Checkpoints :

A checkpoint is a feature that supports the C (consistency) of an
ACID-compliant RDBMS. A checkpoint is used for recovery if there is an
unexpected shutdown in the database. Checkpoints work at some intervals and
write all dirty pages (modified pages) to the data file, i.e. from a buffer to
physical disk. It is also known as the hardening of dirty pages. It is a
dedicated process and is run automatically by SQL Server at specific intervals.
The checkpoint serves as the synchronization point between the database and
the transaction log.

Advantages of using Checkpoints :


• It speeds up the data recovery process.
• Most DBMS products checkpoint themselves automatically.
• Checkpoint records in the log file are used to prevent unnecessary redo
operations.
• Since dirty pages are flushed out continuously in the background, it has
very low overhead and can be done frequently.

Real-Time Applications of Checkpoints :
• Whenever an application that may have modified the database is tested in a
real-time environment, it is verified and validated using checkpoints.
• Checkpoints are used to create backups and recovery points prior to applying
any updates in the database.
• The recovery system is used to return the database to the checkpoint state.

Difference between Conflict and View Serializability :


S.No. | Conflict Serializability | View Serializability

1. Conflict Serializability: Two schedules are said to be conflict equivalent
   if all the conflicting operations in both the schedules get executed in the
   same order. If a schedule is conflict equivalent to its serial schedule then
   it is called Conflict Serializable Schedule.
   View Serializability: Two schedules are said to be view equivalent if the
   order of initial read, final write and update operations is the same in both
   the schedules. If a schedule is view equivalent to its serial schedule then
   it is called View Serializable Schedule.

2. Conflict Serializability: If a schedule is view serializable then it may or
   may not be conflict serializable.
   View Serializability: If a schedule is conflict serializable then it is also
   a view serializable schedule.

3. Conflict Serializability: Conflict equivalence can be easily achieved by
   reordering the operations of two transactions; therefore, Conflict
   Serializability is easy to achieve.
   View Serializability: View equivalence is rather difficult to achieve as
   both transactions should perform similar actions in a similar manner. Thus,
   View Serializability is difficult to achieve.

4. Conflict Serializability: For a transaction T1 writing a value A that no one
   else reads but that some other transaction, say T2, later overwrites with
   its own value of A, W(A) cannot be placed under positions where it is never
   read.
   View Serializability: If a transaction T1 writes a value A that no other
   transaction reads (because later some other transaction, say T2, writes its
   own value of A), W(A) can be placed in positions of the schedule where it is
   never read.

Example for Conflict Serializability –


Let us consider the following transaction schedule and test it for Conflict
Serializability.
T1        T2        T3
R(X)
                    R(X)
W(Y)
          W(X)
                    R(Y)
          W(Y)

Now, we will list all the conflicting operations. Further, we will determine
whether the schedule is conflict serializable using Precedence Graph.
Two operations are said to be conflicting if they belong to different
transactions, operate on the same data item and at least one of them is a write
operation.
1. R3(X) and W2(X) [ T3 -> T2 ]
2. W1(Y) and R3(Y) [ T1 -> T3 ]
3. W1(Y) and W2(Y) [ T1 -> T2 ]
4. R3(Y) and W2(Y) [ T3 -> T2 ]
Constructing the precedence graph, we see there are no cycles in the graph.
Therefore, the schedule is Conflict Serializable.
The serializable schedule is,
T1 -> T3 -> T2
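
The precedence-graph test can be sketched in code. The helpers `precedence_edges` and `serial_order` are hypothetical, and the schedule fed to them is restricted to the five operations that appear in the conflict list above, in an order consistent with that list.

```python
def precedence_edges(schedule):
    """schedule: list of (txn, op, item) in execution order, op is 'R'
    or 'W'. Returns one edge (Ti, Tj) per conflicting pair."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def serial_order(txns, edges):
    """Topological sort (Kahn); returns None if the graph has a cycle."""
    indegree = {t: 0 for t in txns}
    for _, b in edges:
        indegree[b] += 1
    ready = sorted(t for t in txns if indegree[t] == 0)
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for a, b in sorted(edges):
            if a == t:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return order if len(order) == len(txns) else None

schedule = [
    ("T3", "R", "X"),
    ("T1", "W", "Y"),
    ("T2", "W", "X"),
    ("T3", "R", "Y"),
    ("T2", "W", "Y"),
]
edges = precedence_edges(schedule)
print(sorted(edges))
print(serial_order(["T1", "T2", "T3"], edges))
```

An acyclic graph yields an equivalent serial order; a cyclic one makes `serial_order` return `None`, meaning the schedule is not conflict serializable.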
Example for View Serializability –
Let us consider the following transaction schedule and test it for View
Serializability.
T1        T2        T3
R(A)
          W(A)
                    R(A)
W(A)
                    W(A)

As we know, if a schedule is Conflict Serializable, then it is View
Serializable also. So first let us check for Conflict Serializability.
The conflicting operations for this schedule are –
1. R1(A) and W2(A) [ T1 -> T2 ]
2. R1(A) and W3(A) [ T1 -> T3 ]
3. W2(A) and R3(A) [ T2 -> T3 ]
4. W2(A) and W1(A) [ T2 -> T1 ]
5. W2(A) and W3(A) [ T2 -> T3 ]
6. R3(A) and W1(A) [ T3 -> T1 ]
7. W1(A) and W3(A) [ T1 -> T3 ]
Constructing the precedence graph for conflicting operations in the
schedule.

As we can see, there is a cycle in the precedence graph, which means that
the given schedule is not Conflict Serializable. Now, on checking for blind
writes we find that there exists a blind write W2(A) in the given schedule.
Thus, the schedule may or may not be View Serializable.
In order to check for View Serializability, we will draw a Dependency Graph of
the schedule. From the given schedule we gather the following points :
1. T1 reads A before T2 updates A; thus, T1 must execute before T2.
2. T3 does the final update on A; thus, it must execute at the end.
Constructing the dependency graph.

As there exists no cycle in the graph, we can say that the given schedule is
View Serializable.
The serializable schedule is T1 -> T2 -> T3.
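
The same answer can be reached by brute force: compare the schedule with every serial order and keep the orders that agree on who reads from whom and on who writes last. This is a sketch only; `view_profile` and `view_serial_orders` are hypothetical helpers, and the operation order R1(A), W2(A), R3(A), W1(A), W3(A) is the one implied by the conflict list above.

```python
from itertools import permutations

def view_profile(schedule):
    """Return (reads_from, final_writer) for a list of (txn, op, item)."""
    last_writer = {}
    read_count = {}
    reads_from = set()
    for txn, op, item in schedule:
        if op == "R":
            k = read_count.get(txn, 0)
            read_count[txn] = k + 1
            # None as the writer means the initial value was read.
            reads_from.add((txn, k, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads_from, dict(last_writer)

def view_serial_orders(schedule):
    """All serial orders that are view equivalent to the schedule."""
    txns = sorted({t for t, _, _ in schedule})
    target = view_profile(schedule)
    orders = []
    for order in permutations(txns):
        serial = [step for t in order for step in schedule if step[0] == t]
        if view_profile(serial) == target:
            orders.append(order)
    return orders

schedule = [
    ("T1", "R", "A"),
    ("T2", "W", "A"),
    ("T3", "R", "A"),
    ("T1", "W", "A"),
    ("T3", "W", "A"),
]
print(view_serial_orders(schedule))
```

Trying every permutation grows factorially with the number of transactions, so this is only practical for small examples like this one.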
Introduction of Shadow Paging
Shadow Paging is a recovery technique that is used to recover a database. In
this recovery technique, the database is considered to be made up of fixed-size
logical units of storage which are referred to as pages. Pages are mapped into
physical blocks of storage with the help of the page table, which has one entry
for each logical page of the database. This method uses two page tables named
current page table and shadow page table.

The entries in the current page table point to the most recent database pages
on disk. The shadow page table is created when the transaction starts, by
copying the current page table. After this, the shadow page table is saved on
disk and the current page table is used for the transaction. Entries in the
current page table may be changed during execution, but the shadow page table
is never changed. After the transaction commits, both tables become identical.
This technique is also known as Out-of-Place updating.

To understand the concept, consider the above figure. In it, 2 write operations
are performed, on pages 3 and 5. Before the start of the write operation on
page 3, the current page table points to the old page 3. When the write
operation starts, the following steps are performed :
1. Firstly, a search starts for an available free block among the disk blocks.
2. After finding a free block, page 3 is copied to the free block, which is
represented by Page 3 (New).
3. Now the current page table points to Page 3 (New) on disk, but the shadow
page table still points to the old page 3 because it is not modified.
4. The changes are now propagated to Page 3 (New), which is pointed to by the
current page table.
COMMIT Operation :
To commit a transaction, the following steps should be done :
1. All the modifications made by the transaction that are present in buffers
are transferred to the physical database.
2. Output the current page table to disk.
3. The disk address of the current page table is written to the fixed location
in stable storage that contains the address of the shadow page table. This
operation overwrites the address of the old shadow page table. With this,
the current page table becomes the shadow page table and the transaction is
committed.
Failure :
If the system crashes during execution of the transaction but before the commit
operation, it is sufficient to free the modified database pages and discard the
current page table. The state of the database before execution of the
transaction is recovered by reinstalling the shadow page table.
If the crash occurs after the last write operation, it does not affect the
propagation of the changes made by the transaction. These changes are
preserved and there is no need to perform a redo operation.
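
The write, commit and failure steps can be sketched with two dictionaries acting as the page tables. This is a toy in-memory model under assumptions: the `ShadowPagedDB` class and its methods are hypothetical, a whole page is replaced on each write, and freeing a block is modelled by deleting it from the `disk` dictionary.

```python
class ShadowPagedDB:
    """Toy model: page tables map page number -> disk block number."""
    def __init__(self, pages):
        self.disk = dict(enumerate(pages))                 # block -> contents
        self.shadow = {i: i for i in range(len(pages))}    # fixed during txn
        self.current = dict(self.shadow)                   # used by the txn
        self.next_free = len(pages)

    def write(self, page, value):
        if self.current[page] == self.shadow[page]:
            # First write to this page: point the current table at a
            # free block so the old block stays untouched.
            self.current[page] = self.next_free
            self.next_free += 1
        self.disk[self.current[page]] = value

    def read(self, page):
        return self.disk[self.current[page]]

    def commit(self):
        old = [self.shadow[p] for p in self.shadow if self.shadow[p] != self.current[p]]
        self.shadow = dict(self.current)   # current table becomes the shadow
        for block in old:
            del self.disk[block]           # return old blocks to the free pool

    def abort(self):
        new = [self.current[p] for p in self.current if self.current[p] != self.shadow[p]]
        self.current = dict(self.shadow)   # reinstall the shadow page table
        for block in new:
            del self.disk[block]           # free the modified pages

db = ShadowPagedDB(["p0", "p1", "p2", "p3", "p4", "p5"])
db.write(3, "new3")
db.write(5, "new5")
print(db.read(3), db.disk[db.shadow[3]])   # new copy vs. untouched old page
db.abort()
print(db.read(3))                          # state before the transaction
```

Neither path replays a log: aborting only discards the current page table, and committing only switches which table is treated as the shadow.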
Advantages :
• This method requires fewer disk accesses to perform an operation.
• In this method, recovery from a crash is inexpensive and quite fast.
• There is no need for operations like Undo and Redo.
Disadvantages :
• Because an update changes the location of a page on disk, it is quite
difficult to keep related pages of the database close together on disk.
• During the commit operation, the old blocks that were pointed to by the
shadow page table have to be returned to the collection of free blocks,
otherwise they become inaccessible.
• The commit of a single transaction requires multiple blocks to be written,
which decreases execution speed.
• It is difficult to extend this technique to allow multiple transactions to
run concurrently.
