Chapter 3 - Introduction to Database Transactions
Introduction to Transaction Processing
Based on the number of users who can use the system
concurrently, database systems can be classified as:
Single-User System:
At most one user at a time can use the database management
system.
E.g., a personal computer system.
Multiuser System:
Many users can access the DBMS concurrently.
E.g., airline reservation systems, banking systems and the like are
operated by many users who submit transactions concurrently to
the system.
This is achieved by multiprogramming, which allows the
computer to execute multiple programs/processes at the same
time.
Concurrency
Interleaved processing:
A single central processing unit (CPU) executes at most one process
at a time.
However, multiprogramming operating systems execute some
commands from one process, then suspend that process and execute
some commands from the next process, and so on.
Therefore, concurrent execution of processes in a single CPU is
interleaved.
Advantages:
It keeps the CPU busy when a process requires I/O, by switching to
another process rather than remaining idle during I/O time.
It prevents a long process from delaying other processes.
Parallel processing:
If the computer system has multiple hardware processors (CPUs),
parallel processing of multiple processes is possible: the
processes are executed concurrently on multiple CPUs.
A Transaction:
A logical unit of database processing that includes one or
more access operations (read: retrieval; write: insert,
update or delete).
Examples include ATM transactions, credit card approvals,
flight reservations, hotel check-in, phone calls, supermarket
scanning, academic registration and billing.
The database must be in a consistent state both before and
after the transaction.
Transaction boundaries:
Any single transaction in an application program is bounded
with Begin and End statements.
An application program may contain several transactions
separated by the Begin and End transaction boundaries.
SIMPLE MODEL OF A DATABASE:
A database is a collection of named data items.
The size of a data item is called its granularity. A data item may be a
field, a record, or a whole disk block.
The basic operations that a transaction can perform are read and
write:
read_item(X): Reads a database item named X into a
program variable. To simplify our notation, we assume that
the program variable is also named X.
write_item(X): Writes the value of program variable X into
the database item named X.
The basic unit of data transfer from the disk to the
computer's main memory is one block.
In general, a data item (what is read or written) will
be the field of some record in the database, although it
may be a larger unit such as a record or even a whole
block.
The read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that
disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named
X.
The write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that
disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its
correct location in the buffer.
4. Store the updated block from the buffer back to disk (either
immediately or at some later point in time).
The DBMS maintains a number of buffers in main
memory that hold the database disk blocks
containing the database items being processed.
When these buffers are all occupied and an additional database
block must be copied into main memory, some buffer management
policy is used to choose a buffer for replacement. If the chosen
buffer has been modified, it must be written back to
disk before it is reused.
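The read_item and write_item steps, together with the write-back of a modified buffer on replacement, can be sketched as follows. This is a minimal illustration under stated assumptions, not the buffer manager of any particular DBMS: the class `BufferPool`, the least-recently-used replacement policy, the `capacity` parameter and the block names are all invented for the example, and the caller passes the block identifier directly instead of looking up a disk address.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool (hypothetical): block_id -> items, with LRU replacement."""

    def __init__(self, disk, capacity=2):
        self.disk = disk                # block_id -> {item_name: value}
        self.capacity = capacity        # number of main-memory buffers
        self.buffers = OrderedDict()    # block_id -> [block_copy, dirty_flag]
        self.disk_writes = 0

    def _fetch(self, block_id):
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)      # mark as recently used
        else:
            if len(self.buffers) >= self.capacity:
                self._evict()
            # Copy the disk block into a free buffer.
            self.buffers[block_id] = [dict(self.disk[block_id]), False]
        return self.buffers[block_id]

    def _evict(self):
        victim_id, (block, dirty) = self.buffers.popitem(last=False)
        if dirty:                        # modified buffer: write back first
            self.disk[victim_id] = block
            self.disk_writes += 1

    def read_item(self, block_id, name):
        return self._fetch(block_id)[0][name]

    def write_item(self, block_id, name, value):
        entry = self._fetch(block_id)
        entry[0][name] = value           # update the copy in the buffer only
        entry[1] = True                  # remember that the buffer is dirty


disk = {"b1": {"X": 10}, "b2": {"Y": 20}, "b3": {"Z": 30}}
pool = BufferPool(disk, capacity=2)
x = pool.read_item("b1", "X")
pool.write_item("b1", "X", x + 5)
value_on_disk_before_eviction = disk["b1"]["X"]   # the disk copy is still old
pool.read_item("b2", "Y")
pool.read_item("b3", "Z")                         # forces b1 out of the pool
```

In this sketch the update to X reaches the disk only when block b1 is chosen for replacement, which corresponds to the "at some later point in time" case of write_item.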
Two sample transactions
Assume we want to transfer 100 from account 1 (A) to account 2 (B).
Initially A = 300 and B = 400.
(a) Transaction T1      (b) Transaction T2
Read(A)                 Read(B)
A = A - 100             B = B + 100
Write(A)                Write(B)
Result: A = 200         Result: B = 500
Transaction Properties
To ensure data integrity, the DBMS should maintain the following ACID properties.
Atomicity: A transaction is an atomic unit of processing; it is either performed
in its entirety or not performed at all.
Abort: the changes made to the database are not visible (not saved).
Commit: the changes made to the database become visible.
Example: transaction Ti transfers 2000 from account CA2090 to account SB2359:
Read(CA2090)
CA2090 = CA2090 - 2000
Write(CA2090)
Read(SB2359)
SB2359 = SB2359 + 2000
Write(SB2359)
Atomicity: either all or none of the above operations are performed. This is
ensured by the transaction management component of the DBMS.
Consistency: the sum of CA2090 and SB2359 (here 8500) must be unchanged by the execution of
Ti. This is the responsibility of the application programmer who codes the
transaction.
Isolation: when several transactions are processed concurrently on a data
item, they may leave it in an inconsistent state. Handling such cases is the
responsibility of the concurrency control component of the DBMS.
Durability: once Ti commits, its updates remain in the database even when the database
is restarted after a failure. This is the responsibility of the recovery management component.
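Atomicity and consistency for the CA2090/SB2359 transfer can be tried out with Python's standard `sqlite3` module. This is only a sketch: the table name `account`, the helper functions `transfer` and `balances`, the overdraft check and the starting balances (5000 and 3500, chosen so that the sum is 8500) are assumptions made for the example and do not come from the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
            (bal,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")   # triggers rollback
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("CA2090", 5000), ("SB2359", 3500)])   # assumed balances
conn.commit()

committed = transfer(conn, "CA2090", "SB2359", 2000)   # succeeds
aborted = transfer(conn, "CA2090", "SB2359", 9000)     # rolled back
final = balances(conn)
```

The second call debits CA2090 and then aborts, so its partial update is undone and the sum of the two balances is the same before and after.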
Transaction States
A transaction is an atomic unit of work that is either completed
in its entirety or not done at all.
For recovery purposes, the system needs to keep track of when
the transaction starts, terminates, and commits or aborts.
Transaction states:
Active state: the transaction enters this state when it begins execution
and stays in it while it issues read and write operations.
Partially committed state: the read/write operations have ended,
but the modifications are not yet guaranteed to be permanent in
the database.
Committed state: all the changes made by the transaction
have been recorded persistently.
Failed state: entered when a transaction is aborted during its
active state, or when one of the checks made before commit fails.
Terminated state: corresponds to the transaction leaving the
system.
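The states listed above can be written down as a small state machine. The transition table below follows the usual textbook diagram (active to partially committed or failed, partially committed to committed or failed, committed or failed to terminated); the names `ALLOWED` and `run_states` are made up for this sketch.

```python
# Allowed transitions between transaction states (assumed from the usual
# textbook state diagram).
ALLOWED = {
    "active": {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "committed": {"terminated"},
    "failed": {"terminated"},
    "terminated": set(),
}


def run_states(transitions, start="active"):
    """Follow a list of target states, rejecting any illegal transition."""
    state = start
    for nxt in transitions:
        if nxt not in ALLOWED[state]:
            raise ValueError(f"illegal transition {state} -> {nxt}")
        state = nxt
    return state


normal_end = run_states(["partially_committed", "committed", "terminated"])
aborted_end = run_states(["failed", "terminated"])
```

A transaction can never move from failed to committed in this table, which is the state-level view of atomicity.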
[Figure: state transition diagram illustrating the states for transaction execution]
Why Concurrency Control is needed:
Concurrency: executing multiple transactions at the same time.
Advantages:
Waiting time decreases
Response time decreases
Resource utilization increases
Efficiency increases
However, simultaneous execution of transactions over a shared
database can create several data integrity and
consistency problems. There are three main problems:
1. Reading uncommitted data (WR conflict, "dirty read")
This occurs when one transaction updates a database item
and then fails for some reason, after another transaction has
already read the updated item.
T1              T2
R(A) = 5
W(A) = 6
                R(A) = 6    (dirty read: T2 reads the uncommitted value
                             from the buffer before T1 completes)
                Commit
R(B)
W(B)
Abort
2. Unrepeatable read (RW conflict)
An unrepeatable read, also known as a non-repeatable read, occurs
when a transaction reads a data item and, on reading the same item
again, finds that it has been modified or deleted by another
concurrent transaction in the meantime. This inconsistency can lead
to unexpected and incorrect results.
Example: initially A = 10, and T2 adds 5 to A.
T1              T2
R(A) = 10
                R(A) = 10
                A = A + 5
                W(A) = 15
R(A) = 15                   (confusion: T1 sees two different values of A)
3. The Lost Update Problem (WW conflict)
The lost update problem, also known as a write-write
(WW) conflict, is a concurrency issue that can occur in a
multi-user database environment when two transactions
update the same data item concurrently, resulting in the loss of
one of the updates.
For example, if transactions A and B are operating
concurrently on the same item, the update made by
transaction B may overwrite the change made by
transaction A.
Schedules
Schedule: the chronological order
in which the instructions of concurrent transactions are executed.
A schedule for a set of transactions must:
consist of all the instructions of those transactions, and
preserve the order in which the instructions appear
in each individual transaction.
Types of schedule
Serial Schedule
A schedule in which the operations of each transaction are executed
consecutively, without any interleaved operations from other
transactions.
A serial schedule cannot introduce inconsistency.
For two transactions there are two serial schedules:
S1: T1, T2 or
S2: T2, T1
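The two requirements on a schedule, and the definition of a serial schedule, can be checked mechanically. In this sketch a schedule is a list of (transaction, instruction) pairs; that representation and the function names `preserves_order` and `is_serial` are assumptions made for the example.

```python
def preserves_order(schedule, transactions):
    """True if the schedule contains every instruction of every transaction,
    in the same order as inside that transaction."""
    for name, ops in transactions.items():
        projected = [op for t, op in schedule if t == name]
        if projected != ops:
            return False
    return all(t in transactions for t, _ in schedule)


def is_serial(schedule):
    """True if no transaction resumes after another one has started."""
    seen, current = set(), None
    for t, _ in schedule:
        if t != current:
            if t in seen:
                return False
            seen.add(t)
            current = t
    return True


txns = {"T1": ["R(A)", "W(A)"], "T2": ["R(A)", "W(A)"]}
serial = [("T1", "R(A)"), ("T1", "W(A)"), ("T2", "R(A)"), ("T2", "W(A)")]
interleaved = [("T1", "R(A)"), ("T2", "R(A)"), ("T1", "W(A)"), ("T2", "W(A)")]
reordered = [("T1", "W(A)"), ("T1", "R(A)"), ("T2", "R(A)"), ("T2", "W(A)")]
```

Here `serial` and `interleaved` are both valid schedules, but only the first is serial; `reordered` is not a schedule of these transactions at all, because it changes the order of T1's own instructions.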
Schedule 1
Let:
T1 transfer $50 from A to B, and
T2 transfer 10% of the balance from A to B.
A serial schedule where T1 is followed by T2.
Schedule 2
A serial schedule where T2 is followed by T1.
Schedule 3
Concurrent (interleaved) schedules, produced by context switching
between transactions, are either:
serializable (conflict serializable or view serializable), or
non-serializable (recoverable or non-recoverable).
A concurrent schedule that does not preserve the value of (A + B)
leaves the database in an inconsistent state.
Serializability
Serializability is a property of a schedule which ensures that
the final state of the database, after executing a set of transactions
concurrently, is the same as the state produced by executing those
transactions serially in some order.
Non-conflicting pairs of operations (from different transactions):
T1      T2
R(A)    R(B)
W(A)    W(B)
R(B)    W(A)
R(A)    W(B)
Conflicting pairs of operations (same data item, at least one write):
T1      T2
R(A)    W(A)
W(A)    R(A)
W(A)    W(A)
Equivalent schedules: schedule S is conflict equivalent to S' because S can be
converted to S' by a series of swaps of adjacent non-conflicting instructions.
S                       S'
T1      T2              T1      T2
R(A)                    R(A)
W(A)                    W(A)
        R(A)            R(B)
        W(A)            W(B)
R(B)                            R(A)
W(B)                            W(A)
        R(B)                    R(B)
        W(B)                    W(B)
Non-conflict-serializable schedule
Here S cannot be converted to a serial schedule S' by swapping
non-conflicting instructions.
[Table: schedules S and S' of T1 and T2, consisting of R(A) and W(A) operations only]
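Precedence Graph
A precedence graph is used to check whether a schedule is conflict serializable:
the schedule is conflict serializable if and only if the graph has no cycle.
Example 1 (transactions T1, T2), operations in schedule order:
R(A), W(A), R(B), W(B), R(B), W(B), R(A), W(A)
[Figure: precedence graph with nodes T1 and T2]
Example 2 (transactions T1, T2, T3), operations in schedule order:
R(A), W(A), W(A), W(A)
[Figure: precedence graph with nodes T1, T2 and T3]
Example 3 (transactions T1, T2, T3), operations in schedule order:
R(A), R(C), W(C), R(B), R(B), W(B), W(A), W(C)
[Figure: precedence graph with nodes T1, T2 and T3]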
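The precedence-graph test can be sketched in a few lines. The schedule representation (a list of (transaction, operation, item) triples), the function names and the two sample schedules are assumptions made for this illustration; they are not the schedules from the slides.

```python
def precedence_graph(schedule):
    """schedule: list of (txn, op, item) with op in {"R", "W"}.
    Returns the edges (Ti, Tj) where an operation of Ti conflicts with,
    and comes before, an operation of Tj."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {n: 0 for n in graph}    # 0 = unvisited, 1 = on stack, 2 = done

    def visit(n):
        state[n] = 1
        for m in graph[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# Hypothetical schedules used only to exercise the functions.
s_ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
        ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
s_bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
```

For `s_ok` the only edge is T1 → T2, so the schedule is conflict serializable. For `s_bad` the graph has both T1 → T2 and T2 → T1, so it is not.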
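View serializability:
A schedule is view serializable if it is view equivalent to a
serial schedule.
Schedules S and S' are view equivalent if the following three conditions are met:
1. For each data item Q, if a transaction reads the initial value of Q
in schedule S, it must also read the initial value of Q in schedule S'.
2. For each data item Q, the transaction (if any) that performs the
final write(Q) operation in schedule S must also perform the final
write(Q) operation in schedule S'.
3. For each data item Q, if a transaction reads in schedule S a value of Q
written by transaction Tj, it must also read in schedule S' the value of Q
written by Tj.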
[Figure: scheduler with lock table, reads and writes, buffers]
The Purpose of Concurrency Control is:
To enforce Isolation (through mutual exclusion) among
conflicting transactions.
To preserve database consistency through consistency
preserving execution of transactions.
To resolve read-write and write-write conflicts.
Example
Property to preserve: A + B + C = 1500
Transaction T1: Transfer        Transaction T2: Transfer
100 from A to B                 100 from A to C
Read (A, t)                     Read (A, s)
t = t - 100                     s = s - 100
Write (A, t)                    Write (A, s)
Read (B, t)                     Read (C, s)
t = t + 100                     s = s + 100
Write (B, t)                    Write (C, s)
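Schedule (last steps and resulting values)
Transaction T1          Transaction T2          A       B       C
                        Read (C, s)
                        s = s + 100
                        Write (C, s)            400     600     600
400 + 600 + 600 = 1600, so the property A + B + C = 1500 is violated.
Alternative schedule (last steps and resulting values)
Transaction T1          Transaction T2          A       B       C
                        Read (C, s)
                        s = s + 100
                        Write (C, s)            300     600     600
300 + 600 + 600 = 1500, so the property is preserved.
So what? The way the operations are interleaved determines whether
consistency is preserved, which is why concurrency control is needed.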
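The two outcomes (a sum of 1600 versus 1500) can be reproduced by replaying the steps of T1 and T2 in different orders. The starting balances of 500 for A, B and C are an assumption, chosen because they satisfy A + B + C = 1500 and lead to the final values shown; the step encoding and the function `run_schedule` are likewise invented for this sketch.

```python
def run_schedule(schedule, db):
    """Execute steps (txn, op, arg) on a copy of `db`. Each transaction has
    one local variable, like t and s in the example."""
    db, local = dict(db), {}
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = db[arg]
        elif op == "add":
            local[txn] += arg
        elif op == "write":
            db[arg] = local[txn]
    return db


T1 = [("T1", "read", "A"), ("T1", "add", -100), ("T1", "write", "A"),
      ("T1", "read", "B"), ("T1", "add", 100), ("T1", "write", "B")]
T2 = [("T2", "read", "A"), ("T2", "add", -100), ("T2", "write", "A"),
      ("T2", "read", "C"), ("T2", "add", 100), ("T2", "write", "C")]

initial = {"A": 500, "B": 500, "C": 500}    # assumed starting balances

# Serial execution: T1 completely before T2.
serial = run_schedule(T1 + T2, initial)

# Interleaving in which both transactions read A before either writes it,
# so one of the two debits of A is lost.
interleaved = run_schedule(T1[:2] + T2[:2] + T1[2:] + T2[2:], initial)
```

In the interleaved run both transactions compute 400 from the same old value of A, so A is debited only once while B and C are both credited.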
Concurrency Control Techniques
Concurrency control techniques are used to ensure the
noninterference or isolation property of concurrently
executing transactions.
Basic concurrency control techniques are:
Locking
Timestamping
Multiversion
Optimistic methods(Validation or Certification)
Lock Granularity
Locking
A lock is a variable associated with a data item that describes the status of
the data item with respect to the possible operations that can be applied to it.
Lock-based protocols such as two-phase locking ensure conflict serializability,
and thereby consistency.
Two lock modes:
• (a) shared (read) (b) exclusive (write).
• Shared mode: shared (read) lock on X
– More than one transaction can hold a shared lock on X
for reading its value, but no write lock can be applied
on X by any other transaction.
– Permits: read only
• Exclusive mode: write lock on X
– Only one write lock on X can exist at any time, and no
shared lock can be applied by any other transaction
on X.
– Permits: read and write
Lock compatibility matrix (Y = compatible, N = conflict):
        Read    Write
Read    Y       N
Write   N       N
Locking - Basic Rules
• Locking has two operations: lock_item(X) and unlock_item(X).
• A transaction requests access to an item X by first issuing a lock_item(X)
operation.
• If lock(X) = 1, the transaction is forced to wait.
• If lock(X) = 0, it is set to 1 and the transaction is allowed to access X.
• When a transaction has finished operating on X, it issues an unlock_item(X)
operation, which sets lock(X) to 0 so that X may be accessed by another
transaction.
• If a transaction has a shared lock on an item, it can read but not update the item.
• If a transaction has an exclusive lock on an item, it can both read and update the item.
• Reads cannot conflict, so more than one transaction can hold shared locks
simultaneously on the same item.
• An exclusive lock gives a transaction exclusive access to that item.
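The shared/exclusive rules above can be sketched as a small lock table. This is a simplified illustration: the class `LockTable` and its methods are hypothetical, and a refused request simply returns False where a real lock manager would place the transaction in a waiting queue.

```python
class LockTable:
    """Toy shared/exclusive lock table (hypothetical). Requests are granted
    or refused immediately instead of queueing the transaction."""

    def __init__(self):
        self.locks = {}    # item -> (mode, set of holders); mode is "S" or "X"

    def lock(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:                      # item is unlocked
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":  # shared locks are compatible
            holders.add(txn)
            return True
        if holders == {txn}:                  # sole holder may re-lock/upgrade
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        return False                          # conflict: caller must wait

    def unlock(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]


table = LockTable()
results = [
    table.lock("T1", "A", "S"),    # granted
    table.lock("T2", "A", "S"),    # granted: two readers may share A
    table.lock("T3", "A", "X"),    # refused: A is share-locked by others
]
table.unlock("T1", "A")
table.unlock("T2", "A")
results.append(table.lock("T3", "A", "X"))   # granted: A is free now
results.append(table.lock("T1", "A", "S"))   # refused: A is write-locked
```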
1. Lock-based Protocol
[Figure: transactions T1 and T2 both accessing data item A]
Example schedules S1 and S2 (operations in schedule order):
S1: Lock-X(A), R(A), W(A), Lock-X(B), R(B), W(B)
S2: Lock-S(A), R(A), Unlock(A), R(A), W(A)
Limitation of Lock based Protocol
2. Starvation:
[Table: T1, T2, T3 and T4 each request a lock on A (Lock-X(A) or Lock-S(A)) and then read or write A]
T2 must wait until all the other transactions have unlocked A, so T2 is starved.
Dealing with Deadlock and Starvation
Deadlock
A state that may result when two or more transactions are each waiting for
locks held by the other to be released.
Example:
T1                      T2
read_lock (Y);
read_item (Y);
                        read_lock (X);
                        read_item (X);
write_lock (X);
                        write_lock (Y);
T1 is in the waiting queue for X, which is locked by T2.
T2 is in the waiting queue for Y, which is locked by T1.
Neither transaction can continue until the other completes.
T1 and T2 both follow the two-phase policy, but they are deadlocked.
So the DBMS must either prevent or detect and resolve such deadlock situations.
Two-Phase Locking Techniques: The algorithm
Every transaction can be divided into two phases: Locking (Growing) and Unlocking
(Shrinking).
Locking (Growing) Phase:
A transaction applies locks (read or write) on desired data items one at a
time.
It may acquire locks but cannot release any lock.
Unlocking (Shrinking) Phase:
A transaction unlocks its locked data items one at a time.
It may release locks but cannot acquire any new lock.
Requirement:
For a transaction, these two phases must be mutually exclusive: during the
locking phase the unlocking phase must not start, and during the unlocking phase
the locking phase must not begin.
[Figure: number of locks held by Ti during the growing and shrinking phases]
A transaction follows the 2PL protocol if all locking operations precede the first unlock
operation in the transaction.
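The 2PL condition (all locking operations precede the first unlock) is easy to check for a single transaction. In this sketch the operations are plain strings such as "Lock-S(A)"; the function name `follows_2pl`, the string format and the two sample sequences are assumptions made for the example. An upgrade is treated as a lock request, since it acquires a stronger lock.

```python
def follows_2pl(ops):
    """ops: the operations of ONE transaction, in order, e.g. "Lock-S(A)".
    True if no lock or upgrade request comes after the first unlock."""
    shrinking = False
    for op in ops:
        kind = op.split("(")[0].lower()
        if kind.startswith("unlock"):
            shrinking = True                  # shrinking phase has begun
        elif kind.startswith(("lock", "upgrade")) and shrinking:
            return False                      # lock requested after an unlock
    return True


# Hypothetical transactions for illustration.
two_phase = ["Lock-S(A)", "R(A)", "Lock-X(B)", "R(B)", "W(B)",
             "Unlock(A)", "Unlock(B)"]
not_two_phase = ["Lock-S(A)", "R(A)", "Unlock(A)",
                 "Lock-S(B)", "R(B)", "Unlock(B)"]
```

The second sequence releases A and only then locks B, so it has a lock request inside its shrinking phase and violates the protocol.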
Two-Phase Locking Techniques
Example
T1              T2
Lock-S(A)
R(A)
Unlock(A)
Lock-S(B)
R(B)
Unlock(B)
                Lock-X(B)
                R(B)
                W(B)
                Unlock(B)
This is not two-phase locking, because T1 asks for a lock after
an unlock. In the two-phase protocol, no lock is requested after
locks have started to be released (shrinking phase).
Example
T1              T2
Lock-S(A)                       Growing phase (T1)
R(A)
Lock-X(B)
R(B)
W(B)
Unlock(A)                       Shrinking phase (T1)
Unlock(B)
                Lock-S(B)       Growing phase (T2)
                R(B)
                Unlock(B)       Shrinking phase (T2)
This is two-phase locking.
Refinement of two phase locking protocol
Locking Conversion:
Upgrade Lock (S to X): allowed only in the growing phase
Downgrade Lock (X to S): allowed only in the shrinking phase
[Tables: two example schedules of T1 and T2 using Lock-S, Upgrade and Downgrade on items a, a1 and a2]
Types of Two Phase Locking
Simple two-phase locking: the basic protocol described above.
Strict two-phase locking: a transaction doesn't
release any of its write locks until it commits or aborts.
It may release shared-mode locks earlier, but it holds exclusive-mode
locks until the commit operation.
Rigorous two-phase locking: a transaction
doesn't release any of its locks (exclusive or shared)
until it commits or aborts.
Two phase locking protocol
[Tables: example schedules of T1 and T2 labelled simple two-phase, strict two-phase (it unlocks the shared-mode lock before commit) and rigorous two-phase locking]
Deadlock and Starvation …
There are several possible solutions: deadlock prevention, deadlock detection
and resolution, and lock timeouts.
i. Deadlock prevention protocol: two possibilities
Conservative two-phase locking
− A transaction locks all the data items it refers to before it begins
execution.
− This way of locking prevents deadlock, since a transaction never
waits for a data item once it has started.
− Limitation: it restricts concurrency.
Transaction Timestamp (TS(T))
We can prevent deadlocks by giving each transaction a priority
and ensuring that lower-priority transactions are not allowed to
wait for higher-priority transactions.
One way to assign priorities is to give each transaction a
timestamp when it starts up.
A timestamp is a unique identifier given to each transaction based on the time at
which it started, i.e., if T1 starts before T2, then TS(T1) < TS(T2).
The lower the timestamp, the higher the transaction's priority; that
is, the oldest transaction has the highest priority.
ii. Deadlock detection and resolution
In this approach, deadlocks are allowed to happen.
The scheduler maintains a wait-for graph for detecting cycles.
When a chain such as "Ti waits for Tj, Tj waits for Tk, Tk waits for Ti or Tj"
occurs, it creates a cycle.
When the system is in a state of deadlock, some of the
transactions should be selected as victims, aborted and rolled back.
Victims can be chosen among the transactions that have done the
least work, hold the fewest locks, or have been aborted the fewest
times, and so on.
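Cycle detection on a wait-for graph, followed by victim selection, might look like the sketch below. The graph representation (a dictionary from each transaction to the set of transactions it waits for), the function names and the "fewest locks held" victim rule are assumptions chosen for the example; a real scheduler could use any of the criteria listed above.

```python
def find_cycle(wait_for):
    """wait_for: dict txn -> set of txns it is waiting for.
    Returns one cycle as a list of transactions, or None if there is none."""
    visiting, done = [], set()

    def visit(t):
        if t in visiting:                       # back edge: cycle found
            return visiting[visiting.index(t):]
        if t in done:
            return None
        visiting.append(t)
        for u in wait_for.get(t, ()):
            cycle = visit(u)
            if cycle:
                return cycle
        visiting.pop()
        done.add(t)
        return None

    for t in wait_for:
        cycle = visit(t)
        if cycle:
            return cycle
    return None


def choose_victim(cycle, locks_held):
    """Pick the transaction in the cycle that holds the fewest locks."""
    return min(cycle, key=lambda t: locks_held[t])


# T1 waits for T2 and T2 waits for T1 (deadlock); T3 only waits for T1.
deadlocked = {"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}
no_deadlock = {"T1": {"T2"}, "T2": set()}

cycle = find_cycle(deadlocked)
victim = choose_victim(cycle, {"T1": 3, "T2": 1, "T3": 2})
```

T3 is blocked but is not part of the cycle, so it is not a candidate victim; aborting T2 releases its locks and lets T1, and then T3, proceed.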
iii. Timeouts
This approach uses the period of time that a transaction has been
waiting to lock an item.
It is simple and has low overhead.
If a transaction waits longer than the predefined timeout
period, the system assumes that it may be deadlocked and
aborts it.
Starvation
Starvation occurs when a particular transaction consistently waits
or is restarted and never gets a chance to proceed, while other
transactions continue normally.
This may occur if:
The waiting scheme for item locking gives priority to some transactions over others.
The victim selection algorithm is unfair: the same
transaction may consistently be selected as victim and rolled back.
Solution
Use a first-come, first-served (FIFO) queue for waiting transactions.
Allow transactions that have waited longer to proceed first.
Give higher priority to transactions that have been aborted
many times.
Timestamp based concurrency control algorithm
Timestamp
In lock-based concurrency control, conflicting actions of different
transactions are ordered by the order in which locks are obtained.
Here, in contrast, timestamp values are assigned based on the time at which the
transactions are submitted to the system, using the current date and time of
the system.
A timestamp is a monotonically increasing variable (integer) indicating the age of an
operation or a transaction.
A larger timestamp value indicates a more recent event or operation.
A timestamp-based algorithm uses timestamps to serialize the execution of
concurrent transactions.
It doesn't use locks, thus deadlock cannot occur.
In timestamp ordering, conflicting operations in the schedule must not
violate the serializable ordering.
This is achieved by associating timestamp values (TS) with each
database item, denoted as follows:
a) Read_TS(X): the read timestamp of X, which is the largest among all
the timestamps of transactions that have successfully read item X.
b) Write_TS(X): the write timestamp of X, which is the largest of all the
timestamps of transactions that have successfully written item X.
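Read_TS(X) and Write_TS(X) are typically used as in the sketch below, which follows the basic timestamp ordering rules as they are usually presented in textbooks. The slides do not spell these rules out, so treat the checks, the class `Item`, the exception `Abort` and the functions `to_read` and `to_write` as assumptions made for illustration.

```python
class Item:
    """A data item with its value and its read/write timestamps."""

    def __init__(self, value):
        self.value, self.read_ts, self.write_ts = value, 0, 0


class Abort(Exception):
    """The operation arrived too late; its transaction must be restarted."""


def to_read(item, ts):
    if ts < item.write_ts:              # a younger transaction already wrote X
        raise Abort(f"read by TS={ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def to_write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:
        # a younger transaction has already read or written X
        raise Abort(f"write by TS={ts} rejected")
    item.value, item.write_ts = value, ts


x = Item(10)
outcomes = []
outcomes.append(to_read(x, ts=2))       # allowed, Read_TS(X) becomes 2
try:
    to_write(x, ts=1, value=99)         # TS 1 < Read_TS(X) = 2
except Abort:
    outcomes.append("abort")
to_write(x, ts=3, value=20)             # allowed, Write_TS(X) becomes 3
try:
    to_read(x, ts=2)                    # TS 2 < Write_TS(X) = 3
except Abort:
    outcomes.append("abort")
```

No transaction ever waits in this scheme; an operation that would violate the timestamp order is rejected instead, which is why deadlock cannot occur.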
Multiversion Concurrency Control Techniques
This approach maintains a number of versions of a data item
and allocates the right version to a read operation of a
transaction.
Thus, unlike in other mechanisms, a read operation in this
mechanism is never rejected.
This algorithm uses the concept of view serializability rather than
conflict serializability.
Side effect:
Significantly more storage (RAM and disk) is required to maintain
multiple versions. To keep the number of versions from growing without limit, garbage
collection is run when some criterion is satisfied.
Validation (Optimistic) Concurrency Control Schemes
This technique allows transactions to proceed without synchronization; only at
commit time is serializability checked, and
transactions are aborted in case of non-serializable schedules.
It works well if there is little interference among transactions.
It has three phases: read, validation, and write.
i. Read phase:
A transaction can read values of committed data items. However,
updates are applied only to local copies (versions) of the data items (in
the database cache).
ii. Validation phase:
− When transaction Ti decides that it wants to commit, the DBMS checks whether
the transaction could possibly have conflicted with any other concurrently
executing transaction.
− While one transaction, Ti, is being validated, no other transaction is
allowed to commit.
− This phase checks for Ti that, for each transaction Tj that is either
committed or in its validation phase, one of the following conditions
holds:
1. Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the
read set of Ti has no item in common with the write set of Tj.
3. Both the read_set and write_set of Ti have no items in common with
the write_set of Tj, and Tj completes its read phase before Ti
completes its read phase.
− When validating Ti, the first condition is checked first for each
transaction Tj, since (1) is the simplest condition to check. If (1) is false
then (2) is checked, and if (2) is false then (3) is checked.
− If none of these conditions holds, the validation fails and Ti is aborted.
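The three validation conditions can be expressed directly in code. In this sketch each transaction records the logical times at which its phases start and end; the `Txn` dataclass, its field names and the use of integers as clock values are assumptions made for the example, and `write_start` of the transaction being validated stands for the time at which it would begin writing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    read_start: int
    read_end: int                        # end of read phase
    write_start: Optional[int] = None
    write_end: Optional[int] = None      # None while writing is unfinished
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validates_against(ti, tj):
    """True if at least one of conditions (1)-(3) holds for Ti w.r.t. Tj."""
    # (1) Tj finished writing before Ti started reading.
    if tj.write_end is not None and tj.write_end < ti.read_start:
        return True
    # (2) Ti writes after Tj has finished writing, and Ti read nothing
    #     that Tj wrote.
    if (tj.write_end is not None and ti.write_start is not None
            and tj.write_end < ti.write_start
            and not (ti.read_set & tj.write_set)):
        return True
    # (3) Ti neither read nor writes anything Tj writes, and Tj finished
    #     its read phase before Ti finished its own.
    return (not ((ti.read_set | ti.write_set) & tj.write_set)
            and tj.read_end < ti.read_end)


def validate(ti, others):
    return all(validates_against(ti, tj) for tj in others)


tj = Txn(read_start=1, read_end=2, write_start=3, write_end=4, write_set={"A"})
after_tj = Txn(read_start=5, read_end=6, write_start=7, read_set={"A"})
disjoint = Txn(read_start=3, read_end=6, write_start=7,
               read_set={"B"}, write_set={"A"})
overlapping = Txn(read_start=3, read_end=6, write_start=7, read_set={"A"})
```

`after_tj` passes by condition (1) and `disjoint` by condition (2). `overlapping` read item A while Tj was still writing it, so none of the conditions holds and it would be aborted.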
Multiple Granularity Locking
The size of a lockable unit of data defines its granularity.
Granularity can be coarse (the entire database) or fine (an attribute of a
relation).
Examples of data item granularity:
A field of a database record
A database record
A disk block/page
An entire file
The entire database
Data item granularity significantly affects concurrency control performance:
the degree of concurrency is low for coarse granularity and high for fine
granularity.
Example:
A transaction that expects to access most of the pages in a file should
probably set a lock on the entire file, rather than locking individual pages or
records.
If a transaction needs to access relatively few pages of the file, it is
better to lock only those pages.
Similarly, if a transaction accesses several records on a page, it should lock
the entire page, and if it accesses just a few records, it should lock only those
records.
This example holds true if a lock on a node locks that node and implicitly