03_Concurrency

Concurrency Control

[Recap figure: DBMS architecture, where the buffer manager (fix/unfix, read/write) supports atomicity and durability via the DB + log; timeline of transactions with begin/commit events and a read-modify-write of D (r(D, y); y = y + 6; w(y, D); commit), taking D from 100 to 106 or 109.]
Concurrency Control
• Concurrency is fundamental
• Tens or hundreds of transactions per second cannot be
executed serially
• Examples: banks, ticket reservations
Concurrent Executions

[Figure: timeline t of a SERIAL execution: b(T1) … e(T1) followed by b(T2) … e(T2).]
Execution with Lost Update

T1: UPDATE account SET balance = balance + 3
T2: UPDATE account SET balance = balance + 6

With D = 100, a serial execution ends with D = 109; if both transactions read D = 100 before either writes, the last write wins and one update is lost.

Note: this anomaly does not depend merely on T2 overwriting the value produced by T1
• w1(x), w2(x) is ok (serial)
• r1(x), w1(x), w2(x) is ok too (serial)
• r1(x), r2(x), w1(x), w2(x) is not ok: inconsistent updates from the same initial value

Sequences of I/O actions producing the error:
• r1 r2 w1 w2, or
• r1 r2 w2 w1
Nonrepeatable Read

T1: UPDATE account SET balance = balance + 3 WHERE client = 'Smith'
T2: UPDATE account SET balance = balance + 6 WHERE client = 'Smith'

1 T1: r(D, x)
2 T2: r(D, y)
3 T2: y = y + 6
4 T2: w(y, D)   D = 106
5 T1: r(D, z)   z <> x !
Phantom Update and Phantom Insert

• Example constraint: A + B + C = 100, with A = 50, B = 30, C = 20

Principles of Concurrency Control

• A serial schedule: S: r0(x) r0(y) w0(x) r1(y) r1(x) w1(y) r2(x) r2(y) r2(z) w2(z)   (T0, then T1, then T2)
• What about non-serial schedules?
• S: r0(x) r1(y) r0(y) w0(x) r1(x) r2(x) w1(y) r2(y) r2(z) w2(z)
[Venn diagram: Serial Schedules ("good but unrealistic") ⊂ Serializable Schedules ("good") ⊂ All Schedules.]

S10: w0(x) r1(x) w0(z) r1(z) r2(x) w0(y) r3(z) w3(z) w2(y) w1(x) w3(y)
S12: w0(x) w0(z) w0(y) r2(x) w2(y) r1(x) r1(z) w1(x) r3(z) w3(z) w3(y)
S13: w0(x) w0(z) w0(y) r2(x) w2(y) r3(z) w3(z) w3(y) r1(x) r1(z) w1(x)

[Venn diagram: Serial Schedules ⊂ VSR Schedules ⊂ All Schedules.]
Conflict-serializability

• Preliminary definition: two operations oi and oj (i ≠ j) are in conflict if they address the same resource and at least one of them is a write
  • read-write conflicts (r-w or w-r)
  • write-write conflicts (w-w)
Conflict-serializability

• Two schedules are conflict-equivalent (Si ≈C Sj) if Si and Sj contain the same operations and in all the conflicting pairs the transactions occur in the same order
• A schedule is conflict-serializable iff it is conflict-equivalent to a serial schedule of the same transactions
• The class of conflict-serializable schedules is named CSR
Relationship between CSR and VSR

• VSR ⊃ CSR: all conflict-serializable schedules are also view-serializable, but the converse is not necessarily true

[Venn diagram: Serial Schedules ⊂ CSR Schedules ⊂ VSR Schedules ⊂ All Schedules.]
Testing conflict-serializability

• Is done with a conflict graph that has:
  • One node for each transaction Ti
  • One arc from Ti to Tj if there exists at least one conflict between an operation oi of Ti and an operation oj of Tj such that oi precedes oj
• Theorem: a schedule is in CSR if and only if its conflict graph is acyclic
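The test above can be sketched in a few lines of Python (an illustrative sketch, not DBMS code): build the conflict arcs, then run Kahn's algorithm; if a topological order covers all transactions the graph is acyclic and the order is a valid serialization.

```python
from collections import defaultdict

def conflict_graph(schedule):
    """Build the conflict graph of a schedule.

    `schedule` is a list of (txn, op, resource) triples, e.g. (0, 'w', 'x')
    for w0(x). An arc (Ti, Tj) is added when an operation of Ti precedes a
    conflicting operation of Tj (same resource, at least one write, Ti != Tj).
    """
    arcs = set()
    for k, (ti, oi, ri) in enumerate(schedule):
        for tj, oj, rj in schedule[k + 1:]:
            if ti != tj and ri == rj and 'w' in (oi, oj):
                arcs.add((ti, tj))
    return arcs

def serialization_order(schedule):
    """Return a topological order of the transactions if the conflict graph
    is acyclic (schedule in CSR), or None otherwise (Kahn's algorithm)."""
    arcs = conflict_graph(schedule)
    txns = {t for t, _, _ in schedule}
    indeg = {t: 0 for t in txns}
    succ = defaultdict(set)
    for i, j in arcs:
        succ[i].add(j)
        indeg[j] += 1
    ready = sorted(t for t in txns if indeg[t] == 0)
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for j in sorted(succ[t]):
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    return order if len(order) == len(txns) else None
```

On S10 from the next slide this returns the order T0, T2, T1, T3; on a cyclic schedule such as r1(x) w2(x) w1(x) it returns None.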
Testing conflict-serializability

S10: w0(x) r1(x) w0(z) r1(z) r2(x) w0(y) r3(z) w3(z) w2(y) w1(x) w3(y)

• Resource-based projections:
  • x: w0 r1 r2 w1
  • y: w0 w2 w3
  • z: w0 r1 r3 w3

[Conflict graph: nodes T0, T1, T2, T3; arcs T0→T1, T0→T2, T0→T3, T2→T1, T2→T3, T1→T3. The graph is acyclic.]
CSR implies acyclicity of the CG
• Consider a schedule S in CSR. As such, it is ≈C to a
serial schedule
• W.l.o.g. we can (re)label the transactions of S to say
that their order in the serial schedule is: T1 T2 … Tn
• Since the serial schedule has all conflicting pairs in
the same order as schedule S, in the conflict graph
there can only be arcs (i,j), with i<j
• Then the graph is acyclic, as a cycle requires at least
an arc (i,j) with i>j
Acyclicity of the CG implies CSR

• If S's graph is acyclic then it induces a topological (partial) ordering on its nodes, i.e., an ordering such that the graph only contains arcs (i,j) with i<j. The same partial order exists on the transactions of S
• Any serial schedule whose transactions are ordered according to the partial order is conflict-equivalent to S, because for all conflicting pairs (i,j) it is always i<j
• In the example before: T0 < T2 < T1 < T3
• In general, there can be many compatible serial schedules (i.e., many serializations for the same acyclic graph)
  • As many as the total orders compatible with the partial topological order
Let's go back... (2)

r1(x) w2(x) w1(x) w3(x)

[Conflict graph: arc T1→T2 (r1(x) precedes w2(x)) and arc T2→T1 (w2(x) precedes w1(x)) form a cycle, so the schedule is not in CSR.]
Back to: A More Complex Example

S10: w0(x) r1(x) w0(z) r1(z) r2(x) w0(y) r3(z) w3(z) w2(y) w1(x) w3(y)

S10 is conflict-equivalent to the serial schedule T0 T2 T1 T3, i.e., to:

S12: w0(x) w0(z) w0(y) r2(x) w2(y) r1(x) r1(z) w1(x) r3(z) w3(z) w3(y)
Concurrency Control in Practice
• CSR checking would be efficient if we knew the graph from
the beginning — but we don’t
• A scheduler must rather work “online”, i.e., decide for each
requested operation whether to execute it immediately or
to reject/delay it
• It is not feasible to maintain the conflict graph, update it,
and check its acyclicity at each operation request
• The assumption that concurrency control can work only with the commit-projection of the schedule is unrealistic: aborts do occur
• Some simple online "decision criterion" is required for the scheduler, which must
  • avoid as many anomalies as possible
  • have negligible overhead
Arrival sequences vs a posteriori schedules
• So far the notation
• r1(x) w2(x) w1(x) w3(x)
• represented a “schedule”, which is an a posteriori view of the execution
of concurrent transactions in the DBMS (also called “history” in some
books)
• A schedule represents “what has happened”, “which operations have
been executed by which transaction in which order”
• They can be further restricted by the commit-projection hypothesis to
operations executed by committed transactions
• When dealing with “online” concurrency control, it is important also to
consider “arrival sequences”, i.e., sequences of operation requests
emitted in order by transactions
• With an abuse of notation, we will denote an arrival sequence in the same way as an a posteriori schedule
• r1(x) w2(x) w1(x) w3(x)
• The distinction will be clear from the context
Concurrency control approaches
• How can concurrency control be implemented
“online”?
• Two main families of techniques:
• Pessimistic
• Based on locks, i.e., resource access control
• If a resource is taken, make the requester wait or pre-empt the
holder
• Optimistic
• Based on timestamps and versions
• Serve as many requests as possible, possibly using out-of-date
versions of the data
• We will compare the two families after introducing
their features
• Commercial systems take the best of both worlds
Locking

• It's the most common method in commercial systems
• A transaction is well-formed w.r.t. locking if
  • read operations are preceded by r_lock (aka SHARED LOCK) and followed by unlock: r_lock1(x) r1(x) unlock1(x)
  • write operations are preceded by w_lock (aka EXCLUSIVE LOCK) and followed by unlock: w_lock1(x) w1(x) unlock1(x)
• Note: unlocking can be delayed w.r.t. the end of the read/write operation
• Transactions that first read and then write an object (r1(x) … w1(x)) may acquire a w_lock immediately, or upgrade their r_lock to a w_lock

Lock conflict table (n: counter of the concurrent readers, incremented at each r_lock and decremented at each unlock):

Request | FREE          | R_LOCKED                              | W_LOCKED
r_lock  | OK → R_LOCKED | OK (n++) → R_LOCKED                   | NO → W_LOCKED
w_lock  | OK → W_LOCKED | NO → R_LOCKED                         | NO → W_LOCKED
unlock  | ERROR         | OK (n--) → FREE if n = 0, else R_LOCKED | OK → FREE
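The table can be turned into a minimal lock-table sketch (Python, illustrative: no per-transaction bookkeeping and no upgrade handling, which real lock managers do have):

```python
class LockTable:
    """Minimal sketch of the lock conflict table: each resource is FREE
    (absent from the dict), R_LOCKED with a reader counter n, or W_LOCKED."""

    def __init__(self):
        self.state = {}   # resource -> ['R', n] or ['W']

    def r_lock(self, res):
        s = self.state.get(res)
        if s is None:                 # FREE -> R_LOCKED, n = 1
            self.state[res] = ['R', 1]
            return True
        if s[0] == 'R':               # shared: another reader joins (n++)
            s[1] += 1
            return True
        return False                  # W_LOCKED: request denied

    def w_lock(self, res):
        if res not in self.state:     # only granted on a FREE resource
            self.state[res] = ['W']
            return True
        return False

    def unlock(self, res):
        s = self.state[res]           # unlocking a FREE resource is an error
        if s[0] == 'R':
            s[1] -= 1                 # n--; FREE only when the last reader leaves
            if s[1] == 0:
                del self.state[res]
        else:
            del self.state[res]       # W_LOCKED -> FREE
```

Usage mirrors the table: two readers can share x, a writer is denied until both unlock.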
Example

• Arrival sequence: r1(x), w1(x), r2(x), r3(y), w1(y), …

• r1(x): r1-lock(x) request OK, x r-locked, nx = 1
• w1(x): w1-lock(x) request OK (upgrade), x w-locked
• r2(x): r2-lock(x) request NO because x is w-locked, T2 waits for x
• r3(y): r3-lock(y) request OK, y r-locked, ny = 1
• T3 unlock(y): y free, ny = 0
• w1(y): w1-lock(y) request OK, y w-locked
• T1 unlock(x): x free, nx = 0
• T1 unlock(y): y free

[Figure (two-phase locking): number of resources locked by Ti over time t, with a growing phase, a plateau, and a shrinking phase, ending at commit-work/rollback-work.]
Serializability

• Consider a scheduler that
  • Only processes well-formed transactions
  • Grants locks according to the conflict table
  • Checks that all transactions apply the two-phase rule
• The class of generated schedules is called 2PL
• Result: schedules in 2PL are both view- and conflict-serializable (VSR ⊃ CSR ⊃ 2PL)
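The two-phase rule (no lock may be acquired after the first unlock) is easy to check on a single transaction's lock trace; a small sketch (Python, illustrative, event names are an assumption):

```python
def is_two_phase(events):
    """Check the two-phase rule on one transaction's lock events:
    once any lock is released, no further lock may be acquired.

    `events` is a sequence of 'lock' / 'unlock' strings in request order."""
    released = False
    for e in events:
        if e == 'unlock':
            released = True               # shrinking phase has started
        elif released:
            return False                  # a lock after a release violates 2PL
    return True
```

For instance, lock-lock-unlock-unlock is two-phase, while lock-unlock-lock-unlock is not.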
CSR, VSR and 2PL

[Venn diagram: Serial Schedules ⊂ 2PL ⊂ CSR ⊂ VSR ⊂ All Schedules.]
A visualization of the 2PL test

[Figure: resources on the Y axis and operation times (1 to 6) on the X axis; on x: r1, w1, r2, w2; on y: r3, w1, with each operation tagged by its transaction.]
2PL and other anomalies

• Nonrepeatable read: r1 - r2 - w2 - r1
  • Already shown
• Lost update: r1 - r2 - w2 - w1
  • T1 releases a lock to T2 and then tries to acquire another one, which 2PL forbids
• Phantom update: r1 - r2 - w2 - r1
  • T1 releases a lock to T2 and then tries to acquire another one, which 2PL forbids
• Phantom insert: r1 - w2(new data) - r1
  • T1 releases a lock to T2 and then tries to acquire another one (NOTE: T2 does not necessarily write on data already locked by T1; preventing this requires locks on "future data", aka predicate locks)
• Dirty read: r1 - w1 - r2 - abort1 - w2
  • Requires dealing with abort
Dirty reads are still a menace:
Strict 2PL
• Up to now, we were still using the hypothesis of commit-
projection (no transactions in the schedule abort)
• 2PL, as seen so far, does not protect against dirty (uncommitted
data) reads (and therefore neither do VSR nor CSR)
• Releasing locks before rollbacks exposes “dirty” data
• To remove this hypothesis, we need to add a constraint to
2PL, that defines strict 2PL:
• Locks held by a transaction can be released only after
commit/rollback
• Remember: rollback restores the state prior to the aborted updates
• This version of 2PL is used in most commercial DBMSs when
a high level of isolation is required (see next: SQL isolation
levels)
Strict 2PL in Practice

[Figure: number of resources locked by Ti over time, with a growing phase and a plateau only; all locks are released at commit-work/rollback-work.]

• Strict 2PL locks are also called long duration locks; plain 2PL locks, short duration locks
• Note: real systems may apply 2PL policies differently to read and write locks
  • Typically: long duration strict 2PL write locks, variable policies for read locks
• NOTE: long duration read locks are costly in terms of performance: real systems replace them with more complex mechanisms
How to prevent phantom inserts: predicate locks

• A phantom insert occurs when a transaction adds items to a data set previously read by another transaction
• To prevent phantom inserts, a lock should be placed also on "future data", i.e., inserted data that would satisfy a previous query
• Example: T1: C = AVG(B: A=1); T2: Insert (A=1, B=2). With a lock for T1 on predicate A=1, T2 cannot insert (A=1, B=2)
• Example: suppose that transaction T = update Tab set B=1 where A>1
  • Then, the lock is on predicate A>1
  • Other transactions cannot insert, delete, or update any tuple satisfying this predicate
• In the worst case (predicate locks not supported): the lock extends to the entire table
• In case the implementation supports predicate locks: the lock is managed with the help of indexes (gap lock)
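The check a predicate lock blocks on can be sketched as follows (Python, illustrative: predicates are modeled as plain functions over a tuple, which is an assumption, not how a DBMS stores them):

```python
class PredicateLocks:
    """Sketch: each transaction may register predicate locks; an insert by
    another transaction is denied if the new tuple satisfies any of them."""

    def __init__(self):
        self.locks = []            # list of (txn, predicate) pairs

    def acquire(self, txn, predicate):
        self.locks.append((txn, predicate))

    def can_insert(self, txn, tuple_):
        # the insert is allowed only if no OTHER transaction holds a
        # predicate lock that the new ("future") tuple would satisfy
        return all(t == txn or not p(tuple_) for t, p in self.locks)

# The slide's example: T1 computed AVG(B) over the rows with A = 1
pl = PredicateLocks()
pl.acquire('T1', lambda row: row['A'] == 1)
```

With this lock in place, T2's insert of (A=1, B=2) is denied while an unrelated insert such as (A=3, B=2) goes through.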
Isolation Levels in SQL:1999 (and JDBC)
• SQL defines transaction isolation levels which
specify the anomalies that should be prevented by
running at that level
• The level does not affect write locks. A transaction
should always get exclusive lock on any data it
modifies, and hold it until completion (strict 2PL on
write locks), regardless of the isolation level. For
read operations, levels define the degree of
protection from the effects of modifications made
by other transactions
Why long duration write locks are necessary

• Consider the following schedule (admissible if write locks are short duration) and remove the hypothesis that aborts do not occur
  • w1[x] ... w2[x] ... ((c1 or a1) and (c2 or a2), in any order)
  • T2 is allowed to write over the same object updated by T1, which has not yet completed
• If T1 aborts, e.g., w1[x] ... w2[x] ..., a1, (c2 or a2):
  • How to process event a1?
  • If x is restored to the state before T1, T2's update is lost, so if T2 commits x has a stale value
  • If x is NOT restored and T2 also aborts, then T2's proper before state cannot be reinstalled either!
• Thus: write locks are held until the completion of the transaction to enable the proper processing of abort events
• The anomaly of the above non-commit-projection schedule is named dirty write
Isolation Levels in SQL:1999 (and JDBC)

• READ UNCOMMITTED allows dirty reads, nonrepeatable reads and phantom updates and inserts:
  • No read locks (and ignores locks of other transactions)
• READ COMMITTED prevents dirty reads but allows nonrepeatable reads and phantom updates/inserts:
  • Read locks (and complies with locks of other transactions), but without 2PL on read locks (read locks are released as soon as the read operation is performed and can be acquired again)
• REPEATABLE READ avoids dirty reads, nonrepeatable reads and phantom updates, but allows phantom inserts:
  • Long duration read locks: 2PL also for reads
• SERIALIZABLE avoids all anomalies:
  • 2PL with predicate locks to avoid phantom inserts
• Note that SQL standard isolation levels dictate minimum requirements; real systems may go beyond (e.g., in MySQL and Postgres REPEATABLE READ avoids phantom inserts too) and use different mechanisms (e.g., to avoid long duration read locks)

Level            | Dirty read | Nonrep. read | Phantoms
Read uncommitted | Y          | Y            | Y
Read committed   | N          | Y            | Y
Repeatable read  | N          | N            | Y (insert)
Serializable     | N          | N            | N
SQL92 serializable <> serial !
• Serializable transactions don't necessarily execute
serially
• The requirement is that transactions can only commit if
the result would be as if they had executed serially in
any order
• The locking requirements to meet this guarantee can
frequently lead to a deadlock (see next slides) where
one of the transactions needs to be rolled back
• Therefore, the SERIALIZABLE isolation level is used
sparingly and is NOT the default in most commercial
systems
SQL isolation levels and locks

• SQL isolation levels may be implemented with the appropriate use of locks
• Commercial systems make joint use of locks and of timestamp-based concurrency control mechanisms

Level            | READ LOCKS                                   | WRITE LOCKS
Read uncommitted | Not required                                 | Well-formed writes, long duration write locks
Read committed   | Well-formed reads, short duration read locks | Well-formed writes, long duration write locks
Repeatable read  | Well-formed reads, long duration read locks  | Well-formed writes, long duration write locks
Serializable     | As above, plus predicate locks               | Well-formed writes, long duration write locks
Deadlock

• Occurs because concurrent transactions hold and in turn request resources held by other transactions

[Figure: T2: r2(y) w2(x); T2 holds an SL on y and requests an XL on x, which is held by another transaction waiting on y: a circular wait.]
1) Timeout Method
• A transaction is killed and restarted after a given
amount of waiting (assumed as due to a deadlock)
• The simplest method, widely used in the past
• The timeout value is system-determined (sometimes it
can be altered by the database administrator)
• The problem is choosing a proper timeout value
  • Too long: useless waits whenever deadlocks occur
  • Too short: unrequired kills, redo overhead
• http://davebland.com/how-often-does-sql-server-look-for-deadlocks
2) Deadlock Prevention

• Idea: killing transactions that could cause cycles
• Resource-based prevention: restrictions on lock requests
  • Transactions request all resources at once, and only once
  • Resources are globally sorted and must be requested "in global order"
  • Problem: it's not easy for transactions to anticipate all requests!
• Transaction-based prevention: restrictions based on transactions' IDs
  • Assigning IDs to transactions incrementally: IDs measure the transactions' "age"
  • Preventing "older" transactions from waiting for "younger" ones to end their work
• Options for choosing the transaction to kill
  • Preemptive (killing the holding transaction: wound-wait)
  • Non-preemptive (killing the requesting transaction: wait-die)
• Problem: too many "killings"! (waiting probability >> deadlock probability)
Wait-Die Algorithm

• If RT (the requesting transaction) T1 is older than CT (the transaction holding the resource) T2, then T1 waits; otherwise T1 dies
• This is a non-preemptive algorithm in which RT never forces CT to abort

[Figure: T1 (old) requesting from T2 (young) waits; T2 (young) requesting from T1 (old) dies.]
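Both timestamp-based prevention policies reduce to a one-line decision; a sketch (Python, illustrative, with the convention from the slides that a smaller timestamp means an older transaction):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die (non-preemptive): an older requester waits,
    a younger requester dies."""
    return 'wait' if requester_ts < holder_ts else 'die'

def wound_wait(requester_ts, holder_ts):
    """Wound-wait (preemptive): an older requester wounds (kills) the
    holder, a younger requester waits."""
    return 'wound holder' if requester_ts < holder_ts else 'wait'
```

In both schemes waits only ever go in one direction of the age ordering, so no wait cycle can form.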
Distributed Deadlock Detection

• Distributed dependency graph: external call nodes represent a sub-transaction activating another sub-transaction at a different node

[Figure: node A hosts T1a and T2a, node B hosts T1b and T2b (plus local transactions T3, T4); calls and waits cross the nodes through the external call nodes EA and EB.]
Forwarding rule

[Figure: the same distributed dependency graph across nodes A and B.]

• Node A:
  • Activation/wait sequence: EB → T2 → T1 → EB
  • i = 2, j = 1, so i > j
  • A can dispatch its info to B
• Node B (only distributed transactions count):
  • Activation/wait sequence: EA → T1 → T2 → EA
  • i = 1, j = 2, so i < j
  • B does not dispatch info to A
Obermarck's Algorithm

• Runs periodically at each node
• Consists of 4 steps:
  1. Get graph info (wait dependencies among transactions and external calls) from the "previous" nodes. Sequences contain only node and top-level transaction identifiers
  2. Update the local graph by merging the received information
  3. Check the existence of cycles among transactions denoting potential deadlocks: if found, select one transaction in the cycle and kill it
  4. Send updated graph info to the "next" nodes
• Propagate also killed transactions
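The cycle check of step 3 on the merged wait-for graph can be sketched as a depth-first search (Python, illustrative; edges are (Ti, Tj) pairs meaning "Ti waits for Tj"):

```python
def has_deadlock(edges):
    """Cycle check on a wait-for graph given as a set of (Ti, Tj) pairs."""
    succ = {}
    for i, j in edges:
        succ.setdefault(i, set()).add(j)
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in succ.get(node, ()):
            # a back edge to a node on the current DFS stack closes a cycle
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(n not in visited and dfs(n) for n in succ)

# Step 2 in miniature: node B merges its local edge T1 -> T2 with the
# sequence "T2 waits for T1" received from node A; a cycle appears.
local = {('T1', 'T2')}
received = {('T2', 'T1')}
```

Here `has_deadlock(local | received)` is true, while either edge set alone is acyclic.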
Algorithm execution, step 1: communication

[Figure: node A dispatches its activation/wait sequence EB → T2 → T1 → EB to node B.]

Algorithm execution, step 2: local graph update

• At node B:
  1. The sequence EB → T2 → T1 → EB is received
  2. It is merged into B's local graph, which already contains EA → T1 → T2 → EA: the cycle T1 → T2 → T1 appears

Algorithm execution: deadlock resolution

[Figure: the cycle detected at node B is broken by killing one of the transactions in it and removing its arcs.]
Another example, continued

• Node A: EC → T3, T2 → EB; 3 > 2, so A can send (to B)
• Node B: EA → T2, T1 → EC; 2 > 1, so B can send (to C)
• Node C: EB → T1, T3 → EA; 1 < 3, so C cannot send
• After A's info is merged at node B, and B's updated info is merged at node C, node C's graph closes a loop among T1, T2 and T3: cycle detected!
Obermarck: immateriality of conventions
• There are two arbitrary choices in the algorithm:
• Send messages only if: (1) i > j vs. (2) i < j
• Send them to: (a) the following node vs. (b) the
preceding node
• Therefore, there are four versions/variants of the
algorithm
• (1+a), (1+b), (2+a), (2+b)
• The sequence of the sent messages is different
• However, they all identify deadlocks (if present)
Deadlocks in practice

• Their probability is much less than the conflict probability
• Consider a file with n records and two transactions doing two accesses to their records (uniform distribution); then:
  • Conflict probability is O(1/n): ∑ i=1..n (1/n * 1/n) = n * (1/n * 1/n) = 1/n
  • Deadlock probability is O(1/n^2): T1 conflicts with T2 AND vice versa
• Still, they do occur (once every minute in a mid-size bank)
• The probability is linear in the number of transactions, quadratic in their length (measured by the number of lock requests)
  • Shorter transactions are healthier (ceteris paribus)
• There are techniques to limit the frequency of deadlocks
  • Update lock, hierarchical lock, …
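A quick check of the two estimates above (Python, illustrative arithmetic only: each transaction picks one record uniformly at random):

```python
# Probability that both transactions pick the same record i is (1/n) * (1/n);
# summed over the n records this gives exactly 1/n.
n = 1000
p_conflict = sum((1 / n) * (1 / n) for _ in range(n))   # = n * (1/n)^2 = 1/n

# A deadlock needs both directions of the conflict: T1 waits for T2 AND
# T2 waits for T1, hence the order-of-magnitude estimate (1/n)^2.
p_deadlock = p_conflict ** 2
```

With n = 1000 this gives a conflict probability of about 10^-3 and a deadlock probability of about 10^-6, matching the O(1/n) vs O(1/n^2) gap on the slide.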
Update lock

• The most frequent deadlock occurs when 2 concurrent transactions start by reading the same resource (SL) and then decide to write and try to upgrade their lock to XL
• To avoid this situation, systems offer the UPDATE LOCK (UL), asked by transactions that will read and then write

Resource status:

Request | free | SL | UL | XL
SL      | OK   | OK | OK | No
UL      | OK   | OK | No | No
XL      | OK   | No | No | No

• Update locks are easy to implement and mitigate the most frequent cause of collision: r1(x) r2(x) w1(x) w2(x)
• They are requested by using the SQL SELECT FOR UPDATE statement
Update lock

• r1(x) r2(x) w1(x) w2(x): the locks for r1(x) … w1(x) are SL, then (upgrade) XL
• Sequence of lock requests with SL and XL only: SL1 (granted), SL2 (granted), XL1 (T1 waits), XL2 (T2 waits): deadlock!
• With update locks, UL2 would not be granted while T1 holds UL1 (UL-UL is incompatible), so the deadlock is avoided

# Session 3:
mysql> START TRANSACTION;
mysql> SELECT * FROM t FOR UPDATE SKIP LOCKED;
+---+
| i |
+---+
| 1 |
| 3 |
+---+

• SKIP LOCKED returns only unclaimed (and committed) records: useful when avoiding conflicts is more important than getting all the rows, e.g., queue management (a queue of tasks to consume)
• From: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html
Hierarchical Locking

• Update locks prudentially extend the interval during which a resource is locked
• What to lock? An entire table? Too coarse a granularity reduces concurrency too much
• Locks can be specified with different granularities
  • e.g.: schema, table, fragment, page, tuple, field
  • (finer granularity, from file to page to tuple to value, means increased concurrency)
• Objectives:
  • Locking the minimum amount of data
  • Recognizing conflicts as soon as possible
• Method: asking locks on hierarchical resources by:
  • Requesting resources top-down until the right level is obtained
  • Releasing locks bottom-up
Intention Locking Scheme
• 5 Lock modes:
• In addition to read (SHARED) locks (SL) and write
(EXCLUSIVE) locks (XL)
• The new modes express the “intention” of locking
at lower (finer) levels of granularity
• ISL: Intention of locking a subelement of the current
element in shared mode
• IXL: Intention of locking a subelement of the current
element in exclusive mode
• SIXL: Lock of the element in shared mode with intention
of locking a subelement in exclusive mode (SL+IXL)
Hierarchical Locking Protocol
• Locks are requested starting from the root (e.g., starting
from the whole table) and going down in the hierarchy
• Locks are released starting from the locked resource and
going up in the hierarchy
• To request an SL or ISL lock on a non-root element, a
transaction must hold an equally or more restrictive lock
(ISL or IXL) on its “parent”
• To request an IXL, XL or SIXL lock on a non-root element, a
transaction must hold an equally or more restrictive lock
(SIXL or IXL) on its “parent”
• When a lock is requested on a resource, the lock manager
decides based on the rules specified in the hierarchical lock
granting table
Hierarchical lock granting table

Resource state:

Request | free | ISL | IXL | SL | SIXL | XL
ISL     | OK   | OK  | OK  | OK | OK   | No
IXL     | OK   | OK  | OK  | No | No   | No
SL      | OK   | OK  | No  | OK | No   | No
SIXL    | OK   | OK  | No  | No | No   | No
XL      | OK   | No  | No  | No | No   | No
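The granting table is a plain compatibility matrix; a sketch of the check the lock manager performs (Python, illustrative):

```python
# COMPAT[requested][held] is True when the requested mode is compatible
# with a mode already held on the resource (mirrors the table above).
COMPAT = {
    'ISL':  {'ISL': True,  'IXL': True,  'SL': True,  'SIXL': True,  'XL': False},
    'IXL':  {'ISL': True,  'IXL': True,  'SL': False, 'SIXL': False, 'XL': False},
    'SL':   {'ISL': True,  'IXL': False, 'SL': True,  'SIXL': False, 'XL': False},
    'SIXL': {'ISL': True,  'IXL': False, 'SL': False, 'SIXL': False, 'XL': False},
    'XL':   {'ISL': False, 'IXL': False, 'SL': False, 'SIXL': False, 'XL': False},
}

def can_grant(request, held_locks):
    """A request is granted when it is compatible with every lock currently
    held on the resource; an empty list means the resource is free."""
    return all(COMPAT[request][h] for h in held_locks)
```

For example, IXL is granted on a resource held in ISL, while SL is not granted on a resource held in IXL.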
Example

• Root = TableX; Page 1 (P1) holds tuples t1, t2, t3, t4; Page 2 (P2) holds tuples t5, t6, t7, t8

• Transaction 1: read(P1), write(t3), read(t8)
• Transaction 2: read(t2), read(t4), write(t5), write(t6)

• They are NOT in r-w conflict (independently of the order)!
Locks acquired by Transaction 1 (on root, P1, t3, P2, t8):

• read(P1): ISL1 on root, SL1 on P1
• write(t3): root upgraded to IXL1, P1 upgraded to SIXL1, XL1 on t3
• read(t8): ISL1 on P2, SL1 on t8 (root IXL1, P1 SIXL1, t3 XL1 still held)
Locks acquired by Transaction 2 (on root, P1, t2, t4, P2, t5, t6):

• read(t2): ISL2 on root, ISL2 on P1, SL2 on t2
• read(t4): SL2 on t4
• write(t5): root upgraded to IXL2, IXL2 on P2, XL2 on t5
• write(t6): XL2 on t6
Concurrent execution of T1 and T2:

• read2(t2): ISL2 on root, ISL2 on P1, SL2 on t2
• read1(P1): +ISL1 on root, +SL1 on P1 (both compatible with T2's locks)
• write2(t3): T2 requests +IXL2 on root (granted) and +IXL2 on P1: conflict with SL1! T2 waits!
Concurrency Control Based on Timestamps

• Locking is also named pessimistic concurrency control because it assumes that collisions (transactions reading/writing the same object concurrently) will arise
  • Assumption: conflicts occur, so lock the records to prevent them
• Alternative and complementary to 2PL (and to locking in general) are optimistic concurrency control methods
  • Assumption: conflicts are rare, so run the transaction and validate the operations before commit (normal validation) or before each operation (early validation)
• Timestamp: an identifier that defines a total ordering of the events of a system
• Each transaction has a timestamp representing the time at which the transaction begins, so that transactions can be ordered by "birth date": smaller index, older transaction
• A schedule is accepted only if it reflects the serial ordering of the transactions induced by their timestamps
TS concurrency control principles

• The scheduler has two counters for each object x:
  • RTM(x) = timestamp of the transaction with the highest ts that has read x (e.g., after r2(x), r3(x), r1(x): RTM(x) = 3)
  • WTM(x) = timestamp of the transaction that did the last write on x
TS and CSR

• TS ⇒ CSR
• Let S be a TS schedule of T1 and T2
• Suppose S is not CSR, which implies that it contains a cycle between T1 and T2:
  • S contains op1(x), op2(x) where at least one of the two is a write
  • S also contains op2(y), op1(y) where at least one of the two is a write
• When op1(y) arrives:
  • If op1(y) is a read, T1 is killed by TS because it tries to read a value written by a younger transaction [ts < WTM(y)] [1 < 2]: CONTRADICTION
  • If op1(y) is a write, T1 is killed no matter what op2(y) is, because it tries to write a value already read or written by a younger transaction [ts < R/WTM(y)] [1 < 2]: CONTRADICTION
CSR, VSR, 2PL and TS

[Venn diagram: within all schedules, VSR ⊃ CSR; 2PL and TS are distinct, overlapping subsets of CSR, both containing the serial schedules.]
TS and dirty reads

• Basic TS-based control considers only committed transactions in the schedule; aborted transactions are not considered (commit-projection hypothesis)
• If aborts occur, dirty reads may happen, e.g., w1(x), r2(x), then T1 aborts!
• To cope with dirty reads, a variant of basic TS must be used:
  • A transaction Ti that issues an rts(x) such that ts > WTM(x) (i.e., acceptable) has its read operation delayed until the transaction T' that wrote the value of x has committed or aborted
  • Similar to long duration write locks
  • But… buffering operations introduces delays
2PL vs. TS
• The serialization order with 2PL is imposed by conflicts,
while in TS it is imposed by the timestamps
• In 2PL transactions can be actively waiting. In TS they
are killed and restarted
• The necessity of waiting for commit of transactions
causes long delays in strict 2PL
• 2PL can cause deadlocks, TS can be used to prevent
deadlocks with the wound-wait and wait-die schemes
• Restarting a transaction costs more than waiting: 2PL
wins!
• Commercial systems implement a mix of optimistic and
pessimistic concurrency control (e.g., Strict 2PL or 2PL +
Multi Version TS)
Reducing kill rate: Thomas Rule

• The scheduler has two counters, RTM(x) and WTM(x), for each object
• The scheduler receives read/write requests tagged with timestamps:
• rts(x):
  • If ts < WTM(x) the request is rejected and the transaction is killed
  • Else, access is granted and RTM(x) is set to max(RTM(x), ts)
• wts(x):
  • If ts < RTM(x) the request is rejected and the transaction is killed (e.g., r2(x), w1(x))
  • Else, if ts < WTM(x) then our write is "obsolete" and can be skipped (Thomas rule; e.g., w2(x), w1(x))
  • Else, access is granted and WTM(x) is set to ts
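The rules above, including Thomas' write rule, fit in a few lines (Python, illustrative sketch; timestamps are assumed to be positive integers, so 0 can serve as the initial counter value):

```python
class TSScheduler:
    """Single-version timestamp scheduler with Thomas' write rule.
    Each request returns 'ok', 'skip' (obsolete write) or 'kill'."""

    def __init__(self):
        self.rtm = {}   # RTM(x): highest ts that has read x
        self.wtm = {}   # WTM(x): ts of the last write on x

    def read(self, ts, x):
        if ts < self.wtm.get(x, 0):
            return 'kill'                     # x was written by a younger txn
        self.rtm[x] = max(self.rtm.get(x, 0), ts)
        return 'ok'

    def write(self, ts, x):
        if ts < self.rtm.get(x, 0):
            return 'kill'                     # a younger txn already read x
        if ts < self.wtm.get(x, 0):
            return 'skip'                     # Thomas rule: obsolete write
        self.wtm[x] = ts
        return 'ok'
```

Replaying the example schedule below, w2(y) after w3(y) comes back as 'skip' instead of killing T2.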
TS' (TS with Thomas Rule)

[Venn diagram: within all schedules, VSR ⊃ CSR ⊃ 2PL, TS; TS' also accepts some schedules outside CSR.]

• Example: r1(y) r2(x) w3(y) w2(y) w3(x) w4(y)
  • x: r2 w3
  • y: r1 w3 w2 w4 (not CSR!)
  • w2(y) is an obsolete write and thus skipped
  • The execution reflects the behavior of: r1(y) r2(x) w2(y) w3(y) w3(x) w4(y)
Example in theory: TS-Multi allowing unordered writes

• Mechanism:
  • rts(x) is always accepted. A copy xk is selected for reading such that:
    • If ts >= WTMN(x), then k = N
    • Else take k such that WTMk(x) <= ts < WTMk+1(x)
  • wts(x):
    • If ts < RTM(x) the request is rejected
    • Else a new version is created for timestamp ts (N is incremented)
  • WTM1(x), …, WTMN(x) are the versions' write timestamps, kept sorted from oldest to youngest
• NB: this version shows what can be done in theory but is not the one used in the exercises

Assume RTM(x) = 7, N = 1, WTM1(x) = 4

Request | Response              | RTM(x) | WTM(x)
r6(x)   | ok                    | 7      | WTM1(x) = 4, N = 1
r8(x)   | ok                    | 8      |
r9(x)   | ok                    | 9      |
w8(x)   | no: T8 killed         |        |
w11(x)  | ok                    |        | WTM2(x) = 11, N = 2
r10(x)  | ok on x1 (not killed) | 10     |
r12(x)  | ok on x2              | 12     |
w14(x)  | ok                    |        | WTM3(x) = 14, N = 3
w13(x)  | ok (not killed)       |        | WTM3(x) = 13, WTM4(x) = 14, N = 4 (requires resorting)
2nd version (used in practice): TS-Multi under Snapshot Isolation

• Mechanism:
  • rts(x) is always accepted. A copy xk is selected for reading such that:
    • If ts >= WTMN(x), then k = N
    • Else take k such that WTMk(x) <= ts < WTMk+1(x)
  • wts(x):
    • If ts < RTM(x) or ts < WTMN(x) the request is rejected
    • Else a new version is created for timestamp ts (N is incremented)
  • WTM1(x), …, WTMN(x) are the versions' write timestamps, kept sorted from oldest to youngest
• NB: this version is used in real systems based, e.g., on snapshot isolation (see later) and in the exercises

Example in practice (for the exam): assume RTM(x) = 7, N = 1, WTM1(x) = 4

Request | Response        | RTM(x) | WTM(x)
r6(x)   | ok              | 7      | WTM1(x) = 4, N = 1
r8(x)   | ok              | 8      |
r9(x)   | ok              | 9      |
w8(x)   | no: T8 killed   |        |
w11(x)  | ok              |        | WTM2(x) = 11, N = 2
r10(x)  | ok on x1        | 10     |
r12(x)  | ok on x2        | 12     |
w14(x)  | ok              |        | WTM3(x) = 14, N = 3
w13(x)  | no: T13 killed* |        |

* in the exercises!!!
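The practical (snapshot-isolation) variant for a single object can be sketched as follows (Python, illustrative; version indexes are 1-based as on the slides, and timestamps below WTM1 are assumed not to occur):

```python
class MVScheduler:
    """TS-Multi under snapshot isolation, for one object x: reads never
    fail and pick the version valid at the reader's timestamp; a write is
    rejected if a younger txn already read x or wrote a newer version."""

    def __init__(self, rtm=0, versions=(0,)):
        self.rtm = rtm
        self.wtm = list(versions)   # WTM_1(x) .. WTM_N(x), sorted ascending

    def read(self, ts):
        self.rtm = max(self.rtm, ts)
        # pick k such that WTM_k <= ts < WTM_{k+1}, or the newest version
        for k in range(len(self.wtm) - 1, -1, -1):
            if self.wtm[k] <= ts:
                return k + 1        # 1-based version index
        return 1                    # ts precedes all versions (edge case)

    def write(self, ts):
        if ts < self.rtm or ts < self.wtm[-1]:
            return 'kill'           # younger reader, or newer version exists
        self.wtm.append(ts)         # create version x_{N+1}; N is incremented
        return 'ok'
```

Replaying the table above (RTM = 7, WTM1 = 4) reproduces every row, including w13(x) being killed.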
CSR, VSR, 2PL, TSmono, TSmulti

[Venn diagram: within all schedules, 2PL and TS(mono) ⊂ CSR ⊂ VSR, all containing the serial schedules; TS(multi) extends beyond VSR.]

• Example: x: w1 w2 r1
  • Versions: x0 (original value), x1, x2
  • r1 reads x1 [WTMk(x) <= ts < WTMk+1(x)]
• The schedule is not in VSR:
  • Serial T1, T2 has a different reads-from relation for r1(x): w1 r1 w2
  • Serial T2, T1 has a different final write: w2 w1 r1
Snapshot Isolation (SI)
• The realization of multi-TS gives the opportunity to introduce into
DBMSs (e.g., Oracle, MySQL, PostgreSQL, MongoDB, Microsoft SQL
Server) another isolation level, SNAPSHOT ISOLATION
• In this level, no RTM is used on the objects, only WTMs
• Every transaction reads the version consistent with its timestamp (i.e.,
the version that existed when the transaction started a.k.a. snapshot),
and defers writes to the end
• Write: when a transaction attempts to write e.g., on a row, it first checks
whether any other transactions have modified that row since it began. If
there has been a modification (i.e., if the snapshot view is no longer
valid), the transaction is rolled back or retried, depending on the
DBMS's implementation.
• Read operations in a transaction do not block write operations in other
transactions. Transactions can read data without waiting for other
transactions to complete, improving performance and concurrency.
• It is yet another case of optimistic concurrency control
Anomalies in Snapshot Isolation
• Snapshot isolation does not guarantee serializability
• T1: update Balls set Color=White where Color=Black
• T2: update Balls set Color=Black where Color=White
• Serializable executions of T1 and T2 will produce a final
configuration with balls that are either all white or all
black
• An execution under Snapshot Isolation in which the
two transactions start with the same snapshot will just
swap the two colors
• This anomaly is called write skew
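The write skew on the balls table can be replayed in miniature (Python, illustrative simulation of the snapshot semantics described above):

```python
# Write skew under snapshot isolation: both transactions read the same
# snapshot, write disjoint rows, so neither write invalidates the other.
balls = {1: 'black', 2: 'white'}

snapshot1 = dict(balls)   # T1's snapshot at its start
snapshot2 = dict(balls)   # T2's snapshot at its start

# T1: update Balls set Color=White where Color=Black (on its snapshot)
t1_writes = {k: 'white' for k, c in snapshot1.items() if c == 'black'}
# T2: update Balls set Color=Black where Color=White (on its snapshot)
t2_writes = {k: 'black' for k, c in snapshot2.items() if c == 'white'}

# The write sets touch different rows, so the SI write check lets both commit
assert not (t1_writes.keys() & t2_writes.keys())
balls.update(t1_writes)
balls.update(t2_writes)

# The colors are merely swapped, an outcome no serial execution can produce
assert balls == {1: 'white', 2: 'black'}
```

Any serial execution would instead leave all balls one color, since the second transaction sees the first one's updates.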
Assigning timestamps in
distributed systems
• Timestamp: an indicator of the “current time”
• Assumption: no “global time” is available
• Mechanism: a system’s function gives out timestamps on requests
• Syntax: timestamp = event-id.node-id
• event-ids are unique at each node
• Note that the notion of time is “lexical”:
timestamp 5.1 “occurs before” timestamp 5.2
• Synchronization: send-receive of messages
• for a given message m, send(m) precedes receive(m)
• Algorithm (Lamport method): a node cannot receive a message from "the future"; if this happens, the "bumping rule" is used to bump the timestamp of the receive event beyond the timestamp of the send event
• Mnemonically: if I receive a message from you that has a timestamp greater than my last emitted one, I update my current timestamp to exceed yours
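A minimal sketch of the mechanism (Python, illustrative; timestamps are modeled as (event-id, node-id) tuples so that the "lexical" order of the slides is the plain tuple order):

```python
class LamportClock:
    """Lamport timestamps event-id.node-id with the bumping rule on receive."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        # local event or send: just advance the local event counter
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, sender_ts):
        # bumping rule: the receive event must be timestamped after the send
        self.counter = max(self.counter, sender_ts[0]) + 1
        return (self.counter, self.node_id)
```

For example, if node 2 (still at event 0) receives a message sent by node 1 with timestamp (4, 1), the receive event is bumped to (5, 2), which lexically follows the send.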
Example of timestamp assignment

[Figure: events A(1.1) B(2.1) C(3.1) D(4.1) E(5.1) F(6.1) G(8.1) H(9.1) I(10.1) on node 1, exchanging messages with events on node 2; the jump from F(6.1) to G(8.1) shows the bumping rule applied on a receive.]
• MySQL: https://dev.mysql.com/doc/refman/8.0/en/locking-issues.html
• IBM DB2: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0054923.html
• MongoDB: https://docs.mongodb.com/manual/faq/concurrency/