Concurrency Control in Distributed Database Systems
Concurrency control techniques allow multiple transactions to execute simultaneously while
preserving the ACID properties of the transactions and the serializability of their schedules.
Locking-based concurrency control protocols use the concept of locking data items. A lock is a variable
associated with a data item that determines whether read/write operations can be performed on that
data item. Generally, a lock compatibility matrix is used which states whether a data item can be locked
by two transactions at the same time.
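As an illustration, a lock compatibility matrix can be sketched as a small lookup table. The shared (S) and exclusive (X) lock modes and the helper name can_grant are assumptions made for this example; the text above does not name specific lock modes.

```python
# Hypothetical lock modes: "S" (shared, for reads) and "X" (exclusive, for writes).
# COMPATIBLE[(held, requested)] says whether both locks may exist on one data item.
COMPATIBLE = {
    ("S", "S"): True,   # two readers may share a data item
    ("S", "X"): False,  # a writer must wait for readers
    ("X", "S"): False,  # a reader must wait for the writer
    ("X", "X"): False,  # two writers never share a data item
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For example, can_grant("S", ["S"]) is True, while can_grant("X", ["S"]) is False.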
Locking-based concurrency control systems can use either one-phase or two-phase locking protocols.
In one-phase locking, each transaction locks an item before use and releases the lock as soon as it has
finished using it. This method provides maximum concurrency but does not always enforce
serializability.
In two-phase locking, all locking operations precede the first lock-release or unlock operation. The
transaction comprises two phases. In the first phase, the transaction only acquires the locks it needs and
does not release any lock. This is called the expanding or growing phase. In the second phase, the
transaction releases its locks and cannot request any new ones. This is called the shrinking phase.
Every schedule in which all transactions follow the two-phase locking protocol is guaranteed to be
serializable. However, this approach provides low parallelism between two conflicting transactions.
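The two phases can be sketched as a small class that refuses any lock request once the first unlock has happened. The class and method names are hypothetical, and the sketch only tracks which items one transaction holds; it does not model conflicts with other transactions.

```python
class TwoPhaseTransaction:
    """Minimal sketch: enforces that no lock is acquired after the first unlock."""

    def __init__(self, name):
        self.name = name
        self.held = set()        # data items currently locked by this transaction
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item!r} in the shrinking phase")
        self.held.add(item)      # growing phase: acquiring is allowed

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True    # the growing phase is over for good
```

A transaction may call lock("x"), lock("y"), unlock("x"), but a later lock("z") raises an error, because it would violate the two-phase rule.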
Timestamp-based concurrency control algorithms ensure that transactions commit in the order dictated by
their timestamps. An older transaction should commit before a younger transaction, since the older
transaction enters the system before the younger one.
Timestamp-based concurrency control techniques generate serializable schedules such that the
equivalent serial schedule is arranged in order of the age of the participating transactions.
Access Rule − When two transactions try to access the same data item simultaneously, for conflicting
operations, priority is given to the older transaction. This causes the younger transaction to wait for the
older transaction to commit first.
Late Transaction Rule − If a younger transaction has written a data item, then an older transaction is not
allowed to read or write that data item. This rule prevents the older transaction from committing after
the younger transaction has already committed.
Younger Transaction Rule − A younger transaction can read or write a data item that has already been
written by an older transaction.
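The Late Transaction Rule and the Younger Transaction Rule can be sketched as a check against the timestamp of the last writer of a data item. This is a simplified illustration: the names Item, may_access and write are assumptions, a larger timestamp is taken to mean a younger transaction, and the sketch leaves out the read timestamps that a full timestamp-ordering scheduler would also track.

```python
class Item:
    """A data item that remembers the timestamp of its youngest writer."""

    def __init__(self):
        self.write_ts = 0  # 0 means "not written yet" in this sketch


def may_access(txn_ts, item):
    """Allow a read or write only if no younger transaction has written the item."""
    return txn_ts >= item.write_ts


def write(txn_ts, item):
    """Apply the rules to a write; return False if the transaction is too late."""
    if not may_access(txn_ts, item):
        return False           # Late Transaction Rule: the older transaction is rejected
    item.write_ts = txn_ts     # Younger Transaction Rule: the write is accepted
    return True
```

With these definitions, a write at timestamp 5 succeeds, a later attempt by the older transaction with timestamp 3 is rejected, and a transaction with timestamp 7 may still access the item.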
In systems with low conflict rates, the task of validating every transaction for serializability may lower
performance. In these cases, the test for serializability is postponed to just before commit. Since the
conflict rate is low, the probability of aborting transactions which are not serializable is also low. This
approach is called the optimistic concurrency control technique.
In this approach, a transaction’s life cycle is divided into the following three phases:
Execution Phase − A transaction fetches data items to memory and performs operations upon them.
Validation Phase − A transaction performs checks to ensure that committing its changes to the database
passes the serializability test.
Commit Phase − A transaction writes the modified data items in memory back to disk.
Rule 2 − Given two transactions Ti and Tj, if Ti is writing the data item that Tj is reading, then Ti’s commit
phase cannot overlap with Tj’s execution phase. Tj can start executing only after Ti has already
committed.
Rule 3 − Given two transactions Ti and Tj, if Ti is writing the data item which Tj is also writing, then Ti’s
commit phase cannot overlap with Tj’s commit phase. Tj can start to commit only after Ti has already
committed.
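Rules 2 and 3 can be read as overlap tests on phase intervals. The sketch below assumes each phase is recorded as a (start, end) pair in some logical time unit; the Txn record and the function names are hypothetical and are not part of any particular system.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    read_set: set
    write_set: set
    exec_phase: tuple    # (start, end) of the execution phase, assumed logical time
    commit_phase: tuple  # (start, end) of the commit phase


def overlaps(a, b):
    """True if the half-open intervals a and b share any point in time."""
    return a[0] < b[1] and b[0] < a[1]


def violates(ti, tj):
    """Return the name of the rule that the pair (ti, tj) breaks, or None."""
    if ti.write_set & tj.read_set and overlaps(ti.commit_phase, tj.exec_phase):
        return "Rule 2"  # Ti writes what Tj reads, and Tj executed during Ti's commit
    if ti.write_set & tj.write_set and overlaps(ti.commit_phase, tj.commit_phase):
        return "Rule 3"  # both write the same item, and their commit phases overlap
    return None
```

A validator built on this idea would abort or delay Tj whenever violates(ti, tj) returns a rule name.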
In this section, we will see how the above techniques are implemented in a distributed database
system.
The basic principle of distributed two-phase locking is the same as that of the basic two-phase locking
protocol. However, in a distributed system there are sites designated as lock managers. A lock manager
controls lock acquisition requests from transaction monitors. In order to enforce coordination between
the lock managers at different sites, at least one site is given the authority to see all transactions
and detect lock conflicts.
Depending on the number of sites that can detect lock conflicts, distributed two-phase locking
approaches can be of three types:
Centralized two-phase locking − In this approach, one site is designated as the central lock manager. All
sites in the environment know the location of the central lock manager and obtain locks from it
during transactions.
Primary copy two-phase locking − In this approach, a number of sites are designated as lock control
centers. Each of these sites is responsible for managing a defined set of locks. All sites know which lock
control center is responsible for managing the lock of which data table / fragment item.
Distributed two-phase locking − In this approach, there are a number of lock managers, with each lock
manager controlling the locks of the data items stored at its local site. The location of the lock
manager depends on data distribution and replication.
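The difference between the three approaches comes down to which site a transaction asks for a lock on a given item. A minimal sketch, assuming hypothetical scheme names and lookup tables, and assuming each item is stored at a single site:

```python
def lock_manager_site(scheme, item, *, central_site=None, primary_of=None, stored_at=None):
    """Return the site that manages the lock for `item` under the given scheme.

    central_site: the single lock manager site (centralized approach)
    primary_of:   dict mapping each item to its lock control center (primary copy)
    stored_at:    dict mapping each item to the site that stores it (distributed)
    """
    if scheme == "centralized":
        return central_site       # every request goes to the same site
    if scheme == "primary_copy":
        return primary_of[item]   # the control center responsible for this item
    if scheme == "distributed":
        return stored_at[item]    # the lock manager local to the data
    raise ValueError(f"unknown scheme: {scheme}")
```

For instance, with primary_of = {"accounts": "S2"}, a lock on "accounts" is requested from site S2 under the primary copy approach, no matter where the transaction runs.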
Distributed Timestamp Concurrency Control
In a centralized system, the timestamp of any transaction is determined by reading the physical clock. But
in a distributed system, the local physical / logical clock readings of any site cannot be used as
global timestamps, since they are not globally unique. Therefore, a timestamp consists of a
combination of the site ID and that site's clock reading.
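One way to picture such a timestamp is as a pair of clock reading and site ID. In the sketch below, a simple counter stands in for the site's clock, and placing the clock reading first (so that comparison is by clock, with the site ID breaking ties) is an assumption of this example, not something the text prescribes.

```python
import itertools


class TimestampGenerator:
    """Issues globally unique timestamps of the form (clock reading, site ID)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self._clock = itertools.count(1)  # stand-in for the site's local clock

    def next(self):
        # Two sites can read the same clock value, but never share a site ID,
        # so the pair is unique across the whole system.
        return (next(self._clock), self.site_id)
```

Two sites that both read clock value 1 produce (1, 1) and (1, 2), which are distinct and can still be ordered by ordinary tuple comparison.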
To implement timestamp ordering algorithms, each site has a scheduler that maintains a separate queue
for each transaction manager. During a transaction, the transaction manager sends a lock request to
the site's scheduler. The scheduler places the request in the corresponding queue in increasing
timestamp order. Requests are processed from the front of the queues in the order of their timestamps,
i.e. oldest first.
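The queueing described above can be sketched with one heap per transaction manager. The class name SiteScheduler and its methods are hypothetical, and the sketch only shows the ordering; it does not decide whether a request can actually be granted.

```python
import heapq


class SiteScheduler:
    """Keeps one timestamp-ordered queue per transaction manager."""

    def __init__(self):
        self.queues = {}  # transaction manager id -> heap of (timestamp, request)

    def submit(self, tm_id, timestamp, request):
        # heapq keeps the request with the smallest (oldest) timestamp at index 0.
        heapq.heappush(self.queues.setdefault(tm_id, []), (timestamp, request))

    def next_request(self):
        """Pop the oldest request found at the front of any queue, or None."""
        heads = [(queue[0], tm_id) for tm_id, queue in self.queues.items() if queue]
        if not heads:
            return None
        _, tm_id = min(heads)  # the queue whose front request is oldest
        return heapq.heappop(self.queues[tm_id])
```

Requests submitted in any order come out of next_request() sorted by timestamp across all queues.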
Conflict Graphs
Another method is to create conflict graphs. For this, transaction classes are defined. A transaction
class contains two sets of data items called the read set and the write set. A transaction belongs
to a particular class if the transaction's read set is a subset of the class's read set and the
transaction's write set is a subset of the class's write set. In the read phase, each transaction issues
its read requests for the data items in its read set. In the write phase, each transaction issues its
write requests.
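Class membership is a pair of subset tests, which can be sketched directly with Python sets. The function name and the example item names are made up for illustration.

```python
def belongs_to(txn_read, txn_write, class_read, class_write):
    """True if the transaction's read and write sets fit inside the class's sets."""
    return set(txn_read) <= set(class_read) and set(txn_write) <= set(class_write)
```

A transaction that reads {"a"} and writes {"b"} belongs to a class with read set {"a", "c"} and write set {"b"}, but a transaction that also writes "c" does not.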
A conflict graph is created for the classes to which the active transactions belong. It contains a set of
vertical, horizontal, and diagonal edges. A vertical edge connects two nodes within a class and denotes
a conflict within the class. A horizontal edge connects two nodes across two classes and denotes a
write-write conflict between the classes. A diagonal edge connects two nodes across two classes and
denotes a write-write or a read-write conflict between the two classes.
The conflict graphs are analyzed to ascertain whether two transactions within the same class or across
two different classes can run in parallel.
For the distributed optimistic approach, the following two rules apply.
Rule 1 − A transaction must be validated locally at every site where it executes. If the
transaction is found to be invalid at any site, it is aborted. Local validation guarantees that the
transaction maintains serializability at the sites where it has been executed. After a transaction passes
the local validation test, it is validated globally.
Rule 2 − After a transaction passes the local validation test, it must be validated globally. Global
validation ensures that if two conflicting transactions run together at more than one site, they
commit in the same relative order at all the sites where they run together. This may require
a transaction to wait for the other conflicting transaction after validation and before commit. This
requirement makes the algorithm less optimistic, since a transaction may not be able to commit as soon as
it is validated at a site.
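The global check in Rule 2 amounts to verifying that every pair of transactions appears in the same relative order at each site where both run. The sketch below assumes each site reports its local order as a list of transaction IDs and, for simplicity, treats every pair that shares a site as conflicting; the function name and input shape are hypothetical.

```python
from itertools import combinations


def globally_consistent(site_orders):
    """site_orders maps a site name to its local order of transaction IDs.

    Returns True if every pair of transactions has the same relative order
    at all sites where both transactions appear.
    """
    seen = {}  # (a, b) with a < b -> True if a comes before b
    for order in site_orders.values():
        position = {txn: i for i, txn in enumerate(order)}
        for a, b in combinations(sorted(position), 2):
            a_first = position[a] < position[b]
            # Record the order the first time the pair is seen, compare afterwards.
            if seen.setdefault((a, b), a_first) != a_first:
                return False
    return True
```

If site S1 orders T1 before T2 while site S2 orders T2 before T1, the function returns False, and one of the two transactions would have to wait or be aborted.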