lecture 27
MANAGEMENT SYSTEM
MODULE 9 – NOSQL
Lecture 5 – Consistency
01
Consistency
● Consistency states that no write may leave the data in violation of the rules for valid
data.
● If an action attempts to introduce inconsistent data, the entire action is rolled back and
an error is returned to the user.
● There are several types of consistency, such as strong consistency and eventual
consistency.
02
Update Consistency
Write-write conflict
● Person X and person Y both want to update a telephone number on a website, and both
have update rights.
● They use different formats but update the phone number at the same time.
● This is called a write-write conflict, i.e. two people updating the same data item at
the same time.
● When these writes reach the server, the server serializes them, applying one after the
other.
● Suppose the server uses alphabetical order: X’s update is applied first, then Y’s.
● X’s update is then overwritten by Y’s, so X’s update becomes a lost update.
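The lost update described above can be reproduced in a few lines. This is a minimal sketch, not a real server: the `store` dictionary, the client names, and the phone number values are hypothetical, and the alphabetical ordering is the assumption made in the bullets.

```python
# Hypothetical in-memory "server" holding a single phone number.
store = {"phone": "555 0100"}

# Two writes arrive at about the same time, in different formats.
pending = [
    ("y", "(555) 010-0199"),
    ("x", "555-010-0199"),
]

applied = []
# The server serializes the writes; alphabetical order by client name is assumed.
for client, value in sorted(pending):
    store["phone"] = value  # last write wins, no conflict check
    applied.append(client)

print(applied)         # order in which the writes were applied
print(store["phone"])  # only the last write survives
```

Nothing tells X that the update was overwritten, which is what makes the lost update hard to notice.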
● Pessimistic Approach
○ It prevents conflicts from occurring by using locks.
○ To change a value you must acquire a lock, and the system ensures that only one
client can hold the lock at a time.
○ So person Y sees the result of person X’s update and can then decide whether to make
his own update.
● Optimistic Approach
○ Let conflicts occur, detect them and take action to sort them out.
○ A common optimistic approach is a conditional update where any client that does
an update tests the value just before updating it to see if it’s changed since his last
read.
○ In this case, X’s update would succeed but Y’s would fail.
○ The error would let Y know that he should look at the value again and decide
whether to attempt a further update.
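A conditional update of the kind described for the optimistic approach might look like the sketch below. The `VersionedValue` class and its version counter are assumptions made for illustration: the slides speak of testing the value itself, and a version number is one common way to perform that test.

```python
import threading


class VersionedValue:
    """Hypothetical single-value store that supports conditional updates."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # makes the check-and-set step atomic

    def read(self):
        with self._lock:
            return self.value, self.version

    def conditional_update(self, new_value, expected_version):
        """Apply the update only if nothing changed since the caller's read."""
        with self._lock:
            if self.version != expected_version:
                return False  # someone else updated first
            self.value = new_value
            self.version += 1
            return True


phone = VersionedValue("555 0100")

# X and Y both read the same version before writing.
_, seen_by_x = phone.read()
_, seen_by_y = phone.read()

x_ok = phone.conditional_update("555-010-0199", seen_by_x)
y_ok = phone.conditional_update("(555) 010-0199", seen_by_y)

print(x_ok, y_ok)  # X succeeds, Y is rejected
```

After the rejection, Y would call `read()` again, see X's value, and decide whether a further update is still needed.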
● Replication makes it much more likely to run into write-write conflicts.
● If different nodes have different copies of some data which can be independently
updated, then you’ll get conflicts unless you take specific measures to avoid them.
● Using a single node as the target for all writes for some data makes it much easier to
maintain update consistency.
03
Read Consistency
● Having a data store that maintains update consistency is one thing, but it doesn’t
guarantee that readers of that data store will always get consistent responses to their
requests.
● Let’s imagine we have an order with line items and a shipping charge. The shipping
charge is calculated based on the line items in the order.
● If we add a line item, we thus also need to recalculate and update the shipping charge.
● In a relational database, the shipping charge and line items will be in separate tables.
● The danger of inconsistency is that Martin adds a line item to his order, Pramod then
reads the line items and shipping charge, and then Martin updates the shipping charge.
● This is an inconsistent read, or read-write conflict.
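The interleaving above can be traced step by step. In this sketch the shipping rule (a flat charge of 5 per line item) and the item names are hypothetical; only the order of the three steps comes from the slide.

```python
def shipping_for(items):
    # Hypothetical rule: a flat charge of 5 per line item.
    return 5 * len(items)


# Two separately stored pieces of data, as with separate tables.
line_items = ["book"]
shipping = {"charge": shipping_for(line_items)}

# Step 1: Martin adds a line item.
line_items.append("lamp")

# Step 2: Pramod reads both pieces between Martin's two writes.
seen_items = list(line_items)
seen_charge = shipping["charge"]

# Step 3: Martin updates the shipping charge.
shipping["charge"] = shipping_for(line_items)

# Pramod saw two line items but the charge for one.
inconsistent = seen_charge != shipping_for(seen_items)
print(seen_items, seen_charge, inconsistent)
```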
● We refer to this type of consistency as logical consistency: ensuring that different
data items make sense together.
● To avoid a logically inconsistent read-write conflict, relational databases support the
notion of transactions.
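● A common claim is that NoSQL databases don’t support transactions and thus can’t be
consistent. This glosses over important detail.
● Firstly, the lack of transactions usually applies only to aggregate-oriented databases;
graph databases tend to support ACID transactions just as relational databases do.
● Secondly, aggregate-oriented databases do support atomic updates, but only within a
single aggregate.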
● This means that you will have logical consistency within an aggregate but not between
aggregates.
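One way to picture logical consistency within an aggregate is to store the order, its line items, and its shipping charge as a single value that is replaced in one step. The `AggregateStore` class, the `add_item` helper, and the shipping rate below are hypothetical; this is a sketch of the idea, not of any particular database.

```python
import copy
import threading


class AggregateStore:
    """Hypothetical key-value store where each value is a whole aggregate."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        # The whole aggregate is swapped in at once, never piece by piece.
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])


def add_item(order, item, rate=5):
    """Return a new order with the item added and shipping recalculated."""
    updated = copy.deepcopy(order)
    updated["line_items"].append(item)
    updated["shipping"] = rate * len(updated["line_items"])
    return updated


store = AggregateStore()
store.put("order-1", {"line_items": ["book"], "shipping": 5})
store.put("order-1", add_item(store.get("order-1"), "lamp"))

order = store.get("order-1")
print(order)
```

A reader calling `get` sees either the old order or the new one, never new line items with the old shipping charge. No such guarantee exists for an update that spans two keys.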
● Reading different values for the same data item from different replicas is another kind
of inconsistent read. It breaches a different form of consistency, called
replication consistency: ensuring that the same data item has the same value
when read from different replicas.
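A breach of replication consistency can be sketched with two replicas and a delayed propagation step. The node names and the `room` data item are hypothetical, and the explicit "propagate" line stands in for whatever replication mechanism a real system would use.

```python
# Two replicas that start out with the same data.
replicas = {
    "node_a": {"room": "free"},
    "node_b": {"room": "free"},
}

# A write lands on node_a only; propagation has not happened yet.
replicas["node_a"]["room"] = "booked"
before = {name: data["room"] for name, data in replicas.items()}

# Later, the update is propagated to node_b.
replicas["node_b"].update(replicas["node_a"])
after = {name: data["room"] for name, data in replicas.items()}

print(before)  # replicas disagree
print(after)   # replicas agree again
```

The window between the two prints is the period in which readers of different replicas get different answers.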
04
Relaxing Consistency
● Consistency is a Good Thing—but, sadly, sometimes we have to sacrifice it.
● It is always possible to design a system to avoid inconsistencies, but often impossible
to do so without making unbearable sacrifices in other characteristics of the system
● Trading off consistency is a familiar concept even in single-server relational database
systems.
● Here, our principal tool to enforce consistency is the transaction, and transactions can
provide strong consistency guarantees.
● However, transaction systems usually come with the ability to relax isolation levels,
allowing queries to read data that hasn’t been committed yet.
● CAP Theorem
○ A distributed NoSQL store cannot guarantee both consistency and high availability
while the network is partitioned.
○ This trade-off was first expressed by Eric Brewer as the CAP theorem.
○ CAP theorem states that we can only achieve at most two out of three
guarantees for a database: Consistency, Availability, and Partition
Tolerance.
○ Consistency means that all nodes in the network see the same data at the same
time.
○ Availability is a guarantee that every request receives a response about
whether it was successful or failed.
○ Partition Tolerance is a guarantee that the system continues to operate despite
arbitrary message loss or failure of part of the system.
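The trade-off can be illustrated with a toy two-node cluster. Everything here is hypothetical: the `Cluster` class, its `mode` flag, and the `partitioned` switch are illustration devices, and a rejected write is treated as a loss of availability for that request.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Hypothetical two-node cluster that favours either C or A under partition."""

    def __init__(self, mode):
        self.mode = mode  # "CP" or "AP"
        self.a = Node("a")
        self.b = Node("b")
        self.partitioned = False

    def write(self, node, key, value):
        other = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False  # refuse the write: nodes stay consistent
            node.data[key] = value  # accept locally: nodes diverge
            return True
        node.data[key] = value
        other.data[key] = value  # no partition: both nodes are updated
        return True


cp = Cluster("CP")
cp.partitioned = True
cp_ok = cp.write(cp.a, "k", 1)

ap = Cluster("AP")
ap.partitioned = True
ap_ok = ap.write(ap.a, "k", 1)

print(cp_ok, cp.a.data == cp.b.data)  # write refused, nodes still agree
print(ap_ok, ap.a.data == ap.b.data)  # write accepted, nodes disagree
```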
05
Relaxing Durability
● Most people would laugh at relaxing durability—after all, what is the point of a data
store if it can lose updates?
● There are cases where you may want to trade off some durability for higher
performance.
● A failure of replication durability occurs when a node processes an update but fails
before that update is replicated to the other nodes.
● A simple case of this may happen if you have a master-slave distribution model where
the slaves appoint a new master automatically should the existing master fail.
● If a master does fail, any writes not yet passed on to the replicas are effectively
lost.
● The user believes the update succeeded because the master acknowledged it, but the
update is lost when the master node fails.
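The failure sequence above can be walked through with plain dictionaries. This is a sketch under stated assumptions: `master`, `replica`, and `replication_queue` are hypothetical names, and clearing the queue stands in for the master crashing before it replicates.

```python
master = {"k": 1}
replica = {"k": 1}
replication_queue = []  # updates the master has not yet sent to the replica


def write(key, value):
    """Master applies the write and acknowledges before replicating."""
    master[key] = value
    replication_queue.append((key, value))
    return "ack"


ack = write("k", 2)  # the user is told the update succeeded

# The master crashes before the queue is drained; pending updates die with it.
replication_queue.clear()

# The replica is appointed as the new master.
new_master = replica
lost = new_master["k"] != 2

print(ack, new_master["k"], lost)
```

Waiting for the replica to confirm before returning the acknowledgement would close this gap, at the cost of slower writes.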
06
Quorums
● One piece of data is stored on multiple nodes, so we need a mechanism by which the
nodes agree on the value to return.
● Suppose we replicate the data on 3 nodes: Node 0, 1 and 2.
● We make a write request to Node 0. The data is written on Node 0, while Nodes 1 and 2
are still waiting for it.
● Meanwhile, a read request arrives for the newly inserted data, but Node 0 fails.
● Since Node 0 has failed, the read checks Nodes 1 and 2, but the data has not yet been
replicated to these nodes.
● So the database returns a "data not found" error, which is false.
● Instead, the database should have returned a database error.
● To avoid such issues we need distributed consensus, and one way to implement it is
through a quorum.
● When the data is on multiple nodes, a read takes the copy with the latest timestamp, so
users can still read the data even if one of the nodes fails.
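A quorum scheme for the three-node scenario could be sketched as follows. The values N = 3, W = 2, R = 2 (chosen so that W + R > N), the function names, and the integer timestamps are assumptions for illustration and do not come from the slides.

```python
N, W, R = 3, 2, 2  # replicas, write quorum, read quorum (W + R > N assumed)

nodes = [dict() for _ in range(N)]  # each node maps key -> (timestamp, value)
alive = [True] * N


def quorum_write(key, value, ts):
    """Write to W live nodes; fail if a write quorum cannot be reached."""
    targets = [i for i in range(N) if alive[i]][:W]
    if len(targets) < W:
        raise RuntimeError("write quorum not reached")
    for i in targets:
        nodes[i][key] = (ts, value)


def quorum_read(key):
    """Ask R live nodes and return the value with the latest timestamp."""
    sources = [i for i in range(N) if alive[i]][:R]
    if len(sources) < R:
        # A database error, not a misleading "data not found".
        raise RuntimeError("read quorum not reached")
    found = [nodes[i][key] for i in sources if key in nodes[i]]
    if not found:
        return None
    return max(found, key=lambda reply: reply[0])[1]


quorum_write("user", "alice", ts=1)  # lands on Node 0 and Node 1
alive[0] = False                     # Node 0 fails before Node 2 catches up

value = quorum_read("user")          # Nodes 1 and 2 answer; Node 1 has the data
print(value)
```

Because any R nodes overlap with the W nodes that took the write, at least one reply carries the latest value. If a second node also failed, `quorum_read` would raise an error instead of claiming the data does not exist.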