Chapter 6 - Consistency and Replication (updated)

Chapter 6 discusses the importance of consistency and replication in distributed systems, focusing on their roles in ensuring data accuracy and availability. It outlines various consistency models, including strict, sequential, weak, and release consistency, and explains how replication enhances reliability and performance while presenting trade-offs in maintaining consistency. The chapter emphasizes the need for balancing scalability and consistency requirements in distributed data management.


Chapter 6 - Consistency and Replication

Objectives of the Chapter

 we discuss
 why replication is useful and its relation with scalability; in
particular object-based replication
 consistency models
 data-centric consistency models
 client-centric consistency models
 how consistency and replication are implemented
In distributed systems, consistency and replication are used to
ensure that data remains accurate and available despite potential
failures and network partitions.
Consistency: This refers to the agreement of all nodes in a distributed
system about the current state of data. In other words, when a client
reads data from any node in the system, it should always receive the
most recent and accurate version of that data. Consistency ensures
that the data is always valid and up-to-date, regardless of which node
is accessed. There are various levels of consistency, ranging from
strong consistency (where all nodes see the same data at the same
time) to eventual consistency (where all nodes will eventually
converge to the same state).
Replication: Replication involves storing multiple copies of data across
different nodes in a distributed system. By replicating data, systems
can achieve fault tolerance, high availability, and scalability. If one
node fails or becomes unreachable, clients can still access the data
from other available nodes. Replication can be synchronous (where
data is replicated immediately to all nodes) or asynchronous (where
there might be a delay between updating data on one node and
replicating it to others).
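The synchronous/asynchronous distinction above can be sketched in a few lines. This is a single-machine toy, not a real replication protocol; the function names, the dict-per-replica encoding, and the simulated delay are all illustrative assumptions:

```python
import threading
import time

def replicate_sync(primary, replicas, item, value):
    """Synchronous replication (sketch): the write completes only after
    every replica has applied the update."""
    primary[item] = value
    for replica in replicas:
        replica[item] = value       # apply at each replica before returning

def replicate_async(primary, replicas, item, value, delay=0.01):
    """Asynchronous replication (sketch): the write returns immediately;
    replicas are updated in the background, so a read at a replica may
    briefly return stale data."""
    primary[item] = value
    def propagate():
        time.sleep(delay)           # simulated propagation delay
        for replica in replicas:
            replica[item] = value
    worker = threading.Thread(target=propagate)
    worker.start()
    return worker                   # join() to wait for convergence

primary, replicas = {}, [{}, {}]
replicate_sync(primary, replicas, 'x', 1)        # all copies agree on return
worker = replicate_async(primary, replicas, 'y', 2)
worker.join()                                    # replicas eventually converge
```

The synchronous variant pays the full propagation latency on every write; the asynchronous one trades a window of inconsistency for a fast write path.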
6.1 Reasons for Replication
 two major reasons: reliability and performance
 reliability
 if a file is replicated, we can switch to other replicas if
there is a crash on our replica
 we can provide better protection against corrupted data;
similar to mirroring in non-distributed systems
 performance
 if the system has to scale in size and geographical area
 place a copy of data in the proximity of the process using
them, reducing the time of access and increasing its
performance; for example a Web server is accessed by
thousands of clients from all over the world
 Replication as Scaling Technique
 replication and caching are widely applied as scaling techniques
 processes can use local copies and limit access time and traffic
 however, we need to keep the copies consistent; but this may require
more network bandwidth
 if the copies are refreshed more often than used (low access-to-
update ratio), the cost (bandwidth) is more expensive than the
benefits;
 Dilemma (trade-off)
 scalability problems can be alleviated by applying replication and caching,
leading to a better performance
 but, keeping copies consistent requires global synchronization, which is
generally costly in terms of performance
 solution: loosen the consistency constraints
 updates do not need to be executed as atomic operations (no more
instantaneous global synchronization); but copies may not be always
the same everywhere
 to what extent the consistency can be loosened depends on the specific
application (the purpose of data as well as access and update patterns)
6.2 Data-Centric Consistency Models
 consistency has always been discussed
 in terms of read and write operations on shared data
available by means of (distributed) shared memory, a
(distributed) shared database, or a (distributed) file
system
 we use the broader term data store, which may be
physically distributed across multiple machines
 assume also that each process has a local copy of the data
store and write operations are propagated to the other
copies
the general organization of a logical data store, physically distributed and replicated across multiple processes
 a consistency model is a contract between processes and the
data store
 processes agree to obey certain rules
 then the data store promises to work correctly
 ideally, a process that reads a data item expects a value that
shows the results of the last write operation on the data
 in a distributed system and in the absence of a global clock
and with several copies, it is difficult to know which is the last
write operation
 to simplify the implementation, each consistency model
restricts what read operations return
 data-centric consistency models to be discussed
1. strict consistency
2. sequential consistency
3. causal consistency
4. weak consistency
5. release consistency
6. entry consistency
1. Strict Consistency
 the most stringent consistency model and is defined by the following
condition:
 Any read on a data item x returns a value corresponding to the result
of the most recent write on x, regardless of where the read and write
operations take place.
 the following notations and assumptions will be used
 Wi(x)a means write by Pi to data item x with the value a has been done
 Ri(x)b means a read by Pi to data item x returning the value b has been done
 Assume that initially each data item is NIL
 consider the following example; write operations are done locally and later
propagated to other replicas

behavior of two processes operating on the same data item:
a) a strictly consistent data store
b) a data store that is not strictly consistent; P2's first read may occur, for example, 1 nanosecond after P1's write
 the solution is to relax absolute time and consider time intervals
1. Strict Consistency
 Example 2: Consider two processes, P1 and P2, operating on
the same data item x in a strictly consistent data store. Let's
examine how strict consistency is maintained in this scenario.
1. Write Operations:
1. Suppose P1 writes a value "a" to data item x: W1(x)a.
2. Later, P2 writes a value "b" to data item x: W2(x)b.
2. Read Operations:
1. Now, suppose P1 reads data item x: R1(x)b. According to
strict consistency, this read operation must return the most
recent value written to x, which is "b" in this case.
2. Similarly, if P2 reads data item x: R2(x)b, it must also
return the most recent value written to x, which is "b".
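The example above can be checked mechanically. The sketch below is a hypothetical checker (`strictly_consistent` and the trace encoding are illustrative, not from the chapter), and it assumes every operation carries an absolute global timestamp, which is exactly the assumption that makes strict consistency unimplementable in a real distributed system:

```python
def strictly_consistent(trace):
    """trace: list of (time, process, op, item, value) tuples, op is 'W'
    or 'R', time is an absolute global timestamp.  Every read must return
    the value of the most recent write on that item (None stands for NIL)."""
    last_write = {}  # item -> value of the most recent write so far
    for _, _, op, item, value in sorted(trace):  # replay in global-time order
        if op == 'W':
            last_write[item] = value
        elif last_write.get(item) != value:
            return False  # a read missed the most recent write
    return True

# Scenario from the slides: P1 writes a; P2 reads 1 nanosecond later.
ok = strictly_consistent([(1.0, 'P1', 'W', 'x', 'a'),
                          (1.000000001, 'P2', 'R', 'x', 'a')])
stale = strictly_consistent([(1.0, 'P1', 'W', 'x', 'a'),
                             (1.000000001, 'P2', 'R', 'x', None)])  # read NIL
```

The first trace is strictly consistent; the second is the non-strict case from the figure, where P2 still sees NIL just after P1's write.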
2.Sequential Consistency
 strict consistency is the ideal but impossible to implement
 fortunately, most programs do not need strict consistency
 sequential consistency is a slightly weaker consistency model
 a data store is said to be sequentially consistent when it
satisfies the following condition:
 The result of any execution is the same as if the (read and
write) operations by all processes on the data store were
executed in some sequential order and the operations of
each individual process appear in this sequence in the
order specified by its program
 i.e., all processes see the same interleaving of operations
 time does not play a role; no reference to the “most recent”
write operation
 Example 2: Assume the following sequence of operations:
1. Initial State: Initially, the value of data item x is undefined or set to some initial value
(e.g., NIL).
2. Operations by P1 and P2:
1. P1 performs a write operation, setting the value of x to 1: W1(x)1.
2. P2 performs a write operation, setting the value of x to 2: W2(x)2.
3. P1 performs a read operation, reading the value of x: R1(x).
4. P2 performs a read operation, reading the value of x: R2(x).
3. Possible Sequentially Consistent Execution:
1. W1(x)1 (P1 writes 1 to x).
2. W2(x)2 (P2 writes 2 to x).
3. R1(x) reads the value 2 (consistent with the most recent write by P2).
4. R2(x) reads the value 2 (also consistent with the most recent write by P2).
in this example, the system ensures that:
• the order of write operations is respected: W1(x) precedes W2(x) in the global order of operations
• the read operations return values consistent with the global order of writes: both reads return the value 2, the result of the most recent write operation
 example: four processes operating on the same data item x
a) a sequentially consistent data store: the write operation of P2
appears to have taken place before that of P1, and it does so for
all processes
b) a data store that is not sequentially consistent: to P3, it appears
as if the data item has first been changed to b, and later to a; but
P4 will conclude that the final value is b, i.e., not all processes
see the same interleaving of write operations
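The two data stores in the figure can be distinguished mechanically: a trace is sequentially consistent if at least one interleaving that preserves program order explains every read. The brute-force sketch below (illustrative only; it enumerates all interleavings, so it is usable only for tiny traces) checks exactly that:

```python
from itertools import permutations

def sequentially_consistent(programs):
    """programs: one list of operations per process, in program order;
    an operation is ('W', item, value) or ('R', item, value).
    True iff some interleaving that preserves every process's program
    order lets each read return the most recently written value."""
    ops = [(p, i) for p, prog in enumerate(programs) for i in range(len(prog))]
    for order in permutations(ops):
        # discard interleavings that violate some process's program order
        next_op = [0] * len(programs)
        legal = True
        for p, i in order:
            if i != next_op[p]:
                legal = False
                break
            next_op[p] += 1
        if not legal:
            continue
        # replay the interleaving against one shared store
        store, valid = {}, True
        for p, i in order:
            kind, item, value = programs[p][i]
            if kind == 'W':
                store[item] = value
            elif store.get(item) != value:   # None stands for NIL
                valid = False
                break
        if valid:
            return True   # found an order all processes could agree on
    return False

# The figure's scenario: P1 writes a, P2 writes b, P3 and P4 read twice.
consistent = sequentially_consistent([
    [('W', 'x', 'a')], [('W', 'x', 'b')],
    [('R', 'x', 'b'), ('R', 'x', 'a')],    # P3 sees b, then a
    [('R', 'x', 'b'), ('R', 'x', 'a')]])   # P4 agrees on that order
inconsistent = sequentially_consistent([
    [('W', 'x', 'a')], [('W', 'x', 'b')],
    [('R', 'x', 'b'), ('R', 'x', 'a')],    # P3 sees b, then a
    [('R', 'x', 'a'), ('R', 'x', 'b')]])   # P4 sees the opposite order
```

The first case has a witnessing interleaving (W2, both reads of b, W1, both reads of a); the second has none, because P3 and P4 would need the writes in opposite orders.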
3. Weak Consistency
 there is no need to worry about intermediate results in a
critical section since other processes will not see the data
until it leaves the critical section; only the final result need to
be seen by other processes
 this can be done by a synchronization variable, S, that has
only a single associated operation synchronize(S), which
synchronizes all local copies of the data store
 a process performs operations only on its locally available
copy of the store
 when the data store is synchronized, all local writes by
process P are propagated to the other copies and writes by
other processes are brought in to P’s copy
 how it works:
 weak consistency is achieved using a synchronization variable S
 Synchronization Variable (S): this variable is associated with a
single operation, typically called synchronize(S)
Local Copies of the Data Store:
Each process in the distributed system maintains its own local copy of the data store.
Operations on Local Copies:
Processes perform read and write operations only on their locally available copy of the
data store. This means that each process operates independently on its local data,
without immediate concern for the state of data in other processes' copies.
Synchronization Process:
When a process decides that it needs to synchronize its local copy of the data store with
other replicas, it invokes the synchronize(S) operation associated with the
synchronization variable S.
Propagation of Writes:
Upon invoking synchronize(S), all local writes made by the process are propagated to the
other copies of the data store, ensuring that the changes made locally are eventually
visible to other processes.
Updates from Other Processes:
Additionally, during synchronization, any writes made by other processes are brought
into the local copy of the data store, ensuring that the process has an up-to-date view of
the shared data.
 this leads to weak consistency models which have three
properties
1. Accesses to synchronization variables associated with a
data store are sequentially consistent (all processes see all
operations on synchronization variables in the same order)
2. No operation on a synchronization variable is allowed to be
performed until all previous writes have been completed
everywhere
3. No read or write operation on data items is allowed to be
performed until all previous operations on synchronization
variables have been performed (i.e., all previous
synchronizations will have been completed; by doing a
synchronization, a process can be sure of getting the most
recent values)
 weak consistency enforces consistency on a group of
operations, not on individual reads and writes
 e.g., S stands for synchronizes; it means that a local copy
of a data store is brought up to date

a) a valid sequence of events for weak consistency
b) an invalid sequence for weak consistency; P2 should get b
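The synchronize(S) behaviour described above can be sketched as a toy store. This single-machine sketch assumes, for simplicity, a central merged copy through which writes are exchanged; the model itself only requires that synchronization propagates writes between copies, and the class and method names are illustrative:

```python
class WeakStore:
    """Each process works on its own local copy; synchronize(S) pushes the
    process's pending writes out and pulls in writes made by others."""

    def __init__(self, nprocs):
        self.master = {}                                # toy stand-in for "the other copies"
        self.local = [dict() for _ in range(nprocs)]    # one copy per process
        self.pending = [dict() for _ in range(nprocs)]  # writes not yet propagated

    def write(self, p, item, value):
        self.local[p][item] = value      # visible only to p for now
        self.pending[p][item] = value

    def read(self, p, item):
        return self.local[p].get(item)   # may be stale until p synchronizes

    def synchronize(self, p):            # the single operation on S
        self.master.update(self.pending[p])  # propagate p's writes out
        self.pending[p].clear()
        self.local[p] = dict(self.master)    # bring in writes by others

store = WeakStore(2)
store.write(0, 'x', 'a')
before = store.read(1, 'x')   # None: process 1 has not synchronized yet
store.synchronize(0)
store.synchronize(1)
after = store.read(1, 'x')    # 'a' once both sides have synchronized
```

Note that only synchronize carries any propagation cost; individual reads and writes stay purely local, which is the point of the model.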
4. Release Consistency
 with weak consistency model, when a synchronization
variable is accessed, the data store does not know whether it
is done because the process has finished writing the shared
data or is about to start reading
 if we can separate the two (entering a critical section and
leaving it), a more efficient implementation might be possible
 the idea is to selectively guard shared data; the shared data
that are kept consistent are said to be protected
 release consistency provides mechanisms to separate the
two kinds of operations or synchronization variables
 an acquire operation is used to tell that a critical region is
about to be entered
 a release operation is used to tell that a critical region has
just been exited
 when a process does an acquire, the store will ensure that all
copies of the protected data are brought up to date to be
consistent with the remote ones; does not guarantee that
locally made changes will be sent to other local copies
immediately
 when a release is done, protected data that have been
changed are propagated out to other local copies of the store;
it does not necessarily import changes from other copies

a valid event sequence for release consistency

 a distributed data store is release consistent if it obeys the
following:
 Before a read or write operation on shared data is performed,
all previous acquires done by the process must have
completed successfully.
 Before a release is allowed to be performed, all previous reads
and writes by the process must have been completed.
 implementation algorithms:
i. Eager release consistency
1. Acquire Operation:
1. When a process needs to acquire a lock, it sends a message to a
central synchronization manager requesting an acquire on a
particular lock.
2. If there is no competition for the lock, the request is granted by the
synchronization manager.
3. The process then performs reads and writes on the shared data
locally.
2. Release Operation:
1. After performing local reads and writes, when the process releases
the lock, it sends the modified data to the other copies that use
them.
2. Once each copy acknowledges receipt of the data, the
synchronization manager is informed of the release.
ii. Lazy release consistency
1. Acquire Operation:
1. When a process tries to acquire a lock, it doesn't send any
message to the synchronization manager.
2. Instead, it retrieves the most recent values of the data it needs.
2. Release Operation:
1. When a process releases the lock, it doesn't send any data
to the other copies.
 Key Differences:
• Data Propagation:
• Eager Release Consistency: Modified data are immediately
sent to other copies upon release.
• Lazy Release Consistency: No data is sent upon release;
instead, processes fetch the most recent values when
acquiring locks.
• Bandwidth Usage:
• Eager Release Consistency: Data are sent even if they may
not be needed, potentially leading to higher bandwidth
usage.
• Lazy Release Consistency: Data are fetched only when
needed, reducing wastage of bandwidth.
• Synchronization Overhead:
• Eager Release Consistency: Synchronization manager needs
to coordinate data propagation upon release.
• Lazy Release Consistency: No coordination required for data
propagation during release; fetching data is done at the
time of acquire
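The difference between the two algorithms can be sketched as two toy stores (the class names and the single-lock, single-machine simplification are assumptions, not the chapter's code): the eager variant pushes dirty items to every copy at release time, while the lazy variant records only who released last and fetches at acquire time.

```python
class EagerRelease:
    """Push changed protected data to all other copies at release time."""
    def __init__(self, ncopies):
        self.copies = [dict() for _ in range(ncopies)]
        self.dirty = set()                 # protected items changed under the lock

    def acquire(self, p):
        pass                               # earlier releases already pushed everything

    def write(self, p, item, value):
        self.copies[p][item] = value
        self.dirty.add(item)

    def release(self, p):                  # propagate, then the lock is free
        for q, copy in enumerate(self.copies):
            if q != p:
                for item in self.dirty:
                    copy[item] = self.copies[p][item]
        self.dirty.clear()

class LazyRelease:
    """Send nothing at release; fetch the latest values at acquire time."""
    def __init__(self, ncopies):
        self.copies = [dict() for _ in range(ncopies)]
        self.last_releaser = None

    def acquire(self, p):                  # pull from whoever released last
        if self.last_releaser is not None:
            self.copies[p].update(self.copies[self.last_releaser])

    def write(self, p, item, value):
        self.copies[p][item] = value

    def release(self, p):
        self.last_releaser = p             # no data traffic here
```

With eager release the bandwidth is spent whether or not another process ever acquires the lock; with lazy release traffic occurs only when someone actually acquires it, which is the trade-off listed above.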
6.3 Client-Centric Consistency Models
 Prioritize consistency from the perspective of the client or
application accessing the data in a distributed system. These models
focus on ensuring that clients observe a consistent and coherent
view of the shared data, regardless of the system's internal
mechanisms for data replication and synchronization.
 some key aspects of Client-Centric Consistency Models:
 Consistency Guarantees: Client-Centric Consistency Models
provide specific guarantees about what an individual client can
expect when accessing shared data. These guarantees concern the
client's own sequence of reads and writes (e.g., monotonic reads,
writes follow reads) rather than a single globally consistent view
of the data shared by all clients.
 Client Interaction: These models consider how clients interact
with the system and prioritize providing consistent and
predictable behavior for client operations, such as reads and
writes. Clients expect their operations to be performed in a
manner that reflects a logical order and consistency across the
distributed system.
6.3 Client-Centric Consistency Models
Client-centric Policies: Client-Centric Consistency Models often define
policies and mechanisms for handling conflicts, resolving
inconsistencies, and ensuring that clients' operations are processed
correctly and in accordance with the specified consistency guarantees.
These policies may involve techniques such as conflict resolution
algorithms, versioning mechanisms, or coordination protocols.
Client-Server Architecture: Many Client-Centric Consistency Models are
designed around a client-server architecture, where clients interact with a
centralized or distributed set of servers to access and manipulate shared
data. The consistency guarantees and policies are enforced by the server
infrastructure to ensure a consistent view of the data for clients.
Eventual Consistency
 Eventual consistency is a consistency model used in distributed
systems where data updates are guaranteed to be propagated to all
replicas eventually, ensuring that all replicas converge to the same
value over time. Unlike strong consistency models, where updates are
immediately visible to all nodes in the system, eventual consistency
allows for temporary inconsistencies between replicas but guarantees
that these inconsistencies will eventually be resolved.
 the problem with eventual consistency is when different replicas are
accessed.
1. Monotonic Reads
 a data store is said to provide monotonic-read consistency if the
following condition holds:
 If a process reads the value of a data item x, any successive
read operation on x by that process will always return that same
value or a more recent value
 i.e., a process never sees a version of data older than what it has already seen
2. Writes Follow Reads
 a data store is said to provide writes-follow-reads consistency, if:
 A write operation by a process on a data item x following a previous
read operation on x by the same process, is guaranteed to take place
on the same or a more recent value of x that was read
 i.e., any successive write operation by a process on a data item x will be
performed on a copy of x that is up to date with the value most recently
read by that process
 this guarantees, for example, that users of a newsgroup see a posting of a
reaction to an article only after they have seen the original article; if B is a
response to message A, writes-follow-reads consistency guarantees that B
will be written to any copy only after A has been written
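Monotonic-read consistency is typically enforced on the client side with version numbers. In the sketch below (class and method names are illustrative; the chapter prescribes no particular mechanism), the client remembers the highest version it has read of each item and rejects replies from replicas that are behind, so it never sees data older than what it already saw:

```python
class Replica:
    def __init__(self, data=None):
        self.data = dict(data or {})   # item -> (version, value)

class MonotonicClient:
    def __init__(self):
        self.seen = {}                 # item -> highest version read so far

    def read(self, replica, item):
        version, value = replica.data.get(item, (0, None))
        if version < self.seen.get(item, 0):
            # this replica has not caught up; retry at another replica
            raise RuntimeError('stale replica')
        self.seen[item] = version      # the bar never goes backwards
        return value

fresh = Replica({'x': (2, 'b')})       # has the newer write
stale = Replica({'x': (1, 'a')})       # not yet updated
client = MonotonicClient()
value = client.read(fresh, 'x')        # 'b'; client now requires version >= 2
```

After the first read, any replica still serving version 1 is rejected, which is precisely the "same value or a more recent value" guarantee stated above.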
6.4 Distribution Protocols
 there are different ways of propagating, i.e., distributing
updates to replicas, independent of the consistency
model
 we will discuss
1. replica placement
2. update propagation
3. epidemic protocols
1. Replica Placement
 a major design issue for distributed data stores is
deciding where, when, and by whom copies of the data
store are to be placed
 three types of copies:
i. permanent replicas
ii. server-initiated replicas
iii. client-initiated replicas
i. Permanent Replicas
 the initial set of replicas that constitute a distributed data store;
normally a small number of replicas
 e.g., a Web site: two forms
 the files that constitute a site are replicated across a limited number
of servers on a LAN; a request is forwarded to one of the servers
 mirroring: a Web site is copied to a limited number of servers,
called mirror sites, which are geographically spread across the
Internet; clients choose one of the mirror sites
ii. Server-Initiated Replicas (push caches)
 Web Hosting companies dynamically create replicas to improve
performance (e.g., create a replica near hosts that use the Web site very often)
iii. Client-Initiated Replicas (client caches or simply caches)
 to improve access time
 a cache is a local storage facility used by a client to temporarily store a
copy of the data it has just received
 managing the cache is left entirely to the client; the data store from
which the data have been fetched has nothing to do with keeping
cached data consistent
2. Update Propagation
 updates are initiated at a client, forwarded to one of the
copies, and propagated to the replicas ensuring
consistency
 some design issues in propagating updates
i. state versus operations
ii. pull versus push protocols
iii. unicasting versus multicasting
i. State versus Operations
 what is actually to be propagated? three possibilities
 send notification of update only (for invalidation
protocols - useful when read/write ratio is small); use of
little bandwidth
 transfer the modified data (useful when read/write ratio
is high)
 transfer the update operation (also called active
replication); it assumes that each machine knows how
to do the operation; use of little bandwidth, but more
processing power needed from each replica
ii. Pull versus Push Protocols
 push-based approach (also called server- based protocols):
propagate updates to other replicas without those replicas
even asking for the updates (used when high degree of
consistency is required and there is a high read/write ratio)
 pull-based approach (also called client-based protocols):
often used by client caches; a client or a server requests
for updates from the server whenever needed (used when
the read/write ratio is low)
 a comparison between push-based and pull-based
protocols; for simplicity assume multiple clients and a
single server
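The push/pull distinction can be made concrete with a toy single-server setup (hypothetical classes, matching the slide's simplifying assumption of multiple clients and one server): a push server must keep state about its clients' caches and send traffic on every write, while a pull client keeps the server stateless and pays a fetch on every read.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.push_caches = []           # push-based: server must track client caches

    def write(self, item, value):
        self.data[item] = value
        for cache in self.push_caches:  # push: propagate on every update
            cache[item] = value

class PullClient:
    def __init__(self, server):
        self.server = server            # pull-based: no server-side state per client

    def read(self, item):
        return self.server.data.get(item)   # poll the server when needed

server = Server()
push_cache = {}
server.push_caches.append(push_cache)
puller = PullClient(server)
server.write('x', 42)   # push_cache is updated immediately; puller fetches on read
```

This makes the trade-off visible: push wastes messages when reads are rare (low read/write ratio), pull adds a round-trip to every read when reads are frequent.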
iii. Unicasting versus Multicasting
 multicasting can be combined with push-based
approach; the underlying network takes care of sending a
message to multiple receivers
 unicasting is the only possibility for pull-based approach;
the server sends separate messages to each receiver
3. Epidemic Protocols
 update propagation in eventual consistency is often
implemented by a class of algorithms known as epidemic
protocols
 updates are aggregated into a single message and then
exchanged between two servers
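A minimal anti-entropy sketch (one common epidemic protocol; the function name and the (version, value) encoding are assumptions): in each round every replica picks a random partner, and the pair merge their states keeping the highest version of each item, so an update spreads like an infection until all replicas converge.

```python
import random

def anti_entropy_round(replicas):
    """One gossip round: every replica pairs with a random partner; the
    two merge states, each keeping the highest version of every item
    (push-pull anti-entropy)."""
    n = len(replicas)
    for p in range(n):
        q = random.choice([i for i in range(n) if i != p])
        merged = dict(replicas[p])
        for item, (version, value) in replicas[q].items():
            if item not in merged or version > merged[item][0]:
                merged[item] = (version, value)
        replicas[p] = merged
        replicas[q] = dict(merged)   # both partners end up identical

random.seed(1)                        # deterministic demo
replicas = [dict() for _ in range(5)]
replicas[0]['x'] = (1, 'a')           # one update, injected at a single replica
rounds = 0
while rounds < 100 and not all(r.get('x') == (1, 'a') for r in replicas):
    anti_entropy_round(replicas)
    rounds += 1                       # typically converges in a few rounds
```

This is exactly the eventual-consistency propagation style from Section 6.3: no global coordination, only repeated pairwise exchanges, with convergence expected in O(log n) rounds on average.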
6.5 Consistency Protocols (Reading Assignment)
 so far we have concentrated on various consistency
models and general design issues
 consistency protocols describe an implementation of a
specific consistency model
 there are three types
1. primary-based protocols
 remote-write protocols
 local-write protocols
2. replicated-write protocols
 active replication
 quorum-based protocols
3. cache-coherence protocols