1. Flynn's classification
Flynn's classification divides computers into four major groups: SISD (single instruction stream, single data stream), SIMD (single instruction stream, multiple data streams), MISD (multiple instruction streams, single data stream), and MIMD (multiple instruction streams, multiple data streams).
Designing a distributed system is neither easy nor straightforward. A number of challenges must be overcome in order to build the ideal system. The major challenges in distributed systems are listed below:
1. Heterogeneity:
The Internet enables users to access services and run applications over a heterogeneous collection of computers and networks. Heterogeneity (that is, variety and difference) applies to all of the following: networks, computer hardware, operating systems, programming languages, and implementations by different developers.
2. Transparency:
4. Concurrency
5. Security
Many of the information resources that are made available and maintained
in distributed systems have a high intrinsic value to their users. Their
security is therefore of considerable importance. Security for information
resources has three components:
● confidentiality (protection against disclosure to unauthorized individuals),
● integrity (protection against alteration or corruption), and
● availability for the authorized (protection against interference with the means to access the resources).
6. Scalability
● Size
○ The number of users and resources to be processed. The associated problem is overloading.
● Geography
○ The distance between users and resources. The associated problem is communication reliability.
● Administration
○ As the size of a distributed system increases, many of its components need to be controlled. The associated problem is an administrative mess.
7. Failure Handling
1. Centralized algorithms
These have one node with a real-time receiver, called the time server node. The clock time of this node is regarded as correct and used as the reference time. The goal of these algorithms is to keep the clocks of all other nodes synchronized with the time server node.
i. Cristian’s Algorithm
2. Distributed algorithms
Synchronous services are easy to implement since they keep the complexity of the communication low by providing immediate feedback. They avoid the need to keep the context of a call (e.g. the caller's address or a message-id) on the client and server side beyond the lifetime of the request.
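To make the contrast concrete, here is a minimal sketch of the two call styles (the in-process Server class, the queue-based reply delivery, and the function names are illustrative assumptions, not from any particular RPC library): the synchronous call simply blocks and returns the result, while the asynchronous call must keep a message-id table so that a later reply can be matched to its caller.

```python
import itertools
import queue

class Server:
    """Toy in-process 'server' used to illustrate the two call styles."""
    def handle(self, request):
        return f"result-for-{request}"

server = Server()

# Synchronous call: no context survives the call; the reply is the return value.
def sync_call(request):
    return server.handle(request)

# Asynchronous call: the client must keep state (a message-id -> callback map)
# until the reply eventually arrives.
_msg_ids = itertools.count()
_pending = {}             # message-id -> callback, kept beyond the send
_replies = queue.Queue()  # stands in for the network delivering replies later

def async_call(request, on_reply):
    msg_id = next(_msg_ids)
    _pending[msg_id] = on_reply
    _replies.put((msg_id, server.handle(request)))  # reply arrives "later"

def drain_replies():
    while not _replies.empty():
        msg_id, result = _replies.get()
        _pending.pop(msg_id)(result)  # match reply to its caller, then forget it

print(sync_call("A"))
async_call("B", lambda r: print("async reply:", r))
drain_replies()
```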
2. Group communication.
In the first case, the client can accept a response to its multicast from any
member of the group as long as at least one responds. The communication
system need only guarantee delivery of the multicast to a nonfaulty process
of the group on a best-effort basis. In the second case, the all-or-none
atomic delivery requirements require that the multicast needs to be
buffered until it is committed and subsequently delivered to the application
process, and so incurs additional latency.
Messages sent between machines may arrive zero or more times at any point after they are sent.
If we can't agree on the time then we can't always agree on what order
things happen. Suppose I say "my user logged on at 10:00:00" and you say
"my user logged on at 10:00:01". Maybe mine was first or maybe my clock
is just fast relative to yours. The only way to know for sure is if something
connects those two events. For example, if my user logged on and then sent
your user an email and if you received that email before your user logged
on, then we know for sure that mine was first.
This concept is called causal ordering and is written like this:
● If A and B happen on the same machine and A happens before B, then A -> B.
● If I send you some message M and you receive it, then (send M) -> (recv M).
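Lamport's logical clocks are the standard way to capture this happens-before relation; below is a minimal sketch (the LamportClock class and the log-on example are illustrative, not taken from the notes). The invariant is that if A -> B, then clock(A) < clock(B).

```python
class LamportClock:
    """Minimal Lamport logical clock: if A -> B then clock(A) < clock(B)."""
    def __init__(self):
        self.time = 0

    def tick(self):            # local event
        self.time += 1
        return self.time

    def send(self):            # timestamp attached to an outgoing message
        return self.tick()

    def recv(self, msg_time):  # merge the sender's timestamp on receipt
        self.time = max(self.time, msg_time) + 1
        return self.time

# My user logs on, then emails you; you receive it before your user logs on.
mine, yours = LamportClock(), LamportClock()
login_a = mine.tick()      # my user logged on
msg = mine.send()          # (send M)
recv = yours.recv(msg)     # (recv M)
login_b = yours.tick()     # your user logged on
assert login_a < login_b   # the causal chain orders the two logins
```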
Consistency
Eventual Consistency
A process could record its own local state at a given time, but the messages that are in transit (on their way to being delivered) would not be included in the recorded state; hence the recorded state of the system would be incorrect once an in-transit message is delivered.
Any process in the distributed system can initiate this global state recording algorithm using a special message called MARKER. This marker traverses the distributed system across all communication channels and causes each process to record its own state. In the end, the state of the entire system (the global state) is recorded. This algorithm does not interfere with the normal execution of processes.
Algorithm:
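The step-by-step listing is left blank in the notes; as an illustrative stand-in, here is a minimal sketch of the marker rules described above, from the point of view of a single process (the Process class, the send hook, and the channel bookkeeping are assumptions made for the sketch).

```python
class Process:
    """Sketch of the marker rules for recording a consistent global snapshot."""

    def __init__(self, name, in_channels, out_channels, send):
        self.name = name
        self.state = 0                      # application state (a counter here)
        self.recorded_state = None          # local state captured for the snapshot
        self.channel_state = {c: [] for c in in_channels}   # in-transit messages
        self.recording_on = {c: False for c in in_channels}
        self.out_channels = out_channels
        self.send = send                    # send(channel, msg): assumed transport hook

    def _record_and_propagate(self):
        self.recorded_state = self.state
        for c in self.out_channels:         # send MARKER on every outgoing channel
            self.send(c, "MARKER")
        for c in self.recording_on:         # start recording every incoming channel
            self.recording_on[c] = True

    def start_snapshot(self):
        """Any process may initiate the global state recording."""
        self._record_and_propagate()

    def on_message(self, channel, msg):
        if msg == "MARKER":
            if self.recorded_state is None:     # first marker seen: record own state
                self._record_and_propagate()
            self.recording_on[channel] = False  # this channel's state is now complete
        else:
            if self.recording_on[channel]:
                self.channel_state[channel].append(msg)  # message was in transit
            self.state += 1                     # normal processing is not disturbed
```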
Unit 3.
1. Ricart–Agrawala algorithm.
The Ricart–Agrawala algorithm is an algorithm for mutual exclusion in a distributed system proposed by Glenn Ricart and Ashok Agrawala. This algorithm is an extension and optimization of Lamport's Distributed Mutual Exclusion Algorithm. Like Lamport's algorithm, it follows a permission-based approach to ensure mutual exclusion.
In this algorithm:
● Two types of messages ( REQUEST and REPLY) are used and
communication channels are assumed to follow FIFO order.
● A site sends a REQUEST message to all other sites to get their
permission to enter the critical section.
● A site sends a REPLY message to another site to give its permission
to enter the critical section.
● A timestamp is given to each critical section request using Lamport’s
logical clock.
● The timestamp is used to determine the priority of critical section requests: a smaller timestamp gets higher priority than a larger one. Critical section requests are always executed in the order of their timestamps.
Algorithm:
● To enter the Critical section:
○ When a site Si wants to enter the critical section, it sends a
timestamped REQUEST message to all other sites.
○ When a site Sj receives a REQUEST message from site Si, it sends a REPLY message to site Si if and only if
■ site Sj is neither requesting nor currently executing the critical section, or
■ site Sj is requesting, but the timestamp of site Si's request is smaller than the timestamp of its own request.
○ Otherwise, the request is deferred by site Sj.
● To execute the critical section:
○ Site Si enters the critical section if it has received the REPLY
message from all other sites.
● To release the critical section:
○ Upon exiting the critical section, site Si sends a REPLY message to all the deferred requests.
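A compact sketch of these request/reply rules is shown below (the Site class, the in-process sites dictionary standing in for the network, and the method names are assumptions made for illustration).

```python
class Site:
    """Sketch of Ricart–Agrawala request/reply handling for one site."""

    def __init__(self, sid, all_sites):
        self.sid = sid
        self.all = all_sites        # sid -> Site, assumed in-process "network"
        self.clock = 0              # Lamport clock used for request timestamps
        self.requesting = False
        self.in_cs = False
        self.my_request = None      # (timestamp, sid) of my outstanding request
        self.replies = set()
        self.deferred = []          # sites whose REPLY we postponed

    def request_cs(self):
        self.clock += 1
        self.requesting = True
        self.my_request = (self.clock, self.sid)
        self.replies = set()
        for s in self.all.values():
            if s.sid != self.sid:
                s.on_request(self.my_request, self.sid)

    def on_request(self, ts, from_sid):
        self.clock = max(self.clock, ts[0]) + 1
        defer = self.in_cs or (self.requesting and self.my_request < ts)
        if defer:
            self.deferred.append(from_sid)   # my own request has priority
        else:
            self.all[from_sid].on_reply(self.sid)

    def on_reply(self, from_sid):
        self.replies.add(from_sid)
        if len(self.replies) == len(self.all) - 1:  # REPLY from all other sites
            self.in_cs = True                       # enter the critical section

    def release_cs(self):
        self.in_cs = False
        self.requesting = False
        for sid in self.deferred:                   # answer all deferred requests
            self.all[sid].on_reply(self.sid)
        self.deferred = []

# Tiny usage example with three sites.
sites = {}
for i in range(3):
    sites[i] = Site(i, sites)
sites[0].request_cs()
assert sites[0].in_cs
sites[0].release_cs()
```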
Drawbacks of Ricart–Agrawala algorithm:
● Unreliable approach: the failure of any one node in the system can halt the progress of the system, and in this situation requesting processes may starve forever. The problem of node failure can be solved by detecting the failure after some timeout.
2. Maekawa's algorithm.
Maekawa's Algorithm is a quorum-based approach to ensure mutual exclusion in distributed systems. In permission-based algorithms such as Lamport's algorithm and the Ricart–Agrawala algorithm, a site requests permission from every other site, but in the quorum-based approach a site requests permission only from a subset of sites, called a quorum.
In this algorithm:
● Three types of messages ( REQUEST, REPLY, and RELEASE) are
used.
● A site sends a REQUEST message to all other sites in its request set
or quorum to get their permission to enter the critical section.
● A site sends a REPLY message to a requesting site to give its permission to enter the critical section.
● A site sends a RELEASE message to all other sites in its request set
or quorum upon exiting the critical section.
Algorithm:
● To enter the Critical section:
○ When a site Si wants to enter the critical section, it sends a request
message REQUEST(i) to all other sites in the request set Ri.
○ When a site Sj receives the request message REQUEST(i) from site Si, it returns a REPLY message to site Si if it has not sent a REPLY message to any site since it received the last RELEASE message. Otherwise, it queues up the request.
● To execute the critical section:
○ A site Si can enter the critical section if it has received the REPLY
message from all the sites in the request set Ri
● To release the critical section:
○ When a site Si exits the critical section, it sends a RELEASE(i)
message to all other sites in request set Ri
○ When a site Sj receives the RELEASE(i) message from site Si, it
sends a REPLY message to the next site waiting in the queue and deletes
that entry from the queue
○ In case the queue is empty, site Sj updates its status to show that it
has not sent any REPLY message since the receipt of the last RELEASE
message
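The notes do not say how the request sets are chosen; one common construction (assumed here for illustration, not prescribed by the notes) arranges the N sites in a roughly sqrt(N) x sqrt(N) grid and takes a site's row plus its column as its quorum, which guarantees that any two quorums intersect.

```python
import math

def grid_quorums(n):
    """Build Maekawa-style request sets: place the n sites in a sqrt(n) x sqrt(n)
    grid; a site's quorum is its row plus its column, so any two quorums always
    share at least one site, which is what guarantees mutual exclusion."""
    side = math.ceil(math.sqrt(n))
    quorums = {}
    for i in range(n):
        row, col = divmod(i, side)
        row_members = {row * side + c for c in range(side) if row * side + c < n}
        col_members = {r * side + col for r in range(side) if r * side + col < n}
        quorums[i] = row_members | col_members
    return quorums

q = grid_quorums(9)
# Every pair of quorums overlaps.
assert all(q[a] & q[b] for a in q for b in q)
print(q[4])   # e.g. site 4's request set in a 3x3 grid: {1, 3, 4, 5, 7}
```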
3. Suzuki–Kasami algorithm.
The Suzuki–Kasami algorithm is a token-based algorithm for achieving mutual exclusion in distributed systems. It is a modification of the Ricart–Agrawala algorithm, a permission-based (non-token-based) algorithm which uses REQUEST and REPLY messages to ensure mutual exclusion.
In token-based algorithms, a site is allowed to enter its critical section if it possesses a unique token. Non-token-based algorithms use timestamps to order requests for the critical section, whereas token-based algorithms use sequence numbers.
Each request for a critical section contains a sequence number. This
sequence number is used to distinguish between old and current requests.
Data structure and Notations:
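The notes leave this part blank. The data structures usually described for Suzuki–Kasami are sketched below (the RN/LN/Q names follow the common presentation of the algorithm; the Python classes themselves are illustrative assumptions): each site keeps an array RN of the largest request numbers it has seen, and the single token carries an array LN of the most recently satisfied request numbers together with a queue Q of waiting sites.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class SiteState:
    """Per-site state: RN[j] is the largest request number seen from site j."""
    n_sites: int
    has_token: bool = False

    def __post_init__(self):
        self.RN = [0] * self.n_sites

    def new_request(self, my_id):
        self.RN[my_id] += 1          # REQUEST(my_id, RN[my_id]) is then broadcast
        return self.RN[my_id]

@dataclass
class Token:
    """The unique token: LN[j] is the sequence number of site j's most recently
    satisfied request; Q holds the ids of sites with outstanding requests."""
    n_sites: int
    Q: deque = field(default_factory=deque)

    def __post_init__(self):
        self.LN = [0] * self.n_sites

def has_outstanding_request(site: SiteState, token: Token, j: int) -> bool:
    # Site j has an unsatisfied request exactly when RN[j] == LN[j] + 1.
    return site.RN[j] == token.LN[j] + 1
```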
4. Broadcast algorithm.
5. System model.
6. Models of deadlock.
Deadlock Detection
Deadlock Recovery
A traditional operating system such as Windows does not deal with deadlock recovery, as it is a time- and space-consuming process. Real-time operating systems do use deadlock recovery.
Recovery methods
1. Killing processes: either kill all the processes involved in the deadlock, or kill them one by one, checking for deadlock after each kill and repeating until the system recovers from the deadlock (a small sketch of this kill-and-recheck loop follows below).
2. Resource preemption: resources are preempted from the processes involved in the deadlock, and the preempted resources are allocated to other processes so that there is a possibility of recovering the system from the deadlock. In this case, the preempted processes may starve.
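As a rough illustration of method 1, here is a small sketch that detects a cycle in a wait-for graph and removes one victim at a time until no deadlock remains (the graph representation and the victim-selection policy are assumptions made for the sketch).

```python
def find_cycle(wait_for):
    """Return a set of processes on some cycle of the wait-for graph, or None.
    wait_for maps a process to the set of processes it is waiting on."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {p: WHITE for p in wait_for}
    stack = []

    def visit(p):
        colour[p] = GREY
        stack.append(p)
        for q in wait_for.get(p, ()):
            if colour.get(q, WHITE) == GREY:          # back edge: cycle found
                return set(stack[stack.index(q):])
            if colour.get(q, WHITE) == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        colour[p] = BLACK
        stack.pop()
        return None

    for p in list(wait_for):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None

def recover_by_killing(wait_for):
    """Kill deadlocked processes one at a time, re-checking after each kill."""
    killed = []
    while (cycle := find_cycle(wait_for)) is not None:
        victim = sorted(cycle)[0]           # pick a victim (arbitrary policy)
        killed.append(victim)
        wait_for.pop(victim, None)          # the victim no longer waits on anyone
        for waits in wait_for.values():
            waits.discard(victim)           # nobody waits on the victim any more
    return killed

# P1 -> P2 -> P3 -> P1 form a deadlock; P4 waits on P1 but is not in the cycle.
graph = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}, "P4": {"P1"}}
print(recover_by_killing(graph))   # e.g. ['P1']
```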
7. Knapp's classification.
Unit 4.
1. Checkpoint-based recovery.
Checkpoint-Recovery is a common technique for imbuing a program or system with fault-tolerant qualities, and grew from the ideas used in systems which employ transaction processing. It allows systems to recover after some fault interrupts the system and causes the task to fail or be aborted in some way. While many systems employ the technique to minimize lost processing time, it can be used more broadly to tolerate and recover from faults in a critical application or task.
In the event of a system failure, the internal state of the system can be
restored, and it can continue service from the point at which its state was
last saved. Typically this involves restarting the failed task or system and
providing some parameter indicating that there is a state to be recovered.
Depending on the task complexity, the amount of state, and the bandwidth
to the storage device this process could take from a fraction of a second to
many seconds.
Key Concepts
Between full snapshots, or even in place of all but the first complete snapshot, only the state which has changed may be saved. This is known as incremental checkpointing and can be thought of in the same way as incremental backups of hard disks. The basic idea here is to minimize the cost of checkpointing, both in terms of the time required and the space required (on non-volatile storage).
Not all program states may need to be saved. System designers may find it
more efficient to build mechanisms to regenerate states internally, based on
a smaller set of the saved states. Although this technique might be difficult
for some applications, it has the benefit of having the potential to save both
time and space during both checkpoint and recovery operations.
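The following toy sketch illustrates full plus incremental checkpointing of a dict-shaped program state (the Checkpointer class is an illustrative assumption; a real system would write the snapshots and deltas to non-volatile storage).

```python
import copy

class Checkpointer:
    """Toy full + incremental checkpointing of a dict-shaped program state."""

    def __init__(self, state):
        self.state = state
        self.base = copy.deepcopy(state)   # first full snapshot
        self.increments = []               # list of {key: value} deltas

    def checkpoint(self):
        """Save only the keys that changed since the last checkpoint."""
        last = copy.deepcopy(self.base)
        for delta in self.increments:
            last.update(delta)
        delta = {k: v for k, v in self.state.items() if last.get(k) != v}
        self.increments.append(copy.deepcopy(delta))

    def restore(self):
        """Rebuild the most recently checkpointed state after a failure."""
        restored = copy.deepcopy(self.base)
        for delta in self.increments:
            restored.update(delta)
        self.state.clear()
        self.state.update(restored)

state = {"counter": 0, "phase": "init"}
cp = Checkpointer(state)
state["counter"] = 10; cp.checkpoint()   # only 'counter' is saved
state["phase"] = "run"; cp.checkpoint()  # only 'phase' is saved
state["counter"] = 999                   # un-checkpointed work, then a crash...
cp.restore()
assert state == {"counter": 10, "phase": "run"}
```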
Restoring Executive State
When a failure has occurred, the recovery mechanism restores the system
state to the last checkpointed value. This is the fundamental idea in the
tolerance of a fault within a system employing checkpoint recovery. Ideally,
the state will be restored to a condition before the fault occurred within the
system. After the state has been restored, the system can continue normal
execution.
If the root cause of the failure did not manifest until after a checkpoint, and
that cause is part of the state or input data, the restored system is likely to
fail again. In such a case the error in the system may be latent through
several checkpoint cycles. When it finally activates and causes a system
failure, the recovery mechanism will restore the state (including the error!)
and execution will begin again, most likely triggering the same activation
and failure. Thus it is in the system designers' best interest to ensure that any checkpoint-recovery-based system is fail-fast, meaning errors are either tolerated or cause the system to fail immediately, with little or no incubation period.
Failure Detection
Log-Based Recovery
To achieve our goal of atomicity, we must first output to stable storage information describing the modifications, without modifying the database itself. This information can help us ensure that all modifications performed by committed transactions are reflected in the database. This information can also help us ensure that no modifications made by an aborted transaction persist in the database.
An update log record, represented as <Ti, Xj, V1, V2>, has these fields: the transaction identifier Ti, the data-item identifier Xj, the old value V1 (the value before the write), and the new value V2 (the value after the write). Two operations use such records:
1. Undo: using a log record, sets the data item specified in the log record to its old value.
2. Redo: using a log record, sets the data item specified in the log record to its new value.
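A minimal sketch of how such a record drives the two operations, with the database modeled as a plain dictionary (an assumption made for illustration):

```python
from collections import namedtuple

# <Ti, Xj, V1, V2>: transaction id, data item, old value, new value.
UpdateRecord = namedtuple("UpdateRecord", ["ti", "xj", "old", "new"])

def undo(db, rec):
    """Undo: reset the data item named in the record to its old value."""
    db[rec.xj] = rec.old

def redo(db, rec):
    """Redo: set the data item named in the record to its new value."""
    db[rec.xj] = rec.new

db = {"A": 100}
rec = UpdateRecord("T1", "A", 100, 150)   # T1 writes A: 100 -> 150
redo(db, rec)
assert db["A"] == 150
undo(db, rec)
assert db["A"] == 100
```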
After a system crash has occurred, the system examines the log to find the
last <checkpoint L> record. The redo or undo operations need to be
applied only to transactions in L, and to all transactions that started
execution after the record was written to the log. Let us denote this set of
transactions as T. The same rules of undo and redo mentioned in the recovery-using-log-records part apply to T.
Note that the user needs to only examine the part of the log starting with
the last checkpoint log record to find the set of transactions T, and to find
out whether a commit or abort record occurs in the log for each
transaction in T. For example, consider the set of transactions {T0, T1, . . .,
T100}. Suppose that the most recent checkpoint took place during the
execution of transactions T67 and T69, while T68 and all transactions with
subscripts lower than 67 were completed before the checkpoint. Thus, only
transactions T67, T69, . . ., and T100 need to be considered during the
recovery scheme. Each of them needs to be redone if it has been completed
(that is, either committed
or aborted); otherwise, it was incomplete and needs to be undone.
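A small sketch of this classification step, assuming a simplified log representation (the tuple format below is an assumption, not a real log format): it finds the last <checkpoint L> record, forms the set T, and splits it into transactions to redo and transactions to undo.

```python
def recover(log):
    """Classify transactions for recovery, scanning only from the last checkpoint.
    `log` is a list of tuples such as ("checkpoint", ["T67", "T69"]),
    ("start", "T70"), ("commit", "T70"), ("abort", "T71")."""
    last_cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
    active = set(log[last_cp][1])          # transactions in L
    finished = set()
    for kind, txn in log[last_cp + 1:]:
        if kind == "start":
            active.add(txn)                # started after the checkpoint
        elif kind in ("commit", "abort"):
            finished.add(txn)
    redo_set = active & finished           # completed: redo
    undo_set = active - finished           # incomplete: undo
    return redo_set, undo_set

log = [("start", "T67"), ("start", "T69"),
       ("checkpoint", ["T67", "T69"]),
       ("start", "T70"), ("commit", "T70"),
       ("commit", "T67"), ("start", "T71")]
redo_set, undo_set = recover(log)
print(redo_set)   # {'T67', 'T70'} -> redo
print(undo_set)   # {'T69', 'T71'} -> undo
```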
Unit 5.
1. Peer-to-peer middleware.
2. Chord.
3. CAN (Content-Addressable Network)
As described above, the entire CAN space is divided among the nodes
currently in the system. To allow the CAN to grow incrementally, a new
node that joins the system must be allocated its own portion of the
coordinate space. This is done by an existing node splitting its allocated
zone in half, retaining half, and handing the other half to the new node.
The process takes three steps: 1. First the new node must find a node
already in the CAN. 2. Next, using the CAN routing mechanisms, it
must find a node whose zone will be split. 3. Finally, the neighbors of the
split zone must be notified so that routing can include the new node.
Bootstrap
A new CAN node first discovers the IP address of any node
currently in the system. The functioning of a CAN does not depend on
the details of how this is done, but we use the same bootstrap
mechanism as Yallcast and YOID. We assume that a CAN has an
associated DNS domain name and that this resolves to the IP address of
one or more CAN bootstrap nodes. A bootstrap node maintains a partial
list of CAN nodes it believes are currently in the system. To join a CAN,
a new node looks up the CAN domain name in DNS to retrieve a
bootstrap node’s IP address. The bootstrap node then supplies the IP
addresses of several randomly chosen nodes currently in the system.
Finding a Zone
The new node then randomly chooses a point P in the
space and sends a JOIN request destined for point P. This message is
sent into the CAN via any existing CAN node. Each CAN node then uses
the CAN routing mechanism to forward the message, until it reaches the
node in whose zone P lies. This current occupant node then splits its
zone in half and assigns one half to the new node. The split is done by
assuming a certain ordering of the dimensions in deciding along which
dimension a zone is to be split so that zones can be re-merged when
nodes leave. For a 2-d space a zone would first be split along the X
dimension, then the Y, and so on. The (key, value) pairs from the half
zone to be handed over are also transferred to the new node.
Joining the Routing
Having obtained its zone, the new node learns the IP addresses
of its coordinate neighbor set from the previous occupant. This set is a
subset of the previous occupant’s neighbors, plus that occupant itself.
Similarly, the previous occupant updates its neighbor set to eliminate
those nodes that are no longer neighbors. Finally, both the new and old
nodes’ neighbors must be informed of this reallocation of space. Every
node in the system sends an immediate update message, followed by
periodic refreshes, with its currently assigned zone to all its neighbors.
These soft-state style updates ensure that all of their neighbors will
quickly learn about the change and will update their own neighbor sets
accordingly. Figures 2 and 3 show an example of a new node (node 7)
joining a 2-dimensional CAN. As can be inferred, the addition of a new
node affects only a small number of existing nodes in a very small
locality of the coordinate space. The number of neighbors a node
maintains depends only on the dimensionality of the coordinate space
and is independent of the total number of nodes in the system. Thus, node insertion affects only O(number of dimensions) existing nodes, which is important for CANs with huge numbers of nodes.
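As a rough illustration of the zone split, here is a sketch for a 2-d CAN that halves a zone along the dimension chosen by the fixed X-then-Y ordering and hands over the (key, value) pairs falling in the new half (the zone and data representations are assumptions made for the sketch).

```python
def split_zone(zone, data, depth):
    """Split a 2-d CAN zone in half, along X first, then Y, alternating with depth.
    `zone` is ((x_lo, x_hi), (y_lo, y_hi)); `data` maps (x, y) points to values."""
    dim = depth % 2                 # fixed ordering of dimensions: X, then Y
    lo, hi = zone[dim]
    mid = (lo + hi) / 2.0

    kept_zone = list(zone)
    new_zone = list(zone)
    kept_zone[dim] = (lo, mid)      # the existing node keeps the lower half
    new_zone[dim] = (mid, hi)       # the new node receives the upper half

    def in_zone(point, z):
        return all(z[d][0] <= point[d] < z[d][1] for d in (0, 1))

    kept_data = {p: v for p, v in data.items() if in_zone(p, tuple(kept_zone))}
    new_data = {p: v for p, v in data.items() if in_zone(p, tuple(new_zone))}
    return tuple(kept_zone), kept_data, tuple(new_zone), new_data

# One node owns the whole unit square; a joining node takes the upper X half.
zone = ((0.0, 1.0), (0.0, 1.0))
data = {(0.2, 0.5): "a", (0.8, 0.1): "b"}
old_zone, old_data, new_zone, new_data = split_zone(zone, data, depth=0)
print(old_zone, old_data)   # ((0.0, 0.5), (0.0, 1.0)) {(0.2, 0.5): 'a'}
print(new_zone, new_data)   # ((0.5, 1.0), (0.0, 1.0)) {(0.8, 0.1): 'b'}
```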
4. Tapestry
Consistency Model
● A consistency model is a contract between a distributed data store and processes: the processes agree to obey certain rules, and in return the store promises to work correctly.
● A consistency model basically refers to the degree of consistency that
should be maintained for the shared memory data.
● If a system supports the stronger consistency model, then the weaker
consistency model is automatically supported but the converse is not true.
● The types of consistency models are Data-Centric and client-centric
consistency models.
1. Data-Centric Consistency Models
A data store may be physically distributed across multiple machines. Each
process that can access data from the store is assumed to have a local or
nearby copy available of the entire store.
i.Strict Consistency model
● Any read on a data item X returns a value corresponding to the
result of the most recent write on X
● This is the strongest form of memory coherence which has the most
stringent consistency requirement.
● Strict consistency is the ideal model, but it is impossible to implement in a distributed system because it requires absolute global time or a global agreement on the commitment of changes.
ii.Sequential Consistency
● Sequential consistency is an important data-centric consistency
model which is a slightly weaker consistency model than strict consistency.
● A data store is said to be sequentially consistent if the result of any
execution is the same as if the (read and write) operations by all processes
on the data store were executed in some sequential order and the
operations of each individual process should appear in this sequence in a
specified order.
● Example: Assume three operations read(R1), write(W1), and read(R2) are performed in order on a memory address. Then (R1, W1, R2), (R1, R2, W1), (W1, R1, R2), and (R2, W1, R1) are all acceptable, provided all processes see the same ordering.
iii. Linearizability
● It is a model that is weaker than strict consistency, but stronger than sequential consistency.
● A data store is said to be linearizable when each operation is
timestamped and the result of any execution is the same as if the (read and
write) operations by all processes on the data store were executed in some
sequential order
● The operations of each individual process appear in sequence order
specified by its program.
● If tsOP1(x)< tsOP2(y), then operation OP1(x) should precede OP2(y)
in this sequence.
iv. Causal Consistency
● It is a weaker model than sequential consistency.
● In causal consistency, all processes see only those memory reference operations that are potentially causally related in the correct order.
● Memory reference operations that are not related may be seen by
different processes in a different order.
● A memory reference operation is said to be causally related to
another memory reference operation if the first operation is influenced by
the second operation.
● If a write operation (w2) is causally related to another write (w1), the acceptable order is (w1, w2).
v.FIFO Consistency
● It is weaker than causal consistency.
● This model ensures that all write operations performed by a single
process are seen by all other processes in the order in which they were
performed like a single process in a pipeline.
● This model is simple and easy to implement, and it has good performance because writes from a process can simply be pipelined.
● Implementation is done by sequencing write operations performed at
each node independently of the operations performed on other nodes.
● Example: If (w11) and (w12) are write operations performed by p1 in that order, and (w21), (w22) by p2, then a process p3 can see them as [(w11, w12), (w21, w22)] while p4 can view them as [(w21, w22), (w11, w12)].
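A small sketch of one way to implement this guarantee (per-sender sequence numbers; the FifoReplica class is an illustrative assumption): writes from the same process are applied strictly in issue order, while writes from different processes may interleave.

```python
from collections import defaultdict
import heapq

class FifoReplica:
    """Applies each sender's writes strictly in the order they were issued,
    while writes from different senders may interleave arbitrarily."""

    def __init__(self):
        self.store = {}
        self.next_seq = defaultdict(int)    # sender -> next expected sequence number
        self.pending = defaultdict(list)    # sender -> heap of out-of-order writes

    def deliver(self, sender, seq, key, value):
        heapq.heappush(self.pending[sender], (seq, key, value))
        # Apply any writes from this sender that are now in order.
        while self.pending[sender] and self.pending[sender][0][0] == self.next_seq[sender]:
            _, k, v = heapq.heappop(self.pending[sender])
            self.store[k] = v
            self.next_seq[sender] += 1

replica = FifoReplica()
replica.deliver("p1", 1, "x", "w12")   # arrives out of order ...
replica.deliver("p1", 0, "x", "w11")   # ... but w11 is still applied before w12
replica.deliver("p2", 0, "y", "w21")   # p2's writes may interleave anywhere
print(replica.store)                   # {'x': 'w12', 'y': 'w21'}
```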
vi. Weak consistency
● The basic idea behind the weak consistency model is enforcing
consistency on a group of memory reference operations rather than
individual operations.
● A Distributed Shared Memory system that supports the weak
consistency model uses a special variable called a synchronization variable
which is used to synchronize memory.
● When a process accesses a synchronization variable, the entire
memory is synchronized by making visible the changes made to the
memory to all other processes.
vii.Release Consistency
● The release consistency model tells the system whether a process is entering or exiting a critical section, so that the system can perform the appropriate operation when a synchronization variable is accessed by a process.
● Two synchronization variables acquire and release are used instead
of a single synchronization variable. Acquire is used when the process
enters a critical section and release is when it exits a critical section.
● Release consistency can be viewed as a synchronization mechanism
based on barriers instead of critical sections.
viii. Entry Consistency
● In entry consistency, every shared data item is associated with a
synchronization variable.
● In order to access consistent data, each synchronization variable
must be explicitly acquired.
● Release consistency affects all shared data but entry consistency
affects only those shared data associated with a synchronization variable.
2. Client-Centric Consistency Models
● Client-centric consistency models do not aim at providing a system-wide view of a data store.
● This model concentrates on consistency from the perspective of a
single mobile client.
● Client-centric consistency models are generally used for applications
that lack simultaneous updates where most operations involve reading
data.
i.Eventual Consistency
● In Systems that tolerate a high degree of inconsistency, if no updates
take place for a long time all replicas will gradually and eventually become
consistent. This form of consistency is called eventual consistency.
● Eventual consistency only requires that updates are guaranteed to propagate to all replicas.
● Eventual consistent data stores work fine as long as clients always
access the same replica.
● Write conflicts are often relatively easy to solve when assuming that
only a small group of processes can perform updates. Eventual consistency
is therefore often cheap to implement.
ii. Monotonic Reads Consistency
● A data store is said to provide monotonic-read consistency if a
process reads the value of a data item x, any successive read operation on x
by that process will always return that same value or a more recent value.
● If a process has seen a value of x at time t, it will never see an older version of x at a later time.
● Example: A user can read incoming mail while moving. Each time
the user connects to a different e-mail server, that server fetches all the
updates from the server that the user previously visited. Monotonic Reads
guarantees that the user sees all updates, no matter from which server the
automatic reading takes place.
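A minimal sketch of how a client session can enforce this guarantee by remembering the newest version it has seen (the Replica and Session classes and the version numbers are illustrative assumptions):

```python
class Replica:
    """A replica holding a value of x together with a monotonically growing version."""
    def __init__(self, value, version):
        self.value, self.version = value, version

class Session:
    """Client session enforcing monotonic reads: never return an older version
    of x than one this client has already seen."""
    def __init__(self):
        self.last_seen = -1

    def read(self, replica):
        if replica.version < self.last_seen:
            raise RuntimeError("replica is stale; sync it (or pick another) first")
        self.last_seen = replica.version
        return replica.value

# Mail example: server A has newer mail state than server B.
server_a = Replica(value=["mail-1", "mail-2"], version=2)
server_b = Replica(value=["mail-1"], version=1)

session = Session()
print(session.read(server_a))       # sees version 2
try:
    session.read(server_b)          # would be an older view -> refused
except RuntimeError as e:
    print("monotonic-read violation avoided:", e)
```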
iii. Monotonic Writes
● A data store is said to be monotonic-write consistent if a write operation by a process on a data item x is completed before any successive write operation on x by the same process.
● A write operation on a copy of data item x is performed only if that
copy has been brought up to date by means of any preceding write
operations, which may have taken place on other copies of x.
● Example: Monotonic-write consistency guarantees that if an update
is performed on a copy of Server S, all preceding updates will be
performed first. The resulting server will then indeed become the most
recent version and will include all updates that have led to previous
versions of the server.
iv. Read Your Writes
● A data store is said to provide read-your-writes consistency if the effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
● A write operation is always completed before a successive read
operation by the same process no matter where that read operation takes
place.
● Example: Updating a Web page and guaranteeing that the Web
browser shows the newest version instead of its cached copy.
v.Writes Follow Reads
● A data store is said to provide writes-follow-reads consistency if a
process has a write operation on a data item x following a previous read
operation on x then it is guaranteed to take place on the same or a more
recent value of x that was read.
● Any successive write operation by a process on a data item x will be
performed on a copy of x that is up to date with the value most recently
read by that process.
● Example: Suppose a user first reads an article A and then posts a
response B. By requiring writes-follow-reads consistency, B will be written
to any copy only after A has been written.