CS8603-Distributed Systems QA
UNIT-I
PART-A
1 Define distributed system.
17 List any two resources of hardware and software, which can be shared in
distributed systems with example. (NOV 2017)
Hardware – Printer, Disks, Fax machine, Router, Modem.
Software – Application Programs, Shared Files, Shared Databases, Documents, Services.
For a synchronization bound D > 0 and for a source S of UTC time, |S(t) - Ci(t)| < D, for i = 1, 2, ..., N and for all real times t in I. This is external synchronization.
Internal synchronization:
If the clocks Ci are synchronized with one another to a known degree of accuracy, then
we can measure the interval between two events occurring at different computers by
appealing to their local clocks, even though they are not
necessarily synchronized to an external source of time. This is internal synchronization. For
a synchronization bound D>0,|Ci(t)-Cj(t)|<D,
for i,j=1,2,…N. and for all real times t in I.
23 Explain Faulty and Crash Failure.
A clock that does not keep to whatever correctness conditions apply is defined to be
faulty.
A clock's crash failure is said to occur when the clock stops ticking altogether; any other
clock failure is an arbitrary failure. A historical example of an arbitrary failure is that of a
clock with the 'Y2K bug', which broke the monotonicity condition by registering the date
after 31 December 1999 as 1 January 1900 instead of 2000; another example is a clock
whose batteries are very low and whose drift rate suddenly becomes very large.
24 How is clock synchronization done in Cristian's method?
In Cristian's method, a process requests the current time from a time server and adjusts
for the round-trip delay. A single time server might fail, so the use of a group of
synchronized servers is suggested. The method does not deal with faulty servers.
25 Explain Logical time and logical clocks. MAY/JUNE 2016
Logical time
Lamport proposed a model of logical time that can be used to provide an ordering among
the events at processes running in different computers in a distributed system.Logical time
allows the order in which the messages are presented to be inferred without recourse to
clocks.
Vector clocks • Mattern and Fidge developed vector clocks to overcome the shortcoming
of Lamport's clocks: the fact that from L(e) < L(e') we cannot conclude that e -> e'.
A vector clock for a system of N processes is an array of N integers. Each process keeps
its own vector clock, Vi, which it uses to timestamp local events. Like Lamport
timestamps, processes piggyback vector timestamps on the messages they send to one
another, and there are simple rules for updating the clocks:
Taking the componentwise maximum of two vector timestamps in this way is known as a
merge operation.
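The update rules and the merge operation can be sketched in Python. This is a minimal illustration, not the textbook's exact pseudocode; the function names are my own:

```python
# Sketch of vector clocks for N processes. A process increments its own
# entry on each event, piggybacks its vector on sends, and on receive
# merges (componentwise maximum) before incrementing its own entry.

def new_clock(n):
    return [0] * n

def local_event(v, i):
    v[i] += 1
    return list(v)

def send(v, i):
    v[i] += 1
    return list(v)            # this copy is piggybacked on the message

def receive(v, i, ts):
    # merge operation: componentwise maximum of the two timestamps
    v[:] = [max(a, b) for a, b in zip(v, ts)]
    v[i] += 1
    return list(v)

def happened_before(a, b):
    # V(e) < V(e'): every entry <= and the vectors differ
    return all(x <= y for x, y in zip(a, b)) and a != b
```

For example, if process 0 sends a message that process 1 receives, the receiver's vector dominates the sender's timestamp, so `happened_before` correctly reports the causal order.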
27 Explain global states and consistent cuts with example.
1 Define distributed systems. What are the significant issues and challenges of the
distributed systems? NOV/DEC 2017, APRIL/MAY 2018
Designing a distributed system is not easy or straightforward. A number of
challenges need to be overcome in order to get the ideal system. The major challenges in
distributed systems are listed below:
1. Heterogeneity:
The Internet enables users to access services and run applications over a heterogeneous
collection of computers and networks. Heterogeneity (that is, variety and difference) applies to
all of the following:
Different programming languages use different representations for characters and data
structures such as arrays and records. These differences must be addressed if programs written
in different languages are to be able to communicate with one another. Programs written by
different developers cannot communicate with one another unless they use common standards,
for example, for network communication and the
representation of primitive data items and data structures in messages. For this to happen,
standards need to be agreed and adopted – as have the Internet protocols.
Middleware: The term middleware applies to a software layer that provides a programming
abstraction as well as masking the heterogeneity of the underlying networks, hardware,
operating systems and programming languages. Most middleware is implemented over the
Internet protocols, which themselves mask the differences of the underlying networks, but all
middleware deals with the differences in operating systems and hardware.
Heterogeneity and mobile code : The term mobile code is used to refer to program code that
can be transferred from one computer to another and run at the destination – Java applets are an
example. Code suitable for running on one computer is not necessarily suitable for running on
another because executable programs are normally specific both to the instruction set and to
the host operating system.
2. Transparency:
Transparency is defined as the concealment from the user and the application programmer of
the separation of components in a distributed system, so that the system is perceived as a whole
rather than as a collection of independent components. In other words, distributed systems
designers must hide the complexity of the systems as much as they can. Some terms of
transparency in distributed systems are:
Access Hide differences in data representation and how a resource is accessed
Location Hide where a resource is located
Migration Hide that a resource may move to another location
Relocation Hide that a resource may be moved to another location while in
use Replication Hide that a resource may be copied in several places
Concurrency Hide that a resource may be shared by several competitive users
Failure Hide the failure and recovery of a resource
Persistence Hide whether a (software) resource is in memory or a disk
3. Openness
The openness of a computer system is the characteristic that determines whether the system
can be extended and reimplemented in various ways. The openness of distributed systems is
determined primarily by the degree to which new resource-sharing services can be added and
be made available for use by a variety of client programs. If the well-defined interfaces for a
system are published, it is easier for developers to add new features or replace sub-systems in
the future. Example: Twitter and Facebook have API that allows developers to develop theirs
own software interactively.
4. Concurrency
Both services and applications provide resources that can be shared by clients in a distributed
system. There is therefore a possibility that several clients will attempt to access a shared
resource at the same time. For example, a data structure that records bids for an auction may be
accessed very frequently when it gets close to the deadline time. For an object to be safe in a
concurrent environment, its operations must be synchronized in such a way that its data
remains consistent. This can be achieved by standard techniques such as semaphores, which
are used in most operating systems.
5. Security
Many of the information resources that are made available and maintained in distributed
systems have a high intrinsic value to their users. Their security is therefore of considerable
importance. Security for information resources has three components:
confidentiality
integrity (protection against alteration or corruption),
availability for the authorized (protection against interference with the means to access the
resources).
6. Scalability
Distributed systems must be scalable as the number of users increases. Scalability is defined
by B. Clifford Neuman as follows: a system is said to be scalable if it can handle the addition
of users and resources without suffering a noticeable loss of performance or increase in
administrative complexity.
Size
o Number of users and resources to be processed. Problem associated
is overloading
Geography
o Distance between users and resources. Problem associated is
communication reliability
Administration
o As the size of distributed systems increases, many of the systems need to
be controlled. Problem associated is administrative mess
7. Failure Handling
Computer systems sometimes fail. When faults occur in hardware or software, programs may
produce incorrect results or may stop before they have completed the intended computation.
The handling of failures is particularly difficult.
Web search has emerged as a major growth industry in the last decade, with recent figures
indicating that the global number of searches has risen to over 10 billion per calendar month.
The task of a web search engine is to index the entire contents of the World Wide Web,
encompassing a wide range of information styles including web pages, multimedia sources and
(scanned) books. This is a very complex task, as current estimates state that the Web consists
of over 63 billion pages and one trillion unique web addresses.
Creative industries and entertainment - The emergence of online gaming as a novel and
highly interactive form of entertainment; the availability of music and film in the home through
networked media centres and more widely in the Internet via downloadable or streaming
content; the role of user-generated content (as mentioned above) as a new form of creativity,
for example via services such as YouTube; the creation of new forms of art and entertainment
enabled by emergent (including networked) technologies.
Healthcare - The growth of health informatics as a discipline with its emphasis on online
electronic patient records and related issues of privacy; the increasing role of telemedicine in
supporting remote diagnosis or more advanced services such as remote surgery (including
collaborative working between healthcare teams); the increasing application of networking and
embedded systems technology in assisted living, for example for monitoring the elderly in their
own homes.
Education - The emergence of e-learning through for example web-based tools such as virtual
learning environments; associated support for distance learning; support for collaborative or
community-based learning.
Transport and logistics - The use of location technologies such as GPS in route finding
systems and more general traffic management systems; the modern car itself as an example of
a complex distributed system (also applies to other forms of transport such as aircraft); the
development of web-based map services such as MapQuest, Google Maps and Google Earth.
In this method each node periodically sends a message to the server. When the time
server receives the message it responds with a message T, where T is the current time
of server node.
Assume the clock time of client be To when it sends the message and T1 when it
receives the message from server. To and T1 are measured using same clock so best
estimate of time for propagation is (T1-To)/2.
When the reply is received at clients node, its clock is readjusted to T+(T1-T0)/2. There
can be unpredictable variation in the message propagation time between the nodes
hence (T1-T0)/2 is not good to be added to T for calculating current time.
For this, several measurements of T1 - T0 are made; measurements that exceed some
threshold value are considered unreliable and discarded. Of the remaining measurements,
the minimum value is considered the most accurate, and half of it is added to T.
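The filtering described above can be sketched as a small Python function. This is an illustrative sketch with names of my own choosing, assuming RTT samples (T1 - T0) and a reliability threshold are given:

```python
def adjust_time(server_time, rtt_samples, threshold):
    """Estimate the current time from the server's reply T and several
    round-trip (T1 - T0) measurements: samples above the threshold are
    discarded as unreliable, and half of the best (minimum) remaining
    RTT is added to T."""
    reliable = [r for r in rtt_samples if r <= threshold]
    if not reliable:
        raise ValueError("no reliable RTT samples")
    return server_time + min(reliable) / 2.0
```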
Advantage: It assumes that no additional information is available.
Disadvantage: It restricts the number of measurements for estimating the value.
ii. The Berkeley Algorithm
This is an active time server approach where the time server periodically broadcasts its
clock time and the other nodes receive the message to correct their own clocks.
In this algorithm the time server periodically sends a message to all the computers in
the group of computers. When this message is received each computer sends back its
own clock value to the time server. The time server has a prior knowledge of the
approximate time required for propagation of a message which is used to readjust the
clock values. It then takes a fault tolerant average of clock values of all the computers.
The calculated average is the current time to which all clocks should be readjusted.
The time server readjusts its own clock to this value and instead of sending the current
time to other computers it sends the amount of time each computer needs for
readjustment. This can be positive or negative value and is calculated based on the
knowledge the time server has about the propagation of message.
2.Distributed algorithms
Distributed algorithms overcome the problems of the centralized approach by
synchronizing internally for better accuracy.
i. Global Averaging Distributed Algorithms
In this approach the clock process at each node broadcasts its local clock time in the
form of a “resync” message at the beginning of every fixed-length resynchronization
interval. This is done when its local time equals To+iR for some integer i, where To is a
fixed time agreed by all nodes and R is a system parameter that depends on total nodes
in a system.
After broadcasting the clock value, the clock process of a node waits for time T which
is determined by the algorithm.
During this waiting the clock process collects the resync messages and the clock
process records the time when the message is received which estimates the skew after
the waiting is done. It then computes a fault-tolerant average of the estimated skew and
uses it to correct the clocks.
ii. Localized Averaging Distributed Algorithms
The global averaging algorithms do not scale as they need a network to support
broadcast facility and a lot of message traffic is generated.
Localized averaging algorithms overcome these drawbacks as the nodes in distributed
systems are logically arranged in a pattern or ring.
Each node exchanges its clock time with its neighbors and then sets its clock time to
the average of its own clock time and of its neighbors.
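One round of neighbour averaging on a logical ring can be sketched as follows. This is a minimal illustration with an invented function name, assuming each node averages its own clock with its two ring neighbours:

```python
def ring_average_step(clocks):
    """One round of localized averaging on a logical ring: each node
    sets its clock to the mean of its own time and its two neighbours'
    times. Repeated rounds drive the clocks toward a common value."""
    n = len(clocks)
    return [(clocks[(i - 1) % n] + clocks[i] + clocks[(i + 1) % n]) / 3.0
            for i in range(n)]
```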
9 Explain Cristian's method for synchronizing clocks
Cristian’s Algorithm
Cristian's Algorithm is a clock synchronization algorithm used by client processes to
synchronize their time with a time server. It works well in low-latency networks where the
Round Trip Time is short compared to the required accuracy, while redundancy-prone
distributed systems/applications do not go hand in hand with this algorithm. Here Round Trip
Time refers to the time duration between the start of a Request and the end of the
corresponding Response.
Below is an illustration of the working of Cristian's algorithm.
Algorithm:
1) The process on the client machine sends a request for the clock time (time at the server)
to the Clock Server at time T0.
2) The Clock Server listens to the request made by the client process and returns the response
in the form of the clock server time T.
3) The client process receives the response from the Clock Server at time T1 and calculates
the synchronized client clock time as T + (T1 - T0)/2.
import socket
import datetime

s = socket.socket()
print("Socket successfully created")

# Server port
port = 8000
s.bind(('', port))
s.listen(5)

# Driver loop: reply to each connecting client with the current server time
if __name__ == '__main__':
    while True:
        conn, addr = s.accept()
        conn.send(str(datetime.datetime.now()).encode())
        conn.close()
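The client-side adjustment in step 3 can be written as a small pure function. A minimal sketch, with T, T0 and T1 as in the algorithm above and a function name of my own:

```python
def cristian_adjust(t_server, t0, t1):
    """Client-side step of Cristian's algorithm: given the server's
    reply time T and the local send/receive times T0 and T1, the best
    estimate of the current time is T + (T1 - T0) / 2."""
    return t_server + (t1 - t0) / 2.0
```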
Lamport [1978] generalized these two relationships into the happened-before relation: e →i e'
Overview
There are two formal models of distributed systems: synchronous and asynchronous.
the time to execute each step of a process has known lower and upper bounds;
each message transmitted over a channel is received within a known bounded time;
each process has a local clock whose drift rate from real time has a known bound.
Asynchronous distributed systems, in contrast, guarantee no bounds on process execution
speeds, message transmission delays, or clock drift rates. Most distributed systems we discuss,
including the Internet, are asynchronous systems.
Suppose we want to build a distributed system to track the battery usage of a bunch of
laptop computers and we'd like to record the percentage of the battery each has
remaining at exactly 2pm.
Suppose we want to build a distributed, real time auction and we want to know which
of two bidders submitted their bid first.
Suppose we want to debug a distributed system and we want to know whether variable
x1 in process p1 ever differs by more than 50 from variable x2 in process p2.
In the first example, we would really like to synchronize the clocks of all participating
computers and take a measurement of absolute time. In the second and third examples,
knowing the absolute time is not as crucial as knowing the order in which events occurred.
Clock Synchronization
Every computer has a physical clock that counts oscillations of a crystal. This hardware clock
is used by the computer's software clock to track the current time. However, the hardware
clock is subject to drift -- the clock's frequency varies and the time becomes inaccurate. As a
result, any two clocks are likely to be slightly different at any given time. The difference
between two clocks is called their skew.
There are several methods for synchronizing physical clocks. External synchronization means
that all computers in the system are synchronized with an external source of time (e.g., a UTC
signal). Internal synchronization means that all computers in the system are synchronized with
one another, but the time is not necessarily accurate with respect to UTC.
Cristian's method for synchronization in asynchronous systems is similar, but does not rely on
a predetermined max and min transmission time. Instead, a process p1 requests the current time
from another process p2 and measures the RTT (Tround) of the request/reply. When p1 receives
the time t from p2 it sets its time to t + Tround/2.
The Berkeley algorithm, developed for collections of computers running Berkeley UNIX, is an
internal synchronization mechanism that works by electing a master to coordinate the
synchronization. The master polls the other computers (called slaves) for their times, computes
an average, and tells each computer by how much it should adjust its clock.
The Network Time Protocol (NTP) is yet another method for synchronizing clocks that uses a
hierarchical architecture
Logical Time
Physical time cannot be perfectly synchronized. Logical time provides a mechanism to define
the causal order in which events occur at different processes. The ordering is based on the
following:
Two events occurring at the same process happen in the order in which they are
observed by the process.
If a message is sent from one process to another, the sending of the message happened
before the receiving of the message.
If e occurred before e' and e' occurred before e" then e occurred before e".
"Lamport called the partial ordering obtained by generalizing these two relationships the
happened-before relation." ( → )
Two events a and e that are not ordered by → are said to be concurrent, written (a || e).
A Lamport logical clock is a monotonically increasing software counter, whose value need
bear no particular relationship to any physical clock. Each process pi keeps its own logical
clock, Li, which it uses to apply so-called Lamport timestamps to events.
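The Lamport clock rules can be sketched in Python. A minimal illustration (class and method names are my own): the counter is incremented before each local event or send, and on receipt the receiver takes the maximum of its own time and the message's timestamp, plus one.

```python
class LamportClock:
    """Monotonically increasing software counter; its value need bear
    no relationship to any physical clock."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or send: increment and use as the timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # On receive: advance past both the local clock and the
        # timestamp piggybacked on the message.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Note that this guarantees e → e' implies L(e) < L(e'), but not the converse, which is the shortcoming vector clocks address.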
If e → e ' then V(e) < V(e') and if V(e) < V(e') then e → e ' .
Global States
In general, this problem is referred to as Global Predicate Evaluation. "A global state predicate
is a function that maps from the set of global state of processes in the system ρ to {True,
False}."
Cuts
Definitions:
ρ is a system of N processes pi (i = 1, 2, ..., N)
history(pi) = hi = <ei^0, ei^1, ...>
hi^k = <ei^0, ei^1, ..., ei^k> is a finite prefix of the process's history
si^k is the state of the process pi immediately before the kth event occurs
All processes record the sending and receiving of messages. If a process pi records the
sending of message m to process pj and pj has not recorded receipt of the message, then
m is part of the channel between pi and pj.
A global history of ρ is the union of the individual process
histories: H = h0 ∪ h1 ∪ h2 ∪ ... ∪ hN-1
A global state can be formed by taking the set of states of the individual processes: S =
(s1, s2, ..., sN)
A cut of the system's execution is a subset of its global history that is a union of
prefixes of process histories (see figure below).
The frontier of the cut is the last state in each process.
A cut is consistent if, for all events e and e':
o ( e ∈ C and e ' → e ) ⇒ e ' ∈ C
A consistent global state is one that corresponds to a consistent cut.
Distributed Debugging
To further examine how you might produce consistent cuts, we'll use the distributed debugging
example. Recall that we have several processes, each with a variable xi. "The safety condition
required in this example is |xi - xj| <= δ (i, j = 1, 2, ..., N)."
The algorithm we'll discuss is a centralized algorithm that determines post hoc whether the
safety condition was ever violated. The processes in the system, p1, p2, ..., pN, send their states
to a passive monitoring process, p0. p0 is not part of the system. Based on the states collected,
p0 can evaluate the safety condition.
Collecting the state: The processes send their initial state to a monitoring process and send
updates whenever relevant state changes, in this case the variable xi. In addition, the processes
need only send the value of xi and a vector timestamp. The monitoring process maintains an
ordered queue (by the vector timestamps) for each process where it stores the state messages. It
can then create consistent global states which it uses to evaluate the safety condition.
Let S = (s1, s2, ..., sN) be a global state drawn from the state messages that the monitor process
has received. Let V(si) be the vector timestamp of the state si received from pi. Then it can be
shown that S is a consistent global state if and only if: V(si)[i] >= V(sj)[i] for i, j = 1, 2, ..., N.
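The standard consistency check on the candidate state's vector timestamps, V(si)[i] >= V(sj)[i] for all i and j, can be sketched in Python (an illustrative function of my own, taking the list of vector timestamps of the candidate states):

```python
def is_consistent(timestamps):
    """Check whether a candidate global state S = (s1, ..., sN) is
    consistent: for all i, j, V(s_i)[i] >= V(s_j)[i]. Intuitively, no
    state may reflect more of process pi's history than pi's own state
    does, so no received message precedes its send."""
    n = len(timestamps)
    return all(timestamps[i][i] >= timestamps[j][i]
               for i in range(n) for j in range(n))
```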
In distributed systems, we have neither shared memory nor a common physical clock, and
therefore we cannot solve the mutual exclusion problem using shared variables. To solve the
mutual exclusion problem in a distributed system, an approach based on message passing is used.
A site in a distributed system does not have complete information about the state of the system
due to the lack of shared memory and a common physical clock.
No Deadlock:
Two or more sites should not endlessly wait for messages that will never arrive.
No Starvation:
Every site that wants to execute the critical section should get an opportunity to execute it
in finite time. No site should wait indefinitely to execute the critical section while
other sites repeatedly execute the critical section.
Fairness:
Each site should get a fair chance to execute the critical section. Requests to execute the
critical section must be executed in the order they are made, i.e., critical section
execution requests should be executed in the order of their arrival in the system.
Fault Tolerance:
In case of failure, it should be able to recognize it by itself in order to continue
functioning without any disruption.
In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:
1. Shared lock:
It is also known as a Read-only lock. In a shared lock, the data item can only be read by
the transaction.
It can be shared between transactions because when a transaction holds a shared lock,
it can't update the data on the data item.
2. Exclusive lock:
In the exclusive lock, the data item can be both read and written by the
transaction.
This lock is exclusive, so that multiple transactions cannot modify the same
data simultaneously.
It is the simplest way of locking data during a transaction. Simplistic lock-based protocols
allow all transactions to get a lock on the data before an insert, delete or update on it.
The data item is unlocked after the transaction completes.
Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which
they need locks.
Before initiating execution of the transaction, it requests the DBMS for all the locks on
all those data items.
If all the locks are granted then this protocol allows the transaction to begin. When the
transaction is completed it releases all the locks.
If all the locks are not granted then this protocol allows the transaction to roll back
and wait until all the locks are granted.
The two-phase locking protocol divides the execution phase of the transaction into
three parts.
In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.
In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction may be
released, but no new locks can be acquired.
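The growing/shrinking discipline can be sketched with a small Python class. This is an illustrative sketch of the 2PL rule only (no blocking, no conflict detection between transactions), with names of my own:

```python
class TwoPhaseLocking:
    """Enforces the two-phase rule for a single transaction: locks may
    be acquired only while no lock has been released (growing phase);
    after the first release, only releases are allowed (shrinking
    phase)."""
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        # The first unlock moves the transaction into its shrinking phase.
        self.shrinking = True
        self.held.discard(item)
```

For example, a transaction that locks A and B, then unlocks A, can no longer acquire a lock on C.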
In the below example, if lock conversion is allowed then the following phase can happen:
Example:
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
Transaction T2:
Synchronous communication: The calling party requests a service, and waits for
the service to complete. Only when it receives the result of the service it continues
with its work. A timeout may be defined, so that if the service does not finish
within the defined period the call is assumed to have failed and the caller continues.
Asynchronous communication: The calling party initiates a service call, but does
not wait for the result. The caller immediately continues with its work without
caring for the result. If the caller is interested in the result there are mechanisms
which we'll discuss in the next paragraphs.
Be aware that the distinction between synchronous and asynchronous is highly dependent
on the viewpoint. Often asynchronous is used in the sense of “the user interface must stay
responsive all the time”. This interpretation often leads to the wrong conclusion: “…and
therefore every communication must be asynchronous”. A non-blocking GUI usually has
nothing to do with the low-level communication contracts and can be achieved by different
means, e.g. parallel processing of the interactive and communication tasks. The truth is
that synchronous communication on a certain level of abstraction can be implemented with
asynchronous interfaces on another level, and vice versa, if needed.
File based communication is often considered to be asynchronous. One party writes a file
but does not care if the other party is active, fetches the file or is able to process it.
However it is possible to implement another layer of functionality so that the second
(reading) party gives feedback, e.g. by writing a short result file, so that the first (writing)
party can wait and poll for the result of the file processing. This layer introduces a
synchronous communication over file exchange.
Synchronous services are easy to implement, since they keep the complexity of the
communication low by providing immediate feedback. They avoid the need to keep the
context of a call on the client and server side, including e.g. the caller’s address or a
message id, beyond the lifetime of the request.
In some cases the delivery of the asynchronous request can be assured by some other
mechanism, e.g. a message written to the file system or creation of a T4x job.
The simplest asynchronous message exchange pattern is called fire-and-forget and means
that a message is sent but no feedback is required (at least on that level of abstraction!).
The only possible feedback can come from the communication layer in case of an error in
processing or sending the request, but never from the processing of the server.
Polling causes potentially high network loads and is therefore not recommended.
Nevertheless it has the advantage that the service provider (server) does not need to know
about its clients and that no client needs to provide a service by itself.
On the contrary for the callback pattern, the receiver of the request (server) must by some
means know how to send the feedback message and must know how to address the correct
client (this information can be passed in the request or be stored statically). To collect the
feedback some active instance on the caller’s side must listen to receive the feedback
message (which in turn can be a fire-and-forget message). So the caller must become a
service provider (“server”) itself. It continues with its work after the request was fired
instead of waiting. So there can be some interaction between the client
and the callback instance to notify the client or the user of the arrival of the feedback. This
interaction happens entirely on the client and is usually not a communication issue (instead
you can imagine sending notification emails, notifying the GUI, push a workflow task to
the user’s inbox or similar actions).
As previously stated the implementation of the message transfer may use synchronous or
asynchronous transfers on a lower level. In the fire-and-forget example, the request might
be transferred via TCP, which implicitly acknowledges each message. Even if the
acknowledgment is being implemented, higher levels might not be interested in it. For the
callback and polling scenarios, each message might be acknowledged, but from a high
level perspective, there are only fire-and-forget messages.
Asynchronous behavior can be implemented for a T4x server by writing the message
(input parameters) to the file system (where the T4x scheduler will poll for it) or by
creating a job at the T4x job server. If the caller wants to be informed about the
execution’s result, the T4x server needs to store the caller’s response address and some
context information to be able to report the result back. This has to be done in the service
implementation. T4x as consumer can handle this by providing a callback service. Another
possibility is the caller periodically polling for the execution result, e.g. by looking for a
result file in the file system or by asking the T4x job server for the result of a job identified
by the job id. This can easily be done by providing an additional service asking for the job
result.
Messages sent between machines may arrive zero or more times at any point after they are
sent.
Because of this property, it is impossible for two computers communicating over a network
to agree on the exact time. I can send you a message saying "it is now 10:00:00" but you
don't know how long it took for that message to arrive. We can send
messages back and forth all day but we will never know for sure that we are synchronised.
If we can't agree on the time then we can't always agree on what order things happen in.
Suppose I say "my user logged on at 10:00:00" and you say "my user logged on at
10:00:01". Maybe mine was first or maybe my clock is just fast relative to yours. The only
way to know for sure is if something connects those two events. For example, if my user
logged on and then sent your user an email and if you received that email before your user
logged on then we know for sure that mine was first.
Let's define it a little more formally. We model the world as follows: We have a number of
machines on which we observe a series of events. These events are either specific to one
machine (eg user input) or are communications between machines. We define the causal
ordering of these events by three rules:
If A and B happen on the same machine and A happens before B then A -> B
If I send you some message M and you receive it then (send M) -> (recv M)
If A -> B and B -> C then A -> C
We are used to thinking of ordering by time which is a total order - every pair of events
can be placed in some order. In contrast, causal ordering is only a partial order - sometimes
events happen with no possible causal relationship i.e. not (A -> B or B -> A).
On a single machine causal ordering is exactly the same as time ordering (actually, on a
multi-core machine the situation is more complicated, but let's forget about that for now).
Between machines, causal ordering is conveyed by messages. Since sending messages is
the only way for machines to affect each other, this gives rise to a nice property: if two
events are not causally ordered, neither can possibly have caused the other. Since we don't
have a single global time, causal ordering is the only thing that allows us to reason about
causality in a distributed system.
The lack of a total global order is not just an accidental property of computer systems, it is
a fundamental property of the laws of physics. I claimed that understanding causal order
makes many other concepts much simpler. Let's skim over some examples.
Vector Clocks
Lamport clocks and Vector clocks are data-structures which efficiently approximate the
causal ordering and so can be used by programs to reason about causality.
If A -> B then LC_A < LC_B
If VC_A < VC_B then A -> B
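The two properties above can be exercised with a small sketch (not from the notes; the process count and event names are illustrative assumptions). Each process keeps a vector with one entry per process, increments its own entry on every event, and merges the sender's vector on receipt; comparing vectors then recovers the causal order.

```python
# Sketch: vector clocks for three processes. vc_less(a, b) tests the
# "VC_A < VC_B implies A -> B" property quoted above.

def vc_less(a, b):
    """True if vector clock a < b, i.e. event A happened-before event B."""
    return all(x <= y for x, y in zip(a, b)) and a != b

P = 3                               # three processes, one entry each
vc = [[0] * P for _ in range(P)]

def local_event(p):
    vc[p][p] += 1
    return list(vc[p])

def send(p):
    vc[p][p] += 1
    return list(vc[p])              # timestamp piggybacked on the message

def recv(p, ts):
    vc[p] = [max(x, y) for x, y in zip(vc[p], ts)]   # merge sender's clock
    vc[p][p] += 1
    return list(vc[p])

a = send(0)            # P0 sends a message
b = recv(1, a)         # P1 receives it: (send M) -> (recv M)
c = local_event(2)     # an unrelated event on P2

print(vc_less(a, b))                  # True: a causally precedes b
print(vc_less(a, c), vc_less(c, a))   # False False: a and c are concurrent
```

Note that concurrency shows up as the two comparisons both being false; that is exactly the partial-order gap the text describes.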
Consistency
When mutable state is distributed over multiple machines each machine can receive update
events at different times and in different orders. If the final state is dependent on the order
of updates then the system must choose a single serialisation of the events, imposing a
global total order. A distributed system is consistent exactly when the outside world can
never observe two different serialisations.
CAP Theorem
When a machine receives an update event it faces a choice: it can apply the update
immediately, or it can first coordinate with the other machines that may have received
conflicting events. The first choice risks violating consistency if some other machine makes
the same choice with a different set of events. The second violates availability by waiting
for every other machine that could possibly have received a conflicting event before
performing the requested action. There is no need for an actual network partition to happen
- the trade-off between availability and consistency exists whenever communication
between components is not instant.
Even your hardware cannot escape this law. It provides the illusion of synchronous access
to memory at the cost of availability. If you want to write fast parallel programs then you
need to understand the messaging model used by the underlying hardware.
Eventual Consistency
A system is eventually consistent if the final state of each machine is the same regardless
of how we choose to serialise update events. An eventually consistent system allows us to
sacrifice consistency for availability without having the state of different machines diverge
irreparably. It doesn't save us from having the outside world see different serialisations of
update events. It is also difficult to construct eventually consistent data structures and to
reason about their composition.
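One way to see why some data structures can be eventually consistent is that their update operation is commutative, associative and idempotent, so every serialisation of the events yields the same final state. A minimal sketch (the event values are made-up examples; this is a grow-only set, one of the simplest such structures):

```python
# Sketch: a grow-only set is eventually consistent because set union is
# commutative, associative and idempotent - the final state does not
# depend on the order in which update events are applied.

from functools import reduce

def apply_updates(updates):
    """Fold a sequence of update events into a replica's state."""
    return reduce(lambda state, u: state | {u}, updates, frozenset())

events = ["add x", "add y", "add z"]

a = apply_updates(events)             # one machine's delivery order
b = apply_updates(reversed(events))   # another machine's delivery order

print(a == b)  # True: both replicas converge to the same state
```

The difficulty the text mentions is that most interesting state (counters with removal, maps with overwrites) does not have this property for free, which is why constructing eventually consistent structures is hard.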
3 What is group communication? What are the key areas of applications of group
communication? Explain the programming model for group
communication. APRIL/MAY 2018
Group Communication
The term multicast means the use of a single communication primitive to send a message
to a specific set of processes rather than using a collection of individual point to point
message primitives. This is in contrast with the term broadcast which means the message is
addressed to every host or process.
There are two common scenarios:
A client wishes to obtain a service which can be performed by any member of the
group without affecting the state of the service.
A client wishes to obtain a service which must be performed by each member of the
group.
In the first case, the client can accept a response to its multicast from any member of the
group as long as at least one responds. The communication system need only guarantee
delivery of the multicast to a nonfaulty process of the group on a best-effort basis. In the
second case, the all-or-none atomic delivery requirement means that the multicast must be
buffered until it is committed and subsequently delivered to the application process, and so
incurs additional latency.
Failure may occur during a multicast at the recipient processes, the communication links
or the originating process.
Failures at the recipient processes and on the communication links can be detected by the
originating process using standard time-out mechanisms or message acknowledgements.
If the originator fails during the multicast, there are two possible outcomes. Either the
message has not arrived at any destination or it has arrived at some. In the first case,
no process can be aware of the originator's intention and so the multicast must be aborted.
In the second case it may be possible to complete the multicast by selecting one of the
recipients as the new originator. The recipients would have to buffer messages until safe
for delivery in case they were called on for this role.
A reliable multicast protocol imposes no restriction on the order in which messages are
delivered to group processes. Given that multicasts may be in progress by a number of
originators simultaneously, the messages may arrive at different processes in a group in
different orders. Also, a single originator may have a number of simultaneous multicasts in
progress or may have issued a sequence of multicast messages whose ordering we might
like preserved at the recipients. Ideally, multicast messages should be delivered
instantaneously in the real-time order they were sent, but this is unrealistic as there is no
global time and message transmission has a possibly significant and variable latency.
A number of possible scenarios are given below which may require different levels of
ordering semantics. G and s represent groups and message sources. s may be inside or
outside a group. Note that group membership may overlap with other groups, that is,
processes may be members of more than one group.
A FIFO ordered protocol guarantees that messages by the same sender are delivered in the
order that they were sent. That is, if a process multicasts a message m before it multicasts a
message m', then no correct process receives m' unless it has previously received m. To
implement this, messages can be assigned sequence numbers which define an ordering on
messages from a single source. Some applications may require the context of previously
multicast messages from an originator before interpreting the originator's latest message
correctly.
However, the content of message m may also depend on messages that the sender of m
received from other sources before sending m. The application may require that the context
which could have caused or affected the content of m be delivered at all destinations of m,
before m. For example, in a network news application, user A broadcasts an article. User B
at a different site receives the article and broadcasts a response. User C can only interpret
the response if the original article is delivered first at their site. Two messages are said to
be causally related if one message is generated after receipt of the other. Causal order is a
strengthening of FIFO ordering which ensures that a message is not delivered until all
messages it depends on have been delivered.
Event e causally precedes event f (i.e. happened before), e -> f, if and only if one of the
following holds: e and f occur at the same process and e precedes f there; e is the sending
of a message m and f is the receipt of m; or there exists an event g such that e -> g and
g -> f.
A causal protocol then guarantees that if the broadcast of message m causally precedes the
broadcast of m', then no correct process receives m' unless it has previously received m.
The definition of causal ordering does not determine the delivery order of messages which
are not causally related. Consider a replicated database application with two copies of a
bank account x residing at different sites. A client side process at one site sends a multicast
to the database to lodge £100 to account x. At another site simultaneously, a client side
process initiates a multicast to add 10% interest to the current balance of x. For
consistency, all database replicas should apply the two updates in the same sequence. As
these two messages are not causally related, a causal broadcast would allow the update
messages to x to be delivered in different sequences at the replicas.
Total Ordering guarantees that all correct processes receive all messages in the same order.
That is, if correct processes p and q both receive messages m and m', then p receives m
before m' if and only if q receives m before m'. The multicast is atomic across all members
of the group.
Note that this definition of a totally ordered broadcast does not require that messages be
delivered in Causal Order or even FIFO Order, so it is not stronger than these orderings.
For example, if a process suffers a transient failure during the broadcast of message m, and
subsequently broadcasts m', a totally ordered protocol guarantees only that all correct
processes agree on a single delivery order for the messages they receive; it does not
guarantee that m is delivered before m', or that m is delivered at all.
Integrity: For any message m, every correct process receives m at most once and
only if m was multicast by the sender of m.
The protocols only differ in the strength of their message delivery order requirements.
Multicast Algorithms
In the algorithms to follow, R stands for Reliable Multicast, F for FIFO, C for Causal and
A for Atomic.
In an asynchronous system where a reliable link exists between every pair of processes, the
algorithm below demonstrates how a Reliable multicast can be achieved.
Reliable multicast: the originator sends m to all processes (including itself); on first
receipt of m, a process that is not the originator relays m to all processes and then
R-delivers it.

multicast(R, m):
    send m to all processes, including the sender itself
on the first receipt of m at process p:
    if p is not the originator of m then send m to all processes
    receive(R, m)

FIFO multicast: for each q, every process p maintains a counter next[q] that indicates the
sequence number of the next F-multicast from q that p is willing to F-deliver. Incoming
messages are placed in a message bag from which messages that can be FIFO delivered
(according to the value of next[q]) are removed.

Initialisation: seq := 0; next[q] := 1 for every process q; msgbag := empty
multicast(F, m):
    seq := seq + 1;
    multicast(R, <m, seq>)
on receive(R, <m', s>) from q:
    add <m', s> to msgbag;
    while msgbag contains a message <m'', next[q]> from q do
        receive(F, m'');
        next[q] := next[q] + 1;
        remove <m'', next[q] - 1> from msgbag

Causal multicast: each message carries the sequence of messages its sender has
C-delivered since its own last multicast, so that receivers can deliver this causal context
first.

Initialisation: prevReceives := empty sequence
multicast(C, m):
    multicast(F, <prevReceives, m>);
    prevReceives := empty sequence
on receive(F, <m1, ..., mn, m>):
    for i := 1 to n do
        if mi has not yet been C-delivered then receive(C, mi)
    receive(C, m);
    append m to prevReceives
Each site maintains a 'local clock'. A clock doesn't necessarily have to supply the exact
time, but could be implemented simply by a counter which is incremented after each send
or receive event that occurs at the site, so that successive events have different 'times'. The
algorithm executes in two phases. During the first phase the originator multicasts the
message to all destinations and awaits a reply from each. Each receiver queues the message
and assigns it a proposed timestamp based on the value of its local clock. This timestamp is
returned to the originator. The originator collects all the proposed timestamps for the
multicast and selects the highest. During the second phase of the algorithm, the originator
commits the multicast by sending the final chosen timestamp to all destinations. Each
receiver then marks the message as deliverable in its queue. The message queue is
re-ordered on the timestamp values each time a timestamp is updated. When a deliverable
message reaches the top of the queue it may be delivered immediately to the application.
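The two-phase timestamp agreement above can be sketched in a single-process simulation (the class and method names are illustrative assumptions, not from the notes):

```python
# Sketch of the two-phase total-ordering algorithm described above:
# phase 1 collects proposed timestamps, phase 2 commits the highest.

class Receiver:
    def __init__(self):
        self.clock = 0
        self.queue = {}              # message -> (timestamp, deliverable?)

    def propose(self, msg):
        # queue the message with a proposed timestamp from the local clock
        self.clock += 1
        self.queue[msg] = (self.clock, False)
        return self.clock

    def commit(self, msg, final_ts):
        # adopt the chosen timestamp and mark the message deliverable
        self.clock = max(self.clock, final_ts)
        self.queue[msg] = (final_ts, True)

    def deliverable(self):
        # deliver marked messages in final-timestamp order
        ordered = sorted(self.queue.items(), key=lambda kv: kv[1][0])
        return [m for m, (ts, ok) in ordered if ok]

receivers = [Receiver(), Receiver(), Receiver()]

def total_order_multicast(msg):
    # phase 1: originator collects proposals and selects the highest
    final_ts = max(r.propose(msg) for r in receivers)
    # phase 2: originator commits the chosen timestamp everywhere
    for r in receivers:
        r.commit(msg, final_ts)

total_order_multicast("m1")
total_order_multicast("m2")
print([r.deliverable() for r in receivers])  # the same order at every receiver
```

Because every receiver adopts the same committed timestamp, all delivery queues sort identically, which is exactly the total-order guarantee.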
UNIT-III
PART-A
1 What is distributed deadlock? Explain with example.
With deadlock detection schemes, a transaction is aborted only when it is involved in a
deadlock. Most deadlock detection
schemes operate by finding cycles in the transaction wait-for graph. In a distributed system
involving multiple servers being accessed by multiple transactions, a global wait-for graph
can in theory be constructed from the local ones. There can be a cycle in the global wait-for
graph that is not in any single local one – that is, there can be a distributed deadlock
A deadlock that is 'detected' but is not really a deadlock is called phantom deadlock. In
distributed deadlock detection, information about wait-for relationships between transactions
is transmitted from one server to another. If there is a deadlock, the necessary information
will eventually be collected in one place and a cycle will be detected. As this procedure will
take some time, there is a chance that one of the transactions that holds a lock will
meanwhile have released it, in which case the deadlock will no longer exist.
A distributed approach to deadlock detection uses a technique called edge chasing or path
pushing. In this approach, the global wait-for graph is not constructed, but each of the
servers involved has knowledge about some of its edges.
The servers attempt to find cycles by forwarding messages called probes, which follow the
edges of the graph throughout the distributed system. A probe message consists of
transaction wait-for relationships representing a path in the global
wait-for graph.
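A minimal sketch of edge chasing (the wait-for edges are a made-up example; in a real system each edge would live at a different server and the probe would travel as a message):

```python
# Sketch: a probe carries the wait-for path seen so far; a distributed
# deadlock is reported when the probe arrives back at the transaction
# that originated it.

# local wait-for edges, possibly held at different servers: T waits for U, etc.
edges = {"T": ["U"], "U": ["V"], "V": ["T"]}

def send_probe(path):
    """Forward the probe along wait-for edges; return the cycle if found."""
    for t in edges.get(path[-1], []):
        if t == path[0]:
            return path + [t]            # probe returned home: deadlock
        found = send_probe(path + [t])
        if found:
            return found
    return None                          # no cycle reachable from path[0]

print(send_probe(["T"]))  # ['T', 'U', 'V', 'T']
```

The probe message is exactly the `path` list here: a sequence of transaction wait-for relationships representing a path in the global wait-for graph, as the text describes.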
5. Define Distributed Mutual Exclusion.
A condition in which there is a set of processes, only one of which is able to access a given
resource or perform a given function at a time.
6. Compare Deadlock and Starvation
Deadlock happens when two or more processes become blocked indefinitely while
attempting to enter or exit the critical section.
Starvation is the indefinite postponement of entry for a process that has requested
it. Starvation may occur even without deadlock; freedom from starvation is a fairness
condition.
7. What are the approaches to implement distributed mutual exclusion?
Token based approach
Non Token based approach
Quorum based approach
PART-B
Each distributed system has a number of processes running on a number of different physical
servers. These processes communicate with each other via communication channels using
message passing. The processes share neither memory nor a common physical clock, which
makes determining the instantaneous global state difficult.
A process can record its own local state at a given time, but the messages that are in transit
(on their way to be delivered) would not be included in the recorded state, and hence the
recorded state of the system would be incorrect once an in-transit message is delivered.
The Chandy-Lamport algorithm captures a consistent global state of a distributed system.
The main idea behind the algorithm is that if we know that all messages that have been
sent by one process have been received by another, then we can record the global state of
the system.
Any process in the distributed system can initiate this global state recording algorithm using a
special message called MARKER. The marker traverses the distributed system across all
communication channels and causes each process to record its own state. In the end, the
state of the entire system (the global state) is recorded. The algorithm does not interfere
with the normal execution of the processes.
There are a finite number of processes in the distributed system and they do not share
memory or clocks.
There are a finite number of communication channels and they are unidirectional and
FIFO ordered.
There exists a communication path between any two processes in the system
On a channel, messages are received in the same order as they are sent.
Algorithm:
Marker sending rule for a process P:
o Process P records its own local state.
o P then sends the marker along each of its outgoing channels before sending any
further messages.
(Note: Process Q will receive this marker on its incoming channel C1.)
Marker receiving rule for a process Q:
o If process Q has not yet recorded its own local state then
Record the state of incoming channel C1 as an empty sequence or null.
After recording the state of incoming channel C1, process Q follows
the marker sending rule.
o If process Q has already recorded its state
Record the state of incoming channel C1 as the sequence of messages
received along channel C1 after the state of Q was recorded and before
Q received the marker along C1 from process P.
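The marker rules above can be sketched on a two-process system with FIFO channels simulated by queues (the process names, channel names and states are illustrative assumptions):

```python
# Sketch of Chandy-Lamport marker handling. deque gives FIFO channels.

from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, state):
        self.name, self.state = name, state
        self.recorded_state = None
        self.channel_state = {}   # incoming channel -> recorded messages
        self.recording = {}       # incoming channel -> still recording?

    def record_and_send_markers(self, out_channels):
        # marker sending rule: record own state, then send a marker on
        # every outgoing channel before any further messages
        self.recorded_state = self.state
        for c in out_channels:
            c.append(MARKER)

    def on_receive(self, channel_name, msg, out_channels):
        if msg == MARKER:
            if self.recorded_state is None:
                # first marker: record own state; this channel's state
                # is recorded as the empty sequence
                self.channel_state[channel_name] = []
                self.record_and_send_markers(out_channels)
            else:
                self.recording[channel_name] = False
        elif self.recorded_state is not None and \
                self.recording.get(channel_name, True):
            # a message in flight after our snapshot but before the marker
            self.channel_state.setdefault(channel_name, []).append(msg)

# channel c_pq carries messages from P to Q; c_qp from Q to P
c_pq, c_qp = deque(), deque()
p, q = Process("P", state=10), Process("Q", state=20)

c_pq.append("m1")                  # already in flight before the snapshot
p.record_and_send_markers([c_pq])  # P initiates the snapshot
q.on_receive("c_pq", c_pq.popleft(), [c_qp])  # m1 arrives before the marker
q.on_receive("c_pq", c_pq.popleft(), [c_qp])  # marker: Q records its state

print(p.recorded_state, q.recorded_state)  # 10 20
print(q.channel_state["c_pq"])             # []
```

Because m1 is delivered to Q before Q records its own state, it is reflected in Q's state rather than in the channel state, which is why c_pq's recorded state is empty.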
Bully Algorithm
Let’s say the scenario is, we have 6 process numbered as 1, 2, 3, 4, 5, 6 and also, the priority
or process number are also in the same order, therefore the process 6 is the highest process
number. The process are shown below; Circles are the processes and the square boxes are
their numbers.
Now, suppose Process 6 has crashed while the other processes are active. The crash is
noticed by Process 2, which finds that Process 6 is no longer responding to requests. In this
case, Process 2 will start a fresh election.
Process 2 sends an election message to every process with a higher number: in our
case, Processes 3, 4, 5 and 6. As Process 6 is down or has failed, it will not respond
to the election message.
Process 3, 4, 5 are active and therefore they respond with a reply or acknowledgement
message to Process 2.
If no one responds to Process 2's election message, Process 2 wins the election. Otherwise,
the election is taken over by the next highest number. In our case, it is Process 3, which
sends the election message to Processes 4, 5 and 6. As Process 6 is down, it again does not
respond. Again, if no one responds, Process 3 wins the election.
The same procedure repeats for Process 4, which becomes the next initiator. Finally the
election reaches Process 5, which is now the highest active process since Process 6 is
down. Process 5 therefore wins the election and sends the victory message to all.
Meanwhile, if Process 6 comes back from the down state to the active state, it will
immediately hold an election, as it has the highest number among the processes in the
system. It will win the election and take over the coordinator job.
Whenever the highest-numbered process recovers from the down state, it holds an election
and wins it, bullying the other processes into submission.
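The election above can be sketched as a small function (a simulation under assumptions: process ids double as priorities, and an `alive` table stands in for the timeout-based failure detection):

```python
# Sketch of the bully election: a process asks all higher-numbered
# processes; if none answers, it wins, otherwise the lowest responder
# takes over until the highest live process wins.

def bully_election(initiator, processes, alive):
    """Return the id of the elected coordinator."""
    higher = [p for p in processes if p > initiator and alive[p]]
    if not higher:
        return initiator                  # nobody higher answered: win
    # a higher live process takes over; recursion ends at the highest
    return bully_election(min(higher), processes, alive)

processes = [1, 2, 3, 4, 5, 6]
alive = {p: True for p in processes}
alive[6] = False                          # Process 6 has crashed

print(bully_election(2, processes, alive))  # 5

alive[6] = True                           # 6 recovers and holds an election
print(bully_election(6, processes, alive))  # 6
```

Handing the election to the lowest live responder mirrors the step-by-step takeover (2, then 3, then 4, then 5) walked through in the text.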
3 What is a deadlock? How deadlock can be recovered? Explain distributed dead locks.
Deadlock Detection
In the above diagram, resource 1 and resource 2 have single instances. There is a
cycle R1 → P1 → R2 → P2 → R1, so deadlock is confirmed.
Deadlock Recovery
A traditional operating system such as Windows doesn't attempt deadlock recovery, as it is
a time- and space-consuming process. Real-time operating systems use deadlock recovery.
Recovery method
1. Killing the process: kill the processes involved in the deadlock, one by one. After
killing each process, check for deadlock again, and keep repeating until the system
recovers from the deadlock.
2. Resource Preemption: resources are preempted from the processes involved in the
deadlock and allocated to other processes, so that there is a possibility of
recovering the system from deadlock. In this case, the preempted processes risk
starvation.
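The detection step discussed above can be sketched as a cycle search over the resource-allocation graph from the example (assuming the edge P2 → R1 closes the cycle; the graph encoding is illustrative):

```python
# Sketch: deadlock detection by finding a cycle in the graph
# R1 -> P1 -> R2 -> P2 -> R1 using a standard depth-first search.

graph = {"R1": ["P1"], "P1": ["R2"], "R2": ["P2"], "P2": ["R1"]}

def has_cycle(graph):
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True              # back edge found: there is a cycle
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(n) for n in graph.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph)

print(has_cycle(graph))  # True: deadlock is confirmed
```

With single-instance resources, a cycle in this graph is both necessary and sufficient for deadlock, which is why the detector can stop as soon as one is found.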
UNIT-IV
S.No.
PART-A
1 What is Roll back recovery?
Rollback recovery is defined as follows: a system recovers correctly if its internal state is
consistent with the observable behavior of the system before the failure.
2 What is a local checkpoint?
A local checkpoint is a snapshot of the state of the process at a given instance and the event of
recording the state of a process is called local check pointing.
3 What are the types of messages in recovery?
In-transit messages
Lost messages
Delayed messages
Orphan messages
Duplicate messages
4 What is an Orphan message?
Messages with receive recorded but message send not recorded are called orphan messages.
5 Classify the checkpoint-based rollback recovery techniques.
Uncoordinated check pointing
Coordinated check pointing
Communication induced check pointing
Introduction
Checkpoint-recovery is a common technique for imbuing a program or system with fault
tolerant qualities; it grew from the ideas used in systems that employ transaction processing.
It allows systems to recover after some fault interrupts the system and causes the task to fail,
or be aborted in some way. While many systems employ the technique to minimize lost
processing time, it can be used more broadly to tolerate and recover from faults in a critical
application or task.
The basic idea behind checkpoint-recovery is the saving and restoration of system state. By
saving the current state of the system periodically or before critical code sections, it provides the
baseline information needed for the restoration of lost state in the event of a system failure.
While the cost of checkpoint-recovery can be high, techniques like memory exclusion, together
with designing the system to have as small a critical state as possible, may reduce the cost of
checkpointing enough to be useful even in cost-sensitive embedded applications.
When a system is checkpointed, the state of the entire system is saved to non-volatile storage.
The checkpointing mechanism takes a snapshot of the system state and stores the data on some
non-volatile storage medium. Clearly, the cost of a checkpoint will vary with the amount of
state required to be saved and the bandwidth available to the storage mechanism being used to
save the state.
In the event of a system failure, the internal state of the system can be restored, and it can
continue service from the point at which its state was last saved. Typically this involves
restarting the failed task or system, and providing some parameter indicating that there is state
to be recovered. Depending on the task complexity, the amount of state, and the bandwidth to
the storage device this process could take from a fraction of a second to many seconds.
Typically upon state restoration the system will continue processing in an identical manner as
it did previously.
It will tolerate any transient fault, however if the fault was caused by a design error, then the
system will continue to fail and recover endlessly. In some cases, this may be the most
important type of fault to guard against, but not in every case.
Unfortunately, it has only limited utility in the presence of a software design fault. Consider for
instance a system which performs control calculations, one of which is to divide a temperature
reading into some value. Since the specification requires the instrument to read out in degrees
Kelvin (absolute temperature), a temperature of 0 is not possible. In this case the programmer
(realizing this) fails to check for zero prior to performing the divide. The system works well for
a few months, but then the temperature gauge fails. The manufacturer realizes that a 0K
temperature is not possible, and decides that the gauge should fail low, since a result of 0 is
obviously indicative of a failure. The system faults, and attempts to recover its state.
Unfortunately, it reaches the divide instruction and faults, and continues to recover and fault
until some human intervention occurs. The point here is not that there should be redundant
temperature sensors, but that the most common forms of checkpoint and recovery are not
effective against some classes of failures.
Key Concepts
The basic mechanism of checkpoint-recovery consists of three key ideas - the saving and
restoration of executive state, and the detection of the need to restore system state. Additionally,
for more complex distributed embedded systems, the checkpoint-recovery mechanism can be
used to migrate processes off individual nodes
A snapshot of the complete program state may be scheduled periodically during program
execution. Typically this is accomplished by pausing the operation of the process whose state is
to be saved, and copying the memory pages into non-volatile storage. While this can be
accomplished by using freely available checkpoint-recovery libraries, it may be more efficient
to build a customized mechanism into the system to be protected.
Between full snapshots, or even in place of all but the first complete snapshot, only the state
that has changed may be saved. This is known as incremental checkpointing, and can be
thought of in the same way as incremental backups of hard disks. The basic idea here is to
minimize the cost of checkpointing, both in terms of the time required and the space used on
non-volatile storage.
Not all program state may need to be saved. System designers may find it more efficient to build
in mechanisms to regenerate state internally, based on a smaller set of saved state. Although this
technique might be difficult for some applications, it has the benefit of having the potential to
save both time and space during both the checkpoint and recovery operations.
A technique known as memory exclusion allows a program to notify the checkpoint algorithm
which memory areas are state critical and which are not. This technique is similar to that of
rebuilding state discussed above, in that it facilitates saving only the information most critical to
program state. The designer can exclude large working set arrays, string constants, and other
similar memory areas from being checkpointed.
When these techniques are combined, the cost of checkpointing can be reduced by factors of
3-4. Checkpointing, like any fault tolerant computing technique, does require additional
resources. Typically those systems which must meet hard real-time deadlines will have the
most difficulty implementing any type of checkpoint-recovery system.
Restoring Executive State
When a failure has occurred, the recovery mechanism restores system state to the last
checkpointed value. This is the fundamental idea in the tolerance of a fault within a system
employing checkpoint-recovery. Ideally, the state will be restored to a condition before the fault
occurred within the system. After the state has been restored, the system can continue normal
execution.
State is restored directly from the last complete snapshot, or reconstructed from the last
snapshot and the incremental checkpoints. The concept is similar to that of a journaled file
system, or even RCS(revision control system), in that only the changes to a file are recorded.
Thus when the file is to be loaded or restored, the original document is loaded, and then the
specified changes are made to it. In a similar fashion, when the state is restored to a system
which has undergone one or more incremental checkpoints, the last full checkpoint is loaded,
and then modified according to the state changes indicated by the incremental checkpoint data.
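The restore described above (last full snapshot plus incremental deltas) can be sketched with dictionaries standing in for saved memory pages (the state keys and values are a made-up example):

```python
# Sketch: restore state from the last full checkpoint, then replay each
# incremental checkpoint's changed entries in order, oldest first.

full_checkpoint = {"x": 1, "y": 2, "z": 3}

# each incremental checkpoint records only the state that changed
increments = [{"y": 20}, {"x": 10, "w": 4}]

def restore(full, increments):
    state = dict(full)          # start from the last full snapshot
    for delta in increments:    # apply deltas in checkpoint order
        state.update(delta)
    return state

print(restore(full_checkpoint, increments))
# {'x': 10, 'y': 20, 'z': 3, 'w': 4}
```

This is the same pattern as the journaled file system analogy in the text: load the original, then apply the recorded changes in sequence.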
If the root cause of the failure did not manifest until after a checkpoint, and that cause is part of
the state or input data, the restored system is likely to fail again. In such a case the error in the
system may be latent through several checkpoint cycles. When it finally activates and causes
a system failure, the recovery mechanism will restore the state (including the error!) and
execution will begin again, most likely triggering the same activation and failure. Thus it is in
the system designer's best interest to ensure that any checkpoint-recovery based system is fail
fast - meaning errors are either tolerated, or cause the system to fail immediately, with little or
no incubation period.
Such recurring failures might be addressed through multi-level rollbacks and/or algorithmic
diversity. Such a system would detect multiple failures as described above, and recover state
from checkpoint data previous to the last recovery point. Additionally, when the system detects
such multiple failures it might switch to a different algorithm to perform its functionality, which
may not be susceptible to the same failure modes. The system might degrade its performance by
using a more robust, but less efficient algorithm in an attempt to provide base level functionality
to get past the fault before switching back to the more efficient routines.
Failure Detection
Failure detection can be a tricky part of any fault tolerant design. Sometimes the line between
an unexpected (but correct) result and garbage output is difficult to discern. In traditional
checkpoint-recovery, failure detection is somewhat simplistic. If the process or system
terminates, there is a
failure. Additionally, some systems will recover state if they attempted a non-transactional
operation that failed and returned. The discussion of failure detection, and especially how it
impacts embedded systems is left to the chapters on fault tolerance, reliability, dependability,
and architecture.
An update log record, represented as <Ti, Xj, V1, V2>, has these fields:
1. Transaction identifier: Unique Identifier of the transaction that performed the write
operation.
2. Data item: Unique identifier of the data item written.
3. Old value: Value of data item prior to write.
4. New value: Value of data item after write operation.
The log supports two recovery operations:
1. Undo: using a log record, set the data item specified in the log record to its old value.
2. Redo: using a log record, set the data item specified in the log record to its new value.
There are two techniques for applying modifications:
1. Deferred Modification Technique: if the transaction does not modify the database
until it has partially committed, it is said to use the deferred modification technique.
2. Immediate Modification Technique: if database modifications occur while the
transaction is still active, it is said to use the immediate modification technique.
Recovery using log records:
1. Transaction Ti needs to be undone if the log contains the record <Ti start> but does not
contain either the record <Ti commit> or the record <Ti abort>.
2. Transaction Ti needs to be redone if the log contains the record <Ti start> and either the
record <Ti commit> or <Ti abort>.
Use of Checkpoints –
When a system crash occurs, the user must consult the log. In principle, the entire log needs
to be searched to determine this information. There are two major difficulties with this
approach:
1. The search process is time-consuming.
2. Most of the transactions that need to be redone have already written their updates into
the database, so redoing them wastes time.
To reduce these types of overhead, user introduce checkpoints. A log record of the form
<checkpoint L> is used to represent a checkpoint in log where L is a list of transactions active at
the time of the checkpoint. When a checkpoint log record is added to log all the transactions that
have committed before this checkpoint have <Ti commit> log record before the checkpoint
record. Any database modifications made by Ti are written to the database either prior to the
checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform
a redo operation on Ti.
After a system crash has occurred, the system examines the log to find the last <checkpoint L>
record. The redo or undo operations need to be applied only to transactions in L, and to all
transactions that started execution after the record was written to the log. Let us denote this set
of transactions as T. Same rules of undo and redo are applicable on T as mentioned in Recovery
using Log records part.
Note that the user need only examine the part of the log starting with the last checkpoint log
record to find the set of transactions T, and to find out whether a commit or abort record occurs
in the log for each transaction in T. For example, consider the set of transactions {T0, T1, . . .,
T100}. Suppose that the most recent checkpoint took place during the execution of transaction
T67 and T69, while T68 and all transactions with subscripts lower than 67 completed before the
checkpoint. Thus, only transactions T67, T69, . . ., T100 need to be considered during the
recovery scheme. Each of them needs to be redone if it has completed (that is, either committed
or aborted); otherwise, it was incomplete, and needs to be undone.
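The redo/undo rules above can be sketched over a small in-memory log (the log contents and data item names are a made-up example; a real recovery manager would also honour checkpoint records):

```python
# Sketch: recovery from an update log with records <Ti, Xj, old, new>
# plus start/commit markers. Committed transactions are redone in log
# order; incomplete ones are undone in reverse log order.

log = [
    ("start", "T1"),
    ("update", "T1", "A", 100, 150),
    ("commit", "T1"),
    ("start", "T2"),
    ("update", "T2", "B", 50, 75),
    # crash here: T2 never committed
]

def recover(log, db):
    committed = {r[1] for r in log if r[0] == "commit"}
    started = {r[1] for r in log if r[0] == "start"}
    # redo: re-apply new values of committed transactions
    for r in log:
        if r[0] == "update" and r[1] in committed:
            db[r[2]] = r[4]
    # undo: restore old values of incomplete transactions, newest first
    for r in reversed(log):
        if r[0] == "update" and r[1] in (started - committed):
            db[r[2]] = r[3]
    return db

print(recover(log, {"A": 100, "B": 75}))  # {'A': 150, 'B': 50}
```

T1 is redone because its commit record is present; T2 is undone because the log has its start record but neither commit nor abort, matching the rules stated earlier.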
UNIT-V
S.No.
PART-A
1 What is peer to peer system?
Peer-to-peer systems aim to support useful distributed services and applications using data
and computing resources available in the personal computers and workstations that are
present in the Internet and other networks in ever-increasing numbers.
2 What is goal of peer to peer system?
The goal of peer-to-peer systems is to enable the sharing of data and resources on a very large
scale by eliminating any requirement for separately managed servers and their associated
infrastructure.
3 What are the characteristics of peer to peer system? MAY/JUNE 2016
• Their design ensures that each user contributes resources to the system.
• All the nodes in a peer-to-peer system have the same functional capabilities and
responsibilities.
• Their correct operation does not depend on the existence of any centrally
administered systems.
• They can be designed to offer a limited degree of anonymity to the providers
and users of resources.
5 What is the need of peer to peer middleware system? NOV/DEC 2017
Peer-to-peer middleware systems are designed specifically to meet the need for the automatic
placement and subsequent location of the distributed objects managed by peer-to-peer systems
and applications.
6 Write the Non-functional requirements of peer-to-peer middleware system.
o Global scalability
o Load balancing
o Optimization for local interactions between neighbouring peers
o Accommodating to highly dynamic host availability
7 What is the role of routing overlays in peer to peer system? APR/MAY 2017
Peer-to-peer systems usually store multiple replicas of objects to ensure availability. In that
case, the routing overlay maintains knowledge of the location of all the available replicas
and delivers requests to the nearest ‘live’ node (i.e. one that has not failed) that has a copy
of the relevant object.
8 What are the tasks performed by routing overlay?
o Insertion of objects
o Deletion of objects
o Node addition and removal
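These overlay tasks can be illustrated with a toy structured overlay in the style of consistent hashing. This is a minimal sketch, not the algorithm of any particular system: the class name `RoutingOverlay` and its methods are hypothetical, and it assumes each object lives on the first node whose hashed id follows the object's hash on a circular identifier space.

```python
import hashlib
from bisect import bisect_right

def h(key):
    """Hash a key or node name onto a circular 32-bit identifier space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 32)

class RoutingOverlay:
    """Toy structured overlay: an object is the responsibility of the
    first live node whose id follows the object's hash on the ring."""

    def __init__(self):
        self.ring = []      # sorted list of (node_hash, node_name)
        self.objects = {}   # object name -> stored value

    def add_node(self, name):
        self.ring.append((h(name), name))
        self.ring.sort()

    def remove_node(self, name):
        # On node removal, responsibility shifts to the next node on the ring.
        self.ring = [(hv, n) for hv, n in self.ring if n != name]

    def responsible_node(self, obj_name):
        hv = h(obj_name)
        idx = bisect_right([x[0] for x in self.ring], hv)
        return self.ring[idx % len(self.ring)][1]   # wrap around the ring

    def insert(self, obj_name, value):
        """Record the object and report which node is responsible for it."""
        self.objects[obj_name] = value
        return self.responsible_node(obj_name)

    def delete(self, obj_name):
        self.objects.pop(obj_name, None)
```

Note how node addition and removal automatically re-map object responsibility: removing the node that held an object shifts it to that node's successor on the ring, which is the basic mechanism behind the overlay's handling of churn.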
9 What are the generations of peer to peer system?
Three generations of peer-to-peer system and application development can be identified.
o The first generation was launched by the Napster music exchange service
[OpenNap 2001].
o A second generation of file sharing applications offering greater scalability,
anonymity and fault tolerance quickly followed including Freenet, Gnutella,
Kazaa and BitTorrent
o The third generation is characterized by the emergence of middleware layers for the application-independent management of distributed resources on a global scale
10 What are the case studies used in overlay? NOV/DEC 2017
o Pastry is the message routing infrastructure deployed in several applications
including PAST.
o Tapestry is the basis for the OceanStore storage system.
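The core idea behind Pastry-style routing is prefix matching: at each hop the message is forwarded to a known node whose hexadecimal id shares a longer prefix with the destination key. The sketch below illustrates only the next-hop choice; the helper names (`shared_prefix_len`, `next_hop`) are illustrative, and a real Pastry node would consult a structured routing table rather than a flat list of known nodes.

```python
def shared_prefix_len(a, b):
    """Number of leading hex digits two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(key, here, known):
    """Pick the candidate sharing the longest id prefix with the key,
    breaking ties by numeric closeness to the key (Pastry's rule of thumb)."""
    return max(known + [here],
               key=lambda n: (shared_prefix_len(n, key),
                              -abs(int(n, 16) - int(key, 16))))
```

For example, routing toward key `65a2` from node `6000` with known nodes `65a0`, `6400` and `70ff` forwards to `65a0`, since it matches three leading digits of the key; each hop typically extends the matched prefix, which is why lookups take a number of hops logarithmic in the network size.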
11 Difference between Structured vs. unstructured peer-to-peer systems
In structured peer-to-peer systems (e.g. Pastry, Tapestry), the overlay maintains a distributed index mapping each object to a responsible node, so any object can be located in a bounded number of hops. In unstructured systems (e.g. Gnutella), nodes connect in an ad hoc fashion and searches propagate by flooding or random walks; this is simpler and tolerates churn well, but gives no guarantee that an object will be found.
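The cost of unstructured search can be seen in a small sketch of Gnutella-style flooding. This is an assumption-laden toy (the function name `flood_search` and the hop-limit value are illustrative): the query spreads breadth-first to all neighbours, a TTL caps the number of hops, and every forwarded copy counts as one message.

```python
def flood_search(neighbours, holders, start, ttl=3):
    """Gnutella-style flooding with a hop limit.
    neighbours: node -> list of neighbour nodes
    holders:    set of nodes that have a copy of the object
    Returns (node_found_or_None, messages_sent)."""
    seen = {start}
    frontier = [start]
    messages = 0
    for _ in range(ttl):
        nxt = []
        for node in frontier:
            for nb in neighbours.get(node, []):
                if nb in seen:
                    continue
                seen.add(nb)
                messages += 1          # one query message per new node reached
                if nb in holders:
                    return nb, messages
                nxt.append(nb)
        frontier = nxt                  # next hop of the flood
    return None, messages
```

Unlike the bounded-hop lookup of a structured overlay, the message count here grows with the number of nodes reached, and a copy of the object outside the TTL horizon is simply never found.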
PART-B
1 With neat sketch explain Routing Overlays in detail. MAY/JUNE 2016, NOV/DEC
2016,APRIL/MAY 2017, APRIL/MAY 2018
A RON Model:
o Designate RON nodes for the overlay.
o Nodes exchange performance and reachability information, and route based on it.
o Only a small number of nodes (2-50) participate in an overlay.
The RON architecture achieves the following benefits:
1. Fault detection: A RON can more efficiently find alternate paths around problems even
when the underlying network layer incorrectly believes that all is well.
2. Better reliability for applications: Each RON can have an independent, application-
specific definition of what constitutes a fault.
3. Better performance: A RON's limited size allows it to use more aggressive path
computation algorithms than the Internet. RON nodes can exchange more complete topologies,
collect more detailed link quality metrics, execute more complex routing algorithms, and
respond more quickly to change.
4. Application-specific routing: Distributed applications can link with the RON library
and choose, or even define, their own routing metrics.
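The path-selection idea behind these benefits can be sketched as a shortest-path computation over the link metrics that RON nodes measure among themselves. This is a simplified illustration, assuming latency is the chosen metric and links are symmetric; the function name `best_path` and the sample numbers are hypothetical, and a real RON also tracks loss and throughput.

```python
import heapq

def best_path(links, src, dst):
    """Dijkstra over measured pairwise link latencies.
    links: (a, b) -> latency in ms for directly probed node pairs.
    Returns (path_as_node_list, total_latency)."""
    graph = {}
    for (a, b), lat in links.items():        # treat probes as symmetric
        graph.setdefault(a, []).append((b, lat))
        graph.setdefault(b, []).append((a, lat))
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                          # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst                   # walk predecessors back to src
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]
```

For instance, if the direct A-C path is congested (300 ms) while A-B and B-C measure 20 ms and 30 ms, the overlay forwards A→B→C at 50 ms total, routing around a problem the underlying network layer does not see, which is exactly the fault-detection benefit described above.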