
Unit - 1

1. Flynn's classification

Flynn's classification divides computers into four major groups that are:

1. Single instruction stream, single data stream (SISD)

2. Single instruction stream, multiple data stream (SIMD)

3. Multiple instruction streams, single data stream (MISD)

4. Multiple instruction streams, multiple data streams (MIMD)

Single instruction stream, single data stream (SISD)


In computing, single instruction stream, single data stream (SISD) is a computer architecture in which a single uni-core processor executes a single instruction stream to operate on data stored in a single memory. This corresponds to the von Neumann architecture.

Single instruction stream, multiple data stream (SIMD)


Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA itself.

Multiple instruction streams, single data stream (MISD)


In computing, multiple instructions, single data is a type of parallel
computing architecture where many functional units perform different
operations on the same data. Pipeline architectures belong to this type,
though a purist might say that the data is different after processing by each
stage in the pipeline.
Multiple instruction streams, multiple data streams (MIMD)

In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data.
2. Design issues and challenges.

Designing a distributed system is neither easy nor straightforward. A number of challenges need to be overcome in order to get a good system. The major challenges in distributed systems are listed below:

1. Heterogeneity:

The Internet enables users to access services and run applications over a
heterogeneous collection of computers and networks. Heterogeneity (that
is, variety and difference) applies to all of the following:

● Hardware devices: computers, tablets, mobile phones, embedded devices, etc.
● Operating systems: MS Windows, Linux, Mac OS, Unix, etc.
● Network: Local network, the Internet, wireless network, satellite
links, etc.
● Programming languages: Java, C/C++, Python, PHP, etc.
● Different roles of software developers, designers, system managers

Different programming languages use different representations for characters and data structures such as arrays and records. These differences must be addressed if programs written in different languages are to be able to communicate with one another. Programs written by different developers cannot communicate with one another unless they use common standards, for example, for network communication and for the representation of primitive data items and data structures in messages. For this to happen, standards need to be agreed upon and adopted, as have the Internet protocols.

Middleware: The term middleware applies to a software layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, hardware, operating systems, and programming languages. Most middleware is implemented over the Internet protocols, which themselves mask the differences of the underlying networks, but all middleware deals with the differences in operating systems and hardware.

Heterogeneity and mobile code: The term mobile code refers to program code that can be transferred from one computer to another and run at the destination; Java applets are an example. Code suitable for running on one computer is not necessarily suitable for running on another, because executable programs are normally specific both to the instruction set and to the host operating system.

2. Transparency:

Transparency is defined as the concealment from the user and the application programmer of the separation of components in a distributed system, so that the system is perceived as a whole rather than as a collection of independent components. In other words, distributed systems designers must hide the complexity of the systems as much as they can. Some forms of transparency in distributed systems are:
● Access: Hide differences in data representation and how a resource is accessed.
● Location: Hide where a resource is located.
● Migration: Hide that a resource may move to another location.
● Relocation: Hide that a resource may be moved to another location while in use.
● Replication: Hide that a resource may be copied in several places.
● Concurrency: Hide that a resource may be shared by several competitive users.
● Failure: Hide the failure and recovery of a resource.
● Persistence: Hide whether a (software) resource is in memory or on disk.
3. Openness

The openness of a computer system is the characteristic that determines whether the system can be extended and reimplemented in various ways. The openness of distributed systems is determined primarily by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs. If the well-defined interfaces for a system are published, it is easier for developers to add new features or replace sub-systems in the future. Example: Twitter and Facebook have APIs that allow developers to build their own software that interacts with them.

4. Concurrency

Both services and applications provide resources that can be shared by clients in a distributed system. There is therefore a possibility that several
clients will attempt to access a shared resource at the same time. For
example, a data structure that records bids for an auction may be accessed
very frequently when it gets close to the deadline time. For an object to be
safe in a concurrent environment, its operations must be synchronized in
such a way that its data remains consistent. This can be achieved by
standard techniques such as semaphores, which are used in most operating
systems.
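As an illustration of such synchronization (a minimal sketch, not taken from the notes; the class and method names are hypothetical), the following Python fragment protects an auction's bid record with a lock playing the role of a semaphore:

import threading

class AuctionItem:
    """Shared auction record; a lock keeps concurrent updates consistent."""
    def __init__(self):
        self._lock = threading.Lock()
        self.highest_bid = 0

    def place_bid(self, amount):
        # Only one client may update the record at a time, so the highest
        # bid can never be overwritten by a lower, concurrent bid.
        with self._lock:
            if amount > self.highest_bid:
                self.highest_bid = amount
                return True
            return False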

5. Security

Many of the information resources that are made available and maintained
in distributed systems have a high intrinsic value to their users. Their
security is therefore of considerable importance. Security for information
resources has three components:
● confidentiality (protection against disclosure to unauthorized individuals),
● integrity (protection against alteration or corruption), and
● availability for the authorized (protection against interference with the means to access the resources).
6. Scalability

Distributed systems must be scalable as the number of users increases. Scalability is defined by B. Clifford Neuman as follows: a system is said to be scalable if it can handle the addition of users and resources without suffering a noticeable loss of performance or an increase in administrative complexity.

Scalability has 3 dimensions:

● Size
o The number of users and resources to be processed. The associated problem is overloading.
● Geography
o The distance between users and resources. The associated problem is communication reliability.
● Administration
o As the size of a distributed system increases, many of its parts need to be controlled. The associated problem is an administrative mess.

7. Failure Handling

Computer systems sometimes fail. When faults occur in hardware or software, programs may produce incorrect results or may stop before they have completed the intended computation. The handling of failures is particularly difficult.
3. A model of distributed executions

The goal of this section is to provide motivational examples of contemporary distributed systems and the great diversity of the associated applications. As mentioned in the introduction, networks are everywhere and underpin many everyday services that we now take for granted: the Internet and the associated World Wide Web, web search, online gaming, email, social networks, eCommerce, etc. To illustrate this point further, consider Figure 1.1, which describes a selected range of key commercial or social application sectors, highlighting some of the associated established or emerging uses of distributed systems technology.
As can be seen, distributed systems encompass many of the most significant
technological developments of recent years and hence an understanding of
the underlying technology is absolutely central to a knowledge of modern
computing. The figure also provides an initial insight into the wide range of
applications in use today, from relatively localized systems (as found, for
example, in a car or aircraft) to global-scale systems involving millions of
nodes, from data-centric services to processor-intensive tasks, from systems
built from very small and relatively primitive sensors to those
incorporating powerful computational elements, from embedded systems
to ones that support a sophisticated interactive user experience, and so on.
We now look at more specific examples of distributed systems to further illustrate the diversity and indeed complexity of distributed systems provision today.
4. Physical clock synchronization.

Physical clock synchronization algorithm

Every computer contains a clock, which is an electronic device that counts the oscillations in a crystal at a particular frequency. Synchronization of these physical clocks to some known high degree of accuracy is needed. This helps to measure the time relative to each local clock to determine the order between events.

Physical clock synchronization algorithms can be classified as centralized and distributed.

1. Centralized clock synchronization algorithms

These algorithms have one node with a real-time receiver, called the time server node. The clock time of this node is regarded as correct and used as the reference time. The goal of these algorithms is to keep the clocks of all other nodes synchronized with the time server node.

i. Cristian’s Algorithm

● In this method, each node periodically sends a request message to the time server. When the time server receives the message, it responds with a message T, where T is the current time of the server node.
● Assume the clock time of the client is T0 when it sends the request and T1 when it receives the reply. T0 and T1 are measured using the same clock, so the best estimate of the message propagation time is (T1 - T0)/2.
● When the reply is received at the client's node, its clock is readjusted to T + (T1 - T0)/2. There can be unpredictable variation in the message propagation time between the nodes, so a single (T1 - T0)/2 value is not always a good quantity to add to T when calculating the current time.
● For this reason, several measurements of T1 - T0 are made; measurements that exceed some threshold value are considered unreliable and discarded. The average of the remaining measurements is calculated (the minimum value being considered the most accurate), and half of the calculated value is added to T.
● Advantage: It assumes that no additional information is available.
● Disadvantage: It restricts the number of measurements for estimating the value.
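The adjustment step can be sketched as follows (a minimal illustration, assuming a request_time() helper that queries the time server and returns its clock value T; the filtering of unreliable measurements is omitted):

import time

def cristian_sync(request_time):
    """request_time() is assumed to ask the time server and return its time T."""
    t0 = time.time()                   # client clock when the request is sent
    server_time = request_time()
    t1 = time.time()                   # client clock when the reply arrives
    propagation = (t1 - t0) / 2        # best estimate of the one-way delay
    return server_time + propagation   # value the client clock is readjusted to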

ii. The Berkeley Algorithm

● This is an active time server approach where the time server periodically broadcasts its clock time and the other nodes receive the message to correct their own clocks.
● In this algorithm, the time server periodically sends a message to all
the computers in the group of computers. When this message is received
each computer sends back its own clock value to the time server. The time
server has prior knowledge of the approximate time required for the
propagation of a message which is used to readjust the clock values. It then
takes a fault-tolerant average of clock values of all the computers. The
calculated average is the current time to which all clocks should be
readjusted.
● The time server readjusts its own clock to this value and instead of
sending the current time to other computers, it sends the amount of time
each computer needs for readjustment. This can be a positive or negative
value and is calculated based on the knowledge the time server has about
the propagation of the message.
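The averaging and adjustment step can be sketched as follows (a single-round illustration, not taken from the notes; the tolerance parameter is an assumption, and compensation for message propagation time is omitted):

def berkeley_round(server_time, client_times, tolerance):
    """Return the signed adjustment each clock should apply.

    server_time  - current clock value of the time server
    client_times - dict mapping node id to the clock value it reported
    tolerance    - clocks further than this from the server are treated as faulty
    """
    clocks = dict(client_times)
    clocks["server"] = server_time
    # Fault-tolerant average: ignore clocks that differ too much from the server.
    good = [t for t in clocks.values() if abs(t - server_time) <= tolerance]
    average = sum(good) / len(good)
    # Each node receives the amount of time it must add to (or subtract from) its clock.
    return {node: average - t for node, t in clocks.items()}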

2. Distributed algorithms

Distributed algorithms overcome the problems of centralized algorithms by synchronizing internally for better accuracy. One of two approaches can be used:

i. Global Averaging Distributed Algorithms


● In this approach, the clock process at each node broadcasts its local clock time in the form of a "resync" message at the beginning of every fixed-length resynchronization interval. This is done when its local time equals T0 + iR for some integer i, where T0 is a fixed time agreed upon by all nodes and R is a system parameter that depends on the total number of nodes in the system.
● After broadcasting its clock value, the clock process of a node waits for a time T which is determined by the algorithm.
● During this waiting period, the clock process collects the resync messages and records the time at which each message is received, which is used to estimate the skew of each clock once the waiting is done. It then computes a fault-tolerant average of the estimated skews and uses it to correct the clocks.

ii. Localized Averaging Distributed Algorithms

● The global averaging algorithms do not scale, as they require the network to support a broadcast facility and they generate a lot of message traffic.
● Localized averaging algorithms overcome these drawbacks: the nodes in the distributed system are logically arranged in a pattern such as a ring.
● Each node exchanges its clock time with its neighbors and then sets its clock time to the average of its own clock time and the clock times of its neighbors.
Unit 2
1. A synchronous execution with synchronous communication.

Synchronous vs. Asynchronous Definition:

● Synchronous communication: The calling party requests a service, and waits for the service to complete. Only when it receives the result of the
service does it continue with its work. A timeout may be defined so that if
the service does not finish within the defined period the call is assumed to
have failed and the caller continues.
● Asynchronous communication: The calling party initiates a service
call, but does not wait for the result. The caller immediately continues with
their work without caring about the result. If the caller is interested in the
result there are mechanisms that we'll discuss in the next paragraphs.
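To make the distinction concrete, here is a small Python illustration (not part of the original notes) using concurrent.futures: the synchronous caller blocks on the result, while the asynchronous caller continues with other work and collects the result later.

from concurrent.futures import ThreadPoolExecutor
import time

def service(x):
    time.sleep(0.1)        # stands in for a remote call
    return x * 2

with ThreadPoolExecutor() as pool:
    # Synchronous style: submit the call and wait for the result before continuing.
    sync_result = pool.submit(service, 21).result(timeout=5)

    # Asynchronous style: submit the call, keep working, look at the result later.
    future = pool.submit(service, 21)
    other_work = sum(range(1000))      # the caller continues immediately
    async_result = future.result()     # fetch the result only when it is needed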

Be aware that the distinction between synchronous and asynchronous is highly dependent on the viewpoint. Often asynchronous is used in the sense
of “the user interface must stay responsive all the time”. This
interpretation often leads to the wrong conclusion: “…and therefore every
communication must be asynchronous”. A non-blocking GUI usually has
nothing to do with low-level communication contracts and can be achieved
by different means, e.g. parallel processing of the interactive and
communication tasks. The truth is that synchronous communication on a
certain level of abstraction can be implemented with asynchronous
interfaces on another level of abstraction and vice versa if needed.
File-based communication is often considered to be asynchronous. One
party writes a file but does not care if the other party is active, fetches the
file, or is able to process it. However, it is possible to implement another
layer of functionality so that the second (reading) party gives feedback, e.g.
by writing a short result file, so that the first (writing) party can wait and
poll for the result of the file processing. This layer introduces synchronous
communication over file exchange.

Communication over a database is often implemented by one party writing execution orders into a special table and the other party reading this table
periodically and processes new entries, marking them as “done” or “failed”
after execution. So far this is an asynchronous communication pattern. As
soon as the first party waits for the result of the execution, this second layer
introduces a synchronous communication pattern again.

The following chapters explain synchronous and asynchronous communication patterns in more detail, using web services as an example.
The scenarios can also be used for other connectivity types.

Synchronous services are easy to implement since they keep the complexity
of the communication low by providing immediate feedback. They avoid
the need to keep the context of a call on the client and server-side, including
e.g. the caller’s address or a message-id, beyond the lifetime of the request.
2. Group communication.

Group Communication

A group is an operating system abstraction for a collective of related processes. A set of cooperative processes may, for example, form a group to
provide an extendable, efficient, available, and reliable service. The group
abstraction allows member processes to perform computation on different
hosts while providing support for communication and synchronization
between them.

The term multicast means the use of a single communication primitive to send a message to a specific set of processes rather than using a collection
of individual point-to-point message primitives. This is in contrast with the
term broadcast which means the message is addressed to every host or
process.

A consensus protocol allows a group of participating processes to reach a common decision, based on their initial inputs, despite failures.

A reliable multicast protocol allows a group of processes to agree on a set of messages received by the group. Each message should be received by all
members of the group or by none. The order of these messages may be
important for some applications. A reliable multicast protocol is not
concerned with message ordering, only message delivery guarantees.
Ordered delivery protocols can be implemented on top of reliable multicast
service.

Multicast algorithms can be built on top of lower-level communication primitives such as point-to-point sends and receives, or perhaps by availing
of specific network mechanisms designed for this purpose.

The management of a group needs an efficient and reliable multicast communication mechanism to allow clients to obtain services from the group and ensure consistency among servers in the presence of failures. Consider the following two scenarios:

A client wishes to obtain a service that can be performed by any member of the group without affecting the state of the service.

A client wishes to obtain a service that must be performed by each member of the group.

In the first case, the client can accept a response to its multicast from any
member of the group as long as at least one responds. The communication
system need only guarantee delivery of the multicast to a nonfaulty process
of the group on a best-effort basis. In the second case, the all-or-none
atomic delivery requirements require that the multicast needs to be
buffered until it is committed and subsequently delivered to the application
process, and so incurs additional latency.

Failure may occur during a multicast at the recipient processes, the communication links, or the originating process.

Failures at the recipient processes and on the communication links can be detected by the originating process using standard time-out mechanisms or message
acknowledgments. The multicast can be aborted by the originator, or the
service group membership may be dynamically adjusted to exclude the
failed processes and the multicast can be continued.

3. Causal order and Total order


Causal ordering and Total order

Causal ordering is a vital tool for thinking about distributed systems.

Messages sent between machines may arrive zero or more times at any point after they are sent.

This is the sole reason that building distributed systems is hard.

For example, because of this property, it is impossible for two computers communicating over a network to agree on the exact time. You can send me a message saying "it is now 10:00:00", but I do not know how long that message took to arrive. We can send messages back and forth all day, but we will never know for sure that we are synchronized.

If we can't agree on the time then we can't always agree on what order
things happen. Suppose I say "my user logged on at 10:00:00" and you say
"my user logged on at 10:00:01". Maybe mine was first or maybe my clock
is just fast relative to yours. The only way to know for sure is if something
connects those two events. For example, if my user logged on and then sent
your user an email and if you received that email before your user logged
on, then we know for sure that mine was first.
This concept is called causal ordering and is written like this:

A -> B (event A is causally ordered before event B)

Let's define it a little more formally. We model the world as follows: we have a number of machines on which we observe a series of events. These events are either specific to one machine (e.g. user input) or are communications between machines. We define the causal ordering of these events by three rules:

If A and B happen on the same machine and A happens before B then A -> B

If I send you some message M and you receive it then (send M) -> (recv M)

If A -> B and B -> C then A -> C
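These rules are exactly what a logical clock captures. The following Lamport-clock sketch (an illustration, not taken from the text) assigns timestamps consistent with the -> relation:

class LamportClock:
    """Assigns timestamps consistent with the causal order ->."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        # Rule 1: events on the same machine are ordered by this counter.
        self.time += 1
        return self.time

    def send(self):
        # Rule 2 (first half): timestamp the message before sending it.
        self.time += 1
        return self.time               # carried along with the message

    def receive(self, msg_time):
        # Rule 2 (second half): the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time

# If A -> B then clock(A) < clock(B): every rule strictly increases the
# counter, and transitivity (rule 3) preserves the inequality.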

Consistency

When a mutable state is distributed over multiple machines, each machine can receive update events at different times and in different orders. If the final state is dependent on the order of updates, then the system must choose a single serialization of the events, imposing a global total order. A distributed system is consistent exactly when the outside world can never observe two different serializations.
Ordering requires waiting
Even your hardware cannot escape this law. It provides the illusion of
synchronous access to memory at the cost of availability. If you want to
write fast parallel programs then you need to understand the messaging
model used by the underlying hardware.

Eventual Consistency

A system is eventually consistent if the final state of each machine is the same regardless of how we choose to serialize update events. An eventually consistent system allows us to sacrifice consistency for availability without having the state of different machines diverge irreparably. It does not save us from having the outside world see different serializations of update events. It is also difficult to construct eventually consistent data structures and to reason about their composition.

4. Snapshot algorithms for the FIFO channel.


Each distributed system has a number of processes running on a number of different physical servers. These processes communicate with each other via communication channels by passing messages. The processes neither have a shared memory nor a common physical clock, and this makes determining the instantaneous global state difficult.

A process could record its own local state at a given time, but the messages that are in transit (on their way to being delivered) would not be included in the recorded state, and hence the recorded state of the system would be incorrect once the in-transit messages are delivered.

Chandy and Lamport were the first to propose an algorithm to capture a consistent global state. The main idea behind the proposed algorithm is that if we know that all the messages that have been sent by one process have been received by another, then we can record the global state of the system.

Any process in the distributed system can initiate this global state recording algorithm using a special message called MARKER. This marker traverses the distributed system across all communication channels and causes each process to record its own state. In the end, the state of the entire system (the global state) is recorded. This algorithm does not interfere with the normal execution of processes.

Assumptions of the algorithm:

● There is a finite number of processes in the distributed system and they do not share memory and clocks.
● There is a finite number of communication channels and they are
unidirectional and FIFO ordered.
● There exists a communication path between any two processes in the
system
● On a channel, messages are received in the same order as they are
sent.

Algorithm:

● Marker sending rule for a process P:
o Process P records its own local state.
o For each outgoing channel C from process P, P sends a marker along C before sending any other messages along C.
(Note: Process Q will receive this marker on its incoming channel C1.)
● Marker receiving rule for a process Q:
o If process Q has not yet recorded its own local state, then
▪ Record the state of incoming channel C1 as an empty sequence or null.
▪ After recording the state of incoming channel C1, process Q follows the marker sending rule.
o If process Q has already recorded its state, then
▪ Record the state of incoming channel C1 as the sequence of messages received along channel C1 after the state of Q was recorded and before Q received the marker along C1 from process P.
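A compact sketch of these marker rules for a single process (illustrative only; send(channel, msg) and the lists of incoming and outgoing channels are assumed to be provided by the surrounding system):

MARKER = "MARKER"

class SnapshotProcess:
    """Bookkeeping for one process in the Chandy-Lamport snapshot."""
    def __init__(self, incoming, outgoing):
        self.incoming = incoming          # incoming channel names
        self.outgoing = outgoing          # outgoing channel names
        self.recorded_state = None        # this process's recorded local state
        self.channel_state = {}           # finished per-channel recordings
        self.recording = {}               # channels still being recorded

    def start_snapshot(self, local_state, send):
        # Marker sending rule: record own state, then send a marker on every
        # outgoing channel before any other message, and start recording
        # every incoming channel.
        self.recorded_state = local_state
        for ch in self.outgoing:
            send(ch, MARKER)
        self.recording = {ch: [] for ch in self.incoming}

    def on_message(self, channel, msg, local_state, send):
        if msg == MARKER:
            if self.recorded_state is None:
                # First marker seen: this channel's state is empty, and the
                # process now follows the marker sending rule itself.
                self.start_snapshot(local_state, send)
                self.recording.pop(channel, None)
                self.channel_state[channel] = []
            else:
                # Already recorded: channel state is everything received on it
                # between recording the local state and this marker.
                self.channel_state[channel] = self.recording.pop(channel, [])
        elif channel in self.recording:
            self.recording[channel].append(msg)   # message was in transit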

Need of taking snapshots or recording the global state of the system:

● Checkpointing: It helps in creating checkpoints. If the application fails, the checkpoint can be used to recover it.
● Garbage collection: It can be used to remove objects that do not have any references.
● It can be used in deadlock and termination detection.
● It is also helpful in debugging.

Unit 3.
1. Ricart–Agrawala algorithm.
The Ricart–Agrawala algorithm is an algorithm for mutual exclusion in a distributed system proposed by Glenn Ricart and Ashok Agrawala. This algorithm is an extension and optimization of Lamport's Distributed Mutual Exclusion Algorithm. Like Lamport's algorithm, it follows a permission-based approach to ensure mutual exclusion.
In this algorithm:
● Two types of messages ( REQUEST and REPLY) are used and
communication channels are assumed to follow FIFO order.
● A site sends a REQUEST message to all other sites to get their
permission to enter the critical section.
● A site sends a REPLY message to another site to give its permission
to enter the critical section.
● A timestamp is given to each critical section request using Lamport’s
logical clock.
● The timestamp is used to determine the priority of critical section requests: a smaller timestamp gets higher priority than a larger one. The execution of critical section requests is always in the order of their timestamps.
Algorithm:
● To enter the Critical section:
○ When a site Si wants to enter the critical section, it sends a
timestamped REQUEST message to all other sites.
○ When a site Sj receives a REQUEST message from site Si, It sends a
REPLY message to site Si if and only if
■ Site Sj is neither requesting nor currently executing the critical section.
■ In case site Sj is also requesting, the timestamp of site Si's request is smaller than the timestamp of its own request.
○ Otherwise, the request is deferred by site Sj.
● To execute the critical section:
○ Site Si enters the critical section if it has received the REPLY
message from all other sites.
● To release the critical section:
○ Upon exiting the critical section, site Si sends a REPLY message to all the deferred requests.
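A minimal sketch of the REPLY decision at one site (illustrative only; timestamping, message transport, and the broadcast of REQUEST messages are assumed to be handled elsewhere):

class RicartAgrawalaSite:
    def __init__(self, site_id):
        self.site_id = site_id
        self.requesting = False
        self.in_cs = False
        self.my_request = None        # (timestamp, site_id) of our pending request
        self.deferred = []            # sites whose REPLY we have postponed

    def request_cs(self, timestamp, broadcast):
        """Timestamp our request and send REQUEST to all other sites."""
        self.requesting = True
        self.my_request = (timestamp, self.site_id)
        broadcast(("REQUEST", timestamp, self.site_id))
        # The site enters the CS once a REPLY has arrived from every other site.

    def on_request(self, their_ts, their_id, send_reply):
        """Apply the REPLY rule when REQUEST(their_ts, their_id) arrives."""
        defer = (self.in_cs or
                 (self.requesting and self.my_request < (their_ts, their_id)))
        if defer:
            self.deferred.append(their_id)     # answer later, on exit from the CS
        else:
            send_reply(their_id)               # give permission immediately

    def on_exit_cs(self, send_reply):
        """On leaving the critical section, reply to all deferred requests."""
        self.in_cs = False
        self.requesting = False
        for site in self.deferred:
            send_reply(site)
        self.deferred = []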
Drawbacks of Ricart–Agrawala algorithm:

● Unreliable approach: failure of any one node in the system can halt
the progress of the system. In this situation, the process will starve forever.
The problem of failure of a node can be solved by detecting failure after
some timeout.

2. Maekawa's algorithm.
Maekawa's Algorithm is a quorum-based approach to ensure mutual exclusion in distributed systems. In permission-based algorithms such as Lamport's Algorithm and the Ricart–Agrawala Algorithm, a site requests permission from every other site, whereas in a quorum-based approach a site requests permission only from a subset of sites, which is called a quorum.
In this algorithm:
● Three types of messages ( REQUEST, REPLY, and RELEASE) are
used.
● A site sends a REQUEST message to all other sites in its request set
or quorum to get their permission to enter the critical section.
● A site sends a REPLY message to a requesting site to give it permission to enter the critical section.
● A site sends a RELEASE message to all other sites in its request set
or quorum upon exiting the critical section.

Algorithm:
● To enter the Critical section:
○ When a site Si wants to enter the critical section, it sends a request
message REQUEST(i) to all other sites in the request set Ri.
○ When a site Sj receives the request message REQUEST(i) from site Si, it returns a REPLY message to site Si if it has not sent a REPLY message to any site since it received the last RELEASE message. Otherwise, it queues up the request.
● To execute the critical section:
○ A site Si can enter the critical section if it has received the REPLY message from all the sites in its request set Ri.
● To release the critical section:
○ When a site Si exits the critical section, it sends a RELEASE(i) message to all other sites in its request set Ri.
○ When a site Sj receives the RELEASE(i) message from site Si, it sends a REPLY message to the next site waiting in the queue and deletes that entry from the queue.
○ In case the queue is empty, site Sj updates its status to show that it has not sent any REPLY message since the receipt of the last RELEASE message.
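One common way to construct the request sets (a standard illustration, not prescribed by these notes) is to arrange the N sites in a √N x √N grid and take each site's row plus its column as its quorum, so that any two quorums intersect:

import math

def grid_quorum(site_id, n_sites):
    """Request set of site_id: its row and its column in a sqrt(N) x sqrt(N) grid."""
    k = math.isqrt(n_sites)
    assert k * k == n_sites, "this illustration assumes N is a perfect square"
    row, col = divmod(site_id, k)
    row_members = {row * k + c for c in range(k)}
    col_members = {r * k + col for r in range(k)}
    return row_members | col_members

# Any two request sets share at least one site, which is what enforces mutual
# exclusion: e.g. grid_quorum(0, 9) & grid_quorum(8, 9) is non-empty.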

Drawbacks of Maekawa’s Algorithm:


● This algorithm is deadlock prone because a site is exclusively locked
by other sites and requests are not prioritized by their timestamp.

3. Suzuki kasami.
The Suzuki–Kasami algorithm is a token-based algorithm for achieving mutual exclusion in distributed systems. It is a modification of the Ricart–Agrawala algorithm, a permission-based (non-token-based) algorithm which uses REQUEST and REPLY messages to ensure mutual exclusion.
In token-based algorithms, a site is allowed to enter its critical section if it possesses a unique token. Non-token-based algorithms use timestamps to order requests for the critical section, whereas a sequence number is used in token-based algorithms.
Each request for a critical section contains a sequence number. This
sequence number is used to distinguish between old and current requests.
Data structure and Notations:

● An array of integers RN[1…N]
A site Si keeps RNi[1…N], where RNi[j] is the largest sequence number received so far through a REQUEST message from site Sj.
● An array of integers LN[1…N]
This array is used by the token. LN[j] is the sequence number of the request that was most recently executed by site Sj.
● A queue Q
This data structure is used by the token to keep a record of the IDs of sites waiting for the token.
Algorithm:

● To enter the Critical section:


○ When a site Si wants to enter the critical section and it does not have the token, it increments its sequence number RNi[i] and sends a request message REQUEST(i, sn) to all other sites in order to request the token. Here sn is the updated value of RNi[i].
○ When a site Sj receives the request message REQUEST(i, sn) from site Si, it sets RNj[i] to the maximum of RNj[i] and sn, i.e. RNj[i] = max(RNj[i], sn).
○ After updating RNj[i], site Sj sends the token to site Si if it possesses the token, is not executing the critical section, and RNj[i] = LN[i] + 1.
● To execute the critical section:
○ Site Si executes the critical section if it has acquired the token.
● To release the critical section:
After finishing the execution, site Si exits the critical section and does the following:
○ Sets LN[i] = RNi[i] to indicate that its critical section request RNi[i] has been executed.
○ For every site Sj whose ID is not present in the token queue Q, it appends Sj's ID to Q if RNi[j] = LN[j] + 1, to indicate that site Sj has an outstanding request.
○ After the above update, if the queue Q is non-empty, it pops a site ID from Q and sends the token to the site indicated by the popped ID.
○ If the queue Q is empty, it keeps the token.
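A sketch of this bookkeeping at a single site (illustrative only; broadcast and send_token are assumed to be provided by the messaging layer, and site 0 is assumed to hold the token initially):

from collections import deque

class SuzukiKasamiSite:
    def __init__(self, site_id, n_sites):
        self.i = site_id
        self.n = n_sites
        self.in_cs = False
        self.RN = [0] * n_sites                  # RN[j]: largest request number seen from site j
        self.has_token = (site_id == 0)          # assumption: site 0 starts with the token
        self.token_LN = [0] * n_sites if self.has_token else None
        self.token_Q = deque() if self.has_token else None

    def request_cs(self, broadcast):
        if not self.has_token:
            self.RN[self.i] += 1
            broadcast(("REQUEST", self.i, self.RN[self.i]))

    def on_request(self, j, sn, send_token):
        self.RN[j] = max(self.RN[j], sn)
        # Pass the token if we hold it, are not in the CS, and j's request is outstanding.
        if self.has_token and not self.in_cs and self.RN[j] == self.token_LN[j] + 1:
            self._give_token(j, send_token)

    def release_cs(self, send_token):
        self.in_cs = False
        self.token_LN[self.i] = self.RN[self.i]          # our request has been executed
        for j in range(self.n):                          # queue every outstanding request
            if j != self.i and j not in self.token_Q and self.RN[j] == self.token_LN[j] + 1:
                self.token_Q.append(j)
        if self.token_Q:
            self._give_token(self.token_Q.popleft(), send_token)

    def _give_token(self, j, send_token):
        self.has_token = False
        send_token(j, list(self.token_LN), deque(self.token_Q))
        self.token_LN = self.token_Q = None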

Drawbacks of Suzuki–Kasami Algorithm:

● Non-symmetric algorithm: A site retains the token even if it has not requested the critical section. According to the definition of a symmetric algorithm, “No site possesses the right to access its critical section when it has not been requested.”

4. Broadcast algorithm.

5. System model.

Purpose: to illustrate/describe common properties and design choices for a distributed system in a single descriptive model.
Three types of models:
● Physical models: capture the hardware composition of a system in terms of computers and other devices and their interconnecting network.
● Architecture models: define the main components of the system, what their roles are and how they interact (software architecture), and how they are deployed in an underlying network of computers (system architecture).
● Fundamental models: a formal description of the properties that are common to architecture models. Three fundamental models: interaction models, failure models, and security models.

6. Models of deadlock.

Deadlock Detection

1. If resources have a single instance:
In this case, for deadlock detection, we can run an algorithm to check for a cycle in the Resource Allocation Graph. The presence of a cycle in the graph is a sufficient condition for deadlock.

In the accompanying example, resource R1 and resource R2 have single instances, and there is a cycle R1 → P1 → R2 → P2 → R1, so deadlock is confirmed.
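A small sketch of cycle detection on a resource allocation graph represented as an adjacency list (the graph below reproduces the cycle from the example; the function name is illustrative):

def has_cycle(graph):
    """graph maps each node (process or resource) to the nodes it points to.
    Returns True if the resource allocation graph contains a cycle (deadlock)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def dfs(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            if colour.get(nxt, WHITE) == GREY:        # back edge: cycle found
                return True
            if colour.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and dfs(n) for n in graph)

# The cycle R1 -> P1 -> R2 -> P2 -> R1 from the example above:
print(has_cycle({"R1": ["P1"], "P1": ["R2"], "R2": ["P2"], "P2": ["R1"]}))  # True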
2. If resources have multiple instances:
Detection of a cycle is a necessary but not a sufficient condition for deadlock detection; in this case the system may or may not be in deadlock, depending on the situation.

Deadlock Recovery
A traditional operating system such as Windows doesn’t deal with deadlock recovery, as it is a time- and space-consuming process. Real-time operating systems use deadlock recovery.
Recovery methods

1. Killing the process: kill all the processes involved in the deadlock, or kill the processes one by one, checking for deadlock again after each kill and repeating until the system recovers from deadlock.
2. Resource preemption: resources are preempted from the processes involved in the deadlock, and the preempted resources are allocated to other processes so that there is a possibility of recovering the system from deadlock. In this case, the system may go into starvation.

7. Knapp's classification.

Unit 4.
1. Checkpoint-based recovery.
Checkpoint-recovery is a common technique for imbuing a program or system with fault-tolerant qualities, and grew from the ideas used in systems which employ transaction processing. It allows systems to recover after some fault interrupts the system and causes the task to fail or be aborted in some way. While many systems employ the technique to minimize lost processing time, it can be used more broadly to tolerate and recover from faults in a critical application or task.

The basic idea behind checkpoint-recovery is the saving and restoration of system state. By saving the current state of the system periodically or before critical code sections, it provides the baseline information needed for the restoration of lost state in the event of a system failure. While the cost of checkpoint-recovery can be high, using techniques like memory exclusion and designing a system to have as small a critical state as possible may reduce the cost of checkpointing enough to be useful even in cost-sensitive embedded applications.

When a system is checkpointed, the state of the entire system is saved to non-volatile storage. The checkpointing mechanism takes a snapshot of the
system state and stores the data on some non-volatile storage medium.
Clearly, the cost of a checkpoint will vary with the amount of state required
to be saved and the bandwidth available to the storage mechanism being
used to save the state.

In the event of a system failure, the internal state of the system can be
restored, and it can continue service from the point at which its state was
last saved. Typically this involves restarting the failed task or system and
providing some parameter indicating that there is a state to be recovered.
Depending on the task complexity, the amount of state, and the bandwidth
to the storage device this process could take from a fraction of a second to
many seconds.

This technique provides protection against the transient fault model. Typically, upon restoration of its state the system will continue processing in an identical manner as it did previously. This will tolerate any transient fault; however, if the fault was caused by a design error, then the system will continue to fail and recover endlessly. In some cases, this may be the most important type of fault to guard against, but not in every case.

Unfortunately, it has only limited utility in the presence of a software design fault. Consider for instance a system that performs control calculations, one of which is to divide a temperature reading into some value. Since the specification requires the instrument to read out in degrees Kelvin (absolute temperature), a temperature of 0 is not possible. In this case, the programmer (realizing this) fails to check for zero prior to performing the divide. The system works well for a few months, but then the temperature gauge fails. The manufacturer realizes that a 0 K temperature is not possible, and decides that the gauge should fail low, since a result of 0 is obviously indicative of a failure. The system faults, and attempts to recover its state. Unfortunately, it reaches the divide instruction again and faults, and continues to recover and fault until some human intervention occurs. The point here is not that there should be redundant temperature sensors, but that the most common forms of checkpoint and recovery are not effective against some classes of failures.

Key concepts

The basic mechanism of checkpoint-recovery consists of three key ideas: the saving of executive state, the restoration of executive state, and the detection of the need to restore the system state. Additionally, for more complex distributed embedded systems, the checkpoint-recovery mechanism can be used to migrate processes off individual nodes.

Saving executive state

A snapshot of the complete program state may be scheduled periodically during program execution. Typically this is accomplished by pausing the
operation of the process whose state is to be saved and copying the memory
pages into non-volatile storage. While this can be accomplished by using
freely available checkpoint-recovery libraries, it may be more efficient to
build a customized mechanism into the system to be protected.
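As a very small illustration of the save-and-restore idea (using Python's pickle and a plain dictionary to stand in for the process state and non-volatile storage; real systems snapshot memory pages, and the file name is illustrative):

import pickle

CHECKPOINT_FILE = "state.ckpt"      # stands in for non-volatile storage

def checkpoint(state):
    """Snapshot the program state to stable storage."""
    with open(CHECKPOINT_FILE, "wb") as f:
        pickle.dump(state, f)

def recover():
    """Restore the last checkpointed state after a failure."""
    with open(CHECKPOINT_FILE, "rb") as f:
        return pickle.load(f)

state = {"step": 42, "partial_result": [1.5, 2.7]}
checkpoint(state)                   # taken periodically or before critical sections
state = recover()                   # after a crash, resume from the saved state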

Between full snapshots, or even in place of all but the first complete snapshot, only the state which has changed may be saved. This is known as
incremental checkpointing and can be thought of in the same way as
incremental backups of hard disks. The basic idea here is to minimize the
cost of checkpointing, both in terms of the time required and the space (on
non-volatile storage).

Not all program states may need to be saved. System designers may find it
more efficient to build mechanisms to regenerate states internally, based on
a smaller set of the saved states. Although this technique might be difficult
for some applications, it has the benefit of having the potential to save both
time and space during both checkpoint and recovery operations.

A technique known as memory exclusion allows a program to notify the checkpointing algorithm which memory areas are critical and which are not. This technique is similar to that of rebuilding the state discussed above, in that it facilitates saving only the information most critical to the program state. The designer can exclude large working-set arrays, string constants, and other similar memory areas from being checkpointed.

When these techniques are combined, the cost of checkpointing can be reduced by factors of 3 to 4. Checkpointing, like any fault-tolerant computing technique, does require additional resources. Whether or not it will work well is highly dependent on both the target system design and the application. Typically those systems which must meet hard real-time deadlines will have the most difficulty implementing any type of checkpoint-recovery system.

Restoring executive state

When a failure has occurred, the recovery mechanism restores the system
state to the last checkpointed value. This is the fundamental idea in the
tolerance of a fault within a system employing checkpoint recovery. Ideally,
the state will be restored to a condition before the fault occurred within the
system. After the state has been restored, the system can continue normal
execution.

The state is restored directly from the last complete snapshot or reconstructed from the last snapshot and the incremental checkpoints. The concept is similar to that of a journaled file system, or even RCS (revision control system), in that only the changes to a file are recorded. Thus when
the file is to be loaded or restored, the original document is loaded, and
then the specified changes are made to it. In a similar fashion, when the
state is restored to a system that has undergone one or more incremental
checkpoints, the last full checkpoint is loaded, and then modified according
to the state changes indicated by the incremental checkpoint data.

If the root cause of the failure did not manifest until after a checkpoint, and
that cause is part of the state or input data, the restored system is likely to
fail again. In such a case the error in the system may be latent through
several checkpoint cycles. When it finally activates and causes a system
failure, the recovery mechanism will restore the state (including the error!)
and execution will begin again, most likely triggering the same activation
and failure. Thus it is in the system designers' best interest to ensure that any checkpoint-recovery-based system is fail-fast, meaning errors are either tolerated or the system fails immediately, with little or no incubation period.

Such recurring failures might be addressed through multi-level rollbacks and/or algorithmic diversity. Such a system would detect multiple failures
as described above, and recover state from checkpoint data previous to the
last recovery point. Additionally, when the system detects such multiple
failures it might switch to a different algorithm to perform its functionality,
which may not be susceptible to the same failure modes. The system might
degrade its performance by using a more robust, but less efficient
algorithm in an attempt to provide base-level functionality to get past the
fault before switching back to the more efficient routines.

Failure Detection

Failure detection can be a tricky part of any fault-tolerant design. Sometimes the line between an unexpected (but correct) result and garbage output is difficult to discern. In traditional checkpoint-recovery, failure detection is somewhat simplistic: if the process or system terminates, there is a failure. Additionally, some systems will recover state
if they attempted a non-transactional operation that failed and returned.
The discussion of failure detection, and especially how it impacts embedded
systems is left to the chapters on fault tolerance, reliability, dependability,
and architecture.

2. Log-based rollback recovery.

The atomicity property of a DBMS states that either all the operations of a transaction must be performed or none. The modifications done by an aborted transaction should not be visible to the database, and the modifications done by committed transactions should be visible.

To achieve our goal of atomicity, we must first output to stable storage information describing the modifications, without modifying the database itself. This information can help us ensure that all modifications performed by committed transactions are reflected in the database. It can also help us ensure that no modifications made by an aborted transaction persist in the database.

Log and log records –


The log is a sequence of log records, recording all the update activities in the database. Logs for each transaction are maintained in stable storage. Any operation which is performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification.

An update log record represented as: <Ti, Xj, V1, V2> has these fields:

1. Transaction identifier: Unique identifier of the transaction that performed the write operation.
2. Data item: Unique identifier of the data item written.
3. Old value: Value of data item prior to writing.
4. New value: Value of data item after the write operation.

Other types of log records are:

1. <Ti start>: It contains information about when a transaction Ti starts.
2. <Ti commit>: It contains information about when a transaction Ti
commits.
3. <Ti abort>: It contains information about when a transaction Ti
aborts.

Undo and Redo Operations –


Because all database modifications must be preceded by the creation of a
log record, the system has available both the old value prior to
modification of the data item and the new value that is to be written for the
data item. This allows the system to perform redo and undo operations as
appropriate:

1. Undo: using a log record, set the data item specified in the log record back to its old value.
2. Redo: using a log record, set the data item specified in the log record to its new value.
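A minimal sketch of these two operations applied to update log records of the form <Ti, Xj, V1, V2>, with the database represented as a simple dictionary (illustrative only):

# An update log record <Ti, Xj, V1, V2> as a tuple: (txn, item, old_value, new_value)
log = [("T1", "X", 100, 150),
       ("T1", "Y", 20, 35)]

db = {"X": 150, "Y": 35}

def undo(db, records):
    """Set each item back to its old value, scanning the log backwards."""
    for _txn, item, old, _new in reversed(records):
        db[item] = old

def redo(db, records):
    """Set each item to its new value, scanning the log forwards."""
    for _txn, item, _old, new in records:
        db[item] = new

undo(db, log)    # db becomes {"X": 100, "Y": 20}
redo(db, log)    # db becomes {"X": 150, "Y": 35} again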

The database can be modified using two approaches –

1. Deferred Modification Technique: If the transaction does not modify the database until it has been partially committed, it is said to use the deferred modification technique.
2. Immediate Modification Technique: If database modification occurs
while the transaction is still active, it is said to use an immediate
modification technique.

Recovery using Log records –


After a system crash has occurred, the system consults the log to determine
which transactions need to be redone and which need to be undone.

1. Transaction Ti needs to be undone if the log contains the record <Ti start> but does not contain either the record <Ti commit> or the record <Ti abort>.
2. Transaction Ti needs to be redone if the log contains the record <Ti start> and either the record <Ti commit> or the record <Ti abort>.
Use of Checkpoints –
When a system crash occurs, the system must consult the log. In principle, it needs to search the entire log to determine this information. There are two major difficulties with this approach:

1. The search process is time-consuming.


2. Most of the transactions that, according to our algorithm, need to be
redone have already written their updates into the database. Although
redoing them will cause no harm, it will cause recovery to take longer.

To reduce these types of overhead, checkpoints are introduced. A log record of the form <checkpoint L> is used to represent a checkpoint in the log, where L is a list
of transactions active at the time of the checkpoint. When a checkpoint log
record is added to the log all the transactions that have been committed
before this checkpoint have <Ti commit> log record before the checkpoint
record. Any database modifications made by Ti are written to the database
either prior to the checkpoint or as part of the checkpoint itself. Thus, at
recovery time, there is no need to perform a redo operation on Ti.

After a system crash has occurred, the system examines the log to find the
last <checkpoint L> record. The redo or undo operations need to be
applied only to transactions in L, and to all transactions that started
execution after the record was written to the log. Let us denote this set of
transactions as T. Same rules of undo and redo are applicable on T as
mentioned in Recovery using the Log records part.

Note that the user needs to only examine the part of the log starting with
the last checkpoint log record to find the set of transactions T, and to find
out whether a commit or abort record occurs in the log for each
transaction in T. For example, consider the set of transactions {T0, T1, . . .,
T100}. Suppose that the most recent checkpoint took place during the
execution of transactions T67 and T69, while T68 and all transactions with
subscripts lower than 67 were completed before the checkpoint. Thus, only
transactions T67, T69, . . ., and T100 need to be considered during the
recovery scheme. Each of them needs to be redone if it has been completed
(that is, either committed
or aborted); otherwise, it was incomplete and needs to be undone.

3. Algorithm for synchronous checkpointing and recovery.

4. Agreement in synchronous systems with failures.

Unit 5.
1. Peer-to-peer middleware.

When a peer-to-peer architecture is adopted, data and services are no longer gathered in a single point of accumulation. Instead, they are spread
across all the nodes of the distributed system. Users may directly host the
resources they want to share with others, with no need to publish them on a
particular server.
Interestingly, these features are relevant not only in mobile scenarios but
also in fixed ones, where the decentralized nature of a peer-to-peer
architecture naturally encompasses the case of multisite or multi-company
projects, whose cooperation infrastructure must span administrative
boundaries, and is subject to security concerns.
Unfortunately, most of the peer-to-peer applications developed in recent
years started from premises that are rather different from those outlined
thus far. They target the Internet and aim at providing peer-to-peer
computing over millions of nodes, with file-sharing as their main
application concern. The difference in perspective from the domain of
collaborative work is made evident by their search capabilities, which
typically do not guarantee capturing information about all matching files.
In most cases, they do not take into consideration features like security or
the ability to support reactive interactions, which are crucial in cooperative
business applications. Moreover, they bring peer-to-peer to an extreme,
where the logical network of peers is totally fluid, and none can be assumed
to be fixed and contribute to the definition of permanent infrastructure.
This radical view prevents access to resources exported by non-connected
peers, which is unacceptable in the business world, where critical data is
often required to be always available, independently of its owner.
PEERWARE
On the basis of the above considerations, we have developed PEERWARE:
a peer-to-peer middleware for teamwork support specifically geared
towards the enterprise domain.
PEERWARE is both a model and an incarnation of this model in a
middleware. In developing both, our first concerns were minimality and
flexibility.
The Model
The PEERWARE coordination model exploits the notion of a global virtual
data structure (GVDS), which is a generalization of the LIME coordination
model. Coordination among units is enabled through a data space that is
transiently shared and dynamically built out of the data spaces provided by
each accessible unit.
The data structure managed by PEERWARE is a hierarchy of nodes
containing documents, where a document may actually be accessible from
multiple nodes, as shown in Figure 1. This structure resembles a standard
file system, where directories play the role of nodes, files are the
documents, and Unix-like hard links are allowed only on documents.

2. Chord.

3. Content Addressable Network (CAN).

As described above, the entire CAN space is divided among the nodes currently in the system. To allow the CAN to grow incrementally, a new node that joins the system must be allocated its own portion of the coordinate space. This is done by an existing node splitting its allocated zone in half, retaining half, and handing the other half to the new node. The process takes three steps:
1. First the new node must find a node already in the CAN.
2. Next, using the CAN routing mechanisms, it must find a node whose zone will be split.
3. Finally, the neighbors of the split zone must be notified so that routing can include the new node.

Bootstrap: A new CAN node first discovers the IP address of any node currently in the system. The functioning of a CAN does not depend on the details of how this is done, but we use the same bootstrap mechanism as Yallcast and YOID. We assume that a CAN has an associated DNS domain name and that this resolves to the IP address of one or more CAN bootstrap nodes. A bootstrap node maintains a partial list of CAN nodes it believes are currently in the system. To join a CAN, a new node looks up the CAN domain name in DNS to retrieve a bootstrap node's IP address. The bootstrap node then supplies the IP addresses of several randomly chosen nodes currently in the system.

Finding a Zone: The new node then randomly chooses a point P in the space and sends a JOIN request destined for point P. This message is sent into the CAN via any existing CAN node. Each CAN node then uses the CAN routing mechanism to forward the message, until it reaches the node in whose zone P lies. This current occupant node then splits its zone in half and assigns one half to the new node. The split is done by assuming a certain ordering of the dimensions in deciding along which dimension a zone is to be split, so that zones can be re-merged when nodes leave. For a 2-d space, a zone would first be split along the X dimension, then the Y, and so on. The (key, value) pairs from the half zone to be handed over are also transferred to the new node.

Joining the Routing: Having obtained its zone, the new node learns the IP addresses of its coordinate neighbor set from the previous occupant. This set is a subset of the previous occupant's neighbors, plus that occupant itself. Similarly, the previous occupant updates its neighbor set to eliminate those nodes that are no longer neighbors. Finally, both the new and old nodes' neighbors must be informed of this reallocation of space. Every node in the system sends an immediate update message, followed by periodic refreshes, with its currently assigned zone to all its neighbors. These soft-state style updates ensure that all of their neighbors will quickly learn about the change and will update their own neighbor sets accordingly. Figures 2 and 3 show an example of a new node (node 7) joining a 2-dimensional CAN. As can be inferred, the addition of a new node affects only a small number of existing nodes in a very small locality of the coordinate space. The number of neighbors a node maintains depends only on the dimensionality of the coordinate space and is independent of the total number of nodes in the system. Thus, node insertion affects only O(number of dimensions) existing nodes, which is important for CANs with huge numbers of nodes.
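A small sketch of the zone split for a 2-d CAN, with zones as axis-aligned rectangles and the split dimension alternating as described above (the function name and the depth parameter are illustrative assumptions):

def split_zone(zone, depth):
    """zone = ((x_min, x_max), (y_min, y_max)); split along X at even depths, Y at odd."""
    (x_min, x_max), (y_min, y_max) = zone
    if depth % 2 == 0:                           # split along the X dimension
        mid = (x_min + x_max) / 2
        old = ((x_min, mid), (y_min, y_max))     # half retained by the occupant
        new = ((mid, x_max), (y_min, y_max))     # half handed to the joining node
    else:                                        # split along the Y dimension
        mid = (y_min + y_max) / 2
        old = ((x_min, x_max), (y_min, mid))
        new = ((x_min, x_max), (mid, y_max))
    return old, new

# A node owning the whole space ((0,1),(0,1)) splits when a new node joins:
print(split_zone(((0.0, 1.0), (0.0, 1.0)), depth=0))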

4. Tapestry

Tapestry is a decentralized distributed system. It is an overlay network


that implements simple key-based routing. It is a prototype of a
decentralized, scalable, fault-tolerant, adaptive location and routing
infrastructure Each node serves as both an object store and a router
that applications can contact to obtain objects. In a Tapestry network,
objects are published at nodes, and once an object has been successfully
published, it is possible for any other node in the network to find the
location at which that object is published. The difference between
Chord and Tapestry is that in Tapestry the application chooses where to
store data, rather than allowing the system to choose a node to store the
object at. The application only publishes a reference to the object. The
Tapestry P2P overlay network provides efficient scalable
location-independent routing to locate objects distributed across the
Tapestry nodes. The hashed node identifiers are termed VIDs (Virtual
ID) and the hashed object identifiers are termed GUIDs (Globally
Unique ID).
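
The publish/locate model described above can be sketched roughly as follows. This is not Tapestry's actual overlay routing: the Directory class below simply stands in for the overlay that stores the published references, and all class and function names are hypothetical.

import hashlib

def guid(obj_name):
    """Hash an object name into a globally unique identifier."""
    return hashlib.sha1(obj_name.encode()).hexdigest()

class Directory:
    """Stands in for the overlay: maps a GUID to the node that stores the object."""
    def __init__(self):
        self.locations = {}

    def publish(self, g, node_id):
        self.locations[g] = node_id

    def locate(self, g):
        return self.locations.get(g)

class TapestryNode:
    def __init__(self, node_id, directory):
        self.node_id = node_id        # would be a hashed VID in Tapestry
        self.directory = directory
        self.objects = {}             # objects the application stores locally

    def store_and_publish(self, obj_name, data):
        g = guid(obj_name)
        self.objects[g] = data                      # application chooses where the data lives
        self.directory.publish(g, self.node_id)     # only a reference is published
        return g

# Usage: node A publishes an object; any other node can then find its location.
directory = Directory()
a = TapestryNode("A", directory)
b = TapestryNode("B", directory)
g = a.store_and_publish("report.pdf", b"...bytes...")
print(directory.locate(g))           # -> "A"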

5. Memory consistency models.

Consistency Model
● A consistency model is a contract between a distributed data store
and the processes that use it: the processes agree to obey certain rules,
and in return the store promises to work correctly.
● A consistency model basically refers to the degree of consistency that
should be maintained for the shared memory data.
● If a system supports the stronger consistency model, then the weaker
consistency model is automatically supported but the converse is not true.
● The two types of consistency models are data-centric and client-centric
consistency models.
1. Data-Centric Consistency Models
A data store may be physically distributed across multiple machines. Each
process that can access data from the store is assumed to have a local or
nearby copy of the entire store available.
i.Strict Consistency model
● Any read on a data item X returns a value corresponding to the
result of the most recent write on X
● This is the strongest form of memory coherence which has the most
stringent consistency requirement.
● Strict consistency is the ideal model, but it is impossible to implement
in a distributed system because it requires absolute global time or a global
agreement on the commitment to changes.
ii.Sequential Consistency
● Sequential consistency is an important data-centric consistency
model which is a slightly weaker consistency model than strict consistency.
● A data store is said to be sequentially consistent if the result of any
execution is the same as if the (read and write) operations by all processes
on the data store were executed in some sequential order and the
operations of each individual process should appear in this sequence in a
specified order.
● Example: Assume three operations read(R1), write(W1), and read(R2) are
performed on a memory address by different processes, so no single program
order binds them together. Then the orderings (R1, W1, R2), (R1, R2, W1),
(W1, R1, R2), and (R2, W1, R1) are all acceptable, provided all processes
see the same ordering (a small program-order checker is sketched below).
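
The "operations of each process appear in program order" part of the definition can be checked mechanically. The sketch below (hypothetical names, and it ignores the values returned by reads) tests whether a proposed global ordering respects every process's program order.

def respects_program_order(global_order, programs):
    # A candidate global ordering is admissible only if every process's
    # operations appear in it in their original program order.
    for ops in programs.values():
        positions = [global_order.index(op) for op in ops]
        if positions != sorted(positions):
            return False
    return True

# P1 issues W1 then W2 in program order; P2 issues R1.
programs = {"P1": ["W1", "W2"], "P2": ["R1"]}
print(respects_program_order(["R1", "W1", "W2"], programs))   # True
print(respects_program_order(["W2", "R1", "W1"], programs))   # False: W2 before W1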
iii. Linearizability
● Linearizability is weaker than strict consistency but stronger than
sequential consistency.
● A data store is said to be linearizable when each operation is
timestamped and the result of any execution is the same as if the (read and
write) operations by all processes on the data store were executed in some
sequential order
● The operations of each individual process appear in sequence order
specified by its program.
● If tsOP1(x)< tsOP2(y), then operation OP1(x) should precede OP2(y)
in this sequence.
iv. Causal Consistency
● It is a weaker model than sequential consistency.
● In causal consistency, all processes must see memory reference
operations that are potentially causally related in the correct (causal) order.
● Memory reference operations that are not related may be seen by
different processes in a different order.
● A memory reference operation is said to be causally related to
another memory reference operation if the first operation is influenced by
the second operation.
● If a write operation w2 is causally related to an earlier write w1, the
only acceptable order is (w1, w2); a dependency-tracking sketch is given below.
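
One common way to realize this (a minimal sketch, assuming each write is explicitly tagged with the identifiers of the writes it causally depends on; this is not the only implementation) is to buffer a write at a replica until all of its dependencies have been applied:

class Replica:
    def __init__(self):
        self.applied = set()     # ids of writes already applied
        self.buffer = []         # writes waiting for their dependencies
        self.data = {}

    def deliver(self, write_id, key, value, deps):
        self.buffer.append((write_id, key, value, set(deps)))
        self._drain()

    def _drain(self):
        # Apply every buffered write whose dependencies are all applied;
        # repeat until no further progress is possible.
        progress = True
        while progress:
            progress = False
            for w in list(self.buffer):
                write_id, key, value, deps = w
                if deps <= self.applied:
                    self.data[key] = value
                    self.applied.add(write_id)
                    self.buffer.remove(w)
                    progress = True

r = Replica()
r.deliver("w2", "x", 2, deps={"w1"})   # w2 causally depends on w1: buffered
r.deliver("w1", "x", 1, deps=set())    # applying w1 releases w2
print(r.data["x"])                     # -> 2, i.e. w1 then w2 in causal order

Writes that carry no dependencies on each other may still be applied in different orders at different replicas, which is exactly the freedom causal consistency allows.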
v.FIFO Consistency
● It is weaker than causal consistency.
● This model ensures that all write operations performed by a single
process are seen by all other processes in the order in which they were
performed like a single process in a pipeline.
● This model is simple and easy to implement and gives good performance,
because a process's writes simply flow through the system in pipeline order.
● Implementation is done by sequencing write operations performed at
each node independently of the operations performed on other nodes.
● Example: If w11 and w12 are write operations performed by p1 in that
order, and w21 and w22 are performed by p2 in that order, then a process p3
may see them as [(w11, w12), (w21, w22)] while p4 may see them as
[(w21, w22), (w11, w12)].
vi. Weak consistency
● The basic idea behind the weak consistency model is enforcing
consistency on a group of memory reference operations rather than
individual operations.
● A Distributed Shared Memory system that supports the weak
consistency model uses a special variable called a synchronization variable
which is used to synchronize memory.
● When a process accesses a synchronization variable, the entire
memory is synchronized by making visible the changes made to the
memory to all other processes.
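
The synchronization-variable behaviour just described can be sketched as follows. This is only an illustrative outline with hypothetical names, not a real DSM implementation; a full implementation would also pull in other processes' changes on synchronization.

class SharedMemory:
    def __init__(self):
        self.memory = {}

class WeaklyConsistentProcess:
    def __init__(self, shared):
        self.shared = shared
        self.pending = {}            # local writes not yet visible to others

    def write(self, key, value):
        self.pending[key] = value    # buffered, not yet propagated

    def synchronize(self):
        # Accessing the synchronization variable makes all buffered
        # changes visible to every other process at once.
        self.shared.memory.update(self.pending)
        self.pending.clear()

shared = SharedMemory()
p1 = WeaklyConsistentProcess(shared)
p1.write("x", 10)
print(shared.memory.get("x"))   # None: other processes need not see it yet
p1.synchronize()
print(shared.memory.get("x"))   # 10: visible after synchronization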
vii.Release Consistency
● The release consistency model tells the system whether a process is
entering or exiting a critical section, so that the system knows which of the
two operations to perform when a synchronization variable is accessed.
● Two synchronization variables acquire and release are used instead
of a single synchronization variable. Acquire is used when the process
enters a critical section and release is when it exits a critical section.
● Release consistency can also be implemented using barriers instead of
critical sections; a critical-section-style sketch is given below.
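
A minimal sketch of the acquire/release idea (illustrative names only; a thread lock stands in for the distributed synchronization variable): writes made inside the critical section become visible to others only at release.

import threading

class ReleaseConsistentStore:
    def __init__(self):
        self.shared = {}
        self.lock = threading.Lock()

class Session:
    def __init__(self, store):
        self.store = store
        self.local = {}

    def acquire(self):
        # Entering the critical section: take the lock and pull in the
        # most recent shared state.
        self.store.lock.acquire()
        self.local = dict(self.store.shared)

    def write(self, key, value):
        self.local[key] = value      # only locally visible until release

    def release(self):
        # Leaving the critical section: push local updates out, then unlock.
        self.store.shared.update(self.local)
        self.store.lock.release()

store = ReleaseConsistentStore()
s = Session(store)
s.acquire()
s.write("x", 1)
s.release()
print(store.shared["x"])   # -> 1, visible only after release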
viii. Entry Consistency
● In entry consistency, every shared data item is associated with a
synchronization variable.
● In order to access consistent data, each synchronization variable
must be explicitly acquired.
● Release consistency affects all shared data but entry consistency
affects only those shared data associated with a synchronization variable.
2. Client-Centric Consistency Models
● Unlike data-centric models, which provide a system-wide consistent
view of a data store, client-centric consistency models concentrate on
consistency from the perspective of a single (possibly mobile) client.
● Client-centric consistency models are generally used for applications
that lack simultaneous updates where most operations involve reading
data.
i.Eventual Consistency
● In Systems that tolerate a high degree of inconsistency, if no updates
take place for a long time all replicas will gradually and eventually become
consistent. This form of consistency is called eventual consistency.
● Eventual consistency only requires those updates that guarantee
propagation to all replicas.
● Eventual consistent data stores work fine as long as clients always
access the same replica.
● Write conflicts are often relatively easy to solve when assuming that
only a small group of processes can perform updates. Eventual consistency
is therefore often cheap to implement.
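
As a rough illustration (not the only implementation), two replicas that periodically exchange their state and keep, for each item, the value with the newest timestamp will converge once updates stop. The merge function and replica dictionaries below are hypothetical.

def merge(a, b):
    """Merge replica b's state into replica a (last-writer-wins by timestamp)."""
    for key, (value, ts) in b.items():
        if key not in a or ts > a[key][1]:
            a[key] = (value, ts)

r1 = {"x": ("old", 1)}
r2 = {"x": ("new", 2), "y": ("hello", 1)}

merge(r1, r2)    # r1 learns the newer x and the missing y
merge(r2, r1)    # r2 already has everything, so it is unchanged
print(r1 == r2)  # -> True: the replicas have converged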
ii. Monotonic Reads Consistency
● A data store is said to provide monotonic-read consistency if, once a
process reads the value of a data item x, any successive read operation on x
by that process will always return that same value or a more recent value.
● If a process has seen a value of x at time t, it will never see an older
version of x at a later time.
● Example: A user can read incoming mail while moving. Each time
the user connects to a different e-mail server, that server fetches all the
updates from the server that the user previously visited. Monotonic Reads
guarantees that the user sees all updates, no matter from which server the
automatic reading takes place.
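
A minimal session-level sketch of this guarantee (hypothetical names; replicas here are plain dictionaries holding (value, version) pairs): the client remembers the highest version it has read and refuses any replica that would take it backwards.

class MonotonicReadClient:
    def __init__(self):
        self.seen = {}                       # item -> highest version read so far

    def read(self, replica, key):
        value, version = replica[key]
        if version < self.seen.get(key, -1):
            raise RuntimeError("replica is stale; would violate monotonic reads")
        self.seen[key] = version
        return value

replica_a = {"inbox": (["m1", "m2"], 2)}     # up-to-date mail server
replica_b = {"inbox": (["m1"], 1)}           # lagging mail server

client = MonotonicReadClient()
print(client.read(replica_a, "inbox"))       # ['m1', 'm2'], version 2 now seen
try:
    client.read(replica_b, "inbox")          # older version: refused
except RuntimeError as e:
    print(e)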
iii. Monotonic Writes
● A data store is said to be monotonic-write consistent if a write
operation by a process on a data item x is completed before any successive
write operation on X by the same process.
● A write operation on a copy of data item x is performed only if that
copy has been brought up to date by means of any preceding write
operations, which may have taken place on other copies of x.
● Example: Monotonic-write consistency guarantees that if an update
is performed on a copy at server S, all preceding updates will be
performed on that copy first. The copy will then indeed be the most
recent version and will include all updates that led to its previous
versions.
iv. Read Your Writes
● A data store is said to provide read-your-writes consistency if the
effect of a write operation by a process on data item x will always be seen
by a successive read operation on x by the same process.
● A write operation is always completed before a successive read
operation by the same process no matter where that read operation takes
place.
● Example: Updating a Web page and guaranteeing that the Web
browser shows the newest version instead of its cached copy.
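
One way to provide this guarantee can be sketched as follows (illustrative names; replicas are dictionaries of (value, version) pairs): the client records the version produced by its own write and accepts a read only from a replica that has caught up to it.

class ReadYourWritesClient:
    def __init__(self):
        self.last_written = {}               # item -> version of my last write

    def write(self, replica, key, value):
        _, version = replica.get(key, (None, 0))
        replica[key] = (value, version + 1)
        self.last_written[key] = version + 1

    def read(self, replica, key):
        value, version = replica[key]
        if version < self.last_written.get(key, 0):
            raise RuntimeError("replica has not seen my own write yet")
        return value

origin = {"page": ("old html", 1)}            # replica the client updates
cache = {"page": ("old html", 1)}             # stale cached copy elsewhere

client = ReadYourWritesClient()
client.write(origin, "page", "new html")      # origin now holds version 2
print(client.read(origin, "page"))            # "new html": own write is visible
try:
    client.read(cache, "page")                # cache still at version 1: refused
except RuntimeError as e:
    print(e)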
v.Writes Follow Reads
● A data store is said to provide writes-follow-reads consistency if a
process has a write operation on a data item x following a previous read
operation on x then it is guaranteed to take place on the same or a more
recent value of x that was read.
● Any successive write operation by a process on a data item x will be
performed on a copy of x that is up to date with the value most recently
read by that process.
● Example: Suppose a user first reads an article A and then posts a
response B. By requiring writes-follow-reads consistency, B will be written
to any copy only after A has been written.
