
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Sub Code: CS3551

Subject Name: DISTRIBUTED COMPUTING

Semester/Year: III/V

Unit I-Study Material

Prepared by: Dr. N. Indumathi          Approved by: HoD
UNIT I - INTRODUCTION

SL NO TOPIC PAGE NO
UNIT – I : Syllabus with PL 05
1 INTRODUCTION: DEFINITION 06
2 RELATION TO COMPUTER SYSTEM COMPONENTS 16
3 MOTIVATION 19
4 MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS 24
5 PRIMITIVES FOR DISTRIBUTED COMMUNICATION 31
6 SYNCHRONOUS VERSUS ASYNCHRONOUS EXECUTIONS 39
7 DESIGN ISSUES AND CHALLENGES 43
8 A MODEL OF DISTRIBUTED COMPUTATIONS: A DISTRIBUTED PROGRAM – A MODEL OF DISTRIBUTED EXECUTIONS 54
9 MODELS OF COMMUNICATION NETWORKS 64
10 GLOBAL STATE OF A DISTRIBUTED SYSTEM 69

TOTAL PAGES : 74

UNIT – I : Syllabus

SYLLABUS:

UNIT I INTRODUCTION 8
Introduction: Definition-Relation to Computer System Components – Motivation – Message - Passing
Systems versus Shared Memory Systems – Primitives for Distributed Communication – Synchronous
versus Asynchronous Executions – Design Issues and Challenges; A Model of Distributed Computations:
A Distributed Program – A Model of Distributed Executions – Models of Communication Networks –
Global State of a Distributed System.
TOTAL TOPICS: 10

1. INTRODUCTION: DEFINITION
1.1 Introduction to Distributed System
 A distributed system is a system whose components are located on different networked computers,
which communicate and coordinate their actions by passing messages to one another.
 Distributed computing is a field of computer science that studies distributed systems. Wikipedia

What Is a Distributed System?


A distributed system consists of multiple components, possibly across geographical boundaries, that
communicate and coordinate their actions through message passing.

Fundamental concepts of a distributed system


 A distributed system contains multiple nodes that are physically separate but linked together using
the network.
 All the nodes in this system communicate with each other and handle processes in tandem
(grouped together one behind the other).
 Each of these nodes contains a small part of the distributed operating system software.

1.2 Definition: Distributed Computing


Distributed computing
 Distributed computing also refers to the use of distributed systems to solve computational
problems.
Definition: Distributed computing is a model of computation, closely related to distributed systems, in which multiple computer systems located at different places are linked together over a network and used to solve large-scale computations without having to use an expensive supercomputer.
 In distributed computing, a problem is divided into many tasks, each of which is solved by one or
more computers, which communicate with each other via message passing.
 Distributed computing is a model in which components of a software system are shared among
multiple computers or nodes.
 Even though the software components may be spread out across multiple computers in multiple
locations, they're run as one system.
 This is done to improve efficiency and performance.

Fundamental Principle: make distribution transparent to user.
Goal: The goal is to make task management as efficient as possible and to find practical flexible
solutions.
 The term “distributed computing” describes a digital infrastructure in which a network of
computers solves pending computational tasks.
 Despite being physically separated, these autonomous computers work together closely in a process
where the work is divvied up.
 The hardware being used is secondary to the method here. In addition to high-performance computers
and workstations used by professionals, you can also integrate minicomputers and desktop computers
used by private individuals.
 Distributed hardware cannot use a shared memory due to being physically separated, so the
participating computers exchange messages and data (e.g. computation results) over a network.
1.3 Distributed System and Distributed Computing
Definition: Distributed Computing (DC)
Distributed computing is a model of computation, closely related to distributed systems, in which multiple computer systems located at different places are linked together over a network and used to solve large-scale computations without having to use an expensive supercomputer.
Definition: Distributed System & Distributed Computing
A distributed system is a system whose components are located on different networked computers,
which communicate and coordinate their actions by passing messages to one another. Distributed
computing is a field of computer science that studies distributed systems.
Wikipedia
1.4 Key components of Distributed Computing System
There are three key components of a Distributed Computing System
a) Devices or Systems:
 The devices or systems in a distributed system have their own processing capabilities and may
also store and manage their own data.
b) Network:
 The network connects the devices or systems in the distributed system, allowing them to
communicate and exchange data.
c) Resource Management:

 Distributed systems often have some type of resource management system in place to allocate
and manage shared resources such as computing power, storage, and networking.

Fundamental Principle: make distribution transparent to user.

 The architecture of a Distributed Computing System is typically a Peer-to-Peer Architecture, where devices or systems can act as both clients and servers and communicate directly with each other.
 The three basic components of a distributed system include
a) primary system controller,
b) system data store, and
c) database.
 In a non-clustered environment, optional components consist of user interfaces and secondary
controllers.

1. Primary system controller
 The primary system controller is the only controller in a distributed system and keeps track of
everything.
 It’s also responsible for controlling the dispatch and management of server requests throughout the
system.
 The executive and mailbox services are installed automatically on the primary system controller.
 In a non-clustered environment, optional components consist of a user interface and secondary
controllers.
a) Secondary controller
 The secondary controller is a process controller or a communications controller.
 It’s responsible for regulating the flow of server processing requests and managing the system’s
translation load.
 It also governs communication between the system and VANs or trading partners.

b) User-interface client
 The user interface client is an additional element in the system that provides users with important
system information.
 This is not a part of the clustered environment, and it does not operate on the same machines as the
controller.
 It provides functions that are necessary to monitor and control the system.
2. System data store
 Each system has only one data store for all shared data.
 The data store is usually on the disk vault, whether clustered or not.
 For non-clustered systems, this can be on one machine or distributed across several devices, but all
of these computers must have access to this datastore.
3. Database
 In a distributed system, a relational database stores all data. Once the data store locates the data, it
shares it among multiple users.
 Relational databases can be found in all data systems and allow multiple users to use the same
information simultaneously.

1.5 Advantages and Disadvantages Distributed Computing System
1.5.1. Advantages
Advantages of the Distributed Computing System are:
1) Scalability:
 Vertical scaling is generally constrained by hardware limitations, for instance, we can only have so
many processor cores.
 However, we can theoretically achieve unlimited horizontal scaling with relatively inexpensive
commodity machines. 
 Distributed systems are generally more scalable than centralized systems, as they can easily add new
devices or systems to the network to increase processing and storage capacity. 
2) Reliability:
 Distributed systems are often more reliable than centralized systems, as they can continue to operate
even if one device or system fails. 
 Since a distributed system is comprised of multiple machines with data replicated on multiple nodes,
it’s generally more resilient to the failure of a part of the system. Hence, the overall system continues
to function, even if at a reduced capacity. 
3) Flexibility:
 Distributed systems are generally more flexible than centralized systems, as they can be configured
and reconfigured more easily to meet changing computing needs.
4) Performance:
 Typical applications of distributed computing work by breaking down a workload into smaller parts
that can be run on multiple machines simultaneously.
 Hence, this greatly improves the performance of many complex workloads, like matrix
multiplications.
1.5.2. Disadvantages
There are a few limitations of Distributed Computing Systems:
 Complexity: Distributed systems can be more complex than centralized systems, as they involve
multiple devices or systems that need to be coordinated and managed. 
 Security: It can be more challenging to secure a distributed system, as security measures must be
implemented on each device or system to ensure the security of the entire system. 

 Performance: Distributed systems may not offer the same level of performance as centralized
systems, as processing and data storage is distributed across multiple devices or systems. 
1.6 Nature of Distributed Computing Systems
1.6.1 Loose Coupling
 Changes to one component are unlikely to affect the others.
 The nodes in distributed computing systems are an example of loose coupling;
o they have their own memory and processing power and communicate with other nodes only
when necessary.
 Loosely coupled systems are often more scalable, flexible and interoperable than tightly coupled ones
because changes to one part don't affect the others.
 This also results in lower costs much of the time.
 A loosely coupled system is one:
1. in which components are weakly associated (have breakable relationships) with each other, and thus changes in one component have little effect on the existence or performance of another component;
2. in which each of its components has, or makes use of, little or no knowledge of the definitions of other separate components. Subareas include the coupling of classes, interfaces, data, and services. Loose coupling is the opposite of tight coupling.

1.6.2 Loose Coupling in Distributed Systems


 The loosely coupled system contains distributed memory. On the other hand, a tightly coupled system
has a shared memory.
 The loosely coupled system contains a low data rate. On the other hand, the tightly coupled system
contains a high data rate.
 These are the systems in which data is stored and processed on many machines which are
connected by some network.
 To put it more simply, distributed systems are a collection of several separate (individual) systems which communicate (through a LAN or WAN) and cooperate with each other (using some software) in order to provide users with access to the various resources that the system maintains.

 One important point to note about distributed systems is that they are loosely coupled, i.e., hardware and software may communicate with each other but they need not depend upon each other.
E.g., the Solaris operating system.
 The major Objectives for a Distributed system development are:
1. Making resources easily available.
2. Open & Scalable.
3. Distribution transparency, i.e., the fact that the resources are distributed must be hidden.
 Thus the loosely coupled components in a distributed system provide major support for performance.

1.7 Applications of Distributed Computing Systems


Distributed Computing Systems have a number of applications, including:
a) Cloud Computing: Cloud Computing systems are a type of distributed computing system that are
used to deliver resources such as computing power, storage, and networking over the Internet.
b) Peer-to-Peer Networks: Peer-to-Peer Networks are a type of distributed computing system that is

used to share resources such as files and computing power among users.
c) Distributed Architectures: Many modern computing systems, such as microservices architectures,
use distributed architectures to distribute processing and data storage across multiple devices or
systems.
 A distributed computation consists of a set of processes that cooperate to achieve a common goal.

 A main characteristic of these computations is that the processes do not already share a common
global memory and that they communicate only by exchanging messages over a communication
network.
 Moreover, message transfer delays are finite yet unpredictable.
1.8 Types of Distributed Systems
1. Client/Server Systems
2. Peer-to-Peer Systems
3. Middleware
4. Three-tier
5. N-tier
1. Client/Server Systems:
 Client-Server System is the most basic communication method, where the client sends input to the server and the server replies to the client with an output (a minimal sketch follows this list).
 The client requests the server for a resource or a task to do; the server allocates the resource or performs the task and sends the result back as a response to the client's request.
 Client-Server Systems can also be applied with multiple servers.
2. Peer-to-Peer Systems:
 Peer-to-Peer System communication model works as a decentralized model in which the system works
like both Client and Server. Nodes are an important part of a system. 
 In this, each node performs its task on its local memory and shares data through the supporting
medium, this node can work as a server or as a client for a system. 
 Programs in the peer-to-peer system can communicate at the same level without any hierarchy.
3. Middleware:
 Middleware can be thought of as an application that sits between two separate applications and
provides service to both.

 It works as a base for interoperability between different applications running on different operating systems.
 Data can be transferred between these applications by using this service.
4. Three-tier:
 A three-tier system uses a separate layer and server for each function of a program. Here the client's data is stored in the middle tier rather than on the client system or on the server, which makes development easier.
 It includes an Application Layer, Data Layer, and Presentation Layer. This is mostly used in web or online applications.
5. N-tier:
 N-tier is also called a multitier distributed system.
 The N-tier system can contain any number of functions in the network. N-tier systems contain similar
structures to three-tier architecture.
 In an N-tier system, an application forwards a request to another application to perform a task or to provide a service. N-tier is commonly used in web applications and data systems.
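As an illustration of the client/server request-response pattern in item 1 above, the following minimal Python sketch runs a server and a client in one script using the standard socket and threading modules; the port number and the "square a number" task are illustrative assumptions, not part of any particular system.

```python
# Minimal client/server sketch: the client sends a request, the server replies.
# The port number and the "square this number" task are illustrative assumptions.
import socket
import threading

HOST, PORT = "127.0.0.1", 50007              # assumed local endpoint

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((HOST, PORT))
srv.listen(1)                                # server is ready before the client starts

def server():
    conn, _ = srv.accept()                   # wait for one client
    with conn:
        request = conn.recv(1024).decode()   # client's input
        result = str(int(request) ** 2)      # server performs the requested task
        conn.sendall(result.encode())        # reply with the output
    srv.close()

t = threading.Thread(target=server)
t.start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    cli.sendall(b"12")                       # client sends its input
    print("reply from server:", cli.recv(1024).decode())

t.join()
```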
2. RELATION TO COMPUTER SYSTEM COMPONENTS
2.1 Distributed Components
When a collection of various computers appears to its clients as a single coherent system, it is called a distributed system.
 A typical distributed system consists of three major components :
a) Processor(s)
b) Memory unit or Bank(s)
c) Communication Network

The interconnection between these components forms a loosely coupled computing system called a distributed system.
Fundamental Principle of Distributed System : Make distribution transparent to user.
The interconnection of a typical distributed system is shown in Figure 1.1.

2.2 Functioning of Distributed Components
 Each computer has a memory-processing unit and the computers are connected by a communication
network.
 Figure 1.2 shows the relationships of the software components that run on each of the computers and
use the local operating system and network protocol stack for functioning. 

 Figure 1.2 schematically shows the interaction of this software with these system components at each
processor.
The same principle applies to real-time systems used online through the Internet.
This is depicted below:

 Here we assume that the middleware layer does not contain the traditional application layer functions
of the network protocol stack, such as http, mail, ftp, and telnet.
 Various primitives and calls to functions defined in various libraries of the middleware layer are
embedded in the user program code.
 The distributed software is also termed as middleware.
 A distributed execution is the execution of processes across the distributed system to collaboratively
achieve a common goal.
 An execution is also sometimes termed a computation or a run.
 The distributed system uses a layered architecture to break down the complexity of system design. 
 The middleware is the distributed software that drives the distributed system, while providing
transparency of heterogeneity at the platform level
 The RPC mechanism conceptually works like a local procedure call, with the difference that the procedure code may reside on a remote machine, and the RPC software sends a message across the network to invoke the remote procedure (see the sketch at the end of this section).
 It then awaits a reply, after which the procedure call completes from the perspective of the program
that invoked it.
 Currently deployed commercial versions of middleware often use CORBA, DCOM (distributed
component object model), Java, and RMI (remote method invocation) technologies.
 The message-passing interface (MPI) developed in the research community is an example of an
interface for various communication functions. 
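To make the RPC idea above concrete, here is a minimal sketch using Python's built-in xmlrpc modules as a convenient stand-in for the RPC/RMI technologies listed above; the procedure name add, the host, and the port are assumptions chosen only for illustration. The caller invokes the procedure as if it were local, while the RPC layer marshals the arguments, sends a request message to the remote side, and waits for the reply.

```python
# RPC sketch using Python's built-in xmlrpc modules.
# The procedure name (add), host, and port are illustrative assumptions.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    # Procedure that actually executes on the "remote" side.
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the call looks like a local procedure call, but the RPC
# layer marshals the arguments, sends a message, and waits for the reply.
proxy = ServerProxy("http://127.0.0.1:8000")
print(proxy.add(2, 3))   # prints 5
```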
*****************

3. MOTIVATION
3.1 Requirements of Distributed System
The motivation for using a distributed system is some or all of the following
requirements:
1) Inherently distributed computations
In many applications such as money transfer in banking, or reaching consensus among parties that are
geographically distant, the computation is inherently distributed.
2) Resource sharing
 The resources such as peripherals, complete data sets in databases, special libraries, as well as data
(variable/files) cannot be fully replicated at all the sites because it is often neither practical nor
cost-effective.
 Further, they cannot be placed at a single site because access to that site might prove to be a
bottleneck.
 Therefore, such resources are typically distributed across the system.
 For example, distributed databases such as DB2 partition the data sets across several servers, in
addition to replicating them at a few sites for rapid access as well as reliability.
3. Access to geographically remote data and resources
 In many scenarios, the data cannot be replicated at every site participating in the distributed execution because it may be too large or too sensitive to be replicated.
 For example, payroll data within a multinational corporation is both too large and too sensitive to
be replicated at every branch office/site.
 It is therefore stored at a central server which can be queried by branch offices.
 Similarly, special resources such as supercomputers exist only in certain locations, and to access
such supercomputers, users need to log in remotely.
 Advances in the design of resource-constrained mobile devices as well as in the wireless
technology with which these devices communicate have given further impetus to the importance of
distributed protocols and middleware.
4. Enhanced reliability
 A distributed system has the inherent potential to provide increased reliability because of the
possibility of replicating resources and executions, as well as the reality that geographically

distributed resources are not likely to crash/malfunction at the same time under normal
circumstances.
Reliability entails several aspects:
• availability, i.e., the resource should be accessible at all times;
• integrity, i.e., the value/state of the resource should be correct, in the face of concurrent access from
multiple processors, as per the semantics expected by the application;
• fault-tolerance, i.e., the ability to recover from system failures, where such failures may be defined to
occur in one of many failure models
5. Increased performance/cost ratio
 By resource sharing and accessing geographically remote data and resources, the performance/cost
ratio is increased.
 Although higher throughput has not necessarily been the main objective behind using a distributed
system, nevertheless, any task can be partitioned across the various computers in the distributed
system.
 Such a configuration provides a better performance/cost ratio than using special parallel machines.
 This is particularly true of the NOW (network of workstations) configuration.
In addition to meeting the above requirements, a distributed system also offers
the following advantages:
6. Scalability
As the processors are usually connected by a wide-area network, adding more processors does not pose a
direct bottleneck for the communication network.
7. Modularity and incremental expandability
 Heterogeneous processors may be easily added into the system without affecting the performance,
as long as those processors are running the same middleware algorithms.
 Similarly, existing processors may be easily replaced by other processors.
3.2 Major Characteristics of distributed system
 There are several characteristics that define a Distributed Computing System.
 A main characteristic of these computations is that the processes do not already share a common
global memory and that they communicate only by exchanging messages over a communication
network.

a) Multiple Devices or Systems: Processing and data storage is distributed across multiple devices
or systems.
b) Peer-to-Peer Architecture: Devices or systems in a distributed system can act as both clients and

servers, as they can both request and provide services to other devices or systems in the network.
c) Shared Resources: Resources such as computing power, storage, and networking are shared
among the devices or systems in the network.
Horizontal Scaling: Scaling a distributed computing system typically involves adding more devices or
systems to the network to increase processing and storage capacity. This can be done through hardware
upgrades or by adding additional devices or systems to the network.
3.3 Reasons for building a Distributed System
3.3.1 Fundamental concepts of a distributed system
 A distributed system contains multiple nodes that are physically separate but linked together using
the network.
 All the nodes in this system communicate with each other and handle processes in tandem.
 Each of these nodes contains a small part of the distributed operating system software.

Why make a System distributed?
a) It is inherently distributed :
For example, sending a message from your mobile phone to your friend’s phone.
b) For better reliability
Even if one node fails, the system as a whole keeps functioning.
c) For better performance
Get data from a nearby node rather than one halfway round the world.
d) To solve bigger problems:
For example, a huge amount of data may not fit into a single machine.
There are four major reasons for building distributed systems:
a) Resource sharing
b) Computation speedup
c) Reliability
d) Communication.
3.3.2 Interconnectivity principles of Distributed System
 A distributed system is a collection of loosely coupled processors interconnected by a communication
network.
 From the point of view of a specific processor in a distributed system, the rest of the processors and
their respective resources are remote, whereas its own resources are local.
 The processors in a distributed system may vary in size and function. They may include small
microprocessors, workstations, minicomputers, and large general-purpose computer systems.
 These processors are referred to by a number of names, such as sites, nodes, computers, machines,
and hosts, depending on the context in which they are mentioned.
 We mainly use site to indicate the location of a machine and host to refer to a specific system at a site.

 Generally, one host at one site, the server, has a resource that another host at another site, the client (or user), would like to use. A general structure of a distributed system is shown in the figure.
**************************
4. MESSAGE - PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS
4.1 Introduction
 Communication among processors takes place via shared data variables, and control variables for
synchronization among the processors.
 The communications between the tasks in multiprocessor systems take place through two main
modes:
 Processes can communicate with each other through both:
a) Message passing
b) Shared Memory
a) Message passing
 Message passing is a fundamental mechanism for communication in distributed systems. It enables
processes or nodes to exchange messages and coordinate their actions.
 There are several types of message-passing models, including synchronous, asynchronous, and
hybrid approaches.
 Message passing in distributed systems refers to the communication mechanism used by nodes (computers or processes) to exchange information and coordinate their actions.
 It involves sending and receiving messages between nodes to achieve various goals such as coordination, synchronization, and data sharing.
Types of Message Passing
a) Synchronous message passing
b) Asynchronous message passing
c) Hybrids
a) Synchronous Message Passing
 Synchronous message passing is a communication mechanism in concurrent programming where processes or threads exchange messages in a synchronous manner.
 The sender blocks until the receiver has received and processed the message, ensuring coordination and predictable execution.
b) Asynchronous Message Passing

 Asynchronous message passing is a communication mechanism in concurrent and distributed systems that enables processes or components to exchange messages without requiring synchronization in time.
 It involves sending a message to a receiving process or component and continuing execution without waiting for a response.
 A key characteristic of asynchronous message passing is its non-blocking nature, which allows the sender and receiver to operate independently without waiting for each other.
c) Hybrids
 Hybrid message passing combines elements of both synchronous and asynchronous message passing.
 It gives the sender the flexibility to choose whether to block and wait for a response or to continue execution asynchronously.
b) Shared Memory
 Shared memory is memory that may be simultaneously accessed by multiple programs with an
intent to provide communication among them or avoid redundant copies.
 Shared memory is an efficient means of passing data between programs.
4.2 Message Passing System(MPS)
What is a Message Passing System(MPS)?
 A system in which a set of processes communicate with one another by sending and receiving messages over a communication channel is called a message-passing system.
 Message passing in distributed systems involves communication between nodes to coordinate
actions, exchange data, and propagate information.
Where is message passing used? Why is it required?
Use:
 Mainly the message passing is used for communication.
 It is used in distributed environments where the communicating processes are present on remote
machines which are connected with the help of a network.
Message Passing Requirement:
 In message-passing systems, processes communicate with one another by sending and receiving
messages over a communication channel.

 The pattern of connections provided by the channels is described by the network topology. The collection of the channels is called a network.
 Message queues allow multiple processes to read and write data to the queue without being directly connected to each other.
 Messages are stored on the queue until their recipient retrieves them.
 Message queues are quite useful for interprocess communication and are used by most operating systems.
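A minimal sketch of queue-based message passing between two processes, using Python's standard multiprocessing.Queue; the message contents and the sentinel convention are assumptions. The messages remain on the queue until the consumer retrieves them, and the producer and consumer are not directly connected to each other.

```python
# Message-queue sketch: the producer posts messages; they remain queued
# until the consumer retrieves them. Message contents are illustrative.
from multiprocessing import Process, Queue

def producer(q):
    for i in range(3):
        q.put(f"message {i}")   # enqueue without knowing who will read it
    q.put(None)                 # sentinel: no more messages

def consumer(q):
    while True:
        msg = q.get()           # blocks until a message is available
        if msg is None:
            break
        print("received:", msg)

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start(); p2.start()
    p1.join(); p2.join()
```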
4.2.1 Message Passing Model
 In this model, data is shared by sending and receiving messages between co-operating processes,
using system calls .
 Message Passing is particularly useful in a distributed environment where the communicating
processes may reside on different, network connected, systems.
 Message passing architectures are usually easier to implement but are also usually slower than
shared memory architectures.

Message-Passing Communication: for data transfer


 Tasks exchange data through communications by sending and receiving explicit messages.
 Data transfer usually requires cooperative operations to be performed by each process.
 For example, a send operation must have a matching receive operation.
4.2.2 Example for MPS
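A minimal example of the message-passing model, sketched with Python's multiprocessing.Pipe; the worker task and the data sent are assumptions. Note that each explicit send operation has a matching receive operation in the cooperating process.

```python
# Example of explicit message passing: every send() has a matching recv().
# The worker task and the data it is sent are illustrative assumptions.
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()                    # matching receive for the parent's send
    conn.send(sum(data))                  # send the result back
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])         # send: matched by recv() in the worker
    print("result:", parent_end.recv())   # receive the worker's reply
    p.join()
```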

4.2.3 Role of MPS in Distribution Systems
 Message passing is a flexible and scalable method for inter-node communication in distributed
systems.
 It enables nodes to exchange information, coordinate activities, and share data without relying on
shared memory or direct method invocations.
 Models like synchronous and asynchronous message passing offer different synchronization and
communication semantics to suit system requirements.
 Synchronous message passing ensures sender and receiver synchronization, while asynchronous
message passing allows concurrent execution and non-blocking communication.
4.3 SHARED MEMORY SYSTEMS
4.3.1 The Shared Memory
 In this model, information stored in a shared region of memory is processed, possibly under the control of a supervisor process.
 An example might be a single node with multiple cores that share a global memory space, where the cores can efficiently exchange and share data.

4.3.2. Shared Memory Systems (SMS)
Definition: Shared Memory Systems (SMS)
 The shared memory is the memory that can be simultaneously accessed by multiple processes.
This is done so that the processes can communicate with each other.
 Shared memory systems are those in which there is a (common) shared address space throughout
the system. 
 Communication among processors takes place through shared data variables, and control variables
for synchronization among the processors.
 Semaphores and monitors are common synchronization mechanisms on shared memory systems.
Definition: Distributed Shared Memory Systems (DSMS)
 When shared memory model is implemented in a distributed environment, it is termed as
distributed shared memory.
Need of Shared Memory
 Communication among processors takes place via shared data variables, and control variables for
synchronization among the processors.
 Semaphores and monitors that were originally designed for shared memory uniprocessors and
multiprocessors are examples of how synchronization can be achieved in shared memory systems.
 All multicomputer (NUMA as well as message-passing) systems that do not have a shared address
space provided by the underlying architecture and hardware necessarily communicate by message
passing.
 Conceptually, programmers find it easier to program using shared memory than by message
passing.
 For this and several other reasons that we examine later, the abstraction called shared memory is
sometimes provided to simulate a shared address space.
 For a distributed system, this abstraction is called distributed shared memory. 
 Implementing this abstraction has a certain cost but it simplifies the task of the application
programmer. 
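A minimal sketch of the shared memory model, using Python's multiprocessing shared Value with a Lock as the synchronization primitive; the counter workload is an assumption. The processes communicate through the shared variable rather than by exchanging messages.

```python
# Shared-memory sketch: processes communicate through a shared variable,
# with a lock serving as the synchronization primitive. Workload is illustrative.
from multiprocessing import Process, Value, Lock

def increment(counter, lock, times):
    for _ in range(times):
        with lock:                     # mutual exclusion on the shared variable
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)            # shared integer living in shared memory
    lock = Lock()
    workers = [Process(target=increment, args=(counter, lock, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("final value:", counter.value)   # 4000 with correct synchronization
```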

4.4 Differences between Message Passing System(MPS) and Shared Memory System(SMS)
Message Passing versus Distributed Shared Memory (DSM):
 Services offered: In message passing, variables have to be marshalled from one process, transmitted, and unmarshalled into other variables at the receiving process. In DSM, the processes share variables directly, so no marshalling and unmarshalling is needed; shared variables can be named, stored, and accessed in DSM.
 In message passing, processes can communicate with other processes and can be protected from one another by having private address spaces. In DSM, a process does not have a private address space, so one process can alter the execution of another.
 Message passing can be used between heterogeneous computers. DSM cannot be used between heterogeneous computers.
 In message passing, synchronization between processes is through message-passing primitives. In DSM, synchronization is through locks and semaphores.
 Processes communicating via message passing must execute at the same time. Processes communicating through DSM may execute with non-overlapping lifetimes.
 Efficiency: In message passing, all remote data accesses are explicit, and therefore the programmer is always aware of whether a particular operation is in-process or involves the expense of communication. In DSM, any particular read or update may or may not involve communication by the underlying runtime support.

4.4.1 Communication between Message-Passing and Shared Memory(SM)


Emulating message-passing on a shared memory system (MP → SM)
 The shared memory system can be made to act as message passing system. The shared address
space can be partitioned into disjoint parts, one part being assigned to each processor.
 Send and receive operations are implemented by writing to and reading from the destination/sender processor's address space. The read and write operations are synchronized.

Emulating shared memory on a message-passing system (SM → MP)
 This is also implemented through read and write operations. Each shared location can be modeled as a separate process. A write to a shared location is emulated by sending an update message to the corresponding owner process, and a read from a shared location is emulated by sending a query message to the owner process.
 This emulation is expensive as the processes have to gain access to other process memory location.
The latencies involved in read and write operations may be high even when using shared memory
emulation because the read and write operations are implemented by using network-wide
communication.
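A minimal sketch of this emulation under the assumptions above: one shared location is modelled as an owner process, and reads and writes become query and update messages sent to it over queues (the message formats and queue names are illustrative).

```python
# Emulating one shared memory location over message passing.
# The owner process holds the value; readers/writers send it messages.
# Message formats ("write"/"read"/"stop") are illustrative assumptions.
from multiprocessing import Process, Queue

def owner(requests, replies):
    value = 0                                # the emulated shared location
    while True:
        op, payload = requests.get()
        if op == "write":
            value = payload                  # apply the update message
        elif op == "read":
            replies.put(value)               # answer the query message
        elif op == "stop":
            break

if __name__ == "__main__":
    requests, replies = Queue(), Queue()
    p = Process(target=owner, args=(requests, replies))
    p.start()
    requests.put(("write", 42))              # emulated write to the shared location
    requests.put(("read", None))             # emulated read
    print("read returned:", replies.get())   # 42
    requests.put(("stop", None))
    p.join()
```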

 There also exists a well-known folklore result that communication via message-passing can be
simulated by communication via shared memory and vice-versa. 
 Therefore, the two paradigms are equivalent.

*******************
5. PRIMITIVES FOR DISTRIBUTED COMMUNICATION
5.1 Design primitives in distributed system
There are four major types of primitives for distributed communication:
a) Blocking
b) Non-blocking
c) Synchronous
d) Asynchronous primitives
5.2 Message passing primitives in distributed system
What are the message passing primitives in distributed system?
 Two primitives,
i) SEND and
ii) RECEIVE
are used in the message passing scheme.
 The SEND primitive sends a message to a destination process, while the RECEIVE primitive receives a message from a specified source process.
 The message send and message receive communication primitives are denoted Send() and Receive(), respectively.
 A Send primitive has at least two parameters
– the destination, and the buffer in the user space, containing the data to be sent.
 Similarly, a Receive primitive has at least two parameters – the source from which the data is to be
received (this could be a wildcard), and the user buffer into which the data is to be received.
5.3 Message invoking Primitives for distributed communication
 There are two ways of sending data when the Send primitive is invoked
i) the buffered option and
ii) the unbuffered option.
 The buffered option, which is the standard option, copies the data from the user buffer to the kernel buffer.
 The data later gets copied from the kernel buffer onto the network.
 In the unbuffered option, the data gets copied directly from the user buffer onto the network.
 For the Receive primitive, the buffered option is usually required because the data may already
have arrived when the primitive is invoked, and needs a storage place in the kernel.
The following are some definitions of blocking/non-blocking and synchronous/
asynchronous primitives:
a) Synchronous primitives
 A Send or a Receive primitive is synchronous if both the Send() and Receive() handshake with each
other.
 The processing for the Send primitive completes only after the invoking processor learns that the
other corresponding Receive primitive has also been invoked and that the receive operation has
been completed.
 The processing for the Receive primitive completes when the data to be received is copied into the
receiver’s user buffer.
b) Asynchronous primitives
 A Send primitive is said to be asynchronous if control returns back to the invoking process after
the data item to be sent has been copied out of the user-specified buffer.
 It does not make sense to define asynchronous Receive primitives.
c) Blocking primitives

A primitive is blocking if control returns to the invoking process after the processing for the primitive
(whether in synchronous or asynchronous mode) completes.
d) Non-blocking primitives
 A primitive is non-blocking if control returns back to the invoking process immediately after
invocation, even though the operation has not completed.
 For a non-blocking Send, control returns to the process even before the data is copied out of the
user buffer.
 For a non-blocking Receive, control returns to the process even before the data may have arrived
from the sender.
 For non-blocking primitives, a return parameter on the primitive call returns a system-generated
handle which can be later used to check the status of completion of the call. The process can check
for the completion of the call in two ways.
 First, it can keep checking (in a loop or periodically) if the handle has been flagged or posted.
 Second, it can issue a Wait with a list of handles as parameters.
 The Wait call usually blocks until one of the parameter handles is posted.
 Presumably after issuing the primitive in non-blocking mode, the process has done whatever
actions it could and now needs to know the status of completion of the call, therefore using a
blocking Wait() call is usual programming practice.

The code for a non-blocking Send would look as shown in Figure 1.7.
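A minimal runnable sketch of this pattern: a non-blocking Send that returns a handle, followed by a blocking Wait on that handle. Python's concurrent.futures is used purely as a stand-in for the kernel/communication subsystem, and the data, destination, and handle names are assumptions.

```python
# Non-blocking Send returning a handle, with a later blocking Wait.
# concurrent.futures stands in for the kernel/communication subsystem;
# the destination and data are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor, wait
import time

pool = ThreadPoolExecutor(max_workers=2)

def transmit(data, destination):
    time.sleep(0.1)                      # pretend the data is being sent
    return f"{len(data)} bytes sent to {destination}"

def send_nonblocking(data, destination):
    # Returns immediately with a handle; the transfer proceeds in the background.
    return pool.submit(transmit, data, destination)

handle_k = send_nonblocking(b"hello", "process P2")

# ... the process does whatever other work it can here ...

wait([handle_k])                         # blocking Wait: returns when the Send completes
print(handle_k.result())
pool.shutdown()
```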

 If at the time that Wait() is issued, the processing for the primitive (whether synchronous or
asynchronous) has completed, the Wait returns immediately.

 The completion of the processing of the primitive is detectable by checking the value of handle_k.
 If the processing of the primitive has not completed, the Wait blocks and waits for a signal to wake
it up.
 When the processing for the primitive completes, the communication subsystem software sets the
value of handle_k and wakes up (signals) any process with a Wait call blocked on this handle_k.
 This is called posting the completion of the operation.
5.4 Four types of the Send primitive
1) synchronous blocking,
2) synchronous non-blocking,
3) asynchronous blocking, and
4) asynchronous non-blocking
For the Receive primitive, there are the blocking synchronous and non-blocking synchronous versions.

These versions of the primitives are illustrated in Figure 1.8 using a timing diagram.

Here, three time lines are shown for each process:
(1) for the process execution,
(2) for the user buffer from/to which data is sent/received, and
(3) for the kernel/communication subsystem.

a) Blocking synchronous Send (See Figure 1.8(a))


 The data gets copied from the user buffer to the kernel buffer and is then sent over the network.
 After the data is copied to the receiver's system buffer and a Receive call has been issued, an acknowledgement back to the sender causes control to return to the process that invoked the Send operation and completes the Send.

b) Non-blocking synchronous Send (See Figure 1.8(b))


 Control returns back to the invoking process as soon as the copy of data from the user buffer to the
kernel buffer is initiated.
 A parameter in the non-blocking call also gets set with the handle of a location that the user process can later check for the completion of the synchronous send operation.
 The location gets posted after an acknowledgement returns from the receiver, as per the semantics described for (a).
 The user process can keep checking for the completion of the non-blocking synchronous Send by testing the returned handle, or it can invoke the blocking Wait operation on the returned handle (Figure 1.8(b)).
c) Blocking asynchronous Send (See Figure 1.8(c))
 The user process that invokes the Send is blocked until the data is copied from the user’s buffer to the
kernel buffer. (For the unbuffered option, the user process that invokes the Send is blocked until the
data is copied from the user’s buffer to the network.)
d) Non-blocking asynchronous Send (See Figure 1.8(d))
 The user process that invokes the Send is blocked until the transfer of the data from the user’s buffer
to the kernel buffer is initiated. (For the unbuffered option, the user process that invokes the Send is
blocked until the transfer of the data from the user’s buffer to the network is initiated.)

 Control returns to the user process as soon as this transfer is initiated, and a parameter in the non-
blocking call also gets set with the handle of a location that the user process can check later using the
Wait operation for the completion of the asynchronous Send operation.
 The asynchronous Send completes when the data has been copied out of the user’s buffer.
 The checking for the completion may be necessary if the user wants to reuse the buffer from which
the data was sent.
Two types of Receive primitives
a) Blocking Receive (See Figure 1.8(a))
 The Receive call blocks until the data expected arrives and is written in the specified user buffer.
 Then control is returned to the user process.
b) Non-blocking Receive (See Figure 1.8(b))
 The Receive call will cause the kernel to register the call and return the handle of a location that the
user process can later check for the completion of the non-blocking Receive operation.
 This location gets posted by the kernel after the expected data arrives and is copied to the user-
specified buffer.
 The user process can check for the completion of the non-blocking Receive by invoking the Wait
operation on the returned handle. (If the data has already arrived when the call is made, it would be
pending in some kernel buffer, and still needs to be copied to the user buffer.)
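A small sketch contrasting a blocking Receive with a non-blocking completion check, using one end of a Python multiprocessing Pipe; the sender's delay and the message are assumptions. poll() returns immediately whether or not data has arrived, while recv() blocks until the data is available in the user buffer.

```python
# Blocking vs non-blocking receive on one end of a pipe.
# The sender delay and the message are illustrative assumptions.
from multiprocessing import Process, Pipe
import time

def sender(conn):
    time.sleep(0.5)          # the message arrives only after a delay
    conn.send("payload")

if __name__ == "__main__":
    recv_end, send_end = Pipe(duplex=False)
    p = Process(target=sender, args=(send_end,))
    p.start()

    # Non-blocking check: returns immediately; the data has probably not arrived yet.
    print("data ready?", recv_end.poll())

    # Blocking receive: control returns only after the data is in the user buffer.
    print("received:", recv_end.recv())
    p.join()
```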
***************************
6. SYNCHRONOUS VERSUS ASYNCHRONOUS EXECUTIONS
6.1 SYNCHRONOUS EXECUTIONS
6.1.1 Introduction
 The terms asynchronous and synchronous in the context of programming describe a difference in execution style.
 These terms refer to two different ways of executing code and understanding the difference between
them is essential for writing efficient and effective programs.
6.1.2 Synchronous Executions
What is Synchronous Execution?
 Synchronous execution is the simplest way of executing code.
 In a synchronous program, each line of code is executed in order, one after the other.

 This means that if a line of code takes a long time to execute, the entire program will be blocked until
that line of code is finished.
 Synchronous code is easy to read and write, but it can be problematic for certain types of programs.
o For example, if you are writing a web application that needs to handle multiple requests at the
same time, synchronous code can cause long wait times and slow down the application.
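A tiny sketch of synchronous (blocking) execution; time.sleep() stands in for a slow operation such as a network request. Each call must finish before the next one starts, so the total elapsed time is the sum of the individual delays.

```python
# Synchronous execution: each slow call blocks the program until it finishes.
# time.sleep() stands in for a slow operation such as a network request.
import time

def slow_task(name, seconds):
    time.sleep(seconds)
    return f"{name} done"

start = time.time()
print(slow_task("task 1", 1))
print(slow_task("task 2", 1))                   # does not start until task 1 has finished
print(f"elapsed: {time.time() - start:.1f}s")   # about 2 seconds
```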
6.1.3 Properties of Synchronous Execution
 Synchronous tasks happen in order — you must finish task one before moving on to the next.
Asynchronous tasks can be executed in any order or even simultaneously.
 Synchronous is a blocking architecture, so the execution of each operation is dependent on the
completion of the one before it.
 Synchronous, sometimes referred to as “sync,” and asynchronous, also known as “async,” are two different types of programming models.
 Synchronous programming, however, is advantageous for developers: quite simply, synchronous programming is much easier to code.
 It’s well supported among all programming languages, and as the default programming method, developers don’t have to spend time learning something new that could open the door to bugs.
6.2 Asynchronous Executions
6.2.1 Asynchronous Execution
What is Asynchronous Execution?
 Asynchronous execution is a way of executing code that allows multiple lines of code to run at the
same time.
 This means that if a line of code takes a long time to execute, other lines of code can continue to
run in the meantime.
6.2.2 Properties of Asynchronous Execution
 Asynchronous code is more complex than synchronous code, but it can be much more efficient for
certain types of programs.
 For example, if you are writing a web application that needs to handle multiple requests at the
same time, asynchronous code can allow the application to respond to each request quickly and
efficiently.

 Asynchronous is a non-blocking architecture, so the execution of one task isn't dependent on
another. Tasks can run simultaneously.
 Asynchronous programming enhances the user experience by decreasing the lag time between
when a function is called and when the value of that function is returned.
 In the real world, this translates to a faster, more seamless flow.
 For example, users want their apps to run fast, but it takes time to fetch data from an API.
 In these cases, asynchronous programming helps the app screen load faster, improving the user
experience.
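A minimal asyncio sketch of asynchronous execution; asyncio.sleep() stands in for a slow operation such as an API call. The two tasks run concurrently, so the overall wait is roughly the longest single delay rather than the sum.

```python
# Asynchronous execution with async/await: both slow tasks run concurrently.
# asyncio.sleep() stands in for a slow operation such as fetching from an API.
import asyncio
import time

async def slow_task(name, seconds):
    await asyncio.sleep(seconds)   # yields control instead of blocking
    return f"{name} done"

async def main():
    start = time.time()
    results = await asyncio.gather(slow_task("task 1", 1), slow_task("task 2", 1))
    print(results)
    print(f"elapsed: {time.time() - start:.1f}s")   # about 1 second, not 2

asyncio.run(main())
```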
6.3 Asynchronous vs synchronous programming
 Understanding how these two models differ is critical in building application programming interfaces (APIs), creating event-based architectures, and deciding how to handle long-running tasks.
 A common rule of thumb is: async for users, sync for developers.
Asynchronous vs synchronous programming:
 Async is multi-thread, which means operations or programs can run in parallel. Sync is single-thread, so only one operation or program will run at a time.
 Async is non-blocking, which means it will send multiple requests to a server. Sync is blocking: it will only send the server one request at a time and will wait for that request to be answered by the server.
 Async increases throughput because multiple operations can run at the same time. Sync is slower and more methodical.

Some of the main differences between asynchronous and synchronous code:

1. Blocking vs Non-Blocking
 Synchronous code is blocking, which means that it stops the execution of the entire program until a
line of code is finished.
 Asynchronous code is non-blocking, which means that it allows other lines of code to continue
running while a long-running line of code is executing.
2. Efficiency
 Asynchronous code can be much more efficient than synchronous code for certain types of
programs.
 For example, if you are writing a web application that needs to handle multiple requests at the
same time, asynchronous code can allow the application to respond to each request quickly and
efficiently.
3. Complexity
 Asynchronous code is more complex than synchronous code.
 Asynchronous code requires the use of callbacks, promises, or async/await functions to manage the
flow of code execution.
 This can make asynchronous code harder to read and write than synchronous code.
4. Debugging
 Debugging asynchronous code can be more difficult than debugging synchronous code.
 Asynchronous code can create race conditions, deadlocks, and other bugs that are hard to
reproduce and fix.
When to Use Asynchronous or Synchronous Execution
 Knowing when to use asynchronous or synchronous execution is essential for writing efficient and
effective programs.
Here are some general guidelines:
Use Synchronous Execution When:
 The program is simple and does not require multiple tasks to be executed at the same time.
 The program is small and easy to manage.
 The program does not require long-running tasks.
Use Asynchronous Execution When:
 The program needs to handle multiple tasks at the same time.
 The program needs to respond quickly to requests.
 The program needs to perform long-running tasks.

7. DESIGN ISSUES AND CHALLENGES

7.1 DESIGN ISSUES


 Primary issues in the design of the distributed systems included providing access to remote data in the
face of failures, file system design, and directory structure design.
Below we describe the important design issues and challenges after categorizing them as
(i) having a greater component related to systems design and operating systems design, or
(ii) having a greater component related to algorithm design, or
(iii) emerging from recent technology advances and/or driven by new applications.
 There is some overlap between these categories.
 However, it is useful to identify these categories because of the deep differences among the:
(i) The systems community,
(ii) The theoretical algorithms community within distributed computing, and
(iii) The forces driving the emerging applications and technology.
 For example, the current practice of distributed computing follows the client–server architecture to
a large degree, whereas that receives scant attention in the theoretical distributed algorithms
community.
The two major reasons for this deep divide in design practice are:
First,
 an overwhelming number of applications outside the scientific computing community of users of
distributed systems are business applications for which simple models are adequate.
Second,
 the state of the practice is largely controlled by industry standards, which do not necessarily
choose the “technically best” solution.
 The fundamental issue in the design and implementation of a DSM system is data inconsistency.
 Such inconsistency may arise from unsynchronized concurrent access.
 To solve this problem in the DSM system we need to utilize synchronization primitives such as semaphores, event counts, and so on.
 Performance is an important issue and challenge of a Distributed Software System (DSS).
 To minimize the constraints, and thus the challenges, the problems must be analyzed and solutions provided.

 In distributed software system different task scheduling algorithms are developed.
 These algorithms should be evaluated on different available task evaluation parameters for a specific
task graph which ultimately should represent the DSS.
 The best algorithm performance result should ultimately be adopted.
 This approach will minimize the challenges of DSS.
7.2 Major design issues of distributed systems

The following are some of the major design issues and considerations for distributed systems:
i) Heterogeneity:
 Heterogeneity is applied to the network, computer hardware, operating system, and
implementation of different developers.
 A key component of the heterogeneous distributed system client-server environment is
middleware.
 Middleware is a set of services that enables applications and end-user to interact with each
other across a heterogeneous distributed system.
ii) Openness:
 The openness of the distributed system is determined primarily by the degree to which new
resource-sharing services can be made available to the users.
 Open systems are characterized by the fact that their key interfaces are published.
 It is based on a uniform communication mechanism and published interface for access to shared
resources.
 It can be constructed from heterogeneous hardware and software.
iii) Scalability:
 The scalability of the system should remain efficient even with a significant increase in the
number of users and resources connected.
 It shouldn’t matter if a program has 10 or 100 nodes; performance shouldn’t vary.
 A distributed system’s scaling requires consideration of a number of elements, including size,
geography, and management.
iv) Security:

 The security of an information system has three components: confidentiality, integrity, and availability.
 Encryption protects shared resources and keeps sensitive information secret when it is transmitted.
v) Failure Handling:
 When faults occur in the hardware or the software, programs may produce incorrect results or may stop before they have completed the intended computation, so corrective measures should be implemented to handle such cases.
 Failure handling is difficult in distributed systems because the failure is partial, i.e., some components fail while others continue to function.
vi) Concurrency:
 There is a possibility that several clients will attempt to access a shared resource at the same
time. Multiple users make requests on the same resources, i.e. read, write, and update.
 Each resource must be safe in a concurrent environment. Any object that represents a shared
resource in a distributed system must ensure that it operates correctly in a concurrent
environment.
vii) Transparency:
 Transparency ensures that the distributed system should be perceived as a single entity by the
users or the application programmers rather than a collection of autonomous systems, which is
cooperating.
 The user should be unaware of where the services are located and the transfer from a local
machine to a remote one should be transparent.
viii) Performance improvement:
 With the rapid growth of parallel and distributed processing in modern computers, there is high demand for performance improvement and low-cost productivity in real-life applications.
 Moreover challenges like communication fault delay or computation fault delay may occur
because of network failure or machine failure respectively.
 Making a fault-tolerant DSS is a tough job.
 The ability to tolerate faults and continue functioning normally is required of a DSS.
 Synchronization is another important aspect in DSS because Distributed System do not have a
global clock.
 It is required that synchronization be done as per the actual real time.

ix) Scheduling issue for distributed system:
 Focuses on Scheduling problems in homogeneous and heterogeneous parallel distributed systems.
 The performance of distributed systems is affected by broadcast/multicast processing, so it is required to develop a delivery procedure that completes the processing in minimum time.
x) Controllability and Observability issues:
 Controllability and observability are two important issues in testing because they have an effect on
the capability of the test system to check the conformance of an implementation under test.
 Controllability is the capability of the Test System to force the Implementation under Test to
receive inputs in a given order.
7.3 CHALLENGES
 We start by addressing the challenges in designing distributed systems from a system-building perspective.
The major challenges are:
a) Security,
b) Maintaining consistency of data in every system,
c) Network Latency between systems,
d) Resource Allocation, or
e) Proper node balancing across multiple nodes
Some of the key challenges include:
i) object storage mechanisms;
ii) efficient object lookup and retrieval in a scalable manner;
iii) dynamic reconfiguration with nodes as well as objects joining and leaving the network randomly;
iv) replication strategies to expedite object search;
v) tradeoffs between object size latency and table sizes;
vi) anonymity, privacy, and security.
7.3.1 Distributed systems challenges from a system perspective
The following functions must be addressed when designing and building a distributed system:
i) Communication

 This task involves designing appropriate mechanisms for communication among the processes in
the network.
Some example mechanisms are:
o remote procedure call (RPC),
o remote object invocation (ROI),
o Message-oriented communication versus stream-oriented communication.
ii) Processes
Some of the issues involved are:
i) management of processes and threads at clients/servers; code migration; and
ii) the design of software and mobile agents.
iii) Naming
 Devising easy to use and robust schemes for names, identifiers, and addresses is essential for locating
resources and processes in a transparent and scalable manner.
 Naming in mobile systems provides additional challenges because naming cannot easily be tied to any
static geographical topology.
iv) Synchronization
 Mechanisms for synchronization or coordination among the processes are essential.
 Mutual exclusion is the classical example of synchronization, but many other forms of
synchronization, such as leader election are also needed.
 In addition, synchronizing physical clocks, and devising logical clocks that capture the essence of
the passage of time, as well as global state recording algorithms, all require different forms of
synchronization.
v) Data storage and access
 Schemes for data storage, and implicitly for accessing the data in a fast and scalable manner across
the network are important for efficiency.
 Traditional issues such as file system design have to be reconsidered in the setting of a distributed
system.
vi) Consistency and replication
 To avoid bottlenecks, to provide fast access to data, and to provide scalability, replication of data
objects is highly desirable.

 This leads to issues of managing the replicas, and dealing with consistency among the
replicas/caches in a distributed setting.
 A simple example issue is deciding the level of granularity (i.e., size) of data access.
vii) Fault tolerance
 Fault tolerance requires maintaining correct and efficient operation in spite of any failures of links,
nodes, and processes.
 Process resilience, reliable communication, distributed commit, checkpointing and recovery,
agreement and consensus, failure detection, and self-stabilization are some of the mechanisms used to
provide fault tolerance.
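As a small illustration of one of these mechanisms, the sketch below shows a heartbeat-style failure
detector; the timeout value and process identifiers are assumptions made only for the example.

# A minimal heartbeat-based failure-detector sketch. Each monitored process
# is expected to report a heartbeat periodically; if none is seen within
# TIMEOUT seconds, the process is only *suspected* (not known) to be faulty.
import time

TIMEOUT = 5.0                       # assumed heartbeat timeout in seconds
last_heartbeat = {}                 # process id -> time of last heartbeat

def record_heartbeat(pid):
    last_heartbeat[pid] = time.time()

def suspected(pid):
    return time.time() - last_heartbeat.get(pid, 0.0) > TIMEOUT

record_heartbeat("p1")
print(suspected("p1"))              # False: heartbeat seen recently
print(suspected("p2"))              # True : no heartbeat ever received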
viii) Security
 Distributed systems security involves various aspects of cryptography, secure channels, access
control, key management – generation and distribution, authorization, and secure group
management.
ix) Applications Programming Interface (API) and transparency
 The API for communication and other specialized services is important for the ease of use and
wider adoption of the distributed systems services by non-technical users.
7.2.2 Transparency on Implementation perspective
Transparency deals with hiding the implementation policies from the user, and can be classified as
follows:
i) Access transparency: hides differences in data representation on different systems and
provides uniform operations to access system resources.
ii) Location transparency : makes the locations of resources transparent to the users.
iii) Migration transparency: allows relocating resources without changing names.
The ability to relocate the resources as they are being accessed is relocation transparency.
iv) Replication transparency: does not let the user become aware of any replication.
v) Concurrency transparency: deals with masking the concurrent use of shared resources for the user.
vi) Failure transparency: refers to the system being reliable and fault-tolerant.
vii) Scalability and modularity :
 The algorithms, data (objects), and services must be as distributed as possible.
 Various techniques such as replication, caching and cache management, and asynchronous
processing help to achieve scalability.

7.2.3 Algorithmic challenges in distributed computing
The summarization of the key algorithmic challenges in distributed computing are:
i) Designing useful execution models and frameworks
 The interleaving model and partial order model are two widely adopted models of distributed
system executions.
 They have proved to be particularly useful for operational reasoning and the design of distributed
algorithms.
 The input/output automata model and the TLA (temporal logic of actions) are two other examples
of models that provide different degrees of infrastructure for reasoning more formally with and
proving the correctness of distributed programs.
ii) Dynamic distributed graph algorithms and distributed routing algorithms
 The distributed system is modeled as a distributed graph, and the graph algorithms form the
building blocks for a large number of higher level communication, data dissemination, object
location, and object search functions.
 The algorithms need to deal with dynamically changing graph characteristics, such as to model
varying link loads in a routing algorithm.
 The efficiency of these algorithms impacts not only the user-perceived latency but also the traffic
and hence the load or congestion in the network. Hence, the design of efficient distributed graph
algorithms is of paramount importance.
iii) Time and global state in a distributed system
 The processes in the system are spread across three-dimensional physical space.
 Another dimension, time, has to be superimposed uniformly across space.
 The challenges pertain to providing accurate physical time, and to providing a variant of time,
called logical time.
 Logical time is relative time, and eliminates the overheads of providing physical time for
applications where physical time is not required.
 More importantly, logical time can
(i) capture the logic and inter-process dependencies within the distributed program,
(ii) track the relative progress at each process.
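A common realization of logical time is Lamport's scalar clock. The following is a minimal sketch of the
standard update rules; the class and variable names are assumptions made for the example.

# A minimal Lamport (scalar) logical clock. The clock is incremented on every
# internal and send event, and a receive event advances it past the timestamp
# carried on the incoming message.
class LamportClock:
    def __init__(self):
        self.time = 0

    def internal_event(self):
        self.time += 1
        return self.time

    def send_event(self):
        self.time += 1
        return self.time            # timestamp piggybacked on the message

    def receive_event(self, msg_timestamp):
        self.time = max(self.time, msg_timestamp) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t = p1.send_event()                 # p1 sends a message carrying timestamp t
p2.receive_event(t)                 # p2's clock jumps past t, preserving causality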
iv) Synchronization/coordination mechanisms

 The processes must be allowed to execute concurrently, except when they need to synchronize to
exchange information, i.e., communicate about shared data.
 Synchronization is essential for the distributed processes to overcome the limited observation of
the system state from the viewpoint of any one process.
 The synchronization mechanisms can also be viewed as resource management and concurrency
management mechanisms to streamline the behavior of the processes that would otherwise act
independently.
v) Group communication, multicast, and ordered message delivery
 A group is a collection of processes that share a common context and collaborate on a common
task within an application domain.
 Specific algorithms need to be designed to enable efficient group communication and group
management wherein processes can join and leave groups dynamically, or even fail.
 When multiple processes send messages concurrently, different recipients may receive the
messages in different orders, possibly violating the semantics of the distributed program. Hence,
formal specifications of the semantics of ordered delivery need to be formulated, and then
implemented.
vi) Monitoring distributed events and predicates
 Predicates defined on program variables that are local to different processes are used for specifying
conditions on the global system state, and are useful for applications such as debugging, sensing
the environment, and in industrial process control. On-line algorithms for monitoring such
predicates are hence important.
 An important paradigm for monitoring distributed events is that of event streaming, wherein
streams of relevant events reported from different processes are examined collectively to detect
predicates.
 Typically, the specification of such predicates uses physical or logical time relationships.
vii) Distributed program design and verification tools
 Methodically designed and verifiably correct programs can greatly reduce the overhead of
software design, debugging, and engineering.
 Designing mechanisms to achieve these design and verification goals is a challenge.
viii) Performance

 Although high throughput is not the primary goal of using a distributed system, achieving good
performance is important. In large distributed systems, network latency (propagation and
transmission times) and access to shared resources can lead to large delays which must be
minimized.
 The user perceived turn-around time is very important.
The following are some example issues that arise in determining performance:
• Metrics
 Appropriate metrics must be defined or identified for measuring the performance of theoretical
distributed algorithms, as well as for implementations of such algorithms.
 The former would involve various complexity measures on the metrics, whereas the latter would
involve various system and statistical metrics.
• Measurement methods/tools
 As a real distributed system is a complex entity and has to deal with all the difficulties that arise in
measuring performance over a WAN/the Internet, appropriate methodologies and tools must be
developed for measuring the performance metrics.
*******************
8. MODEL OF DISTRIBUTED COMPUTATIONS: DISTRIBUTED PROGRAM
8.1 DISTRIBUTED PROGRAM
Definition: Distributed Program
 A distributed program is composed of a set of asynchronous processes that communicate by
message passing over the communication network.
 A distributed program consists of a collection of processes that work concurrently and
communicate by explicit message passing.
 Each process can access a set of variables which are disjoint from the variables that can be
changed by any other process.
 A computer program that runs within a distributed system is called a distributed program, and
distributed programming is the process of writing such programs.
A distributed program structure is:
 A distributed program is composed of a set of n asynchronous processes, p1, p2, ..., pi , ..., pn.
Example:
 An e-commerce platform that distributes different functions of the application to different
computers in its network is an example of a distributed program.
 The servers or computers host different functions, such as accepting payment from customers at
checkout.
Conditions of Distributed Process
i) Each process may run on different processor.
ii) The processes do not share a global memory and communicate solely by passing messages.
iii) These processes do not share a global clock that is instantaneously accessible to these
processes.
iv) Process execution and message transfer are asynchronous – a process may execute an action
spontaneously and a process sending a message does not wait for the delivery of the message
to be complete.
v) The global state of a distributed computation is composed of the states of the processes and the
communication channels.
vi) The state of a process is characterized by the state of its local memory and depends upon the
context.
vii) The state of a channel is characterized by the set of messages in transit in the channel.
viii) Without loss of generality, we assume that each process is running on a different processor.
ix) Let Cij denote the channel from process pi to process pj and let mij denote a message sent by pi
to pj .
x) The message transmission delay is finite and unpredictable.
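Conditions (ii) and (iv) above can be illustrated by a small sketch in which two threads stand in for
processes and interact only through message queues, so there is no shared variable and a send returns
without waiting for delivery. The thread and queue names are assumptions for the example.

# A minimal sketch of asynchronous message passing between two "processes"
# (threads here, purely for illustration). The only interaction is putting a
# message into the other side's queue; send() does not wait for delivery.
import threading, queue

inbox_p1, inbox_p2 = queue.Queue(), queue.Queue()

def send(inbox, msg):
    inbox.put(msg)                  # non-blocking: the sender does not wait

def p1():
    send(inbox_p2, "m12")           # send message m12 to p2
    print("p1 received:", inbox_p1.get())

def p2():
    print("p2 received:", inbox_p2.get())
    send(inbox_p1, "m21")           # reply with message m21

t1, t2 = threading.Thread(target=p1), threading.Thread(target=p2)
t1.start(); t2.start()
t1.join(); t2.join()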

8.1.2 Distributed Process execution


 The execution of a process consists of a sequential execution of its actions.
 The actions are atomic and the actions of a process are modeled as three types of events:
i) internal events,
ii) message send events, and
iii) message receive events.
 The execution of process pi produces a sequence of events e_i^1, e_i^2, e_i^3, …, and it is denoted by Hi, where
o Hi = (hi, →i); here hi is the set of events produced by pi, and
o →i is a binary relation that defines a linear order on these events.
o The relation →i expresses causal dependencies among the events of pi.

 The relation →msg indicates the dependency that exists due to message passing between two events.
 Let e_i^x denote the xth event at process pi.
 For a message m, let send(m) and rec(m) denote its send and receive events, respectively.
 The events at a process are linearly ordered by their order of occurrence.
 The execution of process pi produces a sequence of events
e_i^1, e_i^2, ..., and is denoted by Hi where
Hi = (hi, →i),
hi is the set of events produced by pi, and →i is the binary relation that defines a linear order on these events.
8.1.3 Occurrence of events
 The occurrence of events changes the states of respective processes and channels, thus
causing transitions in the global system state.
 An internal event changes the state of the process at which it occurs.
 A send event changes the state of the process that sends the message and the state of the
channel on which the message is sent.
 A receive event changes the state of the process that receives the message and the state of the
channel on which the message is received.

Fig: Space–time diagram of a distributed execution

8.1.4 Message distribution ordering


i) Casual Precedence Relations
 Causal message ordering is a partial ordering of messages in a distributed computing environment.
 It is the delivery of messages to a process in the order in which they were transmitted to that
process.
 It places a restriction on communication between processes by requiring that if the transmission of
message mi to process pk necessarily preceded the transmission of message mj to the same
process, then the delivery of these messages to that process must be ordered such that mi is
delivered before mj.
ii) Happen Before Relation

 The partial ordering obtained by generalizing the relationship between two process is called as
happened-before relation or causal ordering or potential causal ordering.
 This term was coined by Lamport. Happens-before defines a partial order of events in a distributed
system.
 Some events cannot be ordered relative to one another. We say A → B if A happens before B.
A → B is defined using the following rules:
Local ordering: A and B occur on the same process and A occurs before B.
Messages: send(m) → receive(m) for any message m.
Transitivity: e → e'' if e → e' and e' → e''.

Ordering can be based on two situations:
1. If two events occur in same process then they occurred in the order observed.
2. During message passing, the event of sending message occurred before the event of receiving it.
Lamport's happened-before relation, denoted →, is defined as follows:
 a→b, if a and b are events in the same process and a occurred before b.
 a→b, if a is the event of sending a message m in a process and b is the event of the same message
m being received by another process.
 If a→b and b→c, then a→c. Lamports law follow transitivity property.
When any of the above conditions is satisfied, the events a and b are causally related (a → b). Consider
two events c and d; if both c → d and d → c are false, then c and d are not causally related, and they are
said to be concurrent events, denoted as c||d.

Fig 1: Communication between processes

 Fig 1 shows the communication of messages m1 and m2 between three processes p1, p2 and p3. Here a,
b, c, d, e and f are events.
 It can be inferred from the diagram that a → b; c → d; e → f; b → c; d → f; a → d; a → f; b → d; b → f.
 Also, a||e and c||e are concurrent events.
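The happened-before relation can also be tested mechanically using vector clocks; the sketch below is a
minimal illustration, and the event timestamps shown are assumed values rather than ones derived from
Fig 1.

# A minimal vector-clock test for happened-before. Each event carries a vector
# with one entry per process; a happened before b iff vc(a) <= vc(b)
# component-wise and vc(a) != vc(b).
def happened_before(va, vb):
    return all(x <= y for x, y in zip(va, vb)) and va != vb

def concurrent(va, vb):
    # Neither a -> b nor b -> a holds.
    return not happened_before(va, vb) and not happened_before(vb, va)

# Assumed timestamps for three processes (illustrative values only).
a = [1, 0, 0]                       # an event at p1
b = [2, 1, 0]                       # an event at p2 after receiving p1's message
e = [0, 0, 1]                       # an event at p3 with no causal link to a

print(happened_before(a, b))        # True : a -> b
print(concurrent(a, e))             # True : a || e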

8.2 A MODEL OF DISTRIBUTED EXECUTIONS / COMPUTATIONS

8.2.1 Distributed Computing System Models


The types of Distributed Computing System Models are:
a) Physical Model
b) Architectural Model

c) Layered model
d) Micro-services model

a) Physical Model
 A physical model is basically a representation of the underlying hardware elements of a
distributed system.
 It encompasses the hardware composition of a distributed system in terms of computers and
other devices and their interconnections.
 It is primarily used to design, manage, implement and determine the performance of a distributed
system.
A physical model majorly consists of the following components:
i) Nodes :–
 Nodes are the end devices that have the ability of processing data, executing tasks and
communicating with the other nodes.
 These end devices are generally the computers at the user end or can be servers, workstations etc.
 Nodes provision the distributed system with an interface in the presentation layer that enables the
user to interact with other back-end devices, or nodes, that can be used for storage and database
services, or processing, web browsing etc.
 Each node has an Operating System, execution environment and different middleware
requirements that facilitate communication and other vital tasks.
ii) Links :–
 Links are the communication channels between different nodes and intermediate devices. These
may be wired or wireless.
 Wired links or physical media are implemented using copper wires, fibre optic cables etc. The
choice of the medium depends on the environmental conditions and the requirements.
 Generally, physical links are required for high performance and real-time computing.

Different connection types that can be implemented are as follows:


a) Point-to-point links – It establishes a connection and allows data transfer between only two nodes.
b) Broadcast links – It enables a single node to transmit data to multiple nodes simultaneously.

c) Multi-Access links – Multiple nodes share the same communication channel to transfer data.
Requires protocols to avoid interference while transmission.
iii) Middleware :–
 These are the softwares installed and executed on the nodes. By running middleware on each
node, the distributed computing system achieves a decentralized control and decision-
making.
 It handles various tasks like communication with other nodes, resource management, fault
tolerance, synchronization of different nodes and security to prevent malicious and
unauthorized access.
iv) Network Topology :–
 This defines the arrangement of nodes and links in the distributed computing system. The
most common network topologies that are implemented are bus, star, mesh, ring or hybrid.
 Choice of topology is done by determining the exact use cases and the requirements.
v) Communication Protocols :–
 Communication protocols are the set of rules and procedures for transmitting data over the
links.
 Examples of these protocols include TCP, UDP, HTTPS, MQTT etc.
 These allow the nodes to communicate and interpret the data.

b) Architectural Model

 Architectural model in distributed computing system is the overall design and structure of the
system, and how its different components are organised to interact with each other and provide the
desired functionalities.
 It is an overview of the system, on how will the development, deployment and operations take
place.
 Construction of a good architectural model is required for efficient cost usage, and highly
improved scalability of the applications.

The key aspects of architectural model are –


i) Client-Server model – It is a centralised approach in which the clients initiate requests for services
and servers respond by providing those services.
ii) Peer-to-peer model – It is a decentralised approach in which all the distributed computing nodes,
known as peers, are all the same in terms of computing capabilities and can both request as well as
provide services to other peers.
c) Layered model: –
 It involves organizing the system into multiple layers, where each layer will provision a specific
service.
 Each layer communicates with the adjacent layers using certain well-defined protocols without
affecting the integrity of the system.
 A hierarchical structure is obtained where each layer abstracts the underlying complexity of lower
layers.
d) Micro-services model: –
 In this model, a complex application or task is decomposed into multiple independent services
running on different servers.
 Each service performs only a single function and is focussed on a specific business-capability. This
makes the overall system more maintainable, scalable and easier to understand.
 Services can be independently developed, deployed and scaled without affecting the ongoing
services.
9. MODELS OF COMMUNICATION NETWORKS
9.1 Introduction to Models of Communication Networks

 The communications model underlying the network middleware is the most important factor in
how applications communicate.
 The communications model impacts the performance, the ease of accomplishing different
communication transactions, the nature of detecting errors, and the robustness to different error
conditions.
 Unfortunately, there is no “one size fits all” approach to distributed applications. Different
communications models are better suited to handle different classes of application domains.

9.2 Models of Communication Networks


The main types of network communications models:
a) Point-to-point
b) Client-server

c) Publish-subscribe
a) Point-to-point model:
 Point-to-point is the simplest form of communication, as illustrated in Figure 8.
 The telephone is an example of an everyday point-to-point communications device.
 To use a telephone, you must know the address (phone number) of the other party. Once a
connection is established, you can have a reasonably high-bandwidth conversation.
 However, the telephone does not work as well if you have to talk to many people at the same time.
 The telephone is essentially one-to-one communication.
 TCP is a point-to-point network protocol designed in the 1970s. While it provides reliable, high-
bandwidth communication, TCP is cumbersome for systems with many communicating nodes.
Figure 8 Point-to-Point

Point-to-point is one-to-one communication.
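A point-to-point exchange can be sketched with a single TCP connection using Python's standard socket
module; the host, port, and payloads below are assumptions made for the example.

# A minimal point-to-point (one-to-one) exchange over TCP. One side listens,
# the other connects; data flows only between these two endpoints.
import socket, threading, time

def server():
    with socket.create_server(("localhost", 9000)) as srv:
        conn, _ = srv.accept()
        with conn:
            print("server received:", conn.recv(1024).decode())
            conn.sendall(b"ack")

t = threading.Thread(target=server)
t.start()
time.sleep(0.2)                     # give the listener a moment to start

with socket.create_connection(("localhost", 9000)) as sock:
    sock.sendall(b"hello")          # one-to-one message to the known address
    print("client received:", sock.recv(1024).decode())

t.join()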


b) Client-server model:

 To address the scalability issues of the Point-to-Point model, developers turned to the Client-
Server model.
 Client-server networks designate one special server node that connects simultaneously to many
client nodes, as illustrated in Figure 9.
Figure 9 Client-Server

Client-server is many-to-one communications.


 Client-server is a "many-to-one" architecture. Ordering pizza over the phone is an example of
client-server communication.
 Clients must know the phone number of the pizza parlor to place an order. The parlor can handle
many orders without knowing ahead of time where people (clients) are located.
 After the order (request), the parlor asks the client where the response (pizza) should be sent. In the
client-server model, each response is tied to a prior request.
 As a result, the response can be tailored to each request. In other words, each client makes a
request (order) and each reply (pizza) is made for one specific client in mind.
 The client-server network architecture works best when information is centralized, such as in
databases, transaction processing systems, and file servers.
 However, if information is being generated at multiple nodes, a client-server architecture requires
that all information be sent to the server for later redistribution to the clients.
 This approach is inefficient and precludes deterministic communications, since the client does not
know when new information is available.
 The time between when the information is available on the server, and when the client asks and
receives it adds a variable latency to the system.

c) Publish-subscribe model:
 In the publish-subscribe communications model (Figure 10), computer applications (nodes)
“subscribe” to data they need and “publish” data they want to share.
 Messages pass directly between the publisher and the subscribers, rather than moving into and out
of a centralized server. Most time-sensitive information intended to reach many people is sent by a
publish-subscribe system.
 Examples of publish-subscribe systems in everyday life include television, magazines, and
newspapers.
 Publish-subscribe communication architectures are good for distributing large quantities of time-
sensitive information efficiently, even in the presence of unreliable delivery mechanisms. This
direct and simultaneous communication among a variety of nodes makes publish-subscribe
network architecture the best choice for systems with complex time-critical data flows.

While the publish-subscribe model provides system architects with many advantages, it may not be the
best choice for all types of communications, including:
 File-based transfers (alternate solution: FTP)
 Remote Method Invocation (alternate solutions: CORBA, COM, SOAP)
 Connection-based architectures (alternate solution: TCP/IP)
 Synchronous transfers (alternate solution: CORBA)
Figure 10 Publish-Subscribe

Publish-subscribe is many-to-many communications.
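A tiny in-process sketch of the publish-subscribe pattern is given below; the broker class and topic name
are assumptions for illustration and do not correspond to any real middleware API.

# A minimal in-process publish-subscribe broker. Publishers and subscribers
# never address each other directly; they are coupled only through a topic.
class Broker:
    def __init__(self):
        self.subscribers = {}        # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, data):
        # Deliver to every current subscriber of the topic (many-to-many).
        for callback in self.subscribers.get(topic, []):
            callback(data)

broker = Broker()
broker.subscribe("sensor/temperature", lambda v: print("node A got", v))
broker.subscribe("sensor/temperature", lambda v: print("node B got", v))
broker.publish("sensor/temperature", 21.5)   # both subscribers receive it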

 There are several models of the service provided by communication networks, namely, FIFO,
Non-FIFO, and causal ordering.
 In the FIFO model, each channel acts as a first-in first-out message queue and thus, message
ordering is preserved by a channel.

The three major types of communication models in distributed systems are:
1) FIFO (first-in, first-out): each channel acts as a FIFO message queue.
2) Non-FIFO (N-FIFO): a channel acts like a set in which a sender process adds messages and
receiver removes messages in random order.
3) Causal Ordering (CO): message delivery respects the causal (happened-before) ordering of the
corresponding send events.

The relation between the three models is given by CO ⊂ FIFO ⊂ Non-FIFO.


A system that supports the causal ordering model satisfies the following property:
CO: for any two messages mij and mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).
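As a small illustration of how a channel can enforce the FIFO model over an unordered network, the
sketch below buffers out-of-order messages using per-channel sequence numbers; the class name and the
sequence values are assumptions made for the example.

# A minimal sketch of enforcing FIFO delivery on one channel: the sender numbers
# its messages, and the receiver holds back anything that arrives ahead of the
# next expected sequence number.
class FifoReceiver:
    def __init__(self):
        self.expected = 0            # next sequence number to deliver
        self.buffer = {}             # seq -> message, held until deliverable

    def on_receive(self, seq, msg):
        self.buffer[seq] = msg
        delivered = []
        # Deliver as many consecutive messages as possible.
        while self.expected in self.buffer:
            delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return delivered

r = FifoReceiver()
print(r.on_receive(1, "m1"))         # []           : m1 arrived early, held back
print(r.on_receive(0, "m0"))         # ['m0', 'm1'] : delivered in FIFO order

Causal ordering requires strictly more bookkeeping (for example, vector timestamps) than the per-channel
counters shown here.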
Thus, the models of communication networks play a supporting role in the distributed computing
environment.
***************************
10. GLOBAL STATE OF A DISTRIBUTED SYSTEM

10.1 Introduction to Global State


 Distributed Snapshot represents a state in which the distributed system might have been in.
 A snapshot of the system is a single configuration of the system.

What is Global state in DS?

 The global state of a distributed system is a collection of the local states of its components,
namely, the processes and the communication channels.
 The state of a process at any time is defined by the contents of processor registers, stacks, local
memory, etc. and depends on the local context of the distributed application.
 The state of a channel is given by the set of messages in transit in the channel.

10.2 Inconsistent and Consistent global states


 A message cannot be received if it was not sent; that is, the state should not violate causality.
 Such states are called consistent global states and are meaningful global states. Inconsistent global
states are not meaningful, in the sense that a distributed system can never be in an inconsistent state.

Example:
Let LS_i^x denote the state of process pi after the occurrence of event e_i^x and before the event e_i^(x+1).

A global state GS1 consisting of local states {LS_1^1, LS_2^3, LS_3^3, LS_4^2} is inconsistent
because the state of p2 has recorded the receipt of message m12, whereas the state of p1 has not
recorded its send.

A global state GS2 consisting of local states {LS_1^2, LS_2^4, LS_3^4, LS_4^2} is consistent; all the channels
are empty except C21, which contains message m21.

Transitless Global state
 A global state is transitless iff all channels are recorded as empty.
Strongly consistent global state
 A global state is strongly consistent iff it is transitless as well as consistent. The global state
consisting of local states {LS_1^2, LS_2^3, LS_3^4, LS_4^2} is strongly consistent.

10.3 Issues in recording a global state


I1: How to distinguish between the messages to be recorded in the snapshot (either in a channel
state or a process state) from those not to be recorded.

 The answer to this comes from conditions C1 and C2 as follows:


o Any message that is sent by a process before recording its snapshot must be recorded
in the global snapshot (from C1).
o Any message that is sent by a process after recording its snapshot must not be recorded
in the global snapshot (from C2).

I2: How to determine the instant when a process takes its snapshot.
The answer to this comes from condition
C2 as follows:
A process pj must record its snapshot before processing a message mij that was sent by process pi after
recording its snapshot.

Cuts of a Distributed Computation


 In the space–time diagram of a distributed computation, a zigzag line joining one arbitrary point on
each process line is termed a cut in the computation.
 Such a line slices the space–time diagram, and thus the set of events in the distributed computation,
into a PAST and a FUTURE.
 The PAST contains all the events to the left of the cut and the FUTURE contains all the events to the
right of the cut.

 For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the PAST and FUTURE of C,
respectively.
 Every cut corresponds to a global state and every global state can be graphically represented as a cut
in the computation’s space–time diagram.

 The state of a channel is given by the set of messages in transit in the channel.
 The state of a channel is difficult to state formally because a channel is a distributed entity
and its state depends upon the states of the processes it connects.

Let SC_ij denote the state of a channel Cij: it is the set of messages sent by pi along Cij whose receipt
has not been recorded at pj (i.e., the messages in transit).

A distributed snapshot should reflect a consistent state.


A global state is consistent if it could have been observed by an external observer.

For a recorded global state to be meaningful, all its constituent states must be consistent:


 If we have recorded that a process P has received a message from a process Q, then we
should have also recorded that process Q had actually sent that message.

 Otherwise, the snapshot will contain recordings of messages that have been received but
never sent.
 The reverse condition (Q has sent a message that P has not received) is allowed.
 The notion of a global state can be graphically represented by what is called a cut.
 A cut represents the last event that has been recorded for each process.
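The consistency condition can be checked mechanically. The sketch below represents a cut by the number
of events recorded at each process and flags any message whose receive lies inside the cut while its send
lies outside it; the event positions used are assumptions made for the example.

# A minimal consistent-cut check. A cut is given by the number of events
# recorded at each process. A message makes the cut inconsistent if its receive
# event is inside the cut while its send event is outside it.
# Messages are tuples (sender, send_event_index, receiver, recv_event_index),
# with event indices starting at 1.
def is_consistent(cut, messages):
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_cut = recv_idx <= cut[receiver]
        sent_in_cut = send_idx <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False             # a receive is recorded without its send
    return True

# Assumed computation: p1's 2nd event sends m12, received as p2's 3rd event.
messages = [("p1", 2, "p2", 3)]
print(is_consistent({"p1": 1, "p2": 3}, messages))   # False: inconsistent cut
print(is_consistent({"p1": 2, "p2": 3}, messages))   # True : consistent cut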

 The history of each process pi is given by the sequence of its events, h_i = (e_i^1, e_i^2, e_i^3, ...).

 Each event is either an internal action of the process or the sending or receiving of a message over a
channel. We denote by s_i^k the state of process pi immediately before the kth event occurs.

 The state s_i in the global state S corresponding to the cut C is that of pi immediately after the
last event processed by pi in the cut, e_i^(c_i).
 The set of events {e_i^(c_i), i = 1, ..., n} is called the frontier of the cut.

********************************************
