
Unit - 2

Communication in Distributed Systems


Models of Distributed System:
1. Physical Model
2. Architectural Model
3. Fundamental Model
Physical Model:
• Explicit way to describe the system
• Captures the h/w composition of the system in terms of the computers and their interconnecting
networks
Architectural Model:
• Describes the system in terms of the computational and communication tasks performed by its
computational elements ( individual computers)
Fundamental Model:
• Takes an abstract perspective in order to examine individual aspects of a distributed system
• Types – interaction model, failure model, security model
Physical Model
• It is a representation of the underlying hardware elements of a distributed system
that abstracts away from specific details of the computer and networking
technologies employed
Baseline physical model:
A distributed system is defined as one in which h/w and s/w components
located at networked computers communicate and coordinate their actions only by
passing messages
Generations of distributed systems:
1. Early distributed systems
2. Internet-scale distributed systems
3. Contemporary distributed systems
Early distributed systems: (1970 – 1980)
• They consisted of between 10 and 100 nodes interconnected by a local area network
• They had limited Internet connectivity
• They supported a small range of services ( shared local printer, file server, email, file
transfer)
• Systems were largely homogeneous and openness was not a concern
Internet-scale distributed systems: (1990)
• Larger scale distributed system
• Emerged in response to the dramatic growth of the Internet
• They exploited the infrastructure offered by the Internet to become global
• They incorporated large number of nodes and provide distributed systems services for
global organizations and across organizational boundaries
• Systems were largely heterogeneous with respect to networks, computer architecture,
operating systems and languages employed and development teams involved
• Emphasis was on open standards and associated middleware technologies
Contemporary distributed systems:
• In the earlier generations, nodes were typically desktop computers and were therefore
Static (remaining in one physical location for extended periods)
Discrete (not embedded within other physical entities) and
Autonomous (independent of other computers in terms of their physical infrastructure)
• Contemporary distributed systems move away from these assumptions, as reflected in the significant developments below

Significant developments in physical model:


• The emergence of mobile computing has led to the need for added capabilities such as service
discovery and support for spontaneous interoperation
• The emergence of ubiquitous computing has led to a move from discrete nodes to architectures
where computers are embedded in everyday objects and in the surrounding environment
• The emergence of cloud computing and in particular cluster architectures has led to a move from
autonomous nodes performing a given role to pools of nodes that together provide a given
service
Distributed systems of systems:
A system of systems can be defined as a complex system consisting of a series of
subsystems that are systems in their own right and that come together to perform a particular task
or tasks.
Architectural models
A three-stage approach:
• Architectural elements
• Architectural patterns
• Middleware platform

Architectural elements:
To understand the fundamental building blocks of a distributed system, it is necessary to consider 4
key questions
1. What are the entities that are communicating in the distributed system?
2. How do they communicate, or, more specifically, what communication paradigm is used?
3. What roles and responsibilities do they have in the overall architecture?
4. How are they mapped on to the physical distributed infrastructure (what is their placement)?
Architectural elements
Communicating entities:
• System oriented perspective
• Problem oriented perspective
System oriented perspective :
Communicating entities – processes, nodes, threads
Problem oriented perspective:
communicating entities – objects, components, web services
Architectural elements
Communication paradigm:
- 3 types of communication paradigm
1. Interprocess communication
2. Remote invocation
3. Indirect communication
Interprocess communication:
- Provides low level support for communication between processes in DS
Ex. Message passing primitives, socket programming, multicast communication
Remote Invocation:
- Most common communication paradigm in DS
- Based on two way exchange between communicating entities in a DS
- Results in calling of a remote operation, procedure or method
Ex. Request-reply protocols, remote procedure calls, remote method invocation
Architectural elements
Indirect communication:
- Communication is through third entity
- Allows a strong degree of decoupling between senders and receivers
Ex. Group communication, Publish-subscribe systems, Message queues, Tuple spaces, Distributed
shared memory
Group communication – one to many communication, recipients elect to receive messages sent to a
group by joining the group
Publish-subscribe systems – provides an intermediary service that efficiently ensures information
generated by producer is routed to consumers who desire this information
Message queues - point to point service , producer processes can send messages to a specified
queue and consumer processes can receive message from the queue
Tuple spaces – processes can place arbitrary items of structured data, called tuples , in a persistent
tuple space and other processes can read or remove such tuples from tuple space by specifying
patterns of interest
Distributed shared memory – provides an abstraction for sharing data between processes that do
not share physical memory
Roles and Responsibilities:
From the role of individual processes there are two architectural styles
1. Client-server architecture
2. Peer-to-peer architecture
Client-Server Architecture:
• Most important and most widely employed architecture
The roles of client and server are played by processes
• Client processes interact with individual server processes in potentially separate host computers
in order to access the shared resources that they manage
• Servers may in turn be clients of other servers (ex. Web servers are clients to DNS service)
• Programs that run in the background at a search engine site and use HTTP requests to access
web servers throughout the Internet are called web crawlers
• A search engine is both a server and a client
- it responds to queries from browser clients
- it runs web crawlers that act as clients of other web servers
Peer-to-peer architecture :
• In this architecture all of the processes involved in a task or activity play similar roles and interact
cooperatively as peers without any distinction between client and server processes or computers
on which they run.
• All participating processes run the same program and offer the same set of interfaces to each
other
• The aim of peer-to-peer architecture is to exploit the resources in a large number of participating
computers
Placement:
• The question of where to place a given client or server in terms of machines and processes within
machines is a matter of careful design
• Placement needs to take into account
- the patterns of communication between entities,
- the reliability of given machines and their current loading,
- the quality of communication between different machines and so on.
Placement Strategies:
1. Mapping of services to multiple servers
2. Caching
3. Mobile code
4. Mobile agents
Mapping of services to multiple servers:

Services may be implemented as several server processes in separate host computers interacting as
necessary to provide a service to client processes.
Caching:
• A cache is a store of recently used data objects that is closer to one client or a
particular set of clients than the objects themselves
• When new object is received from a server it is added to the local cache store,
replacing some existing objects if necessary
• When an object is needed by a client process, the caching service first checks the
cache and supplies the object from there if an up-to-date copy is available
• If not, an up-to-date copy is fetched.
Location of caches
- co-located with each client
- Located in a proxy-server
Mobile code:
Ex. Applets
The user running a browser selects a link to an applet whose code is stored on a web server. The
code is downloaded to the browser and runs there, which gives good interactive response since it
does not suffer from network delays.
Mobile code is a potential security threat to the local resources in the destination computer, so
browsers give applets only limited access to local resources.
Mobile agents (ex. Worm programs)
• It is a running program that travels from one computer to another in a network carrying out a task
on someone’s behalf (collecting information, returning results)
• Mobile agents might be used to install and maintain software on the computers within an
organization
• Mobile agents are a potential security threat to the resources in computers that they visit
• The environment receiving a mobile agent should decide which of the local resources it should be
allowed to use, based on the identity of the user on whose behalf the agent is acting.
Architectural pattern
Key architecture pattern
• Layering
• Tiered architecture
Layering:
• Complex system is partitioned into a number of layers
• Each layer makes use of the services offered by the layer below
• A given layer therefore offers a software abstraction, with the higher layers being unaware of the
implementation details, or indeed of any other layers beneath them
• In distributed systems services are organized into layers
Platform:
• Platform for distributed systems and applications consists of the lowest level hardware and
software layers.
• These low level layers provide services to the layers above them
• The layers are implemented independently in each computer
• The system's programming interface facilitates communication and coordination between
processes.
Middleware:
• It is a layer of software whose purpose is to mask heterogeneity and to provide a convenient
programming model to application programmers.
• It is represented by processes or objects in a set of computers that interact with each other to
implement communication and resource sharing support for distributed applications
• It provides useful building blocks for the construction of software components that can work with
one another in a distributed system.
• It raises the level of the communication activities of application programs.
Tiered architecture:
• Tiering organizes the functionality of a given layer and places this functionality into appropriate
servers and, as a secondary consideration, on to physical nodes.
• The functional decomposition of a given application consists of
 Presentation logic – concerned with handling user interaction and updating the view of
the application as presented to the user
 Application logic – concerned with the detailed application-specific processing associated
with the application
 Data logic – concerned with the persistent storage of the application's data, typically in a DBMS
Types of tiered architecture:
1. Two tier architecture
2. Three tier architecture
Two tier architecture:

• In the two-tier architecture the three aspects are partitioned into two processes, namely the client and
the server
• The application logic is split, with some of it residing in the client while the remainder resides in the
server
• To invoke an operation, only one message exchange is needed
Three tier architecture:

• In the three-tier architecture there is a one-to-one mapping from logical elements to physical servers
• Each tier has a well defined role
• It increases the complexity of managing three servers
• It also increases the network traffic and latency
Role of AJAX:
• It enables the development and deployment of major interactive web applications
• It enables JavaScript front-end programs to request new data directly from server programs
• Any data items can be requested and the current page is updated selectively to show the new values
Thin client:
• It refers to a software layer that supports a window-based user interface that is local to the user
while executing application programs or, more generally, accessing services on a remote computer
• It increases delays, due to network and operating system latencies, particularly for highly interactive graphics

Virtual network computing (VNC):


• It provides remote access to user interfaces
• VNC client interacts with VNC server through a VNC protocol
Other patterns:
 Proxy – it is a recurring pattern in DS designed to support location transparency. It offers exactly the same
interface as the remote object. The programmer makes calls only on the proxy object
 Brokerage – it is an architectural pattern that supports interoperability. It consists of the service provider,
the service requester and the service broker
 Reflection – it is a pattern used in DS to support introspection (dynamic discovery of properties) and
intercession (the ability to dynamically modify structure/behaviour). The standard service interfaces are available
at the base level, but a meta-level interface is also available to provide access to the components and their
parameters involved in the realization of the services.
Associated middleware solutions
The task of the middleware is to
 provide programming abstractions for the development of DS
 abstract the heterogeneity in the underlying infrastructure
 promote interoperability and portability
Categories of middleware:
Interaction model:
DS consists of many processes interacting in complex ways
Ex.1 Many server processes may cooperate with one another to provide a service
Ex.2 A set of peer processes may cooperate with one another to achieve a common goal

The interacting processes perform all activities in a DS.


Each process has its own state, consisting of the set of data that it can access and update.
The state belonging to each process is completely private.

Factors affecting the interacting processes:


• Communication
• Time
Performance of communication channels:
Communication over a computer network has the following performance characteristics related to
latency, bandwidth and jitter
Latency: it is the delay between the start of a message’s transmission from one process and the
beginning of its receipt by another process. It includes
• the time taken by the first of a string of bits transmitted through a network to reach
its destination
• the delay in accessing the network
• the time taken by the OS communication services at both the sending and receiving
processes
Bandwidth : it is the total amount of information that can be transmitted over the network in a
given time
Jitter: it is the variation in the time taken to deliver a series of messages. It is mainly relevant to
multimedia data
Computer clocks and timing events:
Each computer in a DS has its own internal clock, which can be used by local
processes to obtain the value of the current time.
Two processes running on different computers can each associate timestamps with
their events
Even if two processes read their clocks at the same time, their local clocks may
supply different time values. This is because computer clocks drift from perfect time
and more importantly from one another.
Drift- deviation
Types of interaction models based on time:
• Synchronous DS
time to execute each step of a process has known upper and lower bound
each message transmitted over a channel is received within a known bounded time
each process has a local clock whose drift rate from real time has a known bound
• Asynchronous DS – there are no bounds on
process execution time
message transmission delay
clock drift rate
Failure Model:
It defines the ways in which failure may occur in order to provide an understanding of the
effects of failures
Types of failures:
• Omission failures
• Arbitrary failures
• Timing failures
Omission failure:
It refers to cases when a process or communication channel fails to perform actions that it
is supposed to perform
Arbitrary failures(Byzantine failure)
it is used to describe the worst possible failure semantics, in which any type of error may
occur.
Timing failure:
it is applicable in synchronous DS where time limits are set on process execution time,
message delivery time and clock drift rate.
Masking failure:
A service masks a failure either by hiding it altogether or by converting it into a more
acceptable type of failure.
Reliability of one-to-one communication:
the term reliable communication is defined in terms of validity and integrity as follows
Validity: Any message in the outgoing message buffer is eventually delivered to the incoming
message buffer.
Integrity: the message received is identical to the one sent, and no messages are delivered twice

Threat to Integrity:
Retransmission of messages
Injection of spurious messages by malicious users
Security Model
The security of a DS can be achieved by
• Securing the processes and the channels used for their interaction
• Protecting the objects ( encapsulate against unauthorized access)
Protecting objects:
The server manages a collection of objects on behalf of some users
The users can run a client program that send invocations to the server
to perform operations on the objects
The server carries out the operations specified in each invocation and
sends the result to the client
Objects are intended to be used differently by different users
Some objects may hold a user’s private data while others may hold shared data
Access rights are used to specify who is allowed to perform operations on an object.
Each invocation and each result is associated with an authority called a principal
The principal may be a user or a process
The server verifies the identity of the principal behind each invocation and checks whether they
have sufficient access rights
The client may check the identity of the principal behind the server to ensure that the result comes
from the correct server.

Securing processes and their interactions:


The enemy: (Adversary) An enemy is capable of sending any messages to any process and reading
or copying any message sent between a pair of processes
Threats from a potential enemy include
1. Threat to processes
2. Threat to communication channels
Threat to processes:
• a process that is designed to handle incoming requests may receive a message from any
other process in the DS and it can’t determine the identity of the sender
• an enemy can generate a message with a forged source address
• the lack of reliable knowledge of the source of a message is a threat to the correct
functioning of both servers and clients. Ex. Spoofing the mail server
Threats to communication channels:
• An enemy can copy, alter or inject messages as they travel across the network and its intervening
gateways.
• These attacks present a threat to the privacy and integrity of information as it travels over the
network and to the integrity of the system.
• These threats are overcome by the use of secured channels which are based on cryptography and
authentication
Defeating security threats:
Cryptography:
• It is the science of keeping message secure.
• It is based on encryption algorithms that use secret keys to transform data in a manner that can
only be reversed with the knowledge of the corresponding decryption key.
Authentication:
Authentication provides the identity of the sender.
Secure channels:
• It is a communication channel connecting a pair of processes, each of which acts on behalf of a
principal.
• It has the following properties
- each of the processes knows the identity of the principal on whose behalf the other
process is executing
- it ensures the privacy and integrity of data transmitted across it
- each message includes a physical or logical time stamp to prevent messages being
replayed or reordered.
Other possible threats from an enemy:
• Denial of service
• Mobile code
The characteristics of interprocess communication
Message passing between a pair of processes is supported by two message communication operations:
send and receive.
To communicate, one process sends a message to a destination and another process at the destination
receives the message. This activity involves the communication of data from the sending process to the
receiving process and may involve the synchronization of the two processes.

Synchronous and asynchronous communication


In the synchronous form of communication,
• both send and receive are blocking operations.
• Whenever a send is issued the sending process (or thread) is blocked until the corresponding
receive is issued.
• Whenever a receive is issued by a process (or thread), it blocks until a message arrives.
In the asynchronous form of communication,
• The send operation is non-blocking, in that the sending process is allowed to proceed as soon as
the message has been copied to a local buffer, and the transmission of the message proceeds in
parallel with the sending process.
• The receive operation can have blocking and non-blocking variants. In the non-blocking variant,
the receiving process proceeds with its program after issuing a receive operation.
Message destinations
Messages are sent to (Internet address, local port) pairs. A local port is a message destination within a
computer.
Processes may use multiple ports to receive messages. Any process that knows the number of a port can send a
message to it. Servers generally publicize their port numbers for use by clients.
If the client uses a fixed Internet address to refer to a service, then that service must always run on the same
computer for its address to remain valid.

Reliability
Reliable communication is defined in terms of validity and integrity.
A point-to-point message service can be described as reliable if messages are guaranteed to be delivered
despite a ‘reasonable’ number of packets being dropped or lost.
For integrity, messages must arrive uncorrupted and without duplication.

Ordering
Some applications require that messages be delivered in sender order – that is, the order in which they were
transmitted by the sender.
Sockets
Both forms of communication (UDP and TCP) use the socket abstraction, which provides an
endpoint for communication between processes.

For a process to receive messages, its socket must be bound to a local port and one of the Internet
addresses of the computer on which it runs.
Processes may use the same socket for sending and receiving messages, but a process cannot share
ports with other processes on the same computer.
Java API for Internet addresses
Java provides a class, InetAddress, that represents Internet addresses. Users of this class refer to
computers by Domain Name System (DNS) hostnames.
For example, to get an object representing the Internet address of the host whose DNS name is
bruno.dcs.qmul.ac.uk, use:
InetAddress aComputer = InetAddress.getByName("bruno.dcs.qmul.ac.uk");
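
A minimal sketch of how this lookup might be used in a complete program (the hostname is the illustrative one above; a failed DNS lookup raises the checked UnknownHostException):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class LookupExample {
    public static void main(String[] args) {
        try {
            // getByName performs a DNS lookup and returns an InetAddress object
            InetAddress aComputer = InetAddress.getByName("bruno.dcs.qmul.ac.uk");
            System.out.println(aComputer.getHostAddress());   // prints the dotted-decimal IP address
        } catch (UnknownHostException e) {
            System.err.println("DNS lookup failed: " + e.getMessage());
        }
    }
}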

UDP datagram communication


A datagram sent by UDP is transmitted from a sending process to a receiving process without
acknowledgement or retries. If a failure occurs, the message may not arrive.

A server will bind its socket to a server port – one that it makes known to clients so that they can send
messages to it.

A client binds its socket to any free local port. The receive method returns the Internet address and
port of the sender, in addition to the message, allowing the recipient to send a reply.
The following are some issues relating to datagram communication:
i. Message size: The receiving process needs to specify an array of bytes of a particular size in which
to receive a message. If the message is too big for the array, it is truncated on arrival.

ii. Blocking: Sockets normally provide non-blocking sends and blocking receives for datagram
communication. The send operation returns when it has handed the message to the underlying
UDP and IP protocols.
The method receive blocks until a datagram is received, unless a timeout has been set on the socket.
When a server receives a message from a client, the message may specify work to do, in which
case the server will use separate threads to do the work and to wait for messages from other
clients.

iii. Timeouts: In some programs, it is not appropriate that a process that has invoked a receive
operation should wait indefinitely in situations where the sending process may have crashed or the
expected message may have been lost. To allow for such requirements, timeouts can be set on
sockets.
iv. Receive from any: The receive method does not specify an origin for messages. Instead, an
invocation of receive gets a message addressed to its socket from any origin.

Failure model for UDP datagrams


UDP datagrams suffer from the following failures:
i. Omission failures: Messages may be dropped occasionally, either because of a checksum error
or because no buffer space is available at the source or destination.
ii. Ordering: Messages can sometimes be delivered out of sender order.

Use of UDP
UDP datagrams are sometimes an attractive choice because they do not suffer from the
overheads associated with guaranteed message delivery.

For example, the Domain Name System and Voice over IP (VoIP) run over UDP.
Java API for UDP datagrams
The Java API provides datagram communication by means of two classes:
1. DatagramPacket and
2. DatagramSocket

DatagramPacket: This class provides a constructor that makes an instance out of


• an array of bytes comprising a message,
• the length of the message
• the Internet address and
• local port number of the destination socket

Datagram packet fields: array of bytes containing the message | length of message | Internet address | port number

This class provides another constructor for use when receiving a message. Its arguments specify an array of
bytes in which to receive the message and the length of the array.

The message can be retrieved from the DatagramPacket by means of the method getData. The
methods getPort and getAddress access the port and Internet address.
DatagramSocket:
This class supports sockets for sending and receiving UDP datagrams.
• It provides a constructor that takes a port number as its argument, for use by processes that
need to use a particular port.
• It also provides a no-argument constructor that allows the system to choose a free local port.

The class DatagramSocket provides methods that include the following:


i. send and receive:
These methods are for transmitting datagrams between a pair of sockets. The argument of send is an instance of
DatagramPacket containing a message and its destination.
The argument of receive is an empty DatagramPacket in which to put the message, its length and its origin.
ii. setSoTimeout:
This method allows a timeout to be set. With a timeout set, the receive method will block for no longer than the specified time.
iii. connect:
This method is used for connecting to a particular remote port and Internet address.
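
As an illustration of these classes and methods, the sketch below shows a UDP client that sends a message and waits for a reply, and a UDP server that echoes each request back to its sender. The port number 6789, the buffer size and the timeout are illustrative assumptions, and the two classes would normally live in separate source files:

import java.net.*;

public class UDPClient {
    public static void main(String[] args) {
        // args[0]: message to send, args[1]: DNS name of the server (illustrative usage)
        DatagramSocket aSocket = null;
        try {
            aSocket = new DatagramSocket();                      // bind to any free local port
            byte[] m = args[0].getBytes();
            InetAddress aHost = InetAddress.getByName(args[1]);
            int serverPort = 6789;                               // assumed agreed server port
            DatagramPacket request = new DatagramPacket(m, m.length, aHost, serverPort);
            aSocket.send(request);
            aSocket.setSoTimeout(5000);                          // do not wait indefinitely for the reply
            byte[] buffer = new byte[1000];
            DatagramPacket reply = new DatagramPacket(buffer, buffer.length);
            aSocket.receive(reply);                              // blocks until a datagram arrives or timeout
            System.out.println("Reply: " + new String(reply.getData(), 0, reply.getLength()));
        } catch (Exception e) {
            System.out.println("Client: " + e.getMessage());
        } finally {
            if (aSocket != null) aSocket.close();
        }
    }
}

class UDPServer {
    public static void main(String[] args) {
        DatagramSocket aSocket = null;
        try {
            aSocket = new DatagramSocket(6789);                  // bind to the agreed server port
            byte[] buffer = new byte[1000];
            while (true) {                                       // repeatedly receive and echo requests
                DatagramPacket request = new DatagramPacket(buffer, buffer.length);
                aSocket.receive(request);
                DatagramPacket reply = new DatagramPacket(request.getData(),
                        request.getLength(), request.getAddress(), request.getPort());
                aSocket.send(reply);                             // reply to the sender's address and port
            }
        } catch (Exception e) {
            System.out.println("Server: " + e.getMessage());
        } finally {
            if (aSocket != null) aSocket.close();
        }
    }
}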
TCP stream communication
The API to the TCP protocol, provides the abstraction of a stream of bytes to which data may be written and
from which data may be read.
The following characteristics of the network are hidden by the stream abstraction:
i. Message sizes: The application can choose how much data it writes to a stream or reads from it. On arrival,
the data is handed to the application as requested. Applications can, if necessary, force data to be sent
immediately.
ii. Lost messages: The TCP protocol uses an acknowledgement scheme. If the sender does not receive an
acknowledgement within a timeout, it retransmits the message.
iii. Flow control: The TCP protocol attempts to match the speeds of the processes that read from and write to a
stream.
iv. Message duplication and ordering: Message identifiers are associated with each IP packet, which enables
the recipient to detect and reject duplicates, or to reorder messages.
v. Message destinations: A pair of communicating processes establish a connection before they can
communicate over a stream. Establishing a connection involves a connect request from client to server
followed by an accept request from server to client before any communication can take place.
The client process creates a stream socket, bound to any port and issues a connect request to a server
process.

The server process creates a listening socket, binds it to a well known port and waits for any incoming
request from any client.

Once a request arrives, it is queued up and the server process fetches a request to be processed from the
queue, creates a separate process (child server process / child thread) to handle the request and goes
back to listening mode to accept more connect requests.

When an application closes a socket, this indicates that it will not write any more data to its output
stream. Any data in the output buffer is sent to the other end of the stream and put in the queue at the
destination socket. When a process exits or fails, all of its sockets are eventually closed.

The following are some outstanding issues related to stream communication:


i. Matching of data items: Two communicating processes need to agree as to the contents of the data
transmitted over a stream.
When a pair of processes do not cooperate correctly in their use of a stream, the reading process may
experience errors when interpreting the data or may block due to insufficient data in the stream.

ii. Blocking: The data written to a stream is kept in a queue at the destination socket. When a process
attempts to read data from an input channel, it will get data from the queue or it will block until data
becomes available.

iii. Threads: When a server accepts a connection, it generally creates a new thread in which to
communicate with the new client.

Failure model
For upholding integrity in reliable communication, TCP streams use checksums to detect and reject
corrupt packets and sequence numbers to detect and reject duplicate packets. For the sake of the
validity property, TCP streams use timeouts and retransmissions to deal with lost packets.

However, TCP does not guarantee to deliver messages in the face of severe network failures or congestion,
so it does not provide completely reliable communication in such cases.
Use of TCP
Many frequently used services run over TCP connections, with reserved port numbers. These include the
following:

i. HTTP: The Hypertext Transfer Protocol is used for communication between web browsers and web
servers;

ii. FTP: The File Transfer Protocol allows directories on a remote computer to be browsed and files to be
transferred from one computer to another over a connection.

iii. Telnet: Telnet provides access by means of a terminal session to a remote computer.

iv. SMTP: The Simple Mail Transfer Protocol is used to send mail between computers.
Java API for TCP streams
The Java interface to TCP streams is provided in the classes ServerSocket and Socket

i. ServerSocket: This class is intended for use by a server to create a socket at a server port for listening
for connect requests from clients.

Its accept method gets a connect request from the queue or, if the queue is empty, blocks until one
arrives.

The result of executing accept is an instance of Socket – a socket to use for communicating with the
client.

ii. Socket: This class is for use by a pair of processes with a connection. The client uses a constructor to
create a socket, specifying the DNS hostname and port of a server.
This constructor not only creates a socket associated with a local port but also connects it to the
specified remote computer and port number.
The Socket class provides the methods getInputStream and getOutputStream for accessing the two
streams associated with a socket. The return types of these methods are InputStream and
OutputStream, respectively.(abstract classes)

DataInputStream and DataOutputStream allow binary representations of primitive data types to be
read and written in a machine-independent manner.
As the message consists of a string, the client and server processes use the method writeUTF of
DataOutputStream to write it to the output stream and the method readUTF of DataInputStream to
read it from the input stream. UTF-8 is an encoding that represents strings in a particular format.

When a process has closed its socket, it will no longer be able to use its input and output streams. The
process to which it has sent data can read the data in its queue, but any further reads after the queue is
empty will result in an EOFException.

Attempts to use a closed socket or to write to a broken stream result in an IOException.
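
A minimal sketch of a TCP client and server along these lines; the port number 7896 is an illustrative assumption, the client sends one string with writeUTF and reads the echoed reply with readUTF, and the server creates a new thread per accepted connection (the classes would live in separate source files):

import java.net.*;
import java.io.*;

public class TCPClient {
    public static void main(String[] args) {
        // args[0]: message to send, args[1]: DNS name of the server (illustrative usage)
        try (Socket s = new Socket(args[1], 7896)) {             // connect to the assumed server port
            DataInputStream in = new DataInputStream(s.getInputStream());
            DataOutputStream out = new DataOutputStream(s.getOutputStream());
            out.writeUTF(args[0]);                               // write the string in UTF-8 form
            System.out.println("Received: " + in.readUTF());     // blocks until the reply arrives
        } catch (IOException e) {
            System.out.println("Client: " + e.getMessage());
        }
    }
}

class TCPServer {
    public static void main(String[] args) {
        try (ServerSocket listenSocket = new ServerSocket(7896)) {
            while (true) {
                Socket clientSocket = listenSocket.accept();     // blocks until a connect request arrives
                new Thread(() -> {                               // one thread per connection
                    try (Socket connection = clientSocket) {
                        DataInputStream in = new DataInputStream(connection.getInputStream());
                        DataOutputStream out = new DataOutputStream(connection.getOutputStream());
                        out.writeUTF(in.readUTF());              // echo the received string back
                    } catch (EOFException e) {
                        System.out.println("EOF: " + e.getMessage());
                    } catch (IOException e) {
                        System.out.println("Connection: " + e.getMessage());
                    }
                }).start();
            }
        } catch (IOException e) {
            System.out.println("Server: " + e.getMessage());
        }
    }
}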


External Data Representation
• An agreed standard for the representation of data structures and primitive values is called an
external data representation
• Marshalling – it is the process of taking a collection of data items and assembling them into a form
suitable for transmission in a message.
• Unmarshalling – it is the process of disassembling data items on arrival to produce an equivalent
collection of data items at the destination.

• Marshalling consists of the translation of structured data items and primitive values into an
external data representation.
• Unmarshalling consists of the generation of primitive values from their external data representation
and the rebuilding of the data structures
Approaches:
1. CORBA’s common data representation
2. Java’s object serialization
3. XML
• In CORBA and in Java , the primitive data types are marshalled into a binary form.
• In XML, the primitive data types are represented textually (ex. HTTP follows textual approach)

The textual representation of a data value will generally be longer than the equivalent binary
representation.
In the first two cases, the marshalling and unmarshalling activities are intended to be carried out by a
middleware layer without any involvement on the part of the application programmer.
In the case of XML, which is textual and therefore more accessible to hand-encoding, software for
marshalling and unmarshalling is available for all commonly used platforms and programming
environments.

Issues
• Marshalling requires the consideration of all the finest details of the representation of the primitive
components of composite objects; the process is therefore likely to be error-prone if carried out by hand.
• Whether the marshalled data should include information concerning the type of its contents
• CORBA’s representation includes just the values of the objects transmitted, and nothing about their types.
• Java serialization and XML do include type information, but in different ways.
 Java puts all of the required type information into the serialized form
 XML documents may refer to externally defined sets of names (with types) called namespaces
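
A minimal sketch of Java object serialization used as a marshalling mechanism; the Person class, its fields and the values used are illustrative assumptions, and the resulting byte array is what would be carried in a message:

import java.io.*;

class Person implements Serializable {               // illustrative class whose instances are marshalled
    private String name;
    private int year;
    Person(String name, int year) { this.name = name; this.year = year; }
    public String toString() { return name + " (" + year + ")"; }
}

public class SerializationExample {
    public static void main(String[] args) throws Exception {
        Person p = new Person("Smith", 1984);

        // Marshalling: the object, together with its type information, is written to a byte stream
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(p);
        out.close();

        // Unmarshalling: an equivalent object is rebuilt from the serialized form
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Person copy = (Person) in.readObject();
        System.out.println(copy);
    }
}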

Other Approaches:
• Google’s Protocol buffers
• JSON ( Java Script Object Notation)
Multicast Communication
• Multicast operation sends a single message from one process to each of the members of a group of
processes in such a way that the membership of the group is transparent to the sender
Characteristics:
1. Fault tolerance based on replicated services
2. Discovering services in spontaneous networking
3. Better performance through replicated data
4. Propagation of event notifications
IP multicast
• IP multicast is built on top of Internet Protocol (IP)
• It allows a sender to transmit a single IP packet to a set of computers that form a multicast group
• A multicast group is specified by a Class D Internet address ( first four bits 1110 )
• The membership of multicast group is dynamic
• At the programming level, IP multicast is available only via UDP
• At the IP level a computer belongs to a multicast group when one or more of its processes has
sockets that belong to that group.
Types of multicast – local multicast, Internet multicast
Multicast Routers
Internet multicast makes use of multicast routers, which forward a single datagram to routers on other
networks, where it is again multicast to local members
TTL (time to live) is used to limit the distance of propagation of a multicast datagram
Multicast address allocation:
• Class D addresses in the range of 224.0.0.0 to 239.255.255.255 are reserved for multicast traffic
and managed globally by IANA (Internet Assigned Numbers Authority)
• The management of the address space is reviewed annually.
• Based on RFC 3171, the address space is partitioned into a number of blocks
 Local network control block (224.0.0.0 to 224.0.0.255)
 Internet control block (224.0.1.0 to 224.0.1.255)
 Ad Hoc control block (224.0.2.0 to 224.0.255.0)
 Administratively Scoped Block (239.0.0.0 to 239.255.255.255)
The remainder of the multicast addresses are available for use by temporary groups.
Failure model for multicast datagram:
Omission failure
Java API to IP multicast
• Java API provides a datagram interface to IP multicast through the class MulticastSocket
• MulticastSocket is a subclass of DatagramSocket
• MulticastSocket provides two constructors, allowing sockets to be created to use either a
specified local port or any free local port.
• A process can join a multicast group with a given multicast address by invoking the joinGroup
method of the multicast socket
• A process can leave a specified group by invoking the leaveGroup method of its multicast socket
• The Java API allows the TTL to be set for a multicast socket by means of the setTimeToLive
method
• The default value is 1, allowing the multicast to propagate only on the local network
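
A minimal sketch of a multicast peer along these lines; the group address, port 6789 and the fixed number of messages received are illustrative assumptions:

import java.net.*;

public class MulticastPeer {
    public static void main(String[] args) {
        // args[0]: message to multicast, args[1]: multicast group address, e.g. "228.5.6.7" (illustrative)
        MulticastSocket s = null;
        try {
            InetAddress group = InetAddress.getByName(args[1]);
            s = new MulticastSocket(6789);                 // assumed port shared by all group members
            s.setTimeToLive(1);                            // restrict propagation to the local network
            s.joinGroup(group);                            // become a member of the multicast group
            byte[] m = args[0].getBytes();
            s.send(new DatagramPacket(m, m.length, group, 6789));

            byte[] buffer = new byte[1000];
            for (int i = 0; i < 3; i++) {                  // receive a few messages sent to the group
                DatagramPacket in = new DatagramPacket(buffer, buffer.length);
                s.receive(in);
                System.out.println(new String(in.getData(), 0, in.getLength()));
            }
            s.leaveGroup(group);
        } catch (Exception e) {
            System.out.println("Multicast: " + e.getMessage());
        } finally {
            if (s != null) s.close();
        }
    }
}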
Network virtualization : Overlay networks
Network virtualization is concerned with the construction of many different virtual networks
over an existing network such as the Internet. Each virtual network can be designed to support a
particular distributed application.
Overlay networks:
An overlay network is a virtual network consisting of nodes and virtual links, which sits on top of an
underlying network and offers
a. a service that is tailored towards the needs of a class of application
b. more efficient operation in a given networked environment
c. an additional feature
Advantages of Overlay networks:
• They enable new network services to be defined without requiring changes to the underlying network
• They encourage experimentation with network services and customization of services
• Multiple overlays can be defined and can coexist, with the end result being a more open and extensible
network architecture
Disadvantages:
• Performance penalty
• Complexity of network services
Skype:
Skype
• Skype is a peer-to-peer application offering Voice over IP (VoIP)
• It includes instant messaging, video conferencing and interfaces to the standard telephony services
through SkypeIn and SkypeOut
• The software was developed by Kazaa in 2003
• It indicates how advanced functionalities are provided in an application-specific manner without
modification of the core architecture of the Internet.
• No IP address or port is required to establish a call
• It is a virtual network which establishes connections between people who are currently active.
Architecture:
• Skype is based on peer-to-peer architecture
• The infrastructure consists of
a. ordinary users’ machines (hosts)
b. super nodes
• A super node is an ordinary host with an enhanced role (additional capabilities)
• Selection of super nodes is based on
- available bandwidth
- reachability
- availability
User connection :
• Skype users are authenticated via a well known login server
• They then make contact with a selected super node
• The client maintains a cache of super node identities
• At first login this cache is filled with addresses of around seven super nodes
• As time passes the client builds and maintains a larger set
Search for users:
• The main goal of the super nodes is to perform the efficient search of the global index of users
• The search is orchestrated by the client’s chosen super node and involves an expanding search of
other super nodes until the specified user is found
• On average 8 super nodes are contacted
Voice connection:
• Once the required user is discovered, Skype establishes a voice connection between the two parties,
using TCP for signalling call requests and terminations and either UDP or TCP for the streaming audio
• The s/w used for encoding and decoding audio plays a key part in the excellent call quality
attained using Skype
MPI
MPI – Message Passing Interface
• Introduced in 1994
• The goal of the MPI Forum was to retain the inherent simplicity, practicality and efficiency of the
message-passing approach but enhance this with portability, by presenting a standardized
interface independent of the operating system and of language-specific socket interfaces
Architecture:
• The sender and receiver are augmented with MPI library buffers
• The library buffers are used to hold the data in transit
Types of send operations      Blocking      Non-blocking
1. Generic                    MPI_Send      MPI_Isend
2. Synchronous                MPI_Ssend     MPI_Issend
3. Buffered                   MPI_Bsend     MPI_Ibsend
4. Ready                      MPI_Rsend     MPI_Irsend
Remote Invocation
Request-reply protocols
• This form of communication is designed to support the roles and message exchanges in typical
client-server interactions
• In the normal case request-reply communication is synchronous (client blocks until reply arrives
from the server)
• The client-server exchanges are described in terms of the send and receive operations in the Java
API
• The protocol is based on the trio of communication primitives
1.doOperation 2.getRequest 3.sendReply
doOperation method:
• Used by clients to invoke remote operations
• The arguments specify the remote server and which operation to invoke , together with additional
information required by the operation
• Its result is a byte array containing the reply
• The client marshals the argument into array of bytes and unmarshals the result from the array of
bytes
• The first argument of doOperation is an instance of the class RemoteRef, which represents the
remote server
• The doOperation method sends a request message to the server whose Internet address and port are
specified in the remote reference
• After sending the request message, doOperation invokes receive to get a reply message, from
which it extracts the result and returns it to the caller
• The caller of the doOperation is blocked until the server performs the requested operation and
transmits a reply message to the client process
getRequest:
• Used by server process to acquire service request
sendReply :
• sendReply is used by the server to send the reply message to the client after it has invoked the
specified operation
• When the reply message is received by the client, the original doOperation is unblocked and execution of
the client program continues
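
A sketch of how the three primitives might be expressed as Java signatures, following the description above; the RemoteRef placeholder class, the interface name and the exact parameter lists are assumptions rather than a standard API:

import java.net.InetAddress;

// Assumed placeholder for a remote server reference (Internet address and port of the server)
class RemoteRef {
    InetAddress address;
    int port;
}

interface RequestReply {

    // Used by clients: sends a request message to the server identified by s, blocks until the
    // corresponding reply message arrives, and returns the marshalled result as a byte array
    byte[] doOperation(RemoteRef s, int operationId, byte[] arguments);

    // Used by servers: blocks until a request message arrives and returns it for processing
    byte[] getRequest();

    // Used by servers: sends the reply message back to the client that issued the request
    void sendReply(byte[] reply, InetAddress clientHost, int clientPort);
}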
Message structure:
A request or reply message carries a messageType, a requestId, a remoteReference, an operationId and the
(marshalled) arguments or results.
Failure model:
• Omission failures
• No in-order delivery
• Timeouts
• Discarding duplicate request messages
• Lost reply messages
• History
Styles of Exchange protocols
 The request ( R ) protocol
 The request-reply ( RR ) protocol
 The request-reply-acknowledge reply ( RRA ) protocol
HTTP : an example of a request-reply protocol
• HTTP is used by web browser clients to make requests to web servers and to receive replies from
them
• HTTP invokes methods on web resources and also allows for content negotiation and password
style authentication
• HTTP is implemented over TCP
• The client-server interaction consists of the following steps
 the client requests and the server accepts a connection at the default server port or at the
port specified in the URL
 the client sends a request message to the server
 the server sends a reply message to the client
 the connection is closed
• Establishing and closing a connection for every request-reply exchange is expensive, and browsers
typically make several requests to the same server
• HTTP 1.1 therefore uses persistent connections (connections that remain open over a series of request-reply
exchanges)
HTTP Methods:
 GET - requests the resource whose URL is given as its argument
 HEAD – similar to GET but does not return any data
 POST – specifies the URL of a resource that can deal with data supplied in the body of the request
 PUT – requests that the data supplied in the request is stored with the given URL as its identifier
 DELETE – server deletes the resources identified by the given URL
 OPTIONS – the server supplies the client with a list of methods it allows to be applied to the given
URL
 TRACE – the server sends back the request message. Used for diagnostic purposes
Message formats:
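The figure giving the message formats is not reproduced here; as an illustration (the URL and headers are made up), the request and reply messages have roughly the following shape:

Request message:
    GET /index.html HTTP/1.1
    Host: www.dcs.qmul.ac.uk
    (method, URL or pathname, HTTP version, followed by further header fields and an optional message body)

Reply message:
    HTTP/1.1 200 OK
    Content-Type: text/html
    (HTTP version, status code, reason phrase, followed by header fields and the resource data as the message body)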
Remote procedure call
• The goal of RPC is to achieve a high level of distribution transparency.
• Procedures in processes on remote computers can be called as if they were procedures in the
local address space.
• Design issues for RPC
• the style of programming promoted by RPC – programming with interfaces;
• the call semantics associated with RPC;
• the key issue of transparency and how it relates to remote procedure calls.
Programming with interfaces
• Programming languages organize a program as a set of modules that can communicate with one another.
• Communication between modules can be by means of procedure calls between modules or by direct
access to the variables in another module.
• To control the possible interactions between modules, an explicit interface is defined for each module.
• Interface of a module specifies the procedures and the variables that can be accessed from other
modules.
Interfaces in DS:
• The term service interface is used to refer to the specification of the procedures offered by a server,
defining the types of the arguments of each of the procedures.
Benefits in programming with interfaces in distributed systems:
• Programmers are concerned only with the abstraction offered by the service interface and need not be
aware of implementation details.

• Programmers also do not need to know the programming language or underlying platform used to
implement the service

• The interface can also change as long as it remains compatible with the original.
Definition of service interfaces is influenced by the distributed nature of the underlying
infrastructure:

• It is not possible for a client module running in one process to access the variables in a module in
another process. Therefore the service interface cannot specify direct access to variables.
• The parameter-passing mechanisms used in local procedure calls are not suitable when the caller and
the procedure are in different processes.
• Addresses in one process are not valid in another remote one. Therefore, addresses cannot be passed as
arguments or returned as results of calls to remote modules.
Interface definition languages(IDL)
• Interface definition languages (IDLs) are designed to allow procedures
implemented in different languages to invoke one another.
• The concept of an IDL was initially developed for RPC systems but applies
equally to RMI and also web services
RPC call semantics
Choices of delivery guarantees (ways to implement doOperation )
• Retry request message: Controls whether to retransmit the request message until either a reply is
received or the server is assumed to have failed.
• Duplicate filtering: Controls when retransmissions are used and whether to filter out duplicate
requests at the server.
• Retransmission of results: Controls whether to keep a history of result messages to enable lost
results to be retransmitted without re-executing the operations at the server.

• Combinations of these choices lead to a variety of possible semantics for the reliability of remote
invocations as seen by the invoker
Maybe semantics:
• The remote procedure call may be executed once or not at all.
Types of failure
• omission failures if the request or result message is lost;
• crash failures when the server containing the remote operation fails.
At-least-once semantics:
• The invoker receives either a result, in which case the invoker knows that the procedure was executed at
least once, or an exception informing it that no result was received. At-least-once semantics can be
achieved by the retransmission of request messages, which masks the omission failures of the request or
result message.
Types of failure
• crash failures when the server containing the remote procedure fails;
• arbitrary failures – in cases when the request message is retransmitted, the remote server may
receive it and execute the procedure more than once, possibly causing wrong values to be stored or
returned.
At-most-once semantics:
• the caller receives either a result, in which case the caller knows that the procedure was
executed exactly once, or an exception informing it that no result was received, in which
case the procedure will have been executed either once or not at all.

Transparency:
• Aimed to make remote procedure calls as much like local procedure calls as possible, with
no distinction in syntax between a local and a remote procedure call.
• All the necessary calls to marshalling and message-passing procedures were hidden from
the programmer making the call.
• RPC strives to offer at least location and access transparency.
• In the case of failure, it is impossible to distinguish between failure of the network and of
the remote server process. This requires that clients making remote calls are able to recover
from such situations.
• The choice as to whether RPC should be transparent is also available to the designers of
IDLs.
• For example, in some IDLs, a remote invocation may throw an exception when the
client is unable to communicate with a remote procedure.
• The current consensus is that remote calls should be made transparent in the sense that
the syntax of a remote call is the same as that of a local invocation, but that the
difference between local and remote calls should be expressed in their interfaces.
Implementation of RPC
• The client that accesses a service includes one stub procedure for each procedure in the service
interface
• The stub procedure behaves like a local procedure to the client
• The stub procedure, instead of executing the call, marshals the procedure identifier and the
arguments into a request message
• It then sends the request message via a communication module to the server
• When a reply message arrives, it unmarshals the result
• The server process contains a dispatcher together with one server stub procedure and one service
procedure for each procedure in the service interface
• The dispatcher selects one of the server stub procedures according to the procedure identifier in the
request message.
• The server stub procedure then unmarshals the arguments in the request message, calls the
corresponding service procedure and marshals the return values for the reply message
• The service procedures implement the procedures in the service interface
• An interface compiler can automatically generate the client and server stub procedures and the
dispatcher
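
A hypothetical sketch of what a generated client stub could look like, reusing the RemoteRef and RequestReply sketches from the request-reply section; the Calculator service, its single add procedure, the operation identifier 1 and the marshalling helpers are all assumptions for illustration, not the output of a real interface compiler:

// Hypothetical client stub for a service interface with one procedure:  int add(int a, int b)
class CalculatorStub {
    private RemoteRef server;        // remote reference to the server (Internet address and port)
    private RequestReply comms;      // communication module implementing the request-reply protocol

    CalculatorStub(RemoteRef server, RequestReply comms) {
        this.server = server;
        this.comms = comms;
    }

    // Behaves like a local procedure to the client, but sends a request message to the server
    int add(int a, int b) {
        byte[] arguments = marshal(a, b);                          // marshal the arguments into a byte array
        byte[] reply = comms.doOperation(server, 1, arguments);    // operationId 1 is an assumed identifier
        return unmarshalInt(reply);                                // unmarshal the result from the reply
    }

    private byte[] marshal(int a, int b) {
        return java.nio.ByteBuffer.allocate(8).putInt(a).putInt(b).array();
    }

    private int unmarshalInt(byte[] reply) {
        return java.nio.ByteBuffer.wrap(reply).getInt();
    }
}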
Remote Method Invocation
In RMI, a calling object can invoke a method in a potentially remote object.

The commonalities between RMI and RPC


• They both support programming with interfaces
• They are both typically constructed on top of request-reply protocols and can offer a range of call
semantics such as at-least-once and at-most-once.
• They both offer a similar level of transparency

The differences between RMI and RPC


• The programmer is able to use the full expressive power of object-oriented programming in the
development of distributed systems software
• RMI-based systems have unique object references; such object references can also be passed as
parameters, thus offering significantly richer parameter-passing semantics than in RPC
Design issues for RMI
• RMI shares the same design issues as RPC in terms of programming with interfaces, call semantics
and level of transparency.

The object model


• An object communicates with other objects by invoking their methods, generally passing arguments
and receiving results.
• Objects can encapsulate their data and the code of their methods.
Object references:
• Objects can be accessed via object references.
• To invoke a method in an object, the object reference and method name are given, together with any
necessary arguments.
• The object whose method is invoked is sometimes called the target and sometimes the receiver.
Interfaces:
• An interface provides a definition of the signatures of a set of methods
• In Java, a class may implement several interfaces, and the methods of an interface may be
implemented by any class.
Actions :
• Action in an object-oriented program is initiated by an object invoking a method in another object.
• The receiver (target/remote object) executes the appropriate method and then returns control to the
invoking object, sometimes supplying a result.
An invocation of a method can have three effects:
• The state of the receiver may be changed.
• A new object may be instantiated.
• Further invocations on methods in other objects may take place.
Exceptions:
• Programs can encounter many sorts of errors and unexpected conditions of varying seriousness.
During the execution of a method, many different problems may be discovered.
Garbage collection:
• It is necessary to provide a means of freeing the space occupied by objects when they are no longer
needed.
Distributed objects
• Objects are managed by servers and their clients invoke their methods using remote method invocation.
• In RMI, the client’s request to invoke a method of an object is sent in a message to the server managing
the object. The invocation is carried out by executing a method of the object at the server and the result
is returned to the client in another message.

Advantages
• Having client and server objects in different processes enforces encapsulation. That is, the state of an
object can be accessed only by the methods of the object, which means that it is not possible for
unauthorized methods to act on the state.

• Since the shared state of a distributed program is partitioned as a collection of objects, an object may be accessed via
RMI, or it may be copied into a local cache and accessed directly
The distributed object model
• Each process contains a collection of objects, some of which can receive both local and remote
invocations, whereas the other objects can receive only local invocations
• Method invocations between objects in different processes, whether in the same computer or not, are
known as remote method invocations.
• Method invocations between objects in the same process are local method invocations.

Two fundamental concepts


• Remote object references: Other objects can invoke the methods of a remote object if they have
access to its remote object reference.
Eg. a remote object reference for B must be available to A.
• Remote interfaces: Every remote object has a remote interface that specifies which of its methods can
be invoked remotely.
Eg. the objects B and F must have remote interfaces.
Remote Object References:
• A remote object reference is an identifier that can be used throughout a distributed system to refer to a
particular unique remote object.
• Remote object references are similar to local ones
• The remote object to receive a remote method invocation is specified by the invoker as a remote
object reference.
• Remote object references may be passed as arguments and results of remote method invocations.
Remote interfaces:
• Objects in other processes can invoke only the methods that belong to its remote interface
• Local objects can invoke the methods in the remote interface as well as other methods implemented
by a remote object.
• Do not have constructors.
• The CORBA system provides an interface definition language (IDL), which is used for defining
remote interfaces.
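For illustration, a remote interface in Java RMI might look as follows (the Account name and its methods are hypothetical examples, not taken from these notes):

import java.rmi.Remote;
import java.rmi.RemoteException;

// Only the methods declared here can be invoked remotely.
// Every method must declare RemoteException, because any remote call can fail.
public interface Account extends Remote {
    void deposit(int amount) throws RemoteException;
    int getBalance() throws RemoteException;
}

As noted above, the interface declares no constructors; the servants that implement it are created inside the server process.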
Actions in a distributed object system
• In the non-distributed case, an action is initiated by a method invocation, which may result in further invocations on methods in other objects.
• In the distributed case, the objects involved in a chain of related invocations may be located in different processes or different computers. When an invocation crosses the boundary of a process or computer, RMI is used, and the remote reference of the object must be available to the invoker.

• Object A needs to hold a remote object reference to object B. Remote object references may be obtained as the results of remote method invocations; for example, object A might obtain a remote reference to object F from object B.
• Distributed applications may provide remote objects with methods for
instantiating objects that can be accessed by RMI, thus effectively providing
the effect of remote instantiation of objects

• If the object L contains a method for creating remote objects, then the remote invocations from C and K could lead to the instantiation of the objects M and N, respectively.
Garbage collection in a distributed-object system:

• Distributed garbage collection is generally achieved by cooperation between


the existing local garbage collector and an added module that carries out a
form of distributed garbage collection, usually based on reference counting.

• If garbage collection is not available, then remote objects that are no longer
required should be deleted.
Exceptions:
• Any remote invocation may fail for reasons related to the invoked object
being in a different process or computer from the invoker.
• The process containing the remote object may have crashed or may be too
busy to reply, or the invocation or result message may be lost. Therefore,
remote method invocation should be able to raise exceptions

Implementation of RMI:
• Several separate objects and modules are involved in achieving a remote
method invocation.
• An application-level object A invokes a method in a remote application-level object
B for which it holds a remote object reference.
Communication module
• The two cooperating communication modules carry out the request-reply protocol,
which transmits request and reply messages between the client and server.
• The communication module uses only the first three items of the request message: messageType, requestId and remoteReference
• The operationId and all the marshalling and unmarshalling are the concern of the
RMI software.
• The communication modules are together responsible for providing a specified
invocation semantics.(at-most-once)
• The communication module in the server selects the dispatcher for the class of the
object to be invoked
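The request message handled by these modules can be pictured as a plain data structure; the field names follow the description above, but the class itself is only an illustrative sketch, not a real library type:

// Sketch of a request message: the communication module reads only the first three
// fields, while operationId and the marshalled arguments are interpreted by the
// RMI software (dispatcher and skeleton).
public class RequestMessage {
    int messageType;        // e.g. Request or Reply
    long requestId;         // used to match a reply to its request
    byte[] remoteReference; // identifies the target remote object
    int operationId;        // identifies which method to invoke
    byte[] arguments;       // the arguments in marshalled form
}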
Remote reference module
• A remote reference module is responsible for translating between local and remote
object references and for creating remote object references.
• The remote reference module in each process has a remote object table that records
the correspondence between local object references in that process and remote object
references
remote object table includes:
• An entry for all the remote objects held by the process. Eg. the remote object
B will be recorded in the table at the server
• An entry for each local proxy. Eg. The proxy for B will be recorded in the
table at the client
Servants
• A servant is an instance of a class that provides the body of a remote object.
• Servants live within a server process.
• They are created when remote objects are instantiated and remain in use
until they are no longer needed, finally being garbage collected or deleted.
The RMI software
• This consists of a layer of software between the application-level objects and
the communication and remote reference modules.
• Proxy: The role of a proxy is to make remote method invocation transparent
to clients by behaving like a local object to the invoker; but instead of
executing an invocation, it forwards it in a message to a remote object.
• It hides the details of the remote object reference, the marshalling of
arguments, unmarshalling of results and sending and receiving of messages
from the client.
• There is one proxy for each remote object for which a process holds a
remote object reference.
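As an illustration, a hedged sketch of what a generated proxy for the hypothetical Account interface might look like; RequestReplyChannel stands in for the communication module, and the operationId values and byte encoding are arbitrary choices made only for this sketch:

import java.rmi.RemoteException;

// Hypothetical stand-in for the communication module's request-reply service.
interface RequestReplyChannel {
    byte[] doOperation(String remoteObjectRef, int operationId, byte[] marshalledArgs) throws RemoteException;
}

// Hypothetical generated proxy for the Account interface sketched earlier:
// it forwards each invocation as a request message instead of executing it locally.
public class AccountProxy implements Account {
    private final String remoteObjectRef;       // the remote object reference this proxy stands for
    private final RequestReplyChannel channel;

    public AccountProxy(String remoteObjectRef, RequestReplyChannel channel) {
        this.remoteObjectRef = remoteObjectRef;
        this.channel = channel;
    }

    public void deposit(int amount) throws RemoteException {
        byte[] args = java.nio.ByteBuffer.allocate(4).putInt(amount).array();  // toy marshalling
        channel.doOperation(remoteObjectRef, 1, args);       // 1 = assumed operationId for deposit
    }

    public int getBalance() throws RemoteException {
        byte[] reply = channel.doOperation(remoteObjectRef, 2, new byte[0]);   // 2 = assumed id
        return java.nio.ByteBuffer.wrap(reply).getInt();     // toy unmarshalling of the result
    }
}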
Dispatcher:
• A server has one dispatcher and one skeleton for each class representing a
remote object.
• In our example, the server has a dispatcher and a skeleton for the class of remote object B.
• The dispatcher receives request messages from the communication module.
It uses the operationId to select the appropriate method in the skeleton,
passing on the request message
Skeleton:
• The class of a remote object has a skeleton, which implements the methods
in the remote interface.
• A skeleton method unmarshals the arguments in the request message and
invokes the corresponding method in the servant.
It waits for the invocation to complete and then marshals the result, together
with any exceptions, in a reply message to the sending proxy’s method.
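A matching server-side sketch, collapsing the dispatcher's operationId-based method selection and the skeleton's unmarshalling into one hypothetical class for brevity (it reuses the toy byte encoding from the proxy sketch above):

// Hypothetical combined dispatcher/skeleton for the Account class: the dispatcher part
// selects the method from the operationId; the skeleton part unmarshals the arguments,
// invokes the servant and marshals the result for the reply message.
public class AccountSkeleton {
    private final Account servant;              // the servant provides the body of the remote object

    public AccountSkeleton(Account servant) { this.servant = servant; }

    public byte[] dispatch(int operationId, byte[] marshalledArgs) throws java.rmi.RemoteException {
        switch (operationId) {
            case 1:                              // deposit(int)
                servant.deposit(java.nio.ByteBuffer.wrap(marshalledArgs).getInt());
                return new byte[0];
            case 2:                              // getBalance()
                return java.nio.ByteBuffer.allocate(4).putInt(servant.getBalance()).array();
            default:
                throw new IllegalArgumentException("unknown operationId " + operationId);
        }
    }
}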
Generation of the classes for proxies, dispatchers and skeletons
• The classes for the proxy, dispatcher and skeleton used in RMI are generated
automatically by an interface compiler.
• The Java RMI compiler generates the proxy, dispatcher and skeleton classes
from the class of the remote object.
Dynamic invocation: An alternative to proxies
• The proxy just described is static, in the sense that its class is generated from
an interface definition and then compiled into the client code.
• Suppose that a client program receives a remote reference to an object
whose remote interface was not available at compile time. In this case it
needs another way to invoke the remote object.
• Dynamic invocation gives the client access to a generic representation of a
remote invocation like the doOperation method used, which is available as
part of the infrastructure for RMI.
• The dynamic invocation interface is not as convenient to use as a proxy, but
it is useful in applications where some of the interfaces of the remote objects
cannot be predicted at design time.
• To summarize: the shared whiteboard application displays many different
types of shapes, such as circles, rectangles and lines, but it should also be
able to display new shapes that were not predicted when the client was
compiled. A client that uses dynamic invocation is able to address this
challenge.
Dynamic skeletons:
• a server will need to host remote objects whose interfaces were not known
at compile time.
• a client may supply a new type of shape to the shared whiteboard server for
it to store. A server with dynamic skeletons would be able to deal with this
situation.
• Java RMI addresses this problem by using a generic dispatcher and the
dynamic downloading of classes to the server.
Server and client programs
• The server program contains the classes for the dispatchers and skeletons,
together with the implementations of the classes of all of the servants that it
supports
• The server program contains an initialization section
• The initialization section is responsible for creating and initializing at least
one of the servants to be hosted by the server.
• Additional servants may be created in response to requests from clients.
• The initialization section may also register some of its servants with a binder
• The client program will contain the classes of the proxies for all of the
remote objects that it will invoke. It can use a binder to look up remote
object references.
Factory methods:
• The term factory method is sometimes used to refer to a method that creates
servants, and a factory object is an object with factory methods.
• Any remote object that needs to be able to create new remote objects on demand
for clients must provide methods in its remote interface for this purpose.
• Such methods are called factory methods, although they are really just normal
methods.
The binder
• Client programs generally require a means of obtaining a remote object
reference for at least one of the remote objects held by a server.
• Eg. object A would require a remote object reference for object B
• A binder in a distributed system is a separate service that maintains a table
containing mappings from textual names to remote object references. It is used
by servers to register their remote objects by name and by clients to look them
up.
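In Java RMI the binder role is played by the RMI registry. The following hedged sketch shows a server program creating and registering a servant and a client program looking it up; the Account interface is the hypothetical one sketched earlier, and the name, port and balance logic are illustrative only (the two programs would normally live in separate source files):

// Server program: creates a servant and registers it with the binder (the RMI registry).
import java.rmi.Naming;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;

class AccountServant extends UnicastRemoteObject implements Account {
    private int balance;
    AccountServant() throws RemoteException {}
    public synchronized void deposit(int amount) { balance += amount; }
    public synchronized int getBalance() { return balance; }
}

public class ServerMain {
    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(1099);                                   // start a registry (the binder)
        Naming.rebind("rmi://localhost:1099/Account", new AccountServant());   // register the servant by name
    }
}

// Client program: uses the binder to look up a remote object reference, then invokes it.
public class ClientMain {
    public static void main(String[] args) throws Exception {
        Account account = (Account) java.rmi.Naming.lookup("rmi://localhost:1099/Account"); // returns a proxy
        account.deposit(100);                                                  // remote method invocation
        System.out.println("balance = " + account.getBalance());
    }
}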
Server threads
• Whenever an object executes a remote invocation, that execution may lead to
further invocations of methods in other remote objects, which may take
some time to return.
• To avoid the execution of one remote invocation delaying the execution of
another, servers generally allocate a separate thread for the execution of each
remote invocation.
Activation of remote objects
• Some applications require that information survive for long periods of time.
However, it is not practical for the objects representing such information to
be kept in running processes for unlimited periods, particularly since they
are not necessarily in use all of the time.
• To avoid the potential waste of resources that would result from running all of the servers that manage remote objects all of the time, the servers can
be started whenever they are needed by clients, as is done for the standard
set of TCP services.
• Processes that start server processes to host remote objects are called
activators.
• A remote object is described as active when it is available for invocation within a running process, whereas it is called passive if it is not currently active but can be made active.
• A passive object consists of two parts:
1. The implementation of its methods;
2. Its state in the marshalled form.
An activator is responsible for:
• Registering passive objects that are available for activation
• Starting named server processes and activating remote objects in them;
• Keeping track of the locations of the servers for remote objects that it has
already activated.
Persistent object stores
• An object that is guaranteed to live between activations of processes is
called a persistent object.
• Persistent objects are generally managed by persistent object stores, which
store their state in a marshalled form on disk.
• In general, a persistent object store will manage very large numbers of
persistent objects, which are stored on disk or in a database until they are
needed. They will be activated when their methods are invoked by other
objects.
• Activation is generally designed to be transparent – that is, the invoker
should not be able to tell whether an object is already in main memory or
has to be activated before its method is invoked.
• Persistent object stores generally allow collections of related persistent
objects to have human-readable names such as pathnames or URLs.
• In practice, each human readable name is associated with the root of a
connected set of persistent objects.
Two approaches to deciding whether an object is persistent or not
• The persistent object store maintains some persistent roots, and any object
that is reachable from a persistent root is defined to be persistent.
• The persistent object store provides some classes on which persistence is
based – persistent objects belong to their subclasses
Object location
• A location service helps clients to locate remote objects from their remote
object references. It uses a database that maps remote object references to
their probable current locations.
Indirect Communication

Group Communication
Indirect communication is defined as communication between entities in a distributed
system through an intermediary with no direct coupling between the sender and the receiver

Properties stemming from the use of intermediary


1. Space uncoupling – sender does not know or need to know the identity of the receiver
2. Time uncoupling – sender and receiver can have independent lifetimes
Group Communication
• Group communication offers a service whereby a message is sent to a group and this message is
delivered to all members of the group
• The sender is not aware of the receivers
• It represents an abstraction over the multicast communication and may be implemented over IP
multicast or an equivalent overlay network
Significances:
 Managing group membership
 Detecting failures
 Providing reliability
 Ordering guarantees
Applications:
Financial industries
Multiuser games
Implementation of highly available (replicated) servers – fault tolerance
Support for system monitoring and management – load balancing strategies
Programming model:
• In group communication the central concept is that of a group with associated group
membership, whereby processes may join or leave the group.
• Processes can send a message to this group and have it propagated to all members of the group
with certain guarantees in terms of reliability and ordering
• Group communication implements multicast communication
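Since group communication may be implemented over IP multicast, the following minimal Java sketch shows the underlying multicast send/receive; the group address and port are arbitrary, and real group communication toolkits add membership management, reliability and ordering on top of this:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class MulticastSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("230.0.0.1");  // arbitrary multicast group address
        MulticastSocket socket = new MulticastSocket(6789);
        socket.joinGroup(group);                                  // become a member of the group

        byte[] msg = "hello group".getBytes();
        socket.send(new DatagramPacket(msg, msg.length, group, 6789)); // one send reaches all members

        byte[] buf = new byte[1024];
        DatagramPacket in = new DatagramPacket(buf, buf.length);
        socket.receive(in);                                       // each member receives a copy
        System.out.println(new String(in.getData(), 0, in.getLength()));

        socket.leaveGroup(group);
        socket.close();
    }
}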
Advantages
1. Efficient utilization of bandwidth
2. Minimizing the total time taken to deliver the message to all destinations

Process group : it is a group in which the communicating entities are processes

Object group : it is a collection of objects that process the same set of invocations concurrently, with
each returning responses
Closed group : a group is said to be closed if only members of the group may multicast to it
Open group : a group is said to be open if processes outside the group may send to it
Overlapping group : entities may be members of multiple groups
Non overlapping group : entities may be members of a single group

Implementation issues:
Reliability and ordering in multicast
Reliability is defined with respect to
a. Integrity – the message received is the same as the one sent, and no messages are delivered twice
b. Validity – any outgoing message is eventually delivered
c. Agreement – if a message is delivered to one process, then it is delivered to all processes in the group
Properties of Ordering
FIFO Ordering, Causal Ordering, Total Ordering
Group membership management:
Group membership service has 4 main tasks
 Providing an interface to group membership changes
 Failure detection
 Notifying members of group membership changes
 Performing group address expansion
Publish-subscribe system
Publish-subscribe systems are also called distributed event-based systems
• It is fundamentally a one-to-many communication paradigm
• It is a system where publishers publish structured events to an event service and subscribers
express interest in particular events through subscriptions which can be arbitrary patterns over the
structured events
• The task of the publish-subscribe system is to match the subscriptions against published events
and ensure correct delivery of event notification
Applications of publish-subscribe system:
• Financial information system
• Other areas with live feed of real time data
• Support for cooperative working
• Support for ubiquitous computing
• Network monitoring in the Internet
Characteristics of publish-subscribe systems:
• Heterogeneity
With event notifications, components in a distributed system that were not designed to interoperate can be made to work together
• Asynchronicity
Notifications are sent asynchronously by event-generating publishers to all the subscribers that have expressed an interest in them, so that publishers do not need to synchronize with subscribers

Programming model:
• The programming model in publish-subscribe systems is based on a small set of operations
• Publishers disseminate an event e through a publish(e) operation
• Subscribers express an interest in a set of events through a subscribe(f) operation, where f refers to a filter (a pattern defined over the set of possible events)
• Subscribers can later revoke this interest through an unsubscribe(f) operation
• When events arrive at a subscriber, the events are delivered using a notify(e) operation
• Some systems complement the set of operations by introducing the concept of advertisements.
• With advertisements, publishers have the option of declaring the nature of future events through an
advertise(f) operation
• Advertisements can be revoked through a call of unadvertise(f)
• The effectiveness of publish-subscribe systems is determined by the subscription (filter) model
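The operations above can be summarized as an interface. The following is only a hedged sketch of such an API (the names, generics and the explicit Subscriber argument to subscribe are assumptions, not a particular product's interface):

// Sketch of the publish-subscribe programming model; Event and Filter are left generic.
public interface PubSubSystem<Event, Filter> {
    void publish(Event e);                          // publisher disseminates an event
    void advertise(Filter f);                       // optionally declare the nature of future events
    void unadvertise(Filter f);
    void subscribe(Filter f, Subscriber<Event> s);  // register interest via a filter
    void unsubscribe(Filter f, Subscriber<Event> s);
}

interface Subscriber<Event> {
    void notify(Event e);                           // called when a matching event is delivered
}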
Categories of filters:
 Channel-based: Publishers publish events to named channels and subscribers then subscribe to one of these named channels to receive all events sent to that channel
 Topic-based: Each notification is expressed in terms of a number of fields, with one field denoting the topic. Subscriptions are then defined in terms of the topic of interest
 Content-based: A generalization of the topic-based approach, allowing the expression of subscriptions over a range of fields in an event notification
 Type-based: Subscriptions are defined in terms of types of events; matching is in terms of the type or subtype of the published event
 Objects of interest: Similar to the type-based approach, but focusing on changes in the state of the objects of interest rather than on predicates associated with the types of objects
Architecture involving objects of interest and an event service:
• The main component is an event service
• It maintains a database of event notifications and of interests of subscribers
• The event service is notified of events that occur at objects of interest
• Subscribers inform event service about the types of events they are interested in
• When an event occurs at the object of interest a message containing the notification is sent directly
to the subscribers of that type of event
Implementation issues:
1) Centralized versus distributed implementations:
Centralized implementation:
• simplest approach
• The server is implemented on a single node that acts as an event broker.
• Publishers publish events to this broker
• Subscribers send subscriptions to the broker and receive notification in return
• Interaction with the broker is through a series of point-to-point messages (message passing or remote invocation)
Drawback:
Lack of resilience and scalability
Distributed implementation:
• The centralized broker is replaced by a network of brokers
• Brokers cooperate to offer the desired functionality
Advantages:
Potential to survive failure
Peer-to-peer implementation:
• There is no distinction between publishers, subscribers and brokers
• All nodes acts as brokers
• Brokers cooperatively implement the required event routing functionality
2) Overall systems architecture:
• In the bottom layer, publish-subscribe systems make use of a range of inter process communication
services such as TCP/IP, IP multicast
• The heart of the architecture is provided by the event routing layer
• The event routing layer is supported by a network overlay infrastructure.
• Event routing performs the task of ensuring that event notifications are routed efficiently as
possible to appropriate subscribers
• The overlay infrastructure supports this by setting up an appropriate network of brokers or peer-to-peer structures
• In content-based approach this is called content- based routing (CBR)
• The top layer implements matching ( ensures that events match a given subscription)
Implementation approaches:
- Flooding
- Filtering
- Advertisements
- Rendezvous
Flooding:
• Simplest approach
• It involves sending an event notification to all nodes in the network and then carrying out the
appropriate matching at the subscriber end.
• Alternatively, flooding can be used to send subscriptions back to all possible publishers, with the
matching carried out at the publishing end
• The matched events are then sent directly to the relevant subscribers using point-to-point
communication
Advantage:
Simplicity
Disadvantage:
It causes unnecessary network traffic
Filtering:
• Filtering is applied within the network of brokers (filtering-based routing): subscriptions are propagated back towards potential publishers, so notifications are forwarded only along paths that lead to interested subscribers
Advertisements:
Advertisements are propagated towards subscribers
Rendezvous :
• This approach defines rendezvous nodes, which are broker nodes responsible for a given subset of
the event space.
• This routing algorithm defines two functions SN(s) and EN(e)
• SN(s) takes a given subscription, s, and returns one or more rendezvous nodes that take
responsibility for that subscription
• Each rendezvous node maintains a subscription list and forwards all matching events to the set of
subscribed nodes
• When an event , e, is published the function EN(e) returns one or more rendezvous node
• These nodes are responsible for matching e against subscriptions in the system
• Both SN(s) and EN(e) return more than one node if reliability is a concern.
Interesting interpretation:
• The event space is mapped onto a distributed hash table (DHT)
• Hash function can be used to map both events and subscriptions onto a corresponding rendezvous node for the
management of such subscriptions.
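A hedged sketch of the rendezvous idea, simplified to topic strings and to SN and EN each returning a single node; both functions hash onto the same set of brokers, so an event and the subscriptions it can match meet at the same rendezvous node:

import java.util.List;

// Toy rendezvous mapping: SN(s) and EN(e) hash onto the same set of broker nodes.
public class Rendezvous {
    private final List<String> brokers;             // identifiers of the available broker nodes

    public Rendezvous(List<String> brokers) { this.brokers = brokers; }

    public String SN(String subscriptionTopic) {    // rendezvous node for a subscription
        return brokers.get(Math.floorMod(subscriptionTopic.hashCode(), brokers.size()));
    }

    public String EN(String eventTopic) {           // rendezvous node for a published event
        return brokers.get(Math.floorMod(eventTopic.hashCode(), brokers.size()));
    }
}

Because SN and EN use the same hash, a subscription on topic t and an event on topic t are routed to the same broker, where matching takes place.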
Gossiping:
• It operates by nodes in the network periodically and probabilistically exchanging events with
neighbouring nodes.
• Through this approach, it is possible to propagate events effectively through the network without
the structure imposed by other approaches
Examples of publish-subscribe systems include the CORBA Event Service, TIBCO Rendezvous and the topic-based facilities of the Java Message Service (JMS).
Message queues
Message queues provide a point-to-point service in that the sender places the message into a
queue, and it is then removed by a single process. Message queues are also referred to as
Message-Oriented Middleware.

The programming model


The programming model offers a communication approach through which producer processes can send messages to a specific queue and other (consumer) processes can then receive messages from this queue. Three styles of receive are generally supported:
1. A blocking receive, which will block until an appropriate message is available;
2. A non-blocking receive (a polling operation), which will check the status of the queue and
return a message if available, or a not available indication otherwise;
3. A notify operation, which will issue an event notification when a message is available in
the associated queue.
This overall approach is captured pictorially in Figure.
A number of processes can send messages to the same queue, and likewise a number of receivers
can remove messages from a queue. The queuing policy is normally first-in-first-out (FIFO), and
sometimes also support the concept of priority.
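The three receive styles above map naturally onto, for example, the JMS API. The sketch below assumes a provider-specific ConnectionFactory has already been obtained (e.g., via JNDI) and that a queue named "orders" exists; in practice a consumer would use one style at a time:

import javax.jms.*;

public class ReceiveStylesSketch {
    public static void consume(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("orders");           // assumed queue name
        MessageConsumer consumer = session.createConsumer(queue);
        connection.start();

        Message m1 = consumer.receive();                       // 1. blocking receive
        Message m2 = consumer.receiveNoWait();                 // 2. non-blocking (polling); null if queue is empty
        consumer.setMessageListener(m ->                       // 3. notify: callback when a message arrives
                System.out.println("message available: " + m));
    }
}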

Consumer processes can also select messages from the queue based on properties of a message.
A message consists of a destination (that is, a unique identifier designating the destination queue),
metadata associated with the message, including fields such as the priority of the message and the
delivery mode, and also the body of the message. The body is normally opaque and untouched by
the message queue system.

The associated content is serialized using any of the standard approaches like marshalled data types, object serialization or XML structured messages. Message sizes are configurable and can be very large.

Since message bodies are opaque, message selection is normally expressed through predicates
defined over the metadata.

Messages in a queue are persistent – that is, message queues will store messages indefinitely (until they are consumed) and will also commit the messages to disk to enable reliable delivery. In particular, any message sent is eventually received (validity) and the message received is identical to the one sent, with no message delivered twice (integrity); however, message queue systems say nothing about the timing of delivery.

Message passing systems can also support additional functionality:


i. Transactions: the messages contained in a transaction are either all delivered or none of them are. The goal is to ensure that all the steps in the transaction are completed, or the transaction has no effect at all. This relies on interfacing with an external transaction service, provided by the middleware environment.

ii. A number of systems also support message transformation between formats to deal with
heterogeneity. This could be as simple as transforming from one byte order to another (big-endian
to little-endian) or more complex, like a transformation from one external data representation to
another (such as SOAP to IIOP).

The term message broker is often used to denote a service responsible for message
transformation.
iii. Some message queue implementations also provide support for security.

Message queues are similar in many ways to the message-passing system. The difference is that
message-passing systems have implicit queues associated with senders and receivers whereas
message queuing systems have explicit queues that are third-party entities, separate from the
sender and the receiver.

It is this key difference that makes message queues an indirect communication paradigm with the
crucial properties of space and time uncoupling.

Case study: WebSphere MQ


WebSphere MQ is middleware developed by IBM based on the concept of message queues,
offering an indirection between senders and receivers of messages.
Queues in WebSphere MQ are managed by queue managers which host and manage queues and
allow applications to access queues through the Message Queue Interface (MQI).

The MQI is a relatively simple interface allowing applications to carry out operations such as
connecting to or disconnecting from a queue (MQCONN and MQDISC) or sending/receiving
messages to/from a queue (MQPUT and MQGET). Multiple queue managers can reside on a
single physical server.

Client applications accessing a queue manager may reside either on the same physical machine or on a different machine. If on a different machine, they must communicate with the queue manager through what is known as a client channel.

Client channels adopt the concept of a proxy whereby MQI commands are issued on the proxy
and then sent transparently to the queue manager for execution using RPC.
In the figure, a client application is sending messages to a remote queue manager and
multiple services (on the same machine as the server) are then consuming the incoming
messages.
This is a very simple use of WebSphere MQ, and in practice it is more common for queue managers
to be linked together into a federated structure, mirroring the approach often adopted in publish-
subscribe systems (with networks of brokers).

To achieve this, MQ introduces the concept of a message channel as a unidirectional connection


between two queue managers that is used to forward messages asynchronously from one queue to
another.

A message channel is a connection between two queue managers, whereas a client channel is a
connection between a client application and a queue manager.

A message channel is managed by a message channel agent (MCA) at each end. The two agents
are responsible for establishing and maintaining the channel, including an initial negotiation to
agree on the properties of the channel (including security properties).

Routing tables are also included in each queue manager, and together with channels this allows
arbitrary topologies to be created.
Tools are provided for systems administrators to create suitable topologies and to hide the
complexities of establishing message channels and routing strategies.

A wide range of topologies can be created, including trees, meshes or a bus-based configuration.
One example topology often used in WebSphere MQ deployments, the hub-and-spoke topology, is presented below.

In the hub-and-spoke topology, one queue manager is designated as the hub. The hub hosts a
range of services. Client applications can connect to this hub only through queue managers
designated as spokes.

Spokes relay messages to the message queue of the hub for processing by the various services.
Spokes are placed strategically around the network to support different clients.

The hub is placed appropriately on a node with sufficient resources to deal with the volume of
traffic. Most applications and services are located on the hub, although it is also possible to have
some more local services on spokes.
This topology is heavily used with WebSphere MQ, particularly in large-scale deployments
covering significant geographical areas. The key to the approach is to be able to connect to a
local spoke over a high-bandwidth connection, for example over a local area network.

Communication between a client application and a queue manager uses RPC, whereas internal communication between queue managers is asynchronous (non-blocking). The client application is blocked only until the message is deposited in the local queue manager (the local spoke).

Subsequent delivery, potentially over wide area networks, is asynchronous but guaranteed to be
reliable by the WebSphere MQ middleware.

The drawback of this architecture is that the hub can be a potential bottleneck and a single point
of failure.

WebSphere MQ overcomes these problems by including queue manager clusters, which allow multiple instances of the same service to be supported by multiple queue managers, with implicit load balancing across the different instances.
Shared memory approaches
1.Distributed shared memory
Distributed shared memory (DSM) is an abstraction used for sharing data between computers that
do not share physical memory. Processes access DSM by reads and updates to what appears to be
ordinary memory within their address space.

An underlying runtime system ensures transparently that processes executing at different


computers observe the updates made by one another. It is as though the processes access a single
shared memory, but in fact the physical memory is distributed.
DSM is primarily a tool for parallel applications or for any distributed application or group of
applications in which individual shared data items can be accessed directly. It is different from
client – server architecture.

Message passing cannot be avoided altogether in a distributed system: in the absence of physically
shared memory, the DSM runtime support has to send updates in messages between computers.
DSM systems manage replicated data: each computer has a local copy of recently accessed data
items stored in DSM.

DSM can be persistent, that is, it may outlast the execution of any process or group of processes
that accesses it and be shared by different groups of processes over time.

Message passing versus DSM


A. Service offered: In the message-passing model, variables have to be marshalled at the sending end and unmarshalled at the receiving end. By contrast, with shared memory the processes involved share variables directly, so no marshalling is necessary – even of pointers to shared variables – and thus no separate communication operations are necessary.
Message passing allows processes to communicate while being protected from one another by having private address spaces, whereas processes sharing DSM can cause one another to fail by erroneously altering data.

When message passing is used between heterogeneous computers, marshalling takes care of differences in data representation; but when memory is shared directly between such computers, handling these differences is difficult.

In message passing, synchronization between processes is achieved through the message primitives themselves, using techniques such as a lock server implementation. In the case of DSM, synchronization is via normal constructs for shared-memory programming such as locks and semaphores.

Finally, since DSM can be made persistent, processes communicating via DSM may execute with
non-overlapping lifetimes. A process can leave data in an agreed memory location for the other to
examine when it runs. By contrast, processes communicating via message passing must execute at
the same time.
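As a purely local analogy of the two styles (it runs in one process, so it only illustrates the programming model, not distribution), the first half of the sketch below shares a variable directly while the second half passes the value as an explicit message:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class SharingStyles {
    public static void main(String[] args) throws InterruptedException {
        // "Shared memory" style: both threads read and update the same variable directly.
        AtomicInteger shared = new AtomicInteger(0);
        Thread writer = new Thread(() -> shared.set(42));
        writer.start();
        writer.join();
        System.out.println("read from shared variable: " + shared.get());

        // "Message passing" style: the value is explicitly sent and received.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1);
        Thread sender = new Thread(() -> queue.offer(42));
        sender.start();
        System.out.println("received message: " + queue.take());
    }
}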
B. Efficiency
The performance of a program based on DSM depends upon many factors, like the pattern of
data sharing.

With respect to cost, in message passing, all remote data accesses are explicit and therefore the
programmer is always aware of whether a particular operation is in-process or involves the
expense of communication.

Using DSM, however, any particular read or update may or may not involve communication by
the underlying runtime support. Whether it does or not depends upon such factors as whether the
data have been accessed before and the sharing pattern between processes at different
computers.
2. Tuple space communication
Processes communicate indirectly by placing tuples in a tuple space, from which other processes
can read or remove them. Tuples do not have an address but are accessed by pattern matching
on content.

The programming model


In the tuple space programming model, processes communicate through a tuple space – a shared
collection of tuples. Tuples in turn consist of a sequence of one or more typed data fields such as
<"fred", 1958>, <"sid", 1964> and <4, 9.8, "Yes">.

Any combination of types of tuples may exist in the same tuple space. Processes share data by
accessing the same tuple space: they place tuples in tuple space using the write operation and read
or extract them from tuple space using the read or take operation.

The write operation adds a tuple without affecting existing tuples in the space. The read operation
returns the value of one tuple without affecting the
contents of the tuple space. The take operation also returns a tuple, but in this case it also removes
the tuple from the tuple space.
When reading or removing a tuple from tuple space, a process provides a tuple specification and
the tuple space returns any tuple that matches that specification. To enable processes to
synchronize their activities, the read and take operations both block until there is a matching
tuple in the tuple space.

A tuple specification includes the number of fields and the required values or types of the fields.
For example, take(<String, integer>) could extract either <"fred", 1958> or <"sid", 1964>;
take(<String, 1958>) would extract only <"fred", 1958> of those two.

In the tuple space paradigm, no direct access to tuples in tuple space is allowed and processes
have to replace tuples in the tuple space instead of modifying them. Thus, tuples are immutable.

For example, the current count (say, 64) is in the tuple <"counter", 64>. A process must execute code
of the following form in order to increment the counter in a tuple space myTS:
<s, count> := myTS.take(<"counter", integer>);
myTS.write(<"counter", count+1>);
The tuple space shown in the figure contains a range of tuples representing geographical information
about countries in the United Kingdom, including populations and capital cities.
The take operation take(<String, "Scotland", String>) will match <"Capital", "Scotland",
"Edinburgh">, whereas take(<String, "Scotland", Integer>) will match <"Population", "Scotland",
5168000>.

The write operation write(<"Population", "Wales, 2900000>) will insert a new tuple in the tuple
space with information on the population of Wales.

Finally, read(<"Population", String, Integer) can match the equivalent tuples for the populations
of the UK, Scotland or indeed Wales, if this operation is executed after the corresponding write
operation.

Properties associated with tuple spaces:


1. Space uncoupling: A tuple placed in tuple space may originate from any number of sender
processes and may be delivered to any one of a number of potential recipients.

2. Time uncoupling: A tuple placed in tuple space will remain in that tuple space until removed
(potentially indefinitely), and hence the sender and receiver do not need to overlap in time.
Implementation issues
Many tuple space implementations adopt a centralized solution where the tuple space resource is
managed by a single server. This has advantages in terms of simplicity, but
such solutions are clearly not fault tolerant and also will not scale. Because
of this, distributed solutions have been proposed.

Several systems have proposed the use of replication to overcome the


problems identified above.

Distributed Objects
Middleware based on distributed objects brings the benefits of the object-oriented approach to distributed programming. With the object-oriented approach, distributed systems programmers are offered rich programming abstractions and the benefits of object-oriented tools and techniques.

Examples of distributed object middleware include java RMI and CORBA.


Java RMI usage is restricted to Java-based applications, whereas CORBA is a multi-language solution allowing objects written in a variety of languages to interoperate.
Differences between distributed programming and object oriented
programming:

1. Class: A fundamental concept in OO programming, but in distributed object middleware it is difficult to agree upon a common notion of class in a heterogeneous environment where multiple languages coexist.
In the distributed environment, instead of class, terms such as factory and template are used.

2. Inheritance: Distributed object middleware offers interface inheritance,


where a new interface inherits the method signatures of the original interface and can add extra ones.
In OO approach, inheritance is a relationship between implementations
where a new class inherits the implementation of original class and can
add extra behaviour.
Such implementation inheritance is difficult in distributed systems due to
the need to resolve the correct executable behaviour at run time.
Because of added complexities involved, a distributed system should need to
have following functionalities:

1. Inter-object communication: A distributed object middleware framework must offer a means for objects in the distributed environment to communicate with one another.
This facility, though provided by remote method invocation, is often supplemented by CORBA's event service and an associated notification service, both implemented as services on top of the core middleware.

2. Life cycle management: It is concerned with the creation, migration and


deletion of objects, with each step having to deal with the distributed nature
of the underlying environment.

3. Activation & Deactivation: Activation is the process of making an object


active in distributed environment by providing necessary resources for it to
process incoming invocations effectively by locating the object in virtual
memory and giving it the necessary thread to execute.
Deactivation is to make an object temporarily unable to process invocations.

4. Persistence: Maintaining states of objects across possible cycles of


activation and deactivation and system failures. Stateful objects should be
persistent.

5. Additional services: Additional services like naming, security and


transaction should also be offered by distributed object middleware
framework
From objects to components

Component-based approaches emerged to tackle the following problems of distributed object computing:
1. Implicit dependencies
2. Interaction with the middleware
3. Lack of separation of distribution concerns
4. No support for deployment
Implicit dependencies:
Implicit dependencies in the distributed configuration makes it difficult to ensure the safe composition of
a configuration, to replace one object with another, and for third-party developers to implement one
particular element in a distributed configuration.
Requirement:
Specify not only the interfaces offered by the objects but also the dependencies that object has on other
objects
Interaction with the middleware:
In using distributed object middleware the programmer is exposed to many relatively low-level details associated with the middleware architecture, which becomes a distraction from the main purpose of creating a particular application.
Requirement:
Simplify the programming of distributed applications, by presenting a clean separation of concerns
between code related to operation in a middleware framework and code associated with the
application.
Lack of separation of distribution concerns:
Programmers using distributed object middleware have to deal explicitly with non-functional
concerns related to issues such as security, transactions, coordination and replication.
Requirement:
Separation of the concerns and the complexities of dealing with the services should be hidden
wherever possible from the programmer
No support for deployment:
There is no support for the deployment of distributed objects, and developers inevitably resort to ad hoc strategies for deployment which are not portable to other environments
Requirements:
Middleware platforms should provide intrinsic support for deployment
Essence of components:
Components: A software component is a unit of composition (structure) with contractually specified
interfaces and explicit context dependencies only.
A component specifies both its interfaces provided to the outside world and its dependencies on other
components in the distributed environment.
A component is specified in terms of a contract , which includes
• A set of provided interfaces
• A set of required interfaces

A software architecture consists of components, interfaces and connections between interfaces.


Component-based approaches offer two styles of interface:
• Interfaces supporting remote method invocation
• Interfaces supporting distributed events
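A hedged Java sketch of such a contract, with the provided and required interfaces made explicit (all names are illustrative; constructor injection is just one common way to express the required interfaces):

// Provided interface: what the component offers to the outside world.
interface OrderProcessing {
    void placeOrder(String item);
}

// Required interfaces: what the component depends on, stated explicitly in its contract.
interface PaymentService { void charge(String item); }
interface AuditLog { void record(String entry); }

// The component declares its dependencies rather than acquiring them implicitly,
// so configurations can be composed and individual elements replaced safely.
class OrderComponent implements OrderProcessing {
    private final PaymentService payments;
    private final AuditLog log;

    OrderComponent(PaymentService payments, AuditLog log) {
        this.payments = payments;
        this.log = log;
    }

    public void placeOrder(String item) {
        payments.charge(item);
        log.record("ordered " + item);
    }
}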
Components and distributed systems:
Containers:
Containers support a common pattern often encountered in distributed applications, which consists of
• A front-end client
• One or more components that implement the application or business logic
• System services that manages the associated data in persistent storage

The task of the container is to provide a managed server-side hosting environment for components and to provide the necessary separation of concerns alluded to above.
• components deal with application concerns and the container deals with DS and middleware issues.
• The structure of the container shows a number of components encapsulated within the container.
• The container does not provide direct access to the components but rather intercepts incoming
invocations and then takes appropriate actions to ensure the desired properties of the distributed
applications are maintained.
Application servers:
Middleware that supports the container pattern and the separation of concerns implied by the pattern is known as
an application server.
Ex.
WebSphere Application Server
Enterprise JavaBeans
GlassFish
Support for deployment:
Component based middleware support for the deployment of component configurations.
Releases of software are packaged as software architectures together with deployment descriptors.
Deployment descriptors – fully describe how the configurations should be deployed in a distributed
environment.
Deployment descriptors are written in XML and include sufficient information to ensure that
• Components are correctly connected with appropriate protocols and associated middleware support
• The underlying middleware and platform are configured to provide the right level of support to the
component configuration
• The associated distributed system services are set up to provide the right level of security, transaction support
and so on.
