DISTRIBUTED OPERATING SYSTEMS: 15SE327E
UNIT 2
COMMUNICATION IN DISTRIBUTED SYSTEMS
• The single most important difference between a distributed system and a uniprocessor system is the interprocess communication.
• In a uniprocessor system, most interprocess communication implicitly assumes the existence of shared memory. A typical example is the producer-consumer problem, in which one process writes into a shared buffer and another process reads from it.
• In a distributed system there is no shared memory whatsoever, so the entire nature of interprocess communication must be completely rethought from scratch.
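To make the shared-memory assumption concrete, here is a minimal single-machine producer-consumer sketch in C; the buffer layout and function names are purely illustrative, and synchronization is omitted for brevity:

```c
/* Uniprocessor IPC in a nutshell: producer and consumer both touch
   the same buffer.  (A real version needs synchronization.) */
#include <stdio.h>

#define N 8
static int buffer[N];              /* shared buffer: both sides see it */
static int in = 0, out = 0;

static void produce(int item) { buffer[in++ % N] = item; }
static int  consume(void)     { return buffer[out++ % N]; }

int main(void) {
    produce(42);
    printf("consumed %d\n", consume());
    /* In a distributed system there is no such shared buffer:
       the item would have to travel in a message instead. */
    return 0;
}
```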
LAYERED PROTOCOLS (OSI MODEL)
• Due to the absence of shared memory, all communication in distributed systems is based on
message passing. When process A wants to communicate with process B, it first builds a
message in its own address space. Then it executes a system call that causes the operating
system to fetch the message and send it over the network to B.
• Although this basic idea sounds simple enough, in order to prevent chaos, A and B have to agree
on the meaning of the bits being sent. If A sends a brilliant new novel written in French and
encoded in IBM's EBCDIC character code, and B expects the inventory of a supermarket written in
English and encoded in ASCII, communication will be less than optimal.
• To make it easier to deal with the numerous levels and issues involved in communication, the
International Standards Organization (ISO) has developed a reference model that clearly identifies
the various levels involved, gives them standard names, and points out which level should do
which job. This model is called the Open Systems Interconnection Reference Model, usually
abbreviated as ISO OSI or sometimes just the OSI model.
• To start with, the OSI model is designed to allow open systems to communicate.
• An open system is one that is prepared to communicate with any other open system by using
standard rules that govern the format, contents, and meaning of the messages sent and received.
• These agreed-upon rules are called protocols. As an example, when the CPU wants to read a word from memory, it puts the address and certain control signals on the bus. The memory board is expected to see these signals and respond by putting the word requested on the bus within a certain time interval. If the memory board observes the required bus protocol, it will work correctly; otherwise it will not.
• Similarly, to allow a group of computers to communicate over a network, they must all agree on
the protocols to be used. The OSI model distinguishes between two general types of protocols.
• With connection-oriented protocols, before exchanging data, the sender and receiver first explicitly
establish a connection, and possibly negotiate the protocol they will use. When they are done,
they must release (terminate) the connection.
• With connectionless protocols, no setup in advance is needed. The sender just transmits the first
message when it is ready. Dropping a letter in a mailbox is an example of connectionless
communication. With computers, both connection-oriented and connectionless communication are
common.
• In the OSI model, communication is divided up into seven levels or layers.
• In this way, the problem can be divided up into manageable pieces, each of which can be solved independently of the others.
• The interface consists of a set of operations that together define the service the layer is
prepared to offer its users.
• In the OSI model, when process A on machine 1 wants to communicate with process B on machine 2, it builds a message and passes the message to the application layer on its machine.
• That layer adds a header to the front of the message and passes the result down to the layer below; each successive layer does the same, until the message reaches the physical layer and is actually transmitted.
• Some layers add not only a header to the front, but also a trailer to the end.
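As a rough illustration of this layering, the sketch below shows each layer prepending its own header, with the data link layer also appending a trailer; all layer names and header formats here are invented for illustration, not any real protocol's:

```c
/* A minimal sketch of layered encapsulation with hypothetical
   one-character headers; real stacks define their own formats. */
#include <stdio.h>
#include <string.h>

#define MAX 256

/* Prepend a header to the buffer; returns the new length. */
static size_t add_header(char *buf, size_t len, const char *hdr) {
    size_t h = strlen(hdr);
    memmove(buf + h, buf, len);      /* shift payload right */
    memcpy(buf, hdr, h);             /* prepend header */
    return len + h;
}

int main(void) {
    char buf[MAX] = "hello";         /* message built by process A */
    size_t len = strlen(buf);

    /* Each layer treats everything above it as opaque payload. */
    len = add_header(buf, len, "T|");   /* transport header */
    len = add_header(buf, len, "N|");   /* network header */
    len = add_header(buf, len, "D|");   /* data link header */
    memcpy(buf + len, "|CRC", 4);       /* data link trailer */
    len += 4;

    printf("%.*s\n", (int)len, buf);    /* D|N|T|hello|CRC */
    return 0;
}
```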
THE PHYSICAL LAYER
• The physical layer protocol deals with standardizing the electrical, mechanical, and signalling interfaces so that when one machine sends a 0 bit it is actually received as a 0 bit and not a 1 bit.
• In addition, the size and shape of the network connector (plug), as well as the number of pins and the meaning of each, are of concern here.
DATA LINK LAYER
• The physical layer just sends bits.
• As long as no errors occur, all is well. However, real communication networks are subject to
errors, so some mechanism is needed to detect and correct them.
• This mechanism is the main task of the data link layer. What it does is to group the bits into
units, sometimes called frames, and see that each frame is correctly received.
• The data link layer does its work by putting a special bit pattern on the start and end of each
frame, to mark them, as well as computing a checksum by adding up all the bytes in the frame
in a certain way.
• The data link layer appends the checksum to the frame.
• When the frame arrives, the receiver re-computes the checksum from the data and compares the
result to the checksum following the frame.
• If they agree, the frame is considered correct and is accepted. If they disagree, the receiver asks the sender to retransmit it. Frames are assigned sequence numbers (in the header), so everyone can tell which is which; a sketch of this mechanism appears after the example below.
For example:
• A is trying to send two messages, 0 and 1, to B. At time 0, data message 0 is sent, but when it
arrives, at time 1, noise on the transmission line has caused it to be damaged, so the checksum
is wrong. B notices this, and at time 2 asks for a retransmission using a control message.
Unfortunately, at the same time, A is sending data message 1.
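The framing-and-checksum mechanism described above can be sketched as follows; the additive checksum and frame layout are simplifications for illustration (real data link layers typically use CRCs):

```c
/* A toy framing sketch: additive checksum appended to each frame. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint8_t checksum(const uint8_t *data, size_t n) {
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];              /* "adding up all the bytes" */
    return sum;
}

int main(void) {
    uint8_t frame[16] = {0};
    frame[0] = 0;                    /* sequence number in the header */
    memcpy(frame + 1, "data0", 5);
    frame[6] = checksum(frame, 6);   /* sender appends the checksum */

    /* Receiver recomputes and compares; a mismatch triggers a
       retransmission request for this sequence number. */
    if (checksum(frame, 6) == frame[6])
        puts("frame 0 accepted");
    else
        puts("checksum error: request retransmission of frame 0");
    return 0;
}
```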
THE NETWORK LAYER
• On a LAN, the sender can usually reach the receiver directly, so routing is not an issue. A wide-area network, however, consists of a large number of machines, each with some number of lines to other machines, rather like a large-scale map showing major cities and roads connecting them.
• For a message to get from the sender to the receiver it may have to make a number of hops,
at each one choosing an outgoing line to use. The question of how to choose the best path is
called routing, and is the primary task of the network layer.
• The problem is complicated by the fact that the shortest route is not always the best route.
• What really matters is the amount of delay on a given route, which, in turn, is related to the
amount of traffic and the number of messages queued up for transmission over the various
lines.
• The delay can thus change over the course of time. Some routing algorithms try to adapt to
changing loads, whereas others are content to make decisions based on long-term averages.
• Two network-layer protocols are in widespread use, one connection-oriented and one
connectionless.
• The connection-oriented one is called X.25, and is favoured by the operators of public networks,
such as telephone companies and the European PTTs.
• The connectionless one is called IP (Internet Protocol) and is part of the DoD (U.S. Department of Defense) protocol suite.
THE TRANSPORT LAYER
• Packets can be lost on the way from the sender to the receiver. Although some applications can
handle their own error recovery, others prefer a reliable connection. The job of the transport layer
is to provide this service.
• The idea is that the session layer should be able to deliver a message to the transport layer
with the expectation that it will be delivered without loss.
• Upon receiving a message from the session layer, the transport layer breaks it into pieces small
enough for each to fit in a single packet, assigns each one a sequence number, and then sends
them all.
• The information in the transport layer header concerns which packets have been sent, which have been received, how many more the receiver has room to accept, and similar topics.
• Reliable transport connections (which by definition are connection-oriented) can be built on top of
either X.25 or IP.
• In the former case all the packets will arrive in the correct sequence (if they arrive at all), but in
the latter case it is possible for one packet to take a different route and arrive earlier than the
packet sent before it.
• It is up to the transport layer software to put everything back in order to maintain the illusion
that a transport connection is like a big tube — you put messages into it and they come out
undamaged and in the same order in which they went in.
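A minimal sketch of that reordering logic, assuming a small fixed window of sequence numbers and simple in-memory buffering (both assumptions made for brevity):

```c
/* Transport-layer resequencing: packets may arrive out of order
   (e.g., over IP) but are delivered upward in sequence order. */
#include <stdbool.h>
#include <stdio.h>

#define NSEQ 8

static const char *buffer[NSEQ];   /* holds packets that arrived early */
static bool present[NSEQ];
static int next_expected = 0;      /* next sequence number to deliver */

static void packet_arrived(int seq, const char *payload) {
    buffer[seq] = payload;
    present[seq] = true;
    /* Deliver as long as the next expected packet is on hand. */
    while (next_expected < NSEQ && present[next_expected]) {
        printf("deliver packet %d: %s\n",
               next_expected, buffer[next_expected]);
        next_expected++;
    }
}

int main(void) {
    packet_arrived(1, "world");    /* arrives early: buffered */
    packet_arrived(0, "hello");    /* delivers 0, then buffered 1 */
    return 0;
}
```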
• The official ISO transport protocol has five variants, known as TP0 through TP4.
• The differences relate to error handling and the ability to send several transport connections
over a single X.25 connection. The choice of which one to use depends on the properties of
the underlying network layer.
• The DoD transport protocol is called TCP (Transmission Control Protocol). It is similar to TP4.
• The combination TCP/IP is widely used at universities and on most UNIX systems.
• The DoD protocol suite also supports a connectionless transport protocol called UDP (User Datagram Protocol), which is essentially just IP with some minor additions. User programs that do not need a connection-oriented protocol normally use UDP.
THE SESSION LAYER
• The session layer is essentially an enhanced version of the transport layer.
• It provides dialog control, to keep track of which party is currently talking, and it provides
synchronization facilities.
• The latter are useful to allow users to insert checkpoints into long transfers, so that in the event
of a crash it is only necessary to go back to the last checkpoint, rather than all the way back to
the beginning.
THE PRESENTATION LAYER
• Unlike the lower layers, which are concerned with getting the bits from the sender to the receiver
reliably and efficiently, the presentation layer is concerned with the meaning of the bits.
• Most messages do not consist of random bit strings, but more structured information such as
people's names, addresses, amounts of money, and so on.
• In the presentation layer it is possible to define records containing fields like these and then have
the sender notify the receiver that a message contains a particular record in a certain format.
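One way to picture this: sender and receiver share a record definition and tag each message with the type of record it carries. The layout and tag value below are assumptions made purely for illustration:

```c
/* Sender and receiver agree on this record format in advance; the
   record_type tag tells the receiver how to interpret the body. */
#include <stdint.h>
#include <stdio.h>

#define RECORD_CUSTOMER 1          /* agreed-upon record type tag */

struct customer {                  /* fields both sides understand */
    char     name[32];
    char     address[64];
    uint32_t amount_cents;         /* amount of money, fixed units */
};

struct message {
    uint16_t record_type;          /* tells the receiver what follows */
    struct customer body;
};

int main(void) {
    struct message m = { RECORD_CUSTOMER,
                         { "A. User", "12 High St", 4999 } };
    if (m.record_type == RECORD_CUSTOMER)    /* receiver side */
        printf("%s owes %u cents\n", m.body.name, m.body.amount_cents);
    return 0;
}
```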
THE APPLICATION LAYER
• The application layer is really just a collection of miscellaneous protocols for common activities
such as electronic mail, file transfer, and connecting remote terminals to computers over a
network.
• The best known of these are the X.400 electronic mail protocol and the X.500 directory server.
ASYNCHRONOUS TRANSFER MODE NETWORKS
• When the telephone companies decided to build networks for the 21st Century, they faced a
dilemma: voice traffic is smooth, needing a low, but constant bandwidth, whereas data traffic is
bursty, usually needing no bandwidth (when there is no traffic), but sometimes needing a great
deal for very short periods of time.
• Neither traditional circuit switching nor packet switching was suitable for both kinds of traffic.
• After much study, a hybrid form using fixed-size blocks over virtual circuits was chosen as a
compromise that gave reasonably good performance for both types of traffic. This scheme,
called ATM (Asynchronous Transfer Mode), has become an international standard and is likely to
play a major role in future distributed systems, both local-area ones and wide-area ones.
• In the ATM model, a sender first establishes a connection (i.e., a virtual circuit) to the receiver or receivers.
• During connection establishment, a route is determined from the sender to the receiver(s) and
routing information is stored in the switches along the way.
• Using this connection, packets can be sent, but they are chopped up by the hardware into small,
fixed-sized units called cells.
• The cells for a given virtual circuit all follow the path stored in the switches. When the
connection is no longer needed, it is released and the routing information purged from the
switches.
• This scheme has a number of advantages over traditional packet and circuit switching. The most
important one is that a single network can now be used to transport an arbitrary mix of voice,
data, broadcast television, videotapes, radio, and other information efficiently, replacing what were
previously separate networks.
• New services, such as video conferencing for businesses, will also use it.
• In all cases, what the network sees is cells; it does not care what is in them.
• This integration represents an enormous cost saving and simplification that will make it possible
for each home and business to have a single wire (or fiber) coming in for all its communication
and information needs.
• It will also make possible new applications, such as video-on-demand, teleconferencing, and
access to thousands of remote data bases.
• Cell switching lends itself well to multicasting (one cell going to many destinations), a technique
needed for transmitting broadcast television to thousands of houses at the same time.
• Conventional circuit switching, as used in the telephone system, cannot handle this. Broadcast
media, such as cable TV can, but they cannot handle point-to-point traffic without wasting
bandwidth (effectively broadcasting every message).
• The advantage of cell switching is that it can handle both point-to-point and multicasting
efficiently.
• Fixed-size cells allow rapid switching, something much harder to achieve with current store-and-
forward packet switches. They also eliminate the danger of a small packet being delayed
because a big one is hogging a needed line.
• With cell switching, after each cell is transmitted, a new one can be sent, even a new one belonging to a different packet.
• ATM has its own protocol hierarchy.
• The ATM layer deals with cells and cell transport, including
routing, so it covers OSI layer 2 and part of layer 3.
• However, unlike OSI layer 2, the ATM layer does not recover
lost or damaged cells. The adaptation layer handles breaking
packets into cells and reassembling them at the other end, which
does not appear explicitly in the OSI model until layer 4.
• In the physical layer, cells can be transmitted over the wire directly. Alternatively, the adaptor board can use SONET (Synchronous Optical NETwork) in the physical layer, putting its cells into the payload portion of SONET frames.
• The virtue of this approach is compatibility with the internal transmission system of AT&T and
other carriers that use SONET.
• A basic SONET frame is 810 bytes long. Of these 810 bytes, 36 bytes are overhead, leaving 774 bytes of payload.
• One frame is transmitted every 125 μsec, to match the telephone system's standard sampling rate of 8000 samples/sec, so the gross data rate (including overhead) is 51.840 Mbps and the net data rate (excluding overhead) is 49.536 Mbps (see the arithmetic check below).
• The most important of these for ATM are OC-3c, at 155.520 Mbps and OC-12c, at 622.080
Mbps, because computers can probably produce data at these rates in the near future.
• For long-haul transmission within the telephone system, OC-12 and OC-48 are the most widely
used at present.
• OC-3c SONET adaptors for computers are now available to allow a computer to output SONET
frames directly.
• OC-12c is expected shortly. Since even OC-1 is overkill for a telephone, it is unlikely that many
audio telephones will ever speak ATM or SONET directly (ISDN will be used instead), but for
videophones ATM and SONET are ideal.
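The rates quoted above follow directly from the frame size and frame rate; a quick arithmetic check, assuming the 810-byte/125 μsec framing described earlier:

```c
/* Checking the SONET arithmetic quoted above. */
#include <stdio.h>

int main(void) {
    int frame_bytes = 810, overhead = 36, frames_per_sec = 8000;
    double gross = frame_bytes * 8.0 * frames_per_sec / 1e6;       /* Mbps */
    double net   = (frame_bytes - overhead) * 8.0 * frames_per_sec / 1e6;
    printf("gross = %.3f Mbps, net = %.3f Mbps\n", gross, net);
    /* OC-3c and OC-12c are 3x and 12x the basic OC-1 rate. */
    printf("OC-3c = %.3f Mbps, OC-12c = %.3f Mbps\n",
           3 * gross, 12 * gross);
    return 0;                      /* 51.840 / 49.536 / 155.520 / 622.080 */
}
```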
THE ATM LAYER
• When ATM was being developed, two factions developed within the standards committee.
• The Europeans wanted 32-byte cells because these had a small enough delay that echo
suppressors would not be needed in most European countries.
• The Americans, who already had echo suppressors, wanted 64-byte cells due to their greater
efficiency for data traffic.
• The end result was a 48-byte cell, which no one really liked. It is too big for voice and too small
for data.
• To make it even worse, a 5-byte header was added, giving a 53-byte cell containing a 48-byte
data field.
• Note that a 53-byte cell is not a good match for a 774-byte SONET payload, so ATM cells will
span SONET frames.
• Two separate levels of synchronization are thus needed: one to detect the start of a SONET
frame, and one to detect the start of the first full ATM cell within the SONET payload.
• However, a standard for packing ATM cells into SONET frames exists, and the entire layer can
be done in hardware.
• The layout of a cell header from a computer to the first ATM switch is shown in the diagram
below.
• Unfortunately, the layout of a cell header between two ATM switches is different, with
the GFC field being replaced by four more bits for the VPI field.
• In the view of many, this is unfortunate, since it introduces an unnecessary distinction between
computer-to-switch and switch-to-switch cells and hence adaptor hardware.
• Both kinds of cells have 48-byte payloads directly following the header.
• The GFC may some day be used for flow control, if an agreement on how to do it can be
achieved.
• The VPI and VCI fields together identify which path and virtual circuit a cell belongs to. Routing
tables along the way use this information for routing. These fields are modified at each hop along
the path.
• The purpose of the VPI field is to group together a collection of virtual circuits for the same
destination and make it possible for a carrier to reroute all of them without having to examine
the VCI field.
• The Payload type field distinguishes data cells from control cells, and further identifies several
kinds of control cells.
• The CLP field can be used to mark some cells as less important than others, so if congestion
occurs, the least important ones will be the ones dropped.
• Finally, there is a 1-byte checksum over the header (but not the data).
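The fields described above can be packed into the 5-byte computer-to-switch cell header as follows. The sketch assumes the standard UNI field widths (GFC 4 bits, VPI 8, VCI 16, Payload type 3, CLP 1, checksum 8) and omits the actual CRC-8 checksum computation, which real hardware performs over the first four header bytes:

```c
/* Packing the 5-byte ATM UNI cell header described above. */
#include <stdint.h>
#include <stdio.h>

static void pack_uni_header(uint8_t h[5], unsigned gfc, unsigned vpi,
                            unsigned vci, unsigned pt, unsigned clp) {
    h[0] = (uint8_t)((gfc & 0xF) << 4 | (vpi >> 4));       /* GFC | VPI hi */
    h[1] = (uint8_t)((vpi & 0xF) << 4 | (vci >> 12));      /* VPI lo | VCI hi */
    h[2] = (uint8_t)(vci >> 4);                            /* VCI middle */
    h[3] = (uint8_t)((vci & 0xF) << 4 |
                     (pt & 0x7) << 1 | (clp & 1));         /* VCI lo | PT | CLP */
    h[4] = 0;   /* header checksum: CRC-8 over bytes 0-3, omitted here */
}

int main(void) {
    uint8_t h[5];
    pack_uni_header(h, 0, 42, 1234, 0, 0);  /* data cell, VPI 42, VCI 1234 */
    for (int i = 0; i < 5; i++) printf("%02x ", h[i]);
    putchar('\n');
    return 0;
}
```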
THE ATM ADAPTATION LAYER
• At 155 Mbps, a cell can arrive every 3 μsec. Few, if any, current CPUs can handle in excess of 300,000 interrupts/sec.
• Thus a mechanism is needed to allow a computer to send a packet and to have the ATM
hardware break it into cells, transmit the cells, and then have them reassembled at the other end,
generating one interrupt per packet, not per cell.
• This disassembly/reassembly is the job of the adaptation layer. It is expected that most host
adaptor boards will run the adaptation layer on the board and give one interrupt per incoming
packet, not one per incoming cell.
Unfortunately, here too, the standards writers did not get it quite right. Originally, adaptation layers were defined for four classes of traffic:
1. Constant bit rate traffic (e.g., audio and video).
2. Variable bit rate traffic with bounded delay.
3. Connection-oriented data traffic.
4. Connectionless data traffic.
• Quickly it was discovered that classes 3 and 4 were essentially the same, so they were merged
into a new class, 3/4.
• At that point the computer industry woke up from a short nap and noticed that none of the
adaptation layers were suitable for data traffic, so they drafted AAL 5, for computer-to-computer
traffic.
• Its nickname, SEAL (Simple and Efficient Adaptation Layer), hints at what its designers thought of
the other three AAL layers.
• Let us focus on SEAL, due to its simplicity.
• It uses only one bit in the ATM header, one of the bits in the Payload type field.
• This bit is normally 0, but is set to 1 in the last cell of a packet. The last cell contains a trailer
in the final 8 bytes.
• In most cases there will be some padding (with zeros) between the end of the packet and the
start of the trailer.
• With SEAL, the destination just assembles incoming cells for each virtual circuit until it finds one
with the end-of-packet bit set. Then it extracts and processes the trailer.
• The trailer has four fields. The first two are each 1 byte long and are not used. Then comes a
2-byte field giving the packet length, and a 4-byte checksum over the packet, padding, and
trailer.
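A sketch of the receiver's side of SEAL, assuming the trailer layout just described; the big-endian byte order within the length and checksum fields is an assumption made for illustration:

```c
/* SEAL (AAL 5) reassembly: collect 48-byte cell payloads until the
   end-of-packet bit is seen, then read the trailer from the final
   8 bytes of the last cell. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CELL_PAYLOAD 48

/* 'buf' holds the concatenated payloads of one packet's cells;
   'len' is a multiple of 48.  Returns the true packet length. */
static unsigned parse_seal_trailer(const uint8_t *buf, size_t len) {
    const uint8_t *t = buf + len - 8;        /* trailer: final 8 bytes */
    /* t[0], t[1]: two 1-byte fields, not used */
    unsigned pkt_len = (unsigned)t[2] << 8 | t[3];     /* 2-byte length */
    uint32_t cksum = (uint32_t)t[4] << 24 | (uint32_t)t[5] << 16 |
                     (uint32_t)t[6] << 8  | t[7];      /* 4-byte checksum */
    printf("packet length %u, checksum %08x "
           "(verified over packet + padding + trailer)\n", pkt_len, cksum);
    return pkt_len;                          /* bytes before the padding */
}

int main(void) {
    uint8_t cells[CELL_PAYLOAD] = {0};       /* one-cell packet, zero-padded */
    memcpy(cells, "hello", 5);
    cells[CELL_PAYLOAD - 5] = 5;             /* length field: 5 data bytes */
    parse_seal_trailer(cells, sizeof cells);
    return 0;
}
```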
ATM SWITCHING
• ATM networks are built up of copper or optical
cables and switches. The figure below (a)
illustrates a network with four switches. Cells
originating at any of the eight computers
attached to the system can be switched to any
of the other computers by traversing one or
more switches. Each of these switches has four
ports, each used for both input and output.
• Although the standard allows cells to be dropped, it requires that those delivered must be
delivered in order.
• A problem arises when two cells arrive at the same time on different input lines and need to go
to the same output port.
• Just throwing one of them away is allowed by the standard, but if your switch drops more than 1 cell in 10^12, you are unlikely to sell many switches.
• An alternative scheme is to pick one of them at random and forward it, holding the other cell
until later.
• In the next round, this algorithm is applied again. If two ports each have streams of cells for the
same destination, substantial input queues will build up, blocking other cells behind them that
want to go to output ports that are free. This problem is known as head-of-line blocking.
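The toy simulation below illustrates head-of-line blocking with two input queues feeding two output ports; the queue contents and the random arbitration are purely illustrative:

```c
/* Two input lines, two output ports.  Each queue entry is the output
   port the cell at that position wants.  When both heads contend for
   the same port, one is picked at random and the other is held, which
   can delay a cell behind it whose port is actually free. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int q0[] = {1, 0};  int n0 = 2, h0 = 0;   /* input line 0 */
    int q1[] = {1, 1};  int n1 = 2, h1 = 0;   /* input line 1 */

    for (int round = 1; h0 < n0 || h1 < n1; round++) {
        printf("round %d: ", round);
        if (h0 < n0 && h1 < n1 && q0[h0] == q1[h1]) {
            /* Contention: forward one head at random, hold the other. */
            if (rand() % 2) { printf("line 0 -> port %d\n", q0[h0]); h0++; }
            else            { printf("line 1 -> port %d\n", q1[h1]); h1++; }
        } else {
            if (h0 < n0) { printf("line 0 -> port %d  ", q0[h0]); h0++; }
            if (h1 < n1) { printf("line 1 -> port %d", q1[h1]); h1++; }
            putchar('\n');
        }
    }
    return 0;
}
```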
• A different switch design copies the cell into a queue associated with the output buffer and lets
it wait there, instead of keeping it in the input buffer.
• It is also possible for a switch to have a pool of buffers that can be used for both input and
output buffering.
• Still another possibility is to buffer on the input side, but allow the second or third cell in line to
be switched, even if the first one cannot be.
THE CLIENT-SERVER MODEL
• While ATM networks are going to be important in the future, for the moment they are too
expensive for most applications, so let us go back to more conventional networking.
• At first glance, layered protocols along the OSI lines look like a fine way to organize a distributed
system. In effect, a sender sets up a connection (a bit pipe) with the receiver, and then pumps the
bits in, which arrive without error, in order, at the receiver. What could be wrong with this?
• Plenty. To start with, the existence of all those headers generates a considerable amount of
overhead. Every time a message is sent it must be processed by about half a dozen layers, each
one generating and adding a header on the way down or removing and examining a header on the
way up. All of this work takes time.
• On wide-area networks, where the number of bits/sec that can be sent is typically fairly low (often
as little as 64K bits/sec), this overhead is not serious. The limiting factor is the capacity of the
lines, and even with all the header manipulation, the CPUs are fast enough to keep the lines
running at full speed. Thus a wide-area distributed system can probably use the OSI or TCP/IP
protocols without any loss in (the already meagre) performance.
• However, for a LAN-based distributed system, the protocol overhead is often substantial.
• So much CPU time is wasted running protocols that the effective throughput over the LAN is
often only a fraction of what the LAN can do.
• As a consequence, most LAN-based distributed systems do not use layered protocols at all, or if
they do, they use only a subset of the entire protocol stack.
• In addition, the OSI model addresses only a small aspect of the problem: getting the bits from the sender to the receiver.
CLIENTS AND SERVERS
• The idea behind this model is to structure the operating system as a group of cooperating
processes, called servers, that offer services to the users, called clients.
• The client and server machines normally all run the same microkernel, with both the clients and
servers running as user processes, as we saw earlier.
• A machine may run a single process, or it may run multiple clients, multiple servers, or a mixture
of the two.
• To avoid the considerable overhead of the connection-oriented protocols such as OSI or TCP/IP,
the client server model is usually based on a simple, connectionless request/reply protocol.
• The client sends a request message to the server asking for some service. The server does the
work and returns the data requested or an error code indicating why the work could not be
performed, as depicted in the figure.
• The primary advantage of this model is its simplicity. The client sends a request and gets an answer. No connection has to be established before use or torn down afterward. The reply message serves as the acknowledgement of the request.
• From the simplicity comes another advantage: efficiency.
• Assuming that all the machines are identical, only three levels of protocol are
needed, as shown in the figure.
• The physical and data link protocols take care of getting the packets from
client to server and back.
• Layer 5 is the request/reply protocol. It defines the set of legal requests and
the set of legal replies to these requests.
• In the simplest case, the system provides just two communication system calls: one to send a message and one to receive a message. These system calls can be invoked through library procedures, say, send(dest, &mptr) and receive(addr, &mptr).
• The former sends the message pointed to by mptr to a process identified by dest and causes the caller to be blocked until the message has been sent.
• The latter causes the caller to be blocked until a message arrives. When one does, the message is copied to the buffer pointed to by mptr and the caller is unblocked.
• The addr parameter specifies the address to which the receiver is listening. Many variants of these two procedures and their parameters are possible.
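A sketch of such a request/reply exchange using connectionless BSD UDP sockets; the server address, port number, and request format below are invented for illustration:

```c
/* A minimal connectionless request/reply client in the spirit of the
   send/receive primitives above. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);          /* connectionless */
    struct sockaddr_in dest = {0};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(8000);                     /* assumed port */
    inet_pton(AF_INET, "192.168.1.243", &dest.sin_addr);

    char req[] = "READ file.txt", reply[512];

    /* send: transmit the request; no connection setup beforehand */
    sendto(s, req, sizeof req, 0, (struct sockaddr *)&dest, sizeof dest);

    /* receive: block until the reply arrives; the reply doubles as
       the acknowledgement of the request */
    ssize_t n = recvfrom(s, reply, sizeof reply, 0, NULL, NULL);
    if (n > 0) printf("reply: %.*s\n", (int)n, reply);

    close(s);
    return 0;
}
```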
ADDRESSING
• In order for a client to send a message to a server, it must know the server's address.
• In the example of the preceding section, the server's address was simply hardwired
into header.h as a constant. While this strategy might work in an especially simple system, usually
a more sophisticated form of addressing is needed.
• In our example, the file server has been assigned a numerical address (243), but we have not
really specified what this means.
• If it refers to a specific machine, the sending kernel can extract it from the message structure and
use it as the hardware address for sending the packet to the server.
• All the sending kernel has to do then is build a frame using the 243 as the data link address and
put the frame out on the LAN. The server's interface board will see the frame, recognize 243 as
its own address, and accept it.
• If there is only one process running on the destination machine, the kernel will know what to do
with the incoming message — give it to the one and only process running there.
• However, what happens if there are several processes running on the destination machine? Which
one gets the message? The kernel has no way of knowing.
• Consequently, a scheme that uses network addresses to identify processes means that only one
process can run on each machine. While this limitation is not fatal, it is sometimes a serious
restriction.
• An alternative addressing system sends messages to processes rather than
to machines. Although this method eliminates all ambiguity about who the
real recipient is, it does introduce the problem of how processes are
identified.
• One common scheme is to use two part names, specifying both a machine
and a process number. Thus 243.4 or 4@243 or something similar
designates process 4 on machine 243.
• The machine number is used by the kernel to get the message correctly
delivered to the proper machine, and the process number is used by the
kernel on that machine to determine which process the message is
intended for.
• A nice feature of this approach is that every machine can number its
processes starting at 0. No global coordination is needed because there is
never any ambiguity between process 0 on machine 243 and process 0 on
machine 199. The former is 243.0 and the latter is 199.0.
• In a variant of this scheme, a process is addressed as machine.local-id rather than machine.process. The local-id field is normally a randomly chosen 16-bit or 32-bit integer (or the next one in sequence).
• One process, typically a server, starts up by making a system call to tell the kernel that it wants to listen to local-id. Later, when a message comes in addressed to machine.local-id, the kernel knows which process to give the message to.
• Most communication in Berkeley UNIX, for example, uses this method, with 32-bit Internet
addresses used for specifying machines and 16-bit numbers for the local-id fields.
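In socket terms, that machine.local-id pair is exactly what a sockaddr_in holds; a small illustrative sketch (the addresses are invented):

```c
/* The Berkeley UNIX form of machine.local-id addressing: a 32-bit
   Internet address names the machine, a 16-bit number the endpoint. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

int main(void) {
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    inet_pton(AF_INET, "10.0.0.243", &addr.sin_addr); /* machine (32-bit) */
    addr.sin_port = htons(4);                         /* local-id (16-bit) */

    char ip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof ip);
    printf("process %u on machine %s\n", ntohs(addr.sin_port), ip);
    return 0;
}
```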
• Nevertheless, machine.process addressing is far from ideal.
• Specifically, it is not transparent since the user is obviously aware of where the server is located,
and transparency is one of the main goals of building a distributed system.
• To see why this matters, suppose that the file server normally runs on machine 243, but one day
that machine is down. Machine 176 is available, but programs previously compiled
using header.h all have the number 243 built into them, so they will not work if the server is
unavailable. Clearly, this situation is undesirable.
• An alternative approach is to assign each process a unique address that does not contain an embedded machine number.
• One way to achieve this goal is to have a centralized process address allocator that simply
maintains a counter.
• Upon receiving a request for an address, it simply returns the current value of the counter and
then increments it by one. The disadvantage of this scheme is that centralized components like
this do not scale to large systems and thus should be avoided.
• Yet another method for assigning process identifiers is to let each process pick its own identifier from a large, sparse address space, such as the space of 64-bit binary integers. The probability of two processes picking the same number is tiny.
• The sending kernel must still discover which machine such an address lives on. On a LAN, the sender can broadcast a special locate packet containing the address of the destination process; the kernel that recognizes the address replies with its machine's network address, which the sender can then cache.
• A different approach is to give each server an ASCII name and have clients look it up at run time. Every time a client runs, on the first attempt to use a server, the client sends a query message to a special mapping server, often called a name server, asking it for the machine number where the server is currently located.
• Once this address has been obtained, the request can be sent directly. As in the previous case, addresses can be cached.
In summary, we have the following methods for addressing processes:
1. Hardwire machine.number into client code.
2. Let processes pick random addresses; locate them by broadcasting.
3. Put ASCII server names in clients; look them up at run time.
• The first one is not transparent, the second one generates extra load on the system, and the
third one requires a centralized component, the name server.
• Of course, the name server can be replicated, but doing so introduces the problems associated
with keeping them consistent.
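A sketch of the client side of name-server lookup with caching, where lookup_remote() stands in for the actual query to the (possibly replicated) name server and is entirely hypothetical:

```c
/* Client-side name lookup: try the cache first, ask the name server
   only on a miss, then remember the answer for next time. */
#include <stdio.h>
#include <string.h>

struct cache_entry { char name[32]; int machine; };
static struct cache_entry cache[16];
static int ncached = 0;

/* Hypothetical stand-in for the query to the name server. */
static int lookup_remote(const char *name) {
    (void)name;
    return 176;                     /* e.g., the file server moved here */
}

static int lookup(const char *name) {
    for (int i = 0; i < ncached; i++)           /* 1. try the cache */
        if (strcmp(cache[i].name, name) == 0)
            return cache[i].machine;
    int machine = lookup_remote(name);          /* 2. ask the name server */
    if (ncached < 16) {                         /* 3. cache the answer */
        strcpy(cache[ncached].name, name);
        cache[ncached].machine = machine;
        ncached++;
    }
    return machine;
}

int main(void) {
    printf("file-server is on machine %d\n", lookup("file-server"));
    printf("file-server is on machine %d\n", lookup("file-server")); /* cached */
    return 0;
}
```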