Final Unit 5

Lecture 34

Memory Organization, Memory Hierarchy, Main Memory

Memory Hierarchy

The total memory capacity of a computer can be visualized as a hierarchy of components. The
memory hierarchy system consists of all storage devices contained in a computer system, from
the slow auxiliary memory to the fast main memory and the smaller cache memory.
Auxiliary memory access time is generally 1000 times that of main memory, hence it is at
the bottom of the hierarchy.
Main memory occupies the central position because it is equipped to communicate directly
with the CPU and with auxiliary memory devices through an input/output (I/O) processor.
When a program not residing in main memory is needed by the CPU, it is brought in from
auxiliary memory. Programs not currently needed in main memory are transferred into auxiliary
memory to provide space in main memory for other programs that are currently in use.
Cache memory is used to store program data which is currently being executed in the CPU.
The approximate access time ratio between cache memory and main memory is about 1 to 7-10.
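As a rough illustration of why the cache reduces the average access time, the short C sketch below computes the average access time for an assumed 1:10 cache-to-main-memory ratio and an assumed hit ratio; the specific numbers are illustrative only, not taken from the lecture.

#include <stdio.h>

/* Illustration only: average memory access time with a cache.
 * t_cache and t_main reflect the roughly 1:10 ratio mentioned above;
 * the hit ratio is an assumed value, not a measured one. */
int main(void) {
    double t_cache = 10.0;    /* ns, assumed cache access time        */
    double t_main  = 100.0;   /* ns, assumed main memory access time  */
    double hit_ratio = 0.95;  /* fraction of references served by the cache */

    double t_avg = hit_ratio * t_cache + (1.0 - hit_ratio) * t_main;
    printf("Average access time = %.1f ns\n", t_avg);  /* prints 14.5 ns */
    return 0;
}

Even with only a 95% hit ratio, the average access time drops close to the cache access time rather than the main memory access time.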

Lecture 35
Auxiliary Memory, Associative Memory
Definition: Auxiliary memory (also referred to as secondary storage) is the lowest-cost,
highest-capacity, and slowest-access non-volatile storage in a computer system. It is where
programs and data are kept for long-term storage or when not in immediate use. Such
memories tend to occur in two types: sequential access (data must be accessed in a linear
sequence) and direct access (data may be accessed in any sequence). The most common
sequential storage device is magnetic tape, whereas direct-access devices include rotating
drums, disks, CD-ROMs, and DVD-ROMs. Auxiliary memory is used as permanent storage of
data in mainframes and supercomputers.
Auxiliary memory may also be referred to as auxiliary storage, secondary storage, secondary
memory, external storage or external memory. Auxiliary memory is not directly accessible by
the CPU; instead, it stores noncritical system data like large data files, documents, programs and
other backup information, which is supplied to primary memory from auxiliary memory over a
high-bandwidth channel whenever necessary. Auxiliary memory holds data for future use, and it
retains information even when the power fails.
Associative memory: a type of computer memory from which items may be retrieved by
matching some part of their content, rather than by specifying their address (hence also
called associative storage or content-addressable memory (CAM)). The part of the content used
for matching serves as an identifying tag. Associative memory is much slower than RAM and is
rarely encountered in mainstream computer designs. Associative memory is used in multilevel
memory systems, in which a small fast memory such as a cache may hold copies of some blocks
of a larger memory for rapid access.
To retrieve a word from associative memory, a search key (or descriptor) must be presented that
represents particular values of all or some of the bits of the word. This key is compared in
parallel with the corresponding lock or tag bits of all stored words, and all words matching this
key are signalled to be available.
Associative memory is expensive to implement as integrated circuitry, so it is used only in
certain very-high-speed searching applications. Associative memory searches data for access by
the content of the data (its tag) rather than by its address.
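The following is a minimal software sketch of the key-and-mask search idea described above. A real CAM compares the key against every stored word in parallel in hardware; here a loop stands in for that parallel comparison, and the array contents, key and mask values are assumptions made up for illustration.

#include <stdint.h>
#include <stdio.h>

/* Sketch of a content-addressable search: match only the bits selected
 * by the mask (the "tag" bits) against the search key. */
#define WORDS 4

int main(void) {
    uint16_t memory[WORDS] = { 0x1234, 0xABCD, 0x12FF, 0x0000 };
    uint16_t key  = 0x1200;   /* search key (argument register)         */
    uint16_t mask = 0xFF00;   /* key register: compare only these bits  */

    for (int i = 0; i < WORDS; i++) {
        if ((memory[i] & mask) == (key & mask))   /* tag bits match */
            printf("word %d matches: 0x%04X\n", i, (unsigned)memory[i]);
    }
    return 0;
}

Running this reports that words 0 and 2 match, i.e. every stored word whose upper byte equals 0x12.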

Lecture 36

Cache Memory, Virtual Memory


Cache memory is a special, very high-speed memory. It is used to speed up the system and keep
pace with the high-speed CPU. Cache memory is costlier than main memory or disk memory but
more economical than CPU registers. Cache memory is an extremely fast memory type that acts
as a buffer between RAM and the CPU. It holds frequently requested data and instructions so that
they are immediately available to the CPU when needed.
Cache memory is used to reduce the average time to access data from main memory. The
cache is a smaller and faster memory which stores copies of the data from frequently used main
memory locations. There are various independent caches in a CPU, which store
instructions and data.

Virtual Memory in Operating System


Virtual Memory is a storage allocation scheme in which secondary memory can be addressed as
though it were part of main memory. The addresses a program may use to reference memory are
distinguished from the addresses the memory system uses to identify physical storage sites, and
program-generated addresses are translated automatically to the corresponding machine
addresses.
The size of virtual storage is limited by the addressing scheme of the computer system and by the
amount of secondary memory available, not by the actual number of main storage locations.
Virtual memory is a technique that is implemented using both hardware and software. It maps
memory addresses used by a program, called virtual addresses, into physical addresses in
computer memory.

1. All memory references within a process are logical addresses that are dynamically
translated into physical addresses at run time. This means that a process can be swapped in
and out of main memory such that it occupies different places in main memory at different
times during the course of execution.
2. A process may be broken into a number of pieces, and these pieces need not be contiguously
located in main memory during execution. The combination of dynamic run-time
address translation and the use of a page or segment table permits this.
If these characteristics are present, then it is not necessary that all the pages or segments of a
process be present in main memory during execution. This means that the required pages are
loaded into memory only when they are needed. Virtual memory is implemented using Demand
Paging or Demand Segmentation.
Demand Paging:
The process of loading a page into memory on demand (whenever a page fault occurs) is known
as demand paging.

The process includes the following steps:

1. If the CPU tries to refer to a page that is currently not available in main memory, it generates
an interrupt indicating a memory access fault.
2. The OS puts the interrupted process in a blocked state. For the execution to proceed, the
OS must bring the required page into memory.
3. The OS will search for the required page in the logical address space.
4. The required page will be brought from the logical address space to the physical address space.
Page replacement algorithms are used to decide which page to replace in the
physical address space.
5. The page table will be updated accordingly.
6. A signal will be sent to the CPU to continue the program execution, and the process will be
placed back into the ready state.
Hence, whenever a page fault occurs, these steps are followed by the operating system and the
required page is brought into memory.
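The toy, self-contained C simulation below walks through these steps for a tiny page table. The sizes, the FIFO victim selection and the helper names (touch_page, handle_page_fault, frame_owner) are placeholders chosen for illustration; a real OS uses its own page tables, frame allocator and disk driver.

#include <stdio.h>
#include <string.h>

#define PAGES  8
#define FRAMES 2

typedef struct { int frame; int valid; } pte_t;

static pte_t page_table[PAGES];
static int   frame_owner[FRAMES];    /* which page currently occupies each frame */
static int   next_victim = 0;        /* trivial FIFO replacement                 */

static void handle_page_fault(int page) {
    int frame = next_victim;                         /* step 4: pick a frame to replace */
    next_victim = (next_victim + 1) % FRAMES;
    if (frame_owner[frame] >= 0)                     /* evict the old page, if any */
        page_table[frame_owner[frame]].valid = 0;
    /* steps 3-4: "read" the page from backing store into the frame (simulated) */
    printf("page fault: loading page %d into frame %d\n", page, frame);
    frame_owner[frame] = page;
    page_table[page].frame = frame;                  /* step 5: update the page table */
    page_table[page].valid = 1;                      /* step 6: process can be made ready */
}

static void touch_page(int page) {
    if (!page_table[page].valid)                     /* step 1: page not in memory */
        handle_page_fault(page);
    printf("access page %d -> frame %d\n", page, page_table[page].frame);
}

int main(void) {
    memset(frame_owner, -1, sizeof frame_owner);     /* no frame holds a page yet */
    touch_page(0); touch_page(1); touch_page(0); touch_page(3);  /* page 3 evicts page 0 */
    return 0;
}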

Advantages :
 More processes may be maintained in the main memory: Because we are going to load only
some of the pages of any particular process, there is room for more processes. This leads to
more efficient utilization of the processor because it is more likely that at least one of the
more numerous processes will be in the ready state at any particular time.
 A process may be larger than all of main memory: One of the most fundamental restrictions
in programming is lifted. A process larger than the main memory can be executed because
of demand paging. The OS itself loads pages of a process in main memory as required.
 It allows greater multiprogramming levels by using less of the available (primary) memory
for each process.
Page Fault Service Time:
The time taken to service a page fault is called the page fault service time. The page fault
service time includes the time taken to perform all of the above six steps.
Let the main memory access time be m,
the page fault service time be s,
and the page fault rate be p.
Then, Effective memory access time = (p*s) + (1-p)*m
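The short C program below plugs assumed numbers into this formula to show how even a rare page fault dominates the effective access time; the values of m, s and p are illustrative assumptions, not figures from the lecture.

#include <stdio.h>

/* Worked example of: EAT = p*s + (1-p)*m, with assumed values. */
int main(void) {
    double m = 100e-9;    /* main memory access time: 100 ns (assumed)      */
    double s = 8e-3;      /* page fault service time: 8 ms (assumed)        */
    double p = 1e-6;      /* page fault rate: one fault per 10^6 accesses   */

    double eat = p * s + (1.0 - p) * m;
    printf("Effective access time = %.1f ns\n", eat * 1e9);  /* about 108 ns */
    return 0;
}

Even with only one fault per million accesses, the 8 ms service time adds roughly 8 ns to every access, i.e. almost 10% overhead on a 100 ns memory.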
Swapping:
Swapping a process out means removing all of its pages from memory, or marking them so that
they will be removed by the normal page replacement process. Suspending a process ensures that
it is not runnable while it is swapped out. At some later time, the system swaps the process back
from secondary storage to main memory. When a process is busy swapping pages in and out,
this situation is called thrashing.

Thrashing:

At any given time, only a few pages of any process are in main memory, and therefore more
processes can be maintained in memory. Furthermore, time is saved because unused pages are not
swapped in and out of memory. However, the OS must be clever about how it manages this
scheme. In the steady state, practically all of main memory will be occupied with process
pages, so that the processor and OS have direct access to as many processes as possible. Thus
when the OS brings one page in, it must throw another out. If it throws out a page just before it is
used, then it will just have to get that page again almost immediately. Too much of this leads to a
condition called thrashing: the system spends most of its time swapping pages rather than
executing instructions. So a good page replacement algorithm is required.

In the given diagram, for an initial degree of multiprogramming up to some point (lambda), the
CPU utilization is very high and the system resources are utilized 100%. But if we further
increase the degree of multiprogramming, the CPU utilization falls drastically, the
system spends more time only in page replacement, and the time taken to complete the
execution of a process increases. This situation in the system is called thrashing.

Causes of Thrashing:
1. High degree of multiprogramming: If the number of processes in memory keeps on
increasing, then the number of frames allocated to each process decreases, so fewer
frames are available to each process. Due to this, page faults occur more
frequently, more CPU time is wasted in just swapping pages in and out, and
utilization keeps on decreasing.
For example:
Let free frames = 400
Case 1: Number of processes = 100
Then, each process will get 4 frames.
Case 2: Number of processes = 400
Each process will get 1 frame.
Case 2 is a condition of thrashing: as the number of processes is increased, frames per
process are decreased. Hence CPU time will be consumed in just swapping pages.
2. Lack of frames: If a process has fewer frames, then fewer pages of that process
will be able to reside in memory, and hence more frequent swapping in and out will be
required. This may lead to thrashing. Hence a sufficient number of frames must be allocated
to each process in order to prevent thrashing.
Recovery from Thrashing:
 Do not allow the system to go into thrashing by instructing the long-term scheduler not to
bring processes into memory beyond the threshold.
 If the system is already thrashing, then instruct the mid-term scheduler to suspend some
of the processes so that we can recover the system from thrashing.

Lecture 37
Characteristics of Multiprocessors
A multiprocessor system is an interconnection of two or more CPUs, with memory and input-
output equipment. As defined earlier, multiprocessors can be put under the MIMD category. The
term multiprocessor is sometimes confused with the term multicomputer. Though both support
concurrent operations, there is an important difference between a system with multiple
computers and a system with multiple processors. In a multicomputer system, there are multiple
computers, with their own operating systems, which communicate with each other, if needed,
through communication links. A multiprocessor system, on the other hand, is controlled by a
single operating system, which coordinates the activities of the various processors, either through
shared memory or interprocessor messages.

The advantages of multiprocessor systems are:

· Increased reliability because of redundancy in processors
· Increased throughput because of execution of
- multiple jobs in parallel
- portions of the same job in parallel

A single job can be divided into independent tasks, either manually by the programmer or by the
compiler, which finds the portions of the program that are data independent and can be executed
in parallel. Multiprocessors are further classified into two groups depending on the way their
memory is organized. Processors with shared memory are called tightly coupled or shared-
memory processors. The information in these processors is shared through the common memory,
and each processor can also have its own local memory. The other class of multiprocessors
is loosely coupled or distributed-memory multiprocessors, in which each processor has its own
private memory, and the processors share information with each other through an interconnection
switching scheme or message passing.

The principal characteristic of a multiprocessor is its ability to share a set of main memory and
some I/O devices. This sharing is possible through some physical connections between them
called the interconnection structures.
1. Multiprocessor:
A multiprocessor is a computer system with two or more central processing units (CPUs) that
share full access to a common RAM. The main objective of using a multiprocessor is to boost the
system’s execution speed, with other objectives being fault tolerance and application matching.
There are two types of multiprocessors: shared-memory multiprocessors and distributed-memory
multiprocessors. In a shared-memory multiprocessor, all CPUs share the common memory, but
in a distributed-memory multiprocessor, every CPU has its own private memory.

Applications of Multiprocessor –
1. As a uniprocessor, such as single instruction, single data stream (SISD).
2. As a multiprocessor, such as single instruction, multiple data stream (SIMD), which is
usually used for vector processing.
3. Multiple series of instructions in a single perspective, such as multiple instruction, single
data stream (MISD), which is used for describing hyper-threading or pipelined processors.
4. Inside a single system for executing multiple, individual series of instructions in multiple
perspectives, such as multiple instruction, multiple data stream (MIMD).
Benefits of using a Multiprocessor –

 Enhanced performance.
 Multiple applications.
 Multi-tasking inside an application.
 High throughput and responsiveness.
 Hardware sharing among CPUs.
2. Multicomputer:
A multicomputer system is a computer system with multiple processors that are connected
together to solve a problem. Each processor has its own memory, which is accessible only by that
particular processor, and the processors can communicate with each other via an
interconnection network.

Because a multicomputer is capable of message passing between the processors, a task can be
divided among the processors to complete it. Hence, a multicomputer can be used
for distributed computing. It is more cost effective and easier to build a multicomputer than a
multiprocessor.
Difference between multiprocessor and Multicomputer:

1. A multiprocessor is a system with two or more central processing units (CPUs) that is
capable of performing multiple tasks, whereas a multicomputer is a system with multiple
processors that are attached via an interconnection network to perform a computation task.
2. A multiprocessor system is a single computer that operates with multiple CPUs, whereas a
multicomputer system is a cluster of computers that operate as a single computer.
3. Construction of a multicomputer is easier and more cost effective than that of a multiprocessor.
4. In a multiprocessor system, programming tends to be easier, whereas in a multicomputer
system, programming tends to be more difficult.
5. A multiprocessor supports parallel computing; a multicomputer supports distributed
computing.

Lecture 38
Interconnection Structures
A computer consists of a set of components or modules of three basic types
(processor, memory, I/O) that communicate with each other. In effect, a computer is a
network of basic modules. Thus, there must be paths for connecting the modules. The
collection of paths connecting the various modules is called the interconnection structure.
The design of this structure will depend on the exchanges that must be made among
modules.
Figure below suggests the types of exchanges that are needed by indicating the major forms
of input and output for each module type.

• Memory: Typically, a memory module will consist of N words of equal length. Each
word is assigned a unique numerical address (0, 1, . . . , N – 1). A word of data can be read
from or written into the memory. The nature of the operation is indicated by read and write
control signals. The location for the operation is specified by an address.
• I/O module: From an internal (to the computer system) point of view, I/O is functionally
similar to memory. There are two operations, read and write. Further, an I/O module may
control more than one external device. We can refer to each of the interfaces to an external
device as a port and give each a unique address (e.g., 0, 1, …, M – 1). In addition, there are
external data paths for the input and output of data with an external device. Finally, an I/O
module may be able to send interrupt signals to the processor.
• Processor: The processor reads in instructions and data, writes out data after processing,
and uses control signals to control the overall operation of the system. It also receives
interrupt signals.
The preceding list defines the data to be exchanged. The interconnection structure must
support the following types of transfers:
• Memory to processor: The processor reads an instruction or a unit of data from memory.
• Processor to memory: The processor writes a unit of data to memory.
• I/O to processor: The processor reads data from an I/O device via an I/O module.
• Processor to I/O: The processor sends data to the I/O device.
• I/O to or from memory: For these two cases, an I/O module is allowed to exchange data
directly with memory, without going through the processor, using direct memory access
(DMA).
A number of interconnection structures have been tried; by far the most common is
the bus and various multiple-bus structures.

Lecture 39
Inter-processor Arbitration
Computer systems contain a number of buses at various levels to facilitate the transfer of
information between components. The CPU contains a number of internal buses for transferring
information between processor registers and the ALU. A memory bus consists of lines for
transferring data, address, and read/write information. An I/O bus is used to transfer information
to and from input and output devices. A bus that connects major components in a multiprocessor
system, such as CPUs, IOPs, and memory, is called a system bus. The physical circuits of a
system bus are contained in a number of identical printed circuit boards. Each board in the
system belongs to a particular module. The board consists of circuits connected in parallel
through connectors. Each pin of each circuit connector is connected by a wire to the
corresponding pin of all other connectors in other boards. Thus any board can be plugged into a
slot in the backplane that forms the system bus.
The processors in a shared memory multiprocessor system request access to common memory or
other common resources through the system bus. If no other processor is currently utilizing the
bus, the requesting processor may be granted access immediately. However, the requesting
processor must wait if another processor is currently utilizing the system bus. Furthermore, other
processors may request the system bus at the same time. Arbitration must then be performed to
resolve this multiple contention for the shared resources. The arbitration logic would be part of
the system bus controller placed between the local bus and the system bus as shown in the figure
below.
Figure: System Bus Structure for Multiprocessors
System Bus
A typical system bus consists of approximately 100 signal lines. These lines are divided into
three functional groups: data, address, and control. In addition, there are power distribution lines
that supply power to the components. For example, the IEEE standard 796 multibus system has
16 data lines, 24 address lines, 26 control lines, and 20 power lines, for a total of 86 lines.
The data lines provide a path for the transfer of data between processors and common memory.
The number of data lines is usually a multiple of 8, with 16 and 32 being most common. The
address lines are used to identify a memory address or any other source or destination, such as
input or output ports. The number of address lines determines the maximum possible memory
capacity in the system. For example, an address of 24 lines can access up to 2^24 (16 mega) words
of memory. The data and address lines are terminated with three-state buffers. The address
buffers are unidirectional from processor to memory. The data lines are bidirectional, allowing
the transfer of data in either direction.
Data transfers over the system bus may be synchronous or asynchronous. In a synchronous bus,
each data item is transferred during a time slice known in advance to both source and destination
units; synchronization is achieved by driving both units from a common clock source. An
alternative procedure is to have separate clocks of approximately the same frequency in each
unit. Synchronization signals are transmitted periodically in order to keep all clocks in the
system in step with each other. In an asynchronous bus, each data item
being transferred is accompanied by handshaking control signals to indicate when the data are
transferred from the source and received by the destination.
The control lines provide signals for controlling the information transfer between units. Timing
signals indicate the validity of data and address information. Command signals specify
operations to be performed. Typical control lines include transfer signals such as memory read
and write, acknowledge of a transfer, interrupt requests, bus control signals such as bus request
and bus grant, and signals for arbitration procedures.
Serial Arbitration Procedure
Arbitration procedures service all processor requests on the basis of established priorities. A
hardware bus priority resolving technique can be established by means of a serial or parallel
connection of the units requesting control of the system bus. The serial priority resolving
technique is obtained from a daisy-chain connection of bus arbitration circuits similar to the
priority interrupt logic.
The processors connected to the system bus are assigned priority according to their position
along the priority control line. The device closest to the priority line is assigned the highest
priority. When multiple devices concurrently request the use of the bus, the device with the
highest priority is granted access to it.
Figure below shows the daisy-chain connection of four arbiters. It is assumed that each processor
has its own bus arbiter logic with priority-in and priority-out lines. The priority out (PO) of each
arbiter is connected to the priority in (PI) of the next-lower-priority arbiter. The PI of the highest-
priority unit is maintained at a logic value 1. The highest-priority unit in the system will always
receive access to the system bus when it requests it. The PO output for a particular arbiter is
equal to 1 if its PI input is equal to 1 and the processor associated with the arbiter logic is not
requesting control of the bus. This is the way that priority is passed to the next unit in the chain.
If the processor requests control of the bus and the corresponding arbiter finds its PI input equal
to 1, it sets its PO output to 0. Lower-priority arbiters receive a 0 in PI and generate a 0 in PO.
Thus the processor whose arbiter has a PI = 1 and PO = 0 is the one that is given control of the
system bus.
A processor may be in the middle of a bus operation when a higher priority processor requests
the bus. The lower-priority processor must complete its bus operation before it relinquishes
control of the bus. The bus busy line shown in Fig. below provides a mechanism for an orderly
transfer of control. The busy line comes from open-collector circuits in each unit and provides a
wired-OR logic connection. When an arbiter receives control of the bus (because its PI = 1 and
PO = 0) it examines the busy line. If the line is inactive, it means that no other processor is using
the bus. The arbiter activates the busy line and its processor takes control of the bus. However, if
the arbiter finds the busy line active, it means that another processor is currently using the bus.
The arbiter keeps examining the busy line while the lower-priority processor that lost control of
the bus completes its operation. When the bus busy line returns to its inactive state, the higher-
priority arbiter enables the busy line, and its corresponding processor can then conduct the
required bus transfers.
Figure: Serial (Daisy Chain) Arbitration
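The toy C model below mimics the PI/PO rule described above for four arbiters: PO = PI AND (not requesting), and the unit with PI = 1 and PO = 0 gets the bus. The request pattern is an assumption for illustration; real arbiters are combinational hardware, not a loop.

#include <stdio.h>

#define N 4

int main(void) {
    int request[N] = { 0, 1, 0, 1 };   /* assumed: processors 2 and 4 want the bus */
    int pi = 1;                        /* PI of the highest-priority arbiter is tied to 1 */

    for (int i = 0; i < N; i++) {
        int po = pi && !request[i];    /* pass priority on only if not requesting */
        if (pi && !po)
            printf("arbiter %d granted the bus\n", i + 1);
        pi = po;                       /* PO feeds the PI of the next arbiter */
    }
    return 0;
}

With this request pattern the output is "arbiter 2 granted the bus": the requester closest to the head of the chain wins, and arbiter 4 must wait.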

Parallel Arbitration Logic


The parallel bus arbitration technique uses an external priority encoder and decoder as shown in
figure below. Each bus arbiter in the parallel scheme has a bus request output line and a bus
acknowledge input line. Each arbiter enables the request line when its processor is requesting
access to the system bus. The processor takes control of the bus if its acknowledge input line is
enabled. The bus busy line provides an orderly transfer of control, as in the daisy-chaining case.
Figure: Parallel Arbitration
Figure above shows the request lines from four arbiters going into a 4 × 2 priority encoder. The
output of the encoder generates a 2-bit code, which represents the highest-priority unit among
those requesting the bus. The 2-bit code from the encoder output drives a 2 × 4 decoder which
enables the proper acknowledge line to grant bus access to the highest-priority unit.
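The short C sketch below imitates that encoder/decoder pair in software: the requests are encoded into a 2-bit code and the decoder raises exactly one acknowledge line. The request pattern and the convention that the lowest-numbered arbiter has the highest priority are assumptions for illustration.

#include <stdio.h>

int main(void) {
    int req[4] = { 0, 1, 1, 0 };        /* assumed: arbiters 2 and 3 request the bus */
    int code = -1;

    for (int i = 0; i < 4; i++)         /* 4 x 2 priority encoder: lowest index wins */
        if (req[i]) { code = i; break; }

    if (code >= 0) {
        int ack[4] = { 0, 0, 0, 0 };    /* 2 x 4 decoder output */
        ack[code] = 1;
        printf("acknowledge lines: %d %d %d %d (code %d%d)\n",
               ack[0], ack[1], ack[2], ack[3], (code >> 1) & 1, code & 1);
    }
    return 0;
}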
Dynamic Arbitration Algorithms
The two bus arbitration procedures just described use a static priority algorithm since the priority
of each device is fixed by the way it is connected to the bus. In contrast, a dynamic priority
algorithm gives the system the capability for changing the priority of the devices while the
system is in operation. We now discuss a few arbitration procedures that use dynamic priority
algorithms.
The time slice algorithm allocates a fixed-length time slice of bus time that is offered
sequentially to each processor, in round-robin fashion. The service given to each system
component with this scheme is independent of its location along the bus. No preference is given
to any particular device since each is allotted the same amount of time to communicate with the
bus.
In a bus system that uses polling, the bus-grant signal is replaced by a set of lines called poll
lines, which are connected to all units. These lines are used by the bus controller to define an
address for each device connected to the bus. The bus controller sequences through the addresses
in a prescribed manner. When a processor that requires access recognizes its address, it activates
the bus busy-line and then accesses the bus. After a number of bus cycles, the polling process
continues by choosing a different processor. The polling sequence is normally programmable,
and as a result, the selection priority can be altered under program control.
The least recently used (LRU) algorithm gives the highest priority to the requesting device that
has not used the bus for the longest interval. The priorities are adjusted after a number of bus
cycles according to the LRU algorithm. With this procedure, no processor is favored over any
other since the priorities are dynamically changed to give every device an opportunity to access
the bus.
In the first-come, first-serve scheme, requests are served in the order received. To implement this
algorithm, the bus controller establishes a queue arranged according to the time that the bus
requests arrive. Each processor must wait for its turn to use the bus on a first-in, first-out (FIFO)
basis. The rotating daisy-chain procedure is a dynamic extension of the daisy-chain algorithm. In
this scheme there is no central bus controller, and the priority line is connected from the priority-
out of the last device back to the priority-in of the first device in a closed loop. This is similar to
the connections shown in the figure for serial arbitration except that the PO output of arbiter 4 is
connected to the PI input of arbiter 1. Whichever device has access to the bus serves as a bus
controller for the following arbitration. Each arbiter priority for a given bus cycle is determined
by its position along the bus priority line from the arbiter whose processor is currently
controlling the bus. Once an arbiter releases the bus, it has the lowest priority.
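As one concrete illustration of a dynamic priority algorithm, the sketch below picks a winner under the LRU rule described above: among the requesting devices, the one that has gone longest without using the bus is granted access. The device count, request pattern and last-use timestamps are assumptions for illustration.

#include <stdio.h>

#define N 4

int main(void) {
    int last_used[N] = { 5, 2, 9, 0 };  /* bus cycle at which each device last used the bus */
    int request[N]   = { 1, 1, 0, 1 };  /* assumed request pattern */
    int winner = -1;

    for (int i = 0; i < N; i++)
        if (request[i] && (winner < 0 || last_used[i] < last_used[winner]))
            winner = i;                 /* least recently used requester wins */

    if (winner >= 0)
        printf("bus granted to device %d\n", winner + 1);
    return 0;
}

Here device 4 wins because it has not used the bus since cycle 0, even though devices 1 and 2 are also requesting.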

Lecture 40
Inter processor Communication and Synchronization
A process can be of two types:
 Independent process.
 Co-operating process.
An independent process is not affected by the execution of other processes, while a co-operating
process can be affected by other executing processes. Though one might think that processes
running independently will execute very efficiently, in practice there are many
situations when the co-operative nature can be utilised for increasing computational speed,
convenience and modularity. Inter-process communication (IPC) is a mechanism which allows
processes to communicate with each other and synchronize their actions. The communication
between these processes can be seen as a method of co-operation between them. Processes can
communicate with each other in these two ways:
1. Shared Memory
2. Message passing
Figure 1 below shows a basic structure of communication between processes via the shared
memory method and via message passing.
An operating system can implement both methods of communication. First, we will discuss the
shared memory method of communication and then message passing. Communication between
processes using shared memory requires processes to share some variable, and it completely
depends on how the programmer implements it. One way of communication using shared
memory can be imagined like this: suppose process1 and process2 are executing simultaneously
and they share some resources or use some information from the other process. Process1 generates
information about certain computations or resources being used and keeps it as a record in shared
memory. When process2 needs to use the shared information, it checks the record stored in
shared memory, takes note of the information generated by process1 and acts accordingly.
Processes can use shared memory for extracting information as a record from another process as
well as for delivering any specific information to another process.
Let’s discuss an example of communication between processes using the shared memory method.
Now we will start our discussion of communication between processes via message
passing. In this method, processes communicate with each other without using any kind of
shared memory. If two processes p1 and p2 want to communicate with each other, they proceed
as follows:
 Establish a communication link (if a link already exists, there is no need to establish it again).
 Start exchanging messages using basic primitives.
We need at least two primitives:
– send(message, destination) or send(message)
– receive(message, host) or receive(message)

The message size can be fixed or variable. If it is of fixed size, it is easy for the OS
designer but complicated for the programmer, and if it is of variable size, it is easy for the
programmer but complicated for the OS designer. A standard message has two
parts: a header and a body.
The header part is used for storing the message type, destination id, source id, message length and
control information. The control information contains details such as what to do if the receiver
runs out of buffer space, the sequence number and the priority. Generally, messages are sent in
FIFO order.
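A possible shape for such a message is sketched in C below. The field names, sizes and the send/receive prototypes are assumptions made for illustration; they are not a standard layout or a real OS API.

#include <stdint.h>

#define MSG_BODY_MAX 256

struct msg_header {
    uint16_t type;       /* message type                        */
    uint16_t dest_id;    /* destination process id              */
    uint16_t src_id;     /* source process id                   */
    uint16_t length;     /* number of valid bytes in the body   */
    uint32_t seq_no;     /* control info: sequence number       */
    uint8_t  priority;   /* control info: priority              */
};

struct message {
    struct msg_header header;
    uint8_t body[MSG_BODY_MAX];
};

/* The two primitives named above could then take these shapes (prototypes only): */
int send_msg(int destination, const struct message *m);
int receive_msg(int source, struct message *m);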

Message Passing through Communication Link.


Direct and Indirect Communication link
Now we will start our discussion of the methods of implementing a communication link.
While implementing the link, there are some questions which need to be kept in mind:

1. How are links established?
2. Can a link be associated with more than two processes?
3. How many links can there be between every pair of communicating processes?
4. What is the capacity of a link? Is the size of a message that the link can accommodate fixed
or variable?
5. Is a link unidirectional or bi-directional?
A link has some capacity that determines the number of messages that can reside in it
temporarily, so every link has a queue associated with it, which can be of zero
capacity, bounded capacity or unbounded capacity. With zero capacity, the sender waits until
the receiver informs it that the message has been received. In the non-zero-capacity cases, a
process does not know whether a message has been received or not after the send operation; for
this, the sender must communicate with the receiver explicitly. The implementation of the link
depends on the situation; it can be either a direct communication link or an indirect
communication link. Direct communication links are implemented when the processes use a
specific process identifier for the communication, but it is hard to identify the sender ahead of time.
For example: the print server.

Indirect communication is done via a shared mailbox (port), which consists of a queue of
messages. The sender places messages in the mailbox and the receiver picks them up.

Message Passing through Exchanging the Messages.


Synchronous and Asynchronous Message Passing:
A process that is blocked is one that is waiting for some event, such as a resource becoming
available or the completion of an I/O operation. IPC is possible between processes on the same
computer as well as between processes running on different computers, i.e. in a
networked/distributed system. In both cases, the process may or may not be blocked while
sending a message or attempting to receive a message, so message passing may be blocking or
non-blocking. Blocking is considered synchronous: a blocking send means the sender is blocked
until the message is received by the receiver, and similarly a blocking receive has the receiver
block until a message is available. Non-blocking is considered asynchronous: a non-blocking
send has the sender send the message and continue, and a non-blocking receive has the receiver
receive either a valid message or null. After a careful analysis, we can conclude that for a sender
it is more natural to be non-blocking after message passing, as there may be a need to send the
message to different processes, but the sender expects an acknowledgement from the receiver in
case the send fails. Similarly, it is more natural for a receiver to be blocking after issuing the
receive, as the information from the received message may be used for further execution; but at
the same time, if the message send keeps on failing, the receiver will have to wait indefinitely.
That is why we also consider the other possibilities of message passing. There are basically three
most preferred combinations:
 Blocking send and blocking receive
 Non-blocking send and Non-blocking receive
 Non-blocking send and Blocking receive (Mostly used)
In direct message passing, the processes which want to communicate must explicitly name the
recipient or sender of the communication.
e.g. send(p1, message) means send the message to p1.
Similarly, receive(p2, message) means receive the message from p2.
In this method of communication, the communication link gets established automatically, and it
can be either unidirectional or bidirectional, but one link can be used between one pair of
sender and receiver, and a pair of sender and receiver should not possess more than one
link. Symmetry and asymmetry between sending and receiving can also be implemented, i.e.
either both processes name each other for sending and receiving the messages, or only the sender
names the receiver for sending the message and there is no need for the receiver to name the
sender for receiving the message. The problem with this method of communication is that if the
name of one process changes, this method will not work.

In indirect message passing, processes use mailboxes (also referred to as ports) for sending and
receiving messages. Each mailbox has a unique id, and processes can communicate only if they
share a mailbox. A link is established only if the processes share a common mailbox, and a single
link can be associated with many processes. Each pair of processes can share several
communication links, and these links may be unidirectional or bidirectional. Suppose two
processes want to communicate through indirect message passing; the required operations are:
create a mailbox, use this mailbox for sending and receiving messages, and destroy the mailbox.
The standard primitives used are: send(A, message), which means send the message to mailbox
A. The primitive for receiving the message works in the same way, e.g. receive(A, message).
There is a problem with this mailbox implementation. Suppose there are more than two processes
sharing the same mailbox and process p1 sends a message to the mailbox; which
process will be the receiver? This can be solved by either forcing only two processes to
share a single mailbox, or enforcing that only one process is allowed to execute the receive at a
given time, or selecting any process randomly and notifying the sender about the receiver. A
mailbox can be made private to a single sender/receiver pair and can also be shared between
multiple sender/receiver pairs. A port is an implementation of such a mailbox which can have
multiple senders and a single receiver. It is used in client/server applications (here the server is
the receiver). The port is owned by the receiving process and created by the OS on the request of
the receiver process, and it can be destroyed either on request of the same receiver process or
when the receiver terminates. Enforcing that only one process is allowed to execute the receive
can be done using the concept of mutual exclusion: a mutex mailbox is created which is shared
by n processes; the sender is non-blocking and sends the message; the first process which
executes the receive enters the critical section, and all other processes block and wait.
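A simplified, single-process C sketch of the mailbox idea is given below: senders deposit messages into a bounded mailbox queue and a receiver takes them out in FIFO order. The struct layout, the mailbox size and the send_to/receive_from names are assumptions for illustration; a real multi-process implementation would place the mailbox in shared memory and add a mutex so that only one receiver executes the receive at a time.

#include <stdio.h>
#include <string.h>

#define BOX_SIZE 8

struct mailbox {
    char msgs[BOX_SIZE][64];
    int  head, tail, count;
};

static int send_to(struct mailbox *a, const char *text) {
    if (a->count == BOX_SIZE) return -1;              /* mailbox full */
    strncpy(a->msgs[a->tail], text, sizeof a->msgs[0] - 1);
    a->msgs[a->tail][sizeof a->msgs[0] - 1] = '\0';
    a->tail = (a->tail + 1) % BOX_SIZE;
    a->count++;
    return 0;
}

static int receive_from(struct mailbox *a, char *out, size_t n) {
    if (a->count == 0) return -1;                     /* nothing to receive */
    strncpy(out, a->msgs[a->head], n - 1);
    out[n - 1] = '\0';
    a->head = (a->head + 1) % BOX_SIZE;
    a->count--;
    return 0;
}

int main(void) {
    struct mailbox A = { .head = 0, .tail = 0, .count = 0 };
    char buf[64];
    send_to(&A, "hello via mailbox A");               /* send(A, message)    */
    if (receive_from(&A, buf, sizeof buf) == 0)       /* receive(A, message) */
        printf("received: %s\n", buf);
    return 0;
}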

i) Shared Memory Method


Ex: Producer-Consumer problem
There are two processes: Producer and Consumer. The Producer produces some item and the
Consumer consumes that item. The two processes share a common space or memory location
known as a buffer, where the item produced by the Producer is stored and from where the
Consumer consumes the item if needed. There are two versions of this problem: the first one is
known as the unbounded buffer problem, in which the Producer can keep on producing items and
there is no limit on the size of the buffer; the second one is known as the bounded buffer problem,
in which the Producer can produce up to a certain number of items and after that it starts waiting
for the Consumer to consume them. We will discuss the bounded buffer problem. First, the
Producer and the Consumer will share some common memory, then the Producer will start
producing items. If the total number of produced items is equal to the size of the buffer, the
Producer will wait for them to be consumed by the Consumer. Similarly, the Consumer first
checks for the availability of an item, and if no item is available, the Consumer will wait for the
Producer to produce one. If there are items available, the Consumer will consume them. A sketch
of the pseudo code is given below:
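Since the original pseudo code is not reproduced here, the following is a minimal C sketch of the bounded buffer described above, with a plain array standing in for the shared memory segment. The buffer size, variable names and the busy-wait loops are illustrative assumptions; in a real system the buffer would live in an OS shared-memory region and the waiting would use proper synchronization primitives.

#include <stdio.h>

#define BUF_SIZE 10

static int shared_buf[BUF_SIZE];
static int in  = 0;    /* next slot the producer fills   */
static int out = 0;    /* next slot the consumer empties */

static void producer(int item) {
    while ((in + 1) % BUF_SIZE == out)
        ;                                   /* buffer full: wait for the consumer */
    shared_buf[in] = item;
    in = (in + 1) % BUF_SIZE;
}

static int consumer(void) {
    while (in == out)
        ;                                   /* buffer empty: wait for the producer */
    int item = shared_buf[out];
    out = (out + 1) % BUF_SIZE;
    return item;
}

int main(void) {
    for (int i = 1; i <= 5; i++) producer(i * 10);
    for (int i = 0; i < 5; i++)  printf("consumed %d\n", consumer());
    return 0;
}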

Lecture 41
Cache Coherence, Shared Memory Multiprocessors
The Cache Coherence Problem
In a multiprocessor system, data inconsistency may occur among adjacent levels or within the
same level of the memory hierarchy. For example, the cache and the main memory may have
inconsistent copies of the same object.
As multiple processors operate in parallel and independently, multiple caches may possess
different copies of the same memory block; this creates the cache coherence problem. Cache
coherence schemes help to avoid this problem by maintaining a uniform state for each cached
block of data.

Let X be an element of shared data which has been referenced by two processors, P1 and P2. In
the beginning, the three copies of X are consistent. If processor P1 writes new data X1 into its
cache, then under a write-through policy the same copy is written immediately into the shared
memory; in this case, inconsistency occurs between P2's cached copy and the updated memory.
When a write-back policy is used, the main memory will be updated only when the modified data
in the cache is replaced or invalidated.
In general, there are three sources of inconsistency problem −
 Sharing of writable data
 Process migration
 I/O activity

Snoopy Bus Protocols
Snoopy protocols achieve data consistency between the cache memory and the shared memory
through a bus-based memory system. Write-invalidate and write-update policies are used for
maintaining cache consistency.

In this case, we have three processors P1, P2, and P3 having a consistent copy of data element
‘X’ in their local cache memory and in the shared memory (Figure-a). Processor P1 writes X1
in its cache memory using write-invalidate protocol. So, all other copies are invalidated via the
bus. It is denoted by ‘I’ (Figure-b). Invalidated blocks are also known as dirty, i.e. they should
not be used. The write-update protocol updates all the cache copies via the bus. By using write
back cache, the memory copy is also updated (Figure-c).

Cache Events and Actions
Following events and actions occur on the execution of memory-access and invalidation
commands −
 Read-miss − When a processor wants to read a block and it is not in the cache, a read-
miss occurs. This initiates a bus-read operation. If no dirty copy exists, then the main
memory that has a consistent copy, supplies a copy to the requesting cache memory. If a
dirty copy exists in a remote cache memory, that cache will restrain the main memory
and send a copy to the requesting cache memory. In both the cases, the cache copy will
enter the valid state after a read miss.
 Write-hit − If the copy is in dirty or reserved state, write is done locally and the new state
is dirty. If the new state is valid, write-invalidate command is broadcasted to all the
caches, invalidating their copies. When the shared memory is written through, the
resulting state is reserved after this first write.

 Write-miss − If a processor fails to write in the local cache memory, the copy must come
either from the main memory or from a remote cache memory with a dirty block. This is
done by sending a read-invalidate command, which will invalidate all cache copies.
Then the local copy is updated with dirty state.
 Read-hit − Read-hit is always performed in local cache memory without causing a
transition of state or using the snoopy bus for invalidation.
 Block replacement − When a copy is dirty, it is to be written back to the main memory
by block replacement method. However, when the copy is either in valid or reserved or
invalid state, no replacement will take place.
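The toy C model below illustrates the write-invalidate behaviour sketched in the events above for one block cached by several processors. The states are simplified to INVALID/VALID/DIRTY and the transitions are only a rough outline; a real protocol such as MESI has more states and bus transactions.

#include <stdio.h>

#define NPROC 3
enum state { INVALID, VALID, DIRTY };

static enum state cache[NPROC];

static void read_block(int p) {
    if (cache[p] == INVALID)            /* read-miss: fetch a consistent copy       */
        cache[p] = VALID;               /* (from memory or from a dirty cache)      */
}

static void write_block(int p) {
    for (int q = 0; q < NPROC; q++)     /* broadcast write-invalidate on the bus    */
        if (q != p) cache[q] = INVALID;
    cache[p] = DIRTY;                   /* local copy is now the only valid one     */
}

int main(void) {
    read_block(0); read_block(1); read_block(2);   /* all three caches hold X       */
    write_block(0);                                /* P1 writes: others invalidated */
    for (int p = 0; p < NPROC; p++)
        printf("P%d cache state: %d\n", p + 1, cache[p]);  /* 2 = DIRTY, 0 = INVALID */
    return 0;
}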
Directory-Based Protocols
By using a multistage network for building a large multiprocessor with hundreds of processors,
the snoopy cache protocols need to be modified to suit the network capabilities. Broadcasting
being very expensive to perform in a multistage network, the consistency commands are sent
only to those caches that keep a copy of the block. This is the reason for the development of
directory-based protocols for network-connected multiprocessors.
In a directory-based protocol system, data to be shared are placed in a common directory that
maintains the coherence among the caches. Here, the directory acts as a filter where the
processors ask permission to load an entry from the primary memory to their cache memory. If an
entry is changed, the directory either updates it or invalidates the other caches holding that entry.
Hardware Synchronization Mechanisms
Synchronization is a special form of communication in which, instead of data, control
information is exchanged between communicating processes residing in the same or different
processors. Multiprocessor systems use hardware mechanisms to implement low-level
synchronization operations. Most multiprocessors have hardware mechanisms to impose atomic
operations such as memory read, write or read-modify-write operations to implement some
synchronization primitives. Other than atomic memory operations, some inter-processor
interrupts are also used for synchronization purposes.
Cache Coherency in Shared Memory Machines
Maintaining cache coherency is a problem in a multiprocessor system when the processors
contain local cache memory. Data inconsistency between different caches easily occurs in such a
system.

The major concern areas are −
 Sharing of writable data
 Process migration
 I/O activity
Sharing of writable data
When two processors (P1 and P2) have the same data element (X) in their local caches and one
process (P1) writes to the data element (X), then, since the caches are write-through, the local
cache of P1 and the main memory are updated. Now when P2 tries to read data element (X), it
does not find the new value because the data element in the cache of P2 has become outdated.

Process migration
In the first stage, cache of P1 has data element X, whereas P2 does not have anything. A process
on P2 first writes on X and then migrates to P1. Now, the process starts reading data element X,
but as the processor P1 has outdated data the process cannot read it. So, a process on P1 writes
to the data element X and then migrates to P2. After migration, a process on P2 starts reading
the data element X but it finds an outdated version of X in the main memory.

I/O activity
As illustrated in the figure, an I/O device is added to the bus in a two-processor multiprocessor
architecture. In the beginning, both the caches contain the data element X. When the I/O device
receives a new element X, it stores the new element directly in the main memory. Now, when
either P1 or P2 (assume P1) tries to read element X it gets an outdated copy. So, P1 writes to
element X. Now, if I/O device tries to transmit X it gets an outdated copy.

Uniform Memory Access (UMA)


Uniform Memory Access (UMA) architecture means the shared memory is the same for all
processors in the system. Popular classes of UMA machines, which are commonly used for
(file-) servers, are the so-called Symmetric Multiprocessors (SMPs). In an SMP, all system
resources like memory, disks, other I/O devices, etc. are accessible by the processors in a
uniform manner.
Non-Uniform Memory Access (NUMA)
In NUMA architecture, there are multiple SMP clusters, each having an internal indirect/shared
network, which are connected by a scalable message-passing network. So, NUMA architecture is
a logically shared, physically distributed memory architecture.
In a NUMA machine, the cache-controller of a processor determines whether a memory
reference is local to the SMP's memory or remote. To reduce the number of remote
memory accesses, NUMA architectures usually apply caching processors that can cache the
remote data. But when caches are involved, cache coherency needs to be maintained. So these
systems are also known as CC-NUMA (Cache Coherent NUMA).
Cache Only Memory Architecture (COMA)
COMA machines are similar to NUMA machines, with the only difference that the main
memories of COMA machines act as direct-mapped or set-associative caches. The data blocks
are hashed to a location in the DRAM cache according to their addresses. Data that is fetched
remotely is actually stored in the local main memory. Moreover, data blocks do not have a fixed
home location, they can freely move throughout the system.
COMA architectures mostly have a hierarchical message-passing network. A switch in such a
tree contains a directory with data elements as its sub-tree. Since data has no home location, it
must be explicitly searched for. This means that a remote access requires a traversal along the
switches in the tree to search their directories for the required data. So, if a switch in the
network receives multiple requests from its subtree for the same data, it combines them into a
single request which is sent to the parent of the switch. When the requested data returns, the
switch sends multiple copies of it down its subtree.
A shared-memory multiprocessor is a computer system composed of multiple independent
processors that execute different instruction streams. Using Flynn's classification [1], an SMP
is a multiple-instruction multiple-data (MIMD) architecture. The processors share a common
memory address space and communicate with each other via memory. A typical shared-memory
multiprocessor (Fig. 1) includes some number of processors with local caches, all
interconnected with each other and with common memory via an interconnection (e.g., a bus).
