
UNIT-6
MULTI-PROCESSORS

Characteristics of Multiprocessor Systems:

A multiprocessor system is an interconnection of two or more CPUs with memory and input-output equipment. Multiprocessor systems are classified as multiple instruction-stream, multiple data-stream (MIMD) systems.

There is a distinction between multiprocessors and multicomputers, even though both support concurrent operations. In a multicomputer, several autonomous computers are connected through a network and may or may not communicate with one another. In a multiprocessor system, a single operating system controls the interaction between the processors, and all the components of the system cooperate in the solution of a problem.

VLSI circuit technology has reduced the cost of computers to such a low level that the concept of applying multiple processors to meet system performance requirements has become an attractive design possibility.

Benefits of Multiprocessing:

1. Multiprocessing increases the reliability of the system, so that a failure or error in one part has a limited effect on the rest of the system. If a fault causes one processor to fail, a second processor can be assigned to perform the functions of the disabled one.
2. Improved system performance. The system derives high performance from the fact that computations can proceed in parallel in one of two ways:
   a) Multiple independent jobs can be made to operate in parallel.
   b) A single job can be partitioned into multiple parallel tasks. This can be achieved in two ways:
      - The user explicitly declares that the tasks of the program be executed in parallel.
      - The compiler provided with the multiprocessor software automatically detects parallelism in a program by checking for data dependency.

Tightly Coupled System/Shared Memory:

Multiprocessors are classified by the way their memory is organized. A multiprocessor system with common shared memory is classified as a shared-memory or tightly coupled multiprocessor. Tasks and/or processors communicate in a highly synchronized fashion. This does not preclude each processor from having its own local memory; in fact, most commercial tightly coupled multiprocessors provide a cache memory with each CPU. In addition, there is a global common memory that all CPUs can access. Information can therefore be shared among the CPUs by placing it in the common global memory.
Loosely Coupled System/Distributed Memory:

- In the distributed-memory or loosely coupled system, each processing element has its own private local memory. Tasks or processors do not communicate in a synchronized fashion.
- The processors are tied together by a switching scheme designed to route information from one processor to another through a message-passing scheme. The processors relay programs and data to other processors in packets. A packet consists of an address, the data content, and some error detection code.
- The packets are addressed to a specific processor or taken by the first available processor, depending on the communication system used. Loosely coupled systems are most efficient when the interaction between tasks is minimal, whereas tightly coupled systems can tolerate a higher degree of interaction between tasks.
- The overhead for data exchange is high.
- It is a distributed memory system.

INTERCONNECTION STRUCTURES:

The interconnection between the components of a multiprocessor system can have different physical configurations, depending on the number of transfer paths that are available between the processors and memory in a shared memory system, and among the processing elements in a loosely coupled system.

Some of the schemes are:
- Time-Shared Common Bus
- Multiport Memory
- Crossbar Switch
- Multistage Switching Network
- Hypercube System

Time-Shared Common Bus:

A common-bus multiprocessor system consists of a number of processors connected through a common path to a memory unit. Only one processor can communicate with the memory or with another processor at any given time.
- When one processor is communicating with the memory, all other processors are either busy with internal operations or must be idle waiting for the bus.
- As a consequence, the total overall transfer rate within the system is limited by the speed of the single path.

Transfer operations are conducted by the processor that is in control of the bus at the time. Any other processor wishing to initiate a transfer must first determine the availability status of the bus, and only after the bus becomes available can the processor address the destination unit to initiate the transfer. A command is issued to inform the destination unit what operation is to be performed. The receiving unit recognizes its address on the bus and responds to the control signals from the sender, after which the transfer is initiated. The system may exhibit transfer conflicts, since one common bus is shared by all processors. These conflicts must be resolved by incorporating a bus controller that establishes priorities among the requesting units.

Because a single common-bus system is restricted to one transfer at a time, the processors in the system can be kept busy more often through the implementation of two or more independent buses, which permit multiple simultaneous bus transfers. However, this increases the system cost and complexity.

Figure: Time-shared common bus organization.

A more economical implementation of a dual bus structure is depicted in the figure below. Here we have a number of local buses, each connected to its own local memory and to one or more processors. Each local bus may be connected to a CPU, an IOP, or any combination of processors. A system bus controller links each local bus to a common system bus. The I/O devices connected to the local IOP, as well as the local memory, are available to the local processor. The memory connected to the common system bus is shared by all processors. If an IOP is connected directly to the system bus, the I/O devices attached to it may be made available to all processors. Only one processor can communicate with the shared memory and other common resources through the system bus at any given time. The other processors are kept busy communicating with their local memory and I/O devices.
- Part of the local memory may be designed as a cache memory attached to the CPU. In this way, the average access time of the local memory can be made to approach the cycle time of the CPU to which it is attached.

Figure: System bus structure for multiprocessors.
Multiport Memory:

A multiport memory system employs separate buses between each memory module and each CPU. Each processor bus is connected to each memory module. A processor bus consists of the address, data, and control lines required to communicate with memory. The memory module is said to have four ports (as given in the figure below), and each port accommodates one of the buses. The module must have internal control logic to determine which port will have access to memory at any given time. Memory access conflicts are resolved by assigning fixed priorities to each memory port. The priority for memory access associated with each processor may be established by the physical port position that its bus occupies in each module.

Figure: Multiport memory organization.

- Thus CPU 1 will have priority over CPU 2, CPU 2 will have priority over CPU 3, and CPU 4 will have the lowest priority.
- The advantage of the multiport memory organization is the high transfer rate that can be achieved because of the multiple paths between processors and memory.
- The disadvantage is that it requires expensive memory control logic and a large number of cables and connectors. As a consequence, this interconnection structure is usually appropriate for systems with a small number of processors.
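This fixed-priority port resolution is easy to sketch in C. The model below is illustrative, not the hardware itself: requests is assumed to be a 4-bit vector in which bit i is set when CPU i+1 wants the module, and the lowest-numbered requester always wins, mirroring the CPU 1 > CPU 2 > CPU 3 > CPU 4 ordering above.

    #include <stdio.h>

    #define NUM_PORTS 4

    /* Return the winning port (0 = CPU 1, highest priority) for one
     * memory cycle, or -1 if no port is requesting. Fixed priority:
     * the lowest-numbered requesting port always wins.             */
    int resolve_port(unsigned requests)
    {
        for (int port = 0; port < NUM_PORTS; port++)
            if (requests & (1u << port))
                return port;
        return -1;                  /* module stays idle this cycle */
    }

    int main(void)
    {
        unsigned requests = 0x0A;   /* CPU 2 and CPU 4 request (bits 1, 3) */
        printf("granted: CPU %d\n", resolve_port(requests) + 1);  /* CPU 2 */
        return 0;
    }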
Crossbar Switch:

- The crossbar switch organization consists of a number of crosspoints that are placed at the intersections between processor buses and memory module paths.
- Each crosspoint is a switch that determines the path from a processor to a memory module.
- Each switch point has control logic to set up the transfer path between a processor and memory. It examines the address placed on the bus to determine whether its particular module is being addressed. It also resolves multiple requests for access to the same memory module on a predetermined priority basis.
- The figure shows the functional design of a crossbar switch connected to one memory module. The circuit consists of multiplexers that select the data, address, and control lines from one CPU for communication with the memory module.

Priority levels are established by the arbitration logic to select one CPU when two or more CPUs attempt to access the same memory. The multiplexers are controlled by the binary code that is generated by a priority encoder within the arbitration logic.

A crossbar switch organization supports simultaneous transfers from all memory modules because there is a separate path associated with each module. However, the hardware required to implement the switch can become quite large and complex.

Figure: Crossbar switch and block diagram of the crossbar switch
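As a rough illustration of the crosspoint logic, the C sketch below models one arbitration cycle under assumed conventions: each CPU posts the number of the module it addresses (or -1 when idle), each module independently grants one requester on a fixed-priority basis, and transfers to distinct modules proceed in the same cycle.

    #include <stdio.h>

    #define CPUS 4
    #define MODULES 4

    /* For each memory module, emulate the crosspoint arbitration:
     * pick one requesting CPU (priority encoder) and "connect" it.
     * request[c] holds the module number CPU c addresses, or -1.   */
    void crossbar_cycle(const int request[CPUS])
    {
        for (int m = 0; m < MODULES; m++) {
            int granted = -1;
            for (int c = 0; c < CPUS; c++) {   /* fixed-priority scan */
                if (request[c] == m) { granted = c; break; }
            }
            if (granted >= 0)
                printf("module %d <- CPU %d\n", m, granted);
        }
    }

    int main(void)
    {
        /* CPU 0 and CPU 2 both address module 1 (CPU 0 wins the
         * conflict); CPU 1 addresses module 3; CPU 3 is idle. The
         * two transfers to distinct modules occur simultaneously. */
        int request[CPUS] = { 1, 3, 1, -1 };
        crossbar_cycle(request);
        return 0;
    }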

Multistage Switching Network:

A multistage switching network controls the communication between a number of sources and destinations:
- in a tightly coupled system, between processing units and memory modules (PU to MM);
- in a loosely coupled system, between the processing elements themselves (PU to PU).

- The basic component of a multistage network is a two-input, two-output interchange switch.
- As shown in the figure, the 2 x 2 switch has two inputs, labeled A and B, and two outputs, labeled 0 and 1. There are control signals (not shown) associated with the switch that establish the interconnection between the input and output terminals. The switch has the capability of connecting input A to either of the outputs. Terminal B of the switch behaves in a similar fashion.
- The switch also has the capability to arbitrate between conflicting requests.
- If inputs A and B both request the same output terminal, only one of them will be connected; the other will be blocked.

Figure: Operation of a 2 x 2 interchange switch.

Using the 2 x 2 switch as a building block, it is possible to build a multistage network to control the communication between a number of sources and destinations. To see how this is done, consider the binary tree shown in the figure below. The two processors P1 and P2 are connected through switches to eight memory modules marked in binary from 000 through 111. The path from a source to a destination is determined from the binary bits of the destination number. The first bit of the destination number determines the switch output in the first level, the second bit specifies the output of the switch in the second level, and the third bit specifies the output of the switch in the third level. For example, to connect P1 to memory 101, it is necessary to form a path from P1 to output 1 in the first-level switch, output 0 in the second-level switch, and output 1 in the third-level switch. Either P1 or P2 can be connected to any one of the eight memories. Certain request patterns, however, cannot be satisfied simultaneously. For example, if P1 is connected to one of the destinations 000 through 011, P2 can be connected to only one of the destinations 100 through 111.

Figure: Binary tree with 2 x 2 switches

Many different topologies have been proposed for multistage switching networks to control processor-memory communication in a tightly coupled multiprocessor system or to control the communication between the processing elements in a loosely coupled system. One such topology is the omega switching network shown in the figure below. In this configuration, there is exactly one path from each source to any particular destination. Some request patterns, however, cannot be connected simultaneously. For example, any two sources cannot be connected simultaneously to destinations 000 and 001.

Figure: 8 x 8 omega switching network
A particular request is initiated in the switching network by the source, which sends a 3-bit pattern representing the destination number. As the binary pattern moves through the network, each level examines a different bit to determine the 2 x 2 switch setting. Level 1 inspects the most significant bit, level 2 inspects the middle bit, and level 3 inspects the least significant bit. When the request arrives on either input of the 2 x 2 switch, it is routed to the upper output if the specified bit is 0, or to the lower output if the bit is 1.

In a tightly coupled multiprocessor system, the source is a processor and the destination is a memory module. The first pass through the network sets up the path. Succeeding passes are used to transfer the address into memory and then transfer the data in either direction, depending on whether the request is a read or a write. In a loosely coupled multiprocessor system, both the source and destination are processing elements. After the path is established, the source processor transfers a message to the destination processor.
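A small C sketch of this destination-tag routing, assuming the 8 x 8 network above (three levels, 3-bit destinations); route simply prints the upper/lower choice each level would make.

    #include <stdio.h>

    #define LEVELS 3   /* an 8 x 8 network has log2(8) = 3 switch levels */

    /* Destination-tag routing: print the switch setting chosen at each
     * level for a 3-bit destination. Level 1 inspects the most
     * significant bit, level 3 the least significant; a 0 bit selects
     * the upper output and a 1 bit selects the lower output.          */
    void route(unsigned dest)
    {
        for (int level = 1; level <= LEVELS; level++) {
            int bit = (dest >> (LEVELS - level)) & 1;
            printf("level %d: bit=%d -> %s output\n",
                   level, bit, bit ? "lower" : "upper");
        }
    }

    int main(void)
    {
        route(5);   /* destination 101: lower, upper, lower */
        return 0;
    }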

Hypercube Interconnection:

The hypercube or binary n-cube multiprocessor structure is a loosely coupled system composed of N = 2^n processors interconnected in an n-dimensional binary cube.
- Each processor forms a node of the cube. Although it is customary to refer to each node as having a processor, in effect it contains not only a CPU but also local memory and an I/O interface.
- Each processor has direct communication paths to n other neighbor processors. These paths correspond to the edges of the cube. There are 2^n distinct n-bit binary addresses that can be assigned to the processors. Each processor address differs from that of each of its n neighbors by exactly one bit position.

Figure: Hypercube structures for n = 1, 2, 3

The figure shows the hypercube structure for n = 1, 2, and 3.
- A one-cube structure has n = 1 and 2^n = 2. It contains two processors interconnected by a single path.
- A two-cube structure has n = 2 and 2^n = 4. It contains four nodes interconnected as a square.
- A three-cube structure has eight nodes interconnected as a cube.
- An n-cube structure has 2^n nodes with a processor residing in each node.
- Each node is assigned a binary address in such a way that the addresses of two neighbors differ in exactly one bit position. For example, the three neighbors of the node with address 100 in a three-cube structure are 000, 110, and 101. Each of these binary numbers differs from address 100 by one bit value.

Routing messages through an n-cube structure may take from one to n links from a source node to a destination node. For example, in a three-cube structure, node 000 can communicate directly with node 001. It must cross at least two links to communicate with 011 (from 000 to 001 to 011, or from 000 to 010 to 011). It is necessary to go through at least three links to communicate from node 000 to node 111.
- A routing procedure can be developed by computing the exclusive-OR of the source node address with the destination node address. The resulting binary value will have 1 bits corresponding to the axes on which the two nodes differ. The message is then sent along any one of these axes (see the sketch below). For example, in a three-cube structure, a message at 010 going to 001 produces an exclusive-OR of the two addresses equal to 011. The message can be sent along the second axis to 000 and then through the third axis to 001.
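The exclusive-OR routing procedure is easy to express in code. The C sketch below always picks the lowest-numbered differing axis, so for the example it takes the equally short path 010 -> 011 -> 001 rather than the text's 010 -> 000 -> 001; either is valid, since the message may be sent along any differing axis.

    #include <stdio.h>

    /* Print the low n bits of v, most significant bit first. */
    static void print_bits(unsigned v, int n)
    {
        for (int i = n - 1; i >= 0; i--)
            putchar((v >> i) & 1 ? '1' : '0');
    }

    /* One routing step in an n-cube: XOR the current and destination
     * addresses; the 1 bits mark the axes on which the nodes differ.
     * Here we pick the lowest-numbered differing axis and cross that
     * edge of the cube by flipping the corresponding address bit.   */
    static unsigned next_hop(unsigned current, unsigned dest)
    {
        unsigned diff = current ^ dest;
        if (diff == 0)
            return current;            /* already at the destination */
        return current ^ (diff & -diff);
    }

    int main(void)
    {
        unsigned node = 2, dest = 1;   /* route 010 -> 001 in a 3-cube */
        while (node != dest) {
            unsigned next = next_hop(node, dest);
            print_bits(node, 3); printf(" -> "); print_bits(next, 3);
            putchar('\n');
            node = next;
        }
        return 0;
    }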

Interprocessor Arbitration:

- Only one of the CPUs, IOPs, and memory can be granted use of the bus at a time.
- An arbitration mechanism is needed to handle multiple requests to the shared resources and resolve the contention.

Single bus system: address bus, data bus, and control bus.
Multiple bus system: memory bus, I/O bus, and system bus.

- System bus:
  A bus that connects the major components such as CPUs, IOPs, and memory. A typical system bus consists of 100 signal lines divided into three functional groups: data, address, and control lines. In addition, there are power distribution lines that supply power to the components.
- Data transfer methods over the system bus:
  - Synchronous bus: transfers are achieved by driving both units from a common clock source.
  - Asynchronous bus: transfers are accompanied by handshaking control signals.
- Bus arbitration algorithms: static or dynamic.

Arbitration procedures service all processor requests on the basis of established priorities. A hardware bus priority resolving technique can be established by means of a serial or parallel connection of the units requesting control of the system bus.

Static (fixed-priority) serial arbitration:

The serial priority resolving technique is obtained from a daisy-chain connection of bus arbitration circuits, similar to the priority interrupt logic. The processors connected to the system bus are assigned priority according to their position along the priority control line. The device closest to the priority line is assigned the highest priority. When multiple devices concurrently request the use of the bus, the device with the highest priority is granted access to it.
Figure: Serial (daisy-chain) arbitration
The figure above shows the daisy-chain connection of four arbiters. It is assumed that each processor has its own bus arbiter logic with priority-in and priority-out lines.
- The priority-out (PO) of each arbiter is connected to the priority-in (PI) of the next-lower-priority arbiter. The PI of the highest-priority unit is maintained at logic 1. The highest-priority unit in the system will therefore always receive access to the system bus when it requests it.
- The PO output for a particular arbiter is equal to 1 if its PI input is equal to 1 and the processor associated with the arbiter logic is not requesting control of the bus. This is the way that priority is passed to the next unit in the chain.
- If the processor requests control of the bus and the corresponding arbiter finds its PI input equal to 1, it sets its PO output to 0. Lower-priority arbiters receive a 0 in PI and generate a 0 in PO. Thus the processor whose arbiter has PI = 1 and PO = 0 is the one that is given control of the system bus.

A processor may be in the middle of a bus operation when a higher-priority processor requests the bus. The lower-priority processor must complete its bus operation before it relinquishes control of the bus. The bus-busy line shown in the figure provides a mechanism for an orderly transfer of control. When an arbiter receives control of the bus (because its PI = 1 and PO = 0), it examines the busy line. If the line is inactive, it means that no other processor is using the bus.
- The arbiter activates the busy line, and its processor takes control of the bus. However, if the arbiter finds the busy line active, it means that another processor is currently using the bus. The arbiter keeps examining the busy line while the lower-priority processor that lost control of the bus completes its operation. When the bus-busy line returns to its inactive state, the higher-priority arbiter enables the busy line, and its corresponding processor can then conduct the required bus transfers.
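The PI/PO priority propagation can be modeled in a few lines of C. This is a behavioral sketch of one arbitration cycle, not the circuit: request[i] is 1 when arbiter i (0 = highest priority) wants the bus, and the function returns the unit whose arbiter ends up with PI = 1 and PO = 0.

    #include <stdio.h>

    #define UNITS 4

    /* Propagate priority along the daisy chain. The PI of arbiter 0
     * is tied to logic 1; a non-requesting arbiter passes PO = PI to
     * the next unit. The first requesting arbiter that sees PI = 1
     * drops its PO to 0 and keeps the grant.                        */
    int daisy_chain_grant(const int request[UNITS])
    {
        int pi = 1;                      /* highest-priority PI = 1  */
        for (int i = 0; i < UNITS; i++) {
            if (pi && request[i])
                return i;                /* this arbiter: PI=1, PO=0 */
            /* otherwise PO = PI, which becomes the next unit's PI   */
        }
        return -1;                       /* no unit requested the bus */
    }

    int main(void)
    {
        int request[UNITS] = { 0, 1, 0, 1 };   /* units 1 and 3 request */
        printf("bus granted to unit %d\n", daisy_chain_grant(request));
        return 0;                              /* unit 1 wins */
    }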

Parallel Arbitration Logic:

The parallel bus arbitration technique uses an external priority encoder and a decoder, as shown in the figure below. Each bus arbiter in the parallel scheme has a bus-request output line and a bus-acknowledge input line. Each arbiter enables the request line when its processor is requesting access to the system bus. The processor takes control of the bus if its acknowledge input line is enabled. The bus-busy line provides an orderly transfer of control, as in the daisy-chaining case.

Figure: Parallel arbitration
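A behavioral C sketch of the parallel scheme: the priority encoder picks the highest-priority active request line and the decoder raises exactly one acknowledge line. The 4-unit size and the request pattern are illustrative assumptions.

    #include <stdio.h>

    /* 4-to-2 priority encoder: return the number of the highest-
     * priority (lowest-numbered) active bus-request line, or -1.   */
    int priority_encode(unsigned req)
    {
        for (int i = 0; i < 4; i++)
            if (req & (1u << i))
                return i;
        return -1;
    }

    /* 2-to-4 decoder: convert the encoded grant back into a single
     * bus-acknowledge line, so exactly one arbiter sees ACK = 1.   */
    unsigned decode_ack(int grant)
    {
        return (grant < 0) ? 0 : (1u << grant);
    }

    int main(void)
    {
        unsigned req = 0x6;             /* arbiters 1 and 2 request  */
        int grant = priority_encode(req);
        printf("grant=%d, ack lines=0x%X\n", grant, decode_ack(grant));
        return 0;                       /* arbiter 1 gets ACK        */
    }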
Dynamic Arbitration Algorithms:

The priorities of the units can be dynamically changed while the system is in operation.

Time slice:
A fixed-length time slice is given sequentially to each processor, in round-robin fashion.

Polling (unit address polling):
The bus controller advances an address to identify the requesting unit. When a processor that requires access recognizes its address, it activates the bus-busy line and then accesses the bus. After a number of bus cycles, the polling continues by choosing a different processor.

LRU:
The least recently used algorithm gives the highest priority to the requesting device that has not used the bus for the longest interval.

FIFO:
In the first-come, first-served scheme, requests are served in the order received. The bus controller maintains a queue data structure for this purpose.

Rotating daisy chain:
- The conventional daisy chain gives the highest priority to the unit nearest the bus controller.
- In the rotating daisy chain, the PO output of the last device is connected to the PI of the first one. The highest priority goes to the unit that is nearest to the unit that has most recently accessed the bus (that unit becomes the bus controller).
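Of these policies, LRU is the easiest to sketch in code. The C model below assumes the bus controller keeps a last_used timestamp per unit (a hypothetical bookkeeping structure) and grants the requesting unit with the oldest timestamp.

    #include <stdio.h>

    #define UNITS 4

    /* LRU bus arbitration: grant the requesting unit that has not
     * used the bus for the longest time. last_used[i] holds the
     * cycle number of unit i's most recent bus access.             */
    int lru_grant(const int request[UNITS], const long last_used[UNITS])
    {
        int winner = -1;
        for (int i = 0; i < UNITS; i++)
            if (request[i] &&
                (winner < 0 || last_used[i] < last_used[winner]))
                winner = i;
        return winner;                  /* -1 if nobody is requesting */
    }

    int main(void)
    {
        int  request[UNITS]   = { 1, 0, 1, 1 };
        long last_used[UNITS] = { 40, 10, 25, 5 };  /* unit 3 is stalest */
        printf("bus granted to unit %d\n", lru_grant(request, last_used));
        return 0;
    }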
Interprocessor Communication & Synchronization:

A communication path can be established through common input-output channels. In a shared memory multiprocessor system, the most common procedure is to set aside a portion of memory that is accessible to all processors.
- The sending processor structures a request, a message, or a procedure and places it in the memory mailbox.
- Status bits residing in common memory are generally used to indicate the condition of the mailbox: whether it has meaningful information, and for which processor it is intended.
- The receiving processor can check the mailbox periodically to determine whether there are valid messages for it.
- The response time of this procedure can be long, since a processor will recognize a request only when it polls for messages.
- A more efficient procedure is for the sending processor to alert the receiving processor directly by means of an interrupt signal. This can be accomplished through a software-initiated interprocessor interrupt, by means of an instruction in the program of one processor which, when executed, produces an external interrupt condition in a second processor. This alerts the interrupted processor to the fact that a new message was inserted by the interrupting processor.
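A mailbox of this kind can be sketched in C. This is a single-address-space model of the idea rather than real multiprocessor code: the full flag plays the role of the status bits in common memory, the receiver polls it, and the names (mailbox, send, poll_mailbox) are illustrative.

    #include <stdatomic.h>
    #include <stdio.h>

    /* A mailbox set aside in common memory: a status flag plus the
     * message body. 'full' stands in for the status bits that tell
     * the receiver whether the mailbox holds meaningful information. */
    struct mailbox {
        atomic_int full;        /* 0 = empty, 1 = message present */
        int        sender;
        char       text[32];
    };

    struct mailbox box;         /* would reside in shared memory */

    void send(int who, const char *msg)
    {
        snprintf(box.text, sizeof box.text, "%s", msg);
        box.sender = who;
        atomic_store(&box.full, 1);   /* publish after the body is written */
    }

    int poll_mailbox(void)
    {
        if (atomic_load(&box.full)) { /* receiving processor polls */
            printf("from CPU %d: %s\n", box.sender, box.text);
            atomic_store(&box.full, 0);
            return 1;
        }
        return 0;                     /* nothing yet; poll again later */
    }

    int main(void)
    {
        send(1, "task ready");
        poll_mailbox();
        return 0;
    }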
A multiprocessor system may have other shared resources. For example, a magnetic disk storage unit connected to an IOP may be available to all CPUs. This provides a facility for sharing system programs stored on the disk. A communication path between two CPUs can also be established through a link between two IOPs associated with two different CPUs. This type of link allows each CPU to treat the other as an I/O device, so that messages can be transferred through the I/O path.

To prevent conflicting use of shared resources by several processors, there must be a provision for assigning resources to processors. This task is given to the operating system. There are three organizations that have been used in the design of operating systems for multiprocessors:
- master-slave configuration,
- separate operating system, and
- distributed operating system.

In a master-slave mode:
One processor, designated the master, always executes the operating system functions. The remaining processors, denoted as slaves, do not perform operating system functions. If a slave processor needs an operating system service, it must request it by interrupting the master and waiting until the current program can be interrupted.

Separate operating system organization:
Each processor can execute the operating system routines it needs. This organization is more suitable for loosely coupled systems, where every processor may have its own copy of the entire operating system.

Distributed operating system organization:
The operating system routines are distributed among the available processors. However, each particular operating system function is assigned to only one processor at a time. This type of organization is also referred to as a floating operating system, since the routines float from one processor to another and the execution of the routines may be assigned to different processors at different times.

Interprocessor Synchronization:

Synchronization is the communication of control information between processors:
- to enforce the correct sequence of processes, and
- to ensure mutually exclusive access to shared writable data.

Hardware Implementation

Mutual exclusion:
- Protects data from being changed simultaneously by two or more processors.

Mutual exclusion with a semaphore:
- One processor excludes, or locks out, access to a shared resource by other processors while it is in a critical section.
- A critical section is a program sequence that, once begun, must complete execution before another processor accesses the same shared resource.

Semaphore:
- A binary variable:
  - 1: a processor is executing a critical section, and the resource is not available to other processors.
  - 0: the resource is available to any requesting processor.
- It is a software-controlled flag stored in a memory location that all processors can access.
Testing and Setting the Semaphore:
- Two or more processors must be prevented from testing or setting the same semaphore at the same time.
- Otherwise, two or more processors may enter the same critical section simultaneously.
- The test and set must therefore be implemented as one indivisible operation:

R ← M[SEM]    /* test semaphore */
M[SEM] ← 1    /* set semaphore */

These two operations are performed while the memory location is locked, so that other processors cannot test and set the semaphore while the current processor is executing these instructions.

If R = 1, another processor is executing the critical section, and the processor that executed this instruction does not access the shared memory.
If R = 0, the shared memory is available for access; the semaphore is now set to 1, and the last instruction of the program must clear the semaphore.
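In C11, atomic_flag_test_and_set provides exactly this indivisible test-and-set, so the scheme above can be sketched portably. The spin loop re-tests until another processor clears the semaphore, matching the R = 1 case.

    #include <stdatomic.h>
    #include <stdio.h>

    /* SEM as an atomic flag in shared memory. atomic_flag_test_and_set
     * performs R <- M[SEM]; M[SEM] <- 1 as one indivisible operation,
     * which is what the locked read-modify-write memory cycle above
     * provides in hardware.                                          */
    atomic_flag sem = ATOMIC_FLAG_INIT;

    void enter_critical_section(void)
    {
        /* If R = 1, another processor is in the critical section:
         * keep testing. If R = 0, the semaphore is now set to 1
         * and this processor may enter.                            */
        while (atomic_flag_test_and_set(&sem))
            ;   /* busy-wait (spin) */
    }

    void leave_critical_section(void)
    {
        atomic_flag_clear(&sem);   /* the last instruction clears SEM */
    }

    int main(void)
    {
        enter_critical_section();
        puts("inside critical section");
        leave_critical_section();
        return 0;
    }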

Cache Coherence:

- The primary advantage of a cache is its ability to reduce the average access time in uniprocessors.
- When the processor finds a word in the cache during a read operation, the main memory is not involved in the transfer.
- If the operation is a write, there are two commonly used procedures to update memory:
  - In the write-through policy, both the cache and main memory are updated with every write operation.
  - In the write-back policy, only the cache is updated, and the location is marked so that it can be copied later into main memory.
- The same information may reside in a number of copies in several caches and in main memory.
- To ensure the ability of the system to execute memory operations correctly, the multiple copies must be kept identical. This requirement imposes the cache coherence problem.
- A memory scheme is coherent if the value returned on a load instruction is always the value given by the latest store instruction with the same address.

Conditions for incoherence:

Cache coherence problems exist in multiprocessors with private caches because of the need to share writable data. Read-only data can safely be replicated without cache coherence enforcement mechanisms.
- To illustrate the problem, consider the three-processor configuration with private caches shown in the figure below. A load on X by the three processors results in consistent copies in the caches and main memory.

Figure: Cache configuration after a load on X

- If one of the processors performs a store to X, the copies of X in the caches become inconsistent.
- Depending on the memory update policy used in the cache, the main memory may also be inconsistent with respect to the caches. This is shown in the figures below.

Figure (a): with write-through cache policy
Figure (b): with write-back cache policy
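A toy C model of the store-to-X example makes the inconsistency visible. The initial value 52 and the stored value 120 are illustrative: with write-through, memory follows P1's cache but the copies in P2 and P3 go stale; with write-back (set write_through to 0), main memory goes stale as well.

    #include <stdio.h>

    /* Toy model: three private caches and main memory each hold a
     * copy of X after the loads; P1 then stores a new value.       */
    int main(void)
    {
        int main_mem = 52;                 /* illustrative values     */
        int cache[3] = { 52, 52, 52 };     /* consistent after loads  */

        cache[0] = 120;                    /* P1 stores to X          */
        int write_through = 1;             /* choose the update policy */
        if (write_through)
            main_mem = cache[0];           /* memory updated; P2 and  */
                                           /* P3 caches remain stale  */

        printf("memory=%d P1=%d P2=%d P3=%d\n",
               main_mem, cache[0], cache[1], cache[2]);
        return 0;   /* either policy leaves the copies inconsistent */
    }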

Prepared By P DASTAGIRI REDDY, RGMCET, Nandyal. Prepared By P DASTAGIRI REDDY, RGMCET, Nandyal.
Another configuration that may cause consistency problems is direct memory access (DMA) activity in conjunction with an IOP connected to the system bus.
- In the case of input, the DMA may modify locations in main memory that also reside in a cache, without updating the cache.
- During DMA output, memory locations may be read before they are updated from the cache when a write-back policy is used.

Solutions to the cache coherence problem

A simple scheme is to disallow private caches for each processor and have a shared cache memory associated with main memory.
- This method violates the principle of closeness of the CPU to the cache and increases the average memory access time.

One scheme that has been used allows only nonshared and read-only data to be stored in caches. Such items are called cacheable. Shared writable data are noncacheable.
- This method restricts the type of data stored in caches and introduces extra software overhead that may degrade performance.

A scheme that allows writable data to exist in at least one cache employs a centralized global table maintained by the compiler.
- The status of memory blocks is stored in the central global table.
- Each block is identified as read-only (RO) or read-and-write (RW).

The cache coherence problem can be solved by means of a combination of software and hardware, or by means of hardware-only schemes.
- Hardware-only solutions are handled by the hardware automatically and have the advantage of higher speed and program transparency.
- In the hardware solution, the cache controller is specially designed to allow it to monitor all bus requests from CPUs and IOPs.
- A cache controller that monitors the bus in this way is referred to as a snoopy cache controller.
- The local snoopy controllers in all other caches check their memory to determine if they have a copy of the word that has been overwritten.
- If a copy exists in a remote cache, that location is marked invalid.
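A behavioral sketch of snoopy invalidation in C, under assumed structures (a hypothetical one-line cache per CPU with a valid bit): on every bus write, each remote controller that holds the same address marks its copy invalid.

    #include <stdio.h>

    #define CPUS 3

    /* One cache line per CPU, tracking validity. On a bus write,
     * every other snoopy controller that holds a copy of the same
     * address marks its line invalid.                              */
    struct line { int valid; unsigned addr; int data; };

    struct line cache[CPUS];

    void bus_write(int writer, unsigned addr, int data)
    {
        cache[writer] = (struct line){ 1, addr, data };
        for (int c = 0; c < CPUS; c++)      /* other controllers snoop */
            if (c != writer && cache[c].valid && cache[c].addr == addr)
                cache[c].valid = 0;         /* remote copy invalidated */
    }

    int main(void)
    {
        for (int c = 0; c < CPUS; c++)      /* all three CPUs load X   */
            cache[c] = (struct line){ 1, 0x100, 52 };  /* illustrative */

        bus_write(0, 0x100, 120);           /* P1 stores to X          */
        for (int c = 0; c < CPUS; c++)
            printf("P%d: valid=%d data=%d\n",
                   c + 1, cache[c].valid, cache[c].data);
        return 0;
    }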

Prepared By P DASTAGIRI REDDY, RGMCET, Nandyal.
