csso-U-5
Parallel Processing:
• Parallel processing is a term used for a large class of techniques that are used
to provide simultaneous data-processing tasks for the purpose of increasing
the computational speed of a computer system.
• It refers to techniques that are used to provide simultaneous data processing.
• The system may have two or more ALUs to be able to execute two or more
instructions at the same time.
• The system may have two or more processors operating concurrently.
• It can be achieved by having multiple functional units that perform
the same or different operations simultaneously.
• Parallel processing is done by distributing the data among multiple
functional units.
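Distributing data among several units can be sketched in Python as below. This is a minimal illustration, assuming a thread pool stands in for the functional units; the helper names `split` and `parallel_sum_of_squares` are hypothetical. In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so the sketch shows how the work is partitioned and recombined, not an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(data: List[int], parts: int) -> List[List[int]]:
    """Divide data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data: List[int], units: int = 4) -> int:
    """Each worker (standing in for a functional unit) handles one chunk."""
    with ThreadPoolExecutor(max_workers=units) as pool:
        partials = list(pool.map(lambda chunk: sum(x * x for x in chunk),
                                 split(data, units)))
    # Combine the partial results produced by the independent workers.
    return sum(partials)


print(parallel_sum_of_squares(list(range(10))))  # 285
```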
Processor with Multiple Functional Units:
The following figure shows one possible way of separating the execution unit into
eight functional units operating in parallel.
Fig: Processor with Multiple functional units
• The operation performed in each functional unit is indicated in each block
of the diagram.
• The adder and integer multiplier perform arithmetic operations on integer
numbers.
• The floating-point operations are separated into three circuits operating in
parallel.
• The logic, shift, and increment operation can be performed concurrently on
different data.
• All units are independent, so one number can be shifted while another
number is being incremented.
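The independence of the units can be modelled with a small scheduler. It is only a sketch: the unit names in `UNITS` and the `dispatch` function are hypothetical, the operations are assumed to have no data dependences, and each unit is assumed to accept one operation per clock cycle.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical unit table: each entry stands for one independent functional unit.
UNITS: Dict[str, Callable[[int, int], int]] = {
    "adder": lambda x, y: x + y,
    "int_multiply": lambda x, y: x * y,
    "logic": lambda x, y: x & y,
    "shift": lambda x, n: x << n,
    "increment": lambda x, _: x + 1,
}


def dispatch(ops: List[Tuple[str, int, int]]) -> List[Dict[str, int]]:
    """Pack independent operations into clock cycles.

    A unit accepts only one operation per cycle, but different units
    work during the same cycle.
    """
    cycles: List[Dict[str, int]] = []
    for unit, x, y in ops:
        result = UNITS[unit](x, y)
        # Place the operation in the earliest cycle where its unit is still free.
        for cycle in cycles:
            if unit not in cycle:
                cycle[unit] = result
                break
        else:
            cycles.append({unit: result})
    return cycles


ops = [("shift", 5, 1), ("increment", 7, 0), ("adder", 2, 3), ("shift", 8, 2)]
print(dispatch(ops))  # [{'shift': 10, 'increment': 8, 'adder': 5}, {'shift': 32}]
```

The two shifts need the same unit and therefore go to different cycles, while the increment and the addition share the first cycle with a shift.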
Architectural Classification:
• Flynn's classification
• Considers the organization of a computer system by the number of instructions
and data items that are manipulated simultaneously.
• Based on the multiplicity of instruction streams and data streams.
• Instruction stream: the sequence of instructions read from memory.
• Data stream: the operations performed on the data in the processor.
• Parallel processing may occur in the instruction stream, in the data stream
or in both.
• Flynn’s classification divides computers into four major groups:
1. SISD (Single Instruction stream, Single Data stream)
2. SIMD (Single Instruction stream, Multiple Data stream)
3. MISD (Multiple Instruction stream, Single Data stream)
4. MIMD (Multiple Instruction stream, Multiple Data stream)
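Flynn's four groups follow directly from whether each stream count is one or more than one. A tiny helper (the name `flynn_class` is hypothetical) makes the mapping explicit:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Map the number of instruction and data streams to Flynn's group."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one instruction stream and one data stream")
    # "S" for a single stream, "M" for multiple streams.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


print(flynn_class(1, 8))  # SIMD
```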
PIPELINING
• A technique of decomposing a sequential process into suboperations, with
each subprocess being executed in a special dedicated segment that operates
concurrently with all other segments.
• A pipeline is a collection of processing segments.
• Each segment performs partial processing dictated by the way the task is
partitioned.
• The result obtained from each segment is transferred to the next segment.
• The final result is obtained when data have passed through all segments.
• Suppose we have to perform a task that can be divided into a sequence of suboperations.
• Each suboperation is to be performed in a segment within a pipeline.
• Each segment has one or two registers and a combinational circuit.
• The registers hold the data, and the combinational circuit performs
the suboperation in that particular segment.
• A clock is applied to all registers after enough time has elapsed to perform
all segment activity.
• The pipeline organization will be demonstrated by means of a simple
example.
• Consider performing the combined multiply and add operations on a stream of
numbers:
Ai * Bi + Ci for i = 1, 2, 3, …, 7
• Each suboperation is to be implemented in a segment within a pipeline:
Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply and input Ci)
Segment 3: R5 ← R3 + R4 (add Ci to product)
• Each segment has one or two registers and a combinational circuit as shown
in Fig.
• The five registers are loaded with new data every clock pulse. The effect of
each clock is shown in Table.
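The register transfers of the three segments can be simulated clock by clock. This sketch uses a hypothetical function `mul_add_pipeline` and sample data chosen only for illustration; `None` marks an empty register. Because all registers load on the same clock pulse, each new value is computed from the contents the registers held before the pulse.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int], Optional[int], Optional[int], Optional[int], Optional[int]]


def mul_add_pipeline(A: List[int], B: List[int], C: List[int]) -> Tuple[List[int], List[Row]]:
    """Clock a three-segment pipeline that computes A[i] * B[i] + C[i]."""
    n = len(A)
    r1 = r2 = r3 = r4 = r5 = None
    results: List[int] = []
    table: List[Row] = []
    clock = 0
    while len(results) < n:
        clock += 1
        i = clock - 1  # index of the pair entering segment 1 on this pulse
        # All registers load on the same pulse, so new values use the old contents.
        new_r1 = A[i] if i < n else None                # Segment 1: R1 <- Ai
        new_r2 = B[i] if i < n else None                #            R2 <- Bi
        new_r3 = r1 * r2 if r1 is not None else None    # Segment 2: R3 <- R1 * R2
        new_r4 = C[i - 1] if 1 <= i <= n else None      #            R4 <- Ci
        new_r5 = r3 + r4 if r3 is not None else None    # Segment 3: R5 <- R3 + R4
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        table.append((clock, r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return results, table


A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]
results, table = mul_add_pipeline(A, B, C)
for row in table:
    print(row)  # (clock, R1, R2, R3, R4, R5)
print(results)  # [8, 13, 16, 17, 16, 13, 8]
```

With seven tasks the last result leaves R5 after 3 + 7 - 1 = 9 clock pulses: the first result needs three pulses, and each later one follows a single pulse behind.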
General Considerations:
• Any operation that can be decomposed into a sequence of suboperations of
about the same complexity can be implemented by a pipeline processor.
• The general structure of a four-segment pipeline is illustrated in Fig. 4-2.
We define a task as the total operation performed going through all the
segments in the pipeline.
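The benefit of a k-segment pipeline can be quantified with the usual textbook timing model, which these notes do not state, so treat it as an assumption: with a clock period tp, the first task takes k periods and each later task completes one period after the previous one, so n tasks take (k + n - 1) * tp, while a non-pipelined unit that needs tn per task takes n * tn. The function names below are hypothetical.

```python
def pipeline_time(k: int, n: int, tp: float) -> float:
    """Time for a k-segment pipeline with clock period tp to finish n tasks."""
    # k periods for the first task, then one more period per remaining task.
    return (k + n - 1) * tp


def speedup(k: int, n: int, tp: float, tn: float) -> float:
    """Ratio of non-pipelined time (n * tn) to pipelined time."""
    return (n * tn) / pipeline_time(k, n, tp)


print(pipeline_time(4, 100, 20))          # 2060
print(round(speedup(4, 100, 20, 80), 2))  # 3.88
```

For example, with k = 4, tp = 20 ns, n = 100 and tn = 80 ns, the pipeline needs 2060 ns against 8000 ns, a speedup of about 3.88. As n grows, the speedup approaches tn / tp.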
Arithmetic Pipeline:
• Pipeline arithmetic units are usually found in very high speed computers.
• They are used to implement floating-point operations, multiplication of
fixed-point numbers, and similar computations encountered in scientific problems.
• Floating–point operations are easily decomposed into suboperations as
demonstrated in Sec. 10-5.
• An example of a pipeline unit for floating-point addition and subtraction is
shown in the following:
• The inputs to the floating-point adder pipeline are two normalized floating-point
binary numbers.
• The two numbers are X = A × 2^a and Y = B × 2^b, where A and B are fractions
that represent the mantissas and a and b are the exponents.
• The floating-point addition and subtraction can be performed in four
segments, as shown in Fig. 9-6.
• The suboperations that are performed in the four segments are:
1. Compare the exponents
2. Align the mantissa
3. Add or subtract the mantissas
4. Normalize the result
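The four suboperations can be traced on a concrete pair of numbers. For readability this sketch uses decimal mantissas (base 10) rather than binary, which is an assumption; passing `base=2` with binary fractions follows the same steps. The function name `fp_add` is hypothetical, and each comment marks the segment that would perform that step in hardware.

```python
from decimal import Decimal
from typing import Tuple


def fp_add(A: Decimal, a: int, B: Decimal, b: int, base: int = 10) -> Tuple[Decimal, int]:
    """Add A * base**a and B * base**b using the four suboperations."""
    # Segment 1: compare the exponents by subtraction.
    diff = a - b
    # Segment 2: align by shifting the mantissa of the smaller number right.
    if diff >= 0:
        B = B / Decimal(base) ** diff
        exp = a
    else:
        A = A / Decimal(base) ** (-diff)
        exp = b
    # Segment 3: add the aligned mantissas (a negative mantissa gives subtraction).
    mant = A + B
    # Segment 4: normalize so that 1/base <= |mant| < 1.
    if mant == 0:
        return Decimal(0), 0
    while abs(mant) >= 1:       # overflow: shift right, increment exponent
        mant /= base
        exp += 1
    while abs(mant) < Decimal(1) / base:  # underflow: shift left, decrement exponent
        mant *= base
        exp -= 1
    return mant, exp


print(fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2))
```

For X = 0.9504 × 10^3 and Y = 0.8200 × 10^2, segment 2 shifts Y to 0.0820 × 10^3, segment 3 produces 1.0324 × 10^3, and segment 4 normalizes the sum to 0.10324 × 10^4.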
Instruction Pipeline:
• Pipeline processing can occur not only in the data stream but in the
instruction stream as well.
• Consider a computer with an instruction fetch unit and an instruction
execution unit designed to provide a two-segment pipeline.
• Computers with complex instructions require other phases in addition to
the fetch and execute phases to process an instruction completely.
• In the most general case, the computer needs to process each instruction
with the following sequence of steps.
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
• There are certain difficulties that will prevent the instruction pipeline from
operating at its maximum rate.
• Different segments may take different times to operate on the incoming
information.
• Some segments are skipped for certain operations.
• Two or more segments may require memory access at the same time,
causing one segment to wait until another is finished with the
memory.
Example: four-segment instruction pipeline:
• Assume that:
• The decoding of the instruction can be combined with the calculation of
the effective address into one segment (DA, segment 2), which follows the
instruction fetch (FI, segment 1).
• The instruction execution and storing of the result can be combined into one
segment (IE, segment 4), which follows the operand fetch (FO, segment 3).
• Fig 9-7 shows how the instruction cycle in the CPU can be processed with a
four-segment pipeline.
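Assuming no branches and no memory conflicts, instruction i enters segment s in clock cycle i + s - 1 (taking s = 1 for FI), which gives the familiar space-time diagram. The sketch below, with hypothetical names `SEGMENTS` and `space_time`, lists which instruction occupies each segment in every cycle.

```python
from typing import Dict, List

SEGMENTS = ["FI", "DA", "FO", "IE"]  # segments 1 to 4 of the example pipeline


def space_time(n_instructions: int) -> List[Dict[str, int]]:
    """For each clock cycle, map segment name -> instruction number (1-based)."""
    k = len(SEGMENTS)
    rows: List[Dict[str, int]] = []
    for cycle in range(1, k + n_instructions):  # k + n - 1 cycles in total
        row: Dict[str, int] = {}
        for s, name in enumerate(SEGMENTS):  # s is 0-based here
            instr = cycle - s
            if 1 <= instr <= n_instructions:
                row[name] = instr
        rows.append(row)
    return rows


for cycle, row in enumerate(space_time(6), start=1):
    print(cycle, row)
```

For six instructions it produces 4 + 6 - 1 = 9 rows, and from cycle 4 through cycle 6 all four segments are busy before the pipeline drains.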
• In the accompanying diagram, the values of vector A and vector B, which
represent the matrices, are multiplied. Here we consider two 4x4 matrices,
A and B.
• While the adder pipeline is performing an addition, the next set of values
is brought into the multiplier pipeline, so that all the operations proceed
simultaneously; this is how the pipeline implements the parallel processing
concept.
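The overlap of multiplication and addition can be sketched with a simplified model: a single-step multiplier and a single-step adder joined by one register, rather than the multi-segment pipelines of the figure. Each element of the 4x4 product is an inner product, and on every step the adder accumulates the previous product while the multiplier forms the next one. The function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[int]]


def overlapped_dot(a: List[int], b: List[int]) -> int:
    """Inner product in which adding one product overlaps forming the next."""
    product: Optional[int] = None  # register between the multiplier and the adder
    total = 0
    for k in range(len(a) + 1):
        # Adder: accumulate the product formed on the previous step.
        if product is not None:
            total += product
        # Multiplier (same step in hardware): form the next product.
        product = a[k] * b[k] if k < len(a) else None
    return total


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Each element of A x B is the inner product of a row of A and a column of B."""
    n = len(A)
    cols = [[B[r][c] for r in range(n)] for c in range(n)]
    return [[overlapped_dot(A[i], cols[j]) for j in range(n)] for i in range(n)]


M = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
I = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
print(matmul(M, I) == M)  # True
```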
Memory Interleaving:
• Pipeline and vector processors often require simultaneous access to memory
from two or more sources.
• An instruction pipeline may require the fetching of an instruction and an
operand at the same time from two different segments.
• An arithmetic pipeline usually requires two or more operands to enter the
pipeline at the same time.
• Instead of using two memory buses for simultaneous access, the
memory can be partitioned into a number of modules connected to
common memory address and data buses.
• A memory module is a memory array together with its own address and
data registers.
• Fig. 9-13 shows a memory unit with four modules.
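These notes do not say how addresses are spread over the modules. A common scheme, assumed here, is low-order interleaving: the low-order address bits select the module, so consecutive addresses fall in different modules and can be accessed at the same time. `locate` is a hypothetical helper.

```python
from typing import Tuple

MODULES = 4  # four modules, matching the memory unit described above


def locate(address: int, modules: int = MODULES) -> Tuple[int, int]:
    """Return (module number, word address inside that module)."""
    # With a power-of-two module count, address % modules is just the low-order bits.
    return address % modules, address // modules


for addr in range(8):
    print(addr, locate(addr))
```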
Array Processors:
• An array processor is a processor that performs computations on large
arrays of data.
• The term is used to refer to two different types of processors.
Attached array processor:
It is an auxiliary processor. It is intended to improve the performance of the
host computer in specific numerical computation tasks.
SIMD array processor:
It has a single-instruction multiple-data organization. It manipulates vector
instructions by means of multiple functional units responding to a common
instruction.
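The SIMD organization can be modelled as one control unit broadcasting each instruction to several processing elements, each working on its own local data. The `ProcessingElement` class and `broadcast` function are hypothetical, and the loop runs sequentially where the hardware would act in a single step.

```python
import operator
from typing import Callable, List

Op = Callable[[int, int], int]


class ProcessingElement:
    """One processing element (PE) with its own local memory of words."""

    def __init__(self, local: List[int]) -> None:
        self.local = local

    def execute(self, op: Op, src1: int, src2: int, dest: int) -> None:
        # Apply the broadcast operation to this PE's own operands.
        self.local[dest] = op(self.local[src1], self.local[src2])


def broadcast(pes: List[ProcessingElement], op: Op, src1: int, src2: int, dest: int) -> None:
    """The control unit issues one instruction; every PE applies it to its own data."""
    for pe in pes:  # hardware PEs would all do this in the same step
        pe.execute(op, src1, src2, dest)


# Vector addition C = A + B: PE k holds [A[k], B[k], C[k]].
A, B = [1, 2, 3, 4], [10, 20, 30, 40]
pes = [ProcessingElement([a, b, 0]) for a, b in zip(A, B)]
broadcast(pes, operator.add, 0, 1, 2)
print([pe.local[2] for pe in pes])  # [11, 22, 33, 44]
```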