Unit 1
• The pipeline is divided into segments and each segment can execute
its operation concurrently with the other segments. When a segment
completes an operation, it passes the result to the next segment in the
pipeline and fetches the next operation from the preceding segment.
The final results of each instruction emerge at the end of the pipeline
in rapid succession.
[Figure: space-time diagram. Vertical axis: stages IF, ID, OF, EX, with output (O/P). Horizontal axis: time, cycles 1 to 13. Instructions shown: I1, I2, I3, ...]
[Figure A: A Pipeline Processor, with stages IF, ID, OF, EX]
[Figure: space-time diagram. Vertical axis: pipeline stages IF, ID, OF, EX, with output (O/P). Horizontal axis: time, cycles 1 to 9. Instructions shown: I1, I2, I3, I4, I5, ...]
Fig.: Space-time Diagram for Pipelined Processor (Pipeline Cycles)
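The space-time diagram above can be regenerated with a short script. This is only a sketch under idealized assumptions that are not stated in the slides: every stage takes exactly one cycle and there are no stalls, so the i-th instruction reaches the s-th stage in cycle i + s - 1. The function name space_time_diagram and its parameters are hypothetical.

```python
def space_time_diagram(stages, num_instructions):
    """Return the text rows of an ideal pipeline space-time diagram.

    Assumption: one cycle per stage and no stalls, so instruction i
    (0-based) occupies stage s (0-based) during cycle i + s + 1.
    """
    k = len(stages)
    total_cycles = k + num_instructions - 1
    rows = []
    # Emit the last stage first so the layout matches the figure (EX on top).
    for s in reversed(range(k)):
        cells = []
        for cycle in range(1, total_cycles + 1):
            i = cycle - 1 - s  # instruction index sitting in stage s this cycle
            cells.append(f"I{i + 1}" if 0 <= i < num_instructions else "--")
        rows.append(f"{stages[s]:>2} | " + " ".join(f"{c:>2}" for c in cells))
    # Time axis along the bottom, as in the figure.
    header = "   | " + " ".join(f"{c:>2}" for c in range(1, total_cycles + 1))
    return rows + [header]


# Four stages and five instructions, as in the figure.
for line in space_time_diagram(["IF", "ID", "OF", "EX"], 5):
    print(line)
```

Cells marked "--" are cycles in which a stage is idle while the pipeline fills or drains.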
Superscalar Processor
A superscalar processor is a CPU that implements a form of parallelism
called instruction-level parallelism within a single processor. It therefore
allows faster CPU throughput (the number of instructions that can be
executed in a unit of time) than would otherwise be possible at a given
clock rate. A superscalar processor executes more than one instruction
during a clock cycle by simultaneously dispatching multiple instructions to
different execution units on the processor. Each execution unit is not a
separate processing unit (called a "core"), as in multi-core processors, but
an execution resource within a single CPU, such as an arithmetic logic
unit, a bit shifter, or a multiplier.
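To make the throughput claim concrete, here is a minimal sketch of an ideal cycle count for a superscalar pipeline. It assumes, beyond what the text states, that all instructions are independent, that an execution unit is always free, and that every stage takes one cycle. The names superscalar_cycles, ipc, and width are hypothetical. Real programs have dependencies and resource conflicts, so these numbers are upper bounds.

```python
import math


def superscalar_cycles(k, n, width):
    """Ideal cycles to run n independent instructions on a k-stage
    pipeline that dispatches up to `width` instructions per cycle."""
    groups = math.ceil(n / width)  # number of dispatch groups
    return k + groups - 1          # first group fills the pipe, then one group per cycle


def ipc(k, n, width):
    """Average instructions completed per cycle under the same assumptions."""
    return n / superscalar_cycles(k, n, width)


# Compare a scalar (width 1) and a two-way superscalar (width 2) pipeline.
for width in (1, 2):
    print(width, superscalar_cycles(4, 1000, width), round(ipc(4, 1000, width), 3))
```

With width 1 the model reduces to an ordinary scalar pipeline, which is why the throughput gain comes at the same clock rate.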
Pipelining and Superscalar Execution
• Pipelining overlaps various stages of instruction
execution to improve performance.
• At a high level of abstraction, an instruction can
be executed while the next one is being decoded
and the one after that is being fetched.
• This is akin to an assembly line for the manufacture
of cars.
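The benefit of this overlap can be estimated with a small calculation. The sketch below assumes k equal one-cycle stages and no stalls, which the slides do not state explicitly. Without overlap, n instructions need n * k cycles. With overlap, the first instruction needs k cycles and each later one finishes one cycle after its predecessor. The function names are hypothetical.

```python
def nonpipelined_cycles(k, n):
    """Each instruction passes through all k stages before the next one starts."""
    return n * k


def pipelined_cycles(k, n):
    """k cycles for the first instruction, then one more cycle per remaining instruction."""
    return k + n - 1


def speedup(k, n):
    """Ratio of non-pipelined to pipelined cycle counts."""
    return nonpipelined_cycles(k, n) / pipelined_cycles(k, n)


# A four-stage pipeline (IF, ID, OF, EX) with a growing instruction count.
for n in (5, 100, 10000):
    print(n, pipelined_cycles(4, n), round(speedup(4, n), 3))
```

As n grows, the speedup approaches the number of stages k, which is the motivation for deeper pipelines.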
• Pipelining, however, has several limitations.
• The speed of a pipeline is ultimately limited by its slowest
stage.
• For this reason, conventional processors rely on very deep
pipelines (20-stage pipelines in state-of-the-art Pentium
processors), which split the work into shorter stages.
• However, in typical program traces, every fifth or sixth instruction
is a conditional jump. This requires very accurate branch
prediction.
• The penalty of a misprediction grows with the depth of the
pipeline, since a larger number of instructions will have to
be flushed.
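The two limitations above can be put into rough numbers. The sketch below is a simplified model, not a description of any particular processor: the clock period is taken as the delay of the slowest stage, and each mispredicted branch is assumed to cost a fixed number of flushed cycles on top of a base of one cycle per instruction. The function names, the stage delays, and the 10% misprediction rate are illustrative assumptions. The branch fraction of 0.2 corresponds to every fifth instruction being a conditional jump.

```python
def cycle_time(stage_delays_ns):
    """Clock period of the pipeline: every stage must fit in one cycle,
    so the slowest stage sets the period."""
    return max(stage_delays_ns)


def effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when each misprediction
    wastes flush_penalty_cycles cycles."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Illustrative stage delays in nanoseconds (assumed values).
print(cycle_time([1.0, 1.5, 1.0, 2.0]))

# Same branch behaviour, shallow versus deep pipeline (penalty assumed
# to be roughly equal to the pipeline depth).
for depth in (5, 20):
    print(depth, round(effective_cpi(0.2, 0.1, depth), 3))
```

Under these assumptions the deeper pipeline loses more cycles per instruction to flushes, which is why its branch predictor has to be more accurate.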
• One simple way of alleviating these bottlenecks is
to use multiple pipelines.
• The total time for the computation is therefore approximately the sum of the time
for load/store operations and the time for the computation itself, i.e., 200 + 16 = 216
µs.