7.1 Instruction Pipelining
Before you go through this article, make sure that you have gone through the previous article on
Instruction Pipelining.
In instruction pipelining,
● A form of parallelism called instruction level parallelism is implemented.
● Multiple instructions execute simultaneously.
● The efficiency of pipelined execution is more than that of non-pipelined execution.
Performance of Pipelined Execution-
The following parameters serve as criteria to estimate the performance of pipelined execution-
● Speed Up
● Efficiency
● Throughput
1. Speed Up-
It gives an idea of “how much faster” the pipelined execution is as compared to non-pipelined
execution.
It is calculated as-
Speed up = Non-pipelined execution time / Pipelined execution time
2. Efficiency-
The efficiency of pipelined execution is calculated as-
Efficiency = Speed up / Number of stages in pipelined architecture
3. Throughput-
Throughput is defined as the number of instructions executed per unit time.
It is calculated as-
Throughput = Number of instructions executed / Total time taken
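As a rough illustration, the three parameters can be computed from execution times. This is a minimal sketch: the function names and the sample figures (100 instructions, 4 stages, times in nanoseconds) are assumptions chosen for illustration, and efficiency is taken here as speed up divided by the number of stages.

```python
def speed_up(non_pipelined_time, pipelined_time):
    """How many times faster the pipelined execution is."""
    return non_pipelined_time / pipelined_time


def efficiency(speed_up_value, num_stages):
    """Fraction of the ideal speed up (the number of stages) actually achieved."""
    return speed_up_value / num_stages


def throughput(num_instructions, total_time):
    """Instructions completed per unit time."""
    return num_instructions / total_time


# Hypothetical figures: 100 instructions on a 4-stage pipeline, times in ns.
s = speed_up(non_pipelined_time=400.0, pipelined_time=103.0)
print(f"Speed up   = {s:.2f}")
print(f"Efficiency = {efficiency(s, num_stages=4):.1%}")
print(f"Throughput = {throughput(100, 103.0):.3f} instructions per ns")
```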
Calculation of Important Parameters-
Let us learn how to calculate certain important parameters of pipelined architecture.
Consider-
● A pipelined architecture consisting of a k-stage pipeline
● Total number of instructions to be executed = n
Point-01: Calculating Cycle Time-
In pipelined architecture,
● There is a global clock that synchronizes the working of all the stages.
● Frequency of the clock is set such that all the stages are synchronized.
● At the beginning of each clock cycle, each stage reads the data from its register and
processes it.
● Cycle time is the value of one clock cycle.
There are two cases possible-
Case-01: All the stages offer the same delay-
If all the stages offer the same delay, then-
Cycle time = Delay offered by one stage including the delay due to its register
Case-02: All the stages do not offer the same delay-
If the stages do not all offer the same delay, then-
Cycle time = Maximum delay offered by any stage including the delay due to its register
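Both cases reduce to one rule: the slowest stage sets the clock. A minimal sketch follows, assuming the register delay is the same for every stage and is added on top of the stage delay; the function name and the sample delays are hypothetical.

```python
def cycle_time(stage_delays, register_delay=0.0):
    """Cycle time = slowest stage delay + delay of its inter-stage register.

    Assumes every stage has the same register delay (an illustrative
    simplification).
    """
    if not stage_delays:
        raise ValueError("at least one stage delay is required")
    return max(stage_delays) + register_delay


# Case-01: equal stage delays (hypothetical values, in ns).
print(cycle_time([50, 50, 50, 50], register_delay=5))
# Case-02: unequal stage delays; the 90 ns stage dominates.
print(cycle_time([60, 50, 90, 80], register_delay=10))
```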
Point-02: Calculating Frequency of Clock-
Frequency of the clock (f) = 1 / Cycle time
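The only pitfall in this step is the unit conversion. The sketch below assumes the cycle time is given in nanoseconds and returns the frequency in hertz; the function name and the 100 ns figure are illustrative.

```python
def clock_frequency_hz(cycle_time_ns):
    """Frequency = 1 / cycle time, with the cycle time given in nanoseconds."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    # 1 second = 1e9 ns, so f [Hz] = 1e9 / cycle time [ns].
    return 1e9 / cycle_time_ns


f = clock_frequency_hz(100)  # hypothetical 100 ns cycle time
print(f"{f / 1e6:.1f} MHz")
```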
Point-03: Calculating Non-Pipelined Execution Time-
In non-pipelined architecture,
● The instructions execute one after the other.
● The execution of a new instruction begins only after the previous instruction has executed
completely.
● So, number of clock cycles taken by each instruction = k clock cycles
Thus,
Non-pipelined execution time
= Total number of instructions x Time taken to execute one instruction
= n x k clock cycles
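The n x k result can be checked by listing when each instruction starts and finishes. This is a small sketch with a hypothetical helper name; cycles are numbered from 1, and each instruction is assumed to occupy exactly k consecutive cycles.

```python
def non_pipelined_schedule(n, k):
    """Return (start_cycle, end_cycle) for each of n instructions.

    Each instruction needs k cycles and starts only after the previous
    one has finished. Cycles are numbered from 1.
    """
    schedule = []
    for i in range(n):
        start = i * k + 1
        end = (i + 1) * k
        schedule.append((start, end))
    return schedule


schedule = non_pipelined_schedule(n=3, k=4)
print(schedule)
print("Total cycles:", schedule[-1][1])  # end cycle of the last instruction
```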
Point-04: Calculating Pipelined Execution Time-
In pipelined architecture,
● Multiple instructions execute in parallel.
● Number of clock cycles taken by the first instruction = k clock cycles
● After the first instruction has completely executed, one instruction comes out per clock cycle.
● So, number of clock cycles taken by each remaining instruction = 1 clock cycle
Thus,
Pipelined execution time
= Time taken to execute first instruction + Time taken to execute remaining instructions
= 1 x k clock cycles + (n-1) x 1 clock cycle
= (k + n – 1) clock cycles
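The (k + n – 1) count is easiest to see in a space-time diagram. The sketch below builds one under ideal conditions (no stalls, every stage takes one cycle); the function name and the "I1", "I2" labels are illustrative choices.

```python
def pipeline_diagram(n, k):
    """Build a space-time diagram: one row per stage, one column per cycle.

    Instruction i (0-based) occupies stage s (0-based) during cycle i + s,
    so the last instruction leaves the last stage in cycle (n - 1) + (k - 1),
    which gives k + n - 1 cycles in total.
    """
    total_cycles = k + n - 1
    rows = []
    for stage in range(k):
        row = []
        for cycle in range(total_cycles):
            instr = cycle - stage
            row.append(f"I{instr + 1}" if 0 <= instr < n else "--")
        rows.append(row)
    return rows


for stage, row in enumerate(pipeline_diagram(n=3, k=4), start=1):
    print(f"Stage {stage}: " + " ".join(row))
```

Each row shows the same instructions shifted one cycle to the right, and the last instruction occupies the final stage in the final column.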
Point-05: Calculating Speed Up-
Speed up
= Non-pipelined execution time / Pipelined execution time
= n x k clock cycles / (k + n – 1) clock cycles
= n x k / (k + n – 1)
= n x k / { n + (k – 1) }
= k / { 1 + (k – 1)/n }
● For a very large number of instructions, n→∞. Thus, speed up → k.
● Practically, the total number of instructions never tends to infinity.
● Therefore, speed up is always less than the number of stages in the pipeline.
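To see how quickly the speed up approaches k, the formula can be evaluated for growing n. This is a minimal sketch using the cycle-count formula derived above; the function name and the choice of k = 4 are illustrative.

```python
def ideal_speed_up(n, k):
    """Speed up = n * k / (k + n - 1), measured in clock cycles."""
    return (n * k) / (k + n - 1)


k = 4  # hypothetical number of stages
for n in (1, 10, 100, 1000, 100000):
    print(f"n = {n:>6}: speed up = {ideal_speed_up(n, k):.4f}")
```

The values rise towards 4 but stay below it for every finite n.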
Important Notes-
Note-01:
● The aim of pipelined architecture is to execute one complete instruction in one clock
cycle.
● In other words, the aim of pipelining is to maintain CPI ≅ 1.
● Practically, CPI stays above 1: the first instruction needs k cycles to fill the pipeline,
and stalls add further cycles. Register delays additionally lengthen each clock cycle.
● Ideally, a pipelined architecture executes one complete instruction per clock cycle
(CPI=1).
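The gap between the ideal and the practical CPI can be estimated from the cycle count. In the sketch below, `stall_cycles` is an assumed extra parameter (not part of the formulas above) that stands for any cycles lost to stalls; the function name and sample numbers are also illustrative.

```python
def cycles_per_instruction(n, k, stall_cycles=0):
    """Average CPI = total cycles / number of instructions.

    Total cycles = (k + n - 1) for the ideal pipeline, plus any stall
    cycles (an assumed, illustrative parameter).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return (k + n - 1 + stall_cycles) / n


print(cycles_per_instruction(n=100, k=5))                   # fill cost only
print(cycles_per_instruction(n=100, k=5, stall_cycles=20))  # with stalls
print(cycles_per_instruction(n=1_000_000, k=5))             # close to 1
```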
Note-02:
● The maximum speed up that can be achieved is equal to the number of stages.
● This is achieved when efficiency becomes 100%.
● Practically, efficiency is always less than 100%.
● Therefore, speed up is always less than the number of stages in pipelined architecture.
Note-03:
Under ideal conditions,
● One complete instruction is executed per clock cycle i.e. CPI = 1.
● Speed up = Number of stages in pipelined architecture
Note-04:
● Experiments show that a 5-stage pipelined processor gives the best performance.
Note-05:
In case only one instruction has to be executed, then-
● Non-pipelined execution gives better performance than pipelined execution.
● This is because delays are introduced due to registers in pipelined architecture.
● Thus, time taken to execute one instruction in non-pipelined architecture is less.
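A small numeric comparison makes this note concrete. The sketch assumes that the non-pipelined processor has no inter-stage registers and takes the sum of the stage delays, while the pipelined processor spends one full cycle (slowest stage plus register delay) in each stage; the function names and delay values are hypothetical.

```python
def non_pipelined_time_single(stage_delays):
    """One instruction without pipelining: sum of stage delays, no registers."""
    return sum(stage_delays)


def pipelined_time_single(stage_delays, register_delay):
    """One instruction through the pipeline: k cycles of the full cycle time."""
    cycle = max(stage_delays) + register_delay
    return len(stage_delays) * cycle


delays = [60, 50, 90, 80]  # hypothetical stage delays in ns
print("Non-pipelined:", non_pipelined_time_single(delays), "ns")
print("Pipelined:    ", pipelined_time_single(delays, register_delay=10), "ns")
```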
Note-06:
High efficiency of pipelined processor is achieved when-
● All the stages are of equal duration.
● There are no conditional branch instructions.
● There are no interrupts.
● There are no register and memory conflicts.
Performance degrades in the absence of these conditions.
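The effect of the first condition can be sketched numerically. Assuming a very large number of instructions, no register delay, and a non-pipelined time per instruction equal to the sum of the stage delays, the speed up tends to (sum of stage delays) / (slowest stage delay). The function name and delay values below are illustrative.

```python
def asymptotic_speed_up(stage_delays):
    """Speed up for very large n, ignoring register delays (assumption).

    Non-pipelined: each instruction takes sum(stage_delays).
    Pipelined: one instruction completes every max(stage_delays).
    """
    return sum(stage_delays) / max(stage_delays)


balanced = [10, 10, 10, 10]   # equal stages: speed up reaches k = 4
unbalanced = [5, 10, 20, 5]   # same total work, one slow stage
print(asymptotic_speed_up(balanced))
print(asymptotic_speed_up(unbalanced))
```

With the same total delay, the unbalanced pipeline gains only half as much because every stage has to wait for the 20-unit stage.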