Chapter 8 - Pipelining
Pipelining
Overview
Pipelining
Basic Concepts
Example
Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold.
Washer takes 30 minutes
Dryer takes 40 minutes
Folder takes 20 minutes
[Figure: sequential laundry timeline; loads A, B, C, D each go through the washer (30), dryer (40), and folder (20) one after another]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?
[Figure: pipelined laundry timeline for loads A, B, C, D; stage times 30, 40, 40, 40, 40, 20 minutes]
Pipelined laundry takes 3.5 hours for 4 loads.
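The laundry arithmetic can be checked with a short Python sketch. It assumes each machine handles one load at a time and that a load may wait in a basket between machines; the function names are illustrative only.

```python
# Laundry analogy: three stages with the durations given above (minutes).
STAGES = {"wash": 30, "dry": 40, "fold": 20}

def sequential_minutes(loads: int) -> int:
    """Each load runs through every stage before the next load starts."""
    return loads * sum(STAGES.values())

def pipelined_minutes(loads: int) -> int:
    """A load enters a stage only after it has left its previous stage
    and the previous load has left this stage (one load per machine)."""
    finish_prev = [0] * len(STAGES)  # previous load's finish time in each stage
    for _ in range(loads):
        t = 0
        finish_cur = []
        for s, duration in enumerate(STAGES.values()):
            start = max(t, finish_prev[s])
            t = start + duration
            finish_cur.append(t)
        finish_prev = finish_cur
    return finish_prev[-1]

print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

Once the pipeline fills, the 40-minute dryer is the bottleneck, so a load finishes every 40 minutes: 30 + 4 x 40 + 20 = 210 minutes.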
Pipelining
[Figure: pipelined laundry timeline, task order A to D]
Figure 8.1. Basic idea of instruction pipelining. (a) Sequential execution: each instruction I1, I2, I3 is fetched (F) and executed (E) before the next one starts. (b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit. (c) Pipelined execution: the fetch of each instruction overlaps the execution of the previous one.
Figure 8.2. (a) Instruction execution divided into four steps: each of instructions I1 to I4 passes through Fetch + Decode + Execute + Write (F, D, E, W), one clock cycle per step, starting one cycle after the previous instruction. (b) Hardware organization: F: Fetch instruction; D: Decode instruction and fetch operands; E: Execute operation; W: Write results; interstage buffers B1, B2, and B3 sit between consecutive stages.
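A sketch of the ideal four-stage timing of Figure 8.2a, assuming every step takes exactly one clock cycle and no hazards occur; space_time and total_cycles are hypothetical helper names.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write

def space_time(n_instr: int) -> dict[int, list[str]]:
    """Map each clock cycle to the steps active in it (no hazards)."""
    table: dict[int, list[str]] = {}
    for i in range(1, n_instr + 1):
        for s, name in enumerate(STAGES):
            cycle = i + s  # instruction i is fetched in cycle i
            table.setdefault(cycle, []).append(f"{name}{i}")
    return table

def total_cycles(n_instr: int, k: int = len(STAGES)) -> int:
    """Cycles for n instructions in a k-stage pipeline."""
    return k + n_instr - 1

table = space_time(4)
for cycle in sorted(table):
    print(cycle, " ".join(table[cycle]))
# 1 F1
# 2 D1 F2
# 3 E1 D2 F3
# 4 W1 E2 D3 F4
# 5 W2 E3 D4
# 6 W3 E4
# 7 W4
```

With k stages, n instructions need k + n - 1 cycles instead of nk, so the speedup nk / (k + n - 1) approaches k as n grows.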
Pipeline Performance
Figure 8.3. Effect of an execution operation taking more than one clock cycle (instructions I1 to I5 in the four-stage F, D, E, W pipeline).
Pipeline Performance
The pipeline in Figure 8.3 is said to have been stalled for two clock cycles.
Any condition that causes a pipeline to stall is called a hazard.
Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls.
Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.
Pipeline Performance
Instruction hazard
Figure 8.4. Pipeline stall caused by a cache miss in F2. (a) Instruction execution steps in successive clock cycles. (b) Function performed by each processor stage in successive clock cycles: the Fetch stage spends four cycles on F2 before moving on to F3, while the Decode, Execute, and Write stages are each idle for three cycles. These idle periods are called stalls (bubbles).
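The stalls in Figures 8.3 and 8.4 can be reproduced with a small in-order timing model. This is a sketch under simplifying assumptions: one instruction per stage at a time, instructions enter every stage in program order, and an instruction that finishes a step early may wait in the interstage buffer. The three-cycle Execute step for I2 is an assumption chosen to match the two-cycle stall stated for Figure 8.3.

```python
def finish_times(durations: list[list[int]]) -> list[list[int]]:
    """durations[i][s] = clock cycles instruction i spends in stage s.

    Returns finish[i][s], the last clock cycle (1-based) in which
    instruction i occupies stage s.
    """
    finish: list[list[int]] = []
    for i, row in enumerate(durations):
        times: list[int] = []
        for s, d in enumerate(row):
            ready = times[s - 1] if s > 0 else 0     # own previous step done
            free = finish[i - 1][s] if i > 0 else 0  # stage vacated by predecessor
            times.append(max(ready, free) + d)
        finish.append(times)
    return finish

def stall_cycles(durations: list[list[int]]) -> int:
    """Extra cycles compared with the ideal k + n - 1."""
    n, k = len(durations), len(durations[0])
    return finish_times(durations)[-1][-1] - (k + n - 1)

# Figure 8.3 (assumed): I2's Execute step takes 3 cycles instead of 1.
fig_8_3 = [[1, 1, 1, 1] for _ in range(5)]
fig_8_3[1][2] = 3

# Figure 8.4: a cache miss makes fetching I2 take 4 cycles.
fig_8_4 = [[1, 1, 1, 1] for _ in range(3)]
fig_8_4[1][0] = 4

print(stall_cycles(fig_8_3))  # 2
print(stall_cycles(fig_8_4))  # 3
```

In the Figure 8.4 case, F2 occupies the Fetch stage for cycles 2 to 5, so every later stage loses the same three cycles.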
Pipeline Performance
Structural hazard
Load X(R1), R2
Figure 8.5. Effect of a Load instruction on pipeline timing. I2 is the Load instruction; it adds a memory-access step, M2, between E2 and W2.
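The structural hazard in Figure 8.5 can be found by listing which resource each instruction needs in each cycle and looking for collisions. This sketch assumes one instruction is fetched per cycle with no stalls, that each step (F, D, E, M, W) is a single hardware resource, and that only the Load has an M step; the helper names are hypothetical.

```python
from collections import defaultdict

def stage_usage(program: list[tuple[str, list[str]]]) -> dict[tuple[int, str], list[str]]:
    """Instruction i (1-based) is fetched in cycle i and advances one step
    per cycle. Returns {(cycle, resource): [instructions using it]}."""
    usage: dict[tuple[int, str], list[str]] = defaultdict(list)
    for i, (name, steps) in enumerate(program, start=1):
        for offset, step in enumerate(steps):
            usage[(i + offset, step)].append(name)
    return usage

def structural_conflicts(program: list[tuple[str, list[str]]]) -> list[tuple[int, str, list[str]]]:
    """Cycles in which two or more instructions need the same resource."""
    return sorted(
        (cycle, resource, names)
        for (cycle, resource), names in stage_usage(program).items()
        if len(names) > 1
    )

ALU_OP = ["F", "D", "E", "W"]
LOAD = ["F", "D", "E", "M", "W"]  # extra memory-access step, as for I2

program = [("I1", ALU_OP), ("I2", LOAD), ("I3", ALU_OP), ("I4", ALU_OP), ("I5", ALU_OP)]
print(structural_conflicts(program))  # [(6, 'W', ['I2', 'I3'])]
```

Only the W step collides: the Load's write and I3's write both want the register file in cycle 6, so one of them has to wait a cycle.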
Pipeline Performance
Quiz
Four
Data Hazards
Figure 8.6. Pipeline stalled by data dependency between D2 and W1. I1 is a Mul and I2 is an Add; decoding of I2 (D2, continued as D2A) cannot finish until W1 has written I1's result, so E2, W2, and the steps of I3 and I4 are delayed.
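Figure 8.6 stalls because I2 reads a register that I1 has not yet written. A read-after-write check over a short window of earlier instructions detects such dependencies. The register names below and the max_distance value are hypothetical; the right window depends on how many stages separate operand reads from result writes in a particular pipeline.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    name: str
    dest: Optional[str]    # register written, or None
    srcs: tuple[str, ...]  # registers read

def raw_hazards(program: list[Instr], max_distance: int) -> list[tuple[str, str, str]]:
    """Return (producer, consumer, register) for each read-after-write
    dependency whose producer is at most max_distance instructions earlier,
    i.e. close enough that the consumer could read a stale value."""
    hazards = []
    for j, consumer in enumerate(program):
        for reg in consumer.srcs:
            # Walk backwards; the nearest earlier writer of reg is the one needed.
            for i in range(j - 1, max(-1, j - 1 - max_distance), -1):
                if program[i].dest == reg:
                    hazards.append((program[i].name, consumer.name, reg))
                    break
    return hazards

program = [
    Instr("I1 (Mul)", "R4", ("R2", "R3")),  # hypothetical: R4 <- R2 * R3
    Instr("I2 (Add)", "R6", ("R5", "R4")),  # hypothetical: R6 <- R5 + R4
    Instr("I3", "R7", ("R1", "R8")),
]
print(raw_hazards(program, max_distance=2))
# [('I1 (Mul)', 'I2 (Add)', 'R4')]
```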
Operand Forwarding
Instead
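[Figure: operand forwarding. (a) Datapath: Source 1 and Source 2 are read from the register file into SRC1 and SRC2, the ALU places its result in RSLT, and RSLT is written to the Destination register. (b) Position of the source and result registers in the processor pipeline: SRC1 and SRC2 are the inputs of the E: Execute (ALU) stage, RSLT is the input of the W: Write (register file) stage, and a forwarding path returns RSLT to the ALU inputs.]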
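Operand forwarding amounts to a multiplexer in front of each ALU input: if the source register matches the destination of the result still held in RSLT, use RSLT rather than the register file. A minimal sketch; the register contents and the read_operand name are hypothetical.

```python
from typing import Optional

def read_operand(src_reg: str, regfile: dict[str, int],
                 rslt_dest: Optional[str], rslt_value: int) -> int:
    """Value to load into SRC1/SRC2 for the next ALU operation.

    rslt_dest/rslt_value model RSLT: the result the ALU just produced and the
    register it is destined for, not yet written back by the W stage.
    """
    if rslt_dest is not None and src_reg == rslt_dest:
        return rslt_value       # forwarding path: RSLT -> ALU input
    return regfile[src_reg]     # normal path: register file

regfile = {"R2": 3, "R3": 5, "R4": 0, "R5": 10}  # hypothetical; R4 is stale
# The Mul has just computed R4 = R2 * R3 = 15 and holds it in RSLT.
rslt_dest, rslt_value = "R4", regfile["R2"] * regfile["R3"]

src1 = read_operand("R5", regfile, rslt_dest, rslt_value)
src2 = read_operand("R4", regfile, rslt_dest, rslt_value)
print(src1 + src2)  # 25, not the stale 10
```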
Side Effects
Instruction Hazards
Overview
Whenever
Unconditional Branches
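Figure 8.8. An idle cycle caused by a branch instruction. In a two-stage (Fetch, Execute) pipeline, I2 is a branch to Ik; I3 has already been fetched (F3) when the branch executes, so it is discarded and the execution unit is idle for one cycle before Ik and Ik+1 proceed.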
Branch Timing
- Branch penalty
- Reducing the penalty
Figure 8.9. Branch timing. (a) Branch address computed in Execute stage: I3 and I4 are fetched (F3, D3, F4) before the branch target is known and are then discarded, so Ik is fetched two cycles late. (b) Branch address computed in Decode stage: only I3 is fetched (F3) and discarded (X), so Ik is fetched one cycle late.
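The branch penalties in Figures 8.8 and 8.9 follow one rule: every instruction fetched behind the branch before its target address is known gets discarded, one per cycle. The sketch below assumes every branch is taken and nothing hides the penalty; the 20% taken-branch frequency in the example is hypothetical.

```python
def branch_penalty(stages: list[str], resolve_stage: str) -> int:
    """Cycles lost per taken branch when the target address is known at the
    end of resolve_stage. If that stage has index r (Fetch = 0), r wrong-path
    instructions have been fetched behind the branch and are discarded."""
    return stages.index(resolve_stage)

def effective_cpi(taken_branch_fraction: float, penalty: int) -> float:
    """Ideal pipeline: 1 cycle per instruction, plus the penalty per taken branch."""
    return 1 + taken_branch_fraction * penalty

print(branch_penalty(["F", "E"], "E"))            # 1 (Figure 8.8)
print(branch_penalty(["F", "D", "E", "W"], "E"))  # 2 (Figure 8.9a)
print(branch_penalty(["F", "D", "E", "W"], "D"))  # 1 (Figure 8.9b)
print(effective_cpi(0.2, 2), effective_cpi(0.2, 1))
```

Moving the branch-address computation from Execute to Decode halves the penalty in this model, which is one way of reducing it.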
Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b. The stages after the queue are D: Dispatch/Decode unit; E: Execute instruction; W: Write results.
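A rough model of why the instruction queue in Figure 8.10 helps: the dispatch unit keeps sending queued instructions to Decode while the fetch unit waits on a cache miss. The queue's starting contents, the per-instruction fetch latencies, and the unbounded queue capacity are hypothetical assumptions.

```python
from collections import deque
from typing import Optional

def dispatch_trace(queued: list[str], to_fetch: list[tuple[str, int]],
                   cycles: int) -> list[Optional[str]]:
    """Per cycle, the instruction dispatched to Decode (None = idle).

    queued:   instructions already waiting in the queue
    to_fetch: (instruction, fetch latency in cycles), in program order
    """
    queue = deque(queued)
    pending = deque(to_fetch)
    remaining = pending[0][1] if pending else 0
    trace: list[Optional[str]] = []
    for _ in range(cycles):
        # Dispatch unit: send the oldest queued instruction to Decode, if any.
        trace.append(queue.popleft() if queue else None)
        # Fetch unit: spend this cycle on the next instruction; queue it when done.
        if pending:
            remaining -= 1
            if remaining == 0:
                queue.append(pending.popleft()[0])
                remaining = pending[0][1] if pending else 0
    return trace

# I4 suffers a four-cycle cache miss in both runs.
with_queue = dispatch_trace(["I1", "I2", "I3"], [("I4", 4), ("I5", 1)], 7)
empty_start = dispatch_trace([], [("I1", 1), ("I2", 1), ("I3", 1), ("I4", 4), ("I5", 1)], 9)
print(with_queue)   # ['I1', 'I2', 'I3', None, 'I4', 'I5', None]
print(empty_start)  # [None, 'I1', 'I2', 'I3', None, None, None, 'I4', 'I5']
```

Starting from an empty queue, the four-cycle miss on I4 leaves Decode idle for three cycles; with three instructions already queued, only one idle cycle remains.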
Conditional Branches
A