Pipelining and Vector Processing
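$t_n$: Time to complete one task without pipelining
$\tau_1$: Time required to complete the n tasks without pipelining
  $\tau_1 = n \cdot t_n$
Pipelined Machine (k stages)
  $t_p$: Clock cycle (time to complete each suboperation)
  $\tau_k = (k + n - 1) \cdot t_p$
Speedup
  $S_k$: Speedup
  $S_k = \dfrac{n \cdot t_n}{(k + n - 1) \cdot t_p}$
  $\lim_{n \to \infty} S_k = \dfrac{t_n}{t_p}$  ($= k$, if $t_n = k \cdot t_p$)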
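The limit of the speedup can be seen by dividing numerator and denominator by n. The following is a short worked derivation using the same symbols ($n$ tasks, $k$ stages, $t_n$, $t_p$); it adds no assumption beyond $k$ and $t_p$ being fixed as $n$ grows.

```latex
\[
S_k = \frac{n\,t_n}{(k + n - 1)\,t_p}
    = \frac{t_n}{\left(1 + \dfrac{k - 1}{n}\right) t_p},
\qquad
\lim_{n \to \infty} S_k = \frac{t_n}{t_p}.
\]
\[
\text{If } t_n = k\,t_p, \text{ then } \lim_{n \to \infty} S_k = \frac{k\,t_p}{t_p} = k.
\]
```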
Pipelining
[Figure: Multiple Functional Units. Four units P1, P2, P3, P4 process instructions I_i, I_{i+1}, I_{i+2}, I_{i+3} in parallel.]
Example
- 4-stage pipeline
- Suboperation in each stage: t_p = 20 ns
- 100 tasks to be executed
- 1 task in non-pipelined system: 20 * 4 = 80 ns
Pipelined System
  (k + n - 1) * t_p = (4 + 99) * 20 = 2060 ns
Non-Pipelined System
  n * k * t_p = 100 * 80 = 8000 ns
Speedup
  S_k = 8000 / 2060 = 3.88
A 4-stage pipeline is basically identical to a system
with 4 identical function units
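The arithmetic in the example can be checked with a short script. This is a minimal sketch: the function names are illustrative, and it assumes that a non-pipelined task takes k * t_p when no separate t_n is given, as in the example.

```python
def pipeline_time(k, n, t_p):
    """Time for n tasks on a k-stage pipeline with clock cycle t_p."""
    return (k + n - 1) * t_p


def nonpipelined_time(n, t_n):
    """Time for n tasks when each task takes t_n without pipelining."""
    return n * t_n


def speedup(k, n, t_p, t_n=None):
    """Speedup S_k of the pipelined system over the non-pipelined one."""
    # Assumption: without an explicit t_n, one task takes k * t_p.
    if t_n is None:
        t_n = k * t_p
    return nonpipelined_time(n, t_n) / pipeline_time(k, n, t_p)


if __name__ == "__main__":
    k, n, t_p = 4, 100, 20  # values from the example, times in ns
    print(pipeline_time(k, n, t_p))       # 2060
    print(nonpipelined_time(n, k * t_p))  # 8000
    print(round(speedup(k, n, t_p), 2))   # 3.88
```

Increasing n in this sketch moves the speedup toward k = 4, in line with the limit given for S_k.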
Six Phases* in an Instruction Cycle
  [1] Fetch an instruction from memory
  [2] Decode the instruction
  [3] Calculate the effective address of the operand
  [4] Fetch the operands from memory
  [5] Execute the operation
  [6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done as
  part of the decoding phase
* Storage of the operation result into a register
  is done automatically in the execution phase
==> 4-Stage Pipeline
  [1] FI: Fetch an instruction from memory
  [2] DA: Decode the instruction and calculate
      the effective address of the operand
  [3] FO: Fetch the operand
  [4] EX: Execute the operation
Instruction Pipeline
Execution of Three Instructions in a 4-Stage Pipeline
Conventional
  i     FI DA FO EX
  i+1               FI DA FO EX
  i+2                           FI DA FO EX
Pipelined
  i     FI DA FO EX
  i+1      FI DA FO EX
  i+2         FI DA FO EX
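The conventional and pipelined timings can be generated programmatically. The sketch below is illustrative only: `space_time` and `render` are hypothetical helper names, and it assumes every stage takes exactly one clock step with no stalls.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def space_time(num_instructions, stages=STAGES, pipelined=True):
    """Return one row per instruction; each cell holds the stage active
    in that clock step, or '' when the instruction is not in the pipe."""
    k = len(stages)
    # Pipelined: a new instruction starts every step.
    # Conventional: a new instruction starts only after the previous one ends.
    step = 1 if pipelined else k
    total_steps = (num_instructions - 1) * step + k
    rows = []
    for i in range(num_instructions):
        row = [""] * total_steps
        start = i * step
        for s, name in enumerate(stages):
            row[start + s] = name
        rows.append(row)
    return rows


def render(rows):
    """Format the rows as fixed-width text, one instruction per line."""
    return "\n".join(" ".join(f"{c:<2}" for c in row).rstrip() for row in rows)


if __name__ == "__main__":
    print("Conventional")
    print(render(space_time(3, pipelined=False)))
    print("Pipelined")
    print(render(space_time(3, pipelined=True)))
```

With three instructions and four stages, the pipelined case needs k + n - 1 = 6 steps and the conventional case needs n * k = 12.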
[Figure: Timing of the 4-stage instruction pipeline. Steps 1 to 13 run across the top and instructions 1 to 7 down the side, each passing through FI, DA, FO, EX; one instruction is marked (Branch).]
[Figure: Flowchart of the four-segment instruction pipeline]
  Segment 1: Fetch instruction from memory
  Segment 2: Decode instruction and calculate effective address
             Decision: Branch? (yes / no)
  Segment 3: Fetch operand from memory
  Segment 4: Execute instruction
             Decision: Interrupt? (yes / no)
  Other boxes: Interrupt handling, Update PC, Empty pipe
Structural hazards (Resource Conflicts)
  Hardware resources required by the instructions in
  simultaneous overlapped execution cannot be met
Data hazards (Data Dependency Conflicts)
  An instruction scheduled to be executed in the pipeline requires the
  result of a previous instruction, which is not yet available
  Data dependency:
    ADD  R1 <- B + C
    INC  R1 <- R1 + 1
  [Figure: ADD followed by INC; a bubble is inserted before INC can use R1]
Control hazards
  Branches and other instructions that change the PC
  delay the fetch of the next instruction
  Branch address dependency:
  [Figure: JMP followed by a bubble before the next instruction enters the pipeline]
Hazards in pipelines may make it
necessary to stall the pipeline
Pipeline Interlock:
  Detect hazards and stall until they are cleared
Structural Hazards
Occur when some resource has not been
duplicated enough to allow all combinations
of instructions in the pipeline to execute
Example: With one memory port, a data fetch and an instruction fetch
cannot be initiated in the same clock
The pipeline is stalled for a structural hazard
  <- Two loads with one-port memory
  -> Two-port memory will serve without stall
  i     FI DA FO EX
  i+1      FI DA FO EX
  i+2         stall stall FI DA FO EX
Data Hazards
Occur when the execution of an instruction
depends on the results of a previous instruction
  ADD R1, R2, R3
  SUB R4, R1, R5
Data hazards can be dealt with by either hardware
techniques or software techniques
Hardware Technique
  Interlock
  - hardware detects the data dependencies and delays the scheduling
    of the dependent instruction by stalling enough clock cycles
  Forwarding (bypassing, short-circuiting)
  - Accomplished by a data path that routes a value from a source
    (usually an ALU) to a user, bypassing a designated register. This
    allows the value to be used at an earlier stage in the
    pipeline than would otherwise be possible
Software Technique
  Instruction Scheduling (compiler) for delayed load
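A read-after-write check of the kind an interlock performs can be sketched in a few lines. The `Instr` tuple and both function names are hypothetical, and the stall counts assume a 3-stage pipeline in which registers are read in the second stage and written in the third, so that one stall is needed without forwarding and none with it.

```python
from collections import namedtuple

# Hypothetical instruction record: opcode, destination register, source registers.
Instr = namedtuple("Instr", "op dest srcs")


def raw_dependency(first, second):
    """True if `second` reads the register that `first` writes."""
    return first.dest is not None and first.dest in second.srcs


def stall_cycles(first, second, forwarding):
    """Stall cycles needed when `second` immediately follows `first`.

    Assumption: 3-stage pipeline; the result is written one stage after
    it is computed, so a dependent instruction waits one cycle unless
    the value is forwarded.
    """
    if not raw_dependency(first, second):
        return 0
    return 0 if forwarding else 1


if __name__ == "__main__":
    add = Instr("ADD", "R1", ("R2", "R3"))
    sub = Instr("SUB", "R4", ("R1", "R5"))
    print(raw_dependency(add, sub))                  # True
    print(stall_cycles(add, sub, forwarding=False))  # 1
    print(stall_cycles(add, sub, forwarding=True))   # 0
```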
[Figure: Forwarding hardware. Register file, two MUXes feeding the ALU, ALU result buffer, bypass path, and result write bus to R4.]
Example:
  ADD R1, R2, R3
  SUB R4, R1, R5
3-stage Pipeline
  I: Instruction Fetch
  A: Decode, Read Registers,
     ALU Operations
  E: Write the result to the
     destination register
  ADD                        I  A  E
  SUB (without bypassing)       I     A  E
  SUB (with bypassing)          I  A  E
Delayed Load
  A load requiring that the following instruction not use its result
Example:
  a = b + c;
  d = e - f;
Unscheduled code:
  LW  Rb, b
  LW  Rc, c
  ADD Ra, Rb, Rc
  SW  a, Ra
  LW  Re, e
  LW  Rf, f
  SUB Rd, Re, Rf
  SW  d, Rd
Scheduled code:
  LW  Rb, b
  LW  Rc, c
  LW  Re, e
  ADD Ra, Rb, Rc
  LW  Rf, f
  SW  a, Ra
  SUB Rd, Re, Rf
  SW  d, Rd
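The benefit of scheduling can be made concrete by counting how often an instruction uses a register loaded by the instruction immediately before it. This is a rough sketch: `parse` and `load_use_stalls` are illustrative names, and the one-instruction load delay is an assumption taken from the delayed-load definition.

```python
def parse(line):
    """Split 'ADD Ra, Rb, Rc' into ('ADD', ['Ra', 'Rb', 'Rc'])."""
    op, rest = line.split(None, 1)
    return op, [a.strip() for a in rest.split(",")]


def load_use_stalls(program):
    """Count instructions that read a register loaded by the
    immediately preceding LW (assumed load delay of one instruction)."""
    stalls = 0
    prev_loaded = None
    for line in program:
        op, args = parse(line)
        if op == "LW":        # LW Rx, var: writes Rx, reads no register
            reads, loaded = [], args[0]
        elif op == "SW":      # SW var, Rx: reads Rx
            reads, loaded = [args[1]], None
        else:                 # OP Rd, Rs1, Rs2: reads Rs1 and Rs2
            reads, loaded = args[1:], None
        if prev_loaded is not None and prev_loaded in reads:
            stalls += 1
        prev_loaded = loaded
    return stalls


UNSCHEDULED = ["LW Rb, b", "LW Rc, c", "ADD Ra, Rb, Rc", "SW a, Ra",
               "LW Re, e", "LW Rf, f", "SUB Rd, Re, Rf", "SW d, Rd"]
SCHEDULED = ["LW Rb, b", "LW Rc, c", "LW Re, e", "ADD Ra, Rb, Rc",
             "LW Rf, f", "SW a, Ra", "SUB Rd, Re, Rf", "SW d, Rd"]

if __name__ == "__main__":
    print(load_use_stalls(UNSCHEDULED))  # 2
    print(load_use_stalls(SCHEDULED))    # 0
```

In the unscheduled order both ADD and SUB follow directly after the load of one of their operands; the scheduled order separates each load from its first use.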
Branch Instructions
- Branch target address is not known until
  the branch instruction is completed
- Stall -> waste of cycle times
  Branch Instruction   FI DA FO EX
  Next Instruction                 FI DA FO EX
                                   (target address available)
Dealing with Control Hazards
  * Prefetch Target Instruction
  * Branch Target Buffer
  * Loop Buffer
  * Branch Prediction
  * Delayed Branch
RISC Pipeline
RISC
- Machine with a very fast clock cycle that
  executes at the rate of one instruction per cycle
  <- Simple Instruction Set
     Fixed Length Instruction Format
     Register-to-Register Operations
Instruction Cycles of Three-Stage Instruction Pipeline
Data Manipulation Instructions
  I: Instruction Fetch
  A: Decode, Read Registers, ALU Operations
  E: Write a Register
Load and Store Instructions
  I: Instruction Fetch
  A: Decode, Evaluate Effective Address
  E: Register-to-Memory or Memory-to-Register
Program Control Instructions
  I: Instruction Fetch
  A: Decode, Evaluate Branch Address
  E: Write Register (PC)
  LOAD:  R1 <- M[address 1]
  LOAD:  R2 <- M[address 2]
  ADD:   R3 <- R1 + R2
  STORE: M[address 3] <- R3
Three-segment pipeline timing
Pipeline timing with data conflict
  clock cycle    1  2  3  4  5  6
  Load  R1       I  A  E
  Load  R2          I  A  E
  Add   R1+R2          I  A  E
  Store R3                I  A  E
Pipeline timing with delayed load
  clock cycle    1  2  3  4  5  6  7
  Load  R1       I  A  E
  Load  R2          I  A  E
  NOP                  I  A  E
  Add   R1+R2             I  A  E
  Store R3                   I  A  E
The data dependency is taken
care of by the compiler rather
than the hardware
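The compiler-side fix shown in the delayed-load timing can be sketched as a pass that inserts a NOP whenever an instruction reads the register loaded by the instruction just before it. The tuple layout `(op, dest, srcs)` and the function name are hypothetical, and a single delay slot per load is assumed.

```python
def insert_delay_nops(program):
    """Return a copy of `program` with a NOP placed between a LOAD and
    an immediately following instruction that reads the loaded register.

    Each instruction is a tuple (op, dest, srcs); this layout is an
    assumption made for illustration.
    """
    out = []
    for instr in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "LOAD" and prev_dest in instr[2]:
                out.append(("NOP", None, ()))
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [
        ("LOAD", "R1", ()),
        ("LOAD", "R2", ()),
        ("ADD", "R3", ("R1", "R2")),
        ("STORE", None, ("R3",)),
    ]
    fixed = insert_delay_nops(program)
    print([op for op, _, _ in fixed])
    # Total clock cycles on a 3-segment pipeline: k + n - 1
    print(3 + len(fixed) - 1)
```

For the four-instruction example this yields five instructions, which on a three-segment pipeline take 3 + 5 - 1 = 7 clock cycles, matching the delayed-load timing.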
Delayed Branch
Compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps
Using no-operation instructions
  Clock cycles:     1  2  3  4  5  6  7  8  9  10
  1. Load A         I  A  E
  2. Increment         I  A  E
  3. Add                  I  A  E
  4. Subtract                I  A  E
  5. Branch to X                I  A  E
  6. NOP                           I  A  E
  7. NOP                              I  A  E
  8. Instr. in X                         I  A  E
Rearranging the instructions
  Clock cycles:     1  2  3  4  5  6  7  8
  1. Load A         I  A  E
  2. Increment         I  A  E
  3. Branch to X          I  A  E
  4. Add                     I  A  E
  5. Subtract                   I  A  E
  6. Instr. in X                   I  A  E
Vector Processing Applications