Lect6 Pipelining2 Sec2 PDF
Branch Hazards, Static Branch Prediction, Multicycle FP MIPS Pipeline
Control Hazards
EE/CS 520 Comp. Archi.
9/18/2012

Control (Branch) Hazards
  Can cause greater performance degradation than data hazards
  Cannot decide the next value of PC until:
    Branch condition is evaluated
    Branch address is calculated
  Even one stall cycle for every branch causes a 10-30% performance loss
    Depends on branch inst frequency

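As a rough check on the 10-30% figure, the sketch below assumes an ideal CPI of 1 and one stall cycle per branch. The function name and the three branch frequencies are illustrative assumptions, not values from the slides.

```python
def branch_stall_overhead(branch_freq, stall_cycles=1, ideal_cpi=1.0):
    """Return the fractional CPI increase caused by branch stalls."""
    stalled_cpi = ideal_cpi + branch_freq * stall_cycles
    return stalled_cpi / ideal_cpi - 1.0


if __name__ == "__main__":
    # Assumed branch frequencies; the slide only says the loss depends on them.
    for freq in (0.10, 0.20, 0.30):
        overhead = branch_stall_overhead(freq)
        print(f"branch frequency {freq:.0%} -> CPI grows by {overhead:.0%}")
```

With these assumptions the overhead simply equals the branch frequency, which is why the loss range tracks how often branches occur.
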
Pipelined MIPS Processor
  [Figure taken from the textbook by Hennessy and Patterson]
  Branch condition and address evaluation in EX stage
    Results in one additional stall cycle
    Saves us an adder

Pipelined MIPS Processor
  [Figure taken from the textbook by Hennessy and Patterson; labels: Taken, Untaken]
  Branch condition and address evaluation in ID stage
    Results in a single stall cycle
    Costs us an additional adder

How to Avoid Branch Penalties?
  Static branch prediction
    The actions taken for branches are always fixed
    Decided at compile time
  Dynamic branch prediction
    The actions to resolve a branch penalty can vary depending upon previous observations
    Done at run time

Static Branch Prediction
  Freeze or Flush the pipeline
  Predicted Taken
  Predicted Untaken
  BTFN (Backward Taken, Forward Not taken)
  Delayed Branch

Freeze or Flush
  Hold or remove inst after branch until the target is known
    If taken, Br succ. is refetched from target
    Else, refetched from PC+4
  Advantage: Simplicity
  Disadvantage: Branch penalty is fixed for any case

  Cycle         1    2    3    4    5    6    7
  Br inst       IF   ID   EX   MEM  WB
  Br succ.           IF   IF   ID   EX   MEM  WB
  Br succ.+1                   IF   ID   EX   MEM
  Br succ.+2                        IF   ID   EX

Predicted Untaken
  Treat every branch as untaken
  Continue with the program flow with next inst
  Don't change the processor state until branch outcome is known
  If the branch is untaken in reality, we avoid the stall
  Else, turn the fetched sequential inst into a NOP

Predicted Untaken
  Cycle          1    2    3    4    5    6    7    8    9
  Untaken Br i   IF   ID   EX   MEM  WB
  Inst i+1            IF   ID   EX   MEM  WB
  Inst i+2                 IF   ID   EX   MEM  WB
  Inst i+3                      IF   ID   EX   MEM  WB
  Inst i+4                           IF   ID   EX   MEM  WB

  Cycle          1    2    3    4    5    6    7    8    9
  Taken Br i     IF   ID   EX   MEM  WB
  Inst i+1            IF   NOP  NOP  NOP  NOP
  Br. Target               IF   ID   EX   MEM  WB
  Br. Target+1                  IF   ID   EX   MEM  WB
  Br. Target+2                       IF   ID   EX   MEM  WB

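The predicted-untaken timing can be reproduced with a few lines of Python. This is a minimal sketch that assumes the branch is resolved in ID, so the one fall-through instruction already fetched is turned into NOPs when the branch is taken. The function names and row labels are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def predicted_untaken_rows(taken):
    """Return (label, first_cycle, stages) rows for a branch resolved in ID."""
    rows = [("Br i", 1, STAGES)]
    if taken:
        # Inst i+1 was fetched in cycle 2 before the outcome was known: squash it.
        rows.append(("Inst i+1", 2, ["IF", "NOP", "NOP", "NOP", "NOP"]))
        names, first_fetch = ["Br target", "Br target+1", "Br target+2"], 3
    else:
        names, first_fetch = ["Inst i+1", "Inst i+2", "Inst i+3"], 2
    for offset, name in enumerate(names):
        rows.append((name, first_fetch + offset, STAGES))
    return rows


def wb_cycle(rows, label):
    """Cycle in which the row named `label` reaches its last stage."""
    for name, start, stages in rows:
        if name == label:
            return start + len(stages) - 1
    raise KeyError(label)


def render(rows):
    """Format the rows as a fixed-width timing diagram, one column per cycle."""
    lines = []
    for name, start, stages in rows:
        cells = [""] * (start - 1) + list(stages)
        lines.append(f"{name:<13}" + "".join(f"{c:<5}" for c in cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(predicted_untaken_rows(taken=False)))
    print()
    print(render(predicted_untaken_rows(taken=True)))
```

Comparing `wb_cycle` for the first useful instruction after the branch in the two cases shows the one-cycle penalty paid only when the prediction is wrong.
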
Predicted Taken
  Opposite to Predicted Untaken
  Treat every branch as taken
  As soon as the branch target is computed in ID, start fetching the inst from the target address
  In the 5-stage pipeline, the target address is calculated along with the branch condition evaluation
    Not useful for this case

BTFN Branch Prediction
  Merger of the two previous approaches
  Pro: Yields better results than either of the two
  Con: Compiler's job becomes tougher

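The BTFN rule itself is a single comparison of addresses. The sketch below assumes the predictor sees the branch's own PC and its target; the function name and the example addresses are hypothetical.

```python
def btfn_predicts_taken(branch_pc, target_pc):
    """Backward Taken, Forward Not taken.

    A target at or below the branch address is treated as backward
    (typically a loop) and predicted taken; a forward target is
    predicted not taken.
    """
    return target_pc <= branch_pc


if __name__ == "__main__":
    # Hypothetical addresses: a loop-closing branch and a forward skip.
    print(btfn_predicts_taken(0x1020, 0x1000))  # backward branch
    print(btfn_predicts_taken(0x1040, 0x1060))  # forward branch
```
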
Delayed Branch
  Add a slot after the branch inst, called the Sequential Successor
  Carries an inst that will always be executed whether the branch is taken or not
  Execution order:
    Branch instruction
    Sequential successor
    Branch target if taken

Delayed Branch
  Cycle                 1    2    3    4    5    6    7    8    9
  Untaken Br i          IF   ID   EX   MEM  WB
  Br. Delay Inst (i+1)       IF   ID   EX   MEM  WB
  Inst i+2                        IF   ID   EX   MEM  WB
  Inst i+3                             IF   ID   EX   MEM  WB
  Inst i+4                                  IF   ID   EX   MEM  WB

  Cycle                 1    2    3    4    5    6    7    8    9
  Taken Br i            IF   ID   EX   MEM  WB
  Br. Delay Inst (i+1)       IF   ID   EX   MEM  WB
  Br. Target                      IF   ID   EX   MEM  WB
  Br. Target+1                         IF   ID   EX   MEM  WB
  Br. Target+2                              IF   ID   EX   MEM  WB

Delayed Branch
  Compiler is responsible for making the sequential successor valid and useful
  Delay slot is filled with an inst from above the branch inst in program flow
    Best option as there are no side effects

  DADD R1, R2, R3
  If R2 = 0 then
    [Delay Slot]
  becomes
  If R2 = 0 then
    DADD R1, R2, R3

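Filling the slot from above the branch is only legal when the moved inst does not produce a value the branch condition reads. The check below is a minimal sketch of that single condition; the `Inst` tuple and the function name are hypothetical, and a real compiler would check more than this.

```python
from collections import namedtuple

# Hypothetical instruction record: opcode, destination register, source registers.
Inst = namedtuple("Inst", "opcode dest srcs")


def can_fill_from_before(candidate, branch_srcs):
    """True if `candidate` can move into the delay slot without changing
    the value the branch condition tests."""
    return candidate.dest not in branch_srcs


if __name__ == "__main__":
    dadd = Inst("DADD", "R1", ("R2", "R3"))
    print(can_fill_from_before(dadd, branch_srcs=("R2",)))  # branch tests R2
    print(can_fill_from_before(dadd, branch_srcs=("R1",)))  # branch tests R1
```

When the branch tests R2, as in the example above, DADD writes only R1 and can be moved. If the branch tested R1 instead, the slot would have to be filled from another source.
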
Delayed Branch
  The delay slot is filled with an inst from the branch target when the taken-branch probability is quite high
    E.g. loops
  The moved inst has to be copied, as it can be reached from another path

  DSUB R4, R5, R6
  DADD R1, R2, R3
  If R1 = 0 then
    [Delay Slot]
  becomes
  DSUB R4, R5, R6
  DADD R1, R2, R3
  If R1 = 0 then
    DSUB R4, R5, R6

Delayed Branch
  The delay slot is filled with an inst from the fall-through path when the untaken-branch probability is quite high
    The work done by the delay slot will be wasted if the branch is taken
  In the last two cases, the compiler must make sure that the program executes correctly if the branch goes in the unpredicted direction

  DADD R1, R2, R3
  If R1 = 0 then
    [Delay Slot]
  OR R7, R5, R3
  DSUB R4, R5, R6
  becomes
  DADD R1, R2, R3
  If R1 = 0 then
    OR R7, R5, R3
  DSUB R4, R5, R6

Delayed Branch
  Compiler effectiveness for a single branch delay slot:
    Fills about 60% of branch delay slots
    About 80% of instructions executed in branch delay slots are useful in computation
    ~50% (60% x 80%) of slots usefully filled
  Delayed Branch downside: as processors go to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed
  Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
    Growth in available transistors has made dynamic approaches relatively cheaper

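The 60% x 80% estimate can be turned into an average cost per branch. The sketch below assumes a single delay slot and treats every slot that is not usefully filled as one wasted cycle; that cost model and the function names are assumptions for illustration.

```python
def useful_slot_fraction(fill_rate=0.60, useful_rate=0.80):
    """Fraction of delay slots that hold a useful instruction."""
    return fill_rate * useful_rate


def wasted_cycles_per_branch(fill_rate=0.60, useful_rate=0.80):
    """Average wasted cycles per branch with one delay slot (assumed model)."""
    return 1.0 - useful_slot_fraction(fill_rate, useful_rate)


if __name__ == "__main__":
    print(f"usefully filled: {useful_slot_fraction():.0%}")
    print(f"wasted cycles per branch: {wasted_cycles_per_branch():.2f}")
```
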
Performance of Branch Schemes
\[
\text{Speedup} = \frac{\text{Pipeline Depth}}{1 + \text{Pipeline Stall Cycles from Branches}}
\]
\[
\text{Speedup} = \frac{\text{Pipeline Depth}}{1 + \text{Branch Frequency} \times \text{Branch Penalty}}
\]

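The second formula is easy to evaluate for different schemes. In the sketch below, the pipeline depth of 5 matches the pipeline in these slides, while the branch frequency and taken fraction are assumed values chosen only to make the comparison concrete.

```python
def pipeline_speedup(depth, branch_freq, branch_penalty):
    """Speedup = depth / (1 + branch frequency x branch penalty)."""
    return depth / (1.0 + branch_freq * branch_penalty)


DEPTH = 5              # 5-stage MIPS pipeline
BRANCH_FREQ = 0.20     # assumed: 20% of instructions are branches
TAKEN_FRACTION = 0.60  # assumed: 60% of branches are taken

# Average penalty in cycles per branch, with the branch resolved in ID.
penalties = {
    "freeze or flush": 1.0,                     # every branch pays one cycle
    "predicted untaken": TAKEN_FRACTION * 1.0,  # only taken branches pay
}

speedups = {
    name: pipeline_speedup(DEPTH, BRANCH_FREQ, penalty)
    for name, penalty in penalties.items()
}

if __name__ == "__main__":
    for name, value in speedups.items():
        print(f"{name}: {value:.2f}")
```
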
Multi-Cycle FP MIPS Pipeline

Multicycle FP MIPS Pipeline
  IF  ID  EX  MEM  WB, with every EX (including FP operations) taking 1 clock cycle
  Impractical solution:
    Either the clock frequency is too low
    Or a huge amount of parallel logic is needed

Multicycle FP MIPS Pipeline
  [Figure: IF and ID feed four parallel EX units, which feed MEM and WB]
    EX (int add/sub)
    EX (FP mul)
    EX (FP add/sub)
    EX (FP div)

Multicycle FP MIPS Pipeline
  Latency (with forwarding):
    No. of cycles an inst takes to produce its result after entering the EX stage
    Simply (EX pipeline length - 1)
  [Figure: IF and ID feed one of four execution paths, which feed MEM and WB]
    EX (int)
    M1 M2 ... M7
    A1 A2 A3 A4
    DIV (unpipelined)

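Applying the "EX pipeline length - 1" rule to the stage counts shown in the figure gives the latency of each pipelined unit. The dictionary keys are hypothetical labels, and the DIV unit is left out because its length is not stated on the slide.

```python
# EX pipeline lengths read off the figure: EX (int), A1-A4, M1-M7.
EX_PIPELINE_LENGTH = {
    "int ALU": 1,
    "FP add": 4,
    "FP mul": 7,
}


def latency(ex_pipeline_length):
    """Latency with forwarding, per the slide: EX pipeline length - 1."""
    return ex_pipeline_length - 1


LATENCY = {unit: latency(n) for unit, n in EX_PIPELINE_LENGTH.items()}

if __name__ == "__main__":
    for unit, cycles in LATENCY.items():
        print(f"{unit}: latency {cycles}")
```
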
Hazards in FP MIPS Pipeline
  Structural Hazard
    DIV unit is not fully pipelined
      Second inst needing the DIV unit has to be stalled
    Insts have variable execution time
      Need for more than one write to the reg file in a clock cycle
  WAW Hazards
    Since insts can complete out of order
  No WAR Hazard
    Since reg reads always occur in ID stage
    We don't issue insts out of order (yet!)
  Longer latency -> Higher stall freq. on RAW Hazards

Example: RAW Hazard
  [Figure]
Example: Structural Hazard
  [Figure]

Example: Structural Hazard (Solution)
  [Figure; surviving labels: cycles 12 and 13, Stall, WB]

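One way to see why variable execution times create a write-port conflict is to compute each inst's WB cycle and push back any that collide. The sketch below assumes one register-file write port, the EX lengths from the earlier figure (1, 4 and 7 stages), and a hypothetical three-instruction sequence; it delays only the colliding inst and ignores the knock-on effect a real stall in ID would have on later insts.

```python
EX_LENGTH = {"int": 1, "fp_add": 4, "fp_mul": 7}


def natural_wb_cycle(if_cycle, unit):
    """WB cycle without stalls: IF, ID, the EX stages, MEM, then WB."""
    return if_cycle + 3 + EX_LENGTH[unit]


def schedule_writebacks(insts):
    """insts: list of (name, if_cycle, unit) in program order.

    Returns {name: wb_cycle}, delaying a later inst one cycle at a time
    until its WB cycle is not already claimed (single write port assumed).
    """
    claimed = set()
    schedule = {}
    for name, if_cycle, unit in insts:
        wb = natural_wb_cycle(if_cycle, unit)
        while wb in claimed:
            wb += 1  # one stall cycle
        claimed.add(wb)
        schedule[name] = wb
    return schedule


if __name__ == "__main__":
    # Hypothetical sequence whose natural WB cycles all land on cycle 11.
    program = [("MUL.D", 1, "fp_mul"), ("ADD.D", 4, "fp_add"), ("L.D", 7, "int")]
    print(schedule_writebacks(program))
```
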
Example: WAW Hazard
  Solution:
    1) Stall LD until ADD.D is in MEM (more common)
    2) Issue LD normally, but ignore the result of ADD.D

Data Forwarding in MC FP MIPS
  [Figure: IF and ID feed EX, M1 M2 ... M7, A1 A2 A3 A4, or DIV (unpipelined), which feed MEM and WB]

Summary of Hazard Checks
  Structural Hazard
    Wait until the resource (FU) is no longer busy
  WAW Hazard (in ID Stage)
    Check if any inst in A1, ..., A4, M1, ..., M7 has the same dest reg as this one; if so, stall
  RAW Hazard
    Wait until the src registers are not listed as pending destinations in a pipeline register (whose result will not be available when this inst needs it)
    E.g. if an inst (i) in ID needs F2 as a src reg, F2 can't be in ID/A1, A1/A2 or A2/A3

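The three checks can be sketched as a single function run on the inst sitting in ID. This is an illustrative model, not the actual control logic: the data layout (dicts keyed by stage or pipeline-register name) and all names are assumptions, and the RAW list contains only the FP-adder registers named in the example above.

```python
WAW_STAGES = ("A1", "A2", "A3", "A4",
              "M1", "M2", "M3", "M4", "M5", "M6", "M7")

# Pipeline registers whose pending destination blocks a reader in ID.
# Only the FP-adder registers from the slide's F2 example are listed.
RAW_BLOCKING_REGS = ("ID/A1", "A1/A2", "A2/A3")


def hazard_in_id(inst, stage_dest, pipe_reg_dest, busy_units):
    """Return the first hazard that stalls `inst` in ID, or None.

    inst          -- dict with "unit", "dest" (or None) and "srcs"
    stage_dest    -- {stage name: dest reg of the inst in that stage}
    pipe_reg_dest -- {pipeline register name: pending dest reg}
    busy_units    -- set of functional units that cannot accept an inst
    """
    if inst["unit"] in busy_units:
        return "structural"
    dest = inst["dest"]
    if dest is not None and any(stage_dest.get(s) == dest for s in WAW_STAGES):
        return "WAW"
    pending = {pipe_reg_dest.get(r) for r in RAW_BLOCKING_REGS}
    pending.discard(None)
    if any(src in pending for src in inst["srcs"]):
        return "RAW"
    return None


if __name__ == "__main__":
    add_d = {"unit": "fp_add", "dest": "F4", "srcs": ("F2", "F6")}
    print(hazard_in_id(add_d, {}, {"A1/A2": "F2"}, set()))  # F2 still pending
    print(hazard_in_id(add_d, {}, {"A3/A4": "F2"}, set()))  # F2 ready in time
```
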