Lect6 Pipelining2 Sec2 PDF

The document discusses different techniques for branch prediction in pipelined processors including static prediction methods like predicted taken/untaken, BTFN, and delayed branches. It also covers hazards in multi-cycle floating point pipelines and techniques to mitigate them like forwarding.

Uploaded by

Ahmad Asghar
Copyright: © All Rights Reserved
Lecture 6

Branch Hazards, Static Branch Prediction, Multicycle FP MIPS Pipeline

Control Hazards

EE/CS520 Comp.Archi.

9/18/2012

Control (Branch) Hazards
Can cause greater performance degradation than data hazards
Cannot decide the next value of the PC until:
  Branch condition is evaluated
  Branch address is calculated
Even one stall cycle for every branch causes a 10% to 30% performance loss
  Depends on branch instruction frequency
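
The 10% to 30% figure can be checked with a little arithmetic. The sketch below assumes an ideal CPI of 1 without branch stalls and a fixed stall count per branch; the function names and the sample branch frequencies are illustrative and not taken from the slides.

```python
def cpi_with_branch_stalls(branch_freq, stall_cycles, base_cpi=1.0):
    """Effective CPI when every branch costs a fixed number of stall cycles."""
    return base_cpi + branch_freq * stall_cycles


def slowdown(branch_freq, stall_cycles, base_cpi=1.0):
    """Fractional increase in execution time over the stall-free pipeline."""
    return cpi_with_branch_stalls(branch_freq, stall_cycles, base_cpi) / base_cpi - 1.0


if __name__ == "__main__":
    # Assumed branch frequencies of 10%, 20% and 30%, one stall cycle each.
    for freq in (0.10, 0.20, 0.30):
        print(f"branch freq {freq:.0%}: "
              f"CPI = {cpi_with_branch_stalls(freq, 1):.2f}, "
              f"slowdown = {slowdown(freq, 1):.0%}")
```

Under these assumptions the loss tracks the branch frequency directly, which is why it depends so strongly on the instruction mix.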

Pipelined MIPS Processor
Figure taken from the textbook by Hennessy and Patterson

Branch condition and address evaluation in EX stage
  Results in one additional stall cycle
  Saves us an adder

Pipelined MIPS Processor
Figure taken from the textbook by Hennessy and Patterson
(Figure labels: Taken, Untaken)

Branch condition and address evaluation in ID stage
  Results in a single stall cycle
  Costs us an additional adder

How to Avoid Branch Penalties?
Static branch prediction
  The actions taken for branches are always fixed
  Decided at compile time
Dynamic branch prediction
  The actions to resolve a branch penalty can vary depending upon previous observations
  Done at run time

Static Branch Prediction
Freeze or Flush the pipeline
Predicted Taken
Predicted Untaken
BTFN (Backward Taken, Forward Not taken)
Delayed Branch

Freeze or Flush
Hold or remove instructions after the branch until the target is known
  If taken, Br succ. is refetched from the target
  Else, refetched from PC+4
Advantage: Simplicity
Disadvantage: Branch penalty is fixed for any case

Br inst       IF   ID   EX   MEM  WB
Br succ.           IF   IF   ID   EX   MEM  WB
Br succ.+1                   IF   ID   EX   MEM
Br succ.+2                        IF   ID   EX
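
The timing diagram for the flush scheme can be regenerated programmatically. This is a minimal sketch, assuming the branch resolves in ID (one discarded fetch) and numbering clock cycles from 1; `flush_timing` and `render` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def flush_timing(successors=3, penalty=1):
    """Map each instruction name to {cycle: stage} for a flushed branch."""
    rows = {"Br inst": {1 + k: s for k, s in enumerate(STAGES)}}
    for j in range(successors):
        name = "Br succ." if j == 0 else f"Br succ.+{j}"
        start = 2 + penalty + j  # cycle of the fetch that is actually used
        row = {start + k: s for k, s in enumerate(STAGES)}
        if j == 0:
            # Fetches made before the branch resolved are discarded and redone.
            for cycle in range(2, start):
                row[cycle] = "IF"
        rows[name] = row
    return rows


def render(rows):
    """Format the timing rows as fixed-width text, one instruction per line."""
    last = max(cycle for row in rows.values() for cycle in row)
    lines = []
    for name, row in rows.items():
        cells = "".join(f"{row.get(c, ''):<5}" for c in range(1, last + 1))
        lines.append(f"{name:<14}{cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(flush_timing()))
```

Raising `penalty` to 2 models a branch resolved in EX, where two fetches of the successor are thrown away before the useful one.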

Predicted Untaken
Treat every branch as untaken
  Continue with the program flow with the next instruction
  Don't change the processor state until the branch outcome is known
If the branch is untaken in reality, we avoid the stall
Else, turn the fetched sequential instruction into a NOP

Predicted Untaken

Untaken Br (i)  IF   ID   EX   MEM  WB
Inst i+1             IF   ID   EX   MEM  WB
Inst i+2                  IF   ID   EX   MEM  WB
Inst i+3                       IF   ID   EX   MEM  WB
Inst i+4                            IF   ID   EX   MEM  WB

Taken Br (i)    IF   ID   EX   MEM  WB
Inst i+1             IF   NOP  NOP  NOP  NOP
Br. Target                IF   ID   EX   MEM  WB
Br. Target+1                   IF   ID   EX   MEM  WB
Br. Target+2                        IF   ID   EX   MEM  WB
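
Since only taken branches pay a penalty under this scheme, the average cost per branch is the taken fraction times the misprediction penalty. The sketch below assumes a one-cycle penalty (branch resolved in ID); the taken fractions are hypothetical values chosen for illustration.

```python
def predict_untaken_penalty(taken_fraction, mispredict_cycles=1):
    """Average stall cycles per branch when every branch is predicted untaken."""
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must lie in [0, 1]")
    # Untaken branches cost nothing; taken ones turn the fetched
    # sequential instruction into a NOP.
    return taken_fraction * mispredict_cycles


if __name__ == "__main__":
    for taken in (0.3, 0.5, 0.7):  # assumed taken fractions
        print(f"{taken:.0%} taken -> "
              f"{predict_untaken_penalty(taken):.2f} stall cycles per branch")
```

Compared with freezing or flushing, which always pays the full penalty, the saving grows as fewer branches are taken.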

Predicted Taken
Opposite to Predicted Untaken
Treat every branch as taken
  As soon as the branch target is computed in ID
  Start fetching the instruction from the target address
In the 5-stage pipeline, the target address is calculated together with the branch condition evaluation
  Not useful for this case, because the target is not known any earlier than the branch outcome

BTFN Branch Prediction

Merger of the two previous approaches: backward branches are predicted taken, forward branches not taken
Pro: Yields better results than either of the two
Con: Compiler's job becomes tougher
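
The BTFN rule itself is a one-line comparison of addresses. The sketch below assumes that "backward" means the target address is not higher than the branch address; the trace of branch addresses and outcomes is made up purely to exercise the rule.

```python
def btfn_predict_taken(branch_pc, target_pc):
    """Backward Taken, Forward Not taken: predict from the branch direction."""
    # Assumption: a target at or below the branch address counts as backward.
    return target_pc <= branch_pc


def accuracy(trace):
    """Fraction of (branch_pc, target_pc, actually_taken) entries predicted right."""
    correct = sum(btfn_predict_taken(pc, target) == taken
                  for pc, target, taken in trace)
    return correct / len(trace)


if __name__ == "__main__":
    # Hypothetical trace: a loop-closing branch taken 9 times out of 10,
    # and a forward branch taken 2 times out of 10.
    trace = ([(0x40, 0x20, True)] * 9 + [(0x40, 0x20, False)]
             + [(0x30, 0x50, True)] * 2 + [(0x30, 0x50, False)] * 8)
    print(f"BTFN accuracy on the sample trace: {accuracy(trace):.0%}")
```

Loop-closing branches are the reason the rule works: they jump backward and are taken on every iteration but the last.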

Delayed Branch
Add a slot after the branch instruction called the Sequential Successor
  Carries an instruction that will always be executed whether the branch is taken or not

Branch instruction
Sequential successor
Branch target if taken

Delayed Branch

Untaken Br (i)        IF   ID   EX   MEM  WB
Br. Delay Inst (i+1)       IF   ID   EX   MEM  WB
Inst i+2                        IF   ID   EX   MEM  WB
Inst i+3                             IF   ID   EX   MEM  WB
Inst i+4                                  IF   ID   EX   MEM  WB

Taken Br (i)          IF   ID   EX   MEM  WB
Br. Delay Inst (i+1)       IF   ID   EX   MEM  WB
Br. Target                      IF   ID   EX   MEM  WB
Br. Target+1                         IF   ID   EX   MEM  WB
Br. Target+2                              IF   ID   EX   MEM  WB

Delayed Branch
Compiler is responsible for making the sequential successor valid and useful
The delay slot is filled with an instruction from above the branch in program flow
  Best option as there are no side effects

DADD R1, R2, R3
If R2 = 0 then
    Delay Slot

becomes

If R2 = 0 then
    DADD R1, R2, R3

Delayed Branch
The delay slot is filled with an instruction from the branch target address
  Mostly preferred in cases when the taken-branch probability is quite high
  E.g. loops
The moved target instruction has to be copied, as it can be reached from another path

DSUB R4, R5, R6
DADD R1, R2, R3
If R1 = 0 then
    Delay Slot

becomes

DSUB R4, R5, R6
DADD R1, R2, R3
If R1 = 0 then
    DSUB R4, R5, R6

Delayed Branch
The delay slot is filled with an untaken fall-through instruction
  Mostly preferred in cases when the untaken-branch probability is quite high
  The work done by the delay slot is wasted if the branch turns out to be taken
In the last two cases, the compiler must make sure that the program executes correctly if the branch goes in the unpredicted direction

DADD R1, R2, R3
If R1 = 0 then
    Delay Slot
OR R7, R5, R3
DSUB R4, R5, R6

becomes

DADD R1, R2, R3
If R1 = 0 then
    OR R7, R5, R3
DSUB R4, R5, R6

Delayed Branch
Compiler effectiveness for a single branch delay slot:
  Fills about 60% of branch delay slots
  About 80% of instructions executed in branch delay slots are useful in computation
  ~50% (60% x 80%) of slots usefully filled
Delayed branch downside: as processors go to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed
Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
  Growth in available transistors has made dynamic approaches relatively cheaper
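
The "~50%" figure is simply the product of the two rates quoted above. A small sketch of that arithmetic follows, assuming a slot that is not usefully filled costs one wasted cycle; the function names are illustrative.

```python
def useful_slot_fraction(fill_rate=0.60, useful_rate=0.80):
    """Fraction of delay slots that hold an instruction doing useful work."""
    return fill_rate * useful_rate


def delayed_branch_penalty(fill_rate=0.60, useful_rate=0.80, slots=1):
    """Average wasted cycles per branch: the slots that are not usefully filled."""
    return slots * (1.0 - useful_slot_fraction(fill_rate, useful_rate))


if __name__ == "__main__":
    print(f"usefully filled slots: {useful_slot_fraction():.0%}")
    print(f"average wasted cycles per branch: {delayed_branch_penalty():.2f}")
```

With the slide's rates, roughly half of the single delay slot is recovered, so the effective branch penalty is about half a cycle.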

Performance of Branch Schemes

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles from branches}}$$

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$
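
The speedup formula can be used to compare the schemes side by side. This is a sketch under stated assumptions: a 5-stage pipeline, a branch frequency of 20%, and per-scheme average penalties that are hypothetical inputs rather than measurements from the lecture.

```python
def pipeline_speedup(depth, branch_freq, branch_penalty):
    """Speedup = pipeline depth / (1 + branch frequency x branch penalty)."""
    return depth / (1.0 + branch_freq * branch_penalty)


if __name__ == "__main__":
    # Assumed average penalties in cycles per branch, for illustration only.
    schemes = {
        "Freeze or flush": 1.0,
        "Predicted untaken (60% of branches taken)": 0.6,
        "Delayed branch (48% of slots useful)": 0.52,
    }
    for name, penalty in schemes.items():
        print(f"{name}: speedup = {pipeline_speedup(5, 0.20, penalty):.2f}")
```

With no branch stalls the formula collapses to the pipeline depth, which is the ideal speedup over an unpipelined machine.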

Multi-Cycle FP MIPS Pipeline

Multicycle FP MIPS Pipeline

IF -> ID -> EX -> MEM -> WB
1 Clock Cycle

Impractical solution:
  Either the clock frequency is too low
  Or a huge amount of parallel logic is needed

Multicycle FP MIPS Pipeline

IF -> ID -> one of:
  EX (int add/sub)
  EX (FP mul)
  EX (FP add/sub)
  EX (FP div)
-> MEM -> WB

Multicycle FP MIPS Pipeline
Latency (with forwarding):
  No. of cycles an instruction takes to produce its result after entering the EX stage
  Simply (EX pipeline length - 1)

IF -> ID -> one of:
  EX (int)
  M1 M2 ... M7
  A1 A2 A3 A4
  DIV (unpipelined)
-> MEM -> WB
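
The latency rule can be turned into a quick stall estimate. The sketch below takes the pipeline lengths from the diagram (1 integer EX stage, 4 FP add stages, 7 FP multiply stages) and leaves out the divider because its cycle count is not given here. It assumes full forwarding and a consumer that needs the value at the start of its own EX stage; the names are illustrative.

```python
# EX pipeline lengths read off the diagram: EX, A1..A4, M1..M7.
EX_PIPELINE_LENGTH = {"int": 1, "fp_add": 4, "fp_mul": 7}


def latency(unit):
    """Latency with forwarding: EX pipeline length minus one."""
    return EX_PIPELINE_LENGTH[unit] - 1


def raw_stalls(producer_unit, distance=1):
    """Stall cycles for a dependent instruction issued `distance` slots later.

    distance=1 means the consumer immediately follows the producer.
    Assumes no other hazards delay either instruction.
    """
    return max(0, latency(producer_unit) - (distance - 1))


if __name__ == "__main__":
    for unit in EX_PIPELINE_LENGTH:
        print(f"{unit}: latency {latency(unit)}, "
              f"stalls for a back-to-back consumer {raw_stalls(unit)}")
```

Each independent instruction the compiler places between producer and consumer removes one of these stall cycles.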

Hazards in FP MIPS Pipeline
Structural Hazard
  DIV unit is not fully pipelined
    A second instruction needing the DIV unit has to be stalled
  Instructions have variable execution time
    Need for more than one write to the register file in a clock cycle
WAW Hazards
  Since instructions can complete out of order
No WAR Hazard
  Since register reads always occur in the ID stage
  We don't issue instructions out of order (yet!)
Longer latency -> Higher stall frequency on RAW hazards

Example: RAW Hazard

Example: Structural Hazard

Example: Structural Hazard (Solution)
(Figure: pipeline timing with Stall cycles inserted ahead of WB; clock cycles 12 and 13 shown)

Example: WAW Hazard

Solution:
1) Stall LD until ADD.D is in MEM (more common)
2) Issue LD normally, but ignore the result of ADD.D
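
A WAW hazard of this kind can be detected by comparing write-back cycles. This is a minimal sketch, assuming ADD.D passes through A1 to A4 and the load through a single EX stage before MEM and WB; the destination register F2 and the fetch cycles are hypothetical, since the slide's figure is not reproduced here.

```python
# Assumed stage sequences, following the multicycle pipeline diagram.
STAGE_SEQUENCE = {
    "ADD.D": ["IF", "ID", "A1", "A2", "A3", "A4", "MEM", "WB"],
    "LD": ["IF", "ID", "EX", "MEM", "WB"],
}


def wb_cycle(op, fetch_cycle):
    """Clock cycle in which the instruction writes back, absent stalls."""
    return fetch_cycle + len(STAGE_SEQUENCE[op]) - 1


def waw_hazard(first, second):
    """True if the later instruction would write the same register no later.

    Each instruction is (op, dest_register, fetch_cycle). Writing in the same
    cycle is flagged too, since it would also be a write-port conflict.
    """
    op1, dest1, fetch1 = first
    op2, dest2, fetch2 = second
    return dest1 == dest2 and wb_cycle(op2, fetch2) <= wb_cycle(op1, fetch1)


def stalls_needed(first, second):
    """Stall cycles for the later instruction so that it writes back last."""
    op1, dest1, fetch1 = first
    op2, dest2, fetch2 = second
    if dest1 != dest2:
        return 0
    return max(0, wb_cycle(op1, fetch1) - wb_cycle(op2, fetch2) + 1)


if __name__ == "__main__":
    add_d = ("ADD.D", "F2", 1)  # hypothetical: fetched in cycle 1
    load = ("LD", "F2", 2)      # hypothetical: fetched in cycle 2
    print("WAW hazard:", waw_hazard(add_d, load))
    print("stalls needed:", stalls_needed(add_d, load))
```

This models only the first solution (stalling the later instruction); the second solution would instead suppress the write-back of ADD.D.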

Data Forwarding in MC FP MIPS

Data Forwarding in MC FP MIPS

IF -> ID -> one of:
  EX
  M1 M2 ... M7
  A1 A2 A3 A4
  DIV (unpipelined)
-> MEM -> WB

Summary of Hazard Checks
Structural Hazard
  Wait until the resource (FU) is no longer busy
WAW Hazard (in ID stage)
  Check if any instruction in A1, ..., A4, M1, ..., M7 has the same destination register as this one; if so, stall
RAW Hazard
  Wait until the source registers are not listed as pending destinations in a pipeline register (one whose result will not be available when this instruction needs it)
  E.g. if an instruction (i) in ID needs F2 as a source register, F2 can't be the destination in ID/A1, A1/A2 or A2/A3
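
The three checks can be sketched as a single decision made in ID. The code below is an illustration under assumptions, not the hardware's actual logic: in-flight instructions are modelled as a mapping from stage name to destination register, pipeline register X/Y is taken to hold the instruction now in stage Y, and the RAW check covers only the FP adder case given in the example. All names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Stages checked for WAW: A1..A4 and M1..M7.
FP_EX_STAGES = [f"A{i}" for i in range(1, 5)] + [f"M{i}" for i in range(1, 8)]
# ID/A1, A1/A2 and A2/A3 correspond to instructions now in A1, A2 and A3.
RAW_BLOCKING_STAGES = ("A1", "A2", "A3")


def id_stage_check(dest: Optional[str],
                   srcs: Tuple[str, ...],
                   unit: str,
                   in_flight: Dict[str, str],
                   busy_units: Set[str]) -> Optional[str]:
    """Return the hazard that forces a stall in ID, or None if issue is safe."""
    # Structural: the required functional unit must not be busy.
    if unit in busy_units:
        return "structural"
    # WAW: an in-flight FP instruction has the same destination register.
    if dest is not None and any(in_flight.get(s) == dest for s in FP_EX_STAGES):
        return "WAW"
    # RAW: a source register is still a pending destination.
    if any(in_flight.get(s) in srcs for s in RAW_BLOCKING_STAGES):
        return "RAW"
    return None


if __name__ == "__main__":
    # An FP add reading F2 while an earlier FP add writing F2 is in A2.
    print(id_stage_check("F4", ("F2", "F6"), "fp_add", {"A2": "F2"}, set()))
```

An instruction in A4 does not block a reader of its result, because with forwarding the value is ready by the time the consumer leaves ID.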
