CA Lecture 4 Module 3
Branch Prediction
Instructor: M. Lancaster
September 2012
Review of a Branching Optimization
[Figure: two pipelined datapaths, each with a pipeline timing diagram (CC 1 to CC 9). Left: branch destination and test known at the end of the third cycle of execution. Right: branch destination and test known at the end of the second cycle of execution. Datapath labels include PCSrc, IF.Flush, hazard detection unit, forwarding unit and an equality (=) comparator; the timing diagrams include the instructions 40 beq $1, $3, 7 / 44 and $12, $2, $5 / 52 add $14, $2, $2.]
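A rough way to quantify what resolving the branch one cycle earlier buys. This is a sketch under assumptions not stated on the slide: the branch is fetched in cycle 1, one instruction is fetched per cycle, fetch continues sequentially until the branch resolves, and the target can be fetched in the next cycle. The branch frequency and taken fraction are hypothetical values chosen for illustration, and `taken_branch_penalty` / `effective_cpi` are hypothetical helper names.

```python
def taken_branch_penalty(resolve_cycle: int) -> int:
    """Fetch cycles wasted by a taken branch whose outcome and target are
    known at the end of `resolve_cycle` (the branch is fetched in cycle 1).

    Assumes one fetch per cycle and sequential fetch until the branch
    resolves, so the instructions fetched in cycles 2..resolve_cycle are
    wrong-path and must be flushed.
    """
    return resolve_cycle - 1


def effective_cpi(base_cpi: float, branch_freq: float,
                  taken_frac: float, resolve_cycle: int) -> float:
    """CPI including taken-branch flushes; not-taken branches add nothing
    because sequential fetch already fetched the right instructions."""
    return base_cpi + branch_freq * taken_frac * taken_branch_penalty(resolve_cycle)


# Hypothetical workload: 20% of instructions are branches, 60% of those are taken.
for cycle in (3, 2):
    print(f"resolved at end of cycle {cycle}: "
          f"penalty = {taken_branch_penalty(cycle)} cycle(s), "
          f"CPI = {effective_cpi(1.0, 0.2, 0.6, cycle):.2f}")
```

With these made-up numbers, moving resolution from the end of cycle 3 to the end of cycle 2 cuts the taken-branch penalty from 2 cycles to 1 and the CPI from 1.24 to 1.12.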
Dynamic Branch Prediction
The States in a 2-Bit Prediction Scheme
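The four states of the 2-bit scheme can be sketched as a 2-bit saturating counter. This is a minimal sketch assuming the saturating-counter form of the state machine (the transitions in the original figure may be drawn slightly differently) and the encoding 0 = strongly not taken, 1 = weakly not taken, 2 = weakly taken, 3 = strongly taken; `predict` and `next_state` are hypothetical names.

```python
# Assumed encoding of the four states:
# 0 = strongly not taken, 1 = weakly not taken, 2 = weakly taken, 3 = strongly taken


def predict(state: int) -> bool:
    """Predict taken in the two upper states."""
    return state >= 2


def next_state(state: int, taken: bool) -> int:
    """Move one step toward the actual outcome, saturating at 0 and 3."""
    return min(state + 1, 3) if taken else max(state - 1, 0)


# A loop branch: taken 9 times, then falls through once; the loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
state, misses = 3, 0
for taken in outcomes:
    if predict(state) != taken:
        misses += 1
    state = next_state(state, taken)
print(misses)
```

The counter mispredicts only the loop exit, once per pass. A single not-taken outcome moves it from state 3 to state 2, which still predicts taken, so the first iteration of the next pass is predicted correctly; a 1-bit predictor would also miss there.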
Branch Prediction Buffer
Prediction accuracy of a 4096-entry 2-bit prediction buffer
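The buffer is a small table indexed by the low-order bits of the branch address. A minimal sketch, assuming an untagged table of 2-bit counters, 4-byte instructions (so the two lowest address bits are dropped) and counters that start weakly not taken; `BranchPredictionBuffer` and the addresses used are hypothetical. Because there are no tags, two branches whose addresses share the index bits use the same counter.

```python
class BranchPredictionBuffer:
    """Untagged table of 2-bit counters indexed by low-order PC bits."""

    def __init__(self, entries: int = 4096):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.counters = [1] * entries  # 1 = weakly not taken

    def index(self, pc: int) -> int:
        # Drop the byte offset of a 4-byte instruction, keep the low-order bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


bpb = BranchPredictionBuffer()
a = 0x0040_0100
b = a + 4096 * 4                  # 4096 instructions further on
print(bpb.index(a) == bpb.index(b))  # True: the two branches alias
bpb.update(a, True)
bpb.update(a, True)
print(bpb.predict(b))             # True: b sees a's history
```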
Increasing the size of the buffer beyond 4096 entries does not improve prediction accuracy much
Correlating Branch Predictors
• MIPS code (aa is held in R1, bb in R2):
    DSUBUI R3,R1,#2    ;R3=aa-2
    BNEZ   R3,L1       ;branch b1 (aa!=2)
    DADD   R1,R0,R0    ;aa=0
L1: DSUBUI R3,R2,#2    ;R3=bb-2
    BNEZ   R3,L2       ;branch b2 (bb!=2)
    DADD   R2,R0,R0    ;bb=0
L2: DSUBU  R3,R1,R2    ;R3=aa-bb
    BEQZ   R3,L3       ;branch b3 (aa==bb)
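The point of this fragment is that b3 is correlated with b1 and b2: if neither b1 nor b2 is taken, both aa and bb have just been set to 0, so b3 (aa==bb) must be taken. A small sketch that replays the MIPS sequence in Python, assuming aa is in R1 and bb in R2 as the comments indicate; `branch_outcomes` is a hypothetical helper.

```python
def branch_outcomes(aa: int, bb: int) -> tuple[bool, bool, bool]:
    """Replay the MIPS fragment; return whether b1, b2 and b3 are taken."""
    b1 = (aa - 2) != 0      # DSUBUI R3,R1,#2 ; BNEZ R3,L1
    if not b1:
        aa = 0              # DADD R1,R0,R0
    b2 = (bb - 2) != 0      # DSUBUI R3,R2,#2 ; BNEZ R3,L2
    if not b2:
        bb = 0              # DADD R2,R0,R0
    b3 = (aa - bb) == 0     # DSUBU R3,R1,R2 ; BEQZ R3,L3
    return b1, b2, b3


print(branch_outcomes(2, 2))  # (False, False, True): b1, b2 not taken forces b3 taken
print(branch_outcomes(2, 5))  # (False, True, False)
print(branch_outcomes(1, 1))  # (True, True, True)
```

A predictor that looks only at b3's own past outcomes cannot see this relationship; one that also records the outcomes of the preceding branches can.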
1-Bit Correlation Prediction
• A correlating predictor can yield higher prediction rates than the
2-bit scheme and requires only a small amount of additional hardware.
We can record the global history of the most recent m branches
in an m-bit shift register, where each bit records whether
the corresponding branch was taken or not taken.
• The branch prediction buffer can be indexed by concatenating
the low-order bits of the branch address with the m-bit global
history. That is, the address bits select a row in the prediction
buffer, and the global history chooses among the predictors in
that row.
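A minimal sketch of the indexing described above, assuming a 2-bit counter in every entry, 4-byte instructions, and a history register whose bit 0 holds the most recent outcome. `CorrelatingPredictor`, `m`, `addr_bits` and `count_mispredictions` are hypothetical names. With `m = 0` the history contributes no index bits, so the same class behaves as the plain 2-bit buffer, which makes the comparison easy.

```python
class CorrelatingPredictor:
    """2-bit counters indexed by low-order PC bits concatenated with an
    m-bit global branch history."""

    def __init__(self, m: int, addr_bits: int):
        self.m = m
        self.addr_bits = addr_bits
        self.history = 0                              # last m outcomes, bit 0 = newest
        self.counters = [1] * (1 << (addr_bits + m))  # start weakly not taken

    def _index(self, pc: int) -> int:
        row = (pc >> 2) & ((1 << self.addr_bits) - 1)  # address bits pick the row
        return (row << self.m) | self.history          # history picks within the row

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the newest outcome into the m-bit global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_mispredictions(pred: CorrelatingPredictor, pc: int,
                         outcomes: list[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if pred.predict(pc) != taken:
            misses += 1
        pred.update(pc, taken)
    return misses


alternating = [i % 2 == 0 for i in range(20)]  # T, N, T, N, ...
print(count_mispredictions(CorrelatingPredictor(m=0, addr_bits=10), 0x40, alternating))
print(count_mispredictions(CorrelatingPredictor(m=1, addr_bits=10), 0x40, alternating))
```

On a branch that alternates taken and not taken, the plain 2-bit buffer (m = 0) mispredicts all 20 outcomes, while one bit of global history is enough to predict every outcome after the first.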
Fig 14
Comparison of Predictors: a non-correlating 2-bit predictor with 4096 entries,
a non-correlating 2-bit predictor with unlimited entries, and a 2-bit
predictor with 2 bits of global history and 1024 entries
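The 4096-entry non-correlating buffer and the 1024-entry predictor with 2 bits of global history use the same amount of storage. A sketch of the bit count, using the usual rule for a predictor with m bits of global history and n-bit counters: total bits = 2^m × n × number of entries selected by the branch address. `predictor_bits` is a hypothetical helper.

```python
def predictor_bits(m: int, n: int, entries: int) -> int:
    """Storage for a predictor with m bits of global history, n-bit counters
    and `entries` rows selected by the branch address."""
    return (2 ** m) * n * entries


print(predictor_bits(m=0, n=2, entries=4096))  # 8192: non-correlating, 4096 entries
print(predictor_bits(m=2, n=2, entries=1024))  # 8192: 2 bits of history, 1024 entries
```

Both come to 8K bits, so any accuracy gap between them comes from using global history, not from extra storage.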
Tournament Predictor for the Alpha 21264
Fraction of Predictions Coming from the Local Predictor for a Tournament Predictor using SPEC89 Benchmarks
Branch Target Buffers
(Advanced Technique for Instruction Delivery)
Fig 3.21 A Branch Target Buffer: The PC of the instruction being fetched is matched
against a set of instruction addresses stored in the first column, which represent the addresses of
known branches. If the PC matches one of these entries, then the instruction being fetched is predicted
to be a taken branch, and the second field, predicted PC, contains the prediction for the next PC after
the branch. Fetching begins immediately at that address.
September 2012 26
Fig 3.22 Steps Involved in Handling an Instruction with a Branch Target Buffer
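A minimal sketch of the lookup and the handling steps. It assumes, as in Fig 3.21, that the buffer holds only branches predicted taken; that a branch found in the buffer but not taken has its entry deleted; and a misprediction penalty of 2 cycles. The penalty value, `BranchTargetBuffer` and its methods are assumptions for illustration. The example uses the beq at address 40 from the pipeline example, whose target under the MIPS rule PC + 4 + 4 × offset is 72.

```python
class BranchTargetBuffer:
    """Maps the PC of each branch predicted taken to its predicted target PC."""

    def __init__(self, mispredict_penalty: int = 2):
        self.entries: dict[int, int] = {}
        self.penalty = mispredict_penalty  # assumed stall cycles per misprediction

    def next_fetch_pc(self, pc: int) -> int:
        """IF: send the PC to the BTB; on a hit fetch from the predicted PC."""
        return self.entries.get(pc, pc + 4)

    def resolve(self, pc: int, taken: bool, target: int) -> int:
        """After the outcome is known: repair the BTB and return stall cycles.
        `taken` is False for non-branches and for not-taken branches."""
        predicted = self.entries.get(pc)  # None means "not in the buffer"
        if predicted is not None:
            if taken and predicted == target:
                return 0                   # correctly predicted: no stall
            if taken:
                self.entries[pc] = target  # taken, but to a different target
            else:
                del self.entries[pc]       # predicted taken, was not: delete entry
            return self.penalty            # kill fetched instruction, refetch
        if taken:
            self.entries[pc] = target      # enter branch address and next PC
            return self.penalty
        return 0                           # normal instruction execution


btb = BranchTargetBuffer()
print(btb.next_fetch_pc(40))       # 44: miss, fetch falls through
print(btb.resolve(40, True, 72))   # 2: taken but not in the buffer; entry added
print(btb.next_fetch_pc(40))       # 72: hit, fetch from the predicted PC
print(btb.resolve(40, True, 72))   # 0: correctly predicted
print(btb.resolve(40, False, 72))  # 2: mispredicted; entry deleted
```

Only two cases are free: a hit on a branch that is actually taken to the predicted target, and a miss on an instruction that is not a taken branch.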