Lecture05 Branches
Branches
Nikos Bellas
The 5-stage microarchitecture executes beq/bne instructions in the EX stage.
BUT: three instructions from the wrong path have already made it into the pipeline if the beq instruction is TAKEN.
[Figure: pipeline diagram (IM, Reg, DM, Reg) with the three wrong-path instructions marked "?"]
• How long does it take to fetch 500 useful instructions (i.e. from the
correct path)?
– 100% accuracy
• 100 cycles (all instructions fetched on the correct path)
• No wasted work; IPC = 500/100
– 99% accuracy
• 100 (correct path) + 20 * 1 (wrong path) = 120 cycles
• 20% extra instructions fetched; IPC = 500/120
– 90% accuracy
• 100 (correct path) + 20 * 10 (wrong path) = 300 cycles
• 200% extra instructions fetched; IPC = 500/300
– 60% accuracy
• 100 (correct path) + 20 * 40 (wrong path) = 900 cycles
• 800% extra instructions fetched; IPC = 500/900
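The cycle counts above can be reproduced with a short calculation. The parameters are not stated in the text; the sketch below assumes values inferred from the numbers: a fetch width of 5 instructions per cycle, one branch per 5-instruction fetch block, and a 20-cycle penalty per mispredicted branch. The function names are hypothetical.

```python
def fetch_cycles(useful_instructions, accuracy_percent, width=5, penalty=20):
    """Cycles needed to fetch `useful_instructions` from the correct path.

    Assumptions (inferred from the slide's numbers, not stated in it):
    `width` instructions are fetched per cycle, every fetch block ends
    with one branch, and each misprediction wastes `penalty` cycles.
    """
    correct_path_cycles = useful_instructions // width
    branches = correct_path_cycles  # one branch per fetch block
    mispredicted = branches * (100 - accuracy_percent) // 100
    return correct_path_cycles + penalty * mispredicted


def extra_fetch_percent(useful_instructions, accuracy_percent, width=5, penalty=20):
    """Wrong-path instructions fetched, as a percentage of the useful ones."""
    cycles = fetch_cycles(useful_instructions, accuracy_percent, width, penalty)
    wasted_cycles = cycles - useful_instructions // width
    return 100 * wasted_cycles * width // useful_instructions


for accuracy in (100, 99, 90, 60):
    cycles = fetch_cycles(500, accuracy)
    extra = extra_fetch_percent(500, accuracy)
    print(f"{accuracy}% accuracy: {cycles} cycles, {extra}% extra, IPC = 500/{cycles}")
```

Under these assumptions the penalty term grows linearly with the number of mispredicted branches, which is why dropping from 99% to 90% accuracy more than doubles the fetch time.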
ECE338 Parallel Computer Architecture 6
Branches in real programs
[Figure: pipeline timing diagram, cycles 1 to 7, showing "Next instruction 1" and "Next instruction 2" moving through the IM, Reg, DM, Reg stages]
Branch prediction
• But if we are wrong (we predicted NOT TAKEN and the branch is actually taken), we have fetched an instruction I from the wrong path.
1. Need to flush that instruction I from the pipeline and place a nop in the IF/ID register
2. Need to change the PC so that we immediately start fetching from the Label.
• We pay an N-cycle penalty in case of a misprediction
[Figure: pipeline timing diagram, cycles 1 to 8; "next instruction 1" is fetched (IM) and then flushed]
outer: …
…
inner: …
…
beq …, …, inner
…
beq …, …, outer
[Figure: predictor state diagram with "predict taken" and "predict not taken" states and transitions labeled "taken" / "not taken"]
If cond1 is False, then the second conditional is False.
If branch Y is True, then branch X is False.
Correlating branch prediction
• Use both Local and Global Branch Predictions to improve the prediction rate
– Correlating branch predictors
• Make a prediction based on the outcome of the branch the last time the same global branch history was encountered
• Use a Global history branch predictor
– Use an m-bit Global History Register (GHR) to capture the history of the most recent m branches
• Uses two levels of history (GHR + history at that GHR for the specific branch)
[Figure: control-flow graph of branches B1 to B8 with T/F edges; example global histories 111 and 101]
(m, n) branch predictor
• m = #bits of the GHR
• n = #bits of the predictor of each branch (n-bit counter)
[Figure: tables of n-bit saturating counters for local prediction, selected by the branch address (k bits) and by the GHR (m-bit recent global branch history)]
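The (m, n) organization can be sketched in a few lines. This is a minimal model under stated assumptions, not the slide's exact hardware: the class name, the helper `run`, the word-aligned PC (`pc >> 2`), and the initial counter value of 0 are all illustrative choices.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) predictor: n-bit counters selected by PC and GHR."""

    def __init__(self, m, n, k):
        self.m, self.n, self.k = m, n, k
        self.ghr = 0                      # m-bit global history register
        self.max_count = (1 << n) - 1     # saturation limit of an n-bit counter
        # One row per k-bit branch address, one counter per GHR value.
        # Counters start at 0 (strongly not taken), an arbitrary choice.
        self.table = [[0] * (1 << m) for _ in range(1 << k)]

    def _row(self, pc):
        # Assumes 4-byte instructions: drop the byte offset, keep k bits.
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict(self, pc):
        # Predict taken when the counter is in its upper half.
        return self.table[self._row(pc)][self.ghr] >= (1 << (self.n - 1))

    def update(self, pc, taken):
        counters = self.table[self._row(pc)]
        if taken:
            counters[self.ghr] = min(counters[self.ghr] + 1, self.max_count)
        else:
            counters[self.ghr] = max(counters[self.ghr] - 1, 0)
        # Shift the actual outcome into the global history.
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.m) - 1)


def run(predictor, pc, outcomes):
    """Return, per branch instance, whether the prediction was correct."""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results


alternating = [True, False] * 10
print(sum(run(CorrelatingPredictor(m=2, n=2, k=4), 0x40, alternating)))
print(sum(run(CorrelatingPredictor(m=0, n=2, k=4), 0x40, alternating)))
```

Setting m = 0 degenerates to a plain per-branch n-bit counter, which makes it easy to compare against the history-based variant on the same outcome sequence.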
Gshare predictors
• A gshare predictor is a type of (m, n) predictor in which the GHR is combined with the branch PC (using a hash function)
• Pattern History Table (PHT)
– Contains n-bit saturating counters for prediction
– Indexed by a hash of the GHR and the branch PC, which can be as simple as an XOR
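The PHT indexing can be illustrated with a small model. This is a sketch under assumptions: the class name `Gshare`, the GHR being as wide as the PHT index, the `pc >> 2` alignment, and counters initialized to 0 are illustrative, not taken from the slide.

```python
class Gshare:
    """Sketch of a gshare predictor: PHT indexed by (PC XOR GHR)."""

    def __init__(self, index_bits, n=2):
        self.mask = (1 << index_bits) - 1
        self.n = n
        self.max_count = (1 << n) - 1
        self.pht = [0] * (1 << index_bits)   # n-bit saturating counters
        self.ghr = 0                         # assumed as wide as the index

    def _index(self, pc):
        # Hash function: XOR of the (word-aligned) PC with the global history.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= (1 << (self.n - 1))

    def update(self, pc, taken):
        i = self._index(pc)                  # index with the pre-update history
        if taken:
            self.pht[i] = min(self.pht[i] + 1, self.max_count)
        else:
            self.pht[i] = max(self.pht[i] - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


# A loop branch that is taken three times and then falls through, repeated.
predictor = Gshare(index_bits=4)
correct = []
for taken in [True, True, True, False] * 10:
    correct.append(predictor.predict(0x400) == taken)
    predictor.update(0x400, taken)
print(sum(correct), "of", len(correct), "predictions correct")
```

Because the history is folded into the index, a single table serves all branches, at the cost of possible aliasing between unrelated (PC, history) pairs that hash to the same entry.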
[Figure: TAGE predictor: a tagless base predictor plus tagged tables whose entries hold U, Tag and Cntr fields; tag comparators (=?) select the final prediction]
TAGE is the basis of state-of-the-art branch predictors (see the branch prediction championship)
Hybrid predictors performance
• Why?
– Could be very useful in deciding how to speculate:
• What predictor/PHT to choose/use
• Whether to keep fetching on this path
• Whether to switch to some other way of handling the
branch, e.g. dual-path execution (eager execution) or
dynamic predication
How to Estimate Confidence
• An example estimator:
– Keep a record of correct/incorrect outcomes for the past
N instances of the “branch”
– Based on the correct/incorrect patterns, guess if the
current prediction will likely be correct/incorrect
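The example estimator can be sketched as follows. The text does not say how the correct/incorrect pattern is turned into a guess; the sketch assumes a simple rule (confident when at least `min_correct` of the last N predictions for that branch were correct). The class and parameter names are hypothetical.

```python
from collections import defaultdict, deque


class ConfidenceEstimator:
    """Keeps the last N correct/incorrect outcomes per branch."""

    def __init__(self, history_length=4, min_correct=4):
        self.min_correct = min_correct
        # One bounded history per branch PC; old outcomes fall off the end.
        self.history = defaultdict(lambda: deque(maxlen=history_length))

    def record(self, pc, prediction_was_correct):
        self.history[pc].append(bool(prediction_was_correct))

    def is_confident(self, pc):
        # Assumed rule: enough recent predictions for this branch were right.
        return sum(self.history[pc]) >= self.min_correct


estimator = ConfidenceEstimator(history_length=4, min_correct=4)
for _ in range(4):
    estimator.record(0x400, True)
print(estimator.is_confident(0x400))   # four correct predictions in a row
estimator.record(0x400, False)
print(estimator.is_confident(0x400))   # one recent misprediction
```

A low-confidence answer is what would trigger the alternatives listed earlier, such as stopping fetch on the predicted path or switching to dual-path execution.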
[Figure: pipelined datapath (instruction memory, registers, ALU, data memory; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; control signals PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemToReg) shown together with a Branch Target Buffer]

Branch Target Buffer
Tag      V  Target Addr.
0x1010A  1  0x1010A000
0x10200  1  0x10110B00
Integration of Branch Prediction and BTB
[Figure: the address of the current instruction is sent to both the branch predictor (taken?) and the BTB (target address)]
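The way the two structures combine at fetch time can be sketched as a next-PC selection. This is a simplified model under assumptions: the BTB is a Python dict keyed by the full branch address (tag and valid bit collapsed into key presence), instructions are 4 bytes, and the function name and example addresses are hypothetical.

```python
def next_fetch_pc(pc, btb, predict_taken):
    """Choose the address to fetch after the instruction at `pc`.

    btb           -- dict mapping branch address -> target address
                     (simplified model: a hit means tag match and V = 1)
    predict_taken -- callable pc -> bool, standing in for the branch predictor
    """
    target = btb.get(pc)                 # BTB lookup with the current address
    if target is not None and predict_taken(pc):
        return target                    # BTB hit and predicted taken
    return pc + 4                        # BTB miss or predicted not taken


btb = {0x1000: 0x2000}                   # hypothetical branch at 0x1000
print(hex(next_fetch_pc(0x1000, btb, lambda pc: True)))
print(hex(next_fetch_pc(0x1000, btb, lambda pc: False)))
print(hex(next_fetch_pc(0x1004, btb, lambda pc: True)))
```

The predictor answers "taken?" and the BTB answers "where to?"; the target is used only when both agree, so a BTB miss falls back to sequential fetch even if the predictor says taken.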
FP code