L02 Branch Prediction V2021
beq $x, $y, offset
beq $x, $y, L1

Stage  Name                 Operation for beq $x, $y, offset
IF     Instruction Fetch    Instruction fetch and PC increment
ID     Instruction Decode   Register read of $x and $y
EX     Execution            ALU operation ($x - $y) and target computation (PC + 4 + offset)
ME     Memory Access        Write of PC
WB     Write Back           (no operation)
[Figure: datapath for the conditional branch. Instruction fields [25-21] and [20-16] select the two registers read from the Register File; the ALU compares their contents and its Zero output goes to the control logic of the conditional branch. Field [15-0] is sign-extended from 16 to 32 bits, shifted left by 2 bits, and added to PC+4 (from the fetch unit) to give the Branch Target Address.]
[Figure: pipelined datapath with PC, Instruction Memory, Register File (RF), ALU, Data Memory and multiplexers (MUX). The sign-extended field [15-0] (16 to 32 bits) is shifted left by 2 bits to compute the branch target, and the ALU Zero output gives the Branch Outcome.]
IF — Instruction Fetch
The branch instruction may or may not change the PC in the MEM stage,
but by then the next 3 instructions have already been fetched and
their execution has started.
If the branch is not taken, the pipeline execution is correct.
If the branch is taken, the next 3 instructions in the pipeline must be
flushed before they write their results, and then the lw instruction
at the branch target address (L1) must be fetched.
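The cost of the 3 flushed instructions can be estimated with a simple average-CPI formula. The sketch below is only an illustration: it assumes an ideal CPI of 1 and a fixed penalty per taken branch, and the function name and the branch frequencies in the usage lines are hypothetical values, not figures from these slides.

```python
def cpi_with_branch_penalty(branch_freq, taken_frac, penalty_cycles=3, base_cpi=1.0):
    """Average CPI when every taken branch flushes `penalty_cycles` instructions.

    branch_freq:    fraction of executed instructions that are branches (assumed input)
    taken_frac:     fraction of those branches that are taken (assumed input)
    penalty_cycles: instructions flushed per taken branch (3 if resolved in MEM)
    """
    if not (0.0 <= branch_freq <= 1.0 and 0.0 <= taken_frac <= 1.0):
        raise ValueError("frequencies must be in [0, 1]")
    return base_cpi + branch_freq * taken_frac * penalty_cycles


# Illustrative numbers only: 20% branches, half of them taken.
cpi_resolved_in_mem = cpi_with_branch_penalty(0.2, 0.5, penalty_cycles=3)
cpi_resolved_in_id = cpi_with_branch_penalty(0.2, 0.5, penalty_cycles=1)
print(round(cpi_resolved_in_mem, 2), round(cpi_resolved_in_id, 2))
```

With these assumed inputs, resolving the branch earlier in the pipeline lowers the penalty term and therefore the average CPI.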
or $13, $6, $2 IF ID EX ME WB
[Figure: pipelined datapath in which a comparator (=) on the two Register File outputs produces the Branch Outcome, while an adder combines the PC with the sign-extended (16 to 32 bits), 2-bit left-shifted offset to produce the branch target address.]
resolution.
addi $1, $1, 4 IF ID EX ME WB
beq $1, $6, L1 IF stall ID EX ME WB
and $12, $2, $5 stall IF ID EX ME WB
resolution.
lw $1, 32($2) IF ID EX ME WB
beq $1, $6, L1 IF stall stall ID EX ME WB
and $12, $2, $5 stall IF ID EX ME WB
There are two types of methods to deal with the performance loss
due to branch hazards:
• Static Branch Prediction Techniques: The prediction
(taken/untaken) for each branch is fixed at compile time and
stays the same during the entire execution.
• Dynamic Branch Prediction Techniques: The prediction
(taken/untaken) for each branch can change at run time during
program execution.
4) Profile-Driven Prediction
5) Delayed Branch
BO: Untaken
Untaken branch        IF  ID  EX  ME  WB
Branch delay instr.       IF  ID  EX  ME  WB      BRANCH DELAY SLOT
Instr. i+1                    IF  ID  EX  ME  WB
Instr. i+2                        IF  ID  EX  ME  WB
Instr. i+3                            IF  ID  EX  ME  WB
BO: Taken
Taken branch          IF  ID  EX  ME  WB
Branch delay instr.       IF  ID  EX  ME  WB      BRANCH DELAY SLOT
2. From target
3. From fall-through
4. From after
if $1 == 0 then if $1 == 0 then
BOP:T/NT BO = BOP?
BTB:-- BTA
Branch                        IF  ID  EX  ME  WB
Predicted Instr.                  IF  ID  EX  ME  WB

Untaken branch                IF  ID  EX  ME  WB
Prediction: Instruction i+1       IF  ID  EX  ME  WB
Instruction i+2                       IF  ID  EX  ME  WB
Instruction i+3                           IF  ID  EX  ME  WB
Instruction i+4                               IF  ID  EX  ME  WB

Taken branch                  IF  ID  EX  ME  WB
Prediction: Instruction i+1       IF  nop nop nop nop
Branch target                         IF  ID  EX  ME  WB
Branch target+1                           IF  ID  EX  ME  WB
Branch target+2                               IF  ID  EX  ME  WB

Taken branch                  IF  ID  EX  ME  WB
Prediction: Branch Target         IF  ID  EX  ME  WB
Branch Target + 1                     IF  ID  EX  ME  WB
Branch Target + 2                         IF  ID  EX  ME  WB
Branch Target + 3                             IF  ID  EX  ME  WB

BOP: taken    BO: Untaken
BTB: PTA      Misprediction
Untaken branch                IF  ID  EX  ME  WB
Prediction: Branch target         IF  nop nop nop nop
Instr. i+1                            IF  ID  EX  ME  WB
Instr. i+2                                IF  ID  EX  ME  WB
Instr. i+3                                    IF  ID  EX  ME  WB
BHT: T/NT (1-bit)
or
• The same index has been referenced by two different branches,
and the previous history refers to the other branch (This can
occur because there is no tag check)
• To reduce this problem it is enough to increase the number
of entries in the BHT (that is, to increase k)
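The aliasing problem described in the bullets above can be reproduced with a small model of a 1-bit BHT without tags. This is a minimal sketch under stated assumptions: the class name, the initial "not taken" state, and the choice of dropping the 2 low-order bits of a word-aligned PC before indexing are illustrative, not taken from the slides.

```python
class OneBitBHT:
    """1-bit Branch History Table indexed by the low k bits of the branch address."""

    def __init__(self, k):
        self.mask = (1 << k) - 1
        # False = predict not taken (assumed initial state).
        self.table = [False] * (1 << k)

    def index(self, pc):
        # Assumption: instructions are word aligned, so drop the 2 byte-offset bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self.index(pc)]

    def update(self, pc, taken):
        # No tag check: whichever branch maps here overwrites the history bit.
        self.table[self.index(pc)] = taken


small = OneBitBHT(k=2)
small.update(0x1000, True)      # branch A is taken
print(small.predict(0x1010))    # branch B shares A's entry and inherits its history

large = OneBitBHT(k=4)
large.update(0x1000, True)
print(large.predict(0x1010))    # with more entries the two branches no longer collide
```

In this toy case the two branch addresses share index 0 when k = 2 but map to different entries when k = 4, which is the effect of enlarging the table.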
Source code:
    if (a == 2) a = 0;      bb1
L1: if (b == 2) b = 0;      bb2
L2: if (a != b) {};         bb3

Assembly code (a in r1, b in r2):
    subi r3,r1,2
    bnez r3,L1      ; bb1
    add  r1,r0,r0
L1: subi r3,r2,2
    bnez r3,L2      ; bb2
    add  r2,r0,r0
L2: sub  r3,r1,r2
    beqz r3,L3      ; bb3
L3:
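The three branches in this fragment are correlated: if bb1 and bb2 are both not taken, then a and b are both set to 0 and bb3 is necessarily taken. The sketch below mirrors the assembly code in Python to make that visible; the function name and the sample input values are illustrative choices.

```python
def run_blocks(a, b):
    """Return (bb1_taken, bb2_taken, bb3_taken) for the code fragment."""
    bb1_taken = (a - 2) != 0      # bnez r3,L1: taken skips "a = 0"
    if not bb1_taken:
        a = 0
    bb2_taken = (b - 2) != 0      # bnez r3,L2: taken skips "b = 0"
    if not bb2_taken:
        b = 0
    bb3_taken = (a - b) == 0      # beqz r3,L3: taken when a == b
    return bb1_taken, bb2_taken, bb3_taken


for a, b in [(2, 2), (2, 3), (1, 2), (1, 1), (1, 3)]:
    print((a, b), run_blocks(a, b))
```

A predictor that looks only at the history of bb3 cannot exploit this relation, which is the motivation for also using the outcomes of other recent branches.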
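[Figure: prediction tables with 2^k entries, indexed by the k low-order bits of the Branch Address and selected by the outcome (0 or 1) of the last branch, giving the Branch Prediction.]
[Figure: tables of 2-bit predictions with 2^k entries, indexed by the k low-order bits of the Branch Address.]
[Figure: tables of 2-bit predictions with 2^4 entries, indexed by the address.]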
Prof. Cristina Silvano – Politecnico di Milano - 74 -
2) Example of (2, 2) Correlating Predictor
The first level history is recorded in one (or more) k-bit shift
register called Branch History Register (BHR), which records the
outcomes of the k most recent branches (e.g. T, NT, NT, T) (used
as a global history)
The second level history is recorded in one (or more) tables called
Pattern History Table (PHT) of two-bit saturating counters (used
as a local history)
The BHR is used to index the PHT to select which 2-bit counter to
use.
Once the two-bit counter is selected, the prediction is made using
the same method as in the two-bit counter scheme.
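The two-level scheme described above can be sketched as follows. This is a simplified model under stated assumptions: a single global BHR, a single PHT, counters initialised to "weakly not taken", and encoding taken as 1; the class and method names are hypothetical.

```python
class TwoLevelPredictor:
    """k-bit Branch History Register indexing a PHT of 2-bit saturating counters."""

    def __init__(self, k):
        self.k = k
        self.bhr = 0                    # last k outcomes, 1 = taken, newest in bit 0
        self.pht = [1] * (1 << k)       # counters 0..3, start at 1 (assumed initial state)

    def predict(self):
        # Counter values 2 and 3 predict taken, 0 and 1 predict not taken.
        return self.pht[self.bhr] >= 2

    def update(self, taken):
        counter = self.pht[self.bhr]
        self.pht[self.bhr] = min(3, counter + 1) if taken else max(0, counter - 1)
        # Shift the real outcome into the history register, keeping k bits.
        self.bhr = ((self.bhr << 1) | int(taken)) & ((1 << self.k) - 1)


predictor = TwoLevelPredictor(k=2)
mispredictions = 0
for i in range(20):
    outcome = (i % 2 == 0)              # alternating T, NT, T, NT, ...
    if predictor.predict() != outcome:
        mispredictions += 1
    predictor.update(outcome)
print(mispredictions)
```

An alternating pattern defeats a single 2-bit counter, but here each history value selects its own counter, so after a short warm-up the pattern is predicted correctly.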
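[Figure: the BHR (e.g. T NT NT T) indexes the PHT, which gives the T/NT prediction.]
[Figure: the PC and the 4-bit BHR (e.g. T NT NT T) are combined by XOR to select the entry that gives the T/NT prediction.]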
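The XOR combination of PC and BHR can be written as a one-line index function. The sketch below is an assumption-laden illustration: which PC bits take part in the XOR, the bit order of the history, and the function name are not specified in the slides.

```python
def xor_index(pc, bhr, k=4):
    """Combine the branch address and a k-bit global history into a k-bit table index."""
    mask = (1 << k) - 1
    # Assumption: word-aligned instructions, so the 2 low-order PC bits are dropped.
    return ((pc >> 2) ^ bhr) & mask


history = 0b1001                      # T NT NT T with taken encoded as 1
print(xor_index(0x1000, history))     # two different branches with the same history ...
print(xor_index(0x1004, history))     # ... select different entries
```

Mixing address bits into the index lets branches with the same recent history use different counters, at no extra table cost.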
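Branch Address: TAG | INDEX
BTB entry: Tag | PTA | BOP
• Tag + Predicted Target Address (PTA, expressed as PC-relative
for conditional branches) + Branch Outcome Predictor (BOP,
prediction state bit(s) as in a BHT)
[Figure: the INDEX field of the Branch Address selects a BTB entry; the stored Tag is compared (=) with the TAG field to produce Hit/Miss, and the entry supplies the PTA and the T/NT prediction.]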
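A BTB lookup following this entry layout could look like the sketch below. It is a simplified, direct-mapped model with several assumptions that are not in the slides: a 1-bit BOP, an absolute target address stored instead of a PC-relative one, word-aligned PCs, and hypothetical class and method names.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each entry holds (tag, predicted target address, BOP)."""

    def __init__(self, index_bits):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)

    def _split(self, pc):
        word = pc >> 2                                # assumption: word-aligned PC
        index = word & ((1 << self.index_bits) - 1)   # INDEX field
        tag = word >> self.index_bits                 # TAG field
        return tag, index

    def lookup(self, pc):
        """Return (hit, predicted_target, predicted_taken)."""
        tag, index = self._split(pc)
        entry = self.entries[index]
        if entry is None or entry[0] != tag:
            return False, None, False                 # miss: no prediction available
        _, pta, bop = entry
        return True, pta, bop

    def update(self, pc, target, taken):
        """Record the resolved branch (1-bit BOP, an assumption of this sketch)."""
        tag, index = self._split(pc)
        self.entries[index] = (tag, target, taken)


btb = BranchTargetBuffer(index_bits=4)
print(btb.lookup(0x1000))          # miss before the branch has been seen
btb.update(0x1000, 0x2000, True)
print(btb.lookup(0x1000))          # hit with target and T prediction
print(btb.lookup(0x1040))          # same index, different tag: miss
```

Unlike the BHT, the tag comparison prevents a branch from using an entry written by a different branch that maps to the same index.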