pipe3
cs 152 L13.1 DAP Fa97, U.CB
a) Assume Branch not Taken
b) Reducing the Delay of Branches
[Figure: pipelined datapath with hazard detection unit, forwarding unit, control, register file with equality comparator (=), sign extend, shift left 2, ALU, instruction and data memories, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]
[Figure: the same datapath with an IF.Flush control line, shown at Clock 3 and Clock 4. Instruction labels: lw $4, 50($7); bubble (nop); beq $1, $3, 7; sub $10, . . .; before<1>.]
Implementation of Dynamic Branch Prediction
Finite State Machine for 2-bit Prediction Scheme
[Figure 06.53 (Patterson): state diagram of the 2-bit prediction scheme. Four states, two labelled "Predict taken" and two labelled "Predict not taken"; each state has one outgoing "Taken" arc and one outgoing "Not taken" arc.]
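The exact arcs of the state diagram are not recoverable from this text, so the following is only a minimal sketch of one common 2-bit scheme, the saturating counter. The class name, the state encoding (0 to 3), and the example branch pattern are all assumptions made for illustration and do not come from the slide.

```python
class TwoBitPredictor:
    """Saturating 2-bit counter (assumed variant, not necessarily the slide's FSM).

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        # True means "predict taken".
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a list of branch outcomes and count the misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical loop branch: taken 9 times, then not taken once, repeated 3 times.
loop_branch = ([True] * 9 + [False]) * 3
misses = count_mispredictions(loop_branch, TwoBitPredictor(state=3))
print(misses)
```

With this pattern, a single not-taken outcome at loop exit does not flip the prediction, so only the exit itself is mispredicted on each pass through the loop.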
Compiler Scheduling for Branch Delays
Multicycle:
Number of cycles per instruction class in the multicycle implementation:
loads = 5, stores = 4, R-format = 4, branches = 3, jumps = 3 cycles.
CPI = 0.22*5 + 0.11*4 + 0.49*4 + 0.16*3 + 0.02*3 = 4.04 cycles / instr
Average time per instr = 4.04 * 2 ns = 8.08 ns.
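The multicycle arithmetic can be checked with a short script. This is a sketch using the instruction mix, cycle counts, and 2 ns clock from the numbers above; the dictionary keys and function name are illustrative choices.

```python
# Instruction mix and per-class cycle counts as given for the multicycle design.
MIX = {"load": 0.22, "store": 0.11, "r_format": 0.49, "branch": 0.16, "jump": 0.02}
MULTICYCLE_CYCLES = {"load": 5, "store": 4, "r_format": 4, "branch": 3, "jump": 3}
CLOCK_NS = 2.0  # clock period used in the multicycle calculation


def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


multicycle_cpi = average_cpi(MIX, MULTICYCLE_CYCLES)
multicycle_time_ns = multicycle_cpi * CLOCK_NS
print(f"CPI = {multicycle_cpi:.2f}")
print(f"average time = {multicycle_time_ns:.2f} ns")
```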
Performance Comparison Continuation
Pipelined: assume
1/2 of loads are followed by an instr that uses the load result; the branch
delay on misprediction is 1 cycle and 1/4 of branches are mispredicted; jumps
always delay 1 cycle.
loads: 1 cycle when there is no dependency and 2 when there is one,
average cycles for loads = 1.5 cycles
stores and R-format: 1 cycle
branches: 1 cycle when the prediction is correct and 2 cycles when
wrong, average cycles for branches = 1 + 1*0.25 = 1.25 cycles
jumps: 1 cycle + 1 cycle delay = 2 cycles
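The per-class averages above can be combined into an overall pipelined CPI in the same way as the multicycle case. The sketch below assumes the same instruction mix as the multicycle calculation (22% loads, 11% stores, 49% R-format, 16% branches, 2% jumps); the variable names are illustrative.

```python
# Instruction mix, assumed to be the same one used in the multicycle calculation.
MIX = {"load": 0.22, "store": 0.11, "r_format": 0.49, "branch": 0.16, "jump": 0.02}

LOAD_USE_FRACTION = 0.5     # half of the loads are followed by a dependent instr
MISPREDICT_FRACTION = 0.25  # a quarter of the branches are mispredicted

# Average cycles per class: 1 base cycle plus the expected stall cycles.
PIPELINED_CYCLES = {
    "load": 1 + LOAD_USE_FRACTION * 1,      # 1.5
    "store": 1,
    "r_format": 1,
    "branch": 1 + MISPREDICT_FRACTION * 1,  # 1.25
    "jump": 1 + 1,                          # always a 1-cycle delay
}

pipelined_cpi = sum(MIX[kind] * PIPELINED_CYCLES[kind] for kind in MIX)
print(f"pipelined CPI = {pipelined_cpi:.2f}")
```

Under these assumptions the script prints a pipelined CPI of 1.17.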
[Figure: pipelined datapath with exception support (hazard detection unit, forwarding unit, Cause and Except PC registers, 40000040 as a PC input), shown at Clock 5 and Clock 6.]
[Figure: pipelined datapath with exception support (hazard detection unit, forwarding unit, Cause and Except PC registers, 40000040 as a PC input), shown without a clock label.]
[Figure: datapath fragment with labels IAU, npc, Regs, PC, A, B, im, n, op, rw; annotated "lw $2,20($5)" and "detect bad instruction".]
instr      1    2    3    4    5    6    7    8    9    10
i-3        IF   ID   EX   MEM  WB
i-2             IF   ID   EX   MEM  WB
i-1                  IF   ID   EX   MEM  WB
i    LW                   IF   ID   EX   MEM  WB
i+1  ADD                       IF   ID   EX   MEM  WB
i+2                                 IF   ID   EX   MEM  WB
Treats ADD exception at the end of the 5th cycle. Stop instrs i-2, i-1, i,
and i+1. Reinitiate pipe at instr i-2 after exception handling. It
will then detect the page fault of instr i, stopping instrs i...i+3.
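The two sets of stopped instructions can be reproduced with a small model. This sketch rests on assumptions that are not stated in the text: instruction i-3 enters IF in cycle 1 and each later instruction enters one cycle after its predecessor; an instruction that is in WB during the exception cycle completes, while every other instruction in flight is stopped; and the page fault of the LW is detected in its MEM stage. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(issue_cycle, cycle):
    """Stage occupied in `cycle` by an instruction that entered IF in `issue_cycle`."""
    index = cycle - issue_cycle
    return STAGES[index] if 0 <= index < len(STAGES) else None


def stopped_at(issue_cycles, cycle):
    """Instructions in flight during `cycle` that have not yet reached WB."""
    stopped = []
    for name, issue_cycle in issue_cycles.items():
        stage = stage_in_cycle(issue_cycle, cycle)
        if stage is not None and stage != "WB":
            stopped.append(name)
    return stopped


# First run: exception treated at the end of cycle 5.
first_run = {"i-3": 1, "i-2": 2, "i-1": 3, "i": 4, "i+1": 5, "i+2": 6}
first_stop = stopped_at(first_run, 5)
print(first_stop)

# Restart at i-2 (cycles renumbered from 1); LW faults in its MEM stage.
restart = {"i-2": 1, "i-1": 2, "i": 3, "i+1": 4, "i+2": 5, "i+3": 6}
fault_cycle = restart["i"] + STAGES.index("MEM")
second_stop = stopped_at(restart, fault_cycle)
print(second_stop)
```

Under these assumptions the first call lists i-2 through i+1 and the second lists i through i+3, which agrees with the description above.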
Why use Imprecise Exceptions?