Computer Architecture: Pipeline Hazards
Pipeline
R. Pacalet
Telecom Paris
2023-01-30
Solutions of exercises
[Figure: pipelined datapath. Stages: Instruction fetch (IF, I); Decode, read registers (D); Execute (EX, E, X); Memory (M). Elements: adder (PC + 4), program memory, instruction register, register bank (ridx1 = ins[19:15], ridx2 = ins[24:20], outputs rdata1 and rdata2), ALU, data memory (address, rdata).]
Data hazards
[Figure: pipelined datapath with a control unit (ctrl) driven by ins[31:25, 14:0]. Stages: Instruction fetch (IF, I); Decode, read registers (D); Execute (EX, E, X); Memory (M). Register bank read ports: ridx1 = ins[19:15], ridx2 = ins[24:20], outputs rdata1 and rdata2.]
[Figure: pipelined datapath with stall signals ("stall ?") on the program memory address and on the instruction register. Stages: Instruction fetch (IF, I); Decode, read registers (D); Execute (EX, E, X); Memory (M).]
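As a rough illustration of the "stall ?" decision, the sketch below checks whether the instruction in Decode reads a register that an older, still in-flight instruction will write. The signal names (ridx1, ren1, ridx2, ren2, widx, wen) follow the labels used on the deck's control block; the `StageWrite` type, the `must_stall` function and the assumption that a producer in any of Execute, Memory or Write back forces a stall (no bypass, no write-through register bank) are illustrative choices, not the author's implementation.

```python
from dataclasses import dataclass


@dataclass
class StageWrite:
    """Destination register of the instruction held in one pipeline stage."""
    widx: int   # destination register index
    wen: bool   # True if the instruction writes the register bank


def must_stall(ridx1, ren1, ridx2, ren2, in_flight):
    """Return True if Decode must stall on a read-after-write hazard.

    in_flight lists the StageWrite of each older stage (assumed here to be
    Execute, Memory and Write back).
    """
    for stage in in_flight:
        # x0 is hardwired to zero in RISC-V, so writes to it create no hazard.
        if not stage.wen or stage.widx == 0:
            continue
        if (ren1 and ridx1 == stage.widx) or (ren2 and ridx2 == stage.widx):
            return True
    return False
```

For example, with an instruction writing register 5 in Execute, a Decode-stage instruction reading register 5 on its first port would stall, while one that does not enable that read port would not.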
Other solutions?
Add paths from outputs of Execute, Memory and Write back to output of Decode
Add multiplexers at output of Decode to select register bank output or bypass
Add control logic to drive multiplexers
[Figure: pipelined datapath with a "stall & bypass" control block. Inputs: ridx1, ren1, ridx2, ren2 (Decode); widxE, wenE (Execute); widxM, wenM (Memory); widxW, wenW (Write back). Outputs: stall and bp1. Stages: Instruction fetch (IF, I); Decode, read registers (D); Execute (EX, E, X); Memory (M).]
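$$
\mathrm{bp1} =
\begin{cases}
\mathrm{ALU}_{out} & \text{if } (Op_E = \mathrm{ALU}) \wedge \mathrm{bump1}_E \\
\mathrm{MEM}_{out} & \text{if } ((Op_M = \mathrm{load}) \vee (Op_M = \mathrm{ALU})) \wedge \mathrm{bump1}_M \\
\mathrm{WB}_{out} & \text{if } ((Op_W = \mathrm{load}) \vee (Op_W = \mathrm{ALU})) \wedge \mathrm{bump1}_W \\
\mathrm{rdata1} & \text{else}
\end{cases}
$$

$$
\mathrm{bp2} =
\begin{cases}
\mathrm{ALU}_{out} & \text{if } (Op_E = \mathrm{ALU}) \wedge \mathrm{bump2}_E \\
\mathrm{MEM}_{out} & \text{if } ((Op_M = \mathrm{load}) \vee (Op_M = \mathrm{ALU})) \wedge \mathrm{bump2}_M \\
\mathrm{WB}_{out} & \text{if } ((Op_W = \mathrm{load}) \vee (Op_W = \mathrm{ALU})) \wedge \mathrm{bump2}_W \\
\mathrm{rdata2} & \text{else}
\end{cases}
$$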
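The selection of bp1 and bp2 can be sketched as a priority multiplexer that prefers the youngest producer (Execute, then Memory, then Write back). The function below mirrors the case order of the definitions; the string encoding of the operation classes, the function names, and the definition of the bump signals as "read enabled, write enabled, same register index" are assumptions made for illustration only.

```python
def match(ridx, ren, widx, wen):
    """Assumed definition of a bump signal: the stage writes the register read."""
    return ren and wen and ridx == widx


def bypass_select(rdata, alu_out, mem_out, wb_out,
                  op_e, op_m, op_w, bump_e, bump_m, bump_w):
    """Return the operand value seen at the output of Decode.

    op_* are the operation classes ("ALU", "load", ...) of the instructions
    in Execute, Memory and Write back; bump_* are the matching signals.
    """
    if op_e == "ALU" and bump_e:
        return alu_out          # youngest producer wins
    if op_m in ("load", "ALU") and bump_m:
        return mem_out
    if op_w in ("load", "ALU") and bump_w:
        return wb_out
    return rdata                # no hazard: use the register bank output
```

The same function serves both operands: call it once with rdata1 and the bump1 signals, once with rdata2 and the bump2 signals. It only models the multiplexer; a load in Execute that matches has no value to forward yet, which is the case where a stall is still needed.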
⌣ With our small pipeline, stalls are almost never needed any more (except for loads)
⌢ Multiplexer logic ⇒ longer critical path
⌢ Clock period ↑, clock frequency ↓
On deep and complex pipelines, full bypass implies:
⌢ Prohibitive cost
⌢ Huge performance impact
Deep pipelines ⇒ bypass only a small set of selected stages
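To get a feel for why full bypass scales badly, the sketch below counts bypass paths and multiplexer inputs under a deliberately simple model: one path per (producing stage, source operand) pair, and one multiplexer per operand that selects among all producing stages plus the register bank output. The model and the `full_bypass_cost` name are assumptions for illustration; real designs also pay in wiring and comparison logic.

```python
def full_bypass_cost(producer_stages, operands=2):
    """Return (number of bypass paths, inputs per operand multiplexer).

    producer_stages: number of stages whose output can be bypassed
    operands: number of source operands read in Decode
    """
    paths = producer_stages * operands
    mux_inputs = producer_stages + 1  # +1 for the register bank output
    return paths, mux_inputs
```

With the three producing stages used here (Execute, Memory, Write back) this gives 6 paths and 4-input multiplexers, which matches the four cases of the bp1 and bp2 definitions.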
sw t0,0(t1) # Mem[t1+0] <- t0
lw t2,0(t3) # t2 <- Mem[t3+0]
What if t1 = t3?
The lw instruction reads the memory location written by the sw instruction
Is this a read-after-write data hazard?
- No: when lw accesses memory, sw has already modified it
- ... unless memory accesses are performed out of order
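The sw/lw case can be checked with a tiny functional model in which memory accesses happen strictly in program order, as they do when every instruction accesses memory in the same pipeline stage. The tuple encoding of instructions and the `run_in_order` helper are hypothetical and only serve the illustration.

```python
def run_in_order(program, regs, mem):
    """Execute sw/lw instructions one after the other, in program order.

    Each instruction is (op, reg, offset, base), e.g. ("sw", "t0", 0, "t1").
    """
    for op, reg, offset, base in program:
        addr = regs[base] + offset
        if op == "sw":
            mem[addr] = regs[reg]          # Mem[base+offset] <- reg
        elif op == "lw":
            regs[reg] = mem.get(addr, 0)   # reg <- Mem[base+offset]
    return regs, mem


regs = {"t0": 42, "t1": 100, "t2": 0, "t3": 100}  # t1 == t3
program = [("sw", "t0", 0, "t1"), ("lw", "t2", 0, "t3")]
regs, mem = run_in_order(program, regs, {})
```

Since the store completes its memory access before the load starts its own, t2 receives the value just stored, with no stall or bypass involved.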
Control hazards
⌢ CPUI ≥ 2!
[Figure: pipelined datapath extended for control hazards. Next-PC candidates: PC + 4, PC + 2 × imm, (r + imm) AND 0xfffffffe. Additional signals: br?, kill, nop, imm, r; control unit (ctrl) driven by ins[31:25, 14:0]; "stall & bypass" control block with inputs ridx1, ren1, ridx2, ren2, widxE, wenE, widxM, wenM, widxW, wenW and outputs stall and bp1. Stages: Instruction fetch (IF, I); Decode, read registers (D); Execute (EX, E, X); Memory (M).]
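The three next-PC candidates of the datapath (PC + 4, PC + 2 × imm, (r + imm) AND 0xfffffffe) can be sketched as a single selection function. The mapping of candidates to instruction kinds is an assumption based on RISC-V conventions (taken branches and jal use PC + 2 × imm, jalr uses the register-based target); the `kind` strings and the `next_pc` name are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep addresses on 32 bits


def next_pc(pc, kind, imm=0, r=0, taken=False):
    """Select the next program counter.

    kind: "jalr", "jal", "branch" or anything else for sequential flow
    imm: immediate as used by the corresponding datapath expression
    r: source register value (jalr only)
    taken: branch outcome (branch only)
    """
    if kind == "jalr":
        return (r + imm) & 0xFFFFFFFE      # clear the least significant bit
    if kind == "jal" or (kind == "branch" and taken):
        return (pc + 2 * imm) & MASK32     # PC-relative target
    return (pc + 4) & MASK32               # next sequential instruction
```

When the selected value differs from PC + 4, the instructions already fetched behind the branch are on the wrong path, which is what the kill and nop signals of the figure are for.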