Computer Architecture: Pipeline Hazards

The document discusses computer architecture with a focus on pipeline execution and the handling of structural and data hazards. It outlines various techniques to manage these hazards, including instruction re-ordering, inserting nop instructions, and implementing pipeline stalls. Additionally, it covers the design of stall logic and bypass paths to optimize performance in pipelined processors.

Computer Architecture

Pipeline

R. Pacalet

Telecom Paris

2023-01-30

1 / 46 2023-01-30 Telecom Paris Computer Architecture — Pipeline


Section 1

Solutions of exercises



Subsection 1

Shared memory structural hazards



Imagine execution flow, underline memory accesses
[Figure: five-stage pipeline datapath. Stages: Instruction fetch (IF, I); Decode, read registers (D); Execute (EX, E, X); Memory (M); Write Back (WB, W). Blocks: PC, program memory, instruction register (IR), control (ctrl), register bank (ridx1, ridx2, widx, rdata1, rdata2, wdata), sign extension, ALU, data memory.]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
lw t1,0(t0)         F   D   E   M   W
addi t2,t1,10           F   D   E   M   W
sw t2,0(t0)                 F   D   E   M   W
andi t2,t2,0xff                 F   D   E   M   W
ori t2,t2,1                         F   D   E   M   W
jalr zero,0(ra)                         F   D   E   M   W



Insert bubbles to fix execution flow
[Figure: five-stage pipeline datapath (IF, D, EX, M, WB)]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
lw t1,0(t0)         F   D   E   M   W
addi t2,t1,10           F   D   E   M   W
sw t2,0(t0)                 F   D   E   M   W
andi t2,t2,0xff                 O   F   D   E   M   W
ori t2,t2,1                         O   O   F   D   E   M   W
jalr zero,0(ra)                         O   O   F   D   E   M   W
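
The bubble placement above can be reproduced with a short sketch. It assumes, as in this exercise, one memory shared by the Fetch and Memory stages, no other hazard, and a Memory stage three cycles after Fetch. The name `fetch_cycles` and the boolean encoding of the program are illustrative choices, not part of the course material.

```python
def fetch_cycles(program):
    """program: list of booleans, True if the instruction is a load or store.
    Returns the 1-based clock cycle in which each instruction is fetched."""
    busy = set()           # cycles in which the shared memory serves a load/store
    cycles = []
    t = 1
    for uses_memory in program:
        while t in busy:   # memory taken by an older instruction in its M stage
            t += 1         # bubble: the fetch is postponed by one cycle
        cycles.append(t)
        if uses_memory:
            busy.add(t + 3)  # F at t, D at t+1, E at t+2, M at t+3
        t += 1
    return cycles


# lw, addi, sw, andi, ori, jalr
print(fetch_cycles([True, False, True, False, False, False]))  # [1, 2, 3, 5, 7, 8]
```

The result matches the table: andi is fetched at t5, ori at t7 and jalr at t8.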



Imagine solutions
[Figure: five-stage pipeline datapath (IF, D, EX, M, WB)]

• Programmer or compiler avoids structural hazards:
  • Re-order instructions or insert nop instructions
  • Does not work on this example, because a nop must also be fetched
  • But could work for other kinds of structural hazards
• Duplicate the hardware resource so that there are always enough copies
• Pipeline stall:
  • Hardware freezes the PC and inserts nop instructions (not fetched)
  • More on this later



Subsection 2

Data hazards



Imagine execution flow, underline issues
[Figure: five-stage pipeline datapath (IF, D, EX, M, WB)]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
add t2,t1,t0        F   D   E   M   W
ori t3,t2,1             F   D   E   M   W



Insert bubbles to fix execution flow
[Figure: five-stage pipeline datapath (IF, D, EX, M, WB)]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
add t2,t1,t0        F   D   E   M   W
ori t3,t2,1             F   O   O   O   D   E   M   W
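
The three bubbles can be derived mechanically. The sketch below assumes an in-order pipeline without bypass in which a register written in Write Back can only be read by Decode in the following cycle, which is what the table shows (add in W at t5, ori in D at t6). The function name `decode_cycles` and the tuple encoding are hypothetical.

```python
def decode_cycles(program):
    """program: list of (dest, sources) register names; dest may be None.
    Returns the 1-based cycle in which each instruction is in Decode."""
    ready = {}   # register -> first cycle in which Decode can read its new value
    cycles = []
    d = 1        # Decode cycle of the previous instruction (first fetch is at t1)
    for dest, sources in program:
        # Decode at least one cycle after the previous instruction,
        # and not before every source register has been written back.
        d = max([d + 1] + [ready.get(r, 0) for r in sources if r != "zero"])
        cycles.append(d)
        if dest not in (None, "zero"):
            ready[dest] = d + 4   # D at d, E, M, W at d+3, readable at d+4
    return cycles


# add t2,t1,t0 ; ori t3,t2,1
print(decode_cycles([("t2", ("t1", "t0")), ("t3", ("t2",))]))  # [2, 6]
```

Without the dependency ori would be decoded at t3, so decoding at t6 corresponds to the three bubbles of the table.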



Imagine solutions (1/2)
[Figure: five-stage pipeline datapath (IF, D, EX, M, WB)]

• Programmer or compiler avoids data hazards:
  • Re-order instructions or insert nop instructions
• Run and kill: assume correct behaviour and kill instructions if needed
  • More on this later
• Hardware stalls (freezes) the pipeline and inserts nop instructions
• Important: always stall the newest of two dependent instructions
  • Stalling the oldest instruction while the newest waits for its result ⇒ deadlock



Pipeline stall

[Figure: five-stage pipeline datapath (IF, D, EX, M, WB)]

How to stall pipeline?


What information do we need to stall pipeline?
Design equations of stall logic



How to stall pipeline?

Add multiplexer at control output of decode stage to insert nop


Add multiplexers to freeze PC and IR
Add control logic to drive multiplexers
[Figure: pipeline datapath with stall logic: the stall signal drives multiplexers that freeze PC and IR and select nop at the control output of Decode]



What information do we need to stall pipeline? (1/2)

In execute, memory and write-back stages, extract:


• Register write-enable + index of written register
In decode stage, extract:
• Register read-enables (2) + indexes of read registers (2)
Compare and if match:
• Insert nop (at control output of decode stage)
• Freeze PC and IR



What information do we need to stall pipeline? (2/2)
[Figure: pipeline datapath with stall logic. Inputs from Decode: ridx1, ren1, ridx2, ren2. Inputs from Execute, Memory and Write Back: widxE, wenE, widxM, wenM, widxW, wenW. Output: stall, which freezes PC and IR and selects nop at the control output of Decode.]



Design equations of stall logic

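\begin{align*}
\mathit{bump1E} &= (\mathit{ridx1} \neq 0) \land \mathit{ren1} \land (\mathit{ridx1} = \mathit{widxE}) \land \mathit{wenE} \\
\mathit{bump1M} &= (\mathit{ridx1} \neq 0) \land \mathit{ren1} \land (\mathit{ridx1} = \mathit{widxM}) \land \mathit{wenM} \\
\mathit{bump1W} &= (\mathit{ridx1} \neq 0) \land \mathit{ren1} \land (\mathit{ridx1} = \mathit{widxW}) \land \mathit{wenW} \\
\mathit{bump2E} &= (\mathit{ridx2} \neq 0) \land \mathit{ren2} \land (\mathit{ridx2} = \mathit{widxE}) \land \mathit{wenE} \\
\mathit{bump2M} &= (\mathit{ridx2} \neq 0) \land \mathit{ren2} \land (\mathit{ridx2} = \mathit{widxM}) \land \mathit{wenM} \\
\mathit{bump2W} &= (\mathit{ridx2} \neq 0) \land \mathit{ren2} \land (\mathit{ridx2} = \mathit{widxW}) \land \mathit{wenW} \\
\mathit{stall} &= \mathit{bump1E} \lor \mathit{bump1M} \lor \mathit{bump1W} \lor \mathit{bump2E} \lor \mathit{bump2M} \lor \mathit{bump2W}
\end{align*}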
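
As a sanity check, the stall equations can be written as a small executable model. This is only a sketch: the `Write` record and the function names are hypothetical, and register indexes use the standard RISC-V numbering (t2 is x7).

```python
from dataclasses import dataclass


@dataclass
class Write:
    wen: bool   # register write-enable of the instruction in that stage
    widx: int   # index of the register it will write


def bump(ridx, ren, stage):
    """One bumpXY term: Decode reads a non-zero register that a later stage will write."""
    return ridx != 0 and ren and ridx == stage.widx and stage.wen


def stall(ridx1, ren1, ridx2, ren2, e, m, w):
    """OR of the six bump terms (two read ports, three later stages)."""
    reads = ((ridx1, ren1), (ridx2, ren2))
    return any(bump(ridx, ren, stage) for ridx, ren in reads for stage in (e, m, w))


idle = Write(wen=False, widx=0)
# ori t3,t2,1 in Decode (reads x7) while add t2,t1,t0 is in Execute (writes x7)
print(stall(7, True, 0, False, Write(True, 7), idle, idle))  # True
```

The `ridx != 0` test reflects that register zero is hard-wired and never creates a dependency.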

Other solutions?



Imagine solutions (2/2)
[Figure: pipeline datapath with stall logic (inputs ridx1, ren1, ridx2, ren2, widxE, wenE, widxM, wenM, widxW, wenW)]

⌢ Bubbles ⇒ CPUI > 1 (Useful)


Requested data usually available before Write back completes
Pick it at output of Execute, Memory or Write back stages
Add fast bypass (feedback) paths
Send outputs of later stages to output of decode stage



Pipeline bypass
[Figure: pipeline datapath with stall logic (inputs ridx1, ren1, ridx2, ren2, widxE, wenE, widxM, wenM, widxW, wenW)]

How to bypass pipeline?


Design equations of bypass logic
Is there an impact on stall logic?



How to bypass pipeline?

Add paths from outputs of Execute, Memory and Write back to output of Decode
Add multiplexers at output of Decode to select register bank output or bypass
Add control logic to drive multiplexers
[Figure: pipeline datapath with stall and bypass logic: multiplexers bp1 and bp2 at the output of Decode select either the register bank outputs (rdata1, rdata2) or a value fed back from the outputs of Execute, Memory or Write Back]



Equations of bypass logic




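\[
\mathit{bp1} =
\begin{cases}
\mathit{ALU}_{out} & \text{if } (\mathit{OpE} = \mathit{ALU}) \land \mathit{bump1E} \\
\mathit{MEM}_{out} & \text{if } ((\mathit{OpM} = \mathit{load}) \lor (\mathit{OpM} = \mathit{ALU})) \land \mathit{bump1M} \\
\mathit{WB}_{out} & \text{if } ((\mathit{OpW} = \mathit{load}) \lor (\mathit{OpW} = \mathit{ALU})) \land \mathit{bump1W} \\
\mathit{rdata1} & \text{else}
\end{cases}
\]

\[
\mathit{bp2} =
\begin{cases}
\mathit{ALU}_{out} & \text{if } (\mathit{OpE} = \mathit{ALU}) \land \mathit{bump2E} \\
\mathit{MEM}_{out} & \text{if } ((\mathit{OpM} = \mathit{load}) \lor (\mathit{OpM} = \mathit{ALU})) \land \mathit{bump2M} \\
\mathit{WB}_{out} & \text{if } ((\mathit{OpW} = \mathit{load}) \lor (\mathit{OpW} = \mathit{ALU})) \land \mathit{bump2W} \\
\mathit{rdata2} & \text{else}
\end{cases}
\]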
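
The bypass selection for one read port can be sketched as a priority multiplexer. The sketch assumes the cases are evaluated from top to bottom, so that the youngest producer (in Execute) wins; the function name and the string encoding of the operation classes are illustrative.

```python
def bypass(rdata, alu_out, mem_out, wb_out,
           op_e, op_m, op_w, bump_e, bump_m, bump_w):
    """Value seen at the output of Decode for one read port (bp1 or bp2)."""
    if op_e == "ALU" and bump_e:             # result just computed in Execute
        return alu_out
    if op_m in ("load", "ALU") and bump_m:   # result at the output of Memory
        return mem_out
    if op_w in ("load", "ALU") and bump_w:   # result at the output of Write Back
        return wb_out
    return rdata                             # no dependency: register bank output


# add t2,t1,t0 in Execute produces 42 while ori t3,t2,1 is in Decode
print(bypass(rdata=0, alu_out=42, mem_out=0, wb_out=0,
             op_e="ALU", op_m="other", op_w="other",
             bump_e=True, bump_m=False, bump_w=False))  # 42
```

A load in Execute is deliberately not covered: its data does not exist yet, so that case still needs a stall.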



Is there an impact on stall logic?

Yes: the bypass logic avoids most stall situations

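\begin{align*}
\mathit{newbump1E} &= (\mathit{OpE} \neq \mathit{ALU}) \land \mathit{oldbump1E} \\
\mathit{newbump1M} &= (\mathit{OpM} \neq \mathit{ALU}) \land (\mathit{OpM} \neq \mathit{load}) \land \mathit{oldbump1M} \\
\mathit{newbump1W} &= (\mathit{OpW} \neq \mathit{ALU}) \land (\mathit{OpW} \neq \mathit{load}) \land \mathit{oldbump1W} \\
\mathit{newbump2E} &= (\mathit{OpE} \neq \mathit{ALU}) \land \mathit{oldbump2E} \\
\mathit{newbump2M} &= (\mathit{OpM} \neq \mathit{ALU}) \land (\mathit{OpM} \neq \mathit{load}) \land \mathit{oldbump2M} \\
\mathit{newbump2W} &= (\mathit{OpW} \neq \mathit{ALU}) \land (\mathit{OpW} \neq \mathit{load}) \land \mathit{oldbump2W}
\end{align*}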
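
These equations say that a dependency only stalls when no bypass path can deliver the value. A compact way to express them, assuming the same operation classes as above (the function name and stage letters are illustrative):

```python
# Operation classes whose result can be bypassed from each stage
BYPASSABLE = {"E": ("ALU",), "M": ("ALU", "load"), "W": ("ALU", "load")}


def new_bump(old_bump, op, stage):
    """Stall term once bypass exists: the old term, unless the result can be bypassed."""
    return old_bump and op not in BYPASSABLE[stage]


print(new_bump(True, "load", "E"))  # True: a load in Execute has no data yet
print(new_bump(True, "ALU", "E"))   # False: bypassed from the ALU output
print(new_bump(True, "load", "M"))  # False: bypassed from the Memory output
```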

Pipeline bypass performance and cost?



Pipeline bypass performance and cost (1/2)
Best case / worst case (with a 5-stage RV32IM pipeline)
Execute output to Decode output ⇒ no penalty
Write back output to Decode output ⇒ 2 clock cycles
But Write back output is the same as previous Memory output
Example code of real worst case and execution flow?



Example code of real worst case and execution flow?
lw t1,0(t0) # t1 <- Mem[t0+0]
addi t2,t1,10 # t2 <- t1+10

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
lw t1,0(t0)         F   D   E   M   W
addi t2,t1,10           F   O   D   E   M   W
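
The single remaining bubble can be checked with a small scheduling sketch. It assumes full bypass to the output of Decode, so that an ALU result is usable by an instruction decoded one cycle after its producer and a load result two cycles after. The function name and the encoding of instructions are hypothetical.

```python
def decode_cycles_with_bypass(program):
    """program: list of (op, dest, sources); op is 'ALU', 'load' or anything else.
    Returns the 1-based cycle in which each instruction is in Decode."""
    delay = {"ALU": 1, "load": 2}   # producer in E (ALU) or in M (load)
    ready = {}                      # register -> first cycle Decode can obtain it
    cycles = []
    d = 1
    for op, dest, sources in program:
        d = max([d + 1] + [ready.get(r, 0) for r in sources if r != "zero"])
        cycles.append(d)
        if dest not in (None, "zero"):
            ready[dest] = d + delay[op]
    return cycles


# lw t1,0(t0) ; addi t2,t1,10
print(decode_cycles_with_bypass([("load", "t1", ("t0",)),
                                 ("ALU", "t2", ("t1",))]))  # [2, 4]
```

addi is decoded at t4 instead of t3, which is the one-cycle load-use penalty of the table; two dependent ALU instructions would be decoded back to back.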



Pipeline bypass performance and cost (2/2)
[Figure: pipeline datapath with stall and bypass logic (multiplexers bp1 and bp2 at the output of Decode)]

⌣ With our small pipeline, stalls are almost never needed any more (except after loads)
⌢ Multiplexer logic ⇒ longer critical path
⌢ Clock period ↑, clock frequency ↓
On deep and complex pipelines full bypass implies:
⌢ Prohibitive cost
⌢ Huge performance impact
Deep pipelines ⇒ bypass only small set of selected stages



Is there a data hazard here?

What if t1=t3?
The lw instruction reads memory location written by sw instruction
Is this a read-after-write data hazard?
• No: when lw accesses memory, sw has already modified it
• ...unless memory accesses are performed out of order
sw t0,0(t1) # Mem[t1+0] <- t0
lw t2,0(t3) # t2 <- Mem[t3+0]



And here?

• add writes ra when in WB-stage (at t0 + 3)
• jal writes ra when in WB-stage (at t0 + 4)
• Not technically a data hazard (jal does not need add result)
• Write-after-write
• More likely a programming error!
add ra,s0,s1 # ra <- s0 + s1
jal getchar # jump and link at getchar



Subsection 3

Control hazards



Next instruction at PC + 4? Address?
When do we know if next instruction is PC + 4 or not?
• jal (jump immediate): decode stage
• jalr (jump at register): decode stage
• beq, bne,... (conditional branch): execute stage
• Other: decode stage

When do we know the address of next instruction?


• jal (jump immediate): decode stage (PC + 2 × imm)
• jalr (jump at register): decode stage (but delayed if stall)
• beq, bne,...: execute stage (PC + 4 or PC + 2 × imm)
• Other: decode stage (PC + 4)
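
The next-address rules above can be summarised in a few lines. This sketch follows the slide notation: for jal and branches `imm` is the immediate counted in 2-byte units, for jalr it is a byte offset added to register value `r`, and the least significant bit of the jalr target is cleared. The function name and the `kind` strings are illustrative.

```python
def next_pc(kind, pc, imm=0, r=0, taken=False):
    """Address of the next instruction on a 32-bit machine."""
    if kind == "jal":
        target = pc + 2 * imm
    elif kind == "jalr":
        target = (r + imm) & 0xFFFFFFFE   # clear bit 0 of the computed address
    elif kind == "branch" and taken:
        target = pc + 2 * imm
    else:                                 # not-taken branch or any other instruction
        target = pc + 4
    return target & 0xFFFFFFFF            # wrap around on 32 bits


print(hex(next_pc("other", 0x100)))                        # 0x104
print(hex(next_pc("jal", 0x100, imm=8)))                   # 0x110
print(hex(next_pc("jalr", 0x100, imm=3, r=0x200)))         # 0x202
print(hex(next_pc("branch", 0x100, imm=-4, taken=True)))   # 0xf8
```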



Imagine execution flow, underline issues
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
ori t2,t2,1                 F   D   E   M   W



Insert bubbles to fix execution flow
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         O   F   D   E   M   W
ori t2,t2,1                 O   O   F   D   E   M   W

⌢ CPUI ≥ 2!



Imagine execution flow if Branch Not Taken (BNT), underline issues
addi t2,t1,10 # t2 <- t1+10
bne t2,t3,waitchar # branch at waitchar if t2!=t3
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t2,t3,waitchar      F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
jal zero,there                  F   D   E   M   W
addi t2,t2,5                        F   D   E   M   W
ori t2,t2,1                             F   D   E   M   W



Fix execution flow if BNT
addi t2,t1,10 # t2 <- t1+10
bne t2,t3,waitchar # branch at waitchar if t2!=t3
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t2,t3,waitchar      O   F   D   E   M   W
andi t2,t2,0xff             O   O   O   F   D   E   M   W
jal zero,there                  O   O   O   O   F   D   E   M   W
ori t2,t2,1                         O   O   O   O   O   F   D   E   M



Imagine solutions for jump (1/2)
[Figure: pipeline datapath with stall and bypass logic (multiplexers bp1 and bp2 at the output of Decode)]

• Programmer or compiler avoids control hazards:
  • Insert a nop instruction after each jump
• Speculate and kill (S&K):
  • Always fetch the next instruction
  • Replace it by a nop when a jump is decoded
• How to kill the instruction in Fetch?



How to kill instruction in Fetch?

Add multiplexer in front of IR to insert nop


Add control logic in Decode to drive multiplexer (jump detector)

[Figure: pipeline datapath with stall, bypass and kill logic: a jump detector in Decode (jump?) drives kill, which selects nop in front of IR; the next PC is selected among PC + 4, PC + 2 × imm and (r + imm) AND 0xfffffffe]



Interaction with stall?
[Figure: pipeline datapath with stall, bypass and kill logic (jump detector in Decode, nop multiplexer in front of IR)]

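• Yes: stall has priority over kill (stall > kill)
• In the example below, jalr must wait in Decode until t2 has been loaded, so the jump is delayed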


lw t2,0(t3) # t2 <- Mem[t3+0]
jalr zero,0(t2) # jump at t2
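
One possible reading of "stall > kill" is the following model of the multiplexers in front of IR. It is a sketch under that assumption, not the course's implementation; the function name and the string representation of instructions are hypothetical.

```python
NOP = "nop"


def next_ir(ir, fetched, jump_in_decode, stall):
    """Content of IR at the next clock edge."""
    if stall:            # stall wins: the stalled instruction (here jalr) stays in Decode
        return ir
    if jump_in_decode:   # jump resolved: the speculatively fetched instruction is killed
        return NOP
    return fetched       # normal case: load the fetched instruction


# Cycle 1: jalr waits for t2 (load-use stall), so nothing is killed yet
print(next_ir("jalr zero,0(t2)", "next", jump_in_decode=True, stall=True))
# Cycle 2: t2 is available, the jump is taken and the fetched instruction is killed
print(next_ir("jalr zero,0(t2)", "next", jump_in_decode=True, stall=False))
```

If kill had priority, the nop would overwrite jalr during the stall and the jump would be lost.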



Imagine execution flow with S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
lw t1,0(t0) # t1 <- Mem[t0+0]
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
jal zero,there              F   D   E   M   W
lw t1,0(t0)                     F   O   O   O   O
ori t2,t2,1                         F   D   E   M   W



Imagine solutions for jump (2/2)
One Delay Slot (DS, as in MIPS32, not RISC-V)
Always execute instruction that follows a jump
This is an ISA change
Kill still needed?
⌣ No
Micro-architecture complexity?
⌣ Simple
Compiler or programmer complexity?
⌢ Must find jump-independent instruction to populate DS
⌢ Instructions reordering, dependency analysis
⌢ Not always possible ⇒ nop (to be fetched)
⌢ Not 100% efficient (e.g., 70%)



Imagine execution flow with one DS
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
lw t1,0(t0) # t1 <- Mem[t0+0]
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2, 1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
jal zero,there          F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
ori t2,t2,1                     F   D   E   M   W



Imagine solutions for branch (1/4)

• Programmer or compiler avoids control hazards:
  • Insert nop instructions after each branch instruction
  • How many nop instructions?
• Speculate and kill (S&K)
  • How?
[Figure: pipeline datapath with stall, bypass and kill logic: in addition to the jump detector in Decode (jump?), a branch detector in Execute (branch?) drives kill]



Interaction with stall?
[Figure: pipeline datapath with stall, bypass and kill logic (jump detector in Decode, branch detector in Execute)]

• Yes: killed instructions are invalid
• Invalid instruction in decode stage shall not cause stall
• Kill > stall
• New stall equations
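
The slides do not give the new stall equations. A minimal sketch of what "kill > stall" could mean, assuming a hypothetical `killed_in_decode` signal raised when a taken branch in Execute invalidates the instruction currently in Decode:

```python
def stall_with_kill(old_stall, killed_in_decode):
    """An instruction that is being killed must not freeze the pipeline."""
    return old_stall and not killed_in_decode


print(stall_with_kill(True, False))  # True: valid dependent instruction, stall
print(stall_with_kill(True, True))   # False: the instruction is killed, no stall
```

Here `old_stall` stands for the stall condition computed from the bump terms; both names are illustrative.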



Imagine execution flow if Branch Taken (BT) with S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
bne t5,zero,there           F   D   E   M   W
ori t1,t1,1                     F   D   O   O   O
lw t3,0(t2)                         F   O   O   O   O
ori t2,t2,1                             F   D   E   M   W



Imagine solutions for branch (2/4)

Move branch decision logic in Decode


⌢ Clock period ↑, clock frequency ↓
⌣ Saves one killed instruction
Same impact as jump

[Figure: pipeline datapath with stall, bypass and kill logic, with the branch decision (br?) moved to Decode next to the jump detector (jump?)]



Imagine execution flow if BT with decision in Decode, S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
bne t5,zero,there           F   D   E   M   W
ori t1,t1,1                     F   O   O   O   O
ori t2,t2,1                         F   D   E   M   W



Imagine solutions for branch (3/4)
Delay slot(s)
Always execute instruction(s) that follow a branch
ISA change
How many DS?
• One or two with our simple pipeline, could be more with deeper pipelines
Kill still needed?
⌣ No
Micro-architecture complexity?
⌣ Simple
Compiler or programmer complexity?
⌢ Must find branch-independent instruction(s) to populate DS
⌢ Instructions reordering, dependency analysis
Efficiency
⌢ Find useful instructions not always possible ⇒ nop (to be fetched)
⌢ Not 100% efficient (e.g., first: 70%, second: 50%)
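
To see what these figures could mean in cycles, the sketch below reads "70%" and "50%" as the fraction of branches for which the first and second delay slot hold a useful instruction. That reading is an assumption, and the function name is illustrative.

```python
def wasted_slots_per_branch(fill_rates):
    """fill_rates[i]: assumed fraction of branches whose i-th delay slot is useful.
    Returns the average number of nop-filled delay slots per branch."""
    return sum(1.0 - rate for rate in fill_rates)


print(wasted_slots_per_branch([0.7]))        # one delay slot
print(wasted_slots_per_branch([0.7, 0.5]))   # two delay slots
```

Under this reading a single delay slot wastes about 0.3 cycle per branch and two delay slots about 0.8 cycle.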



Imagine execution flow if BT with 1 DS, S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t5,zero,there       F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
ori t1,t1,1                     F   O   O   O   O
ori t2,t2,1                         F   D   E   M   W



Imagine flow if BT with decision in Decode, 1 DS, S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t5,zero,there       F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
ori t2,t2,1                     F   D   E   M   W



Imagine solutions for branch (4/4)
Branch prediction (outcome and target address)
What if we could predict branch outcome and target address?
More on this later

[Figure: pipeline datapath with a "branch & jump prediction ?" block in the Fetch stage selecting the next PC among PC + 4, PC + 2 × imm and (r + imm) AND 0xfffffffe]
