Computer Architecture: Pipeline Hazards

The document discusses computer architecture with a focus on pipeline execution and the handling of structural and data hazards. It outlines various techniques to manage these hazards, including instruction re-ordering, inserting nop instructions, and implementing pipeline stalls. Additionally, it covers the design of stall logic and bypass paths to optimize performance in pipelined processors.


Computer Architecture

Pipeline

R. Pacalet

Telecom Paris

2023-01-30

1 / 46 2023-01-30 Telecom Paris Computer Architecture — Pipeline

Section 1

Solutions of exercises


Subsection 1

Shared memory structural hazards


Imagine execution flow, underline memory accesses
[Figure: 5-stage pipeline datapath (PC, add 4, program memory, instruction register IR, ctrl, register bank, sign extension, ALU, data memory). Stages: Instruction fetch (IF, I), Decode, read registers (D), Execute (EX, E, X), Memory (M), Write Back (WB, W)]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
lw t1,0(t0)         F   D   E   M   W
addi t2,t1,10           F   D   E   M   W
sw t2,0(t0)                 F   D   E   M   W
andi t2,t2,0xff                 F   D   E   M   W
ori t2,t2,1                         F   D   E   M   W
jalr zero,0(ra)                         F   D   E   M   W


Insert bubbles to fix execution flow
[Figure: 5-stage pipeline datapath: Instruction fetch (IF, I), Decode, read registers (D), Execute (EX, E, X), Memory (M), Write Back (WB, W)]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
lw t1,0(t0)         F   D   E   M   W
addi t2,t1,10           F   D   E   M   W
sw t2,0(t0)                 F   D   E   M   W
andi t2,t2,0xff                 O   F   D   E   M   W
ori t2,t2,1                         O   O   F   D   E   M   W
jalr zero,0(ra)                         O   O   F   D   E   M   W
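
The bubble count of each row can be checked with a small model. This is a sketch, not part of the original slides. It assumes one memory port shared by Fetch and Memory, that only loads and stores use it in M, and that a delayed fetch does not hold back older instructions. The name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(uses_memory):
    """Return the fetch cycle (1-based) of each instruction.

    uses_memory[i] is True when instruction i is a load or a store,
    i.e. when it needs the shared memory in its M stage.
    """
    busy = set()   # cycles in which an older instruction's M stage owns the memory
    fetch = []
    cycle = 1
    for is_mem in uses_memory:
        while cycle in busy:      # structural hazard: insert a bubble
            cycle += 1
        fetch.append(cycle)
        if is_mem:
            busy.add(cycle + 3)   # F, D, E, then M three cycles after F
        cycle += 1
    return fetch


# lw, addi, sw, andi, ori, jalr
print(fetch_cycles([True, False, True, False, False, False]))
```

For the six instructions of the example this gives fetch cycles 1, 2, 3, 5, 7 and 8: andi is delayed by one cycle, ori and jalr by two.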


Imagine solutions
[Figure: 5-stage pipeline datapath: Instruction fetch (IF, I), Decode, read registers (D), Execute (EX, E, X), Memory (M), Write Back (WB, W)]

Programmer or compiler avoids structural hazards

• Instruction re-ordering, insertion of nop instructions
• Does not work on this example because nop must also be fetched
• But could work in other examples of structural hazards
Duplicate hardware resources such that there are always enough
Pipeline stall
• Hardware freezes PC and inserts nop instructions (not fetched)
• More on this later


Subsection 2

Data hazards


Imagine execution flow, underline issues
[Figure: 5-stage pipeline datapath: Instruction fetch (IF, I), Decode, read registers (D), Execute (EX, E, X), Memory (M), Write Back (WB, W)]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
add t2,t1,t0        F   D   E   M   W
ori t3,t2,1             F   D   E   M   W


Insert bubbles to fix execution flow
[Figure: 5-stage pipeline datapath: Instruction fetch (IF, I), Decode, read registers (D), Execute (EX, E, X), Memory (M), Write Back (WB, W)]

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
add t2,t1,t0        F   D   E   M   W
ori t3,t2,1             F   O   O   O   D   E   M   W
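
The three bubbles can be recovered with simple arithmetic. This is a sketch, not part of the original slides. It assumes a 5-stage pipeline without bypass in which the consumer can only read the register in the cycle after the producer's Write Back. The name `bubbles_without_bypass` is hypothetical.

```python
def bubbles_without_bypass(producer_fetch, consumer_fetch):
    """Bubbles inserted before the consumer's Decode (cycles are 1-based)."""
    producer_wb = producer_fetch + 4       # F, D, E, M, W
    natural_decode = consumer_fetch + 1    # Decode if there were no hazard
    earliest_decode = producer_wb + 1      # first cycle with the register updated
    return max(0, earliest_decode - natural_decode)


# add fetched at t1, ori fetched at t2
print(bubbles_without_bypass(1, 2))
```

With add fetched at t1 and ori at t2, the function returns 3, which matches the flow above.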


Imagine solutions (1/2)
[Figure: 5-stage pipeline datapath: Instruction fetch (IF, I), Decode, read registers (D), Execute (EX, E, X), Memory (M), Write Back (WB, W)]

Programmer or compiler avoids data hazards

• Instruction re-ordering, insertion of nop instructions
Run and kill: assume correct behaviour and kill instructions if needed
• More on this later
Hardware stalls (freezes) the pipeline and inserts nop instructions
Important: always stall the newest of two dependent instructions
• Stalling the oldest instruction while the newest waits for its result ⇒ deadlock


Pipeline stall

[Figure: 5-stage pipeline datapath: Instruction fetch (IF, I), Decode, read registers (D), Execute (EX, E, X), Memory (M), Write Back (WB, W)]

How to stall pipeline?

What information do we need to stall pipeline?
Design equations of stall logic


How to stall pipeline?

Add multiplexer at control output of decode stage to insert nop

Add multiplexers to freeze PC and IR
Add control logic to drive multiplexers
[Figure: pipeline datapath with stall logic: the stall signal freezes PC and IR and selects nop at the control output of Decode]


What information do we need to stall pipeline? (1/2)

In execute, memory and write-back stages, extract:

• Register write-enable + index of written register
In decode stage, extract:
• Register read-enables (2) + indexes of read registers (2)
Compare and if match:
• Insert nop (at control output of decode stage)
• Freeze PC and IR


What information do we need to stall pipeline? (2/2)
[Figure: pipeline datapath with stall logic. Inputs of the stall logic: ridx1, ren1, ridx2, ren2 (Decode); widxE, wenE (Execute); widxM, wenM (Memory); widxW, wenW (Write Back). Output: stall (freezes PC and IR, selects nop at the control output of Decode)]


Design equations of stall logic

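\[
\begin{aligned}
\mathrm{bump1E} &= (\mathrm{ridx1} \neq 0) \land \mathrm{ren1} \land (\mathrm{ridx1} = \mathrm{widxE}) \land \mathrm{wenE} \\
\mathrm{bump1M} &= (\mathrm{ridx1} \neq 0) \land \mathrm{ren1} \land (\mathrm{ridx1} = \mathrm{widxM}) \land \mathrm{wenM} \\
\mathrm{bump1W} &= (\mathrm{ridx1} \neq 0) \land \mathrm{ren1} \land (\mathrm{ridx1} = \mathrm{widxW}) \land \mathrm{wenW} \\
\mathrm{bump2E} &= (\mathrm{ridx2} \neq 0) \land \mathrm{ren2} \land (\mathrm{ridx2} = \mathrm{widxE}) \land \mathrm{wenE} \\
\mathrm{bump2M} &= (\mathrm{ridx2} \neq 0) \land \mathrm{ren2} \land (\mathrm{ridx2} = \mathrm{widxM}) \land \mathrm{wenM} \\
\mathrm{bump2W} &= (\mathrm{ridx2} \neq 0) \land \mathrm{ren2} \land (\mathrm{ridx2} = \mathrm{widxW}) \land \mathrm{wenW} \\
\mathrm{stall} &= \mathrm{bump1E} \lor \mathrm{bump1M} \lor \mathrm{bump1W} \lor \mathrm{bump2E} \lor \mathrm{bump2M} \lor \mathrm{bump2W}
\end{aligned}
\]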
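
These equations translate almost directly into code. This is a sketch, not part of the original slides. The `WritePort` class and the function names are hypothetical; the signal names follow the equations. Register numbers use the RISC-V ABI names (t2 is x7).

```python
from dataclasses import dataclass


@dataclass
class WritePort:
    widx: int   # index of the register written by the instruction in this stage
    wen: bool   # register write-enable of that instruction


def bump(ridx, ren, stage):
    """One bump term: a read of a non-zero register matches a pending write."""
    return ridx != 0 and ren and ridx == stage.widx and stage.wen


def stall(ridx1, ren1, ridx2, ren2, e, m, w):
    """OR of the six bump terms (two read ports, three later stages)."""
    reads = ((ridx1, ren1), (ridx2, ren2))
    return any(bump(ridx, ren, stage) for ridx, ren in reads for stage in (e, m, w))


idle = WritePort(widx=0, wen=False)

# add t2,t1,t0 is in Execute (writes x7) while ori t3,t2,1 is in Decode (reads x7)
print(stall(7, True, 0, False, WritePort(7, True), idle, idle))
```

The test on register 0 matters because x0 is hard-wired to zero in RISC-V: an instruction that targets it never produces a value that a later instruction must wait for.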


Imagine solutions (2/2)
[Figure: pipeline datapath with stall logic (inputs ridx1, ren1, ridx2, ren2, widxE, wenE, widxM, wenM, widxW, wenW)]

⌢ Bubbles ⇒ CPUI > 1 (Useful)

Requested data usually available before Write back completes
Pick it at output of Execute, Memory or Write back stages
Add fast bypass (feedback) paths
Send outputs of later stages to output of decode stage


Pipeline bypass
[Figure: pipeline datapath with stall logic (inputs ridx1, ren1, ridx2, ren2, widxE, wenE, widxM, wenM, widxW, wenW)]

How to bypass pipeline?

Design equations of bypass logic
Is there an impact on stall logic?


How to bypass pipeline?

Add paths from outputs of Execute, Memory and Write back to output of Decode
Add multiplexers at output of Decode to select register bank output or bypass
Add control logic to drive multiplexers
[Figure: pipeline datapath with stall & bypass logic: multiplexers bp1 and bp2 at the output of Decode select between the register bank outputs (rdata1, rdata2) and the bypass paths from the outputs of Execute, Memory and Write Back]


Equations of bypass logic




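\[
\mathrm{bp1} =
\begin{cases}
\mathrm{ALUout} & \text{if } (\mathrm{OpE} = \mathrm{ALU}) \land \mathrm{bump1E} \\
\mathrm{MEMout} & \text{if } ((\mathrm{OpM} = \mathrm{load}) \lor (\mathrm{OpM} = \mathrm{ALU})) \land \mathrm{bump1M} \\
\mathrm{WBout} & \text{if } ((\mathrm{OpW} = \mathrm{load}) \lor (\mathrm{OpW} = \mathrm{ALU})) \land \mathrm{bump1W} \\
\mathrm{rdata1} & \text{else}
\end{cases}
\]

\[
\mathrm{bp2} =
\begin{cases}
\mathrm{ALUout} & \text{if } (\mathrm{OpE} = \mathrm{ALU}) \land \mathrm{bump2E} \\
\mathrm{MEMout} & \text{if } ((\mathrm{OpM} = \mathrm{load}) \lor (\mathrm{OpM} = \mathrm{ALU})) \land \mathrm{bump2M} \\
\mathrm{WBout} & \text{if } ((\mathrm{OpW} = \mathrm{load}) \lor (\mathrm{OpW} = \mathrm{ALU})) \land \mathrm{bump2W} \\
\mathrm{rdata2} & \text{else}
\end{cases}
\]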
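
The selection made by one bypass multiplexer can be sketched as a chain of tests, youngest producer first. This is an illustration, not part of the original slides. The function name `bypass` and the string encoding of the operation classes ("ALU", "load", "other") are assumptions.

```python
def bypass(rdata, alu_out, mem_out, wb_out, op_e, op_m, op_w, bump_e, bump_m, bump_w):
    """Value seen at the output of Decode for one source operand."""
    if op_e == "ALU" and bump_e:
        return alu_out          # youngest producer, result just computed in Execute
    if op_m in ("load", "ALU") and bump_m:
        return mem_out          # result available at the output of Memory
    if op_w in ("load", "ALU") and bump_w:
        return wb_out           # result available at the output of Write Back
    return rdata                # no match: plain register bank output


# add t2,t1,t0 is in Execute while ori t3,t2,1 is in Decode
value = bypass(rdata=111, alu_out=42, mem_out=0, wb_out=0,
               op_e="ALU", op_m="other", op_w="other",
               bump_e=True, bump_m=False, bump_w=False)
print(value)
```

The order of the tests encodes the priority: when several stages hold a pending write to the same register, the most recent one is the value the instruction in Decode must see.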




Is there an impact on stall logic?

Yes: the bypass logic avoids most stall situations

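\[
\begin{aligned}
\mathrm{newbump1E} &= (\mathrm{OpE} \neq \mathrm{ALU}) \land \mathrm{oldbump1E} \\
\mathrm{newbump1M} &= (\mathrm{OpM} \neq \mathrm{ALU}) \land (\mathrm{OpM} \neq \mathrm{load}) \land \mathrm{oldbump1M} \\
\mathrm{newbump1W} &= (\mathrm{OpW} \neq \mathrm{ALU}) \land (\mathrm{OpW} \neq \mathrm{load}) \land \mathrm{oldbump1W} \\
\mathrm{newbump2E} &= (\mathrm{OpE} \neq \mathrm{ALU}) \land \mathrm{oldbump2E} \\
\mathrm{newbump2M} &= (\mathrm{OpM} \neq \mathrm{ALU}) \land (\mathrm{OpM} \neq \mathrm{load}) \land \mathrm{oldbump2M} \\
\mathrm{newbump2W} &= (\mathrm{OpW} \neq \mathrm{ALU}) \land (\mathrm{OpW} \neq \mathrm{load}) \land \mathrm{oldbump2W}
\end{aligned}
\]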
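
The effect of these new equations is easiest to see on a few cases. This is a sketch, not part of the original slides. The function name `new_bump` and the string encoding of stages and operation classes are assumptions.

```python
def new_bump(stage, op, old_bump):
    """Bump term once bypass paths exist.

    stage is "E", "M" or "W"; op is the class of the instruction in that stage.
    A match only causes a stall when the value cannot be bypassed from that stage.
    """
    bypassable = {"ALU"} if stage == "E" else {"ALU", "load"}
    return old_bump and op not in bypassable


print(new_bump("E", "ALU", True))    # ALU result bypassed from Execute
print(new_bump("E", "load", True))   # loaded data not yet available
print(new_bump("M", "load", True))   # loaded data bypassed from Memory
```

Only the second case still stalls: a load in Execute has computed its address but not yet read its data, so a dependent instruction in Decode must wait one cycle.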

Pipeline bypass performance and cost?


Pipeline bypass performance and cost (1/2)
Best case / worst case (with 5 stages RV32IM pipeline)
Execute output to Decode output ⇒ no penalty
Write back output to Decode output ⇒ 2 clock cycles
But Write back output is the same as previous Memory output
Example code of real worst case and execution flow?


Example code of real worst case and execution flow?
lw t1,0(t0) # t1 <- Mem[t0+0]
addi t2,t1,10 # t2 <- t1+10

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
lw t1,0(t0)         F   D   E   M   W
addi t2,t1,10           F   O   D   E   M   W


Pipeline bypass performance and cost (2/2)
[Figure: pipeline datapath with stall & bypass logic (multiplexers bp1 and bp2 at the output of Decode)]

⌣ With our small pipeline there is almost no need to stall any more (except after loads)
⌢ Multiplexer logic ⇒ longer critical path
⌢ Clock period ↑, clock frequency ↓
On deep and complex pipelines full bypass implies:
⌢ Prohibitive cost
⌢ Huge performance impact
Deep pipelines ⇒ bypass only small set of selected stages


Is there a data hazard here?

What if t1=t3?
The lw instruction reads memory location written by sw instruction
Is this a read-after-write data hazard?
• No: when lw accesses memory, sw has already modified memory
• ... unless out-of-order memory
sw t0,0(t1) # Mem[t1+0] <- t0
lw t2,0(t3) # t2 <- Mem[t3+0]


And here?

• add writes ra when in WB-stage (at t0 + 3)
• jal writes ra when in WB-stage (at t0 + 4)
• Not technically a data hazard (jal does not need add result)
• Write-after-write
• More likely a programming error!
add ra,s0,s1 # ra <- s0 + s1
jal getchar # jump and link at getchar


Subsection 3

Control hazards


Next instruction at PC + 4? Address?
When do we know if next instruction is PC + 4 or not?
• jal (jump immediate): decode stage
• jalr (jump at register): decode stage
• beq, bne,... (conditional branch): execute stage
• Other: decode stage

When do we know the address of next instruction?

• jal (jump immediate): decode stage (PC + 2 × imm)
• jalr (jump at register): decode stage (but delayed if stall)
• beq, bne,...: execute stage (PC + 4 or PC + 2 × imm)
• Other: decode stage (PC + 4)
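
The address selection listed above can be sketched as one function. This is an illustration, not part of the original slides. The function name `next_pc` and the `kind` strings are hypothetical; `imm` is the immediate as used in the slides (in units of 2 bytes for jal and branches); the jalr target is taken as (r + imm) with the least significant bit cleared, as RISC-V specifies. Addresses are kept on 32 bits.

```python
MASK32 = 0xFFFFFFFF


def next_pc(kind, pc, imm=0, r=0, taken=False):
    """Address of the next instruction for a 32-bit RISC-V core."""
    if kind == "jal":
        return (pc + 2 * imm) & MASK32
    if kind == "jalr":
        return (r + imm) & 0xFFFFFFFE        # register + immediate, bit 0 cleared
    if kind == "branch" and taken:
        return (pc + 2 * imm) & MASK32
    return (pc + 4) & MASK32                 # not-taken branch or any other instruction


print(hex(next_pc("jal", 0x100, imm=8)))
print(hex(next_pc("jalr", 0x100, imm=1, r=0x200)))
print(hex(next_pc("branch", 0x100, imm=-4, taken=True)))
print(hex(next_pc("branch", 0x100, imm=-4, taken=False)))
```

What differs between instruction kinds is not the formula but the stage in which its inputs become known: `taken` is only available in Execute, and `r` may be delayed by a stall.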


Imagine execution flow, underline issues
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
ori t2,t2,1                 F   D   E   M   W


Insert bubbles to fix execution flow
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         O   F   D   E   M   W
ori t2,t2,1                 O   O   F   D   E   M   W

⌢ CPUI ≥ 2!
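
The bound can be checked with a one-line model. This is a sketch, not part of the original slides. It assumes that each fetch waits until the previous instruction has been decoded, so that one instruction enters the 5-stage pipeline every 2 cycles. The function name `total_cycles` is hypothetical.

```python
def total_cycles(n, fetch_interval, depth=5):
    """Cycles needed to run n instructions, one fetched every fetch_interval cycles."""
    return (n - 1) * fetch_interval + depth


# cycles per instruction for growing instruction counts
for n in (3, 100, 10_000):
    print(n, total_cycles(n, 2) / n)
```

For the three instructions above this gives 9 cycles, which matches the flow (ori completes Write Back at t9). As n grows, the ratio approaches 2 from above.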


Imagine execution flow if Branch Not Taken (BNT), underline issues
addi t2,t1,10 # t2 <- t1+10
bne t2,t3,waitchar # branch at waitchar if t2!=t3
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t2,t3,waitchar      F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
jal zero,there                  F   D   E   M   W
addi t2,t2,5                        F   D   E   M   W
ori t2,t2,1                             F   D   E   M   W


Fix execution flow if BNT
addi t2,t1,10 # t2 <- t1+10
bne t2,t3,waitchar # branch at waitchar if t2!=t3
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t2,t3,waitchar      O   F   D   E   M   W
andi t2,t2,0xff             O   O   O   F   D   E   M   W
jal zero,there                  O   O   O   O   F   D   E   M   W
ori t2,t2,1                         O   O   O   O   O   F   D   E   M


Imagine solutions for jump (1/2)
[Figure: pipeline datapath with stall & bypass logic (multiplexers bp1 and bp2 at the output of Decode)]

Programmer or compiler avoid control hazards

• Insert a nop instruction after each jump
Speculate and kill (S&K)
• Always fetch next instruction
• Replace by nop when jump decoded
How to kill instruction in Fetch?


How to kill instruction in Decode?

Add multiplexer in front of IR to insert nop

Add control logic in Decode to drive multiplexer (jump detector)

[Figure: pipeline datapath with stall & bypass logic and jump support: a jump detector (jump?) in Decode drives the kill multiplexer that inserts nop in front of IR; next PC selected among PC + 4, PC + 2 × imm and (r + imm) AND 0xfffffffe]


Interaction with stall?
[Figure: pipeline datapath with stall & bypass logic and jump support (jump detector in Decode driving kill; next PC selected among PC + 4, PC + 2 × imm and (r + imm) AND 0xfffffffe)]

• Yes: stall > kill

lw t2,0(t3) # t2 <- Mem[t3+0]
jalr zero,0(t2) # jump at t2


Imagine execution flow with S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
lw t1,0(t0) # t1 <- Mem[t0+0]
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
jal zero,there              F   D   E   M   W
lw t1,0(t0)                     F   O   O   O   O
ori t2,t2,1                         F   D   E   M   W


Imagine solutions for jump (2/2)
One Delay Slot (DS, as in MIPS32, not RISC-V)
Always execute instruction that follows a jump
This is an ISA change
Kill still needed?
⌣ No
Micro-architecture complexity?
⌣ Simple
Compiler or programmer complexity?
⌢ Must find jump-independent instruction to populate DS
⌢ Instructions reordering, dependency analysis
⌢ Not always possible ⇒ nop (to be fetched)
⌢ Not 100% efficient (e.g., 70%)


Imagine execution flow with one DS
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
jal zero,there # jump at there
lw t1,0(t0) # t1 <- Mem[t0+0]
addi t2,t2,5 # t2 <- t2+5
there:
ori t2,t2, 1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
jal zero,there          F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
ori t2,t2,1                     F   D   E   M   W


Imagine solutions for branch (1/4)

Programmer or compiler avoid control hazards

• Insert nop instructions after each branch instruction
How many nop instructions?
Speculate and kill (S&K)
How?
[Figure: pipeline datapath with stall & bypass logic, jump detector (jump?) in Decode and branch decision (branch?) in Execute, both driving kill; next PC selected among PC + 4, PC + 2 × imm and (r + imm) AND 0xfffffffe]


Interaction with stall?
[Figure: pipeline datapath with stall & bypass logic, jump detector (jump?) in Decode and branch decision (branch?) in Execute, both driving kill]

• Yes: killed instructions are invalid
• Invalid instruction in decode stage shall not cause stall
• Kill > stall
• New stall equations


Imagine execution flow if Branch Taken (BT) with S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
bne t5,zero,there           F   D   E   M   W
ori t1,t1,1                     F   D   O   O   O
lw t3,0(t2)                         F   O   O   O   O
ori t2,t2,1                             F   D   E   M   W


Imagine solutions for branch (2/4)

Move branch decision logic in Decode

⌢ Clock period ↑, clock frequency ↓
⌣ Saves one killed instruction
Same impact as jump

[Figure: pipeline datapath with stall & bypass logic, jump detector (jump?) and branch decision (br?) both in Decode, driving kill; next PC selected among PC + 4, PC + 2 × imm and (r + imm) AND 0xfffffffe]


Imagine execution flow if BT with decision in Decode, S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
andi t2,t2,0xff         F   D   E   M   W
bne t5,zero,there           F   D   E   M   W
ori t1,t1,1                     F   O   O   O   O
ori t2,t2,1                         F   D   E   M   W


Imagine solutions for branch (3/4)
Delay slot(s)
Always execute instruction(s) that follow a branch
ISA change
How many DS?
• One or two with our simple pipeline, could be more with deeper pipelines
Kill still needed?
⌣ No
Micro-architecture complexity?
⌣ Simple
Compiler or programmer complexity?
⌢ Must find branch-independent instruction(s) to populate DS
⌢ Instructions reordering, dependency analysis
Efficiency
⌢ Finding useful instructions is not always possible ⇒ nop (to be fetched)
⌢ Not 100% efficient (e.g., first: 70%, second: 50%)
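
A rough cost estimate follows from these figures. This is a sketch, not part of the original slides. It assumes that the percentages are the probabilities of filling each delay slot with a useful instruction, and that an unfilled slot costs one fetched nop. The function name `expected_nops_per_branch` is hypothetical.

```python
def expected_nops_per_branch(fill_rates):
    """Average number of nop instructions fetched per branch.

    fill_rates[i] is the probability that delay slot i holds a useful instruction.
    """
    return sum(1.0 - rate for rate in fill_rates)


# one delay slot filled 70% of the time, a second one 50% of the time
print(round(expected_nops_per_branch([0.7]), 2))
print(round(expected_nops_per_branch([0.7, 0.5]), 2))
```

Under these assumptions one delay slot wastes about 0.3 cycle per branch and two delay slots about 0.8 cycle per branch.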


Imagine execution flow if BT with 1 DS, S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t5,zero,there       F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
ori t1,t1,1                     F   O   O   O   O
ori t2,t2,1                         F   D   E   M   W


Imagine flow if BT with decision in Decode, 1 DS, S&K
addi t2,t1,10 # t2 <- t1+10
andi t2,t2,0xff # t2 <- t2 AND 0xff
bne t5,zero,there # if t5!=0 goto there
ori t1,t1,1 # t1 <- t1 OR 1
lw t3,0(t2) # t3 <- Mem[t2+0]
there:
ori t2,t2,1 # t2 <- t2 OR 1

T                   t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13
addi t2,t1,10       F   D   E   M   W
bne t5,zero,there       F   D   E   M   W
andi t2,t2,0xff             F   D   E   M   W
ori t2,t2,1                     F   D   E   M   W
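
The jump and taken-branch flows of this section follow one pattern, which can be sketched as a small function. This is an illustration, not part of the original slides. It assumes that one instruction is fetched per cycle behind the jump or branch until it is resolved, and that delay-slot instructions are executed instead of killed. The names `STAGE_INDEX` and `killed_instructions` are hypothetical.

```python
STAGE_INDEX = {"F": 0, "D": 1, "E": 2, "M": 3, "W": 4}


def killed_instructions(resolve_stage, delay_slots=0):
    """Instructions killed after a jump or a taken branch.

    resolve_stage is the stage in which outcome and target are known.
    """
    fetched_behind = STAGE_INDEX[resolve_stage]   # one fetch per cycle until resolution
    return max(0, fetched_behind - delay_slots)


print(killed_instructions("E"))                  # branch decided in Execute
print(killed_instructions("D"))                  # jump, or branch decided in Decode
print(killed_instructions("E", delay_slots=1))   # Execute, one delay slot
print(killed_instructions("D", delay_slots=1))   # Decode, one delay slot
```

The four calls return 2, 1, 1 and 0, matching the number of killed rows in the corresponding execution flows.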


Imagine solutions for branch (4/4)
Branch prediction (outcome and target address)
What if we could predict branch outcome and target address?
More on this later

[Figure: pipeline datapath with a branch & jump prediction block selecting the next PC among PC + 4, PC + 2 × imm and (r + imm) AND 0xfffffffe]
