Pipeline Review: Here Is The Example Instruction Sequence Used To Illustrate Pipelining On The Previous Page

The document describes data hazards that can occur in a pipelined processor when instructions have dependencies between register values. An example code sequence is given that contains read-after-write dependencies that would cause problems in a pipelined datapath. Specifically, instructions further down the pipeline rely on register values that have not been written yet. Two solutions are proposed: inserting no-ops to stall the pipeline, or forwarding register values between stages of the pipeline before they are written to the register file.

Uploaded by bsudheertec

Pipeline Review

[Figure: pipelined datapath. The PC, instruction memory, register file, sign extend, shift left 2, ALU, and data memory are separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. The Control unit's EX, M, and WB signal groups travel through the pipeline registers. Labeled control signals: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemToReg.]

Our examples are too simple


Here is the example instruction sequence used to
illustrate pipelining on the previous page
lw $8, 4($29)
sub $2, $4, $5
and $9, $10, $11
or $16, $17, $18
add $13, $14, $0

The instructions in this example are independent
• Each instruction reads and writes completely different
registers
• Our datapath handles this sequence easily
But most sequences of instructions are not
independent!
An example with dependences
Read after Write dependences

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Dependences are a property of how the
computation is expressed

An example with dependences


sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

There are several dependences in this code fragment
• The first instruction, SUB, stores a value into $2
• That register is used as a source in the rest of the instructions
This is no problem for 1-cycle and multicycle datapaths
• Each instruction executes completely before the next begins
• This ensures that instructions 2 through 5 above use the new
value of $2 (the sub result), just as we expect.
How would this code sequence fare in our pipelined
datapath?
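
The dependences listed above can be found mechanically by remembering the most recent writer of each register. The following is a minimal Python sketch and is not part of the original slides; the tuple encoding of instructions and the name raw_dependences are assumptions made for illustration.

```python
# Each instruction is (text, destination register or None, source registers).
# This encoding is a hypothetical one chosen for the example.
PROGRAM = [
    ("sub $2, $1, $3",  2,    (1, 3)),
    ("and $12, $2, $5", 12,   (2, 5)),
    ("or $13, $6, $2",  13,   (6, 2)),
    ("add $14, $2, $2", 14,   (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),  # a store writes no register
]

def raw_dependences(program):
    """Return (producer index, consumer index, register) triples."""
    deps = []
    last_writer = {}  # register number -> index of its most recent writer
    for i, (_, dest, srcs) in enumerate(program):
        for reg in sorted(set(srcs)):
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None and dest != 0:  # $0 is hardwired to zero
            last_writer[dest] = i
    return deps

DEPS = raw_dependences(PROGRAM)
for producer, consumer, reg in DEPS:
    print(f"{PROGRAM[consumer][0]} reads ${reg}, written by {PROGRAM[producer][0]}")
```

For the fragment above this reports four dependences, all on $2 and all produced by the SUB.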
Data hazards in the pipeline diagram
Clock cycle          1    2    3    4    5    6    7    8    9
sub $2, $1, $3       IF   ID   EX   MEM  WB
and $12, $2, $5           IF   ID   EX   MEM  WB
or $13, $6, $2                 IF   ID   EX   MEM  WB
add $14, $2, $2                     IF   ID   EX   MEM  WB
sw $15, 100($2)                          IF   ID   EX   MEM  WB

The SUB does not write to register $2 until clock cycle 5,
causing two data hazards in our pipelined datapath
• The AND reads register $2 in cycle 3. Since SUB hasn’t
modified the register yet, this is the old value of $2
• Similarly, the OR instruction uses register $2 in cycle 4, again
before it’s actually updated by SUB
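
The cycle arithmetic behind these two hazards can be checked with a few lines of Python. This is a sketch under stated assumptions: one instruction enters the pipeline per cycle, nothing stalls, and a register read in the same cycle as the write sees the new value. The names stage_cycle and reads_stale_value are hypothetical.

```python
# Stage order of the five-stage pipeline in the diagram.
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def stage_cycle(index, stage):
    """Clock cycle (1-based) in which instruction `index` (0-based) occupies
    `stage`, assuming one instruction issues per cycle and nothing stalls."""
    return index + 1 + STAGES.index(stage)

def reads_stale_value(producer, consumer):
    """True if the consumer reads its registers (ID) in an earlier cycle than
    the producer writes its result (WB). A read in the same cycle as the
    write is assumed to see the new value."""
    return stage_cycle(consumer, "ID") < stage_cycle(producer, "WB")

NAMES = ["sub", "and", "or", "add", "sw"]
STALE = [NAMES[c] for c in range(1, len(NAMES)) if reads_stale_value(0, c)]
print(STALE)  # ['and', 'or']
```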

Things that are okay


Clock cycle          1    2    3    4    5    6    7    8    9
sub $2, $1, $3       IF   ID   EX   MEM  WB
and $12, $2, $5           IF   ID   EX   MEM  WB
or $13, $6, $2                 IF   ID   EX   MEM  WB
add $14, $2, $2                     IF   ID   EX   MEM  WB
sw $15, 100($2)                          IF   ID   EX   MEM  WB

The ADD is okay, because of the register file design
• Registers are written at the beginning of a clock cycle
• The new value will be available by the end of that cycle
The SW is no problem at all, since it reads $2 after the
SUB finishes
One Solution To Data Hazards
Original code          With no-ops inserted
sub $2, $1, $3         sub $2, $1, $3
and $12, $2, $5        sll $0, $0, $0
or $13, $6, $2         sll $0, $0, $0
add $14, $2, $2        and $12, $2, $5
sw $15, 100($2)        or $13, $6, $2
                       add $14, $2, $2
                       sw $15, 100($2)

Since it takes two instruction cycles to get the value stored,
one solution is for the assembler to insert no-ops or for
compilers to reorder instructions to do useful work while
the pipeline proceeds
A software solution to data hazards
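
A minimal Python sketch of the no-op insertion an assembler might perform is given below. It is an illustration, not the slides' own tool: the instruction encoding, the name insert_nops, and the default gap of three instruction slots (two no-ops between adjacent dependent instructions, matching the listing above) are assumptions.

```python
# (text, destination register or None, source registers); hypothetical encoding.
PROGRAM = [
    ("sub $2, $1, $3",  2,    (1, 3)),
    ("and $12, $2, $5", 12,   (2, 5)),
    ("or $13, $6, $2",  13,   (6, 2)),
    ("add $14, $2, $2", 14,   (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]
NOP = ("sll $0, $0, $0", None, ())  # the no-op used on the slide

def insert_nops(program, min_gap=3):
    """Pad with no-ops so every consumer sits at least `min_gap`
    instruction slots after the instruction that produces its operand."""
    out = []
    last_write = {}  # register -> position in `out` of its latest writer
    for instr in program:
        _, dest, srcs = instr
        # Earliest position at which all source registers are safe to read.
        earliest = max((last_write[r] + min_gap for r in srcs if r in last_write),
                       default=0)
        while len(out) < earliest:
            out.append(NOP)
        if dest is not None and dest != 0:
            last_write[dest] = len(out)
        out.append(instr)
    return out

PADDED = insert_nops(PROGRAM)
for text, _, _ in PADDED:
    print(text)
```

Only the AND needs padding here; once it has been pushed back by two slots, the OR, ADD, and SW are already far enough from the SUB.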

A fancier pipeline diagram


Clock cycle          1    2    3    4    5    6    7    8    9
sub $2, $1, $3       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg
or $13, $6, $2                 IM   Reg  ALU  DM   Reg
add $14, $2, $2                     IM   Reg  ALU  DM   Reg
sw $15, 100($2)                          IM   Reg  ALU  DM   Reg

Forwarding
Since the pipeline registers already contain the ALU
result, we could just forward the value to later
instructions, to prevent data hazards
• In clock cycle 4, the AND instruction can get the value of $1 - $3 from the EX/MEM pipeline register used by SUB
• Then in cycle 5, the OR can get that same result from the MEM/WB pipeline register being used by SUB
Clock cycle          1    2    3    4    5    6    7
sub $2, $1, $3       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg
or $13, $6, $2                 IM   Reg  ALU  DM   Reg

Forwarding Implementation
Forwarding requires …
(a)  Recognizing when a potential data hazard
exists, and
(b) Revising the pipeline to introduce
forwarding paths …
We’ll do those revisions next time
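
As a preview of step (a), the check below is one common textbook formulation of hazard recognition for a single ALU operand. It is an assumption, not the revision these slides defer to next time; the function name forward_select and its string return values are hypothetical.

```python
def forward_select(id_ex_rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Pick where an ALU operand should come from.

    id_ex_rs  : source register number of the instruction now in EX
    ex_mem_rd : destination register of the instruction now in MEM
    mem_wb_rd : destination register of the instruction now in WB
    The *_regwrite flags say whether those instructions write a register.
    """
    # The most recent producer wins, so EX/MEM is checked first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"
    return "REGFILE"  # no hazard: use the value read in ID

# Cycle 4: AND (reads $2) is in EX, SUB (writes $2) is in MEM.
AND_SOURCE = forward_select(2, 2, True, 0, False)
# Cycle 5: OR (reads $2) is in EX, AND (writes $12) is in MEM, SUB is in WB.
OR_SOURCE = forward_select(2, 12, True, 2, True)
print(AND_SOURCE, OR_SOURCE)  # EX/MEM MEM/WB
```

The test against register 0 reflects that $0 always reads as zero, so a result "written" to it is never forwarded.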

What about stores?
Two “easy” cases:
Clock cycle          1    2    3    4    5    6
add $1, $2, $3       IM   Reg  ALU  DM   Reg
sw $4, 0($1)              IM   Reg  ALU  DM   Reg

Clock cycle          1    2    3    4    5    6
add $1, $2, $3       IM   Reg  ALU  DM   Reg
sw $1, 0($4)              IM   Reg  ALU  DM   Reg

What about stores?


A harder case:
Clock cycle          1    2    3    4    5    6
lw $1, 0($2)         IM   Reg  ALU  DM   Reg
sw $1, 0($4)              IM   Reg  ALU  DM   Reg

In what cycle is:
• The load value available?
• The store value needed?
What do we have to add to the datapath?


Load/Store Bypassing: Extends Datapath
By cycling the result of Read data back to be the value for Write
data, the combination can operate at normal pipeline
speeds … until there is a cache miss!
Sequence:
lw $1, 0($2)
sw $1, 0($4)
[Figure: data memory (Address, Write data, Read data) between the EX/MEM and MEM/WB pipeline registers, with a multiplexer controlled by ForwardC.]

Stalls and flushes


We have seen data hazards can occur in pipelined CPUs
when instructions depend upon others still executing
• Many hazards can be resolved by forwarding data from the
pipeline registers, instead of waiting for the writeback stage
• The pipeline continues running at full speed, with one
instruction beginning on every clock cycle
Now, we’ll see some real limitations of pipelining
• Forwarding may not work for data hazards from load
instructions
• Branches affect the instruction fetch for the next clock cycle
In both of these cases we may need to slow down, or
stall, the pipeline

What about loads?
Imagine if the first instruction in the example was LW
instead of SUB
• The load data doesn’t come from memory until the end of
cycle 4
• But the AND needs that value at the beginning of the same
cycle!
This is a “true” data hazard: the data is simply not
available when it’s needed
Clock cycle          1    2    3    4    5    6
lw $2, 20($3)        IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg


Stalling
The easiest solution is to stall the pipeline
We can delay the AND instruction by introducing a 1
cycle delay in the pipeline, often called a bubble

Clock cycle          1    2    3    4    5    6    7
lw $2, 20($3)        IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  --   ALU  DM   Reg
(-- marks the bubble)

Notice that we’re still using forwarding in cycle 5, to get
data from the MEM/WB pipeline register to the ALU

Stalling and forwarding
Without forwarding, we’d have to stall for two cycles to
wait for the LW instruction’s writeback stage.
Clock cycle          1    2    3    4    5    6    7    8
lw $2, 20($3)        IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   --   --   Reg  ALU  DM   Reg
(-- marks a bubble)

In general, you can always stall to avoid hazards, but
dependencies are very common in real code, and
stalling will often reduce performance significantly
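
The one-cycle versus two-cycle stall counts above follow from how far a consumer must trail its producer. The sketch below encodes that rule. It assumes the five-stage pipeline and same-cycle register write/read of these slides; stalls_needed and total_cycles are hypothetical names.

```python
def stalls_needed(distance, producer_is_load, forwarding):
    """Bubbles required for a consumer `distance` instructions after its
    producer (1 means the very next instruction)."""
    if forwarding:
        # ALU results can be forwarded to the next instruction; load data
        # only arrives at the end of MEM, one cycle later.
        required = 2 if producer_is_load else 1
    else:
        # The consumer's register read must wait for the producer's WB.
        required = 3
    return max(0, required - distance)

def total_cycles(instructions, stalls):
    """Cycles to drain a five-stage pipeline: fill time plus stalls."""
    return instructions + 4 + stalls

WITH_FWD = stalls_needed(1, producer_is_load=True, forwarding=True)
WITHOUT_FWD = stalls_needed(1, producer_is_load=True, forwarding=False)
print(total_cycles(2, WITH_FWD), total_cycles(2, WITHOUT_FWD))  # 7 8
```

The 7 and 8 match the widths of the two LW/AND diagrams: seven clock cycles with forwarding, eight without.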


Stalling delays the entire pipeline


If we delay the 2nd instruction, we must delay the 3rd too
• This is necessary to make forwarding work between AND and
OR
• It also prevents problems such as two instructions trying to
write to the same register in the same cycle.
Clock cycle          1    2    3    4    5    6    7    8
lw $2, 20($3)        IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  --   ALU  DM   Reg
or $13, $12, $2                IM   --   Reg  ALU  DM   Reg
(-- marks a bubble)
Implementing stalls
To implement a stall we force the two instructions after
LW to remain in their ID & IF stages for 1 extra cycle
Clock cycle          1    2    3    4    5    6    7    8
lw $2, 20($3)        IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  Reg  ALU  DM   Reg
or $13, $12, $2                IM   IM   Reg  ALU  DM   Reg

This is easily accomplished
• Don’t update the IF/ID register, so the ID stage is repeated
• Don’t update the PC, so the current IF stage is repeated

What about EX, MEM, WB?


But what about the ALU during cycle 4, the data
memory in cycle 5, and the register file write in cycle
6?
Clock cycle          1    2    3    4    5    6    7    8
lw $2, 20($3)        IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  Reg  ALU  DM   Reg
or $13, $12, $2                IM   IM   Reg  ALU  DM   Reg

Those units aren’t used in those cycles because of the
stall, so we can set the EX, MEM and WB control
signals to all 0s … the bubble “bubbles” through
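
The three stall actions (hold the PC, hold IF/ID, zero the control signals) can be tied to a single load-use check. The sketch below uses one common textbook condition, which these slides do not spell out, so treat it as an assumption; StallControl and hazard_detect are hypothetical names.

```python
from typing import NamedTuple

class StallControl(NamedTuple):
    pc_write: bool       # False: PC is not updated, so IF is repeated
    if_id_write: bool    # False: IF/ID is not updated, so ID is repeated
    insert_bubble: bool  # True: EX, MEM and WB control signals are set to 0

def hazard_detect(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Stall when the instruction in EX is a load whose destination (rt)
    is a source register of the instruction currently in ID."""
    load_use = bool(id_ex_mem_read
                    and id_ex_rt != 0
                    and id_ex_rt in (if_id_rs, if_id_rt))
    return StallControl(pc_write=not load_use,
                        if_id_write=not load_use,
                        insert_bubble=load_use)

# Cycle 3 of the example: lw $2, 20($3) is in EX, and $12, $2, $5 is in ID.
LW_AND = hazard_detect(True, 2, 2, 5)
print(LW_AND)
```

With a non-load instruction in EX, the same call returns pc_write=True, if_id_write=True, insert_bubble=False, and the pipeline advances normally.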

Stall = Nop conversion
Clock cycle          1    2    3    4    5    6    7    8
lw $2, 20($3)        IM   Reg  ALU  DM   Reg
and -> nop                IM   Reg  ALU  DM   Reg
and $12, $2, $5                IM   Reg  ALU  DM   Reg
or $13, $12, $2                     IM   Reg  ALU  DM   Reg

The effect of a load stall is to insert an empty or
nop instruction into the pipeline
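
Seen this way, a load stall is equivalent to rewriting the instruction stream. The Python sketch below inserts a nop after every load whose result is used by the very next instruction, assuming forwarding handles all other dependences. The instruction encoding and the name insert_load_bubbles are hypothetical.

```python
# (text, destination register, source registers, is_load); hypothetical encoding.
PROGRAM = [
    ("lw $2, 20($3)",   2,  (3,),    True),
    ("and $12, $2, $5", 12, (2, 5),  False),
    ("or $13, $12, $2", 13, (12, 2), False),
]
BUBBLE = ("nop", 0, (), False)

def insert_load_bubbles(program):
    """Return the stream the pipeline effectively executes: one nop after
    each load that is immediately followed by a user of its result."""
    out = []
    for instr in program:
        if out:
            _, prev_dest, _, prev_is_load = out[-1]
            if prev_is_load and prev_dest != 0 and prev_dest in instr[2]:
                out.append(BUBBLE)
        out.append(instr)
    return out

STREAM = insert_load_bubbles(PROGRAM)
TOTAL_CYCLES = len(STREAM) + 4  # five stages: the last instruction needs 4 more cycles
print([text for text, *_ in STREAM], TOTAL_CYCLES)
```

For the three-instruction example the stream becomes lw, nop, and, or, which drains in 8 cycles, as in the diagram.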