RISC Pipeline
Han Wang
CS3410, Spring 2010
Computer Science
Cornell University
See: P&H Chapter 4.6
Homework 2
[Figure: Homework 2 chart, axis ticks 0 to 9]
Announcements
- Homework 2 due tomorrow midnight
- Programming Assignment 1 release tomorrow
  - Pipelined MIPS processor (topic of today)
  - Subset of MIPS ISA
- Feedback
  - We want to hear from you!
  - Content?
Absolute Jump
[Figure: single-cycle datapath extended for jump-and-link: PC, +4 adders, Prog. Mem (inst), Reg. File (5-bit register indices), control, imm ext, offset, tgt, =? cmp, ALU, Data Mem (addr). Note on figure: could have used ALU for link add]

op    mnemonic    description
0x3   JAL target  r31 = PC+8 (+8 due to branch delay slot)
                  PC = (PC+4)[31..28] || (target << 2)
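The JAL description above can be checked with a small sketch. The function name `jal` and the sample addresses are illustrative assumptions, not part of the slides; `target` stands for the 26-bit field in the instruction word.

```python
def jal(pc, target):
    """Return (new_pc, link) for JAL.

    pc     -- address of the JAL instruction (32-bit)
    target -- 26-bit target field of the instruction word
    """
    link = (pc + 8) & 0xFFFFFFFF                    # r31 = PC+8 (skips the delay slot)
    upper = (pc + 4) & 0xF0000000                   # (PC+4)[31..28]
    new_pc = upper | ((target & 0x03FFFFFF) << 2)   # upper bits || (target << 2)
    return new_pc, link


# Hypothetical example values, chosen only for illustration.
new_pc, link = jal(pc=0x00400010, target=0x0100040)
print(hex(new_pc), hex(link))
```

The top four bits come from PC+4, so a jump of this kind stays within the 256 MB region of the instruction that follows it.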
A Processor
Review: Single cycle processor
[Figure: single-cycle datapath: PC, new pc, +4 adders, inst memory, register file, control, imm extend, offset, target, =? cmp, alu, data memory (addr, din, dout)]
Single Cycle Processor
Advantages
- A single cycle per instruction makes logic and clocking simple
Disadvantages
- Since instructions take different times to finish, memory and functional units are not efficiently utilized.
- Cycle time is the longest delay.
  - Load instruction
- Best possible CPI is 1
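To make the "cycle time is the longest delay" point concrete, here is a minimal sketch. The component delays and the table of which components each instruction class uses are hypothetical values assumed for illustration; they do not come from the slides.

```python
# Hypothetical component delays in picoseconds (assumed for illustration only).
DELAY = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Which components each instruction class actually needs (also an assumption).
USES = {
    "alu":    ["IF", "ID", "EX", "WB"],
    "load":   ["IF", "ID", "EX", "MEM", "WB"],
    "store":  ["IF", "ID", "EX", "MEM"],
    "branch": ["IF", "ID", "EX"],
}


def path_delay(kind):
    """Total delay through the components this instruction class uses."""
    return sum(DELAY[s] for s in USES[kind])


# A single-cycle clock must be long enough for the slowest instruction.
single_cycle_period = max(path_delay(k) for k in USES)
slowest = max(USES, key=path_delay)

for kind in USES:
    idle = single_cycle_period - path_delay(kind)
    print(f"{kind:7} needs {path_delay(kind)} ps, idles {idle} ps per cycle")
print("clock period set by:", slowest)
```

Under these assumed numbers the load sets the clock period, and every other instruction class leaves part of each cycle unused.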
Pipeline Hazards
[Figure: timeline marked 0h, 1h, 2h, 3h]
A Processor
[Figure: datapath divided into five stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back. Labels: PC, new pc, +4, inst memory, register file, control, imm extend, alu, compute jump/branch targets, data memory (addr, din, dout)]
Basic Pipeline
Five stage "RISC" load-store architecture
1. Instruction fetch (IF)
   - get instruction from memory, increment PC
2. Instruction Decode (ID)
   - translate opcode into control signals and read registers
3. Execute (EX)
   - perform ALU operation, compute jump/branch targets
4. Memory (MEM)
   - access memory if needed
5. Writeback (WB)
   - update register file
Slides thanks to Sally McKee & Kavita Bala
Pipelined Implementation
- Break instructions across multiple clock cycles (five, in this case)
- Design a separate stage for the execution performed during each clock cycle
- Add pipeline registers to isolate signals between different stages
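The role of the pipeline registers can be sketched as follows: on every clock edge, each register latches whatever the stage before it produced. The register names match the slides; the `tick` helper and the five-instruction program list are illustrative assumptions, and each instruction is reduced to its mnemonic.

```python
STAGE_REGS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def tick(regs, fetched):
    """Advance one clock edge: every pipeline register latches the value
    held by the register before it, and 'fetched' enters IF/ID."""
    values = [fetched] + [regs[name] for name in STAGE_REGS[:-1]]
    return dict(zip(STAGE_REGS, values))


regs = {name: "nop" for name in STAGE_REGS}
program = ["add", "nand", "lw", "add", "sw"]   # assumed example program
history = []
for cycle in range(1, 10):
    fetched = program[cycle - 1] if cycle <= len(program) else "nop"
    regs = tick(regs, fetched)
    history.append(regs)
    print(cycle, regs)
```

Because all registers update together from the old values, an instruction moves exactly one stage per cycle and never sees signals from a neighbouring stage's instruction.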
Pipelined Processor
[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB, carrying ctrl. Labels: PC, new pc, +4, inst memory, register file, control, imm extend, alu, B, compute jump/branch targets, data memory (addr, din, dout)]
IF
Stage 1: Instruction Fetch
Fetch a new instruction every cycle
- Current PC is index to instruction memory
- Increment the PC at end of cycle (assume no branches for now)
Write values of interest to pipeline register (IF/ID)
- Instruction bits (for later decoding)
- PC+4 (for later computing branch targets)
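A minimal sketch of the fetch step described above, assuming the instruction memory is a Python list of 32-bit words and that there are no branches. The function name `fetch`, the dictionary layout of IF/ID, and the two sample instruction words are assumptions made for illustration.

```python
def fetch(pc, inst_mem):
    """One IF step: read the instruction at pc and build the IF/ID contents."""
    inst = inst_mem[pc // 4]           # byte address -> word index
    pc_plus_4 = pc + 4
    if_id = {"inst": inst, "pc_plus_4": pc_plus_4}
    return if_id, pc_plus_4            # no branches yet, so next PC is PC+4


# Two MIPS-encoded words used as sample contents: add r3,r1,r2 and lw r4,20(r2).
inst_mem = [0x00221820, 0x8C440014]
pc = 0
for _ in range(2):
    if_id, pc = fetch(pc, inst_mem)
    print(hex(if_id["inst"]), if_id["pc_plus_4"])
```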
IF
[Figure: IF stage datapath: PC, +4, instruction memory (addr, mc, WE; 00 = read word), new pc selected by pcsel among pcreg, pcrel, pcabs; IF/ID holds inst and PC+4; rest of pipeline to the right]
ID
Stage 2: Instruction Decode
On every cycle:
- Read IF/ID pipeline register to get instruction bits
- Decode instruction, generate control signals
- Read from register file
Write values of interest to pipeline register (ID/EX)
- Control information, Rd index, immediates, offsets, ...
- Contents of Ra, Rb
- PC+4 (for computing branch targets later)
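The decode step can be sketched using the standard MIPS field positions (opcode in bits 31..26, rs in 25..21, rt in 20..16, rd in 15..11, immediate in 15..0). Mapping Ra to rs and Rb to rt, the dictionary layout of ID/EX, and the register values are assumptions for illustration; control-signal generation is left out.

```python
def sign_extend16(x):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return x - 0x10000 if x & 0x8000 else x


def id_stage(if_id, regfile):
    """Split the instruction into fields and read the register file."""
    inst = if_id["inst"]
    ra = (inst >> 21) & 0x1F            # rs field, bits 25..21
    rb = (inst >> 16) & 0x1F            # rt field, bits 20..16
    return {
        "op": (inst >> 26) & 0x3F,      # opcode; control signals would be derived from it
        "funct": inst & 0x3F,           # R-type function code
        "Rd": (inst >> 11) & 0x1F,      # R-type destination (I-type uses rt instead)
        "imm": sign_extend16(inst & 0xFFFF),
        "A": regfile[ra],               # contents of Ra
        "B": regfile[rb],               # contents of Rb
        "pc_plus_4": if_id["pc_plus_4"],
    }


regfile = [0, 36, 9, 12, 18, 7, 41, 22] + [0] * 24   # assumed register contents
id_ex = id_stage({"inst": 0x00221820, "pc_plus_4": 4}, regfile)   # add r3, r1, r2
print(id_ex)
```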
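ID
[Figure: ID stage datapath between IF/ID and ID/EX (Stage 1: Instruction Fetch on the left, rest of pipeline on the right): inst feeds decode and imm extend; register file with Ra, Rb read ports (outputs A, B) and Rd, D, WE write port driven by dest and result; ID/EX holds ctrl, PC+4, imm, A, B]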
EX
Stage 3: Execute
On every cycle:
- Read ID/EX pipeline register to get values and control bits
- Perform ALU operation
- Compute targets (PC+4+offset, etc.) in case this is a branch
- Decide if jump/branch should be taken
Write values of interest to pipeline register (EX/MEM)
- Control information, Rd index, ...
- Result of ALU operation
- Value in case this is a memory store instruction
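The execute step listed above might look like the following sketch. The dictionary keys, the string opcodes, and the use of `beq` as the example branch are assumptions for illustration; the branch target follows the MIPS convention PC+4 + (offset << 2).

```python
def ex_stage(id_ex):
    """Perform the ALU operation and the branch computations for one instruction."""
    op, a, b, imm = id_ex["op"], id_ex["A"], id_ex["B"], id_ex["imm"]
    if op == "add":
        result = a + b
    elif op == "nand":
        result = ~(a & b)
    elif op in ("lw", "sw"):
        result = a + imm                      # effective memory address
    else:
        result = 0
    target = id_ex["pc_plus_4"] + (imm << 2)  # PC+4+offset, used only by branches
    taken = op == "beq" and a == b            # decide whether the branch is taken
    return {"op": op, "Rd": id_ex["Rd"], "D": result, "B": b,
            "target": target, "taken": taken}


ex_mem = ex_stage({"op": "lw", "A": 9, "B": 18, "imm": 20, "pc_plus_4": 12, "Rd": 4})
print(ex_mem)
```

The B value is carried along unchanged because a store needs it as the data to write in the next stage.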
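EX
[Figure: EX stage datapath between ID/EX and EX/MEM (Stage 2: Instruction Decode on the left, rest of pipeline on the right): ID/EX holds ctrl, PC+4, imm, A, B; alu; adder (+) and concatenation (||) for targets; branch? logic; outputs pcrel, pcabs, pcreg, pcsel; EX/MEM holds ctrl, D, B]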
MEM
Stage 4: Memory
On every cycle:
- Read EX/MEM pipeline register to get values and control bits
- Perform memory load/store if needed
  - address is ALU result
Write values of interest to pipeline register (MEM/WB)
- Control information, Rd index, ...
- Result of memory operation
- Pass result of ALU operation
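A minimal sketch of the memory step, assuming data memory is a Python dict from address to word and that the EX/MEM register is a dict with keys `op`, `Rd`, `D` (ALU result) and `B` (store value). These names and the sample values are illustrative assumptions.

```python
def mem_stage(ex_mem, data_mem):
    """Load or store if needed; the address is the ALU result D."""
    addr = ex_mem["D"]
    m = None
    if ex_mem["op"] == "lw":
        m = data_mem.get(addr, 0)         # result of the memory operation
    elif ex_mem["op"] == "sw":
        data_mem[addr] = ex_mem["B"]      # store the value carried from EX
    # D is passed through so ALU instructions can write it back later.
    return {"op": ex_mem["op"], "Rd": ex_mem["Rd"], "M": m, "D": ex_mem["D"]}


data_mem = {29: 99}                       # assumed memory contents
mem_wb = mem_stage({"op": "lw", "Rd": 4, "D": 29, "B": 18}, data_mem)
print(mem_wb)
```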
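MEM
[Figure: MEM stage datapath between EX/MEM and MEM/WB (Stage 3: Execute on the left, rest of pipeline on the right): EX/MEM holds ctrl, B, D; data memory (addr, din, dout, mc); MEM/WB holds ctrl, M, D]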
WB
Stage 5: Write-back
On every cycle:
- Read MEM/WB pipeline register to get values and control bits
- Select value and write to register file
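The "select value and write" step can be sketched as below. The MEM/WB layout (keys `op`, `Rd`, `M` for the memory result, `D` for the ALU result) and the string opcodes are assumptions for illustration.

```python
def wb_stage(mem_wb, regfile):
    """Select the value to write back and update the register file."""
    op = mem_wb["op"]
    if op == "lw":
        regfile[mem_wb["Rd"]] = mem_wb["M"]   # loads write the memory result
    elif op in ("add", "nand"):
        regfile[mem_wb["Rd"]] = mem_wb["D"]   # ALU instructions write the ALU result
    # stores, branches and nops write nothing


regfile = [0, 36, 9, 45, 18, 7, -3, 22]       # assumed register contents
wb_stage({"op": "lw", "Rd": 4, "M": 99, "D": 29}, regfile)
print(regfile)
```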
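WB
[Figure: WB stage datapath after MEM/WB (Stage 4: Memory on the left): MEM/WB holds ctrl, M, D; outputs result and dest back to the register file]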
[Figure: complete pipelined datapath: PC, +4, inst mem; IF/ID (inst, PC+4); register file (Ra, Rb, Rd, A, B, D) and imm; ID/EX (OP, Rd, PC+4); EX/MEM (OP, Rd); data mem (addr, din, dout); MEM/WB (OP, Rd)]
Example
add  r3, r1, r2;
nand r6, r4, r5;
lw   r4, 20(r2);
add  r5, r2, r5;
sw   r7, 12(r3);
[Figure: animation of the example program (0: add, 1: nand, 2: lw, 3: add, 4: sw) stepping through the pipelined datapath: PC, +4, inst mem, register file r0-r7 (Ra, Rb, Rd, A, B, D), imm, IF/ID, ID/EX, EX/MEM, MEM/WB (OP, Rd, PC+4), data mem (addr, din, dout)]
Time Graphs
Clock cycle  1    2    3    4    5    6    7    8    9
add          IF   ID   EX   MEM  WB
nand              IF   ID   EX   MEM  WB
lw                     IF   ID   EX   MEM  WB
add                         IF   ID   EX   MEM  WB
sw                               IF   ID   EX   MEM  WB
Latency:
Throughput:
Concurrency:
CPI =
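The time graph can be generated and measured with a short sketch. It assumes an ideal five-stage pipeline with one instruction fetched per cycle and no stalls or hazards; the helper names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def time_graph(program):
    """Map each instruction to {clock cycle: stage}, one new fetch per cycle."""
    return [
        (name, {start + k: stage for k, stage in enumerate(STAGES)})
        for start, name in enumerate(program, start=1)
    ]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles until the last instruction leaves the pipeline (no stalls assumed)."""
    return n_stages + n_instructions - 1


program = ["add", "nand", "lw", "add", "sw"]
graph = time_graph(program)
cycles = total_cycles(len(program))
for name, row in graph:
    print(f"{name:5}" + "".join(f"{row.get(c, ''):5}" for c in range(1, cycles + 1)))
print("latency per instruction:", len(STAGES), "cycles")
print("cycles for program:", cycles, "-> CPI", cycles / len(program))
```

Under these assumptions each instruction still takes five cycles from fetch to write-back, up to five instructions are in flight at once, and cycles per instruction tends toward 1 as the program gets longer, since the four fill cycles are amortized.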
Pipelining Recap
Powerful technique for masking latencies
- Logically, instructions execute one at a time
- Physically, instructions execute in parallel
  - Instruction level parallelism
Abstraction promotes decoupling
- Interface (ISA) vs. implementation (Pipeline)
The end
Sample Code (Simple)
Assume eight-register machine
Run the following code on a pipelined datapath
add  3 1 2     ; reg 3 = reg 1 + reg 2
nand 6 4 5     ; reg 6 = ~(reg 4 & reg 5)
lw   4 20(2)   ; reg 4 = Mem[reg2+20]
add  5 2 5     ; reg 5 = reg 2 + reg 5
sw   7 12(3)   ; Mem[reg3+12] = reg 7
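As a reference for what the pipelined run should produce, the sample code can be executed one instruction at a time in a small sketch. The initial register values are the ones shown in the register file of the walkthrough figures; the memory contents `Mem[29] = 99` are assumed so that the load matches the value appearing in those figures. The `run` helper and tuple encoding are illustrative.

```python
def run(program, regs, mem):
    """Execute the program sequentially (no pipelining), following the comments
    next to each instruction in the sample code."""
    for op, x, y, z in program:
        if op == "add":                       # add d a b
            regs[x] = regs[y] + regs[z]
        elif op == "nand":                    # nand d a b
            regs[x] = ~(regs[y] & regs[z])
        elif op == "lw":                      # lw d off(base)
            regs[x] = mem.get(regs[z] + y, 0)
        elif op == "sw":                      # sw src off(base)
            mem[regs[z] + y] = regs[x]
    return regs, mem


program = [
    ("add", 3, 1, 2),
    ("nand", 6, 4, 5),
    ("lw", 4, 20, 2),
    ("add", 5, 2, 5),
    ("sw", 7, 12, 3),
]
regs = [0, 36, 9, 12, 18, 7, 41, 22]          # R0..R7
mem = {29: 99}                                # assumed data memory contents
regs, mem = run(program, regs, mem)
print(regs, mem)
```

No instruction here reads a register before an earlier instruction's write-back has completed in the five-stage pipeline, so the pipelined run should end with the same register and memory contents.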
Slides thanks to Sally McKee
[Figure: pipelined datapath for the sample code: PC, Inst mem, PC+1, target, MUX; register file R0-R7 with regA, regB; instruction fields Bits 0-2, Bits 15-17, Bits 21-23 with MUX for dest; valA, valB, offset; ALU with input MUX; ALU result; Data mem; mdata; write-back MUX (data, dest); pipeline registers IF/ID (instruction), ID/EX, EX/MEM, MEM/WB, each carrying op and dest]
Time: 1
[Figure: pipelined datapath snapshot]
Fetch: add 3 1 2
IF/ID: add 3 1 2; ID/EX: nop; EX/MEM: nop; MEM/WB: nop
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22
Time: 2
[Figure: pipelined datapath snapshot]
Fetch: nand 6 4 5
IF/ID: nand 6 4 5; ID/EX: add (dest 3, valA 36, valB 9); EX/MEM: nop; MEM/WB: nop
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22
Time: 3
[Figure: pipelined datapath snapshot]
Fetch: lw 4 20(2)
IF/ID: lw 4 20(2); ID/EX: nand (dest 6, valA 18, valB 7); EX/MEM: add (dest 3, ALU result 45); MEM/WB: nop
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22
Time: 4
[Figure: pipelined datapath snapshot]
Fetch: add 5 2 5
IF/ID: add 5 2 5; ID/EX: lw (dest 4, valA 9, valB 18, offset 20); EX/MEM: nand (dest 6, ALU result -3); MEM/WB: add (dest 3, ALU result 45)
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22
Time: 5
[Figure: pipelined datapath snapshot]
Fetch: sw 7 12(3)
IF/ID: sw 7 12(3); ID/EX: add (dest 5, valA 9, valB 7); EX/MEM: lw (dest 4, ALU result 29); MEM/WB: nand (dest 6, ALU result -3)
Write-back: add 3 1 2 writes 45 to R3
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 18, R5 = 7, R6 = 41, R7 = 22
Time: 6
[Figure: pipelined datapath snapshot]
No more instructions
ID/EX: sw (valA 45, valB 22, offset 12); EX/MEM: add (dest 5, ALU result 16); MEM/WB: lw (dest 4, ALU result 29, mdata 99)
Write-back: nand 6 4 5 writes -3 to R6
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 18, R5 = 7, R6 = -3, R7 = 22
Time: 7
[Figure: pipelined datapath snapshot]
No more instructions
IF/ID: nop; ID/EX: nop; EX/MEM: sw (ALU result 57, valB 22); MEM/WB: add (dest 5, ALU result 16)
Write-back: lw 4 20(2) writes 99 to R4
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 99, R5 = 7, R6 = -3, R7 = 22
Time: 8
[Figure: pipelined datapath snapshot]
No more instructions
IF/ID: nop; ID/EX: nop; EX/MEM: nop; MEM/WB: sw (stores 22 to Data mem address 57)
Write-back: add 5 2 5 writes 16 to R5
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 99, R5 = 16, R6 = -3, R7 = 22
Time: 9
[Figure: pipelined datapath snapshot]
No more instructions
IF/ID: nop; ID/EX: nop; EX/MEM: nop; MEM/WB: nop (sw 7 12(3) leaves the pipeline)
Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 99, R5 = 16, R6 = -3, R7 = 22