EE108b Lecture 8: Pipelined Processor. Christos Kozyrakis, http://ee108b.stanford.edu

The document discusses pipelining processors to improve performance. It explains that pipelining allows overlapping execution of instructions in different pipeline stages. This improves throughput by keeping all hardware busy at each clock cycle, although individual instruction latency is unchanged. A 5-stage pipeline is presented as an example, with stages for instruction fetch, decode, execute, memory access, and writeback. Pipelining can achieve a speedup equal to the number of pipeline stages, but hazards like branch misprediction can reduce this.

EE108b Lecture 8
Pipelined Processor
Christos Kozyrakis
http://ee108b.stanford.edu

EE108b – Winter 2014 – Lecture 08

Announcements
• Upcoming deadlines
  - HW2, PA2, Lab2
• Midterm exam: Monday 2/20, 6pm-9pm
  - Included: lectures 1-9
  - Closed books, 1 page of notes, green page, calculator
  - Catch up with reading material; utilize office hours
• Review session on Friday
  - 2.15-3.15pm Gates B01

Review: Single Cycle Processor
[Figure: single-cycle MIPS datapath with main control and ALU control; control signals RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite]

Review: Single Cycle Processor
• Pros
  - Simple
  - CPI = 1
• Cons
  - Cycle time is the worst case path → long cycle times
    - Worst case = ?
  - Hardware is underutilized
    - ALU and memory used only for a fraction of clock cycle
    - Not well amortized!
• Best possible CPI is 1

Key Tools for System Architects
1.  Pipelining
2.  Parallelism
3.  Out-of-order execution
4.  Prediction
5.  Caching
6.  Indirection
7.  Amortization
8.  Redundancy
9.  Specialization
10. Focus on the common case

Pipelining
• Overlapping execution
  - Helps throughput, not latency
• Potential speedup = number of pipe stages
• Pipeline rate limited by slowest stage
  - Unbalanced pipe stages reduce speedup
• Fill/drain time reduces speedup
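
The three effects listed above (ideal speedup, slowest-stage limit, fill/drain time) can be checked with a small calculation. This is only a sketch under stated assumptions: the function names are hypothetical, the unpipelined machine is assumed to take the sum of all stage delays per instruction, and pipeline register overhead is ignored.

```python
def pipeline_times(stage_ps, n_instructions):
    """Return (unpipelined_ps, pipelined_ps) for a run of n instructions.

    Assumption: without pipelining, each instruction takes the sum of the
    stage delays. With pipelining, the clock is set by the slowest stage and
    the run needs (stages + n - 1) cycles, which accounts for fill/drain.
    """
    unpipelined = sum(stage_ps) * n_instructions
    cycle = max(stage_ps)
    pipelined = (len(stage_ps) + n_instructions - 1) * cycle
    return unpipelined, pipelined


def speedup(stage_ps, n_instructions):
    unpipelined, pipelined = pipeline_times(stage_ps, n_instructions)
    return unpipelined / pipelined


balanced = [200] * 5                      # five equal stages
unbalanced = [200, 100, 200, 200, 100]    # two stages finish early

print(speedup(balanced, 1_000_000))    # approaches 5, the number of stages
print(speedup(unbalanced, 1_000_000))  # approaches 4: limited by the slowest stage
print(speedup(balanced, 3))            # short runs are dominated by fill/drain
```

With only three instructions the balanced pipeline needs 7 cycles, so the speedup is far below 5 even though every stage is the same length.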

Pipelining the Processor
• 5 stages, one clock cycle per stage
  - IF: instruction fetch from memory
  - ID: instruction decode & register read
  - EX: execute operation or calculate address
  - MEM: access memory operand
  - WB: write result back to register

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw      IF       RF/ID    EX       MEM      WB

Pipelining the Processor
• Overlap instructions in different stages
  - All hardware used all the time
  - Clock cycle is fast
  - CPI is still 1

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw  IF       RF/ID    EX       MEM      WB
2nd lw           IF       RF/ID    EX       MEM      WB
3rd lw                    IF       RF/ID    EX       MEM      WB
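
The staggered diagram above follows a simple rule: instruction i occupies stage s in cycle i + s. A minimal sketch that generates such diagrams is shown here; the helper names `pipeline_diagram` and `render` are made up for illustration, and no stalls are modeled.

```python
STAGES = ["IF", "RF/ID", "EX", "MEM", "WB"]


def pipeline_diagram(names, stages=STAGES):
    """Return (name, cells) rows; instruction i enters the pipeline in cycle i + 1."""
    n_cycles = len(names) + len(stages) - 1
    rows = []
    for i, name in enumerate(names):
        cells = [""] * n_cycles
        for s, stage in enumerate(stages):
            cells[i + s] = stage      # one stage per cycle, shifted by issue order
        rows.append((name, cells))
    return rows


def render(rows):
    """Format the rows as fixed-width text with a cycle header."""
    n_cycles = len(rows[0][1])
    header = f"{'':8}" + "".join(f"{'C' + str(c + 1):<7}" for c in range(n_cycles))
    lines = [header]
    for name, cells in rows:
        lines.append(f"{name:8}" + "".join(f"{cell:<7}" for cell in cells))
    return "\n".join(line.rstrip() for line in lines)


rows = pipeline_diagram(["1st lw", "2nd lw", "3rd lw"])
print(render(rows))
```

Three instructions on five stages need 3 + 5 - 1 = 7 cycles, which matches the seven cycle columns in the slide.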

Pipeline Datapath
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB separating the five stages]

Load: Stage 1 (IF)
[Figure: pipelined datapath; lw in the Instruction Fetch stage]

Load: Stage 2 (ID)
[Figure: pipelined datapath; lw in the Register Fetch stage]

Load: Stage 3 (EX)
[Figure: pipelined datapath; lw in the Execute stage]

Load: Stage 4 (MEM)
[Figure: pipelined datapath; lw in the Memory stage]

Load: Stage 5 (WB)
[Figure: pipelined datapath; lw in the Write Back stage]

Pipeline Control
• Need to control functional units
  - But they are working on different instructions!
• Not a problem
  - Just pipeline the control signals along with data
  - Make sure they line up
• Using labeling conventions often helps
  - Instruction_rf – means this instruction is in RF
  - Every time it gets flopped, changes pipestage
  - Make sure right signals go to the right places

Control Signals
• Same control unit generates signals in ID stage
  - Control signals for EX
    - (ExtOp, ALUSrc, …) used 1 cycle later
  - Control signals for Mem
    - (MemWr, Branch) used 2 cycles later
  - Control signals for WB
    - (MemtoReg, RegWr) used 3 cycles later

Pipelined Control
[Figure: Main Control generates all signals in RF/ID; each signal is carried through the IF/ID, ID/Ex, Ex/MEM and MEM/WB registers up to the stage that uses it]

Signal     RF/ID (_rf)  EX (_ex)  MEM (_mem)  WB (_wb)
ExtOp      x            x
ALUSrc     x            x
ALUOp      x            x
RegDst     x            x
MemWr      x            x         x
Branch     x            x         x
MemtoReg   x            x         x           x
RegWr      x            x         x           x
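
The idea of pipelining the control signals along with the data can be sketched as a bundle that shrinks as it moves through the pipeline registers. The grouping of signals per stage follows the figure above; the function names and the concrete signal values for lw are illustrative assumptions, not taken from the slides.

```python
# Signals consumed in each stage, following the figure above.
EX_SIGNALS = {"ExtOp", "ALUSrc", "ALUOp", "RegDst"}
MEM_SIGNALS = {"MemWr", "Branch"}
WB_SIGNALS = {"MemtoReg", "RegWr"}


def decode(signals):
    """ID stage: main control produces every signal; ID/EX latches them all."""
    return dict(signals)


def advance(bundle, consumed):
    """Latch into the next pipeline register, dropping signals already used."""
    return {name: value for name, value in bundle.items() if name not in consumed}


# Illustrative control values for lw (assumed encoding, hypothetical).
lw_control = {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "add", "RegDst": 0,
              "MemWr": 0, "Branch": 0, "MemtoReg": 1, "RegWr": 1}

id_ex = decode(lw_control)              # used 1 cycle later in EX
ex_mem = advance(id_ex, EX_SIGNALS)     # used 2 cycles later in MEM
mem_wb = advance(ex_mem, MEM_SIGNALS)   # used 3 cycles later in WB

print(sorted(ex_mem))
print(sorted(mem_wb))
```

Because each bundle travels with its own instruction, the signals line up with the data even though every stage is working on a different instruction.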

Putting it All Together: Pipelined Processor
[Figure: pipelined datapath with control; the Control unit output is split into EX, M and WB groups that travel through the ID/EX, EX/MEM and MEM/WB registers; signals shown: PCSrc, RegWrite, Branch, MemWrite, MemRead, MemtoReg, ALUSrc, ALUOp, RegDst]

MIPS ISA designed for pipelining
• All instructions are 32 bits
  - Easier to fetch and decode in one cycle
  - cf. x86: 1- to 15-byte instructions
• Few and regular instruction formats
  - Can decode and read registers in one step
• Load/store addressing
  - Can calculate address in 3rd stage, access memory in 4th stage
• Alignment of memory operands
  - Memory access takes only one cycle

Pipeline Performance
• Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
• Compare pipelined with single-cycle processor

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
ALU ops   200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps
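
The clock periods implied by the table can be derived directly: the single-cycle clock must fit the slowest instruction, while the pipelined clock must fit the slowest stage. The sketch below only restates the table; the dictionary layout and variable names are assumptions for illustration, and unused stages are entered as 0.

```python
# Stage delays in ps: [instr fetch, register read, ALU op, memory access, register write]
STAGE_PS = {
    "lw":  [200, 100, 200, 200, 100],
    "sw":  [200, 100, 200, 200, 0],
    "alu": [200, 100, 200, 0, 100],
    "beq": [200, 100, 200, 0, 0],
}

totals = {name: sum(delays) for name, delays in STAGE_PS.items()}

# Single-cycle: every instruction takes one cycle, so the cycle must fit the slowest one.
single_cycle_tc = max(totals.values())

# Pipelined: every stage takes one cycle, so the cycle must fit the slowest stage.
pipelined_tc = max(max(delays) for delays in STAGE_PS.values())

print(totals)
print(single_cycle_tc, pipelined_tc, single_cycle_tc / pipelined_tc)
```

The ratio is 4 rather than 5 because the register read and write stages need only 100ps but still occupy a full 200ps cycle.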

Pipeline Performance
[Figure: execution timelines for the single-cycle design (Tc = 800ps) and the pipelined design (Tc = 200ps)]

Pipeline Speedup
• If all stages are balanced
  - i.e., all take the same time
  - $\text{Time between instructions}_{\text{pipelined}} = \dfrac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
• If not balanced, speedup is less
• Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease

But Something Feels Wrong
• Why stop at 5 pipeline stages?
  - If pipelining improves Tclock & CPI = 1
  - We should keep subdividing the cycle
• Three issues
  - Some things have to complete in a cycle
  - CPI is not really one
  - Cost (area and power)

Quiz
• Ignoring all other issues, what is the highest clock frequency you can achieve with pipelining?
  - Lowest clock cycle time?
• What are the limiting factors?

Pipeline Hazards
• Situations that prevent starting the next instruction in the next cycle
  - Lead to CPI > 1
• Structural hazard
  - A required resource is busy
• Data hazard
  - Must wait for previous instructions to produce/consume data
• Control hazard
  - Next PC depends on previous instruction

Structural Hazards
• Resource conflict
  - Two instructions use same hardware in the same cycle
• Example: pipeline with a single unified memory
  - No separate instruction & data memories
  - Load/store requires data access
  - One instruction would have to stall for that cycle
    - Which one?
  - Would cause a pipeline “bubble”
• Other examples
  - Functional units that are not fully pipelined (mult, div)

Avoiding Structural Hazards
1.  Do nothing (performance hit)
2.  Replicate resources
    - Separate instruction/data memories, multiported memories, …
3.  Design away the structural stall
    - Use resource once per instruction, always in the same stage
    - Example of bad pipeline arrangement
      - Load uses Register File’s Write port during its 5th stage

                1      2      3      4      5
        Load    IF     RF/ID  EX     MEM    WB

      - R-type uses Register File’s Write port during the 4th stage

                1      2      3      4
        R-type  IF     RF/ID  EX     WB

Structural Hazard Example
• Consider a load followed immediately by an ALU operation
  - Register file only has a single write port
  - But need to write the results of the ALU and the memory back

Cycle   1      2      3      4      5      6      7      8      9
R-type  IF     RF/ID  EX     WB
R-type         IF     RF/ID  EX     WB
Load                  IF     RF/ID  EX     MEM    WB
R-type                       IF     RF/ID  EX     WB
R-type                              IF     RF/ID  EX     WB

Oops! We have a problem! (Load and the R-type right after it both write back in cycle 7.)
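
The write-port conflict in the example above can be found mechanically by computing the cycle in which each instruction reaches its write-back stage. This is a minimal sketch: `write_port_conflicts` and the `WB_STAGE` table are hypothetical names, one instruction is assumed to issue per cycle, and only the register file write port is checked.

```python
from collections import defaultdict

# Stage number in which each instruction kind writes the register file
# (the "bad pipeline arrangement": load in stage 5, R-type in stage 4).
WB_STAGE = {"load": 5, "rtype": 4}


def write_port_conflicts(kinds, wb_stage=WB_STAGE):
    """Return {cycle: [instruction indexes]} for cycles with more than one writer.

    kinds[i] enters IF in cycle i + 1, so it writes in cycle i + wb_stage[kind].
    """
    writers = defaultdict(list)
    for i, kind in enumerate(kinds):
        start_cycle = i + 1
        writers[start_cycle + wb_stage[kind] - 1].append(i)
    return {cycle: idx for cycle, idx in writers.items() if len(idx) > 1}


program = ["rtype", "rtype", "load", "rtype", "rtype"]
print(write_port_conflicts(program))                            # conflict in cycle 7
print(write_port_conflicts(program, {"load": 5, "rtype": 5}))   # no conflict
```

The second call models an arrangement in which every instruction writes in its 5th stage, so no two instructions ever reach the write port in the same cycle.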

Delayed Write-back in 5-stage Pipeline
• Delay R-type register write by one cycle
  - Does this increase the CPI of instruction?
  - What is the cost?

        1      2      3      4      5
R-type  IF     RF/ID  EX     MEM    WB

Cycle   1      2      3      4      5      6      7      8      9
R-type  IF     RF/ID  EX     MEM    WB
R-type         IF     RF/ID  EX     MEM    WB
Load                  IF     RF/ID  EX     MEM    WB
R-type                       IF     RF/ID  EX     MEM    WB
R-type                              IF     RF/ID  EX     MEM    WB

Data Dependencies
• Dependencies for instruction j following instruction i
  - Read after Write (RAW or true dependence)
    - Instruction j tries to read before instruction i tries to write it
  - Write after Write (WAW or output dependence)
    - Instruction j tries to write an operand before i writes its value
  - Write after Read (WAR or anti dependence)
    - Instruction j tries to write a destination before it is read by i
• Dependencies through registers or through memory
• Dependencies are a property of your program (always there)
• Dependencies may lead to hazards on a specific pipeline

Dependency Examples
• True dependency => RAW hazard
    addu $t0, $t1, $t2
    subu $t3, $t4, $t0

• Output dependency => WAW hazard
    addu $t0, $t1, $t2
    subu $t0, $t4, $t5

• Anti dependency => WAR hazard
    addu $t0, $t1, $t2
    subu $t1, $t4, $t5
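
The three examples above can be classified by comparing destination and source registers of the two instructions. The sketch below is an illustration only: the `Instr` type and `dependencies` function are hypothetical, only register dependencies are considered (not memory), and the special case of `$zero` is ignored.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


def dependencies(i, j):
    """Return the dependence types of later instruction j on earlier instruction i."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")   # j reads what i writes (true dependence)
    if i.dest == j.dest:
        found.add("WAW")   # both write the same register (output dependence)
    if j.dest in i.srcs:
        found.add("WAR")   # j overwrites something i reads (anti dependence)
    return found


add = Instr("addu", "$t0", ("$t1", "$t2"))
print(dependencies(add, Instr("subu", "$t3", ("$t4", "$t0"))))
print(dependencies(add, Instr("subu", "$t0", ("$t4", "$t5"))))
print(dependencies(add, Instr("subu", "$t1", ("$t4", "$t5"))))
```

The function reports dependencies, which are a property of the program; whether any of them turns into a hazard depends on the pipeline.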

Analyzing the Problem
• Can an output dependency cause a WAW hazard in 5-stage pipeline?
• Can an anti-dependency cause a WAR hazard in 5-stage pipeline?
• Are these answers universally true?

Dealing with RAW Hazards
• Must keep our “promise” in the instruction set
  - Each instruction fully completes before the next one starts
  - All RAW dependencies are respected
• Pipelining may break this promise
  - Overlapping i and j
  - i writes late in the pipeline (WB); j reads early (ID)
• Must ensure that programmers cannot observe this behavior
  - Without necessarily reverting to single-cycle design…

RAW Hazard Example
• Dependencies backwards in time are hazards

Time (clock cycles)  0      1      2      3      4      5      6      7      8
add r1, r2, r3       Im     Reg    ALU    Dm     Reg
sub r4, r1, r3              Im     Reg    ALU    Dm     Reg
and r6, r1, r7                     Im     Reg    ALU    Dm     Reg
or r8, r1, r9                             Im     Reg    ALU    Dm     Reg
xor r10, r1, r11                                 Im     Reg    ALU    Dm     Reg

(Stages, left to right: IF, ID/RF, EX, MEM, WB. Instr. order runs top to bottom.)

Solutions for RAW Hazards
• Delay the reading instruction until data is available
  - Also called stalling or inserting pipeline bubbles
• How can we delay the younger instruction?
  - Compiler inserts independent work or NOPs ahead of it
    - NOP example: or $0, $0, $0
    - Disadvantage: pipeline-specific binary program
  - Hardware inserts NOPs as needed (interlocks)
    - Advantage: correct operation for all programs/pipelines
    - Disadvantage: may miss some optimization opportunities
  - Most modern machines
    - Hardware inserts NOPs but compiler may try to minimize need

Data Hazard - Stalls
• Eliminate reverse time dependency by stalling

Time (clock cycles)  0      1      2      3      4      5      6      7      8
add r1, r2, r3       Im     Reg    ALU    Dm     Reg
sub r4, r1, r3              Im     bubble bubble bubble Reg    ALU    Dm     Reg
and r6, r1, r7                                          Im     Reg    ALU    Dm
or r8, r1, r9                                                  Im     Reg    ALU
xor r10, r1, r11                                                      Im     Reg

(Stages, left to right: IF, ID/RF, EX, MEM, WB. The last three instructions continue past the cycles shown.)

How to Stall the Pipeline
• Discover need to stall when 2nd instruction is in ID stage
  - Repeat its ID stage until hazard resolved
  - Let all instructions ahead of it move forward
  - Stall all instructions behind it

1.  Force control values in ID/EX register to a NOP instruction
    - As if you fetched or $0, $0, $0
    - When it propagates to EX, MEM and WB, nothing will happen
2.  Prevent update of PC and IF/ID register
    - Using instruction is decoded again
    - Following instruction is fetched again

Performance Effect
• Stalls can have a significant effect on performance
• Consider the following case
  - The ideal CPI of the machine is 1
  - A RAW hazard causes a 3 cycle stall
• If 40% of the instructions cause a stall?
  - The new effective CPI is 1 + 3 × 0.4 = 2.2
  - And the real % is probably higher than 40%
• You get less than ½ the desired performance!

Reducing Stalls
• Key: when can you say new data is actually available?
• In the 5-stage pipeline
  - After WB stage?
  - During WB stage?
    - Register file is typically fast
    - Write in the first half, read in the second half
  - After EX stage?

Decreasing Stalls: Fast RF
• Register file writes on first half and reads on second half

Time (clock cycles)  0      1      2      3      4      5      6      7
add r1, r2, r3       Im     Reg    ALU    Dm     Reg
sub r4, r1, r3              Im     bubble bubble Reg    ALU    Dm     Reg
and r6, r1, r7                                   Im     Reg    ALU    Dm
or r8, r1, r9                                           Im     Reg    ALU
xor r10, r1, r11                                               Im     Reg

(Stages, left to right: IF, ID/RF, EX, MEM, WB. The last three instructions continue past the cycles shown.)

Performance Effect
• Stalls can have a significant effect on performance
• Consider the following case
  - The ideal CPI of the machine is 1
  - A RAW hazard causes a 2 cycle stall
• If 40% of the instructions cause a stall?
  - The new effective CPI is 1 + 2 × 0.4 = 1.8
  - And the real % is probably higher than 40%
• You get a little more than ½ the desired performance!
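
Both Performance Effect calculations (3-cycle and 2-cycle stalls) follow the same formula: effective CPI = ideal CPI + stall fraction × stall cycles. A short sketch, with a hypothetical function name and assuming the clock period is unchanged so that relative performance is ideal CPI divided by effective CPI:

```python
def effective_cpi(ideal_cpi, stall_fraction, stall_cycles):
    """Average cycles per instruction when a fraction of instructions stall."""
    return ideal_cpi + stall_fraction * stall_cycles


for stall_cycles in (3, 2):
    cpi = effective_cpi(1.0, 0.4, stall_cycles)
    # Performance relative to the ideal machine, at the same clock period.
    print(stall_cycles, round(cpi, 2), round(1.0 / cpi, 3))
```

The relative performance comes out at roughly 0.45 for the 3-cycle stall and roughly 0.56 for the 2-cycle stall, in line with "less than ½" and "a little more than ½".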

Reducing Stalls – one step beyond
• Key is to be careful about when
  - Data is actually available as output
  - Data is actually required as an input
• In our example:
  - Data becomes available when add finishes EX stage
    - Cycle 2
  - Data needed by sub at the beginning of its EX stage
    - Cycle 3 (the soonest possible)
  - If you can use this value, the stall for ALU is zero!
• Fastest, but requires more hardware – called forwarding
  - Aka bypassing, short-circuiting

Decreasing Stalls: Forwarding
• “Forward” the data to the appropriate unit

Time (clock cycles)  0      1      2      3      4      5      6      7      8
add r1, r2, r3       Im     Reg    ALU    Dm     Reg
sub r4, r1, r3              Im     Reg    ALU    Dm     Reg
and r6, r1, r7                     Im     Reg    ALU    Dm     Reg
or r8, r1, r9                             Im     Reg    ALU    Dm     Reg
xor r10, r1, r11                                 Im     Reg    ALU    Dm     Reg

(Stages, left to right: IF, ID/RF, EX, MEM, WB. In the figure, arrows forward r1 from add to the instructions that read it.)

Forwarding Limitation: Load-Use Case
• Data is not available yet to be forwarded

Time (clock cycles)  0      1      2      3      4      5      6      7
lw r1, 0(r2)         Im     Reg    ALU    Dm     Reg
sub r4, r1, r6              Im     Reg    ALU    Dm     Reg
and r6, r1, r7                     Im     Reg    ALU    Dm     Reg
or r8, r1, r9                             Im     Reg    ALU    Dm     Reg

(Stages, left to right: IF, ID/RF, EX, MEM, WB. The loaded value leaves Dm at the end of cycle 3, but sub needs it at the start of its ALU stage in cycle 3.)

Load-Use Case: Hardware Stall
• A pipeline interlock checks and stops the instruction issue

Time (clock cycles)  0      1      2      3      4      5      6      7      8
lw r1, 0(r2)         Im     Reg    ALU    Dm     Reg
sub r4, r1, r3              Im     Reg    bubble ALU    Dm     Reg
and r6, r1, r7                     Im     bubble Reg    ALU    Dm     Reg
or r8, r1, r9                                    Im     Reg    ALU    Dm     Reg

(Stages, left to right: IF, ID/RF, EX, MEM, WB.)
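
The stall counts seen so far (3 bubbles when reading only after WB, 2 with the fast register file, 0 with forwarding of ALU results, 1 for load-use) all come from comparing when a value is first usable with when the consumer would need it. The sketch below is an illustration under assumptions: the function name is hypothetical, cycles are numbered from the producer's IF in cycle 0 as in the diagrams, and the consumer directly follows the producer.

```python
# Cycle of each stage for the producer, which is fetched in cycle 0.
ID, EX, MEM, WB = 1, 2, 3, 4


def stall_cycles(first_usable_cycle, consumer_stage, distance=1):
    """Bubbles needed so the consumer's stage starts no earlier than first_usable_cycle.

    The consumer is fetched `distance` cycles after the producer, so without
    stalls it reaches `consumer_stage` in cycle distance + consumer_stage.
    """
    unstalled_cycle = distance + consumer_stage
    return max(0, first_usable_cycle - unstalled_cycle)


schemes = {
    "read only after WB completes": stall_cycles(WB + 1, ID),
    "write first half, read second half": stall_cycles(WB, ID),
    "forwarding, ALU producer": stall_cycles(EX + 1, EX),
    "forwarding, load producer": stall_cycles(MEM + 1, EX),
}
for name, bubbles in schemes.items():
    print(f"{name}: {bubbles}")
```

Forwarding moves the consumer's point of need from ID to EX and the producer's point of availability from WB to the end of EX (or MEM for a load), which is why only the load-use case still needs a bubble.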

Identifying the Forwarding Datapaths
• Identify all stages that produce new values
  - EX and MEM
• All stages after first producer are sources of forwarding data
  - MEM, WB
• Identify all stages that really consume values
  - EX and MEM
• These stages are the destinations of forwarding data
• Add multiplexor for each pair of source/destination stages
  - Consider both possible instruction operands

Forwarding Paths: Partial

Forwarding Control
• Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs = register number for Rs in ID/EX pipeline register
• ALU operand register numbers in EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt
• Data hazards possible when
  - 1a. EX/MEM.RegisterRd == ID/EX.RegisterRs   (Fwd from EX/MEM pipeline reg)
  - 1b. EX/MEM.RegisterRd == ID/EX.RegisterRt   (Fwd from EX/MEM pipeline reg)
  - 2a. MEM/WB.RegisterRd == ID/EX.RegisterRs   (Fwd from MEM/WB pipeline reg)
  - 2b. MEM/WB.RegisterRd == ID/EX.RegisterRt   (Fwd from MEM/WB pipeline reg)

Forwarding Control
• But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
• And if Rd for that instruction is not $zero
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0
• And if forwarding instruction is not a load in MEM stage
  - EX/MEM.MemToReg == 0
  - This is a case we have to stall…

Forwarding Control (Stall Case not Shown)
• EX hazard
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd == ID/EX.RegisterRs))
      ForwardA = 10
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd == ID/EX.RegisterRt))
      ForwardB = 10
• MEM hazard
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd == ID/EX.RegisterRs))
      ForwardA = 01
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd == ID/EX.RegisterRt))
      ForwardB = 01

Double Data Hazard
• Consider the sequence:
    add $1,$1,$2
    sub $1,$1,$3
    or $1,$1,$4

• Both hazards occur
  - Want to use the most recent result from the sub
• Revise MEM hazard condition
  - Only fwd if EX hazard condition isn’t true

Forwarding Control (Revised)
• MEM hazard
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd == ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd == ID/EX.RegisterRs))
      ForwardA = 01
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd == ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd == ID/EX.RegisterRt))
      ForwardB = 01

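
The EX and revised MEM hazard conditions can be sketched as one priority check per ALU operand: test the EX/MEM register first, and fall back to MEM/WB only if the EX hazard condition is false. This is a sketch, not the datapath's actual logic: the `PipeReg` class and function names are hypothetical, the select value 00 for "no forwarding" is an assumption, and the load stall case is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """The fields of EX/MEM or MEM/WB that the forwarding logic looks at."""
    reg_write: bool = False
    rd: int = 0


def forward_select(src, ex_mem, mem_wb):
    """Return the 2-bit mux select for one ALU operand with register number src."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10   # EX hazard: most recent result, held in EX/MEM
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01   # MEM hazard: only reached when the EX hazard is false
    return 0b00       # assumed encoding for "use the register file value"


def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(id_ex_rs, ex_mem, mem_wb),
            forward_select(id_ex_rt, ex_mem, mem_wb))


# Double data hazard: `or $1,$1,$4` is in EX, `sub` is in EX/MEM, `add` is in MEM/WB.
sub_in_ex_mem = PipeReg(reg_write=True, rd=1)
add_in_mem_wb = PipeReg(reg_write=True, rd=1)
print(forwarding_unit(1, 4, sub_in_ex_mem, add_in_mem_wb))   # (2, 0): ForwardA = 10
```

Ordering the two checks is equivalent to the "and not (EX hazard)" term in the revised MEM condition, so the `or` receives the result of the `sub` rather than the older `add`.
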
Datapath with Forwarding

Load-Use Data Hazard
[Figure: pipeline diagram of a load followed by a dependent instruction, annotated "Need to stall for one cycle"]

Load-Use Hazard Detection
• Check when use instruction is decoded in ID stage
• ALU register numbers in ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
• Load-use hazard when
  - ID/EX.MemRead and
      ((ID/EX.RegisterRt == IF/ID.RegisterRs) or
       (ID/EX.RegisterRt == IF/ID.RegisterRt))
• If detected, stall and insert bubble

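
The detection condition above, together with the two stall steps from "How to Stall the Pipeline", can be sketched as follows. The function names and the control names `pc_write`, `if_id_write` and `id_ex_bubble` are hypothetical labels chosen for this illustration, not signals named in the slides.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in EX is a load whose target is read by the one in ID."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


def stall_controls(hazard):
    """Control actions for this cycle.

    On a hazard: hold PC and IF/ID (so the using instruction is decoded again
    and the following one is fetched again) and force a NOP into ID/EX.
    """
    return {
        "pc_write": not hazard,
        "if_id_write": not hazard,
        "id_ex_bubble": hazard,
    }


# lw r1, 0(r2) is in EX (Rt = 1); sub r4, r1, r3 is in ID (Rs = 1, Rt = 3).
hazard = load_use_hazard(True, 1, 1, 3)
print(hazard, stall_controls(hazard))
```

One bubble is enough here because, after the stall, the loaded value can be forwarded from the MEM/WB register to the ALU input.
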
Datapath with Hazard Detection

Example: Load-Use Stall
[Figure: datapath snapshot; instruction labels, left to right: sub r4, r1, r3; lw r1, 0(r2)]

Example: Load-Use Stall, 1 cycle later
[Figure: datapath snapshot; instruction labels, left to right: sub r4, r1, r3; nop; lw r1, 0(r2)]

Looking Ahead
• Compilers and data hazards
• Control hazards
• Exceptions and interrupts
• Advanced pipelining (CPI < 1.0)
