Comp206 Lecture9

The document discusses data hazards in processors and how forwarding and stalling can resolve them. It describes how the processor detects hazards and how stalling works. Branch hazards and exceptions are also covered.
Chapter 4

The Processor

Chapter 4 — The Processor — 1


§4.7 Data Hazards: Forwarding vs. Stalling
Data Hazards in ALU Instructions
• Consider this sequence:
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

• We can resolve hazards with forwarding
 ◼ How do we detect when to forward?

Dependencies & Forwarding
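The forwarding check for the sub/and/or/add/sw sequence can be sketched in Python. This is only an illustration under stated assumptions: the `Instr` record and the `forwarding_sources` helper are hypothetical names, all producers are treated as ALU instructions, and the register file is assumed to be written in the first half of a cycle and read in the second half, so a producer three instructions back needs no forwarding.

```python
from collections import namedtuple

# Hypothetical instruction record: destination register number (or None)
# and the register numbers the instruction reads.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])

PROGRAM = [
    Instr("sub", 2, (1, 3)),
    Instr("and", 12, (2, 5)),
    Instr("or", 13, (6, 2)),
    Instr("add", 14, (2, 2)),
    Instr("sw", None, (15, 2)),  # sw reads $15 and $2 and writes no register
]


def forwarding_sources(program):
    """For each instruction, map each source register that must be forwarded
    to the pipeline register it is forwarded from."""
    result = []
    for i, instr in enumerate(program):
        forwards = {}
        # Check the nearer producer first so EX/MEM takes priority over MEM/WB.
        for distance, latch in ((1, "EX/MEM"), (2, "MEM/WB")):
            if i < distance:
                continue
            producer = program[i - distance]
            # Instructions without a destination, or writing $0, never forward.
            if producer.dest in (None, 0):
                continue
            if producer.dest in instr.srcs and producer.dest not in forwards:
                forwards[producer.dest] = latch
        result.append(forwards)
    return result


for instr, forwards in zip(PROGRAM, forwarding_sources(PROGRAM)):
    print(instr.op, forwards)
```

Under these assumptions only `and` (from EX/MEM) and `or` (from MEM/WB) need a forwarded $2; `add` and `sw` read it from the register file.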


Datapath with Forwarding



Load-Use Data Hazard

Need to stall for one cycle


How to Stall the Pipeline
• Force control values in ID/EX register to 0
 ◼ EX, MEM and WB do nop (no-operation)
 ◼ NOP ≈ bubble; NOP: sll $0, $0, 0
• Identify the hazard in the ID stage
• Insert a bubble into the pipeline: change the EX, MEM, and WB control fields of the ID/EX pipeline register to 0
 ◼ percolated forward at each clock cycle
 ◼ no registers or memories are written
• Prevent update of PC and IF/ID register
 ◼ Using instruction is decoded again
 ◼ Following instruction is fetched again
 ◼ 1-cycle stall allows MEM to read data for lw
 ◼ Can subsequently forward to EX stage
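The stall steps above can be sketched as a small Python model. It assumes the usual MIPS load-use condition (the load in EX writes a register that the instruction in ID reads); the function names and the dictionary of control signals are hypothetical and stand in for the hazard detection unit.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def apply_hazard_check(stall, decoded_control):
    """Return (control values written to ID/EX, whether PC and IF/ID may update).

    On a stall every control value is forced to 0 (a bubble) and the PC and
    IF/ID register are held, so the using instruction is decoded again.
    """
    if stall:
        return {name: 0 for name in decoded_control}, False
    return dict(decoded_control), True


# Example: lw $2, 20($1) in EX, and $4, $2, $5 in ID.
stall = must_stall(True, 2, 2, 5)
print(stall, apply_hazard_check(stall, {"RegWrite": 1, "MemRead": 0, "MemWrite": 0}))
```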
Datapath with Hazard Detection



Stalls and Performance
The BIG Picture
• Stalls reduce performance
 ◼ But are required to get correct results
• Compiler can arrange code to avoid hazards and stalls
 ◼ Requires knowledge of the pipeline structure


§4.8 Control Hazards
Branch Hazards
Control/branch hazard: delay in determining the proper instruction to fetch

[Figure: pipeline diagram of a branch whose outcome is not yet decided; the instructions fetched after it are flushed (control values set to 0)]

Branch Hazards
• Less frequent than data hazards
• We don’t have an effective approach like forwarding.
 ◼ Static branch prediction: assume branch NOT taken
  ◼ Continue to execute sequentially
  ◼ If prediction is wrong: FLUSH (set control signals to 0, change the instructions in IF, ID and EX stages)
 ◼ Reduce the delay of branches
 ◼ Dynamic branch prediction
  ◼ Look up the addr. of instruction
  ◼ If branch is taken the last time, execute the same sequence of instructions
Reducing Branch Delay
• Move hardware to determine outcome from MEM to ID stage
 ◼ Branch adder moved from EX to ID stage
 ◼ Fewer instructions need to be flushed
 ◼ Many branches rely on simple tests
  ◼ can be done with a few gates of circuitry
 ◼ Target address adder
 ◼ Register comparator

• Example: branch taken
36: sub $10, $4, $8
40: beq $1, $3, 7
44: and $12, $2, $5
48: or $13, $2, $6
52: add $14, $4, $2
56: slt $15, $6, $7
...
72: lw $4, 50($7)

Determines the branch will be taken: Next PC = 40 + 4 + 7*4 = 72

     cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8
sub  IF  ID  EX  MEM WB
beq      IF  ID  EX  MEM WB
and          IF  -   -   -   -        (flushed, becomes NOP)
lw               IF  ID  EX  MEM WB
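The Next PC arithmetic in the example can be checked with a short Python sketch. It assumes the MIPS convention that the 16-bit branch immediate is a signed word offset relative to PC + 4; `sign_extend16` and `branch_target` are hypothetical helper names.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, immediate):
    """Target of a taken branch: PC + 4 plus the word offset times 4."""
    return pc + 4 + sign_extend16(immediate) * 4


print(branch_target(40, 7))  # beq at address 40 with offset 7
```

For the beq at address 40 with offset 7 this gives 40 + 4 + 28 = 72, the address of the lw.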
Example: Branch Taken



Data Hazards for Branches
• If a comparison register is a destination of 2nd or 3rd preceding ALU instruction

add $1, $2, $3       IF  ID  EX  MEM WB
add $4, $5, $6           IF  ID  EX  MEM WB
…                            IF  ID  EX  MEM WB
beq $1, $4, target               IF  ID  EX  MEM WB

◼ Can resolve using forwarding



Data Hazards for Branches
• If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
 ◼ Need 1 stall cycle

lw  $1, addr         IF  ID  EX  MEM WB
add $4, $5, $6           IF  ID  EX  MEM WB
beq stalled                  IF  ID
beq $1, $4, target               ID  EX  MEM WB


Data Hazards for Branches
• If a comparison register is a destination of immediately preceding load instruction
 ◼ Need 2 stall cycles

lw  $1, addr         IF  ID  EX  MEM WB
beq stalled              IF  ID
beq stalled                  ID
beq $1, $0, target               ID  EX  MEM WB
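The three cases on these slides reduce to one rule, sketched below in Python. The sketch assumes the branch compares its registers in ID with forwarding available; `branch_stall_cycles` and its arguments are hypothetical names, with `distance` counting how many instructions before the branch the producer sits (1 means immediately preceding).

```python
def branch_stall_cycles(producer_kind, distance):
    """Stall cycles a branch resolved in ID needs for one comparison register.

    An ALU result can be forwarded to ID once the producer is 2 instructions
    ahead; a loaded value only once the load is 3 instructions ahead.
    """
    needed_distance = {"alu": 2, "load": 3}[producer_kind]
    return max(0, needed_distance - distance)


for kind, distance in (("alu", 2), ("alu", 1), ("load", 2), ("load", 1)):
    print(kind, distance, branch_stall_cycles(kind, distance))
```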


Dynamic Branch Prediction
• Simple branch prediction: assume branch is not taken
• In deeper and superscalar pipelines, branch penalty is more significant
• Use dynamic prediction
 ◼ Branch prediction buffer (aka branch history table)
 ◼ Indexed by recent branch instruction addresses
 ◼ Stores outcome (taken/not taken)
 ◼ To execute a branch
  ◼ Check table, expect the same outcome
  ◼ Start fetching from fall-through or target
  ◼ If wrong, flush pipeline and flip prediction
1-Bit Predictor: Shortcoming
• Even if branch is almost always taken, we can predict wrong twice
• Inner loop branches mispredicted twice!

outer: …
       …
inner: …
       …
       beq …, …, inner
       …
       beq …, …, outer

◼ Mispredict as taken on last iteration of inner loop
◼ Then mispredict as not taken on first iteration of inner loop next time around
2-Bit Predictor
• Only change prediction on two successive mispredictions
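The difference between the two predictors can be seen in a small Python simulation. This is a sketch, not the lecture's hardware: the 2-bit predictor is modelled as a saturating counter, which is one common way to require two successive mispredictions before the prediction changes, and the loop shape (an inner-loop branch taken three times, then not taken, repeated for five outer iterations) is an assumed example.

```python
def mispredictions_1bit(outcomes, initial=True):
    """1-bit predictor: always predict the previous outcome of the branch."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # the stored bit flips on every misprediction
    return misses


def mispredictions_2bit(outcomes, initial=3):
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 taken."""
    state = initial
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Inner-loop branch: taken 3 times, then falls through; 5 outer iterations.
INNER_LOOP = ([True] * 3 + [False]) * 5

print(mispredictions_1bit(INNER_LOOP), mispredictions_2bit(INNER_LOOP))
```

Starting from a "taken" state, the 1-bit predictor misses twice per outer iteration after the first, while the 2-bit counter misses only on the loop exit.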



Calculating the Branch Target
• Even with predictor, still need to calculate the target address
 ◼ 1-cycle penalty for a taken branch
• Branch target buffer
 ◼ Cache of target addresses
 ◼ Indexed by PC when instruction fetched in IF stage
  ◼ If hit and instruction is branch predicted taken, can fetch target immediately
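A branch target buffer can be sketched as a table keyed by the fetch address. The Python class below is a minimal illustration with assumed details: an unbounded dictionary instead of a fixed-size cache, a single taken/not-taken bit per entry, and hypothetical method names.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch's PC to (predicted_taken, target_address)."""

    def __init__(self):
        self.entries = {}

    def next_fetch_address(self, pc):
        """Address to fetch after pc, decided during IF."""
        entry = self.entries.get(pc)
        if entry is not None and entry[0]:
            return entry[1]  # hit and predicted taken: fetch the target at once
        return pc + 4        # miss or predicted not taken: fall through

    def update(self, pc, taken, target):
        """Record the resolved outcome and target of the branch at pc."""
        self.entries[pc] = (taken, target)


btb = BranchTargetBuffer()
print(btb.next_fetch_address(40))  # first encounter: no entry yet
btb.update(40, taken=True, target=72)
print(btb.next_fetch_address(40))  # later encounter: target fetched immediately
```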
§4.9 Exceptions
Exceptions and Interrupts
• “Unexpected” events requiring change in flow of control
 ◼ Different ISAs use the terms differently
• Exception
 ◼ Arises within the CPU
  ◼ e.g., undefined opcode, overflow, syscall, …
• Interrupt
 ◼ From an external I/O controller
• Dealing with them without sacrificing performance is hard
Handling Exceptions
• In MIPS, exceptions managed by a System Control Coprocessor (CP0)
• Save PC of offending (or interrupted) instruction
 ◼ In MIPS: Exception Program Counter (EPC)
• Save indication of the problem
 ◼ In MIPS: Cause register
 ◼ Single entry point for all exceptions, Cause register is used for decoding
 ◼ We’ll assume 1-bit
  ◼ 0 for undefined opcode, 1 for overflow
An Alternate Mechanism
• Vectored Interrupts
 ◼ Handler address determined by the cause
• Example:
 ◼ Undefined opcode: C000 0000
 ◼ Overflow: C000 0020
 ◼ …: C000 0040
• Instructions either
 ◼ Deal with the interrupt, or
 ◼ Jump to real handler
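The example addresses are spaced 0x20 bytes apart, which leaves room for eight 4-byte instructions per entry. A Python sketch of the address calculation follows; the numbering of causes (0 for undefined opcode, 1 for overflow) and the name `handler_address` are assumptions made for illustration.

```python
VECTOR_BASE = 0xC0000000  # first handler address in the example
VECTOR_STRIDE = 0x20      # spacing between consecutive example addresses


def handler_address(cause_index):
    """Vectored dispatch: the handler address is determined by the cause."""
    return VECTOR_BASE + cause_index * VECTOR_STRIDE


for index in range(3):
    print(index, hex(handler_address(index)))
```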
Handler Actions
• Read cause, and transfer to relevant handler
• Determine action required
• If restartable
 ◼ Take corrective action
 ◼ use EPC to return to program
• Otherwise
 ◼ Terminate program
 ◼ Report error using EPC, cause, …
Exceptions in a Pipeline
• Another form of control hazard
• Consider overflow on add in EX stage
  add $1, $2, $1
 ◼ Prevent $1 from being clobbered
 ◼ Complete previous instructions
 ◼ Flush add and subsequent instructions
 ◼ Set Cause and EPC register values
 ◼ Transfer control to handler
• Similar to mispredicted branch
 ◼ Use much of the same hardware
Pipeline with Exceptions



Exception Properties
• Restartable exceptions
 ◼ Pipeline can flush the instruction
 ◼ Handler executes, then returns to the instruction
  ◼ Refetched and executed from scratch
• PC saved in EPC register
 ◼ Identifies causing instruction
 ◼ Actually PC + 4 is saved
  ◼ Handler must adjust
Exception Example
• Exception on add in
40 sub $11, $2, $4
44 and $12, $2, $5
48 or $13, $2, $6
4C add $1, $2, $1
50 slt $15, $6, $7
54 lw $16, 50($7)

• Handler
80000180 sw $25, 1000($0)
80000184 sw $26, 1004($0)
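The bookkeeping for this example can be sketched in Python. The sketch follows the slides in saving PC + 4 in EPC and using a 1-bit cause (0 for undefined opcode, 1 for overflow); the dictionary layout and the function names are hypothetical.

```python
HANDLER_ENTRY = 0x80000180  # handler address shown in the example
CAUSE_UNDEFINED_OPCODE = 0
CAUSE_OVERFLOW = 1


def take_exception(pc, cause):
    """State after an exception on the instruction at pc: EPC holds PC + 4,
    Cause records the reason, and fetch continues at the handler."""
    return {"EPC": pc + 4, "Cause": cause, "PC": HANDLER_ENTRY}


def offending_pc(epc):
    """The handler must subtract 4 to find the causing instruction."""
    return epc - 4


state = take_exception(0x4C, CAUSE_OVERFLOW)  # overflow on the add at 4C
print(hex(state["EPC"]), hex(offending_pc(state["EPC"])), hex(state["PC"]))
```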



Multiple Exceptions
• Pipelining overlaps multiple instructions
 ◼ Could have multiple exceptions at once
• Simple approach: deal with exception from earliest instruction
 ◼ Flush subsequent instructions
 ◼ “Precise” exceptions
• In complex pipelines
 ◼ Multiple instructions issued per cycle
 ◼ Out-of-order completion
 ◼ Maintaining precise exceptions is difficult!
Superscalar implementation of a processor: common instructions—
integer and floating-point arithmetic, loads, stores, and conditional branches
—can be initiated simultaneously and executed independently.
Such implementations raise a number of complex design issues related to the
instruction pipeline

Chapter 16
Instruction-Level Parallelism
and Superscalar Processors

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Superscalar Overview
• Term first coined in 1987
• Refers to a machine that is designed to improve the performance of the execution of scalar instructions
• In most applications the bulk of the operations are on scalar quantities
• Represents the next step in the evolution of high-performance general-purpose processors
• Essence of the approach is the ability to execute instructions independently and concurrently in different pipelines
• Concept can be further exploited by allowing instructions to be executed in an order different from the program order

It is the responsibility of the hardware and the compiler to assure that the parallel execution does not violate the intent of the program.

Figure 16.1 Superscalar Organization Compared to Ordinary Scalar Organization
(a) Scalar organization: integer register file, floating-point register file, memory, one pipelined integer functional unit and one pipelined floating-point functional unit
(b) Superscalar organization: the same register files and memory, a stream of instructions, and multiple pipelined integer and floating-point functional units


• Simple 4-stage pipeline: Ifetch, Decode, Execute, Write
• Superpipelined: 2 stages in 1 cycle → each stage is split into 2, each completing in half a cycle
 ◼ many pipeline stages perform tasks that require less than half a clock cycle: a doubled internal clock speed is achieved (e.g. MIPS R4000)
• Superscalar: 2 instances of each stage in parallel
• The superpipelined processor falls behind the superscalar processor at the start of the program and at each branch target

Figure 16.2 Comparison of Superscalar and Superpipeline Approaches (vertical axis: successive instructions; horizontal axis: time in base cycles, 0 to 9)


Constraints
• Instruction level parallelism
 ◼ Refers to the degree to which the instructions of a program can be executed in parallel
 ◼ A combination of compiler based optimization and hardware techniques can be used to maximize instruction level parallelism

Limitations:
 ◼ True data dependency
  ◼ cannot be eliminated
 ◼ Procedural dependency
 ◼ Resource conflicts
  ◼ can be overcome by duplication of resources
  ◼ when an operation takes a long time to complete, resource conflicts can be minimized by pipelining the appropriate functional unit.
 ◼ Output dependency
 ◼ Antidependency
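The three register dependencies in the list above (true data, output, anti) can be told apart by comparing which registers two instructions read and write. The Python sketch below uses their common short names RAW, WAW and WAR; the `Instr` record and the three sample instructions are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record: one destination register and a tuple of source registers.
Instr = namedtuple("Instr", ["dest", "srcs"])


def classify(first, second):
    """Dependencies of `second` on `first`, where `first` is earlier in program order."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # true data dependency: second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")  # antidependency: second overwrites what first reads
    if first.dest == second.dest:
        found.add("WAW")  # output dependency: both write the same register
    return found


# Illustrative fragment (an assumption, not taken from the slides):
i1 = Instr("R3", ("R3", "R5"))  # R3 <- R3 op R5
i2 = Instr("R4", ("R3",))       # R4 <- R3 + 1
i3 = Instr("R3", ("R5",))       # R3 <- R5 + 1

print(classify(i1, i2), classify(i2, i3), classify(i1, i3))
```

Only the RAW case is a true dependency; the WAR and WAW cases come from reusing a register name.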
The consequence for a superscalar pipeline is more severe, because a greater magnitude of opportunity is lost with each delay.

Variable-length instructions:
• another sort of procedural dependency: the length of instructions is unknown (they need to be partially decoded before the following instruction can be fetched)
• Prevents the simultaneous fetching required in a superscalar pipeline
• Superscalar techniques are more readily applicable to a RISC or RISC-like architecture, with its fixed instruction length

Figure 16.3 Effect of Dependencies (panels: no dependency; data dependency, where i1 uses data computed by i0; procedural dependency, where i1 is a branch followed by i2 to i5; resource conflict, where i0 and i1 use the same functional unit; stages: Ifetch, Decode, Execute, Write; horizontal axis: time in base cycles)
Design Issues
Instruction-Level Parallelism
and Machine Parallelism
• Instruction level parallelism
 ◼ Instructions in a sequence are independent
 ◼ Execution can be overlapped
 ◼ Governed by data and procedural dependency & operation latency
• Machine Parallelism
 ◼ Ability to take advantage of instruction level parallelism
 ◼ Governed by number of parallel pipelines
 ◼ Determined by
  ◼ the number of instructions that can be fetched and executed at the same time (the number of parallel pipelines)
  ◼ the speed and sophistication of the mechanisms that the processor uses to find independent instructions
Instruction-Level Parallelism
• Micro-architectural techniques that are used to exploit ILP include:
 ◼ Instruction pipelining
 ◼ Explicitly parallel instruction computing concepts
 ◼ Superscalar execution, VLIW, etc.
  ◼ multiple execution units & multiple instructions are executed in parallel
 ◼ Out-of-order execution
  ◼ independent of both pipelining and superscalar execution
 ◼ Register renaming
  ◼ Rename registers to avoid false data dependencies. Logical registers are mapped to physical registers, so the same logical register can refer to different physical registers in different instructions
  ◼ false data dependency: successive instructions that use the same registers without having any real data dependency between them
  ◼ used to enable out-of-order execution
Instruction-Level Parallelism
 ◼ Speculative execution
  ◼ execution of complete instructions / parts of instructions before being certain whether it is on the actual execution path
  ◼ e.g., control flow speculation, where instructions past a control flow instruction (e.g., a branch) are executed before the target of the control flow instruction is determined
  ◼ other forms exist: value prediction, memory dependence prediction and cache latency prediction, etc.
 ◼ Branch prediction
  ◼ used to avoid stalling for control dependencies to be resolved
  ◼ used with speculative execution
Instruction Issue Policy
• Instruction issue
 ◼ Refers to the process of initiating instruction execution in the processor’s functional units
• Instruction issue policy
 ◼ Refers to the protocol used to issue instructions
 ◼ Instruction issue occurs when instruction moves from the decode stage of the pipeline to the first execute stage of the pipeline
• The processor looks ahead of the current point of execution to locate instructions that can be brought into the pipeline and executed.
• Three types of orderings are important:
 ◼ The order in which instructions are fetched
 ◼ The order in which instructions are executed
 ◼ The order in which instructions update the contents of register and memory locations
• Superscalar instruction issue policies can be grouped into the following categories:
 ◼ In-order issue with in-order completion
 ◼ In-order issue with out-of-order completion
 ◼ Out-of-order issue with out-of-order completion
Superscalar pipeline capable of fetching and decoding two instructions at a time, having three separate functional units (e.g., two integer arithmetic and one floating-point arithmetic), and having two instances of the write-back pipeline stage. The example assumes the following constraints on a six-instruction code fragment:
• I1 requires two cycles to execute
• I3 and I4 conflict for the same functional unit
• I5 depends on the value produced by I4
• I5 and I6 conflict for a functional unit
Instructions are fetched in pairs: the next two instructions must wait until the pair of decode pipeline stages has cleared.

(a) In-order issue and in-order completion
Decode  Execute  Write  Cycle
I1 I2   -        -      1
I3 I4   I1 I2    -      2
I3 I4   I1       -      3
I4      I3       I1 I2  4
I5 I6   I4       -      5
I6      I5       I3 I4  6
-       I6       -      7
-       -        I5 I6  8

(b) In-order issue and out-of-order completion
Decode  Execute  Write  Cycle
I1 I2   -        -      1
I3 I4   I1 I2    -      2
I4      I1 I3    I2     3
I5 I6   I4       I1 I3  4
I6      I5       I4     5
-       I6       I5     6
-       -        I6     7

(c) Out-of-order issue and out-of-order completion
Decode  Window    Execute  Write  Cycle
I1 I2   -         -        -      1
I3 I4   I1,I2     I1 I2    -      2
I5 I6   I3,I4     I1 I3    I2     3
-       I4,I5,I6  I6 I4    I1 I3  4
-       I5        I5       I4 I6  5
-       -         -        I5     6

Figure 16.4 Superscalar Instruction Issue and Completion Policies


To guarantee in-order completion, when there is a conflict for a functional unit or when a functional unit requires more than one cycle to generate a result, the issuing of instructions temporarily stalls.

In-order issue with out-of-order completion on a superscalar processor:
• I2 is allowed to complete prior to I1: I3 is completed earlier (saving one cycle)

With out-of-order completion, any number of instructions may be in the execution stage at any one time, up to the degree allowed by machine parallelism. Instruction issuing is stalled by a resource conflict, a data dependency, or a procedural dependency.

Figure 16.4 Superscalar Instruction Issue and Completion Policies


True data dependency:
• I2 depends on I1
• I4 depends on I3

WAW dependency:
If I3 completes prior to I1, wrong value in R3 will be fetched for I4:
• I3 must complete after I1 to produce the correct output values.
• Issuing of I3 must be stalled if its result might later be overwritten by an older instruction that takes longer to complete.

Figure 16.4 Superscalar Instruction Issue and Completion Policies


In-order issue:
• only instructions up to the point of a dependency or conflict are decoded

Out-of-order issue:
• decouple the decode and execute stages of the pipeline with a buffer referred to as an instruction window
 ◼ Put decoded instruction in the instruction window until it is
 ◼ When a functional unit becomes available in the execute stage, an instruction from the instruction window may be issued to the execute stage (provided that (1) needed functional unit is available, and (2) no conflicts or dependencies block this instruction)

Figure 16.4 Superscalar Instruction Issue and Completion Policies


• First three cycles:
 ◼ two instructions are fetched & decoded & moved to instruction window at each cycle (window is not full)
• Possible to issue I6 ahead of I5 (I5 depends on I4, but I6 does not):
 ◼ one cycle is saved in both the execute and write-back stages
• Instruction window is not an additional pipeline stage
 ◼ Processor has sufficient information about that instruction to decide when it can be issued

Figure 16.4 Superscalar Instruction Issue and Completion Policies


In-order issue & out-of-order issue:
• An instruction cannot be issued if it violates a dependency or conflict.

Out-of-order issue:
• more instructions are available for issuing, reducing the probability that a pipeline stage will have to stall.
• WAR dependency: antidependency

I3 cannot complete execution before I2 begins execution and has fetched its operands:
• I3 updates register R3, which is a source operand for I2

Figure 16.4 Superscalar Instruction Issue and Completion Policies


Figure 16.5 Organization for Out-of-Order Issue with Out-of-Order Completion (in-order front end: Fetch, Decode, Rename, Dispatch; buffer of instructions; out-of-order execution: Issue, Register read, Execute, Write back; then Commit)
Register Renaming
• Output (WAW) dependencies and antidependencies (WAR) occur because register contents may not reflect the correct ordering from the program (out-of-order execution)
• May result in a pipeline stall
• Registers allocated dynamically
• Duplicate resources: same original register reference in several different instructions may refer to different actual registers
Recall the dependencies:
• WAR:

value may not be valid anymore


• WAW:
• RAW:
resource conflict
Register Renaming
• No subscript: logical register reference
• Subscript: hardware register allocated to hold a new value
• When a new allocation is made for a particular logical register, subsequent instruction references to that logical register as a source operand are made to refer to the most recently allocated hardware register (in terms of the program sequence of instructions).

R3c in I3:
• avoids the WAR dependency on I2
• avoids the WAW dependency on I1
• does not interfere with the correct value being accessed by I4
• I3 can be issued immediately with renaming.
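The allocation rule described above can be sketched in Python. The four-instruction fragment is an assumption (the slide's own code listing is not in the text); it is chosen so that I3's destination becomes R3c, matching the discussion. The letter suffixes stand in for the subscripts, and `rename` is a hypothetical helper that handles at most 26 allocations per register.

```python
from string import ascii_lowercase


def rename(program):
    """Rename registers in program order.

    program is a list of (dest, srcs) pairs of logical register names.
    Sources refer to the most recent allocation of their logical register;
    every destination gets a freshly allocated hardware register.
    """
    latest = {}  # logical register -> index of its most recent allocation

    def current(reg):
        return reg + ascii_lowercase[latest.setdefault(reg, 0)]

    def allocate(reg):
        latest[reg] = latest.get(reg, 0) + 1
        return reg + ascii_lowercase[latest[reg]]

    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(current(s) for s in srcs)  # read sources first
        renamed.append((allocate(dest), new_srcs))
    return renamed


# Assumed fragment: I1: R3 <- R3 op R5, I2: R4 <- R3 + 1,
#                   I3: R3 <- R5 + 1,  I4: R7 <- R3 op R4
FRAGMENT = [("R3", ("R3", "R5")), ("R4", ("R3",)), ("R3", ("R5",)), ("R7", ("R3", "R4"))]

for dest, srcs in rename(FRAGMENT):
    print(dest, "<-", srcs)
```

After renaming, I3 writes R3c while I2 still reads R3b and I1 writes R3b, so neither the WAR nor the WAW dependency constrains I3, and I4 reads R3c.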
