Comp206 Lecture9
The Processor
Need to stall
for one cycle
branch/not decided
PC
Branch Hazards
• Less frequent than data hazards
• We don’t have an effective approach like forwarding.
Static branch prediction: assume branch NOT taken
Continue to execute sequentially
If prediction is wrong: FLUSH
set control signals to 0
(change the instructions in the IF, ID and EX stages)
Reduce the delay of branches
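The cost of the predict-not-taken policy can be estimated with a minimal sketch. It assumes that each taken branch flushes a fixed number of instructions: three when the IF, ID and EX stages are flushed, one when the branch is decided early enough to stall for only one cycle. The function name and parameters are hypothetical.

```python
def wasted_cycles(branch_outcomes, flush_penalty=3):
    """Cycles lost under a predict-not-taken policy.

    branch_outcomes: iterable of booleans, True when the branch was taken.
    flush_penalty: instructions flushed per wrong prediction (assumed 3,
    one each in IF, ID and EX; 1 if the branch is decided one stage later
    than fetch).
    """
    # A not-taken branch costs nothing because sequential execution was correct.
    return sum(flush_penalty for taken in branch_outcomes if taken)


outcomes = [True, False, False, True]
print(wasted_cycles(outcomes))                   # late branch decision
print(wasted_cycles(outcomes, flush_penalty=1))  # reduced branch delay
```

Comparing the two calls shows why reducing the delay of branches matters: the same wrong predictions cost fewer cycles.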
… IF ID EX MEM WB
beq stalled IF ID
beq stalled IF ID
beq stalled ID
• Exception
Arises within the CPU
• Interrupt
From an external I/O controller
• Example:
Undefined opcode: C000 0000
Overflow: C000 0020
…: C000 0040
Handler
80000180 sw $25, 1000($0)
80000184 sw $26, 1004($0)
…
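The cause-to-address mapping in the example can be sketched as a lookup table. This is only an illustration: the dictionary keys and the function name are hypothetical, and the third entry is elided in the example, so it is left out.

```python
# Handler addresses copied from the example above; the elided entry is omitted.
VECTORS = {
    "undefined_opcode": 0xC0000000,
    "overflow": 0xC0000020,
}


def handler_address(cause):
    """Return the address loaded into the PC for a given exception cause."""
    return VECTORS[cause]


print(hex(handler_address("overflow")))
```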
Exception Example
rights reserved.
Instruction-Level Parallelism
and Superscalar Processors
• Term first coined in 1987
• Refers to a machine that is designed to improve the performance of the execution of scalar instructions
• Represents the next …
• … is the ability to execute instructions independently and concurrently in different pipelines
• … exploited by allowing instructions to be executed in an order different from the program order
It is the responsibility of the hardware and the compiler to assure that the …
[Figure labels: memory, stream of instructions, integer register file, floating point register file, pipelined integer functional units, pipelined floating-point functional units]
branch target
Limitations:
functional unit.
Output dependency
Antidependency
Key: Ifetch, Decode, Execute, Write
• Superscalar techniques are more readily applicable to a RISC or RISC-like architecture, with its fixed instruction length
Figure 16.3 Effect of Dependencies (time axis in base cycles; i0 and i1 use the same functional unit)
Design Issues
Instruction-Level Parallelism
and Machine Parallelism
• Instruction level parallelism
Instructions in a sequence are independent
Execution can be overlapped
• Machine Parallelism
Ability to take advantage of instruction level parallelism
Governed by number of parallel pipelines
Determined by
the number of instructions that can be fetched and executed at
the same time (the number of parallel pipelines)
the speed and sophistication of the mechanisms that the
processor uses to find independent instructions
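How the number of parallel pipelines and the available instruction-level parallelism interact can be shown with a minimal sketch. It assumes single-cycle instructions, tracks only true data dependencies, and uses a hypothetical `(dest, sources)` instruction encoding; it is not a model of any particular processor.

```python
def schedule(instrs, width):
    """Give each instruction the earliest start cycle its inputs and the width allow.

    instrs: list of (dest, sources) tuples in program order.
    width: number of parallel pipelines (instructions started per cycle).
    Returns the 1-based start cycle of each instruction.
    """
    ready = {}   # register -> cycle in which its value becomes available
    used = {}    # cycle -> number of instructions already started in it
    starts = []
    for dest, sources in instrs:
        # Wait for every producer of a source operand.
        cycle = max([ready.get(reg, 1) for reg in sources], default=1)
        # Wait for a free pipeline.
        while used.get(cycle, 0) >= width:
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        ready[dest] = cycle + 1
        starts.append(cycle)
    return starts


independent = [("r1", ["r0"]), ("r2", ["r0"]), ("r3", ["r0"]), ("r4", ["r0"])]
chain = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
print(schedule(independent, width=2))
print(schedule(chain, width=2))
```

The independent sequence finishes sooner as `width` grows, while the dependent chain gains nothing from extra pipelines: machine parallelism only helps when the code has instruction-level parallelism to exploit.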
Instruction-Level Parallelism
• Micro-architectural techniques that are used to exploit ILP
include:
Instruction pipelining
Explicitly parallel instruction computing concepts
Superscalar execution, VLIW, etc
physical register
false data dependency: successive instructions that do not
have any real data dependencies between them using the same
registers
used to enable out-of-order execution
Instruction-Level Parallelism
Speculative execution
execution of complete instructions / parts of instructions before
being certain whether it is on the actual execution path
e.g., control flow speculation where instructions past a control
flow instruction (e.g., a branch) are executed before the target
of the control flow instruction is determined
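Control flow speculation can be sketched as working on a scratch copy of the register state that is kept only if the prediction turns out to be right. The function and parameter names are hypothetical, and real hardware uses dedicated buffers rather than copying state.

```python
def run_speculatively(regs, speculative_ops, predicted_taken, actual_taken):
    """Execute instructions past a branch before its outcome is known.

    regs: dict mapping register name to value (committed state).
    speculative_ops: list of (dest, fn) pairs; fn takes the register dict
    and returns the new value of dest.
    Returns the register state after the branch is resolved.
    """
    scratch = dict(regs)  # speculative results stay invisible until commit
    for dest, fn in speculative_ops:
        scratch[dest] = fn(scratch)
    if predicted_taken == actual_taken:
        return scratch    # prediction correct: commit the results
    return dict(regs)     # prediction wrong: discard the speculative work


regs = {"r1": 1, "r2": 2}
ops = [("r3", lambda r: r["r1"] + r["r2"])]
print(run_speculatively(regs, ops, predicted_taken=False, actual_taken=False))
print(run_speculatively(regs, ops, predicted_taken=False, actual_taken=True))
```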
Instruction Issue Policy
Instruction issue
• Refers to the process of initiating instruction execution in the processor’s functional units
• Instruction issue occurs when an instruction moves from the decode stage of the pipeline to the first execute stage of the pipeline
Instruction issue policy
• Refers to the protocol used to issue instructions
• … processor looks ahead of …
• … instructions are executed
• The order in which instructions update the contents of register and memory locations
• … out-of-order completion
• Out-of-order issue with out-of-order completion
Superscalar pipeline capable of fetching and decoding two instructions at a time, having three separate functional units (e.g., two integer arithmetic and one floating-point arithmetic), and having two instances of the write-back pipeline stage. The example assumes the following constraints on a six-instruction code fragment:
• … produced by I4
• I5 and I6 conflict for a functional unit
Instructions are fetched in pairs: the next two instructions must wait until the pair of decode pipeline stages has cleared.

(a) In-order issue and in-order completion
Decode  Execute  Write  Cycle
I1 I2   -        -      1
I3 I4   I1 I2    -      2
I3 I4   I1       -      3
I4      I3       I1 I2  4
I5 I6   I4       -      5
I6      I5       I3 I4  6
-       I6       -      7
-       -        I5 I6  8

(c) Out-of-order issue and out-of-order completion
Decode  Window    Execute  Write  Cycle
I1 I2   -         -        -      1
I3 I4   I1,I2     I1 I2    -      2
I5 I6   I3,I4     I1 I3    I2     3
-       I4,I5,I6  I6 I4    I1 I3  4
-       I5        I5       I4 I6  5
-       -         -        I5     6
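The difference between in-order and out-of-order completion can be sketched by computing when each instruction writes its result. This is a simplified illustration: the function name is hypothetical, write-port limits are ignored, and the input is the last execute cycle of each instruction in program order.

```python
def write_cycles(exec_done, in_order):
    """Cycle in which each instruction writes back.

    exec_done: last execute cycle of each instruction, in program order.
    in_order: if True, an instruction may not write before an older one.
    """
    writes = []
    for done in exec_done:
        cycle = done + 1  # earliest possible write-back
        if in_order and writes:
            # Hold the result until the previous instruction has written.
            cycle = max(cycle, writes[-1])
        writes.append(cycle)
    return writes


# First instruction takes two execute cycles, second finishes earlier.
exec_done = [3, 2, 4]
print(write_cycles(exec_done, in_order=True))
print(write_cycles(exec_done, in_order=False))
```

With in-order completion the second instruction waits for the first, even though its result is ready a cycle earlier.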
… data dependency, or a procedural dependency.
… if its result might later be overwritten by an older instruction that takes longer …
… needed functional unit is available, and (2) no conflicts or dependencies block this instruction)
• one cycle is saved in both the execute and write-back stages
… instruction to decide when it can be issued
… for issuing, reducing the probability that a pipeline stage will have to stall.
• WAR dependency:
… a source operand for I2
Figure 16.5 Organization for Out-of-Order Issue with Out-of-Order Completion (stage labels: Fetch, Decode, Rename, Dispatch, Issue, Register read, Execute, Write back, Commit)
Register Renaming
Output dependencies (WAW) and antidependencies (WAR)
occur because register contents may not
reflect the correct ordering from the
program (out-of-order execution)
Duplicate resources: same original register
reference in several different instructions
may refer to different actual registers
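The idea of mapping one logical register to several actual registers can be sketched as follows. This is a minimal illustration, not a description of real rename hardware: the `(dest, sources)` encoding, the letter suffixes for physical copies, and the four-instruction fragment are all assumptions made for the example.

```python
import string


def rename(instrs):
    """Give every register write a fresh physical register.

    instrs: list of (dest, sources) tuples using logical register names.
    Returns the list with physical names such as "R3a", "R3b", "R3c".
    """
    version = {}  # logical register -> index of its current physical copy

    def current(reg):
        # A register read before any write refers to its initial copy "a".
        version.setdefault(reg, 0)
        return reg + string.ascii_lowercase[version[reg]]

    def allocate(reg):
        # Each write moves the logical register to the next physical copy.
        version[reg] = version.get(reg, 0) + 1
        return reg + string.ascii_lowercase[version[reg]]

    renamed = []
    for dest, sources in instrs:
        new_sources = [current(s) for s in sources]  # read old mapping first
        renamed.append((allocate(dest), new_sources))
    return renamed


# Hypothetical fragment: the third instruction rewrites R3 while the second still reads it.
code = [("R3", ["R3", "R5"]), ("R4", ["R3"]), ("R3", ["R5"]), ("R7", ["R3", "R4"])]
for dest, sources in rename(code):
    print(dest, "<-", sources)
```

After renaming, the third instruction writes a different physical register from the one the second instruction reads, so the WAR and WAW conflicts on R3 disappear and only true data dependencies remain.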
Recall the dependencies:
• WAR:
• RAW:
resource conflict
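The register dependencies between two instructions can be identified mechanically. The sketch below assumes a hypothetical `(dest, sources)` encoding and checks only register operands; resource conflicts depend on the functional units and are not modelled.

```python
def classify(first, second):
    """Return the register dependencies of `second` on the earlier `first`.

    Each instruction is a (dest, sources) tuple.
    """
    dest1, sources1 = first
    dest2, sources2 = second
    found = set()
    if dest1 in sources2:
        found.add("RAW")  # true data dependency: second reads what first writes
    if dest2 in sources1:
        found.add("WAR")  # antidependency: second overwrites what first reads
    if dest1 == dest2:
        found.add("WAW")  # output dependency: both write the same register
    return found


print(classify(("R3", ["R3", "R5"]), ("R4", ["R3"])))
print(classify(("R4", ["R3"]), ("R3", ["R5"])))
```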
Register Renaming
• No subscript: logical register reference
value being accessed by I4
• I3 can be issued immediately with
renaming.