
CONTROL HAZARD

IMPACT OF THE PIPELINE ON THE BRANCH INSTRUCTION
CONTROL HAZARD
 An instruction must be fetched at every clock cycle to sustain the
pipeline
 The decision about whether to branch does not occur until the MEM pipeline stage
 This delay in determining the proper instruction to fetch is called a
control hazard or branch hazard
 Control hazards are relatively simple to understand
 Occur less frequently than data hazards
 There is nothing as effective against control hazards as forwarding
is against data hazards
 Two schemes for resolving control hazards and one optimization
to improve these schemes
ASSUME BRANCH NOT TAKEN
 Stalling until the branch is complete is too slow
 A common improvement over branch stalling is to assume that
the branch will not be taken
 Continue execution down the sequential instruction stream
 If the branch is taken, the instructions that are being fetched and decoded must be discarded
 Execution continues at the branch target
 This optimization halves the cost of the control hazard if:
 branches are untaken half the time, and
 it costs little to discard the instructions
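
As a rough illustration of the arithmetic behind this claim, the following C sketch (not from the slides; the 3-cycle penalty and 50% taken rate are assumed numbers) compares always stalling with predicting not taken:

#include <stdio.h>

/* Sketch: average branch penalty under predict-not-taken.
 * Assumed numbers: the branch resolves in MEM, so a taken branch
 * flushes the three younger instructions in IF, ID, and EX. */
int main(void) {
    double taken_fraction = 0.5;   /* assumed: half of branches are taken */
    int flush_penalty     = 3;     /* cycles lost when the guess is wrong */
    int stall_penalty     = 3;     /* cycles lost if we always stalled    */

    double avg_predict = taken_fraction * flush_penalty; /* untaken branches cost 0 */
    double avg_stall   = (double)stall_penalty;          /* every branch stalls     */

    printf("always stall      : %.1f cycles per branch\n", avg_stall);
    printf("predict not taken : %.1f cycles per branch\n", avg_predict);
    return 0;
}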
ASSUME BRANCH NOT TAKEN (2)
 To discard instructions, we merely change the original control
values to 0s
 We must also change the three instructions in the IF, ID, and
EX stages when the branch reaches the MEM stage
 Discarding instructions means we must be able to flush
instructions in the IF, ID, and EX stages of the pipeline
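
The flush can be pictured with a small simulator-style C sketch (illustrative only; the structure and field names are assumptions, not the book's datapath): when the branch in MEM turns out to be taken, the control values of the three younger instructions are zeroed, turning them into nops.

#include <string.h>
#include <stdint.h>

/* Hypothetical control signals carried in a pipeline register. */
typedef struct {
    uint32_t instruction;    /* raw instruction bits (IF/ID)  */
    int reg_write;           /* WB-stage control              */
    int mem_read, mem_write; /* MEM-stage control             */
    int alu_op;              /* EX-stage control              */
} stage_reg_t;

/* Zeroing the control values turns the instruction into a nop. */
static void squash(stage_reg_t *r) {
    memset(r, 0, sizeof *r);
}

/* Called when the branch currently in MEM is found to be taken. */
void flush_younger_stages(stage_reg_t *if_id, stage_reg_t *id_ex,
                          stage_reg_t *ex_stage) {
    squash(if_id);     /* instruction that was just fetched  */
    squash(id_ex);     /* instruction that was being decoded */
    squash(ex_stage);  /* instruction that was in EX         */
}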
REDUCING THE DELAY OF BRANCHES
 One way to improve branch performance is to reduce the cost of
the taken branch
 So far we have assumed that the next PC for a branch is selected in the MEM stage
 If we move the branch execution earlier in the pipeline, fewer instructions need to be flushed
 The designers observed that many branches rely only on simple
tests (equality or sign, for example)
 Such tests do not require a full ALU operation but can be done
with at most a few gates
 When a more complex branch decision is needed, a separate instruction that uses an ALU to perform the comparison is required
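
The following C sketch (illustrative, not from the slides) mirrors why such tests are cheap in hardware: equality is an XOR of the two register values followed by a check that no bit differs, and a sign test is just the most significant bit.

#include <stdint.h>
#include <stdbool.h>

/* Equality: XOR each bit pair, then check that no bit is set.
 * In hardware this is 32 XOR gates feeding one wide NOR. */
bool branch_equal(uint32_t rs, uint32_t rt) {
    return (rs ^ rt) == 0;
}

/* Sign test (e.g. "branch if negative"): just the MSB, no ALU needed. */
bool branch_negative(int32_t rs) {
    return (uint32_t)rs >> 31;
}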
REDUCING THE DELAY OF BRANCHES -
MOVING THE BRANCH EXECUTION
 Moving the branch decision up requires two actions to occur
earlier:
 Computing the branch target address
 Evaluating the branch decision

 The easy part of this change is to move up the branch address calculation
 We already have the PC value and the immediate field in the
IF/ID pipeline register
 We move the branch adder from the EX stage to the ID stage

 The branch target address calculation will be performed for all instructions, but it is only used when needed
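
A simulator-style C sketch of the moved-up calculation (names are illustrative): the dedicated branch adder needs only the incremented PC and the immediate field already held in the IF/ID register. For the beq at address 40 with immediate 7 used later in these slides, it produces 72.

#include <stdint.h>

/* Values that are already available in the IF/ID pipeline register. */
typedef struct {
    uint32_t pc_plus_4;   /* PC of the branch + 4  */
    uint32_t instruction; /* raw instruction bits  */
} if_id_reg_t;

/* Branch target = (PC + 4) + (sign-extended 16-bit immediate << 2).
 * This adder now lives in the ID stage and runs for every instruction,
 * but its result is used only when the instruction is a taken branch. */
uint32_t branch_target(const if_id_reg_t *r) {
    int32_t imm = (int16_t)(r->instruction & 0xFFFF); /* sign-extend */
    return r->pc_plus_4 + ((uint32_t)imm << 2);
}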
MOVING THE BRANCH DECISION - MOVING
THE BRANCH TEST
 The harder part is the branch decision itself
 For branch equal, we would compare the two registers read during
the ID stage to see if they are equal
 Moving the branch test to the ID stage implies additional forwarding
and hazard detection hardware
 During ID, decode the instruction and complete the equality comparison
 If the instruction is a taken branch, we can set the PC to the branch target address
 Forwarding for the operands of branches was formerly handled by
the ALU forwarding logic
 But the introduction of the equality test unit in ID will require new
forwarding logic
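
The new forwarding can be sketched in C as follows (simulator-style and simplified; the real logic lives in the forwarding unit): each input of the ID-stage equality test takes the freshest copy of its register, bypassing from the EX/MEM or MEM/WB pipeline register when that older instruction writes the register being compared.

#include <stdint.h>
#include <stdbool.h>

/* One forwarding source: an older instruction whose result has not
 * yet been written back to the register file. */
typedef struct {
    bool     reg_write; /* does it write a register?  */
    uint8_t  dest;      /* which register it writes   */
    uint32_t value;     /* the value it will write    */
} fwd_source_t;

/* Pick the value of register r for the ID-stage equality test,
 * preferring the younger (EX/MEM) result over the older (MEM/WB) one. */
static uint32_t forward_for_branch(uint8_t r, uint32_t regfile_value,
                                   const fwd_source_t *ex_mem,
                                   const fwd_source_t *mem_wb) {
    if (ex_mem->reg_write && ex_mem->dest == r && r != 0)
        return ex_mem->value;
    if (mem_wb->reg_write && mem_wb->dest == r && r != 0)
        return mem_wb->value;
    return regfile_value;
}

/* Branch-equal decision in ID, using forwarded operands. */
bool beq_taken(uint8_t rs, uint8_t rt, uint32_t rs_val, uint32_t rt_val,
               const fwd_source_t *ex_mem, const fwd_source_t *mem_wb) {
    return forward_for_branch(rs, rs_val, ex_mem, mem_wb)
        == forward_for_branch(rt, rt_val, ex_mem, mem_wb);
}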
REDUCING THE DELAY OF BRANCHES -
STALL
 A data hazard can occur and a stall will be needed
 Because the values in a branch comparison are needed
during ID but may be produced later in time
 For example, if an ALU instruction immediately preceding a branch produces one of the operands for the comparison in the branch, a stall will be required
 Since the EX stage for the ALU instruction will occur
after the ID cycle of the branch
 If a load is immediately followed by a conditional branch that depends on the load result, two stall cycles will be needed
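
A C sketch of this extra hazard check (illustrative; the structures are assumptions): with the branch sitting in ID, look at the one or two older instructions still in flight and stall one or two cycles depending on which of them produces a compared register and whether it is a load.

#include <stdint.h>
#include <stdbool.h>

/* Minimal view of an older, still-in-flight instruction. */
typedef struct {
    bool    writes_reg; /* produces a register result            */
    bool    is_load;    /* result comes from memory (ready late) */
    uint8_t dest;       /* destination register                  */
} in_flight_t;

static bool produces(const in_flight_t *i, uint8_t r) {
    return i->writes_reg && r != 0 && i->dest == r;
}

/* How many cycles must a branch in ID stall?
 *  - ALU instruction directly ahead (in EX): 1 stall
 *  - load directly ahead (in EX):            2 stalls
 *  - load two ahead (in MEM):                1 stall  */
int branch_stall_cycles(uint8_t rs, uint8_t rt,
                        const in_flight_t *in_ex,   /* instruction just ahead */
                        const in_flight_t *in_mem)  /* instruction two ahead  */
{
    bool ex_hit  = produces(in_ex,  rs) || produces(in_ex,  rt);
    bool mem_hit = produces(in_mem, rs) || produces(in_mem, rt);

    if (ex_hit)  return in_ex->is_load ? 2 : 1;
    if (mem_hit) return in_mem->is_load ? 1 : 0; /* ALU result in EX/MEM can be forwarded */
    return 0;
}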
REDUCING THE DELAY OF BRANCHES (5)
 Despite these difficulties, moving the branch execution to the
ID stage is an improvement
 Because it reduces the penalty of a branch to only one instruction if the branch is taken (the one currently being fetched)
 To flush instructions in the IF stage, we add a control line,
called IF.Flush
 IF.Flush zeros the instruction field of the IF/ID pipeline
register
 Clearing the register transforms the fetched instruction into a
nop, an instruction that has no action and changes no state
PIPELINED BRANCH
 Show what happens when the branch is taken in this instruction
sequence
 36 sub $10, $4, $8
 40 beq $1, $3, 7 # PC-relative branch to 40 + 4 + 7 * 4 = 72
 44 and $12, $2, $5
 48 or $13, $2, $6
 52 add $14, $4, $2
 56 slt $15, $6, $7
...
 72 lw $4, 50($7)

 Assuming the pipeline is optimized for branches that are not taken and that we moved the branch execution to the ID stage:
PIPELINED BRANCH (2), (3)
(Pipeline diagrams showing clock cycles 3 and 4 for this sequence)
PIPELINED BRANCH (4)
 The ID stage of CC 3
 Determines that the branch must be taken,
 Selects 72 as the next PC address, and
 Zeros the instruction fetched for the next clock cycle

 Clock cycle 4
 The instruction at location 72 is being fetched, and
 A single bubble, or nop instruction, is in the pipeline as a result of the taken branch
DYNAMIC BRANCH PREDICTION

 With deeper pipelines, the branch penalty increases when measured in clock cycles
 A simple static prediction scheme will probably waste too much
performance
 With more hardware it is possible to try to predict branch
behavior during program execution

 Dynamic branch prediction


 One approach is to look up the address of the instruction to see if
a branch was taken the last time this instruction was executed
 If so, begin fetching new instructions from the same place as the last time
BRANCH PREDICTION BUFFER

 Branch prediction buffer or branch history table


 A branch prediction buffer is a small memory
 Indexed by the lower portion of the address of the branch
instruction
 The memory contains a bit that says whether the branch was
recently taken or not
 Prediction is just a hint that we hope is correct

 Fetching begins in the predicted direction


 If the hint turns out to be wrong,

 the incorrectly predicted instructions are deleted,


 the prediction bit is inverted and stored back, and
 the proper sequence is fetched and executed
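
A minimal 1-bit branch prediction buffer can be sketched in C as below (simulator-style; the table size is an assumed value): the buffer is indexed by the low-order bits of the branch address, the stored bit is the hint, and updating with the actual outcome flips the bit whenever the hint was wrong.

#include <stdint.h>
#include <stdbool.h>

#define BPB_ENTRIES 1024   /* assumed buffer size (power of two) */

/* One prediction bit per entry: true = predict taken. */
static bool bpb[BPB_ENTRIES];

static unsigned index_of(uint32_t branch_pc) {
    /* Index with the lower portion of the branch address
     * (instructions are word-aligned, so drop the two low bits). */
    return (branch_pc >> 2) & (BPB_ENTRIES - 1);
}

/* Hint used to choose the fetch direction. */
bool predict_taken(uint32_t branch_pc) {
    return bpb[index_of(branch_pc)];
}

/* After the branch resolves: storing the actual outcome inverts
 * the bit exactly when the hint was wrong. */
void update_prediction(uint32_t branch_pc, bool actually_taken) {
    bpb[index_of(branch_pc)] = actually_taken;
}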
LOOP AND PREDICTION
 Consider a loop branch that branches nine times in a row, then is not
taken once
 What is the prediction accuracy for this branch, assuming the prediction
bit for this branch remains in the prediction buffer?

 The steady-state prediction behavior will mispredict on the first and last
loop iterations
 Mispredicting the last iteration is inevitable, since the prediction bit will indicate taken: the branch has been taken nine times in a row at that point
 The misprediction on the first iteration happens because the prediction bit was flipped during the last iteration of the previous execution of the loop, when the branch was not taken as the loop exited
LOOP AND PREDICTION (2)
 The prediction accuracy for this branch that is taken 90% of
the time is only 80%
 Two incorrect predictions and eight correct ones
 Ideally, the accuracy of the predictor would match the taken
branch frequency for these highly regular branches
2-BIT PREDICTION SCHEME
 To remedy this weakness, 2-bit prediction schemes are often used
 In a 2-bit scheme, a prediction must be wrong twice before it is
changed
 Assume we start in the "predict taken" state
 The prediction is correct, so the predictor stays in "predict taken" for the nine taken branches
 When the tenth branch comes, the prediction turns out to be wrong
 Although it is wrong, it does not change the prediction, since changing the prediction requires two consecutive wrong predictions
 When the loop branch is reached again, the prediction (taken) is found to be correct
 So the prediction accuracy is 90%
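
One common realization of a 2-bit scheme is a saturating counter, sketched below in C (illustrative; the slide's state diagram may draw the weak-state transitions slightly differently). Simulating the loop branch that is taken nine times and then not taken reproduces the 90% steady-state accuracy.

#include <stdio.h>

/* 2-bit saturating counter: 0,1 = predict not taken; 2,3 = predict taken. */
static int counter = 3;                      /* start in "strongly taken" */

static int predict(void) { return counter >= 2; }

static void update(int taken) {
    if (taken  && counter < 3) counter++;
    if (!taken && counter > 0) counter--;
}

int main(void) {
    int correct = 0, total = 0;
    /* Loop branch pattern: taken nine times, then not taken, repeated. */
    for (int pass = 0; pass < 1000; pass++) {
        for (int i = 0; i < 10; i++) {
            int taken = (i < 9);
            correct += (predict() == taken);
            total++;
            update(taken);
        }
    }
    printf("accuracy = %.1f%%\n", 100.0 * correct / total); /* prints 90.0% */
    return 0;
}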
2-BIT PREDICTION SCHEME (2)
(State diagram of the 2-bit prediction scheme)
