Instruction Pipelining (II): Reducing Pipeline Branch Penalties
Datorarkitektur Fö 4 - 3
• The fetch unit also has the ability to recognize branch instructions and to generate the target address. Thus, the penalty produced by unconditional branches can be drastically reduced: the fetch unit computes the target address and continues to fetch instructions from that address, which are sent to the queue. Thus, the rest of the pipeline gets a continuous stream of instructions, without stalling.

• The rate at which instructions can be read (from the instruction cache) must be sufficiently high to avoid an empty queue.

• With conditional branches penalties cannot be avoided. The branch condition, which usually depends on the result of the preceding instruction, has to be known in order to determine the following instruction.

Observation
• In the Pentium 4, the instruction cache (trace cache) is located between the fetch unit and the instruction queue (see Fö 2, slide 31).

[Figure: branch is taken; Penalty: 3 cycles (timing rows not recoverable from the extraction)]

Branch is not taken:
Clock cycle →  1   2   3   4     5     6   7   8   9   10
ADD R1,R2      FI  DI  CO  FO    EI    WO
BEZ TARGET         FI  DI  CO    FO    EI  WO
instr i+1              FI  stall stall DI  CO  FO  EI  WO
Penalty: 2 cycles

• The idea with delayed branching is to let the CPU do some useful work during some of the cycles which are shown above to be stalled.

• With delayed branching the CPU always executes the instruction that immediately follows after the branch and only then alters (if necessary) the sequence of execution. The instruction after the branch is said to be in the branch delay slot.
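The delay-slot semantics can be illustrated with a toy interpreter (a minimal sketch: the instruction tuples, register names, and the `delayed` flag are invented for illustration and are not a real ISA):

```python
# Toy interpreter contrasting immediate and delayed branch semantics.
# Instructions: ("ADDI", reg, imm) adds imm to reg; ("BEZ", reg, target)
# branches to target if reg == 0. Word addressing (PC+1) as on the slide.

def run(program, delayed=True):
    """Execute program. With delayed=True the branch takes effect only
    after the instruction in the delay slot has executed (delayed
    branching); with delayed=False the branch takes effect at once."""
    regs = {"R1": 0, "R2": 0, "R3": 0}
    pc = 0
    pending = None          # branch target waiting for its delay slot
    trace = []              # addresses of executed instructions
    while 0 <= pc < len(program):
        op, *args = program[pc]
        trace.append(pc)
        nxt = pending if pending is not None else pc + 1
        pending = None
        if op == "ADDI":
            regs[args[0]] += args[1]
        elif op == "BEZ" and regs[args[0]] == 0:
            if delayed:
                pending = args[1]   # jump only after the next instruction
            else:
                nxt = args[1]       # jump immediately
        pc = nxt
    return regs, trace

prog = [
    ("ADDI", "R2", 5),   # 0
    ("BEZ",  "R1", 4),   # 1  taken (R1 == 0)
    ("ADDI", "R2", 1),   # 2  delay slot: always executed when delayed=True
    ("ADDI", "R3", 7),   # 3  skipped in both modes
    ("ADDI", "R3", 1),   # 4  branch target
]
regs_d, trace_d = run(prog, delayed=True)
assert trace_d == [0, 1, 2, 4] and regs_d["R2"] == 6
regs_i, trace_i = run(prog, delayed=False)
assert trace_i == [0, 1, 4] and regs_i["R2"] == 5
```

The traces show the point of the technique: the delay-slot instruction at address 2 executes even though the branch is taken, so a compiler that moves useful work into that slot wastes no cycle.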
• History information can be used not only to predict the outcome of a conditional branch but also to avoid recalculation of the target address. Together with the bits used for prediction, the target address can be stored for later use in a branch history table.

[Figure: branch history table with columns "Instr. addr. | Target addr. | Pred. bits". The address of the branch instruction is looked up in the table; on a hit the entry is updated ("Update entry"), on a miss a new entry is added ("Add new entry"), and the table decides from where to fetch the next instruction.]

Some explanations to the previous figure:

- Address where to fetch from: If the branch instruction is not in the table, the next instruction (address PC+1) is to be fetched. If the branch instruction is in the table, first of all a prediction based on the prediction bits is made. Depending on the prediction outcome, the next instruction (address PC+1) or the instruction at the target address is to be fetched.

- Update entry: If the branch instruction has been in the table, the respective entry has to be updated to reflect the correct or incorrect prediction.
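The table logic above can be sketched as a small class (an illustrative sketch, not any particular processor's implementation: the class and method names are invented, a dict stands in for the hardware table, and a 2-bit saturating counter plays the role of the prediction bits, with 0 or 1 meaning "predict not taken" and 2 or 3 meaning "predict taken"):

```python
# Minimal branch history table: each entry stores a target address and a
# 2-bit saturating prediction counter, keyed by the branch instruction's
# address. Word addressing (PC+1) as on the slide; no capacity limit or
# replacement policy is modelled.

class BranchHistoryTable:
    def __init__(self):
        self.table = {}   # branch instr. addr. -> [target addr., pred. bits]

    def fetch_address(self, pc):
        """Address where to fetch from: PC+1 on a table miss or a
        not-taken prediction, the stored target on a taken prediction."""
        entry = self.table.get(pc)
        if entry is None or entry[1] < 2:
            return pc + 1
        return entry[0]

    def update(self, pc, taken, target):
        """Update entry (or add a new one) once the outcome is known:
        the counter saturates at 0 and 3."""
        entry = self.table.setdefault(pc, [target, 1])
        entry[0] = target
        entry[1] = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)

bht = BranchHistoryTable()
assert bht.fetch_address(100) == 101   # miss: fetch PC+1
bht.update(100, True, 40)              # branch at 100 taken to 40
bht.update(100, True, 40)
assert bht.fetch_address(100) == 40    # counter saturated high: predict taken
```

The saturating counter is what makes this a two-bit scheme: a single mispredicted iteration (e.g. a loop exit) does not immediately flip a strongly-taken prediction.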
[Figure: ARM9 pipeline stages, including Decode, Execute, Data memory access, and Register write]

• Execute: shift and ALU operations.

• Data memory access: fetch/store data from/to D-cache.

• Register write: results or loaded data written back to register.

The performance of the ARM9 is significantly superior to the ARM7:

• Higher clock speed due to larger number of pipeline stages.

• More even distribution of tasks among pipeline stages; tasks have been moved away from the execute stage.

[Figure: pipeline with Issue and Decode stages followed by parallel Shift/Address, ALU/Memory 1, Memory 2, and Writeback stages]

• Branch prediction:
  - Dynamic two-bit prediction based on a 64-entry branch history table (branch target address cache, BTAC).
  - If the instruction is not in the BTAC, static prediction is done: taken if backward, not taken if forward.

• Decoupling of the load/store pipeline from the ALU&MAC (multiply-accumulate) pipeline: ALU operations can continue while load/store operations complete (see next slide).
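The static fallback used on a BTAC miss reduces to a one-line address comparison (a sketch under the assumption of plain integer addresses; the function name is invented). Backward branches are predicted taken because they typically close loops, which branch back on every iteration but the last:

```python
# Static backward-taken / forward-not-taken prediction, used when a
# branch misses in the BTAC and no history is available.

def static_predict(branch_addr, target_addr):
    """Return True (predict taken) for a backward branch,
    False (predict not taken) for a forward branch."""
    return target_addr <= branch_addr

# A loop-closing branch jumps backward, so it is predicted taken:
assert static_predict(branch_addr=200, target_addr=120) is True
# A forward skip (e.g. over an else-part) is predicted not taken:
assert static_predict(branch_addr=200, target_addr=260) is False
```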
• Writeback (ALU&MAC pipeline): results written to register.

• Writeback (load/store pipeline): write loaded data to reg.; commit store.