CA - Slides
COMPUTER ARCHITECTURE
Pavitra Y J
Electronics and Communication Engineering
Increasing instruction fetch bandwidth
• A multiple-issue processor is required to increase instruction fetch
bandwidth and extract more ILP
• A multiple-issue processor will require that the average number of
instructions fetched every clock cycle be at least as large as the
average throughput
• Fetching these instructions requires wide enough paths to the
instruction cache, but the most difficult aspect is handling branches
Branch Target Buffers (BTB)
• To reduce the branch penalty for deeper pipelines, we must know
whether the as-yet-undecoded instruction is a branch and, if so, what
the next program counter (PC) should be
• If the instruction is a branch and we know what the next PC should
be, we can have a branch penalty of zero
• A branch-prediction cache that stores the predicted address for the
next instruction after a branch is called a branch-target buffer or
branch-target cache
BTB
• Because a branch-target buffer predicts the next instruction address
and will send it out before decoding the instruction, we must know
whether the fetched instruction is predicted as a taken branch.
• If the PC of the fetched instruction matches an address in the
prediction buffer, then the corresponding predicted PC is used as the
next PC
• The hardware for this branch-target buffer is essentially identical to
the hardware for a cache
• If a matching entry is found in the branch-target buffer, fetching
begins immediately at the predicted PC
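As a concrete illustration, here is a minimal sketch of the lookup side of a direct-mapped BTB in C. The size, field names, and indexing (which assumes 4-byte instructions) are illustrative assumptions, not a description of any particular processor:

```c
#include <stdint.h>
#include <stdbool.h>

#define BTB_ENTRIES 1024   /* assumed buffer size (power of two) */

typedef struct {
    bool     valid;
    uint64_t tag;          /* full PC of the branch instruction */
    uint64_t target;       /* predicted next PC (taken target) */
} btb_entry_t;

static btb_entry_t btb[BTB_ENTRIES];

/* Look up the fetched PC. On a hit, the predicted target is used as the
   next PC before the instruction is even decoded; on a miss, fetch
   falls through to the next sequential instruction. */
bool btb_lookup(uint64_t pc, uint64_t *next_pc)
{
    btb_entry_t *e = &btb[(pc >> 2) % BTB_ENTRIES]; /* index like a cache */
    if (e->valid && e->tag == pc) {                 /* full tag match */
        *next_pc = e->target;   /* predicted taken: redirect fetch */
        return true;
    }
    *next_pc = pc + 4;          /* predicted not taken / not a branch */
    return false;
}
```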
BTB
• Unlike a branch-prediction buffer, the predicted entry must be
matched to this instruction, because the predicted PC will be sent out
before it is known whether this instruction is even a branch
• If the processor did not check whether the entry matched this PC,
then the wrong PC would be sent out for instructions that were not
branches, resulting in worse performance
• Store only the predicted-taken branches in the branch target buffer,
since an untaken branch should simply fetch the next sequential
instruction, as if it were not a branch
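Continuing the sketch above, the update policy described on this slide (enter only taken branches, drop entries whose branch resolves not taken) might look like the following; it reuses btb_entry_t and btb[] from the lookup sketch:

```c
/* Sketch of the update policy: only taken branches are entered in the
   BTB; an entry whose branch resolves not taken is invalidated, since
   sequential fetch needs no prediction for it. */
void btb_update(uint64_t pc, bool taken, uint64_t target)
{
    btb_entry_t *e = &btb[(pc >> 2) % BTB_ENTRIES];
    if (taken) {               /* insert or refresh the taken branch */
        e->valid  = true;
        e->tag    = pc;
        e->target = target;
    } else if (e->valid && e->tag == pc) {
        e->valid = false;      /* was predicted taken, resolved not taken */
    }
}
```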
Steps for using BTB with 5-stage pipeline
• Dealing with mispredictions and misses is a significant challenge,
since instruction fetch has to stall while the buffer entry is rewritten
• Make this process fast to minimize the penalty
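The penalty table referenced in the exercise below is not reproduced in these slides; for reference, the standard version for a simple five-stage pipeline (as given in Hennessy and Patterson) is:

Instruction in buffer | Prediction | Actual branch | Penalty (cycles)
yes                   | taken      | taken         | 0
yes                   | taken      | not taken     | 2
no                    |            | taken         | 2
no                    |            | not taken     | 0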
Exercise
Determine the total branch penalty for a branch-target buffer, assuming
the penalty cycles for individual mispredictions from the table on the
previous slide (reproduced above).
Make the following assumptions about the prediction accuracy and hit
rate:
■ Prediction accuracy is 90% (for instructions in the buffer).
■ Hit rate in the buffer is 90% (for branches predicted taken).
Exercise
We compute the penalty by looking at the probability of two events:
1. The branch is predicted taken but ends up being not taken
2. The branch is taken but is not found in the buffer
Both carry a penalty of two cycles.
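Working the numbers, as in the standard textbook version of this example:

Probability (branch in buffer, but actually not taken) = 90% × 10% = 0.09
Probability (branch not in buffer, but actually taken) = 10% = 0.10
Branch penalty = (0.09 + 0.10) × 2 = 0.38 clock cycles

So the total branch penalty is about 0.38 clock cycles per branch.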
BTB
• The improvement from dynamic branch prediction will grow as the
pipeline length and, hence, the branch delay grows; in addition,
better predictors will yield a larger performance advantage
• Modern high-performance processors have branch misprediction
delays on the order of 15 clock cycles; clearly, accurate prediction is
critical!
BTB
• One variation on the branch-target buffer is to store one or more
target instructions instead of, or in addition to, the predicted target
address.
• This variation has two potential advantages.
1. It allows the branch-target buffer access to take longer than the time
between successive instruction fetches, possibly allowing a larger
branch-target buffer.
2. Buffering the actual target instructions allows us to perform an
optimization called branch folding. Branch folding can be used to obtain
0-cycle unconditional branches and sometimes 0-cycle conditional
branches.
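A rough sketch of what such an entry and branch folding could look like in C; the entry layout and the restriction to unconditional branches are simplifying assumptions:

```c
#include <stdint.h>
#include <stdbool.h>

/* A BTB entry extended to buffer the target instruction itself. */
typedef struct {
    bool     valid;
    uint64_t tag;            /* PC of the branch */
    uint64_t target;         /* predicted target address */
    uint32_t target_insn;    /* copy of the instruction at 'target' */
    bool     unconditional;  /* safe to fold without a prediction */
} folding_btb_entry_t;

/* On a hit for an unconditional branch, hand the pipeline the buffered
   target instruction instead of the branch and steer fetch past it:
   the branch itself is "folded" away for a 0-cycle cost. */
uint32_t fetch_with_folding(uint64_t pc, uint32_t raw_insn,
                            folding_btb_entry_t *e, uint64_t *next_pc)
{
    if (e->valid && e->tag == pc && e->unconditional) {
        *next_pc = e->target + 4;  /* continue after the folded target */
        return e->target_insn;     /* the branch costs 0 cycles */
    }
    *next_pc = pc + 4;
    return raw_insn;
}
```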
Return Address Predictors
• Procedure returns are important: they account for a significant
fraction of indirect jumps in many programs
• Though procedure returns can be predicted with a branch-target
buffer, the accuracy of such a prediction technique can be low if the
procedure is called from multiple sites and the calls from one site are
not clustered in time
• Many designs use a small buffer of return addresses operating as a stack
• This structure caches the most recent return addresses: pushing a
return address on the stack at a call and popping one off at a return
Return address predictors
• If the cache is sufficiently large (i.e., as large as the maximum call
depth), it will predict the returns perfectly.
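A minimal C sketch of such a return-address stack; the depth and the circular overwrite-on-overflow behavior are illustrative assumptions:

```c
#include <stdint.h>

#define RAS_DEPTH 16              /* assumed depth */

static uint64_t ras[RAS_DEPTH];
static unsigned ras_top;          /* circular: overflowing the stack
                                     overwrites the oldest entries */

void ras_push(uint64_t return_pc) /* at a call instruction */
{
    ras_top = (ras_top + 1) % RAS_DEPTH;
    ras[ras_top] = return_pc;
}

uint64_t ras_pop(void)            /* at a return: the predicted PC */
{
    uint64_t pc = ras[ras_top];
    ras_top = (ras_top + RAS_DEPTH - 1) % RAS_DEPTH;
    return pc;
}
```

As the slide notes, if the call depth never exceeds RAS_DEPTH, every return is predicted perfectly; deeper call chains wrap around and mispredict the oldest returns.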
Integrated Instruction Fetch Units
• To meet the demands of multiple-issue processors, many recent
designers have chosen to implement an integrated instruction fetch
unit as a separate autonomous unit that feeds instructions to the rest
of the pipeline
• Recent designs have used an integrated instruction fetch unit that
integrates several functions:
1. Integrated branch prediction—The branch predictor becomes part of
the instruction fetch unit and is constantly predicting branches, so as to
drive the fetch pipeline.
Integrated Instruction Fetch Units
2. Instruction prefetch—To deliver multiple instructions per clock, the
instruction fetch unit will likely need to fetch ahead. The unit
autonomously manages the prefetching of instructions, integrating it
with branch prediction.
3. Instruction memory access and buffering—When fetching multiple
instructions per cycle, a variety of complexities are encountered,
including the difficulty that fetching multiple instructions may require
accessing multiple cache lines. The instruction fetch unit encapsulates
this complexity, using prefetch to try to hide the cost of crossing cache
blocks. The instruction fetch unit also provides buffering, essentially
acting as an on-demand unit to provide instructions to the issue stage
as needed and in the quantity needed.
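The buffering role described in point 3 is essentially a FIFO between fetch and issue; here is a minimal sketch under assumed sizes and names:

```c
#include <stdint.h>
#include <stdbool.h>

#define FETCH_Q_DEPTH 32

typedef struct {
    uint32_t insn[FETCH_Q_DEPTH];
    unsigned head, tail, count;
} fetch_queue_t;

/* Fetch side: the fetch unit fills the queue ahead of demand;
   a full queue simply stalls further fetch. */
bool fq_push(fetch_queue_t *q, uint32_t insn)
{
    if (q->count == FETCH_Q_DEPTH) return false;
    q->insn[q->tail] = insn;
    q->tail = (q->tail + 1) % FETCH_Q_DEPTH;
    q->count++;
    return true;
}

/* Issue side: drain up to 'want' instructions per cycle, "as needed
   and in the quantity needed"; returns how many were delivered. */
unsigned fq_pop(fetch_queue_t *q, uint32_t *out, unsigned want)
{
    unsigned got = 0;
    while (got < want && q->count > 0) {
        out[got++] = q->insn[q->head];
        q->head = (q->head + 1) % FETCH_Q_DEPTH;
        q->count--;
    }
    return got;
}
```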
Speculation: Implementation Issues and Extensions
• Four issues involve the design trade-offs in speculation; we start
with register renaming, the approach that is often used instead of a
reorder buffer
1. Speculation Support: Register Renaming versus Reorder Buffers
• Register Renaming
With register renaming, an extended set of physical registers replaces
the reorder buffer: if the processor does not issue new instructions for
a period of time, all existing instructions will commit, and the register
values will appear in the portion of the register file that corresponds
to the architecturally visible registers
Register renaming
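To make the renaming idea concrete, here is a minimal sketch of a rename map over a physical register file; the register counts and the trivial free-register handling are simplifying assumptions:

```c
#include <stdint.h>

#define ARCH_REGS 32
#define PHYS_REGS 128

static unsigned map_table[ARCH_REGS]; /* arch reg -> phys reg holding
                                         its latest value */
static uint64_t phys_regs[PHYS_REGS]; /* the extended register file */
static unsigned next_free;            /* stand-in for a real free list */

/* Rename a destination register at issue: allocate a fresh physical
   register so in-flight readers keep the old value (avoiding WAW and
   WAR hazards), then record the new mapping. */
unsigned rename_dest(unsigned arch_reg)
{
    unsigned p = next_free++ % PHYS_REGS; /* real designs recycle freed
                                             registers via a free list */
    map_table[arch_reg] = p;
    return p;
}
```

Once all in-flight instructions commit, map_table designates exactly which physical registers hold the architecturally visible state, matching the observation on the previous slide.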