Two-Level Adaptive Training Branch Prediction

Tse-Yu Yeh and Yale N. Patt
Department of Electrical Engineering and Computer Science
University of Michigan
Ann Arbor, Michigan 48109-2122

Abstract
High-performance microarchitectures use, among other structures, deep pipelines to help speed up execution. The effectiveness of a deep pipeline in the presence of branches depends on a good branch predictor; in fact, the literature contains a number of branch prediction proposals. Some are static, in that they use opcode information and profiling statistics to make predictions. Others are dynamic, in that they use run-time execution history to make predictions.

This paper proposes a new dynamic branch predictor, the Two-Level Adaptive Training scheme, which alters the branch prediction algorithm on the basis of information collected at run-time. Several configurations of the Two-Level Adaptive Training Branch Predictor are introduced, simulated, and compared to simulations of other known static and dynamic branch prediction schemes. Two-Level Adaptive Training Branch Prediction achieves 97 percent accuracy on nine of the ten SPEC benchmarks, compared to less than 93 percent for the other schemes. Since a misprediction flushes the speculative execution in progress, the relevant metric is the miss rate: it is 3 percent for Two-Level Adaptive Training vs. 7 percent or more for the other schemes, which represents a considerable reduction in the number of pipeline flushes.

1 Introduction

Pipelining, beginning as early as [18] and continuing until the present time [6], has been one of the most effective ways to improve performance on a single processor. Branches, however, impede machine performance due to the pipeline stalls, or bubbles, they introduce: conditional branches have to be resolved before the target instruction stream can be fetched, and unconditional branches require the target address to be calculated before the target can be fetched. As pipelines get deeper, the effect of unresolved branches on effective instruction issue bandwidth becomes greater. Branch prediction is a way to reduce this performance loss: by predicting the outcome of a branch and prefetching down the predicted path as early as the decoding stage, the pipeline bubbles can be reduced. A misprediction, however, requires flushing the speculative execution already in progress.

Branch prediction schemes can be classified as static or dynamic, depending on the information used to make the predictions. Static schemes rely on information available before run-time, such as the branch opcode or profiling statistics; certain static schemes are as simple as predicting that all branches are taken.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1991 ACM 0-89791-460-0/91/0011/0051 $1.50
Static schemes that predict all branches taken achieve approximately 60 to 70 percent accuracy. The Backward Taken and Forward Not Taken scheme [16] is fairly effective for loop-bound programs, but the static schemes considered in this study achieve under 93 percent accuracy. Profiling uses statistics collected from previous runs of the program to predict, in advance, the directions branches will take at run-time.

Dynamic schemes use knowledge of the run-time execution history to make predictions. One example is the Branch Target Buffer design of Lee and Smith [13], a structure which records the target address of a branch along with branch history information; each entry uses a 2-bit saturating up-down counter to make a prediction based on the results of the last executions of that branch.

The known static and dynamic schemes achieve under 93 percent prediction accuracy on the benchmarks used in this study, whereas the Two-Level Adaptive Training scheme proposed in this paper achieves 97 percent. Because each misprediction flushes the instructions issued after the branch, this reduction in mispredictions is significant, particularly on programs with irregular branch behavior.

This paper is organized as follows: Section 2 describes the proposed Two-Level Adaptive Training scheme. Section 3 discusses implementation methods. Section 4 describes the simulation methodology, including the selection of traces and the simulation model. Section 5 presents the simulation results for the Two-Level Adaptive Training schemes, the Static Training schemes, and the other static and dynamic predictors. Section 6 offers concluding remarks.
2 Two-Level Adaptive Training Branch Prediction

The Two-Level Adaptive Training Branch Prediction scheme predicts a branch dynamically on the basis of two levels of branch history information. The first level is the history of the last n executions of the branch, recorded in a history register. The second level is the branch behavior for the last s occurrences of the particular pattern contained in that history register, recorded in a pattern table. The design uses structures similar to those of the Static Training scheme proposed by Lee and Smith [13]. The major advantage of the proposed scheme is that the history information used by the predictor is collected at run-time, by updating the history registers and the pattern table with the actual execution results of the program; therefore, no pre-runs of the program to accumulate profiling statistics are necessary. A second advantage is that the prediction adapts to the current execution behavior, whereas statistics collected by profiling may not be applicable when the program runs on different data sets. This distinction matters most for deeply pipelined and superscalar processors, where misprediction penalties seriously impact performance.
2.1 Concept of the Two-Level Adaptive Training Branch Prediction

The Two-Level Adaptive Training scheme uses two major data structures: the branch history register table (HRT) and the branch pattern table (PT). These are similar to the structures used in the Static Training scheme of Lee and Smith, but the information in them is collected at run-time instead of by profiling, eliminating the pre-runs; old history can be discarded in favor of new, higher-accuracy information accumulated on the fly.

Each entry of the history register table is associated with one static branch. The history register is a shift register which shifts in bits representing the results of the most recent n executions of that branch: when the branch is resolved, its outcome is shifted into the register. The content of the history register of the branch being predicted, called the history pattern, is used to index into the pattern table. Each pattern table entry records the branch behavior for the last occurrences of that particular history pattern, and it is this entry that supplies the prediction.

Trace-driven simulation results for the Two-Level Adaptive Training schemes, as well as for the Static Training schemes and the other known prediction schemes, on the SPEC benchmark suite are presented in Section 5.
Figure 1: The structure of the Two-Level Adaptive Training scheme: the branch history register (shifted left when updated) indexes into the pattern table (PT).

Figure 2: The state diagrams of the finite-state machines used in the pattern table entry: Last-Time (LT) and automata A1, A2, A3, and A4.
All the branch history and pattern information is updated with the outcomes of executing branches: the old pattern history bits and the new branch result come together as inputs to generate the new pattern history bits.

The pattern table has 2^k entries, each indexed by one distinct history pattern. Each entry stores the state of a finite-state machine which characterizes the branch behavior for the occurrences of that pattern. When a conditional branch B_i is being predicted, its k-bit history register R_i is read from the HRT; its content, denoted R_{i,c-k} R_{i,c-k+1} ... R_{i,c-1}, records the outcomes of the last k executions of the branch and is used to address the pattern table. The state S_c stored in the addressed pattern table entry determines the prediction z_c through the prediction decision function λ:

    z_c = λ(S_c)                          (1)

When the branch is resolved, the outcome R_{i,c} is shifted left into the history register, whose content becomes R_{i,c-k+1} ... R_{i,c}, and the state in the addressed pattern table entry is updated by the state transition function δ, which takes the old state and the branch result as inputs:

    S_{c+1} = δ(S_c, R_{i,c})             (2)

A straightforward logic circuit can be used to implement the transition function. The updated state and history pattern are then used for predicting the next execution of the branch.

The finite-state machines used in the pattern table entries of this study are shown in Figure 2. The Last-Time automaton simply stores the outcome of the last time the history pattern appeared: the next prediction is what happened the last time. Automaton A2 is the 2-bit saturating up-down counter used by Lee and Smith in the Branch Target Buffer design: the counter is incremented when the branch is taken and decremented when it is not taken, and the branch is predicted taken when the counter value is greater than or equal to two, not taken otherwise. Automata A1, A3, and A4 are similar four-state machines which differ from A2 in their state transitions.
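The mechanics above can be sketched in software. The following is a minimal simulation sketch (not the authors' implementation) using an idealized per-branch history register table and automaton A2; the class and parameter names are illustrative.

```python
# Sketch of Two-Level Adaptive Training with an ideal per-branch HRT and
# automaton A2 (2-bit saturating up-down counter), following eqs. (1)-(2).

class TwoLevelPredictor:
    def __init__(self, k=4):
        self.k = k                        # history register length
        self.hrt = {}                     # branch address -> k-bit history
        self.pt = [2] * (1 << k)          # pattern table of 2-bit counters

    def predict(self, addr):
        pattern = self.hrt.get(addr, 0)   # history pattern R_{i,c-k}..R_{i,c-1}
        return self.pt[pattern] >= 2      # z_c = lambda(S_c): predict taken

    def update(self, addr, taken):
        pattern = self.hrt.get(addr, 0)
        s = self.pt[pattern]
        # S_{c+1} = delta(S_c, R_{i,c}): saturating increment/decrement
        self.pt[pattern] = min(3, s + 1) if taken else max(0, s - 1)
        # shift the outcome into the history register (left shift)
        self.hrt[addr] = ((pattern << 1) | taken) & ((1 << self.k) - 1)
```

With k = 4, a loop branch with the repeating outcome pattern taken-taken-taken-not-taken is learned after a single pass: each of the recurring 4-bit patterns deterministically selects the next outcome, so only the first encounter of the pattern preceding the not-taken outcome mispredicts.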
otherwise, Automata Both ing are dictions namic these tion Level from diction pattern output branch dictions at tive ate tory two in the
predicted similar
branch
and
A4 branch
Static dynamic
Training
Two-Level
Traintheir predythe
is used
pattern AHRT,
a new
information,
i.e.
history.
implementing
changes but
address approach
a condiis called
is preset the
history History
table Register
a given
branch for
Since this
colliimple-
is determined history are pattern, made times history different the pattern to the
execution. execution is, the history updates with the history therefore, decision Predictions the
when
accessing
a hash
mentation history. for with this In this the Ideal is a history were Branch
in more would but the the above for the The is lower
execution accuracy
execution.
an AHRT,
on the
approaches in which
Register for
of each be found different Two-Level Adaptive the Since Adaptive rent proper tive ferent contrary, sults
branch,
As a result,
same
branch there
register
information
simulated Predictor.
Two-Level
inputs
function of Two-Level
configurations: entry lated ulation is lost history 4-way with due data to register
Adaptive Training the pattern execution can still and not execution Training,
Training. change history the With data behavior be highly sets. well behavior.
adaptively bits
with
program
accuracy practical
predictor
adjust program
designs.
branch Training
the updates,
3.2
The needs tion. which
Prediction
Two-Level two
Latency
Adaptive table Training lookups for next is to Branch to make into Predictor a predicone cycle, address. the pattern and store patof a prothe taprethe of a
predict execution
if changing
sequential
in different
behavior.
It is hard is usually in solution table at the a prediction with next the time table
processor
3
3.1
Implementation
Implementations History Register
to have branch a big
Methods
of the Per-address
lookup
updated pattern in the for must history when the result been the
register
is updated, register
Table
enough history history two register regisHi~tory per-address A fixed together (LRU) lower table part and in numas a alof a the the histhe a
prediction It ter is not in real feasible ble the table for each static for to have the implement cache. grouped are its own
as a prediction history the does problem of the very branch often Since does not when this the register branch in the not
Therefore,
approaches
diction pattern
is available
are propoeed Register The register ber set. branch higher entry tory Table. first table
Per-addrew the
have
to be accessed of the
approach
is required
of entries Within
executed
a set, for
superscalar
gorithm
has a high
is used
is predicted
have
previous
allocated register
is confirmed.
implemented Register
is called When
Associative
(AHRT).
54
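The AHRT organization and the stored-prediction idea of Section 3.2 can be combined in one sketch. This is an assumed design for illustration, not the paper's hardware: each entry caches the prediction computed at update time, so predict() performs only the HRT access.

```python
# Sketch of a 4-way set-associative AHRT with LRU replacement. Each entry is
# [tag, history, cached_prediction]; the prediction for the next occurrence
# is precomputed at update time (Section 3.2). Allocation on a predict miss
# is a software simplification; a new entry is biased toward taken.

class AHRT:
    def __init__(self, entries=512, ways=4, k=12):
        self.sets, self.ways, self.k = entries // ways, ways, k
        self.table = [[] for _ in range(self.sets)]   # each set: LRU order

    def _find(self, addr):
        index, tag = addr % self.sets, addr // self.sets
        ways = self.table[index]
        for entry in ways:
            if entry[0] == tag:
                ways.remove(entry)        # move to most-recently-used slot
                ways.append(entry)
                return entry
        entry = [tag, 0, True]            # allocate: empty history, taken
        if len(ways) == self.ways:
            ways.pop(0)                   # evict the LRU way
        ways.append(entry)
        return entry

    def predict(self, addr):
        return self._find(addr)[2]        # single lookup: cached prediction

    def update(self, addr, taken, pattern_table):
        entry = self._find(addr)
        old = entry[1]
        s = pattern_table[old]            # update counter at the old pattern
        pattern_table[old] = min(3, s + 1) if taken else max(0, s - 1)
        new = ((old << 1) | taken) & ((1 << self.k) - 1)
        entry[1] = new
        entry[2] = pattern_table[new] >= 2   # precompute next prediction
```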
Figure 3: Distribution of dynamic instructions.

Figure 4: Distribution of dynamic branch instructions.
4 Methodology and Simulation Model

4.1 Description of Traces

Trace-driven simulation was used for this study. The instruction traces were generated with the Motorola 88100 instruction set simulator (ISIM), which decodes and executes the benchmarks; the traces were fed to the branch prediction simulator to collect statistics.

Nine benchmarks from the SPEC benchmark suite are used in this study. Five are floating point benchmarks: doduc, fpppp, matrix300, spice2g6, and tomcatv; four are integer benchmarks: eqntott, espresso, gcc, and li. Nasa7 is not included because it takes too long to capture the loop execution behavior of all seven kernels. The floating point benchmarks tend to have many loop-bound branches and thus highly repetitive branch behavior, on which prediction accuracy is high; the integer benchmarks have more irregular branch behavior. The number of static conditional branches in the trace of each benchmark is listed in Table 1. All benchmarks except fpppp and gcc were simulated until they finished execution; the number of dynamic instructions executed ranges from millions to 1.8 billion.

Table 1: The number of static conditional branches in the benchmarks.

The branch instructions are classified into conditional branches, unconditional branches, subroutine calls, and subroutine returns; the branch prediction schemes under study are used for the conditional branches. Subroutine returns are predicted with a return address stack, as proposed by Kaeli and Emma [2]: when a subroutine call is detected, the return address is pushed onto the stack, and when a return is detected, the stack is popped to supply the predicted target address; a return can be mispredicted when the stack overflows. Unconditional branches whose target is computed from the instruction address and an offset can have their target addresses calculated immediately; branches whose targets depend on registers must wait for the register contents to become ready.
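The return address stack described above can be sketched as follows; the depth and the overflow policy (dropping the oldest entry) are assumptions, since the paper only notes that overflow causes mispredictions.

```python
# Sketch of a return address stack: subroutine calls push the return
# address, returns pop it to supply the predicted target.

class ReturnStack:
    def __init__(self, depth=16):
        self.stack = []
        self.depth = depth

    def call(self, return_addr):
        if len(self.stack) == self.depth:
            self.stack.pop(0)          # assumed policy: drop oldest on overflow
        self.stack.append(return_addr)

    def predict_return(self):
        # returns None when the stack is empty (no prediction available)
        return self.stack.pop() if self.stack else None
```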
4.2 Simulation Model

Several configurations of the Two-Level Adaptive Training scheme were simulated, with the three history register table implementations: the per-address AHRT, the hash HHRT, and the ideal IHRT. For comparison, Lee and Smith's Branch Target Buffer designs and the Static Training schemes were also simulated.

To distinguish the different branch prediction schemes, the following naming convention is used: Scheme(History(Size, Entry_Content), Pattern(Size, Entry_Content), Data). Scheme specifies the branch prediction scheme: Two-Level Adaptive Training (AT), Static Training (ST), or Lee and Smith's design (LS). History specifies the history-keeping table, AHRT, HHRT, or IHRT; its Size is the number of entries and its Entry_Content is the content of each entry, for example a 12-bit shift register (12SR). Pattern specifies the pattern table; its Size is the number of history patterns and its Entry_Content can be any of the automata shown in Figure 2 or a preset prediction bit (PB). Data specifies the training data for the Static Training schemes: Same when the training and testing data sets are the same, Diff when they are different. If Data is not specified, no training is needed, as for the Two-Level Adaptive Training and Lee and Smith's schemes. For example, AT(AHRT(512,12SR), PT(2^12,A2)) denotes a Two-Level Adaptive Training scheme with a 512-entry AHRT of 12-bit history registers and a 2^12-entry pattern table whose entries use automaton A2.

The simulated configurations, listed in Table 2, include AHRTs and HHRTs of 256 and 512 entries, history register lengths of 6, 8, 10, and 12 bits, pattern table entries using Last-Time and automata A1 through A4, Static Training schemes with preset prediction bits trained on the same and on different data sets, and Lee and Smith's designs using A2 and Last-Time.

Table 2: Configurations of simulated branch predictors. (SR - Shift Register, AT - Two-Level Adaptive Training, ST - Static Training, LS - Lee and Smith's Branch Target Buffer design, LT - Last-Time, PB - Preset Prediction Bit.)

At the beginning of execution, and when an HRT entry is re-allocated to a new branch, the history register bits are initialized to 1s and the pattern table entries to states predicting taken; according to the simulation data, about 60 percent of dynamic branches are taken, so a bias toward taken is the more likely guess.

The static schemes simulated also include Always Taken, Backward Taken and Forward Not Taken, and a simple profiling scheme.
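The naming convention above can be captured with a small formatting helper; this is a hypothetical utility for illustration, not part of the paper's tooling.

```python
# Hypothetical helper that formats a predictor configuration in the paper's
# naming convention: Scheme(History(Size,Entry_Content),Pattern(Size,
# Entry_Content),Data). Data is left empty for schemes needing no training.

def config_name(scheme, hrt_kind, hrt_size, hist_bits, automaton, data=""):
    history = f"{hrt_kind}({hrt_size},{hist_bits}SR)"
    pattern = f"PT(2^{hist_bits},{automaton})"   # pattern table size is 2^k
    return f"{scheme}({history},{pattern},{data})"
```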
Profiling is done by taking statistics on the dynamic conditional branches in a training run; each branch is then predicted in the direction it takes most frequently.
5 Simulation Results

The simulation results of the Two-Level Adaptive Training schemes, the Static Training schemes, and the other static and dynamic schemes are presented in this section. Figures 5 through 10 show the prediction accuracy of the different designs. On the X axis of each graph are the benchmarks; the final categories show the geometric mean of the prediction accuracy across the floating point benchmarks, across the integer benchmarks, and, as G Mean, across all nine benchmarks.

Figure 5: Two-Level Adaptive Training schemes using different prediction automata.

5.1 The Two-Level Adaptive Training Schemes

The Two-Level Adaptive Training schemes were simulated with different state transition automata, different history register table implementations, and different history register lengths, in order to assess the effect of each design parameter on prediction accuracy. Automaton A2, which performed best among the state transition automata in this study, is used in the simulations of the following sections.

Figure 6: Two-Level Adaptive Training schemes using different history register table implementations.
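The automata of Figure 2 can be expressed as explicit λ/δ tables. A sketch of Last-Time and A2 follows; the exact transitions of A1, A3, and A4 are not fully recoverable here, but they differ from A2 only in their δ tables.

```python
# Last-Time and the 2-bit saturating counter A2 as (predict, next) pairs,
# mirroring z_c = lambda(S_c) and S_{c+1} = delta(S_c, R_{i,c}).
# States are integers; the input is the outcome (0 = not taken, 1 = taken).

LAST_TIME = {
    "predict": lambda s: s == 1,
    "next": lambda s, taken: 1 if taken else 0,
}

A2 = {
    "predict": lambda s: s >= 2,   # taken when the counter is 2 or 3
    "next": lambda s, taken: min(3, s + 1) if taken else max(0, s - 1),
}
```

A2's hysteresis is visible in the tables: a single anomalous outcome in a mostly-taken pattern moves the counter from 3 to 2 without flipping the prediction, whereas Last-Time flips immediately, which is why it is more susceptible to noise in the execution history.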
5.1.1 Effect of State Transition Automata

Figure 5 shows the prediction accuracy of the Two-Level Adaptive Training schemes using the different finite-state machines of Figure 2. The schemes using automata A2, A3, and A4 achieve equivalent accuracy, around 97 percent. The Last-Time scheme, which records only what happened the last time a pattern appeared, is lower: because its prediction flips on a single anomalous outcome, it is more susceptible to noise in the execution history.

5.1.2 Effect of History Register Table Implementations

Figure 6 shows the effect of the HRT implementation. The scheme with the IHRT performs best, since every static branch keeps its own history register. The 512-entry 4-way AHRT comes second, maintaining accuracy close to the IHRT because of its high hit ratio; shrinking the AHRT to 256 entries lowers the hit ratio and the accuracy accordingly. The HHRT is lowest; the decrease in accuracy is due to the interference between branches hashed into the same entry.
Table 3: Training and testing data sets of each benchmark.
Figure 7: Two-Level Adaptive Training schemes using different history register lengths.

5.1.3 Effect of History Register Length

Figure 7 shows the effect of the history register length on the prediction accuracy of the Two-Level Adaptive Training scheme; registers of different lengths were simulated with the same pattern table automaton. Lengthening the history registers often improves the accuracy, but the improvement diminishes as an asymptote is reached. Since the pattern table doubles in size with each additional history bit, the cost of a longer register must be weighed against its accuracy gain.

5.2 Static Training Schemes

The Static Training scheme of Lee and Smith makes a prediction for each history pattern of the last n executions of a branch on the basis of statistics gathered beforehand: a profiling run keeps track of the taken and not-taken behavior following each pattern, and a preset prediction bit is calculated for each pattern table entry. At run-time, the history pattern of the branch being predicted indexes the table of preset bits. Because the statistics must be gathered in advance, Static Training requires the program to be run beforehand.

The Static Training schemes were simulated in two ways: trained and tested on the same data sets, and trained and tested on different data sets. The training and testing data sets of each benchmark are listed in Table 3. Four of the nine benchmarks, eqntott, matrix300, fpppp, and tomcatv, could not be trained on different data sets because there are no other applicable data sets for them.

The results are shown in Figure 8. When the training and testing data sets are the same, the 512-entry Static Training schemes with 12-bit history registers achieve about 97 percent accuracy, similar to the Two-Level Adaptive Training schemes. When the training data set differs from the testing data set, however, the accuracy is lower, because the branch behavior on the training data is not the same as on the testing data: the accuracy for gcc and espresso is about 5 percent and 1 percent lower, respectively, while the degradations for the floating point benchmarks are within 0.5 percent.
58
camu190a
1
d Srmdl
Plwdictlm
sch-
+ Pr&h@(..s.fnl)
Figure schemes.
8:
Prediction
accuracy
of
Static
Training
Figure
10:
Comparison
of branch
prediction
schemes
the A2.
same Using
with
practi4 to
is about
compared fall
Taken
predict of the
1
U3(AMIT(512LT).J 0 LS(MIYT(5WA2).) * L.SWIMT(SULTM a L$$+HRT@12#2).) Xm - N.q, Wal ~ Pm$llnj(,,samo)
other
schemes.
76 percent, The (BTFN) matrix300 For racy The quite average the Backward is effective and Taken for tomcatv and the but Forward loop-bound not for taken benchmarks benchmarks. accuthe other
loop-bound
benchmarks,
benchmarks, The
is approximate benchmark
accuracy
60 percent. scheme accumulate is taken each branch. simulated the and how The here statistics many is to run of how times the bit to for
9:
Prediction BTFN,
accuracy Always
of Branch and
Target the
Buffer Profiling
Taken,
different the
data average
sets ac-
in the on
branch branch
whether
the
curacy
the
not-taken of the
5.3
Other
The
average
scheme
is about at the
scheme
low
prediction
Always
5.4
Comparison
of
Schemes
were sim-
Al,
are shown
Figure 10 illustrates the comparison between the schemes mentioned above. The 512-entry 4-way AHRT was chosen for all the uses of HRT, because it is simple enough to be implemented. Tmm-Level Adaptive
The similar
Al
and similar
Static
training At scheme
are is the
on the
basis Adaptive
of is the
lower Using
those
costs.
Two-Level the
to IHRT, an IHRT
were
Training about
average
accuracy graph,
97 percent.
be seen
59
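The headline comparison can be made concrete with a short calculation: with prediction accuracy a, the miss rate is 1 - a, and each miss costs a pipeline flush, so improving accuracy from a1 to a2 removes a fraction 1 - (1 - a2)/(1 - a1) of the flushes. Applied to the accuracies reported in this paper:

```python
# Misprediction-flush arithmetic for the reported accuracies: raising accuracy
# from 93% to 97% cuts the miss rate from 7% to 3%, i.e. 4/7, about 57%,
# fewer pipeline flushes.

def flush_reduction(acc_old, acc_new):
    return 1 - (1 - acc_new) / (1 - acc_old)
```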
Lee and Smith's Branch Target Buffer design, which predicts a branch from only its own last few executions, achieves about 89 percent average accuracy.

6 Concluding Remarks

This paper proposes a new dynamic branch predictor, Two-Level Adaptive Training, which predicts a branch by the history of the last n branches and the branch behavior for the last s occurrences of that unique pattern of the last n branches. Three configurations of the history register table were studied: the AHRT, which is a set-associative cache; the HHRT, which is a hash table; and the IHRT, which is an ideal table large enough to hold a history register for each static branch. Each was simulated with different state transition automata and different history register lengths. The accuracy obtained with the AHRT is usually close to that obtained with the IHRT and higher than that obtained with the HHRT of the same size; the accuracy can also be raised by lengthening the history registers.

For comparison, the Static Training schemes, Lee and Smith's Branch Target Buffer designs, and the static schemes Always Taken, Backward Taken and Forward Not Taken, and profiling were also simulated. Two-Level Adaptive Training achieves an average prediction accuracy of 97 percent on nine of the ten SPEC benchmarks, about 4 percent better than the other schemes. Since a prediction miss causes a flush of the speculative execution in progress, the reduction in mispredictions means a considerable performance improvement over the other schemes.

Deep pipelining and superscalar execution are methods of exploiting instruction-level parallelism to improve performance; the effectiveness of a high-performance superscalar processor, however, depends critically on a good branch predictor. Two-Level Adaptive Training Branch Prediction is proposed as a way to improve performance by minimizing the penalty of mispredicted branches.

References

[1] M. Butler, T-Y. Yeh, Y.N. Patt, M. Alsup, H. Scales, and M. Shebanow, "Single Instruction Stream Parallelism Is Greater Than Two," Proceedings of the 18th International Symposium on Computer Architecture (May 1991).
[2] D.R. Kaeli and P.G. Emma, "Branch History Table Prediction of Moving Target Branches Due to Subroutine Returns," Proceedings of the 18th International Symposium on Computer Architecture (May 1991), pp. 34-42.
[3] Tse-Yu Yeh, "Two-Level Adaptive Training Branch Prediction," Technical Report, University of Michigan.
[4] Motorola, MC88100 RISC Microprocessor User's Manual, Phoenix, Arizona (March 13, 1989).
[5] W.W. Hwu, T.M. Conte, and P.P. Chang, "Comparing Software and Hardware Schemes for Reducing the Cost of Branches," Proceedings of the 16th International Symposium on Computer Architecture (May 1989).
[6] N.P. Jouppi and D. Wall, "Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines," Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (April 1989), pp. 272-282.
[7] D.J. Lilja, "Reducing the Branch Penalty in Pipelined Processors," IEEE Computer, pp. 47-55.
[8] W.W. Hwu and Y.N. Patt, "Checkpoint Repair for Out-of-order Execution Machines," IEEE Transactions on Computers (December 1987), pp. 1496-1514.
[9] P.G. Emma and E.S. Davidson, "Characterization of Branch and Data Dependencies in Programs for Evaluating Pipeline Performance," IEEE Transactions on Computers (1987), pp. 859-876.
[10] J.A. DeRosa and H.M. Levy, "An Evaluation of Branch Architectures," Proceedings of the 14th International Symposium on Computer Architecture (June 1987), pp. 10-16.
[11] D.R. Ditzel and H.R. McLellan, "Branch Folding in the CRISP Microprocessor: Reducing Branch Delay to Zero," Proceedings of the 14th International Symposium on Computer Architecture (June 1987).
[12] S. McFarling and J. Hennessy, "Reducing the Cost of Branches," Proceedings of the 13th International Symposium on Computer Architecture (1986).
[13] J. Lee and A.J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design," IEEE Computer (January 1984).
[14] T.R. Gross and J. Hennessy, "Optimizing Delayed Branches," Proceedings of the 15th Workshop on Microprogramming (Oct. 1982), pp. 114-120.
[15] D.A. Patterson and C.H. Sequin, "RISC I: A Reduced Instruction Set VLSI Computer," Proceedings of the 8th International Symposium on Computer Architecture (May 1981), pp. 443-458.
[16] J.E. Smith, "A Study of Branch Prediction Strategies," Proceedings of the 8th International Symposium on Computer Architecture (May 1981).
[17] "... Processor System Implemented Through Pipelining," IEEE Computer (Feb. 1974), pp. 42-51.
[18] T.C. Chen, "Parallelism, Pipelining, and Computer Efficiency," Computer Design, Vol. 10, No. 1 (Jan. 1971).