Comp Arch Proj Report 2
PROJECT #2
Submitted by,
INTRODUCTION
In computer architecture, a branch predictor is a digital circuit that
tries to guess which way a branch will go before this is known for sure (i.e.,
before the branch executes). The purpose of the branch predictor is to improve
the flow in the instruction pipeline. Branch predictors play a critical role in
achieving high effective performance in many modern pipelined microprocessor
architectures such as x86.
In this project, we analyze the behavior of different branch predictor
configurations on three well-recognized benchmarks, namely GCC,
ANAGRAM and GO. We used SimpleScalar's sim-outorder, which models all the
execution aspects of Alpha 21264. The simulations provide the CPI
values (sim_CPI), which we used for comparison across the different benchmarks.
We used three types of hardware-based branch prediction strategies:
1) Bimodal Predictor: A simple predictor that uses 2-bit saturating
counters to predict whether a given branch is likely to be taken or not.
2) Two Level Predictor: A two-level adaptive predictor with an n-bit history
can predict any repetitive sequence with any period, provided all n-bit
subsequences are different. Its advantage is that it can quickly learn to
predict an arbitrary repetitive pattern.
3) Combined Predictor: A hybrid predictor, also called a combined predictor,
implements more than one prediction mechanism. The final prediction is
based either on a meta-predictor that remembers which of the predictors
has made the best predictions in the past, or on a majority vote among
an odd number of different predictors.
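The difference between the first two strategies can be sketched in a few lines of C. This is a minimal illustration, not SimpleScalar's implementation: the names (counter_predict, two_level_step, tail_misses_*) are hypothetical, the history length of 4 bits is an assumption, and only a single branch is modelled. It feeds the repeating pattern taken, taken, not-taken to one 2-bit counter and to a history-indexed table of 2-bit counters, and counts mispredictions after warm-up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HIST_BITS 4u                    /* assumed history length n = 4 */
#define PHT_SIZE  (1u << HIST_BITS)     /* one counter per history value */

/* 2-bit saturating counter: 0,1 predict not-taken; 2,3 predict taken. */
int counter_predict(uint8_t c) { return c >= 2; }

uint8_t counter_update(uint8_t c, int taken)
{
    if (taken)
        return (uint8_t)(c < 3 ? c + 1 : 3);
    return (uint8_t)(c > 0 ? c - 1 : 0);
}

/* Hypothetical single-branch two-level predictor: the last n outcomes
 * select which 2-bit counter makes the prediction. */
struct two_level {
    unsigned history;
    uint8_t  pht[PHT_SIZE];
};

void two_level_init(struct two_level *p)
{
    p->history = 0;
    memset(p->pht, 1, sizeof p->pht);   /* start weakly not-taken */
}

/* Predict, then train with the real outcome. Returns 1 if correct. */
int two_level_step(struct two_level *p, int taken)
{
    uint8_t *c = &p->pht[p->history];
    int correct = (counter_predict(*c) == !!taken);
    *c = counter_update(*c, taken);
    p->history = ((p->history << 1) | (unsigned)!!taken) & (PHT_SIZE - 1u);
    return correct;
}

/* Run `total` outcomes of a repeating pattern and return the number of
 * mispredictions among the last `tail` outcomes. */
int tail_misses_two_level(const int *pat, int period, int total, int tail)
{
    struct two_level p;
    int i, misses = 0;
    assert(period > 0 && tail <= total);
    two_level_init(&p);
    for (i = 0; i < total; i++) {
        int ok = two_level_step(&p, pat[i % period]);
        if (i >= total - tail && !ok)
            misses++;
    }
    return misses;
}

/* Same experiment with a single 2-bit counter (bimodal, one branch). */
int tail_misses_bimodal(const int *pat, int period, int total, int tail)
{
    uint8_t c = 1;                      /* start weakly not-taken */
    int i, misses = 0;
    assert(period > 0 && tail <= total);
    for (i = 0; i < total; i++) {
        int taken = pat[i % period];
        if (i >= total - tail && counter_predict(c) != !!taken)
            misses++;
        c = counter_update(c, taken);
    }
    return misses;
}
```

For the pattern {1, 1, 0}, every 4-bit window of the sequence is followed by a single fixed outcome, so the history-indexed table stops mispredicting once it has warmed up, while the single counter keeps missing the not-taken outcome in every period.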
Baseline (default RAS): Bimodal predictor with the default return address stack size of 8.
-bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2
2 Level Predictor: Two-level predictor with one first-level history register holding 4 bits of history, a 256-entry second-level table of 2-bit counters, and no XOR of the history with the branch address.
-bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2
RAS 16: Bimodal predictor with the return address stack (RAS) size changed to 16.
-bpred bimod -bpred:bimod 256 -bpred:ras 16 -bpred:btb 64 2
                    Benchmarks
Configuration       GCC      ANAGRAM   GO
Baseline            0.9069   0.466     0.8112
2 Level Predictor   0.9453   0.4578    0.8447
Combining           0.8934   0.4537    0.8052
Bimod:RAS 4         0.9115   0.4663    0.8113
Bimod:RAS 16        0.9066   0.466     0.8112
                    Benchmarks
Configuration       GCC      ANAGRAM   GO
Baseline            0.7102   0.9555    0.6402
2 Level Predictor   0.6627   0.9579    0.5747
Comb                0.7206   0.9684    0.6409
Bimod:RAS 4         0.7058   0.9552    0.64
Bimod:RAS 16        0.7105   0.9555    0.6402
                    Benchmarks
Configuration       GCC      ANAGRAM   GO
Baseline            0.8431   0.9608    0.7525
2 Level Predictor   0.791    0.9629    0.6915
Comb                0.8568   0.9736    0.7572
Bimod:RAS 4         0.8431   0.9608    0.7525
Bimod:RAS 16        0.8431   0.9608    0.7525
The graph of the Direction Hit Rates for each benchmark gives more
information on the effect of the branch prediction configurations on the
different benchmarks.
[Figure: Graphical representation of the above Direction Hit Rates; series: GCC, ANAGRAM, GO]
The Direction Hit Rate stays fairly constant across the branch predictor
configurations for each benchmark. The ANAGRAM benchmark has a higher
direction hit rate than the other two benchmarks. The 2 level predictor gives
the lowest direction hit rate for the GCC and GO benchmarks, and the combining
predictor gives the highest direction hit rate for all three benchmarks.
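One way to see why a combined predictor can follow whichever component is doing better is to look at the chooser itself. The sketch below is a hedged illustration of a 2-bit meta-predictor, not the simulator's code: the names meta_select and meta_update are hypothetical, and component A and component B stand in for the bimodal and two-level predictors.

```c
#include <stdint.h>

/* 2-bit chooser: values 0,1 select component A; values 2,3 select
 * component B. */
int meta_select(uint8_t meta, int pred_a, int pred_b)
{
    return meta >= 2 ? pred_b : pred_a;
}

/* Train the chooser only when exactly one component was correct, and
 * move it one step toward that component. */
uint8_t meta_update(uint8_t meta, int pred_a, int pred_b, int taken)
{
    int a_ok = (!!pred_a == !!taken);
    int b_ok = (!!pred_b == !!taken);

    if (a_ok == b_ok)
        return meta;                    /* both right or both wrong */
    if (b_ok)
        return (uint8_t)(meta < 3 ? meta + 1 : 3);
    return (uint8_t)(meta > 0 ? meta - 1 : 0);
}
```

Because the chooser saturates like the direction counters do, a single lapse by the preferred component does not flip the selection immediately.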
"(includes addr-
bpred->dir_hits = 0;
bpred->addr_misses = 0;    /* reset the address-miss counter as well */
---------- /* Have a branch here */
if (correct)
    pred->addr_hits++;
if (!!pred_taken == !!taken)
    pred->dir_hits++;
else
    pred->misses++;
/* misses + dir_hits is the number of branches counted so far */
pred->addr_misses = (pred->misses + pred->dir_hits - pred->addr_hits);
-----------
}
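Given the two counters maintained above, the address-prediction hit rate follows as addr_hits / (addr_hits + addr_misses). A minimal sketch of that arithmetic, assuming a helper named addr_hit_rate that is hypothetical and not part of SimpleScalar:

```c
/* Address-prediction hit rate from the two counters kept by the
 * predictor. Returns 0.0 when no branches have been counted. */
double addr_hit_rate(unsigned long addr_hits, unsigned long addr_misses)
{
    unsigned long total = addr_hits + addr_misses;

    return total ? (double)addr_hits / (double)total : 0.0;
}
```

With the GCC counts reported for the 64 sets/2 way BTB (1005521 address hits and 563339 address misses), this evaluates to about 0.6409.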
                    Benchmarks
Configuration       GCC       ANAGRAM   GO
64 sets/2 way       1005521   2032397   1051818
32 sets/2 way       937745    2020880   1010267
128 sets/2 way      1100970   2034249   1076578
32 sets/4 way       1018386   2037020   1054258
128 sets/1 way      995879    2028135   1031176
... lowest for the BTB with 32 sets and 2 way set associativity. The GO
benchmark has moderate address hits, and they are again lowest for the BTB
with 32 sets and 2 way set associativity. The GCC benchmark has the fewest
address hits of the three benchmarks, and for this benchmark too the address
hits are lowest for the BTB with 32 sets and 2 way set associativity.
Comparison of BTB Performance based on addr_misses
                              Benchmarks
Sr. No.   Configuration       GCC      ANAGRAM   GO
1         64 sets/2 way       563339   76544     345506
2         32 sets/2 way       631115   88061     387057
3         128 sets/2 way      467890   74692     320746
4         32 sets/4 way       550474   71921     343066
5         128 sets/1 way      572981   80806     366148
[Figure: addr_misses per BTB configuration; series: GCC, ANAGRAM, GO]
From the above graph, as expected, address misses are lowest for the
ANAGRAM benchmark, and the GCC benchmark has the most address misses of
the three benchmarks. Decreasing the number of sets from 64 to 32 at the
same associativity increases the address misses, and increasing it from 64
to 128 decreases them, because more sets reduce capacity misses. For the
32 sets/4 way configuration, the address misses decrease even though the
number of sets drops from 64 to 32, because the higher associativity
reduces conflict misses. For the 128 sets/1 way configuration, addr_misses
increases despite the larger number of sets, because direct mapping causes
more conflict misses.
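The conflict-miss effect can be reproduced with a toy model. Note that 64 sets/2 way, 32 sets/4 way and 128 sets/1 way all hold 128 entries, so they differ only in how those entries are organised. The sketch below is an illustration under stated assumptions rather than SimpleScalar's BTB: the struct and function names are hypothetical, instructions are assumed to be 4 bytes, and replacement is assumed to be LRU.

```c
#include <assert.h>
#include <string.h>

#define MAX_SETS 128
#define MAX_WAYS 4

/* Hypothetical tag-only BTB; within a set, way 0 is the most recently
 * used entry and the last way is the replacement victim. */
struct btb {
    unsigned sets, ways;
    unsigned long tag[MAX_SETS][MAX_WAYS];
    int valid[MAX_SETS][MAX_WAYS];
};

void btb_init(struct btb *b, unsigned sets, unsigned ways)
{
    assert(sets >= 1 && sets <= MAX_SETS && ways >= 1 && ways <= MAX_WAYS);
    memset(b, 0, sizeof *b);
    b->sets = sets;
    b->ways = ways;
}

/* Look up a branch address. Returns 1 on a hit. On a miss the branch is
 * installed, evicting the least recently used way of its set. */
int btb_access(struct btb *b, unsigned long pc)
{
    unsigned long line = pc >> 2;       /* assumed 4-byte instructions */
    unsigned set = (unsigned)(line % b->sets);
    unsigned long tag = line / b->sets;
    unsigned w, hit_way = b->ways;
    int hit;

    for (w = 0; w < b->ways; w++) {
        if (b->valid[set][w] && b->tag[set][w] == tag) {
            hit_way = w;
            break;
        }
    }
    hit = (hit_way < b->ways);
    if (!hit)
        hit_way = b->ways - 1;          /* victim is the LRU way */

    /* Shift the more recent entries down and place this branch first. */
    for (w = hit_way; w > 0; w--) {
        b->tag[set][w] = b->tag[set][w - 1];
        b->valid[set][w] = b->valid[set][w - 1];
    }
    b->tag[set][0] = tag;
    b->valid[set][0] = 1;
    return hit;
}

/* Alternate between two branches `rounds` times; return total misses. */
unsigned btb_pingpong_misses(unsigned sets, unsigned ways,
                             unsigned long pc_a, unsigned long pc_b,
                             unsigned rounds)
{
    struct btb b;
    unsigned i, misses = 0;

    btb_init(&b, sets, ways);
    for (i = 0; i < rounds; i++) {
        misses += !btb_access(&b, pc_a);
        misses += !btb_access(&b, pc_b);
    }
    return misses;
}
```

Two branches at the illustrative addresses 0x1000 and 0x1200 map to the same set in all three 128-entry organisations. With one way they evict each other on every access, whereas with two or four ways only the two cold misses remain.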
                    Benchmarks
Configuration       GCC      ANAGRAM   GO
64 sets/2 way       0.9741   0.4578    0.7208
32 sets/2 way       0.9899   0.4601    0.7265
128 sets/2 way      0.9495   0.4572    0.716
32 sets/4 way       0.9737   0.457     0.7206
128 sets/1 way      0.9748   0.4584    0.7226
[Figure: CPI per BTB configuration; series: GCC, ANAGRAM, GO]
From the above graph, CPI remains fairly constant across the BTB
configurations for every benchmark. Among the benchmarks, ANAGRAM has the
lowest CPI and GCC the highest CPI for all BTB configurations. The CPI is
higher for the 32 sets/2 way configuration than for 64 sets/2 way, which has
twice as many sets. For the 32 sets/4 way and 128 sets/1 way configurations,
the combination of associativity and number of sets makes the CPI almost
equal to the 64 sets/2 way CPI. The 128 sets/2 way configuration gives the
lowest CPI for GCC and GO; for ANAGRAM it is within 0.0002 of the lowest
value (0.457, for 32 sets/4 way).
                    Benchmarks
Configuration       GCC      ANAGRAM   GO
64 sets/2 way       0.6409   0.9637    0.7527
32 sets/2 way       0.5977   0.9582    0.723
128 sets/2 way      0.7018   0.9646    0.7705
32 sets/4 way       0.6491   0.9659    0.7545
128 sets/1 way      0.6348   0.9615    0.738
[Figure: branch predictor hit rate per BTB configuration; series: GCC, ANAGRAM, GO]
The above graph clearly shows that the branch predictor hit rate for all
the benchmarks is lower when the number of sets in the BTB decreases at the
same associativity. Looking closely at the variation across configurations,
the BTB with 32 sets and 2 way set associativity has the lowest branch
prediction hit rate for all the benchmarks. Changing from 32 sets with 4 way
set associativity to 128 sets with 1 way set associativity also decreases the
branch prediction hit rate.
CONCLUSION
For an optimal branch predictor, it is recommended to have a higher number of
sets in the BTB, but at the same time the tradeoff between cost and
performance should be taken into consideration.
To obtain high address hit rates and direction hit rates, the simulation
results suggest that the combined configuration of the two level and bimodal
predictors is better.