Comp Arch Proj Report 2

The document analyzes the performance of different branch predictor configurations and branch target buffer (BTB) configurations using three benchmarks: GCC, ANAGRAM, and GO. Simulation results show cycles per instruction (CPI) and hit rates for different branch predictor types (bimodal, two-level, combined) and BTB configurations varying the number of sets and associativity. The combined predictor performed best overall with ANAGRAM showing the highest hit rates and lowest CPI across configurations. GCC generally had the highest CPI and lowest hit rates. Address misses increased for all benchmarks with a smaller 32-set, 2-way BTB configuration.

Uploaded by

Nitu Vlsi

The University of Texas at Dallas


Department of Electrical Engineering

EECE/CS 6304: COMPUTER ARCHITECTURE

PROJECT #2

ANALYSIS OF DIFFERENT TYPES OF BRANCH PREDICTORS

Submitted by,

Chintan Modi (chm130430)


Ujas Patel (unp130030)


INTRODUCTION
In computer architecture, a branch predictor is a digital circuit that
tries to guess which way a branch will go before this is known for sure (i.e.,
before the branch executes). The purpose of the branch predictor is to improve
the flow in the instruction pipeline. Branch predictors play a critical role in
achieving high effective performance in many modern pipelined microprocessor
architectures such as x86.
In this project, we analyze the behavior of different branch predictor
configurations on three well-recognized benchmarks, namely GCC, ANAGRAM
and GO. We used SimpleScalar's sim-outorder, which models the execution
aspects of the Alpha 21264. The simulations report the CPI values (sim_CPI),
which we used to compare the benchmarks.
We used three hardware-based branch prediction strategies:
1) Bimodal Predictor: A simple predictor that uses 2-bit saturating
counters to predict whether a given branch is likely to be taken or not.
2) Two-Level Predictor: A two-level adaptive predictor with an n-bit history
can predict any repetitive sequence with any period, provided all n-bit
subsequences are different. Its advantage is that it can quickly learn to
predict an arbitrary repetitive pattern.
3) Combined Predictor: A hybrid predictor, also called a combined predictor,
implements more than one prediction mechanism. The final prediction is
based either on a meta-predictor that remembers which of the predictors
has made the best predictions in the past, or on a majority vote among
an odd number of different predictors.
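
To make the 2-bit saturating counter of the bimodal predictor concrete, here is a minimal sketch in C. It is not SimpleScalar's implementation: the names bimod_t, bimod_index, bimod_predict and bimod_update are hypothetical, the table size of 256 simply mirrors the -bpred:bimod 256 setting used in this report, and both the index function (4-byte aligned instructions) and the initial counter value are assumptions.

```c
#include <stdint.h>

#define BIMOD_SIZE 256u  /* mirrors -bpred:bimod 256; must be a power of two */

/* Table of 2-bit saturating counters: 0 and 1 predict not taken,
 * 2 and 3 predict taken. */
typedef struct {
    uint8_t counter[BIMOD_SIZE];
} bimod_t;

/* Assumption: every counter starts at 1 (weakly not taken). */
static void bimod_init(bimod_t *p)
{
    for (unsigned i = 0; i < BIMOD_SIZE; i++)
        p->counter[i] = 1;
}

/* Hypothetical index function: drop the two low PC bits (assumes 4-byte
 * aligned instructions) and keep the low bits that fit the table. */
static unsigned bimod_index(uint32_t pc)
{
    return (pc >> 2) & (BIMOD_SIZE - 1u);
}

/* Returns 1 for "predict taken", 0 for "predict not taken". */
static int bimod_predict(const bimod_t *p, uint32_t pc)
{
    return p->counter[bimod_index(pc)] >= 2;
}

/* Move the counter toward the actual outcome, saturating at 0 and 3. */
static void bimod_update(bimod_t *p, uint32_t pc, int taken)
{
    uint8_t *c = &p->counter[bimod_index(pc)];
    if (taken) {
        if (*c < 3)
            (*c)++;
    } else {
        if (*c > 0)
            (*c)--;
    }
}
```

Because the counter has to move through two states before the prediction flips, a single atypical outcome (for example a loop exit) does not immediately change the prediction for a branch that is usually taken.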


Part 1: Performance analysis of different types of branch predictors and
different RAS configurations
The simulation is done for different configurations of the Return Address
Stack (RAS) and different types of branch predictor.

Baseline (default RAS): Bimodal predictor with the default RAS size of 8.
-bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2

2 Level Predictor: Two-level predictor that uses 2-bit counters to define the
predictor state, configured with 1 first-level entry, 256 second-level
entries, a 4-bit history and no XOR.
-bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2

Combining (comb): Combines a two-level and a bimodal predictor, with a
256-entry meta table that chooses between them.
-bpred comb -bpred:comb 256 -bpred:bimod 256 -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2

RAS 4: Change the return address stack (RAS) size to 4.
-bpred bimod -bpred:bimod 256 -bpred:ras 4 -bpred:btb 64 2

RAS 16: Change the return address stack (RAS) size to 16.
-bpred bimod -bpred:bimod 256 -bpred:ras 16 -bpred:btb 64 2
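
Since the RAS size is the parameter varied above, the following C sketch shows one common way a return address stack can be organized: a small circular buffer that is pushed on calls and popped on returns. This is an illustration under stated assumptions, not SimpleScalar's code: the names ras_t, ras_push and ras_pop are hypothetical, and the size of 8 mirrors -bpred:ras 8.

```c
#include <stdint.h>

#define RAS_SIZE 8u  /* mirrors -bpred:ras 8 */

typedef struct {
    uint32_t entry[RAS_SIZE];
    unsigned tos;  /* index of the most recently pushed entry */
} ras_t;

static void ras_init(ras_t *r)
{
    for (unsigned i = 0; i < RAS_SIZE; i++)
        r->entry[i] = 0;
    r->tos = RAS_SIZE - 1u;  /* the first push lands in slot 0 */
}

/* On a call: push the return address. When the stack is full, the
 * oldest entry is silently overwritten. */
static void ras_push(ras_t *r, uint32_t return_addr)
{
    r->tos = (r->tos + 1u) % RAS_SIZE;
    r->entry[r->tos] = return_addr;
}

/* On a return: pop the predicted return target. */
static uint32_t ras_pop(ras_t *r)
{
    uint32_t target = r->entry[r->tos];
    r->tos = (r->tos + RAS_SIZE - 1u) % RAS_SIZE;
    return target;
}
```

In this model, a call chain deeper than RAS_SIZE overwrites the oldest return addresses, so the outermost returns are mispredicted. A larger stack only helps programs whose call depth actually exceeds the smaller size.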

Performance Analysis based on CPI:

Sr. No.  Configuration       Benchmarks
                             GCC      ANAGRAM  GO
1        Baseline            0.9069   0.466    0.8112
2        2 Level Predictor   0.9453   0.4578   0.8447
3        Combining           0.8934   0.4537   0.8052
4        Bimod:RAS 4         0.9115   0.4663   0.8113
5        Bimod:RAS 16        0.9066   0.466    0.8112

Graphical Representation with above CPI
[Chart: CPI (axis 0 to 1) per configuration; series: GCC, ANAGRAM, GO]


The above graph shows the CPI for the different configurations of branch
predictor.
Number of instructions run for GCC = 337326966
Number of instructions run for ANAGRAM = 27022205
Number of instructions run for GO = 692097038
Analysis: Benchmark GCC vs BP Configurations
The GCC benchmark has a higher CPI than the other two benchmarks. Its CPI
is highest (0.9453) for the 2 level predictor and lowest (0.8934) for the
combination of two level and bimodal predictor (Comb). Reducing the return
address stack size from 8 to 4 increases the CPI slightly, from 0.9069 to
0.9115.
Analysis: Benchmark ANAGRAM vs BP Configurations
From the above graph, we can infer that the ANAGRAM benchmark has a
lower CPI than the other two benchmarks. The CPI of the ANAGRAM benchmark
is fairly constant across all the configurations of branch predictor (0.4537
to 0.4663). It is lowest for the combination of two level and bimodal
predictor (Comb).
Analysis: Benchmark GO vs BP Configurations
The above graph shows that the GO benchmark has a lower CPI than the GCC
benchmark. The CPI of the GO benchmark is almost constant across the
configurations of branch predictor, apart from a rise to 0.8447 for the 2
level predictor. It is lowest (0.8052) for the combination of two level and
bimodal predictor (Comb). With respect to RAS size variation, changing the
return address stack size from 4 to 16 leaves the CPI almost unchanged
(0.8113 versus 0.8112).
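
The CPI comparisons above can be turned into relative differences with simple arithmetic. The C sketch below is illustrative only: the helper names are hypothetical, and it assumes that CPI is total cycles divided by the instruction count listed above, so that cycles can be estimated as CPI times instructions.

```c
/* Relative CPI change of a configuration against a reference, in percent.
 * Negative values mean the new configuration is faster. */
static double cpi_change_percent(double cpi_ref, double cpi_new)
{
    return 100.0 * (cpi_new - cpi_ref) / cpi_ref;
}

/* Estimated cycle count, assuming CPI = cycles / instructions. */
static double estimated_cycles(double cpi, double instructions)
{
    return cpi * instructions;
}
```

For GCC, cpi_change_percent(0.9069, 0.8934) is about -1.5 (Comb versus baseline) and cpi_change_percent(0.9069, 0.9453) is about +4.2 (2 level versus baseline). Under the stated assumption, the Comb configuration would save roughly 4.5 million cycles over the 337326966 GCC instructions.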


Performance Analysis based on Address Hit Rates:

Sr. No.  Configuration       Benchmarks
                             GCC      ANAGRAM  GO
1        Baseline            0.7102   0.9555   0.6402
2        2 Level Predictor   0.6627   0.9579   0.5747
3        Comb                0.7206   0.9684   0.6409
4        Bimod:RAS 4         0.7058   0.9552   0.64
5        Bimod:RAS 16        0.7105   0.9555   0.6402

Graphical Representation with above Address Hit Rates
[Chart: address hit rate (axis 0 to 1.2) per configuration; series: GCC, ANAGRAM, GO]

The above graph shows the address hit rates of the different configurations
of branch predictor for the different benchmarks.
For the ANAGRAM benchmark, the 2 level predictor (0.9579) and the combining
predictor (0.9684) give higher address hit rates than the bimodal
configurations (about 0.9555).
For the GO benchmark, the address hit rates are nearly identical (about 0.64)
for all configurations except the 2 level predictor, where the rate drops to
0.5747.
For the GCC benchmark, the address hit rates lie between 0.7058 and 0.7206
for all configurations except the 2 level predictor, where the rate drops to
0.6627.


Performance Analysis based on Direction Hit Rates

Sr. No.  Configuration       Benchmarks
                             GCC      ANAGRAM  GO
1        Baseline            0.8431   0.9608   0.7525
2        2 Level Predictor   0.791    0.9629   0.6915
3        Comb                0.8568   0.9736   0.7572
4        Bimod:RAS 4         0.8431   0.9608   0.7525
5        Bimod:RAS 16        0.8431   0.9608   0.7525

The graph for the Direction Hit Rates with respect to every benchmark
will provide us more information on the effect of branch prediction
configurations on different benchmarks.
Graphical Representation with above Direction Hit Rates
[Chart: direction hit rate (axis 0 to 1.2) per configuration; series: GCC, ANAGRAM, GO]

The direction hit rates stay fairly constant for each benchmark across the
configurations, and changing the RAS size does not affect them at all. The
ANAGRAM benchmark has higher direction hit rates than the other two
benchmarks. The 2 level predictor gives the lowest direction hit rate for
the GCC (0.791) and GO (0.6915) benchmarks. The combining predictor gives
the highest direction hit rate for all benchmarks.


Part 2: Modification of the code to accommodate address misses
We carried out modifications in the following two files in SimpleScalar.
1) bpred.h
2) bpred.c

1) Changes in file bpred.h:

----------
/* branch predictor def */
struct bpred_t {
-----
  } dirpred;
  struct {
-------
  } retstack;
  /* stats */
  counter_t addr_hits;    /* num correct addr-predictions */
  counter_t dir_hits;     /* num correct dir-predictions (incl addr) */
  counter_t addr_misses;  /* num address misses */
  counter_t used_ras;     /* num RAS predictions used */
  counter_t used_bimod;   /* num bimodal predictions used (BPredComb) */
----------
};

2) Changes in file bpred.c:

----------
sprintf(buf, "%s.dir_hits", name);
stat_reg_counter(sdb, buf, "total number of direction-predicted hits "
                 "(includes addr-hits)",
                 &pred->dir_hits, 0, NULL);
sprintf(buf, "%s.addr_misses", name);
stat_reg_counter(sdb, buf, "total number of address misses",
                 &pred->addr_misses, 0, NULL);
----------
if (bpred == NULL)
  return;

bpred->dir_hits = 0;
bpred->addr_misses = 0;
----------
/* Have a branch here */
if (correct)
  pred->addr_hits++;
if (!!pred_taken == !!taken)
  pred->dir_hits++;
else
  pred->misses++;
pred->addr_misses = (pred->misses + pred->dir_hits - pred->addr_hits);
----------
}
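
The counter update above can be exercised in isolation. The following C sketch copies only that update logic into a stand-alone struct so the arithmetic can be checked. The names bpred_stats and record_branch are hypothetical, and counter_t is replaced by a plain long long as a stand-in for SimpleScalar's type.

```c
typedef long long counter_t;  /* stand-in for SimpleScalar's counter_t */

struct bpred_stats {
    counter_t addr_hits;    /* correct target address predictions */
    counter_t dir_hits;     /* correct direction predictions */
    counter_t misses;       /* wrong direction predictions */
    counter_t addr_misses;  /* derived: every branch that is not an addr hit */
};

/* Same update as in the modified bpred.c, for one resolved branch. */
static void record_branch(struct bpred_stats *s,
                          int correct, int pred_taken, int taken)
{
    if (correct)
        s->addr_hits++;
    if (!!pred_taken == !!taken)
        s->dir_hits++;
    else
        s->misses++;
    s->addr_misses = s->misses + s->dir_hits - s->addr_hits;
}
```

Every branch increments exactly one of dir_hits or misses, so their sum is the number of branches recorded. The derived addr_misses is therefore the number of branches minus addr_hits, which means addr_hits + addr_misses always equals the total number of recorded branches.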

Part 3: Comparison of BTB Performance


The simulation is done for the following configurations of the Branch Target
Buffer:
Baseline BTB configuration: 64 sets, 2 way associativity
-bpred bimod -bpred:bimod 256 -bpred:btb 64 2
Showing the effect of the number of sets in the BTB with the following options
-bpred bimod -bpred:bimod 256 -bpred:btb 32 2
-bpred bimod -bpred:bimod 256 -bpred:btb 128 2
Showing the effect of associativity when the total size of the BTB is fixed
at the baseline's 128 entries with the following options
-bpred bimod -bpred:bimod 256 -bpred:btb 32 4
-bpred bimod -bpred:bimod 256 -bpred:btb 128 1
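
The relationship between sets, associativity and total BTB size in the options above can be sketched as follows. This is an illustration, not SimpleScalar's code: btb_cfg_t, btb_entries and btb_set_index are hypothetical names, and the set index function assumes 4-byte aligned instructions and a power-of-two number of sets.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned sets;   /* first argument of -bpred:btb */
    unsigned assoc;  /* second argument of -bpred:btb */
} btb_cfg_t;

/* Total number of branch targets the BTB can hold. */
static unsigned btb_entries(btb_cfg_t c)
{
    return c.sets * c.assoc;
}

/* Hypothetical set selection: drop the two low PC bits (assumes 4-byte
 * aligned instructions) and keep log2(sets) bits. */
static unsigned btb_set_index(btb_cfg_t c, uint32_t branch_pc)
{
    assert(c.sets != 0 && (c.sets & (c.sets - 1u)) == 0);  /* power of two */
    return (branch_pc >> 2) & (c.sets - 1u);
}
```

With these helpers, 64 sets/2 way, 32 sets/4 way and 128 sets/1 way all hold 128 entries, while 32 sets/2 way holds 64 and 128 sets/2 way holds 256. The first group isolates the effect of associativity, and the second changes the capacity.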
Performance Analysis based on addr_hits

Sr. No.  Configuration     Benchmarks
                           GCC       ANAGRAM   GO
1        64 sets/2 way     1005521   2032397   1051818
2        32 sets/2 way     937745    2020880   1010267
3        128 sets/2 way    1100970   2034249   1076578
4        32 sets/4 way     1018386   2037020   1054258
5        128 sets/1 way    995879    2028135   1031176

Graphical Representation with above addr_hits
[Chart: addr_hits (axis 0 to 2500000) per BTB configuration; series: GCC, ANAGRAM, GO]

The above graph shows the address hits for the various configurations of the
Branch Target Buffer (BTB) for the different benchmarks. Among the three
benchmarks, ANAGRAM has the most address hits, and its count is lowest
(2020880) for the BTB with 32 sets and 2 way set associativity. GO has a
moderate number of address hits, and its count is also lowest (1010267) for
the BTB with 32 sets and 2 way set associativity. GCC has the fewest address
hits in every configuration except 128 sets/2 way. For this benchmark, the
address hits are again lowest (937745) for the BTB with 32 sets and 2 way set
associativity.
Comparison of BTB Performance based on addr_misses

Sr. No.  Configuration     Benchmarks
                           GCC       ANAGRAM   GO
1        64 sets/2 way     563339    76544     345506
2        32 sets/2 way     631115    88061     387057
3        128 sets/2 way    467890    74692     320746
4        32 sets/4 way     550474    71921     343066
5        128 sets/1 way    572981    80806     366148

Graphical Representation with above addr_misses
[Chart: addr_misses (axis 0 to 700000) per BTB configuration; series: GCC, ANAGRAM, GO]

From the above graph, as expected, address misses are lowest for the
ANAGRAM benchmark. The GCC benchmark has the most address misses among
the three benchmarks. As we can see from the graph, decreasing the number of
sets from 64 to 32 increases the address misses, and increasing the number
of sets from 64 to 128 decreases them. This is because, at the same
associativity, more sets mean more total entries and therefore fewer
capacity misses. In the 32 sets/4 way configuration, even though the number
of sets is halved from 64 to 32, the address misses decrease because the
higher associativity reduces conflict misses. In the 128 sets/1 way
configuration, direct mapping increases conflict misses, so the address
misses rise even though the number of sets is larger.
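
The conflict-miss argument above can be illustrated with a toy set-associative BTB model in C. This is a sketch under stated assumptions rather than SimpleScalar's BTB: all names are hypothetical, the set index assumes 4-byte aligned instructions, replacement is LRU, and only the presence of a branch PC is tracked (not its target).

```c
#include <stdint.h>
#include <string.h>

#define MAX_SETS 128
#define MAX_WAYS 4

typedef struct {
    unsigned sets, ways;
    uint32_t tag[MAX_SETS][MAX_WAYS];
    int      valid[MAX_SETS][MAX_WAYS];
    unsigned age[MAX_SETS][MAX_WAYS];  /* larger = less recently used */
} btb_t;

static void btb_init(btb_t *b, unsigned sets, unsigned ways)
{
    memset(b, 0, sizeof *b);
    b->sets = sets;
    b->ways = ways;
}

/* Look up a branch PC. On a miss, install it in an invalid way if one
 * exists, otherwise over the least recently used way. Returns 1 on a hit. */
static int btb_access(btb_t *b, uint32_t pc)
{
    unsigned set = (pc >> 2) % b->sets;
    unsigned victim = 0;
    int hit_way = -1;

    for (unsigned w = 0; w < b->ways; w++) {
        b->age[set][w]++;
        if (b->valid[set][w] && b->tag[set][w] == pc)
            hit_way = (int)w;
    }
    if (hit_way >= 0) {
        b->age[set][hit_way] = 0;
        return 1;
    }
    for (unsigned w = 0; w < b->ways; w++) {
        if (!b->valid[set][w]) {
            victim = w;
            break;
        }
        if (b->age[set][w] > b->age[set][victim])
            victim = w;
    }
    b->valid[set][victim] = 1;
    b->tag[set][victim] = pc;
    b->age[set][victim] = 0;
    return 0;
}

/* Count misses over `rounds` passes through an array of branch PCs. */
static unsigned count_misses(unsigned sets, unsigned ways,
                             const uint32_t *pcs, unsigned n, unsigned rounds)
{
    btb_t b;
    unsigned misses = 0;

    btb_init(&b, sets, ways);
    for (unsigned r = 0; r < rounds; r++)
        for (unsigned i = 0; i < n; i++)
            if (!btb_access(&b, pcs[i]))
                misses++;
    return misses;
}
```

As a usage example, take two branches at the made-up addresses 0x1000 and 0x1200, which map to the same set in all three 128-entry configurations. Alternating between them for 10 rounds misses on all 20 accesses in the 128 sets/1 way model, because each branch evicts the other. The 64 sets/2 way and 32 sets/4 way models miss only twice, on the first access to each branch.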


Comparison of BTB Performance based on CPI

Sr. No.  Configuration     Benchmarks
                           GCC      ANAGRAM  GO
1        64 sets/2 way     0.9741   0.4578   0.7208
2        32 sets/2 way     0.9899   0.4601   0.7265
3        128 sets/2 way    0.9495   0.4572   0.716
4        32 sets/4 way     0.9737   0.457    0.7206
5        128 sets/1 way    0.9748   0.4584   0.7226

Graphical Representation with above CPI
[Chart: CPI (axis 0 to 1.2) per BTB configuration; series: GCC, ANAGRAM, GO]

From the above graph, CPI remains fairly constant for every benchmark.
Among the benchmarks, ANAGRAM has the lowest CPI and GCC has the highest
CPI across the various BTB configurations. The CPI is higher for the 32
sets/2 way configuration than for 64 sets/2 way, which has twice as many
sets. For the 32 sets/4 way and 128 sets/1 way configurations, the total
size matches the baseline, and the CPI is almost equal to the 64 sets/2 way
CPI. The 128 sets/2 way configuration gives the lowest CPI for GCC (0.9495)
and GO (0.716); for ANAGRAM it is within 0.0002 of the lowest value, which
is obtained with 32 sets/4 way (0.457).


Comparison of BTB Performance based on Branch Predictor Hit Rates

Sr. No.  Configuration     Benchmarks
                           GCC      ANAGRAM  GO
1        64 sets/2 way     0.6409   0.9637   0.7527
2        32 sets/2 way     0.5977   0.9582   0.723
3        128 sets/2 way    0.7018   0.9646   0.7705
4        32 sets/4 way     0.6491   0.9659   0.7545
5        128 sets/1 way    0.6348   0.9615   0.738

Graphical Representation with above Branch Predictor Hit Rates
[Chart: branch predictor hit rate (axis 0 to 1.2) per BTB configuration; series: GCC, ANAGRAM, GO]

The above graph shows that the branch predictor hit rate for all the
benchmarks falls when the number of sets in the BTB decreases at a fixed
associativity. Looking closely at the variation across configurations, the
BTB with 32 sets and 2 way set associativity has the lowest branch predictor
hit rate for all the benchmarks. If we change from 32 sets with 4 way set
associativity to 128 sets with 1 way set associativity, which keeps the total
size the same, the branch predictor hit rate decreases for every benchmark.
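
The hit rates in this table appear to follow directly from the addr_hits and addr_misses counts reported earlier. The C sketch below assumes the rate is addr_hits divided by (addr_hits + addr_misses); this is an assumption, although it reproduces the tabulated values for the rows checked. The function name addr_hit_rate is hypothetical.

```c
/* Address hit rate, assuming rate = addr_hits / (addr_hits + addr_misses).
 * Returns 0.0 when no branches were recorded. */
static double addr_hit_rate(long long addr_hits, long long addr_misses)
{
    long long updates = addr_hits + addr_misses;
    return updates > 0 ? (double)addr_hits / (double)updates : 0.0;
}
```

For the 64 sets/2 way row, addr_hit_rate(1005521, 563339) is about 0.6409 for GCC, addr_hit_rate(2032397, 76544) is about 0.9637 for ANAGRAM, and addr_hit_rate(1051818, 345506) is about 0.7527 for GO, in line with the table.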

CONCLUSION
For a well-performing BTB, it is recommended to have a larger number of
sets, but the tradeoff between cost and performance should be taken into
consideration.
To achieve high address hit rates and direction hit rates, the simulation
results suggest that the combination of two level and bimodal predictors
(Comb) is the better configuration.
