
RRL Parallel

The document discusses techniques for improving the performance of superscalar microprocessors. It proposes a theoretical model to analyze superscalar processor performance that views it as an interaction between program parallelism and machine parallelism. These parallelisms are broken down into component functions that can be measured or computed. The functions are combined to model the interaction and accurately estimate performance. Calculated performance based on this model is then compared to simulated performance for benchmarks on different processor configurations.

Uploaded by Amelia Lamera

Over the past decade superscalar microprocessors have become a source of tremendous computing power. They form the core of a wide spectrum of high-performance computer
systems ranging from desktop computers to small-scale parallel servers to massively-parallel
systems. To satisfy the ever-growing need for higher levels of computing power, computer
architects need to investigate techniques that continue improving the performance of superscalar
microprocessors while considering both changing technology and applications. Superscalar
microarchitectures, on which superscalar microprocessors are based, deliver high performance
by executing multiple instructions in parallel every cycle. Hardware is used to detect and execute
parallel instructions. This technique of exploiting fine-grain parallelism at the instruction level to
improve performance is commonly referred to as instruction-level parallelism. The maximum
number of instructions processed in parallel, also known as the width of the microarchitecture, is
typically four for the fastest microprocessors available today.
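The interplay between issue width and data dependences can be made concrete with a toy issue model. The sketch below is a deliberately simplified illustration, not a model of any real microarchitecture: it ignores latencies, caches, and branches, and keeps only dependences and a fixed issue width.

```python
# Toy issue model: an instruction issues once all of its producers issued
# in an earlier cycle, and at most `width` instructions issue per cycle.

def simulate_ipc(deps, width=4):
    """deps[i] lists the indices of the instructions that instruction i
    depends on; returns instructions completed per cycle (IPC)."""
    issue_cycle = {}                  # instruction index -> cycle issued
    pending = list(range(len(deps)))
    cycle = 0
    while pending:
        issued = []
        for i in pending:
            if len(issued) == width:
                break
            # A producer not yet issued maps to `cycle`, failing the test,
            # so same-cycle forwarding is not allowed.
            if all(issue_cycle.get(d, cycle) < cycle for d in deps[i]):
                issued.append(i)
        for i in issued:
            issue_cycle[i] = cycle
            pending.remove(i)
        cycle += 1
    return len(deps) / cycle

# Eight independent instructions: the machine sustains its full width.
print(simulate_ipc([[] for _ in range(8)]))          # 4.0
# A serial dependence chain: one instruction per cycle, whatever the width.
print(simulate_ipc([[]] + [[i] for i in range(7)]))  # 1.0
```

Even this crude model shows why width alone does not determine performance: the dependence structure of the program caps how much of the width can be used.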
The current trace-driven simulation approach to determine superscalar processor performance is widely
used but has some shortcomings. Modern benchmarks generate extremely long traces, resulting in
problems with data storage, as well as very long simulation run times. More fundamentally, simulation
generally does not provide significant insight into the factors that determine performance or a
characterization of their interactions. This paper proposes a theoretical model of superscalar processor
performance that addresses these shortcomings. Performance is viewed as an interaction of program
parallelism and machine parallelism. Both program and machine parallelisms are decomposed into
multiple component functions. Methods for measuring or computing these functions are described. The
functions are combined to provide a model of the interaction between program and machine
parallelisms and an accurate estimate of the performance. The computed performance, based on this
model, is compared to simulated performance for six benchmarks from the SPEC 92 suite on several
configurations of the IBM RS/6000 instruction set architecture.
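The model's central idea can be pictured with a small numerical sketch. This is a hypothetical simplification, far coarser than the paper's actual component functions: program parallelism is reduced to a measured probability distribution over how many independent instructions are available each cycle, and machine parallelism is reduced to a single issue width.

```python
# Hypothetical sketch: p[k] = probability that k independent instructions
# are available in a cycle; the machine issues at most `width` of them.
# Expected IPC is then E[min(parallelism, width)].

def expected_ipc(parallelism_dist, width):
    """Expectation of min(parallelism, width) over the distribution."""
    return sum(prob * min(k, width) for k, prob in parallelism_dist.items())

# An assumed (made-up) per-cycle parallelism distribution for a program.
program = {1: 0.2, 2: 0.3, 4: 0.3, 8: 0.2}
for w in (1, 2, 4, 8):
    print(w, round(expected_ipc(program, w), 2))
```

Running the sketch shows diminishing returns as width grows: once the machine is wider than the parallelism the program typically offers, extra width goes unused, which is exactly the kind of program–machine interaction the model aims to capture.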
Superscalar microprocessors have increased in power over the previous ten years. They serve as the foundation for a broad range of high-performance computer systems, including desktop computers, small-scale parallel servers, and massively parallel systems. Computer architects must research methods that keep improving the performance of superscalar microprocessors, while taking into account both evolving technology and applications, in order to meet the continuously increasing need for higher levels of computing power. Superscalar microarchitectures, on which superscalar microprocessors are based, deliver high performance by executing multiple instructions in parallel every cycle. Hardware is used to detect and execute parallel instructions. This approach of exploiting fine-grain parallelism at the instruction level to improve performance is commonly known as instruction-level parallelism. The maximum number of instructions processed in parallel, also called the width of the microarchitecture, is typically four for the fastest microprocessors available today (Palacharla, 1998).

The widely used trace-driven simulation method currently employed to assess the performance of superscalar processors has certain drawbacks. Modern benchmarks produce extremely long traces, which causes issues with data storage and leads to very long simulation run times. More fundamentally, simulation typically offers little insight into the factors that determine performance or into how those factors interact. To address these shortcomings, the work proposes a theoretical model of superscalar processor performance. Performance is modeled as an interaction between program parallelism and machine parallelism, each of which is decomposed into a number of separate component functions. Ways in which these functions can be measured or calculated are described. Combining the functions yields a model of the interaction between program and machine parallelism, along with an accurate estimate of performance. For six benchmarks from the SPEC 92 suite, performance calculated from this model is compared to simulated performance on several configurations of the IBM RS/6000 instruction set architecture (Noonburg & Shen, n.d.).

It is shown in (Hennessy & Patterson, 1996) that the maximum ILP that a microprocessor with infinite
resources and perfect branch prediction could achieve is no higher than a few hundred IPC (instructions
per cycle) and, for some applications, a few tens of IPC. This limitation on the amount of ILP is due
to the serialization caused by data dependences. In this scenario, data value
speculation is a promising mechanism to go beyond the barriers imposed by data dependences, since it
allows instructions to be executed without having to wait for the results generated by those on which
they depend. All that is necessary to implement data value
speculation is a mechanism that allows the processor to predict the values that flow among the
instructions, as well as recover the precise state in the case of a misprediction. Data value speculation is
based on the observation that inputs and outputs of many instructions sometimes follow a predictable
pattern.

The highest instruction-level parallelism (ILP) that a microprocessor with infinite resources and flawless branch prediction could achieve is shown in (Hennessy & Patterson, 1996) to be no more than a few hundred instructions per cycle (IPC), and for some applications only a few tens of IPC. The serialization brought on by data dependences is what limits the amount of ILP. In this situation, data value speculation is a viable way to overcome the constraints imposed by data dependences, since it enables instructions to be executed without having to wait for the results produced by those on which they depend. Implementing data value speculation requires only a mechanism that enables the processor to predict the values that flow between instructions and to recover the precise architectural state in the event of a misprediction. Data value speculation rests on the observation that the inputs and outputs of many instructions often follow a predictable pattern (Gonzalez, J. & Gonzalez, A., 1998).
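One well-known way to exploit such patterns is a stride value predictor. The sketch below is an illustrative assumption for this review, not the specific hardware design studied in the cited work: a table indexed by instruction address holds the last committed value and the last observed stride, and the prediction is simply their sum.

```python
# Illustrative stride value predictor (table layout and interface are
# assumptions for this sketch, not a real processor's design).

class StridePredictor:
    def __init__(self):
        self.table = {}                    # pc -> (last_value, stride)

    def predict(self, pc):
        if pc not in self.table:
            return None                    # no prediction: execute normally
        last, stride = self.table[pc]
        return last + stride               # speculated result

    def update(self, pc, actual):
        # Called at commit with the architecturally correct value; on a
        # misprediction the pipeline squashes dependent instructions and
        # reissues them with `actual`, restoring precise state.
        if pc in self.table:
            last, _ = self.table[pc]
            self.table[pc] = (actual, actual - last)
        else:
            self.table[pc] = (actual, 0)

p = StridePredictor()
for value in (100, 104, 108, 112):         # e.g. a loop induction variable
    p.predict(0x40)                        # speculate before executing
    p.update(0x40, value)                  # train on the committed value
print(p.predict(0x40))                     # 116: the pattern is learned
```

After two committed values establish the stride, consumers of this instruction's result can start executing with the predicted value instead of stalling on the data dependence, which is precisely the serialization barrier discussed above.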

Hennessy, J.L. & Patterson, D.A. (1996). Computer Architecture: A Quantitative Approach (2nd ed.). San Francisco: Morgan Kaufmann.

Palacharla, S. (1998). Complexity-Effective Superscalar Processors (PhD thesis, p. 182). University of Wisconsin-Madison.

Shen, J.P. & Noonburg, D.B. (n.d.). Theoretical Modeling of Superscalar Processor Performance (p. 11). Carnegie Mellon University.

Gonzalez, J. & Gonzalez, A. (1998). Data value speculation in superscalar processors (p. 9). Microprocessors and Microsystems. Universitat Politècnica de Catalunya, Barcelona, Spain.
