MIPS Superscalar Simulator
Computer Architecture
Doo H. Kim, Matthew Zhu
Group # 02, CSCI 5593, 05/03/2016
A simulator was developed that models the behavior of a processor with a superscalar
pipeline capable of fetching and committing two instructions in a single clock cycle.
The software simulator consists of an assembler, memory components, and a superscalar
processor. Hazard detection, data forwarding, pipeline stalling, instruction reordering,
instruction cancellation, and a predict-not-taken branch scheme are simulated.
Benchmarks are assembled and executed using the simulator to verify functional
correctness and to assess timing. The simulator executes the benchmarks correctly,
achieving a cycles per instruction (CPI) ratio between 0.7 and 1.1.
Nomenclature
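CPI = cycles per instruction
DE = decode stage
EX = execute stage
funct = function field
IF = instruction fetch stage
MEM = memory access stage
MIPS = Microprocessor without Interlocked Pipeline Stages
NOP = no operation
opcode = operation code
RAW = read after write
rd = destination register
rs = source register
rt = target register
shamt = shift amount
VLIW = very long instruction word
WAR = write after read
WAW = write after write
WB = write back stage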
I. Introduction
Computer architects have constantly strived to improve the performance of computers since their
advent. Modern computer architectures use methods such as superpipelining, superscalar pipelining, and VLIW
processing to enhance computational performance [1]. Exploring the design space of computer architectures is
difficult because developing hardware prototypes is time-consuming and expensive, and analytical models tend to be
inaccurate. For this reason, promising designs are modeled in software and analyzed by simulation. Well-designed
architectural simulators are flexible, parameterizable, and accurate, and they have short development and
evaluation times. Simulators are used to evaluate various components of a computer architecture, such as branch
predictors, caches, and instruction pipelines.
CSCI 5593
Spring 2016
Simulators can be divided into two main categories: functional simulators and cycle-accurate simulators. The
purpose of functional simulators is to validate the correctness of a design and to produce sequences of
instructions or memory addresses, called traces, for use by a cycle-accurate simulator component. Cycle-accurate
simulators are used to determine the practical instruction throughput of a processor. Simulators can also be
subdivided into user-level simulators and full-system simulators; full-system simulators account for the effects of
the operating system, whereas user-level simulators account only for application and system library code.
Execution-driven simulators consist of both functional and timing components [2]. How the two components
interact varies between simulators; various simulator organizations are given in Fig. 1. To achieve flexibility,
simulators commonly decouple their functional and timing components. In an integrated simulator, where the two are
intertwined, changing one component may require corresponding changes in the other.
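To make the decoupling concrete, the sketch below splits a toy simulator into a functional model that executes instructions and emits a trace, and a timing model that only consumes that trace. It is a minimal illustration under stated assumptions, not the organization used in this project: the three-register add operation, the names `functional_model`, `timing_model`, and `TraceRecord`, and the simple rule that a dependent instruction cannot share an issue cycle with its producer are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class TraceRecord(NamedTuple):
    """Hypothetical trace entry passed from the functional to the timing model."""
    dest: int               # register written
    srcs: Tuple[int, ...]   # registers read


def functional_model(program: Iterable[Tuple[int, int, int]],
                     regs: Dict[int, int]) -> Iterator[TraceRecord]:
    """Execute hypothetical 'add dest, a, b' operations, yielding one record each.

    Only this model knows instruction semantics; it knows nothing about cycles.
    """
    for dest, a, b in program:
        regs[dest] = regs[a] + regs[b]
        yield TraceRecord(dest, (a, b))


def timing_model(trace: Iterable[TraceRecord], width: int = 2) -> int:
    """Count cycles for in-order issue of up to `width` instructions per cycle.

    An instruction may not share a cycle with the producer of one of its
    sources. Only this model knows about timing; it never computes values.
    """
    cycles = 0
    group: List[TraceRecord] = []
    for rec in trace:
        produced = {g.dest for g in group}
        if len(group) == width or any(s in produced for s in rec.srcs):
            cycles += 1  # close the current issue group
            group = []
        group.append(rec)
    if group:
        cycles += 1
    return cycles
```

Because the timing model sees only `TraceRecord` values, its issue width or hazard rule can be changed without touching the functional model, which is the flexibility benefit of a decoupled organization.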
Simulator implementations often include assemblers and instruction parsers. Implementing these components
requires knowledge of the characteristics of the instruction set architecture. For this project, the MIPS instruction
set architecture is used, whose instructions are generally categorized into the three types depicted in Fig. 2.
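As an illustration of what an instruction parser has to recover, the following sketch splits a 32-bit MIPS word into the standard R-, I-, and J-type fields (opcode, rs, rt, rd, shamt, funct, immediate, address). It assumes the three types in Fig. 2 are these standard formats, and it is a simplification: only opcode 0 is treated as R-type and only opcodes 2 and 3 (`j`, `jal`) as J-type, with every other opcode, including coprocessor instructions, decoded as I-type. The function name `decode` is hypothetical.

```python
# Opcodes of the two J-type instructions in the base MIPS ISA (j, jal).
J_TYPE_OPCODES = {0x02, 0x03}


def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its format and fields."""
    opcode = (word >> 26) & 0x3F
    if opcode == 0x00:  # R-type: the operation is selected by funct
        return {
            "type": "R",
            "opcode": opcode,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if opcode in J_TYPE_OPCODES:  # J-type: 26-bit word-address target
        return {"type": "J", "opcode": opcode, "address": word & 0x3FFFFFF}
    # I-type: two registers and a 16-bit immediate, sign-extended here
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return {
        "type": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": imm,
    }
```

For example, `decode(0x012A4020)` yields an R-type record with rs = 9, rt = 10, rd = 8, and funct = 0x20, which is the encoding of `add $t0, $t1, $t2`.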
III. Implementation
There are three main components of the simulator built in this project: an assembler, a memory model, and a
superscalar processor. The class diagram for the assembler is shown in Fig. 3. At the start of the simulation, the path
at the end of every clock cycle as shown in Fig. 8. This list is used in the decode stage to determine whether there
is a RAW hazard between instructions in the stage and instructions that passed through the stage within the previous
two cycles. If there is, a forwarding flag maintained by the simulated instruction is set so that it obtains forwarded
data at the beginning of the execution stage.
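The history-list check can be sketched as follows. This is a minimal illustration assuming that each entry in the list holds the destination registers of the instructions that left the decode stage in one cycle and that only the last two cycles matter; the class and attribute names (`Instr`, `DecodeStage`, `recent_dests`, `forward_from`) are hypothetical and not taken from the simulator's source. Register r0 is excluded because it is hardwired to zero in MIPS and never creates a true dependence. Hazards between the two instructions decoded in the same cycle are not covered here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Instr:
    name: str
    dest: Optional[int]  # destination register, or None if nothing is written
    srcs: List[int]      # source registers read by the instruction
    # Hypothetical stand-in for the forwarding flag: sources needing forwarded data.
    forward_from: List[int] = field(default_factory=list)


class DecodeStage:
    def __init__(self) -> None:
        # One entry per clock cycle: destination registers of instructions that
        # left decode in that cycle. maxlen=2 keeps only the last two cycles.
        self.recent_dests: Deque[List[int]] = deque(maxlen=2)

    def check_raw(self, instr: Instr) -> bool:
        """Set the forwarding flag for sources written within the last two cycles."""
        pending = {r for cycle in self.recent_dests for r in cycle}
        instr.forward_from = [r for r in instr.srcs if r != 0 and r in pending]
        return bool(instr.forward_from)

    def end_cycle(self, passed: List[Instr]) -> None:
        """Record which registers the instructions leaving decode this cycle will write."""
        self.recent_dests.append([i.dest for i in passed if i.dest is not None])
```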
between instructions 10 and 11, so instruction 10 enters the pipeline alone. The second branch condition evaluates to
true, so the branch is mispredicted and the instructions in the two preceding stages must be cancelled into NOPs. If
the instructions between the branch and its target were not skipped, the final result would differ from the value
expected from executing the instructions sequentially. The load word instruction on line 16 must forward its result
from MEM to EX for the add instruction on line 19. The final instruction depends on r9, which in turn depends on the
load instruction having correctly forwarded its value for r7. Upon execution of the benchmark, the simulation
completed with the expected final value of 2 in r1 and a CPI of 0.9. This verifies that the simulator's processor
model forwards and reorders instructions in a functionally correct manner.
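The cancellation step can be sketched as below, assuming, as the text implies, that branches resolve in EX so that the two preceding stages are IF and DE. Under predict-not-taken, fetch has already continued down the fall-through path, so a taken branch cancels every younger instruction: those in DE and IF, plus any instruction paired after the branch in EX. The `Slot` type and the `resolve_branch` function are hypothetical names for illustration, not the simulator's actual interface.

```python
from dataclasses import dataclass
from typing import List

NOP = "nop"


@dataclass
class Slot:
    text: str  # assembly text of the instruction occupying this pipeline slot


def resolve_branch(taken: bool, target: int, fallthrough: int,
                   ex: List[Slot], branch_idx: int,
                   decode: List[Slot], fetch: List[Slot]) -> int:
    """Resolve a branch in EX under predict-not-taken; return the next fetch PC.

    Fetch has already continued down the fall-through path, so a taken branch
    is a misprediction: every younger instruction (later slots in EX, and all
    of DE and IF) is cancelled into a NOP and fetch is redirected to `target`.
    """
    if not taken:
        return fallthrough  # prediction was correct: keep the fetched instructions
    for slot in ex[branch_idx + 1:]:
        slot.text = NOP
    for stage in (decode, fetch):
        for slot in stage:
            slot.text = NOP
    return target
```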
Other benchmarks were also executed to assess the timing of the simulated processor. Across these benchmarks,
the resulting CPI was between 0.7 and 1.1. This is a reasonable result for a superscalar processor that ideally
fetches and commits two instructions per cycle: because of branch mispredictions and data dependences, the ideal
CPI of 0.5 is not always achievable. In general, the simulated superscalar processor exceeds the ideal performance
of a scalar pipelined processor, whose CPI is at best 1.0.
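The arithmetic behind this comparison is short enough to state directly: CPI is total cycles divided by committed instructions, a machine that commits w instructions per cycle has an ideal CPI of 1/w, and the speedup over an ideal scalar pipeline is 1.0 divided by the measured CPI. The sketch below uses hypothetical counts of 90 cycles and 100 instructions, chosen only to reproduce a CPI of 0.9; the function names are illustrative.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: total clock cycles divided by committed instructions."""
    return cycles / instructions


def ideal_cpi(issue_width: int) -> float:
    """Lower bound on CPI when `issue_width` instructions commit every cycle."""
    return 1.0 / issue_width


# Hypothetical counts: 90 cycles to commit 100 instructions.
measured = cpi(90, 100)                         # 0.9
superscalar_bound = ideal_cpi(2)                # 0.5, the two-wide ideal
scalar_bound = ideal_cpi(1)                     # 1.0, the ideal scalar pipeline
speedup_over_scalar = scalar_bound / measured   # about 1.11
```

At the ends of the reported range, a CPI of 0.7 corresponds to roughly 1.43 instructions per cycle, while a CPI of 1.1 is slightly slower than the ideal scalar bound.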
V. Conclusion
Superscalar processor simulators are complex to design and implement. Each pipeline stage has a distinct role,
but must coordinate with the other pipeline stages to correctly simulate behaviors such as data forwarding,
branch prediction, instruction cancellation, instruction reordering, hazard detection, and pipeline stalling. These
behaviors must be accurately simulated for the execution of the benchmarks to produce the expected results.
The simulator developed in this project was functionally validated, and its timing was assessed by executing
various benchmarks. Although the simulated superscalar processor does not achieve the ideal CPI of 0.5, it
generally outperforms an ideal scalar pipelined processor.
References
[1] J. L. Gaudiot et al., "Techniques to Improve Performance Beyond Pipelining: Superpipelining, Superscalar, and VLIW," Advances in Computers, vol. 63, 2005, pp. 1-34.
[2] L. Eeckhout, "Simulation," in Computer Architecture Performance Evaluation Methods, San Rafael: Morgan & Claypool, 2010, ch. 5, pp. 49-62.
[3] M. T. Yourst, "PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator," IEEE Int. Symp. Performance Analysis of Systems & Software (ISPASS), San Jose, CA, 2007, pp. 23-34.
[4] P. Wang et al., "Simple-VLIW: A Fundamental VLIW Architectural Simulation Platform," IEEE Asia Simulation Conference, System Simulation and Scientific Computing (ICSC), Beijing, 2008, pp. 1258-1266.
[5] J. J. Yi et al., "Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations," IEEE Transactions on Computers, vol. 55, 2006, pp. 268-280.
[6] D. A. Penry, "A Single-Specification Principle for Functional-to-Timing Simulator Interface Design," IEEE Int. Symp. Performance Analysis of Systems & Software (ISPASS), Austin, TX, 2011, pp. 186-196.
[7] H. Zeng et al., "MPTLsim: A Cycle-Accurate, Full-System Simulator for x86-64 Multicore Architectures with Coherent Caches," ACM SIGARCH Computer Architecture News, vol. 37, 2009, pp. 2-9.
[8] A. Patel et al., "MARSSx86: A Full System Simulator for x86 CPUs," Dept. of Computer Science, State University of New York at Binghamton, 2011.
[9] M. Bečvář and S. Kahánek, "VLIW-DLX Simulator for Educational Purposes," WCAE '07: Proceedings of the 2007 Workshop on Computer Architecture Education, 2007, pp. 8-13.
[10] J. M. Colmenar et al., "An Overview of Computer Architecture and System Simulation," SCS M&S Magazine, 2011.
[11] "MIPS Reference Sheet," www-inst.eecs.berkeley.edu, 2016. [Online]. Available: http://www-inst.eecs.berkeley.edu/~cs61c/resources/MIPS_help.html.
[12] "MIPS Assembly - Wikibooks, open books for an open world," En.wikibooks.org, 2016. [Online]. Available: https://en.wikibooks.org/wiki/MIPS_Assembly.
[13] "CS161: MIPS Instruction Reference," Alumni.cs.ucr.edu, 2016. [Online]. Available: http://alumni.cs.ucr.edu/~vladimir/cs161/mips.html.
[14] "NOP," Wikipedia, 2016. [Online]. Available: https://en.wikipedia.org/wiki/NOP.