
Fast Simulation

This document discusses fast simulation techniques for field-programmable gate arrays (FPGAs). It describes how FPGAs are growing larger and more complex, making simulation an important but challenging part of the design process. The document outlines different simulation techniques used for FPGAs, including static timing analysis, functional simulation at the register transfer level and gate levels, and discusses the need for faster simulation methods to keep up with increasing design sizes and complexity. It also notes the challenges in developing simulators that can run significantly faster than real time while maintaining accuracy.

Uploaded by

sunil3679
Copyright
© Attribution Non-Commercial (BY-NC)

Fast Simulation Techniques in FPGA

Uma Rajchandani#1, Ashwani Kumar*2

# M.Tech VLSI, S.G.V.U. Jaipur, India
* M.Tech VLSI, S.G.V.U. Jaipur, India
1 [email protected]
2 [email protected]

ABSTRACT

Advances in technology have transformed large, complex circuit boards into small, simple integrated circuits (ICs). ICs have surpassed circuit boards in every respect: small size, low cost, higher speed, and reliability. This paper describes a fast simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycle-accurate, and (iii) include a functional model that simulates the functionality of the computer system using an FPGA. Also considered is a method for automatic multi-partitioning of a multiple-output logic function into the smallest number of sub-functions for mapping to fixed-size [...]. Power consumption and delays play an important role in extending the architecture to complex designs; implementing larger designs leads to the same difficulties as those of discrete components. We describe a prototype FAST system: a full-system, RTL-level, cycle-accurate-capable computer system simulator.

KEYWORDS: Field-programmable Gate Array (FPGA), Programmable logic array (PLA), Register transfer level (RTL).

1. INTRODUCTION

One of the biggest challenges that FPGA design and simulation engineers face today is time and resource constraints. With FPGAs growing in speed, density, and complexity, completing a full timing simulation taxes not only manpower but also computer processors and available memory. Furthermore, design and verification engineers face an escalating challenge to test today's FPGA designs properly in shorter timeframes and with increased confidence of first-pass success. Simulation is the primary tool used for verifying the logical correctness of a hardware design. In many cases simulation is the first activity performed in the process of taking a hardware design from concept to realization.

With the increasing design size of chips, simulation is not sufficiently fast to accommodate performance evaluation with realistic benchmarks. This poses the following requirements on the design of the performance model: (1) the simulation speed of the original HDL model needs to be high enough to allow fast functional simulation, and (2) the speed of the design synthesized from this model should be acceptably high to support the simulation of large benchmarks.

This paper examines the specific problems of speeding up partial dynamic reconfiguration of a fine-grain FPGA. The time taken to perform reconfiguration depends on a number of factors: the number of resources to be configured, the off-chip configuration bandwidth, the granularity of the configuration memory, and the configuration memory organization. The importance of the first three factors to configuration time is obvious. The organization of the configuration memory is important because it can adversely affect the expected linear relationship between the number of resources being configured and the amount of data that must be loaded into the device. If configuration bits controlling unrelated resources are contained in the same memory location, then there is a high likelihood that a small change to one area of the fabric will require a disproportionately large number of memory locations to be written in order to bring about the change.

2. SIMULATION TECHNIQUES

FPGAs need simulation techniques in order to ensure that designs work and continue to work (Fig 1). FPGA designs are growing in complexity, and the traditional verification methodologies are no longer sufficient. In the past, simulation was not an important stage in the FPGA design flow. Currently, however, it is becoming one of the most critical. Simulation is especially important when designing with the more advanced FPGAs. The simulation techniques for FPGAs are:

2.1 STATIC TIMING ANALYSIS / FORMAL VERIFICATION:

Static Timing Analysis techniques were originally employed for gate-level event-driven simulations to verify both functionality and timing of internal logic. Because of the growth in available gate count, these analysis techniques have become valuable for functional simulation of FPGA internal logic as well.

Most engineers see this as the only analysis needed to verify that the design meets timing. There are many drawbacks to using it as the only timing analysis methodology. Static analysis cannot find any of the problems that appear only when a design runs dynamically. It can only show whether the design as a whole can meet setup and hold requirements, and it is generally only as good as the timing constraints applied. In a real system, dynamic factors can cause timing violations on the FPGA.

2.2 FUNCTIONAL SIMULATION:

Functional simulation is a very important part of the verification process, but it should not be the only part. A functional simulation tests only the functional capabilities of the RTL design. It does not include any timing information, nor does it take into consideration changes made to the original design during implementation and optimization.

2.2.1 REGISTER TRANSFER LEVEL (RTL) SIMULATION:

In integrated circuit design, register transfer level (RTL) (Fig 2) is a level of abstraction used in describing the operation of a synchronous digital circuit. In RTL design, a circuit's behavior is defined in terms of the flow of signals (or transfer of data) between hardware registers and the logical operations performed on those signals. Register transfer level abstraction is used in hardware description languages (HDLs) like Verilog and VHDL to create high-level representations of a circuit, from which lower-level representations and ultimately actual wiring can be derived.

Fig 2. RTL schematic

2.2.2 GATE LEVEL SIMULATION:

Gate level simulation is used late in the design cycle to increase the level of confidence in a design implementation, and it can help to verify dynamic circuit behavior that cannot be accurately verified with static methods.
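
As a rough illustration of the register-transfer view described in 2.2.1, the sketch below models a tiny synchronous circuit in Python: combinational logic computes the next register values from the current ones, and all registers are then updated together on each clock cycle. The circuit (a 4-bit counter with a one-cycle delayed copy) and the names next_state and simulate are assumptions made for this example only; they do not come from any design or tool discussed in this paper, and a real HDL simulator does considerably more.

```python
# Minimal cycle-based sketch of register-transfer semantics.
# The circuit is hypothetical: a 4-bit counter ("count") plus a register
# ("prev") that holds the counter value from the previous cycle.

def next_state(state, inputs):
    """Combinational logic: derive next register values from current ones."""
    count = state["count"]
    if inputs["reset"]:
        nxt = 0
    elif inputs["enable"]:
        nxt = (count + 1) & 0xF  # wrap around at 4 bits
    else:
        nxt = count
    # "prev" samples the OLD count, as a real flip-flop would on a clock edge.
    return {"count": nxt, "prev": count}

def simulate(stimulus):
    """Apply one set of inputs per clock cycle and record the registers."""
    state = {"count": 0, "prev": 0}
    trace = []
    for inputs in stimulus:
        # All registers are replaced at once, modelling a single clock edge.
        state = next_state(state, inputs)
        trace.append((state["count"], state["prev"]))
    return trace

# One reset cycle followed by 17 enabled cycles.
stimulus = [{"reset": 1, "enable": 0}] + [{"reset": 0, "enable": 1}] * 17
trace = simulate(stimulus)
```

Because next_state reads only the old register values, prev always lags count by one cycle, which is the behavior a pair of clocked registers would show. No timing information appears anywhere in the model, which is why a purely functional simulation of this kind cannot reveal setup or hold problems.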

3. CHALLENGES AHEAD
We now come to the crux of the problem: simulators simply do not execute fast enough. It is easy to see that a simulator which simulates your design at 1 cycle per second will take a very long time to run a test of a million cycles. Using special-purpose hardware, like hardware emulators, is very expensive and not very flexible: making changes and re-running takes a long time.
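
To put a number on "a very long time", the short calculation below converts a cycle count and a simulation rate into wall-clock time. The 1 cycle per second rate and the million-cycle test come from the paragraph above; the faster rate is a hypothetical value chosen only for comparison.

```python
SECONDS_PER_DAY = 86_400

def sim_time_seconds(cycles, cycles_per_second):
    """Wall-clock seconds needed to simulate `cycles` at the given rate."""
    return cycles / cycles_per_second

# Figures from the text: one million cycles at 1 cycle per second.
slow = sim_time_seconds(1_000_000, 1)
print(f"1 cycle/s:    {slow / SECONDS_PER_DAY:.1f} days")

# Hypothetical faster simulator, for comparison only.
fast = sim_time_seconds(1_000_000, 1_000)
print(f"1000 cycles/s: {fast / 60:.1f} minutes")
```

At 1 cycle per second the million-cycle test takes about 11.6 days, so even a single re-run after a small design change costs more than a week.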
Simulators are wonderfully consistent: they produce the same results every time, and the stimulus is 100% repeatable. Not so in the real world, where things can drift based on temperature or other physical factors, or on unpredictable delays inserted into some part of the process. So this kind of verification is unique to the rapid prototyping world.
One way to gain visibility into a design running on the device would be to route a few interesting signals to the external pins of the device, but there are two problems here. The first is that many designs are pin-constrained, and the number of interesting signals could be quite large. The second problem is that if the board has not been designed carefully, deciding which pins can perform this debug function some of the time and still be accessible is not a trivial task. Of course, if they are used for this purpose 100% of the time, then that can be taken care of more simply. So instead we could think about on-chip instrumentation (Fig 3).

Fig 1. FPGA simulation process

On-chip instrumentation is special-purpose logic built into the device that can capture internal activity for replay at a later time. So, just like a logic analyzer, we need some triggers to decide when to start capturing the data, some memory to hold the data, and a mechanism to access that data.

Fig 3. Internal logic diagram of FPGA

Almost all systems these days contain a JTAG port, and it is fairly simple to connect this debug logic into the JTAG system. Through this port, the debug instrumentation can be configured, controlled, and used to stream the data out. In order to gain access to the internal data, the system is stopped and the scan chains are fed out of the system along with the captured data from the debug system. So this is very static in nature: set up the test, capture the data, stop the test, then look at and analyze the data.

The next issue to look at is memory. While FPGAs have quite a lot of memory, there are many designs that use large percentages of it. So with limited memory the amount of debug data that can be captured is also limited, and there has to be a tradeoff between the number of signals captured and the depth of the trace. An external logic analyzer may be able to capture millions of vectors, but this will not be possible on chip. Using external memory will slow the process down such that it may not be possible to capture data in real time.

On-chip instrumentation tools let you select which signals are of interest and configure the triggering and sampling logic, and they may also contain compression logic to help maximize the use of the available memory. While this takes extra logic, it does not need to be there for every system that is shipped, so a lot of companies will put a larger than necessary FPGA on a few debug systems and make the production versions with smaller FPGAs.

4. SPEED STRATEGIES

To improve your system performance, the following methods can be applied globally to the entire design.

Type of simulation                          VHDL design (runtime / memory)   Verilog design (runtime / memory)
Full RTL simulation                         6.4 minutes / 28.8 MB            18.1 minutes / 26 MB
Full timing simulation                      176.9 minutes / 742 MB           186.2 minutes / 775 MB
Timing simulation of subsection             7.7 minutes / 35.8 MB            28.0 minutes / 112 MB
Full simulation, timing only on subsection  13.8 minutes / 56 MB             48.9 minutes / 134 MB

Table 1: Runtimes and memory usage for different styles of simulation of FPGA designs

4.1. QUICK METHODS

A) OPTIMIZATION EFFORT:
By default, XST synthesizes designs using Normal optimization effort. However, setting the effort to High may improve speed by up to five percent, at the cost of increased runtime. To apply the OPT_EFFORT constraint, set the Optimization Effort Synthesis Option.

B) REGISTER BALANCING:
Use register balancing to improve speed at the cost of increased area. If your design no longer fits after using register balancing, make a precise timing analysis of your design and apply register balancing only on the most critical clocks or regions of your design. To use the REGISTER_BALANCING constraint, you must set the Register Balancing option.

C) CONVERT TRISTATES TO LOGIC:
If you target an architecture that supports internal tristates and your design has tristate inferences, convert the tristates to logic. The replacement of internal tristates by logic usually leads to an increase in speed and area. However, this replacement can lead to an area reduction, because the logic generated from tristates can be combined and optimized with surrounding logic.

D) RESOURCE SHARING:
In most cases, resource sharing improves area and speed results. However, for some designs, disabling resource sharing can improve speed by up to 10 percent. To disable the RESOURCE_SHARING synthesis constraint, disable the Resource Sharing HDL Option.

4.2. IN-DEPTH METHODS

A) CHECK HDL ADVISOR MESSAGES:
These messages help you improve your design. For example, if you place a KEEP constraint on a net and this constraint prevents XST from improving design speed, the XST HDL Advisor points out this limitation. Monitor these HDL Advisor messages in the Console tab of the Project Navigator Transcript window, or double-click View Synthesis Report. For more information on the Console tab, see Using the Console, Errors, and Warnings Tabs.

B) CHECK THE USE OF FPGA-SPECIFIC RESOURCES:
Check the use of resources, such as block versus distributed RAM and LUT-based versus hardware multipliers. For example, if your critical path goes through a multiplier and the multiplier is implemented using a MULT18X18 primitive, you can increase the speed by changing the implementation to a LUT structure and pipelining it.

C) ADJUST SLICE UTILIZATION RATIO:
The SLICE_UTILIZATION_RATIO constraint, which is set to 100 percent for the entire design by default, controls the amount of logic and register replication that takes place during timing optimization. For example, if you specify a ratio of 50 percent for one of the blocks in your design, but XST detects that the actual ratio is 48 percent, XST performs timing optimization until timing constraints are met or until the 50 percent limit is reached. If the timing is not met but the ratio limit is reached, increase the ratio limit to see if it is possible to meet the timing constraints. If the timing constraints are met after increasing the ratio, find a way to reduce the area for less critical blocks to allow greater area for the critical block. To apply this constraint, use the Slice Utilization Ratio Synthesis Option.

D) ADJUST MAX FAN-OUT:
The value set for the MAX_FANOUT synthesis constraint controls logic replication. If the critical path goes through a net with a high fan-out, XST replicates the logic or inserts a buffer. In general, this improves the speed of the design, but the logic replication for this net may be excessive or insufficient. You can apply a different maximum fan-out value to a particular net to force XST to further replicate or reduce the net to improve performance. To apply this constraint, set the MAX_FANOUT constraint on a specific signal in the HDL code.

E) REGISTER PARTITION BOUNDARIES WHEN USING INCREMENTAL SYNTHESIS:
When you use incremental synthesis to divide your design into several partitions, XST cannot perform efficient optimization across partition boundaries, which leads to less-than-optimal results. When using incremental synthesis, register the boundaries for each of the partitions. This minimizes the impact on optimization. For example, if the critical path goes through two partitions, XST must preserve hierarchy and cannot optimize across the partitions. In this case, use registers to separate these blocks or change your design partitions.

F) REDUCE AREA:
If your target device is nearing capacity, the placer and router may have problems finding efficient routing and meeting timing objectives. A route delay that is significantly higher than the logic delay in the Timing Reports indicates such a problem. Reducing area may free more routing and logic placement resources to help meet speed requirements.

5. CONCLUSION

As designs get more complex, simulation speed will become the overriding consideration for selecting an HDL and simulator. In this paper we have described fast simulation methodologies for advanced simulation with technology that is currently available. This is by no means a revolutionary methodology, but it is one that most designers either are not fully aware of or do not fully understand. These are techniques that have been used in the past for different types of simulation and verification, but may not have been used to their full potential. Using simulation can have an immense effect on how much time and effort it takes to completely verify a design. Hopefully, with the aid of this paper, it is possible to accomplish faster and more efficient simulation.

6. ACKNOWLEDGEMENT

I am heartily thankful to my coordinator Asso. Prof. Sujeet Gupta (S.G.V.U. Jaipur, India), whose encouragement, guidance and support from the initial to the final level enabled us to develop an understanding of the subject.
