Fast Simulation
In integrated circuit design, register transfer level (RTL) (fig 2) is a level of abstraction used in describing the operation of a synchronous digital circuit. In RTL design, a circuit's behavior is defined in terms of the flow of signals (or transfer of data) between hardware registers and the logical operations performed on those signals. Register transfer level abstraction is used in hardware description languages (HDLs) like Verilog and VHDL to create high-level representations of a circuit, from which lower-level representations and ultimately actual wiring can be derived.

2.1 STATIC TIMING ANALYSIS / FORMAL VERIFICATION:

Static Timing Analysis techniques were originally employed for gate-level event-driven simulations to verify both functionality and timing of internal logic. Because of the growth in gate count availability, these analysis techniques have become valuable for functional simulation of FPGA internal logic as well.
Most engineers see this as the only analysis needed to verify that the design meets timing.
There are a lot of drawbacks to using this as the
only timing analysis methodology. Static analysis
cannot find any of the problems that can be seen
when running a design dynamically. This analysis
will only be able to show if the design as a whole
can meet setup and hold requirements and
generally is only as good as the timing constraints
applied. In a real system, dynamic factors can
cause timing violations on the FPGA.
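
The setup and hold checks mentioned above can be illustrated with a minimal sketch. It assumes an ideal clock (no skew or jitter) and a single register-to-register path; the class name, function names, and all delay values are hypothetical and chosen only for illustration. It shows why the result is only as good as the constraint applied: the same path passes with a 10 ns period constraint and fails with an 8 ns one.

```python
from dataclasses import dataclass


@dataclass
class RegisterPath:
    """One register-to-register path; all delays in ns (hypothetical values)."""
    clk_to_q: float   # clock-to-output delay of the launching register
    max_delay: float  # slowest logic + routing delay between the registers
    min_delay: float  # fastest logic + routing delay between the registers
    setup: float      # setup time of the capturing register
    hold: float       # hold time of the capturing register


def setup_slack(path: RegisterPath, clock_period: float) -> float:
    """Time left in the period after the slowest data arrival plus setup."""
    return clock_period - (path.clk_to_q + path.max_delay + path.setup)


def hold_slack(path: RegisterPath) -> float:
    """Margin by which the fastest data arrival exceeds the hold time."""
    return (path.clk_to_q + path.min_delay) - path.hold


if __name__ == "__main__":
    path = RegisterPath(clk_to_q=0.5, max_delay=7.2, min_delay=0.3,
                        setup=0.4, hold=0.2)
    for period in (10.0, 8.0):  # two candidate clock constraints, in ns
        slack = setup_slack(path, period)
        status = "met" if slack >= 0 else "VIOLATED"
        print(f"period {period:4.1f} ns: setup slack {slack:+.2f} ns ({status})")
    print(f"hold slack {hold_slack(path):+.2f} ns")
```

A negative slack marks a violation. Nothing in this calculation accounts for the dynamic factors of a running system, which is the limitation described above.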
3. CHALLENGES AHEAD
We now come to the crux of the problem:
simulators simply do not execute fast enough. It is
easy to see that a simulator which simulates your
design at 1 cycle per second will take a very long
time to run a test of a million cycles. Using
special-purpose hardware, like hardware emulators, is very
expensive and not very flexible: making changes
and re-running takes a long time.
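
To put a number on the example above, the following sketch converts a cycle count and a simulator speed into wall-clock days. The function name and the faster simulator speeds in the loop are assumptions made for illustration; only the 1 cycle per second and one million cycle figures come from the text.

```python
def simulation_wall_clock_days(cycles: int, cycles_per_second: float) -> float:
    """Return the wall-clock days needed to simulate `cycles` clock cycles."""
    if cycles_per_second <= 0:
        raise ValueError("cycles_per_second must be positive")
    seconds = cycles / cycles_per_second
    return seconds / 86_400  # 86,400 seconds in one day


if __name__ == "__main__":
    test_length = 1_000_000  # cycles, as in the example in the text
    for rate in (1.0, 100.0, 10_000.0):  # hypothetical simulator speeds
        days = simulation_wall_clock_days(test_length, rate)
        print(f"{rate:>8.0f} cycles/s -> {days:.3f} days")
```

At 1 cycle per second, a million-cycle test needs roughly eleven and a half days of continuous simulation.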
Simulators are wonderfully consistent:
they produce the same results every time and the
stimulus is 100% repeatable. Not so in the real
world where things can drift based on temperature
or other physical factors or unpredictable delays
inserted into some part of the process. So this kind
of verification is unique to the rapid prototyping
world.
One way to observe internal signals would be to route a few
interesting signals to the external pins of the
device, but there are two problems here. The first is
that many designs are pin-constrained, and the
number of interesting signals could be quite large.
The second problem is that, if the board has not been designed carefully, deciding which pins can perform this debug function some of the time and still be accessible is not a trivial task. Of course, if the pins are used for this purpose 100% of the time, that can be taken care of more simply. So instead we could think about on-chip instrumentation (fig 3).

Fig 1. FPGA simulation process

On-chip instrumentation is special-purpose logic built into the device that can capture internal activity for replay at a later time. So, just like a logic analyzer, we need some triggers to decide when to start capturing the data, some memory to hold the data, and a mechanism to access that data.

Fig 2. Internal logic diagram of FPGA

Almost all systems these days contain a JTAG port, and it is fairly simple to connect this debug logic into the JTAG system. Through this port, the debug instrumentation can be configured, controlled, and used to stream the data out. In order to gain access to the internal data, the system is stopped and the scan chains are fed out of the system along with the captured data from the debug system. So this is very static in nature: set up the test, capture the data, stop the test, then look at and analyze the data.

The next issue to look at is memory. While FPGAs have quite a lot of memory, there are many designs that use large percentages of it. So with limited memory the amount of debug data that can be captured is also limited, and this has to be a tradeoff between the number of signals captured and the depth of the trace. An external logic analyzer may be able to capture millions of vectors, but this will not be possible on chip. Using external memory will slow the process down such that it may not be possible to capture data in real time.

On-chip instrumentation tools let you select the signals of interest and configure the triggering and sampling logic, and may also contain compression logic to help maximize the use of the available memory. While this takes extra logic, it is not something that needs to be there for every system that is shipped, so a lot of companies will put a larger than necessary FPGA on a few debug systems and make the production versions with smaller FPGAs inserted.

4. SPEED STRATEGIES

To improve system performance, the following methods can be applied globally to the entire design.

Type of simulation                           VHDL Design Simulation     Verilog Design Simulation
                                             Runtime / Memory           Runtime / Memory
Full RTL Simulation                          6.4 minutes / 28.8 MB      18.1 minutes / 26 MB
Full Timing Simulation                       176.9 minutes / 742 MB     186.2 minutes / 775 MB
Timing simulation of subsection              7.7 minutes / 35.8 MB      28.0 minutes / 112 MB
Full simulation, timing only on subsection   13.8 minutes / 56 MB       48.9 minutes / 134 MB

Table 1: Runtimes and memory usage for different styles of simulation of FPGA designs

4.1. QUICK METHODS

A) OPTIMIZATION EFFORT:
By default, XST synthesizes designs using Normal optimization effort. However, setting the effort to High may improve speed by up to five percent, at the cost of increased runtime. To apply the OPT_EFFORT constraint, set the Optimization Effort Synthesis Option.

B) REGISTER BALANCING:
Use register balancing to improve speed at the cost of increased area. If your design no longer fits after using register balancing, make a precise timing analysis of your design and apply register balancing only on the most critical clocks or regions of your design. To use the REGISTER_BALANCING constraint, you must set the Register Balancing option.

C) CONVERT TRISTATES TO LOGIC:
If you target an architecture that supports internal tristates and your design has tristate inferences, convert the tristates to logic. The replacement of internal tristates by logic usually leads to an increase in speed and area. However, this
replacement can lead to an area reduction, because the logic generated from tristates can be combined and optimized with surrounding logic.

D) RESOURCE SHARING:
In most cases, resource sharing improves area and speed results. However, for some designs, disabling resource sharing can improve speed by up to 10 percent. To disable the RESOURCE_SHARING synthesis constraint, disable the Resource Sharing HDL Option.

4.2. IN-DEPTH METHODS

A) CHECK HDL ADVISOR MESSAGES:
These messages help you improve your design. For example, if you place a KEEP constraint on a net and this constraint prevents XST from improving design speed, the XST HDL Advisor points out this limitation. Monitor these HDL Advisor messages in the Console tab of the Project Navigator Transcript window, or double-click View Synthesis Report. For more information on the Console tab, see Using the Console, Errors, and Warnings Tabs.

B) CHECK THE USE OF FPGA-SPECIFIC RESOURCES:
Check the use of resources, such as block versus distributed RAM and LUT-based versus hardware multipliers. For example, if your critical path goes through a multiplier and the multiplier is implemented using a MULT18X18 primitive, you can increase the speed by changing the implementation to a LUT structure and pipelining it.

C) ADJUST SLICE UTILIZATION RATIO:
The SLICE_UTILIZATION_RATIO constraint, which is set to 100 percent for the entire design by default, controls the amount of logic and register replication that takes place during timing optimization. For example, if you specify a ratio of 50 percent for one of the blocks in your design, but XST detects that the actual ratio is 48 percent, XST performs timing optimization until timing constraints are met or until the 50 percent limit is reached. If the timing is not met but the ratio limit is reached, increase the ratio limit to see if it is possible to meet the timing constraints. If the timing constraints are met after increasing the ratio, find a way to reduce the area for less critical blocks to allow greater area for the critical block. To apply this constraint, use the Slice Utilization Ratio Synthesis Option.

D) ADJUST MAX FAN-OUT:
The value set for the MAX_FANOUT synthesis constraint controls logic replication. If the critical path goes through a net with a high fan-out, XST replicates the logic or inserts a buffer. In general, this improves the speed of the design, but the logic replication for this net may be excessive or insufficient. You can apply a different maximum fan-out value to a particular net to force XST to further replicate or reduce the net to improve performance. To apply this constraint, set the MAX_FANOUT constraint on a specific signal in the HDL code.

E) REGISTER PARTITION BOUNDARIES WHEN USING INCREMENTAL SYNTHESIS:
When you use incremental synthesis to divide your design into several partitions, XST cannot perform efficient optimization across partition boundaries, which leads to less-than-optimal results. When using incremental synthesis, register the boundaries for each of the partitions. This minimizes the impact on optimization. For example, if the critical path goes through two partitions, XST must preserve hierarchy and cannot optimize across the partitions. In this case, use registers to separate these blocks, or change your design partitions.

F) REDUCE AREA:
If your target device is nearing capacity, the placer and router may have problems finding efficient routing, and may have problems meeting timing objectives. When the route delay is significantly higher than the logic delay in the Timing Reports, this indicates such a problem. Reducing area may free more routing and logic placement resources to help meet speed requirements.

5. CONCLUSION

As designs get more complex, simulation speed will become the overriding consideration for selecting an HDL and simulator. In this paper we have described fast simulation methodologies for advanced simulation with technology that is currently available. This is by no means a revolutionary methodology, but one that most designers either are not fully aware of or do not fully understand. These are techniques that have been used in the past for different types of simulation and verification, but may not have been used to their full potential. Using simulation can have an immense effect on how much time and effort it takes to completely verify a design. Hopefully, with the aid of this paper, it is possible to accomplish faster and more efficient simulation.

6. ACKNOWLEDGEMENT

I am heartily thankful to my coordinator Asso. Prof. Sujeet Gupta (S.G.V.U. Jaipur, India), whose encouragement, guidance and support from the initial to the final level enabled us to develop an understanding of the subject.
7. REFERENCES