Noxim Tutorial: A SystemC Cycle-Accurate Simulator for On-Chip Networks
• Reference Architecture
[Plot: delay (ps, 0 to 68) versus technology generation (180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 28 nm). Series: FO4; 1 mm global wire; 1 mm global wire, repeated]
Ron Ho et al., “The future of wires”, Proceedings of the IEEE, Volume 89, Issue 4
■ Network-on-Chip
➔ Mesh of Routers (in red)
➔ Each Processing Element
connected to a Router
➔ Scalability and modularity
➔ Low energy consumption
➔ Increase of design complexity
PE + Router = Tile Node
• Default scenario: Mesh of
Tile nodes
• Solution: wormhole
Wormhole Example (1/6 to 6/6)
[Figure sequence: a packet made of a head flit (H), body flits (B) and a tail flit (T) advances router by router from the source IP to the destination IP. The head flit moves first and the remaining flits follow it along the same path.]
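The flit-by-flit progression in the wormhole example can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions (a single packet, no contention, one single-flit buffer per router); the function name `wormhole_trace` is hypothetical and is not part of Noxim.

```python
def wormhole_trace(flits, hops):
    """Return per-cycle snapshots of a path with one single-flit buffer per router."""
    path = [None] * hops      # path[i] is the flit currently held by router i
    waiting = list(flits)     # flits still inside the source PE
    delivered = []            # flits ejected at the destination PE
    snapshots = []
    while len(delivered) < len(flits):
        # The last router ejects its flit to the destination PE.
        if path[-1] is not None:
            delivered.append(path[-1])
            path[-1] = None
        # Advance flits one hop, starting near the head so slots free up in order.
        for i in range(hops - 1, 0, -1):
            if path[i] is None and path[i - 1] is not None:
                path[i], path[i - 1] = path[i - 1], None
        # Inject the next flit if the first router is free.
        if waiting and path[0] is None:
            path[0] = waiting.pop(0)
        snapshots.append("".join(f or "." for f in path))
    return snapshots, delivered


snapshots, delivered = wormhole_trace("HBBT", hops=3)
for cycle, state in enumerate(snapshots, start=1):
    print(cycle, state)
```

Each printed line shows which flit sits in each router during that cycle, so the packet can be seen stretching across several routers at once, as in the slides.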
DSE for NoC Architectures
[Figure labels: design effort, design quality, increased customization level and flexibility] [Ogras et al., ASAP'05]
The Mapping Problem
[Figure: application (concurrent apps.), IP library and NoC; graph of concurrent tasks T1 to Tm; IPs ASIC1, MEM1, CPU1, CPU2, DSP1]
1. The application is divided into a graph of concurrent tasks
2. The application tasks are assigned and scheduled
3. Mapping (NP-hard): decide to which tile each selected IP should be mapped such that the metrics of interest are optimized
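To make the mapping step concrete, here is a minimal brute-force sketch for a 2×2 mesh. The IP names come from the figure, but the traffic volumes and the cost metric (volume times hop distance) are assumptions chosen for illustration. Exhaustive search is feasible only for tiny meshes, which is exactly why the general problem is tackled with heuristics.

```python
from itertools import permutations

# Hypothetical communication volumes between IPs (not taken from the slides).
traffic = {("CPU1", "MEM1"): 10, ("CPU1", "DSP1"): 5, ("DSP1", "ASIC1"): 2}
ips = ["CPU1", "MEM1", "DSP1", "ASIC1"]
tiles = [(x, y) for x in range(2) for y in range(2)]  # 2x2 mesh coordinates


def hops(a, b):
    """Manhattan distance between two tiles of a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def cost(mapping):
    """Assumed metric: sum of volume * hop distance over all communicating pairs."""
    return sum(vol * hops(mapping[src], mapping[dst])
               for (src, dst), vol in traffic.items())


# Enumerate every assignment of IPs to tiles (4! = 24 candidates).
candidates = [dict(zip(ips, perm)) for perm in permutations(tiles)]
best = min(candidates, key=cost)
best_cost = cost(best)
print(best, best_cost)
```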
Noxim: NoC Simulator
• Cycle accurate, Open Source:
• https://github.com/davidepatti/noxim
• Windows: it is possible. Some people did it. Not supported (by me)
Directory Structure
• config_examples: NoC configuration files, just make
a copy of default_config.yaml if you want to
experiment
3. type:
4. Enjoy results
Simple Results Format
Noxim simulation completed.
( 11000 cycles executed)
% Total received packets: 1458
% Total received flits: 11665
% Received/Ideal flits Ratio: 1.01259
% Average wireless utilization: 0
% Global average delay (cycles): 11.5528
% Max delay (cycles): 70
% Network throughput (flits/cycle): 1.29611
% Average IP throughput (flits/cycle/IP): 0.0810069
% Total energy (J): 2.12248e-06
% Dynamic energy (J): 1.47232e-07
% Static energy (J): 1.97525e-06
• Throughput is given in flits: the number of flits per packet may change
dynamically, while bits per flit usually does not (bus width)
• TIP: For a more detailed output, add -detailed
to command line
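When running many simulations it helps to turn the simple results format into a dictionary. The sketch below assumes that each metric line has the form `% name: value`, as in the output shown above; the function name `parse_results` is hypothetical and this helper is not shipped with Noxim.

```python
import re

# One metric per line: "% <name>: <number>"
LINE = re.compile(
    r"^\s*%\s*(?P<name>.+?):\s*(?P<value>[-+]?[0-9.]+(?:[eE][-+]?[0-9]+)?)\s*$"
)


def parse_results(text):
    """Collect every '% name: value' line into a {name: float} dictionary."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            results[match.group("name")] = float(match.group("value"))
    return results


sample = """\
Noxim simulation completed.
 ( 11000 cycles executed)
% Total received packets: 1458
% Global average delay (cycles): 11.5528
% Network throughput (flits/cycle): 1.29611
% Total energy (J): 2.12248e-06
"""
stats = parse_results(sample)
print(stats["Global average delay (cycles)"])
```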
Configuration File
• Cool results, but what did I simulate?
• You use the YAML file to set a baseline NoC configuration, i.e., most
of its parameters will remain unchanged
• Then use the command line to change the parameters you are
exploring, e.g., traffic rate, simulation length, traffic pattern, routing
algorithm
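The "baseline plus overrides" workflow can be pictured as a dictionary merge. The sketch below is only a model of the idea: the key names are hypothetical stand-ins and are not guaranteed to match the keys of default_config.yaml or the names of Noxim's command-line options.

```python
# Hypothetical baseline, standing in for the values loaded from the YAML file.
baseline = {
    "mesh_dim_x": 4,
    "mesh_dim_y": 4,
    "flits_per_packet": 8,
    "packet_injection_rate": 0.01,
    "routing_algorithm": "XY",
}


def apply_overrides(base, overrides):
    """Return a new configuration in which the overrides take precedence."""
    return {**base, **overrides}


def sweep(base, name, values):
    """Build one configuration per explored value of a single parameter."""
    return [apply_overrides(base, {name: value}) for value in values]


runs = sweep(baseline, "packet_injection_rate", [0.001, 0.01, 0.5])
for run in runs:
    print(run["packet_injection_rate"], run["routing_algorithm"])
```

Only the explored parameter differs between runs, so every result can be traced back to the same baseline.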
…
% Global average delay (cycles): 7.45455
% Max delay (cycles): 24
% Network throughput (flits/cycle): 0.136889
% Average IP throughput (flits/cycle/IP): 0.00855556
• The numbers above seem good, but they only tell the story of what has
been delivered
• Throughput: how well is the network able to handle the requested packet injection
rate (pir)?
• We have set pir to 0.001 (1 packet every 1000 cycles). Since packets are set to
8 flits (see YAML config), the requested rate is 0.008 flits/node/cycle. The measured
0.0086 flits/cycle/IP corresponds to about 0.001 packets/node/cycle of throughput,
i.e. similar to the requested pir.
• Numbers don’t match exactly because pir is a probabilistic value (longer simulations help).
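The arithmetic behind this comparison can be checked directly. The sketch below uses only values quoted above (pir, packet size and the two reported throughput figures); the variable names are illustrative.

```python
pir = 0.001                         # requested packets/node/cycle
flits_per_packet = 8                # from the YAML configuration
ip_throughput = 0.00855556          # reported flits/cycle/IP
network_throughput = 0.136889       # reported flits/cycle

# Requested load expressed in flits/node/cycle.
requested_flit_rate = pir * flits_per_packet

# Delivered load expressed in packets/node/cycle.
delivered_packet_rate = ip_throughput / flits_per_packet

# The ratio of the two throughput figures gives the number of nodes.
nodes = round(network_throughput / ip_throughput)

print(requested_flit_rate, delivered_packet_rate, nodes)
```

The delivered packet rate comes out slightly above 0.001, and the ratio of the two throughput figures is 16, consistent with a 16-node network.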
Checking Saturation
• pir 0.5 -> Insanely saturated scenario, nothing makes sense anymore
% Total received packets: 4766
% Total received flits: 38098
% Received/Ideal flits Ratio: 0.0661424
% Average wireless utilization: 0
% Global average delay (cycles): 5100.47
% Max delay (cycles): 9443
% Network throughput (flits/cycle): 4.23311
% Average IP throughput (flits/cycle/IP): 0.264569
% Total energy (J): 2.4664e-06
% Dynamic energy (J): 4.9115e-07
% Static energy (J): 1.97525e-06
• So, why should I push the network into congestion? -> The point at which
congestion begins is a measure of “quality”
• Congestion checklist -> (1) huge average delay (2) throughput far lower than
expected (3) increasing simulation length makes things worse
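Item (2) of the checklist can be quantified by comparing the reported per-IP throughput with the requested load. The sketch below assumes 8-flit packets, as in the earlier example; the function name and the 0.9 threshold are hypothetical choices for illustration.

```python
def delivered_fraction(pir, flits_per_packet, ip_throughput):
    """Fraction of the requested flit rate that was actually delivered."""
    return ip_throughput / (pir * flits_per_packet)


def looks_saturated(fraction, threshold=0.9):
    """Hypothetical rule of thumb: flag runs delivering well below the request."""
    return fraction < threshold


# pir = 0.5 run: the reported average IP throughput is 0.264569 flits/cycle/IP.
saturated = delivered_fraction(0.5, 8, 0.264569)

# pir = 0.001 run: the reported average IP throughput is 0.00855556 flits/cycle/IP.
relaxed = delivered_fraction(0.001, 8, 0.00855556)

print(saturated, looks_saturated(saturated))
print(relaxed, looks_saturated(relaxed))
```

For the pir 0.5 run the fraction is about 0.066, which agrees with the "Received/Ideal flits Ratio" line of the output.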
[Figure: 4×4 mesh; each router (R) has a PE attached]
• Diameter of an n × n mesh: $D = 2(n - 1)$
• Energy per flit: $E_{flit} = n \cdot E_{switch} + n \cdot E_{link}$
[Figure: wireless NoC. Legend: hub radio, hub, PE + switch, wired link, wireless link]
S.Deb et al, “Wireless NoC as Interconnection Backbone for Multicore Chips: Promises and
Challenges”, IEEE Journal on Emerging and Selected Topics In Circuits And Systems, Vol. 2, No.
2, June 2012
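The mesh formulas D = 2(n − 1) and E_flit = n · E_switch + n · E_link can be checked numerically. The sketch below verifies the diameter by brute force over all tile pairs and evaluates the energy expression as written. The surviving text does not say what n denotes in the energy formula, so it is passed in as a parameter, and the energy values used are hypothetical.

```python
from itertools import product


def mesh_diameter(n):
    """Longest shortest path (in hops) between any two tiles of an n x n mesh."""
    tiles = list(product(range(n), repeat=2))
    return max(abs(ax - bx) + abs(ay - by)
               for (ax, ay), (bx, by) in product(tiles, tiles))


def flit_energy(n, e_switch, e_link):
    """E_flit = n * E_switch + n * E_link, evaluated as written on the slide."""
    return n * e_switch + n * e_link


diameter = mesh_diameter(4)
# Hypothetical per-switch and per-link energies, in arbitrary units.
energy = flit_energy(diameter, e_switch=1.0, e_link=0.5)
print(diameter, energy)
```

The long wired paths across a large mesh are what the wireless hubs in the figure are meant to shortcut.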
[Plot: y-axis 0 to 80 (label not recoverable) versus packet injection rate (flit/cycle/tile), x-axis 0 to 8 × 10^-4]
• Where Pavg(e) is the average dynamic power and α(e,c) is the activity function: 0
if e is not active in cycle c, 1 otherwise
• make clean
• make
Performing simulations
Modifying source code
• Each connection requires three signals, not one: two booleans (“req/ack”) for the
handshake protocol and a third one for the actual data
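The role of the three signals can be illustrated with a small cycle-by-cycle model. This is a generic toggle-style req/ack handshake written in plain Python for illustration; it is an assumption, not a copy of Noxim's SystemC code, and the names `Link` and `simulate` are hypothetical.

```python
from collections import deque


class Link:
    """The three signals of one connection: req, ack (booleans) and data."""

    def __init__(self):
        self.req = False
        self.ack = False
        self.data = None


def simulate(flits, cycles):
    """Send flits over one link; each side sees the signals of the previous cycle."""
    link = Link()
    tx_level = False          # last value the sender drove on req
    rx_level = False          # last value the receiver drove on ack
    pending = deque(flits)
    received = []
    for _ in range(cycles):
        # Sample the signals as they were at the start of the cycle.
        req_seen, ack_seen, data_seen = link.req, link.ack, link.data
        # Sender: previous flit acknowledged, so place new data and toggle req.
        if pending and tx_level == ack_seen:
            link.data = pending.popleft()
            tx_level = not tx_level
            link.req = tx_level
        # Receiver: req changed, so a new flit is present; take it and toggle ack.
        if req_seen != rx_level:
            received.append(data_seen)
            rx_level = not rx_level
            link.ack = rx_level
    return received


print(simulate(["H", "B", "T"], cycles=6))
```

In this model the sender never overwrites data before the receiver has acknowledged it, which is the purpose of the two protocol signals.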
5. Edit power.yaml to add the static and dynamic power cost of doing PIZZA
6. Go to noxim/bin and recompile by typing “make”: notice how only your code is
recompiled!
Future Work
• Parallelisation of Noxim. Mapping SystemC threads to different
host-machine threads [Roth et al. 2013; Sinha et al. 2012]
• Vincenzo Catania, Andrea Mineo, Salvatore Monteleone, Maurizio Palesi, and Davide
Patti. 2016. Cycle-Accurate Network on Chip Simulation with Noxim. ACM Trans. Model.
Comput. Simul. 27, 1, Article 4 (August 2016), 25 pages. DOI: https://doi.org/10.1145/2953878