NoC Tutorial
Ran Ginosar
[Block diagram: example SoC built around a CEVA-X1620 core, showing the core program and data buses, DMA data buses with data/program interfaces and controllers, AHB master and slave bridges to an ARM data bus, internal program memory, L2 SRAM with tags, core I/O, and an APB bridge to peripherals (TIMERS, ICU, PMU, GPIO, CRU, system control, and user peripherals)]
The NoC Paradigm Shift
[Figure: a bus-based interconnect replaced by a NoC of computing modules, network routers, and network links]
• Architectural paradigm shift
– Replace the wiring spaghetti with a customized network
• Usage paradigm shift
– Pack everything in packets
• Organizational paradigm shift
– Confiscate communications from logic designers
– Move them to physical design
Organizational Paradigm Shift
[Design flow: system architecture → chip architecture → netlist → place modules → trim routers / ports / links → adjust link capacities]
• 3-way collaboration: Architects, logic designers, backend
• Requires novel, specialized CAD!
Why go there?
What’s in the NoC?
[Figure: NoC elements: source, network interface, link, router. OR: time slots are allocated, circuits are switched, and packets are pre-scheduled]
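The pre-scheduled alternative can be pictured as a repeating slot table per link. A minimal sketch, with a hypothetical 4-slot schedule and connection names "A" and "B" chosen only for illustration:

```python
# Minimal sketch of a pre-scheduled, circuit-switched (TDM) link:
# a repeating slot table assigns each time slot to one connection,
# so no runtime arbitration or buffering is needed.
SLOT_TABLE = ["A", "B", "A", None]   # hypothetical schedule; None = idle slot

def slot_owner(cycle):
    """Return which connection may use the link in the given cycle."""
    return SLOT_TABLE[cycle % len(SLOT_TABLE)]
```

Because the schedule is fixed at design time, bandwidth and latency are guaranteed, at the cost of flexibility.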
What flows in the NoC?
• The basic unit exchanged by end-points is the PACKET
• Packets are broken into many FLITs
– “flow control unit”
– Typically, # bits = # wires in each link (with variations)
– Typically contains some ID bits, needed by each switch along the path:
• Head / body / tail
• VC #
• SL #
• FLITs are typically sent in sequence, forming a “worm” that snakes through the network (wormhole routing)
• Unlike live worms, FLITs of different packets may interleave on the same link
– Routers know who’s who
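The flit fields above can be sketched as a small data structure. This is an illustration only (field widths and names are hypothetical), showing how a packet is packetized into a head/body/tail worm:

```python
from dataclasses import dataclass
from enum import Enum

class FlitType(Enum):
    HEAD = 0   # carries routing info for switches along the path
    BODY = 1
    TAIL = 2   # closes the worm, releasing the path

@dataclass
class Flit:
    ftype: FlitType   # head / body / tail ID bits
    vc: int           # virtual-channel number (VC #)
    sl: int           # service-level number (SL #)
    payload: int      # data bits matching the link width

def packetize(words, vc=0, sl=0):
    """Break a packet (list of payload words) into a 'worm' of flits."""
    flits = []
    for i, w in enumerate(words):
        if i == 0:
            t = FlitType.HEAD
        elif i == len(words) - 1:
            t = FlitType.TAIL
        else:
            t = FlitType.BODY
        flits.append(Flit(t, vc, sl, w))
    return flits
```

Every flit repeats the VC/SL tag so that each switch along the path can tell interleaved worms apart.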
FLIT interleaving
[Figure: FLITs from IP1, IP2, and IP3, entering through their network interfaces, interleaved on a shared link]
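Interleaving on a shared link can be sketched in a few lines. The packet contents and VC tags here are hypothetical; the point is that the downstream router demultiplexes by the VC tag each flit carries:

```python
from itertools import zip_longest

# Two "worms" (packets) tagged with different VC numbers; each flit is a
# (vc, data) pair. Packet contents are made up for illustration.
worm_a = [(0, "A-head"), (0, "A-body"), (0, "A-tail")]
worm_b = [(1, "B-head"), (1, "B-body"), (1, "B-tail")]

# Flit-by-flit round-robin interleaving on the shared physical link:
link = [f for pair in zip_longest(worm_a, worm_b)
        for f in pair if f is not None]

# The next router sorts flits back out by VC tag -- it "knows who's who":
reassembled = {vc: [d for v, d in link if v == vc] for vc in (0, 1)}
```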
Merging of disciplines
[Figure: NoC at the intersection of SoC/CMP architecture, VLSI, and networking; the merger brings some confusion of terminology]
NoC vs. Off-chip Networks
NoC                                  Off-Chip Networks
• Main costs are power & area        • Power and area are negligible
• Wires are relatively cheap         • Cost is in the links
• Prefers simple hardware            • Uses complex software
• Latency is critical                • Latency is tolerable
• Traffic may be known a priori      • Traffic/applications unknown
• Design-time specialization         • Changes at runtime
• Custom NoCs are possible           • Adherence to standards
• No faults, no changes              • Faults and changes
Simplest NoC router:
Single level
[Figure: single-level router; input ports, each with a single SL1 buffer, feed a switch driving the SL1 output ports. Buffers and software are both very expensive on chip]
Virtual Channels (VC):
Multiple same-priority levels
[Figure: router with multiple same-priority virtual channels per input port; an arbiter selects which VC uses the switch, and both VC flits traverse the SAME wires. The extra buffers are expensive on chip]
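Since same-priority VCs share one set of output wires, some arbiter must pick a winner each cycle. A minimal sketch of a round-robin arbiter, an assumption for illustration (real arbiters also track downstream buffer credits):

```python
class RoundRobinArbiter:
    """Round-robin choice among same-priority VCs requesting the
    shared output wires. Minimal sketch, not a full router arbiter."""

    def __init__(self, n_vcs):
        self.n = n_vcs
        self.last = n_vcs - 1   # start so that VC 0 wins the first round

    def grant(self, requests):
        """requests: one bool per VC. Returns the winning VC, or None."""
        for i in range(1, self.n + 1):
            vc = (self.last + i) % self.n   # rotate from the last winner
            if requests[vc]:
                self.last = vc
                return vc
        return None
```

Rotating the starting point after each grant keeps the arbitration fair, so no VC can starve another of the shared link.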
Service Levels (a.k.a. VC….):
Multiple priority levels
[Graph: delay per service level]
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, “QNoC: QoS architecture and design process for Network on Chip,” JSA special issue on NoC, 2004.
SoC with NoC
[Figure: SoC floorplan with modules interconnected by a mesh of routers (R)]
The case for Async NoC and hard IP cores
• NOCs are for large SOCs
• Large SOCs = multiple clock domains
→ NOCs in large SOCs should be asynchronous
• Two complementary research areas:
– Asynchronous routers
• simplify design, low power
– Asynchronous interconnect
• high bandwidth, low power
• Problem: need special CAD, special methodology
– Solutions:
• deliver and use as “configurable hard IP core”
• use only at physical design phase
• deliver as predesigned infrastructure (FPGA, SOPC)
NoC: Three Levels
• Circuits
– Wires, Buffers, Routers, NI
• Network
– Topology, routing, flow-control
• Architecture
– Everything is packets
– Traffic must be characterized
– NoC can extend to other chips
Circuit Issues
• Power challenge
– Possible power sorting: Modules > NI > Switching > buffers >
wires
– Network interface (NI)
• Buffer, request and allocate, convert, synchronize
– Switches: X-bar or mux, arbitrated or pre-configured
– Buffers: Enabled SRAM vs. FF
– Wires: Parallel vs. serial, low voltage, fast wires
• Area challenge (a.k.a. leakage power)
• Latency challenge
• Design challenge
– These circuits are not in your typical library !
• EDA challenge
– Flow? Algorithms? NoC compiler?
• Who is the user?
– Logic design vs. back-end
• Not fit for simple HDL synthesis. Needs customized circuits
Networking Issues
• Topology: Regular mesh or custom?
– ASICs are irregular
• Topology: Flat or hierarchical?
Networking Issues (cont.)
• Topology: Low or high radix?
– Higher radix nets provide fewer hops (lower dynamic
power)
– But use more wires and more drivers / receivers
(higher static power)
• How many buffers?
– They are expensive (dynamic and static power)
Networking Issues (cont.)
• Guaranteed Service or Best Effort?
– GS makes performance easy to verify
– GS employs no buffers (only muxes): faster, lower power
– But GS is good only for precise, known traffic patterns
– Philips (NXP) combined GS and BE
• Routing: Flexible or simple?
– Flexible routing bypasses faults and congestions
– Multiple routes may require re-ordering (expensive)
– Fixed, simple single-path routing saves energy and
area
• Multiple priorities and virtual channels
– Effective, but they cost buffers
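The fixed, simple single-path option above is classically dimension-ordered (XY) routing on a mesh; a minimal sketch, assuming routers are addressed by (x, y) grid coordinates:

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh: travel fully along
    the X dimension, then along Y. Exactly one fixed path exists per
    (src, dst) pair, so packets arrive in order and need no re-ordering.
    Returns the list of router coordinates visited after src."""
    x, y = src
    dx, dy = dst
    hops = []
    while x != dx:                       # X dimension first
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                       # then the Y dimension
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops
```

Beyond saving the area and energy of adaptive-routing logic, restricting turns this way also avoids routing deadlock on a mesh.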
One size does not fit all!
[Chart: reconfiguration rate vs. flexibility; ASICs reconfigure at design time (single application), FPGAs at boot time, and ASSPs/CMPs during run time (general-purpose computer)]
Other Dimensions
• ASIC vs FPGA
– In FPGA, NoC by vendor or user?
• ASYNC vs SYNC
• One chip vs Multiple chips
– 3D, multi-chip systems
• HW vs SW
• Fixed vs Reconfigurable SoC/NoC
NoC for Testing SoC
• Certain test methods seek repeatable cycle-
accurate patterns on chip I/O pins
• But systems are not cycle-accurate
– Multiple clock domains, synchronizers, statistical
behavior
• NoCs facilitate cycle-accurate testing of each component inside the SoC
– Enabling controllability and observability on module
pins
• Instead of chip pins
• Can be extended to space
– Decomposed testing and b-scan in mission
– Useful together with reconfiguration
Some rules were made to be broken…
Network on Chip