SoC Design

Figure: Intrinsic computational efficiency (ICE) of silicon [Roza]. Computational efficiency in MOPS/W (log scale, 10^0 to 10^6) versus feature size in µm (0.5 down to 0.07). Data points are shown for general-purpose processors (i386SX, i486DX, P5, P6, 68040, microSPARC, SuperSPARC, TurboSPARC, UltraSPARC, PowerPC 601, 604, 604e and 7400, Alpha 21164a and 21364), together with points labelled 3DTV and query by humming.
Source: http://bwrc.eecs.berkeley.edu/cic
Designing Embedded Systems on Silicon-1
J. van Meerbergen

2/7/13

Hardware Efficiency

Figure: hardware efficiency (low, medium, high) versus flexibility (low, medium, high) for ASIC, ASIP, DSP, FPGA and general-purpose (GP) processors.

ASIC Style

Figure: a finite impulse response (FIR) filter implemented as an ASIC.

- Highly efficient for fixed algorithms
- Economical only for large market volumes (hundreds of millions of units at 32 nm)
- No changes at all after processing (no field upgrades, tuning to a specific context, bug fixes or new standards)
- Irregular code leads to a highly irregular floorplan with a large wiring impact (dynamic energy, E_dyn) and large leakage (static energy, E_stat)
- Difficult to include time multiplexing efficiently for irregular code
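The FIR filter used as the example above is a typical fixed algorithm: fixed coefficients and a fixed multiply-accumulate structure that an ASIC can hard-wire. Below is a minimal behavioural model in Python, assuming integer samples; the coefficients are hypothetical and not taken from the slide.

```python
from typing import List, Sequence


def fir_filter(samples: Sequence[int], taps: Sequence[int]) -> List[int]:
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k], with x[n] = 0 for n < 0.

    In a fully parallel ASIC implementation each tap becomes a dedicated
    multiplier (or a constant-coefficient shift-and-add network) feeding an
    adder tree, which is why such a fixed algorithm maps efficiently to silicon.
    """
    delay_line = [0] * len(taps)  # models the register chain holding x[n], x[n-1], ...
    out = []
    for x in samples:
        # One new sample per clock cycle: shift the delay line by one position.
        delay_line = [x] + delay_line[:-1]
        out.append(sum(c * d for c, d in zip(taps, delay_line)))
    return out


# Hypothetical 3-tap smoothing filter applied to a unit impulse:
# the impulse response reproduces the coefficients.
print(fir_filter([1, 0, 0, 0], [1, 2, 1]))  # [1, 2, 1, 0]
```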

ASIC + microcontroller style

Figure: CPU, memory (MEM) and ASIC blocks.

- Highly efficient for fixed algorithms that use the µ-controller very seldom
- Economical only for large market volumes (hundreds of millions of units at 32 nm)
- Limited changes possible after processing
- Changes only very locally, in non-critical code (acceptable for some field upgrades, tuning to a specific context, bug fixes and new standards)
- Irregular code leads to a highly irregular floorplan with a large wiring impact (dynamic energy, E_dyn) and large leakage (static energy, E_stat)
- Difficult to include time multiplexing efficiently for irregular code

General-purpose microprocessors

- Highly flexible: easy field upgrades, tuning to a specific context, bug fixes and new standards
- Easy to use and compiler friendly
- Large market, because many smaller markets combine
- Large area and energy (A+E) overhead: data cache hierarchy, multi-port register file, instruction hierarchy, very flexible data-path units (wide multiplier, ALU with many instructions)

GP CPUs + custom accelerators

Figure: general-purpose CPU with a custom accelerator (Accel).

- Highly flexible: easy field upgrades, tuning to a specific context, bug fixes and new standards, but this degrades when the accelerators have to be used too much
- Easy to use and compiler friendly
- Large market, because many smaller markets combine, but less so as the accelerators are used more
- Large area and energy (A+E) overhead: data cache hierarchy, multi-port register file, instruction hierarchy, very flexible data-path units (wide multiplier, ALU with many instructions); partly mitigated when the accelerators are used sufficiently
- Large overhead in communication between the microprocessor and the accelerators, except when large code segments are offloaded (which is not flexible)
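The communication trade-off in the last point can be made concrete with a simple timing model. This is a sketch under stated assumptions, not a method from the slides: a code segment takes `t_cpu` on the processor and `t_accel` on the accelerator, and every offload call pays a fixed communication overhead `t_comm`. All names and numbers are hypothetical.

```python
def offload_speedup(t_cpu: float, t_accel: float, t_comm: float, calls: int = 1) -> float:
    """Speedup of moving a code segment from the CPU to an accelerator.

    Assumed model (not from the slides): the segment takes t_cpu on the CPU and
    t_accel on the accelerator in total, and it is offloaded in `calls`
    invocations, each paying a fixed communication overhead t_comm
    (data transfer plus synchronisation).
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    return t_cpu / (t_accel + calls * t_comm)


# Same work, offloaded as one large segment or as 100 small ones (hypothetical times).
print(offload_speedup(1000.0, 100.0, 5.0, calls=1))    # about 9.5
print(offload_speedup(1000.0, 100.0, 5.0, calls=100))  # about 1.7
```

With one large segment the overhead is amortised; with many fine-grained calls it dominates, which is why large (and therefore less flexible) segments are needed.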

SoC Design

Synthesis
DFT Insertion
Floorplanning
Power Planning
Clock tree insertion
Place and Route
RC extraction
Timing check

Design Tools
- System architecture: C/C++, SystemC, Matlab
- RTL: Verilog-XL, NC-Verilog, NC-VHDL, Debussy
- Synthesis: RC Compiler, Design Compiler
- Physical design: SoC Encounter, Magma (Synopsys), Mentor

Simplified Flow

Figure: simplified design flow.
- Inputs: RTL, timing libraries (.lib), physical libraries (LEF), timing constraints
- Front end: logic synthesis, logic simulation, formal verification, test (ATPG), static timing analysis; output: netlist
- Back end: floorplanning, clock tree synthesis, place & route, RC extraction (SPEF, SDF), static timing analysis, DRC/LVS; output: GDSII

TSMC's Design Flow

Flow with Multi-Vendor Tools

Design Abstraction Levels

Figure: design abstraction levels from system through module, gate and circuit down to device (an n-channel transistor with gate G, source S and drain D in n+ regions). A second panel plots the impact of a design decision against complexity for the conceptual, high, RT, gate and transistor levels.

Design Flow: Summary

Level              Time concept                                  Data type             Code lines
Concept            communicating processes with distinct rates   tokens                1K
High level         frame, signal rate                            arrays, lists         10K
RT level           clock                                         scalars, int, float   100K
Gate level         set-up and hold times                         bits                  1M
Transistor level   analog                                        volt, mA              10M

At higher levels the impact of a design decision is larger.
Vendors concentrate on the lower levels (more general solutions).

Logic Synthesis

Figure: netlist synthesis, from an idea through a functional description, behavioural HDL and RTL to a gate-level netlist, covering the architecture, logic synthesis and DFT steps.

Synthesis is the process by which an abstract description of the circuit behaviour (known as RTL, generally written in VHDL or Verilog) is mapped to a set of primitive standard cells in a library for a particular process technology. It consists of:
- Translation of the RTL description into an intermediate format
- Optimization of the logic
- Mapping of the optimized netlist to the gates of the target library

The synthesis tool requires:
- RTL code
- The target ASIC cell library
- User constraints: timing and area, and environmental constraints (power, load, etc.)

The output of synthesis is a gate-level netlist in the target technology.

RTL Coding

- RTL stands for Register Transfer Level
- An RTL description describes the design in terms of registers and the logic that resides between them
- This captures the timing constraints of the design efficiently
- Verilog and VHDL are the two most popular hardware description languages used to write RTL descriptions
- An RTL description captures the change in data at each clock cycle
- All the registers are updated at the same time in a clock cycle
- RTL captures the data flow
- Logic synthesis tools translate an RTL model more efficiently than a behavioural model

Sample RTL code:

    if IR(3) = '0' then
      PC := PC + 1;
    else
      DBUF := MEM(PC);
      MEM(SP) := PC + 1;
      SP := SP - 1;
      PC := DBUF;
    end if;
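The statement that all registers are updated at the same time in a clock cycle means every next-state value is computed from the current register values, and only then are all registers written. Below is a minimal Python sketch of that semantics; the register names A and B in the example are hypothetical.

```python
from typing import Callable, Dict

Registers = Dict[str, int]


def clock_edge(regs: Registers, next_state: Callable[[Registers], Registers]) -> Registers:
    """Model one rising clock edge of an RTL design.

    next_state reads only the *current* register values and returns the value
    each updated register will hold after the edge. All updates are then
    committed together, like non-blocking (<=) assignments in Verilog or
    signal assignments in a VHDL clocked process.
    """
    updates = next_state(dict(regs))  # evaluate all logic on the old values first
    new_regs = dict(regs)
    new_regs.update(updates)          # commit every register simultaneously
    return new_regs


# Swap two registers in one cycle: no temporary register is needed,
# because both right-hand sides see the old values.
state = {"A": 1, "B": 2}
state = clock_edge(state, lambda r: {"A": r["B"], "B": r["A"]})
print(state)  # {'A': 2, 'B': 1}
```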

Logic Synthesis

RTL:

    process (CLK, RST)
    begin
      if RST = '1' then
        Q <= '0';
      elsif rising_edge(CLK) then
        Q <= A and B and not (C and D);
      end if;
    end process;

The RTL, the ASIC cell library and the user constraints are the inputs of the logic synthesis tool, which produces a gate-level netlist.

Logic Synthesis: Technology Mapping

Z = (not S and A) or (S and B)

Figure: the multiplexer function above built from generic gates (inputs A, B, S; output Z) and then mapped onto the standard cells I-002 and ANDOR-001.
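Technology mapping must not change the function of the circuit. Below is a small Python sketch that checks this by exhaustive simulation. It assumes that I-002 is an inverter and that ANDOR-001 is a 2-2 AND-OR cell computing (a1 and a2) or (b1 and b2); the slide does not give the real cell functions. Exhaustive comparison is trivial for three inputs, while real flows use formal verification instead.

```python
from itertools import product


def z_generic(a: bool, b: bool, s: bool) -> bool:
    """Generic-gate form from the slide: Z = (not S and A) or (S and B)."""
    return (not s and a) or (s and b)


def inv(x: bool) -> bool:
    """Assumed function of cell I-002: an inverter."""
    return not x


def and_or_22(a1: bool, a2: bool, b1: bool, b2: bool) -> bool:
    """Assumed function of cell ANDOR-001: (a1 and a2) or (b1 and b2)."""
    return (a1 and a2) or (b1 and b2)


def z_mapped(a: bool, b: bool, s: bool) -> bool:
    """Mapped netlist: one inverter driving one AND-OR cell."""
    return and_or_22(inv(s), a, s, b)


def equivalent() -> bool:
    """Compare both netlists over all 2**3 input combinations."""
    return all(z_generic(a, b, s) == z_mapped(a, b, s)
               for a, b, s in product([False, True], repeat=3))


print(equivalent())  # True
```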

DfT Insertion
- Testable flip-flops
- Scan chain generation
- Chain propagation from the core to an output pin

DfT flow: DfT insertion and synthesis, DfT analysis, test generation (ATPG / expansion), test validation, handoff deliverables
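Scan chain generation stitches the testable flip-flops into a shift register, so that test patterns can be shifted in and captured responses shifted out through a pin. Below is a behavioural Python sketch of a single scan chain. It assumes a simple mux-D model: when `scan_enable` is set, each flip-flop loads its predecessor, and otherwise it captures its functional input. The class and its methods are hypothetical and do not represent any DfT tool's API.

```python
from typing import List, Optional


class ScanChain:
    """Behavioural model of one scan chain of mux-D scan flip-flops.

    ffs[0] is connected to the scan-in pin, ffs[-1] drives the scan-out pin.
    """

    def __init__(self, length: int) -> None:
        self.ffs: List[int] = [0] * length

    def clock(self, scan_enable: bool, scan_in: int = 0,
              functional_d: Optional[List[int]] = None) -> int:
        """Apply one clock edge; return the scan-out value seen before the edge."""
        scan_out = self.ffs[-1]
        if scan_enable:
            # Shift mode: each flip-flop loads its predecessor's value.
            self.ffs = [scan_in] + self.ffs[:-1]
        else:
            # Capture mode: each flip-flop loads its functional D input.
            if functional_d is None or len(functional_d) != len(self.ffs):
                raise ValueError("capture needs one functional value per flip-flop")
            self.ffs = list(functional_d)
        return scan_out

    def shift(self, bits: List[int]) -> List[int]:
        """Shift bits in one per clock and return the bits that appear at scan-out."""
        return [self.clock(True, b) for b in bits]


chain = ScanChain(3)
chain.shift([1, 0, 1])                      # load a test pattern
chain.clock(False, functional_d=[0, 1, 1])  # capture a (hypothetical) circuit response
print(chain.shift([0, 0, 0]))               # unload, last flip-flop first: [1, 1, 0]
```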

Backend Design

Inputs:
- Technology information and physical libraries: Corelib.lef, IOlib.lef, Rams.vclef
- Timing libraries: Corelib_slow.lib, Corelib_fast.lib, Corelib_typ.lib, IOlib_slow.lib, RAM timing libraries
- Timing constraints (user defined)
- Design netlist: Verilog design netlist, with IO pads and power pads added
- IO pad location file

Steps:
- Chip physical architecture: I/O and hierarchical planning, power grid design and analysis, chip assembly, hierarchical STA
- Floorplan implementation
- Physical synthesis: placement, DFT, clock tree synthesis, post-placement optimisation
- Routing and final optimisation: signal routing, antennas, decap and fillers, crosstalk fixing, post-route fix, editing

Floorplanning

Floorplanning is the task of deciding how the chip area is to be used by the leaf modules, taking wiring into consideration. There are two methods of floorplanning:
- Top down: the chip is partitioned during RTL modelling. Area is assigned on the basis of estimated block areas and shapes, and blocks are placed relative to each other according to their connectivity.
- Bottom up: the design is first synthesised, and the resulting gates are then clustered into blocks on the basis of connectivity.

Most designs use a combination of both techniques, but the emphasis is increasingly on the top-down approach.

Figure: floorplan showing standard cells, an IP block and pads.

Floorplanning

Calculating core size, width and height

When calculating the core size for the standard cells, the core utilization must be decided first; it is usually higher than 85%. The core size is then calculated as follows:

$$\text{Core size of standard cells} = \frac{\text{Standard cell area}}{\text{Core utilization}}$$

The recommended core shape is a square, i.e. a core aspect ratio of 1:

$$W = H = \sqrt{\text{Core size of standard cells}}$$

Example
Standard cell area = 2,000,000 µm²
Core utilization demanded = 85%
No macros
Core size of standard cells = 2,000,000 / 0.85 ≈ 2,352,941 µm²
Width = Height = √2,352,941 ≈ 1534 µm
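The core-size example above can be reproduced in a few lines. This is a sketch that assumes no macros, as in the slide's example, and defines the aspect ratio as height / width (an assumption; the slide only uses the square case, ratio 1).

```python
import math
from typing import Tuple


def core_dimensions(std_cell_area_um2: float, utilization: float,
                    aspect_ratio: float = 1.0) -> Tuple[float, float, float]:
    """Return (core area in um^2, width in um, height in um).

    core area = standard cell area / utilization (no macros);
    aspect_ratio = height / width, so 1.0 gives the recommended square core.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    core_area = std_cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = width * aspect_ratio
    return core_area, width, height


area, w, h = core_dimensions(2_000_000, 0.85)
print(round(area), round(w), round(h))  # 2352941 1534 1534
```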

Floorplanning

Core margins
- Space for power and ground routing

Core-limited and pad-limited designs
- When the pad width (the width needed by the pads along a die side) > (core width + core margin), the die size is decided by the pads; this is called a pad-limited design
- When the pad width < (core width + core margin), the die size is decided by the core; this is called a core-limited design

Power Planning

Metal migration (also known as electromigration)
- Under high currents, electron collisions with the metal grains cause the metal to move, so the wire may become an open circuit or a short circuit
- Prevention: size the power supply lines so that the chip does not fail
- Experience: keep the current density of the power ring below 1 mA/µm

IR drop
- IR drop is the voltage drop in the power and ground network caused by high current flowing through its resistance
- When there are excessive voltage drops in the power network or voltage rises in the ground network, the devices run at a slower speed
- IR drop can cause the chip to fail due to:
  - Performance (circuit running slower than specification)
  - Functionality problems (setup or hold violations)
  - Unreliable operation (less noise margin)
  - Power consumption (leakage power)
  - Latch-up
- Prevention: add stripes to avoid IR drop on the cells' power lines

Power Planning: IR Drop

Power supply noise (PSN) measured with a ring oscillator (Ringo) and counter:
- The number of counts is inversely proportional to the DSP clock frequency, F_C = 10, 20 and 25 MHz
- Ring oscillator frequency: 115 MHz at VDD = 1.8 V
- DSP-induced PSN is clearly detected
- Average PSN = 6 counts × 2.4 mV/count = 14.4 mV

Figure: C2 counts (691 to 699) versus tester clock cycles (0 to 250) for DSP activity at F_C = 20 MHz and T_ambient = 27 °C, with counter enable window T_C = 1/F_C, the supply signal v(t) and a difference of 6 counts marked. Source: J. Rius, UPC

Voltage Drop Verification

Figure: hierarchical voltage-drop verification with VoltageStorm (Cadence) in SoC Encounter.
- Block-level analysis: Encounter Power Analysis computes the block power consumption, and VoltageStorm produces a block power-grid view; blocks can be partitions, virtual prototypes or IP blocks (flat implementation)
- Top-level analysis: Encounter Power Analysis computes the instance power consumption, and VoltageStorm analyses the chip power grid using the power-grid view library
- Results are displayed in the SoC Encounter interface

Power Grid Design

Figure: power grid design and analysis flow.
- Power grid design: power grid creation, power grid connection, multiple power/ground
- Extraction and analysis: parasitics extraction, power grid analysis, power propagation
- Power plan refinement: power routing, power propagation
- Extraction and hierarchical analysis: parasitics extraction, power grid analysis

Power Ring Width

Experience (reference design):
- Gate count = 70 k
- 4000 flip-flops
- 80% of the flip-flops with dynamic clock gating
- Current needed = 0.2 mA/MHz

Note: multiply this value by 1.8~2 for a design without clock gating.

Example:
- Gate count = 200 k
- No gated clock
- Clock frequency = 20 MHz
- Current needed = (200/70) × 0.2 mA/MHz × 20 MHz × 2 = 22.86 mA
- With a current density < 1 mA/µm, the width of the P/G ring must be > 22.86 µm
- To avoid the slot rule for wide metal, the largest width is 20 µm (process dependent)
- Use two sets of P/G rings in this case
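The ring-width example above can be written as a small calculator. This sketch makes several assumptions: current scales linearly with gate count and frequency, as in the slide's example; a factor of 2 is used for non-gated designs; and the required width is split evenly across ring sets. The 20 µm limit is process dependent, and the function names are hypothetical.

```python
import math
from typing import Tuple

# Rules of thumb from the slide (process and design dependent).
REF_GATES_K = 70.0          # reference design: 70 k gates, 80% of flip-flops clock gated
REF_MA_PER_MHZ = 0.2        # current of the reference design per MHz of clock
MAX_MA_PER_UM = 1.0         # allowed ring current density: 1 mA per um of width
MAX_RING_WIDTH_UM = 20.0    # widest ring metal allowed by the wide-metal slot rule


def ring_current_ma(gates_k: float, f_mhz: float, gated: bool,
                    no_gating_factor: float = 2.0) -> float:
    """Scale the reference current linearly with gate count and clock frequency."""
    factor = 1.0 if gated else no_gating_factor  # slide: x1.8~2 without clock gating
    return gates_k / REF_GATES_K * REF_MA_PER_MHZ * f_mhz * factor


def ring_plan(current_ma: float) -> Tuple[float, int, float]:
    """Return (minimum total ring width in um, number of P/G ring sets, width per set in um)."""
    min_width = current_ma / MAX_MA_PER_UM
    sets = max(1, math.ceil(min_width / MAX_RING_WIDTH_UM))  # assumed even split across sets
    return min_width, sets, min_width / sets


i = ring_current_ma(200, 20, gated=False)
width, sets, per_set = ring_plan(i)
print(f"{i:.2f} mA -> ring width > {width:.2f} um, {sets} ring sets of >= {per_set:.2f} um")
# 22.86 mA -> ring width > 22.86 um, 2 ring sets of >= 11.43 um
```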

Power Stripe Calculation

Experience
- Add one strap set per 100 µm

Example
- Core width = height = 1600 µm
- Stripe sets added = 15

Core/IO power pad selection
- Core power pad: one set of core power pads (PVDDC along with PVSSC) can provide 40~50 mA of current
- IO power pad: one set of IO power pads (PVDDR along with PVSSR) can provide the power for 3~4 output pads or 6~8 input pads

Figure: power ring, stripes and core power connections.
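The rules of thumb above can be turned into a small calculator. This sketch rests on several assumptions. The stripe count excludes the ring edges, which is one reading that reproduces 1600 µm giving 15 sets. The pad estimates use the conservative ends of the slide's ranges, and output and input pad loads are assumed to add. The function names are hypothetical.

```python
import math


def stripe_sets(core_width_um: float, pitch_um: float = 100.0) -> int:
    """One strap set per 100 um; assumed to exclude the ring edges (1600 um -> 15)."""
    return max(int(core_width_um // pitch_um) - 1, 0)


def core_pad_sets(core_current_ma: float, ma_per_set: float = 40.0) -> int:
    """PVDDC/PVSSC sets needed, using the conservative end of the 40~50 mA range."""
    return math.ceil(core_current_ma / ma_per_set)


def io_pad_sets(n_outputs: int, n_inputs: int) -> int:
    """PVDDR/PVSSR sets needed: assumed one set per 3 outputs or per 6 inputs
    (conservative end of 3~4 and 6~8), with output and input loads adding up."""
    return math.ceil(n_outputs / 3 + n_inputs / 6)


print(stripe_sets(1600))     # 15
print(core_pad_sets(22.86))  # 1
print(io_pad_sets(12, 24))   # 8
```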

Placement

- Placement decides the positions of the components within the allocated blocks
- Routing cannot start until the components have been placed
- The quality of a placement is judged by the quality of the routing it allows
- Placement is performed using simple estimates of the final routing
- Timing-driven P&R is the state of the art
- Gates and flip-flops/latches are the common placement objects
- Small elements such as logic gates occupy a single row; larger blocks span multiple rows

Figure: standard cells placed in a low-utilization core.
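A common example of a "simple estimate of the final routing" is the half-perimeter wirelength (HPWL) of each net's bounding box. The slides do not say which estimate a given tool uses, so treat this as an illustration. The cells, nets and coordinates below are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def hpwl(pins: List[Point]) -> int:
    """Half-perimeter of the bounding box around a net's pins.

    It is a lower bound on the net's routed length, and equals the minimum
    rectilinear (Steiner) wirelength for nets with two or three pins.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(placement: Dict[str, Point], nets: List[List[str]]) -> int:
    """Estimated wirelength of a placement: sum of HPWL over all nets."""
    return sum(hpwl([placement[cell] for cell in net]) for net in nets)


# Hypothetical cells, nets and two candidate placements (grid units).
nets = [["g1", "g2"], ["g2", "ff1", "ff2"]]
spread = {"g1": (0, 0), "g2": (10, 0), "ff1": (20, 5), "ff2": (0, 5)}
compact = {"g1": (0, 0), "g2": (5, 0), "ff1": (10, 5), "ff2": (0, 5)}
print(total_wirelength(spread, nets), total_wirelength(compact, nets))  # 35 20
```

A placer would prefer the compact placement, since its estimated wirelength is lower.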

Placement

Source: Magma


Clock Tree Synthesis

- The clock signal is used as a timing reference for the movement of data within a synchronous digital system
- The clock tree, or clock distribution network, distributes the clock signal(s) from a common point to all the elements that need it
- Properties of clock signals: they are loaded with the greatest fanout, travel over the greatest distances and operate at the highest speeds
- The goals of clock tree synthesis include:
  - Creating the clock tree specification file
  - Building a buffer distribution network
- In automatic CTS mode, Encounter does the following:
  - Builds the clock buffer tree according to the clock tree specification file
  - Balances the clock phase delay with appropriately sized, inserted clock buffers
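Balancing the clock phase delay amounts to minimising clock skew, the spread of clock arrival times at the sinks. Below is a sketch that assumes the latency from the clock root to each flip-flop is already known; the sink names and latencies are hypothetical. Real CTS also trades skew against insertion delay, power and transition time.

```python
from typing import Dict


def clock_skew(latency_ns: Dict[str, float]) -> float:
    """Skew = latest minus earliest clock arrival over all sinks (flip-flops)."""
    return max(latency_ns.values()) - min(latency_ns.values())


def balancing_delay(latency_ns: Dict[str, float]) -> Dict[str, float]:
    """Extra delay each sink's branch would need so all sinks match the slowest one.

    This is an idealisation of what CTS achieves with sized and inserted buffers.
    """
    target = max(latency_ns.values())
    return {sink: round(target - lat, 3) for sink, lat in latency_ns.items()}


# Hypothetical clock latencies (ns) from the clock root to four flip-flops.
lat = {"ff1": 0.42, "ff2": 0.55, "ff3": 0.48, "ff4": 0.61}
print(round(clock_skew(lat), 3))  # 0.19
print(balancing_delay(lat))       # {'ff1': 0.19, 'ff2': 0.06, 'ff3': 0.13, 'ff4': 0.0}
```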

Clock Tree Synthesis


Routing

- Routing is the process of building the physical connections between blocks as defined by the logical connections
- Routing takes place on more than one layer, the exact number depending on the process and the design conventions
- Layers are connected together using vias
- Global routing assigns wires to the channels defined during the floorplanning phase
- Detailed routing assigns nets to individual tracks in the channel

Routing and final optimisation: signal routing, antennas, decap and fillers, crosstalk fixing, post-route fix, editing
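Connecting a single net on a track grid is classically done by maze routing. The sketch below is a breadth-first maze router in the spirit of Lee's algorithm, not the algorithm of any particular tool. It simplifies to a single layer, a unit-cost grid and a two-pin net; real routers add multiple layers connected by vias, wire costs and congestion handling. The grid in the example is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column) on the routing grid


def maze_route(grid: List[str], src: Cell, dst: Cell) -> Optional[List[Cell]]:
    """Route one two-pin net on a single-layer grid ('.' free, '#' blocked).

    Breadth-first wave expansion from src records, for each reached cell,
    the cell it was reached from; tracing back from dst then yields a
    shortest rectilinear path. Returns None if dst cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {src: None}
    wave = deque([src])
    while wave:
        cell = wave.popleft()
        if cell == dst:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                wave.append((nr, nc))
    if dst not in came_from:
        return None
    path: List[Cell] = []
    node: Optional[Cell] = dst
    while node is not None:
        path.append(node)
        node = came_from[node]
    return path[::-1]


grid = [
    "....",
    ".##.",
    "....",
]
route = maze_route(grid, (1, 0), (1, 3))
print(len(route))  # 6 cells: the net detours around the blockage
```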

Routing: Signal Integrity Cross-talk

- Parallel repeater insertion does not reduce the cross-talk peak noise; for a 10 mm communication bus, the delay noise is lowered by about 77%
- Staggered repeaters reduce delay noise by about 88%

Figure: aggressor and victim lines between shield wires, driven through repeater buffers (bfx3, bfx4, bfx50ohm) from inputs T1IN to T3IN to outputs T1OUT to T3OUT, with panels for peak noise and propagation delay versus wire length for a 20 mm wire. Source: M. Meijer and A. Katoch, Philips

Routing: SI Prevention

Verification sign-off: parasitic extraction, timing and crosstalk analysis, power distribution analysis

Static Timing Analysis

Static timing analysis involves three main steps:
- The design is broken down into sets of timing paths
- The delay of each path is calculated
- All path delays are checked to see whether the timing constraints have been met

Figure: three timing paths (Path 1, Path 2, Path 3) in a circuit clocked by CLK.

Path delay calculation, for the path from D1 to U33:

path_delay = 1.0 + 0.54 + 0.32 + 0.66 + 0.23 + 0.43 + 0.25 = 3.43 ns
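The second and third steps above (path delay calculation and constraint checking) can be sketched in a few lines. The sketch assumes each path is given as a list of stage delays and that there is a single clock period with a setup time. The second path, the 4.0 ns period and the 0.2 ns setup time are hypothetical values, not from the slide.

```python
from typing import Dict, List


def path_delay(stage_delays: List[float]) -> float:
    """Step 2: a path's delay is the sum of its cell and net delays (ns)."""
    return sum(stage_delays)


def check_paths(paths: Dict[str, List[float]], clock_period: float,
                setup: float = 0.0) -> Dict[str, float]:
    """Step 3: slack = required time - arrival time; negative slack is a violation."""
    required = clock_period - setup
    return {name: round(required - path_delay(d), 3) for name, d in paths.items()}


paths = {
    "D1->U33": [1.0, 0.54, 0.32, 0.66, 0.23, 0.43, 0.25],  # delays from the slide
    "path_b": [0.8, 0.9, 1.1, 1.4],                        # hypothetical second path
}
print(round(path_delay(paths["D1->U33"]), 2))           # 3.43
print(check_paths(paths, clock_period=4.0, setup=0.2))  # {'D1->U33': 0.37, 'path_b': -0.4}
```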

Physical Verification
- DRC: design rule checking
- LVS: layout vs. schematic verification

Chip Finishing

Seal-ring and artefact generation
- The seal ring helps make the circuit moisture resistant and prevents cracks from forming in the die when the wafer is sawn
- Artefacts: critical-dimension structures, mask IDs, fuse markers, etc.
- Sometimes this step is simply called design chip finishing

Tiling (dummy fill / pattern fill)
- The fab's stringent minimum and maximum density rules for the active, poly and metal layers must be met by all designs
- Currently a back-end operation

Each step is followed by a physical verification step.

Figure: die with fill tiles and seal ring.
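Tiling exists to satisfy window-based density rules. Below is a minimal sketch of such a check. It assumes a layer represented as a 0/1 occupancy grid, non-overlapping square windows (real checks often step overlapping windows) and hypothetical limits of 25% and 75%. Real rule values, window sizes and step sizes come from the fab's rule deck.

```python
from typing import List, Tuple

Window = Tuple[int, int, float]  # (row, column) of a window's top-left corner, density


def window_densities(layer: List[List[int]], win: int) -> List[Window]:
    """Density of each non-overlapping win x win window of a 0/1 occupancy grid (1 = metal)."""
    result = []
    for r in range(0, len(layer) - win + 1, win):
        for c in range(0, len(layer[0]) - win + 1, win):
            filled = sum(layer[r + i][c + j] for i in range(win) for j in range(win))
            result.append((r, c, filled / (win * win)))
    return result


def density_violations(layer: List[List[int]], win: int,
                       min_density: float, max_density: float) -> List[Window]:
    """Windows whose density falls outside [min_density, max_density]."""
    return [w for w in window_densities(layer, win)
            if not min_density <= w[2] <= max_density]


metal = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
# Hypothetical rule: 25% <= density <= 75% in every 2 x 2 window.
print(density_violations(metal, 2, 0.25, 0.75))  # [(0, 0, 1.0), (0, 2, 0.0)]
```

Windows below the minimum, such as the empty one at (0, 2), are where dummy fill would be added.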

Package Fitting

- Package options: selection of an appropriate package
- Route pads to pins; wire length is important
- Rule checking
- The minimum information required in the GDS2 file is the nitride (pad opening) layer or the pad boundary layer

Packaging