Assessing Costs of Variability, Reliability and Resilience

Andrew B. Kahng
UCSD CSE and ECE Departments

[email protected]
http://vlsicad.ucsd.edu

Design Capability Gap, Value Scaling Gap
Available density ideally grows at 2x/node
  = a typical view of Moore's Law
Even so, realized density grows at only 1.6x/node
  Power, performance, area resources spent on guardband, reliability, etc.
Designers obtain only part of Moore's Law scaling benefits
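The gap between the two growth rates compounds from node to node. A minimal sketch, assuming the 2x and 1.6x factors simply multiply across successive nodes (the function name is illustrative and not from the talk):

```python
def realized_fraction(nodes, ideal=2.0, realized=1.6):
    """Fraction of the ideally available density that is realized after `nodes` nodes."""
    return (realized / ideal) ** nodes

for n in range(1, 5):
    print(f"after {n} node(s): {realized_fraction(n):.0%} of available density")
```

Under this assumption, roughly half of the available density is lost after three nodes.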

UCSD VLSI CAD Laboratory NSF Variability Expedition / DFG SPP 1500 Colloquium, 141113 2
Challenge: Variability + Reliability
[Figure: variability and reliability phenomena: defocus/dose variation, mask CD error, non-rectangular shapes, misalignment, line-end shortening, erosion/dishing in CMP, flare, line edge roughness, wafer flatness, lens aberration, non-uniform CD, alpha particles, imperfect regulators, temperature, NBTI, electromigration, IR-drop, hot-carrier injection, crosstalk]
Variability + Reliability = challenges to design closure for a competitive IC product
  Design costs from margins; 0-1 benefits
Resilience = system product's ability to mitigate variability and reliability phenomena
  Error detection and repair mechanisms
  Alternative guardbanding mechanisms for different system abstractions: stochastic, approximate, ...
  Costs and benefits often less well-defined

Cost of Variability and Reliability
Standard vague picture: increased guardband → lost benefits of technology = no ROI
[Figure: design quality (e.g., frequency) vs. technology node; signoff with larger guardbands widens the gap labeled "lost benefits"]

Quantified Cost of Guardband [ISQED08]
Can we quantify cost of guardband?
Idea (2007-2008): study design benefit of reduced guardband
N.B.: going to the next node gives 20% speed, 20% power benefit
  10% is half a node!
E.g., 50% guardband reduction looks like:
[Figure: parameter range from Param_best (-100%) through 0% to Param_worst (100%)]
Expected impacts of guardband reduction:
  Delay reduction
  Easier optimization
  Smaller gate size
  Smaller area (A)
  Shorter wires
  Fewer defects (d: defect density): $Y_r = e^{-Ad}$
  Less cost (r: wafer radius): $N_{dies} = \frac{\pi r^2}{A} - \frac{2 \pi r}{\sqrt{2A}}$

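The yield and die-count expressions on this slide can be combined into a good-dies-per-wafer estimate. The sketch below assumes the Poisson yield model Y = exp(-A d) and the standard gross-die approximation N = pi r^2 / A - 2 pi r / sqrt(2A); the wafer size, die area, defect density and the 10% area shrink are hypothetical numbers chosen only for illustration.

```python
import math

def dies_per_wafer(r, area):
    """Gross dies on a wafer of radius r for die area `area` (consistent units)."""
    return math.pi * r ** 2 / area - 2 * math.pi * r / math.sqrt(2 * area)

def random_defect_yield(area, d):
    """Poisson yield model: Y = exp(-A * d), with d the defect density."""
    return math.exp(-area * d)

def good_dies(r, area, d):
    """Expected good dies per wafer = gross dies * yield."""
    return dies_per_wafer(r, area) * random_defect_yield(area, d)

# Hypothetical case: 300 mm wafer (r = 150 mm), 100 mm^2 die, 0.001 defects/mm^2.
r, area, d = 150.0, 100.0, 0.001
before = good_dies(r, area, d)
after = good_dies(r, 0.9 * area, d)  # die area shrinks by 10%
print(f"good dies: {before:.0f} -> {after:.0f}")
```

A smaller die helps twice: more dies fit on the wafer, and each die is less likely to contain a defect.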
Design Outcomes from Guardband Reduction
Experiments with industry chip implementation flow
[Figure: flow with inputs Technology (90nm, 65nm, 45nm), cell library guardband reduction, RC guardband reduction and RTL design (AES, JPEG, SOC1); stages Synthesis, Placement, Clock tree synthesis, Routing; then Analyze outcomes (area, wirelength, runtime, #violations, yield)]
40% guardband reduction
  Area: 13% reduction
  Dynamic power: 13% reduction
  Leakage power: 19% reduction
  Wirelength: 12% reduction
  Tool runtime (S, P&R): 28% reduction
  #Timing viols.: 100% reduction
  #Good dies per wafer (w/o process enhancement): 4% increase
    Raw die per wafer
    Parametric yield
  40nm sweet spot: 20% guardband reduction
Quantified impact of guardband → insight into cost of guardband!
Can we then answer: What is cost of {variability, reliability, resilience}?

My Group: Reduced Margin = Reduced Cost
Pessimism removal with more accurate margins
Explicit tradeoffs across various types of margin, e.g., 1mV = 5MHz
Co-optimization across engineering scopes, chip implementation phases; includes cross-layer, adaptivity / resilience, ...
[Figure: margin (ps, nm, mV, ...) related to design time, analysis accuracy, and model and product quality (rms, %, power, area, fmax, Iddq, ...)]

Reducing Cost Measuring Cost
Measuring Cost of X is difficult! ... which is why we're here
  Reliability margins are intertwined with other margins
  Tough to isolate specific costs of variability / reliability / resilience, especially in any design-agnostic way
Toward Assessing Cost of ... (work at UCSD)
  Variability
    Reducing (phantom) margins: BEOL corners, FF timing model
  Reliability
    Cost of EM guardband
    AVS-BTI-EM: cost of wrong signoff conditions
    Non-default routing rules: cost of naive enforcement of reliability margins
    Assessment of EM margin considering lifetime (throughput and performance)
  Resilience
    MinRazor: tradeoff of resilience mechanism cost vs. margin cost
    PVS: process-aware voltage scaling (design-independent, tunable monitors)

Our Usual Playing Field: SOC Implementation
[Figure: SOC implementation flow. Inputs: Technology (90nm, 65nm, 45nm, 28nm), cell library guardband reduction, RC guardband reduction, RTL designs. Stages: Synthesis, Placement, Clock tree synthesis, Routing, Signoff, Runtime. Annotated work items: FF model, BEOL corners, MinRazor, P&R stage optimization, naive EM compliance, EM-overdrive optimization, AVS-BTI-EM signoff. Outcomes: area, wirelength, runtime, #viols, yield]

Another View of the (Reliability) Playing Field
[Figure: dependency graph relating wire width, wire spacing, driver size, Jrms, temperature, activity factor, supply voltage, frequency, gate length, slew rate, load/fanout, junction resistance, |Vthp|, |Vthn| and timing slack to lifetime (MTTF). Edges are labeled with the mechanism involved (EM, TDDB, NBTI, HCI or general) and are either direct relations (if A increases then B increases) or inverse relations (if A increases then B decreases). Nodes are classified as tunable at design or runtime, tunable at design, or models / technology parameters (not tunable)]

I. Assessing Costs of Variability
Phantom margins: (1) BEOL, (2) FF model pessimism

[ICCD14]
Pessimism in Conventional BEOL Corners
[Figure: BEOL stack cross-section showing metal layers M1-M3 with thicknesses T1-T3, dielectric heights H1-H3, width W2 and spacing S2, inter-layer dielectric and inter-metal dielectric]

Corner    W        T        H
Typical   typical  typical  typical
Cbest     min      min      max
Cworst    max      max      min
RCbest    max      max      max
RCworst   min      min      min

Conventional BEOL corners (CBC)
  Skew all layers in the same direction to guardband for variability
  Too pessimistic! Impossible to have worst case on all layers
Pessimism in CBC creates false timing-critical paths
  Fixing false paths degrades design quality
  Slows down design turnaround time

A New Timing Signoff Flow

[Figure: two flowcharts. Conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → if violations remain, ECO using CBC and repeat → done. Our work: routed design → classify timing-critical paths into GTBC and GCBC → timing analysis using TBC for GTBC and using CBC for GCBC → if violations remain in either group, ECO using the corresponding corners and repeat → done]

Pessimism in Conventional BEOL Corners (CBC)
Assumption: a max (setup) path $p_j$ is safe when the delay evaluated at a given CBC is larger than nominal delay + $3\sigma_j$
  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$
  $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
A small $\alpha_j$ implies there is a large pessimism
[Figure: path delay distribution; $3\sigma_j$ is much smaller than $d_j(Y_{CBC}) - d_j(Y_{typ})$, indicating large pessimism]

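As a rough illustration of the pessimism metric on this slide, the sketch below computes alpha_j = 3 sigma_j / (d_j(Y_CBC) - d_j(Y_typ)) for one path. The delay and sigma values are hypothetical; in practice they would come from statistical and corner-based timing analysis.

```python
def pessimism_ratio(sigma_j, d_cbc, d_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ)); small values mean large pessimism."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay for a setup path")
    return 3.0 * sigma_j / delta

# Hypothetical path: 1000 ps typical delay, 1120 ps at the corner, sigma of 15 ps.
alpha = pessimism_ratio(sigma_j=15.0, d_cbc=1120.0, d_typ=1000.0)
print(f"alpha_j = {alpha:.3f}")
```

Here the corner adds 120 ps of margin where the 3-sigma variation is only 45 ps, so most of the corner margin is pessimism for this path.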
Wiring Structure in Timing-Critical Paths
Testcase:
  45nm foundry library (wire resistivity scaled by 8X)
  Netlist: NETCARD, 1mm^2, 570K standard cell instances
  9 metal layers
Extract critical paths from different PVT and BEOL corners
92% of paths have < 60% of wirelength on any single layer
  Wires on critical paths are routed on many layers
  Similar wiring structure is an outcome of design flow
[Figure: cumulative probability vs. max. wirelength ratio across all layers (%); the cumulative probability reaches 0.92 at a ratio of 60%]

Opportunities for Tightened BEOL Corners

[Figure: scatter plot of $3\sigma_j / d_j(Y_{typ}) \times 100\%$ vs. $\Delta d_j(Y_{rcw}) / d_j(Y_{typ}) \times 100\%$]
CBC can be pessimistic! Most paths have $\alpha < 0.5$
Challenge: how to avoid underestimating delay variation to preserve parametric yield?
Use tightened BEOL corners, e.g., scale BEOL variation in .itf with $\alpha = 0.5$

Scaling Factor Delay Variation @Cw,RCw
Paths with small $\Delta d_{rcw}$ and $\Delta d_{cw}$ have large $\alpha$
  E.g., there are $\alpha_j > 0.6$ when (($\Delta d_{rcw}$ < 3%) AND ($\Delta d_{cw}$ < 3%))
Identify paths for tightened BEOL corners based on $\Delta d_{rcw}$ and $\Delta d_{cw}$
[Figure: $\alpha_j$ plotted over $\Delta d(Y_{rcw}) / d(Y_{typ})$ and $\Delta d(Y_{cw}) / d(Y_{typ})$]

A Practical Filter for TBC-Amenable Paths
$G_{tbc}$ = paths which can be safely signed off using tightened corners:
  (path with $\Delta d_{cw}$ larger than $A_{cw}$) OR (path with $\Delta d_{rcw}$ larger than $A_{rcw}$)
[Figure: paths plotted by $\Delta d(Y_{cw}) / d(Y_{typ})$ and $\Delta d(Y_{rcw}) / d(Y_{typ})$, with thresholds $A_{cw}$ and $A_{rcw}$]

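The filter above can be sketched as a simple classification step. This is an illustration only: it assumes the delay increases are normalized by the typical-corner delay (as on the plot axes), and the `Path` record, the threshold values and the delays are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    d_typ: float  # delay at typical corner (ps)
    d_cw: float   # delay at Cworst corner (ps)
    d_rcw: float  # delay at RCworst corner (ps)

def is_tbc_amenable(path, a_cw, a_rcw):
    """True if the normalized delay increase exceeds either threshold."""
    delta_cw = (path.d_cw - path.d_typ) / path.d_typ
    delta_rcw = (path.d_rcw - path.d_typ) / path.d_typ
    return delta_cw > a_cw or delta_rcw > a_rcw

def classify(paths, a_cw, a_rcw):
    """Split paths into (G_tbc, G_cbc) for tightened vs. conventional corners."""
    g_tbc = [p for p in paths if is_tbc_amenable(p, a_cw, a_rcw)]
    g_cbc = [p for p in paths if not is_tbc_amenable(p, a_cw, a_rcw)]
    return g_tbc, g_cbc

# Hypothetical paths and thresholds (3% for both corners).
paths = [Path("p1", 1000.0, 1050.0, 1010.0), Path("p2", 1000.0, 1010.0, 1020.0)]
g_tbc, g_cbc = classify(paths, a_cw=0.03, a_rcw=0.03)
print([p.name for p in g_tbc], [p.name for p in g_cbc])
```

Paths outside the tightened-corner group keep the conventional corners, so their signoff is unchanged.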
Benefits of Tightened BEOL Corners
WNS and TNS are reduced by up to 100ps and 53ns
#Timing violations reduced by 24% to 100% [Moore's Law: 1% / week!]
TBC-0.6: more benefits
  Tradeoff between reduced margin vs. #paths which use TBC
[Figure: three bar charts comparing CBC, TBC-0.5, TBC-0.6 and TBC-0.7 on LEON, SUPERBLUE12 and NETCARD: #timing violations, WNS (ns) and TNS (ns)]

[ISQED14]
Flexible FF Timing Margin Recovery
Setup time, hold time and clock-to-q (c2q) delay of FF: NOT fixed values
Flexible FF timing model considering operating (function/test) modes
  Reduce pessimism in timing analysis
  Reassessment of costs of variation
Objective: Find the best setup/hold time/c2q for each FF
[Figure: fixed model (a single setup-hold-c2q point) vs. flexible model (setup-hold-c2q points with c2q_1 ... c2q_n); sequential LP optimization over the c2q-setup-hold surface, combining setup-c2q optimization and hold-c2q optimization]

Improved Timing Signoff Flow
[Figure: flow. Netlist (and SPEF, if routed) → extract path timing information → LP formulation with flexible flip-flop timing model → solve sequential LP (STA_FTmax, STA_FTmin) → solution → annotate new timing model for each flip-flop → timing signoff with annotated timing]
Takeaways
  Fix timing violations for free
  48ps average improvement of slack over 5 designs in a foundry 65nm technology
Next steps
  Study in advanced nodes
  Better exploitation of disjoint cycles/modes
  More accurate modeling of setup-hold-c2q tradeoff
  Circuit optimization exploiting FF timing model flexibility

Takeaways on Variability
Phantom margins leave O(node) value on the table
  Recovering this is essential equivalent scaling
Two examples: BEOL corners, FF timing model
  NOTE: To assess costs / benefits of new methods, need correct starting point!
Conventional BEOL corners are VERY pessimistic!
  Bottleneck for wire-dominated, high-performance circuits
Revised signoff flow + tightened BEOL corners reduces WNS, TNS and #timing violations
  Signoff methodology change underway at sponsor company
Relaxed timing closure → shortened design cycle, better PPA

II. Assessing Costs of Reliability
(1) cost of suboptimal AVS-BTI-EM signoff;
(2) cost of naive EM rule enforcement;
(3) available lifetime throughput and performance benefit from scheduling of multi-cores

[DATE13, SLIP14]
Reliability Margin vs. Adaptive Voltage Scaling
Interaction between reliability margins and AVS mechanism
  BTI aging → higher |Vth| → lower fmax → AVS used to compensate performance degradation
  Higher voltage worsens EM on wires
[Figure: signoff loop of BTI + EM. BTI loop: V_lib and V_BTI → derated libraries → circuit implementation → VDD (AVS). EM loop: VDD (AVS) → stress on wires. Insets: circuit frequency vs. time relative to the design target, with and without AVS; Vdd vs. time]

Derated Library Characterization and AVS (BTI Loop)
V_BTI = voltage for BTI aging estimation
V_lib = voltage for circuit performance estimation (library characterization)
V_BTI and V_lib are required in signoff
V_BTI and V_lib selection should consider BTI + AVS interaction
Aging and V_final are unknowns before circuit implementation
[Figure: Step 1: V_BTI → |ΔVt|. Step 2: derated library characterized at V_lib. Step 3: circuit implementation and signoff → circuit with BTI degradation and AVS → V_final (unknown in advance)]

Derated Library Characterization and AVS (BTI Loop)

V_BTI and V_lib depend on aging during AVS
No obvious guideline to define V_BTI and V_lib
Inconsistency among V_final, V_lib, V_BTI
  What is the design overhead when timing libraries are not properly characterized?
  Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?
  What is the impact of EM for different signoff corners?

Energy vs. Area Across Different Signoffs

Pessimistic signoff corner
  Overestimate aging and/or underestimate circuit performance
  Large area overhead
Optimistic signoff corner
  AVS increases supply voltage aggressively to compensate aging
  Large lifetime energy overhead
  May fail to meet timing if desired supply voltage > Vmax
[Figure: lifetime energy vs. area across signoff corners, with a knee point for area vs. lifetime energy]

AVS Impact on EM Lifetime
Assume no EM fix at signoff
BTI degradation is checked at each step and MTTF is updated as [equation not recoverable]
[Figure: lifetime (year) and V_final (V) across implementations #1-8; annotated with 30% MTTF penalty and 200mV voltage compensation]

Power Penalty of Fixing EM with AVS
Core power increases with elevated voltage
P/G power increases due to both elevated voltage and PDN degradation
Tradeoff with guardband investment at design signoff
[Figure: core power (mW) and P/G power (mW) across implementations #1-8, from least to highest invested guardband; annotated with 14% power penalty]

EM Impact on AVS Scheduling
AVS affects EM lifetime penalty
We empirically sweep AVS voltage step size to obtain the impact
  5 step sizes: S1-S5 = {8, 10, 15, 18, 20} mV
[Figure: VDD vs. year for step sizes S1-S5 (DMA, #3), and MTTF (year) per step size; annotated with 1.2 years MTTF penalty]

[DAC13]
Smarter NDRs in CTS (EM Cost Reduction)
NDRs apply wider wire widths (= costs of EM) and spacing to address EM and parasitic and delay variation for clock tree
However, a wire does NOT need to be wide if it has a small number of downstream sinks
Accurate assessment of EM margin should include clock tree topologies (e.g., #downstream sinks)
[Figure: clock tree from driver to sinks, with segments driving 4 buffers, 2 buffers and 1 buffer. Fewer downstream sinks (== less current) at leaf side in a clock tree]

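One way to picture sink-aware width selection is the toy model below. It is a sketch under strong assumptions that are not from the talk: each downstream sink draws the same current, and the EM current limit of a wire scales linearly with its width. The function name and all numbers are hypothetical.

```python
import math

def min_width_multiple(n_sinks, i_sink_ma, i_limit_ma_per_wmin, max_multiple=4):
    """Smallest width multiple M (in units of minimum width) that meets the EM limit.

    Assumes wire current = n_sinks * i_sink_ma and that a wire of width
    M * width_min may carry M * i_limit_ma_per_wmin.
    """
    needed = max(1, math.ceil(n_sinks * i_sink_ma / i_limit_ma_per_wmin))
    if needed > max_multiple:
        raise ValueError("no available rule is wide enough for this current")
    return needed

# Hypothetical currents: 0.05 mA per sink, 0.25 mA allowed per minimum width.
for sinks in (2, 8, 16):
    print(sinks, "sinks ->", min_width_multiple(sinks, 0.05, 0.25), "x min width")
```

In this toy model only segments near the root need the widest rule, which is the tapering idea behind a sink-aware NDR.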
Vicious Cycle vs. Virtuous Cycle
Excessive margin → Not just design overhead
  Vicious cycle vs. virtuous cycle
[Figure: Fixed NDR (wider wires): larger cap. → more power → more/larger buffers → more EM viol. Smart NDR (tapering; # downstream = 2 near the sinks, # downstream = 16 near the driver): smaller cap. → less power → fewer/smaller buffers → less EM viol.]
Less-naive compliance with EM rules → reduce design overhead, and avoid vicious cycle

Smart Routing NDRs: Clock Power Reduction
9.2% wire capacitance, 4.9% clock switching power reduction
Still, satisfy skew, max transition limits and EM limit
NDR {M}W{N}S = M * width_min, N * spacing_min
[Figure: capacitance and clock power reduction (%) per design (aes, eth, jpeg_enc, mc, mpeg2, tv80s, usbf, conmax, dma), showing wire cap. and clock switching power for defaults 4W5S and 2W4S; proportions of NDRs (1W8S, 2W7S, 3W6S, 4W5S) per design, 0% to 100%]

[ISQED14]
Reliability-Constrained foverdrive Selection
Reliability and system lifetime guarantees are key design considerations for multicore processors in advanced nodes
  Task scheduling determines use of cores across operating modes
  Overdrive (turbo) mode can meet performance and throughput requirements, but incurs faster MTTF degradation
Two potential failures: throughput and performance
  Can violate acceptable throughput for tasks: cores fail before all assigned tasks are completed
  Can violate minimum acceptable performance for tasks: cores operate only at lower frequencies than needed
EMOD: solves a new Maximum-Value Reliability-Constrained Overdrive Frequencies (MVRCOF) optimization (offline) problem
  When all cores not simultaneously active, adjust task scheduling on a subset of active cores for balanced wearout
  Guarantee prescribed levels of performance and lifetime throughput
  Overdrive frequencies = optimization variables; user experience = objective

Comparison vs. Previous Works
(N)RC = (Non-) Reliability Constrained
(N)LG = (No) Lifetime Guarantee
(N)PG = (No) Performance Guarantee

Work           Type
Reiss12        NRC, NLG, NPG
Karpuzcu09     RC, NLG, NPG
Mihic04        RC, LG (Dynamic power management), NPG
Rosing07       RC, LG (Dynamic power management), NPG
Rong06         RC, LG (Dynamic power management), NPG
Coskun09       RC, LG (Dynamic thermal management), NPG
Srinivasan04   RC, LG (Dynamic reliability management), NPG
Karl08         RC, LG (Dynamic reliability management), NPG
Our Work       RC, LG (Dynamic reliability management), PG

Optimal (Discretized) Solution Flow
For each core
  For each combination in which the core is active
    Choose discrete values of overdrive frequencies within a range
    Perform power and temperature simulations → one-time LUT creation
Example:
  If a system has 3 cores (Core A, B, C), the number of active cores can be 1, 2 or 3
  Core A is active
    One (out of three) combinations when 1 core is active; two (out of three) combinations when 2 cores are active; one (out of one) combination when 3 cores are active
Use exhaustive search based on LUT to find optimal overdrive frequencies that maximize the value of the objective function

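The enumeration in the three-core example, and the exhaustive search over a lookup table, can be sketched as follows. The LUT layout (a frequency tuple mapped to an objective value and a feasibility flag) and its entries are hypothetical stand-ins for the power and temperature simulation results.

```python
from itertools import combinations

def active_sets_containing(core, cores):
    """Group every set of active cores that includes `core` by set size."""
    by_size = {}
    for size in range(1, len(cores) + 1):
        by_size[size] = [s for s in combinations(cores, size) if core in s]
    return by_size

def best_overdrive(lut):
    """Scan a LUT {frequencies: (objective, feasible)} for the best feasible entry."""
    feasible = {freqs: obj for freqs, (obj, ok) in lut.items() if ok}
    return max(feasible, key=feasible.get) if feasible else None

sets_for_a = active_sets_containing("A", ["A", "B", "C"])
print({size: len(sets) for size, sets in sets_for_a.items()})

# Hypothetical LUT: overdrive frequencies (GHz) -> (objective, meets reliability).
example_lut = {
    (2.0, 2.0): (10.0, True),
    (2.4, 2.0): (12.0, True),
    (2.4, 2.4): (14.0, False),  # highest value, but violates the constraint
}
print(best_overdrive(example_lut))
```

The counts for Core A match the example: one combination with one active core, two with two, and one with three.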
Heuristic Flow

, , , , , ,

We maximize the overdrive frequency (f_OD,m) in the order of the set of active cores for which the product of weights (w_nom,m, w_OD,m) and execution times (E_nom,m, E_OD,m) is maximum
Example:
  If a system has 3 cores, the number of active cores can be 1, 2 or 3
If , , , , , , ,we
maximize , , , , and ,

Empirically, finds large improvements in objective function value

Testcases
Testcases are described by
  m: #active cores
  E_nom,m, E_OD,m: nominal and overdrive execution times
  w_nom,m, w_OD,m: nominal and overdrive user-defined weights
Eight testcases in total
  Format is Testcase #
  Seven have optimal solutions
  One does not have feasible solution
Example

Name   m      E_nom,m (Kh)   E_OD,m (Kh)   w_nom,m    w_OD,m
4-I    1, 2   1, 2           3, 5          0.5, 0.3   0.5, 0.7
       3, 4   3, 2           8, 5          0.2, 0.4   0.8, 0.6

Optimal, Heuristic vs. RC-LG (Baseline)
[Figure: objective function value (0 to 45000) of Optimal, Heuristic and Baseline solutions for testcases 4-I, 4-II, 4-III, 4-IV, 4-V, 6-I and 8-I; annotated differences of -12%, -9%, -3.3% and -17.4%]
Optimal solution improves objective function value by up to 17.4%

Takeaways on Reliability
Signoff methodology can have huge impact
  Example: Chicken-and-egg loops among BTI, EM, and signoff corner selection in AVS-enabled systems
  AVS = new dimension in reliability vs. design cost (power / area) tradeoff space
Naive enforcement of reliability rules can be costly
Post-IC implementation, reliability awareness at scheduler level improves lifetime user experience and guaranteed performance
Basic challenges remain:
  (i) reliability modeling and calibration
  (ii) measuring reliability cost ({PPA}) with, without reliability margins
  (iii) many "don't turn over" rocks: barriers to reliability cost reduction

III. Assessing Costs of Resilience
(1) MinRazor

[GLSVLSI14]
How to Minimize Cost of Resilience?
Additional circuits → area and power penalties
Recovery from errors → throughput degradation
Large hold margin → short-path padding cost
Want benefits (e.g., energy) to maximally outweigh costs
MinRazor: Minimum-Cost Resilient Design Implementation

                   Razor          Razor-Lite    TIMBER
Power penalty      30% [Das08]    ~0% [Kim13]   100% [Choudhury09]
Area penalty       182% [Kim13]   33% [Kim13]   255% [Chen13]
#recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]

[Figure: Razor, Razor-Lite and TIMBER circuits]

Tradeoff: Resilience Cost vs. Datapath Cost

Tradeoff: #Razor FFs (resilience cost) vs. power/area of fanin circuits
We seek to minimize total energy via this tradeoff
[Figure: energy (mJ) vs. #Razor FFs (300, 100, 50, 0), showing total energy, energy of non-resilient part, and resilience cost]

Selective-Endpoint Optimization (SEOpt)
Optimize fanin cone of an endpoint w/ tighter constraints
  Allows replacement of Razor FF w/ normal FF
Pick endpoints based on heuristic sensitivity functions
  Vary #endpoints → compare area/power penalty
Candidate sensitivity functions 1 to 5: [formulas not recoverable], defined over
  p: negative-slack endpoint
  c: cells within fanin cone
  Numcri: number of negative-slack cells

Clock Skew Optimization (SkewOpt)
Increase slacks on timing-critical and/or frequently exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight
   → adjust clock latencies
   → contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
[Figure: sequential graph over FF1, FF2 and FF3 with edge weights W12, W23 and W31, showing data paths and the clock tree; W' = average weight on cycle. The edge weight of path p-q is defined from its setup slack, its toggle rate and a weighting factor]

Overall Optimization Flow
Iteratively optimize with SEOpt and SkewOpt
[Figure: flow. Initial placement (all FFs = error-tolerant FFs) → OR-tree insertion → SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs → SkewOpt: activity-aware clock skew optimization → energy < min energy? → save current solution → iterate]

Benefit of Low-Cost Resilience
Proposed method (CO) minimizes cost of resilience in terms of energy
Reference flows
  Pure-margin (PM): conventional method w/ only margin insertion
  Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
Cost of pure margin insertion = up to 21% energy overhead
Cost of resilience w/ poor design method = up to 10% energy overhead
Cost increases with larger process variation
[Figure: energy (mJ) of PM, BF and CO for designs EXU and MUL at small, medium and large margins, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin = 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

Increased Benefit of Resilience with AVS
Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
  Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Proposed optimization leads to further reduced resilience cost in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt designs, with the minimum achievable energy marked; MUL over 0.70 V to 0.80 V and EXU over 0.86 V to 1.02 V]
Technology: foundry 28nm

Optimization of TIMBER-Based Designs
TIMBER FFs use time borrowing to mask timing errors
Additional constraints to select endpoints as TIMBER FFs
  (1) No loop of TIMBER FFs
  (2) No chained TIMBER FFs with more than two stages (assume two error-detection intervals)
  Require additional timing slacks on fanout paths to mitigate timing errors
As compared to the solution of the proposed flow (CO)
  Cost of pure margin insertion = 23% energy overhead
  Cost of resilience w/ poor design method = 7% energy overhead
[Figure: energy (mJ) of PM, BF and CO, broken down into energy w/o resilience and energy penalty of additional circuits]
Design: ARM M0
Technology: foundry 40nm
ED interval = 10% of clock period

Recent: Iterative Opt for Conventional Designs
Cost of resilience = area / power overheads, design difficulties
Can we achieve similar benefits without resilient circuits, but following the same spirit of optimization for resilient designs?
Optimization flow
  I. Relax timing constraints on all paths to be original clock period + relaxed margin
  II. Calculate sensitivity function of each endpoint with respect to original clock period (SF = sum of |slack * power| of negative-slack cells in the fanin cone)
  III. Based on SF (sorted in increasing order), select top 10% endpoints to recover to original clock period (i.e., perform timing optimization with updated SDC file)
  IV. Iterate Steps II and III 10 times
Design: ARM M0 at foundry 40nm (clock period = 6ns, relaxed margin = 300ps)
Optimization shows 16% power reduction
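Steps II and III can be sketched as below. This is one reading of the slide, not the authors' implementation: "top 10% in increasing order" is taken to mean the endpoints with the smallest SF, and the fanin-cone data (slack and power per cell) is hypothetical.

```python
import math

def sensitivity(cells):
    """SF = sum of |slack * power| over negative-slack cells in a fanin cone."""
    return sum(abs(slack * power) for slack, power in cells if slack < 0)

def select_endpoints(fanin_cones, fraction=0.10):
    """Rank endpoints by increasing SF and return the first `fraction` of them."""
    ranked = sorted(fanin_cones, key=lambda ep: sensitivity(fanin_cones[ep]))
    count = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:count]

# Hypothetical cones: endpoint name -> list of (slack in ns, power in mW) per cell.
cones = {f"ep{i}": [(-0.1 * (i + 1), 1.0), (0.2, 5.0)] for i in range(10)}
print(select_endpoints(cones))
```

In the full flow the selected endpoints would be constrained back to the original clock period, timing would be re-optimized, and SF would be recomputed before the next iteration.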

Iteration   Power (mW)   Endpoints w/ violation   Area (um^2)
1           2.145        452                      131340
2           2.669        339                      131089
3           2.703        264                      130563
4           2.371        215                      130486
5           2.329        139                      130880
6           2.373        89                       131415
7           2.446        31                       130712
8           2.452        0                        131011
PM          2.934        0                        131319

[Figure: total power (mW) vs. #endpoints with timing violation (500 down to 0) for the optimized design (Opt), compared with PM]
All power values are reported at clock period = 6ns

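The reported power reduction can be checked against the table: the comparison is between the pure-margin (PM) row and iteration 8, the first iteration with no violating endpoints. A small worked calculation using only values from the table:

```python
pm_power_mw, pm_area_um2 = 2.934, 131319    # PM row
opt_power_mw, opt_area_um2 = 2.452, 131011  # iteration 8 (0 violating endpoints)

power_reduction = (pm_power_mw - opt_power_mw) / pm_power_mw
area_change = (opt_area_um2 - pm_area_um2) / pm_area_um2

print(f"power reduction: {power_reduction:.1%}")
print(f"area change: {area_change:+.2%}")
```

The power saving comes at essentially unchanged area.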
A Different Slack Distribution
Design optimized with the new iterative optimization flow has more balanced slack distribution
  More timing paths with small slacks → exploit additional timing slacks for power reduction
Reopens the question: How to best trade timing slacks for power reduction in IC implementation / performance closure
[Figure: slack distributions of the optimized design and the PM design]

Takeaways on Resilience
Cost of resilience strongly depends on ability to mix resilient and non-resilient circuits
  Up to 21% and 10% energy overheads respectively for cost of margin insertion and resilience (with poor design method)
Careful reduction of resilience cost can improve resilient design value proposition in the AVS context
Yet again: hard to obtain correct starting point for benefit / cost assessment!
Basic challenges remain:
  (i) measuring cost of resilience at software level
  (ii) unpredictable dependencies on design, implementation and operating scenarios
  (iii) missing formulations of resilience as optimizable objectives for design tools

In Closing
Ground-up (crawl before walk, walk before run) approach is still stuck at ground level (in my group)
  Basics of techno models, reliability models, margins, signoff criteria, implementation flows, design testcases, workload models, narrow windows of opportunity, ..., optimization problem statements, ... still way too fuzzy for our tastes (!)
How should we assess the cost of {reliability, resilience}?
  Is it even possible in a general, non-artifactual way?
  Can we taxonomize and avoid pitfalls seen in previous works?
Targets for next / new research?
  Missing theorems? Missing links? Missing infrastructure?
  Missing models and data? Missing problem statements?
  = to be identified during this colloquium!?!?

THANK YOU !
