
NOLO: A No-Loop, Predictive Useful Skew Methodology For Improved Timing in IC Implementation

The document describes several methodologies for improving timing in integrated circuit implementation through useful skew optimization. It proposes a No-Loop (NOLO) predictive useful skew methodology that determines the clock latency of each sink in one pass, without iteration. The aim is to avoid the long turnaround time of typical back-annotation flows, which iteratively back-annotate post-placement skew to synthesis. Experimental results on four designs are presented to validate that optimization based on an LVT-only netlist remains effective after full multi-Vt placement and routing.


NOLO: A No-Loop, Predictive Useful Skew Methodology for Improved Timing in IC Implementation

Tuck-Boon Chan, Andrew B. Kahng, Jiajia Li

VLSI CAD Laboratory, UC San Diego

Outline
■ Background and Motivation
■ Problem Statement
■ Our Methodologies
■ Experimental Setup and Results
■ Conclusion

Typical Useful Skew Flow
■ Useful skew adjusts clock sink latencies to improve performance and/or timing robustness of IC designs
➢ Clock period = 10
➢ Min. slack with zero skew = 0
[Figure: flip-flops FF1, FF2, FF3 with clock tree latencies 5, 5, 5; data paths labeled delay/slack = 10/0, 7/3, 7/3]
Typical Useful Skew Flow
■ Useful skew adjusts clock sink latencies to improve performance and/or robustness of IC designs
➢ Clock period = 10
➢ Min. slack with useful skew = 2
[Figure: flip-flops FF1, FF2, FF3 with clock tree latencies 7, 6, 5; data paths labeled delay/slack = 10/2, 7/2, 7/2]
[Flow diagram, "Typical useful skew flow": RTL netlist → Synthesis → Placement/Place Opt. → CTS/CTS Opt. → Routing/Route Opt., with Skew Opt. alongside CTS/CTS Opt.]
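The numbers on the two slides above can be checked with a few lines of Python. This is a minimal sketch, assuming the usual setup-slack relation (slack = clock period + capture latency - launch latency - path delay) and assuming the three data paths form the loop FF1 → FF2 → FF3 → FF1. The loop topology is inferred from the slack values, not stated on the slides.

```python
CLOCK_PERIOD = 10

# (launch flip-flop, capture flip-flop, data path delay).
# ASSUMPTION: this loop topology is inferred from the slide's slack values.
PATHS = [("FF3", "FF1", 10), ("FF1", "FF2", 7), ("FF2", "FF3", 7)]


def setup_slacks(latency):
    """Setup slack of every path for a given clock latency per flip-flop."""
    return [
        CLOCK_PERIOD + latency[capture] - latency[launch] - delay
        for launch, capture, delay in PATHS
    ]


zero_skew = {"FF1": 5, "FF2": 5, "FF3": 5}
useful_skew = {"FF1": 7, "FF2": 6, "FF3": 5}

print(min(setup_slacks(zero_skew)))    # 0
print(min(setup_slacks(useful_skew)))  # 2
```

Under these assumptions the latencies 7, 6, 5 spread the loop's total slack of 6 evenly, so every path ends with slack 2.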
“Chicken-and-Egg” Problem
■ Typical useful skew flow synthesizes and places designs with zero skew
⇒ Benefit of useful skew is limited
[Flow diagram: RTL netlist → Synthesis → Placement/Place Opt. → CTS/CTS Opt. → Routing/Route Opt.; synthesis and placement assume zero skew, useful skew (Skew Opt.) is applied at CTS and routing]
Back-Annotation Flow
■ Iteratively back-annotates post-placement useful skew to synthesis
⇒ Accounts for interactions among synthesis, placement and useful skew optimization
■ Issue: unacceptably large turnaround time
■ Our goal = predictive, one-pass (no-loop) flow
[Flow diagram: RTL netlist → Synthesis → Placement/Place Opt. → CTS/CTS Opt. → Routing/Route Opt., with Useful Skew fed back from placement to synthesis in a loop]


NOLO (No-Loop) Useful Skew Optimization Problem
Given: a netlist and timing constraints
Determine: clock latency for each sink (= flip-flop), using a one-pass implementation flow
Objective: minimize total negative slack (TNS)

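For readers unfamiliar with the objective, here is a minimal sketch of how TNS is conventionally computed: the sum of all negative endpoint slacks, which is zero when every endpoint meets timing. The function names and the slack values in the usage line are illustrative only and do not come from the slides.

```python
def total_negative_slack(endpoint_slacks):
    """Sum of the negative slacks; 0 when every endpoint meets timing."""
    return sum(s for s in endpoint_slacks if s < 0)


def worst_negative_slack(endpoint_slacks):
    """Most negative slack, clamped to 0 when nothing violates."""
    return min(0, min(endpoint_slacks))


# Illustrative slack values in ps (hypothetical, not measured data).
slacks_ps = [-30, 10, -5, 0]
print(total_negative_slack(slacks_ps))  # -35
print(worst_negative_slack(slacks_ps))  # -30
```

Minimizing TNS here means driving this sum toward zero, which rewards fixing many small violations as well as the single worst one.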
Previous Useful Skew Optimizations
Maximize minimum slack in a circuit
■ [Fishburn90] formulates a linear program (LP) to optimize clock latencies
■ [Szymanski92] improves the efficiency of the LP by selectively generating constraints
■ [Wang04] proposes an LP-based approach to evaluate potential slacks and optimize clock skew

Maximize all slacks in a circuit
■ [Albrecht02] formulates useful skew optimization as a maximum mean weight cycle (MMWC) problem
⇒ optimizes using a graph-based method
MMWC-Based Skew Optimization
1. Construct sequential graph (vertex = flip-flop, edge = max-/min-delay path, edge weight = setup/hold slack)
2. Iteratively find critical loop → optimize slacks → contract critical loop into one vertex → update adjacent edges → optimize the rest
[Figure: sequential graph on flip-flops A, B, C, D, E in three panels (initial graph, after 1st iteration, after 2nd iteration); edges labeled delay/slack, vertices labeled clock latency; clock period = 20. All latencies start at 0; after the 1st iteration B = 6 and C = 4; after the 2nd iteration D = 8 and E = 2.]
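The first iteration of step 2 can be sketched in Python on the slide's example. This is a hedged illustration, not the authors' implementation. The edge directions below are an assumption reconstructed from the delays, slacks and latencies in the figure. The critical loop is taken to be the loop with the smallest mean setup slack. Brute-force cycle enumeration stands in for a real MMWC algorithm and only suits a graph this small.

```python
from itertools import permutations

CLOCK_PERIOD = 20

# (launch, capture) -> max path delay.
# ASSUMPTION: edge directions are inferred from the figure's numbers.
DELAYS = {
    ("A", "B"): 20, ("B", "C"): 12, ("C", "A"): 10,
    ("B", "D"): 10, ("D", "E"): 2, ("E", "C"): 10,
}


def setup_slack(edge, latency):
    launch, capture = edge
    return CLOCK_PERIOD + latency[capture] - latency[launch] - DELAYS[edge]


def simple_cycles():
    """Brute-force cycle enumeration; only sensible for tiny graphs."""
    vertices = sorted({v for edge in DELAYS for v in edge})
    cycles = []
    for size in range(2, len(vertices) + 1):
        for order in permutations(vertices, size):
            if order[0] != min(order):
                continue  # keep a single rotation of each cycle
            cycle = [(order[i], order[(i + 1) % size]) for i in range(size)]
            if all(edge in DELAYS for edge in cycle):
                cycles.append(cycle)
    return cycles


def mean_slack(cycle, latency):
    return sum(setup_slack(edge, latency) for edge in cycle) / len(cycle)


def critical_loop(latency):
    return min(simple_cycles(), key=lambda cycle: mean_slack(cycle, latency))


def balance(cycle, latency):
    """Shift latencies so every edge of the cycle gets the cycle's mean slack."""
    target = mean_slack(cycle, latency)
    new_latency = dict(latency)
    for launch, capture in cycle[:-1]:  # the first vertex keeps its latency
        new_latency[capture] = (new_latency[launch] + target
                                + DELAYS[(launch, capture)] - CLOCK_PERIOD)
    return new_latency


zero = {v: 0 for v in "ABCDE"}
loop = critical_loop(zero)
after_first = balance(loop, zero)
print(loop)
print(after_first)
```

With these assumed edges the critical loop is A → B → C → A with mean slack 6, and balancing it gives B = 6 and C = 4, matching the "after 1st iteration" panel. Contracting the loop and repeating on the rest of the graph is omitted here.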
Simple Predictive Flow
1. Timing analysis at post-synthesis stage
2. Perform useful skew optimization
   Maximize ∑ setup slacks
   Subject to hold constraints
3. Apply resulting useful skew (clock latencies) during following implementation stages
[Flow diagram: RTL netlist → Synthesis → Predictive Useful Skew → Placement/Place Opt. → CTS/CTS Opt. → Routing/Route Opt.]

Impact of Early Optimization
■ Post-synthesis useful skew optimization (simple predictive)
⇒ Improved clock skew relaxes timing constraints
⇒ Correlation between post-synthesis & post-routing slacks↑

[Figure: two panels, "With useful skew" (0ps to 150ps) and "Without useful skew" (0ps to 250ps)]
➢ Post-routing critical paths correspond to paths with 0-150ps (0-250ps) slacks with (without) useful skew
Key Observation
■ Will the optimization at post-synthesis stage
still be valid at post-routing stage? - Yes
■ Recall: Improved correlation between post-
synthesis and post-routing slacks
■ Expect: Post-synthesis optimization leads to similar
timing improvement as post-routing optimization

[Diagram: useful skew applied after synthesis (before P&R) compared with useful skew applied after P&R]
Improved Predictive Flow
■ Solution quality of predictive optimization is affected by
timing optimizations during P&R (e.g., Vt-swapping)
⇒ Predict useful skew based on LVT-only netlist
■ LVT-only synthesis ⇒ estimation of achievable slacks
■ We use setup slacks from LVT-only case and hold slacks from multi-Vt case
[Flow diagram: RTL netlist → Synthesis w/ Multi-Vt and Synthesis w/ LVT (producing an LVT-only netlist) → Predictive Useful Skew → Placement/Place Opt. → CTS/CTS Opt. → Routing/Route Opt.]
Experimental Setup
■ Designs
Design         Clk period (ns)   #Cells   #Flip-flops   #Paths
aes_cipher     0.6               ~23K     530           16251
des_perf       0.5               ~11K     1985          23153
jpeg_encoder   0.6               ~50K     4712          137333
mpeg2          0.4               ~11K     3381          95490
■ Technology: 28nm FDSOI, dual-Vt {SVT, LVT}
■ Signoff corners: {125°C, 0.9V, SS} and {-40°C, 1.05V, FF}
■ Tools
– Synthesis: Synopsys Design Compiler vH-2013.03-SP3
– P&R: Synopsys IC Compiler vH-2013.06-SP2
■ Tool “denoising”: execute three separate runs with small perturbation of clock period (-1ps, 0ps, +1ps), take best outcome

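The "denoising" step can be sketched as a small wrapper around a flow run. This is a hedged illustration: `run_flow` is a hypothetical callable standing in for a full synthesis and P&R run, and "best outcome" is assumed to mean the TNS closest to zero, which the slide does not state. The TNS values in the usage example are made up.

```python
def denoised_run(run_flow, clock_period_ps, deltas_ps=(-1, 0, 1)):
    """Run the flow once per perturbed clock period and keep the best run.

    `run_flow` is a hypothetical callable that takes a clock period in ps
    and returns the resulting TNS in ps (a value <= 0).
    """
    outcomes = {d: run_flow(clock_period_ps + d) for d in deltas_ps}
    best_delta = max(outcomes, key=outcomes.get)  # TNS closest to zero
    return best_delta, outcomes[best_delta]


# Made-up TNS values standing in for three real tool runs.
fake_tns_ps = {599: -120.0, 600: -95.0, 601: -140.0}
print(denoised_run(fake_tns_ps.get, 600))  # (0, -95.0)
```

The wrapper returns both the chosen perturbation and its TNS so that the selected run can be traced back.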
Comparison Among Flows
■ Variants of back-annotation flows
Flow     Back-annotate from   Back-annotate to
BA-W     Post-placement       Pre-synthesis
BA-I     Post-placement       Pre-placement
BA-II    Post-routing         Pre-synthesis
BA-III   Post-routing         Pre-placement
BA-IV    Post-routing         Pre-CTS
■ SimPred = simple prediction flow
■ ImpPred = improved prediction flow

Experimental Results
■ Predictive flow (ImpPred) achieves similar / better timing,
with much less runtime, compared to the average of back-
annotation flow variants (BA avg)
■ Different back-annotation flows → timing quality varies
⇒ Cannot completely resolve the “chicken-and-egg” problem
[Figure: four panels (aes_cipher, des_perf, jpeg_encoder, mpeg2) comparing the flows by TNS and runtime; axis annotations: "Smaller TNS", "Less runtime"]
Conclusion
■ NOLO = a no-loop predictive useful skew
optimization flow
■ Improved prediction of potential slack using LVT-only
netlist
■ Similar or better timing, with much less runtime
compared to back-annotation flows
■ Back-annotation flow cannot completely resolve the
“chicken-and-egg” problem
■ Future Work
– Analyze and apply useful skew across multiple PVT corners
– Study tradeoff among area, power and timing of useful
skew optimization

Acknowledgments
■ Work supported by Qualcomm, Samsung, NSF, SRC, the IMPACT (UC Discovery) and IMPACT+ centers

Thank You!
Backup Slides
Zero-skew flow
[Flow diagram: RTL netlist → Synthesis → Placement/Place Opt. → CTS/CTS Opt. → Routing/Route Opt.]