Techniques To Reduce Timing Violations Using Clock Tree Optimizations in Synopsys ICC2
semiwiki.com
1 of 10 31-08-2020, 11:20
Techniques to Reduce Timing Violations using Clock Tree Optimizations... about:reader?url=https://semiwiki.com/semiconductor-services/290148-t...
Clock Tree Synthesis (CTS) is the process of distributing the clock
evenly to all sequential elements in a design while avoiding clock
tree design rule violations (DRVs) such as max transition, max
capacitance and max fanout violations, balancing the skew and
minimizing insertion delay.
There are many types of clock structures, namely H-tree, X-tree,
conventional clock tree, multi-source clock tree and mesh tree. In
this article, we focus on clock tree optimization of a mesh clock
tree.
Mesh Tree Structure
A mesh tree has clock nets in a grid pattern that are driven by clock
inverters and buffers. This structure gives lower skew, latency and
on-chip variation (OCV) than other clock structures. The network of
inverters and buffers from the clock port to the clock mesh drivers
is known as the pre-mesh clock structure. An example of a clock mesh
tree is shown in figure 1 below.
A mesh tree structure has high power consumption and requires a lot
of routing resources, because a whole metal layer is consumed by the
clock structure. The mesh is generally created on the top layer to
take advantage of the lower metal resistance there and to save the
lower-layer routing resources for signal nets. A design can consist
of one mesh tree or multiple mesh trees.
Mesh terminals are created at a particular pitch in the X and Y
directions, chosen through experiments. The first step is to create
the mesh terminals, as shown in figure 2. Next comes clock tree
synthesis, where skew groups are created according to the flop
distribution in the design. First-level routing is done from the mesh
terminal to the first buffer to reserve routing resources for the
first-level clock nets. An inverter is connected to the clock gating
cell. Then the network of clock inverters and buffers is built up to
the clock sinks, as shown in figure 2.
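The terminal-pitch and skew-group steps above can be pictured with a small Python sketch. It assumes a rectangular die, a uniform terminal pitch starting at the origin, and that each flop is grouped with its nearest terminal by Manhattan distance. The function names and this grouping rule are hypothetical illustrations, not how ICC2 actually builds skew groups.

```python
from collections import defaultdict


def mesh_terminals(width, height, pitch_x, pitch_y):
    """Return (x, y) mesh terminal locations on a regular X/Y pitch.

    Hypothetical helper: terminals start at the origin and repeat every
    pitch_x / pitch_y up to the die width / height.
    """
    xs = [i * pitch_x for i in range(int(width // pitch_x) + 1)]
    ys = [j * pitch_y for j in range(int(height // pitch_y) + 1)]
    return [(x, y) for x in xs for y in ys]


def group_flops_by_terminal(flops, terminals):
    """Assign each flop to its nearest terminal (Manhattan distance).

    A simplified stand-in for location-based skew groups; on a tie,
    the first terminal in the list wins.
    """
    groups = defaultdict(list)
    for name, (fx, fy) in flops.items():
        nearest = min(terminals, key=lambda t: abs(t[0] - fx) + abs(t[1] - fy))
        groups[nearest].append(name)
    return dict(groups)


if __name__ == "__main__":
    terms = mesh_terminals(100, 100, 50, 50)
    flops = {"ff1": (10, 12), "ff2": (48, 52), "ff3": (90, 95)}
    print(len(terms), group_flops_by_terminal(flops, terms))
```

In a real flow the pitch is tuned experimentally, as the text says; the sketch only shows how a pitch turns into a terminal grid and how sinks can be partitioned around it.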
These clock gating cells are cloned according to the number of fanout
sink points. In first-level cloning, the tool looks at the sink points
and checks whether the fanout exceeds a certain limit. If it does, the
clock gaters are cloned again according to the design rule violation
(DRV) checks (max fanout, max capacitance and max transition). After
cloning, clock tree synthesis is executed, followed by clock_opt,
which performs timing, power and area optimizations.
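The fanout-driven cloning step can be sketched as follows. This is a hypothetical model: it checks only the max-fanout limit (a real flow also checks max capacitance and max transition), and it deals sinks round-robin so that clone loads differ by at most one. `clone_clock_gater` and the `_clone` naming are illustrative, not ICC2 behavior.

```python
import math


def clone_clock_gater(gater, sinks, max_fanout):
    """Split a clock gater's sinks across enough clones to meet max_fanout.

    Hypothetical model: only the max-fanout DRV is checked. The number of
    clones is ceil(len(sinks) / max_fanout), and sinks are dealt
    round-robin so every clone drives at most max_fanout sinks.
    """
    if max_fanout <= 0:
        raise ValueError("max_fanout must be positive")
    n_clones = max(1, math.ceil(len(sinks) / max_fanout))
    clones = {}
    for i in range(n_clones):
        # The original gater keeps its name; extra copies get a suffix.
        name = gater if i == 0 else f"{gater}_clone{i}"
        clones[name] = sinks[i::n_clones]
    return clones


if __name__ == "__main__":
    sinks = [f"ff{i}" for i in range(70)]
    for name, load in clone_clock_gater("icg_0", sinks, 32).items():
        print(name, len(load))
```

With 70 sinks and a limit of 32, the sketch produces three clones carrying 24, 23 and 23 sinks. A placement-aware tool would group sinks by location rather than round-robin.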
2 of 10 31-08-2020, 11:20
Techniques to Reduce Timing Violations using Clock Tree Optimizations... about:reader?url=https://semiwiki.com/semiconductor-services/290148-t...
3 of 10 31-08-2020, 11:20
Techniques to Reduce Timing Violations using Clock Tree Optimizations... about:reader?url=https://semiwiki.com/semiconductor-services/290148-t...
Experiments
1) Enabling Global routing for timing and skew optimization.
Default : set_app_options -name cts.compile.enable_global_route -value false
Exp1 : set_app_options -name cts.compile.enable_global_route -value true
When set to true, this option enables the global router in the
initial stage of clock tree synthesis. By default it is false, and a
virtual router is used instead of the global router during initial
synthesis.
The virtual router is used at the pre-optimization stage for fast
prediction of the wire pattern. It does not perform layer assignment
and does not consider whether there are enough routing resources.
Global routing is the first step of the actual wire implementation
and tries to avoid global congestion. It takes longer to optimize but
gives accurate timing results.
So the advantage of the global router is more accurate timing, with
optimization based on an estimate of the routability and congestion
in the design.
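To see why a quick wire estimate is fast but optimistic, the sketch below estimates a net's length from the half-perimeter wirelength (HPWL) of its pins. HPWL is used here as an assumption; it is not necessarily what ICC2's virtual router computes, but like the virtual router it has no layer assignment and no routing-capacity model, so congestion detours that a global router would take are invisible to it. The routed length passed to `detour_ratio` is a hypothetical input.

```python
def hpwl(pins):
    """Half-perimeter wirelength of a net's bounding box.

    pins: iterable of (x, y) pin locations in microns. The estimate has
    no notion of metal layers or routing capacity, so it cannot account
    for detours around congested regions.
    """
    pins = list(pins)
    if not pins:
        return 0.0
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detour_ratio(routed_length, pins):
    """Ratio of a routed length to the HPWL estimate (above 1 means detour)."""
    estimate = hpwl(pins)
    return routed_length / estimate if estimate else float("inf")


if __name__ == "__main__":
    net = [(0, 0), (10, 5), (4, 20)]
    print(hpwl(net), detour_ratio(36, net))
```

A ratio well above 1 on clock nets is the kind of discrepancy that makes virtual-route-based timing optimistic compared with global-route-based timing.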
Results                   Default                      Using switch
Setup slack               -46.1ps                      9ps
Launch path latency       247.7ps                      222.3ps
Capture path latency      172.5ps                      191.02ps
Skew                      75.2ps                       31.3ps
CK capture path BUF/INV   Buff: X8, X24, X8, X4, X8    Buff: X8, X32, X8, X12, X12
CK launch path BUF/INV    Buff: X8, X8, X12, X12       Buff: X8, X32, X4, X8, X12
CKBUF count               6841                         7273
CKINV count               844                          864
CKBUF power               42.2mW                       42.9mW
CKINV power               4.08mW                       4.11mW
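The Skew row in these tables equals the launch path latency minus the capture path latency, which makes the numbers easy to cross-check. The short Python check below uses the values from the table above; the helper and variable names are only for illustration.

```python
def skew_ps(launch_latency_ps, capture_latency_ps):
    """Skew as reported in the tables: launch latency minus capture latency."""
    return launch_latency_ps - capture_latency_ps


# Default run vs. run with cts.compile.enable_global_route true (table above).
default = {"slack": -46.1, "launch": 247.7, "capture": 172.5, "ckbuf": 6841}
global_route = {"slack": 9.0, "launch": 222.3, "capture": 191.02, "ckbuf": 7273}

for name, run in (("default", default), ("global route", global_route)):
    print(name, "skew =", round(skew_ps(run["launch"], run["capture"]), 1), "ps")

slack_gain = round(global_route["slack"] - default["slack"], 1)
buf_increase_pct = round(100 * (global_route["ckbuf"] / default["ckbuf"] - 1), 1)
print("setup slack improved by", slack_gain, "ps")
print("clock buffer count grew by", buf_increase_pct, "%")
```

The check reproduces the 75.2ps and 31.3ps skews and shows the 55.1ps setup slack improvement came with roughly 6.3% more clock buffers.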
Because global routing was enabled during clock tree synthesis, the
synthesis was based on the actual wire implementation. Launch path
latency and skew decreased, while capture path latency increased. And we got a margin of
3) Applying NDR
Default : set_app_options -name clock_opt.flow.optimize_ndr -value false
Exp : set_app_options -name clock_opt.flow.optimize_ndr -value true
The tool applies non-default routing (NDR) rules to long,
timing-critical nets during clock_opt optimization to improve timing.
Applying an NDR to a timing-critical net increases its width, which
decreases the net's resistance and therefore its net delay.
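The width/resistance argument can be made concrete with an Elmore (0.5·R·C) estimate for a uniform wire. All technology numbers below are hypothetical, chosen only to show the trend: doubling the width halves R, while C grows only slightly because the fringe capacitance does not scale with width.

```python
def elmore_wire_delay_ps(length_um, width_um, r_sheet_ohm_per_sq,
                         c_area_ff_per_um2, c_fringe_ff_per_um):
    """Elmore delay (0.5 * R * C) of a uniform distributed-RC wire.

    R = r_sheet * L / W, so a wider wire has lower resistance.
    C = (c_area * W + c_fringe) * L grows only slightly with width,
    because the fringe term does not depend on W.
    1 ohm * 1 fF = 1e-15 s = 1e-3 ps.
    """
    r_ohm = r_sheet_ohm_per_sq * length_um / width_um
    c_ff = (c_area_ff_per_um2 * width_um + c_fringe_ff_per_um) * length_um
    return 0.5 * r_ohm * c_ff * 1e-3


# Hypothetical technology numbers, chosen only to illustrate the trend.
params = dict(length_um=500, r_sheet_ohm_per_sq=0.02,
              c_area_ff_per_um2=0.04, c_fringe_ff_per_um=0.08)
default_rule = elmore_wire_delay_ps(width_um=0.1, **params)
ndr_2w = elmore_wire_delay_ps(width_um=0.2, **params)
print(f"default width: {default_rule:.2f} ps, 2W NDR: {ndr_2w:.2f} ps")
```

With these assumed values the delay drops from about 2.1ps to 1.1ps. The slightly higher capacitance of the wider wire is also extra load on its driver.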
Results                   Default                      Switch
Setup slack               -46.1ps                      -21ps
Launch path latency       247.7ps                      228.5ps
Capture path latency      172.5ps                      175.8ps
Skew                      75.2ps                       52.7ps
CK capture path BUF/INV   Buff: X8, X24, X8, X4, X8    Buff: X8, X24, X32, X20
CK launch path BUF/INV    Buff: X8, X8, X12, X12       Buff: X8, X32, X24, X8
CKBUF count               6841                         6912
CKINV count               844                          854
CKBUF power               42.2mW                       42.5mW
CKINV power               4.08mW                       4.16mW
From the above table, WNS is -46.1ps in the default experiment and
-21ps with NDR optimization. Launch path latency is lower than in the
default experiment because NDR is applied on timing-critical nets,
which decreases their net delays. However, the total clock buffer
count, inverter count and power consumption increased. Power
increased because, even with NDR applied on the timing-critical nets,
setup slack is still negative (although better than in the default
experiment), so there was no margin available for power optimization.
4) Enabling Area Recovery
set_app_options -name clock_opt.flow.enable_clock_power_recovery -value area
This option turns on power recovery in clock_opt optimization. The
valid values are auto, none, power and area. By default, it is auto
when the CCD flow is enabled; in a non-CCD flow, auto means none.
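The value rules described above can be captured in a tiny helper, for example to sanity-check a run script before launching clock_opt. The valid values and the "auto means none outside CCD" rule come from the text; the function itself is hypothetical, and it returns "auto" unchanged in a CCD flow because the text does not say what auto resolves to there.

```python
VALID_RECOVERY_VALUES = ("auto", "none", "power", "area")


def effective_clock_power_recovery(value, ccd_enabled):
    """Resolve clock_opt.flow.enable_clock_power_recovery per the text above.

    In a non-CCD flow, 'auto' behaves like 'none'. In a CCD flow, 'auto'
    is returned unchanged since its exact effect there is not specified.
    """
    if value not in VALID_RECOVERY_VALUES:
        raise ValueError(f"invalid value {value!r}; expected one of {VALID_RECOVERY_VALUES}")
    if value == "auto" and not ccd_enabled:
        return "none"
    return value


if __name__ == "__main__":
    print(effective_clock_power_recovery("auto", ccd_enabled=False))
    print(effective_clock_power_recovery("area", ccd_enabled=False))
```

Setting the value explicitly to area, as in this experiment, avoids depending on whether CCD is enabled.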
CK launch path BUF/INV    Buff: X8, X8, X12, X12       Buff: X8, X18, X4, X8
CKBUF count               6841                         6980
CKINV count               844                          885
CKBUF power               42.2mW                       42.1mW
CKINV power               4.08mW                       4.11mW
Here again, priority is given to timing rather than power: no timing
margin was available, so power recovery was not performed.
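One way to picture this timing-first priority is a recovery pass that downsizes a clock cell only when the paths through it have slack to spare. The drive-strength list and the one-step downsizing rule below are assumptions for illustration, not ICC2's recovery algorithm.

```python
# Hypothetical clock buffer drive strengths, smallest to largest.
DRIVE_STRENGTHS = [4, 8, 12, 16, 20, 24, 32]


def recover_clock_power(cells, min_margin_ps):
    """Downsize clock cells one step only where timing margin exists.

    cells maps a cell name to (drive_strength, worst_slack_ps) for the
    paths through it. A cell is downsized only if its slack exceeds
    min_margin_ps, so when every path is negative, nothing is recovered.
    """
    result = {}
    for name, (drive, slack) in cells.items():
        idx = DRIVE_STRENGTHS.index(drive)
        if slack > min_margin_ps and idx > 0:
            result[name] = DRIVE_STRENGTHS[idx - 1]
        else:
            result[name] = drive
    return result


if __name__ == "__main__":
    print(recover_clock_power({"b1": (32, -21.0), "b2": (24, 15.0)}, 5.0))
```

In the experiments above every run still had negative setup slack, which in this model corresponds to no cell qualifying for downsizing.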
6) Disabling path groups for optimization if margin is available
set_app_options -name ccd.skip_path_groups -value {reg2mem mem2reg}
set_app_options -name clock_opt.flow.enable_ccd -value true
The first app option skips the path groups mentioned in the list, and
the second enables CCD (concurrent clock and data) optimization in
clock_opt. We can skip the path groups that are not timing critical,
so the tool can put most of its effort into the paths that are timing
critical.
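The selection logic can be sketched in Python as a pre-check for the skip list. The WNS values are hypothetical, and the guard that refuses to skip a violating group is an added precaution reflecting the advice to skip only groups that are not timing critical; it is not a documented ICC2 behavior.

```python
def plan_ccd_path_groups(path_groups, skip):
    """Return the path groups left for CCD to optimize, worst WNS first.

    path_groups maps a group name to its WNS in ps. Groups listed in
    skip (as passed to ccd.skip_path_groups) are dropped, but only if
    they actually have margin; skipping a violating group is rejected.
    """
    violating = sorted(g for g in skip if path_groups.get(g, 0.0) < 0)
    if violating:
        raise ValueError(f"refusing to skip violating path groups: {violating}")
    kept = {g: wns for g, wns in path_groups.items() if g not in skip}
    return sorted(kept, key=kept.get)


# Hypothetical WNS values per path group (ps).
wns = {"reg2reg": -30.0, "reg2mem": 12.0, "mem2reg": 8.0, "in2reg": -5.0}
print(plan_ccd_path_groups(wns, {"reg2mem", "mem2reg"}))
```

Here reg2mem and mem2reg have positive slack, so they can be skipped and CCD effort goes to reg2reg and in2reg.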
Results                   Default                      Switch
Setup slack               -46.1ps                      6ps
Launch path latency       247.7ps                      221.9ps
Capture path latency      172.5ps                      175ps
Skew                      75.2ps                       46.9ps
CK capture path BUF/INV   Buff: X8, X24, X8, X4, X8    Buff: X8, X32, X24, X24, X8
CK launch path BUF/INV    Buff: X8, X8, X12, X12       Buff: X8, X8, X12, X12
CKBUF count               6841                         5968
CKINV count               844                          674
CKBUF power               42.2mW                       41.3mW
CKINV power               4.08mW                       3.09mW
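The relative savings in this experiment are easier to compare as percentages. The snippet below computes them from the table above; `pct_change` is just an illustrative helper.

```python
def pct_change(before, after):
    """Percentage change from before to after (negative means a reduction)."""
    return 100.0 * (after - before) / before


# Default vs. skip-path-groups + CCD run, values from the table above.
rows = {
    "CKBUF count": (6841, 5968),
    "CKINV count": (844, 674),
    "CKBUF power (mW)": (42.2, 41.3),
    "CKINV power (mW)": (4.08, 3.09),
}
for label, (before, after) in rows.items():
    print(f"{label:18s} {pct_change(before, after):+.1f}%")
```

This gives about -12.8% clock buffers, -20.1% clock inverters, -2.1% buffer power and -24.3% inverter power, alongside the move from -46.1ps to +6ps setup slack.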
In this block, the two skipped path groups have timing margin, so the
tool does not spend its resources optimizing those paths, and CCD
optimization is enabled. The tool therefore focuses on the paths that
are timing critical; as a result, setup slack becomes positive and the
clock buffer count, inverter count and power are all reduced.
7) Hold Fixing
set_app_options -name ccd.hold_control_effort -value high
set_app_options -name clock_opt.flow.enable_ccd -value true
The first app option controls the hold optimization effort. It has
five values: none, low, medium, high and ultra. By default it is set
to low. The second app option enables CCD optimization.
The hold slack results are given in the table below.
Results                   Default                      Switch
Hold slack                -89ps                        35ps
Launch path latency       238.76ps                     230.34ps
Capture path latency      194.57ps                     206.46ps
Skew                      44.19ps                      23.88ps
CK capture path BUF/INV   Buff: X8, X4, X16, X12, X8   Buff: X8, X28, X12, X32, X4
CK launch path BUF/INV    Buff: X8, X12, X32, X8       Buff: X8, X18, X32, X4
Delay Buff:DLX2
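The hold check and the delay-cell fix can be sketched with basic static timing arithmetic. The hold-slack formula is the standard arrival-minus-required check with clock uncertainty ignored; the example path values and the 25ps delay assumed for a DLX2 cell are hypothetical, and a real tool also checks that inserted delay does not break setup.

```python
import math


def hold_slack_ps(launch_latency, clk_to_q, data_delay, capture_latency, hold_time):
    """Hold slack = data arrival - data required (all values in ps).

    Arrival  = launch clock latency + clock-to-Q + data path delay.
    Required = capture clock latency + library hold time.
    Clock uncertainty is ignored in this sketch.
    """
    arrival = launch_latency + clk_to_q + data_delay
    required = capture_latency + hold_time
    return arrival - required


def delay_cells_needed(hold_slack, cell_delay_ps):
    """Number of identical delay cells to insert to make hold slack >= 0."""
    if hold_slack >= 0:
        return 0
    return math.ceil(-hold_slack / cell_delay_ps)


if __name__ == "__main__":
    print(hold_slack_ps(100, 30, 20, 160, 10))
    # Default-run hold slack from the table; 25 ps per DLX2 is an assumption.
    print(delay_cells_needed(-89.0, 25.0))
```

The formula also shows why a larger capture latency hurts hold: it raises the required time, which the tool then compensates with data-path delay cells.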