DAY 8: CLOCK TREE SYNTHESIS
PHYSICAL DESIGN
Y.V.L.Tanuja
Definition
• The process of distributing the clock and balancing the load is called
clock tree synthesis (CTS).
• In short, CTS delivers the clock to all sequential elements.
• CTS inserts buffers or inverters along the clock paths of an ASIC
design in order to achieve zero/minimum skew or balanced skew.
• Before CTS, all clock pins are driven by a single clock source. The CTS
starting point is the clock source and the CTS ending points are the
clock pins of the sequential cells.
Inputs of CTS:
- Technology file (.tf)
- Netlist
- SDC
- Library files (.lib & .lef) & TLU+ file
- Placement DEF file
- Clock specification file which contains Insertion delay, skew, clock
transition, clock cells, NDR, CTS tree type, CTS exceptions, list of
buffers/inverters etc...
CTS Targets:
• Skew &
• Insertion delay
About Skew:
• Skew is the difference between the clock latencies of two flops
belonging to the same clock domain.
- If the capture clock latency is greater than the launch clock latency,
the skew is positive. This helps to meet setup.
- If the capture clock latency is less than the launch clock latency, the
skew is negative. This helps to meet hold.
• Types of skew:
Local Skew:
- The difference between the clock latencies of two logically connected
flops of the same clock domain.
Global Skew:
- The difference between the highest and the lowest clock latency among
all flops of the same clock domain.
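The skew definitions above can be sketched in a few lines of Python. This is only an illustration: the flop names, latency values and the list of connected pairs are hypothetical, and local skew is taken here as the worst absolute skew over the connected pairs.

```python
def skew(launch_latency, capture_latency):
    """Skew as defined above: capture latency minus launch latency (ps)."""
    return capture_latency - launch_latency

def global_skew(latencies):
    """Highest minus lowest clock latency over all flops of one clock domain."""
    values = list(latencies.values())
    return max(values) - min(values)

def local_skew(latencies, connected_pairs):
    """Worst absolute skew over logically connected (launch, capture) pairs."""
    return max(abs(skew(latencies[launch], latencies[capture]))
               for launch, capture in connected_pairs)

# Hypothetical clock latencies in picoseconds for four flops.
latencies = {"ff1": 500, "ff2": 520, "ff3": 480, "ff4": 610}
# Hypothetical data paths: ff1 -> ff2 and ff2 -> ff3.
pairs = [("ff1", "ff2"), ("ff2", "ff3")]

print(skew(latencies["ff1"], latencies["ff2"]))  # 20: positive skew
print(skew(latencies["ff2"], latencies["ff3"]))  # -40: negative skew
print(local_skew(latencies, pairs))              # 40
print(global_skew(latencies))                    # 130
```

Note that ff4 raises the global skew to 130 ps even though it is not on any of the listed data paths, which is why local skew is usually the tighter measure for timing.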
About clock latency/insertion delay
• The time taken by the clock to reach the sink point from the clock
source is called latency. It is divided into two parts:
– Clock Source Latency &
– Clock Network Latency.
• Clock Source Latency:
- The delay from the clock waveform origin point to the clock
definition point.
• Clock Network Latency:
- The delay from the clock definition point to the destination/sink
point.
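As a small worked example of the split above, the total latency at a sink is the sum of the two parts. The numbers and sink names below are hypothetical.

```python
def clock_latency(source_latency_ps, network_latency_ps):
    """Total latency at a sink = source latency + network latency."""
    return source_latency_ps + network_latency_ps

# Hypothetical values: 200 ps from the waveform origin to the clock
# definition point, then a per-sink delay from the definition point onward.
source_ps = 200
network_ps = {"ff1": 300, "ff2": 320}

total_ps = {sink: clock_latency(source_ps, net) for sink, net in network_ps.items()}
print(total_ps)  # {'ff1': 500, 'ff2': 520}
```

Because the source latency is common to every sink of the clock, only the network latency differs between sinks in this example.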
Clock Tree Reference:
• By default, each clock tree reference list contains all the clock buffers
and clock inverters in the logic library.
• The clock tree reference list is used for:
- Clock tree synthesis
- Boundary cell insertion
- Sizing
- Delay insertion: If a large delay is needed, instead of adding many
buffers we can add a single delay cell of the required delay value.
The advantage is smaller area and lower power. However, delay cells have
high variation, so their use in the clock tree is not recommended.
Boundary cell insertions:
• When we are working on a block-level design, we might want to
preserve the boundary conditions of the block’s clock ports (the
boundary clock pin).
• A boundary cell is a fixed buffer that is inserted immediately after the
boundary clock pins to preserve the boundary conditions of the clock
pin.
• When boundary cell insertion is enabled, a buffer from the clock tree
reference list is inserted immediately after each boundary clock pin. For
multi-voltage designs, buffers are inserted at the boundary in the
default voltage area.
• The boundary cells are fixed for clock tree synthesis after insertion; they
can’t be moved or sized. In addition, no cells are inserted between a
clock pin and its boundary cell.
Clock Tree Exceptions:
• Non-stop pin:
- Non-stop pins are pins that would normally be considered endpoints of
the clock tree, but through which the tool continues tracing the clock.
Example:
- The clock pins of sequential cells driving generated clocks are implicit
non-stop pins.
- Clock pins of ICG cells.
• Float pin: Float pins are clock pins that have special insertion delay
requirements; balancing is done according to that delay [macro
modeling].
• A float pin is the same as a sync pin, but the internal clock latency of
the pin is taken into consideration while building the clock tree.
• It is used to adjust the clock arrival for specific endpoints with respect
to all other endpoints.
Example - Clock entry pin of hard macros
• Exclude pin:
Exclude pins are clock tree endpoints that are excluded from clock tree
timing calculation and optimization. The tool considers exclude pins
only in calculations and optimizations for design rule constraints.
During CTS, the tool isolates exclude pins from the clock tree by
inserting a guide buffer before the pin; these pins need not be
considered during clock tree propagation.
Example - Non-clock input pin of a sequential cell
Beyond the exclude pin, the tool never performs skew or insertion delay
optimization, but it does perform design rule fixing.
• Stop pin: Stop pins are the endpoints of the clock tree that are used for
delay balancing. In CTS, the tool uses stop pins in calculation and
optimization for both DRC and clock tree timing.
Example - Clock sinks are implicit stop pins
• The optimization is done only up to the stop pin, as shown in the above
fig. The clock signal should not propagate beyond the stop/sync pin.
This pin must be considered when building the clock tree.
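The way the four pin types affect balancing can be summarised in a short Python sketch. The `ClockPin` class, the pin-kind strings and all values are hypothetical modelling choices for illustration, not the data model or API of any CTS tool.

```python
from dataclasses import dataclass

@dataclass
class ClockPin:
    name: str
    kind: str                         # "stop", "float", "exclude" or "nonstop"
    network_latency_ps: float         # delay from the clock root to this pin
    internal_latency_ps: float = 0.0  # only meaningful for float pins

def balancing_latency(pin):
    """Latency used for skew balancing, or None if the pin is not balanced."""
    if pin.kind == "stop":
        return pin.network_latency_ps
    if pin.kind == "float":
        # The latency inside the macro counts towards the total.
        return pin.network_latency_ps + pin.internal_latency_ps
    if pin.kind in ("exclude", "nonstop"):
        # Exclude pins only get design rule fixing; non-stop pins are traced
        # through, so balancing happens at the endpoints behind them.
        return None
    raise ValueError(f"unknown pin kind: {pin.kind}")

# Hypothetical pins, one of each kind.
pins = [
    ClockPin("ff1/CK", "stop", 500),
    ClockPin("macro/CLK", "float", 350, internal_latency_ps=150),
    ClockPin("ff2/D", "exclude", 120),
    ClockPin("icg/CK", "nonstop", 200),
]
for pin in pins:
    print(pin.name, balancing_latency(pin))
```

In this example the float pin is reached 150 ps earlier than the stop pin, so that its total latency including the macro's internal delay matches the 500 ps of the ordinary sink.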
Don't Touch Sub-tree:
• If we want to preserve a portion of an existing clock tree, we put a don’t
touch exception on the sub-tree.
- CLK1 is the pre-existing clock and path 1 is optimized with respect to
CLK1.
- CLK2 is the new generated clock. The don’t touch sub-tree attribute is set
w.r.t C1.
Example:
- If path1 is 300ps and path2 is 200ps, delay is added to path2 during
balancing.
- If path1 is 200ps and path2 is 300ps, delay can’t be added to path1
during balancing because the don’t touch attribute is set on path1, so we
get a violation.
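The two cases in the example can be reproduced with a minimal Python sketch. The function name and the way the don't touch attribute is passed in are assumptions made for illustration only.

```python
def balance_two_paths(path1_ps, path2_ps, dont_touch=frozenset()):
    """Return {path_name: delay_to_add_ps} that equalises the two paths.

    Raises ValueError when the shorter path is marked don't-touch,
    because no delay may be added inside a preserved sub-tree.
    """
    delays = {"path1": path1_ps, "path2": path2_ps}
    target = max(delays.values())
    padding = {}
    for name, delay in delays.items():
        needed = target - delay
        if needed > 0 and name in dont_touch:
            raise ValueError(f"{name} needs {needed} ps but is don't-touch")
        padding[name] = needed
    return padding

# Case 1: path1 = 300 ps, path2 = 200 ps, so delay goes on path2.
print(balance_two_paths(300, 200, dont_touch={"path1"}))
# {'path1': 0, 'path2': 100}

# Case 2: path1 = 200 ps, path2 = 300 ps, so path1 would need padding.
try:
    balance_two_paths(200, 300, dont_touch={"path1"})
except ValueError as err:
    print("violation:", err)
```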
• Don't Buffer Net: It is used to improve the results by preventing the
tool from buffering certain nets. Don’t buffer nets have higher priority
than DRC.
- CTS does not add buffers on such nets. Example - If the path is a false
path, there is no need to balance it, so set the don’t buffer net attribute.
• Don't Size Cell: To prevent sizing of cells on the clock path during CTS
and optimization, we must identify the cells as don’t size cells.
• Specifying Size-Only Cells: During CTS and optimization, size-only
cells can only be sized, not moved or split.
- After sizing, if a size-only cell overlaps with an adjacent cell, it
might be moved during the legalization step.
CTS Algorithms
• RC Tree based CTS
• H-tree based algorithm
• X-tree based algorithm
• Method of means and medians (MMM)
• Geometric matching algorithm
• Pi configuration
RC Tree based CTS
H-tree based algorithm
• Perfect synchronization between the clock signals is achieved by an ‘H’
shaped structure before the clock arrives at the sub-blocks or synchronous
elements. With an H-tree, zero skew can be easily achieved.
• Consider ‘a’ to ‘p’ as the clock pins of sequential elements; the four
boxed modules are sub-modules within the top module.
All those sequential elements need to get the clock at the same time.
• To achieve this, an H-tree is built within the top module and the
sub-modules.
• It is clear that all the clock pins are exactly 9 units from the clock
definition point.
• The marked points are called tap points; when the signal reaches
a tap point, it splits into two different directions.
• This is how the clock takes exactly 9 time units to
reach all the sequential elements with zero skew.
Advantages:
- Balanced latencies & Low skew
Disadvantages:
- Requires big driver, thus lots of power
- Requires more routing resources
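The equal-distance property of the H-tree can be checked with a small recursive sketch in Python. The dimensions below are hypothetical and are not meant to reproduce the 9-unit figure from the slide; the point is only that every sink ends up at the same wire length from the root.

```python
def h_tree_sinks(x, y, half_width, half_height, levels, length=0.0):
    """Return [(sink_x, sink_y, wire_length_from_root)] for an H-tree.

    Each level draws one 'H': a horizontal bar from the centre, then a
    vertical bar at each end, giving four tap points for the next level.
    """
    if levels == 0:
        return [(x, y, length)]
    sinks = []
    for dx in (-half_width, half_width):
        for dy in (-half_height, half_height):
            sinks += h_tree_sinks(x + dx, y + dy,
                                  half_width / 2, half_height / 2,
                                  levels - 1, length + abs(dx) + abs(dy))
    return sinks

# Two levels of 'H' give 16 sinks, like the pins 'a' to 'p' above.
sinks = h_tree_sinks(0.0, 0.0, 4.0, 4.0, levels=2)
lengths = {length for _, _, length in sinks}
print(len(sinks), lengths)  # 16 {12.0}
```

Equal wire length gives zero skew only if the wires and loads are also matched, which is why real H-trees rely on symmetric placement and a strong driver.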
X-tree based algorithm
• An X-tree is similar to an H-tree; the only difference is that the
connections are not rectilinear. The module design used for the H-tree is
reused to implement the X-tree and show its difficulties.
• Advantages:
- Balanced latencies & Low skew
• Disadvantages:
- Crosstalk
Geometric matching algorithm
• For explaining the H, X and MMM algorithms, 16-point structures were
used; let us consider an 8-point structure for explaining the
Geometric Matching Algorithm.
• The physical locations of the sub-modules are not symmetric, so developing
an H-tree among these sub-modules is practically not possible. At first,
sub-modules are grouped together in pairs, and the resulting trees are
named X-1, X-2, X-3 and X-4.
• The optimal entry point may not be equidistant from the entry points of
X-1 and X-2; buffer insertion can balance the delay caused by unequal
net lengths.
• Then two two-point trees are joined together to form an H-like
structure. The resultant H-trees are named X-12 and X-34.
• The tap points of both H structures cannot be connected using
rectilinear nets.
• In order to connect the two trees, the geometrical position of one H-tree
is changed to be compatible with the other tree’s tap point.
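The bottom-up pairing idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: pairs are chosen greedily by nearest Manhattan distance rather than by an optimal matching, the tap point is placed at the midpoint (which assumes equal loads and no buffer insertion), and the eight sink coordinates are hypothetical.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def match_level(points):
    """Greedily pair each point with its nearest unpaired neighbour.

    Returns (pairs, leftover), where each pair is (point_a, point_b, tap)
    and leftover holds the unpaired point when the count is odd.
    """
    remaining = list(points)
    pairs = []
    while len(remaining) > 1:
        a = remaining.pop(0)
        b = min(remaining, key=lambda p: manhattan(a, p))
        remaining.remove(b)
        tap = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # equidistant from a and b
        pairs.append((a, b, tap))
    return pairs, remaining

def build_tree(sinks):
    """Pair points level by level until a single root tap point remains."""
    levels = []
    points = list(sinks)
    while len(points) > 1:
        pairs, leftover = match_level(points)
        levels.append(pairs)
        points = [tap for _, _, tap in pairs] + leftover
    return levels, points[0]

# Hypothetical, non-symmetric locations of eight clock entry points.
sinks = [(0, 0), (1, 3), (5, 1), (6, 4), (2, 8), (3, 9), (8, 8), (9, 6)]
levels, root = build_tree(sinks)
print([len(level) for level in levels])  # [4, 2, 1]
```

The three levels correspond to the grouping described above: four two-point trees, then two H-like trees, then the final connection between them.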
Pi configuration
• In the pi configuration, the total number of buffers inserted along the
clock path at each level is a multiple of the number at the previous level.
• This type of structure uses the same number of buffers and
geometrical wires and relies on matching the delay components at
each level of the clock.
• A clock tree with the pi structure is considered to be balanced.
• Π and H-tree are the most efficient clock routing algorithms because they
have no crosstalk and give minimum skew.
CTS quality checks
• The following quality checks apply to CTS:
- Minimum insertion delay
- Skew balancing
- Duty cycle
- Pulse width
- Clock tree power consumption
- Signal integrity & crosstalk issues
CTS Optimization Techniques:
1. Buffer/Gate Sizing: Sizes up or down buffers and gates to improve
both skew and insertion delay.
2. Buffer/Gate Relocation: Physical location of the buffer or gate is
moved to reduce skew and insertion delay.
3. Delay Insertion: Delay is inserted on the shortest paths.
4. Dummy Load Insertion: Uses load balancing to fine tune the clock
skew by increasing the shortest path delay.
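Delay insertion on the shortest paths can be illustrated with a short Python sketch. The function name, the skew target parameter and the latency values are hypothetical; the sketch only computes how much delay each sink would need, not how a tool chooses cells to realise it.

```python
def delay_to_insert(latencies_ps, skew_target_ps=0):
    """Delay to add per sink so that global skew is at most the target.

    Only paths shorter than (slowest latency - skew target) are padded.
    """
    slowest = max(latencies_ps.values())
    floor = slowest - skew_target_ps
    return {sink: max(0, floor - latency)
            for sink, latency in latencies_ps.items()}

# Hypothetical latencies in picoseconds and a 100 ps skew target.
latencies = {"ff1": 500, "ff2": 520, "ff3": 480, "ff4": 610}
padding = delay_to_insert(latencies, skew_target_ps=100)
print(padding)  # {'ff1': 10, 'ff2': 0, 'ff3': 30, 'ff4': 0}

balanced = {sink: latencies[sink] + padding[sink] for sink in latencies}
print(max(balanced.values()) - min(balanced.values()))  # 100
```

With a target of 0 every sink is padded up to the slowest latency, which gives zero skew at the cost of the largest insertion delay.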
Outputs of CTS:
- Timing report
- Congestion report
- Skew report
- Insertion delay report
- CTS DEF file
Reference: ASIC Back-end, VLSI backend adventure