The document discusses power dissipation in clock distribution. It provides an example in which a clock driver driving a 3250 pF capacitive load at 200 MHz and 3.3 V dissipates 7.08 W. In general, clock power is proportional to the frequency, the square of the supply voltage, and the total load capacitance. Several techniques for reducing clock power are discussed: reducing the load capacitance through different clock tree structures, adjusting wire lengths or widths to reduce skew, and using distributed buffers.
Buffer Clock Tree
Power dissipation in clock distribution
The synchronization of a digital system requires one or more signals to coordinate operations and ensure that they occur in the correct sequence. The clock is the signal that synchronizes all the blocks in the design, and a clock tree is constructed to distribute it to all the modules in the system. The clock is a major source of power dissipation because it drives a large load and switches at a high frequency.

Example: a chip uses a clock driver that drives a 3250 pF capacitive load at a 3.3 V supply and a 200 MHz frequency. The dynamic power dissipated by the clock signal itself is then

$$P_{dynamic} = (3250\,\text{pF})(3.3\,\text{V})^2(200\,\text{MHz}) \approx 7.08\,\text{W}$$

In general,

$$P_{clk} = f\,V^2\,(C_L + C_d)$$
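The arithmetic of the example can be checked with a minimal Python sketch of the $P_{clk}$ formula. The function and parameter names are illustrative, not from the source, and $C_d$ defaults to zero because the example does not give a driver capacitance.

```python
def clock_dynamic_power(freq_hz: float, vdd_v: float,
                        c_load_f: float, c_driver_f: float = 0.0) -> float:
    """P_clk = f * V^2 * (C_L + C_d), in watts.

    The clock net charges and discharges its full capacitance once per cycle,
    so no separate activity factor appears.
    """
    return freq_hz * vdd_v ** 2 * (c_load_f + c_driver_f)


# The example from the text: 3250 pF load, 3.3 V supply, 200 MHz clock.
# The driver capacitance C_d is not given there, so it is left at 0.
p_clk = clock_dynamic_power(200e6, 3.3, 3250e-12)
print(f"P_clk = {p_clk:.2f} W")  # prints "P_clk = 7.08 W"
```

Because power scales with $V^2$, lowering the clock swing pays off faster than an equal relative cut in frequency or capacitance.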
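where $C_L$ is the total load on the clock and $C_d$ is the clock driver capacitance. For H-tree clock routing, the load can be estimated as

$$C_L = N C_g + 1.5\,(2^h - 1)\,D\,c_w + \alpha\,\frac{N}{4^h}\,c_w$$

where $N$ is the total number of clock terminals, $C_g$ is the nominal input capacitance at each terminal, $c_w$ is the wire capacitance (per unit length), $D$ is the chip dimension, $h$ is the number of levels of the H-tree clock routing, and $\alpha$ is an estimation factor that depends on the algorithm used. The equation shows that the dynamic power dissipation increases as the chip dimension and the number of clocked devices increase.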
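The $1.5\,(2^h - 1)\,D$ factor is the total wire length of an $h$-level H-tree on a $D \times D$ chip. The Python sketch below checks it by summing the tree level by level. It assumes square H shapes whose bars are each half the side of the region they serve, and it computes only the first two terms of $C_L$, because the $\alpha$ term depends on the routing algorithm. All function names and the numeric inputs are hypothetical.

```python
import math


def htree_wire_length(chip_dim: float, levels: int) -> float:
    """Total H-tree wire length, summed level by level.

    Level k (k = 1..levels) has 4**(k-1) H shapes. Each serves a square region
    of side chip_dim / 2**(k-1) and consists of one horizontal bar and two
    vertical bars, each half the region side long.
    """
    total = 0.0
    for k in range(1, levels + 1):
        num_shapes = 4 ** (k - 1)
        side = chip_dim / 2 ** (k - 1)
        total += num_shapes * 3 * (side / 2)
    return total


def htree_wire_length_closed_form(chip_dim: float, levels: int) -> float:
    """The 1.5 * (2**h - 1) * D factor from the C_L estimate."""
    return 1.5 * (2 ** levels - 1) * chip_dim


def clock_load_first_two_terms(n_terminals: int, c_gate: float,
                               c_wire_per_len: float, chip_dim: float,
                               levels: int) -> float:
    """N*C_g + 1.5*(2**h - 1)*D*c_w; the alpha (local wiring) term is omitted."""
    return (n_terminals * c_gate
            + htree_wire_length_closed_form(chip_dim, levels) * c_wire_per_len)


# The level-by-level sum agrees with the closed form.
for h in range(1, 8):
    assert math.isclose(htree_wire_length(1.0, h), htree_wire_length_closed_form(1.0, h))

# Hypothetical values (not from the text): 10,000 terminals at 5 fF each,
# 0.2 fF/um wire, a 10,000 um die, and a 4-level H-tree.
c_partial = clock_load_first_two_terms(10_000, 5e-15, 0.2e-15, 10_000.0, 4)
print(f"{c_partial * 1e12:.1f} pF")  # prints "95.0 pF"
```

Each added level doubles the H-tree wiring added at the previous level, so the H-tree term grows roughly as $2^h$.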
To reduce $P_{clk}$, reduce the clock terminal load, the routing capacitance, and the driver capacitance, and choose an appropriate clock distribution structure.

Clock Skew and Phase Delay

Clock skew: the variation in the delays from the clock source to the clock terminals. To achieve the desired performance, clock skew has to be kept within very small or tolerable values.

Phase delay: the longest delay from the source to the sinks (terminals). It has to be controlled to maximize throughput.

The clock tree construction method is important for reducing power dissipation: wire lengths or widths are adjusted to reduce clock skew, and the interconnect capacitance is reduced.

Note: while minimizing clock power, the clock skew and phase delay constraints must still be satisfied.

Single Driver and Distributed Buffers

Clock driving scheme: to ensure fast clock transitions, buffers have to be used to drive the large load capacitance on a clock. There are two clock driving schemes: a single driver and distributed buffers.

To Reduce Delay

Consider an equal-path-length tree and its delay model. The skew between sinks s1 and s2 can be written as:
The skew variation can also be expressed in terms of wire width variations. Assuming a maximum width variation of $\Delta w = \pm 15\%$, the worst-case additional skew is:
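The skew definitions and the effect of a ±15% width variation can be illustrated with a small Python sketch. It assumes an Elmore (distributed RC) delay model with area-only wire capacitance, which may differ from the delay model in the original analysis. It also assumes two sibling sinks s1 and s2 fed by equal-length branches from a common point. The sheet resistance, area capacitance, and sink load values are hypothetical.

```python
from typing import Dict, Tuple


def branch_elmore_delay(length: float, width: float, r_sheet: float,
                        c_area: float, c_sink: float) -> float:
    """Elmore delay of one uniform wire branch driving a sink capacitance.

    R = r_sheet * length / width and C = c_area * width * length
    (area capacitance only; fringing is ignored). A distributed RC line
    driving c_sink has Elmore delay R * (C / 2 + c_sink).
    """
    r = r_sheet * length / width
    c = c_area * width * length
    return r * (c / 2 + c_sink)


def sibling_skew(length: float, w1: float, w2: float, r_sheet: float,
                 c_area: float, c_sink: float) -> float:
    """Skew between two sibling sinks fed by equal-length branches.

    In the Elmore model, the shared upstream path adds the same delay to
    both sinks, so only the two branch delays matter.
    """
    d1 = branch_elmore_delay(length, w1, r_sheet, c_area, c_sink)
    d2 = branch_elmore_delay(length, w2, r_sheet, c_area, c_sink)
    return abs(d1 - d2)


def skew_and_phase_delay(arrivals: Dict[str, float]) -> Tuple[float, float]:
    """Clock skew (max - min arrival) and phase delay (max arrival)."""
    latest = max(arrivals.values())
    return latest - min(arrivals.values()), latest


# Hypothetical SI values: 1 mm branches, 1 um nominal width, 0.05 ohm/sq,
# 3e-5 F/m^2 (0.03 fF/um^2) area capacitance, 50 fF sinks.
w_nom = 1e-6
worst = sibling_skew(1e-3, w_nom * 1.15, w_nom * 0.85, 0.05, 3e-5, 50e-15)
print(f"worst-case skew from +/-15% width: {worst * 1e12:.2f} ps")  # 0.77 ps
```

With area-only capacitance, the wire's own $RC/2$ term does not depend on width, so the skew comes entirely from the branch resistance driving the sink load. The worst case pairs the widest branch with the narrowest one.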
Buffer Insertion in Clock Tree
Let $t_s$ be the tolerable skew of a buffered clock tree:

$$t_s = t_s^w + t_s^b$$

where $t_s^b$ is the tolerable skew for buffer delays and $t_s^w$ is the tolerable skew due to asymmetric loads and wire width variations.

Formulation 1: determine the minimum number of buffer insertion points (BIPs) in clock tree T such that the skew due to asymmetric loads and wire width variations is less than $t_s^w$.

To meet the skew constraint, the buffer insertion scheme should balance the buffer delays on source-to-sink paths, independent of the clock tree topology. The balanced buffer insertion scheme partitions the clock tree into subtrees such that every subtree is an equal-path-length tree and all source-to-sink paths have the same number of levels. The clock tree is partitioned into multiple levels, and BIPs are determined at the cut lines. This buffer insertion scheme has the following properties:
1. Each source-to-sink path has the same number of buffers (levels).
2. All subtrees rooted at a given level are equal-path-length trees.
3. The cut lines are selected so as to form iso-radius levels.

Let $L$ be the path length. The radius of the first-level cut line is selected as $r = L/(k+1)$, where $k$ is the designated number of buffer levels. The first iso-radius level is at $r$, the second at $2r$, and so on, up to $kr$ for level $k$.

Buffer insertion in an equal path length tree:
a) Using the balanced buffer insertion method
b) Using the level-by-level method

Comparison of buffer insertion

Zero Skew vs. Tolerable Skew

Tolerable skew: negative and positive tolerable skew (TS).

Double Clocking and Zero Clocking

To avoid double clocking:

$$d_{01} + d_{FF} + \min(d_{logic}) \ge d_{02} + d_{hold}$$

To avoid zero clocking:

$$d_{01} + d_{FF} + \max(d_{logic}) + d_{setup} \le d_{02} + T$$

Clock Distribution: Tree, H-Tree, and Mesh

The clock is rooted at the center and routed with equal path lengths from the source.

Good Practices in Clock Design
Try to achieve the lowest latency (super buffer/H-tree).
Control transition times (keep edge rates sharp).
Use clock buffers for good matching.
Set minimum/maximum line lengths for good matching.
Determine whether spacing or shielding provides the better tradeoff.
Use integral decoupling in buffers to reduce IR drop and L di/dt noise.

Clock Design Objectives
Minimize the clock skew (in the presence of IR drop).
Minimize the clock delay (latency).
Minimize the clock power (and area).
Maximize noise immunity (against coupling effects).
Maximize clock reliability.

Problems That Have to Be Dealt With
Routing the clock to all flip-flops on the chip.
Driving unbalanced loads, which will not be known until the chip is nearly complete.
On-chip process and temperature variations.
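The cut-line rule of the balanced buffer insertion scheme ($r = L/(k+1)$, cut lines at $r, 2r, \dots, kr$) can be sketched in Python on a small tree. The sketch assumes that "radius" means path length measured from the source, and it places a BIP wherever a cut line crosses an edge. The tree representation, the function names, and the example tree are all hypothetical. The final check confirms the first listed property: every source-to-sink path receives exactly $k$ buffers.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: node -> list of (child, edge length).
Tree = Dict[str, List[Tuple[str, float]]]


def cut_radii(path_length: float, levels: int) -> List[float]:
    """Iso-radius cut lines r, 2r, ..., k*r, with r = L / (k + 1)."""
    r = path_length / (levels + 1)
    return [i * r for i in range(1, levels + 1)]


def place_bips(tree: Tree, root: str, path_length: float,
               levels: int) -> List[Tuple[str, str, float]]:
    """BIPs as (parent, child, offset from parent) where a cut line crosses an edge."""
    radii = cut_radii(path_length, levels)
    bips = []
    stack = [(root, 0.0)]  # (node, distance from root)
    while stack:
        node, dist = stack.pop()
        for child, length in tree.get(node, []):
            end = dist + length
            for rho in radii:
                if dist < rho <= end:  # this cut line crosses the edge
                    bips.append((node, child, rho - dist))
            stack.append((child, end))
    return bips


def buffers_per_sink(tree: Tree, root: str,
                     bips: List[Tuple[str, str, float]]) -> Dict[str, int]:
    """Number of BIPs on each root-to-sink path."""
    per_edge = Counter((parent, child) for parent, child, _ in bips)
    counts: Dict[str, int] = {}
    stack = [(root, 0)]
    while stack:
        node, n = stack.pop()
        children = tree.get(node, [])
        if not children:  # a sink
            counts[node] = n
        for child, _ in children:
            stack.append((child, n + per_edge[(node, child)]))
    return counts


# A small hypothetical equal-path-length tree: every root-to-sink path is 6 units.
tree: Tree = {
    "src": [("a", 3.0), ("b", 1.0)],
    "a": [("s1", 3.0), ("s2", 3.0)],
    "b": [("s3", 5.0), ("s4", 5.0)],
}
bips = place_bips(tree, "src", 6.0, levels=2)
print(buffers_per_sink(tree, "src", bips))  # every sink sees 2 buffers
```

A cut line above a branch point yields one shared buffer, while a cut line below it yields one buffer per branch. The total BIP count therefore depends on where the cut radii fall relative to the branch points.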
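The double-clocking and zero-clocking conditions can be checked mechanically for a single flip-flop-to-flip-flop stage. In this minimal Python sketch, $d_{01}$ and $d_{02}$ are interpreted as the clock arrival times at the launching and capturing flip-flops, $d_{FF}$ as the flip-flop clock-to-output delay, and $T$ as the clock period. The slides do not define these symbols, so this reading is an assumption. The class name and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlopToFlopPath:
    """Timing of one FF1 -> logic -> FF2 stage (all values in the same time unit)."""
    d01: float          # clock arrival at the launching flip-flop FF1
    d02: float          # clock arrival at the capturing flip-flop FF2
    d_ff: float         # flip-flop clock-to-output delay
    d_logic_min: float  # shortest combinational delay between the flip-flops
    d_logic_max: float  # longest combinational delay between the flip-flops
    d_setup: float
    d_hold: float


def avoids_double_clocking(p: FlopToFlopPath) -> bool:
    # New data must not reach FF2 before FF2's hold window for the same edge closes.
    return p.d01 + p.d_ff + p.d_logic_min >= p.d02 + p.d_hold


def avoids_zero_clocking(p: FlopToFlopPath, period: float) -> bool:
    # Data plus setup time must arrive before the next clock edge reaches FF2.
    return p.d01 + p.d_ff + p.d_logic_max + p.d_setup <= p.d02 + period


# Hypothetical numbers in ns. FF2's clock arrives 0.3 ns after FF1's (positive skew).
path = FlopToFlopPath(d01=0.0, d02=0.3, d_ff=0.2, d_logic_min=0.05,
                      d_logic_max=4.0, d_setup=0.1, d_hold=0.05)
print(avoids_double_clocking(path))            # False: the short path races through
print(avoids_zero_clocking(path, period=5.0))  # True
```

The example shows the tradeoff behind tolerable positive and negative skew. Delaying FF2's clock relaxes the zero-clocking (setup) condition but tightens the double-clocking (hold) condition.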