Clock Distribution: Shmuel Wimer
Shmuel Wimer
Bar Ilan Univ. Eng. Faculty
Technion, EE Faculty
July 2010
[Figure: clocking system. An external clock (ext_clk) feeds the clock generator, which produces gclk; the clock distribution network (buffers and gaters) delivers clk1 and clk2 to the clocked elements.]
[Figure: two chips (one labeled Chip B) exchanging data (Din, Dout) and clocks (CLKin, CLKout, ext_clk, clk); a PLL with inputs ref_clk and fdbk_clk drives clk_out, which is distributed as gclk.]
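[Figure: PLL block diagram. ref_clk and fdbk_clk (with dividers M and N) enter the phase detector, whose Up and Down outputs drive the charge pump (current I); the loop filter (R, C) produces Vctrl for the voltage-controlled oscillator, which generates clk_out.]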
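[Figure: phase-frequency detector. Two flip-flops with data inputs tied to 1 and a shared CLR; output Q_A means B should go faster, output Q_B means B should go slower.]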
The two flip-flops receive the signals at their clock inputs (one is usually the
reference and the other is the sampled signal).
The output of the leading flip-flop is 1 for the duration of the lead.
Once the lagging signal arrives, a reset returns both Q_A and Q_B to zero.
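The behavior described above can be sketched as a small event-driven model. This is an idealized illustration, not the circuit on the slide: the reset delay is assumed to be zero, and the function name `pfd_high_times` and the edge times are hypothetical.

```python
def pfd_high_times(edges_a, edges_b):
    """Total time Q_A and Q_B spend at 1 in an idealized PFD (zero reset delay)."""
    # Merge the rising edges of both inputs into one time-ordered event list.
    events = sorted([(t, "A") for t in edges_a] + [(t, "B") for t in edges_b])
    q_a = q_b = False
    rise_a = rise_b = 0.0
    high_a = high_b = 0.0
    for t, source in events:
        if source == "A" and not q_a:
            q_a, rise_a = True, t      # flip-flop A captures the constant 1
        elif source == "B" and not q_b:
            q_b, rise_b = True, t      # flip-flop B captures the constant 1
        if q_a and q_b:                # both outputs high: reset clears both
            high_a += t - rise_a
            high_b += t - rise_b
            q_a = q_b = False
    return high_a, high_b

# A leads B by 2 time units in each of three cycles.
lead = pfd_high_times([0, 10, 20], [2, 12, 22])
# B runs at twice the frequency of A.
fast_b = pfd_high_times([0, 10, 20], [0, 5, 10, 15, 20])
```

In the first case only Q_A is ever high, for the lead duration of each cycle. In the second case Q_B dominates, which corresponds to "B should go slower".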
What happens when the reference and the sampled signals are shifted copies of
each other?
[Figure: waveforms of A (reference), B (sampled), Q_A (sampled should go faster), and Q_B (sampled should go slower).]
The spikes at Q_B result from the delay of the AND gate driving the CLR inputs
of the flip-flops and the internal delay from CLR to Q.
What happens when the reference and the sampled signals have different
frequencies?
[Figure: waveforms of A (reference), B (sampled), Q_A (sampled should go faster), and Q_B (sampled should go slower).]
The output driven by the sampled signal (Q_B) is at 1 more often than the one
driven by the reference (Q_A), since rising edges of B occur more often than
rising edges of A.
Charge Pump
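[Figure: charge pump. The PFD flip-flops, clocked by CLK_ref and CLK_fdbk, produce the faster and slower signals that control switches Sup and Sdn; each switch connects a current source Icp to the Vctrl node.]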
The charge pump converts the PFD error (digital) into charge (analog), which then controls the PLL's VCO.
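As a rough illustration of this conversion, the sketch below turns the durations of the "faster" and "slower" pulses into a change of Vctrl using Q = I·t and ΔV = Q/C. It assumes an ideal pump that drives a single capacitor; the function name `vctrl_step` and the component values are hypothetical.

```python
def vctrl_step(i_cp, t_up, t_down, c_filter):
    """Change in Vctrl after the pump sources i_cp for t_up and sinks it for t_down."""
    net_charge = i_cp * (t_up - t_down)   # Q = I * t, signed
    return net_charge / c_filter          # dV = Q / C

# Hypothetical values: 10 uA pump current, 1 pF capacitor, 2 ns "faster" pulse.
dv_up = vctrl_step(i_cp=10e-6, t_up=2e-9, t_down=0.0, c_filter=1e-12)
# The same pulse on the "slower" side lowers Vctrl by the same amount.
dv_down = vctrl_step(i_cp=10e-6, t_up=0.0, t_down=2e-9, c_filter=1e-12)
```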
Current Mirror
[Figure: NMOS current mirror (N1, N2) and PMOS current mirror (P1, P2) between Vcc and Vss, each copying Iin to Iout.]
The load R determines the current through the current mirror.
[Figure: charge pump circuit with current sources I between Vcc and Vss, output Vout, and capacitor C, shown in two states: faster=1, slower=0 and faster=0, slower=1.]
Loop Filter
[Figure: loop filter built from R and C, producing Vctrl.]
Components of VCO
[Figure: VCO built from a ring of 5 inverters between Vcc and Vss, controlled by Vctrl, followed by buffering for driving clk_out.]
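For a ring of inverters, the standard first-order estimate of the oscillation frequency is f = 1 / (2·N·t_d), where N is the (odd) number of stages and t_d is the delay of one stage. The sketch below applies it to the 5-stage ring; the stage delays are hypothetical, and the way Vctrl maps to stage delay is not modeled.

```python
def ring_oscillator_frequency(n_stages, stage_delay):
    """First-order oscillation frequency of an inverter ring."""
    if n_stages % 2 == 0:
        raise ValueError("a simple inverter ring needs an odd number of stages")
    # One period is two traversals of the ring (rising and falling transitions).
    return 1.0 / (2 * n_stages * stage_delay)

f_fast = ring_oscillator_frequency(5, 100e-12)   # 100 ps per stage
f_slow = ring_oscillator_frequency(5, 125e-12)   # slower stages, lower frequency
```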
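[Figure: loop with a voltage-controlled delay line. ref_clk and fdbk_clk enter the phase detector, whose Up and Down outputs drive the charge pump (current I); the loop filter (R, C) produces Vctrl for the voltage-controlled delay line, which generates clk_out.]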
Delay Line
The signal from input to output is delayed
[Figure: delay line from In to Out with stages labeled 2^(n-1), 2^(n-2), ... and select inputs D_(n-1), D_(n-2), ..., D_0.]
No constraints imposed on buffers and wires.
Used mostly by automatic tools in automatic synthesis flows.
Can be used for small blocks within a large design.
Tools aim at minimizing the variance of clock delays.
$$\frac{1}{n}\sum_{i=1}^{n}\left(T_{CLK_i} - \frac{1}{n}\sum_{i=1}^{n} T_{CLK_i}\right)^2$$
should be minimized.
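The quantity a tool tries to reduce, the variance of the clock delays, can be computed directly from the per-sink arrival times. A minimal sketch; the function name and the delay values (in picoseconds) are hypothetical.

```python
def clock_delay_variance(delays):
    """Population variance of clock arrival times at the n sinks."""
    n = len(delays)
    mean = sum(delays) / n
    return sum((t - mean) ** 2 for t in delays) / n

# Hypothetical arrival times at four sinks, in ps.
unbalanced = clock_delay_variance([100.0, 102.0, 98.0, 100.0])
balanced = clock_delay_variance([100.0, 100.0, 100.0, 100.0])
```

A perfectly balanced tree gives zero variance; any spread in the arrival times raises it.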
H-Tree
A recursive pattern that distributes signals uniformly, with equal delay, over an area.
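The recursion can be sketched geometrically: each H has four tips, each tip is the center of a half-sized H, and every leaf ends up at the same wire length from the root. This is only a sketch of the geometry (wire length stands in for delay), and the function name and dimensions are hypothetical.

```python
def h_tree_leaves(x, y, half, levels, dist=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an H-tree."""
    if levels == 0:
        return [(x, y, dist)]
    leaves = []
    for dx in (-half, half):        # along the horizontal bar of the H
        for dy in (-half, half):    # along the vertical strokes of the H
            leaves += h_tree_leaves(x + dx, y + dy, half / 2, levels - 1,
                                    dist + abs(dx) + abs(dy))
    return leaves

leaves = h_tree_leaves(0.0, 0.0, half=4.0, levels=2)
lengths = {length for _, _, length in leaves}
```

Two levels give 16 leaves, all at the same wire length from the center.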
Clock H-Tree
[Figure: clock H-tree spanning a chip / functional block / IP, from the clock source / PLL to the sequential elements.]
Delay Calculation
Skew Modeling
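[Figure: skew model. Two clock branches leave the clock generator at a common point of divergence and arrive at Tclk1 and Tclk2 after m and n stages, respectively; each stage drives a load CL.]
$$T_{CLK1} = \sum_{i=1}^{m} T_i, \qquad T_{CLK2} = \sum_{i=1}^{n} T_i, \qquad T = \frac{C_l V_{CC}}{I_d}$$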
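A small numeric sketch of this model, assuming the per-stage delay takes the form Cl·VCC/Id suggested by the slide labels and that each branch delay is the sum of its stage delays. All names and parameter values are hypothetical.

```python
def stage_delay(c_load, vcc, i_drive):
    """Assumed per-stage delay model: T = Cl * VCC / Id."""
    return c_load * vcc / i_drive

def branch_delay(stages):
    """Sum of stage delays from the point of divergence to the sink."""
    return sum(stage_delay(c, v, i) for c, v, i in stages)

nominal = (10e-15, 1.0, 100e-6)     # 10 fF, 1.0 V, 100 uA -> 100 ps per stage
degraded = (10e-15, 0.9, 80e-6)     # local supply droop and weaker drive

branch1 = [nominal] * 3                      # m = 3 stages
branch2 = [nominal, nominal, degraded]       # n = 3 stages, one degraded
skew = branch_delay(branch1) - branch_delay(branch2)
```

Identical branches give zero skew; the single degraded stage here makes branch 2 arrive 12.5 ps later.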
[Equations: effect of variations in Cl, VCC, and Id on the stage delay, and the resulting skew between the m-stage and n-stage branches.]
$$P_{CLK\_m} = C_{l\_m} V_{CC}^2 f, \qquad P_{CLK\_(m-j)} = \frac{C_{l\_m}}{k^j} V_{CC}^2 f, \quad 0 \le j \le m-1.$$
$$P_{CLK} = V_{CC}^2 f \sum_{j=0}^{m-1} \frac{C_{l\_m}}{k^j} = V_{CC}^2 f\, C_{l\_m}\, \frac{1 - 1/k^m}{1 - 1/k}, \qquad \frac{P_{CLK\_m}}{P_{CLK}} = \frac{1 - 1/k}{1 - 1/k^m}.$$
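The geometric sum can be checked numerically. The sketch below assumes the load seen at each level shrinks by a factor of k per level toward the root, as in the per-level expression, and compares the leaf level's share of the total against the closed form. The function names and parameter values are hypothetical.

```python
def level_powers(c_leaf_level, vcc, f, k, m):
    """Power of levels m, m-1, ..., 1; index j is the distance from the leaf level."""
    return [c_leaf_level / k**j * vcc**2 * f for j in range(m)]

def leaf_level_share(k, m):
    """Closed form for P_CLK_m / P_CLK."""
    return (1 - 1 / k) / (1 - 1 / k**m)

powers = level_powers(c_leaf_level=1e-9, vcc=1.0, f=1e9, k=2, m=3)
share = powers[0] / sum(powers)
```

With k = 2 and m = 3 the leaf level takes 4/7 of the clock tree power; with a larger fan-out the share grows toward 1.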
If a phase detector (PD) has a skew guard band g, guard bands may
accumulate along tree paths.
For example, if a logic stage is shared between regions B and C, it may
add 7g time units to the path delay.
Data-Driven Clock Gating
Aug 2011
[Figure: flip-flop with inputs D and clk and output Q, with its clock gated by clk_en.]
[Figure: joint clock enabling. en1 and en2 are combined into en_joint (clk_en), which gates clk to produce clk_g; example bit sequence: 1 0 0 0 0 1 0 0 0 1.]
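A minimal sketch of joint enabling, under two assumptions that are not spelled out in the extracted slide text: the joint enable is the OR of the individual enables, and in data-driven gating each enable is D XOR Q (a flip-flop needs a clock pulse only when its next value differs from its current one). The enable streams below are hypothetical, chosen so that their OR gives the bit pattern that appears on the slide.

```python
def data_driven_enable(d, q):
    """Assumed enable rule: a clock pulse is needed only if D differs from Q."""
    return d ^ q

def joint_enable(enables):
    """OR of the per-flip-flop enable bits in one clock cycle."""
    return int(any(enables))

def gated_clock_pulses(enable_streams):
    """Per cycle: 1 if the shared gated clock clk_g pulses, else 0."""
    return [joint_enable(cycle) for cycle in zip(*enable_streams)]

en1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # hypothetical enable stream of FF 1
en2 = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # hypothetical enable stream of FF 2
en_joint = gated_clock_pulses([en1, en2])
```

Here clk_g pulses in 3 of the 10 cycles, while each flip-flop on its own would need only 2 and 1 pulses; the difference is the redundant clocking introduced by sharing the gater.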
[Figure: datapath with flip-flops FF (D1, Q1 and D2, Q2), a latch (D3, Q3), combinational logic CL, and clocks clk and clk_g.]
[Figure: k-fan-out clock tree with levels 0, 1, 2, ...; fan-out k = 2^K and n = 2^N leaves.]
[Figure: clock tree in which each gater combines the enables en1, ..., enk into en_joint and gates clk into clk_g; backward connection of the enabling signal.]
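Net saving per flip-flop when k flip-flops share a gater:
$$q^k (c_{FF} + c_W) - \frac{c_{latch}}{k}$$
First term: net saving at a leaf flip-flop (q: switching probability of FF enabling).
Second term: latch overhead amortized over k FFs.
Differentiating with respect to k:
$$q^k \ln q\,(c_{FF} + c_W) + \frac{c_{latch}}{k^2} = 0$$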
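The stationarity condition has no closed-form solution for k, but the best integer group size is easy to find by direct search. The sketch below assumes the net saving has the form q^k (c_FF + c_W) - c_latch / k, which is the expression whose derivative gives the condition above; the parameter values are hypothetical and in arbitrary units.

```python
def net_saving(k, q, c_ff, c_w, c_latch):
    """Assumed net saving per flip-flop when k flip-flops share one gater."""
    return q**k * (c_ff + c_w) - c_latch / k

def best_group_size(q, c_ff, c_w, c_latch, k_max=64):
    """Integer k in [1, k_max] that maximizes the net saving."""
    return max(range(1, k_max + 1),
               key=lambda k: net_saving(k, q, c_ff, c_w, c_latch))

k_opt = best_group_size(q=0.9, c_ff=0.6, c_w=0.4, c_latch=1.0)
```

Small k wastes the latch overhead on too few flip-flops; large k makes the q^k term vanish, so the optimum sits in between.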
Timing Implications
[Figure: timing diagram over the clock cycle TC for clk, clk_g, Q1, D2, and D3, annotated with tpA, tpcq_latch, tpcq_FF, tsetup_latch, tpd_logic, tsetup_FF, tpX, and tpO.]
Timing Constraints
clk_g:
$e_{ij} = (v_i, v_j) \in E$ is an FF pairing.
$a_i \mid a_j$ is the joint toggling.
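Total power:
$$P = 2\sum_{e_{ij} \in E} (a_i \mid a_j) = \sum_{v_i \in V} a_i + \sum_{e_{ij} \in E} \left[ \bar{a}_i (a_i \mid a_j) + \bar{a}_j (a_i \mid a_j) \right] = \sum_{v_i \in V} a_i + \sum_{e_{ij} \in E} (a_i \oplus a_j) = \sum_{v_i \in V} a_i + \sum_{e_{ij} \in E} w(e_{ij})$$
Essential + Waste: the first sum is the essential toggling, the second is the waste.
[Figure: toggling of a_i and a_j, and their combination.]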
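The decomposition into essential toggling plus waste suggests choosing the pairing that minimizes the total waste. The sketch below is an illustration only, not the method of the slides: it represents each activity a_i as a bit mask over clock cycles, measures toggling by counting 1 bits (an assumption), and finds the best pairing by exhaustive search, which is viable only for a handful of flip-flops and an even count. All names and data are hypothetical.

```python
def ones(x):
    """Number of cycles (1 bits) in an activity bit mask."""
    return bin(x).count("1")

def best_pairing(acts):
    """Exhaustive minimum-waste perfect matching; needs an even number of FFs."""
    def solve(rest):
        if not rest:
            return 0, []
        first, others = rest[0], rest[1:]
        best = None
        for j in others:
            remaining = [v for v in others if v != j]
            cost, pairs = solve(remaining)
            cost += ones(acts[first] ^ acts[j])       # waste of pairing first with j
            if best is None or cost < best[0]:
                best = (cost, [(first, j)] + pairs)
        return best
    return solve(list(range(len(acts))))

acts = [0b1100, 0b0011, 0b1101, 0b0111]               # hypothetical activities
waste, pairs = best_pairing(acts)
total = 2 * sum(ones(acts[i] | acts[j]) for i, j in pairs)
essential = sum(ones(a) for a in acts)
```

Pairing flip-flops with similar activity keeps the waste small; for this data the total of 12 clock pulses splits into 10 essential and 2 wasted.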
[Figure: eight flip-flops with activities a1, ..., a8, paired as a1 | a2, a3 | a4, a6 | a5, and a7 | a8.]
$\mathbf{v} \subseteq V$, $|\mathbf{v}| = k$, $e_{\mathbf{v}} = \{v_u\}_{u \in \mathbf{v}} \in E$ is a hyperedge, $|E| = n/k$.
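Total power:
$$P = \sum_{e_{\mathbf{v}} \in E} k \bigvee_{u \in \mathbf{v}} a_u = \sum_{v_i \in V} a_i + \sum_{e_{\mathbf{v}} \in E} \sum_{v \in \mathbf{v}} \bar{a}_v \bigvee_{u \in \mathbf{v}} a_u = \sum_{v_i \in V} a_i + \sum_{e_{\mathbf{v}} \in E} w(e_{\mathbf{v}})$$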