ECE 225

Lecture 16
Clocked Circuits, Timing and Clocking

Prof. Kaustav Banerjee


Electrical and Computer Engineering
University of California, Santa Barbara

Lecture 16, ECE 225 Kaustav Banerjee


Latches
Multiplexer based

CLK=1: D to Q (as long as CLK remains high, D will be written on Q)
CLK=0: Holds state of Q



Latch versus Register
Latch: stores data when clock is low (or high)
Register: stores data when clock rises (or falls)
[Figure: D/Clk/Q symbols and waveforms for a latch (left) and a register (right)]



Characterizing Timing
Register: tC2Q
Data is ready when Clk arrives….
Latch: tC2Q and tD2Q
Data may arrive after Clk edge….
Requires an extra timing parameter…
[Figure: register and latch symbols with D, Q and Clk waveforms marking tC2Q and tD2Q]



Registers or Flip-Flops

Combines two latches: one +ve sensitive latch (slave) and one -ve sensitive latch (master)
Edge Triggered FF or Master-Slave FF
[Figure: master latch followed by slave latch; QM is the master output]
CLK=0: D to QM (QM = D); slave holds previous value of Q
CLK=1: master can't sample input and holds the value of D; slave opens and Q = QM (= D)
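
The CLK=0 / CLK=1 behaviour above can be sketched as a small behavioural model. This is only an illustration, not a circuit-level description: the class name, the `step` method and the sample sequence are hypothetical, and delays are ignored.

```python
class MasterSlaveFlipFlop:
    """Behavioural sketch of a +ve edge-triggered register built from two latches."""

    def __init__(self, init=0):
        self.qm = init  # master latch output (QM)
        self.q = init   # slave latch output (Q)

    def step(self, clk, d):
        """Apply one (clk, d) sample and return Q."""
        if clk == 0:
            self.qm = d       # master transparent: QM = D, slave holds Q
        else:
            self.q = self.qm  # master holds QM, slave transparent: Q = QM
        return self.q


ff = MasterSlaveFlipFlop()
# Hypothetical stimulus: D changes while CLK is high on the third sample.
samples = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
trace = [ff.step(clk, d) for clk, d in samples]
print(trace)
```

In this model Q only takes a new value on a sample where CLK is 1 after QM was loaded while CLK was 0, so a change on D while CLK stays high does not reach Q.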



Timing Definitions
[Figure: timing diagram of CLK, D and Q for a register, with D stable over tsu before and thold after the CLK edge, and Q stable tc2q after the edge]

tsu= setup time =time for which the data inputs (D) must be valid before the CLK edge
thold= hold time =time for which data input must remain valid after the CLK edge
tc2q= worst case propagation time through the Register (w.r.t the CLK edge)



Maximum Clock Frequency

[Figure: edge-triggered register (FFs) clocked by CLK, feeding combinational LOGIC with delay tp,comb; tsetup and thold marked at the register input]
1) Tmin = tclk-Q + tp,comb + tsetup = minimum clock period
Clk period must accommodate the longest delay of any stage in the network
To ensure that the input data of the sequential elements is held long enough after the CLK edge and is not modified too soon by the new wave of data coming in:
2) tcdreg + tcdlogic > thold
tcd: contamination delay
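
As a worked example of constraints 1) and 2), the sketch below evaluates the minimum clock period and the hold check. All delay values are hypothetical numbers in picoseconds chosen only for illustration, and the function names are not from the lecture.

```python
def min_clock_period(t_clk_q, t_p_comb, t_setup):
    """Constraint 1: Tmin = tclk-Q + tp,comb + tsetup."""
    return t_clk_q + t_p_comb + t_setup


def hold_is_met(t_cd_reg, t_cd_logic, t_hold):
    """Constraint 2: tcdreg + tcdlogic > thold."""
    return t_cd_reg + t_cd_logic > t_hold


# Hypothetical delays, all in ps.
t_min_ps = min_clock_period(t_clk_q=100, t_p_comb=700, t_setup=50)
f_max_ghz = 1000.0 / t_min_ps  # 1000 ps per ns, so this is in GHz
print(t_min_ps, round(f_max_ghz, 3))
print(hold_is_met(t_cd_reg=40, t_cd_logic=30, t_hold=20))
```

Note that the two checks use different delays: the period constraint uses worst-case (propagation) delays, the hold constraint uses best-case (contamination) delays.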



Master-Slave +ve Edge Triggered
Register
Transistor Level Implementation
X-gate Multiplexer-based latch pair
[Figure: master stage (I1, T1, I2, T2, I3) with output QM, followed by slave stage (I4, T3, I5, T4, I6) with output Q; transmission gates T1 to T4 are driven by CLK]
CLK=0: T1 is on, T2 is off, D input sampled onto QM
T3 off and T4 on: I5 and I6 hold the state of the Slave
CLK=1: T3 is on, T4 is off, QM sampled onto Q
T2 on and T1 off: I2 and I3 hold the state of QM
Master-Slave +ve Edge Triggered
Register
Transistor Level Implementation
[Figure: same master-slave circuit (I1 to I6, T1 to T4, QM, Q)]
Tc-q = delay through T3 and I6
Tc-q = tpd_inv + tpd_tx
Since delay of I2 is included in set-up time, output of I4 is valid before the rising edge of CLK

tsu = set-up time = time before the rising edge of the CLK during which the D input should remain stable so that QM samples the value reliably
Since D must propagate through I1, T1, I3, and I2 before the rising edge (to ensure equal node voltages on both sides of the X-gate):
tsu = 3tpd_inv + tpd_tx
thold = 0 (since T1 is cut off after CLK edge)
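
The three timing parameters of this register can be evaluated from two gate delays, as in the sketch below. It assumes all inverters share one delay tpd_inv and all transmission gates share one delay tpd_tx, as the slide's expressions do; the numeric values (in ps) and the function name are hypothetical.

```python
def master_slave_timing(tpd_inv, tpd_tx):
    """Timing of the X-gate master-slave register from the slide's expressions."""
    return {
        "tsu": 3 * tpd_inv + tpd_tx,  # D propagates through I1, T1, I3, I2
        "tc_q": tpd_inv + tpd_tx,     # delay through T3 and I6
        "thold": 0,                   # T1 is cut off after the CLK edge
    }


# Hypothetical gate delays in ps.
timing = master_slave_timing(tpd_inv=30, tpd_tx=20)
print(timing)
```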
Pipelining: Optimizing Sequential Circuits
Widely used to accelerate the operation of datapaths in digital microprocessors…

Reference Circuit: computes log (|a+b|)
[Figure: inputs a and b pass through REGs into the adder, absolute-value and log blocks, then an output REG (Out); all REGs on CLK]
Tmin = tc-q + tpd,logic + tsu

Pipelined Circuit
[Figure: same datapath with additional REGs inserted between the blocks; all REGs on CLK]
Tmin,pipe = tc-q + max(tpd,adder, tpd,abs, tpd,log) + tsu

Computation of one set of input data spreads over several clock cycles.
Pipelining improves resource utilization and increases functional throughput.
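
A small numeric comparison of the two period expressions is sketched below. It assumes that the logic delay of the reference circuit is the sum of the adder, absolute-value and log delays; the delay values (in ps) and names are hypothetical.

```python
def t_min_reference(t_cq, stage_delays, t_su):
    """Tmin = tc-q + tpd,logic + tsu, with tpd,logic assumed to be the sum of all stages."""
    return t_cq + sum(stage_delays) + t_su


def t_min_pipelined(t_cq, stage_delays, t_su):
    """Tmin,pipe = tc-q + max(stage delays) + tsu."""
    return t_cq + max(stage_delays) + t_su


# Hypothetical block delays in ps.
stages = {"adder": 400, "abs": 300, "log": 500}
t_ref = t_min_reference(100, stages.values(), 50)
t_pipe = t_min_pipelined(100, stages.values(), 50)
print(t_ref, t_pipe)
```

With these numbers the clock period drops from 1350 ps to 650 ps, while one result still needs three clock cycles to pass through the pipelined datapath.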



Storage Mechanisms
A Static Latch: uses a bistable memory device (FF), more complex
Q = Clk' . Q + Clk . D   (Clk' = complement of Clk)
Dynamic Latch (charge-based): useful for frequently clocked structures
Stored value on a parasitic capacitor can remain for a limited time….few ms….need to refresh periodically
[Figure: static latch (D, CLK, Q with feedback) and dynamic latch (D, CLK, Q with storage capacitor)]



Dynamic X-gate Edge-Triggered
Registers
Tsu = tdx_T1
Thold ~ 0
Tc-q = 2td-inv + tdx_T2
[Figure: D, T1, node A (C1), I1, T2, node B (C2), I2, Q; master = T1 and I1, slave = T2 and I2; T1 and T2 driven by CLK and its complement]
C1 consists of gate cap. of I1, jnc. & overlap gate cap. of T1
Leakage will destroy the state if not refreshed
CLK=0: D input is sampled at storage node 1, Slave stage in hold mode with node 2 in high-
impedance state
At the rising edge of CLK: T2 is turned on and value sampled at node 1 right before the rising
edge is copied to Q
Very efficient: requires only 8 transistors, can be made even simpler (6 transistors) using NMOS-
only pass transistors
Race condition can occur due to CLK overlaps!
Robustness is also an issue….(state at the nodes can be distorted by injected noise)
Impact of Non-overlapping Clocks…on +ve
Edge Triggered Register
[Figure: D, T1, node A (C1), I1, T2, node B (C2), I2, Q, with T1 and T2 driven by CLK and its complement]
toverlap_0-0 < tT1 + tI1 + tT2
So that output Q doesn’t change on the falling CLK edge….
thold > toverlap_1-1
data must remain stable during the high overlap period
[Figure: CLK and complement waveforms marking toverlap_0-0 and toverlap_1-1]
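
The two overlap conditions can be checked together as in the sketch below. The function name and the delay values (in ps) are hypothetical and only illustrate how the inequalities are applied.

```python
def overlap_constraints_met(t_overlap_00, t_overlap_11, t_t1, t_i1, t_t2, t_hold):
    """Check both clock-overlap conditions for the dynamic X-gate register."""
    # 0-0 overlap must be shorter than the path through T1, I1 and T2.
    ok_00 = t_overlap_00 < t_t1 + t_i1 + t_t2
    # Data must be held longer than the 1-1 overlap.
    ok_11 = t_hold > t_overlap_11
    return ok_00 and ok_11


# Hypothetical values in ps.
print(overlap_constraints_met(50, 30, t_t1=20, t_i1=30, t_t2=20, t_hold=40))
print(overlap_constraints_met(80, 30, t_t1=20, t_i1=30, t_t2=20, t_hold=40))
```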



Making a Dynamic Latch Pseudo-Static
Problems with Dynamic Latches: 1) any signal net that is capacitively coupled to the
internal storage node (such as A…) can inject significant noise and destroy the state
2) Leakage..during CLK gating or slowdown 3) internal dynamic node doesn’t track
variations in Vdd

Problems of the dynamic register can be overcome by adding a weak feedback INV
[Figure: dynamic latch (D, CLK, storage node A) with a weak feedback inverter across the storage inverter]
Increases delay slightly…but improves noise immunity
Most registers should be pseudo-static or static unless used in
high-performance datapaths



Dual Edge-Triggered Master-Slave
Flip-Flop

• Higher throughput
• Data is sampled on both the positive edge and the negative edge of the clock
• Duty cycle of the clock should be 50%
• Setup time and hold time are important at both edges of the clock



Dual Edge-Triggered Master-Slave
Flip-Flop

D to Q: both at the
rising and falling
edge of CLK



Single vs Dual Edge Triggered FF Timing
• Single Edge Triggered: TCLK/2 > max(tS, tH)
• Double Edge Triggered: TCLK/2 > tS + tH

Note: lower frequency CLK giving same throughput, good for power savings
Other Latches/Registers: C2MOS
Clocked CMOS Register: +ve edge triggered Master-Slave FF
[Figure: master stage (M1 to M4, input D, storage node X with CL1) and slave stage (M5 to M8, output Q with CL2), each a clocked inverter between VDD and ground]
CLK=0: first tri-state buffer turns on….and Master samples the inverted version of D; Q retains previous value on CL2
CLK=1: Master is in hold mode…second stage turns on and CL1 propagates to the output Q

“Keepers” can be added to make circuit pseudo-static



C2MOS: Insensitive to Clock-Overlap
Either PUN or PDN is activated by overlaps…but
not both simultaneously….
[Figure: C2MOS register (M1 to M8, D, X, Q) during (a) (0-0) overlap and (b) (1-1) overlap of the clocks]

….hence any change in stored value at X cannot propagate to Q



Other Latches/Registers: TSPCR
(True Single-Phase Clocked Register)
No signal can ever propagate from input to output…because of the dual stage
approach….slightly increases number of transistors (=12)
[Figure: two-stage TSPC latches with In, CLK and Out]
Positive latch (transparent when CLK=1; when CLK=0, both inverters are disabled...latch is in hold mode)
Negative latch (transparent when CLK=0)
Including Logic in TSPC
Reduces delay overhead associated with the latches…..
[Figure: +ve TSPC latch with PUN and PDN logic blocks replacing the input transistors (left); AND +ve latch with inputs In1 and In2 and output Q (right)]
Example: logic inside the +ve latch; AND +ve latch

Note: set-up time increases a bit…but overall performance (TCLK) is improved….used in the design of DEC Alpha Microprocessors + other HP processors
A Specialized TSPC Edge-Triggered
Register
Tsu = td-inv
Thold < 1 x td-inv
Tcd-reg = 3td-inv
[Figure: three stages between VDD and ground: M1 to M3 (input D, output X), M4 to M6 (output Y), M7 to M9 (output Q), clocked by CLK]
Note: if D=1, X=0, hence, D must be stable until the value on X (before the rising edge of CLK) propagates to Y (hold time)
On the rising edge of CLK dynamic-inverter M4-M6 evaluates…inv M7-M9 is ON, and passes the value of Y to Q

First (static) inverter: samples inverted version of D at X …..for CLK=0


Second (dynamic) inverter: is in pre-charge mode…M6 pulls up Y
Third (static) inverter: is in hold mode….M8 and M9 are off, Q is stable
Pulse-Triggered Latches
An Alternative Approach
Ways to design an edge-triggered sequential cell:

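Master-Slave Latches: [Figure: latches L1 and L2 in series from Data, clocked by Clk and its complement]
Pulse-Triggered Latch: [Figure: single latch L from Data, clocked by a short pulse derived from Clk]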



Pulsed Latches
A short-pulse is constructed around the rising (or falling) edge of the CLK….
Pulse acts as the CLK input to a latch….sampling the input only in a small window…avoids RACE conditions
tsu=0; th=pulse-width; tcq=2 gate delays (w.r.t rising edge of CLKG)
[Figure: (a) register (M1 to M6, input D, output Q, clocked by CLKG); (b) glitch generation (MP, MN, node X, producing CLKG from CLK); (c) glitch clock (CLK and CLKG waveforms)]


TIMING……is everything!!



Timing Classification of Digital
Systems
Synchronous: A signal that has exactly the same frequency
as the local CLK and has a known fixed phase offset to that
CLK
Mesochronous: A signal that has the same frequency as the
local CLK but has an unknown phase offset w.r.t. that CLK.
Plesiochronous: A signal that has a “slightly different”
frequency compared to that of the local CLK. This causes the
phase difference to drift in time.
Asynchronous: A signal that has no fixed relationship w.r.t.
that of the local CLK. Asynchronous signals can transition any
time.



Synchronous Timing

[Figure: In, R1, Cin, Combinational Logic, Cout, R2, Out, with R1 and R2 on the same CLK]
The certainty period of the signal Cout (the period during which data are valid) is synchronized with the system CLK. Hence, R2 can sample the data with confidence.



Mesochronous Timing
• If data are being passed between two different CLK domains, the data signal transmitted from the first module can have an unknown phase relationship to the CLK of the receiving module
• Not possible to directly sample the output data at the receiving module because of uncertainty in the phase offset
• A synchronizer is needed to synchronize the data signal with the receiving CLK



Plesiochronous Timing
• Signal has a slightly different frequency compared to the local CLK; phase difference drifts over time
• Can arise due to the use of different CLK generators
• Practically, plesiochronous timing occurs in large distributed systems that involve long-distance communications
• A timing recovery circuit is needed to ensure that all data are received



Asynchronous Timing
• Signals transition at arbitrary times
• Can synchronize asynchronous signals by detecting events and by introducing latencies into the data stream synchronized to a local CLK
• Alternatively, a self-timed asynchronous design approach may be adopted: communication between modules achieved not through a CLK but through a handshaking protocol that ensures proper ordering of operation
• Computations can be performed at speeds determined solely by the latency of the logic: no need to manage CLK skew!
• Increased complexity + overhead in communication, impacts performance. Also, CAD methodology is more complex.



Timing Definitions…



Latch Parameters

[Figure: latch symbol (D, Q, Clk) and waveforms of Clk, D and Q marking clock period T, minimum pulse width PWm, tsu, thold, tc-q and td-q]

Delays can be different for rising and falling data transitions



Register Parameters

[Figure: register symbol (D, Q, Clk) and waveforms of Clk, D and Q marking clock period T, tsu, thold and tc-q]

Delays can be different for rising and falling data transitions


Timing Constraints
[Figure: In, R1, Combinational Logic, R2, with R1 clocked at tCLK1 and R2 at tCLK2; R1 has tc-q and tc-q,cd, the logic has tlogic and tlogic,cd, R2 has tsu and thold]
CLK period constraint: TCLK > tc-q + tlogic + tsu
Hold time constraint: t(c-q, cd) + t(logic, cd) > thold
Worst case is when receiving edge arrives late
Race between data and clock



Clock Nonidealities
• Clock skew
  • Spatial variation in temporally equivalent clock edges; deterministic + random, tSK
• Clock jitter
  • Temporal variations in consecutive edges of the clock signal; modulation + random noise
  • Cycle-to-cycle (short-term) tJS
  • Long term tJL
• Variation of the pulse width
  • Important for level sensitive clocking



Clock Skew and Jitter
[Figure: two Clk waveforms marking the skew tSK between them and the cycle-to-cycle jitter tJS on an edge]
• Both skew and jitter affect the effective cycle time
• Only skew affects the race margin



Clock Skew
[Figure: distribution of # of registers versus Clk delay; earliest occurrence of Clk edge at Nominal - δ/2, latest occurrence at Nominal + δ/2; insertion delay and max Clk skew marked]



Positive Skew

[Figure: CLK1 with edges 1 and 3, CLK2 with edges 2 and 4; CLK2 lags CLK1 by δ; TCLK, TCLK + δ and δ + th marked]
Launching edge arrives before the receiving edge



Timing Constraints: +ve Skew
[Figure: In, R1, Combinational Logic, R2, with R1 clocked at tCLK1 and R2 at tCLK2; tc-q, tc-q,cd, tlogic, tlogic,cd, tsu and thold marked]
Minimum cycle time: TCLK + δ = tc-q + tsu + tlogic
CLK skew can improve performance?
Watch out for RACE problems….(if minimum logic delay is small, inputs to R2 can change before the CLK2 edge at 2)
Minimum delay constraint: δ + thold < tc-q,cd + tlogic,cd



Negative Skew

δ is -ve
[Figure: CLK1 with edges 1 and 3, CLK2 with edges 2 and 4; CLK2 leads CLK1; TCLK and TCLK + δ marked]
Receiving edge arrives before the launching edge



Timing Constraints: -ve Skew
[Figure: In, R1, Combinational Logic, R2, with R1 clocked at tCLK1 and R2 at tCLK2; tc-q, tc-q,cd, tlogic, tlogic,cd, tsu and thold marked]
Minimum cycle time: TCLK - |δ| = tc-q + tsu + tlogic (δ < 0)
-ve skew adversely affects the performance….
Minimum delay constraint: δ + thold < tc-q,cd + tlogic,cd: eliminates RACE



Positive and Negative Skew
(a) Positive skew: CLK and data in the same direction
[Figure: In, R1, Combinational Logic, R2, Combinational Logic, R3; CLK enters at R1 and reaches tCLK1, tCLK2, tCLK3 through successive delays]
(b) Negative skew: CLK and data in the opposite direction
[Figure: same pipeline; CLK enters at R3 and reaches tCLK3, tCLK2, tCLK1 through successive delays]



How to counter Clock Skew?
[Figure: data and clock routing through a datapath of REGs (In, log, Out) together with the clock distribution; paths with negative skew and positive skew are marked]
Must account for the worst case skew…


Impact of Jitter
[Figure: CLK waveform with period TCLK whose edges can move between -tjitter and +tjitter; REGS fed from In and Combinational Logic on CLK, with tc-q, tc-q,cd, tsu, thold, tlogic, tlogic,cd and tjitter marked]
Worst Case: Tjitter = 2tjitter
Absolute jitter = tjitter = worst case variation of a CLK edge at a given location w.r.t. an ideally periodic reference CLK edge
Cycle-to-Cycle Jitter = Tjitter = time varying deviations of a single clock period relative to an ideal reference CLK:
T_jitter^i(n) = t_clk,n+1^i - t_clk,n^i - T_clk
where t_clk,n^i is the arrival time of the nth CLK edge at i
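
The cycle-to-cycle jitter definition (difference of consecutive edge arrival times at one location, minus the nominal period) can be evaluated on a list of edge times, as in this sketch. The arrival times (in ps) and the function name are hypothetical illustration values, not measurements.

```python
def cycle_to_cycle_jitter(arrivals, t_clk):
    """Return (t[n+1] - t[n]) - Tclk for each pair of consecutive CLK edges."""
    return [arrivals[n + 1] - arrivals[n] - t_clk for n in range(len(arrivals) - 1)]


# Hypothetical arrival times of CLK edges at one location, in ps; nominal period 1000 ps.
edges_ps = [0, 1000, 2010, 2990, 4000]
jitter_ps = cycle_to_cycle_jitter(edges_ps, t_clk=1000)
worst_ps = max(abs(j) for j in jitter_ps)
print(jitter_ps, worst_ps)
```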
Longest Logic Path in
Edge-Triggered Systems
[Figure: Clk waveform with period T, marking TClk-Q, TLogic, TSU and TJI + δ between the latest point of launching and the earliest arrival of the next cycle]



Shortest Path
[Figure: two Clk waveforms; TClk-Q and TLogic start at the earliest point of launching, TH is measured from the nominal clock edge of the receiving Clk]
Data must not arrive before this time



Impact of Jitter on Clock Constraints in
Edge-Triggered Systems
[Figure: Clk waveform with period TCLK; each edge can move by tjitter]
Jitter directly impacts the performance of sequential systems:
TCLK - 2tjitter > tc-q + tlogic + tsu

Combined effects of Skew and Jitter:
TCLK + δ - 2tjitter > tc-q + tlogic + tsu

Skew can be either positive or negative. +ve skew improves performance, jitter always degrades performance
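
Rearranging the combined constraint gives a lower bound on the clock period, TCLK > tc-q + tlogic + tsu - δ + 2tjitter, where δ is the (signed) clock skew. The sketch below evaluates that bound; the values (in ps) and the function name are hypothetical.

```python
def period_lower_bound(t_cq, t_logic, t_su, skew=0, t_jitter=0):
    """Lower bound on TCLK from TCLK + skew - 2*tjitter > tc-q + tlogic + tsu."""
    return t_cq + t_logic + t_su - skew + 2 * t_jitter


# Hypothetical delays in ps.
ideal = period_lower_bound(100, 700, 50)
pos_skew = period_lower_bound(100, 700, 50, skew=50, t_jitter=20)
neg_skew = period_lower_bound(100, 700, 50, skew=-50, t_jitter=20)
print(ideal, pos_skew, neg_skew)
```

In this example the +50 ps skew more than offsets the jitter penalty, while the same skew with negative sign adds to it.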



Impact of Jitter and Skew on Minimum Delay
Constraints in Edge-Triggered Systems
[Figure: CLK1 and CLK2 edges, each uncertain by tjitter; the receiving edge is late and thold is measured from it]
If launching edge (CLK1) is early and receiving edge (CLK2) is late:
thold + δ + 2tjitter < tc-q,cd + tlogic,cd

Minimum logic delay constraint:
δ < tc-q,cd + tlogic,cd - thold - 2tjitter

Acceptable skew is reduced by the jitter of the two signals
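
The skew budget implied by the minimum delay constraint (skew must stay below tc-q,cd + tlogic,cd - thold - 2tjitter) can be computed directly. The following is a sketch with hypothetical values in ps; the function names are not from the lecture.

```python
def max_acceptable_skew(t_cq_cd, t_logic_cd, t_hold, t_jitter):
    """Upper bound on skew: tc-q,cd + tlogic,cd - thold - 2*tjitter."""
    return t_cq_cd + t_logic_cd - t_hold - 2 * t_jitter


def race_free(skew, t_cq_cd, t_logic_cd, t_hold, t_jitter):
    """True when the skew satisfies the minimum delay constraint (strict inequality)."""
    return skew < max_acceptable_skew(t_cq_cd, t_logic_cd, t_hold, t_jitter)


# Hypothetical contamination delays, hold time and jitter in ps.
print(max_acceptable_skew(40, 30, 20, t_jitter=0))
print(max_acceptable_skew(40, 30, 20, t_jitter=10))
print(race_free(35, 40, 30, 20, t_jitter=10))
```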



Clock Uncertainties
Sources of clock uncertainty:
1 Clock Generation
2 Devices
3 Interconnect
4 Power Supply
5 Temperature
6 Capacitive Load
7 Coupling to Adjacent Lines



Clock Design Issues…



Clock Distribution
H-tree
Each path must be balanced: equal interconnect and device load
Difficult to achieve due to parameter variations
[Figure: H-tree distributing CLK from the center of the chip]
Clock is distributed in a tree-like fashion



More realistic H-tree
Matched RC Trees
Does not require a regular physical structure
Interconnections carrying CLK signals to different sub-blocks are of equal length
Chip is partitioned into balanced load segments (tiles)
Global CLK driver distributes the CLK to the tile drivers located at the dots in the figure

[Restle98]



The Grid System
[Figure: clock grid driven from all four sides by GCLK drivers]
Allows for late design changes….CLK is more easily accessible
Delay from the final driver to each load is not matched
Absolute delay is minimized
Large power dissipation: due to excess interconnects
•No rc-matching
•Large power



Example: DEC Alpha 21164
Clock Frequency: 300 MHz - 9.3 Million Transistors
0.5 um CMOS process
Total Clock Load: 3.75 nF
extensive use of dynamic logic….
Power in Clock Distribution network : 20 W (out of 50)
Uses Two Level Clock Distribution:
• Single 6-stage driver at center of chip
• Secondary buffers drive left and right side clock grid in Metal3 and Metal4
Total driver size: 58 cm!



21164 Clocking
[Figure: clock waveform (tcycle = 3.3 ns, trise = 0.35 ns, tskew = 150 ps) and location of clock driver on die (pre-driver and final drivers)]
• 2 phase single wire clock, distributed globally
• 2 distributed driver channels
  • Reduced RC delay/skew
  • Improved thermal distribution
  • 3.75 nF clock load
  • 58 cm final driver width
• Local inverters for latching
• Conditional clocks in caches to reduce power
  • More complex race checking
  • Device variation
Clock Drivers



Clock Skew in Alpha Processor

Skew is zero at
the output of left
and right
drivers….



21264 Clocking



EV6 (Alpha 21264) Clocking
600 MHz, 0.35 micron CMOS
[Figure: global clock waveform (tcycle = 1.67 ns, trise = 0.35 ns, tskew = 50 ps) and clock distribution from the PLL]
• 2 Phase, with multiple conditional buffered clocks
  • 2.8 nF clock load
  • 40 cm final driver width
• Local clocks can be gated “off” to save power
  • Reduced load/skew
  • Reduced thermal issues
• Multiple clocks complicate race checking
EV6 Clock Results
[Figure: GCLK Skew at Vdd/2 crossings (scale 5 to 50 ps) and GCLK Rise Times, 20% to 80% extrapolated to 0% to 100% (scale 300 to 345 ps)]



EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains

[Figure: PLL and SYSCLK feeding GCLK (CPU Core); DLLs derive NCLK (Mem Ctrl), L2L_CLK and L2R_CLK (L2 Cache)]
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks

