Clock and Data Recovery
(MEL G623)
Project Report
On
DESIGN OF CLOCK AND DATA RECOVERY
CIRCUITS WITH FAST ACQUISITION AND
LOW JITTER
Submitted By:
Dilsya Joy 2017H1230220P
Submitted To:
Dr. Anu Gupta
ABSTRACT
ACKNOWLEDGEMENTS
I would like to express my gratitude to my Professor Dr. Anu Gupta for guiding
me throughout this project with her invaluable knowledge. She was always
supportive of my work since I began studying VLSI courses at BITS Pilani.
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
4.2.1 Harmonic Oscillator
4.2.2 Relaxation Oscillator
4.2.3 Ring Oscillator
4.2.4 LC Oscillator
CHAPTER 6 CONCLUSION
REFERENCES
LIST OF FIGURES
Fig. 5.2 Simulation of CDR with TSPC
Fig. 5.3 Simulation of CDR with CML
Fig. 5.4 Successful Sampling of noisy data
Fig. 5.5 Unsuccessful sampling of high noise data
Fig. 5.6 Eye Diagram of Ring Oscillator
Fig. 5.7 Eye-Diagram of CDR with LC Oscillator
Fig. 5.8 Eye-Diagram of Modified CDR
CHAPTER 1
INTRODUCTION
In wire-linked communication systems, data often travels over a single wire
without an accompanying clock, yet the receiver must process this data
synchronously. CDR circuits are therefore used in the receiver to recover the
clock, or timing information, from the data itself. Data bandwidth for
wire-linked communication systems is also increasing at a high rate. In 2007,
the International Technology Roadmap for Semiconductors (ITRS) projected that
the non-return-to-zero (NRZ) data rate for high-performance differential-pair
point-to-point nets on the package would reach 100 Gbps by the year 2019.
In such high-speed wire-linked communication systems, the data is corrupted by
both internal and external noise during its passage from transmitter to
receiver, resulting in jitter and skew in the received data. The clock and data
recovery circuit is needed to extract the transmitted data from the corrupted
received signal and to recover the accompanying clock timing information at the
receiver side of the communication system.
In a source-asynchronous system, the transmitter and receiver use different
clock sources of nominally the same frequency. Because of natural device
mismatches, a frequency offset exists between the transmitted data and the
local clock on the receiver side, which creates challenges for CDR circuit
designers. The received data is first equalized in the receiver input buffer
and then fed to the CDR circuit for retiming before proceeding into the
deserializer module. Clock and data recovery (CDR) is widely used in data
communication systems, including optical communication, backplane routing,
chip-to-chip interconnects, and disk drive read channels.
edges of the clock occur in the midpoint of each bit, the sampling
occurs farthest from the data transitions, providing maximum margin
for jitter and other time uncertainty.
The clock must exhibit a small jitter since the jitter of the clock
contributes to the retimed data jitter.
The data retiming circuit uses a Delay Flip Flop (DFF), which is triggered by the
recovered clock to retime the received data. The DFF samples the corrupted
received data and regenerates the data with less jitter and skew.
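The retiming step can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the circuit used in this project: the function names, the bit period of 100 time units and the per-edge jitter values are all hypothetical, and the recovered clock is assumed to be ideal with its sampling edge at the midpoint of each bit.

```python
def received_level(t, bits, bit_period, edge_jitter):
    """Level of a jittery NRZ waveform at time t.

    edge_jitter[k] is the displacement of the boundary between
    bit k-1 and bit k (positive = the transition arrives late).
    """
    k = int(t // bit_period)
    # The transition into bit k has not arrived yet: still the previous bit.
    if k > 0 and t < k * bit_period + edge_jitter[k]:
        return bits[k - 1]
    # The transition into bit k+1 arrived before t: already the next bit.
    if k + 1 < len(bits) and t >= (k + 1) * bit_period + edge_jitter[k + 1]:
        return bits[k + 1]
    return bits[k]


def retime(bits, bit_period, edge_jitter):
    """Model of a DFF clocked at the midpoint of every bit."""
    return [
        received_level((k + 0.5) * bit_period, bits, bit_period, edge_jitter)
        for k in range(len(bits))
    ]


sent = [1, 0, 1, 1, 0]
# Edge displacements smaller than half a bit period are absorbed.
clean = retime(sent, 100, [0, 20, -30, 10, -40])
# A displacement larger than half a bit period causes a bit error.
broken = retime(sent, 100, [0, 60, 0, 0, 0])
```

In this model the retimed output carries none of the input edge jitter as long as every edge displacement stays below half a bit period, which is the margin that mid-bit sampling provides.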
Both phase locked loops (PLLs) and delay locked loops (DLLs) have been widely
used in clock and data recovery. PLL-based CDRs usually use narrow-band loop
filters to reduce jitter, which results in longer acquisition times, typically
in the 0.5 to 1 microsecond range. Because the jitter is low, less coding is
needed to reduce the number of bit errors. DLL-based CDRs can lock to the data
in just a few clock cycles by means of phase selection, but they have high
jitter, which results in a higher bit error rate, so more coding overhead is
needed to reduce the number of bit errors. There is therefore a tradeoff
between fast acquisition and low jitter.
CHAPTER 2
LITERATURE REVIEW
In many systems, data is transmitted or retrieved without any additional timing
reference, but the receiver must eventually process the data synchronously.
Thus, the timing information (e.g. the clock) must be recovered from the data
at the receiving end. The common ways to recover the clock are with a phase
locked loop (PLL) or a delay locked loop (DLL).
Fig. 2.1 Basic elements of the PLL based CDR
of the data transition density of the input, this design fails to uniquely
represent the phase difference for various data patterns, as shown in Figure
2.2; this design is therefore data pattern dependent.
One example of a linear phase detector is the Hogge Phase Detector. The
circuit implementation and output waveforms are shown in Figure 2.3. The Hogge
Phase Detector consists of two DFFs and two Exclusive OR (XOR) gates. The
function of a DFF is to produce a delayed replica of the input signal at its
output. The first DFF, named FF1, produces a delayed replica of the input data
at the rising edge of the clock; this replica is then XORed with the input
data. The output of this XOR gate, at node X, is a pulse whose width is
proportional to the phase difference between the two input signals. To avoid
the problem of data pattern dependency, the proportional pulses obtained at
node X are accompanied by reference pulses at node Y, which are generated by
an additional DFF (FF2) and XOR gate. A reference pulse appears on each data
edge and has a constant width, thus avoiding the pattern dependency, as shown
in Figure 2.3.
To summarize, this PD samples the data using the VCO clock, with a DFF and an
XOR gate performing the explicit function of edge detection. A reference pulse
is then produced, using the other DFF and XOR gate, to eliminate the ambiguity
for different data transitions. Because of the clock-to-Q delays of the
flip-flops, Din and the clock must sustain a skew to equalize the widths of the
output pulses. This skew effect becomes significant at high speeds, as the skew
ΔT becomes a significant fraction of the clock period. This may lead to a phase
offset after the loop is locked, which degrades the clock phase margin and
jitter tolerance. To overcome this, we can widen the reference pulses by ΔT/2
or narrow the proportional output pulses by the same amount. One disadvantage
of the Hogge PD is that there is a skew of TClk/2 between the two output pulses
in the locked condition, which causes considerable disturbance in the VCO.
Because this is a linear PD, it sends only small average signals to the charge
pump, resulting in little activity at the charge pump.
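The linear behaviour of the Hogge PD can be summarized with an idealized pulse-width model. The sketch below is an assumption-laden illustration, not a simulation of the circuit in Figure 2.3: it ignores the clock-to-Q delay, assumes the sampling edge stays within half a bit period of the bit centre, and uses hypothetical names and values throughout.

```python
def hogge_pulse_widths(bit_period, clock_offset):
    """Idealized pulse widths produced by one data transition.

    clock_offset is how far the sampling (rising) clock edge sits after
    the centre of the bit; a negative value means the clock is early.
    Valid only for |clock_offset| < bit_period / 2.
    """
    if abs(clock_offset) >= bit_period / 2:
        raise ValueError("offset must be smaller than half a bit period")
    # Node X: from the data transition until FF1 captures the new bit.
    proportional = bit_period / 2 + clock_offset
    # Node Y: from the rising edge (FF1) to the falling edge (FF2).
    reference = bit_period / 2
    return proportional, reference


def hogge_net_error(bit_period, clock_offset):
    """Net (X minus Y) pulse width seen by the charge pump per transition."""
    proportional, reference = hogge_pulse_widths(bit_period, clock_offset)
    return proportional - reference


# Hypothetical 10 Gbps example: 100 ps bit period.
locked = hogge_net_error(100e-12, 0.0)
late_10ps = hogge_net_error(100e-12, 10e-12)
early_10ps = hogge_net_error(100e-12, -10e-12)
```

Because the reference pulse is produced once per transition with a fixed width, the net error depends only on the clock offset and not on how many transitions the data pattern contains.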
In binary phase detectors, the output is either logic one or zero. One example
of a binary phase detector is the Alexander Phase Detector. The Alexander Phase
Detector accepts two input signals (e.g. clock and data) and determines whether the
clock is earlier or later than the data. If the clock is earlier than the data, the early
node goes to logic one and the late node goes to logic zero. Otherwise, when the
clock is later than the data, the late node goes to logic one and early node goes to
logic zero. A more detailed explanation of the Alexander Phase Detector is
presented in Chapter 3.
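The early/late decision can be sketched as a small truth function. This is a behavioural illustration under an assumed sampling convention, which may not match the sample labels used in the figures of this report: `d_prev` and `d_next` are two consecutive data samples taken at mid-bit, and `edge` is the sample taken by the opposite clock edge between them, near the data transition. All names are hypothetical.

```python
def alexander_decision(d_prev, edge, d_next):
    """Return 'early', 'late' or 'hold' from three consecutive samples.

    d_prev, d_next: data samples taken one clock period apart.
    edge: sample taken half a period after d_prev, near the transition.
    """
    if d_prev == d_next:
        # No data transition between the two data samples:
        # nothing can be said about the clock phase.
        return "hold"
    if edge == d_prev:
        # The edge sample still sees the old bit, so the clock edge
        # arrived before the data transition.
        return "early"
    # The edge sample already sees the new bit: the clock edge
    # arrived after the data transition.
    return "late"


def early_late_nodes(d_prev, edge, d_next):
    """Map the decision to the (early, late) logic levels."""
    decision = alexander_decision(d_prev, edge, d_next)
    return (int(decision == "early"), int(decision == "late"))
```

When all three samples are equal, both nodes stay at logic zero, so the loop receives no correction during long runs without transitions.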
2.1.2 Charge Pump (CP) and Low Pass Filter (LPF)
The function of the charge pump is to convert the output voltage of the
phase detector to current. This current is then fed to a low pass filter, where the
capacitor is either charged or discharged depending on the phase detector output.
The circuit diagram of the charge pump with a Type-I LPF (capacitor) is shown in
Figure 2.4 and a Type-II LPF is shown in Figure 2.5.
In this project, the Alexander Phase Detector is used, where the output is
either early or late. The early and late nodes are connected to respective switches
of the charge pump circuit, as shown in Figure 2.5. When the early node is high,
closing the early switch, the capacitor starts charging and continues to charge until
the early node goes low, opening the early switch. Similarly, when the late node
goes high, the capacitor starts to discharge and will continue to discharge until the
late node goes low.
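The charging and discharging behaviour described above can be modelled as an ideal current source switched onto a capacitor. The sketch below is a simplified illustration that assumes perfectly matched currents and no leakage; the function name and the values of the pump current, capacitance and time step are hypothetical and are not taken from the designed circuit.

```python
def charge_pump_vcontrol(pulses, i_cp, cap, dt, v0=0.0):
    """Integrate early/late pulses onto the loop capacitor.

    pulses: sequence of (early, late) booleans, one pair per time step.
    i_cp:   pump current in amperes (same for charge and discharge).
    cap:    loop capacitor in farads.
    dt:     duration of one time step in seconds.
    Returns the control voltage after each step.
    """
    v = v0
    trace = []
    step = i_cp * dt / cap  # voltage change per active time step
    for early, late in pulses:
        if early and not late:
            v += step      # early switch closed: capacitor charges
        elif late and not early:
            v -= step      # late switch closed: capacitor discharges
        trace.append(v)    # both or neither active: voltage holds
    return trace


# Hypothetical values: 100 uA pump current, 1 pF capacitor, 10 ps steps.
pattern = [(True, False)] * 3 + [(False, False)] * 2 + [(False, True)] * 3
trace = charge_pump_vcontrol(pattern, i_cp=100e-6, cap=1e-12, dt=10e-12)
```

With these numbers each active step moves the control voltage by 1 mV, so the trace rises linearly, holds, and then falls back linearly.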
Fig. 2.4 Charge Pump with Type-I LPF
Designing a charge pump is not an easy task: to achieve zero net change in the
capacitor voltage, the charging current must equal the discharging current.
Even if the two currents are designed to be nearly equal, leakage current
through the charge pump circuit still produces an offset voltage on the
capacitor. One way to minimize this offset voltage is to calibrate the charge
pump circuit using a feedback loop.
The function of the low pass filter (LPF) is to convert the charge pump
current into the control voltage. The Type-I LPF is replaced by a Type-II LPF
because of trade-offs between the settling time, the ripple on the control
voltage, the phase error, and stability. To minimize the ripple on the control
voltage, the capacitor from Figure 2.4 is replaced by a resistor (R) in series
with a capacitor (C1), both in parallel with a second capacitor (C2), as shown
in Figure 2.5. If C2 is five to ten times smaller than C1, the Type-II LPF
still behaves approximately as a Type-I LPF.
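For reference, the impedance of the filter described above (R in series with C1, the pair in parallel with C2) can be written out explicitly. This is a standard network result stated here as a worked equation, not a result taken from the simulations in this report; I_cp denotes the charge pump current and is a symbol introduced only for this illustration.

```latex
Z(s) = \left(R + \frac{1}{sC_1}\right) \parallel \frac{1}{sC_2}
     = \frac{1 + sRC_1}{s\,(C_1 + C_2)\left(1 + sR\,\dfrac{C_1 C_2}{C_1 + C_2}\right)},
\qquad
V_{\mathrm{control}}(s) = I_{cp}(s)\,Z(s)
```

The network therefore contributes a pole at the origin, a zero at 1/(R C1), and a second pole at (C1 + C2)/(R C1 C2). Making C2 several times smaller than C1 pushes that second pole well above the zero.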
2.1.3 Voltage Controlled Oscillator (VCO)
The function of the voltage controlled oscillator is to generate a clock signal
at its output, the frequency of which can be changed by varying the input
control voltage. Oscillators have wide applications in communication systems,
ranging from clock generation in microprocessors to carrier synthesis in
cellular telephones. The detailed working of oscillators is explained in
Chapter 4.
CHAPTER 3
ALEXANDER PHASE DETECTOR
data at S3 on the falling edge of the clock and the fourth flip flop (FF4) delays the
output of the third flip-flop (FF3) by half a clock cycle.
As seen from the waveform of Figure 3.3, for the early case, FF1 samples the
high data level (logic one) at the first rising edge of the clock. At the
second rising edge of the clock, FF2 performs two functions:
1) Produces the replica of the first sample (S1), delayed by one clock cycle,
at the output of FF2, and
2) Samples the low data level (logic zero).
FF3 samples the high data level (logic one) at the first falling edge of the
clock.
At the next rising edge of the clock, FF4 produces the replica of the second
sample (S2), delayed by half a clock cycle, at its output. The clock phases of
all four DFFs should be such that the three samples S1, S2, and S3 reach a
valid logic level for comparison at t = T1 and remain constant for one clock
period. Once the three samples S1, S2, and S3 have reached valid logic levels
and remain constant for one clock period, the XOR gates produce valid logic
levels at their outputs. The process is reversed for the late case, as shown in
Figure 3.1.
S1 S2 S3 Decision
0 0 0 Cannot determine whether the clock is earlier or later than the data.
1 1 1 Cannot determine whether the clock is earlier or later than the data.
Table 3.1: Decisions of the Alexander PD.
Fig. 3.1 Alexander phase detector
3.1.1 Transistor level Alexander PD with TSPC Logic
The complete Alexander PD, designed in Cadence using TSMC 180 nm technology,
is shown in Fig. 3.2. It consists of three positive-edge-triggered D-FFs, one
negative-edge-triggered D-FF, and two XOR gates, all using TSPC logic. The
sizes of the transistors used in TSPC logic are given in Table 3.2. All
dimensions are given in micrometers. The circuit consists of alternating stages
called n-blocks and p-blocks, and each block is driven by the same clock
signal. The schematic of the original TSPC flip-flop is shown in Fig. 3.3. In
this design, only a single global clock signal needs to be generated and
distributed, which simplifies the design. Fig. 3.3 presents the
positive-edge-triggered TSPC D flip-flop. It operates as follows: when the
clock signal clk is LOW, the input is isolated from the output. When the clock
makes a LOW-to-HIGH transition, the output latches the complement of the
input.
Transistor Size
PMOS 8/0.18
NMOS 4/0.18
Table 3.2: Dimension of TSPC D-FF
Fig. 3.3 shows the schematic of the positive-edge-triggered TSPC (True Single
Phase Clocking) D flip-flop with 11 transistors; this edge-triggered flip-flop
uses just a single clock signal for synchronization. During the ON period, the
value at the input is passed to the output.
This TSPC-based implementation fails at higher data rates because it is not
able to sample the inputs at such rates. Fig. 3.4 shows that the TSPC D-FF was
not able to sample the data correctly at a data rate of 10 Gbps. Hence the need
for another type of logic that is capable of wideband operation.
their low current gain. To increase the current gain, the size of the transistor in the
output branch can be made large, however, at the cost of reduced bandwidth. The
schematic of CML D-FF is shown in Fig. 3.5.
M0,M1 M2,M3,M4,M5 L R Iss
5/0.18 12/0.18 1p 300 10m
Table 3.3: Sizes of CML D-FF
The simulation of this D-FF showed that the data is retained even at a data
rate of 10 Gbps.
NMOS L R Iss
12/0.18 1p 600 2.5m
Table 3.4: Sizes of CML EXOR gate
Fig. 3.7 CML Based EXOR Gate
Fig. 3.8 Simulation of CML Alexander PD
Fig. 3.10 Clock Early
CHAPTER 4
CHARGE PUMP AND VCO
Fig. 4.1 Schematic of Charge Pump
When the up pulses arrive, the upper circuit charges the capacitor C0 because
the transistor M21 is off. Similarly, when the down pulses arrive, the lower
circuit discharges the capacitor C0 because the transistor M10 is off. A
current mirror is used for charging and discharging. The voltage across the
capacitor is the control voltage (vcontrol) for the VCO.
The transistor sizes are shown in Table 4.1.
4.1.1 Simulation Results
As the results show, when continuous up pulses are applied as in Fig. 4.2, a
piecewise-linear increase is observed in vcontrol. Similarly, when continuous
down pulses are applied as in Fig. 4.3, a piecewise-linear decrease is observed
in vcontrol.
4.2.2 Relaxation Oscillator
Fig. 4.4 Schematic of Ring Oscillator
The NOT gates, or inverters, are attached in a chain; the output of the last
inverter is fed back into the first. Because a single inverter computes the
logical NOT of its input, the last output of a chain of an odd number of
inverters is the logical NOT of the first input. This final output is asserted
a finite amount of time after the first input is asserted; the feedback of this
last output to the input causes oscillation. A real ring oscillator only
requires power to operate; above a certain threshold voltage, oscillations
begin spontaneously. To increase the frequency of oscillation, two methods may
be used. Firstly, the applied voltage may be increased; this increases both the
frequency of the oscillation and the power consumed, which is dissipated as
heat. Secondly, the ring may be built from a smaller number of inverters, which
shortens the total delay around the loop.
The schematic for a 3-stage ring oscillator is shown in Fig. 4.4. The widths of
the PMOS and NMOS transistors are in the ratio Wp:Wn=12:0.18.
The output of the third stage is inverted, because an odd number of stages is
used, and this output is fed back to the first stage. Therefore the output of
the third stage keeps changing after each cycle, and this results in
oscillations.
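A first-order estimate of the oscillation frequency follows from the loop delay: a transition must travel around the ring twice to complete one period, so f = 1/(2·N·t_d) for N stages with delay t_d per stage. The sketch below applies this textbook relation; the function name and the 40 ps stage delay are hypothetical and do not come from the simulated circuit.

```python
def ring_oscillator_frequency(num_stages, stage_delay):
    """First-order frequency estimate f = 1 / (2 * N * t_d).

    num_stages:  number of inverters in the ring (odd, at least 3).
    stage_delay: propagation delay of one inverter in seconds.
    """
    if num_stages < 3 or num_stages % 2 == 0:
        raise ValueError("a single-ended ring needs an odd number of stages >= 3")
    if stage_delay <= 0:
        raise ValueError("stage delay must be positive")
    return 1.0 / (2 * num_stages * stage_delay)


# Hypothetical example: 3 stages with an assumed 40 ps delay per inverter.
f_three = ring_oscillator_frequency(3, 40e-12)
# Adding stages lowers the frequency for the same stage delay.
f_five = ring_oscillator_frequency(5, 40e-12)
```

The estimate also shows why the control voltage tunes the frequency: anything that changes the per-stage delay changes f in inverse proportion.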
4.2.4 LC Oscillator
An LC oscillator is a feedback oscillator that uses capacitors and inductors in
its feedback network. It can be built from a transistor, an operational
amplifier, a tube, or some other active (amplifying) device. Oscillation is
brought about by applying a portion of the amplifier's output signal to its
input. That feedback signal must be applied in phase with the original input
signal. The amplifier is usually an inverter that provides 180° of phase shift
by itself, and an additional 180° of phase shift must be provided through some
other means.
In an LC oscillator circuit, the feedback network is a tuned circuit (often
called a tank circuit). The tuned circuit is a resonator consisting of an inductor (L)
and a capacitor (C) connected together. Charge flows back and forth between the
capacitor's plates through the inductor, so the tuned circuit can store electrical
energy oscillating at its resonant frequency. There are small losses in the tank
circuit, but the amplifier compensates for those losses and supplies the power for
the output signal. LC oscillators are often used at radio frequencies, when a tunable
frequency source is necessary, such as in signal generators, tunable radio
transmitters and the local oscillators in radio receivers. Typical LC oscillator
circuits are the Hartley, Colpitts and Clapp circuits.
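The resonant frequency of an ideal tank circuit is f0 = 1/(2π√(LC)). The sketch below evaluates this standard relation; the function name and the 1 nH and 1 pF component values are hypothetical and are chosen only to give a frequency in the gigahertz range.

```python
import math


def lc_resonant_frequency(inductance, capacitance):
    """Resonant frequency of an ideal LC tank, f0 = 1 / (2*pi*sqrt(L*C)).

    inductance in henries, capacitance in farads, result in hertz.
    """
    if inductance <= 0 or capacitance <= 0:
        raise ValueError("L and C must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


# Hypothetical tank: 1 nH inductor with 1 pF of total capacitance.
f0 = lc_resonant_frequency(1e-9, 1e-12)
# Quadrupling the capacitance halves the resonant frequency.
f0_quarter = lc_resonant_frequency(1e-9, 4e-12)
```

Because f0 varies as the inverse square root of C, a voltage-dependent capacitance in the tank gives a tunable oscillator.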
The LC-tank VCO incorporated in this design is shown in Fig. 4.6. The bias
current, inductors, and device sizes are chosen so that it reaches optimal
performance. The resistor is used to lift the output common-mode level by
200 mV so as to relax the voltage headroom of the subsequent buffers (realized
as differential pairs). Table 4.2 shows the sizes of the transistors.
Fig. 4.5 Frequency vs. Vcontrol
Fig. 4.6 Schematic of LC Oscillator
The LC oscillator has been simulated for frequency versus Vcontrol. The
sensitivity obtained is 900 MHz/V, and the result is shown in Fig. 4.5.
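Around the operating point, the tuning curve can be approximated by a straight line whose slope is the reported sensitivity. The sketch below is a linearized model only: the 900 MHz/V gain is taken from the text, while the function name, the centre frequency and the centre voltage are hypothetical placeholders that are not given in this report.

```python
K_VCO_HZ_PER_V = 900e6  # sensitivity reported in the text


def vco_frequency(v_control, f_center, v_center=0.0, k_vco=K_VCO_HZ_PER_V):
    """Linearized tuning curve: f = f_center + K_vco * (v - v_center).

    f_center and v_center describe an assumed operating point; they are
    placeholders and not values measured in this project.
    """
    return f_center + k_vco * (v_control - v_center)


# Hypothetical operating point: 4 GHz at a control voltage of 0.9 V.
f_nominal = vco_frequency(0.9, f_center=4e9, v_center=0.9)
f_shifted = vco_frequency(1.0, f_center=4e9, v_center=0.9)
delta_f = f_shifted - f_nominal
```

Under this model a 100 mV change in the control voltage shifts the oscillation frequency by about 90 MHz.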
CHAPTER 5
CDR BLOCK
5.1 Complete CDR Architecture with TSPC
As the result shows, the complete architecture, when simulated with the
TSPC-based PD, is not able to reproduce the data at a high data rate of 7 Gbps.
5.2 Complete CDR Architecture with CML
5.3 Data Recovery with Noisy Data
As the results show, when a small noise of 100 mV (p-p) is present in the
circuit, the data is recovered. When the noise is increased to 900 mV (p-p),
the circuit is no longer able to reproduce the data correctly.
Fig. 5.7 Eye-Diagram of CDR with LC Oscillator
Fig. 5.8 Eye-Diagram of Modified CDR
The results of the eye diagrams are compared in the table. As can be seen, the
jitter is lowest in the architecture with the LC oscillator and rectifier
block.
Architecture                         Jitter      Eye Opening (Vertical)
CDR with Ring Oscillator (~4GHz)     26.844ps    1.5V
CHAPTER 6
CONCLUSION