Lecture 03 13 PDF
[Figure: MOS transistor cross-section and equivalent circuit showing the parasitic capacitances Cgs, Cgd, Cgb (gate), Csb and Cdb (source/drain to substrate), the gate oxide thickness tox, and the depletion layer]
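In the cutoff region, the gate-to-channel capacitance is composed entirely of Cgb:

$C_{gb} = C_{ox} W L_{eff}$, where $C_{ox} = \varepsilon_0 \varepsilon_{SiO_2} / t_{ox}$

When the channel forms, it shields the gate from the substrate, so Cgb falls to approximately zero. In the linear region, the gate-to-channel capacitance is split evenly between Cgs and Cgd:

$C_{gs} = C_{gd} = \tfrac{1}{2} C_{ox} W L_{eff}$

In saturation, the channel is pinched off at the drain end, so $C_{gs} = \tfrac{2}{3} C_{ox} W L_{eff}$ and $C_{gd} \approx 0$.

        Cutoff           Linear               Saturation
Cgb     Cox W Leff       ~0                   ~0
Cgs     0                (1/2) Cox W Leff     (2/3) Cox W Leff
Cgd     0                (1/2) Cox W Leff     ~0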
Source/Drain Capacitance
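[Figure: top view and cross-section of the source and drain diffusions beside the poly gate. Each diffusion has dimensions a by b and junction depth xj, with a bottom area (Cja) and side walls (Cjp) facing the substrate. ND is the doping of the diffusion, NA the doping of the substrate.]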
Two components:
  Cbottom: diffusion area to substrate
  Csidewall: peripheral (side wall) area, set by the diffusion depth

  Cja = junction capacitance per µm²
  Cjp = periphery capacitance per µm

$C_{diff} = C_{bottom} + C_{sw} = C_{ja} \times \text{area} + C_{jp} \times \text{perimeter} = C_{ja}\, a b + C_{jp} (2a + 2b)$
The source/drain areas form p-n junctions with the substrate or well. The junction voltage affects the capacitance, both Cja and Cjp.
General expression:

$C_j = C_{jo} \left( 1 - \dfrac{V_j}{V_b} \right)^{-m}$

where
  Vj = junction voltage, negative for reverse bias
  Cjo = zero bias capacitance
  Vb = built-in junction potential (0.6V)
  m = grading coefficient (typical values between 0.3 and 0.5)
Definitions:
  AS = area of Source
  AD = area of Drain
  PS = perimeter of Source
  PD = perimeter of Drain

The TOX parameter allows computation of Cox.

Cg = Cg(intrinsic) + Cg(extrinsic)
Cg(intrinsic) = Cox W Leff (only 2/3 of this if in saturation)

Extrinsic Cg is caused by overlap of the gate with the source/drain and channel.
[Figure: top view of the poly gate crossing the channel of length L between source and drain]

Cgbo is caused by the poly extension past the channel.
Cgso, Cgdo are caused by the overlap of poly with the source/drain.
Oxide encroachment:
Cgbo is multiplied by the channel length; Cgso and Cgdo are multiplied by the channel width.

Typically, gate capacitance will tend to dominate drain and source capacitance, but this can vary significantly with process.

Example from book:

$C_{g(intrinsic)} = W \times L \times C_{ox} = 4 \times 1 \times 17 \times 10^{-4}$ [pF] = 0.0068 [pF]

In this example, the extrinsic gate capacitance for a typical MOS transistor is

$C_{g(extrinsic)} = (W \times C_{gso}) + (W \times C_{gdo}) + (2L \times C_{gbo}) = (4 \times 6 \times 10^{-4}) + (4 \times 6 \times 10^{-4}) + 2 (1 \times 2 \times 10^{-4})$ [pF] = 0.0052 [pF]
The source/drain junction capacitance is computed as

$C_j = \text{Area} \times CJ \times (1 + VJ/PB)^{-MJ} + \text{Periphery} \times CJSW \times (1 + VJ/PB)^{-MJSW}$

where
  CJ = the zero-bias capacitance per junction area
  CJSW = the zero-bias junction capacitance per junction periphery
  MJ = the grading coefficient of the junction bottom
  MJSW = the grading coefficient of the junction sidewall
  VJ = the junction potential (here, the magnitude of the reverse bias)
  PB = the built-in voltage (~ 0.4 to 0.8 [V])
  Area = AS or AD, the area of the source or drain
  Periphery = PS or PD, the periphery of the source or drain

PB, CJ, CJSW, MJ, and MJSW are specified in the model card. AS, AD, PS, and PD are specified by the element card. VJ depends on circuit conditions.

At VJ = 2.5 [V] (half rail, VDD = 5 [V]), with PB = 0.7 [V], MJ = 0.5, and MJSW = 0.3:

$C_{jdrain} = 15 \times 2 \times 10^{-4} \times [1 + (2.5/0.7)]^{-0.5} + 11.5 \times 4 \times 10^{-4} \times [1 + (2.5/0.7)]^{-0.3}$ [pF]
$= (15 \times 2 \times 10^{-4} \times 0.47) + (11.5 \times 4 \times 10^{-4} \times 0.63)$ [pF]
= 0.0014 + 0.0029 [pF] = 0.0043 [pF] = 4.3 [fF]

Summarizing these capacitances then,
  Cgtotal = 0.0068 + 0.0052 [pF] = 12 [fF]
  Cdrain = Csource = 0.0043 [pF] (@ 2.5 [V])
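The gate and drain numbers in this example can be re-checked with a short script. This is only a sketch: the function and argument names are made up for illustration, all values are in pF and µm, and `vj` is taken as the magnitude of the reverse bias, as in the worked example.

```python
def gate_capacitance_pf(w_um, l_um, cox, cgso, cgdo, cgbo):
    """Return (intrinsic, extrinsic) gate capacitance in pF.

    cox is in pF/um^2; cgso, cgdo, cgbo are in pF/um.
    """
    intrinsic = w_um * l_um * cox
    extrinsic = w_um * cgso + w_um * cgdo + 2 * l_um * cgbo
    return intrinsic, extrinsic


def junction_capacitance_pf(area_um2, perim_um, cj, cjsw, mj, mjsw, vj, pb):
    """Bottom plus sidewall junction capacitance in pF.

    vj is the reverse-bias magnitude in volts (an assumption of this sketch).
    """
    factor = 1.0 + vj / pb
    bottom = area_um2 * cj * factor ** -mj
    sidewall = perim_um * cjsw * factor ** -mjsw
    return bottom + sidewall


# Values from the example: W = 4, L = 1, drain area 15, drain periphery 11.5
c_int, c_ext = gate_capacitance_pf(4, 1, 17e-4, 6e-4, 6e-4, 2e-4)
c_drain = junction_capacitance_pf(15, 11.5, 2e-4, 4e-4, 0.5, 0.3, vj=2.5, pb=0.7)

print(f"Cg(intrinsic) = {c_int:.4f} pF, Cg(extrinsic) = {c_ext:.4f} pF")
print(f"Cdrain = {c_drain:.4f} pF")
```

Changing `vj` shows how strongly the drain capacitance depends on the bias point chosen for the estimate.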
Routing Capacitance
[Figure: metal interconnect over SiO2 and substrate, showing the parallel (plate) capacitance, the fringing field capacitance, and the capacitance to an adjacent conductor]
Fringing Field Capacitance occurs at edge of the conductor and is due to the conductor's finite thickness. Fringing Field Capacitance will cause effective capacitance to increase.
Use empirical formulas to estimate. An exact value is computationally intensive to calculate and requires 3-D CAD. Typically, just use the substrate capacitance multiplied by a "fudge" factor of ~1.1, ~1.3, or even ~2.0.

[Figure: cross-sections of the interconnect layers (m2, m1, poly, diffusion) over the substrate for the conditions tabulated below, with the capacitances C between layers; thickness labels 12k, 6k, 3k; very thin oxide (200)]

CONDITION   LAYER               LINE-to-GROUND EQUATION # (see text)
A           Poly-substrate      4.19
B           Metal2-substrate    4.19
C           Poly-metal2         4.20
D           Metal1-substrate    4.20
E           Metal1-poly         4.20
E           Metal1-metal2       4.20
F           Metal1-diffusion    4.20
G           Metal2-diffusion    4.19
Delay
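A long wire behaves as a distributed RC line.

[Figure: wire modeled as a chain of series R and shunt C sections]

First-order approximation:

$\text{delay} = \dfrac{r c l^2}{2}$

where
  r = resistance per unit length
  c = capacitance per unit length
  l = length of the wire

Important fact: interconnect delay does not scale with lambda; it is constant. When lambda decreases, R increases and C decreases, so the delay stays constant.

For a 2 mm wire with r = 20 Ω/µm and c = 4 × 10⁻⁴ pF/µm:

$\text{delay} = \dfrac{20 \times 4 \times 10^{-4} \times (2000)^2}{2}$ [ps] = 16 ns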
If broken into two 1mm sections, then delay of each section = 4ns. Add a buffer with delay = 1ns and total delay becomes 4 + 1 + 4 = 9ns.
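The quadratic dependence on length is what makes buffering pay off. Here is a minimal sketch of the calculation, assuming r = 20 Ω/µm and c = 4e-4 pF/µm (values chosen because they reproduce the 16 ns figure; Ω × pF gives ps). The function names are illustrative only.

```python
def wire_delay_ps(r_ohm_per_um, c_pf_per_um, length_um):
    """First-order distributed RC delay, r*c*l^2/2, in picoseconds."""
    return r_ohm_per_um * c_pf_per_um * length_um ** 2 / 2.0


def buffered_delay_ps(r_ohm_per_um, c_pf_per_um, length_um, sections, t_buffer_ps):
    """Delay when the wire is cut into equal sections joined by buffers."""
    per_section = wire_delay_ps(r_ohm_per_um, c_pf_per_um, length_um / sections)
    return sections * per_section + (sections - 1) * t_buffer_ps


R_PER_UM = 20.0   # ohm/um (assumed)
C_PER_UM = 4e-4   # pF/um (assumed)

unbuffered = wire_delay_ps(R_PER_UM, C_PER_UM, 2000.0)
buffered = buffered_delay_ps(R_PER_UM, C_PER_UM, 2000.0, sections=2, t_buffer_ps=1000.0)

print(f"unbuffered: {unbuffered / 1000:.1f} ns, one buffer: {buffered / 1000:.1f} ns")
```

Trying `sections=3` or `sections=4` shows the diminishing return as buffer delay starts to dominate.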
Typically, the resistive effects of interconnect are much more important than the capacitive effects, since the capacitance tends to be dominated by the gate capacitances.

[Figure: a driver connected to several loads through interconnect with its own resistance and capacitance]

MOSFET load capacitance >> wire capacitance [unless DSM (deep submicron, 0.25 µm) CMOS technology]
So, if we decrease interconnect resistance, then we reduce overall propagation delay between driver and load.
Reduce interconnect resistance by using metal and by increasing the width of the interconnect.
Usually just want delay (RC), where R is the resistance of the interconnect and C is the total of all the capacitive loads.
Example (from text)

A register that fits in a data-path is 25 µm tall (the direction of repetition). A metal2 clock line runs vertically to link all registers in an n-bit register. The register has 30 µm of 1 µm metal1, 20 µm of 1 µm poly (over field oxide), and 16 µm of 1 µm gate capacitance.

1. Calculate the per-bit clock load and the load for a 16-bit register.
2. What would be the RC delay of the register from a clock buffer using 5 mm of 1 µm metal2 (0.05 Ω/sq.)?
3. How wide would the clock line have to be to keep the skew below 0.5 ns if a register file containing 32 16-bit registers was fed with the same 5 mm metal2 wire?

Solution: [Capacitance values found in Table 4.6, page 202 of text.]

1. The parasitics are as follows:
   Cm1 = 30 × 30 [aF] = 900 aF
   Cpoly = 20 × 50 [aF] = 1000 aF = 1 fF
   Cgs = 16 × 1800 [aF] = 28,800 aF
   Creg1 = 900 + 1000 + 28,800 [aF] = 30,700 aF ≈ 30 fF
   Creg16 = 16 × Creg1 = 480 fF

2. Rmetal2 = 5000 × 0.05 [Ω/sq.] = 250 Ω
   Because the capacitance load is at the end of the wire, we approximate the RC delay by adding the metal2 track capacitance to the load capacitance and performing a simple RC calculation.
   Ctotal = 0.48 + Cmetal2 [pF] = 0.48 + (5000 × 20 × 10⁻⁶) [pF] = 0.58 pF
   RC = 250 × 0.58 × 10⁻¹² seconds = 0.145 ns

3. We now have 32 registers, so the load capacitance of the registers is Cregfile = 32 × Creg16 = 15.36 pF.

   [Figure: 5 mm clock line feeding the register bits, up to Bit15]

   The RC for a 1 µm-wide clock feed is 250 × 15.36 pF = 3.84 ns. A delay of 3.84 ns is too big, so widen the wire to reduce R; this will increase C somewhat, but the capacitance is dominated by cell capacitance. The clock line has to be widened by a factor of 3.84/0.5, or 7.68. To be conservative, one might choose a 10 µm wire. Now
   Ctotal = 15.36 + Cmetal2 [pF] = 15.36 + (5000 × 10 × 20 × 10⁻⁶) [pF] = 16.36 pF
   Note: R reduced by 10x, Ctotal slightly increased.
   RC = 25 × 16.36 × 10⁻¹² seconds = 0.41 ns
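The arithmetic of this example is easy to script, which helps when trying other wire widths. The sketch below uses the per-unit capacitances quoted in the solution (30, 50, 1800, and 20 aF per µm of 1 µm-wide wire) and scales the metal2 capacitance linearly with width, as the solution does. The helper names are illustrative, and the rounded register loads from the text (0.48 pF and 15.36 pF) are passed in directly.

```python
AF = 1e-18  # one attofarad, in farads


def register_bit_load(m1_um, poly_um, gate_um,
                      c_m1=30 * AF, c_poly=50 * AF, c_gate=1800 * AF):
    """Clock load of one register bit, in farads."""
    return m1_um * c_m1 + poly_um * c_poly + gate_um * c_gate


def clock_rc(load_f, wire_len_um, wire_w_um, r_sq=0.05, c_m2=20 * AF):
    """Lumped RC estimate (seconds) with the load at the end of the wire."""
    r_wire = (wire_len_um / wire_w_um) * r_sq       # squares times ohm/sq
    c_wire = wire_len_um * wire_w_um * c_m2         # grows with width
    return r_wire * (load_f + c_wire)


c_bit = register_bit_load(30, 20, 16)               # about 30.7 fF
rc_16bit = clock_rc(0.48e-12, 5000, 1)              # one 16-bit register
rc_file_1um = clock_rc(15.36e-12, 5000, 1)          # 32 registers, 1 um wire
rc_file_10um = clock_rc(15.36e-12, 5000, 10)        # 32 registers, 10 um wire

print(f"per-bit load = {c_bit * 1e15:.1f} fF")
print(f"RC, 16-bit register = {rc_16bit * 1e9:.3f} ns")
print(f"RC, register file, 10 um wire = {rc_file_10um * 1e9:.2f} ns")
```

Note that `rc_file_1um` includes the 0.1 pF of wire capacitance, so it comes out slightly above the 3.84 ns quoted for the load alone.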
For short and lightly loaded wire lengths, can ignore the R and just model wires as lumped capacitances. The wire delay

$\tau_w = \dfrac{r c l^2}{2}$

can be ignored when it is much smaller than the gate delay $\tau_g$, that is, when

$l \ll \sqrt{\dfrac{2 \tau_g}{r c}}$

For a minimum width (1 µm) aluminum wire and gate delay = 200 ps (using data from previous example), the guideline for ignoring RC wire delays is

l << 16000 µm

If lambda = 0.5 µm, ignore RC delay for < 2.5 mm metal runs (see table).
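As a quick check on the length guideline, here is a sketch that evaluates sqrt(2·τg/(r·c)). The wire values are assumptions: r = 0.05 Ω/µm for a 1 µm-wide wire at 0.05 Ω/sq., and c = 30 aF/µm, which is the value that reproduces a limit near 16000 µm.

```python
import math


def max_lumped_length_um(t_gate_s, r_ohm_per_um, c_f_per_um):
    """Length at which the wire delay r*c*l^2/2 equals the gate delay."""
    return math.sqrt(2.0 * t_gate_s / (r_ohm_per_um * c_f_per_um))


# Assumed values: 200 ps gate, 0.05 ohm/um, 30 aF/um
limit_um = max_lumped_length_um(200e-12, 0.05, 30e-18)
print(f"wire delay equals gate delay at about {limit_um:.0f} um")
```

The guideline asks for wires much shorter than this limit, which is why the practical rule of thumb is far below it.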
[Figure: input and output voltage versus time, with tpd measured between the 50% points of the two waveforms]

tdelay,50-50 (or tpd) = time between input reaching 50% point and output reaching 50% point
One advantage of using 50% points for measurement is that it does not matter if output is rising or falling (gate inverting or non-inverting).
One problem with 50% propagation delays is that you can end up with a negative propagation delay for slowly rising/falling inputs.
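A 50% delay measurement on sampled waveforms can be sketched as follows. The sample data is made up to mimic an inverter whose output starts moving well before a slow input ramp reaches its 50% point, so the measured delay comes out negative. The function names are illustrative.

```python
def crossing_time(times, volts, level):
    """First time the waveform crosses `level`, by linear interpolation."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            t0, t1 = times[i], times[i + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def delay_50(times, v_in, v_out, vdd):
    """50%-to-50% propagation delay (output crossing minus input crossing)."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


# Hypothetical samples: time in ns, slow 0 -> 5 V input ramp, inverting output
t_ns = [0, 2, 4, 6, 8, 10]
v_input = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v_output = [5.0, 5.0, 3.0, 1.0, 0.0, 0.0]

tpd = delay_50(t_ns, v_input, v_output, vdd=5.0)
print(f"tpd = {tpd} ns")
```

The same `crossing_time` helper works for other measurement points by passing a different level, for example 0.3 or 0.7 times VDD.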
[Figure: input and output voltage versus time for a slowly changing input; the output begins changing before the input reaches its 50% point]
Can also define delay at 30% -to- 70% points, 10% -to- 90% points, etc.
For non-inverting gates, if we use 30%-to-70% points:
  tpdlh = prop delay low to high (measure between 30% input, 30% output)
  tpdhl = prop delay high to low (measure between 70% input, 70% output)

[Figure: tpdhl measured between the 70% points of input and output]

For inverting gates, if we use 30%-to-70% points:
  tpdlh = measure 70% input to 30% output
Modeling Delay
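For a step input, propagation delay simplifies to just the rise/fall time of the output to a particular point (50%, 30%/70%, etc.).

[Figure: output waveform for a step input, with rise time tr measured to the 30% point; pull-up modeled as a resistance Rrise to VDD]

Delay can be modeled in terms of an RC delay:

$\tau_{rise} = R_{rise} C_{load}$, $\tau_{fall} = R_{fall} C_{load}$

and

$R_{rise} = k_{rise} \dfrac{1}{\beta_p}$, $R_{fall} = k_{fall} \dfrac{1}{\beta_n}$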
How do I determine k rise, k fall? Do SPICE simulation for a particular Cload, measure delay, solve for k rise, k fall.
These values of k rise, k fall would be valid for the particular VDD you used in the simulations.
By characterizing an inverter this way, then one can predict delay for more complex gates after transforming the complex gates into an "equivalent" inverter. In this procedure, the original characterized inverter is sometimes called the "base" inverter.
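A 2-input NAND gate built with the same transistor sizes would be expected to have the same worst case trise as the base inverter, because in the worst case only one p-transistor (same Wp/Lp as the inverter) pulls the output high.

[Figure: 2-input NAND gate (inputs A, B, output Y) next to the base inverter (input Vin) driving Cload; all p-transistors are Wp/Lp and all n-transistors are Wn/Ln]

For tfall, would expect the NAND gate to be twice as slow (as the base inverter) because the channel lengths of the two series n-transistors add.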
A more accurate delay model breaks gate delay into two parts, internal and external:

  τinternal = gate delay with zero load (only internal capacitance values affect delay)
  τexternal = portion of delay proportional to external load

$\tau_{gate} = \tau_{internal} + k \cdot C_{load}$
Make SPICE measurement at unit load (typical output load), determine k value.
Hopefully, k is a constant for different output loads, but may not be.
In this case, take SPICE measurements at different output loads and perform a curve fit of k against C values.
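One way to carry out such a curve fit is sketched below. It assumes the delay model τgate = τinternal + k·Cload and, purely for illustration, a straight-line dependence k(C) = k0 + k1·C. The measurement values are made up, not SPICE results, and loads are expressed in multiples of the unit load.

```python
def k_values(loads, delays, t_internal):
    """k at each load, from delay = t_internal + k * load."""
    return [(t - t_internal) / c for c, t in zip(loads, delays)]


def fit_line(xs, ys):
    """Least-squares straight line y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def predict_delay(load, t_internal, k0, k1):
    """a) compute k from the load, b) compute delay from k."""
    k = k0 + k1 * load
    return t_internal + k * load


# Hypothetical characterization data (delays in ns)
loads = [1, 4, 10, 20]
delays = [0.302, 0.632, 1.4, 3.0]
T_INTERNAL = 0.2

ks = k_values(loads, delays, T_INTERNAL)
k0, k1 = fit_line(loads, ks)
print(f"k0 = {k0:.4f}, k1 = {k1:.4f}")
print(f"predicted delay at 8x load = {predict_delay(8, T_INTERNAL, k0, k1):.3f} ns")
```

If k really were constant, `k1` would come out near zero and a single measurement at unit load would be enough.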
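[Figure: k plotted against output load (1xC, 4xC, 10xC, 20xC). First plot: the desired constant k compared with the actual k values. Second plot: the actual k curve, with ko marked.]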
Cnorm =
Delay calculation:
  a) compute k value based on Cload
  b) compute delay value based on k

For a gate, need a propagation delay factor for each input, both H→L and L→H. For a gate with inputs A and B and output Z:

  Tphl_a_to_z, Tphl_b_to_z
  Tplh_a_to_z, Tplh_b_to_z

Would need ko, the other fit coefficients, and the internal parameters for each one of these...
But wait a minute! All of the previous discussion assumed a step input!

Is this realistic? No.

[Figure: input waveforms with varying input slopes]
How do I determine the range of input slopes I might see in a circuit? Need to know fastest slope, slowest slope
Fastest case would probably be for the inverter driving a 1X load, pulling down, with a step input.

Measure the output slope. Call it your fastest slope. Measure again for a 4X load and call that your typical slope.
To get a representative "slow" slope, use a 2-input NOR gate pulling high into a "heavy load" (15x).
Now that you have a fast slope and a slow slope, pick values in between and generate tables of model parameters (ko, the other fit coefficients, and the internal delay).
For different slope values do table lookup based on input slope value.
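During characterization, I can apply a straight-line input (figure, left). Not very realistic. Probably want to apply a more realistic waveform (figure, right).

[Figure: left, a straight-line input ramp starting at 0 with tslope marked against the 30% and 70% points; right, a more realistic curved input waveform with tslope marked]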
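Most realistic waveform is achieved, however, by having another gate drive the input:

[Figure: a driving gate loaded by Cin feeds the gate under test, which drives Cload]

Vary Cin to control the input slope. The only problem is that precise control of the input slope is difficult; must be able to accurately predict the slope at the output of the driving gate based upon the value of "Cin".

[Figure: chain of inverters inv-1, inv-2, inv-3, ..., inv-n with relative sizes 1, a, a², ..., a^N and gate capacitances Cg, Cg1, Cg2, ..., CgN, driving Cload; inv-1 is a minimum sized inverter]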
Each driver (inverter) is larger than the preceding driver by the stage ratio "a". Let Cg be the gate load of the first driver, which is minimum size. Then CgN will be $C_g a^N$, and we want $C_g a^n = C_L$ [Note: n = N + 1]
to guarantee that none of capacitances internal to the chain of inverters exceed Cload. For example, if CgN
$a^n = \dfrac{C_L}{C_g}$
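Question: What value of "a" will lead to minimum delay? What value of "n"? If we find one, we can compute the other.

Delay through each stage is approximately $a \cdot t_d$, where $t_d$ is the delay through a minimum-sized inverter driving another minimum-sized inverter.

$\text{Total Delay} = n \cdot a \cdot t_d$

We know $a^n = \dfrac{C_L}{C_g}$, so

$a = \left( \dfrac{C_L}{C_g} \right)^{1/n}$

Substituting,

$\text{Total Delay} = n \left( \dfrac{C_L}{C_g} \right)^{1/n} t_d$

To find the optimum value for n, differentiate with respect to n and set equal to zero:

$n_{opt} = \ln \left( \dfrac{C_L}{C_g} \right)$

Then, from $a^n = \dfrac{C_L}{C_g}$ with $n = n_{opt}$:

$a^{\ln (C_L / C_g)} = \dfrac{C_L}{C_g}$

$\ln \left( \dfrac{C_L}{C_g} \right) \cdot \ln(a) = \ln \left( \dfrac{C_L}{C_g} \right)$

$\ln(a) = 1 \Rightarrow a = e^1 \approx 2.7$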
A more detailed analysis shows that the intrinsic output capacitance of the inverter will affect this ratio:

$a_{opt} = \exp \left( \dfrac{k + a_{opt}}{a_{opt}} \right)$, where $k = \dfrac{C_{drain}}{C_{gate}}$

For $k = \dfrac{C_{drain}}{C_{gate}} = 0.215$, $a_{opt} = 2.93$.
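Both results can be checked numerically. The sketch below computes n_opt and the matching stage ratio for a given CL/Cg, and solves the implicit equation for a_opt by simple fixed-point iteration. The iteration scheme is a choice made for this sketch, not something prescribed by the analysis, and the load ratio of 1000 is just an example.

```python
import math


def optimum_stages(c_load, c_gate):
    """Return (n_opt, a) for the simple model Total Delay = n * a * td."""
    n_opt = math.log(c_load / c_gate)
    a = (c_load / c_gate) ** (1.0 / n_opt)
    return n_opt, a


def optimum_ratio_with_drain(k, tol=1e-9, max_iter=100):
    """Solve a = exp((k + a) / a) by fixed-point iteration, starting at e."""
    a = math.e
    for _ in range(max_iter):
        nxt = math.exp((k + a) / a)
        if abs(nxt - a) < tol:
            return nxt
        a = nxt
    raise RuntimeError("fixed-point iteration did not converge")


n_opt, a_simple = optimum_stages(c_load=1000.0, c_gate=1.0)
a_opt = optimum_ratio_with_drain(0.215)

print(f"n_opt = {n_opt:.2f}, a = {a_simple:.3f}")
print(f"a_opt with k = 0.215: {a_opt:.2f}")
```

In practice n must be an integer, so one would round `n_opt` and recompute a from a^n = CL/Cg.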
$T_j = T_a + \theta_{ja} \times P_d$

where
  Tj = junction temperature (°C)
  Ta = ambient temperature (°C)
  θja = package thermal impedance (°C/watt)
  Pd = power dissipation

Typical values for θja range from 35 to 45 (°C/watt), depending on chip package.

θja, still air (°C/watt): 45, 38, 37, 48, 43, 33, 40
θja, 300 ft/min. (°C/watt): 35, 29, 28, 40, 35, 20, 30
Packages: Plastic Quad Flatpack; Very Thin (1.0mm) Quad Flatpack; Ceramic Pin Grid Array; Ceramic Quad Flatpack
Voltage also affects device speed: as voltage increases, drain current increases and delay decreases.
operations, variations in diffusion depth, dopant densities, oxide/diffusion geometry variations can cause transistor switching speeds to vary from wafer batch to wafer batch, wafer to wafer and even on the same
However, variations between nMOS speeds and pMOS speeds can be independent, so one can obtain a "four corners" model.
When characterizing for high speed, also want to use lowest temperature, highest voltage.
When characterizing for "slow" case, want highest temperature, lowest voltage.
PROCESS           TEMPERATURE   VOLTAGE        TESTS
Fast-n / fast-p   0 °C          5.5V (3.6V)    Power dissipation (DC), clock races
Slow-n / slow-p   125 °C        4.5V (3.0V)    Circuit speed, external setup and hold times
Slow-n / fast-p   0 °C          5.5V (3.6V)    Pseudo-nMOS noise margin, level shifters, memory write/read, ratioed circuits
Fast-n / slow-p   0 °C          5.5V (3.6V)    Memories, ratioed circuits, level shifters
Power Dissipation
Power Dissipation has three components:

1. Static dissipation
2. Dynamic dissipation
3. Short-circuit dissipation
For traditional CMOS design, static dissipation is limited to the leakage currents in the reverse-biased diodes formed between the substrate (or well) and source/drain regions. But in some DSM CMOS technology, subthreshold leakage also tends to contribute significant static dissipation. Subthreshold leakage increases exponentially as threshold voltage decreases; i.e., lower VT (VTn and |VTp|) CMOS technology has more static power dissipation (due to subthreshold leakage) than higher VT technology.

Static power dissipation can be extremely small: 1 inverter @ 5V, 1 to 2 nanowatts static power.
Dynamic power dissipation:

$P_d = f_p C_L V_{DD}^2$
This is the amount of power dissipated by charging/discharging internal capacitance and load capacitance.
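A small sketch of the dynamic power formula P = f·C·VDD² follows. The numbers (100 pF switched at 50 MHz from 5 V and from 3.3 V) are illustrative only, and the function name is made up.

```python
def dynamic_power_w(freq_hz, c_load_f, vdd):
    """Dynamic power f * C * VDD^2 in watts."""
    return freq_hz * c_load_f * vdd ** 2


p_5v = dynamic_power_w(50e6, 100e-12, 5.0)
p_3v3 = dynamic_power_w(50e6, 100e-12, 3.3)

print(f"P at 5.0 V = {p_5v * 1e3:.1f} mW")
print(f"P at 3.3 V = {p_3v3 * 1e3:.1f} mW")
```

Because the supply voltage enters squared, dropping from 5 V to 3.3 V cuts the dynamic power to well under half at the same frequency and load.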
Note the relations:
  Higher the switching speed → Pd increases
  Lower the voltage → Pd decreases
  Bigger the gates → Pd increases
$P_d = (P_d)_{\text{clock network}} + (P_d)_{\text{logic}}$
The power dissipation in the clock network tends to dominate in most designs. Usually assume the switching frequency of logic signals to be some fraction of the clock frequency. This fraction can be estimated by running some sample simulations and keeping switching statistics on internal nodes to build a probabilistic model of switching activity.
Logic synthesis techniques can be used to do the following:
  a. minimize # of gates
     or
  b. maximize speed
     and/or
  c. minimize switching activity
Also, have "short-circuit" power dissipation, proportional to the amount of time when both the p- and n-trees are conducting.

Slow rise/fall times on nodes can make this significant. It is usually ignored in most calculations.
For power conductors, need to worry about
1. Metal migration: too much current in too small a conductor will "blow" the conductor
2. Ground bounce: large current spikes in VDD/GND leads can occur when simultaneous outputs switch
   a. IR drop
   b. L di/dt drop in the VDD/GND pins. Package inductance dominates. Note that di/dt is set by the slew rates on input/output pins.
Example: What would be the conductor width of power and ground wires to a 50 MHz clock buffer that drives 100 pF of on-chip load to satisfy the metal-migration consideration (JAL = 0.5 mA/µm)? What is the ground bounce with the chosen conductor size? The module is 500 µm from both the power and ground pads and the supply voltage is 5 volts.

1. P = C × VDD² × f = 100 × 10⁻¹² × 25 × 50 × 10⁶ = 125 mW
   I = P/V = 25 mA
   Thus the width of the clock wires should be at least 50 µm. A good choice would be 100 µm.

2. R = (500/100) × 0.05 Ω/sq. = 5 squares × 0.05 Ω/sq. = 0.25 Ω
   IR = 0.25 × 25 × 10⁻³ = 6.25 mV
   Typically, the IR term of ground bounce is very small compared to the L di/dt term.
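The steps of this example chain together naturally in code. Here is a sketch using the values from the example (100 pF, 5 V, 50 MHz, 0.5 mA/µm, 500 µm run, 100 µm width, 0.05 Ω/sq.). The function name and argument names are illustrative.

```python
def supply_wire_check(c_load_f, vdd, freq_hz, j_max_a_per_um,
                      length_um, width_um, r_sq):
    """Return (current in A, minimum width in um, IR drop in V)."""
    power = c_load_f * vdd ** 2 * freq_hz        # dynamic power
    current = power / vdd                        # average supply current
    min_width = current / j_max_a_per_um         # metal-migration limit
    resistance = (length_um / width_um) * r_sq   # squares times ohm/sq
    return current, min_width, resistance * current


current, min_width, ir_drop = supply_wire_check(
    c_load_f=100e-12, vdd=5.0, freq_hz=50e6,
    j_max_a_per_um=0.5e-3, length_um=500, width_um=100, r_sq=0.05)

print(f"I = {current * 1e3:.1f} mA, minimum width = {min_width:.0f} um")
print(f"IR drop = {ir_drop * 1e3:.2f} mV")
```

The L di/dt term is not modeled here, since it depends on package inductance and slew rate rather than on the wire geometry.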