Logical Effort
Ivan E. Sutherland
Bob F. Sproull
David L. Harris
2 Design Examples
2.1 The AND function of eight inputs
2.1.1 Calculating gate sizes
2.2 Decoder
2.2.1 Generating complementary inputs
2.3 Synchronous arbitration
2.3.1 The original circuit
2.3.2 Improving the design
2.3.3 Restructuring the problem
2.4 Summary
2.5 Exercises
iv CONTENTS
6 Forks of Amplifiers
6.1 The fork circuit form
6.2 How many stages should a fork use?
6.3 Summary
6.4 Exercises
12 Conclusions
12.1 The theory of logical effort
12.2 Insights from logical effort
12.3 A design procedure
12.4 Other approaches to path design
12.4.1 Simulate and tweak
12.4.2 Equal fanout
12.4.3 Equal delay
12.4.4 Numerical optimization
12.5 Shortcomings of logical effort
12.6 Parting words
Ivan E. Sutherland
Bob F. Sproull
David Harris
February 1998
Chapter 1
The Method of Logical Effort
The method of logical effort is an easy way to estimate the delay in an MOS
circuit. By comparing delay estimates of different logic structures, the fastest
candidate can be selected. The method also specifies the proper number of logic
stages on a path and the best transistor sizes for the logic gates. Because the
method is easy to use, it is ideal for evaluating alternatives in the early stages of a
design and provides a good starting point for more intricate optimizations.
This chapter describes the method of logical effort and applies it to simple ex-
amples. Chapter 2 explores more complex examples. These two chapters together
provide all you need to know to apply the method of logical effort to a wide class
of circuits. The remainder of this monograph is devoted to derivations that show
why the method of logical effort works, to some detailed optimization techniques,
and to the analysis of special circuits such as domino logic and multiplexers.
Copyright © 1998, Morgan Kaufmann Publishers, Inc. This material may not be copied or
distributed without permission of the publisher.
2 CHAPTER 1. THE METHOD OF LOGICAL EFFORT
The effort delay depends on the load and on properties of the logic gate driving
the load. We introduce two related terms for these effects: the logical effort, g ,
captures properties of the logic gate, while the electrical effort, h, characterizes
the load. The effort delay of the logic gate is the product of these two factors:
f = gh (1.3)
Footnote 1: The term “gate” is ambiguous in integrated-circuit design, signifying either a
circuit that implements a logic function such as NAND or the gate of an MOS transistor.
We hope to avoid confusion by referring to “logic gate” or “transistor gate” unless the
meaning is clear from context.
Footnote 2: This definition of τ differs from that used by Mead & Conway [6].
1.1. DELAY IN A LOGIC GATE 3
The logical effort, g , captures the effect of the logic gate’s topology on its ability to
produce output current. It is independent of the size of the transistors in the circuit.
The electrical effort, h, describes how the electrical environment of the logic gate
affects performance and how the size of the transistors in the gate determines its
load-driving capability. The electrical effort is defined by:
$$h = C_{out}/C_{in} \qquad (1.4)$$
where Cout is the capacitance that loads the logic gate and Cin is the capacitance
presented by the logic gate at one of its input terminals. Electrical effort is also
called fanout by many CMOS designers.3
Combining Equations 1.2 and 1.3, we obtain the basic equation that models
the delay through a single logic gate, in units of τ:
d = gh + p (1.5)
This equation shows that logical effort g and electrical effort h both contribute
to delay in the same way. This formulation separates τ, g, h, and p, the four
contributions to delay. The process parameter τ represents the speed of the basic
transistors. The parasitic delay, p, expresses the intrinsic delay of the gate due to
its own internal capacitance, which is largely independent of the size of the tran-
sistors in the logic gate. The electrical effort, h, combines the effects of external
load, which establishes Cout , with the sizes of the transistors in the logic gate,
which establish Cin . The logical effort, g , expresses the effects of circuit topology
on the delay free of considerations of loading or transistor size. Logical effort is
useful because it depends only on circuit topology.
Logical effort values for a few CMOS logic gates are shown in Table 1.1. Log-
ical effort is defined so that an inverter has a logical effort of one. This unitless
form means that all delays are measured relative to the delay of a simple inverter.
An inverter driving an exact copy of itself experiences an electrical effort of one.
Because the logical effort of an inverter is defined to be one, an inverter driving
an exact copy of itself will therefore have an effort delay of one, according to
Equation 1.3.
The logical effort of a logic gate tells how much worse it is at producing output
current than is an inverter, given that each of its inputs may present only the same
input capacitance as the inverter. Reduced output current means slower operation,
and thus the logical effort number for a logic gate tells how much more slowly
it will drive a load than an inverter would. Equivalently, logical effort is how
much more input capacitance a gate presents to deliver the same output current
as an inverter. Figure 1.1 illustrates simple gates sized for roughly equal output
currents. From the ratio of input capacitances (4 for the NAND gate and 5 for the
NOR gate, versus 3 for the inverter), one can see that the NAND gate has logical
effort g = 4/3 and the NOR gate has logical effort g = 5/3. Chapter 4 estimates
the logical effort of other gates, while Chapter 5 shows how to extract logical
effort from circuit simulations.
Footnote 3: Fanout, in this context, depends on the load capacitance, not just the number
of gates being driven.
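The capacitance-ratio argument can be sketched in a few lines of Python. This is an illustration, not the derivation of Chapter 4: it assumes pMOS transistors must be twice as wide as nMOS transistors to deliver equal current, so the reference inverter has input capacitance 2 + 1 = 3, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumption: pMOS must be twice as wide as nMOS for equal current, so the
# reference inverter has pMOS width 2, nMOS width 1, input capacitance 3.
INVERTER_CIN = 3


def nand_logical_effort(n: int) -> Fraction:
    """n-input NAND: series nMOS of width n each, parallel pMOS of width 2."""
    return Fraction(n + 2, INVERTER_CIN)


def nor_logical_effort(n: int) -> Fraction:
    """n-input NOR: series pMOS of width 2n each, parallel nMOS of width 1."""
    return Fraction(2 * n + 1, INVERTER_CIN)


print(nand_logical_effort(2), nor_logical_effort(2))  # 4/3 5/3
```

Under the same assumption, `nand_logical_effort(4)` gives 2 and `nand_logical_effort(8)` gives 10/3, the values used for the eight-input AND in Chapter 2.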
It is interesting but not surprising to note from Table 1.1 that more complex
logic functions have larger logical effort. Moreover, the logical effort of most logic
gates grows with the number of inputs to the gate. Larger or more complex logic
gates will thus exhibit greater delay. As we shall see later on, these properties
make it worthwhile to contrast different choices of logical structure. Designs that
minimize the number of stages of logic will require more inputs for each logic
gate and thus have larger logical effort. Designs with fewer inputs and thus less
logical effort per stage may require more stages of logic. In Section 1.3, we will
see how the method of logical effort expresses these tradeoffs.
The electrical effort, h, is just a ratio of two capacitances. The load driven by a
logic gate is the capacitance of whatever is connected to its output; any such load
will slow down the circuit. The input capacitance of the circuit is a measure of
the size of its transistors. The input capacitance term appears in the denominator
of Equation 1.4 because bigger transistors in a logic gate will drive a given load
faster. Usually most of the load on a stage of logic is the capacitance of the input
or inputs of the next stage or stages of logic that it drives. Of course, the load also
includes the stray capacitance of wires, drain regions of transistors, etc. We shall
Figure 1.1: Simple gates, sized for roughly equal output current (transistor widths
given as pMOS/nMOS). (a) Inverter: 2/1. (b) Two-input NAND gate: 2/2. (c) Two-input
NOR gate: 4/1.
Figure 1.2: Plots of the delay equation for an inverter (g = 1, p = 1) and a two-input
NAND gate (g = 4/3, p = 2). Normalized delay d is plotted against electrical effort h;
each delay divides into an effort delay and a parasitic delay.
gate and of the load capacitance it drives, because wider transistors have corre-
spondingly greater diffusion capacitance. This delay is a form of overhead that
accompanies any gate. The principal contribution to parasitic delay is the capac-
itance of the source/drain regions of the transistors that drive the gate’s output.
Table 1.2 presents estimates of parasitic delay for a few logic gate types; note that
parasitic delays are given as multiples of the parasitic delay of an inverter, denoted
as p_inv. A typical value for p_inv is 1.0 delay units4, which is used in most of the
examples in this book. Parasitic delay is covered in more detail in Chapters 3 and
5.
The delay model of a single logic gate, as represented in Equation 1.5, is a
simple linear relationship. Figure 1.2 shows this relationship graphically: delay is
plotted as a function of electrical effort for an inverter and for a 2-input NAND gate.
The slope of the line is the logical effort of the gate; its intercept is the parasitic
delay. The graph shows that we can adjust the delay by adjusting the electrical
effort or by choosing a logic gate with a different logical effort. Once we have
chosen a gate type, however, the parasitic delay is fixed, and our optimization
procedure can do nothing to reduce it.
Because the logical effort of an inverter is 1, we have, from Equa-
tion 1.5, $d = gh + p = 1 \times 1 + p_{inv} = 2.0$. This result expresses the
delay in delay units; it can be scaled by τ to obtain the absolute delay,
$d_{abs} = 2.0\tau$. In a 0.6 µm process with τ = 50 ps, $d_{abs} = 100$ ps.
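Equation 1.5 and the scaling by τ amount to two one-line functions. The sketch below assumes p_inv = 1.0 and τ = 50 ps, as in the example above; the names `stage_delay` and `absolute_delay` are illustrative.

```python
def stage_delay(g: float, h: float, p: float) -> float:
    """Delay of one logic gate in units of tau (Equation 1.5): d = g*h + p."""
    return g * h + p


def absolute_delay(d: float, tau_ps: float) -> float:
    """Convert a delay in units of tau to picoseconds: d_abs = d * tau."""
    return d * tau_ps


P_INV = 1.0    # typical inverter parasitic delay, in units of tau
TAU_PS = 50.0  # tau assumed for the 0.6 um process in the example

d = stage_delay(g=1.0, h=1.0, p=P_INV)    # inverter driving an identical copy
print(d, absolute_delay(d, TAU_PS))        # 2.0 100.0
print(stage_delay(g=1.0, h=4.0, p=P_INV))  # fanout-of-4 inverter: 5.0
```

The last line gives the fanout-of-4 (FO4) inverter delay of 5 units that the chapter later uses as a yardstick.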
Footnote 4: p_inv is a strong function of process-dependent diffusion capacitances, but 1.0
is representative and is convenient for hand analysis.
Example 1.3 A four-input NOR gate drives ten identical gates, as shown in Fig-
ure 1.5. What is the delay in the driving NOR gate?
1.2. MULTI-STAGE LOGIC NETWORKS 9
The electrical effort along a path through a network is simply the ratio of the
capacitance that loads the last logic gate in the path to the input capacitance of the
first gate in the path. We use an upper-case symbol, H , to indicate the electrical
effort along a path.
$$H = C_{out}/C_{in} \qquad (1.8)$$
In this case, Cin and Cout refer to the input and output capacitances of the path
as a whole, as may be inferred from context.
We need to introduce a new kind of effort, named branching effort, to account
for fanout within a network. So far we have treated fanout as a form of electrical
effort: when a logic gate drives several loads, we sum their capacitances, as in
Example 1.3, to obtain an electrical effort. Treating fanout as a form of electrical
effort is easy when the fanout occurs at the final output of a network. This method
is less suitable when the fanout occurs within a logic network because we know
that the electrical effort for the network depends only on the ratio of its output
capacitance to its input capacitance.
When fanout occurs within a logic network, some of the available drive current
is directed along the path we are analyzing, and some is directed off the path. We
define the branching effort b at the output of a logic gate to be:
$$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}} \qquad (1.9)$$
where C_on-path is the load capacitance along the path we are analyzing and C_off-path
is the capacitance of connections that lead off the path. Note that if the path does
not branch, the branching effort is one. The branching effort along an entire path,
B, is the product of the branching effort at each of the stages along the path:
$$B = \prod_i b_i \qquad (1.10)$$
Armed with definitions of logical, electrical, and branching effort along a path,
we can define the path effort, F . Again, we use an upper-case symbol to distin-
guish the path effort from the stage effort, f , associated with a single logic stage.
The equation that defines path effort is reminiscent of Equation 1.3, which defines
the effort for a single logic gate:
F = GBH (1.11)
Note that the path branching and electrical efforts are related to the electrical effort
of each stage:
$$BH = \frac{C_{out}}{C_{in}} \prod_i b_i = \prod_i h_i \qquad (1.12)$$
The designer knows C_in, C_out, and branching efforts b_i from the path specification.
Sizing the path consists of choosing appropriate electrical efforts h_i for each stage
to match the total BH product.
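Because the path quantities simply multiply, they are easy to compute mechanically. A minimal sketch, assuming all capacitances are given in the same (arbitrary) unit; `branching_effort` and `path_effort` are hypothetical helper names.

```python
from math import prod
from typing import Sequence


def branching_effort(c_onpath: float, c_offpath: float) -> float:
    """Branching effort at one gate output (Equation 1.9)."""
    return (c_onpath + c_offpath) / c_onpath


def path_effort(g: Sequence[float], b: Sequence[float], c_in: float, c_out: float) -> float:
    """Path effort F = G * B * H (Equation 1.11)."""
    G = prod(g)       # path logical effort: product of the g_i
    B = prod(b)       # path branching effort: product of the b_i (Equation 1.10)
    H = c_out / c_in  # path electrical effort (Equation 1.8)
    return G * B * H


# A gate whose output drives two equal loads, only one of them on the path.
print(branching_effort(1.0, 1.0))  # 2.0
# Three 2-input NANDs (g = 4/3), no branching, load eight times the input.
print(round(path_effort([4 / 3] * 3, [1, 1, 1], 1.0, 8.0), 2))  # 18.96
```

The second call is the path analyzed in Example 1.5.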
Although it is not a direct measure of delay along the path, the path effort
holds the key to minimizing the delay. Observe that the path effort depends only
on the circuit topology and loading and not upon the sizes of the transistors used
in logic gates embedded within the network. Moreover, the effort is unchanged
if inverters are added to or removed from the path, because the logical effort of
an inverter is one. The path effort is related to the minimum achievable delay
along the path, and permits us to calculate that delay easily. Only a little more
work yields the best number of stages and the proper transistor sizes to realize the
minimum delay.
The path delay, D, is the sum of the delays of each of the stages of logic in
the path. As in the expression for delay in a single stage (Equation 1.5), we shall
distinguish the path effort delay, D_F, and the path parasitic delay, P:
$$D = \sum_i d_i = D_F + P \qquad (1.13)$$
where the subscripts index the logic stages along the path. The path effort delay
is simply
$$D_F = \sum_i g_i h_i \qquad (1.14)$$
and the path parasitic delay is
$$P = \sum_i p_i \qquad (1.15)$$
Optimizing the design of an N-stage logic network proceeds from a very simple
principle which we will prove in Chapter 3: The path delay is least when each
stage in the path bears the same stage effort. This minimum delay is achieved
when the stage effort is
$$\hat{f} = g_i h_i = F^{1/N} \qquad (1.16)$$
We use a hat over a symbol to indicate an expression that achieves minimum delay.
Substituting Equation 1.16 into Equations 1.13 and 1.14 gives the minimum delay
along the path:
$$\hat{D} = N F^{1/N} + P \qquad (1.17)$$
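The equal-effort result and the minimum delay $\hat{D} = N F^{1/N} + P$ can be checked numerically. The sketch below assumes the three-NAND path of Figure 1.6 with p = 2 for each 2-input NAND (p_inv = 1.0); the function names are illustrative.

```python
def optimal_stage_effort(F: float, N: int) -> float:
    """Stage effort that minimizes path delay (Equation 1.16): F**(1/N)."""
    return F ** (1.0 / N)


def minimum_path_delay(F: float, N: int, P: float) -> float:
    """Minimum path delay in units of tau: N * F**(1/N) + P."""
    return N * optimal_stage_effort(F, N) + P


# Three 2-input NANDs (g = 4/3, p = 2 each, so P = 6) with B = 1.
G = (4 / 3) ** 3
for H in (1, 8):  # electrical efforts of Examples 1.4 and 1.5
    F = G * H
    print(H, round(optimal_stage_effort(F, 3), 3), round(minimum_path_delay(F, 3, 6.0), 1))
# 1 1.333 10.0
# 8 2.667 14.0
```

The two delays, 10.0 and 14.0 units, correspond to the 40% increase noted in Example 1.5.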
To equalize the effort borne by each stage on a path, and therefore achieve the
minimum delay along the path, we must choose appropriate transistor sizes for
each stage of logic along the path. Equations 1.16 and 1.3 combine to require that
each logic stage be designed so that
$$h_i = \frac{C_{out,i}}{C_{in,i}} = \frac{\hat{f}}{g_i} \qquad (1.18)$$
From this relationship, we can determine the transistor sizes of gates along a
path. Start at the end of the path and work backward, applying the capacitance
transformation:
$$C_{in,i} = \frac{g_i\, C_{out,i}}{\hat{f}} \qquad (1.19)$$
This determines the input capacitance of each gate, which can then be distributed
appropriately among the transistors connected to the input. The mechanics of this
process will become clear in the following examples.
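The backward pass of Equation 1.19 is a simple loop. This sketch assumes a path without branching, so each stage's input capacitance is the entire load of the stage before it; `size_path_backward` is a hypothetical name. It is checked against Example 1.5, which expects input capacitances C, 2C, and 4C.

```python
from typing import List, Sequence


def size_path_backward(g: Sequence[float], c_out: float, f_hat: float) -> List[float]:
    """Input capacitance of each stage, listed from first to last (Equation 1.19).

    Starts at the load and applies C_in,i = g_i * C_out,i / f_hat; each result
    becomes C_out for the preceding stage (no branching assumed).
    """
    sizes = []
    load = c_out
    for gi in reversed(g):
        load = gi * load / f_hat
        sizes.append(load)
    return sizes[::-1]


# Example 1.5: three 2-input NANDs, load 8C, f_hat = 8/3 (C taken as 1.0).
print([round(c, 3) for c in size_path_backward([4 / 3] * 3, 8.0, 8 / 3)])  # [1.0, 2.0, 4.0]
```

Recovering the specified input capacitance at the first stage is the same consistency check the examples perform by hand.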
Example 1.4 Consider the path from A to B involving three two-input NAND
gates shown in Figure 1.6. The input capacitance of the first gate is C and the
load capacitance is also C. What is the least delay of this path and how should the
transistors be sized to achieve least delay? (The next example will use the same
circuit with a different electrical effort.)
Example 1.5 Using the same network as in the previous example, Figure 1.6, find
the least delay achievable along the path from A to B when the output capacitance
is 8C.
Using the result from Example 1.4 that $G = (4/3)^3$ and the new
electrical effort $H = 8C/C = 8$, we compute $F = GBH = (4/3)^3 \times 8 = 18.96$,
so the least path delay is $\hat{D} = 3(18.96)^{1/3} + 3(2p_{inv}) = 14.0$
delay units. Observe that although the electrical effort in this
example is eight times the electrical effort in the earlier example, the
delay is increased by only 40%.
Now let us compute the transistor sizes that achieve minimum
delay. The stage effort $\hat{f} = 18.96^{1/3} = 8/3$. Starting with the output
load 8C, apply the capacitance transformation of Equation 1.19 to
compute input capacitance $z = 8C \times (4/3)/(8/3) = 4C$. Similarly,
$y = z \times (4/3)/(8/3) = z/2 = 2C$. To verify the calculation, compute
the capacitance of the first gate, $y \times (4/3)/(8/3) = y/2 = C$, matching
the design specification. Each successive logic gate has twice the
input capacitance of its predecessor. This is achieved by making the
transistors in a gate twice as wide as the corresponding transistors in
its predecessor. The wider transistors in successive stages are better
able to drive current into the larger loads.
Example 1.6 Optimize the circuit in Figure 1.7 to obtain the least delay along
the path from A to B when the electrical effort is 4.5.
Example 1.7 Size the circuit in Figure 1.8 for minimum delay. Suppose the load
is 20 microns of gate capacitance and that the inverter has 10 microns of gate
capacitance.
in terms of microns of gate width, as given in this problem.
The path has logical effort $G = 1 \times (5/3) \times (4/3) \times 1 = 20/9$.
The electrical effort is $H = 20/10 = 2$ and the branching effort is 1.
Thus, $F = GBH = 40/9$, and $\hat{f} = (40/9)^{1/4} = 1.45$.
Start from the output and work backward to compute sizes. $z =
20 \times 1/1.45 = 14$. $y = 14 \times (4/3)/1.45 = 13$. $x = 13 \times (5/3)/1.45 =
15$. These input gate widths are divided among the transistors in each
gate. Notice that the inverters are assigned larger electrical efforts
than the more complex gates because they are better at driving loads.
Also note that these calculations do not have to be very precise. We
will see in Section 3.6 that sizing a gate too large or too small by
a factor of 1.5 still results in circuits within 5% of minimum delay.
Therefore, it is easy to use “back of the envelope” hand calculations
to find gate sizes to one or two significant figures.
Note that the parasitic delay does not enter into the procedure for
calculating transistor sizes to obtain minimum delay. Because the
parasitic delay is fixed, independent of the size of the logic gate, ad-
justments to the size of logic gates cannot alter parasitic delay. In fact,
we can ignore parasitic delay entirely unless we want to obtain an ac-
curate estimate of the time required for a signal to propagate through
a logic network, or if we are comparing two logic networks that con-
tain different types of logic gates or different numbers of stages and
therefore exhibit different parasitic delays.
Example 1.8 Consider three alternative circuits for driving a load 25 times the
input capacitance of the circuit. The first design uses one inverter, the second
uses three inverters in series, and the third uses five in series. All three designs
compute the same logic function. Which is best, and what is the minimum delay?
In all three cases, the path logical effort is one, the branching
effort is one, and the electrical effort is 25. Equation 1.17 gives the
path delay $D = N(25)^{1/N} + N p_{inv}$ where N = 1, 3, or 5. For N = 1,
This example shows that the fastest speed obtainable depends on the number
of stages in the circuit. Since the path delay varies markedly for different values
of N , it is clear we need a method for choosing N to yield the least delay; this is
the topic of the next section.
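The comparison in Example 1.8 can be tabulated directly from $D = N(25)^{1/N} + N p_{inv}$. A short sketch, assuming p_inv = 1.0; `inverter_chain_delay` is an illustrative name.

```python
def inverter_chain_delay(H: float, N: int, p_inv: float = 1.0) -> float:
    """Minimum delay of N inverters in series (G = B = 1) driving electrical effort H."""
    return N * H ** (1.0 / N) + N * p_inv


for n in (1, 3, 5):
    print(n, round(inverter_chain_delay(25.0, n), 1))
# 1 26.0
# 3 11.8
# 5 14.5
```

Three stages win here; both too few and too many stages cost delay.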
Example 1.9 A string of inverters is used to drive a signal that goes off-chip
through a pad. The capacitance of the pad and its load is 35 pF, which is equiv-
alent to about 20,000 microns of gate capacitance. Assuming the load on the
input should be a unit-sized inverter in a 0.6 µm process with 7.2 microns of input
capacitance, how should the inverter string be designed?
Thus the input capacitance of each inverter in the string will be 3.75
times that of its predecessor. The path delay will be
$\hat{D} = 6 \times 3.75 + 6p_{inv} = 28.5$ delay units. This corresponds to an
absolute delay of $28.5\tau = 1.43$ ns, assuming τ = 50 ps.
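The arithmetic of Example 1.9 can be reproduced as follows. This sketch estimates the stage count with $\hat{N} \approx \log_4 F$ rather than reading Table 1.3, and assumes p_inv = 1.0 and τ = 50 ps.

```python
import math

LOAD = 20000.0  # pad load, in microns of gate capacitance
C_IN = 7.2      # unit-sized inverter input, in microns of gate capacitance
P_INV = 1.0
TAU_PS = 50.0   # tau assumed for the 0.6 um process

F = LOAD / C_IN             # G = B = 1 for an inverter string, so F = H
n_est = math.log(F, 4)      # N_hat is about log4 F
N = round(n_est)            # use a whole number of stages
f_hat = F ** (1 / N)        # size ratio between successive inverters
D = N * f_hat + N * P_INV   # minimum delay in units of tau

print(round(n_est, 2), N, round(f_hat, 2), round(D, 1), round(D * TAU_PS))
# 5.72 6 3.75 28.5 1425
```

The last value is the absolute delay in picoseconds, consistent with the 1.43 ns quoted above.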
This example finds the best ratio of the sizes of succeeding stages to be 3.75.
Many texts teach us to use a ratio of e = 2.718, but the reasoning behind the
smaller value fails to account for parasitic delay. As the parasitic delay increases,
the size ratio that achieves least delay rises above e, and the best number of stages
to use decreases. Chapter 3 explores these issues further and presents a formula
for the best stage effort.
In general, the best stage effort $\hat{f}$ is between 3 and 4. Targeting a stage effort
of 4 is convenient during design and gives delays within 1% of minimum delay
for typical parasitics. Thus, the number of stages $\hat{N}$ is about $\log_4 F$. We will find
that stage efforts between 2 and 8 give delays within 35% of minimum and efforts
between 2.4 and 6 give delays within 15% of minimum. Therefore, choosing the
right stage effort is not critical.
We will also see in Chapter 3 that an easy way to estimate the delay of a path is
to approximate the delay of a stage with effort of 4 as that of a fanout-of-4 (FO4)
inverter. We found in Example 1.2 that a FO4 inverter has a delay of 5 units.
Therefore, the delay of a circuit with path effort F is about 5 log4 F , or about
log4 F FO4 delays. This is somewhat optimistic because it neglects the larger
parasitic delay of complex gates.
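The FO4 rule of thumb is a one-liner. A sketch, assuming an FO4 inverter delay of 5 units (p_inv = 1.0); `fo4_delay_estimate` is a hypothetical name.

```python
import math


def fo4_delay_estimate(F: float, fo4: float = 5.0) -> float:
    """Rough path delay in units of tau: about log4(F) FO4 delays of `fo4` units each."""
    return fo4 * math.log(F, 4)


# Example 1.9's inverter string has F = 20000 / 7.2.
print(round(fo4_delay_estimate(20000 / 7.2), 1))  # 28.6 (Example 1.9 computes 28.5)
```

For an inverter string the estimate is close. For paths containing complex gates it would come out low, for the parasitic-delay reason given above.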
1.4 Summary
The method of logical effort is a design procedure for achieving the least delay
along a path of a logic network. It combines into one calculation the effort
required to drive large electrical loads and to perform logic functions.

Term               Stage expression    Path expression
Logical effort     g (Table 1.1)       G = ∏ g_i
Electrical effort  h = C_out/C_in      H = C_path,out / C_path,in
Branching effort   n/a                 B = ∏ b_i
Effort             f = gh              F = GBH = ∏ f_i
Effort delay       f                   D_F = Σ f_i, minimized when f_i = f̂ = F^(1/N)

The procedure is, in summary:
1. Compute the path effort, F = GBH, along the path of the network you are
analyzing. The path logical effort, G, is the product of the logical efforts
of the logic gates along the path; use Table 1.1 to obtain the logical effort
of each individual logic gate. The branching effort, B, is the product of the
branching effort at each stage along the path. The electrical effort, H, is
the ratio of the capacitance loading the last stage of the network to the input
capacitance of the first stage of the network.
2. Use Table 1.3 or estimate $\hat{N} = \log_4 F$ to find out how many stages, $\hat{N}$, will
yield the least delay.
$(g_i/\hat{f})C_{out}$ for each stage. The value of C_in for a stage becomes C_out for
the previous stage, perhaps modified to account for branching effort.
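A compact sketch of the path-effort and sizing steps for a path whose number of stages is already fixed. It is an illustration under stated assumptions, not the book's code: `design_path` is a hypothetical name, and it assumes that the total load on stage i is b_i times the on-path load that stage drives.

```python
import math
from typing import List, Sequence, Tuple


def design_path(
    g: Sequence[float],  # logical effort of each stage, first to last
    p: Sequence[float],  # parasitic delay of each stage, in units of tau
    b: Sequence[float],  # branching effort at each stage's output
    c_in: float,
    c_out: float,
) -> Tuple[float, float, float, List[float]]:
    """Return (F, f_hat, D_hat, input capacitance of each stage)."""
    F = math.prod(g) * math.prod(b) * (c_out / c_in)  # F = GBH
    n = len(g)
    f_hat = F ** (1.0 / n)                             # equal effort per stage
    d_hat = n * f_hat + sum(p)                         # minimum delay, n stages
    sizes: List[float] = []
    on_path_load = c_out
    for gi, bi in zip(reversed(g), reversed(b)):
        c = gi * bi * on_path_load / f_hat             # C_in = g * C_out / f_hat
        sizes.append(c)
        on_path_load = c                               # load seen by previous stage
    sizes.reverse()
    return F, f_hat, d_hat, sizes


# Example 1.7: inverter, 2-input NOR, 2-input NAND, inverter; 10 -> 20 microns.
F, f_hat, d_hat, sizes = design_path([1, 5 / 3, 4 / 3, 1], [1, 2, 2, 1], [1, 1, 1, 1], 10.0, 20.0)
print(round(f_hat, 2), [round(c, 1) for c in sizes])  # 1.45 [10.0, 14.5, 12.6, 13.8]
```

Example 1.7 arrives at 15, 13, and 14 because it rounds at each step. As the text notes, that level of precision does not matter.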
This design procedure finds the circuit with the least delay, without regard to
area, power, or other limitations that may be as important as delay. In some cases,
compromises will be necessary to obtain practical designs. For example, if this
procedure is used to design drivers for a high-capacitance bus, the drivers may be
too big to be practical. You may compromise by using a larger stage delay than
the design procedure calls for, or even by making the delay in the last stage much
greater than in the other stages; both of these approaches reduce the size of the
final driver and increase delay.
The method of logical effort achieves an approximate optimum. Because it ig-
nores a number of second-order effects, such as stray capacitances between series
transistors within logic gates, a circuit designed with the procedure given above
can sometimes be improved by careful simulation with a circuit simulator and
subsequent adjustment of transistor sizes. However, we have evidence that the
method of logical effort alone obtains designs that are within 10% of the mini-
mum.
One of the strengths of the method of logical effort is that it combines into
one framework the effects on performance of capacitive load, of the complexity
of the logic function being computed, and of the number of stages in the network.
For example, if you redesign a logic network to use high fan-in logic gates in
order to reduce the number of stages, the logical effort increases, thus blunting
the improvement. Although many designers recognize that large capacitive loads
must be driven with strings of drivers that increase in size geometrically, they are
not sure what happens when logic is mixed in, as occurs often in tri-state drivers.
The method of logical effort addresses all of these design problems.
The information presented in this chapter is sufficient to attack almost any
design. The next chapter applies the method to a variety of circuits of practical
importance. Chapter 3 exposes the model behind the method and derives the
equations presented in this chapter. Chapter 4 shows how to compute the logical effort
of a logic gate and exhibits a catalog of logic gate types. Chapter 5 describes how
to measure various parameters required by the method, such as p_inv and τ. The
remaining chapters explore refinements to the method and more intricate design
problems.
Figure 1.9: Two circuits, (a) and (b), for computing the AND function of two inputs.
Each input presents capacitance C and the output drives a load of 6C; x and y are
gate sizes to be computed.
1.5 Exercises
1-1 [20] Consider the circuits shown in Figure 1.9. Both have a fanout of 6, i.e.,
they must drive a load six times the capacitance of each of the inputs. What is the
path effort of each design? Which will be fastest? Compute the sizes x and y of
the logic gates required to achieve least delay.
1-2 [20] Design the fastest circuit that computes the NAND of four inputs with a
fanout of 6. Consider a 4-input NAND gate by itself, a 4-input NAND gate followed
by two additional inverters, and a tree formed by two 2-input NAND gates whose
outputs are connected to a 2-input NOR gate followed by an inverter. Estimate the
shortest delay achievable for each circuit. If the fanout were larger, would other
circuits be better?
1-3 [10] A 3-stage logic path is designed so that the effort borne by each stage is
10, 9, and 7 delay units, respectively. Can this design be improved? Why? What
is the best number of stages for this path? What changes do you recommend to
the existing design?
1-4 [10] A clock driver must drive 500 minimum-size inverters. If its input must
be a single minimum-size inverter, how many stages of amplification should be
used? If the input to the clock driver comes from outside the integrated circuit via
an input pad, could fewer stages be used? Why?
1-5 [15] A particular system design of interest will have eight levels of logic
between latches. Assuming that the most complex circuits involve 4-input NAND
gates in all eight levels of logic, estimate a useful clock period.
1-6 [20] A long metal wire carries a signal from one part of a chip to another.
Only a single unit load may be imposed on the signal source. At its destination
the signal must drive 20 unit loads. The wire capacitance is equivalent to 100 unit
loads; assume the wire has no resistance. Design a suitable amplifier. You may
invert the signal if necessary. Should the amplifier be placed at the beginning,
middle, or end of the wire?
Chapter 2
Design Examples
This chapter presents a number of design examples worked out in detail. To sim-
plify the presentation, some of the designs are simpler than cases that are likely
to arise in practice. The last design, however, is taken from an actual problem
confronted by designers.
As you read through the examples, focus not only on how the mechanics of the
method of logical effort are applied, but also on the insights into circuit structure
that the concepts of logical effort permit. Perhaps the greatest strength of the
method of logical effort is in simplifying analysis of structural variants.
All of these examples assume we are using CMOS logic gates with p_inv = 1.0.
Values for the logical effort and parasitic delay of logic gates are obtained from
Tables 1.1 and 1.2 respectively. The best number of stages to accommodate a
given path effort is obtained from Table 1.3.
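Figure 2.1: Three circuits for computing the AND of eight inputs. (a) An 8-input NAND
gate (g = 10/3, p = 8) followed by an inverter (g = 1, p = 1). (b) 4-input NAND gates
(g = 2, p = 4) followed by a 2-input NOR gate (g = 5/3, p = 2). (c) 2-input NAND,
2-input NOR, 2-input NAND, and inverter stages.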
2.1. THE AND FUNCTION OF EIGHT INPUTS 25
circuit is (4-NAND, 2-NOR) and the third is (2-NAND, 2-NOR, 2-NAND, inverter).
Often the networks are symmetric, so that all paths through the network have the
same description, as is the case with all three circuits in Figure 2.1.
Let us start the analysis by computing the logical effort of each of the three
alternatives. In case a, the path logical effort is the product of the logical effort
of an 8-input NAND gate, which is 10/3, and that of an inverter, which is 1, so
$G = 10/3 \times 1 = 3.33$. In case b, the logical effort is the product of 6/3, the
logical effort of a 4-input NAND gate, and 5/3, the logical effort of a 2-input NOR
gate, for a total of 10/3 = 3.33, the same as case a. The logical effort in the
last case is computed as $(4/3) \times (5/3) \times (4/3) \times 1 = 2.96$. Since we know that
logical effort is related to delay, we might conclude that the last case is the fastest
because it yields the lowest logical effort.
Logical effort is not the only aspect to consider, however, because the load to
be driven will also influence the speed of the circuit. In particular, the circuits do
not all have the same number of stages, and the method of logical effort shows
that minimum delay is obtained only when the number of stages is chosen to
accommodate the effort, both logical and electrical. So we can’t decide which
circuit will achieve the least delay until we know the electrical effort and can
determine the best number of stages.
The delay equation, Equation 1.17, tells us how the minimum delay that can
be obtained from each circuit is related to the electrical effort H the circuit bears.
These equations also include the effect of the parasitic delays, obtained by sum-
ming the parasitic delays of each of the logic gates along the path:
$$\hat{D} = N(GBH)^{1/N} + P$$
Case a: $$\hat{D} = 2(3.33H)^{1/2} + 9.0 \qquad (2.1)$$
Case b: $$\hat{D} = 2(3.33H)^{1/2} + 6.0 \qquad (2.2)$$
Case c: $$\hat{D} = 4(2.96H)^{1/4} + 7.0 \qquad (2.3)$$
Let us illustrate the effect of electrical effort on circuit choice by solving two
problems, one with H = 1, and one with H = 12. Table 2.1 shows the results
of evaluating the delay equations for the three circuits with different electrical
efforts. The table shows that for H = 1, the designs with two stages (cases a and
b) have less effort delay than the design with four stages (case c). Of the two-stage
designs, case b is faster because it has less parasitic delay. When the electrical
effort increases to H = 12, the design with the larger number of stages is best.
              H = 1                           H = 12
Case   N F^(1/N)   P     D̂ = N F^(1/N) + P    N F^(1/N)   P     D̂ = N F^(1/N) + P
a      3.65        9.0   12.65                 12.64       9.0   21.64
b      3.65        6.0   9.65                  12.64       6.0   18.64
c      5.25        7.0   12.25                 9.77        7.0   16.77

Table 2.1: Delays for computing the AND of eight inputs for two different values
of electrical effort.

These results agree with the predictions for the best number of stages to use
for a given path effort. Since the logical effort of all three circuits is approximately
three, we find that the path effort when H = 1 is F = GBH ≈ 3, while when
H = 12, F ≈ 36. Table 1.3 shows that when F = 3, a one-stage design will be
best, while when F = 36, a three-stage design will be best. Clearly, cases a and
b best approximate a one-stage design. It is not immediately obvious whether a
two-stage or four-stage path is closest to the three-stage design recommended by
the table, but usually it is better to err by one stage too many, as happens in this
example where case c is the fastest. Note that this reasoning ignores the effects of
parasitic delay when the logic gate types in the competing circuits are different,
as they are in this case. While this method yields approximate answers, a precise
answer requires comparing the delay equations for each circuit.
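As a check on Table 2.1, the delay equations can be evaluated directly. The short Python sketch below assumes only the N, G, and P values of Equations 2.1 to 2.3 with B = 1; the names `CASES` and `min_delay` are illustrative and not from the text.

```python
# Minimum delay of an N-stage path: D = N * (G*B*H)**(1/N) + P.
# (N, G, P) for each AND8 circuit are taken from Equations 2.1-2.3; B = 1.
CASES = {"a": (2, 3.33, 9.0), "b": (2, 3.33, 6.0), "c": (4, 2.96, 7.0)}


def min_delay(n, g, p, h, b=1.0):
    """Return (effort delay N*F^(1/N), total delay) for the best sizing of the path."""
    effort_delay = n * (g * b * h) ** (1.0 / n)
    return effort_delay, effort_delay + p


for h in (1, 12):
    for name, (n, g, p) in CASES.items():
        effort, total = min_delay(n, g, p, h)
        print(f"H={h:2d} case {name}: NF^(1/N)={effort:5.2f} P={p:.1f} D={total:5.2f}")
    best = min(CASES, key=lambda k: min_delay(*CASES[k], h)[1])
    print(f"  fastest for H={h}: case {best}")
```

Changing the values of H in the outer loop shows where the fastest case switches from b to c.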
This example shows that the choice of circuit to use depends on the load to be
driven. Because there is a relationship between the load and the best number of
stages, one must know the size of the load capacitance in relation to the size of
the input capacitance in order to make the proper choice of circuit structure.
capacitance given the output load.
The inverter at the right should have $C_{in} = 48 \times 1/2.44 = 19.66$. This becomes the load for the third stage, which therefore should have $C_{in} = 19.66 \times (4/3)/2.44 = 10.73$. This in turn becomes the load for the NOR in the second stage, which should have $C_{in} = 10.73 \times (5/3)/2.44 = 7.33$. Finally, we can use this as the load on the NAND gate in the first stage, which should have $C_{in} = 7.33 \times (4/3)/2.44 = 4.0$. This agrees with the specified input capacitance, so our calculation checks.
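The backward pass can be written as a small loop. This is a sketch assuming equal stage effort, no branching, and the stage logical efforts 4/3, 5/3, 4/3, and 1 used above; `size_backward` is a hypothetical helper. It uses G = 80/27 exactly rather than 2.96, so the results differ from the hand calculation only in the last digit, and the first-stage capacitance reproduces the specified 4 units.

```python
def size_backward(g, c_in, c_out):
    """Equal-effort sizing of a non-branching path (B = 1).

    g     -- logical effort of each stage, first to last
    c_in  -- input capacitance of the first stage
    c_out -- load capacitance on the last stage
    Returns the best stage effort and each stage's input capacitance, first to last.
    """
    G = 1.0
    for gi in g:
        G *= gi
    f_hat = (G * c_out / c_in) ** (1.0 / len(g))  # f = F^(1/N)
    caps, load = [], c_out
    for gi in reversed(g):
        c = load * gi / f_hat  # from f = g * (load / C_in)
        caps.append(c)
        load = c  # this input becomes the load on the preceding stage
    caps.reverse()
    return f_hat, caps


# Case c with H = 12: NAND, NOR, NAND, inverter; C_in = 4, C_out = 48.
f_hat, caps = size_backward([4 / 3, 5 / 3, 4 / 3, 1.0], c_in=4.0, c_out=48.0)
print(f"f = {f_hat:.2f}", [round(c, 2) for c in caps])
```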
If Ben Bitdiddle were building a full-custom chip, he could select transistor
sizes for each gate to match the input capacitances we have just computed. This
will be discussed further in Section 4.3. If Ben were using an existing cell library,
he could simply select the gates from the library which have input capacitances
closest to the computed values. We will see in Section 3.6 that modest deviation
from the computed sizes still gives excellent performance, so he should not be
concerned if his library does not have a cell of exactly the desired size. Even for a
full-custom design, it is necessary to adjust transistor sizes to the nearest available
size, such as an integer.
Since rounding will occur anyway and precision in sizing is not very im-
portant, experienced designers often perform logical effort calculations mentally,
keeping results to only one or two significant figures.
Now let us consider electrical effort of unity, which calls for design 2.1b. We
will again assume that the input capacitance is 4 units, so now the output capac-
itance is also 4 units. To obtain the fastest operation, each stage should bear an
effort $\hat{f} = F^{1/2} = (3.33 \times 1)^{1/2} = 1.83$.
Working backward, the NOR gate in the second stage should have $C_{in} = 4 \times (5/3)/1.83 = 3.64$. This is the load on the first-stage NAND gate, which must have input capacitance of 4. Notice that the NAND has an electrical effort of $3.64/4 = 0.91$,
less than one! This result may seem somewhat alarming at first, but it simply
means that the load on the gate’s output must be less than the load presented at its
input, in order that the gate be sufficiently lightly loaded that it can operate in the
required time. In other words, since we’re equalizing effort in each stage, a stage
with large logical effort g must have small electrical effort h.
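The same arithmetic for case b, written out as a Python sketch. The NAND's logical effort of 2 is an assumption inferred from the path logical effort 3.33 and the NOR's 5/3; the variable names are illustrative.

```python
# Case b with H = 1: first-stage NAND (assumed g = 2) drives second-stage NOR (g = 5/3).
g_nand, g_nor = 2.0, 5.0 / 3.0
c_in = c_out = 4.0

F = g_nand * g_nor * c_out / c_in  # path effort, no branching
f_hat = F ** 0.5                   # each of the two stages bears F^(1/2)
c_nor = c_out * g_nor / f_hat      # NOR input capacitance, working backward
h_nand = c_nor / c_in              # electrical effort seen by the NAND

print(f"f = {f_hat:.2f}, NOR C_in = {c_nor:.2f}, NAND h = {h_nand:.2f}")
# The NAND's large g forces its h below 1 so that g*h still equals f_hat.
```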
2.2 Decoder
Ben Bitdiddle is now responsible for memory design on the Motoroil 68W86, an
embedded processor targeting automotive applications. He must design a decoder
for a 16-word register file. Each register is 32 bits wide and each bit cell presents
a total load, gate and wire, equal to 3 unit-sized transistors. True and complemen-
tary versions of the four address bits are available and can each drive 10 unit-sized
transistors.
The decoder could be designed with a few stages of high fan-in gates or with many stages of simple gates. The best topology depends on the effort of the path. Unfortunately, the path effort depends on the logical effort, which depends in turn on the topology!
[Figure 2.2: three-stage decoder. The true and complementary address bits a0 through a3 each drive an inverter of size x; 4-input NAND gates of size y feed output inverters of size z, producing out0 through out15.]
Because a decoder is a relatively simple structure, we can make an initial estimate of the path effort by assuming the logical effort is unity. The electrical effort is $32 \times 3/10 = 9.6$. The branching effort is 8 because the true and complementary address inputs each control half of the outputs. Path effort is $9.6 \times 8 = 76.8$. Hence, we should use about $\log_4 76.8 = 3.1$ stages. Since we neglected logical
effort, the actual number of stages will be slightly higher than the number we
have estimated. A 3-stage circuit is shown in Figure 2.2 while a 4-stage circuit is
considered in Exercise 2-3.
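The first-cut estimate is simple enough to script. This sketch, with illustrative names, treats every gate as having logical effort 1, exactly as the estimate above does.

```python
import math

# First-cut estimate for the decoder, treating every gate as having g = 1.
load = 32 * 3            # 32 bit cells per word, each like 3 unit transistors
c_address = 10           # each address signal can drive 10 units
H = load / c_address     # electrical effort, 9.6
B = 8                    # each true or complement address bit feeds 8 of 16 words
F_est = B * H            # path effort with G taken as 1
n_est = math.log(F_est, 4)  # number of stages at an effort of about 4 each

print(f"H = {H:.1f}, F = {F_est:.1f}, N = {n_est:.1f}")
```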
The circuit uses sixteen 4-input NAND gates. Since each address input must
drive eight of the NAND gates, yet can handle only a relatively small input capaci-
tance, we use an inverter to power up the signal. How do we size the decoder and
what is its delay?
[Figure: decoder variant in which each true address bit a0 through a3 feeds a one-inverter leg (size x) and a two-inverter leg (sizes u, then v); NAND gates of size y and output inverters of size z produce out0 through out15.]
Because the logical effort is $1 \times 2 \times 1 = 2$, the actual path effort is 154 and the stage effort is $f = (154)^{1/3} = 5.36$. Working from the output, the final inverter must have input capacitance $z = (32 \times 3) \times 1/5.36 = 18$ and the NAND gate must have input capacitance $y = 18 \times 2/5.36 = 6.7$. The delay is $3f + P = 3 \times 5.36 + 1 + 4 + 1 = 22.1$. These results are summarized in Case 1 of Table 2.2.
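The sizing of Case 1 can be reproduced as follows. This is a minimal sketch assuming, as in the text, a logical effort of 2 and parasitic delay of 4 for the 4-input NAND and a parasitic delay of 1 for each inverter. Before rounding, z comes out near 17.9.

```python
# Three stages: input inverter (x = 10), 4-input NAND (g = 2), output inverter.
G = 1 * 2 * 1
B, H = 8, 96 / 10
F = G * B * H          # 153.6, about 154
f = F ** (1 / 3)       # equal effort in each of the three stages
z = 96 / f             # output inverter: f = 1 * 96 / z
y = 2 * z / f          # NAND: f = 2 * z / y
P = 1 + 4 + 1          # inverter + NAND4 + inverter parasitic delays
D = 3 * f + P

print(f"F = {F:.1f}, f = {f:.2f}, z = {z:.1f}, y = {y:.1f}, D = {D:.1f}")
# Check: the 10-unit input inverter driving 8 NAND inputs also bears effort f.
print(f"first-stage effort = {8 * y / 10:.2f}")
```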
Suppose we keep all sizes the same except to choose a size v for the extra inverter. We recall that the stage efforts of inverters u and v should be equal and are therefore $\sqrt{5.36} = 2.32$ because they must together bear the same effort as the one-inverter path. Therefore, we can size $v = 10 \times \sqrt{5.36} = 23.2$. The delay of the decoder via the 2-inverter leg is now $2.32 \times 2 + 5.36 \times 2 + 1 + 1 + 4 + 1 = 22.4$. These results are summarized in Case 2 of Table 2.2. This topology is less than 2% slower than the original design, so the approximation worked well.
Table 2.2: Sizes and delays of decoder designs.
Case   x      y     z      u     v      P   D
1      10     6.7   18     -     -      6   22.1
2      10     6.7   18     10    23.2   7   22.4
3      11.2   9.8   21.6   8.8   26.2   7   21.8
Figure 2.4: The physical arrangement of five units connected by a common bus and arbitration circuitry. The units are sufficiently large that the wires between them have significant stray capacitance. [Figure labels: bus, arbitration wiring.]
If we were concerned about every picosecond of delay, we could try tweaking
some of the sizes. For example, the circuit may be improved by dedicating more
than half of the address input capacitance to one leg of the fork. Also, the circuit
may be improved by choosing a stage effort for the second two stages between
the efforts used for the 1-inverter and 2-inverter legs of the fork. We found the
best sizes by writing the delay equations in a spreadsheet and letting it solve for
minimum delay. The results are summarized in Case 3 of Table 2.2. The delay
improvement is tiny and was probably not worth the effort.
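To compare the three rows of Table 2.2, the delay of each leg of the fork can be written stage by stage. This sketch assumes the decoder structure described above: each inverter has g = 1 and p = 1, the 4-input NAND has g = 2 and p = 4, every address-derived signal drives 8 NAND inputs of size y, and each output inverter drives 96 units. The helper names are hypothetical, and the decoder delay is taken as the slower of the two legs.

```python
FANOUT, LOAD, G_NAND, P_NAND, P_INV = 8, 96, 2.0, 4.0, 1.0


def tail(y, z):
    """Effort plus parasitic delay of the NAND stage and the output inverter."""
    return G_NAND * z / y + P_NAND + LOAD / z + P_INV


def one_inverter_leg(x, y, z):
    # Inverter x drives 8 NAND inputs of size y.
    return FANOUT * y / x + P_INV + tail(y, z)


def two_inverter_leg(u, v, y, z):
    # Inverter u drives inverter v, which drives 8 NAND inputs of size y.
    return v / u + P_INV + FANOUT * y / v + P_INV + tail(y, z)


def decoder_delay(x, y, z, u=None, v=None):
    legs = [one_inverter_leg(x, y, z)]
    if u is not None:
        legs.append(two_inverter_leg(u, v, y, z))
    return max(legs)  # the slower leg sets the decoder delay


# Sizes (x, y, z, u, v) from Table 2.2; Case 1 has no two-inverter leg.
TABLE_2_2 = {
    1: (10, 6.7, 18, None, None),
    2: (10, 6.7, 18, 10, 23.2),
    3: (11.2, 9.8, 21.6, 8.8, 26.2),
}

for case, sizes in TABLE_2_2.items():
    print(f"case {case}: D = {decoder_delay(*sizes):.2f}")
```

Because the table's sizes are rounded, the computed delays agree with Table 2.2 only to within about 0.1.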
Ben Bitdiddle, faced with bizillions of transistors to design, would rather not waste time tweaking sizes for tiny speedups. How could he have found in advance that his design was good enough? We will show in Section 3.4 that the best possible delay is about $4\log_4 F + P$, where the best stage effort is about 4. Therefore, a lower bound on the delay of the circuit in Figure 2.2 is $4\log_4 154 + 1 + 4 + 1 = 20.5$.
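The bound takes one line to evaluate. A sketch, using the path effort 154 and parasitic delay 6 from the text:

```python
import math

F = 154          # path effort of the decoder in Figure 2.2
P = 1 + 4 + 1    # parasitic delay: inverter, NAND4, inverter
bound = 4 * math.log(F, 4) + P  # log4(F) stages, each with effort delay 4
print(f"estimated best delay: {bound:.1f}; Case 3 achieved 21.8")
```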
Figure 2.5: Arbitration circuit for five units, using a daisy-chain method. Unit 1 has highest priority, and unit 5 lowest. Only the critical path is shown; additional circuitry is required to compute grant signals for units 1 through 4. [Figure labels: requests R1 to R5, daisy-chain signals C1 to C4, grant G5, stage delays d1 to d5, gate-input loads of 10, and 180 units of wire capacitance on the daisy chain.]
Thus the gates on the daisy chain alternate between NAND and NOR gates, and
the daisy chain signal alternates between true and complement forms. Figure 2.5
shows all of the circuitry on the critical path from R1 to G5, but omits much of the
rest. We assume that the request signals are available in true or complement form,
that the grant signal can be computed in complement form, that each Ri and Gi
is loaded with 10 units of capacitance, and that the daisy-chain wire leading from
one function unit to the next has a stray capacitance of 180 units.
Let us start by estimating the speed of the circuit shown in the figure. We will
analyze the stage delay di in each of the five stages, as shown in Figure 2.5. For
each stage, we determine the electrical and logical effort, which we multiply to
obtain the effort delay. The results are shown in Table 2.3: the overall effort delay
is 103, and parasitic delay is 9, for a total delay of 112.
Table 2.3 illustrates some of the defects in the circuit design of Figure 2.5. We
know that overall delay is least when the effort delay is the same in every stage,
but in this design the delays vary between 1.7 and 32. This observation suggests
that we have used the wrong number of stages in the design.
Let us compute the effort along the path. The electrical effort is 1, because
both the input capacitance of R1 and the output capacitance of G5 are 10. There
are four sites along the path at which the branching effort is $(180 + 10)/10 = 19$, due to the stray capacitance of the wiring; thus the branching effort is $19^4$. The logical effort is the product of the logical efforts of the gates, or $1 \times (4/3) \times (5/3) \times (4/3) \times (5/3) = 4.94$. The path effort is therefore $F = GBH = 4.94 \times 19^4 \times 1 = 643785$. Table 1.3 shows that we should be using 10 stages, rather than the five in
the present design. This is a big error, which suggests there is room for dramatic
improvement.
Figure 2.6: An improved arbitration circuit, using two stages of logic for each unit. [Figure labels: daisy-chain input C(i-1) and output Ci with 180 units of wire capacitance, gate sizes w, x, and z, request Ri and grant Gi with loads of 10, and end units R1/G1 and R5/G5.]
A simple improvement is to enlarge the NAND gates along the daisy chain. If the input capacitance of each gate input were 90 rather than 10, the branching effort would be reduced to $3^4$ and the total effort becomes $F = 4.94 \times 81 \times 1 = 400$. This calls for a 5-stage design, with an estimated delay of $5(400)^{1/5} + 9p_{inv} = 25.6$, which is a vast improvement over the estimate of 112 for the original design. However, this change increases the load on each of the request signals, which will add more delay as well as more area.
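The effort bookkeeping for the daisy chain can be sketched as below, assuming the four wire nodes and the logical efforts listed above. The code uses G = 400/81 exactly rather than 4.94, so its path effort for 10-unit inputs comes out slightly below 643785; the function name is hypothetical.

```python
import math

WIRE = 180                                     # stray capacitance on each daisy-chain wire
G = 1 * (4 / 3) * (5 / 3) * (4 / 3) * (5 / 3)  # path logical effort, about 4.94
H = 10 / 10                                    # C_out of G5 over C_in of R1


def path_effort(c_gate):
    """Path effort F = G*B*H when each daisy-chain gate input is c_gate units."""
    b_site = (WIRE + c_gate) / c_gate  # branching effort at one wire node
    return G * b_site ** 4 * H         # four such nodes on the critical path


for c_gate in (10, 90):
    F = path_effort(c_gate)
    print(f"gate input {c_gate}: F = {F:.0f}, log4 F = {math.log(F, 4):.1f}")

# Five stages with 90-unit inputs, keeping the original parasitic delay of 9.
print(f"estimated delay: {5 * path_effort(90) ** (1 / 5) + 9:.1f}")
```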
on this path is the stray capacitance, 180, plus x + z, the input capacitance of the two NAND gates in the next unit. For the critical path, $H = C_{out}/C_{in} = (180 + x + z)/x$. The logical effort along this path is the logical effort of the NAND gate, which is $4/3 \times 1 = 4/3$. For the design to be fast, we know that we should target a stage effort of about 4, as discussed in Section 1.3. Because we are using a two-stage design, the two stages should bear an effort of $4 \times 4 = 16$. So we have the equation:
$$F = GH \tag{2.9}$$
$$16 = \frac{4}{3} \cdot \frac{180 + x + z}{x} \tag{2.10}$$
To solve this equation, we will assume that z is small compared to 180 + x, and can be neglected. Solving, we obtain x = 16.4.
We can now calculate y in two ways. The NAND gate stage should have an effort delay of 4, so:
$$f = gh \tag{2.11}$$
$$4 = \frac{4}{3} \cdot \frac{y}{x} \tag{2.12}$$
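Solving Equation 2.10 with z neglected is a matter of rearranging: $16x = \frac{4}{3}(180 + x)$ gives $x = 180/11$, and Equation 2.12 then gives $y = 3x$. A Python sketch of the same steps, with illustrative variable names:

```python
# Solve Eq. 2.10 with z neglected: 16 = (4/3) * (180 + x) / x.
g_nand = 4 / 3   # NAND followed by an inverter (g = 1)
target = 16      # two stages, each with effort 4
wire = 180

# Rearranged: x * (target / g_nand) = 180 + x  =>  x = 180 / (target / g_nand - 1)
x = wire / (target / g_nand - 1)
# Eq. 2.12: 4 = (4/3) * (y / x)  =>  y = 4 * x / g_nand
y = 4 * x / g_nand

print(f"x = {x:.1f}, y = {y:.1f}")
```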
x + z = 22.3 units for the input capacitance of unit 2, for a total of 212. Thus the electrical effort is $H = 212/10 = 21.2$. Since the logical effort of the inverters is 1, the path effort F is also 21.2. Table 1.3 tells us that two stages of logic are required to bear this effort, but we need an odd number of inversions. Shall we use one or three inverters? The effort is closer to the range for three inverters than one, so we use three. Another way of choosing the number of stages is to compute $N = \log_4 F = 2.2$ and then round N to 3, the nearest odd number of stages. The stage effort delay will be $H^{1/N} = 21.2^{1/3} = 2.8$. We know that the input capacitance of the first inverter is 10 units, so the input capacitance of the second will be $10 \times 2.8 = 28$, and that of the third will be $10 \times 2.8 \times 2.8 = 78$.
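The inverter-chain sizing with an odd-stage constraint can be sketched as follows. The 212-unit load is taken from the text; the code keeps full precision for the stage effort (about 2.77), so the third inverter comes out near 76.6 rather than the 78 obtained with f rounded to 2.8.

```python
import math

H = 212 / 10             # 212-unit load (from the text) over a 10-unit input
n_real = math.log(H, 4)  # about 2.2 stages at effort 4
# The chain must invert, so pick the odd stage count closest to n_real.
n = min(range(1, 10, 2), key=lambda k: abs(k - n_real))
f = H ** (1 / n)         # inverters have g = 1, so stage effort = h
caps = [10 * f ** i for i in range(n)]  # input capacitance of each inverter

print(f"N = {n}, f = {f:.2f}, sizes = {[round(c, 1) for c in caps]}")
```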
Now that the design is finished, let us compute the delay we expect along
the critical path from R1 to G5 . This calculation is largely a matter of recalling
the stage delays used to obtain the transistor sizes. The calculation appears in
Table 2.4. The path effort delay is 33:7 and the parasitic delay is 14, for a total of
47:7. The improved circuit is better than twice as fast as the original. The designer
of the original tried to achieve speed by minimizing the number of logic gates in
the circuit, but a far faster circuit uses twice the number of gates!
Also notice that in this circuit the fixed wiring capacitance still dominates the
loading. Therefore, larger gates could have been used in the daisy chain, only
slightly increasing total loading on the Ci signals while significantly reducing
stage effort. Finding exact solutions to problems with fixed loading usually re-
quires iteration, but the essential idea is to enlarge gates on the node with fixed
capacitance until their input capacitance becomes a non-negligible portion of the
node capacitance.
[Figure: broadcast arbitration circuit with signals B2, B3, and B4, gates of size x, and request and grant pairs R1/G1, R3/G3, R5/G5.]
2.4 Summary
The design examples in this chapter illustrate a number of points about designing
for high speed.
Tree structures are an attractive way to combine a great many inputs, espe-
cially when the electrical effort is large. These structures show up in adders,
decoders, comparators, etc. Chapter 11 shows further design examples of
tree structures.
Minimizing the number of gates is not always a good idea. The design of
Figure 2.6 uses twice as many gates in the critical path as the design of
Figure 2.5, but is substantially faster. The best number of stages depends on
the overall path effort.
Because delay grows only as the logarithm of the capacitive load, it is almost
always wise to consolidate load in one part of the circuit rather than to
distribute it around. Thus the broadcast scheme in Figure 2.7 is better than
the daisy-chain method. Section 7.4 considers this problem further.
When a path has a large fixed load, such as wire capacitance, the path can be
made faster by using a large receiving gate on the node because the larger
gates will provide much more current, yet only slightly increase the total
node capacitance. In other words, the larger receiver reduces the branching
effort of the path.
While the parasitic delay is important to estimate the actual delay of a de-
sign, it rarely enters directly into our calculations. Rather, it enters indirectly
into the choice of the best number of stages and, equivalently, the best effort
borne by each stage.
2.5 Exercises
2-1 [20] Compare the delays of the three cases in Figure 2.1 by plotting three
curves on one graph, one curve for each of the delays predicted by Equations 2.1
to 2.3. The graph should show total delay as a function of electrical effort, H , up
to H = 200. Consider also a case similar to case c, but with two more inverters
connected to the output. Write the delay equation for this case and add its plot to
the graph. What does the graph show?
2-2 [20] Find the network that computes the OR function of six inputs in least
time, assuming an electrical effort of 140. The network may use NAND and NOR
gates with up to four inputs, as well as inverters.
2-3 [20] Since we did not include logical effort in the estimate of the number of
decoder stages, we may have underestimated the best number of stages. Suppose
the decoder design with true and complementary inputs from Figure 2.2 were
modified to use 4 stages instead of 3 by adding another input inverter. Find the
best size for each stage and the delay of the decoder. Is it better or worse than the
3-stage design? Is the difference significant?
2-4 [15] The critical path for the middle units of the arbitration circuit in Figure 2.6 is from $C_{i-1}$ to $C_i$. This suggests that the sizes of the gates associated with
Ri and Gi can be made as small as we wish, e.g., w = z = 1. Is this a good idea?
Why or why not?
2-5 [10] The design in Figure 2.6 uses a NAND gate in each stage. Why not use
a NOR gate?
2-6 [25] The design in Figure 2.6 uses some rather large transistors. Suppose
that the largest logic gate you may use has an input capacitance of 30 units. How
fast a design can you obtain?
2-7 [25] Using the reasoning outlined in Section 2.3.3, compute transistor sizes
for the design in Figure 2.7, without assuming that x = 10. Why is x = 10 for the
fastest design?
2-8 [30] Suppose you are told to design an arbitration circuit like the ones
described in Section 2.3, with the requirement that its overall delay be no more
than 60 units. Which structure would you choose? Show a detailed design.
Chapter 3
Deriving the Method of Logical Effort
The method of logical effort is a direct result of a simple model of logic gates in
which delays result from charging and discharging capacitors through resistors.
The capacitors model transistor gates and stray capacitances; the resistors model
networks of transistors connected between the power supply voltages and the out-
put of a logic gate. The derivations presented in this chapter provide a physical
basis for the following notions:
The logical effort, electrical effort, and parasitic delay are parameters of a
linear equation that gives the delay in a logic gate.
The least delay along a path of logic gates is obtained when each logic gate
bears the same effort.
The number of stages to use in a path for least delay can be computed know-
ing only the effort along the path and, remarkably, the parasitic delay of an
inverter.
The extra delay incurred by using the wrong number of stages is small un-
less the error in the number of stages is large.
Figure 3.1: Conceptual model of a logic gate, showing only one input. The output is driven HIGH or LOW through a resistor. [Figure labels: input capacitance Cin, pullup resistance Rui, pulldown resistance Rdi, parasitic capacitance Cpi, load Cout.]
The logic gate is modeled by the four quantities Cin, Rui, Rdi, and Cpi, which
are related in various ways depending on the particular logic function, the perfor-
mance of the transistors in the CMOS process used, and so on. Because we are
interested in choosing transistor sizes to obtain minimum delay, we shall view a
Figure 3.2: A design for an inverter. The transistors are labeled with the ratio of the width to length of the transistor (Wp/Lp for the pullup, Wn/Ln for the pulldown).
The scaling of the template increases the widths of all transistors by the factor
α, leaving the transistor lengths unchanged. As a transistor’s width is scaled,
its gate capacitance increases by the scale factor, while its resistance decreases
by the scale factor. The relationships shown in these equations also reflect an
assumption that the pullup and pulldown resistances are equal, so as to obtain
equal rise and fall times when the output of the logic gate changes. This restriction
makes circuits slightly slower overall; it will be relaxed in Chapter 9.
The model shown in Figure 3.1 relates easily to the design of an inverter, such
as the template shown in Figure 3.2. The n-type pulldown transistor, with width
Wn and length Ln, is modeled in Figure 3.1 by the switch and resistor R di that
form a path from the output to ground. The p-type pullup transistor, with width
Wp and length Lp, is modeled by the switch and resistor Rui forming a path to
the positive power supply. The input signal is loaded by the capacitance formed
by the gates of both transistors, which is proportional to the area of the transistor
gates:
$$C_t = \kappa_1 W_n L_n + \kappa_1 W_p L_p$$
where κ1 is a constant that depends on the fabrication process. The resistances are determined by:
$$1/R_t = \kappa_2 \mu_n W_n/L_n = \kappa_2 \mu_p W_p/L_p$$
where κ2 is a constant that depends on the fabrication process, and the μ's characterize the relative mobilities of carriers in n- and p-type transistors. Note that this equation implies a constraint on the design of the inverter template to ensure that pullup and pulldown resistances are equal, namely $\mu_n W_n/L_n = \mu_p W_p/L_p$.
The model of Figure 3.1 also relates easily to logic gates other than inverters.
Each input is loaded by the capacitance of the transistor gates it drives. The circuit
of the logic gate is a network of source-to-drain connections of transistors such
that the output of the logic gate can be connected either to the power supply or
to ground, depending on the voltages present on the input signals that control
the transistors in the network. The pullup and pulldown resistances shown in the
model are the effective resistances of the network when the pullup or pulldown
path is active. We shall defer until Chapter 4 a detailed analysis of popular logic
gates and their correspondence to the model.
$$h = \frac{C_{out}}{C_{in}} \tag{3.9}$$
$$p = \frac{R_t C_{pt}}{R_{inv} C_{inv}} \tag{3.10}$$
where Cinv is the input capacitance of the inverter template, and Rinv is the resis-
tance of the pullup or pulldown transistor in the inverter template.
Equation 3.6 gives the delay of a logic gate in terms of logical effort g , electri-
cal effort h, and parasitic delay p. This equation expresses absolute delay, unlike
its counterpart, Equation 1.5, where delay is measured in delay units. Absolute
delay and delay units are related by the time, τ, that is characteristic of the fabrication process. It is the delay of an ideal inverter with electrical effort of 1 and no parasitic delay. With more accurate transistor models and a reformulation of Equation 3.4, we could develop an analytic value for τ, expressed in terms of transistor length and width, gate oxide thickness, mobility, and other process parameters. We shall use an alternative approach, extracting the value of τ from suitable test circuits (see Section 5.1).
The logical effort, given by Equation 3.8, is determined by the circuit topology
of the template for the logic gate, and is independent of the scale factor α. In
effect, the logical effort compares the characteristic RC time constant of a logic
gate with that of an inverter. Note that the logical effort of an inverter is chosen to
be 1.
The electrical effort, defined by Equation 3.9, is just the ratio of the load ca-
pacitance of the logic gate to the capacitance of a particular input. This is the same
as the definition in Equation 1.4. Observe that the size of the transistors used in
the logic gate influences the electrical effort, because it determines the gate’s input
capacitance. This is the only remnant of the scale factor α.
Finally, Equation 3.10 defines the parasitic delay of the logic gate. Because
this equation is independent of the logic gate’s scale, α, it represents a fixed delay
associated with the operation of the gate, irrespective of its size or load. Observe
that for an inverter, the parasitic delay p is the ratio of the parasitic capacitance to
the input capacitance.
The linear relationship between delay and load expressed in Equation 3.6 is a
more general result than the formulation of our model might suggest. Although
our derivation has assumed that transistors behave like resistors, we would obtain
the same linear relationship if we had assumed that transistors are current sources.
In fact, our result is correct for any model of transistor behavior that combines a
current source and a resistor, and thus handles both the linear and saturated regions
Actually, Equation 3.6 requires only that delay grow linearly with load and
diminish linearly as the widths of transistors are scaled. The exponential behavior
of the output voltage in the simple model is described by a differential equation
relating the rate of change of output voltage to the value of the output voltage. As
the output voltage approaches its final value, its rate of change decreases because
of the smaller current provided by the resistors. If any of the parameters we have
assumed to be constant vary instead with output voltage, the differential equation
becomes more complex, but its solution retains the same character. For example,
if the capacitance of the transistor gates that form the driven load depends on their
voltage, as it really does, the behavior of the output voltage will be distorted from
exponential, but it will not change its general character. Similarly, if the current
through the transistors depends on their drain-to-source voltage, as it really does,
the behavior of the output voltage will be distorted from exponential, but again
will not change its general character.
Some effects that the model ignores have little effect on its application to the
method of logical effort. One of the most important is the variation in output cur-
rent because of different input gate voltages, which leads to variations in the delay
of a logic gate due to different risetimes of input signals. Long input risetimes
increase the delay of the logic gate because the pullup and pulldown networks are
not switched fully on or off while the input voltage is near the switching thresh-
old. If all risetimes are equal, our simple model again holds because all logic
gates will exhibit identical charging current waveforms and thus the same output
voltage waveforms. Because the method of logical effort leads to nearly equal
risetimes by equalizing effort borne by all logic gates, we are justified in omitting
risetime effects from Equation 3.6.
Further evidence to support the model is obtained from detailed circuit simu-
lations, described in Section 5.1. Although the delay model is very simple, it is
quite accurate when it is suitably calibrated. It is, indeed, the basis of models used
by most static timing analyzers.
[Figure: a two-stage path. Gate 1 (input capacitance C1, logical effort g1, parasitic delay p1) drives Gate 2 (input capacitance C2, logical effort g2, parasitic delay p2), which drives the load C3.]
Thus, delay is minimized when each stage bears the same effort, which is the
product of the logical effort and the electrical effort. This result is independent of
the scale of the circuits and of the parasitic delays. It does not say that the delays
in the two stages will be equal—the delays will differ if the parasitic delays differ.
This result generalizes to paths with any number of stages (Exercise 3-3) and
to paths that include branching effort. The fastest design always equalizes effort
in each stage.
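A brute-force check of this result for two stages: the sketch below scans the intermediate capacitance C2 and reports the two stage efforts at the best value. The efforts, parasitic delays, and loads are illustrative, not from the text. Setting the derivative of the delay to zero gives $C_2 = \sqrt{g_2 C_3 C_1 / g_1}$, at which both stage efforts equal $\sqrt{g_1 g_2 C_3 / C_1}$; the parasitic delays only add a constant and so do not move the optimum.

```python
def two_stage_delay(c2, c1, c3, g1, g2, p1, p2):
    """Delay of Gate 1 (input C1) driving Gate 2 (input C2), which drives C3."""
    return (g1 * c2 / c1 + p1) + (g2 * c3 / c2 + p2)


# Illustrative values (not from the text): NAND-like g1, NOR-like g2, load 64.
c1, c3 = 1.0, 64.0
g1, g2, p1, p2 = 4 / 3, 5 / 3, 2.0, 2.0

# Brute-force scan of C2 in 0.1% steps from about 1 to about 22000.
candidates = [c1 * 1.001 ** k for k in range(1, 10000)]
best_c2 = min(candidates, key=lambda c2: two_stage_delay(c2, c1, c3, g1, g2, p1, p2))

f1 = g1 * best_c2 / c1  # effort of Gate 1
f2 = g2 * c3 / best_c2  # effort of Gate 2
print(f"best C2 = {best_c2:.3f}, f1 = {f1:.3f}, f2 = {f2:.3f}")
```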
Let us now see how to compute the effort in each stage. We have for a path of
length N:
$$h_1 h_2 \cdots h_N = BH \tag{3.15}$$
where the path electrical effort H is the ratio of the load on the last stage to the input capacitance of the first stage and the branching effort B is the product of the branching efforts at each stage. Define the path logical effort to be:
$$g_1 g_2 \cdots g_N = G \tag{3.16}$$
Multiplying Equations 3.15 and 3.16 gives the path effort:
$$(g_1 h_1)(g_2 h_2) \cdots (g_N h_N) = GBH = F \tag{3.17}$$
To obtain minimum delay, the N factors on the left must be equal, so that each stage bears the same effort $\hat{f} = gh$. Thus the equation can be rewritten as:
$$\hat{f}^N = F \tag{3.18}$$
or
$$\hat{f} = F^{1/N} \tag{3.19}$$
Given G, B, H, and N for the path, we can compute F and therefore the stage effort, $\hat{f}$, that achieves least delay. (Recall that our notation places a hat over a quantity chosen to achieve least path delay.) Now we can solve for the electrical effort $h_i$ of each stage: $h_i = \hat{f}/g_i$. To calculate transistor sizes, we work backward or forward along the path, choosing transistor sizes to obtain the required electrical effort in each stage. This is the procedure outlined in Section 1.2.
The path delay obtained by this optimization procedure is
$$\hat{D} = \sum (g_i h_i + p_i) = NF^{1/N} + P \tag{3.20}$$
Although the parasitic delays do not affect the procedure for designing the path to
obtain least delay, they do affect the actual delay obtained. We will see in the next
section that parasitic delay also influences the best number of stages in a path.
Let us define the solution to this equation to be $\hat{N}$, the number of stages to use to obtain least delay. If we define $\rho = F^{1/\hat{N}}$ to be the effort borne by each stage when the number of stages is chosen to minimize delay, the solution of the equation can be expressed as:
$$p_{inv} + \rho(1 - \ln \rho) = 0 \tag{3.23}$$
In other words, the fastest design is one in which each stage along a path bears an effort equal to ρ, where ρ is a solution of Equation 3.23. Thus we call ρ the best stage effort.
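Equation 3.23 has no closed-form solution, but ρ is easy to find numerically. The sketch below uses plain bisection; the bracket [1, 100] is an assumption that holds for 0 ≤ pinv < 360, because the left-hand side is positive at ρ = 1, negative at ρ = 100, and decreases monotonically for ρ > 1. The function name is hypothetical.

```python
import math


def best_stage_effort(p_inv):
    """Solve Eq. 3.23, p_inv + rho*(1 - ln rho) = 0, for rho > 1 by bisection."""
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    lo, hi = 1.0, 100.0  # residual(lo) > 0 and residual(hi) < 0 for 0 <= p_inv < 360
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid  # residual decreases for rho > 1, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (0.0, 1.0, 2.0):
    rho = best_stage_effort(p)
    print(f"p_inv = {p}: rho = {rho:.2f}, best stage delay = {rho + p:.2f}")
```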
Figure 3.4: Best effort per stage, ρ, and corresponding best stage delay ρ + pinv, as a function of pinv. Calculated from Equation 3.23. [Plot: curves for ρ and d = ρ + pinv for pinv from 0 to 4; marked values e and 3.59.]
for values of given values of pinv . Figure 3.4 shows the solution as a function
of an inverter’s parasitic delay. Note that if we assume that the parasitic delay of
an inverter is zero, then = e = 2:718; this is the familiar result when parasitic
delay is ignored [6]. Although Equation 3.23 is nonlinear, the equation:
fits it well over the range of reasonable inverter parasitics. For most of our exam-
ples, we shall assume that pinv = 1:0 and thus that = 3:59.
The quantity ρ is sometimes called the best step-up ratio, because it is the ratio
of the sizes of successive inverters in a string of inverters designed to drive a large
capacitive load. Figure 3.4 shows the stage delay obtained when the best step-up
ratio is used. From Equation 3.6, the stage delay is the sum of the effort and the
parasitic delay.
Actual designs will require us to choose a step-up ratio that differs somewhat from ρ because the design must use an integral number of stages. Given the path effort F, we must find the number of stages $\hat{N}$ that gives the least delay; this result will have a stage effort close to ρ. Table 3.1 shows how to select $\hat{N}$, given the effort F and several values of the parasitic delay of an inverter. The values of F in the table satisfy
$$\hat{N}\left(F^{1/\hat{N}} + p_{inv}\right) = (\hat{N} + 1)\left(F^{1/(\hat{N}+1)} + p_{inv}\right)$$
These are the values of path effort for which the best $\hat{N}$-stage design exhibits just as much delay as the best $(\hat{N} + 1)$-stage design.
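The boundaries in Table 3.1 can be found numerically by solving the tie condition above for F. This sketch bisects on ln F and assumes the delay model $N(F^{1/N} + p_{inv})$ used in this section; `crossover_effort` is a hypothetical name. The difference between the two delays is negative at F = 1 and increases with F, so there is a single crossing. For pinv = 1 the first boundary is $(1 + \sqrt{2})^2 \approx 5.83$, where one and two stages tie.

```python
import math


def crossover_effort(n, p_inv):
    """Path effort F at which the best n-stage and (n+1)-stage designs have equal delay."""
    def delay(stages, F):
        return stages * (F ** (1.0 / stages) + p_inv)  # equal effort in every stage

    def gap(F):
        return delay(n, F) - delay(n + 1, F)  # negative while n stages are faster

    lo, hi = 0.0, 40.0  # bracket for ln F
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(math.exp(mid)) < 0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


for p_inv in (0.0, 1.0):
    print(p_inv, [round(crossover_effort(n, p_inv), 1) for n in range(1, 6)])
```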
Some designs will not speed up when inverters are added. For example, if the
path effort is 10 and there are three stages of logic, the logic network already has
more stages than the optimum, which is two stages. In this case, we might try to
consolidate the three stages of logic into two; this may result in a speedup.
Equations 3.18 and 3.19 allow us to derive equations that approximate the number of stages and delays when F is large. Using the fact that $F = \rho^{\hat{N}}$, we find:
$$\hat{N} \approx \frac{\ln F}{\ln \rho} = \log_\rho F \tag{3.25}$$
$$\hat{D} \approx \hat{N}\rho + \sum p_i \tag{3.26}$$
As the effort gets large, we see that the stage delay approaches ρ + p. For an inverter chain, these two equations can be combined to read:
$$\hat{D} \approx (\rho + p_{inv}) \log_\rho F \tag{3.27}$$
Table 3.1: Table of ranges of path effort, F, and the best number of stages, $\hat{N}$.
Figure 3.5: The relative delay compared to the best possible, as a function of the relative error in the number of stages used, $N/\hat{N}$. Assumes pinv = 1. [Plot: $D(N)/D(\hat{N})$ from 1 to 3 versus $N/\hat{N}$ from 0.25 to 8 on a logarithmic axis; marked values 1.26 and 1.51.]
where we assume the parasitic delay of each stage is the same. Let r be the ratio of the delay when using $s\hat{N}$ stages to the delay when using the best number of stages, $\hat{N}$:
$$r = D(s\hat{N})/D(\hat{N}) \tag{3.29}$$
Since $\hat{N}$ is best, we know that $F = \rho^{\hat{N}}$. Solving for r, we obtain:
$$r = \frac{s\left(\rho^{1/s} + p\right)}{\rho + p} \tag{3.30}$$
This is the relationship plotted in Figure 3.5 for p = 1 and thus ρ = 3.59.
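Equation 3.30 is easy to tabulate. A sketch with ρ = 3.59 and p = 1, matching Figure 3.5; the function name is hypothetical:

```python
def relative_delay(s, rho=3.59, p=1.0):
    """Eq. 3.30: delay using s times the best number of stages, relative to the best."""
    return s * (rho ** (1.0 / s) + p) / (rho + p)


for s in (0.5, 1.0, 2.0):
    print(f"s = {s}: r = {relative_delay(s):.2f}")
```

The values at s = 2 and s = 0.5 reproduce the 26% and 51% penalties quoted in the text.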
As the graph shows, doubling the number of stages from optimum increases
the delay only 26%. Using half as many stages as the optimum increases the
delay 51%. Thus one should not slavishly stick to exactly the correct number of
stages, and it is slightly better to err in the direction of using more stages than the
optimum. A stage or two more or less in a design with many stages will make
little difference, provided proper transistor sizes are used. Only when very few
stages are required does a change of one or two stages make a large difference.
A designer often faces the problem of deciding whether it would be beneficial
to change the number of stages in an existing circuit. This can easily be done by
calculating the stage effort. If the effort is between 2 and 8, the design is within
35% of best delay. If the effort is between 2.4 and 6, the design is within 15% of
best delay. Therefore, there is little benefit in modifying a circuit unless the stage
effort is grossly high or low.
Targeting a stage effort of 4 is convenient because 4 is a round number and it is
easy to compute the desired number of stages mentally. For values of p inv between
0.7 and 2.5, a stage effort of 4 also produces delays within 2% of minimum.
3.7 Summary
This chapter has presented all of the major results of the method of logical effort.
These are summarized as follows:
Figure 3.7: The relative delay compared to the best possible, as a function of s,
the size error of a stage. Assumes pinv = 1. (Plot: D(s)/D(1) versus s.)
The next chapter shows how to estimate or measure the logical effort and
parasitic delay of logic gates for a particular fabrication process, and how
to measure τ.
The least delay along a path is obtained when each logic gate bears the same
effort. This result leads to the equation for delay along a path:
$$ D = N F^{1/N} + \sum_i p_i \qquad (3.32) $$
Delay along a path is least when each stage bears effort ρ, a quantity calculated
from the parasitic delay of an inverter (Equation 3.23 and Figure 3.4).
This in turn determines the best number of stages to use for any path effort
(Table 3.1). In practice, the stage effort deviates slightly from ρ because the
number of stages, N, must be an integer.
Select a stage effort of about 4. Any value from 2 to 8 gives reasonable results and any
value from 2.4 to 6 gives nearly optimal results, so you can be sloppy and
still have a good design.
Estimate the delay of a path from the path effort as $\log_4 F$ fanout-of-4 inverter delays.
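These rules can be turned into a quick estimator. A minimal sketch, assuming the caller supplies one parasitic delay per stage (the example assumes p = 1 for every stage) and a hypothetical path effort F = 256; function names are illustrative.

```python
import math


def stages_for_effort(path_effort: float, target_stage_effort: float = 4.0) -> int:
    """Number of stages that puts the stage effort near the target (at least one)."""
    return max(1, round(math.log(path_effort, target_stage_effort)))


def path_delay(path_effort: float, parasitics: list[float]) -> float:
    """Equation 3.32, D = N F**(1/N) + sum(p_i), with N = len(parasitics)."""
    n_stages = len(parasitics)
    return n_stages * path_effort ** (1.0 / n_stages) + sum(parasitics)


def fo4_estimate(path_effort: float) -> float:
    """Rule of thumb from the summary: about log4(F) fanout-of-4 inverter delays."""
    return math.log(path_effort, 4)


if __name__ == "__main__":
    F = 256.0                      # hypothetical path effort
    n = stages_for_effort(F)
    d = path_delay(F, [1.0] * n)   # assumes p = 1 for every stage
    print(f"{n} stages, D = {d:.1f} tau, about {fo4_estimate(F):.1f} FO4")
```

With F = 256 the sketch picks four stages of effort 4, giving D = 4 × 4 + 4 = 20τ. That agrees with the estimate of log₄ 256 = 4 FO4 delays if one FO4 delay is taken as 5τ, the conversion used in Chapter 5.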
3.8 Exercises
3-1 [25] Show that modeling transistors as current sources leads to the same
basic results (Equations 3.6 to 3.10).
3-2 [30] Using process parameters from your favorite CMOS process, estimate
values for τ and pinv.
3-3 [30] Generalize the result of Section 3.3 to show that the least delay in a path
of N stages results when all stages bear the same effort.
3-4 [25] Redo the analysis in Section 3.4 to choose the best number of stages N
assuming we add 2-input NAND gates rather than inverters.
3-5 [30] One impediment to scaling each stage precisely is the resolution of
widths supported by the lithographic equipment used in fabrication. Suppose the
process could support only three distinct widths of each transistor type (n and p),
but that you could choose these widths. What would you choose? How might you
get the effect of widths greater than those chosen?
3-6 [15] If a logic string must be increased in length, the inverters can be added
either before or after the logic gates, or between them. What practical considera-
tions would cause one to choose one location over the other?
Chapter 4
Calculating the Logical Effort of Gates
The simplicity of the theory of logical effort follows from assigning to each kind of
logic gate a number—its logical effort—that describes its drive capability relative
to that of a reference inverter. The logical effort is independent of the actual size
of the logic gate, allowing one to postpone detailed calculations of transistor sizes
until after the logical effort analysis is complete.
Each logic gate is characterized by two quantities: its logical effort and its
parasitic delay. These parameters may be determined in three ways:
Using a few process parameters, one can estimate logical effort and parasitic
delay as described in this chapter. The results are sufficiently accurate for
most design work.
Using test circuit simulations, the logical effort and parasitic delay can be
simulated for various logic gates. This technique is explained in Chapter 5.
Using fabricated test structures, logical effort and parasitic delay can be
physically measured.
This alternative definition suggests an easy way to measure the logical effort of
any particular logic gate by experiments with real or simulated circuits of various
fanouts.
grouped into a bundle. Because bundles of complementary pairs of signals
occur frequently in CMOS circuits, we adopt a special notation: s stands
for a bundle containing the true signal s and the complement signal s̄. The
input group of a bundle contains all the signals in the bundle.
Total logical effort, the logical effort of all inputs taken together. The input
group contains all the input signals of the logic gate.
Terminology and context determine which kind of logical effort applies. The
adjective “total” is always used when total logical effort is meant, while the other
two cases are distinguished by the signals associated with them in context. “The
total logical effort of a 2-input NAND gate” is the logical effort of both inputs taken
together, while “the logical effort of a 2-input NAND gate” is the logical effort per
input of one of its two inputs.
The logical effort of an input group is defined analogously to the logical effort
per input, shown in the previous section. The analog of Definition 4.2 is: the
logical effort g_b of an input group b is just
$$ g_b = \frac{C_b}{C_{\mathrm{inv}}} = \frac{\sum_{i \in b} C_i}{C_{\mathrm{inv}}} \qquad (4.1) $$
where Cb is the combined input capacitance of every signal in the input group b,
and Cinv is the input capacitance of an inverter designed to have the same drive
capabilities as the logic gate whose logical effort we are calculating.
A consequence of Equation 4.1 is that the logical efforts associated with input
groups sum in a straightforward way. The total logical effort is the sum of the log-
ical effort per input of every input to the logic gate. The logical effort of a bundle
is the sum of the logical effort per input of every signal in the bundle. Thus a logic
gate can be viewed as having a certain total logical effort that can be allocated to
its inputs according to their contribution to the gate’s input capacitance.
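Because Equation 4.1 is linear in capacitance, group efforts simply add. A minimal sketch using the 2-input NAND gate worked out later in this section (4 units of input capacitance per input, C_inv = 3); capacitances are in units of a unit transistor's gate capacitance, and the function name is illustrative.

```python
from fractions import Fraction


def group_logical_effort(input_caps: dict[str, int], group: set[str], c_inv: int) -> Fraction:
    """Equation 4.1: summed input capacitance of the group divided by C_inv."""
    return Fraction(sum(input_caps[name] for name in group), c_inv)


# 2-input NAND of Figure 4.1b: each input drives a pulldown of width 2 and a
# pullup of width 2; the inverter with the same drive has C_inv = 1 + 2 = 3.
nand2_caps = {"a": 2 + 2, "b": 2 + 2}

per_input = group_logical_effort(nand2_caps, {"a"}, 3)
total = group_logical_effort(nand2_caps, {"a", "b"}, 3)
print(per_input, total)
```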
chapter, we will also find general expressions for logical effort as a function of γ.
In Chapter 9, we will consider the benefits of choosing = .
Let us now design a 2-input NAND gate so that it has the same drive char-
acteristics as an inverter with a pulldown of width 1 and a pullup of width 2.
Figure 4.1b shows such a NAND gate. Because the two pulldown transistors of
the NAND gate are in series, each must have twice the conductance of the inverter
pulldown transistor so that the series connection has a conductance equal to that of
the inverter pulldown transistor. Therefore, these transistors are twice as wide as
the inverter pulldown transistor. This reasoning assumes that transistors in series
obey Ohm’s law for resistors in series. By contrast, each of the two pullup
transistors in parallel need be only as large as the inverter pullup transistor to achieve
the same drive as the reference inverter. Here we assume that if either input to the
NAND gate is LOW, the output must be pulled HIGH, and so the output drive of the
NAND gate must match that of the inverter even if only one of the two pullups is
conducting.
Figure 4.1: Simple gates. (a) The reference inverter. (b) A two-input NAND gate.
(c) A two-input NOR gate.
We find the logical effort of the NAND gate in Figure 4.1b by extracting ca-
pacitances from the circuit schematic. The input capacitance of one input signal
is the sum of the width of the pulldown transistor and the pullup transistor, or
2 + 2 = 4. The input capacitance of the inverter with identical output drive is
Cinv = 1 + 2 = 3. According to Equation 4.1, the logical effort per input of the
2-input NAND gate is therefore g = 4/3. Observe that both inputs of the NAND
gate have identical logical efforts. Chapter 8 considers asymmetric gate designs
favoring the logical effort of one input at the expense of another.
We designed the NOR gate in Figure 4.1c in a similar way. To obtain the
same pulldown drive as the inverter, pulldown transistors one unit wide suffice.
To obtain the same pullup drive, transistors four units wide are required, since
two of them in series must be equivalent to one transistor two units wide in the
inverter. Summing the input capacitance on one input, we find that the NOR gate
has logical effort, g = 5/3. This is larger than the logical effort of the NAND
gate because pullup transistors are less effective at generating output current than
pulldown transistors. Were the two types of transistors equally effective, so that the
pullup-to-pulldown width ratio could be 1 : 1, both NAND and NOR gates would
have a logical effort of 1.5.
All of the sizing calculations in this monograph compute the input capacitance
of gates. This capacitance is distributed among the transistors in the gate in the
same proportions as are used when computing logical effort. For example, Fig-
ure 4.2 shows an inverter, NAND, and NOR gate, each with input capacitance equal
to 60 unit-sized transistors.
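The proportional scaling described above is easy to sketch. Starting from the per-input widths of the unit-drive gates in Figure 4.1 (pulldown/pullup of 1/2 for the inverter, 2/2 for the NAND gate, and 1/4 for the NOR gate), each gate is scaled so the widths on one input sum to 60. The result is 20/40, 30/30, and 12/48. The dictionary layout and function name are illustrative.

```python
from fractions import Fraction

# Per-input widths of the unit-drive gates of Figure 4.1 ("n" = pulldown, "p" = pullup).
UNIT_GATES = {
    "inverter": {"n": 1, "p": 2},
    "nand2": {"n": 2, "p": 2},
    "nor2": {"n": 1, "p": 4},
}


def scale_to_input_cap(widths: dict[str, int], c_in: int) -> dict[str, Fraction]:
    """Scale one input's transistor widths so they sum to c_in, keeping their ratio."""
    k = Fraction(c_in, sum(widths.values()))
    return {name: w * k for name, w in widths.items()}


for gate, widths in UNIT_GATES.items():
    scaled = scale_to_input_cap(widths, 60)
    print(gate, {name: str(w) for name, w in scaled.items()})
```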
When designing logic gates to produce the same output drive as the reference
inverter, we are modeling CMOS transistors as pure resistors. If the transistor is
off, the resistor has no conductance; if the transistor is on, it has a conductance
proportional to its width. To determine the conductance of a transistor network,
the conductances of the transistors are combined using the standard rules for cal-
culating the conductance of a resistor network containing series and parallel resis-
tor connections. While this model is only approximate, it characterizes logic gate
performance well enough to design fast structures. More accurate values for logi-
cal effort can be obtained by simulating or measuring test circuits, as discussed in
Chapter 5.
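The resistor model can be written out explicitly. In this sketch, conductance is taken to be proportional to width, and series and parallel transistors combine like resistors. Each gate is sized so that its worst-case pullup and pulldown networks match the reference inverter (pulldown 1, pullup γ). The logical effort per input then follows from Equation 4.1 with C_inv = 1 + γ. Exact fractions make the 4/3, 5/3, and 1.5 results of this section come out exactly. All names are hypothetical.

```python
from fractions import Fraction


def series(*widths: Fraction) -> Fraction:
    """Equivalent width (conductance) of transistors in series, like series resistors."""
    return 1 / sum(1 / w for w in widths)


def parallel(*widths: Fraction) -> Fraction:
    """Equivalent width (conductance) of transistors in parallel."""
    return sum(widths, Fraction(0))


def size_nand(n: int, gamma: Fraction) -> tuple[Fraction, Fraction]:
    """(pulldown, pullup) widths so an n-input NAND matches the reference inverter."""
    # n pulldowns in series must act like width 1; in the worst case only one
    # pullup conducts, so each pullup alone must match width gamma.
    return Fraction(n), Fraction(gamma)


def size_nor(n: int, gamma: Fraction) -> tuple[Fraction, Fraction]:
    """(pulldown, pullup) widths so an n-input NOR matches the reference inverter."""
    # One conducting pulldown must match width 1; n pullups in series must match gamma.
    return Fraction(1), n * Fraction(gamma)


def logical_effort_per_input(pulldown: Fraction, pullup: Fraction, gamma: Fraction) -> Fraction:
    """Input capacitance of one input divided by C_inv = 1 + gamma."""
    return (pulldown + pullup) / (1 + gamma)


if __name__ == "__main__":
    gamma = Fraction(2)
    wn, wp = size_nand(2, gamma)
    assert series(wn, wn) == 1 and parallel(wp) == gamma   # drives match the inverter
    wn2, wp2 = size_nor(2, gamma)
    assert parallel(wn2) == 1 and series(wp2, wp2) == gamma
    print(logical_effort_per_input(wn, wp, gamma), logical_effort_per_input(wn2, wp2, gamma))
```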
An important limitation of the model is that it does not account for velocity
saturation. The velocity of carriers, and hence the current of a transistor, normally
scales linearly with the electric field across the channel. When the field reaches
a critical value, carrier velocity begins to saturate and no longer increases with
field strength. The field across a single transistor is proportional to VDD/L. In
sub-micron processes, VDD is usually scaled with L so that an NMOS transistor
in an inverter is on the borderline of velocity saturation. PMOS transistors have
lower mobility and thus are less prone to velocity saturation. Also, series NMOS
transistors have a lower field across each transistor and therefore are less velocity
saturated.
sections in two ways. First, the expressions apply to logic gates with an arbitrary
number of inputs, n. Second, they use a parameter γ for the ratio of p-type to n-type
transistor widths, so as to permit calculation of logical effort for gates fabricated
with various CMOS processes. Whereas the reference inverter in Figure 4.1a has
a pullup-to-pulldown width ratio of 2 : 1, a ratio of γ : 1 is used throughout this
section. Each logic gate will be designed to have a pulldown drive equivalent to
an n-type transistor of width 1 and a pullup drive equivalent to a p-type transistor
of width γ.
The logical effort per input is just 1/n of this value, because the input capacitance
is equally distributed among the n inputs.
Table 4.1 includes the expressions for logical effort and calculations for several
common cases: γ = 2, n = 2, 3, 4. Note from the equation that the logical effort
changes only slightly for a wide range of γ: when γ ranges from 1 to 3, the total
logical effort for n = 2 ranges from 3 to 2.5.
4.5.2 NOR gate
The n-input NOR gate consists of a parallel connection of pulldown transistors,
each of width 1, and a series connection of pullup transistors, each of width nγ.
The total logical effort is therefore:
$$ g_{\mathrm{tot}} = \frac{n(1 + n\gamma)}{1 + \gamma} \qquad (4.3) $$
Again, the logical effort per input is just 1/n times this value. Table 4.1 includes
examples of the logical effort of a NOR gate. For CMOS processes in which γ > 1,
the logical effort of a NOR gate is greater than that of a NAND gate. If the CMOS
fabrication process were perfectly symmetric, so that we could choose γ = 1, then
the logical effort of NAND and NOR gates would be equal.
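The closed forms for NAND and NOR gates are easy to tabulate. This sketch uses the per-input NAND effort (n + γ)/(1 + γ) quoted in Section 4.7, so the NAND total is n(n + γ)/(1 + γ), together with Equation 4.3 for NOR. It prints the γ = 2, n = 2, 3, 4 cases mentioned for Table 4.1 but does not reproduce that table. Function names are illustrative.

```python
from fractions import Fraction


def nand_total_effort(n: int, gamma) -> Fraction:
    """Total logical effort of an n-input NAND: n(n + gamma)/(1 + gamma)."""
    gamma = Fraction(gamma)
    return n * (n + gamma) / (1 + gamma)


def nor_total_effort(n: int, gamma) -> Fraction:
    """Equation 4.3: total logical effort of an n-input NOR, n(1 + n*gamma)/(1 + gamma)."""
    gamma = Fraction(gamma)
    return n * (1 + n * gamma) / (1 + gamma)


if __name__ == "__main__":
    for n in (2, 3, 4):
        nand, nor = nand_total_effort(n, 2), nor_total_effort(n, 2)
        print(f"n = {n}: NAND total {nand}, per input {nand / n}; "
              f"NOR total {nor}, per input {nor / n}")
```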
Each data input is wired to a four-transistor select arm, which is in turn connected
to the output c. To select input i, only bundle s_i is driven TRUE, which enables
current to flow through the pullup or pulldown structures in the select arm
associated with d_i.
Figure 4.4: An n-way multiplexer. Each arm of the multiplexer has a data input
d_i and a select bundle s_i.
The total logical effort of a multiplexer is n(4 + 4γ)/(1 + γ) = 4n. The
logical effort per data input is just (2 + 2γ)/(1 + γ) = 2, and the logical effort per
select bundle is also 2. Note that the logical effort per input of a multiplexer does
not depend on the number of inputs. Although this property suggests that large,
fast multiplexers could be built, stray capacitance in large multiplexers limits
their growth. This problem is analyzed fully in Chapter 11. Also, increasing
the number of multiplexer inputs tends to increase the logical effort of the select
generation logic.
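The γ-independence of the multiplexer's logical effort can be checked numerically. This sketch assumes each arm is the four-transistor structure described above, with pulldowns of width 2 (driven by d_i and s_i) and pullups of width 2γ (driven by d_i and s̄_i), so each series pair matches the reference inverter. The function name is illustrative.

```python
from fractions import Fraction


def mux_efforts(n_inputs: int, gamma) -> dict[str, Fraction]:
    """Logical efforts of an n-way multiplexer built from four-transistor arms.

    Assumed arm: pulldowns of width 2 driven by d_i and s_i (two in series act
    like width 1) and pullups of width 2*gamma driven by d_i and the complement
    of s_i (two in series act like width gamma).
    """
    gamma = Fraction(gamma)
    c_inv = 1 + gamma
    per_data_input = (2 + 2 * gamma) / c_inv      # d_i drives one pulldown and one pullup
    per_select_bundle = (2 + 2 * gamma) / c_inv   # s_i drives a pulldown, its complement a pullup
    total = n_inputs * (4 + 4 * gamma) / c_inv    # every arm contributes 4 + 4*gamma
    return {"data": per_data_input, "select": per_select_bundle, "total": total}


for gamma in (1, 2, 3):
    print(gamma, {k: str(v) for k, v in mux_efforts(8, gamma).items()})
```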
A single multiplexer arm is sometimes called a tri-state inverter. When a mul-
tiplexer is distributed across a bus, the individual arms are often drawn separately
as tri-state inverters. Note that the logical efforts of the s and s̄ inputs may differ.
Figure 4.5: A two-input XOR gate, with input bundles a and b, and output c.
will have $2^{n-1}$ pulldown chains, each with n transistors in series, each of width n.
There will be $2^{n-1}$ pullup chains, each with n transistors in series, each of width
nγ. Thus the total logical effort will be $2^{n-1} n(n + n\gamma)/(1 + \gamma) = n^2 2^{n-1}$. The
logical effort per input will be 1/(2n) times this figure, or $n 2^{n-2}$, and the logical
effort per input bundle will be 1/n times the total logical effort, or $n 2^{n-1}$.
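The parity-gate formulas can be checked the same way. This sketch encodes the symmetric construction just described, with 2^(n−1) chains of n series transistors of widths n and nγ. It confirms the totals quoted elsewhere in this chapter: 8 for a two-input XOR, and 36 in total with 12 per bundle for the symmetric three-input parity gate of Figure 4.6a. The function name is illustrative.

```python
from fractions import Fraction


def parity_efforts(n: int, gamma=2) -> dict[str, Fraction]:
    """Logical efforts of the symmetric n-input parity gate described above."""
    gamma = Fraction(gamma)
    chains = 2 ** (n - 1)  # number of pulldown chains, and likewise of pullup chains
    # Each chain holds n series transistors of width n (pulldown) or n*gamma (pullup).
    total = chains * n * (n + n * gamma) / (1 + gamma)
    return {
        "total": total,                # equals n**2 * 2**(n - 1)
        "per_input": total / (2 * n),  # 2n signals: n true and n complement
        "per_bundle": total / n,       # n bundles, each a true/complement pair
    }


for n in (2, 3, 4):
    print(n, {k: str(v) for k, v in parity_efforts(n).items()})
```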
For n = 3 and above, symmetric structures such as the one shown in Figure
4.6a fail to yield least logical effort. Figure 4.6b shows a way to share some of
the transistors in separate pullup and pulldown chains to reduce the logical effort.
Repeating the calculation, we see that the total logical effort is 24, which is a
substantial reduction from 36, the total logical effort of the symmetric structure
in Figure 4.6a. In the asymmetric version, bundles a and c have a logical effort
per bundle of 6. Bundle b has a logical effort of 12, which is the same as in
the symmetric version because no transistors connected to b or b are shared in the
asymmetric gate.
The XOR and parity gates can be altered slightly to produce an inverted output:
simply interchange the a and ā connections. Note that this transformation does
not change any of the logical effort calculations.
Figure 4.6: Two designs for three-input parity gates. (a) A symmetric design. (b)
An asymmetric design with reduced logical effort.
shows an asymmetric design, which shares transistors as does the XOR design
in Figure 4.6b. The total logical effort of this design is 10, and it is unevenly
distributed among the inputs. The a input has a logical effort of 2, while the b and
c inputs have logical efforts of 4 each.
Figure 4.7: Two designs for three-input majority gates. (a) A symmetric design.
(b) An asymmetric design with reduced logical effort.
Figure 4.8: A carry-propagation gate. The carry arrives on Cin and leaves on Cout.
The g input is HIGH if a carry is generated at this stage, and the k input is LOW if
a carry is killed at this stage.
Figure 4.9: A dynamic latch with input d and output q. The clock bundle is φ.
Figure 4.10: A two-input inverting dynamic Muller C-element. The inputs are a
and b, and the output is c.
where wd is the width of transistors connected to the logic gate’s output. For this
estimate to apply, we assume that transistor layouts in the logic gates are similar
to those in the inverter. Note that this estimate ignores other stray capacitances in
a logic gate, such as contributions from wiring and from diffused regions that lie
between transistors that are connected in series.
This approximation can be applied to an n-input NAND gate, which has one
pulldown transistor of width n and n pullup transistors of width γ connected to
the output signal, so p = n pinv. An n-input NOR gate likewise has p = n pinv.
technology. For CMOS with γ = 2, the total logical effort for 2-input NAND
and NOR gates is nearly, but not quite, three. If CMOS were exactly symmetric
(γ = 1), the total logical effort for both NAND and NOR would be exactly three;
the asymmetry of practical CMOS processes favors NAND gates over NOR gates.
In contrast to the weak dependence on γ, the logical effort of a gate depends
strongly on the number of inputs. For example, the logical effort per input of an
n-input NAND gate is (n + γ)/(1 + γ), which clearly increases with n. When an
additional input is added to a NAND gate, the logical effort of each of the existing
inputs increases through no fault of its own. Thus the total logical effort of a logic
gate includes a term that increases as the square of the number of inputs; and
in the worst case, logical effort may increase exponentially with the number of
inputs. When many inputs must be combined, this non-linear behavior forces the
designer to choose carefully between single-stage logic gates with many inputs
and multiple-stage trees of logic gates with fewer inputs per gate. Surprisingly,
one logic gate escapes super-linear growth in logical effort—the multiplexer. This
property makes it attractive for high fan-in selectors, which are analyzed in greater
detail in Chapter 11.
The logical effort of gates covers a wide range. A two-input XOR gate has a
total logical effort of 8, which is very large compared to the effort of NAND and
NOR of about 3. The XOR circuit is also messy to lay out because the gates of its
transistors interconnect with a criss-cross pattern. Are the large logical effort and
the difficulty of layout related in some fundamental way? Whereas the output of
most other logic functions changes only for certain transitions of the inputs, the
XOR output changes for every input change. Is its large logical effort related in
some way to this property?
The designs for logic gates we have shown in this chapter do not exhaust the
possibilities. In Chapter 8, logic gates are designed with reduced logical effort
for certain inputs that can lower the overall delay of a particular path through a
network. In Chapter 9, we consider designs in which the rising and falling delays
of logic gates differ, which saves space in CMOS and permits analysis of ratioed
NMOS designs with the method of logical effort.
4.8 Exercises
4-1 [20] Show that Equation 4.1 corresponds to the definition of logical effort
given in Equation 3.8.
4-2 [20] Modify the latch shown in Figure 4.9 so that its output is statically
stable, even when the clock is LOW. How big should the transistors be? What is
the logical effort of the new circuit?
4-3 [20] In a fashion similar to Exercise 4-2, modify the dynamic C-element
so that its output is static. How big should the transistors be? What is the logical
effort of the new circuit?
4-4 [20] Another way to construct a static C-element is shown in Figure 4.11.
What relative transistor sizes should be used? What is the logical effort of the
gate?
4-5 [20] Figure 4.8 shows an adder element that inverts the polarity of the carry
signal. A different design will be required for stages that accept a complemented
carry input and generate a true carry output. Design such a circuit and calculate
the logical effort of each input.
4-6 [10] In many CMOS processes the ratio of pullup to pulldown conductance
is greater than 2. How high does this ratio have to be before the logical effort of NOR
is twice that of NAND?
4-7 [20] The choice of transistor sizes for the inverter of Figure 4.1 was influenced
by the ratio of pullup to pulldown conductance. Express the best pullup and pulldown
transistor sizes in an inverter as a function of this ratio to obtain minimum delay in a
two-inverter pair. Consider rising and falling delays separately.
Figure 4.13: An inverting bus driver circuit, equivalent to the function of a tri-state
inverter.
4-8 [20] Compare the logical effort of a two-stage XOR circuit such as shown in
Figure 4.12 with that of the single-stage XOR of Figure 4.5. Under what circum-
stances is each preferable?
4-9 [20] Figure 4.13 shows a design for an inverting bus driver that achieves the
same effect as a tri-state inverter. Compare the logical effort of the two circuits.
Under what circumstances is each preferable?
4-10 [25] Measure the gate and diffusion capacitances of your process. From
these values, estimate pinv .
Chapter 5
Calibrating the Model
One can calculate the logical effort and parasitic delay of a logic gate from simple
transistor models, as in the preceding chapter, or can obtain more accurate values
by measuring the behavior of suitable test circuits. This chapter shows how to de-
sign and measure such circuits to obtain the two parameter values. The reader who
wishes to skip this chapter may wish to glance at Table 5.1, which summarizes the
characterization of one set of test circuits.
Figure 5.1: Simulated delay of inverters driving various loads. Results from 0.6 µm,
3.3 V process. (Plot: delay d in ps, 0 to 500, versus electrical effort h, 0 to 10.)
Figure 5.2 shows a similar set of data for a two-input NAND gate. The straight
line in this case is fitted with the equation $d = \tau(g_{\mathrm{2nand}} h + p_{\mathrm{2nand}})$, where the
value of τ was determined from the inverter characterization. This figure presents
delay along the vertical axis in units of τ, using the value of τ computed from
Figure 5.1. As a result, the slope of the fitted line will be the logical effort of the
NAND gate and the intercept will be its parasitic delay. Similar simulations will
calibrate an entire family of logic gates; some results are shown in Table 5.1.
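Once delays are divided by τ, extracting logical effort and parasitic delay from such a plot is an ordinary least-squares line fit. The sketch below uses a hand-written fit so it needs no libraries. The value τ = 50 ps and the "true" g and p used to generate the sample points are hypothetical placeholders, not the data behind Figure 5.2 or Table 5.1.

```python
def fit_delay_line(electrical_efforts: list[float], delays_tau: list[float]) -> tuple[float, float]:
    """Least-squares fit of d = g*h + p; returns (logical effort g, parasitic delay p)."""
    n = len(electrical_efforts)
    if n < 2 or n != len(delays_tau):
        raise ValueError("need at least two (h, d) pairs of equal length")
    mean_h = sum(electrical_efforts) / n
    mean_d = sum(delays_tau) / n
    s_hh = sum((h - mean_h) ** 2 for h in electrical_efforts)
    s_hd = sum((h - mean_h) * (d - mean_d) for h, d in zip(electrical_efforts, delays_tau))
    g = s_hd / s_hh            # slope: logical effort
    p = mean_d - g * mean_h    # intercept: parasitic delay
    return g, p


if __name__ == "__main__":
    tau_ps = 50.0                      # hypothetical tau from an inverter fit
    g_true, p_true = 4 / 3, 2.0        # hypothetical values used to make synthetic points
    hs = [1.0, 2.0, 4.0, 6.0, 8.0]
    delays_ps = [tau_ps * (g_true * h + p_true) for h in hs]   # synthetic, not measured
    g, p = fit_delay_line(hs, [d / tau_ps for d in delays_ps])
    print(f"g = {g:.3f}, p = {p:.3f}")
```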
Notice that the logical effort of NOR gates agrees fairly well with our model,
but that the logical effort of NAND gates is lower than predicted. This can be
attributed to velocity saturation, as discussed in Section 4.3.
The parasitic delay depends on layout and on the order of input switching.
These effects are discussed later in this chapter.
The values in these figures and table were obtained through simulation. The
remainder of this chapter discusses methods and pitfalls of logical effort charac-
terization.
Figure 5.2: Simulated delay of two-input NAND gate driving various loads. Results
from 0.6 µm, 3.3 V process. The vertical axis is marked in units of τ. (Plot: delay d in
units of τ versus electrical effort h.)
Designing a good test circuit is more subtle than one might initially imagine. A
reasonable first attempt would be to load a gate with a capacitor, apply a step
input, and measure the delay to the output crossing 50%. Such a circuit has two
major problems. It does not account for the input slope dependence of delay, and
it neglects the nonlinearity of MOS capacitors.
A better test circuit is shown in Figure 5.3 for a 2-input NAND. The circuit is
divided into four stages. The first two stages are responsible for shaping the input
slope. The third stage contains the gate being characterized. The final stage serves
as a load on the gate. Each stage contains a primary gate (a), a load gate (b), and
a load on the load (c)!
Gate (c) is necessary because of gate-drain overlap capacitance. If gate (c)
were removed, the output of gate (b) would switch very rapidly. Because of the
Miller effect, this would increase the effective input capacitance to gate (b). Sim-
ulation shows that this leads to an 8% overestimation of the delay of a fanout-of-4
inverter.
works for NOR and NAND gates equally. Signal 0 is called the innermost signal,
while signal n − 1 is the outermost.
The parasitic delay of a gate changes greatly with choice of input, as shown in
Figure 5.4. When input 0 of a NAND gate rises, all the other NMOS transistors were
already on and had discharged diffusion capacitances C1–C3 to ground. The gate
has little parasitic delay because only the diffusion C0 on the output node must
switch. On the other hand, when the outer input (e.g., input 3 of a 4-input NAND)
rises last, all the diffusion capacitances C0–C3 are initially charged up to near
VDD. Discharging this capacitance diverts some output current and increases the
parasitic delay. Indeed, parasitic delay from the outer input scales quadratically
with the number of inputs as discussed in Section 4.6. On account of parasitic
delay, it is usually best to place the latest arriving signal on input 0.
Multiple-input gates also have somewhat lower logical effort than computed
in Chapter 4 because only one input is switching, as we have seen in Table 5.1.
The other transistors have already turned on and have a lower effective resistance
than the switching transistor. If inputs to series transistors arrive simultaneously,
the logical effort will be greater than these simulations have indicated.
Figure 5.5: Simplified transistor structure illustrating diffusion area and perimeter
for capacitance computation.
Transistor    W    AS    AD    PS    PD
N1            8    40    40    18    18
P1           16    80    80    26    26
N2            4    20    12    14     6
N3            4    20    12    14     6
P2            8    40    24    18     6
P3            8    40    24    18     6
N4           16    80    24    26     3
N5           16    24    80     3    26
P4           16    80    48    26     6
P5           16    80    48    26     6
Table 5.3: Diffusion area and perimeter capacitances of transistors in Figure 5.6
(AS, AD: source and drain areas; PS, PD: source and drain perimeters). Reported in
units of λ² and λ, respectively, where λ is half of the minimum drawn channel length.
Layout according to MOSIS submicron design rules.
Figure 5.7: τ for various processes and voltages. Simulated using MOSIS SPICE
parameters. (Plot: τ in ps on a logarithmic axis versus process, from 2.0 down to 0.35,
with curves for VDD = 5, 3.3, and 2.5.)
Logical effort can also be measured from fabricated chips by plotting the fre-
quency of ring oscillators. The oscillator should contain an odd number of in-
verting stages. The frequency of the ring oscillator is related to the delay of the
gate, as was discussed in Example 1.1. Oscillators with different fanouts provide
data for the delay vs. electrical effort curves and thus values for logical effort and
parasitic delay.
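Converting a measured ring-oscillator frequency to a stage delay can be sketched as follows. The sketch assumes the usual relation that one period of an N-stage ring spans 2N stage delays (each stage switches once rising and once falling), with equal rising and falling delays. The function name and the sample numbers are hypothetical; repeating the conversion for rings of different fanouts supplies the points for a delay versus electrical effort fit.

```python
def stage_delay_from_ring(frequency_hz: float, n_stages: int, tau_s: float) -> float:
    """Per-stage delay, in units of tau, of a ring oscillator of identical stages.

    Assumes one period spans 2 * n_stages stage delays (each stage switches once
    rising and once falling) and that rising and falling delays are equal.
    """
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    period_s = 1.0 / frequency_hz
    return period_s / (2 * n_stages * tau_s)


if __name__ == "__main__":
    # Hypothetical measurement: a 31-stage ring at 64.5 MHz in a process with tau = 50 ps.
    print(f"d = {stage_delay_from_ring(64.5e6, 31, 50e-12):.2f} tau")
```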
Care should be taken to load the load gates suitably to avoid excessive Miller
multiplication of the load capacitance. Also, fabricated chips will include wire
capacitance, which may have been neglected in simulation. Finally, the output
should be tapped off one of the load gates to avoid extra branching effort on the
ring oscillator. Unfortunately, this is not possible in rings built from fanout-of-1
gates.
cannot directly drive another, there must be an inverting static gate between dy-
namic stages. Attempting to measure delay from input crossing 50% to output
crossing 50% often leads to misleading results. If the slope of the input is slow,
dynamic gates may even have a negative delay. A better approach is to characterize
the delay of the dynamic gate and subsequent inverter as a pair. Remember
to use an electrical effort for the pair equal to the product of the electrical
efforts of each stage. Initial estimates of logical effort can be used to determine the
size of the static gate such that the stage efforts of the dynamic and static gates are
approximately equal.
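The sizing step described above follows directly from the method of logical effort: if the pair's electrical effort is H = h1·h2, equal stage efforts require f = √(g1·g2·H). This sketch takes the logical efforts as caller-supplied initial estimates (the values in the example are hypothetical) and returns the static gate's input capacitance.

```python
import math


def size_static_after_dynamic(g_dynamic: float, g_static: float,
                              c_in_dynamic: float, c_load: float) -> tuple[float, float]:
    """Return (static gate input capacitance, common stage effort f) for the pair."""
    pair_electrical_effort = c_load / c_in_dynamic   # H = h1 * h2
    f = math.sqrt(g_dynamic * g_static * pair_electrical_effort)
    h_dynamic = f / g_dynamic                        # so that g1 * h1 = f
    return h_dynamic * c_in_dynamic, f


if __name__ == "__main__":
    # Hypothetical initial estimates: g = 0.5 for the dynamic gate, 1.0 for the inverter.
    c_static, f = size_static_after_dynamic(0.5, 1.0, c_in_dynamic=10.0, c_load=400.0)
    print(f"static gate input capacitance {c_static:.1f}, stage effort {f:.2f}")
```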
Static gates are sometimes skewed to favor a particular transition, as will be
discussed in Section 9.2.1. For example, a high-skew gate with a larger PMOS
transistor may be used after a dynamic gate. Characterizing a chain of identical
skewed gates also leads to misleading results. We would like to characterize the
logical effort of the rising output of a high-skew gate because that is the delay
which would appear in a critical path. If a chain of such skewed gates is used,
the input slope will come from a falling transition and will be unreasonably slow.
This retards the rising output as well. To avoid this problem, characterize skewed
gates as part of a unit, just as recommended for dynamic gates.
5.5 Summary
This chapter has explored the accuracy of the method of logical effort through
circuit simulation. The results suggest that the calculation methods described in
Chapter 4 are good, but that somewhat greater accuracy and confidence can be
obtained from more detailed calibrations.
Since the logical effort and parasitic delay of gates change only slightly with
process, τ is a powerful way to characterize the speed of a process with a single
number. Parasitic delay varies more than logical effort, but since effort delay
usually exceeds parasitic delay, the variation is a smaller portion of the overall
delay. By expressing the delay of circuits in terms of τ, or in the more widely
recognized unit of fanout-of-4 inverter (FO4) delay (1 FO4 = 5τ), the designer
can communicate with others in a process-independent way and can easily predict
how gate performance will improve in more advanced processes.
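The conversion between τ and FO4 comes from the delay model: an inverter (g = 1) driving four copies of itself (h = 4) has delay 4 + p_inv, which is 5τ for p_inv = 1. A minimal sketch, with illustrative function names; converting to seconds needs the τ of the process at hand.

```python
def fo4_in_tau(p_inv: float = 1.0) -> float:
    """Delay of an inverter driving four copies of itself: g*h + p = 1*4 + p_inv."""
    return 4.0 + p_inv


def tau_to_fo4(delay_tau: float, p_inv: float = 1.0) -> float:
    """Express a delay given in units of tau as a number of FO4 delays."""
    return delay_tau / fo4_in_tau(p_inv)


def fo4_to_seconds(delay_fo4: float, tau_s: float, p_inv: float = 1.0) -> float:
    """Convert FO4 delays to seconds for a process with the given tau."""
    return delay_fo4 * fo4_in_tau(p_inv) * tau_s
```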
5.6 Exercises
5-1 [25] Determine the logical effort and parasitic delay of an inverter, 2-input
NAND, and 2-input NOR gate in your process. How well do the numbers agree
with the estimates from Chapter 4 and the measurements in this chapter?
5-2 [20] Make plots of delay vs. electrical effort for each of the three inputs
of a three-input NAND gate, using values from Tables 5.1 and 5.2. What general
advice can you extract from your plot?
5-3 [15] The two inputs of an ordinary 2-input NAND gate differ because one of
them connects to a transistor close to the output and the other to a transistor close
to a power rail. Show how to build a 2-input NAND with identical logical efforts
per input using two ordinary 2-input NAND gates with a shorted output.
Chapter 6
Forks of Amplifiers
The most difficult problems in applying the method of logical effort stem from
branching. When a logic signal divides within a network and flows along multi-
ple paths, we must decide how to allocate the input current. How much should
each path load the common input? In general, paths that have higher logical or
electrical effort should receive a greater share of the input signal’s drive. When
a logic signal has significant parasitic capacitance, for example when it drives a
long wire, it branches: as some of the signal is diverted to charge the parasitic
load, less drive is available to the logic path.
Optimizing networks that branch usually involves adjusting branching effort
along paths to equalize the delays in several paths through the network. Determin-
ing the branching factors adds a new element of difficulty to our design method
that can be quite tricky to handle. One of the complications is that different paths
through a network often have different numbers of stages. Branching can some-
times be straightforward, depending on the design problem. For example, branch-
ing is simple in the synchronous arbitration problem of Section 2.3.
This chapter covers a simple but common case of branching: generating the
true and complement forms of a logic signal. We call such circuits “forks,” after
the general appearance of their circuit diagrams. Forks are interesting not only for
their own utility, but also as a further exercise in applying the method of logical
effort. Many CMOS circuits require forks to produce such true and complement
signals. For example, an arm of the multiplexer circuit of Figure 4.4 is switched
on when one of its control lines, s_i, is made HIGH, and the other, s̄_i, is made LOW.
The XOR circuit shown in Figure 4.5 also requires true and complement forms of
its two input signals. We often use the notation x.H and x.L to indicate true and
complement forms of a signal x, respectively. The signal x.H is HIGH and x.L is
LOW when x is true; x.H is LOW and x.L is HIGH when x is false.
Figure 6.1: General form of an amplifier fork. One leg inverts the input signal and
one does not.
Copyright © 1998, Morgan Kaufmann Publishers, Inc. This material may not be copied or
distributed without permission of the publisher.
Figure 6.2: A 2-1 fork and a 3-2 fork, both of which produce the same logic
signals.
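[Figure 6.3: A fork with an N-stage leg and an (N − 1)-stage leg; the legs have input capacitances Cina and Cinb (Cin = Cina + Cinb) and drive loads Ca and Cb (Cout = Ca + Cb).]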
one in Figure 4.4, however, presents unequal loads, because the pullup transistor
driven by $\bar{s}$ is wider than the pulldown transistor driven by $s$. We shall defer
consideration of circuits with multiple outputs driving different load capacitances
to Chapter 7 (but see Exercise 6-1).
The design of a fork starts out with a known load on the output legs and known
total input capacitance. As shown in Figure 6.3, we shall call the two output
capacitances $C_a$ and $C_b$, and the combined total load driven $C_{out} = C_a + C_b$.
We shall call the total input capacitance for the fork $C_{in} = C_{ina} + C_{inb}$,
and can thereby describe the electrical effort for the fork as a whole as
$H = C_{out}/C_{in}$. This electrical effort of the fork may differ from the electrical
efforts of the individual legs, $C_a/C_{ina}$ and $C_b/C_{inb}$.
The input current to an optimized fork may divide unequally to drive its two
legs. Even if the load capacitances on the two legs of the fork are equal, the input
capacitances to the two legs are not in general equal. Because the legs have
different numbers of amplifiers but must operate with the same delay, their
electrical efforts may differ. The leg that can support the larger electrical effort,
usually the leg with more amplifiers, will require less input current than the other
leg, and can therefore have a smaller input capacitance. If we call the electrical
efforts of the two legs $H_a$ and $H_b$, using the notation of Figure 6.3, then
$H_a = C_a/C_{ina}$ and $H_b = C_b/C_{inb}$. Even if $C_a = C_b$, $H_a$ may not
equal $H_b$, and $C_{ina}$ and $C_{inb}$ may also differ.
The design of a fork is a balancing act. Either leg of the fork can be made
faster by reducing its electrical effort, which is done by giving it wider transistors
for its initial amplifier. Doing so, however, takes input current away from the other
leg of the fork and will inevitably make it slower. A fixed value of $C_{in}$ provides,
in effect, only a certain total width of transistor material to distribute between the
first stages of the two legs; putting wider transistors in one leg requires putting
narrower transistors in the other leg. The task of designing a minimum-delay fork
is really the task of allocating the available transistor width set by $C_{in}$ to the input
stages of the two legs.
Example 6.1 Design a 2-1 fork with input capacitance C in = 10 and total output
capacitance Cout = 200. What is the total delay of the fork?
Example 6.2 Design a 3-2 fork with the same input and output capacitances as
in the previous example.
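The two examples can be checked numerically. The sketch below is a hedged reconstruction rather than the book's worked solution: it assumes equal loads on the two legs (Ca = Cb = 100), an inverter parasitic delay p_inv = 1, and an optimally sized inverter string in each leg, and it finds the split of the input capacitance by bisection so that both legs have the same delay. The function names are illustrative.

```python
def leg_delay(stages, load, c_in, p_inv=1.0):
    """Delay of an optimally sized string of `stages` inverters driving `load` from `c_in`."""
    h = load / c_in
    return stages * h ** (1.0 / stages) + stages * p_inv


def design_fork(n_a, n_b, c_in, c_a, c_b, p_inv=1.0, iters=100):
    """Split c_in between leg a (n_a stages) and leg b (n_b stages) for equal delay.

    Returns (alpha, delay), where alpha is the fraction of c_in given to leg a.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        alpha = (lo + hi) / 2
        d_a = leg_delay(n_a, c_a, alpha * c_in, p_inv)
        d_b = leg_delay(n_b, c_b, (1 - alpha) * c_in, p_inv)
        if d_a > d_b:
            lo = alpha  # leg a is slower: give it more of the input capacitance
        else:
            hi = alpha  # leg b is slower: give leg a less
    return alpha, max(d_a, d_b)


# Example 6.1: 2-1 fork, Cin = 10, Cout = 200 split equally between the legs (assumed).
alpha21, delay21 = design_fork(2, 1, 10.0, 100.0, 100.0)
one_inverter_effort = 100.0 / ((1 - alpha21) * 10.0)
# Example 6.2: 3-2 fork with the same capacitances.
alpha32, delay32 = design_fork(3, 2, 10.0, 100.0, 100.0)
print(f"2-1 fork: delay {delay21:.2f}, one-inverter leg effort {one_inverter_effort:.1f}")
print(f"3-2 fork: delay {delay32:.2f}")
```

Under these assumptions the one-inverter leg of the 2-1 fork ends up bearing an effort of about 13.5, and the 3-2 fork is roughly 11.1 units of delay against about 14.5 for the 2-1 fork.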
These two examples show that obtaining the least delay requires choosing the
right number of stages in the fork. This result is not surprising. In fact, we should
have anticipated that the first design could be improved because the effort of the
one-inverter leg is 13.5, very far from the best stage effort, ρ ≈ 3.59. This result suggests that we
should develop a method to determine the best number of stages to use in a fork.
The next section turns to this problem.
6.2 How many stages should a fork use?
and electrical effort that was discussed in Chapter 1. Figure 6.4 shows
schematically a plot of delay versus electrical effort for amplifiers with N − 1, N, and
N + 1 stages. The thick curve represents the fastest possible amplifier for any
given electrical effort, and so no amplifier design may lie below it. For different
electrical efforts, different numbers of stages are required to obtain this optimum
design, as the figure shows.
The task of designing a fork is specified by giving an electrical effort that the
combined activities of its two legs are to support. In Figure 6.4 such an electrical
effort is represented by the vertical line. One possible design for the fork requires
each leg to support exactly this electrical effort. Since the two legs of the fork
must produce true and complement signals, their lengths must differ by an odd
number of inverters. Thus if one leg has N stages, its delay can be reduced to the
point labeled z in the figure, but the other leg must have either N − 1 or N + 1
stages, and its delay can be reduced only to the points labeled x and y, respectively.
For the particular electrical effort we have chosen, point y is faster than point x.
Thus we have a fork with two legs that support equal electrical effort but have
unequal delay.
Figure 6.4: A plot of delay vs. electrical effort for reasoning about forks. (Delay d versus electrical effort for strings of N − 1, N, and N + 1 stages, with labeled points x, y, z, and b and a vertical line at the given electrical effort.)
We can improve such a fork by raising the electrical effort of one leg and
reducing the electrical effort of the other in such a way as to continue supporting
the required total electrical effort. In effect we slide to the right from point z, thus
increasing its delay, and to the left from point y, thus decreasing its delay. We do this
by reallocating transistor width from the transistors of the first amplifier in one leg
to the transistors of the first amplifier in the other leg. Our intent, of course, is to
decrease the delay of the slower leg as much as we can, which we can continue to do
until the two legs have equal delay.
One might think that there are two possible discontinuities in the process of
reallocating the input transistor width. From point z moving to the right we may
reach point b before the equal delay condition is met, or from point y moving to
the left we may reach point w before the equal delay condition is met. Point z could not
possibly reach point b, however, because the delay at point y is already less than
that at point b. If point y reaches point w before the equal delay condition is met,
we should change that leg from N + 1 to N − 1 stages and continue along the
(N − 1)-stage curve until we reach the equal delay condition. It is not hard to see that for
any placement of the given electrical effort line, this optimization procedure will
result in a fork with legs that differ in length by a single amplifier.
One leg of a fork will always have exactly the same number of stages as would
an optimum amplifier supporting an equal electrical effort. This is easy to see from
Figure 6.4. If the given electrical effort line crosses the dark optimum
curve in the segment where N stages are best, one leg will have N stages. The
other leg will have either N + 1 or N − 1 stages. Thus one simple way to design
nearly optimal forks is to choose the number of stages for one leg from Table 1.3,
and then use one more or one fewer stage for the other leg. The electrical effort
for the fork as a whole, $H = C_{out}/C_{in}$, can be used as a guide, since the electrical
efforts of each leg will turn out to be nearly that value. Applying this technique
to Example 6.1, H = 20 would have correctly suggested a 3-2 fork as the best
design.
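The single-amplifier break points that Table 1.3 is built on can be recomputed by finding the electrical effort at which strings of N and N + 1 inverters have equal delay. This sketch assumes p_inv = 1 and ignores rounding to whole transistor sizes; `string_delay` and `break_point` are illustrative names.

```python
def string_delay(n, h, p_inv=1.0):
    """Delay of an optimally sized string of n inverters bearing electrical effort h."""
    return n * h ** (1.0 / n) + n * p_inv


def break_point(n, p_inv=1.0, iters=200):
    """Electrical effort at which n and n + 1 inverters give identical delay (bisection)."""
    lo, hi = 1.0, 1e6
    for _ in range(iters):
        mid = (lo * hi) ** 0.5  # bisect on a logarithmic scale
        if string_delay(n, mid, p_inv) < string_delay(n + 1, mid, p_inv):
            lo = mid  # n stages are still faster, so the crossover lies above mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


for n in (1, 2, 3):
    print(n, "->", n + 1, round(break_point(n), 1))
```

The crossovers come out near 5.83, 22.3, and 82.2.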
A more precise guide for choosing the number of stages in a fork appears in
Table 6.1. For any given electrical effort, the table shows what kind of fork to
use. Remember that the electrical effort of the fork is the total load of all the legs
divided by the total input capacitance. The break points in Table 6.1 lie between
the break points in Table 1.3. It is easy to see that this must be so. There are
certain electrical efforts, namely 5.83, 22.3, 82.2, and so on, for which strings of
amplifiers N and N + 1 long give identical delays. Obviously, for an electrical
effort of 22.3, for example, a 3-2 fork is best, because both strings of 3 amplifiers
and strings of 2 amplifiers have the same delay for this electrical effort. For an
Example 6.3 Design a path to drive the enables on a bank of 64 tri-state bus
drivers. The first stage of the path can present an input capacitance of twelve
unit-sized transistors, and the tri-state drivers are each 6 times unit size. A
unit-sized tri-state is shown in Figure 6.5.
The load on each of the true and complementary enable signals is
64 × 6 × 2 = 768 unit-sized transistors. Therefore the electrical effort
of the entire bundle is (768 + 768)/12 = 128. From Table 6.1, we
find that a 4-3 fork is best. Now we must divide the input capacitance
between the two legs. If the 4-inverter leg gets a fraction α of the
input capacitance, then we have:
$$4\left(\frac{768}{12\alpha}\right)^{1/4} + 4p_{inv} = 3\left(\frac{768}{12(1-\alpha)}\right)^{1/3} + 3p_{inv} \qquad (6.3)$$
[Figure 6.5: A unit-sized tri-state driver; the enable inputs s and s̄ and the data input d each have width 2.]
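Equation 6.3 has no convenient closed form, but it is easy to solve numerically. The sketch below assumes p_inv = 1 (the equation leaves it symbolic), finds α by bisection, and then sizes each leg by working backward from the 768-unit load with equal stage effort in every inverter. The function names are illustrative.

```python
def solve_alpha(n_long, n_short, load, c_in, p_inv=1.0, iters=100):
    """Fraction of c_in given to the n_long-stage leg so both legs have equal delay (Eq. 6.3)."""
    def leg_delay(n, c):
        return n * (load / c) ** (1.0 / n) + n * p_inv

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        alpha = (lo + hi) / 2
        if leg_delay(n_long, alpha * c_in) > leg_delay(n_short, (1 - alpha) * c_in):
            lo = alpha  # long leg too slow: give it more input capacitance
        else:
            hi = alpha
    return alpha, leg_delay(n_long, alpha * c_in)


def leg_sizes(n, c_first, load):
    """Input capacitance of each inverter, input first, in an n-stage leg with equal stage effort."""
    f = (load / c_first) ** (1.0 / n)
    return [c_first * f ** k for k in range(n)]


alpha, delay = solve_alpha(4, 3, 768.0, 12.0)
four_leg = leg_sizes(4, alpha * 12.0, 768.0)
three_leg = leg_sizes(3, (1 - alpha) * 12.0, 768.0)
print(f"alpha = {alpha:.3f}, delay = {delay:.2f}")
print("4-inverter leg:", [round(c, 1) for c in four_leg])
print("3-inverter leg:", [round(c, 1) for c in three_leg])
```

With these assumptions α comes out near 0.46, giving the two legs input capacitances of about 5.5 and 6.5, with later stages near 19, 65, and 224 in the four-inverter leg and near 32 and 156 in the three-inverter leg.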
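[Figure: The 4-3 fork of Example 6.3 driving the s and s̄ enables of tri-state x0 and 63 other tri-states (s, s̄, and d0 inputs of size 12 each); input capacitances 5.5, 19, 65, and 224 along the four-inverter leg and 6.5, 32, and 156 along the three-inverter leg.]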
Figure 6.7, because the delays in the two paths cannot be equalized: the delay of
the zero-stage path is guaranteed to be less than that of the one-stage path.
Exercise 6-5 examines the performance penalty of 1-0 forks. Rather than use a 1-0
fork, it is better to use a 2-1 fork and, if necessary, remove a stage somewhere else
in the network.¹
¹ Rather than remove a stage of the network, the stage immediately preceding the fork can be
duplicated in each of the fork's paths, in effect moving the branch point from after this gate to
before this gate.
6.3 Summary
The forks we have designed in this chapter are a special case of more complex
circuits with branches that operate in parallel. While the general cases are more
difficult to solve, two of the techniques shown in this chapter apply to the more
complex branching problems covered in the next chapter:
- The path effort of a network, measured as the total load capacitance divided
  by the input capacitance, can be used to estimate the correct number of
  stages to use.
- In the case of amplifier forks driving equal loads, we have shown that the
  number of stages in the two paths is nearly the same. However, as we shall see in
  more general cases that have different efforts along different paths, the lengths of
  different paths in a network may differ substantially.
6.4 Exercises
6-1 [25] A common use of forks in CMOS designs is to drive the enable signals
of multiplexers, which present different loads to the true and complement signals.
Many multiplexers may be driven by the same logic signal, resulting in a very
large load. Modify the analysis of this chapter to apply to these forks, assuming
the load on the leg with an odd number of stages is twice the load on the leg with
an even number of stages; note that this assumption is equivalent to saying that
the higher load must be driven by a signal that is the logical complement of the
input to the fork. Build a table analogous to Table 6.1 that tells a designer what
kind of fork to use.
6-2 [20] Generalize Exercise 6-1 to guide a designer when either the true or
complement form of the input signal is available.
6-3 [25] Set up equations for computing the entries in Table 6.1. Solve them and
verify that your answers match those of the table. (You’ll want to write a computer
program or spreadsheet to find the solutions.)
6-4 [30] Suppose that for H > 38.7, rather than building a pure fork, we use a
string of inverters driving a 3-2 fork. How far does this strategy depart from the
optimum fork designs?
Figure 6.8: Comparing these two designs, (a) and (b), illustrates the problems with 1-0 forks.
6-5 [20] Propose an improvement to the design of Figure 6.7 that uses a 2-1 fork.
If each of the load capacitances is 400 times as large as the input capacitance, what
are the delays of the original and improved designs?
6-6 [20] Consider the two designs in Figure 6.8. The first uses a 1-0 fork, while
the other avoids this structure. Compare the delays of the two designs over a range
of plausible electrical efforts. Is the first design ever preferred?
Chapter 7
Branches and Interconnect
Logical effort is easy to use on circuits with easily computed branching efforts.
For example, circuits with a single output or a regular structure are easy to design.
Real circuits often involve more complex branching and fixed wire loads. There
is often no closed-form expression for the best design of such circuits, but this
chapter develops approximations and iterative methods that lead to good designs
in most cases.
Designing networks that branch requires not only finding the best topology for
the network, but also deciding how to divide the drive at a branch so that delays in
all paths are equalized. The previous chapter covered the special case of “forks of
amplifiers” that generate true and complemented forms of an input signal. In this
chapter, we build on the previous results to handle more general cases. We shall
consider circuits with two or more legs, where each leg may contain a different
number of stages, each leg may perform a different logic function, and each leg
may drive a different load. As one might guess, we can combine the logical and
electrical efforts associated with each leg to obtain a composite effort on which to
base our computations.
The simplicity of the forks considered in the previous chapter makes it possible
to choose their topology from a table on the basis of the overall electrical effort
imposed on the fork. In this chapter we will consider a more complex and varied
set of circuits. We will use the theory of logical effort to write equations that relate
the size of individual logic elements to their delay. By balancing the delay in the
various legs of the circuit, we will be able to reduce the delay in the worst path.
The first section analyzes a series of examples in order to develop some in-
tuitive arguments about branching networks. These examples are generalizations
of forks of amplifiers. Next, we turn to an exclusive-or network that involves not
only branching but also recombination of signals within the network. Intercon-
nect presents new problems because the capacitance of the wire does not scale
with gate size. However, circuits with interconnect can be analyzed on a case-by-case
basis. We close the chapter with an outline of a general design procedure for
dealing with branching networks.
Although it is possible to formulate network design problems as a set of delay
equations and solve for a minimum, the method of logical effort often provides
simple insights that yield good designs without a lot of numerical work. If neces-
sary, these initial designs can be adjusted based on more detailed timing analysis.
This equation holds for paths of any length, provided that each path has the same
number of stages and the same parasitic delay. It shows that the input drive should
be allocated in proportion to the electrical effort borne by the path. Once we have
determined how to allocate the input capacitance, we can calculate transistor sizes
for each path independently.
[Figure 7.1: A unit input (C = 1) branching into two paths with input capacitances C1 and C2, driving loads C = H1 and C = H2.]
What happens if the paths include logic, rather than simply inverters? The
method of logical effort teaches us that logical effort and electrical effort are in-
terchangeable. So if path 1 had a total logical effort of G1 and path 2 a total logical
effort of $G_2$, then Equation 7.1 becomes simply
$$\frac{F_1}{C_1} = \frac{F_2}{C_2} \qquad (7.2)$$
and $F_i = G_i H_i$ for each path. In other words, the input capacitance should be
allocated in proportion to the total effort borne by each path.
Even more importantly, the entire configuration of Figure 7.1 is equivalent to a
single string of two inverters with output capacitance $F_1 + F_2$. The important point
is that the equivalent configuration has no branch; the effect of the branch has
been captured by summing the efforts of the two paths. This property allows us
to analyze branching networks by working backward from the outputs, replacing
branching paths with their single-path equivalents. The branching effort of leg 1
is:
$$B_1 = \frac{F_1 + F_2}{F_1}$$
Example 7.1 Size the circuit in Figure 7.2 for minimum delay.
Figure 7.2: A path with different logical and electrical efforts on each leg. (Total input capacitance Cin = 12; the top leg drives a load of 144 and the bottom leg a load of 192.)
The logical effort of the top leg is $G_1 = 5/3 \times 7/3 \times 1 = 3.89$ and the logical
effort of the bottom leg is $G_2 = 1 \times 1 \times 1 = 1$. Thus, the path effort
of the top leg is $F_1 = G_1 H_1 = 46.7$ and the path effort of the bottom
leg is $F_2 = G_2 H_2 = 16$. The overall path effort is $F_1 + F_2 = 62.7$.
First consider sizing the top leg. From the perspective of the top
leg, the circuit has a branching effort of $B = (F_1 + F_2)/F_1 = 1.34$.
Thus, the circuit can be designed as if it had only one leg with
$F = G_1 B H_1 = 62.7$. The stage effort is $f = 62.7^{1/3} = 3.97$. Starting at
the output and working backward, we find gate input capacitances of
36, 21, and 9 for the inverter, NOR, and NAND, respectively.
The bottom leg is now easy to size because we know the stage
effort must be the same, 3.97. Working backward, we find input
capacitances of 48, 12, and 3, respectively. The total input capacitance
of the path is 3 + 9 = 12, meeting the original specification. Also,
the input capacitance is divided between the legs in a 3:1 ratio, just as
path effort is in a 46.7 : 16 ≈ 3 : 1 ratio.
The delay of the top leg, including parasitics, is 3(3.97) + 3 + 3 +
1 = 18.91. The delay of the bottom leg is 3(3.97) + 1 + 1 + 1 =
14.91. Although we had attempted to size each leg for equal delay,
the different parasitics result in different delays. To equalize delay, a
larger portion of the input capacitance must be dedicated to the top
leg, which has more parasitic delay. Even when this is done, however, the delay
of each path is 18.28, representing only a 3% speedup.
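The arithmetic of Example 7.1 is easy to script. This sketch assumes, as the parasitic terms in the example imply, that the top leg is a 3-input NAND (g = 5/3, p = 3), a 3-input NOR (g = 7/3, p = 3), and an inverter (g = 1, p = 1), in that order from the input, and that the bottom leg is three inverters; `size_backward` is an illustrative helper name.

```python
import math


def size_backward(load, efforts, f):
    """Input capacitance of each stage, input first, for stage effort f and logical efforts `efforts`."""
    sizes = []
    c = load
    for g in reversed(efforts):
        c = g * c / f  # from f = g * C_out / C_in
        sizes.append(c)
    return sizes[::-1]


c_in = 12.0
top_g, top_p, top_load = [5 / 3, 7 / 3, 1.0], [3.0, 3.0, 1.0], 144.0  # NAND3, NOR3, inverter
bot_g, bot_p, bot_load = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 192.0  # three inverters

F1 = math.prod(top_g) * top_load / c_in  # G1 * H1, about 46.7
F2 = math.prod(bot_g) * bot_load / c_in  # G2 * H2 = 16
f = (F1 + F2) ** (1 / 3)  # common stage effort, about 3.97

top_sizes = size_backward(top_load, top_g, f)
bot_sizes = size_backward(bot_load, bot_g, f)
d_top = 3 * f + sum(top_p)  # effort delay plus parasitic delay
d_bot = 3 * f + sum(bot_p)
print([round(c, 1) for c in top_sizes], [round(c, 1) for c in bot_sizes])
print(round(d_top, 2), round(d_bot, 2))
```

The sizes come out near 9, 21, and 36 for the top leg and 3, 12, and 48 for the bottom leg, and the two first-stage inputs add back up to 12.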
example, the overall improvement from devoting more input capacitance to the
slower leg is often small.
If differences in parasitic delay are too great, we can use our analysis to find
an initial design, but to get the best design, we will need to adjust the branching
allocation. Usually, this is simply a matter of making accurate delay calculations
for each path, and modifying the branching allocation slightly. In effect, we are
finding a value of $C_1$ for which
$$D = N\left(\frac{F_1}{C_1}\right)^{1/N} + P_1 = N\left(\frac{F_2}{1 - C_1}\right)^{1/N} + P_2 \qquad (7.4)$$
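Finding C1 in Equation 7.4 is a one-dimensional root-finding problem, because the left-hand delay falls and the right-hand delay rises as C1 grows. The sketch below solves it by bisection, assuming a unit total input capacitance as in the equation; as an illustrative input it uses the normalized numbers of Example 7.1 (F1 = 46.7, F2 = 16, N = 3, P1 = 7, P2 = 3).

```python
def balance_branch(f1, f2, n, p1, p2, iters=100):
    """Find C1 (with C1 + C2 = 1) that equalizes the two sides of Eq. 7.4; return (C1, D)."""
    def delay1(c1):
        return n * (f1 / c1) ** (1.0 / n) + p1

    def delay2(c1):
        return n * (f2 / (1.0 - c1)) ** (1.0 / n) + p2

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        c1 = (lo + hi) / 2
        if delay1(c1) > delay2(c1):
            lo = c1  # path 1 is slower: give it a larger share of the input
        else:
            hi = c1
    return c1, delay1(c1)


# Example 7.1, normalized to a unit input: F1 = (35/9)(144/12), F2 = 192/12.
c1, d = balance_branch(35 / 9 * 12, 16.0, 3, 3 + 3 + 1, 1 + 1 + 1)
print(f"C1 = {c1:.3f}, D = {d:.2f}")
```

The balanced delay comes out near 18.28, and the top leg's share of the input rises from 46.7/62.7 ≈ 0.74 to about 0.88.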
Because parasitic delay adds considerable complexity to the algebra for analyzing
branching circuits, we will omit the effects of parasitic delay in the other exam-
ples in this chapter. In all cases, assuming zero parasitic delay leads to a pretty
good design that can then be refined by more accurate delay calculations and ad-
justments. A spreadsheet program is a handy tool for making such calculations.
$$\frac{F_1}{C_1} = D, \qquad \frac{F_2}{C_2} = \left(\frac{D}{2}\right)^2 \qquad (7.5)$$
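If the two branch capacitances share a unit input, C1 + C2 = 1, Equation 7.5 can be solved for D in closed form. The short derivation below is a sketch under that assumption and the zero-parasitic model used in this part of the chapter:

$$
\begin{aligned}
C_1 + C_2 = \frac{F_1}{D} + F_2\left(\frac{2}{D}\right)^2 &= 1 \\
D^2 - F_1 D - 4F_2 &= 0 \\
D &= \frac{F_1 + \sqrt{F_1^2 + 16F_2}}{2}
\end{aligned}
$$

For instance, F1 = 1 and F2 = 0.5 give D = (1 + 3)/2 = 2.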
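[Figure: A unit input (C = 1) branching into two paths with input capacitances C1 and C2 and loads C = F1 and C = F2.]
[Figure: A unit input (C = 1) branching into three paths with input capacitances C1, C2, and C3 and loads C = F1, C = F2, and C = F3.]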
$$\frac{F_i}{C_i} = \left(\frac{D}{i}\right)^i \qquad (7.8)$$
we could improve the performance of the circuit of Figure 7.4 by removing one
leg. If the delay is short, because F1 , F2 and F3 are small compared to C = 1,
then a simple 2-1 fork will have less maximum delay, and we should eliminate
two amplifiers from the three amplifier leg, in effect collapsing it into the first leg.
If the delay is long, because F1 , F2 and F3 are large compared to C = 1, then a
3-2 fork will have less maximum delay, and we should add two amplifiers to the
one amplifier leg, thus combining it with the three amplifier leg.
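The argument can be checked with the zero-parasitic model of Equation 7.8 together with the constraint that the branch input capacitances sum to the unit input, which gives the sum of F_i (n_i/D)^n_i equal to 1 for legs of n_i stages bearing efforts F_i. The sketch solves this for D by bisection and compares the three-leg circuit with the two collapsed forks; the effort values 0.5 and 100 are arbitrary illustrations, not taken from the text.

```python
def branch_delay(legs, iters=200):
    """Common delay D of legs hanging off a unit input, with zero parasitic delay.

    legs: list of (stages n, path effort F). Each leg needs C = F * (n / D) ** n (Eq. 7.8),
    and the legs' input capacitances must sum to 1.
    """
    def total_cin(d):
        return sum(f * (n / d) ** n for n, f in legs)

    lo, hi = 0.0, 1.0
    while total_cin(hi) > 1.0:  # grow the bracket until the legs fit in the unit input
        lo, hi = hi, 2 * hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_cin(mid) > 1.0:
            lo = mid  # legs need more input than is available: D must be larger
        else:
            hi = mid
    return hi


def compare(f):
    """Legs of 1, 2, and 3 stages, each with effort f, versus the two collapsed forks."""
    original = branch_delay([(1, f), (2, f), (3, f)])
    fork_21 = branch_delay([(1, 2 * f), (2, f)])  # three-stage leg folded into the one-stage leg
    fork_32 = branch_delay([(3, 2 * f), (2, f)])  # one-stage leg padded to three stages
    return original, fork_21, fork_32


for f in (0.5, 100.0):
    print(f, [round(d, 2) for d in compare(f)])
```

With small efforts the 2-1 fork has the least delay, and with large efforts the 3-2 fork does, as the argument above predicts.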
Of course, our example involves only inverters, and we want to consider cases
where each leg contains a logic function as well. When there are logic functions
involved, one might argue that the particular logic functions require the given
number of stages. That may be a valid argument for preserving the three-stage
leg, because there may be logic functions that can be done with less logical effort
in 3 stages than in a single stage. Thus if the delay is short, because $F_1$, $F_2$, and
$F_3$ are small compared to 1 and the logical efforts of the legs are also small, we
may nevertheless require the three-stage leg. On the other hand, if the delay is
long, because $F_1$, $F_2$, and $F_3$ are large compared to 1 or large logical efforts are
involved, we could improve the design by augmenting the single-stage leg with a
pair of inverters. We would therefore consider it unusual to find a least-delay circuit
whose legs differ in length by more than one stage. An important exception arises when
the problem is constrained by minimum drawn device widths.
This reasoning suggests that for purposes of branch analysis, we can always
collapse N-way branches (N > 2) into 2-way branches by combining paths. This
simplifies the equations for allocating input capacitance: we will have at most two
equations like Equation 7.8. Moreover, because logical and electrical effort are
interchangeable, all these branching problems are equivalent to designing ampli-
fier forks, covered in Chapter 6. Note, however, that when we model parasitic
delay properly, collapsing paths of equal length but different parasitic delay may
introduce errors.
In summary, circuits with a single input and multiple outputs can be analyzed
as forks, except that the effective load capacitance on each output must be
multiplied by the logical effort of the leg driving it. Because minimum-delay
circuits will generally have paths of nearly the same length, a good approximation
to their performance can be obtained by assuming that all paths are the same
length, summing the path efforts, and analyzing the entire network as a single
path. We learned in Section 3.5 that the performance of strings of amplifiers or
strings of logic is relatively insensitive to small changes in their length. A good
first approximation, therefore, lumps together all the effort of the network when
choosing a suitable path length.
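[Figure: A gate with unit input (C = 1) and stage delay dg drives a 2-1 fork; the one-inverter leg (input C1, load H1) has delay 2da, and the two-inverter leg (input C2, load H2) has two stages of delay da each.]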
Now let us choose sample values to match closely the special case of a 2-1 fork
with a stage delay of 2. The logical effort of the NAND gate is 4/3, and $H_1 = H_2 = 3$,
so the total effort is 8, considering the outputs as a bundled pair, which would
give a stage delay of 2 in a pure three-stage design. If both sides of the fork had
two stages, a per-stage delay of 2 would be best, and the overall delay would be
6. One side of the fork, however, has only one stage instead of two. One might
think that, because a single inverter with a stage delay of 4 resembles a pair of
inverters with stage delays of 2 each, substituting the single inverter for the pair
would leave the stage delay unchanged. This is simply not so, as the
numbers show. Solving Equation 7.11 for the sample values, we find $d_a = 1.796$
and $d_g = 2.35$. In other words, a slightly faster circuit can be obtained by using
more time in the logic element and less in the fork, with an overall delay of 5.94
instead of 6. This is not entirely unreasonable because the single-amplifier leg
of the fork is not as good at driving heavy loads as is the two-amplifier side.
Nevertheless, in keeping with the relative insensitivity of delay to the number of
stages, the improvement of 1% may not be worth the effort of calculating how to
get it.
Consider an example with an overall effort of 20, namely $g = 5/3$ and $H_1 =
H_2 = 6$. Here three stages are more obviously required. If both sides of the fork
had two stages, we would have a pure three-stage design with a stage delay of
2.71 and an overall delay of 8.14. Solving Equation 7.11, we find $d_a = 2.54$ and
$d_g = 3.52$; the best we can do is D = 8.60, reflecting the fact that there is
only one amplifier in one side of the fork. As we expected, $d_g = 3.52$ lies between
$d_a = 2.54$ and $2d_a = 5.08$.
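Equation 7.11 itself is not reproduced in this excerpt, so the sketch below rebuilds a model from the labels of the figure, and that model is an assumption: a gate of logical effort g with unit input capacitance drives the two leg inputs; the one-inverter leg has delay 2d_a, so C1 = H1/(2d_a); the two-inverter leg has stage delay d_a, so C2 = H2/d_a^2; hence d_g = g(C1 + C2), with parasitic delay ignored. Minimizing D = d_g + 2d_a by ternary search reproduces both sets of sample values quoted above.

```python
def fork_after_gate(g, h1, h2, iters=200):
    """Minimize D = d_g + 2*d_a for a gate (effort g, unit input) driving a 2-1 fork.

    Assumed model, zero parasitics: C1 = h1 / (2*d_a), C2 = h2 / d_a**2,
    d_g = g * (C1 + C2). Returns (d_a, d_g, D).
    """
    def gate_delay(da):
        return g * (h1 / (2 * da) + h2 / da ** 2)

    def total_delay(da):
        return gate_delay(da) + 2 * da

    lo, hi = 1e-3, 100.0
    for _ in range(iters):  # ternary search; total_delay is convex for da > 0
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if total_delay(m1) < total_delay(m2):
            hi = m2
        else:
            lo = m1
    da = (lo + hi) / 2
    return da, gate_delay(da), total_delay(da)


for g, h in ((4 / 3, 3.0), (5 / 3, 6.0)):
    da, dg, d = fork_after_gate(g, h, h)
    print(f"g = {g:.3f}, H1 = H2 = {h}: d_a = {da:.3f}, d_g = {dg:.2f}, D = {d:.2f}")
```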
In some circuits several stages of logic or amplification precede a single branch
point that leads to a fork. For such circuits, least delay will be obtained with a
stage delay in the early circuitry that lies between the stage delay of the longer leg
of the fork and the stage delay of the shorter leg of the fork. As the number of
stages grows larger, the influence of one stage more or less on the overall delay
becomes less, as we learned in Section 3.5, and thus nearly minimal delays can be
obtained by treating the outputs as a bundle. If more accurate results are desired,
it is easy to write equations similar to those in the figures for any particular case.
The solution of such equations leads to the fastest design.
[Figure: Input capacitance C = 1 and output load C = H, with gate input capacitances labeled w, x, y, and z.]
for the input capacitance of each logic gate. Where optimization is required, these
expressions can be differentiated.
An interesting example of such a circuit is the form of XOR shown in Fig-
ure 7.6. While this circuit has only one output, its early stages branch and re-
combine in a way that requires an analysis similar to others in this chapter. The
topology of this circuit involves both some paths with three stages and some paths
with two stages. The output of the first stage recombines with direct inputs at the
second stage. Our interest in this example lies in learning how to analyze it and in
understanding the resulting delays.
We will solve this example three times. First, we shall assume that all delays
are equal and obtain the circuit with least delay that meets that condition. Second,
we shall permit the delay in the first stage, c, to differ from that of the other two
stages, d, and again obtain the circuit with least delay. We are interested in whether
minimum delay in this second situation requires c to be longer or shorter than d,
and whether the circuit where c and d differ will be faster or slower than when they
are required to be the same. Third, as a thought experiment, we shall add mythical
non-inverting amplifiers to the circuit, as indicated by the dotted symbols in the
figure. Although such amplifiers are not realizable, it will be instructive to study
the changes that they would cause to circuit performance by providing three stages
in each path.
As shown in the figure, the circuit consists of four NAND gates, which we
shall treat as having a logical effort of 4/3 per input; in order not to write 4/3 over
and over, we shall use the symbol g for it. In order to have a particular electrical
effort to work with, we shall assume that the output loading, H, is the same as
the combined input capacitance, 2. We will also take advantage of symmetries in
the circuit to deal only with capacitances labeled x, y, and z. Now we can write
three delay equations, one for each stage of the circuit, and one equation constraining
the branch:
$$c = g\,\frac{2y}{x}, \qquad d = g\,\frac{z}{y}, \qquad d = g\,\frac{H}{z}, \qquad x + y = 1$$
Eliminating x, y, and z, we find an equation for c in terms of d. In the special case
where we insist c = d, we can solve to obtain d = 2.67, so D = c + 2d =
8. To obtain the lowest delay possible, we write an equation for D = c + 2d,
substitute the expression for c, take the derivative of D with respect to d,
and set it to zero. We obtain a fourth-order equation in d, which we solve to find
d = 2.98, implying c = 1.78, for an overall delay of D = 7.74. This is an
improvement over the value D = 8 obtained with c = d. Notice that the delay is
distributed unequally to the three stages; the first stage operates with less delay,
and the remaining two with more delay than when all three delays were forced to
be equal. This is because NAND x must operate faster than usual to try to match
the zero delay through the zero-stage path from the input to gates y.
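Both solutions can be reproduced numerically. Eliminating z = gH/d and y = g²H/d² from the equations above and using x + y = 1 gives c = 2g³H/(d² − g²H). The sketch below, with g = 4/3 and H = 2 as in the text, finds the equal-delay case by bisection and the minimum of D = c + 2d by ternary search instead of solving the fourth-order equation directly; the variable names are illustrative.

```python
g, H = 4 / 3, 2.0
a = g ** 2 * H  # d must exceed sqrt(a) so that y = a / d**2 stays below 1


def first_stage_delay(d):
    """c as a function of d after eliminating x, y, and z."""
    return 2 * g ** 3 * H / (d ** 2 - a)


# Case 1: insist c = d (c - d decreases as d grows above sqrt(a)).
lo, hi = a ** 0.5 + 1e-9, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    if first_stage_delay(mid) > mid:
        lo = mid
    else:
        hi = mid
d_equal = (lo + hi) / 2
total_equal = 3 * d_equal

# Case 2: minimize D = c + 2d over d (D is convex on this interval).
lo, hi = a ** 0.5 + 1e-9, 10.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if first_stage_delay(m1) + 2 * m1 < first_stage_delay(m2) + 2 * m2:
        hi = m2
    else:
        lo = m1
d_best = (lo + hi) / 2
c_best = first_stage_delay(d_best)
total_best = c_best + 2 * d_best
print(f"c = d: d = {d_equal:.2f}, D = {total_equal:.2f}")
print(f"optimized: d = {d_best:.2f}, c = {c_best:.2f}, D = {total_best:.2f}")
```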
Notice that the first stage operates in parallel with the direct connection from
the inputs to the second stage. Since the direct connection is not an amplifier, it
pays to provide more amplification in later stages by making them operate rela-
tively more slowly.
We have assumed in these calculations that the delay, d, in the second and
third stages should be equal. Why is this a reasonable assumption? Proving that
it is so may be an interesting exercise for the reader.
As a final exercise, let us put in the mythical non-inverting amplifiers in the po-
sitions shown as dotted symbols in Figure 7.6. This case is easy to solve, because
we want the delays in all three stages to be equal. We can easily write expressions
for the input capacitances of the two paths:
$$w = \frac{g^2 H}{d^3}, \qquad x = \frac{2 g^3 H}{d^3}$$
Note how the path effort, namely the product of the branching, logical, and
electrical effort, enters into these expressions. Setting w + x = 1 and solving, we
find d = 2.35 and D = 7.06, an improvement over both of the other cases we
have considered. Clearly, failure to amplify the direct input signals during the
time the first stage of logic operates costs delay. The lesson to be learned from
this example is to seize every opportunity to buffer less-critical signals because
such amplification in one path can make available more source current for other
paths and thus improve overall performance. Carried to an extreme, faster paths
are buffered until all paths complete simultaneously.
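The arithmetic for the buffered case is short enough to show in full. With g = 4/3 and H = 2 as above, the constraint w + x = 1 gives:

$$
\begin{aligned}
w + x = \frac{g^2 H}{d^3} + \frac{2 g^3 H}{d^3} &= 1 \\
d^3 &= g^2 H\,(1 + 2g) = \frac{32}{9}\cdot\frac{11}{3} = \frac{352}{27} \approx 13.04 \\
d &\approx 2.35, \qquad D = 3d \approx 7.06
\end{aligned}
$$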
7.4 Interconnect
Interconnect introduces particular problems for designing with logical effort be-
cause it has fixed capacitance. The branching effort of a gate driving a wire to
another gate load is (Cgate + Cwire )=Cgate . This branching effort changes when-
ever transistor sizes in the network are adjusted because the wiring capacitance
does not change in proportion to the transistor size changes. Therefore, our handy
rule that delay is minimized when the effort of each stage is equal breaks down;
the gate driving the wire may use higher effort, while the gate at the end of the
wire will use lower effort. This problem leads us to approximate the branching ef-
fort or to solve complex equations for the exact optimum. This section addresses
approximations for circuits with interconnect. The necessity to make such ap-
proximations is one of the most unsatisfying limitations of logical effort.
When doing designs that account for wiring capacitance, it is helpful to relate
the total capacitance of a wire to the input capacitance of logic gates. The gate
capacitance of a minimum-length transistor is approximately 2 fF/µm of gate width and has
remained such over many process generations because both length and dielectric
thickness scale by about the same amount. The wire capacitance of minimum-pitch
interconnect is approximately 0.2 fF/µm of wire length and also remains constant when
wire thickness, wire pitch, and dielectric thickness scale uniformly. Therefore, a
handy rule of thumb is that wire capacitance per unit distance is 1/10 that of gate
capacitance. Of course this does not apply for wider wires and depends somewhat
on the details of your process. It is very useful to know the ratio for your process
to one significant figure so that you can quickly convert wire capacitances into
equivalent amounts of gate width.
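The rule of thumb turns wire length into an equivalent gate width with one division. The sketch below uses the approximate figures quoted above (2 fF/µm of gate width, 0.2 fF/µm of wire length), which should be replaced by your own process's values, and then applies the branching-effort expression (Cgate + Cwire)/Cgate given at the start of this section. The function names and the 1 mm wire with a 10 µm gate are illustrative.

```python
GATE_CAP_FF_PER_UM = 2.0  # approximate gate capacitance per µm of transistor width
WIRE_CAP_FF_PER_UM = 0.2  # approximate minimum-pitch wire capacitance per µm of length


def wire_as_gate_width(wire_length_um):
    """Gate width (µm) whose input capacitance equals that of the wire."""
    return wire_length_um * WIRE_CAP_FF_PER_UM / GATE_CAP_FF_PER_UM


def wire_branching_effort(gate_width_um, wire_length_um):
    """Branching effort (C_gate + C_wire) / C_gate seen by the gate driving the wire."""
    c_gate = gate_width_um * GATE_CAP_FF_PER_UM
    c_wire = wire_length_um * WIRE_CAP_FF_PER_UM
    return (c_gate + c_wire) / c_gate


# A 1 mm wire ending at a gate with 10 µm of input transistor width.
print(wire_as_gate_width(1000.0))           # about 100 µm of equivalent gate width
print(wire_branching_effort(10.0, 1000.0))  # about 11
```

In this illustration the wire looks like ten times the gate it feeds, which is why it dominates the branching effort.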
both wire resistance and capacitance are proportional to wire length, wire delay
scales quadratically with wire length. Therefore, it is beneficial to break long
wires into sections, each driven by an inverter or buffer called a repeater. Wires
with proper repeaters have delay that is only a linear function of length [1]. When
using repeaters, the designer must plan where the repeaters will be located on the
chip floorplan.
may not be required at the same time, and so devoting a larger portion of the in-
put capacitance to critical outputs can hasten them at the expense of non-critical
outputs.
When fixed capacitances are small compared to other capacitances on the
node, ignore them. When fixed capacitances are large compared to other capac-
itances, the fixed capacitance dominates delay. If other gates loading the node
are not fast enough, increase them in size to reduce their own delay while only
slightly increasing the total capacitance on the node. The most difficult case is
when fixed capacitances are comparable to gate capacitances. In such a case, the
designer may have to iterate to achieve acceptable results.
Here we will try to summarize some of these techniques by suggesting a design
procedure.
1. Draw a network.
3. Estimate the total effort along each path, e.g., by working backwards from
the outputs, combining efforts at each branch point.
4. Verify that the number of stages for the network is appropriate for the total
effort that the network bears.
5. Assign a branching ratio to each branch; work backwards from the outputs,
considering each branch you reach. Estimate a branching ratio based on the
ratio of the effort required by each path leaving the branch. You may choose
not to optimize certain paths that bear very little effort or whose speeds are
not critical for your purpose.
6. Compute accurate delays for your design, including the effects of parasitic
delay. Adjust the branching ratios to minimize these delays. You can write
algebraic equations, but it is usually easier to use repeated evaluations of
the delay equations for the competing paths, observing the effects of small
adjustments. If a path is too slow, allocate more drive to that path by using
a greater fraction of the input capacitance.
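Step 6 can be sketched as a brute-force search. The model is our own simplification: two paths leave a branch point and share a fixed total input capacitance; each path is otherwise sized for its own best delay N·F^(1/N) + P, and we scan the split to minimize the delay of the slower path. The `Path` class, the `allocate` function, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One path leaving a branch point (hypothetical model)."""
    stages: int            # N: stages after the branch point
    logical_effort: float  # G: product of logical efforts along the path
    load: float            # capacitance the path drives
    parasitic: float       # P: total parasitic delay of the path

    def delay(self, input_cap: float) -> float:
        """Best delay N * F^(1/N) + P, given this share of input capacitance."""
        effort = self.logical_effort * self.load / input_cap
        return self.stages * effort ** (1.0 / self.stages) + self.parasitic


def allocate(a: Path, b: Path, total_cap: float, steps: int = 999):
    """Try fractions 1/(steps+1) ... steps/(steps+1) of total_cap for path a
    and return (fraction, delay of the slower path) for the best split."""
    best = None
    for i in range(1, steps + 1):
        frac = i / (steps + 1)
        worst = max(a.delay(frac * total_cap), b.delay((1.0 - frac) * total_cap))
        if best is None or worst < best[1]:
            best = (frac, worst)
    return best


if __name__ == "__main__":
    heavy = Path(stages=2, logical_effort=1.0, load=64.0, parasitic=2.0)
    light = Path(stages=2, logical_effort=1.0, load=16.0, parasitic=2.0)
    frac, worst = allocate(heavy, light, total_cap=4.0)
    print(f"give {frac:.3f} of the input capacitance to the heavy path; "
          f"slower path delay {worst:.2f}")
```

The heavier path ends up with about 80% of the input capacitance, which is the "allocate more drive to the slower path" advice in numerical form. Minimizing the slower path is only one possible criterion; a design may weight the paths differently.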
7.6 Exercises
7-1 [25] Modify Equation 7.5 to account for parasitic delay in the inverters, and
solve for D̂. If Equation 7.7 is used as an approximation to determine x, how
much does the resulting design differ from the optimum?
7-2 [20] In Section 7.3, we assume that the delays, d, of the second and third
stages are equal. Why is this a good assumption?
7-3 [50] Develop better heuristics for selecting topologies and choosing gate sizes
in the presence of capacitive interconnect.
Chapter 8
Asymmetric Logic Gates
Logic gates sometimes have different logical effort for different inputs. Such gates
are called asymmetric. For example, the and-or-invert gate from Section 4.4 is in-
herently asymmetric. The 3-input XOR and majority gates from Sections 4.5.4 and
4.5.5 can be built in either symmetric or asymmetric forms, but the asymmetric
forms have lower logical effort. Finally, normally symmetric gates such as NAND
or NOR can be made asymmetric by sizing transistors to reduce the logical effort
of one or more inputs at the expense of increasing the logical effort of the other
inputs. These asymmetric gates can be used to speed up critical paths in a network
by reducing the logical effort along the critical paths. This attractive property has
a price, however: the total logical effort of the logic gate increases. This chapter
discusses design issues arising from biasing a gate to favor particular inputs.
Figure 8.1 shows a NAND gate designed so that the widths of the two pulldown
transistors are allowed to differ: input a has width 1/(1 − s), while input b has
width 1/s. The parameter s, 0 < s < 1, called the symmetry factor, determines the
amount by which the logic gate is asymmetric. If s = 1/2, the gate is symmetric,
the pulldown transistors have equal sizes, and the logical effort is the same as we
computed in Section 4.3. Values of s between 0 and 1/2 favor the a input by
making its pulldown transistor smaller than the pulldown transistor for b.
Values of s between 1/2 and 1 favor the b input.
0
Copyright c 1998, Morgan Kaufmann Publishers, Inc. This material may not be copied or
distributed without permission of the publisher.
123
124 CHAPTER 8. ASYMMETRIC LOGIC GATES
[Figure 8.1: two-input NAND gate with pulldown widths 1/(1 − s) on input a and 1/s on input b]
Despite the flexibility to favor one of the two inputs, the gate still has the same
output drive as the reference inverter with a pulldown transistor of width 1 and a
pullup transistor of width γ. We can verify that the conductance of the pulldown
connection is 1:
$$ \frac{1}{\dfrac{1}{1/(1-s)} + \dfrac{1}{1/s}} = \frac{1}{(1-s) + s} = 1 \tag{8.1} $$
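Using Equation 4.1, we can compute g_a and g_b, the logical effort per input for
inputs a and b, and the total logical effort g_tot:
$$ g_a = \frac{1/(1-s) + \gamma}{1 + \gamma} \tag{8.2} $$
$$ g_b = \frac{1/s + \gamma}{1 + \gamma} \tag{8.3} $$
$$ g_{tot} = \frac{1/\big(s(1-s)\big) + 2\gamma}{1 + \gamma} \tag{8.4} $$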
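The short function below evaluates Equations 8.1 to 8.4 for a given symmetry factor, as a sketch for checking numbers; γ = 2 is an assumed default and the function name is our own.

```python
def asymmetric_nand2(s: float, gamma: float = 2.0):
    """Return (pulldown conductance, g_a, g_b, g_tot) for the NAND of Figure 8.1.

    Input a's pulldown has width 1/(1 - s), input b's has width 1/s, and each
    pullup has width gamma; logical effort is input capacitance divided by the
    reference inverter's input capacitance, 1 + gamma.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("symmetry factor s must lie strictly between 0 and 1")
    w_a = 1.0 / (1.0 - s)
    w_b = 1.0 / s
    conductance = 1.0 / (1.0 / w_a + 1.0 / w_b)    # series stack, Eq. 8.1
    ref = 1.0 + gamma
    g_a = (w_a + gamma) / ref                       # Eq. 8.2
    g_b = (w_b + gamma) / ref                       # Eq. 8.3
    g_tot = (w_a + w_b + 2.0 * gamma) / ref         # Eq. 8.4
    return conductance, g_a, g_b, g_tot


if __name__ == "__main__":
    for s in (0.5, 0.4, 0.25):
        cond, g_a, g_b, g_tot = asymmetric_nand2(s)
        print(f"s={s}: conductance={cond:.3f} g_a={g_a:.3f} g_b={g_b:.3f} g_tot={g_tot:.3f}")
```

At s = 1/2 this gives g_a = g_b = 4/3; as s falls, g_a drops toward 1 while g_b and g_tot grow.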
transistors are near the output node. In the design shown in Figure 8.1, this means
that we should use only values in the range s ≤ 1/2, which favor the a input.
One can also approach the design of asymmetric logic gates by specifying the
desired logical effort g_f of the favored input and deriving the necessary transistor
sizes. This approach allows us to calculate the logical effort of the unfavored
input, g_u, in terms of the logical effort of the favored input. The following equation
is derived from Equations 8.2 and 8.3 (see Exercise 8-1):
Equation 8.5 shows the symmetric relationship between the favored and unfavored
logical efforts: the logical effort of the unfavored input will increase as the logical
effort of the favored input decreases.
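Eliminating s between Equations 8.2 and 8.3 gives one way to write this relationship: from 8.2, 1/(1 − s) = g_f(1 + γ) − γ, and substituting into 8.3 yields g_u = [g_f(1 + γ)² − γ(2 + γ)] / [(1 + γ)²(g_f − 1)]. This is our own algebra and may not match the printed form of Equation 8.5. The sketch checks it against the values in Figure 8.2 and shows the symmetry: applying the map twice returns g_f.

```python
def unfavored_effort(g_f: float, gamma: float = 2.0) -> float:
    """Logical effort g_u of the unfavored NAND input given the favored g_f.

    Derived (our own algebra) by eliminating s from Eqs. 8.2 and 8.3:
        g_u = (g_f (1+gamma)^2 - gamma (2+gamma)) / ((1+gamma)^2 (g_f - 1))
    Valid for g_f > 1; g_f -> 1 corresponds to s -> 0.
    """
    if g_f <= 1.0:
        raise ValueError("g_f must exceed 1")
    k = (1.0 + gamma) ** 2
    return (g_f * k - gamma * (2.0 + gamma)) / (k * (g_f - 1.0))


if __name__ == "__main__":
    for g_f in (4 / 3, 1.25, 1.2, 1.1):
        print(f"g_f = {g_f:.3f}  ->  g_u = {unfavored_effort(g_f):.3f}")
```

With γ = 2, g_f = 1.2 gives g_u ≈ 1.56, the pair of values discussed with Figure 8.2.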
Figure 8.2 presents some results that summarize the effects of varying the
symmetry factor of a two-input NAND gate. Recall that, in a single-stage design,
an effort f will result in a delay of f delay units, plus parasitic delay. To achieve
a 0.13 unit reduction in the delay of the favored input (1.33 to 1.2 units), we incur
a 0.23 unit increase in the delay of the unfavored input (1.33 to 1.56 units).
The same design techniques we have illustrated for a two-input NAND gate
apply to other logic gates as well. Rather than catalog all these designs, we shall
[Plot: g_u and g_tot versus g_f, for g_f from 1 to 1.33; marked values 1.33 and 1.56]
Figure 8.2: Relationship between favored and unfavored logical efforts for a two-
input NAND gate with γ = 2.
develop and analyze asymmetric designs as the need arises. You might wish to
repeat the analysis shown here for a two-input NOR gate or for a three-input NAND
gate.
Figure 8.3: A buffer amplifier with reset input. When reset is LOW, the output
will always be LOW.
$$ g = \frac{7 + 12}{6 + 12} \approx 1.05 \tag{8.6} $$
This circuit takes advantage of the slow response allowed to changes on reset by
using the smallest pullup transistor possible. This choice reduces the area required
to lay out the gate and partially compensates for the large pulldown transistor. The
design violates the practice of designing gates that have the same drive character-
istics as the reference inverter, because the pullup and pulldown drives controlled
by reset differ from those of the standard inverter. In this case, the exception
seems justified because we are interested only in performance when reset is not
active, when the gate’s output drive is nearly identical to that of the reference
inverter.
Figure 8.4: One arm of a multiplexer. The data input is d_i and the select bundle is
s_i and s̄_i.
8.2.1 Multiplexers
Just as the CMOS multiplexer is unique in that its logical effort per input does
not grow as the number of inputs increases, asymmetric multiplexer designs have
some peculiar properties. An n-way multiplexer can be viewed as containing
n “arms,” each of which contains the transistors connected to one data input and
one select bundle (Figure 8.4). The unique properties of the multiplexer arise
because the individual arms do not interact, so that each arm may be designed
independently and is insensitive to the presence of other arms. 1
An arm of a multiplexer may be asymmetric so as to favor the speed of the data
or select signals. Favoring the select path may be appropriate when the control
signals arrive late, such as in a carry select adder where the result is speculatively
computed for both carry = 0 and carry = 1, then selected when the carry arrives.
Values of s < 1/2 in Figure 8.4 will produce the required asymmetry.
Favoring the data input is more problematic. Choosing 1/2 < s < 1 leads
to suitable transistor sizes, but the design shown in Figure 8.4 will not tolerate
much asymmetry before the stray capacitance of the larger transistors connected
to select inputs slows the multiplexer and defeats the effort to reduce the load on
the data input. In some cases, the data and select transistors can be interchanged
in the pullup and pulldown chains to avoid loading the small transistors with large
strays. This may lead to severe charge-sharing problems, depending on the design
1
This is a first-order model. When stray capacitance is included in delay calculations, each arm
of a multiplexer contributes a stray capacitance that indeed affects the other arms. See Chapter 11.
8.2. APPLICATIONS OF ASYMMETRIC LOGIC GATES 129
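[Figure 8.5: static latch built from a two-arm multiplexer and an inverter; labels include inputs d and e, output q, transistor sizes in terms of s and a, and capacitances C_d, C_f, C_q]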
The left arm of the multiplexer is configured to favor the data input, which
will experience a logical effort of 1/(1 − s). The transistor sizes on this arm
are marked in terms of three parameters: s, the symmetry factor of the gate; a,
an overall scale factor; and γ, the ratio of p- to n-type transistor widths. The
right arm of the multiplexer can use minimum-size transistors because it is never
charges or discharges its output load, but rather supplies a trickle of current to
counteract leakage.
Along the critical path from d to q, the logical effort of the d input to the
multiplexer is 1/(1 − s) and the logical effort of the inverter is 1, so that the
logical effort of the path is 1/(1 − s). The electrical effort is the ratio of the load
on the inverter to the capacitance of the d input. If we define C_q to be the load
capacitance, C_f to be the input capacitance of the multiplexer on the feedback
path, and C_d to be the capacitance of the d input, the electrical effort is just
H = (C_q + C_f)/C_d. Thus we have:
$$ F = \frac{1}{1-s} \cdot \frac{C_q + C_f}{C_d} = \frac{C_q}{(1-s)\, r\, C_d} \tag{8.7} $$
where r = C_q/(C_q + C_f) is the fraction of the inverter's load that is
useful output. It is clear from this equation that the effort is minimized for given
input and output loads by maximizing r and 1 − s. Not surprisingly, this means
minimizing the feedback capacitance, C_f, and biasing the multiplexer in favor of
the d input as much as practical.
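A sketch of Equation 8.7 that computes the path effort both ways and confirms they agree. The loads follow Exercise 8-3 (C_d = 9, C_q = 6C_d); the feedback capacitance C_f and the symmetry factors are hypothetical values chosen only for illustration.

```python
def latch_path_effort(s: float, c_d: float, c_q: float, c_f: float):
    """Path effort from d to q of the static latch, computed both ways in Eq. 8.7."""
    g_path = 1.0 / (1.0 - s)        # mux d input; the inverter contributes g = 1
    h_path = (c_q + c_f) / c_d      # inverter drives the load and the feedback arm
    r = c_q / (c_q + c_f)           # fraction of that load that is useful output
    return g_path * h_path, c_q / ((1.0 - s) * r * c_d)


if __name__ == "__main__":
    c_d, c_q = 9.0, 54.0            # loads from Exercise 8-3 (C_q = 6 C_d)
    for s, c_f in ((0.5, 2.0), (0.3, 2.0), (0.3, 1.0)):   # hypothetical values
        f_direct, f_via_r = latch_path_effort(s, c_d, c_q, c_f)
        print(f"s={s}, C_f={c_f}: F = {f_direct:.2f} (check {f_via_r:.2f})")
```

Lowering s (a larger 1 − s) or shrinking C_f (a larger r) both reduce F, as the text concludes.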
Modifying the multiplexer to favor the data input has the side effect of increasing
the logical effort of the select bundle, e. This need not impact speed because
favoring the data input implies that the select was non-critical. Moreover, if the
multiplexer serves as a latch, the select is a clock signal whose delay can be
absorbed into the clock distribution network. Nevertheless, biasing against the
select increases the power consumption of the select driver, and so the asymmetry
should not be unreasonably large.
8.3 Summary
The theory of logical effort shows how to design logic gates with transistor sizes
chosen so as to bias the logical effort in favor of one input at the expense of the
remaining inputs. This will have the effect of reducing the delay on the path
through the favored input, while increasing the delay on paths through the other
inputs. Although biasing a gate in this way raises the total logical effort of the
logic gate, the technique can be used to reduce the delay along critical paths.
The benefits of asymmetric designs are most evident when many asymmetric
logic gates are connected serially along a path so as to reduce the delay along the
path. Carry chains are an important application of such techniques.
8.4 Exercises
8-1 [15] Derive Equation 8.5 from Equations 8.2 and 8.3.
8-2 [25] Show how to design an asymmetric 3-input NAND gate using two
parameters 0 < s, t < 1 to specify the logical effort on two of the three inputs.
Derive an expression for the total logical effort in terms of s and t.
8-3 [20] Complete the design of the static latch shown in Figure 8.5 when C_d = 9,
C_q = 6C_d, assuming γ = 2. What is the delay from d to q when the latch is
transparent? The logical effort?
8-4 [20] Repeat the preceding exercise, but minimize the delay from d to q,
assuming the q output is not used at all.
8-5 [20] The left-most multiplexer arm in Figure 8.5 is by itself an inverting
dynamic latch. The remaining circuits are required in order to make the latch
static. What is the “cost” in terms of logical effort for making the latch static
rather than dynamic?
8-6 [20] Suppose that the static latch of Figure 8.5 must drive a very large load,
e.g., C_q = 100, while C_d = 3. How would you change the design?
8-7 [25] Figure 8.6a shows a conventional set-reset latch. (a) How should you
choose symmetry factors s_1 and s_2 to obtain short and equal set and reset
delays, i.e., equal delays from the falling of S or R to the change on Q? You may
need to make appropriate assumptions about the load on Q. (b) Devise modifi-
cations to the circuit in Figure 8.6b so that it is statically stable and so that the
impact on logical effort is small. (c) Compare the operation and performance of
the two circuits.
8-8 [25] Which of the NAND gates in Figure 8.7 might be made asymmetric
in order to yield the fastest design? Assume that both inputs should be equally
favored. Does your answer depend on the electrical effort of the “gate”?
Figure 8.6: A set-reset flip-flop in conventional form, (a), and dynamic CMOS
form, (b).
Figure 8.7: A network that computes the XOR function of two inputs.
Chapter 9
Unequal Rising and Falling Delays
All of our analysis of delays in CMOS logic gates has assumed that the delay of a
logic gate is the same for rising and falling output transitions. It is easy to relax this
condition and to consider the rising and falling delays separately. Allowing rise
and fall times to differ permits us to analyze a greater range of designs, including
pseudo-NMOS circuits, skewed static gates, CMOS domino logic, and precharged
circuits of all kinds. It also allows us to design static CMOS gates with various
choices for γ, the ratio of widths of PMOS to NMOS transistors.
The principal result is that for all but the most demanding cases, the techniques
of logical effort can be used without modification to determine the best transistor
sizes even when rising and falling delays differ. When calculating the total delay
along a path, however, τ, the delay of a reference inverter, must be replaced by the
average of the unequal rising and falling delays through the reference inverter.
Most often, the analysis need concern only the average of the rising and falling
delays, because a signal flowing through a network of gates will alternately rise
and fall as it propagates through each stage. Thus the number of rising and falling
transitions differs by at most one, and the average stage delay is usually an ad-
equate measure of the network’s performance. If the speed of propagation of a
particular transition is more important than that of other combinations, skewed
static gates can reduce the logical effort of that transition at the expense of larger
effort on other transitions.
One of the interesting applications of this analysis is to find the best value of
γ, the ratio of widths of pullup to pulldown transistors in static CMOS designs. If
γ is too small, then the rising transition will be too slow, because the conductance
134 CHAPTER 9. UNEQUAL RISING AND FALLING DELAYS
of the pullup transistor will be diminished. On the other hand, if γ is too large,
the rising transition will be suitably fast, but the transistor gate capacitance of the
pullup transistor will be so large that the circuit driving it will slow down. The
best value of γ will find a compromise between these extremes. It turns out that
the value of γ for least total delay leads to rising and falling delays that differ.
where the delays are measured in terms of τ. Notice that the logical efforts, parasitic
delays, and stage delays differ for rising transitions (u) and falling transitions
(d). The efforts and parasitic delays of each transition can be extracted from a plot
of delay versus electrical effort, as discussed in Chapter 5. The electrical effort is
independent of the transition direction.
In a path containing N logic gates, we use one of two equations for the path
delay, depending on whether the final output of the path rises or falls. In the
equations, i is the distance from the last stage, ranging from 0 for the final gate
to N − 1 for the first gate.
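$$ D_u = \sum_{i\ \mathrm{odd}} (g_{di} h_i + p_{di}) + \sum_{i\ \mathrm{even}} (g_{ui} h_i + p_{ui}) \tag{9.3} $$
$$ D_d = \sum_{i\ \mathrm{odd}} (g_{ui} h_i + p_{ui}) + \sum_{i\ \mathrm{even}} (g_{di} h_i + p_{di}) \tag{9.4} $$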
The first equation models the delay incurred when a network produces a rising
transition. In this equation, the first sum tallies the delay of falling transitions at
the output of stages whose distance from the last stage is odd, and the second the
delay of rising transitions in stages an even distance from the last stage, including
9.1. ANALYZING DELAYS 135
the last stage itself. Note that every path through a network of logic gates will
experience alternating rising and falling transitions, as this sum indicates. Equa-
tion 9.4 is similar to its companion, but models the network producing a falling
transition: the falling edges occur in even stages, and rising ones occur in odd
stages. These two equations model the two separate cases we must consider.
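A sketch of Equations 9.3 and 9.4 for a path listed from its first stage to its last, where i counts the distance from the last stage. The stage values in the usage example are hypothetical: the inverter's rising and falling efforts (6/5 and 4/5) and the NAND's falling effort (16/15) are borrowed from Example 9.2, while the NAND's rising effort and all the parasitic splits are made up. The final comparison confirms that the mean of D_u and D_d depends only on the averaged efforts and parasitics.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    g_rise: float   # logical effort when this stage's output rises (g_u)
    g_fall: float   # logical effort when this stage's output falls (g_d)
    p_rise: float   # parasitic delay, rising output (p_u)
    p_fall: float   # parasitic delay, falling output (p_d)
    h: float        # electrical effort, the same for both directions


def rise_fall_delays(stages):
    """Return (D_u, D_d) from Eqs. 9.3 and 9.4; stages are listed first to last."""
    n = len(stages)
    d_u = d_d = 0.0
    for index, st in enumerate(stages):
        i = n - 1 - index                 # distance from the last stage
        rise = st.g_rise * st.h + st.p_rise
        fall = st.g_fall * st.h + st.p_fall
        if i % 2 == 0:                    # even i: moves with the final output
            d_u += rise
            d_d += fall
        else:                             # odd i: moves opposite to it
            d_u += fall
            d_d += rise
    return d_u, d_d


if __name__ == "__main__":
    path = [
        Stage(1.6, 16 / 15, 2.2, 1.8, 3.0),   # hypothetical NAND values
        Stage(1.2, 0.8, 1.1, 0.9, 3.0),       # inverter: g_u = 6/5, g_d = 4/5
        Stage(1.2, 0.8, 1.1, 0.9, 3.0),
    ]
    d_u, d_d = rise_fall_delays(path)
    averaged = sum((s.g_rise + s.g_fall) / 2 * s.h + (s.p_rise + s.p_fall) / 2
                   for s in path)
    print(f"D_u = {d_u:.2f}, D_d = {d_d:.2f}, mean = {(d_u + d_d) / 2:.2f}, "
          f"averaged-stage sum = {averaged:.2f}")
```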
In most cases, we want the delays experienced by launching a rising or falling
transition into a network to be similar. The delays cannot, in general, be identi-
cal, because the two cases will experience different numbers of rising and falling
delays. A reasonable goal is to minimize the average delay:
$$ D = \tfrac{1}{2}(D_u + D_d) = \sum_i \left( \frac{g_{ui} + g_{di}}{2}\, h_i + \frac{p_{ui} + p_{di}}{2} \right) = \sum_i (g_i h_i + p_i) = \sum_i (f_i + p_i) \tag{9.5} $$
subject to the usual constraint on the total effort, F = ∏ f_i. Notice that the logical
effort g_i and parasitic delay p_i of a stage are the average of the rising and falling
quantities. Once again, the observation of Section 3.3 applies, and we see that
the average delay is minimized by making the total effort borne by each stage the
same, so that f_i = f = F^{1/N}. Then we have for the average delay:
$$ \hat{D} = N F^{1/N} + \sum_i p_i \tag{9.6} $$
This is the same result as we obtained for equal delays, Equation 3.20. There-
fore, we are justified in using the average value of rising and falling logical effort
and parasitic delay to minimize the average path delay, regardless of differences
between rising and falling delays. All values are normalized so that the average
logical effort of an inverter is 1.
The maximum delay through a path, however, may be different than the delay
predicted by the original theory. To find maximum delay, one must select the
worst of the delays for a rising output and for a falling output.
Example 9.1 Size the path shown in Figure 9.1 for minimum average delay, using
the logical effort and parasitic delay data from Table 9.1. What are the average
and worst-case delays?
Notice how the average logical effort and parasitic delays in the
table are the same as we are accustomed to, but that the rising values
are larger than the falling values.
[Figure 9.1: a two-input NAND gate followed by two inverters, with input capacitance C = 1 and output load C = 20]
We size the gates along the path for minimum average delay using
average effort values. We find G = 4/3 and H = 20, so F = 80/3.
Table 1.3 recommends N = 3, verifying our choice of two inverters
in Figure 9.1. The effort of each stage is thus f = (80/3)^{1/3} = 2.99.
Working from the output, we obtain the transistor sizes shown in Fig-
ure 9.2.
Now we can compute the delays from Equations 9.4 and 9.3:
The average is 12.96, which agrees with the direct calculation using
Equation 9.6.
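The sizing in Example 9.1 can be reproduced in a few lines. The sketch uses the average logical efforts (4/3 for the NAND, 1 for each inverter) and assumes parasitic delays of 2 for the NAND and 1 for each inverter (p_inv = 1); with those assumptions, N·f + P from Equation 9.6 reproduces the 12.96 quoted above. `size_path` is a hypothetical helper name.

```python
def size_path(efforts, c_out, c_in):
    """Equal-effort sizing: returns (F, f, input capacitance of each stage)."""
    G = 1.0
    for g in efforts:
        G *= g
    F = G * c_out / c_in                 # path effort (no branching here)
    f = F ** (1.0 / len(efforts))        # best stage effort F^(1/N)
    caps, c = [], c_out
    for g in reversed(efforts):          # work back from the load
        c = g * c / f                    # C_in = g * C_out / f
        caps.append(c)
    return F, f, caps[::-1]


if __name__ == "__main__":
    efforts = [4 / 3, 1.0, 1.0]          # average efforts: NAND2, inverter, inverter
    parasitics = [2.0, 1.0, 1.0]         # assumed, with p_inv = 1
    F, f, caps = size_path(efforts, c_out=20.0, c_in=1.0)
    delay = len(efforts) * f + sum(parasitics)   # Eq. 9.6
    print(f"F = {F:.2f}, f = {f:.2f}, caps = {[round(x, 2) for x in caps]}, "
          f"D = {delay:.2f}")
```

Working back from the load, the first stage comes out at exactly the specified input capacitance of 1, which is a useful check on the arithmetic.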
9.2. CASE ANALYSIS 137
Figure 9.2: The network of Figure 9.1, optimized for the average of rising and
falling delays, assuming γ = 2.
Example 9.2 Repeat Example 9.1, but with the objective of minimizing the prop-
agation time of a rising transition presented at the input.
Because the input rises, the NAND gate output will fall. The next
inverter output will rise and the final inverter output will fall.
Therefore, the logical efforts of interest are 16/15, 6/5, and 4/5,
respectively. The logical effort of the path is (16/15) × (6/5) × (4/5) = 1.02.
The electrical effort is still H = 20/1 = 20, so the path effort is 20.5.
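Repeating the sizing computation with the transition-specific efforts 16/15, 6/5, and 4/5 gives the smaller path effort of this example. The sketch only recomputes G, F, the stage effort, and each stage's input capacitance working back from the load; it does not evaluate the delay of the opposite transition.

```python
efforts = [16 / 15, 6 / 5, 4 / 5]   # NAND falls, inverter rises, inverter falls
c_in, c_out = 1.0, 20.0

G = 1.0
for g in efforts:
    G *= g
F = G * c_out / c_in
f = F ** (1.0 / len(efforts))          # equal effort per stage

caps = []
c = c_out
for g in reversed(efforts):            # C_in = g * C_out / f, from the load back
    c = g * c / f
    caps.append(c)
caps.reverse()

print(f"G = {G:.3f}, F = {F:.2f}, f = {f:.3f}")
print("stage input capacitances:", [round(x, 2) for x in caps])
```

The stage effort drops from 2.99 in Example 9.1 to about 2.74 here, which is where the faster rising-input path comes from.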
Figure 9.3: The network of Figure 9.1, optimized for the particular case of a rising
transition entering on a, assuming γ = 2.
This example shows clearly that optimizing one of the two complementary
delays yields slightly faster circuits at the expense of a large increase in the delay
of the other transition.
[Figure 9.4 panels: High-skew Gates; Low-skew Gates]
Figure 9.4: High-skewed and low-skewed inverters, NAND gates, and NOR gates,
assuming γ = 2.
9.3. OPTIMIZING CMOS P/N RATIOS 141
to make the non-critical transistors half the size they would have been in a normal
gate, as was done in Figure 9.4.
giving equal rising and falling delays. We will also see that the minimum is very
flat, so the best size is a weak function of process parameters.
Consider a gate for which the P/N ratio, i.e., the ratio of the size of PMOS
transistors to the size of NMOS transistors, is kμ for equal rising and falling delays.
Recall from Chapter 4 that for an inverter, k = 1; for a 2-input NAND gate,
k = 0.5; and for a 2-input NOR gate, k = 2. If the actual P/N ratio is r, the falling,
rising, and average gate delays are proportional to:
$$ d_d \propto (1 + r), \qquad d_u \propto \frac{k\mu}{r}(1 + r), \qquad d \propto \frac{(1 + k\mu/r)(1 + r)}{2} \tag{9.11} $$
where the first term reflects the rising and falling currents and the second term
reflects the input capacitance. Taking the partial derivative with respect to r and
setting it to 0 shows that minimum average delay is achieved for:
$$ r = \sqrt{k\mu} \tag{9.12} $$
For typical CMOS processes, μ = μ_n/μ_p is between 2 and 3, which implies that
the best P/N ratio of an inverter is between 1.4 and 1.7.
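A sketch of Equations 9.11 and 9.12 that prints the equal-delay ratio kμ next to the best average-delay ratio √(kμ) for the three gates named above. The k values come from the text; the function names are our own.

```python
import math

# k: P/N ratio for equal rise/fall delay, in units of mu (values from the text)
K_VALUES = {"inverter": 1.0, "2-input NAND": 0.5, "2-input NOR": 2.0}


def best_pn_ratio(k, mu):
    """Eq. 9.12: the P/N ratio r that minimizes average delay."""
    return math.sqrt(k * mu)


def average_delay(r, k, mu):
    """Eq. 9.11, up to a constant factor."""
    return (1.0 + k * mu / r) * (1.0 + r) / 2.0


if __name__ == "__main__":
    for mu in (2.0, 3.0):
        for gate, k in K_VALUES.items():
            print(f"mu={mu}: {gate:13s} equal-delay ratio {k * mu:.2f}, "
                  f"best average-delay ratio {best_pn_ratio(k, mu):.2f}")
```

For μ between 2 and 3 the best ratios fall near 1 to 1.2 for the NAND, 1.4 to 1.7 for the inverter, and 2 to 2.4 for the NOR, consistent with the typical values this section recommends.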
How sensitive is delay to the P/N ratio? Figure 9.5 plots delay of a fanout-of-4
inverter as a function of γ, the inverter’s P/N ratio, for three values of μ.
It assumes p_inv = 1. The vertical axis has units of τ, the delay of a fanout-of-1
inverter with = = 1.
Figure 9.5 shows that the delay curves are very flat near the best value of
γ. Indeed, values of γ from 1.4 to 1.7 give fanout-of-4 inverter delays within 1%
of minimum for any value of μ from 2 to 3. Moreover, the minimum delay at
γ = √μ is only 2-6% better than the delay at γ = μ. The minimum is so flat
that simulation-based optimization programs often do not converge to a minimum
at γ = √μ. However, the flat minimum is convenient because it means the γ
value can be selected with little regard to actual process parameters. Choosing γ = 1.5 is
convenient because it offers good performance and relatively easy layout.
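The flatness claim can be checked with a small model. The normalization here is our own assumption, not necessarily the one behind Figure 9.5: the falling effort delay of an inverter with P/N ratio γ driving h copies of itself is h(1 + γ)/3, the rising one is μ/γ times that (so a γ = μ = 2 inverter has effort delay h), and p_inv = 1 is added as a constant. Under this normalization the sketch lands within the 1% and 2-6% figures quoted above; a different treatment of the parasitic term would shift them somewhat.

```python
import math


def fo4_delay(gamma: float, mu: float, h: float = 4.0, p_inv: float = 1.0) -> float:
    """Average delay of an inverter with P/N ratio gamma driving h copies of itself.

    Assumed normalization (ours): falling effort delay h(1 + gamma)/3, rising
    effort delay mu/gamma times that, plus a constant parasitic delay p_inv.
    """
    fall = h * (1.0 + gamma) / 3.0
    rise = (mu / gamma) * fall
    return (fall + rise) / 2.0 + p_inv


def worst_penalty(gammas, mus):
    """Largest fractional delay increase over the optimum gamma = sqrt(mu)."""
    worst = 0.0
    for mu in mus:
        best = fo4_delay(math.sqrt(mu), mu)
        for gamma in gammas:
            worst = max(worst, fo4_delay(gamma, mu) / best - 1.0)
    return worst


if __name__ == "__main__":
    print(f"gamma in [1.4, 1.7], mu in [2, 3]: worst penalty "
          f"{100 * worst_penalty([1.4, 1.5, 1.6, 1.7], [2.0, 2.5, 3.0]):.2f}%")
    for mu in (2.0, 3.0):
        gain = fo4_delay(mu, mu) / fo4_delay(math.sqrt(mu), mu) - 1.0
        print(f"mu = {mu}: gamma = sqrt(mu) beats gamma = mu by {100 * gain:.1f}%")
```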
The most important benefit of optimizing the P/N ratio is not average speed,
but rather reduction in area and power consumption. Remember, however, that
rising and falling delays may differ substantially. For short paths, this may cause
the worst-case delay to be significantly longer than the average delay. Also, certain
[Figure 9.5: fanout-of-4 inverter delay (τ) versus γ, from 1 to 3, for μ = 2, 2.5, and 3]
Figure 9.6: Inverter, NAND and NOR gates sized for improved area and average
delay.
circuits such as clock drivers need equal rise and fall times and should be designed
with γ = μ. Other gates should use a P/N ratio of about √(kμ). Typical values are 1 for
a 2-input NAND, 1.5 for an inverter or multiplexer, and 2 for a 2-input NOR, as
shown in Figure 9.6. The area and power savings are especially large for NOR
gates.
9.4 Summary
The analysis presented in this chapter shows how to apply the theory of logical
effort to designs using logic gates that introduce different delays for rising and
falling outputs. We assign different rising and falling logical efforts, normalized
such that the average logical effort of an inverter is 1. There are two ways to
design paths with unequal rise/fall delays:
• Assume that each logic stage has the average of the rising and falling delays
(Section 9.1). This method applies the techniques of logical effort without
alteration. The maximum delay through a network may be slightly greater
than the average delay.
• Use case analysis to minimize the delay of the particular transition whose
propagation through the network must be fast (Section 9.2). The propagation
delay of this transition can be reduced only at the expense of lengthening the
delay of the complementary transition. Skewed logic gates can be
used to favor a critical transition even more.
The analysis of delays also leads to a calculation for the best value of γ, the
ratio of pullup transistor width to pulldown transistor width. While γ = μ yields
equal rising and falling delays for an inverter, γ = √μ yields designs whose
average delay is slightly less. Using γ = 1.5 yields designs within 1% of least
delay over a wide range of processes and saves area and power relative to a circuit
with equal rising and falling delays. The different rising and falling delays lead
to slightly different logical efforts for gates. For simplicity, it is good enough to
use the values of logical effort calculated in Chapter 4. However, if more accurate
effort values are found from simulation or direct measurement, they may be used
instead.
The analysis presented in this chapter must be used cautiously. The accuracy
of our simple delay model of MOS logic gates is poor for modeling delay when
arriving input signals have different rising and falling transitions. The accuracy
would be improved by using a more accurate delay model, such as the one pro-
posed by Horowitz [3], which considers the risetime of input transitions explicitly
to predict the delay of a logic gate.
9.5 Exercises
9-1 [15] Sketch high-skew and low-skew 3-input NAND and NOR gates. What
are the logical efforts of each gate on its critical transition?
9-2 [20] Derive the rising, falling, and average logical efforts of the gates with
unequal and in Table 9.1.
9-3 [20] Derive the rising, falling, and average logical efforts of skewed gates
presented in Table 9.2.
9-4 [20] Derive the delay vs. γ information shown in Figure 9.5.
Chapter 10
Circuit Families
So far, we have applied logical effort primarily to analyze static CMOS circuits.
High-performance integrated circuits often use other circuit families to achieve
better speed at the expense of power consumption, noise margins, or design effort.
This chapter computes the logical effort of gates in different circuit families and
shows how to optimize such circuits. We begin by examining pseudo-NMOS logic
and the closely related symmetric NOR gates. Then we delve into the design of
domino circuits. Finally, we analyze transmission gate circuits by combining the
transmission gates and driver into a single complex gate.
The method of logical effort does not apply to arbitrary transistor networks,
but only to logic gates. A logic gate has one or more inputs and one output, subject
to the following restrictions:
148 CHAPTER 10. CIRCUIT FAMILIES
sitions, the output current is the pulldown current minus the pullup current which
is fighting the pulldown, 4/3 − 1/3 = 1. For rising transitions, the output current
10.1. PSEUDO-NMOS CIRCUITS 149
The HIGH input to the first gate causes its output to fall, but the second gate’s out-
put also falls in response to its initial HIGH input. The circuit therefore produces
an incorrect result because the second output will never rise during evaluation, as
shown in Figure 10.3. Domino circuits solve this problem by using inverting static
gates between dynamic gates so that the input to each dynamic gate is initially
LOW. The falling dynamic output and rising static output ripple through a chain
of gates like a chain of toppling dominos. In summary, domino logic runs 1.5 to
2 times faster than static CMOS logic [2] because dynamic gates present a much
lower input capacitance for the same output current and have a lower switching
threshold, and because the inverting static gate can be skewed to favor the critical
monotonically rising evaluation edges.
Figure 10.4 shows some domino gates. Each domino gate consists of a dy-
namic gate followed by an inverting static gate 1 . The static gate is often but not
always an inverter. Since the dynamic gate’s output falls monotonically during
evaluation, the static gate should be skewed high to favor its monotonically ris-
ing output, as discussed in Section 9.2.1. We have calculated the logical effort of
high-skew gates in Table 9.2 and will compute the logical effort of dynamic gates
in the next section. The logical effort of a domino gate is then the product of the
logical effort of the dynamic gate and of the high-skew gate. Remember that a
domino gate counts as two stages when choosing the best number of stages.
A dynamic gate may be designed with or without a clocked evaluation tran-
sistor; the extra transistor slows the gate but eliminates any path between power
1
Note that a domino “gate” actually refers to two stages, rather than a single gate. This is
unfortunate, but accepted in the literature.
[Figure 10.4: domino gates; labels include Dynamic gate, H, Precharge transistor, keeper, and No clocked evaluation transistor]
and ground during precharge when the inputs are still high. Some dynamic gates
include weak PMOS transistors called keepers so that the dynamic output will re-
main driven if the clock stops high.
Domino designers face a number of questions when selecting a circuit topol-
ogy. How many stages should be used? Should the static gates be inverters,
or should they perform logic? How should precharge transistors and keepers be
sized? What is the benefit of removing the clocked evaluation transistors? We will
show that domino logic should be designed with a stage effort of 2 to 2.75, rather
than the 4 that we found for static logic. Therefore, paths tend to use more stages and
it is rarely beneficial to perform logic with the inverting static gates.
Logical effort partially explains why dynamic gates are faster than static gates.
In static gates, much of the input capacitance is wasted on slow PMOS transistors
that are not even used during a falling transition. Therefore, a dynamic inverter
enjoys a logical effort only 1/3 that of a static inverter because all of the input
capacitance is dedicated to the critical falling transition.
Our simple model for estimating logical effort fails to capture two other rea-
sons that dynamic gates are fast. One is the lower switching threshold of the gate:
the dynamic gate output will begin switching as soon as inputs rise to V_t, rather
than all the way to V_DD/2. Another is the fact that velocity saturation makes the
resistance of long NMOS stacks lower than our resistive model predicts. There-
fore, simulations show that dynamic gates have even lower logical effort than
Table 10.2 predicts.
Notice that dynamic NOR gates have less logical effort than NANDs and indeed
have effort independent of the number of inputs. This is reversed from static CMOS
gates and motivates designers to use wide NOR gates where possible.
Figure 10.5: Dynamic gates (a) with and (b) without clocked evaluation transis-
tors.
10.2. DOMINO CIRCUITS 155
by a string of inverters with logical effort 1. Domino paths are slightly different
because extra amplification can be provided by domino buffers with logical effort
less than 1. Adding more buffers actually reduces F, the path effort! Therefore,
we would expect that domino paths would benefit from using more stages, or
equivalently, that the best stage effort is lower for domino paths. In this section,
we will compute this best stage effort.
Our arguments parallel those in Section 3.4. We begin with a path that has n_1 stages and path effort F. We contemplate adding n_2 additional stages to obtain a path with a total of N = n_1 + n_2 stages. This time, however, the gates we add are not static inverters. Instead, they may be dynamic buffers. In general, the additional stages have logical effort g and parasitic delay p. The extra gates therefore change the path effort. The minimum delay of the path is:
$$\hat{D} = N\left(F\,g^{\,N-n_1}\right)^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1)\,p \qquad (10.1)$$
We can differentiate and solve for the number of stages N̂ that gives minimum delay. When parasitics are nonzero, it proves more convenient to compute the best stage effort ρ(g, p), which depends on the logical effort and parasitic delay of the extra stages. The mathematics is hideous, but the conclusion is remarkably elegant:
$$\rho(g, p) = g\,\rho\!\left(1, \frac{p}{g}\right) \qquad (10.2)$$
where ρ(1, p_inv) is the best stage effort plotted in Figure 3.4 and fit by Equation 3.24. This result depends only on the characteristics of the stages being added, not on any properties of the original path.
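Equation 10.2 can be checked numerically. The Python sketch below solves the stationarity condition of Equation 10.1 directly for the stage effort f, which is f(1 − ln(f/g)) + p = 0. It then compares the result with g·ρ(1, p/g). It assumes that ρ(1, p) is the root of ρ(1 − ln ρ) + p = 0, which we take to be the condition behind Figure 3.4. The function names are illustrative.

For the domino buffer stages discussed next, the exact roots come out near 2.8 and 2.0. The 2.76 quoted in the text appears to use the linear fit of Equation 3.24, so a small difference is expected.

```python
import math


def solve_increasing(func, lo, hi, tol=1e-12):
    """Bisection for the root of a function that increases on [lo, inf),
    given func(lo) <= 0. The upper end is doubled until it brackets the root."""
    while func(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def best_stage_effort_direct(g, p):
    """Stage effort f that makes dD/dN = 0 in Equation 10.1:
    f * (1 - ln(f / g)) + p = 0, solved without using Equation 10.2."""
    h = lambda f: f * (math.log(f / g) - 1.0) - p  # the negated condition
    lo = g * math.e                                 # h(lo) = -p <= 0; h increases for f > g
    return solve_increasing(h, lo, 2.0 * lo)


def rho1(p):
    """rho(1, p): best stage effort when the added stages are inverters (g = 1)."""
    return best_stage_effort_direct(1.0, p)


def rho(g, p):
    """Equation 10.2: rho(g, p) = g * rho(1, p / g)."""
    return g * rho1(p / g)


# Domino buffer stages from the text, assuming p_inv = 1.
clocked = rho(math.sqrt(10 / 18), 11 / 12)
unclocked = rho(math.sqrt(5 / 18), 2 / 3)
print(f"clocked: {clocked:.2f}, unclocked: {unclocked:.2f}")
```

The direct root and the scaled form agree for any g and p. This is the content of Equation 10.2: substituting f = g·x turns the condition into the inverter case with parasitic delay p/g.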
Let us apply this result to domino circuits, where the extra stages are domino buffers. The logical effort of a domino buffer, like the one in Figure 10.4, is 10/18 with series evaluation transistors and 5/18 without. Estimate parasitic delays to be (5/6)p_inv for a high-skew static inverter and p_inv for a dynamic inverter with a series evaluation transistor. Since the buffer with a series evaluation transistor consists of two stages, each stage has g = √(10/18) ≈ 0.75 and p = (11/6)p_inv/2 = (11/12)p_inv. If p_inv = 1, the best stage effort is ρ(0.75, 0.92) = 0.75 ρ(1, 0.92/0.75) ≈ 2.76. The same reasoning applies to dynamic inverters with no clocked evaluation transistors, which have g = √(5/18) ≈ 0.53 and p = 2/3, yielding a best stage effort of 2.0.
In summary, domino paths with clocked evaluation transistors should target a stage effort around 2.75, rather than the 4 used for static paths. If the stage effort is higher, the path may be improved by using more stages. If the stage effort is lower, the path may be improved by combining logic into more complex gates. Similarly, domino paths with no clocked evaluation transistors should target a stage effort of 2.0. Since it is impractical to leave out all of the clocked evaluation transistors, many domino paths mix clocked and unclocked dynamic gates and should target an effort between 2 and 2.75. As with static logic, the delay is a weak function of path effort around the best effort, so the designer has freedom to stray from the best effort without severe performance penalty.
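[Figure 10.6: designs (a) and (b) for Example 10.1, each driving electrical effort H. The gates in (a) have logical efforts 5/3, 5/6, 1 and 5/6; those in (b) have 5/3 and 3/2.]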
advantageous to use logic in the static stage. For stage efforts above ρ, the longer path is better. An example will clarify this calculation:
Example 10.1 Which of the designs in Figure 10.6 is best if the path has electri-
cal effort of 1? If the path has electrical effort of 5?
could work out the parasitic delay exactly, but we recall that over a range of parasitics, ρ(1, p) is about 4. Therefore, ρ(0.68, p) = 0.68 ρ(1, p/0.68) ≈ 0.68 × 4 = 2.72. If the stage effort is below 2.7, design (b) is best. If the stage effort is above 2.7, design (a) will be better.
Design (b) has a logical effort of (5/3)(3/2) = 2.5 and thus a path effort of 2.5H and a stage effort of √(2.5H). If H = 1, the stage effort is 1.6 and design (b) is best. If H = 5, the stage effort is 3.5 and design (a) would be better.
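The decision in Example 10.1 reduces to a few lines of arithmetic. This sketch hard-codes the example's numbers: design (b) has two stages with G = (5/3)(3/2) = 2.5, and the crossover stage effort is about 0.68 × 4 = 2.72. It also assumes no branching along the path. The function names are illustrative.

```python
# Crossover stage effort from Example 10.1: rho(0.68, p) ~= 0.68 * rho(1, p/0.68) ~= 0.68 * 4.
CROSSOVER = 0.68 * 4.0


def stage_effort(G, H, N):
    """Stage effort (G * H)^(1/N) for an N-stage path with no branching."""
    return (G * H) ** (1.0 / N)


def choose_design(H):
    """Return the preferred design of Example 10.1 and design (b)'s stage effort."""
    G_b = (5 / 3) * (3 / 2)          # design (b): logical effort 2.5 over two stages
    f_b = stage_effort(G_b, H, 2)
    return ("(b)" if f_b < CROSSOVER else "(a)"), f_b


for H in (1, 5):
    design, f_b = choose_design(H)
    print(f"H = {H}: stage effort of (b) = {f_b:.2f}, choose design {design}")
```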
This seems like too much work for a simple example in which comparing de-
lays is easy. The advantage of the method is the insight it gives: one should build
logic into static gates only when the stage effort is below about 2.7. Moreover,
it is best to first reduce the number of stages by using more complex dynamic
gates. Simple calculations show that a dynamic gate with up to 4 series transistors
followed by a high-skew inverter generally has lower logical effort than a smaller
dynamic gate followed by a static gate. Exceptions are very wide dynamic NOR
gates and multiplexers, which may be faster when divided into narrower chunks
feeding a high-skew NAND gate to reduce parasitic delay.
In summary, it is rarely beneficial to build logic into the static stages of domino
gates. If a domino path has stage effort below about 2.7, the path can be improved
by reducing the number of stages. The designer should first use more complex
dynamic gates with up to 4 series transistors. If the stage effort is still below 2.7,
only then should the designer consider replacing some of the static inverters with
actual static gates.
inverter. The logical effort is thus only 4/9, much better than the effort of 2/3 with
a normal pulldown size and nearly as good as 1/3 for a dynamic inverter with no
clocked pulldown. The main cost of large clocked transistors is the extra clock
power. Therefore, a small amount of unbalancing such as 1.5 or 2 is best.
Some dynamic gates use keepers to prevent the output from floating high dur-
ing evaluation, as shown in Figure 10.8. The keepers also slightly improve the
noise margin on the input of the dynamic gate. They have little effect on the
noise margin at the output because they are usually too small to respond rapidly.
The drawback of keepers is that they initially fight a falling output and slow the
dynamic gate. How should keepers be sized?
The keeper current is subtracted from the pulldown stack current during evaluation. If the ratio of keeper current to pulldown stack current is r, the logical effort of the dynamic gate increases by a factor of 1/(1 − r). Therefore, a reasonable rule of thumb is to size keepers at r = 1/4 to 1/10 of the strength of the pulldown stack.
For small dynamic gates, this implies that keepers must be weaker than minimum-sized devices. Increasing the channel length of the keepers will weaken them, but it also adds to the capacitive loading on the inverter. A better approach is to split the keeper into two series transistors, as shown in Figure 10.9. Such an approach minimizes the load on the inverter while reducing keeper current.
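The cost of the keeper rule of thumb follows directly from the 1/(1 − r) factor. This minimal sketch, with a hypothetical function name, evaluates the factor at both ends of the suggested range. A keeper with r = 1/4 raises the gate's logical effort by about 33%, and one with r = 1/10 raises it by about 11%.

```python
def keeper_effort_factor(r):
    """Factor 1/(1 - r) by which a keeper with current ratio r raises logical effort."""
    if not 0.0 <= r < 1.0:
        raise ValueError("keeper/pulldown current ratio r must satisfy 0 <= r < 1")
    return 1.0 / (1.0 - r)


for r in (1 / 10, 1 / 4):
    print(f"r = {r:.2f}: logical effort multiplied by {keeper_effort_factor(r):.3f}")
```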
Figure 10.10: An inverter driving a transmission gate, and the same circuit redrawn so that it can be considered to be a single logic gate for the purposes of logical effort analysis.
Given this model, the circuit has drive equal to that of an inverter for both
rising and falling transitions. The logical effort is 2 for input a and only 4/3
for s. This improvement in logical effort on s relative to a normal tri-state
inverter comes at the expense of increased diffusion capacitance, leaving no great
advantage for transmission gate tri-states over normal tri-states.
In general, transmission gate circuits are sized with equal PMOS and NMOS
transistors and compared to an inverter with equal output current. As long as a
delay equation such as Equation 3.6 describes the delay of a circuit, the method
of logical effort applies. However, the parasitic capacitance increases rapidly with
series transmission gates, so practical circuits are normally limited to about two
series transmission gates.
10.4 Summary
This chapter used ideas of best stage effort, unbalanced gates, and unequal rise/fall
delays to analyze circuit families other than static CMOS. Quantifying the logical
effort of these circuit families enables us to better understand their advantages
over static CMOS and to choose the most effective topologies.
We first examined ratioed circuits, such as pseudo-NMOS gates, by comput-
ing separate rising and falling logical efforts. The analysis shows that Johnson’s
symmetric NOR is a remarkably efficient way to implement the NOR function.
We then turned to domino circuits and found a remarkable result for the best
stage effort of a path when considering adding extra stages, given in Equation 10.2.
The equation tells us that the best stage effort of dynamic circuits is in the range
of 2–2.75, depending on the use of clocked evaluation transistors. The equation
also tells us when it is beneficial to break a path into more stages of simpler gates.
We conclude that a path should incorporate logic into static gates only when the
dynamic gates are already complex and the stage effort is still less than 2.7.
Finally, we explored transmission gate circuits. The logical effort of transmis-
sion gate circuits can be found by redrawing the driver and transmission gates as
a single complex gate. Neglecting the driver is a common pitfall that makes transmission gates appear faster than they actually are.
10.5 Exercises
10-1 [20] Derive the logical efforts of pseudo-NMOS gates shown in Table 10.1.
10-2 [20] Design an 8-input AND gate with an electrical effort of 12 using pseudo-NMOS logic. If the parasitic delay of an n-input pseudo-NMOS NOR gate is (4n + 2)/9, what is the path delay? How does it compare to the results from Section 2.1?
10-3 [25] Design a 3-input symmetric NOR gate. Size the inverters so that the
worst-case pulldown is four times as strong as the pullups. What is the average
logical effort? How does it compare to a pseudo-NMOS NOR? To static CMOS?
10-4 [20] Design a 2-input symmetric NAND gate. Size the inverters so that the
worst-case pulldown is four times as strong as the pullups. What is the average
logical effort? How does it compare to static CMOS?
10-6 [25] Design a 4-16 decoder like the one in Section 2.2, using domino logic.
You may assume you have true and complementary address inputs available.
10-7 [25] A 4:1 multiplexer can be constructed from two levels of transmission
gates. Design such a structure and compute its logical effort.
Chapter 11
Wide Structures
One of the applications of logical effort is the analysis of wide structures, such
as decoders or high fan-in gates and multiplexers, to find the topological struc-
ture that offers the best performance. This chapter presents four examples. The
first is the design of an n-input AND structure. Then we design an n-input Muller
C-element, in which the n-input AND structure can be used. Third, we present
alternative designs for decoders that form 2^n selection outputs from an n-bit address. Finally, we analyze high fan-in multiplexers and show that it is best to
partition wide multiplexers into trees of 4-input multiplexers.
Although this is a simple solution, its logical effort grows rapidly as the number
of inputs increases. An n-input NOR gate could also be used, with an inverter on
each input, to compute the AND function. But since the logical effort of an n-input
NOR gate is always greater than that of an n-input NAND gate, this structure is not
an improvement.
To avoid the linear growth of logical effort, we can build a tree of NAND and
NOR gates to compute the AND function. Figure 11.1 shows such a tree: it has
a NOR gate at the root, alternating levels of NAND and NOR gates, and an even
number of levels. Observe that the number of inputs to gates at different levels in
the tree may differ. In the figure, most levels have 2-input gates, but the gates at
the leaves of the tree use 3-input gates. In some cases, the gates at certain levels
in the tree may have only one input, i.e., they will be inverters. Figure 11.2 shows
an example, in which the root NOR gate is an inverter.
The tree of Figure 11.1 has a logical effort per input of 6.17 (for γ = 2), while an equivalent 24-input NAND gate and inverter would have a logical effort per input of 8.67. Of course, a 24-input NAND gate is also impractical because parasitic delay grows quadratically with stack height. The tree in Figure 11.1 does not yield the lowest logical effort for 24 inputs: as we shall shortly see, a tree with 8 levels and a logical effort per input of 3.95 is best.
Figure 11.1: A 3,2,2,2 AND tree composed of alternating levels of NAND and NOR gates.
Figure 11.2: A degenerate case of the tree, in which the NOR gate has only one input, i.e., it is an inverter.
A simple procedure can find the tree structure with the least logical effort.
The design process searches recursively through all plausible tree structures with
the right number of inputs. When designing a tree for n inputs, we first calculate the logical effort of using a single n-input gate at the current level, perhaps using inverters on its inputs to make the number of levels in the tree even. Then we consider trees with a b-input gate at the root, and subtrees that have ⌈n/b⌉ inputs, where b ranges from 1 to n. The determination of the best subtree design is achieved by a recursive call on the same tree-design procedure. Care is required
achieved by a recursive call on the same tree-design procedure. Care is required
in the control of recursion to be sure that we don’t explore endlessly deep trees
that use 1-input gates at every level.
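The recursive tree search described above can be sketched in a few dozen lines of Python. This version makes the following assumptions:

- γ = 2.
- The root level is a NOR, and NOR and NAND levels alternate below it.
- 1-input gates (inverters) are allowed at any level.
- Each subtree receives ⌈n/b⌉ inputs.
- The depth is capped at a hypothetical max_levels of 12, which stops the recursion.

Fan-ins are returned root first. The caption of Figure 11.1 appears to list fan-ins from the leaves, so its tree is (2, 2, 2, 3) here.

```python
import math
from fractions import Fraction
from functools import lru_cache

GAMMA = 2  # PMOS/NMOS mobility ratio assumed, as in the text's examples


def g_nand(b):
    """Logical effort of a b-input NAND gate (b = 1 is an inverter)."""
    return Fraction(b + GAMMA, 1 + GAMMA)


def g_nor(b):
    """Logical effort of a b-input NOR gate (b = 1 is an inverter)."""
    return Fraction(1 + GAMMA * b, 1 + GAMMA)


def tree_effort(fanins):
    """Logical effort per input of a tree whose fan-ins are listed root first.
    Level 0 (the root) is NOR; levels then alternate NAND, NOR, ..."""
    effort = Fraction(1)
    for level, b in enumerate(fanins):
        effort *= g_nor(b) if level % 2 == 0 else g_nand(b)
    return effort


@lru_cache(maxsize=None)
def best_subtree(n, levels, nor_level):
    """Least-effort subtree with exactly `levels` levels that accepts n inputs.
    Returns (effort, fan-ins root first), or (None, ()) if impossible."""
    if levels == 0:
        return (Fraction(1), ()) if n == 1 else (None, ())
    best = (None, ())
    for b in range(1, n + 1):
        # A b-input gate at this level; each child subtree handles ceil(n/b) inputs.
        sub_effort, sub_fanins = best_subtree(math.ceil(n / b), levels - 1, not nor_level)
        if sub_effort is None:
            continue
        effort = (g_nor(b) if nor_level else g_nand(b)) * sub_effort
        if best[0] is None or effort < best[0]:
            best = (effort, (b,) + sub_fanins)
    return best


def best_and_tree(n, max_levels=12):
    """Search even depths up to max_levels (a hypothetical cap) with a NOR at the root."""
    best = (None, ())
    for levels in range(2, max_levels + 1, 2):
        effort, fanins = best_subtree(n, levels, True)
        if effort is not None and (best[0] is None or effort < best[0]):
            best = (effort, fanins)
    return best


print(float(tree_effort((2, 2, 2, 3))))   # the tree of Figure 11.1
effort, fanins = best_and_tree(24)
print(float(effort), fanins)
```

For 24 inputs, the search returns an effort of 320/81 ≈ 3.95 in eight levels, and every NOR level is reduced to an inverter. Exact fractions make ties compare exactly, so the shallowest of several equal-effort trees is kept.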
Logical effort offers several hints about the nature of the solution. Because the
logical effort of NOR gates exceeds that of NAND gates, we expect the NOR gates
in the tree to have fewer inputs than the NAND gates. In fact, to obtain minimum
logical effort, all the NOR gates in the tree will have only one input—they will
be inverters! Rather than inserting a NOR gate, the design procedure will find it
advantageous to use a NAND gate at the next lower level in the tree.
Table 11.1 shows designs for trees with up to 64 inputs. Notice that the trees
are very skinny, using only 2- and 3-input gates. Observe too that no NOR gates
with multiple inputs are used. This table shows the minimum effort design for the
24-input problem. Note that it is a tree eight levels deep.
The results of these designs can be used to formulate a lower bound on the
logical effort of an n-input AND tree. The tree will contain only 2-input NAND
gates, alternating with inverters, with as many levels as necessary to accommodate
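n inputs. Thus if l is the number of levels of NAND gates, 2^l = n, or l = log₂ n. Each NAND level contributes the logical effort (2 + γ)/(1 + γ) of a 2-input NAND gate, and each inverter level contributes 1, so:
$$G_{and} = \left(\frac{2+\gamma}{1+\gamma}\right)^{\log_2 n} = n^{\log_2 \frac{2+\gamma}{1+\gamma}} \qquad (11.2)$$
When γ = 2, this simplifies to G_and = n^{0.415}. Note that the logical effort of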
AND trees grows much more slowly than the linear growth of a single NAND gate
(Equation 11.1).
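A quick numerical comparison shows how much more slowly the tree bound grows. This sketch assumes γ = 2. It compares the bound ((2 + γ)/(1 + γ))^(log₂ n) with the logical effort (n + γ)/(1 + γ) of a single n-input NAND gate. When n is not a power of two, the bound cannot be reached exactly, because log₂ n is not an integer.

```python
import math

GAMMA = 2.0  # PMOS/NMOS mobility ratio assumed, as in the text


def and_tree_bound(n, gamma=GAMMA):
    """Logical effort per input of log2(n) levels of 2-input NANDs alternating with inverters."""
    return ((2 + gamma) / (1 + gamma)) ** math.log2(n)


def single_nand(n, gamma=GAMMA):
    """Logical effort per input of a single n-input NAND gate."""
    return (n + gamma) / (1 + gamma)


print(f"exponent log2(4/3) = {math.log2(4 / 3):.3f}")
for n in (4, 16, 64):
    print(f"n = {n}: tree bound {and_tree_bound(n):.2f}, single NAND {single_nand(n):.2f}")
```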
We find the best tree by minimizing this delay rather than minimizing the logical
effort.
Table 11.2 shows some results for the electrical efforts of H = 1, H = 5, and
H = 200. The trees with low effort are bushier than those which minimize logical
effort because the limited total effort will make designs with too many stages slow.
The trees with high effort, on the other hand, use one of the skinny designs from
Table 11.1, possibly followed by additional inverters to yield the best number of
stages. For example, when n = 2, the least logical effort tree has logical effort of
4/3. Thus, the path with electrical effort of 200 has total effort 800/3. Table 3.1
shows a 5-stage design would be fastest, but the number of stages must be even.
Four stages turns out to be better than 6, so 2 additional inverters are used after
the 2-stage minimum logical effort tree.
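The comparison of 4 and 6 stages for the n = 2, H = 200 case can be reproduced with D = N·F^(1/N) + P. The parasitic delays below are assumptions for illustration, not values taken from Table 11.2: p = 1 for an inverter and p = 2 for a 2-input NAND, both in units of τ.

```python
def path_delay(F, parasitics):
    """Least delay N * F^(1/N) + sum(p) for a path of N = len(parasitics) stages."""
    N = len(parasitics)
    return N * F ** (1.0 / N) + sum(parasitics)


F = (4 / 3) * 200            # least-effort 2-input AND tree (G = 4/3) driving H = 200
P_NAND2, P_INV = 2.0, 1.0    # assumed parasitic delays, in units of tau

d4 = path_delay(F, [P_NAND2] + [P_INV] * 3)   # NAND2 + inverter + 2 extra inverters
d6 = path_delay(F, [P_NAND2] + [P_INV] * 5)   # NAND2 + inverter + 4 extra inverters
print(f"4 stages: {d4:.2f}   6 stages: {d6:.2f}")
```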
Figure 11.3: The “simple-C” design for a Muller C-element, with n = 3 inputs.
gate is just n (see Section 4.5.8). Variations of this dynamic circuit make it static
by adding some form of feedback. Although the feedback will increase the logical
effort slightly, we will ignore this effect.
Figure 11.4 shows another way to build a C-element using AND trees to detect
when the inputs are all HIGH and when they are all LOW. We shall call this design
an “AND-C.” If we seek a design with the least logical effort, then the design of
the two AND trees will be identical, and each will have the same logical effort.
It might seem that the calculation of the logical effort for the entire C-element
would require deciding on the fraction of input current that is directed into each
AND tree. However, we can appeal to the results on bundles and observe that both
top and bottom paths experience the same logical effort in the AND trees, and so
signals x and y can be treated as a bundle, as shown in Figure 11.5. This bundle
drives a circuit that is identical to an inverter, which has a logical effort of 1. So
we see that the minimum logical effort of an n-input C-element is equal to the
logical effort of an n-input AND tree. The design of these trees was addressed in
the previous section.
If we study Table 11.1, we can see that the AND-C design has lower logical
effort than the simple-C design for any number of inputs. The column labeled
Gand in this table gives the logical effort of the n-input AND tree, which is the
logical effort of the n-input AND-C design. In comparison, the logical effort of an
Figure 11.4: The “AND-C” design for a Muller C-element, using AND trees to determine when all inputs are HIGH or LOW.
Figure 11.5: A different drawing of Figure 11.4, showing the bundle of two signals computed by the AND trees.
Figure 11.6: More precise version of Figure 11.4, showing NAND and NOR functions driving the output stage.
by using a minimal effort tree. However, the parasitic delays are comparable
to the logical effort delay, and so the reduced logical effort delay is a smaller
fraction of total delay. Moreover, when electrical effort is small, the trees are
bushy and do not achieve minimal logical effort; they may also have stage efforts
well below optimal. When electrical effort is large, the electrical effort delay
term is large, also making the savings in logical effort delay a smaller fraction of
the total delay. The conclusion is that for both large and small electrical efforts,
the overall speedup of the AND-C design is much less than one would expect by
considering logical effort alone.
11.3 Decoders
Efficient decoders are important for addressing memories and microprocessor reg-
ister files, where speed is critical. Decoding structures tend to have large total
effort because the fanout of address bits to all decoders and the fanout of the de-
coder output to the transistors in the memory word are both large. In this section,
we analyze three decoder designs from the perspective of logical effort.
 n    H  | simple-C        | AND-C
         | delay  stages   | delay  NAND tree   NOR tree
 2    1  |  5.8     2      |  5.6   2           2
      5  |  9.3     2      |  8.8   2           1,2,1
 3    1  |  7.5     2      |  7.1   3           3
      5  | 11.7     2      | 10.8   3,1,1       1,3,1
 4    1  |  8.0     2      |  8.5   4           4
      5  | 12.9     2      | 12.7   2,1,2       2,2,1
 6    1  |  9.9     2      | 12.2   3,1,2       2,3,1
      5  | 16.4     4      | 14.7   3,1,2       2,3,1
 8    1  | 11.7     2      | 12.5   2,2,2       2,2,2
      5  | 17.1     4      | 15.3   2,2,2       2,2,2
16    1  | 16.0     2      | 15.1   4,2,2       2,4,2
      5  | 20.0     4      | 18.2   4,2,2       2,4,2
32    1  | 19.5     4      | 18.1   4,2,4       4,4,2
      5  | 25.2     6      | 21.6   4,2,4       2,2,2,2,2
64    1  | 23.3     4      | 21.2   4,4,4       4,4,4
      5  | 28.9     6      | 24.9   4,2,2,2,2   2,4,2,2,2
Table 11.3: Comparison of minimum-delay designs for Muller C-elements, for γ = 2, when the total electrical effort is specified.
The considerations that affect decoder design are many, and minimizing log-
ical effort may not be paramount. Layout considerations are important, because
often the decoder must fit on the same layout pitch as the memory cells it ad-
dresses. Overall decoder size and power are important; a design that minimizes
logical effort may require too much power or too many transistors to be practical.
Finally, many decoder structures use precharging to reduce logical effort; we will
not analyze such designs here.
11.3.2 Predecoding
The accompanying figure illustrates the idea of predecoding. The n address bits form p groups of q each. Each group is decoded to yield 2^q predecode values. Then a second layer of p-input AND trees combines the predecoded signals to generate 2^n final signals. Let us compute the total effort on the longest path through the decoder. Each address bit fans out to 2^q decoders in the first layer, so there will be a branching effort of 2^q. The decoder will introduce a logical effort of G_and(q), the logical effort of an AND tree with q inputs. Then there will be a fanout to 2^{n−q} AND gates in the second layer, each with logical effort G_and(p). The path effort from an address bit to the output is:
$$F = 2^{q}\,G_{and}(q)\cdot 2^{\,n-q}\,G_{and}(p)\cdot H = 2^{n}\,G_{and}(q)\,G_{and}(p)\,H \qquad (11.6)$$
We can compare this result with Equation 11.5, given n = pq . If we try a few
values, we find that predecoding has about the same logical effort as the simple
[Figure: predecoding. Each group of q address bits (a0, a1, ...) feeds AND gates that produce 2^q predecoded outputs; a second layer of AND gates combines these into the 2^n final outputs (000, 001, 010, ...).]
a decoder will be HIGH, it is possible to share the PMOS transistors across all of
the decoder gates! This is shown in Figure 11.9 for a 3:8 decoder. The decoder
can be viewed as eight 3-input NOR gates that share PMOS pullups.
Rather than making the pullups all the same size, we shall make the transistors
higher in the tree wider. The lowest-level pullups will have width w , the next
pullup will have width 2w , then 4w , and so on. This scheme has the effect of
loading the input lines equally, so they will all have equal logical effort. Other
sizing schemes might reduce the logical effort of certain inputs and increase the
logical effort of others. If we compute the conductance of the n series pullup
transistors sized in this way, and equate it to the conductance of a PMOS transistor of width γ from the reference inverter, we find:
$$w = \gamma\,\frac{1 - \frac{1}{2^{n}}}{1 - \frac{1}{2}} \qquad (11.7)$$
Now that we have designed the decoder to have the same output drive as the
reference inverter, the logical effort per input is just the input capacitance of each
input, divided by 1 + γ, the input capacitance of the reference inverter. Observe that each input is connected to 2^{n−1} pulldown transistors, each of width 1, and to a total pullup width of 2^{n−1}w. So the logical effort per input is:
$$G(n) = 2^{n-1}\,\frac{1 + \gamma\,\dfrac{1 - \frac{1}{2^{n}}}{1 - \frac{1}{2}}}{1 + \gamma} \qquad (11.8)$$
The corresponding expression for the AND tree, using Equation 11.5 and the lower
bound from Equation 11.2:
$$F = 2^{n}\,n^{0.415}\,H \qquad (11.11)$$
The second factors in the two equations are the only differences, so we see that
the Lyon-Schediwy decoder always has lower effort than the AND tree.
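Equations 11.7 and 11.8 are easy to evaluate. The sketch below assumes γ = 2. It first checks that the pullup widths w, 2w, 4w, ... in series have the conductance of a single width-γ transistor. It then compares G(n) with the factor 2^n·n^0.415 that multiplies H in Equation 11.11. Treating G(n)·H as the Lyon-Schediwy path effort is our reading, since the corresponding equation is not reproduced here.

```python
GAMMA = 2.0  # PMOS/NMOS mobility ratio assumed, as in the text


def ls_pullup_width(n, gamma=GAMMA):
    """Equation 11.7: base width w such that series pullups w, 2w, ..., 2^(n-1) w
    conduct like one PMOS transistor of width gamma."""
    return gamma * (1 - 2.0 ** -n) / (1 - 0.5)


def series_width(widths):
    """Width of one transistor with the same conductance as `widths` in series."""
    return 1.0 / sum(1.0 / w for w in widths)


def ls_effort_per_input(n, gamma=GAMMA):
    """Equation 11.8: 2^(n-1) unit pulldowns plus 2^(n-1) w of pullup per input,
    divided by the reference inverter's input capacitance 1 + gamma."""
    w = ls_pullup_width(n, gamma)
    return 2 ** (n - 1) * (1 + w) / (1 + gamma)


def and_tree_decoder_effort(n):
    """Equation 11.11 divided by H: 2^n * n^0.415 (AND-tree lower bound, gamma = 2)."""
    return 2 ** n * n ** 0.415


w3 = ls_pullup_width(3)
print(series_width([w3 * 2 ** k for k in range(3)]))   # equals gamma
for n in range(1, 7):
    print(n, round(ls_effort_per_input(n), 2), round(and_tree_decoder_effort(n), 2))
```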
[Figure 11.9: 3:8 Lyon-Schediwy decoder. Address inputs A0–A2, in true and complementary form, drive width-1 pulldowns and shared pullups of width w and larger; the outputs are Q0–Q7.]
11.4 Multiplexers
CMOS multiplexers are interesting structures because the logical effort of a mul-
tiplexer is independent of the number of inputs. This suggests that multiplexers
could have a large number of inputs without speed penalty. Common sense tells
us otherwise. One problem is that decoding select signals for wide multiplexers
requires large effort, though this does not impact delay from data inputs. When
stray capacitance is considered, we discover that multiplexers should not be very
wide at all. In fact, over a broad range of assumptions, the best multiplexer has
four inputs. To select one of a large number of signals, we will see that it is best
to build a tree of 4-input multiplexers. Nevertheless, it is sometimes beneficial to
use multiplexers with up to 8 inputs.
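[Figure: multiplexer stages with select inputs s0 and s1, input capacitance Cin, stray capacitance Cs and output load Cout; and a network that selects among n inputs using n_m stages of r-input multiplexers followed by n_a amplifier stages.]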
the parasitic delay of the multiplexer stages; and the third is the parasitic delay of the amplifier stages. We can find the fastest network by computing the values of n_m and n_a that minimize D. From the best value of n_m we can obtain the best multiplexer width r = n^{1/n_m}.
Before starting to minimize the delay, let us try to anticipate the result. Suppose that the overall electrical effort is H. We observe that the logical effort of the n_m multiplexer stages in cascade will be 2^{n_m}, so the total effort will be F = 2^{n_m}H. If the best effort borne by each stage is ρ, the best number of stages is n_m + n_a = (ln F)/(ln ρ). Solving for n_a, we obtain:
$$n_a = n_m\left(\frac{\ln 2}{\ln \rho} - 1\right) + \frac{\ln H}{\ln \rho} \qquad (11.17)$$
This equation shows that as the electrical effort grows, the number of stages increases. But it also shows that there will be cases where no amplifiers are required, i.e., n_a = 0. For example, if H = 1, then n_a = 0: it is always true that ρ > 2 (see Equation 3.23), so Equation 11.17 gives a negative value, which means no amplifiers are used. For values of H not much greater than one, the number of amplifiers will still be zero.
This result has an intuitive explanation. The logical effort of a multiplexer stage, 2, is less than the best step-up ratio, ρ, which is always e = 2.718… or greater. Thus a multiplexer stage has some “gain.” If the electrical effort per stage is less than ρ/2, no additional amplifier stages are necessary. For sufficiently large electrical effort, of course, extra amplifiers are required.
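Equation 11.17 and the no-amplifier condition can be explored numerically. This minimal sketch assumes a best stage effort of ρ = 4 and clips negative results to zero. In a real design, n_a must also be rounded to a whole number of stages.

```python
import math


def amplifier_stages(n_m, H, rho=4.0):
    """Equation 11.17, clipped at zero: n_a = n_m (ln 2 / ln rho - 1) + ln H / ln rho."""
    n_a = n_m * (math.log(2) / math.log(rho) - 1) + math.log(H) / math.log(rho)
    return max(0.0, n_a)


for H in (1.0, 8.0, 64.0, 256.0):
    print(f"n_m = 3, H = {H:g}: n_a = {amplifier_stages(3, H):.2f}")
```

With ρ = 4, the crossover falls at an electrical effort of ρ/2 = 2 per multiplexer stage. For three stages that is H = 8, where n_a is zero.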
Let us now find the best value of n_m, and therefore the best width for a multiplexer, r = n^{1/n_m}. Minimizing Equation 11.16 is quite complex, because there are two independent variables, n_m and n_a, and because the equation is itself complex. In the simple case that H = 1, we have observed that n_a = 0. In this case, we obtain:
$$D_{H=1} = 2n_m\left(1 + p_{inv}\,n^{1/n_m}\right) = 2\,\frac{\ln n}{\ln r}\left(1 + r\,p_{inv}\right) \qquad (11.18)$$
Taking the partial derivative with respect to r and setting it to zero, we find:
$$1 + \frac{1}{r\,p_{inv}} - \ln r = 0 \qquad (11.19)$$
This striking equation lets us calculate r , the width of a multiplexer, given only
some information about the stray capacitance of the multiplexer design. The best
width is independent of the total number of inputs, n.
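Equation 11.19 has no closed-form solution, but bisection finds r quickly because ln r − 1 − 1/(r·p_inv) increases with r. The sketch also evaluates Equation 11.18 for a hypothetical 64-input multiplexer tree built from r-input stages. This checks which power-of-two width gives the least delay.

```python
import math


def best_mux_width(p_inv):
    """Root of Equation 11.19, 1 + 1/(r * p_inv) - ln r = 0, found by bisection."""
    f = lambda r: math.log(r) - 1.0 - 1.0 / (r * p_inv)  # increasing for r > 0
    lo, hi = 1.0, 2.0                                     # f(1) < 0
    while f(hi) < 0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mux_tree_delay(n, r, p_inv):
    """Equation 11.18: delay of an n-input tree of r-input multiplexers at H = 1."""
    return 2.0 * math.log(n) / math.log(r) * (1.0 + r * p_inv)


for p in (0.5, 1.0, 2.0):
    r = best_mux_width(p)
    best_pow2 = min((2, 4, 8, 16), key=lambda k: mux_tree_delay(64, k, p))
    print(f"p_inv = {p}: best r = {r:.2f}; best power of two for n = 64: {best_pow2}")
```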
[Figure 11.12: best multiplexer width r (vertical axis) versus p_inv (horizontal axis, 0 to 4).]
Figure 11.12 plots r for different values of p_inv, computed using Equation 11.19. In practice, to make decoding manageable, we will require r to be a power of two. With this constraint, it is clear from the figure that for reasonable contributions of stray capacitance, multiplexers should have four inputs.
To be sure of this result, we should analyze Equation 11.14 for electrical ef-
forts other than one. This analysis leads to slightly different values for r than those
predicted by Equation 11.19, but the best width for a practical multiplexer is still
four!
an n-input multiplexer
an n-input multiplexer followed by an inverter
a 4-input multiplexer followed by a ⌈n/4⌉-input multiplexer
The best design can be determined by comparing the delay equations of the
three choices. Figure 11.13 shows the ranges of n and H for which each design
is best. The choice between the first and second designs, that is, whether to add an inverter, depends on the electrical effort: larger electrical efforts are best driven by more stages. The third design is better than the first when driving large electrical efforts, but it is not as good as the second. Therefore, the number of inputs at which the multiplexer is best divided into a tree varies with the electrical effort. At electrical efforts above 12, it is worth considering three-stage designs as well.
The plot shows that it is useful to have multiplexers with up to 6 or 7 inputs in
a library. This cutoff depends on the parasitic capacitance; if the capacitance were cut in half, multiplexers with 8–10 inputs would become useful.
11.5 Summary
This chapter surveyed a number of tree-structured designs. The logical effort of a
tree structure grows more slowly with the number of inputs than does the logical
effort of a single gate that computes the same function. We made two observations
from the design of the tree structures in this chapter:
Trees that minimize logical effort are deep and have low branching factor,
i.e., the number of inputs to gates is 2 or 3. Moreover, NOR gates never
appear in these trees, because a NAND gate and inverter compute the same
function with less logical effort.
It is not always advisable to use the tree with the lowest logical effort, be-
cause the tree may have too many stages for best speed. Bushier trees with
larger logical effort and fewer stages may result in less delay.
[Figure 11.13: regions of multiplexer width n versus electrical effort H (0 to 12) in which an n-input mux, an n-input mux followed by an inverter, or a 4-input mux followed by an n/4-input mux gives the least delay.]
We have applied the tree structures to the design of C-elements and decoders.
We have seen, however, that logical effort is but one component of the delay and
that often electrical effort and parasitics dominate total delay. Thus, the delay
advantage of tree-based C-elements is lower than a simple logical effort analysis
would predict. Similarly, wide multiplexers are best divided into trees of 4-input
multiplexers to reduce the parasitic delay, despite the increase in logical effort.
11.6 Exercises
11-1 [20] For what values of p and q does Equation 11.6 show substantial dif-
ferences between AND trees and predecoding? (Use Table 11.1.) Why is there a
difference?
11-2 [25] The Lyon-Schediwy decoder is presented in its NOR form. Show the
NAND form, and compute its logical effort. Which form has less logical effort?
11-3 [30] Determine the best multiplexer width for values of H > 1, as suggested
at the end of Section 11.4. Your results will depend on n.
Chapter 12
Conclusions
We characterize the complexity of the gate by a number called logical effort. Logical effort, g, is the ratio of the input capacitance of a gate to the input capacitance
of an inverter that can produce equal current; in other words, it describes how
much bigger than an inverter a gate must be to drive loads as well as the inverter
can. By this definition, an inverter has logical effort of 1.
Copyright © 1998, Morgan Kaufmann Publishers, Inc. This material may not be copied or distributed without permission of the publisher.
know the path itself was a good design? A good path uses the right number of
stages and selects gates for each stage with low logical effort and parasitic delay.
The path effort and the best stage effort set the best number of stages. For gates
with no parasitics, the best stage effort is e ≈ 2.718. For gates with realistic
parasitics, the best stage effort is larger because it is better to use fewer stages
and reduce parasitic delay of paths. Stage effort of 4 is an excellent choice over a
range of assumptions. The designer has significant freedom to deviate from this
best stage effort, however. Stage efforts from 2.4 to 6 give delays within 15% of
minimum. The best number of stages is thus about:
$$\hat{N} \approx \log_4 F \qquad (12.4)$$
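Equation 12.4, and the insensitivity of delay to the stage effort, can both be checked with a few lines of code. The sketch makes two assumptions. First, every stage has parasitic delay p_inv = 1. Second, N is treated as continuous when comparing delays, which ignores the requirement that it be an integer. Under these assumptions, the best stage effort is about 3.59, the root of ρ(1 − ln ρ) + 1 = 0. The text's 15% figure covers a range of assumptions; this checks only one case.

```python
import math


def best_stages(F, rho=4.0):
    """Equation 12.4 (generalized to any rho): N ~ log_rho F, at least one stage."""
    return max(1, round(math.log(F) / math.log(rho)))


def relative_delay(rho, rho_best=3.59, p_inv=1.0):
    """Delay with stage effort rho divided by delay with rho_best, for a fixed path effort F.
    With N = ln F / ln rho stages, D = N * (rho + p_inv) = ln F * (rho + p_inv) / ln rho,
    so ln F cancels in the ratio."""
    cost = lambda x: (x + p_inv) / math.log(x)
    return cost(rho) / cost(rho_best)


print(best_stages(64), best_stages(1000))
for rho in (2.4, 4.0, 6.0):
    print(f"stage effort {rho}: {100 * (relative_delay(rho) - 1):.1f}% slower than best")
```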
The designer should not only select a reasonable number of stages, but should
also employ gates with low logical efforts. For example, NAND gates are better
than NOR gates in static CMOS. Multiple stages of low fan-in gates have lower
logical effort than a single gate with many inputs. Indeed, considering parasitics
and logical effort, fast gates generally have no more than 4 series transistors. Path
design may involve iteration because the path’s logical effort is not known until
the topology is chosen, but the right number of stages cannot be known accurately
without knowing the logical effort.
Logical effort also explains and quantifies the benefits of various circuit fam-
ilies. For example, domino circuits are faster than static because they have lower
logical effort. Pseudo-NMOS wide NOR structures are also fast because of low log-
ical effort. When static CMOS is insufficient to meet delay requirements, consider
other circuit families.
1. The idea of a numeric “logical effort” that characterizes the delay charac-
teristics of a logic gate or a path through a network is very powerful. It
allows us to compare alternative circuit topologies and to show that some
topologies are uniformly better than others.
2. Circuits are fastest when the effort delay of each stage is the same. More-
over, one should select the number of stages to make this effort about 4.
CAD tools can automatically check a design and flag nodes with poorly
chosen efforts.
3. Fortunately, path delay is very insensitive to modest deviations from the op-
timum. Therefore, the designer has freedom to adjust the number of stages.
Sizing calculations can be done “on the back of an envelope” or in the de-
signer’s head to one or two significant figures. The final results will be
very close to minimum delay for the topology, so there is little benefit to
tweaking transistor sizes in a circuit simulator.
5. The logical effort of each input of a gate increases through no fault of its
own as the number of inputs grows. This vividly illustrates the cost of gates
with large fan-in. Logical effort can be used to compare designs that are
bushy and shallow with those that are narrow and deep.
7. Circuits that branch should generally have path lengths differing by no more
than 1 gate between the branches. Input capacitance is divided among the
legs in proportion to the effort of each leg. It is much better to use 1-2 forks
or 2-3 forks than 0-1 forks because the capacitance can be balanced between
the legs.
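Items 2 and 3 can be checked numerically. The sketch below makes the following assumptions:

* The path is an inverter-like chain in which every stage has parasitic delay p_inv = 1, in units of τ.
* The number of stages is treated as continuous.
* Each stage bears effort ρ, so N = ln F / ln ρ and the delay per unit of ln F is (ρ + p_inv)/ln ρ.

The names `cost` and `best_rho` are hypothetical. Under these assumptions, the optimum stage effort comes out near 3.6. Stage efforts of 2.4 and 6 are each less than 10% slower than the optimum, and 4 is within about half a percent.

```python
import math


def cost(rho: float, p_inv: float = 1.0) -> float:
    # Delay of a chain in which each stage bears effort rho, divided by ln(F):
    # N = ln(F)/ln(rho) stages, each with delay rho + p_inv.
    return (rho + p_inv) / math.log(rho)


def best_rho(p_inv: float = 1.0) -> float:
    # The minimum of cost() satisfies rho * (ln(rho) - 1) = p_inv; solve by bisection.
    lo, hi = math.e, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid * (math.log(mid) - 1) < p_inv:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


rho_opt = best_rho()
for rho in (2.4, 3.0, 4.0, 6.0):
    excess = cost(rho) / cost(rho_opt) - 1
    print(f"stage effort {rho}: {100 * excess:.1f}% above minimum")
```

The flat bottom of this curve is why the sizing can be done to one or two significant figures without losing much speed.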
[Figure: flowchart of the design procedure, beginning with the boxes “Block Spec: function, Cin, Cout, delay” and “Sketch path”]
it is best to start with static CMOS. The best number of stages emerges from
preliminary logical effort calculations, but may be revised later. Label each gate
with its logical effort.
Next, the designer should consider interconnect because the flight time across
long wires is independent of the driver size. Each wire should be labeled with its
length, metal layer (e.g. metal4), and width and spacing. Wires can default to min-
imum width and spacing unless they prove to be critical. From these parameters,
the designer can compute the wire resistance $R$, capacitance $C$, and distributed
delay, $RC/2$. If this delay is small enough, perhaps less than a gate delay, the
wire is acceptable. If the delay is too large, the designer may increase the width
or spacing or insert repeaters. Once the wire design is complete, the wire can be
treated as a lumped capacitance for logical effort purposes.
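The wire check above can be written as a short calculation. This is a minimal sketch in which the per-micron resistance (0.1 Ω/µm), per-micron capacitance (0.2 fF/µm), wire length (2 mm), and gate delay (20 ps) are all hypothetical numbers chosen only for illustration; real values come from the process and the chosen layer, width, and spacing. It computes $R$, $C$, and the distributed delay $RC/2$, then applies the "less than a gate delay" rule of thumb.

```python
def wire_rc(length_um: float, r_ohm_per_um: float, c_ff_per_um: float):
    """Return (R in ohms, C in fF, distributed delay RC/2 in ps) for a uniform wire."""
    R = r_ohm_per_um * length_um
    C = c_ff_per_um * length_um
    delay_ps = R * C * 1e-3 / 2.0  # 1 ohm * 1 fF = 1e-15 s = 1e-3 ps
    return R, C, delay_ps


# Hypothetical values: a 2 mm wire with 0.1 ohm/um and 0.2 fF/um.
R, C, t_wire = wire_rc(2000.0, 0.1, 0.2)
gate_delay_ps = 20.0  # hypothetical delay of one gate in this process
if t_wire <= gate_delay_ps:
    print(f"{t_wire:.0f} ps: acceptable; treat C = {C:.0f} fF as a lumped load")
else:
    print(f"{t_wire:.0f} ps: too slow; widen, space, or add repeaters")
```

Because $R$ and $C$ both grow with length, the distributed delay grows with the square of the wire length. With these made-up numbers the 2 mm wire fails the check.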
It is now time to pick sizes for the gates. The designer should estimate the
path effort and thus compute the stage effort. The stage effort should be about
4 for static logic and about 2.75 for domino logic; if it is far off, stages should
be added or combined. If the path has only simple branching, the path effort is easy
to determine. If the path has complex branches and medium-length wires, the
estimate may be inaccurate, but will be corrected later. The designer starts at the
end and works backward, applying a capacitance transformation to compute the
size of each gate. Practical constraints sometimes restrict the choice of sizes. For
instance, transistors have a minimum allowable size, or a library may limit the choice
of gate sizes. Sometimes a large driver should be undersized to save area and
power.
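The backward sizing pass can be sketched in a few lines of Python. The sketch makes these assumptions:

* Each branch of the path is an identical copy, so the load on stage i is `b[i]` times the input capacitance of stage i + 1.
* Following the convention that electrical effort already accounts for branching at the last stage, the final branching entry is ignored when computing $F = GBH$.

Starting from the output, each gate's input capacitance is obtained by the capacitance transformation $C_{in} = g \cdot C_{out} / \hat{f}$. The function name `size_path` is hypothetical.

```python
import math
from typing import List, Optional


def size_path(g: List[float], c_in: float, c_out: float,
              b: Optional[List[float]] = None) -> List[float]:
    """Return the input capacitance of each gate, first stage to last.

    g[i] is the logical effort of stage i; b[i] is the branching effort at
    the output of stage i (the last entry is ignored, as in F = GBH)."""
    n = len(g)
    b = b if b is not None else [1.0] * n
    F = math.prod(g) * math.prod(b[:-1]) * (c_out / c_in)  # F = GBH
    f_hat = F ** (1.0 / n)
    sizes = [0.0] * n
    load = c_out
    for i in reversed(range(n)):
        # Capacitance transformation: C_in = g * C_out / f_hat.
        sizes[i] = g[i] * load / f_hat
        if i > 0:
            load = b[i - 1] * sizes[i]  # what stage i-1 drives
    return sizes


print([round(c, 2) for c in size_path([1.0, 1.0, 1.0], 1.0, 64.0)])  # prints: [1.0, 4.0, 16.0]
```

A useful consistency check is that the first size returned always equals the specified input capacitance when the stage effort is $F^{1/N}$. Practical constraints such as minimum transistor sizes or a fixed library would be applied after this pass.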
After assigning gate sizes, compare the actual input capacitance to the speci-
fication. If the input capacitance is smaller than specification, the stage effort was
larger than necessary and can be reduced. If the input capacitance exceeds the
specification, the stage effort was smaller than necessary and must be increased.
If the input capacitance greatly exceeds the specification, the design probably has
too few stages and buffers should be added to the end of the path.
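The adjustment loop above (size, compare the input capacitance with the specification, raise or lower the stage effort) can be automated with bisection. The sketch below makes these assumptions:

* Every stage bears the same effort f. With fixed wire loads this is a convenient approximation rather than the exact optimum, which is why the text calls for iteration.
* `c_wire[i]` is a hypothetical fixed wire capacitance on the output of stage i.

Bisection works here because the sized path's input capacitance falls steadily as f rises. The function names and the bracketing range of 1 to 100 are illustrative.

```python
from typing import List


def input_cap(g: List[float], c_out: float, f: float, c_wire: List[float]) -> float:
    """Size backward with every stage bearing effort f; return the first gate's C_in.

    c_wire[i] is a fixed wire capacitance on the output of stage i."""
    load = c_out + c_wire[-1]
    c = 0.0
    for i in reversed(range(len(g))):
        c = g[i] * load / f          # capacitance transformation
        if i > 0:
            load = c + c_wire[i - 1]  # next gate's input plus the fixed wire
    return c


def solve_stage_effort(g: List[float], c_in_spec: float, c_out: float,
                       c_wire: List[float], lo: float = 1.0, hi: float = 100.0) -> float:
    """Bisect on f until the sized path's input capacitance meets the spec."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if input_cap(g, c_out, mid, c_wire) > c_in_spec:
            lo = mid  # input too large: the stage effort must increase
        else:
            hi = mid  # input too small: the stage effort can be reduced
    return (lo + hi) / 2


# Hypothetical 3-inverter path with a fixed 5-unit wire after the second stage.
f = solve_stage_effort([1.0, 1.0, 1.0], 1.0, 64.0, [0.0, 5.0, 0.0])
print(round(f, 3))
```

Without the wire, the solver returns a stage effort of 4 for this path. The extra wire load pushes the stage effort above 4. If no f in the bracket meets the specification, that is a hint that stages should be added, as described above.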
Once the input capacitance meets specification, compute the delay of the cir-
cuit by simulation, static timing analysis, or hand estimation. If you are fortunate,
the design is faster than necessary. Increase the stage effort and reduce the in-
put capacitance to reduce the area of the design and present a smaller load to the
previous path.
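For the hand-estimation option, the path delay is the sum over stages of effort delay plus parasitic delay, $d = gh + p$. In this minimal sketch:

* The parasitic delays are inputs. The example assumes p = 1 for an inverter, in units of τ.
* As in the sizing sketch, `b[i]` is the branching effort at the output of stage i.
* The name `path_delay` is hypothetical.

```python
from typing import List, Optional


def path_delay(g: List[float], p: List[float], sizes: List[float], c_out: float,
               b: Optional[List[float]] = None) -> float:
    """Hand estimate of path delay in units of tau: the sum of g*h + p per stage."""
    n = len(g)
    b = b if b is not None else [1.0] * n
    total = 0.0
    for i in range(n):
        load = c_out if i == n - 1 else b[i] * sizes[i + 1]
        h = load / sizes[i]            # electrical effort of stage i
        total += g[i] * h + p[i]       # effort delay plus parasitic delay
    return total


# Three inverters sized 1, 4, 16 driving 64, with p_inv = 1 (assumed).
print(path_delay([1.0] * 3, [1.0] * 3, [1.0, 4.0, 16.0], 64.0))  # prints: 15.0
```

If the estimate comes in below the required delay, the stage effort can be raised and the sizes recomputed, as the paragraph above suggests.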
Murphy’s Law dictates that the design is usually too slow. A common mis-
take among beginners is to tweak the sizes of transistors in the path in the hope of
improving speed. This is doomed to failure if logical effort was correctly applied
because the gate sizes are already right for theoretically minimum delay, as accurately as the model allows. Often the problem is “solved” by upsizing all of the
devices in the path, but such a solution violates the input capacitance specification
and pushes the problem to the previous path. Moreover, it leads to designs with
overly large gates. A better approach is to rethink the overall topology. Perhaps it
is possible to use faster circuit families or rearrange gates to favor late inputs. If
a better topology is found, repeat the sizing process. If the topology and sizes are
thoroughly optimized and the path is still too slow, the block spec is infeasible and the
specification must be modified. Sometimes logical effort allows one to reject a specification as unrealistic with very little design work.
With practice, this design procedure is easy to use and works for a wide range
of circuit problems. It is intended only as a general guide; the designer should
also trust his or her own intuition and special knowledge of the problem.
Equal fanout design is sufficient for circuits like decoders in which the logical
effort tends to be low, but is suboptimal for paths with large logical effort because
it results in stage efforts well above 4.
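The weakness of equal fanout can be shown with a small comparison. This sketch makes three assumptions:

* The path is unbranched (B = 1).
* Parasitic delay is ignored, since it is the same for any sizing of a fixed topology.
* The hypothetical example path is an inverter, a 3-input NOR with logical effort 7/3 (assuming a P/N ratio of 2), and another inverter, with H = 64.

Equal fanout gives every stage the same electrical effort h = H^(1/N). As a result, the NOR stage alone bears a stage effort of about 9.3. Logical-effort sizing instead equalizes the stage effort f = gh.

```python
import math
from typing import List, Tuple


def compare_equal_fanout(g: List[float], H: float) -> Tuple[List[float], float, float]:
    """Return (stage efforts under equal fanout, their sum, the logical-effort optimum N*F**(1/N)).

    Assumes an unbranched path (B = 1) and ignores parasitic delay, which is the
    same for both sizings of a fixed topology."""
    n = len(g)
    h = H ** (1.0 / n)                              # equal fanout: same h at every stage
    efforts = [gi * h for gi in g]                  # f = g*h varies with g
    optimum = n * (math.prod(g) * H) ** (1.0 / n)   # equal f at every stage
    return efforts, sum(efforts), optimum


# Hypothetical path: inverter, 3-input NOR (g = 7/3), inverter; H = 64.
efforts, equal_fanout_delay, best_delay = compare_equal_fanout([1.0, 7 / 3, 1.0], 64.0)
print([round(x, 2) for x in efforts], round(equal_fanout_delay, 2), round(best_delay, 2))
```

When every gate is an inverter, the two methods coincide. The gap opens only as logical effort varies from stage to stage, which matches the observation that equal fanout works for low-effort paths such as decoders.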
the path is fast nor whether the topology was a good one in the first place. More-
over, numerical methods are prone to get stuck in local optima and are unlikely
to produce meaningful results unless the user knows approximately what results
should be expected. Synthesis tools make some effort to explore topologies, but
still cannot match experienced designers on critical paths. Moreover, designers
have in their heads many constraints on the design, such as performance, floor-
plan, wiring, and interfaces with other circuits. Merely specifying all of these
constraints to an optimization tool may take longer than selecting reasonable sizes
by hand. Finally, accurate circuit optimization is fundamentally a nonlinear prob-
lem which tends to have runtime and convergence problems when applied to real
designs.
Logical effort explains how to design a path for maximum speed, but does
not easily show how to design a path for minimum area or power under a
fixed delay constraint.
Logical effort does not provide simple closed-form solutions to paths that
branch and have a different number of stages or different parasitic delays
on each branch. Usually iteration is required to tune such circuits. Itera-
tion is also required when fixed wire capacitances are comparable to gate
capacitance.
Many real circuits are too complex to optimize by hand. For example, prob-
lems in Chapter 11 were solved with spreadsheets or with simple scripts.
Given that numerical optimization is sometimes necessary, perhaps the op-
timizer should use a more accurate delay equation.
Appendix A
Cast of Characters
The notation used in this monograph obeys certain conventions whenever possi-
ble:
Parameters of the fabrication process and design parameters that are likely
to be the same for all logic gates are given by Greek letters.
Logic gate inputs and outputs are single lower-case letters, in the set {a, b, c, ...}
whenever possible. Subscripts are often used to indicate different stages of
logic along a path in a network.
d The delay in a single stage of a logic network, or “stage delay.” Often sub-
scripted to identify a single stage of a network.
$\hat{D}$ The total delay along a path through a logic network when the design of the
network is optimized to obtain least delay.
g The logical effort per input or bundle of a logic gate. Often subscripted to
denote the particular input or bundle and/or to identify a single stage of a
network. (The letter g represents logical effort because it is the first letter
in the word “logical” that is not easily confused with other symbols: l with
one and o with zero.)
Copyright © 1998, Morgan Kaufmann Publishers, Inc. This material may not be copied or
distributed without permission of the publisher.
$g_{\text{tot}}$ The total logical effort of a single logic gate. Often subscripted to identify a
single stage of a network.
G The path logical effort borne by one or more paths through a logic network.
When subscripted with characters, denotes the logical effort between two
points in a network: $G_{ab}$ is the logical effort along the path from a to b.
h The electrical effort borne by a single stage: $h = C_{out}/C_{in}$. This is the ratio
of a logic gate’s load capacitance to the input capacitance of a single input.
Often subscripted to identify a single stage of a network. (The letter h is
chosen so that the formula f = gh reads in alphabetical order.)
H The path electrical effort borne by one or more paths through a logic network.
When subscripted with characters, denotes the electrical effort between two
points in a network: $H_{ab}$ is the electrical effort along the path from a to b.
b The branching effort borne at the output of a single logic gate. Often subscripted
to identify a stage of a network.
B The path branching effort borne by one or more paths through a network. Note
that branching effort at the last stage in a network is not counted, since the
electrical effort reflects the effort of branching in the last stage.
f The effort, electrical and logical, borne by a single stage: f = gh. Often
subscripted to identify a single stage of a network. Sometimes called the
effort delay, because it is the contribution to delay in a single logic gate that
is induced by the effort the gate bears. (The letter f was chosen to represent
the word “effort;” the letter e being too easily confused with the constant
2.718.)
$\hat{f}$ The optimum value of f to minimize delay along a path with a given number
of stages.
F The path effort, electrical, branching, and logical, borne by one or more paths
through a logic network: $F = GBH$. When subscripted with characters,
denotes the path effort between two points in a network: $F_{ab}$ is the path
effort along the path from a to b.
$\gamma$ The ratio of the shape factor of p-type pullup transistors to that of n-type pulldown transistors in an inverter: $\gamma = (W_p/L_p)/(W_n/L_n)$. Usually $\gamma > 1$.
P/N ratio The ratio of the shape factor of p-type pullup transistors to that of n-type pulldown transistors in an arbitrary logic gate. For inverters, the P/N ratio equals $\gamma$. For a 2-input NOR gate, the P/N ratio must be $2\gamma$ to have rising and falling delays proportional to those of an inverter.
The adjectives “stage” and “path” are applied to logical effort, electrical effort,
effort, effort delay, and parasitic delay. The adjective “total” is applied to “logical
effort” only, and means the sum of the logical effort per input of all inputs of a
logic gate.
Appendix B
The Logical Effort web page offers several tools to assist with logical effort. The
page can be found at:
http://velox.stanford.edu/TBD
Documentation for the tools is also online.
Index
length, 43
precharge, 158
width, 43
transmission gate, 161
tree
AND, 166
C-element, 171
design procedure, 168
minimum delay, 169, 173
minimum logical effort, 166, 171
multiplexer, 181
tri-state, 80
example, 102
logical effort, 68
velocity saturation, 64
dynamic gates, 153
XNOR
logical effort, 69
parasitic delay, 77
XOR
logical effort, 4, 67, 69
parasitic delay, 5, 77