Datapath Subsystems
Chapter 1
1.1 Introduction
1. Datapath operators
2. Memory elements
3. Control structures
4. Special-purpose cells
   • I/O
   • Power distribution
   • Clock generation and distribution
   • Analog and RF
CMOS system design consists of partitioning the system into subsystems of the
types listed above. Many options exist that make trade-offs between speed, den-
sity, programmability, ease of design, and other variables. This chapter addresses
design options for common datapath operators. The next chapter addresses arrays,
especially those used for memory. Control structures are most commonly coded in
a hardware description language and synthesized.
Datapath operators benefit from the structured design principles of hierarchy,
regularity, modularity, and locality. They may use N identical circuits to process
N-bit data. Related data operators are placed physically adjacent to each other to
reduce wire length and delay. Generally, data is arranged to flow in one direction,
while control signals are introduced in a direction orthogonal to the dataflow.
Common datapath operators considered in this chapter include adders, one/zero
detectors, comparators, counters, shifters, ALUs, and multipliers.
1.2 Shifters
1.3 Adders
Addition is one of the basic operations performed in many processing tasks, such as counting, multiplication, and filtering. Adders can be implemented in various forms to suit different speed and density requirements.
The truth table of a binary full adder is shown in Figure 4.3, along with some functions that will be of use during the discussion of adders. The adder inputs are A and B together with the carry C; the outputs are SUM and CARRY:

SUM = C′.(A ⊕ B) + C.(A ⊕ B)′
    = A ⊕ B ⊕ C

CARRY = A.B + C.(A ⊕ B)
      = A.B + C.(A + B)
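These SUM and CARRY expressions can be verified exhaustively against ordinary arithmetic; a minimal Python sketch (the function name is illustrative):

```python
# Full-adder outputs from the Boolean expressions above.
def full_adder(a, b, c):
    s = a ^ b ^ c                     # SUM = A xor B xor C
    carry = (a & b) | (c & (a | b))   # CARRY = A.B + C.(A + B)
    return s, carry

# Check against arithmetic for all 8 input combinations.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, carry = full_adder(a, b, c)
            assert a + b + c == 2 * carry + s
```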
The direct implementation of the above equations is shown at the gate level in Fig. 4.4 and at the transistor level in Fig. 4.5.
The full adder of Fig. 4.5 employs 32 transistors (6 for the inverters, 10 for the carry circuit, and 16 for the 3-input XOR). A more compact design is based on the observation that SUM can be factored to reuse the CARRY term as follows:

SUM = A.B.C + (A + B + C).CARRY′

Such a design is shown at the transistor level in Figure 4.6 and uses only 28 transistors. Note that the pMOS network is the complement of the nMOS network. Here Cin = C.
A ripple carry adder is a digital circuit that produces the arithmetic sum of two binary numbers. It can be constructed from full adders connected in cascade, with the carry output of each full adder connected to the carry input of the next full adder in the chain. Figure 4.7 shows the interconnection of four full adder (FA) circuits to form a 4-bit ripple carry adder. Notice from Figure 4.7 that the input is on the right side because the first cell traditionally represents the least significant bit (LSB). Bits a0 and b0 in the figure represent the least significant bits of the numbers to be added. The sum output is represented by bits S0–S3.
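The cascade of Figure 4.7 can be modelled directly; a Python sketch (names illustrative, bit lists LSB first):

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (c & (a | b))

def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-width LSB-first bit lists; return (sum_bits, carry_out)."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each stage waits for the carry of the previous stage -- the "ripple".
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# 6 + 7 = 13: [0,1,1,0] + [1,1,1,0] -> [1,0,1,1] with carry-out 0
```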
The carry lookahead adder (CLA) solves the carry delay problem by calculating the carry signals in advance, based on the input signals. It is based on the fact that a carry signal will be generated in two cases: (1) when both bits ai and bi are 1, or (2) when one of the two bits is 1 and the carry-in is 1. Thus, one can write

ci+1 = ai.bi + (ai ⊕ bi).ci
si = (ai ⊕ bi) ⊕ ci

The above two equations can be written in terms of two new signals Pi and Gi, which are shown in Figure 4.8:

ci+1 = Gi + Pi.ci
si = Pi ⊕ ci

where

Gi = ai.bi
Pi = ai ⊕ bi
Pi and Gi are called the carry propagate and carry generate terms, respectively. Notice that the generate and propagate terms depend only on the input bits and thus will be valid after one and two gate delays, respectively. If one uses the above expressions
to calculate the carry signals, one does not need to wait for the carry to ripple
through all the previous stages to find its proper value. Let’s apply this to a 4-bit
adder to make it clear.
Putting i = 0, 1, 2, 3 in ci+1 = Gi + Pi.ci, we get

c1 = G0 + P0.c0
c2 = G1 + P1.G0 + P1.P0.c0
c3 = G2 + P2.G1 + P2.P1.G0 + P2.P1.P0.c0
c4 = G3 + P3.G2 + P3.P2.G1 + P3.P2.P1.G0 + P3.P2.P1.P0.c0
Notice that the carry-out bit, ci+1 , of the last stage will be available after four delays:
two gate delays to calculate the propagate signals and two delays as a result of the
gates required to implement Equation c4 .
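These expanded equations can be exercised in software; the following Python sketch (names illustrative) computes c1–c4 directly from the Gi and Pi terms, with no carry rippling from stage to stage:

```python
def cla_carries(a_bits, b_bits, c0=0):
    """c1..c4 from the expanded lookahead equations (LSB-first 4-bit inputs)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Gi = ai.bi      (generate)
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # Pi = ai xor bi  (propagate)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    # Each carry depends only on the inputs and c0 -- no waiting on earlier stages.
    return [c1, c2, c3, c4]
```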
Figure 4.9 shows that a 4-bit CLA is built using gates to generate the Pi and Gi signals and a logic block to generate the carry-out signals according to Equations c1 to c4.
Logic gate and transistor-level implementations of the carry bits are shown in Figure 4.10.
The disadvantage of the CLA is that the carry logic block gets very complicated for more than 4 bits. For that reason, CLAs are usually implemented as 4-bit modules that are combined in a hierarchical structure to realize adders of widths that are multiples of 4 bits.
(a) Logic network for 4-bit CLA carry bits; (b) sum calculation using the CLA network
The carry equation ci+1 = Gi + Pi.ci can also be implemented with a transmission gate. If the carry path is precharged to VDD, the transmission gate reduces to a simple nMOS transistor. In the same way, the pMOS transistors of the carry generation circuit are removed. The result is a Manchester cell.
The Manchester cell is very fast, but a long chain of such cascaded cells would be slow. This is due to the distributed RC effect and to the body effect, which make the propagation time grow with the square of the number of cells. In practice, an inverter is added every four cells, as in Figure 4.12.
1.4 Multipliers
Multipliers may be implemented in the following forms:
• Serial/parallel form
• Parallel form
A parallel multiplier is based on the observation that partial products in the multiplication process may be computed independently, in parallel. For example, consider the unsigned binary integers X and Y:

X = Σ_{i=0}^{m−1} Xi·2^i   and   Y = Σ_{j=0}^{n−1} Yj·2^j

P = X × Y = (Σ_{i=0}^{m−1} Xi·2^i) × (Σ_{j=0}^{n−1} Yj·2^j)
  = Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} (Xi·Yj)·2^{i+j}
  = Σ_{k=0}^{m+n−1} Pk·2^k
The product terms Xi·Yj are the partial product terms, called summands. There are mn summands, which are produced in parallel by a set of mn AND gates. For 4-bit numbers, the expression above may be expanded as in the table below.
Figure 1.15
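The summand formulation maps directly onto code; a Python sketch (names illustrative) that forms all m·n summands with ANDs and accumulates each at weight 2^(i+j):

```python
def multiply_by_summands(x_bits, y_bits):
    """Multiply two LSB-first bit lists by summing the m*n summands Xi.Yj."""
    total = 0
    for i, xi in enumerate(x_bits):
        for j, yj in enumerate(y_bits):
            total += (xi & yj) << (i + j)   # summand Xi.Yj at weight 2**(i+j)
    return total

# e.g. 13 * 11 with X = [1,0,1,1] and Y = [1,1,0,1] (LSB first) gives 143
```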
The worst-case delay associated with such a multiplier is (2n + 1)·tg, where tg is the worst-case adder delay.
Figure 4.16 shows a cell that may be used to construct a parallel multiplier.
The Xi term is propagated diagonally from top right to bottom left, while the Yj term is propagated horizontally. Incoming partial products enter at the top. Incoming CARRY IN values enter at the top right of the cell. The bit-wise AND is performed in the cell, and the SUM is passed to the next cell below. The CARRY OUT is passed to the bottom left of the cell.
Figure 4.17 depicts the multiplier array with the partial products enumerated.
The multiplier can be drawn as a square array, as shown here; the version in Figure 4.18 is the most convenient for implementation.
In this version the degeneration of the first two rows of the multiplier is shown. The first row of multiplier adders has been replaced with AND gates, while the second row employs half-adders rather than full adders.
This optimization might not be done if a completely regular multiplier were required (i.e., one type of array cell). In that case the appropriate inputs to the first and second rows would be connected to ground, as shown previously. An adder with equal carry and sum propagation times is advantageous, because the worst-case multiply time depends on both paths.
If the truth table for an adder is examined, it may be seen that an adder is in effect a "one's counter" that counts the number of 1's on the A, B, and C inputs and encodes them on the SUM and CARRY outputs.
A 1-bit adder provides a 3:2 (3 inputs, 2 outputs) compression in the number of bits. The addition of partial products in a column of an array multiplier may be thought of as totaling up the number of 1's in that column, with any carry being passed to the next column to the left.
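The one's-counter interpretation can be stated in one line; a Python sketch (the function name is illustrative):

```python
def ones_counter(a, b, c):
    """Encode the number of 1's among A, B, C as (CARRY, SUM): count = 2*CARRY + SUM."""
    carry, s = divmod(a + b + c, 2)
    return carry, s
```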
Figure 1.19
Considering the product P3, it may be seen that it requires the summation of four partial products and a possible column carry from the summation of P2. A tree of adders (a Wallace tree) can be used here. The delay through the array addition (not including the CPA) is proportional to log_1.5(n), where n is the width of the Wallace tree.
1.5 Parity Generators

• External noise and loss of signal strength cause loss of data-bit information while transporting data from one device to another, located inside the computer or externally.
• To indicate any occurrence of error, an extra bit, called the parity bit, is included with the message according to the total number of 1s in the set of data.
• If the extra bit is 0 when the total number of 1s is even and 1 for an odd number of 1s, it is called even parity.
• If, on the other hand, the extra bit is 1 for an even number of 1s and 0 for an odd number of 1s, it is called odd parity.
Table 1.1: Truth table for generating even and odd parity bit
If the message bit combination is designated D3 D2 D1 D0, and Pe and Po are the even and odd parity bits respectively, then it is obvious from the table that the Boolean expressions for even and odd parity are

Pe = D3 ⊕ D2 ⊕ D1 ⊕ D0
Po = (D3 ⊕ D2 ⊕ D1 ⊕ D0)′
The above illustration is given for a message with four bits of information.
However, the logic diagrams can be expanded with more XOR gates for any number
of bits.
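For the four-bit message the parity expressions reduce to an XOR chain, which generalizes to any width; a Python sketch (names illustrative):

```python
def even_parity(bits):
    """Pe = D3 xor D2 xor ... xor D0: 1 when the number of 1s is odd."""
    p = 0
    for d in bits:
        p ^= d
    return p

def odd_parity(bits):
    """Po is the complement of Pe."""
    return 1 - even_parity(bits)
```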
1.6 One/Zero Detectors

Detecting all ones or zeros on wide N-bit words requires large fan-in AND or NOR gates. Recall that by DeMorgan's law, AND, OR, NAND, and NOR are fundamentally the same operation except for possible inversions of the inputs and/or outputs. You can build a tree of AND gates, as shown in Figure 4.26(b), where alternate NAND and NOR gates have been used. The path has log N stages.
Figure 1.26: One/zero detectors: (a) all-one detector; (b) all-zero detector; (c) all-zero detector, transistor-level representation
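In software the same wide AND/NOR reductions can be expressed as a fold over the word; a Python sketch (names illustrative):

```python
from functools import reduce

def all_ones(bits):
    """Wide AND; in hardware this is the log N gate tree of Figure 4.26(b)."""
    return reduce(lambda x, y: x & y, bits)

def all_zeros(bits):
    """Wide NOR: the word is all zeros when the OR of all bits is 0."""
    return 1 - reduce(lambda x, y: x | y, bits)
```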
1.7 Comparators
Another common and very useful combinational logic circuit is that of the Digital
Comparator circuit. Digital or Binary Comparators are made up from standard
AND, NOR and NOT gates that compare the digital signals present at their input
terminals and produce an output depending upon the condition of those inputs.
For example, along with being able to add and subtract binary numbers, we need to be able to compare them and determine whether the value of input A is greater than, smaller than, or equal to the value at input B. There are two main types of digital comparator: the identity comparator, which has a single output indicating A = B, and the magnitude comparator, which has three outputs indicating

A > B, A = B, A < B
Inputs | Outputs
 B  A  | A>B  A=B  A<B
 0  0  |  0    1    0
 0  1  |  1    0    0
 1  0  |  0    0    1
 1  1  |  0    1    0
From the above table, the expressions obtained for the magnitude comparator using K-maps are as follows:

For A < B : C = A′.B
For A = B : D = A′.B′ + A.B
For A > B : E = A.B′

The logic diagram of the 1-bit comparator using basic gates is shown below in Figure 1.24.
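The three K-map expressions can be checked exhaustively; a Python sketch (the function name is illustrative):

```python
def compare_1bit(a, b):
    """Return (A>B, A=B, A<B) from the expressions E, D, C above."""
    gt = a & (1 - b)                      # E = A.B'
    eq = (a & b) | ((1 - a) & (1 - b))    # D = A'.B' + A.B
    lt = (1 - a) & b                      # C = A'.B
    return gt, eq, lt
```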
1.8 Counters
Counters can be implemented using adder/subtractor circuits and registers (or, equivalently, D flip-flops).
The simplest counter circuits can be built using T flip-flops, because the toggle feature is naturally suited to the implementation of the counting operation. Counters are available in two categories:

1. Asynchronous (ripple) counters, in which a flip-flop output transition serves as the source for triggering other flip-flops; i.e., the C (clock) inputs of some or all flip-flops are triggered not by the common clock pulses. E.g.: binary ripple counters, BCD ripple counters.
2. Synchronous counters, in which the C (clock) inputs of all flip-flops receive the common clock pulses. E.g.: binary counter, up-down binary counter, BCD binary counter, ring counter, Johnson counter.
Figure 4.28 shows a 3-bit counter capable of counting from 0 to 7. The clock inputs of the three flip-flops are connected in cascade. The T input of each flip-flop is connected to a constant 1, which means that the state of the flip-flop is toggled at each active edge (here, the positive edge) of its clock. We assume that the purpose of this circuit is to count the number of pulses that occur on the primary input called Clock. Thus the clock input of the first flip-flop is connected to the Clock line. The other two flip-flops have their clock inputs driven by the Q′ output of the preceding flip-flop. Therefore, they toggle their state whenever the preceding flip-flop changes its state from Q = 1 to Q = 0, which results in a positive edge of the Q′ signal.
Note here that the value of the count is indicated by the 3-bit binary number Q2Q1Q0. Since the second flip-flop is clocked by Q0′, the value of Q1 changes shortly after the change of the Q0 signal. Similarly, the value of Q2 changes shortly after the change of the Q1 signal. This circuit is a modulo-8 counter. Because it counts in the upward direction, we call it an up-counter. This behavior is similar to the rippling of carries in a ripple-carry adder. The circuit is therefore called an asynchronous counter, or a ripple counter.
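The rippling behaviour can be modelled step by step; a small Python sketch (names illustrative, not from the text):

```python
def ripple_counter(pulses, n_bits=3):
    """Simulate an n-bit ripple up-counter; each stage toggles when the
    preceding stage falls from 1 to 0. Returns the final count."""
    q = [0] * n_bits
    for _ in range(pulses):
        i = 0
        while i < n_bits:      # toggle stage i; a 1->0 fall ripples onward
            q[i] ^= 1
            if q[i] == 1:      # stage rose 0->1: no falling edge, ripple stops
                break
            i += 1
    # the count is Q2 Q1 Q0 read as a binary number
    return sum(bit << i for i, bit in enumerate(q))
```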
A synchronous counter usually consists of two parts: the memory element and the
combinational element. The memory element is implemented using flip-flops while
the combinational element can be implemented in a number of ways. Using logic
gates is the traditional method of implementing combinational logic and has been
applied for decades.
Observing the pattern of bits in each row of the table, it is apparent that bit Q0 changes on each clock cycle. Bit Q1 changes only when Q0 = 1. Bit Q2 changes only when both Q1 and Q0 are equal to 1. Bit Q3 changes only when Q2 = Q1 = Q0 = 1. In general, for an n-bit up-counter, a given flip-flop changes its state only when all the preceding flip-flops are in the state Q = 1. Therefore, if we use T flip-flops to realize the 4-bit counter, then the T inputs should be defined as
T0 = 1
T1 = Q0
T2 = Q0 Q1
T3 = Q0 Q1 Q2
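Under these T-input definitions every clock edge simply increments the count; a Python sketch (names illustrative):

```python
def sync_counter_step(q):
    """One clock edge of a 4-bit synchronous counter of T flip-flops.
    q is [Q0, Q1, Q2, Q3]; T0 = 1, T1 = Q0, T2 = Q0.Q1, T3 = Q0.Q1.Q2."""
    t = [1, q[0], q[0] & q[1], q[0] & q[1] & q[2]]
    # All stages toggle on the same edge, gated by their T inputs.
    return [qi ^ ti for qi, ti in zip(q, t)]

q = [0, 0, 0, 0]
for _ in range(10):
    q = sync_counter_step(q)
# q is now [0, 1, 0, 1], i.e. Q3Q2Q1Q0 = 1010 = 10
```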
In Figure 5, instead of using AND gates of increased size for each stage, we use a
factored arrangement. This arrangement does not slow down the response of the
counter, because all flip-flops change their states after a propagation delay from
the positive edge of the clock. Note that a change in the value of Q0 may have to
propagate through several AND gates to reach the flip-flops in the higher stages of
the counter, which requires a certain amount of time. This time must not exceed
the clock period. Actually, it must be less than the clock period minus the setup time of the flip-flops. The circuit behaves as a modulo-16 up-counter.
Because all changes take place with the same delay after the active edge of the
Clock signal, the circuit is called a synchronous counter.