Lecture 10 Arithmetic Circuits 2021
(83-313)
Lecture 10:
Arithmetic Circuits
Prof. Adam Teman
10 June 2021
Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email [email protected] and I will address this as soon as possible.
Lecture Content
Adam Teman, June 10, 2021
DataPaths
Multiple functional units
• A complex processor may have multiple functional units working in parallel:
[Figure: bit-sliced datapath (bits 0 to 6) in which registers, an adder, a shifter and a multiplexer operate in parallel between Data In and Data Out]
Serial Adder Concept
• At time $i$, read $a_i$ and $b_i$; produce $s_i$ and $c_{i+1}$.
• An internal state element stores $c_i$.
• The carry bit $c_0$ is initialized to $c_{in}$.
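A minimal Python sketch of the serial adder, assuming the operands are fed least-significant bit first: each loop iteration is one time step through a single full adder, and the variable `carry` stands in for the state element. The helper names (`serial_add`, `to_bits`, `from_bits`) are illustrative only.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Split an unsigned integer into n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits: list[int]) -> int:
    """Rebuild an unsigned integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))

def serial_add(a_bits, b_bits, c_in=0):
    """Bit-serial addition: one bit pair per time step, carry kept as state."""
    carry = c_in                              # state element starts at c_0 = c_in
    s_bits = []
    for a_i, b_i in zip(a_bits, b_bits):      # time step i
        s_bits.append(a_i ^ b_i ^ carry)      # s_i
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)  # c_{i+1}
    return s_bits, carry

s, c_out = serial_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c_out)  # 1 1  (11 + 6 = 17 = 0b10001)
```

In hardware the loop body is the full adder and `carry` is a flip-flop, so an N-bit addition takes N cycles.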
Full-Adder Implementation
• A full-adder is therefore a majority gate and a 3-input XOR:
Total: 32 Transistors
[Figure: transistor-level full adder (carry gate and sum gate) annotated with transistor widths]
$C_{out} = AB + AC_i + BC_i$
$S = A \oplus B \oplus C_{in} = ABC_{in} + (A + B + C_{in})\,\overline{C_{out}}$
• …but the carry has to propagate through ~64 stages, i.e., $PE_{opt} = 4^{64}$.
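Both equations can be checked exhaustively. This Python sketch (function names are illustrative) confirms that the majority gate gives the carry, the 3-input XOR gives the sum, and that the sum can be rebuilt from the inverted carry through the identity $S = ABC_{in} + (A + B + C_{in})\,\overline{C_{out}}$, which is what lets the sum gate reuse the carry gate's output.

```python
from itertools import product

def majority(a: int, b: int, c: int) -> int:
    """Carry-out: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def xor3(a: int, b: int, c: int) -> int:
    """Sum: 1 when an odd number of inputs are 1."""
    return a ^ b ^ c

def sum_from_carry(a: int, b: int, c: int) -> int:
    """Sum rebuilt from the inverted carry: ABC + (A + B + C) * ~Cout."""
    not_cout = 1 - majority(a, b, c)
    return (a & b & c) | ((a | b | c) & not_cout)

for a, b, c in product((0, 1), repeat=3):
    assert 2 * majority(a, b, c) + xor3(a, b, c) == a + b + c  # it really adds
    assert sum_from_carry(a, b, c) == xor3(a, b, c)             # identity holds
print("all 8 input combinations check out")
```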
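Exploiting the Inversion Property
$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$
$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$
[Figure: even and odd full-adder cells alternating in a 4-bit ripple chain (inputs A0, B0 to A3, B3; carries Ci,0 and Co,0 to Co,3; sums S0 to S3), so that no inverter is needed in the carry path]
• We saved the inverter, so $PE_{opt} = 4^{32}$.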
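A minimal Python sketch of the even/odd scheme, assuming an idealized cell (`inverting_cell`, a hypothetical name) that produces $\overline{S}$ and $\overline{C_o}$ directly, with no output inverters. Even bit positions take true inputs and pass on an inverted carry; odd positions invert $A$ and $B$ so that, by the inversion property, they emit a true carry again. Only the sum of the even cells needs a local inverter, which is off the carry path. The LSB-first bit order and all names are illustrative assumptions.

```python
from itertools import product

def full_adder(a, b, c):
    """Reference full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def inverting_cell(a, b, c):
    """Hypothetical cell without output inverters: returns (~S, ~Co)."""
    s, co = full_adder(a, b, c)
    return 1 - s, 1 - co

# Inversion property: inverting all inputs inverts both outputs.
for a, b, c in product((0, 1), repeat=3):
    s, co = full_adder(a, b, c)
    assert full_adder(1 - a, 1 - b, 1 - c) == (1 - s, 1 - co)

def ripple_no_carry_inverters(a_bits, b_bits, c_in):
    """LSB-first ripple adder with no inverter between carry stages."""
    sums, carry = [], c_in                 # carry enters bit 0 in true polarity
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if i % 2 == 0:                     # even cell: true A, B, C in
            s_n, carry = inverting_cell(a, b, carry)   # carry leaves inverted
            sums.append(1 - s_n)           # sum inverter sits off the carry path
        else:                              # odd cell: ~A, ~B and the inverted carry in
            s, carry = inverting_cell(1 - a, 1 - b, carry)
            sums.append(s)                 # ~S(~A,~B,~C) = S(A,B,C); carry is true again
    # If the last cell was an even cell, its carry is still inverted
    c_out = carry if len(a_bits) % 2 == 0 else 1 - carry
    return sums, c_out

def add_ints(a, b, c_in, n):
    """Convenience wrapper: n-bit integers in, (sum, carry_out) out."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    sums, c_out = ripple_no_carry_inverters(a_bits, b_bits, c_in)
    return sum(v << i for i, v in enumerate(sums)), c_out

print(add_ints(11, 6, 0, 4))  # (1, 1): 11 + 6 = 17
```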
Sizing the Mirror Adder
• Problem: How can we make a high-speed bit-slice layout?
• If we upsize each stage according to logical effort, the bit slices will no longer be identical.
• Such upsizing will also result in huge gates.
• Why not design the adder to inherently achieve the optimal stage effort ($EF_{opt} = 4$)?
• Assume everything not on the carry path can be sized like a minimum inverter!
[Figure: mirror adder schematic with transistor widths; the gates that are not on the critical (carry) path are sized like minimum inverters]
Sizing the Mirror Adder
• Now, let’s try to size the first stage to get EF=4:
• Remember, logical effort is a function of gate topology and not sizing!
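• Therefore, we can temporarily size the first stage for the drive strength of a minimum-sized inverter, giving us:
$LE_{Cin} = \frac{4 + 2}{3} = 2$
• So to get EF = 4:
$EF_{FA,Cin} = LE_{Cin} \cdot \frac{C_{L,Cout}}{C_{Cin}} = 2 \cdot \frac{C_{L,Cout}}{C_{Cin}} = 4$
• But what is $C_{L,Cout}$?
[Figure: mirror adder with transistor widths, marking the carry input $C_{in}$ and the carry-out load $C_{L,Cout}$]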
Sizing the Mirror Adder
• What is CL,Cout?
• Obviously, we have the second stage…
• But don’t forget the next full adder!
• So CL,Cout is: CL ,Cout 6 CCin 6 9 CCin 21 CL ,Cout
2 CCin 21
• And now, we can find Cin using the EF constraint we found: CCin EF 4
4 4 4 4 6 4 4 4 4 6
6 6
14 4 4 6 14 4
4 6
7 2 3 7 2 3
2 2
3 3
Cin 2 2 2 2 3 2 2 2 2 3
CL,Cout Cin, i+1
15
AdamJune
Teman,
10, 2021
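The sizing condition is a single linear equation in $C_{Cin}$. Rearranging $EF = LE \cdot (C_{Cin} + K)/C_{Cin}$ gives $C_{Cin} = LE \cdot K/(EF - LE)$, where $K$ is the part of $C_{L,Cout}$ that does not scale with $C_{Cin}$ (here $6 + 6 + 9 = 21$). A small Python sketch with exact fractions; the function name is illustrative.

```python
from fractions import Fraction

def size_carry_input(le_cin, fixed_load, ef_target=4):
    """Solve EF = LE * (C + K) / C for C, i.e. C = LE * K / (EF - LE).

    le_cin:     logical effort of the carry input
    fixed_load: K, the part of C_L,Cout that does not depend on C_Cin
    """
    if ef_target <= le_cin:
        raise ValueError("target effort must exceed the logical effort")
    return Fraction(le_cin * fixed_load, ef_target - le_cin)

le_cin = Fraction(4 + 2, 3)              # carry-gate widths 4 + 2 vs. inverter input 3
c_cin = size_carry_input(le_cin, 6 + 6 + 9)
print(le_cin, c_cin)                     # 2 21
```

Using `Fraction` keeps the result exact, so the check $LE \cdot (C_{Cin} + 21)/C_{Cin} = 4$ holds without rounding.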
Subtraction
• To subtract in two's complement, just remember that:
$-x = \overline{x} + 1 \;\Rightarrow\; A - B = A + \overline{B} + 1$
• So, to subtract:
• Invert the subtrahend (B).
• Add a carry in to the first bit.
• Therefore, to provide an adder/subtractor:
• Add an XOR gate to each bit of the B input.
• Drive the other XOR input and the carry-in with the add/sub select signal (0 = add, 1 = subtract).
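A Python sketch of the adder/subtractor, modeling an n-bit datapath with integer masking: the select signal `sub` (an illustrative name) is XORed into every B bit and also used as the carry-in, so `sub = 1` computes $A + \overline{B} + 1 = A - B$.

```python
def add_sub(a: int, b: int, sub: int, n: int = 8) -> int:
    """n-bit adder/subtractor: each B bit passes through an XOR with `sub`,
    and `sub` is also the carry-in, so sub=1 computes A + ~B + 1 = A - B."""
    mask = (1 << n) - 1
    b_xor = b ^ (mask if sub else 0)   # the row of XOR gates on the B input
    return (a + b_xor + sub) & mask    # carry-in = sub; carry-out is dropped

def as_signed(x: int, n: int = 8) -> int:
    """Interpret an n-bit pattern as a two's-complement number."""
    return x - (1 << n) if x & (1 << (n - 1)) else x

print(add_sub(5, 9, sub=0))              # 14
print(as_signed(add_sub(5, 9, sub=1)))   # -4
```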
Faster Adders
Carry-Skip (Carry Bypass) Adder
M Sections of (N/M) Bits Each
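$t_{skip} = t_{p/g} + \left(\frac{N}{M} - 1\right) t_{carry} + (M - 1)\,t_{bypass} + t_{sum}$
[Figure: 16-bit carry-skip adder built from 4-bit blocks (bits 0–3, 4–7, 8–11, 12–15), each with a setup stage, a ripple carry chain and a bypass; $t_{setup}$ and $t_{bypass}$ are marked]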
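A functional Python sketch of carry-skip, assuming equal-size blocks: inside a block the carry ripples, and when every bit of the block propagates ($P_i = A_i \oplus B_i = 1$) the bypass multiplexer forwards the block's carry-in directly. The bypass never changes the result, only the path (and delay) the carry takes; the `assert` inside the loop checks exactly that. Names and default sizes are illustrative.

```python
def carry_skip_add(a: int, b: int, c_in: int = 0, n: int = 16, block: int = 4):
    """Functional model of an n-bit carry-skip adder; returns (sum, carry_out)."""
    s, carry = 0, c_in
    for lo in range(0, n, block):
        block_c_in = carry
        all_propagate = True
        for i in range(lo, min(lo + block, n)):
            a_i, b_i = (a >> i) & 1, (b >> i) & 1
            p, g = a_i ^ b_i, a_i & b_i
            s |= (p ^ carry) << i          # S_i = P_i XOR C_i
            carry = g | (p & carry)        # ripple: C_{i+1} = G_i + P_i C_i
            all_propagate = all_propagate and p == 1
        if all_propagate:
            assert carry == block_c_in     # the ripple result equals the skipped value
            carry = block_c_in             # bypass mux: take the fast path
    return s, carry

print(carry_skip_add(0x0FFF, 0x0001))      # (4096, 0)
```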
Carry-Select Adder
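[Figure: 16-bit carry-select adder built from 4-bit blocks (bits 0–3, 4–7, 8–11, 12–15), each with a setup stage]
• Idea: "guess" the answer, i.e., compute each block's sum for each value of the carry-in, then select the right one when the actual carry arrives.
• For an N-bit input with M carry-select (CSA) blocks:
$t_{select} = t_{p/g} + \frac{N}{M}\,t_{carry} + M\,t_{mux} + t_{sum}$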
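A functional Python sketch of carry-select, assuming equal-size blocks: each block is ripple-added twice, once assuming carry-in 0 and once assuming carry-in 1, and the incoming carry drives the sum and carry multiplexers. Function names and default sizes are illustrative.

```python
def ripple_block(a: int, b: int, c_in: int, lo: int, hi: int):
    """Ripple-add bit positions lo..hi-1; return (sum bits in place, carry-out)."""
    s, carry = 0, c_in
    for i in range(lo, hi):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        s |= (a_i ^ b_i ^ carry) << i
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
    return s, carry

def carry_select_add(a: int, b: int, c_in: int = 0, n: int = 16, block: int = 4):
    """Each block precomputes both outcomes; the incoming carry picks one."""
    s, carry = 0, c_in
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        s0, c0 = ripple_block(a, b, 0, lo, hi)   # result if carry-in = 0
        s1, c1 = ripple_block(a, b, 1, lo, hi)   # result if carry-in = 1
        s |= s1 if carry else s0                 # sum multiplexer
        carry = c1 if carry else c0              # carry multiplexer
    return s, carry

total, c = carry_select_add(0xABCD, 0x1234)
print(hex(total), c)  # 0xbe01 0
```

In hardware both `ripple_block` calls of every block run at the same time, so only the multiplexer chain is serial.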
Square Root Carry Select
[Figure: N-bit adder with inputs A0, B0 to AN-1, BN-1, generate and propagate signals (G0, G1, P0 to P3), carries Ci,0 to Co,3 and sums S0 to SN-1]
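The square-root carry-select refinement makes each block one bit longer than the previous one, so a block's precomputed carries become ready just as its select signal arrives. This Python sketch assumes a simple unit-delay model (one unit per rippled bit and one per multiplexer, with setup and sum delays ignored) and compares equal blocks with growing blocks; the model and the names `csel_delay` and `sqrt_blocks` are illustrative assumptions.

```python
def csel_delay(block_sizes, t_carry=1, t_mux=1):
    """Carry-out time of a carry-select adder under a unit-delay model.

    Every block computes both candidate carries in parallel, ready after
    size * t_carry. The first block simply ripples; for later blocks the
    mux output is ready once both the select and the candidates arrive.
    """
    t = block_sizes[0] * t_carry
    for m in block_sizes[1:]:
        t = max(t, m * t_carry) + t_mux
    return t

def sqrt_blocks(n, first=2):
    """Block sizes first, first+1, first+2, ... (last one trimmed) summing to n."""
    sizes, m = [], first
    while sum(sizes) < n:
        sizes.append(min(m, n - sum(sizes)))
        m += 1
    return sizes

growing = sqrt_blocks(64)
print(csel_delay([8] * 8), growing, csel_delay(growing))
# 15 [2, 3, 4, 5, 6, 7, 8, 9, 10, 10] 12
```

Under this model the number of blocks, and hence the delay, grows roughly as $\sqrt{2N}$ for large N.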
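Tree Adders (Logarithmic CLA)
$G_i = A_i B_i, \quad P_i = A_i \oplus B_i, \quad S = P \oplus C_{in}, \quad C_{out} = G + P\,C_{in}$
• Can we reduce the complexity of calculating $P_i$, $G_i$? Combine adjacent bits into group signals:
$P_{1:0} = P_1 P_0, \quad G_{1:0} = G_1 + P_1 G_0, \quad C_{out,1} = G_{1:0} + P_{1:0}\,C_{in,0}$
$P_{3:2} = P_3 P_2, \quad G_{3:2} = G_3 + P_3 G_2, \quad C_{out,3} = G_{3:2} + P_{3:2}\,C_{in,2}$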
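The group equations generalize: any two adjacent spans combine as $(G, P)_{hi:lo} = (G_{hi} + P_{hi} G_{lo},\ P_{hi} P_{lo})$, and because this operator is associative all carries can be computed by a prefix tree with about $\log_2 N$ levels. The Python sketch below uses the Kogge-Stone arrangement as one example of such a tree (other tree topologies exist) and folds the carry-in in as an extra generate below bit 0. Names are illustrative.

```python
def kogge_stone_add(a: int, b: int, c_in: int = 0, n: int = 16):
    """Prefix-tree adder sketch; returns (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # G_i = A_i B_i
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]   # P_i = A_i XOR B_i
    # Position 0 holds the carry-in as (G = c_in, P = 0); bit i sits at position i+1
    G, P = [c_in] + g, [0] + p
    dist = 1
    while dist < n + 1:                  # one tree level per doubling of the span
        G_new, P_new = G[:], P[:]
        for k in range(dist, n + 1):
            G_new[k] = G[k] | (P[k] & G[k - dist])   # G_hi + P_hi G_lo
            P_new[k] = P[k] & P[k - dist]            # P_hi P_lo
        G, P = G_new, P_new
        dist *= 2
    # G[i] now spans positions 0..i, i.e. it is the carry into bit i
    s = sum((p[i] ^ G[i]) << i for i in range(n))    # S_i = P_i XOR C_i
    return s, G[n]

total, c = kogge_stone_add(0xABCD, 0x1234)
print(hex(total), c)  # 0xbe01 0
```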
Manchester Carry-Chain Adder
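[Figure: Manchester carry chain controlled by propagate signals P0 to P3 and generate signals G0 to G3, from the input carry Ci,0 to C3]
$t_p = 0.69 \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j = 0.69\,\frac{N(N+1)}{2}\,RC, \quad \text{where } R_j = R,\ C_i = C$
[Figure: static-circuit Manchester carry cell (inputs $P_i$, $G_i$, $C_i$; output $C_o$) and a chain layout with a propagate/generate row feeding carries C0 to C3]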
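The closed form is the Elmore sum for a uniform RC ladder. This Python sketch evaluates the double sum directly for arbitrary per-node $R_j$ and $C_i$ and compares it with $0.69\,N(N+1)RC/2$ for a few chain lengths; the unit values and the function name are illustrative. The quadratic growth in N is the usual reason long Manchester chains are kept short or buffered.

```python
import math

def elmore_chain_delay(r, c):
    """0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC ladder driven at its input."""
    total, r_upstream = 0.0, 0.0
    for r_j, c_i in zip(r, c):
        r_upstream += r_j            # resistance between the source and node i
        total += c_i * r_upstream
    return 0.69 * total

R, C = 1.0, 1.0                      # illustrative unit values
for N in (4, 8, 16):
    exact = elmore_chain_delay([R] * N, [C] * N)
    closed = 0.69 * N * (N + 1) / 2 * R * C
    assert math.isclose(exact, closed)
    print(N, round(exact, 2))
```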
Grade School Multiplication
1234 × 12
Multiplication using serial addition
     101010     multiplicand (42)
×      1011     multiplier (11)
     101010     partial products
    101010
   000000
+ 101010
  111001110     result (462)
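The partial products of the example can be generated mechanically: row $i$ is the multiplicand if multiplier bit $i$ is 1 (and zero otherwise), shifted left by $i$. A short Python check (the function name is illustrative):

```python
def partial_products(multiplicand: int, multiplier: int, n_bits: int) -> list[int]:
    """Row i = multiplicand AND multiplier bit i, shifted left by i."""
    return [(multiplicand if (multiplier >> i) & 1 else 0) << i for i in range(n_bits)]

rows = partial_products(0b101010, 0b1011, 4)
print([bin(r) for r in rows])  # ['0b101010', '0b1010100', '0b0', '0b101010000']
print(bin(sum(rows)))          # 0b111001110
```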
Binary Multiplication
[Figure: N-bit multiplicand × N-bit multiplier; the N × N array of partial-product bits can be formed in parallel and sums to a 2N-bit product]
Serial Shift and Add
• Concept:
• Multiplying by ‘1’ is copying the multiplicand
• Multiplying by ‘0’ is a row of zeros
• Select multiplicand or zeros
according to multiplier bit
• Add to result
• Shift multiplier and accumulated result
$t_{serial} = O(N \cdot t_{adder}) = O(N^2)$ for a ripple-carry adder (RCA)
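A Python sketch of the sequential shift-and-add loop, assuming unsigned n-bit operands and a common register arrangement in which the multiplier sits in the low half of a 2n-bit product register and is shifted out one bit per cycle. Each of the n iterations performs one n-bit addition, which is where $O(N \cdot t_{adder})$ comes from. The register arrangement and names are assumptions for illustration.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Sequential shift-and-add multiply of two unsigned n-bit numbers."""
    acc = 0            # upper half of the product register (plus carry)
    q = multiplier     # lower half; multiplier bits leave from the LSB
    for _ in range(n):
        if q & 1:                        # multiplier bit selects multiplicand or 0
            acc += multiplicand          # one n-bit addition per cycle
        # shift the (acc, q) pair right by one bit
        q = (q >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | q

print(shift_add_multiply(0b101010, 0b1011, 6))  # 462
```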
Array Multiplier
• Calculate the final product in a single combinational calculation (potentially in one clock cycle)
Array Multiplier Implementation
• Stack 2-input Adders:
[Figure: 4 × 4 array multiplier; each row ANDs X3 to X0 with one multiplier bit (Y0 to Y3) and adds the result to the previous row through a row of half adders (HA) and full adders (FA); product bits Z0 to Z7]
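A bit-level Python sketch of the array structure, assuming unsigned n-bit operands. Each row ANDs the multiplicand bits with one multiplier bit and adds them to the running sum through a ripple row of adder cells; a cell whose carry-in is 0 plays the role of a half adder. The lowest bit of each row is a finished product bit (Z0, Z1, and so on). Names are illustrative, and the exact HA/FA placement of the drawn array may differ.

```python
def full_add(a: int, b: int, c: int):
    """One adder cell: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def array_multiply(x: int, y: int, n: int = 4) -> int:
    """n x n unsigned array multiplier built from rows of adder cells."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    z = []                                    # finished product bits, LSB first
    acc = [xb[i] & yb[0] for i in range(n)]   # row 0 needs no adders
    top = 0                                   # carry out of the previous row
    for j in range(1, n):
        z.append(acc[0])                      # lowest bit of the running sum is final
        upper = acc[1:] + [top]               # the rest moves down one position
        carry, acc = 0, []
        for i in range(n):
            s, carry = full_add(upper[i], xb[i] & yb[j], carry)
            acc.append(s)
        top = carry
    z.extend(acc + [top])                     # last row supplies the top n+1 bits
    return sum(bit << k for k, bit in enumerate(z))

print(array_multiply(13, 11))  # 143
```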
Many Critical Paths
$t_{mult} = t_{AND} + [(M - 1) + (N - 2)]\,t_{carry} + (N - 1)\,t_{sum} = O(N + M)$
Can we do it better?
From $O(N)$ to $O(\log_2 N)$
Multiplier Floorplan
[Figure: 4 × 4 multiplier floorplan; half-adder (HA) and full-adder (FA) multiplier cells with carry (C) and sum (S) outputs, driven by X3 to X0 and Y0 to Y3, followed by a row of vector-merging cells; product bits Z0 to Z7]
• X and Y signals are broadcast through the complete array.
Faster Multipliers
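Booth Recoding
$\sum_{i=0}^{n-1} 2^i = 2^n - 1$
• A string of $n$ ones needs $n$ additions, but it can be replaced by one addition (of $2^n$) and one subtraction (of 1), each shifted to the string's position.
• Multiplying by ‘0’ is redundant.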
Radix-2 Booth Recoding
• Parse multiplier from left to right
• For each change from 0 to 1, encode a ‘1’
• For each change from 1 to 0, encode a ‘-1’
• For bit 0, assume that bit $i = -1$ is 0
• Example: 0011 0111 0011 = 0x373
  0  1  0 -1  1  0  0 -1  0  1  0 -1   (recoded)
  0  1  0  0  1  0  0  0  0  1  0  0   = 0x484  (+1 digits)
  0  0  0  1  0  0  0  1  0  0  0  1   = 0x111  (-1 digits)
  0x484 - 0x111 = 0x373
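The recoding rule can be written as $d_i = b_{i-1} - b_i$ with $b_{-1} = 0$: reading left to right, a 0→1 change gives +1 and a 1→0 change gives -1. This Python sketch (names illustrative) reproduces the example, splitting the digits into the +1 and -1 parts.

```python
def booth_radix2(m: int, n: int) -> list[int]:
    """Radix-2 Booth digits of an n-bit multiplier m, MSB first.

    d_i = b_{i-1} - b_i with b_{-1} = 0. sum(d_i * 2**i) equals m read as
    an n-bit two's-complement number.
    """
    bits = [(m >> i) & 1 for i in range(n)]      # b_0 .. b_{n-1}
    prev = [0] + bits[:-1]                       # b_{-1} .. b_{n-2}
    digits = [p - b for p, b in zip(prev, bits)]
    return digits[::-1]

d = booth_radix2(0x373, 12)
plus = sum(1 << (11 - k) for k, v in enumerate(d) if v == 1)
minus = sum(1 << (11 - k) for k, v in enumerate(d) if v == -1)
print(d)                                         # [0, 1, 0, -1, 1, 0, 0, -1, 0, 1, 0, -1]
print(hex(plus), hex(minus), hex(plus - minus))  # 0x484 0x111 0x373
```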
Modified (Radix-4) Booth Recoding
• Radix-2 Booth recoding doesn't work well for parallel hardware implementations:
• In the worst case (010101010101010), it doesn't reduce the number of partial products.
• Variable-length recoders (matched to the length of the ‘1’ strings) cannot be implemented efficiently.
• Instead, just assume a constant-length recoder:
• First apply standard Booth recoding.
• Next, encode each pair of bits.
• This can be summarized in a truth table:
Partial Product Selection Table
Multiplier bits ($b_{2k+1} b_{2k} b_{2k-1}$)   Recoded bits
000   0
001   + Multiplicand
010   + Multiplicand
011   +2 × Multiplicand
100   -2 × Multiplicand
101   - Multiplicand
110   - Multiplicand
111   0
Modified (Radix-4) Booth Recoding
• For example, let’s take our previous example:
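• 0011 0111 0011 → radix-2 digits grouped in pairs: (0 1)(0 -1)(1 0)(0 -1)(0 1)(0 -1)
• Each pair $(d_{2k+1}, d_{2k})$ becomes $2d_{2k+1} + d_{2k}$, so this comes out: 1 -1 2 -1 1 -1.
• We could have done this by using the table on overlapping 3-bit groups of 001101110011 (with bit $-1 = 0$ appended on the right).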
Source:
[Figure: block diagram ending in a CLA that produces the Result]
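A Python sketch of the partial-product selection table, assuming an even-length multiplier with $b_{-1} = 0$: each overlapping triplet $(b_{2k+1}, b_{2k}, b_{2k-1})$ picks a multiple in $\{0, \pm 1, \pm 2\}$ of the multiplicand, so an n-bit multiplier yields only n/2 partial products. The table and function names are illustrative.

```python
# (b_{2k+1}, b_{2k}, b_{2k-1}) -> multiple of the multiplicand
BOOTH4 = {
    (0, 0, 0): 0, (0, 0, 1): +1, (0, 1, 0): +1, (0, 1, 1): +2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_radix4(m: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit multiplier (n even), MSB first."""
    bits = [0] + [(m >> i) & 1 for i in range(n)]   # bits[0] is b_{-1} = 0
    digits = []
    for k in range(n // 2):
        # b_{2k+1}, b_{2k}, b_{2k-1} sit at list indices 2k+2, 2k+1, 2k
        digits.append(BOOTH4[(bits[2 * k + 2], bits[2 * k + 1], bits[2 * k])])
    return digits[::-1]

def booth4_multiply(x: int, m: int, n: int) -> int:
    """Sum of the n/2 selected partial products d_k * x * 4**k."""
    d = booth_radix4(m, n)[::-1]                  # LSB first
    return sum(dk * x * 4 ** k for k, dk in enumerate(d))

print(booth_radix4(0x373, 12))        # [1, -1, 2, -1, 1, -1]
print(booth4_multiply(3, 0x373, 12))  # 2649
```

As with radix-2 recoding, the digits represent the multiplier as a two's-complement number, so the same table also handles signed multipliers.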
Wallace-Tree Multiplier
[Figure: six bits y0 to y5 of one column reduced by full adders (FA) to a carry (C) and a sum (S), in two arrangements; carries Ci-1 enter from the previous column and carries Ci leave to the next]
Wallace-Tree Multiplier
Wallace-Tree Multiplier
[Figure: Wallace-tree reduction of a 4 × 4 multiplier in dot notation over bit positions 0 to 6: (a) partial products, (b) first stage, (c) and (d) later stages, using full adders (FA) and half adders (HA)]
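A bit-level Python sketch of Wallace-style reduction for unsigned operands: partial-product bits are kept per column, and in every stage each column is reduced in parallel by full adders (three bits in, sum stays in the column, carry moves to the next column) and half adders, until no column holds more than two bits; a final carry-propagate addition then finishes the product. This is one simple greedy schedule, not necessarily the exact one drawn on the slide; names are illustrative.

```python
def wallace_multiply(x: int, y: int, n: int = 4):
    """Return (product, number_of_reduction_stages) for unsigned n-bit x, y."""
    width = 2 * n
    cols = [[] for _ in range(width)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    stages = 0
    while max(len(c) for c in cols) > 2:
        new = [[] for _ in range(width + 1)]
        for pos, bits in enumerate(cols):
            k = 0
            while len(bits) - k >= 3:                 # full adder (3:2)
                a, b, c = bits[k:k + 3]
                new[pos].append(a ^ b ^ c)
                new[pos + 1].append((a & b) | (a & c) | (b & c))
                k += 3
            if len(bits) - k == 2:                    # half adder
                a, b = bits[k:k + 2]
                new[pos].append(a ^ b)
                new[pos + 1].append(a & b)
            elif len(bits) - k == 1:                  # pass through
                new[pos].append(bits[k])
        # The product fits in 2n bits, so nothing nonzero reaches column 2n
        cols = new[:width]
        stages += 1
    # Final carry-propagate addition of the (at most) two remaining rows
    row0 = sum(c[0] << p for p, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << p for p, c in enumerate(cols) if len(c) > 1)
    return row0 + row1, stages

print(wallace_multiply(13, 11))  # (143, 2)
```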
Pipelining Multipliers
• Pipelining can be applied to most multiplier structures:
Further Reading
• Rabaey, et al. “Digital Integrated Circuits” (2nd Edition)
• Elad Alon, Berkeley ee141 (online)
• Weste, Harris, “CMOS VLSI Design” (4th Edition)