Lecture 10 Arithmetic Circuits 2021

This lecture discusses arithmetic circuits such as adders. It explains how multi-bit addition can be performed using a ripple carry adder made up of full adders in series. However, this design has long propagation delays that scale with the number of bits. An alternative approach exploits an inversion property to reduce the propagation delay by eliminating some inverters in the design. This results in a more efficient adder circuit.


Digital Integrated Circuits

(83-313)

Lecture 10:
Arithmetic Circuits
Prof. Adam Teman
10 June 2021

Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email [email protected] and I will address this as soon as possible.
Lecture Content

DataPaths

Multiple functional units
• A complex processor may have multiple functional units working in parallel:

Source: Kuchuk, 2003


Bit-Sliced Design
[Figure: bit-sliced datapath with identical slices for bit 0 through bit 63. Blocks shown: control, registers, adder, shifter, multiplexer, with data in and data out. Source: Fetzer, Orton, ISSCC’02]
• Tile identical Processor Elements
• Design for energy efficiency!
Basic Addition

Serial Adder Concept
• At time i, read $a_i$ and $b_i$. Produce $s_i$ and $c_{i+1}$.
• Internal state stores $c_i$. Carry bit $c_0$ is set as $c_{in}$.
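The serial adder concept above can be sketched in a few lines. This is only an illustrative model: the helper names `serial_add`, `to_bits`, and `from_bits` are made up for the example, and bits are assumed to arrive LSB first, one pair per time step.

```python
def serial_add(a_bits, b_bits, c_in=0):
    """Add two LSB-first bit lists, consuming one bit pair per time step."""
    carry = c_in                                 # internal state: c_0 = c_in
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ carry)             # s_i
        carry = (a & b) | (carry & (a ^ b))      # c_{i+1}, stored for the next step
    return s_bits, carry


def to_bits(value, width):
    """Split an integer into `width` bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


s, c = serial_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c)  # 11 + 6 = 17: the 4 sum bits hold 1 and the carry out is 1
```

The single `carry` variable plays the role of the one-bit state register, which is why the hardware cost stays constant while the latency grows with the word length.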

Source: Gate Overflow


Basic Addition Unit – Full Adder
$S = x \oplus y \oplus C_{in} = P \oplus C_{in}$
$C_{out} = xy + xC_{in} + yC_{in} = G + P \cdot C_{in}$
Kill: $K = \bar{x} \cdot \bar{y}$
Generate: $G = x \cdot y$
Propagate: $P = x \oplus y$
$C_{out} = \mathrm{MAJ}(X, Y, C_{in})$

X Y Cin S Cout
0 0 0   0 0
0 0 1   1 0
0 1 0   1 0
0 1 1   0 1
1 0 0   1 0
1 0 1   0 1
1 1 0   0 1
1 1 1   1 1
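As a quick sanity check of the truth table, the following sketch evaluates the generate/propagate form of the full adder for all eight input combinations and compares the carry with a majority vote. The function name `full_adder` and the returned tuple layout are choices made for this example only.

```python
from itertools import product


def full_adder(x, y, c_in):
    """Return (S, Cout, (G, P, K)) for single-bit inputs."""
    g = x & y                  # generate
    p = x ^ y                  # propagate
    k = (1 - x) & (1 - y)      # kill
    s = p ^ c_in               # S = P xor Cin
    c_out = g | (p & c_in)     # Cout = G + P*Cin
    return s, c_out, (g, p, k)


for x, y, c_in in product((0, 1), repeat=3):
    s, c_out, _ = full_adder(x, y, c_in)
    assert c_out == int(x + y + c_in >= 2)     # Cout is the majority of the inputs
    assert 2 * c_out + s == x + y + c_in       # (Cout, S) is the 2-bit sum
    print(x, y, c_in, s, c_out)
```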

Full-Adder Implementation
• A full-adder is therefore a majority gate and a 3-input XOR:

Total: 32 Transistors

Source: CMOS VLSI Design
Ripple Carry Adder
• The Cout output of the Full Adder is clearly on the critical path.
• Can we exploit this to improve the design?

$S = A \oplus B \oplus C_{in} = ABC_{in} + (A + B + C_{in}) \cdot \overline{C_{out}}$

$t_{adder} = (N-1)\,t_{carry} + t_{sum}$, so $t_{pd} = O(N)$
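A minimal behavioural sketch of the ripple carry adder and its delay model follows. The function names and the unit values for `t_carry` and `t_sum` are assumptions for illustration, not figures from the lecture.

```python
def ripple_carry_add(a, b, n, c_in=0):
    """Add two n-bit integers bit by bit, rippling the carry from LSB to MSB."""
    carry, total = c_in, 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (x & carry) | (y & carry)
    return total, carry


def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """t_adder = (N - 1) * t_carry + t_sum, in arbitrary delay units."""
    return (n - 1) * t_carry + t_sum


print(ripple_carry_add(0b1011, 0b0110, 4))          # (1, 1): 11 + 6 = 17
print([ripple_delay(n) for n in (8, 16, 32, 64)])   # [8.0, 16.0, 32.0, 64.0]
```

Doubling the word length doubles the modelled delay, which is the linear scaling the formula expresses.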


Source: CMOS VLSI Design
Full-Adder Implementation
$S = ABC_{in} + (A + B + C_{in}) \cdot \overline{C_{out}}$
$C_{out} = AB + AC_i + BC_i$, with $G = A \cdot B$ and $P = A \oplus B$
[Figure: two transistor-level full-adder schematics with device widths annotated, one with 28 transistors and one with 24 transistors]
Logical effort of the $C_i$ input, as annotated on the two schematics:
$LE_{C_i} = (2 + 4 + 2 + 4 + 6 + 3)/3 = 7$
$LE_{C_i} = (2 + 4 + 2 + 3 + 12 + 4)/3 = 9$
…BUT ~64 stages to propagate, i.e., $PE_{opt} = 4^{64}$
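Exploiting the Inversion Property
$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C_i})$
$\bar{C_o}(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C_i})$
[Figure: a full adder with inverted inputs and outputs next to a plain full adder; below it, a four-bit chain of FA cells (inputs A0..A3 and B0..B3, carries Ci,0 and Co,0..Co,3, sums S0..S3) that alternates even and odd cells]
We saved the inverter, so $PE_{opt} = 4^{32}$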
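The inversion property is easy to confirm exhaustively: inverting all three inputs of a full adder inverts both outputs. The sketch below uses illustrative names (`full_adder`, `inv`, `inversion_property_holds`) that are not part of the lecture.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (S, Co) for single-bit inputs."""
    s = a ^ b ^ c
    c_out = (a & b) | (a & c) | (b & c)
    return s, c_out


def inv(bit):
    """Logical inversion of a single bit."""
    return 1 - bit


def inversion_property_holds():
    """Check S and Co against the adder fed with inverted inputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        s_inv, c_out_inv = full_adder(inv(a), inv(b), inv(c))
        if inv(s) != s_inv or inv(c_out) != c_out_inv:
            return False
    return True


print(inversion_property_holds())  # True
```

This is what lets alternating cells in the chain work on inverted carries, so the inverter after each carry stage can be dropped.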
Sizing the Mirror Adder
• Problem: How can we make a high speed bitslice layout?
• If we upsize each stage according to Logical Effort,
we will have non-identical bitslices.
• Such upsizing will result in huge gates.
• Why not design the adder to inherently achieve optimal Electrical Effort (EFopt=4)?
• Assume everything not on the carry path can be sized like a minimum inverter!

[Figure: mirror adder schematic with transistor widths annotated; the devices outside the carry path are marked “Not on the critical path!”]
Sizing the Mirror Adder
• Now, let’s try to size the first stage to get EF=4:
• Remember, logical effort is a function of gate topology and not sizing!
• Therefore, we can temporarily size the first stage as a minimum sized inverter, giving us:
$LE_{C_{in}} = \frac{4 + 2}{3} = 2$
• So to get EF=4:
$EF_{FA,C_{in}} = LE_{C_{in}} \cdot \frac{C_{L,C_{out}}}{C_{C_{in}}} = 2 \cdot \frac{C_{L,C_{out}}}{C_{C_{in}}} = 4$
• But what is $C_{L,C_{out}}$?
[Figure: mirror adder schematic with transistor widths annotated; Cin and CL,Cout are marked on the carry path]
Sizing the Mirror Adder
• What is $C_{L,C_{out}}$?
• Obviously, we have the second stage…
• But don’t forget the next full adder!
• So $C_{L,C_{out}}$ is:
$C_{L,C_{out}} = 6 + C_{C_{in}} + 6 + 9 = C_{C_{in}} + 21$
• And now, we can find $C_{in}$ using the EF constraint we found:
$EF = 2 \cdot \frac{C_{L,C_{out}}}{C_{C_{in}}} = 2 \cdot \frac{C_{C_{in}} + 21}{C_{C_{in}}} = 4 \Rightarrow C_{C_{in}} = 21$
[Figure: two cascaded mirror adder slices; the carry-input devices are sized 14 and 7 (14 + 7 = 21), and CL,Cout of slice i drives Cin of slice i+1]
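The sizing step can be reproduced numerically. The sketch below solves the constraint EF = LE · (C_Cin + fixed load) / C_Cin for C_Cin, using the slide's values LE = 2, a fixed load of 21, and EF = 4. The function name `size_carry_input` and its parameter names are hypothetical.

```python
from fractions import Fraction


def size_carry_input(le_cin, fixed_load, target_ef):
    """Solve target_ef = le_cin * (c_cin + fixed_load) / c_cin for c_cin."""
    if target_ef <= le_cin:
        raise ValueError("target effort must exceed the logical effort")
    return Fraction(le_cin * fixed_load, target_ef - le_cin)


c_cin = size_carry_input(le_cin=2, fixed_load=21, target_ef=4)
print(c_cin)  # 21
```

Because the next slice's carry input is part of the load, the input capacitance appears on both sides of the equation, and a closed-form solution exists only when the target effort is larger than the logical effort.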
Subtraction
• To subtract in two’s complement, just remember that:
$-x = \bar{x} + 1 \Rightarrow A - B = A + \bar{B} + 1$
• So, to subtract:
• Invert one of the operands.
• Add a carry in to the first bit.
• Therefore, to provide an adder/subtractor:
• Add an XOR gate to the B-input
• Use the sub/add selector to the XOR and carry in.
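The adder/subtractor described above can be modelled as follows. This is a behavioural sketch for an assumed 8-bit word; `add_sub` and its `sub` flag are illustrative names for the sub/add selector.

```python
def add_sub(a, b, sub, n=8):
    """Return (result, carry_out) of a + b, or a - b when sub is 1, on n bits."""
    mask = (1 << n) - 1
    b_xor = (b ^ (mask if sub else 0)) & mask        # XOR gate on every B bit
    total = (a & mask) + b_xor + (1 if sub else 0)   # selector also drives carry in
    return total & mask, (total >> n) & 1


print(add_sub(9, 5, sub=0))  # (14, 0)
print(add_sub(9, 5, sub=1))  # (4, 1)
print(add_sub(5, 9, sub=1))  # (252, 0), i.e. -4 in 8-bit two's complement
```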

Faster Adders
Carry-Skip (Carry Bypass) Adder
M sections of (N/M) bits each
$t_{skip} = t_{p/g} + \left(\frac{N}{M} - 1\right) t_{carry} + (M - 1)\,t_{bypass} + t_{sum} = O\left(\frac{N}{M} + M\right)$
[Figure: 16-bit carry-skip adder built from four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15), each with setup, carry propagation, and sum stages; $t_{setup}$, $t_{bypass}$, and $t_{sum}$ are marked along the critical path]
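To see how the number of sections trades ripple delay against bypass delay, the carry-skip delay expression can be evaluated for several values of M. The sketch assumes unit delays for every term, which is an assumption made only for illustration; `carry_skip_delay` is a hypothetical helper name.

```python
def carry_skip_delay(n, m, t_pg=1.0, t_carry=1.0, t_bypass=1.0, t_sum=1.0):
    """Delay of an n-bit carry-skip adder split into m sections of n/m bits."""
    if n % m:
        raise ValueError("m must divide n")
    return t_pg + (n // m - 1) * t_carry + (m - 1) * t_bypass + t_sum


n = 64
delays = {m: carry_skip_delay(n, m) for m in (1, 2, 4, 8, 16, 32, 64)}
best_m = min(delays, key=delays.get)
print(delays)
print(best_m)  # 8 with unit delays, i.e. sqrt(64)
```

With equal carry and bypass delays the minimum falls at M = sqrt(N); unequal delays shift the optimum toward fewer or more sections.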

Carry-Select Adder
[Figure: 16-bit carry-select adder built from four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15). Each block has a setup stage, a 0-carry chain, a 1-carry chain, a multiplexer, and sum generation; the block carries Ci,0, Co,3, Co,7, Co,11, Co,15 drive the multiplexers, producing S0-3, S4-7, S8-11, S12-15]
Let’s guess the answer for each value of the carry.
For an N-bit input with M CSA blocks:
$t_{select} = t_{p/g} + \frac{N}{M}\,t_{carry} + M\,t_{mux} + t_{sum} = O\left(\frac{N}{M} + M\right)$
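A behavioural sketch of the carry-select idea: each block computes its result for both possible carry-in values, and the real carry only selects between them. The block size of 4 and the names `add_block` and `carry_select_add` are assumptions for the example; `add_block` uses integer addition as a stand-in for the ripple chain inside a block.

```python
def add_block(a, b, width, c_in):
    """Behavioural stand-in for one ripple block; returns (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n=16, block=4, c_in=0):
    """Add two n-bit integers block by block using precomputed candidates."""
    result, carry = 0, c_in
    mask = (1 << block) - 1
    for lo in range(0, n, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Both candidates are computed before the real carry is known.
        sum0, c0 = add_block(a_blk, b_blk, block, 0)
        sum1, c1 = add_block(a_blk, b_blk, block, 1)
        # The incoming carry only drives the multiplexer.
        chosen, carry = (sum1, c1) if carry else (sum0, c0)
        result |= chosen << lo
    return result, carry


print(carry_select_add(0xBEEF, 0x1234))  # (53539, 0), i.e. 0xD123
```

In hardware the two candidates of every block are evaluated in parallel, so only the multiplexer delay accumulates from block to block.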
Square Root Carry Select

$t_{sqrt} = t_{p/g} + M\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum} = O\left(\sqrt{2N}\right)$


Carry Lookahead Adder – Basic Idea
• Problem – $C_{out,k}$ takes approximately k gate delays to ripple.
• Question – can we calculate the carry without any ripple?
$G_i = A_i \cdot B_i$, $P_i = A_i \oplus B_i$
$C_{out,k} = f(A_k, B_k, C_{out,k-1}) = G_k + P_k \cdot C_{out,k-1}$
$C_{out,k} = G_k + P_k \cdot (G_{k-1} + P_{k-1} \cdot C_{out,k-2})$
$C_{out,k} = G_k + P_k \cdot (G_{k-1} + P_{k-1} \cdot (\cdots + P_1 \cdot (G_0 + P_0 \cdot C_{in,0})))$
[Figure: N-bit adder in which bit positions 0..N-1 receive Ai, Bi, a carry input Ci,i, and Pi, and produce S0..SN-1; next to it, a transistor-level gate that computes Co,3 from G0-G3, P0-P3, and Ci,0]
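The expanded carry expression can be evaluated directly, so that every carry is a flat function of the G and P signals and the input carry. The following sketch does this for small word lengths; `lookahead_carries` and `cla_add` are illustrative names, and the inner loop merely enumerates the product terms that a single wide gate would compute.

```python
def lookahead_carries(a, b, n, c_in=0):
    """Return ([C_out,0 .. C_out,n-1], P), each carry expanded in G, P, and c_in."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    carries = []
    for k in range(n):
        # C_out,k = G_k + P_k G_{k-1} + ... + P_k..P_1 G_0 + P_k..P_0 c_in
        c_out, prefix = 0, 1
        for j in range(k, -1, -1):
            c_out |= prefix & g[j]
            prefix &= p[j]
        carries.append(c_out | (prefix & c_in))
    return carries, p


def cla_add(a, b, n, c_in=0):
    """Add two n-bit integers using the lookahead carries; returns (sum, carry_out)."""
    carries, p = lookahead_carries(a, b, n, c_in)
    carry_in = [c_in] + carries[:-1]
    total = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return total, carries[-1]


print(cla_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17
```

The number of product terms grows with k, which is why a flat lookahead is practical only for a few bits.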
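Tree Adders (Logarithmic CLA)
$G_i = A_i \cdot B_i$, $P_i = A_i \oplus B_i$, $S = P \oplus C_{in}$, $C_{out} = G + P \cdot C_{in}$
• Can we reduce the complexity of calculating $P_i$, $G_i$?
$P_{1:0} = P_1 \cdot P_0$, $G_{1:0} = G_1 + P_1 \cdot G_0$ $\Rightarrow C_{out,1} = G_{1:0} + P_{1:0} C_{in,0}$
$P_{3:2} = P_3 \cdot P_2$, $G_{3:2} = G_3 + P_3 \cdot G_2$ $\Rightarrow C_{out,3} = G_{3:2} + P_{3:2} C_{in,2}$
$P_{3:0} = P_{3:2} \cdot P_{1:0}$, $G_{3:0} = G_{3:2} + P_{3:2} \cdot G_{1:0}$ $\Rightarrow C_{out,3} = G_{3:0} + P_{3:0} C_{in,0}$
$t_{tree} = t_{p/g} + \log_2 N \cdot t_{AND/OR} + t_{sum} = O(\log_2 N)$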
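The group generate/propagate merge can be applied repeatedly so that the span covered by each (G, P) pair doubles at every level. The sketch below uses one particular arrangement, in which every bit position is updated at every level (a Kogge-Stone style prefix network); that choice, and the names `combine` and `tree_add`, are assumptions for illustration rather than the structure shown in the lecture.

```python
def combine(high, low):
    """Merge (G, P) of an upper bit range with the range just below it."""
    g_hi, p_hi = high
    g_lo, p_lo = low
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def tree_add(a, b, n, c_in=0):
    """Return (sum, carry_out, levels) using a logarithmic-depth prefix network."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    gp = [(((a >> i) & (b >> i)) & 1, p[i]) for i in range(n)]
    span, levels = 1, 0
    while span < n:                      # one level of AND/OR logic per doubling
        gp = [combine(gp[i], gp[i - span]) if i >= span else gp[i]
              for i in range(n)]
        span *= 2
        levels += 1
    # gp[i] now holds (G_{i:0}, P_{i:0})
    carries = [g | (pp & c_in) for g, pp in gp]
    carry_in = [c_in] + carries[:-1]
    total = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return total, carries[-1], levels


print(tree_add(0xBEEF, 0x1234, 16))  # (53539, 0, 4)
```

For 16 bits the network needs 4 levels, matching the log2 N term in the delay expression.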
Tree Adders (Logarithmic CLA)
• Many ways to construct these CLA or tree adders, based on:
• Radix: How many bits combined in each gate
• Tree Depth: How many stages of logic to the final carry ($\geq \log_{radix} N$)
• Fanout: Maximal logic branching in tree

Manchester Carry-Chain Adder
$t_P = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right) = 0.69\,\frac{N(N+1)}{2}\,RC$, where $R_j = R$, $C_i = C$
[Figure: static and dynamic Manchester carry gates (inputs Pi, Gi, Ci; output Co), and a four-bit Manchester carry chain with P0-P3, G0-G3, Ci,0, and carry nodes C0-C3]
[Figure: layout of the carry chain for bits i and i+1 (Pi, Gi, Pi+1, Gi+1; carries Ci-1, Ci, Ci+1) between VDD and GND, with a Propagate/Generate row and an Inverter/Sum row]
The Computer Hall of Fame
• The home computer that 80s kids learned how to play games on and program with.
Source: http://www.gondolin.org.uk
• Introduced in Dec. 1982 for $595. Continued selling until 1992!
• 8-bit, 1 MHz, 64KB RAM, 16KB ROM
• Ran BASIC as its interface.
• The highest selling single computer model of all time.
• It has been compared to the Ford Model T for its role in bringing a new technology to middle-class households via creative and affordable mass-production. (Source: Wikipedia)
• Considered the computer that provided the foundation for the development of open-source software (freeware)
Basic Multiplication

Grade School Multiplication

1 2 3 4
X 1 2

Multiplication using serial addition
        1 0 1 0 1 0    Multiplicand
×           1 0 1 1    Multiplier
        1 0 1 0 1 0
      1 0 1 0 1 0      Partial
    0 0 0 0 0 0        Products
+ 1 0 1 0 1 0
  1 1 1 0 0 1 1 1 0    Result
Binary Multiplication
[Figure: an N-bit multiplicand and a multiplier generate an N-row partial product array, which can be formed in parallel; summing the array gives the 2N-bit double precision product]

Serial Shift and Add
• Concept:
• Multiplying by ‘1’ is copying the multiplicand
• Multiplying by ‘0’ is a row of zeros
• Select multiplicand or zeros
according to multiplier bit
• Add to result
• Shift multiplier and accumulated result

$t_{serial} = O(N \cdot t_{adder}) = O(N^2)$ for RCA
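The shift-and-add steps listed above can be sketched as a loop with one addition per multiplier bit. The function name and the register arrangement (adding into the upper half of a double-width accumulator, then shifting right) are one common choice, assumed here for illustration.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Multiply using n select/add/shift steps, one per multiplier bit."""
    acc = 0
    for _ in range(n):
        if multiplier & 1:               # multiplier bit selects multiplicand or zeros
            acc += multiplicand << n     # add into the upper half of the accumulator
        acc >>= 1                        # shift accumulated result
        multiplier >>= 1                 # shift multiplier
    return acc


print(shift_add_multiply(0b101010, 0b1011, 4))  # 462 = 0b111001110
```

Each of the N iterations contains one full-width addition, so with a ripple carry adder the total time grows quadratically with the word length.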

Array Multiplier
• Calculate the final product in
a single combinatorial calculation
(=potentially one cycle)

Array Multiplier Implementation
• Stack 2-input Adders:
[Figure: 4×4 array multiplier. Each row combines X3..X0 with one multiplier bit Y0..Y3; three rows of adders (HA FA FA HA, then FA FA FA HA twice) accumulate the partial products and produce Z0..Z7]
Many Critical Paths
$t_{mult} = t_{AND} + [(M - 1) + (N - 2)]\,t_{carry} + (N - 1)\,t_{sum} = O(N + M)$

Can we do it better?

Source: CMOS VLSI Design
Carry-Save Multiplier
$t_{mult} = t_{AND} + (N - 1)\,t_{carry} + t_{merge} = O(N + \log_2 N)$

Multiplier Floorplan
[Figure: floorplan of a 4×4 carry-save multiplier. Rows for Y0..Y3 hold cells with carry (C) and sum (S) outputs and produce Z0..Z2; a final row of vector merging cells produces Z3..Z7. Legend: HA multiplier cell, FA multiplier cell, vector merging cell]
X and Y signals are broadcast through the complete array.
Faster Multipliers
Booth Recoding
• Multiplying by ‘0’ is redundant.
• Can we reduce the number of partial products?
• Based on the observation that $\sum_{i=0}^{n-1} 2^i = 2^n - 1$
• We can turn sequences of 1’s into sequences of 0’s. For example: 0111=1000-0001
• So we can introduce a ‘-1’ bit and recode the multiplier:
• For example, the number 56 (0111000 = 1000000 - 0001000, i.e. 64 - 8)

Radix-2 Booth Recoding
• Parse multiplier from left to right
• For each change from 0 to 1, encode a ‘1’
• For each change from 1 to 0, encode a ‘-1’
• For bit 0, assume bit i=-1 is a 0
• Example: 0011 0111 0011 = 0x373

0 1 0 1 1 0 0 1 0 1 0 1

0 1 0 0 1 0 0 0 0 1 0 0 0x 484
0 0 0 1 0 0 0 1 0 0 0 1 0x 111
0x 373
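The radix-2 recoding rule can be written as digit i = (bit i-1) - (bit i), with bit -1 taken as 0. The sketch below applies it to the 0x373 example; `booth_radix2` is an illustrative name, and the multiplier is assumed to have a 0 in its most significant bit, as in the example.

```python
def booth_radix2(multiplier, n):
    """Return the signed digits, most significant first, with d_i = b_(i-1) - b_i."""
    bits = [(multiplier >> i) & 1 for i in range(n)]
    digits = []
    for i in range(n):
        below = bits[i - 1] if i > 0 else 0   # bit -1 is assumed to be 0
        digits.append(below - bits[i])
    return digits[::-1]


digits = booth_radix2(0x373, 12)
print(digits)  # [0, 1, 0, -1, 1, 0, 0, -1, 0, 1, 0, -1]

# Recombine the digits to confirm that the value is unchanged.
value = sum(d << i for i, d in enumerate(reversed(digits)))
print(hex(value))  # 0x373
```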

Modified (Radix-4) Booth Recoding
• Radix-2 Booth Recoding doesn’t work for parallel hardware implementations:
• A worst case (010101010101010) doesn’t reduce the number of partial products.
• Variable length recoders (according to the length of ‘1’ strings) cannot be implemented efficiently.
• Instead, just assume a constant length recoder.
• First apply standard Booth recoding.
• Next encode each pair of bits:
• This can be summarized in a truth table:

Partial Product Selection Table
Multiplier Bits   Recoded Bits
000               0
001               + Multiplicand
010               + Multiplicand
011               +2 × Multiplicand
100               -2 × Multiplicand
101               - Multiplicand
110               - Multiplicand
111               0
Modified (Radix-4) Booth Recoding
• For example, let’s take our previous example:
• 0011 0111 0011 = 01 0-1 10 0-1 01 0-1
• This comes out: 1 -1 2 -1 1 -1.
• We could have done this by using the table:
• 001101110011

• To implement this we need pretty simple hardware (source: CMOS VLSI Design):
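The table lookup for the example can be sketched as follows. Each radix-4 digit is selected by an overlapping group of three multiplier bits (bits 2k+1, 2k, and 2k-1, with bit -1 taken as 0). The names `SELECT`, `booth_radix4`, and `booth_multiply` are illustrative, and the multiplier is assumed to have an even width with a 0 in its most significant bit.

```python
# Overlapping triplet (b_2k+1, b_2k, b_2k-1) -> multiple of the multiplicand
SELECT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4(multiplier, n):
    """Return radix-4 digits, most significant first, for an even bit width n."""
    padded = multiplier << 1             # append the implicit bit -1 = 0
    digits = [SELECT[(padded >> (2 * k)) & 0b111] for k in range(n // 2)]
    return digits[::-1]


def booth_multiply(multiplicand, multiplier, n):
    """Sum one selected multiple of the multiplicand per radix-4 digit."""
    digits = booth_radix4(multiplier, n)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(reversed(digits)))


print(booth_radix4(0b001101110011, 12))  # [1, -1, 2, -1, 1, -1]
print(booth_multiply(42, 0x373, 12))     # 37086 = 42 * 883
```

A 12-bit multiplier yields 6 partial products instead of 12, and each one needs only a shift and an optional negation of the multiplicand.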


Tree Multipliers
• Can we further reduce the multiplier delay by employing logarithmic (tree) structures?
[Figure: partial products PP0..PP8 are summed by a tree of adders, and a final CLA produces the result]

Wallace-Tree Multiplier
[Figure: full-adder (FA) structures that reduce the inputs y0..y5 of one bit position to a carry (C) and sum (S) pair, with carries Ci-1 coming in from and Ci going out to the neighbouring bit positions]
Wallace-Tree Multiplier

Wallace-Tree Multiplier
[Figure: Wallace-tree reduction drawn over bit positions 6..0 in four panels: (a) partial products, (b) first stage, (c) second stage, (d) final adder. The legend marks full adders (FA) and half adders (HA)]
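The reduction idea can be sketched at word level: groups of three rows are compressed into a sum row and a carry row without propagating carries, stage after stage, until two rows remain for the final adder. This is a simplified model, not the bit-level Wallace tree of the figures; `carry_save_add` and `tree_multiply` are illustrative names, and Python's built-in addition stands in for the final carry-propagate adder.

```python
def carry_save_add(x, y, z):
    """3:2 compression: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def tree_multiply(multiplicand, multiplier, n):
    """Return (product, stages) using carry-save reduction of the partial products."""
    # One shifted partial product per multiplier bit.
    rows = [(multiplicand << i) if (multiplier >> i) & 1 else 0 for i in range(n)]
    stages = 0
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[len(rows) - len(rows) % 3:])  # leftovers pass through
        rows = next_rows
        stages += 1
    return sum(rows), stages   # the final sum stands in for the carry-propagate adder


print(tree_multiply(0b101010, 0b1011, 4))  # (462, 2)
```

Each stage removes roughly a third of the rows, so the number of stages grows logarithmically with the number of partial products.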

Pipelining Multipliers
• Pipelining can be applied to most multiplier structures:

Further Reading
• Rabaey et al., “Digital Integrated Circuits” (2nd Edition)
• Elad Alon, Berkeley EE141 (online)
• Weste, Harris, “CMOS VLSI Design” (4th Edition)
