
Adders

This document discusses adders and their design and implementation in digital circuits. It describes single-bit addition and different types of multi-bit adders including ripple carry adders, carry lookahead adders, and their operation.

Uploaded by

abhi1984_lucky
Copyright
© © All Rights Reserved

Adder

Fall 2008 EE 5323 - VLSI Design I - © Kia Bazargan 2
Introduction

• Digital computer arithmetic belongs to computer architecture;
however, it is also an aspect of logic design.
• The objective of computer arithmetic is to develop algorithms
that use the available hardware as efficiently as possible.
• Ultimately, speed, power and chip area are the most often used
measures, which creates a strong link between the algorithms and
the technology of implementation.

Oklobdzija 2004 Computer Arithmetic 3


Building Blocks for Digital Architectures

• Arithmetic unit
  - Bit-sliced data path: adder, multiplier, shifter, comparator, etc.
• Memory
  - RAM, ROM, buffers, shift registers
• Control
  - Finite state machine (PLA, random logic)
  - Counters
• Interconnect
  - Switches, arbiters, bus

17: Adders 4
Motivation
• Arithmetic units are, among other things, the core of every data path
and addressing unit.
• The data path is at the core of
  - microprocessors (CPU)
  - signal processors (DSP)
  - data processing application-specific ICs (ASIC) and
    programmable ICs (FPGA)
• Standard arithmetic units are available from libraries
• Design of arithmetic units is necessary for
  - non-standard operations
  - high performance components
  - library development

Why Adders?
• Addition: a fundamental operation
  - Basic block of most arithmetic operations
  - Address calculation
• Faster, faster and faster
• How?
  - Architectural level optimization
  - Gate-level optimization
  - Speed/area trade-off

• Single-bit Addition
• Carry-Ripple Adder
• Carry-Skip Adder
• Carry-Lookahead Adder
• Carry-Select Adder
• Carry-Increment Adder
• Tree Adder, etc.
Adding Two One-bit Operands
• One-bit Half Adder (HA): inputs A, B; outputs Sum, Cout
  $\mathrm{Sum} = A \oplus B$
  $C_{out} = A \cdot B$

  A B | Sum Cout
  0 0 |  0   0
  0 1 |  1   0
  1 0 |  1   0
  1 1 |  0   1

• One-bit Full Adder (FA): inputs A, B, Cin; outputs Sum, Cout
  $\mathrm{Sum} = A \oplus B \oplus C_{in}$
  $C_{out} = A \cdot B + B \cdot C_{in} + A \cdot C_{in}$

  Cin A B | Sum Cout
   0  0 0 |  0   0
   0  0 1 |  1   0
   0  1 0 |  1   0
   0  1 1 |  0   1
   1  0 0 |  1   0
   1  0 1 |  0   1
   1  1 0 |  0   1
   1  1 1 |  1   1


N-Bit Ripple-Carry Adder: Series of FA Cells
• To add two n-bit numbers
[Figure: chain of n FA cells. The cell at bit i takes Ai, Bi and the carry from the previous cell, and produces Si and the carry into the next cell; C0 enters the least significant cell and Cn leaves the most significant cell.]
• Note: adder delay = Tc * n
• Tc = Cin-to-Cout delay of one FA cell
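The chain of FA cells can be sketched in software. The following is a minimal behavioral model, not a gate-level netlist; the function names and the LSB-first bit-list convention are assumptions made for this illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each cell must wait for the carry of the previous cell,
        # which is where the Tc * n delay comes from.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Hypothetical helper: integer to n-bit list, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = ripple_carry_add(to_bits(3, 4), to_bits(5, 4))
print(from_bits(sum_bits), carry_out)
```

The loop visits the cells strictly in order, mirroring the serial dependence of each carry on the one before it.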
4-bit Ripple Carry Addition: Example
A = 0011, B = 0101, C0 = 0
[Figure: four FA cells in a chain with inputs A3 B3 ... A0 B0, carries C4 ... C0, and sums S3 ... S0.]

T   | C4 S3 C3 S2 C2 S1 C1 S0 | S
T=0 |  0  0  0  0  0  0  0  0 | 0000
T=1 |  0  0  0  1  0  1  1  0 | 0110
T=2 |  0  0  0  1  1  0  1  0 | 0100
T=3 |  0  0  1  0  1  0  1  0 | 0000
T=4 |  0  1  1  0  1  0  1  0 | 1000
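The settling behavior in this example can be reproduced with a small unit-delay model. This is a sketch under the assumption that every FA cell takes exactly one time step and that all carries start at 0; the function name and return format are illustrative only.

```python
def simulate_ripple(a, b, n=4, c0=0):
    """Unit-delay model of an n-bit ripple-carry adder.

    Returns one (sum_string, [C1..Cn]) entry per time step, starting at T=0.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    carries = [c0] + [0] * n          # carries[i] is the carry into bit i
    sums = [0] * n

    def snapshot():
        return "".join(str(s) for s in reversed(sums)), carries[1:]

    history = [snapshot()]
    for _ in range(n):
        old = carries                  # every cell sees last step's carries
        sums = [a_bits[i] ^ b_bits[i] ^ old[i] for i in range(n)]
        carries = [c0] + [
            (a_bits[i] & b_bits[i]) | (old[i] & (a_bits[i] ^ b_bits[i]))
            for i in range(n)
        ]
        history.append(snapshot())
    return history


for t, (s, c) in enumerate(simulate_ripple(0b0011, 0b0101)):
    print(f"T={t} S={s} C1..C4={c}")
```

With A = 0011 and B = 0101 the sum column passes through 0000, 0110, 0100, 0000 and settles at 1000, one carry hop per step.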


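CLA Definitions: One-bit adder
$g_i = a_i \cdot b_i$ (generate)
$p_i = a_i \oplus b_i$ (propagate)
$c_{i+1} = g_i + p_i c_i$
$s_i = p_i \oplus c_i$
[Figure: one-bit adder cell with inputs a_i, b_i and c_in, outputs s_i and c_out, and a multiplexer with inputs labeled 0 and 1 on the carry path.]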


CLA Definitions: 4-bit Adder
[Figure: four one-bit cells with inputs a_{i+3} b_{i+3} ... a_i b_i, carries C_{i+4} ... C_i, and per-bit outputs g_{i+3} p_{i+3} ... g_i p_i.]

$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$

$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i)$
$\quad\;\; = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$


Carry-Lookahead Adder: 4-bits
[Figure: four one-bit cells with inputs a_{i+3} b_{i+3} ... a_i b_i, carries C_{i+4} ... C_i, and per-bit outputs g_{i+3} p_{i+3} ... g_i p_i.]

$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i)$
$\quad\;\; = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$

$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = g_{i+3} + p_{i+3}(g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i)$
$\quad\;\; = \underbrace{g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i}_{G_j} + \underbrace{p_{i+3} p_{i+2} p_{i+1} p_i}_{P_j}\, c_i$
Carry-Lookahead Adder
$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$
$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$
$c_{4(j+1)} = G_j + P_j c_{4j}$ (group j covers bits i = 4j to 4j+3)

[Figure: 4-bit group j. The per-bit g, p signals feed a P, G group block that produces G_j and P_j together with the internal carries C_{4j+1}, C_{4j+2}, C_{4j+3}; C_in enters the group.]

• One gate delay Δ to calculate p, g
• One Δ to calculate P and two for G
• Three gate delays to calculate $c_{4(j+1)}$
• Compare that to 8Δ in RCA!
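The lookahead equations can be checked directly in software. Below is a minimal sketch of one 4-bit group, with bit 0 standing in for position i; the function name and return layout are assumptions for this illustration, and the bitwise `|` and `&` stand in for the OR and AND gates.

```python
def cla_group(a_bits, b_bits, c_in):
    """One 4-bit carry-lookahead group; bit lists are LSB first.

    Returns (sum_bits, c4, G, P).
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate

    # Every carry is written in terms of g, p and c_in only (no rippling).
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)

    # Group generate and propagate, then the group carry-out.
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    c4 = G | (P & c_in)

    carries = [c_in, c1, c2, c3]
    sum_bits = [p[i] ^ carries[i] for i in range(4)]
    return sum_bits, c4, G, P


a, b = 0b1010, 0b1101
bits, c4, G, P = cla_group([(a >> i) & 1 for i in range(4)],
                           [(b >> i) & 1 for i in range(4)], 0)
print(bits[::-1], c4, G, P)
```

Because each carry depends only on g, p and the group carry-in, all four can be evaluated in parallel once g and p are available.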
CLA: Propagation Equations
• If C4=1, then one of the following holds:
  - g3: generated at bit pos 3
  - g2.p3: generated at bit pos 2, propagated 3
  - g1.p2.p3: generated at bit pos 1, propagated 2,3
  - g0.p1.p2.p3: generated at bit pos 0, propagated 1,2,3
  - Cin.p0.p1.p2.p3: input carry, propagated 0,1,2,3
• C4 = g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3
• Implementing C4 as one-stage CMOS logic ⇒ large delay


CLA: 12-Bit Example
A = 1101 1001 1010
B = 0111 0110 1101
C0 = 0
[Figure: bits A11..A0 and B11..B0 feed per-bit p,g blocks; three 4-bit carry generators produce C4, C8 and C12; the outputs are S11..S0.]

T   | C12 S11:8 | C8 S7:4 | C4 S3:0
T=0 |  0  0000  | 0  0000 | 0  0000
T=2 |  1  0100  | 0  1111 | 1  0111
T=3 |  1  0100  | 1  0000 | 1  0111
T=4 |  1  0101  | 1  0000 | 1  0111
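The rows of this example can be reproduced with a coarse timing model in which each 4-bit carry generator is treated as a black box and a group carry reaches the next group one step later. This is only a sketch: the lookahead logic inside a group is abstracted into integer addition, and the function names are assumptions.

```python
def group_add(a4, b4, c_in):
    """Behavioral stand-in for one 4-bit lookahead group: returns (sum4, carry_out)."""
    total = a4 + b4 + c_in
    return total & 0xF, total >> 4


def simulate_cla12(a, b, steps=3):
    """Three 4-bit groups; group carries ripple between groups one step at a time."""
    groups = [((a >> s) & 0xF, (b >> s) & 0xF) for s in (0, 4, 8)]
    carry_in = [0, 0, 0]               # C0, C4, C8 as seen by each group
    rows = []
    for _ in range(steps):
        results = [group_add(ga, gb, cin)
                   for (ga, gb), cin in zip(groups, carry_in)]
        rows.append(results)
        # A group's carry-out becomes the next group's carry-in on the next step.
        carry_in = [0, results[0][1], results[1][1]]
    return rows


for t, row in zip((2, 3, 4), simulate_cla12(0b110110011010, 0b011101101101)):
    (s0, c4), (s1, c8), (s2, c12) = row
    print(f"T={t}  {c12} {s2:04b}  {c8} {s1:04b}  {c4} {s0:04b}")
```

The first modeled step corresponds to T=2 in the table because the p, g signals take one step to form before any group carry is available.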
Summary: Carry Lookahead Adder
• CLA compared to ripple-carry adder:
  - Faster (“4 times”?), but delay is still linear in the number of bits
  - Larger area
    o P, G signal generation
    o Carry generation circuits
    o Carry generation circuit for each bit position (no re-use)
• Limitation: cannot go beyond 4 bits of look-ahead
  - Large p,g fan-out slows down carry generation


Carry-Skip Adder
• Carry-ripple is slow through all N stages
• Carry-skip allows carry to skip over groups of n bits
  - Decision based on n-bit propagate signal
[Figure: 16-bit carry-skip adder built from four 4-bit adder groups (A16:13 B16:13 ... A4:1 B4:1) with sums S16:13 ... S4:1. Each group's propagate signal (P16:13, P12:9, P8:5, P4:1) drives a 0/1 multiplexer on the carry path, producing C4, C8, C12 and Cout from Cin.]
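The skip decision can be sketched behaviorally. The model below assumes 4-bit groups and bit 0 as the least significant bit (the slide numbers bits from 1); the function name and parameters are illustrative, and the model shows the logic only, not the timing benefit.

```python
def carry_skip_add(a, b, n=16, group=4, c_in=0):
    """Behavioral carry-skip adder; returns (sum, carry_out)."""
    carry = c_in
    result = 0
    for base in range(0, n, group):
        group_cin = carry
        ripple = group_cin
        propagate_all = 1
        for i in range(base, base + group):
            ai = (a >> i) & 1
            bi = (b >> i) & 1
            p = ai ^ bi
            result |= (p ^ ripple) << i
            ripple = (ai & bi) | (p & ripple)   # carry rippling inside the group
            propagate_all &= p                  # group propagate = AND of bit propagates
        # Skip multiplexer: if every bit propagates, pass the group carry-in
        # straight through; otherwise use the carry that rippled out.
        carry = group_cin if propagate_all else ripple
    return result, carry


print(carry_skip_add(0x0FFF, 0x0001))
```

When the group propagate is 1 the rippled carry and the bypassed carry agree in value; the hardware advantage is that the bypassed one is available sooner.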
Serial adder
• May be used in signal-processing arithmetic
where fast computation is important but latency
is unimportant.
• Data format (LSB first):
[Figure: serial bit stream 0 1 1 0 with the LSB position marked.]
Serial adder structure
• The LSB control signal clears the carry shift register.
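A bit-serial adder can be modeled as one full adder plus a stored carry that is cleared whenever the LSB control signal is asserted. The sketch below assumes LSB-first streams and an `lsb_flags` stream that is 1 on the first bit of each word; these names are illustrative.

```python
def serial_add(a_stream, b_stream, lsb_flags):
    """Bit-serial addition of LSB-first streams; returns the sum bit stream."""
    carry = 0
    out = []
    for a, b, lsb in zip(a_stream, b_stream, lsb_flags):
        if lsb:
            carry = 0                      # LSB control clears the stored carry
        out.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return out


# Two 4-bit words back to back: 6 + 3, then 15 + 1 (which overflows 4 bits).
a_stream  = [0, 1, 1, 0,  1, 1, 1, 1]
b_stream  = [1, 1, 0, 0,  1, 0, 0, 0]
lsb_flags = [1, 0, 0, 0,  1, 0, 0, 0]
print(serial_add(a_stream, b_stream, lsb_flags))
```

Only one adder cell is needed regardless of word length, at the cost of one clock cycle per bit.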
Lecture 20: Multiplier Design
Review: Basic Building Blocks
• Datapath
  - Execution units
    o Adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
• Control
  - Finite state machines (PLA, ROM, random logic)
• Interconnect
  - Switches, arbiters, buses
• Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers
The Binary Multiplication

          1 0 1 0 1 0    Multiplicand
  x           1 0 1 1    Multiplier
    -----------------
          1 0 1 0 1 0
        1 0 1 0 1 0
      0 0 0 0 0 0        Partial products
  + 1 0 1 0 1 0
    -----------------
    1 1 1 0 0 1 1 1 0    Result
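The same partial products can be generated programmatically. This is a small sketch; `partial_products` is a name chosen for this illustration.

```python
def partial_products(multiplicand, multiplier, n_bits):
    """One row per multiplier bit: the multiplicand (or 0) shifted to that bit's weight."""
    rows = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        rows.append((multiplicand if bit else 0) << i)
    return rows


rows = partial_products(0b101010, 0b1011, 4)
for row in rows:
    print(format(row, "09b"))
print(format(sum(rows), "09b"))   # the result row
```

Each row depends on a single multiplier bit, which is why all rows can be formed in parallel with AND gates.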
Multiply Operation
• Multiplication is just a lot of additions
[Figure: an N-bit multiplicand and an N-bit multiplier form a partial product array of N rows, which can be formed in parallel; summing the array gives the 2N-bit double precision product.]
Multiplication Approaches
• Right shift and add
  - Partial product array rows are accumulated from top to bottom on
    an N-bit adder
    o After each addition, right shift (by one bit) the accumulated
      partial product to align it with the next row to add
  - Time for N bits: $T_{serial\_mult} = O(N \cdot T_{adder}) = O(N^2)$ for an RCA
• Making it faster
  - Use a faster adder
  - Use higher radix (e.g., base 4) multiplication: $O(N/2 \cdot T_{adder})$
    o Use multiplier recoding to simplify multiple formation (Booth)
  - Form the partial product array in parallel and add it in parallel
• Making it smaller (i.e., slower)
  - Use serial-parallel mult
  - Use an array multiplier
    o Very regular structure with only short wires to nearest neighbor
      cells, and thus a very simple and efficient layout in VLSI
    o Can be easily and efficiently pipelined
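The right-shift-and-add scheme can be sketched for unsigned operands as follows. The register names `acc` and `q` and the function name are assumptions for this illustration; the accumulator is allowed one extra bit for the adder's carry-out.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Unsigned n-bit by n-bit multiply using n add-and-right-shift steps."""
    acc = 0            # accumulated partial product (upper half)
    q = multiplier     # multiplier bits; retired product bits shift in from the top
    for _ in range(n):
        if q & 1:
            acc += multiplicand                 # one pass through the N-bit adder
        # Right shift the (acc, q) pair by one bit to align with the next row.
        q = (q >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | q


print(shift_add_multiply(0b101010, 0b001011, 6))
```

The loop body contains one addition, so n iterations with a ripple-carry adder give the O(N^2) time noted above.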
Serial-parallel multiplier structure
The Array Multiplier
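[Figure: 4x4 array multiplier. Each row combines X3 X2 X1 X0 with one multiplier bit (Y0 to Y3). The first adder row is HA FA FA HA and the next two rows are FA FA FA HA; the adder outputs form the product bits Z.]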
Booth multiplier
• Encoding scheme to reduce number of stages in
multiplication.
• Performs two bits of multiplication at once, so it requires
half the stages.
• Each stage is slightly more complex than in a simple
multiplier, but an adder/subtracter is almost as
small and fast as an adder.
Booth encoding
• Two’s-complement form of multiplier:
  - $y = -2^n y_n + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + \dots$ (first bit is the sign bit)
  - Example: y = 18 = 010010, y = -18 = 101110
• Rewrite using $2^a = 2^{a+1} - 2^a$:
  - $y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + \dots$
• Consider the first two terms: by looking at three bits
of y, we can determine whether to add 0, ±x or ±2x to the
partial product.
Booth actions
• $y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + \dots$
• Consider the first two terms: by looking at three bits of y, we can determine whether to add 0, ±x or ±2x to the partial product.

yi yi-1 yi-2 | increment
 0   0    0  |   0
 0   0    1  |   x
 0   1    0  |   x
 0   1    1  |  2x
 1   0    0  | -2x
 1   0    1  |  -x
 1   1    0  |  -x
 1   1    1  |   0
Booth example
• x = 1001 (9 in decimal), y = 0111 (7 in decimal).
• P0 = 00000000
• Bit groups: y3y2y1 = 011, y1y0y-1 = 11(0)
• y1y0y-1 = 110, so P1 = P0 - (1001) = 11110111
• x is shifted left by 2 bits to become 100100
• y3y2y1 = 011, so P2 = P1 + (10 * 100100) =
11110111 + 01001000 = 00111111 (63 in decimal; the carry out of bit 7 is discarded)
• An array multiplier needs N additions; a Booth
multiplier needs only N/2 additions
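The recoding can be sketched as a lookup over overlapping three-bit groups of the multiplier. This is a behavioral illustration that uses Python integers rather than fixed-width registers; the function and table names are assumptions, n is assumed even, and y is read as an n-bit two's-complement value.

```python
# Increment (as a multiple of x) for each bit triple (y[i+1], y[i], y[i-1]).
BOOTH_INCREMENT = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_multiply(x, y, n):
    """Multiply x by an n-bit two's-complement y, two multiplier bits per step."""
    y_bits = [(y >> i) & 1 for i in range(n)]
    product = 0
    prev = 0                                   # y[-1] is defined as 0
    for i in range(0, n, 2):
        triple = (y_bits[i + 1], y_bits[i], prev)
        # Add 0, +/-x or +/-2x, with x shifted left by i bits.
        product += (BOOTH_INCREMENT[triple] * x) << i
        prev = y_bits[i + 1]
    return product


print(booth_radix4_multiply(9, 7, 4))
```

For x = 9 and y = 0111 the two steps add -x and then +2x shifted left by two bits, that is -9 + 72 = 63, using N/2 additions.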
Booth structure
Wallace-Tree Multiplier
[Figure: dot diagrams of the partial-product reduction over bit positions 6 to 0: (a) partial products, (b) first stage, (c) second stage, (d) final adder. FA and HA mark where full and half adders are applied.]
Wallace-Tree Multiplier

[Figure: 4x4 Wallace tree. The partial products (x3y3 ... x0y1 and so on) feed a first stage of two HAs, a second stage of four FAs, and a final adder that produces z7 ... z0.]

Full adder = (3,2) compressor
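Column-wise reduction with (3,2) compressors can be sketched as follows. This is a simple greedy variant that uses full adders only, so the number of stages and the adder placement may differ from the arrangement in the figure; the function names are assumptions, and the final integer sum stands in for the final carry-propagate adder.

```python
def full_adder(a, b, c):
    """(3,2) compressor: three bits of one weight in, sum and carry out."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def wallace_multiply(x, y, n):
    """Unsigned n x n multiply; returns (product, number_of_reduction_stages)."""
    # columns[k] holds all partial-product bits of weight 2**k.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    stages = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            bits = list(col)
            while len(bits) >= 3:
                s, c = full_adder(bits.pop(), bits.pop(), bits.pop())
                nxt[k].append(s)        # sum stays in the same column
                nxt[k + 1].append(c)    # carry moves to the next-higher column
            nxt[k].extend(bits)         # leftover bits pass through unchanged
        columns = nxt
        stages += 1

    # At most two bits per column remain; add them with a carry-propagate adder.
    product = sum(bit << k for k, col in enumerate(columns) for bit in col)
    return product, stages


print(wallace_multiply(13, 11, 4))
```

All compressors within a stage work in parallel, so the reduction depth grows logarithmically with the number of partial-product rows.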


Making it Faster: Tree Multiplier Structure
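[Figure: tree multiplier structure. The multiplier register Q ('ier) and the multiplicand register D ('icand) drive multiple-forming circuits and muxes that build the partial product array; interconnect feeds a reduction tree with O(log N) delay, and a fast carry-propagate adder (CPA), also O(log N), produces the product P.]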
(4,2) Counter
• Built out of two (3,2) counters (just FA’s!)
  - all of the inputs (4 external plus one internal) have the
    same weight (i.e., are in the same bit position)
  - the internal carry output is fed to the next higher weight
    position (indicated by the arrow in the figure)
• Note: two carry outs, one “internal” and one “external”
[Figure: a (4,2) counter drawn as two stacked (3,2) counters.]
Tiling (4,2) Counters

[Figure: three (4,2) counters tiled side by side, each drawn as two stacked (3,2) counters.]

• Reduces columns four high to columns only two high
  - Tiles with neighboring (4,2) counters
  - Internal carry in at same “level” (i.e., bit position weight)
    as the internal carry out
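A (4,2) counter built from two full adders can be sketched as below. The port names are assumptions for this illustration; the key property is that the internal carry-out does not depend on the internal carry-in, so tiled counters do not form a ripple chain.

```python
def full_adder(a, b, c):
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def counter_4_2(x1, x2, x3, x4, c_in):
    """(4,2) counter from two (3,2) counters.

    Returns (sum, external_carry, internal_carry_out); both carries have
    twice the weight of the inputs.
    """
    s1, c_internal = full_adder(x1, x2, x3)    # independent of c_in
    s, c_external = full_adder(s1, x4, c_in)
    return s, c_external, c_internal


print(counter_4_2(1, 1, 1, 1, 1))
```

In a tiled row, each counter's internal carry-out feeds the `c_in` of its higher-weight neighbor, while the sum and external carry form the two output rows.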
4x4 Partial Product Array Reduction
• Fast 4x4 multiplication using (4,2) counters
• How would you lay it out?
[Figure: the multiplicand and multiplier form the partial product array; five (4,2) counters reduce it to a reduced pp array that goes to a 5-bit CPA, giving the 8-bit double precision product.]
8x8 Partial Product Array Reduction
• Wallace tree multiplier
[Figure: the 'icand and 'ier form the partial product array; two rows of nine (4,2) counters produce a reduced partial product array, one row of thirteen (4,2) counters reduces it further, and the result goes to a 13-bit fast CPA.]

An 8x8 Multiplier Layout
• How should it be laid out?
[Figure: floorplan with the multiplicand and multiplier on the edges, two rows of nine (4,2) counters, a row of thirteen (4,2) counters, and a 13-bit CPA.]
Multipliers —Summary

• Optimization goals differ from those of the binary adder
• Once again: identify the critical path
• Other possible techniques
  - Logarithmic versus linear (Wallace tree multiplier)
  - Data encoding (Booth)
  - Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
