EC8095 - VLSI Design: Unit-Iv

This document discusses data path circuits used in VLSI design, including adders, multipliers, and accumulators. It describes several types of adders: full adders, ripple carry adders, carry look ahead adders, carry select adders, Manchester carry chains, and carry save adders. It also discusses array multipliers and Booth multipliers. The document provides details on bit-sliced design and building blocks for digital architectures such as arithmetic units, memory, control units, and interconnects. It focuses on adders, describing full adders, ripple carry adders, and optimizations to improve adder speed.

Uploaded by

lokesh

EC8095– VLSI Design

UNIT-IV
Presentation Outline
• Data-path Circuits
– Adders
FDP on VLSI Design, SSNCE, Jan 4-8, 2016

• Full Adder, Ripple Carry Adder, Carry Look Ahead Adder
• High Speed Adders – Carry Select Adder, Manchester Carry Chain, Carry Save Adder
– Multipliers
• Array Multiplier, Booth Multiplier
– Accumulators
• Barrel Shifters
A Generic Digital Processor

[Figure: block diagram of a generic digital processor with four blocks: Input-Output, Memory, Datapath, and Control]
Building Blocks for Digital Architectures

Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.)
Memory
- RAM, ROM, buffers, shift registers
Control
- Finite state machine (PLA, random logic)
- Counters
Interconnect
- Switches
- Arbiters
- Bus
Bit-Sliced Design
• Datapaths are often arranged in a bit-sliced organization.
• Data processed in a processor is word based.
• Typical microprocessor datapaths are 32 bit or 64 bit.
• Those in DSL modems, magnetic disk drives, and compact disk players are of arbitrary width, typically 5 to 24 bits.
• The datapath consists of 32 bit slices (e.g., for a 32-bit µP), each operating on a single bit. Hence the term bit slices.

[Figure: bit-sliced datapath. Four identical slices (Bit 0 to Bit 3), each containing a register, adder, shifter, and multiplexer between Data-In and Data-Out, with shared Control signals. Caption: Tile identical processing elements]
Adder - Introduction
• Addition is the most commonly used arithmetic operation.
• It is often also the speed-limiting element.
• Careful optimization of the adder is therefore of the utmost importance.
• Optimization is done at the
  • Logic level
  • Circuit level
• In logic-level optimization, the Boolean functions are rearranged so that a faster or smaller circuit is obtained. An example of such logic optimization is the carry look ahead adder.
• In circuit-level optimization, transistor sizes and circuit topologies are manipulated to optimize speed.
Full-Adder

[Figure: full-adder block with inputs A, B, Cin and outputs Sum, Cout]
The Binary Adder
[Figure: full-adder block with inputs A, B, Cin and outputs Sum, Cout]

$$S = A \oplus B \oplus C_i = A\bar{B}\bar{C_i} + \bar{A}B\bar{C_i} + \bar{A}\bar{B}C_i + ABC_i$$

$$C_o = AB + BC_i + AC_i$$
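
The two Boolean expressions for the full adder can be checked against ordinary integer addition. The following is a minimal sketch; the function name `full_adder` is only illustrative and does not come from the slides.

```python
from itertools import product


def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """One-bit full adder built directly from the Boolean expressions."""
    s = a ^ b ^ ci                       # S = A xor B xor Ci
    co = (a & b) | (b & ci) | (a & ci)   # Co = AB + BCi + ACi
    return s, co


# Exhaustive check: the pair (Co, S) must encode A + B + Ci in binary.
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert 2 * co + s == a + b + ci
```

Because there are only eight input combinations, the exhaustive loop is a complete verification of the truth table.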
Express Sum and Carry as a function of P, G, D

Define three new variables which depend ONLY on A and B:

$$G = AB \quad \text{(Generate)}$$

$$P = A \oplus B \quad \text{(Propagate)}$$

$$D = \bar{A}\,\bar{B} \quad \text{(Delete)}$$

Can also derive expressions for S and Co based on D and P.
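
The slide does not spell out the resulting expressions. A commonly used form is S = P xor Ci and Co = G + P·Ci; the sketch below assumes that form and checks it exhaustively. The helper names `pgd` and `full_adder_pg` are illustrative only.

```python
from itertools import product


def pgd(a: int, b: int) -> tuple[int, int, int]:
    """Return (P, G, D), which depend only on A and B."""
    p = a ^ b                  # propagate
    g = a & b                  # generate
    d = (1 - a) & (1 - b)      # delete (kill)
    return p, g, d


def full_adder_pg(a: int, b: int, ci: int) -> tuple[int, int]:
    """Full adder expressed through P and G (assumed standard form)."""
    p, g, _ = pgd(a, b)
    return p ^ ci, g | (p & ci)    # S = P xor Ci, Co = G + P.Ci


for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder_pg(a, b, ci)
    p, g, d = pgd(a, b)
    assert 2 * co + s == a + b + ci   # same result as plain addition
    assert p + g + d == 1             # exactly one of P, G, D is active
    if d:
        assert co == 0                # a delete always kills the carry
```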
The Ripple-Carry Adder
[Figure: four-bit ripple-carry adder. Four full-adder (FA) cells with inputs A0,B0 ... A3,B3, carry chain Ci,0 → Co,0 (= Ci,1) → Co,1 → Co,2 → Co,3, and sum outputs S0 ... S3]

Worst-case delay is linear with the number of bits:

$$t_d = O(N)$$

$$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$$

Goal: Make the fastest possible carry path circuit
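
A behavioural sketch of the ripple-carry adder and its first-order delay model follows. Bits are held in little-endian lists (index 0 is the LSB); the helper names and the unit delays used in the test are assumptions made for illustration.

```python
def to_bits(value: int, n: int) -> list[int]:
    """Little-endian bit list of an n-bit value."""
    return [(value >> k) & 1 for k in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain full adders: each stage waits for the carry of the previous one."""
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (b & carry) | (a & carry)
    return sum_bits, carry


def ripple_delay(n: int, t_carry: float, t_sum: float) -> float:
    """t_adder = (N - 1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum
```

For example, adding 11 and 6 as four-bit words yields sum bits representing 1 with a carry-out of 1 (that is, 17).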


The Ripple-Carry Adder
• Two significant conclusions can be drawn from the above equation:
– The propagation delay of the RCA is linearly proportional to N. This property becomes increasingly important when designing adders for the wide data paths (N = 16 ... 128) that are desirable in current and future computers.
– When designing the full adder for a fast RCA, it is important to optimize tcarry, because it is multiplied by N – 1 in the delay expression.
The Ripple-Carry Adder

• Inverting property
– Inverting all inputs to a full adder results in inverted values for all outputs.
– Expressed as

$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C_i})$$

$$\bar{C_o}(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C_i})$$

Full Adder: Circuit Design Considerations

[Figure: inversion property. A full-adder cell (FA) with inputs A, B, Ci and outputs S, Co, shown next to an equivalent FA cell with all inputs and outputs inverted]
Complementary Static CMOS Full Adder

[Figure: transistor-level schematic of the complementary static CMOS full adder with inputs A, B, Ci, internal node X, and outputs S and Co; 28 transistors]
Complementary Static CMOS Full Adder
• The complementary static CMOS adder design requires 28 transistors. It consumes a large area and the circuit is slow:
– Tall PMOS transistor stacks are present in both the carry- and sum-generation circuits.
– The intrinsic load capacitance of the Co signal is large and consists of two diffusion and six gate capacitances, plus the wiring capacitance.
– The signal propagates through two inverting stages in the carry-generation circuit. Minimizing the carry-path delay is the prime goal of the designer in high-speed adder circuits. Given the small load (fan-out) at the output of the carry chain, having two logic stages is too high a number and leads to extra delay.
– The sum generation requires one extra logic stage, but this is not that significant, as the sum delay appears only once in the propagation delay of the RCA.
Minimize Critical Path by Reducing Inverting Stages

[Figure: four-bit ripple-carry adder built from FA' cells, alternating even cells (A0,B0 and A2,B2) and odd cells (A1,B1 and A3,B3), with carry chain Ci,0 → Co,0 → Co,1 → Co,2 → Co,3 and sum outputs S0 ... S3]

Exploit Inversion Property

Note: need 2 different types of cells


The better structure: the Mirror Adder

[Figure: transistor-level schematic of the mirror adder with inputs A, B, Ci and outputs Co and S; the Kill, Generate, "0"-Propagate and "1"-Propagate branches are labelled; 24 transistors]
The Mirror Adder
• The NMOS and PMOS chains are completely symmetrical. This guarantees
identical rising and falling transitions if the NMOS and PMOS devices are
properly sized. A maximum of two series transistors can be observed in the
carry-generation circuitry.
• When laying out the cell, the most critical issue is the minimization of the
capacitance at node Co. The reduction of the diffusion capacitances is
particularly important.
• The capacitance at node Co is composed of four diffusion capacitances, two
internal gate capacitances, and six gate capacitances in the connecting adder
cell.
• The transistors connected to Ci are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for optimal speed.
All transistors in the sum stage can be minimal size.
Transmission Gate Full Adder

[Figure: transmission-gate full adder schematic showing the setup stage that derives P from A and B, the sum-generation stage (S), and the carry-generation stage (Co)]

• Similar circuits for sum and carry generation
• tsum = tcarry in this case
Transmission Gate Full Adder
• Multiplexers and XORs can be used to design a full adder cell.
• While this is impractical in a complementary CMOS implementation, it becomes attractive when the multiplexers and XORs are implemented as transmission gates.
• A full adder based on this approach uses 24 transistors.
• It is based on the propagate-generate model (P = A ⊕ B, G = AB) defined earlier.
• The propagate signal, which is the XOR of inputs A and B, is used to select the true or complementary value of the input carry as the new sum output.
• Based on the propagate signal, the output carry is set either to the input carry or to one of the inputs A or B.
• One interesting feature of this adder is that it has similar delays for both sum and carry outputs.
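
The selection behaviour described in the bullets above can be modelled at the logic level with a two-input multiplexer. This is a behavioural sketch only (it says nothing about the transistor-level circuit), and the names `mux` and `mux_full_adder` are illustrative.

```python
from itertools import product


def mux(select: int, when_0: int, when_1: int) -> int:
    """Two-input multiplexer."""
    return when_1 if select else when_0


def mux_full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    p = a ^ b                  # setup stage: propagate signal
    s = mux(p, ci, 1 - ci)     # P selects true or complemented carry-in
    co = mux(p, a, ci)         # P = 1: pass Ci; P = 0: A (equal to B) sets Co
    return s, co


# Exhaustive check against ordinary addition.
for a, b, ci in product((0, 1), repeat=3):
    s, co = mux_full_adder(a, b, ci)
    assert 2 * co + s == a + b + ci
```

When P = 0 the inputs A and B are equal, so either one can drive the carry output, which is why the text says "one of the inputs A or B".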
Manchester Carry Chain

[Figure: Manchester carry chain]

• A Manchester carry chain adder uses a cascade of pass transistors to implement the carry chain, as shown in the figure above.
• During the precharge phase (φ = 0), all intermediate nodes of the pass-transistor carry chain are precharged to VDD.
• During evaluation, the Ak node is discharged when there is an incoming carry and the propagate signal Pk is high, or when the generate signal for stage k (Gk) is high.
• The worst-case delay of the carry chain adder is modeled by a linearized RC network.
• Increasing the transistor width reduces the time constant, but it loads the gates in the previous stage.
• Therefore the transistor size is limited by the input loading capacitance.
• The distributed RC nature of the carry chain results in a propagation delay that is quadratic in the number of bits N.
• To avoid this, it is necessary to insert signal-buffering inverters.
• Adding inverters makes the overall propagation delay a linear function of N, as is the case with ripple carry adders.
Manchester Carry Chain

$$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right) = 0.69\,\frac{N(N+1)}{2}\,RC$$

when all $C_i = C$ and $R_j = R$.
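
The double sum and its closed form can be compared numerically. This is a small sketch; the resistance and capacitance values in the test are arbitrary unit values chosen for illustration, not data from the slides.

```python
def chain_delay(resistances, capacitances) -> float:
    """0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC ladder."""
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # resistance between the source and node i
        total += c * upstream_r
    return 0.69 * total


def uniform_chain_delay(n: int, r: float, c: float) -> float:
    """Closed form for equal R and C: 0.69 * N(N+1)/2 * RC."""
    return 0.69 * n * (n + 1) / 2 * r * c
```

Doubling N from 8 to 16 raises the delay by a factor of 136/36, roughly 3.8, which illustrates the quadratic growth that motivates inserting buffers.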
ADDER: Logic Design Considerations
Carry-Bypass Adder

[Figure: four-bit carry-bypass adder. Top: ripple chain of four FA cells with inputs P0,G0 ... P3,G3 and carries Ci,0, Co,0, Co,1, Co,2, Co,3. Bottom: the same chain with a multiplexer controlled by BP = P0 P1 P2 P3 that selects either Ci,0 or the rippled carry as Co,3]

Idea: If (P0 and P1 and P2 and P3 = 1) then Co,3 = Ci,0, else "kill" or "generate".
Carry-Bypass Adder

• Consider the four-bit adder in the figure above. Suppose the values of Ak and Bk (k = 0 ... 3) are such that all propagate signals Pk (k = 0 ... 3) are high.
• An incoming carry Ci,0 = 1 propagates under those conditions through the complete adder chain and causes an outgoing carry Co,3 = 1. In other words,
– If (P0 P1 P2 P3 = 1) then Co,3 = Ci,0, else either DELETE or GENERATE occurred.
• This information can be used to speed up the operation of the adder, as shown in the figure. When BP = P0 P1 P2 P3 = 1, the incoming carry is forwarded immediately to the next block through the bypass transistor Mb; hence the name carry-bypass adder or carry-skip adder.
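
The bypass rule can be sketched behaviourally for one block. The function name `bypass_block` and the little-endian bit lists are illustrative assumptions; the model captures only the logic, not the timing advantage.

```python
def to_bits(value: int, n: int) -> list[int]:
    return [(value >> k) & 1 for k in range(n)]


def bypass_block(a_bits, b_bits, c_in):
    """One carry-bypass stage. Returns (sum_bits, c_out, bp)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    carry = c_in
    sum_bits = []
    for pk, gk in zip(p, g):
        sum_bits.append(pk ^ carry)
        carry = gk | (pk & carry)                 # rippled carry
    bp = int(all(p))                              # BP = P0 P1 P2 P3
    c_out = c_in if bp else carry                 # bypass multiplexer
    return sum_bits, c_out, bp
```

Logically the bypassed and rippled carries always agree when BP = 1; the benefit in hardware is that the multiplexer output is available without waiting for the ripple.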
Manchester-Carry Implementation

[Figure: Manchester carry chain with bypass path between Ci,0 and Co,3, using the signals P0 ... P3, G0 ... G3 and BP]

• The figure shows the possible carry-propagation paths when the full-adder circuit is implemented in Manchester-carry style. This kind of arrangement speeds up addition.
• The carry either propagates through the bypass path, or a carry is generated somewhere in the chain.
• In both cases, the delay is smaller than in the normal ripple configuration.
Carry-Bypass Adder

[Figure: 16-bit carry-bypass adder built from four 4-bit stages (bits 0-3, 4-7, 8-11, 12-15), each with setup, carry-propagation, and sum blocks; Ci,0 enters the first stage]


Carry-Bypass Adder

• The delay of an N-bit carry-skip adder built from stages of M bits is computed as

$$t_p = t_{setup} + M\,t_{carry} + \left(\frac{N}{M} - 1\right) t_{bypass} + (M-1)\,t_{carry} + t_{sum}$$

• tsetup: the fixed overhead time to create the generate and propagate signals.
• tcarry: the propagation delay through a single bit. The worst-case carry-propagation delay through a single stage of M bits is approximately M times larger.
• tbypass: the propagation delay through the bypass multiplexer of a single stage.
• tsum: the time to generate the sum of the final stage.
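
To get a feel for the formula, it can be evaluated next to the ripple-carry delay (N – 1)·tcarry + tsum. The sketch below assumes every component delay equals one unit, which is an illustrative choice rather than a value from the slides.

```python
def ripple_delay(n, t_carry=1, t_sum=1):
    """(N - 1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum


def bypass_delay(n, m, t_setup=1, t_carry=1, t_bypass=1, t_sum=1):
    """t_setup + M*t_carry + (N/M - 1)*t_bypass + (M - 1)*t_carry + t_sum."""
    return (t_setup + m * t_carry + (n / m - 1) * t_bypass
            + (m - 1) * t_carry + t_sum)


# Compare both adders for a few word lengths with 4-bit bypass stages.
for n in (4, 8, 16, 32, 64):
    print(n, ripple_delay(n), bypass_delay(n, 4))
```

Both delays grow linearly with N, but the bypass adder has a smaller slope, so it wins for wider words while the ripple adder stays faster for short ones. Where the crossover falls depends on the actual component delays.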
Carry Ripple versus Carry Bypass

[Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder; the two curves cross at about N = 4..8]
Carry-Select Adder

[Figure: four-bit carry-select block. The setup stage produces P and G; two carry-propagation chains evaluate with carry-in "0" and carry-in "1"; a multiplexer driven by Co,k-1 selects the carry vector and Co,k+3; sum generation follows]
Carry-Select Adder

• In an RCA, every FA cell has to wait for the incoming carry before an outgoing carry is generated.
• One way around this dependency is to anticipate both possible values of the carry input and to evaluate the result for both possibilities in advance.
• Once the real value of the incoming carry is known, the correct result is easily selected with a simple multiplexer stage.
• This implementation idea is called the carry-select adder.
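
A behavioural sketch of the idea: each block computes its result for carry-in 0 and carry-in 1, and the real carry then picks one of them. The function names are illustrative, the block addition uses Python integers instead of gate-level logic, and the word length n is assumed to be a multiple of the block size m.

```python
def block_add(a: int, b: int, c_in: int, width: int) -> tuple[int, int]:
    """Add two width-bit values; return (sum mod 2**width, carry-out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, n: int = 16, m: int = 4, c_in: int = 0):
    mask = (1 << m) - 1
    result, carry = 0, c_in
    for lo in range(0, n, m):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = block_add(a_blk, b_blk, 0, m)   # precomputed for carry-in 0
        sum1, c1 = block_add(a_blk, b_blk, 1, m)   # precomputed for carry-in 1
        block_sum, carry = (sum1, c1) if carry else (sum0, c0)   # multiplexer
        result |= block_sum << lo
    return result, carry
```

In hardware the two precomputations of all blocks run in parallel, so only the multiplexer selections remain on the carry path.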


Carry Select Adder: Critical Path

[Figure: 16-bit linear carry-select adder with four 4-bit stages (bits 0-3, 4-7, 8-11, 12-15). Each stage has a setup block, a "0" carry chain and a "1" carry chain, a multiplexer, and sum generation. Ci,0 drives the first multiplexer, the multiplexers produce Co,3, Co,7, Co,11 and Co,15, and the sums are S0-3, S4-7, S8-11 and S12-15]


Linear Carry Select

[Figure: 16-bit linear carry-select adder with four 4-bit stages (bits 0-3, 4-7, 8-11, 12-15), annotated with signal arrival times: setup (1), "0" and "1" carry chains of every stage (5), successive multiplexer outputs (6), (7), (8), (9), and the final sum S12-15 (10)]


Square Root Carry Select

[Figure: 20-bit square-root carry-select adder with five stages of increasing width (bits 0-1, 2-4, 5-8, 9-13, 14-19), annotated with signal arrival times: setup (1), carry chains (3), (4), (5), (6), (7), multiplexer outputs (4), (5), (6), (7), (8), and the final sum S14-19 (9)]


Square Root Carry Select

• Assume that an N-bit adder contains P stages, and the first stage adds M bits. An additional bit is added in each subsequent stage.
• Then the following relation holds:

$$N = M + (M+1) + (M+2) + (M+3) + \cdots + (M+P-1) = MP + \frac{P(P-1)}{2} = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right)$$

• If M << N (e.g., M = 2 and N = 64), the first term dominates, and the above equation can be simplified to

$$N \approx \frac{P^2}{2} \quad \text{or} \quad P \approx \sqrt{2N}$$
Square Root Carry Select

The above relation is used to express tadd as a function of N by rewriting the linear carry-select delay equation:

$$t_{add} = t_{setup} + M\,t_{carry} + \left(\frac{N}{M}\right) t_{mux} + t_{sum}$$

$$t_{add} = t_{setup} + M\,t_{carry} + \left(\sqrt{2N}\right) t_{mux} + t_{sum}$$

• The delay is proportional to the square root of N for large adders (N >> M).
• For large values of N, tadd grows only slowly with N (it appears almost constant when compared with the ripple adder).
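
The stage sizing and the two delay expressions can be evaluated directly. The sketch below uses the exact stage count P rather than the approximation P ≈ √(2N), and assumes unit delays for every component; both choices, and all function names, are illustrative.

```python
import math


def sqrt_select_stage_sizes(n: int, m: int) -> list[int]:
    """Stage widths M, M+1, M+2, ... until at least n bits are covered."""
    sizes, total, width = [], 0, m
    while total < n:
        sizes.append(width)
        total += width
        width += 1
    return sizes


def linear_select_delay(n, m, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """t_setup + M*t_carry + (N/M)*t_mux + t_sum."""
    return t_setup + m * t_carry + (n // m) * t_mux + t_sum


def sqrt_select_delay(n, m, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Same expression with the number of multiplexer stages P in place of N/M."""
    p = len(sqrt_select_stage_sizes(n, m))
    return t_setup + m * t_carry + p * t_mux + t_sum


def approx_stage_count(n: int) -> float:
    """P ~ sqrt(2N), valid when M << N."""
    return math.sqrt(2 * n)
```

With unit delays, a 16-bit linear adder with 4-bit stages gives 10 and a 20-bit square-root adder starting at M = 2 gives 9, which agrees with the arrival-time annotations (10) and (9) in the two figures.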
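Adder Delays - Comparison

[Figure: propagation delay tp (axis 0 to 50) versus number of bits N (axis 0 to 60) for the ripple adder, the linear select adder, and the square root select adder; the ripple adder has the largest delay and the square root select adder the smallest]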
CARRY LOOK AHEAD ADDER
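LookAhead - Basic Idea

[Figure: look-ahead adder concept. Bit positions 0 ... N-1 with inputs A0,B0, A1,B1, ..., AN-1,BN-1; each position k has a propagate signal Pk and a carry input Ci,k (Ci,0, Ci,1, ..., Ci,N-1)]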
Look Ahead - Basic Idea
• Carry look ahead logic uses the concepts of generating and propagating carries.
• A carry-lookahead adder improves speed by reducing the amount of time required to determine carry bits.
• The carry-lookahead adder calculates one or more carry bits before the sum, which reduces the wait time to calculate the result of the more significant bits. The Kogge-Stone adder and Brent-Kung adder are examples of this type of adder.
• Carry lookahead depends on two things:
- Calculating, for each digit position, whether that position is going to propagate a carry if one comes in from the right.
- Combining these calculated values to be able to deduce quickly whether, for each group of digits, that group is going to propagate a carry that comes in from the right.
Look Ahead - Basic Idea
• Suppose that groups of 4 digits are chosen. Then the sequence of events
goes something like this:
-All 1-bit adders calculate their results. Simultaneously, the
lookahead units perform their calculations.
-Suppose that a carry arises in a particular group. Within at most 5
gate delays, that carry will emerge at the left-hand end of the group and
start propagating through the group to its left.
-If that carry is going to propagate all the way through the next group,
the lookahead unit will already have deduced this. Accordingly, before the
carry emerges from the next group the lookahead unit is immediately
(within 1 gate delay) able to tell the next group to the left that it is going to
receive a carry - and, at the same time, to tell the next lookahead unit to the
left that a carry is on its way.
Carry-Look-Ahead Adders
• Objective - generate all incoming carries in parallel
• Feasible - carries depend only on xn-1, xn-2, ..., x0 and yn-1, yn-2, ..., y0 - information available to all stages for calculating incoming carry and sum bit
• Requires large number of inputs to each stage of adder - impractical
• Number of inputs at each stage can be reduced - find out from inputs whether new carries will be generated and whether they will be propagated
Carry Propagation
• If xi = yi = 1 - carry-out generated regardless of incoming carry - no additional information needed
• If xi,yi = 10 or xi,yi = 01 - incoming carry propagated
• If xi = yi = 0 - no carry propagation
• Gi = xi yi - generated carry; Pi = xi + yi - propagated carry
• ci+1 = xi yi + ci (xi + yi) = Gi + ci Pi
• Substituting ci = Gi-1 + ci-1 Pi-1 gives ci+1 = Gi + Gi-1 Pi + ci-1 Pi-1 Pi
• Further substitutions -

$$c_{i+1} = G_i + G_{i-1}P_i + G_{i-2}P_{i-1}P_i + \cdots + c_0 P_0 P_1 \cdots P_i$$

• All carries can be calculated in parallel from xn-1, xn-2, ..., x0, yn-1, yn-2, ..., y0, and forced carry c0
Example - 4-bit Adder
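
As a sketch of the fully expanded carry equations for four bits (the original slide content under this heading is not available, so this is an illustration rather than the deck's own example), every carry below is written as a flat sum of products of Gi, Pi and c0, with Gi = xi yi and Pi = xi + yi as defined earlier. The function name `cla_4bit` is an assumption.

```python
def cla_4bit(x: int, y: int, c0: int = 0) -> tuple[int, int]:
    """4-bit carry-look-ahead addition; returns (sum, carry-out)."""
    xb = [(x >> i) & 1 for i in range(4)]
    yb = [(y >> i) & 1 for i in range(4)]
    g = [a & b for a, b in zip(xb, yb)]   # Gi = xi yi
    p = [a | b for a, b in zip(xb, yb)]   # Pi = xi + yi

    # Each carry depends only on G, P and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((xb[i] ^ yb[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

The growing number of product terms from c1 to c4 shows why the flat look-ahead form is normally limited to small groups such as four bits.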
Look-Ahead: Topology

[Figure: transistor-level look-ahead carry gate producing Co,3 from G0 ... G3, P0 ... P3 and Ci,0, with the complementary network connected to VDD]
KOGGE STONE ADDER
Brent-Kung Adder
• Number of logic levels is proportional to log2 N
• Gate fan-in is reduced
• Fan-out can be large, but is handled by careful buffering
• Regular, compact layout; forward tree and reverse tree fit together perfectly
• Once carry bits are available, sum bits are easily derived in constant time
• Lookahead adders are 100% larger than ripple carry adders, but yield dramatic speed advantages for large adders
• Logarithmic behavior makes it preferable over bypass or select adders
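
The logarithmic depth can be illustrated with a parallel-prefix carry computation. The sketch below uses the Kogge-Stone arrangement, which is denser than the Brent-Kung tree but easier to write down; it is a behavioural model with illustrative names, and n is assumed to be the word length in bits.

```python
def kogge_stone_add(x: int, y: int, n: int = 8):
    """Return (sum mod 2**n, carry-out, number of prefix levels), carry-in 0."""
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]      # bit generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]    # bit propagate
    group_g, group_p = g[:], p[:]
    dist, levels = 1, 0
    while dist < n:
        new_g, new_p = group_g[:], group_p[:]
        for i in range(dist, n):
            # Merge the group ending at bit i with the group ending at i - dist.
            new_g[i] = group_g[i] | (group_p[i] & group_g[i - dist])
            new_p[i] = group_p[i] & group_p[i - dist]
        group_g, group_p = new_g, new_p
        dist *= 2
        levels += 1
    carries = [0] + group_g[:-1]          # carry into bit i
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, group_g[n - 1], levels
```

For n = 8 the loop runs three times, matching log2 8 = 3 prefix levels; the sum bits then follow in one further step from Pi and the incoming carry.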
The Binary Multiplication
$$Z = X \times Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left(\sum_{i=0}^{M-1} X_i 2^i\right)\left(\sum_{j=0}^{N-1} Y_j 2^j\right) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} X_i Y_j 2^{i+j}$$

with

$$X = \sum_{i=0}^{M-1} X_i 2^i, \qquad Y = \sum_{j=0}^{N-1} Y_j 2^j$$

• Multiplication needs M cycles using an N-bit adder
• In shift and add
- M partial products are added
- Each partial product is the AND of a multiplier bit with the multiplicand, followed by a 'shift'
The Binary Multiplication

        1 0 1 0 1 0    Multiplicand
x           1 0 1 1    Multiplier
        1 0 1 0 1 0
      1 0 1 0 1 0
    0 0 0 0 0 0        Partial products
+ 1 0 1 0 1 0
  1 1 1 0 0 1 1 1 0    Result
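
The worked example can be reproduced with a short shift-and-add routine. This is a behavioural sketch for unsigned operands; the function name and the returned list of partial products are illustrative.

```python
def shift_and_add(multiplicand: int, multiplier: int, n_bits: int):
    """Multiply by summing shifted partial products, one per multiplier bit."""
    product = 0
    partial_products = []
    for j in range(n_bits):
        y_j = (multiplier >> j) & 1
        pp = (multiplicand if y_j else 0) << j   # AND with bit Yj, then shift
        partial_products.append(pp)
        product += pp
    return product, partial_products


result, pps = shift_and_add(0b101010, 0b1011, 4)
print(bin(result))   # 0b111001110
```

The third partial product is zero because the corresponding multiplier bit is 0, exactly as in the row of zeros above.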
Partial Product Generation
• Logical AND of multiplicand X and multiplier bit Yi
• Adding zeros has no impact on results
• Can reduce the number of partial products by half!
• E.g. 0111 1110 ≡ 1000 001̄0, where 1̄ = -1
– So only two partial products need be added!
– Multiplier word

$$Y = \sum_{j=0}^{(N-1)/2} Y_j\,4^j \quad \text{with} \quad Y_j \in \{-2, -1, 0, 1, 2\}$$

• This transformation is Booth's Recoding
– Leads to fewer additions, with area reduction and higher speed
– Alternating 10101010 for eight bits is the worst case!
– Multiplying with {-2, -1, 0, 1, 2} versus {1, 0}; needs encoding
– Use modified Booth's recoding for consistent operation size
Modified Booth's Recoding

• Bunch bits from msb to lsb in threes with successive overlap
• Assign multiplier as per the table
• Number of partial products is half

Partial Product Selection Table
Multiplier Bits    Recoded Bits
000                0
001                + Multiplicand
010                + Multiplicand
011                +2 × Multiplicand
100                -2 × Multiplicand
101                - Multiplicand
110                - Multiplicand
111                0

E.g. 01111111 is bunched into 01(1), 11(1), 11(1), 11(0)
Multiplier ≡ 10 00 00 01̄ (see table), where 1̄ = -1
Four partial products developed instead of eight
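
The selection table corresponds to the rule digit = -2·b(2j+1) + b(2j) + b(2j-1), with an implicit 0 to the right of the LSB. The sketch below applies that rule; it assumes an even word length and a two's-complement multiplier passed as a bit pattern, and the function names are illustrative.

```python
def booth_radix4_digits(y: int, n_bits: int) -> list[int]:
    """Recode an n_bits two's-complement pattern into digits in {-2..2}.

    Digits are returned least significant first; n_bits must be even.
    """
    digits = []
    prev = 0                             # implicit bit right of the LSB
    for j in range(0, n_bits, 2):
        b0 = (y >> j) & 1
        b1 = (y >> (j + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1                        # overlap bit for the next group
    return digits


def booth_multiply(x: int, y: int, n_bits: int) -> int:
    """Sum one partial product per recoded digit (weight 4**j)."""
    return sum(d * x * 4 ** j
               for j, d in enumerate(booth_radix4_digits(y, n_bits)))


print(booth_radix4_digits(0b01111111, 8))   # [-1, 0, 0, 2]
```

Read from the most significant digit, the result 2, 0, 0, -1 is the recoded multiplier 10 00 00 01̄ from the example: only two of the four partial products are non-zero.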
The Array Multiplier
[Figure: 4×4 array multiplier. Each row ANDs X3 ... X0 with one multiplier bit (Y0 ... Y3); three rows of adders (HA FA FA HA, then FA FA FA HA twice) accumulate the partial products and produce Z0 ... Z7]

• N partial products of M-bit size each
• N×M two-input AND gates; N-1 M-bit adders
• Layout need not be staggered; routing takes care of the shift
The MxN Array Multiplier - Critical Path

[Figure: adder array of the multiplier (rows HA FA FA HA, FA FA FA HA, FA FA FA HA) with Critical Path 1, Critical Path 2, and the segment common to both marked]

Many critical paths! Critical timing determination is non-trivial.


Carry-Save Multiplier

[Figure: 4×4 carry-save multiplier. Adder rows HA HA HA HA, HA FA FA FA, HA FA FA FA, followed by an extra set of adders (HA FA FA HA) forming the vector merging adder, usually a fast carry look ahead adder]

• Carry passed diagonally downward
• Assumes tadd = tcarry
Carry-Save Multiplier
• A large number of identical critical paths are present in the array multiplier.
• Increasing the performance of the structure through transistor sizing yields marginal benefits.
• A more efficient realization can be obtained by noticing that the multiplication result does not change when the output carry bits are passed diagonally downwards instead of only to the right.
• An extra adder, called a vector merging adder, is included to generate the final result.
• The resulting multiplier is called a carry-save multiplier because the carry bits are not immediately added, but are rather saved for the next adder stage.
• In the final stage, carries and sums are merged in a fast carry-propagate adder stage (e.g., a carry look ahead adder).
• It has the advantage that its worst case critical path is shorter.
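
The "save the carries" idea can be sketched on whole words: three operands are reduced to a sum word and a carry word with no carry propagation, and a single ordinary addition at the end plays the role of the vector merging adder. This is a behavioural model for unsigned operands with illustrative function names.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three words to (sum_word, carry_word) with a + b + c preserved."""
    sum_word = a ^ b ^ c                              # per-bit sum, no ripple
    carry_word = ((a & b) | (b & c) | (a & c)) << 1   # per-bit carry, saved
    return sum_word, carry_word


def carry_save_multiply(x: int, y: int, n_bits: int) -> int:
    sum_word, carry_word = 0, 0
    for j in range(n_bits):
        pp = (x if (y >> j) & 1 else 0) << j          # partial product j
        sum_word, carry_word = carry_save_add(sum_word, carry_word, pp)
    return sum_word + carry_word                      # vector merging adder
```

Each loop iteration costs one full-adder delay regardless of the word length; only the final merge needs a carry-propagate adder.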
Wallace-Tree Multiplier

[Figure: dot diagram of the Wallace-tree reduction over bit positions 6 ... 0: (a) partial products, (b) first stage, (c) second stage, (d) final adder; FA and HA mark the full-adder and half-adder groupings]

• Substantial hardware savings
• Higher speeds
• Propagation delay $O(\log_{3/2} N)$
• Irregular; inefficient for layout
Wallace-Tree Multiplier

[Figure: 4×4 Wallace-tree multiplier. The partial products x0y0 ... x3y3 are reduced by a first stage of half adders (HA) and a second stage of full adders (FA) and a half adder, followed by the final adder producing z7 ... z0]

• Final adder choice is critical; it depends on the structure of the accumulator array
• Carry look ahead might be good if data arrives simultaneously
• Place pipeline stage before final addition
• In non-pipelined designs, other adders can give similar performance with less hardware
Multipliers —Summary

• Optimization goals different vs. binary adder
• Once again: identify critical path
• Other possible techniques
- Logarithmic versus linear (Wallace tree multiplier)
- Data encoding (Booth)
- Pipelining

FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
Shifters

Needs extensive hardware support
Used for floating point units, scalers, and multiplication by constants
Programmable shifter more complex → an intricate multiplexer circuitry
The Binary Shifter

[Figure: one-bit programmable shifter, bit-slice i. Control signals Right, nop and Left connect the inputs Ai, Ai-1 to the outputs Bi, Bi-1]

• Too slow for large shift values
The Barrel Shifter

[Figure: 4-bit barrel shifter with inputs A3 ... A0, outputs B3 ... B0, and shift-control lines Sh0 ... Sh3; the legend distinguishes data wires from control wires]

• # rows = data word length
• Control wire routed diagonally
• Signal goes through only one transmission gate (theoretically, delay is constant for shift value and shifter size)
• Reality – delay depends on shift widths due to parasitic capacitance
• Layout and area dominated by wiring and not active elements
• Need decoder to interpret shift data to route signal to appropriate wire
Logarithmic Shifter

[Figure: logarithmic shifter with inputs A3 ... A0, outputs B3 ... B0, and three stages controlled by the signal pairs Sh1, Sh2 and Sh4 and their complements]

• Total shift decomposed into powers of two
• Max shift width of M has log2 M stages
• ith stage shifts by 2^i or passes data unchanged
• Speed depends on shift length
• Series connection of pass transistors slows shifter down for larger shift values (need intermediate buffers)
• Appropriate for larger shifts (in terms of area and speed)
• Structure is regular → can be parameterised / auto-generated
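
The stage-by-stage decomposition can be sketched as follows. The shift direction (logical right shift), the default width of 8 bits, and the function name are assumptions made for illustration; the shift amount must be smaller than the word width.

```python
def log_shift_right(value: int, shift: int, width: int = 8) -> int:
    """Logical right shift built from stages that shift by 1, 2, 4, ..."""
    stages = (width - 1).bit_length()    # 3 stages for an 8-bit word
    value &= (1 << width) - 1
    for i in range(stages):
        if (shift >> i) & 1:             # stage i enabled by bit i of shift
            value >>= 1 << i             # shift by 2**i
        # otherwise the stage passes the data unchanged
    return value


print(bin(log_shift_right(0b10110000, 5)))   # 0b101
```

A shift by 5 enables the "shift by 1" and "shift by 4" stages and bypasses the "shift by 2" stage, so the data always crosses the same number of stages whatever the shift amount.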

Thank You