DSD Subsystem Design

Uploaded by Ashok
Copyright © All Rights Reserved

Subsystem Design

Introduction
• Most chips are built from a collection of subsystems:
• Adders
• Shifters
• Multipliers
• State machines, etc.
Pipelining
• Simple method for reducing the clock period of long combinational
operations
• Complex combinational components can pose serious constraints on
system design if their delay is much longer than the delay of the other
components
• If the propagation time through that combinational element
determines the clock period, logic in the rest of the chip may sit idle
for most of the clock cycle while the critical element finishes
• Allows large combinational functions to be broken up into pieces
whose delays are in balance with the rest of the system components
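A back-of-the-envelope sketch of the trade-off, assuming a hypothetical 12 ns combinational unit, an ideal split into equal-delay stages, and an assumed 0.5 ns register overhead (setup plus clock-to-Q) per stage. None of these numbers come from the slides; the function names are illustrative.

```python
def clock_period(stage_delays_ns, register_overhead_ns):
    """Clock period of a pipeline: slowest stage plus register overhead."""
    return max(stage_delays_ns) + register_overhead_ns


def split_evenly(total_delay_ns, stages):
    """Idealised split of one combinational block into equal-delay pieces."""
    return [total_delay_ns / stages] * stages


if __name__ == "__main__":
    overhead = 0.5    # assumed register overhead per stage, ns
    long_unit = 12.0  # assumed delay of the long combinational unit, ns
    for k in (1, 2, 3, 4):
        period = clock_period(split_evenly(long_unit, k), overhead)
        print(f"{k} stage(s): period {period:.2f} ns, latency {k * period:.2f} ns")
```

The period shrinks roughly as 1/k, while the total latency grows by one register overhead per added stage. This is why the pieces are sized to match the rest of the system rather than made arbitrarily small.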
Data paths
• A data path is a logical and a physical structure
• bitwise logical organization
• bitwise physical design
• It is built from components which perform typical data
operations
• A datapath often contains an ALU, registers and other function units
• Data is generally passed via busses
Typical data path structure
• Each slice includes one bit of every function unit, connected by busses:
Bit-slice structure
• Many arithmetic and logical functions can be defined
recursively on the bits of a word
• A bit-slice is a one-bit (or n-bit) segment of an operation, the
minimum-size unit that is replicated to keep the layout regular
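To illustrate a word-level function defined recursively on its bits, here is a minimal sketch of an n-bit "a > b" comparator built from identical one-bit slices chained from LSB to MSB. The comparator is our own example, not one from the slides. Its chain signal plays the role a carry plays in an adder.

```python
def compare_slice(a_bit, b_bit, gt_in):
    """One-bit slice of an 'a > b' comparator.

    gt_in is the result for the lower-order bits. If this bit differs, it
    decides the answer; if the bits are equal, gt_in passes through.
    """
    return a_bit if a_bit != b_bit else gt_in


def greater_than(a, b, n):
    """Chain n identical slices from LSB to MSB."""
    gt = 0
    for i in range(n):
        gt = compare_slice((a >> i) & 1, (b >> i) & 1, gt)
    return gt


if __name__ == "__main__":
    assert all(greater_than(a, b, 4) == int(a > b)
               for a in range(16) for b in range(16))
```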
Shifter
Combinational shifters
• Useful for arithmetic operations, bit field extraction, etc.
• A latch-based shift register can shift by only one bit per clock
cycle
• Shifting data left or right by a constant amount is a trivial
hardware operation, implemented by the appropriate
signal wiring
• A programmable shifter, however, is more complex and requires
active circuitry
Barrel shifter
• A barrel shifter can shift an n-bit word by any amount in a single cycle
• It has an efficient layout
• Requires transmission gates and long wires
Barrel shifter structure
• Accepts 2n data inputs and n control signals, producing n
data outputs
Barrel shifter operation
• Selects any n contiguous bits out of the 2n input bits
• Examples:
• right shift: data into top, 0 into bottom
• left shift: 0 into top, data into bottom
• rotate: data into top and bottom
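The following behavioral sketch models the selection described above. It assumes the 2n input bits are a "high" half concatenated above a "low" half. The slides do not say which physical edge of the array is "top", so the names high and low and the helper functions are our own. Each shift or rotate then reduces to choosing a window offset, which is the job of the one-hot control column in hardware.

```python
def window(high, low, offset, n):
    """Select n contiguous bits, starting at bit `offset`, from the 2n-bit
    value formed by placing `high` above `low`."""
    assert 0 <= offset <= n
    mask = (1 << n) - 1
    combined = ((high & mask) << n) | (low & mask)
    return (combined >> offset) & mask


def shift_right(data, k, n):
    return window(0, data, k, n)        # zeros enter from the high side


def shift_left(data, k, n):
    return window(data, 0, n - k, n)    # zeros enter from the low side


def rotate_right(data, k, n):
    return window(data, data, k, n)     # data supplied to both halves
```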
Barrel shifter layout
• Two-dimensional array of 2n vertical X n horizontal cells
• Input data travels diagonally upward
• Output wires travel horizontally
• Control signals run vertically
• Exactly one control signal is set to 1, turning on all
transmission gates in that column
Barrel shifter cell
Barrel shifter in action
Analysis
• The circuit has many transmission
gates
• Each signal must traverse only one
transmission gate
• The delay of the barrel shifter is
determined by the parasitic capacitances
of the wires (and transmission gates)
Adders
Adders
• Addition is the most commonly used arithmetic operation
• Adder delay is dominated by the carry chain
• Carry chain analysis must consider both transistor and wiring delay
• A full adder computes the one-bit sum and carry as:
• $s_i = a_i \oplus b_i \oplus c_i$
• $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
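The two full-adder equations can be checked exhaustively. This short sketch (the function name is illustrative) confirms that the carry and sum bits together encode the count a + b + c.

```python
from itertools import product


def full_adder(a, b, c):
    s = a ^ b ^ c                          # s_i = a_i XOR b_i XOR c_i
    carry = (a & b) | (a & c) | (b & c)    # c_{i+1}: majority of the three inputs
    return s, carry


# Exhaustive check: 2 * carry + sum equals the number of 1 inputs.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
```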
Ripple-carry adder
• Ripple-carry adder: n-bit adder built from full adders
• The addition is not complete until the $(n-1)$th adder has
computed its $s_{n-1}$ output
• The carry must ripple through every bit, so the delay grows linearly with n
• It is area efficient and easy to design but slow when n is large
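Below is a minimal model of an n-bit ripple-carry adder built from n copies of the full adder. The loop makes the serial dependence explicit: bit i cannot finish until the carry from bit i-1 arrives. Names are illustrative.

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_carry_add(a, b, n, c0=0):
    """Add two n-bit numbers with a chain of n full adders.

    Returns (sum, carry_out). The carry out of bit i feeds bit i+1.
    """
    carry = c0
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```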
Carry-lookahead adder
• Speeding up the adder requires speeding up the carry chain
• It breaks the carry computation into two steps
• First compute carry propagate (P), and generate (G):
• $P_i = a_i + b_i$
• $G_i = a_i b_i$
Carry-lookahead adder
• If $G_i = 1$, a carry is generated at bit i
• If $P_i = 1$, the carry coming in from bit i-1 is propagated to the next
bit
• Then compute sum and carry from P and G:
• $s_i = c_i \oplus P_i \oplus G_i$ (equivalent to $c_i \oplus a_i \oplus b_i$)
• $c_{i+1} = G_i + P_i c_i$ (equivalent to $a_i b_i + b_i c_i + c_i a_i$)
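The equivalences in parentheses can be checked over all eight input combinations. The key step for the sum is that, with the OR form of propagate ($P_i = a_i + b_i$), $P_i \oplus G_i = a_i \oplus b_i$.

```python
from itertools import product

for a, b, c in product((0, 1), repeat=3):
    p, g = a | b, a & b                                  # P_i (OR form), G_i
    assert p ^ g == a ^ b                                # key step
    assert c ^ p ^ g == a ^ b ^ c                        # sum from P and G
    assert g | (p & c) == (a & b) | (b & c) | (c & a)    # carry from P and G
print("identities hold for all 8 input combinations")
```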
Carry-lookahead expansion
• Can recursively expand carry formula:
• $c_{i+1} = G_i + P_i (G_{i-1} + P_{i-1} c_{i-1})$
• $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2})$
• Expanded formula does not depend on intermediate carries
• Allows carry for each bit to be computed independently
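As a minimal sketch of the fully expanded form, the function below computes each carry $c_{i+1} = G_i + P_i G_{i-1} + \dots + P_i \cdots P_1 G_0 + P_i \cdots P_0 c_0$ from P, G and the adder's carry-in $c_0$ only, never from an intermediate carry. The software loops are sequential, but no carry depends on another carry, which is what lets hardware evaluate all of them in parallel. Function names are illustrative.

```python
def lookahead_carries(p, g, c0):
    """Return [c_0, c_1, ..., c_n], each computed from P, G and c_0 only."""
    n = len(p)
    carries = [c0]
    for i in range(n):
        c_next = 0
        prefix = 1                    # running product P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c_next |= prefix & g[j]   # term P_i ... P_{j+1} G_j
            prefix &= p[j]
        c_next |= prefix & c0         # term P_i ... P_0 c_0
        carries.append(c_next)
    return carries


def cla_add(a, b, n, c0=0):
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]   # P_i = a_i + b_i
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # G_i = a_i b_i
    c = lookahead_carries(p, g, c0)
    total = 0
    for i in range(n):
        total |= (c[i] ^ p[i] ^ g[i]) << i              # s_i = c_i XOR P_i XOR G_i
    return total, c[n]
```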
Depth-4 carry-lookahead
• The structure of a carry lookahead adder is:
Analysis

• The deepest carry expansion requires gates with large fan-in, which are
large and slow
• The carry-lookahead unit requires complex wiring between the
adders and the lookahead unit, since values must be routed back
from the lookahead unit to the adders
• Layout is even more complex with multiple levels of
lookahead
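To make the fan-in problem concrete, this sketch writes out the product terms of the fully expanded carry $c_{i+1}$ (with $c_0$ as the adder's carry-in) and counts the inputs of the OR gate and of the widest AND gate. The helper name is illustrative.

```python
def expand_carry(i):
    """Product terms of c_{i+1}, each as a list of literal names."""
    terms = []
    for j in range(i, -1, -1):
        terms.append([f"P{k}" for k in range(i, j, -1)] + [f"G{j}"])
    terms.append([f"P{k}" for k in range(i, -1, -1)] + ["c0"])
    return terms


for i in (1, 3, 7, 15):
    terms = expand_carry(i)
    print(f"c{i + 1}: OR fan-in {len(terms)}, widest AND fan-in {max(map(len, terms))}")
```

Both fan-ins grow as i + 2, so a single-level expansion across a wide word quickly needs impractically large gates. This motivates splitting the lookahead into several levels.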
Carry-skip adder
• Looks for cases in which the carry out of a set of bits is identical
to the carry in
• Typically organized into m-bit stages
• If $a_i \neq b_i$ for every bit in a stage, the bypass gate sends the stage’s
carry input directly to its carry output
• The skip condition is computed from the propagate (P)
signals: a stage is skipped only when $a_i \oplus b_i = 1$ for all of its bits
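Here is a behavioral sketch of a carry-skip adder with m-bit stages, assuming n is a multiple of m (names are illustrative). The assertion inside the loop checks the property the bypass relies on: when every bit of a stage propagates, the rippled carry-out already equals the carry-in, so the bypass changes only the delay, not the value.

```python
def carry_skip_add(a, b, n, m, c0=0):
    """Model of an n-bit carry-skip adder; returns (sum, carry_out, stages_skipped)."""
    assert n % m == 0, "this sketch assumes n is a multiple of m"
    carry, total, skipped = c0, 0, 0
    for start in range(0, n, m):
        stage_in = carry
        all_propagate = 1
        for i in range(start, start + m):          # carry ripples inside the stage
            ai, bi = (a >> i) & 1, (b >> i) & 1
            all_propagate &= ai ^ bi
            total |= (ai ^ bi ^ carry) << i
            carry = (ai & bi) | (ai & carry) | (bi & carry)
        if all_propagate:
            # The bypass delivers stage_in sooner; the value is the same.
            assert carry == stage_in
            skipped += 1
    return total, carry, skipped
```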
Two-bit carry-skip structure
• The structure of the carry chain for a two-bit carry-skip adder is:
Carry-skip adder

• At each stage the propagate signals are computed and used
to determine whether the carry will be computed from the
bits internal to that group or taken from the previous
group’s carry
• Choose the number of bits per group to balance the ripple delay
inside a group against the number of groups the carry must skip through
Carry-select adder
• Computes two sums in parallel, one assuming an incoming carry of 0
and one assuming an incoming carry of 1
• Once the actual incoming carry is known, it selects the correct
result through a multiplexer
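The following behavioral sketch models a carry-select adder with m-bit blocks, assuming n is a multiple of m. `block_add` stands in for whatever block adder is used; here it is plain integer addition. All names are illustrative.

```python
def block_add(a, b, m, cin):
    """m-bit block adder (any design); returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << m) - 1), total >> m


def carry_select_add(a, b, n, m, c0=0):
    assert n % m == 0, "this sketch assumes n is a multiple of m"
    mask = (1 << m) - 1
    carry, total = c0, 0
    for start in range(0, n, m):
        a_blk, b_blk = (a >> start) & mask, (b >> start) & mask
        # In hardware both candidates are computed at the same time ...
        sum0, cout0 = block_add(a_blk, b_blk, m, 0)
        sum1, cout1 = block_add(a_blk, b_blk, m, 1)
        # ... and the real incoming carry drives the multiplexers.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        total |= s << start
    return total, carry
```

For uniformity, the sketch duplicates every block. The lowest block could use a single adder, because its carry-in is known from the start.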
Carry-select structure
Serial adder
• Serial adders take a different approach to high-speed
arithmetic: they require many clock cycles to add two n-bit
numbers, but each cycle is very short
• May be used in signal-processing arithmetic where throughput is
important but latency is not
Serial adder
• Serial adders can also work on nibbles or bytes per cycle instead of single bits
• Data format (LSB first):
Serial adder structure
• The bit-serial adder is shown as:
• The LSB control signal clears the carry shift register
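This cycle-by-cycle sketch models a bit-serial adder as one full adder plus a one-bit carry register. In our reading of the LSB control signal, it marks the first bit of each word so the stored carry is ignored in that cycle. The operand values and helper names are hypothetical.

```python
def to_bits(x, n):
    """n-bit word as a list of bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_stream, b_stream, lsb_stream):
    """One full adder and a one-bit carry register, one bit per clock cycle."""
    carry_reg = 0
    out = []
    for a, b, lsb in zip(a_stream, b_stream, lsb_stream):
        c = 0 if lsb else carry_reg          # LSB control clears the carry
        out.append(a ^ b ^ c)
        carry_reg = (a & b) | (a & c) | (b & c)
    return out


if __name__ == "__main__":
    n = 4
    pairs = [(15, 1), (0, 0), (5, 6)]        # hypothetical 4-bit operand pairs
    a_stream = [bit for x, _ in pairs for bit in to_bits(x, n)]
    b_stream = [bit for _, y in pairs for bit in to_bits(y, n)]
    lsb_stream = [1 if i % n == 0 else 0 for i in range(len(a_stream))]
    out = bit_serial_add(a_stream, b_stream, lsb_stream)
    print([from_bits(out[k:k + n]) for k in range(0, len(out), n)])  # [0, 0, 11]
```

Without the clear, the carry left over from 15 + 1 would leak into the next word and turn 0 + 0 into 1.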
ALUs
ALUs
• The arithmetic logic unit (ALU) is a modified adder
• An ALU can perform both arithmetic and logical operations
based on opcode
• But the arithmetic operations’ requirements dominate the
design
• The ALU is built around the adder, since the carry chain determines its delay
• It may offer the complete set of functions of two variables, or only a
subset
ALUs
• A basic ALU takes two data inputs and a set of control
signals, also called the opcode
• The opcode together with the ALU’s carry-in determines the ALU’s
function
• For example, if the ALU is set to add, then c0 = 0 produces a + b, while
c0 = 1 produces a + b + 1
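The toy model below shows how the opcode and carry-in together select the function. The 8-bit width and the opcode names are hypothetical. It also shows one common use of the carry-in: subtraction reuses the adder as a - b = a + (NOT b) + 1, that is, the add path with b inverted and c0 = 1.

```python
MASK = 0xFF  # assumed 8-bit datapath


def alu(op, a, b, c0=0):
    """Toy ALU; opcode names are hypothetical."""
    if op == "add":
        return (a + b + c0) & MASK
    if op == "add_inv_b":                    # with c0 = 1 this computes a - b
        return (a + (~b & MASK) + c0) & MASK
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    raise ValueError(f"unknown opcode {op!r}")
```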
ALU structure
ALU design
• The P and G units compute intermediate values from the inputs
• These values need not correspond to the carry-lookahead P and G for
non-addition functions
• The add unit can be whichever adder design is preferred
• The output unit computes the result from the sum and propagate signals
ALUs
• Not every ALU needs to implement the full set of logical
functions
• If the ALU needs to implement only a few functions, it can be simplified:
an ALU with only addition, subtraction and one or two bitwise
functions can be implemented using static gates
Multipliers
Elementary algorithm
       0110   multiplicand
     x 1001   multiplier
       0110
    + 0000    partial product
      00110
   + 0000
     000110
  + 0110
    0110110
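The elementary algorithm above can be written as a short shift-and-add loop. This sketch records the running sum after each partial product is added, so the history can be compared line by line with the worked example (names are illustrative).

```python
def shift_and_add(multiplicand, multiplier, n):
    """Add one shifted partial product per multiplier bit; keep running sums."""
    running = 0
    history = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        partial = (multiplicand << i) if bit else 0   # partial product i
        running += partial
        history.append(running)
    return running, history


product, sums = shift_and_add(0b0110, 0b1001, 4)
print(product, [bin(s) for s in sums])
```

For 0110 x 1001 the running sums are 6, 6, 6 and 54, matching 0110, 00110, 000110 and 0110110 in the worked example.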
Serial-parallel multiplier
• Used in serial-arithmetic operations
• Multiplicand can be held in place by register
• Multiplier is shifted into array
• Because the n-bit multiplier is fed in serially while the m-bit
multiplicand is held in parallel during multiplication, it is
known as a serial-parallel multiplier
Serial-parallel multiplier structure
Serial-parallel multiplier function
• The multiplier is fed in least-significant bit first and followed
by at least m zeros
• The result appears serially at the end of the multiplier chain
• A one-bit multiplier is simply an AND gate
• The sum units include a combinational full adder and a
register to hold the carry
• The chain of summation units and registers performs the
shift and add operation
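Here is a cycle-level sketch of a serial-parallel multiplier under one plausible register arrangement; it is our assumption, not read from the figure. Each of the m summation units holds a sum bit and a carry bit. Each cycle, a unit adds its AND-gate partial product, the sum bit from its neighbor (previous cycle) and its own stored carry. The product emerges LSB first from unit 0 after n multiplier bits followed by m zeros.

```python
def serial_parallel_multiply(x, y, m, n):
    """m-bit multiplicand x held in parallel; n-bit multiplier y fed serially."""
    xb = [(x >> j) & 1 for j in range(m)]            # parallel multiplicand
    s = [0] * m                                      # sum registers
    c = [0] * m                                      # carry registers
    out_bits = []
    serial_in = [(y >> t) & 1 for t in range(n)] + [0] * m   # LSB first, then m zeros
    for yt in serial_in:                             # one clock cycle per bit
        new_s, new_c = [0] * m, [0] * m
        for j in range(m):
            pp = xb[j] & yt                          # one-bit multiplier: AND gate
            s_in = s[j + 1] if j + 1 < m else 0      # neighbor's sum, previous cycle
            total = pp + s_in + c[j]                 # full adder
            new_s[j], new_c[j] = total & 1, total >> 1
        s, c = new_s, new_c
        out_bits.append(s[0])                        # result emerges serially
    return sum(bit << t for t, bit in enumerate(out_bits))
```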
Array multiplier
• Array multiplier is an efficient layout of a combinational
multiplier
• Well suited for VLSI implementation
• Array multipliers may be pipelined to decrease clock period
at the expense of latency
Array multiplier organization
multiplicand     0110
multiplier     x 1001
                 0110
                0000
               0000
              0110
product       0110110
Array multiplier structure

[Figure: rows of adders sum the partial products xjyi; outputs are the product bits P0 through P(2n-1)]
Array multiplier
• Here only the last adder has a carry chain
• The earlier additions are performed by full adders, which are
used to reduce three one-bit inputs to two one-bit outputs;
only in the last stage are all the values added with
carry propagation
• So relatively simple adders can be used for the early stages and a
faster adder reserved for the last stage
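The following sketch models the arithmetic organization, not the exact cell layout. It shows the idea behind the last point: rows of independent full adders act as 3:2 compressors that reduce three numbers to a sum word and a carry word without propagating carries. Only the final step performs a carry-propagate addition, for which Python's `+` stands in. Names are illustrative.

```python
def full_adder_row(a, b, c):
    """A row of independent full adders: three numbers in, sum and carry words out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def array_style_multiply(x, y, n):
    """Accumulate partial products in carry-save form; propagate carries once."""
    s, c = 0, 0
    for i in range(n):
        partial = (x << i) if (y >> i) & 1 else 0   # row i: x AND y_i, shifted
        s, c = full_adder_row(s, c, partial)
    return s + c                                    # last stage: carry-propagate adder
```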
Memory Elements
High-Density Memory
• Read-only memory (ROM) can be read but not written
• It is used to store data or programs that will not change
• Random access memory (RAM) can be both read and written
• RAM comes in two types: static (SRAM) and dynamic (DRAM)
High-Density Memory
• SRAM is faster but uses more power and is larger
• DRAM has a smaller layout and uses less power
• DRAM cells are slower and require the dynamically stored
values to be periodically refreshed
High-density memory architecture
Memory operation
• The address is divided into row and column fields
• A row may contain a full word or more than one word
• Selected row drives/senses bit lines in columns
• Amplifiers/drivers read/write bit lines
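This minimal sketch splits an address into row and column fields. It assumes the low-order bits select the column (the word within a row) and the remaining high-order bits select the row; that field order is an assumption, not a property of every memory. The memory size in the example is hypothetical.

```python
def split_address(addr, col_bits):
    """Return (row, column) for a memory with 2**col_bits words per row."""
    row = addr >> col_bits                   # high-order bits: row decoder
    col = addr & ((1 << col_bits) - 1)       # low-order bits: column selection
    return row, col


# Hypothetical 1024-word memory with 4 words per row: 256 rows, 2 column bits.
print(split_address(13, 2))     # (3, 1)
print(split_address(1023, 2))   # (255, 3)
```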
Read-only memory (ROM)
• The ROM core is organized as NOR gates; the pulldown transistors of
each NOR determine the programming
• Erasable ROMs require special processing
• ROMs on digital ICs are generally mask-programmed: the
placement of the pulldowns determines the ROM contents
ROM core circuit
Static RAM (SRAM)
• The core cell uses a six-transistor circuit to store a value
• The value is stored symmetrically: both true and complement
are stored on cross-coupled transistors
• SRAM retains its value as long as power is applied to the circuit
SRAM core cell
SRAM core operation
• Read:
• precharge bit and bit’ to high (VDD)
• set select line high from row decoder
• one bit line will be pulled down
• Write:
• set bit/bit’ to desired (complementary) values
• set select line high
• drive on bit lines will flip state if necessary
3-Transistor dynamic RAM (DRAM)
• The simplest dynamic RAM cell uses a three-transistor circuit
• It is large and slow
• It is denser than SRAM and, unlike one-transistor DRAM, does not
require special processing steps
• Dynamic RAM loses its value due to charge leakage, so it must be refreshed
3-Transistor DRAM core cell
3-T DRAM operation
• Value is stored on gate capacitance of t1
• Read:
• read_data’ is precharged to VDD; then set read = 1, write = 0
• t1 will pull down read_data’ if 1 is stored, else read_data’ will
remain charged (read_data’ carries complement of value stored on
t1)
• Write:
• read = 0, write = 1, write_data = value
• guard transistor writes value onto gate capacitance of t1
One-transistor dynamic RAM
• One-transistor DRAM quickly replaced the three-transistor
circuit
• It has higher packing density
One-transistor DRAM structure
• Circuit diagram of the one-transistor DRAM core cell, with its bit line and word line
One-transistor DRAM function
• The value is stored on a capacitor guarded by a single
transistor
• Setting the word line high connects the capacitor to the bit
line
• Write:
• bit line is set accordingly and capacitor is forced to the proper
value
• Read:
• The bit line is precharged before the word line is activated
• If the storage capacitor is discharged, charge sharing pulls the bit line
voltage slightly below its precharged value
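When the word line connects the cell, charge is shared between the bit line and the storage capacitor, giving $V = (C_b V_{pre} + C_s V_{cell}) / (C_b + C_s)$. This small worked example assumes a 1.8 V precharge level and a bit-line capacitance ten times the cell capacitance. Both values are illustrative, not from the slides.

```python
def bitline_after_read(v_precharge, v_cell, c_bitline, c_cell):
    """Bit-line voltage after the cell is connected (charge is conserved)."""
    return (c_bitline * v_precharge + c_cell * v_cell) / (c_bitline + c_cell)


# Assumed values: 1.8 V precharge, bit line 10x the cell capacitance (units cancel).
v0 = bitline_after_read(1.8, 0.0, 300.0, 30.0)   # cell discharged (stored 0)
v1 = bitline_after_read(1.8, 1.8, 300.0, 30.0)   # cell charged (stored 1)
print(f"stored 0: {v0:.3f} V, stored 1: {v1:.3f} V")
```

With these numbers, the difference between the two cases is only about 0.16 V. The sense amplifier must detect this small swing.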
Programmable logic array (PLA)
• Used to implement specialized logic functions
• A PLA decodes only some addresses (input values); a ROM
decodes all addresses
PLA organization
PLA structure
• AND plane, OR plane, inverters together form complete two-
level logic functions
• Both AND and OR planes are implemented as NOR circuits
• Pulldown transistors form programming/personality of PLA
• Transistors may be referred to as programming tabs
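This sketch models only the two-level logic function a PLA personality defines, not the NOR-NOR circuit. As a hypothetical personality, it uses a full adder. The AND plane generates only the seven product terms the two outputs need. A ROM would instead decode every input combination. The encoding of the planes is our own choice.

```python
from itertools import product


def pla_eval(inputs, and_plane, or_plane):
    """and_plane[t][k]: 1 = input k, 0 = complement of input k, None = absent.
    or_plane[f][t]: 1 if output f includes product term t."""
    terms = [int(all(lit is None or inputs[k] == lit for k, lit in enumerate(row)))
             for row in and_plane]
    return [int(any(terms[t] for t, used in enumerate(row) if used))
            for row in or_plane]


# Hypothetical personality: a full adder (inputs a, b, c).
AND_PLANE = [
    [0, 0, 1],      # a'b'c
    [0, 1, 0],      # a'bc'
    [1, 0, 0],      # ab'c'
    [1, 1, 1],      # abc
    [1, 1, None],   # ab
    [1, None, 1],   # ac
    [None, 1, 1],   # bc
]
OR_PLANE = [
    [1, 1, 1, 1, 0, 0, 0],   # sum   = a'b'c + a'bc' + ab'c' + abc
    [0, 0, 0, 0, 1, 1, 1],   # carry = ab + ac + bc
]

for a, b, c in product((0, 1), repeat=3):
    assert pla_eval([a, b, c], AND_PLANE, OR_PLANE) == [a ^ b ^ c, (a & b) | (a & c) | (b & c)]
```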
