DSD Subsystem Design
Introduction
• Most chips are built from a collection of subsystems:
• Adders
• Shifters
• Multipliers
• State machines etc.
Pipelining
• Simple method for reducing the clock period of long combinational
operations
• Complex combinational components can pose serious constraints on
system design if their delay is much longer than the delay of the other
components
• If the propagation time through that combinational element
determines the clock period, logic in the rest of the chip may sit idle
for most of the clock cycle while the critical element finishes
• Allows large combinational functions to be broken up into pieces
whose delays are in balance with the rest of the system components
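The effect on the clock period can be sketched with a small calculation. This is a minimal model, assuming the clock period equals the slowest register-to-register delay plus a fixed register overhead; the function name and all delay values are hypothetical.

```python
def clock_period(stage_delays_ns, register_overhead_ns=0.0):
    """Clock period is set by the slowest stage plus register overhead."""
    return max(stage_delays_ns) + register_overhead_ns


# Hypothetical chip: one 12 ns combinational block next to two 3 ns blocks.
unpipelined = clock_period([12.0, 3.0, 3.0])

# The 12 ns block is cut into three balanced 4 ns pipeline stages.
pipelined = clock_period([4.0, 4.0, 4.0, 3.0, 3.0])

print(unpipelined, pipelined)
```

In this model the 3 ns blocks no longer wait on a 12 ns critical element, at the cost of extra cycles of latency through the pipelined block.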
Data paths
• A data path is a logical and a physical structure
• bitwise logical organization
• bitwise physical design
• It is built from components which perform typical data
operations
• A data path often has an ALU, registers, and other function units
• Data is generally passed via busses
Typical data path structure
• Slice includes one bit of function units, connected by busses:
Bit-slice structure
• Many arithmetic and logical functions can be defined
recursively on bits of word
• A bit-slice is a one-bit (or n-bit) segment of an operation,
of the minimum size that ensures regularity
Shifter
Combinational shifters
• Useful for arithmetic operations, bit field extraction, etc.
• Latch-based shift register can shift only one bit per clock
cycle
• Shifting data left or right by a constant amount is a trivial
hardware operation, implemented by the appropriate
signal wiring
• A programmable shifter, however, is more complex and requires
active circuitry
Barrel shifter
• A barrel shifter can perform n-bit shifts in a single cycle
• It has an efficient layout
• Requires transmission gates and long wires
Barrel shifter structure
• Accepts 2n data inputs and n control signals, producing n
data outputs
Barrel shifter operation
• Selects arbitrary contiguous n bits out of 2n input bits
• Examples:
• right shift: data into top, 0 into bottom
• left shift: 0 into top, data into bottom
• rotate: data into top and bottom
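The selection of n contiguous bits out of 2n can be sketched as a behavioral model. This is only an illustration: how "top" and "bottom" map to the high and low halves depends on how the array is drawn, so the sketch names the halves `high` and `low` instead. All function names are hypothetical, and offsets 0..n are allowed for convenience.

```python
def barrel_select(high, low, k, n):
    """Return the n-bit window that starts k bits above the LSB
    of the 2n-bit word formed by concatenating high:low."""
    if not 0 <= k <= n:
        raise ValueError("offset k must be in 0..n")
    mask = (1 << n) - 1
    word = ((high & mask) << n) | (low & mask)
    return (word >> k) & mask


def shift_right(x, s, n):
    # Zeros above the data: the window slides up into the zero half.
    return barrel_select(0, x, s, n)


def shift_left(x, s, n):
    # Zeros below the data: the window slides down into the zero half.
    return barrel_select(x, 0, n - s, n)


def rotate_right(x, s, n):
    # Same data in both halves: bits leaving one end re-enter at the other.
    return barrel_select(x, x, s, n)


x = 0b10110011
print(format(shift_right(x, 2, 8), "08b"))   # 00101100
print(format(shift_left(x, 2, 8), "08b"))    # 11001100
print(format(rotate_right(x, 2, 8), "08b"))  # 11101100
```

All three operations use the same window selection; only the values fed to the two halves change.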
Barrel shifter layout
• Two-dimensional array of 2n (vertical) × n (horizontal) cells
• Input data travels diagonally upward
• Output wires travel horizontally
• Control signals run vertically
• Exactly one control signal is set to 1, turning on all
transmission gates in that column
Barrel shifter cell
Barrel shifter in action
Analysis
• The circuit has many transmission
gates
• Each signal must traverse only one
transmission gate
• The delay of barrel shifter is
determined by parasitic capacitances
on the wires (and transmission gate)
Adders
• Addition is the most commonly used arithmetic operation
• Adder delay is dominated by carry chain
• Carry chain analysis must consider transistor, wiring delay
• A full adder computes one-bit sum, carry as:
• $s_i = a_i \oplus b_i \oplus c_i$
• $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
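The full-adder equations can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and bits are represented as the integers 0 and 1.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c                        # sum is the XOR of the three inputs
    c_out = (a & b) | (a & c) | (b & c)  # carry is the majority of the inputs
    return s, c_out


# Exhaustive check against ordinary integer addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(a, b, c)
            assert a + b + c == s + 2 * c_out
```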
Ripple-carry adder
• Ripple-carry adder: n-bit adder built from full adders
• The addition is not complete until the (n-1)th adder has
computed its $s_{n-1}$ output
• Delay of ripple-carry adder goes through all carry bits
• It is area efficient and easy to design but slow when n is large
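A ripple-carry adder can be sketched as a loop in which each bit position waits for the carry from the previous one. This is a behavioral illustration only; the helper names are hypothetical, and bit lists are ordered least significant bit first.

```python
def to_bits(value, n):
    """n-bit list of 0/1 values, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Chain of full adders; the carry ripples from bit 0 to bit n-1."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return sum_bits, carry


s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c_out)  # 1 1, since 11 + 6 = 17 = 16 + 1
```

The sequential dependence on `carry` in the loop is the software counterpart of the carry chain that dominates the delay.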
Carry-lookahead adder
• Speeding up the adder requires speeding up the carry chain
• It breaks the carry computation into two steps
• First compute carry propagate (P), and generate (G):
• $P_i = a_i + b_i$
• $G_i = a_i b_i$
• If $G_i = 1$, a carry is generated in the ith bit of the sum
• If $P_i = 1$, the carry from the (i-1)th bit is propagated to the next
bit
• Then compute sum and carry from P and G:
• $s_i = c_i \oplus P_i \oplus G_i$ (equivalent to $c_i \oplus a_i \oplus b_i$)
• $c_{i+1} = G_i + P_i c_i$ (equivalent to $a_i b_i + b_i c_i + c_i a_i$)
Carry-lookahead expansion
• Can recursively expand carry formula:
• $c_{i+1} = G_i + P_i (G_{i-1} + P_{i-1} c_{i-1})$
• $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2})$
• Expanded formula does not depend on intermediate carries
• Allows carry for each bit to be computed independently
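The fully expanded carry formula can be sketched in Python. Each carry is computed directly from P, G, and the carry-in, with no use of intermediate carries. This is a behavioral illustration with hypothetical function names, not a model of any particular lookahead depth; bit lists are least significant bit first.

```python
def lookahead_carries(a_bits, b_bits, c0=0):
    """Return (P, G, carries), where carries[i] is the carry into bit i."""
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate: P_i = a_i + b_i
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate:  G_i = a_i b_i
    carries = [c0]
    for i in range(len(p)):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_0 c_0
        c = 0
        prefix = 1  # running product P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & c0
        carries.append(c)
    return p, g, carries


def cla_add(a_bits, b_bits, c0=0):
    p, g, c = lookahead_carries(a_bits, b_bits, c0)
    sum_bits = [c[i] ^ p[i] ^ g[i] for i in range(len(p))]
    return sum_bits, c[-1]


print(cla_add([1, 1, 0, 1], [0, 1, 1, 0]))  # 11 + 6, bits LSB first
```

The inner loop reads only `p`, `g`, and `c0`, so every carry could be evaluated in parallel in hardware; the cost is that the expressions grow with the bit position.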
Depth-4 carry-lookahead
• The structure of a carry lookahead adder is:
Analysis
[Figure: multiplication as a sum of partial products $x_i y_j$, producing product bits P(2n-1), P(2n-2), ..., P0]
Array multiplier
• Here only the last adder has a carry chain
• The earlier additions are performed by full adders, which
reduce three one-bit inputs to two one-bit outputs;
only in the last stage are all the values accumulated with
carries
• So relatively simple adders can be used for early stages and
faster adder reserved for the last stage
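The idea of deferring carry propagation to the last stage can be sketched at the word level. This illustrates the three-to-two reduction only, not the gate-level array layout; the function names are hypothetical, and Python integers stand in for bit vectors.

```python
def carry_save_step(x, y, z):
    """Bitwise full adders: reduce three words to a sum word and a carry word.
    No carry travels between bit positions inside this step."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def multiply(x, y, n):
    """Multiply two n-bit unsigned values by accumulating partial products."""
    sum_word, carry_word = 0, 0
    for j in range(n):
        partial = (x << j) if (y >> j) & 1 else 0  # x times bit j of y
        sum_word, carry_word = carry_save_step(sum_word, carry_word, partial)
    # Only this final addition needs a carry chain (the fast adder).
    return sum_word + carry_word


print(multiply(13, 11, 4))  # 143
```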
Memory Elements
High-Density Memory
• Read-only memory (ROM) can be read but not written
• It is used to store data or program that will not change
• Random access memory (RAM) can be read or written as
needed
• RAM comes in two types: static (SRAM) and dynamic (DRAM)
• SRAM is faster but uses more power and is larger
• DRAM has a smaller layout and uses less power
• DRAM cells are slower and require the dynamically stored
values to be periodically refreshed
High-density memory architecture
Memory operation
• Address is divided into row, column
• Row may contain full word or more than one word
• Selected row drives/senses bit lines in columns
• Amplifiers/drivers read/write bit lines
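The address split can be sketched as follows. The sketch assumes the row is taken from the high-order address bits and the column from the low-order bits; the source does not fix this assignment, and the function name and field widths are hypothetical.

```python
def split_address(addr, column_bits):
    """Split an address into (row, column) fields."""
    row = addr >> column_bits                 # assumed: high-order bits pick the row
    column = addr & ((1 << column_bits) - 1)  # assumed: low-order bits pick the column
    return row, column


# A 6-bit address with 2 column bits: 16 rows, 4 words per row.
print(split_address(0b101101, 2))  # (11, 1)
```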
Read-only memory (ROM)
• ROM core is organized as NOR gates—pulldown transistors of
NOR determine programming
• Erasable ROMs require special processing
• ROMs on digital ICs are generally mask-programmed—
placement of pulldowns determines ROM contents
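How pulldown placement determines the ROM contents can be sketched behaviorally. Each bit line is modeled as a NOR of the word lines that have a pulldown on it, so a selected row with a pulldown reads 0 and one without reads 1. The function names and the example contents are hypothetical.

```python
def rom_bit_line(pulldown_column, word_lines):
    """NOR of the word lines that have a pulldown on this bit line."""
    pulled_down = any(w and p for w, p in zip(word_lines, pulldown_column))
    return 0 if pulled_down else 1


def read_rom(pulldowns, row):
    """pulldowns[r][c] is 1 where a pulldown transistor is placed."""
    word_lines = [1 if r == row else 0 for r in range(len(pulldowns))]
    columns = range(len(pulldowns[0]))
    return [rom_bit_line([line[c] for line in pulldowns], word_lines)
            for c in columns]


pulldowns = [[1, 0, 1],
             [0, 0, 1]]
print(read_rom(pulldowns, 0))  # [0, 1, 0]
print(read_rom(pulldowns, 1))  # [1, 1, 0]
```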
ROM core circuit
Static RAM (SRAM)
• Core cell uses six-transistor circuit to store value
• Value is stored symmetrically—both true and complement
are stored on cross-coupled transistors
• SRAM retains value as long as power is applied to circuit
SRAM core cell
SRAM core operation
• Read:
• precharge bit and bit’ to high (VDD)
• set select line high from row decoder
• one bit line will be pulled down
• Write:
• set bit/bit’ to desired (complementary) values
• set select line high
• drive on bit lines will flip state if necessary
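The read and write sequences can be sketched as a purely logical model that ignores timing, transistor sizing, and sense amplifiers. The class and method names are hypothetical.

```python
class SramCell:
    """Behavioral model of one SRAM core cell."""

    def __init__(self, value=0):
        self.q = value  # true value; the complement 1 - q is stored alongside

    def read(self):
        bit, bit_bar = 1, 1  # precharge both bit lines high
        # Select goes high: the side of the cell holding 0 pulls its line down.
        if self.q == 0:
            bit = 0
        else:
            bit_bar = 0
        return bit, bit_bar

    def write(self, value):
        bit, bit_bar = value, 1 - value  # drive complementary values
        # Select goes high: the bit-line drivers overpower the cell if needed.
        self.q = bit


cell = SramCell(0)
print(cell.read())  # (0, 1)
cell.write(1)
print(cell.read())  # (1, 0)
```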
Three-transistor dynamic RAM (DRAM)
• The simplest dynamic RAM cell uses a three-transistor circuit
• It is large and slow
• It is denser than SRAM and, unlike one-transistor DRAM, does not
require special processing steps
• Dynamic RAM loses value due to charge leakage—must be refreshed
Three-transistor DRAM core cell
3-T DRAM operation
• Value is stored on gate capacitance of t1
• Read:
• read_data’ is precharged to VDD; then set read = 1, write = 0
• t1 will pull down read_data’ if 1 is stored, else read_data’ will
remain charged (read_data’ carries complement of value stored on
t1)
• Write:
• read = 0, write = 1, write_data = value
• guard transistor writes value onto gate capacitance of t1
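The inverted read can be sketched with a logical model that ignores leakage and refresh. The class name and attribute names are hypothetical; `read` returns the value seen on read_data’.

```python
class ThreeTransistorDram:
    """Behavioral model of a 3-T DRAM cell (no leakage modeled)."""

    def __init__(self):
        self.t1_gate = 0  # value held on the gate capacitance of t1

    def write(self, write_data):
        # write = 1, read = 0: the guard transistor passes write_data to t1's gate.
        self.t1_gate = write_data

    def read(self):
        read_data_bar = 1  # precharged to VDD
        # read = 1, write = 0: t1 pulls the line down only if a 1 is stored.
        if self.t1_gate == 1:
            read_data_bar = 0
        return read_data_bar


cell = ThreeTransistorDram()
cell.write(1)
print(cell.read())  # 0, the complement of the stored 1
```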
One-transistor dynamic RAM
• One-transistor DRAM quickly took the place of the three-transistor
circuit
• It offers higher packing density
One-transistor DRAM structure
• Circuit diagram for one-transistor DRAM core cell (signals: bit line, word line)
One-transistor DRAM function
• The value is stored on a capacitor guarded by a single
transistor
• Setting the word line high connects the capacitor to the bit
line
• Write:
• bit line is set accordingly and capacitor is forced to the proper
value
• Read:
• bit line is precharged before the word line is activated
• If the storage capacitor is discharged, the voltage on the bit line is lower
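The read can be illustrated with an ideal charge-sharing calculation: when the word line goes high, the storage capacitor and the precharged bit line settle to a common voltage. This is a sketch under idealized assumptions (no leakage, no transistor threshold drop); the function name, voltages, and capacitance values are hypothetical.

```python
def bit_line_after_read(v_cell, v_precharge, c_cell, c_bit):
    """Common voltage after the cell capacitor shares charge with the bit line."""
    return (c_cell * v_cell + c_bit * v_precharge) / (c_cell + c_bit)


# Hypothetical values: bit-line capacitance nine times the cell capacitance.
stored_0 = bit_line_after_read(v_cell=0.0, v_precharge=1.0, c_cell=1.0, c_bit=9.0)
stored_1 = bit_line_after_read(v_cell=1.0, v_precharge=1.0, c_cell=1.0, c_bit=9.0)
print(stored_0 < stored_1)  # True: a discharged cell leaves the bit line lower
```

Because the bit-line capacitance is typically much larger than the cell capacitance, the voltage difference is small in this model and must be detected by the sense amplifier.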
Programmable logic array (PLA)
• Used to implement specialized logic functions
• A PLA decodes only some addresses (input values); a ROM
decodes all addresses
PLA organization
PLA structure
• AND plane, OR plane, inverters together form complete
two-level logic functions
• Both AND and OR planes are implemented as NOR circuits
• Pulldown transistors form programming/personality of PLA
• Transistors may be referred to as programming tabs
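The NOR-NOR organization can be sketched behaviorally. The sketch assumes input inverters that supply each input and its complement, and output inverters after the OR plane; the function names and the example personality (an XOR of two inputs) are hypothetical.

```python
def nor(values):
    return 0 if any(values) else 1


def pla(inputs, and_plane, or_plane):
    """Evaluate a PLA whose planes are both built from NOR lines.
    A 1 in a plane row marks a pulldown transistor (programming tab)."""
    # Literal columns, in order: x0, x0', x1, x1', ...
    literals = []
    for x in inputs:
        literals += [x, 1 - x]
    # AND plane: each product line is a NOR over the literals with a pulldown.
    products = [nor(lit for lit, tab in zip(literals, row) if tab)
                for row in and_plane]
    # OR plane: NOR over the product lines with a pulldown, then an inverter.
    return [1 - nor(p for p, tab in zip(products, row) if tab)
            for row in or_plane]


# a XOR b = a b' + a' b.  Since NOR(a', b) = a b', the pulldowns sit on the
# complements of the literals wanted in each product term.
and_plane = [[0, 1, 1, 0],   # pulldowns on a', b  -> product a b'
             [1, 0, 0, 1]]   # pulldowns on a, b'  -> product a' b
or_plane = [[1, 1]]          # output sums both products

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pla([a, b], and_plane, or_plane))
```

Only two product terms are decoded here, which is the difference from a ROM: a ROM with two inputs would decode all four input combinations.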