Write assist
Weaker PMOS is needed
Supply of the cells to be written is reduced by ~200mV
Only the columns being written get the lower supply voltage: a power decoder is needed
Reference generator
Write voltage (VWR): bi-directional, data-dependent current flow
Old data: VDD → VWR
New data: sink data from VWR to charge up the new 1 node
Writing old data pulls up VWR, so a push-pull stage is needed at the Ref. Generator
Read/Write-assist circuits
Keeps WL voltage in check (lower for stable read)
Charge redistribution between cell VDD and down Vdd
Transistor Level View of Core
[Figure labels: Precharge, Row Decode, Column Decode, Sense Amp]
SRAM, Putting it all together
2^n rows, 2^m * k columns
n + m address lines, k bits data width
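The relation between address bits and array dimensions can be checked with a short sketch. The function name and the example values (n = 8, m = 2, k = 16) are hypothetical and chosen only for illustration.

```python
def array_shape(n: int, m: int, k: int) -> tuple[int, int]:
    """Return (rows, columns) for an array with 2^n rows and 2^m * k columns."""
    rows = 2 ** n
    columns = (2 ** m) * k
    return rows, columns


# Hypothetical example: 8 row-address bits, 2 column-address bits, 16-bit words.
n, m, k = 8, 2, 16
rows, columns = array_shape(n, m, k)
total_bits = rows * columns      # storage cells in the array
words = 2 ** (n + m)             # addressable words with n + m address lines
print(rows, columns, total_bits, words)
```

The total cell count equals the number of addressable words times the data width k, which is why n + m address lines suffice.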
Hierarchical Array Architecture
[Figure: subblocks driven by Row Address, Column Address and Block Address; Control Circuitry, Block Selector, Global Amplifier/Driver, Global Data Bus, I/O. Annotations: Subblocks; Select 1 column / subblock; 1 / output bit]
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
1 sense amp / subblock
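A minimal sketch of how a flat address might be split into block, row and column fields, so that only one block is enabled per access. The field order (block in the most significant bits, column in the least significant bits) and the field widths are assumptions made for illustration; the slides do not specify them.

```python
def split_address(addr: int, block_bits: int, row_bits: int, col_bits: int) -> tuple[int, int, int]:
    """Split a flat address into (block, row, column).

    Assumed layout, MSB to LSB: block | row | column.
    """
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    block = (addr >> (col_bits + row_bits)) & ((1 << block_bits) - 1)
    return block, row, col


def block_enables(block: int, block_bits: int) -> list[int]:
    """One-hot block select: only the addressed block is activated."""
    return [1 if b == block else 0 for b in range(1 << block_bits)]


# Hypothetical widths: 4 blocks, 128 rows per block, 8 columns per subblock.
addr = (2 << 10) | (5 << 3) | 3
block, row, col = split_address(addr, block_bits=2, row_bits=7, col_bits=3)
print(block, row, col, block_enables(block, 2))
```

Because exactly one entry of the enable vector is 1, the word lines and bit lines of all other blocks stay idle, which is the source of the power saving listed above.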
Standalone SRAM Floorplan Example
Divided bit-line structure
SRAM Partitioning
Partitioned Bitline
SRAM Partitioning
Divided Wordline Arch
Partitioning summary
• Partitioning involves a trade-off between area, power and speed
• For high-speed designs, use short blocks (e.g. 64 rows x 128 columns)
• Keep local bitline heights small
• For low-power designs, use tall narrow blocks (e.g. 256 rows x 64 columns)
Address Transition Detection
Provides Clock for Asynch RAMs
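[Figure: ATD circuit. Each address input A0, A1, ..., AN-1 passes through a DELAY element (td); the per-bit outputs are combined into the ATD signal. Labels: VDD, DELAY, td, ATD]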
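A behavioral sketch of address transition detection in discrete time steps. It assumes the common scheme in which each address bit is compared (XOR) with a delayed copy of itself and the per-bit results are ORed into one ATD pulse; the function name, the trace and the delay of 2 steps are hypothetical.

```python
def atd_pulse(address_trace: list[int], n_bits: int, delay: int) -> list[int]:
    """Return 1 at each time step where any address bit differs from its delayed copy."""
    mask = (1 << n_bits) - 1
    pulse = []
    for t, addr in enumerate(address_trace):
        delayed = address_trace[max(t - delay, 0)]   # output of the DELAY element
        changed = (addr ^ delayed) & mask            # per-bit XOR with the delayed copy
        pulse.append(1 if changed else 0)            # OR across all address bits
    return pulse


# Hypothetical 2-bit address that changes at t = 2 and again at t = 6.
trace = [0b00, 0b00, 0b01, 0b01, 0b01, 0b01, 0b11, 0b11, 0b11, 0b11]
print(atd_pulse(trace, n_bits=2, delay=2))
```

In this model each address change produces a pulse whose width equals the delay td, and that pulse can serve as the internal clock of an asynchronous RAM.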
Row Decoders
• Collection of 2^R complex logic gates organized in a regular, dense fashion
• (N)AND decoder 9->512
WL(0) = !A8!A7!A6!A5!A4!A3!A2!A1!A0
…
WL(511) = A8A7A6A5A4A3A2A1A0
• NOR decoder 9->512
WL(0) = !(A8+A7+A6+A5+A4+A3+A2+A1+A0)
…
WL(511) = !(!A8+!A7+!A6+!A5+!A4+!A3+!A2+!A1+!A0)
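The AND and NOR forms above are related by De Morgan's law, which a short sketch can confirm. The function names are hypothetical; the logic follows the WL(0) and WL(511) expressions on this slide, generalized to any row index.

```python
def wl_and(addr: int, row: int, n_bits: int = 9) -> int:
    """AND form: WL(row) is 1 when every address literal matches the row index."""
    result = 1
    for i in range(n_bits):
        a = (addr >> i) & 1
        want = (row >> i) & 1
        literal = a if want else 1 - a        # Ai for a 1 bit in row, !Ai for a 0 bit
        result &= literal
    return result


def wl_nor(addr: int, row: int, n_bits: int = 9) -> int:
    """NOR form: OR the complemented literals, then invert the result."""
    any_mismatch = 0
    for i in range(n_bits):
        a = (addr >> i) & 1
        want = (row >> i) & 1
        literal = (1 - a) if want else a      # complement of the AND-form literal
        any_mismatch |= literal
    return 1 - any_mismatch


# For a few sample addresses, exactly one of the 512 word lines is selected.
for addr in (0, 5, 511):
    lines = [wl_and(addr, row) for row in range(512)]
    assert sum(lines) == 1 and lines[addr] == 1
    assert lines == [wl_nor(addr, row) for row in range(512)]
```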
A NAND decoder using 2-input pre-decoders
[Figure: word lines WL0 and WL1 driven by NAND gates whose inputs come from 2-input predecoders on the address pairs (A0, A1) and (A2, A3), in true and complemented form]
Splitting decoder into two or more logic layers produces a faster and cheaper implementation
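A functional sketch of a two-layer decoder for 4 address bits: the first layer predecodes the pairs (A0, A1) and (A2, A3), and the second layer combines one line from each group. The function names are hypothetical, and the sketch models logic values only, not the NAND/inverter polarity of a real implementation.

```python
def predecode_pair(a_lo: int, a_hi: int) -> list[int]:
    """2-input predecoder: four lines, one per combination of a bit pair."""
    return [
        (1 - a_hi) & (1 - a_lo),   # !a_hi !a_lo
        (1 - a_hi) & a_lo,         # !a_hi  a_lo
        a_hi & (1 - a_lo),         #  a_hi !a_lo
        a_hi & a_lo,               #  a_hi  a_lo
    ]


def decode4(addr: int) -> list[int]:
    """4-to-16 decoder built from two predecoders and 2-input second-layer gates."""
    a = [(addr >> i) & 1 for i in range(4)]
    p01 = predecode_pair(a[0], a[1])
    p23 = predecode_pair(a[2], a[3])
    # Each word line needs only one line from each predecoded group.
    return [p01[row & 3] & p23[row >> 2] for row in range(16)]


print(decode4(6))
```

Each word-line gate now has 2 inputs instead of 4, and the predecoders are shared by all 16 rows.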
Row Decoders (cont’d)
[Figure: address inputs A0, A1, A2, A3 generate the predecoded lines A0/ A1/, A0 A1/, A0/ A1, A0 A1 and A2/ A3/, A2 A3/, A2/ A3, A2 A3, which drive the row outputs R0/, R1/, R2/, … and so forth]
Dynamic Decoders
[Figure: left, dynamic 2-to-4 NOR decoder; right, 2-to-4 MOS dynamic NAND decoder. Labels: precharge devices, VDD, GND, φ, WL0 to WL3, inputs A0, A1 in true and complemented form]
Propagation delay is primary concern
Dynamic NOR Row Decoder
[Figure labels: Vdd, WL0, WL1, WL2, WL3, A0, !A0, A1, !A1, Precharge/]
Dynamic NAND Row Decoder
[Figure labels: WL0, WL1, WL2, WL3, !A0, A0, !A1, A1, Precharge/]
Decoders
• n:2^n decoder consists of 2^n n-input AND gates
• One needed for each row of memory
• Build AND from NAND or NOR gates
• Make devices on address line minimal size
• Scale devices on decoder O/P to drive word lines
[Figure: 2:4 decoder with inputs A1, A0 and outputs word0 to word3, shown in static CMOS and in pseudo-nMOS, annotated with relative transistor sizes]
Row Decoders
Decoder Designs
Decoder Layout
• Decoders must be pitch-matched to SRAM cell
• Requires very skinny gates
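[Figure: decoder layout. Address lines A3, A2, A1, A0 in true and complemented form run across a NAND gate and a buffer inverter placed between VDD and GND; the output is word]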
Large Decoders
• For n > 4, NAND gates become slow
• Break large gates into multiple smaller gates
[Figure: 4:16 decoder with inputs A3, A2, A1, A0 and outputs word0, word1, word2, word3, ..., word15, built from multiple smaller gates]
Predecoding
• Group address bits in predecoder
• Saves area
• Same path effort
[Figure: inputs A3, A2, A1, A0 feed predecoders that produce 1-of-4 hot predecoded lines, which drive word0, word1, word2, word3, ..., word15]
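The area claim can be illustrated with a rough count of gate inputs, used here as a proxy for transistor count. This is a sketch under stated assumptions: inverters and word-line buffers are ignored, all predecode groups have the same size, and the function names are hypothetical.

```python
def direct_inputs(n: int) -> int:
    """Gate inputs for 2^n word-line gates with n inputs each."""
    return (2 ** n) * n


def predecoded_inputs(n: int, group: int) -> int:
    """Gate inputs when address bits are predecoded in groups of `group` bits."""
    if n % group != 0:
        raise ValueError("n must be a multiple of the group size in this sketch")
    groups = n // group
    predecoder = groups * (2 ** group) * group   # shared first-layer gates
    final = (2 ** n) * groups                    # one input per group at each word line
    return predecoder + final


for n in (4, 8):
    print(n, direct_inputs(n), predecoded_inputs(n, group=2))
```

For the 4-bit decoder of this slide the count drops from 64 to 48 gate inputs, and the gap widens as n grows.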
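Split Row Decoder
[Figure: Address<6:0> (7 lines, true and complemented) feeds predecode gates (*8 per group) with outputs !(!A0!A1!A2) ... !(A0A1A2); final-stage gates (*128 on each side) drive WL0 ... WL127. Final-stage expression shown: !(!(!A0!A1!A2) + !(!A3!A4!A5) + !A6)]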
Sense Amplifier: Why?
• Bit line cap significant for large array
• If each cell contributes 2fF,
• for 256 cells, 512fF plus wire cap
• Pull-down resistance is about 15K
• RC ≈ 7.5ns! (assuming ∆V = Vdd)
τ = RC · ∆V / Vdd
• Cannot easily change R, C, or Vdd, but can change ∆V, i.e. the smallest sensed voltage
• Can reliably sense ∆V smaller than 50mV
[Figure labels: cell pull-down Xtor resistance, cell current]
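The numbers on this slide can be reproduced with a short calculation. The sketch uses the slide's 2 fF per cell, 256 cells and 15 kΩ pull-down resistance; wire capacitance is ignored, and Vdd = 1.0 V is an assumed value chosen only for illustration.

```python
def bitline_delay(n_cells: int, c_per_cell: float, r_pulldown: float,
                  delta_v: float, vdd: float) -> float:
    """tau = R * C * delta_v / vdd, with C = n_cells * c_per_cell (wire cap ignored)."""
    c_bitline = n_cells * c_per_cell
    return r_pulldown * c_bitline * delta_v / vdd


VDD = 1.0  # assumed supply voltage, not given on the slide

full_swing = bitline_delay(256, 2e-15, 15e3, delta_v=VDD, vdd=VDD)
small_swing = bitline_delay(256, 2e-15, 15e3, delta_v=0.05, vdd=VDD)
print(f"full swing:  {full_swing * 1e9:.2f} ns")
print(f"50 mV swing: {small_swing * 1e9:.3f} ns")
```

With these values the full-swing product is about 7.7 ns, close to the slide's 7.5 ns figure, and sensing a 50 mV swing shortens the delay by the ratio ∆V / Vdd.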
Sense Amplifiers
tp = (Cb · ∆V) / Icell
Cb is large and Icell is small, so make ∆V as small as possible
Idea: Use Sense Amplifier
[Figure: small transition at input → s.a. → output]
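A minimal sketch of the delay relation tp = Cb · ∆V / Icell, showing how the delay scales with the required bit-line swing. The values Cb = 512 fF and Icell = 50 µA are assumptions for illustration only.

```python
def sense_delay(c_bitline: float, delta_v: float, i_cell: float) -> float:
    """tp = Cb * delta_v / Icell: time for the cell current to develop delta_v on the bit line."""
    return c_bitline * delta_v / i_cell


C_B = 512e-15    # assumed bit-line capacitance in farads
I_CELL = 50e-6   # assumed cell current in amperes

for delta_v in (1.0, 0.2, 0.05):
    tp = sense_delay(C_B, delta_v, I_CELL)
    print(f"delta_v = {delta_v:.2f} V -> tp = {tp * 1e9:.3f} ns")
```

Since Cb and Icell are set by the array and the cell, the swing ∆V is the only term a sense amplifier can reduce, and the delay falls in direct proportion to it.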
Differential Sensing - SRAM
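[Figure: (a) SRAM sensing scheme: SRAM cell i on WLi, bit lines BL, precharge (PC) and equalize (EQ) devices, Diff. Sense Amp with inputs x, outputs y, enable SE and data outputs D. (b) Double-ended Current Mirror Amplifier: transistors M1 to M5, inputs x, outputs y, enable SE. (c) Cross-Coupled Amplifier: inputs x, outputs y, enable SE]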