CENG 4480
Embedded System Development & Applications
Lecture 10: Memory 2
Bei Yu
CSE Department, CUHK
(Latest update: October 27, 2021)
Fall 2021
CENG4480 vs. CENG3420
CENG3420:
• architecture perspective
• memory coherence
• data addressing
CENG4480:
• more details on how data is stored
2/48
Memory Arrays
3/48
SRAM
SRAM
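• What if we add feedback to a pair of inverters?
[Figure: two inverters in series; feeding the output back to the input holds the value]
• Usually drawn as a ring of cross-coupled inverters
• Stable way to store one bit of information (as long as power is applied)
[Figure: the cross-coupled inverters in their two stable states, (0, 1) and (1, 0)]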
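A minimal logic-level sketch in Python, assuming ideal inverters and ignoring analog effects such as the metastable midpoint. It checks which (Q, Q_b) pairs the inverter ring can hold without the feedback changing them. `inv` and `is_stable` are illustrative names, not from the lecture.

```python
def inv(x: int) -> int:
    """Ideal inverter: logic 0 -> 1, logic 1 -> 0."""
    return 1 - x


def is_stable(q: int, q_b: int) -> bool:
    """A ring state is stable if each inverter's output already equals
    NOT of its input, so the feedback leaves it unchanged."""
    return inv(q) == q_b and inv(q_b) == q


# Enumerate all four (Q, Q_b) combinations and keep the stable ones.
stable_states = [(q, q_b) for q in (0, 1) for q_b in (0, 1) if is_stable(q, q_b)]
print(stable_states)  # [(0, 1), (1, 0)]
```

Exactly two states survive, which is why the ring stores one bit.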
5/48
How to change the value stored?
• Replace each inverter with a NAND gate
• The result is an RS (set-reset) latch
A B | A NAND B
0 0 | 1
0 1 | 1
1 0 | 1
1 1 | 0
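With NAND gates the latch inputs are active-low: driving S̄ to 0 sets Q = 1, driving R̄ to 0 resets Q = 0, and S̄ = R̄ = 1 holds the stored value (S̄ = R̄ = 0 forces Q = Q̄ = 1 and is normally avoided). The Python sketch below iterates the two cross-coupled NAND gates until their outputs stop changing. `settle` and its argument names are hypothetical, chosen for illustration.

```python
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


def settle(s_b: int, r_b: int, q: int, q_b: int) -> tuple[int, int]:
    """Iterate the cross-coupled NAND pair from state (q, q_b) until it stops
    changing. s_b and r_b are the active-low set and reset inputs."""
    for _ in range(10):
        new_q = nand(s_b, q_b)
        new_q_b = nand(r_b, new_q)
        if (new_q, new_q_b) == (q, q_b):
            return q, q_b
        q, q_b = new_q, new_q_b
    raise RuntimeError("latch did not settle")


state = (0, 1)
state = settle(0, 1, *state)  # set:   (1, 0)
state = settle(1, 1, *state)  # hold:  (1, 0)
state = settle(1, 0, *state)  # reset: (0, 1)
print(state)                  # (0, 1)
```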
6/48
12T SRAM Cell
• Basic building block: SRAM Cell
• Holds one bit of information, like a latch
• Must be read and written
• 12-transistor (12T) SRAM cell
• Use a simple latch connected to bitline
• 46 × 75 λ unit cell
7/48
nMOS, pMOS, Inverter
• nMOS:
• Gate = 1, transistor is ON
• There is then a conducting path between source and drain
• pMOS:
• Gate = 0, transistor is ON
• There is then a conducting path between source and drain
• Inverter:
• Q = NOT (A)
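A switch-level sketch in Python of the rules above, assuming the standard static CMOS inverter topology (pMOS from output to VDD, nMOS from output to GND). The function names are illustrative.

```python
def nmos_on(gate: int) -> bool:
    """nMOS conducts when its gate is 1."""
    return gate == 1


def pmos_on(gate: int) -> bool:
    """pMOS conducts when its gate is 0."""
    return gate == 0


def cmos_inverter(a: int) -> int:
    """pMOS connects Q to VDD, nMOS connects Q to GND; for a logic input,
    exactly one of them conducts."""
    pull_up = pmos_on(a)
    pull_down = nmos_on(a)
    assert pull_up != pull_down  # never both on (short) or both off (floating)
    return 1 if pull_up else 0


print([cmos_inverter(a) for a in (0, 1)])  # [1, 0]
```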
8/48
6T SRAM Cell
• Used in most commercial chips
• A pair of weak cross-coupled inverters
• Data stored in cross-coupled inverters
• Compared with 12T SRAM, 6T SRAM:
• (+) reduce area
• (-) much more complex control
9/48
6T SRAM Read
• Precharge both bitlines high
• Then turn on wordline
• One of the two bitlines will be pulled down by the cell
• Read stability
• A must not flip
• N1 >> N2
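A logic-level Python sketch of the read sequence, plus a crude resistive-divider estimate of read stability. Assumptions not stated on the slide: bit connects to node A and bit_b to A_b; N1 is the pull-down nMOS on node A and N2 the access nMOS between A and bit; transistors are treated as fixed conductances with hypothetical values. Real bitlines only swing partially before a sense amplifier resolves them.

```python
def sram_read(a: int) -> tuple[int, int]:
    """Logic-level 6T read: returns (bit, bit_b) after the wordline is raised.
    Assumes bit connects to node A and bit_b to node A_b via access nMOS."""
    a_b = 1 - a
    bit, bit_b = 1, 1  # step 1: precharge both bitlines high
    # step 2: wordline on; the node storing 0 discharges its bitline
    if a == 0:
        bit = 0
    if a_b == 0:
        bit_b = 0
    return bit, bit_b


def read_bump(vdd: float, g_pulldown: float, g_access: float) -> float:
    """Crude divider estimate of the voltage on the 0-node during a read:
    the precharged bitline pushes current in through the access nMOS (N2)
    while the pull-down nMOS (N1) sinks it to ground. Hypothetical values."""
    return vdd * g_access / (g_access + g_pulldown)


print(sram_read(0))              # (0, 1)
print(read_bump(1.0, 2.0, 1.0))  # about 0.33
```

A stronger pull-down (larger `g_pulldown`, i.e. N1 >> N2) keeps the bump further below the inverter's trip point, so node A does not flip.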
10/48
EX: 6T SRAM Read
• Question 1: A = 0, A_b = 1, discuss the behavior:
• Question 2: At least how many bit lines to finish read?
11/48
6T SRAM Write
• Drive one bitline high, the other low
• Then turn on wordline
• Bitlines overpower cell with new value
• Writability
• Must overpower feedback inverter
• N4 >> P2
• N2 >> P1 (symmetry)
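A Python sketch of the writability condition, reduced to a comparison of drive strengths. Assumed naming (not spelled out on the slide): when writing 1 into a cell holding A = 0, bit_b is driven low, and N4 is the access nMOS on the A_b side fighting P2, the pMOS holding A_b high. The strengths are hypothetical relative numbers, not device data.

```python
def sram_write(a: int, new_value: int, access: float, pullup: float) -> int:
    """Return the value on node A after a write attempt.

    The bitlines are driven to (new_value, not new_value), then the wordline
    turns on. If the value changes, the node currently at 1 must be pulled
    low through its access nMOS against the pMOS holding it high. access and
    pullup are hypothetical relative drive strengths (e.g. N4 vs. P2)."""
    if new_value == a:
        return a  # nothing to overpower
    return new_value if access > pullup else a


print(sram_write(0, 1, access=2.0, pullup=1.0))  # 1: access nMOS wins
print(sram_write(0, 1, access=0.5, pullup=1.0))  # 0: write fails
```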
12/48
EX: 6T SRAM Write
• Question 1: A = 0, A_b = 1, discuss the behavior:
• Question 2: At least how many bit lines to finish write?
13/48
6T SRAM Sizing
• High bitlines must not overpower inverters during reads
• But low bitlines must write new value into cell
14/48
Memory Arrays
15/48
DRAM
Dynamic RAM (DRAM)
• Basic Principle: Storage of information on capacitors
• Charge or discharge the capacitor to change the stored value
• Use of transistor as "switch" to:
• Store charge
• Charge or discharge
17/48
4T DRAM Cell
Remove the two pMOS transistors from the static RAM cell to get a four-transistor dynamic RAM cell.
• Data must be refreshed regularly
• Dynamic cells must be designed very carefully
• Data stored as charge on gate capacitors (complementary nodes)
18/48
3T DRAM Cell
• No constraints on device ratios
• Reads are non-destructive
• Value stored at node X when writing a "1" is VDD − VT (one threshold drop across the nMOS write transistor)
19/48
3T DRAM Layout
• Cell area: 576 λ² for 3T DRAM vs. 1092 λ² for 6T SRAM
• Can be simplified further
20/48
1T DRAM Cell
• A sense amplifier is needed to read the cell
21/48
1T DRAM Cell
• Read
• Precharge the large tank (bitline) to VDD/2
• If Ts = 0, large tank: VDD/2 − V1
• If Ts = 1, large tank: VDD/2 + V1
• V1 is very small
• Need sense amp
22/48
1T DRAM Cell
• Write: Cs is charged or discharged by asserting WL and BL
• Read: Charge redistribution takes place between bit line and storage capacitance
• Voltage swing is small; typically around 250 mV
23/48
EX. 1T DRAM Cell
• Question: VDD = 4 V, CS = 100 pF, CBL = 1000 pF. What is the voltage swing ΔV?
• Note: $\Delta V = \frac{V_{DD}}{2} \cdot \frac{C_S}{C_S + C_{BL}}$
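A Python sketch that evaluates the formula for the exercise values and cross-checks it with charge conservation, assuming (as on the read slide) that the bitline is precharged to VDD/2 and the cell holds 0 or VDD. Function names are illustrative.

```python
def bitline_after_sharing(stored_bit: int, vdd: float, c_s: float, c_bl: float) -> float:
    """Bitline voltage after the wordline connects C_S to a bitline
    precharged to VDD/2. Charge conservation:
    C_S*V_cell + C_BL*V_pre = (C_S + C_BL)*V_final."""
    v_pre = vdd / 2
    v_cell = vdd if stored_bit else 0.0
    return (c_s * v_cell + c_bl * v_pre) / (c_s + c_bl)


def swing(vdd: float, c_s: float, c_bl: float) -> float:
    """|Delta V| = (VDD/2) * C_S / (C_S + C_BL)."""
    return (vdd / 2) * c_s / (c_s + c_bl)


dv = swing(4.0, 100e-12, 1000e-12)
print(f"{dv * 1000:.0f} mV")  # 182 mV
# Same magnitude from charge conservation, for a stored 1 above the 2 V precharge:
print(bitline_after_sharing(1, 4.0, 100e-12, 1000e-12) - 2.0)
```

Because CBL is ten times CS, only about 1/11 of the VDD/2 difference appears on the bitline, which is why a sense amplifier is needed.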
24/48
SRAM vs. DRAM
• Static (SRAM)
• Data stored as long as supply is applied
• Large (6 transistors/cell)
• Fast
• Compatible with current CMOS manufacturing
• Dynamic (DRAM)
• Periodic refresh required
• Small (1-3 transistors/cell)
• Slower
• Requires additional process steps for trench capacitors
25/48
Array Architecture
Array Architecture
• 2^n words of 2^m bits each
• Good regularity - easy to design
27/48
SRAM Memory Structure
• Latch-based memory
28/48
Array Architecture
• 2^n words of 2^m bits each
• How to design if n >> m?
• Fold by 2^k into fewer rows of more columns
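A Python sketch of the folding arithmetic: folding by 2^k places 2^k words in each physical row, so rows shrink and columns grow by 2^k while the total bit count stays the same. The choice of k for a near-square array is an illustrative heuristic, not the lecture's rule; the 2 kword × 16 numbers come from the column-multiplexing example later in this deck.

```python
def fold(n: int, m: int, k: int) -> tuple[int, int]:
    """(rows, columns) of a 2^n-word x 2^m-bit array folded by 2^k."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return 2 ** (n - k), 2 ** (m + k)


def square_k(n: int, m: int) -> int:
    """Fold exponent that makes the array as close to square as possible."""
    return max(0, (n - m) // 2)


n, m = 11, 4                 # 2 kword x 16 bits
k = square_k(n, m)           # 3
print(k, fold(n, m, k))      # 3 (256, 128)
```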
29/48
Decoders
• n:2^n decoder consists of 2^n n-input AND gates
• One needed for each row of memory
• Build AND with NAND or NOR gates
[Figure: decoder in static CMOS; decoder using NOR gates]
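A functional Python sketch of the decoder described above: each of the 2^n outputs is an n-input AND of the address bits, taken true or complemented according to that row's index. It models logic only, not the NAND/NOR circuit implementation; `decode` is an illustrative name.

```python
def decode(address: int, n: int) -> list[int]:
    """n:2^n decoder: wordline `row` is the AND of the n address literals
    (true or complemented) that spell out `row` in binary."""
    bits = [(address >> j) & 1 for j in range(n)]
    wordlines = []
    for row in range(2 ** n):
        literals = [bits[j] if (row >> j) & 1 else 1 - bits[j] for j in range(n)]
        wordlines.append(int(all(literals)))
    return wordlines


print(decode(5, 3))  # [0, 0, 0, 0, 0, 1, 0, 0]
```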
30/48
EX. Decoder
• Question: Convert the AND gates into a NAND-gate structure
31/48
Larger Decoder
• For n > 4, NAND gates become slow
• Break large gates into multiple smaller gates
32/48
Predecoding
• Many of these gates are redundant
• Factor out common gates
• => Predecoder
• Saves area
• Same path effort
• Question: How many NANDs can be saved?
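A functional Python sketch of predecoding, assuming the address is split into equal fields of `group` bits (2 by default, a hypothetical choice). Each field is decoded once by a small predecoder whose outputs are shared by all rows; each wordline then ANDs one line from each predecoder, so it needs an (n / group)-input gate instead of an n-input gate. It checks logic equivalence only and does not count gates.

```python
def one_hot(value: int, width: int) -> list[int]:
    """A width:2^width predecoder: exactly one output line is 1."""
    return [int(value == i) for i in range(2 ** width)]


def predecoded_decoder(address: int, n: int, group: int = 2) -> list[int]:
    """n:2^n decoder built from shared predecoders (assumes n % group == 0)."""
    assert n % group == 0
    mask = (1 << group) - 1
    fields = n // group
    # Predecode each address field once; these gates are shared by all rows.
    pre = [one_hot((address >> (group * f)) & mask, group) for f in range(fields)]
    # Each row ANDs one predecoded line per field.
    return [
        int(all(pre[f][(row >> (group * f)) & mask] for f in range(fields)))
        for row in range(2 ** n)
    ]


print(predecoded_decoder(9, 4))  # 1 only at index 9
```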
33/48
Appendix
*Decoder Layout
• Decoders must be pitch-matched to SRAM cell
• Requires very skinny gates
35/48
*Column Circuitry
• Some circuitry is required for each column
• Bitline conditioning
• Column multiplexing
• Sense amplifiers (DRAM)
36/48
*Bitline Conditioning
• Precharge bitlines high before reads
• Equalize bitlines to minimize voltage difference when using sense amplifiers
37/48
*Twisted Bitlines
• Sense amplifiers also amplify noise
• Coupling noise is severe in modern processes
• Try to couple equally onto bit and bit_b
• Done by twisting bitlines
38/48
*SRAM Column Example
[Figure: SRAM column with read and write circuitry]
39/48
*Column Multiplexing
• Recall that array may be folded for good aspect ratio
• Ex: 2 kword × 16 folded into 256 rows × 128 columns
• Must select 16 output bits from the 128 columns
• Requires 16 8:1 column multiplexers
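A Python sketch of selecting one 16-bit word from a 128-column row. It assumes a hypothetical interleaved layout in which bit i of word w sits in column i·8 + w, so each output bit has its own 8:1 multiplexer over 8 adjacent columns; the slide does not specify the layout.

```python
def read_word(row_cells: list[int], word_sel: int,
              word_bits: int = 16, mux_ratio: int = 8) -> list[int]:
    """Pick word `word_sel` out of one folded row (interleaved layout assumed:
    column = bit_index * mux_ratio + word_sel)."""
    assert len(row_cells) == word_bits * mux_ratio
    assert 0 <= word_sel < mux_ratio
    # One mux_ratio:1 column mux per output bit, all sharing word_sel.
    return [row_cells[i * mux_ratio + word_sel] for i in range(word_bits)]


row = list(range(128))   # column indices stand in for cell contents
print(read_word(row, 3))  # columns 3, 11, 19, ..., 123
```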
40/48
*Ex: 2-way Muxed SRAM
41/48
*Tree Decoder Mux
• Column mux can use pass transistors
• Use nMOS only, precharge outputs
• One design is to use k series transistors for a 2^k:1 mux
• No external decoder logic needed
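A Python sketch of the tree mux as a sequence of 2:1 selections: at each of the k levels, one select bit (least significant first, an assumed ordering) picks one input of every pair. Every data path therefore passes through exactly k pass transistors, and the select bits drive the levels directly, so no separate decoder is needed.

```python
def tree_mux(inputs: list, select_bits: list[int]):
    """2^k:1 tree mux; select_bits holds k bits, least significant first."""
    level = list(inputs)
    assert len(level) == 2 ** len(select_bits)
    for s in select_bits:
        # One level of pass transistors: keep element s of every pair.
        level = [level[2 * i + s] for i in range(len(level) // 2)]
    return level[0]


inputs = list("abcdefgh")
print(tree_mux(inputs, [1, 0, 1]))  # f (index 5)
```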
42/48
*SRAM from ARM
43/48
Sense Amp Operation for 1T DRAM
• 1T DRAM read is destructive
• Each read must therefore be followed by writing the sensed value back (refresh)
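A behavioral Python sketch of why a 1T DRAM read is destructive and how the sense amplifier restores it. Assumptions: the bitline is precharged to VDD/2, the sense amp compares against VDD/2, and restoring means driving the bitline to a full rail while the wordline is still on. The class and its values are hypothetical.

```python
class OneTDramCell:
    """Hypothetical behavioral model of a 1T DRAM cell on a bitline
    precharged to VDD/2 before every read."""

    def __init__(self, vdd: float, c_s: float, c_bl: float) -> None:
        self.vdd, self.c_s, self.c_bl = vdd, c_s, c_bl
        self.v_cell = 0.0

    def write(self, bit: int) -> None:
        self.v_cell = self.vdd if bit else 0.0

    def read(self, refresh: bool = True) -> int:
        v_pre = self.vdd / 2
        # Charge sharing: cell and bitline settle to a common voltage, so the
        # cell no longer holds a full 0 or VDD (destructive read).
        v_bl = (self.c_s * self.v_cell + self.c_bl * v_pre) / (self.c_s + self.c_bl)
        self.v_cell = v_bl
        bit = 1 if v_bl > v_pre else 0  # sense amp decision
        if refresh:
            # Sense amp drives the bitline to a full rail with the wordline
            # still on, writing the value back into the cell.
            self.write(bit)
        return bit


cell = OneTDramCell(vdd=4.0, c_s=100e-12, c_bl=1000e-12)
cell.write(1)
print(cell.read(refresh=False), cell.v_cell)  # bit 1, cell left near 2.18 V
cell.write(1)
print(cell.read(), cell.v_cell)               # bit 1, cell restored to 4.0 V
```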
44/48
*Sense Amplifiers (DRAM)
• Bitlines have many cells attached
• Ex: 32-kbit SRAM has 256 rows × 128 cols
• 256 cells on each bitline
• $t_{pd} \propto (C/I)\,\Delta V$
• Ex: Even with shared diffusion contacts, 64C of diffusion capacitance (big C)
• Discharged slowly through small transistors (small I)
• Sense amplifiers are triggered on small voltage swing (reduce ∆V)
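The proportionality follows from treating the bitline as a capacitance C discharged by a roughly constant cell current I. A short derivation, with illustrative values (a 1 V full swing and a 100 mV sensing threshold are hypothetical, not from the slide):

```latex
I = C\,\frac{dV}{dt}
\;\Rightarrow\;
t_{pd} \approx \frac{C\,\Delta V}{I}
\qquad
\frac{t_{pd}(\Delta V = 0.1\,\mathrm{V})}{t_{pd}(\Delta V = 1\,\mathrm{V})}
= \frac{0.1}{1} = 0.1
```

With C and I unchanged, firing the sense amplifier after a 100 mV swing instead of waiting for a full 1 V swing would cut this part of the read delay by roughly 10×.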
45/48
*Differential Pair Amp
• Differential pair requires no clock
• But always dissipates static power
46/48
*Clocked Sense Amp
• Clocked sense amp saves power
• Requires sense_clk after enough bitline swing
• Isolation transistors cut off large bitline capacitance
47/48
Thank You :-)