RTL Design
Chapter 5: Register-Transfer Level (RTL) Design
Slides to accompany the textbook Digital Design, First Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2007.
http://www.ddvahid.com
Introduction
• Chapter 3: Controllers
– Control input/output: single bit (or just a few) representing event or state
– Finite-state machine describes behavior; implemented as state register and combinational logic
• Chapter 4: Datapath components
– Data input/output: multiple bits collectively representing a single entity
– Datapath components included registers, adders, ALUs, comparators, register files, etc.
• This chapter: custom processors
– Processor: controller and datapath components working together to implement an algorithm
[Figure: FSM controller (combinational logic plus state register s1 s0, next state n1 n0); datapath components (register, comparator, ALU, register file); a processor combines a controller and a datapath]
Digital Design
Copyright © 2006
Frank Vahid
Note: Slides with animation are denoted with a small red "a" near the animated items
RTL Design: Capture Behavior, Convert to Circuit
• Recall
– Chapter 2: Combinational Logic Design
• First step: Capture behavior (using equation or truth table)
• Remaining steps: Convert to circuit
– Chapter 3: Sequential Logic Design
• First step: Capture behavior (using FSM)
• Remaining steps: Convert to circuit
• RTL Design (the method for creating custom processors)
– First step: Capture behavior (using high-level state machine, to be introduced)
– Remaining steps: Convert to circuit
[Figure: design flow from "Capture behavior" to "Convert to circuit"]
5.2
RTL Design Method: “Preview” Example
• Soda dispenser
– c: bit input, 1 when coin deposited
– a: 8-bit input having value of deposited coin
– s: 8-bit input having cost of a soda
– d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
[Figure: soda dispenser processor with inputs c, a, s and output d; example with s = 50 in which two coins with a = 25 are deposited (c pulses 0 1 0 1 0), tot rises from 25 to 50, and d becomes 1]
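The dispenser's behavior can be checked cycle by cycle using the Init, Wait, Add, and Disp states that appear in Steps 3 and 4. The following is a minimal Python sketch, not the textbook's method; it assumes that register tot changes only at the end of each cycle and that the coin value a is still present in the cycle after c pulses. The names `hlsm_step` and `run` are hypothetical.

```python
def hlsm_step(state, tot, c, a, s):
    """One clock cycle of the soda dispenser high-level state machine.

    Returns (next_state, next_tot, d). tot is an 8-bit register, so a value
    written in one state is only visible in the following state.
    """
    if state == "Init":
        return "Wait", 0, 0                  # d = 0, tot = 0
    if state == "Wait":
        if c:
            return "Add", tot, 0             # coin deposited
        if tot < s:
            return "Wait", tot, 0            # c' * (tot < s): keep waiting
        return "Disp", tot, 0                # c' * (tot < s)': enough money
    if state == "Add":
        return "Wait", (tot + a) & 0xFF, 0   # tot = tot + a
    if state == "Disp":
        return "Init", tot, 1                # d = 1
    raise ValueError(f"unknown state {state!r}")


def run(s, inputs):
    """Apply per-cycle (c, a) inputs; return (state, tot, d) for each cycle."""
    state, tot, trace = "Init", 0, []
    for c, a in inputs:
        next_state, next_tot, d = hlsm_step(state, tot, c, a, s)
        trace.append((state, tot, d))
        state, tot = next_state, next_tot
    return trace


# s = 50; two coins of value 25, each signaled by a one-cycle pulse on c
trace = run(50, [(0, 25), (1, 25), (0, 25), (0, 25), (1, 25), (0, 25), (0, 25), (0, 25)])
print(trace[-1])  # ('Disp', 50, 1)
```

The machine reaches Disp only once tot is no longer less than s, matching "equals or exceeds cost of a soda".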
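Preview Example: Step 2 – Create Datapath
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)
[Figure: datapath containing register tot, an 8-bit adder computing tot + a, and an 8-bit comparator computing tot < s]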
Preview Example: Step 3 – Connect Datapath to a Controller
• Controller’s inputs
– External input c (coin deposited)
– Input from datapath comparator, which we named tot_lt_s
• Controller’s outputs
– External output d (dispense soda)
– Outputs to datapath to load and clear the tot register: tot_ld and tot_clr
[Figure: controller (inputs c, tot_lt_s; outputs d, tot_ld, tot_clr) connected to the datapath, which receives the 8-bit inputs s and a]
Preview Example: Step 4 – Derive the Controller’s FSM
• Same states and arcs as high-level state machine
• But set/read datapath control signals for all datapath operations and conditions
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
– Init: d=0, tot_clr=1; go to Wait
– Wait: c: go to Add; c’*tot_lt_s: stay in Wait; c’*tot_lt_s’: go to Disp
– Add: tot_ld=1; go to Wait
– Disp: d=1; go to Init
[Figure: controller FSM beside the datapath: register tot (ld, clr), 8-bit adder computing tot + a, and 8-bit comparator (<) producing tot_lt_s]
Preview Example: Completing the Design
• Implement the FSM as a state register and logic
– As in Ch3
– Table shown below
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
State encoding: Init = 00, Wait = 01, Add = 10, Disp = 11

State  s1 s0 c tot_lt_s | n1 n0 | d tot_ld tot_clr
Init   0  0  0  0       | 0  1  | 0  0      1
Init   0  0  0  1       | 0  1  | 0  0      1
Init   0  0  1  0       | 0  1  | 0  0      1
Init   0  0  1  1       | 0  1  | 0  0      1
Wait   0  1  0  0       | 1  1  | 0  0      0
Wait   0  1  0  1       | 0  1  | 0  0      0
Wait   0  1  1  0       | 1  0  | 0  0      0
Wait   0  1  1  1       | 1  0  | 0  0      0
Add    1  0  0  0       | 0  1  | 0  1      0
Disp   1  1  0  0       | 0  0  | 1  0      0
(Only one row is shown for Add and for Disp; their other input combinations give the same next state and outputs.)
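The table can be turned into logic equations, as in Ch3. The equations below are this sketch's own hand simplification of the table (they are not given on the slides), and the Python code checks that they reproduce every row shown.

```python
def inv(x):
    """Complement of a 0/1 value."""
    return 1 - x


def controller_logic(s1, s0, c, tot_lt_s):
    """Next state (n1, n0) and outputs for Init=00, Wait=01, Add=10, Disp=11.

    Hand-simplified equations (this sketch's derivation):
      n1 = s1'*s0*(c + tot_lt_s')    n0 = s0' + s1'*c'
      d = s1*s0    tot_ld = s1*s0'    tot_clr = s1'*s0'
    """
    n1 = inv(s1) & s0 & (c | inv(tot_lt_s))
    n0 = inv(s0) | (inv(s1) & inv(c))
    d = s1 & s0
    tot_ld = s1 & inv(s0)
    tot_clr = inv(s1) & inv(s0)
    return (n1, n0, d, tot_ld, tot_clr)


# Rows of the state table: (s1, s0, c, tot_lt_s) -> (n1, n0, d, tot_ld, tot_clr)
TABLE = {
    (0, 0, 0, 0): (0, 1, 0, 0, 1),
    (0, 0, 0, 1): (0, 1, 0, 0, 1),
    (0, 0, 1, 0): (0, 1, 0, 0, 1),
    (0, 0, 1, 1): (0, 1, 0, 0, 1),
    (0, 1, 0, 0): (1, 1, 0, 0, 0),
    (0, 1, 0, 1): (0, 1, 0, 0, 0),
    (0, 1, 1, 0): (1, 0, 0, 0, 0),
    (0, 1, 1, 1): (1, 0, 0, 0, 0),
    (1, 0, 0, 0): (0, 1, 0, 1, 0),
    (1, 1, 0, 0): (0, 0, 1, 0, 0),
}

for inputs, expected in TABLE.items():
    assert controller_logic(*inputs) == expected, inputs
```

Because the Add and Disp equations do not depend on c or tot_lt_s, they also cover the omitted rows for those states.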
Example: Laser-Based Distance Measurer
[Figure: a laser and a sensor aimed at an object of interest at distance D; the reflected laser pulse returns after T seconds]
• The light travels to the object and back, so 2D = T sec * 3*10^8 m/sec
• Inputs/outputs
– B: bit input, from button to begin measurement
– L: bit output, activates laser
– S: bit input, senses laser reflection
– D: 16-bit output, displays computed distance
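The relation 2D = T sec * 3*10^8 m/sec becomes a cycle count once a clock frequency is fixed. This Python sketch assumes the 300 MHz clock used later in this example; the function name `distance_m` is hypothetical. At 300 MHz one cycle lasts 1/(3*10^8) sec, so light covers 1 m of round trip per cycle and D is simply half the number of cycles counted.

```python
SPEED_OF_LIGHT = 3e8  # m/sec, as on the slide


def distance_m(cycles, f_clk_hz=300e6):
    """Distance D in meters from the clock cycles counted between the laser
    pulse and its reflection: T = cycles / f_clk and 2D = T * 3e8."""
    t_sec = cycles / f_clk_hz
    return t_sec * SPEED_OF_LIGHT / 2


print(distance_m(20))  # approximately 10 (meters)
```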
Step 1 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
[Figure: laser-based distance measurer block: B from button, S from sensor, L to laser, D (16 bits) to display]
• S0: L = 0 (laser off), D = 0 (distance = 0)
Step 1 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
• S0: L=0, D=0
• Add another state, call it S1, that waits for a button press
– B’ (button not pressed): stay in S1, keep waiting
– B (button pressed): go to a new state S2
[Figure: next version of the state machine: S0 (L=0) → S1 → (B) → S2 (L=1) → S3 (L=0)]
Step 1 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)
• S0: L=0, D=0
• S1: Dctr = 0 (reset cycle count); B’: stay in S1; B: go to S2
• S2: L=1 (laser on); go to S3
• S3: L=0, Dctr = Dctr + 1 (count cycles); S’ (no reflection): stay in S3; S (reflection): go to S4
• S4: D = Dctr / 2 (calculate D)
[Figure: laser-based distance measurer block with B from button, S from sensor, L to laser, and D (16 bits) to display]
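The state machine above can be exercised with a cycle-level Python sketch. The stimulus is hypothetical: B is pressed for one cycle, and S goes to 1 a fixed number of cycles after the laser pulse in S2. Dctr is treated as a register, so its new value appears in the next cycle; with the 300 MHz clock used later, the result is the distance in meters.

```python
def simulate_measurer(round_trip_cycles, press_cycle=3, max_cycles=100_000):
    """Cycle-level sketch of the S0..S4 high-level state machine.

    Hypothetical stimulus: button B is 1 only in cycle press_cycle, and
    sensor S is 1 only round_trip_cycles after the laser pulse.
    Returns D = Dctr / 2 as computed in S4, or None if S4 is never reached.
    """
    state, dctr = "S0", 0
    laser_cycle = None
    for cycle in range(max_cycles):
        B = int(cycle == press_cycle)
        S = int(laser_cycle is not None and cycle == laser_cycle + round_trip_cycles)
        next_dctr = dctr                       # Dctr holds unless written
        if state == "S0":                      # L = 0, D = 0
            next_state = "S1"
        elif state == "S1":                    # Dctr = 0 (reset cycle count)
            next_dctr = 0
            next_state = "S2" if B else "S1"
        elif state == "S2":                    # L = 1 (laser on)
            laser_cycle = cycle
            next_state = "S3"
        elif state == "S3":                    # L = 0, Dctr = Dctr + 1
            next_dctr = (dctr + 1) & 0xFFFF
            next_state = "S4" if S else "S3"
        else:                                  # S4: D = Dctr / 2
            return dctr >> 1
        state, dctr = next_state, next_dctr
    return None


print(simulate_measurer(20))  # 10
```

S3 counts once per cycle, including the cycle in which the reflection is seen, so Dctr equals the round-trip time in cycles.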
Step 2 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)
(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
[Figure: the S0–S4 state machine from Step 1 beside a datapath containing Dctr, a 16-bit up-counter with control inputs Dctr_clr (clear) and Dctr_cnt (count), and Dreg, a 16-bit register with control inputs Dreg_clr (clear) and Dreg_ld (load), whose output Q drives D (16 bits)]
Step 2 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
– S4’s computation D = Dctr / 2 is implemented by a shift-right-by-one component (>>1) between Dctr’s output and Dreg’s data input
[Figure: datapath: Dctr (16-bit up-counter; Dctr_clr, Dctr_cnt) → >>1 → Dreg (16-bit register; Dreg_clr, Dreg_ld) → D (16 bits)]
Step 3: Connecting the Datapath to a Controller
• Laser-based distance measurer example
• Easy – just connect all control signals between controller and datapath
[Figure: controller (inputs B from button and S from sensor; output L to laser) connected to the datapath through Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt; datapath Dctr → >>1 → Dreg drives D (16 bits) to display; 300 MHz clock]
• FSM has same structure as high-level state machine
– Inputs/outputs all bits now
– Replace data operations by bit operations using datapath
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

State  L  Dreg_clr  Dreg_ld  Dctr_clr  Dctr_cnt  Comment
S0     0  1         0        0         0         laser off; clear D reg
S1     0  0         0        1         0         clear count; B’: stay, B: go to S2
S2     1  0         0        0         0         laser on
S3     0  0         0        0         1         laser off; count up; S’: stay, S: go to S4
S4     0  0         1        0         0         load D reg with Dctr/2; stop counting
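The controller table and the datapath can be simulated together at the bit level. In this Python sketch, each clock edge updates the state register, the Dctr counter, and Dreg at the same time. Two details are assumptions because the slides do not show them: S4 returns to S1 to wait for the next button press, and clear takes priority over count/load. All function names are hypothetical.

```python
# Controller outputs per state: (L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt)
OUTPUTS = {
    "S0": (0, 1, 0, 0, 0),
    "S1": (0, 0, 0, 1, 0),
    "S2": (1, 0, 0, 0, 0),
    "S3": (0, 0, 0, 0, 1),
    "S4": (0, 0, 1, 0, 0),
}


def next_state(state, B, S):
    if state == "S0":
        return "S1"
    if state == "S1":
        return "S2" if B else "S1"
    if state == "S2":
        return "S3"
    if state == "S3":
        return "S4" if S else "S3"
    return "S1"  # assumption: after S4, wait for the next button press


def clock_edge(state, dctr, dreg, B, S):
    """Update state, Dctr, and Dreg together; also return this cycle's L."""
    L, dreg_clr, dreg_ld, dctr_clr, dctr_cnt = OUTPUTS[state]
    if dctr_clr:                        # 16-bit up-counter
        new_dctr = 0
    elif dctr_cnt:
        new_dctr = (dctr + 1) & 0xFFFF
    else:
        new_dctr = dctr
    if dreg_clr:                        # 16-bit register fed by Dctr >> 1
        new_dreg = 0
    elif dreg_ld:
        new_dreg = dctr >> 1
    else:
        new_dreg = dreg
    return next_state(state, B, S), new_dctr, new_dreg, L


def run_processor(round_trip_cycles, press_cycle=3, n_cycles=300):
    """Hypothetical stimulus: B pulses once; S pulses round_trip_cycles after L."""
    state, dctr, dreg, laser_cycle = "S0", 0, 0, None
    for cycle in range(n_cycles):
        B = int(cycle == press_cycle)
        S = int(laser_cycle is not None and cycle == laser_cycle + round_trip_cycles)
        state, dctr, dreg, L = clock_edge(state, dctr, dreg, B, S)
        if L:
            laser_cycle = cycle
    return dreg  # value shown on output D


print(run_processor(20))  # 10
```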
Step 4: Deriving the Controller’s FSM
• Implement FSM as state register and logic (Ch3) to complete the design
– S0: L=0, Dreg_clr=1 (laser off, clear D reg)
– S1: Dctr_clr=1 (clear count); B’: stay in S1; B: go to S2
– S2: L=1 (laser on)
– S3: L=0, Dctr_cnt=1 (laser off, count up); S’: stay in S3; S: go to S4
– S4: Dreg_ld=1, Dctr_cnt=0 (load D reg with Dctr/2, stop counting)
– Control outputs not listed in a state are 0
[Figure: controller connected to the datapath (Dctr 16-bit up-counter, >>1, Dreg 16-bit register, output D to display) with a 300 MHz clock]
Step 2 Example Showing Mux Use
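Local registers: E, F, G, R (16 bits)
• T0: R = E + F
• T1: R = R + G
[Figure, panels (a)–(d): first one adder per state; then a single adder shared by both states, whose A input comes through a 2×1 mux (select add_A_s0) choosing between E and R, and whose B input comes through a 2×1 mux (select add_B_s0) choosing between F and G; the adder output is loaded into R]
• Introduce mux when one component input can come from more than one source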
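A single adder can serve both states once muxes choose its inputs. This Python sketch assumes that select value 0 picks E or F and 1 picks R or G; the actual mux input ordering in the figure may differ. The names `mux2` and `shared_adder_datapath` are hypothetical.

```python
MASK16 = 0xFFFF  # E, F, G, R are 16-bit registers


def mux2(i0, i1, sel):
    """2x1 mux: output i0 when sel = 0, i1 when sel = 1."""
    return i1 if sel else i0


def shared_adder_datapath(E, F, G):
    """Compute R = E + F (state T0), then R = R + G (state T1), using one adder.

    Assumed select values: T0 uses add_A_s0 = 0 (E), add_B_s0 = 0 (F);
    T1 uses add_A_s0 = 1 (R), add_B_s0 = 1 (G).
    """
    R = 0
    for add_A_s0, add_B_s0 in [(0, 0), (1, 1)]:  # T0, then T1
        A = mux2(E, R, add_A_s0)
        B = mux2(F, G, add_B_s0)
        R = (A + B) & MASK16                      # R loads the sum at the clock edge
    return R


print(shared_adder_datapath(1, 2, 3))  # 6
```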
5.3
[Figure: timing diagram over clk cycles showing input rd, the state sequence W, W, SD, W, W, SD, SD, W, and output D alternating between Z and Q1]
RTL Example: Bus Interface
[Figure: bus interface datapath with a 32-bit data path and output enable D_en]
[Figure: SAD component with inputs A and B (each a 256-byte array) and go, and integer output sad; the loop ends when !(i<256)]
RTL Example: Video Compression – Sum of Absolute Differences
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)
• S0: wait for go (!go: stay in S0)
• S1: initialize sum and index (sum = 0, i = 0)
• S2: check if done (i>=256); !(i<256): go to S4; i<256: go to S3
• S3: add difference to sum, increment index (sum = sum + abs(A[i]-B[i]), i = i + 1); go to S2
• S4: done, write to output (sad_reg = sum)
[Figure: SAD block with inputs A, B, go and output sad, driven by sad_reg]
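The SAD state machine can be walked in software to check both the result and the cycle count. This Python sketch assumes go is already 1 and counts the cycles spent in S2 and S3. The slides count two states per i (512 cycles); the sketch also counts the final S2 check, so it reports 513. `sad_hlsm` is a hypothetical name.

```python
def sad_hlsm(A, B):
    """Cycle-level sketch of states S1..S4 of the SAD state machine
    (go assumed already 1). Returns (sad_reg, cycles spent in S2 and S3)."""
    assert len(A) == len(B) == 256
    total, i = 0, 0          # S1: sum = 0, i = 0 (i needs 9 bits to reach 256)
    cycles = 0
    while True:
        cycles += 1          # S2: check i < 256
        if not i < 256:
            break            # !(i<256): go to S4
        cycles += 1          # S3: sum = sum + abs(A[i]-B[i]), i = i + 1
        total = (total + abs(A[i] - B[i])) & 0xFFFFFFFF
        i += 1
    sad_reg = total          # S4: sad_reg = sum
    return sad_reg, cycles


print(sad_hlsm(list(range(256)), [0] * 256))  # (32640, 513)
```

Compared with the software estimate of about 6 cycles per item (256*6 = 1536 cycles), this is roughly the factor of 3 discussed later.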
RTL Example: Video Compression – Sum of Absolute Differences
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)
• Step 2: Create datapath
– i: 9-bit register with control inputs i_clr and i_inc; a <256 comparator produces i_lt_256; i provides the memory address AB_addr
– A_data and B_data (8 bits each) feed a subtractor and an abs component; an adder adds the result to sum
– sum: 32-bit register with control inputs sum_clr and sum_ld
– sad_reg: 32-bit register with control input sad_reg_ld, driving output sad
– !(i<256) = (i_lt_256)’
RTL Example: Video Compression – Sum of Absolute Differences
• Step 3: Connect to controller
• Step 4: Replace high-level state machine by FSM
– S0: go’: stay in S0; go: go to S1
– S1: sum_clr=1 (sum = 0), i_clr=1 (i = 0)
– S2: i_lt_256: go to S3; i_lt_256’: go to S4
– S3: sum_ld=1, AB_rd=1 (sum = sum + abs(A[i]-B[i])); i_inc=1 (i = i + 1)
– S4: sad_reg_ld=1 (sad_reg = sum)
– !(i<256) = (i_lt_256)’
[Figure: controller connected to the SAD datapath through AB_rd, i_clr, i_inc, i_lt_256, sum_clr, sum_ld, and sad_reg_ld]
RTL Example: Video Compression – Sum of Absolute Differences
• Comparing software and custom circuit SAD
– Circuit: Two states (S2 & S3) for each i, 256 i’s → 512 clock cycles
– Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per array item → 256*6 = 1536 cycles
– Circuit is about 3 times faster (1536 / 512 = 3)
– Later, we’ll see how to build a SAD circuit that is even faster
[Figure: loop states S2 (check i<256) and S3 (sum = sum + abs(A[i]-B[i]), i = i + 1)]
RTL Design Pitfalls and Good Practice
• Common pitfall: Assuming a register is updated in the state it’s written
Local registers: R, Q (8 bits)
[Figure: high-level state machine over local registers R and Q, with a transition on R<100 leading to state C]
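Register-transfer semantics can be modeled by computing every state's writes from the values held at the start of the state and committing them together at the clock edge. The two states below are hypothetical, chosen to show the pitfall with the 8-bit registers R and Q: Q receives 99, not 100, because R's new value only appears in the next state. For the same reason, a transition condition such as R<100 evaluated in that state still sees 99.

```python
def run_states(states, regs):
    """Execute a sequence of states with register-transfer semantics.

    Each state is a function that reads the register values held at the
    start of the state and returns the registers it writes; all writes are
    committed together at the end of the state (the clock edge).
    """
    history = []
    for state in states:
        writes = state(dict(regs))   # right-hand sides see only old values
        regs.update(writes)          # registers change at the clock edge
        history.append(dict(regs))
    return history


# Hypothetical two-state sequence using the 8-bit local registers R and Q
def state_a(r):
    return {"R": 99}


def state_b(r):
    return {"R": (r["R"] + 1) & 0xFF, "Q": r["R"]}


print(run_states([state_a, state_b], {"R": 0, "Q": 0})[-1])  # {'R': 100, 'Q': 99}
```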
RTL Design Pitfalls and Good Practice
• Solutions
Local registers: R, Q (8 bits)
[Figure (b): timing in which Q holds ?, ?, 99, 99 over successive cycles]
RTL Design Pitfalls and Good Practice
• Common pitfall: Reading outputs
– Outputs can only be written
– Solution: Introduce additional register, which can be written and read
(a) Inputs: A, B (8 bits); Outputs: P (8 bits); state S: P=A; state T: P=P+B (reads output P)
(b) Inputs: A, B (8 bits); Outputs: P (8 bits); Local register: R (8 bits); state S: R=A, P=A; state T: P=R+B
RTL Design Pitfalls and Good Practice
• Good practice: Register all data outputs
– In fig (a), output P would show spurious values as addition computes
• Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
– In fig (b), spurious outputs reduced, and longest register-to-register path is clear
[Figure: (a) register R and input B feed an adder whose output drives P directly; (b) the adder output is stored in register Preg, which drives P]
Control vs. Data Dominated RTL Design
• Designs often categorized as control-dominated or data-dominated
– Control-dominated design – Controller contains most of the complexity
– Data-dominated design – Datapath contains most of the complexity
– General, descriptive terms – no hard rule that separates the two types of designs
– Laser-based distance measurer – control dominated
– Bus interface, SAD circuit – mix of control and data
– Now let’s do a data dominated design
Data Dominated RTL Design Example: FIR Filter
• Filter concept
– Suppose X is data from a temperature sensor, and a particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
– That 240 is probably wrong!
• Could be electrical noise
– Filter should remove such noise in its output Y
– Simple filter: Output average of last N values
• Small N: less filtering
• Large N: more filtering, but less sharp output
[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clock clk]
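Applying the "average of last N values" filter to the slide's sequence shows how the noisy 240 is damped. This Python sketch assumes integer (truncating) division, as simple hardware would use, and averages only the samples seen so far until N samples have arrived; both choices are assumptions.

```python
def moving_average(xs, n):
    """Output at each step = integer average of the last n inputs
    (fewer at start-up, before n inputs have arrived)."""
    ys = []
    for t in range(len(xs)):
        window = xs[max(0, t - n + 1): t + 1]
        ys.append(sum(window) // len(window))
    return ys


samples = [180, 180, 181, 240, 180, 181]
print(moving_average(samples, 1))  # [180, 180, 181, 240, 180, 181]  (no filtering)
print(moving_average(samples, 3))  # [180, 180, 180, 200, 200, 200]
```

With N = 3 the spike shows up as 200 instead of 240, but it lingers for three outputs, which is the "less sharp output" trade-off.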
Data Dominated RTL Design Example: FIR Filter
• FIR filter
– “Finite Impulse Response”
– Simply a configurable weighted sum of past input values
– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Above known as “3 tap”
• Tens of taps more common
• Very general filter – User sets the constants (c0, c1, c2) to define specific filter
– RTL design
• Step 1: Create high-level state machine
– But there really is none! Data dominated indeed.
• Go straight to step 2
Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath
– Begin by creating chain of xt registers to hold past values of X
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
Suppose sequence is: 180, 181, 240
[Figure: 3-tap FIR filter: chain of clocked registers holding x(t), x(t-1), x(t-2)]
Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Instantiate registers for c0, c1, c2
– Instantiate multipliers to compute c*x values
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
[Figure: 3-tap FIR filter: registers xt0, xt1, xt2 hold x(t), x(t-1), x(t-2); registers c0, c1, c2; three multipliers each compute one c*x product]
Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Instantiate adders
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
[Figure: 3-tap FIR filter: the three multiplier outputs are summed by two adders to produce Y]
Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Add circuitry to allow loading of a particular c register
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
[Figure: 3-tap FIR filter: a 2x4 decoder with enable CL and address inputs Ca1, Ca0 selects which c register loads the value on input C; the adder output is stored in register yreg, which drives Y]
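The complete 3-tap datapath (xt chain, coefficient registers loaded through the decoder, multipliers, adders, and yreg) can be sketched as a cycle-level Python class. Truncating values to 12 bits and using small integer coefficients are assumptions of this sketch; the class name `Fir3` is hypothetical.

```python
MASK12 = 0xFFF  # X and Y are 12 bits wide


class Fir3:
    """Cycle-level sketch of the 3-tap FIR datapath: registers xt0..xt2,
    coefficient registers c0..c2, and output register yreg."""

    def __init__(self):
        self.c = [0, 0, 0]
        self.xt = [0, 0, 0]      # xt0 = x(t), xt1 = x(t-1), xt2 = x(t-2)
        self.yreg = 0

    def load_coefficient(self, ca, value):
        """CL = 1: the 2x4 decoder enables c register number ca (output 3 unused)."""
        if 0 <= ca < 3:
            self.c[ca] = value & MASK12

    def clock(self, x):
        """One clock edge: yreg loads the weighted sum of the current xt values
        while the xt chain shifts in the new input x."""
        y = sum(ci * xi for ci, xi in zip(self.c, self.xt)) & MASK12
        self.xt = [x & MASK12, self.xt[0], self.xt[1]]
        self.yreg = y
        return self.yreg


f = Fir3()
for ca, value in enumerate([2, 1, 1]):
    f.load_coefficient(ca, value)
print([f.clock(x) for x in [180, 181, 240, 0]])  # [0, 360, 542, 841]
```

Because every register loads at the same edge, yreg lags the xt chain by one cycle: the last output, 2*240 + 181 + 180 = 841, is c0*x(t) + c1*x(t-1) + c2*x(t-2) for the sequence 180, 181, 240.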
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 3 & 4: Connect to controller, Create FSM
– No controller needed
– Extreme data-dominated example
– (Example of an extreme control-dominated design – an FSM, with no datapath)
• Comparing the FIR circuit to a software implementation
– Circuit
• Assume adder has 2-gate delay, multiplier has 20-gate delay
• Longest path goes through one multiplier and two adders
– 20 + 2 + 2 = 24-gate delay
• 100-tap filter, with the 100 products summed by a tree of adders, would have about a 34-gate delay: 1 multiplier and 7 adders (⌈log2 100⌉ = 7 levels) on longest path
– Software
• 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
• (100*2 + 100*2)*10 = 4000 gate delays
– Circuit is more than 100 times faster (4000 / 34 ≈ 118). Wow.
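The gate-delay estimates above can be reproduced with a few lines of Python. The sketch assumes, as the 34-gate figure implies, that the products are summed by a tree of adders, so the longest path contains ⌈log2(taps)⌉ adders; the function names are hypothetical.

```python
import math

ADDER_DELAY = 2        # gate delays, as assumed on the slide
MULTIPLIER_DELAY = 20  # gate delays, as assumed on the slide


def fir_circuit_delay(taps):
    """Longest path: one multiplier, then a tree of adders over `taps` products."""
    adder_levels = math.ceil(math.log2(taps))
    return MULTIPLIER_DELAY + adder_levels * ADDER_DELAY


def fir_software_delay(taps, instr_per_op=2, gates_per_instr=10):
    """`taps` multiplications plus `taps` additions, instr_per_op instructions each."""
    return (taps * instr_per_op + taps * instr_per_op) * gates_per_instr


print(fir_circuit_delay(3), fir_circuit_delay(100), fir_software_delay(100))  # 24 34 4000
```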
5.4
Critical Path
• Example shows four paths
– a to c through +: 2 ns
– a to d through + and *: 7 ns
– b to d through + and *: 7 ns
– b to d through *: 5 ns
• Longest path is thus 7 ns
• Fastest frequency
– 1 / 7 ns ≈ 143 MHz
[Figure: inputs a and b feed an adder (2 ns delay) whose output goes to c and to a multiplier (5 ns delay) that also takes b and drives d; path delays 2, 7, 7, 5 ns; Max(2, 7, 7, 5) = 7 ns]
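Finding the critical path amounts to finding the longest delay from any input to any output in an acyclic network. This Python sketch encodes the slide's example (an adder feeding c and a multiplier, with b also feeding the multiplier) as a hypothetical `NETWORK` table and computes each node's worst-case arrival time with memoized recursion. Real tools also account for wire delays and register timing, as discussed next.

```python
# Combinational network from the slide: node -> (delay in ns, source nodes)
NETWORK = {
    "a": (0.0, []),
    "b": (0.0, []),
    "add": (2.0, ["a", "b"]),
    "mul": (5.0, ["add", "b"]),
    "c": (0.0, ["add"]),
    "d": (0.0, ["mul"]),
}


def arrival_time(node, memo=None):
    """Longest delay from any input to `node`'s output (network must be acyclic)."""
    if memo is None:
        memo = {}
    if node not in memo:
        delay, sources = NETWORK[node]
        memo[node] = delay + max((arrival_time(src, memo) for src in sources), default=0.0)
    return memo[node]


critical_path_ns = max(arrival_time(out) for out in ("c", "d"))
max_freq_mhz = 1e3 / critical_path_ns  # 1 / (period in ns) is in GHz; x1000 gives MHz
print(critical_path_ns, round(max_freq_mhz, 1))  # 7.0 142.9
```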
Critical Path Considering Wire Delays
• Real wires have delay too
– Must include in critical path
• Example shows two paths
– Each is 0.5 + 2 + 0.5 = 3 ns
• Trend
– 1980s/1990s: Wire delays were tiny compared to logic delays
– But wire delays not shrinking as fast as logic delays
• Wire delays may even be greater than logic delays!
• Must also consider register setup and hold times; setup time adds to the path (hold time instead constrains the shortest paths)
• Then add some time to the computed path, just to be safe
– e.g., if path is 3 ns, say 4 ns instead
[Figure: registers a and b each reach an adder (2 ns) through a 0.5 ns wire; the adder output reaches register c through a 0.5 ns wire; each path totals 3 ns]
A Circuit May Have Numerous Paths
• Paths can exist
– In the datapath
– In the controller
– Between the controller and datapath
– May be hundreds or thousands of paths
• Evaluating paths automatically is very helpful
[Figure: soda dispenser controller (combinational logic, state register with next state n1 n0) and datapath (tot register with ld and clr, 8-bit < comparator, 8-bit adder), with paths through c, tot_ld, tot_clr, and tot_lt_s]
5.5
Converting from C to High-Level State Machine
• Convert each C construct to equivalent states and transitions
• Assignment statement
    target = expression;
– Becomes one state with assignment
• If-then statement
    if (cond) {
      // then stmts
    }
– Becomes state with condition check, transitioning to “then” statements if condition true, otherwise to end state
[Figure: an assignment becomes one state "target = expression"; an if-then becomes a condition state with arcs cond → (then stmts) and !cond → (end)]
Converting from C to High-Level State Machine
• If-then-else
– Becomes state with condition check, transitioning to “then” statements if condition true, or to “else” statements if condition false
    if (cond) {
      // then stmts
    }
    else {
      // else stmts
    }
• While loop statement
– Becomes state with condition check, transitioning to while loop’s statements if true, then transitioning back to condition check
    while (cond) {
      // while stmts
    }
[Figure: if-then-else: condition state with arcs cond → (then stmts) and !cond → (else stmts), both reaching (end); while: condition state with arc cond → (while stmts) leading back to the condition check, and !cond → (end)]
Simple Example of Converting from C to High-Level State Machine
Inputs: uint X, Y
Outputs: uint Max
    if (X > Y) {
      Max = X;
    }
    else {
      Max = Y;
    }
[Figure: the if-then-else template with cond = X>Y, then filled in: a condition state with arcs X>Y → state Max=X and !(X>Y) → state Max=Y, both leading to (end)]
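The converted state machine can be walked in Python to confirm that it computes the same result as the C code. The state names CHECK, THEN, ELSE, and END are hypothetical labels for the unnamed states in the figure, and X and Y are assumed to be non-negative integers (uint).

```python
def max_hlsm(X, Y):
    """Walk the high-level state machine for the if-then-else Max example.

    Hypothetical state names: CHECK (condition X > Y), THEN (Max = X),
    ELSE (Max = Y), END. Returns (Max, list of visited states).
    """
    state, Max, visited = "CHECK", None, []
    while True:
        visited.append(state)
        if state == "CHECK":
            state = "THEN" if X > Y else "ELSE"
        elif state == "THEN":
            Max, state = X, "END"
        elif state == "ELSE":
            Max, state = Y, "END"
        else:
            return Max, visited


print(max_hlsm(7, 3))  # (7, ['CHECK', 'THEN', 'END'])
```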
Register Files
• MxN register file component provides efficient access to M N-bit-wide registers
[Figure: individual registers, each with its own load input, driven by decoder outputs (d0, ...), with 32-bit data]
Register File
• Instead, want component that has one data input and one data output, and allows us to specify which internal register to write and which to read
[Figure: 16×32 register file with write port W_data (32 bits), W_addr (4 bits), W_en and read port R_data (32 bits), R_addr (4 bits), R_en]
Register File Timing Diagram
• Can write one register and read one register each clock cycle
[Figure: timing diagram of a 4x32 register file (32-bit W_data and R_data; 2-bit W_addr and R_addr; W_en, R_en) over clock cycles 1–6]
R_addr in cycles 1–6: X, X, 3, X, 1, 3
Register contents (successive snapshots, left to right):
0: ?  ?  ?   ?   ?   ?    ?
1: ?  ?  22  22  22  22   22
2: ?  ?  ?   ?   ?   177  177
3: ?  9  9   9   9   9    555
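A register file with one write port and one read port can be modeled in a few lines of Python. This sketch assumes reads are combinational and writes take effect at the end of the cycle, so reading the register being written in the same cycle returns its old value. The write addresses and data below are one hypothetical sequence consistent with the register contents shown; the slide does not list them.

```python
class RegisterFile:
    """Sketch of an M x N register file with one write port and one read port.

    Reads are combinational; writes take effect at the end of the cycle, so a
    read of the register being written in the same cycle returns the old value.
    """

    def __init__(self, m=4, n=32):
        self.mask = (1 << n) - 1
        self.regs = [None] * m      # None stands for an unknown value '?'

    def cycle(self, w_en=0, w_addr=0, w_data=0, r_en=0, r_addr=0):
        r_data = self.regs[r_addr] if r_en else None
        if w_en:
            self.regs[w_addr] = w_data & self.mask
        return r_data


rf = RegisterFile()
reads = [
    rf.cycle(w_en=1, w_addr=3, w_data=9),                       # cycle 1
    rf.cycle(w_en=1, w_addr=1, w_data=22),                      # cycle 2
    rf.cycle(r_en=1, r_addr=3),                                 # cycle 3
    rf.cycle(),                                                 # cycle 4
    rf.cycle(w_en=1, w_addr=2, w_data=177, r_en=1, r_addr=1),   # cycle 5
    rf.cycle(w_en=1, w_addr=3, w_data=555, r_en=1, r_addr=3),   # cycle 6
]
print(reads)    # [None, None, 9, None, 22, 9]
print(rf.regs)  # [None, 22, 177, 555]
```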
5.6
Memory Components
• Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
– A few more components are often used outside the controller and datapath
• MxN memory
– M words, N bits wide each
• Several varieties of memory, which we now introduce
[Figure: M×N memory block: M words, each N bits wide]
Random Access Memory (RAM)
• RAM – Readable and writable memory
– “Random access memory”
• Strange name – Created several decades ago to contrast with sequentially-accessed storage like tape drives
– Logically same as register file – Memory with address inputs, data inputs/outputs, and control
• RAM usually just one port; register file usually two or more
– RAM vs. register file
• RAM typically larger than roughly 512 or 1024 words
• RAM typically stores bits using a bit storage approach that is more efficient than a flip flop
• RAM typically implemented on a chip in a square rather than rectangular shape – keeps longest wires (hence delay) short
[Figure: 16×32 register file from Chpt. 4 (W_data, W_addr, W_en, R_data, R_addr, R_en) beside the RAM block symbol for a 1024 × 32 RAM with 32-bit data, 10-bit addr, rw, and en]
RAM Internal Structure
[Figure: internal structure of a 1024x32 RAM (32-bit data, 10-bit addr, rw, en). Let A = log2M. An AxM decoder, enabled by en, converts addr(A-1)..addr0 into one word enable line per word (d0..d(M-1)); each word is a row of bit storage blocks (aka “cells”), one per data bit, with write data wdata(N-1)..wdata0 and read data rdata(N-1)..rdata0; rw goes to all cells]
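The decoder-plus-rows organization can be sketched in Python: the AxM decoder turns the address into exactly one word-enable line, and only that row of cells is read or written. The rw convention (0 = read, 1 = write) follows the sound recorder example's use of Rrw=0 for reads; initializing the words to 0 and the names `decoder` and `Ram` are assumptions of this sketch.

```python
import math


def decoder(addr, num_outputs, en=1):
    """AxM decoder with enable: exactly one word-enable line is 1 when en = 1."""
    return [1 if en and i == addr else 0 for i in range(num_outputs)]


class Ram:
    """Sketch of an M x N RAM: a decoder selects one row (word) of cells."""

    def __init__(self, m=1024, n=32):
        self.a = int(math.log2(m))        # A = log2 M address lines
        self.mask = (1 << n) - 1
        self.words = [0] * m

    def access(self, addr, rw, en, data=0):
        """rw = 1 writes data into the addressed word; rw = 0 reads it."""
        result = None
        for i, word_enable in enumerate(decoder(addr, len(self.words), en)):
            if word_enable:
                if rw:
                    self.words[i] = data & self.mask
                else:
                    result = self.words[i]
        return result


ram = Ram()
ram.access(5, rw=1, en=1, data=0xDEADBEEF)
print(hex(ram.access(5, rw=0, en=1)))  # 0xdeadbeef
```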
SRAM cell
• “Static” RAM cell
– Reading this cell
• Somewhat trickier
• When rw set to read, the RAM logic sets both data and data’ to 1
• The stored bit d will pull either the left line or the right line down slightly below 1
• “Sense amplifiers” detect which side is slightly pulled down
[Figure: SRAM cell storing bit d between lines data and data’, gated by word enable; during a read one line stays at 1 and the other drops slightly below 1, and both lines go to the sense amplifiers]
DRAM cell
• “Dynamic” RAM cell
– 1 transistor (rather than 6)
– Relies on large capacitor to store bit
• Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
[Figure: DRAM cell: one transistor, gated by word enable, connects the data line to a capacitor that stores d; the stored charge slowly discharges]
RAM Example: Digital Sound Recorder
[Figure: microphone → analog-to-digital converter (12 bits, load ad_ld, buffer ad_buf) → processor, which drives a 4096x16 RAM (address Ra, control Rrw and Ren, 16-bit data bus) and a digital-to-analog converter (load da_ld) → speaker]
• Behavior
– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
• We’ll use a 4096x16 RAM (12-bit wide RAM not common)
– Play back later
– Common behavior in telephone answering machine, toys, voice recorders
• To record, processor should read a-to-d, store read values into successive RAM words
– To play, processor should read successive RAM words and enable d-to-a
RAM Example: Digital Sound Recorder
• RTL design of processor
[Figure: 4096x16 RAM]
RAM Example: Digital Sound Recorder
– Now create play behavior
– Use local register a again, create state machine that counts from 0 to 4095 again
• For each a
– Read RAM
– Write to digital-to-analog conv.
• Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
– The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
Play behavior
Local register: a (12 bits)
• V: a=0
• W: ad_buf=0, Ra=a, Rrw=0, Ren=1
• X: da_ld=1, a=a+1
• Transitions labeled a<4095 and a=4095 repeat the read/play sequence until all 4096 words are played
[Figure: 4096x16 RAM and processor sharing a 16-bit data bus with the converters (ad_buf, Ra, Rrw, Ren, ad_ld, da_ld)]
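The play behavior can be sketched at the cycle level in Python. Two details are assumptions: the RAM places the word read in state W on the data bus during the following cycle (state X), and the a<4095 / a=4095 arcs leave state X. Because a is a register, the condition in X sees the value before the increment, so all 4096 words (0 through 4095) are played. `play` is a hypothetical name.

```python
def play(ram):
    """Cycle-level sketch of the play state machine (states V, W, X).

    Assumptions: the word read in W is on the data bus during the next
    cycle (X), and the a<4095 / a=4095 transitions leave state X.
    Returns the list of values loaded into the digital-to-analog converter.
    """
    assert len(ram) == 4096
    played = []
    state, a, bus = "V", 0, None
    while True:
        if state == "V":
            a, state = 0, "W"                  # a = 0
        elif state == "W":
            # ad_buf = 0, Ra = a, Rrw = 0, Ren = 1: read RAM[a]
            bus = ram[a]                       # data valid on the bus next cycle
            state = "X"
        else:                                  # state X
            played.append(bus)                 # da_ld = 1: d-to-a loads the bus
            done = not (a < 4095)              # condition sees a before the increment
            a = (a + 1) & 0xFFF                # a = a + 1 (12-bit register)
            if done:
                return played
            state = "W"


samples = list(range(4096))
print(len(play(samples)))  # 4096
```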
Read-Only Memory – ROM
• Memory that can only be read from, not written to
– Data lines are output only
– No need for rw input
• Choose ROM over RAM if stored data won’t change (or won’t change often)
– For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
[Figure: 1024 × 32 RAM block symbol (32-bit data, 10-bit addr, rw, en) beside the ROM block symbol, which has no rw input]
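The thermometer example can be sketched as a table whose contents are fixed when it is built. This Python sketch assumes the address is the Celsius temperature (0 to 1023) and that each stored Fahrenheit value is rounded to an integer; both choices, and the names `build_c_to_f_rom` and `read_rom`, are hypothetical. A tuple is used so the contents cannot be changed after creation, mirroring a ROM's lack of an rw input.

```python
ROM_WORDS = 1024


def build_c_to_f_rom():
    """Contents fixed at design time: word at address c holds c Celsius in Fahrenheit."""
    return tuple(round(c * 9 / 5) + 32 for c in range(ROM_WORDS))


C_TO_F = build_c_to_f_rom()


def read_rom(rom, addr, en=1):
    """A ROM read: no rw input; the data lines are outputs only."""
    return rom[addr] if en else None


print(read_rom(C_TO_F, 100), read_rom(C_TO_F, 37))  # 212 99
```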
Read-Only Memory – ROM
[Figure: ROM block symbol for a 1024x32 ROM (32-bit data, 10-bit addr, en) and its internal structure: Let A = log2M; an AxM decoder, enabled by en, drives one word enable line per word (d0..d(M-1)); each word is a row of bit storage blocks (aka “cells”) that drive the data outputs]
ROM Types
• If a ROM can only be read, how are the bits stored in the first place?
– Several methods
• Mask-programmed ROM
– Bits are hardwired as 0s or 1s when the chip is manufactured
[Figure: ROM internal structure (Let A = log2 M; AxM decoder driving word enable lines; rows of cells); in a mask-programmed ROM, each cell's connection to its data line determines whether it stores 1 or 0]
ROM Types
• Erasable Programmable ROM (EPROM)
– Uses “floating-gate transistor” in each cell
• Electrons become trapped in the gate
• Only done for cells that should store 0
• Other cells (without electrons trapped in gate) will be 1
– 2-bit word on right stores “10”
• Details beyond our scope – just general idea is necessary here
– To erase, shine ultraviolet light onto chip
• Gives trapped electrons energy to escape
• Requires chip package to have window
[Figure: ROM internal structure with a floating-gate transistor in each cell; a 2-bit word whose left cell has no trapped electrons (stores 1) and whose right cell has trapped electrons (stores 0)]
ROM Types
• Electronically-Erasable Programmable ROM (EEPROM)
– Similar to EPROM
• Uses floating-gate transistor, electronic programming to trap electrons in certain cells
– But erasing done electronically, not using UV light
– Erasing done one word at a time
• Flash memory
– Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
– Became common relatively recently (late 1990s)
• Both types are in-system programmable
– Can be programmed with new stored bits while in the system in which the ROM operates
• Requires bi-directional data lines, and write control input
[Figure: 1024x32 EEPROM block symbol with 32-bit bidirectional data, 10-bit addr, en, write input, and busy output]
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• High-level state machine
[Figure: 4096x16 Flash memory]
Blurring of Distinction Between ROM and RAM
• We said that
– RAM is readable and writable
– ROM is read-only
• But some ROMs act almost like RAMs
– EEPROM and Flash are in-system programmable
• Essentially means that writes are slow
– Also, number of writes may be limited (perhaps a few million times)
• And, some RAMs act almost like ROMs
– Non-volatile RAMs: Can save their data without the power supply
• One type: Built-in battery, may work for up to 10 years
• Another type: Includes ROM backup for RAM – controller writes RAM contents to ROM before turning off
• New memory technologies evolving that merge RAM and ROM benefits
– e.g., MRAM
• Bottom line
– Lot of choices available to designer, must find best fit with design goals
[Figure: ROM, Flash, EEPROM, NVRAM, and RAM placed along a spectrum from read-only to readable and writable memory]
Hierarchy and Abstraction
• Abstraction
– Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction
• e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers
– Frees designer from having to remember, or even from having to understand, the lower-level details
[Figure: 8-bit adder block with inputs a7..a0, b7..b0, ci and outputs co, s7..s0]
Hierarchy and Composing Larger Components from Smaller Versions
• A common task is to compose smaller components into a larger one
– Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
• Can simply compose the 9-input gate from several 3-input gates
– Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
• s2 selects either top or bottom 4x1
• s1s0 select particular 4x1 input
• Implements 8x1 mux – 8 data inputs, 3 selects, one output
[Figure: a 9-input AND gate built from 3-input AND gates; an 8x1 mux built from two 4x1 muxes (inputs i0–i3 and i4–i7, both selected by s1 s0) feeding a 2x1 mux selected by s2, which produces output d]
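Both compositions can be expressed as functions built from smaller functions, mirroring the hierarchy. This Python sketch assumes s2 = 0 selects the top 4x1 mux (inputs i0–i3) and uses four 3-input ANDs for the 9-input AND (three in the first level, one combining them); the function names are hypothetical.

```python
def mux4(i, s1, s0):
    """4x1 mux: i is a list of 4 inputs; s1 s0 form the select value."""
    return i[(s1 << 1) | s0]


def mux2(i0, i1, s):
    """2x1 mux: i0 when s = 0, i1 when s = 1."""
    return i1 if s else i0


def mux8(i, s2, s1, s0):
    """8x1 mux composed from two 4x1 muxes and one 2x1 mux."""
    top = mux4(i[0:4], s1, s0)      # i0..i3
    bottom = mux4(i[4:8], s1, s0)   # i4..i7
    return mux2(top, bottom, s2)    # s2 chooses top (0) or bottom (1)


def and3(a, b, c):
    return a & b & c


def and9(x):
    """9-input AND from 3-input ANDs: three in the first level, one combining them."""
    return and3(and3(*x[0:3]), and3(*x[3:6]), and3(*x[6:9]))


data = [10, 11, 12, 13, 14, 15, 16, 17]
print(mux8(data, 1, 0, 1), and9([1] * 9))  # 15 1
```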
Hierarchy and Composing Larger Components from Smaller Versions
• Composing memory very common
• Making memory words wider
– Easy – just place memories side-by-side until desired width obtained
– Share address/control lines, concatenate data lines
– Example: Compose 1024x8 ROMs into 1024x32 ROM
[Figure: four 1024x8 ROMs share the 10-bit addr and en lines; their 8-bit data outputs are concatenated into data(31..0), forming a 1024x32 ROM]
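Widening a memory only concatenates data lines, which this Python sketch shows with four 8-bit-wide ROM models sharing the same address and enable. The assignment of the first ROM to data(31..24) is an assumption, and `make_rom` and `wide_rom_read` are hypothetical names.

```python
def make_rom(contents):
    """Model a ROM (here 1024 words of 8 bits) as a read function over fixed contents."""
    contents = tuple(contents)

    def read(addr, en=1):
        return contents[addr] if en else None

    return read


def wide_rom_read(roms, addr, en=1):
    """1024x32 ROM from four 1024x8 ROMs: shared addr and en, data lines
    concatenated. roms[0] supplies data(31..24), roms[3] supplies data(7..0)."""
    if not en:
        return None
    word = 0
    for rom in roms:
        word = (word << 8) | rom(addr, en)
    return word


roms = [make_rom([byte] * 1024) for byte in (0x12, 0x34, 0x56, 0x78)]
print(hex(wide_rom_read(roms, 7)))  # 0x12345678
```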
Hierarchy and Composing Larger Components from Smaller Versions
• Creating memory with more words
– Put memories on top of one another until the number of desired words is achieved
– Use decoder to select among the memories
• Can use highest order address input(s) as decoder input
• Although actually, any address line could be used
– Example: Compose 1024x8 memories into 2048x8 memory
[Figure: 11-bit address a10..a0; a 1x2 decoder driven by a10 enables either the top or the bottom 1024x8 ROM, while a9..a0 go to the addr inputs of both; addresses 00000000000 through 01111111111 select the top ROM and 10000000000 through 11111111111 select the bottom ROM, so a10 just chooses which memory to access]
• To create memory with more words and wider words, can first compose to enough words, then widen.
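Adding words uses the top address bit to pick a memory and the remaining bits to address within it. This Python sketch models the 1x2 decoder and two 1024x8 ROMs as lists, with a10 = 0 enabling the top ROM as in the address table; the names `decoder_1x2` and `tall_rom_read` are hypothetical.

```python
def decoder_1x2(i0, e=1):
    """1x2 decoder with enable e: returns (d0, d1)."""
    return (e & (1 - i0), e & i0)


def tall_rom_read(top, bottom, addr, en=1):
    """2048x8 ROM from two 1024x8 ROMs: a10 drives the decoder, a9..a0 go to both."""
    a10 = (addr >> 10) & 1
    low = addr & 0x3FF                 # a9..a0
    d0, d1 = decoder_1x2(a10, en)
    if d0:
        return top[low]
    if d1:
        return bottom[low]
    return None                        # en = 0: neither ROM drives the data lines


top = [a & 0xFF for a in range(1024)]
bottom = [255 - (a & 0xFF) for a in range(1024)]
print(tall_rom_read(top, bottom, 5), tall_rom_read(top, bottom, 1024 + 5))  # 5 250
```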
Chapter Summary
– Modern digital design involves creating processor-level components
– Four-step RTL method can be used
• 1. High-level state machine
• 2. Create datapath
• 3. Connect datapath to controller
• 4. Derive controller FSM
– Several examples
• Control dominated, data dominated, and mix
– Determining fastest clock frequency
• By finding critical path
– Behavioral-level design – C to gates
• By using method to convert C (subset) to high-level state machine
– Additional RTL components
• Memory: RAM, ROM
• Queues
– Hierarchy: A key concept used throughout Chapters 2-5