RTL Design

Chapter 5 of 'Digital Design' focuses on Register-Transfer Level (RTL) design, detailing the process of creating custom processors by integrating controllers and datapath components. It outlines the steps for capturing behavior using high-level state machines and converting them into circuits, illustrated through examples like a soda dispenser and a laser-based distance measurer. The chapter emphasizes the importance of defining inputs, outputs, and local registers while developing a high-level state machine to describe processor behavior.

Digital Design

Chapter 5:
Register-Transfer Level
(RTL) Design
Slides to accompany the textbook Digital Design, First Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2007.
http://www.ddvahid.com

Copyright © 2007 Frank Vahid


Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley; see http://www.ddvahid.com for information.

Digital Design
Copyright © 2006
Frank Vahid

5.1

Introduction

• Chapter 3: Controllers
– Control input/output: single bit (or just a few) representing event or state
– Finite-state machine describes behavior; implemented as state register and combinational logic
• Chapter 4: Datapath components
– Data input/output: Multiple bits collectively representing single entity
– Datapath components included registers, adders, ALU, comparators, register files, etc.
• This chapter: custom processors
– Processor: Controller and datapath components working together to implement an algorithm

[Figure: a controller (combinational logic and state register, with inputs bi, outputs bo, state bits s1 s0, next-state bits n1 n0, and clk); example datapath components (register, comparator, ALU, register file); and a processor that combines a controller with a datapath.]

Note: Slides with animation are denoted with a small red "a" near the animated items
RTL Design: Capture Behavior, Convert to Circuit
• Recall
– Chapter 2: Combinational Logic Design
• First step: Capture behavior (using equation or truth table)
• Remaining steps: Convert to circuit
– Chapter 3: Sequential Logic Design
• First step: Capture behavior (using FSM)
• Remaining steps: Convert to circuit
• RTL Design (the method for creating custom processors)
– First step: Capture behavior (using high-level state machine, to be introduced)
– Remaining steps: Convert to circuit

[Figure: two-stage flow, "Capture behavior" followed by "Convert to circuit".]

5.2

RTL Design Method

RTL Design Method: “Preview” Example
• Soda dispenser
– c: bit input, 1 when coin deposited
– a: 8-bit input having value of deposited coin
– s: 8-bit input having cost of a soda
– d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
• How can we precisely describe this processor’s behavior?

[Figure: soda dispenser processor with inputs c, a, s and output d. Animated example with s = 50 and two coins of value a = 25: tot becomes 25, then 50, after which d pulses to 1.]

Preview Example: Step 1 – Capture High-Level State Machine
• Declare local register tot
• Init state: Set d=0, tot=0
• Wait state: wait for coin
– If see coin, go to Add state
• Add state: Update total value: tot = tot + a
– Remember, a is present coin’s value
– Go back to Wait state
• In Wait state, if tot >= s, go to Disp(ense) state
• Disp state: Set d=1 (dispense soda)
– Return to Init state

Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)

[Figure: high-level state machine. Init (d=0, tot=0) goes to Wait. Wait goes to Add on c, stays in Wait on c’*(tot<s), and goes to Disp on c’*(tot<s)’. Add (tot=tot+a) returns to Wait. Disp (d=1) returns to Init.]

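The high-level state machine above can be exercised in software before any circuit is built. The following is a minimal sketch in Python, not part of the original slides: the function name `soda_dispenser` and the per-cycle stimulus format are assumptions made for illustration, and each loop iteration stands for one clock cycle.

```python
def soda_dispenser(s, cycles):
    """Simulate the soda dispenser high-level state machine.

    s      -- cost of a soda (8-bit input)
    cycles -- list of (c, a) input pairs, one per clock cycle (hypothetical stimulus)
    Returns the value of output d in each cycle.
    """
    state, tot, d_trace = "Init", 0, []
    for c, a in cycles:
        d = 0
        if state == "Init":
            tot = 0                      # d=0, tot=0
            next_state = "Wait"
        elif state == "Wait":
            if c:
                next_state = "Add"       # coin detected
            elif tot < s:
                next_state = "Wait"      # c'*(tot<s)
            else:
                next_state = "Disp"      # c'*(tot<s)'
        elif state == "Add":
            tot = (tot + a) & 0xFF       # tot is an 8-bit register
            next_state = "Wait"
        else:                            # Disp
            d = 1
            next_state = "Init"
        d_trace.append(d)
        state = next_state
    return d_trace


# Two 25-cent coins against a 50-cent soda; a is held during each Add cycle.
stimulus = [(0, 0), (1, 25), (0, 25), (1, 25), (0, 25), (0, 0), (0, 0), (0, 0)]
print(soda_dispenser(50, stimulus))  # [0, 0, 0, 0, 0, 0, 1, 0]
```

The value written to tot in one cycle is only read in a later cycle here, so the sketch agrees with the register timing of the real circuit for this machine.
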
Preview Example: Step 1 – Create a High-Level State Machine
• Let’s consider each step of the RTL design process in more detail
• Step 1
– Soda dispenser example
– Not an FSM because:
• Multi-bit (data) inputs a and s
• Local register tot
• Data operations tot=0, tot<s, tot=tot+a
– Useful high-level state machine:
• Data types beyond just bits
• Local registers
• Arithmetic equations/expressions

Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)

[Figure: the soda dispenser high-level state machine with states Init, Wait, Add, and Disp, as on the previous slide.]

Preview Example: Step 2 – Create Datapath
• Need tot register
• Need 8-bit comparator to compare s and tot
• Need 8-bit adder to perform tot = tot + a
• Wire the components as needed for above
• Create control input/outputs, give them names

[Figure: datapath with inputs s and a (8 bits each); 8-bit register tot with control inputs tot_ld (load) and tot_clr (clear); 8-bit comparator computing tot < s, with output tot_lt_s; 8-bit adder computing tot + a, feeding the tot register.]

Preview Example: Step 3 – Connect Datapath to a Controller
• Controller’s inputs
– External input c (coin detected)
– Input from datapath comparator’s output, which we named tot_lt_s
• Controller’s outputs
– External output d (dispense soda)
– Outputs to datapath to load and clear the tot register

[Figure: controller (input c, output d) connected to the datapath (inputs s and a) through the signals tot_ld, tot_clr, and tot_lt_s.]

Preview Example: Step 4 – Derive the Controller’s FSM
• Same states and arcs as high-level state machine
• But set/read datapath control signals for all datapath operations and conditions

Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)

[Figure: controller FSM. Init (d=0, tot_clr=1) goes to Wait. Wait goes to Add on c, stays in Wait on c’*tot_lt_s, and goes to Disp on c’*tot_lt_s’. Add (tot_ld=1) returns to Wait. Disp (d=1) returns to Init.]

Preview Example: Completing the Design
• Implement the FSM as a state register and logic
– As in Ch3
– Table shown below

Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)

State  s1 s0  c  tot_lt_s | n1 n0  d  tot_ld  tot_clr
Init    0  0  0     0     |  0  1  0    0       1
Init    0  0  0     1     |  0  1  0    0       1
Init    0  0  1     0     |  0  1  0    0       1
Init    0  0  1     1     |  0  1  0    0       1
Wait    0  1  0     0     |  1  1  0    0       0
Wait    0  1  0     1     |  0  1  0    0       0
Wait    0  1  1     0     |  1  0  0    0       0
Wait    0  1  1     1     |  1  0  0    0       0
Add     1  0  0     0     |  0  1  0    1       0
Disp    1  1  0     0     |  0  0  1    0       0

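One way to sanity-check the table is to write the controller FSM as a function and print rows from it. The sketch below is illustrative only: the state encoding (Init=00, Wait=01, Add=10, Disp=11) is read off the table, while the names `controller` and `table_row` are assumptions introduced here.

```python
INIT, WAIT, ADD, DISP = 0, 1, 2, 3   # encoded on s1 s0 as 00, 01, 10, 11


def controller(state, c, tot_lt_s):
    """Return (next_state, d, tot_ld, tot_clr) for the soda dispenser controller."""
    if state == INIT:
        return WAIT, 0, 0, 1                      # clear tot, d=0
    if state == WAIT:
        if c:
            return ADD, 0, 0, 0                   # coin detected
        return (WAIT if tot_lt_s else DISP), 0, 0, 0
    if state == ADD:
        return WAIT, 0, 1, 0                      # load tot with tot + a
    return INIT, 1, 0, 0                          # DISP: dispense soda


def table_row(state, c, tot_lt_s):
    """Format one row as: s1 s0 c tot_lt_s n1 n0 d tot_ld tot_clr."""
    nxt, d, tot_ld, tot_clr = controller(state, c, tot_lt_s)
    bits = [state >> 1, state & 1, c, tot_lt_s, nxt >> 1, nxt & 1, d, tot_ld, tot_clr]
    return " ".join(str(b) for b in bits)


for c in (0, 1):
    for lt in (0, 1):
        print(table_row(WAIT, c, lt))
```

The four printed rows should match the Wait rows of the table; the same function can be called for the other states.
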
Example: Laser-Based Distance Measurer
• Example of how to create a high-level state machine to describe desired processor behavior
• Laser-based distance measurement – pulse laser, measure time T to sense reflection
– Laser light travels at speed of light, 3*10^8 m/sec
– Distance is thus D = T sec * 3*10^8 m/sec / 2

[Figure: a laser pulse travels distance D to the object of interest and reflects back to the sensor, taking T seconds in total, so 2D = T sec * 3*10^8 m/sec.]

Step 1 Example: Laser-Based Distance Measurer
• Inputs/outputs
– B: bit input, from button to begin measurement
– L: bit output, activates laser
– S: bit input, senses laser reflection
– D: 16-bit output, displays computed distance

[Figure: laser-based distance measurer block with B (from button), L (to laser), S (from sensor), and D (16 bits, to display).]

Step 1 Example: Laser-Based Distance Measurer
• Step 1: Create high-level state machine
• Begin by declaring inputs and outputs
• Create initial state, name it S0
– Initialize laser to off (L=0)
– Initialize displayed distance to 0 (D=0)

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)

[Figure: state S0 with L = 0 (laser off) and D = 0 (distance = 0); next state still to be determined.]

Step 1 Example: Laser-Based Distance Measurer
• Add another state, call it S1, that waits for a button press
– B’ – stay in S1, keep waiting
– B – go to a new state S2

Q: What should S2 do? A: Turn on the laser

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)

[Figure: S0 (L=0, D=0) goes to S1. S1 stays in S1 on B’ (button not pressed) and leaves on B (button pressed); next state still to be determined.]

Step 1 Example: Laser-Based Distance Measurer
• Add a state S2 that turns on the laser (L=1)
• Then turn off laser (L=0) in a state S3

Q: What to do next? A: Start timer, wait to sense reflection

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)

[Figure: S0 (L=0, D=0) goes to S1. S1 stays on B’ and goes to S2 on B. S2 (L=1, laser on) goes to S3 (L=0, laser off).]

Step 1 Example: Laser-Based Distance Measurer
• Stay in S3 until sense reflection (S)
• To measure time, count cycles for which we are in S3
– To count, declare local register Dctr
– Increment Dctr each cycle in S3
– Initialize Dctr to 0 in S1. S2 would have been O.K. too

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)

[Figure: S0 (L=0, D=0) goes to S1. S1 (Dctr = 0, reset cycle count) stays on B’ and goes to S2 on B. S2 (L=1) goes to S3. S3 (L=0, Dctr = Dctr + 1, count cycles) stays on S’ (no reflection) and leaves on S (reflection); next state still to be determined.]

Step 1 Example: Laser-Based Distance Measurer
• Once reflection detected (S), go to new state S4
– Calculate distance
– Assuming clock frequency is 3x10^8 Hz, Dctr holds number of meters, so D=Dctr/2
• After S4, go back to S1 to wait for button again

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)

[Figure: S0 (L=0, D=0) goes to S1. S1 (Dctr = 0) stays on B’ and goes to S2 on B. S2 (L=1) goes to S3. S3 (L=0, Dctr = Dctr + 1) stays on S’ and goes to S4 on S. S4 (D = Dctr / 2, calculate D) returns to S1.]

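The completed state machine S0 to S4 can be traced in software to confirm that the counted cycles turn into the expected distance. This is a rough sketch under stated assumptions: the function name `laser_measurer` and the input traces are hypothetical, one loop iteration represents one clock cycle, and outputs are recorded in the cycle in which the state assigns them (the extra cycle of delay through an output register is ignored).

```python
def laser_measurer(b_trace, s_trace):
    """Trace the laser-based distance measurer high-level state machine.

    b_trace, s_trace -- per-cycle values of inputs B and S (hypothetical stimulus)
    Returns two per-cycle lists: the values of L and of D.
    """
    state, dctr, d_out = "S0", 0, 0
    l_vals, d_vals = [], []
    for b, s in zip(b_trace, s_trace):
        laser = 0
        if state == "S0":
            d_out = 0                        # D = 0
            nxt = "S1"
        elif state == "S1":
            dctr = 0                         # reset cycle count
            nxt = "S2" if b else "S1"
        elif state == "S2":
            laser = 1                        # laser on for one cycle
            nxt = "S3"
        elif state == "S3":
            dctr = (dctr + 1) & 0xFFFF       # 16-bit counter, counts cycles in S3
            nxt = "S4" if s else "S3"
        else:                                # S4
            d_out = dctr // 2                # D = Dctr / 2
            nxt = "S1"
        l_vals.append(laser)
        d_vals.append(d_out)
        state = nxt
    return l_vals, d_vals


# Button pressed in cycle 1; reflection sensed during the fourth cycle spent in S3.
B = [0, 1, 0, 0, 0, 0, 0, 0, 0]
S = [0, 0, 0, 0, 0, 0, 1, 0, 0]
print(laser_measurer(B, S))
```

With this stimulus Dctr reaches 4, so D becomes 2 once S4 executes.
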
Step 2: Create a Datapath
• Datapath must
– Implement data storage
– Implement data computations
• Look at high-level state machine, do three substeps
– (a) Make data inputs/outputs be datapath inputs/outputs
– (b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
– (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

Instantiate: to introduce a new component into a design.

Step 2 Example: Laser-Based Distance Measurer
(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)

[Figure: the S0 to S4 high-level state machine, and a datapath containing Dctr, a 16-bit up-counter (clear = Dctr_clr, count = Dctr_cnt), and Dreg, a 16-bit register (clear = Dreg_clr, load = Dreg_ld) whose output Q drives the 16-bit output D.]

Step 2 Example: Laser-Based Distance Measurer
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)

[Figure: the same datapath with a right-shift-by-one (>>1) component inserted between the Q output of Dctr and the I input of Dreg, implementing D = Dctr / 2.]

Step 3: Connecting the Datapath to a Controller
• Laser-based distance measurer example
• Easy – just connect all control signals between controller and datapath

[Figure: controller with inputs B (from button) and S (from sensor) and output L (to laser), connected to the datapath through Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt; the datapath drives D (16 bits, to display); both run from a 300 MHz clock. The datapath contains the Dctr up-counter, the >>1 shifter, and the Dreg register.]

Step 4: Deriving the Controller’s FSM
• FSM has same structure as high-level state machine
– Inputs/outputs all bits now
– Replace data operations by bit operations using datapath

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

Transitions: S0 to S1; S1 stays on B’, goes to S2 on B; S2 to S3; S3 stays on S’, goes to S4 on S; S4 to S1.

State  L  Dreg_clr  Dreg_ld  Dctr_clr  Dctr_cnt  Comment
S0     0     1         0        0         0      laser off, clear D reg
S1     0     0         0        1         0      clear count
S2     1     0         0        0         0      laser on
S3     0     0         0        0         1      laser off, count up
S4     0     0         1        0         0      load D reg with Dctr/2, stop counting

Step 4: Deriving the Controller’s FSM
• Using the shorthand that outputs not assigned in a state are implicitly assigned 0

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

State  Assigned outputs               Comment
S0     L=0, Dreg_clr=1                laser off, clear D reg
S1     Dctr_clr=1                     clear count
S2     L=1                            laser on
S3     L=0, Dctr_cnt=1                laser off, count up
S4     Dreg_ld=1, Dctr_cnt=0          load D reg with Dctr/2, stop counting

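The shorthand can be expanded mechanically back into the full per-state output assignment. The snippet below is a small illustrative sketch; the dictionary layout and the helper name `expand` are assumptions, not notation from the slides.

```python
OUTPUTS = ["L", "Dreg_clr", "Dreg_ld", "Dctr_clr", "Dctr_cnt"]

# Shorthand form: only the outputs that each state explicitly assigns.
SHORTHAND = {
    "S0": {"L": 0, "Dreg_clr": 1},
    "S1": {"Dctr_clr": 1},
    "S2": {"L": 1},
    "S3": {"L": 0, "Dctr_cnt": 1},
    "S4": {"Dreg_ld": 1, "Dctr_cnt": 0},
}


def expand(shorthand):
    """Fill in 0 for every output that a state does not assign."""
    return {
        state: {name: assigned.get(name, 0) for name in OUTPUTS}
        for state, assigned in shorthand.items()
    }


full = expand(SHORTHAND)
for state in sorted(full):
    print(state, [full[state][name] for name in OUTPUTS])
```

Each printed row should agree with the fully specified FSM on the previous slide.
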
Step 4
• Implement FSM as state register and logic (Ch3) to complete the design

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

[Figure: the controller and datapath from Step 3 (300 MHz clock), together with the shorthand FSM: S0 (L=0, Dreg_clr=1), S1 (Dctr_clr=1), S2 (L=1), S3 (L=0, Dctr_cnt=1), S4 (Dreg_ld=1, Dctr_cnt=0).]

Step 2 Example Showing Mux Use
• Introduce mux when one component input can come from more than one source

Local registers: E, F, G, R (16 bits)

[Figure: (a) high-level state machine with state T0: R = E + F and state T1: R = R + G; (b), (c) adders instantiated for the two operations; (d) a single adder whose A and B inputs are each fed by a 2×1 mux, with select lines add_A_s0 and add_B_s0, and whose output feeds R.]

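The mux-based datapath of part (d) can be mimicked with a two-input select function. This is only a sketch: the slide does not say which mux input carries which operand, so the choice that input 0 carries the T0 operands (E and F) and input 1 carries the T1 operands (R and G) is an assumption, as are the function names.

```python
def mux2(sel, in0, in1):
    """2x1 multiplexer: pass in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0


def shared_adder_step(state, E, F, G, R):
    """Return the value loaded into R at the end of the given state (T0 or T1)."""
    # Assumed select encoding: 0 in T0, 1 in T1.
    add_A_s0 = 1 if state == "T1" else 0
    add_B_s0 = 1 if state == "T1" else 0
    a = mux2(add_A_s0, E, R)      # adder input A: E in T0, R in T1
    b = mux2(add_B_s0, F, G)      # adder input B: F in T0, G in T1
    return (a + b) & 0xFFFF       # 16-bit result


R = shared_adder_step("T0", 3, 4, 5, 0)   # R = E + F
R = shared_adder_step("T1", 3, 4, 5, R)   # R = R + G
print(R)  # 12
```
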
5.3

RTL Design Examples and Issues


• We’ll use several more examples to illustrate RTL design
• Example: Bus interface
– Master processor can read register from any peripheral
• Each register has unique 4-bit address
• Assume 1 register/periph.
– Sets rd=1, A=address
– Appropriate peripheral places register data on 32-bit D lines
• Periph’s address provided on Faddr inputs (maybe from DIP switches, or another register)

[Figure: master processor connected over a bus (rd, D with 32 bits, A with 4 bits) to peripherals Per0, Per1, ..., Per15. Inside each peripheral, a bus interface sits between the processor bus and the peripheral’s main part, with a 4-bit Faddr input and a 32-bit Q input from the main part.]

RTL Example: Bus Interface
• Step 1: Create high-level state machine
– State WaitMyAddress
• Output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1
• Wait until this peripheral’s address is seen (A=Faddr) and rd=1
– State SendData
• Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)

Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)

[Figure: WaitMyAddress (D = “Z”, Q1 = Q) stays on ((A = Faddr) and rd)’ and goes to SendData on (A = Faddr) and rd. SendData (D = Q1) stays on rd and returns to WaitMyAddress on rd’.]

RTL Example: Bus Interface
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)

[Figure: the same high-level state machine with a timing diagram of clk, input rd, the state, and output D. The state sequence is W, W, SD, W, W, SD, SD, W (W = WaitMyAddress, SD = SendData); D is Z while in W and Q1 while in SD.]

RTL Example: Bus Interface
• Step 2: Create a datapath
(a) Datapath inputs/outputs
(b) Instantiate declared registers
(c) Instantiate datapath components and connections

Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)

[Figure: datapath with inputs A and Faddr (4 bits each) and Q (32 bits); a 4-bit equality comparator producing A_eq_Faddr; the 32-bit register Q1 with load control Q1_ld; and a 32-bit output driver enabled by D_en that places Q1 on D.]

RTL Example: Bus Interface
• Step 3: Connect datapath to controller
• Step 4: Derive controller’s FSM

Inputs: rd, A_eq_Faddr (bit)
Outputs: Q1_ld, D_en (bit)

[Figure: controller FSM. WaitMyAddress (D_en = 0, Q1_ld = 1) stays on (A_eq_Faddr and rd)’ and goes to SendData on A_eq_Faddr and rd. SendData (D_en = 1, Q1_ld = 0) stays on rd and returns to WaitMyAddress on rd’. The controller connects to the bus interface datapath through Q1_ld, A_eq_Faddr, and D_en.]

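The bus interface behavior can be checked with a short cycle-by-cycle model. The sketch below is an illustration under assumptions: the function name `bus_interface` and the stimulus are hypothetical, and Python's `None` stands in for the high-impedance value “Z” on D. Next values are computed first and applied together at the end of each iteration, mirroring how registers update on the clock edge.

```python
def bus_interface(faddr, cycles):
    """Model one peripheral's bus interface.

    faddr  -- this peripheral's 4-bit address
    cycles -- list of (rd, A, Q) input tuples, one per clock cycle (hypothetical stimulus)
    Returns the per-cycle value driven on D, with None meaning "Z".
    """
    state, q1, d_vals = "WaitMyAddress", 0, []
    for rd, a, q in cycles:
        a_eq_faddr = (a == faddr)
        if state == "WaitMyAddress":
            d = None                       # D_en = 0: drive nothing
            q1_next = q                    # Q1_ld = 1: keep capturing Q
            nxt = "SendData" if (a_eq_faddr and rd) else "WaitMyAddress"
        else:                              # SendData
            d = q1                         # D_en = 1: drive Q1 onto D
            q1_next = q1                   # Q1_ld = 0: hold Q1
            nxt = "SendData" if rd else "WaitMyAddress"
        d_vals.append(d)
        state, q1 = nxt, q1_next           # clock edge
    return d_vals


trace = [(0, 0, 10), (1, 5, 11), (1, 5, 12), (0, 5, 13), (1, 3, 14)]
print(bus_interface(5, trace))  # [None, None, 11, 11, None]
```

The value sent is the Q sampled in the cycle where the address matched (11 here), since Q1 stops loading once SendData is entered.
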
RTL Example: Video Compression – Sum of Absolute Differences
• Video is a series of frames (e.g., 30 per second)
• Most frames similar to previous frame
– Compression idea: just send difference from previous frame

[Figure: (a) Frame 1 and Frame 2, whose only difference is a moving ball, digitized as 1 Mbyte each; (b) digitized frame 1 (1 Mbyte) plus the difference of frame 2 from frame 1 (0.01 Mbyte): just send the difference.]

RTL Example: Video Compression – Sum of Absolute Differences
• Need to quickly determine whether two frames are similar enough to just send difference for second frame
– Compare corresponding 16x16 “blocks”
• Treat 16x16 block as 256-byte array
– Compute the absolute value of the difference of each array item
– Sum those differences – if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)

[Figure: corresponding blocks of Frame 1 and Frame 2 being compared. Each element is a pixel, assumed to be represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel).]

RTL Example: Video Compression – Sum of Absolute Differences
• Want fast sum-of-absolute-differences (SAD) component
– When go=1, sums the differences of element pairs in arrays A and B, outputs that sum

[Figure: SAD component with inputs A (256-byte array), B (256-byte array), and go, and integer output sad.]

RTL Example: Video Compression – Sum of Absolute Differences
• S0: wait for go
• S1: initialize sum and index
• S2: check if done (i>=256)
• S3: add difference to sum, increment index
• S4: done, write to output sad_reg

Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)

[Figure: high-level state machine. S0 stays on !go and goes to S1 on go. S1 (sum = 0, i = 0) goes to S2. S2 goes to S3 on i<256 and to S4 on (i<256)’. S3 (sum=sum+abs(A[i]-B[i]), i=i+1) returns to S2. S4 (sad_reg = sum) returns to S0.]

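Stepping through the states S1 to S4 in software gives both the SAD result and a clock-cycle count. This is a hedged sketch: the function name `sad_hlsm` is an assumption, the run is taken to start in S1 (that is, go has just been seen), and each loop iteration counts as one cycle.

```python
def sad_hlsm(A, B):
    """Run the SAD high-level state machine once.

    A, B -- sequences of 256 byte values
    Returns (sad, cycles), where cycles counts the clock cycles spent in S1..S4.
    """
    state, total, i, sad_reg, cycles = "S1", 0, 0, 0, 0
    while state != "done":
        cycles += 1
        if state == "S1":
            total, i = 0, 0                   # sum = 0, i = 0
            state = "S2"
        elif state == "S2":
            state = "S3" if i < 256 else "S4"
        elif state == "S3":
            total += abs(A[i] - B[i])         # sum = sum + abs(A[i] - B[i])
            i += 1
            state = "S2"
        else:                                 # S4
            sad_reg = total                   # sad_reg = sum
            state = "done"
    return sad_reg, cycles


A = list(range(256))
B = [0] * 256
print(sad_hlsm(A, B))  # (32640, 515)
```

The count of 515 is the 512 cycles of the S2/S3 loop plus one cycle each for S1, the final S2 check, and S4.
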
RTL Example: Video Compression – Sum of Absolute Differences
• Step 2: Create datapath

Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)

[Figure: the SAD high-level state machine alongside its datapath. The 9-bit register i (controls i_inc, i_clr) drives AB_addr and a "<256" comparator with output i_lt_256. A_data and B_data (8 bits each) feed a subtractor, then an abs component, then a 32-bit adder with sum. Register sum has controls sum_ld and sum_clr; register sad_reg has control sad_reg_ld and drives output sad (32 bits). The condition !(i<256) corresponds to (i_lt_256)’.]

RTL Example: Video Compression – Sum of Absolute Differences
• Step 3: Connect to controller
• Step 4: Replace high-level state machine by FSM

State  High-level action                          FSM outputs
S0     wait for go (stay on go’)                  (none)
S1     sum=0, i=0                                 sum_clr=1, i_clr=1
S2     check i<256                                reads i_lt_256
S3     sum=sum+abs(A[i]-B[i]), i=i+1              sum_ld=1, AB_rd=1, i_inc=1
S4     sad_reg=sum                                sad_reg_ld=1

[Figure: controller (input go, output AB_rd) connected to the SAD datapath through i_lt_256, i_inc, i_clr, sum_ld, sum_clr, and sad_reg_ld; the datapath uses AB_addr, A_data, B_data and drives sad.]

RTL Example: Video Compression – Sum of Absolute Differences
• Comparing software and custom circuit SAD
– Circuit: Two states (S2 & S3) for each i, 256 i’s → 512 clock cycles
– Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per array item → 256*6 = 1536 cycles
– Circuit is about 3 times as fast (1536 / 512 = 3)
– Later, we’ll see how to build SAD circuit that is even faster

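The cycle comparison is simple arithmetic and can be reproduced directly. The variable names below are illustrative, and the figure of 6 cycles per item is the slide's own rough estimate for software.

```python
ITEMS = 256

circuit_cycles = 2 * ITEMS     # states S2 and S3 for each i
software_cycles = 6 * ITEMS    # slide's estimate: about 6 cycles per array item
speedup = software_cycles / circuit_cycles

print(circuit_cycles, software_cycles, speedup)  # 512 1536 3.0
```
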
RTL Design Pitfalls and Good Practice
• Common pitfall: Assuming register is updated in the state it’s written
– Final value of Q?
– Final state?
– Answers may surprise you
• Value of Q unknown
• Final state is C, not D
– Why?
• State A: R=99 and Q=R happen simultaneously
• State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated

Local registers: R, Q (8 bits)

[Figure: (a) state A (R=99, Q=R) goes to B (R=R+1); B goes to C on R<100 and to D on R>=100. (b) timing diagram over states A, B, C: R is ?, 99, 100 and Q is ?, ?, ?.]

RTL Design Pitfalls and Good Practice
• Solutions
– Read register in following state (Q=R)
– Insert extra state so that conditions use updated value
– Other solutions are possible, depends on the example

Local registers: R, Q (8 bits)

[Figure: (a) state A (R=99) goes to B (R=R+1, Q=R, with Q=R moved here from state A), then to a new state B2; B2 goes to C on R<100 and to D on R>=100. (b) timing diagram over states A, B, B2, D: R is ?, 99, 100, 100 and Q is ?, ?, 99, 99.]

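The pitfall is easy to reproduce in a simulation that separates "compute next values" from "clock edge". The sketch below models the original (pre-fix) machine with states A, B, C, and D; the function name `simulate_pitfall` is an assumption, and `None` models an unknown ("?") register value.

```python
def simulate_pitfall(n_cycles):
    """State A: R=99, Q=R. State B: R=R+1, then branch on R<100.

    Returns (state, R, Q) after n_cycles clock edges.
    """
    state, R, Q = "A", None, None
    for _ in range(n_cycles):
        # Phase 1: compute next values from the CURRENT register contents.
        if state == "A":
            next_R, next_Q, next_state = 99, R, "B"     # Q gets the old (unknown) R
        elif state == "B":
            next_R, next_Q = R + 1, Q
            next_state = "C" if R < 100 else "D"        # condition sees old R (99)
        else:
            next_R, next_Q, next_state = R, Q, state    # C and D just hold
        # Phase 2: the clock edge updates state register, R, and Q together.
        state, R, Q = next_state, next_R, next_Q
    return state, R, Q


print(simulate_pitfall(2))  # ('C', 100, None)
```

The result matches the slide: the machine ends in C rather than D, and Q never receives 99.
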
RTL Design Pitfalls and Good Practice
• Common pitfall: Reading outputs
– Outputs can only be written
– Solution: Introduce additional register, which can be written and read

[Figure: (a) Inputs: A, B (8 bits); Outputs: P (8 bits). State S (P=A) goes to state T (P=P+B), which incorrectly reads output P. (b) Inputs: A, B (8 bits); Outputs: P (8 bits); Local register: R (8 bits). State S (R=A, P=A) goes to state T (P=R+B).]

RTL Design Pitfalls and Good Practice
• Good practice: Register all data outputs
– In fig (a), output P would show spurious values as addition computes
• Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
– In fig (b), spurious outputs reduced, and longest register-to-register path is clear

[Figure: (a) inputs B and R feed an adder whose result drives output P directly; (b) the adder result is stored in a register Preg, which drives P.]

Control vs. Data Dominated RTL Design
• Designs often categorized as control-dominated or data-dominated
– Control-dominated design – Controller contains most of the complexity
– Data-dominated design – Datapath contains most of the complexity
– General, descriptive terms – no hard rule that separates the two types of designs
– Laser-based distance measurer – control dominated
– Bus interface, SAD circuit – mix of control and data
– Now let’s do a data dominated design

Data Dominated RTL Design Example: FIR Filter
• Filter concept
– Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
– That 240 is probably wrong!
• Could be electrical noise
– Filter should remove such noise in its output Y
– Simple filter: Output average of last N values
• Small N: less filtering
• Large N: more filtering, but less sharp output

[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clk.]

Data Dominated RTL Design Example: FIR Filter
• FIR filter
– “Finite Impulse Response”
– Simply a configurable weighted sum of past input values
– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Above known as “3 tap”
• Tens of taps more common
• Very general filter – User sets the constants (c0, c1, c2) to define specific filter
– RTL design
• Step 1: Create high-level state machine
– But there really is none! Data dominated indeed.
• Go straight to step 2

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath
– Begin by creating chain of xt registers to hold past values of X

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
Suppose sequence is: 180, 181, 240

[Figure: 3-tap FIR filter datapath with 12-bit registers xt0, xt1, xt2 in a chain holding x(t), x(t-1), x(t-2). As the values 180, 181, 240 arrive on X, one per clock cycle, each shifts one register to the right.]

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Instantiate registers for c0, c1, c2
– Instantiate multipliers to compute c*x values

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter datapath with registers c0, c1, c2 next to xt0, xt1, xt2, and three multipliers computing c0*x(t), c1*x(t-1), c2*x(t-2).]

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Instantiate adders

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: the same datapath with two adders summing the three multiplier outputs to produce Y.]

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Add circuitry to allow loading of particular c register

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter datapath with a 2x4 decoder (enable input CL, address inputs Ca1 and Ca0) whose outputs select which of the registers c0, c1, c2 is loaded from input C; the adder output is stored in output register yreg, which drives Y.]

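The datapath's behavior can be mimicked with three variables acting as the xt register chain. The following is a minimal sketch rather than a cycle-accurate model: the function name `fir3` is an assumption, the xt registers are assumed to start at 0, and the one-cycle delay through yreg is ignored.

```python
def fir3(xs, c0, c1, c2):
    """3-tap FIR filter: y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)."""
    xt0 = xt1 = xt2 = 0          # assumed initial register contents
    ys = []
    for x in xs:
        # Each clock shifts the chain: X -> xt0 -> xt1 -> xt2.
        xt0, xt1, xt2 = x, xt0, xt1
        ys.append(c0 * xt0 + c1 * xt1 + c2 * xt2)
    return ys


# With all constants set to 1, each output is the sum of the last three inputs.
print(fir3([180, 180, 181, 240, 180, 181], 1, 1, 1))
# [180, 360, 541, 601, 601, 601]
```

Dividing those sums by 3 would give the simple averaging filter described earlier; other choices of c0, c1, c2 define other filters.
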
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 3 & 4: Connect to controller, Create FSM
– No controller needed
– Extreme data-dominated example
– (Example of an extreme control-dominated design – an FSM, with no datapath)
• Comparing the FIR circuit to a software implementation
– Circuit
• Assume adder has 2-gate delay, multiplier has 20-gate delay
• Longest path goes through one multiplier and two adders
– 20 + 2 + 2 = 24-gate delay
• 100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path
– Software
• 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
• (100*2 + 100*2)*10 = 4000 gate delays
– Circuit is more than 100 times faster (4000 / 34 is about 118). Wow.

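The delay estimates can be recomputed from the stated component delays. This sketch assumes that the adders are arranged as a balanced tree, so the longest path crosses ceil(log2(taps)) adders; that assumption reproduces the slide's figures of 2 adders for 3 taps and 7 adders for 100 taps, but the slides do not state it. Function names are illustrative.

```python
import math


def fir_circuit_delay(taps, mult_delay=20, add_delay=2):
    """Gate delays on the longest path: one multiplier plus a tree of adders."""
    adder_levels = math.ceil(math.log2(taps))   # assumed balanced adder tree
    return mult_delay + adder_levels * add_delay


def fir_software_delay(taps, instr_per_mult=2, instr_per_add=2, delay_per_instr=10):
    """Gate delays for software: every multiply and add runs sequentially."""
    return (taps * instr_per_mult + taps * instr_per_add) * delay_per_instr


print(fir_circuit_delay(3))      # 24
print(fir_circuit_delay(100))    # 34
print(fir_software_delay(100))   # 4000
print(fir_software_delay(100) / fir_circuit_delay(100))
```
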
5.4

Determining Clock Frequency


• Designers of digital circuits often want fastest performance
– Means want high clock frequency
• Frequency limited by longest register-to-register delay
– Known as critical path
– If clock is any faster, incorrect data may be stored into register
– Longest path in the figure is 2 ns
• Ignoring wire delays, and register setup and hold times, for simplicity

[Figure: registers a and b feed an adder with a 2 ns delay, whose output is stored in register c; all registers are clocked by clk.]

Critical Path
• Example shows four paths
– a to c through +: 2 ns
– a to d through + and *: 7 ns
– b to d through + and *: 7 ns
– b to d through *: 5 ns
• Longest path is thus 7 ns
– Max(2, 7, 7, 5) = 7 ns
• Fastest frequency
– 1 / 7 ns = 142 MHz

[Figure: registers a and b feed an adder (2 ns delay) and a multiplier (5 ns delay); the adder output goes to register c and to the multiplier, and the multiplier output goes to register d.]

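Finding the critical path amounts to enumerating register-to-register paths and taking the largest delay. The sketch below does this for a small graph; the connectivity (a and b into the adder, the adder output and b into the multiplier) is an assumption inferred from the listed paths, and the names `DELAY`, `FANOUT`, and `path_delays` are introduced here for illustration.

```python
DELAY = {"+": 2, "*": 5}                        # component delays in ns
FANOUT = {                                      # assumed connectivity of the example
    "a": ["+"],
    "b": ["+", "*"],
    "+": ["c", "*"],
    "*": ["d"],
}
SOURCES = ["a", "b"]                            # registers where paths start
SINKS = {"c", "d"}                              # registers where paths end


def path_delays(node, so_far=0):
    """Yield (sink, delay) for every path that starts at node."""
    for nxt in FANOUT[node]:
        if nxt in SINKS:
            yield nxt, so_far
        else:
            yield from path_delays(nxt, so_far + DELAY[nxt])


all_paths = [(src, sink, d) for src in SOURCES for sink, d in path_delays(src)]
critical = max(d for _, _, d in all_paths)
print(all_paths)
print(critical, "ns ->", 1000 / critical, "MHz")
```

Under this assumed connectivity the enumeration also reports a b-to-c path through the adder (2 ns), which does not change the 7 ns maximum. Ignoring wire delays and setup/hold times, 1 / 7 ns is roughly 142.9 MHz.
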
Critical Path Considering Wire Delays
• Real wires have delay too
– Must include in critical path
• Example shows two paths
– Each is 0.5 + 2 + 0.5 = 3 ns
• Trend
– 1980s/1990s: Wire delays were tiny compared to logic delays
– But wire delays not shrinking as fast as logic delays
• Wire delays may even be greater than logic delays!
• Must also consider register setup and hold times, also add to path
• Then add some time to the computed path, just to be safe
– e.g., if path is 3 ns, say 4 ns instead

[Figure: registers a and b each connect through a 0.5 ns wire to a 2 ns adder, whose output connects through a 0.5 ns wire to register c; each path totals 3 ns.]

A Circuit May Have Numerous Paths
• Paths can exist
– In the datapath
– In the controller
– Between the controller and datapath
– May be hundreds or thousands of paths
• Timing analysis tools that evaluate all possible paths automatically are very helpful

[Figure: the soda dispenser controller (combinational logic and state register, with s1, s0, n1, n0, c, d, clk) and datapath (tot register, 8-bit comparator, 8-bit adder), with example paths marked (a) in the datapath, (b) in the controller, and (c) between the controller and datapath via tot_lt_s, tot_ld, and tot_clr.]

5.5

Behavioral Level Design: C to Gates


C code

    int SAD (byte A[256], byte B[256]) // not quite C syntax
    {
        uint sum; short uint i;
        sum = 0;
        i = 0;
        while (i < 256) {
            sum = sum + abs(A[i] - B[i]);
            i = i + 1;
        }
        return sum;
    }

• Earlier sum-of-absolute-differences example
– Started with high-level state machine
– C code is an even better starting point -- easier to understand

[Figure: the SAD high-level state machine (S0 to S4) shown next to the C code.]

Behavioral-Level Design: Start with C (or Similar Language)
• Replace first step of RTL design method by two steps
– Capture in C, then convert C to high-level state machine
– How to convert from C to high-level state machine?

Step 1A: Capture in C
Step 1B: Convert to high-level state machine

Converting from C to High-Level State Machine
• Convert each C construct to equivalent states and transitions
• Assignment statement
– Becomes one state with assignment
– C: target = expression;
– State machine: a single state with action target=expression
• If-then statement
– Becomes state with condition check, transitioning to “then” statements if condition true, otherwise to ending state
• “then” statements would also be converted to states
– C: if (cond) { // then stmts }
– State machine: condition-check state goes to (then stmts) on cond and directly to (end) on !cond; (then stmts) goes to (end)

Converting from C to High-Level State Machine
• If-then-else
– Becomes state with condition check, transitioning to “then” statements if condition true, or to “else” statements if condition false
– C: if (cond) { // then stmts } else { // else stmts }
– State machine: condition-check state goes to (then stmts) on cond and to (else stmts) on !cond; both go to (end)
• While loop statement
– Becomes state with condition check, transitioning to while loop’s statements if true, then transitioning back to condition check
– C: while (cond) { // while stmts }
– State machine: condition-check state goes to (while stmts) on cond and to (end) on !cond; (while stmts) returns to the condition-check state

Simple Example of Converting from C to High-Level State Machine
Inputs: uint X, Y
Outputs: uint Max
(a) C code:
if (X > Y) {
   Max = X;
}
else {
   Max = Y;
}
(b) Condition-check state: X>Y leads to (then stmts), !(X>Y) leads to (else stmts); both lead to (end)
(c) Condition-check state: X>Y leads to state Max=X, !(X>Y) leads to state Max=Y; both lead to (end)
• Simple example: Computing the maximum of two numbers
– Convert if-then-else statement to states (b)
– Then convert assignment statements to states (c)
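The state machine in (c) can be mimicked in software with an explicit state variable, which makes the one-state-per-construct mapping visible. This is a simulation sketch only; the state names and the function `max_hlsm` are hypothetical.

```c
/* One enumerator per state of the converted if-then-else. */
enum max_state { MAX_CHECK, MAX_THEN, MAX_ELSE, MAX_END };

/* Steps through the states until the (end) state is reached. */
unsigned max_hlsm(unsigned x, unsigned y)
{
    enum max_state state = MAX_CHECK;
    unsigned max = 0;

    while (state != MAX_END) {
        switch (state) {
        case MAX_CHECK:                 /* condition-check state */
            state = (x > y) ? MAX_THEN : MAX_ELSE;
            break;
        case MAX_THEN:                  /* state for Max = X */
            max = x;
            state = MAX_END;
            break;
        case MAX_ELSE:                  /* state for Max = Y */
            max = y;
            state = MAX_END;
            break;
        case MAX_END:
            break;
        }
    }
    return max;
}
```

Each `case` corresponds to one state, and each assignment to `state` corresponds to one transition in the diagram.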
Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine
(a) C code:
Inputs: byte A[256], B[256]
bit go;
Output: int sad
main()
{
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}
• Convert each construct to states
– Simplify when possible, e.g., merge states
• From high-level state machine, follow RTL design method to create circuit
• Thus, can convert C to gates using straightforward automatable process
– Not all C constructs can be efficiently converted
– Use C subset if intended for circuit
– Can use languages other than C, of course
[Figure: steps (b) through (g) convert the code one construct at a time: the while (!go) wait loop with !go and go transitions, the states sum=0 and i=0, the while loop check with i<256 and !(i<256) transitions, the loop body states sum=sum+abs and i=i+1, and the final state sad=sum]
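As a rough check that the converted state machine computes the same result as the C code, the states S0 to S4 from the earlier sum-of-absolute-differences slide can be simulated with a state variable. This is a sketch under assumptions: `go` is taken to be already 1, and the extra `DONE` state exists only to stop the simulation (the real machine would return to waiting on `go`).

```c
#include <stdint.h>
#include <stdlib.h>

enum sad_state { S0, S1, S2, S3, S4, DONE };

/* Simulates the SAD high-level state machine for one computation. */
uint32_t sad_hlsm(const uint8_t a[256], const uint8_t b[256])
{
    enum sad_state state = S0;
    int go = 1;                 /* assumption: go is already asserted */
    uint32_t sum = 0;
    uint32_t sad_reg = 0;
    uint16_t i = 0;

    while (state != DONE) {
        switch (state) {
        case S0:                /* wait for go */
            state = go ? S1 : S0;
            break;
        case S1:                /* sum = 0, i = 0 */
            sum = 0;
            i = 0;
            state = S2;
            break;
        case S2:                /* loop condition check */
            state = (i < 256) ? S3 : S4;
            break;
        case S3:                /* loop body */
            sum = sum + (uint32_t)abs((int)a[i] - (int)b[i]);
            i = i + 1;
            state = S2;
            break;
        case S4:                /* sad_reg = sum */
            sad_reg = sum;
            state = DONE;       /* simulation-only stop */
            break;
        case DONE:
            break;
        }
    }
    return sad_reg;
}
```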
4.10
Register Files
• MxN register file component provides efficient access to M N-bit-wide registers
– If we have many registers but only need access one or two at a time, a register file is more efficient
– Ex: Above-mirror display (earlier example), but this time having 16 32-bit registers
• Too many wires, and big mux is too slow
[Figure: above-mirror display built from sixteen 32-bit registers (reg0 to reg15), each with its own load input driven from a decoder (i3-i0), all feeding a 16x1 mux (s3-s0); annotations: huge mux, too much fanout, congestion]
Register File
• Instead, want component that has one data input and one data output, and allows us to specify which internal register to write and which to read
[Figure: 16×32 register file block symbol with inputs W_data (32 bits), W_addr (4 bits), W_en, R_addr (4 bits), R_en, and output R_data (32 bits)]
Register File Timing Diagram
• Can write one register and read one register each clock cycle
– May be same register
[Figure: timing diagram for a 4x32 register file (W_data and R_data 32 bits, W_addr and R_addr 2 bits); the diagram also shows the clk, W_en, and R_en waveforms]
clk cycle: 1 2 3 4 5 6
W_data: 9 22 X X 177 555
W_addr: 3 1 X X 2 3
R_addr: X X 3 X 1 3
R_data: Z Z Z 9 Z 22 9 555
Register contents at successive points in time:
0: ? ? ? ? ? ? ?
1: ? ? 22 22 22 22 22
2: ? ? ? ? ? 177 177
3: ? 9 9 9 9 9 555
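The write and read behavior in the timing diagram can be approximated by a small software model. This is a sketch with assumed details: the names `regfile` and `regfile_cycle` are hypothetical, a read in the same cycle as a write to the same register returns the old value (the new value is stored at the end of the cycle), and a disabled read (Z on R_data) is modeled by returning false.

```c
#include <stdbool.h>
#include <stdint.h>

#define RF_WORDS 4

struct regfile {
    uint32_t reg[RF_WORDS];
};

/* Models one clock cycle. If r_en is set, *r_data receives the register
 * value held before this cycle's write and the function returns true;
 * otherwise *r_data is left alone and false is returned (models Z).
 * If w_en is set, w_data is stored at the end of the cycle. */
bool regfile_cycle(struct regfile *rf,
                   bool w_en, unsigned w_addr, uint32_t w_data,
                   bool r_en, unsigned r_addr, uint32_t *r_data)
{
    bool valid = false;
    if (r_en && r_addr < RF_WORDS) {
        *r_data = rf->reg[r_addr];
        valid = true;
    }
    if (w_en && w_addr < RF_WORDS) {
        rf->reg[w_addr] = w_data;
    }
    return valid;
}
```

Replaying the diagram's sequence (write 9 to register 3, write 22 to register 1, read register 3, and so on) gives the same register contents as the slide shows.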
5.6
Memory Components
• Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
– A few more components are often used outside the controller and datapath
• MxN memory
– M words, N bits wide each
• Several varieties of memory, which we now introduce
[Figure: M×N memory drawn as M words, each N bits wide]
Random Access Memory (RAM)
• RAM – Readable and writable memory
– “Random access memory”
• Strange name – Created several decades ago to contrast with sequentially-accessed storage like tape drives
– Logically same as register file – Memory with address inputs, data inputs/outputs, and control
• RAM usually just one port; register file usually two or more
– RAM vs. register file
• RAM typically larger than roughly 512 or 1024 words
• RAM typically stores bits using a bit storage approach that is more efficient than a flip flop
• RAM typically implemented on a chip in a square rather than rectangular shape – keeps longest wires (hence delay) short
[Figure: 16×32 register file from Chpt. 4 (W_data, W_addr, W_en, R_data, R_addr, R_en) next to a 1024 × 32 RAM block symbol (data 32 bits, addr 10 bits, rw, en)]
RAM Internal Structure
• Similar internal structure as register file
– Decoder enables appropriate word based on address inputs
– rw controls whether cell is written or read
– Let’s see what’s inside each RAM cell
[Figure: 1024x32 RAM block symbol and its internal structure. Let A = log2M. An AxM decoder (inputs addr0 to addr(A-1) and enable e, outputs d0 to d(M-1)) drives the word enable of each row of bit storage blocks (aka “cells”); wdata(N-1) to wdata0 enter at the top and rdata(N-1) to rdata0 leave at the bottom; rw goes to all cells. Each RAM cell has word enable, rw, and data connections]
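The relationship A = log2M and the decoder's role can be illustrated with two small helper functions. These are hypothetical names used only for this sketch, and M is assumed to be a power of two.

```c
#include <stdint.h>

/* Number of address bits A needed for m_words words (A = log2 M).
 * Assumes m_words is a power of two and at least 1. */
unsigned address_bits(uint32_t m_words)
{
    unsigned a = 0;
    while (((uint64_t)1 << a) < m_words) {
        a++;
    }
    return a;
}

/* Decoder output d[line]: 1 only when the enable input is 1 and the
 * address selects that line, so at most one word is enabled at a time. */
int word_enable(unsigned addr, int en, unsigned line)
{
    return (en && addr == line) ? 1 : 0;
}
```

For the 1024x32 RAM on the slide, `address_bits(1024)` is 10, matching the 10-bit addr input.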
Static RAM (SRAM)
• “Static” RAM cell
– 6 transistors (recall inverter is 2 transistors)
– Writing this cell
• word enable input comes from decoder
• When 0, value d loops around inverters
– That loop is where a bit stays stored
• When 1, the data bit value enters the loop
– data is the bit to be stored in this cell
– data’ enters on other side
– Example shows a “1” being written into cell
[Figure: RAM internal structure as on the previous slide, plus an SRAM cell with data and data’ lines, stored values d and d’, and word enable; shown with word enable = 0 (bit held in the inverter loop) and with word enable = 1 (data = 1 and data’ = 0 being written)]
Static RAM (SRAM)
• “Static” RAM cell
– Reading this cell
• Somewhat trickier
• When rw set to read, the RAM logic sets both data and data’ to 1
• The stored bit d will pull either the left line or the right line down slightly below 1
• “Sense amplifiers” detect which side is slightly pulled down
– The electrical description of SRAM is really beyond our scope – just general idea here, mainly to contrast with DRAM...
[Figure: SRAM cell during a read, with data and data’ both driven to 1, word enable = 1, one line pulled slightly below 1, and both lines going to sense amplifiers]
Dynamic RAM (DRAM)
• “Dynamic” RAM cell
– 1 transistor (rather than 6)
– Relies on large capacitor to store bit
• Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
• Read: Just look at value of d
• Problem: Capacitor discharges over time
– Must “refresh” regularly, by reading d and then writing it right back
[Figure: (a) DRAM cell with data line, word enable transistor, and capacitor storing d, which slowly discharges; (b) waveforms of data, enable, and d showing d discharging after a write]
Comparing Memory Types
• Register file
– Fastest
– But biggest size
• SRAM
– Fast
– More compact than register file
• DRAM
– Slowest
• And refreshing takes time
– But very compact
• Use register file for small items, SRAM for large items, and DRAM for huge items
– Note: DRAM’s big capacitor requires a special chip design process, so DRAM is often a separate chip
[Figure: size comparison for the same number of bits (not to scale) of an MxN memory implemented as a register file, SRAM, and DRAM]
Reading and Writing a RAM
• Writing
– Put address on addr lines, data on data lines, set rw=1, en=1
• Reading
– Set addr and en lines, but put nothing (Z) on data lines, set rw=0
– Data will appear on data lines
• Don’t forget to obey setup and hold times
– In short – keep inputs stable before and after a clock edge
[Figure: (a) timing diagram over clock cycles 1 to 3: addr = 9, 13, 9; data = 500, 999, then Z followed by 500; rw = 1 means write; RAM[9] now equals 500 after the first write and RAM[13] now equals 999 after the second. (b) timing detail showing setup time and hold time on addr, data, and rw around the clock edge, and access time before read data becomes valid]
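The write and read steps above can be sketched as a single-port RAM model in C. This is a behavioral sketch only: it ignores setup, hold, and access times, and the names `ram` and `ram_access` are hypothetical. The shared data lines are modeled by one pointer that carries data in on a write and out on a read.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_WORDS 1024

struct ram {
    uint32_t word[RAM_WORDS];
};

/* Performs one access on a single-port RAM.
 * en=0: nothing happens. rw=1: *data is written to word[addr].
 * rw=0: word[addr] is placed in *data.
 * Returns true only when *data holds valid read data. */
bool ram_access(struct ram *m, bool en, bool rw, unsigned addr, uint32_t *data)
{
    if (!en || addr >= RAM_WORDS) {
        return false;
    }
    if (rw) {
        m->word[addr] = *data;   /* write */
        return false;
    }
    *data = m->word[addr];       /* read */
    return true;
}
```

Following the timing diagram, writing 500 to address 9 and 999 to address 13, then reading address 9, yields 500.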
RAM Example: Digital Sound Recorder
• Behavior
– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
• We’ll use a 4096x16 RAM (12-bit wide RAM not common)
– Play back later
– Common behavior in telephone answering machine, toys, voice recorders
• To record, processor should read a-to-d, store read values into successive RAM words
– To play, processor should read successive RAM words and enable d-to-a
[Figure: microphone feeding an analog-to-digital converter (ad_ld, ad_buf), a processor driving a 4096x16 RAM (Ra 12 bits, Rrw, Ren; 16-bit data), and a digital-to-analog converter (da_ld) feeding a speaker]
RAM Example: Digital Sound Recorder
• RTL design of processor
– Create high-level state machine
– Begin with the record behavior
– Keep local register a
• Stores current address, ranges from 0 to 4095 (thus need 12 bits)
– Create state machine that counts from 0 to 4095 using a
• For each a
– Read analog-to-digital conv.
» ad_ld=1, ad_buf=1
– Write to RAM at address a
» Ra=a, Rrw=1, Ren=1
Record behavior (high-level state machine, from figure):
Local register: a (12 bits)
S: a=0
T: ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1
U: a=a+1
Transition conditions shown: a<4095 (keep looping), a=4095 (done)
RAM Example: Digital Sound Recorder
– Now create play behavior
– Use local register a again, create state machine that counts from 0 to 4095 again
• For each a
– Read RAM
– Write to digital-to-analog conv.
• Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
– The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
Play behavior (high-level state machine, from figure):
Local register: a (12 bits)
V: a=0
W: ad_buf=0, Ra=a, Rrw=0, Ren=1
X: da_ld=1, a=a+1
Transition conditions shown: a<4095 (keep looping), a=4095 (done)
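The record and play behaviors can be approximated in software as two loops over the address register a. This is a sketch under assumptions: the converters are replaced by plain arrays (`adc` for digitized input samples, `dac` for output samples), cycle-level timing is not modeled, and the function names are hypothetical.

```c
#include <stdint.h>

#define SAMPLES 4096

/* Record: store 4096 12-bit samples into successive RAM words. */
void record(uint16_t ram[SAMPLES], const uint16_t adc[SAMPLES])
{
    uint16_t a = 0;                    /* a = 0 */
    for (;;) {
        ram[a] = adc[a] & 0x0FFF;      /* read a-to-d, write RAM at address a */
        if (a == SAMPLES - 1) {
            break;                     /* a = 4095: done */
        }
        a = a + 1;                     /* a = a + 1 */
    }
}

/* Play: read successive RAM words and pass each to the d-to-a converter. */
void play(const uint16_t ram[SAMPLES], uint16_t dac[SAMPLES])
{
    uint16_t a = 0;                    /* a = 0 */
    for (;;) {
        uint16_t bus = ram[a];         /* read RAM onto the data bus */
        dac[a] = bus;                  /* load d-to-a once the data is on the bus */
        if (a == SAMPLES - 1) {
            break;
        }
        a = a + 1;
    }
}
```

Masking with `0x0FFF` reflects storing 12-bit samples in 16-bit RAM words.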
Read-Only Memory – ROM
• Memory that can only be read from, not written to
– Data lines are output only
– No need for rw input
• Advantages over RAM
– Compact: May be smaller
– Nonvolatile: Saves bits even if power supply is turned off
– Speed: May be faster (especially than DRAM)
– Low power: Doesn’t need power supply to save bits, so can extend battery life
• Choose ROM over RAM if stored data won’t change (or won’t change often)
– For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
[Figure: 1024 × 32 RAM block symbol (data, addr, rw, en) compared with 1024x32 ROM block symbol (data, addr, en)]
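In software terms, a ROM holding a conversion table behaves like a constant array indexed by address. The sketch below assumes a hypothetical 11-word table covering 0 to 100 degrees Celsius in steps of 10; the table size and the name `rom_read` are illustrative choices.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROM_WORDS 11

/* Fixed contents: Fahrenheit value for 0, 10, ..., 100 degrees Celsius. */
static const int16_t c_to_f_rom[ROM_WORDS] = {
    32, 50, 68, 86, 104, 122, 140, 158, 176, 194, 212
};

/* Reads one word. There is no rw input because the contents never change.
 * Returns true and sets *data when en is 1 and addr is in range. */
bool rom_read(unsigned addr, bool en, int16_t *data)
{
    if (!en || addr >= ROM_WORDS) {
        return false;
    }
    *data = c_to_f_rom[addr];
    return true;
}
```

Address 3, for instance, corresponds to 30 degrees Celsius and reads back 86.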
Read-Only Memory – ROM
• Internal logical structure similar to RAM, without the data input lines
[Figure: 1024x32 ROM block symbol and internal structure. Let A = log2M. An AxM decoder (addr0 to addr(A-1), enable e, outputs d0 to d(M-1)) drives the word enable of each row of bit storage blocks (aka “cells”); outputs are rdata(N-1) to rdata0. Each ROM cell has word enable and data connections]
ROM Types
• If a ROM can only be read, how are the stored bits stored in the first place?
– Storing bits in a ROM known as programming
– Several methods
• Mask-programmed ROM
– Bits are hardwired as 0s or 1s during chip manufacturing
• 2-bit word on right stores “10”
• word enable (from decoder) simply passes the hardwired value through transistor
– Notice how compact, and fast, this memory would be
[Figure: ROM internal structure as before, plus two mask-programmed cells sharing a word enable line: one hardwired to 1 and one hardwired to 0, each passing its value to its data line]
ROM Types
• Fuse-Based Programmable ROM
– Each cell has a fuse
– A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
• Those cells will be read as 0s (involving some special electronics)
• Cells with unblown fuses will be read as 1s
• 2-bit word on right stores “10”
– Also known as One-Time Programmable (OTP) ROM
[Figure: ROM internal structure as before, plus two fuse-based cells sharing a word enable line, each wired to 1: one with an intact fuse and one with a blown fuse]
ROM Types
• Erasable Programmable ROM (EPROM)
– Uses “floating-gate transistor” in each cell
– Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
• Electrons become trapped in the gate
• Only done for cells that should store 0
• Other cells (without electrons trapped in gate) will be 1
– 2-bit word on right stores “10”
• Details beyond our scope – just general idea is necessary here
– To erase, shine ultraviolet light onto chip
• Gives trapped electrons energy to escape
• Requires chip package to have window
[Figure: ROM internal structure as before, plus two EPROM cells with floating-gate transistors sharing a word enable line: one reads 1 and the other, with trapped electrons in its gate, reads 0]
ROM Types
• Electrically Erasable Programmable ROM (EEPROM)
– Similar to EPROM
• Uses floating-gate transistor, electronic programming to trap electrons in certain cells
– But erasing done electronically, not using UV light
– Erasing done one word at a time
• Flash memory
– Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
– Became common relatively recently (late 1990s)
• Both types are in-system programmable
– Can be programmed with new stored bits while in the system in which the ROM operates
• Requires bi-directional data lines, and write control input
• Also need busy output to indicate that erasing is in progress – erasing takes some time
[Figure: 1024x32 EEPROM block symbol with data (32 bits), addr (10 bits), en, write, and busy]
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• Want to record the outgoing announcement
– When rec=1, record digitized sound in locations 0 to 4095
– When play=1, play those stored sounds to digital-to-analog converter
• What type of memory?
– Should store without power supply – ROM, not RAM
– Should be in-system programmable – EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
– Will always erase entire memory when reprogramming – Flash better than EEPROM
[Figure: same system as the digital sound recorder, with the RAM replaced by a 4096x16 Flash storing “We’re not home.”; the processor has rec and play inputs (record and play buttons) and memory signals Ra, Rrw, Ren, er, and bu, where bu connects to the flash busy output]
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• High-level state machine
– Once rec=1, begin erasing flash by setting er=1
– Wait for flash to finish erasing by waiting for bu=0
– Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
[Figure: same system as the previous slide, plus the high-level state machine with local register a (13 bits) and states S, T, U, V; actions shown: a=0, er=1, er=0, ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1, a=a+1; transition conditions shown: rec, bu, bu’, a<4096, a=4096]
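The erase, wait, then write sequence can be sketched as a state machine simulation in C. Several details here are assumptions made for illustration: rec is taken to be already 1, the flash stays busy for a fixed hypothetical number of cycles (`ERASE_CYCLES`), erased words read as all 1s, and the converter is replaced by an array of samples.

```c
#include <stdint.h>

#define FLASH_WORDS 4096
#define ERASE_CYCLES 3   /* hypothetical: cycles the flash stays busy */

struct flash {
    uint16_t word[FLASH_WORDS];
    int busy_left;       /* models the busy output: busy while > 0 */
};

/* Models er=1: erase every word and raise busy for a while. */
static void flash_erase(struct flash *f)
{
    for (int i = 0; i < FLASH_WORDS; i++) {
        f->word[i] = 0xFFFF;   /* assumption: erased words read as all 1s */
    }
    f->busy_left = ERASE_CYCLES;
}

enum rec_state { ST_S, ST_T, ST_U, ST_V, ST_DONE };

/* Simulates recording the announcement once rec=1. */
void record_announcement(struct flash *f, const uint16_t adc[FLASH_WORDS])
{
    enum rec_state st = ST_S;
    uint16_t a = 0;
    int erasing = 0;

    while (st != ST_DONE) {
        switch (st) {
        case ST_S:                      /* a = 0, rec assumed to be 1 */
            a = 0;
            st = ST_T;
            break;
        case ST_T:                      /* start erase, then wait for bu = 0 */
            if (!erasing) {
                flash_erase(f);
                erasing = 1;
            }
            if (f->busy_left > 0) {
                f->busy_left--;         /* still busy: stay in T */
            } else {
                st = ST_U;
            }
            break;
        case ST_U:                      /* read a-to-d, write flash at address a */
            f->word[a] = adc[a] & 0x0FFF;
            st = ST_V;
            break;
        case ST_V:                      /* a = a + 1, loop until a reaches 4096 */
            a = a + 1;
            st = (a < FLASH_WORDS) ? ST_U : ST_DONE;
            break;
        case ST_DONE:
            break;
        }
    }
}
```

Because a must reach 4096 to end the loop, it needs 13 bits, which matches the local register width on the slide.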
Blurring of Distinction Between ROM and RAM
• We said that
– RAM is readable and writable
– ROM is read-only
[Figure: spectrum running from ROM through EEPROM, Flash, and NVRAM to RAM]
• But some ROMs act almost like RAMs
– EEPROM and Flash are in-system programmable
• Essentially means that writes are slow
– Also, number of writes may be limited (perhaps a few million times)
• And, some RAMs act almost like ROMs
– Non-volatile RAMs: Can save their data without the power supply
• One type: Built-in battery, may work for up to 10 years
• Another type: Includes ROM backup for RAM – controller writes RAM contents to
ROM before turning off
• New memory technologies evolving that merge RAM and ROM benefits
– e.g., MRAM
• Bottom line
– Lot of choices available to designer, must find best fit with design goals
Hierarchy and Abstraction

• Abstraction
– Hierarchy often involves not just grouping
items into a new item, but also associating
higher-level behavior with the new item,
known as abstraction
• e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers
– Frees designer from having to remember, or even from having to understand, the lower-level details
[Figure: 8-bit adder block symbol with inputs a7..a0, b7..b0, ci and outputs co, s7..s0]
Hierarchy and Composing Larger Components from Smaller Versions
• A common task is to compose smaller components into a larger one
– Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
• Can simply compose the 9-input gate from several 3-input gates
– Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
• s2 selects either top or bottom 4x1
• s1s0 select particular 4x1 input
• Implements 8x1 mux – 8 data inputs, 3 selects, one output
[Figure: two 4x1 muxes (inputs i0 to i3 and i4 to i7, both selected by s1 s0) feeding a 2x1 mux selected by s2, which drives output d]
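The mux composition can be expressed directly in C, with each function standing in for one component. The function names are hypothetical, and select inputs are assumed to be 0 or 1.

```c
/* 2x1 mux: s0 chooses between i0 and i1. */
int mux2(int i0, int i1, int s0)
{
    return s0 ? i1 : i0;
}

/* 4x1 mux: s1 s0 choose one of four inputs. */
int mux4(const int i[4], int s1, int s0)
{
    return i[(s1 << 1) | s0];
}

/* 8x1 mux built from two 4x1 muxes and one 2x1 mux. */
int mux8(const int i[8], int s2, int s1, int s0)
{
    int top = mux4(&i[0], s1, s0);     /* selects among i0..i3 */
    int bottom = mux4(&i[4], s1, s0);  /* selects among i4..i7 */
    return mux2(top, bottom, s2);      /* s2 picks top or bottom 4x1 */
}
```

Checking all eight select values confirms that the composed component outputs input number s2s1s0.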
Hierarchy and Composing Larger Components from Smaller Versions
• Composing memory very common
• Making memory words wider
– Easy – just place memories side-by-side until desired width obtained
– Share address/control lines, concatenate data lines
– Example: Compose 1024x8 ROMs into 1024x32 ROM
[Figure: four 1024x8 ROMs sharing the 10-bit addr and en inputs, their four 8-bit data outputs concatenated into data(31..0), equivalent to one 1024x32 ROM]
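Widening can be sketched in C as reading the same address from four byte-wide memories and concatenating the results. The struct and function names are hypothetical, and the choice of which ROM supplies the most significant byte is an assumption made here.

```c
#include <stdint.h>

#define ROM_WORDS 1024

struct rom8 {
    uint8_t word[ROM_WORDS];   /* one 1024x8 ROM */
};

/* Reads one 32-bit word from four 1024x8 ROMs placed side by side.
 * All four share the address; their data outputs are concatenated.
 * Assumption: bank[3] drives data(31..24) and bank[0] drives data(7..0). */
uint32_t rom32_read(const struct rom8 bank[4], unsigned addr)
{
    addr &= ROM_WORDS - 1;     /* keep the 10 address bits */
    return ((uint32_t)bank[3].word[addr] << 24) |
           ((uint32_t)bank[2].word[addr] << 16) |
           ((uint32_t)bank[1].word[addr] << 8) |
            (uint32_t)bank[0].word[addr];
}
```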
Hierarchy and Composing Larger Components from Smaller Versions
• Creating memory with more words
– Put memories on top of one another until the number of desired words is achieved
– Use decoder to select among the memories
• Can use highest order address input(s) as decoder input
• Although actually, any address line could be used
– Example: Compose 1024x8 memories into 2048x8 memory
Addresses (a10 just chooses which memory to access):
a10 a9 a8 ... a0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0
...
0 1 1 1 1 1 1 1 1 1 0
0 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 1 0
...
1 1 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 1
• To create memory with more words and wider words, can first compose to enough words, then widen.
[Figure: 11-bit address split into a9..a0, shared by two 1024x8 ROMs, and a10, which feeds a 1x2 decoder whose outputs d0 and d1 drive each ROM's en input; the two 8-bit data outputs join to form the data output of the 2048x8 ROM]
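Adding words can be sketched in C by letting the top address bit pick the memory and the remaining bits pick the word inside it. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define ROM_WORDS 1024

struct rom8 {
    uint8_t word[ROM_WORDS];   /* one 1024x8 ROM */
};

/* Reads one byte from a 2048x8 memory built from two 1024x8 ROMs.
 * a10 acts as the 1x2 decoder input that enables one ROM;
 * a9..a0 go to both ROMs. */
uint8_t rom2048_read(const struct rom8 bank[2], unsigned addr)
{
    unsigned a10 = (addr >> 10) & 1u;   /* which ROM is enabled */
    unsigned low = addr & 0x3FFu;       /* a9..a0 */
    return bank[a10].word[low];
}
```

Addresses 0 to 1023 land in the first ROM and addresses 1024 to 2047 in the second, as the address table shows.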
Chapter Summary
– Modern digital design involves creating processor-level components
– Four-step RTL method can be used
• 1. High-level state machine 2. Create datapath 3. Connect datapath
to controller 4. Derive controller FSM
– Several examples
• Control dominated, data dominated, and mix
– Determining fastest clock frequency
• By finding critical path
– Behavioral-level design – C to gates
• By using method to convert C (subset) to high-level state machine
– Additional RTL components
• Memory: RAM, ROM
• Queues
– Hierarchy: A key concept used throughout Chapters 2-5