Partial Solutions EEC 118

Digital Logic Design and Lab (Seoul National University)


Selected Solutions
Digital Design: A Systems Approach

William James Dally and R. Curtis Harting

September 10, 2012


Chapter 1

Solutions: The Digital Abstraction

1–1 (0.3,2,2), the worst case noise margin is larger

1–2 1. 0.5V
2. -0.5V
3. 0.7V
4. 0.4V

1–3 $V_N \le 0.4$ V

1–4 GNDA can be 0.2V lower than GNDB or 0.3V higher

1–8

\[ \max \frac{dV_{out}}{dV_{in}} \ge \frac{V_{OH} - V_{OL}}{V_{IH} - V_{IL}} \]

\[ \max \frac{dV_{out}}{dV_{in}} \ge \frac{2.1 - 0.2}{1.7 - 0.7} = 1.9 \]
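
The bound in 1–8 is a single ratio of voltage swings. A minimal Python sketch of the arithmetic, using the four levels quoted in the solution (the function name is illustrative, not from the text):

```python
# Signal levels quoted in solution 1-8, in volts.
V_OH, V_OL = 2.1, 0.2   # output high / output low
V_IH, V_IL = 1.7, 0.7   # input high / input low

def min_peak_gain(v_oh, v_ol, v_ih, v_il):
    """Lower bound on the peak slope of the transfer curve:
    the output swing divided by the width of the input transition region."""
    return (v_oh - v_ol) / (v_ih - v_il)

gain = min_peak_gain(V_OH, V_OL, V_IH, V_IL)
print(f"max gain >= {gain:.1f}")
```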

1–9
000 68
001 70
011 72
010 74
110 76
111 78
101 80
100 82
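
The codes in 1–9 follow a 3-bit reflected Gray code, so adjacent values differ in exactly one bit. A short Python sketch that regenerates the table, assuming the second column starts at 68 and rises in steps of 2 as listed:

```python
def gray(n):
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)

# (code, value) pairs; the 68 offset and step of 2 are read off the table above.
table = [(format(gray(i), "03b"), 68 + 2 * i) for i in range(8)]

for code, value in table:
    print(code, value)
```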


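1–10 We assume a truncation scheme, encoding $2\lfloor T/2 \rfloor$. For binary weightings:

\[ \mathrm{TempA}[i] = \left\lfloor \frac{T - 68}{2^{i+1}} \right\rfloor \bmod 2 \tag{1.1} \]

For thermometer encodings:

\[ \mathrm{TempB}[i] = \begin{cases} 0 & : T - 70 - 2i < 0 \\ 1 & : T - 70 - 2i \ge 0 \end{cases} \tag{1.2} \]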
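
Equations (1.1) and (1.2) can be evaluated directly. The sketch below assumes a 3-bit binary code and a 7-bit thermometer code over the range 68 to 82; those widths are inferred from the range, not stated in the solution.

```python
def temp_a(t, i):
    """Bit i of the binary-weighted encoding, Equation (1.1)."""
    return ((t - 68) // 2 ** (i + 1)) % 2

def temp_b(t, i):
    """Bit i of the thermometer encoding, Equation (1.2)."""
    return 1 if t - 70 - 2 * i >= 0 else 0

def encode(t):
    # Assumed widths: 3 binary bits, 7 thermometer bits (MSB first in the strings).
    binary = "".join(str(temp_a(t, i)) for i in reversed(range(3)))
    thermo = "".join(str(temp_b(t, i)) for i in reversed(range(7)))
    return binary, thermo

for t in (68, 69, 74, 82):
    print(t, *encode(t))
```

Odd temperatures truncate to the even value below them, so 69 encodes the same as 68.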

1–11 For both (a) and (b) we can use a 6-bit scheme where the top 2 bits
represent the suit. The bottom 4 bits represent the rank in suit. In both
representations, the lower four bits are used to do any comparison of rank.
The upper two are used to do a comparison of suits.
1–15 The rules are to go west if the current address is greater than the
destination and east if the current address is less than the destination. If the
two addresses are equal, the current processor is the destination.
1–16 1.
0000 0001 0010 0011
0100 0101 0110 0111
1000 1001 1010 1011
1100 1101 1110 1111
2. Using the rules of the previous problem, the lower two bits determine
east/west routing (east is higher) and the upper two bits represent
north/south (south is higher).
3. By splitting the north/south and east/west addresses, it becomes
much simpler to determine direction. If nodes had been assigned
meaningless IDs, a routing table (or more complex logic) would be required
to determine direction.
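
The routing rule of 1–15 and 1–16 reduces to two field comparisons. A minimal Python sketch for the 4-bit addresses above; resolving east/west before north/south is an assumption of this example, since the solution does not fix an order.

```python
def route(current, dest):
    """Next hop for a 4-bit node address: upper 2 bits = row (south is higher),
    lower 2 bits = column (east is higher)."""
    cur_row, cur_col = current >> 2, current & 0b11
    dst_row, dst_col = dest >> 2, dest & 0b11
    # Assumed order: settle the column first, then the row.
    if cur_col > dst_col:
        return "west"
    if cur_col < dst_col:
        return "east"
    if cur_row > dst_row:
        return "north"
    if cur_row < dst_row:
        return "south"
    return "arrived"

print(route(0b0000, 0b0111))  # column differs first
```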
1–17 One example is:
0000 0
0001 1
0011 2
0111 3
0101 4
0100 5


Chapter 2

Solutions: The Practice of Digital System Design

2–1 Student solutions should include information about how the system con-
nects to the TV (HDMI), resolution (1080p), the processor, DRAM, and
video. The games could be burned into the system, on DVD, or down-
loadable (wired or wireless Internet?). The controllers could be wired or
wireless, etc.
2–2 Version 2 of the console should probably be better than the first version, so
upgrades to core components such as DRAM, graphics card, and processor
are probably necessary. The controllers and video output can potentially
remain the same. Another option is to take advantage of advances in
fabrication technology to make a version of the console with the same
performance. Presumably, this would make the device cheaper and draw
less power.
2–4 In our example, we would buy the network interface and HDMI output
since they are commodity parts that our team could only mess up. We
would build the motherboard, for example, to tie together our components
in a custom fashion.
2–9 To average the four inputs on every cycle we need 3 sets of 32 flip-flops
(28.8 kgrids) and 3 adders (90 kgrids). We don't need any multipliers
because dividing by four is a right shift of the sum by 2 bits. The
total is 118.8 kgrids (or 418.8 kgrids with a multiplier). To do the weighted
average, we need 4 multipliers (1.2 Mgrids) and storage for 128 bits of
data (256 grids in ROM and 3072 in SRAM). The sum is approximately
1.3 Mgrids regardless of the storage mechanism.
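
The tallies in 2–9 can be rechecked with a few lines of Python. The per-unit costs below are back-calculated from the totals quoted in the solution (28.8 kgrids for 96 flip-flops, 90 kgrids for 3 adders, 1.2 Mgrids for 4 multipliers) rather than taken from the book's cost table, and the sketch assumes the weighted average reuses the same registers and adders.

```python
# Per-unit areas in grids, derived from the totals quoted in solution 2-9.
FF_GRIDS = 28_800 // (3 * 32)       # one flip-flop
ADDER_GRIDS = 90_000 // 3           # one 32-bit adder
MULT_GRIDS = 1_200_000 // 4         # one multiplier

simple_average = 3 * 32 * FF_GRIDS + 3 * ADDER_GRIDS

# Assumption: the weighted version keeps the registers and adders and adds
# four multipliers plus 128 bits of coefficient storage.
weighted_rom = simple_average + 4 * MULT_GRIDS + 256
weighted_sram = simple_average + 4 * MULT_GRIDS + 3072

print(simple_average, weighted_rom, weighted_sram)
```

Both weighted totals round to the same 1.3 Mgrids because the multipliers dominate the storage cost.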
2–10 See answers below:
1. We need storage for $2.4 \times 10^7$ bits of SRAM. This is $5.76 \times 10^8$ grids,
or 4.8 mm$^2$.


2. It would take $500^3 \times 5$ ns $= 0.625$ s to complete the full operation.

3. The remaining area is 5.2 mm$^2$, enough area for 1890 functional units.
The matrix multiplication takes 330 µs.
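
The figures in 2–10 follow from a few products. The sketch below assumes three 500 × 500 matrices of 32-bit words and 24 grids per SRAM bit; both are inferred because they reproduce the quoted totals, and neither is stated in the solution itself.

```python
N = 500                 # matrix dimension
WORD_BITS = 32          # assumed word width
GRIDS_PER_BIT = 24      # assumed SRAM cost, back-calculated from 5.76e8 / 2.4e7
T_OP = 5e-9             # seconds per multiply-add, from part 2
UNITS = 1890            # functional units, from part 3

storage_bits = 3 * N * N * WORD_BITS
storage_grids = storage_bits * GRIDS_PER_BIT
serial_time = N ** 3 * T_OP
parallel_time = serial_time / UNITS

print(storage_bits, storage_grids, serial_time, parallel_time)
```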

2–17 In 2015 there would be over 15B transistors and in 2020, 115B.


Chapter 3

Solutions: Boolean Algebra

3–1 We prove absorption by enumerating all possible input combinations:

x   y   $x \wedge (x \vee y)$   x   $x \vee (x \wedge y)$   x
0   0   0                       0   0                       0
0   1   0                       0   0                       0
1   0   1                       1   1                       1
1   1   1                       1   1                       1
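
The same enumeration can be done mechanically. A minimal Python sketch that checks both absorption identities over every input combination:

```python
from itertools import product

def absorption_holds():
    """True if x AND (x OR y) == x and x OR (x AND y) == x for all inputs."""
    for x, y in product((0, 1), repeat=2):
        if (x & (x | y)) != x or (x | (x & y)) != x:
            return False
    return True

print(absorption_holds())
```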

3–2
x   $x \wedge x$   $x \vee x$
0   0              0
1   1              1

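3–6 Omitting several uninteresting cases, the truth table is:

w    x    y    z    $\overline{w \wedge x \wedge y \wedge z}$   $\bar{w} \vee \bar{x} \vee \bar{y} \vee \bar{z}$   $\overline{w \vee x \vee y \vee z}$   $\bar{w} \wedge \bar{x} \wedge \bar{y} \wedge \bar{z}$
0    0    0    0    1    1    1    1
0    0    0    1    1    1    0    0
...  ...  ...  ...  1    1    0    0
1    1    1    0    1    1    0    0
1    1    1    1    0    0    0    0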

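3–9 We simplify this equation by using the commutative and associative
properties followed by the combining property (twice):

\[
\begin{aligned}
& (x \wedge y \wedge z) \vee (\bar{x} \wedge y) \vee (x \wedge y \wedge \bar{z}) \\
&= ((x \wedge y \wedge z) \vee (x \wedge y \wedge \bar{z})) \vee (\bar{x} \wedge y) \\
&= (x \wedge y) \vee (\bar{x} \wedge y) \\
&= y
\end{aligned}
\]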


3–10

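\[
\begin{aligned}
& ((y \wedge \bar{z}) \vee (\bar{x} \wedge w)) \wedge ((x \wedge \bar{y}) \vee (z \wedge \bar{w})) \\
&= ((y \wedge \bar{z}) \wedge ((x \wedge \bar{y}) \vee (z \wedge \bar{w}))) \vee ((\bar{x} \wedge w) \wedge ((x \wedge \bar{y}) \vee (z \wedge \bar{w}))) && \text{Distributive} \\
&= ((y \wedge \bar{z} \wedge x \wedge \bar{y}) \vee (y \wedge \bar{z} \wedge z \wedge \bar{w})) \vee ((\bar{x} \wedge w \wedge x \wedge \bar{y}) \vee (\bar{x} \wedge w \wedge z \wedge \bar{w})) && \text{Distributive} \\
&= (0 \vee 0) \vee (0 \vee 0) && \text{Complementation} \\
&= 0
\end{aligned}
\]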

3–13 The dual is found by simply replacing $\vee$ with $\wedge$ and $\wedge$ with $\vee$.

\[ f(x, y) = (x \wedge \bar{y}) \vee (\bar{x} \wedge y) \]

\[ f^D(x, y) = (x \vee \bar{y}) \wedge (\bar{x} \vee y) \]

This equation is true when x and y are the same, thus in normal form:

\[ f^D(x, y) = (x \wedge y) \vee (\bar{x} \wedge \bar{y}) \]
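
One way to cross-check a dual is the identity $f^D(x, y) = \overline{f(\bar{x}, \bar{y})}$, which gives the same result as swapping the operators. A small Python sketch under that identity (the helper names are illustrative):

```python
def f(x, y):
    """f(x, y) = (x AND NOT y) OR (NOT x AND y), i.e. exclusive-or."""
    return (x & (1 - y)) | ((1 - x) & y)

def dual(fn):
    """Dual of a two-input Boolean function: complement inputs and output."""
    return lambda x, y: 1 - fn(1 - x, 1 - y)

f_d = dual(f)
table = {(x, y): f_d(x, y) for x in (0, 1) for y in (0, 1)}
print(table)
```

The resulting table is 1 exactly when x and y are equal, matching the normal form above.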

3–17 The normal form of f(x, y, z) = x includes all minterms with x = 1:

\[ f(x, y, z) = (x \wedge \bar{y} \wedge \bar{z}) \vee (x \wedge \bar{y} \wedge z) \vee (x \wedge y \wedge \bar{z}) \vee (x \wedge y \wedge z) \]

3–20 We first directly write the equation from the schematic and then simplify
using DeMorgan's law:

\[ f(x, y, z) = (x \vee y) \wedge (x \vee y) \]
\[ = (x \wedge y) \vee (x \wedge y) \]

3–23 We directly draw the schematic from the equations, using inversion bub-
bles instead of full inverters:


Chapter 4

Solutions: CMOS Logic Circuits

4–1 We enumerate all paths through the switch and write the logic in
sum-of-products form:

\[ f(e, d, c, b, a) = (a \wedge c \wedge e) \vee (a \wedge d) \vee (b \wedge c \wedge d) \vee (b \wedge e) \]
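
Path enumeration of this kind is easy to automate. The Python sketch below assumes a bridge topology (a and b leave the source, d and e reach the sink, c bridges the two middle nodes); that wiring is a guess chosen because it reproduces the four product terms above, since the exercise figure is not available here.

```python
# Hypothetical switch network: switch name -> (node, node).
EDGES = {
    "a": ("S", "M1"),
    "b": ("S", "M2"),
    "c": ("M1", "M2"),
    "d": ("M1", "T"),
    "e": ("M2", "T"),
}

def simple_paths(node, target, used, visited):
    """Yield the set of switches on each loop-free path from node to target."""
    if node == target:
        yield used
        return
    for name, (u, v) in EDGES.items():
        if name in used or node not in (u, v):
            continue
        nxt = v if node == u else u
        if nxt not in visited:
            yield from simple_paths(nxt, target, used | {name}, visited | {nxt})

paths = sorted("".join(sorted(p)) for p in simple_paths("S", "T", frozenset(), {"S"}))
print(paths)
```

Each path becomes one AND term, and the function is the OR of all of them.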

4–3 We find the solution by synthesizing the expression step-by-step, starting
with $(y \wedge z) \vee \bar{x}$. The final solution is:

[Figure: schematic of the final circuit with inputs x, y, and z]

4–5 The solution is:

[Figure: schematic with inputs w, x, y, and z]

4–11 By substituting a closed switch for PFETs with a 0 input and NFETs
with a 1 input (and open for PFET/1 and NFET/0), we can see that the
circuit outputs a 1 for the given input:

[Figure: switch-level view of the circuit with inputs a, b, c and output f]

Copyright (c) 2012 by W.J. Dally and R.C. Harting, all rights reserved

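4–15 We make our calculations using the following equations:

\[ R_{s,N} = \frac{L}{W} K_{RN} \]
\[ W_P = W_N K_p \frac{1}{0.5} \]
\[ C_g = W L K_c \]

\[
\begin{aligned}
R_{s,N,130} &= 1\,\mathrm{k\Omega} \\
R_{s,N,28} &= 2.1\,\mathrm{k\Omega} \\
C_{g,N,130} &= 4\,\mathrm{fF} \\
C_{g,N,28} &= 0.56\,\mathrm{fF} \\
W_{N,130} &= 100 L_{min} \\
W_{N,28} &= 52 L_{min} \\
C_{g,P,130} &= 20\,\mathrm{fF} \\
C_{g,P,28} &= 1.5\,\mathrm{fF}
\end{aligned}
\]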

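4–17 1.

\[ N = \frac{1\,\mathrm{mm}^2}{1 \times 10^6\, L_{min}^2\,\mathrm{nm}^2} \times \frac{1 \times 10^{12}\,\mathrm{nm}^2}{1\,\mathrm{mm}^2} \]
\[ N_{130} = 59 \]
\[ N_{28} = 1276 \]

2.

\[ T = 400(1 + K_P)\tau_n \]
\[ T_{130} = 5.46\,\mathrm{ns} \]
\[ T_{28} = 1.10\,\mathrm{ns} \]

3.

\[ \Theta = \frac{N}{T} \]
\[ \Theta_{130} = 10.9\,\frac{\mathrm{Gops}}{\mathrm{s}} \]
\[ \Theta_{28} = 1.16\,\frac{\mathrm{Tops}}{\mathrm{s}} \]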
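
Parts 1 and 3 of 4–17 are straightforward to recompute. This Python sketch takes the cycle times from part 2 as given and rounds the unit count to the nearest integer, which is an assumption about how the solution arrived at 59 and 1276.

```python
def units_per_mm2(lmin_nm):
    """Number of 1e6 * Lmin^2 units that fit in 1 mm^2 (1e12 nm^2)."""
    return 1e12 / (1e6 * lmin_nm ** 2)

# Cycle times in seconds, copied from part 2 of the solution.
T = {130: 5.46e-9, 28: 1.10e-9}

n = {node: round(units_per_mm2(node)) for node in T}
theta = {node: n[node] / T[node] for node in T}   # operations per second

for node in T:
    print(f"{node} nm: N = {n[node]}, throughput = {theta[node]:.3g} ops/s")
```

The results land close to the quoted 10.9 Gops/s and 1.16 Tops/s; small differences come from rounding in the intermediate values.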


4–18 [Figure: schematic with inputs a and c]

4–20

4–23 We write the equation, simplify it, and then draw the circuit:

\[ f(c, b, a) = (c \wedge b \wedge a) \vee (c \wedge b \wedge a) \vee (c \wedge b \wedge a) \vee (c \wedge b \wedge a) \]
\[ = (c \wedge b) \vee (c \wedge a) \]


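4–26 Because CMOS gates are inverting, we write the equation using the
minterms that set f = 0 (000, 100, 110, 111). Next, we simplify and
draw the gate:

\[ \overline{f(c, b, a)} = (\bar{c} \wedge \bar{b} \wedge \bar{a}) \vee (c \wedge \bar{b} \wedge \bar{a}) \vee (c \wedge b \wedge \bar{a}) \vee (c \wedge b \wedge a) \]
\[ = (\bar{b} \wedge \bar{a}) \vee (c \wedge b) \]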


4–30 We can write the equation by analyzing the NFETs:

\[ f(d, c, b, a) = d \vee c \vee (b \wedge a) \]


Chapter 5

Solutions: Delay and Power of CMOS Circuits

5–1 The pull-down (NFET) transistor has equal resistance to that of a
minimum inverter, giving a maximum (and minimum) fall time of the product
of fanout and inverter delay. That is, $t_{f,max} = 4t_{inv}$. The pull-up path,
however, has one-third the resistance of a minimum inverter. Thus,
$t_{r,max} = \frac{4}{3} t_{inv}$.

5–4 In order to drive a load of $256C_{inv}$ using a chain of FO2 inverters, we
need $\log_2(256) = 8$ inverters. The total delay is the product of the number of
stages (8) and the fanout of each stage (2), or $16t_{inv}$. The total energy
of transitioning up then down again (or down then up) is found (starting
with the output of stage 1) as:

\[
\begin{aligned}
E &= \sum_{i=0}^{n} C_{i+1} V^2 \\
&= \sum_{i=0}^{n} \frac{C_{i+1}}{C_{inv}} E_{inv} \\
&= (2 + 4 + 8 + 16 + 32 + 64 + 128 + 256) E_{inv} \\
&= 510 E_{inv}
\end{aligned}
\]
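
The stage count, delay, and energy in 5–4 can be reproduced in a few lines of Python. Delay is expressed in units of $t_{inv}$ and energy in units of $E_{inv}$, as in the solution.

```python
import math

LOAD = 256     # load capacitance in units of C_inv
FANOUT = 2     # fanout per stage (FO2)

stages = int(math.log2(LOAD))                            # inverters in the chain
delay = stages * FANOUT                                  # in units of t_inv
energy = sum(FANOUT ** i for i in range(1, stages + 1))  # in units of E_inv

print(stages, delay, energy)
```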

5–8 For part (a), we draw the schematic below as an AND-OR-ANDI gate. To
size the transistor diagram, we want the maximum pull-up and pull-down
resistance to be that of a minimum inverter. Recall that resistance is
inversely proportional to width. We find the resistance of the pull-down
path c-d-a to be $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$. The pull-down path b-a is $\frac{2}{3} + \frac{1}{3} = 1$.
Having $W_a = 2$, $W_c = W_d = 4$, and $W_b = 2$ is also a valid
answer. We apply a similar methodology to the pull-up network.
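
Both sizings can be checked by summing 1/W along each series path. A small Python sketch using exact fractions, with resistance expressed relative to a minimum-width NFET:

```python
from fractions import Fraction

def path_resistance(widths):
    """Series resistance of a transistor stack; each device contributes 1/W."""
    return sum(1 / Fraction(w) for w in widths)

# Pull-down widths: the sizing used in the solution and the alternative it mentions.
sizing_1 = {"a": Fraction(3), "b": Fraction(3, 2), "c": Fraction(3), "d": Fraction(3)}
sizing_2 = {"a": Fraction(2), "b": Fraction(2), "c": Fraction(4), "d": Fraction(4)}

for name, w in (("sizing 1", sizing_1), ("sizing 2", sizing_2)):
    r_cda = path_resistance([w["c"], w["d"], w["a"]])
    r_ba = path_resistance([w["b"], w["a"]])
    print(name, r_cda, r_ba)
```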

[Figure: (a) gate-level schematic with inputs a, b, c, d and output f; (b) sized transistor schematic with NFET widths a = 3, b = 1.5, c = 3, d = 3 and PFET widths a = Kp, b = c = d = 2Kp]

To calculate the logical effort of each input, we find the total input
capacitance and divide by that of an inverter, $1 + K_p$:

\[ LE_a = \frac{3 + K_p}{1 + K_p} \]
\[ LE_b = \frac{1.5 + 2K_p}{1 + K_p} \]
\[ LE_c = \frac{3 + 2K_p}{1 + K_p} \]
\[ LE_d = \frac{3 + 2K_p}{1 + K_p} \]
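
For a numeric feel, the four expressions can be evaluated for a specific $K_p$. The sketch below assumes $K_p = 1.3$, the value used in the tables of the later exercises; the solution to this exercise leaves $K_p$ symbolic.

```python
KP = 1.3  # assumed PFET/NFET width ratio, borrowed from the later exercises

# Input capacitance of each input: NFET width + (PFET multiple of Kp) * Kp.
nfet_width = {"a": 3.0, "b": 1.5, "c": 3.0, "d": 3.0}
pfet_multiple = {"a": 1, "b": 2, "c": 2, "d": 2}

logical_effort = {
    pin: (nfet_width[pin] + pfet_multiple[pin] * KP) / (1 + KP)
    for pin in nfet_width
}

for pin, le in logical_effort.items():
    print(f"LE_{pin} = {le:.2f}")
```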

5–11
Signal       Fanout       Logical Effort (Kp = 1.3)   Delay
i to i + 1   i to i + 1   i + 1                       i to i + 1
b            2            1.87                        3.74
cN           2            2.70                        5.4
d            1            1                           1
eN           5            1                           5
TOTAL        a to eN                                  15.14 $t_{inv}$

5–14 To find the minimum delay, we first find the total effort. Next, we find
the stage effort to be the 4th root of the total effort:

\[
\begin{aligned}
TE &= \prod_i LE_i \times FO_i \\
TE &= 1.87 \times 2.7 \times 1 \times 1 \times 20 \\
TE &= 101 \\
SE &= TE^{1/4} \\
SE &= 3.17
\end{aligned}
\]

Using this stage effort ($FO \times LE$) we find the new optimal sizes and delay:

Signal       Fanout       Size of Driven Gate   Logical Effort (Kp = 1.3)   Delay
i to i + 1   i to i + 1   i + 1                 i + 1                       i to i + 1
b            1.70         3.19                  1.87                        3.17
cN           1.17         5.4                   2.7                         3.17
d            3.17         6.3                   1                           3.17
eN           3.17         20                    1                           3.17
TOTAL        a to eN                                                        12.68 $t_{inv}$
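
The total effort, stage effort, and minimum delay in 5–14 can be recomputed as below. This is a sketch of the arithmetic only; the logical efforts and the overall electrical effort of 20 are the values used in the solution.

```python
import math

logical_efforts = [1.87, 2.7, 1, 1]   # one entry per stage
electrical_effort = 20                # overall fanout of the path

total_effort = math.prod(logical_efforts) * electrical_effort
stage_effort = total_effort ** (1 / len(logical_efforts))
min_delay = len(logical_efforts) * stage_effort   # in units of t_inv

print(f"TE = {total_effort:.0f}, SE = {stage_effort:.2f}, delay = {min_delay:.2f} t_inv")
```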

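5–17 To solve this problem, we use the same methodology as in Example 5.6:
The size of our inverter is $W_N = 20 \times 8L_{min} = 160L_{min}$.

\[
\begin{aligned}
R_w &= 10\,\frac{\Omega}{\mu\mathrm{m}} \times 500\,\mu\mathrm{m} = 5000\,\Omega, \\
C_w &= 0.18\,\frac{\mathrm{fF}}{\mu\mathrm{m}} \times 500\,\mu\mathrm{m} = 90\,\mathrm{fF}, \\
R_r &= \frac{K_{RN}}{W_N} = \frac{4.2 \times 10^4}{160} = 263\,\Omega, \\
C_r &= W_N (1 + K_P) K_C = 160(1 + 1.3)(2.8 \times 10^{-17}) = 10.3\,\mathrm{fF}.
\end{aligned}
\]

Using Equation (5.18) we compute the delay of each segment as:

\[
\begin{aligned}
D_l &= 0.4 R_w C_w + R_r C_w + (R_r + R_w) C_r \\
&= (0.4)(5000)(90) + (263)(90) + (5000 + 263)(10.3) \\
&= 180{,}000 + 23{,}670 + 54{,}209 = 258\,\mathrm{ps}
\end{aligned}
\]

The delay of the entire wire, 20 segments, is $20D_l = (20)(258) = 5.16$ ns.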
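
The segment-delay formula is easy to wrap in a function. The Python sketch below uses the constants that appear in the solution (10 Ω/µm, 0.18 fF/µm, $K_{RN} = 4.2 \times 10^4$, $K_C = 2.8 \times 10^{-17}$, $K_P = 1.3$); the function and variable names are illustrative.

```python
K_RN = 4.2e4          # ohm * Lmin, NFET resistance constant
K_C = 2.8e-17         # farads per Lmin of gate width
K_P = 1.3             # PFET/NFET width ratio
R_PER_UM = 10.0       # wire resistance, ohms per micrometer
C_PER_UM = 0.18e-15   # wire capacitance, farads per micrometer

def segment_delay(length_um, w_n):
    """Delay in seconds of one repeated wire segment:
    0.4*Rw*Cw + Rr*Cw + (Rr + Rw)*Cr."""
    r_w = R_PER_UM * length_um
    c_w = C_PER_UM * length_um
    r_r = K_RN / w_n
    c_r = w_n * (1 + K_P) * K_C
    return 0.4 * r_w * c_w + r_r * c_w + (r_r + r_w) * c_r

d_l = segment_delay(500, 160)
total = 20 * d_l
print(f"D_l = {d_l * 1e12:.0f} ps, total = {total * 1e9:.2f} ns")
```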


5–20 1. As was shown in the text, the delay per millimeter of an optimally
sized and spaced wire (61 µm) is 228 ps. 5 mm is just 5 times that
amount: 1.14 ns. The energy to transmit one bit ($V_{dd} = 1$) is:

\[
\begin{aligned}
E &= \frac{1}{2} V_{dd}^2 \times (C_{wire} + C_d) \\
E &= \frac{1}{2} V_{dd}^2 \times \left(5000\,\mu\mathrm{m} \times 0.18\,\frac{\mathrm{fF}}{\mu\mathrm{m}} + 82 \times 108(1 + 1.3)(2.8 \times 10^{-17})\right) \\
E &= 0.75\,\mathrm{pJ}
\end{aligned}
\]

We found the total driver capacitance by multiplying the capacitance
of each driver by the number of drivers.
2. By doubling the spacing between wires to 122 µm, we only need 41
segments. This reduces our driver energy by half and gives an energy
per bit of 0.6 pJ. The delay calculation is:

\[
\begin{aligned}
R_w &= 10\,\frac{\Omega}{\mu\mathrm{m}} \times 122\,\mu\mathrm{m} = 1220\,\Omega, \\
C_w &= 0.18\,\frac{\mathrm{fF}}{\mu\mathrm{m}} \times 122\,\mu\mathrm{m} = 22\,\mathrm{fF}, \\
R_r &= \frac{K_{RN}}{W_N} = \frac{4.2 \times 10^4}{108} = 389\,\Omega, \\
C_r &= W_N (1 + K_P) K_C = 108(1 + 1.3)(2.8 \times 10^{-17}) = 7.0\,\mathrm{fF} \\
D_l &= 0.4 R_w C_w + R_r C_w + (R_r + R_w) C_r \\
&= (0.4)(1220)(22) + (389)(22) + (1220 + 389)(7.0) \\
&= 10{,}736 + 8{,}558 + 11{,}263 = 31\,\mathrm{ps} \\
D &= 1.27\,\mathrm{ns}
\end{aligned}
\]

5–22 We calculate energy in the same way we did in Exercise 5–4. Remember
that the input capacitance is the product of the size and logical effort of
a gate:

\[
\begin{aligned}
E_{101} &= \sum_{i=0}^{n} C_{i+1} V^2 \\
&= \sum_{i=0}^{n} \frac{C_{i+1}}{C_{inv}} E_{inv} \\
&= (1 + 2 \times 1.87 + 4 \times 2.7 + 4 + 20) E_{inv} \\
&= 39.5 E_{inv}
\end{aligned}
\]


Chapter 6

Solutions: Combinational
Logic Design

6–1 Circuits A, B, and D are all combinational. C is not combinational since
the output of the middle block feeds the input of the leftmost block and
the output of the leftmost block feeds the input of the middle block. No
other diagram contains a cyclic composition of blocks.

6–2 (a)
No.  in    out
0    0000  1
1    0001  1
2    0010  1
3    0011  1
4    0100  0
5    0101  1
6    0110  0
7    0111  0
8    1000  1
9    1001  0
10   1010  0
11   1011  0
12   1100  0
13   1101  1
14   1110  0
15   1111  0

(b) See figure below, part (b)


(c) Number of variables
4      3      2      1
0000   000X   00XX
0001   00X0
0010   X000
0011   00X1
0101   0X01
1000   001X
1101   X101
The prime implicants of this function are 00XX, X000, 0X01, and
X101.
(d) The essential prime implicants of the function are 00XX (only cover
of 2,3), X000 (only one to cover 8), and X101 (only one to cover 13).
(e) The function is covered by the three essential prime implicants.
(f) See figure below:

[Figure (b): Karnaugh map of the function]

dc \ ba   00   01   11   10
00        1    1    1    1
01        0    1    0    0
11        0    1    0    0
10        1    0    0    0

[Figure (f): gate-level schematic of f with inputs a, b, c, and d]
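
The implicant table of part (c) is what the first phase of the Quine-McCluskey procedure produces. The Python sketch below is one minimal way to automate it (repeatedly merge implicants that differ in a single fixed bit, keep whatever never merges); it is an illustration, not the method prescribed by the text.

```python
from itertools import combinations

def combine(a, b):
    """Merge two implicants that differ in exactly one non-X position."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and "X" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "X" + a[i + 1:]
    return None

def prime_implicants(minterms, nbits):
    current = {format(m, f"0{nbits}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used      # implicants that could not be merged
        current = merged
    return primes

def covers(implicant, m, nbits):
    bits = format(m, f"0{nbits}b")
    return all(p in ("X", q) for p, q in zip(implicant, bits))

FIB = [0, 1, 2, 3, 5, 8, 13]          # minterms of the function in part (a)
primes = prime_implicants(FIB, 4)
essential = {
    p for p in primes
    if any(covers(p, m, 4) and sum(covers(q, m, 4) for q in primes) == 1 for m in FIB)
}
print(sorted(primes), sorted(essential))
```

An implicant is essential when it is the only prime covering some minterm, which is the test applied in part (d).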

6–4 (a)
No.  in    out
0    0000  1
1    0001  1
2    0010  1
3    0011  1
4    0100  0
5    0101  1
6    0110  0
7    0111  0
8    1000  1
9    1001  0
10   1010  X
11   1011  X
12   1100  X
13   1101  X
14   1110  X
15   1111  X


(b) See figure below, part (b)

Number of variables
4      3      2      1
0000   000X   00XX
0001   00X0   X0X0
0010   X000   1XX0
0011   00X1
0101   0X01
1000   001X
       X010
       X011
       X101
       10X0
       1X00

The prime implicants of this function are 00XX, X0X0, 1XX0, 0X01,
X101, and X011.

(c) The essential prime implicant of the function is 00XX (only cover of
3).

(d) The function is covered by the implicants 00XX, X0X0, and X101.

(e) See figure below:

[Figure: Karnaugh map of the function]

dc \ ba   00   01   11   10
00        1    1    1    1
01        0    1    0    0
11        X    X    X    X
10        1    0    X    X

[Figure: gate-level schematic of f with inputs a, b, c, and d]

6–9 To solve this problem, we list all implicants and minimize:


Number of variables
5       4       3       2   1
00010   0001X   X0X11
00011   00X11
00101   0X011
00111   X0011
01011   001X1
01101   0X101
10001   X1101
10011   100X1
10111   10X11
11101   1X111
11111   111X1
The essential prime implicants are: 0001X, 0X011, and 100X1. A possible
cover of the function is: 0001X, 0X011, 100X1, X0X11, 0X101, and 111X1.
One solution in sum-of-products form is:

\[ f(e, d, c, b, a) = (\bar{e} \wedge \bar{d} \wedge \bar{c} \wedge b) \vee (\bar{e} \wedge \bar{c} \wedge b \wedge a) \vee (e \wedge \bar{d} \wedge \bar{c} \wedge a) \vee (\bar{d} \wedge b \wedge a) \vee (\bar{e} \wedge c \wedge \bar{b} \wedge a) \vee (e \wedge d \wedge c \wedge a) \]
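
A cover can be validated by expanding each cube and comparing against the minterm list. A short Python sketch for the six cubes above, with bit order edcba (e is the most significant bit):

```python
MINTERMS = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]   # 5-variable column of the table
COVER = ["0001X", "0X011", "100X1", "X0X11", "0X101", "111X1"]

def matches(cube, value, nbits=5):
    """True if the cube (string of 0, 1, X) contains the given minterm."""
    bits = format(value, f"0{nbits}b")
    return all(c in ("X", b) for c, b in zip(cube, bits))

covered = sorted(m for m in range(32) if any(matches(c, m) for c in COVER))
print(covered == MINTERMS)
```

The check confirms both directions at once: every listed minterm is covered, and no other input is.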

6–12 Using the same Karnaugh map as Exercise 6–4, we can see the cover of
maxterms (when the output is 0) is OR(0XX0), OR(X00X), and OR(X0X1).
Our function is:

\[ f(d, c, b, a) = (\bar{d} \vee \bar{a}) \wedge (\bar{c} \vee \bar{b}) \wedge (\bar{c} \vee a) \]

6–14 Segment 0 is lit for inputs 2, 3, 4, 5, 6, 8, 9, A, B, D, E, F. We show the
Karnaugh map below:

dc \ ba   00   01   11   10
00        0    0    1    1
01        1    1    0    1
11        0    1    1    1
10        1    1    1    1

One possible cover is:

\[ f(d, c, b, a) = (b \wedge \bar{a}) \vee (d \wedge \bar{c}) \vee (d \wedge a) \vee (\bar{d} \wedge c \wedge \bar{b}) \vee (\bar{d} \wedge \bar{c} \wedge b) \]


6–15 Segment 1 is lit for inputs 0, 4, 5, 6, 8, 9, A, B, C, E, F. We show the
Karnaugh map below:

dc \ ba   00   01   11   10
00        1    0    0    0
01        1    1    0    1
11        1    0    1    1
10        1    1    1    1

One possible cover is:

\[ f(d, c, b, a) = (\bar{b} \wedge \bar{a}) \vee (d \wedge \bar{c}) \vee (\bar{d} \wedge c \wedge \bar{b}) \vee (c \wedge b \wedge \bar{a}) \vee (d \wedge c \wedge b) \]

6–21 The Karnaugh map is below:

dc \ ba   00   01   11   10
00        0    0    1    1
01        1    1    0    1
11        X    X    X    X
10        1    1    X    X

One possible cover is:

\[ f(d, c, b, a) = (b \wedge \bar{a}) \vee (d) \vee (c \wedge \bar{b}) \vee (\bar{d} \wedge \bar{c} \wedge b) \]

6–22 The Karnaugh map is below:

dc \ ba   00   01   11   10
00        1    0    0    0
01        1    1    0    1
11        X    X    X    X
10        1    1    X    X

One possible cover is:

\[ f(d, c, b, a) = (\bar{b} \wedge \bar{a}) \vee (d) \vee (c \wedge \bar{b}) \vee (c \wedge \bar{a}) \]

6–28 Using the same Karnaugh map as Exercise 6–14, we can find a cover of
maxterms as:
\[ f(d, c, b, a) = (d \vee c \vee b) \wedge (d \vee \bar{c} \vee \bar{b} \vee \bar{a}) \wedge (\bar{d} \vee \bar{c} \vee b \vee a) \]

6–35 Using the same Karnaugh map as Exercise 6–21, we can find a cover of
maxterms as:
\[ f(d, c, b, a) = (d \vee c \vee b) \wedge (\bar{c} \vee \bar{b} \vee \bar{a}) \]

6–42 Using the three Karnaugh maps below (outputs 0, 1, and 2), we
look for common terms in the coverage equations:

Output 0:
dc \ ba   00   01   11   10
00        0    0    1    1
01        1    1    0    1
11        X    X    X    X
10        1    1    X    X

Output 1:
dc \ ba   00   01   11   10
00        1    0    0    0
01        1    1    0    1
11        X    X    X    X
10        1    1    X    X

Output 2:
dc \ ba   00   01   11   10
00        1    0    0    1
01        0    0    0    1
11        X    X    X    X
10        1    0    X    X

\[ f_0(d, c, b, a) = (b \wedge \bar{a}) \vee (d) \vee (c \wedge \bar{b}) \vee (\bar{d} \wedge \bar{c} \wedge b) \]
\[ f_1(d, c, b, a) = (\bar{b} \wedge \bar{a}) \vee (d) \vee (c \wedge \bar{b}) \vee (c \wedge \bar{a}) \]
\[ f_2(d, c, b, a) = (\bar{c} \wedge \bar{a}) \vee (b \wedge \bar{a}) \]

We can share the $b \wedge \bar{a}$ (XX10) term between outputs 0 and 2. We also
share implicant $c \wedge \bar{b}$ (X10X) between outputs 0 and 1.
6–44 The hazard occurs when a = b = c = 1 and then a toggles to 0. The
output may go low for a period of time equal to that of the NOT-AND
delay. The simplest fix for this problem is to remove the a input from the
AND gate. This simplifies the logic equation from $f = \bar{a} \vee (c \wedge b \wedge a)$ to
just $f = \bar{a} \vee (c \wedge b)$.


Chapter 7

Solutions: Verilog
Descriptions of
Combinational Logic

7–1
module fib_case(a, b);
  input [3:0] a;
  output b;
  reg b;
  // b is 1 when the 4-bit input is a Fibonacci number
  always @(*) begin
    case (a)
      0, 1, 2, 3, 5, 8, 13: b = 1;
      default: b = 0;
    endcase // case (a)
  end // always @ (*)
endmodule // fib_case
7–2
module fib_casex(a, b);
  input [3:0] a;
  output b;
  reg b;
  // one casex item per essential prime implicant
  always @(*) begin
    casex (a)
      4'b00xx: b = 1;
      4'bx000: b = 1;
      4'bx101: b = 1;
      default: b = 0;
    endcase // casex (a)
  end // always @ (*)
endmodule // fib_casex
7–3
module fib_assign(a, b);
  input [3:0] a;
  output b;
  // OR of the implicants 00XX, X000, and X101
  assign b = (~a[3] & ~a[2]) |
             (~a[2] & ~a[1] & ~a[0]) |
             (a[2] & ~a[1] & a[0]);
endmodule // fib_assign

7–4
module AND2(a, b, z);
  input a, b;
  output z;
  assign z = a & b;
endmodule // AND2

module AND3(a, b, c, z);
  input a, b, c;
  output z;
  assign z = a & b & c;
endmodule // AND3

module OR3(a, b, c, z);
  input a, b, c;
  output z;
  assign z = a | b | c;
endmodule // OR3

module fib_struct(a, b);
  input [3:0] a;
  output b;
  wire t0, t1, t2;
  // one AND gate per implicant, ORed together
  AND2 andt0(~a[3], ~a[2], t0);
  AND3 andt1(~a[2], ~a[1], ~a[0], t1);
  AND3 andt2(a[2], ~a[1], a[0], t2);
  OR3 orb(t0, t1, t2, b);
endmodule // fib_struct

7–5 The testbench below requires the user to manually verify the output of
dut0. It automatically checks that all 4 implementations provide the same
answer.

module fib_tb;
  reg [3:0] a;
  wire [3:0] o;
  reg error;
  fib_case dut0(a, o[0]);
  fib_casex dut1(a, o[1]);
  fib_assign dut2(a, o[2]);
  fib_struct dut3(a, o[3]);
  initial begin
    a = 4'h0;
    error = 0;
    repeat (16) begin
      #10;
      $display("%h -> %b", a, o[0]);
      // compare every implementation against dut0
      if (o[0] !== o[1]) error = 1;
      if (o[0] !== o[2]) error = 1;
      if (o[0] !== o[3]) error = 1;
      a = a + 1;
    end
    if (error === 0) $display("Tests Passed");
    else $display("Tests Failed");
  end
endmodule // fib_tb

7–10
module dec_fib_case(a, b);
  input [3:0] a;
  output b;
  reg b;
  always @(*) begin
    case (a)
      0, 1, 2, 3, 5, 8: b = 1;
      4, 6, 7, 9: b = 0;
      default: b = 1'bx; // inputs 10-15 are don't cares
    endcase // case (a)
  end // always @ (*)
endmodule // dec_fib_case
This is the right approach as it allows us to simply encode the truth table.

7–15
module bit_reverse(a, b);
  input [4:0] a;
  output [4:0] b;
  assign b = {a[0], a[1], a[2], a[3], a[4]};
endmodule // bit_reverse

7–24
module ff1(a, o, v);
  input [15:0] a;
  output [3:0] o;
  output v;

  reg [3:0] o;
  reg v;

  // v is the valid bit; o is the index of the most significant 1 in a
  always @(*) begin
    casex (a)
      16'b1xxxxxxxxxxxxxxx: {v,o} = 5'h1F;
      16'b01xxxxxxxxxxxxxx: {v,o} = 5'h1E;
      16'b001xxxxxxxxxxxxx: {v,o} = 5'h1D;
      16'b0001xxxxxxxxxxxx: {v,o} = 5'h1C;
      16'b00001xxxxxxxxxxx: {v,o} = 5'h1B;
      16'b000001xxxxxxxxxx: {v,o} = 5'h1A;
      16'b0000001xxxxxxxxx: {v,o} = 5'h19;
      16'b00000001xxxxxxxx: {v,o} = 5'h18;
      16'b000000001xxxxxxx: {v,o} = 5'h17;
      16'b0000000001xxxxxx: {v,o} = 5'h16;
      16'b00000000001xxxxx: {v,o} = 5'h15;
      16'b000000000001xxxx: {v,o} = 5'h14;
      16'b0000000000001xxx: {v,o} = 5'h13;
      16'b00000000000001xx: {v,o} = 5'h12;
      16'b000000000000001x: {v,o} = 5'h11;
      16'b0000000000000001: {v,o} = 5'h10;
      default: {v,o} = 5'h00;
    endcase // casex (a)
  end // always @ (*)
endmodule // ff1


Chapter 8

Solutions: Combinational
Building Blocks

8–1
module dec38(a, b);
  input [2:0] a;
  output [7:0] b;

  assign b[0] = ~a[0] & ~a[1] & ~a[2];
  assign b[1] =  a[0] & ~a[1] & ~a[2];
  assign b[2] = ~a[0] &  a[1] & ~a[2];
  assign b[3] =  a[0] &  a[1] & ~a[2];
  assign b[4] = ~a[0] & ~a[1] &  a[2];
  assign b[5] =  a[0] & ~a[1] &  a[2];
  assign b[6] = ~a[0] &  a[1] &  a[2];
  assign b[7] =  a[0] &  a[1] &  a[2];
endmodule // dec38

8–2
module ssdec(a, b);
  input [3:0] a;
  output [6:0] b;

  wire [15:0] w;
  // AND each segment pattern with its one-hot select, then OR the results
  assign b = (`SS_0 & {7{w[0]}}) |
             (`SS_1 & {7{w[1]}}) |
             (`SS_2 & {7{w[2]}}) |
             (`SS_3 & {7{w[3]}}) |
             (`SS_4 & {7{w[4]}}) |
             (`SS_5 & {7{w[5]}}) |
             (`SS_6 & {7{w[6]}}) |
             (`SS_7 & {7{w[7]}}) |
             (`SS_8 & {7{w[8]}}) |
             (`SS_9 & {7{w[9]}});

  Decoder #(4, 16) dec(a, w);
endmodule // ssdec
8–4
module dec532(a, b);
  input [4:0] a;
  output [31:0] b;
  wire [7:0] x;
  wire [3:0] y;

  // y selects one group of eight outputs; x selects the output within it
  assign b[7:0]   = x & {8{y[0]}};
  assign b[15:8]  = x & {8{y[1]}};
  assign b[23:16] = x & {8{y[2]}};
  assign b[31:24] = x & {8{y[3]}};

  Decoder #(3, 8) d0(a[2:0], x);
  Decoder #(2, 4) d1(a[4:3], y);
endmodule // dec532
8–8
module multFib(a, b);
  input [3:0] a;
  output b;

  // the low three bits select the mux input; a[3] appears only in the data inputs
  Muxb8 #(1) mux(0, 0, 1, 0, ~a[3], ~a[3], ~a[3], 1, a[2:0], b);
endmodule // multFib
8–11 For this problem, we had to reverse the priority polarity from the example
given in the chapter. To do so, we now compute c from the MSB to the
LSB.

module progPriEnc83(a, p, b);
  input [7:0] a, p;
  output [2:0] b;
  wire [7:0] g;

  wire [15:0] c = ({1'b0, c[15:1]} & {1'b0, ~a, ~a[7:1]}) | {p, 8'd0};
  assign g = a & (c[15:8] | c[7:0]);
  Enc83 enc(g, b);
endmodule // progPriEnc83

8–14
module MagCompML(a, b, gt);
  parameter k = 8;
  input [k-1:0] a, b;
  output gt;
  wire [k-1:0] eqi = a ~^ b;
  wire [k-1:0] gti = a & ~b;
  // eqa[i]: all bits above bit i are equal; gta[i]: a > b considering bits i and above
  wire [k:0] eqa = {1'b1, (eqi[k-1:0] & eqa[k:1])};
  wire [k:0] gta = {1'b0, gta[k:1] | (gti[k-1:0] & eqa[k:1])};
  assign gt = gta[0];
endmodule // MagCompML
8–17
module funnelShift(a, n, b);
  parameter i = 16;
  parameter j = 8;
  parameter l = 3;
  input [i-1:0] a;
  input [l-1:0] n;
  output [j-1:0] b;
  assign b = a >> n;
endmodule // funnelShift
8–19
module findMin(a, b, c, z);
  parameter n = 16;
  input [n-1:0] a, b, c;
  output [n-1:0] z;

  wire agtb, agtc, bgtc;

  MagCompML #(n) ab(a, b, agtb);
  MagCompML #(n) ac(a, c, agtc);
  MagCompML #(n) bc(b, c, bgtc);

  // one-hot select: a if a <= b and a <= c; b if b < a and b <= c; otherwise c
  Mux3 #(n) mout(a, b, c, {~agtb & ~agtc, ~bgtc & agtb, agtc & bgtc}, z);
endmodule // findMin
8–21 Our ROM would need to have 16 entries (one for each number). Ad-
dressed with the 4 input bits, each value in our table would be a single
bit. We store a 1 in locations indexed by a prime number (2, 3, 5, 7, etc.)
and a 0 everywhere else.
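
The ROM contents described in 8–21 can be generated rather than typed by hand. A minimal Python sketch that builds the 16 one-bit entries:

```python
def is_prime(n):
    """Trial-division primality test, adequate for 4-bit addresses."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# One single-bit entry per 4-bit address.
rom = [1 if is_prime(addr) else 0 for addr in range(16)]

for addr, bit in enumerate(rom):
    print(format(addr, "04b"), bit)
```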


Chapter 9

Solutions: Combinational
Examples

9–3 The way we build our multiple-of-5 circuit is analogous to the multiple-of-3
circuit, as shown below:

[Figure: chain of eight Mul5 bit cells. Each cell takes one input bit (in7 down to in0) and the 3-bit remainder from the previous cell (rin) and produces the next remainder (rout). The first rin is 0, the intermediate remainders are rem23:21 down to rem2:0, and out is asserted when rem2:0 = 0.]

We keep the intermediate remainder in a 3-bit format in order to represent
0-4. We show the logic for computing the next remainder in the next solution.

9–4
module Multiple_of_5_bit(in, rin, rout);
  input in;
  input [2:0] rin;
  output [2:0] rout;
  reg [2:0] rout;

  // rin can only be 0-4, not 5, 6, 7
  // rout = (2*rin + in) mod 5
  always @(*) begin
    case ({rin, in})
      4'b0000: rout = 3'd0;
      4'b0001: rout = 3'd1;
      4'b0010: rout = 3'd2;
      4'b0011: rout = 3'd3;
      4'b0100: rout = 3'd4;
      4'b0101: rout = 3'd0;
      4'b0110: rout = 3'd1;
      4'b0111: rout = 3'd2;
      4'b1000: rout = 3'd3;
      4'b1001: rout = 3'd4;
      default: rout = 3'bxxx;
    endcase // case ({rin, in})
  end // always @ (*)
endmodule // Multiple_of_5_bit

module Multiple_of_5(in, out);
  input [7:0] in;
  output out;
  wire [23:0] rem;

  Multiple_of_5_bit b7(in[7], 3'd0, rem[23:21]);
  Multiple_of_5_bit b6(in[6], rem[23:21], rem[20:18]);
  Multiple_of_5_bit b5(in[5], rem[20:18], rem[17:15]);
  Multiple_of_5_bit b4(in[4], rem[17:15], rem[14:12]);
  Multiple_of_5_bit b3(in[3], rem[14:12], rem[11:9]);
  Multiple_of_5_bit b2(in[2], rem[11:9], rem[8:6]);
  Multiple_of_5_bit b1(in[1], rem[8:6], rem[5:3]);
  Multiple_of_5_bit b0(in[0], rem[5:3], rem[2:0]);
  assign out = ~(|rem[2:0]);
endmodule // Multiple_of_5

module testMul5;
  reg [7:0] in;
  reg error;
  wire out;
  Multiple_of_5 dut(in, out);
  initial begin
    in = 0;
    error = 0;
    repeat (256) begin
      #100;
      if (out !== ((in % 5) == 0)) begin
        $display("ERROR %d -> %b", in, out);
        error = 1;
      end
      in = in + 1;
    end
    if (error === 0) $display("PASS");
  end // initial begin
endmodule // testMul5
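
The case table in Multiple_of_5_bit implements rout = (2·rin + in) mod 5, applied from the most significant bit down. A small Python model of the same recurrence, offered as a behavioral cross-check rather than as part of the original solution:

```python
def mul5_bit(bit, rin):
    """One cell of the chain: shift in a bit and reduce modulo 5."""
    return (2 * rin + bit) % 5

def multiple_of_5(value, width=8):
    """Bit-serial remainder computation, MSB first, mirroring the cell chain."""
    rem = 0
    for i in reversed(range(width)):
        rem = mul5_bit((value >> i) & 1, rem)
    return rem == 0

mismatches = [v for v in range(256) if multiple_of_5(v) != (v % 5 == 0)]
print("mismatches:", mismatches)
```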


9–9 One correct solution is to implement this function using a case statement.
For example: `MONDAY: tomorrow = `TUESDAY;
9–10 One possible solution is to pass the year input to the DaysInMonth
module. We can then modify the case statement:

casex ({year[1:0], month})
  6'bxx0100: days = 5'd30;
  6'bxx0110: days = 5'd30;
  6'bxx1001: days = 5'd30;
  6'bxx1011: days = 5'd30;
  6'b000010: days = 5'd29; // year divisible by 4 => leap year
  6'bxx0010: days = 5'd28;
  default: days = 5'd31;
endcase // casex ({year[1:0], month})
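
A behavioral model makes the casex priorities easier to test. The Python sketch below mirrors the statement above, including its simplification that every year divisible by 4 is a leap year (the century rules are ignored, as in the Verilog).

```python
def days_in_month(year, month):
    """Model of the casex: only the two low bits of the year are examined."""
    if month in (4, 6, 9, 11):
        return 30
    if month == 2:
        return 29 if (year & 0b11) == 0 else 28
    return 31

print(days_in_month(2012, 2), days_in_month(2011, 2), days_in_month(2012, 9))
```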

9–12 There are two different options when designing this circuit. First, we
could modify the comparator block to output a signal gteq for greater
than or equal. Leaving the rest of the logic as is, the arbiter would break
ties to the higher input. A second option is to simply switch the a and b
inputs to each comparator.
9–16 The simplest way to encode the lowest output is to place an inverter at the
output of each magnitude comparator. That way, the comparator output
is the f value. Ties are broken in favor of the higher number input.
9–18 A version of OneInRow and OneInArray are shown below. We do not use
any sort of strategy in selecting the next position to play. We integrated
the OneInArray function into the main module as the 2nd lowest priority.
The OneInArray module itself is nearly identical to the TwoInArray
module.

module OneInRow(ain, bin, cout);
  input [2:0] ain, bin;
  output [2:0] cout;
  // Input is highest to lowest priority
  // cout[0] (__X) = X__ or _X_
  assign cout[0] = (ain[2] & ~(ain[1] | bin[1]) & ~(ain[0] | bin[0])) |
                   (~(ain[2] | bin[2]) & ain[1] & ~(ain[0] | bin[0]));
  // cout[1] = __X
  assign cout[1] = ~(ain[2] | bin[2]) & ~(ain[1] | bin[1]) & ain[0];
  assign cout[2] = 1'b0;
endmodule // OneInRow

module OneInArray(ain, bin, cout);
  input [8:0] ain, bin;
  output [8:0] cout;

  wire [8:0] rows, cols;
  wire [2:0] ddiag, udiag;

  // check each row
  OneInRow topr(ain[2:0], bin[2:0], rows[2:0]);
  OneInRow midr(ain[5:3], bin[5:3], rows[5:3]);
  OneInRow botr(ain[8:6], bin[8:6], rows[8:6]);

  // check each column
  OneInRow leftc({ain[6],ain[3],ain[0]},
                 {bin[6],bin[3],bin[0]},
                 {cols[6],cols[3],cols[0]});
  OneInRow midc({ain[7],ain[4],ain[1]},
                {bin[7],bin[4],bin[1]},
                {cols[7],cols[4],cols[1]});
  OneInRow rightc({ain[8],ain[5],ain[2]},
                  {bin[8],bin[5],bin[2]},
                  {cols[8],cols[5],cols[2]});

  // check both diagonals
  OneInRow dndiagx({ain[8],ain[4],ain[0]}, {bin[8],bin[4],bin[0]}, ddiag);
  OneInRow updiagx({ain[6],ain[4],ain[2]}, {bin[6],bin[4],bin[2]}, udiag);

  // OR together the outputs
  assign cout = rows | cols |
                {ddiag[2],1'b0,1'b0,1'b0,ddiag[1],1'b0,1'b0,1'b0,ddiag[0]} |
                {1'b0,1'b0,udiag[2],1'b0,udiag[1],1'b0,udiag[0],1'b0,1'b0};
endmodule // OneInArray

9–21
module tttLegal(a, b, legal);
  input [8:0] a, b;
  output legal;

  // no square may be claimed by both players
  wire noConflict = ~(|(a & b));

  // Count the number of ones
  wire [3:0] nX = a[8] + a[7] + a[6] + a[5] + a[4] +
                  a[3] + a[2] + a[1] + a[0];
  wire [3:0] nO = b[8] + b[7] + b[6] + b[5] + b[4] +
                  b[3] + b[2] + b[1] + b[0];
  // want nX + 1 >= nO & nO >= nX
  // ~(nX + 1 < nO) & ~(nX > nO)
  wire [1:0] illegalCnt;
  MagComp #(4) m0(nO, nX + 1, illegalCnt[0]);
  MagComp #(4) m1(nX, nO, illegalCnt[1]);
  assign legal = noConflict & ~illegalCnt[1] & ~illegalCnt[0];
endmodule // tttLegal
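
The two comparisons in tttLegal amount to the condition nX ≤ nO ≤ nX + 1 plus the no-overlap check. A Python model of exactly those checks, written to mirror the module's comparisons rather than to restate the rules of the game:

```python
def ttt_legal(a, b):
    """a and b are 9-bit masks of the squares held by each player."""
    no_conflict = (a & b) == 0
    n_x = bin(a & 0x1FF).count("1")
    n_o = bin(b & 0x1FF).count("1")
    # illegalCnt[0] is nO > nX + 1; illegalCnt[1] is nX > nO
    return no_conflict and not (n_o > n_x + 1) and not (n_x > n_o)

print(ttt_legal(0b000000000, 0b000000001), ttt_legal(0b000000001, 0b000000001))
```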


Chapter 10

Solutions: Arithmetic
Circuits

10–1 $817_{10} = 1100110001_2 = 331_{16}$

10–2 $1492_{10} = 10111010100_2 = 5D4_{16}$

10–5 $001100110001_2 = 2^9 + 2^8 + 2^5 + 2^4 + 1 = 817_{10}$

10–9 $2C_{16} = 00101100_2 = 2 \times 16 + 12 = 44_{10}$

10–10 $BEEF_{16} = 1011111011101111_2 = 11 \times 16^3 + 14 \times 16^2 + 14 \times 16 + 15 = 48879_{10}$
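
These conversions are quick to confirm programmatically. The Python sketch below uses repeated division for the decimal-to-base direction and the built-in `int(text, base)` for the reverse; `to_base` is an illustrative helper, not something from the text.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, base):
    """Convert a non-negative integer to a digit string by repeated division."""
    out = ""
    while True:
        n, r = divmod(n, base)
        out = DIGITS[r] + out
        if n == 0:
            return out

print(to_base(817, 2), to_base(817, 16))      # 10-1
print(to_base(1492, 2), to_base(1492, 16))    # 10-2
print(int("001100110001", 2))                 # 10-5
print(int("2C", 16), int("BEEF", 16))         # 10-9, 10-10
```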

10–14
   1010
 + 0111
  10001

10–16 We can either perform this addition in hexadecimal, or convert to
another base and then convert the resulting answer back to hexadecimal.
We'll do both below (carries are shown in the top row):

  1     0111 000
  2A    0010 1010 (base 2)
 +3C    0011 1100 (base 2)
  66    0110 0110 (base 2)

10–18 To count the seven input bits, labeled abcdefg, we need four full adders.
The first two have binary weight $2^0$ and count the number of 1s in inputs
abc and def. Next is a third FA, also of weight $2^0$, that adds the two previous
sums and the seventh input, g. Finally, the three carry-outs of these FAs
are sent to a final adder with weight $2^1$. The schematic is shown below:

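[Figure: four full adders (FA). Two FAs sum inputs a, b, c and d, e, f; a third FA sums their S outputs with g to produce s0; a fourth FA sums the three carry (C) outputs to produce s1 (S) and s2 (C).]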
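
The four-adder arrangement described above can be checked exhaustively in software. The following is a minimal Python sketch, not the authors' code; the helper names `full_adder` and `count7` are illustrative assumptions.

```python
from itertools import product

def full_adder(x, y, z):
    """Return (sum, carry) for three one-bit inputs."""
    total = x + y + z
    return total & 1, total >> 1

def count7(a, b, c, d, e, f, g):
    """Count the 1s among seven bits using four full adders."""
    s_abc, c_abc = full_adder(a, b, c)          # weight 2^0 adder on a, b, c
    s_def, c_def = full_adder(d, e, f)          # weight 2^0 adder on d, e, f
    s0, c_low = full_adder(s_abc, s_def, g)     # weight 2^0 adder on the two sums and g
    s1, s2 = full_adder(c_abc, c_def, c_low)    # weight 2^1 adder on the three carries
    return (s2 << 2) | (s1 << 1) | s0

# Compare against a plain population count for all 128 input patterns.
all_ok = all(count7(*bits) == sum(bits) for bits in product((0, 1), repeat=7))
```

If `all_ok` is true, the three-bit output {s2, s1, s0} equals the number of 1s for every input combination.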

10–20 Since we are not concerned about subtraction (yet), an overflow condition
is detected when any cout is produced by the adder. This output
selects, using a multiplexer, between the computed sum and $2^n - 1$ as
shown below:

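[Figure: n-bit adder with inputs a, b and cin; its cout drives the select of a 2-input mux that chooses between the sum s (input 0) and the constant $2^n - 1$ (input 1).]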
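
As a rough software model of the saturating adder described above (a sketch under the assumption of unsigned n-bit operands; the function name `saturating_add` is hypothetical):

```python
def saturating_add(a, b, n):
    """Unsigned n-bit add that clamps to 2**n - 1 when the adder produces a carry-out."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask)
    cout = total >> n                 # carry-out of the n-bit adder
    # The mux selects the all-ones constant when cout is set, else the sum.
    return mask if cout else total & mask
```

For example, with n = 4 the sum 10 + 7 overflows and the model returns 15, while 3 + 4 returns 7 unchanged.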

10–25   Sign-magnitude   0 001 0001
        1’s complement   0001 0001
        2’s complement   0001 0001

10–26   Sign-magnitude   1 001 0001
        1’s complement   1110 1110
        2’s complement   1110 1111

10–29 To do the subtraction, we first take the 2’s complement of the subtrahend:
$-0110_2 = 1010_2$. Next, we add $0101_2 + 1010_2 = 1111_2 = -1_{10}$.
10–30 To do the subtraction, we first take the 2’s complement of the subtrahend:
$-1110_2 = 0010_2$. Next, we add $0101_2 + 0010_2 = 0111_2 = 7_{10}$.
10–33 Our 1s complement adder is shown below. When subtracting, we simply
negate (take the 1s complement of) the b input. Unlike a 2s complement
adder, however, we connect the carry-out of the adder back onto the carry-in.
To see why, examine the number wheel of Figure 10.9(b). When we
add two numbers, we count off positions going clockwise. When counting
from 1111 to 0000, a carry-out is generated. This carry-out is added back
into the input in order to account for the extra step needed to “go by” -0.

[Figure: n-bit adder with input a, input b conditionally inverted by sub, and cout fed back to cin; output out.]
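
The end-around carry can be illustrated with a small Python model. This is a sketch only; `ones_complement_add` and `decode` are hypothetical helper names, and the width defaults to 4 bits to match the number wheel.

```python
def ones_complement_add(a, b, n=4, sub=False):
    """Add (or subtract) two n-bit 1s complement patterns with end-around carry."""
    mask = (1 << n) - 1
    if sub:
        b = ~b & mask                 # 1s complement negation of b
    total = (a & mask) + (b & mask)
    cout = total >> n
    return (total + cout) & mask      # the carry-out is fed back into the carry-in

def decode(pattern, n=4):
    """Interpret an n-bit pattern as a 1s complement integer (-0 decodes to 0)."""
    mask = (1 << n) - 1
    return pattern if (pattern >> (n - 1)) == 0 else -((~pattern) & mask)
```

For instance, 5 - 2 produces a carry-out that is wrapped around to give 0011, while 2 - 5 produces no carry and yields 1100, the 1s complement encoding of -3.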

10–35 The design of this circuit is similar to that of Exercise 10–20. However,
we now must detect both positive and negative overflow (Table 10.3).
These two bits and their NOR are used as the three selection bits into a
mux.

10–40 The multiplication is shown below:

        0101
      x 0101
        0101
       0000
      0101
     0000
    00011001

10–44 Multiplying a number, a, by 5 is the same as taking 4a + a. As shown
below, we simply left shift a by 2 (multiplying by 4) and add it back with
the sign-extended input. As a general rule, if you are building hardware
to multiply by a fixed constant, do not use a multiplier.

[Figure: (n+2)-bit adder with input a = {a, 0, 0}, input b = {a[n-1], a[n-1], a}, and the (n+3)-bit result f formed from cout and s.]
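
The shift-and-add scheme can be sanity-checked with a short Python model. This is a sketch assuming n-bit two's complement inputs; the names `to_signed` and `times5` are illustrative and not from the text.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def times5(a, n):
    """Compute 5*a for an n-bit two's complement a as (a << 2) + sign-extended a."""
    a &= (1 << n) - 1
    sign = a >> (n - 1)
    shifted = a << 2                               # {a, 0, 0}, n+2 bits
    extended = a | ((0b11 << n) if sign else 0)    # {a[n-1], a[n-1], a}, n+2 bits
    total = shifted + extended                     # adder sum plus carry-out, n+3 bits
    return to_signed(total, n + 3)
```

Treating the carry-out as the top bit of an (n+3)-bit result gives the correct signed product for every n-bit input, which the test below confirms for n = 4.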

10–52 The division is shown below:

          001001
    101 ) 101110
        - 101000
          000110
        -    101
             001

10–54 We could do the division by converting to binary (or decimal) and then
converting back to hex. Instead, we list the multiplication table for E to
find the quotient. By looking in the table below, we find that AE ÷ E = C.
The remainder is AE − A8 = 6.
× E
0 0
1 E
2 1C
3 2A
4 38
5 46
6 54
7 62
8 70
9 7E
A 8C
B 9A
C A8
D B6
E C4
F D2

Chapter 11

Solutions: Fixed- and


Floating-Point Numbers

11–1 $1.0101_2 = 1 \times 1 + 1 \times 0.25 + 1 \times 0.0625 = 1.3125_{10}$
11–2 The number represented is $-\frac{11}{16}$.
11–5 The first two bits are 01 to represent a positive integer part of one. We find the
remaining digits as follows:

  remainder (decimal)    bits so far (binary)
    0.5999                 0.00000
  − 0.5                  + 0.10000
    0.0999                 0.10000
  − 0.0625               + 0.00010
    0.0374                 0.10010
  − 0.03125              + 0.00001
    0.00615                0.10011

Our fixed-point number is $01.10011_2$ or $1.59375_{10}$. The absolute error is
0.00615 and the relative error is 0.38%. We did not round our fixed-point
value up because $0.00615 < \frac{1}{64}$.
11–6 The first two bits are 00 to represent a positive number with no integer
component. We find the remaining digits as follows:

  remainder (decimal)    bits so far (binary)
    0.3775                 0.00000
  − 0.25                 + 0.01000
    0.1275                 0.01000
  − 0.125                + 0.00100
    0.0025                 0.01100

Our fixed-point number is $00.01100_2$ or $0.375_{10}$. The absolute error is
0.0025 and the relative error is 0.66%.
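
The subtract-and-set procedure used in Exercises 11–5 and 11–6 can be written as a short routine. The following is a sketch using exact rational arithmetic; `truncate_to_fixed` is a hypothetical name, and only non-negative inputs are handled.

```python
from fractions import Fraction

def truncate_to_fixed(value, frac_bits):
    """Greedy conversion: subtract each binary weight that still fits.

    Returns (approximation, leftover remainder) as exact fractions.
    """
    value = Fraction(value)
    approx = Fraction(int(value))          # integer part (non-negative inputs only)
    remainder = value - approx
    for i in range(1, frac_bits + 1):
        weight = Fraction(1, 2 ** i)
        if remainder >= weight:            # this bit is a 1
            approx += weight
            remainder -= weight
    return approx, remainder

approx_a, rem_a = truncate_to_fixed("1.5999", 5)
approx_b, rem_b = truncate_to_fixed("0.3775", 5)
```

The leftover remainder is the absolute error of the truncated value; dividing it by the original input gives the relative error quoted in each solution.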

11–9 If we assume a basic rounding scheme, the precision of our system is
half our resolution. Since the resolution is $\frac{1}{32}$, our precision and maximum
absolute error is $\frac{1}{64}$. Any odd multiple of $\frac{1}{64}$ has maximal absolute error.

11–10 The maximal relative error occurs at $\frac{7}{64} = 0.109375$. To view this
graphically, you can use the following WolframAlpha® command (see footnote 1):

Maximize[{Abs[Round[x 32]/32 - x]/x, 0.1 <= x <= 0.2}, {x}]
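
The same search can be approximated numerically. The sketch below scans a uniform grid of exact fractions over [0.1, 0.2]; the grid spacing of 1/6400 and the helper name `round_to_step` are assumptions chosen so that 7/64 lies on the grid.

```python
from fractions import Fraction

def round_to_step(x, step):
    """Round x to the nearest multiple of step (ties round up)."""
    return ((x + step / 2) // step) * step

step = Fraction(1, 32)                                   # resolution of the format
grid = [Fraction(k, 6400) for k in range(640, 1281)]     # 0.1 <= x <= 0.2

def relative_error(x):
    return abs(round_to_step(x, step) - x) / x

worst = max(grid, key=relative_error)
worst_error = relative_error(worst)
```

On this grid the maximum falls at 7/64, where the absolute error of 1/64 gives a relative error of 1/7, or about 14.3%.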

11–11 Because we have both positive and negative numbers, we need a sign
bit. We also need 4 integral bits to represent magnitudes from 0 to $10_{10}$.
The minimal accuracy is $\frac{1}{16} = 0.0625$, which gives a resolution of $\frac{1}{8}$. Thus,
our format is s4.3.

11–13 The value is $\frac{15}{16} \times 2^{7-3} = 15$.

11–14 The value is $-1 \times \frac{4}{8} \times 2^{1-3} = -0.125$.

11–17 The sign bit is 1, since the number is negative. 23 in binary is 10111,
which is rounded to $110_2 \times 2^2 = \frac{6}{8} \times 2^5 = 24$. The exponent is the sum of
the bias and 5: 01101. The answer is 1110E01101. The absolute error is
$24 - 23 = 1$ and the relative error is 4.3%.

11–18 The sign bit is 0, and $1000000_{10} = \mathrm{F4240}_{16} \approx 100_2 \times 2^{18} = \frac{4}{8} \times 2^{21}$.
Adding the bias gives a final answer 0100E11101. The number represented
is $2^{20} = 1048576$, an error of 48576, or 4.9%.
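
The rounding in Exercises 11–17 and 11–18 can be mimicked in Python. This sketch assumes a 3-bit mantissa read as a fraction m/8 and a 5-bit exponent with a bias of 8; those parameters are inferred from the printed exponent fields (01101 for $2^5$, 11101 for $2^{21}$) rather than stated in this excerpt, and `encode` is a hypothetical helper for integer magnitudes.

```python
def encode(value, mant_bits=3, bias=8):
    """Round an integer to sign / mantissa / biased exponent; value = (m / 2**mant_bits) * 2**exp."""
    sign = 1 if value < 0 else 0
    mag = abs(int(value))
    exp = 0
    while (1 << exp) <= mag:          # smallest exp with mag < 2**exp
        exp += 1
    scale = 1 << mant_bits
    # Round-to-nearest of mag * scale / 2**exp using integer arithmetic.
    mant = (2 * mag * scale + (1 << exp)) >> (exp + 1)
    if mant == scale:                 # mantissa overflowed on rounding; renormalize
        mant >>= 1
        exp += 1
    represented = mant * 2 ** exp / scale
    return sign, mant, exp + bias, represented
```

Under these assumptions a magnitude of 23 rounds to 24, matching the stated absolute error of 1, and 1000000 rounds to 1048576.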

11–21 The mantissa of this representation needs to be 4 bits to bound the error
to 10%. The smallest magnitude that must be represented accurately is
$\frac{1}{32} = \frac{8}{16} \times 2^{-4}$. Our exponent bias is -4 and the maximum exponent is 4,
giving 9 different values. The final representation is s4E4.

11–25 Our additions to the floating point adder presented are shown below.
The initial design included a FF1 shift unit capable of shifting the LSB
to MSB, so we did not modify it. We selectively invert the number with
the smaller magnitude based on the XOR of both input signs and the
subtract signal. The output sign, cs is that of the input with the highest
magnitude (bs is inverted in subtraction). Note that the highest magni-
tude comparison must include the mantissa bits of the input to handle
exponent ties.

1 www.wolframalpha.com

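[Figure: floating-point adder/subtractor datapath. Inputs ae, be (3 bits), am, bm (5 bits), as, bs, sub; blocks Exp Logic (with a>b comparison producing aeqb and agtb), Shift, Add, FF1/Shift, round, and Inc; internal signals ge, de, gm, lm, alm, isub, sm, sc, nm; outputs ovf, ce, cm, cs.]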

11–29 Shown below, we handle gradual underflow by right shifting the mantissa
if the newly computed exponent is less than 0. We also include a MUX to
clamp the output exponent to 0.

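[Figure: floating-point adder with gradual underflow. Inputs ae, be (3 bits), am, bm (5 bits); blocks Exp Logic, Shift, Add, FF1/Shift, round, and Inc, followed by a negate (*-1) of the 4-bit exponent, a right shifter (>>) on the mantissa controlled by e3, and a mux that clamps the exponent e2:0 to 0; outputs ovf, ce, cm.]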

11–32 Our adder first must include the implicit-1 if present. We must also
modify the shift logic to account for the fact that exponents of 0 and 1
weight the mantissa equally. We then use the same logic as in the previous
problem (omitting the mantissa MSB) for final exponent and mantissa
calculation.

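[Figure: floating-point adder with implicit 1. The implicit mantissa bits iam5 and ibm5 are set when the corresponding exponent is != 0, giving 6-bit mantissas; blocks Exp Logic*, Shift, Add, FF1/Shift, round, and Inc, followed by the underflow logic of the previous exercise; the output mantissa cm is bits 4:0. Exp Logic* notes: de = a − b − 1 if a > 0, b = 0; de = b − a − 1 if b > 0, a = 0. Outputs ovf, ce, cm.]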

Chapter 12

Solutions: Fast Arithmetic


Circuits

12–1 We use the propagate and generate equations shown in Equations 12.10-
12.12 to formulate our comparator, setting the output to the generate
signal of the final comparator. (The propagate signal of the final com-
parator signals equality.)

module PG5(pi, gi, po, go) ;


input [4:0] pi, gi;
output po, go;

assign po = &pi;
assign go = gi[4] | (gi[3] & pi[4]) | (gi[2] & (&pi[4:3])) |
(gi[1] & (&pi[4:2])) | (gi[0] & (&pi[4:1]));
endmodule // PG5

module PG6(pi, gi, po, go) ;


input [5:0] pi, gi;
output po, go;

assign po = &pi;
assign go = gi[5] | (gi[4] & pi[5]) | (gi[3] & (&pi[5:4])) |
(gi[2] & (&pi[5:3])) | (gi[1] & (&pi[5:2])) | (gi[0] & (&pi[5:1]));
endmodule // PG6

module comp32(a, b, agtb) ;


input [31:0] a, b;
output agtb;

wire [31:0] g, p;
assign g = a & ~(b);

assign p = ~(a ^ b);

wire [5:0] p6, g6;


PG6 pg10(p[5:0], g[5:0], p6[0], g6[0]);
PG6 pg11(p[11:6], g[11:6], p6[1], g6[1]);
PG5 pg12(p[16:12], g[16:12], p6[2], g6[2]);
PG5 pg13(p[21:17], g[21:17], p6[3], g6[3]);
PG5 pg14(p[26:22], g[26:22], p6[4], g6[4]);
PG5 pg15(p[31:27], g[31:27], p6[5], g6[5]);

wire p32;
PG6 pg2(p6[5:0], g6[5:0], p32, agtb);
endmodule // comp32
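
A behavioral model helps confirm that the two-level propagate/generate tree computes a > b. The Python sketch below mirrors the group boundaries of `comp32`; the function `pg_group` is an illustrative stand-in for the PG5/PG6 modules, not a translation checked against the book's testbench.

```python
import random

def pg_group(p_bits, g_bits):
    """Combine per-bit (p, g) lists, index 0 = least significant bit."""
    p_out = all(p_bits)
    g_out = False
    higher_equal = True
    # Scan from the MSB: a generate counts only if all higher bits propagate (are equal).
    for p, g in zip(reversed(p_bits), reversed(g_bits)):
        g_out = g_out or (g and higher_equal)
        higher_equal = higher_equal and p
    return p_out, g_out

def comp32(a, b):
    """Return True when the 32-bit unsigned value a is greater than b."""
    g = [bool(((a >> i) & 1) and not ((b >> i) & 1)) for i in range(32)]
    p = [((a >> i) & 1) == ((b >> i) & 1) for i in range(32)]
    bounds = [(0, 6), (6, 12), (12, 17), (17, 22), (22, 27), (27, 32)]
    groups = [pg_group(p[lo:hi], g[lo:hi]) for lo, hi in bounds]
    _, agtb = pg_group([gp for gp, _ in groups], [gg for _, gg in groups])
    return agtb
```

Comparing the model with Python's own `>` on random operands is a quick way to catch a mis-sliced group boundary.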

12–3 The code is shown below and does not use PG modules. Instead, we use a
look ahead tree to detect the presence of a 1 in any lower (higher priority)
bits.

module arb4(bi, g, bo) ;


input bi;
input [2:0] g;
output [2:0] bo;
assign bo[0] = bi | g[0];
assign bo[1] = bi | g[0] | g[1];
assign bo[2] = bi | g[0] | g[1] | g[2];
endmodule // arb4

module claArb(req, gnt) ;


input [31:0] req;
output [31:0] gnt;

wire [7:0] g4;


assign g4[0] = |(req[3:0]);
assign g4[1] = |(req[7:4]);
assign g4[2] = |(req[11:8]);
assign g4[3] = |(req[15:12]);
assign g4[4] = |(req[19:16]);
assign g4[5] = |(req[23:20]);
assign g4[6] = |(req[27:24]);
assign g4[7] = |(req[31:28]);

wire [1:0] g8;


assign g8[0] = |(g4[3:0]); //bits 15:0
assign g8[1] = |(g4[7:4]); //bits 31:16

wire [31:0] b; //1 below this number?

assign b[0] = 0;
assign b[16] = g8[0];
arb4 a20(b[0], g4[2:0], {b[12], b[8], b[4]});
arb4 a21(b[16], g4[6:4], {b[28], b[24], b[20]});

arb4 a10(b[0], req[2:0], b[3:1]);


arb4 a11(b[4], req[6:4], b[7:5]);
arb4 a12(b[8], req[10:8], b[11:9]);
arb4 a13(b[12], req[14:12], b[15:13]);
arb4 a14(b[16], req[18:16], b[19:17]);
arb4 a15(b[20], req[22:20], b[23:21]);
arb4 a16(b[24], req[26:24], b[27:25]);
arb4 a17(b[28], req[30:28], b[31:29]);

assign gnt = ~b & req;


endmodule // claArb

12–5 One (possibly the best) option for implementing this solution is to
precompute the value 3a and modify the Booth recoders to look at two bits
at a time and select between {0, a, 2a, 3a}. We also need to remove all
sign extensions internal to the multiplier. The other option is to append
0 to the MSB of both a and b, converting the n × m multiplication to
(n + 1) × (m + 1).

12–6 The two tables are below. We can quickly check that the first table is cor-
rect because the sum of each column gives the correct weight from each bit
position. The Verilog would closely follow, using a selection multiplexer,
inverter, and precomputed 3× sum.

bit      b8    b7   b6  b5   b4  b3  b2  b1  b0  b-1
weight   -256  128  64  32   16  8   4   2   1   N/A
d2       -256  128  64  64
d1                      -32  16  8   8
d0                                   -4  2   1   1

b3i+2  b3i+1  b3i  b3i-1  di


0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 2
0 1 0 0 2
0 1 0 1 3
0 1 1 0 3
0 1 1 1 4
1 0 0 0 -4
1 0 0 1 -3
1 0 1 0 -3
1 0 1 1 -2
1 1 0 0 -2
1 1 0 1 -1
1 1 1 0 -1
1 1 1 1 0
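
The recoding table can be exercised in Python to confirm that the digits reconstruct the original operand. This is a sketch assuming a 9-bit two's complement input with an implicit $b_{-1} = 0$; `booth8_digits` is a hypothetical helper name.

```python
def booth8_digits(value, bits=9):
    """Recode a two's complement value into radix-8 Booth digits d0, d1, d2, ..."""
    b = [(value >> i) & 1 for i in range(bits)]    # b[0] is the LSB
    b_ext = [0] + b                                # b_ext[j] holds b_{j-1}; b_{-1} = 0
    digits = []
    for i in range(bits // 3):
        b_m1, b0, b1, b2 = b_ext[3 * i: 3 * i + 4]
        digits.append(-4 * b2 + 2 * b1 + b0 + b_m1)   # one row of the table above
    return digits

def reconstruct(digits):
    """Sum the digits with weights 8**i."""
    return sum(d * 8 ** i for i, d in enumerate(digits))
```

Each digit lies in the range -4 to 4, which is why the multiplier needs the precomputed 3× multiple in addition to shifts and negation.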

12–11 The tables are shown below:

lg weight 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
pps 8 9 7 8 6 7 5 6 4 5 3 4 2 3 1 2
stage 1 6 5 6 5 4 5 4 3 4 3 2 3 2 1
stage 2 4 4 4 3 4 3 3 2 3 2 2 1
stage 3 3 3 3 2 3 2 2 2 1
stage 4 2 2 2 2 1

lg weight 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
pps 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
stage 1 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
stage 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
stage 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
stage 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

12–12 The first table below shows an updated version of Table 12.3, accounting
for the new 7-input counter. The next two tables show the number of
remaining terms to add at each bit-position. We were unable to save a
stage of logic using this scheme, though this is not always the case.

in out
i 2i 3i
1 1 0 0
2 1 1 0
3 1 1 0
4 2 1 0
5 2 2 0
6 2 2 0
7 1 1 1
8 2 1 1
9 2 2 1

lg weight 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
pps 8 9 7 8 6 7 5 6 4 5 3 4 2 3 1 2
stage 1 5 4 2 5 3 3 4 3 4 3 2 3 2 1
stage 2 3 3 3 3 2 2 3 2 3 2 2 1
stage 3 2 2 2 2 2 2 2 2 1
stage 4

lg weight 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
pps 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
stage 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
stage 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4
stage 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3
stage 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

12–15 The tables are shown below. We must sign-extend the partial products.

in out
i 2i
1 1 0
2 1 1
3 1 1
4 2 1
5 2 2
6 2 2
7 1 1
8 2 1
9 2 2
10 2 2
11 2 2
12 3 3

lg weight 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
pps 10 10 10 10 11 8 9 6 6 7 4 4 5 2 2 3
stage 1 7 7 8 7 6 6 5 4 4 4 3 4 3 2 2 1
stage 2 5 6 5 5 4 4 3 3 3 3 2 3 1
stage 3 4 4 4 4 3 3 3 2 2 2 2 1
stage 4 3 3 3 3 2 2 1
stage 5 2 2 2 2 1
lg weight 29 28 27 26 25 24 23 22 21 20 19 18 17 16
pps 10 10 10 10 10 10 10 10 10 10 10 10 10 10
stage 1 7 7 7 7 7 7 7 7 7 7 7 7 7 7
stage 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5
stage 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4
stage 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3
stage 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Chapter 13

Arithmetic Examples

13–1 We can Booth-encode (Radix-4) and use the Wallace tree shown in the
solution to Exercise 12–11. Slightly more interesting, however, is the nega-
tion on the input to the 2nd multiplier. To do this, we invert the input
that is not the input to the Booth-recoders and add an additional carry
in of 1 to the weight 0 term. This will not add any stages to the Wallace
tree.

13–4 One of the simplest ways of fixing this bug is to add the following code
to the Verilog:

wire bugFix = (fixed == 12'h800);

assign float = (new_exp[3] | bugFix) ? {sign, 3'h7, 4'hf} :
{sign, new_exp[2:0], mant[3:0]};

This will add minimal delay to the system, since the comparison is done
in parallel with the rest of the conversion process.

13–7 Shown below is our solution. We have increased the width of the
multiplier output to accommodate s2.5 weights. We did not widen the
adder output since the sum of the weights has a magnitude less than one.
This ensures that the adder output cannot have a magnitude greater than
that of the greatest input.

[Figure: four parallel paths, one per input x0 to x3. Each 8b float input passes through a Float To Fixed block (s11.0) and is multiplied by its s2.5 weight w0 to w3, giving an s13.5 product. The four products are summed (s11.5) and converted by a Fixed to Float block into the 8b float output y.]

13–8 See the image below, which has a larger output of the adder. To detect
overflow from the addition, we check that the upper 5 bits (sign, 4 MSB)
are equivalent. If they are not, an overflow (relative to a s11.0 number)
has occurred and the output is saturated.

[Figure: same structure as the previous exercise (8b float inputs x0 to x3, Float To Fixed to s11.0, s2.5 weights w0 to w3, s13.5 products), but with the adder output widened to s15.5 before the Fixed to Float block that produces the 8b float output y.]

13–11 Our cross product block diagram is shown below. We can factor out the
multiply-and-subtract module and instantiate 3 copies of it. For extra speed,
the full multipliers can be replaced with the ones used in Exercise 13–1.

[Figure: three copies of a multiply-and-subtract module. Inputs are s3.14, each product is s6.28, and each difference is s7.28: cx = ay × bz − az × by; cy = az × bx − ax × bz; cz = ax × by − ay × bx.]

13–12 The code is shown below. We find the values $x-1$, $(x-1)^2$, and $(x-1)^3$
using Verilog’s built-in multiplication. We then shift, sign extend, and
manually line up the binary points before the final add. We only include
20 bits in the final adder (instead of 27) because we discard the lower bits
of $(x-1)^3$, which cannot affect the round. The maximum error is 1.6% at
x = 2.

module sqrtApprox(x, y);

input [8:0] x; //1.8


output [8:0] y; //1.8
wire signed [8:0] x_m1 = {~x[8], x[7:0]}; //s.8 = x-1
wire signed [16:0] x_m1_2 = x_m1 * x_m1; //x.16 = (x-1)*(x-1)
wire signed [24:0] x_m1_3 = x_m1_2 * x_m1; //x.24 = (x-1)*(x-1)*(x-1)
wire [19:0] y_t ;
assign y = y_t[19:11] + y_t[10]; //round

assign y_t = {1'b1, 19'd0} +
{x_m1[8], x_m1, 10'd0} -
{ {3{x_m1_2[16]}}, x_m1_2} +
{ {4{x_m1_3[24]}}, x_m1_3[24:9]};
endmodule // sqrtApprox
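
The quoted error bound can be checked in floating point. The sketch below evaluates the cubic $1 + \frac{x-1}{2} - \frac{(x-1)^2}{8} + \frac{(x-1)^3}{16}$, with coefficients read off the shift amounts in the Verilog above; it ignores the fixed-point truncation and rounding of the hardware, so it is only an approximation of the module's behavior.

```python
import math

def sqrt_approx(x):
    """Cubic approximation of sqrt(x) about x = 1 (floating point, no truncation)."""
    d = x - 1.0
    return 1.0 + d / 2 - d * d / 8 + d ** 3 / 16

# Sample the interval [1, 2] at the 1.8 fixed-point resolution of 1/256.
samples = [1.0 + k / 256 for k in range(257)]
errors = [abs(sqrt_approx(x) - math.sqrt(x)) / math.sqrt(x) for x in samples]
worst_error = max(errors)
worst_x = samples[errors.index(worst_error)]
```

The relative error grows with distance from 1 and peaks at x = 2, where it is roughly 1.6%.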

13–14 The simple converter, using Verilog addition and multiplication, is shown
below.

module bcd2bin(d, b);


input [15:0] d;
output [13:0] b;

assign b = d[15:12]*14'd1000 +
d[11:8] *14'd100 +
d[7:4] * 14'd10 + d[3:0];

endmodule // bcd2bin

Chapter 14

Solutions: Sequential Logic

14–1 Leaving the input low for a minimum of 3 clock edges will put this FSM
into state 00.

14–3 The new state diagram is shown below:

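[Figure: state diagram. rst enters gns; gns loops on itself until carew is asserted, then the states advance gns → yns → rns → gew → yew → rew → gns. Outputs per state: gns 100 001, yns 010 001, rns 001 001, gew 001 100, yew 001 010, rew 001 001.]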

We have added new rns and rew states to set the lights to red in both
directions. The state table is shown below:
state   next state          out
        carew=0   carew=1
gns     gns       yns       100 001
yns     rns       rns       010 001
rns     gew       gew       001 001
gew     yew       yew       001 100
yew     rew       rew       001 010
rew     gns       gns       001 001

14–4 One possible binary state assignment is shown in the chart below.

state encoding
gns 000
yns 001
rns 010
gew 100
yew 101
rew 110

Next, we write the next state logic table:

state carew next state


000 0 000
000 1 001
001 0 010
001 1 010
010 0 100
010 1 100
100 0 101
100 1 101
101 0 110
101 1 110
110 0 000
110 1 000

And find the next state logic:

[Figure: Karnaugh maps for ns2, ns1, and ns0 over the inputs carew (c), s2, s1, and s0; the unused codes 011 and 111 are don't-cares.]

$ns_2 = (\overline{s_2} \wedge s_1) \vee (s_2 \wedge \overline{s_1})$
$ns_1 = s_0$
$ns_0 = (\overline{s_1} \wedge \overline{s_0} \wedge car) \vee (s_2 \wedge \overline{s_1} \wedge \overline{s_0})$

Finally, we compute the outputs:

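$gns = \overline{s_2} \wedge \overline{s_1} \wedge \overline{s_0}$
$yns = \overline{s_2} \wedge s_0$
$rns = s_2 \vee s_1$
$gew = s_2 \wedge \overline{s_1} \wedge \overline{s_0}$
$yew = s_2 \wedge s_0$
$rew = \overline{s_2} \vee s_1$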
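
The next-state and output equations of Exercise 14–4 can be checked against the state table with a few lines of Python. In this sketch the complement bars are placed so that the equations reproduce the table (the bars are an assumption wherever they are not legible in the equations as printed); the dictionary and function names are illustrative.

```python
ENCODING = {"gns": 0b000, "yns": 0b001, "rns": 0b010,
            "gew": 0b100, "yew": 0b101, "rew": 0b110}
# state -> (next state when carew = 0, next state when carew = 1)
NEXT = {"gns": ("gns", "yns"), "yns": ("rns", "rns"), "rns": ("gew", "gew"),
        "gew": ("yew", "yew"), "yew": ("rew", "rew"), "rew": ("gns", "gns")}
# lights = {gns, yns, rns, gew, yew, rew}
LIGHTS = {"gns": 0b100001, "yns": 0b010001, "rns": 0b001001,
          "gew": 0b001100, "yew": 0b001010, "rew": 0b001001}

def split(state):
    return (state >> 2) & 1, (state >> 1) & 1, state & 1

def next_state_logic(state, carew):
    s2, s1, s0 = split(state)
    ns2 = ((1 - s2) & s1) | (s2 & (1 - s1))
    ns1 = s0
    ns0 = ((1 - s1) & (1 - s0) & carew) | (s2 & (1 - s1) & (1 - s0))
    return (ns2 << 2) | (ns1 << 1) | ns0

def output_logic(state):
    s2, s1, s0 = split(state)
    gns = (1 - s2) & (1 - s1) & (1 - s0)
    yns = (1 - s2) & s0
    rns = s2 | s1
    gew = s2 & (1 - s1) & (1 - s0)
    yew = s2 & s0
    rew = (1 - s2) | s1
    return (gns << 5) | (yns << 4) | (rns << 3) | (gew << 2) | (yew << 1) | rew

logic_matches_table = all(
    next_state_logic(ENCODING[s], c) == ENCODING[NEXT[s][c]]
    and output_logic(ENCODING[s]) == LIGHTS[s]
    for s in ENCODING for c in (0, 1)
)
```

Only the six assigned codes are checked; the two unused codes are don't-cares and may produce any value.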

14–5 The Verilog has very few changes compared to that supplied in the text:

//State assignment
`define SWIDTH 3
`define GNS 3'b000
`define YNS 3'b001
`define RNS 3'b010
`define GEW 3'b100
`define YEW 3'b101
`define REW 3'b110
//---------------------------------------------
// define output codes
//---------------------------------------------
`define GNSL 6'b100001
`define YNSL 6'b010001
`define GEWL 6'b001100
`define YEWL 6'b001010
`define REDL 6'b001001

module Traffic_Light(clk, rst, carew, lights) ;

input clk ;
input rst ; // reset
input carew ; // car present on east-west road
output [5:0] lights ; // {gns, yns, rns, gew, yew, rew}
wire [`SWIDTH-1:0] state, next ; // current and next state
reg [`SWIDTH-1:0] next1 ; // next state w/o reset
reg [5:0] lights ; // output - six lights 1=on

// instantiate state register
DFF #(`SWIDTH) state_reg(clk, next, state) ;

// next state and output equations - this is combinational logic
always @(*) begin
case(state)
`GNS: {next1, lights} = {(carew ? `YNS : `GNS), `GNSL} ;
`YNS: {next1, lights} = {`RNS, `YNSL} ;
`RNS: {next1, lights} = {`GEW, `REDL} ;
`GEW: {next1, lights} = {`YEW, `GEWL} ;
`YEW: {next1, lights} = {`REW, `YEWL} ;
`REW: {next1, lights} = {`GNS, `REDL} ;
default: {next1, lights} = {(`SWIDTH+6){1'bx}} ; // 3 state bits + 6 light bits
endcase
end
// add reset
assign next = rst ? `GNS : next1 ;
endmodule

14–12 The new state diagram can be constructed by inserting a state between
3 and 4. This new state will be renamed 4, while 4 becomes 5, and 5
becomes 6. The new state transitions every cycle.

14–13 The state table is shown below. We omit a column indicating that the
rst input causes a transition to state R.
state next state out
a=0 a=1
R 0 1 0
1 2 2 1
2 3 3 0
3 4 4 0
4 5 1 0
5 M 1 0
M 2 L 1
L 2 2 0
We also include the state table from the modified pulse filler of the previous
exercise:
state next state out
a=0 a=1
R 0 1 0
1 2 2 1
2 3 3 0
3 4 4 0
4 5 5 0
5 6 1 0
6 M 1 0
M 2 L 1
L 2 2 0

14–14 We wrote a program to permute all possible state assignments and got
the following:

state encoding
R 000
1 001
2 011
3 111
4 101
5 100
M 110
L 010

There are only two non-single bit transitions.
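
The count of multi-bit transitions can be reproduced by comparing the encodings at both ends of every edge in the first state table of Exercise 14–13. This is a sketch, not the program the authors used; it assumes that the a=0 entry for state R (printed as 0) means the machine stays in R.

```python
ENCODING = {"R": 0b000, "1": 0b001, "2": 0b011, "3": 0b111,
            "4": 0b101, "5": 0b100, "M": 0b110, "L": 0b010}
# state -> (next state when a = 0, next state when a = 1)
TRANSITIONS = {"R": ("R", "1"), "1": ("2", "2"), "2": ("3", "3"), "3": ("4", "4"),
               "4": ("5", "1"), "5": ("M", "1"), "M": ("2", "L"), "L": ("2", "2")}

def bits_changed(src, dst):
    """Hamming distance between the encodings of two states."""
    return bin(ENCODING[src] ^ ENCODING[dst]).count("1")

multi_bit = sorted({(s, n) for s, nexts in TRANSITIONS.items() for n in nexts
                    if bits_changed(s, n) > 1})
```

With this assignment the only edges that flip more than one bit are 5 → 1 and M → 2.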

14–15 The K-maps are shown below:

[Figure: Karnaugh maps for the next-state bits s2, s1, and s0 over the inputs a, s2, s1, and s0.]

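The next-state logic is given below. The maps use the plain binary encoding
(R = 000, 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101, M = 110, L = 111):

$ns_2 = (\overline{a} \wedge s_2 \wedge \overline{s_1}) \vee (\overline{s_2} \wedge s_1 \wedge s_0) \vee (a \wedge s_2 \wedge s_1 \wedge \overline{s_0})$
$ns_1 = (s_2 \wedge s_1) \vee (s_1 \wedge \overline{s_0}) \vee (\overline{a} \wedge \overline{s_1} \wedge s_0) \vee (\overline{s_2} \wedge \overline{s_1} \wedge s_0)$
$ns_0 = (a \wedge \overline{s_0}) \vee (\overline{s_2} \wedge s_1 \wedge \overline{s_0}) \vee (a \wedge s_2 \wedge \overline{s_1}) \vee (s_2 \wedge \overline{s_1} \wedge \overline{s_0})$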
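
The next-state equations can be verified against the state table of Exercise 14–13. The sketch below assumes the plain binary encoding R = 0 through L = 7, which is what the Karnaugh map entries correspond to, and places the complement bars so that each product term covers only 1-cells of the maps; the names are illustrative.

```python
STATES = ["R", "1", "2", "3", "4", "5", "M", "L"]   # binary codes 0..7 (assumed)
# state -> (next state when a = 0, next state when a = 1)
TRANSITIONS = {"R": ("R", "1"), "1": ("2", "2"), "2": ("3", "3"), "3": ("4", "4"),
               "4": ("5", "1"), "5": ("M", "1"), "M": ("2", "L"), "L": ("2", "2")}

def next_code(a, code):
    s2, s1, s0 = (code >> 2) & 1, (code >> 1) & 1, code & 1
    na, n2, n1, n0 = 1 - a, 1 - s2, 1 - s1, 1 - s0
    ns2 = (na & s2 & n1) | (n2 & s1 & s0) | (a & s2 & s1 & n0)
    ns1 = (s2 & s1) | (s1 & n0) | (na & n1 & s0) | (n2 & n1 & s0)
    ns0 = (a & n0) | (n2 & s1 & n0) | (a & s2 & n1) | (s2 & n1 & n0)
    return (ns2 << 2) | (ns1 << 1) | ns0

equations_match = all(
    STATES[next_code(a, STATES.index(s))] == TRANSITIONS[s][a]
    for s in STATES for a in (0, 1)
)
```

A mismatch here would point to a misplaced complement bar in one of the product terms.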

14–20 In our state table, shown below, we assume that only one (or zero)
input goes high each cycle. If money is inserted while vending, we go to
the appropriate state.

state next state {vend,change}


n=0,d=0 n=1,d=0 n=0,d=1
in00 in00 in05 in10 00
in05 in05 in10 in15 00
in10 in10 in15 in20 00
in15 in15 in20 in25 00
in20 in20 in25 in30 00
in25 in25 in30 in35 00
in30 in30 in35 in40 00
in35 in35 in40 in45 00
in40 in00 in05 in10 10
in45 in00 in05 in10 11
14–25 See the state table below. We use the state names up25k0 to up25k4
when counting the smooth signal. When we list up25kX, we are indicating
any of the 5 states prefixed by “up25k.”
state alt10k alt25k smooth next state {noelect, belt}
gnd 0 x x gnd 11
gnd 1 x x up10k 11
up10k 0 0 x up10k 01
up10k 1 0 x gnd 01
up10k 0 1 x up25k 01
up25k0 0 0 1 up25k1 01
up25k1 0 0 1 up25k2 01
up25k2 0 0 1 up25k3 01
up25k3 0 0 1 up25k4 01
up25k4 0 0 1 up25kS 01
up25kS 0 0 1 up25kS 00
up25kX 0 1 x up10k
up25kX 0 0 0 up25k0

Chapter 15

Solutions: Timing
Constraints

15–1 - tcax = tdax = 20ps

- tcbx = 3030, tdbx = 40ps

- tccx = 3030, tdcx = 40ps

- tcdx = 3030, tddx = 50ps

15–2 tcax = 35ps, tdax = 60ps

15–3 tcbx = 35ps, tdbx = 40ps

15–7 The solution for the next 3 problems is shown below:

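[Figure: timing diagram over one clock period tcy, showing clk and, for each of the flip-flops A, B, and C, the setup and hold window (ts, th) on the input d and the contamination and propagation delays (tcCQ, tdCQ) on the output q.]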

15–8 See above.

15–9 See above.

15–10 A 2GHz clock gives a cycle time of tcy = 500ps. Using Equations 15.3 and 15.4
gives:

$t_{cy} \ge t_{dCQ} + t_{dMax} + t_s$
$500 \ge 20 + 400 + 20$
$500 \ge 440$
$t_h \le t_{cCQ} + t_{cMin}$
$10 \le 10 + 10$
$10 \le 20$

There are no errors.

15–11 Again using Equations 15.3 and 15.4:

$t_{cy} \ge t_{dCQ} + t_{dMax} + t_s$
$500 \ge 30 + 400 + 100$
$500 \ge 530$ (not satisfied)
$t_h \le t_{cCQ} + t_{cMin}$
$-20 \le 2 + 10$
$-20 \le 12$

There is a setup violation. We must increase the cycle time to 530ps for
correct operation.
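
The two checks in Exercises 15–10 and 15–11 follow the same pattern, which can be captured in a small helper. This is a sketch with a hypothetical function name; it reads the hold time in Exercise 15–11 as −20 ps, which is consistent with the solution reporting only a setup violation. All times are in picoseconds.

```python
def timing_slack(t_cy, t_dcq, t_ccq, t_s, t_h, t_dmax, t_cmin):
    """Return (setup slack, hold slack) in ps; a negative slack is a violation."""
    setup_slack = t_cy - (t_dcq + t_dmax + t_s)    # t_cy >= t_dCQ + t_dMax + t_s
    hold_slack = (t_ccq + t_cmin) - t_h            # t_h <= t_cCQ + t_cMin
    return setup_slack, hold_slack

slack_15_10 = timing_slack(t_cy=500, t_dcq=20, t_ccq=10, t_s=20, t_h=10,
                           t_dmax=400, t_cmin=10)
slack_15_11 = timing_slack(t_cy=500, t_dcq=30, t_ccq=2, t_s=100, t_h=-20,
                           t_dmax=400, t_cmin=10)
```

The setup slack of −30 ps in the second case is the amount by which the cycle time must grow, giving the 530 ps quoted above.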
15–13 Using the setup and hold equations:

$t_{cy} \ge t_{dCQ} + t_{dMax} + t_s$
$1000 \ge 20 + t_{dMax} + 20$
$t_{dMax} \le 960\,\mathrm{ps}$
$t_h \le t_{cCQ} + t_{cMin}$
$10 \le 10 + t_{cMin}$
$t_{cMin} \ge 0\,\mathrm{ps}$

15–14 Using the setup and hold equations:

$t_{cy} \ge t_{dCQ} + t_{dMax} + t_s$
$1000 \ge 30 + t_{dMax} + 100$
$t_{dMax} \le 870\,\mathrm{ps}$
$t_h \le t_{cCQ} + t_{cMin}$
$-20 \le 2 + t_{cMin}$
$t_{cMin} \ge -22\,\mathrm{ps}$

15–16 We want the flip-flop to work even when there is no combinational logic,
thus:

$t_h \le t_{cCQ} + 0$
$t_h - t_{cCQ} \le 0$

15–18 To test if the violation is a setup-time problem, simply increase the cycle
time. If running at a slower frequency fixes the problem (to a first order)
it is a setup violation. Hold time violations cannot be detected by varying
the clock. (Saying “if it is not setup then it is hold” is not a valid test
strategy.) Because hold violations occur when the contamination delay
of a circuit is too fast, any method of slowing logic such as increased
temperature or lower voltage would make the circuit work. See Section
20.2.6 for more information on this type of characterization.
15–19 The new ts = th = 10ps since the outer clock effectively arrives 40ps
earlier than the inner clock. There is a 40ps delay on the output, which
gives tdcq = tccq = 120ps.
15–22 We make the table below for all logic paths. In it, we list how early
a clock can arrive at Y so that a signal from X does not cause a setup
violation. We also list how late a clock can arrive at Y so that a signal
from X does not cause a hold violation.
From To Early(ps) Late(ps)
X Y 1860 100
X Z 1910 30
Y X - -
Y Z 1910 30
Z X 1560 10
Z Y - -
Note that in the table the clock to X from Z cannot actually be 1910ps
early since that is equivalent to a 1860ps early clock to Z from X (a
violation). We’ve updated the table below:
From To Early(ps) Late(ps)
X Y 1860 100
X Z 10 30
Y X 100 1860
Y Z 1910 30
Z X 30 10
Z Y 30 1910

Chapter 16

Solutions: Data Path


Sequential Logic

16–1 The sequence of states is shown in the table below. It counts through 8
different states only changing a single bit at a time.

State Next
0000 0001
0001 0011
0011 0111
0111 1111
1111 1110
1110 1100
1100 1000
1000 0000

16–2 The sequence of 15 states (all except 0000) is shown in the table below.
This counts through the 15 numbers in a pseudo-random order.

1111 1110
1110 1100
1100 1000
1000 0001
0001 0010
0010 0100
0100 1001
1001 0011
0011 0110
0110 1101
1101 1010
1010 0101
0101 1011
1011 0111
0111 1111
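
Both sequences can be regenerated from simple shift-register rules. In the sketch below the feedback rules are inferred from the listed states rather than taken from the exercise statements: the 16–1 sequence is produced by shifting left and inserting the inverted MSB, and the 16–2 sequence by shifting left and inserting the XOR of the two most significant bits. The function names are illustrative.

```python
def johnson_next(state, n=4):
    """Shift left and insert the complement of the old MSB (sequence of 16-1)."""
    msb = (state >> (n - 1)) & 1
    return ((state << 1) & ((1 << n) - 1)) | (1 - msb)

def lfsr_next(state):
    """Shift left and insert bit3 XOR bit2 (sequence of 16-2)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) & 0xF) | feedback

def cycle(step, start):
    """List the states visited from start until the sequence returns to start."""
    seq = [start]
    while True:
        nxt = step(seq[-1])
        if nxt == start:
            return seq
        seq.append(nxt)

johnson_states = cycle(johnson_next, 0b0000)
lfsr_states = cycle(lfsr_next, 0b1111)
```

The first cycle has 8 states that differ by one bit per step; the second visits all 15 non-zero states before repeating.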

16–4 The solution is shown below. We have added a register for storing the
maximum value, the logic to load it, and modified the counter. The new
counter (not shown) will not increment when the count equals max and
will not decrement when the count equals 0.

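[Figure: up/down/load counter with a maximum register. A 2-input mux controlled by loadMax loads in into the n-bit max register; a Mux4 selects the next count from hold, in, the saturating +/-1 result (Sat), or 0, under control logic (CL) driven by rst, up, down, and load; the count register is clocked by clk.]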

16–6 The Verilog is shown below:

module UDL_Count3Rd(clk, rst, up, down, load, in,


rd, r0, r1, r2, r3) ;

parameter n = 4 ;
input clk, rst, up, down, load ;
input [n-1:0] in ;
input [1:0] rd;

output [n-1:0] r0, r1, r2, r3;


wire [n-1:0] outpm1, next;
reg [n-1:0] n0, n1, n2, n3, src;

DFF #(n) count0(clk, n0, r0) ;


DFF #(n) count1(clk, n1, r1) ;
DFF #(n) count2(clk, n2, r2) ;
DFF #(n) count3(clk, n3, r3) ;

always@(*) begin
case(rd)
2'b00: {src, n3, n2, n1, n0} = {r0, r3, r2, r1, next};
2'b01: {src, n3, n2, n1, n0} = {r1, r3, r2, next, r0};
2'b10: {src, n3, n2, n1, n0} = {r2, r3, next, r1, r0};
2'b11: {src, n3, n2, n1, n0} = {r3, next, r2, r1, r0};
default: {src, n3, n2, n1, n0} = {5*n{1'bx}};
endcase // case (rd)
end
assign outpm1 = src + {{n-1{down}},1'b1} ;

Mux4 #(n) mux(src, in, outpm1, {n{1'b0}},
{(~rst & ~up & ~down & ~load),
(~rst & load),
(~rst & (up | down)),
rst},
next) ;
endmodule // UDL_Count3Rd

16–7 The Verilog is shown below:

module UDL_Count3Rs(clk, rst, up, down, load, in, rd, rs, r0, r1, r2, r3) ;
parameter n = 4 ;
input clk, rst, up, down, load ;
input [n-1:0] in ;
input [1:0] rd, rs;

output [n-1:0] r0, r1, r2, r3;


wire [n-1:0] outpm1, next;
reg [n-1:0] n0, n1, n2, n3, src;

DFF #(n) count0(clk, n0, r0) ;


DFF #(n) count1(clk, n1, r1) ;
DFF #(n) count2(clk, n2, r2) ;
DFF #(n) count3(clk, n3, r3) ;

always@(*) begin
case(rs)
2'b00: src = r0;
2'b01: src = r1;
2'b10: src = r2;
2'b11: src = r3;
default: src = {n{1'bx}};
endcase // case (rs)
end

always@(*) begin
case(rd)
2'b00: {n3, n2, n1, n0} = {r3, r2, r1, next};
2'b01: {n3, n2, n1, n0} = {r3, r2, next, r0};
2'b10: {n3, n2, n1, n0} = {r3, next, r1, r0};
2'b11: {n3, n2, n1, n0} = {next, r2, r1, r0};
default: {n3, n2, n1, n0} = {4*n{1'bx}};
endcase // case (rd)
end

assign outpm1 = src + {{n-1{down}},1'b1} ;

Mux4 #(n) mux(src, in, outpm1, {n{1'b0}},
{(~rst & ~up & ~down & ~load),
(~rst & load),
(~rst & (up | down)),
rst},
next) ;
endmodule // UDL_Count3Rs

16–8 See below for a possible solution. We use two registers: one for the current
number and one for the previous number. When we reset (not shown), we
set the out register to 0 and the last register to 1. That gives the correct
sequence of 0, 1, 1, 2, 3, ... on the output.

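[Figure: two registers, out and last. An adder sums out and last to form the next value of out and produces overflow; last is loaded with the previous value of out.]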
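
The two-register scheme can be simulated directly. The sketch below is a behavioral model only; the register width is left as a parameter because the bus width is not legible in this excerpt, and the function name is hypothetical.

```python
def fibonacci_registers(steps, width=16):
    """Simulate the out/last register pair for a number of clock cycles.

    Returns the values seen on `out`, stopping early if the adder overflows.
    """
    mask = (1 << width) - 1
    out, last = 0, 1                  # reset values described in the solution
    seen = []
    for _ in range(steps):
        seen.append(out)
        total = out + last
        if total > mask:              # overflow output of the adder
            break
        out, last = total, out        # both registers update on the clock edge
    return seen
```

Starting from out = 0 and last = 1 yields 0, 1, 1, 2, 3, and so on, because the first addition simply copies the 1 from last into out.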

16–14 The datapath is shown below:

16–15 The two control signals are:

Sel A = rst ( valid out


Sel B = rst ( valid out

Chapter 17

Solutions: Factoring Finite


State Machines

17–1 The diagram below shows our solution. We have factored the states B,
C, D, F, G, H, and I into a separate counter state machine (bottom of the
image). When the go signal is asserted, it walks through the Gray-code
count (possibly through state F01). The controller toggles between states
A and B and goes to the next state when the counter signals ready and
the input is the correct value.

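[Figure: factored FSM. Controller states A (sel=0, go = (m==1) & rdy) and B (sel=1, go = (m==0) & rdy) exchange the signals rdy, sel, and go with the Counter. Counter states: F00 (x=00, rdy=1, entered on rst), F01 (x=00, rdy=0), F1 (x=01, rdy=0), F2 (x=11, rdy=0), F3 (x=10, rdy=0); transitions out of F00 depend on go and sel, and F3 returns to F00.]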

17–5 We have factored the FSM (see below) twice. First, we add a timer (either
2 or 4 cycles) that counts down the time spent at any one output state.
Next, we factor the sequence of output states into two distinct patterns:
2-1-0, labeled A1-A3, and 3-1, labeled B1-B2. The master controller (top-right)
selects the pattern and, when done, moves to the next state.


[Figure: three factored FSMs. Control (input in, output psel and go, input done): states R, M1, M2, M3, M4 with psel = A, A, B, A, R and go = m, 1, 1, 1, 1; R advances on m=1, M1 through M4 advance on done=1, and M4 returns to R. Pattern (inputs go, psel, tdone; outputs out, tsel, tload): states R, A1 (out=2, tsel=2), A2 (out=1, tsel=4), A3 (out=0, tsel=2/4), B1 (out=3, tsel=4), B2 (out=1, tsel=2); states advance on tdone and the exits branch on psel to A1, B1, or R. Timer: loaded from tsel on tload, asserts tdone.]

17–9 The inverted controller is shown below. We have an explicit timer at
the top level that communicates with the counter using next and psel
signals. When the timer reaches 0, it asserts next and goes to the state
indicated by psel. The output is driven directly by the counter.

[Figure: inverted controller. Master (input in, output next, input psel): states OFF, T5, T4, T3, T2, T1 with next = in, 0, 0, 0, 0, 1; OFF leaves on in, and T1 exits to T5 (psel == T5), to T3 (psel == T3), or to OFF (psel == R). Counter (input next, outputs psel and out): states Done, A, B, C, D, E, each advancing on next, with psel = T5, T3, T5, T3, T5, R and out = 0, 1, 0, 1, 0, 1.]

17–14 To implement this functionality, we must modify the master, the timer,
and the counter. In the timer, the tsel signal must now be widened to 2 bits
to select between a count of 5, 15, or 4 cycles. The counter must output
the LSB of the count itself (odd or even). Finally, the Master FSM must
set tsel correctly to take into account the count.

17–17 The benefits of factoring are seen in this problem, as we only need to
change 1 of the 5 state machines (shown below). We add the car_ns signal
to the system and modify the controller to be fair (while still giving car_lt
priority).


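[Figure: top, block diagram of the factored traffic-light controller with blocks Master FSM, Combiner, Light FSM, Timer1, and Timer2 and signals car_ew, car_lt, car_ns, dir (2 bits), ok, lights (9 bits), light, on, done, time, tdone, ltdone, load, and tload. Bottom, Master FSM state diagram with states NS (dir = ns, load = 1), LT (dir = lt, load = 0), and EW (dir = ew, load = 0). Legible edge labels: ok & car_lt; ok & (!car_lt | tdone) & car_ns; ok & (!car_lt | tdone) & ~car_ns; ok & (car_ew & tdone) & ~car_lt; ok & (car_ns & tdone) & ~car_lt.]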


Chapter 18

Solutions: Microcode

18–1 The new microcode table is shown below. We have added the additional
input signal, car_ns, and another address bit. We also modified the state
transitions from GEW to go to YEW only when car_ns is asserted.

Address State car_ew, car_ns Next State Output Data
00000 GNS (000) 00 GNS (000) 100001 000100001
00001 GNS (000) 01 GNS (000) 100001 000100001
00010 GNS (000) 10 YNS (001) 100001 001100001
00011 GNS (000) 11 YNS (001) 100001 001100001
00100 YNS (001) 00 GEW (010) 010001 010010001
00101 YNS (001) 01 GEW (010) 010001 010010001
00110 YNS (001) 10 GEW (010) 010001 010010001
00111 YNS (001) 11 GEW (010) 010001 010010001
01000 GEW (010) 00 GEW (010) 001100 010001100
01001 GEW (010) 01 YEW (011) 001100 011001100
01010 GEW (010) 10 GEW (010) 001100 010001100
01011 GEW (010) 11 YEW (011) 001100 011001100
01100 YEW (011) 00 GNS (000) 001010 000001010
01101 YEW (011) 01 GNS (000) 001010 000001010
01110 YEW (011) 10 GNS (000) 001010 000001010
01111 YEW (011) 11 GNS (000) 001010 000001010
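
The table for 18–1 can be regenerated mechanically from the next-state rules, which is a convenient way to check every row. The sketch below is an illustration in Python; the names `STATES`, `LIGHTS`, `next_state`, and `microcode_rom` are hypothetical, and the address and data layouts are read off the table above (address = 3 state bits, then car_ew, then car_ns; data = 3 next-state bits followed by the 6 output bits).

```python
STATES = {"GNS": 0b000, "YNS": 0b001, "GEW": 0b010, "YEW": 0b011}
LIGHTS = {"GNS": "100001", "YNS": "010001", "GEW": "001100", "YEW": "001010"}

def next_state(state, car_ew, car_ns):
    """Next-state rules as described in the solution to 18-1."""
    if state == "GNS":
        return "YNS" if car_ew else "GNS"
    if state == "YNS":
        return "GEW"
    if state == "GEW":
        return "YEW" if car_ns else "GEW"   # the new car_ns condition
    return "GNS"                            # YEW always returns to GNS

def microcode_rom():
    """Map each 5-bit address string to its 9-bit data string."""
    rom = {}
    for state, code in STATES.items():
        for inputs in range(4):
            car_ew, car_ns = inputs >> 1, inputs & 1
            nxt = next_state(state, car_ew, car_ns)
            address = format((code << 2) | inputs, "05b")
            rom[address] = format(STATES[nxt], "03b") + LIGHTS[state]
    return rom
```

Changing one rule in `next_state` and regenerating the ROM also shows how few stored bits a behavioral change touches, which is the point of exercise 18–3.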

18–3 We must change a total of 2 bits in our storage, see the new table below:


Address State car_ew, car_ns Next State Output Data
00000 GNS (000) 00 GNS (000) 100001 000100001
00001 GNS (000) 01 GNS (000) 100001 000100001
00010 GNS (000) 10 YNS (001) 100001 001100001
00011 GNS (000) 11 GNS (000) 100001 000100001
00100 YNS (001) 00 GEW (010) 010001 010010001
00101 YNS (001) 01 GEW (010) 010001 010010001
00110 YNS (001) 10 GEW (010) 010001 010010001
00111 YNS (001) 11 GEW (010) 010001 010010001
01000 GEW (010) 00 GEW (010) 001100 010001100
01001 GEW (010) 01 YEW (011) 001100 011001100
01010 GEW (010) 10 GEW (010) 001100 010001100
01011 GEW (010) 11 GEW (010) 001100 010001100
01100 YEW (011) 00 GNS (000) 001010 000001010
01101 YEW (011) 01 GNS (000) 001010 000001010
01110 YEW (011) 10 GNS (000) 001010 000001010
01111 YEW (011) 11 GNS (000) 001010 000001010
18–4 Below is our sequence, using the same inputs and outputs as described in
the text. In the BASE state, we loop until an input is asserted. We then
branch based on the input and perform one or more instructions, such as
adding a value to the total or dispensing.
State Br Inst Target SelVal sub selNext Serv Change Next
BASE inputs=0 BASE 0000 0 100 0 0 0
B2Q quarter Q1 0000 0 100 0 0 0
B2D dime D1 0000 0 100 0 0 0
B2N nickel N1 0000 0 100 0 0 0
B2Disp dispense & enough S1 0000 0 100 0 0 0
ToBase Always BASE 0000 0 100 0 0 0
Q1 Always BASE 0001 0 010 0 0 1
D1 Always BASE 0010 0 010 0 0 1
N1 Always BASE 0100 0 010 0 0 1
S1 Never X 1000 1 010 1 0 1
S1A ~done S1A 0000 0 100 0 0 1
S2A done S2A 0000 0 100 0 0 1
S2B zero BASE 0000 0 100 0 0 0
C1 Never X 0100 1 010 0 1 0
C1A ~done C1A 0000 0 100 0 0 1
C2A done C2A 0000 0 100 0 0 1
C2B zero BASE 0000 0 100 0 0 1
C2C Always C1 0000 0 100 0 0 1
18–6 A portion of the micro-code table is shown below. It simply iterates
through each state in the flash sequence if the input flash is high and
reverts to state RST if it goes low.


State flash Next State Output


RST 0 RST 0
RST 1 F1 0
X 0 RST 0
F1 1 F2 1
F2 1 F3 0

18–7 Two potential solutions are shown below. In (a), the microcode takes
both flash and tdone as inputs and sets the output and loads the timer
when appropriate. The code must sequence both the sub-components of each
letter (dot, dash, space) and the letters themselves. Solution (b) factors
out a reusable letter FSM (potentially also microcoded), so the master FSM
only needs to sequence the letters 'SOS'.

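[Figure: (a) a Master uCode FSM with inputs flash and tdone and outputs out, load, and tsel, connected to a Timer. (b) a Master uCode FSM with input flash that drives a Letter FSM through load and lsel and receives ldone; the Letter FSM produces out and drives a Timer through load and tsel, receiving tdone.]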

18–11 The state diagram and simplified microcode are shown below. The
microcode is fairly basic, except with respect to states S11p0 and S11p1.
Here, we must check if the input character is either a '1' or an 'A'. We
first check for an 'A'; if that is not a match, the FSM does not assert
c_nxt and instead checks against the value '1'.


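[Figure: state diagram of the string matcher with states Fail, N, 1, 11, 11A, 11AB, and Success. Edges are labeled with start, c=\0, and character comparisons (c=1, c=A, c=B, c=C, and their complements). Outputs: Fail has n_f = 1, n_m = 0, c_nxt = 0; Success has n_f = 0, n_m = 1, c_nxt = 0; the other states have n_f = 0, n_m = 0, c_nxt = 1.]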

State Start End Match Next sc n_f n_m c_nxt
FAIL 0 X X FAIL x 1 0 0
FAIL 1 X X N x 0 0 1
X 0 1 X FAIL x 1 0 1
N 0 0 0 N 1 0 0 1
N 0 0 1 S1 1 0 0 1
S1 0 0 0 N 1 0 0 1
S1 0 0 1 S11p0 1 0 0 1
S11p0 0 0 0 S11p1 A 0 0 0
S11p0 0 0 1 S11A A 0 0 1
S11p1 0 0 0 N 1 0 0 1
S11p1 0 0 1 S11p0 1 0 0 1
S11A 0 0 0 N B 0 0 1
S11A 0 0 1 S11AB B 0 0 1
S11AB 0 0 0 N C 0 0 1
S11AB 0 0 1 SUCCESS C 0 1 1
SUCCESS 0 X X SUCCESS x 0 1 0
SUCCESS 1 X X N x 0 0 1

18–17 The code used to write this program is shown below. It relies on a series
of immediate loads, adds, and subtracts to compute each character. The
code itself cannot be easily changed to spell out different strings. Another
solution would be to write "HELLO WORLD" into RAM at
initialization, then load and output each character in turn. This would
enable the code to be easily changed to spell out other phrases.


#48, 45, 4C, 4C, 4F, 20,          #From bottom of left
#57, 4F, 52, 4C, 44               ADD O1
LDAI 0100                         STA O1   #O1 = 0x4F = O
STA T2                            STA T0   #T0 = 0x4F
LDAI 0010                         LDA T1
SH T2                             STA O1   #O1 = 0x20 = ' '
STA T1    #T1 = 0x20              LDAI 1000
LDA T2    #ACC = 0x04             ADD T0   #ACC = 0x57
SH T2     #ACC = 0x40             STA O1   #O1 = 0x57
STA T2    #T2 = 0x40              LDA T0
LDAI 1000                         STA O1   #O1 = 0x4F
OR T2     #ACC = 0x48             LDAI 0011
STA O1    #O1 = 0x48 = H          ADD O1
LDAI 0011                         STA O1   #O1 = 0x52
STA T0    #T0 = 3                 LDAI 1110
LDA O1                            STA T0   #T0 = 0xE
SUB T0                            LDA O1
STA O1    #O1 = 0x45 = E          SUB T0   #ACC = 0x44
LDAI 0111                         STA T0
ADD O1                            LDAI 1000
STA O1    #O1 = 0x4C = L          ADD T0
STA O1    #O1 = 0x4C = L          STA O1
LDAI 0011                         LDA T0
#To next col                      STA O1

18–18 The output from our testbench is shown below.


# PC: 0000, o1: 0000 i:01100100 # PC: 0017, o1: 004f i:01111010
# PC: 0001, o1: 0000 i:01111100 # PC: 0018, o1: 004f i:01011011
# PC: 0002, o1: 0000 i:01100010 # PC: 0019, o1: 004f i:01110011
# PC: 0003, o1: 0000 i:10111100 # PC: 001a, o1: 0020 i:01101000
# PC: 0004, o1: 0000 i:01111011 # PC: 001b, o1: 0020 i:10001010
# PC: 0005, o1: 0000 i:01011100 # PC: 001c, o1: 0020 i:01110011
# PC: 0006, o1: 0000 i:10111100 # PC: 001d, o1: 0057 i:01011010
# PC: 0007, o1: 0000 i:01111100 # PC: 001e, o1: 0057 i:01110011
# PC: 0008, o1: 0000 i:01101000 # PC: 001f, o1: 004f i:01100011
# PC: 0009, o1: 0000 i:11101100 # PC: 0020, o1: 004f i:10000011
# PC: 000a, o1: 0000 i:01110011 # PC: 0021, o1: 004f i:01110011
# PC: 000b, o1: 0048 i:01100011 # PC: 0022, o1: 0052 i:01101110
# PC: 000c, o1: 0048 i:01111010 # PC: 0023, o1: 0052 i:01111010
# PC: 000d, o1: 0048 i:01010011 # PC: 0024, o1: 0052 i:01010011
# PC: 000e, o1: 0048 i:10011010 # PC: 0025, o1: 0052 i:10011010
# PC: 000f, o1: 0048 i:01110011 # PC: 0026, o1: 0052 i:01111010
# PC: 0010, o1: 0045 i:01100111 # PC: 0027, o1: 0052 i:01101000
# PC: 0011, o1: 0045 i:10000011 # PC: 0028, o1: 0052 i:10001010
# PC: 0012, o1: 0045 i:01110011 # PC: 0029, o1: 0052 i:01110011
# PC: 0013, o1: 004c i:01110011 # PC: 002a, o1: 004c i:01011010
# PC: 0014, o1: 004c i:01100011 # PC: 002b, o1: 004c i:01110011
# PC: 0015, o1: 004c i:10000011 # PC: 002c, o1: 0044 i:xxxxxxxx
# PC: 0016, o1: 004c i:01110011


Chapter 19

Solutions: Sequential
Examples

19–1 In the new state diagram, below, we have added another stage compared
to that of the divide-by-3 counter.

[Figure: state diagram with states A, B, C, D, and E. rst enters state A, and the edges are labeled with input values 0 and 1.]

19–2 We can make a divide-by-9 counter by attaching the output of one divide-
by-3 counter to the input of the next. With this structure, however, the
divide-by-9 signal will be one cycle later than if we had built a single state
machine.
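
The one-cycle delay mentioned in 19–2 can be seen in a small behavioral model. This is a sketch in Python under an assumption: each divider is modeled as a Moore machine whose output is high for one cycle after every n-th high input. The names `make_divider` and `simulate` are hypothetical and the state encoding is not taken from the text.

```python
def make_divider(n):
    """Moore divide-by-n: output is high in the cycle after every n-th high input."""
    state = 0                      # high inputs counted so far; state n is the output state

    def step(inp):
        nonlocal state
        out = 1 if state == n else 0          # Moore output depends only on the state
        base = 0 if state == n else state     # the output state restarts the count
        state = base + 1 if inp else base
        return out

    return step

def simulate(inputs):
    """Run a single divide-by-9 next to two cascaded divide-by-3 stages."""
    single = make_divider(9)
    first, second = make_divider(3), make_divider(3)
    single_out, cascade_out = [], []
    for inp in inputs:
        single_out.append(single(inp))
        mid = first(inp)                      # first stage feeds the second
        cascade_out.append(second(mid))
    return single_out, cascade_out
```

With a constant high input, the single machine first asserts its output in cycle 9 and the cascade in cycle 10, because the second stage adds its own state register between the ninth input and the output.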

19–4 The new dot (a) and dash (b) state machines are below. If there is a
fourth consecutive 1 in the dash detector, we leave the cb signal asserted
and move to state D4. There, if there is a 0 (or if there is a 0 in D3),
the FSM signals is and resets back to state 0. A fifth 1 goes into state
1 and does not assert is. The dot detector operates in a similar fashion.

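[Figure: (a) Dot detector with states 0, D1, and D2, entered at state 0 on rst; 1/cb edges advance the state and 0/is edges return to state 0. (b) Dash detector with states 0, D1, D2, D3, and D4, entered at state 0 on rst; 1/cb edges advance the state, 0 edges from the early states return to state 0, and 0/is edges from D3 and D4 return to state 0. Both diagrams include an additional state labeled 1.]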

19–6 The top level diagram of the Tic-Tac-Toe machine is shown below. We
have added a combinational module, 3-or-full, and sequential module,
Controller. The 3-or-full asserts x3 if the xout signal has 3 in a row.
It sets f9 high when no free squares remain. The controller state machine
is shown in the second image (omitting resets of the game). It toggles
between X playing and O playing until the board is full or the currently
playing side wins.


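[Figure: top-level block diagram with 9-bit registers XReg and OReg (each with an enable E and a multiplexer on its input), MoveGen (inputs xin and oin, output xout), the 3-or-full block (inputs xout and oin, outputs x3 and f9), and the Controller (outputs ex, eo, xwin, owin, and gover).]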


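[Figure: controller state diagram, with out = {xwin, owin, gover}. States: X (ex=1, eo=0, out = 000), O (ex=0, eo=1, out = 000), XWIN (ex=0, eo=0, out = 101), OWIN (ex=0, eo=0, out = 011), and TIE (ex=0, eo=0, out = 001). X and O alternate on !(x3 | f9); X goes to XWIN and O goes to OWIN on x3; either goes to TIE on (!x3) & f9.]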

19–12 The state table for the machine is below. We must also include a counter
that increments when the next state is 10 and resets to zero when the state
becomes 00.
State in Next State out
00 0 00 0
00 1 01 0
01 0 11 0
01 1 00 0
11 0 00 0
11 1 10 0
10 0 11 1
10 1 00 1


Chapter 20

Solutions: Verification and
Test

20–1 A sample listing of features is detailed in the table below. This list is not
exhaustive, but provides a sampling of different features.


Designation    Name                 Description
I              Input                Values are successfully input into the system
I.1            1-digit numbers      All single-digit numbers can be input
I.2, I.3, I.4  Multi-digit numbers  2-, 3-, 4-digit numbers are input correctly
I.5+           5 digits or more     Inputs with more than 5 digits generate the expected (illegal) behavior
I.d            Decimal              Input decimal numbers
I.lz           Leading 0            Inputting any number of leading 0s to a number gives expected behavior
I.neg          Negative numbers     User can input negative numbers
I.fun          Function             User can successfully indicate which of the 4 arithmetic operations is to be performed
I.op           2nd number           Inputting a number after the input of a function
I.Eq1          Equals, expected     The equals input works after entering a number, function, and another number
I.Eq2          Equals, unexpected   The equals input works after entering a number and a function (no second number) or just after a single number
I.post         After calculation    Correctly handling input after the user inputs the equals sign
A              Arithmetic           Testing the math computations themselves
A.[+ − × ÷]    Addition             The basic operations work without overflow
A.over         Overflowed math      The basic operations work with overflow
A.div0         Divide by 0          Correct error behavior when the user divides by 0
A ...                               Other math operations
D              Display              Checks that numbers, decimal points, and negative signs display correctly

20–2 This feature list would be similar to that of Table 20.1. Users can also
add functionality such as a stop watch (including lap times) or multiple
time zones. The feature list should include all input buttons and their
function.

20–5 Below are six possible test patterns to be applied to the adder:


Input 1 (hex)  Input 2 (hex)  Reason
0000 0000      0000 0000      Ensure the combinational unit has no x outputs or other glaring errors
AAAA AAAA      5555 5555      Check that all sum bits work correctly when there are no carries
3FFF FFFF      0000 0001      Check for correct carry propagation
3FFF FFFF      3FFF FFFF      Largest addition of 2 positive numbers without overflow
7FFF FFFF      0000 0001      Positive overflow condition
FFFF FFFF      0000 0001      Check that adding a negative and positive number does not give overflow
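
The expected results for these six patterns can be worked out with a short reference model. The sketch below is in Python and assumes a 32-bit two's-complement adder with a signed-overflow flag, which is inferred from the 8-digit hex operands and the "positive overflow" row rather than stated here; `add32` is a hypothetical helper name.

```python
MASK = 0xFFFFFFFF

def add32(a, b):
    """Reference model: 32-bit sum and signed (two's-complement) overflow flag."""
    total = (a + b) & MASK
    sign = lambda v: v >> 31
    # Signed overflow: operands share a sign and the sum has the other sign.
    overflow = sign(a) == sign(b) and sign(total) != sign(a)
    return total, overflow

PATTERNS = [
    (0x00000000, 0x00000000),
    (0xAAAAAAAA, 0x55555555),
    (0x3FFFFFFF, 0x00000001),
    (0x3FFFFFFF, 0x3FFFFFFF),
    (0x7FFFFFFF, 0x00000001),
    (0xFFFFFFFF, 0x00000001),
]

for a, b in PATTERNS:
    total, overflow = add32(a, b)
    print(f"{a:08X} + {b:08X} = {total:08X} overflow={overflow}")
```

Only the 7FFF FFFF + 0000 0001 pattern should raise the overflow flag in this model; the last pattern produces a carry out of bit 31 but no signed overflow.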
20–8 All four input combinations are needed to test that every one of the
outputs is assigned correctly.
20–10 The faults and test vectors are shown below. We only need inputs {a,
b, cin} = {001, 010, 011, 110} to test for all possible faults.
Fault   {a,b,cin}  cout (good)  cout (bad)
g'-0    110        1            0
g'-1    001        0            1
p'-0    001        0            1
p'-1    011        1            0
cin-0   011        1            0
cin-1   010        0            1
20–11 We need to use vectors 001, 011, and 010 to test Q3.
Fault   {a,b,cin}  s (good)  s (bad)
p'-0    001        1         0
p'-1    011        0         1
cin-0   011        0         1
cin-1   010        1         0


Chapter 21

Solutions: System-Level
Design

21–1 Topics mentioned in the text that are unspecified are the velocity of a
serve, paddle size, etc. Other areas that need specification are the
time-step, speed of the ball (grids per time-step), and the speed of the paddle.
Moreover, the description should include the victory condition, what
happens when the victory condition is obtained, and how to restart the game.
One possible edge case is if the bottom of the paddle strikes the top of the
ball. Users may also need to make sure that the ball does not have enough
speed to "go through" the paddle without the collision being detected. Does the
direction of the paddle during a collision, or where on the paddle a collision
occurs, affect the direction of the ball?

21–5 The new block diagram is shown below, adding a new Load FSM module.
When load is asserted and mode is idle, the load FSM will read each input
note and save it into the RAM. The load signal is ignored during song
playback. Playback cannot begin until after a song has been loaded and
load has been deasserted.


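[Figure: block diagram with a Note FSM (input start), a Note Frequency FSM, and a Sine Wave Synthesizer connected in sequence by the signals note, freq, and value; value goes to the CODEC with ready and next handshake signals. The Note FSM connects to the Song RAM and the synthesizer to the Quarter Sine RAM, each through addr and data. The new Load FSM (inputs load and note) connects to the Song RAM through addr and data.]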

21–6 The new block diagram is shown below. We have also updated the mode
table to explain the functionality of the pause and stop signals.

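[Figure: the same block diagram with start, pause, and stop inputs on the Note FSM. The Note FSM, Note Frequency FSM, and Sine Wave Synthesizer are connected by note, freq, and value; value goes to the CODEC with ready and next. The Song RAM and Quarter Sine RAM are accessed through addr and data, and the Load FSM (inputs load and note) writes the Song RAM.]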


Name      Description
idle      No music being played; goes to playback starting at note 0 on start and to loading on load.
playback  Generating audible output; goes to pause on pause and idle on stop.
pause     Not playing music, but will continue playing from the current note on either pause or start. Will ignore load inputs.
loading   Loading a song; ignores all inputs that are not part of the load FSM.


Chapter 22

Solutions: Interface and
System-Level Timing

22–1 Three further examples are: a timer for counting seconds, almost all
visual displays (always valid to the observer), or signals that indicate a
current long-term mode of device operation.

22–2 Periodic signals include a vending machine’s coin inputs, the output of
the arithmetic unit in dataflow FSMs, or messages sent across a bus (see
Chapter 24).

22–4 The Verilog is shown below. We save the incoming data whenever the register
is not full, setting full to 0 when count reaches 4.

module rv2per(in, in_v, in_r, rst, clk, out) ;
  input [7:0] in;
  output [7:0] out;
  input in_v;
  output in_r;
  reg in_r;
  input rst, clk;

  // count runs 0..4; countIs5 marks the fifth cycle of each period
  wire [2:0] count;
  wire [2:0] nxt_count;
  wire countIs5 = (count == 3'b100);
  assign nxt_count = (countIs5 | rst) ? 3'b000 : count + 1;

  //Save data if the data ff is empty, or at the end of the 5th
  wire full;
  wire saveData = countIs5 | ~full;
  wire [7:0] nxt_data = rst ? 8'd0 : (saveData ? in : out);

  // next value of full and the ready signal, from {countIs5, full, in_v}
  reg nxt_full;
  always@(*) begin
    casex({countIs5, full, in_v})
      3'b000: {nxt_full, in_r} = 2'b00;
      3'b001: {nxt_full, in_r} = 2'b11;
      3'b01x: {nxt_full, in_r} = 2'b10;
      3'b1x0: {nxt_full, in_r} = 2'b00;
      3'b1x1: {nxt_full, in_r} = 2'b11;
      default: {nxt_full, in_r} = 0;
    endcase // casex ({countIs5, full, in_v})
  end
  wire nxt_full_r = rst ? 1'b0 : nxt_full;

  DFF #(3) cdff(clk, nxt_count, count);
  DFF #(8) ddff(clk, nxt_data, out);
  DFF #(1) edff(clk, nxt_full_r, full);

endmodule // rv2per

22–6 The double buffer design of Figure 23.11 meets the stated goals. We
could have also designed a module where we wrote to (and read from) the
two flip-flops, alternating on every ready cycle.
22–8 One possible implementation of the serializer is shown below:

module serializer(in, out, clk, sync);
  //Use to trigger the 1st valid 64 bits instead of a reset
  input sync, clk;
  input [63:0] in;
  output [7:0] out;
  reg [7:0] out;

  // count selects the byte to output; a new word is stored after byte 7
  wire [2:0] count;
  wire [2:0] nxt_count;
  wire countIs7 = (count == 3'b111);
  wire store = countIs7 | sync;
  assign nxt_count = store ? 3'b000 : count + 1;

  wire [63:0] data;
  wire [63:0] nxt_data = store ? in : data;

  //LSB first
  always@(*) begin
    case(count)
      3'd0: out = data[7:0];
      3'd1: out = data[15:8];
      3'd2: out = data[23:16];
      3'd3: out = data[31:24];
      3'd4: out = data[39:32];
      3'd5: out = data[47:40];
      3'd6: out = data[55:48];
      3'd7: out = data[63:56];
      default: out = 8'hxx;
    endcase // case (count)
  end // always@ (*)

  DFF #(3) cdff(clk, nxt_count, count);
  DFF #(64) ddff(clk, nxt_data, data);

endmodule // serializer
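
A cycle-level reference model is handy when writing a testbench for the serializer. The sketch below is in Python and mirrors the register updates of the module above; `serialize` and `run` are hypothetical helper names, and starting the registers at 0 is an assumption (in simulation the flip-flops would start as x until sync is asserted).

```python
def serialize(word):
    """Return the eight bytes of a 64-bit word, least significant byte first."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]

def run(inputs, sync_cycle=0):
    """Cycle-level model: inputs[i] is the value on `in` during cycle i."""
    data, count = 0, 0            # assumed initial register contents
    outs = []
    for cycle, word in enumerate(inputs):
        # Combinational output: the byte of `data` selected by `count`.
        outs.append((data >> (8 * count)) & 0xFF)
        # Register updates at the end of the cycle.
        store = (count == 7) or (cycle == sync_cycle)
        data = word if store else data
        count = 0 if store else count + 1
    return outs
```

In this model a word captured in the sync cycle appears on `out` over the following eight cycles, and the next word is captured in the cycle in which the last byte is output.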

22–14 An example timing table is below:


cycle ball posx ball posy paddley serve score pointsx
i 5 50 3 0 0 x
i+1 5 50 3 0 0 x
i + 20 4 50 4 0 0 x
i + 40 3 50 5 0 0 x
i + 60 2 50 6 0 0 x
i + 80 1 50 7 0 0 x
i + 100 0 50 8 0 1 x
i + 101 3 50 8 0 0 x+1
i + 102 3 50 8 0 0 x+1
... 3 50 8 0 0 x+1
i+j 3 50 8 1 0 x+1
i + j+1 4 50 8 0 0 x+1
i + j+31 5 50 8 0 0 x+1
22–15 The new timing table is below. We are able to start a new round of
decryption 2 cycles (instead of 3) after Round 16 of a failed block.
Cycle FK NK KGen KSel FB NB CT SD DES Check
-1 1 1
0 Key 0 Key 0 Block 0 1
1 Key 1 Key 0 Round 1
2 Key 0 Round 2
... Key 0 ...
15 Key 0 1 Round 15
16 Key 0 Block 1 1 Round 16
17 1 Key 1 Key 0 1 1 Round 1 Not PT
18 Key 2 Key 1 Block 0 Round 1
19 Key 1 Round 2
20 Key 1 Round 3
... Key 1 ...


Chapter 23

Solutions: Pipelines

23–1 Using the equations from Section 23.1, the latency is 20.5ns and
throughput is 48 780 000 s^-1.

23–2 Using the equations from Section 23.1, the latency is 20.5ns and
throughput is 243 900 000 s^-1.

23–3 Using the equations from Section 23.1, the latency is 22.5ns and
throughput is 222 222 222 s^-1.

23–4 Using the equations from Section 23.1, the latency is 22.5ns and
throughput is 1 111 111 111 s^-1.

23–5 Our plot is shown below and shows that deep pipelining offers diminishing
returns. Errata: Ask the students to calculate the power (energy divided
by clock time) for each pipeline depth.


[Figure: plot of throughput (vertical axis, 0 to 2.5 x 10^9) against area (horizontal axis, 100 to 200).]

23–7 1. It will take 200.5ns to complete the work (10 units of 20ns each, plus
the final register delay)
2. It will take 40.5ns to complete the work, since each of the 5 units will
complete a task in 20ns.
3. It will take 22.5ns to complete the first task (traversing the entire
pipeline), and 4.5ns to complete each of the 9 subsequent tasks. This
gives a total time of 63ns.
4. The final answers are 20 000.5ns, 4 000.5ns, and 4 518ns. Note that
the latency penalty of the pipeline relative to the parallel design
decreased from about 56% to about 13% as the batch size got larger.
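
The batch times in 23–7 follow from three short formulas, sketched here in Python. The constants are assumptions read off the numbers in the answers (20ns of logic per task, 0.5ns of register overhead, and 5 units or 5 pipeline stages); the function names are hypothetical.

```python
import math

T_LOGIC = 20.0   # ns of logic per task (assumed from the answers above)
T_REG = 0.5      # ns of register overhead (assumed)
UNITS = 5        # number of parallel units and of pipeline stages (assumed)

def serial_time(tasks):
    """One unit processes the tasks back to back, then one register delay."""
    return tasks * T_LOGIC + T_REG

def parallel_time(tasks):
    """UNITS copies work side by side on rounds of UNITS tasks."""
    return math.ceil(tasks / UNITS) * T_LOGIC + T_REG

def pipelined_time(tasks):
    """First task traverses every stage; each later task adds one stage time."""
    stage = T_LOGIC / UNITS + T_REG           # 4.5 ns per stage
    return UNITS * stage + (tasks - 1) * stage

for n in (10, 1000):
    penalty = pipelined_time(n) / parallel_time(n) - 1
    print(n, serial_time(n), parallel_time(n), pipelined_time(n), round(penalty, 3))
```

The penalty printed in the last column is the extra time of the pipeline relative to the parallel design; it shrinks as the batch grows because the fill latency is paid only once.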
23–14 The second stage is the bottleneck. The utilizations are 50%, 100%,
25%, and 33%.
23–15 To have no idle stages whatsoever, every stage needs a throughput
of 200 000 000 s^-1, so we must use the following replication scheme:
6 copies of stage 1, 12 copies of stage 2, 3 copies of stage 3, and 4 copies
of stage 4.
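
The utilizations in 23–14 and the copy counts in 23–15 are two views of the same stage delays. The Python sketch below uses hypothetical delays of 30, 60, 15, and 20 ns, chosen only because they are consistent with the stated answers; the actual delays come from the exercise and are not reproduced here.

```python
STAGE_DELAY_NS = [30, 60, 15, 20]   # hypothetical delays consistent with the answers

def utilizations(delays):
    """Fraction of time each stage is busy when the slowest stage sets the rate."""
    slowest = max(delays)
    return [d / slowest for d in delays]

def copies_needed(delays, target_period_ns):
    """Copies of each stage needed to accept a new input every target period."""
    return [-(-d // target_period_ns) for d in delays]   # ceiling division

print([round(u, 2) for u in utilizations(STAGE_DELAY_NS)])
print(copies_needed(STAGE_DELAY_NS, 5))   # 200 000 000 per second is one input every 5 ns
```

Under these assumed delays, the copy counts are in the same ratio as the delays, which is why no replicated stage sits idle.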
23–18 The expected number of total accesses in each cycle is 2. Thus, the
expected utilizations of the stages are 100%, 66%, and 50% for n=2,3,4
(respectively). There are a handful of ways to compute the probabilities
of conflicts; one is to observe that there are 16 equally likely
combinations of pipeline requests. Of these 16 (from 0000 to 1111): 1 has
0 requests, 4 have 1 request, 6 have 2 requests, 4 have 3 requests, and 1
has 4 requests. The over-subscription probabilities are 5/16, 1/16, and 0. This
assumes a simplified model where subsequent requests are independent.
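
The counting argument in 23–18 can be checked with exact fractions. This Python sketch assumes four requesters that each issue a request with probability 1/2, independently, which is what the 16 equally likely combinations imply; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

REQUESTERS = 4

def expected_requests():
    """Mean number of requests per cycle over the 16 equally likely combinations."""
    total = 2 ** REQUESTERS
    return Fraction(sum(k * comb(REQUESTERS, k) for k in range(REQUESTERS + 1)), total)

def oversubscription(n_units):
    """Probability that more than n_units requests arrive in one cycle."""
    total = 2 ** REQUESTERS
    over = sum(comb(REQUESTERS, k) for k in range(n_units + 1, REQUESTERS + 1))
    return Fraction(over, total)

for n in (2, 3, 4):
    print(n, expected_requests() / n, oversubscription(n))
```

The utilization column is simply the mean of 2 requests divided by n, matching the simplified model used in the answer.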


Chapter 24

Solutions: Interconnect

24–1 We can add full ready-valid flow control by adding three signals, as hinted
in the problem statement. bt_ready is an input to the interface that
indicates that the bus is ready to transmit. br_ready is an output that
indicates that the bus is ready to transmit. br_ready should be high when
the arbiter has granted the bus to the client. ct_ready is an output that
indicates that the client should be ready to receive data. ct_ready should
be asserted high when the bus is ready to transmit, and the address of the
bus data is the address of this client.
The updated Verilog from 24.3 is below:

// Combinational Bus Interface


// t (transmit) and r (receive) in signal names are from the
// perspective of the bus
module BusInt(cr_valid, cr_ready, cr_addr, cr_data, // bus rx - to the bus
ct_valid, ct_data, ct_ready, // bus tx - from the bus
br_addr, br_data, br_valid, br_ready, // to the bus
bt_addr, bt_data, bt_valid, bt_ready, // from the bus
arb_req, arb_grant, // the arbiter
my_addr) ; // address of this interface
parameter aw = 2 ; // address width
parameter dw = 4 ; // data width
input cr_valid, arb_grant, bt_valid, bt_ready ;
output cr_ready, ct_valid, arb_req, br_valid, br_ready, ct_ready ;
input [aw-1:0] cr_addr, bt_addr, my_addr ;
output [aw-1:0] br_addr ;
input [dw-1:0] cr_data , bt_data ;
output [dw-1:0] br_data, ct_data ;

// arbitration
assign arb_req = cr_valid ;
assign cr_ready = arb_grant ;


// bus drive
assign br_valid = arb_grant ;
assign br_addr = arb_grant ? cr_addr : 0 ;
assign br_data = arb_grant ? cr_data : 0 ;
assign br_ready = arb_grant ;

// bus receive
assign ct_valid = bt_valid & (bt_addr == my_addr) ;
assign ct_data = bt_data ;
assign ct_ready = bt_ready & (bt_addr == my_addr) ;
endmodule

24–4 bt_valid will not be asserted until all multicast clients are ready to
receive. As we are now given cr_vector (a vector that indicates which
clients should be transmitted to) instead of cr_addr, we can look at bit
my_addr in cr_vector to see if our client should be receiving.
The updated Verilog from 24.3 is below. This builds on the Verilog from
exercise 24–1.

// Combinational Bus Interface
// t (transmit) and r (receive) in signal names are from the
// perspective of the bus
module BusInt(cr_valid, cr_ready, cr_vector, cr_data, // bus rx - to the bus
              ct_valid, ct_data, ct_ready, // bus tx - from the bus
              br_vector, br_data, br_valid, br_ready, // to the bus
              bt_vector, bt_data, bt_valid, bt_ready, // from the bus
              arb_req, arb_grant, // the arbiter
              my_addr) ; // address of this interface
  parameter aw = 2 ; // address width
  parameter dw = 4 ; // data width
  parameter vw = 2 ** aw ; // vector width - corresponds to number of clients in a
  input cr_valid, arb_grant, bt_valid, bt_ready ;
  output cr_ready, ct_valid, arb_req, br_valid, br_ready, ct_ready ;
  input [aw-1:0] my_addr ;
  // destination vectors replace the addresses, matching the port list
  input [vw-1:0] cr_vector, bt_vector ;
  output [vw-1:0] br_vector ;
  input [dw-1:0] cr_data , bt_data ;
  output [dw-1:0] br_data, ct_data ;

  // arbitration
  assign arb_req = cr_valid ;
  assign cr_ready = arb_grant ;

  // bus drive
  assign br_valid = arb_grant ;
  assign br_vector = arb_grant ? cr_vector : 0 ;
  assign br_data = arb_grant ? cr_data : 0 ;
  assign br_ready = arb_grant ;

  // bus receive: this client is addressed when its bit of the vector is set
  assign ct_valid = bt_valid & bt_vector[my_addr] ;
  assign ct_data = bt_data ;
  assign ct_ready = bt_ready & bt_vector[my_addr] ;
endmodule

24–4 To expand from a 2x2 crossbar to a 4x4 crossbar, we increase the size of
the request/grant matrices. The Verilog below is an expansion of 24.5.

// 4x4 Crossbar switch - full flow control


module Xbar44(c0r_valid, c0r_ready, c0r_addr, c0r_data, // client 0
c0t_valid, c0t_ready, c0t_data,
c1r_valid, c1r_ready, c1r_addr, c1r_data, // client 1
c1t_valid, c1t_ready, c1t_data,
c2r_valid, c2r_ready, c2r_addr, c2r_data, // client 2
c2t_valid, c2t_ready, c2t_data,
c3r_valid, c3r_ready, c3r_addr, c3r_data, // client 3
c3t_valid, c3t_ready, c3t_data) ;
parameter dw = 4 ; // data width
input c0r_valid, c0t_ready, c1r_valid, c1t_ready, c2r_valid, c2t_ready, c3r_valid, c3t_rea
output c0r_ready, c0t_valid, c1r_ready, c1t_valid, c2r_ready, c2t_valid, c3r_ready, c3t_va
input [1:0] c0r_addr, c1r_addr, c2r_addr, c3r_addr ; // address
input [dw-1:0] c0r_data, c1r_data, c2r_data, c3r_data ; // data
output [dw-1:0] c0t_data, c1t_data, c2t_data, c3t_data ;

// request matrix
wire req00 = (c0r_addr == 0) & c0r_valid ;
wire req01 = (c0r_addr == 1) & c0r_valid ;
wire req02 = (c0r_addr == 2) & c0r_valid ;
wire req03 = (c0r_addr == 3) & c0r_valid ;
wire req10 = (c1r_addr == 0) & c1r_valid ;
wire req11 = (c1r_addr == 1) & c1r_valid ;
wire req12 = (c1r_addr == 2) & c1r_valid ;
wire req13 = (c1r_addr == 3) & c1r_valid ;
wire req20 = (c2r_addr == 0) & c2r_valid ;
wire req21 = (c2r_addr == 1) & c2r_valid ;
wire req22 = (c2r_addr == 2) & c2r_valid ;
wire req23 = (c2r_addr == 3) & c2r_valid ;
wire req30 = (c3r_addr == 0) & c3r_valid ;
wire req31 = (c3r_addr == 1) & c3r_valid ;
wire req32 = (c3r_addr == 2) & c3r_valid ;
wire req33 = (c3r_addr == 3) & c3r_valid ;

// arbitration 0 wins --> 3 loses


wire grant00 = req00 ;
wire grant01 = req01 ;
wire grant02 = req02 ;
wire grant03 = req03 ;
wire grant10 = req10 & ~req00 ;
wire grant11 = req11 & ~req01 ;
wire grant12 = req12 & ~req02 ;
wire grant13 = req13 & ~req03 ;
wire grant20 = req20 & ~req10 & ~req00 ;
wire grant21 = req21 & ~req11 & ~req01 ;
wire grant22 = req22 & ~req12 & ~req02 ;
wire grant23 = req23 & ~req13 & ~req03 ;
wire grant30 = req30 & ~req20 & ~req10 & ~req00 ;
wire grant31 = req31 & ~req21 & ~req11 & ~req01 ;
wire grant32 = req32 & ~req22 & ~req12 & ~req02 ;
wire grant33 = req33 & ~req23 & ~req13 & ~req03 ;

// connections
assign c0t_valid = (grant00 & c0r_valid) | (grant10 & c1r_valid) | (grant20 & c2
assign c0t_data = ({dw{grant00}} & c0r_data) | ({dw{grant10}} & c1r_data) |
({dw{grant20}} & c2r_data) | ({dw{grant30}} & c3r_data) ;
assign c1t_valid = (grant01 & c0r_valid) | (grant11 & c1r_valid) | (grant21 & c2
assign c1t_data = ({dw{grant01}} & c0r_data) | ({dw{grant11}} & c1r_data) |
({dw{grant21}} & c2r_data) | ({dw{grant31}} & c3r_data) ;
assign c2t_valid = (grant02 & c0r_valid) | (grant12 & c1r_valid) | (grant22 & c2
assign c2t_data = ({dw{grant02}} & c0r_data) | ({dw{grant12}} & c1r_data) |
({dw{grant22}} & c2r_data) | ({dw{grant32}} & c3r_data) ;
assign c3t_valid = (grant03 & c0r_valid) | (grant13 & c1r_valid) | (grant23 & c2
assign c3t_data = ({dw{grant03}} & c0r_data) | ({dw{grant13}} & c1r_data) |
({dw{grant23}} & c2r_data) | ({dw{grant33}} & c3r_data) ;

// ready
assign c0r_ready = (grant00 & c0t_ready) | (grant01 & c1t_ready) | (grant02 & c2
assign c1r_ready = (grant10 & c0t_ready) | (grant11 & c1t_ready) | (grant12 & c2
assign c2r_ready = (grant20 & c0t_ready) | (grant21 & c1t_ready) | (grant22 & c2
assign c3r_ready = (grant30 & c0t_ready) | (grant31 & c1t_ready) | (grant32 & c2
endmodule

24–10 To build a buffered crossbar, we will expand upon the 2x2 crossbar
in 24.5. We will start by creating a parametrized first-in, first-out (FIFO)
buffer. This requires a random-access memory (RAM) and a counter.

// A simple random access memory (RAM).

module RAM (clk, data_in, data_out, addr_in, addr_out, write_en);


parameter dw = 4 ; // data width
parameter aw = 2 ; // address width
parameter elements = 2 ** aw ; // number of elements in RAM
input clk ; // clock input
input [dw-1:0] data_in ; // data input
output [dw-1:0] data_out ;
input [aw-1:0] addr_in ; // write address
input [aw-1:0] addr_out ; // read address
input write_en ; // write enable

reg [dw-1:0] data_out ; // data out will be a reg (data width, not address width)

// memory array
reg [dw-1:0] memory [0:elements-1];

// read/write memory on positive edge of clock


always @(posedge clk) begin
data_out <= memory [addr_out]; // get output value

if (write_en)
memory [addr_in] <= data_in; // write to memory if enable high
end
endmodule

// A simple positive edge triggered D-flip flop


module DFF (clk, d, q);
parameter width = 1 ;
input clk ;
input [width-1:0] d ;
output [width-1:0] q ;
reg [width-1:0] q ; // q is assigned in an always block, so it must be a reg

always @(posedge clk)
q <= d;

endmodule

// A simple counter. Increments by 1 on every clock that enable is high.


module counter (clk, rst, count, enable);
parameter width = 1 ;
input clk ; // positive edge clocked
input rst ; // synchronous block reset
input enable ;
output [width-1:0] count ;

wire [width-1:0] next_count;

// increment by 1 if enabled - if reset is high, reset to 0


assign next_count = rst ? 1'b0 : enable ? count + 1 : count ;

// D-flip flop to store count value


DFF #(width) count_reg (.clk (clk), .d (next_count), .q (count));
endmodule

// A simple first-in, first-out (FIFO) buffer.


// read data is always ready
module FIFO (clk, rst, data_in, data_out, write_en, read_en, full, empty);
parameter dw = 4 ; // data width
parameter aw = 2 ; // address width for RAM in FIFO buffer
input clk ; // clock input
input rst ; // synchronous reset
input [dw-1:0] data_in ;
output [dw-1:0] data_out ;
input write_en ; // write enable
input read_en ; // read enable
output full ;
output empty ;

// read and write pointers in buffer


wire [aw-1:0] read_addr, write_addr;

// read and write counters


counter #(aw) read_counter (.clk (clk), .rst (rst), .count (read_addr), .enable
counter #(aw) write_counter (.clk (clk), .rst (rst), .count (write_addr), .enabl

// buffer memory
RAM #(.aw (aw), .dw (dw)) memory (.clk (clk), .data_in (data_in), .data_out (dat
.addr_in (write_addr), .addr_out (read_addr),
.write_en (write_en));

// buffer is empty when read and write pointers are equal


assign empty = write_addr == read_addr ;

// buffer is full when write pointer is 1 below read pointer
// (one slot stays unused so that full and empty can be told apart)
assign full = (write_addr + 1'b1) == read_addr ;
endmodule
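
The empty and full conditions stated in the FIFO comments can be checked with a small behavioural model. The Python sketch below is an illustration only: the class FifoPointers and the helper writes_until_full are hypothetical names, data storage is omitted, and it assumes both pointers wrap modulo 2**aw as the aw-bit counters do.

```python
class FifoPointers:
    """Behavioural model of the FIFO read and write pointers (no data storage)."""

    def __init__(self, aw=2):
        self.mask = (1 << aw) - 1   # pointers wrap modulo 2**aw
        self.read_addr = 0
        self.write_addr = 0

    def empty(self):
        # empty when read and write pointers are equal
        return self.write_addr == self.read_addr

    def full(self):
        # full when the write pointer is one below the read pointer
        return ((self.write_addr + 1) & self.mask) == self.read_addr

    def clock(self, write_en=False, read_en=False):
        # one clock edge: each enabled counter increments by 1
        if write_en:
            self.write_addr = (self.write_addr + 1) & self.mask
        if read_en:
            self.read_addr = (self.read_addr + 1) & self.mask


def writes_until_full(aw):
    """Count how many writes an initially empty FIFO accepts before full is set."""
    fifo = FifoPointers(aw)
    writes = 0
    while not fifo.full():
        fifo.clock(write_en=True)
        writes += 1
    return writes


print(writes_until_full(2))
```

Under this convention a FIFO with address width aw holds at most 2**aw - 1 entries, because one slot is given up to keep the full and empty conditions distinct.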

Now, we modify the previous 2x2 crossbar to add the FIFO buffers.

// 2x2 Buffered crossbar switch - full flow control


module BufXbar22(clk, rst,
c0r_valid, c0r_ready, c0r_addr, c0r_data, // client 0

c0t_valid, c0t_ready, c0t_data,


c1r_valid, c1r_ready, c1r_addr, c1r_data, // client 1
c1t_valid, c1t_ready, c1t_data) ;
parameter dw = 4 ; // data width
parameter aw = 2 ; // address width of fifo buffers
input clk, rst ; // clk and rst to fifo buffers
input c0r_valid, c0t_ready, c1r_valid, c1t_ready ; // r-v handshakes
output c0r_ready, c0t_valid, c1r_ready, c1t_valid ;
input c0r_addr, c1r_addr ; // address
input [dw-1:0] c0r_data, c1r_data ; // data
output [dw-1:0] c0t_data, c1t_data ;

// buffer wires
wire buf00_empty, buf01_empty, buf10_empty, buf11_empty ;
wire buf00_full, buf01_full, buf10_full, buf11_full ;
wire [dw-1:0] buf00_data, buf01_data, buf10_data, buf11_data ;

// request matrix
wire req00 = (c0r_addr == 0) & c0r_valid ;
wire req01 = (c0r_addr == 1) & c0r_valid ;
wire req10 = (c1r_addr == 0) & c1r_valid ;
wire req11 = (c1r_addr == 1) & c1r_valid ;

// arbitration 0 wins
wire grant00 = ~buf00_empty ;
wire grant01 = ~buf01_empty ;
wire grant10 = ~buf10_empty & buf00_empty ;
wire grant11 = ~buf11_empty & buf01_empty ;

// connections
assign c0t_valid = grant00 | grant10 ;
assign c0t_data = ({dw{grant00}} & buf00_data) |
({dw{grant10}} & buf10_data) ;
assign c1t_valid = grant01 | grant11 ;
assign c1t_data = ({dw{grant01}} & buf01_data) |
({dw{grant11}} & buf11_data) ;

// ready when all buffers for input are not full


assign c0r_ready = ~buf00_full & ~buf01_full ;
assign c1r_ready = ~buf10_full & ~buf11_full ;

// buffer instantiations
FIFO #(.dw (dw), .aw (aw)) buf00 (.clk (clk), .rst (rst),
.data_out (buf00_data), .data_in (c0r_data),
.write_en (req00), .read_en (grant00),
.full (buf00_full), .empty (buf00_empty));

FIFO #(.dw (dw), .aw (aw)) buf01 (.clk (clk), .rst (rst),
.data_out (buf01_data), .data_in (c0r_data),
.write_en (req01), .read_en (grant01),
.full (buf01_full), .empty (buf01_empty));
FIFO #(.dw (dw), .aw (aw)) buf10 (.clk (clk), .rst (rst),
.data_out (buf10_data), .data_in (c1r_data),
.write_en (req10), .read_en (grant10),
.full (buf10_full), .empty (buf10_empty));
FIFO #(.dw (dw), .aw (aw)) buf11 (.clk (clk), .rst (rst),
.data_out (buf11_data), .data_in (c1r_data),
.write_en (req11), .read_en (grant11),
.full (buf11_full), .empty (buf11_empty));
endmodule

Chapter 25

Solutions: Memory Systems

25–1 1. A total of 13 bits is needed.

{word, byte} = {address[12:2], address[1:0]}

2. The size of one word is 8×2=16B. We need a total of 10 bits to
address each of the 1024 words.

{word, byte} = {address[13:4], address[3:0]}

3. 17 bits total are needed.

{word, bank, byte} = {address[16:8], address[7:4], address[3:0]}

4. Each word is 16×8=128B.

{word, bank, byte} = {address[19:10], address[9:7], address[6:0]}
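
The field layouts above can be exercised with a short Python sketch. The function split_address and the constant LAYOUT_PART3 are hypothetical names introduced for illustration, and the example address is arbitrary. The layout used is the one from part 3: a 9-bit word, a 4-bit bank, and a 4-bit byte field.

```python
def split_address(address, field_widths):
    """Split an integer address into named fields.

    field_widths lists (name, width) pairs from the most significant
    field to the least significant one.
    """
    fields = {}
    shift = sum(width for _, width in field_widths)
    for name, width in field_widths:
        shift -= width                                  # low bit of this field
        fields[name] = (address >> shift) & ((1 << width) - 1)
    return fields


# Part 3: {word, bank, byte} = {address[16:8], address[7:4], address[3:0]}
LAYOUT_PART3 = [("word", 9), ("bank", 4), ("byte", 4)]

example = split_address(0b101010101_0011_1100, LAYOUT_PART3)
print(example)
```

The other parts only change the list of widths, for example [("word", 11), ("byte", 2)] for part 1.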

25-3 1. See the timing table below. The total time is 130 cycles.

Command Time
Activate R0 5
Read C1 5
Read C2 5
Read C3 5
Precharge R0 5
Act R1 5
Read C0 5
RAS: Wait to PC 2
Precharge R1 5
Act R2 5
Read C0 5
RAS: Wait to PC 2
Precharge R2 5
Act Ra 5
Read C3 5
RAS: Wait to PC 2
Precharge Ra 5
Act Rb 5
Read C3 5
RAS: Wait to PC 2
Precharge Rb 5
Act R0 5
Read C4 5
RAS: Wait to PC 2
Precharge R0 5
Act Rb 5
Read C1 5
Read C2 5
Precharge Rb 5
Total 130 cycles

2. See the timing table below, with the accesses rearranged. This solution
uses a greedy algorithm to group all accesses to one row together.
The total time is 106 cycles.

Command Time
Activate R0 5
Read C1 5
Read C2 5
Read C3 5
Read C4 5
Precharge R0 5
Act R1 5
Read C0 5
RAS: Wait to PC 2
Precharge R1 5
Act R2 5
Read C0 5
RAS: Wait to PC 2
Precharge R2 5
Act Ra 5
Read C3 5
RAS: Wait to PC 2
Precharge Ra 5
Act Rb 5
Read C3 5
Read C1 5
Read C2 5
Precharge Rb 5
Total 106 cycles
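
Both totals can be reproduced with a short Python sketch. It assumes what the tables show: every activate, read, and precharge takes 5 cycles. The minimum activate-to-precharge time of 12 cycles (T_RAS_MIN) is not stated in the solution; it is inferred here from the 2-cycle waits that appear only after a single read. The function names are hypothetical.

```python
T_CMD = 5        # cycles per activate, read, or precharge (from the tables)
T_RAS_MIN = 12   # assumed minimum activate-to-precharge time (inferred)


def total_cycles(accesses):
    """Total cycles for a list of (row, column) reads issued in order.

    Consecutive accesses to the same row share one activate/precharge pair.
    """
    total = 0
    i = 0
    while i < len(accesses):
        row = accesses[i][0]
        reads = 0
        while i < len(accesses) and accesses[i][0] == row:
            reads += 1
            i += 1
        open_time = T_CMD + reads * T_CMD          # activate plus reads
        wait = max(0, T_RAS_MIN - open_time)       # "RAS: Wait to PC"
        total += open_time + wait + T_CMD          # plus precharge
    return total


def group_by_row(accesses):
    """Greedy reordering: rows in first-appearance order, accesses gathered per row."""
    order, grouped = [], {}
    for row, col in accesses:
        if row not in grouped:
            grouped[row] = []
            order.append(row)
        grouped[row].append(col)
    return [(row, col) for row in order for col in grouped[row]]


ORIGINAL = [("R0", "C1"), ("R0", "C2"), ("R0", "C3"), ("R1", "C0"),
            ("R2", "C0"), ("Ra", "C3"), ("Rb", "C3"), ("R0", "C4"),
            ("Rb", "C1"), ("Rb", "C2")]

print(total_cycles(ORIGINAL), total_cycles(group_by_row(ORIGINAL)))
```

Grouping saves two activate/precharge pairs and two RAS waits, which accounts for the 24-cycle difference between the two tables.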

25–6 With a single-word cache line, the cache hit rate will be 85% (the
baseline). The sequence of addresses does not matter. With a line size of n,
however, we can eliminate (with P = 0.95) the next n-1 misses. As a
simplified example with 1000 cache accesses, we have a total of 150 cache
misses. With a line size of 2, the number of misses is reduced by about 71
(92% hit rate). With line sizes of 4 and 8, the number is reduced by 107
(95.7%) and 125 (97.5%), respectively. The memory bandwidth required
for data increases with line size because unneeded words can potentially
be fetched. The control bandwidth decreases, as there are fewer requests.
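
The reductions quoted above are consistent with a simple model, sketched below in Python. The model is an assumption chosen because it reproduces the quoted figures, not a formula given in the solution: out of every n consecutive misses, the first still misses and each of the other n-1 is removed with probability P. The function name misses_removed is hypothetical.

```python
ACCESSES = 1000     # example size used in the text
BASE_MISSES = 150   # 85% baseline hit rate


def misses_removed(base_misses, line_size, p_next=0.95):
    """Expected number of misses removed by fetching line_size words per miss."""
    return base_misses * p_next * (line_size - 1) / line_size


for n in (1, 2, 4, 8):
    removed = misses_removed(BASE_MISSES, n)
    hit_rate = 1 - (BASE_MISSES - removed) / ACCESSES
    print(n, round(removed), round(100 * hit_rate, 1))
```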
25–8 1. We simply need two addresses such that (A1 mod n) = (A2 mod n).
Each access will conflict, evicting the previous address’s line.
2. A sequence of n+1 unique addresses will cause a miss on every access,
assuming we evict the least recently used value.
3. A sequence of w + 1 addresses that map to the same set will never
hit in the cache.

Chapter 26

Solutions: Asynchronous
Sequential Circuits

26–1 The flow table is shown below. The circuit sets itself into the 0 or 1 state
when both of its inputs are 0 or 1, respectively. When the inputs are 01,
the state toggles, and inputs 10 cause the state to hold.

        Next state for inputs
State   00   01   11   10
0       0    1    1    0
1       0    0    1    1
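
The flow table can be encoded directly and checked against the description in the text. This Python sketch is illustrative only; the dictionary FLOW and the helpers next_state and is_stable are hypothetical names.

```python
# FLOW[state][inputs] gives the next state for the two-bit input pattern.
FLOW = {
    0: {"00": 0, "01": 1, "11": 1, "10": 0},
    1: {"00": 0, "01": 0, "11": 1, "10": 1},
}


def next_state(state, inputs):
    """Look up the next state for the current state and input pattern."""
    return FLOW[state][inputs]


def is_stable(state, inputs):
    """A total state is stable when the next state equals the current state."""
    return next_state(state, inputs) == state


for inputs in ("00", "01", "11", "10"):
    print(inputs, [is_stable(s, inputs) for s in (0, 1)])
```

Inputs 00 and 11 each leave exactly one stable state, inputs 10 leave both states stable (hold), and inputs 01 leave neither state stable (toggle).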

26–2 The waveform, state transition table, and K-maps are shown below. We
had to use 6 different states, and do not include transitions that are
inconsistent with the problem description.

[Figure: waveforms of inputs a and b, annotated with states A to F.]

        Next state for {a,b}
State   00   01   11   10    s2s1s0   {A,B}
A       A    E    -    B     000      00
B       -    -    C    B     010      10
C       -    D    C    -     110      00
D       A    D    -    D     100      00
E       -    E    F    -     001      01
F       -    -    F    D     101      00

[Figure: K-maps of the next-state code s2s1s0 versus s2,s1 and a,b, one
map for s0=0 and one for s0=1.]

26–5 The waveform, state transition table, and K-maps are shown below.

[Figure: waveform of input in, annotated with the state sequence A B C D E F A.]

        Next state for in
State   0    1    s2s1s0   abc
A       A    B    000      000
B       C    B    001      100
C       C    D    011      000
D       E    D    010      010
E       E    F    110      000
F       A    F    100      001

[Figure: K-map of the next-state code s2s1s0 versus s2,s1 and s0,in.]

26–9 The solution is presented below. Taking the simple approach yields a total
of 12 different states. However, if i is the first signal to rise, and i and q
have the same frequency, only 4 states are necessary. We show the K-map
for only the 4-state solution.

States1  A B C D E F G H I J K L A B
States2  A B B B C C C D D D A A A B

        Next state for {i,q}
State   00   01   11   10    s1s0   x
A       A    A    A    B     00     0
B       C    B    B    B     01     1
C       C    D    C    C     11     0
D       D    D    A    D     10     1

[Figure: K-map of the next-state code s1s0 versus s1,s0 and i,q for the
4-state solution.]

26–12 We allow the transition to B to go through 10000 or 10011 using the
K-maps below. We changed the value for r at 10001 to be the desired final
value of 0. The values at 10000 and 10011 must also be changed from x
to the desired final state value, so that a race results in the correct final
state.

[Figure: K-maps for r and a versus a,r and b,s, each drawn for in=0 and in=1.]

$r = (b \wedge in) \vee (r \wedge in)$

$a = (b \wedge s \wedge in) \vee (a \wedge s)$

Chapter 27

Solutions: Flip-Flops

27–1 In the transition of D from 1 to 0, both inputs to U4 must stabilize through
the path of U1 + U3 + U5 and separately through U2. The inputs to U5
must be stable from a D transition from 0 to 1.

ts = max(t1 + t3 + t5 , t2 + t4 )

27–2 The hold time in this situation is 0. Once the g input falls to 0, no change
in d can change s’, r’ or the outputs.

th = 0

27–3 The maximum delay comes from a transition from 1 to 0.

tdDQ = max(t1 + t3 + t5 , t2 ) + t4

27–4 The enable to q delay is given by:

tdGQ = max(t3 + t5 , t2 ) + t4
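
The three expressions from 27–1, 27–3, and 27–4 can be evaluated side by side with a short Python sketch. The function names are hypothetical, and the 10 ps gate delays in the example are placeholder values chosen for illustration, not values from the problem.

```python
def setup_time(t1, t2, t3, t4, t5):
    """27-1: ts = max(t1 + t3 + t5, t2 + t4)."""
    return max(t1 + t3 + t5, t2 + t4)


def delay_d_to_q(t1, t2, t3, t4, t5):
    """27-3: tdDQ = max(t1 + t3 + t5, t2) + t4."""
    return max(t1 + t3 + t5, t2) + t4


def delay_g_to_q(t2, t3, t4, t5):
    """27-4: tdGQ = max(t3 + t5, t2) + t4."""
    return max(t3 + t5, t2) + t4


# Placeholder delays in picoseconds (assumed, equal for every gate).
t1 = t2 = t3 = t4 = t5 = 10
print(setup_time(t1, t2, t3, t4, t5),
      delay_d_to_q(t1, t2, t3, t4, t5),
      delay_g_to_q(t2, t3, t4, t5))
```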

27–5 As stated in the text, the setup time is that of the master.

ts = max(t2 + t4 + t6 , t3 + t5 )

27–8 The setup time is simply tg + t2 .


27–17 The state transitions are shown in the table below. The state is listed
from the output of the top NAND gate to the bottom NAND gate. When
the clock is off, the state toggles between 1110 and 0111 depending on
d (via 0110 and 1111). When in one of the stable states (1110 or 0111)
and the clock goes high, the system moves into either 1010 or 0101. This
causes the output of the final RS latch to take on the last value of d. While
the clock remains high, no change to the input is reflected in the output.

        Next state for {d,c}
State   00     01     11     10     q
1110    1111   -      1010   1110   q
1111    0111   -      -      -      q
0111    0111   0101   -      0110   q
0110    -      -      -      1110   q
1010    -      1011   1010   1110   1
1011    1111   1011   1010   -      1
0101    0111   0101   0101   0111   0

Downloaded by DongJun Lee ([email protected])


lOMoARcPSD|32379196

Chapter 28

Solutions: Metastability
and Synchronization Failure

28–1 The system will settle in 4.1τs.

$t_s = -\tau_s \ln(0.016) = 4.1\,\tau_s$

28–3 The minimum required voltage difference is 0.91mV.

$\Delta V(0) = 1\,\mathrm{V} \cdot \exp\left(-\frac{7\tau_s}{\tau_s}\right) = 0.91\,\mathrm{mV}$
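
Both results (28–1 and 28–3) follow from the exponential growth of the voltage difference with time constant τs, and can be checked numerically. The Python sketch below works in units of τs, so no device parameter has to be assumed; the function names are hypothetical.

```python
import math


def settle_time_in_tau(initial_over_final):
    """Time, in units of tau_s, to grow from an initial to a final difference.

    Solves final = initial * exp(t / tau_s) for t / tau_s.
    """
    return -math.log(initial_over_final)


def initial_difference(final_volts, time_in_tau):
    """Initial difference that reaches final_volts after time_in_tau * tau_s."""
    return final_volts * math.exp(-time_in_tau)


print(settle_time_in_tau(0.016))      # 28-1: ratio of 0.016
print(initial_difference(1.0, 7))     # 28-3: 1 V reached after 7 tau_s
```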

28–6 There will be an error on about 40% of the asynchronous signal transitions.

$P_E = \frac{t_s + t_h}{t_{cy}} = 0.4$

28–9 The asynchronous signal must transition no more than 250 times a second.

$f_a = \frac{f_E}{(t_s + t_h)/t_{cy}} = 250\,\mathrm{Hz}$

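28–12 The answers are shown below. We can compute the ratio of errors in
FF1 vs. FF2 below:

$\frac{P_1}{P_2} = \frac{(t_{s1} + t_{h1}) \exp\left(-\frac{t_w}{\tau_{s1}}\right)}{(t_{s2} + t_{h2}) \exp\left(-\frac{t_w}{\tau_{s2}}\right)}$

$\frac{P_1}{P_2} = 0.2 \exp\left(5 \times 10^{10}\, t_w\right)$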

1. The first flip-flop is most desirable because it is least likely to enter
an illegal state.
2. The second flip-flop will have an error about 2.4× less often.

Chapter 29

Solutions: Synchronizer
Design

29–2 See below:

$P_{ES} = \frac{t_s + t_h}{t_{cy}} \exp\left(-\frac{5t_{cy} - t_s - t_{dCQ}}{\tau_s}\right) = 5.8 \times 10^{-28}$

$f_{ES} = f_a P_{ES} = 1.2 \times 10^{-19}$

$\mathrm{MTBF} = 8.63 \times 10^{18}\,\mathrm{s}$

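29–3 See below, noting that 5 flip-flops give a total wait time of 4 cycles:

$P_{ES} = \frac{t_s + t_h}{t_{cy}} \exp\left(-\frac{4(t_{cy} - t_s - t_{dCQ})}{\tau_s}\right) = 3.0 \times 10^{-20}$

$f_{ES} = f_a P_{ES} = 5.9 \times 10^{-12}$

$\mathrm{MTBF} = 1.7 \times 10^{11}\,\mathrm{s}$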
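
The chain from failure probability to failure rate to MTBF used in 29–3 can be sketched in Python. The function names are hypothetical. The problem's timing parameters are not restated in this solution, so the sketch uses only the quoted values of PES and fES, and derives the event rate fa that those two values imply.

```python
import math


def failure_probability(t_s, t_h, t_cy, t_w, tau_s):
    """P_ES = (t_s + t_h) / t_cy * exp(-t_w / tau_s), with t_w the wait time."""
    return (t_s + t_h) / t_cy * math.exp(-t_w / tau_s)


def failure_rate(f_a, p_es):
    """f_ES = f_a * P_ES."""
    return f_a * p_es


def mtbf(f_es):
    """Mean time between failures is the reciprocal of the failure rate."""
    return 1.0 / f_es


P_ES = 3.0e-20               # quoted in 29-3
F_ES = 5.9e-12               # quoted in 29-3
implied_f_a = F_ES / P_ES    # event rate implied by the two quoted values
print(implied_f_a, mtbf(F_ES))
```

The implied event rate is close to 2 × 10^8 per second, and the reciprocal of the quoted failure rate reproduces the quoted MTBF.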

29–5 The final answer is that we need to wait only 2 clock cycles. First, we
calculate the failure frequency:

$\mathrm{MTBF} = (10^6)(30)(365)(24)(3600) = 9.5 \times 10^{14}\,\mathrm{s}$

$f_{ES} = 1.1 \times 10^{-15}\,\mathrm{Hz}$

Next, we can compute the probability of error and finally the error window:

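$P_{ES} = \frac{f_{ES}}{f_a} = 3.2 \times 10^{-16}$

$\exp\left(-\frac{t_w}{\tau_s}\right) = P_{ES}\,\frac{t_{cy}}{t_s + t_h} = 4.5 \times 10^{-15}$

$t_w = -\tau_s \ln(4.5 \times 10^{-15}) = 1.32\,\mathrm{ns}$

$N \ge \frac{t_w + t_s + t_{dCQ}}{t_{cy}} \ge 1.4$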
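
The arithmetic in 29–5 can be checked with a few lines of Python. The variable names are hypothetical, and reading the factors 10^6 and 30 as a unit count and a number of years is an assumption. The time constant is not restated in this solution, so the sketch only reports the value of τs implied by the quoted 1.32 ns.

```python
import math

UNITS = 10**6    # first factor of the MTBF product (assumed: number of units)
YEARS = 30       # second factor of the MTBF product (assumed: years)

mtbf_s = UNITS * YEARS * 365 * 24 * 3600   # required MTBF in seconds
f_es = 1.0 / mtbf_s                        # allowed failure rate in Hz

tw_in_tau = -math.log(4.5e-15)             # wait time in units of tau_s
implied_tau_s = 1.32e-9 / tw_in_tau        # tau_s implied by t_w = 1.32 ns

required_cycles = math.ceil(1.4)           # N >= 1.4 rounds up to whole cycles

print(mtbf_s, f_es, tw_in_tau, implied_tau_s, required_cycles)
```

The wait of about 33 time constants corresponds to τs of roughly 40 ps, and rounding N ≥ 1.4 up to a whole number of cycles gives the stated answer of 2.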

29–7 The time between bit transitions must be at least tcy,o + ts + th. This is
to ensure that two transitions cannot enter an illegal state in consecutive
cycles.

29–9 The simplest form of the problem requires placing synchronizers on the
increment and decrement signals. This is a case in which the input logic
does not require knowledge of the output signal (it can increment past ff
in base 16). We may want to include logic that moves the counter only once
for every positive edge of an input, not continuously.

29–11 The figure below shows the control data-path for indicating if a
particular register is ready for new data (not present) or valid (present).
We construct an FSM with 4 states, keeping 1 bit of state in each clock
domain. In 2 of the states (00, 11) the register is considered empty, while it
is considered full in the others. The state diagram is shown at the bottom
of the figure. Note that logic in one clock domain can change only
the state bit in the same domain.

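[Figure: control logic. Input clock domain (clkin): signals S1, S0i, and
iready, with a synchronizer. Output clock domain (clkout): signals S0, S1o,
and ovalid, with a synchronizer. State diagram: states 00, 10, 11, and 01,
with ivalid transitions between 00 and 10 and between 01 and 11, and
oready transitions between 10 and 11 and between 01 and 00.]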
