Digitaldesign Partialsolution
Selected Solutions
Digital Design: A Systems Approach
Chapter 1
1–2 1. 0.5V
2. -0.5V
3. 0.7V
4. 0.4V
1–3 $V_N \le 0.4$ V
1–8
1–9
000 68
001 70
011 72
010 74
110 76
111 78
101 80
100 82
1–11 For both (a) and (b) we can use a 6-bit scheme where the top 2 bits
represent the suit. The bottom 4 bits represent the rank in suit. In both
representations, the lower four bits are used to do any comparison of rank.
The upper two are used to do a comparison of suits.
1–15 The rule is to go west if the current address is greater than the
destination and east if the current address is less than the destination.
If the two addresses are equal, the current processor is the destination.
1–16 1.
0000 0001 0010 0011
0100 0101 0110 0111
1000 1001 1010 1011
1100 1101 1110 1111
2. Using the rules of the previous problem, the lower two bits determine
east/west routing (east is higher) and the upper two bits determine
north/south routing (south is higher).
3. By splitting the north/south and east/west addresses, it becomes
much simpler to determine direction. If the nodes had been assigned
meaningless IDs, a routing table (or more complex logic) would be
required to determine direction.
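The routing rule of Exercises 1–15 and 1–16 can be sketched in a few lines of Python. This is only an illustration: the function name `route` and the choice to resolve the east/west dimension before north/south are assumptions, not part of the original solution.

```python
def route(current: int, dest: int) -> str:
    """Return the next hop for 4-bit node addresses in the 4x4 grid.

    Upper two bits: north/south position (south is higher).
    Lower two bits: east/west position (east is higher).
    """
    cur_ns, cur_ew = current >> 2, current & 0b11
    dst_ns, dst_ew = dest >> 2, dest & 0b11
    # Assumption: east/west is resolved first, then north/south.
    if cur_ew > dst_ew:
        return "west"
    if cur_ew < dst_ew:
        return "east"
    if cur_ns > dst_ns:
        return "north"
    if cur_ns < dst_ns:
        return "south"
    return "arrived"
```

Because each dimension has its own bit field, every decision is a two-bit magnitude comparison and no routing table is needed.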
1–17 One example is:
0000 0
0001 1
0011 2
0111 3
0101 4
0100 5
Chapter 2
2–1 Student solutions should include information about how the system con-
nects to the TV (HDMI), resolution (1080p), the processor, DRAM, and
video. The games could be burned into the system, on DVD, or down-
loadable (wired or wireless Internet?). The controllers could be wired or
wireless, etc.
2–2 Version 2 of the console should probably be better than the first version, so
upgrades to core components such as DRAM, graphics card, and processor
are probably necessary. The controllers and video output can potentially
remain the same. Another option is to take advantage of advances in
fabrication technology to make a version of the console with the same
performance. Presumably, this would make the device cheaper and draw
less power.
2–4 In our example, we would buy the network interface and HDMI output
since they are commodity parts that our team could only mess up. We
would build the motherboard, for example, to tie together our components
in a custom fashion.
2–9 To average the four inputs on every cycle we need 3 sets of 32 flip-flops
(28.8kgrids) and 3 adders (90kgrids). We don’t need any multipliers be-
cause we can shift the sum of the four numbers by 2 (in binary). The
total is 118.8kgrids (or 488kgrids with a multiplier). To do the weighted
average, we need 4 multipliers (1.2Mgrids) and storage for 128 bits of
data (256 grids in ROM and 3072 in SRAM). The sum is approximately
1.3Mgrids regardless of the storage mechanism.
2–10 See answers below:
1. We need storage for $2.4 \times 10^7$ bits of SRAM. This is $5.76 \times 10^8$ grids,
or 4.8 mm$^2$.
2–17 In 2015 there would be over 15B transistors and in 2020, 115B.
Chapter 3
$x$  $y$  $x \wedge (x \vee y)$  $x$  $x \vee (x \wedge y)$  $x$
0    0    0                      0    0                      0
0    1    0                      0    0                      0
1    0    1                      1    1                      1
1    1    1                      1    1                      1
3–2
$x$  $x \wedge x$  $x \vee x$
0    0             0
1    1             1
3–9 We simplify this equation by using the commutative and associative
properties followed by the combining property (twice):
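3–10
$((y \wedge \bar{z}) \vee (\bar{x} \wedge w)) \wedge ((x \wedge \bar{y}) \vee (z \wedge \bar{w}))$
$= ((y \wedge \bar{z}) \wedge ((x \wedge \bar{y}) \vee (z \wedge \bar{w}))) \vee ((\bar{x} \wedge w) \wedge ((x \wedge \bar{y}) \vee (z \wedge \bar{w})))$ (Distributive)
$= ((y \wedge \bar{z} \wedge x \wedge \bar{y}) \vee (y \wedge \bar{z} \wedge z \wedge \bar{w})) \vee ((\bar{x} \wedge w \wedge x \wedge \bar{y}) \vee (\bar{x} \wedge w \wedge z \wedge \bar{w}))$ (Distributive)
$= (0 \vee 0) \vee (0 \vee 0)$ (Complementation)
$= 0$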
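The result of Exercise 3–10 can be confirmed by exhaustive evaluation. The sketch below is not part of the original solution; the function name `expr` is hypothetical, and it simply evaluates the starting expression for all 16 input combinations.

```python
from itertools import product


def expr(w: bool, x: bool, y: bool, z: bool) -> bool:
    """Starting expression of Exercise 3-10."""
    left = (y and not z) or (not x and w)
    right = (x and not y) or (z and not w)
    return left and right


# True only if the expression is 0 for every input combination.
always_zero = not any(expr(*v) for v in product([False, True], repeat=4))
```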
3–13 The dual is found by simply replacing $\vee$ with $\wedge$ and $\wedge$ with $\vee$.
This equation is true when x and y are the same, thus in normal form:
3–20 We first directly write the equation from the schematic and then simplify
using DeMorgan’s law:
$f(x, y, z) = (x \vee y) \wedge (x \vee y)$
$= (x \wedge y) \vee (x \wedge y)$
3–23 We directly draw the schematic from the equations, using inversion bub-
bles instead of full inverters:
Chapter 4
4–1 We enumerate all paths through the switch and write the logic in
sum-of-products form:
[Figure: switch network on inputs w, x, y, and z]
4–11 By substituting a closed switch for PFETs with a 0 input and NFETs
with a 1 input (and open for PFET/1 and NFET/0), we can see that the
circuit outputs a 1 for the given input:
[Figure: CMOS gate with inputs a, b, c and output f]
Copyright (c) 2012 by W.J. Dally and R.C. Harting, all rights reserved 11
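4–17 1.
$R_{s,N} = K_{RN} \frac{L}{W}$
$W_P = W_N K_p \frac{1}{0.5}$
$C_g = W L K_c$
$R_{s,N,130} = 1$ kΩ
$R_{s,N,28} = 2.1$ kΩ
$C_{g,N,130} = 4$ fF
$C_{g,N,28} = 0.56$ fF
$W_{N,130} = 100 L_{min}$
$W_{N,28} = 52 L_{min}$
$C_{g,P,130} = 20$ fF
$C_{g,P,28} = 1.5$ fF
2.
$T = 400 (1 + K_P) \tau_n$
$T_{130} = 5.46$ ns
$T_{28} = 1.10$ ns
3.
$\Theta = \frac{N}{T}$
$\Theta_{130} = 10.9$ Gops/s
$\Theta_{28} = 1.16$ Tops/s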
4–18
[Figure: schematic with inputs a and c]
4–20
4–23 We write the equation, simplify it, and then draw the circuit:
[Figure: gate with output f]
4–26 Because CMOS gates are inverting, we write the equation using the
minterms that set f = 0 (000, 100, 110, 111). Next, we simplify and
draw the gate:
$\bar{f}(c, b, a) = (\bar{c} \wedge \bar{b} \wedge \bar{a}) \vee (c \wedge \bar{b} \wedge \bar{a}) \vee (c \wedge b \wedge \bar{a}) \vee (c \wedge b \wedge a)$
$= (\bar{b} \wedge \bar{a}) \vee (c \wedge b)$
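As a sanity check on Exercise 4–26, the two-term expression can be compared against the four stated zero minterms. This is a minimal sketch; the placement of the complements (not b and not a in the first term, c and b in the second) is inferred from the minterm list 000, 100, 110, 111, and the names below are hypothetical.

```python
from itertools import product

# (c, b, a) combinations for which f = 0, as listed in the solution.
ZERO_MINTERMS = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)}


def f_bar_simplified(c: int, b: int, a: int) -> bool:
    """Simplified pull-down function: (not b and not a) or (c and b)."""
    return bool((not b and not a) or (c and b))


matches = all(
    f_bar_simplified(c, b, a) == ((c, b, a) in ZERO_MINTERMS)
    for c, b, a in product((0, 1), repeat=3)
)
```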
[Figure: CMOS gate with output f]
$f(d, c, b, a) = d \vee c \vee (b \wedge a)$
Chapter 5
5–1 The pull-down (NFET) transistor has resistance equal to that of a
minimum inverter, giving a maximum (and minimum) fall time equal to the
product of the fanout and the inverter delay. That is, $t_{f,max} = 4 t_{inv}$.
The rise time, however, sees a resistance one-third that of a minimum
inverter. Thus, $t_{r,max} = \frac{4}{3} t_{inv}$.
$E = \sum_{i=0}^{n} C_{i+1} V^2$
$= \sum_{i=0}^{n} \frac{C_{i+1}}{C_{inv}} E_{inv}$
$= (2 + 4 + 8 + 16 + 32 + 64 + 128 + 256) E_{inv}$
$= 510 E_{inv}$
5–8 For part (a), we draw the schematic below as an AND-OR-ANDI gate. To
size the transistor diagram, we want the maximum pull-up and pull-down
resistance to be that of a minimum inverter. Recall that resistance is
inversely proportional to width; we find the resistance of the pull-down
path c-d-a to be $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$. The pull-down path b-a is
$\frac{2}{3} + \frac{1}{3} = 1$. In this case, having $W_a = 2$, $W_c = W_d = 4$, and $W_b = 2$ is
also a valid answer. We apply a similar methodology to the pull-up network.
[Figure: (a) gate schematic with inputs a, b, c, d and output f; (b) sized transistor diagram]
To calculate the logical effort of each input, we find the total input
capacitance and divide by that of an inverter, $1 + K_p$:
$LE_a = \frac{3 + K_p}{1 + K_p}$
$LE_b = \frac{1.5 + 2K_p}{1 + K_p}$
$LE_c = \frac{3 + 2K_p}{1 + K_p}$
$LE_d = \frac{3 + 2K_p}{1 + K_p}$
5–14 To find the minimum delay, we first find the total effort. Next, we find
Using this stage effort (FO × LE) we find the new optimal sizes and delay:
Signal  Fanout  Size of Driven Gate  Logical Effort (Kp = 1.3)  Delay
i to i + 1 i + 1 i to i + 1 i + 1 i to i + 1
b       1.70    3.19                 1.87                       3.17
cN      1.17    5.4                  2.7                        3.17
d       3.17    6.3                  1                          3.17
eN      3.17    20                   1                          3.17
TOTAL a to eN 12.68 tinv
5–17 To solve this problem, we use the same methodology as in Example 5.6:
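The size of our inverter is $W_N = 20 \cdot 8 L_{min} = 160 L_{min}$.
$R_w = 10\,\frac{\Omega}{\mu m} \times 500\,\mu m = 5000\,\Omega$,
$C_w = 0.18\,\frac{fF}{\mu m} \times 500\,\mu m = 90$ fF,
$R_r = \frac{K_{RN}}{W_N} = \frac{4.2 \times 10^4}{160} = 263\,\Omega$,
$C_r = W_N (1 + K_P) K_C = 160 (1 + 1.3) \, 2.8 \times 10^{-17} = 10.3$ fF.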
5–20 1. As was shown in the text, the delay per millimeter of an optimally
sized and spaced wire (61µm) is 228ps. 5mm is just 5 times that
amount: 1.14ns. The energy to transmit one bit ($V_{dd} = 1$) is:
$E = \frac{1}{2} V_{dd}^2 \cdot (C_{wire} + C_d)$
$E = \frac{1}{2} V_{dd}^2 \cdot (5000\,\mu m \cdot 0.18\,\frac{fF}{\mu m} + 82 \cdot 108 (1 + 1.3) \, 2.8 \times 10^{-17})$
$E = 0.75$ pJ
We found the total driver capacitance by multiplying the capacitance
of each driver by the number of drivers.
2. By doubling the spacing between wires to 122µm, we only need 41
segments. This reduces our driver energy by half and gives an energy
per bit of 0.6pJ. The delay calculation is:
$R_w = 10\,\frac{\Omega}{\mu m} \times 122\,\mu m = 1220\,\Omega$,
$C_w = 0.18\,\frac{fF}{\mu m} \times 122\,\mu m = 22$ fF,
$R_r = \frac{K_{RN}}{W_N} = \frac{4.2 \times 10^4}{108} = 389\,\Omega$,
$C_r = W_N (1 + K_P) K_C = 108 (1 + 1.3) \, 2.8 \times 10^{-17} = 7.0$ fF
$D_l = 0.4 R_w C_w + R_r C_w + (R_r + R_w) C_r$
$= (0.4)(1220)(22) + (389)(22) + (1220 + 389)(7.0)$
$= 10736 + 8558 + 11263$ fs $= 31$ ps
$D = 1.27$ ns
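The per-segment delay arithmetic of Exercises 5–17 and 5–20 can be reproduced with a short script. This is a sketch under the constants quoted in the solutions (10 Ω/µm, 0.18 fF/µm, a K_RN of 4.2 × 10^4, a K_C of 2.8 × 10^-17, a K_P of 1.3); the function name `segment_delay` and its parameter names are hypothetical.

```python
K_RN = 4.2e4    # driver resistance constant quoted in the solution
K_C = 2.8e-17   # gate capacitance constant (farads per unit width)
K_P = 1.3       # PFET/NFET width ratio used in the solution


def segment_delay(length_um: float, w_n: float,
                  r_per_um: float = 10.0,
                  c_per_um: float = 0.18e-15) -> float:
    """Delay in seconds of one repeated wire segment.

    Implements D = 0.4*Rw*Cw + Rr*Cw + (Rr + Rw)*Cr.
    """
    r_w = r_per_um * length_um          # wire resistance (ohms)
    c_w = c_per_um * length_um          # wire capacitance (farads)
    r_r = K_RN / w_n                    # repeater output resistance (ohms)
    c_r = w_n * (1 + K_P) * K_C         # repeater input capacitance (farads)
    return 0.4 * r_w * c_w + r_r * c_w + (r_r + r_w) * c_r


d_segment = segment_delay(122, 108)     # one 122 um segment of Exercise 5-20
d_total = 41 * d_segment                # 41 segments over 5 mm
d_517 = segment_delay(500, 160)         # the 500 um wire of Exercise 5-17
```

With unrounded intermediate values the segment delay comes out slightly below the 31 ps obtained from the rounded figures, so the total is close to, but not exactly, 1.27 ns.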
5–22 We calculate energy in the same way we did in Exercise 5–4. Remember
that the input capacitance is the product of the size and logical effort of
a gate:
$E_{101} = \sum_{i=0}^{n} C_{i+1} V^2$
$= \sum_{i=0}^{n} \frac{C_{i+1}}{C_{inv}} E_{inv}$
$= (1 + 2 \cdot 1.87 + 4 \cdot 2.7 + 4 + 20) E_{inv}$
$= 39.5 E_{inv}$
Chapter 6
Solutions: Combinational
Logic Design
6–2 (a)
No. in out
0 0000 1
1 0001 1
2 0010 1
3 0011 1
4 0100 0
5 0101 1
6 0110 0
7 0111 0
8 1000 1
9 1001 0
10 1010 0
11 1011 0
12 1100 0
13 1101 1
14 1110 0
15 1111 0
(c)
Number of variables
4     3     2     1
0000  000X  00XX
0001  00X0
0010  X000
0011  00X1
0101  0X01
1000  001X
1101  X101
The prime implicants of this function are 00XX, X000, 0X01, and
X101.
(d) The essential prime implicants of the function are 00XX (only cover
of 2,3), X000 (only one to cover 8), and X101 (only one to cover 13).
(e) The function is covered by the three essential prime implicants.
(f) See figure below:
[Figure: (b) Karnaugh map of the function; (f) gate-level schematic with inputs a, b, c, d and output f]
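The claim in part (e) of Exercise 6–2, that the three essential prime implicants cover the function, can be checked mechanically. The helper name `covers` below is hypothetical; the on-set is read directly from the truth table in part (a).

```python
def covers(implicant: str, minterm: int, width: int = 4) -> bool:
    """True if an implicant such as '00XX' contains the given minterm."""
    bits = format(minterm, f"0{width}b")
    return all(p in ("X", b) for p, b in zip(implicant, bits))


ON_SET = {0, 1, 2, 3, 5, 8, 13}      # rows of the truth table with out = 1
COVER = ["00XX", "X000", "X101"]     # the essential prime implicants

covered = {m for m in range(16) if any(covers(p, m) for p in COVER)}
```

The cover is exact when `covered` equals `ON_SET`: every 1 is included and no 0 is.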
6–4 (a)
No. in out
0 0000 1
1 0001 1
2 0010 1
3 0011 1
4 0100 0
5 0101 1
6 0110 0
7 0111 0
8 1000 1
9 1001 0
10 1010 X
11 1011 X
12 1100 X
13 1101 X
14 1110 X
15 1111 X
Number of variables
4 3 2 1
0000 000X 00XX
0001 00X0 X0X0
0010 X000 1XX0
0011 00X1
0101 0X01
1000 001X
X010
X011
X101
10X0
1X00
The prime implicants of this function are 00XX, X0X0, 1XX0, 0X01,
X101, and X011.
(c) The essential prime implicant of the function is 00XX (only cover of
3).
(d) The function is covered by the implicants 00XX, X0X0, and X101.
[Figure: (b) Karnaugh map of the function with don't cares; (f) gate-level schematic with inputs a, b, c, d and output f]
Number of variables
5 4 3 2 1
00010 0001X X0X11
00011 00X11
00101 0X011
00111 X0011
01011 001X1
01101 0X101
10001 X1101
10011 100X1
10111 10X11
11101 1X111
11111 111X1
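The essential prime implicants are: 0001X, 0X011, and 100X1. A possible
cover of the function is: 0001X, 0X011, 100X1, X0X11, 0X101, and 111X1.
One solution in sum-of-products form is:
$f(e, d, c, b, a) = (\bar{e} \wedge \bar{d} \wedge \bar{c} \wedge b) \vee (\bar{e} \wedge \bar{c} \wedge b \wedge a) \vee (e \wedge \bar{d} \wedge \bar{c} \wedge a) \vee (\bar{d} \wedge b \wedge a) \vee (\bar{e} \wedge c \wedge \bar{b} \wedge a) \vee (e \wedge d \wedge c \wedge a)$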
6–12 Using the same Karnaugh-map of Exercise 6–4, we can see the cover of
maxterms (when the output is 0) is OR(0XX0), OR(X00X), and OR(X0X1).
Our function is:
[Figure: Karnaugh map]
$f(d, c, b, a) = (b \wedge a) \vee (d \wedge c) \vee (d \wedge a) \vee (\bar{d} \wedge c \wedge b) \vee (\bar{d} \wedge c \wedge b)$
[Figures: three Karnaugh maps]
6–28 Using the same Karnaugh-map as Exercise 6–14, we can find a cover of
maxterms as:
$f(d, c, b, a) = (d \vee c \vee b) \wedge (d \vee cb \vee a) \wedge (\bar{d} \vee c \vee b \vee a)$
6–35 Using the same Karnaugh-map as Exercise 6–21, we can find a cover of
maxterms as:
$f(d, c, b, a) = (d \vee c \vee b) \wedge (c \wedge b \wedge a)$
[Figure: Karnaugh maps for the three outputs]
We can share the $b \wedge \bar{a}$ (XX10) term between outputs 0 and 2. We also
share the implicant $c \wedge \bar{b}$ (X10X) between outputs 0 and 1.
6–44 The hazard occurs when a = b = c = 1 and then a toggles to 0. The
output may go low for a period of time equal to that of the NOT-AND
delay. The simplest fix for this problem is to remove the a input from the
AND gate. This simplifies the logic equation from $f = \bar{a} \vee (c \wedge b \wedge a)$ to
just $f = \bar{a} \vee (c \wedge b)$.
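Removing the a input from the AND gate is only safe if the logic function is unchanged. The sketch below checks this by exhaustive evaluation, assuming the first term of both equations is the complement of a (the overbar does not survive in the extracted text); the function names are hypothetical.

```python
from itertools import product


def f_original(c: int, b: int, a: int) -> bool:
    """f = (not a) or (c and b and a), the circuit with the hazard."""
    return bool((not a) or (c and b and a))


def f_fixed(c: int, b: int, a: int) -> bool:
    """f = (not a) or (c and b), after removing a from the AND gate."""
    return bool((not a) or (c and b))


equivalent = all(
    f_original(*v) == f_fixed(*v) for v in product((0, 1), repeat=3)
)
```

The check covers static behavior only; the hazard itself is a timing effect and does not appear in a truth-table comparison.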
Chapter 7
Solutions: Verilog
Descriptions of
Combinational Logic
input [3:0] a;
output b;
assign b = (~a[3] & ~a[2]) |
(~a[2] & ~a[1] & ~a[0]) |
(a[2] & ~a[1] & a[0]);
endmodule // fib_assign
7–5 The testbench below requires the user to manually verify the output of
dut0. It automatically checks that all 4 implementations provide the same
answer.
module fib_tb ;
reg [3:0] a;
wire [3:0] o;
reg error;
fib_case dut0(a, o[0]);
fib_casex dut1(a, o[1]);
end
endmodule // fib_tb
reg [3:0] o;
reg v;
always @(*) begin
  casex(a)
    16'b1xxxxxxxxxxxxxxx: {v,o} = 5'h1F;
    16'b01xxxxxxxxxxxxxx: {v,o} = 5'h1E;
    16'b001xxxxxxxxxxxxx: {v,o} = 5'h1D;
    16'b0001xxxxxxxxxxxx: {v,o} = 5'h1C;
    16'b00001xxxxxxxxxxx: {v,o} = 5'h1B;
    16'b000001xxxxxxxxxx: {v,o} = 5'h1A;
    16'b0000001xxxxxxxxx: {v,o} = 5'h19;
    16'b00000001xxxxxxxx: {v,o} = 5'h18;
    16'b000000001xxxxxxx: {v,o} = 5'h17;
    16'b0000000001xxxxxx: {v,o} = 5'h16;
    16'b00000000001xxxxx: {v,o} = 5'h15;
    16'b000000000001xxxx: {v,o} = 5'h14;
    16'b0000000000001xxx: {v,o} = 5'h13;
    16'b00000000000001xx: {v,o} = 5'h12;
    16'b000000000000001x: {v,o} = 5'h11;
    16'b0000000000000001: {v,o} = 5'h10;
    default: {v,o} = 5'h00;
  endcase // casex (a)
end // always @(*)
endmodule // ff1
Chapter 8
Solutions: Combinational
Building Blocks
wire [15:0] w;
assign b = (`SS_0 & {7{w[0]}}) |
           (`SS_1 & {7{w[1]}}) |
           (`SS_2 & {7{w[2]}}) |
           (`SS_3 & {7{w[3]}}) |
           (`SS_4 & {7{w[4]}}) |
           (`SS_5 & {7{w[5]}}) |
           (`SS_6 & {7{w[6]}}) |
           (`SS_7 & {7{w[7]}}) |
           (`SS_8 & {7{w[8]}}) |
           (`SS_9 & {7{w[9]}});
module progPriEnc83(a, p, b) ;
input [7:0] a, p;
output [2:0] b;
wire [7:0] g;
wire [15:0] c = ({1'b0, c[15:1]} & {1'b0, ~a, ~a[7:1]}) | {p, 8'd0};
assign g = a & (c[15:8] | c[7:0]);
Enc83 enc(g, b);
endmodule // progPriEnc
assign gt = gta[0];
endmodule // MagCompML
8–17 module funnelShift(a, n, b) ;
parameter i = 16;
parameter j = 8;
parameter l = 3;
input [i-1:0] a;
input [l-1:0] n;
output [j-1:0] b;
assign b = a >> n;
endmodule // funnelShift
8–19 module findMin (a, b, c, z) ;
parameter n = 16;
input [n-1:0] a, b, c;
output [n-1:0] z;
Mux3 #(n) mout(a, b, c, {~agtb & ~agtc, ~bgtc & agtb, agtc & bgtc}, z);
endmodule // findMin
8–21 Our ROM would need to have 16 entries (one for each number). Ad-
dressed with the 4 input bits, each value in our table would be a single
bit. We store a 1 in locations indexed by a prime number (2, 3, 5, 7, etc.)
and a 0 everywhere else.
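The ROM contents described in Exercise 8–21 can be generated rather than written by hand. This is a small sketch; the names `is_prime` and `PRIME_ROM` are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for 4-bit values."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


# One single-bit entry per 4-bit address: 1 if the address is prime.
PRIME_ROM = [1 if is_prime(addr) else 0 for addr in range(16)]
```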
Chapter 9
Solutions: Combinational
Examples
9–3 The way we build our multiple-of-5 circuit is analogous to the multiple of
3, as shown below:
[Figure: a chain of eight Mul5 bit blocks, one per input bit in7 through in0; each passes a 3-bit remainder (rin to rout) to the next, the first rin is 0, and out is asserted when the final remainder equals 0]
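The bit-serial structure of the multiple-of-5 circuit in Exercise 9–3 can be modeled in software. The sketch below assumes the chain processes the most significant bit first and that each stage computes (2 × rin + bit) mod 5; the function name `multiple_of_5` is hypothetical.

```python
def multiple_of_5(value: int, width: int = 8) -> bool:
    """Model of a chain of per-bit remainder stages, MSB first."""
    rem = 0                                  # remainder input to the first stage
    for i in reversed(range(width)):
        bit = (value >> i) & 1
        rem = (2 * rem + bit) % 5            # one bit stage: 3-bit remainder out
    return rem == 0                          # final comparison with 0


results = [multiple_of_5(n) for n in range(256)]
```

The remainder never exceeds 4, which is why three bits per stage are enough.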
module testMul5 ;
reg [7:0] in;
reg error;
wire out;
Multiple_of_5 dut(in, out);
initial begin
in = 0;
error = 0;
repeat(256) begin
#100;
if(out !== ((in % 5) == 0)) begin
$display("ERROR %d -> %b", in, out);
error = 1;
end
in = in + 1;
end
if(error === 0) $display("PASS");
end // initial begin
endmodule // testMul5
9–9 One correct solution is to implement this function using a case statement.
For example: `MONDAY: tomorrow = `TUESDAY.
9–10 One possible solution is to pass the year input to the DaysInMonth mod-
ule. We can then modify the case statement:
casex({year[1:0], month})
  6'bxx0100: days = 5'd30;
  6'bxx0110: days = 5'd30;
  6'bxx1001: days = 5'd30;
  6'bxx1011: days = 5'd30;
  6'b000010: days = 5'd29; // year divisible by 4 => leap year
  6'bxx0010: days = 5'd28;
  default: days = 5'd31;
endcase // casex ({year[1:0], month})
9–12 There are two different options when designing this circuit. First, we
could modify the comparator block to output a signal gteq for greater
than or equal. Leaving the rest of the logic as is, the arbiter would break
ties to the higher input. A second option is to simply switch the a and b
inputs to each comparator.
9–16 The simplest way to encode the lowest output is to place an inverter at the
output of each magnitude comparator. That way, the comparator output
is the f value. Ties are broken in favor of the higher number input.
9–18 A version of OneInRow and OneInArray are shown below. We do not use
any sort of strategy in selecting the next position to play. We integrated
the OneInArray function into the main module as the 2nd lowest priority.
The OneInArray module itself is nearly identical to the TwoInArray
module.
endmodule // tttLegal
Chapter 10
Solutions: Arithmetic
Circuits
10–14
  1010
+ 0111
 10001
10–18 To count the seven input bits, labeled abcdefg, we need four full adders.
The first two have binary weight 0 and count the number of 1s in inputs
abc and def . Next is a third FA with weight 0 that counts the two previous
sums and the seventh input, g. Finally, the three carry-outs of the FAs
are sent to a final adder with weight $2^1$. The schematic is shown below:
[Figure: four full adders (FA) counting inputs a through g to produce s2, s1, and s0]
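The four-adder arrangement described in Exercise 10–18 can be verified exhaustively. This is a behavioral sketch only; the names `full_adder` and `count7` are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def count7(a: int, b: int, c: int, d: int, e: int, f: int, g: int) -> int:
    """Count the ones in seven bits using four full adders."""
    s_abc, c_abc = full_adder(a, b, c)          # weight-0 adder on abc
    s_def, c_def = full_adder(d, e, f)          # weight-0 adder on def
    s0, c_g = full_adder(s_abc, s_def, g)       # weight-0 adder: two sums plus g
    s1, s2 = full_adder(c_abc, c_def, c_g)      # weight-2 adder on the three carries
    return (s2 << 2) | (s1 << 1) | s0


all_correct = all(count7(*bits) == sum(bits) for bits in product((0, 1), repeat=7))
```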
10–20 Since we are not concerned about subtraction (yet), an overflow condi-
tion is detected when any cout is produced by the adder. This output
selects, using a multiplexer, between the computed sum and $2^n - 1$ as
shown below:
[Figure: n-bit adder whose cout drives a multiplexer selecting between the sum s and the constant $2^n - 1$]
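The saturating behavior of Exercise 10–20 can be modeled as follows. This is a minimal sketch for unsigned n-bit operands; the function name `saturating_add` and the default width of 4 are assumptions made for illustration.

```python
def saturating_add(a: int, b: int, n: int = 4) -> int:
    """Unsigned n-bit add that clamps to 2**n - 1 on overflow."""
    total = a + b
    cout = total >> n                  # carry out of the n-bit adder
    max_value = (1 << n) - 1           # the constant 2**n - 1
    return max_value if cout else total
```

For example, with n = 4 the sum 10 + 7 overflows and the multiplexer selects 15.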
from 1111 to 0000, a carry-out is generated. This carry out is added back
into the input in order to account for the extra step needed to “go by” -0.
[Figure: n-bit adder with inputs a, b, and sub, producing out]
10–35 The design of this circuit is similar to that of Exercise 10–20. However,
we now must detect both positive and negative overflow (Table 10.3).
These two bits and their NOR are used as the three selection bits into a
mux.
[Figures: worked arithmetic examples and an adder block diagram]
10–54 We could do the division by converting to binary (or decimal) and then
converting back to hex. Instead, we list the multiplication table for E to
find the quotient. By looking in the table below, we find that AE ÷ E = C.
The remainder is AE - A8 = 6.
× E
0 0
1 E
2 1C
3 2A
4 38
5 46
6 54
7 62
8 70
9 7E
A 8C
B 9A
C A8
D B6
E C4
F D2
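The table and the result of Exercise 10–54 can be cross-checked with Python's hexadecimal literals. This is only a verification sketch; the variable names are hypothetical.

```python
# Quotient and remainder of AE / E, computed directly.
quotient, remainder = divmod(0xAE, 0xE)

# Multiplication table for E, keyed and valued as upper-case hex strings.
times_e = {format(d, "X"): format(d * 0xE, "X") for d in range(16)}
```

The quotient is the largest table row whose product does not exceed AE, which is row C (A8).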
Chapter 11
11–1 $1.0101_2 = 1 \times 1 + 1 \times 0.25 + 1 \times 0.0625 = 1.3125_{10}$
11–2 The number represented is $2\frac{11}{16}$.
11–5 The first two bits are 01 to represent a positive integer part of one.
We find the remaining digits as follows:
Remainder (decimal)  Fixed-point value (binary)
0.5999               0.00000
-0.5                 +0.10000
0.0999               0.10000
-0.0625              +0.00010
0.0374               0.10010
-0.03125             +0.00001
0.00615              0.10011
Our fixed-point number is $01.10011_2$ or $1.59375_{10}$. The absolute error is
0.00615 and the relative error is 0.38%. We did not round our fixed-point
value because $0.00615 < \frac{1}{64}$.
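The digit-by-digit subtraction used in Exercises 11–5 and 11–6 is a greedy conversion: each bit weight is subtracted from the remainder whenever it fits. A minimal sketch follows; the function name `to_fixed_fraction` is hypothetical, and the result is truncated rather than rounded.

```python
def to_fixed_fraction(value: float, frac_bits: int) -> tuple[str, float]:
    """Greedy conversion of a fraction in [0, 1) to binary digits.

    Returns the fractional bits as a string and the leftover remainder,
    which is the absolute error of the truncated result.
    """
    bits = []
    remainder = value
    for i in range(1, frac_bits + 1):
        weight = 2.0 ** -i               # 0.5, 0.25, 0.125, ...
        if remainder >= weight:
            bits.append("1")
            remainder -= weight
        else:
            bits.append("0")
    return "".join(bits), remainder


bits_5, err_5 = to_fixed_fraction(0.5999, 5)   # fractional part in 11-5
bits_6, err_6 = to_fixed_fraction(0.3775, 5)   # value in 11-6
```

The leftover remainder is smaller than half of the last bit weight in both cases, so rounding would not change either result.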
11–6 The first two bits are 00 to represent a positive number with no integer
component. We find the remaining digits as follows:
Remainder (decimal)  Fixed-point value (binary)
0.3775               0.00000
-0.25                +0.01000
0.1275               0.01000
-0.125               +0.00100
0.0025               0.01100
Our fixed-point number is $00.01100_2$ or $0.375_{10}$. The absolute error is
0.0025 and the relative error is 0.66%.
11–10 The maximal relative error occurs at $\frac{7}{64} = 0.109375$. To view this
graphically, you can use the following WolframAlpha¹ command:
11–11 Because we have both positive and negative numbers, we need a sign
bit. We also need 4 integral bits to represent magnitudes from 0 to $10_{10}$.
The minimal accuracy is $\frac{1}{16} = 0.0625$, which gives a resolution of $\frac{1}{8}$. Thus,
our format is s4.3.
11–13 The value is $\frac{15}{16} \times 2^{7-3} = 15$.
11–14 The value is $-1 \times \frac{4}{8} \times 2^{1-3} = -0.125$.
11–17 The sign bit is 1, since the number is negative. 23 in binary is 10111,
which is rounded to $110_2 \times 2^2 = \frac{6}{8} \times 2^5$. The exponent is the sum of
the bias and 5: 01101. The answer is 1110E01101. The absolute error is
$24 - 23 = 1$ and the relative error is 4.3%.
11–18 The sign bit is 0, and $1000000_{10} = \mathrm{F4240}_{16} \approx 100_2 \times 2^{18} = \frac{4}{8} \times 2^{21}$.
Adding the bias gives a final answer 0100E11101. The number represented
is $2^{20} = 1048576$, an error of 48576, or 4.9%.
11–21 The mantissa of this representation needs to be 4 bits to bound the error
to 10%. The smallest magnitude that must be represented accurately is
$\frac{1}{32} = \frac{8}{16} \times 2^{-4}$. Our exponent bias is -4 and the maximum exponent is 4,
giving 9 different values. The final representation is s4E4.
11–25 Our additions to the floating point adder presented are shown below.
The initial design included a FF1 shift unit capable of shifting the LSB
to MSB, so we did not modify it. We selectively invert the number with
the smaller magnitude based on the XOR of both input signs and the
subtract signal. The output sign, cs is that of the input with the highest
magnitude (bs is inverted in subtraction). Note that the highest magni-
tude comparison must include the mantissa bits of the input to handle
exponent ties.
1 www.wolframalpha.com
[Figure: floating-point adder/subtractor datapath with inputs ae, be, am, bm, as, bs, sub and outputs ovf, ce, cm, cs]
11–29 Shown below, we handle gradual underflow by right shifting the mantissa
if the newly computed exponent is less than 0. We also include a MUX to
clamp the output exponent to 0.
[Figure: floating-point adder datapath with inputs ae, be, am, bm and outputs ovf, ce, cm, extended with a right shifter and a multiplexer that clamps the output exponent to 0]
11–32 Our adder first must include the implicit-1 if present. We must also
modify the shift logic to account for the fact that exponents of 0 and 1
weight the mantissa equally. We then use the same logic as in the previous
problem (omitting the mantissa MSB) for final exponent and mantissa
calculation.
[Figure: floating-point adder datapath with implicit-1 insertion (iam, ibm set when the exponent is not 0), inputs ae, be, am, bm, and outputs ovf, ce, cm; the modified exponent logic uses de = a - b - 1 if a > 0 and b = 0, and de = b - a - 1 if b > 0 and a = 0]
Chapter 12
12–1 We use the propagate and generate equations shown in Equations 12.10-
12.12 to formulate our comparator, setting the output to the generate
signal of the final comparator. (The propagate signal of the final com-
parator signals equality.)
assign po = &pi;
assign go = gi[4] | (gi[3] & pi[4]) | (gi[2] & (&pi[4:3])) |
(gi[1] & (&pi[4:2])) | (gi[0] & (&pi[4:1]));
endmodule // PG5
assign po = &pi;
assign go = gi[5] | (gi[4] & pi[5]) | (gi[3] & (&pi[5:4])) |
(gi[2] & (&pi[5:3])) | (gi[1] & (&pi[5:2])) | (gi[0] & (&pi[5:1]));
endmodule // PG6
wire [31:0] g, p;
assign g = a & ~(b);
wire p32;
PG6 pg2(p6[5:0], g6[5:0], p32, agtb);
endmodule // comp32
12–3 The code is shown below and does not use PG modules. Instead, we use a
look ahead tree to detect the presence of a 1 in any lower (higher priority)
bits.
assign b[0] = 0;
assign b[16] = g8[0];
arb4 a20(b[0], g4[2:0], {b[12], b[8], b[4]});
arb4 a21(b[16], g4[6:4], {b[28], b[24], b[20]});
12–5 One (possibly the best) option for implementing this solution is to
precompute the value 3a and modify the Booth recoders to look at two bits
at a time and select between {0, a, 2a, 3a}. We also need to remove all
sign extensions internal to the multiplier. The other option is to append
0 to the MSB of both a and b, converting the n × m multiplication to
(n + 1) × (m + 1).
12–6 The two tables are below. We can quickly check that the first table is cor-
rect because the sum of each column gives the correct weight from each bit
position. The Verilog would closely follow, using a selection multiplexer,
inverter, and precomputed 3× sum.
bit     b8    b7   b6  b5   b4  b3  b2  b1  b0  b-1
weight  -256  128  64  32   16  8   4   2   1   N/A
d2      -256  128  64  64
d1                     -32  16  8   8
d0                                  -4  2   1   1
lg weight 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
pps 8 9 7 8 6 7 5 6 4 5 3 4 2 3 1 2
stage 1 6 5 6 5 4 5 4 3 4 3 2 3 2 1
stage 2 4 4 4 3 4 3 3 2 3 2 2 1
stage 3 3 3 3 2 3 2 2 2 1
stage 4 2 2 2 2 1
lg weight 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
pps 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
stage 1 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
stage 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
stage 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
stage 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
12–12 The first table below shows an updated version of Table 12.3, accounting
for the new 7-input counter. The next two tables show the number of
remaining terms to add at each bit-position. We were unable to save a
stage of logic using this scheme, though this is not always the case.
in out
i 2i 3i
1 1 0 0
2 1 1 0
3 1 1 0
4 2 1 0
5 2 2 0
6 2 2 0
7 1 1 1
8 2 1 1
9 2 2 1
lg weight 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
pps 8 9 7 8 6 7 5 6 4 5 3 4 2 3 1 2
stage 1 5 4 2 5 3 3 4 3 4 3 2 3 2 1
stage 2 3 3 3 3 2 2 3 2 3 2 2 1
stage 3 2 2 2 2 2 2 2 2 1
stage 4
lg weight 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
pps 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
stage 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
stage 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4
stage 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3
stage 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
12–15 The tables are shown below. We must sign-extend the partial products.
in out
i 2i
1 1 0
2 1 1
3 1 1
4 2 1
5 2 2
6 2 2
7 1 1
8 2 1
9 2 2
10 2 2
11 2 2
12 3 3
lg weight 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
pps 10 10 10 10 11 8 9 6 6 7 4 4 5 2 2 3
stage 1 7 7 8 7 6 6 5 4 4 4 3 4 3 2 2 1
stage 2 5 6 5 5 4 4 3 3 3 3 2 3 1
stage 3 4 4 4 4 3 3 3 2 2 2 2 1
stage 4 3 3 3 3 2 2 1
stage 5 2 2 2 2 1
lg weight 29 28 27 26 25 24 23 22 21 20 19 18 17 16
pps 10 10 10 10 10 10 10 10 10 10 10 10 10 10
stage 1 7 7 7 7 7 7 7 7 7 7 7 7 7 7
stage 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5
stage 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4
stage 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3
stage 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Chapter 13
Arithmetic Examples
13–1 We can Booth-encode (Radix-4) and use the Wallace tree shown in the
solution to Exercise 12–11. Slightly more interesting, however, is the nega-
tion on the input to the 2nd multiplier. To do this, we invert the input
that is not the input to the Booth-recoders and add an additional carry
in of 1 to the weight 0 term. This will not add any stages to the Wallace
tree.
13–4 One of the simplest ways of fixing this bug is to add the following code
to the Verilog:
This will add minimal delay to the system, since the comparison is done
in parallel with the rest of the conversion process.
13–7 Shown below is our solution. We have increased the width of the mul-
tiplier output to accommodate for s2.5 weights. We did not widen the
adder output since the sum of the weights had a magnitude less than one.
This ensures that adder output can not have a magnitude greater than
that of the greatest input.
[Figure: four inputs x0 to x3 (8b float) are converted to fixed point (s11.0), multiplied by weights w0 to w3 (s2.5) to give s13.5 products, summed (s11.5), and converted back to an 8b float y]
13–8 See the image below, which has a larger output of the adder. To detect
overflow from the addition, we check that the upper 5 bits (sign, 4 MSB)
are equivalent. If they are not, an overflow (relative to a s11.0 number)
has occurred and the output is saturated.
[Figure: the same datapath as in Exercise 13–7, with the adder output widened to s15.5 before the fixed-to-float conversion]
13–11 Our cross product block diagram is shown below. We can factor out the
multiply-and-subtract module and instantiate 3 copies of it. For extra speed,
the full multipliers can be replaced with the ones used in Exercise 13–1.
[Figure: three multiply-and-subtract blocks with s3.14 inputs, s6.28 products, and s7.28 outputs: cx = ay × bz - az × by, cy = az × bx - ax × bz, cz = ax × by - ay × bx]
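The factoring described in Exercise 13–11 can be illustrated behaviorally: one multiply-and-subtract helper, used three times. This sketch ignores the fixed-point formats and works on plain numbers; the names `mul_sub` and `cross` are hypothetical.

```python
def mul_sub(p: float, q: float, r: float, s: float) -> float:
    """The shared block: two multiplies followed by a subtract."""
    return p * q - r * s


def cross(a: tuple, b: tuple) -> tuple:
    """Cross product built from three copies of mul_sub."""
    ax, ay, az = a
    bx, by, bz = b
    cx = mul_sub(ay, bz, az, by)
    cy = mul_sub(az, bx, ax, bz)
    cz = mul_sub(ax, by, ay, bx)
    return (cx, cy, cz)
```

For example, `cross((1, 0, 0), (0, 1, 0))` gives the unit vector along z.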
13–12 The code is shown below. We find the values $x - 1$, $(x - 1)^2$, and $(x - 1)^3$
using Verilog’s built-in multiplication. We then shift, sign extend, and
manually line up the decimal points before the final add. We only include
20 bits in the final adder (instead of 27) because we discard the lower bits
of $(x - 1)^3$ which cannot affect the round. The maximum error is 1.6% at
x = 2.
13–14 The simple converter, using Verilog addition and multiplication, is shown
below.
assign b = d[15:12] * 14'd1000 +
           d[11:8]  * 14'd100 +
           d[7:4]   * 14'd10 + d[3:0];
endmodule // bcd2bin
Chapter 14
14–1 Leaving the input low for a minimum of 3 clock edges will put this FSM
into state 00.
[Figure: state diagram cycling through gns, yns, rns, gew, yew, and rew; gns holds until carew is asserted and rst returns the machine to gns]
We have added new rns and rew states to set the lights to red in both
directions. The state table is shown below:
state  next state          out
       carew=0   carew=1
gns    gns       yns       100 001
yns    rns       rns       010 001
rns    gew       gew       001 001
gew    yew       yew       001 100
yew    rew       rew       001 010
rew    gns       gns       001 001
14–4 One possible binary state assignment is shown in the chart below.
state encoding
gns 000
yns 001
rns 010
gew 100
yew 101
rew 110
[Figure: Karnaugh maps for ns2, ns1, and ns0]
$ns_2 = (s_2 \wedge \bar{s}_1) \vee (\bar{s}_2 \wedge s_1)$
$ns_1 = s_0$
$ns_0 = (\bar{s}_1 \wedge \bar{s}_0 \wedge car) \vee (s_2 \wedge \bar{s}_1 \wedge \bar{s}_0)$
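The next-state equations of Exercise 14–4 can be checked against the state table and the encoding given above. In this sketch the complemented literals are an assumption reconstructed from the state table (the overbars do not survive in the extracted text), and all names are hypothetical.

```python
ENCODING = {"gns": 0b000, "yns": 0b001, "rns": 0b010,
            "gew": 0b100, "yew": 0b101, "rew": 0b110}

# state -> (next state when carew = 0, next state when carew = 1)
NEXT_STATE = {"gns": ("gns", "yns"), "yns": ("rns", "rns"),
              "rns": ("gew", "gew"), "gew": ("yew", "yew"),
              "yew": ("rew", "rew"), "rew": ("gns", "gns")}


def next_bits(state: int, car: int) -> int:
    """Evaluate the three next-state equations for one encoded state."""
    s2, s1, s0 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    ns2 = (s2 and not s1) or (not s2 and s1)
    ns1 = s0
    ns0 = (not s1 and not s0 and car) or (s2 and not s1 and not s0)
    return (int(bool(ns2)) << 2) | (int(bool(ns1)) << 1) | int(bool(ns0))


mismatches = [
    (name, car)
    for name, targets in NEXT_STATE.items()
    for car in (0, 1)
    if next_bits(ENCODING[name], car) != ENCODING[targets[car]]
]
```

An empty `mismatches` list means the equations reproduce every row of the state table; the unused codes 011 and 111 are treated as don't cares and are not checked.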
14–5 The Verilog has very few changes compared to that supplied in the text:
//State assignment
`define SWIDTH 3
`define GNS 3'b000
`define YNS 3'b001
`define RNS 3'b010
`define GEW 3'b100
`define YEW 3'b101
`define REW 3'b110
//---------------------------------------------
// define output codes
//---------------------------------------------
`define GNSL 6'b100001
`define YNSL 6'b010001
`define GEWL 6'b001100
`define YEWL 6'b001010
`define REDL 6'b001001
case(state)
  `GNS: {next1, lights} = {(carew ? `YNS : `GNS), `GNSL} ;
  `YNS: {next1, lights} = {`RNS, `YNSL} ;
  `RNS: {next1, lights} = {`GEW, `REDL} ;
  `GEW: {next1, lights} = {`YEW, `GEWL} ;
  `YEW: {next1, lights} = {`REW, `YEWL} ;
  `REW: {next1, lights} = {`GNS, `REDL} ;
  default: {next1, lights} = {`SWIDTH+5{1'bx}};
endcase
end
// add reset
assign next = rst ? ‘GNS : next1 ;
endmodule
14–12 The new state diagram can be constructed by inserting a state between
3 and 4. This new state will be renamed 4, while 4 becomes 5, and 5
becomes 6. The new state transitions every cycle.
14–13 The state table is shown below. We omit a column indicating that the
rst input causes a transition to state R.
state next state out
a=0 a=1
R 0 1 0
1 2 2 1
2 3 3 0
3 4 4 0
4 5 1 0
5 M 1 0
M 2 L 1
L 2 2 0
We also include the state table from the modified pulse filler of the previous
exercise:
state next state out
a=0 a=1
R 0 1 0
1 2 2 1
2 3 3 0
3 4 4 0
4 5 5 0
5 6 1 0
6 M 1 0
M 2 L 1
L 2 2 0
14–14 We wrote a program to permute all possible state assignments and got
the following:
state encoding
R 000
1 001
2 011
3 111
4 101
5 100
M 110
L 010
[Figure: Karnaugh maps for the next-state bits s2, s1, and s0]
14–20 In our state table, shown below, we assume that only one (or zero)
input goes high each cycle. If money is inserted while vending, we go to
the appropriate state.
Chapter 15
Solutions: Timing
Constraints
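[Figure: timing diagram of clk (period tcy) with the setup time ts, hold time th, contamination delay tcCQ, and propagation delay tdCQ marked for flip-flops A, B, and C (signals dA/qA, dB/qB, dC/qC)]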
15–10 A 2GHz clock gives a cycle time of tcy = 500ps. Using Equations 15.3
and 15.4 gives:
There is a setup violation. We must increase the cycle time to 530ps for
correct operation.
15–13 Using the setup and hold equations:
15–16 We want the flip-flop to work even when there is no combinational logic,
thus:
$t_h \le t_{cCQ} + 0$
$t_h - t_{cCQ} \le 0$
15–18 To test if the violation is a setup-time problem, simply increase the cycle
time. If running at a slower frequency fixes the problem (to a first order)
it is a setup violation. Hold time violations cannot be detected by varying
the clock. (Saying “if it is not setup then it is hold” is not a valid test
strategy.) Because hold violations occur when the contamination delay
of a circuit is too fast, any method of slowing logic such as increased
temperature or lower voltage would make the circuit work. See Section
20.2.6 for more information on this type of characterization.
15–19 The new ts = th = 10ps since the outer clock effectively arrives 40ps
earlier than the inner clock. There is a 40ps delay on the output, which
gives tdcq = tccq = 120ps.
15–22 We make the table below for all logic paths. In it, we list how early
a clock can arrive at Y so that a signal from X does not cause a setup
violation. We also list how late a clock can arrive at Y so that a signal
from X does not cause a hold violation.
From To Early(ps) Late(ps)
X Y 1860 100
X Z 1910 30
Y X - -
Y Z 1910 30
Z X 1560 10
Z Y - -
Note that in the table the clock to X from Z cannot actually be 1910ps
early since that is equivalent to a 1860ps early clock to Z from X (a
violation). We’ve updated the table below:
From To Early(ps) Late(ps)
X Y 1860 100
X Z 10 30
Y X 100 1860
Y Z 1910 30
Z X 30 10
Z Y 30 1910
Chapter 16
16–1 The sequence of states is shown in the table below. It counts through 8
different states only changing a single bit at a time.
State Next
0000 0001
0001 0011
0011 0111
0111 1111
1111 1110
1110 1100
1100 1000
1000 0000
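The table can be reproduced with a short simulation. The sketch below assumes the counter is a twisted-ring (Johnson) counter, that is, a shift register whose inverted MSB feeds the LSB; this update rule is inferred from the table rather than stated in the solution, and the function names are illustrative.

```python
def johnson_next(state: int, width: int = 4) -> int:
    """Shift left one place and feed the inverted MSB back into the LSB."""
    mask = (1 << width) - 1
    msb = (state >> (width - 1)) & 1
    return ((state << 1) & mask) | (msb ^ 1)


def johnson_sequence(width: int = 4) -> list:
    """Collect states starting at all-zeros until the counter wraps around."""
    states, state = [], 0
    while True:
        states.append(state)
        state = johnson_next(state, width)
        if state == 0:
            return states


for s in johnson_sequence():
    print(format(s, "04b"))
```

Each step changes exactly one bit, which is the property the solution points out.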
16–2 The sequence of 15 states (all except 0000) is shown in the table below.
This counts through the 15 numbers in a pseudo-random order.
1111 1110
1110 1100
1100 1000
1000 0001
0001 0010
0010 0100
0100 1001
1001 0011
0011 0110
0110 1101
1101 1010
1010 0101
0101 1011
1011 0111
0111 1111
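The sequence above is consistent with a 4-bit linear-feedback shift register that shifts left and inserts the XOR of the two most significant bits. That feedback rule is inferred from the table, not quoted from the exercise; a minimal sketch under that assumption:

```python
def lfsr_next(state: int) -> int:
    """One step of a 4-bit LFSR: shift left, new LSB = bit3 XOR bit2."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) & 0xF) | feedback


def lfsr_cycle(seed: int = 0b1111) -> list:
    """Return the states visited from seed until the sequence repeats."""
    states, state = [], seed
    while True:
        states.append(state)
        state = lfsr_next(state)
        if state == seed:
            return states


for s in lfsr_cycle():
    print(format(s, "04b"), format(lfsr_next(s), "04b"))
```

The all-zeros state maps to itself, which is why it is excluded from the 15-state cycle.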
16–4 The solution is shown below. We have added a register for storing the
maximum value, the logic to load it, and modified the counter. The new
counter (not shown) will not increment when the count equals max and
will not decrement when the count equals 0.
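A behavioral sketch of the saturating next-count logic described above. The priority order (rst, then load, then up/down) and the argument names are assumptions made for illustration; the solution only specifies the saturation behavior at max and at 0.

```python
def saturating_next(count, max_count, up, down,
                    load=False, load_value=0, rst=False):
    """Next value of the count register (assumed priority: rst > load > up/down)."""
    if rst:
        return 0
    if load:
        return load_value
    if up and not down:
        # Do not increment once the count has reached the stored maximum.
        return count if count >= max_count else count + 1
    if down and not up:
        # Do not decrement below zero.
        return count if count == 0 else count - 1
    return count  # hold


print(saturating_next(4, 5, up=True, down=False))
print(saturating_next(5, 5, up=True, down=False))
print(saturating_next(0, 5, up=False, down=True))
```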
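[Figure: saturating up/down counter datapath. A max register, loaded by loadMax, feeds the saturating +/-1 unit; a 4-input multiplexer selects the next count, which is stored in the count register. Control inputs are rst, up, down, and load.]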
parameter n = 4 ;
input clk, rst, up, down, load ;
input [n-1:0] in ;
input [1:0] rd;
always @(*) begin
case(rd)
2'b00: {src, n3, n2, n1, n0} = {r0, r3, r2, r1, next};
2'b01: {src, n3, n2, n1, n0} = {r1, r3, r2, next, r0};
2'b10: {src, n3, n2, n1, n0} = {r2, r3, next, r1, r0};
2'b11: {src, n3, n2, n1, n0} = {r3, next, r2, r1, r0};
default: {src, n3, n2, n1, n0} = {5*n{1'bx}};
endcase // case (rd)
end
assign outpm1 = src + {{n-1{down}},1'b1} ;
module UDL_Count3Rs(clk, rst, up, down, load, in, rd, rs, r0, r1, r2, r3) ;
parameter n = 4 ;
input clk, rst, up, down, load ;
input [n-1:0] in ;
input [1:0] rd, rs;
// select the source register
always @(*) begin
case(rs)
2'b00: src = r0;
2'b01: src = r1;
2'b10: src = r2;
2'b11: src = r3;
default: src = {n{1'bx}};
endcase // case (rs)
end
// write next into the destination register, hold the others
always @(*) begin
case(rd)
2'b00: {n3, n2, n1, n0} = {r3, r2, r1, next};
2'b01: {n3, n2, n1, n0} = {r3, r2, next, r0};
2'b10: {n3, n2, n1, n0} = {r3, next, r1, r0};
2'b11: {n3, n2, n1, n0} = {next, r2, r1, r0};
default: {n3, n2, n1, n0} = {4*n{1'bx}};
endcase // case (rd)
end
16–8 See below for a possible solution. We use two registers: one for the current
number and one for the previous number. When we reset (not shown), we
set the out register to 0 and the last register to 1. That gives the correct
sequence of 0, 1, 1, 2, 3, ... on the output.
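The two-register scheme can be checked with a small behavioral model. The 16-bit width is an assumption (the exercise's width is not restated here), and the function name is illustrative.

```python
def fibonacci_outputs(cycles: int, width: int = 16):
    """Model the out/last register pair; return the outputs seen each cycle
    and the first cycle whose addition overflowed (None if none did)."""
    mask = (1 << width) - 1
    out, last = 0, 1          # reset values given in the solution
    outputs, overflow_cycle = [], None
    for cycle in range(cycles):
        outputs.append(out)
        total = out + last    # adder output
        if total > mask and overflow_cycle is None:
            overflow_cycle = cycle
        out, last = total & mask, out   # both registers update together
    return outputs, overflow_cycle


print(fibonacci_outputs(8)[0])
```

Resetting last to 1 rather than 0 is what lets the first sum produce the initial 1 of the sequence.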
[Figure: Fibonacci datapath with registers out and last, an adder producing the next value of out, and an overflow output.]
Chapter 17
17–1 The diagram below shows our solution. We have factored the states B,
C, D, F, G, H, and I into a separate counter state machine (bottom of the
image). When the go signal is asserted, it walks through the Gray-code
count (possibly through state F01). The controller toggles between states
A and B and goes to the next state when the counter signals ready and
the input is the correct value.
[Figure: state diagrams for the Controller (states A and B, outputs sel and go, input rdy) and the Counter (states F00, F01, F1, F2, and F3, with x = 00, 00, 01, 11, and 10).]
17–5 We have factored the FSM (see below) twice. First, we add a timer (either
2 or 4 cycles) that counts down the time spent at any one output state.
Next, we factor the sequence of output states into two distinct patterns:
2-1-0, labeled A1-A3, and 3-1, labeled B1-B2. The master controller (top
right) selects the pattern and, when done, moves to the next state.
[Figure: factored FSM block diagram and state diagrams: the Pattern FSM (states R, A1-A3 with out = 2, 1, 0, and B1-B2 with out = 3, 1), the Timer (tload, tsel, tdone), the Master (psel, go, done), and the Counter.]
17–14 To implement this functionality, we must modify the master, the timer,
and the counter. In the timer, the tsel signal must now be widened to 2 bits
[Figure: traffic-light controller block diagram (Master FSM, Combiner, Light FSM, Timer1, Timer2) and Master FSM state diagram with states NS (dir = ns, load = 1), EW (dir = ew, load = 0), and LT (dir = lt, load = 0).]
Chapter 18
Solutions: Microcode
18–1 The new microcode table is shown below. We have added the additional
input signal, car_ns, and another address bit. We also modified the state
transitions from GEW to go to YEW only when car_ns is asserted.
18–3 We must change a total of 2 bits in our storage, see the new table below:
18–7 Two potential solutions are shown below. In (a), the microcode takes
both flash and tdone as inputs and sets the output and loads the timer
when appropriate. The code must sequence both each sub-component of a
letter (dot, dash, space) and the letters themselves. Solution (b) factors
out a reusable letter FSM (potentially also microcoded), so the master FSM
only needs to sequence the letters 'SOS'.
[Figure: (a) microcode driving a Timer (load, tsel, tdone) and out; (b) master FSM driving a Letter FSM (load, lsel, ldone), which drives a Timer (load, tsel, tdone) and out.]
18–11 The state diagram and simplified microcode are shown below. The
microcode is fairly basic, except with respect to states S11p0 and S11p1.
Here, we must check if the input character is either a '1' or an 'A'. We
first check for an 'A' and, if that is not a match, the FSM will not assert
c_nxt and will instead check against the value '1'.
[Figure: state diagram fragment showing states start, 11A, 11AB, and Success, with outputs n_f, n_m, and c_nxt.]
18–17 The code used to write this program is shown below. It relies on a series
of immediate loads, adds, and subtracts to compute each character. The
code itself cannot be easily changed to spell out different strings. Another
solution would be to write “HELLO WORLD” into RAM memory at
initialization then load and output each character in turn. This would
enable the code to be easily changed to spell out other phrases.
PC: 0000, o1: 0000 i:01100100 # PC: 0017, o1: 004f i:01111010
# PC: 0001, o1: 0000 i:01111100 # PC: 0018, o1: 004f i:01011011
# PC: 0002, o1: 0000 i:01100010 # PC: 0019, o1: 004f i:01110011
# PC: 0003, o1: 0000 i:10111100 # PC: 001a, o1: 0020 i:01101000
# PC: 0004, o1: 0000 i:01111011 # PC: 001b, o1: 0020 i:10001010
# PC: 0005, o1: 0000 i:01011100 # PC: 001c, o1: 0020 i:01110011
# PC: 0006, o1: 0000 i:10111100 # PC: 001d, o1: 0057 i:01011010
# PC: 0007, o1: 0000 i:01111100 # PC: 001e, o1: 0057 i:01110011
# PC: 0008, o1: 0000 i:01101000 # PC: 001f, o1: 004f i:01100011
# PC: 0009, o1: 0000 i:11101100 # PC: 0020, o1: 004f i:10000011
# PC: 000a, o1: 0000 i:01110011 # PC: 0021, o1: 004f i:01110011
# PC: 000b, o1: 0048 i:01100011 # PC: 0022, o1: 0052 i:01101110
# PC: 000c, o1: 0048 i:01111010 # PC: 0023, o1: 0052 i:01111010
# PC: 000d, o1: 0048 i:01010011 # PC: 0024, o1: 0052 i:01010011
# PC: 000e, o1: 0048 i:10011010 # PC: 0025, o1: 0052 i:10011010
# PC: 000f, o1: 0048 i:01110011 # PC: 0026, o1: 0052 i:01111010
# PC: 0010, o1: 0045 i:01100111 # PC: 0027, o1: 0052 i:01101000
# PC: 0011, o1: 0045 i:10000011 # PC: 0028, o1: 0052 i:10001010
# PC: 0012, o1: 0045 i:01110011 # PC: 0029, o1: 0052 i:01110011
# PC: 0013, o1: 004c i:01110011 # PC: 002a, o1: 004c i:01011010
# PC: 0014, o1: 004c i:01100011 # PC: 002b, o1: 004c i:01110011
# PC: 0015, o1: 004c i:10000011 # PC: 002c, o1: 0044 i:xxxxxxxx
# PC: 0016, o1: 004c i:01110011
Chapter 19
Solutions: Sequential Examples
19–1 In the new state diagram, below, we have added another stage compared
to that of the divide-by-3 counter.
[Figure: state diagram with five states, A through E; rst enters state A.]
19–2 We can make a divide-by-9 counter by attaching the output of one divide-
by-3 counter to the input of the next. With this structure, however, the
divide-by-9 signal will be one cycle later than if we had built a single state
machine.
19–4 The new dot (a) and dash (b) state machines are below. If there is a
fourth consecutive 1 in the dash detector, we leave the cb signal asserted
and move to state D4. There, if there is a 0 (or if there is a zero in D3),
the FSM asserts is and resets back to state 0. A fifth one goes into state
1 and does not assert is. The dot detector operates in a similar fashion.
[Figure: state diagrams for (a) the dot detector and (b) the dash detector.]
19–6 The top level diagram of the Tic-Tac-Toe machine is shown below. We
have added a combinational module, 3-or-full, and sequential module,
Controller. The 3-or-full asserts x3 if the xout signal has 3 in a row.
It sets f9 high when no free squares remain. The controller state machine
is shown in the second image (omitting resets of the game). It toggles
between X playing and O playing until the board is full or the currently
playing side wins.
[Figure: top-level Tic-Tac-Toe block diagram (XReg, OReg, MoveGen, 3-or-full, Controller; signals xin, oin, xout, x3, f9, ex, eo, xwin, owin, gover) and Controller state diagram with states X (ex=1, eo=0), O (ex=0, eo=1), XWIN, and OWIN.]
19–12 The state table for the machine is below. We must also include a counter
that increments when the next state is 10 and resets to zero when the state
becomes 00.
State in Next State out
00 0 00 0
00 1 01 0
01 0 11 0
01 1 00 0
11 0 00 0
11 1 10 0
10 0 11 1
10 1 00 1
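The state table and its companion counter can be exercised directly. The sketch below encodes the table as a dictionary; treating out as a function of the current state (1 only in state 10) is read off the table, and the trace format is an illustrative choice.

```python
NEXT_STATE = {
    ("00", 0): "00", ("00", 1): "01",
    ("01", 0): "11", ("01", 1): "00",
    ("11", 0): "00", ("11", 1): "10",
    ("10", 0): "11", ("10", 1): "00",
}
OUT = {"00": 0, "01": 0, "11": 0, "10": 1}


def run(inputs):
    """Step the FSM; the counter increments when the next state is 10
    and clears when the state becomes 00."""
    state, count, trace = "00", 0, []
    for bit in inputs:
        state = NEXT_STATE[(state, bit)]
        if state == "10":
            count += 1
        elif state == "00":
            count = 0
        trace.append((state, OUT[state], count))
    return trace


for step in run([1, 0, 1, 0, 1, 1]):
    print(step)
```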
Chapter 20
20–1 A sample listing of features is detailed in the table below. This list is not
exhaustive, but provides a sampling of different features.
20–2 This feature list would be similar to that of Table 20.1. Users can also
add functionality such as a stop watch (including lap times) or multiple
time zones. The feature list should include all input buttons and their
function.
20–5 Below are six possible test patterns to be applied to the adder:
Chapter 21
Solutions: System-Level Design
21–1 Topics mentioned in the text that are unspecified are the velocity of a
serve, paddle size, etc. Other areas that need specification are the time-
step, speed of the ball (grids per time-step), and the speed of the paddle.
Moreover, the description should include the victory condition, what hap-
pens when the victory condition is obtained, and how to restart the game.
One possible edge case is if the bottom of the paddle strikes the top of the
ball. Users may also need to make sure that the ball does not have speed
enough to “go through” the paddle without the collision being detected. Does
the direction of the paddle during a collision, or where on the paddle a
collision occurs, affect the direction of the ball?
21–5 The new block diagram is shown below, adding a new Load FSM module.
When load is asserted and mode is idle, the load FSM will read each input
note and save it into the RAM. The load signal is asserted during song
playback. Playback cannot begin until after a song has been loaded and
load has been deasserted.
[Figure: block diagram with the Song RAM and Quarter Sine RAM (addr, data) and the new Load FSM (inputs load and note; outputs addr and data to the Song RAM).]
21–6 The new block diagram is shown below. We have also updated the mode
table to explain the functionality of the pause and stop signals.
[Figure: block diagram with the Song RAM, Quarter Sine RAM, and Load FSM (signals addr, data, load, note).]
Name Description
idle No music being played, goes to playback starting at note 0 on start
and load on load.
playback Generating audible output, goes to pause on pause and idle on stop.
pause Not playing music, but will continue playing from current note on
either pause or start. Will ignore load inputs.
loading Loading a song, ignores all inputs that are not part of the load
FSM.
Chapter 22
22–1 Three further examples are: a timer for counting seconds, almost all
visual displays (always valid to the observer), or signals that indicate a
current long-term mode of device operation.
22–2 Periodic signals include a vending machine’s coin inputs, the output of
the arithmetic unit in dataflow FSMs, or messages sent across a bus (see
Chapter 24).
22–4 The Verilog is shown below. We save the incoming data whenever the
register is not full, setting full to 0 when count reaches 4.
reg nxt_full;
always @(*) begin
casex({countIs5, full, in_v})
3'b000: {nxt_full, in_r} = 2'b00;
3'b001: {nxt_full, in_r} = 2'b11;
3'b01x: {nxt_full, in_r} = 2'b10;
3'b1x0: {nxt_full, in_r} = 2'b00;
3'b1x1: {nxt_full, in_r} = 2'b11;
default: {nxt_full, in_r} = 0;
endcase // casex ({countIs5, full, in_v})
end
wire nxt_full_r = rst ? 1’b0 : nxt_full;
endmodule // rv2per
22–6 The double buffer design of Figure 23.11 meets the stated goals. We
could have also designed a module where we wrote (and read) from the
two flip-flops, alternating on every ready cycle.
22–8 One possible implementation of the serializer is shown below:
//LSB first
always@(*) begin
case(count)
3'd0: out = data[7:0];
3'd1: out = data[15:8];
endmodule // serializer
Chapter 23
Solutions: Pipelines
23–1 Using the equations from Section 23.1, the latency is 20.5ns and
throughput is 48 780 000 s$^{-1}$.
23–2 Using the equations from Section 23.1, the latency is 20.5ns and
throughput is 243 900 000 s$^{-1}$.
23–3 Using the equations from Section 23.1, the latency is 22.5ns and
throughput is 222 222 222 s$^{-1}$.
23–4 Using the equations from Section 23.1, the latency is 22.5ns and
throughput is 1 111 111 111 s$^{-1}$.
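The four answers above follow from two relations: latency is the number of stages times the per-stage delay, and throughput is the number of parallel copies divided by the per-stage delay. The configurations used below (a 20.5 ns unpipelined unit, a 5-stage pipeline at 4.5 ns per stage, and 5 parallel copies of each) are inferred from the stated results, not quoted from the exercises.

```python
def latency_and_throughput(stage_delay_ns, stages=1, copies=1):
    """Return (latency in ns, throughput in results per second)."""
    latency_ns = stages * stage_delay_ns
    throughput_per_s = copies / (stage_delay_ns * 1e-9)
    return latency_ns, throughput_per_s


cases = {
    "23-1": dict(stage_delay_ns=20.5),
    "23-2": dict(stage_delay_ns=20.5, copies=5),
    "23-3": dict(stage_delay_ns=4.5, stages=5),
    "23-4": dict(stage_delay_ns=4.5, stages=5, copies=5),
}
for name, cfg in cases.items():
    lat, thr = latency_and_throughput(**cfg)
    print(name, lat, "ns", round(thr), "per second")
```

Replication raises throughput without changing latency, while pipelining raises throughput at the cost of extra register delay in the latency.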
23–5 Our plot is shown below and shows that deep pipelining offers diminishing
returns. Errata: Ask the students to calculate the power (energy divided
by clock time) for each pipeline depth.
[Figure: plot of throughput (0 to 2.5 x 10^9) versus area (100 to 200).]
23–7 1. It will take 200.5ns to complete the work (10 units of 20ns each, plus
the final register delay)
2. It will take 40.5ns to complete the work, since each of the 5 units will
complete a task in 20ns.
3. It will take 22.5ns to complete the first task (traversing the entire
pipeline), and 4.5ns to complete each of the 9 subsequent tasks. This
gives a total time of 63ns.
4. The final answers are 20 000.5ns, 4 000.5ns, and 4 518ns. Note that
the pipeline latency-penalty decreased from 50% to 13% as the batch
size got larger.
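The batch times in this solution can be recomputed with three small formulas. The parameter values (20 ns per task, 0.5 ns register delay, 5 parallel units, 5 pipeline stages of 4.5 ns) are taken from the figures quoted in the answer; the function names are illustrative.

```python
def serial_time(tasks, unit_ns=20.0, reg_ns=0.5):
    """One unit handles every task back to back, plus the final register."""
    return tasks * unit_ns + reg_ns


def parallel_time(tasks, units=5, unit_ns=20.0, reg_ns=0.5):
    """Several units each take an equal share of the tasks."""
    rounds = -(-tasks // units)          # ceiling division
    return rounds * unit_ns + reg_ns


def pipelined_time(tasks, stages=5, stage_ns=4.5):
    """Fill the pipeline once, then finish one task per stage time."""
    return stages * stage_ns + (tasks - 1) * stage_ns


for n in (10, 1000):
    print(n, serial_time(n), parallel_time(n), pipelined_time(n))
```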
23–14 The second stage is the bottleneck. The utilizations are 50%, 100%,
25%, and 33%.
23–15 To have no idle stages whatsoever, the throughput of each stage must
equal 200 000 000 s$^{-1}$, so we must use the following replication scheme:
Chapter 24
Solutions: Interconnect
24–1 We can add full ready-valid flow control by adding three signals, as hinted
in the problem statement. bt_ready is an input to the interface that
indicates that the bus is ready to transmit. br_ready is an output that
indicates that the bus is ready to transmit. br_ready should be high when
the arbiter has granted the bus to the client. ct_ready is an output that
indicates that the client should be ready to receive data. ct_ready should
be asserted high when the bus is ready to transmit, and the address of the
bus data is the address of this client.
The updated Verilog from 24.3 is below:
// arbitration
assign arb_req = cr_valid ;
assign cr_ready = arb_grant ;
// bus drive
assign br_valid = arb_grant ;
assign br_addr = arb_grant ? cr_addr : 0 ;
assign br_data = arb_grant ? cr_data : 0 ;
assign br_ready = arb_grant ;
// bus receive
assign ct_valid = bt_valid & (bt_addr == my_addr) ;
assign ct_data = bt_data ;
assign ct_ready = bt_ready & (bt_addr == my_addr) ;
endmodule
24–4 bt_valid will not be asserted until all multicast clients are ready to
receive. As we are now given cr_vector (a vector that indicates which
clients should be transmitted to) instead of cr_addr, we can look at bit
my_addr in cr_vector to see if our client should be receiving.
The updated Verilog from 24.3 is below. This builds on the Verilog from
exercise 24–1.
// arbitration
assign arb_req = cr_valid ;
assign cr_ready = arb_grant ;
// bus drive
// bus receive
assign ct_valid = bt_valid & (bt_addr [my_addr]) ;
assign ct_data = bt_data ;
assign ct_ready = bt_ready & (bt_addr [my_addr]) ;
endmodule
24–4 To expand from a 2x2 crossbar to a 4x4 crossbar, we increase the size of
the request/grant matrices. The Verilog below is an expansion of 24.5.
// request matrix
wire req00 = (c0r_addr == 0) & c0r_valid ;
wire req01 = (c0r_addr == 1) & c0r_valid ;
wire req02 = (c0r_addr == 2) & c0r_valid ;
wire req03 = (c0r_addr == 3) & c0r_valid ;
wire req10 = (c1r_addr == 0) & c1r_valid ;
wire req11 = (c1r_addr == 1) & c1r_valid ;
wire req12 = (c1r_addr == 2) & c1r_valid ;
wire req13 = (c1r_addr == 3) & c1r_valid ;
wire req20 = (c2r_addr == 0) & c2r_valid ;
wire req21 = (c2r_addr == 1) & c2r_valid ;
wire req22 = (c2r_addr == 2) & c2r_valid ;
wire req23 = (c2r_addr == 3) & c2r_valid ;
wire req30 = (c3r_addr == 0) & c3r_valid ;
wire req31 = (c3r_addr == 1) & c3r_valid ;
wire req32 = (c3r_addr == 2) & c3r_valid ;
wire req33 = (c3r_addr == 3) & c3r_valid ;
// connections
assign c0t_valid = (grant00 & c0r_valid) | (grant10 & c1r_valid) | (grant20 & c2
assign c0t_data = ({dw{grant00}} & c0r_data) | ({dw{grant10}} & c1r_data) |
({dw{grant20}} & c2r_data) | ({dw{grant30}} & c3r_data) ;
assign c1t_valid = (grant01 & c0r_valid) | (grant11 & c1r_valid) | (grant21 & c2
assign c1t_data = ({dw{grant01}} & c0r_data) | ({dw{grant11}} & c1r_data) |
({dw{grant21}} & c2r_data) | ({dw{grant31}} & c3r_data) ;
assign c2t_valid = (grant02 & c0r_valid) | (grant12 & c1r_valid) | (grant22 & c2
assign c2t_data = ({dw{grant02}} & c0r_data) | ({dw{grant12}} & c1r_data) |
({dw{grant22}} & c2r_data) | ({dw{grant32}} & c3r_data) ;
assign c3t_valid = (grant03 & c0r_valid) | (grant13 & c1r_valid) | (grant23 & c2
assign c3t_data = ({dw{grant03}} & c0r_data) | ({dw{grant13}} & c1r_data) |
({dw{grant23}} & c2r_data) | ({dw{grant33}} & c3r_data) ;
// ready
assign c0r_ready = (grant00 & c0t_ready) | (grant01 & c1t_ready) | (grant02 & c2
assign c1r_ready = (grant10 & c0t_ready) | (grant11 & c1t_ready) | (grant12 & c2
assign c2r_ready = (grant20 & c0t_ready) | (grant21 & c1t_ready) | (grant22 & c2
assign c3r_ready = (grant30 & c0t_ready) | (grant31 & c1t_ready) | (grant32 & c2
endmodule
24–10 To build a buffered crossbar, we will expand upon the 2x2 crossbar
in 24.5. We will start by creating a parametrized first-in, first-out (FIFO)
buffer. This requires a random-access memory (RAM) and a counter.
// memory array
reg [dw-1:0] memory [0:elements-1];
if (write_en)
memory [addr_in] <= data_in; // write to memory if enable high
end
endmodule
endmodule
// buffer memory
RAM #(.aw (aw), .dw (dw)) memory (.clk (clk), .data_in (data_in), .data_out (dat
.addr_in (write_addr), .addr_out (read_addr),
.write_en (write_en));
Now, we modify the previous 2x2 crossbar to add the FIFO buffers.
// buffer wires
wire buf00_empty, buf01_empty, buf10_empty, buf11_empty ;
wire buf00_full, buf01_full, buf10_full, buf11_full ;
wire [dw-1:0] buf00_data, buf01_data, buf10_data, buf11_data ;
// request matrix
wire req00 = (c0r_addr == 0) & c0r_valid ;
wire req01 = (c0r_addr == 1) & c0r_valid ;
wire req10 = (c1r_addr == 0) & c1r_valid ;
wire req11 = (c1r_addr == 1) & c1r_valid ;
// arbitration 0 wins
wire grant00 = ~buf00_empty ;
wire grant01 = ~buf01_empty ;
wire grant10 = ~buf10_empty & buf00_empty ;
wire grant11 = ~buf11_empty & buf01_empty ;
// connections
assign c0t_valid = grant00 | grant10 ;
assign c0t_data = ({dw{grant00}} & buf00_data) |
({dw{grant10}} & buf10_data) ;
assign c1t_valid = grant01 | grant11 ;
assign c1t_data = ({dw{grant01}} & buf01_data) |
({dw{grant11}} & buf11_data) ;
// buffer instantiations
FIFO #(.dw (dw), .aw (aw)) buf00 (.clk (clk), .rst (rst),
.data_out (buf00_data), .data_in (c0r_data),
.write_en (req00), .read_en (grant00),
.full (buf00_full), .empty (buf00_empty));
FIFO #(.dw (dw), .aw (aw)) buf01 (.clk (clk), .rst (rst),
.data_out (buf01_data), .data_in (c0r_data),
.write_en (req01), .read_en (grant01),
.full (buf01_full), .empty (buf01_empty));
FIFO #(.dw (dw), .aw (aw)) buf10 (.clk (clk), .rst (rst),
.data_out (buf10_data), .data_in (c1r_data),
.write_en (req10), .read_en (grant10),
.full (buf10_full), .empty (buf10_empty));
FIFO #(.dw (dw), .aw (aw)) buf11 (.clk (clk), .rst (rst),
.data_out (buf11_data), .data_in (c1r_data),
.write_en (req11), .read_en (grant11),
.full (buf11_full), .empty (buf11_empty));
endmodule
Chapter 25
25-3 1. See the timing table below. The total time is 130 cycles.
Command Time
Activate R0 5
Read C1 5
Read C2 5
Read C3 5
Precharge R0 5
Act R1 5
Read C0 5
RAS: Wait to PC 2
Precharge R1 5
Act R2 5
Read C0 5
RAS: Wait to PC 2
Precharge R2 5
Act Ra 5
Read C3 5
RAS: Wait to PC 2
Precharge Ra 5
Act Rb 5
Read C3 5
RAS: Wait to PC 2
Precharge Rb 5
Act R0 5
Read C4 5
RAS: Wait to PC 2
Precharge R0 5
Act Rb 5
Read C1 5
Read C2 5
Precharge Rb 5
Total 130 cycles
2. See the timing table below with rearranged values. This solution
uses a greedy algorithm to group all accesses to one row together.
The total time is 106 cycles.
Command Time
Activate R0 5
Read C1 5
Read C2 5
Read C3 5
Read C4 5
Precharge R0 5
Act R1 5
Read C0 5
RAS: Wait to PC 2
Precharge R1 5
Act R2 5
Read C0 5
RAS: Wait to PC 2
Precharge R2 5
Act Ra 5
Read C3 5
RAS: Wait to PC 2
Precharge Ra 5
Act Rb 5
Read C3 5
Read C1 5
Read C2 5
Precharge Rb 5
Total 106 cycles
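Both totals can be recomputed from the access list. The sketch below assumes 5 cycles each for activate, read, and precharge, and a minimum of 12 cycles between activate and precharge; the 12-cycle figure is inferred from the 2-cycle "RAS: Wait to PC" entries that follow a single read. The constant and function names are illustrative.

```python
T_ACT, T_READ, T_PRE, T_RAS = 5, 5, 5, 12

REQUESTS = [("R0", "C1"), ("R0", "C2"), ("R0", "C3"), ("R1", "C0"),
            ("R2", "C0"), ("Ra", "C3"), ("Rb", "C3"), ("R0", "C4"),
            ("Rb", "C1"), ("Rb", "C2")]


def total_cycles(accesses):
    """Sum the command times, opening a row once per run of same-row reads."""
    total, i = 0, 0
    while i < len(accesses):
        row, j = accesses[i][0], i
        while j < len(accesses) and accesses[j][0] == row:
            j += 1
        open_time = T_ACT + (j - i) * T_READ
        total += max(open_time, T_RAS) + T_PRE   # wait out tRAS if needed
        i = j
    return total


def group_by_row(accesses):
    """Greedy reordering: keep rows in first-seen order, reads grouped per row."""
    by_row = {}
    for row, col in accesses:
        by_row.setdefault(row, []).append((row, col))
    return [a for reads in by_row.values() for a in reads]


print(total_cycles(REQUESTS), total_cycles(group_by_row(REQUESTS)))
```

Grouping saves one activate/precharge pair and one tRAS wait for each row that would otherwise be reopened.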
25-6 With a single-word cache line, the cache hit rate will be 85% (the
baseline). The sequence of addresses does not matter. With a line size of n,
however, we can eliminate (with P = 0.95) the next n-1 misses. As a
simplified example with 1000 cache accesses, we have a total of 150 cache
misses. With a line size of 2, the number of misses is reduced by about 71
(92% hit rate). With line sizes of 4 and 8, the number is reduced by 107
(95.6%) and 125 (97.5%), respectively. The memory bandwidth required
for data increases with line size because unneeded words can potentially
be fetched. The control bandwidth decreases, as there are fewer requests.
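One reading of the arithmetic that reproduces the miss reductions quoted above: of the baseline misses, a fraction (n-1)/n fall on words that share a line with an earlier miss, and each of those is removed with probability 0.95. This model is an inference from the stated numbers, not a formula given in the solution.

```python
def misses_removed(base_misses, line_size, p_sequential=0.95):
    """Expected number of baseline misses eliminated by a larger line."""
    return base_misses * p_sequential * (line_size - 1) / line_size


ACCESSES, BASE_MISSES = 1000, 150
for n in (2, 4, 8):
    removed = misses_removed(BASE_MISSES, n)
    hit_rate = 1 - (BASE_MISSES - removed) / ACCESSES
    print(n, round(removed), round(100 * hit_rate, 1))
```

The printed hit rates agree with the quoted percentages to within rounding.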
25-8 1. We simply need two addresses such that (A1 mod n) = (A2 mod n).
Each access will conflict, evicting the previous address’s line.
2. A sequence of n+1 unique addresses will cause a miss on every access,
assuming we evict the least recently used value.
3. A sequence of w + 1 addresses that map to the same set will never
hit in the cache.
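The three access patterns can be checked with two small cache models: a direct-mapped cache indexed by address mod n, and an LRU cache, which also models a single set of a w-way set-associative cache when its capacity is w. The one-word line size and the function names are simplifying assumptions for illustration.

```python
from collections import OrderedDict


def direct_mapped_misses(addresses, n):
    """Count misses in a direct-mapped cache with n one-word lines."""
    lines, misses = [None] * n, 0
    for a in addresses:
        index = a % n
        if lines[index] != a:
            misses += 1
            lines[index] = a
    return misses


def lru_misses(addresses, capacity):
    """Count misses in a fully associative LRU cache of the given capacity."""
    cache, misses = OrderedDict(), 0
    for a in addresses:
        if a in cache:
            cache.move_to_end(a)           # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[a] = True
    return misses


n = 8
print(direct_mapped_misses([3, 3 + n] * 10, n))   # part 1: every access misses
print(lru_misses(list(range(n + 1)) * 5, n))      # part 2: every access misses
print(lru_misses([0, n, 2 * n] * 4, 2))           # part 3: one set with w = 2
```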
Chapter 26
Solutions: Asynchronous Sequential Circuits
26–1 The flow table is shown below. The circuit sets itself into the 0 or 1 state
when both of its inputs are 0 or 1, respectively. When the inputs are 01,
the state toggles, and inputs 10 cause the state to hold.
State   Next state for inputs
        00   01   11   10
0       0    1    1    0
1       0    0    1    1
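The flow table can be restated as a next-state function, evaluated one step at a time. Treating the two input digits as first and second in the order written (the arguments first and second below) is an assumption, since the solution does not name the inputs.

```python
def next_state(state: int, first: int, second: int) -> int:
    """One evaluation of the flow table for the input pair (first, second)."""
    if first == second:
        return first            # 00 forces state 0, 11 forces state 1
    if (first, second) == (0, 1):
        return state ^ 1        # 01 toggles the state
    return state                # 10 holds the state


for state in (0, 1):
    row = [next_state(state, a, b) for a, b in ((0, 0), (0, 1), (1, 1), (1, 0))]
    print(state, row)
```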
26–2 The waveform, state transition table, and K-maps are shown below. We
had to use 6 different states, and do not include transitions that are
inconsistent with the problem description.
[Figure: waveforms of a and b annotated with the states A B C D A B C A E F D A.]

State   Next state for {a,b}    s2s1s0   {A,B}
        00   01   11   10
A       A    E    -    B        000      00
B       -    -    C    B        010      10
C       -    D    C    -        110      00
D       A    D    -    D        100      00
E       -    E    F    -        001      01
F       -    -    F    D        101      00

[Figure: K-maps of the next state over a, b, s2, and s1, one for s0 = 0 and one for s0 = 1.]
26–5 The waveform, state transition table, and K-maps are shown below.
[Figure: waveform of in annotated with the states A B C D E F A.]

State   Next state for in   s2s1s0   abc
        0    1
A       A    B              000      000
B       C    B              001      100
C       C    D              011      000
D       E    D              010      010
E       E    F              110      000
F       A    F              100      001

[Figure: K-map of the next state over s2, s1, s0, and in.]
26–9 The solution is presented below. Taking the simple approach yields a total
of 12 different states. However, if i is the first signal to rise, and i and q
have the same frequency, only 4 states are necessary. We show the K-map
for only the 4-state solution.
States1 A B C D E F G H I J K L A B
States2 A B B B C C C D D D A A A B
State   Next state for {i,q}    s1s0   x
        00   01   11   10
A       A    A    A    B        00     0
B       C    B    B    B        01     1
C       C    D    C    C        11     0
D       D    D    A    D        10     1

[Figure: K-map of the next state over i, q, s1, and s0 for the 4-state solution; the next-state entries above are as filled in on the K-map.]
Chapter 27
Solutions: Flip-Flops
ts = max(t1 + t3 + t5 , t2 + t4 )
27–2 The hold time in this situation is 0. Once the g input falls to 0, no change
in d can change s’, r’ or the outputs.
th = 0
tdDQ = max(t1 + t3 + t5 , t2 ) + t4
tdGQ = max(t3 + t5 , t2 ) + t4
27–5 As stated in the text, the setup time is that of the master.
ts = max(t2 + t4 + t6 , t3 + t5 )
State   Next state for {d,c}        q
        00     01     11     10
1110    1111   -      1010   1110   q
1111    0111   -      -      -      q
0111    0111   0101   -      0110   q
0110    -      -      -      1110   q
1010    -      1011   1010   1110   1
1011    1111   1011   1010   -      1
0101    0111   0101   0101   0111   0
Chapter 28
Solutions: Metastability and Synchronization Failure
28–6 There will be an error on about 40% of the asynchronous signal
transitions.
\[ P_E = \frac{t_s + t_h}{t_{cy}} = 0.4 \]
28–9 The asynchronous signal must transition no more than 250 times a second.
\[ f_a = \frac{f_E}{(t_s + t_h)/t_{cy}} = 250\,\mathrm{Hz} \]
28–12 The answers are shown below. We can compute the ratio of errors in
FF1 vs. FF2 below:
\[ \frac{P_1}{P_2} = \frac{(t_{s1} + t_{h1}) \exp(-t_w/\tau_{s1})}{(t_{s2} + t_{h2}) \exp(-t_w/\tau_{s2})} \]
\[ \frac{P_1}{P_2} = 0.2 \exp(5 \times 10^{10}\, t_w) \]
Chapter 29
Solutions: Synchronizer Design
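\[ P_{ES} = \frac{t_s + t_h}{t_{cy}} \exp\left(\frac{-(5t_{cy} - t_s - t_{dCQ})}{\tau_s}\right) = 5.8 \times 10^{-28} \]
\[ f_{ES} = f_a P_{ES} = 1.2 \times 10^{-21} \]
\[ \mathrm{MTBF} = 8.63 \times 10^{18}\,\mathrm{s} \]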
29–3 See below, noting that 5 flip-flops gives a total wait time of 4 cycles:
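\[ P_{ES} = \frac{t_s + t_h}{t_{cy}} \exp\left(\frac{-4(t_{cy} - t_s - t_{dCQ})}{\tau_s}\right) = 3.0 \times 10^{-20} \]
\[ f_{ES} = f_a P_{ES} = 5.9 \times 10^{-12} \]
\[ \mathrm{MTBF} = 1.7 \times 10^{11}\,\mathrm{s} \]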
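The chain from failure probability to MTBF can be sketched as two functions. The exponent follows the form used in this answer (a wait of 4 cycles, each contributing tcy - ts - tdCQ of settling time); the timing values in the loop are hypothetical placeholders, since the exercise's parameters are not restated here. Only the last line uses numbers from the solution.

```python
import math


def failure_probability(ts, th, tcy, tdcq, tau_s, wait_cycles):
    """P_ES = (ts + th)/tcy * exp(-wait_cycles * (tcy - ts - tdcq) / tau_s)."""
    t_wait = wait_cycles * (tcy - ts - tdcq)
    return (ts + th) / tcy * math.exp(-t_wait / tau_s)


def mtbf_seconds(f_a, p_es):
    """Mean time between failures: the reciprocal of f_ES = f_a * P_ES."""
    return 1.0 / (f_a * p_es)


# Hypothetical timing values, used only to show the effect of extra wait cycles.
for cycles in (1, 2, 4):
    print(cycles, failure_probability(50e-12, 50e-12, 1e-9, 80e-12, 40e-12, cycles))

# Values quoted in the solution: P_ES = 3.0e-20 and f_ES = 5.9e-12.
print(mtbf_seconds(5.9e-12 / 3.0e-20, 3.0e-20))
```

Because the wait time sits in the exponent, each additional synchronizer stage reduces the failure probability by a constant factor.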
29–5 The final answer is that we must only wait 2 clock cycles. First, we
calculate the failure frequency:
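Next, we can compute the probability of error and finally the error window:
\[ P_{ES} = \frac{f_{ES}}{f_a} = 3.2 \times 10^{-16} \]
\[ \exp\left(\frac{-t_w}{\tau_s}\right) = P_{ES}\,\frac{t_{cy}}{t_s + t_h} = 4.5 \times 10^{-15} \]
\[ t_w = -\tau_s \log(4.5 \times 10^{-15}) = 1.32\,\mathrm{ns} \]
\[ N \ge \frac{t_w + t_s + t_{dCQ}}{t_{cy}} \ge 1.4 \]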
29–7 The time between bit transitions must be at least tcy,o + ts + th . This
ensures that two transitions cannot enter into an illegal state in consecutive
cycles.
29–9 The simplest form of the problem requires placing synchronizers on the
increment and decrement signals. This is a case in which the input logic
does not require knowledge of the output signal (it can increment past ff_16).
We may want to include logic that moves the counter only once for
every positive edge of an input and not continuously.
29–11 The figure below shows the control data-path for indicating if a
particular register is ready for new data (not present) or valid (present).
We construct an FSM with 4 states, keeping 1 bit of state in each clock
domain. In 2 of the states (00, 11) the register is considered empty, while it
is considered full in the others. The state diagram is shown at the bottom
of the figure. Note that logic in one clock domain can change only
the state bit in that same domain.
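A behavioral sketch of the two-bit scheme, assuming the first state digit belongs to the input clock domain and the second to the output domain. Synchronizer latency is ignored here, so each side sees the other's bit immediately; the class and method names are illustrative.

```python
class HandshakeFlags:
    """Two state bits, each written by exactly one clock domain."""

    def __init__(self):
        self.s_in = 0    # toggled only by the input domain
        self.s_out = 0   # toggled only by the output domain

    @property
    def full(self):
        # 00 and 11 mean empty; 10 and 01 mean full.
        return self.s_in != self.s_out

    def input_tick(self, ivalid):
        """Input domain: accept data when iready (register empty)."""
        if ivalid and not self.full:
            self.s_in ^= 1
            return True
        return False

    def output_tick(self, oready):
        """Output domain: release data when ovalid (register full)."""
        if oready and self.full:
            self.s_out ^= 1
            return True
        return False


flags = HandshakeFlags()
for step in ("in", "out", "in", "out"):
    if step == "in":
        flags.input_tick(True)
    else:
        flags.output_tick(True)
    print(f"{flags.s_in}{flags.s_out}", "full" if flags.full else "empty")
```

Since neither domain ever writes the other's bit, the only cross-domain signals are the two single-bit levels, which is what makes them safe to pass through ordinary synchronizers.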
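[Figure: control path with one state bit in the input clock domain (clkin, driving iready) and one in the output clock domain (clkout, driving ovalid), each passed through a synchronizer to the other domain. State diagram: 00 to 10 on ivalid, 10 to 11 on oready, 11 to 01 on ivalid, 01 to 00 on oready.]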