HW6 Spring2022 Solution 2
Izadi
20 PT.
Question 1
5.2 Caches are important to providing a high-performance memory hierarchy to
processors. Below is a list of 32-bit memory address references, given as word
addresses.
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253
5.2.1 For each of these references, identify the binary address, the tag, and the index
given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a
hit or a miss, assuming the cache is initially empty.
5.2.2 For each of these references, identify the binary address, the tag, and the index
given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also
list if each reference is a hit or a miss, assuming the cache is initially empty.
5.2.3 You are asked to optimize a cache design for the given references. There are
three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has
1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks. In terms of miss
rate, which cache design is the best? If the miss stall time is 25 cycles, and C1 has an
access time of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles, which is the best
cache design?
20 PT.
Question 2
5.3 For a direct-mapped cache design with a 32-bit address, the following bits of the
address are used to access the cache.
20 PT.
Question 3
Assume an instruction cache miss rate for an application is 2% and the data cache
miss rate of 4%. Assume further that our CPU has a CPI of 2 without any memory
stalls and the miss penalty is 40 cycles for all misses.
a. Determine the overall CPI with the indicated misses, provided the frequency of
all loads and stores in the application is 20%.
b. Suppose we increase the performance of the machine in the above example by
reducing its CPI from 2 to 1 via pipelining. Determine the new overall CPI.
40 PT.
Question 4
The following is a series of address references given as word addresses: 9, 4, 20, 4, 8,
15, 5, 19, 4, 20, 4, 22, 7, 17, 10.
a. Assume a direct-mapped cache with a word size of 1 byte, a block size of 1 word, and
a total size of 8 words. Show the hits and misses and the final cache contents.
Location Hit/Miss?
9
4
20
4
8
15
5
19
4
20
4
22
7
17
10
b. Assume a direct-mapped cache with a word size of 1 byte, a block size of 2 words, and
a total size of 8 words. Show the hits and misses and the final cache contents.
c. Assume a two-way set-associative cache with the same total number of cache locations
as in part b. Show the hits and misses and the final cache contents.
d. Assume a fully associative cache with the same total number of cache locations as in
part b. Show the hits and misses and the final cache contents.
Key
20 PT.
Question 1, Do problem 5.2.1, 5.2.2, 5.2.3
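The key gives no worked answer for these parts, so the following is a minimal sketch of how they could be checked, not the official solution. The function name `simulate_direct_mapped` is hypothetical. It assumes word addresses, an initially empty cache, and, for 5.2.3, that total cycles = misses × miss stall + references × access time. Only the low 8 bits of each 32-bit address are printed, since every reference is below 256.

```python
def simulate_direct_mapped(refs, num_blocks, words_per_block):
    """Return a list of (address, tag, index, hit) for word-address references."""
    tags = [None] * num_blocks           # None marks an invalid (empty) entry
    rows = []
    for addr in refs:
        block = addr // words_per_block  # drop the block-offset bits
        index = block % num_blocks       # low bits of the block number
        tag = block // num_blocks        # remaining high bits
        hit = tags[index] == tag
        tags[index] = tag                # on a miss the block is (re)placed
        rows.append((addr, tag, index, hit))
    return rows


REFS = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]

# 5.2.1: 16 one-word blocks; 5.2.2: 8 two-word blocks
for label, blocks, words in [("5.2.1", 16, 1), ("5.2.2", 8, 2)]:
    print(label)
    for addr, tag, index, hit in simulate_direct_mapped(REFS, blocks, words):
        status = "hit" if hit else "miss"
        print(f"  {addr:3d}  {addr:08b}  tag={tag:2d}  index={index:2d}  {status}")

# 5.2.3: three 8-word designs as (name, blocks, words per block, access cycles)
MISS_STALL = 25
for name, blocks, words, access in [("C1", 8, 1, 2), ("C2", 4, 2, 3), ("C3", 2, 4, 5)]:
    misses = sum(not hit for *_, hit in simulate_direct_mapped(REFS, blocks, words))
    cycles = misses * MISS_STALL + len(REFS) * access
    print(f"{name}: {misses}/{len(REFS)} misses, {cycles} total cycles")
```

Under these assumptions the sketch reports no hits for 5.2.1, hits on 2, 190, and 181 for 5.2.2, and 12, 10, and 11 misses for C1, C2, and C3 (324, 286, and 335 cycles), so C2 comes out best on both measures.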
20 PT.
Question 2, Do problem 5.3
5.3.1 8
5.3.2 32
5.3.3 1 + (22/8/32) = 1.086
5.3.4 3
5.3.5 0.25
5.3.6 <Index, tag, data>
<000001₂, 0001₂, mem[1024]>
<000001₂, 0011₂, mem[16]>
<001011₂, 0000₂, mem[176]>
<001000₂, 0010₂, mem[2176]>
<001110₂, 0000₂, mem[224]>
<001010₂, 0000₂, mem[160]>
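The arithmetic behind the 5.3.3 answer can be sketched as follows. The variable names are hypothetical, and the reading of the key's expression is an assumption: 22 is taken as the tag bits per entry, 8 as the words per block (the 5.3.1 answer), and 32 as the bits per word, with no valid bit counted.

```python
# Assumed reading of the key's expression 1 + (22/8/32)
tag_bits = 22          # tag bits stored per cache entry (assumption)
words_per_block = 8    # answer to 5.3.1
bits_per_word = 32

data_bits = words_per_block * bits_per_word   # 256 data bits per entry
ratio = (data_bits + tag_bits) / data_bits    # total bits over data bits
print(round(ratio, 3))
```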
20 PT.
Question 3
Show internal architecture of a direct cache with 512 blocks and 8 words per block.
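[Figure: internal architecture of the direct-mapped cache. The address splits into an
18-bit tag, a 9-bit index, and a 3-bit block offset. Each cache entry holds V, Tag, and
Data fields. A mux selects the 32-bit data word, and a Hit signal is produced.]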
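The field widths for this cache can be derived with a short calculation. This is a sketch under stated assumptions: a 32-bit byte address and 4-byte words, neither of which is stated in the problem. The helper name `address_fields` is hypothetical.

```python
def address_fields(num_blocks, words_per_block, address_bits=32, bytes_per_word=4):
    """Split an address into tag/index/offset widths; all sizes must be powers of two."""
    byte_offset = bytes_per_word.bit_length() - 1     # log2 for a power of two
    block_offset = words_per_block.bit_length() - 1
    index = num_blocks.bit_length() - 1
    tag = address_bits - index - block_offset - byte_offset
    return {"tag": tag, "index": index,
            "block_offset": block_offset, "byte_offset": byte_offset}


# 512 blocks, 8 words per block
print(address_fields(512, 8))
```

For 512 blocks of 8 words this gives a 9-bit index, a 3-bit block offset, a 2-bit byte offset, and an 18-bit tag, which agrees with the tag, index, and block-offset widths shown in the diagram.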
40 PT.
Question 4
The following is a series of address references given as word addresses: 9, 4, 20, 4, 8,
15, 5, 19, 4, 20, 4, 22, 7, 17, 10.
a. Assume a direct-mapped cache with a block size of 1 word and a total size of 8 words.
Show the hits and misses and the final cache contents.
Location Hit/Miss?
9 miss
4 miss
20 miss
4 miss
8 miss
15 miss
5 miss
19 miss
4 hit
20 miss
4 miss
22 miss
7 miss
17 miss
10 miss
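b. Assume a direct-mapped cache with a block size of 2 words and a total size of 8 words.
Show the hits and misses and the final cache contents.
Location Hit/Miss?
9 miss
4 miss
20 miss
4 miss
8 hit
15 miss
5 hit
19 miss
4 hit
20 miss
4 miss
22 miss
7 miss
17 miss
10 miss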
c. Assume a two-way set-associative cache with the same total number of cache locations
as in part b. Show the hits and misses and the final cache contents.
Location Hit/Miss?
9 miss
4 miss
20 miss
4 hit
8 miss
15 miss
5 miss
19 miss
4 hit
20 miss
4 hit
22 miss
7 miss
17 miss
10 miss
I used the LRU replacement algorithm. When both candidates had the same usage, I used FIFO.
d. Assume a fully associative cache with the same total number of cache locations as in
part b. Show the hits and misses and the final cache contents.
Location Hit/Miss?
9 miss
4 miss
20 miss
4 hit
8 miss
15 miss
5 miss
19 miss
4 hit
20 hit
4 hit
22 miss
7 miss
17 miss
10 miss
I used the LRU replacement algorithm. When candidates had the same usage, I used FIFO.
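The hit/miss tables for Question 4 can be cross-checked, and the final cache contents listed, with a small simulator. This is a minimal sketch rather than the grader's method. The names `simulate` and `CONFIGS` are hypothetical, and it uses pure LRU, under which no ties arise. One assumption is worth stating: the tables for parts c and d match a cache of eight one-word blocks (four two-way sets for c, one eight-way set for d), so that is the configuration used here.

```python
from collections import OrderedDict


def simulate(refs, num_sets, ways, words_per_block=1):
    """Simulate a set-associative cache with LRU replacement on word addresses.

    Returns (results, sets): results is a list of "hit"/"miss" strings and
    sets holds the block numbers left in each set, least recently used first.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in refs:
        block = addr // words_per_block
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
            results.append("hit")
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
            results.append("miss")
    return results, [list(s) for s in sets]


REFS = [9, 4, 20, 4, 8, 15, 5, 19, 4, 20, 4, 22, 7, 17, 10]

# Assumed configurations for parts a-d (8 words of data in every case)
CONFIGS = {
    "a": dict(num_sets=8, ways=1, words_per_block=1),
    "b": dict(num_sets=4, ways=1, words_per_block=2),
    "c": dict(num_sets=4, ways=2, words_per_block=1),
    "d": dict(num_sets=1, ways=8, words_per_block=1),
}

for part, cfg in CONFIGS.items():
    results, contents = simulate(REFS, **cfg)
    print(part, results.count("hit"), "hits; final contents by set:", contents)
```

The contents are block numbers, so for part b block 8 holds words 16 and 17. Under these assumptions the sketch reproduces the four tables, with 1, 3, 3, and 4 hits for parts a through d.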
Question 3 (CPI with memory stalls)
a.
I-cache miss rate = 2%
D-cache miss rate = 4%
Miss penalty = 40 cycles
Base CPI (ideal cache) = 2
Loads and stores are 20% of instructions
Miss cycles per instruction:
I-cache: 0.02 × 40 = 0.8
D-cache: 0.2 × 0.04 × 40 = 0.32
Actual CPI = 2 + 0.8 + 0.32 = 3.12
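The same calculation can be written as a short function, which also makes it easy to rerun with a base CPI of 1 for part b. This is a sketch only. The function name `cpi_with_stalls` is hypothetical, and it assumes every instruction makes one instruction-cache access while only loads and stores access the data cache.

```python
def cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate, mem_fraction, miss_penalty):
    """Overall CPI = base CPI + instruction-miss stalls + data-miss stalls."""
    i_stalls = i_miss_rate * miss_penalty                 # per instruction
    d_stalls = mem_fraction * d_miss_rate * miss_penalty  # only loads/stores
    return base_cpi + i_stalls + d_stalls


for base in (2, 1):   # part a uses base CPI 2, part b uses base CPI 1
    cpi = cpi_with_stalls(base, 0.02, 0.04, 0.20, 40)
    print(f"base CPI {base}: overall CPI {cpi:.2f}")
```

With a base CPI of 1 the stall terms are unchanged, so the same formula gives 1 + 0.8 + 0.32 = 2.12 for part b.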