Maths

This document discusses cache organization and memory hierarchy performance for various cache configurations including: 1) A direct mapped cache with 32-bit addresses, 2048 blocks of 2048 bits each, resulting in a 0.5MB cache size. 2) A 2-way set associative cache with the same parameters, requiring 1 bit for way selection and 10 bits for the index. 3) A 4-way set associative cache that is byte-addressable rather than word-addressable, requiring 8 bits for offset and 9 bits for index. It then calculates the effective CPI and average memory access time for a sample system with these cache parameters.

Uploaded by

sukanta majumder

1.

You have been asked to design a cache with the following properties:
• Data words are 32 bits each
• A cache block will contain 2048 bits of data
• The cache is direct mapped
• The address supplied from the CPU is 32 bits long
• There are 2048 blocks in the cache
• Addresses are word-addressable.

There are 2^11 bits/block and 2^5 bits/word.
Thus there are 2^6 words/block, so we need 6 bits of offset.
There are 2^11 blocks and the cache is direct mapped, so we need 11 bits of index.
Tag field size = 32 − (6 + 11) = 15 bits.
Total size of the cache:
There are 2048 blocks in the cache and 2048 bits/block, i.e. 256 bytes/block.
Hence, 2048 blocks × 256 bytes/block = 2^19 bytes (0.5 MB).
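As a sanity check, the field-width arithmetic above can be reproduced in a few lines of Python (a sketch, not part of the original solution; the variable names are mine):

```python
from math import log2

ADDR_BITS = 32          # CPU address width
BLOCK_DATA_BITS = 2048  # data bits per block
WORD_BITS = 32          # bits per word
NUM_BLOCKS = 2048

words_per_block = BLOCK_DATA_BITS // WORD_BITS     # 2^6 = 64 words/block
offset_bits = int(log2(words_per_block))           # 6
index_bits = int(log2(NUM_BLOCKS))                 # 11 (direct mapped: one block per set)
tag_bits = ADDR_BITS - offset_bits - index_bits    # 15
cache_bytes = NUM_BLOCKS * BLOCK_DATA_BITS // 8    # 2^19 bytes = 0.5 MB

print(offset_bits, index_bits, tag_bits, cache_bytes)  # → 6 11 15 524288
```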
2. Letʼs consider the previous scenario once again: what happens if we make our cache
2-way set-associative instead of direct mapped?

Number of bits in block offset = 6.
Number of bits required for way selection = 1 (2^1 ways).
Total number of blocks = 2^11.
Number of sets = number of blocks / (blocks per set)
= 2^11 / 2^1 = 2^10,
so 10 bits of index are needed.
Tag field = 32 − (6 + 10) = 16 bits.

3. Letʼs consider what happens if data is byte-addressable and the cache organization
is 4-way set-associative for the 1st question.
Number of bits in offset?
There were 6 bits in the offset. Now each of the 4 bytes of a given word can be
individually addressed, so we need 2 more bits of address per word.
Thus, 2^6 words × 2^2 bytes/word = 2^8 bytes/block,
so 8 bits of offset are needed.
Number of bits in index?
2^11 blocks / (2^2 blocks/set) = 2^9 sets, so 9 bits of index are needed.
Number of bits in tag? 32 − 8 − 9 = 15 bits.
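Questions 2 and 3 vary only the associativity and the addressable unit, so one small helper covers both (a Python sketch under my own parameter names, not part of the original solution):

```python
from math import log2

def addr_fields(addr_bits, block_bytes, num_blocks, ways, unit_bytes):
    """Return (tag, index, offset) bit widths for a set-associative cache.

    unit_bytes: 4 for a word-addressable machine, 1 for byte-addressable.
    """
    offset = int(log2(block_bytes // unit_bytes))   # units within a block
    index = int(log2(num_blocks // ways))           # number of sets
    return addr_bits - offset - index, index, offset

# Question 2: 2-way set-associative, word-addressable
print(addr_fields(32, 256, 2048, 2, 4))  # → (16, 10, 6)
# Question 3: 4-way set-associative, byte-addressable
print(addr_fields(32, 256, 2048, 4, 1))  # → (15, 9, 8)
```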
4. Consider CPIs for ALU, load/store, branch and jump instructions of 1, 1.5, 1.5 and 1
respectively, with an instruction mix of 40% ALU and logical operations, 30% load and
store, 20% branch and 10% jump instructions. The machine has a 4-way set-associative
cache with separate data and instruction caches; the miss rate is 0.20 for data and
0.10 for instructions, with miss penalties of 80 cycles and 50 cycles for the data and
instruction caches respectively (assume a cache hit takes 1 cycle). What are the
effective CPI with memory stalls and the average memory access time for this
application with this cache organization?

CPI_Ideal = (0.4 × 1) + (0.3 × 1.5) + (0.2 × 1.5) + (0.1 × 1) = 1.25
Data memory stall = 0.3 × 0.2 × 80 = 4.8 cycles
Instruction memory stall = 1 × 0.1 × 50 = 5 cycles
Total memory stall cycles = 9.8
CPI_Stall = 1.25 + 9.8 = 11.05
Average memory access time =
fraction of data accesses × (hit time + data miss rate × miss penalty) +
fraction of instruction accesses × (hit time + instruction miss rate × miss penalty)

Total accesses to memory per instruction = 1 (instruction fetch) + 0.3 (load/store) = 1.3
AMAT = (0.3/1.3) × (1 + 0.2 × 80) + (1/1.3) × (1 + 0.1 × 50) ≈ 8.54 cycles
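The two results above are straightforward to verify numerically (a Python sketch of the arithmetic, not part of the original solution):

```python
# CPI without memory stalls, from the instruction mix
cpi_ideal = 0.4 * 1 + 0.3 * 1.5 + 0.2 * 1.5 + 0.1 * 1     # 1.25

# Stall cycles per instruction from each cache
data_stall = 0.3 * 0.2 * 80    # load/store fraction × miss rate × penalty = 4.8
inst_stall = 1.0 * 0.1 * 50    # every instruction is fetched                = 5.0

cpi_stall = cpi_ideal + data_stall + inst_stall            # 11.05

# AMAT: weight each access type by its share of the 1.3 accesses/instruction
amat = (0.3 / 1.3) * (1 + 0.2 * 80) + (1.0 / 1.3) * (1 + 0.1 * 50)

print(round(cpi_stall, 2), round(amat, 2))  # → 11.05 8.54
```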
5. Calculate the loading-time difference between no memory interleaving and 4-module memory
interleaving when a cache read miss occurs, so that main memory has to be accessed and the
data subsequently transferred to the cache.
– Size of block to transfer from memory to cache = 8 words
– Access time for main memory (1st word) = 8 cycles/word
– Access time for main memory (2nd to 8th word) = 4 cycles/word (no address decoding is
necessary within the same memory module)
– Transfer time from main memory to cache = 1 cycle/word

No memory interleaving:
Loading time = cache miss + (1st word + 2nd to 8th words) + 1 cache transfer
= 1 + 8 + (7 × 4) + 1 = 38 cycles
4-module memory interleaving:
Loading time = cache miss + (1st word from each of the 4 modules + 2nd word from each of the 4 modules) + 4 cache transfers
= 1 + (8 + 4) + 4 = 17 cycles
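The two timing models can be written out as a small Python sketch (my own variable names; the cost model is exactly the one used above):

```python
MISS = 1       # cycles to detect the miss
FIRST = 8      # cycles for the first access to a module
NEXT = 4       # cycles for a subsequent access to the same module
XFER = 1       # cycles per word transferred to the cache
WORDS = 8      # block size in words
MODULES = 4    # interleaving factor

# Sequential: every word after the first costs NEXT; one overlapped transfer tail
no_interleave = MISS + FIRST + (WORDS - 1) * NEXT + XFER           # 38

# Interleaved: 4 modules fetch in parallel, so 8 words take two rounds
# (FIRST for round one, NEXT for round two), then 4 transfer cycles
interleave = MISS + (FIRST + NEXT) + MODULES * XFER                # 17

print(no_interleave, interleave, no_interleave - interleave)  # → 38 17 21
```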
6. Consider a computer with the following characteristics: total of 1Mbyte of main memory; word
size of 1 byte; block size of 16 bytes; and cache size of 64 Kbytes. For the main memory addresses
of F0010 and CABBE, give the corresponding tag and offset values for a fully-associative cache.

With a fully associative cache, the address is split into a TAG and a WORD-OFFSET field. We no
longer need to identify which line a memory block might map to, because a block can be in any
line and we search every cache line in parallel. The 1-Mbyte main memory requires 20-bit
addresses. The word offset must be 4 bits to address each individual word in the 16-word
block, which leaves 16 bits for the tag.
F0010
Word offset = 0h
Tag = F001h
CABBE
Word offset = Eh
Tag = CABBh
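The split is just a shift and a mask on the 20-bit address (a quick Python sketch, not part of the original solution):

```python
OFFSET_BITS = 4  # 16 one-byte words per block

def split_fully_associative(addr):
    """Return (tag, word_offset) for a fully associative cache."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)

print([hex(v) for v in split_fully_associative(0xF0010)])  # → ['0xf001', '0x0']
print([hex(v) for v in split_fully_associative(0xCABBE)])  # → ['0xcabb', '0xe']
```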
7. Consider an Intel P4 microprocessor with a 16 Kbyte unified L1 cache. The miss rate for this cache
is 3% and the hit time is 2 CCs. The processor also has an 8 Mbyte, on-chip L2 cache. 95% of the
time, data requests to the L2 cache are found. If data is not found in the L2 cache, a request is
made to a 4 Gbyte main memory. The time to service a memory request is 100,000 CCs. On
average, it takes 3.5 CCs to process a memory request. How often is data found in main memory?

Average memory access time = Hit Time + (Miss Rate × Miss Penalty)
Average memory access time = Hit Time_L1 + (Miss Rate_L1 × Miss Penalty_L1)
Miss Penalty_L1 = Hit Time_L2 + (Miss Rate_L2 × Miss Penalty_L2)
Miss Penalty_L2 = Hit Time_Main + (Miss Rate_Main × Miss Penalty_Main)
Taking the L2 hit time as 15 CCs and the main-memory hit time as 200 CCs, and letting X be the main-memory miss rate:
3.5 = 2 + 0.03 (15 + 0.05 (200 + X (100,000)))
3.5 = 2 + 0.03 (15 + 10 + 5000X)
3.5 = 2 + 0.03 (25 + 5000X)
3.5 = 2 + 0.75 + 150X
3.5 = 2.75 + 150X
0.75 = 150X
X = .005

Thus, 99.5% of the time, we find the data we are looking for in main memory.
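Solving the nested AMAT equation for X can be checked numerically (a Python sketch; the 15-CC L2 hit time and 200-CC main-memory hit time are taken from the worked solution above):

```python
l1_hit, l1_miss = 2, 0.03
l2_hit, l2_miss = 15, 0.05
mem_hit, mem_penalty = 200, 100_000
amat_target = 3.5

# AMAT = l1_hit + l1_miss * (l2_hit + l2_miss * (mem_hit + X * mem_penalty))
# Unwind the nesting one level at a time to isolate X:
x = ((amat_target - l1_hit) / l1_miss - l2_hit - l2_miss * mem_hit) / (l2_miss * mem_penalty)

print(round(x, 3))  # → 0.005, i.e. main memory hits 99.5% of the time
```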
