207 Assignment 6

The document contains solutions to several problems related to computer architecture. It first calculates the CPI, MIPS, and execution time for two different machines. It then calculates the theoretical speedup of caching, the average access time T as a function of cache hit ratio H, and compares direct-mapped caching to caching with cache misses. Finally, it provides solutions for problems related to instruction execution, bus architectures, locality, cache organization, and set-associative caching.

Uploaded by

CKB The Artist
Copyright
© All Rights Reserved
207 Assignment 6

10869753

Solution

CPI = (45 000 × 1 + 32 000 × 2 + 15 000 × 2 + 8 000 × 2) / 100 000 = 155 000 / 100 000 = 1.55

MIPS = (40 × 10^6) / (1.55 × 10^6) = 25.8 MIPS

Execution time = (100 000 instructions) × 1.55 CPI = 155 000 cycles

= 155 000 cycles × 1/(40 × 10^6) sec = 0.003875 sec = 3.875 ms
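The same calculation can be checked with a short script. This is a minimal sketch: the instruction counts, per-class cycle counts, and the 40 MHz clock are taken from the solution above, while the function names are illustrative only.

```python
# Instruction mix from the solution: (instruction count, cycles per instruction).
instruction_mix = [(45_000, 1), (32_000, 2), (15_000, 2), (8_000, 2)]
clock_hz = 40e6  # 40 MHz clock, as used in the solution


def total_cycles(mix):
    """Total clock cycles needed to execute the whole mix."""
    return sum(count * cycles for count, cycles in mix)


def cpi(mix):
    """Average cycles per instruction."""
    return total_cycles(mix) / sum(count for count, _ in mix)


def mips(clock_hz, cpi_value):
    """Millions of instructions per second."""
    return clock_hz / (cpi_value * 1e6)


def execution_time(mix, clock_hz):
    """Execution time in seconds."""
    return total_cycles(mix) / clock_hz


print(f"CPI  = {cpi(instruction_mix):.2f}")
print(f"MIPS = {mips(clock_hz, cpi(instruction_mix)):.1f}")
print(f"Time = {execution_time(instruction_mix, clock_hz) * 1e3:.3f} ms")
```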


For machine A

CPI = [(8 × 1 + 4 × 3 + 2 × 4 + 4 × 3) × 10^6] / [(8 + 4 + 2 + 4) × 10^6] = 40 / 18

= 2.22

MIPS = (200 × 10^6) / (2.22 × 10^6) = 90

CPU time = (18 × 10^6 × 2.22) / (200 × 10^6) = 0.2 seconds

For machine B

CPI = [(10 × 1 + 8 × 2 + 2 × 4 + 4 × 3) × 10^6] / [(10 + 8 + 2 + 4) × 10^6] = 46 / 24 = 1.92

MIPS = (200 × 10^6) / (1.92 × 10^6) = 104

CPU time = (24 × 10^6 × 1.92) / (200 × 10^6) = 0.23 seconds

Even though machine B has a higher MIPS rating, it needs a longer execution time (0.23 seconds versus 0.2 seconds) to execute a similar set of instructions.
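The comparison between the two machines can be reproduced as follows. This is a sketch only: the instruction mixes (in millions of instructions) and the 200 MHz clock come from the solution above, and the `summarize` helper is a hypothetical name introduced for illustration.

```python
# (millions of instructions, cycles per instruction) for each instruction class.
machine_a = [(8, 1), (4, 3), (2, 4), (4, 3)]
machine_b = [(10, 1), (8, 2), (2, 4), (4, 3)]
clock_mhz = 200  # both machines run at 200 MHz


def summarize(mix, clock_mhz):
    """Return (CPI, MIPS, CPU time in seconds) for one machine."""
    instructions_m = sum(count for count, _ in mix)       # millions of instructions
    cycles_m = sum(count * cpi for count, cpi in mix)     # millions of cycles
    cpi_value = cycles_m / instructions_m
    mips = clock_mhz / cpi_value
    cpu_time = cycles_m / clock_mhz                       # (cycles * 1e6) / (MHz * 1e6)
    return cpi_value, mips, cpu_time


for name, mix in (("A", machine_a), ("B", machine_b)):
    cpi_value, mips, cpu_time = summarize(mix, clock_mhz)
    print(f"Machine {name}: CPI = {cpi_value:.2f}, MIPS = {mips:.0f}, time = {cpu_time:.2f} s")
```

The higher MIPS figure for machine B does not translate into a shorter run time, because B executes 24 million instructions where A executes 18 million.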

Problem 2.8

A processor accesses main memory with an average access time of T2. A smaller cache memory is interposed between the processor and main memory. The cache has a significantly faster access time of T1 < T2. The cache holds, at any time, copies of some main memory words and is designed so that the words more likely to be accessed in the near future are in the cache. Assume that the probability that the next word accessed by the processor is in the cache is H, known as the hit ratio.

a. For any single memory access, what is the theoretical speedup of accessing the word in the cache rather than in main memory?

b. Let T be the average access time. Express T as a function of T1, T2, and H. What is the overall speedup as a function of H?

c. In practice, a system may be designed so that the processor must first access the cache to determine if the word is in the cache and, if it is not, then access main memory, so that on a miss (opposite of a hit), memory access time is T1 + T2. Express T as a function of T1, T2, and H. Now calculate the speedup and compare to the result produced in part (b).

Solution

a. Speedup = (time to access in main memory) / (time to access in cache)

= T2 / T1

b. The average access time could be calculated as the following:

T = H × T1 + (1 – H) × T2

By applying the following equation:

Speedup = (Execution time before enhancement) / (Execution time after enhancement)

= T2 / T = T2 / (H × T1 + (1− H)T2)

= 1/ ( (1− H) + H× (T1 /T2) )


c. T = H × T1 + (1 – H) × (T1 + T2) = T1 + (1 – H) × T2

This is Equation in Chapter 4. Now,

Speedup = (Execution time before enhancement) / (Execution time after enhancement)

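= T2 / T = T2 / (T1 + (1 – H) × T2) = 1 / ((1 – H) + T1/T2)

Compared with part (b), the second term of the denominator grows from H × (T1/T2) to T1/T2, so the speedup in this case is lower.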
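The two access-time models can be compared numerically. The sketch below implements the formulas from parts (b) and (c); the function names and the sample values T1 = 1, T2 = 10, H = 0.9 are assumptions chosen purely for illustration and do not appear in the problem.

```python
def avg_time_direct(h, t1, t2):
    """Part (b): a miss costs only the main memory access time T2."""
    return h * t1 + (1 - h) * t2


def avg_time_cache_first(h, t1, t2):
    """Part (c): the cache is always checked first, so a miss costs T1 + T2."""
    return t1 + (1 - h) * t2


def speedup(t2, avg_time):
    """Speedup relative to a system that always accesses main memory."""
    return t2 / avg_time


# Illustrative values only (same time unit for T1 and T2).
t1, t2, h = 1.0, 10.0, 0.9

speedup_b = speedup(t2, avg_time_direct(h, t1, t2))
speedup_c = speedup(t2, avg_time_cache_first(h, t1, t2))
print(f"part (b) speedup: {speedup_b:.3f}")
print(f"part (c) speedup: {speedup_c:.3f}")
```

For any H below 1, the part (c) model gives the smaller speedup, since every miss pays the extra cache lookup.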

Problem 3.1

Step 1: 3005 → IR

Step 2: 3 → AC

Step 3: 5940 → IR

Step 4: 3 + 2 = 5 → AC

Step 5: 7006 → IR

Step 6: AC → Device 6

Problem 3.3

Consider a hypothetical 32-bit microprocessor having 32-bit instructions composed of two fields: the
first byte contains the opcode and the remainder the immediate operand or an operand address.

a. What is the maximum directly addressable memory capacity (in bytes)?

b. Discuss the impact on the system speed if the microprocessor bus has

1. a 32-bit local address bus and a 16-bit local data bus, or

2. a 16-bit local address bus and a 16-bit local data bus.

c. How many bits are needed for the program counter and the instruction register?

Solution

a. 2^(32-8) = 2^24 = 16,777,216 bytes = 16 MB

b.1. A 32-bit local address bus and a 16-bit local data bus. Instruction and data transfers would take three bus cycles each, one for the address and two for the data. Because the address bus is 32 bits wide, the whole address can be transferred to memory at once and decoded there; however, since the data bus is only 16 bits wide, it takes 2 bus cycles (accesses to memory) to fetch the 32-bit instruction or operand.

b.2. A 16-bit local address bus and a 16-bit local data bus. Instruction and data transfers would take four bus cycles each, two for the address and two for the data. The processor must perform two transmissions to send the whole 32-bit address to memory, which requires more complex memory interface control to latch the two halves of the address before the access is performed. In addition to this two-step address issue, since the data bus is also 16 bits wide, the microprocessor needs 2 bus cycles to fetch the 32-bit instruction or operand.

c. The PC needs 24 bits (24-bit addresses), and the IR needs 32 bits (32-bit instructions).
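The figures in this solution follow from simple width arithmetic, sketched below. The `bus_cycles` helper is hypothetical and uses a deliberately simplified model in which each bus-width chunk of address or data costs one bus cycle, matching the counting used in part (b).

```python
import math

INSTRUCTION_BITS = 32
OPCODE_BITS = 8
ADDRESS_FIELD_BITS = INSTRUCTION_BITS - OPCODE_BITS  # 24-bit operand address

# Part (a): one byte per address, so capacity is 2^24 bytes.
addressable_bytes = 2 ** ADDRESS_FIELD_BITS


def bus_cycles(address_bits, data_bits, address_bus_width, data_bus_width):
    """Simplified model: one bus cycle per bus-width chunk of address and of data."""
    address_cycles = math.ceil(address_bits / address_bus_width)
    data_cycles = math.ceil(data_bits / data_bus_width)
    return address_cycles + data_cycles


print(addressable_bytes)             # capacity in bytes
print(bus_cycles(32, 32, 32, 16))    # case b.1
print(bus_cycles(32, 32, 16, 16))    # case b.2
```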

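Two bus wait states are inserted in each memory read and write operation.

Original cycles = fetch opcode (4) + fetch operand address (3) + fetch operand (3) + increment (3) + store (3) = 16 cycles

a) Each wait state adds one bus cycle. The four memory operations (the three fetches and the store) each gain two cycles; the increment does not access memory and is unchanged:

= 6 + 5 + 5 + 3 + 5 = 24 cycles

Increase in percentage = (24 - 16)/16 = 8/16 = 0.5

= 50%

b) With the increment operation taking 13 cycles, the original instruction takes 4 + 3 + 3 + 13 + 3 = 26 cycles. With the wait states:

= 6 + 5 + 5 + 13 + 5 = 34 cycles

Increase in percentage = (34 - 26)/26 = 8/26 = 0.3077

= 30.77%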

Problem 4.1

a. Data spatial locality


b. Instruction spatial locality
c. Data temporal locality
d. Instruction temporal locality.

Problem 4.2

Program A performs better than program B. Both programs compute the squares of the differences between two sequences, and both execute the statements
Z[i]=X[i]-Y[i]
Z[i]=Z[i]*Z[i]
the same number of times. However, the loop condition i<n is checked n times in program A, whereas it is checked twice as many times in program B because of its two for loops.

For the cache comparison, let T1s be the average access time with the original cache (access time 1 ns, hit ratio 0.95), T2s the average access time with the larger cache (access time 1.5 ns, hit ratio 0.97), and T2 the main memory access time:

T2s = 1.5 + (1 – 0.97) × T2 = 1.5 + 0.03 T2

T1s = 1 + (1 – 0.95) × T2 = 1 + 0.05 T2

The larger cache is better when T1s > T2s:

1 + 0.05 T2 > 1.5 + 0.03 T2

0.02 T2 > 0.5

T2 > 25 ns

Increasing the hit ratio to 0.97 at the cost of increasing the cache access time to 1.5 ns improves the average memory access time only when the main memory access time T2 is larger than 25 ns.

Problem 5.1

There are 6 bits

Problem 5.9

Number of bytes in main memory = 2^16.

Block size = 8 bytes, so a byte within a block is identified with 3 bits. The byte number field therefore requires 3 bits.

Number of blocks in main memory = 2^16/8 = 2^13

Given: Cache is Direct Mapped

Number of cache lines = 32, so the line number in the cache can be represented with 5 bits.

Tag bits = 16 -5 -3 = 8 bits

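b)

The line number is the 5-bit field just above the 3-bit byte number (address bits 7 to 3), that is, (address / 8) mod 32.

Address                Decimal   Line No
0001 0001 0001 1011    4379      00011 = 3
1100 0011 0011 0100    49972     00110 = 6
1101 0000 0001 1101    53277     00011 = 3
1010 1010 1010 1010    43690     10101 = 21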

c)

The decimal equivalent of 0001 1010 0001 1010 is 6682.

When byte 6682 is in the cache, the 7 other bytes of its block are in the cache as well (all 8 bytes of a block are stored in a cache line). The 8 addresses are:
0001 1010 0001 1000

0001 1010 0001 1001

0001 1010 0001 1010

0001 1010 0001 1011

0001 1010 0001 1100

0001 1010 0001 1101

0001 1010 0001 1110

0001 1010 0001 1111
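The field layout from part (a) (8-bit tag, 5-bit line number, 3-bit byte number) can be applied mechanically with shifts and masks. The sketch below uses hypothetical helper names and the address from part (c) as its example.

```python
BYTE_BITS = 3   # 8 bytes per block
LINE_BITS = 5   # 32 cache lines


def split_address(address):
    """Split a 16-bit address into (tag, line number, byte number)."""
    byte_no = address & ((1 << BYTE_BITS) - 1)
    line_no = (address >> BYTE_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (BYTE_BITS + LINE_BITS)
    return tag, line_no, byte_no


def block_addresses(address):
    """All byte addresses that share a block with the given address."""
    base = address & ~((1 << BYTE_BITS) - 1)  # clear the byte-number bits
    return [base + i for i in range(1 << BYTE_BITS)]


address = 0b0001_1010_0001_1010  # 6682, the address from part (c)
print(split_address(address))
for a in block_addresses(address):
    print(f"{a:016b}")
```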

d)

Total bytes that can be stored in cache memory = No. of cache lines * No of bytes in a block(or line)

= 32 * 8

= 256 bytes

Problem 5.13

a) Fully associative cache

Tag (16 bits) | Word offset (4 bits)

CABBE = <1100 1010 1011 1011 1110> (in binary)

= <1100 1010 1011 1011> <1110>

Tag: CABB

Word offset: 14

b) Set associative cache

Number of sets = 2^11 = 2048

Tag (5 bits) | Set (11 bits) | Word offset (4 bits)

CABBE = <11001> <01010111011> <1110>

Word offset: 14

Set number: 699

Tag: 25
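Both decompositions of the address CABBE can be checked with a small field splitter. This is a sketch: `split_fields` is a hypothetical helper that takes the field widths from most significant to least significant, using the widths given in parts (a) and (b).

```python
def split_fields(address, widths):
    """Split an address into fields; widths are listed from MSB to LSB."""
    fields = []
    for width in reversed(widths):
        fields.append(address & ((1 << width) - 1))  # take the lowest `width` bits
        address >>= width
    return tuple(reversed(fields))


address = 0xCABBE  # 20-bit address

# Part (a): fully associative, 16-bit tag and 4-bit word offset.
tag, word = split_fields(address, (16, 4))
print(f"fully associative: tag = {tag:04X}, word offset = {word}")

# Part (b): set associative, 5-bit tag, 11-bit set number, 4-bit word offset.
tag, set_no, word = split_fields(address, (5, 11, 4))
print(f"set associative:   tag = {tag}, set = {set_no}, word offset = {word}")
```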
