Exercise 5_with solution
1- Consider a microprocessor system where the processor has a 16-bit data bus and
a 20-bit address bus. What is the maximum size of the byte-addressable memory
that can be connected with this processor? Express the size using Kilo, Mega or
Giga bytes.
- If the data memory is 1 Kbyte, what will be the size of the address-bus to
point at data-memory locations?
- If the instruction memory is 1 Mbyte, what will be the size of the instruction
address-bus?
5- In this exercise, we examine how pipelining affects the clock cycle time of the
processor. Problems in this exercise assume that individual stages of the
datapath have the following latencies
Also assume:
5.1 What is the clock cycle time in a pipelined and non-pipelined processor?
5.2 What is the total latency of an ld instruction in a pipelined and non-pipelined
processor?
5.3 If we can split one stage of the pipelined data path into two new stages, each
with half the latency of the original stage, which stage would you split and what is
the new clock cycle time of the processor?
5.4 Assuming there are no stalls or hazards, what is the utilization of the data
memory?
5.5 Assuming there are no stalls or hazards, what is the utilization of the write-
register port of the “Registers” unit?
6.1 Suppose we modify the pipeline so that it has only one memory (that handles
both instructions and data). In this case, there will be a structural hazard every
time a program needs to fetch an instruction during the same cycle in which
another instruction accesses data. Draw a pipeline diagram to show where the
code above will stall.
6.2 In general, is it possible to reduce the number of stalls/NOPs resulting from this
structural hazard by reordering code?
6.3 Must this structural hazard be handled in hardware? We have seen that data
hazards can be eliminated by adding NOPs to the code. Can you do the same
with this structural hazard? If so, explain how. If not, explain why not.
6.4 Approximately how many stalls would you expect this structural hazard to
generate in a typical program? (Use the instruction mix from Exercise 5).
7- Assume that x11 is initialized to 11 and x12 is initialized to 22. Suppose you
executed the code below on a version of the pipeline with 5 stages that does not
handle data hazards (i.e., the programmer is responsible for addressing data
hazards by inserting NOP instructions where necessary). What would the final
values of registers x13 and x14 be?
a) For each of these references, identify the binary word address, the tag, and
the index given a direct-mapped cache with 16 one-word blocks. Also list
whether each reference is a hit or a miss, assuming the cache is initially
empty. Ignore byte-offset.
b) For each of these references, identify the binary word address, the tag,
the index, and the offset given a direct mapped cache with two-word
blocks and a total size of eight blocks. Also list if each reference is a hit
or a miss, assuming the cache is initially empty. Ignore byte-offset.
c) You are asked to optimize a cache design for the given references.
There are three direct-mapped cache designs possible, all with a total
of eight words of data:
• C1 has 1-word blocks,
• C2 has 2-word blocks, and
• C3 has 4-word blocks
9- For a direct-mapped cache design with a 64-bit address, the following bits
of the address are used to access the cache.
Tag      Index    Total Offset
63–10    9–5      4–0
a. What is the cache block size (in words)? Assume each word is 32 bits, and
make sure to account for the byte offset.
b. How many blocks does the cache have?
c. What is the ratio of the total bits required for such a cache
implementation to the data storage bits?
Beginning from power on, the following byte-addressed cache
references are recorded.
Question 1 solution
The maximum size of byte-addressable memory = number of cells that can be addressed
by the address bus * the memory cell size
= 2^20 * 1 byte = 1 Mbyte = 1 MB
Question 2 solution
This is similar to the previous question, except that the address bus is 32 bits wide.
Therefore:
Max memory capacity = 2^32 * 1 bit = 2^2 * 2^30 * 1 bit = 4 Gbit
= 2^32 / 2^3 bytes = 2^29 bytes = 2^9 * 2^20 bytes = 512 MB
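As a quick check of the two capacity calculations above, the sketch below multiplies the number of addressable cells by the cell size. The helper names `max_memory_bits` and `format_bytes` are illustrative only and are not part of the exercise; 1 KB is taken as 2^10 bytes.

```python
def max_memory_bits(address_bits: int, cell_bits: int) -> int:
    """Largest memory (in bits) reachable with the given address width."""
    return (2 ** address_bits) * cell_bits


def format_bytes(num_bytes: int) -> str:
    """Express a byte count in KB, MB or GB (binary units, 1 KB = 2^10 bytes)."""
    for unit, shift in (("GB", 30), ("MB", 20), ("KB", 10)):
        if num_bytes >= 1 << shift:
            return f"{num_bytes / (1 << shift):g} {unit}"
    return f"{num_bytes} B"


# Question 1: 20-bit address bus, byte-addressable memory (8-bit cells).
q1_bytes = max_memory_bits(20, 8) // 8
# Question 2: 32-bit address bus, 1-bit cells (as in the calculation above).
q2_bytes = max_memory_bits(32, 1) // 8

print(format_bytes(q1_bytes))  # 1 MB
print(format_bytes(q2_bytes))  # 512 MB
```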
Question 3 solution
Please note that choosing the type of memory to store data and instructions is
something that depends on the designers. Most of the memories are available as a
byte-addressable memory meaning that every 8 bits can be accessed through its
dedicated address. In the case of this question, a convenient choice for data memory
is byte addressable memory due to 8-bit registers, while a memory with cell size of 16
bits will be convenient to store instructions. As a processor designer, you may choose
a byte-addressable memory to store each instruction in two consecutive memory
locations, or a 16-bit addressable memory to store each instruction in one memory
cell. However, when you make a choice like this you may need to consider
adjustments in the other parts of your design (such as PC incrementing, or size of the
address bus).
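To make the effect of the cell-size choice concrete, the sketch below computes the address-bus width for the 1 Kbyte data memory and the 1 Mbyte instruction memory from the question. It assumes 1 Kbyte = 2^10 bytes and 1 Mbyte = 2^20 bytes; `address_bits` is an illustrative helper name.

```python
def address_bits(memory_bytes: int, cell_bytes: int = 1) -> int:
    """Number of address lines needed to select every cell of a memory."""
    cells = memory_bytes // cell_bytes
    # (cells - 1).bit_length() equals ceil(log2(cells)) for cells >= 1.
    return (cells - 1).bit_length()


KB = 2 ** 10
MB = 2 ** 20

data_bits = address_bits(1 * KB)         # byte-addressable data memory
instr_bits_byte = address_bits(1 * MB)   # byte-addressable instruction memory
instr_bits_16 = address_bits(1 * MB, 2)  # instruction memory with 16-bit cells

print(data_bits, instr_bits_byte, instr_bits_16)  # 10 20 19
```

With 16-bit cells the instruction address bus is one bit narrower, and the PC would advance by 1 per instruction instead of 2.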
Question 4 solution
CPU execution time for a program = no. CPU clock cycles * clock cycle time.
In the given question, the execution time and the number of clock cycles are given, so we can calculate
the frequency:
CPU execution time_A = no. CPU clock cycles_A * clock cycle time_A
10 ms = 20 * 10^6 * clock cycle time_A
=> Clock frequency_A = 1 / clock cycle time_A = (20 * 10^6) / (10 * 10^-3 s) = 2 * 10^9 Hz = 2 GHz
CPU execution time_B = no. CPU clock cycles_B * clock cycle time_B
=> no. CPU clock cycles_B = CPU execution time_B / clock cycle time_B = 5 ms / (1 / 3 GHz) = 15 * 10^6
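The same relation can be checked numerically. The sketch below uses only the figures that appear in the working above (10 ms and 20 * 10^6 cycles for A; 5 ms and a 3 GHz clock for B). The function names are illustrative, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def clock_frequency_hz(cycles, exec_time_s):
    """frequency = clock cycles / execution time."""
    return cycles / exec_time_s


def clock_cycles(exec_time_s, frequency_hz):
    """clock cycles = execution time * frequency (= time / cycle time)."""
    return exec_time_s * frequency_hz


# Processor A: 20 * 10^6 cycles executed in 10 ms.
freq_a = clock_frequency_hz(20 * 10**6, Fraction(10, 1000))
# Processor B: 5 ms of execution at 3 GHz.
cycles_b = clock_cycles(Fraction(5, 1000), 3 * 10**9)

print(freq_a)    # 2000000000  (2 GHz)
print(cycles_b)  # 15000000    (15 * 10^6 cycles)
```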
Question 5 solution
• 5.1 Pipelined: 350 ps; non-pipelined: 1250 ps
• 5.2 Pipelined: 1750 ps; non-pipelined: 1250 ps
• 5.3 Split the ID stage. This reduces the clock cycle time to 300 ps.
• 5.4 35% (load and store)
• 5.5 65% (sum of ALU and load)
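The reasoning behind these answers can be sketched as below. The stage-latency table and the instruction mix are not reproduced in this copy of the exercise, so the numbers in `latencies` and `mix` are assumptions, chosen only because they are consistent with the answers listed above; substitute the values from the original table.

```python
# ASSUMED stage latencies in ps (not taken from the exercise's table,
# which is missing here; chosen to reproduce the answers above).
latencies = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}
# ASSUMED instruction mix in percent, likewise chosen to match 35% / 65%.
mix = {"alu": 45, "branch": 20, "load": 20, "store": 15}

# 5.1: the slowest stage sets the pipelined clock; a non-pipelined
# (single-cycle) design needs one cycle as long as all stages together.
pipelined_cycle = max(latencies.values())
non_pipelined_cycle = sum(latencies.values())

# 5.2: an ld passes through every stage, one clock cycle per stage.
ld_latency_pipelined = len(latencies) * pipelined_cycle

# 5.3: split the slowest stage into two halves and recompute the clock.
slowest = max(latencies, key=latencies.get)
split = {name: ps for name, ps in latencies.items() if name != slowest}
split[slowest + "a"] = split[slowest + "b"] = latencies[slowest] // 2
new_cycle = max(split.values())

# 5.4 / 5.5: loads and stores use data memory; ALU ops and loads write a register.
data_mem_util = mix["load"] + mix["store"]
write_port_util = mix["alu"] + mix["load"]

print(pipelined_cycle, non_pipelined_cycle, ld_latency_pipelined)  # 350 1250 1750
print(slowest, new_cycle)                                          # ID 300
print(data_mem_util, write_port_util)                              # 35 65
```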
Question 6 solution
6.1 Stalls are marked with **:
6.2 Reordering code won’t help in this case, because every instruction must be
fetched; thus, every data access causes a stall. Reordering code will just change
the pair of instructions that are in conflict.
6.3 You can’t solve this structural hazard with NOPs, because even the NOPs
must be fetched from instruction memory.
Question 7 solution:
There is a data hazard of type RAW (read after write). x11 is written by the first
instruction, but while that instruction is still in the pipeline, instructions 2 and 3 are
fetched and decoded, so they read the old value of x11. In the second instruction x11 is
therefore still 11, and 33 is stored in x13. In the last instruction x11 is also still
11, and therefore x14 becomes 26.
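The stale-read behaviour described above can be reproduced with a small model. The code listing of the exercise is not reproduced here, so the three-instruction `program` below is a hypothetical sequence chosen only to match the values in the explanation. The model also assumes that a result becomes readable three instructions after the one that produces it (a 5-stage pipeline whose register file is written in the first half of a cycle and read in the second half).

```python
def run_without_hazard_handling(program, regs, visible_after=3):
    """Run `program` on a pipeline that neither stalls nor forwards.

    A result produced by instruction i is visible to instruction j only
    when j - i >= visible_after (an assumption of this sketch).
    Supports only `add rd, rs1, rs2` and `addi rd, rs1, imm`.
    """
    regs = dict(regs)
    pending = []  # (index at which the write becomes visible, register, value)
    for j, (op, rd, rs1, src2) in enumerate(program):
        # Commit every write that has reached write-back by now.
        for item in [p for p in pending if p[0] <= j]:
            regs[item[1]] = item[2]
            pending.remove(item)
        a = regs[rs1]
        b = src2 if op == "addi" else regs[src2]
        pending.append((j + visible_after, rd, a + b))
    # Drain the pipeline: the remaining writes complete in program order.
    for _, rd, value in pending:
        regs[rd] = value
    return regs


# Hypothetical sequence, consistent with the explanation above.
program = [
    ("addi", "x11", "x12", 5),    # writes x11
    ("add", "x13", "x11", "x12"),  # reads the old x11 (11)
    ("addi", "x14", "x11", 15),   # reads the old x11 (11)
]
final = run_without_hazard_handling(program, {"x11": 11, "x12": 22})
print(final["x13"], final["x14"])  # 33 26
```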
Question 8 solution:
Byte offset is ignored according to the question.
a) 1 word/block -> no bits are required for the word offset
16 blocks -> 4-bit index -> the 4 LSBs are used for the index, the remaining bits for the tag:
b) 2 words per block -> 1 bit for the word offset
8 blocks -> 3 bits for the index
The remaining bits are used for the tag. Notice that in this case we get more hits for the address
references given in this question.
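The field split for both configurations can be sketched as follows. The reference list of the exercise is not reproduced here, so `sample` is a made-up list of word addresses used only to illustrate the method; the function names are likewise illustrative.

```python
def split_word_address(addr, num_blocks, words_per_block):
    """Split a word address into (tag, index, word offset)."""
    offset = addr % words_per_block      # lowest bits: word within the block
    block_addr = addr // words_per_block
    index = block_addr % num_blocks      # next bits: cache line
    tag = block_addr // num_blocks       # remaining high bits
    return tag, index, offset


def simulate(addresses, num_blocks, words_per_block):
    """Direct-mapped cache, initially empty; returns one row per reference."""
    cache = {}  # index -> tag currently stored in that line
    rows = []
    for addr in addresses:
        tag, index, offset = split_word_address(addr, num_blocks, words_per_block)
        hit = cache.get(index) == tag
        cache[index] = tag  # on a miss the block is fetched and replaces the old one
        rows.append((addr, tag, index, offset, "hit" if hit else "miss"))
    return rows


sample = [2, 3, 18, 19, 2]  # hypothetical word addresses, not the exercise's list

part_a = simulate(sample, num_blocks=16, words_per_block=1)
part_b = simulate(sample, num_blocks=8, words_per_block=2)

hits_a = sum(row[4] == "hit" for row in part_a)
hits_b = sum(row[4] == "hit" for row in part_b)
print(hits_a, hits_b)  # 0 2
```

For this sample the two-word blocks turn the accesses to 3 and 19 into hits, because each shares a block with the address fetched just before it.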
c)
Question 9 solution:
A) Each word is 32 bits = 4 bytes => 2 bits for the byte offset. We
have 5 bits in total for the offset, so 5 - 2 = 3 bits remain for the word
offset => 2^3 = 8 words per block.
B) Look at the index field => 5 bits for the index => 2^5 = 32 blocks
in total.
C) Total no. bits required = no. blocks * (data bits in each block + tag bits + valid bit)
= 32 * (8 words/block * 32 bits/word + (64 - (5 + 5)) tag bits + 1 valid bit)
= 32 * (8 * 32 + 54 + 1) = 9952 bits
Total data that can be stored = 32 blocks * 8 words/block * 32 bits/word
= 8192 bits
No. bits required / data bits that can be stored in the cache = 9952 / 8192 ≈ 1.215
This means that to store 8192 bits of data, the cache actually needs
1760 additional bits (9952 bits in total) to hold the tags and valid bits.
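The three parts of this solution can be recomputed from the address-field widths alone. The sketch below assumes, as the solution does, one valid bit per block and no other per-block state; the constant names are illustrative.

```python
ADDRESS_BITS = 64
OFFSET_BITS = 5   # address bits 4-0 (byte offset + word offset)
INDEX_BITS = 5    # address bits 9-5
WORD_BITS = 32
VALID_BITS = 1    # assumed: one valid bit per block, nothing else

# A) 4 bytes per word -> 2 byte-offset bits; the rest select the word.
byte_offset_bits = (WORD_BITS // 8 - 1).bit_length()
words_per_block = 2 ** (OFFSET_BITS - byte_offset_bits)

# B) one block per index value.
num_blocks = 2 ** INDEX_BITS

# C) every block stores its data, its tag and its valid bit.
tag_bits = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS
data_bits = num_blocks * words_per_block * WORD_BITS
total_bits = num_blocks * (words_per_block * WORD_BITS + tag_bits + VALID_BITS)
ratio = total_bits / data_bits

print(words_per_block, num_blocks)               # 8 32
print(total_bits, data_bits, round(ratio, 3))    # 9952 8192 1.215
print(total_bits - data_bits)                    # 1760 overhead bits
```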