Compre Final
Semester-I, 2021-22
Comprehensive Examination
Department of Computer Science and Information Systems (CSIS)
BITS-Pilani, K K Birla Goa Campus, Goa, India.
Dec 23, 2021 from 9 am to 12 noon (3 Hrs) Marks: 50
Instructions: Students may bring three pages of handwritten or printed notes and a calculator. Mobile phones, laptops,
and textbooks are not allowed. No marks will be awarded for a question if no reasoning is found in the answer script.
1. (a) (Marks: 4) Consider the following program segment and compare the performance of a 1-bit predictor with that
of a 2-bit predictor.
for (i=0; i<m; i++)
    for (j=0; j<n; j++){
        S1; S2; ...; Sk;
    }
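The comparison can be checked numerically with a minimal sketch such as the one below. It models only the inner loop's closing branch, and it rests on assumptions that are not part of the question: the branch is taken n-1 times and then not taken once per outer iteration, both predictors use a saturating counter that starts at 0 (predict not taken), and the function name count_mispredictions is hypothetical.

```c
#include <assert.h>

/* Count mispredictions of the inner loop-closing branch.
 * bits = 1 models a 1-bit (last-outcome) predictor,
 * bits = 2 models a 2-bit saturating counter.
 * Assumption: counter starts at 0, i.e. "not taken". */
long count_mispredictions(int m, int n, int bits)
{
    assert(bits == 1 || bits == 2);
    assert(m >= 0 && n >= 1);

    int max = (1 << bits) - 1;   /* 1 for 1-bit, 3 for 2-bit */
    int counter = 0;
    long wrong = 0;

    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            int taken = (j < n - 1);              /* last pass falls through */
            int predicted = (counter > max / 2);  /* upper half predicts taken */
            if (predicted != taken)
                wrong++;
            if (taken) {
                if (counter < max) counter++;
            } else {
                if (counter > 0) counter--;
            }
        }
    }
    return wrong;
}
```

Under these assumptions the 1-bit predictor mispredicts twice per outer iteration (on loop entry and on loop exit), while the 2-bit predictor settles at one misprediction per outer iteration after a short warm-up.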
2. (a) (Marks: 2+1) How many cycles are required to issue all the instructions of the following program on the
pipelined MIPS processor? What is the CPI of this program? Assume that the pipelined MIPS processor has no dynamic
branch predictor. State your assumptions and solve.
addi R1, $0, #10
addi R2, $0, #6
add R3, R2, R1
loop: beq R3, $0, exit
addi R3, R3, #-2
j loop
exit:
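The dynamic instruction count of this program can be traced with a short sketch. The cycle estimate below is only one possible set of assumptions, none of which come from the question: forwarding hides all data hazards, the branch is statically predicted not taken, and taken branches and jumps each cost a caller-supplied number of bubble cycles. The names run_stats, trace_loop, and issue_cycles are hypothetical.

```c
#include <assert.h>

struct run_stats {
    long instructions;        /* dynamic instruction count */
    long branches_taken;      /* beq outcomes that were taken */
    long branches_not_taken;  /* beq outcomes that fell through */
    long jumps;               /* executed j instructions */
};

/* Trace: R1 = r1; R2 = r2; R3 = R1 + R2; loop until R3 == 0, subtracting step. */
struct run_stats trace_loop(int r1, int r2, int step)
{
    assert(step > 0);
    assert(r1 + r2 >= 0 && (r1 + r2) % step == 0);  /* guarantees termination */

    struct run_stats s = {3, 0, 0, 0};  /* addi, addi, add */
    int r3 = r1 + r2;

    for (;;) {
        s.instructions++;               /* beq R3, $0, exit */
        if (r3 == 0) {
            s.branches_taken++;
            break;
        }
        s.branches_not_taken++;
        r3 -= step;
        s.instructions++;               /* addi R3, R3, #-2 */
        s.jumps++;
        s.instructions++;               /* j loop */
    }
    return s;
}

/* Issue cycles under the stated assumptions (pipeline drain not counted). */
long issue_cycles(struct run_stats s, int taken_penalty, int jump_penalty)
{
    return s.instructions
         + (long)taken_penalty * s.branches_taken
         + (long)jump_penalty * s.jumps;
}
```

With the values in the program (10, 6, and a step of 2) the trace executes 28 instructions; dividing the cycle estimate by that count gives the CPI for whichever penalties are assumed.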
(b) (Marks: 1+1) Now assume a 1-bit dynamic branch predictor with initial state 1, and answer the same questions as
in (a).
(c) (Marks: 1+2+1+4) Consider the datapath discussed in class and the following element delays for the
pipelined MIPS processor. Which unit should be improved to obtain the greatest speedup of the overall processor?
How fast should it be? (Making it faster than necessary is wasted effort.) What is the cycle time of the improved
processor?
Element                Parameter     Delay (ps)
Register clk-to-Q      t_pcq         35
Register setup         t_setup       20
Multiplexer            t_mux         25
ALU                    t_ALU         200
Memory read            t_memread     250
Memory write           t_memwrite    220
Register file read     t_RFread      160
Register file write    t_RFwrite     100
AND                    t_AND         15
Equality comparator    t_eq          40
Take the calculated cycle time as the time each stage needs to compute and latch its pipeline results. Find the
execution time of a program consisting of 10^9 instructions with the following instruction mix: 25% loads,
10% stores, 15% branches, 4% jumps, and 46% R-type. Assume that 40% of all loads are used by the next instruction
and 30% of all branches are mispredicted.
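The execution-time part follows the usual relation time = instructions x CPI x cycle time. A minimal sketch is given below, assuming (this is not stated in the question) that a load-use hazard, a mispredicted branch, and a jump each cost exactly one extra cycle and that every other instruction has a CPI of 1. The function names are hypothetical, and the cycle time is left as a parameter.

```c
#include <assert.h>

/* Average CPI with one-cycle penalties (assumed) for load-use stalls,
 * branch mispredictions, and jumps. All arguments are fractions in [0, 1]. */
double average_cpi(double load_frac, double load_use_frac,
                   double branch_frac, double mispredict_frac,
                   double jump_frac)
{
    assert(load_frac >= 0.0 && branch_frac >= 0.0 && jump_frac >= 0.0);
    return 1.0
         + load_frac * load_use_frac       /* stalled loads */
         + branch_frac * mispredict_frac   /* flushed branches */
         + jump_frac;                      /* every jump flushes one slot */
}

/* Execution time in seconds for a given cycle time in picoseconds. */
double execution_time_s(double instructions, double cpi, double cycle_ps)
{
    return instructions * cpi * cycle_ps * 1e-12;
}
```

For the instruction mix above this model gives an average CPI of 1.185; a different penalty model for the class datapath would change that figure.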
3. (a) (Marks: 2+2+1+1) Consider a system with a byte-addressable main memory of 2^16 bytes. Assume that a
direct-mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A 50 x 50 two-dimensional
array of bytes is stored in the main memory, starting at address 1000H. Assume that the data cache is initially empty.
The complete array is accessed twice. Assume that the contents of the data cache do not change between the two
accesses. How many data misses will occur in total? Show each step clearly; no marks will be awarded if the steps are
not shown clearly. Show the tag memory with its valid-bit and dirty-bit entries. What is the size of the tag memory
when fields such as the valid bit and dirty bit are included?
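A hand count of the misses can be cross-checked with a small simulation. The sketch below assumes the array is read byte by byte in the order it is stored (so the traversal is sequential in memory) and that a miss is counted once per line fill; the function name count_misses is hypothetical.

```c
#include <assert.h>

#define LINES      32   /* cache lines, from the question */
#define LINE_BYTES 64   /* bytes per line, from the question */

/* Count line fills for `passes` sequential sweeps over `bytes` bytes
 * starting at address `base`, in an initially empty direct-mapped cache. */
long count_misses(unsigned base, unsigned bytes, int passes)
{
    assert(passes >= 0);

    int valid[LINES] = {0};
    unsigned tag[LINES] = {0};
    long misses = 0;

    for (int p = 0; p < passes; p++) {
        for (unsigned off = 0; off < bytes; off++) {
            unsigned block = (base + off) / LINE_BYTES;  /* memory block number */
            unsigned index = block % LINES;              /* cache line it maps to */
            unsigned t     = block / LINES;              /* remaining tag bits */
            if (!valid[index] || tag[index] != t) {
                misses++;
                valid[index] = 1;
                tag[index] = t;
            }
        }
    }
    return misses;
}
```

Calling count_misses(0x1000, 50 * 50, 1) and count_misses(0x1000, 50 * 50, 2) separates the compulsory misses of the first sweep from the conflict misses of the second.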
(b) (Marks: 2.5+0.5+0.5+0.5) Write C code for the 3-bit Tree-PLRU replacement algorithm for a fully associative
cache. Your code must provide these functions: refer(), replace(), isAllValid(). State your assumptions and define
each function's signature and body. There is no need to check the tags in memory for the CPU's request, and no need
to write the code that calls the above-mentioned functions; just write the functions correctly, taking into account
the dependencies among them. Where is this state information stored? How many bits are required for the Tree-PLRU
algorithm if the associative cache has n lines? How many bits are required for the Tree-PLRU algorithm if the set-
associative cache has s sets and n lines?
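One possible shape for such code is sketched below; it is an illustration under stated assumptions, not a model answer. It assumes a 4-line cache (the size that 3 tree bits cover), keeps the state in a hypothetical struct plru_cache passed by pointer, uses the convention that each tree bit points toward the side holding the next victim, and fills invalid lines in index order before evicting.

```c
#include <assert.h>

#define WAYS 4  /* assumption: 3 tree bits cover exactly 4 lines */

struct plru_cache {
    int valid[WAYS];
    int b0;  /* root:  0 = victim among lines 0-1, 1 = among lines 2-3 */
    int b1;  /* left:  0 = victim is line 0,       1 = line 1 */
    int b2;  /* right: 0 = victim is line 2,       1 = line 3 */
};

/* Returns 1 when every line holds valid data, 0 otherwise. */
int isAllValid(const struct plru_cache *c)
{
    for (int i = 0; i < WAYS; i++)
        if (!c->valid[i])
            return 0;
    return 1;
}

/* Record an access: flip the bits on the path so they point away from `line`. */
void refer(struct plru_cache *c, int line)
{
    assert(line >= 0 && line < WAYS);
    if (line < 2) {
        c->b0 = 1;              /* next victim is on the right side */
        c->b1 = (line == 0);    /* within the left pair, point at the other line */
    } else {
        c->b0 = 0;              /* next victim is on the left side */
        c->b2 = (line == 2);
    }
}

/* Pick a line for the incoming block, mark it valid and most recently used. */
int replace(struct plru_cache *c)
{
    int victim;
    if (!isAllValid(c)) {
        victim = 0;
        while (c->valid[victim])   /* first invalid line */
            victim++;
    } else if (c->b0 == 0) {
        victim = c->b1 ? 1 : 0;
    } else {
        victim = c->b2 ? 3 : 2;
    }
    c->valid[victim] = 1;
    refer(c, victim);
    return victim;
}
```

Here replace() depends on both isAllValid() and refer(), which is the dependency the question asks to respect.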
(c) (Marks: 2+2+2+1) Suppose a computer has a processor with two L1 caches, one for instructions and one for
data, and an L2 cache. Let x be the access time for the two L1 caches. The miss penalties are approximately 21x
for transferring a block from L2 to L1, and 105x for transferring a block from the main memory to L2. For the
purpose of this problem, assume that the hit rates are the same for instructions and data and that the hit rates in the
L1 and L2 caches are 0.81 and 0.75, respectively.
(1) What fraction of accesses miss in both the L1 and L2 caches, thus requiring access to the main memory? (2)
What is the average access time as seen by the processor? (3) Suppose that the L2 cache has an ideal hit rate of 1.
By what factor would this reduce the average memory access time as seen by the processor? (4) Consider the
following change to the memory hierarchy. The L2 cache is removed and the size of the L1 caches is increased so
that their miss rate is cut in half. What is the average memory access time as seen by the processor in this case?
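The arithmetic for parts (1) to (3) can be organised as in the sketch below. It uses one common model, average time = L1 access time + L1 miss rate x (L2-to-L1 penalty + L2 miss rate x memory-to-L2 penalty), with all times expressed in units of x. This model is an assumption: some textbooks weight the L1 access time by the hit rate instead, which gives a different figure, so the convention used in the course should take precedence. The function names are hypothetical.

```c
#include <assert.h>

/* Fraction of all accesses that miss in both L1 and L2. */
double global_miss_fraction(double l1_miss, double l2_miss)
{
    return l1_miss * l2_miss;
}

/* Average access time in units of x under the assumed model:
 * every access pays 1x; an L1 miss adds l2_penalty; an L2 miss
 * additionally adds mem_penalty. */
double amat_in_x(double l1_miss, double l2_miss,
                 double l2_penalty, double mem_penalty)
{
    assert(l1_miss >= 0.0 && l1_miss <= 1.0);
    assert(l2_miss >= 0.0 && l2_miss <= 1.0);
    return 1.0 + l1_miss * (l2_penalty + l2_miss * mem_penalty);
}
```

With miss rates 1 - 0.81 and 1 - 0.75 and penalties 21 and 105, the ratio of amat_in_x with the given L2 miss rate to amat_in_x with an L2 miss rate of 0 gives the improvement factor asked for in part (3) under this model.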
4. (a) (Marks: 0.5+0.5+3) Consider the 512K x 8 memory chip below. How many such chips are required to
build an 8M x 32 memory? Which additional component, and of what size, is required to connect these chips? Show the
connections clearly for the datapath, address path, and control path of the 8M x 32 memory. No marks will be awarded
if the connections are not proper and clear.
[Figure: 512K x 8 memory chip with a 19-bit address input, an 8-bit data port, and a chip-select (CS) input]
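The chip count follows from dividing the target organisation by the chip organisation in both dimensions. A minimal sketch, assuming the word counts are powers of two and that each row of chips is selected through CS by a decoder driven from the upper address bits; the names expansion and plan_memory are hypothetical.

```c
#include <assert.h>

struct expansion {
    long rows;        /* chip rows needed to reach the target depth */
    long cols;        /* chips per row needed to reach the target width */
    long chips;       /* rows * cols */
    int select_bits;  /* upper address bits fed to the row decoder */
};

struct expansion plan_memory(long target_words, int target_width,
                             long chip_words, int chip_width)
{
    assert(chip_words > 0 && chip_width > 0);
    assert(target_words % chip_words == 0);
    assert(target_width % chip_width == 0);

    struct expansion e;
    e.rows = target_words / chip_words;
    e.cols = target_width / chip_width;
    e.chips = e.rows * e.cols;

    /* log2(rows): number of decoder inputs, assuming rows is a power of two */
    e.select_bits = 0;
    for (long r = e.rows; r > 1; r >>= 1)
        e.select_bits++;
    return e;
}
```

For example, plan_memory(8L << 20, 32, 512L << 10, 8) describes the 8M x 32 target built from 512K x 8 chips; a decoder with select_bits inputs has 2^select_bits outputs, one per chip row.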
(b) (Marks: 2+1.5+2.5) A system has a 64-GB DRAM with two channels, each channel with 2 DIMMs, each
DIMM having 2 ranks and 32 banks, and the number of rows and columns in each bank is the same. The width of the
data bus for communicating with the last-level cache is 16 B. Each channel is connected to 64-GB of contiguous
memory. (1) Find the number of bits needed to identify the Channel, DIMM, Rank, Bank, Row, and Column, as discussed
in class. (2) Assume the DRAM system serves a last-level cache with 64-B blocks, using cache-block interleaving and a
closed-page row-buffer management policy. Assume that it takes 20 cycles for a row to be transferred from the storage
array to the row buffer after ACT (A) is enabled, and 10 cycles for data to be moved through the data bus to the memory
controller after CAS is enabled. A minimum of 15 cycles is needed between a PRE (P) and an ACT (A) signal. The
controller uses the FR-FCFS scheduling algorithm. Find the total service time for the following memory requests
(physical address, arrival time):
(0x844332257, 10), (0x044336245, 11), (0x144336245, 12), (0x844336245, 20), (0x845333246, 35),
(0x855332265, 80)
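The field widths in part (1) are base-2 logarithms of the corresponding counts. The sketch below is a rough aid only and rests on assumptions not fixed by the question: the physical address is 36 bits wide (the request addresses above have nine hex digits), the low bits select a byte within the 16-B bus width, and whatever remains after the channel, DIMM, rank, bank, and bus-offset fields is shared between row and column. The mapping discussed in class may differ, and the function names are hypothetical.

```c
#include <assert.h>

/* Bits needed to index `count` items; count must be a power of two. */
int bits_for(unsigned long long count)
{
    assert(count != 0 && (count & (count - 1)) == 0);
    int bits = 0;
    while (count > 1) {
        count >>= 1;
        bits++;
    }
    return bits;
}

/* Address bits left over for row + column under the assumed layout. */
int row_plus_col_bits(int address_bits, unsigned channels, unsigned dimms,
                      unsigned ranks, unsigned banks, unsigned bus_bytes)
{
    return address_bits
         - bits_for(channels) - bits_for(dimms) - bits_for(ranks)
         - bits_for(banks) - bits_for(bus_bytes);
}
```

Because the rows and columns per bank are stated to be equal, the leftover bits split evenly between the two fields under this layout.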
-End-