Mid-Sem2

The document is an exam paper for the Parallel Computing course at Birla Institute of Technology & Science, Pilani, Hyderabad Campus, for the 1st semester of 2019-2020. It consists of multiple questions covering topics such as message routing, matrix multiplication, network topology, task scheduling, search trees, and cache coherence. The exam has a total weightage of 30% and is closed book with a duration of 1 hour and 30 minutes.


Birla Institute of Technology & Science – Pilani

Hyderabad Campus
1st Semester 2019-2020
Parallel Computing (CS F422) – Mid Sem Test (Regular)
Date: 04.10.2019 Weightage: 30% Duration: 1hr 30 min. Type: Closed Book
Instructions: Answer all questions. All parts of a question should be answered consecutively. No of pages: 2
Q1. (a) In parallel systems, routing of messages is very important. Consider store-and-forward routing. The cost of
sending a single message of size m from Psource (Ps) to Pdestination (Pd) via a path of length d is ts + tw×d×m.
Alternatively, the user can break the message into k parts, each of size m/k, and then send these k distinct messages
one by one from Ps to Pd. For this new method, derive the expression for the time to transfer a message of size m to a
node d hops away under the following two cases:
(i) Assume that another message can be sent from Ps as soon as the previous message has reached the next node in the
path.
(ii) Assume that another message can be sent from Ps only after the previous message has reached Pd.
(b) For each case, discuss the value of this expression as k varies between 1 and m. Also, what is the optimal
value of k if ts is very large, or if ts = 0?
(4 + 4 = 8 marks)
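The two cases in Q1 can be explored numerically. The following is a minimal sketch under one reading of the cost model, not a model answer: it assumes the startup time ts is paid once per part at the source, that one hop of a part of size m/k costs tw×(m/k), and that parts never queue at intermediate nodes. The function names are hypothetical.

```python
def single_message_time(ts, tw, m, d):
    """Cost given in the question: one message of size m over d hops."""
    return ts + tw * d * m


def split_wait_for_destination(ts, tw, m, d, k):
    """Case (ii): the next part leaves Ps only after the previous one reaches Pd.

    The k transfers are strictly sequential, so their costs simply add up.
    """
    return k * single_message_time(ts, tw, m / k, d)


def split_wait_for_next_node(ts, tw, m, d, k):
    """Case (i): the next part leaves Ps once the previous one reaches the next node.

    Assumption: the source is busy for ts + tw*(m/k) per part (startup plus one hop).
    The last part therefore starts after (k - 1) such intervals and then needs a
    full d-hop transfer of its own.
    """
    per_part_at_source = ts + tw * (m / k)
    return (k - 1) * per_part_at_source + single_message_time(ts, tw, m / k, d)


if __name__ == "__main__":
    ts, tw, m, d = 10, 1, 100, 4  # illustrative values only
    for k in (1, 2, 4, 10, 100):
        print(k,
              split_wait_for_next_node(ts, tw, m, d, k),
              split_wait_for_destination(ts, tw, m, d, k))
```

Printing the two columns for a large and a zero value of ts shows how the startup term and the per-hop term pull k in opposite directions.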
Q2. (a) Let us use the CREW approach for solving the problem of multiplying an n×n matrix A and vector X in O(lg n)
time. How many processors does this approach require? How much work does it require? Compare this approach with
sequential version to find the efficiency and speedup. You can write the answers using O( ) notation.
(b) Alter the approach given in (a) to execute it on an EREW PRAM and find the running time.
(c) Assume that we have an array of n elements and that attached to each element is a tag with a value of 0 or 1. The
objective is to group the elements in the array so that the 0-tagged elements come before the 1-tagged ones while the
order of same-tagged elements is preserved. For example, if element A is in position 5, B is in position 8, and both A
and B are tagged 0, then A must precede B in the output. The following is an optimized approach that uses n processors
on an EREW PRAM. Fill in the blanks.
Algorithm Compact (A[1..n],f[1..n],n,B[1..n])
1. if (_____ ) c[i]=1;
   else c[i]=0;
2. Parallel_Prefix (c[1..n],d[1..n],n);
// Broadcast n-d[n] (number of zeroes) everywhere.
3. Broadcast(n,n-d[n],Total);
4. if (f[i]==1) // i.e. c[i]==1/f[i]==0 processors participate in step 4
5. B[Total[i] + d[i]] = _____ // d[i] is destination address for item ___
6. if (f[i]==0)
7. B[i-d[i]]= _____;
(2 + 2 + 4 = 8 marks)
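The prefix-sum idea behind Q2(c) can be tried out sequentially. The following is a minimal sketch, assuming 0-based Python lists instead of the 1-based arrays of the question and a plain loop in place of the n PRAM processors. The function name compact_by_tag is hypothetical, and this is one possible realisation of the technique rather than the expected answer.

```python
from itertools import accumulate


def compact_by_tag(items, tags):
    """Stable partition: 0-tagged items first, then 1-tagged items.

    Sequential stand-in for the parallel algorithm; each loop iteration
    corresponds to the work of one processor i.
    """
    n = len(items)
    ones_flag = [1 if t == 1 else 0 for t in tags]
    # Inclusive prefix sum: ones_upto[i] = number of 1-tags in positions 0..i.
    ones_upto = list(accumulate(ones_flag))
    zeros_total = n - (ones_upto[-1] if n else 0)

    out = [None] * n
    for i in range(n):
        if tags[i] == 1:
            # 1-tagged items go after all zeros; "- 1" converts to 0-based.
            out[zeros_total + ones_upto[i] - 1] = items[i]
        else:
            # i - ones_upto[i] = number of 0-tagged items before position i.
            out[i - ones_upto[i]] = items[i]
    return out


if __name__ == "__main__":
    print(compact_by_tag(list("ABCDE"), [1, 0, 1, 0, 0]))
    # → ['B', 'D', 'E', 'A', 'C']
```

Each output slot is written by exactly one index i, which is why the same scheme needs no concurrent writes on a PRAM.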
Q3. (a) Derive the diameter, number of links, and bisection width of a k-ary d-cube with p nodes. Let Lav be the average
distance between any two nodes in the network. Derive Lav for a k-ary d-cube.
(b) Let us consider a hypercube network of p nodes. Assume that the channel width of each communication link is 1. The
channel width of the links in a k-ary d-cube (for d < log p) can be increased by equating the cost of this network with that
of a hypercube. Two distinct measures are proposed to evaluate the cost of the network:
(i) The cost can be expressed in terms of the total number of wires in the network (the total number of wires is a product
of the number of communication links and the channel width)
(ii) The bisection bandwidth can be used as a measure of cost. Using each of these cost metrics and equating the cost of a
k-ary d-cube with a hypercube, what is the channel width of a k-ary d-cube with an identical number of nodes, channel
rate, and cost?
(4 + 4 = 8 marks)
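Closed-form answers to Q3(a) can be sanity-checked on small instances by brute force. The following is a minimal sketch, assuming a k-ary d-cube with wraparound links in every dimension, k ≥ 2, and an average distance taken over ordered pairs of distinct nodes (the question does not fix this convention). The function name is hypothetical.

```python
from collections import deque
from itertools import product


def kary_dcube_stats(k, d):
    """Return (nodes, links, diameter, average distance) of a k-ary d-cube."""
    nodes = list(product(range(k), repeat=d))

    def neighbors(node):
        result = set()  # a set merges the +1 and -1 neighbours when k == 2
        for dim in range(d):
            for step in (1, -1):
                coords = list(node)
                coords[dim] = (coords[dim] + step) % k
                result.add(tuple(coords))
        result.discard(node)
        return result

    # Every link is seen from both of its endpoints, hence the division by 2.
    links = sum(len(neighbors(v)) for v in nodes) // 2

    diameter = 0
    total_distance = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this source
            current = queue.popleft()
            for nxt in neighbors(current):
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        diameter = max(diameter, max(dist.values()))
        total_distance += sum(dist.values())

    p = len(nodes)
    average = total_distance / (p * (p - 1))
    return p, links, diameter, average


if __name__ == "__main__":
    print(kary_dcube_stats(4, 2))  # 4-ary 2-cube: a 4 x 4 torus
    print(kary_dcube_stats(2, 3))  # 2-ary 3-cube: an 8-node hypercube
```

Comparing these numbers with a derived formula for several (k, d) pairs is a quick way to catch an off-by-one in the diameter or link count.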
Q4. (a) You are given a scenario where there are 7 tasks with running times of 1, 2, 3, 4, 5, 5, and 10 units, respectively.
Assuming that it does not take any time to assign work to a process, compute the best-case and worst-case speedup for a
centralized scheme for dynamic mapping with two processes.
(b) Consider the decomposition of LU-factorization into 14 tasks as depicted below: (i) Illustrate an efficient mapping of
the task-dependency graph of the decomposition onto three processes. (ii) Describe and draw an efficient mapping of the
task-dependency graph of the decomposition onto four processes.
(2 + 8 = 10 marks)
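The scenario in Q4(a) is small enough to enumerate. The following is a minimal sketch, assuming that the central scheduler may hand out the tasks in any order, that an idle process receives the next task immediately, and that the tasks are independent. The function names are hypothetical.

```python
import heapq
from itertools import permutations


def makespan(order, num_procs=2):
    """Finish time when tasks are handed out in `order` to whichever process is idle first."""
    free_at = [0] * num_procs  # min-heap of times at which each process becomes idle
    heapq.heapify(free_at)
    for duration in order:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def speedup_bounds(tasks, num_procs=2):
    """Return (best-case speedup, worst-case speedup) over all hand-out orders."""
    serial_time = sum(tasks)
    spans = {makespan(order, num_procs) for order in set(permutations(tasks))}
    return serial_time / min(spans), serial_time / max(spans)


if __name__ == "__main__":
    print(speedup_bounds([1, 2, 3, 4, 5, 5, 10]))
    # → (2.0, 1.5)
```

Under these assumptions the worst case arises when the 10-unit task is handed out last, after the other six tasks have kept both processes busy for 10 units each.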
Q5. Consider the search trees (a) and (b) given below where the dark node represents the solution.

(a) If a sequential search of the tree is performed using the standard depth-first search (DFS) approach, how much time
does it take to find the solution if traversing each arc of the tree takes one unit of time?
(b) Assume that now the tree is partitioned between two processing elements that are assigned to do the search job. If both
processing elements perform a DFS on their respective halves of the tree, how much time does it take for the solution to
be found? What is the speedup? Is there a speedup anomaly? If so, explain the anomaly.
(4 + 8 = 12 marks)
Q6. (a) Consider the cache coherence problem in parallel systems. An example of an invalidation protocol working on a
snooping bus for a single cache block (X) with write-back caches is depicted below. Fill in the blanks.

(b) The following code segment illustrates the cache coherence problem in a parallel system comprising two
processors with write-back caches. Two $100 withdrawals are made from account #241 at two ATMs; each transaction
maps to a thread on a different processor. Track accts[241].bal (its address is in r3). Identify the instructions that
indicate the existence of cache incoherence.
Processor 0              Processor 1
0: addi r1,accts,r3      0: addi r1,accts,r3
1: ld 0(r3),r4           1: ld 0(r3),r4
2: blt r4,r2,6           2: blt r4,r2,6
3: sub r4,r2,r4          3: sub r4,r2,r4
4: st r4,0(r3)           4: st r4,0(r3)
5: call spew_cash        5: call spew_cash
(8 + 6 = 14 marks)
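The lost update in Q6(b) can be reproduced with a toy model. The following is a minimal sketch, assuming a starting balance of 500 (not given in the question), that Processor 0 runs to completion before Processor 1 starts, and that the two write-back caches do not snoop each other at all. The class and function names are hypothetical.

```python
class WriteBackCache:
    """Private write-back cache with no coherence protocol (deliberately broken)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses modified but not yet written back

    def load(self, addr):
        if addr not in self.lines:  # miss: fetch from memory, possibly a stale value
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value    # write-back policy: memory is not updated here
        self.dirty.add(addr)

    def flush(self):
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


def withdraw(cache, addr, amount):
    """Mirrors instructions 1-5: ld, blt, sub, st, call spew_cash."""
    balance = cache.load(addr)
    if balance < amount:
        return False
    cache.store(addr, balance - amount)
    return True  # cash dispensed


def run_two_withdrawals(start_balance=500, amount=100):
    memory = {241: start_balance}  # assumed starting balance
    cache0, cache1 = WriteBackCache(memory), WriteBackCache(memory)
    paid0 = withdraw(cache0, 241, amount)  # new balance stays in cache0 only
    paid1 = withdraw(cache1, 241, amount)  # reads the old balance from memory
    cache0.flush()
    cache1.flush()
    return paid0, paid1, memory[241]


if __name__ == "__main__":
    print(run_two_withdrawals())
    # → (True, True, 400)
```

Both withdrawals succeed, yet memory ends at 400 rather than 300, because the second load never observes the store that is still sitting in the other processor's cache.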
