homework 9 networking

Uploaded by nani chkhenkeli

Exercise 9

1. Computing forces

Celestial mechanics:

f(i, j) = G · m_i · m_j / |r_i − r_j|²

Cost: the Euclidean distance between the two bodies dominates the per-pair cost. Evaluating f(i, j) takes a fixed number of subtractions and multiplications plus one square root, so it is O(1) per pair; the square root is the most expensive single operation, but it is still constant time. The constant factor G and the masses of the bodies add only minimal constant cost, so the overall cost per pair evaluation is O(1).

Time: the brute-force method has time complexity O(n²), because each body needs to interact with every other body: n·(n − 1) force evaluations. Because the interactions are symmetric (f(j, i) = −f(i, j)), this can be halved to n·(n − 1)/2 evaluations, but the complexity remains O(n²), since for large n the constant factor ½ becomes insignificant.
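The brute-force pair loop with the symmetry trick can be sketched as follows; the 2D coordinates and G = 1.0 are illustrative placeholders, not values from the exercise.

```python
import math

def pairwise_forces(pos, mass, G=1.0):
    """Brute-force O(n^2) force computation; Newton's third law
    halves the number of pair evaluations to n*(n-1)/2."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):                 # n*(n-1)/2 pairs
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r = math.sqrt(dx * dx + dy * dy)      # one square root per pair
            f = G * mass[i] * mass[j] / (r * r)
            fx, fy = f * dx / r, f * dy / r
            forces[i][0] += fx; forces[i][1] += fy
            forces[j][0] -= fx; forces[j][1] -= fy  # f(j,i) = -f(i,j)
    return forces
```

Note that the inner loop starts at i + 1: each pair is visited once and the symmetric force is written to both bodies.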

Molecular dynamics:

Cost: similar to celestial mechanics, but higher because of the additional cost of evaluating the energy function.

Time: the brute-force method again has time complexity O(n²); systems with localized interactions (e.g. a cutoff radius) can reduce this toward O(n).
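The "localized interactions" idea is usually a cutoff radius: pairs farther apart than r_cut contribute nothing and are skipped. A minimal sketch of the cutoff criterion (positions and r_cut are made-up values):

```python
def neighbors_within_cutoff(pos, r_cut):
    """Return index pairs closer than r_cut (compared on squared
    distances, so no square root is needed for the test)."""
    n = len(pos)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            if dx * dx + dy * dy <= r_cut * r_cut:
                pairs.append((i, j))
    return pairs
```

This loop still scans all pairs; the O(n) behaviour comes from combining the cutoff with a cell list that bins particles into boxes of side r_cut, so only neighbouring boxes need to be searched.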

2. Heat dissipation

1) Serial time: t1(n) = O(n)

Parallel time on p processors: tp(n) = 2n/p + O(√(n/p))

Number of processors: p

Speedup: S = t1(n)/tp(n) = p/2 + O(1)

Efficiency: E = S/p = 1/2 + O(1/p)

2) t_stripe = (n²/p) × t_serial, where t_serial is the time for simulating a single grid point. Communication per stripe (boundary exchange): O(n).

Total parallel runtime: t_parallelStripe = t_stripe + O(n)

Speedup: S = t_sequential/t_parallelStripe = n²·t_serial / ((n²/p)·t_serial + O(n)) = p·n/(n + O(p)) ≈ p for n ≫ p

Efficiency (squares): E = S/p ≈ 1; the efficiency for stripes is the same.
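The stripe-decomposition speedup can be checked numerically; t_serial and the O(n) communication constant below are hypothetical unit costs, not measured values.

```python
def stripe_speedup(n, p, t_serial=1.0, c_comm=1.0):
    """Predicted speedup for a stripe decomposition of an n x n grid
    over p processors: compute time (n^2/p)*t_serial plus O(n)
    boundary communication, modelled as c_comm * n."""
    t_seq = n * n * t_serial
    t_par = (n * n / p) * t_serial + c_comm * n
    return t_seq / t_par
```

For n much larger than p the communication term is negligible and the predicted speedup approaches p.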
3. Matrix multiplication

Runtime: O(n³).

Let a be the constant communication overhead factor and n the problem size.

Parallelizable part: O(n³) − a


Ideal speedup (Amdahl's law), with parallelizable fraction f = (parallelizable part)/(total time):

S = 1 / ((1 − f) + f/p)

Here f = (O(n³) − a)/O(n³) = 1 − a/O(n³), so the serial fraction is a/O(n³), and

S = 1 / (a/O(n³) + (1/p)·(1 − a/O(n³))) = O(n³)·p / (O(n³) + a·(p − 1))

For large n the overhead a becomes negligible and S approaches p.

Each processor stores a portion of the input matrices A and B. If each processor stores n/p rows of A and n/p columns of B, the total number of elements per processor is 2n²/p + n.
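The Amdahl's-law speedup derived above can be written directly as a function; here n**3 stands in for the O(n³) total work and "a" is the assumed constant communication overhead.

```python
def amdahl_speedup(n, p, a):
    """Amdahl's-law speedup with total work n^3 and a constant
    serial overhead a; f is the parallelizable fraction."""
    total = n ** 3
    f = (total - a) / total
    return 1.0 / ((1.0 - f) + f / p)
```

With a = 0 the whole computation is parallelizable and the speedup is exactly p; any positive overhead pulls it below p.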

4. Matrix multiplication

Each processor (i_p, j_p) holds its initial inputs: submatrices of A and B, and an empty block of C.

Exchange blocks –

Each processor participates in √p communication rounds, 0 ≤ k ≤ √p − 1.

In round k, the processor sends its block B(i_p, (j_p + k) % √p) to processor (i_p, (j_p − k + √p) % √p).

It receives a block B(i', (j_p + k) % √p) from processor ((i_p + k) % √p, j_p).

Local computation –

Each processor performs block multiplications between its submatrix of A and all received blocks of B; the results go into blocks of C.

Communication for gathering –

Each processor again participates in √p communication rounds, 0 ≤ k ≤ √p − 1.

In round k, the processor sends its block C(i_p, (j_p + k) % √p) to processor ((i_p − k + √p) % √p, j_p).

It receives a block C((i_p + k) % √p, (j_p + k) % √p) from processor (i_p, (j_p − k + √p) % √p).

Local accumulation:

After √p communication rounds, each processor has received all blocks for C(i, j) and accumulates them into the final submatrix C(i, j).

Pseudo code would look like the following –

FUNCTION parallelMult(A, B, n, p):
    # Check that p is a perfect square and that sqrt(p) divides n

    # Decompose matrices A and B into submatrices
    submatricesA = decomposeMatrix(A, n, p)
    submatricesB = decomposeMatrix(B, n, p)

    # Initialize result matrix
    submatricesC = initializeMatrix(n, p)

    # Communication and computation rounds
    FOR k = 0 TO sqrt(p) - 1:
        FOR i = 0 TO sqrt(p) - 1:
            # Index of the submatrix of B received in this round
            j_recv = (i + k) % sqrt(p)
            submatrixB = submatricesB[j_recv]

            # Index of the submatrix of C to accumulate into
            j_out = i
            computeProduct(submatricesA[k], submatrixB, submatricesC[i * sqrt(p) + j_out])

    # Gather results
    RETURN gatherResults(submatricesC)
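A runnable serial sketch of the block scheme, simulating the √p rounds in a single process; the helper names from the pseudocode (decomposeMatrix, computeProduct, …) are folded into one function, and it assumes p is a perfect square with sqrt(p) dividing n.

```python
import math

def block_multiply(A, B, n, p):
    """Simulate block matrix multiplication on a sqrt(p) x sqrt(p)
    processor grid: in round k, "processor" (ip, jp) multiplies with
    the block column kp = (jp + k) % q, so every block contributes once."""
    q = int(math.isqrt(p))           # grid side, sqrt(p)
    bs = n // q                      # block size
    C = [[0] * n for _ in range(n)]
    for k in range(q):               # the sqrt(p) communication rounds
        for ip in range(q):
            for jp in range(q):
                kp = (jp + k) % q    # block used by (ip, jp) this round
                for i in range(ip * bs, (ip + 1) * bs):
                    for j in range(jp * bs, (jp + 1) * bs):
                        for t in range(kp * bs, (kp + 1) * bs):
                            C[i][j] += A[i][t] * B[t][j]
    return C
```

Since kp runs through all q values over the q rounds, each block C(ip, jp) accumulates the full sum over the block row of A and block column of B, matching an ordinary matrix product.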
Exercise 5: routing for a grid

1.

(0,0) -- (1,0) -- (2,0) -- (3,0)

| | | |

(0,1) -- (1,1) -- (2,1) -- (3,1)

| | | |

(0,2) -- (1,2) -- (2,2) -- (3,2)

| | | |

(0,3) -- (1,3) -- (2,3) -- (3,3)

2.

(0,0) -> 0 (0,1) -> 1 (0,2) -> 2 (0,3) -> 3

(1,0) -> 4 (1,1) -> 5 (1,2) -> 6 (1,3) -> 7

(2,0) -> 8 (2,1) -> 9 (2,2) -> 10 (2,3) -> 11

(3,0) -> 12 (3,1) -> 13 (3,2) -> 14 (3,3) -> 15
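The numbering above is the usual row-major mapping, which can be written as a one-liner (width=4 for this grid):

```python
def node_id(i, j, width=4):
    """Row-major node numbering of a grid: (i, j) -> i*width + j."""
    return i * width + j
```

For example, (2, 3) maps to 11 and (3, 0) maps to 12, matching the table.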

3. w=(0,1,4,5,8,9,12,13)

4. w=(0,1,4,5,8,9,12,13,3,7,11,15,2,6,10,14)

5.

The westward edges (0, 1, 4, 5, 8, 9, 12, 13) form a cycle, and these nodes do not conflict; the eastward edges (3, 7, 11, 15, 2, 6, 10, 14) connect nodes in a way that also creates no conflicts.

6.
Route left: (0, 1, 4, 5, 8, 9, 12, 13)

Route right: (3, 7, 11, 15, 2, 6, 10, 14)

Exercise 6:

Case 1: cy(0) + th < e(0) − ts

The sender transmits bit x for a time greater than the clock skew ts.

During the interval [e(0) − ts, e(0)], x is guaranteed to be stable on the bus before the receiver's first clock edge in cycle i: cy(i + k) − R_cy(i + k) < e(0) − ts for all k ∈ [0, 6].

Case 2: cy(0) + th ≥ e(0) − ts

The receiver might miss x in cycle i due to clock alignment: cy(i) − R_cy(i) ≥ 0 (this applies only to cycle i).

Stable sender transmission and the subsequent cycles guarantee correct sampling: cy(i + k) − R_cy(i + k) < e(0) − ts for k ∈ [1, 6].

In both cases, the receiver samples the correct x for at least 7 consecutive cycles, starting from cycle β = 0 (Case 1), or from β = 0 or β = 1 (Case 2).
