Exercise 9
1. Computing forces
Celestial mechanics:
$f(i, j) = \dfrac{G \, m_i \, m_j}{\lvert r_i - r_j \rvert^{2}}$
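Cost: computing the Euclidean distance between the two bodies dominates the cost of one force evaluation. In d dimensions this takes d subtractions, d multiplications, d - 1 additions and at most one square root, i.e. O(d) operations. G and the masses of the bodies are constants, so multiplying by them adds only a constant number of operations. For a fixed dimension (d = 3), the overall cost of evaluating f(i, j) is therefore O(1).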
Time: the brute-force method has time complexity O(n^2), because each body interacts with every other body, giving n(n - 1) force evaluations. Because the interactions are symmetric (f(j, i) = -f(i, j)), only n(n - 1)/2 evaluations are needed. This halves the work but does not change the asymptotic complexity: it is still O(n^2), and the constant factor ½ is irrelevant for large n.
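A minimal sketch of the brute-force method with the symmetry trick, assuming 3D positions given as tuples and SI units for G (the function name `pairwise_forces` is hypothetical). Each unordered pair is evaluated once and the result is applied with opposite signs to both bodies, so the loop performs n(n - 1)/2 evaluations while the total work remains O(n^2).

```python
import math

G = 6.674e-11  # gravitational constant (SI units)

def pairwise_forces(pos, mass):
    """Brute-force O(n^2) gravitational forces in 3D.

    Uses f(j, i) = -f(i, j), so each unordered pair is evaluated once.
    Returns (list of net force vectors, number of pair evaluations).
    """
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    evaluations = 0
    for i in range(n):
        for j in range(i + 1, n):                           # each pair exactly once
            d = [pos[j][a] - pos[i][a] for a in range(3)]   # 3 subtractions
            r2 = sum(c * c for c in d)                      # squared distance
            r = math.sqrt(r2)                               # one square root
            mag = G * mass[i] * mass[j] / r2                # magnitude of f(i, j)
            for a in range(3):
                fa = mag * d[a] / r                         # component along unit vector i -> j
                forces[i][a] += fa                          # body i is pulled toward j
                forces[j][a] -= fa                          # equal and opposite force on j
            evaluations += 1
    return forces, evaluations
```

With 5 bodies the loop performs 10 = 5 * 4 / 2 pair evaluations instead of 20, and the net forces sum to (numerically) zero.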
Molecular dynamics:
Cost: similar to celestial mechanics, but higher per pair because the energy (potential) function is more expensive to evaluate.
Time: the brute-force method is again O(n^2). Systems with localized interactions, where each body only interacts with neighbours within a cutoff distance, can reduce this to O(n).
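One common way to exploit localized interactions is a cell list; the sketch below swaps in that technique as an illustration and is not necessarily the method intended in the exercise. It assumes 2D points and a cutoff radius (both hypothetical parameters). Points are binned into square cells of side `cutoff`, so any two points within the cutoff lie in the same or adjacent cells. For bounded particle density each cell holds O(1) points, giving roughly O(n) work instead of the O(n^2) all-pairs check.

```python
import itertools
import math
import random
from collections import defaultdict

def brute_force_pairs(pos, cutoff):
    """O(n^2): test every pair of points against the cutoff."""
    c2 = cutoff * cutoff
    return {(i, j) for i, j in itertools.combinations(range(len(pos)), 2)
            if (pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2 <= c2}

def cell_list_pairs(pos, cutoff):
    """Cell list: compare each point only with points in its own and the 8 neighbouring cells."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(pos):
        cells[(math.floor(x / cutoff), math.floor(y / cutoff))].append(idx)
    c2 = cutoff * cutoff
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get does not insert keys, so iterating cells stays safe
                for j in cells.get((cx + dx, cy + dy), []):
                    for i in members:
                        if i < j and (pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2 <= c2:
                            pairs.add((i, j))
    return pairs

# Usage: random points in a 10 x 10 box; both methods must find the same pairs.
rng = random.Random(0)
points = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)) for _ in range(200)]
print(len(cell_list_pairs(points, 1.0)), len(brute_force_pairs(points, 1.0)))
```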
2. Heat dissipation
Number of processors – p
Speedup: $S = \dfrac{t_1(n)}{t_p(n)} = \dfrac{p}{2} + O(1)$
Efficiency: $E = \dfrac{S}{p} = \dfrac{1}{2} + O\!\left(\dfrac{1}{p}\right)$
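The two definitions above can be checked numerically. In this sketch the timings are hypothetical, and the O(1) term in the speedup is assumed to equal 1, only to show that E = 1/2 + 1/p tends to 1/2 as p grows.

```python
def speedup(t1, tp):
    """S = t_1(n) / t_p(n)."""
    return t1 / tp

def efficiency(t1, tp, p):
    """E = S / p."""
    return speedup(t1, tp) / p

# Hypothetical timings: t_1 = 80, t_p = 20 on p = 8 processors -> S = 4 = p/2, E = 0.5
print(speedup(80.0, 20.0), efficiency(80.0, 20.0, 8))

# Assumed speedup S = p/2 + 1: efficiency approaches 1/2 as p grows
for p in (4, 16, 64, 256):
    s = p / 2 + 1
    print(p, s / p)
```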
2) $t_{\text{stripe}} = n \cdot t_{\text{serial}}$, where $t_{\text{serial}}$ is the time for simulating a single grid point. Communication: O(n).
Efficiency (squares): $E = \dfrac{S}{p} = 1$; the efficiency for stripes is the same.
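A small sketch comparing per-processor work and communication for the two decompositions. It assumes an n x n grid, a nearest-neighbour stencil, an interior processor, p dividing n for stripes, and p a perfect square with √p dividing n for squares (all assumptions, not given above). Both decompositions assign n^2/p points per processor; stripes exchange two full rows (O(n)), while squares exchange four block edges (O(n/√p)).

```python
import math

def stripe_costs(n, p):
    """Horizontal stripes of n/p rows. Returns (points per processor, boundary points exchanged)."""
    points = n * n // p
    comm = 2 * n              # one row above and one row below, n points each
    return points, comm

def square_costs(n, p):
    """Square blocks of side n/sqrt(p). Returns (points per processor, boundary points exchanged)."""
    q = math.isqrt(p)
    side = n // q
    points = side * side
    comm = 4 * side           # four edges of the block
    return points, comm

print(stripe_costs(1024, 16))   # same work, 2n boundary points
print(square_costs(1024, 16))   # same work, 4n/sqrt(p) boundary points
```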
3. Matrix multiplication
Runtime: O(n^3), where n is the problem size.
$\dfrac{O(n^3) - a}{O(n^3)} = 1 - \dfrac{a}{O(n^3)}$
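The O(n^3) runtime comes from the three nested loops of the straightforward algorithm. This sketch (function name hypothetical) counts the multiply-add operations for n x n matrices stored as lists of lists.

```python
def matmul_count(A, B):
    """Naive triple-loop product of two n x n matrices.
    Returns (C, number of multiply-add operations), which is n^3."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                ops += 1
    return C, ops

C, ops = matmul_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, ops)   # 2 x 2 input -> 8 = 2^3 operations
```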
Each processor stores a portion of the input matrices A and B. If each processor stores n/p rows of A and n/p columns of B, the total number of stored elements per processor is $\dfrac{2n^2}{p} + n$.
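A sketch that counts the entries a processor holds under this row/column distribution, assuming p divides n and that processor `rank` owns consecutive rows and columns (a hypothetical layout). It reproduces the 2n^2/p part of the count; the additional +n term is not modelled here.

```python
def local_storage(n, p, rank):
    """Entries of A and B held by processor `rank` (0 <= rank < p) when it owns
    n/p consecutive rows of A and n/p consecutive columns of B. Assumes p divides n."""
    b = n // p
    A = [[0] * n for _ in range(n)]   # placeholder n x n matrices
    B = [[0] * n for _ in range(n)]
    my_rows = A[rank * b:(rank + 1) * b]                    # n/p rows, n entries each
    my_cols = [row[rank * b:(rank + 1) * b] for row in B]   # n rows, n/p entries each
    return sum(len(r) for r in my_rows) + sum(len(r) for r in my_cols)

print(local_storage(8, 4, 0))   # 2 * 8^2 / 4
```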
4. Matrix multiplication
Each processor $(i_p, j_p)$ starts with its input submatrices of A and B and an empty (zero) submatrix of C.
Exchange blocks:
In round k, processor $(i_p, j_p)$ receives block $B(i_p, (j_p + k) \bmod \sqrt{p})$ from processor $(i_p, (j_p - k + \sqrt{p}) \bmod \sqrt{p})$.
Local computation:
The processor multiplies its submatrix of A with each received block of B; the results are partial blocks of C.
Gathering communication:
Each processor takes part in √p communication rounds, $0 \le k \le \sqrt{p} - 1$.
In round k, the processor sends its block $C(i_p, (j_p + k) \bmod \sqrt{p})$ to processor $((i_p - k + \sqrt{p}) \bmod \sqrt{p}, j_p)$.
It receives a block $C((i_p + k) \bmod \sqrt{p}, (j_p + k) \bmod \sqrt{p})$ from processor $(i_p, (j_p - k + \sqrt{p}) \bmod \sqrt{p})$.
Local accumulation:
After √p communication rounds, each processor has received all blocks for C(i, j). The processor accumulates the received blocks into the final submatrix C(i, j).
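submatricesA = decomposeMatrix(A, n, p)   # A split into sqrt(p) x sqrt(p) blocks A[i][l]
submatricesB = decomposeMatrix(B, n, p)   # B split the same way, B[l][j]
submatricesC = initializeMatrix(n, p)     # all blocks C[i][j] start as zero
FOR k = 0 TO sqrt(p) - 1:                 # communication round
    FOR i = 0 TO sqrt(p) - 1:             # processor row i_p
        FOR j = 0 TO sqrt(p) - 1:         # processor column j_p
            j_recv = (i + k) % sqrt(p)    # block index used in round k; covers every index once over all rounds
            # block product, accumulated into C(i, j)
            submatricesC[i][j] = submatricesC[i][j] + submatricesA[i][j_recv] * submatricesB[j_recv][j]
# Gather results
RETURN gatherResults(submatricesC)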
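A runnable, sequential simulation of the block algorithm above using NumPy, assuming p is a perfect square and √p divides n. All processors are simulated in one process, and `np.block` stands in for the final gather; message passing is not modelled. Because `j_recv = (i + k) % q` takes every value exactly once over the q rounds, each C(i, j) accumulates the full sum of A(i, l) B(l, j).

```python
import math
import numpy as np

def block_matmul(A, B, p):
    """Simulate the block algorithm on a sqrt(p) x sqrt(p) processor grid."""
    n = A.shape[0]
    q = math.isqrt(p)
    assert q * q == p and n % q == 0, "p must be a perfect square and sqrt(p) must divide n"
    b = n // q

    def blk(M, r, c):
        """Block (r, c) of size b x b."""
        return M[r * b:(r + 1) * b, c * b:(c + 1) * b]

    C_blocks = [[np.zeros((b, b)) for _ in range(q)] for _ in range(q)]
    for k in range(q):                   # communication round
        for i in range(q):               # processor row i_p
            for j in range(q):           # processor column j_p
                j_recv = (i + k) % q     # block index used in round k
                C_blocks[i][j] += blk(A, i, j_recv) @ blk(B, j_recv, j)
    return np.block(C_blocks)            # gather: reassemble the full matrix

rng = np.random.default_rng(0)
A = rng.random((8, 8))
B = rng.random((8, 8))
print(np.allclose(block_matmul(A, B, 4), A @ B))
```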
Exercise 5: routing for a grid
1.
2.
3. w=(0,1,4,5,8,9,12,13)
4. w=(0,1,4,5,8,9,12,13,3,7,11,15,2,6,10,14)
5.
The westward edges (0, 1, 4, 5, 8, 9, 12, 13) form a cycle. These nodes do not conflict, and the eastward edges (3, 7, 11, 15, 2, 6, 10, 14) connect nodes in a way that does not create conflicts.
6.
Route left: (0, 1, 4, 5, 8, 9, 12, 13)
Route right: (3, 7, 11, 15, 2, 6, 10, 14)
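Under the assumption that the grid is 4 x 4 with nodes numbered row by row from 0 to 15 (not stated explicitly above), the orderings correspond to grid columns: the left route takes columns 0 and 1 row by row, and the right route takes column 3 and then column 2 from top to bottom. The sketch below rebuilds the lists from that assumption; `node` and `direction` are hypothetical helpers.

```python
N = 4  # assumed grid side: 4 x 4 grid, nodes numbered row-major 0..15

def node(r, c):
    """Node number of row r, column c in row-major order."""
    return r * N + c

left = [node(r, c) for r in range(N) for c in (0, 1)]                    # columns 0-1, row by row
right = [node(r, 3) for r in range(N)] + [node(r, 2) for r in range(N)]  # column 3, then column 2
w = left + right
direction = {**{v: "west" for v in left}, **{v: "east" for v in right}}
print(left, right)
```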
Exercise 6:
The sender transmits bit x and keeps it on the bus for a time greater than the clock skew $t_s$.
Case 1: during the interval $[e(0) - t_s, e(0)]$, x is guaranteed to be stable on the bus before the receiver's first clock edge in cycle i, so $cy(i + k) - R_{cy}(i + k) < e(0) - t_s$ for all $k \in [0, 6]$.
Case 2: the receiver might miss x in cycle i because of the clock alignment, i.e. $cy(i) - R_{cy}(i) \ge 0$ (this applies only to cycle i).
Because the sender keeps x stable, the receiver samples correctly in the subsequent cycles: $cy(i + k) - R_{cy}(i + k) < e(0) - t_s$ for $k \in [1, 6]$.
In both cases, the receiver samples the correct x for at least 7 consecutive cycles, starting from cycle β = 0 (Case 1) or β = 0 or β = 1 (Case 2).