Array Processors: SIMD Computer Organization
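[Figure: SIMD array processor organization. Labelled blocks: CU, CP, control lines and data bus. In one configuration each processing element has its own memory (PE0/PEM0, PE1/PEM1, ..., PEN-1/PEMN-1); in the other, PE0, PE1, ..., PEN-1 are shown with memory modules M0, M1, ..., Mp-1.]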
N: Number of PEs in the system. Examples: Illiac IV has 64 PEs; the MPP (Massively Parallel Processor) has 16,384 PEs.
F: Set of data routing functions provided by the interconnection network. Examples: mesh, star, Omega and butterfly.
I: Set of machine instructions, including data routing and network manipulation operations.
M: Set of masking schemes, where each mask partitions the PEs into two disjoint subsets: enabled PEs and disabled PEs.
Components in a PE:
Working registers: Ai, Bi, Ci
Index register: Ii
Destination register: Di
Data routing register: Ri
Status register: Si (Si = 1: active; Si = 0: inactive)
ALU
Now, for computing the sums $S(k) = \sum_{i=0}^{k} A_i$ for $k = 0, 1, \ldots, n-1$: for n = 8 with N = 8 PEs, the addition is performed in $\log_2 N$ steps, i.e., 3 steps.
SIMD: contents of Ai in each PE, starting from the values 1 to 8

PE    Initial  Step 1     Step 2      Step 3      Result
PE0   1        1          1           1           S(0) = 1
PE1   2        1+2=3      3           3           S(1) = 3
PE2   3        2+3=5      1+5=6       6           S(2) = 6
PE3   4        3+4=7      3+7=10      10          S(3) = 10
PE4   5        4+5=9      5+9=14      1+14=15     S(4) = 15
PE5   6        5+6=11     7+11=18     3+18=21     S(5) = 21
PE6   7        6+7=13     9+13=22     6+22=28     S(6) = 28
PE7   8        7+8=15     11+15=26    10+26=36    S(7) = 36
Algorithm:
Step # 1: Ai → Ri, Ri → Ri+1 (i = 0-6); Ai + Ri → Ai (i = 1-7)
Step # 2: Ai → Ri, Ri → Ri+2 (i = 0-5); Ai + Ri → Ai (i = 2-7)
Step # 3: Ai → Ri, Ri → Ri+4 (i = 0-3); Ai + Ri → Ai (i = 4-7)
Masking Scheme:
Step # 1: PE7 is disabled.
Step # 2: PE6 and PE7 are disabled.
Step # 3: PE4 to PE7 are disabled.
During Addition:
Step # 1: PE0 is not involved.
Step # 2: PE0 and PE1 are not involved.
Step # 3: PE0 to PE3 are not involved.
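The routing, masking and addition steps above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the function name `simd_prefix_sum` and the lists `A` and `R` (standing in for the Ai and Ri registers) are hypothetical, and N is assumed to be a power of 2.

```python
def simd_prefix_sum(values):
    """Simulate the log2(N)-step prefix-sum scheme on N PEs (N a power of 2)."""
    n = len(values)
    A = list(values)          # A[i]: working register Ai of PE i
    step = 1                  # routing distance: 1, 2, 4, ...
    while step < n:
        R = [0] * n           # R[i]: data routing register Ri of PE i
        # Routing: PEs 0 .. n-step-1 are enabled and send Ai to R(i+step);
        # the last `step` PEs are masked off.
        for i in range(n - step):
            R[i + step] = A[i]
        # Addition: PEs step .. n-1 compute Ai + Ri; the first `step` PEs
        # are not involved.
        for i in range(step, n):
            A[i] = A[i] + R[i]
        step *= 2
    return A

print(simd_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))
# prints [1, 3, 6, 10, 15, 21, 28, 36]
```

The loop runs three times for N = 8 (distances 1, 2 and 4), matching the three steps listed above.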
Interconnection networks:
Static network
  1-D
  2-D
Dynamic network
  Bus based
  Switch based
    Single stage
    Multistage
    Crossbar
Linear array: N nodes connected by N-1 links. Internal nodes have degree 2; end nodes have degree 1. Diameter = N-1.
2D Ring Network
Like a linear array, but the two end nodes are connected by an Nth link.
Star network
A mesh [torus, or wraparound mesh] is an extension of the linear array [ring] to two dimensions. Node degree in a mesh: 2 to 4. Examples include the Intel Paragon (2D mesh).
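In the Illiac IV network, each processor i is connected to processors i+1, i-1, i+r and i-r (mod N), where r = √N. Ex: for N = 16, processor i is connected to {i+1, i-1, i+4, i-4} (mod 16). The routing functions are:
R+1(i) = (i + 1) mod 16
R-1(i) = (i - 1) mod 16
R+4(i) = (i + 4) mod 16
R-4(i) = (i - 4) mod 16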
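The four connections of each processor can be sketched in Python as follows. The helper names `illiac_routes` and `steps_to_reach_all` are hypothetical, and the sketch assumes N is a perfect square with r = √N (r = 4 for the N = 16 example).

```python
import math

def illiac_routes(i, n=16):
    """Return the four Illiac-style neighbours of PE i in an n-PE network."""
    r = math.isqrt(n)         # r = sqrt(N); 4 when N = 16
    return {
        "+1": (i + 1) % n,
        "-1": (i - 1) % n,
        "+r": (i + r) % n,
        "-r": (i - r) % n,
    }

def steps_to_reach_all(start, n=16):
    """Count routing steps until every PE has been reached from `start`."""
    reached = {start}
    steps = 0
    while len(reached) < n:
        # One routing step: every reached PE forwards to its four neighbours.
        reached |= {j for i in reached for j in illiac_routes(i, n).values()}
        steps += 1
    return steps

print(illiac_routes(0))        # {'+1': 1, '-1': 15, '+r': 4, '-r': 12}
print(steps_to_reach_all(0))   # 3
```

For N = 16, three routing steps are enough to reach every processor from processor 0.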
The Illiac IV network can be drawn as a chordal ring network. It is also called a partially connected network.
3D networks:
3-cube with nodes 000, 001, 010, 011, 100, 101, 110, 111.
A hypercube is a generalized cube. An n-dimensional hypercube (n-cube) has $2^n$ nodes, for some n. Each node is connected to all other nodes whose numbers differ from its own in exactly one bit position. The node degree of an n-cube equals n, and so does the network diameter.
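In cube-connected cycles (CCC), each node of a k-cube is replaced by a cycle (ring) of k nodes.
Hence a 3-cube can be transformed to a 3-CCC with $k \times 2^k = 3 \times 2^3 = 24$ nodes.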
4D hypercube
4D hypercube = two 3D hypercubes with additional links connecting corresponding processors.
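The one-bit-difference rule for hypercube links can be sketched in Python. The function names below are hypothetical, and correcting the differing bits from the lowest bit upward is only one possible routing order, chosen here for illustration.

```python
def hypercube_neighbours(node, n):
    """Neighbours of `node` in an n-cube: flip each of the n address bits."""
    return [node ^ (1 << bit) for bit in range(n)]

def hypercube_route(src, dst, n):
    """Route by correcting differing address bits, lowest bit first (assumed order)."""
    path = [src]
    current = src
    for bit in range(n):
        if (current ^ dst) & (1 << bit):
            current ^= 1 << bit       # move along the link for this dimension
            path.append(current)
    return path

print([format(x, "03b") for x in hypercube_neighbours(0b000, 3)])
# prints ['001', '010', '100']
print([format(x, "04b") for x in hypercube_route(0b0000, 0b1011, 4)])
# prints ['0000', '0001', '0011', '1011']
```

The number of hops equals the number of bit positions in which the two addresses differ, so it never exceeds n, the diameter of the n-cube.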
A x B switch module:
A inputs and B outputs.
In practice, A = B = a power of 2.
Each input can be connected to one or more outputs.
Binary Switch
2x2 switch: legitimate states = 4 (straight, exchange, upper broadcast, lower broadcast).
Perfect-shuffle interconnection:
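This interconnection network is defined by the routing function $S\big((a_{n-1} \cdots a_1 a_0)_2\big) = (a_{n-2} \cdots a_1 a_0 a_{n-1})_2$, i.e., a left rotation of the n-bit address by one position.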
[Figure: Perfect Shuffle and Inverse Shuffle connections for N = 8 (nodes 0 to 7).]
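The perfect shuffle and its inverse can be sketched as bit rotations of the n-bit node address. The function names `shuffle` and `inverse_shuffle` are hypothetical; the sketch assumes N = 2^n nodes numbered 0 to N-1.

```python
def shuffle(i, n):
    """Perfect shuffle: rotate the n-bit address of i left by one bit."""
    msb = (i >> (n - 1)) & 1
    return ((i << 1) & ((1 << n) - 1)) | msb

def inverse_shuffle(i, n):
    """Inverse shuffle: rotate the n-bit address of i right by one bit."""
    lsb = i & 1
    return (i >> 1) | (lsb << (n - 1))

print([shuffle(i, 3) for i in range(8)])          # [0, 2, 4, 6, 1, 3, 5, 7]
print([inverse_shuffle(i, 3) for i in range(8)])  # [0, 4, 1, 5, 2, 6, 3, 7]
```

Applying `inverse_shuffle` after `shuffle` returns every address to itself, which is a quick way to check the two functions against each other.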
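A shuffle network on its own is not a complete interconnection network. This can be seen by looking at what happens as data is recirculated through the network: for N = 8, nodes 0 and 7 map to themselves, and the remaining nodes stay within the cycles 1 → 2 → 4 → 1 and 3 → 6 → 5 → 3.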
With a shuffle-exchange network, arbitrary cyclic shifts of an N-element array can be performed in $\log_2 N$ steps.
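[Figure: three shuffle stages, each followed by an exchange stage (Shuffle 1, Exch. 1, Shuffle 2, Exch. 2, Shuffle 3, Exch. 3), for N = 8.]
Order of the elements after each shuffle:
Initial:          0 1 2 3 4 5 6 7
After Shuffle 1:  0 4 1 5 2 6 3 7
After Shuffle 2:  0 2 4 6 1 3 5 7
After Shuffle 3:  0 1 2 3 4 5 6 7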
Multistage network:
There are $\log_2 p$ stages, each with p/2 switching elements, giving $(p/2) \log_2 p$ switching elements in total.
Simple routing algorithm: at each stage, the switch setting is determined by the corresponding address bit, starting with the msb.
Path Contention
[Figure: sequence of frames illustrating path contention in a multistage network with 8 inputs and 8 outputs (numbered 0 to 7).]
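Path contention can be illustrated with a short Python sketch. It rests on assumptions the slides do not state: the network is taken to be an Omega network (a perfect shuffle in front of each stage of 2x2 switches), and routing is destination-tag routing, where a destination bit of 0 selects the upper switch output and 1 the lower one. The function names are hypothetical.

```python
def omega_path(src, dst, n):
    """Link positions used after each of the n stages (destination-tag routing)."""
    mask = (1 << n) - 1
    pos = src
    path = []
    for stage in range(n):
        # Perfect shuffle in front of the stage: rotate the n-bit position left.
        pos = ((pos << 1) | (pos >> (n - 1))) & mask
        # The 2x2 switch sets the low bit from the destination address, msb first.
        dst_bit = (dst >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        path.append(pos)
    return path

def contention(msg_a, msg_b, n):
    """Return the first stage whose output link both messages need, else None."""
    path_a = omega_path(*msg_a, n)
    path_b = omega_path(*msg_b, n)
    for stage, (a, b) in enumerate(zip(path_a, path_b)):
        if a == b:
            return stage
    return None

print(omega_path(2, 7, 3))              # [5, 3, 7]
print(omega_path(6, 4, 3))              # [5, 2, 4]
print(contention((2, 7), (6, 4), 3))    # 0
```

In this example the messages 2 → 7 and 6 → 4 both need link 5 after the first stage, so one of them must wait. Messages such as 0 → 0 and 7 → 7 share no link and can be routed simultaneously.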
Extra Problems
c)
d) [...] from node 1101 to node 0101 and from node 0111 to node 1001 simultaneously.
e) Find the number of steps required to add 16 elements. Calculate the different routing functions. Show the routing and addition in each step.