0020.matrix Multiplication Systolic

1) Systolic arrays replace single processors with an array of simple processing elements to enable high throughput computations with less memory access.
2) Each processing element may perform a different operation and communicate data to neighboring elements in different directions through the array in a nonlinear, multidirectional flow.
3) A 3x3 systolic array is presented as an example for matrix multiplication where each processing element computes and accumulates one element of the product matrix.

Uploaded by

Tejas.S

Slides from Shaaban

Systolic Architectures


• Replace single processor with an array of regular processing elements
• Orchestrate data flow for high throughput with less memory access

[Figure: a memory (M) feeding a single PE, next to a memory (M) feeding a chain of PEs]

• Different from pipelining
  – Nonlinear array structure, multidirectional data flow, each PE may have (small) local instruction and data memory
• Different from SIMD: each PE may do something different
• Initial motivation: VLSI enables inexpensive special-purpose chips
• Represent algorithms directly by chips connected in a regular pattern

EECC756 - Shaaban
#1 lec # 1 Spring 2003 3-11-2003
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
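• Processors arranged in a 2-D grid
• Each processor accumulates one element of the product

Alignments in time

T=0
No operands have entered the array yet. Rows of A wait to the left of the array and columns of B wait above it. Row i of A is delayed by i steps and column j of B is delayed by j steps, so that matching operands meet in the same PE.

Rows of A (the element nearest the array is listed last):
  row 0: a0,2 a0,1 a0,0
  row 1: a1,2 a1,1 a1,0
  row 2: a2,2 a2,1 a2,0

Columns of B (the element nearest the array is listed last):
  column 0: b2,0 b1,0 b0,0
  column 1: b2,1 b1,1 b0,1
  column 2: b2,2 b1,2 b0,2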
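
The staggered alignment at T=0 can be sketched in a few lines of Python. This is only an illustration under assumptions read off the slides, not code from the lecture: the name feed_schedule is hypothetical, and it assumes that a(i,k) reaches the left edge of row i at step T = 1 + i + k and that b(k,j) reaches the top edge of column j at step T = 1 + j + k.

```python
def feed_schedule(n):
    """Return {T: (a_in, b_in)} for an n x n array (hypothetical helper).

    a_in lists the (i, k) index pairs of A entering the left edge at step T;
    b_in lists the (k, j) index pairs of B entering the top edge at step T.
    Assumption: row i and column j are each delayed by their own index.
    """
    schedule = {}
    # The last operands, a(n-1, n-1) and b(n-1, n-1), enter at T = 2n - 1.
    for t in range(1, 2 * n):
        a_in = [(i, t - 1 - i) for i in range(n) if 0 <= t - 1 - i < n]
        b_in = [(t - 1 - j, j) for j in range(n) if 0 <= t - 1 - j < n]
        schedule[t] = (a_in, b_in)
    return schedule


if __name__ == "__main__":
    for t, (a_in, b_in) in feed_schedule(3).items():
        print(f"T={t}: A enters {a_in}, B enters {b_in}")
```

Under these assumptions, for n = 3 only a0,0 and b0,0 enter at T=1, and the last operands (a2,2 and b2,2) enter at T=5.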
Example source: http://www.cs.hmc.edu/courses/2001/spring/cs156/
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
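• Processors arranged in a 2-D grid
• Each processor accumulates one element of the product

Alignments in time

T=1
Not yet in the array:
  rows of A:    row 0: a0,2 a0,1 | row 1: a1,2 a1,1 a1,0 | row 2: a2,2 a2,1 a2,0
  columns of B: column 0: b2,0 b1,0 | column 1: b2,1 b1,1 b0,1 | column 2: b2,2 b1,2 b0,2
Partial sums, by PE(row, column):
  PE(0,0): a0,0*b0,0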
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
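• Processors arranged in a 2-D grid
• Each processor accumulates one element of the product

Alignments in time

T=2
Not yet in the array:
  rows of A:    row 0: a0,2 | row 1: a1,2 a1,1 | row 2: a2,2 a2,1 a2,0
  columns of B: column 0: b2,0 | column 1: b2,1 b1,1 | column 2: b2,2 b1,2 b0,2
Partial sums, by PE(row, column):
  PE(0,0): a0,0*b0,0 + a0,1*b1,0
  PE(0,1): a0,0*b0,1
  PE(1,0): a1,0*b0,0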
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
• Processors arranged in a 2-D grid
• Each processor accumulates one
element of the product
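Alignments in time

T=3
Not yet in the array:
  rows of A:    row 1: a1,2 | row 2: a2,2 a2,1
  columns of B: column 1: b2,1 | column 2: b2,2 b1,2
Partial sums, by PE(row, column):
  PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  PE(0,1): a0,0*b0,1 + a0,1*b1,1
  PE(0,2): a0,0*b0,2
  PE(1,0): a1,0*b0,0 + a1,1*b1,0
  PE(1,1): a1,0*b0,1
  PE(2,0): a2,0*b0,0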
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
• Processors arranged in a 2-D grid
• Each processor accumulates one
element of the product

Alignments in time

T=4
Not yet in the array:
  rows of A:    row 2: a2,2
  columns of B: column 2: b2,2
Partial sums, by PE(row, column):
  PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  PE(0,2): a0,0*b0,2 + a0,1*b1,2
  PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  PE(1,1): a1,0*b0,1 + a1,1*b1,1
  PE(1,2): a1,0*b0,2
  PE(2,0): a2,0*b0,0 + a2,1*b1,0
  PE(2,1): a2,0*b0,1
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
• Processors arranged in a 2-D grid
• Each processor accumulates one
element of the product

Alignments in time

T=5
Not yet in the array: none (the last operands, a2,2 and b2,2, enter at this step)
Partial sums, by PE(row, column):
  PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  PE(0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
  PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  PE(1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
  PE(1,2): a1,0*b0,2 + a1,1*b1,2
  PE(2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
  PE(2,1): a2,0*b0,1 + a2,1*b1,1
  PE(2,2): a2,0*b0,2
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
• Processors arranged in a 2-D grid
• Each processor accumulates one
element of the product

Alignments in time

T=6
Not yet in the array: none
Partial sums, by PE(row, column):
  PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  PE(0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
  PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  PE(1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
  PE(1,2): a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2
  PE(2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
  PE(2,1): a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1
  PE(2,2): a2,0*b0,2 + a2,1*b1,2
Systolic Array Example:
3x3 Systolic Array Matrix Multiplication
• Processors arranged in a 2-D grid
• Each processor accumulates one
element of the product

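Alignments in time

T=7
Done: the last product, a2,2*b2,2, is added in PE(2,2), and every PE now holds its complete element of the product.
  PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  PE(0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
  PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  PE(1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
  PE(1,2): a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2
  PE(2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
  PE(2,1): a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1
  PE(2,2): a2,0*b0,2 + a2,1*b1,2 + a2,2*b2,2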
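
The walkthrough from T=0 to T=7 can be replayed in software. The sketch below is a sequential model of the data movement, not a parallel implementation and not code from the lecture. The names systolic_matmul, a_reg and b_reg are hypothetical, and it assumes that a zero is fed in whenever a row or column has no operand at the array edge, that A values move one PE to the right per step, and that B values move one PE down per step.

```python
def systolic_matmul(A, B):
    """Simulate an n x n output-stationary systolic array step by step.

    Returns (C, steps), where C is the accumulated product and steps is
    the number of time steps simulated (3n - 2 under the assumed skew).
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * n for _ in range(n)]  # A operand currently in each PE
    b_reg = [[0] * n for _ in range(n)]  # B operand currently in each PE
    steps = 3 * n - 2

    for t in range(1, steps + 1):
        # Shift A operands one PE to the right (start from the far edge).
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        # Shift B operands one PE down (start from the bottom edge).
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the skewed inputs at the edges; zero means "nothing yet".
        for i in range(n):
            k = t - 1 - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            k = t - 1 - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # Every PE multiplies its two operands and accumulates.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc, steps


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    C, steps = systolic_matmul(A, B)
    print(steps, C)
```

For n = 3 this model runs for 3n - 2 = 7 steps, which agrees with the slides reaching "Done" at T=7.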
