
Linear Algebra and Learning from Data

Multiplication Ax and AB
Column space of A
Independent rows and basis
Row rank = column rank

Neural Networks and Deep Learning / new course and book


math.mit.edu/learningfromdata
   
By rows:
$$\begin{bmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 2x_1 + 4x_2 \\ 3x_1 + 7x_2 \end{bmatrix}$$

By columns:
$$\begin{bmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1 \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 3 \\ 4 \\ 7 \end{bmatrix}$$
$b = (b_1, b_2, b_3)$ is in the column space of $A$ exactly when
$Ax = b$ has a solution $(x_1, x_2)$.
 
$$b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \text{ is not in } C(A).$$

$$Ax = \begin{bmatrix} 2x_1 + 3x_2 \\ 2x_1 + 4x_2 \\ 3x_1 + 7x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \text{ is unsolvable.}$$

The first two equations force $x_1 = \frac{1}{2}$ and $x_2 = 0$.
Then $3\left(\frac{1}{2}\right) + 7(0) = 1.5$ (not 1).
What are the column spaces of
$$A_2 = \begin{bmatrix} 2 & 3 & 5 \\ 2 & 4 & 6 \\ 3 & 7 & 10 \end{bmatrix} \quad \text{and} \quad A_3 = \begin{bmatrix} 2 & 3 & 1 \\ 2 & 4 & 1 \\ 3 & 7 & 1 \end{bmatrix}?$$
If column 1 of A is not all zero, put it into C.
If column 2 of A is not a multiple of column 1, put it into C.
If column 3 of A is not a combination of columns 1 and 2, put it into C. Continue.
At the end C will have r columns (r ≤ n).
They will be a “basis” for the column space of A.
The left-out columns are combinations of those basic columns in C.
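That greedy left-to-right scan can be sketched numerically. This is a minimal NumPy illustration, not from the slides: the helper name `basis_columns` and the rank-based independence test are my own choices.

```python
import numpy as np

def basis_columns(A, tol=1e-10):
    """Greedy scan: keep each column of A that is not a combination
    of the columns already kept. Returns C and the kept indices."""
    kept = []
    for j in range(A.shape[1]):
        candidate = A[:, kept + [j]]
        # column j is independent exactly when it raises the rank
        if np.linalg.matrix_rank(candidate, tol=tol) > len(kept):
            kept.append(j)
    return A[:, kept], kept

A = np.array([[1., 3., 8.],
              [1., 2., 6.],
              [0., 1., 2.]])
C, idx = basis_columns(A)   # keeps columns 0 and 1; column 2 = 2*col0 + 2*col1
```

The scan keeps r = 2 of the n = 3 columns, matching the first example below.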
   
If
$$A = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{bmatrix} \quad \text{then} \quad C = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{bmatrix}$$
n = 3 columns in A, r = 2 columns in C.

If
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} \quad \text{then} \quad C = A.$$
n = 3 columns in A, r = 3 columns in C.

If
$$A = \begin{bmatrix} 1 & 2 & 5 \\ 1 & 2 & 5 \\ 1 & 2 & 5 \end{bmatrix} \quad \text{then} \quad C = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
n = 3 columns in A, r = 1 column in C.
   
$$A = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \end{bmatrix} = CR$$

All we are doing is putting the right numbers in R. Combinations of the columns of C produce the columns of A. Then A = CR stores this information as a matrix multiplication.
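A quick numerical check of this A = CR factorization, assuming NumPy is available:

```python
import numpy as np

A = np.array([[1, 3, 8],
              [1, 2, 6],
              [0, 1, 2]])
C = np.array([[1, 3],
              [1, 2],
              [0, 1]])
R = np.array([[1, 0, 2],
              [0, 1, 2]])

# each column of A is the combination of the columns of C
# given by the corresponding column of R
assert np.array_equal(C @ R, A)
```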
The number of independent columns equals the number of independent rows.

Look at A = CR by rows instead of columns. R has r rows. Multiplying by C takes combinations of those rows. Since A = CR, we get every row of A from the r rows of R. Those r rows are independent: a basis for the row space of A.
Column-row multiplication of matrices

$$AB = \begin{bmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{bmatrix} \begin{bmatrix} \text{---}\; b_1^{*}\; \text{---} \\ \vdots \\ \text{---}\; b_n^{*}\; \text{---} \end{bmatrix} = \underbrace{a_1 b_1^{*} + a_2 b_2^{*} + \cdots + a_n b_n^{*}}_{\text{sum of rank 1 matrices}}$$

The $i, j$ entry of $a_k b_k^{*}$ is $a_{ik} b_{kj}$. Add to find
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = \text{row } i \cdot \text{column } j.$$

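The column-times-row picture can be checked on a small random example. A sketch assuming NumPy; the shapes 3×4 and 4×5 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# sum of rank-1 outer products a_k b_k^* over the shared index k
rank1_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

assert np.allclose(rank1_sum, A @ B)
```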
$$A = LU \qquad A = QR \qquad S = Q\Lambda Q^T \qquad A = X\Lambda X^{-1} \qquad A = U\Sigma V^T$$
Deep Learning by Neural Networks

1 Key operation: Composition F = F_3(F_2(F_1(x_0)))
2 Key rule: Chain rule for derivatives
3 Key algorithm: Stochastic gradient descent
4 Key subroutine: Backpropagation
5 Key nonlinearity: ReLU(x) = max(x, 0) = ramp function
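A tiny sketch of the composition F = F_3(F_2(F_1(x_0))) with ReLU between layers. The weights below are made-up numbers purely for illustration, not from the course:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)          # the ramp function max(x, 0)

# hypothetical 2-in / 2-hidden / 1-out weights, chosen only to run the example
W1, b1 = np.array([[1., -1.], [0.5, 2.]]), np.array([0.1, -0.2])
W2, b2 = np.array([[2., 0.], [-1., 1.]]), np.array([0., 0.3])
W3, b3 = np.array([[1., 1.]]), np.array([0.])

def F1(x): return relu(W1 @ x + b1)
def F2(x): return relu(W2 @ x + b2)
def F3(x): return W3 @ x + b3        # last layer left linear

x0 = np.array([1., 2.])
F = F3(F2(F1(x0)))                   # a continuous piecewise linear function of x0
```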
Theorem

Suppose we have N hyperplanes $H_1, \ldots, H_N$ in m-dimensional space $\mathbf{R}^m$. Those come from N linear equations $a_i^T x = b_i$, in other words from $Ax = b$. Then the number of regions bounded by the N hyperplanes (including infinite regions) is probably r(N, m) and certainly not more:
$$r(N, m) = \sum_{i=0}^{m} \binom{N}{i} = \binom{N}{0} + \binom{N}{1} + \cdots + \binom{N}{m}.$$

Thus N = 1 hyperplane in $\mathbf{R}^m$ produces $\binom{1}{0} + \binom{1}{1} = 2$ regions (one fold). And N = 2 hyperplanes will produce 1 + 2 + 1 = 4 regions provided m ≥ 2. When m = 1 we have 2 folds in a line, which only separates the line into r(2, 1) = 3 pieces.
The theorem will follow from the recursive formula

r(N, m) = r(N − 1, m) + r(N − 1, m − 1)
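Both the sum-of-binomials formula and this recursion are easy to check numerically; a short sketch using Python's `math.comb`:

```python
from math import comb

def r(N, m):
    """Region count for N generic hyperplanes in R^m: sum of C(N, i) for i <= m."""
    return sum(comb(N, i) for i in range(m + 1))

# the examples from the text
assert r(1, 1) == 2   # one fold on a line
assert r(2, 2) == 4   # two hyperplanes in the plane
assert r(2, 1) == 3   # two folds on a line

# the recursion r(N, m) = r(N-1, m) + r(N-1, m-1), via Pascal's rule
for N in range(1, 8):
    for m in range(1, 6):
        assert r(N, m) == r(N - 1, m) + r(N - 1, m - 1)
```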

Figure: The r(2, 1) = 3 pieces of H create 3 new regions. Then the count becomes r(3, 2) = 4 + 3 = 7 flat regions in the continuous piecewise linear surface $z = F(x_1, x_2)$.
Backpropagation: Reverse Mode Graph for Derivatives of $x^2(x + y)$

Figure: Reverse-mode computation of the gradient $\left(\frac{\partial F}{\partial x}, \frac{\partial F}{\partial y}\right)$ at x = 2, y = 3.
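Without the figure, the reverse-mode sweep for $F = x^2(x + y)$ can be written out by hand. The node names `s` and `t` below are my own labels for the intermediate values in the graph:

```python
# Forward pass for F = x^2 (x + y), then the reverse-mode sweep.
x, y = 2.0, 3.0

# forward: build intermediate nodes
s = x + y        # s = x + y = 5
t = x * x        # t = x^2   = 4
F = t * s        # F = t*s   = 20

# reverse: propagate dF/d(node) from the output back to the inputs
dF_dF = 1.0
dF_dt = dF_dF * s                    # F = t*s  =>  dF/dt = s
dF_ds = dF_dF * t                    # F = t*s  =>  dF/ds = t
dF_dx = dF_dt * 2 * x + dF_ds * 1    # x feeds both t and s: chain rule sums paths
dF_dy = dF_ds * 1                    # y feeds only s
```

This reproduces the analytic gradient: $\partial F/\partial x = 3x^2 + 2xy = 24$ and $\partial F/\partial y = x^2 = 4$ at x = 2, y = 3.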
AB first or BC first? Compute (AB)C or A(BC)?

First way:
AB = (m × n)(n × p) has mnp multiplications
(AB)C = (m × p)(p × q) has mpq multiplications
Total: mnp + mpq = mp(n + q)

Second way:
BC = (n × p)(p × q) has npq multiplications
A(BC) = (m × n)(n × q) has mnq multiplications
Total: npq + mnq = nq(m + p)
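Counting the two ways in code makes the comparison concrete. For instance, when C is a single column (q = 1), doing BC first is far cheaper; the function names and the example sizes are my own:

```python
def flops_AB_first(m, n, p, q):
    """Multiplications for (AB)C: AB costs mnp, then (AB)C costs mpq."""
    return m * n * p + m * p * q

def flops_BC_first(m, n, p, q):
    """Multiplications for A(BC): BC costs npq, then A(BC) costs mnq."""
    return n * p * q + m * n * q

# square 100x100 matrices A, B and a column vector C (q = 1)
m, n, p, q = 100, 100, 100, 1
cost_left = flops_AB_first(m, n, p, q)   # 1,000,000 + 10,000
cost_right = flops_BC_first(m, n, p, q)  # 10,000 + 10,000
```

Here A(BC) only ever multiplies matrices times vectors, so it wins by a factor of about 50.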

You might also like