
Linear Algebra

Chapter 1.
Vectors in $\mathbb{R}^n$ and $\mathbb{C}^n$, spatial vectors

A vector $v$ is a member of a vector space such as $\mathbb{R}^n$ or $\mathbb{C}^n$.

If $u, v \in V$ where $V$ is a vector space, and $\alpha, \beta$ are scalars (numbers), then

$\alpha(u + v) = \alpha u + \alpha v$, $\quad (\alpha + \beta)u = \alpha u + \beta u$, $\quad \alpha(\beta u) = (\alpha\beta)u$ are all in $V$.

A vector is an ordered collection of numbers (quantities), an array.

e.g. $v = [\,2.3\ \ -3.5\ \ 0.1\,]$, $\quad v = (\text{on on off on on off})$, $\quad v = (\text{blue green blue white yellow})$

In physics, vectors are things with a magnitude and a direction. Alternatively, a vector is a point in a space, e.g. $(4, 3)$, and we can perform vector addition and scalar multiplication (scaling) on it.

We may depict a vector either as a row of numbers or a column of numbers. It is up to us, but we have to be consistent.

If $u = (u_1, u_2, \ldots, u_n)$ then $v = u^t = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$ is the transpose of the vector $u$. Conversely, $v^t = u$.

Here, $u$ is a row vector and $v$ is a column vector.

Vector addition, scalar multiplication. Negation of a vector (if $v$, then what is $-v$?).

A vector $u = (u_1, u_2, \ldots, u_n)$ is a tuple in an n-space. The components $u_i$ are also called coordinates, entries, elements, etc.

One may consider a set of unit vectors comprising this $\mathbb{R}^n$ (or $\mathbb{C}^n$) space. The corresponding vector space that they collectively span allows us to express each vector as follows:

$u = i_1 u_1 + i_2 u_2 + \ldots + i_k u_k + \ldots + i_n u_n = \sum_{k=1}^{n} i_k u_k$

If the vector space is orthogonal, each unit vector spanning the space is orthogonal to the others. A collection of unit vectors that spans every vector in the vector space is called a basis. In the above, the set of unit vectors $\{i_j\}$ forms an orthonormal basis.

e.g. A three-dimensional orthogonal space

Norm (or length) of a vector.

The norm or length of a vector $u \in \mathbb{R}^n$ is $\|u\|$. If $u = (u_1, u_2, \ldots, u_n)$ then $\|u\| = \sqrt{\sum_{i=1}^{n} u_i^2}$.

$u$ is a unit vector if $\|u\| = 1$. Therefore, $\hat{v} = \dfrac{v}{\|v\|}$ is a unit vector.

Accordingly, if $u$ and $v$ are two points $P$ and $Q$ in the n-space $\mathbb{R}^n$, the distance between them is

$\overline{PQ} = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}$

if the notion of a distance is 'meaningful' in that space. In such a space, the dot product (or the inner product) between two vectors is defined in this way:

For $u = (u_1, u_2, u_3, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$, the dot product is $u \cdot v$ or $\langle u | v \rangle$ (this Dirac notation will be explained shortly):

$u \cdot v = \langle u | v \rangle = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n = \sum_i u_i v_i$

In terms of the physics metaphor, $\langle u | v \rangle = \|u\| \|v\| \cos\theta$, where $\theta$ is the angle between the two vectors.

The norm of a vector $u$ is then $\sqrt{\langle u | u \rangle}$. We should be careful: this is only one variety of norm that we can associate with a vector. In fact, we can have a number of them:

$l_1$-norm: $\|u\|_1 = \sum_{i=1}^{n} |u_i|$

$l_2$-norm: $\|u\|_2 = \|u\|$ as we have defined it.

$l_p$-norm: $\|u\|_p = \left( \sum_{i=1}^{n} |u_i|^p \right)^{1/p}$

$l_\infty$-norm: $\|u\|_\infty = \max_i |u_i|$

For our purposes, we will mostly consider the $l_2$-norm.
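As a quick illustration (a minimal sketch, not part of the original notes, assuming NumPy is available), the norms above can be computed directly in Python; the vector `u` reuses the earlier example $[2.3\ \ -3.5\ \ 0.1]$:

```python
import numpy as np

u = np.array([2.3, -3.5, 0.1])          # the example vector used earlier

l1   = np.sum(np.abs(u))                # l1-norm: sum of absolute values
l2   = np.sqrt(np.sum(u**2))            # l2-norm: Euclidean length
lp   = np.sum(np.abs(u)**3)**(1/3)      # lp-norm with p = 3
linf = np.max(np.abs(u))                # l-infinity norm: largest component magnitude

print(l1, l2, lp, linf)
print(np.linalg.norm(u, 1), np.linalg.norm(u, 2),
      np.linalg.norm(u, 3), np.linalg.norm(u, np.inf))  # same values via NumPy
```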

An inner product space $V$ is a vector space in which the norm of a vector is defined and one can form a dot product $\langle u | v \rangle$ between any pair of vectors $u$ and $v$ in it, satisfying the following conditions:

1. $\langle u | u \rangle \ge 0$ : the length of a vector is never negative.
2. $\langle u | v \rangle = \langle v | u \rangle$ : symmetry
3. $\langle u + w | v \rangle = \langle u | v \rangle + \langle w | v \rangle$
4. $\langle \alpha u | v \rangle = \alpha \langle u | v \rangle$ with $\alpha$ a constant
5. $\sqrt{\langle u + v | u + v \rangle} \le \sqrt{\langle u | u \rangle} + \sqrt{\langle v | v \rangle}$ : triangle inequality
6. $|\langle u | v \rangle| \le \sqrt{\langle u | u \rangle}\,\sqrt{\langle v | v \rangle}$ : Cauchy-Schwarz inequality

ex. $u = (2, 0, -1)$, $v = (-1, 3, -2)$

$\langle u | u \rangle = 4 + 0 + 1 = 5$
$\langle v | v \rangle = 1 + 9 + 4 = 14$, and
$\langle u | v \rangle = 2\cdot(-1) + 0\cdot 3 + (-1)\cdot(-2) = 0 \le \sqrt{5}\,\sqrt{14}$
ex. Let $V = \mathbb{R}^2$ with the dot product defined as follows:

$\langle (a,b), (c,d) \rangle = ac + bd$

Then $\langle (a,b), (a,b) \rangle = a^2 + b^2 \ge 0$ (condition 1 is fulfilled) and $\mathbb{R}^2$ is an inner product space.

ex. A polynomial space $P_n$ is an inner product space where every vector is a polynomial of degree at most $n$, like

$p(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n$
$q(x) = b_0 + b_1 x + b_2 x^2 + \ldots + b_n x^n$

Then $\langle p | q \rangle = a_0 b_0 + a_1 b_1 + \ldots + a_n b_n$ is an inner product between two such vectors in $P_n$.

Another inner product in $P_n$ may be defined as

$\langle p | q \rangle = \int_0^1 p(x)\, q(x)\, dx$

ex. Another inner product space is the space of continuous functions, where the dot product between $f(x)$ and $g(x)$ is defined as

$\langle f | g \rangle = \int_{-\pi}^{\pi} f(t)\, g(t)\, dt$

In this setup, the basis set $S = \{\hat{v}_1, \hat{v}_2, \ldots\}$ is an infinite set of vectors like

$\hat{v}_1 = \frac{1}{\sqrt{2\pi}}, \quad \hat{v}_2 = \frac{1}{\sqrt{\pi}}\cos t, \quad \hat{v}_3 = \frac{1}{\sqrt{\pi}}\sin t, \quad \ldots, \quad \hat{v}_{2n} = \frac{1}{\sqrt{\pi}}\cos nt, \quad \hat{v}_{2n+1} = \frac{1}{\sqrt{\pi}}\sin nt$

That these form an orthonormal basis is evident from the fact that

$\int_{-\pi}^{\pi} \cos mt \sin nt\, dt = 0$ for $m \neq n$,
$\int_{-\pi}^{\pi} \cos mt \cos nt\, dt = 0$ for $m \neq n$,
$\int_{-\pi}^{\pi} \sin mt \sin nt\, dt = 0$ for $m \neq n$

ex. A weighted Euclidean inner product on $\mathbb{R}^n$ may be designed to yield dot products like

$\langle u | v \rangle = \omega_1 u_1 v_1 + \omega_2 u_2 v_2 + \ldots + \omega_n u_n v_n$

with $\omega_1 + \omega_2 + \ldots + \omega_n = 1$. Weighted sums of this kind are the basic building blocks of neural networks.

More observations.

A set $S$ in an inner product space $V$ is called orthogonal if any two distinct vectors in $S$ are orthogonal. If, in addition, each vector is a unit vector, the set $S$ is called orthonormal.

The set of unit vectors in $\mathbb{R}^n$ forms an orthonormal set if

$\langle i_j | i_k \rangle = \delta_{jk}$ (Kronecker delta), where $\delta_{jk} = 1$ if $j = k$ and $0$ otherwise.

Notice that the vectors of a finite orthonormal set $S$ are linearly independent.

Accordingly, if a set of vectors $S \subset V$ can be identified such that any vector $v \in V$ can be expressed uniquely as a linear combination of vectors in $S$, the set $S$ is a basis for $V$. Thus,

$v = \hat{i}_1 v_1 + \hat{i}_2 v_2 + \hat{i}_3 v_3 + \ldots + \hat{i}_n v_n$ with each $\hat{i}_k \in S$.

A vector space may have several distinct bases, but each will have the same number of basis vectors. The number of basis vectors spanning a vector space is called the dimension of the vector space.

Ex. The standard basis spanning $\mathbb{R}^3$ is the set of three vectors $S = \{e_1, e_2, e_3\}$ where

$e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$

Any three-dimensional vector can be expressed in this basis, e.g.

$v = (4, -5, 3) = 4e_1 - 5e_2 + 3e_3$

Obviously other bases in this vector space are possible. Suppose we choose as our basis the set $S' = \{u_1, u_2, u_3\}$ with

$u_1 = (1, -2, 1)$, $u_2 = (0, 3, 2)$ and $u_3 = (2, 1, -1)$

Then $v = (4, -5, 3) = \alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3$, and the coefficients are determined by the equations

$\alpha_1 + 2\alpha_3 = 4$
$-2\alpha_1 + 3\alpha_2 + \alpha_3 = -5$
$\alpha_1 + 2\alpha_2 - \alpha_3 = 3$

Distances, angles and projections

The distance between two vectors (the distance between the terminal points of the two vectors) is

$d(u, v) = \|u - v\| = \sqrt{\langle u - v \,|\, u - v \rangle}$

If $u$ and $v$ are vectors in an inner product space $V$, the angle $\theta$ between these two vectors is given by

$\cos\theta = \dfrac{\langle u | v \rangle}{\|u\| \, \|v\|}$

The two vectors are orthogonal (perpendicular to each other) if $\langle u | v \rangle = 0$.

Projection of a vector on another vector (see the diagram of $u$, $v$ and the component $\|u\|\cos\theta$):

The projection of $u$ on the vector $v$ is

$u_v = \mathrm{proj}_v(u) = \|u\| \cos\theta = \|u\| \, \dfrac{\langle u | v \rangle}{\|u\| \, \|v\|} = \dfrac{\langle u | v \rangle}{\|v\|}$
Lecture 2. About matrices.

1. A matrix is a rectangular array of scalars such as

$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}$ having $m$ rows and $n$ columns.

The entry in the $i$th row and $j$th column of this matrix is $a_{ij}$. We often denote a matrix as $A_{m\times n} = [a_{ij}]$.

Matrix elements may be constants (real, complex, ordinal), variables like functions of time, etc.

Row of a matrix. Column of a matrix. Square matrix. Rectangular matrix. Diagonal elements.

A square matrix has the same number of rows and columns.

e.g. $A = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}$, $\quad B = \begin{pmatrix} \text{on} & \text{off} & \text{on} & \text{on} \\ \text{on} & \text{off} & \text{off} & \text{off} \\ \text{on} & \text{off} & \text{on} & \text{off} \\ \text{off} & \text{on} & \text{off} & \text{on} \end{pmatrix}$

All square matrices are rectangular matrices, but not all rectangular matrices are square.

e.g. $A_{2\times 5} = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 \end{pmatrix}$

A diagonal matrix is the following:

$D = \begin{pmatrix} d_{11} & 0 & 0 \\ 0 & d_{22} & 0 \\ 0 & 0 & d_{33} \end{pmatrix}$. Notice that all $d_{ij} = 0$ if $i \neq j$.

An identity matrix is a diagonal matrix with all its diagonal elements equal to 1:

$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

The trace of a matrix, $\mathrm{Tr}(A)$, is the sum of all its diagonal elements: $\mathrm{Tr}(A) = \sum_{i=1}^{n} a_{ii}$.

A matrix $A$ is strictly diagonally dominant if for each row $i$, $\sum_{j \neq i} |a_{ij}| < |a_{ii}|$.

The transpose of a matrix $A$ is $A^t$, whose rows and columns are the columns and rows of $A$.

e.g. $A = \begin{pmatrix} 2 & 1 & 3 \\ 0 & 3 & 2 \end{pmatrix}$, $\quad A^t = \begin{pmatrix} 2 & 0 \\ 1 & 3 \\ 3 & 2 \end{pmatrix}$

The $(i,j)$th element of $A^t$ is $a_{ji}$.

A complex matrix has complex elements in it.

e.g. $C_{3\times 3} = \begin{pmatrix} 2i & 0 & 2 \\ 1+i & i & 1-i \\ 2-i & 3i & 3 \end{pmatrix}$

Given a complex number $z = a + ib$, its conjugate is the complex number $\bar{z} = a - ib$, obtained by reversing the sign of its imaginary part.

The conjugate transpose matrix $D$ obtained from the original complex matrix $C$ is such that $d_{ij} = \bar{c}_{ji}$.

Thus, the conjugate transpose of $C$ above is

$C^{\dagger} = D_{3\times 3} = \begin{pmatrix} -2i & 1-i & 2+i \\ 0 & -i & -3i \\ 2 & 1+i & 3 \end{pmatrix}$

A matrix is Hermitian if it is equal to its conjugate transpose, i.e. if $c_{ij} = \bar{c}_{ji}$ then $C$ is Hermitian. A Hermitian matrix has real numbers on its diagonal.

e.g. $\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}$, $\quad \begin{pmatrix} p & -i \\ i & 1 \end{pmatrix}$ (with $p$ real)

So are the Pauli matrices

$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$

A typical $3\times 3$ Hermitian matrix would be

$H = \begin{pmatrix} 2 & 1+i & 2i \\ 1-i & 3 & 2-3i \\ -2i & 2+3i & 6 \end{pmatrix}$

A square matrix $A$ is skew-Hermitian if $a_{ij} = -\bar{a}_{ji}$.

A matrix may be defined by a rule that depends on the position of each entry. For instance, a Hilbert matrix $H$ is a square matrix in which $h_{ij} = \dfrac{1}{i + j - 1}$. Matrices of this type are often used to test matrix algorithms.

2. An upper triangular matrix $U$ has all entries below the diagonal equal to zero, i.e.

$u_{ij} = a_{ij}$ for $i \le j$, and $u_{ij} = 0$ otherwise.

e.g. $U = \begin{pmatrix} 3 & 1 & 2 & 0 & 9 \\ 0 & 2 & 2 & 1 & 4 \\ 0 & 0 & 3 & 0 & 1 \\ 0 & 0 & 0 & 1 & 9 \\ 0 & 0 & 0 & 0 & 4 \end{pmatrix}$

A lower triangular matrix $L$ has all entries above the diagonal equal to zero:

$l_{ij} = a_{ij}$ for $i \ge j$, and $l_{ij} = 0$ otherwise.

e.g. $L = \begin{pmatrix} 3 & 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 \\ 3 & 2 & 3 & 0 & 0 \\ 8 & 3 & 1 & 1 & 0 \\ 1 & 1 & 2 & 1 & 4 \end{pmatrix}$

A matrix is Hessenberg (upper or lower) if it is 'almost' triangular, with one additional off-diagonal band just below (or above) the main diagonal. e.g.

$B = \begin{pmatrix} 1 & 0 & -1 & 5 & 3 \\ 2 & -2 & 3 & 4 & 6 \\ -1 & 1 & 2 & 0 & 7 \\ 0 & 1 & 2 & -3 & -4 \\ 1 & -1 & 2 & -2 & 3 \end{pmatrix}$

and its (upper) Hessenberg form is:

$\begin{pmatrix} 1.0000 & -1.6330 & 3.2889 & -4.6345 & 0.1941 \\ -2.4495 & -1.6667 & -0.0384 & 4.7074 & -1.9307 \\ 0 & 2.8964 & 0.3091 & 7.9878 & -5.8211 \\ 0 & 0 & 4.7644 & 0.2614 & -1.8186 \\ 0 & 0 & 0 & 2.2094 & 1.0962 \end{pmatrix}$

Another important type of matrix is a sparse matrix. It is a matrix in which most of the elements are 0 (say, only 10% of the elements have non-zero values). The issue here is storage of such matrices; the usual dense storage is inefficient.

e.g. $A = \begin{pmatrix} 0 & 2 & 0 & 0 & 1 \\ 5 & 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 & -6 \\ 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$

Its storage has to be designed, preferably, as some form of list structure of (row, column, value) triplets such as

(1,2,2) (1,5,1) (2,1,5) (2,5,3) (3,5,-6) (4,1,2) (5,4,1) NULL
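For reference (a sketch, not part of the original notes), this triplet scheme is essentially the COO ("coordinate") format used by SciPy; the indices below are 0-based rather than the 1-based ones in the list above:

```python
import numpy as np
from scipy.sparse import coo_matrix

# (row, col, value) triplets for the 5x5 example above, using 0-based indices
rows = np.array([0, 0, 1, 1, 2, 3, 4])
cols = np.array([1, 4, 0, 4, 4, 0, 3])
vals = np.array([2, 1, 5, 3, -6, 2, 1], dtype=float)

A = coo_matrix((vals, (rows, cols)), shape=(5, 5))
print(A.toarray())      # expand back to the dense matrix for comparison
print(A.nnz)            # number of stored (non-zero) entries: 7
```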

Finally, a block matrix is a matrix that is defined in terms of smaller (rectangular) blocks, which are themselves matrices. This is advantageous in computation. e.g.

$X = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$ where each element is itself a matrix, such as

$A = \begin{pmatrix} 2 & 3 \\ 4 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 4 & 3 & 2 \\ 4 & 0 & 5 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 1 \\ 2 & 2 \\ 3 & 2 \end{pmatrix}, \quad D = \begin{pmatrix} 3 & 0 & 3 \\ 2 & 2 & 0 \\ 1 & 3 & 4 \end{pmatrix}$

with the original $X = \begin{pmatrix} 2 & 3 & 4 & 3 & 2 \\ 4 & 1 & 4 & 0 & 5 \\ 1 & 1 & 3 & 0 & 3 \\ 2 & 2 & 2 & 2 & 0 \\ 3 & 2 & 1 & 3 & 4 \end{pmatrix}$

3. Matrix addition & scalar multiplication. Matrix multiplication.

Given $A_{m\times n}$ and $B_{m\times n}$, we may define $C_{m\times n} = A_{m\times n} + kB_{m\times n}$, where $c_{ij} = a_{ij} + k b_{ij}$ for all $i \in [1, m]$ and $j \in [1, n]$.

Thus, if $A = \begin{pmatrix} 2 & 0 & 3 \\ 1 & 1 & 9 \\ 4 & 3 & 5 \end{pmatrix}$, then $-A = \begin{pmatrix} -2 & 0 & -3 \\ -1 & -1 & -9 \\ -4 & -3 & -5 \end{pmatrix}$

Summation symbols one more time!

Given two vectors $u$ and $v$, we may compute expressions like:

a. $\sum_j u_j = u_1 + u_2 + u_3 + \ldots + u_n$ on an n-component vector

b. $\sum_{j=1}^{n-1} (j+1)\, u_j = 2u_1 + 3u_2 + 4u_3 + \ldots + n\, u_{n-1}$

c. $\sum_{k=3}^{n} u_k v_{k-2} = u_3 v_1 + u_4 v_2 + \ldots + u_n v_{n-2}$

Similarly, on matrices:

d. $\sum_i a_{ij} = a_{1j} + a_{2j} + a_{3j} + \ldots + a_{mj}$

e. $\sum_{i<j} a_{ij} = a_{12} + (a_{13} + a_{23}) + (a_{14} + a_{24} + a_{34}) + \ldots + (a_{1n} + a_{2n} + \ldots + a_{n-1,n})$

f. $\sum_{k=1}^{n} a_{ik} b_{kj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \ldots + a_{in} b_{nj}$

This last one is the inner product of two vectors: the $i$-th row of $a$ and the $j$-th column of $b$:

$\langle a_{i\cdot} \,|\, b_{\cdot j} \rangle = (a_{i1}\ a_{i2}\ a_{i3}\ \ldots\ a_{in}) \begin{pmatrix} b_{1j} \\ b_{2j} \\ b_{3j} \\ \vdots \\ b_{nj} \end{pmatrix}$
3.1 Matrix multiplication

If $A_{m\times p}$ is multiplied from the right by $B_{p\times n}$, the result is $C_{m\times n}$ with $c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}$.

Observe: $A_{m\times p}$ can multiply $B_{q\times n}$ from the left (forming $AB$) only if $p = q$. It can multiply $B_{q\times n}$ from the right (forming $BA$) only if $n = m$.

e.g. $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & -1 & 1 \\ -2 & 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 0 & -1 \\ 2 & -2 & 3 \\ -1 & 1 & 2 \end{pmatrix}$

Then

$c_{23} = a_{21} b_{13} + a_{22} b_{23} + a_{23} b_{33} = 2\cdot(-1) + (-1)\cdot 3 + 1\cdot 2 = -3$

and the entire matrix $C$ is

$C = \begin{pmatrix} 2 & -1 & 11 \\ -1 & 3 & -3 \\ -4 & 2 & 6 \end{pmatrix}$

Note that in matrix multiplication, usually $AB \neq BA$. If they are equal, the matrices are said to commute. In our case,

$AB = \begin{pmatrix} 2 & -1 & 11 \\ -1 & 3 & -3 \\ -4 & 2 & 6 \end{pmatrix}, \qquad BA = \begin{pmatrix} 3 & 2 & 1 \\ -8 & 6 & 10 \\ -3 & -3 & 2 \end{pmatrix}$

Also note that for a square matrix $A_{n\times n}$, $AI = IA = A$ if $I$ is the $n\times n$ identity matrix.
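As a quick numerical check (a sketch, not part of the original notes), the products above can be verified with NumPy, which also shows that $AB \neq BA$ here:

```python
import numpy as np

A = np.array([[1, 2, 3], [2, -1, 1], [-2, 0, 2]])
B = np.array([[1, 0, -1], [2, -2, 3], [-1, 1, 2]])

print(A @ B)                           # the matrix C computed above
print(B @ A)                           # a different matrix: AB != BA in general
print(np.allclose(A @ np.eye(3), A))   # AI = A for the identity matrix
```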

The product of two block matrices of the same structure and shape, with the diagonal blocks being square matrices, yields a block matrix of the same structure.

e.g. the product of two block diagonal matrices:

$\begin{pmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{pmatrix} \begin{pmatrix} B_1 & 0 & 0 \\ 0 & B_2 & 0 \\ 0 & 0 & B_3 \end{pmatrix} = \begin{pmatrix} A_1 B_1 & 0 & 0 \\ 0 & A_2 B_2 & 0 \\ 0 & 0 & A_3 B_3 \end{pmatrix}$

and, for general block matrices,

$\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix} \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \begin{pmatrix} A_1 B_1 + A_2 B_3 & A_1 B_2 + A_2 B_4 \\ A_3 B_1 + A_4 B_3 & A_3 B_2 + A_4 B_4 \end{pmatrix}$

Note that the diagonal blocks must be square matrices.

3.2 Question: If $AB = I$, can we conclude that $BA = I$?

$AB = I \Rightarrow ABA = A$ (multiplying by $A$ from the right). If $BA = X$ then $AX = A$, implying $X = I$.

Therefore, $AB = I \Rightarrow BA = I$.

Another question: Is it possible for $AB = 0$ (the zero matrix, with all elements 0) even though neither $A$ nor $B$ is zero? This is left to you as an exercise!
4. Inverse of a matrix.

Suppose $AB = I$. Then $B$ is the inverse of $A$ and is denoted by $A^{-1}$, assuming $A$ is invertible. Accordingly, $AA^{-1} = I$. Since $AB = I \Rightarrow BA = I$, we see that $A^{-1}A = I$ as well.

Determining the inverse of a matrix is a challenge for a numerical analyst. This is a significant problem for us.

Given $C = AB$, $C^{-1} = B^{-1} A^{-1}$. That this is so can be seen via the construct

$C^{-1} C = B^{-1} A^{-1} A B = I$. This means that for any product $C = A_1 A_2 A_3 \ldots A_i$, $C^{-1} = A_i^{-1} A_{i-1}^{-1} \ldots A_2^{-1} A_1^{-1}$. In all these applications, we assume the matrices are invertible.

If $D$ is a diagonal matrix like

$D = \begin{pmatrix} d_{11} & 0 & 0 & 0 \\ 0 & d_{22} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & d_{nn} \end{pmatrix}$ then $D^{-1} = \begin{pmatrix} 1/d_{11} & 0 & 0 & 0 \\ 0 & 1/d_{22} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & 1/d_{nn} \end{pmatrix}$

A real matrix $A$ is orthogonal if $A^t = A^{-1}$. Orthogonal matrices constitute an important class of matrices in the physical sciences.

The inverse of an orthogonal matrix is orthogonal; the product of two orthogonal matrices is orthogonal.

Orthogonal matrices preserve inner products, as can be seen from this: suppose $Q$ is orthogonal. Let $|x'\rangle = Q|x\rangle$ and $|y'\rangle = Q|y\rangle$. Then $\langle y'| = \langle y|Q^t$ and

$\langle y' | x' \rangle = \langle y | Q^t Q | x \rangle = \langle y | x \rangle$

Also, an orthogonal matrix preserves the norm of a vector, i.e. $\|Qx\| = \|x\|$. This is useful when we want to solve an equation like $A|x\rangle = |b\rangle$ by the equivalent problem $QA|x\rangle = Q|b\rangle$.

Recall our discussion of inner product spaces. The rows (or columns) of an orthogonal matrix form an orthonormal basis of $\mathbb{R}^n$: the inner product of a row (or column) with itself is 1, and with any other row (or column) it is 0.

e.g.

$Q = \begin{pmatrix} 0.4507 & -0.5172 & 0.0236 & -0.4721 & 0.5531 \\ 0.4465 & 0.2753 & -0.3655 & 0.6273 & 0.4447 \\ 0.5096 & 0.6817 & 0.0726 & -0.4827 & -0.1930 \\ 0.3984 & -0.3959 & -0.5234 & 0.0386 & -0.6396 \\ 0.4231 & -0.1878 & 0.7659 & 0.3860 & -0.2236 \end{pmatrix}$

Take any row or column of this matrix and see that its norm is 1.0. The dot product of any two distinct rows or two distinct columns must be 0.0.
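A quick numerical check of these properties (a sketch, not in the original notes) using an orthogonal matrix generated from a QR factorization of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # Q from a QR factorization is orthogonal

print(np.allclose(Q.T @ Q, np.eye(5)))             # Q^t Q = I
print(np.linalg.norm(Q[:, 0]), np.linalg.norm(Q[2, :]))   # each column/row has norm 1

x = rng.standard_normal(5)
print(np.linalg.norm(Q @ x), np.linalg.norm(x))    # ||Qx|| = ||x||: the norm is preserved
```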

4.1 How do we determine $A^{-1}$ given $A$? Consider a simple case.

a. $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$; we want its inverse $B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$ such that $AB = I$.

Now $\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ implies that

$a_{11} b_{11} + a_{12} b_{21} = 1$
$a_{21} b_{12} + a_{22} b_{22} = 1$
$a_{11} b_{12} + a_{12} b_{22} = 0$
$a_{21} b_{11} + a_{22} b_{21} = 0$

These equations give us the following. Let $|A| = a_{11} a_{22} - a_{12} a_{21}$. Then

$b_{11} = \dfrac{a_{22}}{|A|}, \quad b_{12} = -\dfrac{a_{12}}{|A|}, \quad b_{21} = -\dfrac{a_{21}}{|A|}, \quad b_{22} = \dfrac{a_{11}}{|A|}$

b. Suppose we would like to obtain $A^{-1}$ of the matrix $A$ where $A$ is the following:

$A = \begin{pmatrix} 2 & -1 & 1 \\ 0 & 1 & -1 \\ -1 & -1 & 2 \end{pmatrix}$. Assume $A^{-1} = \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix}$, so that

$AA^{-1} = \begin{pmatrix} 2 & -1 & 1 \\ 0 & 1 & -1 \\ -1 & -1 & 2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix} = I_3$

Then we have nine equations to solve for the nine unknowns. (Tedious, but it can be done with patience, etc.) Eventually, one gets

$A^{-1} = \begin{pmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 2.5 & 1 \\ 0.5 & 1.5 & 1 \end{pmatrix}$

c. There are other, computationally more feasible ways to solve the matrix inversion problem.
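As a sanity check (a sketch, not in the original notes; the sign pattern of $A$ is the reconstruction used above), NumPy reproduces this inverse directly:

```python
import numpy as np

A = np.array([[2, -1, 1],
              [0, 1, -1],
              [-1, -1, 2]], dtype=float)

A_inv = np.linalg.inv(A)
print(A_inv)                               # [[0.5, 0.5, 0], [0.5, 2.5, 1], [0.5, 1.5, 1]]
print(np.allclose(A @ A_inv, np.eye(3)))   # A A^{-1} = I
```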
Systems of Linear Equations

1. Solve $\sum_{i=1}^{n} a_i x_i = b$, a linear equation.

How do we solve a set of them?

$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \ldots + a_{1n} x_n = b_1$
$a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \ldots + a_{2n} x_n = b_2$
$\ldots$
$a_{m1} x_1 + a_{m2} x_2 + a_{m3} x_3 + \ldots + a_{mn} x_n = b_m$

This is an $m \times n$ system. If $m < n$, we will not have a unique solution. If $m > n$, we may run into an inconsistent system. If $m = n$, we may get a unique set of solutions.

Some observations:

Suppose the equation is $L_0 : 2x + 3y = 5$.

Nothing is changed if we multiply or divide this equation by a (nonzero) constant. Therefore, $k \cdot L_0$ can be used to replace $L_0$. We denote this as $kL_0 \to L_0$.

Given two linear equations $L_i$ and $L_j$, we can form a combination of the two to replace, say, $L_j$. Thus, $kL_i + L_j \to L_j$.

Interchanging the positions of two linear equations does not change anything: $L_i \leftrightarrow L_j$.

These operations that leave the system of equations invariant are called elementary operations.

2. Consider the solution of linear equations.

a. One unknown, one equation. If we have $ax = b$, we have the solution $x = \dfrac{b}{a}$, if $a \neq 0$.

b. Two unknowns, two equations. Assume

$2x + 3y = 5$
$3x - y = 2$

Here we have $x = 1$, $y = 1$.

If one equation is a multiple of another equation, we have infinitely many solutions, e.g.

$2x + 3y = 5$
$4x + 6y = 10$

Geometrically, the two equations describe the same line.

The equations
$a_{11} x_1 + a_{12} x_2 = b_1$
$a_{21} x_1 + a_{22} x_2 = b_2$
have a unique solution if the determinant

$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21} \neq 0$

The determinant of a $3\times 3$ matrix can be computed in a similar manner. For instance,

$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$

In general, the determinant $|A|$ can be expressed as the expansion

$|A| = \sum_{j=1}^{n} a_{ij} A_{ij} = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij}$

where $M_{ij}$ is a minor and $A_{ij}$ is a cofactor. The minor $M_{ij}$ is the determinant of the submatrix obtained from the original matrix by deleting the $i$th row and the $j$th column.
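A direct (and deliberately naive) sketch of this cofactor expansion in Python, not from the notes; it is exponentially expensive and only meant to mirror the formula above:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor M_{1j}: delete row 0 and column j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = [[2, 1, 1], [1, 2, 3], [1, -1, 1]]
print(det_cofactor(A), np.linalg.det(A))   # the two values should agree
```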

3. Systems in triangular and echelon form:

A system is in triangular form if it can be expressed, via appropriate elementary operations, like this:

$2x_1 + x_2 + 3x_3 + 2x_4 = 4$
$5x_2 + 6x_3 + x_4 = 0$
$2x_3 + 3x_4 = 1$
$2x_4 = 2$

This is easy to solve.

A set of equations is in echelon form if (a) no equation is degenerate, and (b) the leading term of each equation begins to the right of the leading term of the previous equation. The leading unknowns are called pivot elements.

Theorem. Suppose a set of $m$ equations in echelon form has $n$ unknowns. If $m = n$, we have a unique solution. If $m < n$, we can assign arbitrary values (parameters) to $(n - m)$ of the variables and solve for the remaining $m$ variables in terms of these parameters.

See the parametric form / free-variable form; it boils down to the same thing.

Gaussian elimination method:

Part 1 (Forward elimination): Using elementary operations, reduce the system of equations to either a triangular form or an echelon form.

Part 2 (Back substitution): Step-by-step back substitution now yields the solution of the system. (An example follows below.)

An echelon matrix must have all of its zero rows at the bottom, and it must obey the basic echelon format as far as the positions of the pivots are concerned.

This is an echelon matrix (the pivots are the leading nonzero entries 2, 3 and 4):

$\begin{pmatrix} 0 & 2 & 2 & 1 \\ 0 & 0 & 3 & 8 \\ 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}$

The rank of a matrix is the number of pivots in its echelon form.

A matrix $A$ is in row-canonical form if (a) it is an echelon matrix, (b) each pivot (leading nonzero entry) is 1, and (c) each pivot is the only nonzero entry in its column.

e.g. $\begin{pmatrix} 0 & 1 & 4 & 0 & 0 & 4 \\ 0 & 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & 3 \end{pmatrix}$ is in row-canonical form.

Row equivalence. A matrix $A$ is row equivalent to $B$ ($A \sim B$) if $B$ can be obtained from $A$ by elementary row operations.

To set up Gaussian elimination, work on the augmented matrix $[A \,|\, b]$ and reduce $A$ to a triangular matrix (or to an echelon form) by elementary operations. Then solve by back substitution.
Use Gaussian elimination to solve

$L_1: \ 2x_1 + x_2 + x_3 = 4$
$L_2: \ x_1 + 2x_2 + 3x_3 = 6$
$L_3: \ x_1 - x_2 + x_3 = 1$

The augmented matrix is reduced by elementary row operations ($2L_2 - L_1 \to L_2$, $2L_3 - L_1 \to L_3$, then $L_3 + L_2 \to L_3$):

$\begin{pmatrix} 2 & 1 & 1 & | & 4 \\ 1 & 2 & 3 & | & 6 \\ 1 & -1 & 1 & | & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 2 & 1 & 1 & | & 4 \\ 0 & 3 & 5 & | & 8 \\ 0 & 0 & 6 & | & 6 \end{pmatrix}$

Back substitution now leads to $x_3 = 1$, $x_2 = 1$, $x_1 = 1$.
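A compact sketch of forward elimination followed by back substitution (illustrative only, with no pivoting yet; not code from the notes), applied to the system above:

```python
import numpy as np

def gauss_solve(A, b):
    """Naive Gaussian elimination (no pivoting) + back substitution."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    # Forward elimination: zero out the entries below the diagonal
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = [[2, 1, 1], [1, 2, 3], [1, -1, 1]]
b = [4, 6, 1]
print(gauss_solve(A, b))           # [1. 1. 1.]
```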

Note that the cost of Gaussian elimination is $O(n^3)$. This becomes prohibitively large for large $n$.

Secondly, we have to be careful about error. Consider the following:

$0.003\, x_1 + 59.14\, x_2 = 59.17$
$5.291\, x_1 - 6.130\, x_2 = 46.78$

with the exact solution $x_1 = 10$, $x_2 = 1$.

Numerical elimination using 4-digit arithmetic gives us

$0.003\, x_1 + 59.14\, x_2 = 59.17$
$-104300\, x_2 = -104400$

This gives $x_2 \approx 1.001$ (tolerable), and then $x_1 \approx -10.00$ (outrageous).

We have to come up with a pivoting strategy. At step $k$, choose as pivot the element in column $k$ such that $|a_{pk}| = \max_{k \le i \le n} |a_{ik}|$. This is called "partial pivoting".

For instance, in our case, $|a_{21}| = \max_{1 \le i \le 2} |a_{i1}|$. We now swap the rows so that the row with this pivot becomes the topmost row. Thus,

$5.291\, x_1 - 6.130\, x_2 = 46.78$
$0.003\, x_1 + 59.14\, x_2 = 59.17$

With this arrangement, the elimination leads to

$5.291\, x_1 - 6.130\, x_2 = 46.78$
$59.14\, x_2 = 59.14$

This now gives the correct solution $x_2 = 1.000$, $x_1 = 10.00$.
Failure of partial pivoting. If we scale equation 1 by $10^4$, we get

$30.00\, x_1 + 591400\, x_2 = 591700$
$5.291\, x_1 - 6.130\, x_2 = 46.78$

Partial pivoting will now choose row 1 as the pivot row, which again leads to $x_2 \approx 1.001$ and $x_1 \approx -10.00$.

Remedy: Scaled partial pivoting.

Step 1: compute for each row a scale factor $s_i = \max_{j=1..n} |a_{ij}|$.

Step 2: choose the pivot row $k$ such that $\dfrac{|a_{k1}|}{s_k} = \max_{j=1..n} \dfrac{|a_{j1}|}{s_j}$.

In our case,

$s_1 = \max\{|30.00|, |591400|\} = 591400$
$s_2 = \max\{|5.291|, |6.130|\} = 6.130$

$\dfrac{|a_{11}|}{s_1} = \dfrac{30.00}{591400} = 0.507 \times 10^{-4}, \qquad \dfrac{|a_{21}|}{s_2} = \dfrac{5.291}{6.130} = 0.8631$

Therefore we choose $a_{21}$ as the pivot, swap row 1 and row 2, and continue with the elimination.

Note. (a) This can get expensive if you recompute the $s_i$ at each step, so usually they are not recomputed. (b) Full pivoting, searching all $n \times n$ elements for $\max |a_{ij}|/s_i$, is expensive.
An example of scaled partial pivoting, where naive Gaussian elimination may fail. (Cheney and Kincaid, "Numerical Mathematics and Computing", p. 234, Brooks/Cole Publishing.)

The augmented matrix (the last column is the right-hand side augmented to the original matrix):

$\begin{pmatrix} 3 & -13 & 9 & 3 & | & -19 \\ -6 & 4 & 1 & -18 & | & -34 \\ 6 & -2 & 2 & 4 & | & 16 \\ 12 & -8 & 6 & 10 & | & 26 \end{pmatrix}$

The index vector and the scale vector at the first pivoting step are

$l = (1\ 2\ 3\ 4), \qquad s = (13\ 18\ 6\ 12), \qquad s_i = \max_j |a_{ij}|$

Compute now $\left\{ \dfrac{|a_{l_i,1}|}{s_{l_i}} \right\}_{i=1..4} = \left\{ \dfrac{3}{13},\ \dfrac{6}{18},\ \dfrac{6}{6},\ \dfrac{12}{12} \right\}$

Choose the index at which the maximum is first attained. In our case it is 3, so the first pivot is the first element of row 3. Exchange $l_1 \leftrightarrow l_3$ in the vector $l$. After the eliminations $R_1 \to R_1 - \tfrac{1}{2}R_3$, $R_2 \to R_2 + R_3$, $R_4 \to R_4 - 2R_3$, the augmented matrix appears as

$A = \begin{pmatrix} 0 & -12 & 8 & 1 & | & -27 \\ 0 & 2 & 3 & -14 & | & -18 \\ 6 & -2 & 2 & 4 & | & 16 \\ 0 & -4 & 2 & 2 & | & -6 \end{pmatrix}$
Now we select pivot number 2, given that the vectors are now

$l = (3\ 2\ 1\ 4)$ and $s = (13\ 18\ 6\ 12)$

Obtain the pivot row index from

$\left\{ \dfrac{|a_{l_i,2}|}{s_{l_i}} \right\}_{i=2,3,4} = \left\{ \dfrac{2}{18},\ \dfrac{12}{13},\ \dfrac{4}{12} \right\}$

The largest value corresponds to index 3 (which yields $\tfrac{12}{13}$), so we exchange the 2nd and 3rd elements of the vector $l$, obtaining $l = (3\ 1\ 2\ 4)$. The pivot in the 2nd step is therefore in row 1. In the augmented matrix, the entries of rows 2 and 4 under column 2 now become zero. This is done by $R_2 \to R_2 + \tfrac{1}{6}R_1$ and $R_4 \to R_4 - \tfrac{1}{3}R_1$:

$A = \begin{pmatrix} 0 & -12 & 8 & 1 & | & -27 \\ 0 & 0 & 13/3 & -83/6 & | & -45/2 \\ 6 & -2 & 2 & 4 & | & 16 \\ 0 & 0 & -2/3 & 5/3 & | & 3 \end{pmatrix}$

We now select the final pivot, pivot 3. The vectors are $l = (3\ 1\ 2\ 4)$ and $s = (13\ 18\ 6\ 12)$.

We obtain the pivot row index from

$\left\{ \dfrac{|a_{l_i,3}|}{s_{l_i}} \right\}_{i=3,4} = \left\{ \dfrac{13/3}{18},\ \dfrac{2/3}{12} \right\}$

Since the first number is bigger and corresponds to position 3, the index vector is left as before, and the pivot equation is $l_3 = 2$, i.e. our pivot is in the 2nd row. After eliminating column 3 from row 4 ($R_4 \to R_4 + \tfrac{2}{13}R_2$), the augmented matrix takes the following shape:

$A = \begin{pmatrix} 0 & -12 & 8 & 1 & | & -27 \\ 0 & 0 & 13/3 & -83/6 & | & -45/2 \\ 6 & -2 & 2 & 4 & | & 16 \\ 0 & 0 & 0 & -6/13 & | & -6/13 \end{pmatrix}$

The rest is easy. Back substitution gives the solution vector $x^t = (3\ \ 1\ \ -2\ \ 1)$.
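A sketch of Gaussian elimination with scaled partial pivoting (my own illustrative implementation, not the book's code), run on the Cheney-Kincaid system above:

```python
import numpy as np

def solve_scaled_pivoting(A, b):
    """Gaussian elimination with scaled partial pivoting, via an index vector l."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    l = np.arange(n)                          # index (permutation) vector
    s = np.max(np.abs(A), axis=1)             # scale factors, computed once
    for k in range(n - 1):
        ratios = np.abs(A[l[k:], k]) / s[l[k:]]
        p = k + np.argmax(ratios)             # position of the best pivot row within l
        l[k], l[p] = l[p], l[k]               # swap indices, not rows
        for i in l[k + 1:]:
            m = A[i, k] / A[l[k], k]
            A[i, k:] -= m * A[l[k], k:]
            b[i] -= m * b[l[k]]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):            # back substitution in pivot order
        x[k] = (b[l[k]] - A[l[k], k + 1:] @ x[k + 1:]) / A[l[k], k]
    return x

A = [[3, -13, 9, 3], [-6, 4, 1, -18], [6, -2, 2, 4], [12, -8, 6, 10]]
b = [-19, -34, 16, 26]
print(solve_scaled_pivoting(A, b))            # [ 3.  1. -2.  1.]
```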

More on the numerical computation of the solution.

Error computation in an iterative framework. Let $x^{(k)}$ be the solution vector of the equation $Ax = b$ at the $k$th iteration. The difference between this and the actual solution is the error vector $e^{(k)} = x - x^{(k)}$, and the residual at the $k$th iteration is

$r^{(k)} = b - Ax^{(k)} = Ax - Ax^{(k)} = Ae^{(k)}$

Therefore, if we solve this error equation $Ae^{(k)} = r^{(k)}$ in double precision, the better solution at the next iteration level comes out to be $x^{(k+1)} = x^{(k)} + e^{(k)}$. We continue with the process until convergence is reached.

Convergence criterion: stop when $\|x^{(l+1)} - x^{(l)}\| < \varepsilon$ for some tolerance $\varepsilon$.
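A minimal sketch of this iterative refinement loop (illustrative, not from the notes); in practice the factorization of $A$ would be reused and the residual computed in higher precision:

```python
import numpy as np

def iterative_refinement(A, b, tol=1e-12, max_iter=10):
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    x = np.linalg.solve(A, b)               # initial (possibly inaccurate) solution
    for _ in range(max_iter):
        r = b - A @ x                       # residual r^(k) = b - A x^(k)
        e = np.linalg.solve(A, r)           # error equation A e^(k) = r^(k)
        x = x + e                           # improved solution x^(k+1)
        if np.linalg.norm(e) < tol:         # stop when the correction is small
            break
    return x

A = [[0.003, 59.14], [5.291, -6.130]]
b = [59.17, 46.78]
print(iterative_refinement(A, b))           # close to the exact solution (10, 1)
```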

4. General iterative techniques.

Gauss-Jacobi's method. Ideal for large sparse matrices. To solve $Ax = b$, convert it into an iterative form $x = Tx + c$ and solve it iteratively.

For instance, suppose the system is

$2x + 3y - z = 8$
$x + y + 2z = 1$
$x + y - 3z = 6$

Its exact solution is the vector $x = (2\ \ 1\ \ -1)^t$. The system can be expressed in an iterative format as

$x^{(k)} = 4 - 1.5\, y^{(k-1)} + 0.5\, z^{(k-1)}$
$y^{(k)} = 1 - x^{(k-1)} - 2\, z^{(k-1)}$
$z^{(k)} = -2 + 0.3333\, x^{(k-1)} + 0.3333\, y^{(k-1)}$

Start with a trial solution $x^{(0)} = (1.0\ \ 0.5\ \ 0.5)^t$ and obtain the next-level solution.

This generalizes to the iterative scheme $\{x^{(k)}\}$, starting from $x^{(0)}$:

$x_i^{(k+1)} = \dfrac{1}{a_{ii}} \left( b_i - \sum_{j \neq i} a_{ij} x_j^{(k)} \right)$

We can write a matrix $A$ as

$A = D - L - U$

where $D$ is the diagonal part, $-L$ is the strictly lower triangular part and $-U$ is the strictly upper triangular part. e.g.

$\begin{pmatrix} 2 & 5 & 3 \\ -1 & 6 & 4 \\ 3 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ -3 & -2 & 0 \end{pmatrix} - \begin{pmatrix} 0 & -5 & -3 \\ 0 & 0 & -4 \\ 0 & 0 & 0 \end{pmatrix}$

Given this, $Ax = b$ becomes

$(D - L - U)x = b$, or $Dx = (L + U)x + b$, or $x = D^{-1}(L + U)x + D^{-1}b$

This is Gauss-Jacobi. Given the diagonal matrix $D$, it is easy to compute $D^{-1}$, and multiplying matrices is easier than taking a general inverse.

Comparing this with $x = Tx + c$, we see that the Gauss-Jacobi approach amounts to using

$T = D^{-1}(L + U)$ and $c = D^{-1} b$

e.g. The equation matrix is

$A = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 3 & 0 \\ 1 & 0 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ 2 \\ -3 \end{pmatrix}$

With $A = D - L - U$, here

$D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad L + U = \begin{pmatrix} 0 & -1 & -1 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}$

Now $D^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.3333 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}$

Therefore

$D^{-1}(L + U) = \begin{pmatrix} 0 & -1 & -1 \\ 0.3333 & 0 & 0 \\ -0.5 & 0 & 0 \end{pmatrix}$ and $D^{-1}b = \begin{pmatrix} 0 \\ 0.6666 \\ -1.5 \end{pmatrix}$

Therefore our iterative problem is

$x^{(k+1)} = \begin{pmatrix} 0 & -1 & -1 \\ 0.3333 & 0 & 0 \\ -0.5 & 0 & 0 \end{pmatrix} x^{(k)} + \begin{pmatrix} 0 \\ 0.6666 \\ -1.5 \end{pmatrix}$

We start with $x^{(0)} = (0\ \ 0\ \ 0)^t$. The iteration progresses through the following values (columns are $x$, $y$, $z$), converging to the solution $(1,\ 1,\ -2)$:

0         0.666667   -1.5
0.833333  0.944444   -1.91667
0.972222  0.990741   -1.98611
0.99537   0.998457   -1.99769
0.999228  0.999743   -1.99961
0.999871  0.999957   -1.99994
0.999979  0.999993   -1.99999
0.999996  0.999999   -2
0.999999  1          -2
1         1          -2
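A short sketch of the Jacobi iteration in matrix form (illustrative, not from the notes), applied to this example; it converges to the same limit $(1, 1, -2)$:

```python
import numpy as np

def jacobi(A, b, x0, iters=50):
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    D = np.diag(np.diag(A))               # diagonal part D
    LU = D - A                            # L + U, since A = D - L - U
    T = np.linalg.inv(D) @ LU             # iteration matrix T = D^{-1}(L + U)
    c = np.linalg.inv(D) @ b              # c = D^{-1} b
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = T @ x + c                     # x^(k+1) = T x^(k) + c
    return x

A = [[1, 1, 1], [-1, 3, 0], [1, 0, 2]]
b = [0, 2, -3]
print(jacobi(A, b, [0, 0, 0]))            # approaches (1, 1, -2)
```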
Gauss-Seidel's iteration method:

From the above, we can write

$(D - L)x = Ux + b$, or $x = (D - L)^{-1} U x + (D - L)^{-1} b$

Thus the iterative scheme is

$x^{(k+1)} = (D - L)^{-1} U x^{(k)} + (D - L)^{-1} b$

In computational form, the improved solution $x^{(k+1)}$ is obtained componentwise as

$x_i^{(k+1)} = \dfrac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \right)$

In other words, those next-level components that are already available are used immediately as soon as they have been computed. In practice this often converges roughly twice as fast as Gauss-Jacobi.
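A componentwise sketch of Gauss-Seidel (illustrative, not from the notes), reusing each newly computed component immediately:

```python
import numpy as np

def gauss_seidel(A, b, x0, iters=25):
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    x = np.array(x0, dtype=float)
    n = len(b)
    for _ in range(iters):
        for i in range(n):
            # the new values x[0..i-1] are already updated and are used right away
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

A = [[1, 1, 1], [-1, 3, 0], [1, 0, 2]]
b = [0, 2, -3]
print(gauss_seidel(A, b, [0, 0, 0]))      # approaches (1, 1, -2) faster than Jacobi
```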

Convergence analysis (as for successive substitution). Define the error

$e^{(k)} = x - x^{(k)} = (Tx + c) - (Tx^{(k-1)} + c) = T(x - x^{(k-1)}) = T e^{(k-1)}$

Thus the magnitude of the error satisfies

$\|e^{(k)}\| = \|T e^{(k-1)}\| \le \|T\|\, \|e^{(k-1)}\|$

and this will shrink only if $\|T\| < 1$.

We will return to more computational aspects after we develop the topic further.

A linear transformation (= linear mapping) $f: V \to W$ transforms a linear combination of vectors in $V$ into a linear combination of vectors in $W$. Note that $V$ and $W$ need not have the same dimension.

Linearity means that if $v_1, v_2 \in V$ then

a. $f(v_1 + v_2) = f(v_1) + f(v_2)$, and
b. $f(\alpha v_1) = \alpha f(v_1)$, where $\alpha$ is a scalar,

with $f(v) \in W$.

Therefore, a generalization of these two rules is the following: if $v_1, v_2 \in V$ then

$f(\alpha v_1 + \beta v_2) = \alpha f(v_1) + \beta f(v_2) = \alpha w_1 + \beta w_2$

where $w_1, w_2 \in W$.

example:

$V$ is the space of $2\times 2$ matrices and $W$ is the space of $2\times 1$ matrices. $f: \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} a \\ b \end{pmatrix}$ is a linear mapping, or a linear transformation.

Verification:

$f\left( \begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} e & f \\ g & h \end{pmatrix} \right) = f \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix} = \begin{pmatrix} a+e \\ b+f \end{pmatrix}$

Also, $f\begin{pmatrix} a & b \\ c & d \end{pmatrix} + f\begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix} = \begin{pmatrix} a+e \\ b+f \end{pmatrix}$

Similarly, you can show that $f\left( \alpha \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right) = \alpha f \begin{pmatrix} a & b \\ c & d \end{pmatrix}$.

The following properties come naturally:

a. the zero vector in $V$ maps to the zero vector in $W$
b. $f(-x) = -f(x)$
c. for any set of scalars $\{\alpha_i\}$ and vectors $v_i \in V$,
$f(\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_n v_n) = \alpha_1 f(v_1) + \ldots + \alpha_n f(v_n)$
d. let $Y \subseteq V$ be a subspace. Then $f: Y \to Z \subseteq W$; a subspace maps to a subspace.

example. Suppose $f: \mathbb{R}^3 \to \mathbb{R}^3$ is defined by

$f(a, b, c) = (a+b,\ b+c,\ c+a)$

You can show this to be a linear mapping. One subspace of $\mathbb{R}^3$ is the space $Y$ spanned by all vectors of the form $(0, b, c)$. Now, for any $(0, a, b) \in Y$, $f(0, a, b) = (a,\ a+b,\ b)$.

All such vectors $(a,\ a+b,\ b)$ form a subspace, and the restriction is again linear:

$(a,\ a+b,\ b) + (c,\ c+d,\ d) = (a+c,\ a+b+c+d,\ b+d)$
and $\alpha (a,\ a+b,\ b) = (\alpha a,\ \alpha a + \alpha b,\ \alpha b)$

Also, the composition of linear maps is linear:

$(f \circ g)(v_1 + \alpha v_2) = f(g(v_1) + \alpha g(v_2)) = f(g(v_1)) + \alpha f(g(v_2))$

Next consider the following situation of mapping from one dimension to another.

Consider a linear mapping from $\mathbb{R}^3$ to the $2\times 2$ matrices:

$f(a, b, c) = \begin{pmatrix} a & a+b \\ b & a+c \end{pmatrix}$

Then the basis vectors get transformed into the following matrices:

$f(1,0,0) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad f(0,1,0) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad f(0,0,1) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$

From these we can get the general rule for any vector $(a, b, c)$:

$f(a,b,c) = a f(1,0,0) + b f(0,1,0) + c f(0,0,1) = a \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + b \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + c \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & a+b \\ b & a+c \end{pmatrix}$

That is, knowing the images of the basis vectors determines the image of every vector in the space. This shows how a linear mapping $f: \mathbb{R}^n \to \mathbb{R}^m$ can be specified.

Suppose the vector space $V$ is spanned by an orthonormal basis set $\{v_i\}$ and $W$ is spanned by an orthonormal basis set $\{w_j\}$.

Any vectors $v \in V$ and $w \in W$ can be expanded in their respective bases as

$v = \sum_{i=1}^{n} a_i v_i$ and $w = \sum_{j=1}^{m} b_j w_j$

Now if $f$ is a linear transformation $f: V \to W$, then

$f(v) = f\left( \sum_{i=1}^{n} a_i v_i \right) = \sum_{i=1}^{n} a_i f(v_i) = \sum_{j=1}^{m} b_j w_j$

This is possible only when each $f(v_i) = \sum_{j=1}^{m} c_{ji} w_j$.

Check: $\sum_{i=1}^{n} a_i f(v_i) = \sum_{i=1}^{n} a_i \sum_{j=1}^{m} c_{ji} w_j$. But $\sum_{i=1}^{n} a_i c_{ji}$ is some $b_j$. Therefore $f(v) = \sum_{j=1}^{m} b_j w_j$.

The crucial point is that in a linear transformation the basis vectors transform as $f(v_i) = \sum_{j=1}^{m} c_{ji} w_j$.

Obviously, we can set up the converse situation too.

Proposition: If $\phi: \mathbb{R}^n \to \mathbb{R}^m$ is a linear mapping, then there exists a unique $m \times n$ matrix $A$ such that $\phi(x) = Ax$ for all $x \in \mathbb{R}^n$.

From the matrix framework:

Example. Let $V$ be a polynomial vector space. The basis vectors in this space are

$p_1(x) = 1, \quad p_2(x) = x, \quad p_3(x) = x^2, \quad p_4(x) = x^3$

The operator $f = \dfrac{d}{dx}$ transforms vectors as $f: V \to V$:

$\dfrac{d}{dx} p_1(x) = 0, \quad \dfrac{d}{dx} p_2(x) = 1 = p_1(x), \quad \dfrac{d}{dx} p_3(x) = 2x = 2 p_2(x), \quad \dfrac{d}{dx} p_4(x) = 3x^2 = 3 p_3(x)$

So, in this basis, the matrix of $f$ is

$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}$

Question: What is the matrix of the transformation $f$ that maps $\mathbb{R}^3$ to $\mathbb{R}^4$ as

$f: (a, b, c) \mapsto (a,\ a+b,\ b+c,\ a+b+c)$?

Consider the basis vectors in $\mathbb{R}^3$. How are they transformed?

$f(1,0,0) = (1,1,0,1) = 1(1,0,0,0) + 1(0,1,0,0) + 1(0,0,0,1)$
$f(0,1,0) = (0,1,1,1) = 1(0,1,0,0) + 1(0,0,1,0) + 1(0,0,0,1)$
$f(0,0,1) = (0,0,1,1) = 1(0,0,1,0) + 1(0,0,0,1)$

The transformation matrix of $f$ is the transpose of the coefficient matrix

$\begin{pmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$

Thus the vector $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ maps into $\begin{pmatrix} a \\ a+b \\ b+c \\ a+b+c \end{pmatrix}$ as

$\begin{pmatrix} a \\ a+b \\ b+c \\ a+b+c \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix}$
Example. The mapping $f: (a, b) \mapsto (a + 3b,\ 5b)$ is the matrix transformation

$\begin{pmatrix} a + 3b \\ 5b \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$

Show that this is linear, i.e. show that $f(a, b) + f(c, d) = f(a+c,\ b+d)$ and $f(\alpha a, \alpha b) = \alpha f(a, b)$.
http://turnbull.mcs.st-and.ac.uk/~sophieh/LinearAlg/SHlintran.pdf

More on mappings.

A mapping $f: U \to V$ is one-to-one (1-1, or injective) if different elements of $U$ get mapped to different elements of $V$.

A mapping $f: U \to V$ is onto (or surjective) if every element $v \in V$ is the image of one or more elements of $U$.

A mapping $f: U \to V$ is bijective if it is both onto and one-to-one at the same time. This amounts to an invertible function.

Example. Suppose a neural circuit outputs an index for a pattern $P_i \in P$. The neural network is trained to output something distinct for each pattern; such an NN machine is, in that sense, a one-to-one pattern recognizer.

A hashing function, where many addresses are hashed onto the same hashed address, is an example of an onto mapping. A cluster is another example.

An invertible mapping is a bijection. Every person has a unique fingerprint, and behind every fingerprint there is a unique person.

Identity mapping.

Example. $f: \mathbb{R} \to \mathbb{R}$, $f(x) = 2e^x$ is a one-to-one mapping: for each $x \in \mathbb{R}$ there is one $y = f(x)$. But it is not onto: not every $y \in \mathbb{R}$ is attained. $g: \mathbb{R} \to \mathbb{R}$ with $g(x) = 2x + 3$ is a bijective mapping: for every $x \in \mathbb{R}$ there is a unique $y \in \mathbb{R}$, and for every $y \in \mathbb{R}$ there is a unique $x \in \mathbb{R}$.

$h: \mathbb{R} \to \mathbb{R}$ where $h(x) = x + \mathrm{rand}()$ is neither one-to-one nor onto.

Vector space isomorphism. Two vector spaces $U$ and $V$ are isomorphic if (a) they are defined over a common field $K$, and (b) there is a bijection that maps $U$ onto $V$ (and naturally, from $V$ back to $U$). An isomorphism is a bijective linear mapping, i.e. it is both one-to-one and onto. Every isomorphism $f: \mathbb{R}^m \to \mathbb{R}^m$ admits an inverse $f^{-1}: \mathbb{R}^m \to \mathbb{R}^m$.

Kernel and image of a linear mapping.

Given $f: U \to V$, the kernel is the subset of $U$ that maps to the zero vector:

$\mathrm{Ker}(f) = \{ u \in U : f(u) = 0 \}$

The image (or range) of the mapping $f$ is the set of points in $V$ to which points of $U$ are mapped: $\mathrm{Im}(f) \subseteq V$.

Proposition. If $\{u_i, i = 1, \ldots, n\}$ spans a vector space $U$ as its basis and there is a linear mapping $f: U \to V$, then $\{v_i = f(u_i)\}$ spans $\mathrm{Im}\, f$.

Kernel and image of a matrix mapping.

Suppose a linear mapping $f: \mathbb{R}^3 \to \mathbb{R}^2$ is equivalent to the matrix

$A = \begin{pmatrix} 2 & -3 & 4 \\ 1 & 2 & 8 \end{pmatrix}$, so a vector $x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ maps to the vector

$y = \begin{pmatrix} 2 & -3 & 4 \\ 1 & 2 & 8 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2x_1 - 3x_2 + 4x_3 \\ x_1 + 2x_2 + 8x_3 \end{pmatrix}$

Fair enough! Therefore the standard orthogonal basis $\{e_1, e_2, e_3\}$ of $\mathbb{R}^3$ is mapped to the following vectors in $\mathbb{R}^2$:

$Ae_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \qquad Ae_2 = \begin{pmatrix} -3 \\ 2 \end{pmatrix}, \qquad Ae_3 = \begin{pmatrix} 4 \\ 8 \end{pmatrix}$

Indeed, these are just the columns of the transformation matrix $A$.

Thus, if $A_{m\times n}$ is a transformation matrix giving $A: K^n \to K^m$ (where $K$ is either the real or the complex field), then the image of $A$, $\mathrm{Im}(A)$, is spanned by the columns of $A$.

Rank and nullity of a linear mapping.

The rank of a matrix is a measure of its degree of independence.

Suppose $A_{m\times n}$ is an $m \times n$ matrix $\{a_{ij}\}$. The rowspace of $A$ is the subspace of $\mathbb{R}^n$ spanned by the rows $(a_{11}, a_{12}, \ldots, a_{1n})$, $(a_{21}, a_{22}, \ldots, a_{2n})$, ..., $(a_{m1}, a_{m2}, \ldots, a_{mn})$. We call this space rowspace(A) and its dimension the row rank of $A$.

Suppose $A = \begin{pmatrix} 2 & 3 & 4 \\ 1 & -2 & 3 \\ 0 & -7 & 2 \end{pmatrix}$. The rowspace of $A$ is the subspace of $\mathbb{R}^3$ spanned by the vectors

$u_1 = (2,\ 3,\ 4), \quad u_2 = (1,\ -2,\ 3), \quad u_3 = (0,\ -7,\ 2)$

but $u_3 = 2u_2 - u_1$. Therefore

$\mathrm{rowspace}(A) = \mathrm{span}\{u_1,\ u_2,\ 2u_2 - u_1\} = \mathrm{span}\{u_1, u_2\}$

Since the vectors $u_1$ and $u_2$ are linearly independent (check this!), they form a basis of rowspace(A), so the row rank of $A$ is the dimension of this space: $\dim(\mathrm{rowspace}(A)) = 2$.

Similarly for the column rank of a matrix: the column space of $A$ is the span of the columns of $A$, denoted colspace(A), and its dimension is the column rank of the matrix. We will see that

row rank(A) = column rank(A)

Proposition: Consider an $A_{m\times n}$ matrix. Suppose $P_{m\times m}$ and $Q_{n\times n}$ are invertible matrices. Then

row rank(A) = row rank(PA) = row rank(AQ)

and

column rank(A) = column rank(PA) = column rank(AQ)

For $1 \le i \le m$ let $u_i = (a_{i1}, a_{i2}, \ldots, a_{in})$ be the $i$th row of $A$. Thus row rank(A) $= \dim(U)$ where $U = \langle u_1, u_2, \ldots, u_m \rangle$.

Now suppose $PA = B = (b_{ij})$ and let $v_i$ be the $i$th row of $B$, i.e. $v_i = (b_{i1}, b_{i2}, \ldots, b_{in})$. Then row rank(PA) $= \dim(V)$, where $V = \langle v_1, v_2, \ldots, v_m \rangle$.

Now $b_{ij} = \sum_k p_{ik} a_{kj}$. Therefore

$v_i = \left( \sum_k p_{ik} a_{k1},\ \sum_k p_{ik} a_{k2},\ \ldots,\ \sum_k p_{ik} a_{kn} \right) = \sum_k p_{ik} (a_{k1}, a_{k2}, \ldots, a_{kn}) = \sum_k p_{ik} u_k$

Thus each vector $v_i$ is a linear combination of the vectors $u_1, u_2, \ldots, u_m$. Therefore $V \subseteq U$, i.e.

$\mathrm{rowspace}(PA) \subseteq \mathrm{rowspace}(A)$

This holds for any invertible matrix $P$ and any matrix $A$. Therefore

$\mathrm{rowspace}(A) = \mathrm{rowspace}(P^{-1}(PA)) \subseteq \mathrm{rowspace}(PA)$

That is, $\mathrm{rowspace}(PA) = \mathrm{rowspace}(A)$, and hence their dimensions must be the same. But row rank(A) $= \dim(\mathrm{rowspace}(A))$. Hence

row rank(A) = row rank(PA)

The other parts of the statement follow similarly.

Example. Find the rank of $A$ where

$A = \begin{pmatrix} 2 & -1 & 1 & 2 \\ -1 & 2 & 0 & -1 \\ 6 & 6 & 6 & 6 \end{pmatrix}$

Here $v_1 = (2, -1, 1, 2)$ and $v_2 = (-1, 2, 0, -1)$. Now $v_1 + v_2 = (1, 1, 1, 1)$, so $v_3 = 6(v_1 + v_2)$. Therefore $v_3 \in \langle v_1, v_2 \rangle$, and the matrix has rank two.

Revisiting $Ax = b$.

LU decomposition. Our problem is to solve the matrix equation $Ax = b$ where $A$ is a matrix and $x$ and $b$ are column vectors. We propose that the matrix $A$ is decomposable into a product of two triangular matrices $L$ and $U$:

$A = LU$

The decomposition is not unique. Suppose there is an invertible matrix $E$ such that $L' = LE$ and $U' = E^{-1}U$. Then

$L'U' = LU = A$

We may therefore choose, for instance, the diagonal elements of $L$ to be 1, i.e. $l_{ii} = 1$ for all $i$. Given this, the problem $Ax = b$ becomes $LUx = b$. This is equivalent to two equations:

Solve $Ly = b$ first, and then solve $Ux = y$.

Since $Ly = b$,

$L_{11} y_1 = b_1 \Rightarrow y_1 = b_1 / L_{11}$
$L_{21} y_1 + L_{22} y_2 = b_2 \Rightarrow y_2 = (b_2 - L_{21} y_1)/L_{22}$
$\ldots$

In this way we get the $y$ vector by forward substitution. After that we solve the equation $Ux = y$ by back substitution.
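A sketch of this two-stage solve using SciPy's LU factorization (illustrative, not code from the notes; `scipy.linalg.lu_factor` returns the factors together with the pivoting information discussed next):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[3, -13, 9, 3],
              [-6, 4, 1, -18],
              [6, -2, 2, 4],
              [12, -8, 6, 10]], dtype=float)
b = np.array([-19, -34, 16, 26], dtype=float)

lu, piv = lu_factor(A)        # PA = LU, with L unit lower triangular
x = lu_solve((lu, piv), b)    # forward substitution (Ly = Pb), then back substitution (Ux = y)
print(x)                      # [ 3.  1. -2.  1.]
```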

We may need to do pivoting if the raw matrix $A$ has weak diagonal elements compared to the rest of the elements in their rows. In that case we factorize not $A$ but $PA$. That is,

$PA = LU$, where $P$ is a permutation matrix.

Cholesky decomposition.

If a matrix $A$ is symmetric and positive definite, we can do even better than an LU factorization.

A matrix $A$ is positive definite if for any non-zero $x$ we get $x^* A x > 0$, where $x^* = \bar{x}^t$ is the conjugate transpose.

Cholesky decomposition says that in that case

$A = L L^t$ (which automatically makes $A$ symmetric).

This means $Ax = b$ becomes $L L^t x = b$, suggesting the approach: first solve $Ly = b$, and then $L^t x = y$ yields $x$.

In this case,

$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{12} & a_{22} & \ldots & a_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ a_{1n} & a_{2n} & \ldots & a_{nn} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 & \ldots & 0 \\ l_{21} & l_{22} & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ l_{n1} & l_{n2} & \ldots & l_{nn} \end{pmatrix} \begin{pmatrix} l_{11} & l_{21} & \ldots & l_{n1} \\ 0 & l_{22} & \ldots & l_{n2} \\ \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & \ldots & l_{nn} \end{pmatrix}$
This gives us the matrix coefficients:

$l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}$ and $l_{ji} = \left( a_{ji} - \sum_{k=1}^{i-1} l_{jk} l_{ik} \right) / l_{ii}$ for $j > i$
Matrix inversion problem.

Given a non-singular square matrix $A$, obtain a matrix $B$ such that $AB = BA = I$. The matrix $B$ is the inverse of $A$: $A^{-1} = B$. In our course we will consider only inverses of non-singular matrices over the real field (no complex matrices).

Matrix inversion through the adjoint.

Good for small matrices. Consider the matrix

$P = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$

Let the corresponding cofactor matrix be

$Q = \begin{pmatrix} A & B & C \\ D & E & F \\ G & H & I \end{pmatrix}$ where $A = \det\begin{pmatrix} e & f \\ h & i \end{pmatrix}$, $B = -\det\begin{pmatrix} d & f \\ g & i \end{pmatrix}$, $C = \det\begin{pmatrix} d & e \\ g & h \end{pmatrix}$ ... etc.

Then the inverse of $P$ is

$P^{-1} = \dfrac{1}{\det(P)}\, Q^t$

(the transpose of the cofactor matrix, i.e. the adjoint of $P$, divided by the determinant).

This scheme is impractical for large matrices. Consequently, we need easier approaches to deal with the problem.

We will deal with square matrices only, and we assume the determinant is non-zero.

Given a non-singular square matrix $A_{n\times n}$ we can obtain its inverse $A^{-1}_{n\times n}$. We will approach this problem from different angles.

a. Using elementary matrices:

An $n \times n$ matrix is an elementary matrix if it is obtained from the $n \times n$ identity matrix $I_n$ by a single row operation.

e.g. from $I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ we can generate a number of such matrices:

a. $E_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$  b. $E_2 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$  c. $E_3 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$  d. $E_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix}$

A general 3 3 matrix can be expressed as a single


column 3 rows:

 R1 
Let A   R2 
 
 R3 

What is the effect of operating by an elementary


matrix Ei on A?

For instance,
 R3 
E1 A   R2  R1 , R3 interchange.
 
 R1 

 R2  R3 
E2 A   R2  R1 is replaced by R2  R3 , R3 by R1
 
 R1 

 R1 
E3 A   2 R1  R2  Replace R2 by  2 R1  R2
 
 R3 

 R1 
and E4 A   R2 
 
 R1  R3 

These demonstrate the effect of elementary matrices


on general matrices -- they effectively achieve row-
operations. Therefore, using such matrices, we can
transform a non-singular matrix A into its row-
echelon form.

Thus,

$E_m E_l E_k \ldots E_2 E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

Therefore the matrix product

$E_m E_l E_k \ldots E_2 E_1 = A^{-1}$

This gives us a procedure to obtain the inverse of a non-singular matrix $A$ using row transformations:

a. Start with the augmented matrix $[A \,|\, I_n]$.
b. Carry out row transformations on this using elementary matrices.
c. When the left side becomes the identity matrix, the transformed right side must be the inverse of the original matrix $A$.

Observe:

a. Two matrices $A$ and $B$ are row equivalent to each other if one can get $B$ from $A$ by applying a sequence of elementary matrices to the latter. That means $E_m E_l E_k \ldots E_2 E_1 A = B$.

b. Every elementary row operation can be "undone" by another elementary row operation. Therefore every elementary matrix has an inverse.

c. The inverse of a product is the product of the inverses in reverse order. For instance,

$(E_m E_l E_k \ldots E_2 E_1)^{-1} = E_1^{-1} E_2^{-1} \ldots E_m^{-1}$

d. Finally, given any $A_{n\times n}$, the following statements are equivalent:

1. $A$ has an inverse.
2. $Ax = b$ has a unique solution for any $b$.
3. $A$ is row equivalent to $I_n$.
4. $A$ can be expressed as a product of elementary matrices.
The world of eigenvalues and eigenfunctions

An operator $A$ operates on a function and produces a function.

For every operator there is a set of functions which, when operated on by the operator, are reproduced unchanged except for multiplication by a constant factor.

Such a function is called an eigenfunction of the operator, and the constant multiplier is called its corresponding eigenvalue. An eigenvalue is just a number, real or complex.

A typical eigenvalue equation looks like

$Ax = \lambda x$

Here the matrix (or operator) $A$ operates on a vector (or function) $x$, producing an amplified or reduced vector $\lambda x$. The eigenvalue $\lambda$ belongs to the eigenfunction $x$.

Suppose the operator is $A = x \dfrac{d}{dx}$. $A$ operating on $x^n$ produces

$A x^n = x \dfrac{d}{dx} x^n = n x^n$

Therefore the operator $A$ has the eigenvalue $n$ corresponding to the eigenfunction $x^n$.

1. Eigenfunctions are not unique.

Suppose $Ax = \lambda x$. Define another vector $z = cx$, where $c$ is a constant. Now

$Az = Acx = cAx = c\lambda x = \lambda z$

Therefore $z$ is also an e-function (eigenfunction) of $A$.

2. If $Ax = \lambda x$ is an eigenvalue equation (and we assume that $x$ is not the zero vector), then

$Ax = \lambda x \Rightarrow (A - \lambda I)x = 0 \Rightarrow \det(A - \lambda I) = 0$

This leads to a characteristic polynomial in $\lambda$:

$p_A(\lambda) = \det(A - \lambda I)$

$\lambda$ is an e-value of $A$ only if $p_A(\lambda) = 0$.

3. The spectrum of an operator $A$, $\sigma(A)$, is the set of all its e-values.

4. The spectral radius of an operator $A$ is $\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda| = \max_{1 \le i \le n} |\lambda_i|$.

5. Computation of the spectrum and spectral radius:

Let $A = \begin{pmatrix} 2 & -1 \\ 2 & 5 \end{pmatrix}$ be the matrix whose eigenvalues and eigenfunctions we want. Its characteristic equation (CE) is

$\det\begin{pmatrix} 2-\lambda & -1 \\ 2 & 5-\lambda \end{pmatrix} = 0 \Rightarrow (2-\lambda)(5-\lambda) + 2 = 0$

This gives $\lambda^2 - 7\lambda + 12 = 0 \Rightarrow (\lambda - 3)(\lambda - 4) = 0$

Therefore $A$ has two eigenvalues: 3 and 4.

Let the eigenfunction corresponding to the e-value 3 be the vector $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$. Then

$\begin{pmatrix} 2 & -1 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 3 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$

Therefore $2x_1 - x_2 = 3x_1$, yielding $x_1 = -x_2$. Also $2x_1 + 5x_2 = 3x_2$, which gives no new information. Therefore we can take the solution $e_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$ corresponding to the e-value 3 of the matrix $A$.

Similarly, for the e-value 4 the eigenfunction turns out to be $e_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$.

6. Faddeev-Leverrier method for obtaining the characteristic polynomial.

Define a sequence of matrices

$P_1 = A, \qquad p_1 = \mathrm{trace}(P_1)$
$P_2 = A(P_1 - p_1 I), \qquad p_2 = \tfrac{1}{2}\mathrm{trace}(P_2)$
$P_3 = A(P_2 - p_2 I), \qquad p_3 = \tfrac{1}{3}\mathrm{trace}(P_3)$
$\ldots$
$P_n = A(P_{n-1} - p_{n-1} I), \qquad p_n = \tfrac{1}{n}\mathrm{trace}(P_n)$

Then the characteristic polynomial $P(\lambda)$ is

$P(\lambda) = (-1)^n \left( \lambda^n - p_1 \lambda^{n-1} - p_2 \lambda^{n-2} - \ldots - p_n \right)$
 12 6  6
e.g. A   6 16 2 
 
 6 2 16 

Define P1  A, p1  trace( A )  12  16  16  44
P2  A( P1  p1I ) 

 12 6  6  32 6 6 
 6 16 2   6  28 2 
  
 6 2 16    6 2  28

 312  108 108 


   108  408  60  , p2  564
 
 108  60  408

And one proceeds this way to get p3  1728

The CA polynomial =

( 1 )3 3  442  564  1728 
The eigenvalues are next found solving
 
3  442  564  1728  0

7. More facts about eigenvalues.

Assume $Ax = \lambda x$, so $\lambda$ is an eigenvalue of $A$ with eigenvector $x$.

a. $A^{-1}$ has the same eigenvector as $A$, and the corresponding eigenvalue is $1/\lambda$.

b. $A^n$ has the same eigenvector as $A$, with the eigenvalue $\lambda^n$.

c. $(A + \alpha I)$ has the same eigenvector as $A$, with the eigenvalue $(\lambda + \alpha)$.

d. If $A$ is symmetric, all its eigenvalues are real.

e. If $P$ is an invertible matrix, then $P^{-1}AP$ has the same eigenvalues as $A$.

Proof of e.

Suppose the eigenfunction of $P^{-1}AP$ is $y$ with eigenvalue $k$. Then

$P^{-1}AP\,y = k\,y \Rightarrow A(Py) = k(Py)$

Therefore $Py$ is an eigenvector $x$ of $A$ and $k$ must equal $\lambda$. Hence the eigenvalues of $A$ and $P^{-1}AP$ are identical, and the eigenvectors of one are a linear mapping of those of the other.

If the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$ are all distinct, then there exists a similarity transformation such that

$P^{-1}AP = D = \begin{pmatrix} \lambda_1 & 0 & 0 & \ldots & 0 \\ 0 & \lambda_2 & 0 & \ldots & 0 \\ 0 & 0 & \lambda_3 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & 0 & \ldots & \lambda_n \end{pmatrix}$

Let the eigenvectors of $A$ be $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$, so that $Ax^{(i)} = \lambda_i x^{(i)}$, and take $P = [x^{(1)}, x^{(2)}, \ldots, x^{(n)}]$ (the eigenvectors as columns). Then

$AP = [Ax^{(1)}, Ax^{(2)}, \ldots, Ax^{(n)}] = [\lambda_1 x^{(1)}, \lambda_2 x^{(2)}, \ldots, \lambda_n x^{(n)}] = [x^{(1)}, x^{(2)}, \ldots, x^{(n)}]\, D = PD$

Therefore $P^{-1}AP = D$.

Also note the following. If $A$ is symmetric, then $(x^{(i)})^t x^{(j)} = 0$ for $i \neq j$. So we can normalize each eigenvector, $u^{(i)} = \dfrac{x^{(i)}}{\|x^{(i)}\|}$, so that the matrix $Q = [u^{(1)}, u^{(2)}, \ldots, u^{(n)}]$ is an orthogonal matrix, i.e. $Q^t A Q = D$.

Matrix norm.

Computationally, the $l_2$-norm of a matrix is determined as

$\|A\|_2 = \sqrt{\rho(A^t A)}$

e.g. $A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{pmatrix}$

Then $A^t A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 2 & -1 \\ 2 & 6 & 4 \\ -1 & 4 & 5 \end{pmatrix}$

The eigenvalues are $\lambda_1 = 0$, $\lambda_2 = 7 - \sqrt{7}$, $\lambda_3 = 7 + \sqrt{7}$.

Therefore $\|A\|_2 = \sqrt{\rho(A^t A)} = \sqrt{7 + \sqrt{7}} = 3.106$.

The l norm is defined as A   max  aij


1i  n j
1 1 0 
e.g. A   1 2 1 
 
 1 1  4
3 3
 a1 j  1  1  0  2 ,  a2 j  1  2  1  4
j 1 j 1

3
 a3 j  6 Therefore, A   max( 2,4,6 )  6
j 1

In computational matrix algebra we are often interested in situations where $A^k$ becomes small (all of its entries become almost zero). In that case $A$ is said to be convergent, i.e. $A$ is convergent if $\lim_{k\to\infty} (A^k)_{ij} = 0$ for all $i, j$.

1 
 0
Example. Is A   2 convergent?
1 1
 
4 2

1  1  1 
0 0 0
2 4 3 8  4 16 
A  , A  , A  ,
1 1 3 1 1 1
     
4 4 16 8   8 16 

It appears that

 1 
 2k 0 
A 
k
k 1
 
 2  1 2 
k k

1
In the limit k   , k  0 . Therefore, A is a
2
convergent matrix.
Note the following equivalent statements:

a. $A$ is a convergent matrix
b1. $\lim_{k\to\infty} \|A^k\|_2 = 0$
b2. $\lim_{k\to\infty} \|A^k\|_\infty = 0$
c. $\rho(A) < 1$
d. $\lim_{k\to\infty} A^k x = 0$ for every $x$

The condition number $K(A)$ of a non-singular matrix $A$ is computed as

$K(A) = \|A\| \cdot \|A^{-1}\|$

A matrix is well behaved if its condition number is close to 1. When $K(A)$ is significantly larger than 1, we call $A$ an ill-conditioned matrix.
Computational aspects of eigenvalue problems.

Problem. Given an $n \times n$ matrix $A$, find its eigenvalues and eigenfunctions. $Ax = \lambda x$ is the eigenvalue equation.

Power method. This obtains the dominant eigenvalue $\lambda_k$ and its associated eigenvector. We assume all eigenvalues are distinct. Thus

$|\lambda_k| = \max_i |\lambda_i|$

Suppose, v1 , v2 , v3 ,..., vn  are the eigenfunctions of


the matrix A . Therefore, they span any vector
x  V   n on which A can work. For such a vector
x,

x  a1v1  a2 v2  a3v3  ...  an vn … (1)

If we let A operate on x repeatedly, say, m times we


get

A m x  a11m v1  a2 m
2 v2  a33 v3  ...  a n n vn
m m
  m
 m
 m
 m
k a1 m v1  a 2 m v2  ...  a k vk  ...a n m 
1 2 n
 k k k 

mj
For a large m ,  0 j  k . Therefore,
 m
k

1
lim A m
x  ak vk
m   k
m
…. (2)

As long as $a_k$ is not zero, we can compute the dominant eigenvalue $\lambda_k$. To guarantee $a_k \neq 0$, we have to make sure that the chosen starting vector $x$ is not orthogonal to $v_k$.

Consider now any vector $y$ which is not orthogonal to $v_k$. Take the dot product of both sides of (2) with $y$. For large $m$ we get

$A^m x \cdot y \approx \lambda_k^m\, a_k\, v_k \cdot y \qquad \ldots (3)$

Apply the matrix $A$ to (3) just one more time. This produces

$A^{m+1} x \cdot y \approx \lambda_k^{m+1}\, a_k\, v_k \cdot y \qquad \ldots (4)$

Divide (4) by (3), since neither side of (3) is zero. This results in

$\lambda_k = \lim_{m\to\infty} \dfrac{A^{m+1} x \cdot y}{A^m x \cdot y} \qquad \ldots (5)$

This is the overall basis of the power method. The assumptions are:

a. There is a dominant eigenvalue $\lambda_k$ in the spectrum $\sigma_A$ of the operator $A$. This means that the high powers $\left( \dfrac{\lambda_i}{\lambda_k} \right)^m \to 0$ quickly.
b. $A$ must have linearly independent eigenvectors, i.e. it is diagonalizable. If $A$'s eigenvectors are not linearly independent, the matrix is not diagonalizable.
c. The entries of $A$ should be fairly accurate. Any significant error in $A$ would be amplified in computing the high powers $A^m$.
Example. Find the dominant eigenvalue of $A = \begin{pmatrix} 1 & 3 \\ 2 & 2 \end{pmatrix}$.

Let's start with the vector $x = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$. Then

$Ax = \begin{pmatrix} 10 \\ 8 \end{pmatrix}, \quad A^2 x = \begin{pmatrix} 34 \\ 36 \end{pmatrix}, \quad A^3 x = \begin{pmatrix} 142 \\ 140 \end{pmatrix}, \quad A^4 x = \begin{pmatrix} 562 \\ 564 \end{pmatrix}, \quad A^5 x = \begin{pmatrix} 2254 \\ 2252 \end{pmatrix}$

Now let $y = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. With $m = 4$, equation (5) yields the dominant eigenvalue as

$\lambda \approx \dfrac{A^5 x \cdot y}{A^4 x \cdot y} = \dfrac{2254}{562} = 4.0106$

The dominant eigenvalue appears to be 4, and the corresponding eigenfunction appears to be $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

Note that $A^m x$ is the approximate eigenvector for the eigenvalue $\lambda$. How? From (4), $A(A^m x) \approx \lambda\, A^m x$ when $m \to \infty$. Therefore, if the iteration converges well, $A^m x$ is roughly an eigenvector whose eigenvalue is $\lambda$. We will return to this later on.
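A short sketch of the power method (illustrative, not from the notes), using the dot-product ratio of equation (5) on the matrix above:

```python
import numpy as np

def power_method(A, x, y, m=20):
    """Estimate the dominant eigenvalue via the ratio (A^{m+1}x . y)/(A^m x . y)."""
    A = np.array(A, dtype=float)
    xm = np.array(x, dtype=float)
    y = np.array(y, dtype=float)
    for _ in range(m):
        xm = A @ xm                     # xm is now A^m x
    return (A @ xm) @ y / (xm @ y)

A = [[1, 3], [2, 2]]
print(power_method(A, [1, 3], [1, 0], m=5))    # ~4.01, approaching the eigenvalue 4
print(power_method(A, [1, 3], [1, 0], m=30))   # very close to 4
```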

Rayleigh quotient

If $x$ is an exact eigenfunction of $A$ then the ratio $\dfrac{(Ax)_i}{x_i} = \lambda$ for every component $i$. If $x$ is only an estimate of an eigenfunction, we can instead consider the Rayleigh quotient of a matrix $A$ with respect to a vector $x$:

$R(A, x) = \dfrac{x^t A x}{x^t x}$

It is a scalar whose magnitude is bounded. If $A$ is symmetric, then all of its eigenvalues are real and its RQ (Rayleigh quotient) is bounded as follows:

$\lambda_{\min} \le R(A, x) \le \lambda_{\max}$

When $x$ is an approximate eigenvector of $A$, the RQ is an accurate estimate of the eigenvalue associated with that eigenfunction.

Stopping rule in the power method:

We stop the computation when $|\lambda_{\mathrm{comp}} - \lambda_{\mathrm{actual}}| < \varepsilon$, some threshold. But we do not know $\lambda_{\mathrm{actual}}$.

Suppose $A$ is a real symmetric matrix with dominant eigenvalue $\lambda_1$. Then, using the RQ estimate of the e-value, if $\lambda_1^{\mathrm{comp}}$ is the computed dominant e-value of the matrix,

$\lambda_1^{\mathrm{comp}} = \dfrac{x^t A x}{x^t x}$ with $x = A^m x_0$,

we can approximate the error bound as

$|\lambda_1^{\mathrm{comp}} - \lambda_1^{\mathrm{actual}}| \le \sqrt{ \dfrac{Ax \cdot Ax}{x \cdot x} - \left( \lambda_1^{\mathrm{comp}} \right)^2 }$
example.

$A = \begin{pmatrix} 5 & -2 \\ -2 & 8 \end{pmatrix}$ and starting vector $x_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

Starting with $x_0$ we get

$Ax_0 = \begin{pmatrix} 3 \\ 6 \end{pmatrix}, \quad A^2 x_0 = \begin{pmatrix} 3 \\ 42 \end{pmatrix}, \quad A^3 x_0 = \begin{pmatrix} -69 \\ 330 \end{pmatrix}, \quad A^4 x_0 = \begin{pmatrix} -1005 \\ 2778 \end{pmatrix}, \quad A^5 x_0 = \begin{pmatrix} -10581 \\ 24234 \end{pmatrix}, \quad A^6 x_0 = \begin{pmatrix} -101373 \\ 215034 \end{pmatrix}$

Using $x = A^5 x_0$, we get

$\lambda_1 \approx \dfrac{Ax \cdot x}{x \cdot x} = \dfrac{A^6 x_0 \cdot A^5 x_0}{A^5 x_0 \cdot A^5 x_0} = 8.9865$

The error estimate is

$|\lambda_1^{\mathrm{comp}} - \lambda_1^{\mathrm{actual}}| \le \sqrt{ \dfrac{A^6 x_0 \cdot A^6 x_0}{A^5 x_0 \cdot A^5 x_0} - 8.9865^2 } \approx 0.26$

From the error estimate we see that our error is at most 0.26. We should not stop but continue, since

$\dfrac{\text{max abs error}}{\text{computed value}} = \dfrac{0.26}{8.9865} \approx 2.9\%$

Another way to estimate the computational error is the following (a more optimistic scenario, though not as reliable):

$\mathrm{error}(n+1) = \dfrac{|\lambda_1^{\mathrm{comp}}(n) - \lambda_1^{\mathrm{comp}}(n+1)|}{|\lambda_1^{\mathrm{comp}}(n+1)|}$

where $\mathrm{error}(n+1)$ is the error at the $(n+1)$th level of the computation and $\lambda_1^{\mathrm{comp}}(n)$ is the computed value of the dominant e-value at the $n$th level. This check is recommended for non-symmetric cases, since the RQ-based error bound is valid only for symmetric matrices.

In actual computation, the components of the computed e-functions grow extremely large. We need to scale them appropriately. To do this, we divide the eigenvector by its largest component and then use the scaled vector in the next step.

Example.

$A = \begin{pmatrix} 5 & -2 \\ -2 & 8 \end{pmatrix}$ and again we start at $x_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

$Ax_0 = \begin{pmatrix} 3 \\ 6 \end{pmatrix}$, scaled $\to \begin{pmatrix} 0.5 \\ 1 \end{pmatrix} = w_1$

$Aw_1 = \begin{pmatrix} 0.5 \\ 7 \end{pmatrix}$, scaled $\to \begin{pmatrix} 0.07143 \\ 1 \end{pmatrix} = w_2$

$Aw_2 = \begin{pmatrix} -1.64 \\ 7.86 \end{pmatrix}$, ... , and after ten such steps $w_{10} \approx \begin{pmatrix} -0.4949 \\ 1 \end{pmatrix}$

You get the idea. The approximate eigenvalue comes out to be

$\lambda \approx \dfrac{A w_{10} \cdot w_{10}}{w_{10} \cdot w_{10}} = 9.002$
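A sketch of this scaled power iteration with the Rayleigh-quotient estimate (illustrative, not from the notes):

```python
import numpy as np

def scaled_power_method(A, x0, iters=10):
    A = np.array(A, dtype=float)
    w = np.array(x0, dtype=float)
    for _ in range(iters):
        w = A @ w
        w = w / w[np.argmax(np.abs(w))]        # scale by the largest-magnitude component
    lam = (w @ (A @ w)) / (w @ w)              # Rayleigh quotient estimate of the eigenvalue
    return lam, w

A = [[5, -2], [-2, 8]]
lam, w = scaled_power_method(A, [1, 1], iters=10)
print(lam, w)        # eigenvalue near 9, eigenvector proportional to (-0.5, 1), i.e. (1, -2)
```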

Example. Calculate $\mathrm{error}(n+1)$ from the earlier exercise, where

$\lambda_1^{\mathrm{comp}}(3) = 3.8292268, \quad \lambda_1^{\mathrm{comp}}(4) = 4.0244698, \quad \lambda_1^{\mathrm{comp}}(5) = 3.9926566, \quad \lambda_1^{\mathrm{comp}}(6) = 4.0017604$

Thus

$\mathrm{error}(4) = \dfrac{|\lambda_1^{\mathrm{comp}}(3) - \lambda_1^{\mathrm{comp}}(4)|}{|\lambda_1^{\mathrm{comp}}(4)|} = 0.04858$

Similarly, we can compute $\mathrm{error}(5)$ and $\mathrm{error}(6)$.

Finding the least dominant eigenvalue by the power method.

If Ax = λx and λ is the least dominant eigenvalue, then A^(−1) x = (1/λ) x, so the power method applied to A^(−1) yields the reciprocal of this eigenvalue. Therefore, we first compute B = A^(−1) and then obtain its most dominant eigenvalue μ from Bx = μx. The desired eigenvalue of A is λ = 1/μ.

Finding non-dominant eigenvalues.

Let the eigenvalues of Ax = λx be ordered as

|λ1| ≥ |λ2| ≥ |λ3| ≥ … ≥ |λn|

We know how to get the most dominant one, i.e. λ1. How do we get λ2, given that we already know λ1?

If A is symmetric, it can be done in the following way. Let v1 be the eigenvector corresponding to the eigenvalue λ1, and let u1 be the unit vector

u1 = v1 / ||v1||,  and define  Â = A − λ1 u1 u1^t

Then Â has eigenvalues 0, λ2, λ3, …, λn, and the eigenvectors of Â are identical to the eigenvectors of A. Therefore, to find λ2 we can now use the power method on Â. The application of the power method to this new matrix Â is called the method of deflation. A warning note: since λ1 is not exact, we introduce some error into Â.
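A hedged MATLAB sketch of one deflation step, assuming the dominant pair (λ1, v1) has already been found by the power method (the numbers match the example that follows):

    A = [5 -2; -2 8];
    lambda1 = 9;  v1 = [1; -2];          % dominant eigenpair obtained earlier
    u1   = v1 / norm(v1);
    Ahat = A - lambda1*(u1*u1');         % deflated matrix: eigenvalues 0, lambda2, ...
    x = [1; 1];                          % now run the power method on Ahat
    for k = 1:20
        x = Ahat*x;  x = x/norm(x);
    end
    lambda2 = (x'*Ahat*x) / (x'*x)       % second eigenvalue (4 for this A)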

 5 2
Example. Again A    and let 1  9 and its
 2 8 
1
eigenvector is v1   
  2

 1/ 5 
Therefore, u1   
 2 / 5 

5 2  1 / 5   1 2 
We compute    9 
8   2 / 5   5
.
 2 5 
4 4 2
 
5 2 1 

1
By applying the Power method on  with x0   
1
We get …
Power method with shift

Normal power method → most dominant eigenvalue
Inverse power method → least dominant eigenvalue

Now consider a small change. Suppose we change our matrix A to A + qI, where q is a scalar. That is, we add q to every element on its diagonal.

Now let us use the Inverse Power Method (IPM) on it. Suppose we get its smallest eigenvalue μ. But μ is an eigenvalue of A + qI. Therefore

λ = μ − q

is one of the eigenvalues of A. This is the shifting method.
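A minimal MATLAB sketch of the shift combined with the inverse power method; the shift q below is a placeholder, and the eigenvalue of A recovered is the one closest to −q:

    A = [5 -2; -2 8];
    q = -3.5;                            % placeholder shift
    B = A + q*eye(size(A));              % add q to every diagonal element
    x = [1; 1];
    for k = 1:20
        z = B \ x;                       % inverse power step on A + qI
        x = z / norm(z);
    end
    mu = (x'*B*x) / (x'*x);              % smallest-magnitude eigenvalue of A + qI
    lambda = mu - q                      % corresponding eigenvalue of A (4 here)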

In physics, for example, this amounts to the assertion that the energy of a system is defined only up to an additive constant; there is no absolute zero of energy. And energy levels in Quantum Mechanics are eigenvalues of a Hamiltonian matrix.

Any example?

__________________
Stability of numerical eigenvalue problems.

Problem. Given Ax = λx, find the eigenvectors x and the corresponding eigenvalues λ of the operator A.

Note 1. If A and B are almost equal, their eigenvalues need not be close. That is, a small change in A can produce a large change in its eigenvalue spectrum.

The issue here: truncated data in machine storage may cause serious problems.

Ex. Consider

A = [1 1000; 0 1]  and  B = [1 1000; 0.001 1]

λ_A = {1, 1},  λ_B = {0, 2}

This time consider also C = [1 1000; −0.001 1].
C has no real eigenvalue, since its characteristic polynomial λ^2 − 2λ + 2 has no real root.

Therefore, the Bad News:

■ errors will creep into the matrix entries (almost unavoidable);
■ the eigenvalues of such a matrix may be suspect if the matrix is not symmetric.

And the Good News (given the Bad News):

■ If A is symmetric over the real field, small perturbations in A produce only small changes in its eigenvalues.

(Diagram: small changes in a symmetric matrix produce correspondingly small changes in its eigenvalues.)

Numerical eigenvalue methods are therefore generally successful when the matrices we deal with are essentially symmetric.

Define the Frobenius norm of a matrix A as

||A||_F = √( Σ_{i,j=1}^{n} a_ij^2 )

Stability Theorem: Let Ann our actual matrix and


Enn is the error matrix (both real and symmetric).
Let   A  E be the error-version of A .

If  A  1 , 2 , 3 ,..., n  are the eigenvalues of A


and  Â  ˆ1 , ˆ 2 , ˆ 3 ,...,ˆ n  are the eigenvalues of Â
then it appears that sum of the deviations of the
eigenvalues is bounded. That is

2
 i  ˆ i   E
n 2
F
i 1

With the same hypothesis of the stability theorem, on


individual eigenvalue
k  ˆ k  E F

2
Note that k  ˆ k    i  ˆ i   E
2 n 2
F
i 1
Thus, the above constraint.

Example. We need to calculate the eigenvalues of A, where

A = [1 −1 2; −1 2 7; 2 7 5]

Suppose the machine stores it as

Â = [1.01 −1.05 2.1; −1.05 1.97 7.1; 2.1 7.1 4.9]

Therefore, the error matrix in our case is

E = [0.01 −0.05 0.1; −0.05 −0.03 0.1; 0.1 0.1 −0.1]

Therefore,

||E||_F = √( (0.01)^2 + 2(0.05)^2 + 5(0.1)^2 + (0.03)^2 ) ≈ 0.23664

So we are guaranteed that

|λk − λ̂k| ≤ 0.23664
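A quick MATLAB check of this example and of the stability bound (eig and norm(., 'fro') are built-ins; this is only a verification sketch):

    A    = [1 -1 2; -1 2 7; 2 7 5];
    Ahat = [1.01 -1.05 2.1; -1.05 1.97 7.1; 2.1 7.1 4.9];
    E    = Ahat - A;
    normE = norm(E, 'fro')                      % about 0.23664
    d = sort(eig(A)) - sort(eig(Ahat));         % eigenvalue deviations, matched in sorted order
    [max(abs(d)), sqrt(sum(d.^2))]              % both should be bounded by normE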

Even if the absolute error is small, the relative error could be large. Suppose in a specific case we have

|λk − λ̂k| ≈ ε  and  λk ≈ ε/3

Then the relative error is about 300%.

Gram-Schmidt orthogonalization process.

Given a set of n arbitrary (linearly independent) vectors vi, i ≤ n, produce an orthogonal set of vectors ui, i ≤ n.

The dot product of two vectors a and b is computed as (assume both are column vectors)

a · b = a^t b = [a1 a2 … an][b1; b2; …; bn] = a1 b1 + a2 b2 + … + an bn

The dot product of a with itself is

a · a = a1^2 + a2^2 + … + an^2 = ||a||^2

Let u1 = v1 and e1 = u1 / ||u1||.

Now let u2 = v2 − ((u1 · v2)/(u1 · u1)) u1 and e2 = u2 / ||u2||.

Given this, u1 · u2 = u1 · v2 − ((u1 · v2)/(u1 · u1)) (u1 · u1) = 0.

Therefore u1 is perpendicular to u2.

Similarly, construct u3 using {u1, u2} and v3. Define

u3 = v3 − ((u1 · v3)/(u1 · u1)) u1 − ((u2 · v3)/(u2 · u2)) u2  and  e3 = u3 / ||u3||

Now u1 · u3 = 0, and u2 · u3 = 0.

Therefore u1, u2 and u3 are all perpendicular to each other. We continue in this way to define the rest of the orthogonal basis vectors:

uk = vk − ((u1 · vk)/(u1 · u1)) u1 − ((u2 · vk)/(u2 · u2)) u2 − … − ((u_{k−1} · vk)/(u_{k−1} · u_{k−1})) u_{k−1},  with  ek = uk / ||uk||

This is the Gram-Schmidt orthogonalization process.

Note that we have e_i · e_j = 1 if i = j, and 0 otherwise.

Therefore the matrix Q = [e1 | e2 | e3 | … | en] would be an orthonormal matrix, i.e. QQ^t = I.
We can proceed further.


We note that

v1  u1
v2  u 2  ( u1 .v2 )u1
v3  u 3  ( u1 .v3 )u1  ( u 2 .v3 )u 2
v4  u 4  ( u1 .v4 )u1  ( u 2 .v4 )u 2  ( u 3 .v4 )u 3

This shows that

v  v1 v2 .. vn   QR

where Q  u1 u 2 .. u n 

a11 a12 .. a1n 


 0 a22 .. a2 n 
and R   
 0 0 .. akn 
 0 .. ann 
 0

Here Q is an orthogonal matrix and R is an upper


triangular one. The entries in R are projections of the
original vector v in the orthogonal basis space u .

We can convert Q to the orthonormal basis space E


where the vectors ei are orthonormal to each other.
Example. Three vectors:

v1 = [2 1 3]^t,  v2 = [1 1 2]^t  and  v3 = [1 3 2]^t

Assume they span R^3. Find an orthogonal basis set that spans the same space.

Given that a matrix can be decomposed into a QR product, we get for a matrix A at the m-th iteration:

A^(m) = Q^(m) R^(m)
A^(m+1) = R^(m) Q^(m),  for m = 1, 2, …

Since R^(m) = Q^(m)t A^(m), we get the recursive definition

A^(m+1) = Q^(m)t A^(m) Q^(m)

As we progress, the sequence {A^(m)} converges to a triangular matrix with the eigenvalues on its diagonal (or to a nearly triangular matrix from which we can compute the eigenvalues).
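A minimal MATLAB sketch of this QR iteration, using the built-in qr function for the factorization at each step (the matrix is a placeholder):

    A = [5 -2; -2 8];                    % placeholder symmetric matrix
    for m = 1:50
        [Q, R] = qr(A);                  % A^(m) = Q^(m) R^(m)
        A = R*Q;                         % A^(m+1) = R^(m) Q^(m)
    end
    diag(A)                              % approximate eigenvalues (9 and 4 here)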
More on iterative solution of Ax = b

Iterative solution methods: Jacobi's method, the Gauss-Seidel approach, and the SOR method.

SOR method: Successive Over-Relaxation Method.

Consider the basic framework for an iterative solution of Ax = b through an example:

4x1 + 3x2 = 24
3x1 + 4x2 − x3 = 30
−x2 + 4x3 = −24

Its solution is x = [3 4 −5]^t.

We begin with a starting solution x^(0) = [1 1 1]^t.

Gauss-Seidel frames the equations in the following way:

x1^(k) = −0.75 x2^(k−1) + 6
x2^(k) = −0.75 x1^(k) + 0.25 x3^(k−1) + 7.5
x3^(k) = 0.25 x2^(k) − 6

In general, for Gauss-Seidel,

xi^(k) = (1/aii) [ bi − Σ_{j=1}^{i−1} aij xj^(k) − Σ_{j=i+1}^{n} aij xj^(k−1) ]

In SOR, we begin right about here. We take (1 − ω) units of the previous estimate of xi and ω units of the currently available estimate to formulate the iterative solution:

xi^(k) = (1 − ω) xi^(k−1) + (ω/aii) [ bi − Σ_{j=1}^{i−1} aij xj^(k) − Σ_{j=i+1}^{n} aij xj^(k−1) ]

The value of ω is usually less than 2.0. When ω = 1 we recover the Gauss-Seidel method; for 0 < ω < 1 we get the Under-Relaxation method; for 1 < ω < 2 we have the Over-Relaxation method.
If we take   1.5 , our SOR equations appear as

x1( k )  0.5 x1( k 1 ) 


1.5
4

 3 x2( k 1 )  24 
1.5
x2( k )  0.5 x2( k 1 )  ( 3 x1( k )  4 x2( k 1 )  x3( k 1 )  30 )
4

….
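A minimal MATLAB sketch of SOR sweeps for this example (A, b and ω as above); a fixed number of sweeps is used here instead of a proper stopping test, and the assignment below asks for a complete program, so this is only a skeleton:

    A = [4 3 0; 3 4 -1; 0 -1 4];  b = [24; 30; -24];
    omega = 1.5;  x = [1; 1; 1];  n = length(b);
    for k = 1:25                                   % fixed number of sweeps, for illustration
        for i = 1:n
            sigma = A(i,1:i-1)*x(1:i-1) + A(i,i+1:n)*x(i+1:n);
            x(i) = (1-omega)*x(i) + (omega/A(i,i))*(b(i) - sigma);
        end
    end
    x                                              % approaches the solution [3; 4; -5]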

Assignment. Write a program that would iteratively


obtain SOR solutions given A and b .
Least-squares solution of an estimation problem.

Given a set of data (xi, yi), i = 1, …, n, is there any relationship between the two variables?

Could we find a best-fit line through these data?


What do we mean by “the best-fit line”?

There are many ways we could define a "best-fit" line. How about this? A best-fit line is the line for which the sum of squares of all the errors (the differences between the yi and the projected values) is minimum.

Suppose the best-fit line is the line y = a0 + a1 x. In that case, the estimation error at the point xi is

ei = |a0 + a1 xi − yi|

Let S = Σ_{i=1}^{n} ei^2 = Σ_{i=1}^{n} (a0 + a1 xi − yi)^2

We want to find that line (i.e. that a0 and a1) for


which S is minimized. Such a line is called Least-
square line (or a trend line).

Since S is a function of both a0 (the intercept) and a1 (the slope), we must have

∂S/∂a0 = Σ_{i=1}^{n} 2(a0 + a1 xi − yi) = 0 … (1)  and

∂S/∂a1 = Σ_{i=1}^{n} 2 xi (a0 + a1 xi − yi) = 0 … (2)

From (1), we get n a0 + a1 Σ xi = Σ yi … (3), and
from (2), we get a0 Σ xi + a1 Σ xi^2 = Σ xi yi … (4).

From these, we get

a1 = ( n Σ xi yi − Σ xi Σ yi ) / ( n Σ xi^2 − (Σ xi)^2 )

Given this, we obtain

a0 = (Σ yi)/n − a1 (Σ xi)/n
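A small MATLAB sketch that evaluates these two formulas directly; the data vectors used here are taken from the Matlab example that follows, so the result can be compared with polyfit:

    x = [1 2 3 4 5 6 7 8 9];
    y = [2 4 5 6 9 3 12 15 14];
    n  = length(x);
    a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2)   % slope, about 1.5333
    a0 = sum(y)/n - a1*sum(x)/n                                     % intercept, about 0.1111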

How well does the line y = a0 + a1 x cover the data? This is given by the correlation coefficient ρ, where

ρ(x, y) = cov(x, y) / (σx σy)

i.e.

ρ(x, y) = ( n Σ xy − Σx Σy ) / √( (n Σ x^2 − (Σx)^2)(n Σ y^2 − (Σy)^2) )

The correlation coefficient ρ is bounded: −1 ≤ ρ ≤ 1. A good fit implies the absolute value of ρ is close to 1.

The correlation coefficient is a measure of the linearity of the data. Even if the data are not linear, one could try various transformations on the data and then attempt a linear fit, particularly if the associated correlation coefficient is high in magnitude.
Example: Linearization process as Y = a0 + a1 X

Transformation of x    Transformation of y    Relation
a. X = x               Y = log y              y = a e^(bx)
b. X = log x           Y = log y              y = a x^b
c. X = log x           Y = y                  y = log(a x^b)
d. X = x^2             Y = e^y                e^y = a + b x^2

In Matlab, we get the least-squares fit using the following approach.

Let a be the matrix with the x values in the first column and the y values in the second column.

>> a

a=

1 2
2 4
3 5
4 6
5 9
6 3
7 12
8 15
9 14
>> x=a(:,1)'

x=

1 2 3 4 5 6 7 8 9

>> y=a(:,2)'

y=

2 4 5 6 9 3 12 15 14

>> pcoeff=polyfit(x,y,1)

pcoeff =

1.5333 0.1111

>>

The best fit line is y = 0.1111 + 1.5333x in this case.

How about fitting a quadratic function? In this case,


we try

>> pcoeff=polyfit(x,y,2)

pcoeff =

0.1212 0.3212 2.3333


The best fit quadratic curve is now y = 0.1212x^2 + 0.3212x + 2.3333.

To get the value of the correlation coefficient, compute in Matlab:

>> r=corrcoef(a)

r=

1.0000 0.8582
0.8582 1.0000

>>

The (i, j) entry refers to the correlation coefficient between the i-th and the j-th variable.
Assignment 1.
Numerical Linear Algebra

These assignments are suggested for sharpening our Linear Algebra skills. Our next exam
would definitely post similar questions (note the word is “similar”, not “same”) and it
would be to our advantage to spend time on it now. You may try Matlab to get their
answers in some cases just to ensure yourself that you did solve them correctly. Matlab is
available in the lab C014. A quick tutorial for Matlab could be procured from
http://www.math.siu.edu/matlab/tutorials.html. In particular, tutorial 3 and tutorial 4
show how to use Matlab to solve Linear Algebra problems.

1. In these two questions you are supposed to solve the simultaneous linear equations using the Gaussian elimination procedure, with what is usually known as the naïve pivoting scheme.

x  y  z  t  2
2x  3 y  z  2
2x  y  z  t  1
a. x  y  3z  1 and b.
x  2 y  z  2t  6
x  2y  z  2
x  y  z  t  2

2. Do the previous one using scaled partial pivoting scheme.


3. In the next set of problems, find the rank and the basis space of the matrices shown below. (In the textbook, similar problems appear on page 151.)

2 4 1 3
 1  2 2  5 4 
1 0  1  2 1
a.  b.  
0 0 2 2
  1  4 5 
3 6 2 5

4. Write a program to solve the following sets of equations using Jacobi's iteration scheme outlined in our lecture notes. If you have difficulty converging, indicate why, and indicate what must be done to converge to a correct set of solutions if the latter exists.

2x  3 y  z  2 3x  y  z  6
a. x  y  3z  1 b. x  4 y  z  1
x  2y  z  2  x  y  4z  8

5. Solve the above problems using the Gauss-Seidel approach. Is the method faster than pure Jacobi iteration? Comment on the number of iterations taken in these cases if you start at the same initial positions as in (4).
Assignment 2.

This is just like the previous assignment; this is supposed to provide a practice drill for your upcoming
exam.

1. Recall the three elementary row operations:

a. cRi : multiply row i by a constant c
b. Ri ↔ Rj : interchange row i with row j
c. Ri → Ri + cRj : add to the i-th row c times the j-th row.

Given this, obtain elementary matrices for each one of the following operations:
a. 8R2 b. R3  2R1 c. R1  R3

2. Give the row operation for the following elementary matrices:

1 0 0 0 1 0 0 0
0 0 1 0 0 1  1 0
A B
0 1 0 0 0 0 1 0
   
0 0 0 1 0 0 0 1

obtain their corresponding inverses.

 5 7 
3. Let A    , which of the following statements are true?
 3  4

a. A is singular b. A is invertible
c. A is non-singular d. A 1 does exist
4 7
e. The inverse of A   
3 5

4. In this question you are given two matrices (a) and (b). For each case, obtain a set of elementary matrices Ei such that Ek El Em … E2 E1 operating on the relevant matrix transforms it into an upper triangular matrix U. Given this sequence, next obtain the product matrix L = E1^(-1) E2^(-1) E3^(-1) … Ek^(-1) such that each of the matrices could be expressed in A = LU form. Write down their corresponding matrix factors.

1 2 1  3  1 1  1  2 5

a. A  2 5 4  b. A  1 2 3 and c. A   1  3 6
 
0  1  3 0 1 1  1 3 7

5. Given the above matrices obtain their inverses from the elementary matrices identified.
6. What would be the inverse of a diagonal matrix like D = [d1 0 0; 0 d2 0; 0 0 d3]?
4 2  2

1. Consider the symmetric matrix A   2 10 5  . Show that A can be expressed as A  LLt .
 2 5 9 
Obtain this decomposition.
2. Find a factorization of the form A = LDL^t for the following symmetric matrices:

(a) A = [2 −1 0; −1 2 −1; 0 −1 2]    and    (b) B = [4 1 1 1; 1 3 −1 1; 1 −1 2 0; 1 1 0 2]

3. Using the decomposition in (1), show how you are going to solve the following equations:

4x + 2y − 2z = 0
2x + 10y + 5z = 3
−2x + 5y + 9z = 2

(Hint: Taking advantage of the decomposition, we express our problem as Ax = b, i.e. LL^t x = b, with b = [0 3 2]^t. We first solve (a) Ly = b to get the vector y, and then solve (b) L^t x = y for the vector x.)

4. Find a sequence of elementary matrices Ei such that, when they operate on the matrix A from the left as shown below, the matrix A is transformed into the identity matrix I:
Ei Ej Ek … Em A = I
Show how, from such an expression, we can get A^(-1), given that A is the matrix shown in question 1.

5. Using the Gram-Schmidt orthogonalization process, show how to transform the following vectors into an orthogonal set:
 2 1  1

v1   1   v 2  2 and v3   1
 
 2  2  1 
Convert the orthogonal set into an orthonormal set.

6. Write a program (in a language you prefer) to compute the most dominant and the least dominant
eigenvalues of an invertible matrix given that they are real. Your program should identify the associated
eigenvectors as well.
