Matrix Analysis with Applications
Prof. Dr. S.K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 01
Elementary Row Operations
Hello friends. Welcome to the lecture series on Matrix Analysis with Applications. So, this is the first lecture and this lecture deals with Elementary Row Operations. So, what elementary row operations are and how they are applicable to solving linear systems of equations, we will see in the first few lectures.
So, before stating what elementary row operations are, first we define binary operations. So, let G be a set. A binary operation on G is a function that assigns to each ordered pair of elements of G an element of G. So, what does it mean? It means if G is a set, you take any two arbitrary elements in that set, apply the operation given to you, and if the resulting element also belongs to the same set, then we say that the operation is a binary operation. Like you take the set of natural numbers, ok, and you apply usual addition.
So, we know that if we take any two arbitrary natural numbers and we apply the usual addition operation, then the resultant is also a natural number. That means the usual addition for the set of natural numbers is a binary operation. Similarly, if you take the set of integers, ok, and you apply usual addition: you take any two integers, add them, and the resultant is also an integer; that means, the usual addition over the set of integers is a binary operation. Similarly, if you take the set of rational numbers or the real numbers or complex numbers under usual addition, their usual addition is a binary operation.
Now, similarly, if you take the set of natural numbers and you take the binary operation as usual multiplication, ok: if you take any two natural numbers and you multiply them, the resultant is also a natural number; that means, the usual multiplication over the set of natural numbers is a binary operation, ok. Now, similarly, for the set of integers, rational numbers, real numbers and complex numbers, the usual multiplication is a binary operation. Now, if you take the set of natural numbers, subtraction and division are not binary operations.
Suppose you take two natural numbers, say 1 and 2: 1 - 2 = -1, which is not a natural number. That means, subtraction over the set of natural numbers is not a binary operation. Similarly, if you take division, say you take two natural numbers 2 and 3 and you divide them, 2/3. So, 2/3 is not a natural number; that means, division is not a binary operation.
So, that means a binary operation is an operation which, when applied to any two elements of a set G, gives a resultant element that must be in the same set. Then that operation is called a binary operation.
Now, come to groups: when do we say that a set is a group under some binary operation? So, let G be a set together with a binary operation that assigns to each ordered pair (a, b) of elements of G an element in G, denoted by a.b or ab. We say G is a group under this operation if the following three properties are satisfied, ok. First of all, the operation which we are defining on the set G is a binary operation.
A binary operation means it satisfies the closure property; that means, you take any two arbitrary elements of the set G and the resultant element is also in G, ok. And what are the other three properties which the set G should satisfy to form a group?
(1) Associativity: associativity means you take any 3 arbitrary elements a, b, c in G. If you take brackets around the first two elements or around the last two elements, the values are the same, that is, (a.b).c = a.(b.c); that is associativity, ok.
(2) Identity: identity means there exists an element e, which is also called the identity, in G such that a.e = e.a = a for every a in G.
(3) Inverse: inverse means for each a in G, there is an element b in G, called the inverse of a, such that a.b = b.a = e, ok.
So, if these three properties hold for a binary operation applied on G, then we say that G under that binary operation is a group. So, these are the four properties (closure plus the three above) which we have discussed.
Suppose you take the set of real numbers, ok.
(Refer Slide Time: 06:38)
Let us consider the set of real numbers denoted by capital R (Refer Time: 06:45). The binary operation which we are defining here is, suppose, usual addition. So, we know that usual addition on the set of real numbers is a binary operation, because if you take any two elements a, b belonging to R, then a + b (if this usual addition is what we are denoting by '+') is also in R for all a, b in R.
So, that means this '+', which is applied to the two elements a, b in R, is a binary operation, ok. Now, we have to see whether this set of real numbers under this binary operation forms a group or not. So, this is the binary operation; now we have to see whether the other three properties are satisfied or not. The number one property is associativity, the associative property, which clearly holds for usual addition: (a+b)+c = a+(b+c) for all a, b, c in R.
Now, next is to see whether an identity exists or not. You take any element a in R, ok. Now a+e = e+a = a for any a in R, and this implies e = 0, and 0 is in R. So, this belongs to R; that means, for any a in R the identity element is 0, which exists and belongs to the set of real numbers. The third property is inverse: you take any a in R, then a+b = b+a = e, where e is 0.
So, this implies b is equal to -a, which also belongs to R. Suppose you want to find out the inverse of 2; 2 is a real number. So, the inverse of 2 is -2, which is also in R. So, we have shown that all the properties are satisfied; that means this set R over the binary operation usual addition forms a group. Now, let us see the same set of real numbers over multiplication, I mean the same set of real numbers over usual multiplication.
So, we have to exclude 0 because the inverse of 0 is not defined here. So, we have to exclude 0 here, ok.
Now, if we are taking this set, the set of real numbers excluding 0, under usual multiplication, it forms a group. Usual multiplication is a binary operation because if you take any two arbitrary elements from this set, say G, and multiply them, then the resultant is also a non-zero real number. So, it is a binary operation. Now, the first property, associativity, holds because (a.b).c = a.(b.c) for all a, b, c in G. The second property is identity. You take any element in G, say a belongs to G; then a.e = e.a = a for all a in G, and this implies e = 1, which is in G. So, this means there exists an identity element in G.
And third is inverse: if you take any a in G, then a.b = b.a = e, which is 1. So, this implies b = 1/a, which also belongs to G. Suppose you want to find out the multiplicative inverse of 2. So, it is 1/2, which is in the set of real numbers excluding 0; so, that means this set G under the binary operation usual multiplication constitutes a group.
(Refer Slide Time: 12:01)
So, here are some examples: you see, the set of integers Z, the set of rational numbers Q, and the set of real numbers R are all groups under ordinary addition, i.e., usual addition.
In each case the identity element is 0 and the inverse of an element a is -a. In fact, all these are Abelian groups, because they satisfy the commutative property also. If you take the set of integers and you take a+b or b+a, the resultant is the same, the values are the same; that means the set of integers in fact forms an Abelian group. Similarly, the set of rational numbers and the set of real numbers also form Abelian groups. Now you take this set: {1, -1, i, -i}.
Now, the set is {1, -1, i, -i}; we know that i^2 = -1. And what is the binary operation under which we are checking whether it forms a group or not: usual multiplication. We know that usual multiplication is a binary operation for this G; why it is a binary operation you can easily see. You take 1, -1, i, -i. Now, 1 multiplied with 1 is 1; 1 with -1 is -1; 1 with i is i; (-i).(-1) is i; (-i) with 1 is -i; then 1.(-i) = -i; i.i = -1; i.(-i) = 1; and (-i).(-i) = -1. Now, you have seen all the possible products of the elements of G with each other, and we have seen that all the resulting elements are in G itself. That means this usual multiplication on this G is a binary operation; that is clear, because if you multiply any element of G with any other element of G, the result is in G itself.
That means usual multiplication on this G is a binary operation. So, the first property holds; now we have to see the associative property. Associativity always holds for usual multiplication; it is always satisfied. Then we have to see the identity element. If you take any a in G, then a.e = e.a = a for all a in G; this implies e = 1, which is in G (you have seen here this 1 which is in G); so, that means the identity element also exists and is in G. Now, the existence of inverses: if you look at the existence of inverses, you take any a in G, then for the inverse a.b = b.a = e, which is 1.
Then this implies b = 1/a. If you take the inverse of 1, the inverse of 1 is 1/1, which is 1, which is in G. If you take the inverse of -1, the inverse is 1/(-1), which is -1, also in G. The inverse of i is 1/i, which is -i; it is also in G. And the inverse of -i is 1/(-i), which is i; it is also in G. So, the inverses of all the elements exist, the identity element exists, and the associativity property holds. So, we say that this set G under this binary operation, I mean under usual multiplication, constitutes a group.
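As an optional aside (my own check, not from the lecture), the group properties for G = {1, -1, i, -i} under complex multiplication can be verified in a few lines of Python:

```python
import itertools

G = {1, -1, 1j, -1j}           # the set {1, -1, i, -i}
op = lambda a, b: a * b        # usual (complex) multiplication

closure = all(op(a, b) in G for a, b in itertools.product(G, repeat=2))
assoc = all(op(op(a, b), c) == op(a, op(b, c))
            for a, b, c in itertools.product(G, repeat=3))
identity = 1 in G and all(op(a, 1) == op(1, a) == a for a in G)
inverses = all(any(op(a, b) == 1 for b in G) for a in G)

print(closure, assoc, identity, inverses)   # True True True True
```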
Now, you take the set of all 2×2 matrices whose determinants are not equal to 0; that means, invertible matrices of order 2×2. Now, the binary operation which we are choosing here is usual matrix multiplication; this set forms a non-Abelian group. Now let us see how.
(Refer Slide Time: 16:53)
So, we are taking all those 2×2 matrices whose determinants are not equal to 0. And what is the operation we are applying? The operation is usual matrix multiplication.
Now, you take any A, B belonging to S; this means determinant of A ≠ 0 and determinant of B ≠ 0. If you take the product of these 2 matrices, A.B, and take the determinant, then determinant of AB = determinant of A times determinant of B, which is also not equal to 0; this implies AB belongs to S; that means, this usual multiplication forms a binary operation on this set S, ok.
Now, we have to see associativity: matrices satisfy the associative property; we already know that (A.B).C is the same as A.(B.C) for all A, B, C in S; this is always satisfied in the case of matrices. Now, the existence of the identity element: you take any A in S; A.E = E.A = A. So, this implies E = I, and the determinant of I ≠ 0. So, this implies I also belongs to S. So, this guarantees the existence of the identity element. Now, we have to see the existence of the inverse element: you take any A in S, then A.B should be equal to B.A, which should be equal to I, and this implies B = A^(-1).
Now, the inverse exists because the determinant is not equal to 0, and since the determinant of A is not equal to 0, what will the determinant of B be? Determinant of A^(-1) = 1/determinant of A, which is also not equal to 0, because from here the determinant of A is not equal to 0, and this implies B belongs to S.
So, we have shown the existence of the inverse element also in S. So, hence we can say that this constitutes a group under usual multiplication. Now, it is a non-Abelian group; why non-Abelian? Because if you multiply A.B or you multiply B.A, they need not be equal for all A, B in S, ok.
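For a quick numerical illustration (the two matrices are my own examples, not from the slides), NumPy shows that determinants multiply under matrix multiplication while the product itself is not commutative:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])   # det = -2, so A is in S
B = np.array([[0., 1.], [1., 1.]])   # det = -1, so B is in S

# Determinant of a product equals the product of determinants (so AB stays in S).
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True

# The group is non-Abelian: AB and BA generally differ.
print(np.allclose(A @ B, B @ A))   # False
```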
So, it is a group, but it is not an Abelian group. Now, the set of integers Z excluding 0 under ordinary multiplication is not a group, because, as you can clearly see, the identity element under multiplication is 1, but if you take an element, say 2, its inverse is 1/2, which is not in this set of integers excluding 0. Hence it will not constitute a group. So, this is all about groups. Now, we come to fields and then we go to matrices, ok. Now, what is a field? Let us see; let us quickly see: a non-empty set F equipped with two binary operations, addition and multiplication, is said to be a field if it satisfies the following axioms for all a, b, c in F.
Now, here in fields, instead of one binary operation we are having two binary operations. The first binary operation we are denoting by addition (it may be any operation, but we are denoting it by addition) and the second binary operation we are denoting by multiplication. Now, the first property is that commutativity holds for addition and multiplication; that means, a+b = b+a for all a, b in F and a.b = b.a, in the case of both the binary operations, number 1 and number 2. Associativity of addition and multiplication must hold. Existence of identities: additive and multiplicative identities should exist. We are denoting 0 as the additive identity and 1 as the multiplicative identity. So, 0+a = a and 1.a = a for all a in F.
Then, existence of additive and multiplicative inverses: so, here -a is simply the additive inverse of a and 1/b is simply the multiplicative inverse of b, where b is a non-zero element in F, ok. And distributivity of multiplication over addition, from the left and from the right, must hold. So, what I want to say basically is that in the case of a field we are having two binary operations; one we are calling addition, the other we are calling multiplication. So, F with respect to addition must be an Abelian group, all non-zero elements of F with respect to multiplication must constitute an Abelian group, and the next property is distributivity of multiplication over addition, ok.
So, if these three properties hold, that means, with respect to addition F must be an Abelian group, with respect to multiplication the set of non-zero elements of F must be an Abelian group, and distributivity of multiplication over addition must hold, then we say that F is a field.
Let us see a few examples based on this. Now you see the set of real numbers.
Now, if you look at the set of real numbers, the first binary operation is usual addition and the second binary operation is usual multiplication. Now, usual addition forms an Abelian group for this set of real numbers, as we have already seen, ok. And, if you exclude 0 from the set of real numbers, it will also form an Abelian group with respect to multiplication, and multiplication distributes over addition from the left and from the right.
So, we say that the set of real numbers with the usual addition and multiplication is a field. Now, you see the set of complex numbers; again the two binary operations are usual addition and usual multiplication. Now, you see, the set of complex numbers under usual addition constitutes an Abelian group; this we can easily see.
All the properties hold. So, it constitutes an Abelian group under usual addition. And, if you exclude 0 from this C, then this will also constitute an Abelian group under usual multiplication, ok.
And multiplication distributes over addition also. So, it will constitute a field. Similarly, the third example, the set of rational numbers with the usual addition and usual multiplication, is also a field. Now, see some examples which are not fields; suppose we are considering the set of integers. Now, the set of integers under usual addition constitutes an Abelian group, it is true, but under usual multiplication it does not form a group, because if you take an element, say 2, in Z, its multiplicative inverse is 1/2, which is not in Z. So, this set of integers does not constitute a group under multiplication. So, it will not constitute a field.
Similarly, if you take the Euclidean space, say R²: R² is all (x, y) such that x, y are in R. Now, under usual (componentwise) addition it again constitutes a group, in fact an Abelian group, because the associative property holds, the identity is (0, 0), and the inverse of any (a, b) in R² is (-a, -b), which is also in R². But if you look at componentwise multiplication excluding (0, 0), there are other elements, say (1, 0), which are in R² but whose inverse does not exist. Hence, this R² under this multiplication does not constitute a group. So, it is not a field.
Now come to matrices. So, we defined groups and fields because we want to define matrices over a field K; matrices are always defined over a field, ok. So, that is why we first made clear what we mean by a field, ok.
(Refer Slide Time: 27:04)
So, a matrix A over a field K (K may be any field) is a rectangular array of scalars, usually presented in the following form. So, any matrix A can be represented in rectangular form (see Refer Slide Time: 27:04). So, if you take any a_ij, it means the ij-th entry of this matrix; the ij-th entry means the element in the i-th row and in the j-th column. Suppose we are talking about a_22: a_22 is the element in the second row and in the second column. If you are talking about a_2n, it means the element in the second row and the n-th column, ok. So, we denote a matrix by simply writing A = [a_ij], and the order of the matrix is m×n, because the number of rows here is m and the number of columns here is n. So, the order of the matrix is m×n, rows by columns. So, this is the general representation of a matrix A, where a_ij is the ij-th element; i varies from 1, 2, ..., m and j varies from 1, 2, ..., n.
These are some simple properties of matrices; we already know these things: A + 0 = 0 + A = A; A + (-A) = 0; (A+B) + C = A + (B+C), the associativity property with respect to addition; and A + B = B + A, the commutativity property. These are the very basic properties, and they already hold for matrices.
Now, properties of transpose what do you mean by transpose? Transpose means you
interchange rows and columns.
Suppose A is any matrix, say A = [1 2; 3 4; 5 6]. It is having 3 rows and 2 columns.
So, the transpose means you simply convert rows into columns, that is, A^T = [1 3 5; 2 4 6]. So, this is the way of writing a transpose: you interchange rows with columns; the first row becomes the first column, the second row becomes the second column, the third row becomes the third column, ok. So, these properties hold for the transpose: (A+B)^T = A^T + B^T; (kA)^T = kA^T, where k is any scalar; the transpose of the transpose is the matrix itself; and if A.B is defined, then (A.B)^T = B^T A^T.
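As a small sanity check (my own example matrices, not from the slides), NumPy confirms these transpose rules:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # the 3x2 matrix from the example above
B = np.array([[0, 1], [2, 3], [4, 5]])   # another 3x2 matrix (my own choice)
C = np.array([[1, 0, 2], [3, 1, 1]])     # a 2x3 matrix so that A @ C is defined

print(np.array_equal((A + B).T, A.T + B.T))   # (A+B)^T = A^T + B^T
print(np.array_equal((5 * A).T, 5 * A.T))     # (kA)^T = k A^T
print(np.array_equal(A.T.T, A))               # (A^T)^T = A
print(np.array_equal((A @ C).T, C.T @ A.T))   # (AC)^T = C^T A^T
```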
(Refer Slide Time: 30:15)
Now, the adjoint of a matrix: let A be an n×n matrix, a square matrix of order n×n. Then, how to find the adjoint: first you find the cofactor of an element a_ij, which is given by (-1)^(i+j) M_ij, where M_ij is the minor of a_ij.
This we already know: the adjoint of A is obtained by forming the matrix of cofactors and then taking its transpose; that will be the adjoint of the matrix A. And we also denote it by the expression Adj(A). Then the third property: A.Adj(A) = Adj(A).A = |A| I_(n×n). Then |Adj(A)| = |A|^(n-1); this is very easy to prove, you can simply see here (Refer Slide Time: 31:23).
And also |kA| = k^n |A|, where k is any scalar and A is a matrix of order n×n, ok. Because this k is multiplied with all the elements of A, and when you take the determinant, you can take the common factor out of each row, first row, second row, up to the n-th row. So, k^n will come out and then it will be |A|.
So, here the determinant of A works as k, because it is a scalar quantity and I is the matrix of order n×n. So, taking determinants in A.Adj(A) = |A| I gives |A| |Adj(A)| = |A|^n |I|, and |I| = 1, so it is |A|^n. So, this implies |Adj(A)| is simply |A|^(n-1).
Also, we know that if a matrix is invertible, then the inverse exists and the inverse is given by A^(-1) = Adj(A)/|A|.
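As an optional check (my own example matrix, not from the lecture), SymPy can verify these adjoint identities directly; adjugate() is SymPy's name for this adjoint:

```python
import sympy as sp

A = sp.Matrix([[2, 1], [5, 3]])          # an invertible 2x2 example (det = 1)
n = A.shape[0]
adjA = A.adjugate()                      # transpose of the matrix of cofactors

print(sp.simplify(A * adjA - A.det() * sp.eye(n)))   # zero matrix: A.Adj(A) = |A| I
print(adjA.det() == A.det() ** (n - 1))              # True: |Adj(A)| = |A|^(n-1)
print(A.inv() == adjA / A.det())                     # True: A^(-1) = Adj(A)/|A|
```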
Now this is a very simple problem; let us see, just to illustrate a few properties of matrices.
If you want to find out the determinant of 2A, it will be simple, because we know that the determinant of kA is equal to k^n |A| if A is a matrix of order n×n; here the matrix is of order 2×2 and k = 2. So, it is 2^2 |A|, which is 4 × 3 = 12. If we want to find out |Adj(A)|, it is simply |A|^(n-1); here n is 2 and the determinant of A is 3. So, it is 3^1, which is 3. If you want to find out the determinant of Adj((2A)^T), it is simply |(2A)^T|^(2-1), which is |(2A)^T| = 2^n |A^T|; the determinants of A^T and A are the same. So, it is 2^2 |A|, which equals 4 into 3, which is 12, ok. So, in this way we can simply solve this problem.
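To double-check those numbers (using a hypothetical 2×2 matrix with determinant 3, since the slide's matrix is not reproduced in the transcript):

```python
import sympy as sp

A = sp.Matrix([[3, 0], [0, 1]])          # any 2x2 matrix with det(A) = 3 works here
assert A.det() == 3

print((2 * A).det())                     # 12, since |2A| = 2^2 |A|
print(A.adjugate().det())                # 3,  since |Adj(A)| = |A|^(2-1)
print(((2 * A).T).adjugate().det())      # 12, since |Adj((2A)^T)| = |(2A)^T|
```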
So, these are some special types of matrices. Row matrix: a matrix is said to be a row matrix if it consists of only 1 row. Column matrix: a matrix is said to be a column matrix if it consists of only 1 column.
Diagonal matrix: a square matrix is said to be diagonal if its non-diagonal entries are all 0; that is a diagonal matrix. Scalar matrix: a diagonal matrix is said to be scalar if all its diagonal elements are the same, say k. Symmetric matrix: a matrix A is said to be symmetric if A^T = A, that is, a_ij = a_ji for all i, j.
We will discuss more about symmetric and skew-symmetric matrices in detail later on. Skew-symmetric matrix: a matrix A is said to be skew-symmetric if A^T = -A, that is, a_ij = -a_ji for all i and j; also, all diagonal entries of a skew-symmetric matrix are 0. Now, what are elementary row operations?
So, first we are talking about matrices over a field F, or over a field K. So, these scalars, whatever we are talking about, come from the field: if the field is the set of real numbers, then the scalars will be real, and if we are talking about the set of complex numbers, the scalars come from the set of complex numbers. Now, what are elementary row operations? Let us see: there are 3 elementary row operations on an m×n matrix A over the field F.
(Refer Slide Time: 36:31)
(1) Multiplication of any row of A by a non-zero scalar c: you see, if you take any row R_j of a matrix A and you multiply that row by a non-zero scalar c, then this is the first elementary row operation which we can apply on a matrix A. (2) The second elementary row operation is replacement of the R-th row of A by (R-th row) + c.(S-th row), where c is any scalar and R is not equal to S; that means, you take any R-th row and you replace this R-th row by the R-th row plus c times some other S-th row. Then this is the second elementary row operation on any matrix A. (3) And you can always interchange any 2 rows: you can interchange the i-th row with the j-th row, i ≠ j. So, these are the 3 basic elementary row operations. Now let us discuss this by an example.
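As a rough sketch (mine, not the lecturer's), the three operations can be written as small NumPy helper functions; the function names and the sample matrix are my own:

```python
import numpy as np

def scale_row(A, i, c):
    """Type 1: multiply row i by a non-zero scalar c."""
    B = A.astype(float).copy()
    B[i] *= c
    return B

def add_multiple_of_row(A, r, s, c):
    """Type 2: replace row r by row r + c * (row s), with r != s."""
    B = A.astype(float).copy()
    B[r] += c * B[s]
    return B

def swap_rows(A, i, j):
    """Type 3: interchange rows i and j."""
    B = A.astype(float).copy()
    B[[i, j]] = B[[j, i]]
    return B

A = np.array([[1, 2, 3], [-1, 0, 2], [2, 4, 4]])   # an arbitrary example matrix
print(add_multiple_of_row(A, 1, 0, 1))             # R2 -> R2 + R1
```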
If A and B are m×n matrices over the field F, we say that B is row equivalent to A if B can be obtained from A by a finite sequence of elementary row operations. You see, you have a matrix A and you apply some elementary row operations on that matrix and you get a new matrix B; then we say that the matrix B is row equivalent to A.
(Refer Slide Time: 38:39)
Now, we discuss this example. Let us suppose you take A (see Refer Slide Time: 38:39). Now, using elementary row operations, transform A into the identity matrix. Suppose we have to apply elementary row operations on this matrix A and we have to convert this matrix into an identity matrix.
So, how can we convert this? Let us discuss the solution first. So, this is matrix A. So, for the identity we have to make the first diagonal element 1, and in that column all the other elements must be 0.
Similarly for the second element and for the third element, I mean on the diagonal. Now, to make a 0 here, which row operation will we apply? To make a 0 here we will take this row R2 and add R1 to it, because -1 + 1 will become 0 here.
So, we apply the first elementary row operation to the row R2 and we replace R2 by R2 + R1, ok. Now, this -1 + 1 is 0; 0 + 2 is 2; 2 + 3 is 5.
So, this is the first elementary row operation which we have applied to this matrix. Now, next we have to make a 0 here, because we want to make the identity. Now, to make a 0 here we have to take the third row and subtract 2 times the first row; I mean, we have to replace the third row by R3 - 2R1, then it will become 0; 2 - 2×1 becomes 0.
Now, we have to make a 1 here to get the identity, so replace this row by 1/2 of this row, I mean replace R2 by (1/2)R2, because you want to make a 1 here, ok. Next, you divide this by -2; if you divide this by -2, or replace this row by (-1/2)R3, then the matrix becomes [1 2 3; 0 1 5/2; 0 0 1]. Now, you have to make a 0 here, a 0 here and a 0 here to complete the identity matrix. To make a 0 here with the help of this row, you simply take row 1 and subtract twice of row 2; that means, on row 1 you apply the elementary row operation R1 - 2R2. Then you replace R1 by R1 + 2R3, and the other elements remain the same. Now you want to make a 0 here to complete the identity matrix. So, you replace R2 by R2 - (5/2)R3. So, this will give the identity matrix.
So, we have applied a series of elementary row operations to convert the matrix A into an identity matrix, ok. Now, if you talk about this intermediate matrix, say this matrix: this matrix is obtained from the matrix A by two elementary row operations. So, we can say that this matrix is a row equivalent matrix to A; or in fact, we can say that any matrix up to here is a row equivalent form of the matrix A, because they are obtained by applying some elementary row operations on the matrix.
Now, let us discuss this example: suppose we want to convert this into an identity matrix by applying elementary row operations.
(Refer Slide Time: 43:38)
So, consider the matrix (see Refer Slide Time: 43:38). Now, by applying elementary row operations you want to transform this matrix into an identity matrix. So, how will you proceed?
In the first column, you see, if there is any element '1', then you interchange those rows first, ok. So, first what you do is interchange R1 and R2. So, this is [1 -1 2; 2 -1 0; -1 0 1]. The other way is to divide the first row by 2 and then apply the elementary row operations. Now you want to make a 0 here with the help of this.
So, to make a 0 here with the help of this, you replace R2 by R2 - 2R1; then only it will become 0. Now you want to make a 0 here with the help of this. So, replace R3 by R3 + R1. Now, this is already 1; you want to make a 0 here because you want to complete the identity matrix. So, again you have to apply an elementary row operation: you replace R3 by R3 + R2. Now it is -1; you have to make a 1 here.
So, you multiply this by -1, that is, you replace R3 by -R3. Now, you want to make a 0 here with the help of this. So, you replace R1 by R1 + R2. Now, you want to make a 0 here with the help of this. So, you take R1 -> R1 + 2R3. Now, you want to make a 0 here with the help of this, so R2 -> R2 + 4R3; this will give you the identity matrix. So, it will be an identity matrix now.
So, these are the elementary row operations which we apply to convert this matrix A into an identity matrix; similarly we can proceed for the second problem.
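A minimal sketch (my own, under the assumption that the matrix of this example is [2 -1 0; 1 -1 2; -1 0 1], as read from the transcript) that replays the operations just described in SymPy:

```python
import sympy as sp

A = sp.Matrix([[2, -1, 0], [1, -1, 2], [-1, 0, 1]])   # matrix from this example

A.row_swap(0, 1)                 # interchange R1 and R2
A[1, :] = A[1, :] - 2 * A[0, :]  # R2 -> R2 - 2 R1
A[2, :] = A[2, :] + A[0, :]      # R3 -> R3 + R1
A[2, :] = A[2, :] + A[1, :]      # R3 -> R3 + R2
A[2, :] = -A[2, :]               # R3 -> -R3
A[0, :] = A[0, :] + A[1, :]      # R1 -> R1 + R2
A[0, :] = A[0, :] + 2 * A[2, :]  # R1 -> R1 + 2 R3
A[1, :] = A[1, :] + 4 * A[2, :]  # R2 -> R2 + 4 R3

print(A)                         # Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```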
Matrix Analysis with Applications
Prof. Dr. S.K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 02
Echelon Form of a Matrix
Hello friends, welcome to the lecture series on Matrix Analysis with Applications. So, in the first lecture we have seen how we can apply elementary row operations to a matrix A, ok. Now we will see the echelon form of a matrix: what we mean by echelon form and how we can obtain the echelon form by applying elementary row operations. Before starting the echelon form we will first see elementary matrices: what do you mean by elementary matrices?
An n×n square matrix is called an elementary matrix E if it can be obtained from the n×n identity matrix I_n using a single elementary row operation. We have already seen in the last lecture that there are 3 basic elementary row operations: the first is you multiply any row by a non-zero scalar; the second elementary row operation is you replace any row, say the R-th row, by the R-th row plus c times some S-th row, ok; and you can always interchange any 2 rows. So these are the 3 basic elementary row operations.
Now, if you apply an elementary row operation to any matrix A, and the same elementary row operation you apply to the identity matrix I, then the matrix which is obtained from I is called an elementary matrix, ok. Now, the elementary matrix is said to be of type 1, 2 or 3 according to whether the elementary operation performed on I_n is a type 1, 2 or 3 operation respectively. Of the three operations which we have defined, if we are applying the first operation to the identity matrix, we say it is an elementary matrix of type 1; similarly for type 2 and type 3. Now, for example, interchanging the first 2 rows of I_3 produces this elementary matrix, you see: if you interchange the first 2 rows, this matrix is obtained, and this matrix we call an elementary matrix, ok.
Now, here is a result: let A be an m×n matrix over the field F, and suppose that B is obtained from A by performing an elementary row operation, ok. Then there exists an m×m elementary matrix E such that B = EA. So, there always exists an elementary matrix E such that B = EA.
In fact, E is obtained from I by performing the same elementary row operation as that which was performed on A to obtain B, ok. If you obtain B from A by applying some elementary row operation, and the same elementary row operation is applied on I_m, we obtain E. Conversely, if E is an elementary m×m matrix, then EA is the matrix obtained from A by performing the same elementary row operation as that which produced E from I_m. So, the converse statement is also true. So, let us discuss it by a few examples.
(Refer Slide Time: 04:18).
See a matrix A here; A is this matrix of order, you see, 3×4: 3 rows, 4 columns. Now in this matrix suppose you interchange the second row of A with the first row; you apply the elementary row operation interchanging R2 and R1. So, a new matrix is obtained which is row equivalent to A: the second row 2 1 -1 3 becomes the first row and the first row becomes the second row, so it is [2 1 -1 3; 1 2 3 4; 4 0 1 2], ok. And you have an identity matrix here of order 3×3, and you interchange the first and the second row.
So, this matrix is obtained, and this matrix is an elementary matrix here, ok, and EA = B: when you multiply E with A, then B is obtained; this you can easily check, ok. Now let us take the second example; you see here A is this matrix, ok.
Now, suppose we want to transform this A into I by applying elementary row operations, ok. Then how will you make this the identity matrix by applying elementary row operations? You see, the first element of the second row is 1 here, which is a non-zero element; you first interchange this row with the first row, that is, you interchange R2 and R1. You will obtain a new matrix B1, which is [1 0 -1; 0 1 0; 4 0 1]: this row comes here and this row comes here.
Now, you make a 0 here. So, replace R3 by R3 - 4R1; you get a new matrix B2, and B2 will be: the first 2 rows remain the same and the third row is [0 0 5]. Now you make a 1 here; to make a 1 here you simply divide this row by 5 and you obtain the matrix B3, which is [1 0 -1; 0 1 0; 0 0 1]. Now, to complete the identity you have to make a 0 here with the help of this. So, you replace R1 by R1 + R3. So, this will give an identity matrix B, ok.
Now, to obtain this matrix from this matrix, what is the elementary matrix? You see, we have interchanged R2 and R1. So, you apply the same elementary row operation on the identity matrix of order 3×3: you interchange R2 and R1. So, which matrix will we obtain? The matrix which is obtained is E1 = [0 1 0; 1 0 0; 0 0 1], which is obtained by interchanging R2 and R1 of the identity matrix, ok.
Now, similarly, to obtain B2 from B1, what have we done? We have replaced R3 by R3 - 4R1. Now you apply the same elementary row operation to the identity matrix to get the new elementary matrix E2, ok.
So, here if you take the identity matrix and replace the third row by the third row minus 4 times the first row, then the third row becomes [-4 0 1]. So, this is the elementary matrix E2. Similarly, now you take the third one, I mean B3; how is B3 obtained from B2? To obtain B3 from B2 you simply have to divide the third row by 5; you apply the same elementary row operation to the identity matrix to get the elementary matrix E3. So, simply divide the third row by 5; the other 2 rows remain the same. So, this is E3, and similarly E4 can be obtained by simply adding the third row to the first row; you apply the same elementary row operation to the identity matrix to get E4.
Now, by the theorem which we have just stated, to get B1 from A we have the elementary matrix E1, ok. So B1 can be written as E1 A, because E1 is the elementary matrix formed by applying this row operation to the identity matrix; similarly B2 will be E2 B1, B3 will be E3 B2, and similarly the last matrix B, which is of course an identity matrix, is E4 B3. Now, what have we obtained? If you substitute B3 over here, this is E3 B2; then you replace B2 by E2 B1 and then B1 by E1 A.
So, what we have finally obtained is B = E4 E3 E2 E1 A; so, that means, here we have applied 4 elementary row operations, and for each elementary row operation we obtained an elementary matrix, ok. To get the final matrix, which is row equivalent to the matrix A, we simply multiply E4 E3 E2 E1 with the initial matrix A, ok. Or we can say that B = EA, where E is simply the product of all the elementary matrices in this order, E4 E3 E2 E1.
So, that means, what I want to say is that if any m×n matrix A is transformed to B by applying k elementary row operations, then there exists an m×m invertible square matrix U that is a product of elementary matrices, U = E_k E_(k-1) E_(k-2) ..... E_1, such that B = UA.
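To see this concretely (my own sketch, using the matrix of the earlier example as I read it from the transcript, A = [0 1 0; 1 0 -1; 4 0 1]):

```python
import numpy as np

A = np.array([[0., 1., 0.], [1., 0., -1.], [4., 0., 1.]])   # matrix from the example

I = np.eye(3)
E1 = I[[1, 0, 2]]                 # interchange R1 and R2
E2 = I.copy(); E2[2, 0] = -4.     # R3 -> R3 - 4 R1
E3 = I.copy(); E3[2, 2] = 1/5.    # R3 -> (1/5) R3
E4 = I.copy(); E4[0, 2] = 1.      # R1 -> R1 + R3

U = E4 @ E3 @ E2 @ E1             # product of the elementary matrices
print(np.allclose(U @ A, np.eye(3)))   # True: B = U A is the identity matrix
```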
So, that matrix always exists and it is invertible. And why are elementary matrices invertible? You see, elementary matrices are obtained from the identity matrix by applying a single elementary row operation.
Say you apply some elementary row operation on a matrix and you get a new matrix B which is row equivalent to A, ok. So, this matrix can be obtained either by interchanging 2 rows, or by multiplying some row of this matrix A by a non-zero scalar, or by replacing some i-th row by the i-th row plus c times the j-th row, where i ≠ j, ok.
So, we can simply say that |A| = α|B|, where α ≠ 0, because applying an elementary row operation will not change the nature of the matrix; I mean, if it is invertible it remains invertible, ok. It simply changes the determinant of the matrix by some non-zero scalar, because we can multiply a row by a non-zero scalar.
So, elementary matrices are obtained from the identity matrix by applying elementary row operations. Since the identity matrix is invertible and has determinant 1, elementary matrices are also invertible: the determinant of the identity matrix will be some α times the determinant of E, where α ≠ 0, and |I| = 1. So, simply, the determinant of E ≠ 0, ok, so E is invertible. So, elementary matrices are always invertible, and their product is also invertible, because a product of invertible matrices is also invertible, ok.
(Refer Slide Time: 13:16).
Now, come to the echelon form of a matrix. A matrix A is called an echelon form, or is said to be in echelon form, if the following 2 conditions hold, where a leading non-zero element of a row of A is the first non-zero element in that row, ok; we are taking the leading element as the first non-zero element in that row, ok. So, property 1: all zero rows, if any, are at the bottom of the matrix. If there is a row containing all zeros, it must be at the bottom of the matrix. Property 2: each leading non-zero entry in a row (leading non-zero entry means the first non-zero entry in a row) is to the right of the leading non-zero entry in the preceding row.
If you have the second row, the first non-zero entry in the second row must be on the right side of the first non-zero entry in the first row, ok. That is, if rows 1, 2, ..., r are the non-zero rows of the matrix and the leading non-zero entry of row i occurs in column k_i, i from 1 to r, then k_1 < k_2 < ... < k_r. So, let us discuss this by some examples.
Now, you focus on the first matrix. First of all, the zero rows are at the bottom: the row containing all 0 elements is at the bottom, so the first property holds. Now you see the first leading element; leading means the first non-zero element, and it is here, while the first leading non-zero entry of the next row is this 1.
The first non-zero entry of each row is to the right of the leading entry of the preceding row. So, this satisfies the properties of the echelon form of a matrix, and hence we can say that this is an echelon form.
Now, this is the second example. In the second example, you see that there is no row containing all zeros, ok. Now, this element is the first non-zero element in the second row and this element is the first non-zero element in the third row, and the leading element of each non-zero row is to the right of the leading element of the preceding row. So, this is an echelon form of some matrix.
However, if you see the third example here, this is not an echelon form, because this row containing all 0 elements is not at the bottom of the matrix. So, this is not an echelon form. Here, if you see in this example, this is the first non-zero element of the first row and this is the first non-zero element of the second row, but this non-zero element is on the left of the first non-zero element of the preceding row. So, this is not an echelon form, because it must be on the right side; we would have to make a 0 here to make it an echelon form. So, this is not an echelon form. Now let us try this example, to see how we can obtain the echelon form of the matrix.
(Refer Slide Time: 17:16)
So, you apply an elementary row operation to make a 0 here: you replace R2 by R2 + 4R1; this will make this element 0. Now, this is the first non-zero element of the first row, which is 1, and this is the first non-zero element of the second row, which is -12, and it is to the right of the leading entry of the first row; there is no row containing all 0 elements. So, we can say that this is the echelon form of this matrix A. So, in this way we can find the echelon form of a matrix.
Now, let us consider a second example. You see, you have to form the echelon form of this matrix. So, how can we do that?
(Refer Slide Time: 19:14)
It is [2 4 -3 -2; -2 -3 2 -5; 1 3 -2 2]; now you want to frame its echelon form. So, you leave the first leading non-zero element as it is and make 0 the first entry of the second row with the help of the first row. So, we apply an elementary row operation: you replace R2 by R2 + R1.
So, this is [2 4 -3 -2; 0 1 -1 -7; 1 3 -2 2]; you leave the second row as it is. Now you make 0 the first entry of the third row with the help of R1. So, you replace R3 by R3 - (1/2)R1. So, it is [2 4 -3 -2; 0 1 -1 -7; 0 1 -1/2 3]. Now, below the leading entry of the second row all the elements must be 0. So you pick out the leading element and, leaving this element, make 0 all the elements below it in that column; then take the leading element of the second row and make 0 everything below it. So, here you make a 0: you simply replace R3 by R3 - R2.
So, this will simply be [2 4 -3 -2; 0 1 -1 -7; 0 0 1/2 10]. So, you see that there is no row containing all zeros; if you take the first leading non-zero entry, all the elements below it are zero, and for the leading non-zero entry of the second row all the elements below it are 0; so this is the echelon form of this matrix, ok. So, this is all about the echelon form; we will see some more illustrations of echelon form and its applications in the next lecture.
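As an optional cross-check (my own sketch), the same three row operations can be replayed in SymPy on the matrix of this example:

```python
import sympy as sp

A = sp.Matrix([[2, 4, -3, -2], [-2, -3, 2, -5], [1, 3, -2, 2]])

A[1, :] = A[1, :] + A[0, :]                        # R2 -> R2 + R1
A[2, :] = A[2, :] - sp.Rational(1, 2) * A[0, :]    # R3 -> R3 - (1/2) R1
A[2, :] = A[2, :] - A[1, :]                        # R3 -> R3 - R2

print(A)   # Matrix([[2, 4, -3, -2], [0, 1, -1, -7], [0, 0, 1/2, 10]])
```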
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 03
Rank of a Matrix
Hello friends. So, welcome to the lecture series on Matrix Analysis with Applications. So, this lecture is on the Rank of a Matrix. What we mean by rank and how it is important for solving systems of linear equations, we will see in this lecture and in the next lecture. So, before defining the rank of a matrix, let us see what we mean by linear dependence or linear independence of elements of R^n or C^n.
So, let v_1, v_2, ..., v_k; let us suppose these are k elements of R^n or C^n. What do you mean by R^n? R^n means all those (x_1, x_2, ..., x_n) such that x_i belongs to R for all i.
That means, R^n is simply the set of n-tuples such that each x_i belongs to R. Now, if you think about C^n, C^n is simply all (x_1, x_2, ..., x_n) such that x_i belongs to the set of complex numbers for all i.
Now, in this definition, these v_1, v_2, ..., v_k are the elements, or vectors (we also call them vectors), of R^n or C^n, ok. That means, each v_i consists of an n-tuple which is either in R^n or in C^n. Then these elements or vectors are said to be linearly dependent, or LD, if there exist scalars α_1, α_2, ..., α_k, not all 0, such that α_1 v_1 + α_2 v_2 + ... + α_k v_k = 0 ............(1), ok.
And if equation (1) is satisfied only for α_1 = α_2 = ... = α_k = 0, that is, all the α's are 0, then these vectors are called linearly independent. Now, let us understand what we mean by this.
Now, you have vectors v_1, v_2, ..., v_k. Let us suppose these belong to R^n, each v_i belongs to R^n, ok. Now, you take α_1 v_1 + α_2 v_2 + ... + α_k v_k; this is called a linear combination of v_1, v_2, ..., v_k. These α's are some scalars, some constants, ok; if you change the α's, the resulting vector will change, but all such vectors are simply linear combinations of the v_i's.
(Refer Slide Time: 04:53)
Now, if α_1 v_1 + α_2 v_2 + ... + α_k v_k = 0 holds with at least one α_i ≠ 0, then v_1, v_2, ..., v_k are called linearly dependent, ok.
So, what does that mean? It means that if the vectors are linearly dependent, then there exists at least one vector which can be expressed as a linear combination of the remaining vectors: suppose some α_p ≠ 0; then you can divide the relation by α_p and write v_p in terms of v_1, ..., v_(p-1), v_(p+1), ... and so on. So, what I want to say is that if vectors are linearly dependent, then there always exists at least one vector, one element in that set, which can be expressed as a linear combination of the remaining vectors. And if the vectors are linearly independent, then none of the vectors can be expressed as a linear combination of the remaining vectors, ok.
(Refer Slide Time: 07:38)
Now, let us discuss this by a few examples. So, what do you mean by linearly independent or dependent vectors? Let us discuss all these things.
So, the first problem is: (1, 2), (0, 2), (3, 4). These vectors are in R². We have to see whether this set of vectors is linearly independent or dependent.
Now, you can simply see by observation that (3, 4) can be expressed as 3(1, 2) - 1(0, 2). So, we have expressed the vector (3, 4) as a linear combination of these 2 vectors. So, what does it mean? It means that these vectors are linearly dependent. So, this is linearly dependent, LD; this set is LD, ok, because we can express one vector as a linear combination of the remaining vectors.
The other way is: you take a linear combination of these vectors and put it equal to 0. So, what does it imply? It implies α_1(1, 2) + α_2(0, 2) + α_3(3, 4) = (0, 0); that means, α_1 + 3α_3 = 0 and 2α_1 + 2α_2 + 4α_3 = 0. Now, we are having 2 equations and there are 3 unknowns, and the system is homogeneous; the right-hand side is 0, ok. So, you can arbitrarily choose any value of α_3, which need not be 0, ok.
Then you can find out α_1 and α_2; so that means, there exists some α which is not equal to 0, and that means this set of vectors is linearly dependent, ok.
Now, for the second problem: (2, 1, 0), (1, 0, 2), (0, 1, 2); these are in R³, and you have to see whether this set of vectors is linearly independent or dependent. So, you take the linear combination of these vectors and put it equal to 0, ok. So, this implies 2α_1 + α_2 = 0, then α_1 + α_3 = 0, and then 2α_2 + 2α_3 = 0.
From the first equation we are getting α_2 = -2α_1; when we substitute it into the third equation we are getting -2α_1 + α_3 = 0, that means α_3 = 2α_1, ok. And when we substitute this into the second equation, we get 3α_1 = 0, so α_1 = 0, and α_3 = 2α_1 ⇒ α_3 = 0 and α_2 = 0.
So that means we are getting only one solution, which is the zero solution; that means all the α's are 0; it is the only solution of this equation; that means these vectors are linearly independent.
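A quick way to check such sets on a computer (my own sketch, not part of the lecture) is to stack the vectors as rows and look at the rank:

```python
import numpy as np

dependent = np.array([[1, 2], [0, 2], [3, 4]])             # first example: 3 vectors in R^2
independent = np.array([[2, 1, 0], [1, 0, 2], [0, 1, 2]])  # second example: 3 vectors in R^3

# Vectors are linearly independent exactly when the rank equals the number of vectors.
print(np.linalg.matrix_rank(dependent))    # 2  -> less than 3 vectors, so LD
print(np.linalg.matrix_rank(independent))  # 3  -> equals 3, so LI
```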
(Refer Slide Time: 12:14)
Now, if you take this set (0, 0, 0), (1, 2, 3), and (3, 4, 5), the next example: this is clearly linearly dependent. Why is it linearly dependent? Because this vector (0, 0, 0) can always be expressed as 0(1, 2, 3) + 0(3, 4, 5). That means, one of the vectors of this set can be expressed as a linear combination of the remaining 2 vectors; this means the set is LD, the set is linearly dependent.
Now, the next problem: it is (2, 3, 4), then (-1, 4, 2), then (1, 7, 6). Now, you have to see whether this set is LI or LD. So, you take the linear combination of these 3 vectors, put it equal to 0, and if you find only one solution, that is, all the α's are 0, then these vectors are linearly independent; otherwise they are linearly dependent. One thing you can easily observe here is that one of the vectors can be expressed as a linear combination of the other two; indeed, (1, 7, 6) = (2, 3, 4) + (-1, 4, 2). This clearly means the set is an LD set. So, in this way we can check whether a set is linearly dependent or independent. Now, we will see some properties of linearly dependent and linearly independent sets.
The first property is: any set containing the 0 element is always LD. This is very easy to show. If you have any set containing the 0 element, then that 0 element can always be expressed as a linear combination of the remaining vectors, where the scalars are all 0, ok. So, since this 0 vector can be expressed as a linear combination of the remaining vectors, the set is clearly LD.
In the third example we have seen how such a set is LD, because this (0, 0, 0) can be expressed as 0(1, 2, 3) + 0(3, 4, 5). That means, the 0 vector can be expressed as a linear combination of the remaining 2 vectors, ok.
The next property is: any set S, a subset of R^n, containing n+1 or more elements is always LD. This is easy to prove, you see. You see, we have some set, say v_1, v_2, ..., v_(n+k), suppose, and all the v_i's are in R^n, ok.
So, this is a set containing n+1 or more elements, k ≥ 1. And for each v_i, you see, if you are talking about v_1, then v_1 is in R^n; that means, it is v_1 = (v1_1, v2_1, ....., vn_1), and similarly if you talk about v_i, then v_i = (v1_i, v2_i, ....., vn_i) for all i. Now, you take the linear combination of these vectors: α_1 v_1 + α_2 v_2 + ... + α_(n+1) v_(n+1) + ... + α_(n+k) v_(n+k) = 0.
Now, when you substitute v_1 from here into here, similarly v_2, which is an n-tuple, and similarly up to v_(n+k), then when you multiply v_1 with α_1, multiply v_2 by α_2 and so on, you will get n equations with n+k unknowns. So, we are getting n equations with n+k unknowns, k ≥ 1.
So, there are more unknowns than equations; that means, there are many solutions. And many solutions means there exists a non-zero solution, and a non-zero solution means the set is LD, the set is linearly dependent. So, this set is always LD, ok.
The third property is: if a set is LD, then any superset of it is also LD. This is again easy to show. You see, if we have a set, say v_1, v_2, ..., v_p, and say this set is LD, then that means there exists some v_k, where k is between 1 and p, such that this v_k can be expressed as a linear combination of the remaining vectors.
Now, take a superset of this set, say the superset is v_1, v_2, ..., v_p, ..., v_(p+m); take a superset of this set containing this set and some more elements. Now, if v_k can be expressed as a linear combination of the remaining vectors, then this v_k can also be written as α_1 v_1 + .... + α_(k-1) v_(k-1) + α_(k+1) v_(k+1) + ... + α_p v_p + 0.(remaining vectors); that means, it is a linear combination of the remaining vectors of the superset.
So, this superset will also be linearly dependent, because one element v_k which is in this set can be expressed as a linear combination of the remaining vectors, ok. So, this set is also LD. So, we have shown that if you have any set which is LD, then any superset of this set is also LD, ok.
The next property is: if a set is LI, then any subset of it is also LI. If not, that is, if some subset of it were LD, then by property 3 the superset of that subset, which is our original set, would also be LD; however, the original set is LI. So, this would contradict statement 3, ok. So, therefore, if a set is LI, then any subset of it is also LI.
Now, come to the rank of a matrix. How do we define the rank of a matrix? Say we are having a matrix of order m×n.
Then the rank of the matrix A, denoted by r(A) or rank of A, is defined as the number of non-zero rows in the echelon form of the matrix. We have already discussed the echelon form of a matrix. Now, the number of non-zero rows in the echelon form is called its rank; or it can also be defined as the maximum number of linearly independent rows or columns of the matrix A, ok. Suppose you have a matrix A = [a_11 a_12 .... a_1n; a_21 a_22 .... a_2n; .... ; a_m1 a_m2 .... a_mn] of order m×n.
Now, you take the first column, say C_1, as a vector, the second column as the second vector, the third as the third vector, and the n-th column as the n-th vector. The maximum number of linearly independent vectors, the number of independent columns, is the rank. I mean, we are taking the first column as the first vector, the second column as the second vector, and the n-th column as the n-th vector.
Similarly, if you take the rows: the first row is the first vector R_1, and the m-th row is the m-th vector. So, we count the number of linearly independent rows or columns, considering one column as one vector or one row as one vector, ok. So, the maximum number of linearly independent rows or columns is the rank of the matrix.
(Refer Slide Time: 22:31)
Now, let us discuss a few examples. Now, this is the matrix A. Now, it is clear that this is the echelon form of some matrix, because you see the first leading non-zero element is 1 in the first row and below it all elements are 0.
In the second row, the first leading non-zero element is 2 and below it all elements are 0, and all rows containing only zero elements are at the bottom of the matrix, ok. So, now, how many non-zero rows is it having? It is having only 2 non-zero rows, the first row and the second row. So, we can say the rank of this matrix is 2, ok. We can also say it like this: the vector (1 2 1) can never be expressed as a multiple of (0 2 4), ok. So, these two vectors are linearly independent. So, the maximum number of linearly independent rows this matrix is having is only 2, ok. So, the rank of this matrix is 2.
Now, see here: if you have this matrix, the first row is (2 2 3), the second row is (-2 -2 -3), the third row is (6 6 9). Now, if you multiply the first row by -1 you will get the second row; that means, the second row is simply -1 times the first row, ok. So that means, these 2 rows are linearly dependent. Similarly, if you multiply the first row by 3 you will get the third row; that means, this row and this row are also dependent.
So, what is the maximum number of linearly independent rows this matrix is having? Only one, that is (2 2 3), because the other 2 rows are linearly dependent on the first row. So, the maximum number of linearly independent rows this matrix is having is only one. So, the rank of this matrix is 1. You can also find the echelon form of this matrix and then count the number of non-zero rows of the matrix.
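For reference (my own check; the first matrix is as I read it from the transcript), NumPy's matrix_rank confirms these two ranks:

```python
import numpy as np

A = np.array([[1, 2, 1], [0, 2, 4], [0, 0, 0]])      # the echelon-form example above
B = np.array([[2, 2, 3], [-2, -2, -3], [6, 6, 9]])   # rows are multiples of (2, 2, 3)

print(np.linalg.matrix_rank(A))   # 2
print(np.linalg.matrix_rank(B))   # 1
```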
(Refer Slide Time: 24:56)
Now, let us find the echelon forms of these matrices; I mean the echelon form and then the rank.
Now, what is the matrix we are having? It is [1 3; -1 2; 2 1]. Let us find the echelon form of this matrix. The echelon form obtained after a series of elementary row operations is [1 3; 0 5; 0 0], and the rank of this matrix will be 2. In fact, the order of the matrix is 3×2.
So, the rank can never be more than 2, because whatever elements you have in the second column, you can always make zeros below the leading entry to convert this matrix into an echelon form. So, the maximum number of linearly independent rows, or the maximum number of non-zero rows, of this matrix is only 2, ok. It may be less than 2 also, but the maximum is only 2. You see, if you have this type of matrix, [1 2; 2 4; 3 6], what is the echelon form of this matrix? It is [1 2; 0 0; 0 0]. So, the rank of this matrix is simply 1, ok.
So, I want to say that if a matrix has order m×n, the rank of this matrix can never exceed n and similarly can never exceed m. So, it will always be less than or equal to the minimum of m and n, ok.
Now, you have a second example; suppose it is [2 -1 4 5; 4 2 2 1]. So, it is having 2 rows and 4 columns. Of course, by simply seeing the order of the matrix, I can say that the rank of this matrix, if it is A, is always less than or equal to 2. This I can say surely, because the rank of a matrix is always less than or equal to the minimum of m and n, because the rank can never exceed m and can never exceed n.
Now, the echelon form of this matrix is [2 -1 4 5; 0 4 -6 -4]. So, how many non-zero rows is it having? It is having 2 non-zero rows. So, we can say that the rank of A is 2, ok.
Now, the next problem: suppose you want to find out the conditions on α, β for which this matrix has rank 1, 2 or 3. So, this is also a simple problem; let us try to find it. So, what is the matrix we are having?
(Refer Slide Time: 28:59)
The matrix is [α 1 2; 0 2 β; 1 3 6]. So, first find its echelon form, and then only can we say something about the rank of this matrix A. So, the echelon form can be deduced from A after applying a series of elementary row operations. The echelon form of A is [1 3 6; 0 2 β; 0 0 (1-3α)(4-β)/2]. Now, for which values of α and β is the rank of this matrix 1? You see, this leading element is non-zero. So, the rank will always be 2 or more than 2: the rank is either 2 or 3; the rank can never be 1.
So, first consider rank 1. For no values of α, β is the rank 1, because this case is not possible; I mean, this case is not possible. For no values of α, β is the rank 1, ok.
Now, for rank 2: the rank of this matrix is 2 if (1-3α)(4-β)/2 = 0. Now, the second row cannot be zero, because this entry is 2, ok. So, the rank will be 2 if either α = 1/3 or β = 4, and the rank of A will be 3 if (1-3α)(4-β)/2 ≠ 0; so that means, α ≠ 1/3 and β ≠ 4. So, whenever you want to find out the rank of any m×n matrix, you first write its echelon form and then find its number of non-zero rows. From the number of non-zero rows you can find its rank.
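To cross-check symbolically (my own sketch; the matrix is the one reconstructed above, which is my reading of the slide), SymPy gives the same conditions:

```python
import sympy as sp

a, b = sp.symbols('alpha beta')
A = sp.Matrix([[a, 1, 2], [0, 2, b], [1, 3, 6]])

# Generic rank (treating alpha, beta as indeterminates) is 3.
print(A.rank())                                # 3
# The determinant vanishes exactly when alpha = 1/3 or beta = 4, so the rank drops to 2.
print(sp.factor(A.det()))                      # proportional to (3*alpha - 1)*(beta - 4)
print(A.subs({a: sp.Rational(1, 3)}).rank())   # 2
print(A.subs({b: 4}).rank())                   # 2
```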
Now, we have some properties of the rank of a matrix. The first property: only the zero matrix has rank 0; only the null matrix has rank 0, ok. Of course, the rank of a matrix can never be a fraction, because it is a number of non-zero rows, ok. Elementary row and column operations on a matrix are rank preserving; that means, it does not change the rank of a matrix if you apply row operations or column operations; it will not alter the rank of the matrix. The rank remains the same because it is the maximum number of linearly independent rows or columns, and hence the rank of A is the same as the rank of A^T.
The rank of c.A, where c is any non-zero scalar, is always equal to the rank of A, you see. If you multiply a matrix by a non-zero scalar, it will not change the maximum number of linearly independent rows or columns it has. So, the rank will remain the same. This I have already explained: the rank of A is always less than or equal to the minimum of m and n.
Now, if you have a square matrix of order n×n and the rank is n: if the rank is n, this means the matrix does not have any zero row, because the order of the matrix is n×n and the rank is n; that means, the echelon form of the matrix does not have any row containing all 0 elements. And that means the determinant of the matrix is not equal to 0, because we have already discussed that if A is a matrix and B is its echelon form, then the determinant of A is some α|B|, where α ≠ 0.
So, if this echelon form does not have any zero row (it is a square matrix), then its determinant is not equal to 0. So, the determinant of A will not be equal to 0. And conversely, if the determinant is not equal to 0, that means the echelon form of the matrix does not have any row containing all zeros; that means, the rank of that matrix is n.
So, from here we can also conclude that if the rank of a square matrix is less than n, say n-1, n-2 or anything smaller, then the determinant of A is always 0. Because if the rank of the matrix is less than n, that means its echelon form contains at least 1 row containing all 0 elements. So, the determinant of the echelon form of the matrix is 0, and hence the determinant of the original matrix is also 0, ok. So, these are some of the properties of the rank of a matrix.
Thank you.
Matrix Analysis with Applications
Dr. S.K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 04
System of Linear Equations-I
Hello friends, so welcome to lecture series on Matrix Analysis with Applications. In the last lecture we have seen how we can find the rank of a matrix. I have discussed that the rank of a matrix is nothing but the number of non-zero rows in the echelon form of that matrix, or equivalently the maximum number of linearly independent rows or columns of the matrix. We have also seen some of the important properties of the rank of a matrix. Now, how it is useful for solving systems of linear equations we will see in this lecture.
So, what is a system of linear equations? Let us see. A system of linear equations is a set, or a list, of linear equations with the same unknowns. Consider the following system of m linear equations with n unknowns: you see, we are having here m equations, the equations are linear, and they share the same set of variables x_1 to x_n. So, this system has n unknowns and m equations. The a_ij's are known, the right hand side is also known, and we have to find the x_j's. Here j varies from 1 to n; these are the unknowns, ok.
(Refer Slide Time: 02:03)
Now, this can also be expressed in matrix form as follows. You see, we write Ax = b, where A is the coefficient matrix
A = [a_11 a_12 ... a_1n; a_21 a_22 ... a_2n; ... ; a_m1 a_m2 ... a_mn],
x = [x_1 x_2 ... x_n] is the unknown vector, and b = [b_1 b_2 ... b_m] is the right hand side. So, this A is called the coefficient matrix, this x is called the unknown vector which is to be found, and this b is called the constant vector or the right hand side column vector. So, the matrix representation of a system of linear equations with n unknowns and m equations is given by this expression. Now, systems of linear equations are of two types: either homogeneous or non-homogeneous.
What do you mean by homogeneous? You see, if in the right hand side, that is this b, all the b_i's are 0, then the system of equations is called a homogeneous system; and if there exists at least one b_i which is not equal to 0, then the system of equations is called a non-homogeneous system of equations.
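As a small illustration of this matrix form (a sketch only; the numbers below are made up, not taken from a slide), we can store A, x and b with NumPy and test whether a system is homogeneous:

```python
import numpy as np

# A made-up system of m = 2 equations in n = 3 unknowns written as Ax = b.
A = np.array([[1.0, 2.0, -1.0],
              [3.0, 0.0,  4.0]])   # coefficient matrix, m x n
b = np.array([5.0, 6.0])           # right hand side, length m

# The system is homogeneous exactly when every b_i is 0.
print("homogeneous?", bool(np.all(b == 0)))   # False here, so it is non-homogeneous
```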
(Refer Slide Time: 03:43)
The system of linear equations Ax = b is called consistent if it has one or more solutions. Systems of equations are either consistent or inconsistent; consistent means it has a solution, which may be one or more than one, and inconsistent means no solution, ok. Now, a system of linear equations cannot have finitely many solutions that are more than one; that means, whether you are talking about a homogeneous or a non-homogeneous system of linear equations, there are only three possibilities.
The system may have a unique solution, that is only one solution, or it may have no solution, ok, or it may have infinitely many solutions. It can never have finitely many solutions that are more than one. Why? You can see here: you have the system of equations Ax = b, where A is an m×n matrix, x is a vector in Rⁿ, and b is again a vector, in Rᵐ, ok.
(Refer Slide Time: 04:51)
Now, suppose this system has two solutions; say x_1 and x_2 are two solutions of Ax = b. If x_1 and x_2 are two solutions, this means Ax_1 = b and Ax_2 = b.
If it is a solution, this means it satisfies the system of equations. Now take any x̄ = λx_1 + (1-λ)x_2, where λ is any real number. If you take an x̄ defined like this, then Ax̄ = A(λx_1 + (1-λ)x_2) = λAx_1 + (1-λ)Ax_2, and Ax_1 = b, also Ax_2 = b from here.
So, it is λb + (1-λ)b = b. So, this implies Ax̄ = b; that means x̄ is also a solution of the system of linear equations Ax = b, where x̄ = λx_1 + (1-λ)x_2. So, as you vary λ in R, you will get so many x̄ satisfying this equation. So, this system will have infinitely many solutions, ok. So, the system can never have finitely many solutions that are more than one; if it has two solutions, then it has infinitely many solutions, ok.
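We can see this argument numerically as well (a sketch with made-up data, not from a slide): take a system that has two different solutions and check that every combination λx_1 + (1-λ)x_2 also solves it.

```python
import numpy as np

# A made-up system with more than one solution: the single equation x1 + x2 = 2.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

x1 = np.array([2.0, 0.0])   # one solution
x2 = np.array([0.0, 2.0])   # another solution

# Every combination lambda*x1 + (1 - lambda)*x2 is again a solution.
for lam in [-1.0, 0.25, 0.5, 3.0]:
    x_bar = lam * x1 + (1 - lam) * x2
    print(lam, np.allclose(A @ x_bar, b))   # True for every lambda
```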
(Refer Slide Time: 07:03)
Now, first we will consider how we can find the solution of a homogeneous system of equations; a homogeneous system of equations means the right hand side is 0, ok. Now, a homogeneous system is always consistent; it can never have no solution. The reason is: if you take Ax = 0, then x = 0, which is the trivial solution, always satisfies the system of linear equations; that means the homogeneous system is always consistent. So, we have only two possibilities: either it has a unique solution or it has infinitely many solutions, ok.
Now, when can we say that it has a unique solution or infinitely many solutions? The unique solution is also called the trivial solution, which is the zero solution, x = 0, all x_i's are 0; and infinitely many solutions are also called non-trivial solutions, where some x_i may not be 0. Now how can we find out whether the system has a unique solution or infinitely many solutions? We can use the concept of rank. Let us consider a matrix of order m×n and suppose the rank of this matrix is r.
Now, the rank of this matrix is r; what does it mean? It means that this coefficient matrix A has r linearly independent rows or columns, because the rank is the maximum number of linearly independent rows or columns. If the rank is r, this means the matrix has r linearly independent rows or columns.
Now, for this coefficient matrix A the number of unknowns is n; in Ax = 0 the number of unknowns is n. Now, if r = n, that means the rank of the matrix equals n, the number of unknowns; that is, you are taking the homogeneous system Ax = 0, where A is a matrix of order m×n and x = [x_1, x_2, ..., x_n].
(Refer Slide Time: 09:28)
Now, this system either has a unique solution, as we have discussed, or it has infinitely many solutions. The first is also called the trivial solution and the second the non-trivial solution. A unique solution means x = 0, and infinitely many solutions means not all the x_i's are equal to 0, ok.
Now, if the rank of the matrix is r, there are only two possibilities: either the rank is equal to n, the number of unknowns, or it is less. If the rank is equal to n, this means the maximum number of linearly independent equations is n; because the rank is n, the echelon form of this matrix has n linearly independent rows, and that means we have n linearly independent equations with n unknowns. So, the system will have a unique solution, which is the zero solution or trivial solution.
Now, otherwise r < n, because r is always less than or equal to the minimum of m and n, ok. If r is less than n, this means the number of linearly independent equations is less than n; that is, there are more unknowns than linearly independent equations. So, the system will have infinitely many solutions, ok.
So, if r < n, there are infinitely many solutions, that is, non-zero or non-trivial solutions. Now, how to find all those infinitely many solutions? You can choose n - r variables arbitrarily and then solve the remaining system for the other variables. Now, let us discuss a few examples based on this. Suppose you are having this system, ok.
(Refer Slide Time: 12:48)
The system of equations is this; the coefficient matrix is [1 -3 7; 2 -1 4; 1 2 9]. Now, how can we see whether this system has a unique solution or infinitely many solutions? There are only two possibilities because the system is homogeneous, and we can decide by calculating the rank of this matrix. So, what is the rank of this matrix?
The coefficient matrix is [1 -3 7; 2 -1 4; 1 2 9]. Now, let us find the rank of this matrix: convert it to its echelon form, which is [1 -3 7; 0 5 -10; 0 0 12]. So, the rank of this matrix is 3, which is equal to the number of unknowns.
So, this implies a unique solution, and a unique solution means the trivial solution, that is x_1 = x_2 = x_3 = 0; all the x's are 0, ok. So, hence we can find the solution of this system. For the next problem, suppose we are having this problem.
You see, here we are having 4 equations with 3 unknowns, and we have to see whether this system has a unique solution or infinitely many solutions. So you can write the coefficient matrix, which is [2 -5 8; 2 7 -1; 4 2 7; 4 -22 25], and find the rank of this matrix by converting it into its echelon form.
So, we obtain the rank as 2, and the number of unknowns is 3. The rank of the matrix is less than the number of unknowns; this means the system has infinitely many solutions. Now, to find all those solutions you can start with this matrix itself.
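Both rank computations can be checked with SymPy (a sketch; the matrices are the ones just discussed): when the rank equals the number of unknowns the null space contains only the zero vector, and when the rank is smaller the null space basis describes the infinitely many solutions.

```python
from sympy import Matrix

# First example: rank 3 equals the number of unknowns, so only the trivial solution.
A1 = Matrix([[1, -3, 7],
             [2, -1, 4],
             [1,  2, 9]])
print(A1.rank(), A1.nullspace())   # 3, []  -> x1 = x2 = x3 = 0

# Second example: rank 2 < 3 unknowns, so infinitely many solutions.
A2 = Matrix([[2,  -5,  8],
             [2,   7, -1],
             [4,   2,  7],
             [4, -22, 25]])
print(A2.rank(), len(A2.nullspace()))   # 2, 1  -> a one-parameter family of solutions
```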
(Refer Slide Time: 15:47)
So, now let us solve these two problems: what are the conditions on a and b such that the system of linear equations has a unique solution or infinitely many solutions? We have to find the conditions on a and b for each of these cases. So, let us try this problem; what is the coefficient matrix here?
(Refer Slide Time: 17:22)
The coefficient matrix is [1 2 3; -1 -2 a; 2 b 6]. Now, to find the conditions on a and b for which the system has a unique solution or infinitely many solutions, we first find the echelon form of this matrix; then we can impose the condition on a and b for the rank of the matrix to be equal to 3, because if the rank of the matrix is 3, it equals the number of unknowns and this means a unique solution, and when the rank of the matrix is less than 3, in that case there are infinitely many solutions.
So, the echelon form of the matrix is [1 2 3; 0 b-4 0; 0 0 a+3]. Now, for a unique solution the rank of A must be equal to the number of unknowns, which is 3. So, the rank of A must be 3; that means there is no row containing all zeros. So, b-4 ≠ 0 and a+3 ≠ 0; then only none of the rows will be zero, and if either becomes 0 then the rank will be less than 3. So, this implies a ≠ -3 and b ≠ 4; then only the rank of this matrix will be 3.
So, for a unique solution we have these two conditions. Now, for infinitely many solutions the rank of A must be less than 3. Rank of A less than 3 means there exists at least one row containing all zero elements. So, we may have b-4 = 0, or a+3 = 0, or both of them 0. So, we can write: either a = -3 or b = 4 or both, and then the rank of this matrix will be less than 3.
If both conditions are satisfied the rank is 1, and if only one condition is satisfied the rank is 2; in both cases the system has infinitely many solutions, ok.
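A symbolic computation (a sketch using SymPy) reproduces this case analysis: the echelon form keeps a and b as symbols, and the rank can then be read off for particular values.

```python
from sympy import Matrix, symbols

a, b = symbols('a b')
A = Matrix([[ 1,  2, 3],
            [-1, -2, a],
            [ 2,  b, 6]])

print(A.echelon_form())                 # the pivots involve b - 4 and a + 3
print(A.subs({a: -3, b: 4}).rank())     # 1 -> infinitely many solutions
print(A.subs({a: -3, b: 0}).rank())     # 2 -> still infinitely many solutions
print(A.subs({a:  0, b: 0}).rank())     # 3 -> only the trivial solution
```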
Now, similarly, if you try this next system, how can we solve it? Let us see what the coefficient matrix is here.
(Refer Slide Time: 21:16)
The coefficient matrix here is [-1 0 2; 2 λ 4; 1 μ 1]. First find the echelon form of this matrix, which is [-1 0 2; 0 λ 8; 0 0 3-8μ/λ] (assuming λ ≠ 0). Consider first the case λ = 0: then the reduced matrix is [-1 0 2; 0 0 8; 0 μ 3], and its rank is 3 if μ ≠ 0, which gives a unique solution, while if μ = 0 the rank becomes 2 and there are many solutions. So, one case is over. Now, if λ ≠ 0, the reduced matrix is [-1 0 2; 0 λ 8; 0 μ 3], and you can apply one more row operation: replace R_3 by R_3 - (μ/λ)R_2.
So, what you obtain is [-1 0 2; 0 λ 8; 0 0 3-8μ/λ], because you want to convert this into its echelon form. Now, for a unique solution the rank of this matrix must be 3, and the number of unknowns is also 3. We need λ ≠ 0, which is already assumed in this case, so we do not need that as an extra condition, and 3 - 8μ/λ ≠ 0, which implies 3λ ≠ 8μ.
For many solutions this last entry must be 0, because we want the rank to be less than 3, ok. So, 3 - 8μ/λ = 0, which implies 3λ = 8μ. Now, look back at the case λ = 0: for a unique solution we needed μ ≠ 0, and if we substitute λ = 0 into 3λ ≠ 8μ we get exactly μ ≠ 0, so that case is already covered by this condition.
And if λ = 0, for many solutions we needed μ = 0, which is exactly what 3λ = 8μ gives. So, we can combine the two cases and simply say that for a unique solution 3λ ≠ 8μ, and for infinitely many solutions 3λ = 8μ.
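If the coefficient matrix is the one reconstructed above (this reading of the garbled slide is an assumption), SymPy confirms the combined condition at once, because for a square homogeneous system a unique solution is equivalent to a non-zero determinant:

```python
from sympy import Matrix, symbols, simplify

lam, mu = symbols('lambda mu')

# Coefficient matrix as reconstructed above (the placement of lambda and mu
# in the second column of rows 2 and 3 is an assumption).
A = Matrix([[-1,   0, 2],
            [ 2, lam, 4],
            [ 1,  mu, 1]])

print(simplify(A.det()))   # 8*mu - 3*lam, so a unique solution iff 3*lam != 8*mu
```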
So, in this way we can find the conditions on λ and μ such that the system of equations has a unique solution or infinitely many solutions, ok. So, in this lecture we have seen how we can solve a homogeneous system of equations using the rank approach. You find the rank of the matrix; if the rank of the matrix is equal to the number of unknowns, this means the homogeneous system has a unique solution. If the rank of the matrix is less than n, the number of unknowns, this means the system has infinitely many solutions. In the next lecture we will see how we can solve systems of non-homogeneous equations.
Thank you.
Matrix Analysis with Applications
Prof. Dr. S.K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 05
System of Linear Equations-II
Hello friends, welcome to lecture series on matrix analysis with applications. So, in the last lecture we have seen how we can solve a homogeneous system of linear equations. We have seen that if you have a system of equations like Ax = 0, where A = [a_11 a_12 ... a_1n; ... ; a_m1 a_m2 ... a_mn] is a matrix of order m×n and x = [x_1, x_2, ..., x_n] is a vector, then this A is called the coefficient matrix and this x is called the unknown vector which is to be found. So, while solving a homogeneous system of equations the right hand side is 0, and the system is always consistent, because at least x = 0 is a solution which satisfies this linear system of equations. So, it is always consistent, and if it is consistent, there are only 2 possibilities: either it has a unique solution or it has infinitely many solutions.
So, when will it have a unique solution? A unique solution means the zero solution or the trivial solution. When the rank of A is equal to the number of unknowns, it will have a unique solution, the trivial solution, which is the zero solution; and if the rank of A is less than n, that means the number of linearly independent equations is less than the number of unknowns, then the system will have infinitely many solutions, ok. So, this we have seen in the last lecture: how we can solve a homogeneous linear system of equations.
Now, suppose you have a non-homogeneous system of equations, that is Ax = b, which is a linear system of equations, but non-homogeneous. Non-homogeneous means the vector b is not equal to 0, ok. A is the coefficient matrix, which is of order m×n, x is the vector [x_1, x_2, ..., x_n] which is to be found, and this b is a vector [b_1, b_2, ..., b_m] which is not equal to 0, because if it were 0 this would be a homogeneous system of equations.
Now we are interested in finding the solution of this non-homogeneous system of equations. For this system there are three possibilities: either the system has no solution, that is, it is inconsistent, or it has a unique solution, that is only one solution, or it has infinitely many solutions. Now how can we find out whether this system has a unique solution, many solutions or no solution? We first construct a new matrix which we call the augmented matrix.
So, what is the augmented matrix? You see, the augmented matrix is denoted by [A : b]. To form the augmented matrix you first write A = [a_11 a_12 ... a_1n; a_21 a_22 ... a_2n; ... ; a_m1 a_m2 ... a_mn], which is an m×n matrix, and then append the right hand side [b_1, b_2, ..., b_m] as an extra column. So, this matrix is called the augmented matrix.
Now what is the order of this matrix? You see, there are m rows in this matrix and the number of columns is n+1. So, the order of this matrix is m×(n+1). Now of course, if you find the rank of this matrix, the rank of the augmented matrix, it will always be greater than or equal to the rank of A. Why is it always greater than or equal to the rank of A? Because the augmented matrix contains A: A is simply of order m×n, and you add one more column to the A matrix to get the augmented matrix. So, it has order m×(n+1), and adding a column can never decrease the number of linearly independent columns.
So, the rank of the augmented matrix will always be greater than or equal to the rank of A, ok.
Now, so we have seen that the rank of the augmented matrix is always greater than or equal to the rank of A, ok. The augmented matrix has order m×(n+1), while the A matrix has order m×n. Now there are 2 possibilities: number 1, either the rank of the augmented matrix is equal to the rank of A, or the rank of the augmented matrix is greater than the rank of A; greater means not equal, ok. Now if the rank of the augmented matrix is equal to the rank of A, what does it mean? You see, the augmented matrix we are having is [a_11 a_12 ... a_1n; a_21 a_22 ... a_2n; ... ; a_m1 a_m2 ... a_mn : b_1, b_2, ..., b_m].
Now, in the first case we are discussing what happens if the rank of the augmented matrix is equal to the rank of A. What does it imply? Does it imply that the system is consistent, or does it imply the system is inconsistent? How can we say? So, let us discuss this. We have first formed the augmented matrix here. Now this part is the matrix A, and the rank of A is the same as the rank of the augmented matrix; that is, the maximum number of linearly independent rows or columns of the matrix A is the same as that of the entire augmented matrix.
That means what? We can say that the maximum number of linearly independent columns of the matrix A is the same as the maximum number of linearly independent columns of the augmented matrix, the full matrix, and that is possible only when the extra column, the column b, can be written as a linear combination of the columns of A.
Suppose the rank of this matrix A is the same as the rank of the entire matrix. This means the maximum number of linearly independent columns of the matrix A is the same as the maximum number of linearly independent columns of the augmented matrix, and this means the vector b can be represented as a linear combination of the column vectors c_1, c_2, up to c_n; only then is it possible that the rank of the augmented matrix is the same as the rank of A.
So, what does it mean basically? It means this vector b can be represented by some linear combination of these column vectors, ok. Now what is b? b = [b_1, b_2, ..., b_m] = α_1 c_1 + α_2 c_2 + ... + α_n c_n, where c_1 = [a_11, a_21, ..., a_m1], c_2 = [a_12, a_22, ..., a_m2], ..., c_n = [a_1n, a_2n, ..., a_mn]. So, there exist some α_1, α_2, ..., α_n such that b can be expressed as a linear combination of these column vectors.
So, this implies b_1 = α_1 a_11 + α_2 a_12 + ... + α_n a_1n. Similarly, b_2 can be written as α_1 a_21 + α_2 a_22 + ... + α_n a_2n, and similarly up to the m-th equation. So, what is this basically? It is simply A[α_1, ..., α_n] = b; that means, in place of x_1 we have α_1, in place of x_2 we have α_2, and in place of x_n we have α_n; that means we have shown the existence of a solution of this linear system of equations. Hence we can say that a solution exists if the rank of the augmented matrix is the same as the rank of A.
(Refer Slide Time: 11:22)
So, we can say that if the rank of the augmented matrix is equal to the rank of A, this implies a solution exists, or we can say the system is consistent.
Now, if the system is consistent, there are 2 possibilities: either it has a unique solution or it has infinitely many solutions. Now, when will it have a unique solution? You see, it will have a unique solution only when the number of linearly independent equations is the same as the number of unknowns, and that is possible only when the rank of the augmented matrix, which is the same as the rank of A, is equal to the number of unknowns.
This means that the number of linearly independent equations is the same as the number of unknowns, because the rank of A is the maximum number of linearly independent rows or columns, ok. And if it is equal to the number of unknowns, this means the number of linearly independent equations equals the number of unknowns. This means a unique solution.
Now, when will it have infinitely many solutions? If the rank of the augmented matrix is equal to the rank of A and is less than the number of unknowns, ok, then it will be
having infinitely many solutions. The reason is very simple: you see, the rank of A and the rank of the augmented matrix are the same, and they represent the number of linearly independent equations; if that number is less than the number of unknowns, then the number of linearly independent equations is less and the number of unknowns is more, and this means infinitely many solutions, ok.
(Refer Slide Time: 13:46)
Now, the second case is when the rank of the augmented matrix is not equal to the rank of A, that is, the rank of the augmented matrix is more than the rank of A. This is possible only when, in the echelon form of the augmented matrix, there is a row of A which contains all zeros while the corresponding b_i is not equal to 0, ok. You can see that if the entire last row of A is 0 and this b_i is not equal to 0, then what is the rank of A? It will be one less than the rank of the entire matrix.
So, I want to say that the rank of the augmented matrix will be more than the rank of A only when, in the echelon form of the augmented matrix, you have one entire row of A equal to zero while the corresponding b_i is not equal to 0, and this means that 0 equals some non-zero quantity, which is not possible. So, the system is inconsistent. In this case, the system is inconsistent, that is, it has no solution, ok.
So, let us discuss a few properties based on this. We have already discussed that a system of non-homogeneous equations is given by Ax = b, where b is not equal to 0, ok. Here A is the coefficient matrix, x is the unknown vector, and this vector b is the right hand side.
The augmented matrix is defined like this: you see, we first write the matrix A and then the right hand side; this entire matrix is called the augmented matrix. Its order is m×(n+1), and clearly the rank of the augmented matrix is greater than or equal to the rank of A.
Now, if you talk about consistency or inconsistency of the system of linear equations, then we have already discussed that if the rank of the augmented matrix is not equal to the rank of A, that is, it is greater, this means the system is inconsistent; and if the rank of the augmented matrix is equal to the rank of A, equal to r say, then this means the system is consistent, and r equal to the number of unknowns means a unique solution, while if this r is less than the number of unknowns, it means infinitely many solutions, as we have already discussed.
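This whole discussion can be condensed into a small rank test (a sketch; the helper name classify and the example systems below are my own, not from the lecture):

```python
import numpy as np

def classify(A, b):
    """Classify Ax = b by comparing rank(A) with rank([A | b])."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    r_A = np.linalg.matrix_rank(A)
    r_aug = np.linalg.matrix_rank(np.hstack([A, b]))
    n = A.shape[1]                       # number of unknowns
    if r_aug > r_A:
        return "inconsistent (no solution)"
    return "unique solution" if r_A == n else "infinitely many solutions"

# Quick sanity checks on made-up systems.
print(classify([[1, 1], [1, -1]], [2, 0]))   # unique solution
print(classify([[1, 1], [2, 2]], [2, 5]))    # inconsistent (no solution)
print(classify([[1, 1], [2, 2]], [2, 4]))    # infinitely many solutions
```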
Now let us discuss a few examples based on this, ok. The first example we can see here: we are having three equations, and how many unknowns? We are having 4 unknowns, ok. So, x_1, x_2, x_3, x_4 are the unknowns and these are three equations. So, you first write the augmented matrix of this system, ok.
So, what is the augmented matrix we are having here? Here the augmented matrix is given by [1 1 -2 4 | 5; 2 2 -3 1 | 3; 3 3 -4 -2 | 1]. This is the augmented matrix, ok.
Now if you see here, the first equation is x_1 + x_2 - 2x_3 + 4x_4 = 5. The second equation is 2x_1 + 2x_2 - 3x_3 + x_4 = 3. The third equation is 3x_1 + 3x_2 - 4x_3 - 2x_4 = 1. Now, the first important property is that when you apply elementary row operations on a system of linear equations, they do not change the solution. They may change the equations, but the solution set remains unchanged.
So, you first find the echelon form of this matrix, which is [1 1 -2 4 | 5; 0 0 1 -7 | -7; 0 0 0 0 | 0]. Now what is the rank of A, the rank of this matrix A? It is simply 2; the number of non-zero rows is 2. And what is the rank of the augmented matrix, the entire matrix? The rank of the entire matrix is also 2. So, this implies that the rank of A is equal to the rank of the augmented matrix, which is 2. This means the system is consistent. How many unknowns does it have? It has 4 unknowns. So, the rank is less than 4, and this means it has infinitely many solutions.
Now how will you find the solutions? You arbitrarily choose two variables, because we have 2 independent equations. The first equation from here is x_1 + x_2 - 2x_3 + 4x_4 = 5. The second equation is x_3 - 7x_4 = -7. So, these are two linearly independent equations and we have four unknowns.
So, you can give 2 variables arbitrary values and find the remaining 2 in terms of them, ok. That is what we have done in the solution: you can take x_4 = k_1 and x_1 = k_2, and then you can find x_2 and x_3, where k_1 and k_2 are any real numbers.
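SymPy can produce the same parametric solution for this system (a sketch; the free symbols SymPy chooses play the role of k_1 and k_2):

```python
from sympy import Matrix, linsolve, symbols

x1, x2, x3, x4 = symbols('x1 x2 x3 x4')

# Augmented matrix of the first example.
aug = Matrix([[1, 1, -2,  4, 5],
              [2, 2, -3,  1, 3],
              [3, 3, -4, -2, 1]])

print(linsolve(aug, x1, x2, x3, x4))
# SymPy keeps x2 and x4 free: x3 = 7*x4 - 7 and x1 = 10*x4 - x2 - 9,
# which is an equivalent parametrization to taking x4 = k1 and x1 = k2.
```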
Now, let us discuss the second problem. The second problem again has 4 unknowns and 3 equations. So, how can we solve this problem? Again we will write the augmented matrix, convert it to its echelon form, find the rank of the augmented matrix, and then we can see whether the system is consistent or inconsistent, ok. So, let us discuss this example also.
The augmented matrix here is [1 1 -2 3 | 4; 2 3 3 -1 | 3; 5 7 4 1 | 5]. This is the augmented matrix of this problem.
Now you try to find its echelon form, which comes out to be [1 1 -2 3 | 4; 0 1 7 -7 | -9; 0 0 0 0 | 3].
So, here the rank of A is 2 and the rank of the entire matrix is 3. So, the rank of A is not equal to the rank of the augmented matrix, and this means no solution, ok. So, the system has no solution.
Now come to the third problem. Here we have 3 equations with 3 unknowns. To find out whether the system is consistent or inconsistent, and whether it has many solutions, no solution or a unique solution, we first write its augmented matrix, ok. This is [1 2 1 | 3; 2 5 -1 | -4; 3 -2 -1 | 5]. Now you find the echelon form of this matrix, which is [1 2 1 | 3; 0 1 -3 | -10; 0 0 -28 | -84]. You can easily find the echelon form of this matrix by applying a series of elementary row transformations.
So, the rank of the matrix A is 3, because the number of linearly independent rows it has is 3, and the rank of the entire matrix is also 3.
(Refer Slide Time: 26:10)
The first equation is x_1 + 2x_2 + x_3 = 3. What is the second equation? x_2 - 3x_3 = -10. The third equation is -28x_3 = -84, or x_3 = 3. Now we apply back substitution. From here we get x_3 = 3. Now substitute x_3 in the second equation, which gives x_2 = -1. Now substitute x_2 and x_3 in equation 1 to get the value of x_1, which is 2.
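For this third system the unique solution can also be obtained directly (a sketch using NumPy's solver), and it matches the back substitution above:

```python
import numpy as np

A = np.array([[1.0,  2.0,  1.0],
              [2.0,  5.0, -1.0],
              [3.0, -2.0, -1.0]])
b = np.array([3.0, -4.0, 5.0])

print(np.linalg.solve(A, b))   # [ 2. -1.  3.], i.e. x1 = 2, x2 = -1, x3 = 3
```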
Now, let us solve this problem, where we have to find the conditions on λ and μ such that the system has no solution, a unique solution, or infinitely many solutions; that means, what should be the values of λ and μ so that this system has no solution, a unique solution, or infinitely many solutions? So, how can we see this? How can we find out? We again find the echelon form of the augmented matrix, ok.
So, let us try to find it. What is the augmented matrix of this problem? It is [1 1 1 | 6; 1 2 3 | 10; 1 2 λ | μ]. Now we first find the echelon form of this matrix, which comes out to be [1 1 1 | 6; 0 1 2 | 4; 0 0 λ-3 | μ-10].
Now the first condition is no solution. For no solution, the rank of A should not be equal to the rank of the augmented matrix; or we can say that the rank of the augmented matrix must be more than the rank of A if the system has no solution. Now, the rank of A cannot be less than 2, because it always has 2 non-zero rows. So, the rank of A is at least 2, ok. It will be 3 if λ ≠ 3, and it will be 2 if λ = 3, ok.
Now you want the rank of the augmented matrix to be more than the rank of A, ok. So, if λ = 3, then the rank of A is 2 and we want the rank of the augmented matrix to be 3. The rank of the entire matrix will be 3 if μ - 10 ≠ 0, that is, μ ≠ 10. So, if λ = 3 the rank of A is 2, and if in addition μ ≠ 10 the rank of the entire matrix is 3; the ranks are not the same, and that means no solution. This is the only possibility for no solution, ok.
Now the second case is many solutions, or infinitely many solutions. For many solutions, the rank of A should be equal to the rank of the augmented matrix and should be less than the number of unknowns. Here the number of unknowns is 3, so the rank must be less than 3; less than 3 means either 1 or 2, and we have seen that the rank of A is at least 2, so r = 2 is the only possibility, ok. With λ = 3 and μ = 10, the rank of the entire matrix is also 2. So, if this happens, then the system has many solutions, infinitely many solutions.
Now the third case is a unique solution. For a unique solution, the rank of A should be equal to the rank of the augmented matrix and should be equal to 3. Now the rank of A is 3 if λ - 3 ≠ 0, that is λ ≠ 3. And if λ ≠ 3, the rank of A is 3, and whatever the value of μ may be, the rank of the entire matrix is also 3. So, μ may be any real number. So, these are the conditions on λ and μ for a unique solution. So, this is how we can find out whether a linear system of equations has a unique solution, no solution or many solutions, ok.
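The same case analysis can be spot-checked with SymPy (a sketch; the augmented matrix is the one above, with λ and μ filled in as reconstructed):

```python
from sympy import Matrix, symbols

lam, mu = symbols('lambda mu')

aug = Matrix([[1, 1, 1,    6],
              [1, 2, 3,   10],
              [1, 2, lam, mu]])
A = aug[:, :3]

def ranks(l_val, m_val):
    return (A.subs(lam, l_val).rank(),
            aug.subs({lam: l_val, mu: m_val}).rank())

print(ranks(3, 7))    # (2, 3): ranks differ            -> no solution
print(ranks(3, 10))   # (2, 2): equal and less than 3   -> infinitely many solutions
print(ranks(5, 1))    # (3, 3): equal to 3 = unknowns   -> unique solution
```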
So, in the next class we will see vector spaces: what vector spaces are and how they are important for solving some problems.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 06
Introduction to Vector Spaces
Hello friends. Welcome to lecture series on Matrix Analysis with Applications. So, today we will discuss vector spaces: what vector spaces are. So, let us start.
Now, a vector space V over a field F consists of a set on which two operations, called addition, denoted by a '+' sign, and scalar multiplication, denoted by a '.', are defined such that the following axioms are satisfied. So, what are the axioms?
The first axiom is: you take any u, v in V; then u + v also belongs to V. That means this addition is a binary operation, that is, it satisfies the closure property. You take any arbitrary u and v in V; then with respect to addition the sum is also in V, ok.
Now, for all x, y, z in V, (x+y)+z = x+(y+z); whether you put the brackets on the first two or on the last two, both are the same. So, that is the associative property with respect to addition.
The next property is: there exists an element in V, denoted by 0. This 0 is not the usual 0; it is simply a notation, and it denotes the identity element with respect to addition, such that v + 0 = 0 + v = v for every v in V, ok.
Then, for every u in V, there exists a v in V such that u + v = 0, and this v is called the inverse of u with respect to addition. And the next is: for every u, v in V, u + v = v + u; that is commutativity of addition. So, these 5 properties are with respect to addition, and we can also say that the vector space V with respect to '+' must be an abelian group, ok.
Now, with respect to '.': for each α belonging to the field and v belonging to V, α.v also belongs to V, ok. So that means '.' is a binary operation, but this binary operation is defined for one vector from the vector space and one scalar from the field.
Next, for each v in V, 1.v = v. The next is: for every α belonging to the field and u, v belonging to V, α.(u+v) = α.u + α.v; and for every α, β belonging to the field and u belonging to V, (α+β).u = α.u + β.u. So, if a vector space V over the field F, with respect to addition and scalar multiplication, satisfies these 9 properties, then we say that it is a vector space over the field F.
Now, the elements of the field F are called scalars and the elements of the vector space V are called vectors. These vectors are not the usual vectors which we study in physics; we simply call the elements of a vector space vectors, that is all, ok.
Now, (V, +) must be an abelian group, and with respect to '.' this V must satisfy the following properties: for every α belonging to the field and v belonging to V, α.v must belong to V; 1.v = v for every v in V; (α+β).v = α.v + β.v for all α, β from the field and v belonging to V; and α.(u+v) = α.u + α.v for all α belonging to the field and u, v belonging to V. A compact summary is sketched below.
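For reference, the conditions just listed can be collected in standard notation (a summary sketch only; here α, β denote scalars from F and u, v, w vectors from V):

```latex
\text{(V, +) is an abelian group: } u+v \in V,\;\; (u+v)+w = u+(v+w),\;\;
\exists\, 0:\ v+0 = v,\;\; \forall u\ \exists\,(-u):\ u+(-u) = 0,\;\; u+v = v+u.

\text{Scalar multiplication: } \alpha v \in V,\;\; 1\cdot v = v,\;\;
\alpha(u+v) = \alpha u + \alpha v,\;\; (\alpha+\beta)u = \alpha u + \beta u.
```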
So, with respect to addition this V must be an abelian group, and with respect to '.' V must satisfy these properties; then we say V over F is a vector space. So, now let us discuss a few examples based on this. You take V as R², and say the field is R.
R² means all (x, y) such that x and y are both in R. Next, how are addition and scalar multiplication defined? Only then will it constitute a vector space over that addition and scalar multiplication.
So, addition is defined as: you take any two elements in R², and (a_1, a_2) + (b_1, b_2) is simply (a_1+b_1, a_2+b_2); this is about addition. Now, the scalar multiplication is defined as α.(a_1, a_2) = (αa_1, αa_2).
Now, first you have to see that V with respect to '+' must be an abelian group if it is a vector space. So, the first requirement is that '+' must be a binary operation, or we can say that with respect to '+' it must be closed; the closure property must be satisfied. This is very clear: take any two elements u and v in R², say u = (a_1, a_2) and v = (b_1, b_2). Now, if you take u + v, which is (a_1, a_2) + (b_1, b_2) = (a_1+b_1, a_2+b_2) by this definition, it also belongs to R². That means what? That means it satisfies the closure property, ok.
The second property is associativity. If you take the associative property, it is very obvious: take 3 elements (a_1, a_2), (b_1, b_2) and (c_1, c_2) in R². You take the bracket on the first two and add them; you can easily show that this equals (a_1, a_2) + [(b_1, b_2) + (c_1, c_2)]; that means whether you take the bracket on the first two elements or on the last two elements, both are the same. It is very easy to show; you simply proceed using this addition operation.
Now, if you talk about the existence of the identity element, you can simply see that if you add (0, 0) to any element you get the element itself. So, if you take any (a_1, a_2) and add (0, 0), you get (a_1, a_2), and this is true for all (a_1, a_2) in R²; that means we have shown the existence of the identity element. And the inverse of any (a_1, a_2) is (-a_1, -a_2), which is again in R²; inverse means if you add these two, the result is (0, 0). So, we have shown all the 5 properties with respect to addition; that means V with respect to '+' is an abelian group.
(Refer Slide Time: 11:20)
Now, with respect to '.': for the first property you take α belonging to the field, where the field here is real, and you take any u given by (a_1, a_2) in R². If you take α.(a_1, a_2), then by the definition of scalar multiplication we can write it as (αa_1, αa_2), and it again belongs to R²; that means closure with respect to scalar multiplication is satisfied.
Now, the next property: take 1.v, that is 1.(a_1, a_2), which by definition is again (a_1, a_2), that is v. So 1.v = v for every v in V; that means the second property holds. Now, if you take (α+β).v, where v here is (a_1, a_2), it is ((α+β)a_1, (α+β)a_2) by the definition. So, it equals (αa_1 + βa_1, αa_2 + βa_2), which equals (αa_1, αa_2) + (βa_1, βa_2) = α.v + β.v by the definition of addition. And this holds for every α, β in F and v in V.
And the last property is α.(u+v), which must be equal to α.u + α.v. To check this property, take u = (a_1, a_2) and say v = (b_1, b_2). Then α.(u+v) = α.[(a_1, a_2) + (b_1, b_2)] = α.(a_1+b_1, a_2+b_2) by the definition of addition. And by the scalar multiplication this is (α(a_1+b_1), α(a_2+b_2)) = (αa_1+αb_1, αa_2+αb_2), which can be written as (αa_1, αa_2) + (αb_1, αb_2).
And this is equal to α.u + α.v, where u is (a_1, a_2) and v is (b_1, b_2), and this property holds for every α belonging to the field and for every u, v belonging to V.
So, we have shown that all the 9 properties are satisfied; that means R² over the field R, under these operations of addition and scalar multiplication, is a vector space. So, these are a few examples of this; the first example we have just discussed.
(Refer Slide Time: 14:56)
So, say instead of R² we have Rⁿ, ok, and Rⁿ is over the field R. And how are addition and scalar multiplication defined? You take any u in Rⁿ, that is (a_1, a_2, ..., a_n), and v also in Rⁿ, that is (b_1, b_2, ..., b_n); the addition is defined as (a_1+b_1, a_2+b_2, ..., a_n+b_n), that is, component-wise addition, and the scalar multiplication is α.(a_1, a_2, ..., a_n) = (αa_1, αa_2, ..., αa_n). Then Rⁿ over R is a vector space. This can be proved easily, as we did for R² over R, ok.
Now, a second example: the set of all real polynomials of degree less than or equal to n, over the real field, under the usual addition and scalar multiplication of polynomials, is also a vector space; you can see here.
Here you are considering V as P_n, where P_n is the set of all polynomials of degree less than or equal to n, and the field is R. The addition is the standard addition of two polynomials, and the scalar multiplication is the standard multiplication of a scalar with a polynomial.
So, how does it constitute a vector space? This is very easy to show. First, with respect to '+' it must be an abelian group; in order to show that (V, +) is an abelian group, the first property is closure, that means '+' is a binary operation. That is simple to show: take u = a_n x^n + a_(n-1) x^(n-1) + ... + a_0 as the first polynomial, and v = b_n x^n + b_(n-1) x^(n-1) + ... + b_0 as the second. So, these are 2 polynomials of degree less than or equal to n; the degree may also be less than n if the leading coefficients a_n, b_n are 0, so we are not putting any restriction on the leading coefficients.
Now, if you take u + v with component-wise addition as defined, it is (a_n+b_n)x^n + (a_(n-1)+b_(n-1))x^(n-1) + ... + (a_0+b_0), and this again belongs to V because it is also a polynomial of degree less than or equal to n; that means closure with respect to addition is satisfied.
The second property is associativity; that is very easy to show: (u + v) + w equals u + (v + w) for every u, v, w in V. You can take the left hand side, expand it, and show that it equals the right hand side. You can choose u as a polynomial of this type, v as a polynomial of this type, and w as some c_n x^n + c_(n-1) x^(n-1) + ... + c_0.
Next is commutativity. Commutativity is also easy to show: take u + v, which we have already seen is (a_n+b_n)x^n + (a_(n-1)+b_(n-1))x^(n-1) + ... + (a_0+b_0). This can be written as (b_n+a_n)x^n + (b_(n-1)+a_(n-1))x^(n-1) + ... + (b_0+a_0), because these coefficients are real numbers, and real numbers commute with respect to addition. This is (b_n x^n + b_(n-1) x^(n-1) + ... + b_0) + (a_n x^n + a_(n-1) x^(n-1) + ... + a_0) by the definition of vector addition. So, this is v + u for every u, v belonging to V. So, we have shown that it commutes.
(Refer Slide Time: 20:44)
Now, with respect to multiplication also: the first property is α.u; if you take α.(a_n x^n + a_(n-1) x^(n-1) + ... + a_0) = αa_n x^n + αa_(n-1) x^(n-1) + ... + αa_0, it also belongs to V, because it is again a polynomial of degree less than or equal to n. So, the first property with respect to scalar multiplication, which is closure, is satisfied.
The second is 1.u: if you take α as 1, it is simply 1.(a_n x^n + a_(n-1) x^(n-1) + ... + a_0) = a_n x^n + a_(n-1) x^(n-1) + ... + a_0, which is equal to u, for all u in V. So, the next property also holds.
Then you take (α+β).u. It is (α+β).(a_n x^n + a_(n-1) x^(n-1) + ... + a_0) = (α+β)a_n x^n + (α+β)a_(n-1) x^(n-1) + ... + (α+β)a_0. Now, this is (αa_n x^n + αa_(n-1) x^(n-1) + ... + αa_0) + (βa_n x^n + βa_(n-1) x^(n-1) + ... + βa_0). So, it is α.u + β.u, for all α, β belonging to the field and u belonging to V.
And similarly we can show the last property, that α.(u+v) = α.u + α.v for all α belonging to the field and u, v belonging to V. So, in this way we can say that it is a vector space over the real field.
(Refer Slide Time: 23:19)
Now, the next example: if you consider the set of all m×n real matrices over the real field, under the usual addition of matrices and the usual scalar multiplication of a matrix, it is also a vector space. It is also very easy to show that with respect to '+' the set of matrices is an abelian group and with respect to '.' it satisfies all those 4 properties.
Next, if you take the set of all real-valued continuous functions defined on the interval [0, 1], denoted by C[0, 1], over the real field, with addition defined like this and scalar multiplication defined like this, it is also a vector space.
You see, with respect to '+', the sum of two continuous functions is again continuous, so closure holds. The identity element is the zero function, which is a continuous function and belongs to this set, and the inverse of any element f in C[0, 1] is -f, which is also in C[0, 1]; that is, the inverse element exists.
Now, with respect to '.', you can easily see that if you multiply a continuous function by a scalar it remains a continuous function, and 1.f = f. Similarly, the other two properties, (α+β).v = α.v + β.v and α.(u+v) = α.u + α.v, also hold.
So, in this way we can show that this constitutes a vector space over the field R.
(Refer Slide Time: 25:53)
Now, these are important theorems: in any vector space over any field F, if it is a vector space over the field F, then these 3 properties also hold, and they are very easy to prove; you can see here.
The first property is α.0 = 0, and it holds for all α in the field. Now, this 0 is from the vector space; it is not the usual scalar 0, it is the zero of the vector space. So, this 0 is basically the additive identity, and we have to show that the scalar multiplication of any α with this identity is always 0. The proof is: α.0 = α.(0+0) = α.0 + α.0, because it is a vector space, so α.(u+v) = α.u + α.v. Now add the additive inverse of α.0 to both sides to get 0 = α.0.
Now, what is the second property? It is 0.u = 0; this 0 on the left is from the field, it is not the additive identity, while the 0 marked in bold is the additive identity. So, 0 is from the field, and 0.u, for any u in V, is always the additive identity. Again it is simple to prove, you can see.
We want 0.u = 0 for all u in V. Write 0.u = (0+0).u = 0.u + 0.u by the property of the vector space. Now, if you call 0.u as v, then it is simply v = v + v, and v is an element of the vector space V.
So, its additive inverse exists, and we can add the additive inverse to both sides; this gives 0 = v + 0, and this implies v = 0, and v is nothing but 0.u. So, we have shown that 0.u is always 0, the additive identity, ok. The last property is that (-1).u is nothing but the additive inverse of u, for all u in V. This is also simple to show: we have to show that (-1).u = -u.
(Refer Slide Time: 29:35)
So, let us write 0.u = 0; this we have proved in the second part. Now, this scalar 0 can be written as 1 + (-1), and we know that (α+β).u = α.u + β.u, because it is a vector space. So, this can be written as (1 + (-1)).u = 0.
So, it is 1.u + (-1).u = 0, and 1.u is u, so u + (-1).u = 0. Now, if u is in V then its additive inverse exists, which is -u, and it is also in V. So, we can add -u to both sides: on the left we get -u + u + (-1).u = 0 + (-1).u = (-1).u, and on the right we get -u + 0 = -u. So, (-1).u = -u. So, we have shown this also, ok. So, all three parts are done.
Now, the set R⁺, which is the set of all positive real numbers, with the operations of addition and scalar multiplication defined as follows, is also a real vector space. This is also easy to prove.
(Refer Slide Time: 31:25)
Now, how is addition defined? Addition is simply u + v = uv, and α.u = u^α, for every u, v belonging to R⁺, the set of all positive real numbers; the field is the real field. So, first we check that with respect to addition this must constitute an abelian group. The first is closure: take u, v belonging to R⁺; then u + v, which is defined as uv, is the product of two positive real numbers and is also a positive real number, so it also belongs to R⁺. So, the closure property is satisfied.
Next is associativity: (u+v)+w = (uv)+w = (uv)w, and u+(v+w) = u+(vw) = u(vw); both equal uvw, so it satisfies the associative property. For the identity, denote it by 0̄; we need u + 0̄ = u, that is, u·0̄ = u, and this implies 0̄ = 1. The usual 0 is not the additive identity here; the additive identity is 1.
So, for the inverse of any element u, we need u + v to be the additive identity; this implies uv = 1, and this implies v = 1/u, which also belongs to R⁺. So, the inverse of any element u is 1/u, and the additive identity is 1. Commutativity also holds: u + v = uv = vu = v + u for every u, v in R⁺. So, all the properties with respect to addition are satisfied; with respect to '+' it is an abelian group.
(Refer Slide Time: 34:01)
Now, with respect to the scalar multiplication: it is defined as α.u = u^α. Of course, the closure property holds: take any α in the real field and any u in R⁺; then α.u = u^α always belongs to R⁺, so closure of scalar multiplication holds. Now, take 1.u, which is u¹ = u; satisfied.
Now, take (α+β).u, which is u^(α+β) = u^α · u^β, and that is α.u + β.u by the definitions of addition and scalar multiplication (remember, the usual multiplication plays the role of addition here). Similarly, the next property can be shown, that α.(u+v) = α.u + α.v, ok. So, in this way we can show that it is a real vector space.
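A quick numerical spot-check of these exotic operations (a sketch; the helper names add and smul are mine) makes the identities concrete:

```python
import math

def add(u, v):        # the "addition" on R+ is ordinary multiplication
    return u * v

def smul(alpha, u):   # the "scalar multiplication" on R+ is exponentiation
    return u ** alpha

u, v = 2.0, 5.0
alpha, beta = 3.0, -1.5

print(add(u, 1.0) == u)                              # 1 acts as the additive identity
print(math.isclose(add(u, 1.0 / u), 1.0))            # 1/u is the additive inverse of u
print(math.isclose(smul(alpha + beta, u),
                   add(smul(alpha, u), smul(beta, u))))   # (alpha+beta).u = alpha.u "+" beta.u
print(math.isclose(smul(alpha, add(u, v)),
                   add(smul(alpha, u), smul(alpha, v))))  # alpha.(u "+" v) = alpha.u "+" alpha.v
```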
Now, let us discuss a few examples which are not vector spaces. You take V equal to R² over the real field, take any two arbitrary elements (a_1, a_2), (b_1, b_2) in V and α in the field, define addition like this, which is the standard addition, and define the dot like this: α.(a_1, a_2) = (αa_1, 0). Then it will not constitute a vector space. The reason is very simple: you see, with respect to addition it will be an abelian group, no problem, but for a vector space 1.v should be v.
What I want to say basically is that if it is a vector space it must satisfy all the properties of a vector space; if it fails even one property, it is not a vector space, ok. Now check the property 1.v = v here: 1.(a_1, a_2) = (1·a_1, 0) = (a_1, 0), but for a vector space this must be equal to (a_1, a_2), and it is not equal whenever a_2 ≠ 0. So, for any (a_1, a_2) with a_2 ≠ 0, this side is not equal to that side. So, this means it is not a vector space.
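With the scalar multiplication read as α.(a_1, a_2) = (αa_1, 0) (this reconstruction of the slide is an assumption), the failure is easy to exhibit numerically (a sketch):

```python
def smul(alpha, v):
    a1, a2 = v
    return (alpha * a1, 0)   # the non-standard scalar multiplication (assumed form)

v = (1, 2)
print(smul(1, v))        # (1, 0), so 1.v is not v whenever a2 != 0
print(smul(1, v) == v)   # False: one axiom fails, hence not a vector space
```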
Now, if you take R² over the complex field, it will also not be a vector space, because if you take a scalar from the complex numbers and multiply it with an element of R², the resultant need not be in R²; in general it will be in C². So, closure with respect to scalar multiplication is not satisfied, and this means it will not be a vector space.
(Refer Slide Time: 38:53)
Similarly, in this example the scalar multiplication is the standard one, but if you take the vector addition defined like this, it does not satisfy the associative property; you can verify.
Now, if you take the real polynomials of degree greater than or equal to two over the real field, it is also not a vector space, you can see. Take one polynomial of degree greater than or equal to two, say u = x² - 1, and a second polynomial, say v = -x² - 1; then u + v = -2, which does not belong to V; that is, the closure property with respect to addition is not satisfied. So, it will not constitute a vector space.
So, in this lecture we have seen what vector spaces are and what the standard examples of vector spaces are. In the next lecture we will see what subspaces are and how we can find a basis and the dimension of a vector space.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 07
Subspaces
Hello friends, welcome to lecture series on Matrix Analysis with Applications. So, in this lecture we will talk about Subspaces. In the last lecture we have seen what vector spaces are, and how we can check whether a given set is a vector space or not over a given field, real or complex.
So, what are subspaces? Let us see. Let V be a vector space over the field F, and let W be a subset of V, ok; this V is a vector space over the field F, and this W is a subset of V. Then this W is called a subspace of V if W itself is a vector space over the field F with respect to the same operations of vector addition and scalar multiplication as on V. Whatever vector addition and scalar multiplication we have on V over F, the same vector addition and scalar multiplication we apply on W, which is a subset of V. If it is itself a vector space over the same field F, then we say that it is a subspace of V.
So, in any vector space V, two subspaces are trivial. The first one is V itself, because V is a subset of V and it is a vector space. And the other is the subspace {0}, where this 0 is simply the additive identity; it is also called the 'zero subspace' of V. So, if a subset W of a vector space V is given to you, and this W is itself a vector space over the same field F with respect to the same vector addition and scalar multiplication, then we say that W is a subspace of the vector space V.
(Refer Slide Time: 02:43)
Now, how can we show that a given subset W of V is a subspace? Instead of checking all the properties of a vector space, if we show these two properties, that is, closure with respect to vector addition and closure with respect to scalar multiplication, then that is sufficient to show that W is a subspace of this vector space V.
Now, why can we say this? You see, V is a vector space over the field F if V with respect to '+' is an abelian group. Abelian group means: first closure, next associativity, then the identity element, then the inverse element, and then commutativity. If these properties hold with respect to vector addition, then we say that V with respect to addition is an abelian group.
And with respect to '.', the first is closure, the second property is 1.v = v for all v in V, ok, the third property is (α+β).v = α.v + β.v, where α, β are in the field and v is any vector in V.
Next is α.(β.v) = (αβ).v. And next, α.(v_1 + v_2) must be equal to α.v_1 + α.v_2 for all α in the field, and for all v_1, v_2 in the vector space V.
So, if these properties hold with respect to addition and scalar multiplication, then we say that V is a vector space over the field F. Now we are taking W as a subset of V, over the same field F, with the same vector addition and scalar multiplication.
Now, you see, closure automatically holds because of the first assumed property; the first property itself says that W is closed with respect to vector addition, ok.
The second property is associativity. Since associativity holds for every u, v, w in V, it will hold for a subset of V also. Next, we have to show the existence of the identity element and of inverses with respect to addition. Now, we know that u + v belongs to W for all u, v belonging to W, and α.u belongs to W for all α belonging to the field and u belonging to W.
Now, take α = 0, for example; because the closure holds for every α, it will hold for α = 0 also, and 0.u = 0, so that means 0 belongs to W. So, we have shown the existence of the identity element with respect to addition. Now, for the inverse, you simply replace α by -1; -1.u = -u, as we have already shown, so that means the inverse element with respect to addition also exists in W.
Now, the commutativity property holds because it holds for every u, v in V, so it will hold for the subset also, because the elements of W are nothing but elements of V, and V satisfies the commutative property for every u, v in V. So, we can say that W with respect to '+' is an abelian group.
Now, we have to check with respect to '.'. With respect to '.', closure is automatically satisfied because of the second assumed property. Next, 1.v = v, because this property holds for every v in V, so it will hold for a subset also; so this property holds. And similarly, all the remaining properties also hold for W, because W is a subset of V.
So, hence we can say that if we show these two properties, then W is a subspace of V; or we can club the two properties into the single property that α.(u+v) should belong to W for all u, v in W and α in F. So, we have two ways to show that W is a subspace of a vector space V: either we show these two properties separately, or we show the single combined property; then we can say that W is a subspace of the vector space V.
So, now let us discuss a few examples based on this. The first example: V we are taking as R², over the real field R. Now, W is given as all (x, y) in R² such that x + 2y = 0, and we have to show that W is a subspace of this vector space V, ok.
Now, what is V here? V is the entire R² and the field is R, so the entire R² is simply a vector space V over the real field. Now, W is simply all (x, y) in R² such that x + 2y = 0. It is a line passing through the origin: when x = 0, y = 0, and when x = -2, for example, then y = 1; so it is this line, x + 2y = 0. It is, of course, a subset of R². Now, we have to show that it is a subspace of this V.
So, you take two arbitrary elements u, v in W, and α in the field. Say u = (x_1, y_1) and v = (x_2, y_2). This implies x_1 + 2y_1 = 0 and x_2 + 2y_2 = 0, because u and v are from W, so they satisfy this condition. Now, if W is a subspace, we have to show that α.(u+v) is in W. So, α.(u+v) = α.[(x_1, y_1) + (x_2, y_2)] = α.(x_1+x_2, y_1+y_2) = (α(x_1+x_2), α(y_1+y_2)), by the standard vector addition and scalar multiplication. Now, we have to show that this is in W; call the first component x̃ and the second ỹ, so we have to show that x̃ + 2ỹ = 0; only then can we say that this element is in W.
So, take x̃ + 2ỹ = α(x_1+x_2) + 2α(y_1+y_2) = α(x_1 + 2y_1) + α(x_2 + 2y_2) = α(0 + 0) = 0. And this implies (x̃, ỹ) belongs to W; that means W is a subspace of V.
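A small numeric check of this closure argument (a sketch; the two points and the scalar are arbitrary choices):

```python
def in_W(p):                        # W = {(x, y) : x + 2y = 0}
    x, y = p
    return x + 2 * y == 0

u, v = (2, -1), (-4, 2)             # both satisfy x + 2y = 0
alpha = 3
w = (alpha * (u[0] + v[0]), alpha * (u[1] + v[1]))   # alpha.(u + v)
print(in_W(u), in_W(v), in_W(w))    # True True True: W is closed, as the argument shows
```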
Now, similarly, in the next example the vector space V we have taken as all n×n real matrices over the real field R, and W, a subset of this V, we have taken as the collection of all symmetric matrices, that is, all A with A = Aᵀ.
Again, if you want to show that W is a subspace of V, it is very easy: take two matrices A, B belonging to W; this implies A = Aᵀ and B = Bᵀ.
And we have to show that the matrix C = α.(A+B) is also in W; that means we have to show that C = Cᵀ. So, take Cᵀ = (α.(A+B))ᵀ = α.(Aᵀ + Bᵀ) = α.(A+B), because Aᵀ = A and Bᵀ = B, so it is equal to C. This implies C is in W.
So, we have shown that W is a subspace of this vector space V: the matrices equal to their own transpose form W, and we have shown that C = Cᵀ, where C is nothing but α.(A+B); that means W is a subspace of this vector space V.
Similarly, we can show it for the third problem also: you take two arbitrary vectors of the given subset of P_2 and try to show that it is closed under addition and scalar multiplication; or you can take u and v in it and show that α.(u+v) is also in it.
Now, similarly, in the next example we have taken V as the collection of all functions from R into R, and we have considered the subset W of V of all f such that f(-1) = 0, ok; all those functions which vanish at -1 we have taken as W.
So, again, in order to show that it is a subspace of this vector space V, you take any two arbitrary elements of this subset W and try to show that it is closed with respect to addition and scalar multiplication. Then the last example of this slide: we are taking V as all n×1 matrices over the field F. Suppose A is a fixed m×n matrix over F. Then the set of all n×1 column matrices X over F such that AX = 0 is a subspace of V.
You see here: we have taken W as all X in V such that AX = 0, where A is a fixed matrix of order m×n and X is a vector of order n×1, ok. And we are claiming that this collection W is nothing but a subspace of the vector space V.
Now, it is easy to show: again, take two elements, say X_1 and Y_1, in W. Of course, this set is never empty, because at least X = 0 is in the set.
We have to show that Z = α.(X_1 + Y_1) is also in W. So, take A·Z = A·α(X_1 + Y_1) = α·AX_1 + α·AY_1, because α is a scalar, and both AX_1 and AY_1 are 0, so it is 0. That means Z belongs to W, and hence we can say that W is nothing but a subspace of this vector space V.
So, basically, to show that this is a subspace, one can also think of two cases: either AX = 0
has only the trivial solution, in which case W is the trivial subspace, or it has infinitely
many solutions, and the proof follows along these lines ok. So, in this way, we can
check whether a given subset of a vector-space V constitutes a subspace or not; in this
way, we can show that it is a subspace of the given vector-space V.
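As a small side sketch (assuming SymPy and a made-up example matrix A, not one from the lecture), the same closure property of {X : AX = 0} can be checked symbolically:

```python
import sympy as sp

# A hypothetical 2x3 matrix A over the rationals (an assumed example, rank 1).
A = sp.Matrix([[1, 2, -1],
               [2, 4, -2]])

basis = A.nullspace()            # basis vectors of W = {X : A X = 0}
X1, Y1 = basis[0], basis[1]
alpha = sp.Rational(3, 7)        # any scalar from the field

Z = alpha * (X1 + Y1)            # the combination used in the argument above
print(A * Z)                     # Matrix([[0], [0]]) -> Z is again in W
```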
Now, let us consider a few more problems based on this. Now, here in the first problem,
we have taken V as R2 over the field R. Now, we have considered W1, which
is all (x1, x2) such that x1 ≥ 0 ok.
(Refer Slide Time: 18:03)
So, what is W? In the first example, we have taken V as R2 with field R. And x1 ≥ 0:
x is (x1, x2) with x1 ≥ 0, so W is one half of this R2. So, does this half of R2 constitute
a subspace of this vector-space or not? You see that if it is a subspace of this vector-
space, then it must be closed with respect to addition and scalar multiplication ok.
Now, if you take an element, say (1, 2), it is in W, because W is all those (x1, x2) such that
x1 ≥ 0, and here x1 = 1 ≥ 0, so it is in W. And you take α, where α is any real number; it comes from
the field.
So, if you take α as -1, say, then -1·(1, 2) = (-1, -2), which is not in W; so that means, it is
not closed with respect to scalar multiplication, and that means, it will not constitute
a subspace of this vector-space V, because if it is a subspace of the vector-space V, it
must be closed with respect to vector addition and scalar multiplication ok.
So, if you have to show that something is a subspace of a vector-space, then you must show that it
is closed with respect to addition and scalar multiplication. And if it is not a subspace at all,
then simply give a counterexample to show that it is not closed with respect to addition
or scalar multiplication.
Now, take the second example. Here we have considered all (x1, x2) in R2 such that x2
is rational. So, again, W is all (x1, x2) in R2 such that x2 is rational ok.
Now, if you take say (1, 2), it is in W, because here x2 = 2 is rational.
Now, if you take α = √2, then √2·(1, 2) is simply (√2, 2√2), which is not in W,
because now x2 = 2√2 is not rational. So, it is not closed with respect to scalar multiplication,
so that means, it is not a subspace of this vector-space; is it clear? Now, let us
discuss the second problem. The second problem is: we have taken V as P2, where P2 is all
polynomials of degree less than or equal to 2 over the real field. Then which of the following
subsets of V are subspaces of V? The first one is all p in V such that p'(1) = 0.
So, here V is P2 over the real field R, and W we have taken as all p in V such that p'(1)
= 0. Now, you take two polynomials p1 and p2 in W; that means, p1'(1) = 0 and p2'(1) = 0.
And if it is a subspace, then it must be closed with respect to vector addition and scalar
multiplication; so that means, you take p = α(p1 + p2), and if it is a subspace, then we have
to show that p'(1) = 0. So, what is p dash? Since p is α(p1 + p2), p' is simply α(p1 + p2)';
dash means derivative, which is α(p1' + p2'). Now, p'(1) is nothing but α(p1' + p2')(1),
which is α(p1'(1) + p2'(1)), and this is α(0 + 0), which is 0; so that means, this p is also
in W, and that means W is a subspace of the vector-space V.
Now, if you take the second example, the second problem here, we have taken all p in V such
that p(-1) = 1. Now, one important thing is that the additive identity of a vector-space and
of all subspaces of that vector-space is the same.
(Refer Slide Time: 23:47)
You take a vector-space V over a field F; suppose this 0 denotes the additive
identity of this vector-space. Then any subspace of this vector-space V over the
field F will also have the same additive identity 0.
Now, here, you can easily see that if you take the collection of all those polynomials where
p(-1) = 1: the additive identity of this P2 is the 0 polynomial, and that polynomial is
not there in this S2, because S2 contains all those polynomials whose value at -1 is 1.
So, the 0 polynomial is not there, because the 0 polynomial at -1 will be equal to 0, it will
not be equal to 1; so that means, the 0 polynomial is not there, that means, the additive identity
is not in S2. And hence, we can say that this is not a subspace of this vector-space V. Or
the other way out is: you take two polynomials which are 1 at -1.
If you take 2 - x² ok, if you take this polynomial, it is in the set, because at -1 it gives 1.
Now, you take another polynomial, say 3 - 2x²; again at -1 it gives 1. Now, this is
u and this is v; if you add them, u + v is 5 - 3x². And this polynomial at -1 is nothing
but 2, which is not 1; that means, this element does not belong to the set. So, we
have shown that it is not closed with respect to vector addition. And hence, we can say
that it is not a subspace of this vector-space V.
Now, let V be a vector-space over the field F. Then the intersection of any collection of
subspaces of V is a subspace of V.
Suppose W1, W2 and so on are subspaces of V over F, and let W be the intersection
of all these subspaces; then W will also be a subspace of this vector-space V.
It is very easy to show: you take u and v in W, and α in the field. Then
you have to show that Z, which is α(u+v), is also in W. Since u and v are in the
intersection of the Wi, this implies u and v belong to Wi for all i. And this implies α(u+v)
belongs to Wi for all i, because all the Wi are subspaces of V, so each must be closed with
respect to addition and scalar multiplication.
So, this implies this Z will also belong to Wi for all i. And this implies Z belongs to the
intersection of the Wi also, because if it is in Wi for all i, that means, it is in the intersection also;
and that means, Z belongs to W, and that means, this W is a subspace of this vector-space
V.
Now, let u1, u2, ..., un be n vectors of a vector space V, and let α1, α2, ..., αn be n
scalars. Then α1u1 + α2u2 + ... + αnun is called a linear combination of u1,
u2, ..., un. So, this will be a vector, and it definitely belongs to V, because V is a
vector space: by the closure properties of scalar multiplication and vector addition, this
element will be in V itself, and this element is called a linear combination of u1, u2, ...,
un.
So, let us try to prove this. So, what is S? S is simply {u1, u2, ..., un}, a subset of V; all the
ui's are from V. What is the span of S? [S] is simply the collection of all linear combinations
α1u1 + α2u2 + ... + αnun such that αi belongs to the field for all i.
Now, in this theorem we have to show three things. First, we have to show that the span of S is a
subspace of V. Second, S is contained in [S]. And third, the span of S is the smallest
subspace of V containing S.
Now, the second property is very easy to show. You see, this span contains
all the linear combinations of u1, u2, ..., un. If you take α1 = 1, because the α's vary over
the field, and all the other α's equal to 0, then the combination is u1, so u1 is definitely in the span of S.
Similarly, if you take α2 = 1 and all the remaining α's equal to 0, then u2 will be in the span of S ok.
So we can say that if you take αi as 1 for any i, and αp = 0 for p ≠ i, then ui belongs to the span of S for
any i, and this implies S is contained in the span of S. So, the second part is very easy to show: if the span
contains all the linear combinations of the vectors u1, u2, ..., un, then it will definitely contain
u1, u2, ..., un also.
Now, we have to show that it is a subspace of V. So, again you take two arbitrary
elements of this set, say v and w belonging to the span of S, and α belonging to the field.
Then we have to show that α(v+w) is also in the span of S. Since v belongs to the span
of S, v is some linear combination α1u1 + α2u2 + ... + αnun of the vectors u1, u2, ..., un, with the αi
in the field. And similarly, w will be some other linear combination β1u1 + β2u2 + ... + βnun of u1, u2, ..., un,
so the scalars will be different.
Now, you take α(v+w). So, it is α(α1u1 + α2u2 + ... + αnun + β1u1 + β2u2 + ... + βnun). So, when
you apply the definition of the standard scalar multiplication and vector addition, you simply get
α(α1 + β1)u1 + α(α2 + β2)u2 + ... + α(αn + βn)un.
Now, each α(αi + βi) is some scalar, so we can say that this is also some linear combination of u1, u2,
..., un. So, we can simply say that it belongs to the span of S. Hence we can say that the span of
S is nothing but a subspace of V.
Now, we have to show that it is the smallest subspace of V containing S; how can we
prove it? In order to show that it is the smallest subspace containing S,
you take an arbitrary subspace, say T, of V containing S, and try to show that the span of
S is contained in T.
So, let T be any arbitrary subspace of V containing S. And in order to show that the span
is the smallest such subspace, we have to show that the span of S is a subset of T; then
only will it be the smallest one containing S.
So, in order to show that this is a subset of T, take an arbitrary element in the span of S,
and try to show that that element is also in T. So, let us say w belongs to the span of S; this
implies w is some linear combination α1u1 + α2u2 + ... + αnun of the elements u1, u2, ..., un. Now, what do we have to show?
We have to show that this w is also in T; then only we can say that this span is the smallest
such subspace.
Now, since ui belongs to T for all i, and T is a subspace, we can say that αiui is also in
T for all i, by the closure property of scalar multiplication, because T is a subspace ok.
And this implies the sum α1u1 + α2u2 + ... + αnun also belongs to T, again by the closure
property of vector addition, because α1u1 is in T, α2u2 is in T, ..., αnun is in T. So, by the
closure property of vector addition this sum is also in T ok; so this implies w is in T
ok, and this implies the span of S is a subset of T. So, hence we have shown that this span is
the smallest subspace of V containing S.
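As a quick computational aside (a minimal sketch assuming NumPy; the vectors used are made-up examples), membership in span(S) can be tested by solving for the scalars αi:

```python
import numpy as np

def in_span(S, w, tol=1e-9):
    """Return whether w is a linear combination of the vectors in S, and the coefficients."""
    M = np.column_stack(S)                          # columns u_1, ..., u_n
    coeffs, *_ = np.linalg.lstsq(M, w, rcond=None)  # least-squares solve for the alphas
    return bool(np.allclose(M @ coeffs, w, atol=tol)), coeffs

S = [np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, -1.0])]
ok, coeffs = in_span(S, np.array([2.0, 3.0, 1.0]))
print(ok, coeffs)    # True [2. 3.]  since 2*u1 + 3*u2 = (2, 3, 1)
```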
So, in this lecture, we have seen that for a subset of a vector-space V, if you want to show
that it is a subspace of the given vector-space V, we have to simply show that it is closed with
respect to vector addition and scalar multiplication. If it is not a subspace of that vector
space, simply give a counterexample which contradicts the closure property with
respect to addition or scalar multiplication. We have also seen that the span of S, which is
the collection of all linear combinations of vectors in S, is simply the smallest subspace of
V containing S.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 08
Basis and Dimension
Hello, friends. Welcome to lecture series on matrix analysis with applications. So, our
today's talk is basis and dimension. So, we have already discussed about vector spaces
subspace of a space V. Now, what do you mean by basis of vector space and how can we
calculate its dimension? So, let us discuss these things in this lecture.
So, first of all what is a basis? Now, a set S which is given by u 1 , u 2 , ....,u n of vectors is a
basis of V if it has the following two properties; number 1, S is linearly independent. So,
the first property is you consider S a subset of V, ok. Now, this subset of V will be a
basis of this V if number 1, this is linearly independent and number – 2, the span of S
generates V means the span of [S] =V that means, by the linear combination of the
elements of S we are getting this vector space V.
So, we will discuss a few examples based on this; then things will be clear. Now, a basis is
a minimal spanning set that is linearly independent, number 1, ok; it is a minimal
spanning set that is linearly independent. And number 2, a basis is the
largest linearly independent set that spans V, because if you have one more element
beyond a basis, then the set will become linearly dependent. So, it is a largest linearly
independent set that spans V, that generates V, ok.
(Refer Slide Time: 02:31)
Now, the first theorem is: let V be a vector space which is spanned by a finite set of
vectors u1, u2, ..., um, ok. Then any independent set of vectors in V is finite and contains
no more than m elements.
Now, suppose you have a set of elements u1, u2, ..., um and this spans V; then any
independent set of vectors in V is finite and contains no more than m elements. That
means, if you take any independent set of vectors in V, the number of elements in
that set will not exceed m.
So, does that mean that any set S which contains more than m elements is always
linearly dependent? Indirectly, yes: saying that every independent set is finite and
contains no more than m elements means that any subset S of V containing more
than m elements will always be linearly dependent ok. So, let us suppose T is such a
set ok; T is a set containing more than m vectors.
So, let us suppose T equals, say, {v1, v2, ..., vn}, where n > m. Now, we have to show
that this T is linearly dependent, ok. So, first of all, since the span of the set {u1, ..., um}
generates V, and T is of course a subset of V, every element of T will be
some linear combination of the elements u1, ..., um ok. So, we can write this vj as
A1ju1 + A2ju2 + ... + Amjum, where j varies from 1 to n; that is, vj equals the
summation over i from 1 to m of Aijui, for j varying from 1 to n, ok.
Now, in order to show that this set is linearly dependent, take some linear combination
of these vectors, put it equal to 0, and try to show that not all the scalars need be 0; this means this set is LD
ok. So, let us take some linear combination of these elements, say the summation of αjvj,
where j varies from 1 to n. So, this is the summation over j from 1 to n, and these vj's
are nothing but given by the expression above. So, this can be written as the summation over j
from 1 to n of αj times the summation over i from 1 to m of Aijui.
Now, this can be rearranged and written as the summation over i from 1 to m of (the
summation over j from 1 to n of Aijαj) times ui. Now, if we require each of these inner sums to
be 0, we get a system of linear homogeneous equations: you have to find the αj's; the αj's are the unknowns, the ui's
are known, the Aij's are known. So, it is something like AX = 0, where X
consists of the αj's, ok; that is what we have to find out, all other things are known.
Now, for this system, how many unknowns do we have? The αj's vary from 1 to n. So, the number
of unknowns is n and the number of equations is m, and n is more than m. So, this system
will have more than one solution, I mean infinitely many solutions, and if it has
infinitely many solutions, this means there will be some non-zero solution also, and for such a solution
the combination of the vj's above is 0. So, we have shown that the set of vectors v1, v2, ..., vn of T is linearly dependent. Hence
we have shown that if we have a set S which spans V, then any linearly independent
set of V is always finite and will not contain more than m elements.
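Just to illustrate the key step (a sketch assuming SymPy and an arbitrary made-up coefficient matrix), a homogeneous system with more unknowns than equations always has a non-zero solution:

```python
import sympy as sp

# A hypothetical homogeneous system with m = 2 equations and n = 3 unknowns.
A = sp.Matrix([[1, 2, 3],
               [0, 1, 4]])      # the coefficients A_ij of the system A*alpha = 0

null_basis = A.nullspace()      # non-trivial solutions exist because n > m
print(null_basis[0])            # Matrix([[5], [-4], [1]])
print(A * null_basis[0])        # Matrix([[0], [0]]) -> a non-zero solution indeed
```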
(Refer Slide Time: 09:12)
Now, based on this we have another result: if V is a finite dimensional vector space — now,
what do you mean by dimension? The dimension is simply the number of elements in a basis
of the vector space ok — then any two bases of V have the same number of elements. Suppose
we have a vector space V, and say B1 is one basis which consists of, say, m number of elements,
and another basis B2 which contains, say, n number of elements.
So, we have to show that m = n, ok. Now, since B1 is a basis, the span of B1 will
be V, because we know that bases are sets which are linearly independent and whose span is
equal to V, ok. Now, the span of this set is equal to V and this
B2 is linearly independent. So, by the previous result we can say that n will be less than or
equal to m. Now, similarly, B2 is also a basis; that means, the span of B2 will be
equal to V. Now, the span of B2 is equal to V and B1 is linearly independent; that means, m
will be less than or equal to n by the previous result. So, from these two we can easily
say that m = n.
So, what I want to say is: if V has finite dimension, then it may have infinitely
many bases, but the number of elements in each basis is the same, ok; the number of elements in the
basis is the same and that is called its dimension.
(Refer Slide Time: 11:04)
Now, say you consider R3; in R3 if you take e1 as (1, 0, 0), e2 as (0, 1, 0) and e3 as (0,
0, 1), then this is a basis of R3 and is also called the standard basis of R3.
Now, you can easily see, if you take the vector space as R3 over the real field of course, and
you are taking the set {(1, 0, 0), (0, 1, 0), (0, 0, 1)}: first of all, this set is
linearly independent; it is very easy to show. You take a linear combination of the elements of this
set, put it equal to 0, and this implies α1 = α2 = α3 = 0;
that means, the set is LI.
Now, secondly, we have to show that the span of S is equal to V ok. Now, the span of S
equal to V means that any element of V can be expressed as a linear combination of
elements of S, you see. We have to show that the span of S is equal to V. It is very clear that the
span of S is a subset of V; it is obvious because V is a vector space and the span consists of linear
combinations of elements of S ok; since V is a vector space, it must be closed with
respect to vector addition and scalar multiplication. So, that means, this span will automatically be a
subset of V. Now, we have to show that V is a subset of the span of S, to show that the span of S
is equal to V.
So, take an element, say (x, y, z), in V, and we have to show that this (x, y, z) can be
expressed as some linear combination of elements of S; then only we can say that it is in
the span of S. So, this (x, y, z) can clearly be expressed as x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1); these
x, y, z are the scalars, and that means (x, y, z) belongs to the span of S, and that means V is a
subset of the span of S, and hence the span of S is equal to V. So, we can say that this S
is a basis of this vector space R3, and this is also called the standard basis of R3, ok.
The second example is similar: we can go for Rn. If you take Rn, then you can take e1 as
(1, 0, 0, ..., 0), e2 as (0, 1, 0, ..., 0), and so on up to en as (0, 0, ..., 0, 1); then {e1, e2, ..., en}
is a basis of Rn, and is also called the standard basis for Rn.
Similarly, if you take Pn, which consists of all the polynomials of degree less than or
equal to n over the field F, the set {1, x, x², ..., xⁿ} is a basis for Pn and is also
called the standard basis for Pn, ok. This is also very simple to show.
If you take this set, which is {1, x, x², ..., xⁿ}, you are taking Pn over F. Now, first, in order
to show that it is linearly independent, you take a linear combination α0·1 + α1x + ... + αnxⁿ of this
set and put it equal to 0. Now, this 0 can be written as 0 + 0·x + ... + 0·xⁿ. Now, when will this hold for
every x? This will hold for every x only when all the αi are 0. So, this implies LI.
Now, in order to show that the span of this set is equal to V, you take an arbitrary element in
V ok; you take any polynomial, say a0xⁿ + a1xⁿ⁻¹ + ... + an, in V, and this element
can be written as a linear combination of elements of S, and hence the span of S generates
V; that means, it is a basis of V.
Now, similarly, if you consider all matrices of order 2×2 over the real field and you take a
subset {A1, A2, A3, A4}, where A1, A2, A3 and A4 are the matrices shown on the slide, that is the
standard basis of this vector space. And this will be a basis because, first of all, it is linearly
independent; second, you take any matrix of order 2×2, it can be expressed
as a linear combination of A1, A2, A3, A4.
Now, in these problems, which of the following subsets S form a basis for the given vector
space V? Say, you take the first example, ok.
(Refer Slide Time: 17:01)
The first example is: you are taking S as {(2, 1), (0, -1)}, and here V is R2 over the real field.
So, we have to see whether it will constitute a basis for this vector space or not. So, first
of all, we have to see whether these vectors are linearly independent or not, number one.
So, take a linear combination α1(2, 1) + α2(0, -1) of these vectors and put it equal to (0, 0). So, this implies (2α1,
α1 - α2) = (0, 0). So, this implies α1 = α2 = 0. So, this means LI. So, the first property holds; that
means, this set is LI.
The second property is that the span of this set must be equal to V. So, that means, you take any
(x, y) in V; there should exist some α, β in the field such that this (x, y) can be expressed as a linear
combination of these vectors. So, let us try to find those. So, let (x, y) = α(2, 1)
+ β(0, -1). So, this implies x = 2α and y = α - β. So, this implies α = x/2 and β = α - y, or
β = x/2 - y. So, here we can write this (x, y) = x/2 (2, 1) + (x/2 - y)(0, -1).
So, for every (x, y) in V, we have found α, β such that (x, y) can be expressed as a linear
combination of these two vectors. You change x and y, you will get the corresponding multipliers, or
scalars, for which this (x, y) can be expressed as a combination
of these two vectors. So, what have we shown? We have shown that the span of S is
equal to V; I mean, we have shown that any (x, y) can be expressed as a combination of
elements of S, and hence we can say that this is a basis of V.
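A small computational sketch of the same check (assuming SymPy; the symbol names are mine): linear independence via the determinant, and the spanning property by solving for α, β symbolically.

```python
import sympy as sp

u1, u2 = sp.Matrix([2, 1]), sp.Matrix([0, -1])
M = sp.Matrix.hstack(u1, u2)          # columns are the candidate basis vectors

print(M.det())                        # -2, non-zero -> the two vectors are linearly independent

x, y, alpha, beta = sp.symbols('x y alpha beta')
sol = sp.solve(list(M * sp.Matrix([alpha, beta]) - sp.Matrix([x, y])), [alpha, beta])
print(sol)                            # {alpha: x/2, beta: x/2 - y}, matching the computation above
```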
(Refer Slide Time: 19:37)
The second example is: we have taken these vectors, I mean this subset of
V, and we have to see whether it constitutes a basis
for this P2 or not. So, how will we show this again?
So, let us try to check it. You take the first vector as x - 1, the second as x² + x - 1 and the third as
x² - x + 1; here the space is P2 over R. Now, first of all, if it is a basis of this P2, then it must be
linearly independent. So, first we will see whether it is linearly independent or not.
So, take a linear combination of these vectors: α(x - 1) + β(x² + x - 1) + γ(x² - x + 1) = 0. So, now, you collect the like
terms. You see, for the power x² there is no term here, here it is β, here it is γ; so β + γ
must be 0 — all the coefficients must be 0 because the combination is equal to 0. The coefficient of x is
α + β - γ = 0, and the constant term is -α - β + γ = 0. Now, from the first of these you get γ = -β. When
you substitute γ = -β into the second, it gives α = -2β. Now, when you substitute these values into the third,
it is automatically satisfied, ok.
So, this means it has many solutions, so there is a non-zero solution also. Say,
if you put γ = 1, then α = 2 and β = -1, and if you take 2(x - 1) + (-1)(x² + x - 1) + 1(x² -
x + 1), you get 0. So, this means LD, because there are non-zero scalars whose linear
combination is equal to 0, and that means it is not a basis of V, because for a basis it must be
linearly independent, ok.
Similarly, we can check for the other two also whether they are bases or not. For example, if
you see the third example: this element can be written as sin²x + cos²x, that is, this element
is a linear combination of the other two. So, this set is
LD. So, this will not be a basis of this vector space, ok.
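The coefficient-matching in the polynomial example can also be phrased as a matrix computation (a sketch assuming SymPy): the columns hold the coefficients of x², x and 1 for each of the three polynomials.

```python
import sympy as sp

# Columns: coefficients (x^2, x, constant) of x-1, x^2+x-1, x^2-x+1 respectively.
M = sp.Matrix([[0, 1, 1],
               [1, 1, -1],
               [-1, -1, 1]])

print(M.det())            # 0 -> the homogeneous system has non-trivial solutions, so the set is LD
print(M.nullspace()[0])   # Matrix([[2], [-1], [1]]), the scalars 2, -1, 1 found in the lecture
```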
Now, find a basis for the subspace U of V in the following cases, ok. Now, in these
problems we have to find the basis. The first one is: you take all (x1, x2, x3)
in R3 such that 2x1 + x2 = 0 and x1 - 3x2 + x3 = 0. So, how can we find this?
So, here we are taking all (x1, x2, x3) in R3 such that 2x1 + x2 = 0 and x1 - 3x2 + x3 = 0. This
will be a subspace of this vector space V; this is very clear, and we can show it because it
satisfies the closure properties with respect to addition and scalar multiplication. Now, if it is a subspace,
it must have some basis. So, how can you find the
basis of this subspace of V? Let us see. Now, here x2 = -2x1 from the first equation.
Now, from the second equation, if you substitute x2 as -2x1, it is x1 + 6x1 + x3 = 0.
So, this implies x3 = -7x1. So, this set is all (x1, x2, x3) with x2 = -2x1 and x3 = -7x1; so this is basically
all x1(1, -2, -7), where x1 belongs to R; this is basically U, ok.
So, pick out a linearly independent set which generates this space; that will be the basis.
So, we can say that {(1, -2, -7)} is a basis of U. In fact, any non-zero multiple of this
vector is a basis: you see, it is LI, and if you take all its scalar multiples, they generate the entire
U. So, hence this is a basis of this subspace U. So, what is the dimension of this
subspace? The dimension of this subspace is 1, and this can also be computed like this: you see,
here we are in R3, and we have 2 independent equations; so 3 minus 2, the dimension will be 1, ok.
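The same basis can be obtained mechanically (a minimal sketch assuming SymPy): write the two defining equations as a coefficient matrix and compute its null space.

```python
import sympy as sp

# The equations 2*x1 + x2 = 0 and x1 - 3*x2 + x3 = 0 as a matrix acting on (x1, x2, x3).
A = sp.Matrix([[2, 1, 0],
               [1, -3, 1]])

basis = A.nullspace()         # one vector -> the subspace U has dimension 1
print(len(basis))             # 1
print(-7 * basis[0])          # Matrix([[1], [-2], [-7]]), a convenient scaling of the basis vector
```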
The next example is: you are taking all the polynomials p in P2 such that p'(0) =
0. It is clearly a subspace, and now what do we have to find? Its basis; here the space is P2 over the real field ok. So, let us
write this p as a0x² + a1x + a2 belonging to P2 such that the derivative at 0 is equal to 0; that
means, what is the derivative of this? It is 2a0x + a1, this is the derivative, and p' at 0
is 0. So, this implies a1 = 0.
So, a1 = 0 means what? p is a0x² + a2, where a0, a2 are real numbers. So, basically U is this set:
all a0x² + a2 in P2 such that a0, a2 are real numbers. So, how to find the basis of this?
It is very easy. You see, you can write this as a0(x²) + a2(1); that is, {1, x²} will be the basis
of this.
Because, you see, 1 and x² are linearly independent: 1 cannot be expressed as α times x²,
and x² cannot be expressed as α·1. And if you take linear combinations of these two vectors 1
and x², they generate a0x² + a2, all such vectors. So, it is a basis of this subspace, and the
dimension of this is 2, ok. This is not the only basis; you can find infinitely many
bases, but the set must be LI and it should generate the entire subspace, ok.
The last example here is: consider all matrices of order 2×2 such that the trace of M is 0.
(Refer Slide Time: 28:23)
You see, here the vector space we are taking as M of order 2×2 over R, and the subspace we are
taking as all matrices of order 2×2 such that the trace of A = 0. The trace is the sum of the
diagonal elements, so basically we are taking all matrices (x, y; z, w) in M of order 2×2 such that x + w = 0. So,
this is basically all matrices (x, y; z, -x), you can write -x here because x + w = 0, in M of order 2×2,
where x, y, z are in R. So, this can be written as x(1, 0; 0, -1) + y(0, 1; 0, 0) + z(0, 0; 1, 0).
So, you can take the set {(1, 0; 0, -1), (0, 1; 0, 0), (0, 0; 1, 0)}; this will be
the basis of this U, because first, it is linearly independent, and second, the linear
combinations of these matrices generate the entire U. So, this will be the basis of this subspace U.
So, hence we have seen how to check whether a given subset is a subspace, how to find the basis of
a subspace, and how to check whether a given set is a basis of a vector space or not.
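A quick numerical sketch of this last example (assuming NumPy; the sample values are mine): flatten the three candidate basis matrices into vectors of R4 and check linear independence via the rank.

```python
import numpy as np

# Candidate basis of the trace-zero 2x2 matrices.
B1 = np.array([[1, 0], [0, -1]])
B2 = np.array([[0, 1], [0, 0]])
B3 = np.array([[0, 0], [1, 0]])

M = np.column_stack([B.flatten() for B in (B1, B2, B3)])
print(np.linalg.matrix_rank(M))      # 3 -> linearly independent, so dim U = 3

# Any trace-zero matrix [[x, y], [z, -x]] is x*B1 + y*B2 + z*B3, for example:
x, y, z = 2, -1, 5
print(x * B1 + y * B2 + z * B3)      # [[ 2 -1] [ 5 -2]]
```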
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 09
Linear Transformations
Hello friends. Welcome to the lecture series on Matrix Analysis with Applications. So, in the
last few lectures we have seen what vector spaces and subspaces are, and what their
basic properties are. We have also studied the basis and dimension of
vector spaces and subspaces. Now this lecture basically deals with linear transformations:
what a linear transformation is and how we can find a linear transformation from V to
W.
So, let us see now what a linear transformation is. You see, let V and W be 2 vector spaces
over the field F. A linear transformation, which we also write as LT, from V into W is a
function from V into W such that T(αu + v) = αT(u) + T(v) for every u, v in V and for all
scalars α in F.
So, basically, T is a linear transformation from
a vector space V to a vector space W over the field F if for every v1, v2 in V, and for
every α in the field, T(αv1 + v2) = αT(v1) + T(v2). Now this αv1 is nothing but the scalar
multiplication of α with an element of the vector space.
So, I am not putting the dot here; it is understood that it is α·v1 ok. Similarly, this T(v1)
is an element of W: you see, we have a vector space V, we have a vector space W, and we
define a linear transformation T from V to W; if we have any element, say v, here, the
image of this element in W is simply T(v) ok. So, this T(v1) is nothing but an element of the
vector space W, and this T(v2) is also an element of the vector space W.
So, this αT(v1) means the scalar multiplication of α with an element of W ok. So, I am not
putting the dot here; it is understood this is nothing but α·T(v1), fine. So, basically, if this
property holds for every v1, v2 in V and for every scalar α in the field, then we say that T is a
linear transformation. Now this property can also be stated in two parts: we can also say that
T(v1 + v2) = T(v1) + T(v2) is the first property, and the second property is T(αv1) = αT(v1), for all
v1, v2 in V and α belonging to the field ok. So, we club these two properties into the single one above ok.
So, we can also state this single property as these 2 properties. So, if a function T from V
to W satisfies these 2 properties, then we say that T is a linear map or a linear
transformation. Now, let us discuss a few examples of linear transformations; in the first
example we have considered T
from the vector space R2 to the vector space R2 which is defined by T(x1, x2) = (x1 + x2, 2x1 -
x2).
To check whether it is a linear map, how can you proceed? Let us find T(αv1 + v2) first: it is
T(α(x1, x2) + (y1, y2)) = T(αx1 + y1, αx2 + y2) = (αx1 + y1 + αx2 + y2, 2(αx1 + y1) - (αx2 + y2)).
Now, this can be written as (α(x1 + x2) + (y1 + y2), α(2x1 - x2) + (2y1 - y2)) = α((x1 + x2), (2x1 -
x2)) + ((y1 + y2), (2y1 - y2)) = αT(x1, x2) + T(y1, y2) = αT(v1) + T(v2).
So, we have shown that this property holds for every v1, v2 and every α in the field; that
means, this map is a linear transformation ok. Now, similarly, we can easily show that T
from R2 to R2 given by the expression on the slide, which is also called the projection on the x
axis, is also a linear map; it follows the same lines as the first example. So,
now, the third one is: we consider T from the set of all polynomials of degree less than or equal
to n over the field R, to all polynomials of degree less than or equal to n-1 over R, by T(f(x))
= f'(x). Now this T is also called the differential operator ok, where f'(x) denotes the
derivative of f(x); now this is also a linear map.
How is it a linear map? You can simply see here: we have defined T(f(x)) as f'(x), where
f'(x) is nothing but the derivative of f(x). Now you take any f and g in Pn(R) ok, and α belonging
to the field; here the field is R. You take αf + g and apply T. So, T of this will be nothing but (αf + g)'
by this definition, and this is nothing but αf' + g', and this is αT(f) + T(g). So, we have
shown that the property of linear transformation holds for every f and g in the vector space Pn
over the field R and for every scalar α; that means, this is a linear transformation. Now, similarly, if we
define T as the integral from a to b of a function f(x), where f(x) is a continuous function on [a,
b], then this is also a linear map ok.
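Before going through the integral example, here is a quick symbolic sketch of the differentiation operator just described (assuming SymPy; the two sample polynomials are my own choices):

```python
import sympy as sp

x, alpha = sp.symbols('x alpha')
f = 3*x**2 - x + 4          # two sample polynomials
g = x**3 + 2*x

T = lambda p: sp.diff(p, x)              # the differential operator T(f) = f'

lhs = T(alpha*f + g)
rhs = alpha*T(f) + T(g)
print(sp.simplify(lhs - rhs))            # 0 -> T(alpha*f + g) = alpha*T(f) + T(g)
```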
So, this is very easy to show again, because if you take T(f) as the integral from a to b of f(x) dx, I
mean, if you take T(f) defined like this, now you take any f and g in the set of
continuous functions on the closed interval [a, b], and any α in the field.
And you take αf + g; T of this will be equal to the integral from a to b of (αf + g)(x) dx, which is equal
to the integral from a to b of (αf(x) + g(x)) dx, which is equal to α times the integral from a to b of f(x) dx
plus the integral from a to b of g(x) dx, and this is equal to αT(f) + T(g). So, we have shown that this property holds for every f
and g in the set of continuous functions on the closed interval [a, b] and every α belonging to the field;
this means this map is a linear map.
Now, similarly, if we see the last example: you see, we have considered a fixed matrix A of
order m×n. And we define T from R^(n×1), which is basically the space of n×1 column vectors, to the
space R^(m×1) of m×1 column vectors, such that T(x) = Ax. Now again, A is fixed; if you
take any x and y in R^(n×1) and take αx + y, it is easy to show that T(αx + y) is nothing but
αT(x) + T(y), so it will be a linear map.
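A tiny numerical sketch of this last map (assuming NumPy; the matrix and vectors are randomly generated examples):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))      # a fixed (here randomly chosen) m x n matrix, m = 4, n = 3

def T(v):
    """The map T(v) = A v, here acting on length-3 vectors."""
    return A @ v

x = rng.standard_normal(3)
y = rng.standard_normal(3)
alpha = 2.5

print(np.allclose(T(alpha*x + y), alpha*T(x) + T(y)))   # True -> the defining property holds
```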
Now, let us see some basic properties of linear transformations. The first property is: if we
consider a linear map T from V to W, where V and W are vector spaces over the field F, then
the first property is T(0_V) = 0_W, meaning the additive identity of V always maps to the additive
identity of W ok.
(Refer Slide Time: 12:15)
Now, it is very easy to show. We are considering here T from a vector space V to W. What I
want to show is that the additive identity of V, which we denote 0_V, satisfies T(0_V) = 0_W, meaning the
additive identity of the vector space V always maps to the additive identity of the vector space W under
this linear transformation T. For a linear map, T(αv + p) is
equal to αT(v) + T(p), and this is true for all v and p belonging to V and for all α belonging
to the field, since it is a linear transformation. Now, first we have to show that T(0_V) = 0_W.
So, you first put p = 0_V; since the property is true for every α belonging to the field and every p and v belonging to
V, it will be true for p = 0_V also, where 0 means 0_V ok. So, this gives T(αv)
= αT(v) + T(0_V), because the additive identity plus an element of the vector space is the element itself; that is
why we have αv + 0_V = αv on the left. Now, take α = 1 and, say, v = 0_V; since it is true for every α and every v, it will be true for α = 1 and v = 0_V.
So now, using 1·v = v, and v is 0_V here, the left hand side is T(0_V), which equals
1·T(0_V) + T(0_V), so it is T(0_V) + T(0_V).
So, let T(0_V) be, say, w. So, we are obtaining w = w + w; now w is an
element of capital W, which is a vector space, and if it is a vector space, its additive
inverse will exist, so you can always add the additive inverse of w on both
sides.
So, it is 0_W = 0_W + w, so this implies w = 0_W, and this implies w, which is nothing but T(0_V), equals 0_W. So,
this is the first and foremost property: if it is a linear map, then T(0_V) will
always map to the 0 of W, the additive identity of W. The second property is that T(-v), the image of the
additive inverse of v, is always equal to -T(v); that means, the additive
inverse of T(v). So, again it is easy to show: you see, we have to show that T(-v) = -T(v)
for every v in V ok.
Now, we know that if it is a linear transformation, then T(αv + p) = αT(v) + T(p) for all v, p
belonging to the vector space V and for all α belonging to the field; this we already know. Now you
put p = 0_V first; since it is true for every p, it will be true for p = 0_V also, where 0 means 0_V ok.
So, the left hand side will be T(αv), which equals αT(v) + T(0_V); now T(0_V) = 0_W. So,
it is αT(v) + 0_W = αT(v). Now you put α = -1; if you put α = -1, we have already
shown, in the lectures on vector spaces, that -1·v is nothing but -v.
So, it is T(-v) = -1·T(v) = -T(v), where T(v) belongs to W. So, we have shown this
property also. The last property is T(α1x1 + α2x2 + ... + αnxn) =
α1T(x1) + α2T(x2) + ... + αnT(xn), where the xi's are elements of the vector space and the αi's are
elements of the field; again it is easy to show, the result is very trivial.
(Refer Slide Time: 17:27)
You see, take T(α1x1 + α2x2 + ... + αnxn), and take the element α2x2 + ... + αnxn as, suppose,
capital X. So, by the property of the linear transformation,
this will be equal to α1T(x1) + T(X).
Now, it is α1T(x1) + T(α2x2 + ... + αnxn); again you take α3x3 + ... + αnxn as, say, capital Y, and again you apply
the property of the linear transformation. So, this will be α1T(x1) + α2T(x2) + T(Y).
So, similarly, if you extend this up to n times, we will get the same result which we
have here.
Now, the next theorem is: let V be a finite dimensional vector space over the field F and
let v1, v2, ..., vn be an ordered basis for V ok. Let W be a vector space over the same field F
and let w1, w2, ..., wn be n vectors in capital W. Then there is a unique linear
transformation T from V to W such that T(vi) = wi for i = 1 to n ok. Now, what I
want to say basically in this theorem is that we want such a linear transformation T from
V to W ok.
(Refer Slide Time: 19:00)
Here you have vectors v1, v2, ..., vn which form an ordered basis of V ok. So,
there will always exist a unique linear transformation T such that T(vi)
maps to wi ok; this is the main result, that there will exist a unique linear
transformation T from V to W such that this happens.
Now, the proof is very easy. You see, take any v belonging to V ok; if any v belongs
to this space, and we already know that v1, v2, ..., vn is an ordered basis for V ok, and if it is
a basis, this means any element v in this vector space can be written as a linear combination
of elements of the basis, because it is a basis; that means, the span of the basis will generate the
entire vector space V.
So, if you take any element v in this vector space, it can be written as a linear
combination of v1, v2, ..., vn. So, this implies there exist unique α1, ..., αn in the field
such that v = α1v1 + α2v2 + ... + αnvn ok. Now, for this v, we define T(v) as
α1w1 + α2w2 + ... + αnwn, so that in particular T(v1) = w1, T(v2) = w2 and so on. Now this T is well defined and it
is clear that T(vi) = wi; now we have to show that this map is linear and that it is unique.
So, for linearity, how can we show that it is linear? You can take, say, v and p, two elements in V, you
take any α belonging to the field, and we have to show that T(αv + p) = αT(v) + T(p). So, how
can we show this? This p can be written as some linear combination of the vi
's: so p = β1v1 + β2v2 + ... + βnvn, and this implies T(p) = β1w1 + ... + βnwn, since each T(vi) = wi; similarly
T(v) = α1w1 + ... + αnwn. Now you take αv + p, because we have to show this result for T to be a
linear map: αv + p = (αα1 + β1)v1 + ... + (ααn + βn)vn, so T(αv + p) = (αα1 + β1)w1 + ... + (ααn + βn)wn
= α(α1w1 + ... + αnwn) + (β1w1 + ... + βnwn) = αT(v) + T(p).
So, we have shown that T(αv + p) = αT(v) + T(p), which means this is a linear map. The next thing
to show is that this linear map is unique. So, in order to show that this linear map is
unique, you consider a linear map U such that U(vi) = wi for all i. Then, if you write U(v),
you see, if you take any v in V, then that v can be uniquely expressed as
α1v1 + α2v2 + ... + αnvn. So, what will U(v) be? U(v) = U(α1v1 + α2v2 + ... + αnvn), and this will be
α1U(v1) + α2U(v2) + ... + αnU(vn), and this will be α1w1 + α2w2 + ... + αnwn ok; that means, it
is equal to T(v), again from the definition. So, since U(v) = T(v) for every v, the
transformation is unique. So, we have shown that this is a linear map and the
transformation is unique; hence we have proved the theorem ok.
Now, the next is: determine whether there exists a linear transformation in the following
cases, and if it exists, find the general formula. So, let us start with the first problem: if we are
given, say, T from R2 to R2 with the images written on the slide, then can we find a linear
transformation such that these properties hold? How can we see that? So, let us start
with the first problem.
(Refer Slide Time: 26:20)
So, here it is T: R2 to R2, and here T(1, 2) = (3, 0) and T(2, 1) = (1, 2); first of all, we
see that these two elements are in R2.
First of all, let us observe whether they are linearly independent. The answer is yes: we cannot
write (1, 2) as a linear combination of (2, 1) ok. So, yes, they are linearly independent. Now,
what will be the dimension of R2? The dimension of R2 is 2, and if there exist any 2
linearly independent vectors in R2, then they will be a basis of R2; I mean, the span of
those two elements will generate the entire vector space ok.
Now, these elements (1, 2) and (2, 1) are in R2, so the span of these 2 elements will be
definitely R2, because they are linearly independent. In general, what is the dimension of Rn? The
dimension of Rn is n ok. And if you have any set containing n linearly independent
vectors, that will be a basis of Rn; there are infinitely many bases
of Rn, and likewise infinitely many bases of R2; one of the bases is obtained by taking any two linearly
independent vectors in R2, and the span of these will definitely generate R2; this is one of the
bases ok.
Now, does there exist any linear transformation such that T(1, 2) = (3, 0) and T(2, 1) = (1, 2)?
And if yes, how can you find it? You see, you take any (x, y) in R2, any (x,
y) here; now, since the span of these two vectors generates the entire R2, there will exist some
scalars α and β such that this (x, y) can be written as a linear combination of these two
vectors ok.
So, this implies x = α + 2β and y = 2α + β. Now, multiply the second equation by 2 and subtract the
first equation from it: we obtain 2y - x = 3α, and that implies α = (2y - x)/3;
similarly, β = (2x - y)/3.
Now, this (x, y) = α(the first vector) + β(the second vector), that is, (x, y) = ((2y - x)/3)(1,
2) + ((2x - y)/3)(2, 1). Now, what is T(x, y)? Since T is linear, it is ((2y - x)/3)T(1, 2) + ((2x - y)/3)
T(2, 1) = ((2y - x)/3)(3, 0) + ((2x - y)/3)(1, 2); now you can simplify this and we can easily find
out what T(x, y) is ok. So, in this way we can find a linear transformation T from R2
to R2.
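The same construction can be carried out symbolically (a sketch assuming SymPy; the variable names are mine):

```python
import sympy as sp

x, y, alpha, beta = sp.symbols('x y alpha beta')

# Express (x, y) = alpha*(1, 2) + beta*(2, 1) and solve for the scalars.
sol = sp.solve([alpha + 2*beta - x, 2*alpha + beta - y], [alpha, beta])
print(sol)      # alpha = (2*y - x)/3, beta = (2*x - y)/3, as in the lecture

# Then T(x, y) = alpha*T(1, 2) + beta*T(2, 1) = alpha*(3, 0) + beta*(1, 2).
Txy = sp.simplify(sp.Matrix([3*sol[alpha] + sol[beta], 2*sol[beta]]))
print(Txy.T)    # Matrix([[-x/3 + 5*y/3, 4*x/3 - 2*y/3]]), the general formula
```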
Now, if you see the second example ok, here 3 conditions are given to you, that is T(0, 1) =
(3, 4), T(3, 1) = (2, 2) and T(3, 2) = (5, 7). Of course, (0, 1), (3, 1) and (3, 2) are not linearly
independent, because the dimension of R2 is only 2 and here we have 3 vectors. So,
you take any 2 of them, I mean any two vectors, say (0, 1) and (3, 1), find
a linear transformation as we did in the first example, and if that linear transformation
also satisfies the third condition, then such a linear transformation exists; otherwise, we
say that the linear transformation does not exist. So, unlike the
first example, where we had two conditions only and the vectors were linearly independent,
here we have 3 vectors ok.
So, basically, if you have to see whether such a linear transformation
exists, you can write (3, 2) or (3, 1), any one vector, as a linear combination of the remaining
2 and check whether the images are also consistent or not;
if they are not, that means the linear transformation does not exist. Now, for the
third example, you see, it is defined from P2 to P2, where P2 is the polynomials of degree less than or
equal to 2.
So, what is the dimension of P2? The dimension of P2 is 3. So, in order
to find a unique linear transformation from P2 to P2, we must have at least
three independent conditions, but here the conditions are only two. So, this means there
exist infinitely many linear transformations from P2 to P2. And if you are interested to
find one such linear transformation, then you take a vector independent of these two, assign
it any image, and then you can find one; say, for example:
here, what is given to us? T is from P2 to P2 ok; now T(1 + x) = x + 2 and T(x²) = 4x; only
two conditions are given to you.
So, there will be infinitely many linear transformations from P2 to P2 satisfying these
two equations; suppose we are interested in finding one such linear transformation. So,
how can we proceed? Let us take T(x) = x²; the three vectors 1 + x, x² and x are
linearly independent, and you can easily verify that these 3 vectors are
linearly independent. Now you take any polynomial of P2, say a + bx + cx²; it can be
written as α(1 + x) + β(x²) + γ(x), because these 3 vectors are linearly independent and the
dimension of P2 is 3.
So, they will form a basis of P2, so any vector in P2 can be written as a linear combination
of elements of the basis. Now, matching coefficients: the constant term here is α, so
α = a; the coefficient of x is α + γ, and that is equal to b; the coefficient of x² is β = c. Now, this
implies γ = b - a, because α is a. So, we can say that a + bx + cx² = a(1 + x) + c·x² + (b - a)·x.
So, applying T, T(a + bx + cx²) = a·T(1 + x) + c·T(x²) + (b - a)·T(x) = a(x + 2) + 4cx + (b - a)x²; this is our required linear transformation, one such linear transformation.
So, there are infinitely many ways to choose the third condition, so there are infinitely many linear
transformations of this type, but one such linear transformation is this. So, in this way we
have seen what a linear transformation is and what the basic properties of linear
transformations are.
So, in this lecture we have seen what linear transformations are and what the
basic properties of linear transformations are. In the next lecture we will see some more
properties of linear transformations.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 10
Rank and Nullity
Hello friends, welcome to the lecture series on Matrix Analysis with Applications. In the last
lecture we have seen what linear transformations are and what the fundamental
properties of linear transformations are. In this lecture we will see what is meant by the rank and
nullity of a linear transformation ok.
So, let V and W be vector spaces over the field F and let T from V to W be linear ok.
Then, basically, what we have is a linear transformation T from V to W, where
V is a vector space and W is a vector space over the field F ok. Now the null space of this
linear transformation T is the set of all those elements v in V such that T(v) = 0_W: all those
elements which map to the 0 of W; it may be just the zero vector or it may be a bigger set ok. This set is called the null
space of T, and the range of T is defined as all those w in W such that T(v) = w for some v in
V. I mean, the image of V in W will be the range of T ok.
Say we have this example, and we have to find for this example the null space N(T) and
the range r(T); so, what is the linear transformation here?
So, the linear transformation here is from R3 to R2 and it is given by T(x1, x2, x3) = (x1 - x2, 2x3).
Now, first we have to find the null space of T; by the definition, it is all those v in V such that
T(v) = 0. Now here it is all those (x1, x2, x3) in R3 such that T(x1, x2, x3) = (0, 0),
because the image is in R2.
So, these are all those (x1, x2, x3) in R3 such that T(x1, x2, x3), which is nothing but (x1 - x2,
2x3), equals (0, 0). So, this is all those (x1, x2, x3) in R3 such that x1 = x2 and 2x3 = 0; that
means, x3 = 0. So, this is simply all (x1, x1, 0), because x1 and x2 are equal and x3 = 0, such
that x1 belongs to R. So, this is all x1(1, 1, 0) such that x1 belongs to R. So, this is
basically the null space of T, or we can say that it is equal to the span of (1, 1, 0) ok.
Now, the range of T is all those (y1, y2) in R2 such that T(x1, x2, x3) = (y1, y2) for some (x1,
x2, x3) in R3. So, we have to find r(T), all (y1, y2) in R2 such that T(x1, x2, x3) =
(y1, y2). Now, you can simply observe here that T(y1, 0, y2/2) is equal to (y1, y2):
you can simply observe that y1 - 0 = y1 and 2·(y2/2) = y2; so that means, if you take
any (y1, y2) here in W, which is R2, there exists a pre-image here in R3.
So, that means this map is onto, and since this is an onto map, the range of
T is nothing but the entire R2 ok, because for any vector (y1, y2) in R2 there exists (y1, 0, y2/2)
in R3 such that T of this is equal to (y1, y2). So, if you take any (y1, y2) here, there
exists a pre-image here in R3; so that means, the range of T is the entire R2 ok.
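The same null space and range can be read off from the matrix of T (a sketch assuming SymPy):

```python
import sympy as sp

# Matrix of T(x1, x2, x3) = (x1 - x2, 2*x3) with respect to the standard bases.
A = sp.Matrix([[1, -1, 0],
               [0,  0, 2]])

print(A.nullspace())    # [Matrix([[1], [1], [0]])] -> the null space is span{(1, 1, 0)}
print(A.rank())         # 2 -> the range is all of R2, so T is onto
```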
Now, there are some basic properties: the first property is range of T is a subspace of W
ok.
(Refer Slide Time: 06:38)
Now, what are we considering here? Basically, T is a linear map from V to W over the
field F. The first is the range of T, which is defined as all those w in W such that T(v) = w for
some v in V; I mean, you vary v over V and the collection of all those w in W
will be the range of T, and we have to show that this is nothing but a subspace of W. So,
how can we show that? You take, let w1 and w2 be in r(T), and for a subspace we have to show
that αw1 + w2 is also in r(T), where you take α in the field ok. So, if these are in r(T), this means
there exist some v1, v2 in V such that T(v1) is w1 and T(v2) = w2 ok.
Now, if you take T(αv1 + v2), since T is a linear map, it is αT(v1) + T(v2), which is αw1 + w2
by the above expressions. So, what have we shown? We have shown that there exists an element αv1 + v2
whose image is αw1 + w2; this element is in V, and since there exists an element in V whose image is
αw1 + w2, that means αw1 + w2 is in r(T), by the definition of r(T).
So, this means this is in r(T); so this implies r(T) is a subspace of W. The next is that
n(T), the null space, is a subspace of V; how can we show this?
(Refer Slide Time: 08:57)
Now, the null space of T is all those v in V such that T(v) = 0; so again you take some v1, v2 in
n(T) and α belonging to the field, and we have to show that αv1 + v2 also belongs to n(T)
for n(T) to be a subspace. Now, this membership implies T(v1) = T(v2) = 0.
So, now take T(αv1 + v2); since T is a linear map, it is αT(v1) + T(v2). So, it is α·0 + 0,
which is equal to 0; so this implies αv1 + v2 belongs to the null space of T, because this element
is in V and we have shown that T of this element is equal to 0, and all those v in V
whose image is 0 are in the null space of T. So, we
have shown that the null space is a subspace of V.
So, basically, what we have shown is that if you have two vector spaces V and W and T
is a linear map from V to W, then the null space of T is not only a subset of
V, it is a subspace of V; and the image of V, which we denote by r(T),
is not only a subset of W, it is a subspace of W. The next result is: T is one-to-one if and only
if the null space is the zero subspace, the singleton {0} of V; the proof is again simple, let
us discuss it.
(Refer Slide Time: 10:54)
Now, when can we say that T is one-to-one? T is one-to-one means T(v1) = T(v2) should
imply v1 = v2 for all v1, v2 in V ok. Now, let us first take T to be one-to-one, and we have to
show that the null space of T is the singleton {0} of V ok. Now T is one-to-one; let v belong to the null space
of T; if we show that v is nothing but the 0 of V, then we are done.
Now, if v belongs to the null space, that means T(v) = 0; now this implies T(v) = T(0),
because we know that 0 maps to 0 only. Now, since T is one-to-one, by this
definition we can simply say that v equals 0. So, that implies the null space of T is nothing but the
singleton {0} of V. Now the converse: let the null space of T be the singleton {0}, and we have to
show that T is one-to-one.
So, let T(v1) = T(v2), and we have to show that v1 = v2 for T to be one-to-one. So, this implies
T(v1) - T(v2) will be 0; you can add the additive inverse of T(v2) on both sides. Now, we
know that this is equal to T(v1 - v2) = 0; this implies T(v1 - v2) equals T(0) ok,
anyway this is not required; now T(v1 - v2) = 0. So, this implies v1 - v2 is in the null space
of T, and the null space of T is the singleton {0} of V, so this implies v1 = v2.
So, we have shown that T is one-to-one if and only if the null space of T is the singleton {0} of V. The
next result is: if the span of {v1, v2, ..., vn} is equal to V, then the range of T is nothing but the
span of {T(v1), T(v2), ..., T(vn)}; again, it is easy to show.
(Refer Slide Time: 13:23)
You see, here the span of v1, v2, ..., vn is equal to V, and we have to show that r(T) is
nothing but the span of T(v1), ..., T(vn) ok. Now, T(vi) belongs to r(T) for all i
because T(vi) is simply the image of vi. So, by the definition of r(T), T(vi) will
belong to r(T), and since r(T) is a subspace of W, we can easily say that the span of
T(v1), ..., T(vn) will definitely be contained in r(T), because r(T) is a subspace
of W.
So, to show that it is equal, we just have to show that r(T) is a subset of the span of
T(v1), ..., T(vn). Now, to show that this is a subset, take an arbitrary element in
r(T) and try to show that it is also in the span of this set. So, let some w belong to r(T);
if w belongs to r(T), this implies there exists some v belonging to V such that T(v) = w.
Now, since this v is in V and V equals the span of v1, v2, ..., vn, this v can be
expressed as a linear combination of elements of this set. So, since v belongs to V, this
implies there exist α1, α2, ..., αn belonging to the field such that v = α1v1 + α2v2 + ... + αnvn. Now T(v)
= α1T(v1) + α2T(v2) + ... + αnT(vn), since T is linear; that means, T(v) belongs to the span of T(v1), ...,
T(vn), and T(v) = w ok. So, we have shown that w belongs to the span of this set; that means,
r(T) is a subset of the span.
So, from these two results we can say that r(T) is equal to the span of T(v1), ..., T(vn).
Now, the next result is: if V is a finite dimensional vector space, then the rank of
T, which is the dimension of r(T), is always less than or equal to the dimension of V. It is easy to show: if these vectors
v1, ..., vn are linearly independent and span V, that means V is n-dimensional, and from the previous
result we know that r(T) is equal to the span of T(v1), ..., T(vn), but the T(vi) may not be linearly
independent ok. So, basically, the dimension of this r(T) will be less than or equal to the
dimension of V: the largest LI subset of {v1, ..., vn} is the whole set itself, and this is
a basis of V ok, but the images T(vi) may not be linearly independent; it
may be possible that some of the T(vi)'s are linearly dependent. So, the dimension of r(T)
will be less than or equal to the dimension of V. For an example where the inequality is strict, say you are
taking the span of (1, 0) and (0, 1), which is R2.
Take (1, 0) as v1 and (0, 1) as v2; now you can define the linear map T(x1, x2) as, say,
(x1 + x2, x1 + x2). Now, T(1, 0) = (1, 1), and again T(0, 1) is again (1, 1). So, the span of T(1,
0) and T(0, 1) is of course equal to r(T) by the above result, but that is simply
the span of (1, 1), because both the elements are the same. So, here the dimension of r(T) is 1, which is less than
the dimension of V, which is 2; the dimension of r(T) is less than or equal to the
dimension of V, it cannot be more than this. Now, the next result is the rank-nullity
theorem.
It states that if V and W are 2 vector spaces over the field F and T from V to W is linear, and if
V is finite dimensional, then the nullity of T plus the rank of T is always equal to the dimension of V.
The result is easy to show; let us see the proof: we have to show that the nullity of T
plus the rank of T is equal to the dimension of V.
(Refer Slide Time: 18:46)
Here T is from V to W; this is what we have to show. Now, let us consider v1, v2, ..., vn to be an
ordered basis of n(T), the null space of T, let us suppose. Now, since n(T) is a subspace of V, we
can always extend this basis to form a basis of V. So,
let us suppose the dimension of V is, say, p ok; so extend the basis of n(T) to form a basis
v1, ..., vn, vn+1, ..., vp of V.
So, let us suppose this is a basis of V, where of course p ≥ n. Now, let us consider a set
S which is nothing but {T(vn+1), T(vn+2), ..., T(vp)}. Now, if we show that this is nothing but
a basis of the range of T, then we are done, because here the nullity of T is n ok,
and if the rank is, say, k, and the dimension of V is p, then we need k = p - n; and if anyhow we have shown that this set S is a
basis of r(T), whose number of elements is p - n, then we have shown the result. So, we just have to
show now that this S is nothing but a basis of r(T). So, for a basis of r(T) we have to show
the two properties: number one, it is linearly independent, and number two, its span
generates the entire r(T).
So, first let us see the span of T(vn+1), T(vn+2), ..., T(vp). We know that if V =
the span of (v1, v2, ..., vp), then r(T) = the span of T(v1), T(v2), ..., T(vp) ok. So, now, here the span of
(v1, v2, ..., vp) = V, because this is a basis of V. So, the span of T(v1), T(v2), ..., T(vn),
T(vn+1), T(vn+2), ..., T(vp) equals r(T); now, since v1, v2, ..., vn are in n(T), T(v1) up to T(vn) are
0, so these are all 0.
So, that means r(T) is equal to the span of T(vn+1), T(vn+2), ..., T(vp). So, the
first property we have shown: the span of this set is equal to r(T), that is, the span of this set is the
entire r(T). Now, the second thing we have to show
is that this set is linearly independent. So, take some linear combination of these elements,
put it equal to 0, and we have to show that all the α's from n+1 to p are 0.
So, let αn+1T(vn+1) + ... + αpT(vp) = 0; this implies T(αn+1vn+1 + ... + αpvp) = 0, because T is linear. So, this implies
αn+1vn+1 + ... + αpvp belongs to the null space of T, and since it belongs to the null space of T
and the basis of the null space of T is (v1, v2, ..., vn), this element can be written as a linear
combination of elements of the basis of n(T). So, this implies αn+1vn+1 + ... + αpvp equals
β1v1 + ... + βnvn.
So, this implies β1v1 + ... + βnvn - αn+1vn+1 - ... - αpvp = 0; but this set (v1, v2, ..., vp) is a
basis of V, so it is linearly independent; so this implies the βi's are 0 for i from 1 to n, and
the αj's are 0 for j from n+1 to p. So, we have shown that the αj's, for j varying from n+1 to p,
are 0. This means this set is linearly independent. So, we have shown that this is linearly
independent and the span of this is equal to r(T); that means, this is a basis of
r(T), and hence we have shown the result. Now, let us come to a few problems based on this.
State true or false with proper reasoning: the first one is, there is no one-to-one linear transformation from R5
to R3. Now, let us see whether it is true or false. For all these problems we will apply the
rank-nullity theorem. By the rank-nullity theorem, you see here T is from R5 to R3; if T were one-to-one,
its nullity would be 0, and this would imply that r(T) is 5, which is not possible since the maximum value
of r(T) is 3: r(T) is a subspace of W, so its dimension cannot exceed the
dimension of W. So, r(T) = 5 is not possible because
the maximum value of r(T) is 3; so we cannot have any linear transformation from R5 to
R3 which is one-to-one.
So, the statement is true: no such one-to-one linear transformation exists. Now the
second: if T from R2 to R2 is linear and one-to-one, then it is
onto. Here, one-to-one means nullity of T is 0, so by rank-nullity the rank of T is 2, which equals the dimension of R2; hence T is onto and the statement is true.
(Refer Slide Time: 27:18)
The next: if T is from R3 to R3 and it is linear and onto, then rank of T is equal to the
dimension of V; here the dimension of V is 3 ok. Now, T is from
R3 to R3, so V and W both have dimension 3. T is linear and onto means r(T) is equal
to the dimension of W, which is 3. So, nullity of T plus rank of T will be equal to the dimension
of V, which is 3, and hence the nullity of T is 0; the statement is true.
The next is: if T from V to W is linear and V is finite dimensional, then the rank of T is always
less than or equal to the minimum of the dimension of V and the dimension of W. You
see, here T is a linear transformation from V to W ok.
Now, r(T) is definitely less than or equal to the dimension of W because r(T) is a subspace of
W; if r(T) is a subspace of W, its dimension will always be less than or equal to the
dimension of W. Also, since V is finite dimensional, the rank-nullity theorem holds: nullity of T plus
rank of T equals the dimension of V. So, rank of T will be equal to the dimension of V minus
the nullity, and nullity of T is greater than or equal to 0 (it is 0 if T is one-to-one). So, minus the
nullity of T is less than or equal to 0; that means, the dimension of V minus the nullity of T
is less than or equal to the dimension of V.
So, this implies r(T) is also less than or equal to the dimension of V. Since it is less than or
equal to the dimension of V and also less than or equal to the dimension of W, this implies
r(T) will be less than or equal to the minimum of the dimension of V and the dimension of W. The last
one is: if T from R3 to R5 is a linear map, then r(T) can be a 2-dimensional subspace of
R5. So, let us see; here T is from R3 to R5.
(Refer Slide Time: 31:29)
So, yes, such a T exists: if r(T) is 2 and n(T) is 1 for some T, then 2 + 1 = 3, which is consistent with the rank-nullity theorem. But
r(T) cannot be a subspace of R5 of dimension more than 3, because
the sum must be 3 and the minimum value of n(T) is 0. So, definitely r(T) will be equal to 3
minus n(T), and n(T) is greater than or equal to 0; that means, r(T) will be less than or
equal to 3; it cannot be more than 3. So, for this problem the statement is true.
So, we have seen what the rank and nullity of a linear transformation are and a
few properties based on them.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 11
Inverse of a Linear Transformation
Hello friends, welcome to lecture series on Matrix Analysis with Applications. In the last
lecture we have seen what a linear transformation is and what the fundamental
properties of a linear transformation are. Also, we have seen what the rank of a linear
transformation is and how we can define the null space or nullity of a linear transformation T.
Now, in this lecture we will see that how we can find inverse of a linear transformation
ok.
Now, first the definition: let V and W be vector spaces over the field F and let T be a
linear transformation from V to W. A function U from W to V is said to be an inverse of
T if TU is the identity of W and UT is equal to the identity of V. If T has an inverse, then T is
said to be an invertible LT.
Invertible means there exists a linear transformation U from W to V such that TU equals the
identity of W and UT equals the identity of V. Also, if T has an inverse, then the inverse of
T is unique and is denoted by T-1 ok.
(Refer Slide Time: 01:54)
So, we have already seen that if we have a map T from say V to W, then
we say that this T is a linear transformation if T(αv 1 +v 2 ) = αT(v 1 )+T(v 2 ) for all v 1 , v 2
in V and α belonging to the field ok. Now, T-1 is a linear transformation from W to V ok;
this is T-1: T is the linear transformation from V to W and T-1 is the linear transformation
from W to V. Now, when does this T-1 exist? This T-1 is basically the U above, so it will exist if TU equals the identity of W
and UT equals the identity of V.
Now, T-1 will exist if and only if, number one, T is 1 to 1 and, number two, T is
onto; that means, T is a bijective mapping ok.
(Refer Slide Time: 04:25)
Now, let us solve this problem. Determine whether the linear transformation T in each of the
following cases is invertible, and if yes, find T-1. So, the first example is T from R2 to
R2 defined by this ok. So, what is T here, let us see.
Now, to verify whether T-1 exists or not, that is, whether T is invertible or not, we have to check two
conditions. Number 1, T must be 1 to 1 and number 2, T must be onto. So, first
we will verify whether T is 1 to 1 or not. Now, for T to be 1 to 1 we have to
find the null space of T; if the null space of T is the singleton {0} of V (V here is R2), then
we say that T is 1 to 1. So let us find the null space of T: it consists of all those (x 1 , x 2 ) in R2
such that T(x 1 , x 2 ) = (0, 0). Now T(x 1 , x 2 ) = (0,
0) means x 1 -x 2 = 0 and x 1 +2x 2 = 0.
So, when we solve these two equations we simply get x 1 = 0 and x 2 = 0; that means, (0,
0) is the only point in the null space of T, that is, the null space is the singleton {0}. So, this
implies T is 1 to 1. Now, second, T must be onto. For T to be onto we have
to see that the range of T must be equal to W ok. Now, if you find the range of T, I mean r(T), r(T)
is the set of all (y 1 , y 2 ) in W for which there exists (x 1 , x 2 ) in V such that T(x 1 , x 2 ) = (y 1 , y 2 ).
So, what is T(x 1 , x 2 )? T(x 1 , x 2 ) is simply (x 1 -x 2 , x 1 +2x 2 ), which should be equal
to (y 1 , y 2 ). So, this implies x 1 -x 2 should be equal to y 1 and x 1 +2x 2 should be equal to y 2 .
Now, to solve these two equations, simply subtract them. We obtain
-3x 2 = y 1 -y 2 , which implies x 2 = (y 2 -y 1 )/3 ok. And when you find x 1 from the first
equation it is y 1 +x 2 , which equals y 1 +(y 2 -y 1 )/3 = (2y 1 +y 2 )/3.
So, what we have shown is that for every (y 1 , y 2 ) in W there
exists (x 1 , x 2 ) in V such that x 1 is given by this expression and x 2 is given by this expression;
that simply means T((2y 1 +y 2 )/3, (y 2 -y 1 )/3) = (y 1 ,
y 2 ) ok.
So, for every (y 1 , y 2 ) there exists (x 1 , x 2 ) in V; that means, this mapping is onto. So, this
implies T is onto. Or we can simply apply the rank-nullity theorem: you see, nullity of T is 0 ok.
The dimension of V is 2, and by rank-nullity, nullity of T plus rank of T must be the
dimension of V; it is 0 plus rank of T equal to 2. So, this implies rank of T equals 2 and
this implies the mapping is onto ok. So, this is a simple way to show that T is onto.
Now, T is onto and 1 to 1; this implies T is invertible ok.
Now, if T is invertible, how can we find T-1? To find T-1, let us take say T-1(z 1 , z 2 ) =
(x 1 , x 2 ) ok. So, this implies T(x 1 , x 2 ) = (z 1 , z 2 ), that is,
x 1 - x 2 = z 1 and x 1 + 2x 2 = z 2 .
Now, when you solve these two you will simply obtain, as we did before, x 2 = (z 2 -
z 1 )/3 and x 1 = (2z 1 +z 2 )/3 ok. So that means, T-1(z 1 , z 2 ) = ((2z 1 +z 2 )/3, (z 2 -z 1 )/3). So, this is
T-1 of this map. So, first we have to check whether this T is invertible or not; in order
to check this you have to show that T is 1 to 1 and onto. If it is not 1 to 1 or not onto, this
means T is not invertible, and if it is invertible then we can find T-1 using this procedure
ok.
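Since this T is just multiplication by a 2-by-2 matrix, the same T-1 can be checked numerically; the following is a minimal sketch assuming NumPy, where M is simply a name introduced here for the matrix of T in the standard basis.

    import numpy as np

    # T(x1, x2) = (x1 - x2, x1 + 2*x2) is multiplication by this matrix.
    M = np.array([[1.0, -1.0],
                  [1.0,  2.0]])

    M_inv = np.linalg.inv(M)            # matrix of T^-1
    print(M_inv)                        # rows [2/3, 1/3] and [-1/3, 1/3]

    z = np.array([4.0, 5.0])            # an arbitrary (z1, z2)
    x = M_inv @ z                       # should equal ((2*z1 + z2)/3, (z2 - z1)/3)
    print(x, (2*z[0] + z[1]) / 3, (z[1] - z[0]) / 3)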
Now, in the next example we consider the linear map T from the space of all
matrices of order 2x2 to P 2 over the field R, given by T(a, b; c, d) = a+2bx+(c+d)x^2.
So, first we will see whether this is invertible or not. Here T maps the matrix (a, b; c, d) of order
2x2 to a+2bx+(c+d)x^2. First we will find the null space of this linear
transformation: we take all those (a, b; c, d) in M 2x2 such that T(a, b; c, d) = 0.
Here 0 means the zero polynomial, the 0 of P 2 , which is 0+0x+0x^2. So, these are all (a, b; c, d)
such that a+2bx+(c+d)x^2 = 0, and this equal to 0 means
the constant term is 0, the coefficient of x is 0 and c + d = 0. This means a = 0,
b = 0 and d = -c, where c belongs to R.
So, clearly the null space consists of all matrices of
this type, (0, 0; c, -c), where c belongs to R. So, the null space is not the singleton {0}; this
implies this map is not 1 to 1. So, this linear
transformation is not invertible ok. Now, in the third example we consider the map from
matrices of order 2x2 to matrices of order 2x2 defined by this expression. So, let us see whether this is
1 to 1 and onto or not.
(Refer Slide Time: 13:21)
So, here T(a, b; c, d) = [a+b, a; c, c+d]. Now, first we have to
verify whether T is 1 to 1 or not. For 1 to 1 we have to find the null space of T; the null
space of T is the set of all those (a, b; c, d) such that T(a, b; c, d) equals the 0 matrix [0 0; 0 0] ok, because
here the mapping goes into the matrices of order 2x2.
So, these are all those (a, b; c, d) such that T(a, b; c, d) = 0. This means a+b = 0, a = 0,
c = 0 and c+d = 0. If a = 0 then from here b = 0, and if c = 0 then from here d = 0; that means,
the null space contains only the singleton 0 matrix, and this implies T is 1
to 1. Now, we have to see whether T is onto or not. That we can see very easily by
using the rank-nullity theorem: you see, nullity of T plus rank of T must be equal to the
dimension of V. Here V is the space of matrices of order 2x2,
so its dimension is 4 ok, and the nullity is 0. So, this implies rank of T is 4, which equals the
dimension of W; W here is also M of order 2x2, whose dimension is 4. So, this implies T
is onto. Now, T is 1 to 1 and T is onto; this implies T is invertible. Now, if T is invertible,
then what will be T-1? We can find T-1 as follows.
(Refer Slide Time: 15:27)
Let T-1([α β; γ δ]) = [a b; c d], say. So, this implies T(a, b; c, d) will be equal to [α β;
γ δ], and T(a, b; c, d) is given by [a+b a; c c+d]. Setting this equal to [α β; γ δ],
this implies a+b = α, a = β, c = γ and c + d = δ. And this implies a = β and b = α-β; c = γ
and d = δ-γ.
So, T-1([α β; γ δ]) is given by [β α-β; γ δ-γ]. So, this
would be the T-1 of this linear transformation ok.
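A quick way to check this numerically, a minimal sketch assuming NumPy: identify the 2x2 matrix (a, b; c, d) with the vector (a, b, c, d), so that T becomes multiplication by a 4x4 matrix (this identification and the name M are introduced only for the illustration).

    import numpy as np

    # T(a, b; c, d) = [a+b, a; c, c+d] acting on the vector (a, b, c, d).
    M = np.array([[1.0, 1.0, 0.0, 0.0],    # a+b
                  [1.0, 0.0, 0.0, 0.0],    # a
                  [0.0, 0.0, 1.0, 0.0],    # c
                  [0.0, 0.0, 1.0, 1.0]])   # c+d

    M_inv = np.linalg.inv(M)
    alpha, beta, gamma, delta = 5.0, 2.0, 3.0, 7.0
    print(M_inv @ np.array([alpha, beta, gamma, delta]))
    # expected [beta, alpha - beta, gamma, delta - gamma] = [2, 3, 3, 4]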
Now, we will see some properties of the inverse of a linear transformation. The first property
is that it is a linear map: the inverse T-1 of a linear transformation is again a linear transformation. So,
what is the statement of the theorem? Let V and W be vector spaces over the field F and
let T from V to W be linear and invertible ok; then T-1, which is from W to V, is
also linear ok. So, in order to show that T-1 is linear we have to take two elements w 1 and
w 2 in W and α in the field, and we have to show that T-1(αw 1 +w 2 ) = αT-1(w 1 )+T-1(w 2 ).
So, in the proof we have taken two vectors w 1 and w 2 in W and α in the field. Since T is
1-1 and onto, because T is invertible, there exist unique vectors v 1 and v 2 in V
such that T(v 1 ) = w 1 and T(v 2 ) = w 2 . Therefore, T-1(w 1 ) = v 1 and T-1(w 2 ) = v 2 . Now,
since T is linear, T(αv 1 +v 2 ) = αT(v 1 )+T(v 2 ) = αw 1 +w 2 , and so T-1(αw 1 +w 2 ) = αv 1 +v 2 = αT-1(w 1 )+T-1(w 2 ), which is what we had to show.
So, if T is linear and invertible then T-1 is also linear and invertible.
Now, the next result: let V and W be finite dimensional vector spaces and T a linear
map from V to W. Then we have to prove the following two results.
So, what is the first result? The first result is: if the dimension of V is less than the dimension of
W, then T cannot be onto. So, let us try to prove this result.
Now, T is a linear map from V to W. In the first part the dimension of V is less than the
dimension of W and we have to show that T cannot be onto. Now, when will T be onto?
T will be onto when rank of T = dimension of W ok.
By the rank-nullity theorem you know that nullity of T plus rank of T equals the
dimension of V, and the dimension of V is less than the dimension of W; it is given to us. So, this
implies rank of T < dimension of W - nullity of T. Now, nullity of T ≥ 0, we know this.
So, -(nullity of T) ≤ 0, and hence rank of T = dimension of V - nullity of T ≤ dimension of V < dimension of
W.
So, this means rank of T < dimension of W; it will never be equal to the dimension of W. So,
this means T cannot be onto ok. So, the first part of the theorem is over. So, the next part
is if dimension of V is more than dimension of W then T cannot be one to one ok. Now,
again T is a linear map from V to W and dimension of V is more than dimension of W
and we have to show that T cannot be one to one that means, nullity of T cannot be 0 or
nullity of T is always strictly greater than 0, this you have to show ok.
(Refer Slide Time: 21:53)
Now, again apply the rank-nullity theorem: nullity of T + rank of T = dimension of V
ok, which is more than the dimension of W. So, now if nullity of T = 0, then
this implies rank of T = dimension of V > dimension of W from this expression, which is not possible. This is not
possible because the rank of T can never exceed the dimension of W, since the range of T is a
subspace of W ok.
So, this implies nullity of T can never be 0, that is, nullity of T > 0, and this implies T cannot be 1
to 1 ok. So, the proof of this is over.
Now, the next result is: let T be a linear transformation from V into W. Then T is non-
singular if and only if T carries each linearly independent subset of V onto a linearly
independent subset of W. That means, if T is a linear map and is also non-singular, then
any linearly independent subset of V always maps to a linearly independent
subset of W; and conversely, if this happens for every linearly independent subset of V, then T is
non-singular.
So, how can we show this? The proof is simple. T is a linear map from V to
W ok, and we have to show that T is non-singular if and only if T carries each linearly
independent subset of V onto a linearly independent subset of W ok. So, first of all let us
take T to be non-singular, and let S = {v 1 ,
v 2 ,..., v k } be a linearly independent subset of V; then we have to show that T(v 1 ),
T(v 2 ),..., T(v k ) is also LI in W.
So, to show that it is LI we take a linear combination of these elements, put it
equal to 0, and try to show that all the scalars are equal to 0. So, let α 1 T(v 1 )+α 2 T(v 2 )+...+
α k T(v k ) = 0. Since T is linear, this implies T(α 1 v 1 +α 2 v 2 +...+α k v k ) = 0, and this
implies α 1 v 1 +α 2 v 2 +...+α k v k belongs to the null space of T, because T of this element is equal
to 0.
But T is non-singular, it is given to us, and T non-singular means the null space contains
only the singleton 0; this implies α 1 v 1 +α 2 v 2 +...+α k v k = 0. But this set is linearly
independent, it is given to us that S is linearly independent. So, since S is linearly
independent, this implies α 1 = α 2 =.....= α k = 0. Since S is LI, this shows that the set T(v 1 ),..., T(v k ) is
also linearly independent. So, the first part of the proof is over. Now, let us suppose T
carries each LI subset of V onto an LI subset of W. We have to show the converse now:
we have to show that T is non-singular, and T non-singular means we have to show
that the null space of T contains only the singleton 0 ok.
Let v be a non-zero vector in V ok. Now, let us consider the set S containing only this v. Since
v is non-zero, this singleton set S = {v} is
LI. So, by the assumption that T carries each LI subset of V onto an LI subset of W, the
set containing T(v) in W is also LI, and it can be LI only if T(v) ≠ 0.
So, we have shown that v ≠ 0 implies T(v) ≠ 0; that means the null
space of T contains only the singleton 0, no element other than 0, and that means T is
non-singular ok. The next definition is isomorphism.
What do you mean by isomorphism? You see, if V and W are vector spaces over the
field F, then any one-one linear transformation T of V onto W is called an isomorphism of
V onto W. I mean, if T is a linear transformation which is one-one and onto from V to W, then
that linear map is called an isomorphism of V onto W. If there exists an isomorphism of
V onto W, then we say that V is isomorphic to W.
So, if we have a linear transformation T from V to W such that T is one-one and onto
then we say that V is isomorphic to W ok. So, in this lecture we have seen, we have
discussed some of the important properties about inverse of a linear transformation. In
the next lecture we will see a matrix associated with a linear transformation.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 12
Matrix Associated with a LT
Hello friends, welcome to lecture series on matrix analysis with applications. In the last
lecture, we have seen how we can find the inverse of a linear transformation. We
have seen that if T is invertible, that is, T is 1 to 1 and onto, then T-1 is also linear and
invertible. In this lecture, we will discuss how we can find a matrix associated
with a linear transformation.
Let us suppose that V and W are vector spaces of dimensions n and m, respectively ok;
these two are finite dimensional vector spaces of dimensions n and m over the
field F. Let B 1 , which is given by v 1 , v 2 ,....., v n , be an ordered basis of V and B 2 , which is w 1 ,
w 2 ,...., w m , be an ordered basis of W. Let T from V to W be a linear map, a linear
transformation ok.
(Refer Slide Time: 01:37)
Similarly, if you want to write T(v j ), where j runs from 1 to n, it will be α 1j
w 1 + α 2j w 2 +....+ α mj w m , or it can be written as the sum over i from 1 to m of α ij w i , with j
running from 1 to n; that means, the image of each element of B 1 can be expressed as a
linear combination of the w i 's.
Now if you take the first element v 1 , then T(v 1 ) is given by this, and these scalars are
called the coordinates of T(v 1 ): the coordinates of T(v 1 ) are α 11 , α 21 ,...., α m1 . The
coordinates of T(v 2 ) will similarly be α 12 , α 22 ,...., α m2 . So, you represent the matrix
corresponding to this like this: the coordinates of T(v 1 ) as a
column vector α 11 , α 21 ,...., α m1 ; the second column is α 12 , α 22 ,...., α m2 ; and the last is α 1n ,
α 2n ,...., α mn . So, this mxn matrix, the matrix corresponding to this linear transformation
with respect to the bases B 1 and B 2 , is the matrix associated with T with respect to the ordered bases
B 1 and B 2 ok.
(Refer Slide Time: 05:18)
So, this is the same A which we have discussed, and this is called the matrix of T relative to
the pair of ordered bases B 1 and B 2 ok. Now, to understand this let us discuss a few
examples or problems based on this.
The first problem: let us consider T from R 2 to R 3 , where T(x 1 , x 2 ) is given
by (x 1 +x 2 , 2x 1 -x 2 , 7x 2 ), the basis of R 2 is the ordered basis [1 0; 0 1], the standard
basis, and the basis of R 3 is [1 0 0; 0 1 0; 0 0 1], the standard basis of R 3 .
Now, we have to find the matrix of T with respect to the ordered bases B 1 and B 2 . So,
how can we find that matrix? Let us see: first we find T(1, 0); it is 1, it is 2,
it is 0, so (1, 2, 0), and this is nothing but 1(1, 0, 0)+2(0, 1, 0)+0(0, 0, 1). So, what are the
coordinates of the vector T(1, 0)? The coordinates of this vector are 1, 2 and 0, and you
write this as a column of the matrix of T.
Again T(0, 1) is (1, -1, 7), which can be written as 1(1, 0, 0)-1(0, 1, 0)+7(0, 0, 1). So,
the coordinates of this vector are (1, -1, 7). So, the matrix associated with this linear
transformation corresponding to the bases B 1 and B 2 is given by the coordinate vector (1, 2,
0) as the first column, and (1, -1, 7) as the second column. So, this
will be the matrix associated with this linear transformation T corresponding to the
standard bases of R 2 and R 3 .
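With standard bases on both sides, the matrix of T is obtained by stacking the images of the basis vectors as columns; here is a minimal NumPy sketch (the function name T is introduced only for this illustration).

    import numpy as np

    def T(x1, x2):
        # the map T(x1, x2) = (x1 + x2, 2*x1 - x2, 7*x2)
        return np.array([x1 + x2, 2*x1 - x2, 7*x2])

    A = np.column_stack([T(1, 0), T(0, 1)])   # 3x2 matrix of T w.r.t. B1, B2
    print(A)                                  # columns (1, 2, 0) and (1, -1, 7)
    print(A @ np.array([2, 3]), T(2, 3))      # both give (5, 1, 21)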
Now we have a second example: determine the matrix of T with respect to the bases B 1 and B 2 , where T is
given by this expression, B 1 is this and B 2 is this. So, how can we find this?
So, now T is from R 3 to R 2 and T(x, y, z) is given by (x+y, y+z). B 1 , the basis of R 3 ,
is given as [1, 1, 1; 1, 0, 0; 1, 1, 0] and the basis of R 2 is [2, 3; 1, 0].
Of course, corresponding to different bases for the same linear transformation, the
matrix will be different: we have the same linear transformation, but different
bases of R 3 and R 2 , so there will be a different matrix of this linear transformation
corresponding to the bases B 1 and B 2 ok. Of course, the order of the
matrix will be 2x3; it is clear from here ok. So, what is T(1, 1, 1)? From here it is (2, 2),
and now this can be written as a linear combination of (2, 3) and (1, 0), say (2, 2) = α(2, 3)+β(1, 0). The second
component comes only from (2, 3), because (1, 0) has no second component, so 3α = 2 and α = 2/3.
Then the first component is 2α+β = 4/3+β, and to make it 2 we need β = 2/3, since 4/3+2/3 = 6/3 = 2.
So, the coordinates of this vector with respect to this basis are (2/3, 2/3). Now if you take T(1,
0, 0): here 1+0 = 1 and 0+0 = 0, so T(1, 0, 0) = (1, 0), which is 0(2, 3)+1(1, 0). So, the coordinates
of this vector will be (0, 1).
Now T(1, 1, 0): you see, 1+1 = 2 and 1+0 = 1, so it is (2, 1), and writing it as a
linear combination of these two vectors, the second component gives 3α = 1, so α = 1/3, and then
2/3+β = 2 gives β = 4/3 ok.
So, the matrix corresponding to this linear transformation with respect to these two bases
will have the coordinate vector [2/3 2/3] as its first column, the second
column [0 1], and the third column [1/3 4/3]. So, it is of order 2x3, and this will be the
matrix corresponding to this linear transformation ok. In the third problem, consider a linear map, I
mean a linear transformation, D from P 3 to P 2 which is the differentiation map ok.
Now, we have to calculate the matrix of D relative to the standard bases of P 3 and P 2 . So,
how can we do that? Here D is from P 3 to P 2 and D(f) is simply f', or D(p) is
simply p' ok. The standard basis of P 3 is 1, x, x^2, x^3, and the
standard basis of P 2 is 1, x, x^2. Now what will D(1) be? Because it is a differential
operator, D(1) = 0, and 0 can be written as 0(1)+0(x)+0(x^2), a linear combination of
these 3 elements. The derivative of x is 1, which can be written as 1(1)+0(x)+0(x^2). Now,
what is D(x^2)? It is 2x, which is 0(1)+2(x)+0(x^2). And what is D(x^3)? It is 3x^2, which is
0(1)+0(x)+3(x^2). So, what is the corresponding matrix? You see, here the coordinates are (0, 0, 0),
written as a column (0, 0, 0); here the coordinates are (1,
0, 0); here the coordinates are (0, 2, 0); here the coordinates are (0, 0, 3). So, this will be the matrix
corresponding to D; it has 3 rows and 4 columns. So, this is the matrix of
this linear transformation corresponding to these two standard bases of P 3 and
P 2 .
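A small numerical check, a minimal sketch assuming NumPy: if we identify a polynomial a0 + a1*x + a2*x^2 + a3*x^3 with the coordinate vector (a0, a1, a2, a3), the 3x4 matrix above differentiates it.

    import numpy as np

    # Columns are the coordinates of D(1), D(x), D(x^2), D(x^3) in the basis 1, x, x^2.
    D = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0, 3.0]])

    p = np.array([5.0, 1.0, 4.0, 2.0])    # p(x) = 5 + x + 4x^2 + 2x^3
    print(D @ p)                          # (1, 8, 6), i.e. p'(x) = 1 + 8x + 6x^2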
Now, if a linear map is known and the bases are known, then we can find out the
corresponding matrix ok. Conversely, if a matrix is known and the bases are known, can we
find out the corresponding linear map from V to W? The answer is yes. So, how we
can find it, let us see here.
So, let a matrix B = [B ij ] of order mxn be given. Using
this matrix B, we can define a linear transformation T from V to W, where V and W are
vector spaces of dimensions n and m respectively. To understand this, let us
suppose B 1 , which is v 1 , v 2 ,..., v n , and B 2 , which is w 1 , w 2 ,..., w m , are the ordered
bases of V and W respectively. Then how can we find out the corresponding linear
transformation from V to W?
You see, we have T from V to W, and a matrix B = [B ij ] of order mxn is known to you,
the matrix which is associated with the linear map for the bases B 1 and B 2 ; B 1
and B 2 are the bases of V and W ok.
Now, this matrix is something like this: it is B 11 , B 12 and so on up to
B 1n ; then B 21 , B 22 and so on up to B 2n ; and the last row is B m1 and so on up to B mn . It is of order
mxn. Now, if you want to write T(v 1 ), the image of the first element of the
basis B 1 , this is some element of W. So, again this can be written as a linear
combination of the elements of the basis of W, that is, the elements of B 2 , and we
know that the coordinates of this vector are always read from the first
column.
So, this T(v 1 ) can be found as B 11 w 1 +B 21 w 2 +....+B m1 w m . This is known: the
coefficients, all the B ij 's, are known and the w i 's are known. Similarly, if you want
T(v j ), it will be the sum of B ij w i with i varying from 1 to m, for
each j running from 1 to n. So, if we know the T(v j )'s, then we can easily find
out the corresponding linear map. How can we find this? Let us discuss this by an
example.
Consider the matrix A = [1 -1 2; 3 1 0]. We have to find out the linear transformation
T from R 3 to R 2 for each of the following bases B 1 and B 2 . The first one is the standard bases
of R 3 and R 2 . So, how can we see this?
(Refer Slide Time: 17:05)
T is from R 3 to R 2 , and the matrix A for both problems is the same ok. Now the standard basis
of R 3 is (1, 0, 0), (0, 1, 0) and (0, 0, 1). The standard basis of R 2 is (1, 0) and (0, 1) ok;
that you already know.
Now, what is T(1, 0, 0)? T of this element will be given by the matrix associated
with this linear map corresponding to these standard bases, the matrix which is given to
us, and the first column corresponds to the coordinates of the first image, that is T(v 1 ). So,
this is nothing but 1(1, 0)+3(0, 1), that is, simply (1, 3). Again T(0, 1, 0) will be
read from the second column: -1(1, 0)+1(0, 1), that will be (-1, 1).
Now T(0, 0, 1) = 2(1, 0)+0(0, 1), which is (2, 0). Now, any (x, y, z) in R 3
can be written as x(1, 0, 0)+y(0, 1, 0)+z(0, 0, 1). So, T(x, y, z) = x T(1, 0, 0), which is (1, 3),
plus y T(0, 1, 0), which is (-1, 1), plus z T(0, 0, 1), which is (2, 0). So, this will be (x-y+2z, 3x+y).
So, this is the required linear map. So, if the matrix is known and the corresponding bases are
known, then also we can find out the linear map ok.
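With standard bases on both sides the map is simply x -> A @ x, so the formula can be checked numerically; a minimal sketch assuming NumPy, with an arbitrary test point.

    import numpy as np

    A = np.array([[1.0, -1.0, 2.0],
                  [3.0,  1.0, 0.0]])

    x, y, z = 2.0, 5.0, -1.0
    print(A @ np.array([x, y, z]))            # [-5. 11.]
    print(x - y + 2*z, 3*x + y)               # -5.0 11.0, matching T(x, y, z) = (x-y+2z, 3x+y)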
In the second example, the bases are these. So, how can we find out the corresponding
linear map? You can see here: again T is from R 3 to R 2 and A is given to us, which is the
same; B 1 is [1, 1, 1; 1, 1, 0; 1, 0, 0] and B 2 is [1, 1; 1, -1].
Now, first you write T(1, 1, 1); for the first element the coordinates are (1, 3). So, it is 1(1,
1)+3(1, -1), which is (4, -2). Now for T(1, 1, 0) the coordinates are (-1, 1), so it is -1(1,
1)+1(1, -1), which is (0, -2); T(1, 0, 0) = 2(1, 1)+0(1, -1), which is (2, 2). Now you
express any (x, y, z) in R 3 as a linear combination of these 3 basis elements, say (x, y, z) = α(1, 1, 1)+β(1, 1, 0)+γ(1, 0, 0). You try to find
out α, β, γ in terms of x, y, z. How can we do that? You simply have 3 equations, you
see: α+β+γ = x; α+β = y and α = z.
So, from here we obtain β as β = y-z, and from here we obtain γ as γ = x-α-β. You see,
α+β+γ = x; α+β = y and α = z. So, β = y-z, and how will we find γ? γ = x
-α-β = x-z-(y-z), so this is x-y. So, what is (x, y, z)? It is
z(1, 1, 1)+(y-z)(1, 1, 0)+(x-y)(1, 0, 0), and T(x, y, z) will be equal to z times T(1, 1, 1), which is (4,
-2), plus (y-z) times T(1, 1, 0), which is (0, -2), plus (x-y) times T(1, 0, 0), which is (2, 2).
Now, you can simplify this and find the corresponding linear map from R 3
to R 2 with respect to these two bases B 1 and B 2 . So, if we know the linear transformation T
and the corresponding bases of V and W, then we can find out the matrix associated with T
corresponding to those bases B 1 and B 2 . And conversely, if we know the matrix and the
bases of V and W, then we can find out the linear map ok. So, in the next lecture we will
see some more properties of vector spaces and linear transformations.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 13
Eigenvalues and Eigenvectors
Hello friends, welcome to lecture series on Matrix Analysis with Applications.
Today's lecture is based on Eigenvalues and Eigenvectors: if we have a given matrix
A, then how can you find the eigenvalues and the corresponding eigenvectors.
So, first of all, what are eigenvalues and eigenvectors? Let V be a vector space over the
field F, and T be a linear operator on V, that is, T is a linear transformation from V to V.
A non-zero vector v belonging to V is called an eigenvector of T if there exists a scalar λ
belonging to F such that Tv = λv. So, if we have a linear operator T, and we have a non-
zero vector v in V, and there exists a λ belonging to the field such that Tv = λv, then we say
that λ is an eigenvalue, and the corresponding eigenvector is v.
Now, a square matrix A may be viewed as a linear operator T defined as Tx = Ax,
where x belongs to V ok. So, a matrix can be viewed
as a linear operator also. Now let us take a matrix A, which is a square matrix of
order n, and consider the system of homogeneous equations Ax = λx.
Then the values of λ for which the system of linear equations has a non-trivial solution are
called eigenvalues or characteristic values of A, and the corresponding non-zero
vectors x are called eigenvectors or characteristic vectors of A. Since every matrix
can be viewed as a linear operator T, in the previous definition we have replaced the T
by a matrix A here. And this x is in V, and λ is a scalar in the field ok.
So, what I want to say basically is: suppose you have a matrix A, which is of order nxn.
Then if Ax = λx, where x ≠ 0, this λ is called an eigenvalue of A and x is called the
corresponding eigenvector of A, or the eigenvector corresponding to λ.
Now, we have Ax = λx, so this can be written as Ax-λx = 0 ok, that is,
(A-λI)x = 0, where I is the identity matrix. Now, what we have basically
is (A-λI)x = 0, which means we have a system of homogeneous linear equations,
because the right-hand side b = 0.
Now, x ≠ 0 as we already defined. So, if we take (A-λI) as a linear
operator T, then Tx = 0 with x ≠ 0; that
means, the null space is non-trivial: the nullity of this matrix or operator is greater
than or equal to 1 ok, because x ≠ 0 means the nullity is not 0, and nullity not 0
means it is greater than or equal to 1.
Now, what is the dimension of the domain of A-λI? Of course, it is n. So, by the rank-
nullity theorem, the rank of A-λI + the nullity of A-λI must be equal to n.
So, this implies the rank of A-λI must be equal to n - the nullity
of A-λI.
And since the nullity is greater than or equal to 1, the rank is less than or equal to n-1. And since
rank ≤ n-1, this means in the echelon form of this matrix at least one zero row is there,
having all 0 elements, and that implies the determinant of A-λI must be
0. Now, this equation, determinant of A-λI = 0, is called the characteristic equation; it
will give a characteristic polynomial in λ. Now, this is a matrix of order nxn, so
this determinant will give a polynomial of degree n in λ.
So, what do we have now? Now we look at the determinant of A-λI = 0,
that means, this determinant is equal to 0, you see.
(Refer Slide Time: 06:38)
So, this will give us a polynomial in λ, and that polynomial is called the characteristic
polynomial ok. It basically gives (-1)^n[λ^n - c 1 λ^(n-1) + c 2 λ^(n-2) - ... + (-1)^n c n ] = 0,
because when you expand this in terms of λ, you will get a polynomial of degree n in
λ.
Now, if you carefully observe this determinant: when you expand along the first row,
the first term is (a 11 -λ) times its cofactor, obtained by deleting the first row and the first column.
The next term is -a 12 times its cofactor.
Now, when you expand from the element a 12 , you delete its column and
its row, and that means two of the diagonal factors, (a 11 -λ) and (a 22 -λ), are not present in
that cofactor; so that term contains at most λ^(n-2), because when
you expand from this element you are deleting its row and its column, so two
factors of λ do not appear in that expansion. That means the
maximum power of λ which can come from this term is n-2.
Similarly, if you take a 13 , then when you delete its
column and its row, again two factors of λ are missing, so the
maximum power of λ coming from this term is n-2; and similarly for the other
terms. So, what we can conclude is that the remaining terms contain at most λ^(n-2),
λ^(n-3) and so on; that means, the powers λ^n and λ^(n-1) come only from
the product of the diagonal terms (a 11 -λ)(a 22 -λ).....(a nn -λ).
When you similarly expand the entire determinant, λ^n and λ^(n-1)
come only from the product of the diagonal elements ok.
So, all the other terms contribute only λ^(n-2), λ^(n-3) and lower powers. That
means, the coefficient of λ^n is simply the coefficient of λ^n in (a 11 -λ)(a 22 -λ).....(a nn -λ), and it is simply
(-1)^n; so carefully see, it is simply (-1)^n.
And similarly, if you want to see the coefficient of λ^(n-1), it is the coefficient of λ^(n-1) in this same
product (a 11 -λ)(a 22 -λ).....(a nn -λ). So, the constant c 1 in the bracket below is simply a 11
+ a 22 +...+ a nn ; you can simply see that the sum of these diagonal elements determines the
coefficient of λ^(n-1).
So, what we basically have now is that the determinant of A-λI is something
like (-1)^n[λ^n - c 1 λ^(n-1) + c 2 λ^(n-2) - ......... + (-1)^n c n ], with c 1 = a 11 + a 22 +...+ a nn .
Now, how many roots will this equation have? It will have n roots,
because it is a polynomial of degree n. So, let λ 1 , λ 2 ,....., λ n be the roots of this equation.
If these are the roots, then what is the sum of the roots? For the equation λ^n - c 1 λ^(n-1) + ... = 0, the sum
of the roots is simply -(-c 1 ) = c 1 , and c 1 , as we have seen, is
a 11 +a 22 +....+a nn . And this is called the trace of the matrix, the trace of the
matrix A.
So, what we have concluded is that the sum of the eigenvalues is nothing but
the trace of the matrix, that means, the sum of the diagonal elements; this is the first property.
For the second property, note that det(A-λI) equals this expression for every λ, so it
holds for λ = 0 also, and when you substitute λ = 0 we get the determinant of A equal
to c n .
Now, if you find the product of the roots of this equation, the product of the roots
is λ 1 .λ 2 .....λ n , and that is simply (-1)^(degree of the equation) times (last term)/(first term), which is
(-1)^n . (-1)^n c n = c n , and that is simply equal to the determinant of A. So, this implies the
product of the eigenvalues is nothing but the determinant of A. So, we have noted here two
important properties of eigenvalues: the first property is that the sum of the
eigenvalues is equal to the trace of the matrix, that is, the sum of the diagonal elements,
and the second property is that the product of the eigenvalues is equal to the
determinant of the matrix A.
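A minimal NumPy sanity check of these two properties, using an arbitrary example matrix chosen just for the illustration:

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [3.0, 4.0, 1.0],
                  [0.0, 5.0, 6.0]])

    eigvals = np.linalg.eigvals(A)
    print(np.isclose(eigvals.sum(), np.trace(A)))        # True: sum of eigenvalues = trace
    print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # True: product of eigenvalues = det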
(Refer Slide Time: 16:41)
So, the sum of the eigenvalues is simply equal to the trace of the matrix, and the product of
the eigenvalues is simply the determinant of A.
Now, let us discuss this problem. The first problem is: let us consider this matrix A,
which is [1 -1; 1 1], and let us find its eigenvalues and the corresponding eigenvectors.
So, the matrix A is simply [1 -1; 1 1]. You can simply write |A-λI| = 0 for the
characteristic equation, for finding the roots, the latent roots or
characteristic roots, of this matrix A. This is nothing but (1-λ)^2 + 1 = 0, which implies (1-
λ)^2 = -1, or 1-λ = ±i, or λ = 1±i.
So, let us find the corresponding eigenvectors. First you find the eigenvector corresponding
to λ = 1-i. How will you find it? You can simply see that (A-λI)x = 0; we have
already seen that this x is the eigenvector corresponding to this λ.
Now you substitute λ as 1-i, so this is (A-(1-i)I)x = 0.
When you take λ = 1-i here, it becomes [i -1; 1 i][x 1 x 2 ]^T = [0
0]^T.
Now you can apply some elementary row operations to this matrix. You
simply first interchange the two rows; it becomes [1 i; i -1][x 1 x 2 ]^T = [0 0]^T. Now you
can make a 0 here with the help of the first row, by applying the elementary row operation R 2 -
iR 1 . When you apply this elementary row operation, it becomes [1 i; 0 0][x 1 x 2 ]^T = [0 0]^T.
So, this implies x 1 +ix 2 = 0. This gives infinitely many solutions for x 1 and x 2 ok. You
can substitute any value of x 1 and find the corresponding value of x 2 such that (x 1 ,
x 2 ) ≠ (0, 0).
So, you can take x 1 as -i, and then from here x 2 = 1. So, if you are talking about the
number of linearly independent eigenvectors corresponding to λ = 1-i, it is one. You
can pick any one eigenvector from this family, because it gives infinitely
many eigenvectors, and we can say that vector is a
linearly independent eigenvector corresponding to λ = 1-i. So, we can say that
corresponding to λ = 1-i, the linearly independent eigenvector is, say, [-i 1]^T.
Now, similarly you can take λ = 1+i and find out the corresponding linearly
independent eigenvector on the same lines.
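A minimal NumPy check of this example; note that the eigenvectors returned by the library are normalised, so they may differ from [-i, 1] by a scalar factor.

    import numpy as np

    A = np.array([[1.0, -1.0],
                  [1.0,  1.0]])

    w, V = np.linalg.eig(A)
    print(w)                                  # approximately [1.+1.j, 1.-1.j]
    for lam, v in zip(w, V.T):
        print(np.allclose(A @ v, lam * v))    # True for each eigenpair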
(Refer Slide Time: 21:43)
So, let us discuss this problem now. Here the matrix A is simply [0 0 2; 0 2 0; 2 0 3].
Now, λ 1 = 4 is given to us. We know that the trace is equal to the sum of the
eigenvalues. So, let us suppose the other two eigenvalues are λ 2 , λ 3 ; it is of order 3x3, so
it will have three eigenvalues. The sum of the eigenvalues is equal to the trace of the
matrix, that is, λ 1 + λ 2 + λ 3 = 0+2+3 = 5. Now, λ 1 = 4, so this implies λ 2 + λ 3 = 1.
Now, for the product of the eigenvalues, we know that the product of the eigenvalues is simply the
determinant of A. The determinant of A is -8, so this implies λ 2 λ 3 = -2. So, solving
these two equations, we can easily find λ 2 , λ 3 : clearly λ 2
= 2 and λ 3 = -1, so that the sum is 1 and the product is -2. So, the other two eigenvalues are 2 and -1.
So, now we can say that this matrix has the eigenvalues 4, 2, and -1. Now, we have to
find out the corresponding eigenvectors. Let us suppose we have to find out the
eigenvector corresponding to λ = 4; similarly we can find the eigenvectors for λ = 2 and λ =
-1.
So, let us find the eigenvector corresponding to λ = 4. How will you find it? It is (A-λI)X
= 0 again, so this implies (A-4I)X = 0, that is, [-4 0 2; 0 -2 0; 2 0 -1][x 1 x 2 x 3 ]^T
= 0.
Now, we will try to convert this into its echelon form, which comes out to be [-4 0 2; 0
-2 0; 0 0 0][x 1 x 2 x 3 ]^T = [0 0 0]^T. Multiplying these two matrices, we simply get the
equations -4x 1 +2x 3 = 0 and -2x 2 = 0. So, this implies x 2 = 0 and x 3 = 2x 1 .
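A quick NumPy verification of the eigenvalues and of the eigenvector just found, taking x 1 = 1 so that the eigenvector is (1, 0, 2):

    import numpy as np

    A = np.array([[0.0, 0.0, 2.0],
                  [0.0, 2.0, 0.0],
                  [2.0, 0.0, 3.0]])

    print(np.sort(np.linalg.eigvals(A)))      # approximately [-1.  2.  4.]
    v = np.array([1.0, 0.0, 2.0])             # x2 = 0, x3 = 2*x1 with x1 = 1
    print(np.allclose(A @ v, 4 * v))          # True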
(Refer Slide Time: 26:18)
Now, if you want the result for A^2: you see, AX = λX with X ≠ 0. If you multiply both sides by A, it is
A.AX = A.λX = λ.AX, and AX is λX, so it is λ^2 X. So, we can say that A^2 X =
λ^2 X. So, what we can say is that if A has an eigenvalue λ and the corresponding
eigenvector is X, then A^2 has the eigenvalue λ^2 and the corresponding eigenvector is X.
Similarly, we can say that A^k X = λ^k X; you can similarly find the results for A^3, A^4 and so on. So, what
we have concluded is that if A has an eigenvalue λ and the
corresponding eigenvector is X, then A^k has the eigenvalue λ^k, and the corresponding
eigenvector is X.
Now, the next property is that A and A^T both have the same eigenvalues. That is very easy to show:
you can see that the eigenvalues of A are given by |A-λI| = 0. Now, the
transpose of the matrix A-λI is A^T-λI, and its determinant is the same as the determinant
of A-λI, because interchanging rows and columns does not change the value
of the determinant. |A-λI| = 0 is given to us, and this implies |A^T-λI| = 0; so
nothing changes here, whatever λ we have here, the same we have there;
that means, the eigenvalues of A and A^T are the same.
The next is: if AX = λX, and suppose the determinant of A is not equal to 0, that
means A is invertible, then you can apply A-1 on both sides: A-1 A X = A-1 λX. And
this implies IX = λA-1 X, and this implies A-1 X = (1/λ)X (note that λ ≠ 0 here, since the determinant, which is the product of the eigenvalues, is non-zero). So,
we can say that if A has an eigenvalue λ and the corresponding eigenvector is X, then A-1
has the eigenvalue 1/λ and the corresponding eigenvector is X.
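The three properties can be checked together on one example matrix; a minimal NumPy sketch, with the matrix chosen arbitrarily for the illustration:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    w, V = np.linalg.eig(A)
    lam, x = w[0], V[:, 0]

    print(np.allclose(np.linalg.matrix_power(A, 3) @ x, lam**3 * x))  # A^k has eigenvalue lam^k
    print(np.allclose(np.sort(np.linalg.eigvals(A.T)), np.sort(w)))   # A and A^T share eigenvalues
    print(np.allclose(np.linalg.inv(A) @ x, (1/lam) * x))             # A^-1 has eigenvalue 1/lam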
So, these are a few of the properties of eigenvalues, which are stated here. And if λ is an
eigenvalue of A and g is a polynomial, then g(λ) is an eigenvalue of g(A), and x is an eigenvector of g(A)
corresponding to the eigenvalue g(λ). That is very easy to show again, because the
property above, that A^k has the eigenvalue λ^k with the same eigenvector, extends term by term to any polynomial g.
Now, let us discuss these few problems quickly. You see, if A is a singular matrix
of order 2 having trace 3: singular matrix means the determinant is equal to 0, the
determinant is nothing but the product of the eigenvalues, and the product of the eigenvalues equal to 0
means at least one of the eigenvalues is 0. Now, it is of order two; that means, it has
only two eigenvalues, say λ 1 and λ 2 .
(Refer Slide Time: 30:45)
So, we can say A is of order 2x2 having two eigenvalues λ 1 and λ 2 . The product of the
eigenvalues is equal to 0, because the matrix is singular, and the trace is 3. So,
from here we can say that one of the eigenvalues is 0, say λ 1 = 0. If λ 1 is 0, then λ 2 = 3. So,
the eigenvalues of A are 0 and 3; the eigenvalues of A^T will again be 0 and 3, and the eigenvalues
of (A^T)^2 will be 0 and 9 ok.
The next one is: if A is a 4x4 matrix with eigenvalues 1, -1, 2, -2, then how can we find the eigenvalues of
B, where B is given as 2A+A-1-I?
It is essentially a polynomial expression in A, so if A has an
eigenvalue λ, then the corresponding eigenvalue of B will be nothing but 2λ+1/λ-1.
So, simply substitute: corresponding to 1 we have 2+1-1 = 2 for B.
Corresponding to -1, if we substitute -1, it is -2-1-1 = -4. Corresponding to 2, it is 4+1/2-1,
that is 7/2. Corresponding to -2, it is
-4-1/2-1, that is -11/2. So, these are the eigenvalues corresponding to B.
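To check this numerically we can take any matrix with these eigenvalues; a minimal NumPy sketch using a diagonal matrix, chosen only for the check:

    import numpy as np

    A = np.diag([1.0, -1.0, 2.0, -2.0])          # eigenvalues 1, -1, 2, -2
    B = 2*A + np.linalg.inv(A) - np.eye(4)

    print(np.sort(np.linalg.eigvals(B)))                      # [-5.5 -4.   2.   3.5]
    print(np.sort([2*l + 1/l - 1 for l in [1, -1, 2, -2]]))   # matches 2*lam + 1/lam - 1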
The third problem is simple: if A is a matrix of order n such that A^k = 0, where k
is a natural number, then all its eigenvalues are 0, because if
any eigenvalue λ of A is not equal to 0, with eigenvector x, then A^k x = λ^k x ≠ 0, so A^k cannot be the zero matrix. Therefore, all the
eigenvalues of A must be 0.
(Refer Slide Time: 33:26)
So, this is a result: if A is a matrix of order n having k distinct eigenvalues λ 1 ,...., λ k , and
v i is an eigenvector corresponding to the eigenvalue λ i , for i from 1 to k, then this set of eigenvectors is
LI.
So, let us try to prove this result. You see, λ 1 , λ 2 ,...., λ k are distinct eigenvalues; that
means λ i ≠ λ j for i ≠ j, point number 1. Now, the set we are considering is v 1 , v 2 ,..., v k , and we
have to show that this set is linearly independent; v 1 is an eigenvector
corresponding to λ 1 , v 2 is an eigenvector
corresponding to λ 2 , and so on. We will prove this by the method of induction ok.
For k = 1 we have {v 1 }, a singleton set, and it is linearly
independent: a singleton set consisting of a non-zero
vector is always linearly independent, and v 1 is a non-zero vector. Now, we will assume
that it is true for k = r, I mean this result holds for k = r; that means, any such set of r eigenvectors is
linearly independent. And we will try to show that this also holds for k = r+1.
So, take a linear combination of these vectors, say α 1 v 1 +α 2 v 2 +....+α r+1 v r+1 = 0, in order to show that they are
linearly independent; call this equation 1. Now, multiply both sides by the matrix A, so we get
α 1 Av 1 +α 2 Av 2 +....+α r+1 Av r+1 = 0.
Now, Av i = λ i v i for all i, because λ i is the eigenvalue and the corresponding
eigenvector is v i . So, we can simply write α 1 Av 1 as α 1 λ 1 v 1 , similarly α 2 λ 2 v 2 , and so on
up to α r+1 λ r+1 v r+1 ; call this equation 2. Now multiply
λ r+1 into eq. 1 and subtract it from eq. 2; what we obtain is α 1 (λ 1 -
λ r+1 )v 1 and so on up to α r (λ r -λ r+1 )v r = 0.
Now, we have already assumed that the set of the first r vectors is linearly independent. So, this is
a linear combination of linearly independent vectors which is equal to 0. So, this
implies α i (λ i -λ r+1 ) = 0 for all i. But the eigenvalues are distinct, so λ i -λ r+1 is not equal to 0.
So, this implies α i = 0 for all i, for all i meaning i from 1 to r.
And when you substitute α 1 ,..., α r as all 0 in eq. 1, you will get α r+1 v r+1 = 0. And
since v r+1 ≠ 0 as it is an eigenvector, this implies α r+1 = 0. So, we have shown that all the α i 's, i
from 1 to r+1, are 0; that means, this set of vectors is linearly independent. So, in this
lecture, we have seen what eigenvalues and eigenvectors are, and some of the
important properties of eigenvalues and eigenvectors.
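This last result can also be seen numerically; a minimal NumPy sketch with a matrix having distinct eigenvalues, where linear independence of the eigenvectors is checked through the rank of the matrix whose columns are those eigenvectors:

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 0.0, 5.0]])     # distinct eigenvalues 2, 3, 5

    w, V = np.linalg.eig(A)
    print(w)                             # [2. 3. 5.]
    print(np.linalg.matrix_rank(V))      # 3, so the eigenvectors are linearly independent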
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 14
Cayley-Hamilton Theorem and Minimal Polynomials
Hello friends, welcome to lecture series on Matrix Analysis with Applications. In
today's lecture we will focus on the Cayley-Hamilton theorem and the minimal polynomial:
what the Cayley-Hamilton theorem is and how you can find the minimal polynomial using the
characteristic polynomial. So, what does the Cayley-Hamilton theorem state? Let us see.
Cayley-Hamilton theorem states that every matrix A, every square matrix A is a root of
its characteristic polynomial ok. So, what does it mean let us see.
We know that AX = λX; we have already discussed this, where X ≠ 0, λ is the
eigenvalue and the
corresponding eigenvector is X ok.
Now, you see, for a system of linear equations BX = 0, where B is a matrix of
order nxn, having a solution X ≠ 0 means it has infinitely many solutions. And
it will have infinitely many solutions only when the determinant of B is equal to 0. So,
here instead of B we have A-λI, written from AX-λX = 0. Since X is not equal to 0, this means the
determinant of A-λI is equal to 0. So, this expression |A-λI| = 0
gives a polynomial in λ, and that polynomial in λ is called the characteristic polynomial. And
what is the degree of that polynomial? The degree of that polynomial is the order of the
matrix.
Here I am taking the matrix as nxn, so the degree of this polynomial will
be simply n. Suppose the polynomial which we get after opening this
determinant is λ^n - C 1 λ^(n-1) + C 2 λ^(n-2) - ....... + (-1)^n C n = 0 ok. So, this polynomial in λ is called the
characteristic polynomial ok.
How many roots does this equation have? It has n roots.
Suppose the roots are λ 1 , λ 2 up to λ n ; the roots may be distinct, may be real, may be
complex ok; some roots may be equal, some distinct, anything.
Now, we have already discussed that the sum of the eigenvalues is equal to the trace of the matrix;
trace means the sum of the diagonal elements, that is a 11 +a 22 +a 33 +...+a nn . And we have
also discussed that the product of the eigenvalues is nothing but the determinant of A ok.
(Refer Slide Time: 04:39)
That means, for a matrix A of order nxn, the characteristic equation is the determinant
|A-λI| = 0, which is equal to, suppose, λ^n - C 1 λ^(n-1) + C 2 λ^(n-2) - ....... + (-1)^n C n = 0.
Now, how can we prove that a matrix whose characteristic
polynomial is given by this expression also satisfies it? So,
let us try to prove this result, the proof of the Cayley-Hamilton theorem. In order
to prove it, let us find the adjoint of A-λI. You see, suppose the matrix is A = [a 11 ,
a 12 ,.....,a 1n ; a 21 , a 22 ,...,a 2n ;....; a n1 ,....,a nn ].
This is an nxn matrix A. And what is A-λI? A-λI = [a 11 -λ, a 12 ,..., a 1n ; a 21 , a 22 -λ,....., a 2n ;....; a n1 ,
a n2 ,..., a nn -λ].
Now, you want to find the adjoint of this matrix A-λI. First find the cofactor of each
element, and take the transpose of the matrix formed by the cofactors; that will be the adjoint
of the matrix. Suppose you want to find out the cofactor of an element of this matrix, say
the first element: you leave out its row and its column,
and the determinant of the remaining (n-1)x(n-1) matrix will be the cofactor corresponding
to the first element.
Similarly, you can find the cofactor of the element a 12 and all the others. It means
that if you find the cofactor of a diagonal element of this matrix, say the first element,
the highest power of λ in it will be n-1:
the determinant of that cofactor will contain λ^(n-1). Now, if you find the cofactor of an off-diagonal
element such as a 12 , then the highest power of λ will be n-2 ok.
So, if you find the cofactors of each and every element of this matrix, the adjoint will be
something like B 1 λ^(n-1) + B 2 λ^(n-2) +.....+ B n ok. These B 1 , B 2 up to B n
are themselves matrices ok, because when you write the cofactors of this matrix and
open them, they will contain λ^(n-1) and the other lower-order terms.
So, collecting the coefficients of λ^(n-1) you get a
matrix B 1 , similarly for λ^(n-2) a matrix B 2 , and so on up to B n . You can easily verify this
result by taking a 3x3 matrix; you will find that B 1 , B 2 , B 3 in that case are
matrices of order 3x3.
Now, we also know that for any square matrix M, M.adjoint(M)
= |M|I. So, (A-λI).adjoint(A-λI) = |A-λI|.I; this we already know. So, you substitute the
expression above:
(A-λI).adjoint(A-λI) = |A-λI|.I,
and we have already written the determinant of this matrix, so the right-hand side is (λ^n - C 1 λ^(n-1) +....+(-
1)^n C n )I.
Now, let us compare the coefficients from both sides. You see, what is the coefficient
of λ^n? When you multiply the two factors on the left, the coefficient of λ^n is -B 1 , coming from -λI times B 1 λ^(n-1),
and that must be equal to I from the right-hand side.
For λ^(n-1), multiplying A with the first element gives AB 1 , and -λI with the second element of the
bracket gives -B 2 , so AB 1 - B 2 must be equal to
-C 1 I.
Similarly, if you take the λ^0 term, it is A.B n , and that will be equal to (-
1)^n C n I. Now, you multiply the first equation by A^n, the second equation by A^(n-1), ..., the last
equation by I, and you add them. When you multiply the first by A^n and the second by A^(n-1),
then A.A^(n-1)B 1 becomes A^n B 1 ,
so the -A^n B 1 term and this term cancel out. Similarly, the -A^(n-1) B 2 term
cancels with the corresponding term in the next
equation, and so on; the last one cancels with the second last one ok. So,
when you add them we obtain A^n - C 1 A^(n-1) + C 2 A^(n-2) - ...... + (-1)^n C n I = 0.
So, hence we have obtained this result, which states that the characteristic polynomial is
satisfied by the matrix itself. Hence we have the proof of the Cayley-Hamilton
theorem.
So, suppose we have this problem now. For simplicity we have taken an
example of a 2x2 matrix A; similarly we can go for 3x3 or higher orders. Let us suppose A is
this matrix ok; we have to verify the Cayley-Hamilton theorem and hence find A-1,
adjoint(A) and A^6.
Now, what is A here? A is [2 -2; -2 5]. First we will find out the
characteristic polynomial of this matrix A, and from there we will try to verify the Cayley-
Hamilton theorem. So, what is the characteristic equation? |A-λI| = 0 is the characteristic
equation. This implies the determinant |2-λ -2; -2 5-λ| = 0.
So, λ^2-7λ+6 = 0 is the characteristic equation of this matrix A. Now, from the Cayley-
Hamilton theorem we must have A^2-7A+6I = 0; this we have to verify. In order to verify,
you simply take the left hand side, which is A^2-7A+6I,
and we have to show that it is equal to the null matrix. So, first find A^2, which is [2 -2; -2
5][2 -2; -2 5] = [8 -14; -14 29].
Now, let us find the expression A^2-7A+6I, which is [8 -14; -14 29]-7[2 -2;
-2 5]+6[1 0; 0 1], and it comes out to be [0 0; 0 0].
So, the Cayley-Hamilton theorem is verified. Now, we have to find out A-1 using the
Cayley-Hamilton theorem; so how will we find that?
(Refer Slide Time: 17:11)
So, for this matrix we have seen that A^2-7A+6I = 0; this we have obtained by the Cayley-
Hamilton theorem. Now, what is the determinant of the matrix A? The determinant of the
matrix is simply the product of the eigenvalues, and the product of the eigenvalues is simply given by
the constant term of λ^2-7λ+6 divided by the leading coefficient, that is 6; that is the determinant.
So, the determinant is not equal to 0; this means the inverse exists. First we have to ensure
that the inverse exists: for the inverse to exist, the matrix must be invertible, I mean non-singular,
for which the determinant must be nonzero, and from here the product of the eigenvalues is 6. So,
we can say that the determinant is not equal to 0, so A inverse exists.
Now, how to find A-1? Since A-1 exists and A satisfies this equation by the Cayley-
Hamilton theorem, we can operate on both sides
by A-1. So, let us operate on both sides by A-1: A-1 A^2 - 7A-1 A + 6A-1 = 0, that is, A - 7I + 6A-1 = 0.
Now, this implies A-1 = 1/6(7I-A) = 1/6(7[1 0; 0 1]-[2 -2; -2 5]) = 1/6[5 2; 2 2].
So, this should be the inverse of this matrix. Next we have to find out the adjoint of
A. We know that A-1 = adj(A)/|A|. So, this implies adjoint(A) = A-1.|A|. The
determinant of A is 6, as we have already shown, so adjoint(A) = 6.A-1 and the 6 and the 1/6 cancel out. So,
the adjoint of this matrix is [5 2; 2 2]. Next we have to find out A^6 for the same problem ok.
(Refer Slide Time: 19:52)
So, it is very easy to find out using this expression. You see, A^2 = 7A-6I; you can find the
cube by multiplying both sides by A: A^3 = 7A^2-6A = 7(7A-6I)-6A = 43A-42I. Now,
again multiply both sides by A: A^4 = 43A^2-42A = 43(7A-6I)-42A = 259A-258I.
Similarly, you can multiply both sides by A again, substituting A^2 = 7A-6I each time,
to get A^5, and
finally A^6 ok.
So, finally, you will get an expression for A^6 of the form αA+βI,
where α and β can be computed by this successive computation; then you can
substitute A, because you know the matrix A, and I you already know, so you can easily find
out A^6 ok.
And the other way out is using the Cayley-Hamilton theorem. You might be seeing that for a 2×2 matrix, either you can multiply A by itself 6 times, A squared, then A cubed, then A raised to the power 4, A raised to the power 5 and then A raised to the power 6; but if it is a larger matrix of order say 10×10, then finding A raised to the power 6 that way is a difficult task.
188
But by using the Cayley-Hamilton theorem, it is easy to find out. So, this is the main application of the Cayley-Hamilton theorem: to find out the higher powers of A, to find out A inverse, the adjoint of A and other things about the matrix A.
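As a sketch of the reduction just described, the code below assumes the same 2×2 matrix and repeatedly substitutes A^2 = 7A - 6I to express A^6 as αA + βI; the names alpha and beta are illustrative, not lecture notation.

```python
import numpy as np

# A^1 = 1*A + 0*I; each step uses A^(k+1) = alpha*A^2 + beta*A = alpha*(7A - 6I) + beta*A.
alpha, beta = 1.0, 0.0
for _ in range(5):                     # build up from A^1 to A^6
    alpha, beta = 7 * alpha + beta, -6 * alpha

A = np.array([[2., -2.], [-2., 5.]])
A6 = alpha * A + beta * np.eye(2)
print(alpha, beta)                                          # 9331.0 -9330.0
print(np.allclose(A6, np.linalg.matrix_power(A, 6)))        # True
```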
Now, what is a monic polynomial: a polynomial f(x) given by f(x) = a_n x^n + a_(n-1) x^(n-1) + ........ + a_1 x + a_0 is said to be monic if the leading coefficient is 1. If this leading coefficient a_n, which is the coefficient of the highest power of this polynomial, is 1, then this polynomial is called a monic polynomial ok.
Now, what do you mean by the minimal polynomial. Let T be a linear operator on a finite dimensional vector space V; then there exists a unique monic polynomial m_T(x) of minimum degree such that m_T(T)(v) = 0 for all v in V. This m_T(x) is called the minimal polynomial of T.
189
So, what does it mean basically: suppose A is a matrix of order say 4×4 ok, and its characteristic polynomial is suppose (λ-1)(λ-1)(λ-2)(λ-3); of course, if it is of order 4×4, the degree of the characteristic polynomial will be 4. So, it will be having 4 roots; suppose all the 4 roots are real. So, the characteristic roots, the eigenvalues of this matrix, are suppose 1, 1, 2, 3. So, this is the characteristic polynomial of A, this matrix A, and by the Cayley-Hamilton theorem we can easily say that (A-I)^2.(A-2I).(A-3I) = 0, because by the Cayley-Hamilton theorem the matrix satisfies its characteristic polynomial.
So, hence this will be equal to 0. Now, the minimal polynomial is the lowest degree polynomial which is satisfied by the matrix itself; you see, the degree of this characteristic polynomial is 4 ok. The important property of the minimal polynomial is that it contains all the different roots ok; the different roots here are 1, 2 and 3. So, the minimal polynomial will always contain the factors (λ-1)(λ-2)(λ-3).
The minimal polynomial will have all the irreducible factors. So, either it is (λ-1)(λ-2)(λ-3), which may be the lowest degree polynomial satisfied by A, or it is (λ-1)^2(λ-2)(λ-3), the characteristic polynomial itself; I mean I want to say that the matrix A will satisfy one of these two, and the characteristic polynomial it satisfies obviously, by the Cayley-Hamilton theorem. But we need the polynomial of lowest degree, so it may be the smaller polynomial also. So, for (λ-1)(λ-2)(λ-3) we have to check whether A satisfies (A-I)(A-2I)(A-3I) = 0 or not. If A satisfies this expression, then this will be the minimal polynomial; otherwise the characteristic polynomial, which of course is satisfied by A by the Cayley-Hamilton theorem, is the minimal polynomial ok.
190
(Refer Slide Time: 00:27)
So, let us discuss it by an example; suppose you are having the first problem A = [2 0; 0 2]. So, what is the characteristic polynomial of this matrix: |2-λ 0; 0 2-λ| = 0, or it implies (2-λ)^2 = 0. So, this is the characteristic polynomial of this matrix A; how many roots is it having, only 2 roots, both equal, λ = 2, 2 ok.
Now, the minimal polynomial is the lowest degree polynomial which contains all the irreducible factors, all the distinct roots ok. So, that means, the minimal polynomial will be (2-λ) or (2-λ)^2. The second one is always satisfied by the matrix itself by the Cayley-Hamilton theorem.
So, we have to check whether (2-λ) is satisfied by the matrix or not; if it is satisfied by the matrix, that means it is the minimal polynomial. So, this means we have to check whether (2I-A) is 0 or not, let us see. So, 2I is [2 0; 0 2] and A is simply of course, the same thing.
191
So, it comes out to be a null matrix; that means this expression is satisfied by the matrix A; that means the minimal polynomial corresponding to this matrix is (2-λ) ok. It is of degree 1, not 2. So, the minimal polynomial of this matrix is (2-λ).
Now, let us take the second example to see the meaning of the minimal polynomial here. In the second example, A is [3 1 0; 0 3 0; 0 0 4]. It is an upper triangular matrix, you can easily see, and in case of an upper triangular matrix the eigenvalues are simply the diagonal elements.
So, what are the eigenvalues of this matrix: these are 3, 3 and 4. And if you know the eigenvalues you can simply find out the characteristic polynomial, which is (λ-3)^2(λ-4). So, the characteristic polynomial of this matrix A will be nothing but (λ-3)^2(λ-4). Now, you
have to see what is the minimal polynomial of this matrix A. The minimal polynomial is the lowest degree polynomial which is satisfied by the matrix itself.
So, this is obviously satisfied, by the Cayley-Hamilton theorem. Let us see whether (λ-3)(λ-4) is satisfied or not; if it is satisfied, then it will be the minimal polynomial of this matrix A. So, we have to see basically whether (A-3I)(A-4I) is the null matrix. What is A-3I, it is [0 1 0; 0 0 0; 0 0 1], and what is A-4I, it is [-1 1 0; 0 -1 0; 0 0 0].
192
Now, when you take the product of these two, it is not a null matrix. And if it is not equal to the null matrix, that means (λ-3)(λ-4) cannot be the minimal polynomial.
So, what is the minimal polynomial then: the minimal polynomial will be (λ-3)^2(λ-4).
So, in this way we can find out the minimal polynomial of a matrix A ok.
So, basically, if I have a matrix A of order say 5×5 and the characteristic polynomial is suppose (λ-1)^3(λ-2)(λ+3) = 0, then for the minimal polynomial of this matrix A there are a few cases we have to check.
So, first you will check for (λ-1)(λ-2)(λ+3), whether (A-I)(A-2I)(A+3I) is equal to 0; if so, this is the least degree polynomial which is satisfied by the matrix. Otherwise we check for (λ-1)^2(λ-2)(λ+3); if these two are not equal to 0, then the characteristic polynomial itself will be the minimal polynomial ok.
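The candidate-checking procedure above can be sketched numerically; the code below uses the 3×3 upper triangular example from this lecture and tests which candidate product of factors annihilates A. The helper name is_zero is just an illustration.

```python
import numpy as np

def is_zero(M, tol=1e-10):
    # True if every entry of M is (numerically) zero
    return np.allclose(M, 0, atol=tol)

A = np.array([[3., 1., 0.],
              [0., 3., 0.],
              [0., 0., 4.]])
I = np.eye(3)

# Candidate with every distinct eigenvalue once: (A - 3I)(A - 4I)
print(is_zero((A - 3 * I) @ (A - 4 * I)))                  # False -> not the minimal polynomial
# Next candidate, here the characteristic polynomial: (A - 3I)^2 (A - 4I)
print(is_zero((A - 3 * I) @ (A - 3 * I) @ (A - 4 * I)))    # True -> this is the minimal polynomial
```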
193
(Refer Slide Time: 33:03)
So, what are the properties of the minimal polynomial: the minimal polynomial m(t) of matrix A divides every polynomial that has A as a zero; in particular m(t) divides the characteristic polynomial of A. So, the minimal polynomial always divides its characteristic polynomial ok.
Number 2, the characteristic polynomial and the minimal polynomial of matrix A have the same irreducible factors, which we have discussed. And number 3, a scalar λ is an eigenvalue of matrix A if and only if λ is a root of the minimal polynomial of A also. So, if λ is an eigenvalue it is always a root of the minimal polynomial also.
So, these are some of the important properties of the minimal polynomial. So, in this lecture we have seen how we can find out the characteristic polynomial, the advantages or the applications of the Cayley-Hamilton theorem, and how you can find out the minimal polynomial corresponding to a matrix A ok. In the next few lectures, we will see some more important properties and advantages of the minimal polynomial.
Thank you.
194
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 15
Diagonalization
Hello friends, so welcome to lecture series on Matrix Analysis with Applications. So,
today we will discuss about Diagonalization, what do you mean by diagonalization of the
matrices?
Before starting diagonalization let us discuss similar matrices. So, let A and B be two
square matrix of order n over the field F, then A is said to be similar to B if there exists
an invertible matrix P such that B=P^(-1)AP, and we symbolically write A ~ B like this ok. So,
what do you mean by basically two similar matrices?
So, we say that a matrix A~B if there exist an invertible matrix, invertible matrix P such
that A P=P B or B=P-1AP ok. So, one thing is very obvious from this fact. So, what is
195
that, that determinant of B and determinant of A are equal. This is very clear from here
you can see, |B| = |P^(-1)AP|, and that will be equal to |P^(-1)||A||P|, since the determinant of AB is equal to the determinant of A into the determinant of B.
Now, |P^(-1)| = 1/|P|. So, these two cancel out, so it is equal to the determinant of A. So, what
we obtain is the first property of similar matrices: if two matrices A and B are similar then their determinants are always equal. The second property is that if two matrices are similar then they have the same eigenvalues ok.
So, how can you prove this? If A is similar to B, this implies there exists a P such that AP = PB or A = PBP^(-1). Now, for the second property, we have to show that if two matrices are similar then they have the same eigenvalues.
So, how can we proceed for this: suppose Ax = λx where x ≠ 0. So, what does it mean, this means that A has an eigenvalue λ and the corresponding eigenvector is x ok. Now, since A is similar to B, that means A = PBP^(-1), so you can replace A by PBP^(-1): PBP^(-1)x = λx. Now pre-multiply both sides by P^(-1), which we can do because P is invertible; this implies B.P^(-1)x = λP^(-1)x, because λ is a scalar and can be taken out. Now, this P^(-1)x you can suppose is y; then this is By = λy.
So, what we have concluded: we have concluded that the matrix B has an eigenvalue λ and the corresponding eigenvector is y. In order to prove that y is an eigenvector you have to show that y ≠ 0 of course. So, that is very easy to show: suppose y = 0, we can prove this
196
by contradiction: if y = 0, that means P^(-1)x = 0. Now x ≠ 0, because it is an eigenvector, and the matrix P^(-1) is invertible, so this is a system of linear equations ok.
As a system of linear equations with an invertible matrix, it has only the unique solution x = 0, but x ≠ 0. So, it is a contradiction. Hence y ≠ 0, because whenever we write such an expression By = λy, only if y ≠ 0 can we say that λ is the eigenvalue and the corresponding eigenvector is y ok. So, what we have concluded
basically: we have taken a matrix A with eigenvalue λ and the corresponding eigenvector x, and using the property of similar matrices we have shown that the matrix B has the same eigenvalue λ and the corresponding eigenvector is y ok. So, we can say that similar matrices have the same eigenvalues; of course, their eigenvectors may not be the same: if x is an eigenvector corresponding to λ for A, then corresponding to λ for B the eigenvector is y, which is P^(-1)x.
Now, if they have the same eigenvalues, this indicates that the characteristic polynomials of similar matrices are the same, since the roots, with their multiplicities, are the same; that means the characteristic polynomials of the matrices A and B, which are similar to each other, are also the same ok. Now, since they have the same eigenvalues, that means the traces of the two matrices are also the same; the trace is nothing but the sum of the eigenvalues. So, if similar matrices have the same eigenvalues, the sums of the eigenvalues are also the same, and this implies that the traces of the two matrices are also equal ok.
So, what we have concluded, we have concluded that if two matrices A and B are similar
then their determinants are also equal. Then the traces of the 2 matrices are equal, trace of A is equal to trace of B. Where A and B are similar matrices, they have the same characteristic polynomial, and hence they have the same characteristic roots or eigenvalues ok; eigenvectors may not be the same ok.
197
(Refer Slide Time: 07:40)
So, these properties you can easily see: number 1, two matrices represent the same linear operator if and only if the matrices are similar; if the matrices are similar then they represent the same linear operator, and if they represent the same linear operator then the matrices are similar. If A and B are similar matrices then the determinants are equal, the traces are equal, they have the same characteristic polynomial and the same eigenvalues; this we have already discussed.
Now, let us discuss one example suppose A is this matrix and B is this matrix then A and
B are similar, how we have concluded this is very easy to conclude basically if you have
two matrices A and B any two matrices A and B ok.
198
(Refer Slide Time: 08:51)
You have to check whether these two matrices are similar or not. So, you basically take an arbitrary matrix P such that AP = PB; here A and B are 2×2 matrices. So, you can take P as [a b; c d], then you can work out the left hand side and the right hand side ok. And you will get 4 equations in 4 unknowns; you can find a, b, c, d, and if you find a, b, c, d such that this matrix P is invertible, that is ad-bc ≠ 0, that clearly means the matrices A and B are similar, because we have shown the existence of such a P, an invertible P, such that AP = PB. A small sketch of this procedure is given below.
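The 2×2 matrices A and B here are illustrative assumptions (not the ones on the slide), chosen so that B is a diagonal form of A; SymPy is used to solve the four linear equations for a, b, c, d.

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
P = sp.Matrix([[a, b], [c, d]])
A = sp.Matrix([[4, 1], [2, 3]])       # illustrative matrix
B = sp.Matrix([[5, 0], [0, 2]])       # its diagonal form (eigenvalues 5 and 2)

# AP = PB gives four linear equations in a, b, c, d.
sol = sp.solve(list(A * P - P * B), [a, b, c, d], dict=True)[0]
# Pick the remaining free parameters so that P becomes a concrete invertible matrix.
Pnum = P.subs(sol).subs({a: 1, b: 1, c: 1, d: 1})

print(Pnum.det() != 0)                      # True, so such an invertible P exists
print(sp.simplify(Pnum.inv() * A * Pnum))   # Matrix([[5, 0], [0, 2]]), i.e. B
```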
So, here also you can find such a P, which we are getting as shown on the slide; you can easily verify here that B = P^(-1)AP. So, we can say that these two matrices are similar. You can also verify that the trace is 10 here and the trace is 10 there as well; here the determinant is 24 + 6 = 30, and there also, if you simplify, you will get 30 as the determinant of B, because they are similar. Now, in terms of linear operators, what do you mean by diagonalizability?
So, if T is a linear operator on a finite dimensional vector space V then we say that T is diagonalizable if there exists a basis for V, each vector of which is a characteristic vector of
199
T. So, what does it mean? It means that there exists a basis S, given by u_1, u_2,...,u_n, of V, for which T(u_i) = λ_i u_i; it is something like Ax = λx. You already know that every linear transformation corresponds to a matrix with respect to some basis.
The matrix representation of T is similar to a diagonal matrix D; that means, there exists an invertible matrix P again such that the matrix representation of T equals PDP^(-1).
So, what we have concluded: suppose we have a matrix A which is the matrix representation of a linear operator T.
200
Then we say that this matrix is diagonalizable if it is similar to a diagonal matrix D, D being a diagonal matrix ok, diag[d_1, d_2,..., d_n], if A is an n×n matrix ok. It is similar to a diagonal matrix means there exists an invertible matrix P ok, an invertible matrix P such that AP = PD or A = PDP^(-1).
Now, we know that the eigenvalues of A and the eigenvalues of its similar matrices are the same; this we have already proved ok. So, D is a diagonal matrix and the eigenvalues of a diagonal matrix are nothing but the diagonal elements themselves. So, we can say that here the diagonal elements of this D are nothing but the eigenvalues of A; that means, if A has eigenvalues λ_1, λ_2,....,λ_n then this D will be nothing but diag[λ_1, λ_2,....., λ_n], because similar matrices have the same eigenvalues. Now, if the characteristic polynomial
p_T(λ) is a product of n distinct linear factors like this, (λ-a_1)(λ-a_2)....(λ-a_n), then T is always diagonalizable. If it is a product of distinct linear factors, all having power 1 in the characteristic polynomial, then T, I mean the matrix, is always diagonalizable. Now, let us discuss the algebraic and geometric multiplicity of an eigenvalue. So, what does it mean, let us discuss ok.
201
(Refer Slide Time: 14:35)
Suppose A is a 5×5 matrix and its characteristic polynomial is suppose (λ-1)(λ-2)^3(λ+1). So, the degrees add up as 1+3+1 = 5. Now, corresponding to λ = 1 there is only 1 factor. So, we can say that the algebraic multiplicity for λ = 1 is 1. Algebraic multiplicity means the number of times that λ repeats; this λ = 1 is repeating only 1 time.
202
The geometric multiplicity will be at least 1, it cannot be 0; there is at least one linearly independent eigenvector corresponding to an eigenvalue ok.
Here is a 7×7 matrix; here it is (λ-2)(λ-3)^2(λ+5)^4. So, for λ = 2 the algebraic multiplicity is 1, for λ = 3 the algebraic multiplicity is 2, for λ = -5 the algebraic multiplicity is 4, and the geometric multiplicity for λ = -5 can be at most 4.
203
(Refer Slide Time: 19:05)
So, this we have already seen that geometric multiplicity of an eigenvalue does not
exceed its algebraic multiplicity, number 1. Now, a matrix is diagonalizable if and only if the geometric multiplicity corresponding to each λ is the same as its algebraic multiplicity ok; that means, only if the geometric multiplicity corresponding to each λ_i equals the algebraic multiplicity is the matrix diagonalizable.
So, this is a necessary and sufficient condition, or we can say that if a matrix has order n then it will be diagonalizable if and only if it has n linearly independent eigenvectors, because when the geometric multiplicity equals the algebraic multiplicity for each λ, the total number of linearly independent eigenvectors is n. So, it must have n linearly independent eigenvectors if the matrix is diagonalizable; otherwise it will not be diagonalizable. So, let us
discuss these two example quickly and from here we can analyze each and everything, so
what is the first example.
204
The first example is, you see, A is given to us as [3 -1 1; 7 -5 1; 6 -6 2], and the eigenvalues given to us are 2, -4 and x. They are given to us, or we can find out the eigenvalues by finding the characteristic polynomial of this matrix. Now, how can you find out x? We can find it because the sum of the eigenvalues is equal to the trace of A, and the trace of A is simply 3 - 5 + 2 = 0. So, it will be x - 2 = 0, that is x = 2.
So, let us find the eigenvector corresponding to λ = 2. So, how will you find it: (A-λI)x = 0 with λ = 2, this implies (A-2I)x = 0, it is [1 -1 1; 7 -7 1; 6 -6 0][x_1 x_2 x_3] = [0 0 0]. Now try to make the echelon form of this. The echelon form comes out to be [1 -1 1; 0 0 -6; 0 0 0].
So, what we have obtained from here is x_1 - x_2 + x_3 = 0 and -6x_3 = 0. So, this implies x_3 = 0, and if you substitute x_3 = 0, we obtain x_1 = x_2. So, how many linearly independent eigenvectors correspond to λ=2: it is only 1, which is [x_1 x_1 0] or [1 1 0]; you can take any one linearly independent eigenvector, say [1 1 0].
So, here the rank of A-2I is 2, and 3 - 2 = 1 is the number of linearly independent eigenvectors; that means, if r is the rank of A-λI and n is the order of A, then n - r will be the geometric multiplicity corresponding to that λ. And only if it is equal to the algebraic multiplicity for each λ will the matrix be diagonalizable ok; here the algebraic multiplicity of λ = 2 is 2 but the geometric multiplicity is 1, so this matrix is not diagonalizable. Now, you can see here.
205
(Refer Slide Time: 25:17)
Here A = [4 1 -1; 2 5 -2; 1 1 2] and here the eigenvalues are 3, 3 and λ; let us see whether it is diagonalizable or not, and if yes what P will be and how we can find the other expressions in this problem ok.
So, this is the problem, let us try to solve it quickly. The sum of eigenvalues again is equal to the trace of A, which is 4+5+2 = 11, so λ = 11-6 = 5; so we can say that the eigenvalues of this A are 3, 3 and 5. So, first we will find the number of linearly independent eigenvectors corresponding to λ = 3 ok. So, for λ = 3 it is (A-3I)x = 0 and this implies [1 1 -1; 2 2 -2; 1 1 -1]x = 0. So, when we convert into echelon form we can easily
206
get this matrix [1 1 -1; 0 0 0; 0 0 0]. So, what is the rank of this matrix: the rank is 1, and the order of the matrix is 3, so 3-1 = 2; that means 2 is the geometric multiplicity corresponding to λ=3. So, yes, because the algebraic multiplicity is 2 and the geometric multiplicity is also 2 for λ=3, and for λ=5 both are 1, that means this matrix is diagonalizable ok. So, what are those 2 vectors: they satisfy x_1 + x_2 - x_3 = 0. So, we can pick any two linearly independent eigenvectors satisfying this equation.
So, how can you find out P, because now we have ensured that this matrix is diagonalizable. To find P, we simply write the first eigenvector x_1 in the first column, x_2 in the second column and x_3 in the third column. So, what is the eigenvector x_1: we have already found x_1 = [1 -1 0]; x_2 = [1 0 1] and x_3 = [1 2 1]. So, P = [1 1 1; -1 0 2; 0 1 1] and it is always invertible, because the eigenvectors are linearly independent ok. So, it is always invertible ok. So, what will be P^(-1): P^(-1) you can easily find out, and it comes out to be (1/2)[2 0 -2; -1 -1 3; 1 1 -1]; this is the P^(-1).
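As a quick numerical check of this construction (a sketch, not part of the lecture), one can verify AP = PD and recompute P^(-1) for the matrix of this example:

```python
import numpy as np

A = np.array([[4., 1., -1.],
              [2., 5., -2.],
              [1., 1., 2.]])
P = np.array([[1., 1., 1.],
              [-1., 0., 2.],
              [0., 1., 1.]])            # columns: eigenvectors for 3, 3, 5
D = np.diag([3., 3., 5.])

print(np.allclose(A @ P, P @ D))        # True
print(np.linalg.inv(P))                 # equals 0.5 * [[2,0,-2],[-1,-1,3],[1,1,-1]]
```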
207
Now, AP = PD and this implies A = PDP^(-1). So, what is D here: the first eigenvector corresponds to λ=3, so the first column of D is [3 0 0]; the second eigenvector corresponds again to λ=3, so the second column is [0 3 0]; and the third eigenvector is for λ=5, so the last diagonal entry is 5, that is, D = diag(3, 3, 5). Now suppose you want to find out A^2 = A.A, which is equal to PDP^(-1).PDP^(-1), and this is nothing but PD(P^(-1)P)DP^(-1). So, it is PD^2P^(-1); that means A^2 is also diagonalized by the same P ok.
Similarly, if you proceed like this, what is A^k: it is PD^kP^(-1). Suppose you want to find out A^10; it is PD^10P^(-1), and what is D^10: it is [3^10 0 0; 0 3^10 0; 0 0 5^10]. P you know, P^(-1) you know, D^10 you know. So, simply the multiplication of these three matrices will give you A^10, which otherwise we would have to find by ten matrix multiplications. Suppose you want to find A^50: similarly, you can find out A^50 also as PD^50P^(-1), again by the matrix multiplication of these three. Now suppose you want to find out e^A.
It is I + A + A^2/2! + ...... So, you see here A = PDP^(-1), so if you take P^(-1)e^A P, this will be P^(-1)P + P^(-1)AP/1! + P^(-1)A^2P/2! + ......., and this is I + D/1! + D^2/2! + ....
208
So, it is e^D. So, what we have concluded is that e^A is nothing but Pe^DP^(-1), and P, P^(-1) are known; how can we find e^D: e^D is nothing but I + D/1! + D^2/2! + ...., and D is nothing but [λ_1 0 0; 0 λ_2 0; 0 0 λ_3]; here λ_1 and λ_2 are 3 and λ_3 is 5 for this particular problem ok.
Now, the square will be [λ_1^2 0 0; 0 λ_2^2 0; 0 0 λ_3^2]. So, when you club all these terms, the first diagonal term is 1 + λ_1/1! + λ_1^2/2! + λ_1^3/3! + .... and so on, which is nothing but e^(λ_1); the off-diagonal terms are 0 throughout, and the other diagonal terms are e^(λ_2) and e^(λ_3).
So, e^D we can simply find out as [e^(λ_1) 0 0; 0 e^(λ_2) 0; 0 0 e^(λ_3)]. So, in this case e^D will be [e^3 0 0; 0 e^3 0; 0 0 e^5]. So, hence we can find out e^A also, e raised to the power of a matrix, by simply multiplying these three matrices. Similarly, if you want to find out sin(A), where A is a matrix and not an angle, that also we can find out following the same lines; similarly cos(A) of a matrix we can also find out.
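A short sketch of these computations, assuming NumPy and SciPy are available; it reuses the P and D constructed above and compares PD^10P^(-1) and Pe^DP^(-1) against direct computations.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4., 1., -1.],
              [2., 5., -2.],
              [1., 1., 2.]])
P = np.array([[1., 1., 1.],
              [-1., 0., 2.],
              [0., 1., 1.]])
Pinv = np.linalg.inv(P)

# A^10 via the diagonalization A = P D P^{-1}
A10 = P @ np.diag([3.**10, 3.**10, 5.**10]) @ Pinv
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))   # True

# e^A via P e^D P^{-1}
eA = P @ np.diag(np.exp([3., 3., 5.])) @ Pinv
print(np.allclose(eA, expm(A)))                          # True
```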
Thank you.
209
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 16
Special Matrices
Hello friends. So, welcome to lecture series on Matrix Analysis with applications. So,
today we will discuss Special Matrices ok: what special matrices are, how we can find out eigenvalues of special matrices, and what important characteristics and properties they have. So, the first matrix is the symmetric matrix.
A square matrix A is symmetric if A^T = A. If you compute A^T here, this matrix on the slide is not symmetric. So, we can always construct a symmetric matrix like this.
(Refer Slide Time: 01:27)
210
Suppose you are having a matrix, you can arbitrarily put any diagonal elements; say it is [1 4 5; 4 2 -7; 5 -7 3]. So, if you find its transpose, you interchange the rows and columns, and you can easily see that A^T = A. So, we can say that A is a symmetric matrix ok.
Now, what are the properties of symmetric matrices? Any real symmetric matrix, real
means all entries are real ok, has only real eigenvalues.
Suppose the matrix is symmetric, that is A = A^T, and the matrix is of order n×n. Now, we know that any matrix A may have complex roots, I mean complex eigenvalues, also; but if it is a real symmetric matrix it always has only real eigenvalues. How can we prove it? So, the proof is simple, you can see here.
Let us suppose Ax = λx and x ≠ 0 ok. So, that means A has an eigenvalue λ and the corresponding eigenvector is x. Now, you take bar on both the sides, bar means conjugate; how can you find the conjugate of a matrix? You simply find the conjugate of each element
211
of the matrix. That gives Āx̄ = λ̄x̄; you have taken conjugates on both sides. Now you take transpose on both sides. So, it is x̄^T(Ā)^T = (λ̄x̄)^T, and this implies x̄^T(Ā)^T = λ̄x̄^T.
Now, since A is a real matrix, Ā will be A itself. So, we can write x̄^T A = λ̄x̄^T, using A^T = A because A is symmetric. Now, you have to post-multiply both sides by the vector x. So, this is x̄^T A x = λ̄x̄^T x.
Now, this Ax is λx, this we have already taken. So, it is λx̄^T x = λ̄x̄^T x. So, λ can be collected on one side: (λ - λ̄)x̄^T x = 0.
Now, if x is the vector [x_1, x_2,....,x_n] then x̄ will be [x̄_1, x̄_2,....., x̄_n]. And what will be x̄^T x? This will be |x_1|^2 + |x_2|^2 + ... + |x_n|^2, and it is not equal to 0 since x ≠ 0. It will be 0 only when all the x_i's are 0, and all x_i = 0 means x = 0, but x ≠ 0. So, that means this is not equal to 0.
Now, since this is not equal to 0, it means (λ - λ̄) = 0, because the product of these two is equal to 0. So, this implies λ = λ̄, and if a number is equal to its conjugate, that means λ is purely real. So, we have taken an arbitrary λ for a symmetric matrix A and we have shown that λ is real; that means all eigenvalues are real for a symmetric matrix. So, this is the first property of a symmetric matrix, that the eigenvalues are always real.
The next property is that it is always diagonalizable; that means, if you have any real symmetric matrix A then you can always find an invertible matrix P such that AP = PD. Next, it has orthogonal eigenvectors; using the second property this is easy to show, because we know that every real symmetric matrix is always diagonalizable.
212
(Refer Slide Time: 06:37)
Now, you can take transpose on both the sides; since A = PDP^(-1) and A is a real symmetric matrix, (PDP^(-1))^T = A^T = A, and the left hand side is (P^(-1))^T D^T P^T = (P^T)^(-1) D P^T. Comparing this with PDP^(-1) = A, one of the conclusions from here is P^(-1) = P^T, or P^T P = I.
Now, what is P? P is simply a matrix of eigenvectors; we have already seen the construction of such P's. So, P is simply [x_1, x_2,...,x_n] written column-wise, where these are eigenvectors corresponding to λ_1, λ_2,....,λ_n. So, what is P^T P: it has the rows x_1^T, x_2^T,...,x_n^T multiplied against the columns x_1, x_2,...,x_n.
So, what is it equal to: when you multiply the first row with the first column, this is nothing but the norm squared of x_1; when you multiply the first row with the second column, this will be nothing but x_1^T x_2, and similarly x_1^T x_n; then x_2^T x_1, the norm squared of x_2, and so on up to the norm squared of x_n. And this is equal to I; being equal to the identity means the norms of all the vectors are equal to 1 and all other entries are 0; that means the eigenvectors are in fact orthogonal.
213
(Refer Slide Time: 09:59)
A square matrix A of order n is said to be a skew symmetric matrix if A^T = -A. Suppose you see this matrix on the slide; if you take A^T of this matrix it is simply equal to -A, so hence this matrix is skew symmetric. However, the other matrix is not skew symmetric, because when you take B^T, it is not equal to -B for this problem. So, what are the properties of a skew symmetric matrix, let us see.
214
(Refer Slide Time: 11:10)
Now, for A skew symmetric, A^T = -A, where A is n×n ok.
Now, if A is skew symmetric the first property is that if n is odd, that is, the order of the matrix is odd, then the determinant of A is 0. The proof is simple, you can see here: A^T = -A implies the determinant of A^T equals the determinant of -A. Now |A^T| is the same as |A|, and the determinant of -A is (-1)^n|A|. So, if n is odd then (-1)^n will be -1. So, the determinant of A will equal -|A|, and this implies |A| = 0.
So, the first property is that a skew symmetric matrix of odd order has 0 determinant, or any skew symmetric matrix of odd order is always singular. Now, the second property is that all the eigenvalues of a real skew symmetric matrix are either 0 or purely imaginary; how can we prove this? So, let us see the proof.
215
Now, A^T = -A, where A is an n×n skew symmetric matrix. So, take AX = λX, X ≠ 0; again take bar on both the sides and then transpose on both the sides. Taking the transpose we get X̄^T Ā^T = λ̄X̄^T ok. Now, since A is a real skew symmetric matrix for this case, Ā^T = A^T = -A, so this implies -X̄^T A = λ̄X̄^T. Post-multiply both sides by the vector X: -X̄^T AX = λ̄X̄^T X; since AX = λX you can substitute here, and this implies (λ̄ + λ) times X̄^T X is equal to 0. And X̄^T X is not equal to 0, this we have already shown in the last proof. So, as X̄^T X is not equal to 0, this implies λ̄ + λ = 0, or λ = -λ̄, and this implies λ is either equal to 0 or purely imaginary ok.
So, for a real skew symmetric matrix the eigenvalues are either 0 or purely imaginary. So, this is the second property for a skew symmetric matrix ok.
So, the next property is that the elements of the main diagonal of a skew symmetric matrix are 0 and therefore its trace is also 0. This is very easy to show: you see that if the matrix A is skew symmetric, then each diagonal element a_ii, that is a_11, a_22, a_33 and so on,
216
will be equal to minus of a_ii, because A^T = -A, for all i, and this implies a_ii = 0 for all i. This means the diagonal elements of a skew symmetric matrix are all 0, and hence the trace is also 0, because the trace is nothing but the sum of the diagonal elements.
In the previous property we have shown that each eigenvalue of a real skew symmetric matrix A is either 0 or purely imaginary. Now, if A is a real skew symmetric matrix then the determinant of A is always greater than or equal to 0. This is again easy to show, because we have shown that the eigenvalues of a real skew symmetric matrix are either 0 or purely imaginary ok. If one of the eigenvalues is 0 then the determinant will be 0, because the determinant is nothing but the product of the eigenvalues ok. So, A has eigenvalues like 0, ±iα, ±iβ and so on, if A is a skew symmetric matrix.
So, if 0 is one of the eigenvalues then the product will be 0; that means the determinant is 0. If 0 is not an eigenvalue, then the eigenvalues of the skew symmetric matrix A are of the type ±iα, ±iβ, ±iγ, .... and the determinant of A will be nothing but the product of the eigenvalues.
So, if you multiply the first two it will be (iα).(-iα), so the product will be α^2; similarly the product of the next pair will be β^2, the product of the next will be γ^2. So, the product of the eigenvalues will be something like α^2 β^2 γ^2..... So, we can see the determinant of A will be greater than or equal to 0, because complex roots always exist in conjugate pairs. And any real skew symmetric matrix of odd order has determinant 0, that we have already seen.
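A small sketch checking these skew symmetric properties; the 3×3 matrix here is an illustrative assumption, not the one on the slide.

```python
import numpy as np

A = np.array([[0., 2., -1.],
              [-2., 0., 4.],
              [1., -4., 0.]])
print(np.allclose(A.T, -A))                  # True: skew symmetric
print(np.allclose(np.diag(A), 0))            # True: zero diagonal, so the trace is 0
print(np.isclose(np.linalg.det(A), 0))       # True: odd order implies determinant 0

eigvals = np.linalg.eigvals(A)
print(np.allclose(eigvals.real, 0))          # True: eigenvalues are 0 or purely imaginary
```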
Next is the Hermitian matrix: a square matrix A with complex entries is called Hermitian if it is equal to its own conjugate transpose, that is, Ā^T = A. So, let us see this example.
217
(Refer Slide Time: 17:48)
Since Ā^T = A, the foremost property for this matrix is, you see, that the conjugate of each diagonal element a_ii should be equal to a_ii for all i. So, this implies a_ii for all i is real.
So, the foremost property of a Hermitian matrix is that the diagonal elements must be real; it is a complex matrix, but for a Hermitian matrix all diagonal elements must be real. So, take this example, suppose it is [1 2+3i; 2-3i 4]. When you take the conjugate of this, the conjugate of 1 is 1, the conjugate of 2+3i is 2-3i, the conjugate of 2-3i is 2+3i and 4 is 4. Now, when you take the transpose of that, it is [1 2+3i; 2-3i 4], which is equal to A. So, we can say that this matrix is a Hermitian matrix.
Now, the next property of a Hermitian matrix is that all eigenvalues of a Hermitian matrix are real; the proof follows the same lines as we did for the symmetric matrix. You take AX = λX; the proof is the same, you can see.
218
Here we are having Ā^T = A ok. So, you can take AX = λX here, X ≠ 0; take bar on both the sides, then the transpose, and post-multiply by X. Then X̄^T AX equals λX̄^T X on one side and λ̄X̄^T X on the other. So, this implies (λ - λ̄)X̄^T X = 0, and X̄^T X is not equal to 0.
So, this implies λ = λ̄. So, hence all eigenvalues of a Hermitian matrix are real.
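A quick numerical check of the worked Hermitian example (a sketch, not part of the lecture):

```python
import numpy as np

A = np.array([[1, 2 + 3j],
              [2 - 3j, 4]])
print(np.allclose(A.conj().T, A))       # True: A is Hermitian

eigvals = np.linalg.eigvals(A)
print(np.allclose(eigvals.imag, 0))     # True: the eigenvalues are real
```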
So, the properties of a Hermitian matrix are: the diagonal elements of a Hermitian matrix are necessarily real, and all eigenvalues of a Hermitian matrix of order n are real. Now, the inverse of an invertible Hermitian matrix is also Hermitian ok; the proof is very easy, you see here: the conjugate transpose of A is equal to A because A is Hermitian.
219
So, take B = A^(-1) and consider B̄^T: B̄^T is (A^(-1)) bar-transposed, which is the same as (Ā)^(-1) transposed, which is the same as (Ā^T)^(-1). This is by the properties of matrices, and Ā^T is A because A is Hermitian. So, this is A^(-1), and this is nothing but B. So, we have shown that B is equal to B̄^T; that means the inverse of a Hermitian matrix A, if A is invertible, is also Hermitian.
Now, the product of two Hermitian matrices A and B is Hermitian if and only if AB = BA, and thus A^n is Hermitian if A is Hermitian and n is a positive integer. So, let us try to prove this; the second statement follows from the first. So, we have to show that the product of two Hermitian matrices is Hermitian if and only if AB = BA.
So, let us take A and B, two Hermitian matrices ok; that means Ā^T = A and B̄^T = B.
So, we have to show that AB is also Hermitian if AB = BA. So, let us take the conjugate transpose of AB and try to show that if AB = BA then it is equal to AB. So, (AB) bar-transpose is equal to (Ā B̄)^T, and this is equal to B̄^T Ā^T, and this is equal to BA because A and B are Hermitian matrices; and this equals AB since AB = BA. So, we have shown that the product of two Hermitian matrices is also Hermitian if AB = BA. Similarly, we can show that (BA) bar-transpose is equal to BA, on the same lines.
220
Now, the converse part: let us suppose AB is Hermitian, that is, (AB) bar-transpose is equal to AB, and we have to show that AB = BA. So, this implies (Ā B̄)^T = AB, and this implies B̄^T Ā^T = AB; and B̄^T is B and Ā^T is A. So, BA = AB. So, we have shown the converse.
So, hence, if you substitute B = A, then AB = BA trivially holds; that means A^2 is also Hermitian if A is Hermitian. And similarly A^3, similarly A^4. So, this again follows simply; that means, if A is Hermitian then any positive power of A is also Hermitian.
Now, for a skew Hermitian matrix: a square matrix A with complex entries is called skew Hermitian if its conjugate transpose is equal to -A. So, what are the properties of skew Hermitian matrices?
You see, here Ā^T = -A; that means the conjugate of a_ii equals -a_ii for all i, and this means a_ii for all i is either 0 or purely imaginary ok. This is the foremost property for a skew Hermitian matrix.
221
The second property is that all eigenvalues of a skew Hermitian matrix are either 0 or purely imaginary; the proof we can easily obtain ok, following the same lines as we did for a skew symmetric matrix ok.
Next, if A is a skew Hermitian matrix then iA and -iA are Hermitian. This is easy to show: you see, A is skew Hermitian, that means Ā^T = -A.
You take B_1 = iA and B_2 = -iA. So, B_1 conjugate transpose is (iA) bar-transpose = -i Ā^T = -i(-A) = iA, which is equal to B_1; so B_1 is Hermitian. Similarly we can proceed for B_2: if you take B_2 bar-transpose, it is i Ā^T, because the bar of -i is i, and this equals i(-A) = -iA, which is equal to B_2. So, we have shown that B_2 is a Hermitian matrix. So, if A is skew Hermitian then iA and -iA are Hermitian matrices.
222
So, here we have seen some of the important matrices like symmetric matrices, skew symmetric matrices, Hermitian matrices and skew Hermitian matrices, and we have seen some of the important properties of these special matrices. In the next lecture we will see some more special matrices like orthogonal matrices, unitary matrices etcetera and their properties.
Thank you.
223
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 17
More on Special Matrices and Gerschgorin Theorem
Hello friends, welcome to lecture series on Matrix Analysis and Applications. So, in the
last lecture, we have seen some special matrices like symmetric matrices, skew
symmetric matrices, Hermitian matrices and skew Hermitian matrices. We have seen
that, what are the important properties of these matrices. So, there are some more special
matrices which we will see in this lecture and what are their properties.
So, first is the orthogonal matrix. Now, an orthogonal matrix is a square matrix A with real entries whose columns and rows are orthogonal unit vectors, that is A^T A = I. An orthogonal matrix is basically a square matrix where A^(-1) is nothing but A^T. So, such types of real matrices are called orthogonal matrices. Now, in the first example this matrix is an orthogonal matrix; we can easily verify, you see.
224
(Refer Slide Time: 01:23)
The matrix is simply A = [cosθ -sinθ; sinθ cosθ]; when you take A.A^T it comes out to be I. So, it is the identity matrix, so that means A^(-1) = A^T and hence it is an orthogonal matrix. However, if you see the second example and take B.B^T for that matrix, it is not an identity matrix. So, that matrix is not an orthogonal matrix.
Now, what are the properties of orthogonal matrix? The first property is the determinant
of orthogonal matrix is either +1 or -1. It is very easy to verify.
An orthogonal matrix simply satisfies A.A^T = I. You take the determinant on both the sides: |A.A^T| = |I|. And this implies |A||A^T| = |I| = 1, because the determinant of the identity is 1. The determinant of A and the determinant of A transpose are the same, because interchanging rows and columns will not
225
change the value of the determinant. So, we can say that it is |A||A| = 1. And this implies |A|^2 = 1, which means |A| = ±1 ok.
So, from here we can say that the determinant of an orthogonal matrix is either +1 or -1. So, this is the first property. Now, all eigenvalues of a real orthogonal matrix have modulus 1, |λ| = 1. So, what does it mean, how can we prove it: you take any orthogonal matrix with real entries and its eigenvalues always have modulus 1. So, this is very easy to verify, you see.
You see, we have an orthogonal matrix; the definition of an orthogonal matrix is A.A^T = I = A^T A.
Now, you take AX = λX, X ≠ 0. So, this means λ is an eigenvalue of A and the corresponding eigenvector is X. Now, take conjugates on both sides, because λ may be complex, and then take transpose on both sides. So, if you take the transpose on both sides, it will be X̄^T A^T = λ̄X̄^T, because λ̄ is a scalar quantity and its transpose will be itself (here Ā = A since A is real). Now, you post-multiply both sides by the matrix A. So, you get X̄^T A^T A = λ̄X̄^T A; from the definition A^T A is the identity. So, this implies X̄^T = λ̄X̄^T A.
Now, again you post-multiply both sides by the vector X. So, this implies X̄^T X = λ̄X̄^T AX. And AX is nothing but λX. So, this implies X̄^T X = λ̄λX̄^T X.
226
Now, X̄^T X is never 0, because if X is suppose the vector [x_1 x_2 ....x_n] then X̄^T X will be the conjugate transpose of [x_1 x_2 ....x_n] multiplied by [x_1 x_2 ....x_n]. And when you multiply this row with this column, it is |x_1|^2 + |x_2|^2 + ..... + |x_n|^2. And it will be 0 only when all the x_i's are 0, because the moduli are non-negative quantities.
So, this will never be 0, because X ≠ 0; hence this is not 0, because it is 0 only when X is 0 ok. So, this quantity is not 0. So, this implies |λ|^2 = 1, that is |λ| = 1. So, we can say that all the eigenvalues of a real orthogonal matrix have modulus 1. Now, the next property is that the inverse of an orthogonal matrix is also orthogonal.
So, suppose you have A which is an orthogonal matrix. Since it is an orthogonal matrix, what does it imply: it implies A.A^T = A^T A = I. Now, you take B as the inverse of A, and you have to show that B is also an orthogonal matrix. So, in order to show this, what we have to show is that B.B^T = B^T.B = I.
Now, B = A^(-1). Take B.B^T, which is A^(-1)(A^(-1))^T; this is equal to A^(-1)(A^T)^(-1), which is equal to (A^T A)^(-1), because (AB)^(-1) = B^(-1)A^(-1). And this is equal to I^(-1) because A^T A = I. So, this is I. Similarly we can show that B^T B is equal to I, and hence we can say that BB^T = B^T B = I. So, this implies B is also an orthogonal matrix. So, these are the properties of orthogonal matrices.
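These orthogonal-matrix properties can be checked on the rotation example; the angle value below is an arbitrary illustration.

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A @ A.T, np.eye(2)))                 # True: A is orthogonal
print(np.isclose(abs(np.linalg.det(A)), 1))            # True: |det A| = 1
print(np.allclose(np.abs(np.linalg.eigvals(A)), 1))    # True: every eigenvalue has modulus 1

B = np.linalg.inv(A)                                   # inverse is orthogonal as well
print(np.allclose(B @ B.T, np.eye(2)))                 # True
```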
227
(Refer Slide Time: 09:06)
Next is the unitary matrix. Now, a complex unitary matrix is a complex square matrix whose conjugate transpose is equal to its inverse. If you take the conjugate transpose and it is equal to the inverse, then we say that the matrix is a unitary matrix. Now, for example, you consider this matrix A. So, what is the matrix A here?
The matrix A is shown on the slide (Refer Slide Time: 09:33). Now, if you compute A^H, which is Ā^T, it will be equal to (1/√7)[1-2i 1+i; 1-i -1-2i]. Now, when you multiply A.A^H, this is nothing but (1/7)[7 0; 0 7] = I. So, we can verify that A.A^H is the identity, hence we can say that A is a unitary matrix. Now, if you take this matrix B, and if you multiply this matrix B by its conjugate transpose, then it does not come out to be an identity matrix. So, we can say that it is not a unitary matrix.
228
(Refer Slide Time: 12:32)
Now, what are the properties of unitary matrices? The first property is that the modulus of the determinant of a unitary matrix is 1. The second property is that the eigenvalues of a unitary matrix are of magnitude 1. This property we can easily show, you see.
It follows the same lines as we did for orthogonal matrices. You see, here AX = λX, X ≠ 0. You take bar on both the sides, then you take the transpose, and post-multiply by AX; using Ā^T A = I this gives X̄^T X = λ̄X̄^T AX = λ̄λX̄^T X, and so (1-|λ|^2)(X̄^T X) = 0, which gives |λ| = 1 because X̄^T X ≠ 0. So, like orthogonal matrices, unitary matrices also have eigenvalues of modulus 1.
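A small sketch checking the unitary example and the eigenvalue property; the entries of A below are inferred from the A^H written above, so treat them as an assumption.

```python
import numpy as np

A = np.array([[1 + 2j, 1 + 1j],
              [1 - 1j, -1 + 2j]]) / np.sqrt(7)

print(np.allclose(A @ A.conj().T, np.eye(2)))          # True: A is unitary
print(np.isclose(abs(np.linalg.det(A)), 1))            # True: |det A| = 1
print(np.allclose(np.abs(np.linalg.eigvals(A)), 1))    # True: eigenvalues have modulus 1
```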
229
(Refer Slide Time: 14:02)
Now, the idempotent matrix: a matrix A is said to be idempotent if A^2 = A. You can easily verify, if you square the first matrix on the slide you get the matrix itself, I mean A itself; however, the second matrix is not idempotent. So, these are very easy to verify.
Now, what are the properties of idempotent matrices? The first property is that, with the exception of the identity matrix, every idempotent matrix is singular. The identity matrix is idempotent, you see, I^2 = I. But suppose A ≠ I; an idempotent matrix satisfies A^2 = A. Suppose A ≠ I and A is non singular ok; non singular means the inverse exists. And if you post-multiply both the sides by A inverse here, it is A^2 A^(-1) = A.A^(-1), that is A = I, which is not the case, because A ≠ I, a contradiction.
230
So, that means, for idempotent matrices with A^2 = A, if A ≠ I, the matrix must be singular, because if it is non singular then we get a contradiction. So, that means an idempotent matrix other than the identity matrix is always singular, that is, its inverse does not exist. Next, the eigenvalues are either 0 or 1. So, again let us try to prove it, because A^2 = A.
Now, AX = λX, X ≠ 0, so that means A^2X = λ^2X, as we already know, and since A^2 = A we conclude from here that λ^2X = λX. So, λ^2 = λ, since X ≠ 0, and hence λ is either 0 or 1. Now, since the eigenvalues are either 0 or 1, the sum of the eigenvalues, which is equal to the trace of the matrix, is always an integer, in fact a non-negative integer. So, the trace of an idempotent matrix is always a non-negative integer.
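A sketch with an illustrative idempotent (projection) matrix, which is an assumption and not the matrix on the slide:

```python
import numpy as np

A = np.array([[0.5, 0.5],
              [0.5, 0.5]])
print(np.allclose(A @ A, A))                          # True: idempotent

eigvals = np.linalg.eigvals(A)
print(np.allclose(np.sort(eigvals.real), [0, 1]))     # eigenvalues are 0 and 1
print(np.isclose(np.trace(A), 1))                     # trace is a non-negative integer
print(np.isclose(np.linalg.det(A), 0))                # singular, since A is not the identity
```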
Next is nilpotent matrices. Now, a nilpotent matrix is a square matrix N such that N^k = 0 for some positive integer k ok. And the smallest such k is called the index of the matrix N. So, suppose you consider this matrix.
231
(Refer Slide Time: 17:44)
So, this matrix is A = [0 3 1; 0 0 2; 0 0 0]. Now, when you take A^2 it is not equal to 0, I mean the zero matrix: when you take the square of this matrix it is [0 0 6; 0 0 0; 0 0 0]. Now you take A^3, which comes out to be the null matrix. So, what we have concluded is that this matrix A is a nilpotent matrix of index 3.
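A quick numerical check of this example (a sketch, not part of the lecture):

```python
import numpy as np

A = np.array([[0., 3., 1.],
              [0., 0., 2.],
              [0., 0., 0.]])
print(np.linalg.matrix_power(A, 2))                    # [[0,0,6],[0,0,0],[0,0,0]], not yet zero
print(np.allclose(np.linalg.matrix_power(A, 3), 0))    # True: the index of nilpotency is 3

print(np.allclose(np.linalg.eigvals(A), 0))            # True: the only eigenvalue is 0
print(np.trace(A), np.linalg.det(A))                   # 0.0 0.0
```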
Now, what are the properties of nilpotent matrices: the first property is that the only eigenvalue of N is 0; I mean, if N is a nilpotent matrix the only eigenvalue it has is 0, and that is easy to show, you see.
232
(Refer Slide Time: 19:36)
Suppose N is a nilpotent matrix and NX = λX, X ≠ 0. Then N^kX = λ^kX. And N^k is 0 because N is a nilpotent matrix. So, 0 = λ^kX. So, this implies λ = 0, because X ≠ 0. So, the only eigenvalue of a nilpotent matrix is equal to 0. This can also be proved by contradiction: suppose N is a nilpotent matrix and λ ≠ 0.
So, if λ ≠ 0, then we obtain NX = λX for X ≠ 0 and N^kX = λ^kX, so 0 = λ^kX. And this is a contradiction, because λ ≠ 0 and X ≠ 0. So, this is a contradiction basically. So, this means λ cannot be nonzero; the only eigenvalue of N is 0. It cannot be different from 0.
Next, the characteristic polynomial of N is λ^n. Now, since all the eigenvalues of N are 0, the characteristic polynomial of N will be (λ-0)^n = λ^n, because 0 is the only eigenvalue of N; that is why it is λ^n.
For example, every 2×2 nilpotent matrix squares to 0, even the nonzero ones in fact. Now, the determinant and trace of a nilpotent matrix are always 0, because all eigenvalues are 0: the trace, being the sum of the eigenvalues, would be 0, and the determinant will be 0,
233
which is equal to the product of the eigenvalues of course. So, consequently a nilpotent matrix cannot be invertible. The only nilpotent diagonalizable matrix is the 0 matrix.
Now, next is the Gerschgorin theorem. What does the Gerschgorin theorem state, let us see. If you want to see the modulus of the largest eigenvalue of a square matrix, then by this theorem we can at least find a range for it. The modulus of the largest eigenvalue of a square matrix cannot exceed the largest sum of the moduli of its elements along any row. Since the eigenvalues of a matrix and its transpose are the same, the theorem is also applicable column-wise.
So, what this theorem means basically: suppose you have a matrix like this, say [1 2 3; 4 0 -1; -2 1 6]. Then this theorem states that |λ| ≤ max{1+2+3, 4+0+1, 2+1+6} = max{6, 5, 9} = 9, where λ is the largest eigenvalue of this matrix. Now, when you look at the columns, you get |λ| ≤ max{1+4+2, 2+0+1, 3+1+6} = 10. So, we can say that
234
the modulus of the maximum λ of this matrix is less than or equal to 9, taking the intersection of these two bounds. So, how can we prove this theorem, how can we show it: the proof is simple, you can see.
You take Ax = λ_r x, x ≠ 0. Now, suppose x is [x_1r x_2r ... x_nr], which is not equal to 0, and A is the matrix [a_11 a_12 .... a_1n; a_21 a_22 ... a_2n; ...; a_n1 .... a_nn].
So, what will we obtain: you have to multiply each row of A with x. So, this gives a_i1 x_1r + a_i2 x_2r + .... + a_in x_nr = λ_r x_ir, where i runs from 1 to n. When you put i = 1 it is a_11 x_1r + a_12 x_2r + .... + a_1n x_nr = λ_r x_1r, and similarly for i = 2 to n. Now, you take modulus on both the sides; because |a+b| ≤ |a|+|b|, we can say that |λ_r||x_ir| is less than or equal to |a_i1||x_1r| + |a_i2||x_2r| + .... + |a_in||x_nr|.
Now, let |x_sr| = max{|x_1r|, |x_2r|, ..., |x_nr|}; suppose out of the moduli of these x_ir's, |x_sr| is the maximum one ok. And of course it is not equal to 0, because x is not equal to 0, so the maximum cannot be 0. Now, you divide this inequality by |x_sr|. When you divide both the sides, the inequality will not change, and it becomes |λ_r||x_ir|/|x_sr| ≤ |a_i1| + |a_i2| + ..... + |a_in|, because |x_jr|/|x_sr| ≤ 1 for all j, as |x_sr| is the maximum one. So, we get this inequality.
235
Now, this is valid for every i. So, it is valid for i = s also, and for i = s, since |x_sr|/|x_sr| = 1, what we obtain is |λ_r| ≤ |a_s1| + |a_s2| + ..... + |a_sn|, which is equal to the summation over j, varying from 1 to n, of |a_sj|. Now, if it is less than or equal to this sum, it will also be less than or equal to the maximum over i of the sum over j from 1 to n of |a_ij|.
So, we can say that any eigenvalue of matrix A is, in modulus, less than or equal to the maximum of the row sums of moduli. Now, we have one result based on this: for each eigenvalue λ there exists a circle with centre a_ss and radius equal to the sum of the moduli of the off-diagonal entries of that row, Σ_(j≠s) |a_sj|, such that λ lies inside the circle or on its boundary.
What is the matrix A here: the matrix A is [1 -1; 2 -1]. So, what are the eigenvalues of this matrix; if you quickly see, the sum of the eigenvalues is the trace, which is 0, and the product of the eigenvalues is the determinant, which is (1)(-1) - (-1)(2) = -1
236
solving these two equations. So, these are nothing but you see you see - 1 2=1. So, =±i.
So, 1 = i, 2 = -i. This you can easily verify; sum is 0 and the product is 1.
Now, if you look at |λ - a_ss|: for the first row the centre a_11 is 1, and |λ - 1| is less than or equal to the sum of the moduli of the other entries of that row, excluding the diagonal element, that is 1. For the second row the centre is -1, and |λ + 1| is less than or equal to the sum excluding the diagonal entry, because we are excluding j = s here; so it is less than or equal to 2.
Now, when you draw these two: here the centre is 1 and the radius is 1, that means this circle; and here the centre is -1 and the radius is 2, that means this circle, something like this. Now, when you see the first eigenvalue, which is i, its modulus is 1 and it lies inside the second circle. And the other eigenvalue, -i, its modulus is also 1 and it is also inside the second circle.
So, this is what the theorem states: for each eigenvalue of A there exists a circle with centre a_ss and a suitable radius such that the eigenvalue lies inside that circle or on its boundary. So, the theorem can be verified on this example. Hence, we have seen the properties of some more special matrices like unitary matrices, orthogonal, idempotent and so on, and also how we can find out the range of the eigenvalues of a given matrix using the Gerschgorin theorem.
Thank you.
237
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 18
Inner Product Spaces
Hello friends, welcome to lecture series on Matrix Analysis with Applications. Our
today’s lecture is based on inner product. What inner product is and how we will see
some important properties of inner product also.
So, what is an inner product, let us see. Let F be the field of real numbers or the field of complex numbers and V be a vector space over F. An inner product on the vector space V is a function, denoted < , >, from V×V to F satisfying the following properties. For every x, y, z in V and for every α in the field, there are four properties that must hold.
238
(Refer Slide Time: 01:18)
The first property is that the inner product <x, x> ≥ 0, and <x, x> = 0 if and only if x = 0; that is the first property. The second property is that the inner product <x, y> equals the conjugate of <y, x>, where bar denotes the complex conjugate. The third one is that the inner product <x+y, z> is the same as <x, z> + <y, z>. And the fourth property is <αx, y> = α<x, y>, where α is any scalar in the field.
So, if you have a function defined like this from V×V to F satisfying these four properties, then that product is called an inner product, and the vector space on which this inner product is defined is called an inner product space.
Now, one important property which can be seen from the second and fourth properties: if you take the inner product <x, αy>, that will be equal to the conjugate of <αy, x> by the second property. This can be written as the conjugate of α<y, x>, which equals the conjugate of α times the conjugate of <y, x>, and the conjugate of <y, x> is, by the second property again, <x, y>. So <x, αy> equals the conjugate of α times <x, y>. So, if the scalar is in the second component, then it comes out as its conjugate; and if it is in the first component, then it comes out as α itself.
239
(Refer Slide Time: 03:25)
Now, let us discuss a few examples based on this. The first example is the dot product in R^n. If we define the dot product in R^n as below, then this defines an inner product and R^n becomes an inner product space. Let us see how.
You see, you are taking V as R^n and the field as R ok. And if you take u as [a_1, a_2, ....,a_n] and v as [b_1, b_2, ...,b_n], then the inner product of u and v is simply the summation of a_i b_i, i from 1 to n. Now, this can easily be shown, you see. The first property is the inner product
240
<u, u>: it is the summation, i from 1 to n, of a_i.a_i, which is simply the summation of a_i^2, or a_1^2 + a_2^2 + .... + a_n^2, and that is always greater than or equal to 0.
And this inner product is equal to 0 if and only if u = 0: you see, if it is equal to 0, this implies the sum of the a_i^2 is 0, and this implies a_i = 0 for all i, and this means u = 0. And of course, if u = 0, then the inner product <0, 0> is of course 0 by this definition. So, the first property holds.
Now, for the second property you can see: the inner product <u, v>, if you write it out, is a_1b_1 + a_2b_2 + .... + a_nb_n, and this can be written as b_1a_1 + b_2a_2 + ... + b_na_n. So, this is the same as the inner product <v, u>; because we are working over the real field, the bar of a real number is the number itself.
Now, the third property is the inner product <u+v, w>: if you take w = [c_1, c_2,...,c_n] then <u+v, w> = the summation, i from 1 to n, of (a_i+b_i)c_i by this definition, which is the summation over i of a_ic_i plus the summation over i of b_ic_i. And this is equal to <u, w> + <v, w>. So, the third property also holds.
Now, the fourth property is the inner product <αu, v> = the summation, i from 1 to n, of (αa_i)b_i, as you can see, and this is equal to α times the summation, i from 1 to n, of a_ib_i, which is equal to α<u, v>. So, hence we have seen that all four properties hold when we define the inner product by this. So, this defines an inner product, and the vector space on which this inner product is defined we call an inner product space.
So, there may be some other ways also to define an inner product on R^n; this is one of the ways of defining the inner product between u and v. Now, if you take C^n, I mean the vector space C^n over the complex field, then the inner product may be defined like this. This again forms an inner product, which can easily be verified.
241
(Refer Slide Time: 07:38)
You see, we are taking V as C^n and the field as C. Now, the inner product of u and v, for any u, v in C^n, is defined like this: the summation, i from 1 to n, of u_i times the conjugate of v_i. We are taking u as [u_1, u_2,...,u_n] and v as [v_1, v_2,...,v_n].
So, the first property is that the inner product <u, u> = the summation over i of u_i times the conjugate of u_i, which is the summation over i of |u_i|^2. This means |u_1|^2 + |u_2|^2 + |u_3|^2 + .... + |u_n|^2. So, this is always greater than or equal to 0 for all u. Now, if this is equal to 0, this implies |u_1|^2 + |u_2|^2 + |u_3|^2 + .... + |u_n|^2 = 0, and this implies |u_i| = 0 for all i, and this implies u_i = 0 for all i; that means u = 0. Now, if u = 0 then of course the inner product <0, 0> = 0; so the first property holds.
Now, for the second property: the inner product <u, v>, if you take it, is the summation, i from 1 to n, of u_i times the conjugate of v_i, which can be written as the conjugate of the summation, i from 1 to n, of v_i times the conjugate of u_i. And this is the conjugate of the inner product <v, u>. So, the second property also holds.
Now, the third property: <u+v, w> = the summation over i of (u_i+v_i) times the conjugate of w_i, where we are taking w as [w_1, w_2,....., w_n]. So, this is the summation over i of u_i times the conjugate of w_i plus the summation over i of v_i times the conjugate of w_i, and this is the inner product <u, w> + <v, w>.
The fourth property is the inner product <αu, w> = the summation over i of αu_i times the conjugate of w_i, which is equal to α times the summation over i of u_i times the conjugate of w_i, which is α<u, w>. So, we have shown all the
242
four properties for this definition; that means, this defines an inner product on C^n over the
complex field.
Now, similarly we can go for the third problem: if we take the vector space of all real continuous functions on the closed interval [a, b], and we define the product as <f, g> = the integral from a to b of f(t)g(t) dt, then this defines an inner product; it is very easy to show. You can simply see that if you take the inner product <f, f>, it is simply the integral from a to b of f^2 dt, which is always greater than or equal to 0 for every f. And if the inner product <f, f> = 0, that means the integral from a to b of f^2 is 0, and that is true only when f = 0. So, the first property holds.
Now, when you take the inner product <f, g> or <g, f>, both are the same because f(t).g(t) is the same as g(t).f(t). Now, for the inner product <f+g, h>, if you take the integral of (f+g).h, the third property can be directly obtained. And for <αf, g>, α can simply be taken outside the integral.
And so we get the fourth property also. So, this definition defines an inner product on the vector space C[a, b] of continuous functions on the closed interval.
Now, if you take this example, we consider all the real matrices of order m×n over the real field, and the following defines an inner product. How is the inner product defined, let us see.
243
(Refer Slide Time: 12:24)
So, here we are taking the vector space of all real matrices of order m×n. And the inner product between matrices A and B we define as the trace of B^T A, which is simply the summation over i, summation over j, of a_ij b_ij. Now, the first property is that the inner product <A, A> means the trace of A^T A, which means the summation over i, summation over j, of a_ij^2; that is always greater than or equal to 0 of course, because it is a sum of non-negative quantities.
Now, the inner product <A, A> = 0 implies the summation over i, summation over j, of a_ij^2 is 0, and that is true only when a_ij = 0 for all i and j, and that means the matrix A itself is 0. And again, if the matrix A is 0, the definition trivially holds, because it is the inner product of 0 and 0, that means the trace of 0^T 0, which is of course 0. So, the first property holds.
Now, if I take the second property that is <A,B> which is same as trace of BTA which is
given by summation i summation j a ij b ij , this can be written as summation over i
summation over j b ij a ij ,. So, we can easily write with a trace of ATB which is inner
product <B,A>.
Now, the third property is inner product <A+B,C> which is same as trace of CTA+B by
the definition which is trace of CTA+CTB. And this can be written as by the property of
trace, the trace of CTA+ trace of CTB, which is inner product <A,C> + <B,C> So, third
property also holds.
Now, for the last property, the inner product <αA, B> = trace of Bᵀ(αA), which can be written as summation over i, summation over j of α a_ij b_ij, because we are multiplying A by the scalar α. This can be written as α times summation over i, summation over j of a_ij b_ij, and this is α times the trace of BᵀA, that is, α<A, B>. So, hence we can say that all the four properties hold for this definition; that means, this defines an inner product for this vector space over this field. So, these are a few examples of inner products.
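As a small sanity check (a sketch with matrices of my own choosing), the inner product <A, B> = trace(BᵀA) can be compared against the entrywise sum of a_ij b_ij using NumPy:

```python
import numpy as np

def inner(A, B):
    # <A, B> = trace(B^T A) for real m x n matrices
    return np.trace(B.T @ A)

A = np.array([[1., 2., 0.],
              [3., -1., 4.]])
B = np.array([[1., 0., 2.],
              [1., 1., -1.]])

print(inner(A, B))               # trace(B^T A)
print(np.sum(A * B))             # entrywise sum, same value
print(inner(A, B), inner(B, A))  # symmetry: <A, B> = <B, A>
```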
Now, when can we say that two vectors are orthogonal? Two vectors u and v are said to be orthogonal if their inner product is 0. Now, let us see this problem: consider the vectors u = (1, 1, 1), v = (1, 2, -3) and w = (1, -4, 3). The first question is: is u orthogonal to v and to w? Here we are taking the standard inner product, that is, the dot product; if nothing is given, we take the inner product to be the dot product of the two vectors. You can check that u·v = 1 + 2 - 3 = 0 and u·w = 1 - 4 + 3 = 0, so u is orthogonal to both v and w.
Now, is v orthogonal to w? To see this, find the inner product of v and w; if it equals 0, then v is orthogonal to w. But v·w = 1 - 8 - 9 = -16, which is clearly not 0, so v is not orthogonal to w. Now, find a nonzero vector w that is orthogonal to u_1 and also to u_2.
(Refer Slide Time: 17:47)
So, what are u_1 and u_2? Here u_1 = (1, 2, 1) and u_2 = (2, 5, 4). We have to find a nonzero vector w which is orthogonal to u_1 and also orthogonal to u_2; that means the inner product of w with u_1 and the inner product of w with u_2 are both equal to 0. A vector which is orthogonal to these two vectors is simply their cross product. So, w will be the cross product of these two, which is the determinant of [i, j, k; 1, 2, 1; 2, 5, 4], that is, 3i - 2j + k.
So, we can say that w is simply (3, -2, 1). You can also check: the inner product of w with u_1 is 3 - 4 + 1 = 0, and the inner product of w with u_2 is 6 - 10 + 4 = 0. This is because w is the cross product of these two vectors and hence is orthogonal to both of them. In fact, any scalar multiple α(3, -2, 1) will be orthogonal to both u_1 and u_2.
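A quick NumPy check of this cross-product construction (a sketch; the variable names are mine, not from the lecture):

```python
import numpy as np

u1 = np.array([1., 2., 1.])
u2 = np.array([2., 5., 4.])

# A vector orthogonal to both u1 and u2 is their cross product.
w = np.cross(u1, u2)
print(w)               # [ 3. -2.  1.]
print(w @ u1, w @ u2)  # 0.0 0.0, so w is orthogonal to both
```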
(Refer Slide Time: 19:39)
So, how do we define the orthogonal complement? The orthogonal complement of a subset S of a vector space V, denoted S⊥, is defined as the set of all those v in V such that the inner product of v with u is equal to 0 for all u in S. Now, in order to show that this is a subspace of the vector space V, let v_1, v_2 be in S⊥ and let α belong to the field. To show that S⊥ is a subspace, we have to show that αv_1 + v_2 also belongs to S⊥.
So, how will we show this? Let us find the inner product of αv_1 + v_2 with any u in S. By the properties of the inner product, <αv_1 + v_2, u> = α<v_1, u> + <v_2, u>. Now, since v_1 and v_2 are in the orthogonal complement of S, <v_1, u> = 0 and <v_2, u> = 0 for any u in S. So, <αv_1 + v_2, u> = 0, and we have shown that αv_1 + v_2 belongs to the orthogonal complement of S; that means the orthogonal complement of S is a subspace of V.
(Refer Slide Time: 22:53)
The inner product of v and u is equal to 0. So, geometrically, for a single vector u it is simply the plane passing through the origin and perpendicular to u, since it consists of all vectors whose inner product with u is 0; this is very clear from the definition itself.
You see, here we have Ax = 0, where A is the m×n matrix [a_11 a_12 ... a_1n; a_21 a_22 ... a_2n; ...; a_m1 a_m2 ... a_mn] and x is the column vector [x_1, x_2, ..., x_n]ᵀ. What does this mean? It means that when you multiply the first row with this column, the result is 0; the second row with this column is 0; the third row with this column is 0, and so on. So, every row is orthogonal to x; that is, each solution is orthogonal to each row, because the inner product of each row with x is 0.
So, each solution is orthogonal to each row, and hence W is the orthogonal complement of the row space of A. So, this W, which is the solution space of Ax = 0, is simply the orthogonal complement of the row space of A, where the row space is the space generated by the rows of A.
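A small numerical illustration of this fact (a sketch with a made-up matrix, not taken from the lecture): vectors in the null space of A are orthogonal to every row of A.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 1.],
              [2., 4., 2.]])   # rank 1, so the null space has dimension 2

W = null_space(A)              # columns form an orthonormal basis of {x : Ax = 0}
print(np.allclose(A @ W, 0))   # True: every row of A is orthogonal to every solution
```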
So, this is all about inner product spaces. In this lecture, we have seen what an inner product space is and what the properties of an inner product are. We have also seen that if a subset S is given, how we can find the orthogonal complement of that S.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 19
Vector and Matrix Norms
Hello friends, welcome to the lecture series on Matrix Analysis with Applications. So, today we will deal with Vector and Matrix Norms: what do we mean by a vector norm and how can we find a matrix norm. That is what we discuss in this lecture. First, what do we mean by the norm of a vector?
The norm of a vector x, which may be in Rn or in Cn (Cn means complex n-tuples), is a real valued function defined on Rn or Cn; it always takes a real value and satisfies the following axioms. The first axiom is positivity: the norm of any vector x is always greater than or equal to 0, and ||x|| = 0 if and only if x is itself 0. The second property is homogeneity; that means ||kx|| = |k|·||x||, where k is a scalar in R or C. The third property is subadditivity, which is ||x+y|| ≤ ||x|| + ||y||, the triangle inequality, where x, y belong to Rn or Cn.
So, if a function of x satisfies these three properties, then it is called a norm, and we say it is a norm induced on the vectors. There are different types of norms that can be induced on Rn or Cn; let us discuss them with some examples.
(Refer Slide Time: 02:11)
Suppose you are in Rn; then the norm 1 of a vector x = (x_1, x_2, ..., x_n), written ||x||_1, is defined as the sum of |x_i| for i varying from 1 to n. Let us discuss this. First of all, is it a norm, and how can we show that it is? In order to show that this is a norm, we have to verify the three axioms stated earlier.
Now, the first axiom is that the norm of x is always greater than or equal to 0, which is true because a sum of absolute values cannot be less than 0: we take the modulus of each component and then add them. Since |x_i| ≥ 0 for all i, the sum over i of |x_i| is ≥ 0, so this holds.
Now, ||x||_1 = 0 if and only if x = 0. If x is the zero vector, then of course the sum of the absolute values is 0, so the first part is done. Conversely, if ||x||_1 = 0, this means |x_1| + |x_2| + |x_3| + ..... + |x_n| = 0. If a sum of non-negative values is equal to 0, then each quantity must be 0, so |x_i| = 0 for all i, and |x_i| = 0 means x_i = 0 for all i. So, x = 0, and we have shown that the first property, positivity, holds.
Now, let us show the second property: ||kx||_1, where we multiply the vector x by a scalar k. By definition, this is the summation for i from 1 to n of |k x_i| = summation over i of |k|·|x_i|; since k is a scalar, |k| can be taken out, so it is |k| times the sum over i of |x_i|, which is simply |k|·||x||_1. So, we have shown that ||kx||_1 = |k|·||x||_1, and the property holds.
Now, the third axiom is ||x+y||_1 ≤ ||x||_1 + ||y||_1, where x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n). Here ||x+y||_1 = the sum for i from 1 to n of |x_i + y_i|. Now |x_i + y_i| ≤ |x_i| + |y_i| for all i, by the property of the modulus. Summing over i, the sum of |x_i + y_i| is ≤ the sum over i of |x_i| plus the sum over i of |y_i|, and this implies ||x+y||_1 ≤ ||x||_1 + ||y||_1.
So, we have shown all three properties, and we can say that the definition given here is indeed a norm; this is how we define the norm 1 of a vector x. Now, let us come to the second part, the norm 2 on Rn. The norm 2 of a vector x in Rn is defined as the square root of the sum of the squares of its components. This also defines a norm; how can we show this?
Now, here we take ||x||_2 = (sum for i from 1 to n of x_i²)^(1/2), that is, (x_1² + x_2² + ....... + x_n²)^(1/2). So, the first property is that the norm of a vector is always greater than or equal to 0, which is true because all these quantities are non-negative, so the square root is defined and is always greater than or equal to 0. So, ||x||_2 ≥ 0 for any x is true by the definition itself. Now, we have to show that ||x||_2 = 0 if and only if x = 0. If x = 0, that means x_1, x_2, ....., x_n are all 0, so this expression is 0; hence ||x||_2 is obviously 0 if x = 0.
Now, let ||x||_2 = 0. This implies (x_1² + x_2² + ..... + x_n²)^(1/2) = 0, which implies x_1² + x_2² + ..... + x_n² = 0. Since it is a sum of squares of real quantities and it is equal to 0, this means x_i² = 0 for all i, so x_i = 0 for all i, which means x = 0. So, we have shown that if the norm is equal to 0 then x itself is equal to 0, and the first property is proved.
Now, the second property is ||kx||_2 = |k|·||x||_2. Here ||kx||_2 is, by definition, the norm of the vector obtained by multiplying each x_i by k, which is (k²x_1² + k²x_2² + ..... + k²x_n²)^(1/2) = (k²)^(1/2) (x_1² + x_2² + ..... + x_n²)^(1/2), and this is |k|·||x||_2. So, the second property also holds. Now, we have to show the third property; how can we show it?
The third property is ||x+y||_2 ≤ ||x||_2 + ||y||_2; this is what we have to show in order to show that this is a norm. By definition, ||x+y||_2 = (sum for i from 1 to n of (x_i + y_i)²)^(1/2). So, we have to show that (sum for i from 1 to n of (x_i + y_i)²)^(1/2) ≤ (sum for i from 1 to n of x_i²)^(1/2) + (sum for i from 1 to n of y_i²)^(1/2); we have to show that this inequality is true.
Now, let us square both sides. If you square both sides, we obtain: sum over i of (x_i + y_i)² ≤ sum of x_i² + sum of y_i² + 2(sum of x_i²)^(1/2)(sum of y_i²)^(1/2). When you simplify further, using (a+b)² = a² + b² + 2ab on the left, the sums of x_i² and y_i² cancel from both sides, and the factor 2 cancels as well. So, what we finally obtain after simplification is: sum over i of x_i y_i ≤ (sum over i of x_i²)^(1/2) (sum over i of y_i²)^(1/2).
Now, if somehow we prove that this last inequality is true, then the squared inequality is true, and hence the triangle inequality is true, and we are done. So, we have to prove this expression, which is the Cauchy-Schwarz inequality. How do we prove the Cauchy-Schwarz inequality in this real case? Let us see.
Now, you see, it can be proven using a quadratic expression. How can we do that? You simply consider this polynomial in z: (x_1 z + y_1)² + (x_2 z + y_2)² + ....... + (x_n z + y_n)² ≥ 0. This is very obvious: a sum of squares, that is, a sum of non-negative expressions, is always greater than or equal to 0.
Now, what do we obtain from here? If you simplify this, you obtain z²(sum over i of x_i²) + 2z(sum over i of x_i y_i) + (sum over i of y_i²) ≥ 0; you simply open each square and simplify to get this expression. So, this is something like az² + bz + c ≥ 0, where a is the sum of the x_i², b is twice the sum of the x_i y_i, and c is the sum of the y_i².
Now, if a polynomial of degree 2 is greater than or equal to 0 for every z, where z is an arbitrary variable, what does that mean? A quadratic polynomial always represents a parabola, and if this parabola is greater than or equal to 0 for every z, it means the parabola lies on or above the axis; that means the polynomial has at most one real root. This implies that the discriminant of the polynomial is less than or equal to 0. And what is the discriminant? It is b² - 4ac ≤ 0.
Now, the factor 4 cancels out on both sides, so what we obtain is (summation over i of x_i y_i)² ≤ (sum over i of x_i²)(sum over i of y_i²). This is exactly the Cauchy-Schwarz inequality: if you take the square root on both sides, you get the expression we needed above. If this is true, then the squared inequality is true, and hence the triangle inequality is true. So, we obtain the third property, and hence the norm induced in this manner is indeed a norm; this is the norm 2 of a vector x.
Now, the infinity norm of a vector x is given as ||x||_∞ = maximum over i of {|x_i|}; ||x||_∞ is the notation for the infinity norm. This is also a norm induced on the vector x, and it is very easy to show: again, we have to verify the three properties. The infinity norm of the vector x = (x_1, x_2, ..., x_n) is the maximum over i of {|x_i|}. Now, the first property is that the norm is always greater than or equal to 0, which is obviously true because you are taking the maximum of absolute values, which cannot be negative. So, the first property is obviously true by the definition itself.
Now, if x is equal to 0, then of course the infinity norm will be 0, because the maximum of the absolute values will be 0. Conversely, if the infinity norm is 0, this means the maximum over i of {|x_i|} = 0. And each |x_j| ≤ maximum over i of {|x_i|}, which is equal to 0; that means |x_j| ≤ 0 for all j. Since a modulus cannot be less than 0, |x_j| = 0, so x_j = 0 for all j; that means x = 0. So, the first property holds.
Now, the second property is ||kx||_∞ = |k|·||x||_∞, which we have to show. It is easy again: ||kx||_∞ = maximum over i of |k x_i| = |k| times the maximum over i of {|x_i|}, which equals |k|·||x||_∞. So, the second property obviously holds.
Similarly, we can easily show the third property: in this case ||x+y||_∞ is the maximum over i of {|x_i + y_i|}. And since |x_i + y_i| ≤ |x_i| + |y_i|, you take the maximum on both sides and easily obtain that this is less than or equal to the maximum over i of |x_i| plus the maximum over i of |y_i|, which is ||x||_∞ + ||y||_∞. So, we have shown all three properties for this norm; this is the infinity norm induced on a vector x.
The last one here is the p-norm of a vector x, ||x||_p = (sum over i of |x_i|^p)^(1/p), where p ≥ 1; if p = 2, it reduces to the norm 2 of a vector x. We can prove all three properties for this norm also; for the third one, we have to use the generalized Cauchy-Schwarz inequality.
So, suppose you have the vector (1, -1, 2)ᵀ and you want to find the norm 1 of this vector; what is the norm 1 of this vector?
(Refer Slide Time: 21:41)
What is x here? x is (1, -1, 2). Now, the norm 1 of this vector, which we have already defined, is the sum for i from 1 to n of |x_i|, and here n = 3. So, it is |1| + |-1| + |2|, which is 4.
What is the norm 2? It is simply the sum of the squares of the x_i and then the square root of this, that is, (1 + 1 + 4)^(1/2) = (6)^(1/2). The infinity norm of this vector is the maximum over i of |x_i|: the modulus of the first component is 1, of the second component is 1, of the third component is 2, and the maximum of the three is 2. So, this is the infinity norm of this vector. In this way we can find the norm 1, norm 2, infinity norm or p-norm for any p ≥ 1.
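For a quick numerical check (a sketch using NumPy, not part of the lecture), these three norms of x = (1, -1, 2) can be computed directly:

```python
import numpy as np

x = np.array([1., -1., 2.])

print(np.linalg.norm(x, 1))       # 1-norm: |1| + |-1| + |2| = 4
print(np.linalg.norm(x, 2))       # 2-norm: sqrt(1 + 1 + 4) = sqrt(6)
print(np.linalg.norm(x, np.inf))  # infinity norm: max(1, 1, 2) = 2
```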
Now, how can we define a matrix norm, that is, the norm of a matrix? Suppose we have an m×n real or complex matrix. The norm of a real or complex matrix of order m×n is a real valued function which satisfies the following properties. The first property is the same as for a vector norm: the norm of a matrix is always greater than or equal to 0, and it is equal to 0 if and only if A is the zero matrix, or null matrix. The second property is also the same: ||αA|| = |α|·||A||, where α is a scalar, maybe complex (Refer Time: 23:48).
Now, ||A+B|| ≤ ||A|| + ||B||. So, these three properties are similar to the properties we discussed for the norm of a vector; here, instead of a vector, we have a matrix A. Additionally, in the case of square matrices, some matrix norms also satisfy the condition ||A·B|| ≤ ||A||·||B||, and a matrix norm that satisfies this additional property is called a sub-multiplicative norm; this may hold for square matrices.
Now, what are the various ways to induce a norm on a matrix? The first is the maximum column sum, which we denote by the norm 1 of a matrix A: you sum the absolute values of the entries in each column and take the maximum over j. That is the maximum column sum; let us take an example, then it should be clear.
(Refer Slide Time: 25:02)
You see, you have a matrix A of order m×n, say A = [a_11 a_12 .... a_1n; a_21 a_22 .... a_2n; ...; a_m1 a_m2 .... a_mn].
Now, what is the norm 1 of this matrix? ||A||_1 = maximum over j of (sum for i from 1 to m of |a_ij|). Now, let us try to prove that this is a norm, that is, that it satisfies all three properties. The first property is ||A||_1 ≥ 0, which is of course true, because you are adding non-negative quantities and then taking the maximum, which cannot be negative; so it is obviously true by the definition itself.
Now, if A is the null matrix, then of course the norm 1 of the matrix equals 0. Conversely, let the norm 1 of this matrix be equal to 0; we have to show that the only possibility is A = 0, the null matrix. The norm being 0 means: maximum over j of (sum for i from 1 to m of |a_ij|) = 0, that is,
max{ |a_11| + |a_21| + .... + |a_m1|, ......., |a_1n| + |a_2n| + ..... + |a_mn| } = 0.
Now, what does it mean for this maximum to be 0? Suppose the maximum is obtained at j = p; that means |a_1p| + |a_2p| + .... + |a_mp| = 0. Now, for every column i, |a_1i| + |a_2i| + ...... + |a_mi| ≤ |a_1p| + ...... + |a_mp|, because the latter is the maximum column sum, and it is 0. So, each column sum is a sum of non-negative quantities that is less than or equal to 0; it cannot be negative, so it is equal to 0. And if each column sum is 0, then a_ji = 0 for every j and every i; that means the matrix A = 0. So, we have shown that the first property holds for the norm 1 of a matrix A. Now, let us check the second property using this definition.
The norm 1 is defined as the maximum over j of (sum for i from 1 to m of |a_ij|). So, ||kA||_1 will be the maximum over j of (sum for i from 1 to m of |k|·|a_ij|), which is |k|·||A||_1. Similarly, we can show the third property, namely ||A+B||_1 ≤ ||A||_1 + ||B||_1. Take the left-hand side, which equals the maximum over j of (sum for i from 1 to m of |a_ij + b_ij|).
Now, again you can use |a_ij + b_ij| ≤ |a_ij| + |b_ij|. Take the sum over i on both sides, and then take the maximum over j on both sides; you will get the required inequality.
So, hence we can say that this defines a norm induced on the matrix A, which we call the norm 1. Similarly, we can use the maximum row sum: instead of the column sum we take the row sum and then take the maximum. This we denote by the infinity norm of a matrix A, and the proof goes similarly. Next is the Euclidean norm.
Now, the Euclidean norm is simple: you take the sum of the squares of all the elements of the matrix A and then the square root of this sum. That is the Euclidean norm of the matrix A, and it can also be obtained by finding the trace of A transpose A; the trace is the sum of the diagonal elements. You find the sum of the diagonal elements of AᵀA and take the square root; you get the same expression, which you can easily verify, and it is a norm, which we can also prove. So, these are three different norms that can be induced on a matrix A: the norm 1, the infinity norm and the Euclidean norm.
Now, suppose you have this problem. In this problem what is A? A is [2 -1 0; 3 1 -2; -2
0 2].
Now, what is the norm 1 of this matrix? You simply find the sum of the absolute values in each column and then take the maximum; that means ||A||_1 = max{ |2|+|3|+|-2|, |-1|+|1|+|0|, |0|+|-2|+|2| } = max{7, 2, 4}, so the norm 1 of this matrix is 7. Now, what is the infinity norm of this matrix? You take the maximum over the rows: ||A||_∞ = max{ |2|+|-1|+|0|, |3|+|1|+|-2|, |-2|+|0|+|2| } = max{3, 6, 4}, so it is 6. And what is the Euclidean norm of this matrix A? You find the sum of the squares of all the elements: ||A||_E = (4+1+0+9+1+4+4+0+4)^(1/2) = (27)^(1/2). So, in this way we can find the norm 1, infinity norm, and Euclidean norm of a matrix.
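These three matrix norms can also be checked numerically (a sketch using NumPy; np.linalg.norm supports exactly these choices of ord):

```python
import numpy as np

A = np.array([[ 2., -1.,  0.],
              [ 3.,  1., -2.],
              [-2.,  0.,  2.]])

print(np.linalg.norm(A, 1))       # maximum column sum: 7
print(np.linalg.norm(A, np.inf))  # maximum row sum: 6
print(np.linalg.norm(A, 'fro'))   # Euclidean (Frobenius) norm: sqrt(27)
```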
(Refer Slide Time: 35:11)
Now, when can we say that a matrix norm is compatible? A matrix norm is said to be compatible, or consistent, with a vector norm if ||Ax|| ≤ ||A||·||x||, where x is a vector and A is a matrix. So, if this inequality holds, then we say that the norm of the matrix A is compatible with the norm of the vector.
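A minimal numerical illustration of this compatibility condition (a sketch; the particular matrix and vector are arbitrary): the 1-norms used above satisfy ||Ax||_1 ≤ ||A||_1 · ||x||_1.

```python
import numpy as np

A = np.array([[ 2., -1.,  0.],
              [ 3.,  1., -2.],
              [-2.,  0.,  2.]])
x = np.array([1., -1., 2.])

lhs = np.linalg.norm(A @ x, 1)                     # ||Ax||_1
rhs = np.linalg.norm(A, 1) * np.linalg.norm(x, 1)  # ||A||_1 * ||x||_1
print(lhs, rhs, lhs <= rhs)                        # the inequality holds
```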
Now, we have another result: let A be a matrix of order n×n and let x belong to Cn; then the function from the n×n complex matrices to R defined by the expression on the slide, namely the maximum over x ≠ 0 of ||Ax||/||x||, is a norm. The proof is very easy: one just has to verify the three properties of a matrix norm, and this can be easily done. So, we can show that this is a norm; with the 2-norm on vectors it is the spectral norm.
Now, let A be a matrix of order n. The spectral norm of A is defined as (largest eigenvalue of AᴴA)^(1/2). If A is a real matrix, it is (largest eigenvalue of AᵀA)^(1/2); if A is a complex matrix, it is (largest eigenvalue of ĀᵀA)^(1/2), that is, of AᴴA. So, you find the largest eigenvalue of this matrix and take its square root; that is the norm 2 of the matrix A, which we also call the spectral norm of A. So, for a real matrix the spectral norm is simply (largest eigenvalue of AᵀA)^(1/2).
Suppose you want to find out the spectral norm of this matrix.
So, what is this matrix? The matrix is A = [1 1+i; 1-i 2]. It is a complex matrix, so we will find AᴴA. What is Aᴴ? First, Ā = [1 1-i; 1+i 2], and its transpose is [1 1+i; 1-i 2], which is A itself. So, A is a Hermitian matrix: A = Aᴴ means it is a Hermitian matrix.
So, AᴴA is basically A². For the norm 2 of A, you have to find (largest eigenvalue of AᴴA)^(1/2), that is, (largest eigenvalue of A²)^(1/2). Now, the largest eigenvalue of A² is the square of the largest eigenvalue of A, because we know that if λ is an eigenvalue of A, then λ² is an eigenvalue of A².
So, you find the eigenvalues of this matrix. How do we find the eigenvalues of a matrix? You simply solve det(A - λI) = 0. This gives λ = 0 and λ = 3. So, the largest eigenvalue of A is 3, and the largest eigenvalue of A² is 9. The spectral norm is therefore simply 9^(1/2), which is 3. So, in this way we can find the spectral norm of a matrix A, or the norm 2 of a matrix A. So, this is how we can find the norm of a vector and the norm of a matrix.
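The same computation can be reproduced numerically (a sketch; numpy.linalg gives both the eigenvalue route and a direct ord=2 matrix norm):

```python
import numpy as np

A = np.array([[1, 1 + 1j],
              [1 - 1j, 2]])

# Spectral norm = sqrt(largest eigenvalue of A^H A)
eigvals = np.linalg.eigvalsh(A.conj().T @ A)  # A^H A is Hermitian, eigenvalues are real
print(np.sqrt(eigvals.max()))                 # 3.0

print(np.linalg.norm(A, 2))                   # same value via the built-in 2-norm
```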
So, in the next few lectures we will see that what are the applications of norm of a
matrix.
Thank you.
Matrix Analysis with Applications
Dr. S. K. Gupta
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 20
Gram Schmidt Process
Hello, friends. Welcome to lecture series on Matrix Analysis with Applications. So,
today's lecture is on the Gram-Schmidt process: what the Gram-Schmidt process is and how it is used to find an orthogonal set of vectors.
So, first let us recall our definition of an orthogonal set. A subset {v_1, v_2, ...., v_n} of a vector space V with an inner product is called an orthogonal set if the inner product of v_i and v_j is equal to 0 whenever i ≠ j, that is, any two distinct vectors in the set are perpendicular.
For example, consider the set {(-1, 1, 0), (1, 1, 0), (0, 0, 1)}. Take the inner product of v_1 and v_2, the usual inner product. The usual inner product between two vectors in a real n-dimensional vector space is simply the standard dot product, so it would be the dot product of these two vectors.
So, (-1)(1) + (1)(1) + (0)(0) = 0. Similarly, the inner product of v_2 and v_3 is (1)(0) + (1)(0) + (0)(1) = 0, and the inner product of v_1 and v_3 is also 0; you can easily verify this. So, <v_i, v_j> = 0 for i ≠ j, which means this set of vectors is orthogonal. Now, if you take the inner product between two vectors of the next set, it is -2 + 4, which is 2 and not equal to 0; that means that set is not an orthogonal set.
Now, besides being an orthogonal set, a set may also satisfy that the norm of each vector is 1. What is the norm? The norm of a vector v is nothing but the square root of the inner product of v with itself; it is simply the length of the vector. So, a set of vectors (v_1, v_2, ....., v_n) is called an orthonormal set if the inner product of v_i and v_j is 0 for i ≠ j and the inner product of v_i with itself is 1 for all i. Or, we can say that <v_i, v_j> = 0 for i ≠ j and <v_i, v_j> = 1 for i = j; if this holds for all i and j, then the set of vectors is called an orthonormal set. Now, let us see various examples to verify whether a set of vectors is an orthonormal set. Suppose you have the standard basis of Rn; what is the standard basis of Rn?
We know standard basis of Rn is simply (1, 0, 0,......0), (0, 1, 0,......0) and similarly, (0, 0,
0,.....,1). If you take the inner product of any two distinct vectors in this set, it is 0. You
can simply verify 1 into 0 is 0, 0 into 1 is 0. So, you can simply verify that if you take
inner product of any two distinct vectors in this set, it is 0 and the norm of any vector in
this set is 1. So, we can say that this set of vectors are orthonormal set.
Similarly, take the second example; this one is not orthonormal, as we will see. If you take the inner product of the first two vectors, it is 1·1 = 1, 0·√2 = 0, -1·1 = -1, which sums to 0. The inner product of the second and third vectors is 1 - 2 + 1, again 0. The inner product of the first and the third vectors is 1 + 0 - 1, again 0; that means this set of vectors is orthogonal.
Now, the norm of the first vector is √2, which is not 1; that means this set of vectors is orthogonal, but not orthonormal.
So, how can we construct an orthonormal set from an orthogonal set? You simply divide each vector by its norm. Suppose you have a set of vectors (v_1, v_2, ...., v_n) and you know this set is orthogonal, meaning the inner product of any two distinct vectors is 0, and now you want to construct an orthonormal set from it. You can simply divide each vector by its norm: (v_1/||v_1||, v_2/||v_2||, ...., v_n/||v_n||). It will be a different set from the original one, but it will be an orthonormal set. So, we can always construct an orthonormal set from an orthogonal set by dividing each vector by its norm. You can easily verify that the norm of each new vector is 1: if the first vector is u_1 = v_1/||v_1||, then ||u_1|| = || v_1/||v_1|| ||; since 1/||v_1|| is a scalar quantity, it can be taken out, so ||u_1|| = (1/||v_1||)·||v_1|| = 1.
Similarly, we can verify this for the other vectors. And if you take any two distinct vectors in this new set, the norms are only scalar quantities, so they come out of the inner product; since the original set is orthogonal, this new set will also be orthogonal.
Now, similarly, take the third example, where V is the space of real valued continuous functions on the interval with x varying from 0 to 1, with the inner product defined as the integral from 0 to 1 of f(x)g(x) dx, and where f_n(x) = √2 cos(2πnx) and g_n(x) = √2 sin(2πnx). Then this set is an orthonormal set; it is very easy to show.
You can take f_i(x) = √2 cos(2πix) and g_j(x) = √2 sin(2πjx), and the inner product of f_i with g_j is given by the integral from 0 to 1 of f_i(x)·g_j(x) dx; I am taking i ≠ j here. So, it equals the integral from 0 to 1 of 2 cos(2πix) sin(2πjx) dx, and this integral is solved in (Refer Slide Time: 07:59). So, the inner product of f_i and g_j is 0. We can also verify that the inner product of f_i with itself is 1 and, similarly, the inner product of g_j with itself is 1; this can easily be verified using the same idea.
So, we can say that this set is an orthonormal set, because the inner product of any two different functions in it is 0 and each has norm 1. In fact, if you take the constant function 1 and take its inner product with any f_i, it is also 0 for all i, and the inner product of 1 with g_j is also 0 for all j; this too can be verified using the same definition of the inner product. So, hence we can say that this set is an orthonormal set.
Now, the next result: let V be an inner product space, and let the set {v_1, v_2, ...., v_k} be an orthogonal subset of V consisting of non-zero vectors. The first result is that this set is always linearly independent (LI); that means a set of non-zero orthogonal vectors is always linearly independent. How can we show this?
(Refer Slide Time: 11:25)
It is very easy to show. We have the set {v_1, v_2, ...., v_k}. It is given to us that the set is orthogonal, meaning the inner product of v_i with v_j is 0 for all i ≠ j; we also know that v_i ≠ 0 for all i, because the set consists of non-zero vectors.
Now, in order to show that this set is linearly independent, take a linear combination of these vectors, set it equal to 0, and try to show that each scalar must be 0. So, write α_1 v_1 + α_2 v_2 + .... + α_k v_k = 0; we have to show that each α_i is 0 in order to show that this set is linearly independent.
Now, call this linear combination v, which of course equals 0. Take the inner product of this vector v with any v_i: that is, the inner product of α_1 v_1 + α_2 v_2 + .... + α_k v_k with v_i, for any i from 1 to k; this i may be 1, may be 2, or may be k. This inner product is of course 0, because the inner product of the zero vector with any vector is always 0. Now, when you expand the inner product with this v_i, it becomes α_1 <v_1, v_i> + α_2 <v_2, v_i> + ..... + α_i <v_i, v_i> + ...... + α_k <v_k, v_i> = 0, because the right-hand side is 0.
Now, since this is an orthogonal set, the inner product of any two distinct vectors in it is 0. So, every term <v_j, v_i> with j ≠ i is 0; the only term that can be nonzero is the one with j = i. That means α_i ||v_i||² = 0, because all the other terms are 0. And since v_i ≠ 0 for all i, ||v_i||² is not equal to 0, and that means α_i = 0.
Now, vary i: for i = 1 this gives α_1 = 0, for i = 2 it gives α_2 = 0, and varying i up to k gives α_1 = α_2 = α_3 = ... = α_k = 0, which means the set is linearly independent. Or you can understand it like this: take the linear combination v; first take its inner product with v_1, which gives α_1 = 0; then take its inner product with v_2, which gives α_2 = 0; and similarly the inner product with v_k gives α_k = 0. Hence all the α's are 0, which means the set is linearly independent.
The second part of the theorem is that if y belongs to the span of S (span means some linear combination of the vectors of S), then y can be expressed as y = summation over i of (<y, v_i>/||v_i||²) v_i; this is also easy to show.
Write y = α_1 v_1 + α_2 v_2 + ..... + α_k v_k, because this y is in the span of these vectors. So, now we have to find out α_1, α_2, ....., α_k.
Now, take the inner product of this y with some v_p, where p is any value between 1 and k; again p may be 1, may be 2, or may be k. This equals the inner product of α_1 v_1 + .... + α_k v_k with v_p. Applying the properties of the inner product, this is α_1 <v_1, v_p> + α_2 <v_2, v_p> + ........ + α_p <v_p, v_p> + ........ + α_k <v_k, v_p>, and since this set is an orthogonal set, all these terms are 0 except one. So, <y, v_p> equals α_p ||v_p||², which means α_p is nothing but <y, v_p>/||v_p||².
So, what is α_1? You replace p by 1. What is α_2? You replace p by 2, and similarly for the other α's. So, what can we say about y? This y can be written as the summation for p from 1 to k of α_p v_p. But α_p is <y, v_p>/||v_p||², so y = summation for p from 1 to k of (<y, v_p>/||v_p||²) v_p. Hence we obtain the stated result; it is the same result with index p instead of i, which makes no difference.
Now, in this problem you see in the inner product space R3 with the standard inner
product, this set is an orthogonal set. You can easily verify, you see you take the inner
product of any two distinct vectors of this set you will find that it is 0. So, this set is an
orthogonal set.
Now, how can we find its equivalent orthonormal set? I have already discussed that in order to form an orthonormal set from an orthogonal set, we simply divide each vector by its norm. So, what are the norms? The norm of the first vector is √2, of the second vector √3, and of the third vector √6.
(Refer Slide Time: 19:24)
So, what is an equivalent orthonormal set? It is, say, S' = { (1, 1, 0)/√2, (1, -1, 1)/√3, (-1, 1, 2)/√6 }. So, this is an equivalent orthonormal set.
Now, the next problem: we have to express (2, 1, 3) as a linear combination of the vectors of S. How can we do that? We can use the previous result: S is an orthogonal set, and if y belongs to the span of S, then y has the expression given above. Here, because this is an orthogonal set, all its vectors are linearly independent, so it is a basis of R³ (any three linearly independent vectors of R³ form a basis of R³). Since S is a basis of R³, the vector (2, 1, 3) belongs to the span of the three vectors v_1, v_2 and v_3.
So, we can use the previous result here and write y as the summation for i from 1 to 3, because there are three vectors, of (<y, v_i>/||v_i||²) v_i, where y = (2, 1, 3). Now vary i: v_1 = (1, 1, 0), v_2 = (1, -1, 1), v_3 = (-1, 1, 2); you can simply substitute into that expression to express (2, 1, 3) as a linear combination of the vectors of S.
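A short NumPy check of this expansion (a sketch; it just evaluates the coefficient formula α_i = <y, v_i>/||v_i||² for this example):

```python
import numpy as np

v = [np.array([1., 1., 0.]),
     np.array([1., -1., 1.]),
     np.array([-1., 1., 2.])]   # orthogonal set S
y = np.array([2., 1., 3.])

coeffs = [(y @ vi) / (vi @ vi) for vi in v]    # alpha_i = <y, v_i> / ||v_i||^2
print(coeffs)                                  # [1.5, 1.333..., 0.833...]
print(sum(c * vi for c, vi in zip(coeffs, v))) # reconstructs [2. 1. 3.]
```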
(Refer Slide Time: 22:04)
Now, the next result: suppose (w_1, w_2, ....., w_r) forms an orthogonal set of nonzero vectors in V. If v is any vector in V and we define v' = v - summation for i from 1 to r of c_i w_i, where c_i = <v, w_i>/<w_i, w_i>, then this v' is always orthogonal to each w_i; again, it is easy to show. So, v' is v minus the sum of the c_i w_i up to r, with c_i equal to the inner product of v with w_i divided by the norm of w_i squared, and we have to show that v' is orthogonal to (w_1, w_2, ....., w_r); that means we have to show that the inner product <v', w_i> = 0 for all i.
So, take the inner product of v' with w_i: <v', w_i> = <v - summation for j from 1 to r of c_j w_j, w_i> = <v, w_i> - <c_1 w_1 + c_2 w_2 + ..... + c_r w_r, w_i>. Now, since (w_1, w_2, ....., w_r) is an orthogonal set, <w_j, w_i> = 0 whenever j ≠ i; so when you expand, the only term that survives is c_i <w_i, w_i>, and all the others are 0. Therefore <v', w_i> = <v, w_i> - c_i <w_i, w_i> = <v, w_i> - (<v, w_i>/<w_i, w_i>)·<w_i, w_i> = 0.
Now, we come to an important process to orthogonalize a given set of vectors. Suppose a set of vectors (v_1, v_2, ....., v_n) is linearly independent, and you want to find an orthogonal (or orthonormal) set of vectors (u_1, u_2, ....., u_n) such that the span of (v_1, v_2, ....., v_i) = the span of (u_1, u_2, ....., u_i) for i varying from 1 to n. This result is called the Gram-Schmidt orthogonalization process.
Now, what is the proof of this? We can obtain the proof using mathematical induction. We have to construct a set of vectors (u_1, u_2, ....., u_n) such that this set is an orthonormal set and the span of (u_1, u_2, ....., u_i) = the span of (v_1, v_2, ....., v_i) for each i, with i varying from 1 to n; this is what we have to show.
So, in the mathematical induction, we first take i = 1, and then we assume the result holds for i - 1 and try to show that it also holds for i.
Now, for i = 1 we can easily see what to do: you can always set u_1 = v_1/||v_1||. Recall that we have to construct u_1, u_2, ....., u_n such that the set is orthonormal and the span of (v_1, v_2, ....., v_i) equals the span of (u_1, u_2, ....., u_i) for each i. So, first take i = 1 and set u_1 as above. Of course, you can easily verify that the norm of u_1 is 1: ||u_1|| = || v_1/||v_1|| ||, and since 1/||v_1|| is a scalar quantity it can be taken out, so ||u_1|| = (1/||v_1||)·||v_1|| = 1. Of course, u_1 is well defined, because v_1 ≠ 0: it is given that (v_1, v_2, ....., v_n) is a linearly independent set, and if any vector were 0, the set would become linearly dependent (LD). So, each v_i ≠ 0 and this expression is well defined. And now, since v_1 = ||v_1|| u_1, we can easily say that the span of v_1 is equal to the span of u_1, because v_1 is a scalar multiple of u_1.
So, whatever space v_1 spans, u_1 spans the same space. So, we have proved the result for i = 1. Now, we assume that it holds for i - 1, and we have to show that it also holds for i.
So, I have assumed that we have constructed an orthonormal set of i-1 vectors. (u 1 ,
u 2 ,.....u i-1 ) such that the span of these i-1 v i ’s is equal to a span of these i-1 u i ’s.
Now, first of all, since these vectors are linearly independent, v_i cannot be expressed as a linear combination of v_1, ....., v_{i-1}. If v_i belonged to the span of these i-1 vectors, there would exist a vector in the set expressible as a linear combination of the others, and the set would be linearly dependent. So, of course, v_i does not belong to the span of (u_1, ....., u_{i-1}); that is point number one.
Now, we define w_i like this: w_i = v_i - <v_i, u_1> u_1 - <v_i, u_2> u_2 - ..... - <v_i, u_{i-1}> u_{i-1}, because we are working with the i-1 vectors u_1, ....., u_{i-1}. So, we ourselves have constructed a w_i satisfying this expression.
Now, we set u_i = w_i/||w_i||. Again, ||u_i|| = 1 for each i, which is easy to verify, and again this expression is well defined because w_i ≠ 0: if w_i = 0, then v_i would be a linear combination of u_1, ....., u_{i-1}, and a linear combination of these i-1 u's is the same as a linear combination of v_1, ....., v_{i-1}; that would make the set (v_1, v_2, ....., v_n) linearly dependent, which is not possible. Hence w_i is not equal to 0.
Now, we have to show that this set is an orthonormal set; basically we have to show that this u_i is orthogonal to every u_k, where k varies from 1 to i-1. This is very easy to show. What is u_i? It is w_i/||w_i||, as we have already discussed, and 1/||w_i|| is a scalar, so it can be taken out of the inner product by the definition of the inner product. Then we substitute the expression for w_i given above into <w_i, u_k>.
Now, (u_1, u_2, ....., u_{i-1}) is an orthonormal set by our induction assumption. So, when we expand <w_i, u_k> = <v_i, u_k> - summation for j from 1 to i-1 of <v_i, u_j><u_j, u_k>, the term <v_i, u_k> remains as it is, and <u_j, u_k> is 0 for every j other than j = k. Only the term with j = k survives, and for j = k the inner product of u_k with u_k is ||u_k||² = 1, since the set is orthonormal; so that term is just <v_i, u_k>. Therefore <w_i, u_k> = <v_i, u_k> - <v_i, u_k>, and the two terms cancel out; it is 0.
So, we have shown that the set of i vectors (u_1, u_2, ....., u_i) is an orthonormal set, by the way we have constructed it, and also that v_i belongs to the span of (u_1, ....., u_i), as we have seen from above. So, the two spans coincide, and hence by induction we can say that the span of (v_1, ....., v_i) equals the span of (u_1, ....., u_i) for all i. So, we have proved the Gram-Schmidt process.
So, basically, what happens in the Gram-Schmidt process is that if you have any linearly independent set of vectors, say (u_1, u_2, ....., u_k), then you can always find an orthogonal or orthonormal set of vectors (v_1, v_2, ....., v_k) such that the span of the first i u's equals the span of the first i v's for each i. This is the main idea of the Gram-Schmidt process.
Now, how can we apply the Gram-Schmidt process in various problems? Say (u_1, u_2, ....., u_k), which is linearly independent, is given to you, and you want to construct an orthogonal set of vectors from it using the Gram-Schmidt process. Suppose that set of vectors is (v_1, v_2, ....., v_k); of course, it has the same number of elements.
So, we have to construct this orthogonal set. If you obtain an orthogonal set, then an orthonormal set can be found by dividing each vector by its norm, because all the vectors will definitely be nonzero, since the given vectors are linearly independent. So, v_1, v_2, ..., v_k can be computed using the Gram-Schmidt process as shown in (Refer Slide Time: 35:20): v_1 = u_1, and for j > 1, v_j = u_j - summation for i from 1 to j-1 of (<u_j, v_i>/||v_i||²) v_i.
So, in this way we can construct an orthogonal set of vectors, and if we further want an orthonormal set, we can divide each vector by its norm.
(Refer Slide Time: 38:02)
Now, here are some results that follow from this. If V is a finite dimensional inner product space, then it always has an orthonormal basis; this is very easy to show. If we have a finite dimensional vector space, then it has a finite basis, say (u_1, u_2, ....., u_n) if it is an n dimensional vector space, and using the Gram-Schmidt process we can always find an equivalent orthonormal set of vectors. So, that means there exists an orthonormal basis of a finite dimensional inner product space.
The next result is: if V is a finite dimensional inner product space, then any orthonormal set of vectors in V can be extended to form an orthonormal basis of V. Given an orthonormal set, we can always extend it to a basis of V; say the dimension of V is m+p. Then, using the Gram-Schmidt orthogonalization process and the first corollary, we can always find an equivalent orthonormal basis. So, (u_1, u_2, ....., u_m, u_{m+1}, ...., u_{m+p}) will be the equivalent orthonormal basis of that finite dimensional vector space. These are a few examples.
(Refer Slide Time: 39:53)
Based on this let us discuss the first one, using that, second one can be obtained
similarly. So, what basically Gram Schmidt process is? You have given a set of linearly
independent vectors and how to find an equivalent orthogonal or orthonormal set of
vectors from it such that the span of(u 1 , u 2 ,.....u i ) = span of (v 1 , v 2 ,.....v i ) for each i.
That means suppose you are having a set of vector say the set of vectors are (u 1 ,
u 2 ,.....u k ), these are linearly independent and you want to construct an equivalent
orthogonal set of vectors from this using Gram Schmidt orthogonalization process. So,
suppose it is (v 1 , v 2 ,.....v k ), number of elements will be same of course, so, that is
orthogonal. These orthogonal set of vectors we want to construct from this.
So, we can obtain it using the Gram-Schmidt orthogonalization process as explained in (Refer Slide Time: 40:31). This is how we can easily find a set of orthogonal vectors, and once we obtain an orthogonal set from the linearly independent set using the Gram-Schmidt process, an orthonormal set can be found by dividing each vector by its norm.
So, now, let us discuss one example based on this. Before discussing the examples, let us note these two results. If V is a finite dimensional inner product space, then V has an orthonormal basis: it has a finite basis, and using the Gram-Schmidt process you can always find its equivalent orthogonal or orthonormal set, which is of course a basis, because the spans on both sides are equal by the Gram-Schmidt process and the set is linearly independent. So, by the Gram-Schmidt process we can always find an orthonormal basis of any finite dimensional inner product space V.
Now, if V is a finite dimensional inner product space, then any orthonormal set of vectors (u_1, u_2, ...., u_m) can be extended to form an orthonormal basis of V. This is also very easy to show: we have an orthonormal set of vectors (u_1, u_2, ...., u_m), and we can always extend any such set to form a basis of the finite dimensional vector space V. Suppose we extend it by p vectors, so the dimension of the vector space V is m+p; now, using the Gram-Schmidt orthogonalization process, we can always convert this set of vectors into its equivalent orthonormal set of vectors. Hence we have extended the orthonormal set to an orthonormal basis of the vector space V.
Now, here are some problems based on this; there are two problems, and both can be handled easily. Let us discuss the first problem; the second can be solved on the same lines, as only the definition of the inner product is different, the other things being the same.
You see, in this example we have the set of vectors u_1 = (1, 0, 1, 0), u_2 = (1, 1, 1, 1) and u_3 = (0, 1, 2, 1). So, v_1 = u_1 by the Gram-Schmidt process, and v_2 = u_2 - (<u_2, v_1>/||v_1||²)·v_1, which is (1, 1, 1, 1) - (2/2)(1, 0, 1, 0) = (0, 1, 0, 1).
Now, if you take the inner product of this with v_1 (v_1 is u_1), it is 0 + 0 + 0 + 0, which is 0; that means we are going in the right direction. How do we find v_3 now? v_3 = u_3 - (<u_3, v_1>/||v_1||²)·v_1 - (<u_3, v_2>/||v_2||²)·v_2, which gives v_3 = (-1, 0, 1, 0). Now, the inner product of v_3 with v_1 is -1 + 0 + 1 + 0 = 0, and the inner product of v_3 with v_2 is also 0.
Now, this set of vectors (v_1, v_2, v_3) is an orthogonal set, and if you want to find the corresponding orthonormal set, divide each vector by its norm. The norm of v_1 is √2, the norm of v_2 is √2 and the norm of v_3 is √2. So, divide each vector by its norm and you will get an equivalent orthonormal set of vectors. Similarly, we can solve the other example.
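The same computation can be scripted (a minimal Gram-Schmidt sketch in NumPy; it reproduces v_1, v_2, v_3 above and then normalizes them):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors."""
    basis = []
    for u in vectors:
        w = u.astype(float)
        for v in basis:
            w -= (u @ v) / (v @ v) * v   # subtract the projection of u onto v
        basis.append(w)
    return basis

u = [np.array([1., 0., 1., 0.]),
     np.array([1., 1., 1., 1.]),
     np.array([0., 1., 2., 1.])]

v = gram_schmidt(u)
print(v)                                       # [1 0 1 0], [0 1 0 1], [-1 0 1 0]
print([vi / np.linalg.norm(vi) for vi in v])   # the corresponding orthonormal set
```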
So, hence we have seen in this lecture that, given a set of linearly independent vectors, how we can find its equivalent orthogonal or orthonormal set of vectors.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 21
Normal Matrices
Hello friends, welcome to the 21st lecture of this course. In this lecture I will introduce you to Normal Matrices. In the last unit of this course you learned about the diagonalization of a matrix, and we have seen that if a square matrix has a complete set of linearly independent eigenvectors, then the matrix is diagonalizable; that is, if the size of the matrix is n×n and you get n linearly independent eigenvectors for that matrix, then the matrix is diagonalizable.
There may be the case that the matrix has n distinct eigenvalues; then corresponding to each eigenvalue you get a linearly independent eigenvector, and hence the matrix is diagonalizable. In the other case, when the algebraic multiplicity of some eigenvalue is more than 1, the matrix is still diagonalizable if the geometric multiplicity equals the algebraic multiplicity for each eigenvalue. However, in this lecture, we will see something more than linear independence, and that is an orthogonal set of eigenvectors.
So, linear independence does not guarantee an orthonormal set of eigenvectors. That means there is no assurance that the matrix P, which is called the modal matrix and whose columns are the eigenvectors of the matrix A, can be taken to be unitary (if it has complex entries) or orthogonal (if it has real entries).
So, we have n linearly independent eigenvectors and from them we write P, but we have no assurance about the orthogonal or orthonormal property of those eigenvectors. We have learned the Gram-Schmidt process, with which we can convert a set of n linearly independent vectors into n orthogonal or n orthonormal vectors: if you have n linearly independent vectors and apply the Gram-Schmidt process, you get a set of n orthonormal vectors.
But there is no guarantee that if the original set consists of n linearly independent eigenvectors, then the resulting set will consist of n orthonormal eigenvectors. So, here Gram-Schmidt also does not help. In this lecture we will learn about a special class of matrices, those that are unitarily similar to a diagonal matrix. You know that if I have a matrix A and it is diagonalizable, then I can find a modal matrix P such that A = P D P⁻¹, where P comes from the eigenvectors of A and D is a diagonal matrix. Unitarily similar means that the matrix P is a unitary matrix.
So now, our first definition is unitary diagonalization. Consider a matrix A of size n×n having complex entries. A is unitarily similar to a diagonal matrix, meaning A has a complete set of orthonormal eigenvectors, if and only if A*A = AA*. In this case, A is said to be a normal matrix.
So, what I want to say that a matrix is normal if and only if it is unitarily similar to a
diagonal matrix. Means U*AU=D, where U is a unitary matrix having columns as
eigenvectors of A, where D is a diagonal matrix, if U*AU=D, then A will be a normal
matrix; that is, AA* = A*A. And if A is a normal matrix, then A will be unitarily similar
to a diagonal matrix; let us try to prove it.
So, here we want to prove that AA* = A*A, that is, A is normal; that is the most popular definition of a normal matrix, though there are alternative definitions also. We prove that A is normal if and only if A = U D U*, where D is a diagonal matrix consisting of the eigenvalues of A and U is a unitary matrix. So, let us first assume that A is unitarily similar to a diagonal matrix, that is, A = U D U*.
Since we are talking about complex matrices, instead of transpose I will write star; let me repeat (Refer Time: 07:09) that star stands for the conjugate transpose. Then if I look at A·A*, it becomes (U D U*)(U D U*)* = (U D U*)(U D* U*). Since U is a unitary matrix, U*U is the identity, so this equals U D D* U*.
Now if I look at A*A, it becomes (U D U*)*(U D U*) = U D* U* U D U*; U*U is the identity, so it is U D* D U*. So, AA* = A*A provided D D* = D* D. And since D is a diagonal matrix, D D* equals D* D, and this implies A·A* = A*A.
So, this is the proof of the first part: if A is unitarily similar to a diagonal matrix, then A is a normal matrix. Now, consider the other part: here we assume that A is normal.
And we need to prove that the matrix A is unitarily similar to a diagonal matrix; that is, A = U D U*, where U is a unitary matrix. Since A is normal, AA* = A*A; let us call this relation number 1. Now, from the Schur decomposition lemma, we know that any square matrix of order n can be decomposed as the product of three matrices U M U⁻¹, where U is a unitary matrix. So, A can be written as U M U*, where M is an upper triangular matrix, and from here I can write M = U*AU.
Now, if I calculate MM*, where M* is the conjugate transpose of M, it becomes (U*AU)(U*AU)* = U*AU U*A*U = U*AA*U, since UU* is the identity; again, the star is there because we are talking about complex matrices. Similarly, M*M = (U*AU)*(U*AU) = U*A*U U*AU = U*A*AU, as UU* becomes the identity. Now, since we have assumed that A is normal, these two expressions are equal, so MM* = M*M; let me write this as relation number 2. Now, the point is that we need to prove that A is unitarily similar to a diagonal matrix; here we have seen that A is unitarily similar to an upper triangular matrix M. So, the only thing we need to prove is that this upper triangular matrix M is in fact a diagonal matrix.
Now, let M be an upper triangular matrix. If it is a 3×3 matrix, I can take M as [m11 m12 m13; 0 m22 m23; 0 0 m33]. Then M* will become [conj(m11) 0 0; conj(m12) conj(m22) 0; conj(m13) conj(m23) conj(m33)], the transpose with the conjugate of each element. So, these are the matrices M and M*. Now, if I try to calculate the upper left element of MM*, it comes out to be m11.conj(m11) + m12.conj(m12) and so on; that is, for an n×n upper triangular M, the upper left element will be |m11|^2 + |m12|^2 + ... + |m1n|^2. While if I calculate the upper left element of M*M, it comes out to be just |m11|^2. Now what is happening? Since MM* = M*M, these 2 elements should be equal, and this gives me m12 = 0 = m13 = ..... = m1n. That is, the first row of the matrix M has 0 entries except the first (diagonal) element. Similarly, if I compare the diagonal element of the second row, it will give me that the elements m2j, for j from 3 up to n, are 0. In this way I can say that M is a diagonal matrix, and hence A is unitarily similar to a diagonal matrix D, which is nothing but M. So, in this way we can prove this result, ok.
So, after this result let me give you a few examples of normal matrices. All symmetric matrices are normal. Then all skew symmetric matrices are also normal, all Hermitian matrices are normal, all skew Hermitian matrices are normal, and in fact all orthogonal matrices are normal. And if I talk about a complex vector space, then all unitary matrices are normal. Apart from these examples, some other matrices like [1 -1; 1 1] are also normal. There are matrices that are not normal as well, as we will see.
Now, again, an important result for normal matrices: a matrix A of size n×n having complex entries is normal if and only if every matrix unitarily similar to A is normal. The proof of this particular theorem is quite simple, let me explain. Suppose A is normal, meaning A*A = AA*, and B is unitarily similar to A; that is, B = U*AU where U is a unitary matrix. Now, if I calculate B*B, it comes out to be (U*AU)*(U*AU).
After calculating, it comes out to be U*A*U U*AU; UU* will become I, so I can write this expression as U*A*AU. Since A is normal, this equals U*AA*U. And since U is a unitary matrix, I can insert between A and A* an identity matrix, which I can write as the product UU*, so the expression becomes U*AUU*A*U, which is nothing but BB*. So, what have we done? B*B = BB*, which means B is a normal matrix.
Now, if we prove this the other way, assume B is normal; then BB* = B*B, that is, U*AUU*A*U = U*A*UU*AU, which gives U*AA*U = U*A*AU. From here we can see that AA* = A*A, which implies that A is normal. So, this is the proof of this theorem.
Another important property of normal matrices: a square matrix A of order n having complex entries is normal if and only if, for every vector x from the n dimensional complex vector space, the Euclidean norms satisfy ||Ax||_2 = ||A*x||_2.
So, the proof is quite simple from the definition of the inner product: ||Ax||^2 = <Ax, Ax>, and this can be written as x*A*Ax. Since A is normal, A*A can be written as AA*, so this becomes x*AA*x = <A*x, A*x>, which is nothing but ||A*x||^2; a very simple one-line proof. Now, come to another important result, called the spectral theorem.
(Refer Slide Time: 21:07)
Spectral Theorem: Given a square matrix A of order n having complex entries, the following statements are equivalent. The first one is, A is a normal matrix. The second is, A is unitarily similar to a diagonal matrix; in other words, A is unitarily diagonalizable. The third is, the sum of the squared moduli of the entries of A equals the sum of the squared moduli of the eigenvalues of A; we take the modulus because A may contain complex entries.
That is, the sum over i, j of |a_ij|^2 equals the sum over i = 1 to n of |λ_i|^2, where λ1, λ2, λ3, ...... λn are the eigenvalues of the matrix A. The fourth statement is that there is an orthonormal set of n eigenvectors of A; means, A has a complete set of orthonormal eigenvectors, that is, the corresponding eigenvector matrix is unitary. Some of these we have already proved: assuming A is normal, I have shown you that A is unitarily similar to a diagonal matrix.
So, the equivalence of the first and the second we have already done; second to fourth is obvious, fourth to second is obvious, and since first implies second, we can go from second to fourth. Similarly, we can reach the third. So, this theorem I am stating without a full proof, although some parts of the proof we have done in the previous slides.
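As a small numerical illustration of these equivalences (this is only a sketch of my own; the example matrix and the use of SciPy's complex Schur decomposition are not part of the lecture), one can check that a normal matrix is unitarily diagonalizable and that the sum of squared entry moduli matches the sum of squared eigenvalue moduli:

```python
import numpy as np
from scipy.linalg import schur

# A normal (in fact orthogonal) matrix chosen for illustration: a rotation by 45 degrees.
A = np.array([[np.cos(np.pi/4), -np.sin(np.pi/4)],
              [np.sin(np.pi/4),  np.cos(np.pi/4)]], dtype=complex)

print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal

# Complex Schur form: A = Z T Z*, with Z unitary and T upper triangular.
T, Z = schur(A, output='complex')
print(np.allclose(T, np.diag(np.diag(T))))           # True: T is diagonal, so A is unitarily diagonalizable

# Third statement of the spectral theorem: sum |a_ij|^2 = sum |lambda_i|^2
eigvals = np.linalg.eigvals(A)
print(np.isclose(np.sum(np.abs(A)**2), np.sum(np.abs(eigvals)**2)))  # True
```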
Now, let us take some numerical examples. So, the first example: is A, given by [5+i -2i; 2 4+2i], a normal matrix or not? Let me do it.
(Refer Slide Time: 23:13)
So, A is given as [5+i -2i; 2 4+2i]. We have to check whether this matrix is normal or not. First of all, we will calculate the conjugate transpose of A, that is A*: conjugate each entry and transpose, so A* = [5-i 2; 2i 4-2i]. Let me calculate AA*. So, AA* = [5+i -2i; 2 4+2i][5-i 2; 2i 4-2i]. The upper left entry comes out to be (5+i)(5-i) + (-2i)(2i) = 26 + 4 = 30. Similarly, the first row multiplied with the second column gives 10 + 2i - 8i + 4i^2 = 6 - 6i = 6(1-i). The second row multiplied with the first column gives 10 - 2i + 8i + 4i^2 = 6 + 6i = 6(1+i). And the last entry, the second row with the second column, becomes 2.2 + 16 - 4i^2 = 24. So, AA* = [30 6(1-i); 6(1+i) 24]. Similarly, we will calculate A*A.
(Refer Slide Time: 25:53)
So, A*A will become [30 6(1-i); 6(1+i) 24]. So, what we are observing here is that AA* = A*A.
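This elementwise computation is easy to double-check numerically; the following is a small sketch (the use of NumPy here is my own illustration, not part of the lecture) that recomputes AA* and A*A for this matrix:

```python
import numpy as np

A = np.array([[5 + 1j, -2j],
              [2,       4 + 2j]])

A_star = A.conj().T          # conjugate transpose A*
AAs = A @ A_star             # AA*
AsA = A_star @ A             # A*A

print(AAs)                   # entries 30, 6-6i, 6+6i, 24
print(np.allclose(AAs, AsA)) # True, so A is normal
```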
This means A is normal. This is one way of checking whether a given matrix is normal or not. Now, suppose someone asks: give examples of two distinct classes of normal matrices that are real, but not symmetric.
(Refer Slide Time: 27:32)
So, I have told you in the beginning itself, in the second slide, that such classes of matrices are the skew symmetric matrices, that is, A = -AT, and the orthogonal matrices, that is, AAT = ATA = I.
Another important result related to normal matrices is the Cayley transformation, and it says that if A is skew Hermitian or real skew symmetric, then the matrix function defined as (I-A)(I+A)^-1 = (I+A)^-1(I-A) is unitary; or, if A is real skew symmetric, then it is orthogonal.
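A quick numerical sanity check of this statement (a sketch only; the random skew-symmetric matrix below is my own choice, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                          # a real skew-symmetric matrix, A = -A^T

I = np.eye(4)
B = (I - A) @ np.linalg.inv(I + A)   # Cayley transform of A

print(np.allclose(B.T @ B, I))       # True: B is orthogonal (unitary in the real case)
```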
(Refer Slide Time: 28:31)
So, what do I need to do? I am having a function of the matrix A, and it is defined as f(A) = (I-A)(I+A)^-1 = (I+A)^-1(I-A). And what I need to prove is that this particular function f, which is a matrix of the same size as A, is unitary or orthogonal according to whether the given matrix A is skew Hermitian or real skew symmetric. So, let us prove it; let A be skew Hermitian. Then what are we having? A* = -A.
Now, if I look at the matrices I-A and I+A: since A is a skew Hermitian matrix, the eigenvalues of I-A or I+A will be of the form 1 ± some purely imaginary number. Because the eigenvalues of a skew Hermitian matrix are purely imaginary or 0, the eigenvalues of I±A will be either 1 or 1 ± some imaginary number, and in particular never 0. This gives that I-A and I+A are invertible; their inverses are well defined, so the above function f is well defined.
Now let us assume B = (I+A)(I-A)^-1. Then, if I calculate B*, using the rule that the conjugate transpose of a product reverses the order, B* = ((I-A)^-1)*(I+A)*. Interchanging the inverse and the star, this is (I-A*)^-1(I+A*). Now, A* = -A, so I-A* becomes I+A and I+A* becomes I-A. So, B* = (I+A)^-1(I-A). So, what am I having? That the two matrices (I-A)(I+A)^-1 and (I+A)(I-A)^-1 are the conjugate transposes of each other.
Now, if I take the product BB*, it comes out to be (I+A)(I-A)^-1, which is my B, times (I-A)(I+A)^-1, which is B*. In the middle, (I-A)^-1(I-A) becomes the identity, and then (I+A)(I+A)^-1 is also the identity, so BB* = I. Similarly, I can show that B*B = I; it is not very difficult, because B* and B are given like this.
So, I can always work this out. Hence BB* = B*B = I, which means B is a unitary matrix, which is what we needed to prove. So, this is the proof of the Cayley transformation. Now, the next result is that a triangular matrix is normal if and only if it is diagonal. And that we have already proven at the very beginning, where, using the Schur decomposition lemma and a matrix M, I have shown that MM* = M*M only when M is diagonal. The same proof, the same concept, can be used here.
These are a few references for this lecture. So, in this lecture we have learned about normal matrices; in the next lecture we will learn about another very important class of matrices, those called positive definite matrices. So, with this I close this lecture.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 22
Positive Definite Matrices
Hello friends, so, welcome to the 22nd lecture of this course. In the last lecture, we have learnt a few results about normal matrices. In this lecture, we will discuss another important class of matrices, those called Positive Definite Matrices. As we know, if we are having a symmetric matrix, then all the eigenvalues are real, and we have seen the proof of this in the previous unit.
However, what is additional in this lecture is that for this class the eigenvalues are not only real, but also positive, or at least non-negative.
So, a square matrix A is called positive definite if it is symmetric, and for every non-zero vector x, the scalar xTAx is positive. So, whatever non-zero vector x you take, xTAx should come out to be positive. If this is true, then we say the square matrix A is a positive definite matrix.
In short I am calling it PD, P for positive and D for definite. The same definition in other words can be written like this: xTAx ≥ 0 for all x, and this expression xTAx will be 0 only if x = 0. So, this is the definition of a positive definite matrix. A square matrix A is positive semi definite, or PSD in short, if it is symmetric and xTAx ≥ 0 for all non-zero x. So, the only difference between positive definite and positive semi definite is that there it is strictly greater than 0 for non-zero x, while here it may also be 0. The trivial example of a positive definite matrix is the identity matrix.
So, the next result: a symmetric matrix A whose eigenvalues are all positive is positive definite, and one whose eigenvalues are non-negative is positive semi definite. So, here what do we need to prove?
We are having a matrix A, let us say an n×n matrix, and we have to prove that if all the eigenvalues of A are positive, then it is a positive definite matrix, and if all the eigenvalues of A are non-negative, then it is a positive semi definite matrix. So, let us take an eigenvalue λ; let λ be an eigenvalue of A and x be the corresponding eigenvector.
So, here λ is an eigenvalue and the corresponding eigenvector is x; it means Ax = λx. Now, multiply both sides by xT. So, xTAx will become λ xTx, or I can write λ = xTAx/xTx. Now, if λ is positive, this means xTAx/xTx is positive. Since the denominator is always positive, it means xTAx is positive; this means A is a positive definite matrix. Similarly, if λ ≥ 0, this means xTAx ≥ 0 for all such x in Rn. And please note here that x is an eigenvector, so these are non-zero vectors. So, in this way we can see that if all the eigenvalues of a matrix are positive, then the matrix is positive definite, and if all the eigenvalues of a matrix are non-negative, then the matrix is positive semi definite.
An alternate definition for positive definiteness can be given like this: a real symmetric matrix A is positive semi definite if and only if A can be factored as A = BTB. Further, A is positive definite if B is nonsingular.
So, the proof can be seen like this. A is positive semi definite if and only if A can be written as BTB. Since A is a symmetric matrix, we can always find an orthogonal matrix P such that A = P D PT; that is, since A is symmetric, it is always diagonalizable. Now, what I need to show is that A can be written as a product of two factors, that is BTB. Since A is positive semi definite, the eigenvalues of A will be non-negative, and hence all the diagonal entries of D will be non-negative. So, I can write A = P D1/2 D1/2 PT, where D1/2 is the diagonal matrix whose entries are the square roots of the entries of the original D.
So, if I choose B as D1/2PT, then from here I can write A = BTB. If B is nonsingular, then all the diagonal entries of D1/2 will be strictly positive, and hence A is positive definite in this case. So, in this way we can prove this result. There are a few more tests to check whether a given matrix is positive definite or not.
And one of the tests is based on the leading principal minors. A real symmetric matrix A is positive definite if the leading principal minors of A are positive, or equivalently all principal minors of A are positive; meaning, they have positive determinants.
(Refer Slide Time: 09:28)
Let us check all these results for a given matrix. So, check the following matrix for positive definiteness: A = [2 -1 0; -1 2 -1; 0 -1 2]. The matrix A is given, and we have to check whether it is positive definite or not. There are different tests; let us take the first definition. Let x in R3 be a nonzero vector, that is, x = [x1 x2 x3]T. So now, xTAx will become [x1 x2 x3][2 -1 0; -1 2 -1; 0 -1 2][x1 x2 x3]T. This comes out to be 2x1^2 - 2x1x2 + 2x2^2 - 2x2x3 + 2x3^2. This I can write, by completing squares, as 2(x1 - (1/2)x2)^2 + (3/2)(x2 - (2/3)x3)^2 + (4/3)x3^2. So, the above expression can be written in this way, and you can see that the final expression is a sum of square terms. So, this will always be positive for nonzero x. Hence A is positive definite, because xTAx is always positive for a non-zero vector x. This is one of the ways of checking positive definiteness.
Another way is by calculating the eigenvalues of the matrix A. If we compute the eigenvalues of this matrix A, they come out to be λ = 2, 2 ± (2)1/2. So, the eigenvalues are λ = 2, λ = 2+(2)1/2 and λ = 2-(2)1/2. If we check, each of these is positive. So, all the eigenvalues of A are positive, and this implies that A is positive definite.
(Refer Slide Time: 13:50)
Another test is by checking the leading principal minors. The matrix is [2 -1 0; -1 2 -1; 0 -1 2]. Let us take the first minor, A1, which is 2 and greater than 0. Now check the 2×2 minor: this is the determinant |2 -1; -1 2|, which comes out to be 3, greater than 0. Now check the determinant of A, that is the 3×3 minor; this comes out to be 6 minus 2, equal to 4, which is also greater than 0. So, all the leading principal minors are positive, and hence the matrix A is positive definite. So, in this way I have told you 3 different methods to check whether a given matrix is positive definite or not. What are those? The first one: just take a non-zero vector x and find xTAx; if you can write that expression as a sum of perfect squares, then the given matrix is positive definite. Number 2: find the eigenvalues; if all the eigenvalues are positive, then the given matrix is positive definite. If all the eigenvalues are non-negative, then the matrix is positive semi definite, meaning some are positive and some are 0. The third is the minor test: if all the leading principal minors are positive, then the matrix is positive definite.
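A small numerical illustration of tests 2 and 3 (a sketch with NumPy; this is my own addition, not part of the lecture material):

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Test 2: eigenvalues of the symmetric matrix (all should be > 0)
print(np.linalg.eigvalsh(A))           # approximately [0.586, 2.0, 3.414] = 2 - sqrt(2), 2, 2 + sqrt(2)

# Test 3: leading principal minors (all should be > 0)
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(minors)                          # approximately [2.0, 3.0, 4.0]
```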
(Refer Slide Time: 16:20)
In the same manner we can define the negative definite matrix: a square matrix A is negative definite, in short I will say ND, if it is symmetric and xTAx < 0 for all non-zero x. The equivalent conditions are: A has only negative eigenvalues; the determinants of A's leading principal sub-matrices alternate in sign, negative for odd order and positive for even order; or A has negative pivots. These are the equivalent conditions.
These are like the tests we have seen for the previous example, just the same tests but in the negative sense, to prove that a given matrix is negative definite. In the same way we can define negative semi definite: a symmetric matrix is negative semi definite if xTAx ≤ 0 for all x that are not 0. Then the matrix is called negative semi definite; means, if a matrix has only eigenvalues that are 0 or negative, then the matrix is negative semi definite.
(Refer Slide Time: 18:01)
Further, a quadratic form is said to be positive definite or positive semi definite whenever the matrix A is positive definite or positive semi definite; this is the real quadratic form. In a similar manner we can define the complex quadratic form. Let x be a complex vector having n components, and A an n×n matrix having complex entries with A Hermitian. Then the expression f(x) = x*Ax, which comes out to be the double sum of a_ij times conj(x_i) times x_j over i and j, is called a complex quadratic form.
For example, suppose a matrix is given to you and someone asks you to find its quadratic form. Let A be given as [3 -1 2; -1 2 0; 2 0 1]; find the quadratic form of A. Let [x1 x2 x3]T be a vector in R3; then the quadratic form of A is defined as xTAx, which is [x1 x2 x3][3 -1 2; -1 2 0; 2 0 1][x1 x2 x3]T. This comes out to be 3x1^2 + 2x2^2 + x3^2 - 2x1x2 + 4x1x3. The coefficient of x2x3 is 0, as is the coefficient of x3x2. So, this is the required quadratic form of the given matrix. If someone asks you to check whether this quadratic form is positive definite, positive semi definite, negative definite or negative semi definite, what you have to do is just check the matrix A.
Then we have learnt a few tests for how to check whether a given matrix is positive definite or not, and apart from that we have learnt the quadratic form associated with a matrix. In the next lecture, we will learn some other important properties of quadratic forms. We will learn the diagonalization of the quadratic form, and we will see some applications of the quadratic form, in particular drawing a given quadratic curve with the help of the quadratic form. With this I will end this lecture.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 23
Positive Definite and Quadratic Forms
Hello friends. So, welcome to the 23rd lecture of this course. As you know, in the last lecture we have discussed the definition of positive definite matrices, and then we have discussed a few methods for how to check whether a given matrix is positive definite or not. We have also learnt the properties of eigenvalues of positive definite matrices. In this lecture we will extend the same concept and see a few applications of positive definite matrices, in particular when you are having a positive definite quadratic form.
As you know that in the last lecture, I have defined a positive definite quadratic form as
q(x) = xTAx, where A is a symmetric matrix of order n. And x is a vector from the n
dimensional vector space.
(Refer Slide Time: 02:20)
For example, consider x in R3 and let A be a 3×3 matrix. If you take A = [3 0 0; 0 5 0; 0 0 1], then the quadratic form q(x), which is given as xT.A.x = [x1 x2 x3][3 0 0; 0 5 0; 0 0 1][x1 x2 x3]T, comes out to be 3x1^2 + 5x2^2 + x3^2. So, this particular quadratic form is a diagonal quadratic form. In other words, a quadratic form is said to be a diagonal quadratic form if there is no cross term or cross product term. Here, by a cross product term I mean a term like x1x2 or x2x3 or x1x3; all of these have 0 coefficient.
So, I can represent this kind of quadratic form in this way: xTAx, where A is a diagonal matrix, equals the sum over i = 1 to n of a_ii.x_i^2, where the x_i are the components of the vector x.
This is another example of a quadratic form. Now, we are having a very important result, and we will see the application of this theorem later on: every quadratic form xTAx can be diagonalized by making a change of variables or coordinates, let us say Y = QTx. So, let us see the proof of this.
Proof: let q(x) = xTAx be a quadratic form, where A is a symmetric matrix of order n and x is a vector from the n dimensional real vector space. Now, as we know, A is symmetric, so there exist a diagonal matrix D and an orthogonal matrix Q such that A can be written as Q D QT; means, as we discussed earlier, if A is a symmetric matrix it is always diagonalizable, and here the diagonal matrix D will have the eigenvalues of A as its main diagonal entries. And Q will come from the eigenvectors of A, which you can take as an orthonormal set of eigenvectors.
Now, if I substitute this form of A in my quadratic form, I can write xT.Q.D.QT.x. Now define a transformation: a vector Y, having the same dimension as x, as Y = QTx. Then YT will become xTQ, and this particular quadratic form can be written as YT.D.Y, which equals the sum over i = 1 to n of d_ii.y_i^2, which is a quadratic form in diagonal representation. So, this is the result which we had to prove, and this is the proof of it.
Now, let us take an application of this theorem. By diagonalizing the quadratic form q(x) = 13x1^2 + 10x1x2 + 13x2^2, plot the curve q(x) = 72, where x1 and x2 are the coordinate axes.
(Refer Slide Time: 09:13)
So, the curve we need to plot is q(x) = 13x1^2 + 10x1x2 + 13x2^2 = 72. Now, q(x) can be written as [x1 x2][13 5; 5 13][x1 x2]T = 72. So, here the matrix A = [13 5; 5 13] is associated with the quadratic form q(x). Now let us perform the diagonalization of the matrix A. If I calculate the eigenvalues of the matrix A, the characteristic equation is |13-λ 5; 5 13-λ| = 0. Solving it, (13-λ)^2 - 25 = 0, which gives the eigenvalues λ = 8 and 18. Now, if I calculate the eigenvector corresponding to the eigenvalue λ = 8, then (A - 8I).x = 0 gives me 5x1 + 5x2 = 0 as the equation from the first row of the matrix A - 8I, and the equation from the second row will be the same.
That is, 5x1 + 5x2 = 0 from both rows. From this I can write x1 = -x2. So, the eigenvector corresponding to λ = 8 can be written, if I take x1 as 1, so x2 becomes -1, and as a unit vector it becomes (1/(2)1/2)[1 -1]T. Similarly, if I calculate the eigenvector corresponding to the eigenvalue λ = 18, the first equation comes out to be -5x1 + 5x2 = 0 and the second equation becomes 5x1 - 5x2 = 0. So, from here I am getting x1 = x2, and the corresponding unit eigenvector is [(1/2)1/2 (1/2)1/2]T. In this way I can write the matrix A as Q D QT, where Q is [(1/2)1/2 (1/2)1/2; -(1/2)1/2 (1/2)1/2], the matrix D is [8 0; 0 18], and then QT is [(1/2)1/2 -(1/2)1/2; (1/2)1/2 (1/2)1/2].
Now, the associated quadratic form can be written as q(x) = [x1 x2] Q D QT [x1 x2]T, where Q is the matrix above and D = [8 0; 0 18]. So now assume another vector Y = [y1 y2]T = QTx, where x = [x1 x2]T; explicitly, y1 = (x1 - x2)/(2)1/2 and y2 = (x1 + x2)/(2)1/2. In these new coordinates the quadratic form becomes 8y1^2 + 18y2^2, so the curve to plot is the ellipse 8y1^2 + 18y2^2 = 72. Now, if I want to plot this particular curve with respect to the original axes x1 and x2: the y1 and y2 axes are the original axes rotated by 45 degrees, with the y1-axis along the line x2 = -x1 and the y2-axis along the line x2 = x1.
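As a numerical cross-check of this diagonalization (my own sketch using NumPy, not part of the original lecture):

```python
import numpy as np

A = np.array([[13.0, 5.0],
              [5.0, 13.0]])

evals, Q = np.linalg.eigh(A)     # eigenvalues in ascending order: [8, 18]; columns of Q are orthonormal eigenvectors
print(evals)

# The change of variables y = Q.T x turns x^T A x into 8*y1^2 + 18*y2^2,
# so Q.T A Q should be the diagonal matrix diag(8, 18).
print(np.round(Q.T @ A @ Q, 10))
```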
(Refer Slide Time: 17:40)
So, I can also write A = L D LT, which expresses A and D as related by a congruence transformation of the form CTAC. Now, two symmetric matrices A and B are called congruent if there exists an invertible matrix S such that A = STBS.
For example, here the matrices A = [13 5; 5 13] and D = [8 0; 0 18] are congruent, because just now we have seen that there exists an orthogonal matrix Q such that A = QDQT, where D = [8 0; 0 18] and A = [13 5; 5 13]. Now, my next definition in the same category is inertia.
(Refer Slide Time: 19:38)
So, how do we define the inertia of a real symmetric matrix? The inertia of a real symmetric matrix A is defined to be the triplet (π, ν, ζ), in which π, ν, ζ are the respective numbers of positive, negative and 0 eigenvalues. So, the inertia of a matrix is a triplet in which the first component is the number of positive eigenvalues, the second component is the number of negative eigenvalues, and the third component is the number of 0 eigenvalues, counting algebraic multiplicities. Let me take this example.
So, my example is A = [1 -1 3; -1 2 1; 3 1 1]. What is the inertia of this matrix? The inertia of A can be found just from the eigenvalues of A. If I calculate the eigenvalues of A, these come out to be 4 and ±(6)1/2. So, if I see here, I am having 2 positive eigenvalues of A, one is 4 and another one is (6)1/2, one negative eigenvalue -(6)1/2, and no zero eigenvalue; so the inertia of A is (2, 1, 0).
So, this particular theorem tells us the following: let A and B both be real symmetric matrices (the same result holds for Hermitian matrices also). Then A and B are congruent if and only if A and B have the same inertia; that is, A and B are congruent to each other, meaning you can always find a non-singular matrix C such that CTAC = B, if and only if the number of positive eigenvalues, the number of negative eigenvalues and the number of 0 eigenvalues of A and B are the same.
So, in this way we can say the result is enough to check whether two matrices are congruent or not. The alternate statement of this theorem is the following. Suppose D is a diagonal representation of a symmetric matrix A. We say that the index of A is the number of positive entries in D, and the signature of A is the number of positive entries in D minus the number of negative entries in D. If r is the rank of A, then r is the number of non-zero entries in the diagonal representation of A, because that is the number of non-zero eigenvalues; so, with p the index and s the signature, the number of negative entries is r - p, and hence s = p - (r - p) = 2p - r.
So, for example, suppose you are having a matrix A which is a 4×4 real symmetric matrix and the eigenvalues of A are, let us say, 2, -1, 3 and 0. Then we define the index of A as the number of positive eigenvalues, or the number of positive entries in the diagonal representation of A; since A is symmetric, I can always write A = Q D QT = Q diag(2, -1, 3, 0) QT.
So, here the index of A is the number of positive entries, which is 2; the signature is the number of positive entries minus the number of negative entries, and here I am having two positive entries and one negative entry, so 2 - 1 comes out to be 1. And, as I told you, the rank of A is the number of non-zero entries in the diagonal representation of the matrix A, which is 3 here; note that in this example it also equals index + signature.
So, again, the Sylvester law of inertia can be stated like this: two symmetric matrices A and B are congruent if and only if their diagonal representations have the same rank, index and signature. This is the alternate statement, I will say. Let us consider an example of this.
So, determine which of the following matrices are congruent to each other. All 3 matrices are given, where the matrix A is [1 -1 3; -1 2 1; 3 1 1], the matrix B is [1 2 1; 2 3 2; 1 2 1] and C is [1 0 1; 0 1 2; 1 2 1]. If I calculate the eigenvalues of the matrix A, they come out to be 4, (6)1/2 and -(6)1/2. The eigenvalues of the matrix B are 0, (5+(33)1/2)/2 and (5-(33)1/2)/2. And the eigenvalues of the matrix C are 1, 1+(5)1/2 and 1-(5)1/2.
Now, let us talk about the rank, index and signature of these 3 matrices. The rank of the matrix A is the number of non-zero entries in the diagonal representation of A, which will be 3, because the diagonal representation contains the eigenvalues. The index will be the number of positive eigenvalues, or positive entries in D, which is 2. And the signature is the number of positive entries minus the number of negative entries, so 2 - 1 = 1.
For the matrix B, the rank is 2, because one of the entries is 0; the index is 1, because only (5+(33)1/2)/2 is a positive entry; and the signature is 1 - 1 = 0. For the matrix C, again I am having rank = 3, index 2 and signature 1. So, if you see here, the matrix A and the matrix C have the same rank, index and signature. So, A and C are congruent; one can in fact find a non-singular P with A = PCPT, for example by using LDU-type factorizations. Now, let us also check this with the earlier statement of the Sylvester law of inertia.
(Refer Slide Time: 29:59)
So, there I need to calculate the triplet (π, ν, ζ). π is defined as the number of positive eigenvalues, so for A it is 2; ν is the number of negative eigenvalues, which is 1; and ζ is the number of 0 eigenvalues, which is 0. For B the triplet will be 1, 1 and 1, and for the third matrix C it will become 2, 1, 0. So, again, A and C have the same inertia, because for A this is (2, 1, 0), which is the same as for C. So, again, A and C are congruent to each other. In the same way we can handle the diagonal form of a quadratic form.
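The inertia comparison above is easy to reproduce numerically; the following is a small sketch (the NumPy usage and the helper function are my own choices) that counts positive, negative and zero eigenvalues of A, B and C:

```python
import numpy as np

def inertia(M, tol=1e-9):
    """Return (#positive, #negative, #zero) eigenvalues of a symmetric matrix."""
    w = np.linalg.eigvalsh(M)
    return (int(np.sum(w > tol)), int(np.sum(w < -tol)), int(np.sum(np.abs(w) <= tol)))

A = np.array([[1, -1, 3], [-1, 2, 1], [3, 1, 1]], dtype=float)
B = np.array([[1, 2, 1], [2, 3, 2], [1, 2, 1]], dtype=float)
C = np.array([[1, 0, 1], [0, 1, 2], [1, 2, 1]], dtype=float)

print(inertia(A), inertia(B), inertia(C))   # (2, 1, 0) (1, 1, 1) (2, 1, 0): A and C are congruent
```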
So, consider the quadratic form f(x) = 2x1^2 + 2x2^2 + x3^2 - 2x1x2 - 2x2x3. Find a symmetric matrix A such that f(x) = xTAx.
(Refer Slide Time: 31:16)
Solution: q(x) is given as 2x1^2 + 2x2^2 + x3^2 - 2x1x2 - 2x2x3, and we want the associated matrix A with q(x) = xTAx. The first diagonal entry will be the coefficient of x1^2, which is 2; the second diagonal entry, in the second row, will be the coefficient of x2^2, which is 2; and in the same way the diagonal entry in the third row will be the coefficient of x3^2, which is 1. For the term -2x1x2 I take -1 in the (1,2) and (2,1) entries, for -2x2x3 I take -1 in the (2,3) and (3,2) entries, and the remaining entries are 0. So, A = [2 -1 0; -1 2 -1; 0 -1 1].
So, this is the associated matrix. Now, if I look at the eigenvalues of this matrix, they come out to be approximately 0.1981, 1.5550 and 3.2470. So, here if I calculate the inertia, it will be (3, 0, 0); the rank is 3, the index is 3 and the signature is again 3. Since all the eigenvalues are positive, or equivalently since rank = index = signature, I will say the matrix is positive definite, PD, and hence the associated quadratic form is also positive definite.
I will ask you to work this out and find a suitable transformation Y = QTx such that q(x) can be written as YTDY, the corresponding diagonal representation of the quadratic form. From that you will see the entries of D will be 0.1981, 1.5550 and 3.2470.
(Refer Slide Time: 34:22)
So, with this I will end this lecture. In this lecture we have seen a couple of applications of the quadratic form; these are the references.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology Roorkee
Lecture – 24
Gram Matrix and Minimization of Quadratic Forms
Hello friends. So, welcome to the 24th lecture of this course. In this lecture we will learn about the Gram matrix, and we will learn how to find a suitable minimizer for a given quadratic form.
So, the first definition in this lecture is the Gram matrix. A Gram matrix can be defined in this way: let A be an m×n matrix, so A can be a rectangular matrix also; then the n×n matrix AT.A = K is known as the associated Gram matrix. From here you can see that, since it is the product of the transpose of a matrix with the matrix itself, a Gram matrix will always be a symmetric matrix. If I take an example where A is a 3×2 matrix, A = [1 3; 2 0; -1 6], then ATA comes out to be [6 -3; -3 45]. So here ATA is a 2×2 matrix, and this is the Gram matrix of A.
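A quick check of this example (a NumPy sketch of my own, not part of the lecture slides):

```python
import numpy as np

A = np.array([[1, 3],
              [2, 0],
              [-1, 6]], dtype=float)

K = A.T @ A                        # the Gram matrix of A
print(K)                           # [[ 6. -3.] [-3. 45.]]
print(np.linalg.eigvalsh(K) >= 0)  # [ True  True ]: a Gram matrix is positive semi definite
```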
(Refer Slide Time: 01:59)
Now, we are having a very important result related to Gram matrices: all Gram matrices are positive semi definite; moreover, the Gram matrix is positive definite if and only if the kernel of A is {0}, meaning the kernel contains only the 0 vector.
So, first we have to prove that all Gram matrices are positive semi definite. If I take the quadratic form of a Gram matrix K, it can be written as XTK.X, where K is the Gram matrix. This can be written as XTATAX = (AX)T.(AX), which is the squared norm ||AX||^2 and will always be greater than or equal to 0.
So, since the quadratic form of a Gram matrix is always greater than or equal to 0, K is positive semi definite; this is the first part of the theorem. Now, in the second part we need to prove that the matrix K is positive definite if and only if the kernel of the matrix A contains only the 0 vector.
So, let the kernel of A contain only the 0 solution. In this case, the system AX = 0 implies that X = 0. Since we have q(X) = ||AX||^2 ≥ 0, and q(X) = 0 only when AX = 0, that is only when X = 0 because the kernel has only the 0 vector, hence K is positive definite.
So, we have shown that if the kernel of A is {0}, then K is positive definite. Now let us do the same result from the other direction: assume that K is positive definite; we need to see that the kernel of A has only the 0 vector. If K is positive definite, all the eigenvalues of K are positive. It means the rank of the matrix ATA is n; since K is an n×n matrix, it is a full rank matrix.
The norm ||AX||^2 ≥ 0, and in fact for X ≠ 0 it will be strictly greater than 0. So, hence AX = 0 only when X = 0. This means the kernel of A contains only the 0 vector. So, this is the proof of the other side.
In the same way, we can define the weighted Gram matrix: if C is any symmetric positive definite m×m matrix with all positive entries, then the weighted Gram matrix is K = ATCA.
(Refer Slide Time: 07:45)
So, all the results which we discussed in the earlier theorem hold for the weighted Gram matrix also. Now, we will look at the minimization of quadratic forms.
So, consider a scalar quadratic function p(x) = ax^2 + 2bx + c for x belonging to R. So, here what we are having is a scalar quadratic function p(x) = ax^2 + 2bx + c.
(Refer Slide Time: 08:41)
Now, consider the case when a > 0. When a > 0, the graph is a parabola pointing upward, and if I look at minimization, it will have a unique global minimum. If I differentiate, p'(x) = 2ax + 2b = 0, which gives me x = -b/a as the critical point. Now, if I check the sufficient condition for a minimum, the second derivative of p with respect to x is 2a, and if a is positive then 2a is also positive. So, the point x = -b/a is a point of minimum. Similarly, if I take a < 0, in this case the point x = -b/a comes out to be a point of maximum.
So, generally, these are the necessary and sufficient conditions we can use for finding the maxima and minima of a scalar quadratic function. But if I am having a vector quadratic form, where x is not a scalar but a vector from an n dimensional space, then how to find the minimum of the quadratic form? We will learn this now.
So, consider a general quadratic function of n variables, p(x1, x2, ..., xn) = p(X), where X is an n dimensional vector having components x1, x2, ..., xn, and this can be written as the sum over i, j = 1 to n of k_ij x_i x_j, minus the sum over i = 1 to n of 2f_i x_i, plus c, where the coefficients k_ij, f_i and c are assumed to be real. In compact form, this general quadratic function can be written in matrix notation as XTKX - 2XTf + C.
For example, take the quadratic function whose data are as follows: xT is [x1 x2 x3], f is the column vector [0 -1 0]T, K comes out to be [2 -1 0; -1 3 0; 0 0 -5], and the scalar C is 3; this corresponds to p(X) = 2x1^2 + 3x2^2 - 5x3^2 - 2x1x2 + 2x2 + 3. So, this form p(X) can be written as XTKX - 2XTf + C. In this way we can write any quadratic function of n variables in a compact form using this concept.
(Refer Slide Time: 14:50)
Now, this is the theorem which tells us about the minimization of a general quadratic function: if K is a symmetric positive definite matrix, then the quadratic function p(X) = XTKX - 2XTf + C has a unique minimizer. Please note that the existence of this unique minimizer, or global minimum, depends on the property of K: if K is symmetric and positive definite, then this quadratic function will certainly have a global minimum.
And this global minimizer is the solution of the linear system KX = f, namely X* = K-1f. The minimum value of p(X) can be written in any one of these forms: p(X*) = p(K-1f) = C - fTK-1f; replacing K-1f with X*, the solution of this system, this is C - fTX*, or I can also write it as C - (X*)TKX*. So, let us see the proof of this.
So, since K is a positive definite matrix, this implies K is nonsingular, and hence X* = K-1f exists; it means K is invertible. Now, for any X belonging to the n dimensional real vector space, we can write p(X) = XTKX - 2XTf + C. Since f = KX*, this equals XTKX - 2XTKX* + C, which can be rearranged as (X - X*)TK(X - X*) + C - (X*)TKX*.
The first term in the above expression has the form YTKY, where Y = X - X*. Since K is positive definite, we know YTKY will be strictly greater than 0 for all Y that are not 0. Thus, the first term achieves its minimum value, which is 0, if and only if Y = 0; and since Y = X - X*, this means X - X* = 0, or X = X*.
Moreover, since X* is fixed, the second term C - (X*)TKX* does not depend on X, because X* is fixed here and this term does not contain any expression in X. Therefore, the minimum value of p(X) occurs at X = X*, where X* = K-1f, which is the proof of the first part.
And the minimum value can be obtained just by substituting X = K-1f in this expression, and from that we can get any of the stated forms. So, the minimum value of p(X) can be obtained by putting X = K-1f into p(X), which will give you the second statement of the theorem, the proof of the second statement. So, this is the proof of this theorem. Now, let us see an example of this theorem. The example is like this.
Consider the problem of minimizing the quadratic expression, or quadratic function, p(x1, x2) = 4x1^2 - 2x1x2 + 3x2^2 + 3x1 - 2x2 + 1 over R2, meaning over all real values of x1 and x2.
So, how to solve this? By using the previous theorem which we have just stated: if the associated matrix K is symmetric and positive definite, then p will have a unique minimizer. So, let us write this p in matrix form. It becomes [x1 x2][4 -1; -1 3][x1 x2]T - 2xTf + 1, where xT is [x1 x2]. And what will f be here? If we see the coefficient of x1, it is +3, so the first entry of f becomes -3/2; and if we see the coefficient of x2, it is -2, so the second entry becomes 1. So, f = [-3/2 1]T.
So, here, comparing with XTKX - 2XTf + C: K is [4 -1; -1 3], f is [-3/2 1]T and C is 1. If we look at the matrix K, the first leading principal minor of K is 4, which is positive, and the determinant of K is 12 - 1 = 11, which is again positive. So, K is positive definite, and according to the previous theorem the quadratic function p will have a unique minimizer.
Now, this particular function will have its unique minimum at X* = K-1f. So now, if I calculate K-1, then by the previous theorem X* = K-1f, and this comes out to be [-7/22 5/22]T, that is, approximately [-0.31818 0.22727]T.
So, this is the point at which the minimum of this function is attained, and the minimum value can be calculated using any of the expressions of this theorem, that is, either C - fTK-1f, or C - fTX*, or C - (X*)TKX*.
(Refer Slide Time: 28:05)
Here X* is the vector given above, and the minimum value is 13/44. So, this is the point of minimum and this is the minimum value of p at that point. In this way, by using the previous theorem, we can find the unique minimizer of a given quadratic function; however, the condition, as I stated earlier, is that the matrix K should be positive definite.
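The computation of X* and the minimum value for this example can be checked numerically; here is a small sketch (my own NumPy illustration, not part of the lecture):

```python
import numpy as np

K = np.array([[4.0, -1.0],
              [-1.0, 3.0]])
f = np.array([-1.5, 1.0])
C = 1.0

x_star = np.linalg.solve(K, f)          # unique minimizer X* = K^{-1} f
p_min = C - f @ x_star                  # minimum value C - f^T X*

print(x_star)                           # approximately [-0.31818,  0.22727]
print(p_min, 13/44)                     # both approximately 0.29545
```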
So, with this, in this lecture we have learnt about the Gram matrix and how to minimize a given quadratic function of n variables.
In the next lecture, we will learn another canonical transformation, called the Jordan canonical form, which is useful when a matrix is not diagonalizable. So, with this I will end this particular lecture. These are the references for this lecture.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology Roorkee
Lecture 25
Generalized Eigenvectors and Jordan Canonical Form
Hello friends, so, welcome to the lecture on Generalized Eigenvectors and Jordan Canonical Form. As we know, if a matrix A of order n×n has n linearly independent eigenvectors, then the matrix can be written as PDP-1; in other words we can say the matrix is diagonalizable, or the matrix is similar to a diagonal matrix, where the diagonal entries are the eigenvalues of the matrix A.
So, if the matrix is not diagonalizable, meaning the matrix does not have n linearly independent eigenvectors, then how to find a similar type of transformation, so that at least we can write this matrix as P J P-1, where J is a block diagonal matrix? I want to say that if the matrix is diagonalizable then I can write P D P-1, where D is a diagonal matrix, and if the matrix is not diagonalizable, then I write it as P J P-1, where J is a block diagonal matrix. So, it is a generalized similarity transformation, where we are reducing a given matrix to a block diagonal matrix.
(Refer Slide Time: 02:08)
So, for doing this, we have to talk about Jordan blocks. The definition of a Jordan block: a Jordan block corresponding to a given eigenvalue λ is a k×k matrix with λ on the main diagonal and ones on the super diagonal. For example, a Jordan block of size 1 with respect to the eigenvalue λ = λ0 is J1(λ0) = [λ0]. If I want to write a Jordan block of size 2 corresponding to the eigenvalue λ = λ0, then it will have λ0, λ0 on the main diagonal, 1 on the super diagonal and 0 below; that is, J2(λ0) = [λ0 1; 0 λ0].
Similarly, a Jordan block of size 3 corresponding to the eigenvalue λ = λ0 can be written in the same way: J3(λ0) = [λ0 1 0; 0 λ0 1; 0 0 λ0]; here you can notice that λ0 appears on the main diagonal and 1 on the super diagonal. In a similar way, a Jordan block of size k corresponding to the eigenvalue λ = λ0 can be written as the k×k matrix with λ0 on the main diagonal, 1 on the super diagonal, and 0 everywhere else. So, this is the definition of Jordan blocks of different sizes.
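A small helper for building such blocks numerically (a sketch of my own; the function name jordan_block is just an illustrative choice, not something defined in the lecture):

```python
import numpy as np

def jordan_block(lam, k):
    """k x k Jordan block: lam on the main diagonal, 1 on the super diagonal."""
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

print(jordan_block(3.0, 2))   # [[3. 1.] [0. 3.]]
print(jordan_block(0.0, 3))   # the 3 x 3 nilpotent Jordan block
```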
So, if someone asks you to write a Jordan block of size 2 corresponding to the eigenvalue λ = 3, it can be written, with size 2 and eigenvalue 3, as [3 1; 0 3]. Now, we will see some properties of Jordan blocks. A Jordan block has only one eigenvalue, λ = λ0. For example, for a Jordan block of size k, the eigenvalue will be λ = λ0 with algebraic multiplicity k.
So, the algebraic multiplicity will be k, and the characteristic polynomial of a Jordan block of size k corresponding to λ = λ0 is given by det(λI - Jk(λ0)) = (λ - λ0)^k. The geometric multiplicity of λ = λ0 for a Jordan block of size k will be 1; this means there will be only 1 linearly independent eigenvector corresponding to a given Jordan block, whatever its size.
And the third important property is: if e1, e2, ....., ek denote the standard basis of a k dimensional vector space, then Jk(λ0)e1 = λ0 e1, and Jk(λ0)ei = λ0 ei + e_{i-1}, where i varies from 2 to k. So, suppose I want to find Jk(λ0)e2; it will become λ0 e2 + e1.
Now, after learning about Jordan blocks, let us define the Jordan canonical form. A Jordan canonical form is a block diagonal n×n matrix of the following type: we have m Jordan blocks corresponding to eigenvalues λ1, λ2, up to λm, with respective sizes k1 for the Jordan block corresponding to eigenvalue λ1, k2 for the Jordan block corresponding to eigenvalue λ2, and so on. In this way, if the matrix is an n×n matrix, then k1 + k2 + ...... + km = n. Please note, it is a block diagonal matrix, and all the off-diagonal blocks are 0 blocks; we have 0 blocks above and below the block diagonal. So, here we have m Jordan blocks, as I told you, and this complete matrix is called a Jordan canonical form. And if I know the algebraic multiplicity and geometric multiplicity of the different eigenvalues of a given matrix, then I can write the Jordan canonical form of that matrix; we will take some examples of that.
(Refer Slide Time: 07:51)
Now, as you can see, the characteristic polynomial of this Jordan canonical form is given as det(J - λI) = (λ1 - λ)^k1 (λ2 - λ)^k2 (λ3 - λ)^k3 ..... (λm - λ)^km, and this can easily be obtained by using the fact that the determinant of a block diagonal matrix is the product of the determinants of the different diagonal blocks. Similarly, corresponding to each block, as I told you when I was defining the Jordan blocks, there will be only one linearly independent eigenvector. So, in that way, this particular Jordan canonical form will have m linearly independent eigenvectors, given as X1, X2, X3, ......, Xm, each corresponding to one of the Jordan blocks, and this can be proved by the method of induction.
Now, we are coming to a very important theorem; this tells us about what is called the Jordan canonical form of a matrix.
This particular theorem tells us that every matrix of order n is similar to a Jordan canonical form J of the same size; that is, if A is a matrix of order n×n, then A is similar to a Jordan canonical form J, so that A can be written as S J S-1, where S is the matrix containing the eigenvectors and generalized eigenvectors of A.
Now, please note here: if A is a diagonalizable matrix, in that case A = P J P-1, where P comes from the eigenvectors of A; because, if A is diagonalizable, A will have n linearly independent eigenvectors, and if I write those eigenvectors as the columns of P, then I will get the modal matrix P, and J will be D, the diagonal matrix whose diagonal entries are the eigenvalues of A. So, if A is diagonalizable, this Jordan canonical transformation becomes diagonalization. Hence we can consider the Jordan canonical transformation as the generalization of the classic diagonalization transformation.
So, now the question arises: when A is not diagonalizable, meaning A does not have n linearly independent eigenvectors, then how to write this matrix S? Because, if A is an n×n matrix and I am able to find only, let us say, m linearly independent eigenvectors corresponding to the different eigenvalues of A, then I will be able to write only m columns of S. So, from where will I get the remaining n-m columns?
So, those columns I will write by finding the generalized eigenvectors of the matrix A. For writing this particular matrix S, we need to learn how to find the generalized eigenvectors of a matrix. So, let me define the generalized eigenvector of a matrix.
(Refer Slide Time: 11:52)
A non-zero vector X is a generalized eigenvector of A corresponding to an eigenvalue λ if, for some positive integer p, (A-λI)pX = 0 but (A-λI)p-1X ≠ 0. So, we can say that a generalized eigenvector is a member of the null space of (A-λI)p. Let us take an example to find a generalized eigenvector.
So, find the generalized eigenvectors of the matrix A given as [1 1 0; 0 1 2; 0 0 3]. Now, if I see here, A is an upper triangular matrix, so the eigenvalues of A are 3, 1, 1. If I calculate the eigenvector corresponding to λ = 3, then (A-3I)X = 0, and let us call the solution X1; from here I get an eigenvector X1 = [1 2 2]T. Similarly, if I calculate the eigenvector corresponding to λ = 1, then (A-I)X = 0, and from here I get only 1 linearly independent eigenvector, which comes out to be X2 = [1 0 0]T.
So, here we can say that the algebraic multiplicity of λ = 1 is 2 while the geometric multiplicity of λ = 1 is only 1; hence A is not a diagonalizable matrix. If A is not diagonalizable then, as I told you, we can still write A as S J S-1 by the Jordan canonical transformation. For writing the matrix S, I need to find one generalized eigenvector corresponding to λ = 1. It means a generalized eigenvector will be a vector X3, corresponding to λ = 1, such that (A-I)2X3 = 0 but (A-I)X3 ≠ 0.
And (A-I)X3 should not be 0. Since I want (A-I)X3 to be non-zero, I take (A-I)X3 = X2, because, as I told you, X2 is an eigenvector, so it is a non-zero vector.
This particular equation satisfies the condition of a generalized eigenvector: if I multiply both sides by (A-I), then the left side becomes (A-I)2X3 and the right side comes out to be (A-I)X2, and (A-I)X2 is 0 because X2 is an eigenvector.
So, hence I need to find a vector, let me say X3, which satisfies this particular condition, and I can calculate it by solving the non-homogeneous system of equations (A-I)X3 = X2. If I solve it, this gives me X3 = [0 1 0]T. So, here X3 is a generalized eigenvector of the matrix A corresponding to the eigenvalue λ = 1. In this way, we can calculate the generalized eigenvectors.
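For this small example, the eigenvector and the generalized eigenvector can also be checked numerically; here is a sketch with NumPy (my own illustration) that uses least squares, since A - I is singular:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 3.0]])
I = np.eye(3)

x2 = np.array([1.0, 0.0, 0.0])                    # eigenvector for lambda = 1
print(np.allclose((A - I) @ x2, 0))               # True

# Generalized eigenvector: solve (A - I) x3 = x2.  A - I is singular, so use
# least squares, which for this consistent system returns the minimum-norm solution.
x3, *_ = np.linalg.lstsq(A - I, x2, rcond=None)
print(np.round(x3, 10))                           # approximately [0, 1, 0]
print(np.allclose(np.linalg.matrix_power(A - I, 2) @ x3, 0))  # True: (A - I)^2 x3 = 0
```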
Once you have found the required n-m generalized eigenvectors: you have m linearly independent eigenvectors of the matrix A of size n, and you have calculated n-m generalized eigenvectors corresponding to the different eigenvalues. So, what can you do? You will have n vectors in total, eigenvectors and generalized eigenvectors, and those n vectors you can write as the columns of a matrix; that matrix will become the matrix S.
So, if X1, X2, X3, ...., Xn is the set of all linearly independent eigenvectors and generalized eigenvectors of the matrix A, then S will be the matrix having these eigenvectors and generalized eigenvectors as its columns. So, let us take an example to write the Jordan canonical transformation of a given matrix.
(Refer Slide Time: 19:16)
So, the example is: find the Jordan canonical form of the matrix A = [2 2 1; 0 2 -1; 0 0 3], and also find a matrix S such that A = S J S-1, where J is the Jordan canonical form of A. Let me solve this particular example. First of all, I need to find the eigenvalues of A; again you can see A is an upper triangular matrix, so the eigenvalues are given by the diagonal elements. Here the eigenvalues are λ = 3, 2, 2. The algebraic multiplicity of λ = 2 is 2. Now, we will see what the geometric multiplicity of this eigenvalue is.
Now, the eigenvector corresponding to λ = 2: it means (A-2I)X = 0. From here I got only one linearly independent eigenvector, which let me write as X2; X2 becomes [1 0 0]T. Because, when I write out (A-2I)X = 0, the first equation becomes 2x2 + x3 = 0, the second equation gives -x3 = 0, and the third equation gives again x3 = 0. So, here I get x2 = 0 = x3 and x1 is arbitrary, and I have chosen x1 as 1; here x1, x2, x3 are the different components of the vector X2.
Now, I need to find one generalized eigenvector corresponding to λ = 2. So, I solve (A-2I)2X = 0, which is equivalent to solving (A-2I)X3 = X2. If I do it, I get X3 as the generalized eigenvector, and this comes out to be [0 1/2 0]T.
So, after doing this, now I need to write the matrix S and the matrix J. Here, my matrix J, as I told you, I can write with only the information about the algebraic and geometric multiplicities of the different eigenvalues of A. Here, 3 has algebraic multiplicity 1 and geometric multiplicity 1, so there will be a 1×1 block for 3. Now, the algebraic multiplicity of the eigenvalue 2 is 2. The algebraic multiplicity of a given eigenvalue tells us what the total size of the sum of the various blocks corresponding to this eigenvalue will be. Here it says that the algebraic multiplicity is 2, so the blocks corresponding to the eigenvalue λ = 2 will have total size 2. The geometric multiplicity of λ = 2 is one, and the geometric multiplicity tells us the total number of blocks corresponding to that eigenvalue. So, algebraic multiplicity tells the total size, geometric multiplicity the total number of blocks. Here the algebraic multiplicity is 2 and the geometric multiplicity is 1, so there is only 1 block, of size 2; I will have a block of size 2 corresponding to the eigenvalue λ = 2. In this way, the matrix J = [3 0 0; 0 2 1; 0 0 2] becomes the Jordan canonical form of A.
Now, I can write the matrix S with columns [-1 -1 1]T, the eigenvector corresponding to λ = 3, then [1 0 0]T, the eigenvector corresponding to λ = 2, and [0 1/2 0]T, the generalized eigenvector, as the third column; that is, S = [-1 1 0; -1 0 1/2; 1 0 0]. So, these are the matrices J and S such that A = S J S-1, and you can verify it later on.
So, this is the overall process using the Jordan canonical transformation for finding the
matrix S and a Jordan canonical form J of a given matrix.
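For a symbolic check of this example, SymPy can compute the Jordan form directly; this is a sketch of my own, not part of the lecture (note that SymPy may order the blocks or scale the columns of its transformation matrix differently from the hand computation above):

```python
from sympy import Matrix

A = Matrix([[2, 2, 1],
            [0, 2, -1],
            [0, 0, 3]])

S, J = A.jordan_form()      # returns S and J with A = S*J*S**-1
print(J)                    # a Jordan form with a 1x1 block for 3 and a 2x2 block for 2
print(S * J * S.inv() == A) # True
```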
(Refer Slide Time: 26:52)
Similarly, if I take this example: find a Jordan canonical form J of this matrix. If I solve it, the eigenvalues come out to be λ = 3, 3, 3, so the algebraic multiplicity of λ = 3 is 3. If I calculate the eigenvector corresponding to the eigenvalue λ = 3, then (A-3I)X1 = 0 gives me X1 = [1 2 0]T. Hence the geometric multiplicity of λ = 3 is 1.
Now, calculate the generalized eigenvectors. From (A-3I)X2 = X1 I got X2 = [1 1 1]T; and I need to calculate 2 generalized eigenvectors for writing the matrix S, so the other generalized eigenvector satisfies (A-3I)3X3 = 0, which I can get from the relation (A-3I)X3 = X2. From there I got X3 = [1 -1 1]T.
Hence J has just one block, of size 3: a Jordan block of size 3, that is, J = [3 1 0; 0 3 1; 0 0 3], and S will have [1 2 0]T as the first column, [1 1 1]T as the second column and [1 -1 1]T as the third column.
Hence we have A = S J S-1. So, I have taken a couple of examples of finding the Jordan canonical transformation for a given matrix. If the matrix is diagonalizable, then the Jordan canonical form will be equal to the diagonal matrix having the eigenvalues as its main diagonal entries.
(Refer Slide Time: 28:55)
So, let me explain the relation of the Jordan canonical form of a matrix with the minimal polynomial. Given a matrix A of order n, in the JCF of A the eigenvalues are the entries on the main diagonal. If the minimal polynomial of A is m_A(λ) and it is given as (λ-λ1)^s1 (λ-λ2)^s2 ....... (λ-λk)^sk, then s_i is the size of the largest Jordan block corresponding to λ_i in the JCF of A.
So, the powers in the minimal polynomial, corresponding to the different factors, give you the size of the largest block corresponding to that particular eigenvalue. And if (λ-λ1)^r1 (λ-λ2)^r2 ....... (λ-λk)^rk is the characteristic polynomial, then r_i is the number of occurrences of λ_i on the main diagonal, which is obvious. Also, the geometric multiplicity of λ_i is the number of Jordan blocks for λ_i in the JCF of A, because each Jordan block gives you only 1 linearly independent eigenvector.
So, let us take an example of this particular relation. Consider a 6×6 matrix A having characteristic polynomial (λ-3)^4 (λ-2)^2 and minimal polynomial (λ-3)^3 (λ-2)^2.
(Refer Slide Time: 31:01)
So, find J, meaning the Jordan canonical form of A. Here, as I told you, the powers in the minimal polynomial give the size of the biggest Jordan block corresponding to each eigenvalue, and the powers in the characteristic polynomial are the numbers of occurrences of these eigenvalues on the main diagonal. Here I am having 4, so λ = 3 will occur 4 times on the main diagonal, out of which the biggest Jordan block will have size 3.
From here I get the information that there will be 2 Jordan blocks corresponding to λ = 3: one of size 3 and another one of size 1. Hence I have the block [3 1 0; 0 3 1; 0 0 3] of size 3 and another block of size 1, which is just [3].
And then, here I am having a 2-time occurrence of λ = 2, and the maximum Jordan block will have size 2, so there will be only one Jordan block of size 2, namely [2 1; 0 2], and the rest are 0 blocks. In this way, this is the Jordan canonical form of A if the characteristic polynomial and minimal polynomial are given in this way.
(Refer Slide Time: 33:59)
So, now suppose instead that the largest Jordan block for λ = 3 has size 2; then there are 2 ways of writing 4 as a sum of parts with the biggest part equal to 2: one is 2+2 and another one is 2+1+1, while for the eigenvalue 2 the only possibility with biggest part 2 is just 2 itself.
So, one possible Jordan form will be, if I take the first partition, 2 blocks corresponding to λ = 3, each of size 2: [3 1; 0 3], then [3 1; 0 3], and then I am having [2 1; 0 2]. So, this is one of the possible Jordan canonical forms. Obviously, you can interchange the Jordan blocks.
Here I am not taking into consideration the reordering of the Jordan blocks; I am treating reordered forms as the same. The other possibility is the second combination: one Jordan block corresponding to λ = 3 of size 2 and two of size 1. So, I am having [3 1; 0 3], two blocks [3] of size 1, and then [2 1; 0 2], and the remaining entries are 0s. So, these are the 2 possibilities for the Jordan canonical form J of a matrix A having this characteristic polynomial and this minimal polynomial.
So, hence I can say that you can write the Jordan canonical form of a matrix if you know either the algebraic and geometric multiplicity of each eigenvalue, or you can get enough information if you know the minimal polynomial as well as the characteristic polynomial of that particular matrix.
So, in this lecture we have learnt about the Jordan canonical transformation and how to write the Jordan canonical form of a given matrix.
These are the references. In the next lecture we will learn the evaluation of matrix functions.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture- 26
Evaluation of Matrix Functions
Hello friends. So, welcome to the lecture on Matrix Functions. So, in this lecture we will learn how to calculate functions having a matrix as the variable. So, first let me define a matrix function.
So, a matrix function may be defined as a function f that takes as input a matrix A of order n×n and returns another matrix f(A) of the same dimension. So, we can define various matrix functions; for example, f(A) = e^A, or f(A) = A^n, where n is any integer.
So, for example, if someone asks you to calculate A^100, it is not easy because you have to multiply the matrix A 100 times. Other examples may be the trigonometric functions of A, for example sin(A) or cos(A), or hyperbolic functions and many more, as well as polynomials of A. So, these are examples of matrix functions.
Now, why do we evaluate matrix functions? We are having several applications of matrix functions, especially in mathematics and physics; a few of them I have listed here, like autonomous or non-autonomous systems of ordinary differential equations. In control theory we are having plenty of applications of matrix functions where we need to evaluate them; in linear algebra, obviously; and then we are having applications of these matrix functions in image and signal processing. Because an image can be considered as a matrix, when you are performing some operation on an image you have to evaluate that particular function: the output of that operation will be the value of the function when the image is the input variable.
So, let us learn about computing matrix functions. So, here let me take case 1. In case 1, I am taking that the matrix A, which is of size n×n, is diagonalizable. So, it means I can write the matrix A = P D P^(-1), where P is the modal matrix coming from the eigenvectors of A and D is the diagonal matrix having diagonal entries as the eigenvalues of A.
Now, number 1: evaluate A^n, where n is any integer. So, as you know, A can be written as P D P^(-1); if I want to calculate A^2, it will become A·A, that is (P D P^(-1))(P D P^(-1)). So, this will become P D^2 P^(-1). If I am having A^3, in the same way it will become P D^3 P^(-1).
So, for any given n, in the similar way A^n = P D^n P^(-1), where n belongs to the integers. So, for example, let us take a 3×3 matrix A having eigenvalues λ1, λ2, λ3, and let us say we want to find A^100. So, A^100 will become P, which is the modal matrix of A, times D^100, where D = [λ1 0 0; 0 λ2 0; 0 0 λ3], times P^(-1). So, this will become P [λ1^100 0 0; 0 λ2^100 0; 0 0 λ3^100] P^(-1). So, in this way we can calculate various powers of a given matrix.
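The same relation A^n = P D^n P^(-1) is easy to check numerically. A minimal sketch, assuming NumPy is available (the symmetric 2×2 matrix below is just a hypothetical diagonalizable example):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # hypothetical matrix with eigenvalues 1 and 3

w, P = np.linalg.eig(A)             # w: eigenvalues, P: modal matrix (columns = eigenvectors)
n = 100
A_n = P @ np.diag(w**n) @ np.linalg.inv(P)   # P D^n P^(-1)

# Cross-check against repeated multiplication.
print(np.allclose(A_n, np.linalg.matrix_power(A, n)))   # True (up to floating point)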
So, one more thing I need to mention here: since I have written that n belongs to the integers, for negative integers we define A^n only when the matrix A is invertible. So, please take care of it.
Number 2: evaluate f(A) = e^A, which means the exponential of the matrix A. So, as we know, if I expand the series, e^A = I + A + A^2/2! + A^3/3! + ....... So, I can write this particular function e^A in terms of this infinite series, provided the series is convergent. So, there is a question about the convergence of this series.
So, if I talk about convergence, I can write ||e^A|| ≤ e^||A||. And since A is a matrix of order n×n, ||A|| will be finite; this gives me ||e^A|| < ∞ and hence the series is convergent, in fact uniformly convergent. So, this is about the convergence of this series. So, let us come back to our evaluation.
(Refer Slide Time: 09:05)
So, I can write this function as e^A = P I P^(-1) + P D P^(-1) + (1/2!)(P D^2 P^(-1)) + ......, because A is diagonalizable, so A = P D P^(-1). So, this I can write as P(I + D + D^2/2! + D^3/3! + .......)P^(-1), and if you see this round bracket term it is nothing but e^D.
So, what I got is e^A = P e^D P^(-1). Now, the question is, if D is given to you, how to evaluate this e^D. So, let us assume that D = [λ1 0 ... 0; 0 λ2 ... 0; 0 ... 0 λn]; it is an n×n diagonal matrix. Then D^2 will become [λ1^2 0 ... 0; 0 λ2^2 ... 0; 0 ... 0 λn^2].
So, if I calculate I + D + D^2/2! + ..., which is nothing but e^D, the first diagonal entry will be 1 + λ1 + λ1^2/2! + ..., the second will be 1 + λ2 + λ2^2/2! + ..., and so on up to 1 + λn + λn^2/2! + ....
So, each diagonal entry of this matrix e^D is nothing but e^λi. So, if my D is this one, then e^D will become [e^λ1 0 ... 0; 0 e^λ2 ... 0; 0 ... 0 e^λn]. So, you can calculate e^D quite easily, and hence you can calculate e^A.
(Refer Slide Time: 12:33)
Find e^A where A is given as [1 5; 4 2]. So, let us see the solution of it. So, here the eigenvalues of A are λ = -3 and 6. Let us say the eigenvector corresponding to λ = -3 is X1 and the eigenvector corresponding to λ = 6 is X2; since both eigenvalues are different, they will have linearly independent eigenvectors and hence the matrix is diagonalizable.
So, from here I can write the matrix A = [1 5; 4 2] = P D P^(-1), where the matrix P is [-5/4 1; 1 1], D is [-3 0; 0 6], and P^(-1) comes out to be [-4/9 4/9; 4/9 5/9]. Hence e^A = P e^D P^(-1) = P [e^(-3) 0; 0 e^6] P^(-1).
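A small numerical check of this example, assuming NumPy and SciPy are available (scipy.linalg.expm is a library routine for the matrix exponential, used here only as a cross-check, not as the lecture's method):

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 5.0],
              [4.0, 2.0]])

w, P = np.linalg.eig(A)                       # eigenvalues -3 and 6, modal matrix P
eA = P @ np.diag(np.exp(w)) @ np.linalg.inv(P)   # P e^D P^(-1)

print(np.allclose(eA, expm(A)))               # True: agrees with SciPy's matrix exponential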
Now, consider the second case. In the first case we have considered that A is a diagonalizable matrix, but suppose A is not diagonalizable; then we can use the Jordan canonical form of A for finding a function of A.
(Refer Slide Time: 16:39)
For example, in case 2, when A is not diagonalizable, then by the Jordan canonical transformation I can write A = S J S^(-1), and how to calculate it you have learnt in the previous lecture. So, now suppose I want to evaluate, let us say, A^n.
So, here A^n will become S J^n S^(-1), and now how to calculate J^n? I will evaluate it with Jordan blocks of various sizes. So, let me take a Jordan block of size 1: if it is [λ0], then its n-th power will become simply [λ0^n].
So, there is no problem with a Jordan block of size 1. Now take a Jordan block of size 2, say J = [λ0 1; 0 λ0]. If I calculate J^2, it will become [λ0 1; 0 λ0][λ0 1; 0 λ0]; so, if I multiply these 2 matrices, it comes out to be [λ0^2 2λ0; 0 λ0^2]. If I calculate J^3, it will be J^2·J, which means I have to multiply again with this matrix.
So, this becomes [λ0^3 3λ0^2; 0 λ0^3]. So, this is about 2×2 Jordan blocks. Similarly, we can obtain it for 3×3 Jordan blocks or Jordan blocks of any size.
So, in general, if I am having a Jordan block of size k×k having diagonal entries λ, then the n-th power of this Jordan block is upper triangular: the first row is λ^n, nC1 λ^(n-1), nC2 λ^(n-2), and so on; the second row is 0, λ^n, nC1 λ^(n-1), and so on, and in this way down to λ^n in the last diagonal entry. One more thing you have to note here: the entries of this upper triangular matrix involving nCr are 0 whenever r > n.
So, if I use this result for, let us say, a 3×3 block J = [λ0 1 0; 0 λ0 1; 0 0 λ0], then using this general formula J^2 will become [λ0^2 2C1·λ0 1; 0 λ0^2 2C1·λ0; 0 0 λ0^2], that is, [λ0^2 2λ0 1; 0 λ0^2 2λ0; 0 0 λ0^2].
So, this particular formula gives us the general power of a Jordan block. So, in this way you can calculate A^n = S J^n S^(-1), where J^n can be calculated block-wise using this particular formula; please note that the Jordan canonical form will be having different blocks, so you have to apply this formula block-wise.
(Refer Slide Time: 22:39)
So, if someone asks me to write J^4, I will calculate it block-wise. Suppose the first block of J is [3 1; 0 3]; so here n is 4 and k is 2. The first entry will become 3^4, the second entry will become 4C1·3^3, then 0 here and 3^4: that is, [3^4 4C1·3^3; 0 3^4] = [81 108; 0 81], and this is for the first block. Now, the second block, say [2], will become [2^4], and the third block, say [5], will become [5^4]. So, this gives the corresponding J^4 matrix.
So, you have to calculate it block-wise, and it is quite easy to calculate using the given formula.
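A small sketch of the block formula, assuming NumPy is available; the function name jordan_block_power is my own, introduced only for illustration. The entry in row i, column j of J^n is nC(j-i) · λ^(n-(j-i)) for j ≥ i:

import numpy as np
from math import comb

def jordan_block_power(lam, k, n):
    # Power of a k x k Jordan block with eigenvalue lam, using the nCr formula above.
    Jn = np.zeros((k, k))
    for i in range(k):
        for j in range(i, k):
            r = j - i
            if r <= n:                       # terms with nCr, r > n, are 0
                Jn[i, j] = comb(n, r) * lam**(n - r)
    return Jn

# The 2x2 block [3 1; 0 3] raised to the 4th power, as in the example above.
print(jordan_block_power(3, 2, 4))           # [[81, 108], [0, 81]]
# Cross-check by direct multiplication.
J = np.array([[3.0, 1.0], [0.0, 3.0]])
print(np.linalg.matrix_power(J, 4))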
Now, case B: evaluate e^A, where A is given as S J S^(-1). So, again e^A = I + A + A^2/2! + ... So, this will become S I S^(-1) + S J S^(-1) + S J^2 S^(-1)/2! + ......
So, this I can write as S(I + J + J^2/2! + ......)S^(-1), and this will become S e^J S^(-1).
Now, how to calculate this e^J? Again, you know that e^J can be written as I + J + J^2/2! and so on, and you know that if J is given, how to calculate J^2 and the various powers of J.
So, let me take a Jordan block of size k with eigenvalue λ; I am calling it J here, but it is written as H in the slide, so you can consider H to be one Jordan block of the Jordan canonical form.
So, e^H = I + H + H^2/2! + ....... I is this, H is this one, H^2 is given by this one, and if I add all these, the final expression comes out to be e^H = [e^λ e^λ e^λ/2! ... e^λ/(k-1)!; 0 e^λ e^λ ... e^λ/(k-2)!; 0 0 e^λ ... e^λ/(k-3)!; ...; 0 0 ... e^λ]. So, in this way you can evaluate e^H.
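This closed form for e^H can be checked numerically. A minimal sketch, assuming NumPy and SciPy are available (the function name exp_jordan_block is introduced here only for illustration; the entries above the diagonal are e^λ/(j-i)!):

import numpy as np
from math import factorial
from scipy.linalg import expm

def exp_jordan_block(lam, k):
    # Exponential of a single k x k Jordan block with eigenvalue lam.
    eH = np.zeros((k, k))
    for i in range(k):
        for j in range(i, k):
            eH[i, j] = np.exp(lam) / factorial(j - i)
    return eH

lam, k = 3.0, 3
H = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)         # [3 1 0; 0 3 1; 0 0 3]
print(np.allclose(exp_jordan_block(lam, k), expm(H)))    # True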
(Refer Slide Time: 26:43)
And, since J is a Jordan form having various Jordan blocks, this formula will have to be applied to each Jordan block separately, based on its size and its eigenvalue. So, let me take one more example of this; let me take an interesting example.
Find e^J for the matrix with characteristic polynomial C_A(λ) = (λ-2)^3 (λ-3)^3 (λ-1)^2 and geometric multiplicity of each eigenvalue equal to 2.
So, first of all we have to write the Jordan canonical form of such a matrix. So, if I write the Jordan canonical form, the total size of the matrix becomes 3 + 3 + 2 = 8, an 8×8 matrix. So, here λ=2 will occur 3 times on the main diagonal and its geometric multiplicity is 2, so there will be 2 Jordan blocks corresponding to λ=2. If I partition 3 into 2 positive integers, it will become 2 + 1. Here again I am indicating that I do not consider the order 2 + 1 or 1 + 2; I am taking them as the same. However, if you talk about the complete Jordan canonical form, they will be different matrices.
So, here [2 1; 0 2] will be the first Jordan block corresponding to λ=2, and another one will come from here, which will simply be a Jordan block [2] of size 1. Now, come to the next factor: here again for λ=3 the geometric multiplicity is 2, so 2 blocks, and 3 is factored into 2 positive integers as 2 + 1. So, I will be having [3 1; 0 3] and again [3] here. And finally, come to (λ-1)^2: here the algebraic multiplicity of λ=1 is 2 and the geometric multiplicity is 2, so there will be 2 blocks; writing 2 as a sum of 2 positive integers gives 1 + 1. So, then I will be having [1] and [1].
So, this is my matrix J; now what I need is to calculate e^J. (Refer Slide Time: 27:05)
So, this is the corresponding e^J for such a given matrix. So, in this way you can write e^J, and e^A will become S e^J S^(-1). So, in this lecture we have learnt how to evaluate various functions, especially A^n and e^A, for a given matrix A.
This covers both the case when A is diagonalizable and the case when A is not diagonalizable. If you are having a trigonometric function, you can always write the trigonometric function in terms of exponential functions; so you can calculate the exponentials using this process and then, by performing the corresponding operations, you can evaluate the given trigonometric function. So, with this I will end this lecture.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 27
Least Square Approximation
Hello friends. So, welcome to the lecture on least square approximation. So, in this lecture we will learn how to find an approximate solution to a linear system which is either over-determined, and hence does not have an exact solution, or under-determined, and hence has infinitely many solutions; in the latter case we will see which of those infinitely many solutions to pick.
So, first consider the case when A is a square nonsingular matrix. Then A^(-1) exists and the solution of the system AX = b can be written as X = A^(-1) b, and this will be the exact and unique solution of this linear system.
So, this was my first case; now the second case: if A is not square, if A is an m×n matrix, the coefficient matrix has a rectangular shape, and here m > n.
So, m > n means you are having more equations than unknowns; in this case the system is an over-determined linear system. Moreover, assume that the system does not have an exact solution. If the system is not having any exact solution, then we need to find an approximate solution.
Hence, I need to find an approximate solution, and this approximate solution should minimize the residual error; that is, it is the solution with minimum residual error. So, let me define the residual error as E = ||Ax - b||2^2. So, what I need to do is minimize E; I have to find such an x which minimizes E. So, this means I need to minimize (Ax-b)^T (Ax-b), which is nothing but E.
So, since I have to minimize this function, what I need to do is differentiate E partially with respect to x and put it equal to 0. So, basically, E can be written as x^T A^T A x - x^T A^T b - b^T A x + b^T b, and this will be equal to x^T A^T A x - 2x^T A^T b + b^T b, because the two middle terms are the same (each is a scalar and one is the transpose of the other).
Now, ∂E/∂x = 0 will give you 2A^T A x - 2A^T b = 0. So, what I am getting from here is x = (A^T A)^(-1) A^T b.
So, if we write (A^T A)^(-1) A^T as A^+, then the least square solution is given as X = A^+ b, where A^+ = (A^T A)^(-1) A^T.
This A^+ is called the pseudo inverse of A. The solution X = A^+ b, that is the pseudo inverse of A times b, is called the least square approximation, or least square solution, of the over-determined system AX = b.
Why am I calling A^+ a pseudo inverse of A? Because if I take A as a square and invertible matrix, meaning a full rank matrix, then A A^+ will become A (A^T A)^(-1) A^T. Since (A^T A)^(-1) = A^(-1)(A^T)^(-1), this becomes A A^(-1) (A^T)^(-1) A^T.
So, it means this will be I times I, because A is square and invertible, so A^T is also invertible. So, this is I·I = I. So, it is a kind of inverse, only it is defined also in the case of rectangular matrices where the number of rows is more than the number of columns. So, this is the least square approximation of an over-determined system.
So, let us take an example of it. But first I am having a remark: we have assumed that A is an m×n matrix where m > n, and for calculating the pseudo inverse of A I am using the inverse of A^T A. So, here A^T A should be invertible, and what will be the size of A^T A? It will be an n×n matrix.
So, it means the rank of A should be n; only then you can calculate (A^T A)^(-1), apply this pseudo inverse and find the least square approximation in this way. So, please note it: if the rank of A is less than n, I will tell you how to find the least square approximation of the system AX = b in the subsequent lectures.
(Refer Slide Time: 13:38)
Now, let me take an example of it. So, fit the best line to the data points (1, 2), (2, 3), and (3, 5). So, we are having 3 points and we have to fit the best line through these data. So, let us solve this particular example.
So, here assume that the line is given as y = mx + c. So, from the first data point, y is 2 and x is 1, so 2 = m + c. Similarly, from the second data point I can write 3 = 2m + c, and from the third data point I can write 5 = 3m + c. So, this particular system can be written as [1 1; 2 1; 3 1][m; c] = [2; 3; 5].
So, this is my matrix A, this is X, this is b. So, it is a linear system which is over-determined, because I am having 3 equations and my unknowns are 2; to fit the line I have to find the values of m and c.
So, here the number of equations is 3 and the number of variables is 2, so it is an over-determined system. So, now use the least square approximation technique. So, first I will calculate A^T, which comes out to be [1 2 3; 1 1 1].
So, once I calculate A^T, then I will calculate A^T A, which will be a 2×2 matrix. So, here if I calculate A^T A, it comes out to be [14 6; 6 3].
Now, if the rank of this is 2, then I can calculate its inverse. So, here I am calculating the inverse because it is a full rank matrix, full rank meaning rank 2, and the inverse comes out to be [1/2 -1; -1 7/3].
Now, the least square solution of this system can be given as X = (A^T A)^(-1) A^T b. So, (A^T A)^(-1) is [1/2 -1; -1 7/3], A^T is [1 2 3; 1 1 1], b is [2; 3; 5], and if I multiply these 3 matrices it comes out to be [3/2; 1/3].
So, it means m is 3/2 and c is 1/3. So, here the best line is y = (3/2)x + 1/3, which passes closest to these data points. Here the residual error, which is just ||Ax - b||^2 = Σ(yi - m xi - c)^2, is 0.1667, which is the minimum. So, this is the overall idea of applying least square approximation to systems that are over-determined, and here we have taken the case of line fitting.
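A small sketch of this line fit, assuming NumPy is available; np.linalg.lstsq is a library routine that solves the same least square problem and is used here only as a cross-check:

import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([2.0, 3.0, 5.0])

x = np.linalg.inv(A.T @ A) @ A.T @ b          # (A^T A)^(-1) A^T b
print(x)                                       # [1.5, 0.3333...] -> m = 3/2, c = 1/3
print(np.linalg.norm(A @ x - b)**2)            # residual error ~ 0.1667

print(np.linalg.lstsq(A, b, rcond=None)[0])    # same solution from the library solver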
Consider a system Ax = b, where A is an m×n matrix and here m < n, meaning I am having fewer equations than variables. So, it is an under-determined system. This system will be having infinitely many solutions.
Now, we need to pick one of these solutions by finding the smallest one; that is, we have to minimize the norm of x subject to AX = b. So, how to solve it? By using the method of Lagrange multipliers: we have the Lagrangian E = ||X||^2 + λ^T(b - AX). Now, to minimize this and find X, I have to write ∂E/∂X = 0, which gives 2X - A^T λ = 0; let me write this as equation (1).
Now, I cannot find the value of λ directly from here, because A^T is in multiplication with λ and A is a rectangular matrix, so A^T is not invertible and I cannot isolate λ from equation (1) alone.
So, from equation (1), X = (1/2)A^T λ, and using the constraint AX = b I get (1/2)A A^T λ = b; so from here I can write λ = 2(AA^T)^(-1) b, provided the inverse of AA^T exists. And it will exist when the matrix A has rank m; if the rank is less than m, then this inverse will not exist because then AA^T will be singular.
Now, putting this value of λ back in equation (1), I will be having X = (1/2)A^T λ = A^T(AA^T)^(-1) b. So, if I write this particular matrix A^T(AA^T)^(-1) as A^+, then this will become X = A^+ b.
So, here the matrix A^+ = A^T(AA^T)^(-1) is called the pseudo inverse of A in the under-determined case. So, please see the difference: in the over-determined case I was having this matrix as (A^T A)^(-1) A^T; here I am having A^T(AA^T)^(-1).
So, with this, the least square solution is given as X = A^+ b. So, what we have learnt from here is that if the system is over-determined, then the solution is given as X = (A^T A)^(-1) A^T b.
If the system is under-determined, meaning fewer equations than unknowns, then the least square solution is given as X = A^T(AA^T)^(-1) b. You might be wondering why I have not used the earlier form in the case of the under-determined system. Because, if I use that form for an under-determined system, then my matrix A^T A will be of size n×n with rank less than or equal to m, and m < n in the under-determined case.
So, hence this matrix will always be singular and you cannot calculate (A^T A)^(-1); that is why we have taken this form of the solution. Because now we have taken AA^T, which will be an m×m matrix with rank less than or equal to m. So, if the rank = m then you can apply this method of least square approximation in this form. If the rank is less than m, then how to find the least square solution I will discuss in the lecture after the next one.
So, this is the under-determined form, meaning the least square solution for the under-determined system.
Let us take an example of it: find the least square approximation of the linear system 2x1 + x2 + x3 = 4 and 2x1 - x2 + x3 = 2. So, here A is [2 1 1; 2 -1 1]; now AA^T will become [6 4; 4 6], which is invertible. So, from here I can calculate (AA^T)^(-1), which comes out to be [0.3 -0.2; -0.2 0.3].
And now the least square approximation is X = A^T(AA^T)^(-1) b, and this comes out to be [1.2 1.0 0.6]^T, meaning x1 = 1.2, x2 = 1, x3 = 0.6.
One solution of this system you can see directly: x1 = 1, x2 = 1 and x3 = 1. This solution satisfies these two equations exactly; however, we are getting a different solution using this least square approximation technique. Why? Because, while deriving this method, I told you that out of the infinitely many solutions I have to pick the solution having the minimum norm of x. And the norm of x is smaller for [1.2, 1.0, 0.6] than for [1, 1, 1], and in fact it is the minimum over all solutions satisfying these two equations.
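A small check of this example, assuming NumPy is available; np.linalg.pinv is a library pseudo inverse used here only for comparison:

import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [2.0, -1.0, 1.0]])
b = np.array([4.0, 2.0])

x_min = A.T @ np.linalg.inv(A @ A.T) @ b           # A^T (A A^T)^(-1) b
print(x_min)                                        # [1.2, 1.0, 0.6]
print(np.allclose(x_min, np.linalg.pinv(A) @ b))    # True

# The "obvious" solution [1, 1, 1] also satisfies both equations but has a larger norm.
print(np.linalg.norm(x_min), np.linalg.norm([1.0, 1.0, 1.0]))   # ~1.673 vs ~1.732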
So, in this lecture we have learnt least square approximation for over-determined as well as under-determined systems. We have taken a couple of examples of such types of systems and seen how to apply least square approximation to them. However, a few questions are left unanswered here: if (A^T A)^(-1) in the case of an over-determined system, or (AA^T)^(-1) in the case of an under-determined system, does not exist, then how to apply these techniques?
I will speak about it in the lecture after the next one, because for finding the approximate solution we have to use the singular value decomposition tool; that I will discuss in the next lecture, and then how to find the least square approximation using singular value decomposition will come in the lecture after that.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 28
Singular Value Decomposition
Hello friends. So, welcome to the lecture on Singular Value Decomposition. So, singular value decomposition, or SVD in short, is a very powerful tool of linear algebra and matrix analysis. It has a variety of applications ranging from data analysis and low-rank approximation to image and signal processing.
So, the definition of SVD: the singular value decomposition of a matrix A of order m×n is the factorization of A into the product of three matrices, that is U, S and V, such that A = U S V^T, where U and V have orthonormal columns and the matrix S is diagonal with non-negative real entries.
(Refer Slide Time: 01:38)
So, what I want to say is: the matrix A has size m×n, and this matrix A I am writing as the product U S V^T. Here, the matrix U is of size m×n and all the n columns of U are orthonormal, the matrix S is an n×n diagonal matrix, and the matrix V^T is n×n. So, if I am having a matrix A of size m×n, where from the size itself you can see A has more rows than columns, then it will be equal to an m×n matrix U, times an n×n diagonal matrix S, times V^T. So, this is called the reduced singular value decomposition.
Another type of singular value decomposition is called the full SVD or full singular value decomposition. There again we write A as the product of U, S and V^T, but here U is an orthogonal matrix of size m×m (if A is m×n), V is also orthogonal, of size n×n, and the matrix S is of size m×n. So, this is the full singular value decomposition, and the other one is the reduced singular value decomposition. So, let us see this further.
(Refer Slide Time: 04:47)
So, here I am saying A is an m×n matrix. So, in the reduced form it becomes a matrix U having columns u1, u2, up to un, where each ui belongs to the m-dimensional space, meaning it has m components; then I am having the matrix S, which is an n×n diagonal matrix having diagonal entries σ1, σ2 up to σn; and then I will be having the matrix V^T, which is again an n×n matrix, so V has n columns and each vi belongs to the n-dimensional real vector space, meaning each column is a vector of that space. Moreover, the ui's are orthonormal as well as the vi's are orthonormal; so, these are orthonormal vectors.
Here σ1, σ2, ..., σn will appear in a non-increasing order, σ1 ≥ σ2 ≥ ... ≥ σn; moreover all will be non-negative. These σ's are called the singular values of A. Hence, I can say singular values are non-negative; furthermore, they are not only non-negative, they are also real.
Now, this is the case of the reduced SVD. If I talk about the full SVD, then I am having A, which is of size m×n, and I am writing a matrix U of size m×m; so here I will be having m columns and each ui will be having m components, and now the matrix S will be of size m×n.
So, let us take the case m > n first, meaning you are having more rows than columns in matrix A. Then S will be written as [σ1 0 ... 0; 0 σ2 ... 0; ...; 0 ... 0 σn; 0 0 ... 0; ...; 0 0 ... 0], that is, an n×n diagonal sub-matrix on top and then m-n zero rows, making it an m×n matrix. Finally, I will be having an n×n orthogonal matrix V^T, where the columns of V are v1, v2, ..., vn and each vi belongs to R^n.
Now, if m < n, then everything will remain the same, only S will change. So, now the shape of S will be like this: an m×m diagonal sub-matrix [σ1 0 ... 0; 0 σ2 ... 0; ...; 0 ... 0 σm] and then n-m zero columns to make it m×n. And then finally V^T, which is again an orthogonal matrix; orthogonal matrix means all the vi's make an orthonormal set, just as all the ui's make an orthonormal set.
Now, this is the SVD in both the cases, meaning what will be the shape of matrix U, what will be the shape of matrix S and what will be the shape of matrix V. Now, how to compute them?
So, how to do SVD; basically, how to find these three matrices? So, how to find the matrix U? Basically, the matrix A is of size m×n, so if I take the matrix AA^T, the size of this matrix will be of order m×m. Furthermore, this will be a symmetric matrix.
So, the eigenvalues of this matrix AA^T will be real; moreover, it will have an orthonormal set of eigenvectors since it is symmetric. So, the columns of the matrix U will be the orthonormal eigenvectors of AA^T; that is what I want to say, the columns of U are the orthonormal eigenvectors of AA^T. So, in this way we will be able to compute the matrix U, which is an orthogonal matrix.
Now, how to compute the matrix V, which is again orthogonal, of size n×n? Basically, the columns of V are the orthonormal eigenvectors of A^T A, and this you can prove quite easily because A is U S V^T. So, AA^T will become U S V^T (U S V^T)^T, which comes out to be U S V^T V S^T U^T; since V^T V = I, this becomes U(S S^T)U^T, where S S^T is an m×m diagonal matrix of the shape [σ1^2 0 ... 0; 0 σ2^2 ... 0; ...; 0 ... 0 σm^2]. So, here you can see I can write AA^T as U times this diagonal matrix times U^T, and the eigenvalues of AA^T will be σ1^2, σ2^2, ..., σn^2 (together with zeros for the remaining entries when m > n).
Hence, it is a diagonalization of AA^T, where the eigenvectors are the columns of the matrix U, and hence the columns of U are the orthonormal eigenvectors of AA^T. The similar kind of analysis we can do for the columns of V: they are the orthonormal eigenvectors of A^T A.
Now, how to find S? So, S will contain the singular values of the matrix A, and the singular values σ1, σ2 up to σn are the square roots of the eigenvalues of AA^T or A^T A, because both have the same nonzero eigenvalues; whichever of the two has the bigger size, its remaining eigenvalues will be 0, due to rank deficiency. So, in this way we can calculate U, we can calculate V, we can write our matrix S and we can perform the singular value decomposition of A, that is U S V^T.
So, let us take an example of it. Find the singular value decomposition of A = [2 2; 1 1]. So, it is a square matrix, taken for simplicity; I will take an example of a rectangular matrix also. So, here AA^T will become [2 2; 1 1][2 1; 2 1], which comes out to be [8 4; 4 2]. So, the eigenvalues of this AA^T come out to be λ = 10 and 0. Here the eigenvector corresponding to λ = 10 is obtained from (AA^T - 10I)x = 0; the solution of this gives x1 = 2x2.
So, taking x2 = 1, the eigenvector will become [2 1]^T. Similarly, the eigenvector corresponding to λ = 0 is obtained from (AA^T - 0I)x = 0; this gives me 2x1 = -x2, so from here, if I take x1 = 1, the eigenvector is [1 -2]^T. One more thing is very important here: we have to make these eigenvectors orthonormal, so we have to divide them by their norms. The norm of each of these vectors is √5, so my matrix U will become (1/√5)[2 1; 1 -2]. So, this is the matrix U.
Now, I will find the matrix V. So, for that I will take A^T A. So, A^T A will become [2 1; 2 1][2 2; 1 1], which comes out to be [5 5; 5 5]. So, here again the eigenvalues come out to be 10 and 0, because AA^T and A^T A have the same eigenvalues (the eigenvalues of AB and BA are the same). Moreover, if I find the eigenvector corresponding to λ = 10, it comes from x1 = x2, which gives me the eigenvector [1 1]^T; similarly, if I take λ = 0, I get x1 = -x2, so from here I get another eigenvector [1 -1]^T.
So, what I am having is U = (1/√5)[2 1; 1 -2] and V = (1/√2)[1 1; 1 -1]. Now, I need to find the matrix S. As I told you, σ1, which is the bigger singular value, will be the square root of the biggest eigenvalue of AA^T or A^T A, which is 10. So, S = [√10 0; 0 0], because the square root of 0 is 0.
So, here A has the singular value decomposition U S V^T, where U = (1/√5)[2 1; 1 -2], S = [√10 0; 0 0], and V = (1/√2)[1 1; 1 -1]. So, this is the singular value decomposition of a square matrix.
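A small numerical check of this hand computation, assuming NumPy is available (np.linalg.svd is used only to confirm the singular values, not as the lecture's method):

import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 1.0]])

U = np.array([[2.0, 1.0],
              [1.0, -2.0]]) / np.sqrt(5)
S = np.array([[np.sqrt(10.0), 0.0],
              [0.0, 0.0]])
V = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

print(np.allclose(U @ S @ V.T, A))         # True: the hand-computed factors reproduce A
print(np.linalg.svd(A, compute_uv=False))  # singular values ~ [3.1623, 0] = [sqrt(10), 0]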
(Refer Slide Time: 23:39)
So, the next example is: find the singular value decomposition of the matrix A given as [1 0 1 0; 0 1 0 1]. So, let us solve this. So, here AA^T will be a 2×2 matrix, [2 0; 0 2], and A^T A will become [1 0 1 0; 0 1 0 1; 1 0 1 0; 0 1 0 1].
Now, let us first find U, for which I have to use AA^T. So, the eigenvalues of AA^T will become 2 and 2; λ = 2, 2, and the eigenvectors will become u1 = [1 0]^T and u2 = [0 1]^T. So, here my matrix U comes out to be [1 0; 0 1]. Now, my σ1 will become √2 and σ2 will also become √2, because they are the square roots of the eigenvalues of AA^T.
Now, I am having A = U S V^T, where U and V are orthogonal matrices. So, if I premultiply by U^T, I will be having U^T A = S V^T; in a better way, taking the transpose of both sides, A^T U will become V S^T. So, from here I can write vi, that is the i-th column of V, as A^T ui/σi, from this relation, because S is a diagonal matrix, so when I work column-wise σi comes in the denominator.
So, now if I calculate v1 from here, v1 will become A^T u1/σ1; A is given to you, so I can calculate it, and this comes out to be (1/√2)[1 0 1 0]^T. From here if I calculate v2, which will become A^T u2/σ2, it comes out to be (1/√2)[0 1 0 1]^T. As you know, V is a 4×4 matrix, so there will be four columns, and I have just calculated only two columns; so I need to calculate two more columns.
So, if I am able to find two more orthonormal vectors, that is v3 and v4, such that the set v1, v2, v3 and v4 makes an orthonormal set, then my job will be done. Please note that here I am not going by the classical process, meaning finding the eigenvalues of A^T A and then finding the eigenvectors corresponding to each eigenvalue; I want to make a shortcut, and that I have taken from this particular relation. So, I need to choose two more vectors v3 and v4 such that they make an orthonormal set. So, if I choose v3 = (1/√2)[1 0 -1 0]^T and v4 = (1/√2)[0 1 0 -1]^T, then you can see that v1, v2, v3 and v4 are orthonormal, and these can serve as the columns of the matrix V.
So, here my matrix V comes out to be [1/√2 0 1/√2 0; 0 1/√2 0 1/√2; 1/√2 0 -1/√2 0; 0 1/√2 0 -1/√2]; hence clearly V is an orthogonal matrix. Now, the only thing left is to write the matrix S. So, as I told you, since my original matrix is of size 2×4, S will also be of size 2×4 and will contain the singular values: [√2 0 0 0; 0 √2 0 0], with zero columns, as I told you in the definition of the full SVD.
So, here I am having my U, S and V, and the matrix A will be equal to U S V^T, where U is this one, S is this one and V^T can be obtained from this V. So, this is another way of doing the singular value decomposition without computing the eigenvectors of the bigger matrix, but here you have to choose these vectors very carefully, because they must make an orthonormal set.
So, there is a remark on singular value decomposition: for a positive definite matrix, meaning if A is a (symmetric) positive definite matrix, then the SVD is identical to Q S Q^T, where Q contains the orthonormal eigenvectors; so it is identical to a familiar factorization, namely the diagonalization of A. Basically it will be the diagonalization only.
(Refer Slide Time: 32:10)
Some facts about singular value decomposition: if A is a real matrix of size m×n and the singular value decomposition of A is U S V^T, where U is an m×m matrix and V is an n×n matrix, both orthogonal, then the rank of the matrix A will be the number of nonzero singular values of A, that is the rank of S, say r. So, the rank of A is the number of nonzero singular values; since singular values are non-negative, they may be nonzero or zero, and the number of nonzero ones gives the rank of the matrix A.
The column space of A is spanned by the first r columns of U; that is, the first r columns of U form a basis for the column space of A. The null space of A is spanned by the last n-r columns of V, meaning the solutions of the system Ax = 0, that is the homogeneous linear system, are given by the last n-r columns of V, and those are the columns corresponding to zero singular values. The row space of A is spanned by the first r columns of V. And similarly, the null space of A^T is spanned by the last m-r columns of U.
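These facts are easy to read off from a computed SVD. A minimal sketch, assuming NumPy is available (the tolerance 1e-10 used to decide which singular values count as nonzero is an illustrative choice):

import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])          # the 2x4 example from above

U, s, Vt = np.linalg.svd(A)                    # full SVD: U is 2x2, Vt is 4x4
r = int(np.sum(s > 1e-10))                     # rank = number of nonzero singular values

col_space  = U[:, :r]          # first r columns of U: basis of the column space of A
null_space = Vt[r:, :].T       # last n-r columns of V: basis of the null space of A
row_space  = Vt[:r, :].T       # first r columns of V: basis of the row space of A
left_null  = U[:, r:]          # last m-r columns of U: basis of the null space of A^T

print(r)                                   # 2
print(np.allclose(A @ null_space, 0))      # True: null space vectors solve Ax = 0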
So, in this lecture we have learnt singular value decomposition. We have taken a couple of examples of how to do the singular value decomposition of a matrix. In the next lecture, we will find a relation between least square approximation and singular value decomposition. So, these are the references.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 29
Pseudo-Inverse and SVD
Hello friends. So, welcome to the lecture on Pseudo-Inverse and Singular Value Decomposition. Earlier we have learned the least square approximation of over-determined and under-determined systems. However, we have used the matrix A^T A in the case of an over-determined system and AA^T in the case of an under-determined system.
And, in particular, for finding the pseudo inverse we have used the inverses of these 2 matrices. At that point I told you that if the inverse does not exist, then we will discuss that case later. So, in this lecture we will discuss that case.
So, let AX = b be an over-determined system, where A is an m×n matrix with m > n, because it is an over-determined system. So, we have learnt that the least square approximation of this can be written as X = A^+ b, where A^+ is the pseudo inverse of A and it is defined as (A^T A)^(-1) A^T. Now, if the rank of the matrix A is less than n, this implies that A^T A is singular, the determinant of A^T A is 0, and hence (A^T A)^(-1) does not exist.
So, now the question is how to find the least square approximation of such a system. For doing this we will make use of the singular value decomposition for calculating the pseudo inverse in this particular case. So, let A = U S V^T be the singular value decomposition of the matrix A, where U is an m×m orthogonal matrix, V is an n×n orthogonal matrix, and the matrix S is an m×n matrix containing the singular values of A; this means we are using the full singular value decomposition here.
So, now we are having the system AX = b, so it can be written as U S V^T X = b; from here, formally, X = (U S V^T)^(-1) b = (V^T)^(-1) S^(-1) U^(-1) b, where in place of S^(-1), which may not exist, I am writing S^+. So, from here I can write X = V S^+ U^T b, because V is an orthogonal matrix, so V^T is also orthogonal and (V^T)^(-1) = (V^T)^T = V, and U^(-1) = U^T because U is also an orthogonal matrix. Now, you can easily write V and U^T, because these you are having already in the singular value decomposition of A.
The matrix S^+ is defined as follows: if σij are the entries of S, then the entries of S^+ are σij^+ = 0 if σij = 0 and σij^+ = 1/σij if σij is nonzero, with S^+ of size n×m.
So, for example, suppose you are having the matrix S, which is m×n, and the rank of the matrix A is r, with r less than the minimum of m and n; so it is less than n in the case of an over-determined system and less than m in the case of an under-determined system. In both cases A^T A as well as AA^T will be singular matrices, and hence the inverses of these 2 matrices do not exist.
So, in this case my S will be something of this shape: [σ1 0 ... 0; 0 σ2 ... 0; ...; 0 ... σr ... 0; 0 ... 0], an m×n matrix where the nonzero part is the r×r diagonal block with σ1, ..., σr and the remaining rows and columns are 0 (in the over-determined case there are m-r zero rows).
In this case, your S^+ will become an n×m matrix having 1/σ1, 1/σ2, ..., 1/σr in the first r diagonal entries, because these σi are nonzero, and 0 everywhere else; where the rule would give 1/0, we replace it by 0, as we will see in an example. If you now multiply S by S^+, you get an r×r identity block and 0 blocks accordingly: in the first r diagonal entries there will be 1 and at the rest of the places it will be 0.
So, hence in this way we can define the pseudo inverse of S as S^+, and the pseudo inverse of A will become A^+ = V S^+ U^T. So, let us take an example of this.
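Before coming to the example, here is a minimal sketch of A^+ = V S^+ U^T, assuming NumPy is available; the function name pinv_from_svd and the tolerance 1e-10 are my own illustrative choices, and np.linalg.pinv is used only as a cross-check:

import numpy as np

def pinv_from_svd(A, tol=1e-10):
    U, s, Vt = np.linalg.svd(A)                 # full SVD: A (m x n) = U S V^T
    S_plus = np.zeros((A.shape[1], A.shape[0])) # S^+ is n x m
    for i, sigma in enumerate(s):
        if sigma > tol:                         # invert only the nonzero singular values; 1/0 -> 0
            S_plus[i, i] = 1.0 / sigma
    return Vt.T @ S_plus @ U.T                  # A^+ = V S^+ U^T

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
print(np.allclose(pinv_from_svd(A), np.linalg.pinv(A)))   # True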
Consider the example of fitting a line through the data points (1, 2), (2, 3), and (3, 5). So, let us solve this example using the approach which I told you just now. So, here let the equation of the line be y = mx + c, and I am having the points (x1, y1), (x2, y2), (x3, y3).
So, from the three equations, in matrix form I can write [x1 1; x2 1; x3 1][m; c] = [y1; y2; y3]; please note that in each equation the coefficient of c is 1. So, if this is my matrix A, this is X and this is b, then my matrix A becomes [1 1; 2 1; 3 1], since x1 = 1, x2 = 2, x3 = 3, and b becomes the right hand side vector [y1; y2; y3] = [2; 3; 5].
So, now I need to find the values of m and c. It is an over-determined system; if I perform the SVD of A, then the matrix A can be written as U S V^T, where the matrix U will be a 3×3 matrix, because m is 3 here and A is 3×2, and its entries are [-0.3231 0.8538 0.4082; -0.5475 0.1832 -0.8165; -0.7719 -0.4873 0.4082].
Here, the matrix S will become a 3×2 matrix, having 4.0791 as the first and biggest singular value of A, then σ2 = 0.6005, and then m-n = 3-2 = 1 zero row. And finally, V will be a 2×2 matrix whose entries are [-0.9153 -0.4027; -0.4027 0.9153]; here U and V are orthogonal matrices.
And we have learnt in the previous lecture how to calculate these matrices. An alternate way of doing it is using the MATLAB software, where you have a direct command for finding the singular value decomposition: if you execute [U, S, V] = svd(A) in MATLAB, you will get these 3 matrices. So, this is just an alternative to save the calculation effort.
So, I am having now these 3 matrices. So, now my pseudo inverse of A will become A^+ = V S^+ U^T; here V is this matrix, and S^+ will now be a 2×3 matrix, [1/4.0791 0 0; 0 1/0.6005 0], times U^T. And if I calculate this matrix A^+, it will be a 2×3 matrix, because A is a 3×2 matrix, and it comes out to be [-0.5 0 0.5; 1.3333 0.3333 -0.6667]. So, this is my A^+.
So, now if I calculate my X, which will become A^+ b with the vector b = [2; 3; 5], it comes out to be X = [1.5 0.3333]^T as the least square approximation of this system. So, in this way we can calculate the least square solution using the singular value decomposition; in particular, we have used the singular value decomposition for calculating the pseudo inverse.
Now, let us make an analysis of how we have done this LSA with SVD; this LSA I can also write as LLS, meaning linear least square solution. So, for doing this, again consider AX = b, where A is an m×n matrix with rank of A = r, which is less than the minimum of m and n. Then, if it is an over-determined system, the least square solution is obtained by minimizing ||AX - b||2^2, and we have to find such an X.
So, if I look at this particular quantity, I can write ||Ax - b||2^2 = ||U^T(A V V^T x - b)||2^2; here you know that V V^T = I, so A V V^T x is simply Ax and the bracket term is the same as Ax - b. Moreover, I am premultiplying this particular vector by U^T, and in the singular value decomposition of A you know that U and V are orthogonal matrices, so multiplying by U^T does not change the norm.
Now U^T A V = S, so this becomes ||S V^T x - U^T b||2^2; let V^T x = z, another vector. So, from here this can be written as ||S z - U^T b||2^2, or, if I open this norm, it can be written as the sum over i = 1 to r of (σi zi - ui^T b)^2 plus the sum over i = r+1 to m of (ui^T b)^2, where the ui are the columns of the matrix U. Why am I taking the first sum only up to r? Because the matrix A has only r nonzero singular values; after that, all σi with i > r become 0, so for those indices the term becomes simply (ui^T b)^2.
So, from here I can write zi = ui^T b/σi for i = 1, 2 up to r, from the first sum, and zi is arbitrary for i = r+1 up to n. As a result, if zi equals this for the first r indices, then the first sum becomes 0; this choice minimizes the error, and the least square error of this solution will become the sum over i = r+1 to m of (ui^T b)^2. Also, recall that z = V^T X, so from here I can write X = Vz, because V is an orthogonal matrix.
Moreover, consider the length of the vector X: X = V V^T X, because V V^T is an identity matrix, and V^T X is z, so X = Vz; and since V is an orthogonal operator, it preserves the norm, so ||X|| = ||z||. So, taking the arbitrary components of z to be 0 gives the smallest ||X||, and hence the least square approximation of AX = b is given as X*, the solution X* = sum over i = 1 to r of (ui^T b/σi) vi, because the zi are these coefficients and X = Vz.
So, in terms of singular values I can write the least square solution in this way. So, let us verify it on the earlier example which we have taken for fitting a line.
There X*, that is the least square solution, can be given as the sum over i = 1 to 2, because we are having only 2 nonzero singular values, of (ui^T b/σi) vi. So, this will become (u1^T b/σ1)v1 + (u2^T b/σ2)v2. And if I calculate the first scalar value, u1^T b/σ1 comes out to be -1.5072, which is multiplied with the first column of V, that is [-0.9153, -0.4027]^T; and the second scalar, u2^T b/σ2, comes out to be -0.298, which is multiplied with the second column of V, that is [-0.4027, 0.9153]^T.
So, if I do it, the result comes out to be [1.4995 0.3334]^T, which is the same as the [1.5 0.333]^T which we obtained earlier with X = A^+ b; hence the claim is verified. So, this is another way of doing the analysis of least square approximation using the singular value decomposition, and here we have seen that both answers are equal.
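A small sketch of this summation formula X* = Σ (ui^T b/σi) vi, assuming NumPy is available; the tolerance 1e-10 for deciding nonzero singular values is an illustrative choice:

import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([2.0, 3.0, 5.0])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                      # number of nonzero singular values

x_star = np.zeros(A.shape[1])
for i in range(r):
    x_star += (U[:, i] @ b / s[i]) * Vt[i, :]   # (u_i^T b / sigma_i) v_i

print(x_star)                                   # ~ [1.5, 0.3333]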
Now, take one more example, where the determinant of the matrix A is 0.
So, example 2: solve x1 - 2x2 + x3 = 3, 2x1 - 4x2 = 0, and x1 - 2x2 + 3x3 = 9, where solve means find the minimum-norm solution, the least square approximation of this system. Here, the matrix A is given as [1 -2 1; 2 -4 0; 1 -2 3] and the right hand side vector b is [3; 0; 9]. If you check, the determinant of A comes out to be 0, and hence you cannot obtain A^(-1) and you cannot find the exact solution as X = A^(-1) b. So, here if I perform the singular value decomposition of A, then A will become U S V^T, where U again will be a 3×3 matrix.
again will be A 33 matrix.
So, this is my matrix U the matrix S will be a diagonal matrix, a 33 diagonal matrix
having singular values of A. So, first singular value is 5.7807, second singular value is
2.5659 and then third singular value is 0, because determinant is 0. So, at least 1 of the
singular value will be 0.
And finally, the matrix V is given as [-0.4178 -0.1596 0.8944; -0.8355 0.3192 0.4472; -0.3568 0.9342 0]. So, here the pseudo inverse A^+ will become V S^+ U^T. So, you can write V from here; S^+ will become [1/5.7807 0 0; 0 1/2.5659 0; 0 0 0], where the last entry, according to the rule, should be 1 upon 0, but 1 upon 0 is not defined, so, as I told you, for writing S^+ the 1/0 is replaced by 0. Then U^T, and this matrix A^+ will be a 3×3 matrix; the solution is then given as X = A^+ b, which comes out to be [0 0 3]^T. So, this is the minimum-norm solution, that is the least square approximation of this system using this approach.
And, if I instead use the summation formula, the first term will be a scalar multiplied with the first column of V, the next will again be a scalar multiplied with the second column of V, and this again comes out to be [0 0 3]^T, which is the same as X = A^+ b using the earlier method. So, in this way I have told you two different ways of solving linear systems, in particular finding the least square solution of a linear system in the case when the matrix A has rank r, the size of A is m×n, and r is less than the minimum of m and n.
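A small sketch of this rank-deficient example, assuming NumPy is available; np.linalg.pinv builds the pseudo inverse from the SVD internally and is used here only as a convenient check:

import numpy as np

A = np.array([[1.0, -2.0, 1.0],
              [2.0, -4.0, 0.0],
              [1.0, -2.0, 3.0]])
b = np.array([3.0, 0.0, 9.0])

print(np.linalg.det(A))            # ~ 0: A is singular, so X = A^(-1) b is not available
x = np.linalg.pinv(A) @ b          # minimum-norm least square solution
print(x)                           # ~ [0, 0, 3]
print(np.allclose(A @ x, b))       # True: for this b the minimum-norm solution is even exact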
So, in the next lecture we will learn about another type of systems, those called ill-conditioned systems, and then we will learn how to solve those systems using the concept of singular values.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 30
Introduction to Ill-Conditioned Systems
Hello friends. So, welcome to the lecture on Introduction to Ill-Conditioned Systems. So, in this lecture we will learn when a system will be called ill-conditioned; apart from that, we will learn how to measure the ill-conditioning of a given system. So, for introducing ill-conditioned systems, let us take one example.
So, consider a 2×2 system: x1 + x2 = 2 and x1 + 1.0001x2 = 2.0001. So, here the coefficient matrix is [1 1; 1 1.0001], the unknown variable column is [x1; x2] and the right hand side column vector is [2; 2.0001]. Now, the solution of this system is x1 = 1 and x2 = 1; this you can verify. Now, make a small change in one of the entries of the right hand side vector: earlier you were having the vector [2 2.0001]^T.
Let us say that due to some sensor error, or due to some rounding in the computer, this 2.0001 is read as 2. So, now the new system is [1 1; 1 1.0001][x1; x2] = [2; 2], and if we see the solution of this system, it is x1 = 2 and x2 = 0. So, what we have seen is that if I make a small change in one of the entries of the right hand side vector, then my solution has a large change, because the earlier solution was x1 = 1, x2 = 1, but due to this small change it now becomes x1 = 2 and x2 = 0. In the same way, take
one more example. So, here I am having again a 2×2 system, where the coefficient matrix is [400 -201; -800 401], the unknown vector is again [x1; x2] and the right hand side vector is [200; -200].
In example 1 we made a small change in an entry of the right hand side vector; now let us make a small change in one of the entries of the coefficient matrix. So, this is the 2×2 system; if I look for the solution of this system, the solution is given as x1 = -100 and x2 = -200. Now I make a small change in an entry of the coefficient matrix, because in the earlier example I made the change in the right hand side vector, and now I am making the change in the coefficient matrix.
So, let us say the a11 entry, which earlier was 400, is now made 401. So, the new system is [401 -201; -800 401][x1; x2] = [200; -200], with the right hand side vector the same. So, now the solution of this new system becomes x1 = 40000 and x2 = 79800.
So, now just notice: I made only a small change, earlier it was 400 and now it becomes 401, and see how much change I am getting in my solution: earlier x1 was -100, now x1 becomes 40000; x2 was -200, now x2 becomes 79800. So, what we have observed is that if you make a very small, tiny change either in an entry of the right hand side vector or in an entry of the coefficient matrix, we are getting a very large change in our solution. Such types of systems are called ill-conditioned systems.
(Refer Slide Time: 08:04)
So, if we make a small change in the input data of the linear system, which can be an entry of the right hand side vector or of the coefficient matrix, and get a large change in the solution, which is the output, then the system is called an ill-conditioned system. Now, let us see how to measure the ill-conditioning of a system; for that we need to define the condition number.
So, let A be an invertible matrix. Then the condition number of A is denoted by K(A) and it is defined as K(A) = ||A|| ||A^(-1)||. Now, when the condition number K(A) of the coefficient matrix becomes large, the system AX = b is regarded as ill-conditioned.
So, you find the condition number of the coefficient matrix, and if it is very large then we say the system is ill-conditioned; if the condition number of the coefficient matrix A is near to one, then the system is called well conditioned. So, hence we can measure the ill-conditioning of a system by calculating the condition number, and if it is quite large then the system is ill-conditioned.
(Refer Slide Time: 12:53)
So, now if I find the condition number of the coefficient matrix A of the first example, [1 1; 1 1.0001], it will be ||A|| ||A^(-1)||, which comes out to be approximately 40002. So, hence it is quite large and that is why the system Ax = b is ill-conditioned; a similar kind of analysis we can make for the other matrix. An alternate definition of the condition number is: if σ1 ≥ σ2 ≥ ... ≥ σn > 0 are the singular values of an invertible matrix A, so all are greater than 0, then the condition number of A is the ratio of the largest singular value to the smallest singular value.
So, if we see the above example, the singular values of A: if A = U S V^T, then S will be approximately [2.00005 0; 0 0.00005]. So, from here the condition number K(A) will be 2.00005/0.00005, which is approximately the same 40002 which we obtained with the definition ||A|| ||A^(-1)||. So, by this alternate definition you can calculate the condition number if you know the singular values of a given matrix.
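A small numerical sketch of both definitions, assuming NumPy is available; np.linalg.cond is a library routine that computes the same quantity and is shown only as a cross-check:

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))   # ~ 40002
print(s[0] / s[-1])                                                  # ~ 40002, ratio of singular values
print(np.linalg.cond(A))                                             # same value from the library

# A tiny change in b produces a large change in the solution, as in the example.
b1, b2 = np.array([2.0, 2.0001]), np.array([2.0, 2.0])
print(np.linalg.solve(A, b1), np.linalg.solve(A, b2))                # [1, 1] vs [2, 0]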
Now, let us make some investigation into why we are having this large change in the solution when we make a very tiny change in the input data, meaning either an entry of the coefficient matrix or an entry of the right hand side vector.
So, first we will check the case where you have a small change b + δb in b, meaning in the right hand side vector. So, let X be the solution of the original system AX = b; by making a small change in b, that is b becomes b + δb, we get a new solution X + δX. So, what we are having is A(X + δX) = b + δb.
So, this I can write as AX + AδX = b + δb, but as you know AX = b, so this AX I can replace with b, and b will cancel out from both sides: AδX = δb, or δX = A^(-1)δb, so ||δX|| = ||A^(-1)δb|| ≤ ||A^(-1)|| ||δb|| -----(1). Moreover, we have ||b|| = ||AX|| ≤ ||A|| ||X||, which I can write the other way as ||A|| ||X|| ≥ ||b|| -----(2).
Dividing (1) by (2) we get ||δX||/(||A|| ||X||) ≤ ||A^(-1)|| ||δb||/||b||, because the left hand side of (1) is ||δX|| and the left hand side of (2) is ||A|| ||X||. If I multiply this inequality by ||A||, I will get ||δX||/||X|| on the left hand side, which is less than or equal to ||A|| ||A^(-1)||, and ||A|| ||A^(-1)|| is my condition number, times ||δb||/||b||; that is, ||δX||/||X|| ≤ K(A) ||δb||/||b||.
So, from here you can observe that if the condition number is very large, then even with a small δb I may get a large δX. This means that if K(A) is very large, then the product of the small relative change ||δb||/||b|| with the large K(A) can still be significant, and that is why I can get a significant δX, meaning a significant change in my final solution. So, this is the investigation when I am making the change in the right hand side vector, and from here we have seen why, for a large condition number, a small change in the right hand side vector can give a large change in the solution.
Now, take the second case, when I am making a small change in my coefficient matrix; meaning, if I am having the original coefficient matrix A and now I am making a small change so that it becomes A + δA, and if earlier my solution was X and now it becomes X + δX, then what I will be having is (A + δA)(X + δX) = b, because there is no change in the right hand side vector. By making the same kind of analysis which we have done earlier in the first case, I will get ||δX||/||X + δX|| ≤ K(A) ||δA||/||A||.
So, again, if the condition number of A is quite large, a small ||δA||/||A|| multiplied with a large K(A) can give you a very large value, and that is why I am getting a large change δX. So, these two investigations show why the condition number is so important for ill-conditioning; and if the condition number is near about 1, then a small change multiplied with a factor which is quite close to 1 will give you only a small change δX.
In this lecture we have learnt what ill-conditioned systems are, what the condition number is, what the relation between the condition number and the singular values is, and we have investigated why a system becomes ill-conditioned when we are having a large condition number.
In the next lecture we will learn how to solve such ill-conditioned systems by defining a proper regularization term.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 31
Regularization of Ill-Conditioned Systems
So, in this lecture we will learn regularization of ill-conditioned systems, meaning how to solve ill-conditioned systems by defining a suitable regularization term.
So, if you recall the earlier lecture, there we have learnt that if we are having a system AX = b, with singular value decomposition of A as U S V^T, then the least square solution of this system is obtained by minimizing ||AX - b||; then, by making some calculation and defining Z = V^T X, we have seen that zi = ui^T b/σi for i = 1, 2 up to r, and arbitrary for i = r + 1 up to n.
Later on, by seeing that ||Z|| = ||X||, we have written the least square approximation as X* = sum over i = 1 to r, where r is the number of nonzero singular values, of (ui^T b/σi) vi. Now, just look here: it will become (u1^T b/σ1)v1 + (u2^T b/σ2)v2 + ... + (ui^T b/σi)vi + ... + (ur^T b/σr)vr.
Now, if any one of the σi is small, or I will say very small, then a small change in b gives a large change in the solution X*. This is because ui^T b is in the numerator and σi, which is very small, close to 0, is in the denominator, and if you divide something by a very small value close to 0, you will get a large change. That is also why the condition number is σ1/σr, meaning the biggest singular value upon the smallest singular value: if σr is very small, this ratio is quite large and the system is ill-conditioned. And the same kind of analysis we can make here: if σr is very small and you make a small change in b, you will get a large change in X*, meaning your system will be ill-conditioned.
Now how to avoid this problem of ill-conditioned system, that we will learn in this
particular lecture.
So, regularization of ill-conditioned systems: consider an ill-conditioned system AX = b where σ1/σr is quite large, σ1 ≥ σ2 ≥ ... ≥ σr being the nonzero singular values of A with σr the smallest one. Then it might be useful to consider the regularized least square solution instead of the earlier one; here the earlier one means the solution which we have learnt in previous lectures. So, this solution is defined as: find the X which minimizes 1/2||AX-b||² + λ/2||X||², where the second term is a regularization term.
So now, instead of minimizing only the residual term, which we have taken in the earlier case, we are minimizing this whole thing. What will happen when you also minimize ||X||: it prevents a large deviation in X, and hence your solution will not have a large change. So, this particular term is called the regularization term, and this kind of regularization is called Tikhonov regularization, which is quite popular in regularization theory. Here I am presenting it in terms of linear systems, in terms of matrix analysis.
So, here the parameter λ will be a positive real number. It is called the regularization parameter; it is not known a priori and has to be determined based on the problem data. We will learn how to determine it by taking an example.
398
Now, we have to minimize this particular function; let us call it equation 1. So now, minimizing 1/2||AX-b||₂² + λ/2||X||₂² is equivalent to minimizing over X the quantity ||[A; λ^(1/2)I]X - [b; 0]||₂², where [A; λ^(1/2)I] means A stacked on top of λ^(1/2)I and [b; 0] means b stacked on top of the zero vector. So, both of these are the same problem; let us call this equation 2.
Now for λ > 0, so if λ is nonzero, the stacked matrix [A; λ^(1/2)I] will have m+n rows (m from A, n from I, because I is an n×n identity matrix) and n columns, and it always has full rank, full rank means n, because even if A is 0, λ^(1/2)I is an n×n invertible matrix. So, the rank of this matrix will always be n; you have a full-rank coefficient matrix and X has n unknowns. So, you will always have a unique solution.
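The equivalence between equation 1 and the stacked least squares problem 2 can be checked numerically; the sketch below uses a small hypothetical A and b (chosen only for illustration) and compares the stacked-system solution with the normal-equation solution (ATA + λI)⁻¹ATb derived next.

import numpy as np

# Hypothetical data, only for illustration
A = np.array([[1.0, 0.0],
              [0.0, 1e-3],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
lam = 0.1          # regularization parameter lambda (assumed value)

m, n = A.shape

# Regularized problem written as one ordinary least squares problem:
# minimize || [A; sqrt(lam) I] x - [b; 0] ||^2
A_stacked = np.vstack([A, np.sqrt(lam) * np.eye(n)])
b_stacked = np.concatenate([b, np.zeros(n)])
x_stacked, *_ = np.linalg.lstsq(A_stacked, b_stacked, rcond=None)

# Same solution from the regularized normal equations (A^T A + lam I) x = A^T b
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

print(x_stacked)
print(x_normal)   # the two agree up to rounding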
So, the regularized system 2 always has a unique solution. Now, how to find this unique solution? We have learnt in a lecture that if you have a system AX = b where A is a rectangular matrix, then the normal equations of the least square problem are (ATA)X = ATb, and from here X = (ATA)⁻¹ATb, which involves the pseudo inverse of A.
So, in the same way, the normal equation to the regularized system 2 can be written as [A; λ^(1/2)I]T[A; λ^(1/2)I]X = [A; λ^(1/2)I]T[b; 0], because the A in AX = b is replaced by the stacked matrix [A; λ^(1/2)I]. This can be written as (ATA + λI)X = ATb.
If I take the SVD of A as A = U S VT, then wherever I have A in this normal equation I write its singular value decomposition, and λI I can write as λVVT:
((U S VT)T(U S VT) + λVVT)X = (U S VT)Tb.
Now, (U S VT)T(U S VT) can be written as VSTUTUSVT, because (ABC)T is CTBTAT. Since U is an orthogonal matrix, UTU becomes I, so this can be written as VSTSVT. So, this system I can write as V(STS + λI)VTX = VSTUTb. Now, if I premultiply this particular equation by the matrix VT, then since VTV = I it becomes (STS + λI)VTX = STUTb. Now put VTX = Z as we have done in the earlier case.
399
(Refer Slide Time: 16:20)
So, then it can be written as (STS + λI)Z = STUTb. Now, if you look at the matrix STS, it is a diagonal matrix, because S has nonzero values only on the main diagonal; so STS will be a square matrix having the nonzero values σi² on the main diagonal only. So, then the solution Z can be given componentwise as Zi = σi·uiTb/(σi² + λ) for i = 1, 2, ..., r, and it will be 0 for i = r+1, ..., n.
Here, if I calculate the least square approximation in the same way which I have done earlier, it will become the sum over i = 1 to r (where r is the number of nonzero singular values) of (σi·uiTb/(σi² + λ))vi. Now, in the earlier case, where we have not used the regularization term, I obtained the solution as the sum over i = 1 to r of (uiTb/σi)vi, and in the regularized case I am obtaining this solution. Now, if λ is tending to 0: put λ = 0 in this particular solution, then σi will cancel against σi², and it becomes the earlier solution.
Means λ = 0 means we do not have the regularization term; so this is the same solution, but with the regularization term. Now, how is this solution useful? Even if some σi is very small, you can find a suitable λ so that if you make a small change in b, the choice of λ will protect you from having a large change in the final solution. So, what I can say, what I can write here: adding λ/2||X||², means this regularization term, to the ordinary least square makes this term act as a filter, means contributions from singular values which are large relative to the regularization parameter (those with σi² much greater than λ) are left almost unchanged, means for those terms you will get essentially the earlier solution.
400
Whereas when you have a σi which is small compared to the regularization parameter, that particular complete term will be treated as 0.
So, I can write this particular term σi·uiTb/(σi² + λ) as something that behaves like a filter: it will be approximately 0 if you are having a singular value which is very close to 0, small singular values, so there is no contribution from that term in the solution; and it will be approximately uiTb/σi, means the ordinary least square term, if σi² is much greater than λ. Hence we can protect our solution from having a large change, because the term which would give you a large change now has a coefficient that is almost 0.
So, there will be no contribution of that particular term. Now the question arises how to choose λ. Suppose the data are b = bex + δb, where bex is the data without any perturbation and δb is the possible perturbation in my right hand side vector b. Then I can write my Xex as the sum over i = 1 to r of (uiTbex/σi)vi. So, this is the minimum norm solution of the ordinary least square with the unperturbed right hand side vector; it comes in this form.
Now, we can only compute with b; we do not know how much perturbation is there in b, that is, b = bex + δb, since we do not know bex. Now the solution of the regularized linear least square problem is X* = sum over i = 1 to r of (σi·uiTbex/(σi² + λ) + σi·uiTδb/(σi² + λ))vi. So, here in the regularized linear least square I have written b as bex + δb.
401
(Refer Slide Time: 25:14)
So, these two facts suggest choosing λ sufficiently large to ensure that the perturbations δb in the data are not magnified by the small singular values. So, in this way, by choosing a suitable λ, if you know the singular values of the matrix A, choose λ somewhat greater than the small singular values (more precisely, than their squares, as the filter above shows), so that the contribution due to those small singular values will be suppressed in the regularized least square approximation.
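A minimal numerical sketch of this filtering, assuming a small hypothetical ill-conditioned A and b, is given below: it forms both the ordinary SVD least square solution, the sum of (uiTb/σi)vi, and the regularized solution, the sum of (σi·uiTb/(σi²+λ))vi, so the damping of the small-singular-value terms can be seen directly.

import numpy as np

# Hypothetical ill-conditioned matrix: one singular value is very small
A = np.array([[1.0, 0.0],
              [0.0, 1e-6],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 2.0])
lam = 1e-4                      # regularization parameter lambda (assumed)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U S V^T
uTb = U.T @ b                                      # the coefficients u_i^T b

# Ordinary least square solution: sum_i (u_i^T b / sigma_i) v_i
x_ls = Vt.T @ (uTb / s)

# Tikhonov-regularized solution: sum_i (sigma_i u_i^T b / (sigma_i^2 + lam)) v_i
x_reg = Vt.T @ (s * uTb / (s**2 + lam))

# Filter factors sigma_i^2 / (sigma_i^2 + lam): close to 1 for large sigma_i, close to 0 for tiny ones
print("filter factors:", s**2 / (s**2 + lam))
print("ordinary LS   :", x_ls)
print("regularized   :", x_reg)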
So, in this lecture we have learnt how to perform regularization, or, in this lecture we have learnt Tikhonov regularization of linear systems; we have learnt how to choose the regularization parameter, and we have done the analysis of the regularized least square approximation based on the singular values.
402
(Refer Slide Time: 28:30)
403
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 32
Linear System: Iterative Methods I
Hello friends. So welcome to the lecture on Iterative Methods. So, in past few lectures
we have learnt about singular value decomposition, least square approximation and their
analysis in terms of singular values.
So, now in the next few lectures we will learn another way of solving linear systems of equations, by using iterative methods. In the beginning of this course we have learnt direct methods for solving linear systems, some of them like Gaussian elimination and then LU decomposition. In LU decomposition we write the coefficient matrix A as the product of two matrices, L and U, where L is a lower triangular matrix and U is an upper triangular matrix, and then by using forward substitution and then backward substitution we solve the system of equations. Now, in the category of iterative methods we can solve the square linear system, means we are having n equations in n unknowns.
404
(Refer Slide Time: 01:42)
So, iterative methods for Ax = b begin with an approximation to the solution, let us say x0, which is the initial solution, and then seek to provide a series of improved approximations x1, x2 and so on, which converges to the exact solution or at least reaches near to the exact solution. These iterative methods are quite popular in engineering, for engineers, because we are looking for approximations, and once we have the required precision, means a marginal error between the exact solution and the sequence of approximations, then we can stop.
So, in this way we can save time and computation for solving the linear system. Moreover, when we are solving especially partial differential equations using some numerical methods, we encounter large and sparse systems. We will discuss what we mean by a sparse system. And these iterative methods are quite useful, when we are solving a large and sparse system, when compared to the direct methods.
405
A general form of an iterative method for solving an n×n linear system Ax = b can be seen in this way: x(k+1), means the approximation of the solution in the (k+1)-th iteration, equals a matrix P, which will be an n×n matrix, times the solution in the k-th iteration, plus an n×1 vector q.
So, here the matrix P is called the iteration matrix. There are different schemes; in the next couple of lectures we are going to discuss three different schemes, and they differ in the process of computing the iteration matrix and the vector q from the input data, that is the coefficient matrix A and the right hand side vector b. So, from these data A and b we will compute our iteration matrix P and the vector q, and then using this iteration process we will find the sequence of approximations.
So, first we will take the Jacobi method. The Jacobi method is the simplest iterative method for solving a square linear system Ax = b, square means our matrix A is a square matrix. This method uses the concept of simultaneous displacements.
We can write an n×n matrix A as the sum of a lower triangular matrix, let us say L, a diagonal matrix D and an upper triangular matrix U. So, if you take a 3×3 matrix then I can write it equal to a matrix L which is lower triangular + a diagonal matrix D + an upper triangular matrix U.
406
(Refer Slide Time: 05:04)
So, I am talking about the Jacobi method. Consider the system Ax = b where A is an n×n matrix. Now, write A as the sum of the 3 matrices L, D and U (lower triangular, diagonal and upper triangular), so Ax = b becomes (L+D+U)X = b, or write it like DX = -(L+U)X + b. So, what I have done is I have taken L and U to the right hand side; then I can write X = -D⁻¹(L+U)X + D⁻¹b and set your iterations in this way.
407
So, X at the (k+1)-th iteration can be given by -D⁻¹(L+U)X at the k-th iteration + D⁻¹b, which is of the form of the general iterative system PX(k) + q with a column vector q. So, here the iteration matrix P is -D⁻¹(L+U) and the column vector q = D⁻¹b. So, this is our Jacobi method, means how to solve a system if we take a general A.
Now, for setting the iterations, from the same system I can write x1 = (1/a11)(b1 - a12x2 - a13x3), provided a11 is not 0. Similarly, from the second equation I can write x2 = (1/a22)(b2 - a21x1 - a23x3), and then x3 = (1/a33)(b3 - a31x1 - a32x2).
So, now take the left hand side at the (k+1)-th iteration and on the right hand side take everything at the k-th iteration. These are the iterative equations of the Jacobi method for a 3×3 system. When I introduced the Jacobi method I told you this method is a method of simultaneous displacement. Why did I say that? Because here you can see that for finding the (k+1)-th value of each component x1, x2, x3, for each variable, I am using the values of these variables from the previous iteration. So, what I am doing: simultaneously, for getting the values at the (k+1)-th iteration, I am using the values of the k-th iteration. So, now we will take one example and we will solve this example.
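A compact sketch of the Jacobi iteration just described is given below; the system used in the demo is a hypothetical diagonally dominant one chosen only for illustration, not the example worked on the slides.

import numpy as np

def jacobi(A, b, x0, tol=1e-6, max_iter=500):
    """Jacobi method: every component is updated from the previous iterate."""
    D = np.diag(np.diag(A))
    LU = A - D                      # L + U (everything off the diagonal)
    x = x0.astype(float)
    for k in range(max_iter):
        x_new = np.linalg.solve(D, b - LU @ x)   # x_new = D^{-1}(b - (L+U)x)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Hypothetical diagonally dominant system, for illustration only
A = np.array([[10.0, -1.0, 2.0],
              [-1.0, 11.0, -1.0],
              [2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])

x, iters = jacobi(A, b, np.zeros(3))
print(x, "in", iters, "iterations")     # close to np.linalg.solve(A, b)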
408
(Refer Slide Time: 11:07)
Similarly, x3 at the (k+1)-th iteration is given as 1/8(27 + 2x1 - 3x2) with x1, x2 taken at the k-th iteration. So, this is the iterative scheme. If I take the initial solution x1(0) = x2(0) = x3(0) = 0, means my initial solution is (0 0 0)T, which is my initial x, then what do I have?
My x1(1) will become 1/4(8 - 0 - 0) = 2, x2(1) will become -1/5(-14), so this will become 2.8, and then x3(1) will become 1/8(27), so it will be 3.375. So these are the values of x1, x2, x3 at the first iteration.
409
(Refer Slide Time: 14:55)
So, if I go in the same way, in the second iteration I get x1 as -1.931, x2 as 5.350 and x3 as 2.825. If we need a solution correct up to 3 decimal places then we need to keep calculating this sequence of values. If we go in the same way, in the third iteration x1 will be obtained like this and x2 will be this one.
And x3 will be 0.886 in the fourth iteration. Continuing in the same manner, in the twenty second iteration the values will be like this, and in the twenty third iteration like this; still you can see the values have not fully settled. Continuing in the same way, we see that in the forty fifth iteration x1 comes out to be -1, x2 is 3 and x3 is 2, which remains the same in the forty sixth iteration, so it is my exact solution.
410
So, we have taken 46 iterations for converging to the exact solution from the sequence of approximations, and this is the Jacobi method.
My next method in this category is the Gauss-Seidel method, which is a bit better when compared to the Jacobi method in terms of convergence. The Gauss-Seidel method is a variant of the Jacobi method that usually improves the rate of convergence by using successive displacements; in Jacobi we have used simultaneous displacements, here we will use successive displacements. Just recall the iterative scheme of the Jacobi method in the earlier example.
411
So, this was the iterative scheme of the Jacobi method in the example which we have taken. If you observe, I have calculated x1 at the (k+1)-th iteration from the first equation; yet when I am calculating x2 in the (k+1)-th iteration using the Jacobi method, I am still using the approximations which I have obtained for x1 and x3 in the k-th iteration.
So, what I can do: instead of using the old values I can use the more updated values, and in that manner I can have a faster convergence. So, in the Gauss-Seidel method we use such updated values.
So, let us see the derivation of this method, the Gauss-Seidel method.
412
(Refer Slide Time: 18:13)
So, again I am having an n×n system Ax = b. I will write the matrix A as the sum of 3 matrices L, D and U, where L is lower triangular, D is diagonal and U is an upper triangular matrix, so (L+D+U)X = b.
Now, in this method what I will do is keep D+L on the left and take -UX to the right hand side. So, from here I can write X = -(D+L)⁻¹UX + (D+L)⁻¹b, and the iterative scheme can be set like this: X(k+1), means on the left hand side I am taking the value in the (k+1)-th iteration, equals -(D+L)⁻¹UX(k) + (D+L)⁻¹b. So, this is the iterative scheme of the Gauss-Seidel method. Here, if you check, the iteration matrix P comes out to be -(D+L)⁻¹U, and the vector q = (D+L)⁻¹b.
413
So, if you again take a 3×3 system like a11x1+a12x2+a13x3 = b1; a21x1+a22x2+a23x3 = b2; a31x1+a32x2+a33x3 = b3, and I need to solve this system using the Gauss-Seidel method, then my iterative scheme comes out like this. x1 at the (k+1)-th iteration will be (1/a11)(b1 - a12x2(k) - a13x3(k)).
Then I will take x2 at the (k+1)-th iteration; this will become (1/a22)(b2 - a21x1(k+1) - a23x3(k)), and please note that this is the change from the Jacobi method: earlier, in the Jacobi method, I had taken x1 at the k-th iteration, but here, since I already have x1 at the (k+1)-th iteration available with me, I use x1(k+1). Then x3 at the (k+1)-th iteration will become (1/a33)(b3 - a31x1(k+1) - a32x2(k+1)), because now I have x1 and x2 both available at the (k+1)-th iteration. So, this is the scheme of the Gauss-Seidel method.
So, consider the same example which we have taken earlier in the case of the Jacobi method; we will solve the same example using the Gauss-Seidel method. According to the iterative process I have just described, my iterative equations become like this. So, x1 at the (k+1)-th iteration will be 1/4(8 - 2x2(k) - 3x3(k)), which is the same as in the case of the Jacobi method.
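A sketch of the Gauss-Seidel sweep, written componentwise so the freshly updated values are reused immediately, is given below; it is run on the same hypothetical system as the Jacobi sketch above, only for illustration.

import numpy as np

def gauss_seidel(A, b, x0, tol=1e-6, max_iter=500):
    """Gauss-Seidel: each component uses the newest available values."""
    n = len(b)
    x = x0.astype(float)
    for k in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # components 0..i-1 are already updated, i+1..n-1 are from the previous sweep
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            return x, k + 1
    return x, max_iter

# Same hypothetical system as in the Jacobi sketch
A = np.array([[10.0, -1.0, 2.0],
              [-1.0, 11.0, -1.0],
              [2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])

x, iters = gauss_seidel(A, b, np.zeros(3))
print(x, "in", iters, "iterations")   # typically fewer sweeps than Jacobi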
414
(Refer Slide Time: 23:41)
So, in this way, if I start with (0, 0, 0), means in the initial iterate x1 = 0, x2 = 0 and x3 = 0, then in the first iteration my x1 becomes 2, x2 becomes 4 and x3 becomes 2.375. Then in the second iteration x1 becomes -1.781, x2 becomes 2.681 and x3 becomes 1.924. In the third iteration x1 becomes 0.784, x2 becomes 3.099 and x3 becomes 2.017, and these are the values of x1, x2, x3 in the fourth iteration.
So, this is actually the exact solution also. What we observed in the Jacobi scheme is that we took 46 iterations to converge to this solution; however, in the Gauss-Seidel method we have obtained the same solution, means the solution with the same accuracy,
415
in just 10 iterations. So, in this way we can say the Gauss-Seidel method is a better variant of the Jacobi method, because it makes use of successive displacements, that is, the more updated values.
So, in this lecture we have learned two iterative schemes which are very basic schemes: one is the Jacobi method, and the other one is the Gauss-Seidel method. In the next lecture we will learn one more scheme called successive over relaxation, and then we will discuss the convergence criteria and a few results about the convergence of these three schemes. Then we will go to non-stationary iterative methods like steepest descent, conjugate gradient and other Krylov subspace methods for solving large and sparse linear systems.
416
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology Roorkee
Lecture - 33
Linear Systems: Iterative Methods II
Hello friends, so welcome to the second lecture on Iterative Methods for solving linear systems. In the previous lecture we have learnt two schemes, one is Jacobi and another one is Gauss-Seidel, for solving linear systems. It has been seen that the Gauss-Seidel method converges towards the exact solution in fewer iterations when compared to the Jacobi scheme. There are many problems where these two schemes do not converge at all. In this lecture we will learn a more generalized scheme called successive over relaxation; finally, in the last part of this lecture, we will discuss the conditions under which these iterative schemes, means Jacobi, Gauss-Seidel and successive over relaxation, converge.
So, as I told you in the previous lecture, these schemes are called stationary iterative methods, means they are of the form X(k+1) equals an iteration matrix P times X(k) plus q, so these are stationary methods.
417
(Refer Slide Time: 01:43)
So, now consider Ax = b and write A as the sum of 3 matrices: (L+D+U)X = b. Now, what you do is write (D + wL + (1-w)L + U)X = b. If you observe, wL + (1-w)L is just L, so this is just L only. Now split it over the iterations like this: (D + wL)X(k+1) = -((1-w)L + U)X(k) + b.
Now, if in this equation I take w = 0, then what will happen? It will become DX(k+1) = -(L+U)X(k) + b, so this becomes the Jacobi scheme. If I take w = 1 then it becomes the Gauss-Seidel scheme; for w = 0.5 it is somewhere between Jacobi and Gauss-Seidel; and for w > 1 we have a method beyond Gauss-Seidel. So, in this way we can say we have some sense of over relaxation in this case, and for certain problems it turns out to be a highly effective method.
418
(Refer Slide Time: 04:50)
Now, consider an iterative scheme for an n×n linear system Ax = b as (D + wL)X(k+1) = -((1-w)L + U)X(k) + b.
Now, this is the same thing which I have discussed in the last slide. It is possible to recast the above scheme in such a way that the matrices on the left and right hand sides are lower and upper triangular respectively. So, the left one is lower and the right one is upper triangular; this allows us to use the concept of successive displacement, and over relaxation can be implemented in a manner similar to the Gauss-Seidel method, with new variable values overwriting old ones as soon as they become available.
419
This type of iterative method is known as successive over relaxation, or in short SOR. So, the iterative process of the SOR method is of the same form as the other stationary methods, x(k+1) = PX(k) + q, where w lies between 0 and 2; w is called the relaxation parameter, for w > 1 we say it is over relaxation, and for w < 1 we say it is under relaxation.
One more thing: how to compute the optimal value, because w will be between 0 and 2. So, what will be the optimal value for a given problem, so that the scheme converges faster?
That optimal value of w for the SOR scheme can be given by this one: w_opt = (2/ρ²)(1 - (1 - ρ²)^(1/2)), where ρ is the spectral radius of the Jacobi iteration matrix, and if you can recall, the Jacobi iteration matrix is -D⁻¹(L+U). So, you calculate all the eigenvalues of this matrix, and the eigenvalue biggest in magnitude gives the value of ρ; that is the spectral radius of the Jacobi iteration matrix. Once you have ρ, you can calculate the optimal value of w using this particular formula.
420
Once you have w, you can put the value of w here and then you can calculate the iteration matrix P as well as the column vector q, and you can write your iterative equations of the SOR method. So, for this let us take this particular example.
So, consider the linear system with coefficient matrix A = [2 -1 0; -1 2 -1; 0 -1 2] multiplied with the column vector of variables [x1 x2 x3]T equal to the right hand side vector b = [7 1 1]T. Develop the iterative scheme using the successive over relaxation method; also perform 3 iterations of this method by taking the initial solution as (0 0 0)T, so x1 is 0, x2 is 0 and x3 is 0.
So, if we see now, the coefficient matrix A can be written in terms of a lower triangular matrix L, a diagonal matrix D and an upper triangular matrix U. For this matrix, L will be this, D will be like this, and U will become this matrix.
421
(Refer Slide Time: 09:00)
Now, as we know, the iteration matrix for successive over relaxation is given by this particular scheme. So, if I calculate the iteration matrix here in terms of w, it comes out in this form: in the first row the elements are 1-w, w/2 and 0; in the second row the elements are w(1-w)/2, 1-w+w²/4 and w/2; and in the third row the elements are w²(1-w)/4, w(1-w)/2+w³/8 and 1-w+w²/4. So, this is the iteration matrix for the successive over relaxation method, and the right hand side column vector q comes out in this form. Now, what we need to do is calculate the optimal value of w and substitute that value here to get the iteration matrix P in numerical form, so that we can perform the iterations. As I told you, the optimal value of w will be a function of ρ, where ρ is the spectral radius of the Jacobi iteration matrix.
422
So, if I calculate the Jacobi iteration matrix for this problem, it comes out to be -D⁻¹(L+U), and the eigenvalues of this matrix are 0, 1/√2 and -1/√2. So, it gives the spectral radius as ρ = 1/√2, and hence the optimal factor for the SOR scheme, w_opt = (2/ρ²)(1 - (1 - ρ²)^(1/2)), comes out to be 1.171573 by using this formula. After substituting this value of w into the iteration matrix and the column vector q,
we obtain this matrix P as the iteration matrix and this column vector as q. Now take [x1 x2 x3] as 0 as the initial solution. So, in the first iteration we obtain x1 as 4.1006, x2 as 2.9879 and x3 as 2.3361.
These are the values of x1, x2, x3 in the second iteration, and in the third iteration we obtain x1 as 5.8283, x2 as 4.8731 and x3 as 2.9606. So, these are the 3 iterations of the SOR method.
So, this is how we can implement the successive over relaxation scheme: first write down the iteration matrix in terms of w from the formula of the iteration matrix, as well as the column vector q, find out the optimal value of w, substitute it there, and from there you will get the iterative equations for the successive over relaxation method.
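The sketch below implements the componentwise SOR sweep for this example, computing w_opt from the spectral radius of the Jacobi iteration matrix as above; the first and third iterates can be compared against the values quoted in the lecture (about 4.1006, 2.9879, 2.3361 and 5.8283, 4.8731, 2.9606).

import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([7.0, 1.0, 1.0])

# Spectral radius of the Jacobi iteration matrix -D^{-1}(L+U)
D = np.diag(np.diag(A))
PJ = -np.linalg.solve(D, A - D)
rho = max(abs(np.linalg.eigvals(PJ)))

# Optimal relaxation parameter w_opt = (2/rho^2)(1 - sqrt(1 - rho^2))
w = (2.0 / rho**2) * (1.0 - np.sqrt(1.0 - rho**2))
print("rho =", rho, " w_opt =", w)          # about 0.7071 and 1.171573

# Componentwise SOR sweeps, starting from (0, 0, 0)
x = np.zeros(3)
for k in range(3):
    for i in range(3):
        s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
        x_gs = (b[i] - s) / A[i, i]          # Gauss-Seidel value for component i
        x[i] = (1.0 - w) * x[i] + w * x_gs   # relax between old value and GS value
    print("iteration", k + 1, ":", x)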
423
(Refer Slide Time: 12:34)
After this implementation we will talk about the convergence analysis for all these
iterative methods.
So, we are talking about convergence analysis of the stationary iterative schemes for
solving linear systems. So, consider a linear system Ax=b where A is a n by n matrix and
let the exact solution of this system becomes x equals to s.
So, if we apply stationary iterative scheme on it, then we are having the scheme of the
form x(k+1) = P(x(k)+q, where P is the iteration matrix and q is column vector now since s
is the exact solution. So, what we are having s = Ps+q.
424
Because s is the exact solution and once you obtain the exact solution then whatever
iteration you perform you will get the same solution that we have seen in case of the
example of Jacobi as well as Gauss Seidel in the previous lecture where in case of Jacobi
in forty fifth and forty sixth iterations you are getting the same solution and in case of
Gauss Seidel you are getting the same solution in ninth and tenth iteration.
So, now let us call that equation 1, and this is my equation 2. If we subtract equation 2 from equation 1, we have x(k+1) - s = P(x(k) - s). Define e(k) = x(k) - s as the error in the k-th iteration. Then we can write this equation as: the error in the (k+1)-th iteration, e(k+1) = Pe(k).
Or, taking norms, because this error will be a vector for the different variables x1, x2, ..., xn, this can be written in terms of the norm of the error vector: ||e(k+1)|| = ||Pe(k)|| ≤ ||P|| ||e(k)||.
So, what we have is ||e(k+1)|| ≤ ||P|| ||e(k)||. Now, if ||P|| is less than 1, then the error in the (k+1)-th iteration is less than the error in the k-th iteration. This is quite important: it tells us that whatever error you have in the first iteration, in the second iteration you will have a smaller error; whatever error you have in the second iteration, in the third iteration you will have a smaller error; means you are going towards the exact solution, and this is happening due to this condition.
So, based on this, we can write the sufficient condition for the convergence of an iterative scheme, and that sufficient condition means that even if our initial approximation has a large error, the first iteration is certain to reduce that error,
425
the second is certain to reduce it further because in each iteration you are having less and
less error and so on.
It follows that the error in the k-th iteration will tend to 0 as k tends to infinity, which implies that the sequence of approximations will move towards the exact solution as k tends to infinity, because this error is tending to 0. Hence a sufficient condition for the convergence of the iterative scheme can be stated as: if the norm of the matrix P satisfies ||P|| < 1, then the iterative scheme x(k+1) = Px(k) + q is convergent for any initial solution, because whatever large error you have initially, it will reduce in each subsequent iteration.
Here one important remark I would like to mention, because there are different types of matrix norms. If we choose a particular matrix norm, say the infinity norm, and we find that this infinity norm of the iteration matrix P is greater than 1, this does not indicate that the iterative scheme will fail to converge, because it is a sufficient condition, not a necessary one, first of all.
Moreover, there may be some other matrix norm, such as the 1-norm or the Frobenius norm, that is strictly less than 1, in which case convergence is still guaranteed. So, if, say, the 1-norm is less than 1, that will do our job, and then we can say that the iterative scheme will be convergent for any initial solution. In any case, the condition that the norm of the iteration matrix P is less than 1 is only a sufficient condition for convergence, not a necessary one. Now we will write the necessary and sufficient condition.
426
(Refer Slide Time: 21:33)
The iterative scheme x(k+1) = Px(k) + q for solving a linear system Ax = b is convergent for any initial solution if and only if every eigenvalue λ of P satisfies |λ| < 1, means the magnitude of every eigenvalue of P is less than 1.
Or, in other words, the spectral radius of P is less than 1. So this is the necessary as well as sufficient condition for convergence: means if the scheme is convergent then every eigenvalue of P will be less than 1 in magnitude, and if every eigenvalue of P is less than 1 in magnitude then the scheme will certainly be convergent.
These conditions I have spoken about are on the iteration matrix P. Can we say some condition on the coefficient matrix A? Because the coefficient matrix A is directly available to us from the problem, while P we need to calculate. So, here I am telling one more condition which can be checked on the coefficient matrix A, and that is: sometimes a guarantee of convergence can be established by direct inspection of the coefficient matrix A.
That is, as I told you, without needing to compute the iteration matrix P: in particular, if A has the diagonally dominant property, then the Jacobi and Gauss-Seidel methods are both certain to converge. A matrix is said to be diagonally dominant if in each row:
427
The absolute value of the entry on the diagonal is greater than the sum of the absolute
values of the other entries.
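A small sketch of these checks is given below: it tests strict diagonal dominance row by row, builds the Jacobi iteration matrix -D⁻¹(L+U), and reports its infinity norm, 1-norm, Frobenius norm and spectral radius; the matrix used is a hypothetical one, for illustration only.

import numpy as np

def is_strictly_diagonally_dominant(A):
    """Each diagonal entry exceeds the sum of the absolute values of the rest of its row."""
    d = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - d
    return np.all(d > off)

def jacobi_iteration_matrix(A):
    D = np.diag(np.diag(A))
    return -np.linalg.solve(D, A - D)     # -D^{-1}(L+U)

# Hypothetical coefficient matrix, for illustration
A = np.array([[5.0, -1.0, 2.0],
              [1.0, 8.0, -1.0],
              [-2.0, 0.0, 4.0]])

P = jacobi_iteration_matrix(A)
print("diagonally dominant :", is_strictly_diagonally_dominant(A))
print("||P||_inf           :", np.linalg.norm(P, np.inf))
print("||P||_1             :", np.linalg.norm(P, 1))
print("||P||_F             :", np.linalg.norm(P, 'fro'))
print("spectral radius     :", max(abs(np.linalg.eigvals(P))))  # < 1 iff the scheme converges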
So, consider this particular example: comment on the convergence of the Jacobi iterative method for the following matrix. Here this is my coefficient matrix. If you see the coefficient matrix here, the diagonal entry in the first row is 5 and the sum of the absolute values of the other entries is 3, so 5 is greater than 3; here 8 is greater than 2+1, and here 4 is greater than 2+0.
So, it is a strictly diagonally dominant matrix and hence the scheme will converge. Moreover, if you want to check with the iteration matrix: if I calculate the iteration matrix in the case of the Jacobi method, it comes out in this way, and if I find its norms, the infinity norm is 0.10 and the 1-norm is 0.75; both are less than 1, and hence the sufficient condition guarantees that the scheme will converge.
The Jacobi scheme will converge. Now, comment on the convergence of the Jacobi iterative method for the following matrix; this is an interesting example, and let me describe why I am saying it is interesting.
428
(Refer Slide Time: 24:55)
Now what you do is just interchange the first and third equations. So, if I interchange them, my system will become 5x1 - x2 + 3x3 = b3, the second equation will remain as earlier, and -2x1 + 4x3 = b1. Now these 2 systems are the same, they will have the same solution; just I have interchanged the order of the equations.
Now, if you see the coefficient matrix of this system, it will become [5 -1 3; 2 -8 1; -2 0 4]. What will happen if I take this coefficient matrix? It is diagonally dominant, because in each row the diagonal element is bigger, in absolute value, than the sum of the absolute values of the other entries.
So, what you do is apply the Jacobi scheme on this matrix instead of the original one, because you will get the same solution, and your scheme will be guaranteed to converge.
429
(Refer Slide Time: 27:44)
Take one more example: let A be [4 2 -2; 0 4 2; 1 0 4]. Here the matrix is not strictly diagonally dominant, because if you see the first row, 4 = |2| + |-2|, so they are the same.
So, what we have here: we cannot comment, and even if we change the order of the equations it will not help, it will become worse. So, we cannot say anything just by looking at the coefficient matrix. So, let us talk about the Jacobi method and calculate the iteration matrix for the Jacobi method. It will be -D⁻¹(L+U), and it comes out to be [0 -0.5 0.5; 0 0 -0.5; -0.25 0 0].
If I calculate the infinity norm it comes out to be 1, and if I calculate the 1-norm it is also 1. So, what happens: the matrix is not diagonally dominant, this norm is 1 and this norm is 1, so based on whatever information I have calculated, I cannot say anything about the convergence. So, what is the other choice left with us? Calculate the Frobenius norm also, and fortunately it comes out to be 0.901, which is less than 1; hence, since one of the norms is less than 1, convergence is guaranteed, as it satisfies the criterion of the sufficient condition.
In this lecture we have learned about the successive over relaxation scheme and then we have seen the convergence conditions for the different stationary schemes. In the next lecture we will talk about non-stationary iterative schemes.
430
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture – 34
Non Stationary Iterative Methods: Steepest Descent I
Hello, friends. So, welcome to the lecture on Non-Stationary Iterative Methods. As you know, in the previous couple of lectures we have discussed stationary iterative methods; those include the Jacobi method, then the Gauss-Seidel method and successive over relaxation.
Why do we call them stationary methods? Because the iterative equations for those methods were given like this: X(k+1) equals an iteration matrix P, which is of the same size as the coefficient matrix, times X(k), means the value of the unknown vector at the k-th iteration, plus q. So, in the Jacobi, Gauss-Seidel as well as successive over relaxation methods we have seen that these two, means the iteration matrix P and the column vector q, both are constant throughout the iterations, means we have calculated them once and then we did not change them. So, that is why we call them stationary methods.
Today we are going to discuss non stationary methods. In general we use these non-
stationary methods for solving large and sparse linear system. So, consider a linear
system AX=b, where A is a nn matrix which is non-singular large means n is quite
431
large, and sparse (I will tell you what we mean by sparse matrices); then the linear system AX = b is called a sparse linear system. The non-stationary methods which we are going to discuss in the next couple of lectures are quite useful for solving such sparse systems.
Basically, these large and sparse systems occur quite frequently in engineering and science computations that involve the numerical solution of partial differential equations, especially when you are applying the finite difference method.
Now, if we talk about non-stationary iterative methods, then these are methods where the data change at each iteration; means not like the stationary methods, where the iteration matrix P and the column vector q were fixed.
These are methods of the form X(k+1) = X(k) + αk·dk, means the vector which you have to calculate, the vector of unknowns at the (k+1)-th iteration, equals the vector at the k-th iteration plus αk·dk. So, here the data αk and dk change in each iteration k; that is why I have put the suffix k on both of them. Here dk is called the search direction and αk is called the step length.
This category of methods includes line search methods; we will discuss two methods in
this category and Krylov subspace methods. We will discuss steepest descent method in
432
the earlier category and then conjugate gradient method. These methods are quite useful
in case of large and sparse linear system.
So, let us now define a sparse matrix. A matrix is said to be sparse if very few entries of it are nonzero; it means most of the entries of the matrix are zero valued. For example, if we take a matrix A which is given as [0 0 3 0 4; 0 0 5 7 0; 0 0 0 0 0; 0 2 5 0 0], then it is a 4×5 matrix which is a sparse matrix, because here only six entries are nonzero. So, if I define the sparsity of this matrix, then sparsity is the number of zero entries upon the total number of entries. If I take this matrix, there are in total fourteen zero entries and 20 is the total number of entries in this matrix.
So, it is 0.7, or I can say that the matrix is 70 percent sparse. The opposite of sparse is a dense matrix; a dense matrix is a matrix in which most of the entries are nonzero. In a similar way we can define the density of a matrix: density will be the number of nonzero entries upon the total number of entries. So, for a given matrix, density + sparsity will be equal to 1.
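The sparsity and density of the example matrix can be checked with a couple of lines; a short sketch:

import numpy as np

A = np.array([[0, 0, 3, 0, 4],
              [0, 0, 5, 7, 0],
              [0, 0, 0, 0, 0],
              [0, 2, 5, 0, 0]])

total = A.size                      # 20 entries
nonzero = np.count_nonzero(A)       # 6 nonzero entries

density = nonzero / total           # 0.3
sparsity = 1 - density              # 0.7, i.e. the matrix is 70 percent sparse
print(density, sparsity)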
Now, a direct method like Gaussian elimination is not computationally efficient if the system is large and sparse. Why? Because if you take an n×n system and you use the Gaussian elimination method for solving it, there will be in total of the order of n³ operations. If most of the entries are zero, then spending n³ operations on those zero entries is not a wise way of solving such systems. Moreover, you know that in a sparse system most of the entries are zero, but if you apply elementary row operations on that matrix in
433
the Gaussian elimination method, then zero entries will become nonzero. So, from these two facts we can say that a system which is large and sparse cannot be solved efficiently using a direct method like Gaussian elimination.
So, what is the alternative? The alternative is iterative methods, and in this category of non-stationary methods, first we are going to discuss the gradient methods.
So, consider a quadratic form q(X), because you already know about quadratic forms: q(X) = 1/2XTAX - XTb. You have seen this form earlier also, in the positive definite matrices lecture. What we are going to do is try to find the minimum of this functional; our aim is to minimize it. Further, here A is an n×n matrix which is symmetric, because you know we can always associate a symmetric matrix with a given quadratic form, X is an unknown vector having n components, and b is also a column vector.
Now, our aim is to minimize q(X), which is a function from Rn to R, for some given b. The gradient of q can be treated as the residual and is computed as the gradient of the functional, ∇q(X) = AX - b. Moreover, the Hessian matrix of q is given by the Jacobian of the gradient, means the Jacobian of ∇q; that is, the Hessian matrix of q, which will be a function of X,
434
is the Jacobian of the gradient of q, and this comes out to be A, because you differentiate one more time.
And if A is positive definite, then the solution of equation 1, that is of ∇q(X) = AX - b = 0, will be the point of minimum. Why? Because A is positive definite, so the second order derivative is positive here for the functional q(X). So, what can we claim? The functional q(X) has a unique minimum in the case when A is positive definite.
And if this minimum is at X*, then X* is the stationary point of q(X), which means ∇q(X*) = AX* - b = 0, because X* is a stationary point. Hence, for a symmetric and positive definite (in short let me write it as SPD; S for symmetric and PD for positive definite) matrix A, solving AX = b is equivalent to finding the minimum of q(X) = 1/2XTAX - XTb.
So, what I want to say is that in gradient methods, for a given system AX = b, I will write such a functional q(X) and I will find the minimum of that functional; that will automatically be a solution of AX = b, the exact solution, and if not, it will be an approximate solution having the minimum residual error, because AX - b defines the residual.
So, in this category the first method we are going to discuss is the steepest descent method. This method is based on a greedy strategy, in that it chooses the search direction dk at the iteration k as the local direction of steepest descent, that is, dk = -∇q(Xk), and this I define as the residual rk in the k-th iteration.
435
(Refer Slide Time: 19:40)
So, the algorithm of steepest descent for solving AX = b is given in this way. Our first step will be: choose an initial solution X0, which is an n-dimensional column vector. Now, for k = 0, 1, 2 and so on, do: calculate the residual, which, as I told you, is rk = -∇q(Xk), so it will become b - AXk, because the gradient of q equals AX - b. Compute the step length αk = <rk, rk>/<rk, Ark>. So, in each iteration we will calculate this step length.
Once you have rk and αk in the k-th iteration, you can update your solution in the (k+1)-th iteration by using the iterative equation: iterate the sequence of solutions as Xk+1 = Xk + αk·dk, and as you know dk = rk, so I can also write this as Xk + αk·rk, because in steepest descent dk = rk. So, this is the algorithm of the steepest descent method;
let us take an example on it.
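A short sketch of this algorithm (for a symmetric positive definite A) is given below; the matrix and right hand side used in the demo are those of the example worked next, so the output can be compared against the iterates on the slides.

import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=1000):
    """Steepest descent for AX = b with A symmetric positive definite."""
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x                      # residual = search direction d_k
        if np.linalg.norm(r) < tol:
            return x, k
        alpha = (r @ r) / (r @ (A @ r))    # step length <r, r>/<r, Ar>
        x = x + alpha * r                  # X_{k+1} = X_k + alpha_k r_k
    return x, max_iter

# Example from this lecture: A = [3 -1 1; -1 3 -1; 1 -1 3], b = [-1 7 -7]
A = np.array([[3.0, -1.0, 1.0],
              [-1.0, 3.0, -1.0],
              [1.0, -1.0, 3.0]])
b = np.array([-1.0, 7.0, -7.0])

x, iters = steepest_descent(A, b, np.zeros(3))
print(x, "after", iters, "iterations")     # converges towards (1, 2, -2)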
436
(Refer Slide Time: 22:45)
So, let us try to solve it. As we have discussed, we can apply the steepest descent method for solving a linear system in the case when the matrix A is symmetric as well as positive definite. If I check it, the given matrix A is a symmetric matrix, because A = AT; it means A is symmetric.
Now, if I check for positive definiteness: there are different criteria for checking the positive definiteness of a given matrix. Here it is a 3×3 matrix, so I can apply the leading principal minor test. Here a11 is 3, which is greater than 0. If I check the next principal minor, it will be the determinant of [3 -1; -1 3], which comes out to be 8, which is again positive, and then the determinant of A comes out to be 20, which is again positive.
So, hence A is a positive definite matrix and we can apply steepest descent method for
solving AX = b. Now, let us apply this method.
437
(Refer Slide Time: 27:01)
So, let us say the first iteration, k = 0. For k = 0, X0 is given as (0, 0, 0)T. Now I will calculate r0, which is b - AX0. It comes out to be: b is given as [-1, 7, -7], minus A times X0, where A is [3 -1 1; -1 3 -1; 1 -1 3] and X0 is [0 0 0]T. So, this will become [-1 7 -7]T, which is just b itself, since X0 = 0.
Now, compute the step length α0. According to the algorithm it is the inner product <r0, r0>/<r0, Ar0>, so this will be r0Tr0/r0TAr0; r0Tr0 will be 99 and the denominator will give you 423, so it is 0.2340. So, here X1 comes out to be X0 + α0r0, and it comes out to be -0.2340, then 1.6383, and then -7 times 0.2340, so -1.6383. Now calculate r1; r1 will become b - AX1, which comes out to be 2.9787, 0.2128 and finally -0.2128. From here we calculate α1, which is r1Tr1/r1TAr1, and it comes out to be 0.3667.
So, the second approximation, means in the second iteration, becomes X2 = X1 + α1r1, and it comes out to be 0.858, 1.7163 and then -1.7163. Once you have X2, I can calculate r2, the residual in the second iteration: b - AX2, and this comes out to be -0.148, 0.9929 and -0.9929. Once we have r2, we can calculate α2, which is r2Tr2/r2TAr2, and this comes out to be 0.2340.
438
(Refer Slide Time: 31:50)
So, X2 as well as α2 and r2 are available with us, so we can calculate X3, the approximation in the third iteration, which, according to the general iterative equation of the non-stationary methods, is X2 + α2r2, and it comes out to be 0.8250, 1.9487, -1.9487.
If I calculate r3 here, it will become b - AX3, which is 0.4225, 0.0302 and -0.0302. The solution is converging, slowly I will say, towards the exact solution, which is given as (1, 2, -2). So, in this way we can apply the steepest descent method to a given problem.
So, this method is quite simple: in each iteration you have to calculate the residual, or I will say the search direction, which is just the residual, means minus the gradient of q, and the step length, which can be calculated from the search direction and the matrix A. However, the drawback of this method is that it is quite slow in terms of convergence.
In the next lecture we will learn how can we apply this method when the given matrix A
is not positive definite or symmetric or both. And, how can we update or how can we
increase the convergence of this method means in what way we should take our initial
solution so that we can have a faster convergence.
439
(Refer Slide Time: 34:43)
So, these are the references for this lecture; in particular, reference 1 is quite important for all these line search methods and Krylov subspace methods when you are solving large and sparse linear systems.
440
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 35
Non Stationary Iterative Methods: Steepest Descent II
Hello friends. So, welcome to the second lecture on Non-Stationary Iterative Methods. In this lecture we will continue the topic which we have discussed in the previous lecture, means the steepest descent method, and we will see a few more properties of this gradient method. So, in the previous lecture I told you that the steepest descent method works like this.
You have the residual rk = b - AXk, which is the search direction also. Then we have taken αk, that is the step length, as rkTrk/rkTArk. Now, the first property we are going to look at here, which is a very important property of the steepest descent method, is that in the steepest descent method consecutive search directions (denoted dk in the general setting) are orthogonal to each other. Means, what I want to say is that dk is orthogonal to dk+1 for k = 0, 1, 2 and so on: d0 is orthogonal to d1, then d1 is orthogonal to d2, and so on. So, let us try to prove it.
441
So, we have dk+1, which is basically the residual rk+1, and this is b - AXk+1 in steepest descent.
This equals b - A(Xk + αk·dk), so this becomes b - AXk - αk·Adk (αk is just a scalar), and since b - AXk can be written as rk and dk is also rk, we get dk+1 = rk - αk·Ark. So now, if I check the inner product of dk+1 with dk, which is the inner product of rk+1 with rk, this becomes <rk - αk·Ark, rk> = <rk, rk> - αk<Ark, rk>, because dk+1 I am writing in this form.
Now put in the value of αk: αk is <rk, rk>/<rk, Ark>, so the second term αk<Ark, rk> becomes exactly <rk, rk>. So the whole expression becomes <rk, rk> - <rk, rk> = 0. So, the inner product of two consecutive search directions is 0, means they are orthogonal to each other.
Now, let us see another variant of the steepest descent method, that is the steepest descent method when the given matrix A is not symmetric and positive definite; call it non-symmetric steepest descent. We have seen that in the steepest descent method the matrix A must be symmetric and positive definite in order to have a unique minimum of the functional q(X) = 1/2XTAX - XTb, which is also a solution of the linear system AX = b.
442
Now, just consider that A is not SPD (SPD stands for symmetric and positive definite), but it is non-singular. Then the matrix ATA is symmetric as well as positive definite, and the algorithm can be applied: instead of AX = b we apply it to the normal equation of AX = b, which is ATAX = ATb.
So, this I can write as ÂX = b̂, where Â is ATA and b̂ is ATb. Here you can easily see that Â is symmetric and positive definite, so I can apply the steepest descent method. This is the strategy for applying the steepest descent method to a general system, where A is not symmetric and positive definite; let us take an example of it.
Solve the linear system AX = b using the steepest descent method with initial solution (0 0 0)T, where A is given as [3 1 0; 0 3 2; 1 1 0] and b is [4 1 0]T. If you see, the matrix A is neither symmetric nor positive definite; hence we cannot apply the steepest descent algorithm directly on this system, means we cannot minimize the functional 1/2XTAX - XTb using the steepest descent method. So, what we will do here is apply the method to ATAX = ATb. Let us first calculate ATA: ATA = [3 1 0; 0 3 2; 1 1 0]T · [3 1 0; 0 3 2; 1 1 0], and this comes out to be [10 4 0; 4 11 6; 0 6 4].
At the same time we calculate ATb, and it becomes [12 7 2]T. So now, instead of the original system AX = b, we are going to solve ATAX = ATb. So, for applying the
443
steepest descent method here, this matrix needs to be positive definite; it will certainly be symmetric, because it is the product of a matrix with its transpose, so it is always symmetric. For positive definiteness, if we check here: 10 is greater than 0; if I take the minor [10 4; 4 11], this is 94, which is greater than 0; and the determinant of ATA comes out to be 16, which is again positive. So, hence ATA is a symmetric and positive definite matrix and we can
apply the steepest descent method here.
So, let us apply the method here. My r0 will become ATb - ATAX0, which comes out to be [12 7 2]T. Now I calculate my α0, which is r0Tr0/(r0T(ATA)r0); it will come out to be a scalar, and in the similar way as we have done in the previous lecture we can apply the steepest descent method.
And the solution of this system will be the solution of the original system. In particular, this procedure, where we apply steepest descent on the normal equations of the original system, is called the residual norm steepest descent. Here the functional being minimized is, instead of q(X), a different functional, which I will write as φ(X): φ(X) = 1/2<AX, AX> - <X, ATb>.
If you check the earlier one, in the case of the ordinary steepest descent it was q(X) = 1/2<X, AX> - <X, b>. So, here you can notice that we have ATb instead of b; it is because now my right hand side in the normal equation is ATb. And similarly, instead of <X, AX> I have <AX, AX> here, because I am applying the method to ATA instead of A.
444
And this method minimizes the Euclidean norm of the residual, that is, ||AX - b||. And if you can recall the least square approximation method, in that method we have written the solution as X = (ATA)⁻¹ATb. So, the solution which we are obtaining with the residual norm steepest descent method is similar to what we obtain using the least square approximation.
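A sketch of this residual norm steepest descent, reusing the steepest descent routine from the previous lecture on the normal equations ATAX = ATb for the example matrix given above, could look like the following.

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=5000):
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            return x, k
        alpha = (r @ r) / (r @ (A @ r))
        x = x + alpha * r
    return x, max_iter

# A is neither symmetric nor positive definite
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 2.0],
              [1.0, 1.0, 0.0]])
b = np.array([4.0, 1.0, 0.0])

# Apply steepest descent to the normal equations A^T A X = A^T b
AtA = A.T @ A            # [[10, 4, 0], [4, 11, 6], [0, 6, 4]]
Atb = A.T @ b            # [12, 7, 2]

x, iters = steepest_descent(AtA, Atb, np.zeros(3))
print(x, "after", iters, "iterations")
print("check A x =", A @ x)   # should be close to b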
Now, let us see another important property of the steepest descent method.
So, this I will write as the instant convergence of the steepest descent method. Let us write this result: if the initial error, and hence the initial residual r0 = b - AX0, is an eigenvector of the coefficient matrix A, then the steepest descent method converges in just one iteration.
So, what I want to say is: if you choose your initial solution in such a way that the initial residual, or initial error, becomes an eigenvector of the coefficient matrix, then the steepest descent method will converge in just one iteration to the exact solution. Let us see
the proof of this. So, let (λ, V) be an eigenpair of A, so λ is an eigenvalue and the corresponding eigenvector is V; it means A·V = λV, let us say this is equation 1.
Now, assume the initial solution X0 = X* - V, where X* is the exact solution of AX = b. So, what I am assuming here is that I am choosing the initial solution in such a way that
445
the initial error is the eigenvector; if I choose my initial solution in this way, then you can see from here that the initial error e0 = X* - X0 = X* - (X* - V), so this will become V.
So, I am taking the initial error as an eigenvector of the matrix A. Now calculate r0: r0 = b - AX0 = b - A(X* - V) = b - AX* + AV. Since AX* = b, b cancels out, it remains AV, and AV is nothing but λV.
So, the initial residual is a scalar multiple of V, which is again the same eigenvector. Here, if we calculate α0: α0 is the inner product <r0, r0>/<r0, Ar0>, so this comes out to be <λV, λV>/<λV, λAV>, and since AV again becomes λV, in the numerator you will have λ²<V, V> and in the denominator λ³<V, V>. So, it comes out to be 1/λ. So, the step length is 1/λ and the initial residual is λ times that eigenvector; now calculate X1.
eigenvector, now calculate X1.
446
So, X1 will become X0 + α0r0, and what is my claim? My claim is that the steepest descent method will converge in just 1 iteration, so this X1 should be equal to the exact solution, which is X* in our case.
So, X1 = X0 + α0r0, where α0 is 1/λ and r0 is λV; this comes out to be X0 + V, and X0 + V is nothing but X*, because X0 is X* - V, and X* is the exact solution of AX = b. So, in this way we have seen that if you choose the initial solution in such a way that the initial error is an eigenvector of the matrix A, then the steepest descent method will converge in just 1 iteration.
So, if somehow you have an idea of an eigenvector of A, you can choose your initial solution in such a way that the error in the zeroth iteration will be that eigenvector, and then your method will converge in just 1 iteration; this is the idea. Let us take an example based on this. Consider A = [3 -1 1; -1 3 -1; 1 -1 3]; this is the same example which we have taken in the earlier lecture.
And here I am taking the same b, [-1 7 -7]T. Solve AX = b using steepest descent. Here, if you see, one of the eigenpairs of A is (2, [1 1 0]T), so 2 is the eigenvalue and [1 1 0]T is the corresponding eigenvector; this is an eigenpair of A. If we choose X0 as X* - V, where X* is the exact solution of this system, it comes out to be [0 1 -2]T.
Then my r0 will become b - AX0, and b - AX0 comes out to be [2 2 0]T. Here, if I calculate α0 it will be 1/λ, so 1/2. So, what is X1? X1 is X0 + α0r0, which is [1 2 -2]T, and that is the same as the exact solution X*.
So, by this example we have verified the result which is given in the theorem, that the steepest descent method converges in just one iteration.
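This one-iteration convergence can be checked numerically with a few lines; the sketch below uses the eigenpair (2, [1 1 0]T) quoted above.

import numpy as np

A = np.array([[3.0, -1.0, 1.0],
              [-1.0, 3.0, -1.0],
              [1.0, -1.0, 3.0]])
b = np.array([-1.0, 7.0, -7.0])

x_star = np.linalg.solve(A, b)     # exact solution (1, 2, -2)
v = np.array([1.0, 1.0, 0.0])      # eigenvector of A with eigenvalue 2

x0 = x_star - v                    # initial error e0 = x_star - x0 = v
r0 = b - A @ x0                    # = 2*v, a multiple of the same eigenvector
alpha0 = (r0 @ r0) / (r0 @ (A @ r0))   # = 1/2 = 1/lambda

x1 = x0 + alpha0 * r0
print(x1, x_star)                  # x1 equals x_star after a single step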
447
(Refer Slide Time: 27:38)
So, these are the references for this lecture. In this lecture we have seen some properties of steepest descent, and then we have seen the residual norm steepest descent method for a general system AX = b, where the matrix A is not an SPD matrix, means not a symmetric and positive definite matrix.
In the next lecture we will learn another gradient method, called the conjugate gradient method, which has faster convergence compared to the steepest descent method.
448
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 36
Krylov Subspace Iterative Methods (Conjugate Gradient Method)
Hello friends. So, welcome to the 36th lecture of this course. As you remember, in the past couple of lectures I have discussed the steepest descent method, which comes from the class of non-stationary methods. Here we are going to look at another type of non-stationary methods, called Krylov subspace iterative methods.
So, before going to the methods, I will introduce what we mean by a Krylov subspace.
So, let me write the definition of Krylov subspace.
So, let A be an n×n matrix having real entries and b be a column vector of dimension n. Then the Krylov subspace, denoted Kj(A, b), is defined as follows: the j-th Krylov subspace of A and b is the linear span of the vectors b, Ab, A²b, and in this way up to A^(j-1)b. And you can observe that this Krylov subspace is a subspace of Rn, because all these vectors have n components, so all these vectors come from the vector space Rn. Let us take an example of it. Let A equal [2 1; 1 2] and b equal
449
[1 -1]T; then if I calculate Ab, it will be [2 1; 1 2][1 -1]T, and it comes out to be [1 -1]T, which is equal to A²b = A³b and so on.
So, here the Krylov subspaces are: K1(A, b) will be the linear span, let me denote the linear span by L, of {(1 -1)T}. Similarly, K2(A, b) will be the linear span of the vectors b and Ab; since b and Ab are the same vector here, it is again L{(1 -1)T}. So, in this way we can define the Krylov subspace.
Next I will define the Krylov matrix. Again, let A be an n×n matrix having real entries and b be a column vector having n components. Then the Krylov matrix of j-th order of A and b is the matrix having b as its first column, the second column will be Ab, and in this way the j-th column will be A^(j-1)b. So, if I want to define the n-th order Krylov matrix, then j = n and in that case it becomes a square matrix of order n. Next, what is the motivation of using Krylov subspaces for solving a linear system iteratively?
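A short sketch that builds the j-th order Krylov matrix column by column (and hence a spanning set for Kj(A, b)) is given below, using the 2×2 example above.

import numpy as np

def krylov_matrix(A, b, j):
    """Columns are b, Ab, A^2 b, ..., A^(j-1) b."""
    cols = [b]
    for _ in range(j - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

K2 = krylov_matrix(A, b, 2)
print(K2)
# Both columns are (1, -1)^T, so K_2(A, b) is only one-dimensional here
print("dim K_2(A, b) =", np.linalg.matrix_rank(K2))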
450
So, let us consider the motivation. Consider the matrix A, which is a 3×3 matrix having entries [0 2 1, -1 3 1, 2 2 3]. Now, the characteristic polynomial of A is CA(λ), which is just the determinant of A - λI, and if I calculate it, it comes out to be a polynomial in λ of degree 3, which is λ³ - 6λ² + 11λ - 6.
Now, by the Cayley-Hamilton theorem we have — so, what does the Cayley-Hamilton theorem tell us? That every matrix satisfies its characteristic equation. So it means I can write A³ - 6A² + 11A - 6I = 0, where I is the identity matrix of order 3×3 and 0 is the null matrix of size 3×3. Or I can write from here, if A is an invertible matrix, then A⁻¹ = (1/6)[A² - 6A + 11I]. So, what have I done? I have multiplied this characteristic equation, written in terms of the matrix, by A⁻¹ and written it in this way; let us say this is a polynomial of the matrix A.
Now, from here I can say that the inverse of A can be expressed in terms of a polynomial
of the matrix A. Now, if we have a system AX = b, where A is an invertible matrix, then the
solution of this system can be written as X = A⁻¹b, which in the case of this matrix I can
write as 1/6 [A² - 6A + 11I] b, which is nothing but p(A) b; if I expand it, what I get is
1/6 [A²b - 6Ab + 11b].
If you see this particular vector, that is A²b - 6Ab + 11b, it is a column vector having 3
components, so a 3×1 vector. It lies in the linear span of A²b, Ab and b, which means this
vector can be written as a linear combination of A²b, Ab and b, and this span is nothing
but the Krylov subspace K_3(A, b). So, hence the solution of the system AX = b is a vector
from the Krylov subspace K_3(A, b). So, that is the motivation of using Krylov subspaces
for solving the system iteratively.
Now, what I want to mention here is that if A is a large and sparse matrix, but it is non-singular,
then a Krylov subspace method is a very effective method for solving the linear system. We
will get the solution with less computational complexity as compared to the standard
iterative methods like Jacobi and Gauss-Seidel, or some direct
method. So, this is the motivation of using Krylov subspace methods.
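A small sketch of this motivation (my own code, not the lecturer's): by Cayley-Hamilton, A⁻¹b is a combination of b, Ab, …, A^{n-1}b, i.e. a vector of K_n(A, b). The 3×3 matrix above is used, but any invertible matrix would do, and the right hand side b chosen here is an arbitrary vector for illustration.

```python
import numpy as np

A = np.array([[0., 2., 1.],
              [-1., 3., 1.],
              [2., 2., 3.]])
b = np.array([1., 2., 3.])     # arbitrary right-hand side, my own choice

c = np.poly(A)                 # coefficients [1, c1, ..., cn] of det(lambda*I - A)
n = A.shape[0]

# Cayley-Hamilton: A^n + c1 A^{n-1} + ... + cn I = 0, hence
# A^{-1} b = -(1/cn) * (A^{n-1} b + c1 A^{n-2} b + ... + c_{n-1} b).
x = np.zeros(n)
v = b.copy()                   # v runs through b, Ab, A^2 b, ...
for k in range(n):
    x += c[n - 1 - k] * v
    v = A @ v
x *= -1.0 / c[n]

print(np.allclose(x, np.linalg.solve(A, b)))   # True: same solution
```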
(Refer Slide Time: 12:26)
One more thing I want to write about, and that is the main result. So, we can summarize
this main result as: let AX = b be a linear system, where A belongs to R^{n×n} and b is a
vector from the n-tuple real vector space; we do not care how large n is. Then there exists a
real polynomial p given as p(t) = Σ_{j=0}^{m-1} α_j t^j, where α_0, α_1, ..., α_{m-1} are
scalars, all real numbers, such that the solution of the system AX = b can be written as
X = p(A) b, and this can be written as Σ_{j=0}^{m-1} α_j A^j b; this is p(A) applied to b.
Now, it is not necessary here that m equals n, that is, that the polynomial p(t) has the degree
of the characteristic polynomial of the matrix A. One such case is the use of the minimal
polynomial, because the minimal polynomial of a matrix A also annihilates A, and from there
we can find A⁻¹ using a polynomial of lower degree, sometimes much lower than the
characteristic polynomial; that is why I was saying that these methods are quite
effective and have less computational complexity.
(Refer Slide Time: 15:12)
Now, let us just try to formulate a method using these subspaces. So, let me revisit Krylov
subspaces: the Krylov subspace K_j(A, b) is the column space of the Krylov matrix, which is
quite obvious because the columns of the Krylov matrix are nothing but the vectors b, Ab, A²b,
up to A^{j-1}b.
Now, for solving a linear system we want to choose the best combination as our
improved X_j in the j-th iteration, given X_{j-1}. Now, there are various ways of
defining this 'best', or various ways of choosing X_j in the j-th Krylov subspace of A
and b. So, let us see what those ways are. The first way uses the residual r_j, which is
nothing but b - AX_j. So, choose X_j from the Krylov subspace K_j(A, b) in such a way that
the residual r_j is orthogonal to K_j(A, b), meaning r_j is orthogonal to all vectors from
this Krylov subspace. If you use this strategy then we get a method that is called the
conjugate gradient method, or in short the CG method.
The second strategy is that the residual r_j has minimum norm for X_j in K_j(A, b). If we
follow this strategy then we get another Krylov subspace iterative method that is called the
GMRES method; this stands for the Generalized Minimal RESidual method. Another way of
defining this choice is that the residual r_j is orthogonal to a different space, K_j(Aᵀ, b).
So, here just notice that instead of A, I am having the Krylov subspace of Aᵀ and b; this
method is called the biconjugate gradient method. The next
strategy is that the error e_j has minimum norm. So, the error in the j-th iteration has minimum
norm, and this strategy is followed in a method called SYMMLQ. So, in this way we have
these 4 Krylov subspace methods, based on defining the 'best' choice in different ways.
There are other methods also. In this course I will focus on the first of these methods,
because if you know one method you can follow the other methods from the literature in a
simple manner.
So, let us learn what the conjugate gradient method is. It is one of the standard Krylov
subspace methods, which is used when the system is large and sparse, but the coefficient
matrix A is symmetric and positive definite, like in the case of the steepest descent
method; so it is SPD, in short.
So, let us go to the algorithm, how this method executes. The input is a symmetric and
positive definite matrix A, which is the coefficient matrix of the linear system AX = b, the
right hand side vector b, and an initial guess X0, because it is a non-stationary iterative
method, so we need some initial solution.
In the first step compute r0, which is b - AX0, and it is the same as in the case of steepest
descent. After computing this, set the search direction in the initial iteration as d0 = r0,
which is again similar to steepest descent. Now, for k = 0, 1, 2, ... until convergence do:
calculate α_k, that is the step length, α_k = r_kᵀr_k / d_kᵀAd_k, which in the initial
iteration is the same as in the case of steepest descent; later it will change, because we
will make a change in the search direction. Then the 4th step is to compute
X_{k+1} = X_k + α_k d_k, the same iterative equation as we have in steepest descent.
Next, from here this method will differ from the steepest descent method. Compute r_{k+1},
which was b - AX_{k+1} in steepest descent, but here it becomes r_k - α_k A d_k. Then, if
r_{k+1} = 0 then stop, else calculate β_k, which is <r_{k+1}, r_{k+1}> / <r_k, r_k>. Then
compute d_{k+1}, the next search direction, as r_{k+1} + β_k d_k, and then end your 'for'
loop. So, this is the complete algorithm for the conjugate gradient method.
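Here is a minimal runnable sketch of the steps just listed (my own illustration, not code from the lecture); the function name conjugate_gradient, the tolerance and the iteration cap are my own choices, and A is assumed to be SPD.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """A minimal sketch of the CG algorithm described above (A assumed SPD)."""
    x = x0.astype(float).copy()
    r = b - A @ x            # r0 = b - A x0
    d = r.copy()             # d0 = r0
    if max_iter is None:
        max_iter = len(b)
    for _ in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) < tol:           # stop when the residual vanishes
            break
        alpha = rr / (d @ (A @ d))      # step length alpha_k
        x = x + alpha * d               # x_{k+1} = x_k + alpha_k d_k
        r = r - alpha * (A @ d)         # r_{k+1} = r_k - alpha_k A d_k
        beta = (r @ r) / rr             # beta_k = <r_{k+1}, r_{k+1}> / <r_k, r_k>
        d = r + beta * d                # next search direction
    return x
```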
If I need to explain this method theoretically, then let us have a quick look at it. So, let us
have a quick look at the explanation of this method in a theoretical sense. The conjugate
gradient algorithm returns approximations X_j, the solution in the j-th iteration, as the sum
of the initial solution plus a vector from the j-th Krylov subspace of A and r0, that is,
X_j ∈ X0 + K_j(A, r0). And this is true for j = 0, 1, 2 and so on, such that the
error in the j-th iteration, measured in the A-norm, is minimum over all vectors of this form
(equivalently, over all polynomials q of degree j-1 used to build the Krylov vector), where
the A-norm is defined as ||X||_A = sqrt(Xᵀ A X) for a given
vector X. So, it is the norm induced by the matrix A.
Now, if we look at the first step of this method, that is, going from X0 to X1, what we have is
X1 = X0 + α0 r0. Now, this α0 r0 is a vector in the first Krylov subspace, because this Krylov
subspace contains r0 and hence all scalar multiples of r0. So, X1 = X0 + α0 r0
belongs to X0 + K_1(A, r0). Here α0 will be <r0, r0> / d0ᵀAd0, and this is nothing but
||r0||² / ||d0||²_A, the square of the norm of r0 over the square of the norm induced by the
matrix A on the vector d0, and it cannot be 0 because r0 is not 0. Once r0 is 0 the method
will not proceed; that is the termination or stopping condition for the conjugate gradient method.
Now, the residual r1 will be r0 - α0 A d0. Again this will be a vector in the second Krylov
subspace of A and r0. Here, if r1 is 0 then the method will stop; otherwise {r0, r1} is an
orthogonal basis of K_2(A, r0), which is the condition I told you about when describing the
4 methods: each time the residual r_j will be a vector which is orthogonal to the j-th Krylov
subspace of A and r0. So, in this way, when I calculate r2, then r0, r1 and r2 will form an
orthogonal basis for K_3(A, r0), and so on. So, in this way we will proceed in the subsequent
iterations of the conjugate gradient method.
So, in this lecture we have learnt the definition of the Krylov subspace, and then we have seen
the motivation of using Krylov subspaces for solving large and sparse linear systems. In the
last phase of this lecture we have learnt the conjugate gradient algorithm. In the next
lecture we will take an example of the conjugate gradient method and then we will see some
conditions for the convergence of this method. Later on in the next lecture we will learn the
preconditioning of the conjugate gradient method.
(Refer Slide Time: 33:42)
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 37
Krylov Subspace Iterative Methods
(CG and Preconditioning)
Hello friends. So, welcome to the 37th lecture of this course. As you remember, in the
last lecture we discussed Krylov subspace based iterative methods; in particular we discussed
the conjugate gradient method, and there we have seen the algorithm of this method. Now, in
this lecture let us start with an example of the conjugate gradient method.
Let us take a simple example: solve the system 2x1 - x2 = 1, -x1 + 2x2 = 0 using the conjugate
gradient method with initial solution X0 = (0, 0). So, here, if we see, the coefficient matrix
A is a 2×2 matrix [2 -1; -1 2], which is symmetric as well as positive definite. So, it is an
SPD matrix.
Now, the right hand side vector b is [1 0] and X0 is given as [0 0]. So, let us apply the
conjugate gradient method for solving this system. If you remember, the first step of
the algorithm of the conjugate gradient method is to calculate the initial residual, that is
r0 = b - AX0. Since AX0 is a zero vector, the second term is 0, and r0 equals b, and b is given
as [1 0]. Also, in the algorithm we take it as the initial search direction d0.
Now, in the second step, what will we do? We will calculate α0, that is, the step length,
which is r0ᵀr0 / d0ᵀAd0, and here you can notice d0 and r0 are the same vector. If I calculate,
it comes out to be 1/2. Then, in the third step, since we have d0 as well as α0 with us,
I can calculate the next approximation of the solution, that is X1 = X0 + α0 d0.
So, here X0 is [0 0], α0 is 1/2 and d0 is [1 0], so X1 becomes [1/2 0].
Now, in the 4th step, what will we do? We will calculate the residual in the first iteration. So,
here r1 will be r0 - α0 A d0, which is different from the steepest descent method,
because in steepest descent we used to take r1 as b - AX1. So, if I calculate it, since I have
r0, α0, d0 as well as A with us, I get r1 = [0 1/2]. Now, once you
have r1, calculate β0, which is r1ᵀr1 / r0ᵀr0, and it comes out to be 1/4. So, here the new
search direction d1 is given as r1 + β0 d0. So, r1 is basically [0 1/2], β0 is 1/4, and d0 is
[1 0] from the first line. So, d1 comes out to be 1/4 as the first component and 1/2 as the
second component.
Once we have d1, I can calculate α1. So, again, α1 will be r1ᵀr1 / d1ᵀAd1,
and this quantity becomes 2/3. So, if I have α1, I can calculate the next
approximation of the solution, that is X2 = X1 + α1 d1. So, X1 is
[1/2 0], α1 is 2/3, and d1 is [1/4 1/2]. So, X2 becomes [2/3 1/3].
Next I will calculate r2. So, just as r1 was r0 - α0 A d0, r2 will be r1 - α1 A d1.
So, here if I calculate it: r1 is [0 1/2], α1 is 2/3, A is [2 -1; -1 2], and d1 is [1/4 1/2].
So, this becomes, if I calculate it, [0 0]. So, now r2 is 0, so our iteration will stop;
the method has converged to X2, that is, the solution is x1 = 2/3 and x2 = 1/3, which is
also the exact solution of the system. So, this is the implementation process of the
conjugate gradient method.
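A quick numerical check of the hand computation above (my own verification, not part of the lecture): the exact solution of this 2×2 system is indeed (2/3, 1/3), which is exactly the iterate X2 reached after two CG steps.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
b = np.array([1.0, 0.0])

print(np.linalg.solve(A, b))   # [0.66666667 0.33333333] -> matches X2 above
# (the conjugate_gradient sketch from the previous lecture, applied with
#  x0 = np.zeros(2), returns the same vector)
```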
Now, let us talk about the convergence of the conjugate gradient method.
(Refer Slide Time: 08:21)
So, there is a result in the literature regarding the convergence of this method: let A be a
symmetric and positive definite matrix of order n, so it is an n×n symmetric as well as
positive definite matrix. Then the conjugate gradient method for solving the system AX = b
converges in at most n iterations. So, the CG method will not take more than n iterations,
like in the case of the previous example which we took at the beginning of this lecture: the
system was 2×2 and we took just 2 iterations for getting the exact solution.
Moreover, the number of iterations for convergence will be proportional to the square root
of the condition number of A. So, if the condition number of A is large, then the CG
method will take a larger number of iterations; if it is small, then we will have faster
convergence. And if you recall from one of the previous lectures, the condition number of A
can be given as λ_max/λ_min, which is just the product of the norm of A with the norm of A⁻¹.
So, to have better convergence we want the condition number of the coefficient matrix to be small.
Now, the question arises: if A is a symmetric and positive definite matrix, but the
condition number of A is quite large, then we will have slow convergence of the CG
method. So, can we have some method to improve this convergence, or to make the
convergence faster? Yes, we have such a method, and it is called preconditioning.
So, let us talk about preconditioning.
(Refer Slide Time: 12:24)
So, the general idea of preconditioning for iterative methods is to modify the original
system, which is ill-conditioned — ill-conditioned means here that the condition number of the
coefficient matrix is quite large. So, if AX = b is an ill-conditioned system, we modify
this system in such a way that we obtain an equivalent system; let us write it with the hat
notation, that is, ÂX̂ = b̂. Here the meaning of equivalent is that the solution of AX = b is also
the solution of ÂX̂ = b̂, or, in the reverse way, the solution of the new system, which is
denoted with hats, equals the solution of the original system. And for this new system
we should have faster convergence, that is, a system for which the iterative method converges
faster.
So, one approach for doing this is to choose a nonsingular
matrix M of the same size as the coefficient matrix A and rewrite the original
ill-conditioned system AX = b as M⁻¹AX = M⁻¹b. Here M should be chosen in
such a way that the condition number of M⁻¹A is much smaller than the
condition number of A. In that way the convergence of this new modified system
will be faster when compared to the ill-conditioned original system. So, this is the basic
idea of preconditioning.
Now, one of the problems here is that when you go from the original system AX = b to a
new system ÂX̂ = b̂, then for applying the conjugate gradient method to the new system the
matrix Â should be symmetric as well as positive definite. So, how do we choose such an M,
that is, the preconditioning matrix or preconditioner M, such that the resulting preconditioned
system remains symmetric as well as positive definite? So, let us address this issue.
So, here the issue is how to find ÂX̂ = b̂ in order to ensure the SPD property, that is, the
symmetry and positive definiteness of Â. A solution to this issue is to let M⁻¹ = LLᵀ, where
L is a lower triangular matrix. So, here L is an n×n lower triangular matrix, since we are
taking A as n×n, and it is nonsingular also, meaning none of the diagonal elements of this
lower triangular matrix is 0.
Then what will we have? If the original system is AX = b, then this system is equivalent to
M⁻¹AX = M⁻¹b, which is equivalent to LᵀAX = Lᵀb, which is equivalent to
LᵀAL L⁻¹X = Lᵀb. Now, take this matrix LᵀAL as your Â, take L⁻¹X as
your new variable X̂, and take Lᵀb as b̂. So, if you choose your matrices Â, X̂ and b̂ in
this way, then we can ensure the symmetry and positive definiteness of the matrix Â.
(Refer Slide Time: 20:35)
Now, let us rewrite the conjugate gradient algorithm with preconditioning. So, the same
algorithm which we have taken in the previous lecture we will rewrite, but together
with a preconditioner M. Let us write the preconditioned CG algorithm. Here the input
will be the coefficient matrix A, the right hand side vector b, the initial solution X0 and the
modification when compared to the classical conjugate gradient method is that here we will
have a preconditioner M, which is a matrix of the same size as A.
Now, the first step is to compute r0 = b - AX0 and solve Mr̂0 = r0. Once you obtain r̂0 from
here, set d0 = r̂0. If you recall the original algorithm, there we did not have this step;
there we were simply setting d0 as r0. But here what I am doing is
calculating a new r̂0, which is nothing but M⁻¹r0, which is due to the preconditioning.
Now, in the second step, for k = 0, 1, 2, ... until convergence do: find α_k, which is now
r_kᵀr̂_k / d_kᵀAd_k. So, this is another change here: earlier we had r_kᵀr_k; now what
we have is r_kᵀr̂_k / d_kᵀAd_k. The fourth step is to update X as
X_{k+1} = X_k + α_k d_k. Once you have X_{k+1}, calculate r_{k+1}, which is r_k - α_k A d_k.
Here if r_{k+1} = 0 then stop; otherwise go to step 6: again solve Mr̂_{k+1} = r_{k+1}, and from
here obtain r̂_{k+1} with the help of M and r_{k+1}. Then compute β_k, which is
r_{k+1}ᵀr̂_{k+1} / r_kᵀr̂_k.
Once you have your β_k, then compute d_{k+1} = r̂_{k+1} + β_k d_k, and in that way the algorithm
will run till convergence; this is the end of the 'for' loop which you have in the
second line. So, this is the preconditioned conjugate gradient algorithm.
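As with the classical CG, here is a minimal runnable sketch of the preconditioned algorithm just described (my own illustration, not the lecturer's code); the function name and the use of a direct solve for the inner systems Mr̂ = r are my own choices made purely for clarity.

```python
import numpy as np

def preconditioned_cg(A, b, x0, M, tol=1e-10, max_iter=None):
    """A minimal sketch of the preconditioned CG algorithm above (A assumed SPD);
    the systems M r_hat = r are solved with a dense direct solve for illustration."""
    x = x0.astype(float).copy()
    r = b - A @ x
    r_hat = np.linalg.solve(M, r)        # solve M r_hat_0 = r_0
    d = r_hat.copy()
    if max_iter is None:
        max_iter = len(b)
    for _ in range(max_iter):
        rz = r @ r_hat
        if np.linalg.norm(r) < tol:
            break
        alpha = rz / (d @ (A @ d))       # alpha_k = r_k^T r_hat_k / d_k^T A d_k
        x = x + alpha * d                # x_{k+1} = x_k + alpha_k d_k
        r = r - alpha * (A @ d)          # r_{k+1} = r_k - alpha_k A d_k
        r_hat = np.linalg.solve(M, r)    # solve M r_hat_{k+1} = r_{k+1}
        beta = (r @ r_hat) / rz          # beta_k = r_{k+1}^T r_hat_{k+1} / r_k^T r_hat_k
        d = r_hat + beta * d             # next search direction
    return x
```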
Now, what will we see here: what is the extra computation here when compared to the
original conjugate gradient algorithm? If you see it, in each iteration we
have to solve an extra system Mr̂_k = r_k for finding r̂_k. So, to have a
faster algorithm, to have a faster computation, we have to choose the preconditioner M in
such a way that this system can be solved easily; otherwise there is no
use of such a preconditioner, because in each iteration
you have an extra system to solve. So, how to choose this
M? Let us talk about it.
So, what should M be? The 2 extreme cases are: number one, M equals the identity
matrix. Then you can easily solve the system Mr̂_k = r_k, because if M is the
identity then your r̂_k becomes r_k in each iteration, and it means the preconditioned CG
reduces to the classical CG algorithm. So, there is no preconditioning here.
The other choice is M = A. If you take M = A, that is the other extreme case, then X̂
becomes A⁻¹b, which is just as difficult, because it amounts to a direct method for solving the
original system; so this is also difficult and not a feasible choice. So, what do we need to do?
We need to choose M somewhere in between these 2 cases. There are a few choices of
M. The first: the original matrix A can be written as the sum of 3 different matrices L, D
and U, like we did at the beginning of the iterative methods, in the Jacobi and Gauss-Seidel
methods, where L is a lower triangular matrix, D is a diagonal matrix and U is
an upper triangular matrix.
If you choose M in any of these ways, based on this splitting, then we will have: number 1,
not M as such but basically Â will be symmetric and positive definite; number 2, Mr̂_k = r_k
can be solved easily, as in those earlier methods; and the third one, which is a guarantee for
the convergence, is that the spectral radius of I - M⁻¹A is < 1, or ||I - M⁻¹A|| < 1. I prefer
the spectral radius condition, because it is a necessary and sufficient condition. So, these 3
things will surely hold if you make the choice of M based on a Jacobi, Gauss-Seidel or SOR
type of approach; a small illustration follows below.
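As a rough illustration of the Jacobi-type choice M = D taken from the splitting A = L + D + U above (the matrix below is my own made-up example, not from the lecture): for a badly scaled SPD matrix, the condition number of M⁻¹A can be far smaller than that of A itself, which is exactly what preconditioning aims for.

```python
import numpy as np

A = np.array([[1000.0, 1.0, 0.0],
              [1.0, 2.0, 0.1],
              [0.0, 0.1, 0.01]])      # an SPD matrix with badly scaled diagonal
M = np.diag(np.diag(A))               # Jacobi preconditioner: M = D

print(np.linalg.cond(A))                       # large condition number
print(np.linalg.cond(np.linalg.solve(M, A)))   # much smaller after preconditioning
```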
So, with this I will end this lecture. In this lecture we have learnt how we can
make the convergence of the conjugate gradient method faster than the classical one
using the preconditioning procedure.
So, with this I will end the iterative methods in this course and in the next lecture we will
see some new properties of a matrix especially when all the entries of a matrix are
positive.
(Refer Slide Time: 32:55)
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 38
Introduction to Positive Matrices
Hello friends. So, welcome to the 38th lecture of this course. The title of this lecture is
Introduction to Positive Matrices. In the past 6 lectures we have talked about the different
iterative methods for solving linear systems, but this lecture is different from the
previous thread. Here we will see a special type of matrix, and we will try to find out what
the nature of its eigenvalues and eigenvectors is. So, let us first define the positive matrix.
So, a matrix A of order n×n is called positive if all of its entries are positive, that is,
a_ij > 0 for all i, j. Here we have 3 different matrices. If you see A1, all the entries
of A1 are positive, so it satisfies this definition, and hence A1 is a positive matrix.
If you see A2, three entries of A2 are positive, but one entry is 0. Hence it is not a
positive matrix; but if all the entries of a matrix are non-negative, like in this case, then the
matrix is called a non-negative matrix. So, what we can say is that every positive matrix is
also a non-negative matrix, but the reverse is not true. If you see the third matrix A3, you can
see one entry is -1, so this matrix is neither positive nor non-negative. So, the basic aim of
this lecture is to investigate the extent to which this positivity is inherited by the
eigenvalues and eigenvectors of a positive matrix A.
So, now let us discuss some properties of positive matrices. The first property is: if A is
a positive matrix, then the spectral radius of A will be greater than 0; that is, if it is a
positive matrix then the spectral radius cannot be 0.
And the simple justification of this involves 2 things. One is the spectrum: the set of
eigenvalues of a matrix A is called the spectrum of A, and it is denoted as σ(A). So, if an n×n
matrix A has eigenvalues λ1, λ2, ..., λr — why up to r? Because r may be less than n, since
some of the eigenvalues may be repeated — then the spectrum of A is simply the set
{λ1, λ2, ..., λr}; some of the λ's may be negative.
The other thing is the spectral radius of a matrix, which is denoted by ρ(A), and it is the
maximum among all |λ_i|, the absolute values of the eigenvalues. So, for example, if a matrix A
has eigenvalues 2, -2, 2, 3, 3 — then it is a 5×5 matrix — the spectrum of A is simply
{-2, 2, 3}, and here the spectral radius of A is 3. And if instead the eigenvalues 3 and 3 were
-3 and -3, then the spectrum would be {-2, 2, -3}, but the spectral radius would remain 3,
because here we are taking the absolute value. So, this is all about the definitions of the
spectrum and the spectral radius, because these 2 terms we will use quite frequently in this lecture.
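A tiny sketch of these two definitions (my own illustration; the 5×5 diagonal matrix below is just a convenient way to realise the eigenvalues mentioned above):

```python
import numpy as np

A = np.diag([2.0, -2.0, 2.0, 3.0, 3.0])      # a matrix with eigenvalues 2, -2, 2, 3, 3

eigenvalues = np.linalg.eigvals(A)
spectrum = set(np.round(eigenvalues, 10))    # {-2.0, 2.0, 3.0}
spectral_radius = max(abs(eigenvalues))      # 3.0

print(spectrum, spectral_radius)
```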
So, the first thing is: if A is a positive matrix, then the spectral radius of A will be
greater than 0. A simple justification of this is that the spectral radius of A can be 0 only
when the spectrum of A contains only the eigenvalue 0; otherwise it is positive, because we
are taking the absolute value.
So, if that were the case, then from the Jordan form of A the matrix A itself would be
nilpotent; nilpotent means A^K = 0 (the zero matrix) for some positive K, and the smallest such
K is called the index of nilpotency. But A^K = 0 is not possible when A is a positive matrix,
because all the entries are positive and they are multiplied with positive entries, so they
cannot become 0. Hence the spectral radius of A cannot be 0, and it is strictly positive for
positive matrices. So, this is the justification for the first result.
My second remark, or second result, is: if A is a positive matrix and x is a non-negative,
non-zero vector, then Ax will be a positive vector. The third result is: if A is a non-negative
matrix and u ≥ v ≥ 0 are vectors, then Au ≥ Av.
The next result is: if A is non-negative and z is a positive vector, then Az = 0 only
in the case A = 0. The next result is: if A > 0 and u > v ≥ 0 are two vectors, then Au
> Av. So, these are some basic results we will use further.
Now, let us talk about the positive eigenpair. If A is an n×n positive matrix, then the
following statements are true: number one, the spectral radius of A lies in the
spectrum of A; and the second is, if Av = ρ(A) v, then v can be taken to be a positive vector,
meaning the eigenvector corresponding to the spectral radius of A is a positive vector. Let me
explain the first condition, that ρ(A) belongs to σ(A). Consider a matrix A of order 3×3
having eigenvalues λ = 2, -3 and 1; in this case the spectrum of A is {-3, 1, 2} and the
spectral radius of A is 3, the maximum of the absolute values of the eigenvalues.
So, for such a matrix A the spectral radius does not belong to the spectrum of A; but for a
positive matrix the result says that the spectral radius of A will always be in the spectrum of A,
meaning whatever the spectral radius is, it will be an eigenvalue of A. For example, if the
eigenvalues are 2, 3, 1, then the spectral radius is 3 and 3 is in the spectrum. So, this is the
explanation of that particular statement.
And the second condition is saying that the eigenvector corresponding to this eigenvalue
which is the spectral radius of A will be a positive vector. So, let us prove it.
So, for simplicity assume that the spectral radius of A is 1, which is positive because the
matrix is positive. We can always arrange this, because if the spectral radius of A is r, a
positive number, then the spectral radius of A/r becomes 1, and A/r is again a
positive matrix. So, here we fix the spectral radius to be 1.
Now, let (λ, X) be any eigenpair of A such that the absolute value of λ is 1. Then what do we
have? |X| = |λ| |X| = |λX| = |AX|, because AX = λX since (λ, X) is an eigenpair of A, and this
is less than or equal to A|X|, entrywise, since A is a positive matrix. So, this
gives me that |X| ≤ A|X|. Let us write this as equation number (1).
Now, the goal of this theorem is to show that equality holds in (1). For showing this, let
Z be the vector A|X|, that is, the right hand side of (1), and define Y as the difference
Z - |X|. Then from (1), since Y = Z - |X|, we have Y ≥ 0.
Now, here we have to show that equality holds, that is, that Y must be 0.
Suppose Y is not equal to 0; so we are proving it by contradiction.
Y not equal to 0 means Y is not the zero vector, so there will be some
component y_i which is positive. Now, A is a positive matrix and Y is a non-negative, non-zero
vector; this implies, from the results given in the previous slide, that AY will be a positive
vector; and similarly Z = A|X| will also be a positive vector.
So, there exists a number, in fact a positive number ε, such that AY > εZ; it may be any
positive number, greater than 1 or less than 1, because these are 2 positive vectors, so we
can find such a number. Now, AY = AZ - A|X| = AZ - Z, so AZ - Z > εZ, that is,
AZ > (1 + ε)Z, or [A/(1 + ε)] Z > Z. Let us write this matrix A/(1 + ε) as B, which is
again a positive matrix, because 1/(1 + ε) is a positive number and you are dividing
each entry of A by 1 + ε. So, BZ > Z, and as I told you, B is again a positive matrix.
Now, Z > 0 and BZ > Z; applying the positive matrix B again preserves this inequality, so
B²Z > BZ > Z, and B²Z is again a positive vector. In this way we get that
B^K Z > Z, a positive vector, for every K = 1, 2, and so on.
Now, if I take the limit K → ∞ of B^K Z, which is the limit of [A/(1 + ε)]^K Z because
B = A/(1 + ε), this equals 0. Why is it 0? Because if you recall, the spectral radius of A is 1,
and so the spectral radius of B equals 1/(1 + ε), which is less than 1. So, B raised to the
power K tends to 0 as K → ∞.
So, if this is the case, then taking the limit in B^K Z > Z we get 0 ≥ Z, which contradicts
the fact that Z > 0. This means the assumption Y ≠ 0 led to a contradiction; it
means Y must be 0, that is, equality holds, which is what we needed to prove. If Y
is 0, it means 0 = A|X| - |X|, which implies A|X| = 1·|X| = ρ(A)|X|, since here
ρ(A) = |λ| = 1. It means |X| is an eigenvector corresponding to the spectral radius, and
|X| = A|X| > 0, so the eigenvector corresponding to the spectral radius of a positive
matrix is a positive vector. So, this is the proof of this particular result.
My next result is on the index of a positive matrix. If A is an n×n positive matrix, then
the following statements are true. The first one is that the spectral radius of A is the only
eigenvalue of A on the spectral circle; the spectral circle means the circle |z| = ρ(A) in the
complex plane. The second result is that the index of the spectral radius of A equals 1, meaning
whatever eigenvalue is the spectral radius, its index is 1. And index means here, in the Jordan
canonical form of A, the size of the largest block corresponding to that particular eigenvalue;
that is called the index. So, if λ is an eigenvalue, the index of λ is the size, or dimension, of
the largest Jordan block for λ in the Jordan canonical form of A.
The other one is that the spectral radius of A is a semi-simple eigenvalue. An eigenvalue is
called semi-simple if its algebraic multiplicity equals its geometric multiplicity. So, we are
not giving the proof of these 2 results, but we will take an example on these results.
The other result on positive matrices is: if A is a square matrix of order n which is
a positive matrix, then the algebraic multiplicity of the spectral radius of A equals 1. And
its geometric multiplicity is also 1, because it is a semi-simple eigenvalue; hence the
algebraic multiplicity of the spectral radius of A equals the geometric multiplicity of the
spectral radius of A, and both equal 1. So, let us take an example of that.
So, I am having a definition here. Let A be a positive matrix then the eigenvector of A
associated with the eigenvalue which is the spectral radius of A is called the Perron
vector. We have already shown that this Perron vector will be a positive vector. The
eigenvalue equals to spectral radius of A is called the Perron root of A. So, this is the
definition of Perron vector.
Now, if A is a positive matrix, there are no non-negative eigenvectors of A other than
(multiples of) the Perron vector; all other eigenvectors will not be non-negative, that is,
there will be some negative component in those eigenvectors.
So, now let us take an example to explain this in a better way. It is a 3×3 matrix
which is a positive matrix; you can see all the entries are positive. Now, if I calculate the
eigenvalues of this matrix, they are 12, 6 and 6, so here the spectral radius of A is 12.
Now, the eigenvector for λ = 12 is [1 1 1], and here you can see this vector is the Perron vector
for the matrix A, and, as we have already proved, this particular vector is a positive
vector. Moreover, in the last slide I told you that no other eigenvector will be non-negative.
So, if you see the other 2 eigenvectors X2 and X3, both of them will have at
least 1 negative component, so they are not non-negative.
The other result is that only the spectral radius of A, that is, only the Perron root, will lie
on the spectral circle. So, here the Perron root is 12, and we have A = [7 2 3; 1 8 3; 1 2 9].
(Refer Slide Time: 25:19)
So, if I draw the Gershgorin circles of A, then the first circle is |λ - 7| ≤ 5: 7 is the
diagonal element, and 5 is the sum of the absolute values of the rest of the elements in the
first row. So, this is basically a circle having centre at 7 and radius 5, so it runs from 2
to 12. The second circle has centre at 8 and radius 4, and the third circle has centre at 9
and radius 3. All these circles touch at the point 12, which is our spectral
radius, or Perron root, and this is consistent with the claim that only the Perron root lies
on the spectral circle.
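A numerical check of this example (my own, not from the slides): the Perron root of the matrix A above is 12 and the associated eigenvector is a positive vector proportional to [1 1 1].

```python
import numpy as np

A = np.array([[7.0, 2.0, 3.0],
              [1.0, 8.0, 3.0],
              [1.0, 2.0, 9.0]])

w, V = np.linalg.eig(A)
i = np.argmax(np.abs(w))           # eigenvalue with the largest modulus
perron_root = w[i].real            # 12.0
perron_vector = V[:, i].real
perron_vector /= perron_vector[0]  # scale the first component to 1 -> [1, 1, 1]

print(perron_root, perron_vector)
```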
So, in this lecture I have talked about positive matrices and some of their properties in
terms of eigenvalues and eigenvectors.
These are the references. In the next lecture we will talk about non-negative matrices.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 39
Non-negativity and Irreducible Matrices
Hello friends. So, welcome to the lecture on Non-negativity and Irreducible Matrices.
In the last lecture we discussed positive matrices, where we saw
some properties of positive matrices, especially about the spectral radius of such matrices.
We also saw the definitions of the Perron value and the Perron vector in that lecture.
So, let us continue in the same direction, moving from positive matrices to non-negative
matrices in this lecture.
So, a real matrix A of size m×n is said to be non-negative whenever
each entry of this matrix is non-negative; it means 0 is allowed here, unlike the case of
positive matrices, where each entry is strictly greater than 0. Such matrices
are denoted by A ≥ 0.
So, for example, [1 2; 0 3] — this 2×2 matrix is a non-negative matrix. Similarly, the 3×3
matrix here is again a non-negative matrix, but these 2 are not positive matrices. The other
matrix shown is a positive matrix; moreover, every positive matrix is a non-negative
matrix, in the sense that it does not have any negative value.
(Refer Slide Time: 02:02)
Let us go through the same result which we had in the case of positive matrices, in terms
of the spectral radius. So, let A be an n×n matrix having spectral radius r; then the
following statements are true. The first one is that this r, which is the spectral radius of A,
belongs to the spectrum of A, but r = 0 is possible in this case, which was not possible in the
case of positive matrices, where r > 0.
My next definition is reducible matrices. An n×n matrix A is said to be a reducible matrix
when there exists a permutation matrix P — and you remember that a permutation matrix is a
product of elementary matrices, meaning after applying elementary row operations on
the identity matrix you can get a permutation matrix — such that the product PᵀAP comes out
in the block form [X Y; 0 Z], where X and Z are square sub-matrices; it may happen that they
are of different orders. Then we say that A is a reducible matrix.
Here PᵀAP is called a symmetric permutation of A. It means that we are applying the
permutation matrices from both sides, because if P is a permutation matrix, then with Pᵀ the
effect is to interchange rows in the same way as the columns are interchanged.
Now, let us see what we can say about the graph of such matrices. The graph of a matrix A,
of size n×n, is defined to be the directed graph on n nodes N1, N2, ..., Nn in which there is
a directed edge leading from Ni to Nj if and only if a_ij ≠ 0.
(Refer Slide Time: 06:15)
So, if I have a matrix, let us say a 2×2 matrix [2 0; 1 1], then I will have 2
nodes in the graph of this matrix, and I will have an edge whenever there is a non-zero entry
at the corresponding position. So, if you see the a11 position, there is a non-zero entry, so I
have an edge from N1 to N1, and the edge is a directed edge. If you see the entry from N1 to N2,
this particular entry is 0, so there is no direct connection from N1 to N2.
If I go from N2 to N1, yes, I have an edge due to the non-zero entry a21, and then I have
an edge from N2 to N2. So, if this matrix is A, this is the graph of A. If I take a 3×3 matrix
B = [3 2 0; 0 1 0; 2 1 2], then since it is a 3×3 matrix, the graph of this matrix will have 3
nodes, say N1, N2 and N3. So, I have an edge from N1 to N1 due to the non-zero
entry 3, then I have an edge from N1 to N2 due to the entry 2; from 1 to 3 I have 0, so
I do not have any edge from N1 to N3. Then from 2 to 1 the entry is 0, from 2 to 2 yes I have
an edge, from 2 to 3 I do not have one; from 3 to 1 I have an edge, from 3 to 2 yes, and then
from 3 to 3 also. So, if this is my matrix B, this is the graph of B. In this way we can define
the graph of a given square matrix.
Now, suppose I add an edge from N2 to N3, that is, I make the (2, 3) entry non-zero, say equal
to one; then I will have this edge. Now I have a connection from N1 to N3, because I can go
from N1 to N2 and then from N2 to N3. I can move from N1 to N2, I can move from N2 to N3; in
the same way I can move from N2 to N1, because I can go from N2 to N3 and then from N3 to N1.
I can also move from N3 to N1 as well as from N3 to N2.
So hence, I have a connection between all the nodes, and this graph is a strongly
connected graph in this case.
If I remove this particular edge, that is, if I do not take this edge, then this particular graph
is not a strongly connected graph; but if I take this edge, then it becomes a strongly
connected graph. Now, if I look at the relation between strongly connected graphs and
irreducible matrices, there is a very elegant relationship between these two,
and that is: A is an irreducible matrix if and only if G(A) is strongly connected. So, let
us take an example.
So, if someone asks you: given this matrix [1 0 0; 2 3 4; 5 6 7], check whether it is an
irreducible matrix or not. Just make the graph of this matrix, with nodes N1, N2 and N3; we
are making the graph associated with this matrix. So, I have an edge from N1 to N1; I do not
have an edge from N1 to N2 or from N1 to N3, because those entries are 0; then I have edges
from 2 to 1, 2 to 2, 2 to 3, and from 3 to 1, 3 to 2 and 3 to 3. So, this is the graph
associated with this particular matrix. Now, is it strongly connected? If I want to go from N1
to N2 or from N1 to N3, it is not possible here, because there is no sequence of edges by
following which I can move from N1 to N2
or from N1 to N3. So, it is not strongly connected. It means this matrix is not irreducible;
it is a reducible matrix.
On the other hand, if I take another example, let us say A2 = [1 2 0; 0 3 4; 5 6 7] — please
note that I have the same number of zeros in this matrix as I had in the case of the matrix A1 —
then again, if I make a graph here with nodes N1, N2, N3: from 1 to 1 there is an edge, from 1
to 2 there is an edge, from 1 to 3 there is no direct edge since the entry is 0; from 2 to 1 no,
from 2 to 2 yes, from 2 to 3 yes; and from 3 to 1, 3 to 2 and 3 to 3 there are edges. So, now,
if you check this: I can move from N1 to N2 using the direct edge, and from N1 to N3 by moving
first to N2 and then following the edge from N2 to N3; I can move from N2 to N3 directly, and
from N2 to N1 via N3; and I can move from N3 to N1 as well as from N3 to N2. So, it is a
strongly connected graph.
There are other methods if you have a larger number of nodes, like twenty nodes;
there are several algorithms for checking whether a graph is
strongly connected or not. There are different algorithms, but I am not discussing
those in this particular lecture; that is part of a separate course. I am just building an
understanding of the matrix and how we can use graphs for checking whether the given
matrix is reducible or not. So, here it is a strongly connected graph, and hence it is an
irreducible matrix.
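A small sketch of this check (my own code, not the lecturer's): since irreducibility depends only on the zero pattern, one can test strong connectivity by asking whether (I + B)^{n-1} has no zero entry, where B is the 0/1 pattern of A; this is the same characterization that appears as a lemma later in this lecture.

```python
import numpy as np

def is_irreducible(A):
    """A is irreducible iff its directed graph is strongly connected, which holds
    iff (I + B)^(n-1) has no zero entry, where B is the 0/1 pattern of A."""
    n = A.shape[0]
    B = (A != 0).astype(int)
    M = np.eye(n, dtype=int) + B
    return bool((np.linalg.matrix_power(M, n - 1) > 0).all())

A1 = np.array([[1, 0, 0],
               [2, 3, 4],
               [5, 6, 7]])
A2 = np.array([[1, 2, 0],
               [0, 3, 4],
               [5, 6, 7]])
print(is_irreducible(A1))   # False: node N1 cannot reach N2 or N3
print(is_irreducible(A2))   # True: the graph is strongly connected
```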
So, this is the relation between these two notions. My next result is the Perron-Frobenius
theorem in the case of non-negative and irreducible matrices.
So, the statement of this theorem is as follows: let A be an n×n matrix having real entries,
and let it be an irreducible matrix with non-negative entries; it means it is irreducible as
well as non-negative. Then we can make the following statements for this particular matrix:
first, A has a positive eigenvalue equal to its spectral radius.
The second is that the eigenvector corresponding to ρ(A) is a positive vector. The third is that
this particular eigenvalue ρ(A), which is the spectral radius, is a simple eigenvalue; in other
words, there is a single Jordan block of order 1 for this particular eigenvalue, or the index of
this eigenvalue is 1. The fourth result is that this eigenvalue ρ(A) increases or decreases when
an entry of A increases or decreases. In other words, the spectral radius of A, if A is
irreducible and non-negative, depends monotonically on the entries of A: if you increase any
entry of A, the spectral radius will increase, and if you decrease an entry of A, the spectral
radius will decrease. That is, if A and B are 2 non-negative irreducible matrices with
0 ≤ A ≤ B entrywise and A ≠ B — meaning some of the entries of B are equal to the corresponding
entries of A and some are greater —
then, according to this, the spectral radius of A will be strictly less than the
spectral radius of B. So, this theorem is called the Perron-Frobenius theorem for
irreducible and non-negative matrices. These four beautiful properties of such
matrices can be used in many applications in various engineering disciplines. Here I
am not giving the proof of this theorem because it is quite long; one can follow
the book if one is interested in the proof. We will see the applications of all these.
Let us see a few more consequences related to the Perron-Frobenius theorem. Let me write
lemma 1: if A is an n×n matrix having real entries and it is an irreducible and non-negative
matrix, then (I + A)^{n-1} will be a positive matrix. The proof of this can be done
like this: the matrix (I + A)^{n-1} is a linear combination of the matrices I, A, A², A³,
up to A^{n-1} with positive (binomial) coefficients. Since A is a non-negative matrix, each of
these powers is non-negative, so there are no negative contributions, and the irreducibility of
A (the strong connectivity of its graph) ensures that every entry of the sum is strictly
positive. Using these facts one can prove this particular result.
The next is another lemma used with the Perron-Frobenius theorem. It says that for any
square matrix M, if the spectral radius of this matrix is less than 1, then the matrix series
Σ_{K=0}^{∞} M^K, that is, I + M + M² + M³ + ..., converges, and in particular
Σ_{K=0}^{∞} M^K = (I - M)⁻¹.
That is, if you open the expansion of (I - M)⁻¹ you get this infinite series. Another
property of such matrices that we use is the following: let AX = λX, meaning A is a
square matrix and X is an eigenvector of A corresponding to the eigenvalue λ. Then λ is
a multiple eigenvalue — meaning it is not simple, the index of λ is greater than 1 — if and only
if there exists another vector y such that Aᵀy = λy and x and y are orthogonal, meaning the
dot product of these 2 vectors is 0.
So, these 3 properties are used in the proof of the Perron-Frobenius theorem, and with them
one can prove it.
Now, after reducible and irreducible matrices, let us come to the next definition, which is
about primitive matrices. A non-negative irreducible matrix A having only 1 eigenvalue —
namely its spectral radius — on its spectral circle is said to be a primitive matrix.
So, if in the case of a non-negative and irreducible matrix there exists only 1 eigenvalue
on the spectral circle, then the matrix is called primitive. If there is more than
1 eigenvalue lying on the spectral circle, then the matrix is called an imprimitive matrix,
and the number of eigenvalues that lie on the spectral circle is called the index of
imprimitivity.
Another test for checking primitivity, which is very useful when you are solving
examples, is the Frobenius test. The Frobenius test tells us that a non-negative matrix A of
size n×n is primitive if and only if A^{n²-2n+2} is a positive matrix. So, if this happens,
then A is primitive. Let us see an example of this.
So, determine whether or not this particular non-negative matrix is primitive.
(Refer Slide Time: 27:13)
So, here the example is: I have this matrix A, and I have to check whether this matrix is a
primitive matrix or not. One way is to just find the spectral circle and the eigenvalues of this
matrix; if only 1 eigenvalue lies on the spectral circle, then it is primitive, otherwise it is
not.
Another way is the Frobenius test. So, how to use the Frobenius test? It is a 3×3 matrix, so
here n² - 2n + 2 becomes 9 - 6 + 2, which is 5. So, we need to check A⁵: if it is a positive
matrix, then A is a primitive matrix. So, if I need to calculate A⁵, what I need to do is
multiply A by itself 5 times. Instead of A, let us write the Boolean matrix corresponding to A,
which is a matrix of 0s and 1s only: the Boolean matrix corresponding to A is a matrix
of 0s and 1s of the same size, where a 0 entry stays 0 and a non-zero
entry becomes 1. So, the first row of B becomes [0 1 0], the second row
becomes [0 0 1], and the third row [1 1 0].
So, A⁵ will be a positive matrix — meaning all the entries of this matrix will be
positive — exactly when B⁵ has all its entries equal to 1 (working in Boolean arithmetic). So,
if I calculate, I find B⁵ = [1 1 1; 1 1 1; 1 1 1], which implies that A⁵ is a positive matrix,
which implies A is primitive. So, this is the way of applying the Frobenius test for checking
the primitivity of a given matrix.
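A tiny sketch of this computation (my own, for illustration): since only the zero pattern matters, raising the 0/1 matrix B to the power n² - 2n + 2 and checking for zeros reproduces the Frobenius test.

```python
import numpy as np

B = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])
n = B.shape[0]
power = n * n - 2 * n + 2          # = 5 for n = 3

B_pow = np.linalg.matrix_power(B, power)
print((B_pow > 0).all())           # True -> the matrix is primitive
```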
(Refer Slide Time: 30:13)
Now, if a matrix is not primitive it is called imprimitive, and the number of eigenvalues
that lie on the spectral circle is called the index of imprimitivity. If someone asks you to
find the index of imprimitivity of A, where A is this 4×4 matrix, then the characteristic
polynomial of this matrix A is λ⁴ - 5λ² + 4 = 0. So, here the eigenvalues are ±2 and ±1.
If I draw the circles for this matrix, the first row gives me a circle with centre at (0, 0)
and radius 1; the second row gives |λ| ≤ 2 + 1 = 3, so a circle with centre at the origin
(0, 0) and radius 3; and the remaining rows give the same two circles again. The eigenvalues
are 1, -1, 2, -2, so the spectral radius is 2 and the spectral circle is |z| = 2. If you check,
the 2 eigenvalues 2 and -2 lie on this spectral circle, so hence here h = 2, meaning the index
of imprimitivity is 2. If there were only 1 eigenvalue on the spectral circle, then the matrix
would be a primitive matrix; please note that.
Another way of checking the index of imprimitivity is to just write the characteristic
polynomial and look at the powers whose coefficients are non-zero.
So, write the polynomial as λ⁴ - 5λ² + 4 and look at the terms: the coefficient corresponding to
λ², which is -5, is non-zero, so the gap from the leading power λ⁴ is 4 - 2 = 2; again, the
constant term 4 is non-zero, so the gap from the leading power is 4 - 0 = 4. So, the positions
where the coefficients are non-zero give the gaps 2 and 4, and the GCD of these gaps gives the
index of imprimitivity. So, GCD of 2 and 4 is 2, and this is the process mentioned here.
So, in this lecture we have learned about reducible and irreducible matrices and their
associated graphs, how to check whether a given matrix is reducible or not, and then about
primitive matrices. In the next lecture we will learn the polar decomposition of a given matrix.
These are the references for this lecture.
Matrix Analysis with Applications
Dr. Sanjeev Kumar
Department of Mathematics
Indian Institute of Technology, Roorkee
Lecture - 40
Polar Decomposition
Hello friends. So, welcome to the last lecture of this course, which is on Polar
Decomposition. Again, it is a decomposition of a matrix in terms of a product of
different matrices, like in the case of the singular value decomposition, and this particular
decomposition has various applications in different fields of science and
engineering. Hence, it is a good way to close this course after introducing you to the polar
decomposition.
So, what is the polar decomposition? From the basic knowledge of complex numbers you
know that if I have a complex number z = x + iy, where x is the real part and y is the imaginary
part, then this number I can also write as r·e^{iθ}, where r is
a positive quantity, r = (x² + y²)^{1/2}, and θ = tan⁻¹(y/x).
So, what I have is a complex number, and I am writing this number as the product of these 2
things, r and e^{iθ}, where r is positive and e^{iθ} is some sort of rotation.
So, can we have the same type of thing in the case of matrices? That is, can I
write a matrix A as the product of 2 different matrices, let us say P and W, where W is
some sort of rotation matrix and P has some sort of positive definiteness or positive
semi-definiteness property? So, this is the idea of the polar decomposition.
(Refer Slide Time: 02:36)
So, what I want to say is that we have a polar decomposition for any matrix A having
real entries; it is also valid for complex matrices of size m×n. We use the
same analogy as in the case of complex numbers: r ≥ 0 means we should have a
positive operator, and e^{iθ} is something similar to an isometry, a rotational
transformation. So, let us learn this.
learn this.
So, let us first state a theorem, which is called the polar
decomposition theorem. The statement is as follows; let us first restrict ourselves to square
matrices, and then we will discuss the case of rectangular matrices. For any square
matrix A there exists a unitary matrix W and a positive semi-definite matrix — in short, let
me write it PSD — P, such that A can be written as the product of W and P. Furthermore, if
A is invertible, then the representation, or decomposition, is unique. That means you can write
A in a unique way as the product of two matrices, one positive semi-definite and one
unitary.
So, let us see the proof of this. From the singular value decomposition of A, we have A =
USV*, where U and V are unitary matrices and S is a diagonal matrix having the singular
values as its diagonal entries. This I can write as A = U(V*V)SV*, because V is a unitary
matrix, so V*V is the identity matrix. Now, write UV* = W and VSV* = P. So, what I
have is W = UV* and P = VSV*.
Now, since U and V* are unitary matrices, their product W is also unitary. So, we
have seen that W is unitary. What I need to show now is that P is a positive semi-definite
matrix. So, here, if you see, P = VSV*, which means P and S are unitarily equivalent, or,
in other words, they are unitarily similar — they are similar matrices. It means
the spectrum of P equals the spectrum of S: the eigenvalues of P are also the
eigenvalues of S and vice versa. Now, if you see S, the eigenvalues of S are the singular
values of A, and singular values are always non-negative. This implies that the
eigenvalues of P are non-negative, and if the eigenvalues of a Hermitian matrix are non-negative,
it means the matrix is positive semi-definite.
So, in this way we have proven the first part of the theorem, that A can be written as the
product of W and P, where W is unitary and P is a positive semi-definite matrix.
Now, let us do the second part, that if A is invertible then this decomposition is unique.
Suppose the representation is not unique, that is, A = WP = ZQ, where P
and Q are positive semi-definite matrices and W and Z are unitary matrices. What I
need to show now is that W = Z and P = Q; only then can I say that the representation is
unique. Now, here I have WP = ZQ. So, if I multiply on the left
by Z*, that is, the conjugate transpose of Z, and on the right by P⁻¹, then what will I have?
Z*W = QP⁻¹.
Here, just note that A is an invertible matrix. It means
P and Q are also invertible, because A is the product of W and P: A invertible
means both factors should be invertible; W is unitary, so it is invertible, and therefore P
should also be invertible; similarly we can say that Q is also
invertible. That is why we have put the condition that only if A is
invertible is the representation unique.
Now, what can I conclude? If you see that Z* is a unitary matrix and W is also a unitary matrix,
it means their product is unitary. So, from here I can say QP⁻¹ is also a unitary matrix, which
means (QP⁻¹)*(QP⁻¹) = I, since the conjugate transpose of a unitary matrix times the matrix
itself is the identity. Using that P and Q are Hermitian, this gives P⁻¹Q²P⁻¹ = I, which means
P² = Q².
Now, if you see, here P and Q are not merely positive semi-definite matrices; they are positive
definite matrices in this case, because A is invertible. So, what will I be
having? If P and Q are positive definite matrices, then P² = Q² implies that P = Q. If P = Q,
and P and Q are invertible, this implies W = Z; it means the factorization is unique. So,
this is the proof of the second part of the theorem. Such a factorization of a matrix — writing
a matrix as the product of one unitary matrix and one positive semi-definite matrix — is
called the polar decomposition of the given matrix. So, let us take an example of this.
(Refer Slide Time: 14:23)
So, find the polar decomposition of A = [11 -5; -2 10]. It means I need to write the matrix
A as the product of 2 matrices, one unitary and the other positive semi-definite.
So, let us first perform the singular value decomposition. If I calculate the
eigenvalues of A*A, they come out to be 200 and 50. Hence the bigger singular value is
σ1 = (200)^{1/2} = 10√2, and σ2 = (50)^{1/2} = 5√2 is the second singular value. Now, if I
calculate the eigenvector of A*A corresponding to the eigenvalue 200, I find that this
eigenvector comes out to be V1 = 1/√2 [1 -1].
So, now, what do I need to calculate? For completing the singular value decomposition of
A, I need to calculate the matrix U. My U1 will be AV1/σ1, and this comes out to
be 1/5 [4 -3]. Similarly, U2 will be AV2/σ2 — you can remember this from the singular
value decomposition lecture — and it will be 1/5 [3 4]. So, hence my U has columns U1 and U2,
that is, U = [4/5 3/5; -3/5 4/5].
Now, I want to perform the polar decomposition of A. If the polar decomposition of A is
W·P, then what is my W? According to the singular value decomposition, W is
U·V*. So, U·V* means U times the conjugate transpose of V; U was obtained above, V* is the
transpose of the matrix V, and this comes out to be 1/(5√2) [7 -1; 1 7]; so here is W. Hence P
will be V·S·V*, and this becomes 5/√2 [3 -1; -1 3], and if you multiply W and P
you get back the matrix A. So, these are the matrices for the polar
decomposition of the matrix A.
So, what are we using? Basically, we are using the singular value decomposition for
performing the polar decomposition, and, as I told you, this is the case of square matrices,
and this is one way of doing the polar decomposition. Let us now talk about general
matrices, that is, rectangular matrices.
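Before moving to rectangular matrices, here is a minimal sketch of the SVD-based construction just used (my own code, not from the lecture), checked on the 2×2 example above.

```python
import numpy as np

A = np.array([[11.0, -5.0],
              [-2.0, 10.0]])

U, s, Vt = np.linalg.svd(A)          # A = U diag(s) Vt, with Vt = V*
W = U @ Vt                           # unitary (orthogonal) factor W = U V*
P = Vt.T @ np.diag(s) @ Vt           # positive semi-definite factor P = V S V*

print(np.allclose(W @ P, A))                           # True: W P = A
print(np.allclose(W.T @ W, np.eye(2)))                 # W is orthogonal
print(np.allclose(P, (5 / np.sqrt(2)) * np.array([[3.0, -1.0],
                                                  [-1.0, 3.0]])))  # matches the slide
```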
So, there are 2 types of polar decomposition in case of rectangular matrices, the right
polar decomposition and the left polar decomposition.
The right polar decomposition of a matrix A, which is an m×n matrix with m ≥ n — meaning you
have more rows than columns — has the form A = U·P, where U is a
matrix with orthonormal columns. So, here U is not a square matrix; in the case which
we discussed just earlier, where A is a square matrix, both U and P are square,
but if A is rectangular, of size m×n, then U will be of size m×n and P will be of size
n×n. So, here U is an m×n matrix with orthonormal columns, meaning if you take the dot
product of two different columns it will be 0, and P will be an n×n positive semi-definite
matrix.
In the same way we can define the left polar decomposition in the case when the matrix has more columns than rows. So, if A is an m×n matrix where m < n, means you are having more columns, then we will be having the left polar decomposition and it is A = H.U, where H is a positive semi definite matrix of size m×m and U is an m×n matrix with orthonormal rows (the analogue of the orthogonal-columns condition above).
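For concreteness, here is a small sketch in Python with NumPy showing one way to build such a left polar factorization from the thin singular value decomposition; this construction is my own illustration in the same spirit as the square case, not a step stated in the lecture: with A = U_s.S.V^T, take H = U_s.S.U_s^T and U = U_s.V^T, so that H.U = A.

    import numpy as np

    # A wide matrix: 2 rows, 3 columns (more columns than rows)
    A = np.array([[3.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    # Thin SVD: A = U_s @ diag(s) @ Vt, with U_s of size 2x2 and Vt of size 2x3
    U_s, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Left polar factors:
    #   H = U_s diag(s) U_s^T   (positive semi definite, 2x2)
    #   U = U_s Vt              (2x3 with orthonormal rows)
    H = U_s @ np.diag(s) @ U_s.T
    U = U_s @ Vt

    assert np.allclose(H @ U, A)             # left polar decomposition A = H U
    assert np.allclose(U @ U.T, np.eye(2))   # rows of U are orthonormal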
Now, how to perform the polar decomposition of these matrices? So, let me explain another way of doing it.
So, how to find P in the right polar decomposition? So, what am I having here? I am having a matrix A which is of size m×n with m > n. So, take the matrix A*.A, means the product of the conjugate transpose of A with A. Then certainly this matrix will be a diagonalizable matrix, and let us write the diagonalization of this matrix as A*.A = S.B.S*, because it is a Hermitian (in the real case, symmetric) matrix, so it is always unitarily diagonalizable. Here B will be a positive semi definite diagonal matrix, means all its diagonal entries are greater than or equal to 0; it will not contain negative entries.
So, if this is the case, I can write this B as C^2, where C = B^(1/2) is again a diagonal matrix, obtained by taking the square root of each diagonal entry. So, A*.A can be written as S.C.C.S*, and this becomes (S.C.S*).(S.C.S*), because S*.S is the identity matrix; S is a unitary matrix in this case.
So, from here take S.C.S* as P; this matrix equals its own conjugate transpose, and A*.A becomes P^2. So, from here I can write P equals the square root of the product of the conjugate transpose of A with the matrix A, that is P = (A*.A)^(1/2). How to find it out? You can find it by the diagonalization of A*.A, because the square root of A*.A, that is P, will become S.B^(1/2).S*. And here B is a diagonal matrix, so B^(1/2) is just the square root of each diagonal entry. So, in this way we can find out P in the right polar decomposition. So, let us take an example of this.
Take A as, let us say, [3 1; 0 1; 1 0]; this is a 3×2 matrix. So, here A^T will become [3 0 1; 1 1 0], which is a 2×3 matrix. So, now, A^T.A will be a 2×2 matrix: A^T.A = [3 0 1; 1 1 0].[3 1; 0 1; 1 0] = [10 3; 3 2]. So, this is the product A^T.A.
Now, if we see the eigenvalues of A^T.A, the characteristic polynomial of A^T.A becomes (λ-10).(λ-2) - 9 = 0. So, this becomes λ^2 - 12λ + 11 = 0. So, from here, what am I having? λ^2 - 11λ - λ + 11 = 0, that is (λ-11).(λ-1) = 0.
So, the eigenvalues of A^T.A become 11 and 1.
So, what can I write? I can write the diagonalization of A^T.A as S.B.S^T, where S is given as [3/√10 1/√10; 1/√10 -3/√10]; its first column is the eigenvector corresponding to λ = 11 and its second column is the eigenvector corresponding to λ = 1. The diagonal matrix B is [11 0; 0 1], and S^T = [3/√10 1/√10; 1/√10 -3/√10], which is the same as S because S is symmetric. So, now, P will become (A^T.A)^(1/2).
So, this will become S.B^(1/2).S^T (here I am using the transpose in place of S* because the matrix is real). So, it will be P = [3/√10 1/√10; 1/√10 -3/√10].[√11 0; 0 1].[3/√10 1/√10; 1/√10 -3/√10]. So, this is my matrix P.
So, by making the product of these 3 matrices I will get my P. Then, once P is available with me, since A = W.P, W will become A.P^(-1). So, I can calculate my W, and in this way I can perform the polar decomposition of the given matrix A. This is the right polar decomposition; we cannot make the left polar decomposition in this case, because in the left polar decomposition my matrix A will be written as some Q.Z where Q is the positive semi definite matrix. And in that case Q will be a 3×3 matrix having one eigenvalue 0, hence the inverse of that positive semi definite matrix will not exist, and you cannot calculate the unitary factor like we have done in this case here.
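To cross-check the right polar decomposition of this 3×2 example, here is a short sketch in Python with NumPy (again my own illustration, not from the lecture): it computes P = (A^T.A)^(1/2) through the diagonalization A^T.A = S.B.S^T and then recovers the column-orthonormal factor as A.P^(-1).

    import numpy as np

    # The 3 x 2 example matrix used above
    A = np.array([[3.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 0.0]])

    # Diagonalize A^T A = S B S^T (the eigenvalues come out to be 1 and 11)
    B_vals, S = np.linalg.eigh(A.T @ A)

    # P = (A^T A)^(1/2) = S B^(1/2) S^T, taking square roots of the diagonal entries
    P = S @ np.diag(np.sqrt(B_vals)) @ S.T

    # Since P is invertible here, the factor with orthonormal columns is W = A P^(-1)
    W = A @ np.linalg.inv(P)

    assert np.allclose(W.T @ W, np.eye(2))   # columns of W are orthonormal
    assert np.allclose(W @ P, A)             # right polar decomposition A = W P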
So, this is an alternate way of doing the polar decomposition, apart from the singular value decomposition method which we used in the earlier example. Again, we are having a couple of examples here.
This one is done the same way as in the earlier case. This is the matrix, this is A*.A, then M^(1/2) will become (A*.A)^(1/2), and by using the same process I calculate A = U.P, where U is given by U = A.P^(-1).
(Refer Slide Time: 33:25)
This is again based on the singular value decomposition. So, this is the singular value decomposition of this matrix; then the unitary factor is Us.V^T and P is V.S.V^T, where S is the diagonal matrix of singular values. So, by finding these 2 matrices I can perform the polar decomposition of a given matrix.
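The same SVD-based recipe extends to any matrix with at least as many rows as columns. Below is a small sketch in Python with NumPy; the function name right_polar is my own, not from the lecture, and the factors are built from the thin singular value decomposition.

    import numpy as np

    def right_polar(A):
        # Right polar decomposition A = U @ P via the thin SVD.
        # A is m x n with m >= n; U is m x n with orthonormal columns,
        # P is n x n positive semi definite.
        U_s, s, Vt = np.linalg.svd(A, full_matrices=False)
        U = U_s @ Vt
        P = Vt.T @ np.diag(s) @ Vt
        return U, P

    # Example: the 3 x 2 matrix from the earlier example
    A = np.array([[3.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
    U, P = right_polar(A)
    assert np.allclose(U @ P, A)
    assert np.allclose(U.T @ U, np.eye(A.shape[1]))

If SciPy is available, its scipy.linalg.polar routine performs, to the best of my knowledge, the same factorization and can serve as an independent check.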
(Refer Slide Time: 33:50)