Linear Algebra (Mir, 1983)
V. V. Voyevodin
LINEAR ALGEBRA
Translated from the Russian by Vladimir Shokurov

Vector Spaces

CHAPTER 1
Sets, Elements, Operations
Exercises
1. Construct finite and infinite sets. What properties
are characteristic of them?
2. Construct sets whose descriptions contain a contradiction.
3. Is the set of real roots of the polynomial z^4 + 4z^3 + 7z^2 + 4z + 1
empty?
4. Construct sets whose elements are sets.
5. Construct sets that contain themselves as an element.
2. Algebraic operation
Among all possible sets there are those on
whose elements certain operations can be performed. Suppose
we are considering the set of all real numbers. Then for each of its
elements such operations as calculation of the absolute value of
that element and calculation of the sine of that element can be
defined, and for every pair of elements addition and multiplication
can be defined.
In the above example, note especially the following features of
the operations. One is the definiteness of all operations for any
element of the given set, another is the uniqueness of all operations
and the final feature is that the result of any operation belongs
to the elements of the same set. This is by no means always the case.
An operation may be defined not for all elements of a set; for
example, calculation of logarithms is not defined for negative num-
bers. Taking the square root of positive numbers is defined, but
not uniquely. However, even if an operation is uniquely defined
for every element, its result may not be an element of the given set.
Consider division on the set of positive integers. It is clear that for
any two numbers of the set division is realizable but its result is
not necessarily an integer.
Let A be a set containing at least one element. We shall say that
an algebraic operation is defined in A if a law is indicated by which
any pair of elements, a and b, taken from A in a definite order, is
uniquely assigned a third element, c, also from that set.
This operation may be called addition, and c will then be called
the sum of a and b and designated c = a + b; it may be called
multiplication, and c will then be called the product of a and b
and designated c = ab.
In general the terminology and notation for an operation defined
in A will not play any significant part in what follows. As a rule,
    *  a  b  c            *  a  b  c
    a  a  c  b            a  a  a  a
    b  c  b  a            b  b  b  b        (2.1)
    c  b  a  c            c  c  c  c
and let the first element be always chosen from the column and the
second from the row, and let the result of the operation be taken
at the intersection of the corresponding row and column. In the
first case the operation is obviously commutative but not associa-
tive, since, for example,
(a • b) • c = c• c = c,
a • (b • c) = a•a = a.
In the second case the operation is not commutative, but associa-
tive, which is easy to show by a straightforward check.
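The "straightforward check" can also be carried out mechanically. The following small Python sketch (not part of the book) encodes the two operations with the table entries as reconstructed in (2.1) above and tests them for commutativity and associativity:

table1 = {('a', 'a'): 'a', ('a', 'b'): 'c', ('a', 'c'): 'b',
          ('b', 'a'): 'c', ('b', 'b'): 'b', ('b', 'c'): 'a',
          ('c', 'a'): 'b', ('c', 'b'): 'a', ('c', 'c'): 'c'}
table2 = {(x, y): x for x in 'abc' for y in 'abc'}   # the result is always the first element

def commutative(op):
    return all(op[x, y] == op[y, x] for x in 'abc' for y in 'abc')

def associative(op):
    return all(op[op[x, y], z] == op[x, op[y, z]]
               for x in 'abc' for y in 'abc' for z in 'abc')

print(commutative(table1), associative(table1))   # True  False
print(commutative(table2), associative(table2))   # False True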
Exercises
1. Is the operation of calculating tan z on the set of
all real numbers z algebraic?
2. Consider the set of real numbers z satisfying the inequality |z| ≤ 1.
Are the operations of multiplication, addition, division and subtraction algebraic on this set?
3. Is the algebraic operation x • y = x^2 + y commutative and associative
on the set of all real numbers x and y?
4. Let a set consist of a single element. How can the algebraic operation be
defined on that set?
5. Construct algebraic operations on a set whose elements are sets. Are
these operations commutative, associative?
3. Inverse operation
Let A be a set with some algebraic operation.
As we know, it assigns to any two elements a and b of A a third
element c = a • b. Consider the collection C of the elements of A
that can be represented as the result of the given algebraic operation.
It is clear that regardless of the algebraic operation all the elements
of C are at the same time elements of A. It is not at all necessary,
however, for all the elements of A to be in C.
Indeed, fix in A some element f and assign it to any pair of ele-
ments a and b of A. It is obvious that the resulting correspondence
is an algebraic operation, commutative and associative. The set C
will contain only one element f regardless of the number of ele-
ments in A.
Exercises
1. Are there right and left inverses of the algebraic
operations given by tables (2.1)?
2. What are the right and left inverses of the algebraic operation x • y = x^y
defined on the set of positive numbers x and y?
3. Prove that if the right and the left inverse coincide, then the original
algebraic operation is commutative.
4. Prove that if an algebraic operation has an inverse, then the right and
the left inverse have inverses too. What are these?
5. Construct an algebraic operation for which all four inverses of the inverse
operations coincide with the original operation.
4. Equivalence relation
Notice that in discussing above the properties
of the algebraic operation we implicitly assumed the possibility
of checking any two elements of a set for coincidence or noncoinci-
dence. Moreover, we treated coinciding elements rather freely
never making any distinction between them. We did not assume
anywhere that the coinciding elements were indeed one element
rather than different objects. But actually we only used the fact
that some group of elements, which we called equal, are the same
in certain contexts.
This situation occurs fairly often. Investigating general proper-
ties of similar triangles we in fact make no distinction between any
triangles having the same angles. In terms of the properties preserved
under a similarity transformation, these triangles are indistinguisha-
ble and could be called "equal". Investigating the criteria of the
equality of triangles we make no difference between the triangles
that are situated in different places of the plane but can be made
to coincide if displaced.
In many different problems we shall be faced with the necessity
of partitioning one set or another into groups of elements united
according to some criterion. If none of the elements is in two differ-
ent groups, then we shall say that the set is partitioned into disjoint,
or nonoverlapping, groups or classes.
Although the criteria according to which the elements of a set
are partitioned into classes may be very different, they are not
entirely arbitrary. Suppose, for example, that we want to divide
into classes all real numbers, including numbers a and b into the
same class if and only if b > a. Then no number a can be in the
same class with itself, since a is not greater than a itself. Conse-
quently, no partitioning into classes according to this criterion
is possible.
Let some criterion be given. We assume that with regard to any
pair of elements a and b of a set A it can be said that either a is
related to b by the given criterion or not. If a is related to b, then
we shall write a ~ b and say that a is equivalent to b.
Even the analysis of the simplest examples suggests the condi-
tions a criterion must satisfy for partitioning a set A into classes
according to it to be possible. Namely:
1. Reflexivity: a ~ a for all a ∈ A.
2. Symmetry: if a ~ b, then b ~ a.
3. Transitivity: if a ~ b and b ~ c, then a ~ c.
A criterion satisfying these conditions is called an equivalence
relation.
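As a small illustration (not part of the book), the three conditions and the resulting partition into classes can be checked mechanically for a finite set. The relation "same remainder on division by 3" and the rejected criterion b > a are used as examples:

def is_equivalence(elements, related):
    reflexive  = all(related(a, a) for a in elements)
    symmetric  = all(related(b, a) for a in elements for b in elements if related(a, b))
    transitive = all(related(a, c) for a in elements for b in elements for c in elements
                     if related(a, b) and related(b, c))
    return reflexive and symmetric and transitive

def classes(elements, related):
    result = []
    for a in elements:
        for cls in result:
            if related(a, cls[0]):      # a is equivalent to this class
                cls.append(a)
                break
        else:
            result.append([a])          # a starts a new class
    return result

numbers = list(range(-5, 6))
same_remainder = lambda a, b: a % 3 == b % 3      # an equivalence relation
greater        = lambda a, b: b > a               # the criterion rejected in the text

print(is_equivalence(numbers, same_remainder))    # True
print(classes(numbers, same_remainder))           # the three classes mod 3
print(is_equivalence(numbers, greater))           # False: not reflexive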
We prove that any equivalence relation partitions a set into
classes. Indeed, let K_a be the group of elements of A equivalent to a fixed element a.
Exercises
1. Is it possible to divide all the countries of the world
into classes, placing two countries in the same class if and only if they have
a common border? If not, why?
2. Consider a set of cities with motorway communication. Say that two
cities A and B are connected if one can go from A to B by motorway. Can the
cities be divided into classes according to this criterion? If they can, what are
the classes?
3. Say that two complex numbers a and b are equal in absolute value if
|a| = |b|. Is this criterion an equivalence relation? What is this partition
into classes?
4. Consider the algebraic operations of addition and multiplication of com-
plex numbers. How do they act on classes of elements equal in absolute value?
5. Construct algebraic operations on the set defined in Exercise 2. How do
they act on the classes?
parallel to AC. Denote the point of the intersection of the last two
lines by D. The directed line segment CD will be precisely the one required.
To compare two vectors a and b we can therefore use the following device. Fix some point
and translate to it the vectors a and b. If they completely coincide,
then a = b, and if not, a ≠ b.
Besides the set consisting of all vectors of a space we shall often
deal with other sets. These will mainly be sets of vectors either
parallel to some straight line or lying on it or parallel to some plane
or lying in it. Such vectors will be called respectively collinear and
coplanar. Of course, on the sets of collinear and coplanar vectors
the above definition of the equality of vectors is preserved.
We shall also consider the so-called zero directed line segments
whose initial and terminal points coincide. The direction of zero
vectors is not defined and they are all considered equal by defini-
tion. If it is not necessary to specify the limiting points of a zero
vector, then we shall denote that vector by 0.
Also it will be assumed by definition that any zero vector is
parallel to any straight line and any plane. Throughout the following
therefore, unless otherwise specified, the set of vectors of a space,
as well as any set of collinear or coplanar vectors, will be assumed
to include the set of all zero vectors. This should not be forgotten.
Exercises
1. Prove that the nonzero vectors of a space can be
partitioned into classes of nonzero collinear vectors.
2. Prove that any class of nonzero collinear vectors can be partitioned into
classes of nonzero equal vectors.
3. Prove that any class of nonzero equal vectors is entirely in one and only
one class of nonzero collinear vectors.
4. Can the nonzero vectors of a space be partitioned into classes of coplanar
vectors? If not, why?
5. Prove that any set of nonzero coplanar vectors can be partitioned into
classes of nonzero collinear vectors.
6. Prove that any pair of different classes of nonzero collinear vectors is
entirely in one and only one set of nonzero coplanar vectors.
6. Addition
of directed line segments
As already noted, force, displacement, veloc-
ity and acceleration are the originals of the directed line segments
we have constructed. If these line segments are to be useful in solv-
ing various physical problems, we must take into account the cor-
responding physical analogies when introducing operations.
Well known is the operation of addition of forces performed by
the so-called parallelogram law. The same law is used to add dis-
placements, velocities and accelerations. According to the introduced
terminology this operation is algebraic, commutative and associa-
tive. Our immediate task is to construct a similar operation on
directed line segments.
It follows from the definition of the equality of vectors that
    BC = OA = a,   AC = OB = b,
and therefore a + b = OA + AC = OC = OB + BC = b + a (Fig. 6.2).
To verify that the operation is associative, apply the vector a to some
point O, b to the terminal point of a, and c to the terminal point
of b (Fig. 6.3). Denote by A, B and C the terminal points of a, b
and c. Then
    (a + b) + c = (OA + AB) + BC = OB + BC = OC,
    a + (b + c) = OA + (AB + BC) = OA + AC = OC.
Using a simple geometric construction we establish that always
    AB + BA = 0.    (6.1)
Therefore the equation
    AB + x = CD    (6.2)
for any vectors AB and CD will clearly have at least one solution, for example,
    x = BA + CD.    (6.3)
Suppose (6.2) holds for some other vector z as well, i.e.
    AB + z = CD.
Exercises
1. Three forces equal in magnitude and directed along
the edges of a cube are applied to one of its vertices. What is the direction of
the sum of these forces?
2. Let three different classes of collinear vectors be given. When can any
vector of a space be represented as the sum of three vectors of these classes?
3. Applied to the vertices of a regular polygon are forces equal in magnitude
and directed to its centre. What is the sum of these forces?
4. What is the set of sums of vectors taken from two different classes of
collinear vectors?
7. Groups
Sets with one algebraic operation are in
a sense the simplest and it is therefore natural to begin our studies
just with such sets. We shall assume the properties of an operation
to be axioms and then deduce their consequences. This will allow us
later on to immediately apply the results of our studies to all sets
whel'e the operations have similar properties, regardless of specific
features.
A group is a set G with one algebraic operation, associative (al-
though not necessarily commutative), for which there must exist
an inverse.
    a⁻¹''(aa⁻¹') = (a⁻¹''a)a⁻¹' = ea⁻¹' = a⁻¹',
it follows that a⁻¹' = a⁻¹''. This implies the existence and uniqueness
for any element a in G of an inverse element a⁻¹.
Now it is easy to show that the set G is a group. Indeed, ax = b
and ya = b clearly hold for the elements
    x = a⁻¹b,   y = ba⁻¹.
Suppose that there are other solutions, for example, an element z
for the first equation. Then ax = b and az = b yield ax = az.
Multiplying both sides on the left by a-1 we get x = z. So the set G
is a group.
A group is said to be commutative or Abelian if the group opera-
tion is commutative. In that case the operation is as a rule called
addition and the summation symbol a + b is written instead of the
product notation ab. The identity of an Abelian group is called
the zero element and designated 0. The inverse of the operation is
called subtraction, and the inverse element is called the negative
element. It is denoted by -a. It will be assumed that by definition
the difference symbol a- b denotes the sum a+ (-b).
But if for some reason we shall call the operation in a commutative
group multiplication, then its inverse will be assumed to be division.
The now equal products a⁻¹b and ba⁻¹ will be denoted by b/a and
called the quotient of b by a.
Exercises
Prove that the following sets are Abelian groups.
Everywhere the name of the operation reflects its content rather than notation.
1. The set consists of integers; the operation is addition of numbers.
2. The set consists of complex numbers, except zero; the operation is multi-
plication of numbers.
3. The set consists of integer multiples of 3; the operation is addition of
numbers.
4. The set consists of positive rationals; the operation is multiplication
of numbers.
5. The set consists of numbers of the form a + b√2, where a and b are
nonzero rationals; the operation is multiplication of numbers.
6. The set consists of a single element a; the operation is called addition and
defined by a + a = a.
7. The set consists of integers 0, 1, 2, ..., n − 1; the operation is called
mod n addition and consists in calculating the nonnegative remainder less than n
of the division of the sum of two numbers by the number n.
8. The set consists of integers 1, 2, 3, ..., n − 1, where n is a prime; the
operation is called mod n multiplication and consists in calculating the nonnegative
remainder less than n of the division of the product of two numbers by the
number n.
9. The set consists of collinear directed line segments; the operation is addi-
tion of directed line segments.
10. The set consists of coplanar directed line segments; the operation is
addition of directed line segments.
11. The set consists of directed line segments of a space; the operation is
addition of directed line segments.
As regards the last three examples, notice that the zero element of an Abeli-
an group of directed line segments is a zero directed line segment and that the
inverse line segment to AB is BA. It follows from what was proved above that
they are unique. Examples of noncommutative groups will be given later.
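For exercises 7 and 8 the verification can also be done mechanically. A small Python sketch (not part of the book) that checks closure, associativity, commutativity, the existence of a zero element and of negatives for mod n addition and, for prime n, mod n multiplication:

def is_abelian_group(elements, op):
    elements = list(elements)
    closed = all(op(a, b) in elements for a in elements for b in elements)
    assoc  = all(op(op(a, b), c) == op(a, op(b, c))
                 for a in elements for b in elements for c in elements)
    comm   = all(op(a, b) == op(b, a) for a in elements for b in elements)
    ident  = [e for e in elements if all(op(e, a) == a for a in elements)]
    invert = bool(ident) and all(any(op(a, b) == ident[0] for b in elements)
                                 for a in elements)
    return closed and assoc and comm and bool(ident) and invert

n = 7
print(is_abelian_group(range(n), lambda a, b: (a + b) % n))        # True
print(is_abelian_group(range(1, n), lambda a, b: (a * b) % n))     # True (n prime)
print(is_abelian_group(range(1, 6), lambda a, b: (a * b) % 6))     # False (6 not prime)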
Exercises
Prove that sets 1-7 are rings and not fields and that
sets 8-13 are fields. Everywhere the name of the operation reflects its content
rather than notation.
1. The set consists of integers; the operations are addition and multiplica-
tion of numbers.
2. The set consists of integer multiples of some number n; the operations
.are addition and multiplication of numbers.
3. The set consists of real numbers of the form a + b√2, where a and b
are integers; the operations are addition and multiplication of numbers.
9. Multiplication of directed line segments
the so-called magnitude of the directed line segment.
The magnitude {AB} of a directed line segment AB is a number
equal to the length of the line segment AB taken with a plus sign
if the direction of AB coincides with the direction of the axis and
with a minus sign otherwise. The magnitude of a zero directed line
segment is assumed to be equal to zero, i.e.
    {AA} = 0
(Fig. 9.1).
Regardless of which direction on the axis is chosen as positive, the
directions of AB and BA are opposite while their lengths are equal; consequently,
    {AB} = −{BA}.    (9.1)
The magnitude of a directed line segment, unlike its length, may
have any sign. Since the length of AB is the absolute value of its
magnitude, we shall use the symbol |AB| to designate it. It is
clear that in contrast to (9.1)
    |AB| = |BA|.
Let A, B and C be any three points on the axis determining three
directed line segments AB, BC and AC. Whatever the location
of the points, the magnitudes of these directed line segments satisfy
the relation
    {AB} + {BC} = {AC}.    (9.2)
Indeed, let the direction of the axis and the location of the points
be such as in Fig. 9.1, for example. Then obviously
    |CA| = {CA} = −{AC},
    −{AC} + {AB} = −{BC},
which coincides essentially with (9.2).
In our proof we used only relations (9.3) and (9.4) which depend
only on relative positions of the points A, B and C on the axis
and are independent of their coincidence or noncoincidence with
one another. It is clear that for any other location of the points
the proof is similar.
Identity (9.2) is the basic identity. In terms of the operation of
vector addition, for vectors on the same axis, it means that the
magnitude of a sum of directed line segments equals the sum of the
magnitudes of those directed line segments.
Now let AB be a directed line segment and let α be a number.
The product α·AB of the directed line segment AB by the real number α
is a directed line segment lying on the axis through the points A
and B and having a magnitude equal to α·{AB}. Thus by definition
have vectors lying on the same axis and use relations (9.5) and
(9.6). We prove the fourth property. Suppose for simplicity that
α > 0. Apply vectors a and b to a common point and construct on
them a parallelogram whose diagonal is equal to a + b (Fig. 9.2).
When a and b are multiplied by α, the parallelogram diagonal, by
the similitude of figures, is also multiplied by α. But this means that
    αa + αb = α(a + b).
Note in conclusion that the magnitude of a directed line segment
may be treated as some "function"
    δ = {x}    (9.7)
whose "independent variables" are vectors x of the same axis and
whose "values" are real numbers δ, with
    {x + y} = {x} + {y},    (9.8)
    {λx} = λ{x}
for any vectors x and y on the axis and any number λ.
Exercises
1. Prove that the result of multiplication by a number
does not depend on the choice of the positive direction on the axis.
2. Prove that the result of multiplication by a number does not depend on
the way the unit line segment is specified on the axis.
3. Prove that if we perform multiplication by a number defined on any set
of collinear line segments, the result will remain in the same set.
4. Prove that if we perform multiplication by a number defined on any set
of coplanar line segments, the result will remain in the same set.
5. What are the zero and the negative directed line segment in terms of
multiplication by a number?
Exercises
Prove that the following sets are vector spaces. Everywhere
the name of the operation reflects its content rather than notation.
1. The field consists of real numbers; the set consists of real numbers; addition
is the addition of real numbers; multiplication by a number is the multiplication
of a real number by a real number.
2. The field consists of real numbers; the set consists of complex numbers;
addition is the addition of complex numbers; multiplication by a number is
the multiplication of a complex number by a real number.
3. The field consists of rational numbers; the set consists of real numbers;
addition is the addition of real numbers; multiplication by a number is the
multiplication of a real number by a rational number.
4. The field is arbitrary; the set consists of a single vector a;
addition is defined by the rule a + a = a; multiplication of the vector a by
any number α is defined by the rule αa = a.
5. The field consists of real numbers; the set consists of polynomials with
real coefficients in a single variable t, including constants; addition is the addition
of polynomials; multiplication by a number is the multiplication of a polynomial
by a real number.
6. The field consists of rational numbers; the set consists of numbers of the
form a + b√2 + c√3 + d√5, where a, b, c and d are rationals; addition is
the addition of numbers of the indicated form; multiplication by a number is
the multiplication of a number of the indicated form by a rational number.
7. The field is any field; the set is the same field; addition is the addition of
elements (vectors!) of the field; multiplication by a number is the multiplication
of an element (a vector!) of the field by an element (a number!) of the field.
(a_{hm} + a_{h+1,m} + ... + a_{pm}) + (a_{h,m+1} + a_{h+1,m+1} + ... + a_{p,m+1}) + ...
    + (a_{hn} + a_{h+1,n} + ... + a_{pn})
    = Σ_{i=h}^{p} a_{im} + Σ_{i=h}^{p} a_{i,m+1} + ... + Σ_{i=h}^{p} a_{in}
    = Σ_{j=m}^{n} ( Σ_{i=h}^{p} a_{ij} ).

Consequently,

    Σ_{i=h}^{p} ( Σ_{j=m}^{n} a_{ij} ) = Σ_{j=m}^{n} ( Σ_{i=h}^{p} a_{ij} ).

If we agree that we shall always take summation consecutively
over the summation indices arranged from right to left, then the
brackets may be dropped and we finally get

    Σ_{i=h}^{p} Σ_{j=m}^{n} a_{ij} = Σ_{j=m}^{n} Σ_{i=h}^{p} a_{ij}.

Similar rules hold for products:

    Π_{i=h}^{p} (a b_i) = a^{p−h+1} Π_{i=h}^{p} b_i,
    Π_{i=h}^{p} Π_{j=m}^{n} a_{ij} = Π_{j=m}^{n} Π_{i=h}^{p} a_{ij}.
All these facts can be proved according to the same scheme as in
the case of summation of numbers.
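A quick numerical spot check (not part of the book) of the interchange rules reconstructed above; the limits h, p, m, n and the entries a_ij are chosen arbitrarily:

import math
import random

h, p, m, n = 2, 6, 1, 4
a = {(i, j): random.randint(-9, 9) for i in range(h, p + 1) for j in range(m, n + 1)}

sum_rows_first = sum(sum(a[i, j] for i in range(h, p + 1)) for j in range(m, n + 1))
sum_cols_first = sum(sum(a[i, j] for j in range(m, n + 1)) for i in range(h, p + 1))
print(sum_rows_first == sum_cols_first)            # True: order of summation is irrelevant

prod_rows_first = math.prod(math.prod(a[i, j] for i in range(h, p + 1)) for j in range(m, n + 1))
prod_cols_first = math.prod(math.prod(a[i, j] for j in range(m, n + 1)) for i in range(h, p + 1))
print(prod_rows_first == prod_cols_first)          # True: same for products

c, b = 3, [random.randint(1, 9) for _ in range(h, p + 1)]
print(math.prod(c * x for x in b) == c ** (p - h + 1) * math.prod(b))   # the factor rule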
Exercises
Calculate the following expressions:

    Σ_{i=1}^{n} 1,   Σ_{i=1}^{n} i,   Σ_{i=1}^{n} i²,   Σ_{i=1}^{n} 8(i−1),

    Σ_{r=1}^{n} Σ_{j=1}^{m} rj,   Σ_{i=1}^{n} Σ_{s=1}^{m} (i + 5s),   Σ_{i=1}^{n} Σ_{j=1}^{m} Σ_{k=1}^{p} (2i − j)2^k,

    Π_{p=1}^{n} 10^p,   Π_{i=1}^{n} Π_{j=1}^{n} 2^{i−j},   Π_{i=1}^{m} Π_{j=1}^{n} Π_{k=1}^{p} 2^{i+j+k}.
CHAPTER 2
The Structure of a Vector Space
The uniqueness of the zero and negative vectors follows from their
uniqueness as vectors of the vector space K.
Notice that the span of vectors e1, e2, ..., en is the "smallest"
vector space containing those vectors. Indeed, the span consists
of only linear combinations of vectors e1, e2, ..., en and any
vector space containing e1, e2, ..., en must contain all their
linear combinations.
So any vector space contains in the general case an infinite number
of other vector spaces, the spans. Now the following questions
arise:
What are the conditions under which the spans of two distinct
systems of vectors consist of the same vectors of the original space?
What minimum number of vectors determines the same span?
Is the original vector space the span of some of its vectors?
We shall soon get answers to these and other questions. To do
this a very wide use will be made of the concept of linear combina-
tion, and in particular of its transitive property. Namely, if some
vector z is a linear combination of vectors x1, x2, ..., xr and
each of them in turn is a linear combination of vectors y1, y2, ...
..., ys, then z too may be represented as a linear combination of
y1, y2, ..., ys. We prove this property. Let
    z = β1x1 + β2x2 + ... + βrxr    (13.2)
and let
    x_i = Σ_{j=1}^{s} γ_ij y_j,   i = 1, 2, ..., r.
Then
    z = Σ_{i=1}^{r} β_i ( Σ_{j=1}^{s} γ_ij y_j ) = Σ_{j=1}^{s} ( Σ_{i=1}^{r} β_i γ_ij ) y_j.
So the concept of linear combination is indeed transitive.
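In coordinates the transitive property is just the multiplication of the coefficient arrays. A short numpy sketch (not part of the book; the concrete numbers are arbitrary):

import numpy as np

beta  = np.array([2.0, -1.0, 3.0])          # z in terms of x_1, x_2, x_3
gamma = np.array([[1.0, 0.0, 2.0, 1.0],     # x_i in terms of y_1, ..., y_4
                  [0.0, 1.0, 1.0, 0.0],
                  [2.0, 2.0, 0.0, 1.0]])
y = np.random.rand(4, 5)                    # four concrete vectors y_j (here in R^5)

x = gamma @ y                               # the vectors x_i as rows
z_direct  = beta @ x                        # z built from the x_i
z_through = (beta @ gamma) @ y              # z built from the y_j with combined coefficients
print(np.allclose(z_direct, z_through))     # True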
Exercises
1. What are in a space of directed line segments the
spans of systems of one, two, three and a larger number of directed line segments?
2. Consider a vector space of polynomials in t over the field of real numbers.
What is the span of the system of vectors t² + 1, t² + t and 1?
3. In what space do all spans coincide with the space?
4. Prove that the vector space of all directed line segments cannot be the
span of any two directed line segments.
Exercises
Prove that the following transformations of a system
of vectors, called elementary, result in an equivalent system.
1. Addition to a system of vectors of any linear combination of those vectors.
2. Elimination from a system of vectors of any vector which is a linear combination
of the remaining vectors.
3. Multiplication of any vector of a system by a number other than zero.
4. Addition to any vector of a system of any linear combination of the remaining
vectors.
5. Interchanging of two vectors.
The concept of basis plays a great role in the study of finite dimen-
sional vector spaces and we shall continually use it for this purpose.
It allows a very easy description of the structure of any vector space
over an arbitrary field P. In addition it can be used to construct
a very efficient method reducing operations on elements of a space to the
corresponding operations on numbers from a field P.
As shown above, any vector x of a vector space K may be represented
as a linear combination
    x = α1e1 + α2e2 + ... + αnen,    (16.1)
where α1, α2, ..., αn are some numbers from P and e1, e2, ..., en
constitute a basis of K. The linear combination (16.1) is called the
expansion of a vector x with respect to a basis and the numbers α1, α2, ...
..., αn are the coordinates of x relative to that basis. The fact that x
is given by its coordinates α1, α2, ..., αn will be written as follows:
    x = (α1, α2, ..., αn).
As a rule, we shall not indicate to which basis the given coordinates
relate, unless any ambiguity arises.
It is easy to show that for any vector x in K its expansion with
respect to a basis is unique. This can be proved by a device very often
used in solving problems concerning linear dependence. Suppose
there is another expansion
    x = β1e1 + β2e2 + ... + βnen.    (16.2)
Subtracting term by term (16.2) from (16.1) we get
    (α1 − β1)e1 + (α2 − β2)e2 + ... + (αn − βn)en = 0.
Since e1, e2, ..., en are linearly independent, it follows that all
coefficients of the linear combination are zero and hence expansions
(16.1) and (16.2) coincide.
Thus, with a basis of a vector space K fixed, every vector in K
is uniquely determined by the collection of its coordinates relative
to that basis.
Now let any two vectors x and y in K be given by their coordinates
relative to the same basis e1, e2, ..., en, i.e.
    x = α1e1 + α2e2 + ... + αnen,
    y = γ1e1 + γ2e2 + ... + γnen;
then
    x + y = (α1 + γ1)e1 + (α2 + γ2)e2 + ... + (αn + γn)en.
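A small numerical illustration (not part of the book): once a basis is fixed, the coordinates of a vector are found by solving a linear system, and the coordinates of a sum are the sums of the coordinates. The basis and vectors below are arbitrary:

import numpy as np

e = np.array([[1.0, 1.0, 0.0],      # a basis e_1, e_2, e_3 of R^3 (rows)
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
x = np.array([2.0, -1.0, 3.0])
y = np.array([0.0,  4.0, 1.0])

alpha = np.linalg.solve(e.T, x)     # coordinates of x: e^T * alpha = x
gamma = np.linalg.solve(e.T, y)     # coordinates of y
print(np.allclose(alpha @ e, x))                         # the expansion reproduces x
print(np.allclose(np.linalg.solve(e.T, x + y),           # coordinates of x + y ...
                  alpha + gamma))                         # ... are the sums of coordinates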
Exercises
1. Prove that the rank of a system of vectors coincides
with the dimension of its span.
2. Prove that equivalent systems of vectors have the same rank.
3. Prove that if a span L2 is constructed on the vectors of a span L1, then
dim L2 ≤ dim L1.
4. Prove that if a span L2 is constructed on the vectors of a span L1 and
dim L2 = dim L1, then the spans coincide.
5. Prove that a vector space of polynomials with real coefficients given over
a field of real numbers is infinite dimensional.
or
    2(1 − 2α1α2)/(α1² + 2α2²) = √2.
This is impossible since the left-hand side is rational and the
right-hand side is irrational.
So the space under consideration cannot be two-dimensional
either. But then what sort is it? Surprising as it may be, it is infinite
dimensional. However, the proof of this fact is beyond the scope of
this book.
The particular attention we have given to the examples of vector
spaces of small dimensions is due to the possibility of using them to
construct vector spaces of any dimensions. But we shall discuss
this later on.
Exercises
1. What is the dimension of the vector space of ration-
al numbers over the field of rational numbers?
2. Construct linearly independent systems of vectors in the space of com-
plex numbers over the field of rational numbers.
3. Is an additive group of rational numbers over the field of real numbers
a vector space? If not, why?
b = (−α/β)a.
Consequently, by the definition of multiplication of a directed line
segment by a number, a and b are collinear.
Suppose now that a and b are collinear vectors. Apply them to a
common point 0. They will be on some straight line which is turned
into an axis by specifying on it a direction. The vectors a and b
are nonzero and therefore there is a real λ such that the magnitude
of the directed line segment a equals the product of the magnitude of
b by λ, i.e. {a} = λ{b}. But by the definition of multiplication of
a directed line segment by a number this means that a = λb. So the
vectors a and b are linearly dependent.
It follows from the above lemma that a vector space of collinear
directed line segments is a one-dimensional space and that any nonzero
vector may serve as its basis.
Lemma 18.1 allows us to deduce one useful consequence. Namely,
if vectors a and b are collinear and a ≠ 0, then there is a number λ
such that b = λa. Indeed, these vectors are linearly dependent, i.e.
for some numbers α and β, not both zero, αa + βb = 0. If it is
assumed that β = 0, then it follows that a = 0. Therefore β ≠ 0
and we may take λ = (−α)/β.
Lemma 18.2. A necessary and sufficient condition for three vectors
to be linearly dependent is that they should be coplanar.
Proof. We may assume without loss of generality that no pair of
the three vectors is collinear since otherwise the lemma follows
immediately from Lemma 18.1.
So let a, b and c be three linearly dependent vectors. Then we can
find real numbers α, β and γ, not all zero, such that
    αa + βb + γc = 0.
If, for example, γ ≠ 0, then this equation yields
    c = (−α/γ)a + (−β/γ)b.
Apply a, b and c to a common point O. Then it follows from the
last equation that the vector c is equal to the diagonal of the parallelogram
constructed on the vectors (−α/γ)a and (−β/γ)b. This
means that after translation to the common point the vectors a, b and
c are found to be in the same plane and consequently they are coplanar.
Suppose now that a, b and c are coplanar vectors. Translate them
to the same plane and apply them to the common point O (Fig. 18.1).
Draw through the terminal point C of the vector c straight lines parallel
to b and to a, and denote by A and B the points at which they meet the
lines containing a and b. Then OA is collinear with a, and OB with b,
so that there are numbers λ and μ such that
    OA = λa,   OB = μb.
But OC = OA + OB, which means that c = λa + μb or
    λa + μb + (−1)c = 0.
Since the coefficients λ, μ and −1 are not all zero, the last equation
implies that a, b and c are linearly dependent.
We can now solve the question concerning the dimension of the
vector space of coplanar directed line segments. By Lemma 18.2
the dimension of this space must be less
than three. But any two noncollinear
directed line segments of this space are
linearly independent. Therefore the vector
space of coplanar directed line segments is
a two-dtmensional space and any two non-
collinear vectors m11y serve as its basis.
Lemma 18.3. Any four vectors are linearly dependent.
Proof. We may assume without loss of generality that no triple of
the four vectors is coplanar since otherwise the lemma follows
immediately from Lemma 18.2.
Apply vectors a, b, c and d to a common origin O and draw through
the terminal point D of d the planes parallel to the planes determined
respectively by the pairs of vectors b, c; a, c; a, b (Fig. 18.2). It follows
from the parallelogram law of vector addition that
    OD = OC + OE,
therefore
    OD = OA + OB + OC.    (18.1)
The vectors a, OA, as well as b, OB and c, OC, are collinear by
construction, with a, b and c being nonzero. Therefore there are
numbers λ, μ and ν such that
    OA = λa,   OB = μb,   OC = νc.
Exercises
1. Consider a vector space V3 to establish the geometrical
meaning of the sum and intersection of subspaces.
2. What is the sum of subspaces V1 and V2?
3. What is the intersection of subspaces V1 and V2?
4. Prove that the dimension of the intersection of any number of subspaces
does not exceed the minimum dimension of those subspaces.
5. Prove that the dimension of the sum of any number of subspaces is not
less than the maximum dimension of those subspaces.
number λ
    ω(x + y) = ω(x) + ω(y),
    ω(λx) = λω(x).    (21.2)
The 1-1 correspondence between K and K′ implies that to any
different independent variables of the function (21.1) there correspond
different values, i.e. if
    x ≠ y,    (21.3)
then
    ω(x) ≠ ω(y).    (21.4)
Consequently, the equality or inequality of the values of the
function implies respectively the equality or inequality of its
independent variables.
Isomorphic spaces have much in common. In particular, to a zero
vector there corresponds a zero vector, for
    ω(0) = ω(0·x) = 0·ω(x) = 0·x′ = 0′.
The most important consequence, however, is that a linearly in-
dependent system of vectors is sent into a linearly independent
system.
Indeed, let x1, x2, ..., xn be n linearly independent vectors. Consider
now a linear combination α1ω(x1) + α2ω(x2) + ... + αnω(xn)
and equate it to zero. By the property of an isomorphism
    0′ = α1ω(x1) + α2ω(x2) + ... + αnω(xn)
       = ω(α1x1 + α2x2 + ... + αnxn) = ω(0),
from which we have
    α1x1 + α2x2 + ... + αnxn = 0.
Since x1, x2, ..., xn are linearly independent, all the coefficients
must be zero.
The consequence we have proved makes it possible to state that
if two finite dimensional vector spaces are isomorphic, then they
have the same dimension. The converse is also true. Namely, we have
Theorem 21.1. Any two finite dimensional vector spaces having the
same dimension and given over the same field are isomorphic.
Proof. Let K and K′ be two vector spaces of dimension n. Choose
a basis e1, e2, ..., en in K and a basis e1′, e2′, ..., en′ in K′. Using
these systems of vectors construct an isomorphism ω as follows.
To every vector
    x = α1e1 + α2e2 + ... + αnen
in K assign a vector
    ω(x) = α1e1′ + α2e2′ + ... + αnen′
Exercises
1. Construct an isomorphism from a space V1 to the
space of reals over the field of reals.
2. Construct an isomorphism from a space V2 to the space of complex numbers
over the field of reals.
are the leading elements of individual steps and therefore they are
all nonzero.
So from the theoretical point of view the concept of linear depend-
ence has been investigated sufficiently fully. As to practice, however,
it may result in very serious difficulties. Consider, for example in
a space R_n a system of vectors
    a1 = (1, −2, 0, ..., 0, 0),
    a2 = (0, 1, −2, ..., 0, 0),
    . . . . . . . . . . . . . .    (22.6)
    a_{n−1} = (0, 0, 0, ..., 1, −2),
    a_n = (−2^{−(n−1)}, 0, 0, ..., 0, 1).
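The discussion of (22.6) breaks off here, but the difficulty it points to can be made concrete. The sketch below (not part of the book; it takes n = 12 and uses exact rational arithmetic) checks that the system as reconstructed above is exactly linearly dependent, the combination with coefficients 1, 2, 4, ..., 2^(n−1) being zero, and that the determinant equals 1 + c·2^(n−1), where c is the first entry of a_n; rounding that entry by even a tiny amount therefore turns the dependent system into an independent one, which is what makes the question hard to decide numerically:

from fractions import Fraction

def system(n, corner):
    """Rows a_1, ..., a_n of (22.6); 'corner' is the first entry of a_n."""
    rows = []
    for i in range(n - 1):
        v = [Fraction(0)] * n
        v[i], v[i + 1] = Fraction(1), Fraction(-2)
        rows.append(v)
    last = [Fraction(0)] * n
    last[0], last[-1] = corner, Fraction(1)
    rows.append(last)
    return rows

def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [r[:] for r in rows]
    n, d = len(a), Fraction(1)
    for k in range(n):
        pivots = [i for i in range(k, n) if a[i][k] != 0]
        if not pivots:
            return Fraction(0)
        p = pivots[0]
        if p != k:
            a[k], a[p] = a[p], a[k]
            d = -d
        d *= a[k][k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
    return d

n = 12
exact = system(n, Fraction(-1, 2 ** (n - 1)))
# The combination 1*a_1 + 2*a_2 + 4*a_3 + ... + 2^(n-1)*a_n is exactly zero.
combo = [sum(2 ** i * exact[i][j] for i in range(n)) for j in range(n)]
print(combo == [0] * n, det(exact))                    # True 0

# Perturbing the corner entry by a tiny eps gives determinant eps*2^(n-1), nonzero.
eps = Fraction(1, 10 ** 16)
print(det(system(n, Fraction(-1, 2 ** (n - 1)) + eps)))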
Exercises
1. Prove that if system (22.2) is compatible, then it
has a unique solution if and only if the system of vectors a1, a2, ..., am is
linearly independent.
2. Prove that if a system of vectors a1, a2, ..., am has rank r, then system
(22.5) consists of r equations.
3. Assume that the solutions of a system are vectors of a space P_m. Let
b = 0 and let the system of vectors a1, a2, ..., am have rank r. Prove that in
this case the set of all solutions of (22.2) forms an (m − r)-dimensional subspace
of P_m.
4. Find all solutions of the system of linear algebraic equations
"V2sl + 1·z2 = va.
2·zl+ "V2z 2 = "V6.
Solve the same system giving lf2, if§ and V6 to various accuracy. Compare
the results.
5. Establish the relation of the Gauss method to elementary transformations
of a system of vectors.
CHAPTER 3
Measurements in Vector Space
The axis with the given point 0 and basis vector a forms an affine
coordinate system on the straight line. The point 0 is called the origin,
and the length of the vector a is the scale unit.
The position of any point M on the straight line is uniquely determined
by that of the vector OM. The vectors a and OM are collinear,
with a ≠ 0, and so according to the consequence of Lemma 18.1
there is a real α such that
    OM = αa.    (23.1)
That number is called an affine coordinate of the point M on the
straight line. The point M with a coordinate α is designated M(α).
Notice that with a fixed affine coordinate system on the straight
line (Fig. 23.1) relation (23.1) uniquely defines the affine coordinate α
of any point M of that straight line. Obviously the converse is also
true. Namely, relation (23.1) makes every number α uniquely define
some point M of the straight line. Thus given a fixed affine coordinate
system there is a 1-1 correspondence between all real numbers and
the points of a straight line.
Giving points by their coordinates allows us to calculate the
magnitudes of directed line segments and the distances between
points. Let M1(α1) and M2(α2) be given points. We have
    {M1M2} = {OM2 − OM1} = {α2a − α1a} = {(α2 − α1)a}
            = (α2 − α1){a} = (α2 − α1)|a|.    (23.2)
If ρ(M1, M2) denotes the distance between the points M1 and M2, then
    ρ(M1, M2) = |{M1M2}| = |α2 − α1| |a|.
Two noncollinear vectors a and b applied to a common point O form
an affine coordinate system in the plane. The axis containing a is called
the x axis or the axis of abscissas; the axis containing b is the y axis
or the axis of ordinates.
Again the position of any point M in the plane is uniquely determined
by the vector OM and in turn there is a unique vector decomposition
of the form
    OM = αa + βb.    (23.5)
The real numbers α and β are again called the affine coordinates of
the point M. The first coordinate is called the abscissa and the second
the ordinate of M. The point M with coordinates α and β is designated
M(α, β).
On the x and y axes there are unique points Mx and My such that
    OM = OMx + OMy.    (23.6)
The points Mx and My are called the projections of the point M onto
the coordinate axes. The vectors OMx and OMy are the affine projections
of OM. From the uniqueness of decompositions (23.5) and (23.6) we
conclude that
    OMx = αa,   OMy = βb.    (23.7)
Thus if M has the coordinates M(α, β), then Mx and My, as
points of the plane, have the coordinates Mx(α, 0) and My(0, β).
Moreover if
    OM = (α, β),
then
    OMx = (α, 0),   OMy = (0, β).
The position of any point M in space is again uniquely determined
by the vector OM for which there is a unique decomposition
    OM = αa + βb + γc.
On the coordinate axes and in the coordinate planes this decomposition
determines the points Mx, My, Mz and Myz, Mxz, Mxy, called the affine
projections of the point M onto the coordinate axes and onto the coordinate
planes. Accordingly the vectors OMyz, OMxz and so on are called the
affine projections of the vector OM. It is obvious that
    OM = OMx + OMy + OMz,
    OMyz = OMy + OMz,
    OMxz = OMx + OMz,
    OMxy = OMx + OMy.
We conclude, as in the case of the plane, that if a point M has the
coordinates
    M(α, β, γ),
then the affine projections of that point will have the coordinates:
    Mx(α, 0, 0),   My(0, β, 0),   Mz(0, 0, γ),    (23.8)
    Myz(0, β, γ),   Mxz(α, 0, γ),   Mxy(α, β, 0).
Similarly, if
    OM = (α, β, γ),
then
    OMx = (α, 0, 0),   OMy = (0, β, 0),   OMz = (0, 0, γ)
and so on.
Exercises
1. Which of the points A(α) and B(−α) is to the
right of the other on the coordinate axis of Fig. 23.1?
2. What is the locus of points M(α, β, γ) for which the affine projections Mxy
have coordinates Mxy(−3, 2, 0)?
If the point M coincides with the pole 0, then the polar angle is
considered to be undefined.
Associated in a natural way with every polar coordinate system
is a rectangular Cartesian system. In this the origin coincides with
the pole, the axis of abscissas coincides with the polar axis and the
axis of ordinates is obtained by rotating the polar axis through an
angle of n/2 about 0.
Denote the coordinates of the point M in the Cartesian x, y system
by α and β. We have the obvious formulas
    α = ρ cos φ,   β = ρ sin φ.
Let M be an arbitrary point of space and let Mxy be its projection
onto the x, y plane. Denote by ρ the distance from M to O, by θ the
angle between the vector OM and the basis vector of the z axis, and
finally by φ the polar angle of the projection Mxy.
The spherical coordinates of the point M in space are the three
numbers ρ, φ and θ. The number ρ is a radius vector, the number φ is
a longitude, and the number θ is a colatitude. It is assumed that
    0 ≤ ρ < +∞,   0 ≤ φ < 2π,   0 ≤ θ ≤ π.
The longitude is undefined for all the points of the z axis, and the
colatitude is not defined for the point O.
Cartesian coordinates in the x, y, z system and spherical coordinates
are connected by the relations
    x = ρ sin θ cos φ,   y = ρ sin θ sin φ,   z = ρ cos θ.
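A minimal sketch (not part of the book) of the conversion formulas just given, writing theta for the colatitude and phi for the longitude:

import math

def spherical_to_cartesian(rho, phi, theta):
    x = rho * math.sin(theta) * math.cos(phi)
    y = rho * math.sin(theta) * math.sin(phi)
    z = rho * math.cos(theta)
    return x, y, z

# A point with radius 2, longitude 60 degrees, colatitude 45 degrees.
x, y, z = spherical_to_cartesian(2.0, math.pi / 3, math.pi / 4)
print(round(x, 4), round(y, 4), round(z, 4))
print(round(math.sqrt(x * x + y * y + z * z), 4))   # the radius 2 is recovered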
Exercises
1. Construct a curve whose points in polar coordinates
satisfy the relation ρ = cos 3φ.
2. Construct a curve whose points in cylindrical coordinates satisfy the
relations ρ = φ⁻¹ and z = φ.
3. Construct a surface whose points in spherical coordinates satisfy the
relations 0 ≤ ρ ≤ 1, φ = π/2 and 0 ≤ θ ≤ π/2.
    M1M2 = OM2 − OM1.
Further, by the definition of the affine coordinates of M1 and M2,
    OM1 = α1i + β1j + γ1k,   OM2 = α2i + β2j + γ2k,
or according to the accepted notation
    M1M2 = (α2 − α1, β2 − β1, γ2 − γ1).
Projecting the points M1 and M2 onto the same coordinate plane or
the same coordinate axis we obtain a new directed line segment. It is
called a coordinate projection of the vector M1M2.
Every vector in space has six coordinate projections: three projections
onto the coordinate axes and three projections onto the coordinate
planes. It is easy to find the coordinate projections in the basis i, j
and k from the coordinates of the points M1(α1, β1, γ1) and M2(α2, β2, γ2).
For the projections onto the x axis, for example, we find that
    M1xM2x = (α2 − α1, 0, 0).    (25.2)
Similarly
    M1xzM2xz = (α2 − α1, 0, γ2 − γ1)
and so on for all the remaining projections.
Comparing the first of the formulas (23.4) with formulas of the
type (25.2) we conclude that the magnitudes of the projections of a
vector onto the coordinate axes coincide with the coordinates of that vector.
The second of the formulas (23.4) allows us to calculate the lengths
of the projections of M1M2 onto the coordinate axes from the coordinates
of M1 and M2. Namely,
    |M1xM2x| = |α2 − α1|,   |M1yM2y| = |β2 − β1|,   |M1zM2z| = |γ2 − γ1|.
(Fig. 25.1)
Suppose a and b are given by their coordinates
    a = (x1, y1, z1),   b = (x2, y2, z2)
(Fig. 25.2). Then
    b − a = (x2 − x1, y2 − y1, z2 − z1).
As is known from elementary geometry, the square of the length
of one side of a triangle is equal to the sum of the squares of the
lengths of its other two sides minus the double product of the lengths
of those sides by the cosine of the angle between them. Therefore
    |b − a|² = |a|² + |b|² − 2|a|·|b| cos{a, b}
or, taking into account formula (25.4),
    (x2 − x1)² + (y2 − y1)² + (z2 − z1)² = x1² + y1² + z1² + x2² + y2² + z2²
        − 2(x1² + y1² + z1²)^{1/2}(x2² + y2² + z2²)^{1/2} cos{a, b}.
Performing elementary transformations we find
    cos{a, b} = (x1x2 + y1y2 + z1z2) / ((x1² + y1² + z1²)^{1/2}(x2² + y2² + z2²)^{1/2}).    (25.5)
Changing the direction on the axis makes the numbers {M1M}
and {MM2} change sign simultaneously. Hence ratio (25.6) is independent
of the choice of the direction on the axis. The projections of the points
onto the coordinate axes divide the corresponding directed line segments
in the same ratio, so that
    λ = {M1xMx}/{MxM2x}.    (25.7)
By formula (23.4) {M1xMx} = α − α1 and {MxM2x} = α2 − α.
Now taking into account (25.7), we find that α = (α1 + λα2)/(1 + λ).
The calculation of the coordinates β and γ is similar. So
    α = (α1 + λα2)/(1 + λ),   β = (β1 + λβ2)/(1 + λ),   γ = (γ1 + λγ2)/(1 + λ).
Notice that λ > 0 if M is inside the line segment M1M2, λ < 0
if M is outside M1M2, and λ = 0 if M coincides with M1. If M
moves from M1 to M2 (excluding the coincidence with M2), the
ratio λ first takes on a zero value and then all possible positive
values successively in increasing order. If M moves from M1 in the
positive direction of the axis (see Fig. 25.3), then λ first assumes a
zero value and then negative values in decreasing order approaching
arbitrarily closely λ = −1 but always remaining greater than
λ = −1. If M moves in the negative direction from M2, then λ
takes on all possible negative values in increasing order but always
remains less than λ = −1.
Thus a 1-1 correspondence could be established between all real
numbers and the points on the straight line if the straight line
contained a point M dividing the line segment M1M2 in the ratio
Orthogonal projections of a vector. Let u be some axis in space
and let AB be a directed line segment. Draw through the points
A and B the planes perpendicular to u (Fig. 25.4). The intersection
of the planes with the axis determines the points Au and Bu, with
Au lying in the same plane as A and Bu in the same plane as B.
The directed line segment AuBu is called the orthogonal projection
of AB onto the axis u. The following notation is used to denote it:
    AuBu = pr_u AB.
For projections onto the same axis u we have
    pr_u (x + y) = pr_u x + pr_u y,   pr_u (λx) = λ pr_u x.    (25.9)
To prove these relations, fix a Cartesian coordinate system whose
x axis is the axis u. Then the projection of any vector onto u is the
vector whose first coordinate coincides with the first coordinate of
the vector, the other coordinates equalling zero. Therefore
    pr_u (x + y) = (α1 + α2, 0, 0),
    pr_u (λx) = (λα1, 0, 0),
    pr_u x = (α1, 0, 0),    (25.10)
    pr_u y = (α2, 0, 0),
where α1 and α2 are the first coordinates of the vectors x and y.
According to the rules of vector addition and of multiplication of
vectors by a number, it follows from the last two equations of (25.10)
that
    pr_u x + pr_u y = (α1 + α2, 0, 0),
    λ pr_u x = (λα1, 0, 0).
Comparing the right-hand sides of these with the right-hand sides of
the first two equations of (25.10) we see that both properties of
(25.9) are true.
Now let π be some plane in space and let AB be a directed line
segment. On dropping from the points A and B perpendiculars to π
we obtain in π two points Aπ and Bπ which determine a directed
line segment AπBπ. This is called the orthogonal projection of AB
onto π. The same notation is used to denote it, i.e.
    AπBπ = pr_π AB.
Of course, for orthogonal projections onto the same plane we have
relations similar to (25.9). To prove this we may fix some Cartesian
system where π is a coordinate plane and use again the corresponding
properties of projections onto a coordinate plane.
We have discussed orthogonal projections of vectors in space.
Undoubtedly a close analogy holds for vectors in the plane.
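A numerical spot check (not part of the book) of relations (25.9) for projections onto an axis and onto a coordinate plane; the vectors and the axis direction are arbitrary:

import numpy as np

u = np.array([1.0, 2.0, 2.0]) / 3.0            # a unit vector defining the axis u

def pr_axis(v):                                 # orthogonal projection onto the axis u
    return (v @ u) * u

def pr_plane(v):                                # projection onto the x, y coordinate plane
    return np.array([v[0], v[1], 0.0])

x = np.array([1.0, -2.0, 4.0])
y = np.array([3.0,  0.5, 1.0])
lam = -2.5

print(np.allclose(pr_axis(x + y), pr_axis(x) + pr_axis(y)))
print(np.allclose(pr_axis(lam * x), lam * pr_axis(x)))
print(np.allclose(pr_plane(x + y), pr_plane(x) + pr_plane(y)))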
Exercises
1. Two nonzero vectors are given by their Cartesian
coordinates. When are they perpendicular?
2. Find the coordinates of the centre of gravity of three particles, given their
Cartesian coordinates and masses.
3. Find the area of a triangle, given the Cartesian coordinates of its three
vertices.
4. Let x, a, b and c be nonzero vectors in space, with a, b and c mutually
perpendicular. Prove that
    cos²{x, a} + cos²{x, b} + cos²{x, c} = 1.
5. Denote by π any coordinate plane, and by u any coordinate axis
in π. Prove that for any vector x
    pr_u (pr_π x) = pr_u x.
It may seem that these formulas are trivial, since they follow
immediately from (26.2), without any reference to formulas (25.4)
and (25.5). We shall not jump at a conclusion but note a very impor-
tant fact.
Notice that as a matter of fact our investigation proceeded in
three steps. Using (26.2) we first proved that properties (26.5) are
true. Then relying only on these properties and the orthonormality
of the basis vectors of a coordinate system we established formula
(26.6). And finally we obtained formulas (26. 7) by using formulas
(25.4) and (25.5) which were derived without using the concept of
a scalar product of vectors.
Starting from this we could now introduce the scalar product
not by giving its explicit form but axiomatically, as some numerical
function defined for every pair of vectors, requiring that properties
(26.5) should necessarily hold. Relation (26.6) would then hold for
any coordinate systems where the basis vectors are orthonormal in
the sense of the axiomatic scalar product. Consequently, bearing
in mind the model of a Cartesian system we could axiomatically
assume that the lengths of vectors and the angles between them are
calculated from formulas (26. 7). It would be necessary of course to
show that the lengths and angles introduced in this way possess the
necessary properties.
Exercises
1. Given two vectors a and b, under what conditions on
the number α are the vectors a and b + αa orthogonal? What is the geometrical
interpretation of the problem?
for any vectors x_i and y_j, any numbers α_i and β_j and any numbers
r and s of summands.
Any linear subspace L of a Euclidean space E is a Euclidean space
with the scalar product introduced in E.
It is easy to show a general method of introducing a scalar product
in an arbitrary real space K. Let e1, e2, ..., en be some basis of K.
Take two arbitrary vectors x and y in K and suppose that
    x = ξ1e1 + ξ2e2 + ... + ξnen,
    y = η1e1 + η2e2 + ... + ηnen.
A scalar product of vectors can now be introduced, for example, as
follows:
    (x, y) = ξ1η1 + ξ2η2 + ... + ξnηn.    (27.2)
It is not hard to check that all the axioms hold. Therefore the vector
space K with scalar product (27.2) is Euclidean.
Notice that a scalar product can be introduced inK in other ways.
For example, a scalar product in K is the following expression:
    (x, y) = α1ξ1η1 + α2ξ2η2 + ... + αnξnηn
for any fixed positive numbers α1, α2, ..., αn. We should not be
confused by this lack of uniqueness. For is there anything strange
in the fact that lengths can be measured in metres or inches, angles
can be measured in degrees or radians and so on? It is this lack of
uniqueness that makes it possible to take the fullest account of the
properties of particular spaces when introducing a scalar product
in them.
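A small sketch (not part of the book) of the two scalar products just mentioned, formula (27.2) and its weighted variant, with a numerical check of symmetry, additivity and positivity on arbitrary coordinate columns:

import numpy as np

def dot_standard(xi, eta):
    # (x, y) = xi_1*eta_1 + ... + xi_n*eta_n, as in (27.2)
    return float(np.dot(xi, eta))

def dot_weighted(xi, eta, alpha):
    # (x, y) = alpha_1*xi_1*eta_1 + ... + alpha_n*xi_n*eta_n, with alpha_i > 0
    return float(np.dot(alpha * xi, eta))

xi, eta = np.array([1.0, -2.0, 3.0]), np.array([0.5, 1.0, 2.0])
alpha = np.array([1.0, 10.0, 0.1])              # any fixed positive weights

for dot in (lambda a, b: dot_standard(a, b), lambda a, b: dot_weighted(a, b, alpha)):
    symmetric = np.isclose(dot(xi, eta), dot(eta, xi))
    additive  = np.isclose(dot(xi + eta, eta), dot(xi, eta) + dot(eta, eta))
    positive  = dot(xi, xi) > 0
    print(symmetric, additive, positive)        # three of the axioms (27.1), checked numerically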
Introducing a scalar product in spaces of directed line segments
we had to define it separately when at least one of the segments was
zero. The scalar product was assumed to be zero. Now this fact is
a property arising from axioms (27.1). If x is an arbitrary vector
in E, then
    (0, x) = (0·x, x) = 0(x, x) = 0.
Of course, by the first axiom of (27.1), (x, 0) = 0.
A vector x of a Euclidean space is said to be normed if (x, x) = 1.
Any nonzero vector y can be normed by multiplying it by some number
λ. Indeed, under the hypothesis
    (λy, λy) = λ²(y, y) = 1,
and therefore as a normalization factor we may take
    λ = (y, y)^{−1/2}.
A system of vectors is said to be normed if all its vectors are normed.
It follows from the foregoing that any system of nonzero vectors
can be normed.
One of the most important properties of a scalar product is stated
by the following
Theorem 27.1 (the Cauchy-Buniakowski-Schwarz inequality).
For any two vectors x and y of a Euclidean space
    (x, y)² ≤ (x, x)(y, y).
Proof. The theorem is clearly true if y = 0, and therefore we
assume that y ≠ 0. Consider a vector x − λy, where λ is an arbitrary
real number. We have
    (x − λy, x − λy) = (x, x) − 2λ(x, y) + λ²(y, y).
The left-hand side of the equation contains a scalar product of
equal vectors. Therefore the second-degree trinomial at the right
is nonnegative for any λ. Setting in particular λ = (x, y)/(y, y) we get
    (x − ((x, y)/(y, y))y, x − ((x, y)/(y, y))y) = (x, x) − (x, y)²/(y, y) ≥ 0,
from which the required inequality follows.
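A quick numerical illustration (not part of the book) of Theorem 27.1 in R^n with the scalar product (27.2); equality is attained when one vector is a multiple of the other:

import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.normal(size=8), rng.normal(size=8)
    print(np.dot(x, y) ** 2 <= np.dot(x, x) * np.dot(y, y))        # always True

x = rng.normal(size=8)
y = -3.0 * x                                                       # proportional vectors
print(np.isclose(np.dot(x, y) ** 2, np.dot(x, x) * np.dot(y, y)))  # the equality case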
Exercises
1. Introduce a scalar product in a space of polynomials
with real coefficients in a single variable.
2. Will a space R_n be Euclidean if the scalar product is introduced in it as
follows:
    (x, y) = Σ_{i=1}^{n} |α_i| |β_i|?
28. Orthogonality
The most important relation between the
vectors of a Euclidean space is orthogonality.
By definition vectors x andy are said to be orthogonal if (x, y) = 0.
By the first axiom of (27.1) the orthogonality relation of two vectors
is symmetrical. In fact in a space of directed line segments the con-
cept of orthogonality coincides with that of perpendicularity.
Orthogonality may therefore be regarded as an extension of the
notion of perpendicularity to abstract Euclidean spaces.
A system of vectors of a Euclidean space is said to be orthogonal
if either it consists of a single vector or its vectors are mutually
orthogonal. If an orthogonal system consists of nonzero vectors,
then it can be normed. A normed orthogonal system is called orthonormal.
Interest in orthogonal and orthonormal systems is due to the
advantages they offer in investigating Euclidean spaces.
Thus, for example, any orthogonal system of nonzero vectors and
of course any orthonormal system is linearly independent. Indeed,
let a system x1, x2, ..., xk be orthogonal and let x_i ≠ 0 for every i.
This means that (x_i, x_j) = 0 for i ≠ j, but that (x_i, x_j) ≠ 0 for
i = j. We write
    α1x1 + α2x2 + ... + αkxk = 0.
On performing a scalar multiplication of this equation by any
vector x_i we find
    α1(x_i, x1) + α2(x_i, x2) + ... + αk(x_i, xk) = 0.
Consequently,
    α_i(x_i, x_i) = 0    (28.1)
and of course α_i = 0. Thus the system of vectors x1, x2, ..., xk
is linearly independent.
From (28.1) we deduce, in particular, that if a sum of mutually
orthogonal vectors is zero, then all the vectors are zero.
Especially many useful consequences arise from the assumption
that some orthonormal system e1, e2, ..., es may form a basis of
a Euclidean space E. In this case every vector x in E must be
uniquely represented as a linear combination
    x = α1e1 + α2e2 + ... + αses.
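The text breaks off here; a standard consequence is that for an orthonormal basis the coordinates in this expansion are simply α_i = (x, e_i). A small numerical sketch (not part of the book):

import numpy as np

e = np.array([[1.0, 0.0, 0.0],                  # an orthonormal basis of R^3
              [0.0, 0.6, 0.8],
              [0.0, 0.8, -0.6]])
x = np.array([2.0, -1.0, 5.0])

alpha = e @ x                                   # alpha_i = (x, e_i)
print(np.allclose(sum(a * ei for a, ei in zip(alpha, e)), x))   # x = sum alpha_i e_i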
or
    |x − y| ≤ |x| + |y|,    (29.3)
    |x − y| ≥ ||x| − |y||.
Exercises
1. Prove that the length of the sum of any number of
vectors does not exceed the sum of the lengths of those vectors.
2. Prove that the square of the length of the sum of any number of orthogonal
vectors is equal to the sum of the squares of the lengths of those vectors.
3. Given a Euclidean space of polynomials in a single variable t, find the
angles of a triangle formed by the vectors 1, t² and 1 − t².
4. What is the distance between the polynomials 3t² + 6 and 2t³ + t + 1?
5. Prove that a triangle in a Euclidean space is a right triangle if and only
if the length of one of its sides is equal to the product of the length of another
side by the cosine of the angle between them.
vector OM of the space as the sum
    OM = OM_L + M_L M,    (30.1)
where OM_L ∈ L and M_L M ⊥ L (Fig. 30.1). From geometrical
considerations it is clear that decomposition (30.1) always exists and
is unique.
This example suggests how to set the problem of a perpendicular
in the general case. Suppose in a Euclidean space E some subspace L
is fixed. Take an arbitrary vector f in E and study the possibility
of decomposing it as a sum
    f = g + h,    (30.2)
where g ∈ L and h ⊥ L.
We have already encountered this problem. Indeed, the condition
h ⊥ L is equivalent to h ∈ L⊥. By Theorem 28.2 a Euclidean space E
is the direct sum of subspaces L and L⊥. Therefore decomposition
(30.2) always exists and is unique.
Bearing in mind the analogy with decomposition (30.1) the vector g
in (30.2) will be called the projection of the vector f onto L, h will be
Exercises
1. Does the analogue of the theorem on three perpen-
diculars hold in a Euclidean space?
2. Prove that the sum of two angles between a vector f and subspaces L
and L⊥ is equal to π/2.
3. Find the perpendicular and projection of a vector I onto trivial subspaces.
4. Prove that if for fixed subspaces L1 and L2 and any vector x
    pr_{L1 + L2} x = pr_{L1} x + pr_{L2} x,
then the sum L1 + L2 is orthogonal.
5. Prove that if subspaces L1, L2, ..., Lm are mutually orthogonal, then
for any vector x in E
    |x|² ≥ Σ_{i=1}^{m} |pr_{L_i} x|².
Exercises
1. Construct a Euclidean isomorphism from V 2 to R 2 •
2. Prove that in Euclidean isomorphic spaces an orthonormal system of vectors
goes over into an orthonormal system.
3. Prove that in Euclidean isomorphic spaces the angles between pairs of
corresponding vectors are equal.
4. Prove that in Euclidean isomorphic spaces a perpendicular and a projec-
tion go over respectively into a perpendicular and a projection.
Using this space it is easy to show the part played by complex con-
jugation in the first axiom. If the scalar product were introduced
in Cn according to formula (27.2), then in C3 , for example, for the
vector
X= (3, 4, 5i)
we would have
    (x, x) = 9 + 16 + 25i² = 0.
The fourth axiom would be found to have failed.
Exercises
1. Compare a Euclidean space R2 and a unitary
space C1.
2. Write the Cauchy-Buniakowski-Schwarz inequality for a space C_n.
3. If in a complex space the scalar product is introduced according to axioms
(27.1), can the Cauchy-Buniakowski-Schwarz inequality hold in such a space?
4. If in a complex space the scalar product is introduced according to axioms
(27.1), can there be an orthogonal basis in such a space?
    ≤ ( Σ_{i=1}^{n} |ξ_i|² ) ( Σ_{i=1}^{n} |e_i|² ).    (33.2)
Exercises
1. Let e1, e2, ..., en be an orthogonal basis of a Euclidean
space. Prove that a system of vectors z1, z2, ..., zn is linearly independent
if
    Σ_{i=1}^{n} cos{e_i, z_i} > n − 1/2.
CHAPTER 4
The Volume of a System of Vectors in Vector Space
Exercises
1. Prove that a, b and c are coplanar if and only if
their triple scalar product is zero.
2. Prove that for any three vectors a, b and c
[a, [b, c]] = (a, c) b − (a, b) c.
3. Prove that vector multiplication is not an associative operation.
4. Find an expression of the oriented area of a parallelogram in terms of the
Cartesian coordinates of vectors in the plane.
5. Will formulas (34.5) and (34.6) change if the coordinate system relative
to which the vectors are given is left-handed?
Fig. 35.1   Fig. 35.2
Exercises
1. Prove that in spaces of directed line segments ori-
ented length, area and volume are defined uniquely by conditions (34.4).
2. Will the same concepts be uniquely defined if one of the conditions (34.4)
is excluded?
3. Prove that in any Euclidean space V (x1, x2) = |x1| · |x2| if and only
if the vectors x1 and x2 are orthogonal.
4. Prove that in any Euclidean space V (x1, x2) = V (x2, x1).
    = (∏_{i=0}^{p−1} |ort_{L_i} x_{i+1}|) (∏_{i=0}^{q−1} |ort_{K_i} y_{i+1}|) = V (x1, ..., xp) V (y1, ..., yq).
Before proceeding to further studies we make a remark. The
volume of a system is only expressible in terms of the perpendiculars
dropped to spans formed by the preceding vectors. Taking into
account the properties of perpendiculars, it may therefore be con-
cluded that the volume of the system will remain unaffected if to
any vector any linear combination of the preceding vectors is added.
In particular, the volume will remain unaffected if any vector is
replaced by the perpendicular from that vector to any span formed
by the preceding vectors.
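This description of the volume can be tried numerically. The sketch below (an illustration, not the book's construction) obtains the lengths of the successive perpendiculars from a QR decomposition, an assumed shortcut, and checks the invariance just mentioned.

    import numpy as np

    # Volume of a system x1, ..., xp as the product of the lengths of the
    # perpendiculars dropped from each vector onto the span of the preceding ones;
    # these lengths are exactly |r_ii| in a QR decomposition.
    def volume(vectors):
        A = np.column_stack(vectors)
        r = np.linalg.qr(A, mode='r')
        return np.prod(np.abs(np.diag(r)))

    x1 = np.array([1.0, 0.0, 0.0])
    x2 = np.array([1.0, 2.0, 0.0])
    x3 = np.array([0.0, 1.0, 3.0])

    v = volume([x1, x2, x3])
    # Adding to x3 a linear combination of the preceding vectors leaves the volume unchanged.
    v_shifted = volume([x1, x2, x3 + 2.0 * x1 - 5.0 * x2])
    print(v, v_shifted)   # both equal 6.0 up to rounding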
Property 4. The volume of a system of vectors remains unaffected
by any rearrangement of vectors in the system.
Consider first the case where in a system of vectors x1, ..., xn
two adjacent vectors x_{p+1} and x_{p+2} are interchanged. According
to the above remark the volume will remain unaffected if x_{p+1}
and x_{p+2} are replaced by the vectors ort_{L_p} x_{p+1} and ort_{L_p} x_{p+2},
and x_{p+3}, ..., x_n are replaced by ort_{L_{p+2}} x_{p+3}, ..., ort_{L_{p+2}} x_n.
But now the three sets of vectors
    x_1, ..., x_p,
    ort_{L_p} x_{p+1}, ort_{L_p} x_{p+2},
    ort_{L_{p+2}} x_{p+3}, ..., ort_{L_{p+2}} x_n
A little later we shall prove that any permutation x_{i1}, x_{i2}, ..., x_{in} of vectors of a system x1, x2, ..., xn can be obtained by
successive interchanging of adjacent vectors. For an arbitrary permutation, therefore, Property 4 follows from the above special case.
Property 5. The volume of a system of vectors is an absolutely homo-
geneous function, i.e.
    V (x1, ..., αxp, ..., xn) = |α| V (x1, ..., xp, ..., xn)
for any p.
By Property 4 we may assume without loss of generality that
p = n. But then in view of (30.6) we get
    V (x1, ..., x_{n−1}, αxn) = (∏_{i=0}^{n−2} |ort_{L_i} x_{i+1}|) |ort_{L_{n−1}} (αxn)|
        = |α| ∏_{i=0}^{n−1} |ort_{L_i} x_{i+1}| = |α| V (x1, ..., x_{n−1}, xn).
Property 6. The volume of a system of vectors remains unaffected
if to some one of the vectors of the system a linear combination of the
remaining vectors is added.
Again by Property 4 we may assume that to the last vector a linear
combination of the preceding vectors is added. But, as already
noted, in this case the volume remains unaffected.
The volume of a system of vectors is a real-valued function.
This function possesses some properties part of which we have already
established. They have confirmed our supposition that the volume
Exercises
1. Give a geometrical interpretation of Properties 2,
3 and 6 in spaces of directed line segments.
2. Give a geometrical interpretation of equation (36.5) in spaces of directed
line segments.
3. Can a function satisfying conditions (36.3) be zero on any linearly independent system of vectors?
4. Suppose relative to an orthonormal basis e1, e2, ..., en a system of vectors
x1, x2, ..., xn possesses the property
    (x_i, e_j) = 0
for i = 2, 3, ..., n and j < i (or for i = 1, 2, ..., n − 1 and j > i). Find the expression for the volume V (x1, x2, ..., xn) in terms of the coordinates of the vectors
x1, x2, ..., xn in the basis e1, e2, ..., en.
5. What will change in the concept of volume for a complex space?
    = V± (z1, z1, z3, ..., zn) + V± (z2, z2, z3, ..., zn)
      + V± (z1, z2, z3, ..., zn) + V± (z2, z1, z3, ..., zn).
On the right of this equation the first two terms are zero from which
it follows that Property 2 is valid. It is again not hard to prove that
if A holds, then B and Property 2 are equivalent.
Property 3. The oriented volume of a system of vectors remains
unaffected by addition, to any vector, of any linear combination of the
remaining vectors.
    = V± (x1, x2, ..., xn) + Σ_{i=2}^{n} a_i V± (x_i, x2, ..., xn).
In this equation all terms at the right but the first are zero according
to Property 1.
Property 4. The oriented volume is a homogeneous function, i.e.
    V± (x1, ..., αxp, ..., xn) = α V± (x1, ..., xp, ..., xn)
for any p.
This property is a direct consequence of A.
Property 5. The equation V± (x1, x2, ..., xn) = 0 holds if and
only if the system of vectors x1, x2, ..., xn is linearly dependent.
Obviously it is only necessary to prove that V± (x1, x2, ..., xn) = 0
implies a linear dependence of the vectors x1, x2, ..., xn. Suppose the contrary. Let the oriented volume be zero for
some linearly independent system y1, y2, ..., yn. This system
is a basis of En and therefore for any vector z in En
    z = α1 y1 + α2 y2 + ... + αn yn.
Exercises
1. Prove that if A of (34.4) holds, then B is equivalent
to both Property 1 and Property 2.
2. Prove that whatever the real number α may be there is a system of vectors
such that its oriented volume equals α.
3. Suppose C of (34.4) is replaced by the condition of equality to any fixed
number on any fixed linearly independent system. How is an oriented volume
affected?
4. Were the presence of a scalar product in a vector space and the reality of
an oriented volume used in deriving the properties of an oriented volume? What
will change if we consider an oriented volume in a complex space?
38. Permutations
Consider a system x1, x2, ..., xn and a sys-
tem x_{i1}, x_{i2}, ..., x_{in} obtained from the first using several permu-
tations of vectors. Suppose these systems may be transformed into
each other by successive transformations of only pairs of elements.
Then the volumes of the systems will be the same and their oriented
volumes will be either the same or differ in sign depending on the
number of transpositions required.
Suppose the theorem has already been proved for any permutations
containing no more than n - 1 numbers. Consider permutations
of n numbers. Suppose we are to begin with a permutation i1, i2, ...
..., in. We shall arrange permutations according to the following
principle. We begin with permutations with i 1 in the first place.
According to the assumption all these permutations can be ordered
in accordance with the requirements of the theorem, since in fact
it is necessary to arrange in the required order all permutations
of n - 1 numbers.
In the last permutation obtained in this way we make one trans-
position, transferring to the first place the number i2. We then put
in order, as in the preceding case, all permutations with a given
number in the first place and so on. In this way it is possible to look
over all permutations of n numbers.
With such a system of arranging permutations of n numbers
adjacent permutations will have opposite parities. Considering
that n! is even for n ~ 2 we can conclude that in this case the number
of even permutations of n numbers equals that of odd permutations
and is n!/2.
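The parity of a permutation, and the count of even permutations, can be verified mechanically; the following sketch (not part of the original text) counts inversions, which is equivalent to counting transpositions modulo 2.

    from itertools import permutations

    def parity(p):
        # +1 for an even permutation, -1 for an odd one (by counting inversions)
        inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
        return 1 if inversions % 2 == 0 else -1

    n = 4
    even = sum(1 for p in permutations(range(1, n + 1)) if parity(p) == 1)
    print(even)   # 12, i.e. n!/2 of the 24 permutations of 4 numbers are even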
Exercises
Namely,
    V± (x1, x2, ..., xn) = V± (Σ_{j_1=1}^{n} a_{1j_1} z_{j_1}, Σ_{j_2=1}^{n} a_{2j_2} z_{j_2}, ..., Σ_{j_n=1}^{n} a_{nj_n} z_{j_n})
    = Σ_{j_1=1}^{n} a_{1j_1} V± (z_{j_1}, Σ_{j_2=1}^{n} a_{2j_2} z_{j_2}, ..., Σ_{j_n=1}^{n} a_{nj_n} z_{j_n})
    = Σ_{j_1=1}^{n} Σ_{j_2=1}^{n} a_{1j_1} a_{2j_2} V± (z_{j_1}, z_{j_2}, ..., Σ_{j_n=1}^{n} a_{nj_n} z_{j_n})
    = ... = Σ_{j_1=1}^{n} Σ_{j_2=1}^{n} ... Σ_{j_n=1}^{n} a_{1j_1} a_{2j_2} ... a_{nj_n} V± (z_{j_1}, z_{j_2}, ..., z_{j_n}).    (39.2)
In the last n-fold sum most of the terms are zero, since by Prop-
erty 1 the oriented volume of a system of vectors is zero if any two
vectors of the system coincide. Therefore only those of the systems
z_{j_1}, z_{j_2}, ..., z_{j_n} should be considered for which the set of indices
j_1, j_2, ..., j_n is a permutation of the n numbers 1, 2, ..., n. But
in this case
    V± (z_{j_1}, z_{j_2}, ..., z_{j_n}) = ±1
for some numbers α and β. Denote by a'_{pj} and a''_{pj} respectively the
coordinates of x'_p and x''_p in a basis z1, z2, ..., zn. Then it is obvious
that
    a_{pj} = α a'_{pj} + β a''_{pj}
for every j in the range from 1 to n. We further find
    Σ ± a_{1j_1} ... a_{pj_p} ... a_{nj_n}
    = Σ ± a_{1j_1} ... (α a'_{pj_p} + β a''_{pj_p}) ... a_{nj_n}
    = α Σ ± a_{1j_1} ... a'_{pj_p} ... a_{nj_n} + β Σ ± a_{1j_1} ... a''_{pj_p} ... a_{nj_n},
and thus A of (34.4) holds.
Exercises
1. Was the orthonormality of the system z1 , z2 , ••• , zn
actually used in deriving formula (39.3)?
2. What changes will occur in formula (39.3) if in Condition C of (34.4) the
oriented volume is not assumed to equal unity?
3. To what extent was Condition B of (34.4) actually used in deriving (39.3)?
4. Will the form of (39.3) be affected if we consider an oriented volume in
a complex space?
40. Determinants
Let vectors x1, x2, ..., xn of a Euclidean
space Rn be given by their coordinates
    x_i = (a_{i1}, a_{i2}, ..., a_{in})
in basis (21.7). Arrange the numbers a_{ij} as an array A:
    A = [ a_11  a_12  ...  a_1n ]
        [ a_21  a_22  ...  a_2n ]
        [ ..................... ]
        [ a_n1  a_n2  ...  a_nn ]
This array is called a square n × n matrix and the numbers a_{ij}
are matrix elements. If the matrix rows are numbered in succession
from top to bottom and the columns are numbered from left to right,
then the first index of an element stands for the number of the row
the element is in and the second index is the number of the column.
The elements a_11, a_22, ..., a_nn are said to form the principal diag-
onal of the matrix A.
Any n² numbers can be arranged as a square n × n matrix. If
the row elements of a matrix are assumed to be the coordinates of
a vector of Rn in basis (21. 7), then a 1-1 correspondence is established
between all square n X n matrices and ordered systems of n vectors
of the space Rn.
In Rn, as in any other space, there is an oriented volume. It will
be unique if we require that Condition C of (34.4) should hold on
the system of vectors (21.7). Taking into account the above 1-1 cor-
respondence we conclude that a well-defined function is generated
on the set of all square matrices. Considering (39.3) we arrive at the
following definition of that function.
An nth-order determinant corresponding to a matrix A is an algeb-
raic sum of n! terms which is made up as follows. The terms of the
determinant are all possible products of n matrix elements taken
an element from each row and each column. The term is taken with
a plus sign if the indices of the columns of its elements form an even
permutation, provided the elements are arranged in increasing
order of the row indices, and a minus sign otherwise.
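As an illustration (not part of the original text), the definition can be applied literally to a small matrix; itertools.permutations and math.prod are the only assumed library helpers.

    from itertools import permutations
    from math import prod

    def parity(p):
        # +1 for an even permutation of the column indices, -1 for an odd one
        inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
        return 1 if inv % 2 == 0 else -1

    def det_by_definition(a):
        # a is a square matrix given as a list of rows
        n = len(a)
        total = 0
        for p in permutations(range(n)):                 # column indices j1, ..., jn
            term = prod(a[i][p[i]] for i in range(n))    # one element from each row and column
            total += parity(p) * term                    # sign from the parity of the permutation
        return total

    print(det_by_definition([[1, 2], [3, 4]]))                       # -2
    print(det_by_definition([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))      # 25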
To designate a determinant we shall use the following symbol:
    det [ a_11  a_12  ...  a_1n ]
        [ a_21  a_22  ...  a_2n ]        (40.1)
        [ ..................... ]
        [ a_n1  a_n2  ...  a_nn ]
minant
    det [ a_11  a_21  ...  a_n1 ]
        [ a_12  a_22  ...  a_n2 ]
        [ ..................... ]
        [ a_1n  a_2n  ...  a_nn ]
or det A' is said to be obtained by transposing determinant (40.1).
As to transposition the determinant possesses the following impor-
tant property:
The determinant of any matrix remains unaffected when transposed.
Indeed, the determinant of a matrix A consists of terms of the
following form:
(40.2)
Note that in all cases the corresponding minors are defined by the
same set of elements. Moreover, in cases (2) to (4) they coincide and
in case (1) they only differ in sign, since each of them is obtained
from the other by interchanging two columns. For similar reasons,
the corresponding complementary minors differ in sign in case (2)
and coincide in the remaining cases. The algebraic adjuncts and
complementary minors coincide up to a sign which depends on the
parity of the sum of the indices of the rows and columns containing
the minor. They are the same in cases (1) and (2) and differ by unity
in cases (3) and (4).
Comparing now the corresponding terms of F (x1, x2, ..., xn)
and those of the function resulting from permuting x_i and x_{i+1}
we notice that they coincide up to a sign. Consequently, if two adjacent vectors are interchanged F (x1, x2, ..., xn) changes sign.
The theorem is often used where only one row or one column is
chosen. The determinant of a 1 X 1 matrix coincides with its only
element. Therefore the minor at the intersection of the ith row and
jth column is equal to the element a_{ij}. Denote by A_{ij} the algebraic
adjunct of a_{ij}. By Laplace's theorem, for every i
    a_{i1} A_{i1} + a_{i2} A_{i2} + ... + a_{in} A_{in} = d.    (40.5)
This formula is called the expansion of the determinant with respect
to the ith row. Similarly for every j
    a_{1j} A_{1j} + a_{2j} A_{2j} + ... + a_{nj} A_{nj} = d,    (40.6)
which gives the expansion of the determinant with respect to the jth
column.
We replace in (40.5) the elements of the ith row by a collection of
n arbitrary numbers b1 , b 2 , • • • , bn. The expression
    det [ a_11  a_12  ...  a_1n ]
        [ ..................... ]
        [ b_1   b_2   ...  b_n  ]        (40.7)
        [ ..................... ]
        [ a_n1  a_n2  ...  a_nn ]
obtained from the determinant d by replacing the ith row with the
row of the numbers b1 , b 2 , • • • , bn. We now take as these numbers
the elements of the kth row of d, with k =fo i. The corresponding de-
terminant (40. 7) is zero since it has two equal rows. Consequently,
    a_{k1} A_{i1} + a_{k2} A_{i2} + ... + a_{kn} A_{in} = 0,   k ≠ i.    (40.8)
Similarly
    a_{1k} A_{1j} + a_{2k} A_{2j} + ... + a_{nk} A_{nj} = 0,   k ≠ j.    (40.9)
So the sum of the products of all elements of any row (column) of
a determinant by the algebraic adjuncts of the corresponding ele-
ments of another row (column) of the same determinant is zero.
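A sketch (not from the book) of the expansion (40.5) with respect to the first row, together with a check of relation (40.8); the recursive helper is an assumed illustration, not the book's algorithm.

    import numpy as np

    def det_laplace(a):
        # determinant by expansion with respect to the first row (formula (40.5) with i = 1)
        n = a.shape[0]
        if n == 1:
            return a[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(a, 0, axis=0), j, axis=1)   # strike out row 1 and column j+1
            adjunct = (-1) ** j * det_laplace(minor)                # algebraic adjunct A_{1,j+1}
            total += a[0, j] * adjunct
        return total

    A = np.array([[2.0, 0.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 1.0, 4.0]])
    print(det_laplace(A), np.linalg.det(A))    # both 25.0

    # Check of (40.8): elements of row 2 times the adjuncts of row 1 sum to zero.
    adjuncts_row1 = [(-1) ** j * det_laplace(np.delete(np.delete(A, 0, axis=0), j, axis=1))
                     for j in range(3)]
    print(sum(A[1, j] * adjuncts_row1[j] for j in range(3)))   # 0.0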
In conclusion note that the entire theory of determinants can be
extended without change to the case of complex matrices. The only
thing lost is the visualization associated with the concept of volume.
Exercises
1. Write expressions for determinants of the second
and third orders in terms of matrix elements. Compare them with expression
(34.6).
2. Write Hadamard's inequality for the determinant of matrices A and A'.
3. A determinant of the nth order all elements of which are equal to unity
in absolute value equals n^{n/2}. Prove that its rows (columns) form an orthogonal basis.
4. Find the determinant whose elements satisfy the conditions a_{ij} = 0 for
i > j (i < j, i ≥ j, i ≤ j).
5. The elements of a determinant satisfy the conditions a_{ij} = 0 for i > k
and j ≤ k. Prove that the determinant is the product of the principal minor of
order k and its complementary minor.
6. Let the elements of a complex matrix A satisfy the conditions a_{ij} = ā_{ji}
for all i and j. Prove that the determinant of such a matrix is a real number.
(41.1)
where the rows represent the coordinates of the given vectors in the
chosen basis.
Such an array is called a rectangular matrix. As before the first
index of an element a_{ij} stands for the number of the matrix row
the element is in, and the second index is the number of the column.
If we want to stress what number of rows and columns a matrix A
has, we shall write A (m X n) or say that the matrix A is an m
Exercises
1. What is a matrix all of whose minors are zero?
2. For a square matrix, are its basis rows and basis columns equivalent
systems of vectors?
3. Do the elementary transformations of its rows and columns, discussed in
Section 15, affect the rank of a matrix?
4. Prove the inequality
    0 ≤ G (x1, x2, ..., xn) ≤ ∏_{i=1}^{n} (x_i, x_i).
and carry out the indicated procedure for every i ≠ k. All the ele-
ments of the pth column of the new matrix, except the leading one,
will then be zero. Expanding the new determinant with respect to
the pth column we reduce calculating the determinant of the nth
order to calculating a single determinant of order (n - 1). We pro-
ceed in a similar way with this determinant and so on.
This algorithm is called the Gauss method. To calculate the deter-
minant of the nth order by this method would require carrying out a
total of 2n³/3 arithmetical operations. Now the determinant of the
hundredth order could be computed, on a computer performing
10⁶ arithmetical operations per second, in less than a second.
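A minimal sketch of the Gauss method for a determinant (an illustration, not the book's text); the partial pivoting step is an added safeguard for approximate arithmetic.

    import numpy as np

    def det_gauss(a):
        # reduce to triangular form and multiply the pivots
        a = np.array(a, dtype=float)
        n = a.shape[0]
        det, sign = 1.0, 1.0
        for k in range(n):
            p = k + np.argmax(np.abs(a[k:, k]))      # partial pivoting
            if a[p, k] == 0.0:
                return 0.0                           # a zero column below the diagonal: det = 0
            if p != k:
                a[[k, p]] = a[[p, k]]                # a row interchange changes the sign
                sign = -sign
            det *= a[k, k]
            a[k + 1:, k:] -= np.outer(a[k + 1:, k] / a[k, k], a[k, k:])
        return sign * det

    A = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]]
    print(det_gauss(A), np.linalg.det(np.array(A)))   # both 25.0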
In conclusion note that with arithmetical operations approxi-
mately performed and information approximately given, the results
of computing determinants should be regarded with some caution. If
conclusions about linear dependence or independence of a system
of vectors are made only on the strength of the determinant being
zero or nonzero, then in the presence of instability pointed out in
Section 22 they may turn out to be false. This should be borne in
mind whenever a determinant is used.
Exercises
1. What accounts for a faster rate of calculation of a deter-
minant by the Gauss method as compared with straightforward calculation?
2. Let all elements of a determinant equal at most unity in absolute value
and suppose that in calculating each element we make an error of an order of e.
For what n does a straightforward calculation of the determinant make sense in
terms of accuracy?
3. Construct an algorithm for calculating the rank of a rectangular matrix,
using the Gauss method. What is the result of applying this algorithm to ap-
proximate calculations?
CHAPTER 5
The Straight Line and the Plane
to L will be collinear with n.
Take a point M0 (x0, y0) on L. All points M (x, y) of L and only
those points possess the property that the vectors M0M and n are
perpendicular, i.e.
    (M0M, n) = 0.    (43.2)
Since
    M0M = (x − x0, y − y0),
from (43.1) and (43.2) it follows that
    A (x − x0) + B (y − y0) = 0.
Letting
    −Ax0 − By0 = C,
we conclude that in the given x, y system the coordinates of the
points of L and only those coordinates satisfy
    Ax + By + C = 0.    (43.3)
Among the numbers A and B there is a nonzero one. Therefore
equation (43.3) will be called a first-degree equation in variables x
and y.
We now prove that any first-degree equation (43.3) defines relative
to a fixed x, y coordinate system some straight line. Since (43.3) is
a first-degree equation, of the constants A and B at least one is not
zero. Hence (43.3) has at least one solution x0, y0, for example,
    x0 = −AC/(A² + B²),   y0 = −BC/(A² + B²),
with
    Ax0 + By0 + C = 0.
Subtracting from (43.3) this identity yields
    A (x − x0) + B (y − y0) = 0
equivalent to (43.3). But it means that any point M (x, y) whose
coordinates satisfy the given equation (43.3) is on the straight line
passing through M0 (x0, y0) and perpendicular to vector (43.1).
So, given a fixed coordinate system in the plane, any first-degree
equation defines a straight line and the coordinates of the points of
any straight line satisfy a first-degree equation. Equation (43.3) is
called the general equation of a straight line in the plane and the
vector n in (43.1) is the normal vector to that straight line.
Without any fundamental changes we can carry out a study of the
plane in space. Fix a Cartesian x, y, z system and consider a plane π.
Again take a nonzero vector
    n = (A, B, C)    (43.4)
perpendicular to π. Repeating the above reasoning we conclude that
all points M (x, y, z) of π and only those points satisfy the equation
    Ax + By + Cz + D = 0    (43.5)
which is also called a first-degree equation in variables x, y and z.
If we again consider an arbitrary first-degree equation (43.5),
then we shall see that it also has at least one solution x0, y0, z0,
for example,
    x0 = −AD/(A² + B² + C²),   y0 = −BD/(A² + B² + C²),   z0 = −CD/(A² + B² + C²).
We further establish that any point M (x, y, z) whose coordinates
satisfy the given equation (43.5) is in the plane through the point
M0 (x0, y0, z0) perpendicular to vector (43.4).
Thus, given a fixed coordinate system in space, any first-degree
equation defines a plane and the coordinates of the points of any
plane satisfy a first-degree equation. Equation (43.5) is called the
general equation of a plane in space and the vector n in (43.4) is the
normal vector to that plane.
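As an illustration (not part of the original text), the general equation of a plane is easily produced from a point and a normal vector, or from three points; the numbers below are arbitrary examples.

    import numpy as np

    n = np.array([1.0, -2.0, 2.0])          # normal vector (A, B, C)
    M0 = np.array([3.0, 0.0, 1.0])          # a point of the plane

    A, B, C = n
    D = -float(n @ M0)                       # from A*x0 + B*y0 + C*z0 + D = 0
    print(A, B, C, D)                        # coefficients of Ax + By + Cz + D = 0

    # The normal to the plane through three given points can be taken as the cross
    # product of two vectors lying in the plane (this is the subject of one of the
    # exercises below).
    P1, P2, P3 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0])
    normal = np.cross(P2 - P1, P3 - P1)
    print(normal, -float(normal @ P1))       # (A, B, C) and D for that plane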
We shall now show how two general equations defining the same
straight line or plane are related. Suppose for definiteness that we
are given two equations of a plane π
    A1x + B1y + C1z + D1 = 0,    (43.6)
    A2x + B2y + C2z + D2 = 0.
The vectors
    n1 = (A1, B1, C1),   n2 = (A2, B2, C2)
are perpendicular to the same plane π and therefore they are collinear. Since furthermore they are nonzero, there is a number t such
that, for example, n1 = tn2, or
    A1 = tA2,   B1 = tB2,   C1 = tC2.    (43.7)
Multiplying the second of the equations (43.6) by t and subtracting from it the first we get by virtue of (43.7)
    D1 = tD2.
Consequently, the coefficients of the general
equations defining the same straight line or
plane are proportional.
A general equation is said to be complete
if all of its coefficients are nonzero. An
equation which is not complete is called
incomplete. Consider the complete equation
of a straight line (43.3). Since all the coef-
ficients are nonzero, that equation can be written as
    x/(−C/A) + y/(−C/B) = 1.
If we let
    a = −C/A,   b = −C/B,
then we obtain a new equation of a straight line
    x/a + y/b = 1.
This is the intercept form of the equation of a straight line. The num-
bers a and b have a simple geometrical meaning. They are equal to
the magnitudes of the intercepts of the straight line on the coordi-
nate semiaxes (Fig. 43.1). Of course the complete equation of a plane
can be reduced to a similar form
    x/a + y/b + z/c = 1.
Different incomplete equations define special cases of location of
a straight line and a plane. It is useful to remember them since they
occur fairly often. For example, if C = 0, equation (43.3) defines a
and similar equations for the case of a straight line in the plane
    x = x0 + lt,    (43.11)
    y = y0 + mt.
are not collinear. Therefore M (x, y, z) is in the same plane with
M1, M2 and M3 if and only if the vectors M1M2 and M1M3 and
Exercises
1. Write the equation of a straight line in the plane
through two given points using a second-order determinant. Compare it with
(43.12).
2. Is it correct to say that (43.12) is always the equation of a plane?
3. By analogy with equations (43.10) write the parametric equations of
a plane in space. How many parameters must they contain?
4. Find the coordinates of the normal vector to the plane through three given
points not on the same straight line.
5. What is the locus of points in space whose coordinates are solutions of
a system of two linear algebraic equations in three unknowns?
Quite similarly derived is the formula for the angle between two
straight lines in the plane given by their general equations
    A1x + B1y + C1 = 0,
    A2x + B2y + C2 = 0.
One of the angles <p made by these straight lines is calculated from
the formula
    cos φ = (A1A2 + B1B2) / ((A1² + B1²)^{1/2} (A2² + B2²)^{1/2}).
plane and denote by M1 (x1, y1, z1) its foot. It is clear that the distance ρ (M0, π) from M0 to the plane is equal to the length of M0M1.
The vectors n = (A, B, C) and M0M1 = (x1 − x0, y1 − y0, z1 − z0) are perpendicular to the same plane and hence collinear.
Therefore there is a number t such that M0M1 = tn, i.e.
    x1 − x0 = tA,
    y1 − y0 = tB,
    z1 − z0 = tC.
The point M1 (x1, y1, z1) is in π. Expressing its coordinates in terms
of the relations obtained and substituting them in the equation of a
plane we find
    ρ (M0, π) = |Ax0 + By0 + Cz0 + D| / (A² + B² + C²)^{1/2}.
In particular
    ρ (0, π) = |D| / (A² + B² + C²)^{1/2}.
Along with the general equation (43.5) of a plane we consider the
following of its equations:
    ±(A² + B² + C²)^{−1/2} (Ax + By + Cz + D) = 0.
Of the two possible signs at the left we choose the one opposite to
the sign of D. If D = 0, then we choose any sign. Then the free
term of that equation is a nonpositive number −p, and the coefficients of x, y and z are the cosines of the angles between the normal
vector and the coordinate axes. The equation
    x cos α + y cos β + z cos γ − p = 0    (44.1)
is called the normed equation of a plane. It is obvious that
    ρ (M0, π) = |x0 cos α + y0 cos β + z0 cos γ − p|,
    ρ (0, π) = p.
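A small numerical check (not from the book) of these distance formulas; the plane and the point are arbitrary examples.

    import numpy as np

    A, B, C, D = 1.0, -2.0, 2.0, -5.0        # general equation Ax + By + Cz + D = 0
    M0 = np.array([3.0, 1.0, 4.0])

    norm = np.sqrt(A * A + B * B + C * C)
    sign = -np.sign(D) if D != 0 else 1.0    # choose the sign opposite to that of D
    cos_a, cos_b, cos_g = sign * A / norm, sign * B / norm, sign * C / norm
    p = -sign * D / norm                     # the free term becomes the nonpositive number -p

    distance = abs(M0[0] * cos_a + M0[1] * cos_b + M0[2] * cos_g - p)
    print(distance)                          # 4/3, from the normed equation (44.1)
    print(abs(A * M0[0] + B * M0[1] + C * M0[2] + D) / norm)   # the same value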
Exercises
1. Under what condition on the coordinates of normal
vectors do two straight lines in the plane (three planes in space) intersect in a
single point?
2. Under what condition is the straight line (43.8) in plane (43.5)?
3. Under what condition are two straight lines in space, given by their
canonical equations, in the same plane?
4. Calculate the angles between a diagonal of a cube and its faces.
5. Derive the formula for the distance between a point and a straight line in
space, given by its canonical equations.
f ∈ L → f ∈ L1, f ∈ L2 → z0 + f ∈ H1, z0 + f ∈ H2 → z0 + f ∈ H,
i.e. any vector of L translated by a vector z0 is in H. Thus the theo-
rem is proved.
A plane is not necessarily a subspace. Nevertheless it can be as-
signed dimension equal to that of the direction subspace. A plane of
zero dimension contains only one vector, the translation vector.
In determining the dimension of an intersection of planes Theo-
rem 19.1 is useful. From Theorems 19.1 and 45.1 it follows that the
dimension of an intersection H does not exceed the minimum of the
dimensions of H1 and H2.
If in spaces of directed line segments two (three) vectors are given,
then with some additional conditions it is possible to construct only
one plane of dimension 1 (2) containing the given vectors. Those
additional conditions can be formulated as follows. If two vectors
are given, then they must not coincide, i.e. they must not be in the
same plane of zero dimension. If three vectors are given, then they
must not be in the same plane of dimension one.
Exercises
1. Let H1 and H2 be any two planes. Define the sum
H1 + H2 to be the set of all vectors of the form z1 + z2, where z1 ∈ H1 and
z2 ∈ H2. Prove that the sum of the planes is a plane.
2. Let H be a plane and let λ be a number. Define the product λH of H by λ
to be the set of all vectors of the form λz, where z ∈ H. Prove that λH is a plane.
3. Will the set of all planes of the same space, with the operations introduced above on them, be a vector space?
4. Prove that vectors z0, z1, ..., zk are in a general position if and only if
the vectors z1 − z0, z2 − z0, ..., zk − z0 are linearly independent.
5. Prove that a plane of dimension k containing vectors of a general position z0, z1, ..., zk is a subspace if and only if those vectors are linearly dependent.
    (n_k, z) − b_k ≥ 0,
is a convex set. Such systems of inequalities are the basic element
in the description of many production planning problems, manage-
ment problems and the like.
Exercises
1. Prove that a set of vectors s satisfying (s, s) ≤ a
is convex.
2. Prove that if a vector s is in hyperplane (46.8), then the vector s + n is
in the positive half-space.
3. Prove that if hyperplanes are given by the normed equations (44.1)
and (44.2), then the origin is always in the nonpositive half-space.
    x_j = (1/d) Σ_{i=1}^{n} b_i A_{ij},   j = 1, 2, ..., n.    (48.3)
Indeed, substituting these values into the kth equation of the system we find
    Σ_{j=1}^{n} a_{kj} x_j = (1/d) Σ_{j=1}^{n} a_{kj} Σ_{i=1}^{n} b_i A_{ij} = (1/d) Σ_{i=1}^{n} b_i Σ_{j=1}^{n} a_{kj} A_{ij} = b_k.
The inner sum in the last equation is by (40.5) and (40.8) equal to
either d or zero according as i = k or i =fo k.
Thus formulas (48.3) provide an explicit expression of the solu-
tion of a system in terms of its elements. By virtue of uniqueness
there is no other solution. These formulas are known as Cramer's
rule.
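A sketch of Cramer's rule (an illustration, not the book's text); it uses the equivalent column-replacement form, in which the numerator of each unknown is the determinant obtained by replacing the corresponding column by the right-hand side.

    import numpy as np

    def cramer(A, b):
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        d = np.linalg.det(A)                       # determinant of the matrix of the system
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b                           # replace the ith column by the right-hand side
            x[i] = np.linalg.det(Ai) / d
        return x

    A = [[2.0, 1.0], [1.0, 3.0]]
    b = [3.0, 5.0]
    print(cramer(A, b), np.linalg.solve(A, b))     # both give [0.8, 1.4]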
Formally, by calculating determinants it is possible to solve any
systems of equations of the form (48.2). First, calculating the vari-
ous minors of the matrix of the system and those of the augmented
matrix we check the system for compatibility. Let it be compatible
and let the rank of both matrices be r. We may assume without loss
of generality that the principal minor of order r of the matrix of the
system is nonzero. By Theorem 41.1 the last k - r rows of the
augmented matrix are linear combinations of its first r rows. Hence
the system
    a_{11} x_1 + ... + a_{1r} x_r + a_{1,r+1} x_{r+1} + ... + a_{1m} x_m = b_1,
    a_{21} x_1 + ... + a_{2r} x_r + a_{2,r+1} x_{r+1} + ... + a_{2m} x_m = b_2,
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    (48.4)
or
    lim x_n = x_0
for every n > N. But from this, using the triangle axiom, we find
    p (x0, y0) ≤ p (x0, xn) + p (xn, y0) < ε.
By virtue of the arbitrariness of ε this means that p (x0, y0) = 0,
i.e. x0 = y0.
A sphere S (a, r) in a metric space X is the set of all elements
x E X satisfying the condition
p (a, x) < r. (49.2)
An element a is called the centre of the sphere and r is its radius.
Any sphere with centre in a is a neighbourhood of a. A set of elements
is said to be bounded if it is entirely in some sphere.
It is easy to see that x 0 is the limit of {xn} if and only if any neigh-
bourhood of x 0 contains all the elements of {x,.} beginning with
some index.
In a metric space it is possible to introduce many of the other ma-
jor concepts dealt with in number sets. Thus if a set M c X is
given, then an element x E X is said to be a cluster point (or limit point
Exercises
1. Prove that if x_n → x, then p (x_n, z) → p (x, z)
for any element z.
2. Will the set of all real numbers be a metric space if for numbers z and y
we set
p (z, y) = arctan I z - y I?
3. Can a set consisting of a finite number of elements have cluster points?
For the set of all reals the converse is also true. That is, any fun-
damental sequence is convergent. In general, however, this is not
true, which is exemplified by a metric space with at least one cluster
point eliminated.
A metric space is said to be complete if any fundamental sequence
in it is convergent.
In complete metric spaces a theorem holds similar to the theorem
on embedded line segments for real numbers. Given some sequence
of spheres, we shall say that they are embedded into one another if
every subsequent sphere is contained inside the preceding one.
Theorem 50.1. In a complete metric space X let {S (a_n, ε_n)} be a
sequence of closed spheres embedded into one another. If the sequence
of their radii tends to zero, then there is a unique element in X which
is in all those spheres.
Proof. Consider the sequence {a_n}. Since S (a_{n+p}, ε_{n+p}) ⊂ S (a_n, ε_n)
for any p ≥ 0, we have a_{n+p} ∈ S (a_n, ε_n). Consequently,
    p (a_{n+p}, a_n) ≤ ε_n,
which implies that {a_n} is fundamental.
The space X is complete and therefore {a_n} tends to some limit a
in X. Take any sphere S (a_k, ε_k). This sphere contains all terms
of {a_n} beginning with a_k. By virtue of the closure of the spheres
the limit of {a_n} is also in S (a_k, ε_k). Thus a is in all the spheres.
Assume further that there is another element, b, that is also in
all the spheres. By the triangle axiom
    p (a, b) ≤ p (a, a_n) + p (a_n, b) ≤ 2ε_n.
Since ε_n may be taken as small as we like, this means that p (a, b)
= 0, i.e. that a = b.
The most important examples of complete spaces are the sets of real
and complex numbers. We assume that the distance between numbers
coincides with the absolute value of their difference. The complete-
ness of the set of reals is proved in the course of mathematical analy-
sis. We show the completeness of the set of complex numbers.
Assume that complex numbers are given in algebraic form. The
distance between numbers
z =a+ ib, v= c + id
is introduced by the rule
    p (z, v) = |z − v|,    (50.1)
where
    |z − v|² = (a − c)² + (b − d)².    (50.2)
It is obvious that the axioms of a metric hold.
    ab ≤ a^p/p + b^q/q.    (51.3)
Notice that if at least one of the vectors x and y is zero, then Hölder's inequality is obviously true. It may therefore be assumed that
x ≠ 0 and y ≠ 0. Let the inequality hold for some nonzero vec-
tors x and y. Then it does for the vectors λx and μy, with any λ and μ.
Therefore it suffices to prove it for the case where
    Σ_{k=1}^{n} |x_k|^p = Σ_{k=1}^{n} |y_k|^q = 1.    (51.5)
    (Σ_{k=1}^{n} |x_k + y_k|^p)^{1/p} ≤ (Σ_{k=1}^{n} |x_k|^p)^{1/p} + (Σ_{k=1}^{n} |y_k|^p)^{1/p}    (51.6)
for every p ≥ 1.
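Before turning to the proof, here is a quick numerical spot check (not part of the original text) of (51.6) and of Hölder's inequality, the latter taken in the standard coordinate form with conjugate exponents, which is an assumption about the numbering used above.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    p = 3.0
    q = p / (p - 1.0)                                     # the conjugate exponent

    lhs_m = np.sum(np.abs(x + y) ** p) ** (1 / p)
    rhs_m = np.sum(np.abs(x) ** p) ** (1 / p) + np.sum(np.abs(y) ** p) ** (1 / p)
    print(lhs_m <= rhs_m + 1e-12)                         # Minkowski (51.6): True

    lhs_h = abs(np.sum(x * y))
    rhs_h = np.sum(np.abs(x) ** p) ** (1 / p) * np.sum(np.abs(y) ** q) ** (1 / q)
    print(lhs_h <= rhs_h + 1e-12)                         # Hölder: True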
Minkowski's inequality is obvious for p = 1, since the absolute
value of the sum of two numbers is at most the sum of their absolute
values. Moreover, it clearly holds if at least one of the vectors x
and y is zero. Therefore we may restrict ourselves to the case p > 1
and x =I= 0. We write the identity
    (|a| + |b|)^p = (|a| + |b|)^{p−1} |a| + (|a| + |b|)^{p−1} |b|.
Setting a = x_k and b = y_k and summing over k from 1 to n we get
    Σ_{k=1}^{n} (|x_k| + |y_k|)^p ≤ (Σ_{k=1}^{n} (|x_k| + |y_k|)^p)^{1/q} ((Σ_{k=1}^{n} |x_k|^p)^{1/p} + (Σ_{k=1}^{n} |y_k|^p)^{1/p}),
and dividing by the first factor on the right we find that
Exercises
1. Derive the Cauchy-Buniakowski-Schwarz inequal-
ity from Hölder's.
2. Study Hölder's inequality for p → ∞.
3. Study Minkowski's inequality for p → ∞.
(52.4)
where p ≥ 1. That the first two axioms hold for a norm is obvious
and the third axiom is seen to hold from Minkowski's inequality.
The second of the norms is often called a Euclidean norm and designated ||x||_E.
In what follows we shall consider only normed spaces with metric
(52.2). The convergence of a sequence of vectors in such a metric is
called the convergence in the norm, the boundedness of a set of vectors
is cal led the boundedness in the norm, etc.
Exercises
1. Prove that there is a sequence of vectors whose
norms form an infinitely large sequence.
2. Prove that for any numbers λ_i and vectors e_i
    ||Σ_{i=1}^{n} λ_i e_i|| ≤ Σ_{i=1}^{n} |λ_i| ||e_i||.
3. Prove that if x_n → x, y_n → y and λ_n → λ, then
    ||x_n|| → ||x||,   x_n + y_n → x + y,   λ_n x_n → λx.
(53.1)
nates in the expansion with respect to the same basis. Finite dimen-
sional normed spaces are noteworthy for the fact that the not ions of
convergence in the norm and of coordinate convergence are equiva-
lent in them.
It is easy to show that coordinate convergence implies conver-
gence in the norm. Indeed, let (53.3) hold. Using the axioms of the
absolute homogeneity of a norm and of the triangle inequality we
conclude from (53.1) and (53.2) that
    ||x_m − x_0|| = ||Σ_{k=1}^{n} (ξ_k^{(m)} − ξ_k^{(0)}) e_k|| ≤ Σ_{k=1}^{n} |ξ_k^{(m)} − ξ_k^{(0)}| ||e_k|| → 0.
If
    y_p = Σ_{k=1}^{n} η_k^{(p)} e_k,
then of course
    η_k^{(p)} = ξ_k^{(m_p)} / δ_{m_p}.    (53.4)
Exercises
1. Is the requirement that a space should be finite
dimensional essential in proving the equivalence of the two kinds of con-
vergence?
2. Prove that if some set of vectors of a finite dimensional space is bounded
in one norm, then it will be bounded in any other norm.
3. Prove that if in a finite dimensional space Zn-+ z in one norm, then
Zn-+ x in any other norm.
tion suppose that all coefficients are real. Each of the equations
    a_{11} x + a_{12} y = f_1,
    a_{21} x + a_{22} y = f_2
of the system defines in the plane a straight line. The point M of the
intersection of the straight lines gives a solution of the system
(Fig. 55.1).
Take a point M0 lying on none of the straight lines in the plane.
Drop from it a perpendicular to either straight line. The foot M 1 of
the perpendicular is closer to M
than M 0 is, since a projection is
always smaller than the inclined
line. Drop then a perpendicular
from M 1 to the other straight line.
The foot M 2 of that perpendicu-
lar is still closer to the solution.
Successively projecting now onto
one straight line now onto the other,
we obtain a sequence {M_k} of points of the plane converging to M.
It is important to note that the sequence constructed converges for
any choice of the initial point M0.
This example suggests how to construct a computational process.
to solve a system of linear algebraic equations of the general form
(48.2). We replace the problem by an equivalent problem of finding
vectors of the intersection of a system of hyperplanes (46.9). Sup-
pose the hyperplanes contain at least one common vector and assume
for simplicity that the vector space is real.
Choose a vector v0 and project it onto the first hyperplane. Project
the resulting vector v1 onto the second hyperplane and so on. This
process determines some sequence {vp}· Let us investigate that
sequence.
Basic to the computational process is projecting some vector v_p
onto the hyperplane given by equation (46.8). It is clear that v_{p+1}
satisfies that equation and is related to v_p by the equation
    v_{p+1} = v_p + tn
for some number t. Substituting v_{p+1} in (46.8) determines t. From
this we get
    v_{p+1} = v_p + ((b − (n, v_p)) / (n, n)) n.
This formula says that all vectors of the sequence {v_p} are in the
plane obtained by translating the span L (n_1, n_2, ..., n_k) by the
vector v_0. But all vectors which are in the
Consequently
    Σ_{r=0}^{p−1} p² (v_r, v_{r+1}) ≤ p² (z_0, v_0),
from which we conclude that
    p (v_p, v_{p+1}) → 0.    (55.1)
Denote by H_r the hyperplane in the rth row of (46.9). It is clear
that the distance from v_p to H_r is not greater than the distance between v_p and any vector of H_r. By the construction of {v_p}, among
any k consecutive vectors of {v_p} there is necessarily a vector that is
in any of the hyperplanes. Using the triangle inequality and the
limiting relation (55.1) we get
    p (v_p, H_r) ≤ p (v_p, v_{p+1}) + p (v_{p+1}, v_{p+2}) + ... + p (v_{p+k−1}, v_{p+k}) → 0    (55.2)
for every r = 1, 2, ..., k.
The sequence {v_p} is obviously bounded. Choose some convergent
subsequence of it. Let that subsequence converge to a vector z'_0.
Proceeding in (55.2) to the limit we find that
    p (z'_0, H_r) = 0
for every r = 1, 2, ..., k. But, as already noted earlier, the vector
z'_0 must coincide with z_0. Consequently, {v_p} converges to z_0.
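The whole process is short to state in code. The sketch below (an illustration, not the book's text) cycles through the rows of a small compatible system, treating each row as a hyperplane (n, z) = b.

    import numpy as np

    # Successive projections onto the hyperplanes (rows) of a compatible system A z = b.
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([3.0, 5.0])

    v = np.zeros(2)                                   # any initial vector will do
    for sweep in range(50):
        for n_row, b_row in zip(A, b):
            v = v + (b_row - n_row @ v) / (n_row @ n_row) * n_row   # project onto the hyperplane
    print(v)                     # approaches the solution [0.8, 1.4]
    print(np.linalg.solve(A, b))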
Exercises
1. Were the concepts of completeness and closure actu-
ally used in the above investigation?
2. How are other solutions of the system to be found if there are any?
3. How will the process behave if the system is incompatible?
PART II
Linear Operators
CHAPTER 7
Matrices and Linear Operators
56. Operators
A major point in creating the foundations of
mathematical analysis is the introduction of the concept of func-
tion. By definition, to specify a function it is necessary to indicate
two sets, X and Y, of real numbers and to formulate a rule assign-
ing to each number z EX a unique number y E Y. That rule is
a single-valued function of a real variable z given on the set X.
In realizing the general idea of functional dependence it is not at
all necessary to require that X and Y should be sets of real numbers.
Understanding by X and Y various sets of elements we arrive at the
following definition generalizing the concept of function.
A rule assigning to each element z of some nonempty set X a unique
element y of a nonempty set Y is called an operator. A result y of
applying an operator A to an element z is designated
y = A (z), y = Az (56.1)
and A is said to be an operator from X to Y or to map X into Y.
The set X is said to be the domain of A. An element y of (56.1) is
the image of an element z and z is the inverse image of y. The collec-
tion T A of all images is the range (or image) of A. If each element
y E Y has only one inverse image, then operator (56.1) is said to
be 1-1. An operator is also called a mapping, transformation or
operation.
In what follows we shall mainly consider the so-called linear opera-
tors. Their distinctive features are as follows. First, the domain of
a linear operator is always some vector space or linear subspace.
Second, the properties of a linear operator are closely related to
operations on vectors of a vector space. As a rule in our study of
linear operators we shall assume that spaces are given over a field
of real or complex numbers. Unless otherwise stated, operator will
mean linear operator. In the general theory of operators linear opera-
tors play as important a part as the straight line and the plane do
in mathematical analysis. That is why they require a detailed study.
Let X and Y be vector spaces over the same field P. Consider an
operator A whose domain is X and whose range is some set
Exercises
Prove that the following operators are linear.
1. A basis is given in a vector space X. An operator A assigns to each vector
z E X its coordinate with a fixed index.
2. A vector z 0 is fixed in a space X with a scalar product. An operator A
assigns to each vector z E X a scalar product (z, z0).
3. A vector z0 is fixed in a space V3. An operator A assigns to each vector
z E V3 a vector product [z, z0].
4. A space X is formed by polynomials with real coefficients. An operator A
assigns to each polynomial its kth derivative. It is called an operator of k-fold
differentiation.
5. In a space of polynomials dependent on a variable t, an operator A assigns
to each polynomial P (t) a polynomial tP (t).
6. A space X is decomposed as a direct sum of subspaces S and T. Represent
each vector z E X as a sum z = u + v, where u E S and v E T. An operator A
assigns to a vector z a vector u. It is called an operator of projection onto S paral-
lel to T.
Exercises
1. Prove that multiplying an operator by a nonzero
number leaves its rank and nullity unaffected.
2. Prove that the rank of a sum of operators is at most a sum of the ranks
of the summands.
3. Prove that a set of linear operators of W_XY whose ranges are in the same
subspace is a linear subspace.
4. Prove that a system of two nonzero operators of W_XY whose ranges are
distinct is linearly independent.
5. Prove that a space of linear operators in V1 is one-dimensional.
Exercises
1. In a space of polynomials in t, denote by D an oper-
ator of differentiation and by T an operator of multiplication by t. Prove that
DT ::p TD. Find the operator DT- TD.
2. Fix some operator B of the space W_XX. Prove that the set of operators A
such that BA = 0 is a subspace in W_XX.
3. Prove that the rank of a product of operators is not greater than the rank
of each of the factors.
4. Prove that the nullity of a product of operators is not less than the nullity
of each of the factors.
5. Prove that in the ring W_XX of linear operators there are zero divisors.
i.e. A -I is linear.
It is easy to show that A - 1 is nonsingular. For any vector y from
the kernel of A -I we have
A - 1y = 0.
Apply A to both sides of this equation. Since A is a linear operator,
AO = 0. Considering (59.4) we conclude that y = 0. So the kernel
of A -I consists only of a zero vector, i.e. A - 1 is a nonsingular operator.
Thus the set of nonsingular operators is a group relative to multi-
plication. It will be shown somewhat later that this group is non-
commutative.
Using nonsingular operators it is possible to construct commuta-
tive groups too. Let A be an arbitrary operator in a space X. For any
positive integer p we define the pth power of A by the equation
    A^p = A A ... A,    (59.5)
where the right-hand side contains p multipliers. By virtue of the
associativity of the operation of multiplication the operator AP is
uniquely defined. Of course it is linear.
For any positive integers p and r it follows from (59.5) that
APA'"=AP+r. (59.6)
If it is assumed by definition that
A0 = E
for any operator A, then formula (59.6) will hold for any nonnegative
integers p and r.
Suppose A is a nonsingular operator. Then so is the operator A'
for any nonnegative r. Hence there is an inverse operator for A r.
By (7.2) and (59.5) we have
    (A^r)^{−1} = (A^{−1})^r = A^{−1} A^{−1} ... A^{−1}.    (59.7)
Also it is assumed by definition that
    A^{−r} = (A^r)^{−1}.
Taking into account formulas (59.5) and (59.7) and the fact that
AA -I = A-lA, it is not hard to prove the relation
    A^p A^{−r} = A^{−r} A^p
for any nonnegative integers p and r. This means that formula (59.6)
holds for any integers p and r.
Now take a nonsingular operator A and make up a set W_A of
operators of the form A^p for all integers p. On that set the multiplication of operators is an algebraic and, as (59.6) implies, commutative
operation. Every operator A^p has an inverse equal to A^{−p}. Contained
in W_A is also an identity operator E. Hence W_A is a commutative
group relative to multiplication. It is called a cyclic group generated
by the operator A.
Exercises
1. Prove that if for two linear operators A and B
of W_XX we have AB = E, then both operators are nonsingular.
2. Prove that for operators A and B of W_XX to be nonsingular it is necessary
and sufficient that so should be the operators AB and BA.
3. Prove that if an operator A is nonsingular and a number α ≠ 0, then the
operator αA is also nonsingular and (αA)^{−1} = (1/α)A^{−1}.
4. Prove that T_A ⊆ N_A if and only if A² = 0.
5. Prove that for any operator A
    N_A ⊆ N_{A²} ⊆ N_{A³} ⊆ ...,   T_A ⊇ T_{A²} ⊇ T_{A³} ⊇ ....
of X.
Fix a basis e1 , e 2 , • • • , em in X and a basis q1 , q2 , • • • , qn in Y.
A vector e1 is sent by A to some vector Ae1 of Y which, as any vector
of that space, can be expanded with respect to the basis vectors
    Ae_1 = a_{11} q_1 + a_{21} q_2 + ... + a_{n1} q_n.
Similarly
(60.1)
Obviously
    Ax = A (Σ_{j=1}^{m} ξ_j e_j) = Σ_{j=1}^{m} ξ_j Ae_j = Σ_{j=1}^{m} ξ_j Σ_{i=1}^{n} a_{ij} q_i = Σ_{i=1}^{n} (Σ_{j=1}^{m} a_{ij} ξ_j) q_i.
Comparing the right-hand side of these equations with expansion (60.1) for y we conclude that the equations
    Σ_{j=1}^{m} a_{ij} ξ_j = η_i
must hold for i = 1, 2, ..., n, i.e. that
    a_{11} ξ_1 + a_{12} ξ_2 + ... + a_{1m} ξ_m = η_1,
    a_{21} ξ_1 + a_{22} ξ_2 + ... + a_{2m} ξ_m = η_2,
    . . . . . . . . . . . . . . . . . . . . . .    (60.2)
    a_{n1} ξ_1 + a_{n2} ξ_2 + ... + a_{nm} ξ_m = η_n.
Hence all elements of the matrix of a zero operator are zero. Such
a matrix is called a zero matrix and designated 0.
Now take an identity operator E. For this operator we find
Matrices of this form are called diagonal. If all diagonal elements are
equal, then the matrix is said to be a scalar matrix. In particular,
an identity matrix is a scalar matrix. Rectangular matrices construct-
ed in a similar way will also be called diagonal. If we consider
relations (60.2), we shall easily see what the action of a linear opera-
tor with a matrix A is. This operator "stretches" the ith coordinate
of any vector by a factor equal to the ith diagonal element, for every i.
Exercises
1. In a space of polynomials of degree not higher than n
a basis 1, t, t², ..., t^n is fixed. Of what form is the matrix of the operator
of differentiation in that basis?
2. In a space X an operator P of projection onto a subspace S parallel to
a subspace T is given. Fix in X any basis made up as a union of the bases of S
and T. Of what form are the matrices of P and E − P in that basis?
3. Let A be a linear operator from X to Y. Denote by M_A a subspace in X
complementary to the kernel N_A and by R_A a subspace in Y complementary
to T_A. How will the matrix of A change if in choosing bases in X and Y we use
bases of some or all of the indicated subspaces?
for i = 1, 2, ..., n and j = 1, 2, ..., m. A sum of matrices is
designated
    C = A + B.
A difference of two matrices A and B of the same size n X m
with elements a_{ij} and b_{ij} is a matrix C of the same size with elements c_{ij} if
    c_{ij} = a_{ij} − b_{ij}
for i = 1, 2, ••• , n and 1 = 1, 2, ••• , m. A difference of matrices is
designated
C =A- B.
Consider an operator A from X to Y and an operator C = λA for
some number λ. If a_{ij} and c_{ij} are elements of the matrices of the operators, then
    c_{ij} = {Ce_j}_i = {λAe_j}_i = λ {Ae_j}_i = λ a_{ij},
and we arrive at the following definition:
A product of an n × m matrix A with elements a_{ij} by a number λ
is an n × m matrix C with elements c_{ij} if
    c_{ij} = λ a_{ij}
for i = 1, 2, ..., n and j = 1, 2, ..., m. A product of a matrix by
a number is designated
    C = λA.
Let an m-dimensional space X and an n-dimensional space Y be
given over the same field P. As proved earlier, given fixed bases in X
and Y there is a 1-1 correspondence between the set W_XY of all opera-
tors from X to Y and the set of all n X m matrices with elements
from P. Since operations on matrices were introduced in accor-
dance with operations on operators, the set of n X m matrices, just
as the set Wxy, is a vector space.
It is easy to show one of the bases of the space of matrices. It is,
for example, a system of matrices A^{(kp)} for k = 1, 2, ..., n and
p = 1, 2, ..., m, where the elements a_{ij}^{(kp)} of a matrix A^{(kp)} are
defined by the following equations:
    a_{ij}^{(kp)} = 1 if i = k and j = p, and a_{ij}^{(kp)} = 0 otherwise.
In the space W_XY a basis is a system of operators with matrices A^{(kp)}.
From this we conclude that the vector space of operators from X to Y
is a finite dimensional space and that its dimension is equal to mn.
Let X, Y and Z be vector spaces, let A be an operator from X toY
and let B be an operator from Y to Z. Also let m, n and p be the
BA=G ~)
and the noncommutativity of multiplication is proved.
The operation of matrix multiplication provides a convenient way
of writing relations of the type (60.2). Denote by x_e an m × 1 matrix made up of the coordinates of a vector x and by y_q an n × 1 matrix
made up of the coordinates of a vector y. Then relations (60.2) are
equivalent to a single matrix equation
(61.2)
which is called a coordinate equation corresponding to the operator
equation
    y = Ax
and relates in matrix form the coordinates of inverse image and
image by means of the matrix of the operator.
It is important to note that the coordinate and the operator equa-
tion look quite alike from the notational point of view if of course
the indices are dropped and the symbol Ax is understood as a prod-
uct of A by x. Since the notation and the properties of operations
on matrices and operators coincide, any transformation of an opera-
tor equation leads to the same transformation of the coordinate
equation. Therefore formally it makes no difference whether we deal
with matrix equations or with operator equations.
In what follows we shall actually draw no distinction between
operator and coordinate equations. Moreover, all new notions and
facts that hold for operators will as a rule be extended to matrices, unless
otherwise noted.
Exercises
1. Prove that operations on matrices are related to the
operation of transposition by
    (αA)' = αA',   (A + B)' = A' + B',
    (AB)' = B'A',   (A')' = A.
2. Prove that every linear operator of rank r can be represented as a sum
of r linear operators of rank 1 and cannot be represented as a sum of a smaller
number of operators of rank 1.
3. Prove that an n X m matrix has rank 1 if and only if it can be represented
as a product of two nonzero, n X 1 and 1 X m, matrices.
    A^{−1} = (1/d) [ A_11  A_21  ...  A_n1 ]
                   [ A_12  A_22  ...  A_n2 ]
                   [ ..................... ]
                   [ A_1n  A_2n  ...  A_nn ]
    det C = det ( Σ_{s_1=1}^{m} a_{1s_1} b_{s_1}, Σ_{s_2=1}^{m} a_{2s_2} b_{s_2}, ..., Σ_{s_n=1}^{m} a_{ns_n} b_{s_n} )
    = Σ_{s_1=1}^{m} Σ_{s_2=1}^{m} ... Σ_{s_n=1}^{m} a_{1s_1} a_{2s_2} ... a_{ns_n} det (b_{s_1}, b_{s_2}, ..., b_{s_n}),    (62.3)
where b_s denotes the sth row of the matrix B,
or equivalently
det C = det A ·det B.
Corollary. Let a square n X n matrix C equal the product of two
rectangular, n X m and m X n, matrices A and B, with m < n.
Then det C = 0.
Indeed, add to A and B n - m zero columns on the right and
n- m zero rows below, and the matrices obtained become square
n X n matrices with zero determinants. The product of those matrices
is a matrix C. Therefore according to the first corollary det C = 0.
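A quick numerical check (not from the book) of the multiplicative property and of the corollary; the matrices are random examples.

    import numpy as np

    rng = np.random.default_rng(1)

    # det(AB) = det(A) det(B) for square matrices.
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))   # equal up to rounding

    # If C is the product of an n x m and an m x n matrix with m < n, then det C = 0.
    F = rng.standard_normal((4, 2))
    G = rng.standard_normal((2, 4))
    print(np.linalg.det(F @ G))                                        # ~0 (rank of C is at most 2)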
Exercises
1. Prove that (A - 1)' = (A ')-1 for any nonsingular
matrix A.
2. Prove that det (A - 1) = (det A )- 1 for any nonsingular matrix A.
3. Prove that det (aA) = an ·det A for any square n X n matrix A.
4. Prove that if AB = E for square matrices A and B, then A is nonsingular
and B =A -1.
5. Write a formula of the type (62.2) for an arbitrary minor of the product
of two matrices.
6. Prove that for any real matrix A all the principal minors of the matrices
A'A and AA' are nonnegative.
7. Prove that the rank of a product of matrices is not greater than the rank
of each of the factors.
8. Prove that multiplying by a nonsingular matrix leaves the rank unaffected.
    x = Σ_{i=1}^{m} ξ_i e_i = Σ_{i=1}^{m} η_i f_i.
By (63.1) we have
    ξ_i = Σ_{j=1}^{m} p_{ij} η_j.    (63.2)
Exercises
1. Prove that the rank of the matrix of an operator is
not affected by a change to other bases.
2. Prove that the determinant of the matrix of an operator in a vector space
is independent of the choice of basis.
3. What correspondence can be established between nonsingular operators
in a space X and transformations of coordinates in the same space?
4. Let us say that two bases of the same real space are of the same sign if
the determinant of their coordinate transformation matrix is positive. Prove
that all bases can be divided into two classes of bases of the same sign.
5. Let us say that one class of bases of the same sign is left-handed and the
other is right-handed. Compare these classes with those described in Section 34.
    I_r = [ 1  ...  0  0  ...  0 ]
          [ . . . . . . . . . .  ]
          [ 0  ...  1  0  ...  0 ]
          [ 0  ...  0  0  ...  0 ]
          [ . . . . . . . . . .  ]
          [ 0  ...  0  0  ...  0 ]
(the first r diagonal elements are equal to unity and all the other elements are zero).
Let a rectangular n X m matrix be given. It defines some linear-
operator A mapping a space X with a basis e1 , e 2 , ••• , em into a space
Y with a basis q1 , q2 , • • • , qn. Denote by r the number of linearly
independent vectors among the images of the vectors of the basis
Ae 11 Ae 2 , • • • , Aem. We may assume without loss of generality that
it is the vectors Ae_1, Ae_2, ..., Ae_r that are linearly independent,
since this can be achieved by a proper numbering of the basis vectors.
The remaining vectors Aer+l• ... , Aem can be linearly expressed in
terms of them,
r
Then by (64.1)
A/11 = 0 (64.3)
for k = r + 1, ..., m. Set, further,
Af1 = t1 (64.4)
for j = 1, 2, ... , r. Vectors t1 , t 2 , • • • , tr are by assumption linearly
independent. Supplement them with some vectors tr+u ••• , tn to
a basis in Y and consider the matrix of the operator A in the new
bases f 1 , . • • , fm and t 1 , . . . , tn. The coefficients of the kth column of
the matrix coincide with the coordinates of the vector Af_k in the
basis t 1 , • • • , tn. According to (64.3) and (64.4) the matrix of A will
coincide with I_r.
The original matrix and I r correspond to the same operator and
therefore they are equivalent. Hence all matrices of the same rank
are equivalent to I r and therefore to one another.
While proving the theorem we answered a very important question:
Hou· are bases in spaces X and Y to be chosen for the matrix of the
linear operator to have the simplest form? Besides we have shown an
explicit form of that simplest matrix.
So simple and effective an answer has turned out to be possible
because bases in X andY could be chosen independently of each other.
Now let A be an operator in a space X. Of course, we could again
consider images and inverse images in different bases, but it is not
natural now since both images and inverse images are in the same
space. Using different bases would greatly hamper the study of the
action of the operator on the vectors of the space X. If there is one
basis, then the matrices P and Q in (63.6) coincide. Hence, corre-
sponding to every linear operator in a vector space is a class of matri-
ces connected by the relations
B = P- 1AP (64.5)
for different nonsingular matrices P. Such matrices are called similar
and a matrix P is called a similarity transformation matrix.
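A numerical illustration (not part of the original text): similar matrices share the determinant, the trace and the eigenvalues; the similarity transformation matrix P below is an arbitrary example.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))
    P = rng.standard_normal((3, 3))          # almost surely nonsingular
    B = np.linalg.inv(P) @ A @ P             # a matrix similar to A, as in (64.5)

    print(np.linalg.det(A), np.linalg.det(B))        # equal up to rounding
    print(np.trace(A), np.trace(B))                  # equal up to rounding
    print(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(B)))   # same eigenvalues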
Exercises
1. Prove that the equivalence and similarity criteria
of matrices are equivalence relations.
2. Prove that similar matrices have the same trace and the same determi-
nant.
3. Prove that under the same similarity transformation a cyclic group of
nonsingular matrices goes over into a cyclic group.
4. Prove that under the same similarity transformation a linear subspace of
matrices goes over into a linear subspace.
5. On a set of square matrices of the same size consider an operator of simi-
larity transformation of those matrices using a fixed similarity transformation
matrix. Prove that that operator is linear.
6. Prove that the set of all similarity transformation operators over the
same set of square matrices of the same size forms a group relative to multipli-
cation.
CHAPTER 8
The Characteristic Polynomial
    Ax_m = λ_m x_m.
We recall that column elements of the matrix of an operator coincide
with the coordinates of the images of basis vectors. Therefore the
matrix A_x of the operator A has the following form in a basis con-
sisting of eigenvectors:
    A_x = [ λ_1   0   ...   0  ]
          [  0   λ_2  ...   0  ]
          [ .................  ]
          [  0    0   ...  λ_m ]
If now A has in some basis x_1, x_2, ..., x_m a diagonal matrix with
some, not necessarily different, numbers λ_1, λ_2, ..., λ_m on the
principal diagonal, then x_1, x_2, ..., x_m are eigenvectors of A corresponding to the eigenvalues λ_1, λ_2, ..., λ_m.
Thus operators of a simple structure, and operators of a simple
structure alone, have diagonal matrices in some basis. That basis
can be made up only of eigenvectors of the operator A. The action
of any operator of a simple structure always reduces to a "stretching"
of the coordinates of a vector in the given basis. If all linear operators
had a simple structure, then the question of choosing a basis in which
the matrix of an operator has the simplest form would have been
completely solved. However, operators of a simple structure do not
exhaust all linear operators.
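For a concrete illustration, here is a minimal sketch (assuming NumPy and a matrix that does have a full set of eigenvectors) of passing to a basis of eigenvectors, in which the matrix of the operator becomes diagonal:

    import numpy as np

    A = np.array([[2., 1.],
                  [0., 3.]])                 # an operator of simple structure
    lam, X = np.linalg.eig(A)                # columns of X are eigenvectors

    # the matrix of A in the eigenvector basis x_1, ..., x_m
    A_diag = np.linalg.inv(X) @ A @ X
    assert np.allclose(A_diag, np.diag(lam))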
Exercises
1. Let an operator A have an eigenvector x correspond-
ing to an eigenvalue λ. Prove that for the operator
    a_0E + a_1A + ... + a_nA^n,
where a_0, a_1, ..., a_n are some numbers, the vector x is also an eigenvector but
that it corresponds to the eigenvalue a_0 + a_1λ + ... + a_nλ^n.
2. Prove that operators A and A - aE have the same eigenvectors for any
operator A and any number a.
3. Prove that an operator A is nonsingular if and only if it has no zero eigen-
values.
4. Prove that operators A and A - 1 have the same eigenvectors for any nonsin-
gular operator A. What is the connection between the eigenvalues of the op-
erators?"
5. Prove that if an operator A is of a simple structure, then the operator
a 0 E + ~A + .•• + anA n
is also of a simple structure.
6. Prove that an operator of differentiation in a space of polynomials is not
an operator of a simple structure. Find the eigenvectors and eigenvalues of
that operator.
7. Consider a similarity transformation operator with a diagonal matrix.
Prove that that operator is of a simple structure. Find all its eigenvectors and
eigenvalues.
66. The characteristic polynomial
Not every linear operator has at least one eigen-
vector. Suppose, for example, that we have an operator in a
space V_2 which turns every directed line segment about the origin
(66.4)

    A_e = ( 0  0  ...  0  −a_m     )
          ( 1  0  ...  0  −a_{m−1} )
          ( 0  1  ...  0  −a_{m−2} )
          ( ........................ )
          ( 0  0  ...  1  −a_1     )                 (66.5)
This is easy to see from a direct check, using the Laplace theorem
for calculating det (λE − A_e). A matrix of the form (66.5) is called
a Frobenius matrix.
For a number λ from P to be an eigenvalue of an operator A it is
necessary and sufficient that it should satisfy the equation
    det (λE − A_e) = 0.
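A small numerical sketch (assuming NumPy; not part of the book) showing that the roots of det (λE − A_e) = 0 are exactly the eigenvalues; the matrix of a rotation through 90° is used, for which the roots are ±i and there are no real eigenvalues:

    import numpy as np

    A = np.array([[0., -1.],
                  [1.,  0.]])                # rotation through 90 degrees in V_2
    p = np.poly(A)                           # coefficients of det(lambda*E - A)
    roots = np.roots(p)                      # roots of the characteristic polynomial
    assert np.allclose(np.sort_complex(roots),
                       np.sort_complex(np.linalg.eigvals(A)))
    # the roots are +i and -i: no real eigenvalues,
    # which is why this operator has no eigenvectors in the real plane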
Exercises
1. Find the characteristic polynomial for the zero and
the identity operator.
2. Find the characteristic polynomial for the operator of differentiation.
3. Is the coincidence of characteristic polynomials an indication of the
equality of the operators?
4. Prove that operators with matrices A and A' have the same character-
istic polynomials.
5. Suppose that in some basis an operator has matrix (66.5). Find the coor-
dinates of the eigenvectors in the same basis.
6. Prove that an operator with matrix (66.5) has a simple structure if and
only if the characteristic polynomial has m mutually distinct roots.
where
From the last equation it follows that d_{n+s} ≠ 0 and therefore the
degree of the product of nonzero polynomials is equal to the sum
of the degrees of the factors. Hence a product of nonzero polynomials
is a nonzero polynomial.
A special case of a product of polynomials is a product a.! (z) of
a polynomial f (z) by a number a., since a nonzero number can be
regarded as a polynomial of degree 0.
A set of polynomials with the operations introduced above is
a commutative ring. We shall not concern ourselves with checking
that all the axioms hold.
Theorem 67.1. For any polynomial f (z) and nonzero polynomial
g (z) we can find unique polynomials q (z) and r (z) such that
    f (z) = g (z) q (z) + r (z),                 (67.3)
with the degree of r (z) lower than that of g (z) or r (z) = 0.
Proof. Let f (z) and g (z) be polynomials of degrees n and s. If
n <s or f (z) = 0, then it is possible to set q (z) = 0 and r (z) =
= f (z) in (67.3). Suppose therefore that n ≥ s.
We represent f (z) and g (z) according to (67.2) and set
    f (z) − (a_n/b_s) z^{n−s} g (z) = f_1 (z).           (67.4)
Let the degree of f_1 (z) be n_1 and let its leading coefficient be a^{(1)}_{n_1}.
It is clear that n_1 < n. If n_1 ≥ s, then we set
    f_1 (z) − (a^{(1)}_{n_1}/b_s) z^{n_1−s} g (z) = f_2 (z),       (67.5)
and so on, until we arrive at a polynomial f_k (z) which is either zero or
whose degree n_k is less than s. After that the
process is stopped.
Now adding all equations of the type (67.4) to (67.7) we get
    f (z) − ((a_n/b_s) z^{n−s} + (a^{(1)}_{n_1}/b_s) z^{n_1−s} + ... + (a^{(k−1)}_{n_{k−1}}/b_s) z^{n_{k−1}−s}) g (z) = f_k (z).
satisfy equation (67 .3), with either r (z) = 0 or the degree of r (z)
less than that of g (z).
We now prove that the polynomials q (z) and r (z) satisfying the
hypothesis of the theorem are unique. Let there be other polynomials,
q' (z) and r' (z), such that
    f (z) = g (z) q' (z) + r' (z),
with either r' (z) = 0 or the degree of r' (z) less than that of g (z).
Then
g (z) (q (z) - q' (z)) = r' (z) - r (z). (67 .8)
The polynomial on the right of this equation is either zero or its
degree is less than that of g (z). But the polynomial on the left of
this equation has a degree not less than that of g (z) for q (z) −
q' (z) ≠ 0. Therefore (67.8) is possible only if
q (z) = q' (z), r (z) = r' (z).
This completes the proof of the theorem.
A polynomial q (z) is called the quotient of f (z) by g (z) and r (z)
is the remainder. If the remainder is zero, then f (z) is said to be
divisible by g (z) and g (z) is said to be a divisor of f (z).
Consider division of a nonzero polynomial f (z) by a first-degree
polynomial (z - a). We have
    f (z) = (z − a) q (z) + r (z).               (67.9)
Since the degree of r (z) must be less than that of (z - a), r (z) is
a polynomial of degree zero, i.e. a constant. That constant is easy
to determine. On substituting on the right and left of (67.9) z =a
we find that r (z) = f (a). So
    f (z) = (z − a) q (z) + f (a).               (67.10)
For f (z) to be divisible by (z - a) it is necessary and sufficient
that f (a) = 0. Numbers a such that f (a) = 0 are usually called
roots of f (z). Thus finding all linear divisors of a polynomial is
equivalent to finding all its roots.
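Division by a linear binomial is conveniently performed by Horner's scheme. The sketch below is an illustration in Python, not the book's construction; it returns the quotient q (z) and confirms that the remainder equals f (a):

    def divide_by_linear(f, a):
        """Divide f(z) = f[0]*z^n + ... + f[n] by (z - a) with Horner's scheme.
        Returns the coefficients of q(z) and the remainder r = f(a)."""
        q = []
        r = 0
        for c in f:
            r = r * a + c
            q.append(r)
        return q[:-1], q[-1]

    f = [1, -6, 11, -6]          # f(z) = z^3 - 6z^2 + 11z - 6 = (z-1)(z-2)(z-3)
    q, r = divide_by_linear(f, 2)
    print(q, r)                  # [1, -4, 3], 0  ->  f(z) = (z - 2)(z^2 - 4z + 3)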
are the roots of (68.1). We shall call them the nth roots of a and
designate them as
    α_k = ⁿ√a.
Now let f (z) be a polynomial with complex coefficients. We consid-
er it to be a complex function of the complex independent vari-
able z. For such functions, as for real-valued functions of a real inde-
pendent variable, it is possible to introduce the concepts of conti-
nuity, of derivative and so on. Not all of these notions will be equal-
ly necessary to us, but they are all based on the use of the complete-
ness of the space of complex numbers.
A one-valued complex function f (z) of a complex independent
variable z is said to be continuous at a point z0 if for any arbitrarily
small number e > 0 we can find 6 > 0 such that for any complex
number z satisfying
I z- zo I< 6
we have
I I (z) - I (zo) I < e.
A function f (z) continuous at each point of its domain is called
everywhere continuous or simply continuous.
Lemma 68. t. A polynomial f (z) with complex coefficients is a con-
tinuous function of a complex independent variable z.
Proof. Let
(68.3)
and let z0 be an arbitrary fixed complex number. Denote h = z - z0 •
We show that for any arbitrarily small number e > 0 we can find
6 > 0 such that I I (z) - f (z 0 ) I < e for I h I < 6.
On expanding the polynomial f (z) in powers of (z - z0 ) we get
    f (z) = A_0 + A_1 (z − z_0) + ... + A_n (z − z_0)^n.
Since A_0 = f (z_0) and (z − z_0) is denoted by h, we have
    f (z_0 + h) − f (z_0) = A_1h + ... + A_nh^n.         (68.4)
Hence
    |f (z_0 + h) − f (z_0)| ≤ |A_1||h| + ... + |A_n||h|^n = A (|h|).   (68.5)
The real-valued function A (|h|) is a polynomial with real coeffi-
cients |A_i| in the real variable |h|. As is known from mathematical
analysis, A (|h|) is a continuous function everywhere and, in
    |f (z)| ≥ |a_n| |z|^n (1 − |a_0/a_n| |z|^{−n} − ... − |a_{n−1}/a_n| |z|^{−1}).   (68.10)
Since {z_k} is infinitely large,
    lim_{k→∞} |z_k| = +∞.
Therefore
    lim_{|z_k|→∞} (1 − |a_0/a_n| |z_k|^{−n} − ... − |a_{n−1}/a_n| |z_k|^{−1}) = 1.
Exercises
1. Prove that the set of all nth roots of the complex
number 1 forms a commutative group relative to multiplication.
2. Prove that for a sequence of complex numbers {η_k} to be bounded it is
necessary and sufficient that for at least one polynomial f (z) of degree n ≥ 1
the sequence {f (η_k)} should be bounded.
3. Prove that for any polynomial f (z) of degree n ≥ 1 and for any complex
number z_0 there is a complex number h such that |f (z_0 + h)| > |f (z_0)|.
4. Prove that all roots of polynomial (68.3) are in the ring
    (1 + max_{k>0} |a_k/a_0|)^{−1} ≤ |z| ≤ 1 + max_{k<n} |a_k/a_n|.
5. Try to "prove" the algebraic closure of the field of real numbers according
to the same scheme as that for complex numbers. In what place has the "proof"
no analogy?
69. Consequences
of the fundamental theorem
There arise a variety of consequences from the
fundamental theorem. Let us consider the most important of them.
A polynomial f (z) of degree n ≥ 1 with complex coefficients has
at least one root z_1. Therefore f (z) has a factorization
    f (z) = (z − z_1) φ (z),
It is clear that the degree of the polynomial at the right does not ex-
ceed n, and at the points z = a_i it assumes the values f (a_i). The
polynomial thus constructed is called Lagrange's interpolation poly-
nomial.
Consider now a polynomial f (z) of degree n and let z_1, z_2, ..., z_n
be its roots repeated according to multiplicity. Then
    f (z) = a_n (z − z_1) (z − z_2) ... (z − z_n).
Multiplying the parentheses at the right, collecting similar terms and
comparing the resulting coefficients with those in (68.3) we can de-
rive the following equations:
    a_{n−1}/a_n = −(z_1 + z_2 + ... + z_n),
    a_{n−2}/a_n = +(z_1z_2 + z_1z_3 + ... + z_1z_n + ... + z_{n−1}z_n),
    a_{n−3}/a_n = −(z_1z_2z_3 + z_1z_2z_4 + ... + z_{n−2}z_{n−1}z_n),
    . . . . . . . . . . . . . . . . . . . . . . .
These are called Vieta's formulas and express coefficients of the poly-
nomial in terms of its roots.
On the right of the kth equation is the sum of all possible products of
k roots each, taken with a plus or a minus sign according as k is even
or odd.
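Vieta's formulas are easy to verify numerically. A minimal sketch (assuming NumPy; the roots are chosen arbitrarily) compares the coefficients of the polynomial with leading coefficient 1 built from given roots against the elementary symmetric functions of those roots:

    import numpy as np
    from itertools import combinations

    roots = [1.0, 2.0, 3.0]
    coeffs = np.poly(roots)          # [1, -(z1+z2+z3), +(z1z2+z1z3+z2z3), -z1z2z3]

    for k in range(1, len(roots) + 1):
        sigma_k = sum(np.prod(c) for c in combinations(roots, k))
        assert np.isclose(coeffs[k], (-1) ** k * sigma_k)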
For further discussion we shall need some consequences of the fun-
damental theorem of algebra, relating to polynomials with real coef-
ficients. Let a polynomial
    f (z) = a_0 + a_1z + ... + a_nz^n
with real coefficients have a complex (but not real!) root v, i.e.
    a_0 + a_1v + ... + a_nv^n = 0.
The last equation is not violated if all numbers are replaced in it by
complex conjugate ones. However, the coefficients a 0 , ••• , an and the
number 0, being reals, will remain unaffected by the replacement.
Therefore
    a_0 + a_1v̄ + ... + a_nv̄^n = 0,
i.e. f (v̄) = 0.
Thus if a complex (but not a real!) number v is a root of a polynomi-
al f (z) with real coefficients, then so is the complex conjugate number
v̄.
It follows that f (z) will be divisible by the quadratic trinomial
    φ (z) = (z − v) (z − v̄) = z² − (v + v̄) z + vv̄
with real coefficients. Using this fact we prove that v and v̄ have the
same multiplicity.
Let them have multiplicities k and l respectively and let k > l,
for example. Then f (z) is divisible by the lth degree of the polynomial
<p (z), i.e.
I (z) = <p\ (z) · q (z).
The polynomial q (z), as a quotient of two polynomials with real coef-
ficients, has also real coefficients. By assumption it must have anum-
ber v as its (k - l)-fold root and must have no root equal to v. Ac-
cording to what was proved above it is impossible and therefore
k = l. Thus all complex roots of any polynomial with real coefficients
are mutually complex conjugate. From the uniqueness of the canoni-
cal factorization we can draw the following conclusion:
Any polynomial with real coefficients can be represented, up to an
arrangement of the factors, uniquely as a product of its leading coef-
ficient and polynomials with real coefficients. Those polynomials
have leading coefficients equal to unity and are linear, if they corre-
spond to real roots, and quadratic, if they correspond to a pair of com-
plex conjugate roots.
Exercises
1. Prove that if a complex number a ≠ 0, then for
any natural n there are only n distinct complex numbers whose nth power is
equal to a.
2. What is the relation between the roots of f (z) and f (z − a), where a is
a complex number?
3. Let a polynomial f (z) of degree not greater than n with complex coeffici-
ents assume equal values for n + 1 distinct values of the independent variable.
Prove that f (z) is a polynomial of degree zero.
4. Prove that any polynomial of an odd degree with real coefficients has at
least one real root.
5. Prove that a polynomial f (z) has at least one root in each of the regions
    |z| ≤ ⁿ√|a_0/a_n|,   |z| ≥ ⁿ√|a_0/a_n|.
6. Prove that an operator A has a simple structure if and only if there are
as many linearly independent eigenvectors corresponding to each of its eigen-
values as is the multiplicity of that eigenvalue.
CHAPTER 9
The Structure
of a Linear Operator
    A_e = ( A_11   A_12 )
          (  0     A_22 ).                       (70.1)
Any linear operator has at least one eigenvector in every invariant sub-
.space.
If a space is decomposed as a direct sum of r invariant subspaces,
then the linear operator has at least r linearly independent eigen-
vectors.
It is clear that any eigenvalue and any eigenvector of an induced
operator are respectively the eigenvalue and the eigenvector of the
generator. Less obvious is
Theorem 70. f. The characteristic polynomial of an induced operator
generated on a nontrivial subspace is a divisor of the characteristic poly-
nomial of the generator.
Proof. Let an induced operator A I L be defined on an invariant
subspace L. Again choose a basis e_1, ..., e_m of a space X so that vec-
tors e_1, ..., e_n constitute a basis in L. If the matrix of the gener-
ator is A_e of (70.1), then the matrix of A I L is A_11 of (70.1). The char-
acteristic polynomial is equal to det (λE − A_e) for A and to
det (λE − A_11) for A I L. Applying the Laplace theorem to expand
the determinant det (λE − A_e) by the first n columns we find

    det (λE − A_e) = det ( λE − A_11     −A_12    )
                         (    0        λE − A_22 )
                   = det (λE − A_11) · det (λE − A_22).
This equation establishes the validity of the theorem.
Determining all eigenvalues of the operator A reduces to finding
all roots of the characteristic polynomial. If A has a nontrivial in-
variant subspace, then by Theorem 70.1 this problem can be reduced to
·finding all roots of two polynomials of lower degree. If the induced
operator itself has a nontrivial invariant subspace, then the process
of factoring the characteristic polynomial can be continued.
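Theorem 70.1 can be observed numerically for a block upper triangular matrix: the characteristic polynomial is the product of the characteristic polynomials of the diagonal blocks. A minimal sketch assuming NumPy, with arbitrarily chosen blocks:

    import numpy as np

    A11 = np.array([[1., 2.], [0., 3.]])
    A12 = np.array([[5.], [7.]])
    A22 = np.array([[4.]])
    A = np.block([[A11, A12],
                  [np.zeros((1, 2)), A22]])

    p_full = np.poly(A)                               # det(lambda*E - A)
    p_prod = np.polymul(np.poly(A11), np.poly(A22))
    assert np.allclose(p_full, p_prod)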
Exercises
1. Let A be an operator of differentiation in a finite
dimensional real space of polynomials. What is the operator φ (A) for a
polynomial φ (z) with real coefficients?
2. Let φ (z) be the characteristic polynomial of an induced operator generat-
ed by an operator A on an invariant subspace N. Prove that N is in the kernel
of the operator φ^k (A) for some positive integer k.
then the coefficient of e_i must be zero for every i > j. Hence the mat-
rix of A is of the form

    A_e = ( a_11  a_12  ...  a_{1,m−1}     a_{1m}    )
          (  0    a_22  ...  a_{2,m−1}     a_{2m}    )
          ( ........................................ )
          (  0     0    ...  a_{m−1,m−1}   a_{m−1,m} )
          (  0     0    ...   0            a_{mm}    )

where a_ij = 0 for i > j.
A matrix all of whose elements under (above) the principal diag-
onal are zero is called a right (left) triangular matrix. In matrix terms
Exercises
1. Prove that any square matrix is similar to a left
triangular matrix.
2. Prove that a set of left (or right) triangular matrices forms a ring.
3. Prove that a set of nonsingular left (or right) triangular matrices forms
a group.
4. Let λ_1, λ_2, ..., λ_m be the eigenvalues of an operator A written out in
succession according to multiplicity. Prove that, taking into account their
multiplicities, the eigenvalues of an operator φ (A) for any polynomial φ (z)
are φ (λ_1), φ (λ_2), ..., φ (λ_m).
5. Prove that if all diagonal elements of a triangular m × m matrix A are
zero, then A^m = 0.
6. Let a triangular matrix be similar to a diagonal matrix. Prove that the
similarity transformation matrix may be chosen to be left (or right) triangular.
An operator A defined by
    Ax = Bx_L + Cx_M
is called a direct sum of B and C. If one of the subspaces L and M is
trivial, then the direct sum is also called trivial.
It is easy to verify that A is a linear operator in X. We show that
it can be represented only uniquely as a direct sum of operators de-
fined on L and M. Indeed, for any vector x ∈ L we have Ax = Bx.
Similarly Ax = Cx for any x E M. This means that B coincides
with the induced operator A I L and C coincides with A I M.
Consider now an operator A in a space X. If X is decomposed in
some way into a direct sum of subspaces L and M invariant under
A, then the operator A itself can be decomposed as a direct sum. Indeed,
construct A I L and A I M. On decomposing again a vector x ∈ X
as a sum (73.1) we get
    Ax = (A I L) x_L + (A I M) x_M.
In this case, by Theorem 70.1, the characteristic polynomial of A
is equal to the product of the characteristic polynomials of A I L
and A I M.
The operator A can be decomposed as a direct sum using any op-
erator polynomial φ (A). Denote by N_k the kernel of the operator
φ^k (A). This is a subspace invariant under A and it is obvious that
N_1 ⊂ N_2 ⊂ .... We first prove that if N_k = N_{k+1} for some k,
then N_k = N_p for every p > k. Indeed, take any vector x ∈ N_p.
Then φ^p (A) x = 0. On writing this as φ^{k+1} (A) (φ^{p−k−1} (A) x) = 0
we conclude that the vector φ^{p−k−1} (A) x ∈ N_{k+1}. By virtue of
N_k = N_{k+1} the same vector is in N_k. Consequently,
    φ^k (A) (φ^{p−k−1} (A) x) = φ^{p−1} (A) x = 0,
i.e. the vector x ∈ N_{p−1}. The validity of the above assertion can
now be established by induction on p.
The space X in which A is an operator is finite dimensional. There-
fore the dimensions of the subspaces N_k cannot increase without lim-
it. Let q be the smallest positive integer for which N_q = N_{q+1}.
Denote by T_k the range of the operator φ^k (A) and consider any vector
x common for the subspaces T_q and N_q. We have φ^q (A) x = 0 and
x = φ^q (A) y for some vector y ∈ X. It follows that φ^{2q} (A) y = 0,
i.e. y ∈ N_{2q}. But by what has been proved N_q = N_{2q}. There-
fore y ∈ N_q, i.e. x = φ^q (A) y = 0.
Thus T_q and N_q have only the zero vector in common. In view of for-
mula (56.3) this means that X = T_q ⊕ N_q. Since T_q and N_q are
invariant subspaces, the possibility of decomposing the operator is
established.
follows from what has been said that any operator can be decom-
posed as a direct sum of operators induced on root subspaces.
A root subspace R_i coincides with the kernel of the operator ((A −
λ_iE)^{k_i})^q for some positive integer q. We show that in this case it
is always possible to put q = 1. Consider operators (A − λ_iE)^p for
p = 1, 2, .... Let p_i be the smallest number for which the kernel
of (A − λ_iE)^{p_i} coincides with that of (A − λ_iE)^{p_i+1}. Then R_i will
coincide with the kernel of (A − λ_iE)^{p_i}. Since the dimensions of
the kernels of (A − λ_iE)^p for p = 1, 2, ... are monotonically in-
creasing and the dimension of R_i is equal to k_i, we have p_i ≤ k_i.
Thus R_i corresponding to an eigenvalue λ_i of multiplicity k_i
clearly coincides with the kernel of (A − λ_iE)^{k_i}.
Theorem 73.2 (Cayley-Hamilton). If I {z) is the characteristic poly-
nomial of an operator A, then f (A) is a zero operator.
Proof. Let us represent the characteristic polynomial as the canon-
ical factorization (73.2). Since the operator polynomial f (A) con-
tains the factor (A − λ_iE)^{k_i} and any polynomials in the same opera-
tor are commutative, f (A) x_i = 0 for any vector x_i in R_i. Now take
a vector x and represent it as x = x_1 + x_2 + ... + x_r, where
x_i ∈ R_i. It is now clear that f (A) x = 0, i.e. that f (A) is a zero op-
erator.
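The Cayley-Hamilton theorem is easy to check for a concrete matrix. A minimal sketch (assuming NumPy; the matrix is arbitrary) evaluates the characteristic polynomial on the matrix itself and obtains the zero matrix up to rounding errors:

    import numpy as np
    from numpy.linalg import matrix_power

    A = np.array([[2., 1., 0.],
                  [0., 2., 0.],
                  [1., 0., 3.]])
    p = np.poly(A)                            # characteristic polynomial coefficients
    n = A.shape[0]

    # f(A) = A^n + p[1]*A^(n-1) + ... + p[n]*E
    f_of_A = sum(p[k] * matrix_power(A, n - k) for k in range(n + 1))
    assert np.allclose(f_of_A, np.zeros((n, n)))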
Of great interest is again the matrix interpretation of the results
obtained. Compose a basis of the space as a successive combination of
any bases of the root subspaces R_1, R_2, ..., R_r. Root subspaces are in-
variant and their direct sum coincides with X. Therefore the matrix
A_e of A in this basis has the so-called quasi-diagonal form

    A_e = ( A_11   0    ...   0   )
          (  0    A_22  ...   0   )
          ( ......................)
          (  0     0    ...  A_rr )              (73.3)
Exerelses
1. Can an operator of differentiation in a finite dimen-
sional space of polynomials be decomposed as a nontrivial direct sum?
2. Prove that a system of root vectors corresponding pairwise to distinct
eigenvalues is linearly independent.
3. Prove that if an operator A is nonsingular, then A⁻¹ = φ (A) for some
polynomial φ (z).
4. An operator A is said to be nilpotent if A^p = 0 for some positive inte-
ger p. Prove that an operator is nilpotent if and only if all its eigenvalues are
zero.
5. Let φ (z) be a polynomial of the lowest degree for which φ (A) = 0.
Prove that φ (z) is a divisor of the characteristic polynomial of A.
Hence the matrix of the induced operator has the following form:
    ( λ_i  1   0  ...  0   0  )
    (  0   λ_i  1  ...  0   0  )
    ( ......................... )
    (  0   0   0  ...  λ_i  1  )
    (  0   0   0  ...  0   λ_i )
Matrices of this form are called Jordan canonical boxes.
We shall now construct a basis of a space as a successive combina-
tion of the bases of the root subspaces R_1, R_2, ..., R_r. As a basis of
each root subspace R_i we take vectors of the type (74.4) ordered in
succession from bottom to top and from left to right. A space basis
constructed in this way is called a root basis.
In a root basis the matrix J of an operator A assumes the so-called
Jordan canonical form. It is a quasi-diagonal matrix made up of Jor-
dan boxes. First come Jordan boxes corresponding to an eigenvalue
"-t, in nonincreasing order of their sizes. Then, in the same order,
come Jordan boxes corresponding to A. 2 and so on. Thus
,., 1 0
0 '-r 0
0
0 0.. A.1
..
~.
J= . (74.5)
').,. , . 0
0 ).r .0
0
0 o...J.,.
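A Jordan box is easy to construct and examine numerically. The sketch below (assuming NumPy) builds one box and checks that J − λE is nilpotent, which is what distinguishes a box of size greater than one from a diagonal matrix:

    import numpy as np
    from numpy.linalg import matrix_power

    def jordan_box(lam, k):
        """k x k Jordan box: lam on the diagonal, ones on the superdiagonal."""
        return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

    J = jordan_box(5.0, 4)
    N = J - 5.0 * np.eye(4)
    assert np.allclose(matrix_power(N, 4), 0)       # (J - lambda*E)^k = 0
    assert not np.allclose(matrix_power(N, 3), 0)   # but not for a lower power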
Exercises
1. Let x be a root vector of height ν corresponding
to an eigenvalue λ_1 of an operator A. Prove that if λ_1 is a root of multiplicity
p of a polynomial φ (z), then the vector v = φ (A) x is a root vector of height
r = max (0, ν − p) corresponding to the same eigenvalue λ_1. What can be
said about the vector v if λ_1 is not a root of φ (z)?
2. Let x be a nonzero vector and let <p (z) be a polynomial of the lowest degree
such that <p (A) x=O. Prove that <p (z) is a divisor of the characteristic polyno-
mial of A.
3. Prove that any square matrix can be reduced to a unique Jordan canoni-
cal form up to a permutation of Jordan boxes.
4. Prove that if a matrix is similar to the matrix J of (74.5), then it is simi-
lar to J' as well.
5. Prove that square matrices A and A' are the matrices of the same operator.
6. Let J be a Jordan canonical matrix. What is the form of the matrices J^p for
positive integers p?
Exercises
1. Let A be a linear operator and let α and β be com-
plex numbers equal in absolute value. Prove that αA + βA* is a normal
operator.
2. Let A be a normal operator. Prove that for any polynomial φ (z) the op-
erator φ (A) is normal.
3. Prove that for a normal operator any induced operator is normal.
4. Prove that an operator A is normal if and only if for any invariant sub-
space L its orthogonal complement L⊥ is also invariant.
5. Let A be an operator of a simple structure in a complex space. Prove
that A can always be made normal by an appropriate assignment of a scalar
product in its space.
Now
    U*Ux = U* (Ux) = U* (α_1λ_1x_1 + ... + α_mλ_mx_m)
         = α_1λ_1λ̄_1x_1 + ... + α_mλ_mλ̄_mx_m = α_1x_1 + ... + α_mx_m = x.

then
    (x, y) = Σ_{i=1}^m α_i β̄_i.
By the linearity of U
    Uy = Σ_{i=1}^m β_i y_i.
Therefore again
    (Ux, Uy) = Σ_{i=1}^m α_i β̄_i.
for x ≠ 0.
If an operator H is positive definite, then H⁻¹ is also a positive defi-
nite operator.
Indeed, since H = H*, we have H⁻¹ = (H*)⁻¹ = (H⁻¹)*, i.e.
the operator H⁻¹ is Hermitian. The eigenvalues of H⁻¹ are inverses
of the eigenvalues of H. Therefore they are positive and H⁻¹ is posi-
tive definite.
If H is positive definite and A is a nonsingular operator, then A*HA
and AHA* are positive definite operators.
It is easy to verify that they are Hermitian. By the nonsingularity
of A we have Ax ≠ 0 and A*x ≠ 0 for any x ≠ 0. Therefore
    (A*HAx, x) = (HAx, Ax) > 0,   (AHA*x, x) = (HA*x, A*x) > 0
for x ≠ 0. In particular, it follows that for any nonsingular opera-
Exercises
1. Prove that the set of all unitary operators in a
given unitary space forms a group relative to multiplication.
2. Prove that the set of all Hermitian operators in a given unitary space
forms a group relative to addition.
3. Let an operator A be Hermitian and B positive definite. Prove that the
eigenvalues of the operators BA and B-1A are real.
4. Prove that if A and B are positive definite operators, then all eigenvalues
of the operator BA are positive.
5. Prove that if A and B are commutative positive definite operators, then
the operator BA is also positive definite.
6. Prove that if A is a positive definite operator in a unitary space, then the
function (x, y)_A = (Ax, y) satisfies all the scalar product axioms.
(78.4)
Pr
0
8. Prove that if |λ_k| = ρ_k for all k = 1, 2, ..., m, then the operator is normal.
79. Decomposition of
an arbitrary operator
One of the circumstances determining the
significance of the unitary and the Hermitian operator is the possi-
bility of using them to represent an arbitrary linear operator.
Let A be a linear operator in a unitary space X. We show that it
can always be represented as
    A = H_1 + iH_2,                              (79.1)
where H_1 and H_2 are Hermitian operators. Indeed, if this decomposi-
tion exists, then A* = H_1 − iH_2. But then
    H_1 = (1/2)(A + A*),   H_2 = (1/2i)(A − A*).
It is these formulas that define decomposition (79.1). Since
    H_1H_2 − H_2H_1 = (1/2i)(A*A − AA*),
the operator A is normal if and only if H_1 and H_2 commute.
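Decomposition (79.1) is immediate to compute for a matrix. A minimal sketch (assuming NumPy and an arbitrarily chosen complex matrix) forms H_1 and H_2 and checks that both are Hermitian and that A = H_1 + iH_2:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

    H1 = (A + A.conj().T) / 2
    H2 = (A - A.conj().T) / (2j)

    assert np.allclose(H1, H1.conj().T)        # H1 is Hermitian
    assert np.allclose(H2, H2.conj().T)        # H2 is Hermitian
    assert np.allclose(A, H1 + 1j * H2)
    # A is normal exactly when H1 and H2 commute:
    # np.allclose(H1 @ H2, H2 @ H1) <=> np.allclose(A @ A.conj().T, A.conj().T @ A)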
Exercises
1. Prove that if an operator is normal, then the eigen-
values of the operator H_1 (H_2) of (79.1) are the real (imaginary) parts of the
eigenvalues of the operator A.
It is not hard to establish that the space C with such a scalar product
is unitary. The scalar product for any two vectors from R is pre-
served.
Let A be an operator in R. Construct a new operator A in C equal
to A on R. To do this we set
    A (u + iv) = Au + iAv.
It is easy to verify that they are real. Moreover, it is not hard to see
that if λ = μ + iν, then
    Ax = μx − νy,   Ay = νx + μy.
Therefore the span in R constructed on vectors (80.3) is an invariant
subspace of A. The matrix of the induced operator on that subspace
in basis (80.3) is as follows:
    (  μ   ν )
    ( −ν   μ ).
Hence the characteristic polynomial of the induced operator is (z −
μ)² + ν² or equivalently z² − (λ + λ̄) z + λλ̄. Note that in the
invariant subspace constructed A has no eigenvector for ν ≠ 0.
Thus we have arrived at an important conclusion. Namely:
If the characteristic polynomial of an operator A in the real space R
ha.<; a complex (not real!) root, then that root has in R a corresponding
tu·o-dimensional invariant subspace of A containing no eigenvectors.
This conclusion is as important for the study of operators in a
real space as is the fact of the existence of at least one eigenvector for
the study of operators in a complex space. Choosing in a suitable way
bases in the space R we can reduce the matrix of an operator to a form
resembling in a sense either the diagonal form or the triangular form
or the Jordan canonical form. This method of investigating the op-
erator is employed comparatively rarely, since real canonical forms
lack many merits of complex canonical forms. It is much easier and
more fruitful to investigate the complexification of an operator.
    Σ_{k=1}^n u_ik ū_jk = { 0 if i ≠ j,
                           { 1 if i = j.
Thus the systems of row vectors and column vectors of any unitary
matrix are orthonormal systems.
A real unitary matrix U is called orthogonal. It is defined by the
following relations:
UU'=U'U=E.
Exercises
1. Prove that any complex matrix is unitarily sim-
ilar to a triangular matrix.
2. Let λ_1, λ_2, ..., λ_m be the eigenvalues of a matrix A, each eigenvalue
repeated according to multiplicity. Prove that
    Σ_{i=1}^m |λ_i|² ≤ tr (A*A).                 (81.1)
3. Prove that equality holds in (81.1) if and only if the matrix A is normal.
4. Using the Binet-Cauchy formula prove that for any matrix A the princi-
pal minors of the matrix A*A are nonnegative.
5. Prove that the sum of the squares of the absolute values of all minors of
a unitary matrix in any fixed rows and columns is equal to unity.
6. Prove that any rectangular matrix A can be represented as A = QΛS,
where Q and S are unitary matrices and Λ is a diagonal matrix with nonnegative
elements.
CHAPTER 10
Metric Properties
of an Operator
    x_0 = ξ_1^(0) e_1 + ... + ξ_m^(0) e_m.
Suppose x_k → x_0 and
    x_k = ξ_1^(k) e_1 + ... + ξ_m^(k) e_m.
Then
    ‖Ay_ε‖ = (1/‖x_ε‖) ‖Ax_ε‖ > (1/‖x_ε‖)(‖A‖ − ε) ‖x_ε‖ = ‖A‖ − ε.
Since ‖y_ε‖ = 1, we have
    sup_{‖x‖≤1} ‖Ax‖ ≥ ‖Ay_ε‖ ≥ ‖A‖ − ε.
By virtue of the arbitrariness of ε we get
    sup_{‖x‖≤1} ‖Ax‖ ≥ ‖A‖.                      (82.6)
Now from (82.5) and (82.6) we obtain relation (82.3) which was to
be established.
We shall soon show that the norm of an operator plays an exceptional-
ly important role in introducing a metric in the space of linear operators.
It is the explicit form (82.3) that will be essential.
Exercises
1. Prove that on a bounded closed set of vectors the
supremum and infimum of the norms of the values of a linear operator are
attained.
2. Prove that a linear operator carries any bounded closed set again into
a bounded closed set.
3. Is the assertion of the preceding exercise true if the boundedness requi~
ment of a set is dropped?
4. Prove that in (82.3) the supremum is attained on a set of vectors satisfying
II x II = 1 provided dim X > 0.
5. Let A be an operator in a space X. Prove that A is nonsingular ifand
only if there is a number m > 0 such that II Ax II> m II x II for any x E X.
then ‖Ax‖ = 0 for every vector x whose norm does not exceed uni-
ty. But then, by the linearity of the operator, Ax = 0 for every
x. Hence A = 0. For any operator A and any λ we have
    ‖λA‖ = sup_{‖x‖≤1} ‖λAx‖ = |λ| sup_{‖x‖≤1} ‖Ax‖ = |λ|·‖A‖.
we find
    ‖BA‖ = sup_{‖x‖≤1} ‖(BA)x‖ = sup_{‖x‖≤1} ‖B(Ax)‖ ≤ ‖B‖ sup_{‖x‖≤1} ‖Ax‖ = ‖B‖·‖A‖.
    = sup_{(Vx, Vx)≤1} (AVx, AVx) = sup_{(v, v)≤1} (Av, Av) = ‖A‖₂².
then
    (x, x) = Σ_{i=1}^m |α_i|².
which yields
    ‖A‖₂² = sup_{Σ_j |α_j|² ≤ 1} Σ_{j=1}^m |α_j|² ρ_j².        (83.5)
Exercises
1. Prove that for any eigenvalue λ of an operator A
    |λ| ≤ inf_k ‖A^k‖^{1/k}.
2. Let φ (z) be any polynomial with nonnegative coefficients. Prove that
    ‖φ (A)‖ ≤ φ (‖A‖).
3. Prove that ‖A‖ ≥ ‖A⁻¹‖⁻¹ for any nonsingular operator A. When does
equality hold in the spectral norm case?
We now show that for some vector x satisfying the condition ‖x‖_1 ≤ 1,
‖Ax‖_1 coincides with the right-hand side of the relation ob-
tained.
Let the largest value at the right be reached when j = l. Then all
the inequalities become equations, for x = e_l, for example. So
    ‖A‖_1 = max_{1≤j≤m} Σ_{i=1}^n |a_ij|.
    ≤ sup_{‖x‖_∞≤1} (max_{1≤i≤n} Σ_{j=1}^m |a_ij| |x_j|) ≤ sup_{‖x‖_∞≤1} ((max_{1≤i≤n} Σ_{j=1}^m |a_ij|)
    × (max_j |x_j|)) = (max_{1≤i≤n} Σ_{j=1}^m |a_ij|)(sup_{‖x‖_∞≤1} ‖x‖_∞) = max_{1≤i≤n} Σ_{j=1}^m |a_ij|.
    ≤ (Σ_{i=1}^p Σ_{j=1}^m (Σ_{k=1}^n |b_ik|²)(Σ_{k=1}^n |a_kj|²))^{1/2}
    = ((Σ_{i=1}^p Σ_{k=1}^n |b_ik|²)(Σ_{k=1}^n Σ_{j=1}^m |a_kj|²))^{1/2} = ‖B‖_E ‖A‖_E.
In the general case a Euclidean norm is not subordinate. Its com-
patibility with 2-norms can be proved in the same way as the prop-
erty just considered.
A direct check makes it possible to establish important formulas
for the Euclidean norm. Namely,
    ‖A‖²_E = tr (A*_qe A_qe) = tr (A_qe A*_qe).          (84.2)
We can now draw the following conclusions.
An adjoint matrix in orthonormal bases has a corresponding ad-
joint operator. We transform the chosen bases into orthonormal bases
if we introduce in X andY scalar products in a way similar to (32.1).
Since the trace of a matrix is equal to the sum of its eigenvalues, it
follows from (84.2) that
The square of a Euclidean norm of an operator is equal to the sum
of the squares of its singular values.
If scalar products are introduced in X andY, it is possible to speak
of unitary operators. It is for these unitary operators that it is easy
to show that
A Euclidean norm is not affected by the multiplication of an operator
by any unitary operators.
Indeed, as noted in the exercises to Section 78, singular Yalues re-
main unaffected by the multiplication by unitary operators, and the
Euclidean norm can be expressed only in terms of singular values.
In most applications connected with norms, not so much an ex-
plicit assignment of an operator norm is important as the fact that
properties (83.1) hold. An operator norm can therefore be defined
axiomatically in terms of its matrix. Choose in the spaces, in which
the operators are given, some bases, then each operator will have a
corresponding matrix. We assign to each matrix a number designat-
ed as ‖·‖ and suppose that conditions (83.1) hold as axioms. A num-
ber ‖·‖ will be called a matrix norm. If now each operator is assigned
the norm of its matrix, it is clear that this introduces a norm in
the space of the operators. Conditions (83.1) obviously hold for the
operators, too. The converse is also true. Given fixed bases, any op-
erator norm generates a matrix norm. These matrix norms will be
designated by similar symbols ‖·‖_2, ‖·‖_∞, etc. It is obvious that
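All the norms discussed above are available numerically. A minimal sketch (assuming NumPy; the matrices are arbitrary) computes the 1-, infinity-, spectral and Euclidean norms of a matrix, checks the column-sum and row-sum formulas, formula (84.2), and the multiplicative property from (83.1):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((5, 4))

    norm_1   = np.linalg.norm(A, 1)            # max column sum of absolute values
    norm_inf = np.linalg.norm(A, np.inf)       # max row sum of absolute values
    norm_2   = np.linalg.norm(A, 2)            # spectral norm (largest singular value)
    norm_E   = np.linalg.norm(A, 'fro')        # Euclidean norm

    assert np.isclose(norm_1,   np.abs(A).sum(axis=0).max())
    assert np.isclose(norm_inf, np.abs(A).sum(axis=1).max())
    assert np.isclose(norm_E ** 2, np.trace(A.T @ A))          # formula (84.2)
    assert np.linalg.norm(B @ A, 'fro') <= np.linalg.norm(B, 'fro') * norm_E + 1e-12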
Exercises
1. Prove that, given any norm, for a unit matrix
    ‖E‖ ≥ 1.                                     (84.3)
2. Let λ_1, ..., λ_m be the eigenvalues of a matrix A. Prove that
    inf_B ‖B⁻¹AB‖²_E = Σ_{k=1}^m |λ_k|².
Compare this equation with (81.1).
Exercises
1. Prove that the equation A*Ax = A*y is solvable.
2. Prove that the equation (A*A)^p x = (A*A)^q y is solvable for any positive
integers p and q.
3. Give a geometrical interpretation of the Fredholm alternative and theorem.
(86.2)

    x_0 = Σ_k (b_k / ρ_k) x_k.                   (86.4)
Exercises
1. What is the pseudoinverse of a zero operator?
2. Let X and Y be distinct spaces. Write the matrix of the pseudoinverse of
an operator in singular bases and compare it with (78.4).
3. Let U and V be unitary operators in X and Y respectively. Prove that
    (VAU)⁺ = U*A⁺V*.
4. Prove that there are operators K in X and L in Y such that
    A⁺ = KA* = A*L.
Describe the action of the operators K and L.
5. Prove that the pseudoinverse of an operator is uniquely defined by the
conditions
    AA⁺A = A,
    A⁺ = KA* = A*L.
6. Prove that all pseudosolutions, and they alone, are solutions of the equa-
tion
    Ax = AA⁺y.
7. Give a geometrical interpretation of pseudosolutions.
For any number a less than unity in absolute value we have the
limiting relation
    (1 + a)⁻¹ = lim_{p→∞} a_p,
where
    a_p = Σ_{k=0}^p (−a)^k.

    ‖(E + A)⁻¹‖ ≤ ‖E‖ / (1 − ‖A‖).
For any subordinate norm II E II = 1 and hence in this case
(87 .6)
where
(87. 7)
The number "A is called the condition number of the operator A.
Although it depends on the choice of norm, it can never be very small.
From
E = A- 1A
and further
    ‖x̃ − x‖ ≤ ‖(E + A⁻¹ε_A)⁻¹ − E‖·‖x‖ + ‖(E + A⁻¹ε_A)⁻¹‖·‖A⁻¹‖·‖ε_y‖.
It is assumed that a subordinate norm is used. Taking into account
estimates (87.2) and (87.3), as well as the inequality ‖y‖ ≤ ‖A‖ ×
× ‖x‖, we get
    ‖x̃ − x‖ ≤ (‖A⁻¹‖·‖ε_A‖·‖x‖) / (1 − ‖A⁻¹‖·‖ε_A‖) + (‖A⁻¹‖·‖ε_y‖) / (1 − ‖A⁻¹‖·‖ε_A‖).
In symbols
    δ_x ≤ (κ_A / (1 − κ_A δ_A)) (δ_A + δ_y).     (87.10)
This formula again gives the value of the condition number and
again it is important from the viewpoint of stability that it should
not be too large.
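Estimate (87.10) is easy to try out. A minimal sketch (assuming NumPy; the system and the perturbations are chosen arbitrarily) compares the actual relative error of the solution of a perturbed system with the bound expressed through the condition number:

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 4.001]])               # nearly singular: large condition number
    y = np.array([3., 6.001])
    x = np.linalg.solve(A, y)

    eA = 1e-5 * np.array([[1., 0.], [0., -1.]])
    ey = 1e-5 * np.array([1., -1.])
    x_pert = np.linalg.solve(A + eA, y + ey)

    kappa   = np.linalg.cond(A)                               # condition number, spectral norm
    delta_A = np.linalg.norm(eA, 2) / np.linalg.norm(A, 2)
    delta_y = np.linalg.norm(ey, 2) / np.linalg.norm(y, 2)
    delta_x = np.linalg.norm(x_pert - x, 2) / np.linalg.norm(x, 2)

    bound = kappa / (1 - kappa * delta_A) * (delta_A + delta_y)
    print(delta_x, bound)        # the relative error does not exceed the bound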
Exercises
1. Prove that a condition number expressed in terms
of spectral norm is equal to the ratio of the maximum singular value to the
minimum singular value.
2. There are operators with the smallest condition number. What are these
operators if spectral norm is used?
3. Prove that multiplication of an operator by unitary operators leaves its
condition number expressed in terms of spectral or Euclidean norm unchanged.
4. Prove that for any nonsingular operators A and B
    ‖B⁻¹ − A⁻¹‖ / ‖B⁻¹‖ ≤ κ_A ‖A − B‖ / ‖A‖.
5. What causes the large instability of the system of vectors described in
Section 22? Evaluate the condition number of the operator whose matrix columns
coincide with the coordinates of vectors (22.7).
    x_α = Σ_{k=1}^r (ρ_k b_k / (α + ρ_k²)) x_k.          (88.2)
hence
    |x_α| ≤ η,                                   (88.3)
where
    η² = Σ_{k=1}^r |b_k|² / ρ_k².
We then find
    x_0 − x_α = α Σ_{k=1}^r (b_k / (ρ_k (α + ρ_k²))) x_k,
from which we conclude that
    |x_0 − x_α| ≤ αγ,                            (88.4)
where
    γ² = Σ_{k=1}^r |b_k|² / ρ_k⁶.
Consequently, x_α → x_0 as α → 0.
Thus, for small values of α the vector x_α may serve as an approxima-
tion to the normal pseudosolution x_0.
We expand the vectors x_α and x_0 with respect to singular bases in
a way similar to (86.2). A direct check easily shows that x_α satisfies
    (A*A + αE) x_α = A*y.                        (88.5)
For α > 0 the operator A*A + αE is positive definite and therefore
there is an operator (A*A + αE)⁻¹, i.e.
    x_α = (A*A + αE)⁻¹ A*y.                      (88.6)
On x_α the minimum value of functional (88.1) is attained and there-
fore Φ_α (x_α) ≤ Φ_α (x_0). Taking into account (88.3) and (88.4) yields
    |Ax_α − y|² ≤ |Ax_0 − y|² + α (|x_0|² − |x_α|²)
               ≤ |Ax_0 − y|² + 2αη².             (88.7)
In addition Φ_α (x_α) ≤ Φ_α (0), from which it follows that
    |x_α| ≤ |y| / α^{1/2}.
Together with (88.6) this means that, given any operator A and any
vector y, for α > 0
    |(A*A + αE)⁻¹ A*y| ≤ |y| / α^{1/2}.          (88.8)
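Formula (88.6) can be applied directly. A minimal sketch (assuming NumPy; the rank-deficient matrix is chosen arbitrarily) computes x_α for a decreasing sequence of values of α and watches it approach the normal pseudosolution x_0 = A⁺y:

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 4.],
                  [0., 0.]])                  # rank 1, so A has no inverse
    y = np.array([1., 2., 3.])

    x0 = np.linalg.pinv(A) @ y                # normal pseudosolution
    for alpha in [1e-1, 1e-3, 1e-5]:
        x_a = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ y)
        print(alpha, np.linalg.norm(x_a - x0))   # the distance decreases with alpha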
So.
11-;CI_ Xa. II~ :A (II Axo- y 11 2 -r 2a~ 2) 112 + ),2 (eA II Xo II+ ey).
Exercises
1. Prove that 1J in estimate (88.3) is the norm of the
oormal solution of
A*A (A*A)ll 1.x = A*y.
2. Prove that y in estimate (88.4) is the norm of the normal solution of
(A *A)1.:z: =A *y.
3. Prove that the difference .:z:a. - .:z:~ satisfies
(A*A+aE) (A*A --~E) (.:z:a. -r~)=(~- a) A•y.
4. Compare (88.11) and (87.10). What can be said about estimate (88.11)
in the case of a nonsingular operator A?
5. To what accuracy can a normal pseudosolution be computed if A = 0?
or
    min_{1≤i≤n} |λ_i − λ| ≤ ‖H⁻¹‖₂ ‖ε_B‖₂ ‖H‖₂.
In the first case this inequality also holds and therefore always
    |λ_i − λ| ≤ ν_H ‖ε_B‖₂                       (89.2)
at least for one value of i. Here
    ν_H = ‖H⁻¹‖₂ ‖H‖₂
is the condition number of the matrix H expressed in terms of spec-
tral norm.
The relation obtained means that whatever the perturbation ε_B
of the matrix B is, for any eigenvalue λ of the perturbed matrix
B + ε_B there is an eigenvalue λ_i of B such that we have inequality
(89.2). Notice that we nowhere required that ε_B should be small. Rela-
tion (89.2) may be interpreted somewhat differently. Namely:
The eigenvalues of a perturbed matrix are in the region which is the
union of all disks with centres at λ_i and of radius ν_H ‖ε_B‖₂.
The columns of the matrix H are eigenvectors of the matrix B.
It follows from (89.2) therefore that as a general measure of sensi-
tivity of eigenvalues to the perturbation of a matrix we could ap-
parently take the condition number of the matrix H of eigenvectors
(rather than of the matrix B itselfl). The matrix H satisfying (89.1)
is not unique, since the eigenvectors are defined up to arbitrary fac-
tors. It will be assumed that H is always chosen so that its value v n
is a minimum one. We recall that in any case v H ~ 1.
If B is a normal matrix and, in particular, Hermitian or unitary,
then we may take H to be a unitary matrix. Then v H = 1 and con-
sequently
(89.3)
We consider in somewhat greater detail the case of a Hermitian ma-
trix B with Hermitian perturbation ε_B. Now we can show that:
Every disk with centre at λ_i and of radius ‖ε_B‖₂ contains at least
one eigenvalue of a perturbed matrix.
Indeed, let us agree to consider the matrix B + ε_B as the "original"
matrix and the matrix B = (B + ε_B) − ε_B as a "perturbed" matrix
with perturbation equal to −ε_B. Repeating the above calculations
word for word we obtain a formula similar to (89.3) but with the
eigenvalues of B and B + ε_B reversed. This means that for any eigen-
value λ_i of the "perturbed" matrix B there must be at least one eigen-
value λ of the "original" matrix B + ε_B for which (89.3) holds.
If the eigenvalues of B are simple, then for a sufficiently small
perturbation ε_B all the disks become separated and then each disk
will contain one and only one eigenvalue of the perturbed matrix.
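The bound (89.3) for Hermitian matrices is easy to observe. A minimal sketch (assuming NumPy; the matrices are chosen arbitrarily) perturbs a symmetric matrix by a symmetric ε_B and checks that every perturbed eigenvalue lies within ‖ε_B‖₂ of some unperturbed eigenvalue:

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4))
    B = (M + M.T) / 2                          # a real symmetric (Hermitian) matrix
    E = rng.standard_normal((4, 4))
    eps_B = 1e-2 * (E + E.T) / 2               # Hermitian perturbation

    lam   = np.linalg.eigvalsh(B)
    lam_p = np.linalg.eigvalsh(B + eps_B)
    radius = np.linalg.norm(eps_B, 2)

    for mu in lam_p:
        assert np.min(np.abs(lam - mu)) <= radius + 1e-12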
    ( 2  0  0 )        ( 2  ε  0 )
    ( 0  2  0 )   and  ( 0  2  0 )
    ( 0  0  1 )        ( 0  0  1 )
has three linearly independent eigenvectors, and the second has two,
although their eigenvalues are equal. Theoretically this phenomenon
is due only to the presence of multiple eigenvalues in the original
matrix. But under conditions of approximate assignment of a matrix
it is hard, if not impossible, to decide which eigenvalues are to be con-
sidered multiple and which simple.
Questions concerning the stability of eigenvalues, eigenvectors
and root vectors are among the most complicated in the sections of
algebra connected with computations.
CHAPTER 11

Bilinear and Quadratic Forms
called polar relative to a given quadratic form. The set of all bilinear
forms generating the same quadratic form can be obtained by adding
the polar bilin<>ar form and an arbitrary skew-symmetric form. In
using bilinear forms for the study of the properties of quadratic forms
it suffices therefore to consider only symmetric bilinear forms.
The impossibility of reconstructing a bilinear form from a qua-
dratic form is explained by the fact that the quadratic form gives
no information about the skew-symmetric part of any bilinear form.
Lemma 90.1. Skew-symmetric bilinear forms, and these forms alone,
assume zero values for all coinciding independent variables.
Proof. We have already noted that if <p (x, y) is skew-symmetric,
then <p (x, x) = 0 for every x. If, however, <p (x, x) = 0 for every x,
then from (90.5) it follows that <p (x, y) + <p (y, x) = 0 for all
vectors x and y, i.e. the bilinear form <p (x, y) is skew-symmetric.
A comparison of the properties of a scalar product and relations
(90.1) shows that in a unitary space strictly speaking a scalar product
is not a bilinear form. In a complex space, closely related to a scalar
product are Hermitian bilinear forms. A numerical function <p (x, y)
is said to be a Hermitian bilinear form if for any vectors x, y, z E Kn
and any number a from the complex field P
<p (x -r z, y) = <p (x, y) + <p (z, y), <p (ax, y) = a<p (x, y),
<p (x, y + z) = <p (x, y) + <p (x, z), <p (x, ay) = a<p (x, y).
Here the bar stands for complex conjugation.
Again a sum of two Hermitian bilinear forms, as well as a product
of a Hermitian bilinear form by a number, is a Hermitian bilinear
form. The set of all Hermitian bilinear forms over the complex space
assuming complex values is therefore a complex vector space.
A Hermitian bilinear form is said to be Hermitian-symmetric if for
any vectors x, y E Kn
<p (x, y) = <p (y, x).
If for any x, y E Kn
<p(x, Y)= -<p(y, x),
then the form is called skew-Hermitian. On coinciding vectors the
skew-Hermitian form assumes pure imaginary values and the Her-
mitian-symmetric form assumes real values. Now any Hermitian
bilinear form is uniquely defined by its values when its independent
variables coincide. But instead of (90.3) the following relation is
true:
    φ (x, y) = (1/4){φ (x + y, x + y) − φ (x − y, x − y)
              + iφ (x + iy, x + iy) − iφ (x − iy, x − iy)}.      (90.6)
From this it follows in particular that
Of the Hermitian bilinear forms only the zero form assumes zero values
when all its independent variables coincide.
In this case, too, a Hermitian bilinear form can be uniquely re-
presented as a sum of a Hermitian-symmetric and a skew-Hermitian
form, with
    φ (x, y) = (1/2){φ (x, y) + φ (y, x)} + (1/2){φ (x, y) − φ (y, x)}.   (90.7)
The proofs of the facts for Hermitian forms are much the same as the
corresponding proofs for bilinear forms.
A quadratic Hermitian form is a numerical function q> (x, x) of a
single independent vector variable x E Kn obtained from a Hermitian
bilinear function q> (x, y) by replacing the vector y with x. Unlike
quadratic forms, a Hermitian quadratic form allows a unique recon-
struction of the Hermitian bilinear form that generates it. The re-
construction is carried out according to formula (90.6), and the cor-
responding bilinear form is also called polar relative to the original
quadratic form.
The possibility of reconstructing uniquely a Hermitian bilinear form
from the Hermitian quadratic form generated by it is due to a close
relation of Hermitian-symmetric to skew-Hermitian bilinear forms.
Lemma 90.2. If q> (x, y) is a Hermitian-symmetric (skew-Hermiti-
an) bilinear form, then 'ljl (x, y) = i<p (x, y) is a skew-Ilermitian (Her-
mitian-symmetric) bilinear form.
Proof. Suppose, for example, φ (x, y) is Hermitian-symmetric.
Then for all vectors x and y we have
    ψ (x, y) = iφ (x, y) = φ (ix, y) = φ (y, ix) = −iφ (y, x) = −ψ (y, x),
i.e. ψ (x, y) is skew-Hermitian. The case of a skew-Hermitian form
φ (x, y) can be considered in a similar way.
In what follows we shall more often be concerned with Hermitian
quadratic forms generated by Hermitian-symmetric bilinear forms.
Lemma 90.3. Of the Hermitian bilinear forms only symmetric forms
generate real Hermitian quadratic forms.
Proof. As already noted earlier, Hermitian-symmetric forms as-
sume real values when their independent variables coincide. Suppose
now that a Hermitian quadratic form φ (x, x) assumes only real val-
ues. According to (90.6), for a polar bilinear form φ (x, y) we have
    φ (y, x) = (1/4){φ (y + x, y + x) − φ (y − x, y − x)
              + iφ (y + ix, y + ix) − iφ (y − ix, y − ix)}
            = (1/4){φ (x + y, x + y) − φ (x − y, x − y) + iφ (x − iy, x − iy)
and Lemma 90.3. Since <p (u, u) and <p (v, v) have opposite signs,
polynomial (90.8) will have two real roots. Let a 0 be one of them.
This means that φ (u + a_0v, u + a_0v) = 0. However, the vector
u + a_0v is nonzero by virtue of linear independence of u and v, so
the vanishing on it of the quadratic form is impossible under the
hypothesis of the theorem. This contradiction completes the proof.
It is no chance that we restricted our discussion in Theorem 90.1
to quadratic forms generated only by the real bilinear and the Her-
mitian-symmetric bilinear form. No other bilinear form can lead
to a real quadratic form. Actually it only remains to consider the
bilinear form in the complex space. But it is impossible for such a bi-
linear form to generate a real quadratic form not identically zero.
If for some vector u the quadratic form takes on a nonzero real value
<p (u. u), then <p (au, au) = a 2 <p (u, u) will be a complex number
for any complex a with a nonzero real and a pure imaginary part.
So
For real quadratic forms to have no isotropic vectors it is necessary
and sufficient that they should be strictly of constant signs.
A complex bilinear form always generates a quadratic form with
isotropic vectors provided it is defined on a vector space of dimen-
sion greater than unity. Indeed, assuming that this is not the case
we can always find linearly independent vectors u and v such that
<p (u, u) =I= 0 and q> (v, v) =I= 0. But according to (90.8) the vector
u , av will be isotropic under a suitable choice of complex number a.
A Hermitian bilinear complex form can generate a quadratic form
having no isotropic vectors. It follows from our studies that
For a quadratic form generated by a Hermitian bilinear form to have
no isotropic vectors it is sufficient that the real (or imaginary) part of
the quadratic form should be strictly of constant signs.
Exercises
1. Prove that given any bilinear form φ (x, y), the equa-
tions φ (0, y) = φ (x, 0) = 0 hold for any x, y ∈ K_n.
2. Find the dimension and a basis of a vector space of bilinear forms.
3. Prove that sets of symmetric and skew-symmetric bilinear forms con-
stitute subspaces in the vector space of all bilinear forms.
4. Prove that the space of all bilinear forms is a direct sum of subspaces of
symmetric and skew-symmetric bilinear forms.
5. Prove that the set of all quadratic forms constitutes a vector space. Find
its dimension and a basis.
6. Are the following sets of quadratic forms linear subspaces:
the quadratic forms of constant signs,
the quadratic forms assuming real values,
the quadratic forms having no isotropic vectors,
the quadratic forms for which all vectors of a given set are isotropic?
7. Prove that, given any quadratic form in a normed space, there is a num-
ber α such that for every x
    |φ (x, x)| ≤ α ‖x‖².
8. Let q> (.:z:, r) be a quadratic form strictly of constant signs and 1jJ (.:z:, .:z:)
an arbitrary quadratic form. Prove that there is a number ~such that for every .:z:
I 'iJ (.:z:, .r) I ~ ~41' (.:z:, .:z:).
9. Prove tnat a quadratic form is not strictly of constant signs if and only
if the set of isotropic vectors and the zero vector form a linear subspace.
10. Consider Exercises 1 to 9 for Hermitian bilinear and quadratic forms.
Do all the assertions remain valid?
11. Suppose that in a complex space Kn some subspace L consists only of
isotropic vectors of a Hermitian bilinear form q> (.:z:, y) and a zero vector. Prove
that q> (u, v) = 0 for any vectors u, v E L.
By (63.3)
    x_e = Px_f,   y_q = Qy_t,                    (91.3)
and therefore it follows from (91.2) that
    φ (x, y) = x_e′G_eq y_q = x_f′P′G_eq Qy_t.
But on the other hand
    φ (x, y) = x_f′G_ft y_t.
Consequently
    G_ft = P′G_eq Q.                             (91.4)
Since P and Q are nonsingular, in accordance with the terminology
introduced in Section 64 we shall call the matrices G_ft and G_eq equiv-
alent. As shown earlier, equivalent matrices of the same size, and
only such matrices, have the same rank. This means that the rank of
the matrix of a bilinear form is independent of the choice of bases
and is a characteristic of the form itself. We shall call it the rank
of the bilinear form. A bilinear form is said to be nonsingular if so is
its matrix. A characteristic of a bilinear form is also the difference
between the dimenRion of the space Kn and the rank of the form. We
shall call it the nullity of the bilinear form.
It follows from the results of Section 64 that all matrices of the
same rank are equivalent to a diagonal matrix with elements 0 and 1.
In terms of bilinear forms this means that for an arbitrary form of
rank r we can always find bases f_1, f_2, ..., f_n and t_1, t_2, ..., t_n
such that the form is of the simplest type. That is, if
    x = Σ_{i=1}^n τ_i f_i,   y = Σ_{j=1}^n ν_j t_j,
then
    φ (x, y) = Σ_{i=1}^r τ_i ν_i.
A separate choice of bases for each variable of the bilinear form is
made fairly rarely. It is more usual to choose a common basis. Let
e_1, e_2, ..., e_n be some basis of K_n and
    x = Σ_{i=1}^n ξ_i e_i,   y = Σ_{j=1}^n η_j e_j.
In this case, as in (91.1), we obtain the following representation of a
bilinear form:
    φ (x, y) = Σ_{i=1}^n Σ_{j=1}^n g_ij^(e) ξ_i η_j,
or in matrix notation
    φ (x, y) = x_e′G_e y_e.                      (91.5)
Here G_e is a matrix with elements g_ij^(e) = φ (e_i, e_j). It is G_e that
will throughout be called the matrix of a bilinear form. If again P
is a coordinate transformation matrix for a change from e_1, e_2, ..., e_n
to f_1, f_2, ..., f_n, then according to (91.4) matrices G_e and G_f of the
same bilinear form φ (x, y) will be related by
    G_f = P′G_e P.                               (91.6)
The matrices Ge and G1 related by (91.6), with P nonsingular, are
called congruent. Congruent matrices are always equivalent. In gen-
eral the converse is not true, of course.
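The congruence relation (91.6) is straightforward to compute. A minimal sketch (assuming NumPy; the form and the coordinate transformation matrix are chosen arbitrarily) passes to a new basis and checks that the rank of the matrix of a bilinear form does not change and that the value of the form does not depend on the basis:

    import numpy as np

    G_e = np.array([[1., 2., 0.],
                    [2., 0., 1.],
                    [0., 1., 0.]])             # matrix of a bilinear form in the basis e
    P = np.array([[1., 1., 0.],
                  [0., 1., 2.],
                  [0., 0., 1.]])               # nonsingular coordinate transformation matrix

    G_f = P.T @ G_e @ P                        # matrix of the same form in the new basis
    assert np.linalg.matrix_rank(G_f) == np.linalg.matrix_rank(G_e)

    x_f = np.array([1., -2., 3.]); y_f = np.array([0., 1., 1.])
    x_e = P @ x_f; y_e = P @ y_f
    assert np.isclose(x_e @ G_e @ y_e, x_f @ G_f @ y_f)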
What was said about bilinear forms carries over with slight
changes to Hermitian bilinear forms. Every Hermitian form can be
represented uniquely in matrix notation
    φ (x, y) = x_e′G_eq ȳ_q,
with e_1, e_2, ..., e_n and q_1, q_2, ..., q_n fixed. Under a change to f_1,
f_2, ..., f_n and t_1, t_2, ..., t_n, instead of (91.4) we have
    G_ft = P′G_eq Q̄.
If the independent variables of the Hermitian bilinear form are given
in a single basis, then the matrix notation of the form is similar
to (91.5). That is,
    φ (x, y) = x_e′G_e ȳ_e.                      (91.7)
Under a change to a new basis the matrices of the form are related by
    G_f = P′G_e P̄
and we shall say that they are Hermitian-congruent.
Now we can establish a relation between the type of a bilinear form
and the type of its matrix. If the form is symmetric, then for any
basis e_1, e_2, ..., e_n
    g_ij^(e) = φ (e_i, e_j) = φ (e_j, e_i) = g_ji^(e),
i.e. G_e = G_e′ and the matrix G_e of the form φ (x, y) is symmetric.
If, however, the form is skew-symmetric, then
    g_ij^(e) = φ (e_i, e_j) = −φ (e_j, e_i) = −g_ji^(e),
i.e. G_e = −G_e′. In this case the matrix G_e is also called skew-sym-
metric.
The converse is also true. If in some basis the matrix of a form is
symmetric (skew-symmetric), then so is the bilinear form generating
q> (y, x) = y;Gei'. = (y;GeXe)' = x;G;y. = x;G:ie = x;G.Y. = q> (x, y).
For the case Ge = -G: we find
q> (y, x) =y;Geie = (y;Gexe)' = x;G;ye =x;G:y.= -x;GeYe= -cp (x, y).
The matrix of a zero bilinear form consists only of zero elements,
i.e. is a zero matrix. It is the only matrix that is simultaneously sym-
metric and skew-symmetric, as is the zero form.
We have already noted that there is a very close connection between
symmetric bilinear and quadratic forms. It is especially obvious on
the matrix level. For a bilinear form φ (x, y) the matrix relation
(91.5) holds. For the corresponding quadratic form we have
    φ (x, x) = x_e′G_e x_e.                      (91.8)
For a fixed basis e_1, e_2, ..., e_n, given any matrix G_e, (91.8) defines
some quadratic form. The matrix G_e in (91.8) is now called not the
matrix of a bilinear form but the matrix of a quadratic form.
While for bilinear forms there is a 1-1 correspondence between the
forms and their matrices given a fixed basis in Kn, there is no longer
such a correspondence now. Every quadratic form can be given by
the entire set of its matrices. This set contains only one symmetric
matrix and the difference between any two matrices of a given set is
a skew-symmetric matrix.
Thus any ordinary quadratic form can always be given by a sym-
metric matrix. Changing to a different basis affects the matrices of the
quadratic form according to (91.6). We therefore conclude again that
    A = (1/2)(A + A*) + (1/2)(A − A*).
If A is the matrix of a bilinear form, then the first terms of the right-
hand sides are the matrices of the symmetric parts of the bilinear form
and the second terms are the matrices of the skew-symmetric parts
of the same form.
We shall often carry over to matrices without comment the ter-
minology introduced for bilinear and quadralicforms. For example, we
shall call a matrix positive defimte, meaning by this that it is a matrix
of a positive definite form and so on.
One of the major problems connected with the bilinear form is that
of determining the simplest form its matrix can be reduced to by
changing the basis and finding the appropriate basis. This problem
294 Bilinear and Quadratic Forms [Ch. 11
Exercises
1. Prove that the determinant of a Hermitian matrix
is a real number.
2. What kind of number is the determinant of a skew-Hermitian matrix?
3. Prove that the rank of a skew-symmetric matrix is an even number.
4. Bilinear forms φ (x, y) and φ (y, x) are in general different. What can be
said about their matrices?
5. Prove that the rank of a sum of bilinear forms does not exceed the sum
of the ranks of the summands.
6. Prove that every bilinear form of rank r can be represented as a sum of r
bilinear forms of rank 1.
7. Prove that every bilinear form φ (x, y) of rank 1 can be represented as
    φ (x, y) = φ (x, a) · φ (b, y)
for some vectors a and b. Is this representation unique?
(92.1)
Multiplying A on the left by a matrix P' does not affect the first row
of A and makes zero all off-diagonal elements of the first column of
a matrix P' A. Multiplying P' A on the right by P does not affect the
first column of P'A.
Note one important fact. All matrix minors in the upper left-hand
corner of a matrix will be called principal minors. Since the matrix P
is right triangular and each of its diagonal elements is equal to unity,
of all minors in the first r columns only the principal minor is non-
zero, it is equal to unity. All principal minors will therefore coin-
cide in the matrices A and C. Indeed, using the Binet-Cauchy for-
mula we find
    C (1 2 ... r | 1 2 ... r)
        = Σ_{1≤k_1<k_2<...<k_r≤n} P′ (1 2 ... r | k_1 k_2 ... k_r) · AP (k_1 k_2 ... k_r | 1 2 ... r)
        = AP (1 2 ... r | 1 2 ... r)
        = Σ_{1≤k_1<k_2<...<k_r≤n} A (1 2 ... r | k_1 k_2 ... k_r) · P (k_1 k_2 ... k_r | 1 2 ... r)
        = A (1 2 ... r | 1 2 ... r).
We shall use this remark later on.
B. In a matrix A the element a_11 is 0, but some diagonal element a_jj is other
than 0, where j > 1. There is a nonsingular matrix P such that for
the matrix C = P′AP the element c_11 = a_jj is other than 0. The ma-
trix P differs from a unit matrix only in the four elements at the inter-
sections of rows and columns with indices 1, j. In those positions P
is of the form (0 1; 1 0). Multiplying A on the right by P interchanges in A
(92.3)
• - { - at2• j = 2,
(/I- 0, j :j=.2,
1 au
• j>2.
Multiplying A on the left by P₁' does not affect the first two rows and
the second column of A and makes zero all the elements of the first
column of P₁'A except the first two. Multiplying P₁'A on the left by P₂'
does not affect the first two rows and the first column of P₁'A and
makes zero all the elements of the second column of P'A except the
first two. Multiplying P'A on the right by P does not affect the first
two columns of P'A.
G. Suppose a matrix A has for some partition into blocks the struc-
ture
$$A=\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix}. \eqno(92.4)$$
Then for any nonsingular matrix P₂₂ of the appropriate size the matrix
$$C=\begin{pmatrix}A_{11}&A_{12}P_{22}\\0&P_{22}'A_{22}P_{22}\end{pmatrix}$$
is congruent with A. Moreover, C = P'AP, where
$$P=\begin{pmatrix}E&0\\0&P_{22}\end{pmatrix}.$$
A direct check of all the assertions made in the descriptions of the
auxiliary steps presents no particular difficulty, and it is left for
the reader as an exercise to show their validity.
The method as a whole is carried out as follows. At the first basic
step the matrix A is reduced to the form (92.4), where A₁₁ is a non-
singular 1 × 1 or 2 × 2 matrix. If a matrix A_k, k ≥ 1, is of the
form (92.4), then at the next basic step the matrix in the lower right-
hand corner is also reduced to the form (92.4) and a general congru-
ence transformation is carried out according to step G. The matrix
A_{k+1} can again be represented in the form (92.4) but the block in the
upper left-hand corner will be not only nonsingular for it but will
also have a greater size than for the matrix A_k. The process is repeat-
ed until at some step in the matrix A_k there appears in (92.4) a
zero block in the lower right-hand corner or the size of the block in
the upper left-hand corner is n × n. The resulting transformation
matrix is a left-to-right product of the transformation matrices of
all the steps.
The form of A_k depends on whether the matrix A is skew-symmet-
ric or not. So does the composition of the basic steps and the aux-
iliary steps.
Whatever the structure of a basic step, its aim is to obtain the next
portion of zeros in the matrix to be transformed. If the original ma-
trix is not skew-symmetric, then zeros are always obtained using an
auxiliary step A, and steps B to C are necessary only for it to be pre-
pared. But if the original matrix is skew-symmetric, then zeros are
obtained using step F, and it is step D that is preparatory. We de-
scribe the basic step of the method also in terms of the transformation
of a matrix A and we begin with a nonskew-symmetric matrix A.
At the first basic step the matrix to be transformed is nonskew-
symmetric. If the element a 11 :/= 0 and all off-diagonal elements of
the first column are zero, then nothing changes and we assume that
the basic step has been carried out. We take a unit matrix as a trans-
formation matrix P. In general, however, we carry out the fust of
the auxiliary steps A to C that can be made. If this happens to be
step B or C, then after it we must carry out step A or both steps, B
and A. We take as a transformation matrix P a left-to-right product
of all transformation matrices of the actually made auxiliary steps.
As a result of the first basic step, in the transformed matrix A1 all
the off-diagonal elements of the first column will be zero, i.e. A1
will have a block structure of the form (92.4).
The difference of all the other steps from the first is due to the fact
that the matrix to be transformed may turn out to be skew-sym-
metric. If it is not, then the basic step to be made next does not differ
in anything from the first. If however, the matrix to be transformed
is skew-symmetric, then under any congruence transformation it
remains skew-symmetric and it is impossible to obtain a nonzero
element in the upper left-hand corner using this matrix alone. A way
out is based on transforming an extended lower diagonal block.
Until a skew-symmetric matrix is found, the block in the upper
left-hand corner of (92.4) for matrices A 11 will be a right triangular
matrix with nonzero diagonal elements. If the elements in positions
(1, 2) and (2, 1) of the skew-symmetric matrix are nonzero in the
lower right-hand rorner, then for the matrix A,. which is the next to
0--(o).
P'AP= ( -- (92.6)
M: N)
P'AP= ( ··o-to (92.7)
For matrices of the canonical forms (92.5) and (92.6) we may per-
form yet another congruence transformation with a diagonal matrix
and have the nonzero elements determining the nonsingularity of the
block M equal either +1 or -1. Such a canonical form of a matrix
and the corresponding basis are called normal. It is clear that mul-
tiplying on the right (left) by a diagonal matrix results in multi-
plying the columns (rows) by the diagonal elements of the transfor-
mation matrix. We again describe the transformation in terms of
the auxiliary step with a matrix A.
H. A real nonskew-symmetric matrix A of rank r has the canonical
form (92.5). There is a real diagonal matrix P such that the nonzero
diagonal elements c_jj of a matrix C = P'AP are equal to sgn a_jj,
with
$$p_{jj}=\begin{cases}(a_{jj}\,\mathrm{sgn}\,a_{jj})^{-1/2},& j\le r,\\ 1,& j>r.\end{cases}$$
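As a small illustration of step H, the sketch below (an illustrative numpy fragment, not the book's code) scales a matrix already in the canonical form (92.5) by a diagonal congruence so that its nonzero diagonal elements become +1 or −1.

```python
import numpy as np

def normalize_diagonal(C, r):
    """Step H as a sketch: scale a canonical matrix C of rank r by a diagonal
    congruence so that its first r diagonal elements become +1 or -1.
    Assumes c_jj != 0 for j < r (the nonskew-symmetric canonical form)."""
    n = C.shape[0]
    p = np.ones(n)
    for j in range(r):
        p[j] = 1.0 / np.sqrt(abs(C[j, j]))   # (a_jj * sgn a_jj)^(-1/2)
    P = np.diag(p)
    return P.T @ C @ P                        # diagonal elements are now sgn c_jj

C = np.array([[4.0, 2.0, 1.0],
              [0.0, -9.0, 5.0],
              [0.0, 0.0, 0.0]])
print(normalize_diagonal(C, 2))
```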
A real (complex) skew-symmetric matrix A of rank r has the canon-
ical form (92.6). There is a real (complex) diagonal matrix P such
that the nonzero upper off-diagonal elements of the matrix C = P'AP
equal +1 and the nonzero lower off-diagonal elements equal −1,
with
$$p_{jj}=\begin{cases}1,& j\ \text{odd},\\ a_{j-1,j}^{-1},& j\ \text{even}.\end{cases}$$
Exercises
1. Prove that if reduction to canonical form using
a matrix P is carried out according to the above method, then det P = ±1.
2. What does the matrix equation
(92.8)
(93.2)
$$u_{ij}=\frac{a_{ij}-\sum\limits_{p=1}^{i-1}l_{ip}d_{pp}u_{pj}}{d_{ii}},\qquad
l_{ji}=\frac{a_{ji}-\sum\limits_{p=1}^{i-1}l_{jp}d_{pp}u_{pi}}{d_{ii}},\qquad i\ge 1,\ j>i. \eqno(93.3)$$
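The recurrences (93.3) translate almost directly into code. The following numpy sketch (an illustration only; it assumes all leading principal minors of A are nonzero) computes the factors L, D and U step by step.

```python
import numpy as np

def ldu_decomposition(A):
    """A sketch of the recurrences (93.3): A = L D U with L left triangular,
    U right triangular, both with unit diagonal, and D diagonal.
    Assumes all leading principal minors of A are nonzero."""
    n = A.shape[0]
    L, U, D = np.eye(n), np.eye(n), np.zeros((n, n))
    for i in range(n):
        D[i, i] = A[i, i] - L[i, :i] @ D[:i, :i] @ U[:i, i]
        for j in range(i + 1, n):
            U[i, j] = (A[i, j] - L[i, :i] @ D[:i, :i] @ U[:i, j]) / D[i, i]
            L[j, i] = (A[j, i] - L[j, :i] @ D[:i, :i] @ U[:i, i]) / D[i, i]
    return L, D, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
L, D, U = ldu_decomposition(A)
print(np.allclose(L @ D @ U, A))   # True
```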
We apply to (93.1) the Binet-Cauchy formula. Recall that among
the minors of the left triangular matrix L in the first r rows only the
principal minor is nonzero, it is equal to unity. A similar assertion
holds for the matrix U, if the rows are replaced by columns, of course.
Therefore
$$A\begin{pmatrix}1&2&\dots&k\\1&2&\dots&k\end{pmatrix}=d_{11}d_{22}\dots d_{kk}.$$
from the canonical basis to the original basis, which are also right
triangular.
In the case of a symmetric matrix the described process of decom-
position is closely related to the so-called Jacobi algorithm for
transforming a quadratic form to canonical form. The only difference
is that in the Jacobi algorithm we find the matrix S⁻¹ instead of S.
Notice that S is much easier to find than S⁻¹.
Congruence transformations with a right triangular matrix are
among the simplest, but still sufficiently general transformations to
be applied to a wide class of matrices. Of certain interest therefore is
a description of the class of matrices that can be reduced to canonical
form using transformations wilh a right triangular matrix.
Lemma 93.1. If a rectangular matrix A is representable in the
block form
(93.8)
where the corresponding blocks have the same size as in (93.8). Then
P= (L ~~.:.~--~B; ·R:).
1
( ~~~-~~~-~--~-~~~=-~-~- ~~-:-_?.~) .
1
P' AP = ,
Exercises
1. Prove that if decompositions (93.1), (93.5) and
(93.6) exist, then they are unique.
2. Prove that if all the minors of a matrix A in the lower right-hand corner
(except perhaps the minor of the highest order) are nonzero, then there is a
unique decomposition A = LDU, where L is a right triangular matrix, U is a left
triangular matrix, each with unit diagonal elements, and D is a diagonal matrix.
3. Prove that for the elements d_ii of the matrix D of Exercise 2
$$d_{ii}=A\begin{pmatrix}i,&i+1,&\dots,&n\\ i,&i+1,&\dots,&n\end{pmatrix}\Big/ A\begin{pmatrix}i+1,&i+2,&\dots,&n\\ i+1,&i+2,&\dots,&n\end{pmatrix},\qquad i<n.$$
4. Into what triangular factors can a matrix be factored if its minors in the
lower left-hand (upper right-hand) corner are nonzero?
5. Suppose that for the elements a_ij of a matrix A
a_ij = 0 for j − i > k and for j − i < l, (93.11)
given some numbers l < k. Such a matrix is called a band matrix. Prove that
if for a band matrix A decomposition (93.1) holds, then
l_ij = 0 for j − i < l and u_ij = 0 for j − i > k.
6. A matrix A is said to be tridiagonal if it satisfies conditions (93.11) for
k = 1 and l = −1. What form do formulas (93.3) and (93.7) take for a tridiagonal
matrix?
7. A matrix A is said to be right (left) almost triangular if it satisfies condi-
tions (93.11) for k = n and l = −1 (k = 1 and l = −n). What form do formulas
(93.3) take for almost triangular matrices?
8. What number of arithmetical operations is required for the various forms
of matrices to obtain decompositions of the type (93.1)?
9. How are decompositions (93.1), (93.5) and (93.6) to be applied to solve
systems of linear algebraic equations?
Similarly
$$\det\begin{pmatrix}c_{11}&\dots&c_{1n}\\ \dots&\dots&\dots\\ c_{n1}&\dots&c_{nn}\end{pmatrix}\ne 0. \eqno(94.3)$$
It follows that
$$z_1(a)=\dots=z_s(a)=0. \eqno(94.5)$$
On the other hand, according to the choice of the numbers a₁,
a₂, ..., aₙ we have
$$A\begin{pmatrix}1&2&\dots&s\\1&2&\dots&s\end{pmatrix}$$
cal form using the transformation defined by formulas (93. 7). Under
the hypotheses of the theorem and according to formulas (93.4)
all the coefficients of the canonical form will be positive, i.e. the
quadratic form is positive definite.
Corollary. For a quadratic form to be negative definite it is necessary
and sufficient that all principal minors of odd order should be negative
and all principal minors of even order should be positive.
The proof follows from Sylvester's criterion and from the fact
that if A is the matrix of a negative definite quadratic form, then −A
is the matrix of a positive definite quadratic form.
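Both criteria are easy to test numerically. The sketch below (an illustrative numpy fragment, not part of the book) checks positive definiteness through the leading principal minors, and negative definiteness through the corollary applied to −A.

```python
import numpy as np

def is_positive_definite(A):
    """Sylvester's criterion (a sketch): a symmetric matrix is positive
    definite iff all its leading principal minors are positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

def is_negative_definite(A):
    """Corollary: odd-order principal minors negative, even-order positive,
    which is the same as -A being positive definite."""
    return is_positive_definite(-A)

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(is_positive_definite(A), is_negative_definite(-A))   # True True
```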
Theorem 94.3 (Jacobi's criterion). For a quadratic form to be positive
definite it is necessary and sufficient that all coefficients of the character-
istic polynomial of the matrix of the form should be nonzero and have
alternating signs.
Proof. Necessity. As already noted, a transformation of variables
with an orthogonal matrix can reduce a given quadratic form to
canonical form, whose coefficients are the eigenvalues λ₁, λ₂, ..., λₙ
of the matrix of the form. Under the hypotheses of the theorem the
eigenvalues must be positive. The characteristic polynomial f(λ)
equals
$$f(\lambda)=(\lambda-\lambda_1)(\lambda-\lambda_2)\dots(\lambda-\lambda_n)=\lambda^n+a_{n-1}\lambda^{n-1}+\dots+a_1\lambda+a_0$$
and all of its coefficients are nonzero and have alternating signs,
which is immediate from Vieta's formulas for the coefficients a_i.
Sufficiency. Let the coefficients of the characteristic polynomial
be nonzero and have alternating signs. The roots of this polynomial
will be real as the eigenvalues of a symmetric matrix and it remains
to show that they are positive. Suppose this statement has been
proved for all polynomials of degree n − 1. Since all the coefficients of f′(λ)
are nonzero and have alternating signs, under the assumption f′(λ)
has n − 1 positive roots. It is known from mathematical
analysis that if a polynomial has only real roots, then they are
separated by the roots of its derivative. Therefore f(λ) has at least
n − 1 positive roots. The last root will also be positive since the
product of roots is positive.
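The same test can be run through the characteristic polynomial, as Theorem 94.3 suggests. In the sketch below np.poly is used to obtain the coefficients of the characteristic polynomial of a symmetric matrix; the function name and the example matrices are illustrative assumptions.

```python
import numpy as np

def jacobi_criterion(A):
    """A sketch of Theorem 94.3: a symmetric matrix is positive definite iff
    all coefficients of its characteristic polynomial are nonzero and alternate
    in sign.  np.poly(A) returns the coefficients of det(lambda*E - A)."""
    c = np.poly(A)                    # c[0] = 1, ..., c[n] = (-1)^n det A
    if np.any(np.isclose(c, 0.0)):
        return False
    signs = np.sign(c)
    return all(signs[k] == -signs[k + 1] for k in range(len(c) - 1))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
B = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite
print(jacobi_criterion(A), jacobi_criterion(B))   # True False
```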
Criteria for nonnegative and nonpositive quadratic forms are
much more complicated, and this is mainly due to the fact that in
these cases the matrices of the forms are singular. One of the main
ways of investigating the constancy of signs of a quadratic form
involves reducing its matrix to the form (93.8), (93.9)
and studying that form. Matrices of constant signs being closely re-
lated, we restrict our consideration to nonnegative matrices only.
We shall say that a matrix H is a permutation matrix if in each of
its rows and in each of its columns there is only one nonzero element
and all the nonzero elements equal unity. It is clear that multiply-
ing an arbitrary matrix A on the right by a permutation matrix H
$$\geqslant\left\{(P^{-1}H)\begin{pmatrix}1&2&\dots&s\\1&2&\dots&s\end{pmatrix}\right\}^2>0.$$
Exercises
1. Prove that if all principal minors of a real sym-
metric or a complex Hermitian matrix are nonzero, then the number of its
positive and negative eigenvalues coincides respectively with that of
positive and negative terms of sequence (93.4).
2. Prove that if a matrix is positive definite, then any diagonal minor is
pGsitive.
3. Prove that a symmetric matrix of rank r always has at least one diagonal
minor of order r not equal to zero.
4. Prove that the maximum element of a positive definite matrix is on the
principal diagonal.
5. Prove that A is a positive definite matrix if for every i
$$a_{ii}>\sum_{j\ne i}|a_{ij}|.$$
6. Prove that for any symmetric matrix A of rank r there is a permutation
matrix H such that among the first r principal minors of the matrix H'AH there
are no two adjacent zero minors and the minor of order r is nonzero.
7. Prove that the matrix H'AH of Exercise 6 can be represented as H'AH =
= S'DS, where S is a right triangular matrix with unit diagonal elements and D
is a block-diagonal matrix with 1 × 1 and 2 × 2 blocks.
8. Prove that any nonnegative matrix of rank r can be represented as a sum
of r nonnegative matrices of rank 1.
9. Let A and B be positive definite matrices with elements a_ij and b_ij.
Prove that a matrix C with elements c_ij = a_ij b_ij is also positive definite.
= (Al
'Xo
) + lb. ll-tAI. .:z:,,l (Al l)
(AI, l) '
= (b ' l)
•
$$x^*=\frac{1}{2}(x+x')$$
allows us to call the point x* the centre of symmetry of the hypersur-
face. If on (95.2) there is at least one point of Rn, then the centre of
symmetry is said to be real. Otherwise it is called imaginary.
Now let x* be the centre of symmetry, i.e. for any x the left-hand
side of (95.2) assumes the same values at the points x and x′. Hence
(Ax, x) − 2(b, x) + c = (Ax′, x′) − 2(b, x′) + c.
According to (95.8) and (95.9) this is possible only if for any x
(Ax* − b, x − x*) ≡ 0.
But the last identity holds if and only if Ax* − b = 0, i.e. if x*
is a solution of system (95.7). Notice that here we have never assumed
either the nonsingularity of the matrix A or the presence of any
other of its properties besides symmetry. Therefore:
For a system Ax = b to have a solution it is necessary and sufficient
that hypersurface (95.2) should have a centre of symmetry. The set of all
solutions coincides with that of all centres of symmetry.
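The statement can be illustrated numerically: for a symmetric matrix the left-hand side of (95.2) is symmetric about the solution of Ax = b. In the sketch below the matrix A, the vector b and the displacement h are arbitrary assumptions.

```python
import numpy as np

# A hypothetical symmetric matrix and right-hand side.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def F(x):
    """Left-hand side of the hypersurface equation without the constant:
    (Ax, x) - 2(b, x)."""
    return x @ A @ x - 2 * b @ x

x_star = np.linalg.solve(A, b)        # the centre of symmetry

# The hypersurface value is symmetric about x*: F(x* + h) == F(x* - h).
h = np.array([0.3, -0.7])
print(np.isclose(F(x_star + h), F(x_star - h)))   # True
```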
Thus there emerges a far-reaching connection between systems of
linear algebraic equations and second-degree hypersurfaces. It is
widely used in constructing a host of computational algorithms.
Construction of a system of diametrical hyperplanes is central to
a large group of methods among the so-called methods of conjugate
directions. These will be discussed in the last chapter.
In general investigation of second-degree hypersurfaces may be
based on reducing them to canonical form in much the same way
as quadratic forms are. But in addition to linear nonsingular trans-
formations of variables, translations are required.
Consider any transformation of variables x = Py reducing a quad-
ratic form (Ax, x) to normal form. In variables y 1 , y 2 , • • • , Yn the
equation of a hypersurface will have the form
$$y_1^2+\dots+y_k^2-y_{k+1}^2-\dots-y_r^2-2d_1y_1-\dots-2d_ry_r-2d_{r+1}y_{r+1}-\dots-2d_ny_n+c=0.$$
Now we translate the variables by the formulas
$$z_i=\begin{cases}y_i-d_i,&1\le i\le k,\\ y_i+d_i,&k+1\le i\le r,\\ y_i,&r+1\le i\le n.\end{cases}$$
Exercises
1. Let A be a positive definite matrix. Prove that on
the solution of the system Ax = b the expression (Ax, x) − 2(b, x) reaches
its minimum.
2. Let A be a positive definite matrix. Prove that on the straight line (95.3)
the expression (Ax, x) - 2 (b, x) reaches its minimum for the value oft in (95.5).
3. Prove that for any direction to be nonasymptotic for hypersurface (95.2)
it is necessary and sufficient that the quadratic form (Ax, x) should be either
positive definite or negative definite.
4. What symmetry property has a diametrical hyperplane conjugate to
a direction l if l is an eigenvector of a matrix A corresponding to a nonzero
eigenvalue?
5. Prove that the system Ax = b has no solution if and only if hypersur-
face (95.2) can be reduced to canonical form (95.10).
$$a=\sqrt{-\frac{a_0}{\lambda_1}},\qquad b=\sqrt{-\frac{a_0}{\lambda_2}}. \eqno(96.2)$$
Under the hypothesis a and b are real numbers, so (96.1) is equiva-
lent to
(96.3)
$$=\left(x^2\Bigl(1-\frac{b^2}{a^2}\Bigr)+2xc+a^2\right)^{1/2}
=\left(\frac{x^2c^2}{a^2}+2xc+a^2\right)^{1/2}
=\left(\Bigl(\frac{c}{a}\,x+a\Bigr)^2\right)^{1/2}=\frac{c}{a}\,x+a.$$
Finally we have
$$\rho(M,F_1)+\rho(M,F_2)=\frac{c}{a}\,x+a-\frac{c}{a}\,x+a=2a.$$
1.2. The number a₀ is not zero; the numbers λ₁, λ₂ and a₀ have the
same sign. Let
$$a=\sqrt{\frac{a_0}{\lambda_1}}, \eqno(96.5)$$
$$b=\sqrt{\frac{a_0}{\lambda_2}}. \eqno(96.6)$$
Then (96.1) is equivalent to
$$\frac{x^2}{a^2}+\frac{y^2}{b^2}+1=0. \eqno(96.7)$$
The points F 1 and F 2 with coordinates (-c, 0) and (+c, 0) are called
the foci of the hyperbola.
Theorem 96.2. The absolute value of the difference of the distances
from any point of a hyperbola to its foci is a constant value equal to 2a.
Proof. For any point M (x, y) of a hyperbola
$$y^2=-b^2+\frac{b^2x^2}{a^2}.$$
For the same point
$$\rho(M,F_2)=((x-c)^2+y^2)^{1/2}=\left(x^2-2xc+c^2-b^2+\frac{b^2x^2}{a^2}\right)^{1/2}.$$
Further
$$\rho(M,F_1)=((x+c)^2+y^2)^{1/2}=\left(x^2+2xc+c^2-b^2+\frac{b^2x^2}{a^2}\right)^{1/2}$$
$$=\left(x^2\Bigl(1+\frac{b^2}{a^2}\Bigr)+2xc+a^2\right)^{1/2}
=\left(\frac{x^2c^2}{a^2}+2xc+a^2\right)^{1/2}
=\left(\Bigl(\frac{c}{a}\,x+a\Bigr)^2\right)^{1/2}=\left|\frac{c}{a}\,x+a\right|.$$
For all points of a hyperbola we have |x| ≥ a and c/a > 1. Therefore
$$\rho(M,F_2)=\begin{cases}\dfrac{c}{a}\,x-a&\text{for }x>0,\\[2mm] -\dfrac{c}{a}\,x+a&\text{for }x<0,\end{cases}\qquad
\rho(M,F_1)=\begin{cases}\dfrac{c}{a}\,x+a&\text{for }x>0,\\[2mm] -\dfrac{c}{a}\,x-a&\text{for }x<0.\end{cases}$$
Finally
$$|\rho(M,F_1)-\rho(M,F_2)|=2a.$$
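Theorem 96.2 is easy to confirm numerically. In the following sketch the semi-axes a and b are chosen arbitrarily and several points of the hyperbola are tested; it is an illustration, not part of the text.

```python
import numpy as np

# Numerical check of Theorem 96.2 for the hyperbola x^2/a^2 - y^2/b^2 = 1.
a, b = 3.0, 2.0
c = np.sqrt(a**2 + b**2)
F1, F2 = np.array([-c, 0.0]), np.array([c, 0.0])

for x in [3.0, 5.0, -4.0, -10.0]:
    y = b * np.sqrt(x**2 / a**2 - 1.0)          # a point M(x, y) of the hyperbola
    M = np.array([x, y])
    diff = abs(np.linalg.norm(M - F1) - np.linalg.norm(M - F2))
    print(np.isclose(diff, 2 * a))               # True for every point
```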
Consider the part of the hyperbola that is in the first quarter. For
that part x ~ a and y ~ 0, and equation (96.8) is equivalent to
$$y=\frac{b}{a}\sqrt{x^2-a^2},$$
assuming of course b > 0 and a > 0. It is easy to see that this func-
tion can be represented in the following form:
$$y=\frac{b}{a}\,x-\frac{ab}{x+\sqrt{x^2-a^2}}. \eqno(96.9)$$
that of the straight line (96.10) having the same abscissa x. With x
increasing without limit, the difference
$$y_1-y=\frac{ab}{x+\sqrt{x^2-a^2}}$$
tends to zero.
A similar property holds for the other parts of the hyperbola too,
the role of the straight line (96.10) being played by one of the
straight lines
$$y=\frac{b}{a}\,x,\qquad y=-\frac{b}{a}\,x. \eqno(96.11)$$
Let us pass to the new coordinates
$$x_1=\frac{x}{a}-\frac{y}{b},\qquad y_1=\frac{x}{a}+\frac{y}{b}.$$
From (96.8) we have
$$\Bigl(\frac{x}{a}-\frac{y}{b}\Bigr)\Bigl(\frac{x}{a}+\frac{y}{b}\Bigr)=1.$$
Hence in the new coordinate system (not rectangular, in general) the
equation of a hyperbola has the form
$$x_1y_1=1 \eqno(96.12)$$
or
$$y_1=\frac{1}{x_1}.$$
This is just the familiar school-book equation. Equation (96.12) is
called the equation of a hyperbola in its asymptotes.
1.5. The number a₀ is zero; the numbers λ₁ and λ₂ are opposite in
sign. After a standard change of coefficients we get
$$\frac{x^2}{a^2}-\frac{y^2}{b^2}=0. \eqno(96.13)$$
$$\Bigl(\frac{x}{a}-\frac{y}{b}\Bigr)\Bigl(\frac{x}{a}+\frac{y}{b}\Bigr)=0$$
or
$$y=\frac{b}{a}\,x,\qquad y=-\frac{b}{a}\,x. \eqno(96.14)$$
Thus (96.13) is the equation of a curve splitting into two intersecting
straight lines (96.14).
Consider now the second equation of (95.13). It has the
form
(96.15)
1.6. Both numbers λ₂ and b₀ are nonzero. Let
$$2p=-\frac{b_0}{\lambda_2}\ne 0.$$
Now (96.15) is equivalent to the following
equation:
$$y^2=2px. \eqno(96.16)$$
The curve described by this equation is called a parabola (Fig. 96.3)
and the equation is called the canonical equation of a parabola. It
may be assumed without loss of generality that p > 0, since for
p < 0 we obtain a curve symmetric with respect to the y axis. Like
the hyperbola, the parabola is an unbounded curve. It has only one
axis of symmetry, the x axis, and no centre of symmetry. The point
of intersection of the axis of the parabola with the parabola itself
is called the vertex of the parabola. The point F with coordinates
(p/2, 0) is called the focus of the parabola. A straight line L given by
X=- ~ (96.1i)
is called the directrix of the parabola.
Theorem 96.3. The distance from any point of a parabola to its direc-
trix is equal to that from the same point to the focus of the parabola.
Proof. We have for any point M (x, y) of a parabola
$$\rho(L,M)=x+\frac{p}{2},$$
and further
$$\rho(F,M)=\left(\Bigl(x-\frac{p}{2}\Bigr)^2+y^2\right)^{1/2}=\left(\Bigl(x-\frac{p}{2}\Bigr)^2+2px\right)^{1/2}$$
$$=\Bigl(x^2-px+\frac{p^2}{4}+2px\Bigr)^{1/2}=\Bigl(x^2+px+\frac{p^2}{4}\Bigr)^{1/2}
=\left(\Bigl(x+\frac{p}{2}\Bigr)^2\right)^{1/2}=x+\frac{p}{2},$$
since x ≥ 0 and p > 0.
are in the same half-plane given by the y axis. We can now show that:
The ratio of distances ρ(M, F₁) and ρ(M, d₁) is constant for all
points M of an ellipse, a hyperbola and a parabola.
For the parabola this statement follows from Theorem 96.3. For
the ellipse and the hyperbola it follows from (96.23) and (96.24).
The ratio
$$e=\frac{\rho(M,F_1)}{\rho(M,d_1)}$$
is called the eccentricity. We have
$$e=\frac{c}{a}=\Bigl(1-\frac{b^2}{a^2}\Bigr)^{1/2}<1$$
for the ellipse,
$$e=\frac{c}{a}=\Bigl(1+\frac{b^2}{a^2}\Bigr)^{1/2}>1$$
for the hyperbola, and
$$e=1$$
for the parabola.
Exercises
1. What is a diametrical hyperplane conjugate to
a given direction for second-degree curves?
2. Write the equation of a tangent for the ellipse, the hyperbola and the
parabola.
3. Prove that a light ray issuing from one focus of an ellipse passes, after
a mirror reflection from a tangent, through the other focus.
4. Prove that a light ray issuing from the focus of a parabola passes, after
a mirror reflection from a tangent, parallel to the axis of the parabola.
5. Prove that a light ray issuing from one focus of a hyperbola, after a mir-
ror reflection from a tangent, appears to issue from the other focus.
where
$$a^*=a\sqrt{1+h^2/c^2}\quad\text{and}\quad b^*=b\sqrt{1+h^2/c^2},$$
its size increasing without limit for h → +∞. Sections of the one-
sheeted hyperboloid by the y, z and x, z planes are hyperbolas.
II.7. The numbers λ₁ and λ₂ are of the same sign. It may be assumed
without loss of generality that b₀ is opposite in sign, since if b₀
coincides in sign with λ₁ and λ₂, we obtain a surface symmetric with
$$\alpha\Bigl(1+\frac{y}{b}\Bigr)-\beta\Bigl(\frac{x}{a}-\frac{z}{c}\Bigr)=0$$
in α and β. The determinant of the system is zero if and only if
a point M(x, y, z) is on hyperboloid (97.10). The rank of the matrix of
the system is obviously equal to unity. Hence α and β are determined
up to proportionality. But this precisely means that there is a
unique straight line Γ through each point of a hyperboloid.
Similarly we can see that through each point of a hyperboloid
there is a unique straight line Γ* determined by the planes
$$\nu\Bigl(\frac{x}{a}+\frac{z}{c}\Bigr)=\lambda\Bigl(1+\frac{y}{b}\Bigr),\qquad \lambda\Bigl(\frac{x}{a}-\frac{z}{c}\Bigr)=\nu\Bigl(1-\frac{y}{b}\Bigr).$$
The straight lines Γ and Γ* are distinct. The same reasoning shows
that the hyperbolic paraboloid
$$z=\frac{x^2}{a^2}-\frac{y^2}{b^2}$$
is covered by two distinct families of straight lines Π and Π* deter-
mined by the planes
$$\alpha z=\beta\Bigl(\frac{x}{a}+\frac{y}{b}\Bigr),\qquad \beta=\alpha\Bigl(\frac{x}{a}-\frac{y}{b}\Bigr)$$
and
$$\nu z=\lambda\Bigl(\frac{x}{a}-\frac{y}{b}\Bigr),\qquad \lambda=\nu\Bigl(\frac{x}{a}+\frac{y}{b}\Bigr).$$
Exercises
1. What is a diametrical hyperplane conjugate to
a given direction for second-degree surfaces?
2. Write the equations of a tangential plane for the various second-degree
surfaces.
3. Investigate the optical properties of second-degree surfaces.
CHAPTER 12
Bilinear Metric Spaces
relations (98.4) imply that (v, xi) = 0 for every j. The vector v is
a nontrivial linear combination of vectors x 1 , x 2 , • • • , Xm and is left
orthogonal to each of those vectors and therefore orthogonal to each
vector of their span. The vector u is constructed in a l'imilar way but
proceeding from the linear dependence of the columns of the Gram
matrix.
Corollary. If the Gram matrix for a linearly independent system of
vectors is singular, then the quadratic form (x, x) has an isotropic vector
lying in the span of the given system and right (left) orthogonal to all
vectors of the span.
Indeed, by virtue of the linear independence of the vectors of the
system the vectors u and v are nonzero; moreover, (u, u) = (v, v) = 0.
In a number of important cases the Gramian is a convenient tool
for establishing the fact of linear dependence or independence of
a system of vectors.
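For the ordinary scalar product the Gramian test looks as follows (a hypothetical numpy sketch; the vectors are arbitrary examples).

```python
import numpy as np

def gramian(vectors):
    """Determinant of the Gram matrix of a system of vectors with respect to
    the ordinary scalar product (a sketch for the real symmetric case)."""
    G = np.array([[np.dot(x, y) for y in vectors] for x in vectors])
    return np.linalg.det(G)

x1 = np.array([1.0, 2.0, 0.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = x1 - 2 * x2                      # linearly dependent on x1, x2

print(np.isclose(gramian([x1, x2, x3]), 0.0))   # True: dependent system
print(gramian([x1, x2]) > 0)                    # independent system
```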
Lemma 98.2. For any linearly dependent system of vectors the
Gramian is zero.
Proof. Let a system x 1 , x 2 , • • • , Xm be linearly dependent. Then
the zero vector x can be represented as a nontrivial linear combina-
tion of vectors x 17 x 2 , • • • , Xm· But in this case the homogeneous
Exercises
in the left null subspace. But e_{r+1}, ..., eₙ are linearly independent
as vectors of a basis and equal in number the dimension of the
null subspaces, so both null subspaces coincide.
Corollary. If in a bilinear metric space the scalar product is given
by a symmetric or Hermitian-symmetric bilinear form, then its right
and left null subspaces coincide.
Corollary. In any orthogonal basis isotropic vectors, and only
isotropic vectors, form the basis of the null subspace in common.
Corollary. If in a space with a scalar product there is an orthogonal
basis, then that space can be decomposed as an orthogonal sum of any
nonsingular subspace of maximum dimension and a null subspace.
The last corollary actually means that the study of any singular
spaces with orthogonal bases reduces to studying separately non-
singular subspaces with orthogonal bases and subspaces on which
the scalar product is zero.
To know an orthogonal basis in a space is not only to be able to
find an orthogonal basis in the nonsingular subspace of maximum
dimension but also to obtain an explicit expansion of the orthogonal
projection of any vector onto that subspace with respect to its orthog-
onal basis. Indeed, let e₁, e₂, ..., eₙ be an orthogonal basis in Kₙ,
let e₁, ..., e_r be nonisotropic vectors and let e_{r+1}, ..., eₙ be
isotropic vectors. Denote by L the subspace spanned by the vectors
e₁, ..., e_r. It is clear that it is nonsingular and of maximum dimen-
sion, that L⊥ = ⊥L and in addition
Kₙ = L ⊕ L⊥.
Any vector x in Kₙ can be represented in a unique way as a sum
x = u + v, where u ∈ L and v ∈ L⊥. Here u is called the left orthog-
onal projection of x onto the subspace L and v is the left perpendicular
to that subspace. We write for x expansion (100.2) with respect to
e₁, e₂, ..., eₙ. Formula (100.1) no longer holds. Observe, however,
that the first r terms in (100.2) form a vector u and the last n − r
terms form a vector v. Performing a right scalar multiplication of
equation (100.2) successively by e₁, ..., e_r we get
$$u=\sum_{j=1}^{r}\frac{(x,e_j)}{(e_j,e_j)}\,e_j.$$
The only thing that cannot be done now is to find the expansion of v
with respect to the vectors e_{r+1}, ..., eₙ using a scalar product,
although the expansion itself does exist.
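The expansion of the projection is easy to compute once an orthogonal system of nonisotropic vectors is known. The sketch below is a minimal illustration for the ordinary scalar product in R³; the helper `form` and the example vectors are assumptions introduced here.

```python
import numpy as np

def left_projection(x, basis, form):
    """Sketch of the expansion u = sum_j (x, e_j)/(e_j, e_j) e_j for an
    orthogonal system of nonisotropic vectors e_1, ..., e_r; form(u, v) is
    the scalar product of the bilinear metric space (an assumption)."""
    return sum(form(x, e) / form(e, e) * e for e in basis)

dot = lambda u, v: float(np.dot(u, v))
e1, e2 = np.array([1.0, 1.0, 0.0]), np.array([1.0, -1.0, 0.0])
x = np.array([2.0, 3.0, 5.0])
u = left_projection(x, [e1, e2], dot)
v = x - u                             # the perpendicular to the span
print(u, dot(v, e1), dot(v, e2))      # both scalar products are 0
```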
we find that
$$\begin{aligned}
\alpha_1(e_1,e_1)&=(x,e_1),\\
\alpha_1(e_1,e_2)+\alpha_2(e_2,e_2)&=(x,e_2),\\
&\dots\dots\dots\dots\\
\alpha_1(e_1,e_n)+\alpha_2(e_2,e_n)+\dots+\alpha_n(e_n,e_n)&=(x,e_n).
\end{aligned} \eqno(100.3)$$
From these we successively determine α₁, α₂, ..., αₙ. Of course,
the vectors of a pseudoorthogonal basis in a nonsingular space can
be normed to yield a pseudoorthonormal basis such that |(e_j, e_j)| =
= 1 for every j.
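System (100.3) is left triangular, so the coefficients are obtained by forward substitution. The following sketch assumes a scalar product `form` with nonzero values (e_j, e_j); for a symmetric product pseudoorthogonality amounts to ordinary orthogonality, which the small example uses.

```python
import numpy as np

def expansion_coefficients(x, basis, form):
    """Solve the left triangular system (100.3) by forward substitution:
    the coefficients of the expansion of x in a pseudoorthogonal basis."""
    alpha = []
    for j, ej in enumerate(basis):
        s = form(x, ej) - sum(alpha[i] * form(basis[i], ej) for i in range(j))
        alpha.append(s / form(ej, ej))
    return alpha

dot = lambda u, v: float(np.dot(u, v))
e = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
x = np.array([3.0, 4.0])
a = expansion_coefficients(x, e, dot)
print(np.allclose(a[0] * e[0] + a[1] * e[1], x))    # True
```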
Observe that the process of solving system (100.3) gives much
more than just a simple expansion of a vector x with respect to
a pseudoorthogonal basis e₁, e₂, ..., eₙ. Simultaneously, without
any extra costs we can determine all the vectors
u_k = α₁e₁ + α₂e₂ + ... + α_ke_k.
The vectors u_k form a sequence of projections of the same vector x
onto embedded subspaces
L₁ ⊆ L₂ ⊆ ... ⊆ Lₙ,
where L_k is the span of vectors e₁, e₂, ..., e_k. If we look at u_k
as an "approximation" to the solution x, then the left orthogonality
of the "error" v_k = x − u_k to L_k in fact implies the left orthogonality
of v_k to u₁, u₂, ..., u_k. We shall return to all these questions some-
what later.
If a space Kₙ is singular, then in general the existence of a pseudo-
orthogonal basis does not guarantee the coincidence of the right
and left null subspaces and hence one cannot expect the space to be
decomposed as an orthogonal sum of some of its subspaces. But knowing
a pseudoorthogonal basis makes possible an efficient construction of
a decomposition of the space as a direct sum (99.1).
Suppose that in a space Kₙ of rank r there is a pseudoorthogonal
basis e₁, e₂, ..., eₙ. It will be assumed that the vectors e₁, ..., e_r
are nonisotropic and e_{r+1}, ..., eₙ are isotropic. In a pseudoorthogo-
nal basis the isotropic vectors are left orthogonal to all vectors of
the basis, so they are left orthogonal to all vectors of Kₙ. But this
means that the isotropic vectors of the pseudoorthogonal basis form
a basis of the left null subspace ⊥Kₙ. Denote by L the span of the
vectors e₁, ..., e_r. By the second corollary of Theorem 99.3
Kₙ = L ∔ ⊥L = L ∔ ⊥Kₙ,
with bases known for both L and ⊥Kₙ. For L the basis e₁, ..., e_r
is pseudoorthogonal.
So the study of any singular spaces with a pseudoorthogonal basis
reduces to a simultaneous study of nonsingular subspaces with
Exercises
1. Let a scalar product be symmetric. Is the number of
vectors with positive, negative and zero values of (e 1, e,) invariant for nonor-
thogonal bases e1 , e2 , ••• , en?
2. How can any real or complex vector space be converted into a bilinear
metric space with a symmetric scalar product with a given rank ao.d signature?
3. An orthogonal basis has no isotropic vectors in a nonsingular space. Can
there be a basis consisting of isotropic vectors in such a space?
4. Prove that as functions of vectors of a bilinear metric space orthogonal
projection and perpendicular are linear operators.
5. What form has the Gram matrix for a pseudoorthogonal basis if the right
and left null spaces coincide?
6. Prove that in any ordinary or Hermitian bilinear metric space there is
a basis in which the Gram matrix is a right block-triangular matrix with 1 × 1
and 2 × 2 blocks along the diagonal.
i. How can the coefficients of an expansion of a vector with respect to
a basis for which some dual or pseudodual basis is known be determined?
8. Prove that in a nonsingular space the coordinate transformation matrix
for a change from one basis, pseudodual to a given basis, to ao.y other pseudo-
dual basis of the same name is left triangular.
For the left adjoint operator the relations are similar. All relations
can be proved according to the same scheme using representations
(101.2) for the matrices of adjoint operators. Therefore we shall
prove only the validity of the last property. We have
$$(A_e^{-1})^*=G_e^{-1}(A_e^{-1})'G_e=\bigl(G_e^{-1}A_e'G_e\bigr)^{-1}=(A_e^*)^{-1}.$$
Comparing formulas (75.4) and (101.3) we can see the absence in
(101.3) of the analogue of the fir~t of the relations (75.4). It now looks
like this:
(*A)*= *(A*)= A. (101.4)
To prove its validity we again turn to representations (101.2) and get
$$(^*A_e)^*=G_e^{-1}(^*A_e)'G_e=G_e^{-1}(G_e^{-1\,\prime}A_e'G_e')'G_e=G_e^{-1}G_eA_eG_e^{-1}G_e=A_e,$$
$$^*(A_e^*)=G_e^{-1\,\prime}(A_e^*)'G_e'=G_e^{-1\,\prime}(G_e^{-1}A_e'G_e)'G_e'=G_e^{-1\,\prime}G_e'A_eG_e^{-1\,\prime}G_e'=A_e,$$
i.e. relations (101.4) do hold.
Theorem 101.2. If in a nonsingular Hermitian bilinear metric
space an operator A has in some basis a matrix J, then in a right (left)
dual basis the operator A • (*A) has a matrix J*.
Proof. Let A have a matrix J in a basis e₁, e₂, ..., eₙ. Consider
a right dual basis f₁, f₂, ..., fₙ. Denote by G_e, G_f and G_{ef} = E
the matrices of a bilinear form (x, y) in the corresponding bases.
If P is a coordinate transformation matrix for a change from the
first basis to the second, then we have
$$G_e=G_{ef}\bar P^{-1}=\bar P^{-1},\qquad G_f=P'G_{ef}=P'$$
and then, taking into account (63.7) and (101.2), we get
$$A_f^*=G_f^{-1}A_f'G_f=G_f^{-1}(P^{-1}JP)'G_f=J^*.$$
If, however, A has a matrix J in a basis f₁, f₂, ..., fₙ, then for
that basis the basis e₁, e₂, ..., eₙ is the left dual basis and we now
find
$$^*A_e=G_e^{-1\,\prime}A_e'G_e'=G_e^{-1\,\prime}(PJP^{-1})'G_e'=J^*.$$
This theorem is as significant in the study of adjoint operators in
Hermitian bilinear metric spaces as Theorem 75.2 is in unitary
spaces. It follows from it in particular that the right and left adjoint
operators A • and • A have the same eigenvalue~. complex conjugate
to those of A, that the right and left adjoint operators A • and • A
have a simple structure, if A has a simple structure, and so on.
· Besides a scalar product (x, y) other Hermitian bilinear forms can
be given in a Hermitian bilinear metric space. Consider, for example,
functions of the form (Ax, y) and (r, Ay), where A is an arbitrary
linear operator. It is not hard to see that they are Hermitian bilinear
These matrix equations prove the validity of the first group of the
operator equations (101.6). The second group follows trivially from
the first, if we take into account the equation (x, My) = (* Mx, y)
and relations (101.2).
There are different types of operators in the Hermitian bilinear
metric space. An operator A is said to be Hermitian or self-adjoint,
if for any x, y E Kn
(Ax, y) = (x, Ay),
and skew-Hermitian or skew-adjoint, if
(Ax, y) = -(x, Ay).
Hence respectively
A= A*= *A, A= -A*= -*A.
An operator A is said to be isometric if for any x, y E Kn
(Ax, Ay) = (x, y).
This leads to the equations
*AA = A*A = E.
In an ordinary bilinear metric space the analogues of the Hermi-
tian and the skew-Hermitian operator are called a symmetric and
a skew-symmetric operator respectively. In what follows we shall
often deal with operators defined by the equation
A* = αE + βA (101.7)
for some numbers α and β.
By far not all properties of operators of a special form carry over
from the unitary to the Hermitian bilinear metric space, although
they do have something in common. We shall not discuss all these
questions.
Exercises
1. How are the characteristic polynomials of operators
A, A• and •A related?
2. Let a subspace L be invariant under an operator A. Prove that the sub-
space L.l. (.l.L) is invariant under A• (•A).
3. Prove that any eigenvector of an operator A corresponding to an eigen-
value λ is left (right) orthogonal to any eigenvector of A* (*A) corresponding to
an eigenvalue μ ≠ λ̄.
4. Prove that any root vector of an operator A corresponding to an eigenval-
ue λ is left (right) orthogonal to any root vector of A* (*A) corresponding to
an eigenvalue μ ≠ λ̄.
5. Prove that the eigenvalues of a Hermitian (skew-Hermitian) operator
corresponding to nonisotropic eigenvectors are real (pure imaginary).
6. Prove that the moduli of the eigenvalues of an isometric operator cor-
responding to nonisotropic eigenvectors equal unity.
where sl.... , Sn and 111• .•. , Tin are the coordinates of X and y.
Again to within isomorphism a pseudounitary space is uniquely
defined by its two characteristics: dimension and signature, a posi-
tive and a negative index and so on.
Exercises
t. Prove that in isomorphic spaces to orthogonal
(pseudoorthogonal, dual, pseudodual) bases there correspond orthogonal (pseu-
doorthogonal, dual, pscudodual) bases.
2. Prove that in isomorphic spaces to nonsingnlar subspaces there corre-
spond nonsingular subspaces.
3. Prove that in isomorphic spaces perpendicular and projection go over
into perpendicular and proJection respectively.
4. Prove that in isomorphic spaces the Gramians of the corresponding systems
of vectors are equal.
CHAPTER 13
Bilinear Forms
in Computational Processes
The matrix of the system is left triangular. Under the assumption its
diagonal elements are nonzero, so system (103.2) has a unique solu-
tion. It is clear that the vector f_{k+1} thus constructed and the vectors
f₁, ..., f_k form together a pseudoorthogonal system and that their
span coincides with that of e₁, ..., e_{k+1}. The system of vectors
f₁, ..., f_{k+1} is linearly independent, for so is the system f₁, ...
..., f_k, e_{k+1}.
We continue the process further. If it turns out that for every i
(f_i, f_i) is nonzero, then the resulting system of vectors f₁, ..., fₙ
will be the desired pseudoorthogonal basis. Of course, we can now
norm the vectors f₁, ..., fₙ and obtain a pseudoorthonormal basis.
A useful consequence follows from (103.1). We rewrite it as
$$e_{k+1}=\Bigl(-\sum_{i=1}^{k}\alpha_{i,k+1}f_i\Bigr)+f_{k+1}.$$
The vector in the parentheses lies in L_k and the vector f_{k+1} is in ⊥L_k
by construction, so the solution of system (103.2) gives in fact a de-
composition of each vector e_{k+1} into the projection and left perpen-
dicular relative to L_k.
The process is greatly simplified if the scalar product is given by
a Hermitian symmetric bilinear form. In this case the conditions
(f_i, f_j) = 0 for j < i imply that the conditions (f_i, f_j) = 0 hold for
j ≠ i. System (103.2) therefore becomes a system with a diagonal
matrix and we have
$$\alpha_{i,k+1}=-\frac{(e_{k+1},f_i)}{(f_i,f_i)}.$$
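For a symmetric (or Hermitian-symmetric) scalar product the process therefore reduces to the familiar orthogonalization with the coefficients above. A minimal numpy sketch, with an arbitrary example basis, might look as follows.

```python
import numpy as np

def orthogonalize(e, form):
    """Sketch of the process (103.1) with the simplified coefficients above:
    f_{k+1} = e_{k+1} - sum_i (e_{k+1}, f_i)/(f_i, f_i) f_i.
    Assumes (f_i, f_i) != 0 at every step."""
    f = []
    for ek in e:
        fk = ek.astype(float)
        for fi in f:
            fk = fk - form(ek, fi) / form(fi, fi) * fi
        f.append(fk)
    return f

dot = lambda u, v: float(np.dot(u, v))
e = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
f = orthogonalize(e, dot)
print([round(dot(f[i], f[j]), 10) for i in range(3) for j in range(3) if i != j])
```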
where β_{1,k+1}, ..., β_{k,k+1} are unknown coefficients. The condi-
tions of the left orthogonality of t_{k+1} to the vectors e₁, ..., e_k
again yield for the determination of β_{1,k+1}, ..., β_{k,k+1} a system
of linear algebraic equations with a left triangular matrix:
$$\begin{aligned}
\beta_{1,k+1}(t_1,e_1)&=-(q_{k+1},e_1),\\
\beta_{1,k+1}(t_1,e_2)+\beta_{2,k+1}(t_2,e_2)&=-(q_{k+1},e_2),\\
&\dots\dots\dots\dots\\
\beta_{1,k+1}(t_1,e_k)+\beta_{2,k+1}(t_2,e_k)+\dots+\beta_{k,k+1}(t_k,e_k)&=-(q_{k+1},e_k).
\end{aligned} \eqno(103.4)$$
According to the assumption about diagonal elements the system
has a unique solution. If it is found in continuing the process that
the quantities (t_i, e_i) are nonzero for all i, then after an appropriate
normalization the resulting system of vectors is a left pseudodual
basis to e₁, e₂, ..., eₙ. Notice that now the process does not become
simpler if the scalar product is given by a symmetric bilinear form.
Employing an auxiliary basis q1 , q 2 , • • • , qn makes it possible to
avoid the degeneration of the process by replacing at the proper time
one of the vectors q1 and repeating the computation of the vector t 1•
Again the vectors t 1 , • • • , t 1 _ 1 remain unchanged.
In what follows, regardless of their particular content all the above
and similar processes will most often be called orthogonalization
processes. We shall sometimes have, however, to construct in the same
bilinear metric space Kn sequences of vectors orthogonal or pseudo-
orthogonal relative to different bilinear forms. We shall discuss only
bilinear forms of the form (Rx, y), where R is some linear operator
in Kn. To distinguish between sequences, we shall speak in this case
of R-orthogonalization, R-pseudoorthogonalization, and so on.
Many properties and features of orthogonalization processes can
be established by considering their matrix notation. Let a scalar
product in Kn be given by a Hermitian bilinear form (x, y). The
pseudoorthogonality of a basis f₁, f₂, ..., fₙ implies that (f_i, f_j) =
= 0 for j < i, i.e. that the Gram matrix G_f of the bilinear form (x, y)
in the basis f₁, f₂, ..., fₙ is right triangular. According to the
process of constructing a new basis the spans of vectors f₁, ..., f_k
and e₁, ..., e_k coincide. Hence in view of (103.1) we conclude that
$$\begin{aligned}
e_1&=f_1,\\
e_2&=-\alpha_{1,2}f_1+f_2,\\
&\dots\dots\dots\\
e_n&=-\alpha_{1,n}f_1-\alpha_{2,n}f_2-\dots-\alpha_{n-1,n}f_{n-1}+f_n,
\end{aligned} \eqno(103.5)$$
where α_{ij} are precisely the coefficients that are computed from system
(103.2). Therefore the coordinate transformation matrix A for a
change from the new basis f₁, f₂, ..., fₙ to e₁, e₂, ..., eₙ is a right
triangular matrix with unit diagonal elements. Since the coordinate
transformation matrix for a change from the old basis to a new basis
coincides with A⁻¹, we have
$$G_f=A^{-1\,\prime}G_eA^{-1}.$$
Hence
(103.6)
It is easy to verify that G_fA is a right triangular matrix whose
diagonal elements coincide with those of G_f.
Denote byE q (F q) a matrix whose columns are the coordinates of
vectors e1 , • • • , en (/1 , • • • , In) in a basis q1 , • • • , qn. Relations
(103.5) show that
(103.7)
and that of course
(103.8)
Thus the above process of constructing a pseudoorthogonal basis
proves to be closely related to the factorization of a Gram matrix
into triangular factors and to factorization (103. 7) into the factors
of the matrix of coordinates.
Theorem 103.1. For process (103.1) and (103.2) of constructing
a pseudoorthogonal basis f₁, f₂, ..., fₙ from a basis e₁, e₂, ..., eₙ
in a nonsingular bilinear metric space Kₙ to be implementable it is
necessary and sufficient that the Gram matrix of the system e₁, e₂, ...
..., eₙ should have nonzero principal minors.
Exercises
104. Orthogonalization
of a power sequence
In orthogonalization processes the coordinate
transformation matrix for a change from the old basis to a new basis
is always triangular. However, if the original basis is chosen in
a special way, it is possible to obtain significantly simpler representa-
tions for the coordinate transformation matrix and hence simpler
.orthogonalization processes.
Let A be some operator in a nonsingular Hermitian bilinear metric
space Kₙ. Take a nonzero vector x and consider a sequence of vectors
$$x,\ Ax,\ A^2x,\ \dots,\ A^{n-1}x. \eqno(104.1)$$
We shall call such sequences power sequences generated by the vec-
tor x.
In any power sequence some number of the first vectors is linearly
independent. Suppose k is the largest of such numbers. This means
that there are numbers a₀, a₁, ..., a_k, with a_k ≠ 0, such that
$$a_0x+a_1Ax+\dots+a_kA^kx=0. \eqno(104.2)$$
Denote by φ(λ) = a_kλ^k + ... + a₁λ + a₀ a polynomial of de-
gree k. Clearly (104.2) is equivalent to
$$\varphi(A)\,x=0. \eqno(104.3)$$
There are many polynomials for which relations of the type
(104.3) hold. In particular, such a polynomial is the characteristic
polynomial of A. But there is clearly a polynomial of the lowest
degree among them. It is called the minimum polynomial annihilat-
ing the vector x. It is clear that its degree equals the maximum
number of the first vectors of the power sequence (104.1) that form
the linearly independent system or equivalently is a unity less than
the minimum number of the first vectors that form the linearly
dependent system.
The degree of the minimum polynomial turns out to be closely
related to the expansion of the vector x with respect to the root
basis of the operator A by the heights of the root vectors and the
number of mutually distinct eigenvalues. That is, we have
Lemma 104. t. The degree of a minimum polynomial annihilating
a vector x equals the sum of the marimum heights of the root vectors of
$$\begin{aligned}
f_1&=x,\\
f_2&=Af_1-\alpha_1f_1,\\
f_{i+1}&=Af_i-\alpha_if_i-\beta_{i-1}f_{i-1},\qquad i>1,
\end{aligned} \eqno(104.5)$$
where
$$\alpha_i=\frac{(Af_i,f_i)}{(f_i,f_i)},\qquad \beta_{i-1}=\frac{(Af_i,f_{i-1})}{(f_{i-1},f_{i-1})},\qquad i>1. \eqno(104.6)$$
for some numbers γ_{j,i}. It follows that the vector f_{i+1} − Af_i is in
the span of vectors x, Ax, ..., A^{i−1}x or equivalently that of vectors
f₁, f₂, ..., f_i. Therefore
$$f_{i+1}=Af_i+\sum_{j=1}^{i}\xi_{j,i+1}f_j$$
for some numbers ξ_{j,i+1}. The conditions of the left orthogonality
of f_{i+1} to f₁, f₂, ..., f_i yield the following system of linear al-
gebraic equations to determine the coefficients ξ_{j,i+1}:
$$\begin{aligned}
\xi_{1,i+1}(f_1,f_1)&=-(Af_i,f_1),\\
\xi_{1,i+1}(f_1,f_2)+\xi_{2,i+1}(f_2,f_2)&=-(Af_i,f_2),\\
&\dots\dots\dots\dots\\
\xi_{1,i+1}(f_1,f_{i-1})+\dots+\xi_{i-1,i+1}(f_{i-1},f_{i-1})&=-(Af_i,f_{i-1}),\\
\xi_{1,i+1}(f_1,f_i)+\dots+\xi_{i-1,i+1}(f_{i-1},f_i)+\xi_{i,i+1}(f_i,f_i)&=-(Af_i,f_i).
\end{aligned} \eqno(104.7)$$
Under the hypothesis of the theorem the operator A satisfies
condition (101.7). Therefore in view of the pseudoorthogonality of
the system of vectors f_i we have for j < i − 1
$$(Af_i,f_j)=(f_i,A^*f_j)=(f_i,(\alpha E+\beta A)f_j)=\bar\alpha(f_i,f_j)+\bar\beta(f_i,Af_j)$$
$$=\bar\beta\Bigl(f_i,\;f_{j+1}-\sum_{l=1}^{j}\xi_{l,j+1}f_l\Bigr)=\bar\beta\Bigl\{(f_i,f_{j+1})-\sum_{l=1}^{j}\xi_{l,j+1}(f_i,f_l)\Bigr\}=0.$$
Among the right-hand sides of system (104.7) only the last two are
nonzero. Hence only ξ_{i−1,i+1} and ξ_{i,i+1} may be different from zero,
which proves the validity of relations (104.5). The value for the
coefficient α₁ is found from the condition of the left orthogonality of
f₂ to f₁ and the values for the coefficients α_i and β_{i−1} are found from
the condition of the left orthogonality of the vector f_{i+1} to the vec-
tors f_i and f_{i−1}.
Thus the orthogonalization process for a power sequence does turn
out to be much simpler than that for a general sequence. If at some
step it is found that (f_i, f_i) = 0 but f_i ≠ 0, then the degeneration
of the process can be avoided by choosing a new vector x.
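In the simplest case, a symmetric operator A and the ordinary scalar product (so that condition (101.7) holds with α = 0, β = 1), the three-term process (104.5), (104.6) can be sketched as follows; the matrix and the starting vector are arbitrary assumptions made for illustration.

```python
import numpy as np

def power_sequence_orthogonalization(A, x, m, form):
    """Sketch of (104.5)-(104.6) for a symmetric A and the ordinary scalar
    product:  f_1 = x,  f_{i+1} = A f_i - a_i f_i - b_{i-1} f_{i-1}."""
    f = [x.astype(float)]
    for i in range(m - 1):
        Af = A @ f[i]
        a = form(Af, f[i]) / form(f[i], f[i])
        w = Af - a * f[i]
        if i > 0:
            b = form(Af, f[i - 1]) / form(f[i - 1], f[i - 1])
            w = w - b * f[i - 1]
        f.append(w)
    return f

dot = lambda u, v: float(np.dot(u, v))
A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
f = power_sequence_orthogonalization(A, np.array([1.0, 1.0, 1.0]), 3, dot)
print(round(dot(f[2], f[0]), 10), round(dot(f[2], f[1]), 10))   # both 0
```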
Suppose that there are n linearly independent vectors in the power
sequence. Applying the orthogonalization process it is possible to
V=U-~TlJfJ
1=1
Exercises
$$r_i=Ax_i-b. \eqno(105.2)$$
Then from the relations
$$x_i=x_{i-1}+a_iBs_i \eqno(105.3)$$
it follows that
$$r_i=r_{i-1}+a_iABs_i. \eqno(105.4)$$
It is easy to show that for the chosen CAB-pseudoorthogonal sys-
tem s₁, ..., sₙ
$$(Cr_i,s_k)=0,\qquad 1\le k\le i. \eqno(105.5)$$
Indeed
$$r_i=Ax_i-b=A(x_i-x)=-\sum_{j=i+1}^{n}a_jABs_j$$
and further
$$(Cr_i,s_k)=-\sum_{j=i+1}^{n}a_j(CABs_j,s_k)=0$$
for every k ≤ i.
We assume that the system of vectors s₁, ..., sₙ is constructed
parallel to the system r₀, ..., r_{n−1} using the process of its CAB-
pseudoorthogonalization. Set s₁ = r₀ and for every i we have
$$s_{i+1}=r_i+\sum_{k=1}^{i}\beta_{k,i+1}s_k. \eqno(105.6)$$
(CABx, y) and (Cx, y) for that matter. Indeed, observe that the
adjoint operator in (105.9) is connected with the basic scalar product
of a unitary space, whereas the orthogonality of vectors s_i and r_i
is ensured in relation to the scalar products (CABx, y) and (Cx, y)
respectively:
$$_{CAB}(AB)^*=(CAB\cdot AB\cdot(CAB)^{-1})^*=(CABC^{-1})^*=\alpha E+\beta AB,$$
$$_{C}(AB)^*=(CABC^{-1})^*=\alpha E+\beta AB.$$
The implementation of methods of conjugate directions can be
prevented only by the vanishing of one of the scalar products,
(CABs_i, s_i) or (Cr_{i−1}, r_{i−1}), before the discrepancy vanishes. If
(CABs_i, s_i) = 0, then the coefficients a_i and b_i cannot be computed.
If, however, (Cr_{i−1}, r_{i−1}) = 0, then this leads to a zero coefficient a_i,
to the coincidence of the nonzero discrepancies r_{i−1} and r_i and hence
to the equation (CABs_{i+1}, s_{i+1}) = 0 holding. Such a situation can
be avoided by choosing a new initial vector x₀. If the operators CAB
and C are positive definite, then the above degenerations are impos-
sible and the computational process runs without complications. If
CAB is positive definite, then the methods of conjugate directions
acquire further interesting properties.
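A minimal sketch of a method of conjugate directions in the simplest setting C = B = E with a symmetric positive definite A (so that CAB = A is positive definite and the degenerations above cannot occur); the expressions for a_i and b_i follow the formulas given in the exercises at the end of the section. This is an illustration under these assumptions, not the book's general scheme.

```python
import numpy as np

def conjugate_directions(A, b, x0, iterations):
    """A sketch with C = B = E and a symmetric positive definite A, following
    the conventions of (105.3)-(105.6): s_1 = r_0, a_i < 0, b_i > 0."""
    x = x0.astype(float)
    r = A @ x - b                        # discrepancy r_0 = Ax_0 - b
    s = r.copy()                         # s_1 = r_0
    for _ in range(iterations):
        a = -(r @ r) / (s @ (A @ s))     # a_i = -(r_{i-1}, r_{i-1})/(As_i, s_i)
        x = x + a * s                    # x_i = x_{i-1} + a_i s_i
        r_new = r + a * (A @ s)          # r_i = r_{i-1} + a_i A s_i
        bi = (r_new @ r_new) / (r @ r)   # b_i = (r_i, r_i)/(r_{i-1}, r_{i-1})
        s = r_new + bi * s               # s_{i+1} = r_i + b_i s_i
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_directions(A, b, np.zeros(2), 2)
print(np.allclose(A @ x, b))             # True: n steps solve the system
```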
The closeness of a vector z to the solution of (105.1) can be judged
by the smallness of the square of some norm of the difference e =
= x − z. To that end it is convenient to use the so-called generalized
error functionals of the form (Re, e), where R is any positive definite
operator, for example B⁻¹*CA. That operator is positive definite,
since it is connected with the operator CAB by the relation B⁻¹*CA =
= B⁻¹*(CAB)B⁻¹. We have
Theorem 105.1. If an operator CAB is positive definite, then among
the totality of vectors of the form z = x₀ + Bs, where s is in the span
of vectors s₁, ..., s_i, a vector x_i gives a minimum of the generalized
error functional
φ(z) = (B⁻¹*CAe, e).
We have
φ(z) = (B⁻¹*CA(x − z), x − z)
$$=\sum_{j=1}^{i}|a_j-h_j|^2(CABs_j,s_j)+\sum_{j=i+1}^{n}|a_j|^2(CABs_j,s_j).$$
Exercises
1. Prove that the matrix of a bilinear form (CABx, y) is:
right triangular in a basis s₁, ..., sₙ,
right almost triangular in a basis r₀, ..., r_{n−1},
left triangular in bases s₁, ..., sₙ and r₀, ..., r_{n−1} if C is a Hermitian opera-
tor.
2. How does the form of the matrix of the bilinear form (CABx, y) of Exer-
cise 1 change if CAB is a Hermitian operator?
3. Prove that the matrix of a bilinear form (Cx, y) is:
right triangular in a basis r₀, ..., r_{n−1},
right triangular in bases r₀, ..., r_{n−1} and s₁, ..., sₙ,
right triangular in bases ABs₁, ..., ABsₙ and s₁, ..., sₙ,
right almost triangular in bases ABr₀, ..., ABr_{n−1} and r₀, ..., r_{n−1}.
4. How does the form of the matrix of the bilinear form (Cx, y) of Exercise 3
change if C is a Hermitian operator?
5. Prove that if condition (105.9) is replaced by the following:
$$(CABC^{-1})^*=\alpha_1E+\alpha_2AB+\dots+\alpha_p(AB)^p,\qquad p\ge 1,$$
then relation (105.10) will be like this:
$$s_{i+1}=r_i+b_is_i+b_{i-1}s_{i-1}+\dots+b_{i-p+1}s_{i-p+1}.$$
6. Prove that
$$a_i=-\frac{(Cr_{i-1},r_{i-1})}{(CABs_i,s_i)}.$$
7. Prove that if CAB and C are Hermitian operators then
$$b_i=\frac{(Cr_i,r_i)}{(Cr_{i-1},r_{i-1})}.$$
8. Prove that if CAB and C are Hermitian and positive definite operators
then a_i < 0 and b_i > 0 for every i.
9. Prove that the matrix of an operator AB in a basis made up of vectors
s₁, ..., sₙ or r₀, ..., r_{n−1} has a tridiagonal form.
10. How do the methods of conjugate directions run if A is a singular opera-
tor?
where
$$x=x_0+\sum_{j=1}^{n}a_ju_j. \eqno(107.3)$$
If
$$x_i=x_0+\sum_{j=1}^{i}a_ju_j,$$
P11 e0 = ~~ a1u1.
that those vectors satisfy relations (107.2) implies that the matrix
C = V*AU
is nonsingular left triangular. From this we obtain the following
factorization of the matrix A:
$$A=V^{-1*}CU^{-1}. \eqno(107.8)$$
So a knowledge of A-pseudodual, up to normalization, systems
of vectors allows us to solve the system of linear algebraic equations
(107.1) with error estimates (107.7) and to obtain factorization
(107.8) of the matrix A into factors among which there is one triangu-
lar factor. We show that the converse is also true. That is, any meth-
od of solving systems of linear algebraic equations, based on factor-
ing a matrix into factors among which there is at least one that is
triangular, determines some A-pseudodual, up to normalization,
systems of vectors. Hence implementing such methods according
to schemes (107 .3) to (107 .6), it is possible to use estimates (107. 7).
Consider a matrix P obtained from a unit matrix by reversing
its columns or equivalently its rows. It is easy to verify that multi-
plying an arbitrary matrix C on the right by P reverses the order
of the columns of C and multiplying a matrix CP on the left by P
reverses the rows of CP. The elements f_ij of the matrix F = PCP are
therefore connected with the elements c_ij of C by the relation
$$f_{ij}=c_{n-i+1,\,n-j+1}.$$
Exercises
1. Assume an operator to be a matrix and vectors of
a space to be column vectors. Prove that in the methods of dual directions
successive errors are related by
$$e_k=(E-S_k)e_{k-1},$$
where
$$S_k=\frac{u_kv_k^*A}{v_k^*Au_k}. \eqno(107.10)$$
4. What role do operators S_k and P_k play in the particular methods defined
by factorization (107.9)?
5. How do errors e_k change in the particular methods defined by factoriza-
tion (107.9)?
6. Which of the known methods of solving systems of linear algebraic equa-
tions is not based on factorization (107.9)?
l 0 'Yn-1 an
Thus the largest value of the sum in (108.9) may be attained only
for those d₁, ..., dₙ for which d_i ≠ 0 only when λ_i = λ_m. Hence
the maximum value of the quotient in (108.7) is attained on eigen-
vectors corresponding to λ_m. Of course equation (108.7) follows too.
This theorem shows a way of constructing a numerical method of
finding the eigenvalues and the eigenvectors of the operator A,
which is based on seeking the maxima of the function
$$a(x)=\frac{(Ax,x)}{(x,x)}.$$
For any vector l and a real number ε
$$a(x+\varepsilon l)=a(x)+\frac{2\varepsilon}{(x,x)}\,\mathrm{Re}\,(Ax-a(x)x,\;l)+O(\varepsilon^2). \eqno(108.10)$$
(108.10)
If Re (Ax- a (x) x, l) :::fo 0, then for sufficiently small real e whose
sign coincides with that of the real part of the scalar product we get
+
a (x el) > a (x), the inequality being opposite for small e of the
opposite sign.
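The following sketch turns this observation into a crude ascent method for the largest eigenvalue of a symmetric matrix; the step size, the stopping rule and the example matrix are assumptions made only for illustration.

```python
import numpy as np

def rayleigh_quotient_ascent(A, x, steps=200, eps=0.1):
    """A sketch of the idea behind (108.10): move x in the direction of
    Ax - a(x)x, along which a(x) = (Ax, x)/(x, x) grows, until a maximum
    (the largest eigenvalue of the symmetric matrix A) is reached."""
    x = x.astype(float)
    for _ in range(steps):
        a = (x @ A @ x) / (x @ x)
        g = A @ x - a * x                # the direction from (108.10)
        if np.linalg.norm(g) < 1e-12:
            break
        x = x + eps * g
        x = x / np.linalg.norm(x)        # keep the quotient well scaled
    return (x @ A @ x) / (x @ x), x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
value, vector = rayleigh_quotient_ascent(A, np.array([1.0, 0.0]))
print(np.isclose(value, np.max(np.linalg.eigvalsh(A))))   # True
```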