
V. V. Voyevodin

LINEAR ALGEBRA

Nauka Publishers, Moscow


V. V. Voyevodin

LINEAR
ALGEBRA

Translated from the Russian
by Vladimir Shokurov

Mir Publishers, Moscow


First published 1983
Revised from the 1980 Russian edition

TO THE READER

Mir Publishers would be grateful for your comments on the content,
translation and design of this book. We would also be pleased to
receive any other suggestions you may wish to make.
Our address is:
Mir Publishers
2 Pervy Rizhsky Pereulok
I-110, GSP, Moscow, 129820
USSR

In English

© Main Editorial Board for Physical and Mathematical Literature, Nauka Publishers, 1980
© English translation, Mir Publishers, 1983
Contents

Preface 9

Part I. Vector Spaces

Chapter 1. Sets, Elements, Operations


1. Sets and elements 11
2. Algebraic operation 13
3. Inverse operation 16
4. Equivalence relation 19
5. Directed line segments 21
6. Addition of directed line segments 23
7. Groups 27
8. Rings and fields 30
9. Multiplication of directed line segments by a number 33
10. Vector spaces 36
11. Finite sums and products 40
12. Approximate calculations 43

Chapter 2. The Structure of a Vector Space


13. Linear combinations and spans 45
14. Linear dependence 47
15. Equivalent systems of vectors 50
16. The basis 53
17. Simple examples of vector spaces 55
18. Vector spaces of directed line segments 56
19. The sum and intersection of subspaces 60
20. The direct sum of subspaces 63
21. Isomorphism of vector spaces 65
22. Linear dependence and systems of linear equations 69

Chapter 3. Measurements in Vector Space


23. Affine coordinate systems 74
24. Other coordinate systems 79

25. Some problems 81


26. Scalar product 88
27. Euclidean space 91
28. Orthogonality 94
29. Lengths, angles, distances 98
30. Inclined line, perpendicular, projection 101
31. Euclidean isomorphism 104
32. Unitary spaces 106
33. Linear dependence and orthonormal systems 107

Chapter 4. The Volume of a System of Vectors in Vector Space


34. Vector and triple scalar products 109
35. Volume and oriented volume of a system of vectors 114
36. Geometrical and algebraic properties of a volume 116
37. Algebraic properties of an oriented volume 121
38. Permutations 123
39. The existence of an oriented volume 125
40. Determinants 127
41. Linear dependence and determinants 132
42. Calculation of determinants 135

Chapter 5. The Straight Line and the Plane in Vector Space


43. The equations of a straight line and of a plane 136
44. Relative positions 141
45. The plane in vector space 145
46. The straight line and the hyperplane 148
47. The half-space 153
48. Systems of linear equations 155

Chapter 6. The Limit in Vector Space


49. Metric spaces 160
50. Complete spaces 162
51. Auxiliary inequalities 165
52. Normed spaces 167
53. Convergence in the norm and coordinate convergence 169
54. Completeness of normed spaces 172
55. The limit and computational processes 174

Part II. Linear Operators


Chapter 7. Matrices and Linear Operators
56. Operators 177
57. The vector space of operators 180

58. The ring of operators 182


59. The group of nonsingular operators 184
60. The matrix of an operator 187
61. Operations on matrices 191
62. Matrices and determinants 195
63. Change of basis 198
64. Equivalent and similar matrices 201

Chapter 8. The Characteristic Polynomial


65. Eigenvalues and eigenvectors 204
66. The characteristic polynomial 206
67. The polynomial ring 209
68. The fundamental theorem of algebra 213
69. Consequences of the fundamental theorem 217

Chapter 9. The Structure of a Linear Operator


70. Invariant subspaces 222
71. The operator polynomial 225
72. The triangular form 227
73. A direct sum of operators 228
74. The Jordan canonical form 232
75. The adjoint operator 235
76. The normal operator 240
77. Unitary and Hermitian operators 242
78. Operators A*A and AA* 246
79. Decomposition of an arbitrary operator 248
80. Operators in the real space 250
81. Matrices of a special form 253

Chapter 10. Metric Properties of an Operator


82. The continuity and boundedness of an operator 256
83. The norm of an operator 258
84. Matrix norms of an operator 262
85. Operator equations 265
86. Pseudosolutions and the pseudoinverse operator 267
87. Perturbation and nonsingularity of an operator 270
88. Stable solution of equations 274
89. Perturbation and eigenvalues 279

Part III. Bilinear Forms


Chapter 11. Bilinear and Quadratic Forms
90. General properties of bilinear and quadratic forms 283
91. The matrices of bilinear and quadratic forms 289

92. Reduction to canonical form 295


93. Congruence and matrix decompositions 303
94. Symmetric bilinear forms 308
95. Second-degree hypersurfaces 315
96. Second-degree curves 320
97. Second-degree surfaces 327

Chapter 12. Bilinear Metric Spaces


98. The Gram matrix and determinant 333
99. Nonsingular subspaces 339
100. Orthogonality in bases 342
101. Operators and bilinear forms 349
102. Bilinear metric isomorphism 354

Chapter 13. Bilinear Forms in Computational Processes


103. Orthogonalization processes 357
104. Orthogonalization of a power sequence 363
105. Methods of conjugate directions 367
106. Main variants 373
107. Operator equations and pseudoduality 377
108. Bilinear forms in spectral problems 381
Conclusion 387
Index 389
Preface

This textbook is a comprehensive unified


course in linear algebra and analytic geometry based on lectures
read by the author for many years at various institutes to future
specialists in computational mathematics.
It is intended mainly for those in whose education computational
mathematics is to occupy a substantial place. Much of the instruc-
tion in this speciality is connected with the traditional mathemat-
ical courses. Nevertheless the interests of computational mathemat-
ics make it necessary to introduce fairly substantial changes in both
the methods of presentation of these courses and their content.
There are many good textbooks on linear algebra and analytic
geometry, including English ones. Their direct use for training
specialists in computational mathematics proves difficult, however.
To our mind, this is mainly because computers require many
more facts from linear algebra than are usually given in the available
books. And the necessary additional facts fail as a rule to appear
in the conventional texts of linear algebra and are instead contained
in either papers scattered in journals or books relating to other
branches of mathematics.
Computer students begin to get familiar with linear algebra and
analytic geometry fairly early. At the same time their scientif-
ic world outlook begins to take shape. Therefore what is read in this
course and how it is read determine to a great extent the future
perception of computational mathematics as a whole by the students.
Of course computer students must get a systematic and rigorous
presentation of all the fundamentals of algebra and geometry. But
they must be made familiar as early as possible at least briefly
with those problems and methods which computational algebra
has accumulated.
Introduction to problems of computations allows the lecture
course to be effectively accentuated in the interests of computational
mathematics and a close relation to be established between theory
and numerical methods in linear algebra. The basic material for
this is provided by the simplest facts pertaining to such topics
as round-off errors, perturbation instability of many basic notions
of linear algebra, stability of orthonormal systems, metric and
10 Preface

normed spaces, singular decomposition, bilinear forms and their


relation to computational processes, and so on.
Of course incorporation of new and fairly extensive material
is impossible without substantial restructuring of the traditional
course. This book is an attempt at such a restructuring.
V. V. Voyevodin
May 6, 1982
PART I

Vector Spaces

CHAPTER 1

Sets, Elements, Operations

1. Sets and elements


In all areas of activity we have continually
to deal with various collections of objects united by some common
feature.
Thus, studying the design of some mechanism we may consider
the totality of its parts. An individual object of the collection
may be any of its parts, the feature uniting all the objects being
the fact that they all belong to a quite definite mechanism.
Speaking of the collection of points of a circle in the plane we
actually speak of objects, points of a plane, that are united by the
property that they are all the same distance from some fixed point.
It is customary in mathematics to call a collection of objects
united by some common feature a set and to term the objects ele-
ments of the set. It is impossible to give the notion of set a rigorous
definition. We may of course say (as we did!) that the set is a "col-
lection", a "system", a "class" and so on. It looks, however, very
much like a formal utilization of the rich vocabulary of our language.
To define a concept it is first of all necessary to point out how it
is related to more general notions. For the concept of set this cannot
be done, since there is no more general concept for it in mathemat-
ics. Instead of defining it we are compelled to resort to illustra-
tions.
One of the simplest ways of describing a set is to give a complete
list of elements constituting the set. For example, the set of all
books available to a reader of a library is completely defined by
their lists in the library catalogues, the set of all prices of goods
is completely defined by a price-list and so on. However, this method
applies only to finite sets, i.e. such that contain a finite number of
elements. But infinite sets, i.e. sets containing infinitely many
elements, cannot be defined with the aid of a list. How, for example,
can we compile a list of all real numbers?
When it is impossible or inconvenient to give a set by using
a list, it can be given by pointing out a characteristic property,
i.e. a property possessed only by the elements of the set. In problems
in defining loci, for example, a characteristic property of the set of
points which is the solution of the problem is nothing but the
collection of conditions these points must satisfy according to the
requirements of the problem.
The description of a set may be very simple and cause no dif-
ficulties. For example, if we take a set consisting of two numbers,
1 and 2, then clearly neither the number 3 nor a notebook or a car
will be contained in that set. But in the general case, giving sets
by their characteristic properties sometimes results in compli-
cations. The reasons for these are rather numerous.
One reason seems to be the insufficient definiteness of the con-
cepts used to describe sets. Suppose we are considering the set of
all planets of the solar system. What exactly is in question? There are
nine major planets known. But there are over a thousand minor
planets, or asteroids, turning round the sun. The diameters of some
measure hundreds of kilometres, but there are also such whose
diameter is under one kilometre. As methods of observation improve,
smaller and smaller planets will be discovered, and finally the
question will arise as to where the minor planets end and the mete-
orites and solar dust begin.
These are not the only difficulties with the definition of the struc-
ture of a set. Sometimes sets, quite well defined at first sight, turn
out to be very poorly defined, if defined at all. Suppose, for example,
some set consists of one number. Let us define that number as the
smallest integer not definable in under a hundred words. Assume that
only words taken from some dictionary and their grammatical
forms are used and that the dictionary contains such words as "one",
"two" and so on.
Notice that on the one hand such an integer must not exist for
it is defined in under a hundred words, italicized above, and accord-
ing to the meaning of the words it cannot be defined in such a way.
But on the other hand since the number of the words used in the
language is finite this means that there are integers that cannot be
defined in under a hundred words and hence there is a smallest
one among these integers.
The area of mathematics called set theory has accumulated many
examples where the definition of the set is intrinsically contradictory.
The study of the question under what conditions this may happen
has led to deep investigations in logic; we shall, however, lay them
aside. Throughout the following it will be assumed that we are
considering only the sets that are defined precisely and without
contradictions and have a structure that raises no doubts.
As a rule, we shall denote sets by capital Latin letters A, B, ...
and elements of sets by small letters a, b, .... We shall write
x ∈ A if an element x is in a set A and x ∉ A if it is not.
We shall sometimes introduce into consideration the so-called
empty set, or null set, i.e. the set that contains no elements. It is
convenient to use this set where it is not known in advance whether
there is at least one element in the collection under consideration.

Exercises
1. Construct finite and infinite sets. What properties
are characteristic of them?
2. Construct sets whose descriptions contain a contradiction.
3. Is the set of real roots of the polynomial z⁴ + 4z³ + 7z² + 4z + 1
empty?
4. Construct sets whose elements are sets.
5. Construct sets that contain themselves as an element.

2. Algebraic operation
Among all possible sets there are such on
whose elements some operations are allowed to be performed. Suppose
we are considering the set of all real numbers. Then for each of its
elements such operations as calculation of the absolute value of
that element and calculation of the sine of that element can be
defined, and for every pair of elements addition and multiplication
can be defined.
In the above example, note especially the following features of
the operations. One is the definiteness of all operations for any
element of the given set, another is the uniqueness of all operations
and the final feature is that the result of any operation belongs
to the elements of the same set. Such a situation is by no means
always the case.
An operation may be defined not for all elements of a set; for
example, calculation of logarithms is not defined for negative num-
bers. Taking the square root of positive numbers is defined, but
not uniquely. However, even if an operation is uniquely defined
for every element, its result may not be an element of the given set.
Consider division on the set of positive integers. It is clear that for
any two numbers of the set division is realizable but its result is
not necessarily an integer.
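For readers working alongside a computer, the closure feature just mentioned is easy to test mechanically. The short Python sketch below is not part of the original text, and the sample pairs are arbitrary; it merely checks whether division keeps the positive integers inside the set.

```python
# Division on the positive integers is defined and unique for every ordered
# pair, but it is not closed: the result need not be a positive integer.
pairs = [(6, 3), (7, 2), (1, 4)]
for a, b in pairs:
    q = a / b                   # ordinary division of numbers
    print(f"{a} : {b} = {q}  (stays a positive integer: {q.is_integer()})")
```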
Let A be a set containing at least one element. We shall say that
an algebraic operation is defined in A if a law is indicated by which
any pair of elements, a and b, taken from A in a definite order, is
uniquely assigned a third element, c, also from that set.
This operation may be called addition, and c will then be called
the sum of a and b and designated c = a + b; it may be called
multiplication, and c will then be called the product of a and b
and designated c = ab.
In general the terminology and notation for an operation defined
in A will not play any significant part in what follows. As a rule,
we shall employ the notation of the sum and product regardless of
the way the operation is in fact defined. But if it is necessary to
emphasize some general properties of the algebraic operation,
then it will be designated by •.
Consider some simple examples to see what features an algebraic
operation may have. Let A be the set of all positive rational num-
bers. We introduce for the elements of A the usual multiplication
and division of numbers and use the conventional notation. It is
not hard to check that both operations on A are algebraic. But while
for multiplication ab = ba for every element of A, i.e. the order
of the elements is immaterial, for division, on the contrary, it
is very essential, for the equation a : b = b : a is possible only
if a = b. Thus, although an algebraic operation is defined for an
ordered pair of elements, the ordering of the elements may prove immaterial.
An algebraic operation is said to be commutative if its result
is independent of the order of choosing the elements, i.e. for any
two elements a and b of a given set a • b = b • a. It is obvious that
of the conventional arithmetical operations on numbers addition
and multiplication are commutative and subtraction and division
are noncommutative.
Suppose now that three arbitrary elements, a, b and c, are taken.
Then the question naturally arises as to what meaning should be
given to the expression a • b •c. How can the algebraic operation
defined for two elements be applied to three?
Since we can apply an algebraic operation only to a pair of ele-
ments, the expression a • b •c may be given a definite meaning by
bracketing either the first two or the last two elements. In the first
case the expression becomes (a • b) • c and in the second we get
a • (b • c). Consider the elements d = a • b and e = b • c. Since
they are members of the original set, (a • b) • c and a • (b • c) may
be considered as the result of applying the algebraic operation
to the elements d, c and a, e, respectively.
In general, the elements d • c and a • e may turn out to be different.
Consider again the set of positive rational numbers with the algeb-
raic operation of division. It is easy to see that as a rule (a : b) : c ≠
a : (b : c). For example, ((3/2) : 3) : (3/4) = 2/3, but
(3/2) : (3 : (3/4)) = 3/8.
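The arithmetic of this counterexample can be spot-checked with exact rational arithmetic; the following Python fragment, added here as an illustration, reproduces the two bracketings using the standard fractions module.

```python
# Division of positive rationals is an algebraic operation, but it is not
# associative: the two bracketings below give different results.
from fractions import Fraction

a, b, c = Fraction(3, 2), Fraction(3), Fraction(3, 4)
left = (a / b) / c     # (a : b) : c
right = a / (b / c)    # a : (b : c)
print(left, right)     # 2/3 and 3/8 -- the two bracketings disagree
```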
An algebraic operation is said to be associative if for any three
elements, a, b and c, of the original set a • (b • c) = (a • b) • c.
The associativity of an operation allows one to speak of a uniquely
defined result of applying an algebraic operation to any three ele-
ments, a, b and c, meaning by it any of the equivalent expressions
a • (b • c) and (a • b) • c, and write a • b • c without brackets.
In the case of the associative operation one can also speak of the
uniqueness of the expression a₁ • a₂ • ... • aₙ containing any
finite number of elements a₁, a₂, ..., aₙ. By a₁ • a₂ • ... • aₙ
we shall mean the following. Arrange brackets in this expression
in an arbitrary way, provided the expression can be evaluated by
successive application of the algebraic operation to pairs of elements.
For example, for the five elements a₁, a₂, a₃, a₄ and a₅ the brackets
may be arranged either like this: a₁ • ((a₂ • a₃) • (a₄ • a₅)) or like this:
((a₁ • a₂) • a₃) • (a₄ • a₅) or in many other ways.
We prove that for an associative operation the result of a calcula-
tion is independent of bracket arrangement. Indeed, for n = 3
this assertion follows from the definition of an associative opera-
tion. Therefore we set n > 3 and assume that for all numbers less
than n our assertion is already proved.
Let elements a₁, a₂, ..., aₙ be given and suppose that brackets
indicating the order in which to perform the operation are arranged
in some way. Notice that the final step is always to perform the
operation on the two elements a₁ • a₂ • ... • aₖ and aₖ₊₁ • aₖ₊₂ • ... • aₙ
for some k satisfying the condition 1 ≤ k ≤ n - 1. Since
both expressions contain fewer than n elements, by assumption
they are uniquely defined and it remains for us to prove that for
any positive integers k, l, l ≥ 1,

(a₁ • a₂ • ... • aₖ) • (aₖ₊₁ • aₖ₊₂ • ... • aₙ)
    = (a₁ • a₂ • ... • aₖ₊ₗ) • (aₖ₊ₗ₊₁ • aₖ₊ₗ₊₂ • ... • aₙ).

Letting

a₁ • a₂ • ... • aₖ = b,
aₖ₊₁ • aₖ₊₂ • ... • aₖ₊ₗ = c,
aₖ₊ₗ₊₁ • aₖ₊ₗ₊₂ • ... • aₙ = d,

we get, on the basis of the associativity of the operation,

b • (c • d) = (b • c) • d,

and our assertion is proved.
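The assertion just proved can also be observed experimentally. The sketch below is an added illustration; the operations and element values are arbitrary. It enumerates every bracketing of an expression and confirms that an associative operation yields a single value, while a non-associative one need not.

```python
# Brute-force check: for an associative operation every arrangement of
# brackets gives the same result.
def bracketings(items, op):
    """All values obtainable by bracketing items[0] op items[1] op ... in order."""
    if len(items) == 1:
        return {items[0]}
    results = set()
    for k in range(1, len(items)):      # final split into a left and a right part
        for left in bracketings(items[:k], op):
            for right in bracketings(items[k:], op):
                results.add(op(left, right))
    return results

concat = lambda x, y: x + y             # string concatenation is associative
print(bracketings(("a1", "a2", "a3", "a4", "a5"), concat))   # exactly one value

subtract = lambda x, y: x - y           # subtraction is not associative
print(bracketings((1, 2, 3, 4), subtract))                   # several values
```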
If an operation is commutative as well as associative, then the
expression a₁ • a₂ • ... • aₙ is independent of the order of its
elements. It is left as an exercise for the reader to prove that this
assertion is true.
It should not be supposed that the commutativity and associativi-
ty of an operation are in some way related to each other. It is pos-
sible to construct operations with very different combinations of
these properties. We have already seen from the examples of multi-
plication and division of numbers that an operation may be commuta-
tive and associative or noncommutative and nonassociative. Consider
two more examples. Let a set consist of three elements, a, b, and c.
Give algebraic operations by these tables:

•  a  b  c        •  a  b  c
a  a  c  b        a  a  a  a
b  c  b  a        b  b  b  b     (2.1)
c  b  a  c        c  c  c  c

and let the first element be always chosen from the column and the
second from the row, and let the result of the operation be taken
at the intersection of the corresponding row and column. In the
first case the operation is obviously commutative but not associa-
tive, since, for example,
(a • b) • c = c • c = c,
a • (b • c) = a • a = a.
In the second case the operation is not commutative, but associa-
tive, which is easy to show by a straightforward check.
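That straightforward check can be carried out mechanically. The following Python sketch is added here for illustration; the dictionaries encode tables (2.1) as reconstructed above, with the first operand indexing the row and the second the column.

```python
# Checking tables (2.1): each table maps an ordered pair of elements to the
# result of the operation.
elements = "abc"

table1 = {("a", "a"): "a", ("a", "b"): "c", ("a", "c"): "b",
          ("b", "a"): "c", ("b", "b"): "b", ("b", "c"): "a",
          ("c", "a"): "b", ("c", "b"): "a", ("c", "c"): "c"}
table2 = {(x, y): x for x in elements for y in elements}   # x • y = x

def commutative(t):
    return all(t[x, y] == t[y, x] for x in elements for y in elements)

def associative(t):
    return all(t[t[x, y], z] == t[x, t[y, z]]
               for x in elements for y in elements for z in elements)

for name, t in (("first", table1), ("second", table2)):
    print(name, "commutative:", commutative(t), "associative:", associative(t))
# first  commutative: True   associative: False
# second commutative: False  associative: True
```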

Exercises
1. Is the operation of calculating tan z on the set of
all real numbers z algebraic?
2. Consider the set of real numbers z satisfying the inequality |z| ≤ 1.
Are the operations of multiplication, addition, division and subtraction algebraic
on this set?
3. Is the algebraic operation x • y = x² + y commutative and associative
on the set of all real numbers x and y?
4. Let a set consist of a single element. How can the algebraic operation be
defined on that set?
5. Construct algebraic operations on a set whose elements are sets. Are
these operations commutative, associative?

3. Inverse operation
Let A be a set with some algebraic operation.
As we know, it assigns to any two elements a and b of A a third
element c = a • b.
Consider the collection C of the elements of A
that can be represented as the result of the given algebraic operation.
It is clear that regardless of the algebraic operation all the elements
of C are at the same time the elements of A. It is quite optional,
however, for all the elements of A to be in C.
Indeed, fix in A some element f and assign it to any pair of ele-
ments a and b of A. It is obvious that the resulting correspondence
is an algebraic operation, commutative and associative. The set C
will contain only one element f regardless of the number of ele-
ments in A.
Exactly what elements of A are in C depends on the algebraic
operation. Let it be such that C coincides with A, i.e. let both sets
contain the same elements. Then each element of A can be represent-
ed as the result of the given algebraic operation on some two ele-
ments of the same set A. Of course, such a representation may be
nonunique. Nevertheless we conclude that each element of A may
be assigned definite pairs of elements of A.
Thus, the original algebraic operation generates on A some other
operation. This may not be unique, since one element may be as-
signed to more than one pair. But even if it is unique, it will not be
algebraic since it is defined not for any pair of elements, but for
only one element, although this may be arbitrary. In respect to the
given algebraic operation it would be natural to call this new opera-
tion the "inverse" one. In fact, however, by the inverse operation we
shall mean something different, something closer to the concept of
algebraic operation.
Notice that investigating the "inverse" operation is equivalent
to investigating those elements u and v which satisfy the equation
u • v = b (3.1)
for different elements b. Investigation of this equation for the two
elements u and v is easy to reduce to investigation of two equations
for one element. To do this it suffices to fix one of them and to deter-
mine the other from equation (3.1). So investigating the "inverse''
operation is mathematically equivalent to solving the equations
a • x = b,  y • a = b (3.2)
for the elements x and y of A, with different elements a and b of
A fixed.
Suppose (3.2) have unique solutions for any a and b. Then each
ordered pair of elements a and b of A can be assigned uniquely
defined elements x and y of A, i.e. two algebraic operations can be
introduced. These are called respectively the right and the left
inverse of the basic operation. If they exist, we shall say that the
basic operation has an inverse. Note that the above example shows
that an algebraic operation, even a commutative and associative one,
may lack both the right and the left inverse.
The existence of an inverse implies in fact the existence of two,
in general different, algebraic operations, the right and the left
inverse. We are compelled therefore to speak of different elements
x and y. If, however, the algebraic operation is commutative and
has an inverse, then obviously x = y and the right inverse coincides
with the left inverse.
Consider some examples. Let A be the real axis with the usual
multiplication of numbers as algebraic operation. This has no
inverse on the given set, since when, for example, a = 0 and b = 1,
equations (3.2) cannot hold for any numbers x and y. But if we
consider the operation of multiplication given only on the set of
positive numbers, then the operation will now have an inverse.
Indeed, for any positive numbers a and b there are unique positive
numbers x and y satisfying equations (3.2). The inverse in this
case is nothing but division of numbers. The fact that in reality
x = y is of no interest to us now.
The operation of addition has no inverse if it is given on the set
of positive numbers, since equations (3.2), for example, can hold
for no positive x and y if a = 2 and b = 1. But if the operation of
addition is given on the entire real axis, then its inverse exists and
is nothing but subtraction of numbers.
The example of addition and multiplication shows that a direct
operation and its inverse may have quite different properties. From
the associativity and commutativity of an algebraic operation
need not necessarily follow the associativity or commutativity
of its inverse, even if the inverse exists. Moreover, as already noted
above, a commutative and associative algebraic operation may
simply have neither the right nor the left inverse.
These simple examples show yet another important fact. Consider
again multiplication on the set of positive numbers. Its right and
left inverse coincide for this operation and are division of numbers.
At first it may seem that now for division of numbers the inverse
is multiplication of numbers. This is not quite so, however.
Indeed, write the corresponding equations (3.2)
a : x = b, y : a = b.
It is then obvious that
x = a : b,  y = a·b.
Consequently, the right inverse of division is again division, and
the left inverse is multiplication. Thus, the inverse of an inverse
does not necessarily coincide with the original algebraic operation.
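A numerical spot-check of this example is given below; the fragment is added here for illustration, and the values of a and b are arbitrary positive numbers.

```python
# On the positive numbers the right inverse of division is again division,
# while the left inverse is multiplication.
a, b = 6.0, 1.5
x = a / b          # solves a : x = b  (right inverse)
y = a * b          # solves y : a = b  (left inverse)
print(a / x == b, y / a == b)   # True True
```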

Exercises
1. Are there right and left inverses of the algebraic
operations given by tables (2.1)?
2. What are the right and left inverses of the algebraic operation x • y = xʸ
defined on the set of positive numbers x and y?
3. Prove that if the right and the left inverse coincide, then the original
algebraic operation is commutative.
4. Prove that if an algebraic operation has an inverse, then the right and
the left inverse have inverses too. What are these?
5. Construct an algebraic operation for which all four inverses of the inverse
operations coincide with the original operation.

4. Equivalence relation
Notice that in discussing above the properties
of the algebraic operation we implicitly assumed the possibility
of checking any two elements of a set for coincidence or noncoinci-
dence. Moreover, we treated coinciding elements rather freely
never making any distinction between them. We did not assume
anywhere that the coinciding elements were indeed one element
rather than different objects. But actually we only used the fact
that some group of elements, which we called equal, are the same
in certain contexts.
This situation occurs fairly often. Investigating general proper-
ties of similar triangles we in fact make no distinction between any
triangles having the same angles. In terms of the properties preserved
under a similarity transformation, these triangles are indistinguisha-
ble and could be called "equal". Investigating the criteria of the
equality of triangles we make no difference between the triangles
that are situated in different places of the plane but can be made
to coincide if displaced.
In many different problems we shall be faced with the necessity
of partitioning one set or another into groups of elements united
according to some criterion. If none of the elements is in two differ-
ent groups, then we shall say that the set is partitioned into disjoint,
or nonoverlapping, groups or classes.
Although the criteria according to which the elements of a set
are partitioned into classes may be very different, they are not
entirely arbitrary. Suppose, for example, that we want to divide
into classes all real numbers, including numbers a and b into the
same class if and only if b > a. Then no number a can be in the
same class with itself, since a is not greater than a itself. Conse-
quently, no partitioning into classes according to this criterion
is possible.
Let some criterion be given. We assume that with regard to any
pair of elements a and b of a set A it can be said that either a is
related to b by the given criterion or not. If a is related to b, then
we shall write a ∼ b and say that a is equivalent to b.
Even the analysis of the simplest examples suggests the condi-
tions a criterion must satisfy for partitioning a set A into classes
according to it to be possible. Namely:
1. Reflexivity: a ∼ a for all a ∈ A.
2. Symmetry: if a ∼ b, then b ∼ a.
3. Transitivity: if a ∼ b and b ∼ c, then a ∼ c.
A criterion satisfying these conditions is called an equivalence
relation.
We prove that any equivalence relation partitions a set into
classes. Indeed, let Ka be a group of elements of A equivalent to
a fixed element a. By reflexivity a ∈ Ka. We show that two groups
Ka and Kb either coincide or have no elements in common.
Let some element c be in Ka and Kb, i.e. let c ∼ a and c ∼ b.
By symmetry a ∼ c and by transitivity a ∼ b and, of course,
b ∼ a. If now x ∈ Ka, then x ∼ a and hence x ∼ b, i.e. x ∈ Kb.
Similarly, if x ∈ Kb, then it follows that x ∈ Ka. Thus, two groups
having at least one element in common completely coincide and
we have indeed obtained a partition of the set A into classes.
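The construction used in this proof is easy to imitate on a finite set. The Python sketch below is an added illustration; the relation "same remainder mod 3" and the range of integers are arbitrary choices. It builds the classes exactly as described: each element either joins the class of some earlier equivalent element or starts a new class.

```python
# Partitioning the integers -5, ..., 5 into classes under the equivalence
# "a ~ b when a and b give the same remainder mod 3".
def equivalent(a, b):
    return a % 3 == b % 3          # reflexive, symmetric, transitive

classes = []
for x in range(-5, 6):
    for cls in classes:
        if equivalent(x, cls[0]):  # x falls into an existing class
            cls.append(x)
            break
    else:
        classes.append([x])        # x starts a new class
print(classes)                     # three disjoint classes covering the set
```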
Any two elements may be equivalent or nonequivalent in terms
of the criterion in question. Nothing will happen if we call equiv-
alent elements equal (with respect to a given criterion!) and non-
equivalent elements unequal (with respect to the same criterion!).
It may seem that in doing so we ignore the meaning of the word
"equal", for now elements equal according to one criterion may prove
unequal according to another. There is nothing unnatural in this,
however. In every particular problem we distinguish or do not
distinguish between elements only in relation to those of their
properties that are of interest to us in that particular problem,
and in different problems we may be concerned with different
properties of the same elements.
It will be assumed in what follows that, whenever necessary,
an equality criterion is defined for the elements of the set,
saying whether an element a is or is not equal to an element b. If a
is equal to b, then we shall write a = b, and a ≠ b otherwise.
It will also be assumed that the equality criterion is an equivalence
relation. The reflexivity, symmetry and transitivity conditions
may be regarded as reflecting the most general properties of the
usual equality relation of numbers.
The equality relation allows us to partition an entire set into
classes of elements which we have decided for some reasons to con-
sider equal. This means that the difference between the elements
of the same class is of no importance to us. Consequently, in all
situations to be considered in what follows the elements called
equal must exhibit sameness.
If the equality relation is introduced axiomatically, i.e. without
reference to the particular nature of the elements, it will be agreed
to assume that the equality sign merely implies that the elements
on its sides simply coincide, that is, that they are one and the same element.
When the equality sign is used in this way, the properties of reflex-
ivity, symmetry and transitivity require no particular convention.
Partitioning a set into classes of equal elements will make each
class consist of only one element.
Where the equality relation is introduced relying on a particular
nature of elements it may happen that some or all classes of equal
elements will consist of more than one element. This makes us
impose additional requirements on the operations on elements to be
introduced.
Indeed, as we have agreed, equal elements must exhibit sameness.
Therefore every operation to be introduced must now necessarily
give equal results when applied to equal elements. In fact we shall
never verify this requirement, and it will be left for the reader
to see for himself that the given property holds for the operations
to be introduced.

Exercises
1. Is it possible to divide all the countries of the world
into classes, placing two countries in the same class if and only if they have
a common border? If not, why?
2. Consider a set of cities with motorway communication. Say that two
cities A and B are connected if one can go from A to B by motorway. Can the
cities be divided into classes according to this criterion? If they can, what are
the classes?
3. Say that two complex numbers a and b are equal in absolute value if
|a| = |b|. Is this criterion an equivalence relation? What is this partition
into classes?
4. Consider the algebraic operations of addition and multiplication of com-
plex numbers. How do they act on classes of elements equal in absolute value?
5. Construct algebraic operations on the set defined in Exercise 2. How do
they act on the classes?

5. Directed line segments


The foregoing examples may give an impres-
sion that all talk about operations on elements of sets concerns
only operations on various number sets. It is not so, however.
In what follows we construct many examples of other kinds of sets
with operations, but for the present consider just one example which
will be constantly referred to throughout the course.
The most fundamental concepts of physics are such notions as
force, displacement, velocity, acceleration. They are all character-
ized not only by a number giving their magnitude but also by some
direction. We now construct a geometrical analogue of such no-
tions.
Let A and B be two distinct points in space. On the straight
line through them they define in a natural way some line segment.
It is assumed that the points are always given in a definite order,
for example, first A is given and then B is. Now we can state a direc-
tion on the constructed line segment, namely, the direction from
the first point A to the second point B.
A line segment together with the direction stated on it is called
a directed line segment with initial point A and terminal point B.
It will be otherwise termed a vector, and A will be called the point
of application of the vector. A vector with point of application A
will be said to be fixed at A.
For directed line segments or vectors double notation will be
used. If it must be stressed that a directed line segment with initial
point A and terminal point B is meant, we shall write AB. But
if we do not care exactly what points of the directed line segment
are limiting points, then we shall use some simpler notation, small
Latin letters, for example. In drawings directed line segments will
be denoted by arrows, with arrowheads always at the terminal
point of the line segment.
In a directed line segment it is essential which of the limiting
points is initial and which is terminal. Directed line segments AB
and BA will therefore be considered different.
So we can construct different sets whose elements are directed
line segments. Before introducing operations on elements we define
what directed line segments will be considered equal.
Consider first the (parallel) translation of a directed line segment
AB to a point C. Let C be off the straight line through A and B
(Fig. 5.1). Draw the straight line through A and C, then the straight
line through C parallel to AB, and finally the straight line through B
parallel to AC. Denote the point of the intersection of the last two
lines by D. The directed line segment CD will be precisely the
result of the translation of AB to C. But if C is on the straight line
through A and B, then the directed line segment CD is obtained
by shifting the directed line segment AB along the straight line
containing it until the point A coincides with C.
Now we can give a definition of the equality of vectors. Two
vectors are said to be equal if they can be made to coincide under
a translation. It is not hard to see that this definition of equality
is an equivalence relation, i.e. possesses the properties of reflexivity,
symmetry and transitivity.
Thus the collection of all vectors can be broken down in a natural
way into classes of equal vectors. Each class is simple to describe:
it is obtained by translating any one of its vectors to every point
of space.
Notice that there is one and only one vector, in every class of
equal vectors, fixed at any point of space. In comparing vectors
a and b therefore we can use the following device. Fix some point
and translate to it the vectors a and b. If they completely coincide,
then a = b, and if not, a ≠ b.
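This comparison device has a simple computational counterpart. In the sketch below, which is an added illustration and not the book's construction, a directed line segment is stored by the coordinates of its initial and terminal points, and translating both vectors to a common point amounts to comparing the coordinate differences.

```python
# Two directed line segments are equal as vectors when the differences
# terminal - initial coincide componentwise.
def displacement(segment):
    (ax, ay, az), (bx, by, bz) = segment
    return (bx - ax, by - ay, bz - az)

AB = ((0, 0, 0), (1, 2, 0))   # from A = (0, 0, 0) to B = (1, 2, 0)
CD = ((3, 1, 5), (4, 3, 5))   # a parallel segment of the same length
print(displacement(AB) == displacement(CD))   # True: AB and CD are equal vectors
```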
Besides the set consisting of all vectors of a space we shall often
deal with other sets. These will mainly be sets of vectors either
parallel to some straight line or lying on it or parallel to some plane
or lying in it. Such vectors will be called respectively collinear and
coplanar. Of course, on the sets of collinear and coplanar vectors
the above definition of the equality of vectors is preserved.
We shall also consider the so-called zero directed line segments
whose initial and terminal points coincide. The direction of zero
vectors is not defined and they are all considered equal by defini-
tion. If it is not necessary to specify the limiting points of a zero
vector, then we shall denote that vector by 0.
Also it will be assumed by definition that any zero vector is
parallel to any straight line and any plane. Throughout the following
therefore, unless otherwise specified, the set of vectors of a space,
as well as any set of collinear or coplanar vectors, will be assumed
to include the set of all zero vectors. This should not be forgotten.

Exercises
1. Prove that the nonzero vectors of a space can be
partitioned into classes of nonzero collinear vectors.
2. Prove that any class of nonzero collinear vectors can be partitioned into
classes of nonzero equal vectors.
3. Prove that any class of nonzero equal vectors is entirely in one and only
one class of nonzero collinear vectors.
4. Can the nonzero vectors of a space be partitioned into classes of coplanar
vectors? If not, why?
5. Prove that any set of nonzero coplanar vectors can be partitioned into
classes of nonzero collinear vectors.
6. Prove that any pair of different classes of nonzero collinear vectors is
entirely in one and only one set of nonzero coplanar vectors.

6. Addition of directed line segments
As already noted, force, displacement, veloc-
ity and acceleration are the originals of the directed line segments
we have constructed. If these line segments are to be useful in solv-
ing various physical problems, we must take into account the cor-
responding physical analogies when introducing operations.
Well known is the operation of addition of forces performed by
the so-called parallelogram law. The same law is used to add dis-
placements, velocities and accelerations. According to the introduced
terminology this operation is algebraic, commutative and associa-
tive. Our immediate task is to construct a similar operation on
directed line segments.
The operation of vector addition is defined as follows. Suppose
it is necessary to add vectors a and b. Translate the vector b to
the terminal point of a (Fig. 6.1). Then the sum a+ b is the vector
whose initial point coincides with the initial point of a and whose
terminal point coincides with the ter-
minal point of b. This rule is usually
called the triangle law.
It is obvious that vector addition is an algebraic operation. We shall
prove that it is commutative and associative.
To establish the commutativity of addition suppose first that
a and b are not collinear. Apply them to a common origin O (Fig. 6.2).
Denote by A and B the terminal points of a and b respectively and
consider the parallelogram OBCA. It follows from the definition
of the equality of vectors that

BC = OA = a,  AC = OB = b.

But then the same diagonal OC of the parallelogram OBCA is simulta-
neously a + b and b + a. The case of collinear a and b is obvious.

Notice that incidentally we have obtained another way of con-
structing a vector sum. Namely, if on vectors a and b fixed at one
point we construct a parallelogram, then its diagonal fixed at the
same point will be the sum a + b.
To prove the associativity of addition, apply a to an arbitrary
point O, b to the terminal point of a, and c to the terminal point
of b (Fig. 6.3). Denote by A, B and C the terminal points of a, b
and c. Then

(a + b) + c = (OA + AB) + BC = OB + BC = OC,
a + (b + c) = OA + (AB + BC) = OA + AC = OC.
From the transitivity of the equality relation of vectors we conclude
that the operation is also associative.
These properties of vector addition allow us to calculate the sum
of any number of vectors. If we apply a vector a₂ to the terminal
point of a₁, a₃ to the terminal point of a₂ and so on, and finally
a vector aₙ to the terminal point of aₙ₋₁, then the sum a₁ + a₂ + ... + aₙ
will be a vector whose initial point coincides with the
initial point of a₁ and whose terminal point coincides with the
terminal point of aₙ. This rule of constructing a vector sum is called
the polygon law.
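In coordinates the triangle and polygon laws reduce to componentwise addition. The following Python sketch is added for illustration; the particular vectors are arbitrary. It checks commutativity and associativity and sums a chain of vectors.

```python
# Vectors modelled as coordinate triples; addition is componentwise.
from functools import reduce

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

a, b, c = (1, 0, 0), (0, 2, 0), (3, -1, 4)
print(add(a, b) == add(b, a))                    # commutativity
print(add(add(a, b), c) == add(a, add(b, c)))    # associativity

# Polygon law: summing a chain of vectors joins the initial point of the
# first to the terminal point of the last.
print(reduce(add, [a, b, c]))                    # (4, 1, 4)
```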
We now discuss the existence of an inverse for vector addition.
As is known, to answer this question it is necessary to investigate
the existence and uniqueness of the solution of the equations
a + x = b,  y + a = b
for arbitrary vectors a and b. By virtue of the commutativity of
the basic operation it is obvious that it suffices to examine only
one of the equations.
Take an arbitrary directed line segment AB. Using an elementary
geometric construction we establish that always

AB + BA = 0.    (6.1)

Therefore the equation

AB + x = CD    (6.2)

for any vectors AB and CD will clearly have at least one solution,
for example,

x = BA + CD.    (6.3)

Suppose (6.2) holds for some other vector z as well, i.e.

AB + z = CD.

Then adding BA to both sides of these equations we get, in view
of (6.1), x = BA + CD, z = BA + CD and hence x = z.
Thus, the operation of vector addition has an inverse. It is vector
subtraction. If for vectors a, b and c we have a + c = b, then we
write in symbols c = b - a. The vector b - a uniquely determined
by the vectors a and b is called the vector difference. The justification
of this notation will be given somewhat later.
It is easy to show a rule for constructing the difference of two
given vectors a and b. Apply these to a common point and construct
26 Sets, Elements, Operations [Ch. 1

a parallelogram on them (Fig. 6.4). We have already shown above


that one of the parallelogram diagonals is the sum of the given
vectors. The other diagonal is easily seen to be the difference of the
same vectors. This rule of constructing the sum and the difference
of vectors is usually called the parallelogram law.
Notice that we could define addition not for the set of all vectors
of a space but only for one of the sets of collinear or coplanar vectors.
The sum of two vectors of any such set
will again be in the same set. The opera-
tion of vector addition therefore remains
algebraic in this case too. Moreover, it
preserves all its properties and, what
is especially important, it has as before
an inverse. The validity of the last
assertion follows from formula (6.3). If vectors AB and CD are
parallel to some straight line or plane, then it is obvious that so
is the vector BA + CD or, equivalently, the difference vector CD - AB.


Thus, the operation of vector addition is algebraic, commutative
and associative, and has an inverse on the sets of three types: on the
set of vectors of a space, on the set of collinear vectors and on the
set of coplanar vectors.

Exercises
1. Three forces equal in magnitude and directed along
the edges of a cube are applied to one of its vertices. What is the direction of
the sum of these forces?
2. Let three different classes of collinear vectors be given. When can any
vector of a space be represented as the sum of three vectors of these classes?
3. Applied to the vertices of a regular polygon are forces equal in magnitude
and directed to its centre. What is the sum of these forces?
4. What is the set of the sum of vectors taken from two different classes of
collinear vectors?

7. Groups
Sets with one algebraic operation are in
a sense the simplest and it is therefore natural to begin our studies
just with such sets. We shall assume the properties of an operation
to be axioms and then deduce their consequences. This will allow us
later on to immediately apply the results of our studies to all sets
whel'e the operations have similar properties, regardless of specific
features.
A group is a set G with one algebraic operation, associative (al-
though not necessarily commutative), for which there must exist
an inverse.
Notice that the inverse of an operation cannot be considered to
be a second independent operation in a group, since it is defined
in terms of the basic operation. As is customary in group theory,
we call the operation given in G multiplication and use the corre-
sponding notation. Before considering the various examples of
groups we deduce the simplest consequences following from the
definition.
Take an element a of a group G. The existence of an inverse in
the group implies the existence of a unique element eₐ such that
aeₐ = a. Consequently, this element plays the same part in multi-
plying the element a by it on the right as unity does in multiplying
numbers. Suppose further that b is any other element of the group.
It is obvious that there is an element y satisfying ya = b. We now
get

b = ya = y(aeₐ) = (ya)eₐ = beₐ.

So eₐ plays the part of the right unity with respect to all elements
of G, not only with respect to a. An element with such a property
must be unique. Indeed, all such elements satisfy ax = a, but by
the definition of the inverse of an operation this equation has a
unique solution. Denote the resulting element by e′.

Similarly we can prove the existence and uniqueness in G of an
element e″ satisfying e″b = b for every b in G. In fact e′ and e″
coincide, which follows from e″e′ = e″ and e″e′ = e′.

Thus we have obtained a first important consequence: in any
group G there is a unique element e satisfying

ae = ea = a

for every a in G. It is called the identity (or identity element) of
a group G.
The definition of the inverse also implies the existence and unique-
ness for any a of elements a′ and a″ such that

aa′ = e,  a″a = e.

They are called the right and the left inverse element respectively.
It is easy to show that in this case they coincide. Indeed, consider
the element a″aa′ and calculate it in two different ways. We have

a″aa′ = a″(aa′) = a″e = a″,
a″aa′ = (a″a)a′ = ea′ = a′.

Consequently, a″ = a′. This element is called the inverse of a and
denoted by a⁻¹.
Now we have obtained another important consequence: in any
group G every element a has a unique inverse element a⁻¹ for which

aa⁻¹ = a⁻¹a = e.    (7.1)
Because of the associativity of the group operation we can speak
of the uniqueness of the product of any finite number of elements
of a group given (in view of the possible non-commutativity of
the group operation) in a definite order. Taking into account (7.1)
it is not hard to indicate the general formula for the inverse element
of the product. Namely,

(a₁a₂ ... aₙ)⁻¹ = aₙ⁻¹ ... a₂⁻¹a₁⁻¹.    (7.2)

From (7.1) it follows that the inverse element of a⁻¹ is the ele-
ment a and the inverse of the identity element is the identity ele-
ment, i.e.

e⁻¹ = e.    (7.3)
Verifying that a set with one associative operation is a group is
greatly facilitated by the fact that in the definition of a group the
requirement that the inverse operation should exist can be replaced
by the assumption about the existence of an identity and inverse
elements, on only one side (say, right) and without the assumption
that they are unique. More precisely, we have the following
Theorem 7.1. A set G with one associative operation is a group if G
has at least one element e with the property ae = a for every a in G
and with respect to that element any element a in G has at least one
right inverse element a⁻¹, i.e. aa⁻¹ = e.
Proof. Let a⁻¹ be one of the right inverses of a. We have

aa⁻¹ = e = ee = eaa⁻¹.

Multiply both sides of this equation on the right by one of the
right inverses of a⁻¹. Then ae = eae, from which a = ea since e is
the right identity for G. Thus the element e is found to be also the
left identity for G.
If now e′ is an arbitrary right identity and e″ is an arbitrary
left identity, then it follows from e″e′ = e′ and e″e′ = e″ that
e′ = e″, i.e. any right identity is equal to any left identity. This
proves the existence and uniqueness in G of an identity element,
which we again denote by e.
Further, for any right inverse element a⁻¹

a⁻¹ = a⁻¹e = a⁻¹aa⁻¹.

Multiply both sides of this equation on the right by a right inverse
of a⁻¹. Then e = a⁻¹a, i.e. a⁻¹ is simultaneously a left inverse of a.
If now a⁻¹′ is an arbitrary right inverse of a and a⁻¹″ is an arbitrary
left inverse, then from

a⁻¹″aa⁻¹′ = (a⁻¹″a)a⁻¹′ = ea⁻¹′ = a⁻¹′,
a⁻¹″aa⁻¹′ = a⁻¹″(aa⁻¹′) = a⁻¹″e = a⁻¹″

it follows that a⁻¹′ = a⁻¹″. This implies the existence and unique-
ness for any element a in G of an inverse element a⁻¹.
Now it is easy to show that the set G is a group. Indeed, ax = b
and ya = b clearly hold for the elements
x = a⁻¹b,  y = ba⁻¹.
Suppose that there are other solutions, for example, an element z
for the first equation. Then ax = b and az = b yield ax = az.
Multiplying both sides on the left by a⁻¹ we get x = z. So the set G
is a group.
A group is said to be commutative or Abelian if the group opera-
tion is commutative. In that case the operation is as a rule called
addition and the summation symbol a + b is written instead of the
product notation ab. The identity of an Abelian group is called
the zero element and designated 0. The inverse of the operation is
called subtraction, and the inverse element is called the negative
element. It is denoted by -a. It will be assumed that by definition
the difference symbol a - b denotes the sum a + (-b).
But if for some reason we call the operation in a commutative
group multiplication, then its inverse will be assumed to be division.
The now equal products a⁻¹b and ba⁻¹ will be denoted by b/a and
called the quotient of b by a.

Exercises
Prove that the following sets are Abelian groups.
Everywhere the name of the operation reflects its content rather than notation.
1. The set consists of integers; the operation is addition of numbers.
2. The set consists of complex numbers, except zero; the operation is multi-
plication of numbers.
3. The set consists of integer multiples of 3; the operation is addition of
numbers.
4. The set consists of positive rationals; the operation is multiplication
of numbers.
5. The set consists of numbers of the form a + b√2, where a and b are
rationals not both equal to zero; the operation is multiplication of numbers.
6. The set consists of a single element a; the operation is called addition and
defined by a + a = a.
7. The set consists of integers 0, 1, 2, ... , n-1; the operation is called
mod n addition and consists in calculating the nonnegative remainder less than n
of the division of the sum of two numbers by the number n.
8. The set consists of integers 1, 2, 3, ... , n - 1, where n is a prime; the
operation is called mod n multiplication and consists in calculating the nonnega-
tive remainder less than n of the division of the product of two numbers by the
number n.
9. The set consists of collinear directed line segments; the operation is addi-
tion of directed line segments.
10. The set consists of coplanar directed line segments; the operation is
addition of directed line segments.
11. The set consists of directed line segments of a space; the operation is
addition of directed line segments.
As regards the last three examples, notice that the zero element of an Abeli-
an group of directed line segments is a zero directed line segment and that the
inverse line segment to AB is BA. It follows from what was proved above that
they are unique. Examples of noncommutative groups will be given later.
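Example 7 can be verified mechanically for any particular n. The sketch below is added here as an illustration; n = 12 is an arbitrary choice. It checks closure, associativity, commutativity, the zero element and the existence of negatives for mod n addition.

```python
# The residues 0, 1, ..., n-1 under mod n addition form an Abelian group.
n = 12
G = range(n)
op = lambda a, b: (a + b) % n

closed      = all(op(a, b) in G for a in G for b in G)
associative = all(op(op(a, b), c) == op(a, op(b, c)) for a in G for b in G for c in G)
commutative = all(op(a, b) == op(b, a) for a in G for b in G)
has_zero    = all(op(a, 0) == a for a in G)
has_inverse = all(any(op(a, x) == 0 for x in G) for a in G)

print(closed, associative, commutative, has_zero, has_inverse)   # all True
```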

8. Rings and fields


Consider a set K in which two operations
are introduced. Call one of them addition and the other multiplica-
tion, and use the corresponding notation. Assume that both are
related by the distributive law, i.e. for any three elements a, b and c
of K
(a + b)c = ac + bc,  a(b + c) = ab + ac.
The set K is said to be a ring if two operations are defined in it,
addition and multiplication, both associative as well as related
by the distributive law, addition being commutative and possessing
an inverse. A ring is said to be commutative if multiplication is
commutative and noncommutative otherwise.
Notice that any ring is an additive Abelian group. Consequently,
there is a unique zero element 0 in it. This element possesses the
property that for any element a of the ring
a + 0 = a.
We gave the definition of the zero element only with respect to
the operation of addition. But it plays a particular role with respect
to multiplication as well. Namely, in any ring the product of any
element by the zero element is a zero element. Indeed, let a be any
element of K; then
a·0 = a(0 + 0) = a·0 + a·0.
Adding to each side the element -(a·0) we get a·0 = 0. It can be
proved similarly that 0·a = 0.
Using this property of the zero element it can be established
that in any ring for any elements a and b
(-a)b = -(ab).
Indeed,
ab + (-a)b = (a + (-a))b = 0·b = 0,
i.e. the element (-a)b is the negative of ab. According to our
notation we may write it as -(ab).
Now it is easy to show that the distributive law is true for the
difference of elements. We have
(a - b)c = (a + (-b))c = ac + (-b)c = ac + (-(bc)) = ac - bc,
a(b - c) = a(b + (-c)) = ab + a(-c) = ab + (-(ac)) = ab - ac.

The distributive law, i.e. the usual rule of removing brackets,
is the only requirement in the definition of a ring relating addition
and multiplication. It is only due to this law that a simultaneous study of
the two operations gives more than could be obtained if they were
studied separately.
We have just proved that algebraic operations in a ring possess
many of the customary properties of operations on numbers. It should
not be supposed, however, that any property of addition and multi-
plication of numbers is preserved in any ring, be it even a commuta-
tive one. Thus multiplication of numbers has a property converse
to that of multiplication by a zero element. Namely, if a product
of two numbers is equal to zero, then at least one of the multipliers
equals zero. In an arbitrary commutative ring this property does
not necessarily hold, i.e. a product of elements not equal to the zero
element may be zero.
Nonzero elements whose product is a zero element are called
zero divisors. Their existence in a ring makes investigating them
substantially more difficult and prevents one from drawing a deep
analogy between numbers and elements of a commutative ring.
This analogy can be drawn, however, for rings having no zero di-
visors.
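A concrete illustration (ours, not in the original text): in the ring of residues modulo 6, already met in the exercises above, a short search lists every pair of nonzero elements whose product is the zero element.

# Zero divisors in the ring of residues modulo 6:
# nonzero a, b with a*b congruent to 0 (mod 6).
n = 6
pairs = [(a, b) for a in range(1, n) for b in range(1, n) if (a * b) % n == 0]
print(pairs)   # [(2, 3), (3, 2), (3, 4), (4, 3)]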
Suppose in a commutative ring with respect to the operation of
multiplication there is an identity element e and each nonzero
element a has an inverse element a⁻¹. It is not hard to prove that
both the identity and the inverse element are unique, but what is
most important is the fact that now the ring has no zero divisors.
Indeed, let ab = 0, but a ≠ 0. Multiplying both sides of this equa-
tion on the left by a⁻¹ we get
a⁻¹ab = (a⁻¹a)b = eb = b
and certainly a⁻¹·0 = 0. Consequently, b = 0.
From the absence of zero divisors it follows that from any equa-
tion we can cancel the nonzero common multiplier. If ca = cb
and c ≠ 0, then c(a - b) = 0, from which we conclude that a - b = 0,
i.e. a = b.
A commutative ring P in which there is an identity element and
each nonzero element has an inverse is called a field.
Writing the quotient a/b as the product ab⁻¹, it is easy to show
that any field preserves all the usual rules of handling fractions, in
terms of addition, subtraction, division and multiplication. Namely,

a/b ± c/d = (ad ± bc)/(bd),   (a/b)·(c/d) = (ac)/(bd),   (-a)/b = a/(-b).

Besides, a/b = c/d if and only if ad = bc, provided, of course,
b ≠ 0 and d ≠ 0. It is left as an exercise for the reader to check
that these assertions are true.
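One way to carry out this check without relying on intuition about ordinary fractions is to compute in a finite field, for example the field of residues modulo a prime (exercise 13 below). In the sketch (ours) a/b is taken, as in the text, to mean ab⁻¹; the addition rule and the criterion a/b = c/d if and only if ad = bc are verified exhaustively modulo 7.

# Fraction arithmetic in the field of residues mod 7 (a prime).
p = 7
def inv(a):                      # multiplicative inverse of a nonzero residue
    return next(x for x in range(1, p) if (a * x) % p == 1)
def frac(a, b):                  # a/b understood as a * b^(-1)
    return (a * inv(b)) % p

for a in range(p):
    for b in range(1, p):
        for c in range(p):
            for d in range(1, p):
                # a/b + c/d = (ad + bc)/(bd)
                assert (frac(a, b) + frac(c, d)) % p == frac(a * d + b * c, b * d)
                # a/b = c/d  if and only if  ad = bc
                assert (frac(a, b) == frac(c, d)) == ((a * d) % p == (b * c) % p)
print("fraction rules hold mod", p)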

So in terms of the usual rules of handling fractions all fields


are indistinguishable from the set of numbers. For this reason the
elements of any field will be called numbers if of course this name
does not lead to any ambiguity. As a rule, the zero element of any
field will be designated as 0 and the identity element as 1.
We shall now list all the general facts we need about the elements
of any field in what follows.
A. To every pair of elements a and b there corresponds an element
a + b, called the sum of a and b, and
(1) addition is commutative, a + b = b + a,
(2) addition is associative, a + (b + c) = (a+ b) + c,
(3) there is a unique zero element 0 such that a + 0 = a for
any element a,
(4) for every element a there is a unique negative element -a
such that a + (-a) = 0.
B. To every pair of elements a and b there corresponds an ele-
ment ab, called the product of a and b, and
(1) multiplication is commutative, ab = ba,
(2) multiplication is associative, a (be) = (ab) c,
(3) there is a unique identity element 1 such that a·1 = 1·a = a
for any element a,
(4) for every nonzero element a there is a unique inverse element
a⁻¹ such that aa⁻¹ = a⁻¹a = 1.
C. The operations of addition and multiplication are connected
by the following relation: multiplication is distributive over addi-
tion, (a + b)c = ac + bc.
These facts lay no claim to logical independence and are but
a convenient way of characterizing elements. Properties A describe
the field in terms of the operation of addition and say that with
respect to this operation the field is an Abelian group. Properties B
describe the field in terms of the operation of multiplication and
say that with respect to this operation the field becomes an Abelian
group if we eliminate from it the zero element. Property C describes
the relation of the two operations to each other.

Exercises

Prove that sets 1-7 are rings and not fields and that
sets 8-13 are fields. Everywhere the name of the operation reflects its content
rather than notation.
1. The set consists of integers; the operations are addition and multiplica-
tion of numbers.
2. The set consists of integer multiples of some number n; the operations
.are addition and multiplication of numbers.
3. The set consists of real numbers of the form a + b√2, where a and b
are integers; the operations are addition and multiplication of numbers.

4. The set consists of polynomials with real coefficients in a single variable t,


including constants; the operations are addition and multiplication of polyno-
mials.
5. The set consists of a single element a; the operations are defined by
a + a = a and a·a = a.
6. The set consists of integers 0, 1, 2, ... , n - 1, where n is a composite
number; the operations are mod n addition and mod n multiplication.
7. The set consists of pairs (a, b) of integers; the operations are defined by
the formulas
(a, b) + (c, d) = (a + c, b + d); (a, b) ·(c, d) = (ac, bd).
8. The set consists of rational numbers; the operations are addition and
multiplication of numbers.
9. The set consists of real numbers; the operations are addition and multi-
plication of numbers.
10. The set consists of complex numbers; the operations are addition and
multiplication of numbers.
11. The set consists of real numbers of the form a + b√2, where a and b
are rationals; the operations are addition and multiplication of numbers.
12. The set consists of two elements a and b; the operations are defined by the
equations
a + a = b + b = a,   a + b = b + a = b,
a·a = a·b = b·a = a,   b·b = b.
13. The set consists of integers 0, 1, 2, ..., n - 1, where n is a prime; the
operations are mod n addition and mod n multiplication.
The reader should note that one of the examples gives a ring with zero divi-
sors. Which example is it? What is the general form of zero divisors?

9. Multiplication of directed line segments


by a number
We stress once again that an algebraic opera-
tion was defined by us as an operation on two elements of the same
set. Numerous examples from physics suggest, however, that it is
sometimes reasonable to consider operations on elements of differ-
ent sets. One of such operations is suggested by the concepts of
force, displacement, velocity and acceleration, and we shall again
use the example of directed line segments to consider it.
It has been customary for a long time now in physics to make
use of line segments. If, for example, a force is said to have increased
by a factor of five, then the line segment representing it is "extended"
by a factor of five without changing the general direction. If, how-
ever, the force direction is said to have changed, then the initial
and terminal points of the corresponding line segment are inter-
changed. Proceeding from these considerations we introduce multi-
plication of a directed line segment by a real number.
We first discuss some general questions. Suppose an arbitrary
straight line is given in the plane or in space. We agree to consider
one of the directions on the line to be positive and the other to be

negative. A straight line on which a direction is specified will be


called an axis.
Suppose now that some axis is given and, in addition, a unit
line segment is indicated, which can be used to measure any other
line segment and thus determine its length. With every directed
line segment on the axis we associate its numerical characteristic,
the so-called magnitude of the directed line segment.
The magnitude {AB} of a directed line segment AB is a number
equal to the length of the line segment AB taken with a plus sign
if the direction of AB coincides with the positive direction of the
axis, and with a minus sign if the direction of AB coincides with
the negative direction of the axis. The magnitudes of all zero directed
line segments are considered equal to zero, i.e.
{AA} = 0.
(Fig. 9.1 shows three points C, A and B on the axis.)
Regardless of which direction on the axis is taken to be positive,
AB is opposite in direction to BA and the lengths of AB and BA
are equal; consequently,
{AB} = - {BA}. (9.1)
The magnitude of a directed line segment, unlike its length, may
have any sign. Since the length of AB is the absolute value of its
magnitude, we shall use the symbol |AB| to designate it. It is
clear that in contrast to (9.1)
|AB| = |BA|.

Let A, B and C be any three points on the axis determining three
directed line segments AB, BC and AC. Whatever the location
of the points, the magnitudes of these directed line segments satisfy
the relation
{AB} + {BC} = {AC}. (9.2)
Indeed, let the direction of the axis and the location of the points
be such as in Fig. 9.1, for example. Then obviously
|CA| + |AB| = |CB|. (9.3)
According to the definition of the magnitude of a directed line
segment and equation (9.1)
|CA| = {CA} = - {AC},
|CB| = {CB} = - {BC},      (9.4)
|AB| = {AB}.
Therefore (9.3) yields
- {AC} + {AB} = - {BC},
which coincides essentially with (9.2).
In our proof we used only relations (9.3) and (9.4) which depend
only on relative positions of the points A, B and C on the axis
and are independent of their coincidence or noncoincidence with
one another. It is clear that for any other location of the points
the proof is similar.
Identity (9.2) is the basic identity. In terms of the operation of
vector addition, for vectors on the same axis, it means that
{AB + BC} = {AB} + {BC}. (9.5)


The magnitude of a directed line segment determines the line
segment on the axis to within translation. But if we consider that
equal directed line segments are also determined to within transla-
tion, then this means that the magnitude of a directed line segment
uniquely determines on a given axis the entire collection of equal
directed line segments.
Now let AB be a directed line segment and let α be a number.
The product α·AB of the directed line segment AB by the real number α
is a directed line segment lying on the axis through the points A
and B and having a magnitude equal to α·{AB}. Thus by definition
{α·AB} = α·{AB}. (9.6)


For any numbers α and β and any directed line segments a and b
the multiplication of a directed line segment by a number possesses
the following properties:
1·a = a,   α(βa) = (αβ)a,
(α + β)a = αa + βa,   α(a + b) = αa + αb.
The first three properties are very simple. To prove them it suffices
to note that on the left- and right-hand sides of the equations we
have vectors lying on the same axis and use relations (9.5) and
(9.6). We prove the fourth property. Suppose for simplicity that
α > 0. Apply vectors a and b to a common point and construct on
them a parallelogram whose diagonal is equal to a + b (Fig. 9.2).
When a and b are multiplied by α, the parallelogram diagonal, by
the similitude of figures, is also multiplied by α. But this means
that αa + αb = α(a + b).
Note in conclusion that the magnitude of a directed line segment
may be treated as some "function"
δ = {x} (9.7)
whose "independent variable" is a vector x of the same axis and
whose "value" is a real number δ, with
{x + y} = {x} + {y},   {λx} = λ{x} (9.8)
for any vectors x and y on the axis and any number λ.
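These relations are easy to imitate numerically if, purely for illustration, a directed line segment on an axis is identified with the pair of coordinates of its initial and terminal points (this coordinate description is our shortcut here, not the text's definition). The magnitude is then the difference of the coordinates, and (9.5), (9.6) become identities for numbers:

# A directed line segment on an axis, represented (for illustration only)
# by the coordinates of its initial and terminal points.
def magnitude(seg):
    A, B = seg
    return B - A                       # {AB}: signed length on the axis

def add(seg1, seg2):                   # AB + BC for segments sharing the point B
    (A, B), (B2, C) = seg1, seg2
    assert B == B2
    return (A, C)

def scale(alpha, seg):                 # alpha * AB, keeping the initial point
    A, B = seg
    return (A, A + alpha * (B - A))

AB, BC = (1.0, 4.0), (4.0, 2.5)
assert magnitude(add(AB, BC)) == magnitude(AB) + magnitude(BC)   # (9.5)
assert magnitude(scale(-2.0, AB)) == -2.0 * magnitude(AB)        # (9.6)
print("checks passed")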

Exercises
1. Prove that the result of multiplication by a number
does not depend on the choice of the positive direction on the axis.
2. Prove that the result of multiplication by a number does not depend on
the way the unit line segment is specified on the axis.
3. Prove that if we perform multiplication by a number defined on any set
of collinear line segments, the result will remain in the same set.
4. Prove that if we perform multiplication by a number defined on any set
of coplanar line segments, the result will remain in the same set.
5. What are the zero and the negative directed line segment in terms of
multiplication by a number?

10. Vector spaces


Solving any problems reduces in the final
analysis to the study of some sets and, in the first place, to the
study of the structure of those sets. The structure of sets can be
studied by various methods, for example, starting from the char-
acteristic property the elements possess, as it is done in problems
in constructing loci, or starting from the properties of operations if
they are defined for the elements.
The last method seems particularly tempting by virtue of its
generality. Indeed, we have already seen at various times that
various sets allow introduction of various operations possessing

nevertheless the same properties. It is obvious therefore that if in


investigating sets we obtain some result relying only on the prop-
erties of the operation, then that result will hold in all sets where
operations possess the same properties. The specific nature of both
the elements and the operations on them may be quite different.
Somewhat earlier we introduced new mathematical objects,
called directed line segments or vectors, and defined operations on
them. It is well known that in fact there are quite real physical
objects behind vectors. Therefore a detailed study of the structure
of vector sets is of interest at least to physics.
Even now we have three types of sets where operations have the
same properties. These are the set of collinear vectors, the set of
coplanar vectors and the set of vectors in the whole of space. In spite
of the fact that the same operations are introduced in these sets
we are justified in expecting that the structure of the sets must be
different.
There is some temptation, due to their simplicity, to study the
sets relying only on the specific features of their elements. One
cannot help noticing, however, that they have very much in common.
It is therefore appropriate to attack them from some general posi-
tions, in the hope of at least avoiding the tedious and monotonous
repetitions in going from one set to another. But in addition we hope
of course that if we obtain some set with similar properties we shall
be able to carry over all the results of the studies already made.
We now list the familiar general facts about vectors forming any
of the three sets in question.
A. To every pair of vectors x and y there corresponds a vector
x + y, called the sum of x and y, and
(1) addition is commutative, x + y = y + x,
(2) addition is associative, x + (y + z) = (x + y) + z,
(3) there is a unique zero vector 0 such that x + 0 = x for any
vector x,
(4) for every vector x there is a unique negative vector -x such
that x + (-x) = 0.
B. To every pair α and x, where α is a number and x is a vector,
there corresponds a vector αx, called the product of α and x, and
(1) multiplication by a number is associative, α(βx) = (αβ)x,
(2) 1·x = x for any vector x.
C. Addition and multiplication are connected by the following
relations:
(1) multiplication by a number is distributive over vector addi-
tion, α(x + y) = αx + αy,
(2) multiplication by a vector is distributive over addition of
numbers, (α + β)x = αx + βx.
These facts, as in the case of the field, lay no claim to logical
independence. Properties A describe a set of vectors in terms of

addition and say that it is an Abelian group under addition. Proper-


ties B describe a set of vectors in terms of multiplication of a vector
by a number. Properties C describe the relation of the two opera-
tions to each other.
Now consider a set K and a field P of an arbitrary nature. We
shall say that K is a linear or vector space over P if for all elements
of K addition and multiplication by a number from P are defined,
axioms A, B and C holding. Using this terminology we can say that
the set of collinear vectors, the set of coplanar vectors and the set
of vectors in the whole of space are vector spaces over the field
of real numbers.
The elements of any vector space will be called vectors although
in their specific nature they may not at all resemble directed line
segments. Geometrical ideas associated with the name "vectors"
will help us understand and often anticipate the required results
as well as help us find the not always obvious geometrical meaning
in the various facts.
Vectors of a vector space will as before be denoted by small Latin
letters and numbers by small Greek letters. We shall call a vector
space rational, real or complex according as the field P is the field
of rational, real or complex numbers, and denote it by D, R or C
respectively. The fact that the name and notation lack any reference
to the elements of the set has a deep meaning, but we shall discuss
it much later.
Before proceeding to a detailed study of vector spaces consider
the simplest consequences following from the existence of addition
and multiplication by a number. They will mainly concern the
zero and negative vectors.
In any vector space, for every element x
0·x = 0,
where at the right 0 stands for the zero vector and at the left 0 is
the number zero. To prove this relation consider the element 0·x + x.
We have
0·x + x = 0·x + 1·x = (0 + 1)x = 1·x = x.
Consequently,
x = 0·x + x.
Adding to each side -x we find
0 = x + (-x) = (0·x + x) + (-x) = 0·x + (x + (-x))
= 0·x + 0 = 0·x.
Now it is easy to show an explicit expression for the negative ele-
ment -x in terms of the element x. Namely,
-x = (-1)x.

This formula follows from the simple relations
x + (-1)x = 1·x + (-1)x = (1 - 1)x = 0·x = 0.
This in turn establishes the relations
-(αx) = (-α)x = α(-x),
since
-(αx) = (-1)(αx) = (-α)x = α((-1)x) = α(-x).
Recall that by the definition of the operation of subtraction
x - y = x + (-y) for any vectors x and y. The explicit expression
for the negative vector shows the validity of distributive laws for
a difference too. Indeed, regardless of the numbers α and β and the
vectors x and y we have
(α - β)x = αx + (-β)x = αx + (-(βx)) = αx - βx,
α(x - y) = α(x + (-1)y) = αx + (-α)y = αx + (-(αy)) = αx - αy.
It follows in particular that for any number α
α·0 = 0,
since
α·0 = α(x - x) = αx - αx = αx + (-αx) = 0.
And, finally, the last consequence. If for any number α and any
vector x
αx = 0, (10.1)
then either α = 0 or x = 0. Indeed, if (10.1) holds, then there are
two possibilities: either α = 0 or α ≠ 0. The case α = 0 supports
our assertion. Now let α ≠ 0. Then
x = 1·x = ((1/α)·α)x = (1/α)(αx) = (1/α)·0 = 0.
It follows in particular that in any vector space the common
nonzero multiplier can formally be cancelled from any equation,
whether it is a number or a vector. Indeed, if αx = βx and x ≠ 0,
then (α - β)x = 0 and hence α - β = 0, i.e. α = β. If αx = αy
and α ≠ 0, then α(x - y) = 0 and hence x - y = 0, i.e. x = y.
So in terms of multiplication, addition and subtraction all the rules
of equivalent transformations of algebraic expressions formally hold.
We shall no longer state them explicitly in what follows.
We have broached the subject of vector spaces. To conclude, it
is no mere chance that we have used a single notation for the prop-
erties of operations in the field and in the vector space. There are
features of striking resemblance (as well as difference) between the
axioms of the field and those of the vector space over a field. The
reader should ponder on them.

Exercises
Prove that the following sets are vector spaces. Every-
where the name of the operation reflects its content rather than notation.
1. The field consists of real numbers; the set consists of real numbers; addi-
tion is the addition of real numbers; multiplication by a number is the multi-
plication of a real number by a real number.
2. The field consists of real numbers; the set consists of complex numbers;
addition is the addition of complex numbers; multiplication by a number is
the multiplication of a complex number by a real number.
3. The field consists of rational numbers; the set consists of real numbers;
addition is the addition of real numbers; multiplication by a number is the
multiplication of a real number by a rational number.
4. The field is any field; the set consists of a single vector a;
addition is defined by the rule a + a = a; multiplication of the vector a by
any number α is defined by the rule αa = a.
5. The field consists of real numbers; the set consists of polynomials with
real coefficients in a single variable t, including constants; addition is the addi-
tion of polynomials; multiplication by a number is the multiplication of a poly-
nomial by a real number.
6. The field consists of rational numbers; the set consists of numbers of the
form a + b√2 + c√3 + d√5, where a, b, c and d are rationals; addition is
the addition of numbers of the indicated form; multiplication by a number is
the multiplication of a number of the indicated form by a rational number.
7. The field is any field; the set is the same field; addition is the addition of
elements (vectors!) of the field; multiplication by a number is the multiplica-
tion of an element (a vector!) of the field by an element (a number!) of the field.

11. Finite sums and products


Fields and vector spaces are the main sets
with which we shall have to deal in what follows. Two operations,
addition and multiplication, are introduced in these sets. If a large
number of operations on elements is performed, then there appear
expressions containing a considerable number of summands and
factors. For notational convenience we introduce the appropriate
symbols. It will be assumed that addition and multiplication are
commutative and associative operations.
Given a finite number of not necessarily different elements, assume
that all the elements are numbered in some way and have indices
taking on all consecutive values from some integer k to an integer p.
Denote elements by a single letter with index. The index may be
placed anywhere in notation. It may be put in parentheses near
the letter, at the bottom of the letter, at its top and so on. This is
of no consequence. Most often we shall write it at the lower right
of the letter.
We shall denote the sum of elements a_k, a_{k+1}, ..., a_p by the
symbol of the following form:
a_k + a_{k+1} + ... + a_p = Σ_{i=k}^{p} a_i. (11.1)

The index i in the formula is the summation index. Of course nothing


will happen if we denote it by any other letter. Sometimes under the
summation sign the collection of indices is explicitly stated over
which summation is carried out. For example, the sum in question
could be written as:
a_k + a_{k+1} + ... + a_p = Σ_{k ≤ i ≤ p} a_i.

It is obvious that if every element a_i equals the product of an
element b_i and an element α, where α is independent of the sum-
mation index i, then
Σ_{i=k}^{p} a_i = Σ_{i=k}^{p} αb_i = α Σ_{i=k}^{p} b_i,
i.e. the multiplier independent of the summation index may be
taken outside the summation sign.
Suppose now that the elements have two indices, each changing
independently. We agree to use for these elements the common notation
a_ij and let, for example, k ≤ i ≤ p and m ≤ j ≤ n. Arrange the
elements as a rectangular array:
a_km       a_k,m+1      ...   a_kn
a_k+1,m    a_k+1,m+1    ...   a_k+1,n
.  .  .  .  .  .  .  .  .  .  .  .
a_pm       a_p,m+1      ...   a_pn
It is clear that whatever the order of summation the result will
be the same. Therefore, taking into account the above notation
for the sum, we have
(a_km + a_k,m+1 + ... + a_kn) + (a_k+1,m + a_k+1,m+1 + ... + a_k+1,n) +
   ... + (a_pm + a_p,m+1 + ... + a_pn)
   = Σ_{j=m}^{n} a_kj + Σ_{j=m}^{n} a_k+1,j + ... + Σ_{j=m}^{n} a_pj = Σ_{i=k}^{p} ( Σ_{j=m}^{n} a_ij ).
On the other hand, the same sum equals
(a_km + a_k+1,m + ... + a_pm)
   + (a_k,m+1 + a_k+1,m+1 + ... + a_p,m+1) + ... + (a_kn + a_k+1,n + ... + a_pn)
   = Σ_{i=k}^{p} a_im + Σ_{i=k}^{p} a_i,m+1 + ... + Σ_{i=k}^{p} a_in = Σ_{j=m}^{n} ( Σ_{i=k}^{p} a_ij ).

Consequently,
Σ_{i=k}^{p} ( Σ_{j=m}^{n} a_ij ) = Σ_{j=m}^{n} ( Σ_{i=k}^{p} a_ij ).
If we agree that we shall always take summation consecutively
over the summation indices arranged from right to left, then the
brackets may be dropped and we finally get
Σ_{i=k}^{p} Σ_{j=m}^{n} a_ij = Σ_{j=m}^{n} Σ_{i=k}^{p} a_ij.
This means that we may change the order of summation in summing
over two indices. If, for example, a_ij = α_i b_ij, where α_i is indepen-
dent of the index j, then
Σ_{i=k}^{p} Σ_{j=m}^{n} α_i b_ij = Σ_{i=k}^{p} α_i Σ_{j=m}^{n} b_ij.
Similar results hold for sums over any finite number of indices.
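A quick numerical check of these manipulations (a sketch of ours, with arbitrarily chosen bounds and entries):

# Factoring a constant out of a sum and changing the order of summation.
k, p, m, n = 2, 5, 1, 4
alpha = 3.0
b = {(i, j): i * 10 + j for i in range(k, p + 1) for j in range(m, n + 1)}

# a multiplier independent of the summation index goes outside the sum
assert sum(alpha * b[i, m] for i in range(k, p + 1)) == \
       alpha * sum(b[i, m] for i in range(k, p + 1))

# summation over two indices may be done in either order
s1 = sum(sum(b[i, j] for j in range(m, n + 1)) for i in range(k, p + 1))
s2 = sum(sum(b[i, j] for i in range(k, p + 1)) for j in range(m, n + 1))
assert s1 == s2
print(s1)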
A product of elements a_k, a_{k+1}, ..., a_p will be designated by
a symbol of the following form:
a_k a_{k+1} ... a_p = Π_{i=k}^{p} a_i.
Now if a_i = αb_i, then
Π_{i=k}^{p} αb_i = α^{p-k+1} Π_{i=k}^{p} b_i.
As in the case of summation, we may change the order of calculating
the products over two indices, i.e.
Π_{i=k}^{p} Π_{j=m}^{n} a_ij = Π_{j=m}^{n} Π_{i=k}^{p} a_ij.
All these facts can be proved according to the same scheme as in
the case of summation of numbers.
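The product identities can be checked the same way; math.prod (available in Python 3.8 and later) plays the role of the Π sign in this sketch of ours:

from math import prod

# If a_i = alpha * b_i for i = k, ..., p, then
# prod(a_i) = alpha**(p - k + 1) * prod(b_i).
k, p = 3, 7
alpha = 2
b = {i: i + 1 for i in range(k, p + 1)}
lhs = prod(alpha * b[i] for i in range(k, p + 1))
rhs = alpha ** (p - k + 1) * prod(b[i] for i in range(k, p + 1))
assert lhs == rhs
print(lhs)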

Exercises
Calculate the following expressions:
Σ_{i=1}^{n} 1,   Σ_{i=1}^{n} i,   Σ_{i=1}^{n} i²,   Σ_{i=1}^{n} 8(i - 1),
Σ_{r=1}^{n} Σ_{j=1}^{m} rj,   Σ_{i=1}^{n} Σ_{s=1}^{m} (i + 5s),   Σ_{i=1}^{n} Σ_{j=1}^{m} Σ_{k=1}^{p} (2i - j)2^k,
Π_{p=1}^{n} 10^p,   Π_{i=1}^{n} Π_{j=1}^{m} 2^{i-j},   Π_{i=1}^{m} Π_{j=1}^{n} Π_{k=1}^{p} 2^{i+j+k}.

12. Approximate calculations


The sets discussed above are very widely used
in various theoretical studies. To obtain a result it is almost always
necessary to perform some operations on the elements of the sets.
Especially frequently a need arises to carry out manipulations with
elements of number fields. We want to note a very important feature
of practical realizations of such computations.
Suppose for definiteness that we are dealing with a field of real
numbers. Let every number be represented as an infinite decimal
fraction. Neither man nor the most modern computer can handle
infinite fractions. In practice therefore every such fraction is replaced
by a finite decimal fraction close to it or by a suitable rational
number.
So an exact real number is replaced by an approximate one.
In theoretical studies implying precise assignment of numbers one
expression or another is fairly often replaced by an equal expression,
possibly written in another form. Of course in this case such a substi-
tution can raise neither any objections nor even questions. But if
we want to calculate some expression using approximate numbers,
then the form of the expression is no longer irrelevant.
Consider a simple example. It is easy to check that in the case
of the exact assignment of the number √2
(√2 - 1)^6 = 99 - 70√2. (12.1)
Since √2 = 1.4142..., the numbers 7/5 = 1.4 and 17/12 =
= 1.4166... may be considered to be approximate values for √2.
But substituting 7/5 on the left and right of (12.1) we get 0.00509...
and 1.0 respectively. For 17/12 we have 0.00523... and -0.1666... .
The results of the substitutions considerably differ, and it is not
immediately apparent which is closer to the truth. This shows how
careful one must be in handling approximate numbers.
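The experiment is easy to repeat. The sketch below (ours) takes (12.1) in the form reconstructed above, whose exact value is about 0.00505, and substitutes the two approximations into both sides:

# Both sides of (12.1), (sqrt(2) - 1)**6 = 99 - 70*sqrt(2),
# evaluated with two rational approximations to sqrt(2).
for approx in (7 / 5, 17 / 12):
    left = (approx - 1) ** 6
    right = 99 - 70 * approx
    print(approx, left, right)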
We have discussed only one possible source of approximate num-
bers, the rounding of exact numbers. In fact there are many other
sources. For example, initial data for calculations often result from
experiments and every experiment may produce a result only to
a limited precision. Even in such simple operations as multiplica-
tion and division the number of digits in fractions may greatly
increase. We are compelled therefore to discard part of the digits
in the results of intermediate calculations, i.e. we are again compelled
to replace some numbers by approximate ones and so on.
A detailed study of operations with approximate numbers is
beyond the scope of this course. However, we shall fairly frequently
return to the discussion of the difference between theoretical and

practical calculations. The need for such a discussion arises from


the fact that theoretical calculations cannot as a rule be realized in
exact form.
Exercises
1. What finite decimal fraction must be used to approx-
imate √2 for the first six digits to coincide in the results of computing the
left- and right-hand sides of (12.1)?
2. Let the result of each operation on two real numbers be rounded according
to any rule you know to t decimal places. Are the commutative and associative
properties of the operations preserved?
3. Will the distributive laws hold under the hypotheses of the preceding
exercise?
4. To what conclusion do you come if the answer in Exercises 2 and 3 is no?
CHAPTER 2

The Structure
of a Vector Space

13. Linear combinations and spans


Let e1, e2, ..., en be a finite number of
arbitrary, not necessarily distinct vectors from a vector space K
over a field P. We shall call them a system of vectors. One system
of vectors is said to be a subsystem of a second system if the first
system contains only some vectors of the second and no other vectors.
The vectors of a given system and those obtained from them will
be subjected to the operations of addition and multiplication by
a number. It is clear that any vector x of the form
x = α1e1 + α2e2 + ... + αnen, (13.1)
where α1, α2, ..., αn are some numbers from P, is obtained from
the vectors of a given system e1, e2, ..., en with the aid of the
two operations. Moreover, in whatever order the operations are
performed, we shall obtain vectors of the form (13.1).
A vector x in (13.1) is said to be linearly expressible in terms of
the vectors e1, e2, ..., en. The right-hand side of (13.1) is called
a linear combination of these vectors and the numbers α1, α2, ..., αn
are the coefficients of the linear combination.
Fix a system of vectors e1, e2, ..., en and allow the coefficients
of linear combinations to take on any values from the field P. This
will determine some set of vectors in K. This set is the span of vec-
tors e1, e2, ..., en and is designated L(e1, e2, ..., en).
Our interest in spans is accounted for by two circumstances. First,
any span has a simple structure, being a collection of all linear
combinations of vectors of a given system. Second, the span of any
system of vectors from any vector space is itself a vector space.
Indeed, all the axioms of a vector space are almost obvious.
Some explanation may only be required by the axioms relating
to the zero and the negative vector. The zero vector is clearly in
any span and corresponds to the zero values of the coefficients of
a linear combination, i.e.
0 = 0·e1 + 0·e2 + ... + 0·en.
The negative of (13.1) is
-x = (-α1)e1 + (-α2)e2 + ... + (-αn)en.

The uniqueness of the zero and negative vectors follows from their
uniqueness as vectors of the vector space K.
Notice that the span of vectors eh e 2 , • • • , en is the "smallest"
vector space containing those vectors. Indeed, the span consists
of only linear combinations of vectors eh e 2, ... , en and any
vector space containing eh e 2 , • • • , en must contain all their
linear combinations.
So any vector space contains in the general case an infinite number
of other vector spaces, the spans. Now the following questions
arise:
What are the conditions under which the spans of two distinct
systems of vectors consist of the same vectors of the original space?
What minimum number of vectors determines the same span?
Is the original vector space the span of some of its vectors?
We shall soon get answers to these and other questions. To do
this a very wide use will be made of the concept of linear combina-
tion, and in particular of its transitive property. Namely, if some
vector z is a linear combination of vectors x1, x2, ..., xr and
each of them in turn is a linear combination of vectors y1, y2, ...
..., ys, then z too may be represented as a linear combination of
y1, y2, ..., ys. We prove this property. Let
z = Σ_{i=1}^{r} β_i x_i (13.2)
and in addition for every index i, 1 ≤ i ≤ r, let
x_i = Σ_{j=1}^{s} γ_ij y_j,
where β_i and γ_ij are some numbers from P.
Substituting the expression for x_i on the right of (13.2) and using
the corresponding properties of finite sums we get
z = Σ_{i=1}^{r} β_i Σ_{j=1}^{s} γ_ij y_j = Σ_{j=1}^{s} Σ_{i=1}^{r} β_i γ_ij y_j = Σ_{j=1}^{s} ( Σ_{i=1}^{r} β_i γ_ij ) y_j = Σ_{j=1}^{s} δ_j y_j,
where the coefficients δ_j stand for the following expressions:
δ_j = Σ_{i=1}^{r} β_i γ_ij.
So the concept of linear combination is indeed transitive.
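Computationally the transitive property is just the rearrangement of nested sums carried out above. In the sketch below (ours; vectors are represented by coordinate tuples merely to have something concrete to compute with) z is formed both directly through the x_i and through the composite coefficients δ_j, with the same result:

# Transitivity of linear combinations:
# z = sum_i beta_i x_i,  x_i = sum_j gamma_ij y_j  ==>  z = sum_j delta_j y_j
# with delta_j = sum_i beta_i gamma_ij.
y = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]   # vectors y_1, y_2, y_3
gamma = [[1.0, 2.0, 0.0],                                 # coefficients of x_i in the y_j
         [0.0, 1.0, 3.0]]
beta = [4.0, 5.0]                                         # coefficients of z in the x_i

def comb(coeffs, vectors):
    # linear combination of coordinate tuples
    return tuple(sum(c * v[k] for c, v in zip(coeffs, vectors))
                 for k in range(len(vectors[0])))

x = [comb(gamma[i], y) for i in range(2)]
z_direct = comb(beta, x)
delta = [sum(beta[i] * gamma[i][j] for i in range(2)) for j in range(3)]
z_via_delta = comb(delta, y)
assert z_direct == z_via_delta
print(z_direct)        # (4.0, 13.0, 15.0)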

Exercises
1. What are in a space of directed line segments the
spans of systems of one, two, three and a larger number of directed line segments?
2. Consider a vector space of polynomials in t over the field of real numbers.
What is the span of the system of vectors t² + 1, t² + t and 1?
3. In what space do all spans coincide with the space?
4. Prove that the vector space of all directed line segments cannot be the
span of any two directed line segments.

14. Linear dependence


Consider again arbitrary vectors e1, e2, ...
..., en in a vector space. It may happen that one of them is linearly
expressible in terms of the others. For example, let it be e1. Then
each vector of e1, e2, ..., en is linearly expressible in terms of
e 2 , ea, ... , en. Therefore any linear combination of vectors
eh e2 , • • • , en is also a linear combination of vectors e 2 , ea, ...
. . . , en. Consequently, the spans of the vectors elt e 2 , • • • , en
and e 2 , ea, ... , e11 coincide.
Suppose further that among the vectors e 2 , e3 , • • • , en there is
some vector, say, e2 which is also linearly expressible in terms of
the rest. Repeating our reasoning we conclude that now any linear
combination of vectors e1 , e 2 , • • • , en is also a linear combination
of ea, e4 , • • • , en. Continuing this process we finally come from
the system ell e 2 , • • • , en to a system from which none of the vectors
can any longer be eliminated. The span of the new system of vectors
obviously coincides with that of the vectors e1, e2, ..., en. In
addition we can say that if there were at least one nonzero vector
among e1, e2, ..., en, then the new system of vectors would either
consist of only one nonzero vector or none of its vectors would be linearly
expressible in terms of the others.
Such a system of vectors is called linearly independent.
If a system of vectors is not linearly independent, then it is said
to be linearly dependent. In particular, by definition a system consist-
ing of a zero vector alone is linearly dependent. Linear dependence
or independence are properties of a system of vectors. Nevertheless
the corresponding adjectives are very often used to refer to the
vectors themselves. Instead of a "linearly independent system of
Yectors" we shall sometimes say a "system of linearly independent
vectors" and so on.
In terms of the notions just introduced this means that we proved
Lemma 14.1. If not all of the vectors e1, e2, ..., en are zero and
this system is linearly dependent, then we can find in it a linearly
independent subsystem of vectors in terms of which any of the vectors
e1, e2, ..., en is linearly expressible.

Whether the system of vectors e1 , e 2 , • • • , en is linearly depen-


dent or linearly independent is determined by one, seemingly unex-
pected, fact. We have already noted that the zero vector is in the
span and is clearly a linear combination (13.1) with zero values
of the coefficients. In spite of this it can be linearly expressed in
terms of the vectors eh e 2 , • • • , en and in other ways, i.e. defined
by another set of the coefficients of a linear combination. The linear
independence of e1, e2, ..., en is very closely related to the unique-
ness of representing the zero element in terms of them. Namely,
we have
Theorem 14.1. A system of vectors e1, e2, ..., en is linearly inde-
pendent if and only if
α1e1 + α2e2 + ... + αnen = 0 (14.1)
implies the equality to zero of all the coefficients of the linear combina-
tion.
Proof. Let n = 1. If e1 ≠ 0, then, as already noted above, α1e1 = 0
must yield α1 = 0. But if it follows from α1e1 = 0 that α1
is zero, then e1 obviously cannot be zero.
Consider now the case n ≥ 2. Let a system of vectors be linearly
independent. Suppose (14.1) is true for some set of coefficients
among which there is at least one different from zero. For example,
let α1 ≠ 0. Then (14.1) yields
e1 = (-α2/α1)e2 + (-α3/α1)e3 + ... + (-αn/α1)en,
i.e. e1 is linearly expressible in terms of the other vectors of the
system. This contradicts the condition that the system be linearly
independent, and therefore it is impossible that there should be
nonzero coefficients among those satisfying (14.1).
If (14.1) implies that all coefficients are equal to zero, then the
system of vectors cannot be linearly dependent. Indeed, suppose
the contrary and let e1, for example, be linearly expressible in terms
of the other vectors, i.e. let
e1 = β2e2 + β3e3 + ... + βnen.
Then (14.1) will clearly hold for the coefficients α1 = -1, α2 = β2, ...,
αn = βn, among which at least one is not equal to
zero. Thus the theorem is proved.
This theorem is so widely used in various studies that it is most
often regarded just as definition of linear independence.
Note two simple properties of systems of vectors associated with
linear independence.
Lemma 14.2. If some of the vectors e1, e2, ..., en are linearly
dependent, then so is the entire system e1, e2, ..., en.

Proof. We may assume without loss of generality that it is the
first vectors e1, e2, ..., ek that are linearly dependent. Consequent-
ly, there are numbers α1, α2, ..., αk, not all zero, such that
α1e1 + α2e2 + ... + αkek = 0.
This yields
α1e1 + α2e2 + ... + αkek + 0·e_{k+1} + ... + 0·en = 0.
But this equation implies the linear dependence of e1, e2, ..., en
since there are nonzero numbers among α1, α2, ..., αk, 0, ..., 0.
Lemma 14.3. If there is at least one zero vector among e1, e2, ..., en,
then the entire system e1, e2, ..., en is linearly dependent.
Proof. Indeed, a system of one zero vector is linearly dependent.
Therefore it follows from the property just proved that the entire
system is linearly dependent.
The following theorem is the most important result relatin~ to
linear dependence:
Theorem 14.2. Vectors e1, e2, ..., en are linearly dependent if and
only if either e1 = 0 or some vector ek, 2 ≤ k ≤ n, is a linear com-
bination of the preceding vectors.
Proof. Suppose e1, e2, ..., en are linearly dependent. Then in
(14.1) not all coefficients are zero. Let the last nonzero coefficient
be αk. If k = 1, then this means that e1 = 0. Now let k > 1. Then
from
α1e1 + α2e2 + ... + αkek = 0
we find that
ek = (-α1/αk)e1 + (-α2/αk)e2 + ... + (-α_{k-1}/αk)e_{k-1}.


This proves the necessity of the statement formulated in the
theorem. Sufficiency is obvious since both the case where e1 = 0
and the case where ek is linearly expressible in terms of the preceding
vectors imply the linear dependence of the first vectors in e1 , e2 , • • •
• . . , en. But this implies the linear dependence of the entire system
of vectors.
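A small numerical illustration of the criterion (ours; plane directed segments are written as coordinate pairs only for the sake of the arithmetic): the third vector below is a linear combination of the two preceding ones, so the system is linearly dependent and Theorem 14.2 points at e3.

# e1, e2, e3 as coordinate pairs; e3 = 2*e1 - e2, so e1, e2, e3 is
# linearly dependent and e3 is expressible through the preceding vectors.
e1, e2 = (1.0, 0.0), (0.0, 1.0)
e3 = (2 * e1[0] - e2[0], 2 * e1[1] - e2[1])
# the corresponding relation (14.1): 2*e1 + (-1)*e2 + (-1)*e3 = 0
print(2 * e1[0] - e2[0] - e3[0], 2 * e1[1] - e2[1] - e3[1])   # 0.0 0.0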
Exercises
1. Prove that if any vector of a vector space can be
uniquely represented as a linear combination of vectors e1, e2, ..., en, then that
system of vectors is linearly independent.
2. Prove that if a system of vectors e1, e2, ..., en is linearly independent,
then any vector of the span of those vectors can be uniquely represented as a
linear combination of them.
3. Prove that a system of vectors e1, e2, ..., en is linearly dependent if and
only if either en = 0 or some vector ek, 1 ≤ k ≤ n - 1, is a linear combination
of the subsequent vectors.

4. Consider a vector space of polynomials in a variable t over the field of
real numbers. Prove that the system of vectors 1, t, t², ..., tⁿ is linearly inde-
pendent for any n.
5. Prove that a system of two noncollinear directed line segments is linearly
independent.

15. Equivalent systems of vectors


Consider two systems of vectors of a vector
space K. Suppose their spans coincide and constitute some set L.
Any vector of each system is clearly in L and in addition each vector
of L can be represented in this case as a linear combination of both
the vectors of one system and those of the other. Consequently:
Two systems of vectors possess the property that any vector of each
system is linearly expressible in terms of the vectors of the other.
Such systems are called equivalent.
It follows from the foregoing that if the spans of two systems of
vectors coincide, then those systems are equivalent. Now let any
two equivalent systems be given. Then by the transitivity of the
concept of linear combination any linear combination of vectors of
one system can be represented as a linear combination of vectors of
the other system, i.e. the spans of both systems coincide. So we
have
Lemma 15.1. For the spans of two systems of vectors to coincide it
is necessary and sufficient that those systems should be equivalent.
Notice that the concept of equivalence of two systems of vectors is
an equivalence relation. Reflexivity is obvious since any system is
equivalent to itself, symmetry follows from the definition of equiva-
lent systems, and the transitivity of the notion follows from that of
the concept of linear combination. Therefore the set of all systems of
vectors of any vector space can be divided into classes of equivalent
systems. It is important to stress that all the systems of the same
class have the same span.
Nothing can be said in the general case about the number of
vectors in equivalent systems. But if at least one of two equivalent
systems is linearly independent, then it is possible to make quite
definite conclusions concerning the number of vectors. They are
based on
Theorem 15.1. If each of the vectors in a linearly independent system
e1, e2, ..., en is linearly expressible in terms of vectors y1, y2, ..., ym,
then n ≤ m.
Proof. Under the hypothesis of the theorem en is linearly expres-
sible in terms of y1, y2, ..., ym and hence the system
en, y1, y2, ..., ym (15.1)
is linearly dependent. The vector en is not equal to zero and there-
fore by Theorem 14.2 some vector yi in (15.1) is a linear combination

of the preceding vectors. On eliminating this vector we obtain the
following system:
en, y1, ..., y_{i-1}, y_{i+1}, ..., ym. (15.2)
Using the transitivity of the concept of linear combination it is
now easy to show that each of the vectors e1, e2, ..., en is linearly
expressible in terms of vectors (15.2).
We join to vectors (15.2) on the left a vector e_{n-1}. We again con-
clude that the system
e_{n-1}, en, y1, ..., y_{i-1}, y_{i+1}, ..., ym (15.3)
is linearly dependent. The vector e_{n-1} is not equal to zero and there-
fore by Theorem 14.2 one of the other vectors in (15.3) is a linear com-
bination of the preceding vectors. This vector cannot be en since
this would imply the linear dependence of the system of two vectors
e_{n-1}, en and hence of the entire system of vectors e1, e2, ..., en.
Thus some vector yj in (15.3) is linearly expressible in terms of the
preceding ones. If we eliminate it, then we again obtain a system
in terms of which each of the vectors e1, e2, ..., en is linearly expres-
sible.
Continuing this process notice that the vectors y1, y2, ..., ym
cannot be exhausted before we have joined all vectors e1, e2, ..., en.
Otherwise it will turn out that each of the vectors e1, e2, ..., en
is linearly expressible in terms of some of the vectors of the same
system, i.e. that the entire system must be linearly dependent.
Since this contradicts the hypothesis of the theorem, it follows that
n ≤ m.
Consider consequences of the theorem. Suppose we are given two
equivalent linearly independent systems of vectors. By Theorem 15.1
each of the systems contains at most as many vectors as the other.
Consequently:
Equivalent linearly independent systems consist of the same number
of vectors.
Take further n arbitrary vectors, construct on them the span and
choose on it any n + 1 vectors. Since the number of those vectors is
greater than that of the given vectors, they cannot be linearly
independent. Therefore:
Any n + 1 vectors in the span of a system of n vectors are linearly
dependent.
In terms of equivalent systems Lemma 14.1 implies that whatever
a system of vectors not all equal to zero may be there is an equivalent
linearly independent subsystem in it. This subsystem is called a
basis of the original system.
Of course, any system may have more than one basis. All the
bases of equivalent systems are themselves equivalent systems. It
follows from the first consequence of Theorem 15.1 that they consist

of the same number of vectors. That number is a characteristic of


all equivalent systems and is called their rank. By definition the
rank of systems of zero vectors is considered to be equal to zero.
Consider now two linearly independent systems consisting of the
same number of vectors. Replace some vector of one system by some
vector of the other. In the resulting system again replace a vector of
the first system by some of the remaining vectors of the second
system and so on. The replacement process is carried on until one
system is replaced by the other. If replacement is carried out in an
arbitrary manner, then the intermediate systems may turn out to be
linearly dependent. However, we have
Theorem 15.2. The process of successive replacement may be carried
out so that intermediate systems will all be linearly independent.
Proof. Let y1, y2, ..., yn and z1, z2, ..., zn be two linearly in-
dependent systems of vectors. Suppose k steps of the process have
been carried out, with k ≥ 0. We may assume without loss of general-
ity that the vectors y1, ..., yk have been replaced by z1, ..., zk
and that all the systems obtained, including the system
z1, ..., zk, y_{k+1}, ..., yn,
are linearly independent. This assumption obviously holds for
k = 0.
Suppose further that when y_{k+1} is replaced by any of the vectors
z_{k+1}, ..., zn all systems
z1, ..., zk, zi, y_{k+2}, ..., yn
are linearly dependent for i = k + 1, ..., n. Since the system
z1, ..., zk, y_{k+2}, ..., yn (15.4)
is linearly independent, it follows that the vectors zi for i = k + 1, ...
..., n are linearly expressible in terms of it. But so are the vectors
zi for i = 1, 2, ..., k. Consequently, all vectors z1, ..., zn must be
linearly expressible in terms of (15.4). This is impossible by virtue
of Theorem 15.1. Therefore the replacement process indicated in
Theorem 15.2 does indeed hold.

Exercises
Prove that the following transformations of a system
of vectors, called elementary, result in an equivalent system.
1. Addition to a system of vectors of any linear combination of those vectors.
2. Elimination from a system of vectors of any vector which is a linear com-
bination of the remaining vectors.
3. Multiplication of any vector of a system by a number other than zero.
4. Addition to any vector of a system of any linear combination of the re-
maining vectors.
5. Interchanging of two vectors.

16. The basis


Suppose we are given a vector space consisting
of not only a zero vector. In such a space there is clearly at least one
nonzero vector and consequently there is a linearly independent
system of at least one vector. There are two possibilities now: either
there is a linearly independent system containing an arbitrarily
large number of vectors or there is a linearly independent system
containing a maximum number of vectors. In the former case the
vector space is called infinite dimensional and in the latter it is
called finite dimensional.
With the exception of some episodic examples, our attention will
be devoted throughout this book to finite dimensional spaces. In par-
ticular, a finite dimensional vector space is any span constructed on
a finite number of vectors of an arbitrary (not necessarily finite
dimensional) space.
So let vectors e1 , e 2 , • • • , en constitute in a finite dimensional
vector space K a linearly independent system with a maximum
number of vectors. This means that for any vector z in K the system
e1 , e2 , • • • , en, z will be linearly dependent. By Theorem 14.2 the
vector z is linearly expressible in terms of e1 , e 2 , • • • , en.
Since z is arbitrary and e1 , e 2 , • • • , en are fixed, we may say
that
Any finite dimensional vector space is the span of a finite number of
its vectors.
In studying finite dimensional vector spaces now we can use any
properties relating to spans and equivalent systems of vectors. We
introduce the following definition:
A linearly independent system of vectors in terms of which each
vector of a space is expressible is called a basis of the space.
Our concept of basis is associated with a linearly independent
system containing a maximum number of vectors. It is obvious,
however, that all bases of the same finite dimensional vector space
are equivalent linearly independent systems. As we know, such
systems contain the same number of vectors. Therefore the number
of vectors in a basis is a characteristic of a finite dimensional vector
space. This number is called the dimension of a vector space K and
designated dim K. If dim K = n, then the space K is n-dimensional.
It is clear that:
In an n-dimensional vector space any linearly independent system
of n vectors forms a basis and any system of n + 1 vectors is linearly
dependent.
Notice that throughout the foregoing we have assumed that a
vector space consists of not only a zero vector. A space consisting of
a zero vector alone has no basis in the above sense, and by definition
its dimension is assumed to equal zero.

The concept of basis plays a great role in the study of finite dimen-
sional vector spaces and we shall continually use it for this purpose.
It allows a very easy description of the structure of any vector space
over an arbitrary field P. In addition it can be used to construct
a very efficient method reducing operations on elements of a space to the
corresponding operations on numbers from a field P.
As shown above, any vector x of a vector space K may be repre-
sented as a linear combination
x = α1e1 + α2e2 + ... + αnen, (16.1)
where α1, α2, ..., αn are some numbers from P and e1, e2, ..., en
constitute a basis of K. The linear combination (16.1) is called the
expansion of a vector x with respect to a basis and the numbers α1, α2, ...
..., αn are the coordinates of x relative to that basis. The fact that x
is given by its coordinates α1, α2, ..., αn will be written as follows:
x = (α1, α2, ..., αn).
As a rule, we shall not indicate to which basis the given coordinates
relate, unless any ambiguity arises.
It is easy to show that for any vector x in K its expansion with
respect to a basis is unique. This can be proved by a device very often
used in solving problems concerning linear dependence. Suppose
there is another expansion
x = β1e1 + β2e2 + ... + βnen. (16.2)
Subtracting term by term (16.2) from (16.1) we get
(α1 - β1)e1 + (α2 - β2)e2 + ... + (αn - βn)en = 0.
Since e1 , e 2, ••• , en are linearly independent, it follows that all
coefficients of the linear combination are zero and hence expansions
(16.1) and (16.2) coincide.
Thus, with a basis of a vector space K fixed, every vector in K
is uniquely determined by the collection of its coordinates relative
to that basis.
Now let any two vectors x and y in K be given by their coordinates
relative to the same basis e1, e2, ..., en, i.e.
x = α1e1 + α2e2 + ... + αnen,
y = γ1e1 + γ2e2 + ... + γnen;
then
x + y = (α1 + γ1)e1 + (α2 + γ2)e2 + ... + (αn + γn)en.
Also, for any number λ in the field P,
λx = (λα1)e1 + (λα2)e2 + ... + (λαn)en.

It follows that in adding two vectors of a vector space their coordinates


relative to any basis are added and in multiplying a vector by a number
all its coordinates are multiplied by that number.
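This reduction of operations on vectors to operations on their coordinates is exactly what one implements in practice. A minimal sketch (ours): once a basis is fixed, a vector is stored as the tuple of its coordinates, and addition and multiplication by a number act componentwise.

# Vectors given by their coordinates relative to a fixed basis e1, ..., en.
def add(x, y):
    return tuple(a + c for a, c in zip(x, y))

def mul(lam, x):
    return tuple(lam * a for a in x)

x = (2.0, -1.0, 3.0)      # x = 2*e1 - e2 + 3*e3
y = (0.0, 4.0, 1.0)       # y = 4*e2 + e3
print(add(x, y))          # (2.0, 3.0, 4.0)
print(mul(-2.0, x))       # (-4.0, 2.0, -6.0)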

Exercises
1. Prove that the rank of a system of vectors coincides
with the dimension of its span.
2. Prove that equivalent systems of vectors have the same rank.
3. Prove that if a span L1 is constructed on the vectors of a span L2, then
dim L1 ≤ dim L2.
4. Prove that if a span L1 is constructed on the vectors of a span L2 and
dim L1 = dim L2, then the spans coincide.
5. Prove that a vector space of polynomials with real coefficients given over
a field of real numbers is infinite dimensional.

17. Simple examples of vector spaces


The fundamental concepts of linear dependence
and of basis can be illustrated by very simple but instructive examples
if we take as vector spaces numerical sets with the usual operations
of addition and multiplication. That the axioms of a vector space
hold for such sets is quite obvious and therefore we shall not verify
their validity. As before elements of a space will be called vectors.
Consider the complex vector space which is an additive group of
all complex numbers with multiplication over the field of complex
numbers. It is clear that any nonzero number z1 is a linearly in-
dependent vector. But even any two nonzero vectors z1 and z2 are
always linearly dependent. To prove this it suffices to find two com-
plex numbers a 1 and a 2 , not both equal to zero, such that a 1z1 +
+ a~ 2 = 0. But this equation is obvious for a 1 = -z 2 and a 2 = z1 •
Therefore the vector space considered is one-dimensional.
Somewhat different is the real vector space which is an additive
group of all complex numbers with multiplication over the field
of real numbers. As coefficients of a linear combination we can now
use only real numbers and therefore this vector space cannot be one-
dimensional. Indeed, there are no real numbers a 1 and a,, all non-
zero, such that for them the linear combination a 1 z1 + a~ 2 would
vanish, for example, when z1 = 1 and z2 = i. It is left as an exercise
for the reader to prove that this vector space is two-dimensional.
It is important to stress that although both of the above spaces
consist of the same elements they are fundamentally different from
each other.
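The difference shows up as soon as one looks for the coefficients of a vanishing linear combination. The sketch below (ours) uses Python's built-in complex numbers: over the complex field the coefficients α1 = -z2, α2 = z1 from the text always work, while over the real field a small grid search finds no nontrivial real pair for z1 = 1, z2 = i.

import itertools

# Two nonzero complex numbers are always linearly dependent over C ...
z1, z2 = 1 + 2j, 3 - 1j
alpha1, alpha2 = -z2, z1
print(alpha1 * z1 + alpha2 * z2)          # 0j

# ... but 1 and i are independent over R: a1*1 + a2*i vanishes only
# when both real coefficients a1, a2 are zero (checked here on a grid).
grid = [k / 4 for k in range(-8, 9)]
nontrivial = [(a1, a2) for a1, a2 in itertools.product(grid, repeat=2)
              if (a1, a2) != (0.0, 0.0) and complex(a1, a2) == 0]
print(nontrivial)                          # []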
It is now clear that the real vector space which is an additive
group of all real numbers with multiplication over the field of real
numbers is a one-dimensional space. We consider further the rational
vector space which is an additive group of all real numbers with
multiplication over the field of rational numbers.

We shall try, as before, to construct a system containing a maxi-


mum number of linearly independent vectors r 1 , r 2 , r 3 , . . . . It is
clear that we may take, for example, r 1 = 1. Since only rational
numbers are allowed to be the coefficients of linear combinations
α1, α2, α3, ..., it is clear that no number of the form α1·1 can
represent, for example, √2. Therefore this space cannot be one-di-
mensional. Consequently, it is √2 that can be taken as a second
vector linearly independent of the identity element. A number of
the form α1·1 + α2·√2 cannot, however, represent, for example,
the fourth root of 2.
Indeed, let ⁴√2 = α1 + α2√2 hold for some rational numbers
α1 and α2. Squaring both sides of the equation we get
√2 = α1² + 2α2² + 2α1α2√2,
or
(α1² + 2α2²)/(1 - 2α1α2) = √2.
This is impossible since the left-hand side is rational and the
right-hand side is irrational.
So the space under consideration cannot be two-dimensional
either. But then what sort is it? Surprising as it may be, it is infinite
dimensional. However, the proof of this fact is beyond the scope of
this book.
The particular attention we have given to the examples of vector
spaces of small dimensions is due to the possibility of using them to
construct vector spaces of any dimensions. But we shall discuss
this later on.

Exercises
1. What is the dimension of the vector space of ration-
al numbers over the field of rational numbers?
2. Construct linearly independent systems of vectors in the space of com-
plex numbers over the field of rational numbers.
3. Is an additive group of rational numbers over the field of real numbers
a vector space? If not, why?

18. Vector spaces


of directed line segments
We have already noted earlier that the sets
of collinear directed line segments, of coplanar directed line segments
and of directed line segments in the whole of space form vector
spaces over the field of real numbers. Our immediate task is to show
their dimensions and to construct their bases.
Lemma 18.1. A necessary and sufficient condition for the linear dependence of two vectors is that they should be collinear.
Proof. Notice that the lemma is obvious if at least one of the two vectors is zero. It will therefore be assumed that both vectors are nonzero.
Let a and b be linearly dependent vectors. Then there are numbers α and β, not both zero, such that
αa + βb = 0.
Since under the hypothesis a ≠ 0 and b ≠ 0, we have α ≠ 0 and β ≠ 0 and therefore
b = (-α/β) a.
Consequently, by the definition of multiplication of a directed line segment by a number, a and b are collinear.
Suppose now that a and b are collinear vectors. Apply them to a
common point 0. They will be on some straight line which is turned
into an axis by specifying on it a direction. The vectors a and b are nonzero and therefore there is a real λ such that the magnitude of the directed line segment a equals the product of the magnitude of b by λ, i.e. {a} = λ{b}. But by the definition of multiplication of a directed line segment by a number this means that a = λb. So the vectors a and b are linearly dependent.
It follows from the above lemma that a vector space of collinear
directed line segments is a one-dimensional space and that any nonzero
vector may serve as its basis.
Lemma 18.1 allows us to deduce one useful consequence. Namely,
if vectors a and b are collinear and a ≠ 0, then there is a number λ such that b = λa. Indeed, these vectors are linearly dependent, i.e. for some numbers α and β, not both zero, αa + βb = 0. If it is assumed that β = 0, then it follows that α = 0. Therefore β ≠ 0 and we may take λ = (-α)/β.
Lemma 18.2. A necessary and sufficient condition for three vectors
to be linearly dependent is that they should be coplanar.
Proof. We may assume without loss of generality that no pair of
the three vectors is collinear since otherwise the lemma follows
immediately from Lemma 18.1.
So let a, b and c be three linearly dependent vectors. Then we can
find real numbers α, β and γ, not all zero, such that
αa + βb + γc = 0.
If, for example, γ ≠ 0, then this equation yields
c = (-α/γ) a + (-β/γ) b.
Apply a, b and c to a common point O. Then it follows from the last equation that the vector c is equal to the diagonal of the parallelogram constructed on the vectors (-α/γ) a and (-β/γ) b. This means that after translation to the common point the vectors a, b and c are found to be in the same plane and consequently they are coplanar.
Suppose now that a, b and c are coplanar vectors. Translate them to the same plane and apply them to the common point O (Fig. 18.1). Draw through the terminal point of c the straight lines parallel to a and b and consider the parallelogram OACB. The vectors a, OA and b, OB are collinear by construction and nonzero and therefore there are numbers λ and μ such that
OA = λa,   OB = μb.
But OC = OA + OB, which means that c = λa + μb or
λa + μb + (-1) c = 0.
Since λ, μ and -1 are clearly different from zero, the last equation implies that a, b and c are linearly dependent.
We can now solve the question concerning the dimension of the vector space of coplanar directed line segments. By Lemma 18.2 the dimension of this space must be less than three. But any two noncollinear directed line segments of this space are linearly independent. Therefore the vector space of coplanar directed line segments is a two-dimensional space and any two noncollinear vectors may serve as its basis.
Lemma 18.3. Any four vectors are linearly dependent.
Proof. We may assume without loss of generality that no triple of the four vectors are coplanar since otherwise the lemma follows immediately from Lemma 18.2.
Apply vectors a, b, c and d to a common origin O and draw through the terminal point D of d the planes parallel to the planes determined respectively by the pairs of vectors b, c; a, c; a, b (Fig. 18.2). It follows from the parallelogram law of vector addition that
OD = OC + OE,
therefore
OD = OA + OB + OC.   (18.1)
The vectors a, OA, as well as b, OB and c, OC, are collinear by construction, with a, b and c being nonzero. Therefore there are numbers λ, μ and ν such that
OA = λa,   OB = μb,   OC = νc.
Considering (18.1) this yields
d = λa + μb + νc,
from which it follows that a, b, c and d are linearly dependent.
From Lemma 18.3 we conclude that the dimension of the vector
space of all directed line segments must be less than four. But it
cannot be less than three since by Lemma 18.2 any three noncoplanar
directed line segments are linearly independent. Therefore the vector
space of all directed line segments is a three-dimensional space and
any three noncoplanar vectors may serve as its basis.
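These facts are easy to check numerically once directed line segments are written as coordinate triples. A minimal sketch in Python, assuming the numpy library is available: the rank of the matrix whose columns are the given vectors equals the maximum number of linearly independent vectors among them.

    import numpy as np

    # Three noncoplanar vectors in space, written as coordinate triples.
    a = np.array([1.0, 0.0, 0.0])
    b = np.array([1.0, 1.0, 0.0])
    c = np.array([1.0, 1.0, 1.0])
    d = np.array([2.0, -3.0, 5.0])   # an arbitrary fourth vector

    # The rank of the matrix whose columns are the vectors equals the
    # maximum number of linearly independent vectors among them.
    print(np.linalg.matrix_rank(np.column_stack([a, b, c])))     # 3: a basis of V3
    print(np.linalg.matrix_rank(np.column_stack([a, b, c, d])))  # 3: any four vectors are dependent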
The vector spaces considered are not very obvious geometrically,
since they allow the existence of infinitely many equal vectors. They
become much more obvious if we choose one representative from
each class of equal vectors and always mean by "vector" a directed
line segment from the collection of only these representatives.
One of the most convenient ways of choosing a vector is through
considering the set of directed line segments fixed at some point 0.
Then instead of the vector space of collinear directed line segments
we obtain a space of collinear line segments fixed at 0 and lying
on a straight line passing through 0; instead of the vector space of
coplanar directed line segments we obtain a space of directed
line segments fixed at 0 and lying in the plane through 0;
and finally instead of the vector space of all directed line segments
we obtain a space of directed line segments fixed at 0.
In what follows we shall deal mostly with only fixed vectors. The corresponding vector spaces will be denoted by V₁, V₂ and V₃, where the subscript stands for dimension. A vector space consisting of only a zero directed line segment will be denoted by V₀.
These spaces allow us to establish a 1-1 correspondence between
points and directed line segments. To do this it suffices to assign
to every vector its terminal point. Bearing this geometrical inter-
pretation in mind we shall sometimes call elements of an abstract
vector space points rather than vectors.
Exercises
In V₁, V₂ and V₃ establish the geometrical meaning of such notions as:
1. Span.
2. Linear dependence and independence.
3. Equivalent systems of vectors.
4. Elementary equivalent transformations of a system of vectors.
5. Rank of a system of vectors.

19. The sum and intersection of subspaces
The introduction of spans has shown that
every vector space contains an infinite number of other vector spaces.
The significance of these spaces is not limited to the questions con-
sidered above.
Spans were given by us by directly indicating their structure. We
could use another way, that of defining "smaller" spaces in terms of
the properties of vectors. Let L be a set of vectors in a vector space K.
If under the same operations as in K the set L is a vector space, then
it is said to be a linear subspace, thus stressing the fact that a sub-
space consists of vectors of some space. It is clear that the smallest
subspace is that consisting of only a zero vector. This will be called
a zero subspace and designated 0. The largest subspace is the space K.
These two subspaces are trivial, the others are nontrivial. It is also obvious that together with every pair of its elements x and y any subspace contains all their linear combinations αx + βy. The converse is also true. Namely:
If a set L of vectors of the vector space K contains together with every pair of its elements x and y all their linear combinations αx + βy, then it is a subspace.
Indeed, of all vector space axioms it is necessary to verify only
the axioms of the zero and negative vectors. The rest of the axioms
are obvious. Take α = 0 and β = 0. From the consequences of the operations for vectors of K we conclude that 0·x + 0·y = 0, i.e. that the zero vector is in L. Now take α = -1 and β = 0. We have (-1) x + 0·y = (-1) x and therefore together with every vector x the set L contains its negative. So L is a subspace.
The existence of a basis says that in any finite dimensional space
any subspace is a span. In finite dimensional vector spaces therefore
the span is the most general way of giving linear subspaces. It is
not so in an infinite dimensional space. Nevertheless it should not
be forgotten that there is very much in common between concepts and
facts from finite dimensional spaces and those from infinite dimen-
sional spaces. To emphasize this, even in finite dimensional spaces
we shall oftener use the term linear subspace than span.
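In a finite dimensional space given in coordinates, testing whether a vector belongs to a subspace presented as a span reduces to a rank comparison. A minimal sketch, assuming numpy; the helper name in_span is ours:

    import numpy as np

    def in_span(vectors, x, tol=1e-10):
        """Return True if x lies in the subspace spanned by the given vectors."""
        A = np.column_stack(vectors)
        # x belongs to the span exactly when appending it does not raise the rank.
        return (np.linalg.matrix_rank(np.column_stack([A, x]), tol=tol)
                == np.linalg.matrix_rank(A, tol=tol))

    print(in_span([np.array([1., 0., 0.]), np.array([0., 1., 0.])], np.array([3., -2., 0.])))  # True
    print(in_span([np.array([1., 0., 0.]), np.array([0., 1., 0.])], np.array([0., 0., 1.])))   # False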
Let K be an n-dimensional space. As in the space K, a basis can be constructed in any of its subspaces L. If a basis e₁, e₂, ..., eₙ is
chosen in K, then in the general case one cannot choose basis vectors
of a subspace L directly among the vectors e1 , e 2 , . • • , en if only
because L can have none of them. We have, however, the converse,
in a sense,
Lemma 19.1. If in some subspace L of dimension s an arbitrary basis t₁, ..., tₛ has been chosen, then we can always choose vectors tₛ₊₁, ..., tₙ in a space K of dimension n in such a way that the system of vectors t₁, ..., tₛ, tₛ₊₁, ..., tₙ is a basis in the whole of K.
Proof. Consider only the linearly independent systems of vectors in K that contain the vectors t₁, ..., tₛ. It is clear that among these systems there is a system t₁, ..., tₛ, tₛ₊₁, ..., tₚ with a maximum number of vectors. But then whatever the vector z in K may be, the system t₁, ..., tₚ, z must be linearly dependent. Consequently, z must be linearly expressible in terms of the vectors t₁, ..., tₚ. This means that t₁, ..., tₛ, tₛ₊₁, ..., tₚ form a basis of K and that p = n.
Consider again an arbitrary vector space K. It generates the set
of all its subspaces, which we denote by U. On U we can define two
algebraic operations allowing some subspaces to be constructed
from other subspaces.
The sum L₁ + L₂ of linear subspaces L₁ and L₂ is the set of all vectors of the form z = x + y, where x ∈ L₁ and y ∈ L₂.
The intersection L₁ ∩ L₂ of linear subspaces L₁ and L₂ is the set of all vectors simultaneously lying in L₁ and L₂.
Notice that both the sum of subspaces and their intersection are
nonempty sets, since they clearly contain the zero vector of K. We
prove that they are subspaces.
Indeed, take two arbitrary vectors z₁ and z₂ from the sum L₁ + L₂. This means that z₁ = x₁ + y₁ and z₂ = x₂ + y₂, where x₁, x₂ ∈ L₁ and y₁, y₂ ∈ L₂. Consider now an arbitrary linear combination αz₁ + βz₂. We have αz₁ + βz₂ = (αx₁ + βx₂) + (αy₁ + βy₂). Since αx₁ + βx₂ ∈ L₁ and αy₁ + βy₂ ∈ L₂, we have αz₁ + βz₂ ∈ L₁ + L₂. Therefore, L₁ + L₂ is a subspace. Now let z₁, z₂ ∈ L₁ ∩ L₂, i.e. z₁, z₂ ∈ L₁ and z₁, z₂ ∈ L₂. It is clear that αz₁ + βz₂ ∈ L₁ and αz₁ + βz₂ ∈ L₂, i.e. αz₁ + βz₂ ∈ L₁ ∩ L₂. Hence L₁ ∩ L₂ is also a subspace.
Thus the operations of addition of subspaces and of their inter-
section are algebraic. They are obviously commutative and asso-
ciative. Moreover, for any subspace L of K
L + 0 = L,   L ∩ K = L.
There are no distributive laws relating the two operations.
As can easily be seen even from the simplest examples, the di-
mension of the sum of two arbitrary subspaces depends not only
on those of the subspaces but also on the size of their common part.
We have
Theorem 19.1. For any two finite dimensional subspaces L₁ and L₂
dim (L₁ ∩ L₂) + dim (L₁ + L₂) = dim L₁ + dim L₂.   (19.1)
Proof. Denote the dimensions of L₁, L₂ and L₁ ∩ L₂ by r₁, r₂ and m respectively. Choose in the intersection L₁ ∩ L₂ some basis c₁, ..., cₘ. These vectors are linearly independent and are in L₁. By Lemma 19.1 there are vectors a₁, ..., aₖ in L₁ such that the system a₁, ..., aₖ, c₁, ..., cₘ is a basis in L₁. Similarly there are vectors b₁, ..., bₚ in L₂ such that b₁, ..., bₚ, c₁, ..., cₘ is a basis in L₂. We have
r₁ = k + m,   r₂ = p + m.
If we prove that
a₁, ..., aₖ, c₁, ..., cₘ, b₁, ..., bₚ   (19.2)
is a basis of the subspace L₁ + L₂, then the theorem holds since
m + (k + m + p) = (k + m) + (p + m).
Any vector in L₁ and L₂ is linearly expressible in terms of the vectors of its basis and of course any is linearly expressible in terms of vectors (19.2). Therefore any vector in the sum L₁ + L₂ is also linearly expressible in terms of these vectors. It remains for us to show that system (19.2) is linearly independent. Let
α₁a₁ + ... + αₖaₖ + γ₁c₁ + ... + γₘcₘ + β₁b₁ + ... + βₚbₚ = 0   (19.3)
and let
b = β₁b₁ + ... + βₚbₚ.   (19.4)
It is clear that b ∈ L₂. But from (19.3) it follows that b ∈ L₁. Consequently, b ∈ L₁ ∩ L₂, i.e.
b = ν₁c₁ + ... + νₘcₘ   (19.5)
for some numbers ν₁, ..., νₘ. Comparing (19.4) and (19.5) we get
β₁b₁ + ... + βₚbₚ + (-ν₁) c₁ + ... + (-νₘ) cₘ = 0.
The system of vectors b₁, ..., bₚ, c₁, ..., cₘ is linearly independent by construction and therefore
β₁ = ... = βₚ = ν₁ = ... = νₘ = 0.
By virtue of the linear independence of a₁, ..., aₖ, c₁, ..., cₘ it now follows from (19.3) that
α₁ = ... = αₖ = γ₁ = ... = γₘ = 0.
Thus the theorem is proved.
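When the subspaces are given as spans of coordinate vectors, formula (19.1) can be checked directly: dimensions become matrix ranks, the sum is spanned by the union of the spanning sets, and the intersection can be extracted from a null space. A small sketch, assuming numpy and scipy are available:

    import numpy as np
    from scipy.linalg import null_space

    # L1 and L2 are given as the spans of the columns of A and B.
    A = np.array([[1., 0.], [0., 1.], [0., 0.]])          # L1 = the x, y plane
    B = np.array([[0., 0.], [1., 0.], [0., 1.]])          # L2 = the y, z plane

    dim_L1 = np.linalg.matrix_rank(A)
    dim_L2 = np.linalg.matrix_rank(B)
    dim_sum = np.linalg.matrix_rank(np.hstack([A, B]))    # L1 + L2 is spanned by all columns

    # A vector lies in L1 ∩ L2 when it equals A u = B v; the pairs (u, v)
    # form the null space of [A  -B], and the intersection is A applied to the u-part.
    N = null_space(np.hstack([A, -B]))
    dim_int = np.linalg.matrix_rank(A @ N[:A.shape[1], :]) if N.size else 0

    print(dim_sum + dim_int == dim_L1 + dim_L2)           # True, as (19.1) asserts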
Exercises
1. Consider a vector space V₃ to establish the geometrical meaning of the sum and intersection of subspaces.
2. What is the sum of subspaces V₁ and V₂?
3. What is the intersection of subspaces V₁ and V₂?
4. Prove that the dimension of the intersection of any number of subspaces does not exceed the minimum dimension of those subspaces.
5. Prove that the dimension of the sum of any number of subspaces is not less than the maximum dimension of those subspaces.

20. The direct sum of subspaces


Let L₁, L₂, ..., Lₘ be subspaces of some vector space. By the definition of the operation of addition any vector z in the sum
K = L₁ + L₂ + ... + Lₘ   (20.1)
may be represented as
z = z₁ + z₂ + ... + zₘ,   (20.2)
where zᵢ ∈ Lᵢ for every i. In general this representation is not unique. But if every vector in K allows the unique representation (20.2), then sum (20.1) is called a direct sum and is designated as follows:
K = L₁ ∔ L₂ ∔ ... ∔ Lₘ.   (20.3)
Direct sums possess many special properties. But we shall be
concerned not so much with these properties as with the common
features of representation (20.2) and expansion with respect to a basis.
Suppose some space K may be represented as a direct sum (20.3) of its subspaces L₁, L₂, ..., Lₘ. Then by virtue of the uniqueness of representation (20.2) the system of subspaces L₁, L₂, ..., Lₘ may be regarded as some "generalized basis" of K and representation (20.2)
as an expansion with respect to the "generalized basis". Such an
interpretation of a direct sum is especially helpful in the study of
vector spaces of higher dimensions, since in those spaces one has as
a rule to study not all the components in the expansion with respect
to a basis but only a small portion of them. Using direct sums makes
it possible to avoid both cumbersome expansions and investigating
unnecessary details.
Let K be an n-dimensional vector space. Take its arbitrary basis e₁, e₂, ..., eₙ and construct a collection of spans L₁ = L₁(e₁), L₂ = L₂(e₂), ..., Lₙ = Lₙ(eₙ). It is then obvious that K is the direct sum of these n one-dimensional subspaces. But K may be represented
in various ways as direct sum of subspaces of other dimension. Such
a representation is based on
Theorem 20.1. For a space K to be the direct sum of its subspaces L₁, ..., Lₘ it is necessary and sufficient that the union of the bases of those subspaces should constitute a basis of the entire space.
Proof. Let K be the direct sum of subspaces L₁, ..., Lₘ and let vectors e_1, ..., e_{s_1}; ...; e_{s_{m-1}+1}, ..., e_{s_m} constitute bases of those subspaces. Then for any vector z in K we have representation (20.2). By representing each of the vectors zᵢ as an expansion with respect to the basis of the corresponding subspace Lᵢ we get
z = a_1 e_1 + ... + a_{s_1} e_{s_1} + ... + a_{s_{m-1}+1} e_{s_{m-1}+1} + ... + a_{s_m} e_{s_m}   (20.4)
for some numbers a_1, ..., a_{s_m}.
Thus every vector in K may be represented as a linear combination of vectors e_1, ..., e_{s_m}. To assert that those vectors constitute a basis of K it remains to prove that they are linearly independent. Consider the equation
β_1 e_1 + ... + β_{s_1} e_{s_1} + ... + β_{s_{m-1}+1} e_{s_{m-1}+1} + ... + β_{s_m} e_{s_m} = 0   (20.5)
with numerical coefficients β_1, ..., β_{s_m} and let
β_1 e_1 + ... + β_{s_1} e_{s_1} = y_1,
. . . . . . . . . .
β_{s_{m-1}+1} e_{s_{m-1}+1} + ... + β_{s_m} e_{s_m} = y_m.   (20.6)
It is obvious that yᵢ ∈ Lᵢ and it follows from (20.5) that
0 = y_1 + ... + y_m.
Every subspace contains a zero vector and therefore it is obvious that
0 = 0 + ... + 0.
From the uniqueness of the expansion of the zero vector in K with respect to the subspaces L₁, ..., Lₘ we conclude that
y_1 = ... = y_m = 0.
It follows that all the coefficients of linear combinations (20.6) are zero, i.e. that the vectors e_1, ..., e_{s_m} are linearly independent.
Suppose now that the vectors e_1, ..., e_{s_1}; ...; e_{s_{m-1}+1}, ..., e_{s_m} constituting the bases of L₁, ..., Lₘ form a basis of K. Then for any vector z in K there is a unique expansion (20.4). Letting
a_1 e_1 + ... + a_{s_1} e_{s_1} = z_1,
. . . . . . . . . .
a_{s_{m-1}+1} e_{s_{m-1}+1} + ... + a_{s_m} e_{s_m} = z_m,   (20.7)
we see that for z there is at least one representation (20.2). Every vector zᵢ in (20.7) is a linear combination of basis vectors of Lᵢ. From the uniqueness of expansion (20.4) for z we conclude that representation (20.2) is also unique for it. Thus the theorem is proved.
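In coordinates, the decomposition (20.2) with respect to a direct sum is computed exactly as Theorem 20.1 suggests: expand the vector in the union of the bases and group the terms subspace by subspace. A small sketch for R³, assuming numpy:

    import numpy as np

    # R^3 as the direct sum of the plane spanned by u1, u2 and the line spanned by v1.
    u1, u2 = np.array([1., 0., 0.]), np.array([1., 1., 0.])
    v1 = np.array([1., 1., 1.])

    E = np.column_stack([u1, u2, v1])     # union of the two bases: a basis of R^3
    z = np.array([3., 2., 5.])

    a = np.linalg.solve(E, z)             # coordinates of z in the union basis
    z1 = a[0] * u1 + a[1] * u2            # component in the plane
    z2 = a[2] * v1                        # component in the line
    print(z1, z2, np.allclose(z, z1 + z2))  # the representation z = z1 + z2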
Exercises
1. Under what conditions is V₃ a direct sum of its subspaces V₁ and V₂?
2. Under what conditions is V₂ a direct sum of two of its subspaces of the type V₁?
3. Can V₃ be a direct sum of two of its subspaces of the type V₁? If not, why?
4. Prove that for sum (20.1) to be direct it is necessary and sufficient that representation (20.2) should be unique for the zero vector.
5. Prove that for sum (20.1) to be direct it is necessary and sufficient that the intersection of each of the subspaces Lᵢ, i = 1, ..., m, with the sum of the others should contain only a zero vector.

21. Isomorphism of vector spaces


Consider the set of all vector spaces over the
same field P. It is natural to ask in what they are similar and in
what different.
The description of every vector space contains two, essentially
different, parts. First, a vector space is a collection of specific objects
called vectors. Second, the operations of addition and multiplication
by a number that have some properties are defined on those specific
objects. We may be concerned therefore either with the nature of
vectors and their properties or with the properties of the operations
regardless of the nature of the elements.
We were concerned with the nature of vectors only when we studied
directed line segments and only to an extent necessary for introduc-
ing the operations and establishing their properties. After that our
investigation of directed line segments was based solely on the
properties of operations. We shall proceed in a similar way in every
particular case too. Therefore two spaces with the same structure
of addition and multiplication by a number will be assumed to pos-
sess the same properties or to be isomorphic. More precisely:
Two vector spaces over the same field are said to be isomorphic if
between their vectors a 1-1 correspondence can be established such that
to the sum of any two vectors of one space there corresponds the sum of
the corresponding vectors of the other and to the product of some number
by a vector of one space there corresponds the product of the same number
by the corresponding vector of the other.
Let K and K' be two isomorphic spaces. The fact that every vector
x in K is assigned a definite vector x' in K' may be understood as
an introduction of some "function"
x' = w (x) (21.1)
whose "independent variable (or argument)" is a vector x in K and
whose "value" is a vector x' in K'. Both properties of that function
can now be written as follows. For any x and y in K and any
number λ
ω(x + y) = ω(x) + ω(y),
ω(λx) = λω(x).   (21.2)
The 1-1 correspondence between K and K' implies that to any
different independent variables of the function (21.1) there corre-
spond different values, i.e. if
x ≠ y,   (21.3)
then
ω(x) ≠ ω(y).   (21.4)
Consequently, the equality or nonequality of the values of the
function implies respectively the equality or nonequality of its
independent variables.
Isomorphic spaces have much in common. In particular, to a zero
vector there corresponds a zero vector, for
ω(0) = ω(0·x) = 0·ω(x) = 0·x′ = 0′.
The most important consequence, however, is that a linearly in-
dependent system of vectors is sent into a linearly independent
system.
Indeed, let x₁, x₂, ..., xₙ be n linearly independent vectors. Consider now a linear combination α₁ω(x₁) + α₂ω(x₂) + ... + αₙω(xₙ) and equate it to zero. By the property of an isomorphism
0′ = α₁ω(x₁) + α₂ω(x₂) + ... + αₙω(xₙ) = ω(α₁x₁ + α₂x₂ + ... + αₙxₙ) = ω(0),
from which we have
α₁x₁ + α₂x₂ + ... + αₙxₙ = 0.
Since x₁, x₂, ..., xₙ are linearly independent, all the coefficients must be zero.
The consequence we have proved makes it possible to state that
if two finite dimensional vector spaces are isomorphic, then they
have the same dimension. The converse is also true. Namely, we have
Theorem 21.1. Any two finite dimensional vector spaces having the
same dimension and given over the same field are isomorphic.
Proof. Let K and K′ be two vector spaces of dimension n. Choose a basis e₁, e₂, ..., eₙ in K and a basis e₁′, e₂′, ..., eₙ′ in K′. Using these systems of vectors construct an isomorphism ω as follows. To every vector
x = α₁e₁ + α₂e₂ + ... + αₙeₙ
in K assign a vector
ω(x) = α₁e₁′ + α₂e₂′ + ... + αₙeₙ′
in K′. The correspondence will be 1-1 since an expansion with respect to a basis is unique.
Take then any two vectors x and y in K and an arbitrary number λ and assume that
x = α₁e₁ + α₂e₂ + ... + αₙeₙ,
y = β₁e₁ + β₂e₂ + ... + βₙeₙ.
We have
ω(x + y) = ω((α₁ + β₁) e₁ + (α₂ + β₂) e₂ + ... + (αₙ + βₙ) eₙ)
= (α₁ + β₁) e₁′ + (α₂ + β₂) e₂′ + ... + (αₙ + βₙ) eₙ′
= (α₁e₁′ + α₂e₂′ + ... + αₙeₙ′) + (β₁e₁′ + β₂e₂′ + ... + βₙeₙ′) = ω(x) + ω(y),
ω(λx) = ω((λα₁) e₁ + (λα₂) e₂ + ... + (λαₙ) eₙ)
= (λα₁) e₁′ + (λα₂) e₂′ + ... + (λαₙ) eₙ′
= λ(α₁e₁′ + α₂e₂′ + ... + αₙeₙ′) = λω(x).
These equations prove the theorem.
This theorem is very important. It is this theorem that allows us
to say with certainty now that in terms of all the consequences of
the axioms any two vector spaces having the same dimension and
given over the same field are indistinguishable. Consequently, we
could construct just one n-dimensional vector space over a given
field and show the regularities common to all finite dimensional
spaces by investigating just that single space.
Let P be some field. Consider a set whose elements are all possible
ordered collections of n numbers α₁, α₂, ..., αₙ from P. If x is an element of that set, then we shall write
x = (α₁, α₂, ..., αₙ).   (21.5)
The operations of addition and multiplication by a number λ from the field P will be defined as follows:
(α₁, α₂, ..., αₙ) + (β₁, β₂, ..., βₙ) = (α₁ + β₁, α₂ + β₂, ..., αₙ + βₙ),
λ(α₁, α₂, ..., αₙ) = (λα₁, λα₂, ..., λαₙ).   (21.6)
It is easy to check that the axioms of a vector space hold. In particular, the zero vector is defined by a set of zeros alone, i.e.
0 = (0, 0, ..., 0),
and the negative of vector (21.5) is
-x = (-α₁, -α₂, ..., -αₙ).
This is an n-dimensional space and one of its bases is easy to show at once. Namely,
e₁ = (1, 0, ..., 0, 0),
e₂ = (0, 1, ..., 0, 0),
. . . . . . . . . .
eₙ = (0, 0, ..., 0, 1).   (21.7)
Since for element (21.5) we have the expansion
x = α₁e₁ + α₂e₂ + ... + αₙeₙ,
the numbers α₁, α₂, ..., αₙ will be called the coordinates of the vector x.
We shall call a space of such a type an arithmetical space and denote it by Pₙ, thus emphasizing its relation to the field P. If P is the field of complex, real or rational numbers, then such n-dimensional spaces will be denoted by Cₙ, Rₙ and Dₙ respectively.
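The isomorphism constructed in Theorem 21.1 is, in these terms, just the map sending a vector to its coordinate collection with respect to a chosen basis. A minimal sketch of this coordinate map for R³, assuming numpy; the helper name coordinates is ours:

    import numpy as np

    E = np.column_stack([np.array([1., 1., 0.]),      # a basis e1, e2, e3 of R^3
                         np.array([0., 1., 1.]),
                         np.array([1., 0., 1.])])

    def coordinates(x):
        """The map omega: x -> (alpha_1, alpha_2, alpha_3), the coordinates of x in the basis E."""
        return np.linalg.solve(E, x)

    x, y, lam = np.array([2., 3., 4.]), np.array([-1., 0., 5.]), 7.0
    # The two defining properties (21.2) of an isomorphism:
    print(np.allclose(coordinates(x + y), coordinates(x) + coordinates(y)))  # True
    print(np.allclose(coordinates(lam * x), lam * coordinates(x)))           # True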
It may now seem that there is no need to study arbitrary n-dimen-
sional vector spaces. Indeed, we know that in terms of the conse-
quences of the axioms isomorphic vector spaces are indistinguishable
and therefore we can always successfully study, for example, Pn
alone. However, general arguments allow us to show the most impor-
tant properties of vector spaces, i.e. the ones that are independent of
basis systems or, in other words, are invariant under isomorphisms.
Studying spaces Pn alone we should always be tied to a particular
basis and therefore it would not always be easy to see the invariance
of deductions. Besides, it is necessary to see to it that particular
properties of a space Pₙ are not referred to the general properties of vector spaces. This is not always sufficiently easy to do.
In conclusion note one more fact. By analogy with a space Pₙ consider the space P∞ whose elements are all possible ordered infinitely large collections of numbers α₁, α₂, ... of the field P. An element x of that set is by analogy with (21.5) designated
x = (α₁, α₂, ...)
and by analogy with (21.6) we introduce operations on elements.
Now P∞ is an infinite dimensional space. If we assume that infinite dimensional spaces are isomorphic to P∞, it is not hard to see that infinite dimensional and finite dimensional spaces must have much in common. This example should not be forgotten.

Exercises
1. Construct an isomorphism from a space V₁ to the space of reals over the field of reals.
2. Construct an isomorphism from a space V₂ to the space of complex numbers over the field of reals.
3. Prove that in isomorphic spaces equivalent vector systems correspond to equivalent vector systems.
4. Prove that in isomorphic spaces an intersection of subspaces corresponds
to an intersection of subspaces.
5. Prove that in isomorphic spaces a direct sum of subspaces corresponds to
a direct sum of subspaces.

22. Linear dependence and systems of linear equations
Investigation of many questions associated
in some way with linear dependence reduces to solving the following
problem.
Let a₁, a₂, ..., aₘ be a system of vectors and let b be a vector. Determine whether b is a linear combination of the given vector system and find the coefficients of the linear combination.
If b is a linear combination of a₁, a₂, ..., aₘ, then there are numbers z₁, z₂, ..., zₘ such that
z₁a₁ + z₂a₂ + ... + zₘaₘ = b.   (22.1)
Consequently, the above problem reduces to the investigation of the vector equation (22.1) for the numbers z₁, z₂, ..., zₘ.
Suppose that the vectors are given by their coordinates in some basis e₁, e₂, ..., eₖ, i.e.
a₁ = (a₁₁, a₂₁, ..., aₖ₁),
a₂ = (a₁₂, a₂₂, ..., aₖ₂),
. . . . . . . . . .
aₘ = (a₁ₘ, a₂ₘ, ..., aₖₘ),
b = (b₁, b₂, ..., bₖ).
On equating the corresponding vector coordinates on the left and
right of (22.1) we get
a₁₁z₁ + a₁₂z₂ + ... + a₁ₘzₘ = b₁,
a₂₁z₁ + a₂₂z₂ + ... + a₂ₘzₘ = b₂,
. . . . . . . . . .
aₖ₁z₁ + aₖ₂z₂ + ... + aₖₘzₘ = bₖ.   (22.2)
This system of equations which reflects coordinatewise notation of
equation (22.1) is called a system of linear algebraic equations. The
numbers b₁, b₂, ..., bₖ are called the right-hand sides and z₁, z₂, ..., zₘ are the unknowns of the system of equations. An ordered
collection of the values of the unknowns that satisfies each of the
equations (22.2) is called a solution of the system. If a system of
linear algebraic equations has at least one solution, then it is said
to be compatible; otherwise the system is incompatible.
Thus the answer to the question of whether b is a linear combination of vectors a₁, a₂, ..., aₘ depends on whether (22.2) is compatible or incompatible. If it is compatible, then any of its solutions gives coefficients of the expansion of b with respect to the vector system a₁, a₂, ..., aₘ.
Two systems of linear algebraic equations in the same unknowns
are said to be equivalent if each solution of one system is a solution
of the other or both are incompatible.
A general method of solving systems of equations may be based
on a successive transformation of the original system (22.2) to such
an equivalent system for which a solution is sufficiently easy to find.
We shall now describe one of these methods, called the method of
elimination or Gauss method.
In general the solution process consists of at most k - 1 steps.
To distinguish the coefficients of the unknowns and the right-hand
sides obtained in the process of transformation at various steps we
shall use an additional index, a superscript. According to this
remark the original system (22.2) will have the following form:
a₁₁⁽⁰⁾z₁ + a₁₂⁽⁰⁾z₂ + ... + a₁ₘ⁽⁰⁾zₘ = b₁⁽⁰⁾,
a₂₁⁽⁰⁾z₁ + a₂₂⁽⁰⁾z₂ + ... + a₂ₘ⁽⁰⁾zₘ = b₂⁽⁰⁾,
. . . . . . . . . .
aₖ₁⁽⁰⁾z₁ + aₖ₂⁽⁰⁾z₂ + ... + aₖₘ⁽⁰⁾zₘ = bₖ⁽⁰⁾.   (22.3)
Consider the first equation. If all the coefficients of the unknowns
and the right-hand side are equal to zero, then the equation will
hold for any collection of numbers z1 , z 2 , • • • , Zm· Consequently, we
obtain an equivalent system if the first equation is omitted alto-
gether from consideration. It may happen that all the coefficients
of the unknowns in the first equation are equal to zero but the right-
hand side is not. Then such an equation cannot hold for any collec-
tion of numbers z1 , z 2 , • • • , Zm. In such a case the system is incom-
patible and we have done with the investigation of it.
Suppose that there is at least one nonzero coefficient among the
coefficients of the unknowns in the first equation. We may assume without loss of generality that a₁₁⁽⁰⁾ ≠ 0, since otherwise this can be attained by rearranging the unknowns. The element a₁₁⁽⁰⁾ is called the leading element. We express z₁ in the first equation in terms of
the remaining elements and the right-hand side and then substitute
the expression obtained for z1 in all the equations except the first.
Grouping similar terms everywhere we obtain a new system
a₁₁⁽⁰⁾z₁ + a₁₂⁽⁰⁾z₂ + ... + a₁ₘ⁽⁰⁾zₘ = b₁⁽⁰⁾,
a₂₂⁽¹⁾z₂ + ... + a₂ₘ⁽¹⁾zₘ = b₂⁽¹⁾,
. . . . . . . . . .
aₖ₂⁽¹⁾z₂ + ... + aₖₘ⁽¹⁾zₘ = bₖ⁽¹⁾.   (22.4)
The coefficients of this new system are connected to the coefficients of the old one by the following relations:
aᵢⱼ⁽¹⁾ = aᵢⱼ⁽⁰⁾ - (aᵢ₁⁽⁰⁾/a₁₁⁽⁰⁾) a₁ⱼ⁽⁰⁾,   bᵢ⁽¹⁾ = bᵢ⁽⁰⁾ - (aᵢ₁⁽⁰⁾/a₁₁⁽⁰⁾) b₁⁽⁰⁾
for every i and j.


Systems (22.3) and (22.4) are equivalent. Indeed, let system
(22.3) be compatible. Then any of its solutions z1 , z2 , • • • , Zm turns
all equations of (22.3) into identities. Repeating the process of
elimination with any of the solutions once again we see that it is
a solution of system (22.4) as well. Suppose further that some solution
of system (22.4) is not a solution of system (22.3). It clearly satisfies
the first equation of (22.3). Let it not satisfy some equation with an index i ≥ 2. Then, repeating once again the elimination process, we conclude that the solution chosen must not satisfy the ith equation
of system (22.4). But this contradicts the hypothesis. It is now
clear that if one of the systems is incompatible, so is the other.
We have described only the first step in the transformation of the
system. All the other steps are carried out in a similar way. At the
second step we eliminate the unknown z2 from all the equations
except the first two, at the third step we eliminate the unknown z3
from all the equations except the first three and so on. If in the
process of transformations we do not encounter equations where all
coefficients of the unknowns are equal to zero, then in k - 1 steps
we arrive at the system
a₁₁⁽⁰⁾z₁ + a₁₂⁽⁰⁾z₂ + ... + a₁ₖ⁽⁰⁾zₖ + a₁,ₖ₊₁⁽⁰⁾zₖ₊₁ + ... + a₁ₘ⁽⁰⁾zₘ = b₁⁽⁰⁾,
a₂₂⁽¹⁾z₂ + ... + a₂ₖ⁽¹⁾zₖ + a₂,ₖ₊₁⁽¹⁾zₖ₊₁ + ... + a₂ₘ⁽¹⁾zₘ = b₂⁽¹⁾,
. . . . . . . . . .
aₖₖ⁽ᵏ⁻¹⁾zₖ + aₖ,ₖ₊₁⁽ᵏ⁻¹⁾zₖ₊₁ + ... + aₖₘ⁽ᵏ⁻¹⁾zₘ = bₖ⁽ᵏ⁻¹⁾   (22.5)
equivalent to system (22.4). If in the process of transformation we encounter identically satisfied equations, then system (22.5) will consist of a smaller number of equations.
The unknowns zₖ₊₁, ..., zₘ are called free unknowns. It is obvious that regardless of the values assigned to them we can successively determine all the others from system (22.5) beginning with zₖ. The coefficients a₁₁⁽⁰⁾, a₂₂⁽¹⁾, ..., aₖₖ⁽ᵏ⁻¹⁾ by which we have to divide are the leading elements of individual steps and therefore they are all nonzero.
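The elimination scheme just described translates almost literally into code. The sketch below, assuming numpy, performs the forward elimination and back substitution for a square system whose leading elements are nonzero (as in the text, after any necessary rearrangement); the name gauss_solve is ours:

    import numpy as np

    def gauss_solve(A, b):
        """Solve A z = b for a square system with nonzero leading elements,
        by the elimination scheme (22.3)-(22.5) and back substitution."""
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        k = len(b)
        # Forward elimination: step i removes z_i from equations i+1, ..., k.
        for i in range(k - 1):
            for j in range(i + 1, k):
                factor = A[j, i] / A[i, i]          # leading element A[i, i] assumed nonzero
                A[j, i:] -= factor * A[i, i:]
                b[j] -= factor * b[i]
        # Back substitution, beginning with the last unknown.
        z = np.zeros(k)
        for i in range(k - 1, -1, -1):
            z[i] = (b[i] - A[i, i + 1:] @ z[i + 1:]) / A[i, i]
        return z

    A = np.array([[2., 1., 1.], [4., -6., 0.], [-2., 7., 2.]])
    b = np.array([5., -2., 9.])
    print(gauss_solve(A, b), np.linalg.solve(A, b))   # the two answers agree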
So from the theoretical point of view the concept of linear depend-
ence has been investigated sufficiently fully. As to practice, however,
it may result in very serious difficulties. Consider, for example, in a space Rₖ a system of vectors
a₁ = (1, -2, 0, ..., 0, 0),
a₂ = (0, 1, -2, ..., 0, 0),
. . . . . . . . . .
aₖ₋₁ = (0, 0, 0, ..., 1, -2),
aₖ = (-2⁻⁽ᵏ⁻¹⁾, 0, 0, ..., 0, 1).   (22.6)

It is linearly dependent since
2⁻ᵏa₁ + 2⁻⁽ᵏ⁻¹⁾a₂ + ... + 2⁻¹aₖ = 0.
Notice that 2⁻⁽ᵏ⁻¹⁾ < 10⁻¹² for k > 40 and therefore it is but
natural that in practical calculations a desire arises to neglect so
small a value of the coordinate. Besides as a rule all numbers are
not known exactly and nearly always contain significantly greater
errors. But even if the coordinates were known exactly, the very
first manipulations of them would lead to inexact results if the
calculations had been made approximately. It should be added that
most modern computers cannot recognize numbers as small as 2⁻⁽ᵏ⁻¹⁾ for k > 64 and operate with them as with zeros. In actual
practice therefore instead of the system of vectors (22.6) we may
have to deal with the following system:

a₁ = (1, -2, 0, ..., 0, 0),
a₂ = (0, 1, -2, ..., 0, 0),
. . . . . . . . . .
aₖ₋₁ = (0, 0, 0, ..., 1, -2),
aₖ = (0, 0, 0, ..., 0, 1).   (22.7)

But this system is linearly independent.


Thus small changes in the coordinates of vectors may result in
a linearly dependent system becoming, under an approximate assign-
ment of coordinates and approximate calculations using them,
linearly independent, and vice versa, a linearly independent system
becoming linearly dependent. But then it is but natural to ask what
is the practical importance of such notions as linear dependence,
rank, basis, compatible and incompatible system and in general of
everything we have so far investigated? There is no simple answer
to this question since it requires a deep understanding of the problems one solves. It is with this question that the differences distinguishing "exact" mathematics from "approximate" mathematics begin.
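The effect described above is easy to reproduce. The sketch below builds system (22.6) for k = 10 in exact rational arithmetic and verifies the dependence relation; rounding the single tiny coordinate to zero turns it into the independent system (22.7). A small sketch in Python, using the standard fractions module; the value k = 10 is only for illustration:

    from fractions import Fraction

    k = 10
    # The vectors a_1, ..., a_k of system (22.6), with exact rational coordinates.
    a = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k - 1):
        a[i][i], a[i][i + 1] = Fraction(1), Fraction(-2)
    a[k - 1][0], a[k - 1][k - 1] = Fraction(-1, 2 ** (k - 1)), Fraction(1)

    # The exact dependence 2^(-k) a_1 + 2^(-(k-1)) a_2 + ... + 2^(-1) a_k = 0:
    combo = [sum(Fraction(1, 2 ** (k - i)) * a[i][j] for i in range(k)) for j in range(k)]
    print(all(c == 0 for c in combo))          # True: the system is linearly dependent

    # Replacing the tiny coordinate -2^(-(k-1)) by zero gives system (22.7),
    # whose matrix is triangular with units on the diagonal, hence independent.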

Exercises
1. Prove that if system (22.2) is compatible, then it has a unique solution if and only if the system of vectors a₁, a₂, ..., aₘ is linearly independent.
2. Prove that if a system of vectors a₁, a₂, ..., aₘ has rank r, then system (22.5) consists of r equations.
3. Assume that the solutions of a system are vectors of a space Pₘ. Let b = 0 and let the system of vectors a₁, a₂, ..., aₘ have rank r. Prove that in this case the set of all solutions of (22.2) forms an (m - r)-dimensional subspace of Pₘ.
4. Find all solutions of the system of linear algebraic equations
√2·z₁ + 1·z₂ = √3,
2·z₁ + √2·z₂ = √6.
Solve the same system giving √2, √3 and √6 to various accuracy. Compare
the results.
5. Establish the relation of the Gauss method to elementary transformations
of a system of vectors.
CHAPTER 3

Measurements in Vector Space

23. Affine coordinate systems


An enormous number of science and engineer-
ing problems require a precise description in space of various geo-
metrical objects such as points, figures, curves, surfaces and so on.
For a complex object it is very important to know not only a general
characterization of its location, such as indication of the centre of
gravity, but also the position of each of its individual points.
As an example recall that the prediction of lunar and solar eclipses
is possible because we know the position of celestial bodies at every
moment. Television broadcasts over great distances are possible
because the position of each point of the image being transmitted
is defined.
It is obviously necessary to give a method of describing the position
of only one individual point, since any geometrical object can be
given as some collection of points. Probably it would be useful to
consider independently the position of a point on a straight line,
in the plane or in space because a spatial description of an object is
by far not always appropriate. For example, a photograph can
obviously be considered only in the plane while the motion of a
particle, with no forces acting upon it, can be considered on a straight
line.
One of the most common descriptions of the position of a point
is based on a very simple idea. We have already noted that it is
possible to establish a 1-1 correspondence between all points and
fixed directed line segments. The description of the position of a
point therefore can be replaced by the description of the position
of the corresponding directed line segment. But the position of this
line segment is characterized by its coordinates relative to any basis,
i.e. by some ordered collections of numbers. Consequently, the
position of a point must also be characterized by ordered collections
of numbers. We now proceed to explore this idea.
Given some straight line, fix on it an arbitrary point 0 and con-
sider a space V1 of vectors lying on the given straight line and fixed
at the point 0. Choose in that space some basis vector a. Now turn
the straight line into an axis by specifying on it a direction so that
the magnitude of the segment a is positive (Fig. 23.1).
The axis with the given point O and basis vector a forms an affine coordinate system on the straight line. The point O is called the origin, and the length of the vector a is the scale unit.
The position of any point M on the straight line is uniquely determined by that of the vector OM. The vectors a and OM are collinear, with a ≠ 0, and so according to the consequence of Lemma 18.1 there is a real α such that
OM = αa.   (23.1)
That number is called an affine coordinate of the point M on the straight line. The point M with a coordinate α is designated M(α).
Notice that with a fixed affine coordinate system on the straight line relation (23.1) uniquely defines the affine coordinate α of any point M of that straight line. Obviously the converse is also true. Namely, relation (23.1) makes every number α uniquely define some point M of the straight line. Thus given a fixed affine coordinate system there is a 1-1 correspondence between all real numbers and the points of a straight line.
Giving points by their coordinates allows us to calculate the magnitudes of directed line segments and the distances between points. Let M₁(α₁) and M₂(α₂) be given points. We have
{M₁M₂} = {OM₂ - OM₁} = {α₂a - α₁a} = {(α₂ - α₁) a} = (α₂ - α₁){a} = (α₂ - α₁)|a|.   (23.2)
If ρ(M₁, M₂) denotes the distance between points M₁ and M₂, then
ρ(M₁, M₂) = |{M₁M₂}| = |α₂ - α₁| |a|.   (23.3)
The formulas become particularly simple if the length of the basis vector equals unity. In this case
{M₁M₂} = α₂ - α₁,
ρ(M₁, M₂) = |α₂ - α₁|.   (23.4)
Now let some plane be given. Fix on it an arbitrary point 0 and
consider a space V 2 of vectors lying in the plane and fixed at 0.
Choose in that space some pair of basis vectors a and b. Specify
directions on the straight lines containing those vectors so that the
magnitudes of a and b are positive (Fig. 23.2).
The two axes in the plane intersecting at the same point 0 and
having the basis vectors a and b given on them form an affine coordi-
nate system in the plane. The axis containing a is called the x axis or the axis of abscissas; the axis containing b is the y axis or the axis of ordinates.
Again the position of any point M in the plane is uniquely determined by the vector OM and in turn there is a unique vector decomposition of the form
OM = αa + βb.   (23.5)
The real numbers α and β are again called the affine coordinates of the point M. The first coordinate is called the abscissa and the second the ordinate of M. The point M with coordinates α and β is designated M(α, β).
On the x and y axes there are unique points Mx and My such that
OM = OMx + OMy.   (23.6)
They are at the intersection of the coordinate axes with the straight lines parallel to the axes and passing through M (Fig. 23.2). We call them the affine projections of the point M onto the coordinate axes. The vectors OMx and OMy are the affine projections of OM. From the uniqueness of decompositions (23.5) and (23.6) we conclude that
OMx = αa,   OMy = βb.   (23.7)
Thus if M has the coordinates M(α, β), then Mx and My, as points of the plane, have the coordinates Mx(α, 0) and My(0, β). Moreover if
OM = (α, β),
then
OMx = (α, 0),   OMy = (0, β).
Every basis vector forms a proper coordinate system on its axis. The points Mx and My therefore may be regarded as points of the x and y axes given in those proper coordinate systems. It follows from (23.7), however, that the coordinate of Mx on the x axis is equal to the abscissa of M. Similarly, the coordinate of My on the y axis
is equal to the ordinate of M. Obvious as they may be, these assertions are very important since they allow us to use formulas (23.2) to (23.4).
Specifying an ordered pair of numbers α and β uniquely defines some point. Indeed, relations (23.7) allow one to construct in a
unique way the affine projections of a point that uniquely determine
the point in the plane. Consequently, given a fixed affine coordinate
system there is a 1-1 correspondence between all ordered pairs of real
numbers and points in the plane.
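In computations this correspondence amounts to solving the 2 by 2 system (23.5) for α and β. A small sketch, assuming numpy and with a basis chosen only for illustration:

    import numpy as np

    a, b = np.array([2., 0.]), np.array([1., 1.])      # basis vectors of an affine system
    OM = np.array([4., 3.])                            # the vector from O to a point M

    alpha, beta = np.linalg.solve(np.column_stack([a, b]), OM)
    print(alpha, beta)                                 # affine coordinates of M: OM = alpha*a + beta*b
    print(np.allclose(alpha * a + beta * b, OM))       # True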
Similarly we can introduce an affine coordinate system in space.
Fix a point 0 and consider a space V3 of vectors fixed at 0. Choose
in that space some triple of basis
vectors a, b and c. Specify directions
on the straight lines containing those
vectors so that the magnitudes of
the directed line segments a, b and
c are positive (Fig. 23.3).
The three axes in space intersecting
at the same point 0 and having the
basis vectors a, b and c given on
them form an affine coordinate system
in space. The axis containing a is called
the x axis or axis of x coordinates or
abscissas, the axis containing b is
the y axis or axis of y coordinates or ordinates and the third axis is the z
axis or axis of z coordinates. Pairs of coordinate axes determine the
so-called coordinate planes designated x, y; y, z; x, z planes.

The position of any point M in space is again uniquely determined by the vector OM for which there is a unique decomposition
OM = αa + βb + γc.
The real numbers α, β and γ are called the affine coordinates of a


point M in space. The first coordinate is called an x coordinate or
abscissa, the second is called a y coordinate or ordinate and the third
is a z coordinate of M. The point M with coordinates a, ~ and y is
designated M (a, ~. y).
Draw through the point M the planes parallel to the coordinate
planes. The points of intersection of these planes with the x, y and z
axes are denoted by M x• My and M z and called the affine projections
of the point M onto the coordinate axes. The intersection of the coordi-
nate planes with pairs of planes passing through M determines points

- -
M 11 z, Mxz and Mx 11 , called the affine projections of the point M onto
the coordinate planes. Accordingly the vectors OM uz and OMz and
OM = OMx + OMy + OMz,
OMyz = OMy + OMz,
OMxz = OMx + OMz,
OMxy = OMx + OMy.
We conclude, as in the case of the plane, that if a point M has the coordinates M(α, β, γ), then the affine projections of that point will have the coordinates
Mx(α, 0, 0),   My(0, β, 0),   Mz(0, 0, γ),
Myz(0, β, γ),   Mxz(α, 0, γ),   Mxy(α, β, 0).   (23.8)
Similarly, if
OM = (α, β, γ),
then
OMx = (α, 0, 0),   OMy = (0, β, 0),   OMz = (0, 0, γ),
OMyz = (0, β, γ),   OMxz = (α, 0, γ),   OMxy = (α, β, 0).
Again every basis vector and every pair of basis vectors form prop-
er affine systems on the coordinate axes and coordinate planes.
And again the coordinates of points in those systems coincide with
the affine coordinates of the same points regarded as points in space.
Now, given a fixed affine coordinate system there is a 1-1 correspon-
dence between all ordered triples of real numbers and points in space.
Of the affine coordinate systems on the straight line, in the plane
and in space the most widely used are the so-called rectangular
Cartesian coordinate systems. They are characterized by all basis
vectors having unit length and the coordinate axes being mutually
perpendicular in the case of the plane and space. In a Cartesian
system basis vectors are usually denoted by i, j and k. In what follows
we shall use only these systems as a rule.

Exercises
1. Which of the points A(α) and B(-α) is to the right of the other on the coordinate axis of Fig. 23.1?
2. What is the locus of points M(α, β, γ) for which the affine projections Mxy have coordinates Mxy(-3, 2, 0)?
3. Do the coordinates of a point depend on the choice of direction on the coordinate axes?
4. How will the coordinates of a point change if the length of the basis vectors is changed?
5. What coordinates has the centre of a parallelepiped if the origin coincides
with one of its vertices and the basis vectors coincide with its edges?

24. Other coordinate systems


Coordinate systems used in mathematics allow
us to give with the aid of numbers the position of any point in space,
in the plane or on a straight line. This makes it possible to carry
out any calculations with coordinates and, what is very important,
to apply modern computers not only to all sorts of numerical computations but also to the solution of geometrical problems and to the investigation of any geometrical objects and relations. Besides the affine coordinate systems considered above some others are not infrequently used.
The polar coordinate system. Choose in the plane some straight line and fix on it a Cartesian system. Call the origin O of the system the pole and the coordinate axis the polar axis. Assume further that the unit segment of the coordinate system on the straight line is used to measure the lengths of any line segments in the plane. Consider an arbitrary point M in the plane. It is obvious that its position will be completely defined if we specify the distance ρ between the points M and O and the angle φ through which it is necessary to turn the ray Ox counterclockwise about the point O until its direction coincides with that of the line segment OM (Fig. 24.1).
The polar coordinates of the point M in the plane are the two numbers ρ and φ. The number ρ is the polar radius and the number φ is the polar angle. It is usually assumed that
0 ≤ ρ < +∞,   0 ≤ φ < 2π.   (24.1)
If the point M coincides with the pole O, then the polar angle is considered to be undefined.
Associated in a natural way with every polar coordinate system is a rectangular Cartesian system. In this the origin coincides with the pole, the axis of abscissas coincides with the polar axis and the axis of ordinates is obtained by rotating the polar axis through an angle of π/2 about O.
Denote the coordinates of the point M in the Cartesian x, y system by α and β. We have the obvious formulas
α = ρ cos φ,   β = ρ sin φ.
From these we obtain the inverse relations
ρ² = α² + β²,   cos φ = α/(α² + β²)^{1/2},   sin φ = β/(α² + β²)^{1/2}.
They allow us to calculate from the Cartesian coordinates of a point its polar coordinates and vice versa.
Cylindrical coordinates. Choose in space some plane π and fix on it a polar coordinate system. Through the pole O draw the z axis perpendicular to π (Fig. 24.2). Assume again that to measure the lengths of any line segments in space we use the same unit segment. Introduce in π a Cartesian system corresponding to the polar system. Together with the z axis it forms a Cartesian system in space.
Consider the projections Mz and Mxy of the point M onto the z axis and x, y plane. The point Mxy as a point of π has polar coordinates ρ and φ. The point Mz as a point of the z axis has a z coordinate.
The cylindrical coordinates of the point M in space are the three numbers ρ, φ and z. It is again assumed that
0 ≤ ρ < +∞,   0 ≤ φ < 2π.
For the points of the z axis the angle φ is not defined.
Cartesian coordinates in the x, y, z system and cylindrical coordinates are connected by the relations
x = ρ cos φ,   y = ρ sin φ,   z = z.
Spherical coordinates. Consider in space a Cartesian x, y, z system and the corresponding polar coordinate system in the x, y plane (Fig. 24.3). Let M be any point in space other than O, and let Mxy be its projection onto the x, y plane. Denote by ρ the distance from M to O, by θ the angle between the vector OM and the basis vector of the z axis, and finally by φ the polar angle of the projection Mxy.
The spherical coordinates of the point M in space are the three numbers ρ, φ and θ. The number ρ is a radius vector, the number φ is a longitude, and the number θ is a colatitude. It is assumed that
0 ≤ ρ < +∞,   0 ≤ φ < 2π,   0 ≤ θ ≤ π.
The longitude is undefined for all the points of the z axis, and the colatitude is not defined for the point O.
Cartesian coordinates in the x, y, z system and spherical coordinates are connected by the relations
x = ρ sin θ cos φ,   y = ρ sin θ sin φ,   z = ρ cos θ.
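These relations translate directly into conversion routines. A small sketch, assuming numpy; the function names are chosen here only for illustration:

    import numpy as np

    def polar_to_cartesian(rho, phi):
        return rho * np.cos(phi), rho * np.sin(phi)

    def cartesian_to_polar(x, y):
        # np.arctan2 returns an angle in (-pi, pi]; shift it into [0, 2*pi).
        return np.hypot(x, y), np.arctan2(y, x) % (2 * np.pi)

    def spherical_to_cartesian(rho, phi, theta):
        return (rho * np.sin(theta) * np.cos(phi),
                rho * np.sin(theta) * np.sin(phi),
                rho * np.cos(theta))

    print(cartesian_to_polar(*polar_to_cartesian(2.0, 1.0)))   # recovers (2.0, 1.0)
    print(spherical_to_cartesian(1.0, 0.0, np.pi / 2))         # approximately the point (1, 0, 0)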

Exercises
1. Construct a curve whose points in polar coordinates satisfy the relation ρ = cos 3φ.
2. Construct a curve whose points in cylindrical coordinates satisfy the relations ρ = φ⁻¹ and z = φ.
3. Construct a surface whose points in spherical coordinates satisfy the relations 0 ≤ ρ ≤ 1, φ = π/2 and 0 ≤ θ ≤ π/2.

25. Some problems


Consider several simple problems in applying
Cartesian systems. For definiteness we shall consider problems in
space. Similar problems in the plane differ from those in space only
in minor details. It will be assumed throughout that some coordi-
nate system is fixed whose origin is a point 0 and whose basis vectors
are i, j and k.
The coordinates of a vector. Let M₁(α₁, β₁, γ₁) and M₂(α₂, β₂, γ₂) be two points in space. They determine a vector M₁M₂ which has some coordinates relative to the basis i, j and k. We establish the relation between the coordinates of M₁M₂ and those of the points M₁ and M₂. We have
M₁M₂ = OM₂ - OM₁.
Further, by the definition of the affine coordinates of M₁ and M₂
OM₁ = α₁i + β₁j + γ₁k,   OM₂ = α₂i + β₂j + γ₂k.
Therefore it follows that
M₁M₂ = (α₂ - α₁) i + (β₂ - β₁) j + (γ₂ - γ₁) k
or according to the accepted notation
M₁M₂ = (α₂ - α₁, β₂ - β₁, γ₂ - γ₁).   (25.1)
Coordinate projections of a vector. Again consider the directed line segment M₁M₂ in space. On projecting the points M₁ and M₂

onto the same coordinate plane or the same coordinate axis we obtain a new directed line segment. It is called a coordinate projection of the vector M₁M₂.
Every vector in space has six coordinate projections: three projections onto the coordinate axes and three projections onto the coordinate planes. It is easy to find the coordinate projections in the basis i, j and k from the coordinates of the points M₁(α₁, β₁, γ₁) and M₂(α₂, β₂, γ₂). To do this it suffices to use formulas (23.8) and (25.1). For example, let us find the coordinates of the projection M1xM2x. Considering that the points M1x and M2x have the coordinates
M1x(α₁, 0, 0),   M2x(α₂, 0, 0),
we find that
M1xM2x = (α₂ - α₁, 0, 0).   (25.2)
Similarly
M1xzM2xz = (α₂ - α₁, 0, γ₂ - γ₁)
and so on for all the remaining projections.
Comparing the first of the formulas (23.4) with formulas of the type (25.2) we conclude that
{M1xM2x} = α₂ - α₁,   {M1yM2y} = β₂ - β₁,   {M1zM2z} = γ₂ - γ₁.
Therefore the magnitudes of the projections of a vector onto the coordinate axes coincide with the coordinates of that vector.
The second of the formulas (23.4) allows us to calculate the lengths of the projections of M₁M₂ onto the coordinate axes from the coordinates of M₁ and M₂. Namely,
|M1xM2x| = |α₂ - α₁|,   |M1yM2y| = |β₂ - β₁|,   |M1zM2z| = |γ₂ - γ₁|.
The length of a vector. We establish a formula for the length of a vector in space. It is obvious that the length |M₁M₂| of M₁M₂ equals the distance ρ(M₁, M₂) between M₁ and M₂ and the length of the diagonal of the rectangular parallelepiped whose faces are parallel to the coordinate planes and pass through M₁ and M₂ (Fig. 25.1). The length of any edge of the parallelepiped equals that of the projection of the vector M₁M₂ onto the coordinate axis parallel to the edge. Using therefore the Pythagorean theorem we conclude that
|M₁M₂| = (|M1xM2x|² + |M1yM2y|² + |M1zM2z|²)^{1/2}.
If M₁ and M₂ are now given by their coordinates M₁(α₁, β₁, γ₁) and M₂(α₂, β₂, γ₂), then
ρ(M₁, M₂) = ((α₂ - α₁)² + (β₂ - β₁)² + (γ₂ - γ₁)²)^{1/2}.   (25.3)
If the vector M₁M₂ is given by its x, y and z coordinates relative to the basis i, j and k, then
|M₁M₂| = (x² + y² + z²)^{1/2}.   (25.4)
Of similar form are the corresponding formulas for the plane. If the points M₁(α₁, β₁) and M₂(α₂, β₂) or the vector M₁M₂ = (x, y) are given by their coordinates, then
ρ(M₁, M₂) = ((α₂ - α₁)² + (β₂ - β₁)²)^{1/2},   |M₁M₂| = (x² + y²)^{1/2}.
The angle between vectors. Consider in space nonzero vectors


a and b. Apply them to a point 0. Denote by :rt the plane passing
through 0 and containing both vectors. The angle between a and b
is the smallest angle through which one of the vectors must be turned
about 0 in the plane :rt for its direction to coincide with that of the
other vector. If at least one of the vectors is zero, then the angle is
.-.
1\lcasuremcnts in Vector Spasc [Ch. a
undefined. Our task is to calculate the cosine of the angle between the
vectors from the coordinates of the vectors. We denote a cosine by
cos {a, b}.
Denote by A and B the terminal points of a and bin n. It is obvious
that the angle between a and b is nothing than the angle AOB of the
triangle AOB whose sides are the vec-
A tors a and b and b - a (Fig. 25.2).

~
Suppose a and b are given by their
coordinates
a = (xl, Y1• zl),
Q B
Then
Fig. 25.2
b- a = (x2 - xl, Y2 - Yl• z2 - zl).
As is known from elementary geometry, the square of the length
of one side of a triangle is equal to the sum of the squares of the
lengths of its other two sides minus the double product of the lengths
of those sides by the cosine of the angle between them. Therefore
Ib - a 12 = I a 12 + I b 12 - 2 I a I· I b I cos {a, b}
or, taking into account formula (25.4),
(x2- xl)2 + (Y2- y.)2 + (z2- zl)2 = x: + y: + zi
+ x~ + y~ + z~- 2 (x: + y: + z:) 112 (x~+ y~ ..,- z;) 112 cos {a, b}.
Performing elementary transformations we find
(25.5)

Changes in the formula for the plane are obvious.
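
Formula (25.5) is easy to check numerically; the following lines are a small illustrative sketch of ours, not part of the book:

import math

def cos_angle(a, b):
    # cos{a, b} by formula (25.5)
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

a = (1.0, 0.0, 0.0)
b = (1.0, 1.0, 0.0)
print(cos_angle(a, b))                               # 0.7071..., i.e. the cosine of 45 degrees
print(math.degrees(math.acos(cos_angle(a, b))))      # approximately 45 degrees
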


Dividing a line segment in a given ratio. Suppose a straight line
is given in space and let M1 and M2 be two distinct points on that
straight line. Choose a positive direction on the straight line. On
the resulting axis M1 and M2 determine a directed line segment M1M2.
Let M be any point on the axis other than M2. The number
λ = {M1M} / {MM2}   (25.6)
is called the ratio in which the point M divides the directed line segment
M1M2.
Changing the direction on the axis makes the numbers {M1M}
and {MM2} change sign simultaneously. Hence ratio (25.6) is
independent of the choice of positive direction on the axis. Also when
changing the scale of measuring the length of line segments on the
axis the numbers {M1M} and {MM2} are multiplied by the same
number. Hence ratio (25.6) is independent of the choice of unit
length. It follows that ratio (25.6) is independent of the choice of
coordinate system on the axis.
The problem is to calculate the coordinates of the point M dividing
M1M2 in a ratio λ given the coordinates of the points M1 and M2 and the
number λ, with λ ≠ -1. So let M1(α1, β1, γ1) and M2(α2, β2, γ2)
be given and let M(α, β, γ) be unknown. Project these points onto
the coordinate axes, for example the x axis (Fig. 25.3).

Fig. 25.3

It is clear from similarity considerations that the point Mx also
divides the directed line segment M1xM2x in the ratio λ. Therefore
λ = {M1xMx} / {MxM2x}.   (25.7)
By formula (23.4) {M1xMx} = α - α1 and {MxM2x} = α2 - α.
Now taking into account (25.7), we find that α = (α1 + λα2)/(1 + λ).
The calculation of the coordinates β and γ is similar. So
α = (α1 + λα2)/(1 + λ),   β = (β1 + λβ2)/(1 + λ),   γ = (γ1 + λγ2)/(1 + λ).
----+
Notice that A> 0 if ill is inside the line segment MIM 2 , A< 0
----+
if Jll is ontside MIJlt/ 2 , and A = 0 if M coincides with MI. If M
moves from M I to M 2 (excluding the coincidence with Jl/ 2 ), the
ratio A first takes on a zero value and then all possible positive
Yalues successively in increasing order. If M moves from MI in the
positive direction of the axis (see Fig. 25.3), then A first assumes a
zero value and then negative values in decreasing order approaching
arbitrarily closely A = -1 but always remaining greater than
I. = -1. If M moves in the negative direction from M 2 , then A
takes on all possible negative values in increasing order but always
remains less than A = -1.
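
The coordinate formulas just obtained translate directly into a short computation; the sketch below is ours and only illustrates the result derived above:

def divide(M1, M2, lam):
    # the point M dividing the directed segment M1M2 in the ratio lambda
    # (lambda != -1), coordinate by coordinate:
    # alpha = (alpha1 + lambda*alpha2) / (1 + lambda), and similarly for beta, gamma
    return tuple((a1 + lam * a2) / (1.0 + lam) for a1, a2 in zip(M1, M2))

M1 = (0.0, 0.0, 0.0)
M2 = (6.0, 3.0, 9.0)
print(divide(M1, M2, 1.0))     # (3.0, 1.5, 4.5), the midpoint (lambda = 1)
print(divide(M1, M2, 2.0))     # (4.0, 2.0, 6.0), M1M : MM2 = 2 : 1
print(divide(M1, M2, -0.5))    # (-6.0, -3.0, -9.0), a point lying outside M1M2
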
Thus a 1-1 correspondence could be established between all real
numbers and the points on the straight line if the straight line
contained a point M dividing the line segment M1M2 in the ratio
λ = -1 and the point M coinciding with M2 could be assigned
some number. This question is usually solved by joining to the
straight line an ideal extra "point" and by joining to the numbers an
ideal extra "number". Such a point is called a "point at infinity"
and such a number is called an "infinitely large" number.

Orthogonal projections of a vector. Let u be some axis in space
and let AB be a directed line segment. Draw through the points
A and B the planes perpendicular to u (Fig. 25.4). The intersection
of the planes with the axis determines the points Au and Bu, with
Au lying in the same plane as A and Bu in the same plane as B.
The directed line segment AuBu is called the orthogonal projection
of AB onto the axis u. The following notation is used to denote it:
AuBu = pr_u AB.

Fig. 25.4
For a fixed axis u every vector
x in space uniquely determines its
orthogonal projection x'. It may be assumed therefore that some
"function"
x' = pr_u x   (25.8)
is given whose "independent variable" may be any vector in space
and whose "value" is the vector on the axis u. We shall now prove
that this function possesses the following properties:
pr_u (x + y) = pr_u x + pr_u y,
pr_u (λx) = λ pr_u x,   (25.9)
true for any vectors x and y and any number λ.
Indeed, fix some Cartesian system in which the axis u coincides
with the axis of abscissas. In that system let
x = (α1, β1, γ1),   y = (α2, β2, γ2);
then
x + y = (α1 + α2, β1 + β2, γ1 + γ2),
λx = (λα1, λβ1, λγ1).
In the chosen coordinate system the orthogonal projection of a
vector onto the axis u coincides with its coordinate projection onto
the axis of abscissas. As already noted earlier, the first coordinate
of the projection of any vector onto the axis of abscissas coincides
with the first coordinate of the vector, the other coordinates equalling
zero. Therefore
pr_u (x + y) = (α1 + α2, 0, 0),
pr_u (λx) = (λα1, 0, 0),
pr_u x = (α1, 0, 0),                    (25.10)
pr_u y = (α2, 0, 0).
According to the rules of vector addition and of multiplication of
vectors by a number, it follows from the last two equations of (25.10)
that
pr_u x + pr_u y = (α1 + α2, 0, 0),
λ pr_u x = (λα1, 0, 0).
Comparing the right-hand sides of these with the right-hand sides of
the first two equations of (25.10) we see that both properties of
(25.9) are true.
Now let π be some plane in space and let AB be a directed line
segment. On dropping from the points A and B perpendiculars to π
we obtain in π two points Aπ and Bπ which determine a directed
line segment AπBπ. This is called the orthogonal projection of AB
onto π. The same notation is used to denote it, i.e.
AπBπ = pr_π AB.
Of course, for orthogonal projections onto the same plane we have
relations similar to (25.9). To prove this we may fix some Cartesian
system where π is a coordinate plane and use again the corresponding
properties of projections onto a coordinate plane.
We have discussed orthogonal projections of vectors in space.
Undoubtedly a close analogy holds for vectors in the plane.
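
In the special coordinate system used in the proof, pr_u simply keeps the first coordinate and zeroes the others; the lines below are a small sketch of ours checking properties (25.9) in that setting:

def pr_u(v):
    # coordinate projection onto the axis of abscissas:
    # keep the first coordinate, zero out the others
    return (v[0], 0.0, 0.0)

x = (1.0, 2.0, 3.0)
y = (4.0, -1.0, 0.5)
lam = 2.5

s = tuple(a + b for a, b in zip(x, y))
print(pr_u(s) == tuple(a + b for a, b in zip(pr_u(x), pr_u(y))))            # True
print(pr_u(tuple(lam * a for a in x)) == tuple(lam * a for a in pr_u(x)))   # True
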

Exercises
1. Two nonzero vectors are given by their Cartesian
coordinates. When are they perpendicular?
2. Find the coordinates of the centre of gravity of three particles, given their
Cartesian coordinates and masses.
3. Find the area of a triangle, given the Cartesian coordinates of its three
vertices.
4. Let x, a, b and c be nonzero vectors in space, with a, b and c mutually
perpendicular. Prove that
cos² {x, a} + cos² {x, b} + cos² {x, c} = 1.
5. Denote by π any coordinate plane, and by u denote any coordinate axis
in π. Prove that for any vector x
pr_u (pr_π x) = pr_u x.

26. Scalar product


The use of directed line segments to repre-
sent forces and displacements leads to a very important notion of a
scalar product of vectors.
It is known from physics that if a vector a represents a force whose
point of application moves from the initial point of a vector b to its
terminal point, then the work w of such a force is defined by the
equation
w = |a| |b| cos {a, b}.   (26.1)
The right-hand side of the equation is called the scalar product of
a and b and is generally designated (a, b). So
(a, b) = |a| |b| cos {a, b}.   (26.2)
Strictly speaking, this definition of a scalar product refers only
to nonzero vectors a and b, since it is only for such vectors that an
angle is defmed. Taking into account the above interpretation of the
scalar product, however, it is easy to see that it must be defmecl
when at least one of the vectors is zero. If either a force or a displace-
ment is given by a zero vector, then the work done equals zero. It
will be assumed therefore that (a, b) = 0 if at least one of the vectors
a and b is zero.
Formula (26.2) yields some geometrical properties of a scalar
product. For example, the angle between two nonzero vectors is
acute (obtuse) if and only if the scalar product of the vectors is posi-
tive (negative).
If the angle between the vectors is a right angle or at least one of
the vectors is zero, then the scalar product of the vectors equals
zero. Such vectors are called orthogonal.
Orthogonal vectors of unit length are called orthonormal vectors.
In particular, orthonormal vectors are the basis vectors i, j and k
of a Cartesian system. It follows from formula (26.2) that
(i, i) = 1, (i, j) = 0, (i, k) = 0,
(j, i) = 0, (j, j) = 1, (j, k) = 0, (26.3)
(k, i) = 0, (k, j) = 0, (k, k) = 1.
Consider nonzero vectors a and b. Draw through a an axis u,
specifying on it a direction such that the magnitude of a is positive.
It is then obvious that
{pr_u b} = |b| cos {a, b}.
The projection of b onto the axis, which is thus constructed, is called
the projection of the vector b onto the vector a and designated pr_a b.
Of course, the projection of one vector onto the other preserves prop-

erties (25.9). In new symbols


(a, b) = |a| {pr_a b} = |b| {pr_b a}.   (26.4)
These formulas establish very important algebraic properties of
a scalar product. Namely, for any vectors a, b and c and any real
number α
(1) (a, b) = (b, a),
(2) (αa, b) = α (a, b),
(3) (a + b, c) = (a, c) + (b, c),                  (26.5)
(4) (a, a) > 0 for a ≠ 0; (0, 0) = 0.
Notice that relations (26.5) clearly hold if at least one of the
vectors is zero. In the general case Properties 1 and 4 follow imme-
diately from (26.2). To establish Properties 2 and 3 we use formulas
(26.4) and the properties of projections. We have
(αa, b) = |b| {pr_b (αa)} = |b| {α·pr_b a} = α |b| {pr_b a} = α (a, b),
(a + b, c) = |c| {pr_c (a + b)} = |c| {pr_c a + pr_c b}
           = |c| {pr_c a} + |c| {pr_c b} = (a, c) + (b, c).

Properties 2 and 3 are associated with the first factor of a scalar-


product only. Similar properties hold for the second factor. Indeed,
(a, αb) = (αb, a) = α (b, a) = α (a, b),
(a, b + c) = (b + c, a) = (b, a) + (c, a) = (a, b) + (a, c).
In addition, by virtue of the equation a - b = a + (-1) b,
(a - b, c) = (a, c) - (b, c),
(a, b - c) = (a, b) - (a, c),
since
(a - b, c) = (a + (-1) b, c) = (a, c) + ((-1) b, c)
           = (a, c) + (-1) (b, c) = (a, c) - (b, c).
Theorem 26.1. If two vectors a and b are given by their Cartesian
coordinates, then the scalar product of those vectors is equal to the sum
of pairwise products of the corresponding coordinates.
Proof. Suppose for definiteness that the vectors are given in space,
i.e. a = (x1, y1, z1) and b = (x2, y2, z2). Since
a = x1 i + y1 j + z1 k,
b = x2 i + y2 j + z2 k,
carrying out algebraic transformations of the scalar product we find
(a, b) = x1x2 (i, i) + x1y2 (i, j) + x1z2 (i, k) + y1x2 (j, i)
       + y1y2 (j, j) + y1z2 (j, k) + z1x2 (k, i)
       + z1y2 (k, j) + z1z2 (k, k).
Now by (26.3) we have
(a, b) = x1x2 + y1y2 + z1z2                      (26.6)
and the theorem is proved.
Formula (26.6) allows expressions (25.4) and (25.5) obtained
earlier for the length of a vector and the angle between vectors to be
written in terms of a scalar product.
Namely,
|a| = (a, a)^{1/2},   cos {a, b} = (a, b) / (|a|·|b|).   (26.7)
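
For readers who like to see the numbers, here is a small Python check (ours, not the book's) that the coordinate formula (26.6) and the derived formulas (26.7) reproduce the geometric definition (26.2):

import math

def dot(a, b):
    # scalar product by formula (26.6)
    return sum(p * q for p, q in zip(a, b))

a = (1.0, 2.0, 2.0)
b = (3.0, 0.0, 4.0)

len_a = math.sqrt(dot(a, a))            # |a| by (26.7), equals 3
len_b = math.sqrt(dot(b, b))            # |b| by (26.7), equals 5
cos_ab = dot(a, b) / (len_a * len_b)    # cos{a, b} by (26.7)

# the right-hand side of the geometric definition (26.2)
print(dot(a, b), len_a * len_b * cos_ab)   # both equal 11 up to rounding
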

It may seem that these formulas are trivial, since they follow
immediately from (26.2), without any reference to formulas (25.4)
and (25.5). We shall not jump to conclusions but note a very impor-
tant fact.
Notice that as a matter of fact our investigation proceeded in
three steps. Using (26.2) we first proved that properties (26.5) are
true. Then relying only on these properties and the orthonormality
of the basis vectors of a coordinate system we established formula
(26.6). And finally we obtained formulas (26. 7) by using formulas
(25.4) and (25.5) which were derived without using the concept of
a scalar product of vectors.
Starting from this we could now introduce the scalar product
not by giving its explicit form but axiomatically, as some numerical
function defined for every pair of vectors, requiring that properties
(26.5) should necessarily hold. Relation (26.6) would then hold for
any coordinate systems where the basis vectors are orthonormal in
the sense of the axiomatic scalar product. Consequently, bearing
in mind the model of a Cartesian system we could axiomatically
assume that the lengths of vectors and the angles between them are
calculated from formulas (26. 7). It would be necessary of course to
show that the lengths and angles introduced in this way possess the
necessary properties.

Exercises
1. Given two vectors a and b, under what conditions on
the number α are the vectors a and b + αa orthogonal? What is the geometrical
interpretation of the problem?

2. Let a be a vector given in V3 by its Cartesian coordinates. Find two linearly
independent vectors orthogonal to a.
3. Let a and b be linearly independent vectors given in V3 by their Cartesian
coordinates. Find a nonzero vector orthogonal to both vectors.
4. What is the locus of vectors orthogonal to a given vector?

27. Euclidean space


Abstract vector spaces studied earlier are in
a sense poorer in concepts and properties than spaces of directed line
segments, first of all because they do not reflect the most important
facts associated with measurement of lengths, angles, areas, volumes
and so on. Metric notions can be extended to abstract vector spaces
in different ways. The most efficient method of specifying measure-
ments is through axiomatic introduction of a scalar product of vectors.
We shall begin our studies with real vector spaces.
A real vector space E is said to be Euclidean if every pair of vectors
x and y in E is assigned a real number (x, y) called a scalar product,
with the following axioms holding:
(1) (x, y) = (y, x),
(2) (λx, y) = λ (x, y),
(3) (x + y, z) = (x, z) + (y, z),                  (27.1)
(4) (x, x) > 0 for x ≠ 0; (0, 0) = 0
for arbitrary vectors x, y and z in E and an arbitrary real number λ.
As is already known, it follows from these axioms that we can
carry out formal algebraic transformations using a scalar product,
i.e.
(Σ_{i=1}^{r} α_i x_i, Σ_{j=1}^{s} β_j y_j) = Σ_{i=1}^{r} Σ_{j=1}^{s} α_i β_j (x_i, y_j)
for any vectors x_i and y_j, any numbers α_i and β_j, and any numbers
r and s of summands.
Any linear subspace L of a Euclidean space E is a Euclidean space
with the scalar product introduced in E.
It is easy to show a general method of introducing a scalar product
in an arbitrary real space K. Let e1, e2, ..., en be some basis of K.
Take two arbitrary vectors x and y in K and suppose that
x = ξ1 e1 + ξ2 e2 + ... + ξn en,
y = η1 e1 + η2 e2 + ... + ηn en.
A scalar product of vectors can now be introduced, for example, as
follows:
(x, y) = ξ1 η1 + ξ2 η2 + ... + ξn ηn.   (27.2)

It is not hard to check that all the axioms hold. Therefore the vector
space K with scalar product (27.2) is Euclidean.
Notice that a scalar product can be introduced in K in other ways.
For example, a scalar product in K is the following expression:
(x, y) = α1 ξ1 η1 + α2 ξ2 η2 + ... + αn ξn ηn
for any fixed positive numbers α1, α2, ..., αn. We should not be
confused by this lack of uniqueness. For is there anything strange
in the fact that lengths can be measured in metres or inches, angles
can be measured in degrees or radians and so on? It is this lack of
uniqueness that makes it possible to take the fullest account of the
properties of particular spaces when introducing a scalar product
in them.
Introducing a scalar product in spaces of directed line segments
we had to define it separately when at least one of the segments was
zero. The scalar product was assumed to be zero. Now this fact is
a property arising from axioms (27.1). If x is an arbitrary vector
in E, then
(0, x) = (0x, x) = 0 (x, x) = 0.
Of course, by the first axiom of (27.1), (x, 0) = 0.
A vector x of a Euclidean space is said to be normed if (x, x) = 1.
Any nonzero vector y can be normed by multiplying it by some num-
ber λ. Indeed, under the hypothesis
(λy, λy) = λ² (y, y) = 1,
and therefore as a normalization factor we may take
λ = (y, y)^{-1/2}.
A system of vectors is said to be normed if all its vectors are normed.
It follows from the foregoing that any system of nonzero vectors
can be normed.
One of the most important properties of a scalar product is stated
by the following
Theorem 27.1 (the Cauchy-Buniakowski-Schwarz inequality).
For any two vectors x and y of a Euclidean space
(x, y)² ≤ (x, x) (y, y).
Proof. The theorem is clearly true if y = 0, and therefore we
assume that y ≠ 0. Consider a vector x - λy, where λ is an arbi-
trary real number. We have
(x - λy, x - λy) = (x, x) - 2λ (x, y) + λ² (y, y).
The left-hand side of the equation contains a scalar product of
equal vectors. Therefore the second-degree trinomial at the right
is nonnegative for any λ, in particular for
λ = (x, y) / (y, y).   (27.3)
Thus
(x - ((x, y)/(y, y)) y, x - ((x, y)/(y, y)) y) = (x, x) - (x, y)²/(y, y) ≥ 0,
which proves the theorem.


As in the case of spaces of directed line segments two vectors x
and y of any vector space are said to be collinear if either x = λy
or y = μx for some numbers λ and μ. From 0 = 0x we conclude that
two vectors are obviously collinear if there is at least one zero vector
among them. A very convenient means of testing vectors for col-
linearity is the Cauchy-Buniakowski-Schwarz inequality. Namely,
we have
Theorem 27.2. The Cauchy-Buniakowski-Schwarz inequality becomes
an equation if and only if vectors x and y are collinear.
Proof. Let x and y be collinear. Suppose for definiteness that
x = λy. We find
(x, y)² = (λy, y)² = λ² (y, y)²,
(x, x) (y, y) = (λy, λy) (y, y) = λ² (y, y)².
Comparing these equations shows that the sufficiency of the state-
ment of the theorem holds.
Suppose now that for some vectors x and y
(x, y)² = (x, x) (y, y).   (27.4)
If y = 0, then the vectors are collinear. If y ≠ 0, then taking λ
according to (27.3) and considering (27.4) we get
(x - λy, x - λy) = 0.
By virtue of the last of the axioms (27.1) this means that x - λy = 0
or x = λy, i.e. that x and y are collinear. Necessity also
holds.
As an example consider a space Rn. It can be made Euclidean if
for the vectors
x = (α1, α2, ..., αn),
y = (β1, β2, ..., βn)
a scalar product is introduced as follows:
(x, y) = Σ_{i=1}^{n} α_i β_i.   (27.5)

It is obvious that axioms (27.1) hold. The Cauchy-Buniakowski-
Schwarz inequality then means that
(Σ_{i=1}^{n} α_i β_i)² ≤ (Σ_{i=1}^{n} α_i²) (Σ_{i=1}^{n} β_i²)   (27.6)
for any real numbers α_i and β_i.
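
A quick numerical sanity check of (27.5)-(27.6) in Rn can be written in a few lines of Python; this sketch is ours and uses arbitrary sample data:

def dot(x, y):
    # scalar product (27.5) in R^n
    return sum(a * b for a, b in zip(x, y))

x = (1.0, -2.0, 3.0, 0.5)
y = (4.0, 1.0, -1.0, 2.0)

lhs = dot(x, y) ** 2            # left-hand side of (27.6)
rhs = dot(x, x) * dot(y, y)     # right-hand side of (27.6)
print(lhs <= rhs)               # True, as the Cauchy-Buniakowski-Schwarz inequality asserts
print(lhs, rhs)
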

Exercises
1. Introduce a scalar product in a space of polynomials
with real coefficients in a single variable.
2. Will a space Rn be Euclidean if the scalar product is introduced in it as
follows:
(x, y) = Σ_{i=1}^{n} |α_i| |β_i|?

3. What is the geometrical meaning of the Cauchy-Buniakowski-Schwarz


inequality in spaces of directed line segments?
4. Prove that x = y if and only if (x, d) = (y, d) for every vector d.

28. Orthogonality
The most important relation between the
vectors of a Euclidean space is orthogonality.
By definition vectors x andy are said to be orthogonal if (x, y) = 0.
By the first axiom of (27.1) the orthogonality relation of two vectors
is symmetrical. In fact in a space of directed line segments the con-
cept of orthogonality coincides with that of perpendicularity.
Orthogonality may therefore be regarded as an extension of the
notion of perpendicularity to abstract Euclidean spaces.
A system of vectors of a Euclidean space is said to be orthogonal
if either it consists of a single vector or its vectors are mutually
orthogonal. If an orthogonal system consists of nonzero vectors,
then it can be normed. A normed orthogonal system is called ortho-
normal.
Interest in orthogonal and orthonormal systems is due to the
advantages they offer in investigating Euclidean spaces.
Thus, for example, any orthogonal system of nonzero vectors and
of course any orthonormal system is linearly independent. Indeed,
let a system x1, x2, ..., xk be orthogonal and let x_i ≠ 0 for every i.
This means that (x_i, x_j) = 0 for i ≠ j, but that (x_i, x_j) ≠ 0 for
i = j. We write
α1 x1 + α2 x2 + ... + αk xk = 0.
On performing a scalar multiplication of this equation by any
vector x_i we find
α1 (x_i, x1) + α2 (x_i, x2) + ... + αk (x_i, xk) = 0.

Consequently,
α_i (x_i, x_i) = 0   (28.1)
and of course α_i = 0. Thus the system of vectors x1, x2, ..., xk
is linearly independent.
From (28.1) we deduce, in particular, that if a sum of mutually
orthogonal vectors is zero, then all the vectors are zero.
Especially many useful consequences arise from the assumption
that some orthonormal system e1, e2, ..., es may form a basis of
a Euclidean space E. In this case every vector x in E must be
uniquely represented as a linear combination
x = α1 e1 + α2 e2 + ... + αs es.
But on performing a scalar multiplication of this equation by e_i
we obtain an explicit expression for the coefficients of the expansion
with respect to a basis. Namely,
α_i = (x, e_i).   (28.2)
If for the other vector, y,
y = β1 e1 + β2 e2 + ... + βs es,
then on carrying out simple transformations we find that
(x, y) = α1 β1 + α2 β2 + ... + αs βs.   (28.3)
In particular,
(x, x) = α1² + α2² + ... + αs².   (28.4)
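
As an illustration of (28.2)-(28.4), the following Python sketch (ours; the orthonormal basis chosen is just an example) recovers the coefficients of a vector from scalar products with the basis vectors:

import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# an orthonormal basis of R^3: e1, e2 are rotated by 45 degrees, e3 is the third axis
c = 1.0 / math.sqrt(2.0)
e = [(c, c, 0.0), (-c, c, 0.0), (0.0, 0.0, 1.0)]

x = (3.0, 1.0, -2.0)
alphas = [dot(x, ei) for ei in e]          # alpha_i = (x, e_i), formula (28.2)

# reconstruct x from its expansion and check (28.4)
x_back = tuple(sum(a * ei[k] for a, ei in zip(alphas, e)) for k in range(3))
print(x_back)                                   # (3.0, 1.0, -2.0) up to rounding
print(dot(x, x), sum(a * a for a in alphas))    # both equal 14 up to rounding, as (28.4) claims
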
Before going on with such studies we shall see if there is a basis
consisting of orthonormal vectors.
A basis whose vectors form an orthonormal system is called ortho-
normal. The existence of such a basis in a Euclidean space is proved by
Theorem 28.1. There is an orthonormal basis in any finite dimensional
Euclidean space E.
Proof. Let dim E = n. An orthonormal system is linearly inde-
pendent and therefore it cannot contain more than n vectors. Sup-
pose a system e1 , e 2 , • • • , e8 contains a maximum number of ortho-
normal vectors. This means that in E there is no nonzero vector
orthogonal to all vectors e1 , e2 , • • • , e8 • If some vector is orthogonal
to these, then it must be zero.
Take an arbitrary vector x in E. If the orthonormal system e1 ,
e2 , • • • , e8 were a basis, then the vector x would have to coincide
with a vector y, where
y = (x, e1) e1 + (x, e2) e2 + ... + (x, es) es.
Therefore consider the vector x - y. We have
(x - y, e_i) = (x - Σ_{p=1}^{s} (x, e_p) e_p, e_i)
            = (x, e_i) - Σ_{p=1}^{s} (x, e_p)(e_p, e_i) = (x, e_i) - (x, e_i) = 0.
The vector x - y turns out to be orthogonal to all vectors e1, e2, ...
..., es. Consequently, x - y = 0 or x = y.
So the linearly independent system e1, e2, ..., es possesses the
property that any vector of E is linearly expressible in terms of its
vectors, i.e. it forms a basis.
Corollary. Any orthonormal system of vectors e1, e2, ..., ek may
be supplemented to an orthonormal basis.
Indeed, choose among the orthonormal systems containing a given
system the one that has a maximum number of vectors. Let it be
a system e1, ..., ek, ek+1, ..., es. Repeating then word for word
the proof of Theorem 28.1 we establish that the new system is a basis.
Besides orthogonal vectors in a Euclidean space we shall discuss
orthogonal sets of vectors. Two sets F and G of vectors of E are said
to be orthogonal if every vector in F is orthogonal to every vector
in G. This is designated F ⊥ G.
Of course a set may consist of a single vector. If some vector of
a set is orthogonal to the entire set, then it is, in particular, orthog-
onal to itself. Consequently, it may be only zero.
Lemma 28.1. For a vector x to be orthogonal to a subspace L it is
necessary and sufficient that it should be orthogonal to each vector of
some basis of L.
Proof. Fix a basis y1, y2, ..., yk of a subspace L. If x ⊥ L,
then x is orthogonal to every vector in L and in particular to y1,
y2, ..., yk. Now let (x, y_i) = 0 for every i. Take an arbitrary
vector z in L and expand it with respect to the basis vectors. If
z = α1 y1 + α2 y2 + ... + αk yk
for some numbers α1, α2, ..., αk, then
(x, z) = (x, α1 y1 + α2 y2 + ... + αk yk)
       = α1 (x, y1) + α2 (x, y2) + ... + αk (x, yk) = 0.
This means that x ⊥ L.
Corollary. For two subspaces to be orthogonal it is necessary and
sufficient that each vector of some basis of one of the subspaces should
be orthogonal to each vector of some basis of the other.

The sum K of linear subspaces L1, L2, ..., Lm is said to be
orthogonal if the subspaces are mutually orthogonal. To denote
an orthogonal sum the following notation will be used:
K = L1 ⊕ L2 ⊕ ... ⊕ Lm.
Lemma 28.2. An orthogonal sum of nonzero subspaces is always
a direct sum.
Proof. Choose in each subspace an orthonormal basis and consider
the system of vectors which is the union of the bases of all the subspaces.
It is clear that each vector of the orthogonal sum is linearly expres-
sible in terms of the vectors of the system constructed. But this system
is linearly independent since it consists of nonzero mutually orthog-
onal vectors. Now the lemma follows from Theorem 20.1.
Let a Euclidean space K be represented as an orthogonal sum of
its subspaces L1, L2, ..., Lm. Then the collection of these sub-
spaces may be regarded as a generalized orthogonal basis. In partic-
ular, if for any vectors x and y in K we write their expansions with
respect to the subspaces L1, L2, ..., Lm, i.e. represent them as
x = x1 + x2 + ... + xm,
y = y1 + y2 + ... + ym,
where x_i, y_i ∈ L_i, then it is easy to establish that
(x, y) = (x1, y1) + (x2, y2) + ... + (xm, ym).   (28.5)
This formula is similar to formula (28.3).
Consider an arbitrary nonempty set F of vectors of a Euclidean
space E. The collection of all vectors orthogonal to F is called the
orthogonal complement of F and designated F⊥. The orthogonal com-
plement is a subspace. Indeed, if vectors x, y ∈ F⊥, then x, y ⊥ F.
But then αx + βy ⊥ F for any numbers α and β, i.e. αx + βy ∈ F⊥.
Theorem 28.2. A Euclidean space E is the orthogonal sum of any
of its linear subspaces, L, and its orthogonal complement L⊥, i.e.
E = L ⊕ L⊥.
Proof. Let dim L = s, dim L⊥ = m. Choose some orthonormal
basis e1, ..., es of a subspace L and some orthonormal basis r1, ...
..., rm of L⊥. The system of vectors e1, ..., es, r1, ..., rm is
orthonormal and hence linearly independent.
If the system is not a basis of E, then it can be supplemented to
an orthonormal basis of E. Let e be one of the complementary vectors.
It is orthogonal to the vectors e1, ..., es and therefore e ⊥ L,
i.e. e ∈ L⊥. But, on the other hand, e is orthogonal to r1, ..., rm
and therefore e ⊥ L⊥. So e is both in L⊥ and orthogonal to L⊥.
Consequently, e = 0, which proves the theorem.

Decomposing a space as an orthogonal sum of its subspaces allows
many studies to be efficiently carried out. The following example
will serve as an illustration.
Consider a Euclidean space E with some fixed system of vectors
x1, x2, ..., xk. If the rank of that system equals the dimension
of E, then it is obvious that the only vector in E orthogonal to all
vectors of the given system is the zero vector. We have the converse
Lemma 28.3. If in a Euclidean space E some system of vectors x1,
x2, ..., xk is given and the only vector in E orthogonal to those vectors
is the zero vector, then the rank of the system equals the dimension of E.
Proof. Denote by L the span of the system of vectors x1, x2, ...
..., xk. Any vector orthogonal to these vectors is orthogonal to L,
i.e. is in the orthogonal complement L⊥. According to the hypothesis
of the lemma L⊥ consists of the zero vector only. Since E = L ⊕ L⊥,
it follows that the dimension of L coincides with that of E. But the
dimension of L equals the rank of the system of vectors x1, x2, ...
..., xk. Thus the lemma is proved.
Exercises
1. Prove that if the scalar product of any two vectors
of a Euclidean space is expressed by equation (28.3), then the basis relative
to which the coordinates are taken is orthonormal.
2. Prove that if the scalar product of any vector of a Euclidean space by
itself is expressed by equation (28.4), then the basis relative to which the coordi-
nates are taken is orthonormal.
3. Prove that if two sets consisting of a finite number of vectors are orthog-
onal, then so are the spans constructed on those sets.
4. Prove that the intersection of two orthogonal subspaces consists but of
a zero vector.
5. Prove that if a Euclidean space is the direct sum of its subspaces and
(28.5) holds for any two vectors, then the subspaces are mutually orthogonal.
6. Prove that for any subspaces L and M of a Euclidean space E
dim L + dim L⊥ = dim E,
(L⊥)⊥ = L,
(L + M)⊥ = L⊥ ∩ M⊥,
(L ∩ M)⊥ = L⊥ + M⊥.
29. Lengths, angles, distances
We now extend to the elements of a Euclidean
space such notions as length, angle and distance. We shall proceed
from the analogy with spaces of directed line segments.
The length |x| of a vector x of a Euclidean space E is
|x| = +(x, x)^{1/2}.
Every vector has length. According to the last axiom of (27 .1)
it is positive for nonzero vectors and equal to zero for a zero vector.

Also the equation
|λx| = (λx, λx)^{1/2} = (λ² (x, x))^{1/2} = |λ| |x|
shows that it is possible to take the absolute value of the numerical
factor λ outside the vector-length sign. As already noted a nonzero
vector can be normed, i.e. multiplied by a number such that the
length of the resulting vector is equal to unity.
The angle {x, y} between nonzero vectors x and y of a Euclidean
space E is the angle defined by the relations
cos {x, y} = (x, y) / (|x| |y|),   0 ≤ {x, y} ≤ π.
If there is at least one zero vector among the vectors x and y, then
the angle between such vectors is said to be undefined.
The Cauchy-Buniakowski-Schwarz inequality allows us to state
that the expression which we have called the cosine of the angle
between vectors does not exceed unity in absolute value. Therefore
the angle between any nonzero vectors is always uniquely defined.
It remains unchanged under multiplication of the vectors by any
positive numbers and by Theorem 27.2 equals 0 or π if and only if
the nonzero vectors are collinear. All this is in full agreement with
the concept of the angle between directed line segments.
Take two nonzero vectors x and y. Bearing in mind the analogy
with directed line segments we shall assume them to be two sides
of some triangle. It is natural to take the vector x - y as its third
side. Using the definition of the length of a vector and of the angle
between vectors we find
|x - y|² = (x - y, x - y) = (x, x) - 2 (x, y) + (y, y)
         = |x|² + |y|² - 2 |x| |y| cos {x, y}.   (29.1)
So we have shown that in a Euclidean space the square of the
length of any side of a triangle is equal to the sum of the squares
of the lengths of its two other sides minus the doubled product of
the lengths of those sides by the cosine of the angle between them.
If it is a right triangle, i.e. if the angle between the vectors x and y
is a right one, then obviously
|x - y|² = |x|² + |y|².   (29.2)
This is nothing other than a formal expression of the well-known Pythag-
orean theorem.
Consider again an arbitrary triangle. Since the cosine of the angle
between vectors does not exceed unity in absolute value, it follows
from (29.1) that
|x - y|² ≤ (|x| + |y|)²,
|x - y|² ≥ (|x| - |y|)²

or
|x - y| ≤ |x| + |y|,
|x - y| ≥ ||x| - |y||.   (29.3)
Thus in a Euclidean space the length of a triangle side does not
exceed the sum of the lengths of the other two sides but is not less
than the difference of their lengths.
The distance ρ(x, y) between vectors x and y of a Euclidean space
is the quantity
ρ(x, y) = |x - y|.   (29.4)
It satisfies the three natural properties of distances between vectors
(in point interpretation!) in spaces of directed line segments. Namely,
for any vectors x, y and z of a Euclidean space
(1) ρ(x, y) = ρ(y, x),
(2) ρ(x, y) > 0 if x ≠ y; ρ(x, y) = 0 if x = y,   (29.5)
(3) ρ(x, y) ≤ ρ(x, z) + ρ(z, y).
The first two properties are obvious. The last one is nothing other than
a generalization of the well-known "triangle inequality". It follows
from the first of the inequalities (29.3) if we replace x by x - z
and y by y - z.
The distance ρ(A, B) between sets A and B of vectors of the same
space is the quantity
ρ(A, B) = inf_{x∈A, y∈B} ρ(x, y).

In conclusion note the following fact. Let e1, e2, ..., es be an
orthonormal basis fixed in a Euclidean space E. For any two vectors
x and y given by their coordinates
x = (α1, α2, ..., αs),   y = (β1, β2, ..., βs)
relative to that basis we have by (28.3)
|x| = (α1² + α2² + ... + αs²)^{1/2}.
Consequently,
cos {x, y} = (α1β1 + α2β2 + ... + αsβs) / ((α1² + ... + αs²)^{1/2} (β1² + ... + βs²)^{1/2}).
Close analogy with formulas (25.4) and (25.5) is obvious.
Thus the concepts of length, angle and distance we have intro-
duced fully agree with similar notions in spaces of directed line
segments.
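
The distance (29.4) and the triangle inequality from (29.5) are easy to check numerically in the coordinate form just given; the following sketch is ours:

import math

def dist(x, y):
    # rho(x, y) = |x - y|, formula (29.4), in coordinates
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = (1.0, 0.0, 2.0)
y = (4.0, 4.0, 2.0)
z = (2.0, 1.0, -1.0)

print(dist(x, y) == dist(y, x))                  # property (1) of (29.5)
print(dist(x, y) <= dist(x, z) + dist(z, y))     # property (3), the triangle inequality
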
30) Inclined line, perpendicular, projection 101

Exercises
1. Prove that the length of the sum of any number of
vectors does not exceed the sum of the lengths of those vectors.
2. Prove that the square of the length of the sum of any number of orthog-
onal vectors is equal to the sum of the squares of the lengths of those vectors.
3. Given a Euclidean space of polynomials in a single variable t, find the
angles of a triangle formed by the vectors 1, t² and 1 - t².
4. What is the distance between the polynomials 3f~ + 6 and 2t 8 + +
t 1?
5. Prove that a triangle in a Euclidean space is a right triangle if and only
if the length of one of its sides is equal to the product of the length of another
side by the cosine of the angle between them.

30. Inclined line, perpendicular,


projection
Before extending to abstract Euclidean spaces
the concepts of inclined line, perpendicular and projection we con-
sider these notions in the space of directed line segments.
Let L be a plane. Drop to it from some point M a perpendicular
and denote its foot by M_L (Fig. 30.1). To give this problem a vector
interpretation choose on L a point O and consider a space V3 of
directed line segments fixed at O. The plane L forms a subspace.
Therefore the construction of the perpendicular dropped from M to L
reduces to decomposing the vector OM of the space as the sum
OM = OM_L + M_L M,   (30.1)
where OM_L ∈ L and M_L M ⊥ L. From geometrical considerations
it is clear that decomposition (30.1) always exists and is unique.

Fig. 30.1
This example suggests how to set the problem of a perpendicular
in the general case. Suppose in a Euclidean space E some subspace L
is fixed. Take an arbitrary vector f in E and study the possibility
of decomposing it as a sum
f = g + h,   (30.2)
where g ∈ L and h ⊥ L.
We have already encountered this problem. Indeed, the condition
h ⊥ L is equivalent to h ∈ L⊥. By Theorem 28.2 a Euclidean space E
is the direct sum of subspaces L and L⊥. Therefore decomposition
(30.2) always exists and is unique.
Bearing in mind the analogy with decomposition (30.1) the vector g
in (30.2) will be called the projection of the vector f onto L, h will be

called the perpendicular from f to L, and f will be called the inclined


line to L.
In elementary geometry the length of a perpendicular is known
never to exceed that of an inclined line. A similar situation occurs
in a Euclidean space. The vectors g and h in decomposition (30.2)
are orthogonal. By the Pythagorean theorem therefore
|f|² = |g|² + |h|²,
and so
|h| ≤ |f|.
It is clear that the length of the perpendicular h to a subspace L
is equal to the length of an inclined line f to the same subspace if
and only if f ⊥ L.
The problem of a perpendicular may be given another interpreta-
tion. Consider again an arbitrary vector f in E. It is not necessarily
in L. Consequently, we may require to find in L a vector that
is closest to f in the sense of the earlier introduced distance.
Take an arbitrary vector z in L. Subtracting it from both sides
of (30.2) we get
f - z = (g - z) + h.
Since h is orthogonal to g - z, by the Pythagorean theorem
|f - z|² = |g - z|² + |h|².
Therefore
|f - z| ≥ |h|,
equality holding if and only if z = g.
So of all the vectors in L the projection of f onto L is closest to f.
This means that
ρ(f, L) = ρ(f, g).
By analogy with directed line segments we can say that the angle
between f and L is the smallest of the angles between f and the vectors
z in L. Taking into account the Cauchy-Buniakowski-Schwarz in-
equality and decomposition (30.2) we find
cos {f, z} = (f, z)/(|f| |z|) = (g + h, z)/(|f| |z|) = (g, z)/(|f| |z|) ≤ |g|/|f|.
It is obvious that this inequality becomes an equation if and only
if z makes a zero angle with g.
Thus the angle between f and L coincides with that between f
and its projection onto L.
The above properties of a perpendicular and a projection reflect
the geometrical aspect of these concepts. Now we shall consider
them from an algebraic point of view. Given a fixed subspace L,

each vector f of a Euclidean space E uniquely determines relative
to L two of its components. Consequently, it may be assumed that
decomposition (30.2) gives two functions
g = pr_L f,
h = ort_L f.
The "independent variable" of a function may be any vector from
E, the "value" of the function pr_L f being a vector from L and the
"value" of ort_L f a vector from L⊥.
By virtue of (L⊥)⊥ = L the perpendicular and the projection
are related by the following equations:
pr_L f = ort_{L⊥} f,
ort_L f = pr_{L⊥} f.   (30.3)
Therefore in fact the study of these functions always reduces to the
study of one of them.
Take two arbitrary vectors x and y in E. According to (30.2)
x = pr_L x + ort_L x,
y = pr_L y + ort_L y.   (30.4)
Adding these equations termwise and multiplying the first of them
by an arbitrary real number λ yields
x + y = (pr_L x + pr_L y) + (ort_L x + ort_L y),
λx = (λ pr_L x) + (λ ort_L x).
A straightforward check shows that the vectors in the first parenthe-
ses are from L and those in the second parentheses are perpendicular
to L. According to the uniqueness of decomposition of the type (30.2)
this means that the relations
pr_L (x + y) = pr_L x + pr_L y,
pr_L (λx) = λ pr_L x   (30.5)
hold for the function pr_L and that of course the similar relations
ort_L (x + y) = ort_L x + ort_L y,
ort_L (λx) = λ ort_L x   (30.6)
hold for ort_L. Formulas (25.9) and (30.5) fully coincide.
Notice that ort_L z = 0 for any vector z in L. Therefore it follows
from the first of the equations (30.6) that
ort_L (x + z) = ort_L x.
Consequently, the value of the function ort_L remains unchanged
if we add to the independent variable any vector from L. In partic-
ular, if we take z = -pr_L x, then, taking into account (30.4),
ort_L (ort_L x) = ort_L x.   (30.7)
A similar relation holds for a projection. Namely,
pr_L (pr_L x) = pr_L x.   (30.8)
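
When an orthonormal basis of L is available, pr_L f can be computed from scalar products with the basis vectors (cf. (28.2)) and ort_L f as the remainder; the Python sketch below is ours, uses an example subspace of R^3, and also checks (30.7) and (30.8):

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# L is the coordinate plane spanned by the orthonormal vectors e1, e2
basis_L = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

def pr_L(f):
    # projection of f onto L: sum of (f, e_i) e_i over the basis of L
    coeffs = [dot(f, e) for e in basis_L]
    return tuple(sum(c * e[k] for c, e in zip(coeffs, basis_L)) for k in range(3))

def ort_L(f):
    # perpendicular from f to L: f minus its projection
    return tuple(a - b for a, b in zip(f, pr_L(f)))

f = (2.0, -3.0, 5.0)
print(pr_L(f), ort_L(f))            # (2.0, -3.0, 0.0) and (0.0, 0.0, 5.0)
print(pr_L(pr_L(f)) == pr_L(f))     # True, relation (30.8)
print(ort_L(ort_L(f)) == ort_L(f))  # True, relation (30.7)
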
Now let a subspace L be the orthogonal sum of subspaces L1 and
L2. Take an arbitrary vector x in E and represent it as the sum
x = (pr_L1 x + pr_L2 x) + (x - pr_L1 x - pr_L2 x).
The vector in the first parentheses is obviously in the subspace
L1 ⊕ L2. The vector in the second parentheses is orthogonal to
L1 ⊕ L2, which is easy to show by transforming it with the aid
of (30.4) as follows:
x - pr_L1 x - pr_L2 x = ort_L1 x - pr_L2 x = ort_L2 x - pr_L1 x.   (30.9)
Therefore we conclude that
pr_{L1⊕L2} x = pr_L1 x + pr_L2 x.
The perpendicular from x to L1 ⊕ L2 is equal to one of the expres-
sions (30.9). If in particular x ⊥ L1, then
ort_{L1⊕L2} x = ort_L2 x.   (30.10)
Exercises
1. Does the analogue of the theorem on three perpen-
diculars hold in a Euclidean space?
2. Prove that the sum of two angles between a vector f and subspaces L
and L⊥ is equal to π/2.
3. Find the perpendicular and projection of a vector f onto trivial subspaces.
4. Prove that if for fixed subspaces L1 and L2 and any vector x
pr_{L1+L2} x = pr_L1 x + pr_L2 x,
then the sum L1 + L2 is orthogonal.
5. Prove that if subspaces L1, L2, ..., Lm are mutually orthogonal, then
for any vector x in E
|x|² ≥ Σ_{i=1}^{m} |pr_Li x|².

31. Euclidean isomorphism


Carrying out our studies we have repeatedly
noted the coincidence of the properties of an abstract Euclidean space
and spaces of directed line segments. We could carry over to Euclid-
ean space the other facts and theorems of elementary geometry. But
there is no need for this.

We introduce the concept of Euclidean isomorphism. We shall


say that Euclidean spaces E and E' are Euclidean isomorphic if
they are isomorphic as real vector spaces and if in addition for any
pair of vectors x and y in E and the corresponding vectors x' and y'
in E'
(x, y) = (x', y').
Theorem 31.1. For two Euclidean spaces to be Euclidean isomorphic
it is necessary and sufficient that they should be of equal dimension.
Proof. If two Euclidean spaces E and E' are Euclidean isomorphic,
then they are isomorphic as real vector spaces as well. But such
vector spaces have the same dimension.
Consider now two Euclidean spaces E and E' of the same dimen-
sion n. Let e1, e2, ..., en be an orthonormal basis in E and let e1',
e2', ..., en' be an orthonormal basis in E'. Assign to each vector
x = α1 e1 + α2 e2 + ... + αn en
of E a vector
x' = α1 e1' + α2 e2' + ... + αn en'
of E'. This correspondence was proved earlier to be an isomorphism.
Now take another pair of corresponding vectors in E and E':
y = β1 e1 + β2 e2 + ... + βn en,
y' = β1 e1' + β2 e2' + ... + βn en'.
By (28.3)
(x, y) = α1 β1 + α2 β2 + ... + αn βn = (x', y').
Thus the theorem is proved.
We are concerned throughout only with such properties of vector
spaces that are consequences of the basic operations acting in spaces.
From this point of view Euclidean isomorphic spaces have the same
properties. Therefore any geometrical theorem proved in V 3 will
be true in any three-dimensional subspace of a Euclidean space.
Consequently, it will also be true in any Euclidean space. Of course,
the arithmetical space Rn with a scalar product introduced according
to (27.2) may serve as a standard Euclidean space.

Exercises
1. Construct a Euclidean isomorphism from V2 to R2.
2. Prove that in Euclidean isomorphic spaces an orthonormal system of vec-
tors goes over into an orthonormal system.
3. Prove that in Euclidean isomorphic spaces the angles between pairs of
corresponding vectors are equal.
4. Prove that in Euclidean isomorphic spaces a perpendicular and a projec-
tion go over respectively into a perpendicular and a projection.

32. Unitary spaces


We have extended the basic metric concepts
only to real vector spaces. Similar results hold for the complex vector
space.
A complex vector space U is said to be unitary if every pair of
vectors x and y in U is assigned a complex number (x, y) called
a scalar product, with the following axioms holding:
(1) (x, y) = \overline{(y, x)},
(2) (λx, y) = λ (x, y),
(3) (x + y, z) = (x, z) + (y, z),
(4) (x, x) > 0 for x ≠ 0; (0, 0) = 0
for arbitrary vectors x, y and z in U and an arbitrary complex num-
ber λ.
The bar in the first axiom signifies complex conjugation. This
single distinction from the axioms of a Euclidean space involves
no profound differences but we should not forget about it. Thus, while
in a Euclidean space (x, λy) = λ (x, y), in a unitary space (x, λy) =
\overline{λ} (x, y).
In a unitary space U we can introduce some metric concepts. The
length of a vector, as in the real case, will be the quantity
|x| = +(x, x)^{1/2}.
Every nonzero vector has a positive length and the length of a zero
vector is zero. For any complex λ
|λx| = |λ|·|x|.
The Cauchy-Buniakowski-Schwarz inequality
|(x, y)|² ≤ (x, x) (y, y)
is also true. The proof is similar to that for the real case.
The concept of the angle between vectors is not introduced in
a unitary space as a rule. Only the case where vectors x and y are
orthogonal is considered. As in the real case it is understood that
(x, y) = 0.
It is obvious that (y, x) = \overline{(x, y)} = 0.
Essentially the entire theory of Euclidean spaces discussed above
can be carried over without changes in the definitions and general
schemes of the proofs to unitary spaces.
The arithmetical space Cn may serve as the standard unitary space
if for the vectors
x = (α1, α2, ..., αn),
y = (β1, β2, ..., βn)
their scalar product is introduced as follows:
(x, y) = Σ_{i=1}^{n} α_i \overline{β_i}.   (32.1)
Using this space it is easy to show the part played by complex con-
jugation in the first axiom. If the scalar product were introduced
in Cn according to formula (27.2), then in C3, for example, for the
vector
x = (3, 4, 5i)
we would have
(x, x) = 9 + 16 + 25i² = 0.
The fourth axiom would be found to have failed.
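
Python's complex numbers make this contrast easy to reproduce; the sketch below is ours and compares the unitary scalar product (32.1) with the naive unconjugated sum on the vector x = (3, 4, 5i):

def unitary_dot(x, y):
    # scalar product (32.1): conjugate the second factor
    return sum(a * b.conjugate() for a, b in zip(x, y))

def naive_dot(x, y):
    # the real-space formula applied blindly, without conjugation
    return sum(a * b for a, b in zip(x, y))

x = (3 + 0j, 4 + 0j, 5j)
print(naive_dot(x, x))     # 0j: the fourth axiom fails
print(unitary_dot(x, x))   # (50+0j): positive, as required of (x, x)
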

Exercises
1. Compare a Euclidean space R2 and a unitary
space C1.
2. Write the Cauchy-Buniakowski-Schwarz inequality for a space Cn.
3. If in a complex space the scalar product is introduced according to axi-
oms (27.1), can the Cauchy-Buniakowski-Schwarz inequality hold in such a space?
4. If in a complex space the scalar product is introduced according to axi-
oms (27.1), can there be an orthogonal basis in such a space?

33. Linear dependence


and orthonormal systems
We have already noted in Section 22 that
the linear independence of a system of basis vectors may be violated
by a small change in the vectors. This phenomenon leads to great
difficulties in the use of the concept of basis in solving practical
problems. It is important to stress, however, that not all bases pos-
sess so unpleasant a feature. In particular, it is lacking in any ortho-
normal basis.
Let e1, e2, ..., en be an arbitrary orthonormal basis chosen in
a Euclidean or a unitary space. If for some vector b
b = Σ_{i=1}^{n} α_i e_i,
then by (28.4)
|b|² = Σ_{i=1}^{n} |α_i|².   (33.1)
Consider now a system of vectors e1 + ε1, e2 + ε2, ..., en + εn,
where ε1, ε2, ..., εn are some perturbation vectors, and suppose
that it is linearly dependent. This means that there

are numbers β1, β2, ..., βn not all zero such that
Σ_{i=1}^{n} β_i (e_i + ε_i) = 0.
It follows that
Σ_{i=1}^{n} β_i e_i = - Σ_{i=1}^{n} β_i ε_i.
Using (33.1) and (27.6) we get
Σ_{i=1}^{n} |β_i|² = |Σ_{i=1}^{n} β_i e_i|² = |Σ_{i=1}^{n} β_i ε_i|²
   ≤ (Σ_{i=1}^{n} |β_i| |ε_i|)² ≤ (Σ_{i=1}^{n} |β_i|²) (Σ_{i=1}^{n} |ε_i|²).
Comparing the left- and right-hand sides of these relations we deduce
that
Σ_{i=1}^{n} |ε_i|² ≥ 1.
Thus the inequality obtained means that if the condition
Σ_{i=1}^{n} |ε_i|² < 1   (33.2)
holds, then the system of vectors
e1 + ε1, e2 + ε2, ..., en + εn
is clearly linearly independent.
The indicated feature of orthonormal systems has determined their
wide use in constructing various computational algorithms associated
with expansions with respect to bases.
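
The bound (33.2) is easy to probe numerically. The sketch below is ours (it relies on numpy's matrix rank as an independence test): it perturbs the standard orthonormal basis of R^3 by vectors whose squared lengths sum to less than one and confirms that independence survives:

import numpy as np

n = 3
E = np.eye(n)                                # the orthonormal basis e1, ..., en
rng = np.random.default_rng(0)

eps = rng.standard_normal((n, n))
eps *= 0.5 / np.linalg.norm(eps)             # now the sum of |eps_i|^2 is 0.25 < 1, condition (33.2)

perturbed = E + eps                          # the system e_i + eps_i, stored as rows
print(sum(np.linalg.norm(row) ** 2 for row in eps))   # approximately 0.25
print(np.linalg.matrix_rank(perturbed) == n)          # True: still linearly independent
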

Exercises
1. Let e1, e2, ..., en be an orthogonal basis of a Euclid-
ean space. Prove that a system of vectors z1, z2, ..., zn is linearly indepen-
dent if
Σ_{i=1}^{n} cos {e_i, z_i} > n - 1/2.
2. Let vectors z_i = (z_{i1}, z_{i2}, ..., z_{in}), for i = 1, 2, ..., n, be given by
their coordinates in an arbitrary basis. Prove that if
|z_{ii}|² > n Σ_{k≠i} |z_{ik}|²
for every i, then the system z1, z2, ..., zn is linearly independent.
CHAPTER 4

The Volume
of a System of Vectors
in Vector Space

34. Vector and triple scalar products


Again we begin our studies with a space of
directed line segments. As always we assume some fixed Cartesian
coordinate system with origin 0 and basis i, j and k.
Three vectors are said to be a triple if it is stated which vector
is the first, which is the second and which is the third. In writing
we shall arrange the vectors of a triple consecutively from left to
right.
A triple of noncoplanar vectors a, b and c is said to be right-
(left-)handed if they are arranged the way the thumb, the unbent
forefinger and the middle finger of the right (left) hand can be held
respectively.
Of any three noncoplanar vectors a, b and c it is possible to com-
pose the following six triples:
abc, bca, cab, bac, acb, cba.
The first three are of the same sign as abc, and the others are of the
opposite sign. Notice that interchanging any two vectors in any
triple makes the triple change sign.
An affine or Cartesian coordinate system is said to be right- (left-)
handed if its basis vectors form a right- (left-)handed triple. So far
our studies have been independent of the sign the basis of the coor-
dinate system had. Now some differences will appear in our inves-
tigations. For the sake of definiteness therefore in what follows we
shall consider only right-handed coordinate systems.
Suppose two noncollinear vectors a and b are given. We assign
to them a third vector c satisfying the following conditions:
(1) c is orthogonal to each of the vectors a and b,
(2) abc is a right-handed triple,
(3) the length of c equals the area S of the parallelogram construct-
ed on the vectors a and b applied to a common origin. If a and b
are collinear, then we assign to such a pair of vectors a zero vector.
The resulting correspondence is an algebraic operation in a space V3.
It is called vector multiplication of vectors a and b and designated
c = [a, b].

Consider the basis vectors i, j and k. By the definition of a vector
product
[i, i] = 0,    [i, j] = k,    [i, k] = -j,
[j, i] = -k,   [j, j] = 0,    [j, k] = i,    (34.1)
[k, i] = j,    [k, j] = -i,   [k, k] = 0.
From these relations it follows in particular that the operation of
vector product is noncommutative.
Every triple abc of noncoplanar vectors applied to a common
point 0 determines some parallelepiped. The point 0 is one of its
vertices, and the vectors a, b and c are its edges. We designate the
volume of that parallelepiped as V (a, b, c) thus emphasizing its
dependence on the vectors a, b and c. If
a triple a, b and c is coplanar, then the
volume is assumed to be zero. We now assign
to the volume a plus sign if the noncoplanar triple abc is right-handed
and a minus sign if it is left-handed. The new concept thus defined
will be called the oriented volume of the parallelepiped and
designated V± (a, b, c).

Fig. 34.1

A volume and an oriented volume may
be regarded as some numerical functions of
three independent vector variables assuming certain real values for
every vector triple abc. A volume is always nonnegative, and an
oriented volume may have any sign. We shall later see that
separating these concepts does make sense.
Let a, b and c be three arbitrary vectors. If we carry out vector
multiplication of a on the right by b and then perform scalar multi-
plication of the vector [a, b] by c, then the resulting number
([a, b], c) is said to be a triple scalar product of the vectors a, b and c.
Theorem 34.1. A triple scalar product ([a, b], c) is equal to the
oriented volume of the parallelepiped constructed on vectors a, b and c
applied to a common origin.
Proof. We may assume without loss of generality that a and b
are noncollinear since otherwise [a, b] = 0 and the statement of the
theorem is obvious. Let S be as before the area of the parallelogram
constructed on a and b. By (26.4)
([a, b], c) = |[a, b]| {pr_[a,b] c} = S {pr_[a,b] c}.   (34.2)
Suppose a, b and c are noncoplanar. Then {pr_[a,b] c} is up to
a sign equal to the altitude h of the parallelepiped constructed on
the vectors a, b and c applied to a common origin provided that the
base of the parallelepiped is the parallelogram constructed on a
and b (Fig. 34.1). Thus the right-hand side of (34.2) is up to a sign
equal to the volume of the parallelepiped constructed on a, b and c.

It is obvious that {pr_[a,b] c} = +h if the vectors [a, b] and c
are on the same side of the plane determined by a and b. But in this
case the triple abc is also right-handed. Otherwise {pr_[a,b] c} = -h.
If the vectors abc are coplanar, then c is in the plane given by a
and b, and therefore {pr_[a,b] c} = 0. Thus the theorem is proved.
Corollary. For any three vectors a, b and c
([a, b], c) = (a, [b, c]).   (34.3)
Indeed, from the symmetry of a scalar product it follows that
(a, [b, c]) = ([b, c], a), and therefore it suffices to show that
([a, b], c) = ([b, c], a). But the last equation is obvious since the
triples abc and bca are of the same sign and their parallelepipeds
coincide.
Relation (34.3) allows efficient algebraic studies to be carried out.
We first prove that for any vectors a, b and c and any real number α
the following properties of vector multiplication hold:
(1) [a, b] = -[b, a],
(2) [αa, b] = α [a, b],
(3) [a + b, c] = [a, c] + [b, c],
(4) [a, a] = 0.
Property 4 follows in an obvious way from the definition. To-
prove the remaining properties we shall use the fact that vectors x
and y are equal if and only if
(x, d) = (y, d)
for any vector d.
Let d be an arbitrary vector. The triples abd and bad are of opposite
signs. Consequently, from Theorem 34.1 and the properties of a scalar
product we conclude that
([a, b], d) = -([b, a], d) = (-[b, a], d).
Since d is an arbitrary vector, this means that [a, b] = -[b, a]
and the first property is proved.
To prove the second and the third property we proceed similarly
but in addition take into account (34.3). We have
([αa, b], d) = (αa, [b, d]) = α (a, [b, d])
             = α ([a, b], d) = (α [a, b], d),
which shows the validity of Property 2. Further
([a + b, c], d) = (a + b, [c, d]) = (a, [c, d]) + (b, [c, d])
               = ([a, c], d) + ([b, c], d) = ([a, c] + [b, c], d)

and Property 3 is also true. Similarly for the second factor
[a, αb] = -[αb, a] = -α [b, a] = α [a, b],
[a, b + c] = -[b + c, a] = -[b, a] - [c, a] = [a, b] + [a, c].
Now we can investigate the algebraic properties of an oriented
volume as a function given on triples of vectors. For example, let a
be a linear combination of some vectors a' and a''. Then
([αa' + βa'', b], c) = (α [a', b] + β [a'', b], c)
                     = α ([a', b], c) + β ([a'', b], c).
Consequently
V± (αa' + βa'', b, c) = α V± (a', b, c) + β V± (a'', b, c)
for any vectors a' and a'' and any real numbers α and β.
When two independent variables are interchanged, the oriented
volume only changes sign and therefore a similar property for a linear
combination holds in each independent variable. Bearing this in
mind we shall say that an oriented volume is a linear function in
each of the independent variables.
If vectors abc are linearly dependent, then they are coplanar and
therefore the oriented volume is equal to zero. Also, considering
(34.1) we find that
V± (i, j, k) = ([i, j], k) = (k, k) = 1.
So it may be concluded that as a function an oriented volume pos-
sesses the following properties:
(A) an oriented volume is a linear function in
each of the independent variables,
(B) an oriented volume is equal to zero on
all linearly dependent systems, (34.4)
(C) an oriented volume is equal to unity at least
on one fixed orthonormal system of vectors.
Of course we have by no means formulated all the properties of an
oriented volume. Both the properties listed in (34.4) and others
can easily be established if we know explicit expressions of vector
and triple scalar products in terms of the coordinates of vectors
a, b and c.
Theorem 34.2. If vectors a and b are given by their Cartesian coor-
dinates
a = (x1, y1, z1),
b = (x2, y2, z2),
then the vector product will have the following coordinates:
[a, b] = (y1z2 - y2z1, z1x2 - z2x1, x1y2 - x2y1).   (34.5)

Proof. Considering that giving the coordinates of the vectors
determines the decompositions
a = x1 i + y1 j + z1 k,
b = x2 i + y2 j + z2 k,
and relying on the algebraic properties of a vector product we find
[a, b] = x1x2 [i, i] + x1y2 [i, j] + x1z2 [i, k] + y1x2 [j, i]
       + y1y2 [j, j] + y1z2 [j, k] + z1x2 [k, i] + z1y2 [k, j] + z1z2 [k, k].
The statement of the theorem now follows from (34.1).
Corollary. If a vector c is also given by coordinates x3, y3 and z3
in the same Cartesian system, then
([a, b], c) = x1y2z3 + x2y3z1 + x3y1z2 - x1y3z2 - x2y1z3 - x3y2z1.   (34.6)
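
Formulas (34.5) and (34.6) translate directly into code; the following Python sketch (ours) computes a vector product and a triple scalar product and checks them on the basis vectors:

def cross(a, b):
    # vector product by formula (34.5)
    x1, y1, z1 = a
    x2, y2, z2 = b
    return (y1 * z2 - y2 * z1, z1 * x2 - z2 * x1, x1 * y2 - x2 * y1)

def triple(a, b, c):
    # triple scalar product ([a, b], c), i.e. formula (34.6)
    return sum(p * q for p, q in zip(cross(a, b), c))

i, j, k = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
print(cross(i, j))        # (0.0, 0.0, 1.0), i.e. k, in agreement with (34.1)
print(triple(i, j, k))    # 1.0, the oriented volume of the unit cube
print(triple(j, i, k))    # -1.0: interchanging two vectors changes the sign
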
The introduction of an oriented volume and the study of its algeb-
raic properties allow us to make important conclusions concerning
length, area and volume.
Notice that right- and left-handed bases determine partition of
the set of all bases of a space into two classes. The term "right-
and left-handed" has no deep meaning of its own but is merely
a convenient way of identifying the class to which one basis or
another belongs. The concept of oriented volume is actually also
related to these two classes.
We have already met with such facts. All bases on a straight line
can also be divided into two classes by combining in the same class
vectors pointing in the same direction. It turns out that the mag-
nitude of a directed line segment is closely similar to an oriented
volume if both notions are considered as functions on systems of
vectors. Property A holds according to relations (9.8). Property B
is true since the magnitude of a zero line segment is zero. That
Property C holds is obvious.
A similar investigation could be carried out independently in the
case of the plane too. It is simpler, however, to use the results
already obtained. Fix some x, y Cartesian system. Supplement
it to a right-handed x, y, z coordinate system in space. Note that
depending on the location of the x and y axes the z axis may have
one of the two possible directions. This again determines a partition
of the set of bases of the plane into two classes. The oriented area
S± (a, b) of the parallelogram constructed on vectors a and b in
the x, y plane may be defined, for example, by the equation
S± (a, b) = V± (a, b, k). Properties A, B and C again hold of
course.
Thus, assigning to lengths, areas and volumes some signs and
considering them as functions given on systems of vectors we can
make all these functions have the same algebraic properties A, B
and C of (34.4).

Exercises
1. Prove that a, b and c are coplanar if and only if
their triple scalar product is zero.
2. Prove that for any three vectors a, b and c
[a, [b, c]] = (a, c) b - (a, b) c.
3. Prove that vector multiplication is not an associative operation.
4. Find an expression of the oriented area of a parallelogram in terms of the
Cartesian coordinates of vectors in the plane.
5. Will formulas (34.5) and (34.6) change if the coordinate system relative
to which the vectors are given is left-handed?

35. Volume and oriented volume


of a system of vectors
In vector spaces of directed line segments
area and volume are derived concepts of the length of a line segment.
We have already extended the concept of length to abstract Euclid-
ean spaces. \\'e now consider a similar problem for area and volume.

[Fig. 35.1    Fig. 35.2]

Let x1 and x2 be two noncollinear vectors in the plane. Construct
on them a parallelogram, taking x1 as its base (Fig. 35.1). Drop
from the terminal point of x2 to the base a perpendicular h. The
area S (x1, x2) of the parallelogram will be defined by the formula
S (x1, x2) = |x1| |h|.                                      (35.1)
Denote by L0 a zero subspace and by L1 the span constructed
on x1. Since
|x1| = |ort_{L0} x1|,   |h| = |ort_{L1} x2|,
formula (35.1) can be written as
S (x1, x2) = |ort_{L0} x1| |ort_{L1} x2|.                   (35.2)
Take then three noncoplanar vectors x1, x2 and x3 in space. Con-
struct on them a parallelepiped, taking as its base the parallelogram
formed by x1 and x2 (Fig. 35.2). Drop from the terminal point of x3
to the base a perpendicular h1. The volume V (x1, x2, x3) of the
parallelepiped will be defined by
V (x1, x2, x3) = S (x1, x2) |h1|.
If by L2 we denote the span constructed on x1 and x2, then by (35.2)
V (x1, x2, x3) = |ort_{L0} x1| |ort_{L1} x2| |ort_{L2} x3|.
Thus the length of a vector, the area of a parallelogram and the
volume of a parallelepiped are expressed in vector spaces V1, V2
and V3 by formulas in which one cannot but see a certain regularity:
|x1| = |ort_{L0} x1|,
S (x1, x2) = |ort_{L0} x1| |ort_{L1} x2|,                   (35.3)
V (x1, x2, x3) = |ort_{L0} x1| |ort_{L1} x2| |ort_{L2} x3|.
In particular, the number of factors coincides everywhere with the
dimension of the space.
These formulas suggest how to introduce the concept of volume
in a Euclidean space En of dimension n. Let x1, x2, ..., xn be an
arbitrary system of vectors in En. Denote by L0 a zero subspace and
by Li the span formed by vectors x1, ..., xi. Then by analogy with
spaces of directed line segments we say that:
The volume V (x1, x2, ..., xn) of a system of vectors x1, x2, ...
..., xn of a Euclidean space En is the value on that system of a real-
valued function of n independent vector variables in En defined by
the following equation:

V (x1, x2, ..., xn) = ∏_{i=0}^{n-1} |ort_{Li} x_{i+1}|.     (35.4)
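The product in (35.4) can be computed directly: at each step subtract from the next vector its projection onto the span of the preceding ones and multiply the lengths of what remains. A minimal Python sketch, assuming numpy is available (the function name volume is ours):

```python
import numpy as np

def volume(vectors):
    # Volume by formula (35.4): the product of the lengths of the
    # perpendiculars dropped onto the spans of the preceding vectors.
    basis = []                    # orthonormal basis of the span L_i built so far
    vol = 1.0
    for x in vectors:
        perp = np.asarray(x, dtype=float)
        for e in basis:
            perp = perp - np.dot(perp, e) * e   # remove the projection onto L_i
        length = np.linalg.norm(perp)           # |ort_{L_i} x_{i+1}|
        vol *= length
        if length > 1e-12:
            basis.append(perp / length)
    return vol

# The unit cube spanned by the standard basis has volume 1.
print(volume([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))   # 1.0
```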

Of course we cannot as yet say that the volume of a system of
vectors possesses all the properties inherent in a volume for any n.
But for Euclidean spaces of dimensions 1, 2 and 3 respectively, by
virtue of Euclidean isomorphism and relations (35.3), it clearly has
the same properties as have the length of a line segment, the area
of a parallelogram and the volume of a parallelepiped.
We now try to approach the concept of the volume of a system of
vectors of a Euclidean space En from another point of view. As
already noted, assigning definite signs turns length, area and volume
into algebraic functions possessing some common properties. There-
fore it may be expected that there are similar properties in an arbi-
trary Euclidean space, too. Bearing this in mind we give the follow-
ing definition:
The oriented volume V± (x1, x2, ..., xn) of a system of vectors
x1, x2, ..., xn of a Euclidean space En is the value on that system
of a real-valued function of n independent vector variables in En
possessing properties (34.4).
Much is unclear about this definition too. Thus we do not know if
there is an oriented volume for any system of vectors in an arbitrary
Euclidean space for n ≥ 4. But even if there is, is it uniquely defined
by properties (34.4)? And finally what relation is there between vol-
ume and oriented volume in general? Now we can answer only the
last question, and only for the case n = 1, 2, 3.
We shall sometimes have to consider volume and oriented volume
in a space En for systems containing fewer than n vectors. This will
mean that in fact we are dealing not with the whole space but with
some of its subspaces from which a given system is taken. According-
ly properties (34.4) will be considered only in relation to vectors
of the same subspace. A need may arise to consider volume and
oriented volume for systems containing more than n vectors. Accord-
ing to formula (35.4) and Property B of (34.4) both functions must
be zero on such systems.
In conclusion note that the use of two different notions associated
with volume will substantially simplify their investigation, since
one concept reflects the geometrical aspect of the problem concerned
and the other reflects its algebraic aspect. We shall soon see that
there is a very close relation between them. We shall also see that
it is important to introduce these notions because they generate
a mathematical tool whose significance is not limited to the volume
problem.

Exercises
1. Prove that in spaces of directed line segments ori-
ented length, area and volume are defined uniquely by conditions (34.4).
2. Will the same concepts be uniquely defined if one of the conditions (34.4)
is excluded?
3. Prove that in any Euclidean space V (x1, x2) = |x1| · |x2| if and only
if vectors x1 and x2 are orthogonal.
4. Prove that in any Euclidean space V (x1, x2) = V (x2, x1).

36. Geometrical and algebraic properties


of a volume
We begin our study of the concept of volume
in a Euclidean space En by exploring its geometrical and algebraic
properties following from its definition.
Property 1. Always V (x1, x2, ..., xn) ≥ 0. The equation
V (x1, x2, ..., xn) = 0 holds if and only if the system of vectors
x1, x2, ..., xn is linearly dependent.
The first part of the statement follows in an obvious way from
(35.4), and therefore only its second part needs to be proved. Let x1,
x2, ..., xn be a linearly dependent system. If x1 = 0, then by defi-
nition so is the volume. If x1 ≠ 0, then some vector x_{k+1} is linearly
expressible in terms of the preceding vectors x1, ..., xk. But then
ort_{Lk} x_{k+1} = 0 and again the volume is zero.
Suppose now that the volume is zero. According to the definition
this means that so is one of the multipliers on the right of (35.4).
For that multiplier let i = k. If k = 0, then x1 = 0. If k ≠ 0,
then the condition ort_{Lk} x_{k+1} = 0 implies that x_{k+1} is in the span
formed by the vectors x1, ..., xk, i.e. that the system x1, ..., x_{k+1}
is linearly dependent. So is the entire system of vectors x1, x2, ...
..., xn in both cases.
Property 2. For any system of vectors x1, x2, ..., xn

V (x1, x2, ..., xn) ≤ ∏_{i=0}^{n-1} |x_{i+1}|               (36.1)

(Hadamard's inequality), with equality holding if and only if the sys-
tem x1, x2, ..., xn is orthogonal or contains a zero vector.
According to the properties of a perpendicular and a projection
clearly
|ort_{Li} x_{i+1}| ≤ |x_{i+1}|,                             (36.2)

the inequality becoming an equation if and only if x_{i+1} ⊥ Li or
equivalently if x_{i+1} is orthogonal to the vectors x1, x2, ..., xi.
Consider the product of the left- and right-hand sides of inequalities
of the form (36.2) for all i. We have

∏_{i=0}^{n-1} |ort_{Li} x_{i+1}| ≤ ∏_{i=0}^{n-1} |x_{i+1}|.

If all vectors of the system x1, x2, ..., xn are nonzero, then this
inequality becomes an equation if and only if the system is orthog-
onal. The case of a zero vector is trivial.
It is possible to deduce several useful properties from Hadamard's
inequality. Let x1, x2, ..., xn be a normed system. Then it is
obvious that
V (x1, x2, ..., xn) ≤ 1.
The following statement is also true. If the system x1, x2, ..., xn
is normed and its volume equals unity, then it is orthonormal.
Since the volume of any normed system is not greater than unity,
this means that of all normed systems the orthonormal system has
a maximum volume.
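A quick numerical check of Hadamard's inequality, assuming numpy and using the fact, established later in this chapter, that the volume of n vectors of En equals the absolute value of the determinant of their coordinate matrix:

```python
import numpy as np

# Hadamard's inequality (36.1): V(x1,...,xn) <= |x1| ... |xn|.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))            # rows are the vectors x1,...,x4
vol = abs(np.linalg.det(X))                # volume of the system of rows
bound = np.prod(np.linalg.norm(X, axis=1)) # product of the lengths
assert vol <= bound + 1e-12

# Equality holds for an orthogonal system, e.g. scaled basis vectors.
Y = np.diag([2.0, 3.0, 0.5, 1.0])
assert np.isclose(abs(np.linalg.det(Y)), np.prod(np.linalg.norm(Y, axis=1)))
```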
Property 3. For any two orthogonal sets of vectors x1, x2, ...,
xp and y1, y2, ..., yr
V (x1, x2, ..., xp, y1, y2, ..., yr)
= V (x1, x2, ..., xp) V (y1, y2, ..., yr).
Denote by Li the span formed by the first i vectors of the joined
system x1, ..., xp, y1, ..., yr and by Ki the span formed by the
vectors y1, ..., yi. Under the hypothesis each of the vectors in
the system y1, ..., yr is orthogonal to all vectors of the system
x1, ..., xp. Therefore
L_{p+i} = Lp ⊕ Ki
for every i from 0 to r. Now, taking into account (30.10), we have

V (x1, ..., xp, y1, ..., yr) = (∏_{i=0}^{p-1} |ort_{Li} x_{i+1}|)(∏_{i=0}^{r-1} |ort_{L_{p+i}} y_{i+1}|)

= (∏_{i=0}^{p-1} |ort_{Li} x_{i+1}|)(∏_{i=0}^{r-1} |ort_{Ki} y_{i+1}|)

= V (x1, ..., xp) V (y1, ..., yr).
Before proceeding to further studies we make a remark. The
volume of a system is only expressible in terms of the perpendiculars
dropped to spans formed by the preceding vectors. Taking into
account the properties of perpendiculars, it may therefore be con-
cluded that the volume of the system will remain unaffected if to
any vector any linear combination of the preceding vectors is added.
In particular, the volume will remain unaffected if any vector is
replaced by the perpendicular from that vector to any span formed
by the preceding vectors.
Property 4. The volume of a system of vectors remains unaffected
by any rearrangement of vectors in the system.
Consider first the case where in a system of vectors x1, ..., xn
two adjacent vectors x_{p+1} and x_{p+2} are interchanged. According
to the above remark the volume will remain unaffected if x_{p+1}
and x_{p+2} are replaced by the vectors ort_{Lp} x_{p+1} and ort_{Lp} x_{p+2},
and x_{p+3}, ..., xn are replaced by ort_{L_{p+2}} x_{p+3}, ..., ort_{L_{p+2}} xn.
But now the three sets of vectors
x1, ..., xp,
ort_{Lp} x_{p+1}, ort_{Lp} x_{p+2},
ort_{L_{p+2}} x_{p+3}, ..., ort_{L_{p+2}} xn
are mutually orthogonal, and in view of Property 3 we have
V (x1, ..., x_{p+1}, x_{p+2}, ..., xn) = V (x1, ..., xp)
× V (ort_{Lp} x_{p+1}, ort_{Lp} x_{p+2}) · V (ort_{L_{p+2}} x_{p+3}, ..., ort_{L_{p+2}} xn).
It is clear that the spans of the vectors x1, ..., xp, x_{p+1}, x_{p+2}
and x1, ..., xp, x_{p+2}, x_{p+1} coincide; consequently
V (x1, ..., x_{p+2}, x_{p+1}, ..., xn) = V (x1, ..., xp)
× V (ort_{Lp} x_{p+2}, ort_{Lp} x_{p+1}) · V (ort_{L_{p+2}} x_{p+3}, ..., ort_{L_{p+2}} xn).
By virtue of Euclidean isomorphism the volume of a system of
two vectors possesses the same properties as the area of a parallelo-
gram does. In particular, it is independent of the order of the vectors
of the system. Comparing the right-hand sides of the last two equa-
tions we now conclude that
V (x1, ..., x_{p+1}, x_{p+2}, ..., xn)
= V (x1, ..., x_{p+2}, x_{p+1}, ..., xn).

A little later we shall prove that any permutation x_{i1}, x_{i2}, ...
..., x_{in} of vectors of a system x1, x2, ..., xn can be obtained by
successive interchanges of adjacent vectors. For an arbitrary per-
mutation therefore Property 4 follows from the above special case.
Property 5. The volume of a system of vectors is an absolutely homo-
geneous function, i.e.
V (x1, ..., αxp, ..., xn) = |α| V (x1, ..., xp, ..., xn)
for any p.
By Property 4 we may assume without loss of generality that
p = n. But then in view of (30.6) we get

V (x1, ..., x_{n-1}, αxn) = (∏_{i=0}^{n-2} |ort_{Li} x_{i+1}|) |ort_{L_{n-1}} (αxn)|

= |α| ∏_{i=0}^{n-1} |ort_{Li} x_{i+1}| = |α| V (x1, ..., x_{n-1}, xn).
Property 6. The volume of a system of vectors remains unaffected
if to some one of the vectors of the system a linear combination of the
remaining vectors is added.
Again by Property 4 we may assume that to the last vector a linear
combination of the preceding vectors is added. But, as already
noted, in this case the volume remains unaffected.
The volume of a system of vectors is a real-valued function.
This function possesses a number of properties, some of which we have
already established. They have confirmed our supposition that the volume
of a system of vectors we have defined possesses in a Euclidean space
all properties inherent in a volume for any n. But the most important
thing is perhaps that the established properties uniquely define
a volume. More precisely, we have
Theorem 36.1. If a real-valued function F (x1, x2, ..., xn) of
n independent vector variables in En possesses the following properties:
(A) it remains unchanged by addition, to any independent
variable, of any linear combination of the remaining
independent variables, (36.3)
(B) it is absolutely homogeneous,
(C) it equals unity for all orthonormal systems,
then it coincides with the volume of the system of vectors.
Proof. If there is at least one zero independent variable among
x1, ..., xn, then by Property B
F (x1, x2, ..., xn) = V (x1, x2, ..., xn) = 0.              (36.4)
Now let x1, x2, ..., xn be an arbitrary system. Subtracting from
each vector xi its projection onto the subspace formed by the vectors
x1, ..., x_{i-1} and taking into account Property A we conclude that
F (x1, x2, ..., xn) = F (ort_{L0} x1, ort_{L1} x2, ..., ort_{L_{n-1}} xn).   (36.5)
If the system x1, x2, ..., xn is linearly dependent, then there is
at least one zero vector among the ort_{L_{i-1}} xi and (36.4) again holds.
Suppose the system x1, x2, ..., xn is linearly independent. Then
all vectors of the system
ort_{L0} x1, ort_{L1} x2, ..., ort_{L_{n-1}} xn
are nonzero. Since in addition this system is orthogonal, there is an
orthonormal system e1, e2, ..., en such that
ort_{L_{i-1}} xi = |ort_{L_{i-1}} xi| ei.
By Property C
F (e1, e2, ..., en) = 1.
It follows from B therefore that

F (x1, x2, ..., xn) = (∏_{i=0}^{n-1} |ort_{Li} x_{i+1}|) F (e1, e2, ..., en)

= V (x1, x2, ..., xn).


The theorem allows us to state that if we construct in some way
a function possessing properties (36.3), then it will be precisely the
volume of a system of vectors.
Exercises
1. Give a geometrical interpretation of Properties 2,
3 and 6 in spaces of directed line segments.
2. Give a geometrical interpretation of equation (36.5) in spaces of directed
line segments.
3. Can a function satisfying conditions (36.3) be zero on any linearly inde-
pendent system of vectors?
4. Suppose relative to an orthonormal basis e1, e2, ..., en a system of vectors
x1, x2, ..., xn possesses the property
(xi, ej) = 0
for i = 2, 3, ..., n and j < i (or for i = 1, 2, ..., n - 1 and j > i). Find the expres-
sion for the volume V (x1, x2, ..., xn) in terms of the coordinates of the vectors
x1, x2, ..., xn in the basis e1, e2, ..., en.
5. What will change in the concept of volume for a complex space?

37. Algebraic properties


of an oriented volume
We now proceed to study the algebraic prop-
erties of an oriented volume, laying aside for the time being the
question of its existence. Our study will be based on Conditions
A, B and C of (34.4).
Property 1. The oriented volume of a system of vectors is zero if any
two of its vectors coincide.
This property is a direct consequence of Condition B. It is not
hard to prove that if A holds, then B and Property 1 are equivalent.
Property 2. The oriented volume of a system of vectors changes sign
if some two vectors are interchanged.
The proof is similar for any two vectors and therefore for the
sake of simplicity we restrict ourselves to the case where the first
and second vectors are interchanged. By Property 1
V± (z1 + z2, z1 + z2, z3, ..., zn) = 0.
But on the other hand according to A
V± (z1 + z2, z1 + z2, z3, ..., zn)
= V± (z1, z1, z3, ..., zn) + V± (z2, z2, z3, ..., zn)
+ V± (z1, z2, z3, ..., zn) + V± (z2, z1, z3, ..., zn).
On the right of this equation the first two terms are zero, from which
it follows that Property 2 is valid. It is again not hard to prove that
if A holds, then B and Property 2 are equivalent.
Property 3. The oriented volume of a system of vectors remains
unaffected by addition, to any vector, of any linear combination of the
remaining vectors.
Again for simplicity, consider only the first vector. According
to A
V± (x1 + Σ_{i=2}^{n} αi xi, x2, ..., xn)
= V± (x1, x2, ..., xn) + Σ_{i=2}^{n} αi V± (xi, x2, ..., xn).
In this equation all terms on the right but the first are zero according
to Property 1.
Property 4. The oriented volume is a homogeneous function, i.e.
V± (x1, ..., αxp, ..., xn) = α V± (x1, ..., xp, ..., xn)
for any p.
This property is a direct consequence of A.
Property 5. The equation V± (x1, x2, ..., xn) = 0 holds if and
only if the system of vectors x1, x2, ..., xn is linearly dependent.
Obviously it is only necessary to prove that V± (x1, x2, ...
..., xn) = 0 implies a linear dependence of the vectors x1, x2, ...
..., xn. Suppose the contrary. Let the oriented volume be zero for
some linearly independent system y1, y2, ..., yn. This system
is a basis of En and therefore for any vector z in En
z = α1y1 + α2y2 + ... + αnyn.
Now replace in the system y1, y2, ..., yn any vector, for example
y1, by the vector z. Using in succession Properties 3 and 4 we find that
V± (z, y2, ..., yn) = V± (α1y1 + Σ_{i=2}^{n} αi yi, y2, ..., yn)
= V± (α1y1, y2, ..., yn) = α1 V± (y1, y2, ..., yn) = 0.
An oriented volume is by definition not zero at least on one linearly
independent system z1, z2, ..., zn. But replacing in turn the vectors
y1, y2, ..., yn by z1, z2, ..., zn we conclude from Theorem 15.2
that on that system the oriented volume is zero. This contradiction
proves the property in point.
Property 6. If two oriented volumes coincide on at least one linearly
independent system of vectors, then they coincide identically.
Suppose it is known that oriented volumes V1± (x1, x2, ..., xn)
and V2± (x1, x2, ..., xn) coincide on a linearly independent system
z1, z2, ..., zn. Consider the difference F (x1, x2, ..., xn) =
= V1± (x1, x2, ..., xn) - V2± (x1, x2, ..., xn). This function satis-
fies Properties 3 and 4 of an oriented volume. Besides it is zero on all
linearly dependent systems and at least on one linearly independent
system z1, z2, ..., zn. Repeating the arguments used in proving
Property 5 we conclude that F (x1, x2, ..., xn) is zero on all linearly
independent systems, i.e. that it is identically zero.
It follows from Property 6 that an oriented volume is uniquely
defined by conditions (34.4) if we fix the orthonormal system on
which it must equal unity.
Property 7. The absolute value of the oriented volume of a system of
vectors coincides with the volume of the same system.
Let an oriented volume equal unity on an orthonormal system
z1, z2, ..., zn. Consider the functions |V± (x1, x2, ..., xn)| and
V (x1, x2, ..., xn). They both satisfy A and B of (36.3) and coin-
cide on the linearly independent system z1, z2, ..., zn. The function
Φ (x1, x2, ..., xn) = | |V± (x1, x2, ..., xn)|
- V (x1, x2, ..., xn) |
also satisfies A and B of (36.3) and is zero on all linearly dependent
systems and at least on one linearly independent system z1, z2, ...
..., zn. Repeating again the arguments used in proving Property 5
we conclude that Φ (x1, x2, ..., xn) is identically zero.
The last property is very important, since it allows us to state
that the absolute value of an oriented volume must have all the
properties a volume has. In particular, it must equal unity on all
orthonormal systems, and not only on one. Hadamard's inequality
holds for it, and so on. This property supplies a final answer to all
questions posed by us concerning volume and oriented volume. The
only thing we lack is the proof of the existence of an oriented volume.

Exercises
1. Prove that if A of (34.4) holds, then B is equivalent
to both Property 1 and Property 2.
2. Prove that whatever the real number α may be there is a system of vectors
such that its oriented volume equals α.
3. Suppose C of (34.4) is replaced by the condition of equality to any fixed
number on any fixed linearly independent system. How is an oriented volume
affected?
4. Were the presence of a scalar product in a vector space and the reality of
an oriented volume used in deriving the properties of an oriented volume? What
will change if we consider an oriented volume in a complex space?

38. Permutations
Consider a system x1, x2, ..., xn and a sys-
tem x_{i1}, x_{i2}, ..., x_{in} obtained from the first using several permu-
tations of vectors. Suppose these systems may be transformed into
each other by successive interchanges of pairs of elements only.
Then the volumes of the systems will be the same and their oriented
volumes will be either the same or differ in sign depending on the
number of transpositions required.
In the questions of permutations we are going to discuss, individual
properties of vectors will play no part; what will be important is
their order. Therefore instead of vectors we shall consider their
indices 1, 2, ..., n. A collection of numbers
j1, j2, ..., jn
among which there are no equal numbers and each of which is one
of the numbers 1, 2, ..., n is called a permutation of those num-
bers. The permutation 1, 2, ..., n is called normal.
It is easy to show that a set of n numbers has n! possible permu-
tations in all. Indeed, for n = 1 this is obvious. Let the statement be
true for any set of n - 1 numbers. All permutations of n numbers
can be grouped into n classes by placing in the same class only per-
mutations that have the same number in the first place. The number
of permutations in each class coincides with the number of permu-
tations of n - 1 numbers, i.e. is equal to (n - 1)!. Consequently,
the number of all permutations of n numbers is n!.
It is said that in a given permutation numbers i and j form an
inversion if i > j but i precedes j in the permutation. A permutation
is said to be even if its numbers constitute an even number of inver-
sions and odd otherwise. If in some permutation we interchange any
two numbers, not necessarily adjacent, leaving all the others in
their places, we obtain a new permutation. This transformation of
a permutation is called a transposition.
We prove that any transposition changes the parity of the per-
mutation. For adjacent numbers this statement is obvious. Their
relative positions with respect to other numbers remain the same
and permutation of the adjacent numbers changes the total number
of inversions by unity.
Suppose now that between the numbers i and j to be interchanged
there are s other numbers k1, k2, ..., ks, i.e. the permutation is
of the form
..., i, k1, k2, ..., ks, j, ....
We shall interchange the number i successively with the adjacent
numbers k1, k2, ..., ks, j. Then the number j now preceding i is
transferred to the left by s transpositions with the numbers ks, k_{s-1}, ...
..., k1. We thus carry out 2s + 1 transpositions of adjacent num-
bers in all. Consequently, the permutation will change its parity.
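The parity of a permutation can be computed by counting inversions directly; a short Python helper (ours) returning +1 for an even permutation and -1 for an odd one:

```python
def sign(perm):
    # Parity via the number of inversions: pairs of positions (i, j)
    # with i < j whose entries stand in the wrong order.
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])
    return 1 if inversions % 2 == 0 else -1

print(sign((1, 2, 3)))   # +1, the normal permutation is even
print(sign((2, 1, 3)))   # -1, one transposition changes the parity
```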
Theorem 38.1. All n! permutations of n numbers can be arranged in
such an order that each subsequent permutation is obtained from the
preceding one by a single transposition, beginning with any permutation.
Proof. This is true for n = 2. If it is required to begin with the
permutation 1, 2, then the desired arrangement is 1, 2; 2, 1. If,
however, we are to begin with the permutation 2, 1, then the desired
arrangement is 2, 1; 1, 2.
Suppose the theorem has already been proved for any permutations
containing no more than n - 1 numbers. Consider permutations
of n numbers. Suppose we are to begin with a permutation i1, i2, ...
..., in. We shall arrange permutations according to the following
principle. We begin with the permutations with i1 in the first place.
According to the assumption all these permutations can be ordered
in accordance with the requirements of the theorem, since in fact
it is necessary to arrange in the required order all permutations
of n - 1 numbers.
In the last permutation obtained in this way we make one trans-
position, transferring to the first place the number i2. We then put
in order, as in the preceding case, all permutations with a given
number in the first place and so on. In this way it is possible to look
over all permutations of n numbers.
With such a system of arranging permutations of n numbers
adjacent permutations will have opposite parities. Considering
that n! is even for n ≥ 2 we can conclude that in this case the number
of even permutations of n numbers equals that of odd permutations
and is n!/2.

Exercises

1. What is the parity of the permutation 5, 2, 3, 1, 4?

2. Prove that no even (odd) permutation can be transformed into a normal
one in an odd (even) number of transpositions.
3. Consider a pair of permutations i1, i2, ..., in and 1, 2, ..., n. We trans-
form the first permutation into the normal form employing transpositions,
making for each of them one transposition of any elements in the second permuta-
tion. Prove that after the process is over the second permutation will have the
same parity as the permutation i1, i2, ..., in.

39. The existence


of an oriented volume

We now discuss the existence of an oriented
volume of a system of vectors. Choose in a space En an orthonormal
system z1, z2, ..., zn on which an oriented volume must equal
unity by Condition C of (34.4). Take an arbitrary system x1, x2, ...
..., xn of vectors in En. Since the system z1, z2, ..., zn is a basis
in En, for every vector xi there is an expansion
xi = Σ_{j=1}^{n} a_{ij} zj                                  (39.1)
with respect to that basis, where the a_{ij} are some numbers.
If an oriented volume exists, then according to A of (34.4) we can
transform it successively, taking into account expansions (39.1).
Namely,

V± (x1, x2, ..., xn) = V± (Σ_{j1=1}^{n} a_{1j1} z_{j1}, Σ_{j2=1}^{n} a_{2j2} z_{j2}, ..., Σ_{jn=1}^{n} a_{njn} z_{jn})

= Σ_{j1=1}^{n} a_{1j1} V± (z_{j1}, Σ_{j2=1}^{n} a_{2j2} z_{j2}, ..., Σ_{jn=1}^{n} a_{njn} z_{jn})

= Σ_{j1=1}^{n} Σ_{j2=1}^{n} a_{1j1} a_{2j2} V± (z_{j1}, z_{j2}, ..., Σ_{jn=1}^{n} a_{njn} z_{jn})

= ... = Σ_{j1=1}^{n} Σ_{j2=1}^{n} ... Σ_{jn=1}^{n} a_{1j1} a_{2j2} ... a_{njn} V± (z_{j1}, z_{j2}, ..., z_{jn}).
                                                            (39.2)
In the last n-fold sum most of the terms are zero, since by Prop-
erty 1 the oriented volume of a system of vectors is zero if any two
vectors of the system coincide. Therefore only those of the systems
z_{j1}, z_{j2}, ..., z_{jn} should be considered for which the set of indices
j1, j2, ..., jn is a permutation of the n numbers 1, 2, ..., n. But
in this case
V± (z_{j1}, z_{j2}, ..., z_{jn}) = ±1

depending on the evenness or oddness of the permutation of indices.
Thus if an oriented volume exists, then it must be expressible
in terms of the coordinates of vectors x1, x2, ..., xn in the basis
z1, z2, ..., zn by the following formula:

V± (x1, x2, ..., xn) = Σ ± a_{1j1} a_{2j2} ... a_{njn}.     (39.3)

Here summation is taken over all permutations of indices j1, j2, ...
..., jn of the numbers 1, 2, ..., n and a plus or a minus sign is taken
according as the permutation is even or odd.
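Formula (39.3) can be evaluated literally by summing signed products over all n! permutations; a Python sketch (function names are ours) under the assumption that the coordinates a_{ij} are given as a list of rows:

```python
from itertools import permutations

def sign(perm):
    # +1 for an even permutation, -1 for an odd one (counting inversions).
    inv = sum(1 for i in range(len(perm))
                for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def oriented_volume(a):
    # Formula (39.3): sum over all permutations j1,...,jn of
    # sign * a[0][j1] * a[1][j2] * ... * a[n-1][jn].
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

print(oriented_volume([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 1 on the basis itself
print(oriented_volume([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))  # -1 after one transposition
```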
We prove that the function given by the right-hand side of (39.3)
satisfies all conditions defining an oriented volume. Let a vector xp
be a linear combination of vectors x'p and x"p, i.e.
xp = αx'p + βx"p
for some numbers α and β. Denote by a'_{pj} and a"_{pj} respectively the
coordinates of x'p and x"p in the basis z1, z2, ..., zn. Then it is obvious
that
a_{pj} = αa'_{pj} + βa"_{pj}
for every j in the range from 1 to n. We further find
Σ ± a_{1j1} ... a_{pjp} ... a_{njn}
= Σ ± a_{1j1} ... (αa'_{pjp} + βa"_{pjp}) ... a_{njn}
= α Σ ± a_{1j1} ... a'_{pjp} ... a_{njn} + β Σ ± a_{1j1} ... a"_{pjp} ... a_{njn},
and thus A of (34.4) holds.
Suppose we permute some two vectors of a system x1, x2, ..., xn.
In this case function (39.3) changes sign since the parity of each
permutation is changed. As already noted, if the property of linearity
in each independent variable holds, the property just proved is
equivalent to Condition B of (34.4).
And finally consider the value of the constructed function on
the system of vectors z1, z2, ..., zn. For that system the coordinates a_{ij}
have the following form:
a_{ij} = { 0 if i ≠ j,
          1 if i = j.
Consequently, of the terms of (39.3) only the one term a_{11}a_{22} ... a_{nn} is non-
zero. The permutation 1, 2, ..., n is an even one and the elements
a_{11}, a_{22}, ..., a_{nn} equal unity, so the value of the function on the
orthonormal system z1, z2, ..., zn is 1.
Thus all Conditions (34.4) hold and function (39.3) is an expres-
sion of the oriented volume of a system of vectors in terms of their
coordinates. That expression is unique by virtue of the uniqueness
of an oriented volume.

Exercises
1. Was the orthonormality of the system z1, z2, ..., zn
actually used in deriving formula (39.3)?
2. What changes will occur in formula (39.3) if in Condition C of (34.4) the
oriented volume is not assumed to equal unity?
3. To what extent was Condition B of (34.4) actually used in deriving (39.3)?
4. Will the form of (39.3) be affected if we consider an oriented volume in
a complex space?

40. Determinants
Let vectors x1, x2, ..., xn of a Euclidean
space Rn be given by their coordinates
xi = (a_{i1}, a_{i2}, ..., a_{in})
in basis (21.7). Arrange the numbers a_{ij} as an array A:

      ( a_{11} a_{12} ... a_{1n} )
A =   ( a_{21} a_{22} ... a_{2n} )
      ( ......................  )
      ( a_{n1} a_{n2} ... a_{nn} )

This array is called a square n × n matrix and the numbers a_{ij}
are matrix elements. If the matrix rows are numbered in succession
from top to bottom and the columns are numbered from left to right,
then the first index of an element stands for the number of the row
the element is in and the second index is the number of the column.
The elements a_{11}, a_{22}, ..., a_{nn} are said to form the principal diag-
onal of the matrix A.
Any n² numbers can be arranged as a square n × n matrix. If
the row elements of a matrix are assumed to be the coordinates of
a vector of Rn in basis (21.7), then a 1-1 correspondence is established
between all square n × n matrices and ordered systems of n vectors
of the space Rn.
In Rn, as in any other space, there is an oriented volume. It will
be unique if we require that Condition C of (34.4) should hold on
the system of vectors (21.7). Taking into account the above 1-1 cor-
respondence we conclude that a well-defined function is generated
on the set of all square matrices. Considering (39.3) we arrive at the
following definition of that function.
An nth-order determinant corresponding to a matrix A is an algeb-
raic sum of n! terms which is made up as follows. The terms of the
determinant are all possible products of n matrix elements taken
an element from each row and each column. The term is taken with
a plus sign if the indices of the columns of its elements form an even
permutation, provided the elements are arranged in increasing
order of the row indices, and a minus sign otherwise.
To designate a determinant we shall use the following symbol:
      ( a_{11} a_{12} ... a_{1n} )
det   ( a_{21} a_{22} ... a_{2n} )                          (40.1)
      ( ......................  )
      ( a_{n1} a_{n2} ... a_{nn} )

if it i!> necessary to give matrix elements in explicit form. If, however,


this is not necessary, we shall use a simpler symbol,
det A,
restricting ourselves to the notation of a matrix A. The elements
of the matrix of a determinant will also be called the elements of the
determinant.
The determinant coincides with the oriented volume of the system
of matrix rows. In investigating it therefore it is possible to use
all known facts pertaining to volumes and oriented volumes. In
particular, the determinant is zero if and only if the matrix rows
are linearly dependent, the determinant changes sign when two rows
are interchanged, and so on.
Now our studies will concern those of its properties which are difficult
to prove without using an explicit expression of the determinant in
terms of matrix elements.
The transpose of a matrix is a transformation such that its rows
become columns and its columns become rows with the same indices.
The transpose of a matrix A is denoted by A'. Accordingly the deter-
minant
      ( a_{11} a_{21} ... a_{n1} )
det   ( a_{12} a_{22} ... a_{n2} )
      ( ......................  )
      ( a_{1n} a_{2n} ... a_{nn} )
or det A' is said to be obtained by transposing determinant (40.1).
As to transposition the determinant possesses the following impor-
tant property:
The determinant of any matrix remains unaffected when transposed.
Indeed, the determinant of a matrix A consists of terms of the
following form:
± a_{1j1} a_{2j2} ... a_{njn},                              (40.2)
whose sign depends on the parity of the permutation j1, j2, ..., jn.
In the transpose A' all multipliers of product (40.2) remain in
different rows and different columns, i.e. their product is a term of
the transposed determinant. We denote the elements of A' by a'_{ij}.
It is clear that a'_{ij} = a_{ji} and therefore
a_{1j1} a_{2j2} ... a_{njn} = a'_{j1 1} a'_{j2 2} ... a'_{jn n}.    (40.3)
We put the elements of the right-hand side of (40.3) in increasing
order of row indices. Then the permutation of column indices will
have the same parity as that of the permutation j1, j2, ..., jn.
But this means that the sign of term (40.2) in the transposed deter-
minant is the same as that in the original determinant. Consequently,
both determinants consist of the same terms with the same signs,
i.e. they coincide.
It follows from the above property that the rows and columns of
a determinant are equivalent. Therefore all the properties proved
earlier for the rows will hold for the columns.
Consider a determinant d of order n. Choose in its matrix k arbi-
trary rows and k arbitrary columns. The elements at the intersection
of the chosen rows and columns form a k X k matrix. The determi-
nant of the matrix is called the kth-order minor of the determinant d.
The minor in the first k columns and the first k rows is called the
principal minor.
Suppose now that in an nth-order determinant d a minor M of order
k is taken. If we eliminate the rows and columns at whose intersec-
tion the minor M lies, we are left with a minor N of order n - k.
This is called the complementary minor of M. If on the contrary we
eliminate the rows and columns occupied by the elements of the
minor N, then the minor M is obviously left. It is thus possible to
speak of a pair of mutually complementary minors.
If a minor M of order k is in the rows with indices i1, i2, ..., ik
and the columns with indices j1, j2, ..., jk, then the number

(-1)^{(i1 + ... + ik) + (j1 + ... + jk)} N                  (40.4)

will be called the algebraic adjunct or cofactor of M.
Theorem 40.1 (Laplace). Suppose in a determinant d of order n, k
arbitrary rows (columns) are chosen, with 1 ≤ k ≤ n - 1. Then the
sum of the products of all kth-order minors contained in the chosen rows
(columns) by their algebraic adjuncts is equal to the determinant d.
Proof. Assume the columns of the matrix of d to be vectors x1,
x2, ..., xn of a space Rn. The sum of the products of all kth-order
minors contained in the chosen rows by their algebraic adjuncts may
be considered as some function F (x1, x2, ..., xn) of the vectors
x1, x2, ..., xn.
This function is obviously linear in each independent variable,
since this property is true of both minors and algebraic adjuncts.
It equals unity on the orthonormal system (21.7), which is easy
to show by direct check. If we prove that F (x1, x2, ..., xn) changes
sign when any two vectors are interchanged, then this will establish
that it coincides with the oriented volume of the vector system
x1, x2, ..., xn. But the oriented volume coincides with the deter-
minant of a matrix in which the coordinates of vectors are contained
in the rows. Since the determinant of a matrix coincides with the
determinant of the transpose of the matrix, the proof of the theorem
is complete.
Obviously it suffices to consider only the case where two adjacent
vectors are interchanged, for permutation of any two vectors always
reduces to an odd number of permutations of adjacent vectors. The
proof of this fact was given in Section 38.
Suppose vectors xi and x_{i+1} are interchanged. We establish a 1-1
correspondence between the minors in the chosen rows of the original
determinant and of the determinant with interchanged columns.
We denote by ω the set of column indices defining a minor. The follow-
ing cases are possible:
(1) i, i + 1 ∈ ω,
(2) i, i + 1 ∉ ω,
(3) i ∈ ω, i + 1 ∉ ω,
(4) i + 1 ∈ ω, i ∉ ω.
In cases (1) and (2) each minor is assigned a minor on the columns
with the same set ω and in cases (3) and (4) a minor on the columns
with the set of indices obtained from ω by replacing i by i + 1 and
i + 1 by i respectively.
Note that in all cases the corresponding minors are defined by the
same set of elements. Moreover, in cases (2) to (4) they coincide and
in case (1) they only differ in sign, since each of them is obtained
from the other by interchanging two columns. For similar reasons,
the corresponding complementary minors differ in sign in case (2)
and coincide in the remaining cases. The algebraic adjuncts and
complementary minors coincide up to a sign which depends on the
parity of the sum of the indices of the rows and columns containing
the minor. They are the same in cases (1) and (2) and differ by unity
in cases (3) and (4).
Comparing now the corresponding terms of F (x1, x2, ..., xn)
and those of the function resulting from permuting xi and x_{i+1}
we notice that they coincide up to a sign. Consequently, if two adja-
cent vectors are interchanged F (x1, x2, ..., xn) changes sign.
The theorem is often used where only one row or one column is
chosen. The determinant of a 1 × 1 matrix coincides with its only
element. Therefore the minor at the intersection of the ith row and
jth column is equal to the element a_{ij}. Denote by A_{ij} the algebraic
adjunct of a_{ij}. By Laplace's theorem, for every i
d = a_{i1}A_{i1} + a_{i2}A_{i2} + ... + a_{in}A_{in}.       (40.5)
This formula is called the expansion of the determinant with respect
to the ith row. Similarly for every j
d = a_{1j}A_{1j} + a_{2j}A_{2j} + ... + a_{nj}A_{nj},       (40.6)
which gives the expansion of the determinant with respect to the jth
column.
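Expansion (40.5) translates into a simple recursive procedure; the Python sketch below (function names ours) computes a determinant by repeatedly expanding with respect to the first row. It is meant only as an illustration of the formula, not as an efficient method (see Section 42).

```python
def minor(a, i, j):
    # Matrix obtained by deleting row i and column j.
    return [row[:j] + row[j+1:] for k, row in enumerate(a) if k != i]

def det(a):
    # Expansion (40.5) with respect to the first row:
    # d = a_{11}A_{11} + a_{12}A_{12} + ... + a_{1n}A_{1n},
    # where A_{1j} is (-1)^{1+j} times the complementary minor.
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1)**j * a[0][j] * det(minor(a, 0, j)) for j in range(n))

print(det([[1, 2], [3, 4]]))                    # 1*4 - 2*3 = -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))   # 24
```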
We replace in (40.5) the elements of the ith row by a collection of
n arbitrary numbers b1, b2, ..., bn. The expression
b1A_{i1} + b2A_{i2} + ... + bnA_{in}
is the expansion with respect to the ith row of the determinant

      ( a_{11} a_{12} ... a_{1n} )
      ( ......................  )
det   (  b1    b2   ...   bn    )                           (40.7)
      ( ......................  )
      ( a_{n1} a_{n2} ... a_{nn} )

obtained from the determinant d by replacing the ith row with the
row of the numbers b1, b2, ..., bn. We now take as these numbers
the elements of the kth row of d, with k ≠ i. The corresponding de-
terminant (40.7) is zero since it has two equal rows. Consequently,
a_{k1}A_{i1} + a_{k2}A_{i2} + ... + a_{kn}A_{in} = 0,   k ≠ i.       (40.8)
Similarly
a_{1k}A_{1j} + a_{2k}A_{2j} + ... + a_{nk}A_{nj} = 0,   k ≠ j.       (40.9)
So the sum of the products of all elements of any row (column) of
a determinant by the algebraic adjuncts of the corresponding ele-
ments of another row (column) of the same determinant is zero.
In conclusion note that the entire theory of determinants can be
extended without change to the case of complex matrices. The only
thing lost is the visualization associated with the concept of volume.

Exercises
1. Write expressions for determinants of the second
and third orders in terms of matrix elements. Compare them with expression
(34.6).
2. Write Hadamard's inequality for the determinants of the matrices A and A'.
3. A determinant of the nth order all elements of which are equal to unity
in absolute value equals n^{n/2}. Prove that its rows (columns) form an orthog-
onal basis.
4. Find the determinant whose elements satisfy the conditions a_{ij} = 0 for
i > j (i < j, i ≥ j, i ≤ j).
5. The elements of a determinant satisfy the conditions a_{ij} = 0 for i > k
and j ≤ k. Prove that the determinant is the product of the principal minor of
order k and its complementary minor.
6. Let the elements of a complex matrix A satisfy the conditions a_{ij} = ā_{ji}
for all i and j. Prove that the determinant of such a matrix is a real number.

41. Linear dependence


and determinants
One of the most common applications of de-
terminants is in problems connected with linear dependence. Given
m vectors x1, x2, ..., xm in a space Kn of dimension n, determine
their basis. Choose some basis in Kn and consider the rectangular
array

      ( a_{11} a_{12} ... a_{1n} )
      ( a_{21} a_{22} ... a_{2n} )                          (41.1)
      ( ......................  )
      ( a_{m1} a_{m2} ... a_{mn} )

where the rows represent the coordinates of the given vectors in the
chosen basis.
Such an array is called a rectangular matrix. As before the first
index of an element a_{ij} stands for the number of the matrix row
the element is in, and the second index is the number of the column.
If we want to stress what number of rows and columns a matrix A
has, we shall write A (m X n) or say that the matrix A is an m
by n matrix. A matrix A (n × n) will as before be called a square n
by n matrix. Along with the matrix A we shall consider its transpose
A'. If A is m by n, then A' is n by m.
In a rectangular matrix A (m × n) it is again possible to indicate
different minors whose order of course does not exceed the smaller
of the numbers m and n. If A has at least one nonzero element, then the

different minors whose order of course does not exceed the smaller
of the numbers m and n. If A has not only zero elements, then the
highest order r of nonzero minors is said to be the rank of A. Any
nonzero minor of order r is called a basis minor, and its rows and
columns are basis rows and columns. It is clear that there may be more
basis minors than one. The rank of a zero matrix is zero by definition.
We shall regard the rows of A as vectors. It is obvious that if we
find a basis of these row vectors, then the corresponding vectors of
Kn will form a basis of vectors x 1 , x 2 , • • • , Xm·
Theorem 41.1. Any basis rows of a matrix form a basis of row vectors
of that matrix.
Proof. To see that the theorem is true it is necessary to show that
basis rows are linearly independent and that any matrix row is
linearly expressible in terms of them.
If basis rows were linearly dependent, then one of them would be
linearly expressible in terms of the remaining basis rows. But then
the basis minor would be equal to zero, which contradicts the hy-
pothesis.
Now add to the basis rows any other row of A. Then by the defini-
tion of a basis minor all minors of order r + 1 in those rows will be
zero. Suppose the rows are linearly independent. By supplementing
them to a basis we construct some square matrix whose determinant
must not be zero. But, on the other hand, expanding that determi-
nant with respect to the original r + 1 rows we conclude that it is
zero. The resulting contradiction means that any row in A is linearly
expressible in terms of the basis rows.
The theorem reduces the problem of finding a basis of a system of
vectors to that of finding a basis minor of a matrix. Since the deter-
minant of the transpose of a matrix coincides with that of the orig-
inal matrix, it is clear that Theorem 41.1 is true not only for the
rows but also for the columns. This means that for any rectangular
matrix the rank of its system of row vectors equals the rank of its
system of column vectors, a fact that is not obvious if we have in mind
only the concept of the rank of a system of vectors.
In a space with scalar product the linear dependence or indepen-
dence of a system of vectors x1, x2, ..., xm can be established with-
out expanding with respect to a basis. Consider the determinant

                          ( (x1, x1) (x1, x2) ... (x1, xm) )
G (x1, x2, ..., xm) = det ( (x2, x1) (x2, x2) ... (x2, xm) )
                          ( .............................. )
                          ( (xm, x1) (xm, x2) ... (xm, xm) )
called the Gram determinant or Gramian of the system of vectors
x1, x2, ..., xm.
Theorem 41.2. A system of vectors is linearly dependent if and only
if its Gramian is zero.
Proof. Let x1, x2, ..., xm be a linearly dependent system of
vectors. Then there are numbers α1, α2, ..., αm, not all zero,
such that
α1x1 + α2x2 + ... + αmxm = 0.
Performing scalar multiplication of this equation by xi for every i
we conclude that the columns of the Gramian are also linearly de-
pendent, i.e. the Gramian is zero.
Suppose now that the Gramian is zero. Then its columns are lin-
early dependent, i.e. there are numbers α1, α2, ..., αm, not all
zero, such that
α1 (xi, x1) + α2 (xi, x2) + ... + αm (xi, xm) = 0
for every i. We rewrite these equations as follows:
(xi, α1x1 + α2x2 + ... + αmxm) = 0.
Multiplying them termwise by α1, α2, ..., αm and adding we get
|α1x1 + α2x2 + ... + αmxm|² = 0.
This means that
α1x1 + α2x2 + ... + αmxm = 0,
i.e. that the vectors x1, x2, ..., xm are linearly dependent.
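Theorem 41.2 gives a practical dependence test that needs only scalar products; a small numpy sketch (function names and tolerance are ours), bearing in mind the caution about approximate computations voiced in Section 42:

```python
import numpy as np

def gramian(vectors):
    # Matrix of pairwise scalar products (x_i, x_j).
    X = np.asarray(vectors, dtype=float)
    return X @ X.T

def is_dependent(vectors, tol=1e-10):
    # Theorem 41.2: the system is linearly dependent iff its Gramian is zero.
    return abs(np.linalg.det(gramian(vectors))) < tol

print(is_dependent([(1, 0, 0), (0, 1, 0), (1, 1, 0)]))   # True: x3 = x1 + x2
print(is_dependent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))   # False
```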

Exercises
1. What is a matrix all of whose minors are zero?
2. For a square matrix, are its basis rows and basis columns equivalent
systems of vectors?
3. Do the elementary transformations of its rows and columns, discussed in
Section 15, affect the rank of a matrix?
4. Prove the inequality
0 ≤ G (x1, x2, ..., xn) ≤ ∏_{i=1}^{n} (xi, xi).
In what cases does the inequality become an equation?
5. It is obvious that
det ( √2  1  ) = 0.
    (  2  √2 )
Prove that for any approximation of the number √2 by a finite decimal frac-
tion p
det ( p  1 ) ≠ 0.
    ( 2  p )
42. Calculation of determinants


A straightforward calculation of a determinant
using its explicit expression in terms of matrix elements is rarely
employed in practice because of its being laborious. A determinant
of the nth order consists of n! terms and for each term to be calcu-
lated and added to the other terms it is necessary to carry out n·n!
arithmetical operations. Even carrying out all these computations on
a modern computer performing 10^6 arithmetical operations per second
to compute the determinant of, for example, the hundredth order
would require many million years.
One of the most efficient ways of calculating determinants is based
on the following idea. Let a_{kp} be a nonzero element in a matrix A.
We call it the leading element. Adding to any ith row, i ≠ k, the kth
row multiplied by an arbitrary number αi is known to leave the de-
terminant unaffected. Take
αi = - a_{ip} / a_{kp}
and carry out the indicated procedure for every i ≠ k. All the ele-
ments of the pth column of the new matrix, except the leading one,
will then be zero. Expanding the new determinant with respect to
the pth column we reduce calculating the determinant of the nth
order to calculating a single determinant of order (n - 1). We pro-
ceed in a similar way with this determinant and so on.
This algorithm is called the Gauss method. To calculate the deter-
minant of the nth order by this method would require carrying out a
total of 2n³/3 arithmetical operations. Now the determinant of the
hundredth order could be computed, on a computer performing
10^6 arithmetical operations per second, in less than a second.
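A minimal Python sketch of the Gauss method as described above (ours; it also interchanges rows when a leading element must be sought lower in the column, each interchange changing the sign of the determinant):

```python
def det_gauss(a):
    # Determinant by the Gauss method: reduce to triangular form by adding
    # multiples of the leading row; each row interchange flips the sign.
    a = [row[:] for row in a]          # work on a copy
    n = len(a)
    sign = 1
    for p in range(n):
        # choose a nonzero leading element in column p
        k = next((i for i in range(p, n) if a[i][p] != 0), None)
        if k is None:
            return 0                   # the rows are linearly dependent
        if k != p:
            a[p], a[k] = a[k], a[p]
            sign = -sign
        for i in range(p + 1, n):
            m = a[i][p] / a[p][p]
            for j in range(p, n):
                a[i][j] -= m * a[p][j]
    result = sign
    for p in range(n):
        result *= a[p][p]
    return result

print(det_gauss([[2.0, 1.0], [4.0, 3.0]]))   # 2.0
```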
In conclusion note that with arithmetical operations approxi-
mately performed and information approximately given, the results
of computing determinants should be regarded with some caution. If
conclusions about linear dependence or independence of a system
of vectors are made only on the strength of the determinant being
zero or nonzero, then in the presence of instability pointed out in
Section 22 they may turn out to be false. This should be borne in
mind whenever a determinant is used.
Exercises
1. What accounts for a faster rate of calculation of a deter-
minant by the Gauss method as compared with straightforward calculation?
2. Let all elements of a determinant equal at most unity in absolute value
and suppose that in calculating each element we make an error of the order of ε.
For what n does a straightforward calculation of the determinant make sense in
terms of accuracy?
3. Construct an algorithm for calculating the rank of a rectangular matrix,
using the Gauss method. What is the result of applying this algorithm to ap-
proximate calculations?
CHAPTER 5

The Straight Line


and the Plane
in Vector Space

43. The equations of a straight line


and of a plane
The main objects of our immediate studies are
the straight line and the plane in spaces of directed line segments.
Given some coordinate system, the coordinates of the points on the
straight line or in the plane can no longer be arbitrary and must
satisfy certain relations. Now we proceed to derive those relations.
Given in the plane a fixed Cartesian x, y system and a straight
line L, consider a nonzero vector
n = (A, B)                                                  (43.1)
perpendicular to L. It is obvious that all other vectors perpendicular
to L will be collinear with n.
Take a point M0 (x0, y0) on L. All points M (x, y) of L and only
those points possess the property that the vectors M0M and n are
perpendicular, i.e.
(M0M, n) = 0.                                               (43.2)
Since
M0M = (x - x0, y - y0),
from (43.1) and (43.2) it follows that
A (x - x0) + B (y - y0) = 0.
Letting
-Ax0 - By0 = C,
we conclude that in the given x, y system the coordinates of the
points of L and only those coordinates satisfy
Ax + By + C = 0.                                            (43.3)
Among the numbers A and B there is a nonzero one. Therefore
equation (43.3) will be called a first-degree equation in variables x
and y.
We now prove that any first-degree equation (43.3) defines relative
to a fixed x, y coordinate system some straight line. Since (43.3) is
a first-degree equation, of the constants A and B at least one is not
zero. Hence (43.3) has at least one solution x0, y0, for example,
x0 = -AC/(A² + B²),   y0 = -BC/(A² + B²),
with
Ax0 + By0 + C = 0.
Subtracting from (43.3) this identity yields
A (x - x0) + B (y - y0) = 0
equivalent to (43.3). But it means that any point M (x, y) whose
coordinates satisfy the given equation (43.3) is on the straight line
passing through M0 (x0, y0) and perpendicular to vector (43.1).
So, given a fixed coordinate system in the plane, any first-degree
equation defines a straight line and the coordinates of the points of
any straight line satisfy a first-degree equation. Equation (43.3) is
called the general equation of a straight line in the plane and the
vector n in (43.1) is the normal vector to that straight line.
Without any fundamental changes we can carry out a study of the
plane in space. Fix a Cartesian x, y, z system and consider a plane
π. Again take a nonzero vector
n = (A, B, C)                                               (43.4)
perpendicular to π. Repeating the above reasoning we conclude that
all points M (x, y, z) of π and only those points satisfy the equation
Ax + By + Cz + D = 0                                        (43.5)
which is also called a first-degree equation in variables x, y and z.
If we again consider an arbitrary first-degree equation (43.5),
then we shall see that it also has at least one solution x0, y0, z0,
for example,
x0 = -AD/(A² + B² + C²),  y0 = -BD/(A² + B² + C²),  z0 = -CD/(A² + B² + C²).
We further establish that any point M (x, y, z) whose coordinates
satisfy the given equation (43.5) is in the plane through the point
M0 (x0, y0, z0) perpendicular to vector (43.4).
Thus, given a fixed coordinate system in space, any first-degree
equation defines a plane and the coordinates of the points of any
plane satisfy a first-degree equation. Equation (43.5) is called the
general equation of a plane in space and the vector n in (43.4) is the
normal vector to that plane.
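For illustration, the coefficients of the general equation (43.5) can be written down at once from a normal vector and one point of the plane; a small Python sketch (names ours):

```python
def plane_through_point(normal, point):
    # General equation (43.5): A x + B y + C z + D = 0, where
    # (A, B, C) is the normal vector and D = -(A x0 + B y0 + C z0).
    A, B, C = normal
    x0, y0, z0 = point
    D = -(A * x0 + B * y0 + C * z0)
    return A, B, C, D

A, B, C, D = plane_through_point((1, 2, -1), (3, 0, 1))
print(A, B, C, D)                       # 1 2 -1 -2
print(A * 3 + B * 0 + C * 1 + D == 0)   # the given point satisfies the equation
```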
We shall now show how two general equations defining the same
straight line or plane are related. Suppose for definiteness that we
are given two equations of a plane π
A1x + B1y + C1z + D1 = 0,                                   (43.6)
A2x + B2y + C2z + D2 = 0.
The vectors
n1 = (A1, B1, C1),   n2 = (A2, B2, C2)
are perpendicular to the same plane π and therefore they are col-
linear. Since furthermore they are nonzero, there is a number t such
that, for example,
n1 = t n2,
or
A1 = tA2,   B1 = tB2,   C1 = tC2.                           (43.7)
Multiplying the second of the equations (43.6) by t and sub-
tracting from it the first we get by virtue of (43.7)
D1 = tD2.
Consequently, the coefficients of the general equations defining
the same straight line or plane are proportional.
A general equation is said to be complete if all of its coefficients
are nonzero. An equation which is not complete is called incomplete.
Consider the complete equation of a straight line (43.3). Since all
the coefficients are nonzero, that equation can be written as
x/(-C/A) + y/(-C/B) = 1.
If we let
a = -C/A,   b = -C/B,
then we obtain a new equation of a straight line
x/a + y/b = 1.
This is the intercept form of the equation of a straight line. The num-
bers a and b have a simple geometrical meaning. They are equal to
the magnitudes of the intercepts of the straight line on the coordi-
nate semiaxes (Fig. 43.1). Of course the complete equation of a plane
can be reduced to a similar form
x/a + y/b + z/c = 1.
Different incomplete equations define special cases of location of
a straight line and a plane. It is useful to remember them since they
occur fairly often. For example, if C = 0, equation (43.3) defines a
straight line through the origin; if B = C = 0, the straight line
coincides with the y axis, and so on. If A = 0, equation (43.5) defines
a plane parallel to the x axis; if A = B = D = 0, the plane coin-
cides with the x, y plane, and so on.
Any nonzero vector parallel to a straight line will be called its
direction vector. Consider, for example, the case of a space and find
the equation of a straight line through a given point M0 (x0, y0, z0)
with a given direction vector
q = (l, m, n).
Obviously, M (x, y, z) is on the straight line if and only if the vec-
tors M0M and q are collinear, i.e. if and only if the coordinates of
these vectors are proportional, i.e.
(x - x0)/l = (y - y0)/m = (z - z0)/n.                       (43.8)
These equations are precisely the desired equations of a straight
line. They are usually called the canonical equations of a straight
line. It is clear that in the case of a straight line in the plane the
equation will be of the form
(x - x0)/l = (y - y0)/m                                     (43.9)
if the straight line passes through the point M0 (x0, y0) and has a
direction vector q = (l, m).
Using the canonical equations it is easy to obtain the equation of
a straight line through two given points M0 and M1. To do this it
suffices to take as a direction vector the vector M0M1, express its
coordinates in terms of the coordinates of M0 and M1 and substitute
them in equations (43.8) and (43.9). For example, in the case of a
straight line in the plane we shall have the following equation:
(x - x0)/(x1 - x0) = (y - y0)/(y1 - y0),
and in the case of a space we shall have
(x - x0)/(x1 - x0) = (y - y0)/(y1 - y0) = (z - z0)/(z1 - z0).
Notice that in the canonical equations of a straight line the denomi-
nators may turn out to be zero. Therefore in what follows the propor-
tion a/b = c/d will be understood to be the equation ad = bc. Con-
sequently, the vanishing of one of the coordinates of a direction
vector implies the vanishing of the corresponding numerator in the
canonical equations.
To represent a straight line analytically it is common practice to
write the coordinates of its points as functions of some auxiliary
parameter t. Take as the parameter t each of the equal ratios of
(43.8) and (43.9). Then for the case of a space we shall have the fol-
lowing equations of a straight line:

x = x0 + lt,
y = y0 + mt,   (43.10)
z = z0 + nt

and similar equations for the case of a straight line in the plane

x = x0 + lt,   (43.11)
y = y0 + mt.

These are called the parametric equations of a straight line. Assigning
to t different values yields different points of a straight line.
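As an illustration (a sketch added here, not from the original), the parametric equations (43.10) are easy to evaluate numerically; the point, direction vector and parameter values below are invented for the example.

```python
import numpy as np

def line_points(M0, q, t_values):
    """Points of the straight line x = x0 + l*t, y = y0 + m*t, z = z0 + n*t
    for the given values of the parameter t."""
    M0, q = np.asarray(M0, float), np.asarray(q, float)
    return [M0 + t * q for t in t_values]

# Line through M0(1, 2, 3) with direction vector q = (2, -1, 4).
for p in line_points((1, 2, 3), (2, -1, 4), [0.0, 0.5, 1.0]):
    print(p)
```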
The use of determinants provides great convenience in writing the
various equations of a straight line and of a plane. We derive, for
example, the equation of a plane through three different points not
on the same straight line.
So, let M1 (x1, y1, z1), M2 (x2, y2, z2) and M3 (x3, y3, z3) be the points.
Since they are not on the same straight line, the vectors

M1M2 = (x2 - x1, y2 - y1, z2 - z1),   M1M3 = (x3 - x1, y3 - y1, z3 - z1)

are not collinear. Therefore M (x, y, z) is in the same plane with
M1, M2 and M3 if and only if the vectors M1M2 and M1M3 and

M1M = (x - x1, y - y1, z - z1)

are coplanar, i.e. if and only if the determinant made up of their
coordinates is zero. Consequently,

      | x - x1    y - y1    z - z1  |
det   | x2 - x1   y2 - y1   z2 - z1 |  = 0        (43.12)
      | x3 - x1   y3 - y1   z3 - z1 |

is the equation of the desired plane through three given points.
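A hedged numerical sketch of equation (43.12): expanding the determinant along its first row gives the coefficients A, B, C of the plane as a cross product of the two edge vectors. The helper below and its sample points are illustrative, not from the book.

```python
import numpy as np

def plane_through_points(M1, M2, M3):
    """Coefficients (A, B, C, D) of the plane A*x + B*y + C*z + D = 0
    through three points not on the same straight line (cf. (43.12))."""
    M1, M2, M3 = (np.asarray(p, float) for p in (M1, M2, M3))
    normal = np.cross(M2 - M1, M3 - M1)      # (A, B, C)
    if not normal.any():
        raise ValueError("the three points lie on one straight line")
    A, B, C = (float(c) for c in normal)
    return A, B, C, float(-normal.dot(M1))

print(plane_through_points((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # x + y + z - 1 = 0
```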
Consider, finally, the equation of a straight line in space through
a given point perpendicular to two nonparallel straight lines. Sup-
pose both straight lines are given by their canonical equations

(x - x1)/l1 = (y - y1)/m1 = (z - z1)/n1,
(x - x2)/l2 = (y - y2)/m2 = (z - z2)/n2.

The direction vector q of the desired straight line must be perpen-
dicular to the two vectors

q1 = (l1, m1, n1),   q2 = (l2, m2, n2).

These are not collinear, and therefore it is possible to take as q, for
example, the vector product [q1, q2]. Recalling the expression for
the coordinates of the vector product in terms of the coordinates of
the factors and using for notation the determinants of the second
order we get

    ( | m1 n1 |    | n1 l1 |    | l1 m1 | )
q = ( | m2 n2 | ,  | n2 l2 | ,  | l2 m2 | ).

If the desired straight line passes through the point M0 (x0, y0, z0),
then its canonical equations will be as follows:

  x - x0        y - y0        z - z0
---------  =  ---------  =  ---------  .
| m1 n1 |     | n1 l1 |     | l1 m1 |
| m2 n2 |     | n2 l2 |     | l2 m2 |

Of course, fundamentally many conclusions concerning the equa-
tions of a straight line and of a plane remain valid for any affine
coordinate system. Our desire to use Cartesian coordinate systems
is mainly due to their ensuring simpler calculations.

Exercises
1. Write the equation of a straight line in the plane
through two given points using a second-order determinant. Compare it with
(43.12).
2. Is it correct to say that (43.12) is always the equation of a plane?
3. By analogy with equations (43.10) write the parametric equations of
a plane in space. How many parameters must they contain?
4. Find the coordinates of the normal vector to the plane through three given
points not on the same straight line.
5. What is the locus of points in space whose coordinates are solutions of
a system of two linear algebraic equations in three unknowns?

44. Relative positions


Simultaneous consideration of several straight
lines and planes gives rise to various problems, first of all to
problems of determining their relative positions.
Let two intersecting planes be given in space by their general equa-
tions

A1x + B1y + C1z + D1 = 0,
A2x + B2y + C2z + D2 = 0.
They form two adjacent angles adding up to two right angles. We
find one of them. The vectors n1 = (A1, B1, C1) and n2 = (A2, B2, C2)
are normal vectors, so determining the angle between the planes re-
duces to determining the angle φ between n1 and n2. By (25.5)

cos φ = (A1A2 + B1B2 + C1C2) / ((A1^2 + B1^2 + C1^2)^(1/2) (A2^2 + B2^2 + C2^2)^(1/2)).

Quite similarly derived is the formula for the angle between two
straight lines in the plane given by their general equations
A 1z + B 1y + C1 = 0,
A 2z + B 2 y + C 2 = 0.
One of the angles <p made by these straight lines is calculated from
the formula

COS cp = (Af 1 Bf} 1fl (Ai -t- B~)'l" 0

The condition of the parallelism of straight lines given by their


general equations is that of the collinearity of their normal vectors,
i.e. the condition of the proportionality of their coordinates
Al Bl
y2 = y2.
The condition that straight lines should be perpendicular coincides
with the condition that cos q> = 0 or equivalently with the condi-
tion that
AIA2 + BIB2 = 0.
Of course, of a similar form is the condition

A1/A2 = B1/B2 = C1/C2

of the parallelism of planes and the condition

A1A2 + B1B2 + C1C2 = 0

of the perpendicularity of planes also given by their general equa-
tions.
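The following small sketch (added for illustration; the function name and sample planes are invented) evaluates these formulas for two planes given by their general equations.

```python
import math

def angle_between_planes(p1, p2):
    """One of the angles between planes A*x + B*y + C*z + D = 0,
    computed from their normal vectors n1 = (A1, B1, C1), n2 = (A2, B2, C2)."""
    n1, n2 = p1[:3], p2[:3]
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    return math.acos(dot / norm)

# Planes x + y + z - 1 = 0 and x - y = 0:
phi = angle_between_planes((1, 1, 1, -1), (1, -1, 0, 0))
print(math.degrees(phi))   # 90 degrees: the planes are perpendicular
```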
Suppose now that two straight lines, for example in space, are
given by their canonical equations

(x - x1)/l1 = (y - y1)/m1 = (z - z1)/n1,
(x - x2)/l2 = (y - y2)/m2 = (z - z2)/n2.

Since the direction vectors of those straight lines are the vectors
q1 = (l1, m1, n1) and q2 = (l2, m2, n2), we conclude again that one of
the angles φ between the straight lines will coincide with that be-
tween q1 and q2. Consequently,

cos φ = (l1l2 + m1m2 + n1n2) / ((l1^2 + m1^2 + n1^2)^(1/2) (l2^2 + m2^2 + n2^2)^(1/2)).

Accordingly, the proportionality of the coordinates

l1/l2 = m1/m2 = n1/n2

is the condition of the parallelism of the straight lines, and the
equation

l1l2 + m1m2 + n1n2 = 0

is the condition of their perpendicularity.
It is clear that if straight lines and planes are given in such a way
that their direction vector or normal vector is explicitly indicated,
then the finding of the angle between them always reduces to the
finding of the angle between these vectors. Suppose, for example,
a plane π is given in space by its general equation

Ax + By + Cz + D = 0

and a straight line L is given by its canonical equations

(x - x0)/l = (y - y0)/m = (z - z0)/n.

Since the angle φ between the straight line and the plane is comple-
mentary to the angle ψ between the direction vector of the straight
line and the normal vector to the plane (Fig. 44.1), we have

sin φ = |Al + Bm + Cn| / ((A^2 + B^2 + C^2)^(1/2) (l^2 + m^2 + n^2)^(1/2)).

It is obvious that

Al + Bm + Cn = 0

is the condition of the parallelism of the straight line and the plane,
and that

A/l = B/m = C/n

is the condition of the perpendicularity of the straight line and the
plane.
Assigning a straight line and a plane in the form of general equa-
tions allows us to solve very efficiently an important problem, that
of calculating the distance from a point to the straight line and
from a point to the plane. The derivation of formulas is quite simi-
lar in both cases and we again confine ourselves to a detailed dis-
cussion of just one of them.
Let π be a plane in space, given by its general equation (43.5).
Take a point M0 (x0, y0, z0). Drop from M0 a perpendicular to the
plane and denote by M1 (x1, y1, z1) its foot. It is clear that the dis-
tance p (M0, π) from M0 to the plane is equal to the length of M0M1.
The vectors n = (A, B, C) and M0M1 = (x1 - x0, y1 - y0, z1 - z0)
are perpendicular to the same plane and hence collinear.
Therefore there is a number t such that M0M1 = tn, i.e.

x1 - x0 = tA,
y1 - y0 = tB,
z1 - z0 = tC.

The point M1 (x1, y1, z1) is in π. Expressing its coordinates in terms
of the relations obtained and substituting them in the equation of the
plane we find

t (A^2 + B^2 + C^2) = -(Ax0 + By0 + Cz0 + D).

But the length of n is (A^2 + B^2 + C^2)^(1/2), and therefore
|M0M1| = |t| (A^2 + B^2 + C^2)^(1/2). Consequently,

p (M0, π) = |Ax0 + By0 + Cz0 + D| / (A^2 + B^2 + C^2)^(1/2).

In particular

p (O, π) = |D| / (A^2 + B^2 + C^2)^(1/2).
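A quick numerical check of this distance formula (the helper below and its sample data are illustrative, not from the book):

```python
import math

def distance_point_to_plane(M0, plane):
    """Distance from the point M0 = (x0, y0, z0) to the plane
    A*x + B*y + C*z + D = 0, by the formula
    |A*x0 + B*y0 + C*z0 + D| / (A^2 + B^2 + C^2)^(1/2)."""
    x0, y0, z0 = M0
    A, B, C, D = plane
    return abs(A * x0 + B * y0 + C * z0 + D) / math.sqrt(A * A + B * B + C * C)

# Distance from the origin to the plane 2x + y - 2z + 6 = 0 is |6| / 3 = 2.
print(distance_point_to_plane((0, 0, 0), (2, 1, -2, 6)))
```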
Along with the general equation (43.5) of a plane we consider the
following of its equations:

±(A^2 + B^2 + C^2)^(-1/2) (Ax + By + Cz + D) = 0.

Of the two possible signs at the left we choose the one opposite to
the sign of D. If D = 0, then we choose any sign. Then the free
term of that equation is a nonpositive number -p, and the coef-
ficients of x, y and z are the cosines of the angles between the normal
vector and the coordinate axes. The equation

x cos α + y cos β + z cos γ - p = 0   (44.1)

is called the normed equation of a plane. It is obvious that

p (M0, π) = |x0 cos α + y0 cos β + z0 cos γ - p|,
p (O, π) = p.
The distance p (M0, L) from a point M0 (x0, y0) to a straight line
L in the plane, given by its general equation (43.3), is determined
from a similar formula

p (M0, L) = |Ax0 + By0 + C| / (A^2 + B^2)^(1/2).

The normed equation of a straight line is like this:

x cos α + y sin α - p = 0.   (44.2)

Here α is the angle made by the normal vector with the x axis.

Exercises
1. Under what condition on the coordinates of normal
vectors do two straight lines in the plane (three planes in space) intersect in a
single point?
2. Under what condition is the straight line (43.8) in plane (43.5)?
3. Under what condition are two straight lines in space, given by their
canonical equations, in the same plane?
4. Calculate the angles between a diagonal of a cube and its faces.
5. Derive the formula for the distance between a point and a straight line in
space, given by its canonical equations.

45. The plane in vector space


We have repeatedly stressed that a straight
line and a plane passing through an origin can be identified in
spaces of directed line segments with the geometrical representation
of a subspace. But in their properties they differ but little from any
other straight lines and planes obtained by translating or shifting
these subspaces. Wishing to extend this fact to arbitrary vector
spaces we arrive at the concept of plane in vector space.
Let L be some subspace of a vector space K. Fix in K a vector x 0 •
In particular it may be in L. A set H of vectors z obtained from the
formula
z = x0 + y,   (45.1)
where y is any vector of L, is called a plane in K. The vector x0 is
a translation vector and the subspace L is a direction subspace. As to
H it will be said to be formed by translating L by a vector x 0 •
Formally the concept of plane includes those of straight lines and
planes (in vector interpretation!) in spaces of directed line segments.
But whether it possesses similar properties is as yet unknown to us.
Each vector of H can be uniquely represented as a sum (45.1).
If z = x 0 + y and z = x 0 +
y', where y, y' E L, then it follows
that y = y'. In addition, from (45.1) it follows that the difference
of any two vectors of H is in L.
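As a hedged numerical sketch of definition (45.1) (the routine and data below are invented for illustration): a vector z belongs to the plane H = x0 + L exactly when z - x0 is a linear combination of basis vectors of L, which can be tested with a least-squares solve.

```python
import numpy as np

def in_plane(z, x0, L_basis, tol=1e-10):
    """Check whether z lies in the plane obtained by translating the
    subspace spanned by the vectors in L_basis by the vector x0,
    i.e. whether z - x0 belongs to that subspace."""
    L = np.column_stack(L_basis)
    d = np.asarray(z, float) - np.asarray(x0, float)
    y, *_ = np.linalg.lstsq(L, d, rcond=None)
    return np.allclose(L @ y, d, atol=tol)

x0 = [1, 1, 1]
L_basis = [[1, 0, 0], [0, 1, 0]]          # direction subspace: the x, y plane
print(in_plane([3, -2, 1], x0, L_basis))  # True:  z - x0 lies in the subspace
print(in_plane([3, -2, 0], x0, L_basis))  # False
```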

Choose in H a vector z0 • Let z0 = .r0 + y0• Represent (45.1) as


z = Zo + (JJ - Yo).
The sets of vectors y and (Jj - y 0 ) describe the same subspace L.
Therefore the last equation means that H may be obtained by trans-
lating L by any fixed vector of the plane.
A plane H is some set of vectors of K generated by a subspace L
and a translation vector x0 according to (45.1). It is a very important
fact that any plane may be generated by only one direction subspace. Suppose
that this is not the case, i.e. that there is another direction subspace
L' and another translation vector x0' forming the same plane H.
Then for any z ∈ H we have z = x0 + y, where y ∈ L, and at the
same time z = x0' + y', where y' ∈ L'. It follows that L' is a col-
lection of vectors of K defined by the formula

y' = (x0 - x0') + y.

Since the zero vector is in L', it follows from the last formula
that the vector (x0 - x0') is in L. But this means that L' consists
of the same vectors as L.
We have already noted that the translation vector is not uniquely
defined by a plane of course. However, here too the question of
uniqueness can be dealt with in a quite natural way.
Assume that a scalar product is introduced in K. It is clear that
we obtain the same plane if we take ort_L x0 instead of x0. We may
therefore assume without loss of generality that x0 ⊥ L. The vector
x0 is then called the orthogonal translation vector. We can now prove
that every plane is generated by only one such translation vector.
Indeed, suppose there are two translation vectors, x0' and x0'',
orthogonal to L but generating nevertheless the same plane H.
Then for any vector y' ∈ L there must be a vector y'' ∈ L such that
x0' + y' = x0'' + y''. It follows that x0' - x0'' ∈ L. But under the
hypothesis x0' - x0'' ⊥ L. Consequently, x0' - x0'' = 0, i.e. x0' = x0''.
This means in particular that in a space with scalar product any
plane has only one vector orthogonal to the direction subspace.
Two planes are said to be parallel if the direction subspace of one
is in the direction subspace of the other.
It is easy to justify this definition. Any two parallel planes H1
and H2 either have no vector in common or one of them is in the other.
Suppose H 1 and H 2 have in common a vector z0 • Since any plane
can be obtained by translating the direction subspace by any of its
vectors, both H 1 and H 2 can be obtained by translating the corre-
sponding subspaces by a vector z0 • But one of the subspaces is in the
other and therefore one of the planes is in the other.
A subspace is a special case of a plane. It is obvious that a sub-
space L is parallel to any plane H obtained by translating L by some
vector x 0 • From the established property of parallel planes it follows


that H coincides with L if and only if x 0 E L.
Consider now two nonparallel planes H1 and H2. They either have
no vector in common or have a vector in common. In the former case
H1 and H2 are called crossing planes and in the latter they are called
intersecting planes.
Just as in the case of subspaces a set of vectors which are in both
H1 and H2 is called the intersection of the planes and designated
H1 ∩ H2. Let H1 be formed by translating a subspace L1 and let
H2 be formed by translating a subspace L2. Denote

H = H1 ∩ H2,   L = L1 ∩ L2.
Theorem 45.1. If an intersection H contains a vector z0, then it is a
plane formed by translating an intersection L by that vector.
Proof. Under the hypothesis of the theorem there is a vector z0
in H. Suppose there is another vector z1 ∈ H. We represent it as

z1 = z0 + (z1 - z0).

Now from the sequence of relations

z1, z0 ∈ H → z1, z0 ∈ H1;
z1, z0 ∈ H2 → z1 - z0 ∈ L1;
z1 - z0 ∈ L2 → z1 - z0 ∈ L

we conclude that any vector of H can be represented as a sum of z0
and some vector of L.
Take then a vector f of L. We have

f ∈ L → f ∈ L1;   f ∈ L2 → z0 + f ∈ H1;   z0 + f ∈ H2 → z0 + f ∈ H,

i.e. any vector of L translated by a vector z0 is in H. Thus the theo-
rem is proved.
A plane is not necessarily a subspace. Nevertheless it can be as-
signed dimension equal to that of the direction subspace. A plane of
zero dimension contains only one vector, the translation vector.
In determining the dimension of an intersection of planes Theo-
rem 19.1 is useful. From Theorem 19.1 and 45.1 it follows that the
dimension of an intersection H does not exceed the minimum one
of the dimensions of H1 and H2.
If in spaces of directed line segments two (three) vectors are given,
then with some additional conditions it is possible to construct only
one plane of dimension 1 (2) containing the given vectors. Those
additional conditions can be formulated as follows. If two vectors
are given, then they must not coincide, i.e. they must not be in the
same plane of zero dimension. If three vectors are given, then they
must not be in the same plane of dimension one.
Similar facts hold in an arbitrary vector space.


Let x0, x1, ..., xk be vectors in a vector space. We shall say
that they are in a general position if they are not in the same plane
of dimension k - 1.
Theorem 45.2. If vectors x0, x1, ..., xk are in a general position,
then there is a unique plane H of dimension k containing those vectors.
Proof. Consider the vectors x1 - x0, x2 - x0, ..., xk - x0. If
they were linearly dependent, then they would be in some subspace
of dimension k - 1 at most. Consequently, the vectors x0, x1, ..., xk
themselves would be in the plane obtained by translating
that subspace by a vector x0, which contradicts the hypothesis of
the theorem.
So the vectors x1 - x0, x2 - x0, ..., xk - x0 are linearly inde-
pendent. Denote by L their span. The subspace L has dimension k.
By translating it by a vector x0 we obtain some plane H of the same
dimension which contains all the given vectors x0, x1, ..., xk.
The constructed plane H is unique. Indeed, let the vectors
x0, x1, ..., xk be in two planes H1 and H2 of dimension k. The
plane remains the same if the translation vector is replaced by any
other vector of the plane. We may therefore assume without loss
of generality that H1 and H2 are obtained by translating the cor-
responding subspaces L1 and L2 by the same vector x0. But then it
follows that both subspaces coincide since they have dimension k
and contain the same linearly independent system x1 - x0,
x2 - x0, ..., xk - x0.

Exercises
1. Let H 1 and H 2 be any two planes. Define the sum
H1 + H2 to be the set of all vectors of the form z1 + z2, where z1 ∈ H1 and
z2 ∈ H2. Prove that the sum of the planes is a plane.
2. Let H be a plane and let λ be a number. Define the product λH of H by λ
to be the set of all vectors of the form λz, where z ∈ H. Prove that λH is a plane.
3. Will the set of all planes of the same space, with the operations intro-
duced above on them, be a vector space?
4. Prove that vectors x0, x1, ..., xk are in a general position if and only if
the vectors x1 - x0, x2 - x0, ..., xk - x0 are linearly independent.
5. Prove that a plane of dimension k containing vectors of a general posi-
tion x0, x1, ..., xk is a subspace if and only if those vectors are linearly de-
pendent.

46. The straight line and


the hyperplane
In a vector space K of dimension m two classes
of planes occupy a particular place. These are planes of dimension
1 and planes of dimension m - 1. Geometrically any plane of di-
mension 1 in spaces of directed line segments is a straight line. A
plane of dimension m - 1 is a hyperplane.
Consider a straight line H in a vector space K. Denote by x0 the
translation vector and by q the basis vector of a one-dimensional
direction subspace. Let these vectors be given by their coordinates

x0 = (x1, x2, ..., xm),
q = (q1, q2, ..., qm)

relative to some basis of K. It is obvious that any vector z of H can
be given by

z = x0 + tq,   (46.1)

where t is some number. Therefore relation (46.1) may be assumed to
be a vector equation of H in K. If z has in the same basis the coor-
dinates

z = (z1, z2, ..., zm),

then, writing (46.1) coordinatewise, we get

z1 = x1 + q1 t,
z2 = x2 + q2 t,
. . . . . . . .   (46.2)
zm = xm + qm t.
Comparing now these equations with (43.10) and (43.11) it is natural
to call them parametric equations of a straight line H. We shall say
that a straight line H passes through a vector x0 and has a direction
vector q.
By Theorem 45.2, it is always possible to draw one and only one
straight line through any two distinct vectors x0 and y0. Let x0
and y0 be vectors given in some basis of a space K by their coordi-
nates

x0 = (x1, x2, ..., xm),
y0 = (y1, y2, ..., ym).

Since it is possible to take, for example, the vector y0 - x0 as di-
rection vector, equations (46.2) yield the parametric equations

z1 = x1 + (y1 - x1) t,
z2 = x2 + (y2 - x2) t,
. . . . . . . . . . .   (46.3)
zm = xm + (ym - xm) t

of the straight line through two given vectors.
When t = 0 these equations define the vector x0 and when t = 1
they define the vector y0. If K is a real space, then the set of vectors
given by (46.3) for 0 ≤ t ≤ 1 is said to be the line segment joining
vectors x0 and y0. Of course, this name is associated with the geo-
metrical representation of this set in spaces of directed line seg-


ments.
Suppose H intersects some plane. Then according to the conse-
quence arising from Theorem 45.1 the intersection is either a
straight line or a single vector. If the intersection is a straight line,
then it certainly coincides with the straight line H. But this means
that if a straight line intersects a plane, then the straight line is
either entirely in the plane or it and the plane have only one vector
in common.
The concept of hyperplane makes sense in any vector space, but
we shall use it only in spaces with scalar product.
Consider a hyperplane H. Let it be formed by translating an
(m - 1)-dimensional subspace L by a vector x0. The orthogonal
complement L⊥ is in this case a one-dimensional subspace. Denote
by n any one of its basis vectors. A vector z is in H if and only if
the vector z - x0 is in L. That condition holds in turn if and only
if z - x0 is orthogonal to n, i.e. if

(n, z - x0) = 0.   (46.4)
Thus we have obtained an equation satisfied by all vectors of H.
To give a hyperplane in the form of this equation, it suffices to indi-
cate any vector n, orthogonal to the direction subspace, and a trans-
lation vector x0.
The explicit form of the equation substantially simplifies various
investigations. Given vectors n1, n2, ..., nk and x1, x2, ..., xk,
investigate a plane R which is the intersection of the hyperplanes

(n1, z - x1) = 0,
(n2, z - x2) = 0,
. . . . . . . . .   (46.5)
(nk, z - xk) = 0.
This problem may be regarded as a solution of the system of equa-
tions (46.5) for vectors z. Suppose that the intersection of the hyper-
planes is not empty, i.e. that (46.5) has at least one solution z 0 •
Then, as we know, the desired plane is defined by the following
system as well:
(n1, z - z0) = 0,
(n2, z - z0) = 0,
. . . . . . . . .   (46.6)
(nk, z - z0) = 0
since any plane remains unaffected if the translation vector is re-
placed by any other vector of the plane.
A vector y = z - z0 is an arbitrary vector of the intersection L


of the direction subspaces of all k hyperplanes. It is obvious that
vectors y of the subspace L satisfy the following system:
(n1, y) = 0,
(n2, y) = 0,
. . . . . .   (46.7)
(nk, y) = 0.
Giving the intersection L as system (46.7) allows an easy solution
of the question of the dimension of L. As is seen from the system
itself, the subspace L is the orthogonal complement of the span of
the system of vectors n1, n2, ..., nk. Let r be the rank of that system.
Then the dimension of L and hence of R is m - r, where m is the
dimension of the space. In particular, if n1, n2, ..., nk are linearly
independent, then the dimension of plane (46.5) is m - k. It is of
course assumed that the plane exists, i.e. that system (46.5) has at
least one solution. To give the subspace L defined by system (46.7)
it suffices to indicate its basis, i.e. any system of m - r linearly in-
dependent vectors orthogonal to n1, n2, ..., nk.
Equation (46.4) of a hyperplane can be written in a somewhat
different form. Let (n, x 0 ) = b. Then
(n, z) = b (46.8)
will define the same hyperplane as equation (46.4). Note that the
general equations (43.3) and (43.5) of a straight line and of a plane
are essentially given in the same form. It is important to stress
that any equation of the form (46.8) can be reduced to the form (46.4)
by a suitably chosen vector x0. To do this it suffices, for example, to
take x0 in the following form:

x0 = αn.

Substituting this expression in (46.4) and comparing with (46.8)
we conclude that we must have

α = b / (n, n).

We can now conclude that if a system of the form (46.5) defines


some plane, then the same plane may also be defined by a system of
the following form:

(n1, z) = b1,
(n2, z) = b2,
. . . . . . .   (46.9)
(nk, z) = bk

for appropriate numbers b1, b2, ..., bk. The converse is obviously
true too. System (46.9) defines the same plane as (46.5) does for ap-
propriate vectors x1, x2, ..., xk.
A straight line and a plane are hyperplanes in spaces V 2 and V3
respectively. We have established earlier the relation of the distance
from a point to these hyperplanes to the result of a substitution of
the coordinates of the point in the general equations. There is a
similar relation in the case of an arbitrary hyperplane.
Let H be a hyperplane given by equation (46.4). Denote as before
by p (v, H) the distance between a vector v and H. Taking into
account equation (46.4) we get
(n, v - x0 ) = (n, v - x0 ) - (n, z - x0 ) = (n, v - z)
for any vector z in H. According to the Cauchy-Buniakowski-Schwarz
inequality
l(n, v - x0 ) I ~ I n I I v - z I· (46.10)
Therefore

|v - z| ≥ |(n, v - x0)| / |n|.

If we show that H contains one and only one vector, z*, for which
(46.10) becomes an equation, then this will mean, first, that

p (v, H) = |(n, v - x0)| / |n|   (46.11)
and, second, that the value p (v, H) is attained only on that single
vector z*.
Denote by L the direction subspace of H. It is clear that any vector
orthogonal to Lis collinear with n, and vice versa. Inequality (46.10)
becomes an equation if and only if n and v - z are collinear, i.e.
v - z = αn for some number α. Let equality hold for two vectors,
z1 and z2, of H, i.e.

v - z1 = α1 n,
v - z2 = α2 n.

It follows that

z1 - z2 = (α2 - α1) n.

Consequently, z1 - z2 ⊥ L. But z1 - z2 ∈ L as the difference of two
vectors of a hyperplane. Therefore z1 - z2 = 0 or z1 = z2.
Denote by z0 a vector in H orthogonal to L. As we know, that
vector exists and is unique. We write a vector z as a sum

z = z0 + y,

where y ∈ L. A vector v can be represented as

v = f + s,

where f ∈ L and s ⊥ L. Now

v - z = (s - z0) + (f - y).

If we take z = z0 + f, then

v - z = s - z0.

The vector h = s - z0 is orthogonal to L, and formula (46.11) is
established.
We have simultaneously proved that any vector v of a space can
be represented uniquely as a sum
v= z + h,
where z is a vector in H and h is orthogonal to L. By analogy with
spaces of directed line segments a vector z in the decomposition is
called the projection of v onto H and h is called the perpendicular
from v to H. The process of obtaining a vector z from v is termed the
projection of v onto H. If a hyperplane is given by (46.4), then n is
the normal vector to the hyperplane. Given vectors x 0 and n there
is only one hyperplane containing the vector x 0 and orthogonal to the
vector n.
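The following sketch (added here; the function and numbers are illustrative assumptions, not from the book) computes the distance (46.11) and the projection of a vector onto the hyperplane (n, z - x0) = 0.

```python
import numpy as np

def project_onto_hyperplane(v, n, x0):
    """Projection z of v onto the hyperplane (n, z - x0) = 0 and the
    distance (46.11): p(v, H) = |(n, v - x0)| / |n|."""
    v, n, x0 = (np.asarray(a, float) for a in (v, n, x0))
    h = (n.dot(v - x0) / n.dot(n)) * n   # perpendicular from v to H
    z = v - h                            # projection of v onto H
    return z, np.linalg.norm(h)

z, dist = project_onto_hyperplane(v=[3, 3, 3], n=[0, 0, 1], x0=[0, 0, 1])
print(z, dist)   # projection (3, 3, 1), distance 2
```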

Exercises

1. Prove that any plane distinct from the entire space


may be given as the intersection of hyperplanes (46.9).
2. Prove that a sum of hyperplanes is a hyperplane if and only if the hyper-
planes to be added are parallel.
3. Prove that a product of a hyperplane by a nonzero number is a hyper-
plane.
4. Under what conditions are a straight line and a hyperplane parallel?
5. Derive the formula for the distance between a vector and a straight line
given by equation (46.1).

47. The half-space


Associated with the concepts of a straight line
and of a hyperplane is the notion of the so-called convex
sets. Since these are widely used in various fields of mathematics,
we shall examine some of them.
A set of vectors of a real vector space is said to be convex if to-
gether with every pair of vectors it contains the entire line segment
joining them.
Examples of convex sets are a single vector, a line segment, a
straight line, a subspace, a plane, a hyperplane and many others.
Suppose a hyperplane is given in a real space by
(n, z)- b = 0.
A set of vectors z satisfying


(n, z)- b < 0 (47.1)
or
(n, z)- b > 0 (47.2)
is called an open half-space. Half-space (47.1) is negative and (47.2)
is positive.
Theorem 47.1. A half-space is a corwex set.
Proof. Take two vectors, x0 and y0, and let

Δ1 = (n, x0) - b,   Δ2 = (n, y0) - b.

If z is any vector of the straight line through x0 and y0, then

z = x0 + t (y0 - x0).

When 0 ≤ t ≤ 1 we obtain vectors of the line segment joining x0
and y0. We have

(n, z) - b = Δ1 (1 - t) + Δ2 t.   (47.3)

If Δ1 and Δ2 have the same sign, i.e. x0 and y0 are in the same half-
space, then the right-hand side of (47.3) will also have the same
sign for all values of t satisfying 0 ≤ t ≤ 1.
Thus any hyperplane divides a vector space into three noninter-
secting convex sets, the hyperplane itself and two open half-spaces.
Suppose that x0 and y0 are in different half-spaces, i.e. that Δ1
and Δ2 have different signs. Formal transformation of relation (47.3)
results in the following equation:

(n, z) - b = (Δ2 - Δ1) ( t - 1/(1 - Δ2/Δ1) ).

It follows that the straight line through x0 and y0 intersects the
hyperplane. The intersection is defined by

t = 1/(1 - Δ2/Δ1),

satisfying 0 ≤ t ≤ 1. So
If two vectors are in different half-spaces, then the line segment join-
ing those vectors intersects the hyperplane determining the half-spaces.
Considering this property it is easy to understand what the half-
spaces of spaces of directed line segments are. In the plane the termi-
nal points of vectors of a half-space are on the same side of a straight
line and in space they are on the same side of a plane.
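A small illustrative script (names and data invented) classifying vectors by the sign of (n, z) - b, in line with definitions (47.1) and (47.2):

```python
def half_space(z, n, b):
    """Return 'negative', 'positive' or 'hyperplane' according to the
    sign of (n, z) - b, cf. (47.1) and (47.2)."""
    s = sum(ni * zi for ni, zi in zip(n, z)) - b
    if s < 0:
        return "negative"
    return "positive" if s > 0 else "hyperplane"

n, b = (1, 1, 1), 1          # the hyperplane x + y + z = 1
for z in [(0, 0, 0), (1, 1, 1), (0.5, 0.25, 0.25)]:
    print(z, half_space(z, n, b))
```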
Along with open half-spaces closed half-spaces are not infrequently
considered. They are defined as sets of vectors z satisfying
(n, z) - b ≤ 0   (47.4)
or
(n, z) - b ≥ 0.   (47.5)


Half-space (47.4) is nonpositive and (47.5) is nonnegative. Of course,
closed half-spaces are also convex sets.
Theorem 47.2. An intersection of convex sets ls a convex set.
Proof. Obviously it suffices to consider the case of two sets U1
and U2. Let U = U1 ∩ U2 be their intersection. Take any two vec-
tors x0 and y0 in U and denote by S the line segment joining them.
The vectors x0 and y0 are in both U1 and U2. Therefore in view of
the convexity of U1 and U2 the line segment S is entirely in both
U1 and U2, i.e. it is in U.
This theorem plays an important part in the study of convex
sets. In particular, it allows us to say that a nonempty set of vec-
tors z, each satisfying the system of inequalities

(n1, z) - f1 ≤ 0,
(n2, z) - f2 ≤ 0,
. . . . . . . .
(nk, z) - fk ≤ 0,

is a convex set. Such systems of inequalities are the basic element
in the description of many production planning problems, manage-
ment problems and the like.
Exercises
1. Prove that a set of vectors z satisfying (z, z) ≤ a
is convex.
2. Prove that if a vector z is in hyperplane (46.8), then the vector z + n is
in the positive half-space.
3. Prove that if hyperplanes are given by the normed equations (44.1)
and (44.2), then the origin is always in the nonpositive half-space.

48. Systems of linear equations


We again turn to the study of systems of lin-
ear algebraic equations, this time, however, in connection with the
problem of the intersection of hyperplanes.
Consider a real or complex space K of dimension m. Suppose a
scalar product is introduced in it. Choose some orthonormal basis
and let n1, n2, ..., nk be normal vectors to hyperplanes H1, H2,
..., Hk of (46.9) given by their coordinates

n1 = (a11, a12, ..., a1m),
n2 = (a21, a22, ..., a2m),
. . . . . . . . . . . . .   (48.1)
nk = (ak1, ak2, ..., akm)

in that basis. Assume that vectors

z = (z1, z2, ..., zm)

lying in the intersection of the hyperplanes are also defined by their


coordinates in the same basis.
In the case of a real space the scalar product of vectors in the
orthonormal basis is the sum of pairwise products of coordinates.
Therefore in coordinate notation system (46.9) has the following
form:
a11 z1 + a12 z2 + ... + a1m zm = b1,
a21 z1 + a22 z2 + ... + a2m zm = b2,
. . . . . . . . . . . . . . . . .   (48.2)
ak1 z1 + ak2 z2 + ... + akm zm = bk.
In the case of a complex space we again obtain a similar system
except for the fact that the coefficients of the unknowns and the
right-hand sides are replaced by conjugate complex numbers.
So the problem of the intersection of hyperplanes reduces to the
familiar Section 22 problem in solving a system of linear algebraic
equations. It is obvious that any system of equations with complex
or real coefficients can also be investigated in terms of the intersec-
tion of hyperplanes in a complex or real space Pm.
One of the main points is the investigation of a system of linear
algebraic equations for compatibility. It is this point that deter-
mines the answer to the question as to whether the intersection of
hyperplanes is an empty set or a nonempty one. Of course, use can be
made of the Gauss method to carry out this study. This is not always
convenient, however.
In studying systems of linear algebraic equations one has to deal
with two matrices. One is made up of the coefficients of the unknowns
and is called the matrix of the system. It is as follows:

( a11  a12  ...  a1m )
( a21  a22  ...  a2m )
( ................... )
( ak1  ak2  ...  akm )

The other results from adding to it a column of right-hand sides

( a11  a12  ...  a1m  b1 )
( a21  a22  ...  a2m  b2 )
( ....................... )
( ak1  ak2  ...  akm  bk )

and is called the augmented matrix of the system. Note, in particular,


that the rank of the matrix of the system coincides with that of the
system of vectors (48.1).
Theorem 48.1. (Kronecker-Capelli). For a system of linear algebraic
equations to be compatible it is necessary and sufficient that the rank
of the augmented matrix of the system should equal that of the matrix
of the system.
Proof. We use the notation of Section 22. Up to arrangement, vec-
tors a1, a2, ..., am, b are columns of the matrices under consider-
ation. Since the rank of a matrix coincides with that of the system
of its column vectors, to prove the theorem it suffices to show that
the system is compatible if and only if the rank of the system
a1, a2, ..., am coincides with that of the system a1, a2, ..., am, b.
Suppose system (48.2) is compatible. This means that equation
(22.1) holds for some collection of numbers z1, z2, ..., zm, i.e. that
b is a linear combination of vectors a1, a2, ..., am. But it follows
that any basis of the system a1, a2, ..., am is also a basis of the sys-
tem a1, a2, ..., am, b, i.e. that the ranks of both systems coincide.
Now let the ranks of the systems coincide. Choose some basis
from a1, a2, ..., am. It will be a basis of the system a1, a2, ..., am, b
as well. Consequently, b is linearly expressible in terms
of some of the vectors of the system a1, a2, ..., am. Since it can then
be represented as a linear combination of all vectors a1, a2, ..., am
too, this means that system (48.2) is compatible. Thus the theorem
is proved.
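A hedged numerical illustration of the Kronecker-Capelli criterion (the matrices below are invented sample data):

```python
import numpy as np

def is_compatible(A, b):
    """A system A z = b is compatible iff rank(A) == rank([A | b])."""
    A = np.asarray(A, float)
    b = np.asarray(b, float).reshape(-1, 1)
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.hstack((A, b)))

A = [[1, 2], [2, 4]]
print(is_compatible(A, [3, 6]))   # True:  the second equation is twice the first
print(is_compatible(A, [3, 7]))   # False: the ranks differ, no solution
```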
The system of linear algebraic equations (48.2) is said to be non-
homogeneous if the right-hand sides are not all zero. Otherwise the
system is said to be homogeneous. Any homogeneous system is always
compatible since one of its solutions is z1 = z2 = ... = Zm = 0.
A system obtained from (48.2) by replacing all the right-hand sides
with zeros is called the reduced homogeneous system. If (48.2) is com-
patible, then each of its solutions is a particular or partial solution.
The totality of particular solutions is the general solution of the
system.
Using the previously obtained facts about planes and systems
(46.6), (46. 7) and (46.9) describing the intersection of hyperplanes
we can make a number of conclusions concerning the general solu-
tion of a system of linear algebraic equations. Namely.
The general solution of a reduced homogeneous system forms in a
space Pm a subspace of dimension m - r, where r is the rank of the
matrix of the system. Any basis of that subspace is called a fundamen-
tal system of solutions.
The general solution of a nonhomogeneous system is a plane in a
space P m obtained by translating the general solution of the reduced
homogeneous system to the amount of any particular solution of the
nonhomogeneous system.
The difference between any two particular solutions of a nonhomoge-
neous system is a particular solution of the reduced homogeneous system.
Among the particular solutions of a nonhomogeneous system there is
only one orthogonal to all the solutions of the reduced homogeneous
system. That solution is called normal.
For a compatible system to have a unique solution it is necessary
and sufficient that the rank of the matrix of the system should equal the
number of unknowns.
For a homogeneous system to have a nonzero solution it is necessary
and sufficient that the rank of the matrix of the system should be less
than the number of unknowns.
This study of systems of linear algebraic equations used the con-
cept of determinant only indirectly, mainly through the notion of
the rank of a matrix. The determinant plays a much greater role in
the theory of systems of equations, however.
Let the matrix of a system be a square one. For the rank of the
matrix of the system to be less than the number of unknowns it is
necessary and sufficient that the determinant of the system should
be zero. Therefore
A homogeneous system has a nonzero solution if and only if the deter-
minant of the system is zero.
Suppose now that the determinant of a system is nonzero. This
means that the rank of the matrix of the system equals m. The rank
of the augmented matrix cannot be less than m. But it cannot be
greater than m either since there are no minors of order m + 1. Hence
the ranks of both matrices are equal, i.e. the system is necessarily
compatible in this case. Moreover, it has a unique solution. Thus
If the determinant of a system is nonzero, then the system always has
a unique solution.
In terms of the intersection of hyperplanes this fact may be restat-
ed as follows:
If the normal vectors of hyperplanes form a basis of a space, then
the intersection of the hyperplanes is not empty and contains only
one vector.
Denote by d the determinant of a system and by d_j a determinant
differing from d only in the jth column being replaced in it by the
column of the right-hand sides b1, b2, ..., bm. Then the unique
solution of the system can be obtained by the formulas

z_j = d_j / d,   j = 1, 2, ..., m.   (48.3)
Indeed, let A_ij denote the algebraic adjunct of an element a_ij of
the determinant of the system. Expanding d_j by the jth column we get

d_j = Σ_{i=1}^m b_i A_ij.

Now substitute expressions (48.3) in an arbitrary kth equation of
the system:

Σ_{j=1}^m a_kj z_j = (1/d) Σ_{j=1}^m a_kj Σ_{i=1}^m b_i A_ij = (1/d) Σ_{j=1}^m Σ_{i=1}^m a_kj b_i A_ij
                  = (1/d) Σ_{i=1}^m b_i Σ_{j=1}^m a_kj A_ij = b_k.

The inner sum in the last equation is by (40.5) and (40.8) equal to
either d or zero according as i = k or i ≠ k.
Thus formulas (48.3) provide an explicit expression of the solu-
tion of a system in terms of its elements. By virtue of uniqueness
there is no other solution. These formulas are known as Cramer's
rule.
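A sketch of Cramer's rule (48.3) in code (illustrative; it relies on numpy determinants, and the sample system is made up):

```python
import numpy as np

def cramer(A, b):
    """Solve A z = b by Cramer's rule: z_j = d_j / d, where d = det A and
    d_j is det A with the jth column replaced by b.  Requires det A != 0."""
    A = np.asarray(A, float)
    b = np.asarray(b, float)
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("the determinant of the system is zero")
    z = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b
        z[j] = np.linalg.det(Aj) / d
    return z

print(cramer([[2, 1], [1, 3]], [3, 5]))   # [0.8, 1.4]
```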
Formally, by calculating determinants it is possible to solve any
systems of equations of the form (48.2). First, calculating the vari-
ous minors of the matrix of the system and those of the augmented
matrix we check the system for compatibility. Let it be compatible
and let the rank of both matrices be r. We may assume without loss
of generality that the principal minor of order r of the matrix of the
system is nonzero. By Theorem 41.1 the last k - r rows of the
augmented matrix are linear combinations of its first r rows. Hence
the system
a11 z1 + ... + a1r zr + a1,r+1 z_{r+1} + ... + a1m zm = b1,
a21 z1 + ... + a2r zr + a2,r+1 z_{r+1} + ... + a2m zm = b2,
. . . . . . . . . . . . . . . . . . . . . . . . . . .   (48.4)
ar1 z1 + ... + arr zr + ar,r+1 z_{r+1} + ... + arm zm = br

is equivalent to system (48.2).
As before, the unknowns z_{r+1}, ..., zm are called free unknowns.
Assigning to them any values it is possible to determine all the other
unknowns by solving the system with the matrix of the principal
minor, for example by Cramer's rule.
This method of solving a system is of some value only for theoret-
ical studies. In practice, however, it is much more advantageous to
employ the Gauss method described in Section 22.
Exercises
1. Prove that a general solution is a convex set.
2. Prove that among all particular solutions of a nonhomogeneous system
the normal solution is the shortest.
3. Prove that a fundamental system is a collection of any m - r solutions
of the reduced homogeneous system for which the determinant made up of the
values of the free unknowns is nonzero.
4. It was noted in Section 22 that small changes in coordinates may result
in the violation of linear dependence or independence of vectors. What conclu-
sions can be drawn from this concerning the hyperplane intersection problem?
CHAPTER 6

The Limit in Vector Space

49. Metric spaces


One of the basic concepts of mathematical anal-
ysis is the notion of limit. It is based on the fact that for the points
of the number line the concept of "closeness" or more precisely of
the distance between points is defined.
Comparison for "closeness" can be introduced in sets of a quite
different nature. We have already defined in Section 29 the distance
between vectors of vector spaces with scalar product. It was found
that it l'Ossesses the same properties (29.5) as those the distance
between points of the number line has. The distance between vectors
was defined in terms of a scalar product which in turn was intro-
duced axiomatically.
It is natural to attempt to introduce axiomatically the distance
itself by requiring that properties (29.5) should necessarily hold.
Notice that many fundamental facts of the theory of limit in
mathematical analysis are not associated with the fact that alge-
braic operations are defined for numbers. We therefore begin by ex-
tending the concept of distance to arbitrary sets of elements that are
not necessarily vectors of a vector space.
A set is said to be a metric space if every pair of its elements is
assigned a nonnegative real number called a distance, with the fol-
lowing axioms holding:
(1) p (x, y) = p (y, x),
(2) p (x, y) > 0 if x ≠ y; p (x, y) = 0 if x = y,
(3) p (x, y) ≤ p (x, z) + p (z, y)

for any elements x, y and z. These axioms are called axioms of a


metric, the first being the axiom of symmetry and the third the tri-
angle axiom.
Formally any set of elements in which the equality relation is
defined can be converted into a metric space by setting
p (x, y) = 0 if x = y,   p (x, y) = 1 if x ≠ y.   (49.1)

It is easy to verify that all the axioms of a metric hold.


The element x0 of a metric space X is said to be the limit of a se-


quence {xn} of elements x 1 , x 2 , • • • , Xn, ••• of X if the sequence
of distances p (x 0 , x1 ), p (x 0 , x 2 ), • • • , p (x 0 , x,.), ... converges to
zero. In this case we write

xn → x0   or   lim xn = x0 as n → ∞,

and call the sequence {xn} convergent in X or simply convergent.


Notice that the same sequence of elements of the same set X may
be convergent or nonconvergent depending on what metric is intro-
duced in X. Suppose, for example, that in a metric space X some
convergent sequence {xn} is chosen consisting of mutually distinct
elements. We change the metric in X by introducing it according to
(49.1). Now {xn} is no longer convergent. Indeed, suppose xn → x0',
i.e. p (xn, x0') → 0. With the new metric this is possible if and only
if all elements of {xn}, except for a finite number of them, coincide
with x0'. The contradiction obtained confirms the above assertion.
The following two properties are shared by any convergent se-
quence:
If {xn} converges, then so does and has the same limit any of its
subsequences. A sequence may have no more than one limit.
The first property is obvious. Suppose {xn} has two limits, x 0
and y 0 • Then for any arbitrarily small number e > 0 we can find a
number N such that

for e\·ery n > N. But from this, using the triangle axiom, we find
P (xo, Yo):::;;;; P (xo, Xn) + P (xn, Yo) <e.
By virtue of the arbitrariness of e this means that p (x 0 , y 0 ) = 0,
i.e. Xo = Yo·
A sphere S (a, r) in a metric space X is the set of all elements
x E X satisfying the condition
p (a, x) < r. (49.2)
An element a is called the centre of the sphere and r is its radius.
Any sphere with centre in a is a neighbourhood of a. A set of elements
is said to be bounded if it is entirely in some sphere.
It is easy to see that x 0 is the limit of {xn} if and only if any neigh-
bourhood of x 0 contains all the elements of {x,.} beginning with
some index.
In a metric space it is possible to introduce many of the other ma-
jor concepts dealt with in number sets. Thus if a set M c X is
given, then an element x E X is said to be a cluster point (or limit point
or accumulation point) of that set if any neighbourhood of x contains
at least one element of M distinct from x. The set obtained by join-
ing to M all its cluster points is called the closure of M and desig-
nated M̄. The set M is said to be closed if M̄ = M.
Consider the cluster points of sphere (49.2). We show that they
all satisfy the condition

p (a, x) ≤ r.   (49.3)

Indeed, suppose there is at least one cluster point x' of sphere
(49.2) such that p (a, x') > r. By the definition of a cluster point
any neighbourhood of x' must contain at least one element of (49.2)
distinct from x'. But a neighbourhood with radius 0.5 (p (a, x') - r)
clearly contains no such element. Accordingly:
A set S̄ (a, r) of all elements x satisfying (49.3) is a closed sphere.

Exercises
1. Prove that if xn → x, then p (xn, z) → p (x, z)
for any element z.
2. Will the set of all real numbers be a metric space if for numbers z and y
we set
p (z, y) = arctan I z - y I?
3. Can a set consisting of a finite number of elements have cluster points?

50. Complete spaces


A sequence {xn} of elements of a metric space
is said to be a fundamental or Cauchy sequence if given any number
ε > 0 we can find N such that p (xn, xm) < ε for n, m > N.
Any fundamental sequence is bounded. Indeed, given ε choose N
according to the definition and take a number n0 > N. All ele-
ments of the sequence, beginning with x_{n0}, are clearly in a sphere
with centre x_{n0} and of radius ε. The entire collection of the elements,
however, is in a sphere with centre x_{n0} and of radius equal to the
maximum of the numbers

ε, p (x1, x_{n0}), ..., p (x_{n0 - 1}, x_{n0}).
If a sequence is convergent, then it is fundamental. Let a sequence
{xn} converge to x0. Then given any ε > 0 there is N such that

p (xn, x0) < ε/2

for n > N. By the triangle axiom

p (xn, xm) ≤ p (xn, x0) + p (x0, xm) < ε

for n, m > N, which precisely means that {xn} is fundamental.
For the set of all reals the converse is also true. That is, any fun-
damental sequence is convergent. In general, however, this is not
true, which is exemplified by a metric space with at least one cluster
point eliminated.
A metric space is said to be complete if any fundamental sequence
in it is convergent.
In complete metric spaces a theorem holds similar to the theorem
on embedded line segments for real numbers. Given some sequence
of spheres, we shall say that they are embedded into one another if
every subsequent sphere is contained inside the preceding one.
Theorem 50.1. In a complete metric space X let {S̄ (an, εn)} be a
sequence of closed spheres embedded into one another. If the sequence
of their radii tends to zero, then there is a unique element in X which
is in all those spheres.
Proof. Consider a sequence {an}. Since S̄ (a_{n+p}, ε_{n+p}) ⊂ S̄ (an, εn)
for any p ≥ 0, we have a_{n+p} ∈ S̄ (an, εn). Consequently,

p (a_{n+p}, an) ≤ εn,

which implies that {an} is fundamental.
The space X is complete and therefore {an} tends to some limit a
in X. Take any sphere S̄ (ak, εk). This sphere contains all terms
of {an}, beginning with ak. By virtue of the closure of the spheres
the limit of {an} is also in S̄ (ak, εk). Thus a is in all the spheres.
Assume further that there is another element, b, that is also in
all the spheres. By the triangle axiom

p (a, b) ≤ p (a, an) + p (an, b) ≤ 2εn.

Since εn may be taken as small as we like, this means that p (a, b)
= 0, i.e. that a = b.
The most important examples of complete spaces are the sets of real
and complex numbers. We assume that the distance between numbers
coincides with the absolute value of their difference. The complete-
ness of the set of reals is proved in the course of mathematical analy-
sis. We show the completeness of the set of complex numbers.
Assume that complex numbers are given in algebraic form. The
distance between numbers
z =a+ ib, v= c + id
is introduced by the rule
p (z, v) = I z- v 1. (50.1)
where
I z- v 12 = (a - c) 2 + (b - d) 2 • (50.2)
It is obvious that the axioms of a metric hold.
Consider a sequence {z_k = a_k + i b_k} of complex numbers. Let it
be fundamental. Given ε > 0 there is N such that for all n, m > N

|z_n - z_m| < ε.

From (50.2) it follows that

|a_n - a_m| < ε,   |b_n - b_m| < ε,   (50.3)

i.e. that {a_k} and {b_k} are also fundamental. By virtue of the com-
pleteness of the set of reals there are a and b such that

a_k → a,   b_k → b.

Proceeding in (50.3) to the limit we get

|a_n - a| ≤ ε,   |b_n - b| ≤ ε.

On setting

z = a + ib,

we find that

p (z_n, z) ≤ √2 ε

for all n > N. But this means that the fundamental sequence {z_k}
is convergent.
As a consequence, {z_k = a_k + i b_k} converges to a number z =
a + ib if and only if {a_k} and {b_k} converge to a and b.
The complete space of complex numbers has much in common with
the space of real numbers. In particular, any bounded sequence of
complex numbers has a convergent subsequence. Indeed, this is
true for any bounded sequence of reals. It is also obvious that the
boundedness of {z_k = a_k + i b_k} implies the boundedness of {a_k}
and {b_k}. Since {a_k} is bounded, it has a convergent subsequence
{a_{k_p}}. Consider the sequence {b_{k_p}}. It is bounded and therefore it
too has a convergent subsequence {b_{k_{p_q}}}. It is clear that {a_{k_{p_q}}} will be con-
vergent. Consequently, so is {z_{k_{p_q}}}.
In the complex space, just as in the real one, the concept of infi-
nitely large sequence can be introduced. Namely, a sequence {z_k} is
said to be infinitely large if given an arbitrarily large number A
we can find N such that |z_k| > A for every k > N. It is obvious
that we can always choose an infinitely large subsequence of any
unbounded sequence.
Exercises
1. Will the set of all real numbers be a complete
space if for numbers x and y we set

p (x, y) = arctan |x - y|?

2. Prove that any closed set of a complete space is a complete space.
3. Will any closed set of an arbitrary metric space be necessarily a com-
plete space?
4. Construct a metric such that the set of all complex numbers is not a com-
plete space.
51. Auxiliary inequalities


We establish some inequalities to be used in
our immediate studies. Take a positive number a and consider the
exponential function y = a^x (Fig. 51.1). Let x1 and x2 be two dis-
tinct reals. Draw a straight line through the points with coordinates
(x1, a^{x1}) and (x2, a^{x2}). Taking into account the properties of the
exponential function we conclude that if the independent variable
is changed on the closed interval [x1, x2], none of the points of the
graph will lie higher than the points of the constructed straight line.
Now let x1 = 0 and x2 = 1. Then the equation of the straight line
under consideration will be y = ax + (1 - x). Consequently,

a^x ≤ ax + (1 - x)   (51.1)

for 0 ≤ x ≤ 1.
We shall say that positive numbers p and q are conjugate if they
satisfy the relation

1/p + 1/q = 1.   (51.2)

It is clear that p, q > 1.
For any positive numbers a and b the number a^p b^{-q} will also
be positive and it can be taken as the a of (51.1). If we assume x = p^{-1},
then 1 - x = q^{-1}. Now from (51.1) we get

ab ≤ a^p/p + b^q/q   (51.3)

for any positive a and b and conjugate p and q. It is obvious that
in fact this inequality holds for all nonnegative a and b.
Consider two arbitrary \'ectors x and y in the space Rn or Cn.
Let these vectors be given by their coordinates
x = (x1 , x 2 , ••• , Xn),

Y = (y., Y2• • · ·• Yn)·


We establish the so-called Hölder's inequality

Σ_{k=1}^n |x_k y_k| ≤ ( Σ_{k=1}^n |x_k|^p )^{1/p} ( Σ_{k=1}^n |y_k|^q )^{1/q}.   (51.4)

Notice that if at least one of the vectors x and y is zero, then Höl-
der's inequality is obviously true. It may therefore be assumed that
x ≠ 0 and y ≠ 0. Let the inequality hold for some nonzero vec-
tors x and y. Then it does for the vectors λx and μy, with any λ and μ.
Therefore it suffices to prove it for the case where

Σ_{k=1}^n |x_k|^p = 1,   Σ_{k=1}^n |y_k|^q = 1.   (51.5)

Now putting a = |x_k| and b = |y_k| in (51.3) and summing over
k from 1 to n we get in view of (51.2) and (51.5)

Σ_{k=1}^n |x_k y_k| ≤ 1.

But this is Hölder's inequality for (51.5).


We now proceed to prove Minkowski's inequality which, for any
vectors x and y from Rn or Cn, says that

( Σ_{k=1}^n |x_k + y_k|^p )^{1/p} ≤ ( Σ_{k=1}^n |x_k|^p )^{1/p} + ( Σ_{k=1}^n |y_k|^p )^{1/p}   (51.6)

for every p ≥ 1.
Minkowski's inequality is obvious for p = 1, since the absolute
value of the sum of two numbers is at most the sum of their absolute
values. Moreover, it clearly holds if at least one of the vectors x
and y is zero. Therefore we may restrict ourselves to the case p > 1
and x ≠ 0. We write the identity

(|a| + |b|)^p = (|a| + |b|)^{p-1} |a| + (|a| + |b|)^{p-1} |b|.
Setting a = x_k and b = y_k and summing over k from 1 to n we get

Σ_{k=1}^n (|x_k| + |y_k|)^p = Σ_{k=1}^n (|x_k| + |y_k|)^{p-1} |x_k| + Σ_{k=1}^n (|x_k| + |y_k|)^{p-1} |y_k|.

Applying to each of the two sums at the right Hölder's inequality
and considering that (p - 1) q = p we get

Σ_{k=1}^n (|x_k| + |y_k|)^p ≤ ( Σ_{k=1}^n (|x_k| + |y_k|)^p )^{1/q} ( ( Σ_{k=1}^n |x_k|^p )^{1/p} + ( Σ_{k=1}^n |y_k|^p )^{1/p} ).

On dividing both sides of the inequality by

( Σ_{k=1}^n (|x_k| + |y_k|)^p )^{1/q}

we find that

( Σ_{k=1}^n (|x_k| + |y_k|)^p )^{1/p} ≤ ( Σ_{k=1}^n |x_k|^p )^{1/p} + ( Σ_{k=1}^n |y_k|^p )^{1/p},

from which we at once obtain inequality (51.6).
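A quick numerical sanity check of (51.4) and (51.6) (the vectors and exponent below are arbitrary test data, not from the book):

```python
import numpy as np

def holder_lhs_rhs(x, y, p):
    q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1
    lhs = np.sum(np.abs(x * y))
    rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)
    return lhs, rhs

def minkowski_lhs_rhs(x, y, p):
    lhs = np.sum(np.abs(x + y)**p)**(1/p)
    rhs = np.sum(np.abs(x)**p)**(1/p) + np.sum(np.abs(y)**p)**(1/p)
    return lhs, rhs

rng = np.random.default_rng(0)
x, y, p = rng.normal(size=5), rng.normal(size=5), 3.0
print(holder_lhs_rhs(x, y, p))      # lhs <= rhs  (Hölder)
print(minkowski_lhs_rhs(x, y, p))   # lhs <= rhs  (Minkowski)
```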

Exercises
1. Derive the Cauchy-Buniakowski-Schwarz inequal-
ity from Hölder's.
2. Study Hölder's inequality for p → ∞.
3. Study Minkowski's inequality for p → ∞.

52. Normed spaces


We have arrived at the concept of metric
space by concentrating on a single property of a set, that of distance.
Similarly, by concentrating on operations in a set we have arrived
at the concept of vector space. We now discuss vector spaces
with a metric.
It is obvious that if the concept of distance is in no way connected
with operations on elements, it is impossible to construct an inter-
esting theory whose facts would join together algebraic and metric
notions. We shall therefore impose additional conditions on the
metric introduced in a vector space.
In fact we have already encountered metric vector spaces. These
are, for example, a Euclidean and a unitary space with metric
(29.4). Far from always, however, does the need for such a metric
arise. Introducing a scalar product actually means introducing not
only distance between elements but also angles between them. In a
vector space it is most often required to introduce only an acceptable
definition of distance. The most important vector spaces of this
kind are the so-called normed spaces.
A real or complex vector space X is said to be a normed space if
each vector x E X is assigned a real number II x II called the norm
of a vector x, the following axioms holding:
(1) ||x|| > 0 if x ≠ 0, ||0|| = 0,
(2) ||λx|| = |λ| ||x||,
(3) ||x + y|| ≤ ||x|| + ||y||

for any vectors x and y and any number λ. The second axiom is
called the axiom of the absolute homogeneity of a norm and the third is
the triangle inequality axiom.
From the axiom of the absolute homogeneity of a norm it follows
that for any nonzero vector x we can find a number λ such that the
norm of the vector λx equals unity. To do this it suffices to take
λ = ||x||^{-1}. A vector whose norm equals unity will be called a
normed vector.
The triangle inequality for a norm yields one useful relation.
We have ||x|| ≤ ||y|| + ||x - y|| for any x and y. Hence ||x|| -
||y|| ≤ ||x - y||. Interchanging x and y we find ||y|| -
||x|| ≤ ||x - y||. Therefore

| ||x|| - ||y|| | ≤ ||x - y||.   (52.1)
A normed space is easy to convert into a metric one by setting

p (x, y) = ||x - y||.   (52.2)

Indeed, p (x, y) = 0 means ||x - y|| = 0, which according to
axiom (1) means that x = y. The symmetry of the distance intro-
duced is obvious. Finally, the triangle inequality for a distance is a
simple consequence of the triangle inequality for a norm. That is,

p (x, y) = ||x - y|| = ||(x - z) + (z - y)|| ≤ ||x - z|| + ||z - y|| = p (x, z) + p (z, y).

Notice that

||x|| = p (x, 0).   (52.3)
Metric (52.2) defined in a vector space possesses also the following
properties:

p (x + z, y + z) = p (x, y)

for any x, y, z ∈ X, i.e. distance remains unaffected by a translation
of vectors, and

p (λx, λy) = |λ| p (x, y)

for any vectors x, y ∈ X and any number λ, i.e. distance is an abso-
lutely homogeneous function.
If in a metric vector space X some metric satisfies these two addi-
tional requirements, then X may be regarded as a normed space if
a norm is defined by equation (52.3) for any x ∈ X.
Taking into account the relations of Section 29 it is easy to estab-
lish that a vector space with scalar product is a normed space. It
should be assumed that the norm of a vector is its length.
It is possible to give other examples of introducing a norm. In a vector space let vectors be given by their coordinates in some basis. If x = (x_1, x_2, ..., x_n), then we set
$$\| x \|_p = \Big( \sum_{k=1}^{n} |x_k|^p \Big)^{1/p},$$   (52.4)
where p ≥ 1. That the first two axioms hold for a norm is obvious and the third axiom is seen to hold from Minkowski's inequality

(51.6). Most commonly used are the following norms:
$$\| x \|_1 = \sum_{k=1}^{n} |x_k|, \qquad \| x \|_2 = \Big( \sum_{k=1}^{n} |x_k|^2 \Big)^{1/2}, \qquad \| x \|_\infty = \max_{1 \le k \le n} |x_k|.$$   (52.5)
The second of the norms is often called a Euclidean norm and designated || x ||_E.
In what follows we shall consider only normed spaces with metric (52.2). The convergence of a sequence of vectors in such a metric is called the convergence in the norm, the boundedness of a set of vectors is called the boundedness in the norm, etc.
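To make definitions (52.4) and (52.5) concrete, here is a minimal computational sketch in Python; the function names and the sample vectors are chosen only for illustration. It computes several p-norms of a coordinate vector and checks the triangle inequality numerically.

    # A minimal sketch: p-norms of a coordinate vector, p >= 1.
    def p_norm(x, p):
        # (52.4): || x ||_p = (sum of |x_k|^p)^(1/p)
        return sum(abs(t) ** p for t in x) ** (1.0 / p)

    def max_norm(x):
        # || x ||_inf = max |x_k|, the limiting case p -> infinity
        return max(abs(t) for t in x)

    x = [1.0, -2.0, 3.0]
    y = [0.5, 4.0, -1.0]
    s = [a + b for a, b in zip(x, y)]
    for p in (1, 2, 3):
        # triangle inequality (Minkowski): || x + y ||_p <= || x ||_p + || y ||_p
        assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
    print(p_norm(x, 1), p_norm(x, 2), max_norm(x))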

Exercises
1. Prove that there is a sequence of vectors whose norms form an infinitely large sequence.
2. Prove that for any numbers λ_i and vectors e_i
$$\Big\| \sum_{i=1}^{n} \lambda_i e_i \Big\| \le \sum_{i=1}^{n} |\lambda_i| \, \| e_i \|.$$
3. Prove that if x_n → x, y_n → y and λ_n → λ, then
|| x_n || → || x ||,   x_n + y_n → x + y,   λ_n x_n → λx.

53. Convergence in the norm and


coordinate convergence
In a real or complex finite dimensional vector space it is possible to introduce another concept of convergence besides the convergence in the norm. Consider a space X and let e_1, e_2, ..., e_n be its basis. For any sequence {x_m} of vectors of X there are expansions
$$x_m = \xi_1^{(m)} e_1 + \xi_2^{(m)} e_2 + \ldots + \xi_n^{(m)} e_n.$$   (53.1)
If for the vector
$$x_0 = \xi_1^{(0)} e_1 + \xi_2^{(0)} e_2 + \ldots + \xi_n^{(0)} e_n$$   (53.2)
there are limiting relations
$$\lim_{m \to \infty} \xi_k^{(m)} = \xi_k^{(0)}$$   (53.3)
for every k = 1, 2, ..., n, then we shall say that there is a coordinate convergence of {x_m} to the vector x_0.
Coordinate convergence is quite natural. If two vectors are "close", then it may be assumed that so should be the corresponding coordinates in the expansion with respect to the same basis. Finite dimensional normed spaces are noteworthy for the fact that the notions of convergence in the norm and of coordinate convergence are equivalent in them.
It is easy to show that coordinate convergence implies conver-
gence in the norm. Indeed, let (53.3) hold. Using the axioms of the
absolute homogeneity of a norm and of the triangle inequality we
conclude from (53.1) and (53.2) that
$$\| x_m - x_0 \| = \Big\| \sum_{k=1}^{n} (\xi_k^{(m)} - \xi_k^{(0)}) e_k \Big\| \le \sum_{k=1}^{n} |\xi_k^{(m)} - \xi_k^{(0)}| \, \| e_k \| \to 0.$$

The proof of the converse is essentially more involved.


Lemma 53.1. If in a finite dimensional normed space a sequence of
vectors is norm bounded, then the numerical sequences of all the coor-
dinates in the expansion of vectors with respect to any basis are bound-
ed too.
Proof. Let each vector of a sequence {x_m} be represented in the form (53.1). We introduce the notation
$$\sigma_m = \sum_{k=1}^{n} |\xi_k^{(m)}|$$
and prove that {σ_m} is bounded.
Suppose this is not the case. Then we may choose an infinitely large subsequence {σ_{m_p}}. Set
$$y_p = \frac{1}{\sigma_{m_p}} x_{m_p}.$$
If
$$y_p = \sum_{k=1}^{n} \eta_k^{(p)} e_k,$$
then of course
$$\eta_k^{(p)} = \frac{\xi_k^{(m_p)}}{\sigma_{m_p}}$$
for every k = 1, 2, ..., n and every m_p and we get
$$\sum_{k=1}^{n} |\eta_k^{(p)}| = 1.$$   (53.4)

Sequences {η_k^{(p)}} are bounded since by (53.4) | η_k^{(p)} | ≤ 1. Hence it is possible to choose a subsequence of vectors {y_{p_1}} such that {η_1^{(p_1)}} is convergent, i.e.
$$\lim_{p_1 \to \infty} \eta_1^{(p_1)} = \eta_1$$

for some number η_1. It is possible to choose in turn a subsequence {y_{p_2}} of {y_{p_1}} such that
$$\lim_{p_2 \to \infty} \eta_2^{(p_2)} = \eta_2$$
for some number η_2. As before
$$\lim_{p_2 \to \infty} \eta_1^{(p_2)} = \eta_1.$$
Continuing the process we choose a subsequence {y_{p_n}} of {y_{p_{n-1}}} such that there are limits
$$\lim_{p_n \to \infty} \eta_k^{(p_n)} = \eta_k$$   (53.5)
for every k = 1, 2, ..., n. By (53.4)
$$\sum_{k=1}^{n} |\eta_k| = 1.$$   (53.6)

Since coordinate convergence implies convergence in the norm, limiting relations (53.5) mean that
$$\lim_{p_n \to \infty} \| y_{p_n} - y \| = 0,$$   (53.7)
where
$$y = \sum_{k=1}^{n} \eta_k e_k.$$

By virtue of (53.6) y must not equal zero. On the other hand,
$$\| y_{p_n} \| = \frac{\| x_{m_{p_n}} \|}{\sigma_{m_{p_n}}} \to 0,$$
since the sequence {x_{m_{p_n}}} is norm bounded and {σ_{m_{p_n}}} is infinitely large. Hence it follows from (53.7) that || y || = 0, i.e. that y is a zero vector. This contradiction proves the lemma.
Theorem 53.1. In a finite dimensional normed space convergence in
the norm implies coordinate convergence.
Proof. Let {x_m} be a sequence of vectors converging in the norm to a vector x_0. It is obvious that it suffices to consider the case where x_0 = 0 and {x_m} contains no zero vectors. We represent vectors x_m as (53.1). The sequence of vectors
$$y_m = \frac{1}{\| x_m \|} x_m$$
will be norm bounded and by Lemma 53.1 the sequences of numbers
$$\eta_k^{(m)} = \frac{\xi_k^{(m)}}{\| x_m \|}$$
must be bounded for every k = 1, 2, ..., n. Since || x_m || → 0, this is possible if and only if ξ_k^{(m)} → 0 for every k. But this means that there is coordinate convergence of the sequence {x_m} to the vector x_0.
Coordinate convergence can be efficiently used in theoretical studies, but in practical applications it is more convenient to use convergence in the norm. This is mainly because in the study of vector spaces of high dimension it is hard to deal with a large number of coordinate sequences. Besides, we do not always know at least one basis. But even if we do, its use most often results in unjustifiably lengthy calculations.
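The equivalence of the two kinds of convergence is easy to observe numerically. The following Python sketch takes the space R³ with the standard basis and two of the norms of (52.5) as an illustrative choice; the norms of x_m − x_0 are seen to tend to zero as the coordinates of x_m tend to those of x_0.

    # Sketch: coordinate convergence accompanied by convergence in the norm.
    def norm_1(x):
        return sum(abs(t) for t in x)

    def norm_inf(x):
        return max(abs(t) for t in x)

    x0 = [1.0, 2.0, 3.0]                    # coordinates of the limit vector
    for m in (1, 10, 100, 1000):
        xm = [c + 1.0 / m for c in x0]      # every coordinate tends to that of x0
        d = [a - b for a, b in zip(xm, x0)]
        print(m, norm_1(d), norm_inf(d))    # both norms of x_m - x_0 tend to 0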

Exercises
1. Is the requirement that a space should be finite
dimensional essential in proving the equivalence of the two kinds of con-
vergence?
2. Prove that if some set of vectors of a finite dimensional space is bounded in one norm, then it will be bounded in any other norm.
3. Prove that if in a finite dimensional space x_n → x in one norm, then x_n → x in any other norm.

54. Completeness of normed spaces


Finite dimensional normed spaces are spaces
where many statements similar to those associated with the concept
of limit in number sets are true. Consider some of them.
Lemma 54.1. It is possible to choose a subsequence of any bounded
sequence of vectors of a finite dimensional normed space, convergent in
that space.
Proof. Let {x_m} be an arbitrary norm bounded sequence. Represent vectors x_m as (53.1). By Lemma 53.1 sequences {ξ_k^{(m)}} are bounded for every k = 1, 2, ..., n. Choose, in the same way as in the proof of Lemma 53.1, a subsequence {x_{m_n}} of {x_m} such that there are limiting relations ξ_k^{(m_n)} → ξ_k^{(0)} for every k. It follows that {x_{m_n}} converges in the norm to vector (53.2).
This lemma is similar to the well-known Bolzano-Weierstrass theorem of mathematical analysis. It is of great importance for studies of any finite dimensional normed spaces. As an illustration, we prove some statements.
Theorem 54.1. Any finite dimensional normed space is complete.
Proof. Let {x_m} be a fundamental sequence. It is bounded. Choose a convergent subsequence {x_{m_n}} and denote by x_0 its limit. We have
|| x_m − x_0 || ≤ || x_m − x_{m_n} || + || x_{m_n} − x_0 ||.
Take an arbitrary number ε > 0. Since {x_m} is fundamental, there is N_1 such that || x_m − x_{m_n} || < ε/2 for m, m_n > N_1. Since the sequence {x_{m_n}} converges to x_0, there is N_2 such that || x_{m_n} − x_0 || < ε/2 for m_n > N_2. If N is the maximum number of N_1 and N_2, for m > N
|| x_m − x_0 || < ε.
The number ε is arbitrary. Hence the fundamental sequence {x_m} converges in the norm to the vector x_0.
Lemma 54.2. Any finite dimensional subspace X 0 of a normed space
X is a closed set.
Proof. Consider in a normed space X a finite dimensional subspace X_0. Let a vector x ∈ X be a cluster point for X_0. This means that there is a sequence {x_m} of vectors of X_0, distinct from x, such that || x_m − x || → 0. The sequence {x_m} is bounded and therefore it is possible to choose a subsequence {x_{m_p}} of {x_m}, converging, in view of the completeness of X_0, to some vector x_0 ∈ X_0. Now we have
|| x − x_0 || ≤ || x − x_{m_p} || + || x_{m_p} − x_0 || → 0,
i.e. x = x_0.
Lemma 54.3. Let X be a normed space and let X_0 be its finite dimensional subspace distinct from it. There is a normed vector x ∉ X_0 such that || x − x_0 || ≥ 1 for any vector x_0 ∈ X_0.
Proof. The subspace X_0 does not coincide with X and therefore there is a vector x′ ∉ X_0. Since X_0 is closed, we have
$$\inf_{x_0 \in X_0} \| x' - x_0 \| = d > 0.$$   (54.1)
By the definition of the greatest lower bound there is a vector x_0^{(k)} in X_0 for which
d ≤ || x′ − x_0^{(k)} || ≤ (1 + 2^{−k}) d.
The sequence {x_0^{(k)}} is bounded. Choose a subsequence {x_0^{(k_p)}} of it, converging, in view of the completeness of X_0, to some vector x_0′ ∈ X_0. For that vector obviously
|| x′ − x_0′ || = d.   (54.2)
Set
x = (1/d) (x′ − x_0′).
It is clear that || x || = 1. Moreover, if x_0 ∈ X_0, then by (54.1)
|| x − x_0 || = || (1/d) x′ − (1/d) x_0′ − x_0 || = (1/d) || x′ − (x_0′ + d x_0) || ≥ (1/d) · d = 1,
since the vector x_0′ + d x_0 is in X_0.
Incidentally, we have proved that the lower bound (54.1) is reached on at least one vector x_0′ ∈ X_0. In || x − x_0 || ≥ 1 equality clearly holds for x_0 = 0.
To conclude, Lemma 54.1, which plays so great a part in finite dimensional spaces, holds in no infinite dimensional space. Namely, we have
Lemma 54.4. If it is possible to choose a convergent subsequence of any bounded sequence of vectors of a normed space X, then X is finite dimensional.
Proof. Suppose the contrary. Let X be an infinite dimensional space. Choose an arbitrary normed vector x_1 and denote by L_1 its span. By Lemma 54.3 there is a normed vector x_2 such that || x_2 − x_1 || ≥ 1. Denote by L_2 the span of x_1 and x_2. Continuing the reasoning we find a sequence {x_n} of normed vectors satisfying || x_n − x_k || ≥ 1 for every k < n. Hence it is impossible to choose any convergent subsequence of {x_n}. This contradicts the hypothesis of the lemma and therefore the assumption that X is infinite dimensional was false.
Exercises
1. Prove that a plane in a normed finite dimensional space is a closed set.
2. Prove that a set of vectors x of a finite dimensional space satisfying || x || ≤ a is a closed set.
3. Prove that in a closed bounded set of vectors of a finite dimensional space there are vectors on which both the lower and the upper bound of values of any norm are attained.
4. Prove that given any two norms, || x ||_I and || x ||_II, in a finite dimensional space there are positive α and β such that
α || x ||_I ≤ || x ||_II ≤ β || x ||_I
for every vector x. The numbers α and β are independent of x.

55. The limit and computational processes


In complete metric spaces the concept of limit
is widely used in constructing and justifying various computational
processes. We consider as an example one method of solving systems
of linear algebraic equations.
Given a system of two equations in two unknowns, assume that it
is compatible and has a unique solution. For simplicity of presentation suppose that all coefficients are real. Each of the equations
a_11 x + a_12 y = f_1,
a_21 x + a_22 y = f_2
of the system defines in the plane a straight line. The point M of the intersection of the straight lines gives a solution of the system (Fig. 55.1).
Take a point M_0 lying on none of the straight lines in the plane. Drop from it a perpendicular to either straight line. The foot M_1 of the perpendicular is closer to M than M_0 is, since a projection is always smaller than the inclined line. Drop then a perpendicular from M_1 to the other straight line. The foot M_2 of that perpendicular is still closer to the solution. Successively projecting now onto one straight line, now onto the other, we obtain a sequence {M_k} of points of the plane converging to M (Fig. 55.1). It is important to note that the sequence constructed converges for any choice of the initial point M_0.
This example suggests how to construct a computational process.
to solve a system of linear algebraic equations of the general form
(48.2). We replace the problem by an equivalent problem of finding
vectors of the intersection of a system of hyperplanes (46.9). Sup-
pose the hyperplanes contain at least one common vector and a'>sume
for simplicity that the vector space is real.
Choose a vector v0 and project it onto the first hyperplane. ProJect
the resulting vector v1 onto the second hyperplane and so on. This
process determines some sequence {vp}· Let us investigate that
sequence.
Basic to the computational process is projecting some vector v_p onto the hyperplane given by equation (46.8). It is clear that v_{p+1} satisfies that equation and is related to v_p by the equation
v_{p+1} = v_p + tn
for some number t. Substituting v_{p+1} in (46.8) determines t. From this we get
$$v_{p+1} = v_p + \frac{b - (n, v_p)}{(n, n)}\, n.$$
This formula says that all vectors of the sequence {v_p} are in the plane obtained by translating the span L (n_1, n_2, ..., n_k) by the vector v_0. But all vectors which are in the
intersection of hyperplanes (46.9) are in the plane obtained by translating the orthogonal complement L^⊥ (n_1, n_2, ..., n_k). There is a unique vector z_0 that is in both planes.
If we prove that some subsequence of {v_p} converges to some vector that is in hyperplanes (46.9), then by the closure of a plane it is to z_0 that it converges. Moreover, so does the entire sequence {v_p}.
For any r the vectors z_0 − v_{r+1} and v_{r+1} − v_r are orthogonal and therefore by the Pythagorean theorem
ρ² (z_0, v_r) = ρ² (z_0, v_{r+1}) + ρ² (v_r, v_{r+1}).
Summing the equations obtained over r from 0 to p − 1 we find
$$\rho^2 (z_0, v_0) = \rho^2 (z_0, v_p) + \sum_{r=0}^{p-1} \rho^2 (v_r, v_{r+1}).$$
Consequently
$$\sum_{r=0}^{p-1} \rho^2 (v_r, v_{r+1}) \le \rho^2 (z_0, v_0),$$
from which we conclude that
ρ (v_p, v_{p+1}) → 0.   (55.1)
Denote by H_r the hyperplane in the rth row of (46.9). It is clear that the distance from v_p to H_r is not greater than the distance between v_p and any vector of H_r. By the construction of {v_p}, among any k consecutive vectors of {v_p} there is necessarily a vector that is in each of the hyperplanes. Using the triangle inequality and the limiting relation (55.1) we get
ρ (v_p, H_r) ≤ ρ (v_p, v_{p+1}) + ρ (v_{p+1}, v_{p+2}) + ... + ρ (v_{p+k−1}, v_{p+k}) → 0   (55.2)
for every r = 1, 2, ..., k.
The sequence {v_p} is obviously bounded. Choose some convergent subsequence of it. Let that subsequence converge to a vector z_0′. Proceeding in (55.2) to the limit we find that
ρ (z_0′, H_r) = 0
for every r = 1, 2, ..., k. But, as already noted earlier, the vector z_0′ must coincide with z_0. Consequently, {v_p} converges to z_0.
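The successive projection process described above (often called the Kaczmarz method in the literature) is easily turned into a program. The following Python sketch with the NumPy library implements the step v_{p+1} = v_p + ((b_r − (n_r, v_p))/(n_r, n_r)) n_r, cycling through the hyperplanes; the test system and the number of sweeps are an arbitrary illustrative choice.

    import numpy as np

    def projection_method(A, b, v0, sweeps=200):
        # Successively project onto the hyperplanes (a_r, v) = b_r, r = 1, ..., k.
        v = v0.astype(float).copy()
        k = A.shape[0]
        for _ in range(sweeps):
            for r in range(k):
                n = A[r]                            # normal vector of the r-th hyperplane
                v = v + (b[r] - n @ v) / (n @ n) * n
        return v

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    b = np.array([3.0, 5.0])
    v = projection_method(A, b, v0=np.array([10.0, -7.0]))
    print(v, np.linalg.solve(A, b))                 # the iterates approach the solution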

Exercises
1. Were the concepts of completeness and closure actu-
ally used in the above investigation?
2. How are other solutions of the system to be found if there are any?
3. How will the process behave if the system is incompatible?
PART II

Linear Operators

CHAPTER 7

Matrices and Linear Operators

56. Operators
A major point in creating the foundations of
mathematical analysis is the introduction of the concept of func-
tion. By definition, to specify a function it is necessary to indicate
two sets, X and Y, of real numbers and to formulate a rule assign-
ing to each number x ∈ X a unique number y ∈ Y. That rule is a single-valued function of a real variable x given on the set X.
In realizing the general idea of functional dependence it is not at all necessary to require that X and Y should be sets of real numbers. Understanding by X and Y various sets of elements we arrive at the following definition generalizing the concept of function.
A rule assigning to each element x of some nonempty set X a unique element y of a nonempty set Y is called an operator. A result y of applying an operator A to an element x is designated
y = A (x),   y = Ax   (56.1)
and A is said to be an operator from X to Y or to map X into Y. The set X is said to be the domain of A. An element y of (56.1) is the image of an element x and x is the inverse image of y. The collection T_A of all images is the range (or image) of A. If each element y ∈ Y has only one inverse image, then operator (56.1) is said to be 1-1. An operator is also called a mapping, transformation or operation.
In what follows we shall mainly consider the so-called linear opera-
tors. Their distinctive features are as follows. First, the domain of
a linear operator is always some vector space or linear subspace.
Second, the properties of a linear operator are closely related to
operations on vectors of a vector space. As a rule in our study of
linear operators we shall assume that spaces are given over a field
of real or complex numbers. Unless otherwise stated, operator will
mean linear operator. In the general theory of operators linear opera-
tors play as important a part as the straight line and the plane do
in mathematical analysis. That is why they require a detailed study.
Let X and Y be vector spaces over the same field P. Consider an
operator A whose domain is X and whose range is some set

in Y. The operator A is said to be linear if
A (αu + βv) = αAu + βAv   (56.2)
for any vectors u, v ∈ X and any numbers α, β ∈ P.
We have already repeatedly encountered linear operators. Accord-
ing to (9.8) a linear operator is the magnitude of a directed line
segment. Its domain is the set of all directed line segments of an
axis and its range is the set of all real numbers. Also a linear opera-
tor is by (21.2) an isomorphism between two vector spaces. Fix in
a vector space with a scalar product some subspace L. We obtain
two linear operators if each vector of the space is assigned either its
projection onto L or the perpendicular from that vector to L. This
follows from (30.5) and (30.6).
An operator assigning to each vector x of X the zero vector of Y
is obviously a linear operator. It is called the zero operator and
designated 0. So
0 =Ox.
Assigning to each vector x E X the same vector x yields a linear
operator E from X to X. This is called the identity or unit operator.
By definition
x =Ex.
Let A be some operator from X to Y. We construct a new opera-
tor B according to the prescription Bx = -Ax. The resulting
operator B is also a linear operator from X to Y. It is said to be
opposite to A.
Finally fix an arbitrary number α and assign to each vector x ∈ X the vector αx ∈ X. The resulting operator is certainly a linear operator. It is called a scalar operator. When α = 0, we obtain a zero operator, and when α = 1, we obtain an identity operator.
We shall soon describe a general method for constructing linear
operators, and now we point out some of their characteristic fea-
tures. By (56.2)
$$A \Big( \sum_{i=1}^{p} \alpha_i x_i \Big) = \sum_{i=1}^{p} \alpha_i A x_i$$
for any vectors x_i and any numbers α_i. In particular, it follows that any linear operator A transforms a zero vector into a zero vector, i.e.
0 = A0.
The range T_A of a linear operator A is a subspace of Y. If z = Au and w = Av, then a vector αz + βw is clearly the image of a vector αu + βv for any numbers α and β. Hence the vector αz + βw is in the range of A. The dimension of T_A is called the rank of A and denoted by r_A.

Along with T_A consider the set N_A of all vectors x ∈ X satisfying
Ax = 0.
It is also a subspace and is called the kernel or null space of A. The dimension n_A of the kernel is called the nullity of A.
The rank and nullity are not independent characteristics of a linear operator A. Let X have a dimension m. Decompose it as a direct sum
X = N_A + M_A,   (56.3)
where N_A is the kernel of A and M_A is any complementary subspace. Take a vector x ∈ X and represent it as a sum
x = x_N + x_M,
where x_N ∈ N_A and x_M ∈ M_A. If y = Ax, then since A is linear and Ax_N = 0,
y = Ax_M.
Hence any vector from T_A has at least one inverse image in M_A.


In fact, that inverse image is unique in M_A. Suppose for some vector y ∈ T_A we have two inverse images, x_M′, x_M″ ∈ M_A. Since M_A is a subspace, x_M′ − x_M″ ∈ M_A. Since x_M′ and x_M″ are the inverse images of the same vector y, x_M′ − x_M″ ∈ N_A. Subspaces M_A and N_A have only a zero vector in common. Therefore x_M′ − x_M″ = 0, i.e. x_M′ = x_M″.
Thus the operator A establishes a 1-1 correspondence between the vectors of T_A and M_A. By virtue of the linearity of A this correspondence is an isomorphism. Hence the dimensions of T_A and M_A coincide and are equal to r_A. Now it follows from (56.3) that
r_A + n_A = m.   (56.4)
Note that the linear operator A establishes an isomorphism between T_A and any subspace M_A of X that in the direct sum with the kernel of A constitutes the entire space X. It may therefore be assumed that
every linear operator A generates an entire family of other linear
operators. First, it is the zero operator defined on the kernel N A•
i.e. the one from N A to 0. Second, it is a set of linear operators from
the subspaces M A complementary to the kernel to the subspace T A·
It is a very important fact that each of the new operators coincides
on its domain with A. If N A = 0, then M A = X and the entire
second set of operators coincides with A. If N A = X, however,
then A is a zero operator. We shall continue this discussion later on.
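For an operator given by its matrix (matrices of operators are introduced in Section 60), relation (56.4) can be checked directly. Here is a short sketch with NumPy; the sample matrix is arbitrary.

    import numpy as np

    # Rank plus nullity equals the dimension of the domain, relation (56.4).
    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])          # an operator from a 3-dimensional space
    m = A.shape[1]                            # dimension of the domain X
    r_A = np.linalg.matrix_rank(A)            # rank, the dimension of T_A
    n_A = m - r_A                             # nullity, the dimension of N_A
    print(r_A, n_A, r_A + n_A == m)           # prints 1 2 True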

Exercises
Prove that the following operators are linear.
1. A basis is given in a vector space X. An operator A assigns to each vector x ∈ X its coordinate with a fixed index.
2. A vector x_0 is fixed in a space X with a scalar product. An operator A assigns to each vector x ∈ X a scalar product (x, x_0).
3. A vector x_0 is fixed in a space V_3. An operator A assigns to each vector x ∈ V_3 a vector product [x, x_0].
4. A space X is formed by polynomials with real coefficients. An operator A assigns to each polynomial its kth derivative. It is called an operator of k-fold differentiation.
5. In a space of polynomials dependent on a variable t, an operator A assigns to each polynomial P (t) a polynomial t·P (t).
6. A space X is decomposed as a direct sum of subspaces S and T. Represent each vector x ∈ X as a sum x = u + v, where u ∈ S and v ∈ T. An operator A assigns to a vector x a vector u. It is called an operator of projection onto S parallel to T.

57. The vector space of operators


Fix two vector spaces, X and Y, over the
same field P and consider the set Wxy of all linear operators from
X toY. In Wxy we can introduce the operations of operator addition
and of multiplication of an operator by a number from P turning
thereby Wxy into a vector space.
Two operators, A and B, from X to Y are said to be equal if
Ax = Bx
for each vector x ∈ X. It is easy to verify that the equality relation
of the operators is an equivalence relation. The equality of operators
is designated
A= B.
An operator C is said to be the sum of operators A and B from
X to Y if
Cx =Ax+ Bx
for each x E X. A sum of operators is designated
C =A+ B.
By definition it is possible to add any operators from X to Y.
If A and B are linear operators of Wxy, then so is their sum. For
any vectors u, v ∈ X and any numbers α, β ∈ P we have
C (αu + βv) = A (αu + βv) + B (αu + βv) = αAu + βAv + αBu + βBv = α (Au + Bu) + β (Av + Bv) = αCu + βCv.

The addition of operators is an algebraic operation. It is also asso-


ciative. Indeed, let A, B and C be three arbitrary linear operators of Wxy. Then for any vector x ∈ X
((A + B) + C) x = (A + B) x + Cx = Ax + Bx + Cx = Ax + (Bx + Cx) = Ax + (B + C) x = (A + (B + C)) x.
But this means that
(A + B) + C = A + (B + C).
The addition of operators is a commutative operation. If A and B are any operators of Wxy and x is any vector of X, then
(A + B) x = Ax + Bx = Bx + Ax = (B + A) x,
i.e.
A+B=B+A.
Now it is easy to show that the set Wxy, with the operator addition introduced in it, is an Abelian group. It has at least one zero
element, for example a zero operator. Each element of Wxy has at
least one opposite element, for example an opposite operator. Every-
thing else follows from Theorem 7 .1.
It follows from the same theorem that the addition of operators
has an inverse. We shall call it subtraction and use the generally
accepted notation and properties.
An operator C is said to be the product of an operator A from X to Y by a number λ from a field P if
Cx = λ·Ax
for each x ∈ X. This product is designated
C = λA.
A product of a linear operator of Wxy by a number is again a linear operator of Wxy. Indeed, for any vectors u, v ∈ X and any numbers α, β ∈ P we have
C (αu + βv) = λA (αu + βv) = λ (αAu + βAv) = α (λAu) + β (λAv) = αCu + βCv.
It is not hard to show that the addition of operators and the multi-
plication of an operator by a number satisfy all properties that define
a vector space. Hence the set Wxy of all linear operators from avec-
tor space X to a vector space Y forms a new vector space. It follows
that from the point of view of the operations of multiplication of
an operator by a number and of addition and subtraction of opera-
tors all the rules of equivalent transformations of operator algebraic
expressions are valid. In what follows these rules will no longer be
stated explicitly.

Note that nowhere did we use the relation between vector


spaces X and Y. They may be both distinct and coincident. The set Wxx of linear operators from a space X to the same space X will be one of the main objects of our studies. We shall call them linear operators in X.

Exercises
1. Prove that multiplying an operator by a nonzero number leaves its rank and nullity unaffected.
2. Prove that the rank of a sum of operators is at most a sum of the ranks of the summands.
3. Prove that a set of linear operators of Wxy whose ranges are in the same subspace is a linear subspace.
4. Prove that a system of two nonzero operators of Wxy whose ranges are distinct is linearly independent.
5. Prove that a space of linear operators in V_1 is one-dimensional.

58. The ring of operators


Consider three vector spaces X, Y and Z
over the same field P. Let A be an operator from X to Y and
let B be an operator from Y to Z.
An operator C from X to Z is said to be a product of B by A if
Cx = B (Ax)
for each vector x ∈ X. A product of B and A is designated
C = BA.
A product of linear operators is again a linear operator. For any vectors u, v ∈ X and any numbers α, β ∈ P we have
C (αu + βv) = B (A (αu + βv)) = B (αAu + βAv) = αB (Au) + βB (Av) = αCu + βCv.
The multiplication of operators is not an algebraic operation if only because a product is not defined for every pair of operators. Nevertheless, when realizable, operator multiplication possesses quite definite properties. Namely:
(1) (AB) C = A (BC),
(2) λ (BA) = (λB) A = B (λA),
(3) (A + B) C = AC + BC,
(4) A (B + C) = AB + AC   (58.1)

for any operators A, B and C and any number λ from P if of course


the corresponding expressions are defined.
The proofs of all these properties are similar, and therefore we
shall restrict ourselves to the study of the first property. Let X, Y,
Z and U be fixed vector spaces; and let A, B and C be any linear
operators, where A is an operator from X to Y, B is an operator from
Y to Z, and C is an operator from Z to U. Observe first of all that in
equation (1) both operators, (AB) C and A (BC), are defined. For
any vector x E X we have
((AB) C) x = AB (Cx) = A (B (Cx)),
(A (BC)) x = A (BCx) = A (B (Cx)),
which shows the validity of equation (1).
Again consider the set Wxx of linear operators in X. For any two
operators of Wxx both a sum and a product are defined. According
to properties (3) and (4) both operations are connected by the distrib-
utive law. The set Wxx of linear operators is therefore a ring. It
will be shown in what follows that a ring of operators is noncommu-
tative. It may of course happen that for some particular pair of
operators A and B the relation AB = BA does hold. Such operators
will be called commutative. In particular, an identity operator is
commutative with any operator.
In a ring of linear operators, as in any other ring, a product of
any operator by a zero operator is again a zero operator. The distrib-
utive law relates to multiplication not only a sum of operators but
also their difference. A ring of linear operators is at the same time
a vector space and therefore for a difference of operators we have
A - B = A + (-1) B.
Property (2) of (58.1) describes the relation of multiplication of oper-
ators in a ring to multiplication by a number. Remaining valid
of course are all the relations following from the properties of vector
spaces.

Exercises
1. In a space of polynomials in t, denote by D an operator of differentiation and by T an operator of multiplication by t. Prove that DT ≠ TD. Find the operator DT − TD.
2. Fix some operator B of the space Wxx. Prove that the set of operators A such that BA = 0 is a subspace in Wxx.
3. Prove that the rank of a product of operators is not greater than the rank of each of the factors.
4. Prove that the nullity of a product of operators is not less than the nullity of each of the factors.
5. Prove that in the ring Wxx of linear operators there are zero divisors.

59. The group of nonsingular operators


Linear operators in a space X form an Abelian
group relative to addition. But we can find among such operators
sets that are groups relative to multiplication. These are connected
with the so-called nonsingular operators.
An operator in a vector space is said to be nonsingular if its kernel
consists only of a zero vector. An operator that is not nonsingular is
termed singular.
Nonsingular operators are, for example, the identity operator and
the scalar operator, provided it is not zero. Sometimes it is possible
to associate with an operator A in X some nonsingular operator even
in the case where A is singular. Indeed, let T A be the range of A
and let N A be its kernel. If T A and N A have no nonzero vectors in
common, then by (56.4) we have
X = N_A + T_A.
As already noted, an operator A generates many other operators
from any subspace complementary to N A to the subspace of values
T A· In the case considered A generates an operator from T A to T A•
That operator is nonsingular since it sends to zero only the zero
vector of T A·
Nonsingular operators possess many remarkable properties. For
such operators the nullity is zero and therefore it follows from (56.4)
that the rank of a nonsingular operator equals the dimension of the
space. If a nonsingular operator A is an operator in X, then its
range T A coincides with X. Thus each vector of X is the image of
some vector of X. This property of a non singular operator is equiva-
lent to its definition.
An important property of the nonsingular operator is the uniqueness
of the inverse image for any vector of a space. Indeed, suppose that
for some vector y there are two inverse images, u and v. This
means that
Au= y, Av = y.
But then
A (u- v) = 0.
By the definition of a nonsingular operator its kernel consists only
of a zero vector. Therefore u − v = 0, i.e. u = v. This property is
also equivalent to the definition of a nonsingular operator. Actually
it has already been noted in Section 56.
A product of any finite number of nonsingular operators is also
a nonsingular operator. Obviously it suffices to prove this assertion
for two operators. Let A and B be any nonsingular operators in the
same space X. Consider the equation
BAz = 0. (59.1)

According to the definition of operator multiplication this equation


means that
B (Ax)= 0.
The operator B is nonsingular and therefore it follows from the last
equation that Ax = 0. But A is also nonsingular and therefore
x = 0. So only a zero vector satisfies (59.1), i.e. the operator BA
is nonsingular.
A sum of nonsingular operators is not necessarily a nonsingular
operator. If A is a nonsingular operator, then so is the operator
( -1) A. But the sum of these operators is a zero operator which
is singular.
Consider a set of nonsingular operators in the same vector space.
On that set the multiplication of operators is an algebraic and asso-
ciative operation. Among nonsingular operators is also the identity
operator E which plays the part of unity. Indeed it is easy to verify
that for any operator A in X
AE = EA =A.
If we show that for any nonsingular operator A there is a nonsingu-
lar operator such that a product of it and A is an identity operator,
then this will mean that the set of all nonsingular operators is
a group relative to multiplication.
Let A be a nonsingular operator. As we know, for each vector
y E X there is one and only one vector x EX connected with y by
the relation
y =Ax. (59.2)
Consequently, it is possible to assign to each vector y E X a unique
vector x E X such that y is its image. The relation constructed is some
operator. It is called the inverse of the operator A and designated A⁻¹. If (59.2) holds, then
x = A⁻¹y.   (59.3)
We prove that an inverse operator is linear and nonsingular.
A product is defined for any operators and not only for linear operators. Therefore it follows from the definition of an inverse operator that
A⁻¹A = AA⁻¹ = E.   (59.4)
To prove these equations it suffices to apply the operator A⁻¹ to both sides of (59.2) and the operator A to both sides of (59.3).
Take any vectors u, v ∈ X and any numbers α, β ∈ P and consider the vector
z = A⁻¹ (αu + βv) − αA⁻¹u − βA⁻¹v.

Now apply A to both sides of the equation. From the linearity of A


and from (59.4) we conclude that Az = 0. Since A is nonsingular,
this means that z = 0. Consequently,
A⁻¹ (αu + βv) = αA⁻¹u + βA⁻¹v,
i.e. A⁻¹ is linear.
It is easy to show that A⁻¹ is nonsingular. For any vector y from the kernel of A⁻¹ we have
A⁻¹y = 0.
Apply A to both sides of this equation. Since A is a linear operator, A0 = 0. Considering (59.4) we conclude that y = 0. So the kernel of A⁻¹ consists only of a zero vector, i.e. A⁻¹ is a nonsingular operator.
Thus the set of nonsingular operators is a group relative to multi-
plication. It will be shown somewhat later that this group is non-
commutative.
Using nonsingular operators it is possible to construct commuta-
tive groups too. Let A be an arbitrary operator in a space X. For any
positive integer p we define the pth power of A by the equation
p

(59.5)
where the right-hand side contains p multipliers. By virtue of the
associativity of the operation of multiplication the operator AP is
uniquely defined. Of course it is linear.
For any positive integers p and r it follows from (59.5) that
A^p A^r = A^{p+r}.   (59.6)
If it is assumed by definition that
A⁰ = E
for any operator A, then formula (59.6) will hold for any nonnegative integers p and r.
Suppose A is a nonsingular operator. Then so is the operator A^r for any nonnegative r. Hence there is an inverse operator for A^r. By (7.2) and (59.5) we have
(A^r)⁻¹ = (A⁻¹)^r = A⁻¹A⁻¹ ... A⁻¹   (r multipliers).   (59.7)
Also it is assumed by definition that
A^{−r} = (A^r)⁻¹.
Taking into account formulas (59.5) and (59.7) and the fact that AA⁻¹ = A⁻¹A, it is not hard to prove the relation
A^p A^{−r} = A^{−r} A^p

for any nonnegative integers p and r. This means that formula (59.6)
holds for any integers p and r.
Now take a nonsingular operator A and make up a set W_A of operators of the form A^p for all integers p. On that set the multiplication of operators is an algebraic and, as (59.6) implies, commutative operation. Every operator A^p has an inverse equal to A^{−p}. Contained in W_A is also an identity operator E. Hence W_A is a commutative group relative to multiplication. It is called a cyclic group generated by the operator A.
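In matrix form (see Section 60 and further) these facts are easy to test numerically. The NumPy sketch below, with an arbitrary nonsingular matrix, verifies relation (59.4) and the rule A^p A^r = A^{p+r} for integer exponents.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 1.0]])                    # a nonsingular operator in R^2
    E = np.eye(2)
    A_inv = np.linalg.inv(A)

    # A^{-1}A = AA^{-1} = E, relation (59.4)
    print(np.allclose(A_inv @ A, E), np.allclose(A @ A_inv, E))

    def power(M, p):
        # integer powers; negative ones are defined through the inverse, as in (59.7)
        return np.linalg.matrix_power(M if p >= 0 else np.linalg.inv(M), abs(p))

    # A^p A^r = A^{p+r} for any integers p and r, relation (59.6)
    print(np.allclose(power(A, 3) @ power(A, -2), power(A, 1)))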

Exercises
1. Prove that if for two linear operators A and B of Wxx we have AB = E, then both operators are nonsingular.
2. Prove that for operators A and B of Wxx to be nonsingular it is necessary and sufficient that so should be the operators AB and BA.
3. Prove that if an operator A is nonsingular and a number α ≠ 0, then the operator αA is also nonsingular and (αA)⁻¹ = (1/α)A⁻¹.
4. Prove that T_A ⊂ N_A if and only if A² = 0.
5. Prove that for any operator A
N_A ⊆ N_{A²} ⊆ N_{A³} ⊆ ...,   T_A ⊇ T_{A²} ⊇ T_{A³} ⊇ ....
6. Prove that an operator P is a projection operator if and only if P² = P. What are the subspaces N_P and T_P?
7. Prove that if P is a projection operator, then E − P is also a projection operator.
8. Prove that if an operator A satisfies A^m = 0 for some positive integer m, then the operator αE − A is nonsingular for any number α ≠ 0.
9. Prove that a linear operator A for which E + a_1 A + a_2 A² + ... + a_n A^n = 0 is nonsingular.
10. Prove that if A is a nonsingular operator, then either all operators in the cyclic group W_A are distinct or some power of A coincides with the identity operator.

60. The matrix of an operator


We discuss one general method of constructing
a linear operator from an m-dimensional space X to an n-dimen-
sional space Y. Suppose the vectors of a basis e_1, e_2, ..., e_m of X are assigned some vectors f_1, f_2, ..., f_m of Y. Then there is a unique linear operator A from X to Y which sends every vector e_k to a corresponding vector f_k.
Suppose that the desired operator A exists. Take a vector x ∈ X and represent it as
x = ξ_1 e_1 + ξ_2 e_2 + ... + ξ_m e_m.
Then
$$Ax = A \Big( \sum_{k=1}^{m} \xi_k e_k \Big) = \sum_{k=1}^{m} \xi_k A e_k = \sum_{k=1}^{m} \xi_k f_k.$$

The right-hand side of the relations is uniquely defined by x and


the images of the basis. Therefore the equation obtained proves the
uniqueness of the operator A if it exists. On the other hand, we can
define the operator A by this equation, i.e. we can put
$$Ax = \sum_{k=1}^{m} \xi_k f_k.$$

It is easy to verify that the operator obtained is a linear operator


from X to Y sending every vector e" to a corresponding vector f~~..
The range T A of A coincides with the span of the system of vectors
f1, f2, · · ., fm·
We can now draw an important conclusion: a linear operator A from a space X to a space Y is completely defined by the collection of images
Ae_1, Ae_2, ..., Ae_m
of any fixed basis
e_1, e_2, ..., e_m
of X.
Fix a basis e_1, e_2, ..., e_m in X and a basis q_1, q_2, ..., q_n in Y. A vector e_1 is sent by A to some vector Ae_1 of Y which, as any vector of that space, can be expanded with respect to the basis vectors
Ae_1 = a_{11}q_1 + a_{21}q_2 + ... + a_{n1}q_n.
Similarly
Ae_m = a_{1m}q_1 + a_{2m}q_2 + ... + a_{nm}q_n.


The coefficients a_ij of these relations define an n × m matrix
$$A_{qe} = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1m} \\ a_{21} & a_{22} & \ldots & a_{2m} \\ \ldots & \ldots & \ldots & \ldots \\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{pmatrix}$$
called the matrix of the operator A relative to the chosen bases.
The columns of the matrix of the operator are the coordinates of
the vectors Ae 1 , Ae 2 , • • • , Aem relative to the basis q1 , q 2 , • • • , qn.
To determine an element a_ij of the matrix of A we should apply A to the vector e_j and take the ith coordinate of the image Ae_j. If by {x}_i we denote for brevity the ith coordinate of a vector x, then a_ij = {Ae_j}_i. This method of determining the elements of the matrix of an operator will be made use of in what follows.

Consider a vector x ∈ X and its image y = Ax. We show how the coordinates of the vector y can be expressed in terms of the coordinates of x and the elements of the matrix of the operator. Let
x = ξ_1 e_1 + ξ_2 e_2 + ... + ξ_m e_m,   y = η_1 q_1 + η_2 q_2 + ... + η_n q_n.   (60.1)
Obviously
$$Ax = A \Big( \sum_{j=1}^{m} \xi_j e_j \Big) = \sum_{j=1}^{m} \xi_j A e_j = \sum_{j=1}^{m} \xi_j \sum_{i=1}^{n} a_{ij} q_i = \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} a_{ij} \xi_j \Big) q_i.$$
Comparing the right-hand side of these equations with expansion (60.1) for y we conclude that the equations
$$\sum_{j=1}^{m} a_{ij} \xi_j = \eta_i$$
must hold for i = 1, 2, ..., n, i.e. that
a_{11}ξ_1 + a_{12}ξ_2 + ... + a_{1m}ξ_m = η_1,
a_{21}ξ_1 + a_{22}ξ_2 + ... + a_{2m}ξ_m = η_2,
. . . . . . . . . . . . . . . . . . . . .
a_{n1}ξ_1 + a_{n2}ξ_2 + ... + a_{nm}ξ_m = η_n.   (60.2)

Thus, given fixed bases in X and Y, every linear operator generates relations (60.2) connecting the coordinates of the image and the inverse image. To determine the coordinates of the image from the coordinates of the inverse image it suffices to calculate the left-hand sides of (60.2). To determine the coordinates of the inverse image from the known coordinates of the vector y we have to solve the system of linear algebraic equations (60.2) for the unknowns ξ_1, ξ_2, ..., ξ_m. The matrix of the system coincides with the matrix of the operator.
Relations (60.2) establish a deep connection between linear operators and systems of linear algebraic equations. In particular, it follows from (60.2) that the rank of an operator coincides with that of the matrix of the operator and that the dimension of its kernel coincides with the number of fundamental solutions of the reduced homogeneous system. This fact trivially yields (56.4) and a number of other formulas.
We shall fairly often turn to the connection between linear algebraic equations and linear operators. But we shall first prove that there is a 1-1 correspondence between operators and matrices, which properly speaking determine systems of the form (60.2). We have already shown that every operator A determines some matrix A_qe given fixed bases. Now take an n × m matrix A_qe. With bases in X and Y fixed, relations (60.2) assign to each vector x ∈ X some vector y ∈ Y. It is easy to verify that this correspondence is a linear operator. We construct the matrix of a given operator in the same bases. All the

coordinates of a vector e_j are zero, except the jth coordinate which is equal to unity. It follows from (60.2) that the coordinates of the vector Ae_j coincide with the elements of the jth column of A_qe and therefore {Ae_j}_i = a_ij. Hence the matrix of the constructed operator coincides with the original matrix A_qe.
So every n X m matrix is the matrix of some linear operator from
an m-dimensional space X to an n-dimensional space Y, with bases
in X and Y fixed. A 1-1 correspondence is thus established between linear
operators and rectangular matrices given any fixed bases. Of course
both the vector spaces and the matrices are considered over the
same field P.
Consider some examples. Let O be a zero operator. We have
Oe_j = 0,   j = 1, 2, ..., m.
Hence all elements of the matrix of a zero operator are zero. Such a matrix is called a zero matrix and designated 0.
Now take an identity operator E. For this operator we find
Ee_j = e_j,   j = 1, 2, ..., m.
Therefore the matrix of an identity operator has the following form:
$$E = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & \ldots & 1 \end{pmatrix}.$$
It is a square matrix whose principal diagonal has unities and whose other elements are zeros. The matrix of an identity operator is called an identity or unit matrix and denoted by E.
We shall fairly often deal with yet another type of matrix. Let λ_1, λ_2, ..., λ_n be arbitrary numbers from a field P. We construct a square matrix Λ, with the given numbers along the principal diagonal and all other elements zero, i.e.
$$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & \ldots & \lambda_n \end{pmatrix}.$$
Matrices of this form are called diagonal. If all diagonal elements are equal, then the matrix is said to be a scalar matrix. In particular, an identity matrix is a scalar matrix. Rectangular matrices constructed in a similar way will also be called diagonal. If we consider relations (60.2), we shall easily see what the action of a linear operator with a matrix Λ is. This operator "stretches" the ith coordinate of any vector by a factor of λ_i for every i.
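The rule "the jth column of the matrix consists of the coordinates of Ae_j" translates directly into a program. The following NumPy sketch builds the matrix of an operator column by column; the operator of differentiation in the basis 1, t, t² is taken only as an example and is in the spirit of Exercise 1 below.

    import numpy as np

    def operator_matrix(image_of_basis_vector, m):
        # the j-th column is the coordinate column of Ae_j
        return np.column_stack([image_of_basis_vector(j) for j in range(m)])

    # Example: differentiation in the space of polynomials of degree < 3,
    # basis e_1 = 1, e_2 = t, e_3 = t^2; coordinates are the coefficients.
    def diff_image(j):
        # d/dt of t^j equals j * t^(j-1)
        image = np.zeros(3)
        if j > 0:
            image[j - 1] = j
        return image

    D = operator_matrix(diff_image, 3)
    print(D)
    x = np.array([5.0, 3.0, 2.0])                 # the polynomial 5 + 3t + 2t^2
    print(D @ x)                                  # by (60.2): the coordinates of 3 + 4t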

Exercises
1. In a space of polynomials of degree not higher than n a basis 1, t, t², ..., t^n is fixed. Of what form is the matrix of the operator of differentiation in that basis?
2. In a space X an operator P of projection onto a subspace S parallel to a subspace T is given. Fix in X any basis made up as a union of the bases of S and T. Of what form are the matrices of P and E − P in that basis?
3. Let A be a linear operator from X to Y. Denote by M_A a subspace in X complementary to the kernel N_A and by R_A a subspace in Y complementary to T_A. How will the matrix of A change if in choosing bases in X and Y we use bases of some or all of the indicated subspaces?

6t. Operations on matriees


As we have shown, given fixed bases in spaces
every linear operator is uniquely defined by its matrix. Therefore
the operations on operators discussed earlier lead to quite definite
operations on matrices. In the questions of interest to us now the
choice of basis plays no part and therefore operators and their ma-
trices will be denoted by the same letters with no indices relating
to bases.
Let two equal operators from an m-dimensional space X to an n-dimensional space Y be given. Since equal operators exhibit sameness in all situations, they will have the same matrix. This justifies the following definition.
Matrices A and B of the same size n × m with elements a_ij and b_ij are said to be equal if
a_ij = b_ij
for i = 1, 2, ..., n and j = 1, 2, ..., m. The equality of matrices is designated
A = B.
Suppose now that A and B are two operators from X to Y. Consider an operator C = A + B. Denote the elements of the matrices of the operators respectively by c_ij, a_ij and b_ij. According to the foregoing c_ij = {Ce_j}_i. Considering the definition of a sum of operators and the properties of the coordinates of vectors relative to the operations on them we get
c_ij = {Ce_j}_i = {(A + B) e_j}_i = {Ae_j + Be_j}_i = {Ae_j}_i + {Be_j}_i = a_ij + b_ij.
Therefore:
A sum of two matrices A and B of the same size n × m with elements a_ij and b_ij is a matrix C of the same size with elements c_ij if
c_ij = a_ij + b_ij

for i = 1, 2, ..., n and j = 1, 2, ..., m. A sum of matrices is designated
C = A + B.
A difference of two matrices A and B of the same size n × m with elements a_ij and b_ij is a matrix C of the same size with elements c_ij if
c_ij = a_ij − b_ij
for i = 1, 2, ..., n and j = 1, 2, ..., m. A difference of matrices is designated
C = A − B.
Consider an operator A from X to Y and an operator C = λA for some number λ. If a_ij and c_ij are elements of the matrices of the operators, then
c_ij = {Ce_j}_i = {λAe_j}_i = λ {Ae_j}_i = λa_ij,
and we arrive at the following definition:
A product of an n × m matrix A with elements a_ij by a number λ is an n × m matrix C with elements c_ij if
c_ij = λa_ij
for i = 1, 2, ..., n and j = 1, 2, ..., m. A product of a matrix by a number is designated
C = λA.
Let an m-dimensional space X and an n-dimensional space Y be
given over the same field P. As proved earlier, given f1xed bases in X
andY there is a 1-1 correspondence between the set Wxy of all opera-
tors from X to Y and the set of all n X m matrices with elements
from P. Since operations on matrices were introduced in accor-
dance with operations on operators, the set of n X m matrices, just
as the set Wxy, is a vector space.
It is easy to show one of the bases of the space of matrices. It is, for example, a system of matrices A^{(kp)} for k = 1, 2, ..., n and p = 1, 2, ..., m, where the elements a_ij^{(kp)} of a matrix A^{(kp)} are defined by the following equations:
a_ij^{(kp)} = 1 if i = k, j = p;   a_ij^{(kp)} = 0 otherwise.
In the space Wxy a basis is a system of operators with matrices A^{(kp)}. From this we conclude that the vector space of operators from X to Y is a finite dimensional space and that its dimension is equal to mn.
Let X, Y and Z be vector spaces, let A be an operator from X to Y and let B be an operator from Y to Z. Also let m, n and p be the dimensions of X, Y and Z respectively. Assume that bases e_1, ..., e_m, q_1, ..., q_n and r_1, ..., r_p are fixed in X, Y and Z. The operator A has an n × m matrix with elements a_ij, with
$$Ae_j = \sum_{s=1}^{n} a_{sj} q_s.$$
The operator B has a p × n matrix with elements b_ij, with
$$Bq_s = \sum_{k=1}^{p} b_{ks} r_k.$$
Investigating the matrix of the operator C = BA we conclude that it must be p × m and that its elements are as follows:
$$c_{ij} = \{Ce_j\}_i = \{BAe_j\}_i = \Big\{ B \Big( \sum_{s=1}^{n} a_{sj} q_s \Big) \Big\}_i = \Big\{ \sum_{s=1}^{n} a_{sj} Bq_s \Big\}_i = \Big\{ \sum_{s=1}^{n} a_{sj} \sum_{k=1}^{p} b_{ks} r_k \Big\}_i = \Big\{ \sum_{k=1}^{p} \Big( \sum_{s=1}^{n} b_{ks} a_{sj} \Big) r_k \Big\}_i = \sum_{s=1}^{n} b_{is} a_{sj}.$$
This formula suggests the following definition:
A product of a p × n matrix B with elements b_ij and an n × m matrix A with elements a_ij is a p × m matrix C with elements c_ij if
$$c_{ij} = \sum_{s=1}^{n} b_{is} a_{sj}$$   (61.1)
for i = 1, 2, ..., p and j = 1, 2, ..., m. A product of matrices is designated
C = BA.
Thus a product is defined only for the matrices in which the num-
ber of columns of the left factor is equal to the number of rows of
the right factor. The element of the matrix of the product at the inter-
section of the ith row and the jth column is equal to the sum of the
products of all the elements of the ith row of the left factor by the
corresponding elements of the jth column of the right factor.
We recall once again that there is a 1-1 correspondence between
linear operators and matrices. Operations on matrices were intro-
duced according to operations on operators. Therefore the operation
of matrix multiplication is connected by relations (58.1) with the
operations of matrix addition and of multiplication of a matrix by
a number.
We have already noted that a ring of operators and the group of
all nonsingular operators in a vector space are noncommutative. To
prove this it obviously suffices to find two square matrices A and B

such that AB ≠ BA. Take, for example,
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$
It is obvious that
$$AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad BA = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$
and the noncommutativity of multiplication is proved.
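Definition (61.1) is reproduced below as a short Python function (a plain sketch without any library), and a pair of 2 × 2 matrices like the one above confirms the noncommutativity.

    # Matrix product according to (61.1): c_ij = sum over s of b_is * a_sj.
    def mat_mul(B, A):
        p, n = len(B), len(B[0])
        assert n == len(A)                        # columns of B must match rows of A
        m = len(A[0])
        return [[sum(B[i][s] * A[s][j] for s in range(n)) for j in range(m)]
                for i in range(p)]

    A = [[0, 1],
         [0, 0]]
    B = [[0, 0],
         [1, 0]]
    print(mat_mul(A, B))                          # AB = [[1, 0], [0, 0]]
    print(mat_mul(B, A))                          # BA = [[0, 0], [0, 1]]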
The operation of matrix multiplication provides a convenient way of writing relations of the type (60.2). Denote by x_e an m × 1 matrix made up of the coordinates of a vector x and by y_q an n × 1 matrix made up of the coordinates of a vector y. Then relations (60.2) are equivalent to a single matrix equation
y_q = A_qe x_e,   (61.2)
which is called a coordinate equation corresponding to the operator equation
y = Ax
and relates in matrix form the coordinates of the inverse image and the image by means of the matrix of the operator.
It is important to note that the coordinate and the operator equa-
tion look quite alike from the notational point of view if of course
the indices are dropped and the symbol Ax is understood as a prod-
uct of A by x. Since the notation and the properties of operations
on matrices and operators coincide, any transformation of an opera-
tor equation leads to the same transformation of the coordinate
equation. Therefore formally it makes no difference whether we deal
with matrix equations or with operator equations.
In what follows we shall actually draw no distinction between
operator and coordinate equations. Moreover, all new notions and
facts that hold for operators will as a rule be extended to matrices, unless
otherwise noted.

Exercises
1. Prove that operations on matrices are related to the operation of transposition by
(αA)′ = αA′,   (A + B)′ = A′ + B′,   (AB)′ = B′A′,   (A′)′ = A.
2. Prove that every linear operator of rank r can be represented as a sum of r operators of rank 1 and cannot be represented as a sum of a smaller number of operators of rank 1.
3. Prove that an n X m matrix has rank 1 if and only if it can be represented
as a product of two nonzero, n X 1 and 1 X m, matrices.

4. Suppose that for fixed matrices A and B we have
AC = BC
for any matrix C. Prove that A = B.
5. Find the general form of a square matrix commutative with a given diagonal matrix.
6. Prove that for a matrix to be a scalar matrix it is necessary and sufficient that it should be commutative with all square matrices.
7. A sum of the diagonal elements of a matrix A is called the trace of A and designated tr A. Prove that
tr A = tr A′,   tr (αA) = α·tr A,   tr (A + B) = tr A + tr B,   tr (BA) = tr (AB).
8. Prove that a real matrix A is zero if and only if tr (AA′) = 0.

62. Matrices and determinants


Matrices play a very important part in the
study of linear operators, with determinants not infrequently used
as an auxiliary tool. We discuss now some questions connected with
matrices and determinants.
Let A be a nonsingular operator in a space X. Its rank coincides with the dimension of X. According to formulas (60.2) this means that the rank of the system of columns of the matrix of the operator coincides with the number of them. This is possible if and only if the determinant of the matrix of the operator is nonzero. So
An operator in a vector space is nonsingular if and only if the determinant of its matrix is nonzero.
This property of a nonsingular operator justifies the following
definitions:
A square matrix is said to be nonsingular if its determinant is
nonzero and singular otherwise.
Of course, relying on the corresponding properties of nonsingular
operators we can say that a product of nonsingular matrices is again
a nonsingular matrix, that all nonsingular matrices form a group
relative to multiplication, that every nonsingular matrix generates
a cyclic group and so on. Their connection with nonsingular opera-
tors allows us to say that every nonsingular matrix A has a unique
matrix A - 1 such that
A - 1A == A A - 1 = E. (62.1)
The matrix A - 1 is called the inverse of the matrix A.
Using the concept of determinant it is possible to express the explicit form of the elements of A⁻¹ in terms of the minors of the matrix A. Formulas (40.5) to (40.9) provide a basis for this. Taking into account formula (61.1) for an element of a product of two ma-
trices we conclude that equations (62.1) are satisfied by the matrix
$$A^{-1} = \begin{pmatrix} \dfrac{A_{11}}{d} & \dfrac{A_{21}}{d} & \ldots & \dfrac{A_{m1}}{d} \\ \dfrac{A_{12}}{d} & \dfrac{A_{22}}{d} & \ldots & \dfrac{A_{m2}}{d} \\ \ldots & \ldots & \ldots & \ldots \\ \dfrac{A_{1m}}{d} & \dfrac{A_{2m}}{d} & \ldots & \dfrac{A_{mm}}{d} \end{pmatrix}.$$

Here d is the determinant of the matrix A and A_ij is the cofactor of a_ij, an element of A. By virtue of the uniqueness of the inverse matrix it is the only form it can have.
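A numerical check of this cofactor form of the inverse: the sketch below (NumPy, an arbitrary 3 × 3 test matrix) builds A⁻¹ element by element as A_ji/d and compares it with a library inverse.

    import numpy as np

    def inverse_by_cofactors(A):
        # element (i, j) of the inverse equals A_ji / d, where A_ji is the
        # cofactor of a_ji and d = det A
        n = A.shape[0]
        d = np.linalg.det(A)
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor) / d
        return C

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 0.0, 2.0]])
    print(np.allclose(inverse_by_cofactors(A), np.linalg.inv(A)))   # True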
We introduce abbreviated notation for the minors of an arbitrary matrix A. A minor on the rows i_1, i_2, ..., i_p and the columns j_1, j_2, ..., j_p is designated
$$A \begin{pmatrix} i_1, i_2, \ldots, i_p \\ j_1, j_2, \ldots, j_p \end{pmatrix}.$$
In addition, it is assumed that if any indices in the upper (lower) row in the notation of a minor coincide, then this means that so do the corresponding rows (columns) of the minor itself.
Theorem 62.1 (Cauchy-Binet formula). Let a square n × n matrix C be equal to the product of two rectangular matrices A and B, of size n × m and m × n respectively, with m ≥ n. Then
$$C \begin{pmatrix} 1 & 2 & \ldots & n \\ 1 & 2 & \ldots & n \end{pmatrix} = \sum_{1 \le k_1 < k_2 < \ldots < k_n \le m} A \begin{pmatrix} 1 & 2 & \ldots & n \\ k_1 & k_2 & \ldots & k_n \end{pmatrix} B \begin{pmatrix} k_1 & k_2 & \ldots & k_n \\ 1 & 2 & \ldots & n \end{pmatrix}.$$   (62.2)

Proof. Denote by a_ij, b_ij and c_ij the elements of A, B and C. According to the definition of a matrix product we have
$$c_{ij} = \sum_{s=1}^{m} a_{is} b_{sj}.$$
Substituting for the elements of C their expressions and using the linear property of the determinant for column vectors we find
$$\det \begin{pmatrix} c_{11} & \ldots & c_{1n} \\ \ldots & \ldots & \ldots \\ c_{n1} & \ldots & c_{nn} \end{pmatrix} = \det \begin{pmatrix} \sum_{s_1=1}^{m} a_{1s_1} b_{s_1 1} & \ldots & \sum_{s_n=1}^{m} a_{1s_n} b_{s_n n} \\ \ldots & \ldots & \ldots \\ \sum_{s_1=1}^{m} a_{ns_1} b_{s_1 1} & \ldots & \sum_{s_n=1}^{m} a_{ns_n} b_{s_n n} \end{pmatrix}$$
$$= \sum_{s_1=1}^{m} \sum_{s_2=1}^{m} \ldots \sum_{s_n=1}^{m} \det \begin{pmatrix} a_{1s_1} b_{s_1 1} & \ldots & a_{1s_n} b_{s_n n} \\ \ldots & \ldots & \ldots \\ a_{ns_1} b_{s_1 1} & \ldots & a_{ns_n} b_{s_n n} \end{pmatrix}$$
$$= \sum_{s_1=1}^{m} \sum_{s_2=1}^{m} \ldots \sum_{s_n=1}^{m} A \begin{pmatrix} 1 & 2 & \ldots & n \\ s_1 & s_2 & \ldots & s_n \end{pmatrix} b_{s_1 1} b_{s_2 2} \ldots b_{s_n n}.$$   (62.3)

Each of the indices s_1, s_2, ..., s_n is independent of the others and may take on any values from 1 to m and so the expression obtained is a sum of m^n terms. In that sum those terms are zero at least two of whose indices are the same, since so are the corresponding minors of the matrix A. All the other terms can be divided into groups of n! terms each, regarding as a group all terms the values of whose indices form the same collection of numbers.
Denote by k_1, k_2, ..., k_n the values of the indices s_1, s_2, ..., s_n arranged in increasing order. Let
ε (s_1, s_2, ..., s_n) = (−1)^N,
where N is the number of transpositions required for a permutation s_1, s_2, ..., s_n to be transformed into k_1, k_2, ..., k_n. Then within one group of values of the indices s_1, s_2, ..., s_n the sum of the corresponding terms in (62.3) will be equal to
$$\sum \varepsilon (s_1, s_2, \ldots, s_n) A \begin{pmatrix} 1 & 2 & \ldots & n \\ k_1 & k_2 & \ldots & k_n \end{pmatrix} b_{s_1 1} b_{s_2 2} \ldots b_{s_n n} = A \begin{pmatrix} 1 & 2 & \ldots & n \\ k_1 & k_2 & \ldots & k_n \end{pmatrix} \sum \varepsilon (s_1, s_2, \ldots, s_n) b_{s_1 1} b_{s_2 2} \ldots b_{s_n n} = A \begin{pmatrix} 1 & 2 & \ldots & n \\ k_1 & k_2 & \ldots & k_n \end{pmatrix} B \begin{pmatrix} k_1 & k_2 & \ldots & k_n \\ 1 & 2 & \ldots & n \end{pmatrix}.$$
It is from this relation that (62.2) follows.

Corollary. The determinant of the product of two square matrices


equals the product of the determinants of the factors.
In this case the sum in (62.2) will consist of a single term. Therefore
$$C \begin{pmatrix} 1 & 2 & \ldots & n \\ 1 & 2 & \ldots & n \end{pmatrix} = A \begin{pmatrix} 1 & 2 & \ldots & n \\ 1 & 2 & \ldots & n \end{pmatrix} B \begin{pmatrix} 1 & 2 & \ldots & n \\ 1 & 2 & \ldots & n \end{pmatrix}$$

or equivalently
det C = det A ·det B.
Corollary. Let a square n × n matrix C equal the product of two rectangular, n × m and m × n, matrices A and B, with m < n. Then det C = 0.
Indeed, add to A n − m zero columns on the right and to B n − m zero rows below, and the matrices obtained become square n × n matrices with zero determinants. The product of those matrices is the matrix C. Therefore according to the first corollary det C = 0.
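Both the Cauchy-Binet formula and its first corollary are easy to verify numerically. In the Python sketch below (NumPy and itertools; the sizes and the random matrices are an illustrative choice) the right-hand side of (62.2) is summed over all increasing selections of n columns of A and the corresponding rows of B.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)
    n, m = 2, 4                                   # C = AB, A is n x m, B is m x n, m >= n
    A = rng.integers(-3, 4, size=(n, m)).astype(float)
    B = rng.integers(-3, 4, size=(m, n)).astype(float)

    # right-hand side of (62.2): sum over 1 <= k_1 < ... < k_n <= m of
    # det A(columns k) * det B(rows k)
    rhs = sum(np.linalg.det(A[:, k]) * np.linalg.det(B[k, :])
              for k in map(list, combinations(range(m), n)))
    print(np.isclose(np.linalg.det(A @ B), rhs))  # True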

Exercises
1. Prove that (A⁻¹)′ = (A′)⁻¹ for any nonsingular matrix A.
2. Prove that det (A⁻¹) = (det A)⁻¹ for any nonsingular matrix A.
3. Prove that det (αA) = αⁿ·det A for any square n × n matrix A.
4. Prove that if AB = E for square matrices A and B, then A is nonsingular and B = A⁻¹.
5. Write a fonnula of the type (62.2) fur an arbitrary minor of the product
of two matrices.
6. Prove that for any real matrix A all the principal minors of matrice..,
A' A and AA' are nonnegative.
7. Prove that the rank of a product of matrices is not greater than the rank
of each of the factors.
8. Prove that multiplying by a nonsingular matrix leaves the rank unaffected.

63. Change of basis


Given fixed bases in spaces, the coordinate equation
allows a complete study of the action of a linear operator. It
is obvious that the study is the more efficient the simpler the form of the
matrix of the operator is. In general the matrices of operators
depend on the choice of bases, and our immediate task is to clarify
this dependence.
Let e_1, e_2, ..., e_m and f_1, f_2, ..., f_m be two bases of the same
m-dimensional space X. The vectors f_1, f_2, ..., f_m are uniquely
defined by their expansions

f_1 = p_{11} e_1 + p_{21} e_2 + \ldots + p_{m1} e_m,
f_2 = p_{12} e_1 + p_{22} e_2 + \ldots + p_{m2} e_m,
\ldots
f_m = p_{1m} e_1 + p_{2m} e_2 + \ldots + p_{mm} e_m   (63.1)

with respect to the vectors e_1, e_2, ..., e_m. The coefficients p_{ij} define
a matrix

P = \begin{pmatrix} p_{11} & p_{12} & \ldots & p_{1m} \\ p_{21} & p_{22} & \ldots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \ldots & p_{mm} \end{pmatrix},

called the coordinate transformation matrix for a change from the basis
e_1, e_2, ..., e_m to the basis f_1, f_2, ..., f_m.
Take a vector x ∈ X and expand it with respect to the vectors of
both bases. Let

x = \sum_{i=1}^{m} ξ_i e_i = \sum_{j=1}^{m} η_j f_j.

By (63.1) we have

\sum_{i=1}^{m} ξ_i e_i = \sum_{j=1}^{m} η_j f_j = \sum_{j=1}^{m} η_j \sum_{i=1}^{m} p_{ij} e_i = \sum_{i=1}^{m} \Big( \sum_{j=1}^{m} p_{ij} η_j \Big) e_i.

Comparing the coefficients of e_i on the left and the right of these
relations we find

ξ_i = \sum_{j=1}^{m} p_{ij} η_j   (63.2)

for i = 1, 2, ..., m. These formulas are called coordinate transformation
formulas. As before denote by x_e and x_f the m × 1 matrices made
up of the coordinates of the vector x in the corresponding bases.
Formulas (63.2) show that

x_e = P x_f.   (63.3)

A coordinate transformation matrix must be nonsingular, since
otherwise there would be a linear dependence among its columns and
hence among the vectors f_1, f_2, ..., f_m. Of course, any nonsingular
matrix is the matrix of some coordinate transformation defined by
equation (63.3). Multiplying (63.3) on the left by the matrix P^{-1}
we get

x_f = P^{-1} x_e.

Now let e_1, ..., e_m, f_1, ..., f_m and r_1, ..., r_m be three bases in
a vector space X. A change from the first basis to the third can be
effected in two ways: either directly from the first to the third basis
or from the first to the second and then from the second to the third.
It is not hard to establish the connection between the corresponding
coordinate transformation matrices. By (63.3)

x_e = P x_f,   x_f = R x_r,   x_e = S x_r.

From the first two relations it follows that

x_e = P x_f = P (R x_r) = (PR) x_r,

which implies that

S = PR.

Thus, with coordinate transformations successively carried out, the
matrix of the resulting transformation is equal to the product
of the matrices of the constituent transformations.
Again consider a linear operator A from X to Y. Choose two bases,
e_1, ..., e_m and f_1, ..., f_m, in X and two bases, q_1, ..., q_n and
t_1, ..., t_n, in Y. Corresponding to the same operator A in the first
pair of bases is the coordinate equation

y_q = A_{qe} x_e,   (63.4)

and in the second

y_t = A_{tf} x_f.   (63.5)

Accordingly we have two matrices A_{qe} and A_{tf} for the same operator A.
Denote by P a coordinate transformation matrix for a change
from the basis e_1, ..., e_m to the basis f_1, ..., f_m and by Q a coordinate
transformation matrix for a change from q_1, ..., q_n to t_1, ..., t_n. We have

x_e = P x_f,   y_q = Q y_t.   (63.6)

Substituting these expressions for x_e and y_q in (63.4) we find

Q y_t = A_{qe} P x_f,

which yields

y_t = (Q^{-1} A_{qe} P) x_f.

Comparing this with (63.5) we conclude that

A_{tf} = Q^{-1} A_{qe} P.   (63.7)

This is the desired relation connecting the matrices of the same
operator in different bases.
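As an illustrative aside (not part of the original text), relation (63.7) can be checked numerically. The following Python/NumPy sketch uses random matrices as the coordinate transformation matrices (nonsingular with probability 1); all names are ad hoc.

# Verifying (63.7): the two coordinate equations describe the same operator.
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A_qe = rng.standard_normal((n, m))      # matrix of the operator in the bases e, q
P = rng.standard_normal((m, m))         # coordinate transformation matrices
Q = rng.standard_normal((n, n))

A_tf = np.linalg.inv(Q) @ A_qe @ P      # formula (63.7)

x_f = rng.standard_normal((m, 1))       # the same vector in two coordinate systems
x_e = P @ x_f                           # (63.3)
y_q = A_qe @ x_e                        # (63.4)
y_t = A_tf @ x_f                        # (63.5)
print(np.allclose(y_q, Q @ y_t))        # True, consistent with (63.6)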

Exercises
1. Prove that the rank of the matrix of an operator is
not affected by a change to other bases.
2. Prove that the determinant of the matrix of an operator in a vector space
is independent of the choice of basis.
3. What correspondence can be established between nonsingular operators
in a space X and transformations of coordinates in the same space?
4. Let us say that two bases of the same real space are of the same sign if
the determinant of their coordinate transformation matrix is positive. Prove
that all bases can be divided into two classes of bases of the same sign.
5. Let us say that one class of bases of the same sign is left-handed and the
other is right-handed. Compare these classes with those described in Section 34.

64. Equivalent and similar matrices

Corresponding to every linear operator A from


a space X to a space Y is a set of its matrices defined by the possi-
bility of choosing different bases in X andY. The structure of that
set is essentially different according as X coincides with Y or
does not.
Two rectangular matrices A and B of the same size are said to be
equivalent if there are two nonsingular square matrices R and S
such that

B = RAS.

It follows from (63.7) that two matrices corresponding to the same
linear operator when different bases are chosen in X and Y are
always equivalent. It is not hard to see that the converse is also
true. That is, two equivalent matrices always correspond to the same
linear operator in suitably chosen bases. Thus, corresponding to
every linear operator mapping X into Y is a class of equivalent
matrices.
Theorem 64.1. For two rectangular matrices of the same size to be
equivalent it is necessary and sufficient that they should have the same
rank.
Proof. Multiplying any matrix by nonsingular matrices leaves
its rank unaffected and therefore equivalent matrices have the same
rank. Now let two matrices of the same size have the same rank. We
prove that they are equivalent. We prove even more, namely that
every matrix of rank r is equivalent to the matrix

I_r = \begin{pmatrix} 1 & \ldots & 0 & 0 & \ldots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \ldots & 1 & 0 & \ldots & 0 \\ 0 & \ldots & 0 & 0 & \ldots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \ldots & 0 & 0 & \ldots & 0 \end{pmatrix}

whose first r diagonal elements are equal to 1 and all of whose other elements are zero.
Let a rectangular n × m matrix be given. It defines some linear
operator A mapping a space X with a basis e_1, e_2, ..., e_m into a space
Y with a basis q_1, q_2, ..., q_n. Denote by r the number of linearly
independent vectors among the images Ae_1, Ae_2, ..., Ae_m of the basis
vectors. We may assume without loss of generality that
it is the vectors Ae_1, Ae_2, ..., Ae_r that are linearly independent,
since this can be achieved by a proper numbering of the basis vectors.
The remaining vectors Ae_{r+1}, ..., Ae_m can be linearly expressed in

terms of them,

Ae_k = \sum_{j=1}^{r} c_{kj} Ae_j   (64.1)

for k = r + 1, ..., m. We define a new basis f_1, f_2, ..., f_m in X as
follows:

f_k = e_k,   k = 1, 2, ..., r,
f_k = e_k − \sum_{j=1}^{r} c_{kj} e_j,   k = r + 1, ..., m.   (64.2)

Then by (64.1)

A f_k = 0   (64.3)

for k = r + 1, ..., m. Set, further,

A f_j = t_j   (64.4)

for j = 1, 2, ..., r. The vectors t_1, t_2, ..., t_r are by assumption linearly
independent. Supplement them with some vectors t_{r+1}, ..., t_n to
a basis in Y and consider the matrix of the operator A in the new
bases f_1, ..., f_m and t_1, ..., t_n. The coefficients of the kth column of
the matrix coincide with the coordinates of the vector A f_k in the
basis t_1, ..., t_n. According to (64.3) and (64.4) the matrix of A will
coincide with I_r.
The original matrix and I_r correspond to the same operator and
therefore they are equivalent. Hence all matrices of the same rank
are equivalent to I_r and therefore to one another.
While proving the theorem we answered a very important question:
How are bases in spaces X and Y to be chosen for the matrix of a
linear operator to have the simplest form? Besides, we have shown an
explicit form of that simplest matrix.
So simple and effective an answer has turned out to be possible
because bases in X and Y could be chosen independently of each other.
Now let A be an operator in a space X. Of course, we could again
consider images and inverse images in different bases, but this is not
natural now since both images and inverse images are in the same
space. Using different bases would greatly hamper the study of the
action of the operator on the vectors of the space X. If there is one
basis, then the matrices P and Q in (63.6) coincide. Hence, corresponding
to every linear operator in a vector space is a class of matrices
connected by the relations

B = P^{-1} A P   (64.5)

for different nonsingular matrices P. Such matrices are called similar
and a matrix P is called a similarity transformation matrix.

The question as to when two matrices are similar is rather
complicated and we shall get the answer much later. Equally complicated
is the question of what the simplest form of all similar matrices
looks like. The next two chapters are devoted to these studies.
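As an illustrative aside (not part of the original text), the reduction of a matrix of rank r to the form I_r by nonsingular factors can be exhibited numerically. The Python/NumPy sketch below is not the book's basis construction; it uses the singular value decomposition, and all names are ad hoc.

# Reducing a rank-r matrix to the rank normal form I_r of Theorem 64.1.
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 4, 5, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # a matrix of rank r

U, s, Vt = np.linalg.svd(A)              # A = U diag(s) Vt
D = np.eye(n)
D[:r, :r] = np.diag(1.0 / s[:r])         # rescale the r nonzero singular values to 1

R = D @ U.T                              # nonsingular square matrices R and S ...
S = Vt.T
I_r = R @ A @ S                          # ... with R A S = I_r (up to rounding)
print(np.round(I_r, 10))
print(np.linalg.matrix_rank(A))          # equals r, the only invariant of equivalence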

Exercises
1. Prove that the equivalence and similarity criteria
of matrices are equivalence relations.
2. Prove that similar matrices have the same trace and the same determi-
nant.
3. Prove that under the same similarity transformation a cyclic group of
nonsingular matrices goes over into a cyclic group.
4. Prove that under the same similarity transformation a linear subspace of
matrices goes over into a linear subspace.
5. On a set of square matrices of the same size consider an operator of simi-
larity transformation of those matrices using a fixed similarity transformation
matrix. Prove that that operator is linear.
6. Prove that the set of all similarity transformation operators over the
same set of square matrices of the same size forms a group relative to multipli-
cation.
CHAPTER 8

The Characteristic Polynomial

65. Eigenvalues and eigenvectors


Let A be a linear operator in a space X. This
means that each vector x ∈ X is assigned some vector y = Ax of
the same space X. It may turn out that for some nonzero vector x
its image and inverse image are collinear. As we shall see in what
follows, such a situation substantially simplifies the study of the
operator.
A number λ is said to be an eigenvalue and a nonzero vector x is
said to be an eigenvector of a linear operator A if they are connected
by the relation Ax = λx.
Notice that if x is an eigenvector corresponding to an eigenvalue λ,
then any collinear vector αx for α ≠ 0 will also be an eigenvector.
If there are two eigenvectors, x and y, corresponding to an eigenvalue λ,
then any nonzero vector of the form αx + βy will be an eigenvector.
By definition a zero vector is not an eigenvector. Therefore
the set X_λ consisting of all eigenvectors corresponding to the same eigenvalue λ
(and of all their nonzero linear combinations) is not a subspace. If, however,
we extend X_λ by joining the zero vector, then X_λ will become a subspace.
We call it a proper subspace of A corresponding to the eigenvalue λ.
It is not hard to understand that the eigenvectors of the operators 0, E
and αE are all the nonzero vectors of X. These operators each have
only one eigenvalue, 0, 1 and α, respectively, and hence at least
one proper subspace coinciding with the entire space X. The projection
operator P has two collections of eigenvectors, all vectors in
the range of P and all vectors in the range of E − P. To the first
collection there corresponds the eigenvalue λ = 1 and to the second
there corresponds λ = 0. Indeed, since P² = P, we have

P (Px) = P²x = Px = 1 · Px,
P ((E − P) x) = (P − P²) x = (P − P) x = 0 = 0 · (E − P) x.

Consequently, a projection operator has at least two proper subspaces.
Theorem 65.1. A system of eigenvectors x_1, x_2, ..., x_m of an operator A
corresponding to mutually distinct eigenvalues λ_1, λ_2, ..., λ_m is linearly
independent.

Proof. Eigenvectors are nonzero by definition and therefore the
theorem is clearly true for m = 1. Let it be true for any system
of m − 1 eigenvectors but false for the vectors x_1, x_2, ..., x_m. Then the
system of those vectors will be linearly dependent, i.e. for some
numbers α_1, α_2, ..., α_m not all zero

α_1 x_1 + α_2 x_2 + \ldots + α_m x_m = 0.   (65.1)

Suppose that α_1 ≠ 0. Applying A to (65.1) we get

α_1 λ_1 x_1 + α_2 λ_2 x_2 + \ldots + α_m λ_m x_m = 0.   (65.2)

On multiplying (65.1) by λ_m and subtracting it from (65.2) we find

α_1 (λ_1 − λ_m) x_1 + α_2 (λ_2 − λ_m) x_2 + \ldots + α_{m−1} (λ_{m−1} − λ_m) x_{m−1} = 0.

By the induction hypothesis it follows that all the coefficients of x_1, x_2, ...,
x_{m−1} are zero. In particular, α_1 (λ_1 − λ_m) = 0, which contradicts
the hypothesis that λ_1 ≠ λ_m and the assumption that α_1 ≠ 0.
Hence the system of vectors x_1, x_2, ..., x_m is linearly independent.
Corollary. No linear operator in an m-dimensional space can have
more than m mutually distinct eigenvalues.
Of particular interest is the case when in an m-dimensional space
the operator A has m mutually distinct eigenvalues. Then by Theorem 65.1
we can choose a basis of the space consisting entirely of
eigenvectors of A.
A linear operator A in an m-dimensional space X is said to be an
operator of a simple structure if it has m linearly independent eigenvectors.
The fact that of all linear operators we single out operators of
a simple structure is very simply explained. These and only these
operators have diagonal matrices in some basis. Indeed, let x_1, x_2, ...,
x_m be linearly independent eigenvectors of an operator A.
Take them as basis vectors of a space X and construct the matrix
of A in that basis. We have

Ax_1 = λ_1 x_1,
Ax_2 = λ_2 x_2,
\ldots
Ax_m = λ_m x_m.

We recall that the column elements of the matrix of an operator coincide
with the coordinates of the images of the basis vectors. Therefore the
matrix of the operator A has the following form in a basis consisting of eigenvectors:
\begin{pmatrix} λ_1 & 0 & \ldots & 0 \\ 0 & λ_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & λ_m \end{pmatrix}
If now A has in some basis x_1, x_2, ..., x_m a diagonal matrix with
some, not necessarily different, numbers λ_1, λ_2, ..., λ_m on the
principal diagonal, then x_1, x_2, ..., x_m are eigenvectors of A corresponding
to the eigenvalues λ_1, λ_2, ..., λ_m.
Thus operators of a simple structure, and operators of a simple
structure alone, have diagonal matrices in some basis. That basis
can be made up only of eigenvectors of the operator A. The action
of any operator of a simple structure always reduces to a "stretching"
of the coordinates of a vector in the given basis. If all linear operators
had a simple structure, then the question of choosing a basis in which
the matrix of an operator has the simplest form would have been
completely solved. However, operators of a simple structure do not
exhaust all linear operators.
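As an illustrative aside (not part of the original text), the following Python/NumPy sketch diagonalizes an operator of a simple structure by passing to a basis of its eigenvectors; the names are ad hoc.

# An operator with distinct eigenvalues has a diagonal matrix in an eigenvector basis.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])          # distinct eigenvalues 2, 3, 5

lam, X = np.linalg.eig(A)                # columns of X are eigenvectors
A_diag = np.linalg.inv(X) @ A @ X        # matrix of the operator in the eigenvector basis
print(np.round(A_diag, 10))              # diagonal, with the eigenvalues on the diagonal
print(lam)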
Exercises
1. Let an operator A have an eigenvector x corresponding
to an eigenvalue λ. Prove that for the operator
a_0 E + a_1 A + \ldots + a_n A^n,
where a_0, a_1, ..., a_n are some numbers, the vector x is also an eigenvector but
that it corresponds to the eigenvalue a_0 + a_1 λ + \ldots + a_n λ^n.
2. Prove that operators A and A − αE have the same eigenvectors for any
operator A and any number α.
3. Prove that an operator A is nonsingular if and only if it has no zero eigenvalues.
4. Prove that operators A and A^{-1} have the same eigenvectors for any nonsingular
operator A. What is the connection between the eigenvalues of the operators?
5. Prove that if an operator A is of a simple structure, then the operator
a_0 E + a_1 A + \ldots + a_n A^n
is also of a simple structure.
6. Prove that an operator of differentiation in a space of polynomials is not
an operator of a simple structure. Find the eigenvectors and eigenvalues of
that operator.
7. Consider a similarity transformation operator with a diagonal matrix.
Prove that that operator is of a simple structure. Find all its eigenvectors and
eigenvalues.
66. The characteristic polynomial
Not every linear operator has at least one eigenvector.
Suppose, for example, that we have an operator in the
space V_2 which turns every directed line segment about the origin

counterclockwise through an angle of 90°. It is obvious that in that
case image and inverse image will never be collinear and the operator
will have no eigenvector. To study the question of the existence
of eigenvectors we first introduce an equation satisfied by all the eigenvalues
of a linear operator.
Let A be a linear operator in an m-dimensional space X over
a field P. If A has an eigenvalue λ corresponding to an eigenvector x,
then by definition Ax = λx or equivalently

(λE − A) x = 0.   (66.1)

Since x is nonzero, from (66.1) it follows that the operator λE − A
is singular. Thus the eigenvalues of A are only those numbers λ
from P for which λE − A is singular.
Fix in X some basis e_1, e_2, ..., e_m and denote by A_e the matrix
of A in that basis. The operator λE − A is singular if and only if
so is its matrix λE − A_e, i.e. if

det (λE − A_e) = 0.   (66.2)
Determining eigenvalues was independent of the choice of basis
in X. Therefore the numbers λ from P satisfying (66.2) must not
depend on the choice of basis either. It is in fact the left-hand side
of (66.2) that is independent of the choice of basis for any λ, although
formally this dependence is present. Suppose in some other basis f_1,
f_2, ..., f_m the operator A has a matrix A_f. According to (64.5) A_e
and A_f are related by

A_f = Q^{-1} A_e Q

for some nonsingular matrix Q. Now for any λ from P we find

det (λE − A_f) = det (λ Q^{-1} E Q − Q^{-1} A_e Q) = det (Q^{-1} (λE − A_e) Q)
= det Q^{-1} det (λE − A_e) det Q = (det Q)^{-1} det (λE − A_e) det Q
= det (λE − A_e).

Taking into account the expression for the determinant of a
matrix in terms of its elements it is easy to see that the left-hand
side of (66.2) can be represented as follows:

det (λE − A_e) = a_0 + a_1 λ + \ldots + a_{m−1} λ^{m−1} + a_m λ^m.   (66.3)

The coefficients a_0, ..., a_m are calculated in some way from the elements
of A_e and are independent of λ. The maximum power of λ
enters only into the product of the diagonal elements of λE − A_e
and therefore

a_m = 1.

We show explicit expressions for two more coefficients. Namely,

a_0 = (−1)^m det A_e,   a_{m−1} = −tr A_e.

In general, it may be assumed that expanding the determinant
det (λE − A_e) in powers of λ in different ways we should obtain
expressions of the type of the right-hand side of (66.3) but with
different coefficients a_i. It will be shown later on, however, that
this assumption is not valid. The coefficients on the right of (66.3)
are independent of the way they are calculated. Considering the
independence of the determinant det (λE − A_e) from the basis we
conclude that all the coefficients a_0, ..., a_{m−1} are in fact characteristics
of the operator A. The function

a_0 + a_1 λ + \ldots + a_{m−1} λ^{m−1} + λ^m   (66.4)

is called the characteristic polynomial of the operator A.
Associated with every linear operator is a characteristic polynomial.
The converse is also true. Every polynomial of the form (66.4) is
a characteristic polynomial of some linear operator. This may be,
for example, an operator whose matrix A_e in some basis has the
following form:

A_e = \begin{pmatrix} −a_{m−1} & −a_{m−2} & \ldots & −a_1 & −a_0 \\ 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \ldots & 1 & 0 \end{pmatrix}   (66.5)

This is easy to see by a direct check, using the Laplace theorem
to calculate det (λE − A_e). A matrix of the form (66.5) is called
a Frobenius matrix.
For a number λ from P to be an eigenvalue of an operator A it is
necessary and sufficient that it should satisfy the equation

det (λE − A_e) = 0,

i.e. that it should be a root of the characteristic polynomial. Not
in every field P does every polynomial with coefficients from P have at
least one root from P. As an illustration, λ² + 1 has no roots either
in the field of rationals or in the field of reals.
A field P is said to be algebraically closed if any polynomial with
coefficients from P has at least one root from P.
Thus if a linear operator acts in a space over an algebraically
closed field, it must have at least one eigenvector. It is possible to
construct various algebraically closed fields, but only one of
them, the field of complex numbers, is of the greatest practical
value. To prove the algebraic closure of this field is the aim of our
immediate studies.
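As an illustrative aside (not part of the original text), the following Python/NumPy sketch builds a matrix of the form (66.5) from prescribed coefficients and checks that its characteristic polynomial is the prescribed one; the names are ad hoc.

# The Frobenius matrix built from a_0, ..., a_{m-1} has exactly that characteristic polynomial.
import numpy as np

a = np.array([6.0, -5.0, -2.0])          # a_0, a_1, a_2 for t^3 - 2 t^2 - 5 t + 6
m = len(a)

F = np.zeros((m, m))
F[0, :] = -a[::-1]                       # first row: -a_{m-1}, ..., -a_1, -a_0
F[1:, :-1] = np.eye(m - 1)               # ones on the subdiagonal

print(np.poly(F))                        # [1, -2, -5, 6]: coefficients of det(tE - F)
print(np.linalg.eigvals(F))              # its eigenvalues are the roots 1, 3, -2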

Exercises
1. Find the characteristic polynomial for the zero and
the identity operator.
2. Find the characteristic polynomial for the operator of differentiation.
3. Is the coincidence of characteristic polynomials an indication of the
equality of the operators?
4. Prove that operators with matrices A and A' have the same characteristic
polynomials.
5. Suppose that in some basis an operator has matrix (66.5). Find the coordinates
of the eigenvectors in the same basis.
6. Prove that an operator with matrix (66.5) has a simple structure if and
only if the characteristic polynomial has m mutually distinct roots.

67. The polynomial ring

In some exercises and examples we have already
noted the algebraic properties of polynomials. In connection with
the investigation of the characteristic polynomial we shall continue
those studies.
Let P be an arbitrary field. Consider the set of polynomials, i.e.
functions of the form

f (z) = a_0 + a_1 z + \ldots + a_{n−1} z^{n−1} + a_n z^n,   (67.1)

dependent on an independent variable z assuming values from P and
having coefficients a_0, ..., a_n from P. A polynomial f (z) is said to
be a polynomial of degree n if a_n ≠ 0 and all coefficients with larger
indices are zero. The only polynomial without a definite degree is
the one all of whose coefficients are zero. We shall call it the zero
polynomial and designate it 0.
Two polynomials are said to be equal if so are all their coefficients
of equal powers of the independent variable.
Now let f (z) and g (z) be polynomials of degree n and s respectively.
Also let

f (z) = a_0 + a_1 z + \ldots + a_{n−1} z^{n−1} + a_n z^n,
g (z) = b_0 + b_1 z + \ldots + b_{s−1} z^{s−1} + b_s z^s   (67.2)

and suppose for definiteness that n ≥ s. The sum f (z) + g (z) of f (z)
and g (z) is the polynomial

f (z) + g (z) = c_0 + c_1 z + \ldots + c_{n−1} z^{n−1} + c_n z^n,

where c_i = a_i + b_i for i ≤ s and c_i = a_i for i > s. The degree of
the sum of the polynomials is n if n > s, but for n = s it is less
than n if b_n = −a_n.
The product f (z) · g (z) of f (z) and g (z) is the polynomial

f (z) · g (z) = d_0 + d_1 z + \ldots + d_{n+s−1} z^{n+s−1} + d_{n+s} z^{n+s},

where

d_i = a_0 b_i + a_1 b_{i−1} + \ldots + a_i b_0

for i = 0, 1, ..., n + s. A coefficient d_i is the sum of the products of
the coefficients of f (z) and g (z) the sum of whose indices is i. For
example,

d_{n+s} = a_n b_s.

From the last equation it follows that d_{n+s} ≠ 0 and therefore the
degree of the product of nonzero polynomials is equal to the sum
of the degrees of the factors. Hence a product of nonzero polynomials
is a nonzero polynomial.
A special case of a product of polynomials is the product αf (z) of
a polynomial f (z) by a number α, since a nonzero number can be
regarded as a polynomial of degree 0.
The set of polynomials with the operations introduced above is
a commutative ring. We shall not concern ourselves with checking
that all the axioms hold.
Theorem 67.1. For any polynomial f (z) and nonzero polynomial
g (z) we can find unique polynomials q (z) and r (z) such that

f (z) = g (z) q (z) + r (z),   (67.3)

with the degree of r (z) lower than that of g (z) or r (z) = 0.
Proof. Let f (z) and g (z) be polynomials of degrees n and s. If
n < s or f (z) = 0, then it is possible to set q (z) = 0 and r (z) = f (z)
in (67.3). Suppose therefore that n ≥ s.
We represent f (z) and g (z) according to (67.2) and set

f (z) − \frac{a_n}{b_s} z^{n−s} g (z) = f_1 (z).   (67.4)

Let the degree of f_1 (z) be n_1 and let its leading coefficient be a^{(1)}_{n_1}.
It is clear that n_1 < n. If n_1 ≥ s, then we set

f_1 (z) − \frac{a^{(1)}_{n_1}}{b_s} z^{n_1−s} g (z) = f_2 (z).   (67.5)

Denote by n_2 the degree of f_2 (z) and by a^{(2)}_{n_2} its leading coefficient.
If n_2 ≥ s, then again we set

f_2 (z) − \frac{a^{(2)}_{n_2}}{b_s} z^{n_2−s} g (z) = f_3 (z)   (67.6)

and so on.
The degrees of f_1 (z), f_2 (z), ... are decreasing. Therefore in a finite
number of steps we arrive at the following equation:

f_{k−1} (z) − \frac{a^{(k−1)}_{n_{k−1}}}{b_s} z^{n_{k−1}−s} g (z) = f_k (z),   (67.7)

where f_k (z) is either zero or its degree n_k is less than s. After that the
process is stopped.
Now adding all equations of the type (67.4) to (67.7) we get

f (z) − \Big( \frac{a_n}{b_s} z^{n−s} + \frac{a^{(1)}_{n_1}}{b_s} z^{n_1−s} + \ldots + \frac{a^{(k−1)}_{n_{k−1}}}{b_s} z^{n_{k−1}−s} \Big) g (z) = f_k (z).

This means that the polynomials

q (z) = \frac{a_n}{b_s} z^{n−s} + \frac{a^{(1)}_{n_1}}{b_s} z^{n_1−s} + \ldots + \frac{a^{(k−1)}_{n_{k−1}}}{b_s} z^{n_{k−1}−s},   r (z) = f_k (z)

satisfy equation (67.3), with either r (z) = 0 or the degree of r (z)
less than that of g (z).
We now prove that the polynomials q (z) and r (z) satisfying the
hypothesis of the theorem are unique. Let there be other polynomials,
q' (z) and r' (z), such that

f (z) = g (z) q' (z) + r' (z),

with either r' (z) = 0 or the degree of r' (z) less than that of g (z).
Then

g (z) (q (z) − q' (z)) = r' (z) − r (z).   (67.8)

The polynomial on the right of this equation is either zero or its
degree is less than that of g (z). But the polynomial on the left of
this equation has a degree not less than that of g (z) if q (z) − q' (z) ≠ 0.
Therefore (67.8) is possible only if

q (z) = q' (z),   r (z) = r' (z).

This completes the proof of the theorem.
A polynomial q (z) is called the quotient of f (z) by g (z) and r (z)
is the remainder. If the remainder is zero, then f (z) is said to be
divisible by g (z) and g (z) is said to be a divisor of f (z).
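As an illustrative aside (not part of the original text), the division procedure used in the proof of Theorem 67.1 can be written out directly. The Python sketch below represents a polynomial by its list of coefficients, lowest power first; the function name is ad hoc.

# Division with remainder: repeatedly subtract multiples of g to kill the leading term of f.
def poly_divmod(f, g):
    f = list(f)
    s = len(g) - 1                       # degree of g, leading coefficient g[-1] != 0
    q = [0.0] * max(len(f) - s, 1)
    while len(f) - 1 >= s and any(f):
        k = len(f) - 1 - s               # current term of the quotient: c * z^k
        c = f[-1] / g[-1]
        q[k] = c
        for i, b in enumerate(g):        # f := f - c * z^k * g
            f[i + k] -= c * b
        f.pop()                          # drop the (now zero) leading coefficient
    return q, f                          # quotient q(z) and remainder r(z)

# Example: z^3 + 1 divided by z - 2 gives quotient z^2 + 2z + 4 and remainder 9 = f(2).
print(poly_divmod([1.0, 0.0, 0.0, 1.0], [-2.0, 1.0]))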
Consider division of a nonzero polynomial f (z) by a first-degree
polynomial (z − a). We have

f (z) = (z − a) q (z) + r (z).   (67.9)

Since the degree of r (z) must be less than that of (z − a), r (z) is
a polynomial of degree zero, i.e. a constant. That constant is easy
to determine. On substituting z = a on the right and left of (67.9)
we find that r (z) = f (a). So

f (z) = (z − a) q (z) + f (a).   (67.10)

For f (z) to be divisible by (z − a) it is necessary and sufficient
that f (a) = 0. Numbers a such that f (a) = 0 are usually called
roots of f (z). Thus finding all linear divisors of a polynomial is
equivalent to finding all its roots.

Formula (67.10) allows the following conclusion to be drawn. For
any a from P a polynomial f (z) of degree n can be uniquely represented
as an expansion in powers of (z − a):

f (z) = A_0 + A_1 (z − a) + \ldots + A_{n−1} (z − a)^{n−1} + A_n (z − a)^n,   (67.11)

where A_0, ..., A_n are numbers from P.
The existence of at least one expansion (67.11) can be established
in a fairly simple way. On dividing f (z) by (z − a) we obtain a quotient
q_1 (z) and a remainder A_0 related by

f (z) = (z − a) q_1 (z) + A_0.   (67.12)

If q_1 (z) is of degree zero, then expansion (67.11) is obtained. If,
however, the degree of q_1 (z) is other than zero, then on dividing
q_1 (z) by (z − a) we have

q_1 (z) = (z − a) q_2 (z) + A_1.   (67.13)

Combining (67.12) and (67.13) we find

f (z) = (z − a)² q_2 (z) + A_1 (z − a) + A_0.

We again divide q_2 (z) by (z − a), if necessary, and so on. Since the
degrees of the quotients q_1 (z), q_2 (z), ... are successively decreasing,
the process stops in n steps to yield expansion (67.11).
Suppose now that an expansion of the same form has been obtained
in some other way and has coefficients A'_0, ..., A'_n. On letting

q'_i (z) = A'_i + A'_{i+1} (z − a) + \ldots + A'_n (z − a)^{n−i}

for i = 0, 1, ..., n we conclude that

q'_i (z) = (z − a) q'_{i+1} (z) + A'_i,   (67.14)

with q'_0 (z) = f (z) of course. Comparing (67.12) with (67.14), when
i = 0, and considering the uniqueness of quotient and remainder
we conclude that A_0 = A'_0 and q_1 (z) = q'_1 (z). Similarly we can prove
the equality of the other coefficients.
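As an illustrative aside (not part of the original text), the coefficients A_0, ..., A_n of expansion (67.11) can be computed exactly as in the text, by repeated division by (z − a). The Python sketch below uses synthetic division; all names are ad hoc.

# Coefficients of the expansion of f(z) in powers of (z - a).
def divide_by_linear(coeffs, a):
    """Divide f(z) (coefficients lowest power first) by (z - a).
    Returns (quotient coefficients, remainder = f(a)), as in (67.9)-(67.10)."""
    n = len(coeffs) - 1
    q = [0.0] * n
    q[n - 1] = coeffs[n]
    for i in range(n - 1, 0, -1):
        q[i - 1] = coeffs[i] + a * q[i]
    r = coeffs[0] + a * q[0]
    return q, r

def expand_in_powers(coeffs, a):
    """Coefficients A_0, ..., A_n of expansion (67.11)."""
    out = []
    q = list(coeffs)
    while len(q) > 1:
        q, r = divide_by_linear(q, a)
        out.append(r)
    out.append(q[0])
    return out

# f(z) = 1 + 3z + 3z^2 + z^3 = (z + 1)^3 expanded about a = -1 gives [0, 0, 0, 1].
print(expand_in_powers([1.0, 3.0, 3.0, 1.0], -1.0))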

Exercises
1. Prove that there are no zero divisors in a polynomial ring.
2. Suppose that for some polynomials f (z) φ (z) = g (z) φ (z). Prove that if
φ (z) ≠ 0, then f (z) = g (z).
3. Prove that nonzero polynomials f (z) and g (z) are divisible by each other
if and only if g (z) = αf (z) for a nonzero number α.
4. Let each of the polynomials f_1 (z), ..., f_k (z) be divisible by φ (z). Prove
that so is f_1 (z) g_1 (z) + \ldots + f_k (z) g_k (z), where g_1 (z), ..., g_k (z) are
arbitrary polynomials.
5. Prove that in expansions (67.1) and (67.11) for the same polynomial f (z)
the coefficients a_n and A_n coincide.

68. The fundamental theorem of algebra

We proceed to prove one of the most important
statements, the theorem on the algebraic closure of the field of complex
numbers. This theorem has applications in various fields of
mathematics. In particular, it underlies all further theory of linear
operators. Following the established tradition we shall call it the
fundamental theorem of algebra.
So we must prove that any polynomial of degree n ≥ 1 with
complex coefficients has at least one root, in general a complex root.
We first consider polynomials of a special form. Namely,

f (z) = a − z^n.   (68.1)
Let us represent complex numbers z in the so-called trigonometric
form

z = r (cos φ + i sin φ).

Here r is a nonnegative number called the absolute value or modulus
of the number z, and φ is a real number called the argument of z. It
is clear that for every number z its absolute value is uniquely defined.
For nonzero numbers z the argument is defined up to a multiple
of 2π; for z = 0 the argument is not defined. Composing the product
of two complex numbers

z = r (cos φ + i sin φ),   v = ρ (cos ψ + i sin ψ),

we find

zv = rρ (cos φ + i sin φ)(cos ψ + i sin ψ) = rρ (cos (φ + ψ) + i sin (φ + ψ)).

From this we deduce that

z^n = r^n (cos nφ + i sin nφ).

This equation is called de Moivre's formula. It provides an easy
way of finding the roots of equation (68.1). Indeed, let the complex
number a be represented in trigonometric form

a = ρ (cos θ + i sin θ).

The equation

a − z^n = 0

in z is equivalent to

ρ (cos θ + i sin θ) = r^n (cos nφ + i sin nφ)

in r and φ. But the last equation clearly has the following solutions:

r = \sqrt[n]{ρ},   φ = \frac{θ + 2kπ}{n}

for k = 0, 1, 2, ..., n − 1. Hence the complex numbers

α_k = \sqrt[n]{ρ} \Big( \cos \frac{θ + 2kπ}{n} + i \sin \frac{θ + 2kπ}{n} \Big)   (68.2)

are the roots of (68.1). We shall call them the nth roots of a and
designate them as

α_k = \sqrt[n]{a}.
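As an illustrative aside (not part of the original text), formula (68.2) translates directly into the following Python sketch, which assumes the standard cmath module; the names are ad hoc.

# The n complex nth roots of a number, computed from de Moivre's formula.
import cmath

def nth_roots(a, n):
    rho, theta = abs(a), cmath.phase(a)       # a = rho (cos theta + i sin theta)
    return [rho ** (1.0 / n) * cmath.exp(1j * (theta + 2 * cmath.pi * k) / n)
            for k in range(n)]

for w in nth_roots(-8.0, 3):                  # the three cube roots of -8
    print(w, w ** 3)                          # each cube equals -8 up to rounding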
Now let f (z) be a polynomial with complex coefficients. We consider
it to be a complex function of the complex independent variable z.
For such functions, as for real-valued functions of a real independent
variable, it is possible to introduce the concepts of continuity,
of derivative and so on. Not all of these notions will be equally
necessary to us, but they are all based on the use of the completeness
of the space of complex numbers.
A one-valued complex function f (z) of a complex independent
variable z is said to be continuous at a point z_0 if for any arbitrarily
small number ε > 0 we can find δ > 0 such that for any complex
number z satisfying

|z − z_0| < δ

we have

|f (z) − f (z_0)| < ε.

A function f (z) continuous at each point of its domain is called
everywhere continuous or simply continuous.
Lemma 68.1. A polynomial f (z) with complex coefficients is a continuous
function of the complex independent variable z.
Proof. Let

f (z) = a_0 + a_1 z + \ldots + a_n z^n   (68.3)

and let z_0 be an arbitrary fixed complex number. Denote h = z − z_0.
We show that for any arbitrarily small number ε > 0 we can find
δ > 0 such that |f (z) − f (z_0)| < ε for |h| < δ.
On expanding the polynomial f (z) in powers of (z − z_0) we get

f (z) = A_0 + A_1 (z − z_0) + \ldots + A_n (z − z_0)^n.

Since A_0 = f (z_0) and (z − z_0) is denoted by h, we have

f (z_0 + h) − f (z_0) = A_1 h + \ldots + A_n h^n.   (68.4)

Hence

|f (z_0 + h) − f (z_0)| ≤ |A_1| |h| + \ldots + |A_n| |h|^n = A (|h|).   (68.5)

The real-valued function A (|h|) is a polynomial with real coefficients
|A_i| in the real variable |h|. As is known from mathematical
analysis, A (|h|) is a continuous function everywhere and, in

particular, for |h| = 0. Since A (0) = 0, given ε > 0 we can find
δ > 0 such that for

|h| < δ   (68.6)

we have

A (|h|) < ε.

Taking into account inequality (68.5) we conclude that if (68.6)
holds so does the inequality

|f (z_0 + h) − f (z_0)| < ε.

Corollary. The absolute value of a polynomial is a continuous function.
This statement is immediate from the following relation:

||f (z)| − |f (z_0)|| ≤ |f (z) − f (z_0)|.

Corollary. If a sequence of complex numbers {z_k} converges to z_0,
then for any polynomial f (z)

\lim_{k→∞} f (z_k) = f (z_0).

Lemma 68.2. If a polynomial f (z) of degree n ≥ 1 does not vanish
for z = z_0, then we can always find a complex number h such that
|f (z_0 + h)| < |f (z_0)|.
Proof. Again consider expansion (68.4). Let A_k be the first nonzero
coefficient among A_1, A_2, ..., A_n. Take

h = t \sqrt[k]{ − \frac{f (z_0)}{A_k} },   (68.7)

where we take as the kth root any of its values and

0 ≤ t ≤ 1.   (68.8)

Let

B_p = A_p \Big( \sqrt[k]{ − \frac{f (z_0)}{A_k} } \Big)^p.

Now from (68.4), taking into account (68.7) and (68.8), we find

|f (z_0 + h)| = |f (z_0) − t^k f (z_0) + t^{k+1} B_{k+1} + \ldots + t^n B_n|
≤ |(1 − t^k) f (z_0)| + t^{k+1} |B_{k+1}| + \ldots + t^n |B_n|
= (1 − t^k) |f (z_0)| + t^{k+1} |B_{k+1}| + \ldots + t^n |B_n|
= |f (z_0)| + t^k (−|f (z_0)| + t |B_{k+1}| + \ldots + t^{n−k} |B_n|)
= |f (z_0)| + t^k B (t).

Finally we have

|f (z_0 + h)| ≤ |f (z_0)| + t^k B (t).

The function B (t) is a polynomial with real coefficients and a real
independent variable t. It is a continuous function. But B (0) =
−|f (z_0)| < 0 and therefore by virtue of the continuity of B (t)
there is t_0 with 0 < t_0 ≤ 1 such that B (t_0) is also negative. For
the complex number h defined by the number t_0 according to (68.7)
we get

|f (z_0 + h)| ≤ |f (z_0)| + t_0^k B (t_0) < |f (z_0)|.

Lemma 68.3. For any polynomial f (z) of degree n ≥ 1 and any
infinitely large sequence {z_k} of complex numbers there is a limiting
relation

\lim_{k→∞} |f (z_k)| = +∞.   (68.9)

Proof. Consider polynomial (68.3). For any z ≠ 0 we find

|f (z)| ≥ |a_n| |z|^n \Big( 1 − \Big| \frac{a_{n−1}}{a_n} \Big| |z|^{−1} − \ldots − \Big| \frac{a_0}{a_n} \Big| |z|^{−n} \Big).   (68.10)

Since {z_k} is infinitely large,

\lim_{k→∞} |z_k| = +∞.

The right-hand side of relation (68.10) is a real-valued function,
and it is easy to see that

\lim_{|z_k|→∞} \Big( 1 − \Big| \frac{a_{n−1}}{a_n} \Big| |z_k|^{−1} − \ldots − \Big| \frac{a_0}{a_n} \Big| |z_k|^{−n} \Big) = 1.

But for the other factor of (68.10) we have

\lim_{|z_k|→∞} |a_n| |z_k|^n = +∞.

Consequently, (68.9) is true.


Theorem 68.1 (fundamental theorem of algebra). Any polynomial
f (z) of degree n ≥ 1 with complex coefficients has at least one root, in
general a complex root.
Proof. Consider the set of all possible absolute values of f (z).
Since |f (z)| ≥ 0, that set is bounded below. It is known from mathematical
analysis that any nonempty set of real numbers bounded below
has a greatest lower bound. Let it be l for the set of values
|f (z)|. This means that for every natural k we can find a complex
number z_k such that

0 ≤ |f (z_k)| − l ≤ 2^{−k}.

It follows that

\lim_{k→∞} |f (z_k)| = l.   (68.11)

Assuming {z_k} to be unbounded it would be possible to choose an
infinitely large subsequence of it, and according to Lemma 68.3
relation (68.11) could not hold. Therefore {z_k} is bounded. Choose a
convergent subsequence {z_{k_p}} of it and let

\lim_{p→∞} z_{k_p} = z_0.

According to a corollary of Lemma 68.1 the absolute value of a
polynomial is a continuous function. Hence

|f (z_0)| = \lim_{p→∞} |f (z_{k_p})| = l.

If l ≠ 0, then from Lemma 68.2 it follows that there is a number
z'_0 such that |f (z'_0)| < l. This contradicts the fact that l is the greatest
lower bound of the absolute values of the polynomial and therefore
l = 0.
So we have shown that there is a complex number z_0 such that
|f (z_0)| = 0 or equivalently

f (z_0) = 0.

This means that z_0 is a root of f (z).

Exercises
1. Prove that the set of all nth roots of the complex
number 1 forms a commutative group relative to multiplication.
2. Prove that for a sequence of complex numbers {z_k} to be bounded it is
necessary and sufficient that for at least one polynomial f (z) of degree n ≥ 1
the sequence {f (z_k)} should be bounded.
3. Prove that for any polynomial f (z) of degree n ≥ 1 and for any complex
number z_0 there is a complex number h such that |f (z_0 + h)| > |f (z_0)|.
4. Prove that all roots of polynomial (68.3) lie in the ring

\Big( 1 + \max_{k>0} \Big| \frac{a_k}{a_0} \Big| \Big)^{-1} ≤ |z| ≤ 1 + \max_{k<n} \Big| \frac{a_k}{a_n} \Big|.

5. Try to "prove" the algebraic closure of the field of real numbers according
to the same scheme as that for complex numbers. In what place has the "proof"
no analogy?

69. Consequences
of the fundamental theorem
A variety of consequences arise from the
fundamental theorem. Let us consider the most important of them.
A polynomial f (z) of degree n ≥ 1 with complex coefficients has
at least one root z_1. Therefore f (z) has a factorization

f (z) = (z − z_1) φ (z),

where φ (z) is a polynomial of degree n − 1. The coefficients of
φ (z) are again complex numbers. Consequently, φ (z) has a root
z_2 (if n ≥ 2) and

φ (z) = (z − z_2) ψ (z),

from which it follows that

f (z) = (z − z_1)(z − z_2) ψ (z).

Continuing the process we obtain a representation of the polynomial
as a product of linear factors:

f (z) = b (z − z_1)(z − z_2) \ldots (z − z_n),

where b is some number. Removing the parentheses on the right and
comparing the coefficients of the powers with the coefficients a_i
of f (z) we conclude that b = a_n.
There may be equal numbers among z_1, z_2, ..., z_n. Suppose for simplicity
that z_1, ..., z_r are mutually distinct and that each of the
numbers z_{r+1}, ..., z_n is equal to one of the first numbers. Then
f (z) can be written as follows:

f (z) = a_n (z − z_1)^{k_1} (z − z_2)^{k_2} \ldots (z − z_r)^{k_r},   (69.1)

where z_i ≠ z_j for i ≠ j and

k_1 + k_2 + \ldots + k_r = n.

Representation (69.1) is called a canonical factorization of the polynomial f (z).
The canonical factorization of a polynomial f (z) is unique up
to an arrangement of the factors. Indeed, suppose that along with factorization
(69.1) there is another canonical factorization

f (z) = a_n (z − v_1)^{l_1} (z − v_2)^{l_2} \ldots (z − v_m)^{l_m}.

Then

(z − z_1)^{k_1} (z − z_2)^{k_2} \ldots (z − z_r)^{k_r} = (z − v_1)^{l_1} (z − v_2)^{l_2} \ldots (z − v_m)^{l_m}.   (69.2)

Notice that the collection of numbers z_1, ..., z_r must coincide
with the collection of numbers v_1, ..., v_m. If, for example, z_1 is
equal to none of the numbers v_1, ..., v_m, then substituting z = z_1
in (69.2) we obtain zero at the left and a nonzero number at the right.
So if there are two canonical factorizations of f (z), then (69.2) may
be only as follows:

(z − z_1)^{k_1} (z − z_2)^{k_2} \ldots (z − z_r)^{k_r} = (z − z_1)^{l_1} (z − z_2)^{l_2} \ldots (z − z_r)^{l_r}.

Suppose, for example, that k_1 ≠ l_1 and let k_1 > l_1 for definiteness.
By dividing the right- and left-hand sides of the last equation by the
same divisor (z − z_1)^{l_1} we get

(z − z_1)^{k_1−l_1} (z − z_2)^{k_2} \ldots (z − z_r)^{k_r} = (z − z_2)^{l_2} \ldots (z − z_r)^{l_r}.

Substituting z = z1 we again see that there is zero at the left and a


nonzero number at the right. The uniqueness of the canonical factor-
ization is thus proved.
If k 1 = 1 in the canonical factorization (69.1 ), then a root z1 is
said to be simple; if k 1 > 1, then a root z1 is said to be multiple.
A number k 1 is the multiplicity of a root z1• We can now draw a very
important conclusion:
Any polynomial of degree n ~ 1 with complex coefficients has n
roots, each root counted according to its multiplicity.
A polynomial of degree zero has no roots. The only polynomial that
has arbitrarily many mutually distinct roots is the zero polynomial.
These facts may be used to draw the following conclusion:
If two polynomials f (z) and g (z) whose degrees do not exceed n have
equal values for more than n distinct values of the independent variable,
then all the corresponding coefficients of those polynomials are equal.
Indeed, according to the assumption the polynomial f (z) - g (z)
has more than n roots. But its degree does not exceed n and therefore
f (z) - g (z) = 0.
So a polynomial I (z) whose degree is not greater than n is complete-
ly defined by its values for any n + 1 distinct values of the indepen-
dent variable. This makes it possible to reconstruct the polynomial
from its values. It is not hard to show an explicit form of this "recon-
structing" polynomial. If for the values of the independent variable
equal to a 1, . . . , an+I a polynomial f (z) assumes the values
I (al), . . . , I (an+ 1), then
n+l
I (z)= ~ I (a,) (z-al) •.• lz-a,_l)(z-at+ll ..• (z-an+Il •
•=I (a,-al) ..• (CI,-Clf-1) (a;-CI 1 +Il ... (Cif-Cinul

It is clear that the degree of the polynomial at the right does not ex-
ceed n, and at the points z = a 1 it assumes the values f (a,). The
polynomial thus constructed is called Lagrange's interpolation poly-
rwmial.
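As an illustrative aside (not part of the original text), Lagrange's interpolation polynomial can be evaluated directly from its definition; the Python sketch below uses ad hoc names.

# Evaluating Lagrange's interpolation polynomial from the nodes a_i and values f(a_i).
def lagrange_eval(nodes, values, z):
    total = 0.0
    for i, (ai, fi) in enumerate(zip(nodes, values)):
        term = fi
        for j, aj in enumerate(nodes):
            if j != i:
                term *= (z - aj) / (ai - aj)
        total += term
    return total

# f(z) = z^2 is recovered from its values at the three nodes 0, 1, 3.
nodes, values = [0.0, 1.0, 3.0], [0.0, 1.0, 9.0]
print([lagrange_eval(nodes, values, z) for z in (2.0, 5.0)])   # [4.0, 25.0]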
Consider now a polynomial f (z) of degree n and let z_1, z_2, ..., z_n
be its roots repeated according to multiplicity. Then

f (z) = a_n (z − z_1)(z − z_2) \ldots (z − z_n).

Multiplying the parentheses at the right, collecting similar terms and
comparing the resulting coefficients with those in (68.3) we can derive
the following equations:

a_{n−1}/a_n = −(z_1 + z_2 + \ldots + z_n),
a_{n−2}/a_n = +(z_1 z_2 + z_1 z_3 + \ldots + z_1 z_n + \ldots + z_{n−1} z_n),
a_{n−3}/a_n = −(z_1 z_2 z_3 + z_1 z_2 z_4 + \ldots + z_{n−2} z_{n−1} z_n),
\ldots
a_1/a_n = (−1)^{n−1} (z_1 z_2 \ldots z_{n−1} + \ldots + z_2 z_3 \ldots z_n),
a_0/a_n = (−1)^n z_1 z_2 \ldots z_n.

These are called Vieta's formulas and express coefficients of the poly-
nomial in terms of its roots.
On the right of the kth equation is the sum of all possible products.
k roots each, taken with a plus or a minus sign according ask is even
or odd.
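As an illustrative aside (not part of the original text), Vieta's formulas are easy to check numerically for a particular polynomial. The Python sketch below assumes NumPy; it compares the elementary symmetric functions of the computed roots with the coefficient ratios.

# Checking Vieta's formulas for a cubic.
import numpy as np
from itertools import combinations
from math import prod

coeffs = [1.0, -6.0, 11.0, -6.0]          # z^3 - 6 z^2 + 11 z - 6, roots 1, 2, 3
roots = np.roots(coeffs)

n = len(coeffs) - 1
for k in range(1, n + 1):
    sigma_k = sum(prod(c) for c in combinations(roots, k))    # sum of products of k roots
    print(k, complex(sigma_k).real, (-1) ** k * coeffs[k] / coeffs[0])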
For further discussion we shall need some consequences of the fundamental
theorem of algebra relating to polynomials with real coefficients.
Let a polynomial

f (z) = a_0 + a_1 z + \ldots + a_n z^n

with real coefficients have a complex (but not real!) root v, i.e.

a_0 + a_1 v + \ldots + a_n v^n = 0.

The last equation is not violated if all the numbers in it are replaced by
their complex conjugates. However, the coefficients a_0, ..., a_n and the
number 0, being real, remain unaffected by the replacement.
Therefore

a_0 + a_1 \bar{v} + \ldots + a_n \bar{v}^n = 0,

i.e. f (\bar{v}) = 0.
Thus if a complex (but not a real!) number v is a root of a polynomial
f (z) with real coefficients, then so is the complex conjugate number \bar{v}.
It follows that f (z) will be divisible by the quadratic trinomial

φ (z) = (z − v)(z − \bar{v}) = z² − (v + \bar{v}) z + v \bar{v}

with real coefficients. Using this fact we prove that v and \bar{v} have the
same multiplicity.
Let them have multiplicities k and l respectively and let k > l,
for example. Then f (z) is divisible by the lth power of the polynomial
φ (z), i.e.

f (z) = φ^l (z) · q (z).

The polynomial q (z), as a quotient of two polynomials with real coefficients,
also has real coefficients. By assumption it must have the number
v as its (k − l)-fold root and must have no root equal to \bar{v}. According
to what was proved above this is impossible and therefore
k = l. Thus the complex roots of any polynomial with real coefficients
occur in complex conjugate pairs of equal multiplicity. From the uniqueness
of the canonical factorization we can draw the following conclusion:
Any polynomial with real coefficients can be represented, up to an
arrangement of the factors, uniquely as a product of its leading coefficient
and polynomials with real coefficients. Those polynomials
have leading coefficients equal to unity and are linear, if they correspond
to real roots, and quadratic, if they correspond to a pair of complex
conjugate roots.

Finally, we proceed to the most important conclusion, that for
the sake of which, properly speaking, the fundamental theorem of algebra
was proved. Let A be a linear operator in a complex space. The
eigenvalues of that operator, and they alone, are the roots of the
characteristic polynomial. By the fundamental theorem A has at
least one eigenvalue λ. Hence
Any linear operator in a complex vector space has at least one eigenvector.
Notice that if A is an operator in a real or rational space, this conclusion
is no longer valid.
In reference to eigenvalues we shall apply the same terminology
as in reference to roots of a polynomial. In particular, an eigenvalue
will be said to be simple, if it is a simple root of the characteristic polynomial,
and multiple otherwise. The multiplicity of an eigenvalue
λ will be the multiplicity of λ as a root of the characteristic polynomial.

Exercises
1. Prove that if a complex number a ≠ 0, then for
any natural n there are only n distinct complex numbers whose nth power is
equal to a.
2. What is the relation between the roots of f (z) and f (z − a), where a is
a complex number?
3. Let a polynomial f (z) of degree not greater than n with complex coefficients
assume equal values at n + 1 distinct values of the independent variable.
Prove that f (z) is a polynomial of degree zero.
4. Prove that any polynomial of an odd degree with real coefficients has at
least one real root.
5. Prove that a polynomial f (z) has at least one root in each of the regions

|z| ≤ \sqrt[n]{|a_0 / a_n|},   |z| ≥ \sqrt[n]{|a_0 / a_n|}.

6. Prove that an operator A has a simple structure if and only if there are
as many linearly independent eigenvectors corresponding to each of its eigenvalues
as is the multiplicity of that eigenvalue.
CHAPTER 9

The Structure
of a Linear Operator

70. Invariant subspaces

All the studies to come next will be carried
on under the hypothesis that the linear operator is given in a complex
space X. As already noted earlier, this assumption ensures that
every linear operator has at least one eigenvector.
A subspace L of a vector space X is said to be invariant under an
operator A if for each vector x of L its image Ax is also in L.
Any linear operator has at least two trivial invariant subspaces,
the zero subspace and the entire space X. Of vital importance are
only nontrivial invariant subspaces. Among them are, for example,
proper subspaces. Since in a complex vector space any operator clearly
has at least one eigenvector, any operator in such a space must have
at least one nontrivial invariant subspace.
It is easy to verify that for every operator A its range T_A and
kernel N_A are invariant subspaces. They are trivial if and only if
A is nonsingular or zero.
If L is an invariant subspace, then there may be many ways to
construct a complementary subspace M such that X = L + M.
Among those complementary subspaces there may be no invariant
subspace, however. But if there is at least one invariant complementary
subspace, then we may speak of decomposing the space into a
direct sum of invariant subspaces.
Knowledge of some invariant subspace, and certainly of a decomposition
of the space as a direct sum of invariant subspaces, makes it possible
to construct a basis in which the matrix of the operator has a
simpler form. Let an operator A have in an m-dimensional space X
an invariant subspace L of dimension n. Choose a basis e_1, e_2, ..., e_m
in X so that its first n vectors are in L. Then the images Ae_1, ...,
Ae_n of the vectors e_1, ..., e_n are in L and it is possible to expand
them with respect to the vectors e_1, ..., e_n as the basis vectors of L.
Consequently,

Ae_1 = a_{11} e_1 + a_{21} e_2 + \ldots + a_{n1} e_n,
\ldots
Ae_n = a_{1n} e_1 + a_{2n} e_2 + \ldots + a_{nn} e_n.

Recall that the column elements of the matrix of an operator coincide
with the coordinates of the images of the basis vectors. Therefore

the matrix A_e of A in the basis e_1, e_2, ..., e_m will be of the form

A_e = \begin{pmatrix} a_{11} & \ldots & a_{1n} & a_{1, n+1} & \ldots & a_{1m} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} & a_{n, n+1} & \ldots & a_{nm} \\ 0 & \ldots & 0 & a_{n+1, n+1} & \ldots & a_{n+1, m} \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \ldots & 0 & a_{m, n+1} & \ldots & a_{mm} \end{pmatrix}

As a rule matrices of such a type are written in the so-called
block form. Namely,

A_e = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}.   (70.1)

Here A_{11} is a square n × n matrix, A_{22} is a square (m − n) × (m − n) matrix,
0 is a zero (m − n) × n matrix and A_{12} is an n × (m − n) matrix.
Suppose now that X is decomposed as a direct sum of invariant
subspaces L and M. Choose a basis e_1, e_2, ..., e_m in X so that its first
n vectors are in L and the remaining m − n vectors are in M. In
this case the images Ae_1, ..., Ae_n can be expanded only with respect
to the vectors e_1, ..., e_n and the images Ae_{n+1}, ..., Ae_m only with
respect to the vectors e_{n+1}, ..., e_m. The matrix A_{12} in (70.1)
will obviously be zero. Therefore the matrix A_e of A in the basis under
consideration will have a still simpler form. Namely,

A_e = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}.

Suppose the action of the operator A is investigated only on the
vectors of the invariant subspace L. If x ∈ L, then Ax ∈ L. Hence
it may be assumed that A generates on L some other operator, A|L,
defined by the equation

(A|L) x = Ax

for every x ∈ L. The operator A|L is called the induced operator
generated by the operator A. In relation to A|L the operator A is
called the generator. By virtue of the linearity of A the induced operator
is also linear. It coincides with the generator A on L and is not defined
outside L. Thus these operators mainly differ in their domains.
However artificial its introduction may seem, the induced operator
provides a very convenient auxiliary tool in carrying out diverse
studies. For example, the induced operator, as any other linear
operator, has at least one eigenvector. But since it coincides with
the generator on its domain, this means that

Any linear operator has at least one eigenvector in every invariant subspace.
If a space is decomposed as a direct sum of r invariant subspaces,
then the linear operator has at least r linearly independent eigenvectors.
It is clear that any eigenvalue and any eigenvector of an induced
operator are respectively an eigenvalue and an eigenvector of the
generator. Less obvious is
Theorem 70.1. The characteristic polynomial of an induced operator
generated on a nontrivial subspace is a divisor of the characteristic polynomial
of the generator.
Proof. Let an induced operator A|L be defined on an invariant
subspace L. Again choose a basis e_1, ..., e_m of the space X so that the vectors
e_1, ..., e_n constitute a basis in L. If the matrix of the generator
is the matrix A_e of (70.1), then the matrix of A|L is the matrix A_{11} of (70.1).
The characteristic polynomial is equal to det (λE − A_e) for A and to
det (λE − A_{11}) for A|L. Applying the Laplace theorem to expand
the determinant det (λE − A_e) by the first n columns we find

det (λE − A_e) = det \begin{pmatrix} λE − A_{11} & −A_{12} \\ 0 & λE − A_{22} \end{pmatrix} = det (λE − A_{11}) det (λE − A_{22}).

This equation establishes the validity of the theorem.
Determining all the eigenvalues of the operator A reduces to finding
all the roots of the characteristic polynomial. If A has a nontrivial invariant
subspace, then by Theorem 70.1 this problem can be reduced to
finding all the roots of two polynomials of lower degree. If the induced
operator itself has a nontrivial invariant subspace, then the process
of factoring the characteristic polynomial can be continued.

Exercises

1. Prove that the sum and the intersection of invariant
subspaces are invariant subspaces.
2. Prove that if A is a nonsingular operator, then any induced operator is
also nonsingular.
3. Prove that if A is an operator of a simple structure, then any induced
operator also has a simple structure.
4. In what case is an invariant subspace of an operator of a simple structure
a direct sum of proper subspaces?
5. Prove that if at least one invariant subspace of an operator A has no
complementary invariant subspace, then A cannot be an operator of a simple
structure.
6. Prove that if A is an operator of a simple structure, then its range and
kernel have no nonzero vectors in common.
7. Prove that if a subspace is invariant under an operator A, then it is also
invariant under the operator a_0 E + a_1 A + \ldots + a_p A^p.

71. The operator polynomial

One of the most important ways of constructing
invariant subspaces of a linear operator is by using polynomials
with complex coefficients.
Let A be some linear operator in a complex space X. Take a polynomial

φ (z) = a_0 + a_1 z + \ldots + a_p z^p

with complex coefficients and consider the linear operator

φ (A) = a_0 E + a_1 A + \ldots + a_p A^p.

It is an operator in X and is called an operator polynomial or a polynomial
in the operator A.
Fix an operator A and construct the set of all operator polynomials
in A. Since the set of all polynomials is a commutative ring, so is
the set of all operator polynomials. In particular, it follows that

φ (A) A = A φ (A)

for any polynomial φ (z). The commutativity of the ring of operator
polynomials plays an exceptionally important part in all further studies.
It is easy to show that the range T_φ of any operator polynomial
φ (A) is an invariant subspace for the operator A. Indeed, let x ∈ T_φ.
This means that x = φ (A) y for some y ∈ X. By the permutability
of A and φ (A) we have

Ax = A φ (A) y = φ (A) (Ay).

Hence the vector Ax is the result of applying φ (A) to the vector
Ay ∈ X, i.e. Ax ∈ T_φ.
The kernel N_φ of the operator polynomial φ (A) is also an invariant
subspace for the operator A. If x ∈ N_φ, then φ (A) x = 0, but then

φ (A) (Ax) = A (φ (A) x) = A (0) = 0.

It has already been noted earlier that there is at least one eigenvector
of the operator in any invariant subspace. Now it is possible to
make a more precise statement. Namely,
If an eigenvalue of an operator A is (is not) a root of a polynomial
φ (z), then all eigenvectors of A corresponding to that eigenvalue are
in the kernel (the range) of the operator φ (A).
Indeed, let x be an eigenvector of an operator A corresponding to
an eigenvalue λ. In the exercises to Section 65 it was stressed that
x is also an eigenvector for φ (A) but corresponds to the eigenvalue
φ (λ). Hence φ (A) x = φ (λ) x. If λ is a root of φ (z), then φ (λ) = 0
and x is in the kernel of φ (A). But if φ (λ) ≠ 0, then φ (λ) x is a
nonzero vector and x is in the range of φ (A).
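As an illustrative aside (not part of the original text), the last statement can be observed numerically: for an eigenvector x of A with eigenvalue λ one has φ(A)x = φ(λ)x. The Python/NumPy sketch below uses ad hoc names; the polynomial is chosen so that both eigenvalues are among its roots.

# For phi(z) = z^2 - 5z + 6 = (z - 2)(z - 3) and A with eigenvalues 2 and 3,
# every eigenvector of A lies in the kernel of phi(A).
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
phi = [1.0, -5.0, 6.0]                    # coefficients of phi, highest power first

phi_A = np.zeros_like(A)
for c in phi:                             # Horner's scheme: ((E)A - 5E)A + 6E = A^2 - 5A + 6E
    phi_A = phi_A @ A + c * np.eye(2)

lam, X = np.linalg.eig(A)                 # eigenvalues 2 and 3, eigenvectors in columns of X
print(np.round(phi_A @ X, 10))            # both eigenvectors are mapped to zero
print(np.round(phi_A, 10))                # here phi(A) is in fact the zero operator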

We cannot prove that any invariant subspace of A is either the
range or the kernel of some operator polynomial. This assertion is
false in general, which is exemplified by the identity operator. We
have φ (E) = φ (1) E for any polynomial φ (z), and therefore the
operator φ (E) is either zero or nonsingular. Hence the range and kernel
are always trivial subspaces for φ (E). But any subspace whatever is an
invariant subspace of the operator E. Nevertheless each invariant subspace
of A has a definite relation to operator polynomials in A. We have
Theorem 71.1. Let L be an invariant subspace of an operator A. If
all the eigenvalues of the operator induced on L are roots of a polynomial
φ (z), then L is in the kernel of the operator φ^k (A) for every sufficiently
large positive integral power k.
Proof. Denote by T'_1, T'_2, ... the ranges of the operators induced
on L by the operator polynomials φ^k (A), with k = 1, 2, . . . .
The operator φ (A) is singular on L, since its kernel contains at least
all the eigenvectors of A lying in L. Therefore T'_1 ⊂ L, dim T'_1 < dim L.
The subspace T'_1 is invariant under A. If T'_1 is nonzero, then by Theorem
70.1 the characteristic polynomial of the operator induced on T'_1
by A is a divisor of the characteristic polynomial of the operator
induced on L by A. Hence all the eigenvalues of the operator induced
on T'_1 are also roots of φ (z). But then it again follows that T'_2 ⊂ T'_1,
dim T'_2 < dim T'_1, and so on. The dimensions of T'_1, T'_2, ... cannot
decrease without limit. Beginning with some k, therefore, these subspaces
remain zero, which means that the theorem is true.
The above studies result in establishing an important fact concerning
the existence of nontrivial invariant subspaces.
Theorem 71.2. Any linear operator A in an m-dimensional complex
space X has at least one invariant subspace of dimension m − 1.
Proof. The operator A has at least one eigenvector x. Let it correspond
to an eigenvalue λ. By what has been proved the range T_λ
of the operator A − λE is an invariant subspace of A. But since
A − λE is singular, the subspace T_λ has a dimension not greater than
m − 1.
Consider now any subspace L of dimension m − 1 entirely containing
T_λ. Any vector of X is transformed by A − λE into some vector
of T_λ. Any vector of L therefore again goes over into a vector of L.
Thus L is a subspace invariant under A − λE and of course invariant
under A. Thus the theorem is proved.

Exercises
1. Let A be an operator of differentiation in a finite
dimensional real space of polynomials. What is the operator φ (A) for a
polynomial φ (z) with real coefficients?
2. Let φ (z) be the characteristic polynomial of an induced operator generated
by an operator A on an invariant subspace N. Prove that N is in the kernel
of the operator φ^k (A) for some positive integer k.

3. Prove that if all the eigenvalues of an operator A are roots of a polynomial
φ (z), then φ^k (A) = 0 for some positive integer k.
4. Prove that the ring of operator polynomials generated by any operator
has zero divisors.
5. Prove that if A is an operator of a simple structure, then φ (A) is also an
operator of a simple structure. Is the converse true?

72. The triangular form


Now we can solve the problem of reducing the
matrix of an operator to one of the simplest forms, the so-called
triangular form.
Theorem 72.1. For any linear operator A in an m-dimensional space X there are
invariant subspaces L_p of dimension p, p = 0, 1, ..., m - 1, m, such that
    L_0 ⊂ L_1 ⊂ ... ⊂ L_{m-1} ⊂ L_m.
Proof. The existence of L_0 and L_m is obvious. By what was proved earlier an
operator A has an invariant subspace L_{m-1} of dimension m - 1.
Consider on L_{m-1} an induced operator. As any other operator, it has an invariant
subspace L_{m-2} of dimension m - 2. But a subspace invariant under an induced
operator is invariant under the generating operator A too. Thus the existence of
L_{m-2} is proved. If we consider an induced operator on L_{m-2}, we can similarly
establish the existence of L_{m-3} and so on.
The theorem is interesting mainly because of its matrix interpretation. Construct a
basis e_1, e_2, ..., e_m of X using the invariant subspaces L_p. As a vector e_1 take
any nonzero vector in L_1, as a vector e_2 take any nonzero vector in L_2 that is not
in L_1, and in general take as a vector e_p any nonzero vector in L_p that is not in
L_{p-1}. Consider the matrix A_e of A in that basis. Since e_j is in L_j and L_j is
invariant under A, the vector Ae_j must be a linear combination of only the vectors
e_1, e_2, ..., e_j. In the expansion

    Ae_j = a_{1j} e_1 + a_{2j} e_2 + ... + a_{mj} e_m

the coefficient of e_i must therefore be zero for every i > j. Hence the matrix of A
is of the form

          ( a_11  a_12  ...  a_{1,m-1}    a_{1m}    )
          (  0    a_22  ...  a_{2,m-1}    a_{2m}    )
    A_e = (  ...................................... )
          (  0     0    ...  a_{m-1,m-1}  a_{m-1,m} )
          (  0     0    ...   0           a_{mm}    )

where a_{ij} = 0 for i > j.
A matrix all of whose elements under (above) the principal diagonal are zero is
called a right (left) triangular matrix. In matrix terms the result obtained implies
that any square matrix is similar to a right triangular matrix.
The triangular form of a matrix is widely used in proving diverse facts concerning
linear operators. This is mainly due to the following property:
If an operator A has in some basis a triangular matrix A_e, then the diagonal
elements of A_e coincide with the eigenvalues of A, even taking into account their
multiplicities.
Indeed, using the Laplace theorem we find that the characteristic polynomial of A_e
is equal to

    det(λE - A_e) = ∏_{i=1}^{m} (λ - a_{ii}),

which implies the validity of the above assertion.
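A small numerical check of this property may be helpful (a sketch with an example
matrix of our own choosing; it assumes NumPy): a general-purpose eigenvalue routine
applied to a right triangular matrix returns exactly its diagonal elements.

    import numpy as np

    T = np.array([[4.0, 1.0, -2.0],
                  [0.0, 4.0,  5.0],
                  [0.0, 0.0, -1.0]])
    print(np.sort(np.linalg.eigvals(T)))   # -1, 4, 4 (up to ordering)
    print(np.sort(np.diag(T)))             # -1, 4, 4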
Much of the further theory of linear operators is devoted to improving the result
just obtained on reducing the matrix of an operator to triangular form. The simplest
possible form the matrix of an operator may have is the diagonal form. As we know,
only the matrices of operators of a simple structure can be reduced to this form.
However, even for operators not of a simple structure the triangular form is not the
simplest attainable.

Exercises
1. Prove that any square matrix is similar to a left
triangular matrix.
2. Prove that a set of left (or right) triangular matrices forms a ring.
3. Prove that a set of nonsingular left (or right) triangular matrices forms
a group.
4. Let λ_1, λ_2, ..., λ_m be the eigenvalues of an operator A written out in
succession according to multiplicity. Prove that, taking into account their
multiplicities, the eigenvalues of an operator φ(A) for any polynomial φ(z) are
φ(λ_1), φ(λ_2), ..., φ(λ_m).
5. Prove that if all diagonal elements of a triangular m × m matrix A are zero,
then A^m = 0.
6. Let a triangular matrix be similar to a diagonal matrix. Prove that the
similarity transformation matrix may be chosen to be left (or right) triangular.

73. A direct sum of operators


A linear operator all of whose eigenvalues are
equal is in a sense an exception. Nevertheless we shall show that it
is such operators that any linear operator can be made up of.
Let a space X be represented as a direct sum of subspaces L and M. We define some
operator B on L and some operator C on M. For any vector x ∈ X there is a unique
decomposition
    x = x_L + x_M,                                        (73.1)
where x_L ∈ L and x_M ∈ M.
An operator A defined by
    Ax = Bx_L + Cx_M
is called a direct sum of B and C. If one of the subspaces L and M is trivial, then
the direct sum is also called trivial.
It is easy to verify that A is a linear operator in X. We show that it can be
represented only uniquely as a direct sum of operators defined on L and M. Indeed,
for any vector x ∈ L we have Ax = Bx. Similarly Ax = Cx for any x ∈ M. This means
that B coincides with the induced operator A|L and C coincides with A|M.
Consider now an operator A in a space X. If X is decomposed in some way into a
direct sum of subspaces L and M invariant under A, then the operator A itself can be
decomposed as a direct sum. Indeed, construct A|L and A|M. On decomposing again a
vector x ∈ X as a sum (73.1) we get
    Ax = (A|L) x_L + (A|M) x_M.
In this case, by Theorem 70.1, the characteristic polynomial of A is equal to the
product of the characteristic polynomials of A|L and A|M.
The operator A can be decomposed as a direct sum using any operator polynomial φ(A).
Denote by N_k the kernel of the operator φ^k(A). This is a subspace invariant under
A and it is obvious that N_1 ⊂ N_2 ⊂ .... We first prove that if N_k = N_{k+1} for
some k, then N_k = N_p for every p > k. Indeed, take any vector x ∈ N_p. Then
φ^p(A) x = 0. On writing this as φ^{k+1}(A)(φ^{p-k-1}(A) x) = 0 we conclude that the
vector φ^{p-k-1}(A) x ∈ N_{k+1}. By virtue of N_k = N_{k+1} the same vector is in
N_k. Consequently,
    φ^k(A)(φ^{p-k-1}(A) x) = φ^{p-1}(A) x = 0,
i.e. the vector x ∈ N_{p-1}. The validity of the above assertion can now be
established by induction on p.
The space X in which A is an operator is finite dimensional. Therefore the
dimensions of the subspaces N_k cannot increase without limit. Let q be the smallest
positive integer for which N_q = N_{q+1}. Denote by T_k the range of the operator
φ^k(A) and consider any vector x common to the subspaces T_q and N_q. We have
φ^q(A) x = 0 and x = φ^q(A) y for some vector y ∈ X. It follows that φ^{2q}(A) y = 0,
i.e. y ∈ N_{2q}. But by what has been proved N_q = N_{2q}. Therefore y ∈ N_q, i.e.
x = φ^q(A) y = 0.
Thus T_q and N_q have only a zero vector in common. In view of formula (56.3) this
means that X = T_q ⊕ N_q. Since T_q and N_q are invariant subspaces, the possibility
of decomposing the operator is established.
As already noted earlier, all eigenvectors of A must be in T_q and N_q, with N_q
containing the eigenvectors that correspond to the eigenvalues coinciding with some
roots of a polynomial φ(z) and T_q containing those for which the corresponding
eigenvalues coincide with none of the roots of φ(z). Since to every eigenvalue there
corresponds at least one eigenvector, it follows that:
Each (none) of the roots of the characteristic polynomial of the operator induced on
N_q (T_q) is (is not) a root of φ(z).
A final characteristic of decompositions of an operator as a direct sum using
operator polynomials is furnished by
Theorem 73.1. Let the characteristic polynomial f(z) of an operator A be decomposed
as a product of polynomials φ(z) and ψ(z) having no roots in common. Then A can be
decomposed uniquely as a direct sum of operators B and C with the characteristic
polynomials φ(z) and ψ(z).
Proof. Consider a decomposition of A as a direct sum obtained using the polynomial
φ(z). Since the product of the characteristic polynomials of the operators defining
the direct sum coincides with the characteristic polynomial f(z), the existence of at
least one decomposition follows from the above studies.
Suppose now that there is another decomposition of the space X as a direct sum of
invariant subspaces N and T. The induced operator on N has the characteristic
polynomial φ(z) and the operator on T has the polynomial ψ(z). By Theorem 71.1,
N ⊂ N_k for every sufficiently large k, and therefore N ⊂ N_q. The operator φ(A) is
nonsingular on T, and hence the set of the images of vectors from T relative to φ(A)
coincides with T. But this means that T ⊂ T_k for every k. The subspaces N and T, as
well as N_q and T_q in the direct sum, form the space X. Therefore the inclusions
N ⊂ N_q and T ⊂ T_q are possible only when N = N_q and T = T_q. Thus the theorem is
proved.
Let A be an operator in an m-dimensional space X. We represent the characteristic
polynomial f(z) of A as the canonical factorization
    f(z) = (z - λ_1)^{k_1} (z - λ_2)^{k_2} ... (z - λ_r)^{k_r},          (73.2)
where λ_1, λ_2, ..., λ_r are mutually distinct eigenvalues and
k_1 + k_2 + ... + k_r = m. Consider the polynomials
    (z - λ_1)^{k_1}, (z - λ_2)^{k_2}, ..., (z - λ_r)^{k_r}.
They are divisors of the characteristic polynomial f(z) and no pair of them have
roots in common. By Theorem 73.1, there are invariant subspaces R_1, R_2, ..., R_r
such that
    X = R_1 ⊕ R_2 ⊕ ... ⊕ R_r.
The dimension of a subspace R_i is equal to k_i and the induced operator on R_i has
the characteristic polynomial (z - λ_i)^{k_i}.
A subspace R_i is called a root subspace of A corresponding to an eigenvalue λ_i.
Vectors of a root subspace are called root vectors. It
follows from what has been said that any operator can be decomposed as a direct sum
of operators induced on root subspaces.
A root subspace R_i coincides with the kernel of the operator ((A - λ_i E)^{k_i})^q
for some positive integer q. We show that in this case it is always possible to put
q = 1. Consider the operators (A - λ_i E)^p for p = 1, 2, .... Let p_i be the
smallest number for which the kernel of (A - λ_i E)^{p_i} coincides with that of
(A - λ_i E)^{p_i + 1}. Then R_i will coincide with the kernel of (A - λ_i E)^{p_i}.
Since the dimensions of the kernels of (A - λ_i E)^p for p = 1, 2, ... are
monotonically increasing and the dimension of R_i is equal to k_i, we have p_i ≤ k_i.
Thus R_i, corresponding to an eigenvalue λ_i of multiplicity k_i, clearly coincides
with the kernel of (A - λ_i E)^{k_i}.
Theorem 73.2 (Cayley-Hamilton). If f(z) is the characteristic polynomial of an
operator A, then f(A) is a zero operator.
Proof. Let us represent the characteristic polynomial as the canonical factorization
(73.2). Since the operator polynomial f(A) contains the factor (A - λ_i E)^{k_i} and
any polynomials in the same operator are commutative, f(A) x_i = 0 for any vector
x_i in R_i. Now take a vector x and represent it as x = x_1 + x_2 + ... + x_r, where
x_i ∈ R_i. It is now clear that f(A) x = 0, i.e. that f(A) is a zero operator.
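The Cayley-Hamilton theorem is easy to verify numerically. The sketch below (our own
example; it assumes NumPy, and uses np.poly, which returns the coefficients of the
characteristic polynomial of a square matrix) evaluates that polynomial at the matrix
itself by Horner's scheme and checks that the result is the zero matrix up to
rounding.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    coeffs = np.poly(A)                  # characteristic polynomial coefficients
    f_of_A = np.zeros_like(A)
    for c in coeffs:                     # Horner evaluation with matrices
        f_of_A = f_of_A @ A + c * np.eye(4)

    print(np.allclose(f_of_A, np.zeros((4, 4))))   # True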
Of great interest is again the matrix interpretation of the results obtained. Compose
a basis of the space as a successive combination of any bases of the root subspaces
R_1, R_2, ..., R_r. Root subspaces are invariant and their direct sum coincides with
X. Therefore the matrix A_e of A in this basis has the so-called quasi-diagonal form

          ( A_11              0    )
    A_e = (       A_22             )                      (73.3)
          (             ...        )
          (  0               A_rr  )

Each A_ii is a k_i × k_i matrix that is the matrix of the operator induced on the
subspace R_i.

Exercises
1. Can an operator of differentiation in a finite dimensional space of polynomials
be decomposed as a nontrivial direct sum?
2. Prove that a system of root vectors corresponding pairwise to distinct
eigenvalues is linearly independent.
3. Prove that if an operator A is nonsingular, then A^{-1} = φ(A) for some
polynomial φ(z).
4. An operator A is said to be nilpotent if A^p = 0 for some positive integer p.
Prove that an operator is nilpotent if and only if all its eigenvalues are zero.
5. Let φ(z) be a polynomial of the lowest degree for which φ(A) = 0. Prove that
φ(z) is a divisor of the characteristic polynomial of A.
74. The Jordan canonical form


A further simplification of the matrix of an
operator as compared with the quasi-diagonal form (73.3) can be ef-
fected only by special construction of bases for each of the root sub-
spaces. Root bases can of course be chosen so that each matrix A_ii in (73.3) is
triangular. This form of the matrix of an operator is not the simplest either,
however.
Let us study in more detail the structure of root subspaces. If x ∈ R_i, then
(A - λ_i E)^{k_i} x = 0. But for every particular vector x the equation
(A - λ_i E)^m x = 0 may well hold also for m < k_i. In particular, if x is an
eigenvector corresponding to a multiple eigenvalue λ_i, then (A - λ_i E) x = 0,
although k_i ≥ 2.
The height of a root vector x is the smallest nonnegative integer m such that
(A - λ_i E)^m x = 0.
All root vectors corresponding to an eigenvalue λ_i are of height not greater than
the multiplicity of λ_i. Recall, however, that in general the heights of root
vectors and the multiplicities of eigenvalues are two distinct notions. Thus, for
example, for an operator of a simple structure there are no root vectors of height
greater than unity at all, regardless of the multiplicities of the eigenvalues.
Let R_i be a root subspace corresponding to an eigenvalue λ_i of multiplicity k_i.
Denote by t the maximum height of root vectors in R_i. It is clear that t ≤ k_i.
If a vector x is of height k, then the vector (A - λ_i E) x will be of height k - 1.
There are therefore root vectors of all heights from 0 to t in R_i.
For any k ≤ t, denote by H_k the collection of all vectors whose heights are at most
k. It is easy to show that H_k is a subspace of R_i. If x, y ∈ H_k, then
(A - λ_i E)^k x = (A - λ_i E)^k y = 0. But then for any α and β we have
(A - λ_i E)^k (αx + βy) = 0, i.e. αx + βy ∈ H_k. It is, further, obvious that
    0 = H_0 ⊂ H_1 ⊂ ... ⊂ H_{t-1} ⊂ H_t = R_i.
We denote the dimensions of these subspaces by m_k, with
0 = m_0 < m_1 < ... < m_{t-1} < m_t = k_i.
Let f_1, ..., f_{p_1} be arbitrary linearly independent vectors from H_t such that
the direct sum of their span and H_{t-1} is H_t. It is clear that they are root
vectors of height t, that p_1 = m_t - m_{t-1} and that no nonzero linear combination
of the vectors f_1, ..., f_{p_1} is in H_{t-1}. Consider the collection of vectors

    f_1,                 ..., f_{p_1},
    (A - λ_i E) f_1,     ..., (A - λ_i E) f_{p_1},
    (A - λ_i E)^2 f_1,   ..., (A - λ_i E)^2 f_{p_1},          (74.1)
    ...
We show that they are linearly independent. Indeed, compose their linear combination
and equate it to zero. On applying to both sides of the resulting equation the
operator (A - λ_i E)^{t-1} we find that the linear combination of the vectors
f_1, ..., f_{p_1} is sent by (A - λ_i E)^{t-1} to the zero vector, i.e. that it is a
vector in H_{t-1}. Hence the coefficients of these vectors must be zero. On applying
now to the same equation the operator (A - λ_i E)^{t-2} we similarly find that the
coefficients of the vectors in the second row of (74.1) must be zero, and so on.
Notice that by virtue of the choice of the vectors f_1, ..., f_{p_1} no nonzero
linear combination of the vectors in the ith row of (74.1) is in H_{t-i}.
We supplement the vectors (A - λ_i E) f_1, ..., (A - λ_i E) f_{p_1} with vectors
f_{p_1+1}, ..., f_{p_2} of H_{t-1} such that the entire collection is linearly
independent and the direct sum of its span and H_{t-2} is H_{t-1}. It is clear that
they will be root vectors of height t - 1, that p_2 = m_{t-1} - m_{t-2} and that no
nonzero linear combination of these vectors is in H_{t-2}. We again construct the
collection of vectors

    f_{p_1+1},                    ..., f_{p_2},
    (A - λ_i E) f_{p_1+1},        ..., (A - λ_i E) f_{p_2},          (74.2)
    ...
    (A - λ_i E)^{t-2} f_{p_1+1},  ..., (A - λ_i E)^{t-2} f_{p_2}.
With respect to the collection of vectors (A - λ_i E) f_1, ..., (A - λ_i E) f_{p_1},
f_{p_1+1}, ..., f_{p_2} we can prove all the facts proved with respect to the
collection of vectors f_1, ..., f_{p_1}, replacing of course t by t - 1. Going in
this way down to the subspaces H_{t-2}, H_{t-3}, ..., H_1 we obtain a linearly
independent system of k_i vectors lying in the root subspace R_i. Arrays of the type
(74.1) and (74.2) end with an array containing a single row
    f_{p_{t-1}+1}, ..., f_{p_t}.                               (74.3)
These vectors are in H_1, i.e. they are eigenvectors, and p_t = m_1 - m_0.
We arrange the arrays of the type (74.1) to (74.3) successively from left to right,
aligning them by the last row and introducing a more compact notation for each
vector. Then the following array results:

    e^{(t)}_1,   ..., e^{(t)}_{p_1},
    e^{(t-1)}_1, ..., e^{(t-1)}_{p_1}, e^{(t-1)}_{p_1+1}, ..., e^{(t-1)}_{p_2},
    .....................................................................  (74.4)
    e^{(1)}_1,   ..., e^{(1)}_{p_1},   e^{(1)}_{p_1+1},   ..., e^{(1)}_{p_2},
                                         ..., e^{(1)}_{p_{t-1}+1}, ..., e^{(1)}_{p_t}.

The vectors in the first row of (74.4) are of height t, the vectors in the next row
are of height t - 1 and so on. The vectors of the last row are of height 1, i.e. the
operator A - λ_i E sends them to the zero vector. Each column of (74.4) defines an
invariant subspace of A - λ_i E and hence of the operator A. These subspaces are
called cyclic. The first p_1 cyclic subspaces are of dimension t, the next
p_2 - p_1 subspaces are of dimension t - 1 and so on. The last columns define
one-dimensional cyclic subspaces. The entire root subspace R_i is a direct sum of
the p_t cyclic subspaces.
We write the matrix of the operator induced in a cyclic subspace. Suppose, for
example, that the vectors e^{(1)}_1, e^{(2)}_1, ..., e^{(t-1)}_1, e^{(t)}_1 are taken
as a basis. Since
    (A - λ_i E) e^{(1)}_1 = 0,  (A - λ_i E) e^{(2)}_1 = e^{(1)}_1,  ...,
    (A - λ_i E) e^{(t)}_1 = e^{(t-1)}_1,
we have
    Ae^{(1)}_1 = λ_i e^{(1)}_1,  Ae^{(2)}_1 = λ_i e^{(2)}_1 + e^{(1)}_1,  ...,
    Ae^{(t)}_1 = λ_i e^{(t)}_1 + e^{(t-1)}_1.
Hence the matrix of the induced operator has the following form:

    ( λ_i  1    0   ...  0    0   )
    ( 0    λ_i  1   ...  0    0   )
    ( ..........................  )
    ( 0    0    0   ...  λ_i  1   )
    ( 0    0    0   ...  0    λ_i )
Matrices of this form are called Jordan canonical boxes.
We shall now construct a basis of the space as a successive combination of the bases
of the root subspaces R_1, R_2, ..., R_r. As a basis of each root subspace R_i we
take vectors of the type (74.4) ordered in succession from bottom to top and from
left to right. A space basis constructed in this way is called a root basis.
In a root basis the matrix J of an operator A assumes the so-called Jordan canonical
form. It is a quasi-diagonal matrix made up of Jordan boxes. First come the Jordan
boxes corresponding to an eigenvalue λ_1, in nonincreasing order of their sizes.
Then, in the same order, come the Jordan boxes corresponding to λ_2 and so on. Thus

        ( λ_1  1                                      )
        (      λ_1  ...                               )
        (            ...  1                0          )
        (                 λ_1                         )
    J = (                      ...                    )        (74.5)
        (                           λ_r  1            )
        (          0                     λ_r  ...     )
        (                                      ...  1 )
        (                                         λ_r )
In general some of the Jordan boxes of smaller sizes may of course be lacking.
Specifying an operator in a vector space defines a class of similar matrices. The
result obtained implies that any square matrix can be reduced by a similarity
transformation to a Jordan canonical form. It is clear that two square matrices of
the same size are similar if and only if they have identical Jordan forms. Given a
fixed basis, therefore,
Two square matrices of the same size define the same operator in a complex space if
and only if they have identical Jordan forms.
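Exact (symbolic) software can compute a Jordan canonical form directly. The sketch
below is our own illustration and assumes SymPy is available; Matrix.jordan_form()
returns P and J with A = P J P^{-1}, so J is the Jordan form discussed above.

    from sympy import Matrix

    A = Matrix([[5, 4, 2, 1],
                [0, 1, -1, -1],
                [-1, -1, 3, 0],
                [1, 1, -1, 2]])
    P, J = A.jordan_form()
    print(J)                       # Jordan boxes along the diagonal
    print(A == P * J * P.inv())    # True (exact arithmetic)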

Exercises
1. Let x be a root vector of height ν corresponding to an eigenvalue λ_i of an
operator A. Prove that if λ_i is a root of multiplicity p of a polynomial φ(z), then
the vector v = φ(A) x is a root vector of height r = max(0, ν - p) corresponding to
the same eigenvalue λ_i. What can be said about the vector v if λ_i is not a root of
φ(z)?
2. Let x be a nonzero vector and let φ(z) be a polynomial of the lowest degree such
that φ(A) x = 0. Prove that φ(z) is a divisor of the characteristic polynomial of A.
3. Prove that any square matrix can be reduced to a unique Jordan canonical form up
to a permutation of Jordan boxes.
4. Prove that if a matrix is similar to the matrix J of (74.5), then it is similar
to J' as well.
5. Prove that square matrices A and A' are the matrices of the same operator.
6. Let J be a Jordan canonical matrix. What is the form of the matrices J^p for
positive integers p?

75. The adjoint operator


Now we proceed to study linear operators in a unitary space. Of course, all the
results obtained earlier for operators in a complex space hold in this case too. We
shall study therefore only the additional properties of operators connected with the
concept of orthogonality. In some cases we shall also consider operators from one
unitary space into another. The principal part in our studies will be played by the
so-called adjoint operator.
Let X and Y be two unitary spaces. An operator A* from Y to X is said to be adjoint
to an operator A from X to Y if for any vectors x ∈ X and y ∈ Y
    (Ax, y) = (x, A*y).                                      (75.1)
Theorem 75.1. For any linear operator A there is an adjoint operator A* which is
unique.
Proof. Choose in X some orthonormal basis e_1, e_2, ..., e_m. Recall that for any
vector x ∈ X there is the expansion
    x = Σ_{k=1}^{m} (x, e_k) e_k.                            (75.2)
If A* exists, then, by this formula, for any vector y ∈ Y
    A*y = Σ_{k=1}^{m} (A*y, e_k) e_k,
or, considering (75.1) and the conjugate symmetry of the scalar product,
    A*y = Σ_{k=1}^{m} \overline{(e_k, A*y)} e_k
        = Σ_{k=1}^{m} \overline{(Ae_k, y)} e_k = Σ_{k=1}^{m} (y, Ae_k) e_k.   (75.3)
And this means that if A* exists, then it is unique.
Now take (75.3) to be the definition of A*. It is easy to verify that the operator
A* thus constructed is linear. It satisfies equation (75.1) too. Indeed, considering
that the system e_1, e_2, ..., e_m is orthonormal and taking into account (75.2) and
(75.3), we get for any vectors x ∈ X and y ∈ Y
    (Ax, y) = (A Σ_{k=1}^{m} (x, e_k) e_k, y) = Σ_{k=1}^{m} (x, e_k)(Ae_k, y),
    (x, A*y) = (Σ_{k=1}^{m} (x, e_k) e_k, Σ_{k=1}^{m} (y, Ae_k) e_k)
             = Σ_{k=1}^{m} (x, e_k) \overline{(y, Ae_k)} = Σ_{k=1}^{m} (x, e_k)(Ae_k, y).
Thus the theorem is proved.


The adjoint operator A* is connected with A by definite relations. Note some of them:
    (A*)* = A,
    (A + B)* = A* + B*,
    (αA)* = ᾱA*,                                            (75.4)
    (AB)* = B*A*,
    (A*)^{-1} = (A^{-1})*.
Here the bar over α means complex conjugation. All the relations can be proved
according to the same scheme. We shall prove in detail therefore only the first and
the last property.
Consider an operator A and the adjoint operator A*. The adjoint operator of A* will
in turn be an operator (A*)*. Now for any x ∈ X and y ∈ Y we have
    (y, (A*)*x) = (A*y, x) = \overline{(x, A*y)} = \overline{(Ax, y)} = (y, Ax).
The left-hand side is equal to the right-hand side for any vector y. Hence
(A*)*x = Ax. But since this equation holds for any x, this means that (A*)* = A.
Suppose now that A is an operator in X and is nonsingular. We first prove that A* is
also nonsingular. Let A*y = 0. According to (75.3) it follows that
    Σ_{k=1}^{m} (y, Ae_k) e_k = 0.
Since the system of vectors e_1, ..., e_m is a basis,
    (y, Ae_k) = 0                                            (75.5)
for every k = 1, 2, ..., m. Since A is nonsingular, it converts any basis again into
a basis. But then the system of vectors Ae_1, ..., Ae_m is also a basis and it
follows from (75.5) that y = 0. Thus the kernel of A* contains only a zero vector,
i.e. A* is nonsingular.
Take vectors x, y ∈ X. There are unique vectors u and v such that
    Au = x,  A*v = y.
We then find
    (x, (A^{-1})*y) = (A^{-1}x, y) = (u, A*v) = (Au, v) = (x, (A*)^{-1}y).
The left-hand side equals the right-hand side for any x. Hence
(A^{-1})*y = (A*)^{-1}y. Since y is arbitrary, this means that (A^{-1})* = (A*)^{-1}.
Many compatible properties of operators A and A* can be established by investigating
the matrices of these operators. Choose an orthonormal basis e_1, e_2, ..., e_m in X
and an orthonormal basis q_1, q_2, ..., q_n in Y. If X and Y coincide, it will be
assumed that so do their bases. Suppose that a matrix A_{eq} with elements a_{ij}
corresponds to A. Then
    Ae_j = Σ_{i=1}^{n} a_{ij} q_i.
From this and (75.2) we conclude that
    a_{ij} = (Ae_j, q_i).                                    (75.6)
Also suppose that corresponding to A* in the same bases is a matrix A*_{eq} with
elements a*_{ij}. By (75.6)
    a*_{ij} = (A*q_j, e_i).
Comparing the elements a_{ij} and a*_{ij} and considering (75.1) we find
    a*_{ij} = (A*q_j, e_i) = \overline{(e_i, A*q_j)} = \overline{(Ae_i, q_j)} = ā_{ji}.
This formula justifies the following definition:
An m × n matrix A* with elements a*_{ij} is said to be the adjoint of an n × m
matrix A with elements a_{ij} if a*_{ij} = ā_{ji} for all i and j.
Thus, corresponding to adjoint operators in any orthonormal bases are adjoint
matrices. Adjoint matrices clearly satisfy all the relations (75.4). An adjoint
matrix A* is related to a matrix A by the operations of transposition and complex
conjugation. That is,
    A* = (Ā)' = \overline{A'}.                               (75.7)
Here the bar means that all matrix elements are replaced by their complex conjugates.
The rank of an operator coincides with that of its matrix. Therefore it follows from
(75.7) that the operators A and A* have the same rank.
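A small numerical sketch of (75.1) and (75.7) may be helpful (our own example; it
assumes NumPy and the standard inner product (u, v) = v*u, linear in the first
argument):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    A_star = A.conj().T                      # transposition + complex conjugation
    lhs = np.vdot(y, A @ x)                  # (Ax, y)
    rhs = np.vdot(A_star @ y, x)             # (x, A*y)
    print(np.allclose(lhs, rhs))             # True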
Denote by N ⊂ X, N* ⊂ Y and T ⊂ Y, T* ⊂ X respectively the kernels and the ranges of
the operators A and A*. If x ∈ N, then Ax = 0 and (x, A*y) = 0 for every y. This
means that the range of A* is a subspace orthogonal to the kernel of A. Of course,
the range of A is also orthogonal to the kernel of A*. From the equality of the
dimensions of the subspaces T and T* and from relations of the type (56.4) we
conclude that
    X = N ⊕ T*,  Y = N* ⊕ T.                                 (75.8)

A basis Y~o y 2 , • • • , Ym of a unitary space X is said to be dual


to a basis x 1 , x 2 , • • • , Xm of the same space if
0 if i -=1= j.
(x,, YJ) = { 1 if i= j.
A dual basis is not infrequently used to study compatible proper-
ties of operators A and A • in the same space. We first prove that any
basis has a dual which is unique. Let x 1 , x 2 , • • • , Xm be a basis. For
any j, a vector y 1 must be orthogonal to vectors x 1 , • • . , x 1 _1
and x1 ... 1 , • • • , Xm and hence to the span L 1 constructed on those
vectors. It follows that y1 lies in a one-dimensional subspace Lf.
The normalization condition (xb y 1) = 1 defines it uniquely.
It is clear that a basis will be dual to itself if and only if it is ortho-
normal. The duality relation of bases is symmetrical and therefore
it makes sense speaking of a pair of mutually dual bases. Mutually
dual bases are called biorthonormal.
Theorem 75.2. If in some basis an operator A has a matrix J, then in the basis dual
to the given one the adjoint operator A* has the matrix J*.
Proof. Let A and A* have in an orthonormal basis e_1, e_2, ..., e_m the corresponding
matrices A_e and A*_e, and let A have a matrix J in a basis x_1, x_2, ..., x_m.
Denote by P the coordinate transformation matrix for a change from e_1, e_2, ..., e_m
to x_1, x_2, ..., x_m. Then, by the formula of Section 64 for transforming the matrix
of an operator under a change of basis, we have
    J = P^{-1} A_e P.
Applying matrix conjugation to the left- and right-hand sides of this we find
    J* = P* A*_e (P^{-1})*,
or equivalently
    J* = ((P^{-1})*)^{-1} A*_e (P^{-1})*.
This relation shows that the adjoint operator A* has the matrix J* in a basis
y_1, y_2, ..., y_m for which the coordinate transformation matrix for a change from
the basis e_1, e_2, ..., e_m is (P^{-1})*. According to (63.3) the coordinates of the
vectors x_1, x_2, ..., x_m in e_1, e_2, ..., e_m are the column elements of the
matrix P, and the coordinates of the vectors y_1, y_2, ..., y_m in e_1, e_2, ..., e_m
are the column elements of the matrix (P^{-1})*. Calculating the pairwise scalar
products of the vectors of the basis x_1, x_2, ..., x_m with the vectors of
y_1, y_2, ..., y_m is equivalent to calculating the elements of the matrix
P' \overline{(P^{-1})*}. But
    P' \overline{(P^{-1})*} = P' (P^{-1})' = (P^{-1} P)' = E.
Hence the basis y_1, y_2, ..., y_m is dual to x_1, x_2, ..., x_m.
Theorem 75.2 allows many consequences to be deduced. If, for example, J is a Jordan
canonical matrix, then there are the eigenvalues λ_1, λ_2, ..., λ_m along its
diagonal. But the eigenvalues of the matrix J* are λ̄_1, λ̄_2, ..., λ̄_m. Therefore
the eigenvalues of A* are the complex conjugates of the eigenvalues of A. If A is an
operator of a simple structure, then Theorem 75.2 makes it possible to say that the
adjoint operator A* is also of a simple structure. Basis systems of the eigenvectors
of A and A* can be chosen so that they are biorthonormal, and so on.

Exercises

1. Suppose the coordinates of the vectors of some basis of a Euclidean space in an
orthonormal basis e_1, e_2, ..., e_m form the columns of a matrix A. Prove that the
coordinates of the vectors of the dual basis in the same basis e_1, e_2, ..., e_m
form the rows of the matrix A^{-1}.
2. How are the characteristic polynomials of the operators A and A* related?
3. Prove that if some subspace is invariant under an operator A, then its orthogonal
complement is invariant under A*.
4. Prove that any eigenvector of an operator A corresponding to an eigenvalue λ is
orthogonal to any eigenvector of an operator A* corresponding to an eigenvalue
μ ≠ λ̄.
5. Prove that any root vector of an operator A corresponding to an eigenvalue λ is
orthogonal to any root vector of an operator A* corresponding to an eigenvalue
μ ≠ λ̄.
76. The normal operator


The existence of an orthonormal basis in a space and of a basis consisting of
eigenvectors of a linear operator is of great importance in making diverse studies.
Our immediate task therefore is to study a class of operators that have in a unitary
space orthonormal basis systems consisting of eigenvectors. Such operators clearly
exist. Among them, for example, are all scalar operators.
Theorem 76.1 (Schur). For any linear operator in a unitary space there is an
orthonormal basis in which the matrix of the operator is triangular.
Proof. Consider, for example, the case of a right triangular matrix. By Theorem
72.1, for any operator A there are invariant subspaces L_p, p = 1, 2, ..., m, such
that the dimension of L_p is p and every subspace with a smaller index is in all
subspaces with larger indices. The desired basis is constructed as follows. As a
vector e_1 we take any normed vector of L_1. As e_2 we take a normed vector of L_2
orthogonal to L_1 and so on. As e_m we take a normed vector of L_m orthogonal to
L_{m-1}. The basis e_1, e_2, ..., e_m is orthonormal and, as noted in Section 72,
the matrix of an operator in such a basis is right triangular.
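A numerical counterpart of Theorem 76.1 is the complex Schur decomposition. The
sketch below is our own example and assumes NumPy and SciPy; it produces a unitary Q
(an orthonormal basis) and a right triangular T with A = Q T Q*.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    T, Q = schur(A, output='complex')
    print(np.allclose(Q @ Q.conj().T, np.eye(4)))     # Q is unitary
    print(np.allclose(np.tril(T, -1), 0))             # T is right triangular
    print(np.allclose(A, Q @ T @ Q.conj().T))         # A = Q T Q*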
A linear operator A is said to be normal if it is commutative with its adjoint, i.e.
    AA* = A*A.
We show that normal operators, and normal operators alone, have in a unitary space
basis systems of orthonormal eigenvectors.
The following remark is helpful in the study of these operators. If a triangular
matrix is commutative with its adjoint, then it is diagonal. Indeed, let, for
example, an m × m matrix B be right triangular and let B*B = BB*. Denote by b_{ij}
the elements of B. The condition that the diagonal elements of the matrix B*B - BB*
should be zero gives the following system of equations in the nondiagonal elements
of B:
    -|b_12|^2 - |b_13|^2 - |b_14|^2 - ... - |b_1m|^2 = 0,
     |b_12|^2 - |b_23|^2 - |b_24|^2 - ... - |b_2m|^2 = 0,
     |b_13|^2 + |b_23|^2 - |b_34|^2 - ... - |b_3m|^2 = 0,
     ..................................................
     |b_1m|^2 + |b_2m|^2 + |b_3m|^2 + ... + |b_{m-1,m}|^2 = 0.
Since the unique solution of this system is a zero solution, this proves the
validity of the above remark.
Theorem 76.2. For an operator in a unitary space to be normal it is
necessary and sufficient that it should have a basis system of orthonormal
eigenvectors.
Proof. Let A be a normal operator. Choose by Theorem 76.1 an orthonormal basis such
that the matrix of the operator is triangular. In the same basis the operator A* has
corresponding to it an adjoint triangular matrix. Under the hypothesis A is normal,
and therefore the matrices of A and A*, in the chosen basis, must be commutative.
According to the above remark these matrices are diagonal. So we have constructed an
orthonormal basis in which the matrix of the operator has a diagonal form. This
means that that basis is made up entirely of eigenvectors of the operator.
Suppose now that A has a basis system of orthonormal eigenvectors.
Then in the basis made up of those vectors the matrix of A will be
diagonal. But corresponding in the same basis to the operator A*
is an adjoint matrix that is obviously also diagonal. Diagonal ma-
trices are always commutative, and therefore so are A and A*.
In proving the theorem we have shown that if an operator A is
normal, then in a basis made up of orthonormal eigenvectors not only
the matrix of A but also the matrix of A* is diagonal. This leads to
Corollary. If A is a normal operator, then any orthonormal system
of eigenvectors of A is an orthonormal system of eigenvectors of A*,
and vice versa.
Corollary. If A is a normal operator, then the eigenvalues of A and
A* corresponding to the eigenvector they have in common are complex
conjugate.
Indeed, if Ax = λx and A*x = μx, then by (75.1) for any normed eigenvector x we have
    λ = (λx, x) = (Ax, x) = (x, A*x) = (x, μx) = μ̄.
Of course, this fact holds for any operator A sharing eigenvectors with A*. The
normality of A ensures that there are common eigenvectors.
The significance of normal operators in the general theory is ac-
counted for by two circumstances. One is that they constitute one of
the simplest classes of operators in a unitary space. The other is that
investigation of an arbitrary operator not infrequently reduces to a
study of normal operators.
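Theorem 76.2 is easy to observe in matrix terms: for a normal matrix the triangular
Schur factor turns out to be diagonal. The sketch below is our own example and
assumes NumPy and SciPy; the normal matrix is built explicitly as Q D Q* with Q
unitary and D diagonal.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(3)
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
    D = np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4))
    A = Q @ D @ Q.conj().T                               # normal by construction

    print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # A commutes with A*
    T, Z = schur(A, output='complex')
    print(np.allclose(T, np.diag(np.diag(T))))           # Schur factor is diagonal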

Exercises
1. Let A be a linear operator and let α and β be complex numbers equal in absolute
value. Prove that αA + βA* is a normal operator.
2. Let A be a normal operator. Prove that for any polynomial φ(z) the operator φ(A)
is normal.
3. Prove that for a normal operator any induced operator is normal.
4. Prove that an operator A is normal if and only if for any invariant subspace L
its orthogonal complement L^⊥ is also invariant.
5. Let A be an operator of a simple structure in a complex space. Prove that A can
always be made normal by an appropriate assignment of a scalar product in its space.
77. Unitary and Hermitian operators


Among the normal operators the most widely used are operators of two types, unitary
and Hermitian operators.
A linear operator U is said to be unitary if its adjoint operator U* coincides with
its inverse U^{-1}, i.e.
    UU* = U*U = E.
Theorem 77.1. A normal operator U is unitary if and only if all its eigenvalues are
equal to unity in absolute value.
Proof. Let U be a unitary operator. Take any of its eigenvalues λ and the
corresponding normed eigenvector x. We have
    1 = (x, x) = (x, U*Ux) = (Ux, Ux) = (λx, λx) = λλ̄(x, x) = |λ|^2.
Suppose now that all eigenvalues of the normal operator U are equal to unity in
absolute value. Let x_1, ..., x_m denote orthonormal eigenvectors of U and
λ_1, ..., λ_m its eigenvalues. Under the hypothesis, |λ_i| = 1 for every i. Recall
that for the adjoint operator U*, x_1, ..., x_m remain eigenvectors but correspond
to the eigenvalues λ̄_1, ..., λ̄_m. Take a vector x and expand it with respect to the
eigenvectors of U:
    x = α_1 x_1 + ... + α_m x_m.
Now
    U*Ux = U*(Ux) = U*(α_1 λ_1 x_1 + ... + α_m λ_m x_m)
         = α_1 λ_1 λ̄_1 x_1 + ... + α_m λ_m λ̄_m x_m = α_1 x_1 + ... + α_m x_m = x.
Since x is an arbitrary vector, this means that U*U = E. Similarly for UU* = E.
Theorem 77.2. An operator U is unitary if and only if for any two vectors their
scalar product equals that of their images.
Proof. Let U be a unitary operator. Then for any two vectors x and y we have
    (x, y) = (x, U*Uy) = (Ux, Uy).                           (77.1)
Suppose now that for some operator U equations (77.1) hold for any vectors x and y.
It follows that
    (x, (U*U - E) y) = 0.
Since x and y are arbitrary, this means that U*U = E. The operator U is nonsingular,
for otherwise the equation U*U = E would be impossible. Hence the operator U^{-1}
exists. Multiplying U*U = E by U on the left and by U^{-1} on the right we obtain
another equation, UU* = E. So U is a unitary operator.
Corollary. An operator U is unitary if and only if either UU* = E or U*U = E.
Corollary. Any unitary operator carries any orthonormal system of vectors again into
an orthonormal system.
Corollary. If a linear operator U carries any orthonormal basis again into an
orthonormal basis, then U is a unitary operator.
Indeed, let x_1, ..., x_m be an orthonormal basis, let Ux_i = y_i, and let
y_1, ..., y_m be also an orthonormal basis. Take two vectors x and y. If
    x = Σ_{i=1}^{m} α_i x_i,  y = Σ_{i=1}^{m} β_i x_i,
then
    (x, y) = Σ_{i=1}^{m} α_i β̄_i.
By the linearity of U
    Ux = Σ_{i=1}^{m} α_i y_i,  Uy = Σ_{i=1}^{m} β_i y_i.
Therefore again
    (Ux, Uy) = Σ_{i=1}^{m} α_i β̄_i.
So equations (77.1) hold for any vectors x and y.
Notice that we could define a unitary operator as an isometric operator, i.e. an
operator preserving the lengths of all vectors. This follows from Theorem 77.2 and
from the easily verifiable relation
    (x, y) = (|x + y|^2 - |x - y|^2 + i|x + iy|^2 - i|x - iy|^2) / 4.

A linear operator H is said to be Hermitian or self-adjoint if it coincides with its
adjoint, i.e.
    H = H*.
Theorem 77.3. A normal operator H is Hermitian if and only if all its eigenvalues
are real numbers.
Proof. Let H be a Hermitian operator. Take any of its eigenvalues λ and the
corresponding normed eigenvector x. We have
    λ = (λx, x) = (Hx, x) = (x, H*x) = (x, Hx) = (x, λx) = λ̄,
i.e. λ is a real number. Suppose now that the normal operator H has real
eigenvalues. Then in a basis made up of orthonormal eigenvectors of H the matrices
of H and H* coincide. Hence so do the operators themselves, i.e. H is a Hermitian
operator.
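Both spectral properties are easy to check numerically. The sketch below uses
matrices of our own construction and assumes NumPy: a Hermitian matrix has a real
spectrum, and a unitary matrix (obtained here from a QR factorization) has
eigenvalues of modulus one.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    H = B + B.conj().T                    # a Hermitian matrix
    U, _ = np.linalg.qr(B)                # a unitary matrix

    print(np.allclose(np.linalg.eigvals(H).imag, 0))        # real eigenvalues
    print(np.allclose(np.abs(np.linalg.eigvals(U)), 1.0))   # |eigenvalue| = 1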
A Hermitian operator H is said to be nonnegative (positive definite) if for any
(nonzero) vector x
    (Hx, x) ≥ 0  (> 0).
Theorem 77.4. A Hermitian operator H is nonnegative (positive definite) if and only
if all its eigenvalues are nonnegative (positive).
Proof. Choose an orthonormal basis made up of eigenvectors x_1, ..., x_m of a
Hermitian operator H. Then it follows from the expansion
    x = ξ_1 x_1 + ... + ξ_m x_m
for a vector x that
    (Hx, x) = λ_1 |ξ_1|^2 + ... + λ_m |ξ_m|^2.
Hence, if all eigenvalues of a Hermitian operator are nonnegative (positive), then
the operator itself is also nonnegative (positive definite). Putting x = x_i we get
    (Hx_i, x_i) = λ_i
for every i. Therefore all eigenvalues of a nonnegative (positive definite) operator
are nonnegative (positive).
It follows from the foregoing that a positive definite operator is a nonsingular
nonnegative operator. Among all the Hermitian operators nonnegative and positive
definite operators play an especially important role. We note some of their
properties.
If H and S are positive definite operators, then the operator αH + βS is positive
definite for any nonnegative numbers α and β not both zero.
Indeed, the operator αH + βS is Hermitian for any real numbers α and β. If, however,
those numbers are nonnegative and are not both zero, then
    ((αH + βS) x, x) = α(Hx, x) + β(Sx, x) > 0
for x ≠ 0.
If an operator H is positive definite, then H^{-1} is also a positive definite
operator.
Indeed, since H = H*, we have H^{-1} = (H*)^{-1} = (H^{-1})*, i.e. the operator
H^{-1} is Hermitian. The eigenvalues of H^{-1} are the inverses of the eigenvalues
of H. Therefore they are positive and H^{-1} is positive definite.
If H is positive definite and A is a nonsingular operator, then A*HA and AHA* are
positive definite operators.
It is easy to verify that they are Hermitian. By the nonsingularity of A we have
Ax ≠ 0 and A*x ≠ 0 for any x ≠ 0. Therefore
    (A*HAx, x) = (HAx, Ax) > 0,  (AHA*x, x) = (HA*x, A*x) > 0
for x ≠ 0. In particular, it follows that for any nonsingular opera-
tor A the operators A*A and AA* are positive definite. But if A is a singular
operator, then A*A and AA* are nonnegative.
For any nonnegative operator H there is a nonnegative operator S such that S^2 = H.
Indeed, let λ_1, ..., λ_m be the eigenvalues of H and x_1, ..., x_m the corresponding
orthonormal eigenvectors. Then Hx_i = λ_i x_i for every i. Let S be defined by the
equations Sx_i = √λ_i x_i. The operator S is nonnegative, since it has a basis
system of orthonormal eigenvectors x_1, ..., x_m corresponding to the nonnegative
eigenvalues √λ_1, ..., √λ_m. Besides, S^2 x_i = λ_i x_i = Hx_i. Thus S^2 and H
coincide on the vectors of the basis x_1, ..., x_m and therefore they do on all
vectors, i.e. S^2 = H.
A nonnegative operator S is said to be the principal square root of a nonnegative
operator H if S^2 = H.
It is important to stress that all eigenvectors of S and H coincide. Indeed, suppose
λ_1, ..., λ_r and √λ_1, ..., √λ_r are the various eigenvalues of H and S
respectively. Denote by X_i (Y_i), i = 1, 2, ..., r, the proper subspace of the
operator H (S) containing all eigenvectors corresponding to an eigenvalue λ_i
(√λ_i). The direct sums of the proper subspaces X_1, ..., X_r and Y_1, ..., Y_r
coincide with the entire space. Therefore
    dim X_1 + ... + dim X_r = dim Y_1 + ... + dim Y_r.        (77.2)
It is clear that Y_i ⊂ X_i for every i, i.e. dim Y_i ≤ dim X_i. Hence (77.2) can
hold only if for every i we have dim Y_i = dim X_i, i.e. Y_i = X_i.
So the eigenvalues and eigenvectors of S are uniquely defined by H. Since S is a
Hermitian operator, this means that the principal root of H can only be unique.
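The construction of the principal square root can be mirrored numerically. The
sketch below (our own example; assumes NumPy) builds a nonnegative Hermitian matrix
H = B*B and forms S from the spectral decomposition of H exactly as in the text.

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    H = B.conj().T @ B                          # nonnegative Hermitian

    lam, X = np.linalg.eigh(H)                  # eigenvalues, orthonormal eigenvectors
    lam = np.clip(lam, 0.0, None)               # guard against tiny negative rounding
    S = X @ np.diag(np.sqrt(lam)) @ X.conj().T

    print(np.allclose(S, S.conj().T))           # S is Hermitian
    print(np.min(np.linalg.eigvalsh(S)) >= -1e-12)   # S is nonnegative
    print(np.allclose(S @ S, H))                # S^2 = H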

Exercises
1. Prove that the set of all unitary operators in a given unitary space forms a
group relative to multiplication.
2. Prove that the set of all Hermitian operators in a given unitary space forms a
group relative to addition.
3. Let an operator A be Hermitian and B positive definite. Prove that the
eigenvalues of the operators BA and B^{-1}A are real.
4. Prove that if A and B are positive definite operators, then all eigenvalues of
the operator BA are positive.
5. Prove that if A and B are commutative positive definite operators, then the
operator BA is also positive definite.
6. Prove that if A is a positive definite operator in a unitary space, then the
function (x, y)_A = (Ax, y) satisfies all the scalar product axioms.
78. Operators A*A and AA*


If A is an operator from a unitary space X to a unitary space Y, then an operator
A*A is defined in X and an operator AA* in Y. These operators will play an important
role in our further studies. Therefore we shall now proceed to investigate them.
From the first and fourth properties of (75.4) it follows that A*A and AA* are
Hermitian operators. Moreover, they are nonnegative, since for any vectors x ∈ X and
y ∈ Y we have
    (A*Ax, x) = (Ax, Ax) ≥ 0,
    (AA*y, y) = (A*y, A*y) ≥ 0.
Therefore there is a nonnegative operator G in X and a nonnegative operator F in Y
such that
    A*A = G^2,  AA* = F^2.
The operators G and F satisfying these relations are unique.
Whatever the operator A, the operator A*A has an orthonormal system of eigenvectors
x_1, x_2, ..., x_m. The operator A always carries that system into some orthogonal
system. Indeed, let
    A*Ax_k = ρ_k^2 x_k,  ρ_k ≥ 0,                             (78.1)
for all k = 1, 2, ..., m. Then
    (Ax_k, Ax_l) = (A*Ax_k, x_l) = ρ_k^2 (x_k, x_l) = 0
for k ≠ l. In addition, for every k
    |Ax_k| = ρ_k,
and therefore the vector Ax_k is nonzero if and only if the eigenvalue ρ_k^2 of A*A
is nonzero.
The nonzero vector Ax_k is an eigenvector of AA* and corresponds to the eigenvalue
ρ_k^2. Indeed, by (78.1)
    AA*(Ax_k) = A(A*Ax_k) = A(ρ_k^2 x_k) = ρ_k^2 Ax_k.
Thus all nonzero eigenvalues of A*A are eigenvalues of AA*. The converse is also
true of course. Therefore the nonzero eigenvalues of A*A and AA* always coincide.
The eigenvalues of A*A and AA* will be denoted by ρ_1^2, ρ_2^2, .... It may be
assumed without loss of generality that
    ρ_1^2 ≥ ρ_2^2 ≥ ... ≥ ρ_t^2 > 0
and that the other eigenvalues ρ_k^2 are zero. It is obvious that the eigenvalues of
A*A and AA* differ only in the multiplicity of the ze-
ro eigenvalue. The multiplicity of the zero eigenvalue of A*A is (m - t) and that
for AA* is (n - t).
The principal square roots of the common eigenvalues of A*A and AA* are called
singular (or principal) values of A.
Using the eigenvectors of A*A and AA* it is possible to construct orthonormal bases
in the spaces X and Y with the aid of which it is easy to describe and investigate
the operators A and A*. Take as a basis in X an orthonormal system x_1, ..., x_m of
eigenvectors of A*A. It follows from (75.8) that the vectors x_1, ..., x_t form a
basis in T* and that the vectors x_{t+1}, ..., x_m form a basis in N. The
orthonormal basis y_1, ..., y_n in Y is constructed as follows. As y_1, ..., y_t we
take the vectors obtained after the normalization of Ax_1, ..., Ax_t. These vectors
form a basis in T. As y_{t+1}, ..., y_n we take any orthonormal basis in N*. It is
clear that y_1, ..., y_n are eigenvectors for AA* and form a basis in Y. Considering
that |Ax_k| = ρ_k we now find
    Ax_k = { ρ_k y_k,  k ≤ t,
           { 0,        k > t.                                 (78.2)
Multiplying these equations by A* and taking into account (78.1) we get
    A*y_k = { ρ_k x_k,  k ≤ t,
            { 0,        k > t.                                (78.3)

The orthonormal bases in X and Y connected with A and A* by


relations (78.2) and (78.3) are called singular bases.
If X and Y are distinct spaces, then the matrix of A can be writ-
ten in singular bases. Denote it by .t\. By (78.2) it is as follows:
( Pt 0 1
P2

(78.4)
Pr
0

If X and Y coincide, then singular bases are not used as a rule to


write the matrix of an operator. Relations (78.2) and (78.3) hold
again, however.
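In computational practice the singular values and singular bases are delivered by
the singular value decomposition. The sketch below (our own example; assumes NumPy)
checks that the computed singular values are the square roots of the eigenvalues of
A*A and that the first pair of singular vectors satisfies (78.2).

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

    U, s, Vh = np.linalg.svd(A)                  # A = U diag(s) V*
    eig_AstarA = np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1]
    print(np.allclose(s**2, eig_AstarA))         # rho_k^2 are eigenvalues of A*A

    x1 = Vh[0].conj()                            # first right singular vector
    y1 = U[:, 0]                                 # first left singular vector
    print(np.allclose(A @ x1, s[0] * y1))        # A x_1 = rho_1 y_1, as in (78.2)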
Exercises
1. Prove that the kernel of A coincides with that of A*A (the kernel of A* with
that of AA*), and that the range of A coincides with that of AA* (the range of A*
with that of A*A).
2. Prove that if dim X > dim Y (dim X < dim Y), then A*A (AA*) is a singular
operator.
3. Prove that singular values remain unaffected by multiplication of an operator A
by any unitary operators.
4. Let A be an operator in a space X and let all its singular values be mutually
distinct. Prove that singular bases are uniquely defined up to multiplication of
each of the vectors by a number equal to unity in absolute value.
5. Prove that the singular values of a normal operator coincide with the moduli of
its eigenvalues.
6. Prove that the singular values of an operator A^{-1} are the inverses of the
singular values of an operator A and that the singular bases of both operators
coincide.
7. Let A be an operator in an m-dimensional unitary space X. Denote by
λ_1, ..., λ_m its eigenvalues and by ρ_1, ..., ρ_m its singular values. Prove that
    Σ_{k=1}^{m} |λ_k|^2 ≤ Σ_{k=1}^{m} ρ_k^2,
    ∏_{k=1}^{m} |λ_k| = ∏_{k=1}^{m} ρ_k.
8. Prove that if |λ_k| = ρ_k for all k = 1, 2, ..., m, then the operator is normal.

79. Decomposition of
an arbitrary operator
One of the circumstances determining the significance of the unitary and the
Hermitian operator is the possibility of using them to represent an arbitrary linear
operator.
Let A be a linear operator in a unitary space X. We show that it can always be
represented as
    A = H_1 + iH_2,                                           (79.1)
where H_1 and H_2 are Hermitian operators. Indeed, if this decomposition exists,
then A* = H_1 - iH_2. But then
    H_1 = (1/2)(A + A*),  H_2 = (1/2i)(A - A*).
It is these formulas that define decomposition (79.1). Since
    H_1 H_2 - H_2 H_1 = (1/2i)(A*A - AA*),
the normality of A implies that H_1 and H_2 are commutative, and vice versa.
Let x_1, ..., x_m be an orthonormal system of eigenvectors of the operator A*A.
According to (78.2) there is an orthonormal system y_1, ..., y_m of eigenvectors of
AA* such that
    Ax_k = ρ_k y_k                                            (79.2)
for all k. Now let linear operators F and U be defined in the space X by the
following equations on the basis systems of vectors:
    Ux_k = y_k,  Fy_k = ρ_k y_k.                              (79.3)
Relations (79.2) and (79.3) imply that the following decomposition is obtained:
    A = FU.                                                   (79.4)
Here F is a nonnegative Hermitian operator, since it has a basis orthonormal system
of eigenvectors y_1, y_2, ..., y_m and nonnegative eigenvalues ρ_1, ρ_2, ..., ρ_m.
The operator U is unitary, since it carries the orthonormal system of vectors
x_1, x_2, ..., x_m into the orthonormal system y_1, y_2, ..., y_m. Note that (79.4)
yields
    AA* = FUU*F = F^2,                                        (79.5)
i.e. F is the principal square root of AA*.
Decomposition (79.4) is called a polar factorization of an operator A. By virtue of
the uniqueness of the principal root the operator F in a polar factorization will
always be unique. The operator U will be unique only when the operator A is
nonsingular. In that case U = F^{-1}A.
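A polar factorization is also available numerically. The sketch below is our own
example and assumes NumPy and SciPy; scipy.linalg.polar with side='left' returns a
unitary factor and a nonnegative Hermitian factor with A = F U, matching (79.4).

    import numpy as np
    from scipy.linalg import polar

    rng = np.random.default_rng(7)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    U, F = polar(A, side='left')                       # A = F U
    print(np.allclose(A, F @ U))
    print(np.allclose(U @ U.conj().T, np.eye(4)))      # U is unitary
    print(np.allclose(F, F.conj().T))                  # F is Hermitian
    print(np.allclose(F @ F, A @ A.conj().T))          # F^2 = A A*, as in (79.5)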
Again there is a direct connection between the normality of an operator A and the
commutativity of the components of its polar factorization. Indeed, let UF = FU for
some operator A. Then
    A*A = U*F*FU = F*U*UF = F^2,
which together with (79.5) means that the operator A is normal.
Suppose now that A is a normal operator, i.e. that A*A = AA*. By (79.4) A = FU.
Hence A* = U*F. The normality condition of the operator leads to U*F^2 U = F^2, or
    F^2 U = UF^2.
Taking into account the second of the relations (79.3) we get
    F^2 (Uy_k) = ρ_k^2 (Uy_k)
for all k = 1, 2, ..., m, i.e. the Uy_k are eigenvectors for the operator F^2. As
noted earlier, F^2 and F have the same eigenvectors. Therefore
    (FU) y_k = F(Uy_k) = ρ_k (Uy_k)
for all k = 1, 2, ..., m. On the other hand, by the second of the relations (79.3)
    (UF) y_k = U(Fy_k) = U(ρ_k y_k) = ρ_k (Uy_k).
These equations show that the operators FU and UF coincide on the basis system of
vectors y_1, y_2, ..., y_m. Hence UF = FU.

Exercises
1. Prove that if an operator A is normal, then the eigenvalues of the operator H_1
(H_2) of (79.1) are the real (imaginary) parts of the eigenvalues of the operator A.
2. Prove that if A is a normal operator, then the eigenvalues of the operator F (the
arguments of the eigenvalues of the operator U) of (79.4) are the absolute values of
the eigenvalues (the arguments of the nonzero eigenvalues) of A.
3. Prove that if the operator A is normal, then both operators in decomposition
(79.1) have the same eigenvectors as A. What can be said about the eigenvectors of
the components of decomposition (79.4)?

80. Operators in the real space


Additional difficulties arise in investigating linear operators in a real space.
They are mainly due to the fact that not every linear operator in a real space has
at least one eigenvector.
Of course, if the characteristic polynomial of an operator in a real space has only
real roots, then there is close similarity in theory. In fact only terminology
changes. That is, the words "complex, unitary, Hermitian" are replaced respectively
by "real, orthogonal, symmetric". If, however, the characteristic polynomial has, in
addition, complex roots, then the study of such an operator becomes a more
complicated matter.
Let the real space R be given. Consider the set of all possible pairs (x; y) of
vectors x and y from R. We define operations on those pairs. It is assumed that
    (x; y) + (u; v) = (x + u; y + v)
for any two pairs and that for any complex number ξ + iη and any pair (x; y)
    (ξ + iη)(x; y) = (ξx - ηy; ηx + ξy).
It is easy to verify that the set of all pairs of vectors from R with the operations
thus introduced is a complex space C.
The constructed space C has the same dimension as the space R. Indeed, let
e_1, e_2, ..., e_m be a basis in R. For any pair of vectors u and v from R we have
    u = α_1 e_1 + ... + α_m e_m,
    v = β_1 e_1 + ... + β_m e_m,                              (80.1)
where α_i and β_i are real numbers. But it follows that
    (u; v) = Σ_{k=1}^{m} (α_k + iβ_k)(e_k; 0).                (80.2)
The system (e_1; 0), ..., (e_m; 0) is linearly independent. Therefore the dimension
of C is equal to m.
For any basis e_1, ..., e_m in R and any real numbers α_1, ..., α_m
    Σ_{k=1}^{m} (α_k + i0)(e_k; 0) = (Σ_{k=1}^{m} α_k e_k; 0).
Hence there is a 1-1 correspondence between all vectors u from R and all pairs of
the form (u; 0) from C. Moreover, this correspondence is an isomorphism if
restricted to the operations with real numbers.
If all pairs of the form (u; 0) are identified with vectors u from R, then it
follows from (80.1) and (80.2) that the space C may be considered as a set of
elements
    w = u + iv,
where u, v ∈ R. It should be remembered, of course, that in fact the elements u and
v are pairs (u; 0) and (v; 0) and that multiplication by the number i and addition
are carried out according to the definitions introduced above. When v = 0 we obtain
elements of R. It is natural to consider R to be a subset of C. Elements of the form
u + i0 will be called real and elements u + iv and u - iv complex conjugate.
The space C is called the complexification of the real space R.
In solving various problems in a Euclidean space we can proceed in a similar way to
obtain a unitary space. Consider the complexification C of a Euclidean space R. For
any two vectors
    z = x + iy,  w = u + iv
from C it is assumed by definition that
    (z, w) = ((x, u) + (y, v)) + i((y, u) - (x, v)).
It is not hard to establish that the space C with such a scalar product is unitary.
The scalar product of any two vectors from R is preserved.
Let A be an operator in R. Construct a new operator Ã in C equal to A on R. To do
this we set
    Ã(u + iv) = Au + iAv.
It is clear that Ã is a linear operator and that Ãu = Au for every vector u ∈ R.
The operator Ã is called the complexification of the operator A. Now instead of
studying the operator A in the real space R it is possible to consider the operator
Ã in the complex space C and investigate it in R as a subset of C. This device is
most often used when some fact in the complex space has no analogue in the real
space.
Suppose that a real basis is given in C. Then in that basis the matrix of the
complexification Ã is real and coincides with the matrix of the operator A in the
same basis. It follows that the characteristic polynomial of Ã coincides with that
of A and hence has real coefficients. It is obvious that
If the characteristic polynomial of an operator A in the real space R has a real
root, then that root is an eigenvalue of A and has at least one real eigenvector
corresponding to it.
Consider now some complex root λ of the characteristic polynomial of A. It is an
eigenvalue of the complexification Ã and has some eigenvector w corresponding to it.
Since the characteristic polynomial of A has real coefficients, Ã will also have the
complex conjugate eigenvalue λ̄. The operator Ã carries complex conjugate vectors
into complex conjugate vectors. Therefore it follows from Ãw = λw that Ãw̄ = λ̄w̄.
Hence the complex conjugate eigenvalues of Ã have corresponding complex conjugate
eigenvectors.
If λ ≠ λ̄, then the vectors w and w̄ are linearly independent as eigenvectors
corresponding to distinct eigenvalues.
Consider vectors x and y defined as follows in terms of w and w̄:
    x = (1/2)(w + w̄),  y = (1/2i)(w - w̄).                    (80.3)
It is easy to verify that they are real. Moreover, it is not hard to see that if
λ = μ + iν, then
    Ax = μx - νy,  Ay = νx + μy.
in basis (80.3) is as follows:

( -v
!.1. v) .
Jl,
Hence the characteristic polynomial of the induced operator i!' (z -
- J.1.) 2 + v 2 or equivalently z2 - (A. +
~) z +
AI. Note that in the
invariant subspace constructed A has no eigenvector for ,. :/= 0.
Thus we have arrived at an important conclusion. Namely:
If the characteristic polynomial of an operator A in the real space R
ha.<; a complex (not real!) root, then that root has in R a corresponding
tu·o-dimensional invariant subspace of A containing no eigenvectors.
This conclusion is as important for the study of operators in a
real space as is the fact of the existence of at least one eigenvector for
the study of operators in a complex space. Choosing in a suitable way
bases in the space R we can reduce the matrix of an operator to a form
resembling in a sense either the diagonal form or the triangular form
or the Jordan canonical form. This method of investigating the op-
erator is employed comparatively rarely, since real canonical forms
lack many merits of complex canonical forms. It is much easier and
more fruitful to investigate the complexification of an operator.
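A small numerical illustration of the 2 × 2 block above may help (our own example;
assumes NumPy): the matrix with rows (μ, ν) and (-ν, μ) has the complex conjugate
eigenvalues μ ± iν and, for ν ≠ 0, no real eigenvector at all.

    import numpy as np

    mu, nu = 0.6, 0.8
    A = np.array([[mu, nu],
                  [-nu, mu]])

    lam = np.linalg.eigvals(A)
    print(np.sort_complex(lam))              # 0.6 - 0.8j, 0.6 + 0.8j
    print(np.allclose(lam.imag**2, nu**2))   # imaginary parts are +-nu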
Exercises

1. Prove that the range (the kernel) of the operator Ã is the complexification of
the range (the kernel) of the operator A.
2. Let the complexification Ã have a simple structure. Prove that a basis can be
chosen in R such that the matrix of the operator A has a quasi-diagonal form with
1 × 1 and 2 × 2 matrices along the diagonal.
3. Prove that in a real space R of dimension m any operator has an invariant
subspace of dimension m - 1 or m - 2.
4. What is the counterpart of Theorem 72.1 in a real space?
5. Prove that any linear operator in a real space of odd dimension has at least one
eigenvector.

81. Matrices of a special form


We have discussed some operators of a special
form. It is natural to suggest that the matrices of those operators
should also have some specificity.
A square complex matrix U is said to be unitary if its adjoint U* coincides with its inverse U^{-1}, i.e.
    UU* = U*U = E.
We recall that in an orthonormal basis the adjoint operator has a
corresponding adjoint matrix. Hence the matrix of a unitary opera-
tor in an orthonormal basis is unitary.
Suppose that in a unitary space any two orthonormal bases are
given. We construct a coordinate transformation matrix for a change
from one of the bases to the other. According to (63.3) the matrix
columns are made up of the coordinates of the vectors of the second
basis relative to the first. But of the same form is also the matrix of
a linear operator transforming the vectors of the first basis into those
of the second. According to the second corollary of Theorem 77.2
that operator is unitary. Therefore
A coordinate transformation matrix for a change from an orthonormal
basls to an orthonormal basis is unitary.
We shall say that two matrices are unitarily similar if they are sim-
ilar and the similarity transformation matrix is unitary. It follows
from the properties of the unitary operator that any unitary matrix is
unitarily similar to a diagonal matrix with diagonal elements equal
to unity in absolute value.
It is easy to write the relations defining the elements of a unitary matrix. Let U be an m × m matrix. We denote by u_ij its elements. Then it follows from UU* = E that
    Σ_{k=1}^m u_ik ū_jk = { 0 if i ≠ j,  1 if i = j }.

Similarly from U*U = E we get
    Σ_{k=1}^m ū_ki u_kj = { 0 if i ≠ j,  1 if i = j }.

Thus the systems of row vectors and column vectors of any unitary
matrix are orthonormal systems.
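These relations are easy to check numerically. The following sketch, which assumes the NumPy library (the random matrix built here is only an illustration), produces a unitary matrix from a QR decomposition and verifies that UU* = U*U = E and that its rows and columns form orthonormal systems.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 4
    # The unitary factor of a QR decomposition of a random complex matrix.
    Z = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    U, _ = np.linalg.qr(Z)

    E = np.eye(m)
    assert np.allclose(U @ U.conj().T, E)                 # UU* = E
    assert np.allclose(U.conj().T @ U, E)                 # U*U = E
    assert np.allclose(np.vdot(U[0, :], U[1, :]), 0.0)    # distinct rows are orthogonal
    assert np.allclose(np.linalg.norm(U[:, 2]), 1.0)      # every column has unit length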
A real unitary matrix U is called orthogonal. It is defined by the
following relations:
UU'=U'U=E.

All properties of orthogonal matrices follow from those of unitary matrices.
A square complex matrix H is said to be Hermitian or self-adjoint if it coincides with its adjoint, i.e.
H = H*.
Thus the matrix of a Hermitian operator in an orthonormal basis is
Hermitian.
It follows from the properties of the Hermitian operator that any
Hermitian matrix is unitarily similar to a real diagonal matrix. If
h_ij are elements of the Hermitian matrix H, then
    h_ij = h̄_ji
for all i and j. It follows in particular that the diagonal elements of any Hermitian matrix are real.
A real Hermitian matrix H is called symmetric. It is defined by
the following relation:
H=H'.
Note that any symmetric matrix is orthogonally similar to a real di-
agonal matrix.
A square matrix is said to be normal if it is commutative with its
adjoint.
According to this definition the matrix of a normal operator in an
orthonormal basis is normal. Taking into account the properties of
a normal operator it is easy to see that any complex normal matrix
is unitarily similar to a diagonal matrix.
Matrices of a special form play an important role in constructing
various computational algorithms. Nevertheless we shall not be con-
cerned with their detailed study. All the properties of these matrices
are virtually a reflection of similar properties of corresponding opera-
tors.

Exercises
1. Prove that any complex matrix is unitarily sim-
ilar to a triangular matrix.
2. Let λ_1, λ_2, ..., λ_m be the eigenvalues of a matrix A, each eigenvalue repeated according to multiplicity. Prove that
    Σ_{i=1}^m |λ_i|² ≤ tr (A*A).                                  (81.1)

3. Prove that equality holds in (81.1) if and only if the matrix A is normal.
4. Using the Binet-Cauchy formula prove that for any matrix A the principal minors of the matrix A*A are nonnegative.
5. Prove that the sum of the squares of the absolute values of all minors of
a unitary matrix in any fixed rows and columns is equal to unity.
6. Prove that any rectangular matrix A can be represented as A = QΛS, where Q and S are unitary matrices and Λ is a diagonal matrix with nonnegative elements.
CHAPTER 10

Metric Properties
of an Operator

82. The continuity and boundedness of an operator
We have introduced the concept of linear oper-
ator as some generalization of the notion of function. Assuming that
in spaces a metric is defined, it is possible to draw an analogy with
the boundedness of a function, the continuity of a function, etc.
When studying these questions we shall always assume that the op-
erator acts from an m-dimensional normed space X to an n-dimensio-
nal normed space Y. If X does not coincide with Y, then the norms in
both spaces can be introduced independently of each other.
An operator A from X to Y is said to be continuous at a point x_0 ∈ X if the condition x_k → x_0 implies Ax_k → Ax_0 for any sequence {x_k} in X. If the operator is continuous at each point of X, then it
is said to be everywhere continuous or simply continuous.
Theorem 82. t. A linear operator in arbitrary finite dimensional
normed spaces is continuous.
Proof. We take a vector x_0 ∈ X and choose any basis e_1, e_2, ..., e_m in X. We have
    x_0 = ξ_1^(0) e_1 + ... + ξ_m^(0) e_m.
Suppose x_k → x_0 and
    x_k = ξ_1^(k) e_1 + ... + ξ_m^(k) e_m.
By Theorem 53.1 convergence in the norm implies coordinate convergence. Therefore ξ_s^(k) → ξ_s^(0) for every s. But
    Ax_0 = ξ_1^(0) Ae_1 + ... + ξ_m^(0) Ae_m
and in addition
    Ax_k = ξ_1^(k) Ae_1 + ... + ξ_m^(k) Ae_m.
Now the convergence ξ_s^(k) → ξ_s^(0) for every s will imply the convergence Ax_k → Ax_0 in the norm of Y.
An operator A is said to be bounded if there is a constant M such
that II Ax II ~ M II x II for any vector x EX.
Theorem 82.2. A linear operator in arbitrary finite dimensional
normed spaces is bounded.

Proof. Suppose an operator A is not bounded. Then there is a sequence {x_k} of nonzero vectors such that
    ||Ax_k|| ≥ k ||x_k||.
Consider a sequence of vectors
    y_k = x_k / (k ||x_k||).
It converges to zero, since
    ||y_k|| = ||x_k|| / (k ||x_k||) = 1/k → 0.
On the other hand,
    ||Ay_k|| = ||Ax_k|| / (k ||x_k||) ≥ 1.
This means that {Ay_k} does not converge to zero, i.e. that A is not continuous at zero. This contradiction with Theorem 82.1 completes the proof.
It is natural to pose the question concerning the smallest of the
constants M satisfying II Ax II~ M II x II for all vectors x. Since
the set of those constants is bounded below by zero, the smallest con-
stant clearly exists. It is called the norm of the operator A and desig-
nated II A 11. By definition the norm of an operator has the following
two properties:
(1) for any vector x in X
II Ax II~ II A II • II x II, (82.1)
(2) for every number ε > 0 there is a vector x_ε ∈ X such that
    ||Ax_ε|| ≥ (||A|| − ε) ||x_ε||.                               (82.2)
We prove that
    ||A|| = sup_{||x|| ≤ 1} ||Ax||                                (82.3)
or equivalently that
    ||A|| = sup_{x ≠ 0} ||Ax|| / ||x||                            (82.4)
if of course dim X > 0.
We take a vector x satisfying ||x|| ≤ 1. Then it follows from (82.1) that
    ||Ax|| ≤ ||A|| ||x|| ≤ ||A||.
Consequently
    sup_{||x|| ≤ 1} ||Ax|| ≤ ||A||.                               (82.5)

We further take any vector x_ε according to (82.2) and construct a vector
    y_ε = x_ε / ||x_ε||.
Then
    ||Ay_ε|| = ||Ax_ε|| / ||x_ε|| ≥ (||A|| − ε) ||x_ε|| / ||x_ε|| = ||A|| − ε.
Since ||y_ε|| = 1, we have
    sup_{||x|| ≤ 1} ||Ax|| ≥ ||Ay_ε|| ≥ ||A|| − ε.
By virtue of the arbitrariness of ε we get
    sup_{||x|| ≤ 1} ||Ax|| ≥ ||A||.                               (82.6)
Now from (82.5) and (82.6) we obtain relation (82.3) which was to be established.
We shall soon show that the norm of an operator plays an exceptional-
ly important role in introducing a metric in the space of linear operators.
It is the explicit form (82.3) that will be essential.
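The supremum in (82.3) lends itself to a direct numerical illustration. The sketch below, assuming NumPy (the random operator is only an example), samples vectors of the unit ball and compares the largest value of ||Ax|| found with the exact subordinate 2-norm.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))

    best = 0.0
    for _ in range(20000):
        x = rng.standard_normal(3)
        x /= max(np.linalg.norm(x), 1.0)       # keep the sample inside the unit ball
        best = max(best, np.linalg.norm(A @ x))

    exact = np.linalg.norm(A, 2)               # the subordinate (spectral) norm
    print(best, exact)                         # best never exceeds exact and approaches it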

Exercises
1. Prove that on a bounded closed set of vectors the
supremum and infimum of the norms of the values of a linear operator are
attained.
2. Prove that a linear operator carries any bounded closed set again into
a bounded closed set.
3. Is the assertion of the preceding exercise true if the boundedness requirement of a set is dropped?
4. Prove that in (82.3) the supremum is attained on a set of vectors satisfying
II x II = 1 provided dim X > 0.
5. Let A be an operator in a space X. Prove that A is nonsingular if and only if there is a number m > 0 such that ||Ax|| ≥ m ||x|| for any x ∈ X.

83. The norm of an operator


The set of linear operators from X to Y is a finite dimensional vector space. If that space is real or complex, then it can be converted into a complete metric space by introducing a norm in it in some way.
To introduce a norm in a space of linear operators the same methods can be used as those employed in any other vector space. Of most interest in this case, however, are only the norms that are sufficiently closely related to those in X and Y. One of the most important classes of such norms is the class of the so-called compatible norms.

If for each operator A of this space
    ||Ax|| ≤ ||A|| · ||x||
for all x ∈ X, then the operator norm is said to be compatible with the vector norms in X and Y.
The advantage of a compatible norm is easy to see from the following example. Suppose that λ is an eigenvalue of an operator A in X and that x is the corresponding eigenvector. Then Ax = λx and therefore
    |λ| · ||x|| = ||λx|| = ||Ax|| ≤ ||A|| · ||x||.
Hence |λ| ≤ ||A||. So we have obtained a very important conclusion:
The moduli of the eigenvalues of a linear operator do not exceed any of its compatible norms.
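For instance, the following sketch (assuming NumPy, with a random matrix taken only as an illustration) checks that the spectral radius of a matrix does not exceed its 1-, 2- and ∞-norms, all of which are subordinate and hence compatible.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 5))

    spectral_radius = np.abs(np.linalg.eigvals(A)).max()
    for p in (1, 2, np.inf):
        # |lambda| <= ||A|| for every compatible norm.
        assert spectral_radius <= np.linalg.norm(A, p) + 1e-12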
This example shows that to obtain the best estimates it is desir-
able that the smallest of the compatible norms should be used. It is
clear that all compatible norms are bounded below by expression
(82.3). If we show that this expression satisfies the norm axioms,
then it will precisely be the smallest of the compatible norms. This
justifies both the name of expression (82.3) and the notation used.
It is obvious that for any operator A the expression ||A|| is nonnegative. If ||A|| = 0, i.e. if
    sup_{||x|| ≤ 1} ||Ax|| = 0,
then ||Ax|| = 0 for every vector x whose norm does not exceed unity. But then, by the linearity of the operator, Ax = 0 for every x. Hence A = 0. For any operator A and any λ we have
    ||λA|| = sup_{||x|| ≤ 1} ||λAx|| = |λ| sup_{||x|| ≤ 1} ||Ax|| = |λ| · ||A||.
And finally for any two operators A and B
    ||A + B|| = sup_{||x|| ≤ 1} ||Ax + Bx|| ≤ sup_{||x|| ≤ 1} (||Ax|| + ||Bx||)
              ≤ sup_{||x|| ≤ 1} ||Ax|| + sup_{||x|| ≤ 1} ||Bx|| = ||A|| + ||B||.

All these relations precisely mean that (82.3) is a norm in a space of


linear operators. Norm (82.3) is called an operator norm subordinate
to the vector norms in X and Y.
A subordinate norm has a very important property relative to the
operation of operator multiplication, too. Let A be an operator from X
to Y and B an operator from Y to Z. As is known, this defines an
operator BA. Considering the compatibility of subordinate norms

we find
    ||BA|| = sup_{||x|| ≤ 1} ||(BA)x|| = sup_{||x|| ≤ 1} ||B(Ax)||
           ≤ sup_{||x|| ≤ 1} (||B|| · ||Ax||) = ||B|| sup_{||x|| ≤ 1} ||Ax|| = ||B|| · ||A||.

Thus any subordinate norm of an operator has the following four basic properties. For any operators A and B and any number λ
    (1) ||A|| > 0 if A ≠ 0;  ||0|| = 0,
    (2) ||λA|| = |λ| ||A||,
    (3) ||A + B|| ≤ ||A|| + ||B||,                                (83.1)
    (4) ||BA|| ≤ ||B|| · ||A||.
To note a further property, for the identity operator E
    (5) ||E|| = 1.
This follows from (82.3), since Ex = x for any vector x.
In the general case a subordinate norm of an operator depends both on the norm in X and on the norm in Y. If both spaces are unitary, then we may take as a norm in them the length of the vectors. The corresponding subordinate norm of the operator is called the spectral norm and designated ||·||_2. So for any operator A from X to Y
    ||A||_2² = sup_{(x, x) ≤ 1} (Ax, Ax).                         (83.2)
We investigate some properties of spectral norm.
The spectral norm remains unaffected by the multiplication of an op-
erator by any unitary operators.
Let V and U be arbitrary unitary operators in X and Y respectively. Consider the operator B = UAV. We have
    ||B||_2² = sup_{(x, x) ≤ 1} (Bx, Bx) = sup_{(x, x) ≤ 1} (UAVx, UAVx)
             = sup_{(x, x) ≤ 1} (AVx, U*UAVx) = sup_{(x, x) ≤ 1} (AVx, AVx)
             = sup_{(Vx, Vx) ≤ 1} (AVx, AVx) = sup_{(v, v) ≤ 1} (Av, Av) = ||A||_2².

Assigning a spectral norm in the form (83.2) establishes its relation to the singular values of the operator A. Let x_1, x_2, ..., x_m be an orthonormal system of eigenvectors of the operator A*A and let ρ_1², ρ_2², ..., ρ_m² be its eigenvalues. It may be assumed without loss of generality that
    ρ_1 ≥ ρ_2 ≥ ... ≥ ρ_m ≥ 0.                                    (83.3)
We represent a vector x ∈ X as
    x = α_1 x_1 + ... + α_m x_m,                                  (83.4)
then
    (x, x) = Σ_{i=1}^m |α_i|².
As noted in Section 78, the system x_1, x_2, ..., x_m is carried by an operator A into an orthogonal system, with
    (Ax_i, Ax_i) = ρ_i²
for every i. Hence
    (Ax, Ax) = Σ_{i=1}^m |α_i|² ρ_i²,
which yields
    ||A||_2² = sup_{Σ_{i=1}^m |α_i|² ≤ 1} Σ_{i=1}^m |α_i|² ρ_i².   (83.5)
It is clear that under (83.3)
    ||A||_2² ≤ ρ_1².
But for the vector x_1 the right-hand side of (83.5) takes on the value ρ_1². Therefore
    ||A||_2² = ρ_1².
Thus
The spectral norm of an operator A is equal to its maximum singu-
lar value.
We recall that for a normal operator A its singular values coincide
with the moduli of its eigenvalues. Hence the spectral norm of a
unitary operator is equal to unity and the spectral norm of a non-
negative operator is equal to its largest eigenvalue.
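Both conclusions are easy to confirm numerically; the sketch below (NumPy assumed, the matrices are illustrative) compares the spectral norm of a rectangular complex matrix with its largest singular value and checks the invariance under unitary factors.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))

    # The spectral norm equals the maximum singular value.
    assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])

    # It is unchanged by multiplication by unitary operators.
    U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
    V, _ = np.linalg.qr(rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6)))
    assert np.isclose(np.linalg.norm(U @ A @ V, 2), np.linalg.norm(A, 2))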

Exercises
1. Prove that for any eigenvalue λ of an operator A
    |λ| ≤ inf_k ||A^k||^{1/k}.
2. Let φ(z) be any polynomial with nonnegative coefficients. Prove that ||φ(A)|| ≤ φ(||A||).
3. Prove that ||A|| ≥ ||A^{-1}||^{-1} for any nonsingular operator A. When does equality hold in the spectral norm case?

84. Matrix norms of an operator


The spectral norm is virtually the only subor-
dinate norm of an operator the calculation of which is not explicitly
connected with bases. If, however, in spaces, in which operators
are given, some bases are fixed, the possibility of introducing op-
erator norms is greatly extended.
So we again consider linear operators from a space X to a space Y. Suppose we fix a basis e_1, e_2, ..., e_m in X and a basis q_1, q_2, ..., q_n in Y. Expanding a vector x ∈ X with respect to a basis we get
    x = ξ_1 e_1 + ξ_2 e_2 + ... + ξ_m e_m.                        (84.1)
Now it is possible to introduce a norm in X by formula (52.3), for example, or in some other way in terms of the coefficients of expansion (84.1). Similarly it is possible to introduce a norm in Y.
The most common are norms of the form (52.4). Therefore we shall study operator norms subordinate to, and compatible with, those norms. Moreover, it will be assumed that norms of the same type are introduced in both X and Y. It is obvious that the corresponding norms of an operator A must be somehow related to the elements a_ij of the matrix of the operator in the chosen bases.
We first establish expressions for operator norms subordinate to the 1-norms and ∞-norms of (52.4). We have
    ||A||_1 = sup_{||x||_1 ≤ 1} ||Ax||_1 = sup_{||x||_1 ≤ 1} Σ_{i=1}^n |Σ_{j=1}^m a_ij x_j|
            ≤ sup_{||x||_1 ≤ 1} Σ_{i=1}^n Σ_{j=1}^m |a_ij| |x_j| ≤ sup_{||x||_1 ≤ 1} Σ_{j=1}^m |x_j| Σ_{i=1}^n |a_ij|
            ≤ (max_{1 ≤ j ≤ m} Σ_{i=1}^n |a_ij|) (sup_{||x||_1 ≤ 1} ||x||_1) = max_{1 ≤ j ≤ m} Σ_{i=1}^n |a_ij|.
We now show that for some vector x satisfying the condition ||x||_1 ≤ 1 the value of ||Ax||_1 coincides with the right-hand side of the relation obtained.
Let the largest value at the right be reached when j = l. Then all the inequalities become equations for x = e_l, for example. So
    ||A||_1 = max_{1 ≤ j ≤ m} Σ_{i=1}^n |a_ij|.

Similarly for the other norm:
    ||A||_∞ = sup_{||x||_∞ ≤ 1} ||Ax||_∞ = sup_{||x||_∞ ≤ 1} max_{1 ≤ i ≤ n} |Σ_{j=1}^m a_ij x_j|
            ≤ sup_{||x||_∞ ≤ 1} max_{1 ≤ i ≤ n} Σ_{j=1}^m |a_ij| |x_j|
            ≤ sup_{||x||_∞ ≤ 1} ((max_{1 ≤ i ≤ n} Σ_{j=1}^m |a_ij|) (max_{1 ≤ j ≤ m} |x_j|))
            = (max_{1 ≤ i ≤ n} Σ_{j=1}^m |a_ij|) (sup_{||x||_∞ ≤ 1} ||x||_∞) = max_{1 ≤ i ≤ n} Σ_{j=1}^m |a_ij|.
Suppose the largest value at the right is reached when i = l. We take a vector x with coordinates x_j = |a_lj| / a_lj if a_lj ≠ 0, and with x_j = 1 if a_lj = 0. It is not hard to verify that for that vector all the inequalities become equations. Hence
    ||A||_∞ = max_{1 ≤ i ≤ n} Σ_{j=1}^m |a_ij|.
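Both formulas amount to a maximum column sum and a maximum row sum respectively; the sketch below (assuming NumPy, with an arbitrary matrix) checks them against the library's own subordinate norms.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((3, 5))

    col_sums = np.abs(A).sum(axis=0)           # one sum per column j
    row_sums = np.abs(A).sum(axis=1)           # one sum per row i

    assert np.isclose(np.linalg.norm(A, 1), col_sums.max())        # ||A||_1
    assert np.isclose(np.linalg.norm(A, np.inf), row_sums.max())   # ||A||_inf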

To find an operator norm subordinate to the 2-norms of (52.4) we


proceed as follows. We introduce in X and Y a scalar product in a
way similar to (32.1). Then the 2-norms of (52.4) will coincide with
the length of the vector. Therefore the subordinate norm is nothing
but the spectral norm of an operator corresponding to the given sca-
lar product. The bases for the chosen scalar products become ortho-
normal and therefore in these bases an adjoint operator will have
a corresponding adjoint matrix. If we let A qe denote the matrix
of an operator A, then it follows from the foregoing that
The operator norm subordinate to 2-norms is equal to the maximum
singular value of A qe·
The norms we have considered are some functions of the matrix
of an operator. Not only subordinate but also compatible norms
can we construct in this way. One of the most important compatible
norms is the so-called Euclidean norm. It will be designated II· liE·
If in the chosen bases an operator A has a matrix A_qe with elements a_ij, then by definition
    ||A||_E = (Σ_{i=1}^n Σ_{j=1}^m |a_ij|²)^{1/2}.
The right-hand side of this is the norm in an n × m-dimensional space of linear operators. That the first three properties of (83.1) hold is therefore beyond doubt. Of great importance is the fact that for a Euclidean norm the fourth property of (83.1) is also valid. To prove this we use a Cauchy-Buniakowski-Schwarz inequality of the type (27.5).
Let us consider vector spaces X, Y and Z of dimensions m, n and p respectively. Let A be an operator from X to Y and B an operator from Y to Z. By a_ij and b_ij we denote the elements of the matrices of the operators in the chosen bases. We have
    ||BA||_E = (Σ_{i=1}^p Σ_{j=1}^m |Σ_{k=1}^n b_ik a_kj|²)^{1/2} ≤ (Σ_{i=1}^p Σ_{j=1}^m (Σ_{k=1}^n |b_ik| |a_kj|)²)^{1/2}
             ≤ (Σ_{i=1}^p Σ_{j=1}^m (Σ_{k=1}^n |b_ik|²)(Σ_{k=1}^n |a_kj|²))^{1/2}
             = ((Σ_{i=1}^p Σ_{k=1}^n |b_ik|²)(Σ_{k=1}^n Σ_{j=1}^m |a_kj|²))^{1/2} = ||B||_E · ||A||_E.
In the general case a Euclidean norm is not subordinate. Its com-
patibility with 2-norms can be proved in the same way as the prop-
erty just considered.
A direct check makes it possible to establish important formulas for the Euclidean norm. Namely,
    ||A||_E² = tr (A_qe* A_qe) = tr (A_qe A_qe*).                  (84.2)
We can now draw the following conclusions.
An adjoint matrix in orthonormal bases has a corresponding ad-
joint operator. We transform the chosen bases into orthonormal bases
if we introduce in X andY scalar products in a way similar to (32.1).
Since the trace of a matrix is equal to the sum of its eigenvalues, it
follows from (84.2) that
The square of a Euclidean norm of an operator is equal to the sum
of the squares of its singular values.
If scalar products are introduced in X and Y, it is possible to speak of unitary operators. It is for these unitary operators that it is easy to show that
A Euclidean norm is not affected by the multiplication of an operator by any unitary operators.
Indeed, as noted in the exercises to Section 78, singular values remain unaffected by the multiplication by unitary operators, and the Euclidean norm can be expressed only in terms of singular values.
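In NumPy the Euclidean norm appears as the Frobenius norm, and the properties just listed can be verified directly; the sketch below is only an illustration of formula (84.2) and of the compatibility with 2-norms.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    fro = np.linalg.norm(A, 'fro')
    # ||A||_E^2 = tr(A*A) = sum of the squared singular values.
    assert np.isclose(fro**2, np.trace(A.conj().T @ A).real)
    assert np.isclose(fro**2, (np.linalg.svd(A, compute_uv=False)**2).sum())

    # Compatibility with the 2-norms of vectors: ||Ax|| <= ||A||_E ||x||.
    x = rng.standard_normal(3)
    assert np.linalg.norm(A @ x) <= fro * np.linalg.norm(x) + 1e-12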
In most applications connected with norms, not so much an ex-
plicit assignment of an operator norm is important as the fact that
properties (83.1) hold. An operator norm can therefore be defined
axiomatically in terms of its matrix. Choose in the spaces, in which
the operators are given, some bases, then each operator will have a
corresponding matrix. We assign to each matrix a number designat-
ed as II· II and suppose that conditions (83.1) hold as axioms. Anum-
ber II· II will be called a matri:c norm. If now each operator is assigned
the norm of its matrix, it is clear that this introduces a norm in
the space of the operators. Conditions (83.1) obviously hold for the
operators, too. The converse is also true. Given fixed bases, any op-
erator norm generates a matrix norm. These matrix norms will be
designated by similar symbols ||·||_2, ||·||_∞, etc. It is obvious that

we may also require axiomatically that the norm should be compat-


ible.
The above examples show that it is practically feasible to assign
an operator norm axiomatically in terms of a matrix norm. In what
follows, speaking of matrix and operator norms we shall always as-
sume that they are compatible and that conditions (83.1) hold.

Exercises
1. Prove that, given any norm, for a unit matrix
    ||E|| ≥ 1.                                                    (84.3)
2. Let λ_1, ..., λ_m be the eigenvalues of a matrix A. Prove that
    inf_B ||B^{-1}AB||_E² = Σ_{k=1}^m |λ_k|².
Compare this equation with (81.1).

85. Operator equations


One of the most important problems of algebra
is that of solving linear algebraic equations. We have often met with
this problem in the present course. We now consider it from the view-
point of the theory of linear operators.
Given system (60.2) with elements from a field P of real or complex
numbers, take any m-dimensional space X and n-dimensional space Y
over the same field P and fix some bases in them. Then relations (60.2)
will be equivalent to a single matrix equation of the type (61.2) which
in turn is equivalent to an operator equation
Ax = y. (85.1)
Here A is an operator from X to Y with the same matrix in the chosen bases as that of system (60.2). Vectors x ∈ X and y ∈ Y have in the chosen bases the coordinates (ξ_1, ..., ξ_m) and (η_1, ..., η_n) respectively.
Thus instead of a system of linear algebraic equations we may
consider equations (85.1). The problem is to determine all vectors
x E X satisfying (85.1) for a given operator A and a given vector
y E Y. An equation of the form (85.1) is called an operator equation,
a vector y is a right-hand side and a vector x is a solution. Of course,
all the properties of a system of equations are automatically carried
over to operator equations and vice versa.
The Kronecker-Capelli theorem formulates a necessary and sufficient condition for a system to be solvable in terms of the rank of a matrix. This is not very convenient, since it does not reveal the deep connection existing between systems and equations of other types.

Let X and Y be unitary spaces. Then an operator A* is defined. Equation (85.1) is called the basic nonhomogeneous equation, and the equation
    A*u = v
is the adjoint nonhomogeneous equation. If the right-hand sides are
zero, then the corresponding equations are called homogeneous. The
following statement is true:
Either the basic nonhomogeneous equation has a solution for any
right-hand side or the adjoint homogeneous equation has at least one
nonzero solution.
Indeed, let r denote the rank of an operator A. The operator A* will have the same rank. Two cases are possible: either r = n or
r < n. In the former case the range of A is of dimension n and hence
it coincides with Y. Therefore the basic nonhomogeneous equation
must have a solution for any right-hand side. In the same case the
nullity of the adjoint operator is equal to zero and therefore the kernel
has no nonzero solutions, i.e. the adjoint homogeneous equation has
no nonzero solutions. If r < n, then the range of A does not coincide
with Y and the basic nonhomogeneous equation cannot have a solu-
tion for any right-hand side. The kernel of the adjoint operator con-
sists not only of a zero vector and therefore the adjoint homogeneous
equation must have nonzero solutions.
The above statement is of particular importance when X and Y
coincide. Now the existence of a solution of the basic nonhomogeneous
equation for any right-hand side implies the nonsingularity of the
operator A. In this case therefore we have the so-called
Fredholm Alternative. Either the basic nonhomogeneous equation
always has a unique solution for any right-hand side or the adjoint homo-
geneous equation has at least one nonzero solution.
Fredholm Theorem. For a basic nonhomogeneous equation to be solv-
able, it is necessary and sufficient that its right-hand side should be
orthogonal to all solutions of the adjoint homogeneous equation.
Proof. Let N* denote the kernel of an operator A* and T the range of A. If the basic nonhomogeneous equation is solvable, then its right-hand side y ∈ T. In view of (75.8) it follows that y ⊥ N*, i.e. that (y, u) = 0 for every vector u satisfying A*u = 0. Now let (y, u) = 0 for the same vectors u. Then y ⊥ N* and by (75.8) y ∈ T. But this means that there is a vector x ∈ X such that Ax = y, i.e. that the basic nonhomogeneous equation is solvable.
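The criterion of the Fredholm theorem can be observed numerically. In the sketch below (NumPy assumed, the rank-deficient matrix is only an illustration) a right-hand side lying in the range of A is orthogonal to a solution of the adjoint homogeneous equation, while an incompatible right-hand side is not.

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))   # a singular 3 x 3 matrix

    # A vector spanning the kernel of A*: the left singular vector for the zero singular value.
    U, s, Vh = np.linalg.svd(A)
    u = U[:, -1]                                # A* u = 0 up to rounding

    y_good = A @ rng.standard_normal(3)         # lies in the range of A
    y_bad = y_good + u                          # has a component in the kernel of A*

    print(abs(np.vdot(u, y_good)))              # ~ 0: Ax = y_good is solvable
    print(abs(np.vdot(u, y_bad)))               # ~ 1: Ax = y_bad is not solvable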

Exercises
1. Prove that the equation A*Ax = A*y is solvable.
2. Prove that the equation (A*A)^p x = (A*A)^q y is solvable for any positive integers p and q.
3. Give a geometrical interpretation of the Fredholm alternative and theorem.

86. Pseudosolutions and the pseudoinverse operator
Prescribing arbitrarily an operator A and a
right-hand side y may result in equation (85.1) having no solution.
Obviously, this is only due to what exactly we mean by a solution of
an equation.
Take a vector x E X and consider a vector r = Ax - y called
the discrepancy of the vector x. For x to be a solution of (85.1) it is
necessary and sufficient that its discrepancy should be zero. In turn,
for the discrepancy to be zero it is necessary and sufficient that its length should be zero. Thus all solutions of (85.1), and they alone, satisfy the equation
    |Ax − y|² = 0.
Since the zero value of the length of the discrepancy is the smallest, the finding of solutions of equation (85.1) may be regarded as the problem of finding such vectors x for which the following expression attains the smallest value:
    Φ_0(x) = |Ax − y|².                                           (86.1)
The right-hand side of the expression is called the functional of dis-
crepancy. Finding the vectors minimizing the functional of discrep-
ancy makes sense also when no solution of (85.1) exists. This justi-
fies the following definition:
A pseudosolution (or generalized solution) of equation (85.1) is any
vector x E X for which the functional of discrepancy attains its small-
est value. The shortest pseudosolution is called a normal pseudosolu-
tion.
We show that a normal pseudosolution always exists and is unique. Fix in X and Y singular bases x_1, ..., x_m and y_1, ..., y_n. Let
    x = Σ_{k=1}^m α_k x_k,   y = Σ_{p=1}^n β_p y_p.               (86.2)
Considering relations (78.2) we find that
    Ax − y = Σ_{k=1}^m ρ_k α_k y_k − Σ_{p=1}^n β_p y_p.
It is assumed as before that the singular values ρ_1, ..., ρ_t are nonzero and that the rest are zero. Since singular bases are orthonormal, we have
    Φ_0(x) = Σ_{k=1}^t |ρ_k α_k − β_k|² + Σ_{p=t+1}^n |β_p|².

It is obvious that the smallest value of the functional of discrepancy is attained on those vectors x whose last m − t coordinates α_k are arbitrary and whose first t coordinates are defined by
    α_k = β_k / ρ_k.                                              (86.3)
The normal pseudosolution will be as follows:
    x_0 = Σ_{k=1}^t (β_k / ρ_k) x_k.                              (86.4)
We recall that vectors x_{t+1}, ..., x_m form the basis of the kernel N of the operator A. Therefore the set of all pseudosolutions is a plane in X whose direction subspace coincides with N and whose translation vector coincides with any pseudosolution. A normal pseudosolution is the only vector of that plane that is orthogonal to N.
Using relations (78.2) and (78.3) it is easy to show that pseudosolutions, and they alone, satisfy
    A*Ax = A*y.                                                   (86.5)
Indeed, write vectors x and y as expansions (86.2). We have
    A*Ax = Σ_{k=1}^t ρ_k² α_k x_k,   A*y = Σ_{p=1}^t ρ_p β_p x_p.
It follows that solutions of equation (86.5) are only those vectors x whose first t coordinates α_k are calculated according to (86.3) and whose last m − t coordinates are arbitrary.
Thus, if the solvability of (85.1) is not guaranteed, then we can always replace the solution of the equation by the solution of (86.5). In addition, a minimization of the functional of discrepancy for (85.1) is ensured.
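Standard least-squares routines carry out exactly this replacement. The sketch below (NumPy assumed) applies it to a simple unsolvable system and checks that the computed vector satisfies (86.5) and coincides with the normal pseudosolution A^+y.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 0.0]])
    y = np.array([1.0, 1.0])                    # the system Ax = y has no solution

    x0, *_ = np.linalg.lstsq(A, y, rcond=None)  # a minimum-norm least-squares solution

    assert np.allclose(A.T @ A @ x0, A.T @ y)       # x0 satisfies A*Ax = A*y
    assert np.allclose(x0, np.linalg.pinv(A) @ y)   # and equals the normal pseudosolution
    print(x0)                                   # [1. 0.]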
The inverse operator plays an important part in carrying out vari-
ous studies. However, it was defined only for the nonsingular opera-
tor and we have no analogue as yet for the singular operator and for
the operator from one space to another. This analogue can be con-
structed on the basis of pseudosolutions.
Suppose that A is an operator from X to Y. Then each vector y ∈ Y can be assigned a unique vector x_0 ∈ X which is the normal pseudosolution of (85.1). This correspondence defines some operator A^+ from Y to X called the pseudoinverse (or generalized inverse) of A. So by definition
    x_0 = A^+ y                                                   (86.6)
for any y ∈ Y. It is clear that if the operator A is nonsingular, then the pseudoinverse of A coincides with its inverse. We investigate the properties of the pseudoinverse.

Suppose along with (86.6) we have u_0 = A^+ v for some vector v ∈ Y. Consider the vector αy + βv for any α and β. If we take it as a right-hand side of (85.1), then the vector αx_0 + βu_0 will clearly satisfy a corresponding equation of the type (86.5) and therefore it will be a pseudosolution. Since x_0 and u_0 are orthogonal to the kernel of A, so is the vector αx_0 + βu_0. Hence it is the normal pseudosolution. The linearity of the pseudoinverse operator is thus established.
The properties of the pseudoinverse operator are easy to establish if we consider its action on the vectors of singular bases. By (86.4) we have
    A^+ y_k = ρ_k^{-1} x_k  for k ≤ t,   A^+ y_k = 0  for k > t.   (86.7)
It follows that
The domain, kernel and range of the pseudoinverse operator and those
of the adjoint operator coincide.
Using (78.2), (78.3) and (86. 7) it is possible to obtain various rela-
tions connecting the operators A, A* and A^+. We note some of them:
    (1) (A*)^+ = (A^+)*,
    (2) (A^+)^+ = A,
    (3) (AA^+)* = AA^+,  (AA^+)² = AA^+,
    (4) (A^+A)* = A^+A,  (A^+A)² = A^+A,
    (5) AA^+A = A.
These relations can be proved according to the same scheme. As an example, we therefore consider in more detail only the first and the third.
Comparing (78.2) and (86.7) we take as an operator A the adjoint operator A*. Since (78.3) holds for this operator, we have
    (A*)^+ x_k = ρ_k^{-1} y_k  for k ≤ t,   (A*)^+ x_k = 0  for k > t.
Now, proceeding from (86.7), we apply a relation similar to (78.3) to the operator (A^+)*. Then
    (A^+)* x_k = ρ_k^{-1} y_k  for k ≤ t,   (A^+)* x_k = 0  for k > t.
Thus the operators (A*)^+ and (A^+)* coincide on the basis x_1, ..., x_m and therefore they are equal.
Taking into account (78.2) and (86.7) we conclude that for the operator AA^+
    AA^+ y_k = y_k  for k ≤ t,   AA^+ y_k = 0  for k > t.          (86.8)

This means that AA + has an orthonormal system of eigenvectors


y 1 , • • • , Yn and real eigenvalues 1 and 0, i.e. that it is Hermitian.
This proves the first equation in the relations of group (3). The second
is obvious from (86.8).
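The relations of groups (1)-(5) can all be checked numerically with the Moore-Penrose pseudoinverse supplied by NumPy; the sketch below does so for a real rank-deficient matrix (an illustration only).

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))   # a 4 x 3 matrix of rank 2
    Ap = np.linalg.pinv(A)

    assert np.allclose(np.linalg.pinv(A.T), Ap.T)     # (A*)^+ = (A^+)*
    assert np.allclose(np.linalg.pinv(Ap), A)         # (A^+)^+ = A
    assert np.allclose((A @ Ap).T, A @ Ap)            # (AA^+)* = AA^+
    assert np.allclose(A @ Ap @ A @ Ap, A @ Ap)       # (AA^+)^2 = AA^+
    assert np.allclose(A @ Ap @ A, A)                 # AA^+A = A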

Exercises
1. What is the pseudoinverse of a zero operator?
2. Let X and Y be distinct spaces. Write the matrix of the pseudoinverse of an operator in singular bases and compare it with (78.4).
3. Let U and V be unitary operators in X and Y respectively. Prove that (VAU)^+ = U*A^+V*.
4. Prove that there are operators K in X and L in Y such that
A+= KA* = A*L.
Describe the action of the operators K and L.
5. Prove that the pseudoinverse of an operator is uniquely defined by the
conditions
AA+A =A,
A+= KA* = A*L.
6. Prove that all pseudosolutions, and they alone, are solutions of the equa-
tion
Ax= AA+y.
7. Give a geometrical interpretation of pseudosolutions.

87. Perturbation and nonsingularity of an operator
We have repeatedly emphasized that small
changes in the basis, coordinate vectors, matrix elements and the
like may result in changes of many properties connected with the
concept of linear dependence. This notion plays a decisive role in
the entire theory of linear operators, so it is very important to study
the influence of small changes in operators themselves on their proper-
ties.
As an auxiliary tool in solving diverse questions one has not infre-
quently to use an operator almost equal to an identity operator. By
this we shall mean an operator in a space X of the form E +A,
where II A II < 1 for some norm.
If λ is any eigenvalue of an operator A, then 1 + λ is an eigenvalue of the operator E + A. Since |λ| ≤ ||A||, by virtue of ||A|| < 1 all eigenvalues of A are less than unity in absolute value. Hence all eigenvalues of E + A are nonzero and the operator is nonsingular. Thus if ||A|| < 1, there is an operator (E + A)^{-1}. If, however, the operator E + A is singular, then ||A|| ≥ 1 for any norm.

For any number α less than unity in absolute value we have the limiting relation
    (1 + α)^{-1} = lim_{p→∞} a_p,
where
    a_p = Σ_{k=0}^p (−α)^k.
We show that a similar relation holds for the operator (E + A)^{-1} too, if ||A|| < 1. Consider a sequence {A_p} of operators
    A_p = Σ_{k=0}^p (−A)^k.
It is easy to verify that
    (E + A)A_p = E − (−A)^{p+1},
so
    (E + A)A_p − E = −(−A)^{p+1}.                                 (87.1)
Formally this equation is true for p = −1 as well, if it is assumed that A_{−1} = 0. Also we have
    ||(E + A)A_p − E|| = ||(A_p − (E + A)^{-1}) + A(A_p − (E + A)^{-1})||
                       ≥ | ||A_p − (E + A)^{-1}|| − ||A|| · ||A_p − (E + A)^{-1}|| |
                       = (1 − ||A||) ||A_p − (E + A)^{-1}||.
Now, considering (87.1), we obtain for p = −1 an estimate of the norm of the operator (E + A)^{-1}, i.e.
    ||(E + A)^{-1}|| ≤ ||E|| / (1 − ||A||).
For any subordinate norm ||E|| = 1 and hence in this case
    ||(E + A)^{-1}|| ≤ 1 / (1 − ||A||).                           (87.2)

For p ≥ 0 we obtain an estimate of the deviation of the operator A_p from the operator (E + A)^{-1}. Namely,
    ||A_p − (E + A)^{-1}|| ≤ ||A||^{p+1} / (1 − ||A||).           (87.3)
By virtue of the condition ||A|| < 1 this means that {A_p} converges to (E + A)^{-1}. If A_p is assumed to be an approximation to (E + A)^{-1}, then formula (87.3) gives an estimate of the accuracy of the approximation.
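A short computation illustrates the convergence of the series and estimate (87.3); the sketch below (assuming NumPy) scales a random operator so that ||A||_2 < 1 and compares the partial sums A_p with the exact inverse.

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((4, 4))
    A *= 0.4 / np.linalg.norm(A, 2)             # force ||A||_2 < 1

    E = np.eye(4)
    exact = np.linalg.inv(E + A)
    a = np.linalg.norm(A, 2)

    Ap = np.zeros((4, 4))
    term = np.eye(4)
    for p in range(30):
        Ap += term                              # A_p = sum of (-A)^k for k = 0, ..., p
        term = term @ (-A)
        bound = a**(p + 1) / (1 - a)            # estimate (87.3)
        assert np.linalg.norm(Ap - exact, 2) <= bound + 1e-12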

Let A be any nonsingular operator. Consider an operator A + ε_A, where ε_A is an arbitrary operator. We shall call ε_A the perturbation of the operator A, and A + ε_A a perturbed operator. We show under what conditions on the value of the perturbation norm the perturbed operator is nonsingular. We shall be concerned only with small values of the perturbation norm.
The operator A is nonsingular and therefore there is an operator A^{-1}. Hence
    A + ε_A = A(E + A^{-1}ε_A).
It follows that A + ε_A is nonsingular if and only if so is the operator E + A^{-1}ε_A. This condition clearly holds if
    ||A^{-1}ε_A|| < 1
for some norm. Of course it holds if ||A^{-1}|| ||ε_A|| < 1. Thus a perturbed operator is nonsingular for all perturbations satisfying
    ||ε_A|| < 1 / ||A^{-1}||.                                     (87.4)
When an operator A is perturbed by an amount ε_A, the inverse operator A^{-1} acquires a perturbation equal to (A + ε_A)^{-1} − A^{-1}. We denote by
    δ_A = ||ε_A|| / ||A||,   δ_{A^{-1}} = ||(A + ε_A)^{-1} − A^{-1}|| / ||A^{-1}||   (87.5)
the values of relative perturbations of A and A^{-1}. When condition (87.4) holds, the operator E + A^{-1}ε_A is nonsingular and therefore
    (A + ε_A)^{-1} − A^{-1} = ((A + ε_A)^{-1}A − E)A^{-1}
                            = ((A^{-1}(A + ε_A))^{-1} − E)A^{-1} = ((E + A^{-1}ε_A)^{-1} − E)A^{-1}.
From formula (87.3), for p = 0 we find that
    ||(A + ε_A)^{-1} − A^{-1}|| ≤ ||A^{-1}ε_A|| · ||A^{-1}|| / (1 − ||A^{-1}ε_A||)
                               ≤ ||A^{-1}||² ||ε_A|| / (1 − ||A^{-1}|| ||ε_A||).
Now, using symbols (87.5), we obtain the following estimate:
    δ_{A^{-1}} ≤ κ_A δ_A / (1 − κ_A δ_A),                          (87.6)
where
    κ_A = ||A^{-1}|| · ||A||.                                     (87.7)
The number κ_A is called the condition number of the operator A. Although it depends on the choice of norm, it can never be very small. From
    E = A^{-1}A

and (84.3) we conclude that
    1 ≤ ||E|| ≤ ||A^{-1}|| · ||A|| = κ_A.
Formula (87.6) shows that a small relative perturbation of an op-
erator A results in a small relative perturbation of A -l only when the
condition number of A is not too large as compared with unity. This
number will occur in other problems too.
Suppose that given a nonsingular operator A we are to solve the operator equation
    Ax = y.                                                       (87.8)
Consider the perturbed equation
    (A + ε_A)x̃ = y + ε_y.                                         (87.9)
If condition (87.4) holds, then the perturbed equation (87.9) and the original equation (87.8) will have unique solutions x̃ and x. We evaluate their difference.
Along with (87.5) and (87.7) we introduce the corresponding symbols for relative perturbations in x and y, i.e.
    δ_x = ||x̃ − x|| / ||x||,   δ_y = ||ε_y|| / ||y||.
We have
    x = A^{-1}y,   x̃ = (A + ε_A)^{-1}(y + ε_y).
From this we find
    x̃ − x = ((E + A^{-1}ε_A)^{-1} − E)A^{-1}y + (E + A^{-1}ε_A)^{-1}A^{-1}ε_y
and further
    ||x̃ − x|| ≤ ||(E + A^{-1}ε_A)^{-1} − E|| · ||x|| + ||(E + A^{-1}ε_A)^{-1}|| · ||A^{-1}|| · ||ε_y||.
It is assumed that a subordinate norm is used. Taking into account estimates (87.2) and (87.3), as well as the inequality ||y|| ≤ ||A|| · ||x||, we get
    ||x̃ − x|| ≤ ||A^{-1}|| ||ε_A|| ||x|| / (1 − ||A^{-1}|| ||ε_A||) + ||A^{-1}|| ||ε_y|| / (1 − ||A^{-1}|| ||ε_A||)
              ≤ [||A^{-1}|| ||A|| / (1 − ||A^{-1}|| ||A|| (||ε_A|| / ||A||))] ((||ε_A|| / ||A||) ||x|| + (||ε_y|| / ||y||) ||x||).

In symbols
    δ_x ≤ [κ_A / (1 − κ_A δ_A)] (δ_A + δ_y).                       (87.10)
This formula again gives the value of the condition number and again it is important from the viewpoint of stability that it should not be too large.
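Estimate (87.10) is easy to try out. The sketch below (NumPy assumed, the system is illustrative) perturbs a random matrix and right-hand side and compares the observed relative error of the solution with the bound.

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((5, 5))
    x = rng.standard_normal(5)
    y = A @ x

    kappa = np.linalg.cond(A, 2)                # condition number in the spectral norm

    eps_A = 1e-8 * rng.standard_normal((5, 5))
    eps_y = 1e-8 * rng.standard_normal(5)
    x_tilde = np.linalg.solve(A + eps_A, y + eps_y)

    delta_A = np.linalg.norm(eps_A, 2) / np.linalg.norm(A, 2)
    delta_y = np.linalg.norm(eps_y) / np.linalg.norm(y)
    delta_x = np.linalg.norm(x_tilde - x) / np.linalg.norm(x)

    bound = kappa / (1 - kappa * delta_A) * (delta_A + delta_y)     # estimate (87.10)
    print(delta_x, bound)                       # delta_x does not exceed the bound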

Exercises
1. Prove that a condition number expressed in terms of spectral norm is equal to the ratio of the maximum singular value to the minimum singular value.
2. There are operators with the smallest condition number. What are these operators if spectral norm is used?
3. Prove that multiplication of an operator by unitary operators leaves its condition number expressed in terms of spectral or Euclidean norm unchanged.
4. Prove that for any nonsingular operators A and B
    ||B^{-1} − A^{-1}|| / ||B^{-1}|| ≤ κ_A ||A − B|| / ||A||.
5. What causes the large instability of the system of vectors described in Section 22? Evaluate the condition number of the operator whose matrix columns coincide with the coordinates of vectors (22.7).

88. Stable solution of equations


Formula (87.10) shows that for an operator almost equal to a singular operator large perturbations are possible in a solution even for small perturbations in the operator and the right-hand side. It may seem that this is due only to the fact that a solution itself does not always exist. However, the situation with finding pseudosolutions is similar.
Indeed, let an operator be given in a two-dimensional space. Suppose that in some orthonormal basis a system of linear algebraic equations of the following form corresponds to (85.1):
    1·x_1 + 0·x_2 = 1,
    0·x_1 + 0·x_2 = 1.
It is easy to find that the normal pseudosolution u_0 has the following coordinates:
    u_0 = (1, 0).
It may well be that the perturbed equation should lead in the same basis to a system
    1·x_1 + 0·x_2 = 1,
    0·x_1 + ε·x_2 = 1,
where the number ε, although small, will nevertheless be other than zero. Now the normal pseudosolution u_0^(ε) of the perturbed equation

has the coordinates
    u_0^(ε) = (1, ε^{-1}).
For small ε the vectors u_0 and u_0^(ε) not only differ very much but are even almost orthogonal.
If our equation has more than one pseudosolution, then in the
general case small perturbations in the operator and the right-hancl
side will always result in large perturbations in the normal pseudosolu·
tion. Nevertheless we show that despite the instability of many con-
cepts connected with operator equations a normal pseudosolution can
be stably determined.
Let A be an operator from X to Y and suppose that equation (85.1) is to be solved. Similarly to the functional of discrepancy we consider the so-called regularizing functional
    Φ_α(x) = α|x|² + |Ax − y|²,                                   (88.1)
where the number α ≥ 0. It is clear that for α = 0 the functional coincides with the functional of discrepancy and attains its minimum on the pseudosolutions of (85.1). We find on what vectors the regularizing functional attains its minimum for α > 0. Using expansions (86.2) we find
    Φ_α(x) = Σ_{k=1}^t (α|α_k|² + |ρ_k α_k − β_k|²) + α Σ_{k=t+1}^m |α_k|² + Σ_{p=t+1}^n |β_p|².
It follows that for the minimum to be attained it is necessary to take the zero values of the last coordinates α_{t+1}, ..., α_m and to minimize for each k ≤ t the expression
    α|α_k|² + |ρ_k α_k − β_k|².
This yields for k ≤ t
    α_k = ρ_k β_k / (α + ρ_k²).
Thus a minimum value of the regularizing functional (88.1) is attained for every α > 0 on a unique vector
    x_α = Σ_{k=1}^t [ρ_k β_k / (α + ρ_k²)] x_k.                    (88.2)

A comparison of formulas (86.4) and (88.2) makes it possible to establish some relations connecting x_α and x_0. We have for α > 0
    0 ≤ |x_0|² − |x_α|² = Σ_{k=1}^t |β_k|² [1/ρ_k² − ρ_k²/(α + ρ_k²)²] = Σ_{k=1}^t |β_k|² (α² + 2αρ_k²) / (ρ_k²(α + ρ_k²)²),
hence
    0 ≤ |x_0|² − |x_α|² ≤ 2αη²,                                   (88.3)
where
    η² = Σ_{k=1}^t |β_k|² / ρ_k⁴.
We then find
    x_0 − x_α = α Σ_{k=1}^t [β_k / (ρ_k(α + ρ_k²))] x_k,
from which we conclude that
    |x_0 − x_α| ≤ αγ,                                             (88.4)
where
    γ² = Σ_{k=1}^t |β_k|² / ρ_k⁶.
Consequently x_α → x_0 as α → 0. Thus, for small values of α the vector x_α may serve as an approximation to the normal pseudosolution x_0.
We expand the vectors x_α and x_0 with respect to singular bases in a way similar to (86.2). A direct check easily shows that x_α satisfies
    (A*A + αE)x_α = A*y.                                          (88.5)
For α > 0 the operator A*A + αE is positive definite and therefore there is an operator (A*A + αE)^{-1}, i.e.
    x_α = (A*A + αE)^{-1}A*y.                                     (88.6)
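Formula (88.6) can be evaluated directly. The sketch below (assuming NumPy) does so for the unsolvable two-dimensional example given at the beginning of this section and shows that x_α approaches the normal pseudosolution as α decreases.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 0.0]])
    y = np.array([1.0, 1.0])
    x0 = np.linalg.pinv(A) @ y                  # normal pseudosolution, here (1, 0)

    for alpha in (1e-1, 1e-3, 1e-6):
        # x_alpha = (A*A + alpha E)^(-1) A*y, formula (88.6)
        x_alpha = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ y)
        print(alpha, x_alpha, np.linalg.norm(x_alpha - x0))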
On x_α the minimum value of functional (88.1) is attained and therefore Φ_α(x_α) ≤ Φ_α(x_0). Taking into account (88.3) and (88.4) this yields
    |Ax_α − y|² ≤ |Ax_0 − y|² + α(|x_0|² − |x_α|²) ≤ |Ax_0 − y|² + 2α²η².   (88.7)
In addition Φ_α(x_α) ≤ Φ_α(0), from which it follows that
    |x_α| ≤ |y| / α^{1/2}.
Together with (88.6) this means that, given any operator A and any vector y, for α > 0
    |(A*A + αE)^{-1}A*y| ≤ |y| / α^{1/2}.                          (88.8)

In practice, when solving (85.1), the operator A and the right-hand side y are usually given inexactly and one has instead to consider the perturbed operator Ã and the right-hand side ỹ. If in X and Y one uses the length of vectors as a norm, then the spectral norm of the operators is subordinate to it. We shall therefore assume that
    ||Ã − A|| ≤ ε_A,   ||ỹ − y|| ≤ ε_y.                            (88.9)
Determining an approximate solution x̃_α using the perturbed Ã and ỹ leads to the following equation:
    (Ã*Ã + αE)x̃_α = Ã*ỹ.                                          (88.10)
From (88.5) and (88.10) we find
    (Ã*Ã + αE)(x̃_α − x_α) = A*(Ax_α − y) − Ã*(Ãx_α − ỹ)
                          = (A − Ã)*(Ax_α − y) − Ã*((Ã − A)x_α − (ỹ − y)).
This means that the difference x̃_α − x_α is a solution of the equation with operator (Ã*Ã + αE) and the right-hand side of the form z = u + Ã*v, where
    u = (A − Ã)*(Ax_α − y),   v = −((Ã − A)x_α − (ỹ − y)).
Therefore
    x̃_α − x_α = (Ã*Ã + αE)^{-1}u + (Ã*Ã + αE)^{-1}Ã*v.
Now we evaluate the norms of both summands in this equation.
The eigenvalues of the operator Ã*Ã + αE are at least α. Hence the eigenvalues of (Ã*Ã + αE)^{-1} are at most α^{-1}. For a positive definite operator its spectral norm coincides with its maximum eigenvalue, i.e.
    ||(Ã*Ã + αE)^{-1}||_2 ≤ α^{-1}.
Considering (88.7) and (88.9) we have
    ||(Ã*Ã + αE)^{-1}u|| ≤ ||(Ã*Ã + αE)^{-1}||_2 ||u|| ≤ (ε_A/α) ||Ax_α − y|| ≤ (ε_A/α)(||Ax_0 − y||² + 2α²η²)^{1/2}.
To evaluate the second summand we use formulas (88.3), (88.8) and (88.9). We find
    ||(Ã*Ã + αE)^{-1}Ã*v|| ≤ ||v|| / α^{1/2} ≤ (1/α^{1/2})(ε_A ||x_0|| + ε_y).

So
    ||x̃_α − x_α|| ≤ (ε_A/α)(||Ax_0 − y||² + 2α²η²)^{1/2} + (1/α^{1/2})(ε_A ||x_0|| + ε_y).
The total error of the computed pseudosolution x̃_α is
    ||x_0 − x̃_α|| ≤ ||x_0 − x_α|| + ||x_α − x̃_α||
                  ≤ αγ + (ε_A/α)(||Ax_0 − y||² + 2α²η²)^{1/2} + (1/α^{1/2})(ε_A ||x_0|| + ε_y).   (88.11)

The right-hand side of this is independent of the perturbed Ã and ỹ. There is therefore an α such that the right-hand side attains its minimum. That value of α will ensure almost the best approximation x̃_α to the exact normal pseudosolution x_0.
Suppose that ε_A and ε_y are values of an order of ε and that ε is sufficiently small. If the original equation (85.1) has a solution, then Ax_0 − y = 0. In this case the right-hand side of (88.11) is, according to the nature of its dependence on α and ε, a function of the form
    α + ε + ε/α^{1/2}.
For α = ε^{2/3} it takes on a value of an order of ε^{2/3}. If, however, the original equation has no solution, then Ax_0 − y ≠ 0. Now the right-hand side of (88.11) is a function of the form
    α + ε/α + ε/α^{1/2}.
For α = ε^{1/2} it takes on a value of an order of ε^{1/2}.
Thus, if the input data in (85.1) are prescribed up to an order of ε, then the normal pseudosolution can be determined up to an order of ε^{2/3} if the original equation is solvable, and up to ε^{1/2} otherwise.
The parameter α ensuring the required approximation x̃_α cannot be found only from the perturbed Ã and ỹ. This is mainly due to the fact that conditions (88.9) do not guarantee the continuity of the normal pseudosolution in a given range of the operator and the right-hand side. To determine the parameter α use is usually made of additional information about the solution. In some problems no guaranteed closeness to the normal pseudosolution is required and it is considered sufficient to determine stably a minimum of the functional of discrepancy. In such problems it is a somewhat simpler matter to determine α. Despite the importance of these questions we shall not dwell on them, since they are beyond the scope of this book.

Exercises
1. Prove that η in estimate (88.3) is the norm of the normal solution of A*A(A*A)^{1/2}x = A*y.
2. Prove that γ in estimate (88.4) is the norm of the normal solution of (A*A)²x = A*y.
3. Prove that the difference x_α − x_β satisfies
    (A*A + αE)(A*A + βE)(x_α − x_β) = (β − α)A*y.
4. Compare (88.11) and (87.10). What can be said about estimate (88.11) in the case of a nonsingular operator A?
5. To what accuracy can a normal pseudosolution be computed if A = 0?

89. Perturbation and eigenvalues


In the general case the perturbation of an op-
erator leads to changes in all of its eigenvalues and eigenvectors.
Since the study of this relation is very complicated, we restrict our-
selves to some illustrations. It is more convenient to describe this
problem in terms of the matrices of operators rather than operators
themselves.
Let B be a matrix of a simple structure and H a matrix such that
    H^{-1}BH = Λ,                                                 (89.1)
where Λ is a diagonal matrix of eigenvalues λ_1, λ_2, ..., λ_m. Consider a perturbed matrix B + ε_B and some eigenvalue λ of it. The matrix B + ε_B − λE is singular and therefore so is the matrix
    H^{-1}(B + ε_B − λE)H = (Λ − λE) + H^{-1}ε_B H.
Two cases are possible:
    (1) λ = λ_i for some i,
    (2) λ ≠ λ_i for every i.
In the second case the matrix Λ − λE is nonsingular, so
    (Λ − λE) + H^{-1}ε_B H = (Λ − λE)(E + (Λ − λE)^{-1}H^{-1}ε_B H).
The matrix that is the second factor is singular. This means that any norm of the matrix (Λ − λE)^{-1}H^{-1}ε_B H must at least be equal to unity. In particular
    ||(Λ − λE)^{-1}H^{-1}ε_B H||_2 ≥ 1.
Hence
    1 ≤ ||(Λ − λE)^{-1}||_2 ||H^{-1}||_2 ||ε_B||_2 ||H||_2
or
    min_{1 ≤ i ≤ m} |λ_i − λ| ≤ ||H^{-1}||_2 ||ε_B||_2 ||H||_2.

In the first case this inequality also holds and therefore always
    |λ_i − λ| ≤ ν_H ||ε_B||_2                                     (89.2)
at least for one value of i. Here
    ν_H = ||H^{-1}||_2 ||H||_2
is the condition number of the matrix H expressed in terms of spectral norm.
The relation obtained means that whatever the perturbation ε_B of the matrix B is, for any eigenvalue λ of the perturbed matrix B + ε_B there is an eigenvalue λ_i of B such that we have inequality (89.2). Notice that we nowhere required that ε_B should be small. Relation (89.2) may be interpreted somewhat differently. Namely:
The eigenvalues of a perturbed matrix are in the region which is the union of all disks with centres at λ_i and of radius ν_H ||ε_B||_2.
The columns of the matrix H are eigenvectors of the matrix B. It follows from (89.2) therefore that as a general measure of sensitivity of eigenvalues to the perturbation of a matrix we could apparently take the condition number of the matrix H of eigenvectors (rather than of the matrix B itself!). The matrix H satisfying (89.1) is not unique, since the eigenvectors are defined up to arbitrary factors. It will be assumed that H is always chosen so that its value ν_H is a minimum one. We recall that in any case ν_H ≥ 1.
If B is a normal matrix and, in particular, Hermitian or unitary, then we may take H to be a unitary matrix. Then ν_H = 1 and consequently
    |λ_i − λ| ≤ ||ε_B||_2.                                        (89.3)
We consider in somewhat greater detail the case of a Hermitian matrix B with Hermitian perturbation ε_B. Now we can show that:
Every disk with centre at λ_i and of radius ||ε_B||_2 contains at least one eigenvalue of a perturbed matrix.
Indeed, let us agree to consider the matrix B + ε_B as the "original" matrix and the matrix B = (B + ε_B) − ε_B as a "perturbed" matrix with perturbation equal to −ε_B. Repeating the above calculations word for word we obtain a formula similar to (89.3) but with the eigenvalues of B and B + ε_B reversed. This means that for any eigenvalue λ_i of the "perturbed" matrix B there must be at least one eigenvalue λ of the "original" matrix B + ε_B for which (89.3) holds.
If the eigenvalues of B are simple, then for a sufficiently small perturbation ε_B all the disks become separated and then each disk will contain one and only one eigenvalue of the perturbed matrix.

Formula (89.3) shows that the eigenvalues of normal matrices possess a considerable stability to perturbations. In the general problem of determining eigenvalues, however, this phenomenon is an exception rather than a rule.
Consider as an example the case, an "extreme" case in a sense, when the matrix B consists of a single Jordan canonical box. We may agree to assume that all eigenvectors of such a matrix are collinear, that the matrix consisting of eigenvectors is singular and that consequently its condition number equals "infinity". So, let B be an m × m matrix of the form
    ( λ_0   1               )
    (      λ_0   1          )
    (           ...   ...   )
    (                  λ_0  ).
It is obvious that its characteristic polynomial is (λ − λ_0)^m.
Now take a matrix of perturbation ε_B in which only one element, that in position (m, 1), is nonzero and equal to ε. The characteristic polynomial of the perturbed matrix is (λ − λ_0)^m − ε. The eigenvalues of the perturbed matrix are therefore a distance of |ε|^{1/m} away from those of the original matrix. If, for example, m = 20, ε = 10^{-10} and λ_0 is of an order of unity, then any practical stability is out of the question.
It is important to understand that the instability of eigenvalues is not necessarily due to the presence of multiple eigenvalues, nor is it of course due to the presence of Jordan boxes. Let us consider a 20 × 20 matrix B:
    ( 20  20                     )
    (     19  20                 )
    (         18  20             )
    (             ...  ...       )
    (                   2   20   )
    (                        1   ).
It is a triangular matrix and therefore its eigenvalues are the diagonal elements. On the face of it they are sufficiently well separated and there seem to be no grounds to expect instability. But let us add a perturbation ε to the zero element in position (20, 1). The free term of the characteristic polynomial will then change by an amount of 20^19 ε. Since the product of the eigenvalues is equal to the free term, the eigenvalues themselves must change very greatly.
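Both examples are easy to reproduce; the sketch below (NumPy assumed, λ_0 = 1 taken for the Jordan box) shows that a perturbation of 10^{-10} moves the eigenvalues of the Jordan box by about |ε|^{1/20} ≈ 0.3 and drives several eigenvalues of the triangular matrix far into the complex plane.

    import numpy as np

    m, eps = 20, 1e-10

    # A single Jordan box with lambda_0 = 1, perturbed in position (m, 1).
    B = np.eye(m) + np.diag(np.ones(m - 1), 1)
    Bp = B.copy()
    Bp[m - 1, 0] = eps
    print(np.abs(np.linalg.eigvals(Bp) - 1.0).max())    # about 0.3, i.e. eps**(1/m)

    # The triangular matrix with diagonal 20, 19, ..., 1 and 20's above the diagonal.
    W = np.diag(np.arange(20, 0, -1.0)) + 20.0 * np.diag(np.ones(19), 1)
    Wp = W.copy()
    Wp[19, 0] = eps
    # Several eigenvalues leave the real axis and move by amounts of order unity.
    print(np.abs(np.linalg.eigvals(Wp).imag).max())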

Still more complicated questions arise in the study of the stabil-


ity of eigenvectors. It is clear that if an eigenvalue λ of a matrix B is perturbation unstable, then the corresponding eigenvector x clearly cannot be stable, since B, λ and x are connected by the linear relation Bx = λx.
It is important to note, however, that even if the eigenvalues remain unaffected by perturbation, not only may the eigenvectors be unstable, but their number may also change. For example, the first of the matrices
    ( 2  0  0 )      ( 2  ε  0 )
    ( 0  2  0 ) ,    ( 0  2  0 )
    ( 0  0  1 )      ( 0  0  1 )
has three linearly independent eigenvectors, and the second has two, although their eigenvalues are equal.
is due only to the presence of multiple eigenvalues in the original
matrix. But under conditions of approximate assignment of a matrix
it is hard, if not impossible, to decide which eigenvalues are to be con-
sidered multiple and which simple.
Questions concerning the stability of eigenvalues, eigenvectors
and root vectors are among the most complicated in the sections of
algebra connected with computations.

Exercises
1. Let B be a matrix of a simple structure but with multiple eigenvalues. Prove that for any arbitrarily small ε > 0 there is a perturbation ε_B satisfying ||ε_B|| < ε such that the matrix B + ε_B is no longer of a simple structure.
2. Let a matrix B have mutually distinct eigenvalues and let d > 0 be the smallest distance between them. Prove that there is a perturbation ε_B satisfying ||ε_B||_2 > d such that the matrix B + ε_B is not of a simple structure.
3. Now let B be a Hermitian matrix. Prove that if a Hermitian perturbation ε_B satisfies the condition ||ε_B||_2 < d/2, then the matrix B + ε_B has mutually distinct eigenvalues.
4. Finally, let B be a non-Hermitian matrix with mutually distinct eigenvalues. Prove that there is a number r satisfying 0 < r ≤ d such that the matrix B + ε_B is of a simple structure provided ||ε_B||_2 ≤ r.
5. Try to establish a more exact relation between the numbers r and d.
PART III

Bilinear Forms

CHAPTER 11

Bilinear and Quadratic Forms


90. General properties of bilinear
and quadratic forms
Consider numerical functions φ(x, y) of two independent vector variables x and y of some vector space K_n over a number field P, taking on values from P. A function φ(x, y) is said to be a bilinear form if for any vectors x, y, z ∈ K_n and any number α ∈ P
    φ(x + z, y) = φ(x, y) + φ(z, y),   φ(αx, y) = αφ(x, y),       (90.1)
    φ(x, y + z) = φ(x, y) + φ(x, z),   φ(x, αy) = αφ(x, y).
The first two of the relations (90.1) imply the linearity of φ(x, y) in the first independent variable, the last two imply the linearity in the second independent variable.
It is easy to verify that a sum of two bilinear forms, as well as a product of a bilinear form by a number, is again a bilinear form. Therefore the set of all bilinear forms over the same space K_n assuming values from the same number field P is a vector space. The "zero" of the given space is a bilinear form 0(x, y) such that 0(x, y) = 0 for all x and y. The form 0(x, y) is called a zero bilinear form.
We have already encountered a function of this form. Comparing (27.1) and (90.1) it is easy to notice that a scalar product in a Euclidean space is a bilinear form. Recalling the important role played by the scalar product in the study of Euclidean spaces and of linear operators in them it may be suggested that a study of bilinear forms may turn out to be useful.
A special place among the bilinear forms is occupied by symmetric and skew-symmetric bilinear forms. A bilinear form φ(x, y) is said to be symmetric if for any vectors x, y ∈ K_n
    φ(x, y) = φ(y, x).
If, however, for any x, y ∈ K_n
    φ(x, y) = −φ(y, x),
then the bilinear form is said to be skew-symmetric.
Any skew-symmetric bilinear form φ(x, y) assumes a zero value

when its independent variables coincide. Indeed, since φ(x, x) = −φ(x, x), we have φ(x, x) = 0. Somewhat unexpected is another fact connected with the values of a symmetric bilinear form when its independent variables coincide. Namely, any symmetric bilinear form φ(x, y) is uniquely defined by its values when its independent variables coincide. Indeed, let x and y be any vectors from K_n. Taking into account the symmetry of φ(x, y), we have
    φ(x + y, x + y) = φ(x, x) + φ(y, y) + 2φ(x, y),               (90.2)
whence
    φ(x, y) = (1/2){φ(x + y, x + y) − φ(x, x) − φ(y, y)}.         (90.3)
This formula proves the validity of the above assertion, since the right-hand side of the relation involves only values of the form on coinciding independent variables.
A bilinear form is uniquely decomposable into a sum of a symmetric and a skew-symmetric bilinear form. In explicit form
    φ(x, y) = (1/2){φ(x, y) + φ(y, x)} + (1/2){φ(x, y) − φ(y, x)}.   (90.4)
It is easy to verify that the first summand at the right is a symmetric bilinear form and the second is a skew-symmetric form. Assuming the existence of some other decomposition we shall have to conclude, on substituting equal independent variables, that the symmetric part of the decomposition is uniquely defined and that hence so is the decomposition as a whole.
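For a concrete bilinear form φ(x, y) = (Ax, y) on a real space these identities are easy to check numerically; the sketch below (NumPy assumed, the matrix A is only an illustration) verifies decomposition (90.4) and identity (90.3) applied to the symmetric part.

    import numpy as np

    rng = np.random.default_rng(10)
    n = 4
    A = rng.standard_normal((n, n))             # phi(x, y) = (Ax, y) is a bilinear form
    phi = lambda u, v: (A @ u) @ v

    x, y = rng.standard_normal(n), rng.standard_normal(n)

    # Decomposition (90.4) into a symmetric and a skew-symmetric part.
    sym = 0.5 * (phi(x, y) + phi(y, x))
    skew = 0.5 * (phi(x, y) - phi(y, x))
    assert np.isclose(phi(x, y), sym + skew)

    # Identity (90.3) applied to the symmetric part of phi.
    assert np.isclose(sym, 0.5 * (phi(x + y, x + y) - phi(x, x) - phi(y, y)))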
If the bilinear form is not symmetric, then instead of (90.2) we shall have

φ(x + y, x + y) = φ(x, x) + φ(y, y) + φ(x, y) + φ(y, x).

Consequently

½{φ(x, y) + φ(y, x)} = ½{φ(x + y, x + y) - φ(x, x) - φ(y, y)}.    (90.5)

Comparing this relation with (90.3) we conclude that for a nonsymmetric bilinear form its symmetric part is uniquely defined by the values of the form when its independent variables coincide.
Along with bilinear forms we shall consider the so-called quadratic forms. Let φ(x, y) be a bilinear form in a space Kn. A quadratic form is a numerical function φ(x, x) of a single independent vector variable x ∈ Kn obtained from φ(x, y) by replacing the vector y with x.
In general it is impossible to reconstruct uniquely from a quadratic form the bilinear form that has generated it. But, as formula (90.3) implies, there is one and only one symmetric bilinear form from which the original quadratic form can be obtained. That bilinear form is
called polar relative to a given quadratic form. The set of all bilinear
forms generating the same quadratic form can be obtained by adding
the polar bilinear form and an arbitrary skew-symmetric form. In
using bilinear forms for the study of the properties of quadratic forms
it suffices therefore to consider only symmetric bilinear forms.
The impossibility of reconstructing a bilinear form from a qua-
dratic form is explained by the fact that the quadratic form gives
no information about the skew-symmetric part of any bilinear form.
Lemma 90.1. Skew-symmetric bilinear forms, and these forms alone, assume zero values for all coinciding independent variables.
Proof. We have already noted that if φ(x, y) is skew-symmetric, then φ(x, x) = 0 for every x. If, however, φ(x, x) = 0 for every x, then from (90.5) it follows that φ(x, y) + φ(y, x) = 0 for all vectors x and y, i.e. the bilinear form φ(x, y) is skew-symmetric.
A comparison of the properties of a scalar product and relations (90.1) shows that in a unitary space, strictly speaking, a scalar product is not a bilinear form. In a complex space, closely related to a scalar product are Hermitian bilinear forms. A numerical function φ(x, y) is said to be a Hermitian bilinear form if for any vectors x, y, z ∈ Kn and any number α from the complex field P

$\varphi(x+z,\,y)=\varphi(x,\,y)+\varphi(z,\,y),\qquad \varphi(\alpha x,\,y)=\alpha\varphi(x,\,y),$
$\varphi(x,\,y+z)=\varphi(x,\,y)+\varphi(x,\,z),\qquad \varphi(x,\,\alpha y)=\bar\alpha\,\varphi(x,\,y).$

Here the bar stands for complex conjugation.
Again a sum of two Hermitian bilinear forms, as well as a product of a Hermitian bilinear form by a number, is a Hermitian bilinear form. The set of all Hermitian bilinear forms over the complex space assuming complex values is therefore a complex vector space.
A Hermitian bilinear form is said to be Hermitian-symmetric if for any vectors x, y ∈ Kn

$\varphi(x,\,y)=\overline{\varphi(y,\,x)}.$

If for any x, y ∈ Kn

$\varphi(x,\,y)=-\overline{\varphi(y,\,x)},$

then the form is called skew-Hermitian. On coinciding vectors a skew-Hermitian form assumes pure imaginary values and a Hermitian-symmetric form assumes real values. Now any Hermitian bilinear form is uniquely defined by its values when its independent variables coincide. But instead of (90.3) the following relation is true:

$\varphi(x,\,y)=\tfrac{1}{4}\{\varphi(x+y,\,x+y)-\varphi(x-y,\,x-y)+i\varphi(x+iy,\,x+iy)-i\varphi(x-iy,\,x-iy)\}.\qquad(90.6)$

From this it follows in particular that
Of the Hermitian bilinear forms only the zero form assumes zero values when all its independent variables coincide.
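The identity (90.6) is easy to test numerically. The sketch below is an added illustration: an arbitrary complex coefficient matrix G stands in for a Hermitian bilinear form that is linear in the first argument and conjugate-linear in the second, and the vectors are random test data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def phi(x, y, G=G):
    """Hermitian bilinear form: linear in x, conjugate-linear in y."""
    return x @ G @ np.conj(y)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

q = lambda z: phi(z, z)                      # the Hermitian quadratic form
recovered = 0.25 * (q(x + y) - q(x - y) + 1j * q(x + 1j * y) - 1j * q(x - 1j * y))
assert np.isclose(recovered, phi(x, y))      # formula (90.6)
print(recovered, phi(x, y))
```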
In this case, too, a Hermitian bilinear form can be uniquely represented as a sum of a Hermitian-symmetric and a skew-Hermitian form, with

$\varphi(x,\,y)=\tfrac{1}{2}\{\varphi(x,\,y)+\overline{\varphi(y,\,x)}\}+\tfrac{1}{2}\{\varphi(x,\,y)-\overline{\varphi(y,\,x)}\}.\qquad(90.7)$

The proofs of these facts for Hermitian forms are much the same as the corresponding proofs for bilinear forms.
A Hermitian quadratic form is a numerical function φ(x, x) of a single independent vector variable x ∈ Kn obtained from a Hermitian bilinear form φ(x, y) by replacing the vector y with x. Unlike quadratic forms, a Hermitian quadratic form allows a unique reconstruction of the Hermitian bilinear form that generates it. The reconstruction is carried out according to formula (90.6), and the corresponding bilinear form is also called polar relative to the original quadratic form.
The possibility of reconstructing a Hermitian bilinear form uniquely from the Hermitian quadratic form generated by it is due to a close relation of Hermitian-symmetric to skew-Hermitian bilinear forms.
Lemma 90.2. If φ(x, y) is a Hermitian-symmetric (skew-Hermitian) bilinear form, then ψ(x, y) = iφ(x, y) is a skew-Hermitian (Hermitian-symmetric) bilinear form.
Proof. Suppose, for example, φ(x, y) is Hermitian-symmetric. Then for all vectors x and y we have

$\psi(x,\,y)=i\varphi(x,\,y)=\varphi(ix,\,y)=\overline{\varphi(y,\,ix)}=-\,\overline{i\varphi(y,\,x)}=-\,\overline{\psi(y,\,x)},$

i.e. ψ(x, y) is skew-Hermitian. The case of a skew-Hermitian form φ(x, y) can be considered in a similar way.
In what follows we shall more often be concerned with Hermitian quadratic forms generated by Hermitian-symmetric bilinear forms.
Lemma 90.3. Of the Hermitian bilinear forms only Hermitian-symmetric forms generate real Hermitian quadratic forms.
Proof. As already noted, Hermitian-symmetric forms assume real values when their independent variables coincide. Suppose now that a Hermitian quadratic form φ(x, x) assumes only real values. According to (90.6), for the polar bilinear form φ(x, y) we have

$\overline{\varphi(y,\,x)}=\overline{\tfrac{1}{4}\{\varphi(y+x,\,y+x)-\varphi(y-x,\,y-x)+i\varphi(y+ix,\,y+ix)-i\varphi(y-ix,\,y-ix)\}}$
$=\overline{\tfrac{1}{4}\{\varphi(x+y,\,x+y)-\varphi(x-y,\,x-y)+i\varphi(x-iy,\,x-iy)-i\varphi(x+iy,\,x+iy)\}}$
$=\tfrac{1}{4}\{\varphi(x+y,\,x+y)-\varphi(x-y,\,x-y)+i\varphi(x+iy,\,x+iy)-i\varphi(x-iy,\,x-iy)\}=\varphi(x,\,y),$

where in removing the bar we have used the fact that all values of the form on coinciding arguments are real. Thus φ(x, y) equals the conjugate of φ(y, x), i.e. the polar form is Hermitian-symmetric.
Corollary. Of the Hermitian bilinear forms only skew-Hermitian forms generate pure imaginary Hermitian quadratic forms.
Corollary. No Hermitian nonsymmetric bilinear form can generate a real Hermitian quadratic form.
As follows from the linearity properties of bilinear and Hermitian bilinear forms in each independent variable, φ(0, 0) = 0 for any quadratic form φ(x, x). In the general case, however, there may also be nonzero vectors x such that φ(x, x) = 0. These vectors will be called isotropic. The concept of isotropy is connected only with a particular quadratic form. Therefore vectors isotropic for one quadratic form may be nonisotropic for another and vice versa. In particular, Lemma 90.1 implies that for a quadratic form generated by a skew-symmetric bilinear form all vectors in Kn, except the zero vector, are isotropic.
Of the ordinary and Hermitian real forms the most widely used are the forms that assume values of the same sign for all independent vector variables. A real quadratic form φ(x, x) is said to be positive definite if φ(x, x) > 0 for every x ≠ 0. The form is said to be nonnegative if φ(x, x) ≥ 0 for every x ≠ 0. Similar definitions can be given for nonpositive and negative definite quadratic forms.
As a rule it is only positive definite and negative definite quadratic forms that are called forms of constant signs. But sometimes this term is also applied to nonnegative and nonpositive quadratic forms. To avoid confusion, whenever necessary positive definite and negative definite quadratic forms will be called forms strictly of constant signs.
If a quadratic form is a form of constant signs, then the ordinary or Hermitian bilinear form generating it will also be said to be positive definite, nonnegative and so on.
If a real quadratic form φ(x, x) is strictly of constant signs, then it has no isotropic vectors. In the case of real bilinear and Hermitian-symmetric bilinear forms φ(x, y) the corresponding quadratic forms are real, and for them the converse is also true. That is, we have
Theorem 90.1. Let a quadratic form φ(x, x) be generated by a real bilinear or a Hermitian-symmetric bilinear form φ(x, y). If φ(x, x) has no isotropic vectors, then it is strictly of constant signs.
Proof. As already noted, the quadratic form φ(x, x) is real. In both cases it assumes values of the same sign on collinear vectors. Suppose φ(x, x) is not strictly of constant signs. Then we can find linearly independent vectors u and v such that φ(u, u) > 0 and φ(v, v) < 0. For any real number α

φ(u + αv, u + αv) = φ(u, u) + α(φ(u, v) + φ(v, u)) + α²φ(v, v).    (90.8)

The right-hand side is a second-degree polynomial in α. It has real coefficients, which follows from the reality of φ(x, x) and Lemma 90.3. Since φ(u, u) and φ(v, v) have opposite signs, polynomial (90.8) has two real roots. Let α₀ be one of them. This means that φ(u + α₀v, u + α₀v) = 0. However, the vector u + α₀v is nonzero by virtue of the linear independence of u and v, so the vanishing of the quadratic form on it is impossible under the hypothesis of the theorem. This contradiction completes the proof.
It is no accident that in Theorem 90.1 we restricted our discussion to quadratic forms generated only by real bilinear and Hermitian-symmetric bilinear forms. No other bilinear form can lead to a real quadratic form. Actually it only remains to consider a bilinear form in a complex space. But it is impossible for such a bilinear form to generate a real quadratic form that is not identically zero. If for some vector u the quadratic form takes on a nonzero real value φ(u, u), then φ(αu, αu) = α²φ(u, u) will be a complex number for any complex α with a nonzero real and a nonzero imaginary part. So
For real quadratic forms to have no isotropic vectors it is necessary and sufficient that they should be strictly of constant signs.
A complex bilinear form always generates a quadratic form with isotropic vectors provided it is defined on a vector space of dimension greater than unity. Indeed, assuming that this is not the case, we can always find linearly independent vectors u and v such that φ(u, u) ≠ 0 and φ(v, v) ≠ 0. But according to (90.8) the vector u + αv will be isotropic under a suitable choice of the complex number α. A Hermitian bilinear complex form can generate a quadratic form having no isotropic vectors. It follows from our studies that
For a quadratic form generated by a Hermitian bilinear form to have no isotropic vectors it is sufficient that the real (or imaginary) part of the quadratic form should be strictly of constant signs.

Exercises
1. Prove that for any bilinear form φ(x, y), φ(0, y) = φ(x, 0) = 0 for any x, y ∈ Kn.
2. Find the dimension and a basis of the vector space of bilinear forms.
3. Prove that the sets of symmetric and skew-symmetric bilinear forms constitute subspaces in the vector space of all bilinear forms.
4. Prove that the space of all bilinear forms is a direct sum of the subspaces of symmetric and skew-symmetric bilinear forms.
5. Prove that the set of all quadratic forms constitutes a vector space. Find its dimension and a basis.
6. Are the following sets of quadratic forms linear subspaces:
the quadratic forms of constant signs,
the quadratic forms assuming real values,
the quadratic forms having no isotropic vectors,
the quadratic forms for which all vectors of a given set are isotropic?
7. Prove that, given any quadratic form in a normed space, there is a number α such that for every x
|φ(x, x)| ≤ α‖x‖².
8. Let φ(x, x) be a quadratic form strictly of constant signs and ψ(x, x) an arbitrary quadratic form. Prove that there is a number β such that for every x
|ψ(x, x)| ≤ β|φ(x, x)|.
9. Prove that a quadratic form is not strictly of constant signs if and only if the set of isotropic vectors and the zero vector form a linear subspace.
10. Consider Exercises 1 to 9 for Hermitian bilinear and quadratic forms. Do all the assertions remain valid?
11. Suppose that in a complex space Kn some subspace L consists only of isotropic vectors of a Hermitian bilinear form φ(x, y) and the zero vector. Prove that φ(u, v) = 0 for any vectors u, v ∈ L.

91. The matrices of bilinear


and quadratic forms
We investigate a bilinear form φ(x, y) in a space Kn. Choose in Kn two fixed bases, e₁, e₂, …, en and q₁, q₂, …, qn, and let

$x=\sum_{i=1}^{n}\xi_i e_i,\qquad y=\sum_{j=1}^{n}\eta_j q_j.$

Then by property (90.1) we have

$\varphi(x,\,y)=\varphi\Bigl(\sum_{i=1}^{n}\xi_i e_i,\ \sum_{j=1}^{n}\eta_j q_j\Bigr)=\sum_{i=1}^{n}\sum_{j=1}^{n}\varphi(e_i,\,q_j)\,\xi_i\eta_j.\qquad(91.1)$

Denote as before by x_e and y_q the n × 1 matrices made up of the coordinates of the vectors x and y in the corresponding bases and by G_eq the n × n matrix with elements g_ij^(eq) = φ(e_i, q_j). Relation (91.1) implies that

$\varphi(x,\,y)=x_e'\,G_{eq}\,y_q.\qquad(91.2)$

Thus, given fixed bases in Kn, the bilinear form can be represented in the matrix form (91.2).
The matrix G_eq is called the matrix of the bilinear form and is uniquely defined given fixed bases. Assuming that for φ(x, y) there is, besides (91.2), another similar representation with some matrix F_eq and taking x = e_i and y = q_j, we at once get f_ij^(eq) = φ(e_i, q_j), i.e. F_eq = G_eq.
Note that the right-hand side of (91.2) defines some bilinear form whatever the matrix G_eq. The validity of (90.1) is immediate from the corresponding properties of matrix operations. Thus, given fixed bases in Kn, there is a one-to-one correspondence between bilinear forms and square matrices.
Changing the bases in Kn affects the matrix of the bilinear form, of course. Let P be the coordinate transformation matrix for a change from e₁, e₂, …, en to f₁, f₂, …, fn and Q the coordinate transformation matrix for a change from q₁, q₂, …, qn to t₁, t₂, …, tn.
By (63.3)

$x_e=Px_f,\qquad y_q=Qy_t,\qquad(91.3)$

and therefore it follows from (91.2) that

$\varphi(x,\,y)=x_e'G_{eq}y_q=x_f'P'G_{eq}Qy_t.$

But on the other hand

$\varphi(x,\,y)=x_f'G_{ft}y_t.$

Consequently

$G_{ft}=P'G_{eq}Q.\qquad(91.4)$
Since P and Q are nonsingular, in accordance with the terminology introduced in Section 64 we shall call the matrices G_ft and G_eq equivalent. As shown earlier, equivalent matrices of the same size, and only such matrices, have the same rank. This means that the rank of the matrix of a bilinear form is independent of the choice of bases and is a characteristic of the form itself. We shall call it the rank of the bilinear form. A bilinear form is said to be nonsingular if so is its matrix. Another characteristic of a bilinear form is the difference between the dimension of the space Kn and the rank of the form. We shall call it the nullity of the bilinear form.
It follows from the results of Section 64 that all matrices of the same rank are equivalent to a diagonal matrix with elements 0 and 1. In terms of bilinear forms this means that for an arbitrary form of rank r we can always find bases f₁, f₂, …, fn and t₁, t₂, …, tn such that the form is of the simplest type. That is, if

$x=\sum_{i=1}^{n}\tau_i f_i,\qquad y=\sum_{j=1}^{n}\nu_j t_j,$

then

$\varphi(x,\,y)=\sum_{i=1}^{r}\tau_i\nu_i.$

A separate choice of bases for each variable of the bilinear form is made fairly rarely. It is more usual to choose a common basis. Let e₁, e₂, …, en be some basis of Kn and

$x=\sum_{i=1}^{n}\xi_i e_i,\qquad y=\sum_{j=1}^{n}\eta_j e_j.$

In this case, as in (91.1), we obtain the following representation of a bilinear form:

$\varphi(x,\,y)=\sum_{i=1}^{n}\sum_{j=1}^{n}\varphi(e_i,\,e_j)\,\xi_i\eta_j$
or in matrix notation

$\varphi(x,\,y)=x_e'G_e y_e.\qquad(91.5)$

Here G_e is the matrix with elements g_ij^(e) = φ(e_i, e_j). It is G_e that will throughout be called the matrix of a bilinear form. If again P is the coordinate transformation matrix for a change from e₁, e₂, …, en to f₁, f₂, …, fn, then according to (91.4) the matrices G_e and G_f of the same bilinear form φ(x, y) will be related by

$G_f=P'G_eP.\qquad(91.6)$

The matrices G_e and G_f related by (91.6), with P nonsingular, are called congruent. Congruent matrices are always equivalent. In general the converse is not true, of course.
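A small numpy sketch may help fix the bookkeeping of (91.3) to (91.6). It is only an added illustration with randomly chosen data: Ge plays the role of the matrix of a form in the basis e, and P is a coordinate transformation matrix assumed (almost surely) nonsingular.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
Ge = rng.standard_normal((n, n))     # matrix of the form in the basis e (test data)
P  = rng.standard_normal((n, n))     # coordinate transformation matrix e -> f

Gf = P.T @ Ge @ P                    # congruent matrix of the same form, relation (91.6)

# The value of the form does not depend on the basis used to compute it:
xf, yf = rng.standard_normal(n), rng.standard_normal(n)   # coordinates in the basis f
xe, ye = P @ xf, P @ yf                                    # old coordinates, as in (91.3)
assert np.isclose(xe @ Ge @ ye, xf @ Gf @ yf)

# Congruent matrices have the same rank (the rank of the bilinear form)
assert np.linalg.matrix_rank(Ge) == np.linalg.matrix_rank(Gf)
```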
What was said about bilinear forms carries over with slight changes to Hermitian bilinear forms. Every Hermitian form can be represented uniquely in matrix notation

$\varphi(x,\,y)=x_e'G_{eq}\bar y_q,$

with e₁, e₂, …, en and q₁, q₂, …, qn fixed. Under a change to f₁, f₂, …, fn and t₁, t₂, …, tn, instead of (91.4) we have

$G_{ft}=P'G_{eq}\bar Q.$

If the independent variables of the Hermitian bilinear form are given in a single basis, then the matrix notation of the form is similar to (91.5). That is,

$\varphi(x,\,y)=x_e'G_e\bar y_e.\qquad(91.7)$

Under a change to a new basis the matrices of the form are related by

$G_f=P'G_e\bar P,$

and we shall say that they are Hermitian-congruent.
Now we can establish a relation between the type of a bilinear form and the type of its matrix. If the form is symmetric, then for any basis e₁, e₂, …, en

$g_{ij}^{(e)}=\varphi(e_i,\,e_j)=\varphi(e_j,\,e_i)=g_{ji}^{(e)},$

i.e. G_e = G_e′ and the matrix G_e of the form φ(x, y) is symmetric. If, however, the form is skew-symmetric, then

$g_{ij}^{(e)}=\varphi(e_i,\,e_j)=-\varphi(e_j,\,e_i)=-g_{ji}^{(e)},$

i.e. G_e = -G_e′. In this case the matrix G_e is also called skew-symmetric.
The converse is also true. If in some basis the matrix of a form is symmetric (skew-symmetric), then so is the bilinear form generating
it. Let G_e = G_e′; then

$\varphi(y,\,x)=y_e'G_ex_e=(y_e'G_ex_e)'=x_e'G_e'y_e=x_e'G_ey_e=\varphi(x,\,y).$

If, however, G_e′ = -G_e, then

$\varphi(y,\,x)=y_e'G_ex_e=(y_e'G_ex_e)'=x_e'G_e'y_e=-x_e'G_ey_e=-\varphi(x,\,y).$

Similar assertions hold also for the relation of the Hermitian bilinear form to its matrix. If the form is Hermitian-symmetric, then

$g_{ij}^{(e)}=\varphi(e_i,\,e_j)=\overline{\varphi(e_j,\,e_i)}=\bar g_{ji}^{(e)},$

i.e. G_e = Ḡ_e′ and the matrix G_e of the form φ(x, y) is Hermitian. If the form is skew-Hermitian, then

$g_{ij}^{(e)}=\varphi(e_i,\,e_j)=-\overline{\varphi(e_j,\,e_i)}=-\bar g_{ji}^{(e)},$

i.e. G_e = -Ḡ_e′. In this case G_e is said to be skew-Hermitian.
The converse statements are also true. Let G_e = Ḡ_e′. Then for the generating Hermitian bilinear form we have

$\varphi(y,\,x)=y_e'G_e\bar x_e=(y_e'G_e\bar x_e)'=\bar x_e'G_e'y_e=\bar x_e'\bar G_ey_e=\overline{x_e'G_e\bar y_e}=\overline{\varphi(x,\,y)}.$

For the case G_e = -Ḡ_e′ we find

$\varphi(y,\,x)=y_e'G_e\bar x_e=(y_e'G_e\bar x_e)'=\bar x_e'G_e'y_e=-\bar x_e'\bar G_ey_e=-\overline{x_e'G_e\bar y_e}=-\overline{\varphi(x,\,y)}.$

The matrix of a zero bilinear form consists only of zero elements, i.e. is a zero matrix. It is the only matrix that is simultaneously symmetric and skew-symmetric, as is the zero form.
We have already noted that there is a very close connection between symmetric bilinear and quadratic forms. It is especially obvious at the matrix level. For a bilinear form φ(x, y) the matrix relation (91.5) holds. For the corresponding quadratic form we have

$\varphi(x,\,x)=x_e'G_ex_e.\qquad(91.8)$

For a fixed basis e₁, e₂, …, en, given any matrix G_e, (91.8) defines some quadratic form. The matrix G_e in (91.8) is now called not the matrix of a bilinear form but the matrix of a quadratic form.
While for bilinear forms there is a one-to-one correspondence between the forms and their matrices given a fixed basis in Kn, there is no longer such a correspondence now. Every quadratic form can be given by an entire set of its matrices. This set contains only one symmetric matrix, and the difference between any two matrices of a given set is a skew-symmetric matrix.
Thus any ordinary quadratic form can always be given by a symmetric matrix. Changing to a different basis affects the matrices of the quadratic form according to (91.6). We therefore conclude again that
problems of investigating symmetric bilinear and quadratic forms


are closely related. For Hermitian quadratic forms this is no longer
the case, since there is a 1-1 correspondence between them and Her-
mitian bilinear forms and there is a 1-1 correspondence between their
matrices.
As with bilinear forms, the rank of a quadratic form is the rank of
its matrix in any basis. If the matrix of a quadratic form is nonsin-
gular, then the quadratic form is also called nonsingular.
Essentially the study of bilinear forms is the study of their ma-
trices in different bases or equivalently the study of a class of congruent
matrices. All our immediate studies therefore will be concerned with
the investigation of classes of congruent and Hermitian-congruent
matrices.
A number of properties for such classes follow at once from the pre-
ceding results. Thus a matrix congruent with a symmetric (skew-sym-
metric) matrix will necessarily be symmetric (skew-symmetric). In
particular, a matrix congruent with a diagonal matrix is symmetric.
From this we ronclude that a nonzero symmetric matrix is ne,·er
congruent with a skew-symmetric matrix although it may be equiv-
alent to that matrix, and that a nonzero skew-symmetric matrix
can never be congruent with a diagonal matrix. A matrix Hermitian-
congruent with a Hermitian (skew-Hermitian) matrix is necessarily
Hermitian (skew-Hermitian). Of the diagonal matrices it is only the
matrix with real (pure imaginary) elements that can be Hermitian
(skew-Hermitian).
In accordance with decompositions (90.4) and (90.7) of bilinear and
Hermitian bilinear forms we obtain decompositions of an arbitrary
matrix as a sum of a symmetric and a skew-symmetric matrix as
well as that of a Hermitian and a skew-Hermitian matrix. These
decompositions can be written out in explicit form:
$A=\tfrac{1}{2}(A+A')+\tfrac{1}{2}(A-A'),$
$A=\tfrac{1}{2}(A+A^*)+\tfrac{1}{2}(A-A^*).$
If A is the matrix of a bilinear form, then the first terms of the right-
hand sides are the matrices of the symmetric parts of the bilinear form
and the second terms are the matrices of the skew-symmetric parts
of the same form.
We shall often carry over to matrices without comment the terminology introduced for bilinear and quadratic forms. For example, we shall call a matrix positive definite, meaning by this that it is the matrix of a positive definite form, and so on.
One of the major problems connected with the bilinear form is that
of determining the simplest form its matrix can be reduced to by
changing the basis and finding the appropriate basis. This problem
will be called the problem of transforming a bilinear form or the


problem of reducing it to the simplest form.
In matrix interpretation the transformation problem can be stated as follows:
Given a matrix A, find a nonsingular matrix P such that the matrix

$C=P'AP\qquad(91.9)$

congruent with A has the simplest form.
Essentially this amounts to factoring the matrix, since it follows from (91.9) that

$A=(P^{-1})'CP^{-1}.$

For Hermitian bilinear forms, of course, instead of (91.9) we shall consider the transformation

$C=P'A\bar P.\qquad(91.10)$

Computationally it is important that the matrix P in (91.9) and (91.10) should not be very complicated. That is because in finding new coordinates of vectors in terms of the old coordinates according to (63.3) one has to solve a system of linear algebraic equations with the matrix P, and it is necessary that that solution should be carried out sufficiently fast. In some cases it is more convenient to seek the matrix P⁻¹ instead of P.
Some other forms of notation for bilinear and quadratic forms may be used besides those considered above. Sometimes we shall give them in explicit form:

$\Phi=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_iy_j,\qquad F=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j.\qquad(91.11)$

This notation can be simplified. For example, let the space be real. Then so are both the bilinear form and the matrix A of coefficients a_ij. We introduce a space Rn whose elements are the column vectors

x = (x₁, x₂, …, xn)′,  y = (y₁, y₂, …, yn)′

and suppose that the scalar product is introduced as a sum of pairwise products of coordinates. Now we can write

Φ = (Ax, y),  F = (Ax, x).    (91.12)

For Hermitian bilinear forms written as

$\Phi=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_i\bar y_j,\qquad F=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_i\bar x_j$

we again have (91.12) if of course the scalar product is introduced as a sum of the products of the coordinates of the first vector by the complex conjugate coordinates of the second vector.
Exercises
1. Prove that the determinant of a Hermitian matrix is a real number.
2. What kind of number is the determinant of a skew-Hermitian matrix?
3. Prove that the rank of a skew-symmetric matrix is an even number.
4. Bilinear forms φ(x, y) and φ(y, x) are in general different. What can be said about their matrices?
5. Prove that the rank of a sum of bilinear forms does not exceed the sum of the ranks of the summands.
6. Prove that every bilinear form of rank r can be represented as a sum of r bilinear forms of rank 1.
7. Prove that every bilinear form φ(x, y) of rank 1 can be represented as
φ(x, y) = φ(x, a)·φ(b, y)
for some vectors a and b. Is this representation unique?

92. Reduction to canonical form


Before proceeding to the study of various areas of application of bilinear and quadratic forms we consider a general method of congruence and Hermitian congruence transformation of matrices to a simple form.
Given a square n × n matrix A, find a nonsingular matrix P such that the matrix C = P′AP has a sufficiently simple form. Under a Hermitian congruence transformation it is the matrix C = P′AP̄ that must have a simple form. We shall now describe a general transformation method suitable for all matrices A. The differences between the congruence and Hermitian congruence transformations are insignificant. To be definite, we shall therefore assume that it is the congruence transformation of the matrix that is being carried out.
The method consists in constructing a sequence of matrices A₀ = A, A₁, A₂, …, As, where each subsequent matrix is congruent with the preceding one, i.e.

$A_{k+1}=P_{k+1}'A_kP_{k+1}$

for some matrix P_{k+1}. Since the congruence relation is transitive, the last matrix As will be congruent with the original matrix A. The principle of constructing the sequence of matrices A_k relies on obtaining in the matrix A_{k+1}, for every k, more zero elements than there are in A_k. Moreover, each time we compute the matrix P_{k+1} from A_k we shall require not only that new zero elements should appear in A_{k+1} but also that all zero elements obtained at the preceding steps should remain.
The transformation of a matrix A_k into A_{k+1} will be called a basic step of the method. Every basic step may consist of several auxiliary steps. They will all be reduced to elementary operations: interchanging matrix columns (rows), adding to one column (row) another column (row) multiplied by a number, and multiplying a column (row) by a number. We describe the auxiliary steps in terms of transformations of a matrix A into a matrix C = P′AP congruent with it, dropping for simplicity the index k.
A. In a matrix A the element a₁₁ ≠ 0. There is a nonsingular matrix P such that for the elements of the first column of the matrix C = P′AP

$c_{j1}=\begin{cases}a_{11},& j=1,\\ 0,& j\ne 1.\end{cases}\qquad(92.1)$

The matrix P differs from a unit matrix in having a different first row, with

$p_{1j}=\begin{cases}1,& j=1,\\ -a_{j1}/a_{11},& j\ne 1.\end{cases}\qquad(92.2)$

Multiplying A on the left by the matrix P′ does not affect the first row of A and makes zero all off-diagonal elements of the first column of the matrix P′A. Multiplying P′A on the right by P does not affect the first column of P′A.
Note one important fact. All minors in the upper left-hand corner of a matrix will be called principal minors. Since the matrix P is right triangular and each of its diagonal elements is equal to unity, of all its minors in the first r columns only the principal minor is nonzero, and it is equal to unity. All principal minors will therefore coincide in the matrices A and C. Indeed, using the Binet-Cauchy formula we find

$C\begin{pmatrix}1&2&\cdots&r\\1&2&\cdots&r\end{pmatrix}=\sum_{1\le k_1<k_2<\cdots<k_r\le n}P'\begin{pmatrix}1&2&\cdots&r\\k_1&k_2&\cdots&k_r\end{pmatrix}AP\begin{pmatrix}k_1&k_2&\cdots&k_r\\1&2&\cdots&r\end{pmatrix}=AP\begin{pmatrix}1&2&\cdots&r\\1&2&\cdots&r\end{pmatrix}$
$=\sum_{1\le k_1<k_2<\cdots<k_r\le n}A\begin{pmatrix}1&2&\cdots&r\\k_1&k_2&\cdots&k_r\end{pmatrix}P\begin{pmatrix}k_1&k_2&\cdots&k_r\\1&2&\cdots&r\end{pmatrix}=A\begin{pmatrix}1&2&\cdots&r\\1&2&\cdots&r\end{pmatrix}.$

We shall use this remark later on.
B. In a matrix A the element a₁₁ is 0, but some diagonal element a_jj is other than 0, where j > 1. There is a nonsingular matrix P such that for the matrix C = P′AP the element c₁₁ = a_jj is other than 0. The matrix P differs from a unit matrix only in the four elements at the intersections of the rows and columns with indices 1, j. In those positions P is of the form $\begin{pmatrix}0&1\\1&0\end{pmatrix}$. Multiplying A on the right by P interchanges in A the columns with indices 1, j. Multiplying the matrix AP on the left by P′ interchanges in AP the rows with indices 1, j.
C. In a matrix A all diagonal elements are zero, but there are indices j, l, where j < l, such that a_jl + a_lj ≠ 0. There is a nonsingular matrix P such that for the matrix C = P′AP the element c_jj = a_jl + a_lj is other than 0. The matrix P differs from a unit matrix in one element p_lj = 1. Multiplying A on the right by P adds to the jth column of A its lth column. Multiplying the matrix AP on the left by P′ adds to the jth row of AP its lth row.
D. A matrix A is nonzero skew-symmetric, the element a₁₂ is 0, but some element a_jl is other than 0, where j < l. There is a nonsingular matrix P such that in the skew-symmetric matrix C = P′AP the element c₁₂ = a_jl is other than 0. The matrix P is represented as a product P = P₁·P₂. The matrices P₁ and P₂ differ from unit matrices in the four elements at the intersections of the rows and columns with indices 1, j and 2, l respectively. In those positions P₁ and P₂ are of the form $\begin{pmatrix}0&1\\1&0\end{pmatrix}$. As already stated, multiplying on the right by these matrices interchanges the columns and multiplying on the left interchanges the rows.
E. The matrix of the third-order principal minor of a matrix A is of the form

$\begin{pmatrix}a_{11}&a_{12}&a_{13}\\0&0&a_{23}\\0&a_{32}&0\end{pmatrix},\qquad(92.3)$

where the elements a₁₁, a₂₃ and a₃₂ are nonzero. There is a nonsingular matrix P such that the first three principal minors in C = P′AP are nonzero. The matrix P differs from a unit matrix in one element p₃₁, which may be any number save 0, -a₁₂a₃₂⁻¹ and -a₁₁a₁₃⁻¹. Multiplying A on the right by P adds to the first column of A its third column multiplied by p₃₁. Multiplying AP on the left by P′ adds to the first row of AP its third row multiplied by p₃₁.
F. A matrix A is skew-symmetric, and the element a₁₂ is other than 0. There is a nonsingular matrix P such that for the elements of the first two columns of the matrix C = P′AP

$c_{j1}=\begin{cases}-a_{12},& j=2,\\ 0,& j\ne 2,\end{cases}\qquad c_{j2}=\begin{cases}a_{12},& j=1,\\ 0,& j\ne 1.\end{cases}$

Since under a congruence transformation a skew-symmetric matrix goes over into a skew-symmetric matrix, similar relations will hold also for the elements of the first two rows of C. The matrix P can be represented as a product P = P₁·P₂. The matrix P₁ differs from a unit matrix in having a different second row, with

$p_{2j}^{(1)}=\begin{cases}0,& j=1,\\ 1,& j=2,\\ a_{j1}/a_{12},& j>2.\end{cases}$

The matrix P₂ differs from a unit matrix only in having a different first row, with

$p_{1j}^{(2)}=\begin{cases}1,& j=1,\\ 0,& j=2,\\ -a_{j2}/a_{12},& j>2.\end{cases}$

Multiplying A on the left by P₁′ does not affect the first two rows and the second column of A and makes zero all the elements of the first column of P₁′A save the first two. Multiplying P₁′A on the left by P₂′ does not affect the first two rows and the first column of P₁′A and makes zero all the elements of the second column of P₂′P₁′A save the first two. Multiplying P′A on the right by P does not affect the first two columns of P′A.
G. Suppose a matrix A has for some partition into blocks the structure

$A=\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix},\qquad(92.4)$

where A₁₁ and A₂₂ are square blocks. If P₂₂ is a nonsingular matrix whose size is that of A₂₂, then the matrix

$C=\begin{pmatrix}A_{11}&A_{12}P_{22}\\0&P_{22}'A_{22}P_{22}\end{pmatrix}$

is congruent with A. Moreover, C = P′AP, where

$P=\begin{pmatrix}E&0\\0&P_{22}\end{pmatrix}.$

A direct check of all the assertions made in the descriptions of the auxiliary steps presents no particular difficulty, and it is left for the reader as an exercise to show their validity.
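For instance, auxiliary step A is easy to state in code. The sketch below is an added illustration (the function name step_A and the test matrix are ours, not the book's): it builds the matrix P of (92.2) and checks that C = P′AP has zeros below the (1, 1) entry.

```python
import numpy as np

def step_A(A):
    """Auxiliary step A: assuming a11 != 0, build P (a unit matrix with a modified
    first row, formula (92.2)) so that C = P' A P has zeros below the (1, 1) entry."""
    n = A.shape[0]
    P = np.eye(n)
    P[0, 1:] = -A[1:, 0] / A[0, 0]      # p_1j = -a_j1 / a_11 for j > 1
    return P, P.T @ A @ P

A = np.array([[2., 1., 3.],
              [4., 1., 0.],
              [5., 2., 1.]])
P, C = step_A(A)
print(np.round(C, 10))                  # first column of C is (a11, 0, 0)'
```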
The method as a whole is carried out as follows. At the first basic step the matrix A is reduced to the form (92.4), where A₁₁ is a nonsingular 1 × 1 or 2 × 2 matrix. If a matrix A_k, k ≥ 1, is of the form (92.4), then at the next basic step the matrix in the lower right-hand corner is also reduced to the form (92.4) and a general congruence transformation is carried out according to step G. The matrix A_{k+1} can again be represented in the form (92.4), but the block in its upper left-hand corner will not only be nonsingular but will also have a greater size than for the matrix A_k. The process is repeated until at some step in the matrix As there appears in (92.4) a zero block in the lower right-hand corner or the size of the block in the upper left-hand corner is n × n. The resulting transformation matrix is a left-to-right product of the transformation matrices of all the steps.
The form of As depends on whether the matrix A is skew-symmetric or not. So does the composition of the basic steps and the auxiliary steps.
Whatever the structure of a basic step, its aim is to obtain the next
portion of zeros in the matrix to be transformed. If the original ma-
trix is not skew-symmetric, then zeros are always obtained using an
auxiliary step A, and steps B to C are necessary only for it to be pre-
pared. But if the original matrix is skew-symmetric, then zeros are
obtained using step F, and it is step D that is preparatory. We de-
scribe the basic step of the method also in terms of the transformation
of a matrix A and we begin with a nonskew-symmetric matrix A.
At the first basic step the matrix to be transformed is nonskew-
symmetric. If the element a₁₁ ≠ 0 and all off-diagonal elements of
the first column are zero, then nothing changes and we assume that
the basic step has been carried out. We take a unit matrix as a trans-
formation matrix P. In general, however, we carry out the first of
the auxiliary steps A to C that can be made. If this happens to be
step B or C, then after it we must carry out step A or both steps, B
and A. We take as a transformation matrix P a left-to-right product
of all transformation matrices of the actually made auxiliary steps.
As a result of the first basic step, in the transformed matrix A1 all
the off-diagonal elements of the first column will be zero, i.e. A1
will have a block structure of the form (92.4).
The difference of all the other steps from the first is due to the fact
that the matrix to be transformed may turn out to be skew-sym-
metric. If it is not, then the basic step to be made next does not differ
in anything from the first. If however, the matrix to be transformed
is skew-symmetric, then under any congruence transformation it
remains skew-symmetric and it is impossible to obtain a nonzero
element in the upper left-hand corner using this matrix alone. A way
out is based on transforming an extended lower diagonal block.
Until a skew-symmetric matrix is found, the block in the upper
left-hand corner of (92.4) for matrices A 11 will be a right triangular
matrix with nonzero diagonal elements. If the elements in positions
(1, 2) and (2, 1) of the skew-symmetric matrix are nonzero in the
lower right-hand corner, then for the matrix A_k which is the next to be transformed we change representation (92.4) by decreasing by
unity the size of the block in the upper left-hand corner. Now the
3 X 3 matrix in the upper left-hand corner of the new lower diagonal
block will have the form (92.3) and we can carry out an auxiliary
step E. After that it is possible to carry out three times in succession
step A. Indeed, as we have noted, making step A does not affect the
principal minors of the matrix. Hence in the given case, after car-
rying out step A, in the lower right-hand corner of the new matrix
the first two principal minors will be nonzero. It is clearly possible
therefore to make another step A. A similar reasoning shows that
step A can be carried out a third time. Having made a step "back"
we "ere enabled to move three steps "forward". If necessary, stPp D
is carried out before step E.
Thus, if A is not a skew-symmetric matrix, then the above method allows us to construct a nonsingular matrix P such that the congruent matrix P′AP will have the following structure:

$P'AP=\begin{pmatrix}M&N\\0&0\end{pmatrix}.\qquad(92.5)$
Here M is a right triangular matrix with nonzero diagonal elements


and the size of M is equal to the rank of A.
If A is a skew-symmetric matrix, then all the basic steps of the
method, including the first, are carried out according to the same
scheme. Suppose we have already obtained a matrix A_k of the form
(92.4) and there is a nonsingular block-diagonal matrix with skew-
symmetric 2 X 2 blocks in the upper left-hand corner. Since under
a congruence transformation a skew-symmetric matrix goes over into
a skew-symmetric matrix, the block A 12 in (92.4) is zero. We first
have to obtain nonzero elements in positions (1, 2) and (2, 1) of
the skew-symmetric matrix in the lower right-hand corner. It is
possible that to do this should require an auxiliary step D. We further
carry out step F, which adds to the diagonal another nonsingular skew-
symmetric 2 X 2 block, and proceed to the next basic step. Now too
the process is continued until at some step in the matrix As there
appears in representation (92.4) a zero block in the lower right-hand
corner or the size of the block in the upper left-hand corner is n X n.
So if A is a skew-symmetric matrix, then in this case the method
allows us to construct a nonsingular matrix P such that P' A P has
the following structure:

$P'AP=\begin{pmatrix}M&0\\0&0\end{pmatrix}.\qquad(92.6)$

Here M is a block-diagonal matrix with nonsingular skew-symmetric


2 X 2 blocks. The size of M equals the rank of A.
For a Hermitian congruence transformation the general scheme


of the method remains the same. The process, however, turns out
to be even simpler than for the ordinary congruence transformation
if the auxiliary step C is replaced by the following.
C′. In a matrix A all diagonal elements are zero but there are indices j, l, where j < l, such that among the elements a_jl and a_lj there is at least one nonzero element. There is a nonsingular matrix P such that for the matrix C = P′AP̄ at least one of the diagonal elements c_jj and c_ll is nonzero. That is, c_jj = a_jl + a_lj and c_ll = i(a_jl - a_lj). The matrix P differs from a unit matrix in two elements, p_lj = 1 and p_jl = i. Multiplying A on the right by P̄ adds to the jth column of A its lth column and to the lth column of A its jth column multiplied by -i. Multiplying AP̄ on the left by P′ adds to the jth row of AP̄ its lth row and to the lth row of AP̄ its jth row multiplied by i.
Now there is no need for steps D to F of the general method, as we shall never go beyond step C′. Moreover, formulas (92.2) remain unchanged.
Thus, if A is a nonzero matrix, then the method allows us to construct a nonsingular matrix P such that the matrix P′AP̄ Hermitian-congruent with A will have the following structure:

$P'A\bar P=\begin{pmatrix}M&N\\0&0\end{pmatrix}.\qquad(92.7)$
Here M is a right triangular matrix with nonzero diagonal elements.


The size of M equals the rank of A.
The forms of matrices (92.5) to (92. 7) are called canonical forms
for the operations of congruence transformation. A canonical basis
is also any basis in which the original matrix has such a form. Matrices
of the forms (92.5) and (92.7) are by themselves called right trapezoi-
dal matrices. Similarly defined are left trapezoidal matrices.
Note a number of interesting conclusions arising from the canoni-
cal forms of matrices. As we have already said, a congruence trans-
formation preserves the symmetry and skew-symmetry of matrices.
If one of these properties is a feature of the original matrix, it must
be inherited by the canonical form. In addition to what has been said
it can be concluded therefore that
A symmetric matrix is congruent with a diagonal matrix.
A Hermitian matrix is Hermitian-congruent with a real diagonal
matrix.
A skew-Hermitian matrix is Hermitian-congruent with a pure imag-
inary diagonal matrix.
In all these cases reduction to canonical form is effected particu-
larly simply, since there cannot arise a need to carry out even one
of the auxiliary steps D to F.
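The following rough numpy sketch shows how steps A to C combine to diagonalize a real symmetric matrix by congruence. It is only a simplified illustration of this special case (it omits the skew-symmetric branch, steps D to F, and any numerical safeguards); the function name and the test matrix are ours, not the book's.

```python
import numpy as np

def congruent_diagonalize(A, tol=1e-12):
    """Reduce a real symmetric A to a diagonal D = P' A P by congruence,
    using steps B, C and A of this section in a simplified form."""
    A = A.astype(float).copy()
    n = A.shape[0]
    P = np.eye(n)
    for k in range(n):
        if abs(A[k, k]) < tol:
            # step B: bring a nonzero diagonal element (if any) into position (k, k)
            for j in range(k + 1, n):
                if abs(A[j, j]) > tol:
                    T = np.eye(n); T[:, [k, j]] = T[:, [j, k]]
                    A = T.T @ A @ T; P = P @ T
                    break
        if abs(A[k, k]) < tol:
            # step C: remaining diagonal entries are zero; use an off-diagonal one
            for j in range(k + 1, n):
                if abs(A[k, j]) > tol:
                    T = np.eye(n); T[j, k] = 1.0       # adds column/row j to column/row k
                    A = T.T @ A @ T; P = P @ T
                    break
        if abs(A[k, k]) < tol:
            continue                                    # row k is already zero beyond the diagonal
        T = np.eye(n)
        T[k, k + 1:] = -A[k + 1:, k] / A[k, k]          # step A applied to the trailing block
        A = T.T @ A @ T; P = P @ T
    return P, A

A = np.array([[0., 1., 2.],
              [1., 0., 3.],
              [2., 3., 0.]])
P, D = congruent_diagonalize(A)
print(np.round(D, 10))                  # diagonal matrix
print(np.allclose(P.T @ A @ P, D))      # True
```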
For matrices of the canonical forms (92.5) and (92.6) we may perform yet another congruence transformation with a diagonal matrix and make the nonzero elements determining the nonsingularity of the block M equal either +1 or -1. Such a canonical form of a matrix and the corresponding basis are called normal. It is clear that multiplying on the right (left) by a diagonal matrix results in multiplying the columns (rows) by the diagonal elements of the transformation matrix. We again describe the transformation in terms of an auxiliary step with a matrix A.
H. A real nonskew-symmetric matrix A of rank r has the canonical form (92.5). There is a real diagonal matrix P such that the nonzero diagonal elements c_jj of the matrix C = P′AP are equal to sgn a_jj, with

$p_{jj}=\begin{cases}(a_{jj}\,\mathrm{sgn}\,a_{jj})^{-1/2},& j\le r,\\ 1,& j>r.\end{cases}$

A real (complex) skew-symmetric matrix A of rank r has the canonical form (92.6). There is a real (complex) diagonal matrix P such that the nonzero upper off-diagonal elements of the matrix C = P′AP equal +1 and the nonzero lower off-diagonal elements equal -1, with

$p_{jj}=\begin{cases}1,& j\ \text{odd},\\ a_{j-1,\,j}^{-1},& j\ \text{even}.\end{cases}$

A complex nonskew-symmetric matrix A of rank r has the canonical form (92.5). There is a complex diagonal matrix P such that the nonzero diagonal elements c_jj of C = P′AP equal 1, with

$p_{jj}=\begin{cases}a_{jj}^{-1/2},& j\le r,\\ 1,& j>r.\end{cases}$
A Hermitian congruence transformation with a diagonal matrix is
rarely employed, since it can change only the absolute values of
the elements determining the nonsingularity of the block M in (92. 7)
but cannot make the complex diagonal elements real.

Exercises
1. Prove that if reduction to canonical form using
a matrix P is carried out according to the above method, then det P = ±1.
2. What does the matrix equation

(92.8)

mean in terms of the canonical form?


3. To what form can a nonskew-symmetric matrix be reduced using congru-


ence transformation if the auxiliary step E is excluded?
4. What form would a transformation matrix P have if each basic step of
the above method consisted only of the auxiliary step A?
5. Prove that any right triangular matrix is congruent with a left triangular
matrix. What is the simplest form of the transformation matrix?
6. Prove that any nonsingular matrix of odd size is congruent with a nonsin-
gular right triangular matrix.
7. Let G be the matrix of a positive definite bilinear form. Prove that for its elements g_ij, for all i and j (i ≠ j in the second inequality),
g_ii > 0,  (g_ij + g_ji)² < 4g_ii g_jj.
8. Let G be the matrix of a negative definite bilinear form. Prove that for its elements g_ij, for all i and j (i ≠ j in the second inequality),
g_ii < 0,  (g_ij + g_ji)² < 4g_ii g_jj.
9. Prove that the matrices of all symmetric positive (negative) definite bilinear forms are congruent.
10. Prove that for a matrix G to be the matrix of an alternating bilinear form it is sufficient that there should be diagonal elements with opposite signs in it.

93. Congruence and matrix decompositions


The general method of congruence transformation of a matrix to canonical form does not always make it possible to predict the form the coordinate transformation matrix for a change to the canonical basis will have. Under some additional constraints on the original matrix, however, this question can be given a quite definite answer.
Suppose that in a matrix A all principal minors, except perhaps the highest-order minor, i.e. the determinant of A, are nonzero. We show that it is always possible to represent such a matrix as a product

A = LDU,    (93.1)

where L is a left triangular matrix with unit diagonal elements, D is a diagonal matrix and U is a right triangular matrix with unit diagonal elements, i.e.

$L=\begin{pmatrix}1&&&0\\ l_{21}&1&&\\ \vdots&&\ddots&\\ l_{n1}&l_{n2}&\cdots&1\end{pmatrix},\qquad D=\begin{pmatrix}d_{11}&&&0\\&d_{22}&&\\&&\ddots&\\0&&&d_{nn}\end{pmatrix},\qquad U=\begin{pmatrix}1&u_{12}&\cdots&u_{1n}\\&1&\cdots&u_{2n}\\&&\ddots&\vdots\\0&&&1\end{pmatrix}.$

Equating the elements of A and of the product LDU we get

$a_{ij}=\sum_{p=1}^{\min(i,\,j)}l_{ip}d_{pp}u_{pj},\qquad l_{ii}=u_{ii}=1.\qquad(93.2)$
Now we find from (93.2) successively all the unknown elements of the matrices of decomposition (93.1). Namely,

$d_{11}=a_{11},$
$u_{1j}=\frac{a_{1j}}{d_{11}},\qquad l_{j1}=\frac{a_{j1}}{d_{11}},\qquad j>1,$
$d_{ii}=a_{ii}-\sum_{p=1}^{i-1}l_{ip}d_{pp}u_{pi},\qquad i>1,$
$u_{ij}=\frac{a_{ij}-\sum_{p=1}^{i-1}l_{ip}d_{pp}u_{pj}}{d_{ii}},\qquad l_{ji}=\frac{a_{ji}-\sum_{p=1}^{i-1}l_{jp}d_{pp}u_{pi}}{d_{ii}},\qquad i>1,\ j>i.\qquad(93.3)$
We apply to (93.1) the Binet-Cauchy formula. Recall that among the minors of the left triangular matrix L in the first r rows only the principal minor is nonzero, and it is equal to unity. A similar assertion holds for the matrix U, with the rows replaced by columns, of course. Therefore

$A\begin{pmatrix}1&2&\cdots&r\\1&2&\cdots&r\end{pmatrix}=d_{11}d_{22}\cdots d_{rr}.$

Hence

$d_{ii}=\dfrac{A\begin{pmatrix}1&2&\cdots&i\\1&2&\cdots&i\end{pmatrix}}{A\begin{pmatrix}1&2&\cdots&i-1\\1&2&\cdots&i-1\end{pmatrix}}.\qquad(93.4)$

Under the assumption the principal minors of A are nonzero. Therefore so are all the diagonal elements d_ii in (93.4) except perhaps the last one.
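A direct transcription of the recurrences (93.3) into numpy might look as follows. This is an illustrative sketch assuming the required principal minors are nonzero; the function name and the test matrix are ours.

```python
import numpy as np

def ldu(A):
    """LDU decomposition (93.1) computed by the recurrences (93.3);
    assumes the pivots d_ii produced along the way are nonzero."""
    n = A.shape[0]
    L, D, U = np.eye(n), np.zeros((n, n)), np.eye(n)
    for i in range(n):
        D[i, i] = A[i, i] - L[i, :i] @ D[:i, :i] @ U[:i, i]
        for j in range(i + 1, n):
            U[i, j] = (A[i, j] - L[i, :i] @ D[:i, :i] @ U[:i, j]) / D[i, i]
            L[j, i] = (A[j, i] - L[j, :i] @ D[:i, :i] @ U[:i, i]) / D[i, i]
    return L, D, U

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
L, D, U = ldu(A)
print(np.allclose(L @ D @ U, A))        # True
print(np.diag(D))                       # ratios of principal minors, cf. (93.4)
```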
We shall fairly often deal with decompositions (93.1) for symmetric and Hermitian matrices. If again all the principal minors of A, except perhaps the last one, are nonzero, then a symmetric matrix can always be represented as a product

$A=S'DS\qquad(93.5)$

and a Hermitian matrix can be represented as a product

$A=S'D\bar S.\qquad(93.6)$
Here S is a right triangular matrix with unit diagonal elements and D is a diagonal matrix, i.e.

$S=\begin{pmatrix}1&s_{12}&\cdots&s_{1n}\\&1&\cdots&s_{2n}\\&&\ddots&\vdots\\0&&&1\end{pmatrix},\qquad D=\begin{pmatrix}d_{11}&&&0\\&d_{22}&&\\&&\ddots&\\0&&&d_{nn}\end{pmatrix}.$

Completely in accordance with (93.3) we now have

$s_{1j}=\frac{a_{1j}}{d_{11}},\qquad j>1,$
$d_{ii}=a_{ii}-\sum_{p=1}^{i-1}d_{pp}s_{pi}^2,\qquad i>1,\qquad(93.7)$
$s_{ij}=\frac{a_{ij}-\sum_{p=1}^{i-1}d_{pp}s_{pi}s_{pj}}{d_{ii}},\qquad j>i,$

for decomposition (93.5) and

$s_{1j}=\frac{\bar a_{1j}}{d_{11}},\qquad j>1,$
$d_{ii}=a_{ii}-\sum_{p=1}^{i-1}d_{pp}\,|s_{pi}|^2,\qquad i>1,$
$s_{ij}=\frac{\bar a_{ij}-\sum_{p=1}^{i-1}d_{pp}\bar s_{pi}s_{pj}}{d_{ii}},\qquad j>i,$

for decomposition (93.6). Formulas (93.4) remain valid.
Decompositions (93.1), (93.5) and (93.6) are extensively used to solve diverse problems of linear algebra. As applied to congruence transformations of a matrix these decompositions lead to the following relations:

$L^{-1}A(L^{-1})'=DU(L^{-1})',\qquad L^{-1}A(\bar L^{-1})'=DU(\bar L^{-1})',$
$(S^{-1})'AS^{-1}=D,\qquad (S^{-1})'A\bar S^{-1}=D.$

The matrices DU(L⁻¹)′ and DU(L̄⁻¹)′ are right triangular and the matrices D are diagonal; the only element on their principal diagonals that may be zero is the last one. Therefore we have again obtained the already familiar canonical forms of matrices under a congruence transformation. Now we can say, however, that the coordinate transformation matrices for a change to the canonical basis are right triangular, since so are the matrices (L⁻¹)′ and S⁻¹. The above decompositions themselves give the coordinate transformation matrices L′ and S for a change from the canonical basis to the original basis, which are also right triangular.
In the case of a symmetric matrix the described process of decomposition is closely related to the so-called Jacobi algorithm for transforming a quadratic form to canonical form. The only difference is that in the Jacobi algorithm we find the matrix S⁻¹ instead of S. Notice that S is much easier to find than S⁻¹.
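For the symmetric case, the recurrences (93.7) are equally short to code. The sketch below is an added illustration under the same assumption of nonzero principal minors (function name and test matrix are ours); it also checks the congruence relation (S⁻¹)′AS⁻¹ = D mentioned above.

```python
import numpy as np

def sds(A):
    """Symmetric decomposition A = S' D S of (93.5) via the recurrences (93.7);
    assumes the pivots d_i are nonzero."""
    n = A.shape[0]
    S, d = np.eye(n), np.zeros(n)
    for i in range(n):
        d[i] = A[i, i] - (d[:i] * S[:i, i] ** 2).sum()
        for j in range(i + 1, n):
            S[i, j] = (A[i, j] - (d[:i] * S[:i, i] * S[:i, j]).sum()) / d[i]
    return S, np.diag(d)

A = np.array([[ 4., 2., -2.],
              [ 2., 5.,  1.],
              [-2., 1.,  6.]])
S, D = sds(A)
print(np.allclose(S.T @ D @ S, A))          # True
# The congruence (S^{-1})' A S^{-1} = D exhibits the (here diagonal) canonical form;
# S itself is the coordinate transformation back to the original basis.
Sinv = np.linalg.inv(S)
print(np.allclose(Sinv.T @ A @ Sinv, D))    # True
```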
Congruence transformations with a right triangular matrix are
among the simplest, but still sufficiently general transformations to
be applied to a wide class of matrices. Of certain interest therefore is
a description of the class of matrices that can be reduced to canonical
form using transformations with a right triangular matrix.
Lemma 93.1. If a rectangular matrix A is representable in the block form

$A=\begin{pmatrix}B&Q\\R&T\end{pmatrix},\qquad(93.8)$

where B is a square nonsingular r × r matrix, then the rank of A is r if and only if

$T=RB^{-1}Q.\qquad(93.9)$

Proof. We multiply A on the left by the nonsingular block matrix

$V=\begin{pmatrix}E&0\\-RB^{-1}&E\end{pmatrix},$

where the corresponding blocks have the same size as in (93.8). Then

$VA=\begin{pmatrix}B&Q\\0&T-RB^{-1}Q\end{pmatrix}.$

The matrices A and VA have the same rank, and it is equal to r if and only if T - RB⁻¹Q = 0.
Now we can describe the desired class of matrices. It turns out to be closely related to matrices of the form (93.8) and (93.9).
Theorem 93.1. For a nonskew-symmetric matrix A to be reducible to canonical form using a congruence transformation with a right triangular matrix it is necessary and sufficient that the number of the first nonzero principal minors of A should equal its rank.
Proof. Necessity. Let a nonskew-symmetric matrix A be reducible to canonical form (92.5) using a right triangular matrix P. It is clear that the number of the first nonzero principal minors in A cannot be greater than the size of the block M. Applying the Binet-Cauchy formula and considering that there is no nonzero minor in
the first s columns of P, except the principal minor, we get

$A\begin{pmatrix}1&2&\cdots&s\\1&2&\cdots&s\end{pmatrix}\left\{P\begin{pmatrix}1&2&\cdots&s\\1&2&\cdots&s\end{pmatrix}\right\}^{2}=M\begin{pmatrix}1&2&\cdots&s\\1&2&\cdots&s\end{pmatrix}$

for every s not greater than the size of M. Since the principal minors of M and P are nonzero, the number of the first nonzero principal minors of A equals its rank.
Sufficiency. Suppose the number of the first nonzero principal minors of A and its rank both equal r. We represent A in the block form (93.8), where the size of the block B is r × r. Since all principal minors of the matrix B are nonzero, it follows from the foregoing that it can be represented as B = LDU similarly to (93.1). We construct the block matrix

$P=\begin{pmatrix}L^{-1\,\prime}&-B^{-1\,\prime}R'\\0&E\end{pmatrix}.$

A direct check shows that

$P'AP=\begin{pmatrix}DUL^{-1\,\prime}&L^{-1}Q-DUB^{-1\,\prime}R'\\0&0\end{pmatrix}.$

The matrix DUL⁻¹′ is nonsingular right triangular and P is nonsingular right triangular, and hence A is reducible in the required way to canonical form.
For a congruence transformation of a skew-symmetric matrix and a Hermitian congruence transformation of an arbitrary matrix the corresponding statements can be proved in a similar way and we shall restrict ourselves to their formulation.
Theorem 93.2. For a skew-symmetric matrix A of rank r to be reducible to canonical form using a congruence transformation with a right triangular matrix it is necessary and sufficient that the number of the first nonzero principal minors of even order of A should equal r/2.
Theorem 93.3. For a matrix A to be reducible to canonical form using a Hermitian congruence transformation with a right triangular matrix it is necessary and sufficient that the number of the first nonzero principal minors of A should equal its rank.
Congruence and Hermitian congruence transformations of a matrix are not in general similarity transformations. However, if for some class of matrices P one of the following groups of relations holds,

PP′ = P′P = E,  PP* = P*P = E,    (93.10)

then the congruence transformation becomes a similarity transformation, and in our studies we may use the earlier obtained results relating to the similarity of matrices. As we already know, the first group of relations in (93.10) is satisfied by real orthogonal matrices, and the second group of relations is satisfied by complex unitary matrices. Recalling the results of Sections 76 to 81 for orthogonal and unitary similarities we therefore conclude that the following statements are true:
Any real symmetric or skew-symmetric matrix can be reduced to canonical form using a congruence transformation with an orthogonal matrix.
Any complex matrix can be reduced to canonical form using a Hermitian congruence transformation with a unitary matrix.
These statements are mainly of theoretical interest, since in practice orthogonal and unitary transformation matrices are difficult to find, especially for n ≥ 5.

Exercises
1. Prove that if decompositions (93.1), (93.5) and (93.6) exist, then they are unique.
2. Prove that if all the minors of a matrix A in the lower right-hand corner (except perhaps the minor of the highest order) are nonzero, then there is a unique decomposition A = LDU, where L is a right triangular matrix, U is a left triangular matrix, each with unit diagonal elements, and D is a diagonal matrix.
3. Prove that for the elements d_ii of the matrix D of Exercise 2

$d_{ii}=\dfrac{A\begin{pmatrix}i,&i+1,&\dots,&n\\i,&i+1,&\dots,&n\end{pmatrix}}{A\begin{pmatrix}i+1,&i+2,&\dots,&n\\i+1,&i+2,&\dots,&n\end{pmatrix}},\qquad i<n.$

4. Into what triangular factors can a matrix be factored if its minors in the lower left-hand (upper right-hand) corner are nonzero?
5. Suppose for the elements a_ij of a matrix A

a_ij = 0 for j - i > k and for j - i < l,    (93.11)

given some numbers l < k. Such a matrix is called a band matrix. Prove that if for a band matrix A decomposition (93.1) holds, then

l_ij = 0 for j - i < l,  u_ij = 0 for j - i > k.

6. A matrix A is said to be tridiagonal if it satisfies conditions (93.11) for k = 1 and l = -1. What form have formulas (93.3) and (93.7) for a tridiagonal matrix?
7. A matrix A is said to be right (left) almost triangular if it satisfies conditions (93.11) for k = n and l = -1 (k = 1 and l = -n). What form have formulas (93.3) for almost triangular matrices?
8. What number of arithmetical operations is required for the various forms of matrices to obtain decompositions of the type (93.1)?
9. How are decompositions (93.1), (93.5) and (93.6) to be applied to solve systems of linear algebraic equations?

94. Symmetric bilinear forms


Discussing bilinear and quadratic forms we
have often paid particular attention both to symmetric bilinear forms
and to bilinear forms generating real quadratic forms. Only two
kinds of bilinear forms simultaneously satisfy both conditions, these


are the real symmetric and the Hermitian-symmetric bilinear form.
The matrices of these forms are in any basis a real symmetric or
a Hermitian matrix respectively. A congruence transformation
reduces both forms of matrices to a diagonal real normal form.
As we have seen, various congruence transformations can reduce
the same matrix to canonical form. In general therefore the canonical
form b not uniquely defined. The question naturally arises: What
do the different canonical forms to which the same matrix is reduc-
ible have in common? We know that the rank of a matrix does not
depend on transformation. Whatever the method of reducing to canon-
ical form is therefore, the number of the last zero rows will be the
same. Much more can be said concerning the real symmetric and the
Hermitian matrix. The canonical form of these matrices can be
characterized by the number of positive and negative terms it con-
tains. There is an important
Theorem 94.1 (the law of inertia for quadratic forms). The number of positive terms and that of negative terms in the canonical form of a real symmetric matrix under the ordinary congruence transformation and of a Hermitian matrix under a Hermitian congruence transformation do not depend on the method of reduction.
Proof. Let some matrix A satisfy the hypotheses of the theorem. Consider a quadratic form F with a matrix A of rank r in variables x₁, x₂, …, xn and suppose that two methods have been used to reduce it to the normal form

$F=y_1^2+y_2^2+\dots+y_k^2-y_{k+1}^2-y_{k+2}^2-\dots-y_r^2,$
$F=z_1^2+z_2^2+\dots+z_l^2-z_{l+1}^2-z_{l+2}^2-\dots-z_r^2.\qquad(94.1)$

Since the change from the variables x₁, x₂, …, xn to y₁, y₂, …, yn was effected using a nonsingular linear transformation, the new variables will be linearly expressible in terms of the old variables, with the determinant of the inverse transformation matrix nonzero. So

$y_i=\sum_{s=1}^{n}b_{is}x_s,\qquad \det\begin{pmatrix}b_{11}&\cdots&b_{1n}\\ \vdots&&\vdots\\ b_{n1}&\cdots&b_{nn}\end{pmatrix}\ne 0.\qquad(94.2)$

Similarly

$z_i=\sum_{s=1}^{n}c_{is}x_s,\qquad \det\begin{pmatrix}c_{11}&\cdots&c_{1n}\\ \vdots&&\vdots\\ c_{n1}&\cdots&c_{nn}\end{pmatrix}\ne 0.\qquad(94.3)$

Suppose that k < l and write the system of equations

$y_1=y_2=\dots=y_k=z_{l+1}=\dots=z_n=0.\qquad(94.4)$
If the left-hand sides of these equations are replaced by their expressions in (94.2) and (94.3), a system of n - l + k homogeneous linear equations in n unknowns x₁, x₂, …, xn is obtained. The number of equations in that system is smaller than that of unknowns, so the system has a nonzero real solution α₁, α₂, …, αn.
Now replace in (94.1) all the variables by their expressions in (94.2) and (94.3) and then substitute the numbers α₁, α₂, …, αn for x₁, x₂, …, xn. If for brevity we denote by y_i(α) and z_j(α) the values of the variables y_i and z_j after such a substitution, then taking into account (94.4) relation (94.1) becomes

$-y_{k+1}^2(\alpha)-\dots-y_r^2(\alpha)=z_1^2(\alpha)+\dots+z_l^2(\alpha).$

It follows that

$z_1(\alpha)=\dots=z_l(\alpha)=0.\qquad(94.5)$

On the other hand, according to the choice of the numbers α₁, α₂, …, αn we have

$z_{l+1}(\alpha)=\dots=z_r(\alpha)=\dots=z_n(\alpha)=0.\qquad(94.6)$

Thus the system of n homogeneous linear equations

z_i = 0,  i = 1, 2, …, n,

in n unknowns x₁, x₂, …, xn has, by (94.5) and (94.6), a nonzero solution α₁, α₂, …, αn, i.e. the determinant of the system must be zero. This contradicts (94.3). We arrive at a similar contradiction assuming l < k. Hence l = k and the theorem is proved.
Any real ordinary (Hermitian) quadratic form in a real (complc:\)
vector space has a unique real symmetric (complex Hermitian)
matrix in any basis. These matrices satisfy the hypotheses of Theo-
rem 94.1. Whatever the basis, the number of positive and negative
terms in the canonical form of a matrix is invariant for a quadratic
form and is called its positive and negative index of inertia respec-
tively. The difference between its positive and negative indices is
called the signature of the quadratic form. We can now formulate
some useful corollaries of Theorem 94.1.
Corollary. A quadratic form is positive (negative) definite if and only
if the positive (negative) index of inertia is equal to n.
Corollary. A quadratic form is of constant signs if and only if one
of the indices of inertia is zero.
The law of inertia permits some classification of real quadratic
forms to be given. We shall say that two quadratic forms are affinely
equivalent if for each of them we can choose a basis such that the
matrices of those quadratic forms are the same. In this case we shall
also say that a nonsingular transformation converts one quadratic
form into the other. It is easy to verify that the affine equivalence of
quadratic forms is an equivalence relation and that two quadratic forms are equivalent if and only if their matrices are congruent in the same basis. It follows from the law of inertia, therefore, that all real quadratic forms in a vector space Kn can be grouped into nonoverlapping classes, each consisting only of affinely equivalent quadratic forms. A class is characterized by its rank and signature. This grouping is called an affine classification of real quadratic forms. For any given rank r of quadratic forms the classification always
has two "extreme" classes, the classes with signatures +r and -r.
The first class comprises all nonnegative quadratic forms of rank r,
the second comprises all nonpositive quadratic forms of rank r.
Both classes taken together contain all rank-r quadratic forms of
constant signs and only these forms.
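Since a class of the affine classification is determined by rank and signature alone, both invariants are easy to compute for a concrete matrix. The sketch below is only an illustration (the function name and the tolerance tol are our own choices): by the law of inertia the counts of positive and negative eigenvalues of a real symmetric matrix coincide with the counts obtained from any other congruence reduction, so they may be read off an orthogonal diagonalization.

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (n_plus, n_minus, n_zero) for a real symmetric matrix A.

    By the law of inertia these counts do not depend on the particular
    congruence transformation used, so we read them off the eigenvalues
    (an orthogonal congruence reduction).
    """
    w = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
    n_plus = int(np.sum(w > tol))
    n_minus = int(np.sum(w < -tol))
    return n_plus, n_minus, len(w) - n_plus - n_minus

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, -1.0]])
p, m, z = inertia(A)
print("rank =", p + m, " signature =", p - m)   # rank 3, signature 1
```

Two real quadratic forms are then affinely equivalent exactly when this pair (rank, signature) is the same for both.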
In general the constancy of signs of a quadratic form is easy to
establish by reducing the form to canonical form in one of the ways
described above. Of considerable interest in some cases, however, are
also direct criteria of the constancy of signs. Taking into account
the great significance of these very quadratic forms we shall carry
out additional studies for them, confining our discussion mainly to
quadratic forms in a real space. We shall again assume that the
matrix of a quadratic form is real symmetric. For the case of a com-
plex space the results of the studies will be the same and the proofs
differ in minor details.
Theorem 94.2 (Sylvester's criterion). For a quadratic form to be
positive definite it is necessary and sufficient that all principal minors
of the matrix of that form should be positive.
Proof. Necessity. Let a quadratic form with a matrix A be positive definite. Then there is a nonsingular transformation with a matrix P reducing the form to a sum of squares. According to (91.9) this means that E = P'AP, or A = (P^{-1})'P^{-1}. Using the Binet-Cauchy formula we find

A(1 2 ... s; 1 2 ... s) = Σ [P^{-1}(k_1 ... k_s; 1 ... s)]^2,

the sum being taken over all 1 ≤ k_1 < k_2 < ... < k_s ≤ n. Since P is nonsingular, the first s columns of P^{-1} contain at least one nonzero minor of order s. Hence for every s the right-hand side of the equation obtained is positive.
Sufficiency. Suppose now that all principal minors of the matrix A of some quadratic form are positive. We reduce that form to canonical form using the transformation defined by formulas (93.7). Under the hypotheses of the theorem and according to formulas (93.4) all the coefficients of the canonical form will be positive, i.e. the quadratic form is positive definite.
Corollary. For a quadratic form to be negative definite it is necessary and sufficient that all principal minors of odd order should be negative and all principal minors of even order should be positive.
The proof follows from Sylvester's criterion and from the fact that if A is the matrix of a negative definite quadratic form, then -A is the matrix of a positive definite quadratic form.
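For a concrete symmetric matrix the conditions of Theorem 94.2 and its corollary can be checked directly. The following sketch is illustrative only (the function names are ours, and "principal minors" is used here in the sense of the text, i.e. the upper-left corner minors):

```python
import numpy as np

def principal_minors(A):
    """Determinants of the upper-left s-by-s corners of A, s = 1, ..., n."""
    return [np.linalg.det(A[:s, :s]) for s in range(1, A.shape[0] + 1)]

def is_positive_definite(A):
    """Sylvester's criterion: all corner minors are positive."""
    return all(m > 0 for m in principal_minors(A))

def is_negative_definite(A):
    """Corollary: odd-order minors negative, even-order minors positive."""
    return all((m < 0) if s % 2 == 1 else (m > 0)
               for s, m in enumerate(principal_minors(A), start=1))

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(principal_minors(A))         # approximately [2.0, 3.0, 4.0]
print(is_positive_definite(A))     # True
print(is_negative_definite(-A))    # True
```

Computing the minors by determinants is convenient for small examples; for larger matrices the triangular reduction described in Section 93 yields the same information as a by-product of formulas (93.4).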
Theorem 94.3 (Jacobi's criterion). For a quadratic form to be positive definite it is necessary and sufficient that all coefficients of the characteristic polynomial of the matrix of the form should be nonzero and have alternating signs.
Proof. Necessity. As already noted, a transformation of variables with an orthogonal matrix can reduce a given quadratic form to canonical form, whose coefficients are the eigenvalues λ_1, λ_2, ..., λ_n of the matrix of the form. Under the hypotheses of the theorem the eigenvalues must be positive. The characteristic polynomial f(λ) equals

f(λ) = (λ - λ_1)(λ - λ_2) ... (λ - λ_n) = λ^n + a_{n-1}λ^{n-1} + ... + a_1λ + a_0,

and all of its coefficients are nonzero and have alternating signs, which is immediate from Vieta's formulas for the coefficients a_i.
Sufficiency. Let the coefficients of the characteristic polynomial be nonzero and have alternating signs. The roots of this polynomial are real, being the eigenvalues of a symmetric matrix, and it remains to show that they are positive. Suppose this statement has been proved for all polynomials of degree n - 1. Since all the coefficients of f'(λ) are nonzero and have alternating signs, by the induction assumption f'(λ) has n - 1 positive roots. It is known from mathematical analysis that if a polynomial has only real roots, then they are separated by the roots of its derivative. Therefore f(λ) has at least n - 1 positive roots. The last root will also be positive since the product of the roots is positive.
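Theorem 94.3 also lends itself to a direct numerical check. In the sketch below (illustrative; the helper name and tolerance are our own) the coefficients of the characteristic polynomial are obtained with numpy.poly, which returns them from the leading power down, exactly in the order used in the proof above.

```python
import numpy as np

def jacobi_positive_definite(A, tol=1e-10):
    """Theorem 94.3: all coefficients of the characteristic polynomial
    of A are nonzero and have alternating signs."""
    coeffs = np.poly(A)                  # [1, a_{n-1}, ..., a_1, a_0]
    if np.any(np.abs(coeffs) < tol):
        return False
    signs = np.sign(coeffs)
    return bool(np.all(signs[:-1] * signs[1:] < 0))   # consecutive signs differ

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
print(np.poly(A))                        # approximately [1, -4, 3]
print(jacobi_positive_definite(A))       # True
print(jacobi_positive_definite(-A))      # False
```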
Criteria for nonnegative and nonpositive quadratic forms are
much more complicated, and this is mainly due to the fact that in
these cases the matrices of the forms are singular. One of the main
ways of investigating the constancy of signs of a quadratic form
involves reducing its matrix to a symmetric form (93.8) and (93.9)
and studying that form. Matrices of constant signs being closely re-
lated, we restrict our consideration to nonnegative matrices only.
We shall say that a matrix H is a permutation matrix if in each of
its rows and in each of its columns there is only one nonzero element
and all the nonzero elements equal unity. It is clear that multiply-
ing an arbitrary matrix A on the right by a permutation matrix H
interchanges the columns in A and multiplying on the left interchanges the rows.
Lemma 94.1. For an arbitrary nonsingular matrix A there is a permutation matrix H such that in the matrix AH all principal minors are nonzero.
Proof. The matrix A is nonsingular. Hence there is at least one nonzero element in its first row. Interchanging an appropriate column with the first column makes the first-order principal minor nonzero. Suppose that by interchanging the columns we have made all principal minors up to the kth order nonzero. If interchanging the last n - k columns cannot yield a nonzero principal minor of order k + 1, this means that in the first k + 1 rows of A there is no nonzero minor of order k + 1, i.e. that A must be singular. This contradiction proves the lemma.
Theorem 94.4. For a quadratic form of rank r with a matrix A to be nonnegative it is necessary and sufficient that there should be a permutation matrix H such that in the matrix H'AH the first r principal minors are positive.
Proof. Necessity. Let a quadratic form of rank r with a matrix A be nonnegative. Then there is a nonsingular matrix P such that A = (P^{-1})' E_r P^{-1}, where E_r is a diagonal matrix whose first r elements equal unity and whose other elements are zero. According to Lemma 94.1 there is a permutation matrix H such that all principal minors of the matrix P^{-1}H are nonzero.
Using the Binet-Cauchy formula we find for 1 ≤ s ≤ r

(H'AH)(1 2 ... s; 1 2 ... s) = Σ [(P^{-1}H)(k_1 ... k_s; 1 ... s)]^2 ≥ [(P^{-1}H)(1 2 ... s; 1 2 ... s)]^2 > 0,

the sum being taken over all 1 ≤ k_1 < ... < k_s ≤ r.
Sufficiency. Suppose that for a quadratic form of rank r with a matrix A there is a permutation matrix H such that in the matrix H'AH the first r principal minors are positive. By Theorem 93.1 H'AH can be reduced to canonical form using a transformation with a triangular matrix. According to (93.4) the nonzero coefficients of the canonical form of H'AH and hence of A are positive, i.e. the quadratic form is nonnegative.

As to quadratic forms that are not of constant signs, there are no


theorems closely similar to Theorems 94.1 and 94.4 for them. There
is only
Theorem 94.5. If a quadratic form has a symmetric matrix A of the form (93.8) and (93.9), then its indices of inertia coincide with those of the "truncated" quadratic form defined by the matrix B of (93.8).
Proof. By Theorem 93.1 the matrix A can be reduced to canonical
form using a transformation with a right triangular matrix, with
relations (93.4) holding for the nonzero coefficients of the canonical
form. But the matrix B of the "truncated" quadratic form also sat-
isfies the hypotheses of Theorem 93.1 and for the coefficients of its
canonical form we again have relations (93.4). Therefore the indices
of inertia of the quadratic forms defined by A and B coincide.
The particular interest we have shown for quadratic forms of con-
stant signs is accounted for by their vast area of application. One
of the major applications is the introduction of a metric in a vector
space. Any bilinear form polar to some positive definite quadratic
form may be regarded as a scalar product and hence we can turn a vec-
tor space into a Euclidean or a unitary space, using it. The validity
of the axioms for these spaces is obvious. Of no less importance for
introducing a metric, especially a metric on subspaces, are also non-
negative forms. As an example of using the constancy of signs we
prove the validity of
Theorem 94.6. For a nonsingular Hermitian bilinear form to be reducible to diagonal form it is sufficient that its symmetric (or skew-symmetric) part should be strictly of constant signs.
Proof. Consider the case of a positive definite symmetric part. Let A be the matrix of the bilinear form. Then the matrix (1/2)(A + A*) will be the matrix of the symmetric part. Since the symmetric part is positive definite, the matrix (1/2)(A + A*) is Hermitian-congruent with a unit matrix. Hence there is a nonsingular matrix S such that

(1/2) S'(A + A*)S = E.   (94.7)

We show that the matrix S'AS is normal. From (94.7) we have

SS' = 2(A + A*)^{-1}.

Therefore

(S'AS)(S'AS)* - (S'AS)*(S'AS) = S'(A SS' A* - A* SS' A)S
   = 2S'(A(A + A*)^{-1}A* - A*(A + A*)^{-1}A)S
   = 2S'((A*^{-1}(A + A*)A^{-1})^{-1} - (A^{-1}(A + A*)A*^{-1})^{-1})S
   = 2S'((A*^{-1} + A^{-1})^{-1} - (A*^{-1} + A^{-1})^{-1})S = 0.

By virtue of normality S'AS is reducible to diagonal form using a Hermitian congruence transformation with a unitary matrix. Thus the matrix A of a Hermitian bilinear form is Hermitian-congruent with a diagonal matrix, which was to be shown. The proofs of the other cases are similar.

Exercises
1. Prove that if all principal minors of a real symmetric or a complex Hermitian matrix are nonzero, then the number of its positive and negative eigenvalues coincides respectively with that of positive and negative terms of sequence (93.4).
2. Prove that if a matrix is positive definite, then any diagonal minor is positive.
3. Prove that a symmetric matrix of rank r always has at least one diagonal minor of order r not equal to zero.
4. Prove that the maximum element of a positive definite matrix is on the principal diagonal.
5. Prove that a real symmetric matrix A is positive definite if for every i

a_{ii} > Σ_{j ≠ i} |a_{ij}|.

6. Prove that for any symmetric matrix A of rank r there is a permutation matrix H such that among the first r principal minors of the matrix H'AH there are no two adjacent zero minors and the minor of order r is nonzero.
7. Prove that the matrix H'AH of Exercise 6 can be represented as H'AH = S'DS, where S is a right triangular matrix with unit diagonal elements and D is a block-diagonal matrix with 1 × 1 and 2 × 2 blocks.
8. Prove that any nonnegative matrix of rank r can be represented as a sum of r nonnegative matrices of rank 1.
9. Let A and B be positive definite matrices with elements a_{ij} and b_{ij}. Prove that a matrix C with elements c_{ij} = a_{ij}b_{ij} is also positive definite.

95. Second-degree hypersurfaces

Closely related to the study of real quadratic forms is another study, that concerned with second-degree hypersurfaces. Wishing to stress the geometrical character of many of the properties of hypersurfaces, in what follows we shall nearly always call vectors points of a space Rn.
A second-degree hypersurface f in Rn is a set of points whose coordinates x_1, x_2, ..., x_n satisfy the equation

Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} x_i x_j - 2 Σ_{k=1}^{n} b_k x_k + c = 0,   (95.1)

where a_{ij}, b_k and c are real numbers.
We simplify the notation. As in the case of quadratic forms, we shall assume that the matrix A with coefficients a_{ij} is symmetric. By b we denote a vector with coordinates b_1, b_2, ..., b_n. We introduce

in Rn a scalar product as a sum of pairwise products of coordinates. We can now regard a second-degree hypersurface f in Rn as a set of points x of a Euclidean space Rn satisfying the equation

(Ax, x) - 2(b, x) + c = 0   (95.2)

or, A being symmetric, the equation

(x, Ax) - 2(b, x) + c = 0.

To begin with, we consider relative positions of second-degree hypersurfaces and straight lines. Take a straight line in Rn. Let it pass through a point x_0 and have a direction vector l. Points x of that straight line are defined by

x = x_0 + lt   (95.3)

for all possible real numbers t. Substituting the expression for x in (95.2) we get

t^2(Al, l) - 2t((b, l) - (Al, x_0)) + (Ax_0, x_0) - 2(b, x_0) + c = 0.   (95.4)

Thus the points of intersection of the straight line (95.3) with hypersurface (95.2) are given by the roots of the quadratic equation (95.4).
We shall say that the straight line (95.3) with direction vector l has a nonasymptotic (asymptotic) direction relative to hypersurface (95.2) if (Al, l) ≠ 0 ((Al, l) = 0).
Consider any straight line with a nonasymptotic direction l intersecting a hypersurface. The points of intersection determine on every such line a segment, which by analogy with elementary geometry will be called a chord. We denote by L the set of the midpoints of all chords. If the ends of a chord collapse at a point, then that point will be regarded also as the midpoint of the chord. We show that L is in some hyperplane.
The ends of any chord are given by the values of a parameter t coinciding with the roots of (95.4). Therefore the midpoint of a chord is given by the value of t equal to a half-sum of the roots. According to Vieta's formulas this yields

t = ((b, l) - (Al, x_0)) / (Al, l).   (95.5)

If z_0 is the midpoint of a chord, then

z_0 = x_0 + ((b, l) - (Al, x_0)) / (Al, l) · l.

Now we have

(Al, z_0) = (Al, x_0 + ((b, l) - (Al, x_0)) / (Al, l) · l)
   = (Al, x_0) + ((b, l) - (Al, x_0)) / (Al, l) · (Al, l) = (b, l).

So the midpoints of all chords satisfy the equation

(Al, x) = (b, l).   (95.6)

Since its right-hand side is independent of x_0, according to (46.8) this equation defines a hyperplane whose normal vector is equal to Al. Hyperplane (95.6) is called a diametrical hyperplane conjugate to a direction l relative to hypersurface (95.2).
The explicit form of the equation of a diametrical hyperplane allows us to establish a number of important properties of second-degree hypersurfaces. Let A be a nonsingular matrix. Then for any linearly independent vectors l_1, l_2, ..., l_n so are the vectors Al_1, Al_2, ..., Al_n. Suppose further that all directions l_1, l_2, ..., l_n are nonasymptotic. This will clearly hold, for example, when the quadratic form (Ax, x) is positive definite. Hence it is possible to construct a system of n diametrical hyperplanes conjugate to l_1, l_2, ..., l_n. The hyperplanes will have in common a unique point x*. From (95.6) we now get

(Ax* - b, l_i) = 0

for i = 1, 2, ..., n. By virtue of the linear independence of the vectors l_i, this means that Ax* - b = 0, i.e. that the point x* is nothing but a solution of the system of linear algebraic equations

Ax = b.   (95.7)

The solution of the system with a nonsingular matrix is unique, so the constructed point x* is in fact independent of the choice of the vectors l_1, l_2, ..., l_n.
Simple calculations show that for any point x*

(Ax, x) - 2(b, x) + c = (A(x - x*), x - x*) + 2(Ax* - b, x - x*) + (Ax*, x*) - 2(b, x*) + c.   (95.8)

If, however, x* is a solution of system (95.7), then with respect to such a point hypersurface (95.2) has an important symmetry property. That is, for any x the left-hand side of (95.2) assumes the same values at the points

x = x* + (x - x*),   x' = x* - (x - x*).   (95.9)

It follows in particular that both points x and x' are or are not simultaneously on hypersurface (95.2). The equation

x* = (1/2)(x + x')

allows us to call the point x* the centre of symmetry of the hypersurface. If on (95.2) there is at least one point of Rn, then the centre of symmetry is said to be real. Otherwise it is called imaginary.

Now let x* be the centre of symmetry, i.e. for any x the left-hand side of (95.2) assumes the same values at the points x and x'. Hence

(Ax, x) - 2(b, x) + c = (Ax', x') - 2(b, x') + c.

According to (95.8) and (95.9) this is possible only if for any x

(Ax* - b, x - x*) ≡ 0.

But the last identity holds if and only if Ax* - b = 0, i.e. if x* is a solution of system (95.7). Notice that here we have never assumed either the nonsingularity of the matrix A or the presence of any other of its properties besides symmetry. Therefore:
For a system Ax = b to have a solution it is necessary and sufficient that hypersurface (95.2) should have a centre of symmetry. The set of all solutions coincides with that of all centres of symmetry.
Thus there emerges a far-reaching connection between systems of linear algebraic equations and second-degree hypersurfaces. It is widely used in constructing a host of computational algorithms. Construction of a system of diametrical hyperplanes is central to a large group of methods among the so-called methods of conjugate directions. These will be discussed in the last chapter.
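The connection is easy to observe numerically. In the sketch below (an illustration with names and data of our own choosing, not an algorithm from the text) the centre of symmetry of the hypersurface (Ax, x) - 2(b, x) + c = 0 is found by solving Ax = b, and the symmetry property expressed by (95.9) is then verified for a few random displacements.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4
A = rng.standard_normal((n, n))
A = A + A.T                        # symmetric matrix of the hypersurface
b = rng.standard_normal(n)
c = 1.5

def F(x):
    """Left-hand side of (95.2)."""
    return x @ A @ x - 2 * b @ x + c

x_star = np.linalg.solve(A, b)     # centre of symmetry (A assumed nonsingular)

for _ in range(3):
    d = rng.standard_normal(n)
    # F takes equal values at the symmetric points x* + d and x* - d.
    print(np.isclose(F(x_star + d), F(x_star - d)))
```

Identity (95.8) also shows that for a positive definite A the point x* minimizes (Ax, x) - 2(b, x), which is the observation behind the methods of conjugate directions mentioned above (see Exercise 1 at the end of this section).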
In general, investigation of second-degree hypersurfaces may be based on reducing them to canonical form in much the same way as quadratic forms are. But in addition to linear nonsingular transformations of variables, translations are required.
Consider any transformation of variables x = Py reducing the quadratic form (Ax, x) to normal form. In variables y_1, y_2, ..., y_n the equation of a hypersurface will have the form

y_1^2 + ... + y_k^2 - y_{k+1}^2 - ... - y_r^2 - 2d_1y_1 - ... - 2d_ry_r - 2d_{r+1}y_{r+1} - ... - 2d_ny_n + c = 0.

Now we translate the variables by the formulas

z_i = y_i - d_i,   1 ≤ i ≤ k,
z_i = y_i + d_i,   k + 1 ≤ i ≤ r,
z_i = y_i,         r + 1 ≤ i ≤ n.

In these variables the equation is as follows:

z_1^2 + ... + z_k^2 - z_{k+1}^2 - ... - z_r^2 - 2d_{r+1}z_{r+1} - ... - 2d_nz_n + p = 0.

Let one of the numbers d_{r+1}, ..., d_n, for example d_n, be other than 0. Set

v_i = z_i,   i < n,
v_n = d_{r+1}z_{r+1} + ... + d_nz_n,

and then make another translation

u_i = v_i,   i < n,
u_n = v_n - p/2.

Now the equation of a hypersurface is as follows:

±u_1^2 ± u_2^2 ± ... ± u_r^2 - 2u_n = 0,   1 ≤ r ≤ n - 1.   (95.10)

If there is no nonzero number among d_{r+1}, ..., d_n, p, then the equation of a hypersurface assumes the form

±u_1^2 ± u_2^2 ± ... ± u_r^2 = 0,   1 ≤ r ≤ n.   (95.11)

And finally if d_{r+1}, ..., d_n are zero but p ≠ 0, then putting u_i = z_i/|p|^{1/2} for every i we obtain one more form of the equation of a hypersurface, namely

±u_1^2 ± u_2^2 ± ... ± u_r^2 = 1,   1 ≤ r ≤ n.   (95.12)
Because of the law of inertia for quadratic forms, surfaces given by different equations of the form (95.10) to (95.12) cannot be converted into one another using a linear transformation of variables and a translation. As different one should regard equations that cannot be converted into one another by multiplying by (-1) and changing the indices of the coordinates. As in the case of quadratic forms, we have again obtained a subdivision of all second-degree hypersurfaces into nonoverlapping classes.
Not infrequently reduction of second-degree hypersurfaces to canonical form uses only operations of translation and linear transformations of variables with orthogonal matrices. This is mainly due to both types of the transformations leaving unchanged the distances between points. In this case the canonical forms will be somewhat different, although on the whole they are obtained in the same way as those considered above. For example, in the case of R_2 a second-degree hypersurface can be reduced only to one of the following forms:

I.   λ_1x^2 + λ_2y^2 + a_0 = 0,
II.  λ_2y^2 + b_0x = 0,   (95.13)
III. λ_1x^2 + a_0 = 0;

in the case of R_3 it can be reduced to one of the following forms:

I.   λ_1x^2 + λ_2y^2 + λ_3z^2 + a_0 = 0,
II.  λ_1x^2 + λ_2y^2 + b_0z = 0,
III. λ_1x^2 + λ_2y^2 + a_0 = 0,   (95.14)
IV.  λ_1y^2 + b_0x = 0,
V.   λ_1x^2 + a_0 = 0.

In all equations (95.13) and (95.14) the coefficients of the variables written out are nonzero. The free term may be zero. According to the well-established terminology, hypersurfaces in R_2 will be called second-degree curves and those in R_3 second-degree surfaces. Considering the interests of many branches of mathematics we shall study in more detail second-degree curves and surfaces using their canonical forms (95.13) and (95.14).

Exercises
1. Let A be a positive definite matrix. Prove that on the solution of the system Ax = b the expression (Ax, x) - 2(b, x) reaches its minimum.
2. Let A be a positive definite matrix. Prove that on the straight line (95.3) the expression (Ax, x) - 2(b, x) reaches its minimum for the value of t in (95.5).
3. Prove that for any direction to be nonasymptotic for hypersurface (95.2) it is necessary and sufficient that the quadratic form (Ax, x) should be either positive definite or negative definite.
4. What symmetry property has a diametrical hyperplane conjugate to a direction l if l is an eigenvector of a matrix A corresponding to a nonzero eigenvalue?
5. Prove that the system Ax = b has no solution if and only if hypersurface (95.2) can be reduced to canonical form (95.10).

96. Second-degree curves


We shall study second-degree curves using equations (95.13). Let the equation of a curve be of the form

λ_1x^2 + λ_2y^2 + a_0 = 0.   (96.1)

I.1. The number a_0 is not zero; the numbers λ_1 and λ_2 are opposite in sign to a_0. We write (96.1) as

x^2/(-a_0/λ_1) + y^2/(-a_0/λ_2) = 1

and set

a = (-a_0/λ_1)^{1/2},   b = (-a_0/λ_2)^{1/2}.   (96.2)

Under the hypothesis a and b are real numbers, so (96.1) is equivalent to

x^2/a^2 + y^2/b^2 = 1.   (96.3)

The curve described by this equation is called an ellipse (Fig. 96.1) and the equation is called the canonical equation of an ellipse. We show some properties of the ellipse. An ellipse is a bounded curve. It follows from equation (96.3) that for all points of an ellipse we

have |x| ≤ a and |y| ≤ b. An ellipse has two axes of symmetry, the x axis and the y axis, and a centre of symmetry, the origin. This follows from the fact that apart from a point with coordinates (x, y) an ellipse contains the points with coordinates (x, -y), (-x, y) and (-x, -y). The axes of symmetry are called the principal axes of the ellipse and the centre of symmetry is the centre of the ellipse. If a > b, then the x axis is called the major axis of the ellipse and the y axis is the minor axis. The points of intersection of the principal axes of the ellipse with the ellipse itself are called the vertices of the ellipse. When a = b, the ellipse is a circle of radius a with centre at the origin. Suppose for definiteness that a > b and let

c^2 = a^2 - b^2.   (96.4)

The points F_1 and F_2 with coordinates (-c, 0) and (+c, 0) are called the foci of the ellipse.
Theorem 96.1. The sum of the distances from any point of an ellipse to its foci is a constant value equal to 2a.
Proof. For any point M(x, y) of an ellipse

y^2 = b^2 - b^2x^2/a^2.

Therefore

ρ(M, F_2) = ((x - c)^2 + y^2)^{1/2} = (x^2 - 2xc + c^2 + b^2 - b^2x^2/a^2)^{1/2}
   = (x^2(1 - b^2/a^2) - 2xc + a^2)^{1/2} = (x^2c^2/a^2 - 2xc + a^2)^{1/2}
   = ((-(c/a)x + a)^2)^{1/2} = -(c/a)x + a.

The last equation holds, since -cx/a + a > 0 in view of the fact that |x| ≤ a and c/a < 1. Further

ρ(M, F_1) = ((x + c)^2 + y^2)^{1/2} = (x^2 + 2xc + c^2 + b^2 - b^2x^2/a^2)^{1/2}
   = (x^2(1 - b^2/a^2) + 2xc + a^2)^{1/2} = (x^2c^2/a^2 + 2xc + a^2)^{1/2}
   = (((c/a)x + a)^2)^{1/2} = (c/a)x + a.

Finally we have

ρ(M, F_1) + ρ(M, F_2) = (c/a)x + a - (c/a)x + a = 2a.
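The focal property of Theorem 96.1 is easy to confirm numerically. The sketch below is only an illustration (the semiaxes and the parametrization are our own choices): it samples points of the ellipse x^2/a^2 + y^2/b^2 = 1 and checks that the sum of the distances to the foci (±c, 0), c^2 = a^2 - b^2, stays equal to 2a.

```python
import numpy as np

a, b = 5.0, 3.0
c = np.sqrt(a**2 - b**2)                  # (96.4)
F1, F2 = np.array([-c, 0.0]), np.array([c, 0.0])

t = np.linspace(0.0, 2 * np.pi, 7)
points = np.stack([a * np.cos(t), b * np.sin(t)], axis=1)   # points of the ellipse

for M in points:
    s = np.linalg.norm(M - F1) + np.linalg.norm(M - F2)
    print(np.isclose(s, 2 * a))           # True for every sampled point
```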

I.2. The number a_0 is not zero; the numbers λ_1, λ_2 and a_0 have the same sign. Let

a = (a_0/λ_1)^{1/2},   b = (a_0/λ_2)^{1/2}.   (96.5)

Then (96.1) is equivalent to

x^2/a^2 + y^2/b^2 = -1.   (96.6)

It is clear that there is no point in the plane satisfying (96.6). It is usual to say that (96.6) is the equation of an imaginary ellipse.
I.3. The number a_0 is zero; the numbers λ_1 and λ_2 have the same sign. Let

a = 1/|λ_1|^{1/2},   b = 1/|λ_2|^{1/2}.

Then (96.1) is equivalent to

x^2/a^2 + y^2/b^2 = 0.   (96.7)

It is clear that only the origin satisfies (96.7). It is usual to say that (96.7) is the equation of a degenerate ellipse.
I.4. The number a_0 is not zero; the numbers λ_1 and λ_2 are opposite in sign. Introducing new coefficients similar to (96.2) and (96.5) we reduce (96.1) to an equivalent equation (up to a designation of the variables)

x^2/a^2 - y^2/b^2 = 1.   (96.8)

The curve described by this equation is called a hyperbola (Fig. 96.2) and the equation is called the canonical equation of a hyperbola. Unlike the ellipse, the hyperbola is an unbounded curve. As in the ellipse, the axes of symmetry of the hyperbola are the coordinate axes and its centre of symmetry is the origin. The axes of symmetry are the principal axes of the hyperbola and the centre of symmetry is the centre of the hyperbola. One of the principal axes (the x axis) intersects the hyperbola at two points, which are called the vertices of the hyperbola. It is called the transverse (or real) axis of the hyperbola. The other axis (the y axis) has no points in common with the hyperbola and is therefore called the imaginary (or conjugate) axis of the hyperbola. Let

c^2 = a^2 + b^2.

The points F_1 and F_2 with coordinates (-c, 0) and (+c, 0) are called the foci of the hyperbola.
Theorem 96.2. The absolute value of the difference of the distances from any point of a hyperbola to its foci is a constant value equal to 2a.
Proof. For any point M(x, y) of a hyperbola

y^2 = -b^2 + b^2x^2/a^2.

For the same point

ρ(M, F_2) = ((x - c)^2 + y^2)^{1/2} = (x^2 - 2xc + c^2 - b^2 + b^2x^2/a^2)^{1/2}
   = (x^2(1 + b^2/a^2) - 2xc + a^2)^{1/2} = (x^2c^2/a^2 - 2xc + a^2)^{1/2}
   = (((c/a)x - a)^2)^{1/2} = |(c/a)x - a|.

Further

ρ(M, F_1) = ((x + c)^2 + y^2)^{1/2} = (x^2 + 2xc + c^2 - b^2 + b^2x^2/a^2)^{1/2}
   = (x^2(1 + b^2/a^2) + 2xc + a^2)^{1/2} = (x^2c^2/a^2 + 2xc + a^2)^{1/2}
   = (((c/a)x + a)^2)^{1/2} = |(c/a)x + a|.

For all points of a hyperbola we have |x| ≥ a and c/a > 1. Therefore

ρ(M, F_2) = (c/a)x - a for x > 0,   ρ(M, F_2) = -(c/a)x + a for x < 0,
ρ(M, F_1) = (c/a)x + a for x > 0,   ρ(M, F_1) = -(c/a)x - a for x < 0.

Finally

|ρ(M, F_1) - ρ(M, F_2)| = 2a.
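The same kind of numerical check works for Theorem 96.2 (again only an illustration with parameter values of our own choosing): for points of the hyperbola x^2/a^2 - y^2/b^2 = 1 the absolute difference of the distances to the foci (±c, 0), c^2 = a^2 + b^2, equals 2a.

```python
import numpy as np

a, b = 2.0, 3.0
c = np.sqrt(a**2 + b**2)                  # c^2 = a^2 + b^2
F1, F2 = np.array([-c, 0.0]), np.array([c, 0.0])

# Both branches, parametrized by x = +-a*cosh(t), y = b*sinh(t).
t = np.linspace(-2.0, 2.0, 5)
for sign in (+1.0, -1.0):
    for M in np.stack([sign * a * np.cosh(t), b * np.sinh(t)], axis=1):
        diff = abs(np.linalg.norm(M - F1) - np.linalg.norm(M - F2))
        print(np.isclose(diff, 2 * a))    # True on both branches
```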
Consider the part of the hyperbola that is in the first quarter. For that part x ≥ a and y ≥ 0, and equation (96.8) is equivalent to

y = (b/a)(x^2 - a^2)^{1/2},

assuming of course b > 0 and a > 0. It is easy to see that this function can be represented in the following form:

y = (b/a)x - ba/(x + (x^2 - a^2)^{1/2}).   (96.9)

Along with function (96.9) consider the equation of a straight line

y' = (b/a)x.   (96.10)

Let M(x, y) and M'(x, y') denote a point of hyperbola (96.9) and that of the straight line (96.10) having the same abscissa x. With x increasing without limit, the difference

y' - y = ba/(x + (x^2 - a^2)^{1/2}),

remaining positive, is monotonically decreasing and vanishing. Hence M and M' converge but M always remains below M'.
A similar property holds for the other parts of the hyperbola too, the role of the straight line (96.10) being played by one of the straight lines

y = (b/a)x,   y = -(b/a)x.   (96.11)

These are called the asymptotes of the hyperbola.


Notice that we said that (96.8) was the equation of a hyperbola. However, another equation also called the equation of a hyperbola is known from the school course.
According to (96.11) we make a change of coordinates

x' = x/a - y/b,   y' = x/a + y/b.

From (96.8) we have

(x/a - y/b)(x/a + y/b) = 1.

Hence in the new coordinate system (not rectangular, in general) the equation of a hyperbola has the form

x'y' = 1   (96.12)

or

y' = 1/x'.

This is just the familiar school-book equation. Equation (96.12) is called the equation of a hyperbola in its asymptotes.
I.5. The number a_0 is zero; the numbers λ_1 and λ_2 are opposite in sign. After a standard change of coefficients we get

x^2/a^2 - y^2/b^2 = 0   (96.13)

equivalent to (96.1). From (96.13) we get

(x/a - y/b)(x/a + y/b) = 0

or

y = (b/a)x,   y = -(b/a)x.   (96.14)

Thus (96.13) is the equation of a curve splitting into two intersecting straight lines (96.14).
Consider now the second equation of (95.13). It has the form

λ_2y^2 + b_0x = 0.   (96.15)

II.6. Both numbers λ_2 and b_0 are nonzero. Let

2p = -b_0/λ_2 ≠ 0.

Now (96.15) is equivalent to the following equation:

y^2 = 2px.   (96.16)

The curve described by this equation is called a parabola (Fig. 96.3) and the equation is called the canonical equation of a parabola. It may be assumed without loss of generality that p > 0, since for p < 0 we obtain a curve symmetric with respect to the y axis. Like the hyperbola, the parabola is an unbounded curve. It has only one axis of symmetry, the x axis, and no centre of symmetry. The point of intersection of the axis of the parabola with the parabola itself is called the vertex of the parabola. The point F with coordinates (p/2, 0) is called the focus of the parabola. A straight line L given by

x = -p/2   (96.17)

is called the directrix of the parabola.
Theorem 96.3. The distance from any point of a parabola to its directrix is equal to that from the same point to the focus of the parabola.
Proof. We have for any point M(x, y) of a parabola

ρ(L, M) = x + p/2,

and further

ρ(F, M) = ((x - p/2)^2 + y^2)^{1/2} = ((x - p/2)^2 + 2px)^{1/2}
   = (x^2 - px + p^2/4 + 2px)^{1/2} = (x^2 + px + p^2/4)^{1/2}
   = ((x + p/2)^2)^{1/2} = x + p/2,

since x ≥ 0 and p > 0.

Finally, consider the third equation of (95.13). It has a quite simple form:

λ_1x^2 + a_0 = 0.   (96.18)

III.7. The number a_0 is not zero; the number λ_1 is opposite in sign to a_0. Let

a^2 = -a_0/λ_1.

Then the equation (96.18) is equivalent to

x^2 - a^2 = 0   (96.19)

or

x = a,   x = -a.   (96.20)

Hence equation (96.19) is the equation of a curve splitting into two parallel straight lines (96.20).
III.8. The number a_0 is not zero; the number λ_1 coincides in sign with a_0. Let

a^2 = a_0/λ_1.

Then (96.18) is equivalent to

x^2 + a^2 = 0.   (96.21)

It is clear that there is no point in the plane whose coordinates satisfy that equation. It is usual to say that (96.21) is the equation of two imaginary straight lines.
III.9. The number a_0 is zero. In this case (96.18) is equivalent to

x^2 = 0.   (96.22)

By analogy with (96.19) it is usual to say that (96.22) defines two coincident straight lines, each defined by

x = 0.

Notice that for all points of an ellipse or a hyperbola we have the following equations:

ρ(M, F_2) = (c/a)|x - a^2/c|,
ρ(M, F_1) = (c/a)|x + a^2/c|.   (96.23)

The straight lines d_i (i = 1, 2) defined by

x - a^2/c = 0,   x + a^2/c = 0   (96.24)

are called the directrices of the ellipse and of the hyperbola. We shall label the directrix and the focus with the same index if they

are in the same half-plane given by the y axis. We can now show that:
The ratio of the distances ρ(M, F_i) and ρ(M, d_i) is constant for all points M of an ellipse, a hyperbola and a parabola.
For the parabola this statement follows from Theorem 96.3. For the ellipse and the hyperbola it follows from (96.23) and (96.24). The ratio

e = ρ(M, F_i)/ρ(M, d_i)

is called the eccentricity. We have

e = c/a = (1 - b^2/a^2)^{1/2} < 1

for the ellipse,

e = c/a = (1 + b^2/a^2)^{1/2} > 1

for the hyperbola, and

e = 1

for the parabola.
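The constancy of this ratio is easy to confirm numerically. The following sketch is purely illustrative (the semiaxes and the parametrization are our own choices); it checks the property for an ellipse, taking the focus F_2 = (c, 0) together with the directrix x = a^2/c lying in the same half-plane.

```python
import numpy as np

a, b = 5.0, 3.0
c = np.sqrt(a**2 - b**2)                    # (96.4)
e = c / a                                   # expected eccentricity
F2 = np.array([c, 0.0])
x_d = a**2 / c                              # directrix x - a^2/c = 0, see (96.24)

t = np.linspace(0.0, 2 * np.pi, 9)
for M in np.stack([a * np.cos(t), b * np.sin(t)], axis=1):
    ratio = np.linalg.norm(M - F2) / abs(M[0] - x_d)
    print(np.isclose(ratio, e))             # True: the ratio equals e < 1
```

For the hyperbola the same loop, run over the parametrization (±a cosh t, b sinh t) with c^2 = a^2 + b^2, gives a constant ratio greater than 1, in agreement with the formulas above.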

Exercises
1. What is a diametrical hyperplane conjugate to
a given direction for second-degree curves?
2. Write the equation of a tangent for the ellipse, the hyperbola and the
parabola.
3. Prove that a light ray issuing from one focus of an ellipse passes, after
a mirror reflection from a tangent, through the other focus.
4. Prove that a light ray issuing from the focus of a parabola passes, after
a mirror reflection from a tangent, parallel to the axis of the parabola.
5. Prove that a light ray issuing from one focus of a hyperbola, after a mir-
ror reflection from a tangent, appears to issue from the other focus.

97. Second-degree surfaces


We now proceed to study second-degree surfaces given as equations (95.14). We first consider the equation

λ_1x^2 + λ_2y^2 + λ_3z^2 + a_0 = 0.   (97.1)

I.1. The number a_0 is not zero; the numbers λ_1, λ_2 and λ_3 are all opposite in sign to a_0. A standard change of coefficients yields

x^2/a^2 + y^2/b^2 + z^2/c^2 = 1.   (97.2)

The surface described by this equation is called an ellipsoid (Fig. 97.1) and equation (97.2) is called the canonical equation of an ellipsoid. It follows from (97.2) that the coordinate planes are planes

of symmetry and that the origin is a centre of symmetry. The numbers a, b and c are the semiaxes of the ellipsoid. An ellipsoid is a bounded surface contained in the parallelepiped |x| ≤ a, |y| ≤ b, |z| ≤ c. The curve of intersection of the ellipsoid with any plane is an ellipse. Indeed, such a curve of intersection is a second-degree curve. By virtue of the boundedness of the ellipsoid that curve is bounded, but the only bounded second-degree curve is the ellipse.
I.2. The number a_0 is not zero; the numbers λ_1, λ_2, λ_3 and a_0 are all of the same sign. A standard change of coefficients yields

x^2/a^2 + y^2/b^2 + z^2/c^2 = -1.   (97.3)

There is no point in space whose coordinates satisfy this equation. Equation (97.3) is said to be the equation of an imaginary ellipsoid.
I.3. The number a_0 is zero; the numbers λ_1, λ_2 and λ_3 are all of the same sign. We have

x^2/a^2 + y^2/b^2 + z^2/c^2 = 0.   (97.4)

This equation holds only for the origin. Equation (97.4) is said to be the equation of a degenerate ellipsoid.
I.4. The number a_0 is not zero; the numbers λ_1 and λ_2 have the same sign opposite to that of λ_3 and a_0. A standard change of coefficients yields

x^2/a^2 + y^2/b^2 - z^2/c^2 = 1.   (97.5)

The surface described by this equation is called a hyperboloid of one sheet (Fig. 97.2) and the equation is the canonical equation of a hyperboloid of one sheet. It follows from (97.5) that the coordinate planes are planes of symmetry and the origin is a centre of symmetry.
Consider a curve L_h of intersection of the one-sheeted hyperboloid with a plane z = h. The equation of a projection of such a curve onto the x, y plane is obtained from (97.5) if we assume z = h in it. It is easy to see that that curve is an ellipse

x^2/a*^2 + y^2/b*^2 = 1,

where

a* = a(1 + h^2/c^2)^{1/2}   and   b* = b(1 + h^2/c^2)^{1/2},

its size increasing without limit as |h| → ∞. Sections of the one-sheeted hyperboloid by the y, z and x, z planes are hyperbolas.

Thus the hyperboloid of one sheet is a surface consisting of one sheet and resembling a tube extending without limit in the positive and negative directions of the z axis.
I.5. The number a_0 is not zero; the numbers λ_1, λ_2 and a_0 are opposite in sign to λ_3. Similarly to (97.5) we have

x^2/a^2 + y^2/b^2 - z^2/c^2 = -1.   (97.6)

The surface described by this equation is called a hyperboloid of two sheets (Fig. 97.3) and the equation is the canonical equation of a hyperboloid of two sheets. The coordinate planes are planes of symmetry and the origin is a centre of symmetry. Curves L_h of intersection of the two-sheeted hyperboloid with planes z = h are ellipses the equations of whose projections onto the x, y plane have the form

x^2/a*^2 + y^2/b*^2 = 1,

where a* = a(-1 + h^2/c^2)^{1/2} and b* = b(-1 + h^2/c^2)^{1/2}. It follows that the cutting plane z = h begins to intersect the two-sheeted hyperboloid only when |h| ≥ c. In the region between the planes z = -c and z = +c there are no points of the surface under consideration. By virtue of symmetry with respect to the x, y plane the surface consists of two sheets lying outside that region. The sections of the two-sheeted hyperboloid by the y, z and x, z planes are hyperbolas.
I.6. The number a_0 is zero; the numbers λ_1 and λ_2 are opposite in sign to λ_3. We have

x^2/a^2 + y^2/b^2 - z^2/c^2 = 0.   (97.7)

The surface defined by this equation is called an elliptic cone (Fig. 97.4) and the equation is the canonical equation of an elliptic cone. The coordinate planes are planes of symmetry and the origin is a centre of symmetry. Curves L_h of intersection of the elliptic cone with planes z = h are ellipses. If a point M_0(x_0, y_0, z_0) is on the surface of the cone, then (97.7) is satisfied by the coordinates of the point M_1(tx_0, ty_0, tz_0) for any number t. Hence the straight line passing through M_0 and the origin lies entirely on the given surface.
We now proceed to consider the second equation of (95.14). We have

λ_1x^2 + λ_2y^2 + b_0z = 0.

II.7. The numbers λ_1 and λ_2 are of the same sign. It may be assumed without loss of generality that b_0 is opposite in sign, since if b_0 coincides in sign with λ_1 and λ_2, we obtain a surface symmetric with respect to the x, y plane. A standard change of coefficients yields

z = x^2/a^2 + y^2/b^2.   (97.8)

The surface described by this equation is called an elliptic paraboloid (Fig. 97.5) and the equation is the canonical equation of an elliptic paraboloid. For this surface the x, z and y, z planes are planes of symmetry and there is no centre of symmetry. The elliptic paraboloid lies in the half-space z ≥ 0. Curves L_h of intersection of the elliptic paraboloid with planes z = h, h > 0, are ellipses whose projections onto the x, y plane are defined by

x^2/a*^2 + y^2/b*^2 = 1,

where a* = a√h and b* = b√h. It follows that as h increases the ellipses increase without limit, i.e. an elliptic paraboloid is an infinite cup. Sections of an elliptic paraboloid by the planes y = h and x = h are parabolas. For example, the plane x = h intersects the surface in the parabola

z = h^2/a^2 + y^2/b^2

lying in the plane x = h.

II.8. The numbers λ_1 and λ_2 are of opposite signs. A typical surface for this case is defined by the equation

z = x^2/a^2 - y^2/b^2.   (97.9)

The surface described by this equation is called a hyperbolic paraboloid (Fig. 97.6) and the equation is the canonical equation of a hyperbolic paraboloid. The x, z and y, z planes are planes of symmetry and there is no centre of symmetry. Curves of intersection of the hyperbolic paraboloid with planes z = h are hyperbolas

x^2/a*^2 - y^2/b*^2 = 1,

where a* = a√h and b* = b√h, for h > 0, and hyperbolas

-x^2/a*^2 + y^2/b*^2 = 1,

where a* = a√(-h) and b* = b√(-h), for h < 0. The plane z = 0 intersects the hyperbolic paraboloid in the two straight lines

y = ±(b/a)x.

All surfaces defined by Equations III to V of (95.14) are independent of z. Therefore projections onto the x, y plane of curves of intersection of those surfaces with planes z = h are also independent of h. Such surfaces are called cylinders with added adjectives: elliptic, hyperbolic, etc. depending on the form of the projection of the surface onto the x, y plane.
Theorem 97.1. There are two distinct straight lines through each point of a one-sheeted hyperboloid and of a hyperbolic paraboloid, lying entirely on those surfaces.
Proof. Consider a hyperboloid of one sheet given by its canonical equation

x^2/a^2 + y^2/b^2 - z^2/c^2 = 1.   (97.10)

For any α and β not both zero the pair of planes

α(x/a + z/c) = β(1 - y/b),   β(x/a - z/c) = α(1 + y/b)   (97.11)

determines some straight line Γ. It is easy to verify that the given straight line Γ lies entirely on surface (97.10). Moreover, there is one straight line of the family Γ through each point of that surface.

Indeed, look at (97.11) as a system of two equations

α(x/a + z/c) - β(1 - y/b) = 0,
α(1 + y/b) - β(x/a - z/c) = 0

in α and β. The determinant of the system is zero if and only if the point M(x, y, z) is on hyperboloid (97.10). The rank of the matrix of the system is obviously equal to unity. Hence α and β are determined up to proportionality. But this precisely means that there is a unique straight line Γ through each point of the hyperboloid.
Similarly we can see that through each point of the hyperboloid there is a unique straight line Γ′ determined by the planes

ν(x/a + z/c) = λ(1 + y/b),   λ(x/a - z/c) = ν(1 - y/b).

The straight lines Γ and Γ′ are distinct. The same reasoning shows that the hyperbolic paraboloid

z = x^2/a^2 - y^2/b^2

is covered by two distinct families of straight lines Π and Π′ determined by the planes

αz = β(x/a + y/b),   β = α(x/a - y/b)

and

νz = λ(x/a - y/b),   λ = ν(x/a + y/b).
Exercises
1. What is a diametrical hyperplane conjugate to
a given direction for second-degree surfaces?
2. Write the equations of a tangential plane for the various second-degree
surfaces.
3. Investigate the optical properties of second-degree surfaces.
CHAPTER 12

Bilinear Metric Spaces

98. The Gram matrix and determinant


Let φ(x, y) be some bilinear form introduced in a vector space Kn over a number field P. The space Kn is said to be bilinear metric if each pair of vectors x and y from Kn is assigned a number (x, y) from P called a scalar product, with

(x, y) = φ(x, y).
If a bilinear form in a complex space Kn is Hermitian, then Kn is
said to be Hermitian bilinear metric. In these cases we shall also
say that a bilinear metric is introduced in the vector space.
Some similarity can be seen between bilinear metric spaces and
Euclidean and unitary spaces considered earlier. It should be stressed
from the outset, however, that there are significant differences.
Comparing the definitions of a scalar product in Euclidean and uni-
tary spaces with that of a bilinear form it is not hard to observe that
in bilinear metric spaces a scalar product may in general not be
symmetric and positive definite.
The study of Euclidean and unitary spaces reduced to the investi-
gation of additional properties of both the spaces and operators in
them arising from bilinear forms defining scalar products. The same
problem faces the study of bilinear metric spaces. The necessity of
introducing a weaker definition of a scalar product results from the
fact that it is by far not always that bilinear functions studied si-
multaneously with vectors of a space and operators possess the
symmetry property, to say nothing of that of positive definiteness.
Many definitions and facts will be the same both for ordinary
bilinear spaces and for Hermitian bilinear metric spaces. Where no
confusion arises, the word "Hermitian" will therefore be dropped and
the appropriate calculations will be performed only for bilinear
spaces, tacitly assuming that for Hermitian spaces they are per-
formed in a similar way.
The main method of investigating any vector space is to expand
a vector with respect to a given system of vectors and to study the
expansion depending on various factors. In a general vector space
there is no tool using which we could find an expansion but in
a bilinear metric space the scalar product turns out to be such a tool.

Take a system of vectors x_1, x_2, ..., x_m of a bilinear metric space Kn and a vector x ∈ Kn. We see what comes from the presence of a scalar product for the study of the possibility of an expansion

x = α_1x_1 + α_2x_2 + ... + α_mx_m   (98.1)

of a vector x with respect to the chosen system. Performing successively scalar multiplication of equation (98.1) on the right by x_1, x_2, ..., x_m we obtain a system of linear algebraic equations

α_1(x_1, x_1) + α_2(x_2, x_1) + ... + α_m(x_m, x_1) = (x, x_1),
α_1(x_1, x_2) + α_2(x_2, x_2) + ... + α_m(x_m, x_2) = (x, x_2),
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   (98.2)
α_1(x_1, x_m) + α_2(x_2, x_m) + ... + α_m(x_m, x_m) = (x, x_m)

to determine the unknown coefficients α_1, α_2, ..., α_m of expansion (98.1). The matrix G, the transposed matrix of that system, has the form

G = ( (x_1, x_1)  (x_1, x_2)  ...  (x_1, x_m) )
    ( (x_2, x_1)  (x_2, x_2)  ...  (x_2, x_m) )   (98.3)
    (  .........................................)
    ( (x_m, x_1)  (x_m, x_2)  ...  (x_m, x_m) )

and is called the Gram matrix of the system of vectors x_1, x_2, ..., x_m. Its determinant G(x_1, x_2, ..., x_m) is called the Gram determinant or Gramian. Thus the problems of investigating expansions (98.1) and of solving systems (98.2) prove to be closely related.
If vectors x 1 , x 2 , • • • , Xm form a basis of a space, then for them the
Gram matrix is the matrix of the basic bilinear form (x, y). The
Gram matrices for different bases are congruent and hence have the
same rank. The rank of a Gram matrix is an invariant of a bilinear
metric space and is called the rank of the space. The difference
between dimension and rank of a space is called the deficiency of the
space. If the deficiency is different from zero, the bilinear metric
space is said to be singular. The nonsingularity of the basic bilinear
form implies the nonsingularity of Gram matrices for all bases. In
this case the bilinear metric space is called nonsingular. For a non-
singular space, system (98.2), where x 1 , . . . , Xm is a basis, always has
a unique solution, since the matrix of the system is nonsingular,
and this enables us to investigate the coefficients of expansion (98.1)
as a solution of system (98.2).
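In coordinates, setting up the Gram matrix and the system (98.2) is a purely mechanical step. The sketch below is only an illustration (the ordinary Euclidean dot product plays the role of the bilinear form, so the form is symmetric and the Gram matrix coincides with the matrix of the system): it builds G for a basis of R^3, solves (98.2) for the coefficients of expansion (98.1), and checks the result.

```python
import numpy as np

# Basis vectors of R^3 written as rows; the scalar product is the dot product.
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
x = np.array([2.0, 3.0, 5.0])

G = X @ X.T                       # Gram matrix: G[i, j] = (x_i, x_j)
rhs = X @ x                       # right-hand sides (x, x_i)

alpha = np.linalg.solve(G, rhs)   # coefficients of expansion (98.1)
print(alpha)
print(np.allclose(alpha @ X, x))  # True: x = sum_i alpha_i * x_i
```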
Suppose for some vectors x and y of a bilinear metric space Kn
we have (x, y) = 0. In this case the vector y is called right
orthogonal to x and x is called left orthogonal to y. In bilinear metric spaces we are forced to distinguish between left orthogonality and right orthogonality, since in the general case (x, y) ≠ (y, x). If,

however, (x, y) = (y, x) = 0, then such vectors will be called simply


orthogonal. Taking into account the linearity of a scalar product in
each independent variable, it is easy to verify that the right orthog-
onality of y to vectors x 1 , x 2 , • • • , Xm implies its right orthogonality
to any linear combination of them. The same can be said concerning
the left orthogonality. In particular, it follows that for a vector of Kn
to be orthogonal to all vectors of a linear subspace L it is necessary
and sufficient that it should be orthogonal to the vectors of some
basis of L.
Lemma 98.1. If the Gram matrix of a system of vectors x_1, x_2, ..., x_m is singular, then there are vectors u and v, nontrivial linear combinations of vectors x_1, x_2, ..., x_m, such that u is right orthogonal and v is left orthogonal to all vectors of the span of x_1, x_2, ..., x_m.
Proof. If the Gram matrix (98.3) is singular, then its rows are linearly dependent. Hence there are numbers γ_1, γ_2, ..., γ_m not all zero such that a linear combination of the rows is zero, i.e.

γ_1(x_1, x_1) + γ_2(x_2, x_1) + ... + γ_m(x_m, x_1) = 0,
γ_1(x_1, x_2) + γ_2(x_2, x_2) + ... + γ_m(x_m, x_2) = 0,
. . . . . . . . . . . . . . . . . . . . . . . . . . .   (98.4)
γ_1(x_1, x_m) + γ_2(x_2, x_m) + ... + γ_m(x_m, x_m) = 0.

Letting

v = Σ_{j=1}^{m} γ_j x_j,

relations (98.4) imply that (v, x_j) = 0 for every j. The vector v is a nontrivial linear combination of vectors x_1, x_2, ..., x_m and is left orthogonal to each of those vectors and therefore orthogonal to each vector of their span. The vector u is constructed in a similar way but proceeding from the linear dependence of the columns of the Gram matrix.
Corollary. If the Gram matrix for a linearly independent system of vectors is singular, then the quadratic form (x, x) has an isotropic vector lying in the span of the given system and right (left) orthogonal to all vectors of the span.
Indeed, by virtue of the linear independence of the vectors of the system the vectors u and v are nonzero; moreover, (u, u) = (v, v) = 0.
In a number of important cases the Gramian is a convenient tool
for establishing the fact of linear dependence or independence of
a system of vectors.
Lemma 98.2. For any linearly dependent system of vectors the
Gramian is zero.
Proof. Let a system x_1, x_2, ..., x_m be linearly dependent. Then the zero vector x can be represented as a nontrivial linear combination of vectors x_1, x_2, ..., x_m. But in this case the homogeneous system (98.2) must have a nonzero solution. Hence the determinant of the matrix of that system, i.e. the Gramian of the system x_1, x_2, ..., x_m, is zero.
Theorem 98.1. If a quadratic form (x, x) has no isotropic vectors,
then the Gramian is not zero if and only if its system of vectors is linear-
ly independent.
Proof. Necessity. Let the Gramian of a system of vectors x_1, x_2, ..., x_m be nonzero. Assuming that that system is linearly dependent,
the Gramian must be zero by Lemma 98.2, which is impossible
under the hypothesis.
Sufficiency. Suppose the system of vectors is linearly indepen-
dent. If the Gramian is zero, then according to the corollary of Lem-
ma 98.1 there must be an isotropic vector. Since this is impossible
under the hypothesis, the Gramian is not zero.
Corollary. If a quadratic form (x, x) is strictly of constant signs,
then the Gramian is zero if and only if the system of vectors is linearly
dependent.
Corollary. If a bilinear form (x, y) is symmetric and a quadratic form (x, x) is strictly of constant signs, then for any two vectors x and y we have the Cauchy-Buniakowski-Schwarz inequality

|(x, y)|^2 ≤ (x, x)(y, y),   (98.5)

equality holding if and only if x and y are linearly dependent.
Under the hypotheses of this statement the Gramian will be positive for linearly independent vectors x and y, according to Sylvester's criterion or its corollary, and zero for linearly dependent vectors, according to Lemma 98.2. In both cases inequality (98.5) holds. If, however, equality holds in (98.5), then x and y will be linearly dependent according to the preceding corollary, since their Gramian is zero.
Consider the following simple but sufficiently important proper-
ties of the Gramian. They not only lead to numerous consequences
but also not infrequently allow a clear-cut geometrical interpreta-
tion of them to be given.
Property 1. The Gramian remains unaffected by an interchange of any two vectors in a system x_1, x_2, ..., x_m.
Indeed, if any two vectors x_i and x_j are interchanged in x_1, x_2, ..., x_m, then so are the ith and jth columns and the ith and jth rows
in the Gramian. The Gramian changes sign twice, i.e. as a result it
will remain unchanged.
Property 2. The Gramian remains unaffected by addition to any vector of a system x_1, x_2, ..., x_m of any linear combination of the remaining vectors.
Obviously it suffices to consider the case where the vector x 1 is
changed, since the other cases reduce to that case in view of Proper-
ty 1. Let a vector α_2x_2 + ... + α_mx_m be added to x_1. Suppose the

bilinear form (x, y) is ordinary. It is easy to verify that the new Gramian is obtained from the old Gramian by adding to the first row the second row multiplied by α_2, etc., up to the last row multiplied by α_m, and adding to the first column the second column multiplied by α_2, etc., up to the last column multiplied by α_m. As is known, this leaves the Gramian unaffected. If the bilinear form (x, y) is Hermitian, then the columns are multiplied by the conjugates of α_2, ..., α_m.
Property 3. If any vector of a system x_1, x_2, ..., x_m is multiplied by a number α, then the Gramian is multiplied by α^2, if the bilinear form (x, y) is ordinary, and by |α|^2, if (x, y) is Hermitian.
Again it is sufficient to consider the case where x_1 is changed. But multiplying x_1 by α leads to the multiplication of the first row and the first column of the Gramian by α in the case of the ordinary bilinear form (x, y). If, however, (x, y) is Hermitian, then the first row of the Gramian is multiplied by α and the first column is multiplied by the conjugate of α. It is from this that the property follows.
Property 4. If each of the vectors x_1, x_2, ..., x_m is left (right) orthogonal to all the preceding vectors, then for the Gramian we have

G(x_1, x_2, ..., x_m) = Π_{i=1}^{m} (x_i, x_i).   (98.6)

Indeed, the left (right) orthogonality of each vector of the system


x 1 , x 2 , • • • , Xm to all the preceding vectors results in the Gram ma-
trix being right (left) triangular. But the determinant of a triangular
matrix is equal to the product of its diagonal elements, whence (98.6).
Especially interesting properties of the Gram matrix and the
Gramian arise in the cases where the bilinear form (x, y) is real sym-
metric or Hermitian-symmetric, and positive definite. Of course
these cases imply simply that a bilinear metric space Kn is in fact
Euclidean or, respectively, unitary.
In a Euclidean and a unitary space the Gram matrix for any basis
system is the matrix of a positive definite quadratic form (x, x).
According to Sylvester's criterion all principal minors of the Gram
matrix are positive. Since any linearly independent system of vectors
can be supplemented to a basis, it follows that we have
Lemma 98.3. In a Euclidean and a unitary space the Gramian for
any linearly independent system of vectors is positive.
In a Euclidean space the Gramian has a very simple geometrical
interpretation. This is stated by
Theorem 98.2. In a Euclidean space the Gramian G(x_1, x_2, ..., x_m) of a system of vectors x_1, x_2, ..., x_m equals the square V^2(x_1, x_2, ..., x_m) of the volume of that system.
Proof. Consider a real-valued function G^{1/2}(x_1, x_2, ..., x_m) of m independent vector variables x_1, x_2, ..., x_m. It satisfies Properties A and B of (36.3) according to Properties 2 and 3 of the Gramian.

In a Euclidean space each vector of any orthonormal system of vectors is orthogonal to all the preceding vectors of the system. According to (98.6) therefore G^{1/2}(x_1, x_2, ..., x_m) satisfies Property C of (36.3) as well. But it now follows from Theorem 36.1 that G^{1/2}(x_1, x_2, ..., x_m) coincides with the volume of the system of vectors.
Corollary. For any system of vectors x_1, x_2, ..., x_m of a Euclidean space

0 ≤ G(x_1, x_2, ..., x_m) ≤ Π_{i=1}^{m} (x_i, x_i),

the equation at the left holding if and only if the system of vectors is linearly dependent and the equation at the right holding if and only if either the system of vectors is orthogonal or there is a zero vector in it.
The validity of the statement follows from the first corollary of Theorem 98.1 and from the property of the volume of a system of vectors described by Hadamard's inequality (36.1).
Corollary. For any system of vectors x_1, x_2, ..., x_m of a Euclidean space

G(x_1, ..., x_i, x_{i+1}, ..., x_m) ≤ G(x_1, ..., x_i) · G(x_{i+1}, ..., x_m),

equality holding if and only if either the sets of vectors x_1, ..., x_i and x_{i+1}, ..., x_m are orthogonal or one of them is a linearly dependent system.
The proof is based on a simple analysis of formula (35.4). Recall only the following. If L_1 ⊆ L_2, where L_1 and L_2 are any subspaces, then |ort_{L_2} x| ≤ |ort_{L_1} x| for any vector x, equality holding only if x ⊥ L_2.
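Theorem 98.2 and Hadamard's bound are also easy to check in coordinates. For the standard dot product the volume of a system of n vectors of R^n written as the rows of a matrix X equals |det X| (cf. Chapter 4), so the Gramian det(X X') must equal (det X)^2. The sketch below is illustrative only (random data, our own names):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))       # rows are the vectors x_1, ..., x_4

G = X @ X.T                           # Gram matrix for the dot product
gramian = np.linalg.det(G)

print(np.isclose(gramian, np.linalg.det(X) ** 2))    # True: G = V^2
print(gramian <= np.prod(np.diag(G)) + 1e-12)        # Hadamard's bound holds
```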

Exercises

1. Are the problems of finding expansions (98.1) and of solving system (98.2) equivalent?
2. What yields the solution of system (98.2) if the vector x is not in the span of vectors x_1, ..., x_m?
3. What does the Gram matrix (98.3) look like if:
the vectors x_1, ..., x_m are mutually orthogonal,
each of the vectors x_1, ..., x_m is left (right) orthogonal to all the preceding (subsequent) vectors,
each of the vectors x_1, ..., x_m is left (right) orthogonal to all the subsequent (preceding) vectors,
each of the vectors x_{l+1}, ..., x_m is left (right) orthogonal to each of the vectors x_1, ..., x_l?
4. How does the Gram matrix change under elementary transformations of a
system of vectors?
5. Prove that if in an ordinary bilinear metric space (x, y)= 0 always implies
(y, x) = 0, then the scalar product is given by either a symmetric or a skew-
symmetric bilinear form.
6. Is the statement of Exercise 5 true for a Hermitian bilinear metric space?

7. Let G be the Gram matrix for some basis in a nonsingular Hermitian bilinear metric space Kn. Prove that for an operator U with matrix Ḡ^{-1}G' in the same basis we have, for each vector x ∈ Kn,

(Ux, Ux) = (x, x).
8. Prove that for any linear operator A in a Euclidean or a unitary space Kn the ratio

k(A) = G(Ax_1, ..., Ax_n) / G(x_1, ..., x_n)

is independent of the choice of the basis vectors x_1, ..., x_n and equals the product of the squares of the moduli of the eigenvalues of A.
9. Prove that for any linearly independent system of vectors x1, ..., xm of a Euclidean or a unitary space and any vector z
$$\frac{G(x_1, \ldots, x_m, z)}{G(x_1, \ldots, x_m)} \le \frac{G(x_1, \ldots, x_{m-1}, z)}{G(x_1, \ldots, x_{m-1})}.$$

99. Nonsingular subspaces


Any linear subspace L of Kn can be regarded
as a bilinear metric space relative to the same scalar product that
is introduced in Kn. In general the nonsingularity of Kn does not
imply that of L and vice versa.
Theorem 99.1. For all subspaces of Kn to be nonsingular it is necessary
and sufficient that the quadratic form (x, x) should have no isotropic
vectors.
Proof. Necessity. Let all subspaces in Kn be nonsingular. Then so
are all one-dimensional subspaces. But the Gram matrices for non-
zero vectors x coincide with the scalar product (x, x) which must be
nonzero by virtue of the nonsingularity of one-dimensional sub-
spaces.
Sufficiency. Suppose (x, x) :/= 0 for every x :/= 0. Consider any
subspace L and a basis x 1 , x 2 , • • • , Xm in it. By Theorem 98.1 the
Gramian for this system is nonzero, i.e. L is nonsingular.
Corollary. For all subspaces in a bilinear metric space Kn to be non-
singular it is necessary and sufficient that so should all of its one-dimen-
sional subspaces.
Corollary. In any ordinary complex bilinear metric space there are
singular one-dimensional subspaces.
To prove this statement it suffices to recall that in an ordinary
complex bilinear metric space any quadratic form has isotropic
vectors.
When a quadratic form has isotropic vectors, there are both sin-
gular and nonsingular subspaces in the bilinear metric space. If
a bilinear form (x, y) has a rank r, then it is clear that there can be
no nonsingular subspaces of dimension greater than r. But nonsingu-
lar subspaces of dimension r exist. Such, for example, is the subspace
spanned by those vectors of the canonical basis for which the Gram
matrix coincides with the matrix M of (92.5).
We shall say that a set of vectors F of a bilinear metric space Kn is
right, left orthogonal or simply orthogonal to a set of vectors G of Kn if for every pair of vectors x and y, where x ∈ F and y ∈ G, we have a similar orthogonality relation. It is clear that the set of all the vectors of Kn right (left) orthogonal to each vector of F is a subspace. It is called the right (left) orthogonal complement of F and designated F⊥ (⊥F).
In a Euclidean and a unitary space, the subspaces ⊥Kn and Kn⊥ coincide and consist only of a zero vector. In bilinear metric spaces they may be distinct and do not necessarily consist only of a zero vector. The subspaces ⊥Kn and Kn⊥ are called respectively the left and right null subspaces in Kn.
Observe that for any set of vectors F there are always inclusions Kn⊥ ⊆ F⊥ and ⊥Kn ⊆ ⊥F and that for any vectors of ⊥Kn or Kn⊥ the Gram matrices turn out to be zero.
Theorem 99.2. The dimensions of the left and right null subspaces
are the same and equal the nullity of a bilinear form (x, y).
Proof. Choose in Kn some basis x1, x2, ..., xn. Take a vector x of Kn and represent it as an expansion with respect to the basis according to (98.1). The condition that x should be in Kn⊥ is equivalent to the conditions of the right orthogonality of x to each vector of the basis. But these conditions lead to a homogeneous system of the type (98.2) for determining the coefficients of the expansion. It is known (see Section 48) that the set of solutions of that system is a subspace whose dimension is equal to the nullity of the Gram matrix or equivalently to the nullity of the bilinear form (x, y).
The proof for the left null subspace is similar.
Corollary. For Kn to be nonsingular it is necessary and sufficient
that the right and left null subspaces should consist only of a zero vector.
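Numerically, the dimension of the null subspaces can be read off from the Gram matrix: the homogeneous system of the type (98.2) has the Gram matrix as its coefficient matrix. A minimal sketch under that reading (not from the original text), with NumPy and an illustrative singular Gram matrix:

    import numpy as np

    # Illustrative Gram matrix of a bilinear form: entry (i, j) is (x_i, x_j).
    G = np.array([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 3.0, 4.0]])          # third row = first + second

    n = G.shape[0]
    print(n - np.linalg.matrix_rank(G))      # nullity = dimension of the null subspaces

    # A vector of the right null subspace: coefficients alpha with G @ alpha = 0,
    # so that x = sum_i alpha_i x_i is right orthogonal to every x_j.
    _, _, Vt = np.linalg.svd(G)
    alpha = Vt[-1]
    print(np.allclose(G @ alpha, 0))         # True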
In Euclidean and unitary spaces any subspace is orthogonal to its
orthogonal complement and determines decomposition of the entire
space not only into a direct, but even into an orthogonal, sum of
those subspaces. Similar facts do not always hold in bilinear metric
spaces.
Theorem 99.3. Let L be a subspace in Kn. For a decomposition
$$K_n = L \dotplus L^{\perp} = L \dotplus {}^{\perp}L \qquad (99.1)$$
to exist it is necessary and sufficient that L should be nonsingular.
Proof. Necessity. Suppose that decompositions (99.1) hold. We shall look at L as a bilinear metric space with the same scalar product as in Kn. The intersection L ∩ L⊥ is the right null subspace in L. Since the sums (99.1) are direct, that subspace contains only a zero vector. According to the corollary of Theorem 99.2 this means that L is nonsingular.
Sufficiency. If L is nonsingular, then L ∩ L⊥ will contain only a zero vector and it is necessary to show that any vector x ∈ Kn can be represented as x = u + v, where u ∈ L and v ∈ L⊥. Take some basis x1, x2, ..., xm in L. For the desired decomposition x = u + v to exist it is necessary and sufficient that there should be a vector u in L such that x − u is right orthogonal to the vectors x1, x2, ..., xm. Again we obtain a system of linear algebraic equations with a Gram matrix to determine the coefficients of the expansion of u with respect to x1, x2, ..., xm. That matrix is nonsingular and the system has a solution, i.e. the vector u exists. Of course all that has been said concerning L⊥ carries over completely to ⊥L.
Corollary. If a nonsingular subspace L is of dimension m, then the dimension of L⊥ and ⊥L is n − m.
To prove this it suffices to use equation (19.1) and recall that the dimension of L ∩ L⊥ and L ∩ ⊥L is zero.
Corollary. If a nonsingular subspace L has a maximum dimension, then L⊥ = Kn⊥ and ⊥L = ⊥Kn.
Indeed, let the rank of the bilinear form (x, y) be r. As we have already noted, the subspace L will be of dimension r and L⊥ and ⊥L will be of dimension n − r. But Kn⊥ and ⊥Kn are also of dimension n − r and, moreover, Kn⊥ ⊆ L⊥ and ⊥Kn ⊆ ⊥L. Therefore Kn⊥ = L⊥ and ⊥Kn = ⊥L.
As to decompositions of the type (99.1) into orthogonal sums, Theorem 99.3 yields
Corollary. Let L be a nonsingular subspace of maximum dimension. Decompositions (99.1) will be orthogonal if and only if the left and right null subspaces coincide.
Indeed, if decompositions (99.1) are orthogonal, then L⊥ is not only right orthogonal but also left orthogonal to L, i.e. L⊥ ⊆ ⊥L. Similarly we have ⊥L ⊆ L⊥. Hence L⊥ = ⊥L. Since L is of maximum dimension, this means that Kn⊥ = ⊥Kn. If, however, the null subspaces coincide, then it follows that L⊥ = ⊥L, i.e. that L⊥ and ⊥L are both right and left orthogonal to L, and that decompositions (98.5) are orthogonal.
We can now give the answer to the question: What is the connection between expansion (98.1) and the solution of system (98.2)? Let vectors x1, ..., xm form a basis of a nonsingular subspace L. By Theorem 99.3 there are direct decompositions (99.1). Therefore each vector x of a bilinear metric space Kn can be uniquely represented as x = u + v, where u ∈ L and v ∈ ⊥L. Recall that a vector u is the projection of a vector x onto L parallel to the subspace ⊥L. If we solve the system of linear algebraic equations (98.2) and compose a vector
$$u = \alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m, \qquad (99.2)$$
then it is that vector that will be the projection of x onto L parallel to ⊥L. Indeed, u is in L and by (98.2) the difference x − u is left orthogonal to the vectors x1, ..., xm. Hence x − u is in ⊥L. It is clear that in order to obtain the projection of x onto L parallel to L⊥ it is necessary to solve the following system:
$$\begin{aligned}
\alpha_1(x_1, x_1) + \alpha_2(x_1, x_2) + \ldots + \alpha_m(x_1, x_m) &= (x_1, x),\\
\alpha_1(x_2, x_1) + \alpha_2(x_2, x_2) + \ldots + \alpha_m(x_2, x_m) &= (x_2, x),\\
\ldots\qquad\qquad\qquad\\
\alpha_1(x_m, x_1) + \alpha_2(x_m, x_2) + \ldots + \alpha_m(x_m, x_m) &= (x_m, x)
\end{aligned} \qquad (99.3)$$
and then calculate the desired projection according to (99.2). In the case of a Hermitian bilinear metric space the coefficients α_i in (99.2) are replaced by ᾱ_i.
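A minimal numerical sketch of this computation (not from the original text), assuming NumPy, a real bilinear form given by an illustrative matrix R, and the hypothetical helper project_onto_span:

    import numpy as np

    def project_onto_span(basis, x, form):
        # System (99.3): sum_j alpha_j (x_i, x_j) = (x_i, x) for every i;
        # then u = sum_j alpha_j x_j is the projection of x onto L = span(basis).
        m = len(basis)
        G = np.array([[form(basis[i], basis[j]) for j in range(m)] for i in range(m)])
        b = np.array([form(v, x) for v in basis])
        alpha = np.linalg.solve(G, b)
        return sum(a * v for a, v in zip(alpha, basis))

    R = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 3.0]])
    form = lambda u, v: u @ R @ v            # a nonsymmetric scalar product

    basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
    x = np.array([1.0, 2.0, 5.0])
    u = project_onto_span(basis, x, form)
    # The residual x - u lies in the right orthogonal complement of L:
    print([form(v, x - u) for v in basis])   # ~ [0.0, 0.0]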
Exercises
1. Describe all nonsingular subspaces of maximum dimension.
2. Prove that for any set L there are inclusions L ⊆ ⊥(L⊥), L ⊆ (⊥L)⊥. In what cases does equality hold in these formulas?
3. Prove that if L is a nonsingular subspace of maximum dimension in a space Kn, then ⊥(L⊥) = (⊥L)⊥ = Kn.
4. Prove that if a scalar product is given by a symmetric or a skew-symmetric bilinear form, then for any set F we have F⊥ = ⊥F.
5. What is the connection between expansion (98.1) and the solution of system (98.2) if the Gram matrix of a system x1, x2, ..., xm is singular?
6. Can there be a basis made up of isotropic vectors in a nonsingular space?
7. What can be said about the scalar product if the projections onto a fixed subspace L parallel to ⊥L and L⊥ coincide for all vectors?
8. What can be said about the scalar product if the projections of a fixed vector onto an arbitrary subspace L parallel to ⊥L and L⊥ coincide?
9. Let L be a nonsingular subspace of a Hermitian bilinear metric space Kn of rank r < n. Prove that the following statements are equivalent:
the subspace ⊥L is of dimension n − r,
the subspace L⊥ is of dimension n − r,
the subspace L is of dimension r,
the subspaces L⊥ and Kn⊥ coincide,
the subspaces ⊥L and ⊥Kn coincide,
the subspace ⊥L consists of isotropic vectors and a zero vector,
the subspace L⊥ consists of isotropic vectors and a zero vector,
the scalar product on ⊥L is zero,
the scalar product on L⊥ is zero.
10. What form will the Gram matrices have for the bases made up of bases of a nonsingular subspace L and the subspace ⊥L (L⊥)?

100. Orthogonality in bases


In bilinear metric spaces the bases are non-
equivalent. There are such among them for which systems (98.2)
can be solved and studied particularly simply. This happens, for
example, when much of a Gram matrix consists of zero elements.


Depending on the form the Gram matrices have we shall consider
different classes of bases in bilinear metric spaces.
The simplest matrices are diagonal matrices. Diagonal Gram ma-
trices occur if and only if the bases consist of mutually orthogonal
vectors. Such bases will be called orthogonal. A system of vectors
formed by an orthogonal basis in their span will be called an orthog-
onal system.
Orthogonal bases can be defined in different ways. Definition in
terms of mutual orthogonality is not always convenient to check,
especially when the vectors of a basis are constructed successively
starting from the first. It is sometimes useful therefore to employ the
following definition:
A basis e1, e2, ..., en is said to be orthogonal if each of its vectors is orthogonal to all the preceding vectors.
A Gram matrix for vectors that satisfy this definition is diagonal,
so both definitions are equivalent. In the general case a basis may
contain both nonisotropic and isotropic vectors. Vectors of an or-
thogonal basis can always be interchanged so that nonisotropic vec-
tors are the first and isotropic vectors are the last. The diagonal form
of the Gram matrix is preserved of course.
Not all bilinear metric or Hermitian bilinear metric spaces have
orthogonal bases. If there is at least one orthogonal basis, then this
means that the matrix of the bilinear form (x, y) in the given basis
has a diagonal form. Hence the matrix of (x, y) must be congruent
with a diagonal matrix in any other basis. The converse is also true
of course. Therefore
For an orthogonal basis to exist in a bilinear metric or Hermitian
bilinear metric space it is necessary and sufficient that the matrix of the
bilinear form (x, y) should be congruent with a diagonal matrix. The
set of all orthogonal bases coincides up to an interchange of vectors
with the set of canonical bases of (x, y).
Relying on our earlier studies of bilinear forms we can now say
that of the ordinary bilinear metric spaces only those spaces have
orthogonal bases in which the basic bilinear form (x, y) is symmet-
ric. As to Hermitian bilinear metric spaces, it is spaces with a Her-
mitian and skew-Hermitian basic bilinear form (x, y), as well as
those with a bilinear form (x, y) having the real or imaginary part of
the quadratic form (x, x) of constant signs, that have orthogonal
bases.
Note from the outset one fundamental difference between ordinary
and Hermitian bilinear metric spaces with orthogonal bases. In an
ordinary bilinear metric space Kn the existence of an orthogonal
basis implies the symmetry of the scalar product (x, y) and this in
turn ensures the existence of an orthogonal basis in any subspace
of Kn. In a Hermitian bilinear metric space, in general the existence
of an orthogonal basis in the space itself does not automatically imply
the existence of an orthogonal basis in any of its subspaces. But
if a scalar product is given by a Hermitian-symmetric or skew-
Hermitian bilinear form, then the consequence is again valid.
Consider any orthogonal basis e1, e2, ..., en of a bilinear metric space Kn. There are as many isotropic and as many nonisotropic vectors as are respectively the deficiency and rank of Kn. Taking into account the law of inertia for quadratic forms we conclude that if the bilinear form (x, y) is real symmetric or Hermitian-symmetric, then the number of vectors with positive values of (e_i, e_i) and the number of vectors with negative values of (e_i, e_i) do not depend on the choice of orthogonal basis; each number is invariant for all orthogonal bases in Kn. Accordingly we shall speak of a positive and a negative index, as well as signature, of spaces with a symmetric form (x, y). For bilinear metric spaces with a nonsymmetric form (x, y), we shall speak only of the rank and deficiency of the spaces.
If Kn is a nonsingular space, then no orthogonal basis e1, e2, ..., en has isotropic vectors. In this case, for any vector x ∈ Kn
$$x = \sum_{j=1}^{n} \frac{(x, e_j)}{(e_j, e_j)}\, e_j. \qquad (100.1)$$

Indeed, performing a right scalar multiplication of


x = a 1e1 +a e
2 2 + ... + a,.e,. (100.2)
successively by the vectors e1 , e 2 , ••• , e,. we get
(:z:, eJ)
a--+-~~
J- (ej, eJ)

for every j. The vectors of an orthogonal basis in a nonsingular space


can be normed to yield an orthonormal basis. For an orthonormal
basis e1 , e 2 , • • • , e,. there are relations I (e1, e1) I = 1 for every j.
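Formula (100.1) is easy to verify numerically. A minimal sketch (not from the original text), assuming NumPy; the diagonal values of the form are illustrative:

    import numpy as np

    # Values (e_j, e_j) of a symmetric scalar product in an orthogonal basis.
    d = np.array([2.0, 1.0, -1.0])
    form = lambda u, v: u @ np.diag(d) @ v

    E = np.eye(3)                            # coordinates of e_1, e_2, e_3
    x = np.array([4.0, -3.0, 5.0])

    # Expansion (100.1): x = sum_j (x, e_j)/(e_j, e_j) * e_j.
    coeffs = np.array([form(x, E[j]) / form(E[j], E[j]) for j in range(3)])
    print(np.allclose(coeffs @ E, x))        # True: the coefficients recover x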
In singular spaces there must be isotropic vectors among the vectors of any orthogonal basis. Therefore representation (100.1) no longer holds for expansion (100.2) of vectors of a space. Orthogonal bases prove to be sufficiently useful in these spaces too, however. As an illustration of their use, consider
Theorem 100.1. If there is an orthogonal basis in a space with a sca-
lar product, then the right and left null subspaces coincide.
Proof. Let there be an orthogonal basis e1, e2, ..., en in a space Kn of rank r. It is assumed that the vectors e1, ..., er are nonisotropic and e_{r+1}, ..., en are isotropic. We take a vector x ∈ Kn and expand it according to (100.2). Using representation (100.2) and taking into account the orthogonality of the basis and the isotropy of the vectors e_{r+1}, ..., en it is easy to establish that (x, e_j) = (e_j, x) = 0 for r < j ≤ n. Hence e_{r+1}, ..., en are simultaneously in the right and in the left null subspace. But e_{r+1}, ..., en are linearly independent as vectors of a basis and equal in number the dimension of the null subspaces, so both null subspaces coincide.
Corollary. If in a bilinear metric space the scalar product is given by a symmetric or Hermitian-symmetric bilinear form, then its right and left null subspaces coincide.
Corollary. In any orthogonal basis isotropic vectors, and only isotropic vectors, form a basis of the common null subspace.
Corollary. If in a space with a scalar product there is an orthogonal basis, then that space can be decomposed as an orthogonal sum of any nonsingular subspace of maximum dimension and a null subspace.
The last corollary actually means that the study of any singular spaces with orthogonal bases reduces to studying separately nonsingular subspaces with orthogonal bases and subspaces on which the scalar product is zero.
To know an orthogonal basis in a space is not only to be able to find an orthogonal basis in the nonsingular subspace of maximum dimension but also to obtain an explicit expansion of the orthogonal projection of any vector onto that subspace with respect to its orthogonal basis. Indeed, let e1, e2, ..., en be an orthogonal basis in Kn, let e1, ..., er be nonisotropic vectors and let e_{r+1}, ..., en be isotropic vectors. Denote by L the subspace spanned by the vectors e1, ..., er. It is clear that it is nonsingular and of maximum dimension, that L⊥ = ⊥L and in addition
$$K_n = L \dotplus L^{\perp}.$$
Any vector x in Kn can be represented in a unique way as a sum x = u + v, where u ∈ L and v ∈ L⊥. Here u is called the left orthogonal projection of x onto a subspace L and v is the left perpendicular to that subspace. We write for x expansion (100.2) with respect to e1, e2, ..., en. Formula (100.1) no longer holds. Observe, however, that the first r terms in (100.2) form a vector u and the last n − r terms form a vector v. Performing a right scalar multiplication of equation (100.2) successively by e1, ..., er we get
$$u = \sum_{j=1}^{r} \frac{(x, e_j)}{(e_j, e_j)}\, e_j.$$

The projection v of a vector x onto the null subspace is defined very


simply
r
v=x- .~ (eJ, e1) e1•
(x, eJ)

J=l
The only thing that cannot be done now is to find the expansion of v with respect to the vectors e_{r+1}, ..., en using a scalar product, although the expansion itself does exist.
As we have already said, not all bilinear metric and Hermitian
bilinear metric spaces have orthogonal bases. This circumstance
leads us to seek other classes of bases, more convenient from the
point of view of the scalar product given in space. A solution is sug-
gested by the canonical form of the matrix of a bilinear form.
A basis e1, e2, ..., en is said to be pseudoorthogonal if each of its
vectors is left orthogonal to all the preceding vectors and each of
its isotropic vectors is left orthogonal to all vectors of the basis.
The system of vectors forming a pseudoorthogonal basis in their span
will be called a pseudoorthogonal system.
Observe that in this definition the left orthogonality of vectors
to all the preceding vectors can be replaced by the right orthogonality
of vectors to all the subsequent vectors. This gives the same condi-
tions.
The Gram matrix for the vectors of a pseudoorthogonal basis is
right trapezoidal. If the vectors of the basis are interchanged so that
the nonisotropic vectors are the first vectors and the isotropic vectors
are the last vectors, then the Gram matrix not only remains right
trapezoidal but will also have the canonical form (92.5). Our earlier
studies on reducing the matrix of a bilinear form to canonical form
give a complete answer to the question as to when there is a pseudo-
orthogonal basis. Namely,
There is a pseudoorthogonal basis in any Hermitian bilinear metric
space, as well as in any ordinary bilinear metric space, except for spaces
with a skew-symmetric bilinear form (x, y). The set of all pseudoorthogonal bases coincides up to an interchange of vectors with the set of canonical bases of (x, y).
Every orthogonal basis is pseudoorthogonal. An ordinary bilinear
metric space cannot contain simultaneously an orthogonal basis and
a pseudoorthogonal basis that is not orthogonal. That is because the
existence of at least one orthogonal basis implies the symmetry of
all Gram matrices. A right trapezoidal matrix may be symmetric
only if it is diagonal. A Hermitian bilinear metric space can contain
simultaneously both an orthogonal basis and a pseudoorthogonal
basis that is not orthogonal. This means that a right trapezoidal
complex matrix may be Hermitian congruent with a diagonal matrix,
which is also supported by example (92.8).
If Kn is a nonsingular space, then none of the pseudoorthogonal
bases has isotropic vectors, since a right trapezoidal matrix can be
nonsingular if and only if it is a right triangular matrix with nonzero
diagonal elements. In a nonsingular space, for the coefficients α_i
of expansion (100.2) of a vector x with respect to the vectors of a
pseudoorthogonal basis e1 , e2 , • • • , en we obtain a system of linear
algebraic equations with a left triangular matrix. Indeed, multiply-
ing equation (100.2) successively on the right by e1 , e 2 , • • • , e,.
we find that
$$\begin{aligned}
\alpha_1(e_1, e_1) &= (x, e_1),\\
\alpha_1(e_1, e_2) + \alpha_2(e_2, e_2) &= (x, e_2),\\
\ldots\qquad\qquad\\
\alpha_1(e_1, e_n) + \alpha_2(e_2, e_n) + \ldots + \alpha_n(e_n, e_n) &= (x, e_n).
\end{aligned} \qquad (100.3)$$
From these we successively determine α1, α2, ..., αn. Of course, the vectors of a pseudoorthogonal basis in a nonsingular space can be normed to yield a pseudoorthonormal basis such that |(e_j, e_j)| = 1 for every j.
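System (100.3) is solved by ordinary forward substitution. A minimal sketch (not from the original text), assuming NumPy and illustrative data; the helper name is ours:

    import numpy as np

    def expand_in_pseudoorthogonal_basis(P, rhs):
        # P[i, j] = (e_i, e_j); for a pseudoorthogonal basis P[i, j] = 0 for j < i,
        # so system (100.3) is left triangular and forward substitution applies:
        # alpha_j = ((x, e_j) - sum_{i<j} alpha_i (e_i, e_j)) / (e_j, e_j).
        n = len(rhs)
        alpha = np.zeros(n)
        for j in range(n):
            s = sum(alpha[i] * P[i, j] for i in range(j))
            alpha[j] = (rhs[j] - s) / P[j, j]
        return alpha

    P = np.array([[1.0, 2.0, -1.0],
                  [0.0, 3.0,  4.0],
                  [0.0, 0.0,  2.0]])         # pairwise products (e_i, e_j)
    rhs = np.array([1.0, 7.0, 3.0])          # right-hand sides (x, e_j)
    print(expand_in_pseudoorthogonal_basis(P, rhs))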
Observe that the process of solving system (100.3) gives much more than just a simple expansion of a vector x with respect to a pseudoorthogonal basis e1, e2, ..., en. Simultaneously, without any extra costs we can determine all the vectors
$$u_k = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_k e_k.$$
The vectors u_k form a sequence of projections of the same vector x onto embedded subspaces
$$L_1 \subseteq L_2 \subseteq \ldots \subseteq L_n,$$
where L_k is the span of vectors e1, e2, ..., e_k. If we look at u_k as an "approximation" to the solution x, then the left orthogonality of the "error" v_k = x − u_k to L_k in fact implies the left orthogonality of v_k to u1, u2, ..., u_k. We shall return to all these questions somewhat later.
If a space Kn is singular, then in general the existence of a pseudoorthogonal basis does not guarantee the coincidence of the right and left null subspaces and hence one cannot expect the space to be decomposed as an orthogonal sum of some of its subspaces. But knowing a pseudoorthogonal basis makes possible an efficient construction of a decomposition of the space as a direct sum (99.1).
Suppose that in a space Kn of rank r there is a pseudoorthogonal basis e1, e2, ..., en. It will be assumed that the vectors e1, ..., er are nonisotropic and e_{r+1}, ..., en are isotropic. In a pseudoorthogonal basis the isotropic vectors are left orthogonal to all vectors of the basis, so they are left orthogonal to all vectors of Kn. But this means that the isotropic vectors of the pseudoorthogonal basis form a basis of the left null subspace ⊥Kn. Denote by L the span of the vectors e1, ..., er. By the second corollary of Theorem 99.3
$$K_n = L \dotplus {}^{\perp}L = L \dotplus {}^{\perp}K_n,$$
with bases known for both L and ⊥Kn. For L the basis e1, ..., er is pseudoorthogonal.
So the study of any singular spaces with a pseudoorthogonal basis reduces to a simultaneous study of nonsingular subspaces with a pseudoorthogonal basis and subspaces on which the scalar product is zero.
Any vector of Kn can be uniquely represented as a sum x = u + v, where u ∈ L, v ∈ ⊥Kn. If for a vector x we write expansion (100.2), then we again obtain a system of the type (100.3), but now with a left trapezoidal matrix instead of a nonsingular left triangular matrix, to determine the coefficients α_i. Nevertheless we can determine from that system the first coefficients α1, ..., αr and we have
$$u = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_r e_r,$$
i.e. the projection of x onto L is determined completely relying on the knowledge only of a pseudoorthogonal basis in L. Again v = x − u and again we cannot find an expansion of the vector v with respect to the vectors e_{r+1}, ..., en using a scalar product.
A pseudoorthogonal basis is a sufficiently general type of basis,
since almost all spaces have such a basis. As we already know, there
is no basis of this type only in ordinary bilinear metric spaces with
a skew-symmetric form (x, y). For these spaces the most convenient
type of basis is obvious, it is certainly the canonical basis of Gram
matrices. In general it is possible to introduce a type of basis cover-
ing all the above types of bases and existing in any space with a sca-
lar product. Few new factors follow, however, and we shall not dis-
cuss it now.
Besides a single basis with certain orthogonality relations between
its vectors we shall sometimes deal with pairs of similar bases.
A basis f1, f2, ..., fn is said to be left (right) dual to a basis e1, e2, ..., en if (f_j, e_i) = 0 ((e_i, f_j) = 0) for i ≠ j, with (f_i, e_i) ((e_i, f_i)) equalling 1 or 0 for every i.
A basis f1, f2, ..., fn is said to be left (right) pseudodual to a basis e1, e2, ..., en if (f_j, e_i) = 0 ((e_i, f_j) = 0) for j < i, with (f_i, e_i) = 1 ((e_i, f_i) = 1) or, for every j, (f_i, e_j) = 0 ((e_j, f_i) = 0).
It is easy to see that the matrix of a bilinear form (x, y) is diagonal
in a pair of dual bases and right (left) trapezoidal in a pair of pseudo-
dual bases. The questions of the existence and construction of dual
and pseudodual bases are closely related to equivalence transforma-
tions (91.4) of the matrix of a bilinear form (x, y) as well as to the
factorization of that matrix. We shall turn to a detailed study of
such bases only as need arises. For the present we shall restrict
ourselves to their brief discussion.
Theorem 100.2. In any nonsingular space each basis has a right
and a left dual basis, which are unique.
Proof. Consider a basis e1 • e 2 , • • • , en in a nonsingular space Kn
and let Ge be the matrix of a bilinear form (x, y) in that basis. Accord-
ing to (91.4) finding a left (right) dual basis to e1 , e 2 , • • • , e,. is
equivalent to determining a matrix P (Q) for which P'Ge (GeQ) is
a unit matrix. Then P (Q) is the coordinate transformation matrix for a change from the basis e1, e2, ..., en to the dual basis. Since the space is nonsingular, so is the matrix Ge and there is a unique solution: P = G_e^{-1'} (Q = G_e^{-1}).
Corollary. In any nonsingular space each basis has a left and a right
pseudodual basis.
Indeed, every left (right) dual basis is simultaneously a left (right) pseudodual basis.
Taking into account the form of the matrix of a bilinear form (x, y) it is easy to establish that if in a nonsingular space we change from a left (right) dual basis to another basis, one with a left triangular coordinate transformation matrix with unit diagonal elements, then the new basis is left (right) pseudodual.
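The formula P = G_e^{-1'} of Theorem 100.2 can be checked directly. A minimal sketch (not from the original text) for a real nonsingular form with illustrative data, NumPy assumed; the e-basis is taken as the coordinate basis, so the columns of P are the coordinates of the dual vectors:

    import numpy as np

    G = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 3.0]])          # Gram matrix, (u, v) = u' G v

    P = np.linalg.inv(G).T                   # Theorem 100.2: P = G^{-1'}
    F = P                                    # column j holds the coordinates of f_j

    pairings = F.T @ G                       # entry (j, i) equals (f_j, e_i)
    print(np.allclose(pairings, np.eye(3)))  # True: the basis f is left dual to e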

Exercises
1. Let a scalar product be symmetric. Is the number of vectors with positive, negative and zero values of (e_i, e_i) invariant for nonorthogonal bases e1, e2, ..., en?
2. How can any real or complex vector space be converted into a bilinear metric space with a symmetric scalar product with a given rank and signature?
3. An orthogonal basis has no isotropic vectors in a nonsingular space. Can there be a basis consisting of isotropic vectors in such a space?
4. Prove that as functions of vectors of a bilinear metric space orthogonal projection and perpendicular are linear operators.
5. What form has the Gram matrix for a pseudoorthogonal basis if the right and left null spaces coincide?
6. Prove that in any ordinary or Hermitian bilinear metric space there is a basis in which the Gram matrix is a right block-triangular matrix with 1 × 1 and 2 × 2 blocks along the diagonal.
7. How can the coefficients of an expansion of a vector with respect to a basis for which some dual or pseudodual basis is known be determined?
8. Prove that in a nonsingular space the coordinate transformation matrix for a change from one basis, pseudodual to a given basis, to any other pseudodual basis of the same name is left triangular.

101. Operators and bilinear forms


If we have a linear operator in an ordinary or
Hermitian bilinear metric space, then of course all the results ob-
tained earlier for operators in a real or a complex space hold. We
shall therefore study only additional properties of operators due to
the presence in space of a scalar product.
One of the major objects is the adjoint operator. In a Euclidean
and a unitary space the adjoint operator was introduced using a
scalar product, but in investigating its properties wide use was made
of the existence in space of an orthonormal basis. Now we cannot
take this way, since there may be no orthogonal basis in the general
bilinear metric space. We shall make our studies in the Hermitian
bilinear metric space. Changes for the ordinary bilinear metric space
are very simple.
An operator A* (*A) in a Hermitian bilinear metric space Kn
is said to be a right (left) adjoint operator to an operator A in Kn
if for any vectors x, y E Kn
(Ax, y) = (x, A*y) ((x, Ay) =(*Ax, y)). (101.1)
Take a basis e1, e2, ..., en in Kn and let Ge be the Gram matrix for that basis. Denote by Ae the matrix of A in the basis e1, e2, ..., en and by A*_e and *A_e the matrices of A* and *A if they exist.
Theorem 101.1. For any linear operator A in a nonsingular Hermitian bilinear metric space there are unique adjoint operators A* and *A, with
$$A_e^* = \bar{G}_e^{-1} \bar{A}_e' \bar{G}_e, \qquad {}^*\!A_e = G_e^{-1\prime} \bar{A}_e' G_e'. \qquad (101.2)$$
Proof. If A* exists, then according to (101.1), in matrix notation of the type (61.2) and (91.1), we have
$$(Ax, y) = (Ax)_e' G_e \bar{y}_e = x_e' (A_e' G_e) \bar{y}_e,$$
$$(x, A^*y) = x_e' G_e \overline{(A^*y)_e} = x_e' (G_e \bar{A}_e^*) \bar{y}_e.$$
The right-hand sides of these relations must coincide for all vectors x_e and y_e, so A_e' G_e = G_e \bar{A}_e^*, whence follows the first of the equations (101.2). Similarly,
$$(x, Ay) = x_e' G_e \overline{(Ay)_e} = x_e' (G_e \bar{A}_e) \bar{y}_e,$$
$$({}^*\!Ax, y) = ({}^*\!Ax)_e' G_e \bar{y}_e = x_e' ({}^*\!A_e' G_e) \bar{y}_e,$$
so G_e \bar{A}_e = {}^*\!A_e' G_e and we obtain the second of the equations (101.2).
Equations (101.2) imply that if adjoint operators exist, then they are unique. Take now those equations as a form of assigning the right and left adjoint operators. It is easy to verify directly that the operators thus constructed are linear and satisfy relations (101.1).
Corollary. If a Hermitian bilinear form (x, y) is symmetric or
skew-symmetric, then the right and left adjoint operators coincide.
Indeed, in these cases G_e = ±Ḡ_e' for any basis e1, e2, ..., en. According to (101.2) we now conclude that A*_e = *A_e.
It follows from the corollary that the right and left adjoint operators coincide in a unitary space. But this fact can be established in another way. If in a unitary space an orthonormal basis e1, e2, ..., en is taken, then Ge = E and we obtain the well-known equations A*_e = *A_e = Ā_e'.
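Formulas (101.2) can be verified numerically under the matrix notation used above, (x, y) = x_e' G_e ȳ_e. A minimal sketch (not part of the original text) with random illustrative data, NumPy assumed; G is taken nonsingular:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    form = lambda u, v: u @ G @ v.conj()              # (x, y) = x' G y-bar

    # Formulas (101.2): right and left adjoint matrices.
    A_right = np.linalg.inv(G.conj()) @ A.conj().T @ G.conj()
    A_left  = np.linalg.inv(G.T) @ A.conj().T @ G.T

    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    print(np.isclose(form(A @ x, y), form(x, A_right @ y)))   # (Ax, y) = (x, A*y)
    print(np.isclose(form(x, A @ y), form(A_left @ x, y)))    # (x, Ay) = (*Ax, y)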
The adjoint operators are connected with A by certain relations. Note some of them, for example, for the right adjoint operator:
$$(A + B)^* = A^* + B^*, \quad (\alpha A)^* = \bar{\alpha} A^*, \quad (AB)^* = B^* A^*, \quad (A^*)^{-1} = (A^{-1})^*. \qquad (101.3)$$
For the left adjoint operator the relations are similar. All relations can be proved according to the same scheme using representations (101.2) for the matrices of adjoint operators. Therefore we shall prove only the validity of the last property. We have
$$(A_e^*)^{-1} = \bar{G}_e^{-1} (\bar{A}_e')^{-1} (\bar{G}_e^{-1})^{-1} = \bar{G}_e^{-1} (\bar{A}_e^{-1})' \bar{G}_e = (A_e^{-1})^*.$$
Comparing formulas (75.4) and (101.3) we can see the absence in (101.3) of the analogue of the first of the relations (75.4). It now looks like this:
$$({}^*\!A)^* = {}^*(A^*) = A. \qquad (101.4)$$
To prove its validity we again turn to representations (101.2) and get
$$({}^*\!A_e)^* = \bar{G}_e^{-1} (\overline{{}^*\!A_e})' \bar{G}_e = \bar{G}_e^{-1} (\bar{G}_e^{-1\prime} A_e' \bar{G}_e')' \bar{G}_e = \bar{G}_e^{-1} \bar{G}_e A_e \bar{G}_e^{-1} \bar{G}_e = A_e,$$
$${}^*(A_e^*) = G_e^{-1\prime} (\overline{A_e^*})' G_e' = G_e^{-1\prime} (G_e^{-1} A_e' G_e)' G_e' = G_e^{-1\prime} G_e' A_e G_e^{-1\prime} G_e' = A_e,$$
i.e. relations (101.4) do hold.
Theorem 101.2. If in a nonsingular Hermitian bilinear metric
space an operator A has in some basis a matrix J, then in a right (left)
dual basis the operator A* (*A) has a matrix J*.
Proof. Let A have a matrix J in a basis e1, e2, ..., en. Consider a right dual basis f1, f2, ..., fn. Denote by G_e, G_f and G_{ef} = E the matrices of a bilinear form (x, y) in the corresponding bases. If P is a coordinate transformation matrix for a change from the first basis to the second, then we have
$$G_e = G_{ef}\bar{P}^{-1} = \bar{P}^{-1}, \qquad G_f = P' G_{ef} = P'$$
and then, taking into account (63.7) and (101.2), we get
$$A_f^* = \bar{G}_f^{-1} \bar{A}_f' \bar{G}_f = \bar{G}_f^{-1} (\overline{P^{-1} J P})' \bar{G}_f = \bar{G}_f^{-1} \bar{G}_f \bar{J}' \bar{G}_f^{-1} \bar{G}_f = J^*.$$
If, however, A has a matrix J in a basis f1, f2, ..., fn, then for that basis the basis e1, e2, ..., en is the left dual basis and we now find
$${}^*\!A_e = G_e^{-1\prime} \bar{A}_e' G_e' = G_e^{-1\prime} (\overline{P J P^{-1}})' G_e' = G_e^{-1\prime} G_e' \bar{J}' G_e^{-1\prime} G_e' = J^*.$$
This theorem is as significant in the study of adjoint operators in
Hermitian bilinear metric spaces as Theorem 75.2 is in unitary
spaces. It follows from it in particular that the right and left adjoint
operators A • and • A have the same eigenvalue~. complex conjugate
to those of A, that the right and left adjoint operators A • and • A
have a simple structure, if A has a simple structure, and so on.
Besides a scalar product (x, y) other Hermitian bilinear forms can be given in a Hermitian bilinear metric space. Consider, for example, functions of the form (Ax, y) and (x, Ay), where A is an arbitrary
linear operator. It is not hard to see that they are Hermitian bilinear
forms. Different operators give different forms in any nonsingular space Kn. Indeed, if A and B are different operators, then Ax ≠ Bx at least for one vector x. Suppose (Ax, y) = (Bx, y) for each y ∈ Kn. It follows that ((A − B)x, y) = 0 for each y ∈ Kn, i.e. that (A − B)x ∈ ⊥Kn. But in a nonsingular space the subspace ⊥Kn consists only of a zero vector, so Ax = Bx.
Theorem 101.3. In a nonsingular Hermitian bilinear metric space Kn any Hermitian bilinear form φ(x, y) can be uniquely represented as
$$\varphi(x, y) = (Ax, y) = (x, By),$$
where A and B are some linear operators in Kn.
Proof. Choose in Kn some basis e1, e2, ..., en and let Ge be the Gram matrix in that basis and Φe the matrix of the form φ(x, y). We have
$$\varphi(x, y) = x_e' \Phi_e \bar{y}_e = x_e' \Phi_e G_e^{-1} G_e \bar{y}_e = (G_e^{-1\prime} \Phi_e' x_e)' G_e \bar{y}_e = x_e' G_e G_e^{-1} \Phi_e \bar{y}_e = x_e' G_e (G_e^{-1} \Phi_e \bar{y}_e).$$
Now the matrices Ae and Be of the desired operators A and B are defined by
$$A_e = G_e^{-1\prime} \Phi_e', \qquad B_e = \bar{G}_e^{-1} \bar{\Phi}_e. \qquad (101.5)$$
The uniqueness of A and B was proved earlier.
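A minimal numerical sketch of this representation (not part of the original text), assuming NumPy, random illustrative data and the reconstruction (101.5) given above:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 3
    G   = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # Gram matrix
    Phi = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # matrix of phi
    form = lambda M, u, v: u @ M @ v.conj()                               # u' M v-bar

    A = np.linalg.inv(G.T) @ Phi.T            # (101.5): A_e = G^{-1'} Phi'
    B = np.linalg.inv(G.conj()) @ Phi.conj()  # (101.5): B_e = G-bar^{-1} Phi-bar

    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    phi = form(Phi, x, y)
    print(np.isclose(phi, form(G, A @ x, y)))   # phi(x, y) = (Ax, y)
    print(np.isclose(phi, form(G, x, B @ y)))   # phi(x, y) = (x, By)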
An adjoint operator can be defined using a scalar product. Therefore, if different scalar products are introduced in a vector space, then the same linear operator will have different adjoint operators. Suppose in a vector space together with a scalar product given by a bilinear form (x, y) we introduce scalar products given by forms (Mx, y) and (x, My). Label with a subscript M on the left (right) the adjoint operators relating to the scalar product (Mx, y) ((x, My)).
Theorem 101.4. For any operator A and a nonsingular operator M
$${}_M A^* = (M A M^{-1})^*, \qquad {}_M({}^*\!A) = M^{-1}({}^*\!A)M. \qquad (101.6)$$

Proof. Choose some basis e1, e2, ..., en and let Ge and Me be the matrices of a bilinear form (x, y) and of the operator M in that basis respectively. According to (101.5) the matrix of a bilinear form (Mx, y) is equal to M_e'G_e. According to (101.2) we now find
$${}_M A_e^* = (\overline{M_e' G_e})^{-1} \bar{A}_e' (\overline{M_e' G_e}) = \bar{G}_e^{-1} (\bar{M}_e^{-1\prime} \bar{A}_e' \bar{M}_e') \bar{G}_e = \bar{G}_e^{-1} (\overline{M_e A_e M_e^{-1}})' \bar{G}_e = (M A M^{-1})_e^*,$$
$${}_M({}^*\!A_e) = (M_e' G_e)^{-1\prime} \bar{A}_e' (M_e' G_e)' = M_e^{-1} G_e^{-1\prime} \bar{A}_e' G_e' M_e = M_e^{-1} (G_e^{-1\prime} \bar{A}_e' G_e') M_e = M_e^{-1} ({}^*\!A_e) M_e.$$
These matrix equations prove the validity of the first group of the
operator equations (101.6). The second group follows trivially from
the first, if we take into account the equation (x, My) = (* Mx, y)
and relations (101.2).
There are different types of operators in the Hermitian bilinear metric space. An operator A is said to be Hermitian or self-adjoint, if for any x, y ∈ Kn
$$(Ax, y) = (x, Ay),$$
and skew-Hermitian or skew-adjoint, if
$$(Ax, y) = -(x, Ay).$$
Hence respectively
$$A = A^* = {}^*\!A, \qquad A = -A^* = -{}^*\!A.$$
An operator A is said to be isometric if for any x, y ∈ Kn
$$(Ax, Ay) = (x, y).$$
This leads to the equations
$${}^*\!A\,A = A^*A = E.$$
In an ordinary bilinear metric space the analogues of the Hermi-
tian and the skew-Hermitian operator are called a symmetric and
a skew-symmetric operator respectively. In what follows we shall
often deal with operators defined by the equation
$$A^* = \alpha E + \beta A \qquad (101.7)$$
for some numbers α and β.
By far not all properties of operators of a special form carry over
from the unitary to the Hermitian bilinear metric space, although
they do have something in common. We shall not discuss all these
questions.
Exercises
1. How are the characteristic polynomials of operators
A, A* and *A related?
2. Let a subspace L be invariant under an operator A. Prove that the subspace L⊥ (⊥L) is invariant under A* (*A).
3. Prove that any eigenvector of an operator A corresponding to an eigenvalue λ is left (right) orthogonal to any eigenvector of A* (*A) corresponding to an eigenvalue μ ≠ λ̄.
4. Prove that any root vector of an operator A corresponding to an eigenvalue λ is left (right) orthogonal to any root vector of A* (*A) corresponding to an eigenvalue μ ≠ λ̄.
5. Prove that the eigenvalues of a Hermitian (skew-Hermitian) operator
corresponding to nonisotropic eigenvectors are real (pure imaginary).
6. Prove that the moduli of the eigenvalues of an isometric operator cor-
responding to nonisotropic eigenvectors equal unity.
7. Suppose in a nonsingular space the scalar product is Hermitian-symmetric. Prove that if A is a Hermitian (skew-Hermitian) operator, then the bilinear form (Ax, y) is Hermitian-symmetric (skew-symmetric).
8. Suppose in a nonsingular space the scalar product is Hermitian-symmetric. Prove that if the bilinear form (Ax, y) is Hermitian-symmetric (skew-symmetric), then the operator A is Hermitian (skew-Hermitian).
9. How do the statements of Exercises 7 and 8 change if the scalar product is skew-Hermitian?
10. Prove that if an operator A satisfying condition (101.7) has at least two distinct eigenvalues, then |β| = 1.

102. Bilinear metric isomorphism


It was shown in the study of Euclidean and
unitary spaces that to within isomorphism there is only one space of
each dimension n. For bilinear metric spaces the situation is more
complicated.
We introduce the concept of isomorphism. We shall say that ordinary or Hermitian bilinear metric spaces over the same number field are isomorphic if they are isomorphic as vector spaces and the scalar
products of pairs of corresponding vectors are equal to each other.
It follows from this definition that in isomorphic spaces the Gram
matrices of the systems of corresponding vectors coincide. The con-
verse is also true. If in bilinear metric spaces over the same number
field there are such bases in which Gram matrices coincide, then
those spaces are isomorphic. Indeed, by establishing a correspondence
between bases with equal Gram matrices we ensure that scalar
products coincide for any pair of vectors in the bases and hence for
any pair of vectors.
Theorem 102.1. Ordinary (Hermitian) bilinear metric spaces over the same number field are isomorphic if and only if the Gram matrices of arbitrary bases of those spaces are congruent (Hermitian-congruent).
Proof. Necessity. The Gram matrices of all bases of the same
space are congruent and they coincide on the corresponding bases of
different spaces. By virtue of the transitivity of the congruence rela-
tion, the Gram matrices of arbitrary bases of isomorphic spaces are
congruent.
Sufficiency. If the Gram matrices of arbitrary bases of bilinear
metric spaces are congruent, then there are bases in different spaces
on which the Gram matrices coincide. But then the spaces are iso-
morphic.
The theorem says that the problem of classifying bilinear metric
spaces is equivalent to that of classifying bilinear forms to within
congruence. Consider some classes of bilinear metric spaces.
A real bilinear metric space Kn is said to be pseudo-Euclidean if the
scalar product is given by a nonsingular symmetric bilinear form.
For an arbitrary basis of a pseudo-Euclidean space the Gram ma-
trix is real symmetric and, as we know, congruent with a diagonal
matrix with elements ±1. This means that in every pseudo-Euclidean space there is a basis in which a scalar product (x, y) of vectors x and y with coordinates ξ1, ..., ξn and η1, ..., ηn is given by the formula
$$(x, y) = \xi_1\eta_1 + \ldots + \xi_s\eta_s - \xi_{s+1}\eta_{s+1} - \ldots - \xi_n\eta_n.$$
To within isomorphism pseudo-Euclidean spaces are defined by their
two characteristics: dimension and signature, a positive and a nega-
tive index and so on. Of particular interest to physics among the
pseudo-Euclidean spaces is four-dimensional space with a positive
index equal to unity. This is the so-called Minkowski space-time or
Minkowski universe. It is the space-time of special relativity.
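As a small illustration (not from the text), the scalar product of such a space can be evaluated directly from the coordinate formula above; NumPy is assumed and the sample vectors are arbitrary:

    import numpy as np

    eta = np.diag([1.0, -1.0, -1.0, -1.0])     # positive index 1, negative index 3
    minkowski = lambda x, y: x @ eta @ y

    x = np.array([2.0, 1.0, 0.0, 1.0])
    y = np.array([1.0, 1.0, 1.0, 0.0])
    print(minkowski(x, y))                     # 2*1 - 1*1 - 0*1 - 1*0 = 1
    print(minkowski(x, x))                     # 4 - 1 - 0 - 1 = 2; it may also be
                                               # negative or zero (isotropic vectors)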
A real bilinear metric space Kn is said to be simplectic if the scalar product is given by a nonsingular skew-symmetric bilinear form.
The Gram matrix for any simplectic space is skew-symmetric and therefore congruent with a block-diagonal matrix with blocks of the form $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Consequently the dimension of a simplectic space is always even and, to within isomorphism, there is only one simplectic space of a given even dimension. There is a basis in such a space in which the scalar product of vectors x and y with coordinates ξ1, ..., ξn and η1, ..., ηn has the form
$$(x, y) = \xi_1\eta_2 - \xi_2\eta_1 + \xi_3\eta_4 - \xi_4\eta_3 + \ldots + \xi_{n-1}\eta_n - \xi_n\eta_{n-1}.$$
A complex bilinear metric space Kn is said to be complex Euclidean


if the scalar product is given by a nonsingular symmetric bilinear
form.
For any basis its Gram matrix is complex symmetric and congru-
ent with a unit matrix. To within isomorphi!'m there is only one
complex Euclidean space of each dimension. Jn any complex Euclid-
ean space there is a basis in which the scalar product of Ycctors x
and y is as follows:

A complex Hermitian bilinear metric space is said to be pseudouni-


tary if the scalar product is given by a nonsingular Hermitian-sym-
metric bilinear form.
The Gram matrix for any pseudounitary space is Hermitian. It is
Hermitian-congruent with a real diagonal matrix with elements ±1.
There is always a basis therefore in which the scalar product of
vectors x and y has the form

$$(x, y) = \xi_1\bar\eta_1 + \ldots + \xi_s\bar\eta_s - \xi_{s+1}\bar\eta_{s+1} - \ldots - \xi_n\bar\eta_n,$$
where ξ1, ..., ξn and η1, ..., ηn are the coordinates of x and y.
Again to within isomorphism a pseudounitary space is uniquely
defined by its two characteristics: dimension and signature, a posi-
tive and a negative index and so on.

Exercises
1. Prove that in isomorphic spaces to orthogonal (pseudoorthogonal, dual, pseudodual) bases there correspond orthogonal (pseudoorthogonal, dual, pseudodual) bases.
2. Prove that in isomorphic spaces to nonsingular subspaces there correspond nonsingular subspaces.
3. Prove that in isomorphic spaces perpendicular and projection go over into perpendicular and projection respectively.
4. Prove that in isomorphic spaces the Gramians of the corresponding systems
of vectors are equal.
CHAPTER 13

Bilinear Forms
in Computational Processes

103. Orthogonalization processes


One of the major concepts associated with
any bilinear metric space is that of orthogonality. We have often
seen what important role orthogonal systems of vectors and espe-
cially orthogonal bases play in the study of Euclidean and unitary
spaces. Of no less significance is the role played by bases with ortho-
gonal vectors in other spaces. Up to now, however, most of our rea-
soning has been connected with proving the existence of such systems
and not with the processes of constructing them. The only exception
in a sense is the general method of transforming the matrices of
bilinear forms to canonical form and the related construction of
canonical bases. In view of the importance of orthogonal, pseudoor-
thogonal and other similar systems for designing diverse computa-
tional algorithms we consider now a general process of constructing
such systems in bilinear metric spaces.
Let (x, y) be a scalar product given in a complex vector space Kn using some nonsingular Hermitian bilinear form. We consider a basis e1, e2, ..., en and try to construct another basis, f1, f2, ..., fn, possessing the following two properties:
(1) for any k ≥ 1 the spans L_k of vectors e1, ..., e_k and f1, ..., f_k coincide,
(2) the basis f1, ..., fn is pseudoorthogonal.
Let (e1, e1) ≠ 0 and put f1 = e1. Suppose a system of pseudoorthogonal vectors f1, ..., f_k has already been constructed, with the span of those vectors and that of e1, ..., e_k coinciding and (f_i, f_i) ≠ 0 for 1 ≤ i ≤ k. We shall seek a vector f_{k+1} in the form

fHt = e.,+t + ~" a.,,.,+dh


f=l
(103.1)
"+
v.here a 1 • 1 , • • • , a 11 , 1!.+1 are unknown coefficients. The conditions
of the left orthogonality of 1~~.+ 1 to 111 • • • , 1., give the following
system of linear algebraic equations to determine. a 1 , ll.+ll .••
• .. , a.,, 11.+1:
a1, 11.+1 (11, l1) = -(ell+1• l1),
+
al, 11.+1 (fll l2) a,, 11.+1 <J2, l2) = -(ell.+1• l2),
(103.2)
+
a1, 1!.+1 (flt 1,.) a2, 11+1 (j,, /.,)+
... + a,., 11.+1 (f.,, 1.,) = -(ell+l• 1,.).
The matrix of the system is left triangular. Under the assumption its diagonal elements are nonzero, so system (103.2) has a unique solution. It is clear that the vector f_{k+1} thus constructed and the vectors f1, ..., f_k form together a pseudoorthogonal system and that their span coincides with that of e1, ..., e_{k+1}. The system of vectors f1, ..., f_{k+1} is linearly independent, for so is the system f1, ..., f_k, e_{k+1}.
We continue the process further. If it turns out that the quantities (f_i, f_i) are nonzero for every i, then the resulting system of vectors f1, ..., fn will be the desired pseudoorthogonal basis. Of course, we can now norm the vectors f1, ..., fn and obtain a pseudoorthonormal basis.
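A minimal sketch of process (103.1)-(103.2) for a real bilinear form given by an illustrative matrix R (not from the original text; NumPy assumed; the helper name is ours). Because the vectors already constructed are pseudoorthogonal, the left triangular system (103.2) can be solved by eliminating against f_1, ..., f_k in turn:

    import numpy as np

    def pseudoorthogonalize(E, form):
        # f_{k+1} = e_{k+1} + sum_i alpha_{i,k+1} f_i, the alpha chosen so that
        # f_{k+1} is left orthogonal to f_1, ..., f_k (forward substitution).
        F = []
        for e in E:
            f = e.copy()
            for fi in F:
                f = f - (form(f, fi) / form(fi, fi)) * fi
            if abs(form(f, f)) < 1e-12:
                raise ValueError("degenerate situation: (f_i, f_i) = 0")
            F.append(f)
        return F

    R = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 2.0],
                  [1.0, 0.0, 3.0]])
    form = lambda u, v: u @ R @ v

    F = pseudoorthogonalize([np.eye(3)[i] for i in range(3)], form)
    Gf = np.array([[form(fi, fj) for fj in F] for fi in F])
    print(np.round(Gf, 10))                  # right triangular: (f_i, f_j) = 0 for j < i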
A useful consequence follows from (103.1). We rewrite it as
$$e_{k+1} = \Bigl(-\sum_{i=1}^{k} \alpha_{i,k+1} f_i\Bigr) + f_{k+1}.$$
The vector in the parentheses lies in L_k and the vector f_{k+1} is in ⊥L_k by construction, so the solution of system (103.2) gives in fact a decomposition of each vector e_{k+1} into the projection and left perpendicular relative to L_k.
The process is greatly simplified if the scalar product is given by a Hermitian symmetric bilinear form. In this case the conditions (f_i, f_j) = 0 for j < i imply that the conditions (f_i, f_j) = 0 hold for j ≠ i. System (103.2) therefore becomes a system with a diagonal matrix and we have
$$\alpha_{i,k+1} = -\frac{(e_{k+1}, f_i)}{(f_i, f_i)}$$
for every i. The constructed basis f1, f2, ..., fn will be not only pseudoorthogonal but it will also be orthogonal.
The only thing that may prevent the construction of a pseudoorthogonal basis f1, ..., fn from e1, ..., en is that one of the scalar products (f_i, f_i), i < n, may vanish. Such a situation is called degenerate. No degenerate situation will obviously set in if the quadratic form (x, x) has no isotropic vectors, if, for example, it is strictly of constant signs. Indeed, the equation (f_i, f_i) = 0 is then possible only for f_i = 0. But f_i ≠ 0 for every i, since the vectors f1, ..., f_i are linearly independent. Hence the process can now be realized under any choice of basis e1, ..., en.
In many problems it is not necessary to preserve the relation of the new basis f1, ..., fn to the original basis e1, ..., en, since it is required to construct only some pseudoorthogonal basis in space. It is then necessary whenever (f_i, f_i) = 0 occurs to replace e_i by another vector and compute again a vector f_i, repeating the procedure until the condition (f_i, f_i) ≠ 0 holds. The vectors f1, ..., f_{i-1} remain unchanged.
The vector required for the replacement of e_i will necessarily be found. Suppose that (f_i, f_i) = 0 holds for any vector e_i. By virtue of the left orthogonality of the vector f_i to the vectors e1, ..., e_{i-1} this means that the subspace ⊥L_{i-1} consists only of isotropic vectors and a zero vector. But the subspace L_{i-1} is nonsingular, so ⊥L_{i-1} = ⊥Kn. The last equation is impossible for i − 1 < n, since due to the nonsingularity of Kn the subspace ⊥Kn consists only of a zero vector.
Similarly it is possible to construct a basis pseudodual to a given one. Suppose again that e1, e2, ..., en is a given basis and that it is necessary to construct a basis pseudodual to it, for example a left basis. Take another basis, q1, q2, ..., qn. Suppose (q1, e1) ≠ 0 and put t1 = q1. Assume that a system of vectors t1, ..., t_k has already been constructed such that their span coincides with that of the vectors q1, ..., q_k, and that (t_i, e_i) ≠ 0 holds for 1 ≤ i ≤ k and (t_i, e_j) = 0 does for j < i. We seek a vector t_{k+1} in the form
$$t_{k+1} = q_{k+1} + \sum_{i=1}^{k} \beta_{i,k+1} t_i, \qquad (103.3)$$

where β_{1,k+1}, ..., β_{k,k+1} are unknown coefficients. The conditions of the left orthogonality of t_{k+1} to the vectors e1, ..., e_k again yield for the determination of β_{1,k+1}, ..., β_{k,k+1} a system of linear algebraic equations with a left triangular matrix:
$$\begin{aligned}
\beta_{1,k+1}(t_1, e_1) &= -(q_{k+1}, e_1),\\
\beta_{1,k+1}(t_1, e_2) + \beta_{2,k+1}(t_2, e_2) &= -(q_{k+1}, e_2),\\
\ldots\qquad\qquad\\
\beta_{1,k+1}(t_1, e_k) + \beta_{2,k+1}(t_2, e_k) + \ldots + \beta_{k,k+1}(t_k, e_k) &= -(q_{k+1}, e_k).
\end{aligned} \qquad (103.4)$$
According to the assumption about diagonal elements the system has a unique solution. If it is found in continuing the process that the quantities (t_i, e_i) are nonzero for all i, then after an appropriate normalization the resulting system of vectors is a left pseudodual basis to e1, e2, ..., en. Notice that now the process does not become simpler if the scalar product is given by a symmetric bilinear form.
Employing an auxiliary basis q1, q2, ..., qn makes it possible to avoid the degeneration of the process by replacing at the proper time one of the vectors q_i and repeating the computation of the vector t_i. Again the vectors t1, ..., t_{i-1} remain unchanged.
In what follows, regardless of their particular content all the above
and similar processes will most often be called orthogonalization
processes. We shall sometimes have, however, to construct in the same
bilinear metric space Kn sequences of vectors orthogonal or pseudo-
orthogonal relative to different bilinear forms. We shall discuss only
bilinear forms of the form (Rx, y), where R is some linear operator
in Kn. To distinguish between sequences, we shall speak in this case
of R-orthogonalization, R-pseudoorthogonalization, and so on.
Many properties and features of orthogonalization processes can be established by considering their matrix notation. Let a scalar product in Kn be given by a Hermitian bilinear form (x, y). The pseudoorthogonality of a basis f1, f2, ..., fn implies that (f_i, f_j) = 0 for j < i, i.e. that the Gram matrix G_f of the bilinear form (x, y) in the basis f1, f2, ..., fn is right triangular. According to the process of constructing a new basis the spans of vectors f1, ..., f_k and e1, ..., e_k coincide. Hence in view of (103.1) we conclude that
$$\begin{aligned}
e_1 &= f_1,\\
e_2 &= -\alpha_{1,2} f_1 + f_2,\\
&\ldots\\
e_n &= -\alpha_{1,n} f_1 - \alpha_{2,n} f_2 - \ldots - \alpha_{n-1,n} f_{n-1} + f_n,
\end{aligned} \qquad (103.5)$$

where a 11 are precisely the coefficients that are computed from system
(103.2). Therefore the coordinate transformation matrix A for a
change from the new basis / 1 , / 2 , • • • , In to e1 , e2 , • • • , en is a right
triangular matrix with unit diagonal elements. Since the coordinate
transformation matrix for a change from the old basis to a new basis
coincides with A -I, we have
G, = A-I'GeA-1.
Hence
(103.6)
It is easy to verify that G_f A is a right triangular matrix whose diagonal elements coincide with those of G_f.
Denote by E_q (F_q) a matrix whose columns are the coordinates of the vectors e1, ..., en (f1, ..., fn) in a basis q1, ..., qn. Relations (103.5) show that
$$E_q = F_q A \qquad (103.7)$$
and that of course
$$G_f = F_q' G_q F_q. \qquad (103.8)$$
Thus the above process of constructing a pseudoorthogonal basis proves to be closely related to the factorization of a Gram matrix into triangular factors and to factorization (103.7) of the matrix of coordinates.
Theorem 103.1. For process (103.1) and (103.2) of constructing a pseudoorthogonal basis f1, f2, ..., fn from a basis e1, e2, ..., en in a nonsingular bilinear metric space Kn to be implementable it is necessary and sufficient that the Gram matrix of the system e1, e2, ..., en should have nonzero principal minors.
Proof. Necessity. Let the process be implementable, i.e. let relation (103.6) hold. The matrix G_f is nonsingular, since it is the Gram matrix of a nonsingular bilinear form (x, y) for a basis. Therefore all its diagonal elements are nonzero. Applying the Binet-Cauchy formula we get
$$G_e\begin{pmatrix}1 & 2 & \ldots & r\\ 1 & 2 & \ldots & r\end{pmatrix} = A' G_f A\begin{pmatrix}1 & 2 & \ldots & r\\ 1 & 2 & \ldots & r\end{pmatrix} = G_f\begin{pmatrix}1 & 2 & \ldots & r\\ 1 & 2 & \ldots & r\end{pmatrix} \ne 0$$
for every r.
Sufficiency. Let the principal minors of the Gram matrix G_e be nonzero. Hence according to (93.1) there is a decomposition G_e = L_e D_e U_e, where L_e is a left triangular matrix with unit diagonal elements, D_e is a diagonal matrix with nonzero elements and U_e is a right triangular matrix with unit diagonal elements. It is easy to see that the matrix
$$G_f = U_e^{-1\prime} G_e U_e^{-1} = U_e^{-1\prime} L_e D_e$$
is a left triangular matrix whose diagonal elements coincide with those of D_e. If we now take as a matrix A the right triangular matrix U_e with unit diagonal elements, then relations (103.5) will hold for the basis f1, f2, ..., fn. It is this basis that will be constructed according to process (103.1) and (103.2), which can easily be seen from a direct check.
If the scalar product is given by a symmetric Hermitian bilinear form, then the matrix G_e will be Hermitian, as well as the matrix G_f. But then it follows that G_f is diagonal. This fact has already been noted. Comparing (93.5) and (103.6) we conclude that in this case the orthogonalization process virtually coincides completely with the process of obtaining decomposition (93.5).
If Kn is a unitary space, then the orthogonalization process determines not only the factorization of the Gram matrix into triangular factors but also the decomposition of the matrix of coordinates as a product of a unitary and a right triangular factor. Indeed, choose an orthonormal basis q1, q2, ..., qn and denote by D_q a diagonal matrix made up of the lengths of the columns of the matrix F_q of (103.7). We now have E_q = (F_q D_q^{-1})(D_q A). The matrix D_q A is right triangular. But G_q and G_f (D_q^{-1})^2 are unit matrices. In this case according to (103.8) (F_q D_q^{-1})'(F_q D_q^{-1}) = E, i.e. F_q D_q^{-1} is a unitary matrix.
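In other words, in the Euclidean or unitary case the relation E_q = (F_q D_q^{-1})(D_q A) is the familiar QR decomposition of the matrix of coordinates. A minimal sketch with illustrative data (not from the original text; NumPy assumed):

    import numpy as np

    # Columns of E are the coordinates of e_1, e_2, e_3 in an orthonormal basis.
    E = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])

    Q, R = np.linalg.qr(E)                   # Q plays the role of F_q D_q^{-1},
    print(np.allclose(Q @ R, E))             # R the role of the right triangular D_q A
    print(np.allclose(Q.T @ Q, np.eye(3)))   # columns of Q are orthonormal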
The fact that the basis t1, t2, ..., tn is left pseudodual to the basis e1, e2, ..., en implies that for the bilinear form (x, y) determining the scalar product in Kn the conditions (t_i, e_j) = 0 for j < i and (t_i, e_i) = 1 for every i hold. In other words, this implies that for a pair of bases e1, e2, ..., en and t1, t2, ..., tn the matrix G_{te} of the bilinear form (x, y) is a right triangular matrix with unit diagonal elements. From this we conclude that the coordinate transformation matrix Q^{-1} for a change from the original basis q1, q2, ..., qn to t1, t2, ..., tn is right triangular. Now the diagonal elements will not equal unity, however, since the vectors t1, t2, ..., tn have undergone normalization. We have
$$G_{te} = Q^{-1\prime} G_{qe}$$
and further
$$G_{qe} = Q' G_{te}.$$
The process of constructing a basis pseudodual to a given basis
also proves to be closely related to factorization (93.1) of a matrix
into triangular factors.
Theorem 103.2. For process (103.3) and (103.4) of constructing a basis t1, t2, ..., tn, left pseudodual to e1, e2, ..., en, starting from q1, q2, ..., qn to be implementable it is necessary and sufficient that the matrix G_{qe} of a bilinear form (x, y) should have nonzero principal minors in the bases q1, q2, ..., qn and e1, e2, ..., en.
The proof of this theorem is omitted since it is almost a word-for-word repetition of the proof of the preceding theorem.
To conclude, the orthogonalization processes considered carry over
completely to ordinary bilinear metric spaces. There are changes
in some details associated with complex conjugation. Besides, it is
more complicated to eliminate degenerate situations.

Exercises

t. What is the geometrical interpretation of the


orthogonalization process?
2. Prove that if the orthogonalization process is applied to a linearly dependent system e1, e2, ..., en, then f_k = 0 for some k ≤ n.
3. Let a quadratic form (x, x) have no isotropic vectors. How is a basis
of the given system of vectors to be constructed using the orthogonalization
process?
4. Prove that if the orthogonalization process is carried out in a Euclidean
or a unitary space, then I {11. I ~ I ell I for every k, equality holding if and only
if a vector ell 1s orthogona to vectors e 1, .•• , ell-l·
5. Let the coordinates of vectors e1 , ••• , en form a triangular matrix in some
orthonormal basis of a Euclidean or a unitary space. How is the matrix of coor-
dinates affected by the orthogonalization process?
6. Is it possible to construct a dual basis using the orthogonalization pro-
-cess?
7. How is the orthogonalization process to be applied to obtain a right
pseudodual basis?
8. Prove that the decomposition of a complex nonsingular matrix into
a product of a unitary matrix and a right triangular matrix is unique if it is
required that the diagonal elements of tlie triangular matrix should be positive.
9. How is the orthogonalization process to be applied in order to solve
-a system of linear algebraic equations?
1041 Orthogonalization of a power sequence 363

10. Let a space Kn be singular. How is a Jlseudoorthogonal basis of a non-


singular subspace of maximum dimension to be constructed using the orthoKO-
nalization process?
11. Does the construction of pseudoorthogonal systems of vectors become
simpler in a singular space if the quadratic form (.:z:, .:z:) is of constant sings?

104. Orthogonalization
of a power sequence
In orthogonalization processes the coordinate
transformation matrix for a change from the old basis to a new basis
is always triangular. However, if the original basis is chosen in
a special way, it is possible to obtain significantly simpler representa-
tions for the coordinate transformation matrix and hence simpler
.orthogonalization processes.
Let A be some operator in a nonsingular Hermitian bilinear metric
space Kn. Take a nonzero vector x and consider a sequence of vectors
x, Ax, A 2x, ... , A 11 - 1x. (104.1)
We shall call such sequences power sequences generated by the vec-
tor x.
In any power sequence some number of the first vectors is linearly
independent. Suppose k is the largest of such numbers. This means
that there are numbers a 0 , a 1 , ••• , a 11 , with a 11 :::/= 0, such that
CloX + a 1Ax + ... + a 11 A 11x = 0. (104.2)
Denote by <p (A.) = a 11A. 11 + ... + a A. + a 0 a polynomial
1 of de-
gree k. Clearly (104.2) is equivalent to
<p (A) X= 0. (104.3)
There are many polynomials for which relations of the type
(104.3) hold. In particular, such a polynomial is the characteristic
polynomial of A. But there is clearly a polynomial of the lowest
degree among them. It is called the minimum polynomial annihilat-
ing the vector x. It is clear that its degree equals the maximum
number of the first vectors of the power sequence (104.1) that form
the linearly independent system or equivalently is a unity less than
the minimum number of the first vectors that form the linearly
dependent system.
The degree of the minimum polynomial turns out to be closely
related to the expansion of the vector x with respect to the root
basis of the operator A by the heights of the root vectors and the
number of mutually distinct eigenvalues. That is, we have
Lemma 104. t. The degree of a minimum polynomial annihilating
a vector x equals the sum of the marimum heights of the root vectors of
Bilinear Forms in Computational Process88 [Ch. t3

an operator A present in the expansion of x with respect to a root basis


and co"esponding to mutually distinct eigenvalues.
Proof. We represent a vector x as a sum
x = u1 + u + ... + u,,
2 (104.4)
where u 1 , • • • , u, lie in different cyclic subspaces of A. Since differ-
ent cyclic subspaces have no common vectors except the zero vector,
for equation (104.3) to hold it is necessary and sufficient that the
equations q> (A) u 1 = 0 should hold for every i. If u 1 is a root vector
of height m 1 and corresponds to an eigenvalue A., then the equation
q> (A) u 1 = 0 will hold if and only if the polynomial q> (A.) is divisible
by (A. - A. 1t, where r ;;> m 1• In this case q> (A) UJ = 0 not only for
j = i but also for all those 1 for which the vectors UJ correspond to
the eigenvalues coinciding with A. 1 and have heights not greater
than r. Let A.;,, •.. , A. 1P be mutually distinct eigenvalues corre-
sponding to the vectors u1 , ••• , u, of (104.4) and let mt, •••• , m 1P
be the maximum heights of the root vectors u 1 , • • • , u, correspond-
ing to the eigenvalues coinciding with A. 11 , ••• , A- 1p • Then
q>(A.)=(A.-A.t,)m's. •• (A.-A.tp)mlp

will be the minimum polynomial annihilating the vecor x. Thus the


lemma is proved.
Suppose the vectors e1 = A t-lx for 1 ~ i ~ k are linearly inde-
pendent. Apply to this system the process described earlier for obtain-
ing a pseudoorthogonal system of vectors f, assuming of course that
the process is feasible. If the operator A is in no way related to the
scalar product introduced in Kn, then there is little hope for any
simplification of the process. The situation sharply changes, however,
if A satisfies relation (101. 7), if for example it is a self-adjoint opera-
tor in a unitary space.
Theorem t04. t. If an operator A satisfies relation (101. 7), vectors
e1 = A t-lx are linearly independent for 1 ~ i ~ k and vectors f1, •••
• • •, !1r. are obtained from e1 , • • • , e, using the pseudoorthogonaliza-
tion process, then the following relations hold

'· = x,
Is = A/1 - a.Ju (104.5)
t > 1,
where
i> 1,
(104.6)
1.04} Orthogonalization of a power sequence 365

For definiteness the proof is made in a Hermitian bilinear metric


space. Taking into account the form of vectors e, we conclude from
(103.5) that
f-2
/, = AHz+ ~ YJ. 1A z
1

for some numbers YJ. 1• It follows that the vector / 1+1 - A/1 is in
the span of vectors z, Ax, •.. , A 1- 1x or equivalently that of vectors
ft, / 1 , • • • , f 1• Therefore

e
for some numbers 1, t+l· The conditions of the left orthogonality
.of / 1+1 to / 1 , / 2 , ••• , ft yield the following system of linear al-
gebraic equations to determine the coefficients 61 , 1+1:
Sl, 1+1 (fit /1) = -(A/, /1),
e1. 1+1 (fl. t2> + s2. 1+1 <J2. t2> = -<At, t2>·
(104. 7)
~~. 1+1 (fl. IH) + ... + Sl-1, 1+1 <JI-1• fl-1) = -(A/, "-1>•
61, 1+1 <J1, /1) + ... + 61-1, 1+1 (JI-1, /1) + 61. t+l <J, /1)
=-(A/, /1).
Under the hypothesis of the theorem the operator A satisfies
condition (101. 7). Therefore in view of the pseudoorthogonality of
the system of vectors t1 we have for l < i - 1
(A/1• /,) = (fl, A•fl) =(II, (a.E +JM) /1) =a (/1• f,) +"ji (II, A/1)
I I
= ~(11. ft+t- ~ 61. 1+d1) = ~ {(!1• f~+t>- ~ ~J. 1+t U1• !1)} = o.
1=1 1=1

Among the right-hand sides of system (104. 7) only the last two are
s s
nonzero. Hence only 1- 1 , l+l and 1 , l+l may be different from zero,
which proves the validity of relations (104.5). The value for the
coefficient a. 1 is found from the condition of the left orthogonality of
/ 1 to / 1 and the values for the coefficients a. 1 and ~ 1 _ 1 are found from
the condition of the left orthogonality of the vector 11+1 to the vec-
tors / 1 and /1-1·
Thus the orthogonalization process for a power sequence does turn
out to be much simpler than that for a general sequence. If at some
step it is found that (j, / 1) = 0 but / 1 =I= 0, then the degeneration
of the process can be avoided by choosing a new vector x.
Suppose that there are n linearly independent vectors in the power
sequence. Applying the orthogonalization process it is possible to
366 Bilinear Forms in Computational Processes [Ch. 13"

construct a basis / 1 , / 2 , • • • , In of a space Kn· This basis is remark-


able for the matrix A 1 of the operator A having in it a tridiagonal
form
(at ~~ )
1 a2 ~2 0
1 as ~3
At= (104.8)
0 1 an-t ~n-l
l an J
1
Indeed, the columns of the matrix of the operator are the coordi-
nates of the vectorsA/1 , A/ 2 , ••• , Afn relative to the basis / 1 , / 2 ,
... , fn· But by (104.5)
A/1 = aJ1 + /2,
A/2 = ~~~~ + aJ2 + /,,
(104.9)
Afn-1 = ~n-Jn-2 + a11-Jn-1 + fn,
Afn = ~n -1/n-1 + anfn·
If there are not n linearly independent vectors in the power se-
quence, then applying the orthogonalization process we get fr+l = ()
for some r < n. We take a new vector u and form a vector
r

V=U-~TlJfJ
1=1

determining the coefficients T) 1 from the left orthogonality of th&


vector v to vectors / 1 , / 2 , • • • , fr· We construct a power sequence
generated by v. It is easy to show that each vector of that sequence
is also left orthogonal to / 1 , / 2 , • • • , fr· This property is by con-
struction characteristic of the vector v. Suppose that we have proved
it for all vectors v, Av, ... , A ltv. Then, considering relations (104.9)
and equation (101.7), we get for the operator A
(A 11 +tv, f 1) = (Ahv, A*/1) = (A"v, (aE +~A) f,)
- It - It
=a (A v, /,)+~(A v, ~ 1 -tf1-t + a,f, + fl+t) = 0.
Applying the orthogonalization process to the new sequence we
construct a system of linearly independent vectors q1 , q2 , • • • , q8
pseudoorthogonal to one another and, as a collection, to / 1 , / 2 , • • •
. • . , fr· If r + s < n, we continue the process of supplementing the
basis until we have constructed a basis of the entire space, which
will decompose into a direct sum of invariant subspaces. In such
a basis the matrix of the operator A will have a block-diagonal form
with tridiagonal blocks of the type (104.8).
10:>) Methods of conjugate directions 367

Exercises

1. Prove that the minimum polynomial annihilating


a vector xis a divisor of tbe characteristic polynomial.
2. Prove tbat the minimum polynomial annihilating a vector x is unique
up to a scalar multiplier.
3. Prove that if A is a Hermitian operator and Kn is a unitary apace then
formulas (104.6) become as follows:
(A[,, II)
U~o It> '

4. Prove that under the conditions of Exercise 3 there is a diagonal matrix D


such that for the matrix A 1 of (104.8) the matrix D-1A 1D is real symmetric
tridiagonal.
5. Prove that if condition (101.i) holds then the matrices of bilinear forms
(Ax, y) and (x, A y) in the basis /1,.!, ... , In are right almost triangular.
6. Prove that if conrlitions o ,t;xercise 3 hold tben the matrices of bilinear
forms (Ax, y) and (x, Ay) in the basis fJ., ... , In are Hermitian tridiagonal.
7. Prove that if Kn is a singular space tllen using processes (104.5) and (104.6)
it is possible to construct a pseudoorthogonal basis of a nonsingular subspace
of maximum dimension.

105. Methods of conjugate directions


Constructing orthogonal, pseudoorthogonal,
etc. systems of vectors, especially by making use of power sequences,
provides great possibilities for developing various numerical methods
of solving equations of the form
Ax= b, (105.1)
where A is an operator in a vector space Kn, b is a given vector and x
is the desired vector.
We have repeatedly touched on various aspects of this problem.
We now describe a large group of numerical methods of solving
equation (105.1) known under the general name of methods of conju-
gate directions. They are all based on orthogonalization processes
of power sequences. Suppose for simplicity of presentation that the
operator A is nonsingular and that hence equation (105.1) always
has a unique solution. We assume that the space Kn is complex and
that the scalar product in it is given using a symmetric positive
definite Hermitian bilinear form, i.e. that Kn is a unitary space.
We take any nonsingular operators C and B and let s 1 , • • • , Sn
be some CAB-pseudoorthogonal system of vectors, i.e. let
(CABsh s 1) :/= 0, (CABs, s 11 ) = 0, k < i,
368 Bilinear Forms in Computational Processes [Ch. t3

for every i. Denote by x 0 the initial vector and let

(105.2)

r 1 = Ax 1 - b.
Then from the relations
x 1 = x 1_ 1 + a Bs1 1 (105.3)
it follows that
r 1 = r 1_ 1 + a ABs,. 1 (105.4)
It is easy to show that for the chosen CAB-pseudoorthogonal sys-
tem s1 , • • • , Sn
(Crtt s 11 ) = 0, 1 ~ k ~ i. (105.5)
Indeed
n
rl=Ax 1-b=A(x,-x)=- ~ a1ABs1
J-t+t
and further
n
(Cr 1, s11) =-. ~ a1 (CABs I• '") =0
1=i+t
for every k ~ i.
We assume that the system of vectors s1 , • • • , sn is constructed
parallel to the system r 0 , ••• , rn_ 1 using the process of its CAB-
pseudoorthogonalization. Set s1 = r 0 and for every i we have
t
St+t = r, + 11-t
~ P11, t+tSII• (105.6)

A2. always the conditions of the left CAB-pseudoorthogonality of


a vector s1+1 to vectors s1, • • • , s1 yield a left triangular system to
determine the coefficients P11 • 1+ 1 • In this case is a linear combina- r"
tion of vectors s1 , ••• , s11 + 1 • Hence the scalar product (Cr;, r 11 ) is a
linear combination of numbers (Cr, s1), • • • , (Cr, sll+ 1 ) and it is
zero by (105.5) for k < i, i.e.
(Cr, r 11 ) = 0, k < i. (105.7)
This means that if (Cr, r 1) :/= 0 for every i then the sequence of
vectors r 1 is C-pseudoorthogonal. In an n-dimensional vector space
a C-pseudoorthogonal system cannot contain more than n nonzero
vectors. At some step of the computational process therefore one of
the discrepancies vanishes and we obtain an exact solution of equa-
tion (105.1).
105} Methods of conjugate directions 369

To implement the process, it is necessary for us to determine the


coefficients a 1 in (105.2) and the coefficients ~~~. I +I' in (105.6). The
coefficients a 1 can always be found in a very simple' way. According
to (105.4), (105.5) and (105. 7) we have
ICr1-~o s, l
(CABs, sf)
(105.8)

In general computation of the coefficients ~~~. 1+ 1 is much more


complicated. But if the operators A, B and C are connected by the
relation
(CABC- 1 )* = a.E ~AB + (105.9)
for some numbers a. and ~. then among all the coefficients ~"·HI
only ~~. l+l can be nonzero. Let
(105.10)
The coefficient b1 is uniquely determined from the condition of the
left CAB-orthogonality of the vector s1+I to s, which, considering
(105.9) and (105.10), yields
b ~ _ (CABr,. s1 ) = _ ~ (Cr,, ABsj)
I (CABs,, sf) (CABs,, Sf) •
(105.11)

Suppose that computing a sequence of vectors s1 by formulas


(105.10) and (105.11) we have shown that the sequence s1 , • • • , s1
forms a CAB-pseudoorthogonal system. This is obviously true for
i = 2. Taking into account (105.9) we get for k < i from (105.4)
to (105. 7)
(CABsl+u s11 ) ~ (CABr,. s") + b,(CABs1, s11 ) = ((CABC- 1) Cr 1, s11 )
= (Cr, (CABC- 1)* s11 ) = (Cr,. (a.E + ~AB) s11 )=a (Cr,, s 11 )-l-~(Cr 1 , ABs,J

= ~ ( Cr 1, - 1 (r"- r"_ 1)) = ! {(Cr1, r 11 ) - (Cr,, r 11 _ 1)} = 0.


a" a 11
Thus, (105.9) holding, the solution of the operator equation (105.1)
an be effected according to the following prescription:
sl = ro,
r 1 = r 1_ 1 + a,ABs, (105.12)
s1+1 = r 1 + b 1sh
x1 = x 1 _ 1 + a,Bs 1•
Here x 0 is an arbitrary initial vector, the coefficients a 1 and b 1 are
computed according to (105.8) and (105.11). Letting u 1 = Bs,.
370 Bilinear Forms in Computational Processes [Ch. 13

the process will be as follows:


u 1 = Br0 ,
r, = r,_ + a Au
1 1 1,
u,+l = Br, + b u,,
1
z, = z 1_ 1 + a u,.
1
with
(Crt-It r,_ 1 ) (B'"~*Crt-lt u,)
a,== (CAu 1, r,_ 1 ) (.B- 1 *CAu,. u,)'
b _ _ (B-hCABr, u,) _ _ A (Cr,, Au1)
'- (B-hCAu, ul) - t' (B-t•CAu, ui) •

It is these processes that are called methods of conjugate directions.


From formulas (105.4) and (105.10) we conclude that vectors r 1
and s1+1 are linear combinations of vectors of the same power se-
quence
r 0 , ABr0 , • • • , (AB)'r0• (105.f3)
Moreover, they are obtained from it using C- and CAB-pseudoorthog-
onalizations respectively. This result has consequences of excep-
~ional importance.
If not all the components are present in the expansion of the vector
r 0 with respect to the Jordan canonical basis of the operator AB, then
the vanishing of the discrepancy occurs earlier than at the nth step.
The process terminates especially rapidly if AB has a simple struc-
ture and a large number of coinciding eigenvalues. That is, if in
the expansion of r 0 with respect to the eigenvectors of the matrix of
AB the nonzero components correspond to m mutually distinct
eigenvalues, then rm = 0.
By Theorem 101.1 trinomial relations of the type (104.5) must
hold for vectors s, and r,. They can be obtained directly from (105.4)
and (105.10). Namely,
s1+l = a 1ABs 1+ +(1 s,_
b1) s, - b 1_1 1 , i > f,
btal+t ) btai+l ._ (105.14)
r 1+1 =a 1+1 r +( +
AB
1 1 --
a1
r 1 - - - r1_ 1,
a,
1
t::::::"'.

From these we can obtain other relations, for example this:


z 1+1 = z 1_ 1 + +
w1+1 (a. 1Br1 z, - z,_1),
'(l.·here w,+l and a. 1 are suitably chosen numbers.
In view of what has been said concerning sequence (105.13) note
the following peculiarity of condition (105.9). On the face of it this
condition differs from those of the type (101. 7). But if we take into
account (101.6), then it is easy to show that (105.9) is in fact also
a condition of the type (101. 7), in relation to two scalar products
105] Methods of conjugate directions 37t

(CABx, y) and (Cx, y) for that matter. Indeed, observe that the
adjoint operator in (105.9) is connected with the basic scalar product
of a unitary space, whereas the orthogonality of vectors s1 and r 1
is ensured in relation to the scalar products (CABx, y) and (Cx, y)
respectively
cAs(AB)* = (CAB·AB· (CAB)- 1 )* = (CABC- 1)* = aE+PAB,
c(AB)* = (CABC- 1)* = aE + PAB.
The implementation of methods of conjugate directions can be
prevented only by the vanishing of one of the scalar products,
(CABs, s 1) or (Cr 1_ 1 , r,_ 1), before the discrepancy vanishes. If
(CABs, s 1) = 0, then the coefficients a, and b1 cannot be computed.
If, however, (Cr 1 _ 1 , r 1 _ 1 ) = 0, then this leads to a zero coefficient a,.
to the coincidence of the nonzero discrepancies r 1_ 1 and r 1 and hence
to the equation (CABsl+lt s 1+ 1) = 0 holding. Such a situation can
be avoided by choosing a new initial vector x 0 • If the operators CAB
and C are positive definite, then the above degenerations are impos-
sible and the computational process runs without complications. If
CAB is positive definite, then the methods of conjugate directions
acquire further interesting properties.
The closeness of a vector z to the solution of (105.1) can be judged
by the smallness of the square of some norm of the difference e =
= x - z. To that end it is convenient to use the so-called generalized
e"or functionals of the form (Re, e), where R is any positive definite
operator, for example B- 1*CA. That operator is positive definite,
since it is connected with the operator CAB by the relation B- 1*CA =
= B-1 • (CAB) B- 1 . We have
Theorem t05.t. If an operator CAB is positive definite, then among
the totality of vectors of the form z = x 0 + Bs, where s is in the span
of vectors s1 , • • • , s,. a vector x 1 gives a minimum of the generaltzed
e"or functional
<p (z) = (B- 1 *CAe, e).

Proof. Since CAB is positive definite, the system of vectors s 1 is


CAB-orthogonal. We represent a vector z as a decomposition similar
to (105.2) for a vector x:
1

z=x 0 +B)~ h 1s1.


i=l

We have
<p (z) = (.8"" 1*CA (x- z), x- z)
372 Bilinear Forms in Computational Processes [Ch. 13

1 n
= ~ !aJ-hJI 2 (CABs1, sJ)+
J=l
2: laJI 2 (CABsJ,
J=t+l
sJ)•

From this we conclude that a minimum of the error functional is


attained for h1 = ab j ~ i, i.e. for z = x,.
The error functional cannot be determined in practical calcula-
tions, since it depends on the solution x, which is unknown. However,
it differs only in the presence of a constant term from another func-
tional:
,,. (z) = (B- 1*CAz, z) - 2Re (B-1 *Cb, z),
which can be calculated. Indeed,
cp (z) = (B- 1*C A (x - z), x - z) = (B- 1 *CAx, x)
- (B- 1*CAx, z) - (B- 1 *CAz, x) + (B- *CAz,
1 z)
= (B- 1 *CAz, z) - (B- 1 *Ch, z) - (B- 1 *Cb, z)
+ (B- 1 *Cb, x) = lJ> (z) + (B- *Cb,
1 x).
Finally we note some classes of operators for which condition
(105. 9) holds.
J. 1. All operators A, B and C are Hermitian, with B = C. Condi-
tion (105.9) hold-; for a = 0 and ~ = 1:
(CABC- 1 )* = (BA)* = A *11* = O·E + 1·AB.
2. Operators CAB and C are Hermitian too. Condition (105.9)
again holds for a = 0 and ~ = 1:
(CABC- 1 )* = C- 1 * (CAB)* = c- 1CAB = O•E + 1·AB.
3. An operator C is commutative with AB, and AB is normal and
its spectrum lies on a straight line. The last condition implies that
AB = yE +6H for some Hermitian operator II. We now find
1
(CABC- )* = (CC- 1 AB)* = (AB)* = (yE + 6H)*
=yE -fJH= :V~~~'\' E.,~ (yE+6H)= 2 Im~{v~lE+ ~ AB.

4. Represent an operator A as A = M N, where M = M* +


and N = -N*. If M is a nonsingular operator, then put B = C =
= M- 1 • Condition (105 9) holds for a = 2 and ~ = -1:
(CABC- 1)* = (M- 1 (M + N))* N) M- 1
= (M -
= 2£- (M +
N) M-1 = 2·E- f·AB.
10fiJ MaiD variants 373

5. If in the decomposition A = M + Nit is N that is a nonsingu-


lar operator, put B = C = N- 1 • Condition (105.9) holds for a = 2
and ~ = -1:
(CABC- 1 )* = (N- 1 (M +
N))* = -(M- N) N- 1
= 2E- (M +
N) N- 1 = 2·E - 1·ABo

Exercises
1. Prove that the matrix of a bilinear form (CA Bx, y) is:
right triangular m a basis sJ, ... , sn,
right almost triangular in a basis r 0 , ••• , rn_1,
left triangular in bases s1 , • • • , sn and r 0 , ••• , rn_ 1 if Cis a Hermitian opera-
tor.
2. How does the form of the matrt:\ of the bilinear form (CABx, y) of Exer-
cise 1 change if CAB is a Hermitian operator?
3. Prove that the matrix of a bilinear form (Cx, y) is:
right triangular 1n a basis r 0 , ••• , rn _1,
right triangular in bases r 0 , . . . , rn- 1 and s1 , ••• , sn,
right triangular in bases A Bs1 , • • • , A Bsn and s1 , • • • , sn,
right almost triangular in bases ABr0 , ••• , ABrn-1 and r 0 , ••• , rn_1.
4. How does the form of the matrix of the bilinear form (Cz, y) of Exercise 3
change if C is a Hermitian operator?
5. Prove that if condition (105.9) is replaced by the following:
(CABC-1)* = a. 1 E +
~AB + ... + a.p (AB)P, p> t,
then relation. (105.10) will be like this:
Sf+l = r1 + b1s1 + b1- 1st-1 + •••+ bl-p+JSI-P+l•
6. Prove that
(Crl-1• r1-1)
a,= (CABs,. s 1) o
7. Prove that if CAB and Care Hermitian operators then
b (Cr,. r 1)
I= (Crl-1• rl-1) o
8. Prove that if CAB and C are Hermitian. and positive definite operators
then a1 < 0 and b 1 > 0 for every t.
9. Prove that the matrix of an operator AB in a basis made up of vectors
s., ... •_!n or r 0 , • • • , rn-I has a tridiagonal form.
tO. How do the methods of conjugate directions run if A is a singular opera-
tor?

f06. Maio variants


We discuss the best known variants of the
methods of conjugate directions. Theoretically they all fit into
scheme (105.12) described above. Practical calculations, however,
are sometimes carried out using somewhat different algorithms.
Method of conjugate gradients. In this method A is a Hermitian
po"ilive definite operator, B = C = E and (105.9) holds for a = 0
374 Bilinear Forms in Computational Processes [Ch. 13

and ~= 1. The positive definiteness of the operators CAB = A and


C = E guarantees that the computational process runs without
degeneration. At each step of the method the error functional with
matrix A is minimized. The computational scheme of the method
has the form
sl = ro,
r, = r 1_ 1 a 1As,. +
s + = r + b sit
1 1 1 1
x 1 = x 1 _ 1 + a s,. 1
where
a,=-= _ (rl-t• r,_1) _ (r;_ 1, s1) <O b, = _ (r, As,) (r,. rj) >O
(As,. r,_ 1) (As,. s 1 ) ' (As,. s1) (rt-h r,_1) •
In the method of conjugate gradients vectors r 1 form an orthogonal
system, and vectors s1 form an A-orthogonal system.
Method of AA *-minimum iterations. In this method A is an
arbitrary nonsingular operator, B = A*, C = E and (105.9) holds
for a = 0 and ~ = 1. The positive definiteness of CAB = AA • and
C = E guarantees that the computational process runs without
degeneration. At each step of the method the error functional with
matrix E, i.e. the square of the Euclidean norm of the error, ismin-
imized. The computational scheme here has the form
u 1 = A*r0 ,
r 1 = r 1_ 1 +a1Au 1,
ui+ 1 = A*r 1 b1u 1, +
x 1 = x 1_ 1 +a 1u 1,
where
_ (rf-t• rt-1) (rl-t• r;-tl < O b _ (r, Au,) (r,, r,) > 0
a,-- (Au,. r,_ 1)
- (u,. u,) '
,-- (u,. Uf) (rf-h rt-1) ·
In the method of AA *-minimum iterations vectors r 1 and u 1 form
orthogonal systems.
Method of A* A-minimum iterations. In this method A is an ar-
bitrary nonsingular operator, B = A*, C = AA * and (105.9) holds
for a = 0 and ~ = 1. The positive definiteness of CAB = (AA *)s
and C = AA * guarantees that the computational process runs with-
out degeneration. At each step of the method the error functional
with matrix A* A, i.e. the square of the Euclidean norm of the
discrepancy vector, is minimized. The computational scheme of the
method has the form
u 1 = A*r 0 ,
r 1 = r 1_ 1 -+- a1Au 1,
u 1+1 = A*r 1 +b u 1 1,
tOO] Maio. variants 375

where

In the method of A* A-minimum iterations vectors A *r 1 and A u 1


form orthogonal systems.
Method of complete Hermitian decomposition. In this method A
is an arbitrary nonsingular operator. We represent it as a sum A =
= M + N, where M = M* and N = -N*. If either MorN is
nonsingular, we set B = C = M- 1 or B = C = N-1 respectively.
Condition (105.9) holds for a = 2 and ~ = -1. If M (or iN) is an
operator of constant signs, the process runs without degeneration.
For example, let M > 0 and B = C = M-1 • The operator C will
be positive definite and therefore (Cz, z) = (M- 1z, z) > 0 for any
nonzero vector z. Consider now the operator CAB= M-1 + M- 1N M- 1 •
For any z =1= 0
(CABz, z) = (M- 1 z, z) + (M-1N M-1z, z) =1= 0,
since the first scalar product at the right is real and, in view of the
positive definiteness of the operator, positive and the second scalar
product is pure imaginary in view of the fact that the operator
M- 1N M-1 is skew-Hermitian. For the case B = C = M-1 the com-
putational scheme of the method has the form
Mu 1 = ro,
r1 = r 1_1 + a Au,.
1
Mv 1 = r,
u 1+1 = v1 +bu 1 11
x1 = x 1_ 1 + a,u,.
where
(r,_ 1, u,) b _ (Vf, Aut)
a,= - (Au 1, ut) ' 1- (Au 1, ul) •

If B = C = N- 1 , then the computational scheme and the formulas


for the coefficients a 1 and b 1 remain the same, except that l'lf is re-
placed by N of course.
Method of incomplete Hermitian decomposition. In this method A
is a Hermitian positive definite operator. We represent it as a sum
A = M + N, where M = M* and N = N*. If M is nonsingular,
then we set B = C = M- 1 • Condition (105.9) holds for a = 0 and
~ = 1. If M is positive definite, the process runs without degenera-
tion. At each !'ltep of the method the error functional with matrix A
is minimized. The computational scheme remains the same as in
the case of the complete decomposition method.
376 Bilinear Forms in Computational Processes [Ch. 13

Speeding up the computational process. A~ we have already noted,


the methods of conjugate directions allow a solution to be found
especially rapidly if the operator AB has few mutually distinct
eigenvalues. U!->e is made of this fact lo construct ,·arious devices
for ~peeding up the ~olution of equation (105.1), which are based
on the following idea.
Suppose the operator A can be represented as a sum A = M N, +
where the operator M determines the "principal" part of A and
allows a simple solution of equations of the type (105.1) with M
at the left. Now instead of (105.1) we shall solve the equation
(E +
NM- 1 )y = b, (106.1)
where Mx = y. If in some reasonable sense the operator M is close
to A, then most eigenvalues of N and hence of the operator N M- 1
are close to zero or zero. Applying the methods of conjugate direc-
tions to equation (106.1) in this case results in finding rapidly a
solution.
Observe that it is this idea that underlies the method of incomplete
Hermitian decomposition, which in many cases proves to be more
efficient than the classical variant of the method of conjugate gra-
dients. It all depends on how lucky one is in decomposing the op-
erator A.
We shall not discuss in detail the computational schemes of speed-
ing up processes, since they rely too heavily on the use of particular
features of the operator A.
Ezerclses
1. Under what conditions is it appropriate to apply
one variant or another of the method of conjugate directions?
2. How many iterations are required to implement the different variants of
the methods of conjugate directions for an operator A of the form E + R, where
the operator R is of rank r?
3. Assuming an operator A to be a matriJt evaluate the number of arithmeti-
cal operat.ions required for solving a system of linear algebraic equations by
the methods of conjugate directions.
4. The matrix of an operator A is Hermitian and differs from a tridiagonal
matrix in a small number of its elements. Which of the variants of the methods
of conjugate directions is it appropriate to apply in this case?
5. Let P 0 (t), P 1 (t), . . . be some sequence of polynomials. Choose a vec-
tor .:z: 0 and construct a sequence of vectors .:z: 0, .:z:1, ••• by the rule
.:z:llH = .:z:11- BP~a (AB) (A.x~a- b), k';;;l: 0. (t06.2)
How do the expansions of the discrepancies r 0 , r1 , • • • with respect to the 1ordan
canonical basis of the operator A B change ask increases depending on the choice
of a sequence of polynomials?
6. How is sequence (t06.2) to be used for the purpose of constructing an
initial vector for the methods of conjugate directions that ensures that a solu-
tion is obtained in a smaller number of iterations?
i. Which of the systems of vectors in each of the particular variants of the
methods of conjugate directions are up to normalization A-pseudodual?
i07] Operator equations and p~eudoduality 377

107. Operator equations


and pseudoduality
The methods of conjugate directions are not
the only methods of solving the operator equation
Ax = b (107.1)
that rPly on the use of bilinear forms. Vast possibilities for creating
methods arise from constructing systems of vectors dual or pseudo-
dual to some bilinear form related to the operator A of equation
(107.1).
We again assume that A is a nonsingular operator in a unitary
space Kn. Consider a bilinear form (Ax, y) and suppose that systems
of vectors u 1 , u 2 , • • • , Un and vl> v2 , • • • , Vn, A-pseudodual up
to normalization, have been obtained for it in some way, i.e. that
(Au, v1) :/= 0, (Au, v11 ) = 0, k < i (107.2)
for every i ~ n. We show that knowing A-pseudodual systems of
vectors makes it possible to construct the process of finding a solu-
tion of equation (107 .1).
We choose vector x 0 • Since A -pseudodual systems are linearly
independent, there is a decomposition
11

X=- Xo .L ajUJ.
+ 1=1 (107 .3)
If
i
x 1 = x0 + ~ a1uJ>
1=1

then similarly to (105.3) and (105.4) we have


x 1 = x 1_ 1 +
a 1u;. r; = r 1_ 1 a 1Au 1• + (107.4)
Further
n
r 1 =Ax1 -b=A(x1 -x)=- .~ a1AuJ>
1=f+1

and according to the second conditions in (107.2) we find that


n
(r,, v 11 ) = - . ~ aJ (Au,, v 11 ) = 0
1=i+1
for every k ~ i. So
(107.5)
for k ~ i. This makes it possible to determine coefficients a 1 from
(107.4). That is,
(107.6)
378 B1Iinear Forms in Computational Processes [Ch. t3

According to the first conditions in (107 .2) the denominator on the


right of (1 07 .6) is nonzero.
It follows from (107.5) that the vector rn is left orthogonal, and
by the symmetry of a scalar product simply orthogonal, to the lin-
early independent vectors v1 , v2 , ••• , Vn, i.e. that rn = 0 and
the vector Xn is a solution of equation (107.1).
The above methods of solving equation (107 .1) are known as
methods of dual directions. The number of different methods is
infinite in the full sense of the word since there are an infinite number
of different A-pseudodual pairs of systems of vectors. The methods
of conjugate directions considered earlier obviously belong to this
group.
In general there is no analogue of Theorem 105.1 for methods of
dual directions, even for a positive definite operator A. The beha-
viour of errors e11 = x - x 11 in these methods is described only by
the weak, and yet useful,
Theorem 107. t. Let P,. be an operator of projection onto the subspace
spanned by vectors u 1 , • • • , u 11 parallel to the subspace spanned by
vectors u 11 +l, ••• , Un· Then
(107. 7)
Proof. By formula (107 .3) we have the following expansion for
the error e 0 of an initial vector x 0 :

But by the definition of the projection operator


II

P11 e0 = ~~ a1u1.

The right-hand side of this equation is nothing but x 11 - x 0 • There-


fore
P 11 e0 = x" - x 0 = (x - x 0 ) - (x - x~r) = e 0 - e11 ,
which proves the theorem.
Interesting results associated with A-pseudodual systems can be
obtained by considering the matrix interpretation of the above
methods.
We assume that the space Kn is not only unitary but also arith-
metical, which isadmissible by virtue of the isomorphism of finite
dimensional vector spaces. The entire reasoning above remains valid,
only the terminology is changed: equation (107 .1) becomes a system
of linear algebraic equations, the operators are replaced by matrices,
and by vectors column vectors are meant. Denote by U (V) a matrix
whose columns are vectors u 1 , • • . , Un (v1 , ••• , Vn)· Then the fact
107] Operator cq uations and pseudoduality 379

that those vectors satisfy relations (107.2) implies that the matrix
C = V*AU
is nonsingular left triangular. From this we obtain the following
factorization of the matrix A:
A = v-l•cu- 1. (107.8)
So a knowledge of A-pseudodual, up to normalization, systems
of vectors allows us to solve the system of linear algebraic equations
(107.1) with error estimates (107.7) and to obtain factorization
(107.8) of the matrix A into factors among which there is one triangu-
lar factor. We show that the converse is also true. That is, any meth-
od of solving systems of linear algebraic equations, based on factor-
ing a matrix into factors among which there is at least one that is
triangular, determines some A-pseudodual, up to normalization,
systems of vectors. Hence implementing such methods according
to schemes (107 .3) to (107 .6), it is possible to use estimates (107. 7).
Consider a matrix P obtained from a unit matrix by reversing
its columns or equivalently its rows. It is easy to verify that multi-
plying an arbitrary matrix C on the right by P reverses the order
of the columns of C and multiplying a matrix CP on the left by P
reverses the rows of CP. The elements ! 11 of the matrix F = PCP are
therefore connected with the elements c11 of C by the relation
f11 = Cu-1+1. n-J+l•

A number of useful consequences follow. Let us label the diagonals


of a matrix parallel to the principal matrix upwards in succession by
the numbers -(n - 1), -(n- 2), ... , 0, ... , (n- 2), (n- 1).
The diagonal with the zero index is the principal diagonal. In such
an indexing the elements of the kth diagonal are defined by the rela-
tion j - i = k. If the matrix C satisfies the conditions
CIJ = 0, k < j - i, I- i < l
for some numbers i ~ k, then for the matrix F = PCP
I 11 = o, -l < I- i, 1- i < -k.
Hence under the transformation F = PCP a diagonal matrix re-
mains diagonal, a right (left)' triangular matrix becomes left (right)
triangular, a right (left) bidiagonal matrix becomes left (right)
bidiagonal, and so on.
Suppose now that use is being made of some method of solving
a system of linear algebraic equations (107.1) based on a preliminary
factorization of a matrix A:
A = QCR, (107.9)
380 Bilinear Forms in Computational Processes [Ch. 13

where C is a triangular matrix. It may be assumed without essential


loss of generality that C is left triangular, since otherwise instead of
(107.9) we should consider the factorization
A = (QP) (PCP) (PR),
where according to the foregoing the rna trix PCP must be left trian-
gular. The desired matrices U and V defining A-pseudodual, up to
normalization, systems of vectors u11 ••• , Un and v11 • • • , Vn can
be given by the equations
U = R- 1 , V = Q-1 *.
Observe that in factorization (107.9) generated by some numerical
method the matrices Q and R are as a rule sufficiently simple. Most
often they are unitary or triangular matrices as well as matrices
differing from triangular matrices in having permuted rows and
columns. Matrices R- 1 and Q- 1 * can therefore be found without
particular difficulties. At all events the total computational costs
of finding them are much lower than those of obtaining factoriza·
tion (107.9). This is characteristic of such widely known methods
as the Gauss methods, the square root method, the Jordan method,
the orthogonalization method, the method of reflections, the method
of revolutions; methods based on reducing a system to a bidiagonal
form and on obtaining normed decompositions; methods of conjugate
directions, and so on.
Thus most of the existing numerical methods of solving operator
equations (107.1) in a finite dimensional space are in fact methods of
constructing A-pseudodual systems of vector. Despite the diversity
of their specific modes all these methods can be analyzed from a gen-
eral standpoint based on Theorem 107 .1.

Exercises
t. Assume an operator to be a matrix and vectors of
a space to be column vectors. Prove that in the methods of dual directions
suCCessive errors are related by
ell = (E - Sit) elt-1•
where~
UJtV:A
S~t= v*Au • (107.10)
It It

2. Prove that operators Sit satisfy the equations


St=S~t, SiS~t=O,

S 1 (E - Sit) (E - S1t-1l ••• (E - S1) = 0, t < k.


3. Prove that the operators Sit of (t07.t0) and the operator Pit of (t07.7)
are related by
Pit = (E - SA) (E - S 11 _1) ... (E - S 1).
1081 D•linear forms in spectral problems 381

4. What role do operators Sit and Pit play io. the particular methods defined
by factorization (107.9)?
5. How do errors ell change in the particular methods defined by factoriza-
tion (107.9)?
6. Which of the known methods of solving systems of linear algebraic equa-
tions is not based on factorization (107.9)?

108. Bilinear forms


in spectral problems
Bilinear forms are widely used not only in
solving operator equations but also in many other problems of linear
algebra, in particular for determining the eigenvalues of an operator.
We now discuss two methods of finding the eigenvalues of a Hermi-
tian operator in unitary space.
Let A be a Hermitian operator. It certainly satisfies condition
(101. 7). We choose a vector x, construct a power sequence and sub-
ject it to pseudoorthogonalization process. Since the scalar product
is Hermitian-symmetric, the resulting sequence of vectors / 1 , / 2 ,
•.. , In is orthogonal. By (104.5)
/1 =X,
/2 = A/1 - a.J11.
(108.1)
i > 1,

formulas (104.6) assuming the following form by virtue of the orthog-


onality of the system of vectors / 1 , / 2 , • • • , fn:
a. - (Afl. Ill A -(A/,, /1-1)
I- (f,, Ill • t ' l - l - (fl-h /1-J).

The self-adjointness of A allows us to conclude that all coefficients


a. 1 and ~ 1 - 1 are real and that the coefficients
~t-J are in addition posi-
tive. Indeed,
(A/, / 1) = (1 1, A*/1) = (1 1, Af 1) = (Af 1, / 1),
whence it follows that a. 1 are real. Further (108.1) yields
(A/, /1-1) = (!, A/1-1) = (!, /1- a.,_Ifl-1- ~1-JI-2)=(/, /,).
Therefore
A - (/,, Ill
"' 1_1_ U1-1o f1-1l
and the positiveness of ~ 1 - 1 follows from the positiveness in unitary
spnce of scalar products of nonzero equal vectors.
382 Bilinear Forms in Computational Processes [Ch. 13

As already said, the matrix of A in the basis made up of vectors


• • • , In has a tridiagonal form. If in finding a basis / 1 , / 2 , • • •
/ 1 , / 2,
. . ., In there were early terminations of the process of orthogonaliza-
tion of the power sequence, then the matrix splits into a number of
tridiagonal matrices of smaller size.
The positiveness of ~ 1 allows the form of tridiagonal matrix (104.8)
to be simplified. Take a nonsingular diagonal matrix D with ele-
ments a 1 and perform a similarity transformation with matrix (104.8).
In the matrix D- 1A 1D the diagonal elements will remain the same
as in the matrix A 1, the elements ~ 1 a; 1 a 1 + 1 will take the place of
the off-diagonal elements ~" and the elements symmetric with
respect to them will equal a 1ait 1 • If we take
a 1 =1, al+ 1 = ~:; 2 , i;;;;::,1, (108.2)
then the matrix D is real and the matrix D- 1A 1D is real symmetric
tridiagonal. Since ~ 1 are positive, by (108.2) the diagonal elements
of D and hence the off-diagonal elements of D - 1A 1D are also positive.
Thus the problem of determining the eigenvalues and eigenvectors
of any Hermitian operator can always be reduced to that of determin-
ing the eigenvalues and eigenvectors of a real symmetric tridiagonal
matrix A with nonzero off-diagonal elements:
(a, 1'1
y, a2 '\'2 0
A= Ys

'Yn-2 an-I '\'n-1

l 0 'Yn-1 an

Such matrices are called Jacobian matrices.


One of the most efficient numerical methods of finding the eigen-
values of a Jacobian matrix is based on the law of inertia for quadrat-
ic forms. But before proceeding to describe it we consider some
properties of Jacobian matrices.
Theorem t08. t. All eigenvalues of a Jacobian matrix are simple.
Proof. Suppose the eigenvalue A. is multiple. Then the rank of the
matrix A - A.E must not be greater than n - 2. But it is clearly
not less than n - 1, since the off-diagonal elements are nonzero
and hence so is the minor in the first n - 1 columns and the last
n - 1 rows. The contradiction obtained proves the theorem.
Corollary. If a symmetric tridiagonal matrix has an eigenvalue of
multiplicity p, then there are at least p - 1 elements equal to zero
among its upper off-diagonal elements.
If A is a Jacobian matrix, then the matrix A - A.E remains
Jacobian for all real A.. Denote by a 1 (A.), •.• , O"n (A.) the principal
108) Bilinear forms in spectral problems 383

minors of A - 'J..E. It is clear that for all r, a r ('A) is a polynomial of


degree r coinciding up to a sign with the characteristic polynomial
of the matrix of the minor of order r for A. Since A is a symmetric
Jacobian matrix, all the roots of polynomials a r ('A) are real and
simple. Expanding a minor a r ('A) by the last row or the last column
we obtain the following recurrence relations:
a0 ('A)=1, ad'A)=a 1 -'A 1,
CJr ('A)= (ar- 'A) CJr-l ('A)- Y~-tCJr-z ('A), (10R.4)
2~r~n.
Theorem 108.2. For all r > 1 no polynomials a r ('A) and a r- 1 ('A)
have common roots.
Proof. Suppose for some r the number 'A is a common root of poly-
nomials a r ('A) and a r- 1 ('A). Then it follows from (108.4) that 'A is
a root of a r- 2 ('A), since Yr- 1 :/= 0. Proceeding with this reasoning we
arrive at the conclusion that 'A is a root of a 0 ('A). But a 0 ('A) has no
roots, and therefore no adjacent polynomials have common roots.
Corollary. If 'A is a root of a polynomial a r ('A), then a r-1 ('A) and
a r+l ('A) are nonzero and have opposite signs.
Let 'A be a root of none of the polynomials a 1 ('A), ••• , CJn ('A).
Calculate for that 'A the values of all polynomials from formulas
(108.4) and consider the alternation of signs in the sequence
Oo ('A), CJ1 ('A), CJ2 ('A), ••• , CJn ('A). (108.5)
Taking into account the law of inertia and the connection between
the principal minors of a matrix, the coefficients of the canonical
form and the eigenvalues we can say that the number n_ ('A) of
sign alternations in sequence (108.5) equals the number of the eigen-
values of the matrix A strictly less than 'A.
The presence of zero terms in (108.5) does not lead to any difficul-
ties. Indeed, choosing e sufficiently small, we can make all poly-
nomials a r ('J..- e) nonzero, while maintaining the signs of the non-
zero minors of (108.5). According to the corollary of Theorem 108.2
the signs of the minors that were zero do not affect the general
number of sign alternations in (108.5). Therefore all zero terms of
sequence (108.5), except CJn ('A), may be assigned arbitrary signs.
If, however, CJn ('A) = 0, then 'A is an eigenvalue of A. The number
of sign alternations n_ ('A) in sequence (108.5) without CJn ('A) is
again equal to the number of the eigenvalues of A strictly less than 'A.
Suppose the eigenvalues 'A of A are indexed in algebraically decreas-
ing order, i.e.
(108.6)
We show how to determine the ktn eigenvalue 'J.. 11 • Let numbers a 0
and b0 be known such that
bo > a 0 , n_ (a 0 ) < k, n_ (b 0 ) ~ k.
384 Bilinear Forms in Computational Processes [Ch. 13

Then A.ll is clearly in the half-interval [a 0, b0). Observe that we may


take any number less than -II A II as a 0 and any number greater
than II A II as b0 • Now set
t
c0 = 2(a 0 + b0)

and determine n_ (c 0 ). If n_ (c 0 ) < k, then A.ll is in the half-interval


!c0 , b0 ). If n_ (c0 ) ~ k, then A.ll is in [a 0 , c 0). Therefore we can always
find a half-interval, half the previous one, containing A.ll. Proceeding
with this process we obtain a system of embedded half-intervals
[a,, b,) containing A.ll, with

This allows the eigenvalue A.ll to be localized to any required preci-


sion.
The described method of determining the eigenvalues of a tri-
diagonal matrix is called the method of bisections and sequence (108.5)
is called the Sturm sequence. Once any eigenvalue A. has been deter-
mined, the eigenvectors relating to it are determined as solutions of
a homogeneous system of linear algebraic equations with a matrix
A - A.E.
Another method of finding the eigenvalues of a Hermitian opera-
tor A is based on some extreme properties of the related quadratic
form (Ax, x).
It is again assumed that the eigenvalues of A are arranged accord-
ing to (108.6). Suppose that for the eigenvalues A.,., A.,,, •.• , A- 1,.
orthogonal eigenvectors x,,, x,,, ... , x,,. are known. Denote by L,.
the span of those eigenvectors and let A.,. 1 be the largest of the eigen-
values of A.
Theorem 108.3.
(Ax, x)
A.,., = max -(--) , (108.7)
:t¢O z, X
:t.LL,.

and any vector for which a maximum is attained u·ill be an eigenvector


co"esponding to A-,. 1 •
Proof. The quotient on the right of (108. 7) will remain unchanged
if the vector x is multiplied by any nonzero number. Instead of
(108. 7) we may therefore investigate the following equation:
A.,., = max (Ax, x). (108.8)
(%, :11:)=1
:t.LL,

Choose in Kn an orthogonal basis made up of eigenvectors of the


operator A, including the vectors x,,, .•. , x,,. Then in this basis
i08) Bilinear forms in spectral problems 385

equation (108.8) assumes thE~ following~ form:


n
A.,.,= max ~ A. 1 1a112, (108.9)
laa1 1 + ... +lan12=11=1
at,-· .. -a 1,.=o

where a 1 , • • • , an are the coordinates of a vector x in the expansion


with respect to the chosen basis.
The unitary space Kn is complete, the set of vectors satisfying
the condition (x, x) = 1 is bounded, and therefore there are vectors
on which the maximum in (108.8) is attained. Let ~ 1 , • • • , ~n be the
coordinates of one of such vectors. We show that if A.1 :::fo A.,.,, then
~~ = 0. Indeed, if A.1 > A.,.. the equation ~~ = 0 follows from the-
condition x ...1.. L,., since eigenvalues greater than A.,., may exist only
among At,, •.• , At,.· If A.1 <A.,.., then the coordinate ~~ cannot be-
nonzero, since otherwise we could obtain a larger value of the sum
in (108.9) by taking I~,., 11 + I~~ 11 and 0 instead of the numbers
I ~,.. 1 and I ~~ 1'.
1

Thus the largest value of the sum in (108.9) may be attained only
for those d1 , • • • , dn for which d1 :::fo 0 only when A.1 = A.,.,. Hence-
the maximum value of the quotient in (108.7) is attained on eigen-
vectors corresponding to A.,.,. Of course equation (108. 7) follows too.
This theorem shows a way of constructing a numerical method of
finding the eigenvalues and the eigenvectors of the operator A,
which is based on seeking the maxima of the function
a(x) = (Ax, x)
(x, x)

called the Rayleigh quotient. We shall restrict ourselves to a brief


discussion of the method.
Take a vector x and consider the behaviour of the Rayleigh quo-
tient in the neighbourhood of that vector. Simple transformations
show that for any vector l and a small e the following representation
holds:

a (x + el) = a (x) + ( 2£
X, X
) Re (Ax - a (x) x, l) +0 (e1 ).
(108.10)
If Re (Ax- a (x) x, l) :::fo 0, then for sufficiently small real e whose
sign coincides with that of the real part of the scalar product we get
+
a (x el) > a (x), the inequality being opposite for small e of the
opposite sign.
386 Bilinear Forms in Computational Processes [Ch. 13

Suppose the vector x is not an eigenvector. Then Ax-e (x) x =1=


'=/:= 0 and we may set
l = Ax - e (x) x. (108.11)
The scalar product under the sign of the real part in (108.10) is
positive and we find from (108.11) for l that
8(x+el)=8(x)+-(2£)
x,x
(l, l)+O(e2).

We can now choose for a small positive e a vector y = x + el such


that e (y) > e (x).
Proceeding in a similar way we obtain, under some additional
conditions not to be discussed here, a maximum of the Rayleigh
quotient and hence a maximum eigenvalue and the corresponding
eigenvector. If at some step it is found that the vector l of (108.11)
is zero, then this means that xis an eigenvector and there corresponds
an eigenvalue to it equaling e (x). In this case, it is not possible at
all to guarantee the increase of the Rayleigh quotient relying only
on formula (108.10). For this reason the eigenvectors of the operator
A are called stationary points of the Rayleigh quotient in the terminol-
ogy of mathematical analysis.
If one or several eigenvectors are known, then in order to determine
another eigenvector we shall again seek the maximum of the Ray-
leigh quotient, yet not in the entire space but only in the orthogonal
complement of the subspace spanned by the previously found)eigen-
vectors.
Exercises
t. Prove that each root of the rolynomial a,._ 1 ().) of
(t08.4) lies between two adjacent roots of the polynomia a ().).
2. Suppose a tridiagonal symmetric matrix has severaf zero off-diagonal
elements. Does this mean that the matrix has multiple eigenvalues?
3. Prove that, with the coefficient an of matrix (t08.3) increasing (decreas-
ing) without limit, all the eigenvalues except the maximum (minimum) one
remain bounded.
4. Prove that, with the absolute value of the coefficient ~n-l of (t08.3)
increasing without limit, all the eigenvalues except the maximum and the
minimum one remain bounded.
5. A 1acobian matrix of size 2n +
t with elements a 1 = n - I +t and
61 = t has no multiple eigenvalues. Prove that its maximum eigenvalue differs
lrom the one closest to it by an amount of an order of (nl)-1 •
41. Prove that
, (Ax, x) , • (Ax, x)
11. 1 = ruax -(--) • 11.n = mtn - - .
x¢0 x, x x¢0 (x, x)
"1. What does the quantity
p=min (Ax, x)
X¢0 (X, X)
Xl.Lr
mean in the notation of Theorem t08.3?
Conclusion

This textbook has provided ample enough


material for the reader to comprehend both the theoretical basis of
linear algebra and the numerical methods it employs. However,
because of the peculiarities of individual syllabuses and the limited
lecture time allowed for this course some sections of this text may
escape the reader's attention. We shall therefore characterize briefly
all the material presented.
Linear algebra as a science studies sets of special structure and
functions of them. In general similar problems face other areas of
mathematics, for exam pie mathematical analysis. A characteristic
feature explicitly of linear algebra is that the sets are always finite
dimensional vector spaces and the functions are linear operators.
The general properties of vector spaces are discussed in Sections 10
and 13 to 21, and the general properties of linear operators are treat-
ed in Sections 56 to 61 and 63 to 74. The information presented in
those sections can be obtained in various ways, including direct
methods employing no concepts or tools but the most elementary.
Yet one of the additional concepts does deserve comment. It is the
determinant.
As a numerical function given on systems of vectors, the determi-
nant is a relatively simple object. Nevertheless it possesses many im-
portant properties. They have made it a widely used tool significant-
ly facilitating various studies. Besides, the determinant is very often
employed in constructing numerical methods. All this has led us to
give the concept of determinant sufficiently much attention by
considering its geometrical and algebraic properties in Sections 34
to 42 and 62. As a tool of study the determinant is used in this book
to prove diverse assertions.
Another numerical function of two independent vector variables,
the scalar product, defines two major classes of vector spaces called
Euclidean and unitary. The basic new concept in these spaces is
that of orthogonality. Sections 27 to 33 discuss the properties of
vector spaces due to the scalar product, and Sections 75 to 81 describe
the properties of linear operators due to the scalar product.
Systems of linear algebraic equations play an exceptionally im-
portant role throughout mathematics, not only in linear algebra.
Devoted to the study of the various aspects of them are Sections 22,
45, 46 and 48.
As a rule only the material listed constitutes the basis of a course
in linear algebra, with a course in analytic geometry added as a
separate unit. In the present text the necessary facts from analytic
388 Conclusion

geometry are given not in isolation but intermittently with the


corresponding facts from linear algebra. Such a presentation of the
material has allowed us to attain a number of advantages: to reduce
many proofs of the same type in both courses, to emphasize the
geometrical interpretation of abstract algebraic notions, such as
vector space, plane in a vector space, determinant, systems of linear
algebraic equations and so on.
Linear algebra is to a great extent enriched in new facts if the
concept of distance between vectors and that of limit of a sequence
of vectors are introduced into vector spaces. The necessity of intro-
ducing them is also dictated by the requirements of numerical
methods. The metric properties of vector spaces are studied in Sec-
tions 49 to 54, and those of linear operators are considered in Sec-
tions 82 to 84. Of course all these facts are usually given in function-
al analysis, but as a rule many results important for finite dimen-
sional vector spaces remain unaccentuated.
Numerical solution of problems in linear algebra is nearly ahrays
accompanied by the appearance of round-off errors. Therefore the
computer to be must realize what changes in the properties of differ-
ent objects of linear algebra result from small changes in vectors and
operators. The effects of small perturbations are treated in Sections
33, 87 and 89.
Properties of many objects of linear algebra may be reversed even
by arbitrarily small perturbations. Thus a linearly dependent system
of vectors may become linearly independent or increase its rank, an
operator with Jordan structure may become one of a simple struc-
ture, a compatible system of linear algebraic equations may become
incompatible and so on. All these facts give rise to exceptionally
great difficulties in practical solutions of problems.
It is imperative that the reader should once again peruse Section 22
to study the example given there and to ponder on the questions
posed at the end of the section.
Despite the instability of many notions of linear algebra its prob-
lems can be solved in a stable way. To demonstrate this the book
includes a description of a stable method of solving systems of lin-
ear algebraic equations. Its theoretical justification and general
scheme are given in Sections 85, 86 and 88.
The last part of the book is devoted to the description and investi-
gation of various questions relating to bilinear and quadratic forms.
These numerical functions play a very important role in linear al-
gebra and are closely related to the construction of numerical meth-
ods. Sections 90 to 94 deal with the general properties of bilinear
forms and with the connection of their transformations with matrix
decompositions, Sections 98 to 101 discuss the extended concept of
orthogonality and Sections 103 to 108 consider uses of bilinear forms
in computational processes.
INDEX

[Alphabetical subject index of the printed edition, from "Accumulation point"
to "Zero subspace" (pp. 389-392); the entries refer to the page numbers of the
book.]
Printed in the Union of Soviet Socialist Republics
